
Traversing all files under a directory with java.io.File


Today I had this requirement.

I have a large collection of HTML files on my local SSD, because I downloaded a complete site with the wget command. To do this, I used the command below:

wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent http://www.yoursite.org

This downloads the full website into a directory named after the host (here www.yoursite.org) inside your current directory, including scripts, CSS files, graphics and videos. Next, I needed to access all the HTML files to clean them. I wanted to use Java, so I had to write a recursive function. After some attempts I arrived at the following code. After the weekend I hope to have finished the project and to put the results online.

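import java.io.File;

// Hypothetical wrapper class so the methods compile as-is.
// Note: isDirectory() follows symbolic links, so a link that points back
// up the tree could make these methods recurse without end.
public class DirectoryWalker {

    // Placeholder for the real work (e.g. cleaning an HTML file).
    public static void process(File f) {
        System.out.println(f.getPath());
    }

    // Process all files and directories under dir
    public static void visitAllDirsAndFiles(File dir) {
        process(dir);

        if (dir.isDirectory()) {
            String[] children = dir.list();
            if (children == null) {
                return; // list() returns null on an I/O error (e.g. no read access)
            }
            for (String child : children) {
                // Recursive call:
                visitAllDirsAndFiles(new File(dir, child));
            }
        }
    }

    // Process only directories under dir
    public static void visitAllDirs(File dir) {
        if (dir.isDirectory()) {
            process(dir);

            String[] children = dir.list();
            if (children == null) {
                return;
            }
            for (String child : children) {
                visitAllDirs(new File(dir, child));
            }
        }
    }

    // Process only files under dir
    public static void visitAllFiles(File dir) {
        if (dir.isDirectory()) {
            String[] children = dir.list();
            if (children == null) {
                return;
            }
            for (String child : children) {
                visitAllFiles(new File(dir, child));
            }
        } else {
            process(dir);
        }
    }
}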
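As a sketch of how the traversal could be narrowed to just the HTML files that need cleaning, the class below uses the same recursion as visitAllFiles but keeps only names ending in .html or .htm. HtmlCollector and collectHtmlFiles are hypothetical names introduced here, not part of the original project. It uses File.listFiles(), which returns File objects directly and, like list(), returns null for a directory it cannot read.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a recursive walk in the style of visitAllFiles
// that collects only .html and .htm files.
public class HtmlCollector {

    public static List<File> collectHtmlFiles(File dir) {
        List<File> result = new ArrayList<>();
        collect(dir, result);
        return result;
    }

    private static void collect(File f, List<File> result) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return; // unreadable directory: skip it
            }
            for (File child : children) {
                collect(child, result); // recursive call
            }
        } else {
            // Locale.ROOT avoids locale-specific lower-casing surprises
            String name = f.getName().toLowerCase(Locale.ROOT);
            if (name.endsWith(".html") || name.endsWith(".htm")) {
                result.add(f);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File html : collectHtmlFiles(root)) {
            System.out.println(html.getPath());
        }
    }
}
```

For example, java HtmlCollector www.yoursite.org would list every HTML page of the mirrored site; the actual cleaning step would replace the println.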

In my case I only need some parts of the HTML code, which I want to transfer to MySQL (the project I’m working on is a web-scraping project).

I hope to be able to show you more details after the weekend. So stay tuned!

What is recursion?


Recursion in computer science is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem (as opposed to iteration). The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.
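A minimal illustration, not taken from the original post: the factorial n! = n × (n − 1)! is defined in terms of a smaller instance of the same problem, and the base case (n ≤ 1) is what stops the recursion. Factorial is a hypothetical class name; a long overflows beyond 20!.

```java
// Illustrative only: n! computed from (n - 1)!, a smaller instance
// of the same problem.
public class Factorial {

    public static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        if (n <= 1) {
            return 1;                   // base case: ends the recursion
        }
        return n * factorial(n - 1);    // recursive case: smaller instance
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
    }
}
```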

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.
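To make the "finite statement, infinite set" idea concrete, here is a sketch built on a hypothetical TreeNode class (an assumption for illustration, loosely modelled on a directory tree). One short definition describes trees of any depth, because a node's children are themselves nodes, and the recursive depth() method handles all of them without any loop over the levels of the tree.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: one finite class definition covers trees of
// unbounded size and depth.
public class TreeNode {
    final String name;
    final List<TreeNode> children;

    public TreeNode(String name, TreeNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    // Number of levels below and including this node. No loop runs over
    // the levels; the recursion follows the structure however deep it is.
    public int depth() {
        int deepest = 0;
        for (TreeNode child : children) {
            deepest = Math.max(deepest, child.depth());
        }
        return 1 + deepest;
    }

    public static void main(String[] args) {
        TreeNode tree = new TreeNode("/",
                new TreeNode("docs", new TreeNode("index.html")),
                new TreeNode("img"));
        System.out.println(tree.depth()); // prints 3
    }
}
```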

Unlike a simple loop over a collection whose size is known in advance, recursion is not tied to a fixed size or depth. Recursion is best explained as a function that calls itself. The classical example is recursion through a file system, which works as follows:

  1. Start at the top (‘C:/’ or ‘/’).
  2. Take the next entry in the current directory.
  3. Check whether this entry is a directory or a file.
  4. For a directory, start over from step 2 inside that directory; for a file, do something with it (like indexing).

This (simple) procedure traverses the whole file system.
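For comparison, a sketch assuming Java 8 or later: the standard library's java.nio.file.Files.walk performs this descent for you and returns a stream of every path below the start directory, so only the "do something with a file" part remains. WalkWithNio and regularFiles are hypothetical names. Files.walk does not follow symbolic links unless FileVisitOption.FOLLOW_LINKS is passed, and an unreadable directory encountered during the walk surfaces as an UncheckedIOException instead of being skipped silently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: the same traversal using the JDK's built-in directory walker.
public class WalkWithNio {

    public static List<Path> regularFiles(Path start) throws IOException {
        // try-with-resources: the stream holds open directory handles
        try (Stream<Path> paths = Files.walk(start)) {
            return paths.filter(Files::isRegularFile)   // files only (step 4)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : regularFiles(root)) {
            System.out.println(file);  // "do something", e.g. indexing
        }
    }
}
```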

A good and working example can be found in this article.