Eric Bruno

Dr. Dobb's Bloggers

Using JDK 7's Fork/Join Framework

June 27, 2011

Java 7, which is due to be released within a matter of weeks, has many new features. In fact, it contains more major new features than Java SE 6 did, mainly because it's been so long since Java SE 6 was released. Some of the planned features even had to be deferred to JDK 8. Here's a summary of what's new:

  • JSR-292: Support for dynamically typed languages. Languages like Ruby or Groovy can now execute on the JVM with performance closer to that of native Java code
  • JSR-334: Also called Project Coin, this is a set of small Java language enhancements, such as strings in switch statements, the diamond operator, multi-catch, and try-with-resources
  • Improved class loading
  • JSR-166y: The new Fork/Join framework for enhanced concurrency support
  • Unicode 6.0 and other Internationalization improvements
  • JSR-203: NIO.2, which includes better file system integration, better asynchronous support, multicast, and so on
  • Windows Vista IPv6 support
  • SDP, SCTP, and TLS 1.2 support
  • JDBC 4.1
  • Swing enhancements, Nimbus look-and-feel, enhanced platform window support, and new sound synthesizer
  • Updated XML and Web Services stack
  • Improved system and JVM reporting framework included with MBean enhancements

What got deferred to JDK 8? Here's a summary list:

  • Modular support for the JVM (Project Jigsaw)
  • Enhanced Java annotations
  • Java Closures (Project Lambda)
  • JSR-296: Swing Application Framework, to eliminate boilerplate code

For a complete list of enhancements and new features, with full details, see the JDK 7 release documentation. For now, let's look at the new Fork/Join framework, and how it helps with Java concurrency.

What Is Fork/Join?

Fork/Join is a new ExecutorService implementation (ForkJoinPool) that lets you break up processing to be executed concurrently, and recursively, with little effort on your part. It's based on the work of Doug Lea, a thought leader on Java concurrency, at SUNY Oswego. Fork/Join deals with the threading hassles; you just indicate to the framework which portions of the work can be broken apart and handled recursively. It employs a divide-and-conquer algorithm that works like this in pseudocode (as taken from Doug Lea's paper on the subject):

Result doWork(Work work) {
    if (work is small) {
        process the work
    }
    else {
        split up work
        invoke framework to solve both parts
    }
}

It's your job to determine the amount of work to process before splitting it up. If it's too granular, the overhead of the Fork/Join framework may hurt performance. But if it's just right, the advantage of parallelism will increase performance. For instance, the sample application we'll examine will look for XML files to process in a set of directories. If there are too many files, the code will use the Fork/Join framework to recursively break down the workload across multiple threads. Since XML file processing involves a combination of I/O and CPU work, this is a perfect use of Fork/Join.
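As a rough illustration of picking that split point, here is a minimal sketch that derives a threshold from the size of the workload and the number of worker threads. The SplitThreshold class, the forWorkload method, and the TASKS_PER_WORKER value are all assumptions made up for this example; they are not part of the Fork/Join framework or of the sample application, and the right value for real work is best found by measuring.

```java
// Hypothetical helper for illustration only; not part of the Fork/Join API.
final class SplitThreshold {
    // Assumed tuning knob: aim for a handful of leaf tasks per worker thread,
    // so there is enough spare work to balance the load without creating
    // a flood of tiny tasks.
    static final int TASKS_PER_WORKER = 4;

    // Returns the largest batch size that should be processed without splitting.
    static int forWorkload(int itemCount, int parallelism) {
        int targetTasks = Math.max(1, parallelism * TASKS_PER_WORKER);
        return Math.max(1, itemCount / targetTasks);
    }

    public static void main(String[] args) {
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println("Threshold for 1000 files on " + workers
                + " workers: " + forWorkload(1000, workers));
    }
}
```

With four workers and 1,000 files, this heuristic yields a threshold of 62 files per leaf task. A fixed threshold, like the one used in the sample application, is simpler and is fine when the workload size is predictable.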

The framework handles the threads based on available resources. It also employs a second algorithm called work stealing, where idle threads can steal work from busy threads to help spread the load around without spawning new threads. The same type of algorithm is often used in garbage collectors that use parallel worker threads to walk the heap.
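To get a feel for work stealing, the following sketch runs a CPU-only task through a ForkJoinPool and then prints the pool's parallelism and its steal count, both of which ForkJoinPool exposes through getParallelism() and getStealCount(). The StealDemo and Spin classes, the leaf size, and the square-root busywork are assumptions invented for this example. The steal count is only an estimate and will vary from run to run, so no particular value should be expected.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; the names and sizes here are illustrative only.
final class StealDemo {
    static final int LEAF_SIZE = 1_000;
    // Counts how many elements the leaf tasks have handled in total.
    static final AtomicLong processed = new AtomicLong();

    static final class Spin extends RecursiveAction {
        private final int lo;
        private final int hi;

        Spin(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= LEAF_SIZE) {
                // Small enough: do some CPU work directly
                double sink = 0;
                for (int i = lo; i < hi; i++) {
                    sink += Math.sqrt(i);
                }
                if (sink < 0) {
                    throw new IllegalStateException("unreachable");
                }
                processed.addAndGet(hi - lo);
            } else {
                // Too big: split in half and let the pool schedule both halves
                int mid = (lo + hi) >>> 1;
                invokeAll(new Spin(lo, mid), new Spin(mid, hi));
            }
        }
    }

    public static void main(String[] args) {
        // The no-argument constructor sizes the pool to the available processors
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new Spin(0, 2_000_000));
        System.out.println("parallelism = " + pool.getParallelism());
        System.out.println("processed   = " + processed.get());
        System.out.println("steals      = " + pool.getStealCount());
        pool.shutdown();
    }
}
```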

Java 7 Fork/Join Processing Example

Let's explore a sample application that checks a set of work directories for new XML files. As the files are processed, they're moved out of the work directories and into a special "processed" directory. This sample is loosely based on a news processing system I worked on years ago, where news articles were written to the appropriate directories as they were published. Then, a worker process that periodically checked the directories would process the files, and make them available on a website.

The code below is the complete Fork/Join XML processing application (minus the actual XML processing details). The main class, XMLProcessingForkJoin, periodically kicks off the parsing of the files within a directory. It uses the ProcessXMLFiles class, which extends the Fork/Join framework's java.util.concurrent.RecursiveAction base class, to recursively split up and process all the files in the source directory.

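import java.io.File;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class XMLProcessingForkJoin {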

   class ProcessXMLFiles extends RecursiveAction {
       static final int FILE_COUNT_THRESHOLD = 2;
       String sourceDirPath;
       String targetDirPath;
       File[] xmlFiles = null;

       public ProcessXMLFiles(String sourceDirPath, String targetDirPath, File[] xmlFiles) {
           this.sourceDirPath = sourceDirPath;
           this.targetDirPath = targetDirPath;
           this.xmlFiles = xmlFiles;
       }

       @Override
       protected void compute() {
           try {
               // Make sure the directory has been scanned
               if ( xmlFiles == null ) {
                   File sourceDir = new File(sourceDirPath);
                   if ( sourceDir.isDirectory() ) {
                       xmlFiles = sourceDir.listFiles();
                   }
               }
               // Nothing to do if the source is not a readable directory
               if ( xmlFiles == null ) {
                   return;
               }

               // Check the number of files
               if ( xmlFiles.length <= FILE_COUNT_THRESHOLD ) {
                   parseXMLFiles(xmlFiles);
               }
               else {
                   // Split the array of XML files into two equal parts
                   int center = xmlFiles.length / 2;
                   File[] part1 = (File[])splitArray(xmlFiles, 0, center);
                   File[] part2 = (File[])splitArray(xmlFiles, center, xmlFiles.length);

                   invokeAll(new ProcessXMLFiles(sourceDirPath, targetDirPath, part1 ),
                             new ProcessXMLFiles(sourceDirPath, targetDirPath, part2 ));

               }
           }
           catch ( Exception e ) {
               e.printStackTrace();
           }
       }

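       // Copy the elements in [start, end) into a new array. The result is
       // created as a File[] (not an Object[]) so that the casts in compute()
       // don't throw a ClassCastException at run time.
       protected File[] splitArray(File[] array, int start, int end) {
           int length = end - start;
           File[] part = new File[length];
           for ( int i = start; i < end; i++ ) {
               part[i-start] = array[i];
           }
           return part;
       }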

       protected void parseXMLFiles(File[] filesToParse) {
           // Parse and copy the given set of XML files
           // ...
       }
   }

   public XMLProcessingForkJoin(String source, String target) {
       // Periodically invoke the following lines of code:
       ProcessXMLFiles process = new ProcessXMLFiles(source, target, null);                
       ForkJoinPool pool = new ForkJoinPool();
       pool.invoke(process);
   }

   // Start the XML file parsing process with the Java SE 7 Fork/Join framework
   public static void main(String[] args) {
       if ( args.length < 2 ) {
           System.out.println("args - please specify source and target dirs");
           System.exit(-1);
       }
       String source = args[0];
       String target = args[1];

       XMLProcessingForkJoin forkJoinProcess = 
               new XMLProcessingForkJoin(source, target);
   }
}

It starts with the main class's constructor, XMLProcessingForkJoin, where a new ProcessXMLFiles object is created and handed off to the Fork/Join framework via a call to ForkJoinPool.invoke(). The framework then calls the object's compute() method. First, a check is made to populate the list of files within the directory. Next, if the number of files to process is at or below a threshold (two files in this case), the files are processed and we're done. Otherwise, the array of files is split into two parts, and two new Fork/Join tasks are created to process each sublist of files, and so on, recursively, until all the files are parsed and processed.
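The constructor's comment says to invoke the pool periodically but leaves the scheduling out. One possible way to do it, sketched below, is to keep a single ForkJoinPool and drive it from a ScheduledExecutorService. The PeriodicScan class and its TaskFactory interface are hypothetical names introduced for this example; the factory stands in for "new ProcessXMLFiles(source, target, null)", since a Fork/Join task object should be created fresh for each run.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler wrapper; not part of the original application.
final class PeriodicScan {
    // Creates a new root task for every scan
    interface TaskFactory {
        RecursiveAction newTask();
    }

    // One pool is reused across scans instead of being created each time
    private final ForkJoinPool pool = new ForkJoinPool();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    void start(final TaskFactory factory, long delay, TimeUnit unit) {
        timer.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                try {
                    // Blocks until the whole task tree has completed
                    pool.invoke(factory.newTask());
                } catch (RuntimeException e) {
                    // An uncaught exception would cancel all future runs
                    e.printStackTrace();
                }
            }
        }, 0, delay, unit);
    }

    void stop() {
        timer.shutdownNow();
        pool.shutdown();
    }
}
```

Because scheduleWithFixedDelay waits for one run to finish before timing the next, two scans of the same directory never overlap in this arrangement.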

Since the code just parses XML files and returns nothing, I chose to extend RecursiveAction in this application. If your processing actually returns a result that needs to be combined with the results of other Fork/Join subtasks (e.g., sorting, compressing data, tallying numbers, and so on), then you can extend RecursiveTask instead. I'll take a closer look at this and other changes to the concurrent classes in Java SE 7 in a future blog post.
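To give a taste of the result-returning variant, here is a minimal sketch of a RecursiveTask that tallies the numbers in an array. The SumTask class and its THRESHOLD value are assumptions chosen for illustration, not code from the application above. Each task either sums its slice directly or forks one half, computes the other half in the current thread, and then joins the forked half to combine the two results.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task; names and threshold are illustrative only.
final class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000;

    private final int[] values;
    private final int lo;
    private final int hi;

    SumTask(int[] values, int lo, int hi) {
        this.values = values;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: sum this slice directly
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(values, lo, mid);
        SumTask right = new SumTask(values, mid, hi);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        ForkJoinPool pool = new ForkJoinPool();
        long total = pool.invoke(new SumTask(data, 0, data.length));
        System.out.println(total); // prints 50005000
        pool.shutdown();
    }
}
```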

Happy coding!
-EJB
