Channels ▼
RSS

Parallel

Lambdas and Streams in Java 8 Libraries


Parallelism Under the Hood

With the Fork/Join framework added in Java SE 7, the JDK has an API for efficiently implementing parallel computations. However, parallel code with Fork/Join looks very different from (and much bigger than) the equivalent serial code, which acts as a barrier to parallelization. By supporting the exact same set of operations on sequential and parallel streams, users can switch between serial and parallel execution without rewriting their code, removing this barrier and making parallelism more accessible and less error-prone.

The steps involved in implementing a parallel computation via recursive decomposition are: dividing a problem into subproblems, solving a subproblem sequentially to produce a partial result, and combining the partial results of two subproblems. The Fork/Join machinery is designed to automate this process.

In order to support the full set of operations on any stream source, we model the stream source with an abstraction called Spliterator, which is a generalization of a traditional iterator. In addition to supporting sequential access to the data elements, a spliterator also supports decomposition: just as an Iterator lets you carve off a single element and leave the rest described by the Iterator, a Spliterator lets you carve off a larger chunk (ideally, half) of the input elements into a new Spliterator, and leave the rest of the data to be described by the original Spliterator. (Both spliterators can then be decomposed further.) Additionally, a spliterator can provide source metadata such as the number of elements (if known) and a set of boolean characteristics (such as "the elements are sorted") that can be used by the Streams framework to optimize execution.

This approach separates the structural properties of recursive decomposition from the algorithms that can be executed in parallel on decomposable data structures. The author of a data structure need only provide the decomposition logic, and then immediately gets the benefit of parallel execution of stream operations.

Most users won't ever have to implement a Spliterator; they'll just use the stream() methods on existing collections. But, if you ever are implementing a collection or other stream source, you might want to consider providing a custom Spliterator. The API for Spliterator is shown below:

public interface Spliterator<T> {
    // Element access
    boolean tryAdvance(Consumer<? super T> action);
    void forEachRemaining(Consumer<? super T> action); 

    // Decomposition
    Spliterator<T> trySplit();

    // Optional metadata
    long estimateSize();
    int characteristics();
    Comparator<? super T> getComparator();
}

Base interfaces such as Iterable and Collection provide correct but low-performance spliterator() implementations, but sub-interfaces (like Set) and concrete implementations (like ArrayList) override these with higher-quality spliterators that take advantage of information not available to the base type. The quality of a spliterator implementation will affect performance of stream execution; returning well-balanced splits from the split() method will improve CPU utilization, and providing the correct characteristics and size metadata will enable many other optimizations.

Encounter Order

Many data sources, such as lists, arrays, and I/O channels, have a natural encounter order, which means the order in which the elements appear has significance. Others, such as HashSet, have no defined encounter order (and therefore an Iterator for a HashSet is permitted to serve up the elements in any order it likes.)

One of the characteristics tracked by Spliterator, and used by stream implementations, is whether the stream has a defined encounter order. With a few exceptions (such as Stream.forEach() or Stream.findAny()), parallel operations are constrained by encounter order. This means that in a stream pipeline like

List<String> names = people.parallelStream()
                           .map(Person::getName)
                           .collect(toList());

the names must appear in the same order as the corresponding people did in the stream source. Usually, this is what we want, and for many stream operations, this is not prohibitively expensive to preserve. On the other hand, if the source were a HashSet, then the names could appear in any order, and might appear in a different order across multiple executions.

Streams and Lambdas in the JDK

Having exposed Stream as a top-level abstraction, we want to ensure that the features of Stream are available as widely throughout the JDK as possible. Collection has been augmented with stream() and parallelStream() methods for converting collections into streams; arrays can be converted into streams with Arrays.stream().

Additionally, there are static factory methods in Stream (and the associated primitive specializations) for creating streams, such as Stream.of, Stream.generate, and IntStream.range. Many other classes have acquired new stream-bearing methods, such as String.chars, BufferedReader.lines, Pattern.splitAsStream, Random.ints, and BitSet.stream.

Finally, there is a set of APIs for constructing streams, to be used by library writers who wish to expose stream functionality on non-standard aggregates. The minimal information needed to create a Stream is an Iterator, but if the creator has additional metadata (such as knowing the size), the library can provide a more efficient implementation by implementing a Spliterator (as all of the JDK collections have).

Comparator Factories

The Comparator class has acquired a number of new methods that are useful for building comparators.

The static method Comparator.comparing() takes a function that extracts a Comparable sort key and produces a Comparator. Its implementation is very simple:

public static <T, U extends Comparable<? super U>> Comparator<T> comparing(
        Function<? super T, ? extends U> keyExtractor) {
    return (c1, c2) 
        -> keyExtractor.apply(c1).compareTo(keyExtractor.apply(c2));
}

Methods like this are an example of higher-order functions — functions who take as arguments functions or return new functions. Methods like this simplify user code by reducing duplication:

List<Person> people = ...
people.sort(comparing(p -> p.getLastName()));

This is much cleaner than the "old way," which usually involved an anonymous class instance that implemented Comparator. But the real power of this approach is its improved composability. For example, Comparator has a default method for reversing its direction. So, to sort the people by last name in reverse order, we can simply create the comparator as before, and then ask it to reverse itself:

people.sort(comparing(p -> p.getLastName()).reversed());

Similarly, the default method thenComparing allows you to take a Comparator and refine its behavior when the initial comparator views two elements as equal. To sort the people by last name then first name, we would do:

Comparator<Person> c = Comparator.comparing(p -> p.getLastName())
                                 .thenComparing(p -> p.getFirstName());
people.sort(c);

Mutative Collection Operations

Stream operations on collections produce a new value, collection, or side effect. However, sometimes we do want to mutate the collection in-place, and some new methods have been added to Collection, List, and Map to take advantage of lambdas, such as Iterable.forEach(Consumer), Collection.removeAll(Predicate), List.replaceAll(UnaryOperator), List.sort(Comparator), and Map.computeIfAbsent(). Additionally, non-atomic versions of the methods from ConcurrentMap, such as replace and putIfAbsent have been pulled up into Map.

Conclusion

While adding lambda expressions to the language is a huge step forward, developers get their work done every day by using the core libraries, so the language evolution effort was paired with a library evolution effort so that users could start using the new features on day one. The centerpiece of the new library features is the Stream abstraction, which provides powerful facilities for aggregate operations on data sets, and has been deeply integrated with the existing collection classes as well as other JDK classes.


Brian Goetz is a Java Language Architect at Oracle and the author of Java Concurrency In Practice. He was the specification lead for JSR-335 (Lambda Expressions for the Java Language) and has served on numerous other JCP Expert Groups. Reprinted from Java.net with permission from Oracle Corporation.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video