Java Parallel Computing
I've blogged and written in the past about Pervasive's DataRush Framework (http://www.pervasivedatarush.com/) for parallel programming and computing with Java. When I first used the framework, it involved writing a mix of components in both Java and XML. The XML portions were used to invoke prepared operators, and to describe the flow of data and processing within the parallel application. While it proved to be a very powerful way of expressing the dataflows within an application, the XML took some getting used to.
As with other software frameworks that rely, or have relied, on XML as a way of configuring or gluing together the pieces that do the work, this approach can be a bit tedious. Pervasive received this feedback from its customers, and decided to do something about it. The result is a release candidate of the Pervasive DataRush framework that is now an all-Java parallel programming and computing environment. All business logic and dataflows are expressed in 100% pure Java.
What is a Dataflow?
In computer science, dataflow (http://en.wikipedia.org/wiki/Dataflow) has different meanings. In the simplest sense, it's a message-based system where, when data values change, simultaneous effects can be observed throughout the rest of the system. For concurrency, a dataflow consists of autonomous processes that communicate via messages sent over well-defined channels. In a system of dataflow components, input and output streams connect to these channels, linearly and even recursively. The result is a continuous, determinate processing function in which potentially many components operate on data concurrently, each feeding data to other connected components that process it, also concurrently.
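DataRush's own API isn't shown here, but the underlying idea can be sketched in plain Java: two autonomous processes (threads) connected by a well-defined channel (a BlockingQueue), with the downstream component consuming values while the upstream one is still producing them. The class and method names below are my own illustration, not part of DataRush.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DataflowSketch {
    static final int EOS = -1; // end-of-stream marker on the channel

    public static long run() throws InterruptedException {
        // The channel connecting two autonomous processes.
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(16);

        // Producer component: emits the squares of 1..10, then end-of-stream.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 10; i++) channel.put(i * i);
                channel.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer component: sums values as they arrive,
        // running concurrently with the producer.
        final long[] total = {0};
        Thread consumer = new Thread(() -> {
            try {
                for (int v = channel.take(); v != EOS; v = channel.take()) {
                    total[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return total[0]; // sum of squares 1..10 = 385
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

The point of the sketch is the shape, not the arithmetic: each component knows only its channels, so a runtime (like DataRush's) is free to schedule many such components across cores at once.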
To you, a Java developer, DataRush is a framework that allows you to develop such autonomous components and connect them through channels you define, to model a process. This process is then executed by these components, many of them running in parallel. You don't simply write code to execute individual tasks concurrently. Instead, you develop components that break down a single task into multiple parallel operations that are executed concurrently. This is how DataRush allows you to build systems that scale automatically, thanks to the DataRush runtime, to the number of systems, processors, and cores that you run them on.
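To make the decomposition idea concrete without DataRush-specific code, here is a minimal sketch in plain Java: a single task (summing 1..n) is partitioned across however many cores are available, the partial results are computed concurrently, and then combined. The class name and partitioning scheme are my own illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Breaks one task (sum of 1..n) into per-core partitions,
    // runs the partitions concurrently, and combines the results.
    public static long parallelSum(int n) throws Exception {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (n + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int lo = w * chunk + 1;
                final int hi = Math.min((w + 1) * chunk, n);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i <= hi; i++) s += i;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000)); // 500000500000
    }
}
```

Scaling here comes from the partitioning, not the code inside each partition; add cores and the same program simply gets more partitions running at once, which is the effect DataRush's runtime provides for its components.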
The DataRush Framework
Pervasive decided that to help Java developers take advantage of the increase in parallelism that we're seeing, thanks to multi-core architectures, they would develop DataRush to support parallel processing with plain-old-Java-objects (POJOs). This way, your code maintains its platform and OS independence, and does not constrain you to a web server, application server, or any other managed code environment, unless you choose to use one.
DataRush supports both 32-bit and 64-bit server Java VMs and environments, and integrates with open-source developer tools such as Eclipse and NetBeans. The DataRush framework comes with a library of operators that you can extend, or use as-is within your processing. Examples include operators that help you read from and write to database engines, join data from database tables, process file I/O, sort data, and so on, all in parallel.
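I can't reproduce DataRush's operator API from memory, but a parallel sort operator of the kind described above can be sketched in standard Java using the fork/join model: recursively split the input, sort the halves concurrently, and merge. The threshold and class names below are my own choices for the sketch.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends RecursiveAction {
    private static final int THRESHOLD = 1_000; // below this, sort sequentially
    private final int[] a;
    private final int lo, hi; // sorts a[lo..hi)

    ParallelMergeSort(int[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi);
            return;
        }
        int mid = (lo + hi) >>> 1;
        // Sort both halves concurrently, then merge them.
        invokeAll(new ParallelMergeSort(a, lo, mid),
                  new ParallelMergeSort(a, mid, hi));
        merge(mid);
    }

    // Merges the sorted runs a[lo..mid) and a[mid..hi) in place.
    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi)
            a[k++] = left[i] <= a[j] ? left[i++] : a[j++];
        while (i < left.length) a[k++] = left[i++];
    }

    public static void sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(a, 0, a.length));
    }
}
```

A prebuilt operator library saves you from writing and tuning this kind of code yourself; you wire the operator into a dataflow and let the runtime handle the splitting.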
You can work with all of the tools you're used to today, including integrated development environments, debuggers, 3rd-party libraries, and Java code profilers. In fact, portions of the development process allow you to inspect your dataflows, and construct them, using a visual designer integrated with Eclipse. Included are tools that offer further profiling capabilities, so you can analyze customer or 3rd-party data sets, and graph your dataflows.
DataRush at JavaOne
Pervasive spoke a great deal about DataRush at JavaOne, and Jim Falgout joined AMD on stage during one of the keynotes to explain what their framework is all about. You can read Jim's accompanying white paper here: http://www.pervasivedatarush.com/blogs/datarush-white-paper
Like I said at the beginning of this blog, I've used earlier versions of the DataRush framework in actual projects, and I found that it works well. Pervasive has made big improvements since then, the biggest of which is the all-Java approach to parallel processing with dataflows (XML is no longer required). Take a look at it to see if it solves your parallel processing needs in Java.
How are you solving parallel-processing challenges today? How do you ensure your server software is scaling to use the growing numbers of cores available to run on? Write back in the comments section and share with your fellow Java programmers.
In future blog entries, we'll explore parallel processing with Java in more detail, using various techniques. Obviously, your input is greatly appreciated.