A high-level standard is needed for parallel programming that addresses the needs of large-scale software development. Existing standards and standard proposals do not provide the required levels of reliability and maintainability; an alternative approach is required. A top-down approach based on the composition of structured, deterministic patterns of parallelism can satisfy the needs of large-scale software development while still providing high levels of scalability and performance.
Recently there has been a move towards standards for many-core computing. However, almost all systems and standards in parallel computing to date have been very mechanism-oriented. A mechanism-oriented standard is designed bottom-up, putting a thin layer of abstraction over features that are motivated primarily by existing hardware architectures.
In contrast, our philosophy has been application-oriented. An application-oriented system considers the needs of application algorithms first, along with the natural ways for those algorithms to be expressed. Large-scale applications need safety, structure, and determinism, and these requirements are not being addressed by mechanism-oriented systems and standards proposals.
We have demonstrated that it is possible to satisfy these needs in a high-performance system. Our approach is based on identifying structured patterns of computation in application workloads that an automated system can reliably map onto high-performance implementations. Many of these patterns also happen to be deterministic, and their composition results in an implementation that is safe and deterministic yet still high-performance.
Programming models and platforms can be built on these patterns and can provide automated support for efficient implementation and optimization. Platform interfaces that let structured patterns of parallel computation be expressed directly also let the programming system better capture the intent of the software developer, especially with regard to memory coherence.
This more application-oriented, structured approach to designing and implementing parallel algorithms via a supporting platform is particularly relevant for many-core processors with large amounts of parallelism. Beyond improving the productivity of experts, specific patterns and fused combinations of patterns can also guide relatively inexperienced users toward efficient algorithm implementations with good scalability.
This approach to parallelism enables a unified way of developing parallel software. Deterministic patterns include both collective "data-parallel" patterns such as map and reduce and structured "task-parallel" patterns such as superscalar task graphs. Like data-parallel models, the structured pattern-based approach addresses both data access and parallel task distribution in a common framework. Optimizing data access matters both for many-core processors with shared memory and for accelerators with their own memories not directly attached to the host processor.
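As a minimal sketch of pattern composition (not code from any particular platform; the function names here are hypothetical), the map and reduce patterns can be combined so that the overall result is deterministic. Python's serial built-in `map` stands in for a parallel runtime that would distribute the work over cores:

```python
from functools import reduce
import operator

def square(x):
    # Elemental function: applied independently to each element,
    # so all applications could run in parallel (map pattern).
    return x * x

def sum_of_squares(xs):
    # Map pattern: a parallel runtime would distribute this over cores;
    # Python's built-in map is used here purely for illustration.
    squared = map(square, xs)
    # Reduce pattern: combine results with an associative operation.
    # Associativity lets an implementation reduce in any tree order
    # while still producing a deterministic result.
    return reduce(operator.add, squared, 0)

print(sum_of_squares(range(1, 5)))  # 1 + 4 + 9 + 16 = 30
```

Because the elemental function has no side effects and the combining operation is associative, the scheduler is free to choose any execution order for the map and any grouping for the reduce without changing the answer, which is exactly the safety property the structured-pattern approach relies on.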
Ultimately, the industry needs to see new or existing standards evolve to address the parallelism challenge from the application point of view. With this approach, both data and task parallel algorithms can be mapped onto patterns that can then be executed efficiently on both multi-core processors and accelerators.
We have identified on the order of a dozen patterns that are useful for deterministic parallel computing. Considering the complexity and importance of the problem, this is not a large set, but it is too much to cover in a single article. I will therefore be posting about each of these patterns in turn, along with examples of their usage. I will also discuss how these simple patterns can form the basis for a solid high-level standard for highly productive and modular parallel computing.