Superscalar sequence and speculative selection are two parallel patterns that are closely related to serial computing; in fact, they represent ways that "implied" parallelism in a serial program can be extracted, if suitable restrictions are placed on the "serial" programming model. Drawing an analogy with sequential programming, you would expect the next pattern to be iteration. However, while iteration is used in serial programming to express many computations that are potentially parallelizable, it is not itself a parallel pattern. This is because in serial programming, each iteration can depend in arbitrary ways on previous iterations, so in general some loops are parallelizable while others, which can look very similar, are not. For parallel programming, we want explicit constructs that we are sure can be executed in parallel.
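The distinction can be made concrete with two loops that look almost identical but differ in their dependency structure. This is an illustrative sketch (the function names `scale` and `running_sum` are ours, not from the text):

```python
def scale(a):
    # Independent iterations: each output element depends only on a[i],
    # so this loop is safely parallelizable -- it is a map in disguise.
    return [2 * x for x in a]

def running_sum(a):
    # Dependent iterations: each output element depends on the previous
    # one (b[i] = b[i-1] + a[i]), so this superficially similar loop is
    # NOT directly parallelizable as a map.
    b, total = [], 0
    for x in a:
        total += x
        b.append(total)
    return b
```

A compiler cannot in general tell these cases apart without dependence analysis, which is why an explicitly parallel construct is preferable.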
If we disallow dependencies between loop iterations, we do get a type of computation that can be parallelized easily: the map pattern. In the map pattern, a set of loop indices is generated and an independent computation is performed for every unique index. These computations are not allowed to communicate with one another. At the end of the map, however, there is an implied barrier, and then other operations can depend on the output of the map operation as a whole.
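As a sketch, a map over an index set can be expressed directly with Python's `concurrent.futures` (the helper name `map_pattern` is ours):

```python
from concurrent.futures import ThreadPoolExecutor

def map_pattern(f, n, workers=4):
    """Apply f independently to every index in 0..n-1.

    The iterations may not communicate with one another.  Executor.map
    returns only after every result is available, which plays the role
    of the implied barrier at the end of the map."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, range(n)))

squares = map_pattern(lambda i: i * i, 5)  # → [0, 1, 4, 9, 16]
```

Because the iterations are independent by construction, the runtime is free to execute them in any order or distribution without changing the result.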
Often data access is associated with the map. In particular, the map may be generated by applying a function to an array and replicating that function over every element of the array. In this case the index set is generated from the set of allowable array indices. In the same way, the outputs of a map may be collected into a contiguous output array. This relationship between processing and data input and output can be diagrammed as follows:
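A minimal sketch of this elementwise form, where the index set is derived from the array's indices and results are collected into a contiguous output array (the name `elementwise` is ours):

```python
from concurrent.futures import ThreadPoolExecutor

def elementwise(f, a):
    # The index set is range(len(a)); each task reads exactly a[i] and
    # writes exactly out[i] -- a coherent one-input/one-output access
    # pattern with no sharing between tasks.
    out = [None] * len(a)
    def task(i):
        out[i] = f(a[i])
    with ThreadPoolExecutor() as pool:
        list(pool.map(task, range(len(a))))  # barrier: all writes done here
    return out
```

The disjoint write pattern (task *i* owns slot *i*) is what makes the shared output array safe without locks.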
These forms of data access are very coherent in nature and are ideal targets for optimization. For example, a compiler or parallel programming platform may want to use knowledge of a map's coherence to generate code to prefetch data into a cache and evict data from cache, or to otherwise pre-schedule data movement. In addition, the map can be hierarchically "strip-mined" to exploit multiple levels of parallelism and multiple parallelism mechanisms. Parallel mechanisms in processors include superscalar instruction issue, pipelining, and parallel memory/compute operations, in addition to the more obvious mechanisms such as multiple cores and vector instructions. Starting from the very simple parallelism structure represented by the map, it is possible for an automated system to generate an efficient implementation that can exploit multiple mechanisms at the same time. For example, the above map may be implemented on a dual-core processor with 4-way vector units and 4-cycle pipeline latency as follows:
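The two-level decomposition can be sketched in serial Python; this is only a model of what a compiler or runtime would do, with hypothetical names and parameters (`strip_mined_map`, `num_cores`, `vec_width`):

```python
def strip_mined_map(f, a, num_cores=2, vec_width=4):
    """Hierarchical strip-mining of a map (a sketch, not a real backend).

    The outer loop over chunks models distribution across cores; the
    inner loop over vec_width-element strips models a vector instruction
    operating on a whole strip at once."""
    n = len(a)
    out = [None] * n
    chunk = (n + num_cores - 1) // num_cores
    for core in range(num_cores):              # outer level: one chunk per core
        lo, hi = core * chunk, min((core + 1) * chunk, n)
        for i in range(lo, hi, vec_width):     # inner level: vector strips
            j = min(i + vec_width, hi)
            out[i:j] = [f(x) for x in a[i:j]]  # one "vector op" per strip
    return out
```

Because the original map guarantees independence, this restructuring is purely mechanical: any legal partition of the index set yields the same result.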
Iteration gives rise to other patterns besides map. It is possible to parallelize loops in certain cases even when there are dependencies between loop bodies. However, the strong assertion of independence between iterations provided by the map gives the most opportunity for parallel execution.
It is also common to see the map operation combined with other operations, such as random reads or writes. The input and output data access patterns can likewise be elaborated to allow, for example, strided access, multiple inputs and outputs, and neighborhood operations.
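Two of these elaborations can be sketched briefly; the helper names `map2` and `strided_map` are ours:

```python
def map2(f, a, b):
    # Map with multiple inputs: iteration i reads a[i] and b[i],
    # still with no communication between iterations.
    return [f(x, y) for x, y in zip(a, b)]

def strided_map(f, a, stride):
    # Map over a strided view of the input: the index set is
    # 0, stride, 2*stride, ... rather than every array index.
    return [f(a[i]) for i in range(0, len(a), stride)]
```

In each case the defining property of the map is preserved: every iteration touches only its own inputs and its own output slot.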