Pipelines/Streams offer easy parallel programming
Pipeline and stream programming often get high marks from users for being intuitive. They also come close to being a magic bullet for making parallel programming easy and successful.
Could it really be this easy: write your program as a pipeline of stages, and you get a parallel program almost for free?
But you have to buy into one more thing: no global side-effects. In some systems (like TBB) enforcing this is up to you, while in others (like CCC) the programming model forces you to avoid side-effects entirely.
The "magic" that makes pipelines so friendly to parallel programming comes from three things:
- to run in parallel you need independent work: if you pipeline your work (streaming data) and have no interdependencies other than the data streams themselves (no global side-effects), you get exactly that - independent work to run in parallel
- the pipeline stages themselves can be broken up to run in parallel, either through data parallelism or through a pipeline of their own (so nested parallelism is important)
- the very sticky problem of data placement, which will only grow more severe on future architectures, is solved implicitly (the migration of data is very clear and very clean)
Together, these make effective parallelism highly likely.
Point #2 is subtle but important. Scaling a pipeline might seem to be limited to the number of stages, so a 10-stage pipeline would seem to leave 6 processors idle on a 16-processor machine. In practice, this does not happen - because individual pipeline stages have parallelism within them. In some programming systems this is exploited automatically.
Judge this by the "big three problems of parallelism" (scaling, debugging, maintaining) and you find that pipelined/streamed programs are more likely to scale well and to be portable to future architectures. As for debugging - unless you introduce cycles, you can be deadlock-free and race-free.
So, pipeline/stream programming can save the world - or at least make parallel programming easier.
Intel Threading Building Blocks offers pipeline constructs for just this reason - and is available for free. TBB doesn't force your program to be easy to debug - it makes the "no global side-effects" discipline your problem. On the other hand, TBB makes building highly scalable pipelines in C++ very easy, including solving the "point 2" item above - it slips into a C++ program without a great deal of learning necessary.
Whatif.intel.com offers Intel Concurrent Collections for C/C++, which embodies many of the concepts here - and is available to experiment with for free. CCC takes more effort to learn, but in exchange guarantees a deterministic program. CCC does not yet solve the "point 2" item above as well as TBB does, but CCC is new and this may be feedback we see from users.
Parallel programming is easy when you write in terms of pipelines.