BubbleSort Won't Leave a Bathtub Ring
I thought this was going to be my third and final post about parallelizing Bubblesort. However, my last idea on the subject went a bit long in the explanation. Therefore, some setup here and code in the next post.
Up to this point I've been describing a task parallel decomposition scheme. Each pass through the data is a task. If there are multiple threads executing different tasks on the same data, we need to assure that those tasks don't overlap in the data they are considering for exchange.
To accomplish this I set up virtual zones in the data that were protected by requiring a thread to hold a lock object associated to the zone before that thread could proceed to examine and exchange items in a zone. Thus, if a thread is currently within a zone that a trailing thread desires to enter, the coveting thread must wait for the preceding thread to finish data manipulations within the zone and release the lock object for that zone.
Besides task decomposition, there is another general approach to parallel algorithms: data decomposition. That is, rather than consider how to divide up the independent tasks than can be run independently to solve the problem, think about dividing the large data set into independent subsets that can be computed on in parallel. For example, to forecast tomorrow's weather for your local area, you can divide that area up into smaller subregions. Then, starting with the current conditions in each region, you can run the prediction algorithm on each region. Each region's computation will be (almost) independent of the computations in the other regions. (I say "almost" since weather at the boundaries of a region, coming from another region, tends to influence the upcoming weather in the target region. As the simulation progresses, wind can blow rain or snow or heat across the region boundaries. There will need to be some coordination of border conditions as the computation proceeds.) The more regions used to divide up the area, the more parallelism can be brought to computing a solution, but the less work there will be within each region and the more synchronization needed to keep the computations across region boundaries coordinated.
Many problems have both a task decomposition and a data decomposition. The code may turn out to be the same, but the mindset that the programmer uses to describe the parallel solution will be different. One of my favorite examples to illustrate this dichotomy is the grading of exams in a large university class with hundreds of students. A cadre of student teaching assistants does the actual grading in these situations. A task parallel solution would be to assign each assistant's grading duty based on the type of question asked. One student will grade the True-False questions, another will grade the short answers section, another might do the essay questions, and so on.

