How Not to Pop Your Bubblesort with Threads
Looking at the code above should get your concurrency-sense tingling. In the midst of execution of the
j is at the last element of a data zone (
j == releasePoint), won't that thread be accessing (and maybe updating) an item from within the next zone (
A[j+1])? Is there a chance that this data race could cause a problem? (When I analyzed this code with the Intel Parallel Inspector XE tool, it pointed the code out as a data race.) Using the bookshelf on the submarine analogy of my last post, this would be like Duncan reaching the end of his current zone, opening up the hatch to the next, bumping the preceding character (Banquo, who has decided to swap the first two books in his zone, but not done so, yet), doing the comparison, and possibly swapping the books before or while Banquo exchanges book positions. The result will be a data race if both Duncan and Banquo decide to swap items. Could this lead to incorrect results or a livelock?
What if I can show that it doesn't? I see three cases that need to be addressed. The first two are two sides of the same coin and the third looks bad. For concreteness, I'm going to use four contiguous elements from an integer array with a zone boundary at the second element, say index
In all three cases, Thread
T0 has gone through the first data zone, released the lock, and acquired the lock for the next zone, and is ready to start execution with the local
j = q (or
T1 acquired the lock to the first zone (after the release by
T0) and reached the last comparison within that first zone (
j = p) before
T0 has been allowed to proceed with the first comparison in the second zone. Thus, both threads are poised to execute the
if-test at the start of the
for loop body. The overlapping element is in the
A[q] element and it is this element that I need to show has no data race.
In Case 1 I will have the four array elements holding 8, 10, 12, and 7. The overlapping element,
A[q], in this case contains 12.
T0 will determine that the
A[r] elements must be swapped and
T1 will not need to swap
A[q]. Thus, there is no data race on
T1, after finding no need to swap, will wait for
T0 to relinquish the lock for the second zone before it will touch
In Case 2, I use the four array elements holding 8, 12, 7, and 10. The overlapping element,
A[q], in this case contains 7.
T0 will not need to swap
T1 will need to swap
A[q]. Again, there is no data race on
In Case 3, I want to have the four array elements holding 10, 12, 8, and 7 and
T0 will swap the 8 and 7 while
T1 will be swapping the 12 and 8. Things have gotten a lot more interesting than the previous cases. One potential interleaving of the swap instructions after each thread has executed the
if-test and found that a swap must take place could be as follows:
T1: temp = A[p]; T0: temp = A[q]; T1: A[p] = A[q]; T1: A[q] = temp; T0: A[q] = A[r]; T0: A[r] = temp;
If you're following along on your own, you will see that the "12" item has been lost and replaced by a duplicate item "8". Anytime you can potentially "lose" a data item, you've got big problem. This interleaving (and many more like it) looks like a disastrous data race that could scuttle my parallel Bubblesort code worse than a screen door on a submarine.
One fix that I could implement would be, just before the last comparison in a zone, requiring the thread to acquire the lock on the next zone. For access to that last element where the overlap occurs, a thread would need to be holding two locks. This solves the problem illustrated above, but the already-complicated code would just get more convoluted. Plus, the additional overhead won't help performance and the number of threads that can be executing within the array at the same time gets restricted even further (lowering the scalability of the code).
Before you think I've installed a screen door onto my metaphorical submarine, let me tell you that there is a silver lining to this gray cloud. If you do some algorithmic analysis, you will find that Case 3 is impossible. (In fact, Case 2 can never occur in practice, either.) After a thread has gone through a zone, the first element just beyond the boundary item will be greater than all the items in the previous zone. In each pass, Bubblesort pushes items with larger key values into higher indexed slots. Consider, in Case 3, if "12" were the largest key value in the first zone, Thread
T0 would have swapped this value into
A[q] before relinquishing the lock on the first zone to
T1. In fact, after
T0 has run through all elements of a zone and released the lock to that zone,
A[q] will contain the largest value (or a value larger than the largest) from the first zone. Thus, the only valid case I need to consider for interleaving analysis was Case 1, and that case had no data race.
When trying to prove or disprove the correctness of your concurrent algorithms, don't rely solely on an interleaving analysis. You must also consider the properties of the serial algorithm and how those may or may not have been changed due to the transformation of the serial code.