Game of Life — Using Threads and Lists, Part 1
I'm still looking at the Game of Life simulation. In my last post I described the algorithm for focusing processing in only those areas of a grid where cells are volatile; that is, living or dying or being born. A list data structure was used to hold a collection of potentially changing cells and the computation in each generation only considered those locations. For this article and the next two, I want to rework that serial algorithm and describe some approaches to thread it for parallel execution. I can think of three different methods that could be attempted. Unlike my previous parallelization of the original serial code version, the threading ideas on the list-based variation of the Game of Life application will use a task decomposition for the parallelism.
Method the First: Parallelize the List Traversals
In my list-based Game of Life, the code to compute the current generation involved the traversal of the four declared lists. I can consider each of these list traversals as a separate task. These tasks aren't all independent, though. Recall the main processing loop code from the previous post:
while (gencount < GENS) { gencount++; TraverseList(&maylive, Vivify); TraverseList(&maydie, Kill); ClearList(&maylive); ClearList(&maydie); TraverseList(&newlive, AddNeighbors); TraverseList(&newdie, SubtractNeighbors); ClearList(&newlive); ClearList(&newdie); }
Since the first two calls to TraverseList()
take data from different lists (maylive
and maydie
) and update different lists (newlive
and newdie
, respectively), there are no obvious dependencies and these two calls can be run in parallel with some synchronization either before or after clearing the lists (depending on what you assign to each thread). Plus, you don't need to use a thread-safe list implementation. Any parallelization construct that can generate separate tasks and assign them to threads can be used. For example, creating a new thread in Pthreads for each call in each loop iteration (high overhead), assigning the task to a thread within a pool of threads, Intel Threading Building Blocks tasks, or an OpenMP sections construct, are all possible.
What about duplicate grid points on the two lists involved? This situation was raised at the end of my previous post and I showed that it was a possibility. However, as was pointed out, too, only at most one of the traversals will actually succeed since the state of neighbor counts will have stabilized before these list traversals. The conditions dictate either birthing a new cell or killing a live cell or leaving the situation as is, not a combination of two or more. So this poses no dependency between the two calls.
The second pair of traversals passes through the "might" lists to see if there could be newly living or newly dead cells in the next generation. If I try the same kind of parallelism on the second pair of TraverseList()
calls (treating each call as separate task to be assigned to a thread), I run into problems. While each of these calls gets grid points from different lists, the two functions executed on each element of the list being traversed can potentially update either the maylive
and maydie
lists. Thus, there is the possibility that two threads will both attempt to add a cell to the maylive
list and I need to have some thread-safe list structure.
My List
data structure is defined as
typedef struct list { int count; Cell entry[MAXLIST]; } List;
where a Cell
is pair of integers representing the row and column index of a grid point and MAXLIST
is some upper bound on the size of each list based on the size of the grid being used. With this List
structure, the code for the AddList()
operation can easily be made thread-safe. Since I've implemented my example code on a Windows system, I have employed the InterlockedIncrement
intrinsic function and use the return value as the index in which to write a new element into the array holding List
items.
void AddList(Cell x, List *list) { if (ListFull(list)) Warning("Attempt to insert at the end of a full list\n"); else{ int c = InterlockedIncrement(&list->count); list->entry[c] = x; } }
For other threading methods without the Interlocked intrinsics, any method to enforce mutually exclusive access to updating the count of items on the list and return that count for indexing will be sufficient. (Note that there is no need to require a thread-safe method of reading an element from any one list since only one thread will be taking data from any one list. Thus, no dependency on that front.)
Regardless of the need for thread-safe lists, you will have to implement some form of mutual exclusion for updating the numNeighbors
values that could possibly be incremented and decremented concurrently. I'll have some discussion on how to do that in support of the second parallelization method in the next post.
Now, if you've actually read down this far, you probably have figured out that parallelizing the algorithm this way is not going to be very scalable. Each parallel portion of the code will have at most two threads active at any time and there is a good chance of load imbalance if one list is longer than the other. I don't recommend this approach, but I wanted to describe and debunk an obvious method before getting to something more substantial and satisfying in the next installment.