That second part of the aforementioned algorithm modification seems a bit overly elaborate. It is simple enough to write up an *inorder* traversal function, but it would be so much easier if we wrote out the moves in such a way that the final printing is done by running through the array in index order. Using that idea, the root of the Tower of Hanoi tree goes into the middle array element. The roots of child subtrees will be placed into the middle position of the elements to the left and right of the (parent) root. After this it becomes a bit trickier to figure out where the next level of child nodes must go. For example, the right child of the left child of the root needs to be assigned to the middle position of the subarray between the root and the root's left child.

I was hoping there would be a simple formula to use in computing the index of the child nodes that could be done independently of the computation of index values for all other nodes in the tree. No such luck. However, when I sketched out a 2- and 3-level perfect binary tree and labeled each node with the index value that it must occupy, it was obvious how to compute the indices of child nodes given the index of the parent node and the level of node in the tree. (See if you can figure it out for yourself before you look at the code.) As with the variation using the heap structure on the array described above, we can add a parameter to the `tower()`

function that holds the array index of the current node and change the `level`

parameter to an offset value that can also determine when the recursion is no longer needed.

void tower(char src, char dest, char temp, int idx, int offset) { if (offset > 0) { cilk_spawn tower(src, temp, dest, idx-offset, offset/2); plan[idx][0] = src; plan[idx][1] = dest; cilk_spawn tower(temp, dest, src, idx+offset, offset/2); cilk_sync; } else { plan[idx][0] = src; plan[idx][1] = dest; } }

The initial value for `idx`

is set to `2**(numDisks – 1)`

and the starting value for offset is half of that. Obviously, `plan`

is a shared 2-dimesional array that is used to hold the peg notations for each move that corresponds to the node of the Tower of Hanoi tree. Once the computation is complete, because we populated the `plan`

array with an *inorder* traversal indexing scheme, we simply print out the moves starting from `plan[1][*]`

. To me, this is a more intuitive way to save the moves of the Tower of Hanoi solution than to do the *inorder* traversal of the saved results.

Is the parallel version faster than the serial version? Is there some number of disks where the parallel version takes less time than the serial execution? I don't know. I didn't run timings on the two versions, but then relative performance wasn't the point of this exercise. The point was to show that there are some computations whose results may appear to require exclusively serial execution. However, sometimes those seemingly serial computations may be coded with parallel algorithms if you can figure out how to present the final results in the proper order.

You may not need something as complex as a binary tree stuffed into an array. More likely, you will encounter situations where several linearly ordered tasks can be executing in parallel with the requirement that the results be "output" in the original linear order. A queue (with random access allowed for threads adding elements) can be used with a dedicated thread pulling out completed results in the required order. I guess what I've been trying to say in all of this is to give some thought to how ordered output could be organized by multiple tasks before you dismiss a computation as being strictly sequential.