Since this is a recursive algorithm and I tend to post material about parallel algorithms, I asked myself the question: "Can the Tower of Hanoi puzzle be solved in parallel?" If I'm physically moving pegs, then the answer is "No." Even with a myriad of multi-tasking monks making many masterly moves, the 64-disk puzzle cannot have more than one disk moved at a time. Even if I'm just printing out instructions, the order of printing is going to be a serial constraint to get the correct answer. Undaunted, I used Intel Cilk Plus and simply prefixed the recursive calls to `tower()`

with the `cilk_spawn`

keyword. The first output I got from running the parallel code on a two-core (four-thread) system solving the 3-disk puzzle was:

Moving 3 pegs from A to C Move from A to C Move from A to C Move from B to A Move from B to C Move from A to C Move from A to B Move from C to B

It's easy to determine that this is an erroneous output. The first two moves violate the disk order constraint, the third move is impossible since no disk has been previously moved to peg `B`

, and the final move doesn't put the last disk moved on the final destination peg. So, there are some obvious ordering issues between tasks and threads. Solving those issues with some kind of synchronization will just serialize the whole execution and the total time would be longer than the serial time due to all the overhead for spawning and managing tasks.

What if I knew or could compute, in advance, where a print statement was going to fall within the overall output stream? In that case, I could simply store the puzzle solution steps in an array at the proper position to be printed when computation has completed or when all elements in lower indexed positions have been filled and output. Luckily, with the Tower of Hanoi, I can compute where each and every executed output statement will fall within the entire output results; and I can do that computation independently of any other computation or other output.

As with all other recursive examples I've written about, you can visualize the solution moves of the Tower of Hanoi organized as a tree. It will be a perfect binary tree where each node is a move of one disk from a source peg to a destination peg. The left child of a non-leaf node is the tree solving the first recursive call, while the right child is the tree solving the second recursive call. Now, with this binary tree in your mind's eye, what algorithm does the `tower()`

code above suggest? Did you think of *inorder traversal* (i.e., process left subtree, process node, process right subtree)?

Do we know the minimum number nodes needed in the Tower of Hanoi tree for a given number of disks, `D`

? Yes, there will be (2**`D`

) – 1. (See Section 1.1 of *Concrete Mathematics* by Graham, Knuth, and Patashnik for a derivation of this limit.) For 2 disks, there will be 3 moves; for 3 disks, we have 7 moves; for 64 disks, it works out to 18,446,744,073,709,551,615 moves. (Note: if Lucas's monks could move 1 disk per second, they will need almost 5,845,420,461 centuries to complete the task. Regardless of when you think the universe started, I'd say we have quite a few more years. Take that Mayan calendar!) Thus, to hold all nodes of the solution tree, we simply allocate an array of size 2**`D`

.

How can you encode a binary tree (a 2D object) in an array (a 1D object)? One way is to use a scheme analogous to the heap data structure. The data in the heap is held in an array as a complete binary tree (with the heap property enforced at each node on the tree). The root of the tree is in index position [1]. The left child of the node at index [n] will be in the [2n] indexed element and the right child will be in the [2n+1] indexed element. If we add a parameter to the `tower()`

function for the index of the current node of the Tower of Hanoi tree, we can use that to store the move of that node rather than print out a message. The indices of the child trees are easily calculated and would be used as parameters to the recursive calls and tasks generated to execute those calls. Once all the recursive tasks have concluded, simply do a serial *inorder* traversal of the tree held in the array, printing out the moves, to complete the computation.