Dr. Dobb's | The Byzantine Generals Problem

The Byzantine Generals Problem is an agreement protocol that's built around an imaginary General who makes a decision to attack or retreat, and who must communicate his decision to his lieutenants.

The Byzantine General's problem is one of many in the field of agreement protocols. In 1982, Leslie Lamport described this problem in a paper written with Marshall Pease and Robert Shostak. Lamport framed the paper around a story problem after observing what he felt was an inordinate amount of attention received by Dijkstra's Dining Philosopher problem.

This problem is built around an imaginary General who makes a decision to attack or retreat, and must communicate the decision to his lieutenants. A given number of these actors are traitors (possibly including the General). Traitors cannot be relied upon to properly communicate orders; worse yet, they may actively alter messages in an attempt to subvert the process.

The generals are collectively known as processes. The general who initiates the order is the source process, and the orders sent to the other processes are messages. Traitorous generals and lieutenants are faulty processes, and loyal generals and lieutenants are correct processes. The order to retreat or attack is a message with a 1 or 0.

In general, a solution to agreement problems must pass three tests: termination, agreement, and validity. As applied to the Byzantine General's problem, these three tests are:

One side effect of this is that if the source process is faulty, all other processes still have to agree on the same value. It doesn't matter what value they agree on, they simply all have to agree. So if the General is subversive, all lieutenants still have to come to a common, unanimous decision.

Difficulties

This agreement problem doesn't lend itself to an easy solution. Imagine, for example, that the source process is the only faulty process. It tells half the processes that the value of their order is 0, and the other half that their value is 1.

After receiving the order from the source process, the remaining processes have to agree on a value that they will all decide on. The processes could quickly poll one another to see what value they received from the source process.

In this scenario, imagine the decision algorithm of a process that receives an initial message of 0 from the source process, but sees that one of the other processes says that the correct value is 1. Given the conflict, the process knows that either the source process is faulty, having given different values to two different peers, or the peer is faulty, and is lying about the value it received from the source process.

It's fine to reach the conclusion that someone is lying, but making a final decision on who is the traitor seems to be an insurmountable problem. And in fact, it can be proven that it is impossible to decide in some cases. The classic example used to show this is when there are only three processes: One source process and two peer processes.

In the configurations in Figures 1 and 2, the peer processes attempt to reach consensus by sending each other their proposed value after receiving it from the source process. In Figure 1, the source process (P₁) is faulty, sending two different values to the peers. In Figure 2, P₃ is faulty, sending an incorrect value to the peer.

You can see the difficulty P₂ faces in this situation. Regardless of which configuration it is in, the incoming data is the same. P₂ has no way to distinguish between the two configurations, and no way to know which of the two other processes to trust.

This situation doesn't necessarily get better just by throwing more nonfaulty processes at the problem. A naïve algorithm (as in Figures 1 and 2) would have each process tell every other process what it received from P₁. A process would decide the correct decision by simple majority.

It's relatively easy to show that, regardless of how many processes are in the system. A subversive source process with one collaborator can cause half the processes to choose to attack, and half the processes to retreat, leading to maximum confusion.

The Lamport, Pease, and Shostak Algorithm

In 1982, Lamport, Pease, and Shostak published a straightforward solution to this problem (research.microsoft.com/users/ lamport/pubs/byz.pdf). The algorithm assumes that there are n processes, with m faulty processes, where n>3m. Thus, for a scenario such as that in Figures 1 and 2 with one faulty process, there would have to be a minimum of four processes in the system to come to agreement. (For purposes here, n refers to the count of processes, and m to the number of faulty processes.)

The definition of the algorithm in the original paper is short and succinct, but can be confusing for programmers who don't have experience with distributed algorithms.

Lamport's algorithm is a recursive definition, with a base case for m=0, and a recursive step for m>0:

Lamport's Algorithm Definition

To most programmers, this is going to look like a conventional recursive function definition. However, it doesn't quite fit into the conventional recursive function mold you learned when studying the example of factorial(n).

Lamport's algorithm actually works in two stages. In the first step, the processes iterate through m+1 rounds of messages. In the second stage of the algorithm, each process takes all the information it has been given and uses it to come up with its decision.

The First Stage

The first stage is simply one of data gathering. The algorithm defines m+1 rounds of messaging between all the processes.

In Round 0, the General sends the order to all of its lieutenants. Having completed his work, the General now retires and stands by waiting for the remaining work to complete. Nobody sends any additional messages to the General, and the General won't send any more messages.

In each of the remaining rounds, each lieutenant composes a batch of messages, each of which is a tuple containing a value and a path. The value is simply a 1 or a 0. The path is a string of process IDs, <ID₁, ID₂,...,ID_n>. What the path means in this context is that in Round N, P_ID1 is saying that it was told in Round N-1 that P_IDN-1 was told by P that the command value was v. (This is much like the classic party game in which a message is whispered from ear to ear through a chain of players, becoming slightly mangled along the way.) No path can contain a cycle. In other words, if ID₁ is 1, no other ID in the string of process IDs will be a 1.

The message definition is easy in Round 1. Each process broadcasts a message to all the other processes, including itself, but excluding the General, with the value it received from the General and its own process ID.

In subsequent rounds, things get more complicated. Each process takes all the messages it received from the previous round, appends its process ID where allowed, and sends those messages to all other processes, including itself. (The "where allowed" just means that the process skips any messages where adding its process ID to the list would create a cycle in the string of process IDs.)

For example, suppose that in Round 0 that P₁, a faulty general, told P₂, P₃, and P₄ that the command value was 0, and told P₅, P₆, and P₇ that the command value was 1. In Round 1, the messages in Table 1 would be sent.

Sender = P₂		Sender = P₃		Sender = P₄		Sender = P₅		Sender = P₆		Sender = P₇
Dest	Msg	Dest	Msg	Dest	Msg	Dest	Msg	Dest	Msg	Dest	Msg
P₂	{0,12}	P₂	{0,13}	P₂	{0,14}	P₂	{1,15}	P₂	{1,16}	P₂	{1,17}
P₃	{0,12}	P₃	{0,13}	P₃	{0,14}	P₃	{1,15}	P₃	{1,16}	P₃	{1,17}
P₄	{0,12}	P₄	{0,13}	P₄	{0,14}	P₄	{1,15}	P₄	{1,16}	P₄	{1,17}
P₅	{0,12}	P₅	{0,13}	P₅	{0,14}	P₅	{1,15}	P₅	{1,16}	P₅	{1,17}
P₆	{0,12}	P₆	{0,13}	P₆	{0,14}	P₆	{1,15}	P₆	{1,16}	P₆	{1,17}
P₇	{0,12}	P₇	{0,13}	P₇	{0,14}	P₇	{1,15}	P₇	{1,16}	P₇	{1,17}

Table 1: Messages sent by all six lieutenant processes in Round 1.

Things get more complicated in the second round. From the previous rule, we know that each process now has six values that it received in the previous round—one message from each of the three processes—and it needs to send each of those messages to all of the other processes, which might mean each process would send 36 messages out.

In Table 1, I showed the messages being sent to all six processes, which is redundant because the same messages are broadcast to all processes. For Round 2, I just show the set of messages that each process sends to all of its neighbors.

The six messages that P₂ received in Round 1 were {0,12}, {0,13}, {0,14}, {1,15}, {1,16}, and {1,17}. According to the earlier definition, P₂ will append its process ID to the path and forward each resulting message to all other processes. The possible messages it could broadcast in Round 2 are {0,122}, {0,132}, {0,142}, {1,152}, {1,162}, and {1,172}. The first message, {1,122}, contains a cycle in the path value of the tuple, so it is tossed out, leaving five messages to be sent to all processes.

The first message that P₂ is sending in Round 2, {0,132}, is equivalent to saying, "P₂ is telling you that in round 1 P₃ told it that in round 0 that P₁ (the General) told it that the value was 0." The five messages shown in P₂'s column in Table 2 are sent to all six lieutenant processes, including itself.

Sender = P₂	Sender = P₃	Sender = P₄	Sender = P₅	Sender = P₆	Sender = P₇
{0,132}	{0,123}	{0,124}	{0,125}	{0,126}	{0,127}
{0,142}	{0,143}	{0,134}	{0,135}	{0,136}	{0,137}
{1,152}	{1,153}	{1,154}	{0,145}	{0,146}	{0,147}
{1,162}	{1,163}	{1,164}	{1,165}	{1,156}	{1,157}
{1,172}	{1,173}	{1,174}	{1,175}	{1,176}	{1,167}

Table 2: Messages sent by all six processes in Round 2.

It's easy to see that as the number of processes increases, the number of messages being exchanged starts to go up rapidly. If there are N processes, each process sends N-1 messages in Round 1, then (N-1)*(N-2) in Round 2, and (N-1)*(N-2)*(N-3) in Round 3. That can add up to a lot of messages in a big system.

The Second Stage

While sending messages in each round, processes are also accumulating incoming messages. The messages are stored in a tree format, with each round of messages occupying one rank of the tree. Figure 3 shows the layout of the tree for a simple configuration with six processes, one of which can be faulty. Because m=1, there are just two rounds of messaging: The first, in which the general sends a value to each lieutenant process, and a second, in which each process broadcasts its value to all the other processes. Two rounds of messaging are equivalent to two ranks in the tree.

In Figure 3, there are six processes, and the General (P₁) is faulty—sending a 1 to the first three lieutenants and 0 to the last two. The subsequent round of messaging results in P₂ having an information tree that looks just like that in Figure 3. (Because only the General is faulty, in this case all other processes will have an identical tree.)

Once a process has completed building its tree, it is ready to decide on a value. It does this by working its way up from the leaves of the tree, calculating the majority value at each rank, and assigning it to the rank above it. The output value at each level is the third item in the data structure attached to each node, and those values are all undefined during the information gathering stage.

In Figure 3, step 1 of the process assigns the initial values to the leaf nodes. In the next step, the majority value of {1,1,1,0,0} is evaluated and returns a value of 1, which is assigned to the output value in rank 0. Because that is the top rank, the process is done, and P₁ decides on a value of 1.

Every lieutenant value in a given exercise will have the same paths for all its nodes, and in this case, because only the General is faulty, we know that all lieutenants will have the same input values on all its leaves. As a result, all processes will agree on the same value, 1, which fulfills the agreement property.

A More Complicated Example

Getting a good understanding of the algorithm really requires walking through an example that has at least three ranks. Let's consider an example with n=7 and m=2. We'll continue with the convention that the General is P₁, and instead of having a faulty general, we'll have P₆ and P₇ be faulty processes. After the initial three rounds of information exchange, each process has the three-ranked tree in Figure 4.

The important thing to note in these trees is that I've inserted the value X for the input values of any input value that comes from the two faulty processes. We don't know what P₆ and P₇ might send in any given round, so in general, we'll try to evaluate this without pinning the result down.

You'll see that at rank 1, the values from path 17 and 16 are both set to X. In the first round, the two faulty processes communicated possibly false values to all other processes, and may have arbitrarily changed the values sent to different processes to skew the results.

As a result of those bad values in rank 1, we see their frequent occurrence in rank 2. In addition, there are additional bad values in rank 2 that resulted from other messages from the faulty processes.

All in all, at the leaf nodes, we have 18 deceptive values at the leaf nodes, and only 12 accurate messages that trace their way all the way back to the general through nothing but correct processes. Obviously, if we just voted on the majority of the messages we had received, we would be susceptible to falling for the wrong value.

Fortunately, the layout of the tree guarantees that we will actually get a correct value. In Figure 4, the rollup of the output values hasn't occurred yet, so every node has a question mark in the output value. In Figure 5, the output values are shown. The leaf rank has the output values set to the input values, with X used to indicate unknown values from faulty processes.

When the leaf rank is rolled up to the second rank, the nodes with paths 12, 13, 14, and 15 all have clear majority values of 0 for their output values, with 16 and 17 set to X, as their values are uncertain.

The final rollup to the top rank successfully sets the output value to 0, as four of the inputs are set to 0 and only 2 are set to X. Mission accomplished.

The Sample Code

I've included a C++ program (available online at www.ddj.com/code/) that implements this algorithm. It has a Process class used to send/receive messages, as well as to roll up the decision tree. A Traits class defines the number of processes, number of faulty processes, source process, and values the faulty processes sent in various rounds.

To help with visualization, the program outputs the tree for a given process in the format used by the graphviz program called dot (part of the free Graphviz program; www.graphviz.org). You can then use dot to create a picture of the output graph (all the figures in this article were created with dot).

As supplied, the program has n=7 and m=2. These are some good exercises to perform while experimenting with it :