Dr. Dobb's | Algorithm Alley: Unbiasing Random Bits

Algorithm Alley: Unbiasing Random Bits

Efficiently extracting random bits from a possibly biased source of bits.

April 01, 2002
URL:http://www.drdobbs.com/parallel/algorithm-alley-unbiasing-random-bits/184405028

Most computers use a pseudorandom number generator to mimic random numbers. While such pseudorandom numbers are sufficient for many applications, they may not suffice when more secure randomness is needed, such as when you are generating a cryptographic key. For example, years ago the security of the Netscape browser was broken when people found how the seed for the pseudorandom number generator was created.

When better randomness is required, software can be used to obtain randomness from the computer system, including behaviors such as disk movement, user keystrokes, mouse clicks, or sounds recorded by microphones. While many of these physical phenomena can produce random events that are hard to predict, it is not clear how to distill this randomness into something useful, such as random bits that are "0" half the time and "1" the other half. For example, you might try using the number of microseconds between keystrokes to generate random numbers, outputting a 0 if the number of microseconds is even and 1 if the number is odd. While it might be hard to accurately predict the number of microseconds between user keystrokes, it may happen that some users consistently end up with an odd number of microseconds between key strokes 70 percent of the time.

In this article, I'll present a means of efficiently extracting random bits from a possibly biased source of bits. I focus on a simple model of a random source: It generates bits that are 0 with probability p and 1 with probability q=1-p. Here, p should be strictly between 0 and 1. Each bit is independent; that is, whether it is 0 or 1 is not correlated with the value of any of the other bits. What you want is to generate fair bits that are independent and take on values 0 and 1, each with probability 1/2.

For a more physical interpretation, consider this: You have a coin and would like to use it to generate random bits. Unfortunately, the coin may be dented or weighted in some way you do not know, so it might be that it comes up heads with some probability p that does not equal 1/2. Can you use this coin to generate fair bits? Interestingly, you can even if you do not know the value of p. This procedure has practical applications to software that extracts randomness from biased sources, and also leads to some fun mathematics. The approach described here is based on work by Yuval Peres (see "Iterating von Neumann's Procedure for Extracting Random Bits," Annals of Statistics, March 1992).

Extracting Single Bits

The first question to consider is how you can use the possibly biased coin to generate a single fair bit. For convenience, assume coin flips as coming up either heads or tails and the bits produced as being 0s and 1s. The key insight you need (which has been attributed to John von Neumann) is to use symmetry. Suppose you flip the coin twice. If the coin lands heads and then tails, you should output a 0. This happens with probability pq. If instead the coin lands tails and then heads, you should output a 1. This happens with probability qp=pq. In the case where the coin provides two heads or two tails, you simply start over. Since the probability you produce a 0 or 1 is the same for each pair of flips, you must be generating fair bits. This procedure does not even need to know the value of p.

Listing One is the process just described. The procedure looks at consecutive pairs of flips and determines if they yield a fair bit. The variable NumFlips represents the number of biased flips available.

Listing One

Function ExtractBits ( Flips[0,NumFlips-1] )
    for (j = 0; j < (NumFlips-1)/2; j++) {
        if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Tails ) print 0;
        if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Heads ) print 1;
    }
}

While this function provides a good first step, it doesn't seem efficient. The function discards pairs of flips when there are two heads (probability p²) or two tails (probability q²=(1-p)²). Using calculus or a graphing calculator reveals that p²+(1-p)² achieves its minimum value of 1/2 when p=1/2. Hence, no matter what the value of p is, the function throws away a pair of flips at least half of the time.

Suppose you define a function B(p) to represent the average number of fair bits per coin flip when the coin lands heads with probability p. You should note that 0B(p)1; you can't get more than one fair bit out of even a fair coin! Also, B(p) is meant to represent a long-term average; it ignores issues such as when you have an odd number of coin flips, the last one is useless under this scheme. For every two flips, you get a fair bit with probability 2pq, so B(p)=pq. When p=q=1/2, so that my coin is fair and I could conceivably extract one full fair bit per flip, B(p) is just 1/4.

Making More Use of Symmetry

A better approach also uses symmetry beyond pairs of flips. For example, suppose you flip two heads followed by two tails. In the original extraction scheme, you obtain no fair bits. But if you decide that two heads followed by two tails produces a 0, while two tails followed by two heads produces a 1, then you maintain symmetry while increasing the chances of producing a fair bit.

There is a nice way to visualize how to do this. Consider the original sequence of flips. You build up a new sequence of flips in the following way: Whenever you get a pair of flips that are the same in the original sequence of flips, you introduce a new flip of that type into the new sequence; see, for example, Table 1.

Table 1: Generating a new sequence.

Whenever a pair of flips is heads-tails or tails-heads, you generate a fair bit using von Neumann's approach. Whenever a pair is heads-heads or tails-tails, add a new coin flip to the new sequence. After you finish with the original sequence of flips, turn to the new sequence of flips to get more fair bits. You append the fair bits from the new sequence to those produced by the original sequence. In Table 1, the final output would be 010101. The new sequence looks for the symmetry between the sequences heads-heads-tails-tails and tails-tails-heads-heads.

There is no reason to stop with just a single new sequence. You can use the new sequence to generate another new sequence, recursively. For example, suppose you change the previous example to Table 2. In this example, the second level produces no extra fair bits, but if you generate a further new sequence recursively, you obtain one more fair bit.

Table 2: Recursively generating bits.

Listing Two implements the recursive variation. You can again define a function B(p) to represent the average number of fair bits you get per coin flip when the coin lands heads with probability p. For every two flips, you again get a fair bit with probability 2pq; this adds pq fair bits per coin flip, on average. If you don't get a fair bit, you get a new flip. Recall the coin gave two heads with probability p²and two tails with probability q², so on average you get (p²+q²)/2 additional recursive coin flips per original coin flip. Also, each coin flip at the new level is heads with probability p²/(p²+q²) and tails with probability q²/(p²+q²). (These values sum to 1, and preserve the proper ratio between two heads and two tails.)

Listing Two

Function ExtractBits ( Flips[0,NumFlips-1] )
    NumNewFlips = 0;
    for (j = 0; j < (n-1)/2; j++) {
        if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Tails ) print 0;
        if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Heads ) print 1;
        if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Heads) {
             NewFlips[NumNewFlips++] = Heads;
        }
        if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Tails) {
             NewFlips[NumNewFlips++] =Tails;
        }
    }
    if (NumNewFlips >= 2) ExtractBits (NewFlips[0,NumNewFlips-1]);
}

So now you can write the appropriate equation, as in >Figure 1(a). While this equation B(p) does not appear to have a simple closed form, you can use it to calculate specific values of B(p) recursively. Moreover, when p=q=1/2, you obtain B(p)=1/3, as shown in Figure 1(b). While this is an improvement for fair coins over the initial scheme, you are still far from the optimal of one fair bit per coin flip when the coin is fair. The recursive extraction removes some of the waste, but there is still more to be done.

Figure 1: Calculating B(p) for a simple recursive scheme.

Making Even More Use of Symmetry

There is another symmetry you can take advantage of. Consider when the coin lands heads-heads-heads-tails and heads-tails-heads-heads. Both sequences produce one fair bit and one flip for the next sequence in the simple recursive scheme. But you have not taken advantage of the order in which these two events happened; in the first sequence, the fair bit was produced second, and in the second sequence, it was produced first. The symmetry between these two situations can yield another fair bit.

An easy way to extract this extra bit is to again use the idea of creating an additional sequence of coin flips. From the original sequence, you can derive a new sequence that gives you a biased coin flip for each consecutive pair of original flips. If the two flips are the same, you get heads; if the two flips are different, you get tails. You then apply von Neumann's scheme to the new sequence as well as the original sequence; see Table 3. You can again glue together the two outputs to obtain 01010010.

Table 3: Introducing additional sequences yields more bits.

Now you can also use the other new sequence described previously. So when you start with an original sequence, use it to derive two further sequences as in Table 4. It turns out that if you glue all the bits together, you get independent and fair bits.

Table 4: Using the two sequences together.

Of course, you can go further by recursively extracting more bits from each sequence. That is, each new sequence should generate two further new sequences, and so on. This recursive construction is easy to code; see Listing Three.

Listing Three

Function ExtractBits ( Flips[0,NumFlips-1] )
    NumNewFlipsA = 0;
    NumNewFlipsB = 0;
    for (j = 0; j < (NumFlips-1)/2; j++) {
        if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Tails ) {
            print 0;
            NewFlipsA[NumNewFlipsA++] = Heads;
        } 
        if ( Flips[2*j] == Tails) and ( Flips[2*j+1] == Heads ) {
            print 1;
            NewFlipsA[NumNewFlipsA++] = Heads;
        }
        if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Heads) {
            NewFlipsB[NumNewFlipsB++] = Heads;
            NewFlipsA[NumNewFlipsA++] = Tails;
        }
        if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Tails) {
            NewFlipsB[NumNewFlipsB++] = Tails;
            NewFlipsA[NumNewFlipsA++] = Tails;
        }
    }
    if (NumNewFlipsA >= 2) ExtractBits (NewFlipsA[0,NumNewFlipsA-1]);
    if (NumNewFlipsB >= 2) ExtractBits (NewFlipsB[0,NumNewFlipsB-1]);
}

All of the procedures to this point have been designed recursively, so the bits from one sequence are output before the bits from the derived sequences are output. Is this important? In fact, you must be somewhat careful to make sure that you do not introduce correlations by ordering the bits in an unusual fashion. The recursive approach is known to be safe, so I recommend sticking with it.

You can again write an equation that determines the long-term average number of bits produced per flip. The equation is similar to the previous case, except now for every two flips, we also get an additional flip in one of the derived sequences. This flip is heads with probability p²+q², since it comes up heads whenever the pair of flips are the same. This results in the equation in Figure 2(a). As shown in Figure 2(b), this equation yields that B(1/2)=1. Now if you flip a fair coin, you expect in the long run to get out one fair bit per flip using this recursive procedure. You are finally doing essentially as good as you could hope for.

Figure 2: Calculating B(p) for the recursive scheme with two new sequences generated per sequence.

This process extracts the maximum number of fair bits possible for every value of p. To prove this requires knowing some information theory. You need to know that the maximum rate at which fair bits can be produced from biased bits is given by the entropy function; see Figure 3(a). The complex looking equation has a straightforward solution; B(p) equals the entropy H(p). You can verify this by plugging B(p)=H(p) into the recurrence. It is easiest to first calculate term by term, and to use the fact that when p+q=1, you have 1-p²-q²=2pq. Suppose B(p)=H(p). Then you first calculate easier expressions for the terms on the right side; see Figures 3(b) and 3(c). Now check that the right side comes out to H(p)=B(p); see Figure 3(d).

Figure 3: Verifying that B(p)=H(p), the entropy function.

Conclusion

Although I have presented some basic psuedocode, you might find it interesting to write more efficient versions on your own. There is a collection of implementations and related links at http://www.ciphergoth.org/software/unbiasing/, and Peres's work has been the subject of discussion on the sci.crypt newsgroup.

Acknowledgment

Michael's work is supported by NSF grant numbers CCR-9983832, CCR-0118701, and CCR-0121154.

Michael is a professor of computer science at Harvard University and can be contacted at [email protected] or http://www.eecs.harvard.edu/~michaelm/.

Apr02: Algorithm Alley

Figure 2: Calculating B(p) for the recursive scheme with two new sequences generated per sequence.

Apr02: Algorithm Alley

Figure 3: Verifying that B(p)=H(p), the entropy function.

Apr02: Algorithm Alley

Table 1: Generating a new sequence.

Apr02: Algorithm Alley

Table 2: Recursively generating bits.

Apr02: Algorithm Alley

Table 3: Introducing additional sequences yields more bits.

Apr02: Algorithm Alley