Channels ▼
RSS

Testing Random Number Generators, Part 2


August 1996/Testing Random Number Generators, Part 2

If you need more than a simple random-number generator, you need more than simple tests to choose a suitable candidate.


Random number generators seem to do the impossible. Without outside intervention, such generators appear to produce arbitrary sequences of numbers on computers, decidedly non-arbitrary machines -- at least when they're working correctly.

The apparent magic of random number generators is due to complex processing of number sequences. As an example, consider one common generator, the linear congruential generator:

x(i+1) = (a*x(i) + c) mod m

For this generator, x(i) is the ith random number, x(i+1) is the next random number, a is a multiplier, c is a constant, and m is the modulus. This generator is often called a Lehmer generator, after its originator, D. H. Lehmer. A sequence of numbers from a Lehmer generator is many things, but random is not one of them. Like all computer generated numbers, it is "pseudorandom" at best, but for convenience sake we will refer to these pseudorandom sequences as "random numbers." Mathematically, the Lehmer generator is an example of a nonlinear difference equation. While we could bring sophisticated mathematics to bear on such equations, there is no need to inflict such complexity upon our readers. The issues underlying the equation are actually simple.

In this article, the conclusion of a two-part series begun in June, we discuss a C program that tests a Lehmer generator over particular values of the multiplier and modulus. This program implements what is called the spectral test, which is the final test in our suite. Our earlier test performed statistical tests, which examined short sequences of random numbers. The spectral test, by contrast, implicitly examines the entire set of possible random numbers that can produced by a Lehmer generator. The spectral test examines the structure of random numbers produced by a generator.

In this article we also discuss another important property of random number generators, the full-cycle lengths, and what we learned from our statistical and spectral tests. Finally, we make a large body of code available on the code disk and via ftp. With this code you'll find documenation that explains how you can add generators to our suite of statistical tests, and how you can add statistical tests to the suite.

The Spectral Test

The name "spectral test" is misleading. It suggests a sound or light spectrum, but has nothing to do with either. The spectral test measures the

uniformity of numbers produced over the full period by a Lehmer generator. Recall that a uniform distribution of numbers is one in which all numbers are equally likely to occur. The spectral test is called a full-cycle test because it examines the entire possible sequence of generated numbers. Our implementation of the test from Knuth ([6] , pp. 89-113) assumes that the generator is capable of producing the maximum possible set of numbers. Another way the spectral test differs from the frequency tests presented in our first article is that it tests for a uniform distribution in more than one dimension.

Figure 1 shows a uniform distribution of numbers in a graph of successive random numbers. The horizontal axis represents the ith value and the vertical axis represents the next value. You can tell that the points are consistent with a uniform distribution by the evenness with which the numbers are spaced out. Because a true uniform distribution is continuous, a theoretical uniform distribution would completely cover the area in the graph. Computers, however, can produce only a finite set of numbers, which implies that the area in the graph cannot be completely covered. This figure represents all pairs of numbers that can be represented using seven-bit numbers, 27=128 different values. In Knuth's terms ([6] , pp. 90-91), this set of values has an "accuracy" of seven bits. The full graph contains quite a few point -- 1282 points, because the next random number can be any of the 128 different values no matter what the current random number. To save space, we show only a part of the graph, to give you an idea what the distribution looks like.

Graphing Lehmer Generators

Consider a simple Lehmer generator with no constant,

x(i+1) = a * x(i) mod 127.

We use a modulus of 127 simply for illustration. The full cycle of this generator is the modulus of 127 minus one, 126. (This assumes the multiplier, a, has no common factors with 126.) This cycle length is too short for even a decent card game, but it's handy for illustrating the issues. There are twelve multipliers less than 127 that have no common factors with 126. For illustration, we present the spectral test results for the best and worst multipliers in two dimensions: multipliers of 53 and 85. Figure 2 shows the distributions resulting from these two multipliers. It uses the same representation scheme as Figure 1.

Figure 2 indicates one major difference between the Lehmer generator and random numbers right away: each random number, x(i), is associated with exactly one value of the next random number, x(i+1). It is less obvious, but each value of the next random number, x(i+1), is also associated with exactly one value of the prior random number, x(i). This is a big difference from Figure 1, where each value of x(i) is associated with all possible values of x(i+1) and vice versa.

Figure 2 also shows a big difference between these two multipliers. The successive values for the multiplier 53 are spread relatively uniformly around the graph. The successive values for the multiplier of 85 lie on three parallel straight lines. (For later reference, the distance between the lines is indicated by the arrow in the figure.) We could also draw lines for the multiplier of 53, but we'd need more such lines to include all the points. The distances between adjacent lines are less and there are fewer holes -- areas with no points -- in the graph.

We can draw similar graphs of successive random numbers in three dimensions. If we do so, we will find that parallel two-dimensional planes can be drawn through the points. The third dimension represents the value x(i+2), that is, the next number generated after x(i+1). We cannot draw graphs for four or higher dimensions, at least not in our universe, but we can calculate the points that would lie in those graphs. The fourth dimension represents the value of x(i+3), the fifth x(i+4), and so on. It is possible to imagine parallel three or higher dimension hyperplanes running through the points in those higher dimensions. In all dimensions, the issues are the same as in two dimensions: the number of planes, the distance between the planes, and the sizes of the holes where there are no computer-generated random numbers.

It might seem that this enormous dependence of successive values means that Lehmer generators are worthless. This is not true. Other random number generators have similar structures, but their structures either are not completely known or are more difficult to characterize mathematically. If you want to pursue this issue, Ripley ([15] , p. 30) presents some graphs similar to Figure 2 but for shift-register generators.

Evaluating the Graphs

Looking at Figure 2, how should we measure the uniformity of the distribution, that is, how should we measure how big holes are in two, three, or more dimensions? There are many ways, but we focus on the distance between adjacent parallel lines that cover all of the points in the graph. The basic idea is to use the distance between adjacent lines as a measure of the size of the holes. We measure the Euclidian distance between adjacent lines, illustrated by the arrow in the right-hand side of Figure 2. Note that the parallel lines in Figure 2 are equidistant. This is true in higher dimensions as well. It is not hard to see that a constant, c, in a Lehmer generator has no effect on this distance because it merely shifts up the lines, so the spectral test will not be affected by the value of the constant.

The spectral test finds the maximum Euclidian distance between adjacent hyperplanes, given a set of parallel hyperplanes that cover all of the points. Compared to other possible measures of the distance, such as the average distance for all possible sets of such hyperplanes, the maximum distance is a conservative measurement.

Since computers represent numbers in a finite number of bits, even an ideal generator implemented on a computer would produce a distribution with "holes." As seen in Figure 1, there is a finite distance between the lines, and this distance is related to the number of bits used to represent numbers in the computer. The "accuracy" of a Lehmer generator can be defined as the inverse of the maximum distance between adjacent hyperplanes, and an equivalent number of bits can be defined as the logarithm to the base 2 of that accuracy. Knuth [6] also describes a "figure of merit," which can be defined as the distance between the hyperplanes adjusted for the modulus, thereby making possible a standard of comparison independent of the modulus.

Knuth ([6] , p. 102) suggests that if a Lehmer generator has figures of merit greater than 0.1 for dimensions two through six, the multiplier "passes" the spectral test. If a Lehmer generator has figures of merit greater than 1.0 for dimensions two through six, the multiplier "passes with flying colors."

Implementing the Spectral Test

The spectral test would not be feasible if the test required a complete set of all random numbers and a search for the maximum distance between all sets of parallel hyperplanes by complete enumeration. Instead, using the method suggested by Dieter [12]

and explained by Knuth [6] , substantial simplifications are possible and our source code uses them. Far too long to be printed in this magazine, the source code for our version of the spectral test is available electronically, as described on page 3.

Figure 3 shows the output from our spectral program, spectral, for the two generators shown in Figure 2. On startup, the user determines the parameters for the test and spectral echoes the parameters. The first part of the listing echoes the number of multipliers used in the test, a value set at one or two by the user. A value of one indicates that the test is for a Lehmer generator and a value of two indicates that the test is for a generalization of the Lehmer generator to

xi+1 = (a*xi + b*xi-1 + c) mod m,


where b is a multiplier on the prior random number.

The primary output of spectral is the accuracy, number of bits, and figure of merit of the generator in various dimensions. The number of dimensions for the computations is set by the user. As Knuth indicates, it is best to compute the indicators of a generator's uniformity for dimensions from two up to the

limit, generally on the order of six. Figure 3 illustrates the usefulness of such computations: the figure of merit for the multiplier of 53 actually is worse than for the multiplier of 85 in five dimensions, even though the ordering obviously is reversed in all other dimensions.

It is apparent from the output that, given the modulus of 127, the multiplier of 53 is quite a bit better than the multiplier of 85 up to four dimensions. The accuracy and equivalent "number of bits" indicate that the multiplier of 53 has planes that are much closer together than does the multiplier of 85. The figures of merit give the same rank ordering because the two generators compared in Figure 3 have the same modulus. Otherwise, they would not necessarily have the same rank ordering.

A Spectral Test on RANDU

We don't have room to discuss spectral test results for all alternative generators. Nonetheless, it is interesting to see how badly the generator known as RANDU,


x(i+1) = (65539*x(i)) mod 231,

performs on the spectral test. In three dimensions, planes that cover all of the points fall on three lines and, naturally, the figure of merit is quite small, actually 1*10-5. The "number of bits" in three dimensions is only 3.4, not substantially more than for the illustrative generator in Figure 3.

The Extended-precision Library

To avoid overflow, spectral does arithmetic with 100-digit floating point numbers. It uses the 100-digit library written by S.L. Moshier. (For a description, see the sidebar, "An Extended Precision Library. ") Complete source code for this library is included with the code for this article.

Estimating Full-Cycle Length

The cycle length is an important characteristic of any generator. The cycle length limits the maximum number of random numbers that can be generated without exact repetition of the sequence. If you are doing something that uses hundreds of billions of random numbers, it will not do to have a generator with the cycle length of generators in some early versions of BASIC, commonly 32,767 (215-1).

We indicate the cycle lengths for the generators included in our suite of tests as in Table 1.

The estimate for the compiler's library generator is based on a modulus of 232, which is the modulus used in recent versions of the Microsoft and Borland compilers for DOS, OS/2, and Windows NT. The cycle length of 230, approximately 1.1*109, is for odd seeds. If the random number generator is seeded with an even number, the cycle length is shorter.

While it might seem like a nice idea to have a program that calculates a generator's cycle length, such a program would be messy and not very helpful. This is because cycle lengths are estimated by applying various theorems; consequently, such a program would be a series of queries to a user followed by application of the known theorems. It is simpler to do this yourself for a particular generator. Fishman and Moore [13]

provide a concise summary of the relevant theorems for Lehmer generators and Knuth [6]

provides a comprehensive discussion. The cycle lengths for the Marsaglia et al generators that postdate Knuth are estimates by the generators' originators.

As you can see, none of the generators in our suite has a cycle length less than a billion, and some cycle lengths are extremely long by any standard. Whether a cycle length is too short can be determined only based on a particular application. Certainly, for an application such as a card game, which might use a few hundred random numbers per play, all of these cycle lengths seem more than adequate. At the other extreme, if you intend to use hundreds of millions of random numbers, you will not want to use a Lehmer generator with a multiplier as small as 231 - 1.

Test Results for Some Generators

The purpose of our programs is to provide a basic set of routines to evaluate random number generators, not to produce a once-and-for-all set of test results. Nonetheless, in the process of writing and testing the statistical programs, we noticed some consistency in the results for the generators currently included in the suite. Though we strongly suggest that you run those tests yourself, we do have impressions about both the generators and the ability of the tests to discriminate between them.

The generator RANDU performs miserably on most of the statistical tests, but not the frequency test, the run test, and the serial correlation test. At least for this generator and probably for other Lehmer generators, these three tests may not be helpful for discriminating between multipliers. We do not infer that these tests are useless, however.

We have noticed a pattern of poor performance of Lehmer generators that use 232 as a modulus. Four generators in our suite use this modulus: Plauger ANSI C, Knuth line 26, RANDU, and the Microsoft C library generator. These generators consistently generate subsequences of pseudorandom numbers that "fail" the serial test and the collision test; that is, random numbers would be quite unlikely to repeatedly occur in sequences of numbers that look like these pseudorandom numbers. The failure on the serial test indicates that, in two dimensions, the pseudorandom numbers are not uniformly distributed. The failure on the collision test indicates that the generators did not produce similar low-order bits with the frequency that would occur for uniformly distributed numbers.

Based on these results, we advise against using generators with 232 as the modulus. While such generators can be very fast because they can use integer overflow on some C compilers, including those from Borland and Microsoft, these generators are not as good overall as others that are easily programmed and portable.

The best generators of the lot, in our opinion, are the combination generator in C by Dwyer (a L'Ecuyer generator with shuffling added) and the two Marsaglia-Zaman generators. The worst of the lot are the Fibonacci sequence and RANDU. These generators were intentionally included as examples of bad generators.

What about the generators that do not fail any of the tests? Of the fifteen generators shown in Table 1, ten of them seem reasonably good based on the tests available. If cycle length does not limit the part of the list pertinent to you, then you may want to decide based on some combination of execution speed and ease of programming and maintaining the code. These criteria depend on your computing environment. You also could add statistical tests from Fishman and Moore [2] and Marsaglia [7] to the suite, and these might help to narrow down the range of acceptable generators.

Conclusion

It is fair to say that much of the literature on random number generators is heavy stuff. It's heavy in two ways. First, it's heavy in the sense that many of the articles require uncommon mathematical skills and can take hours to read and understand. Second, it's heavy in the sense that even a fraction of the articles on the subject would make a big pile.

What's our excuse for adding to the pile? Our addition comes with an important difference: We provide software to go with our words. In part 1 of this article we described statistical tests. The code disk contains source code for the following:

  • Collision Test
  • Coupon-Collector's Test
  • Frequency Test
  • Gap Test
  • Maximum-of-t Test
  • Permutation Test
  • Poker Test
  • Run Test
  • Serial Correlation Test
  • Serial Test (Independent Sequential Pairs)
Except as described in part 1 for the Serial Correlation Test, these tests have been implemented as described by Knuth [6] .

In this article we described a full-cycle test -- the Spectral Test. This test is implemented according to Knuth, too. Source code for the spectral test is also on the code disk. It comes complete with source for the 100-digit arithmetic library. As far as we know, this is the only version of this test that is available in the public domain.

We hope that these articles on testing random number generators, combined with an earlier expository article on Lehmer generators by one of us [1]

have made the issues involved in using random number generators clearer. We have pointed out some pitfalls to avoid. We have given you code for several reasonably good portable generators. And, given you the tools and some of the information necessary to explore generators on your own should you be so inclined.

Acknowledgement

We are indebted to Stephen L. Moshier (moshier@world.std.com), without whose software this project would still be a long way from completion. Every test in this suite uses programs written by Steve. All pattern tests use functions chdtr (chi-square distribution) and KSmirnov (Kolmogorov-Smirnov distribution) which are due to his fine hand. They, in turn, use other functions of his. Our spectral test uses his 100-digit arithmetic package (described in the sidebar) which is a solid piece of work, indeed.

References

[1] Jerry Dwyer, "Quick and Portable Random Number Generators," C/C++ Users Journal, 13(6), June 1995, pp. 33-44.

[2] George S. Fishman and Louis R. Moore, "A Statistical Evaluation of Multiplicative Random Number Generators with Modulus 231-1," Journal of the American Statistical Association, 77 (1982), pp. 129-136.

[3] T.R. Hopkins, "A Revised Algorithm for the Spectral Test," Applied Statistics, 32(3), 1983, pp. 328-335 (in Fortran).

[4] F. James, "A review of pseudorandom number generators," Computer Physics Communications, 60, 1990, pp. 329-344.

[5] Lisa K. Jones and Laurie Seaton (eds.), Math77 Reference Manual, Language Systems Corporation, Sterling, Va., 1994.

[6] Donald E. Knuth, The Art of Computer Programming, Volume 2, Seminumerical Algorithms, 2nd Ed., Addison Wesley, Reading, Pa., 1981.

[7] George Marsaglia, "A Current View of Random Number Generators," Computer Science and Statistics, Proceedings of the Sixteenth Symposium, L. Ballard (ed.), 1985, pp. 3-10.

[8] W.L. Maier, "A Fast Pseudo Random Number Generator," Dr. Dobb's Journal, 176, 1991.

[9] Harald Niederreiter, Random Number Generations and Quasi-Monte Carlo Methods, 63, CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia, Pa., Society for Industrial and Applied Mathematics, 1992.

[10] P.J. Plauger, The Standard C Library, Prentice Hall, Englewood Cliffs, 1992.

[11] Brian Wichmann and David Hill, "Building a Random-Number Generator," BYTE magazine, March, 1987, pp 127-128.

[12] U. Dieter. "How to Calculate Shortest Vectors in a Lattice." Mathematics of Computation,1975.

[13] George S. Fishman and Louis R. Moore III. "An Exhaustive Analysis of Multiplicative Congruential Random Number Generators with Modulus 231-1." SIAM Journal on Scientific and Statistical Computing, January 1986.

[14] S. L. Moshier. Methods and Programs for Mathematical Functions. Ellis Horwood Ltd., Chichester, 1989.

[15] Brian D. Ripley. Stochastic Simulation. John Wiley & Sons, 1987.

Jerry Dwyer is a Professor of Economics at Clemson University and a visiting scholar at the Federal Reserve Bank of Atlanta. He has a Ph.D. from the University of Chicago and has been programming for twenty years. He primarily writes programs for statistical analysis. He can be reached at dwyer@clemson.edu. The views expressed are those of the authors and not necessarily those of the Federal Reserve Bank of Atlanta or the Federal Reserve System.

K.B. Williams received a B.A. in Mathematics from California State University at Sacramento in 1955. He is a veteran of forty years in the scientific applications programming trenches and is the founder of King Baker Enterprises, in Stillwater, Oklahoma. He consults in applied numeric techniques. He can be reached at 802 South Ridge Drive, Stillwater, OK 74074, or via e-mail at Kbwms@aol.com.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video