### Irreversible Subcycle Generators

The irreversible generators are more numerous than the reversibles and they generally provide greater randomness for the same number of processor clock cycles. But their periods are shorter, so when used in 32-bit combination generators, they should be mixed with reversible generators to yield an acceptably large period. Irreversible generators are well-suited for 64-bit processors because their periods are large enough so as not to be an issue, and small enough to be obtainable by exhaustive search.

In Table 6, Per-32 and Per-64 refer to the ranges of periods you can get with 32 bits and 64 bits, respectively.

For 32-bit applications, I recommend only those with the longer periods, which means avoiding RA, RAR, RESDRA, and RERA. ISDLS and IESDLS can have large periods, but they are achieved at the expense of large shift-counts, which remove most of the bits (shifting in zeroes), thereby reducing randomness.

For 64 bits, all these generators are recommended. Note that I haven't tested IESDLS and IADLA with 64 bits yet, so I have no tables for them. A combination of three irreversible generators yields a period of over 2^{100}, which is more than adequate for all uses.

**CMRES: t = x; x = Multiplier*x; x = rotl(x,RL) – t;**

CMRES is a variation of CMR with a final subtraction that makes it irreversible, reducing its period enough to be traversable. That subtraction also boosts its randomness enough that this 64-bit generator by itself (not in a combination) can pass BigCrush — a surprising feat. A list of passing multipliers is shown in the Table 7. However, I recommend using this generator in a combination to get a longer period and to make seeding easier. Based on these test results, I'd say CMRES is the most random of all the subcycle generators I've tried. The combination of two of these subcycle generators in Listing Six passes BigCrush effortlessly:

**Listing Six: A generator created from two irreversible subcycle generators.**

void Seed2Cmres (uint32_t seed) { uint32_t n; uint64_t t; xx = 138563767ULL; for (n = (seed & 0x00ffffu) + 10; n>0; n--) { t = xx; xx = 3188803096312630803ULL * xx; xx = rotl(xx,33) - t; } yy = 2400589211ULL; for (n = (seed>>16) + 10; n>0; n--) { t = yy; yy = 14882990517504201107ULL * yy; yy = rotl(yy,30) - t; } } uint64_t Rand2Cmres () { // Combined period = 2^72.66 uint64_t t; t = xx; xx = 3188803096312630803ULL * xx; xx = rotl(xx,33) - t; t = yy; yy = 14882990517504201107ULL * yy; yy = rotl(yy,30) - t; return xx + yy; }

Although the 32-bit multiplication in most processors is fast, the 64-bit multiply might be slower. You are advised to check this, or run some timing tests. Note that modern Intel desktop processors can perform a 64-bit multiply in only 3 clock cycles [3] when running a 64-bit operating system. On such processors, this combination generator would be among the fastest, as well as have superb randomness. Table 7 shows some multipliers for CMRES that pass BigCrush, and also have large periods:

**RA: x+=rotl(x,L));
RS: x-=rotl(x,L));
RES: x=rotl(x,L)-x);**

RA was the first generator I invented, and while it works as well as the others, its period is among the shortest of all the irreversibles. RS and RES were the next generators I developed. All three are worthwhile because they're very fast (only two clock cycles of math) and provide good randomness.

**RAR: x = x + rotl(x,R1); x = rotl(x,R2);
RSR: x = x - rotl(x,R1); x = rotl(x,R2);
RESR: x = rotl(x,R1) - x; x = rotl(x,R2);**

RAR, RSR, and RESR are excellent generators for both 32 bits and 64 bits. The second rotation in them improves mixing of bits at a cost of only one clock cycle. In fact, these subcycle generators are good enough that the following 32-bit combination generator, which only uses one RSR and one RESR, passes the demanding BigCrush test-suite:

void SeedRsrResr (uint32_t seed) { uint32_t n; x = 542; y = 5981; for (n=(seed>>16 )+20; n>0; n--) { x -= rotl(x,11); x = rotl(x,27); } for (n=(seed&0xffff)+20; n>0; n--) { y = rotl(y,21) - y; y = rotl(y,20); } } uint32_t RandRsrResr () { // Combined period = 2^41.89 x -= rotl(x,11); x = rotl(x,27); // RSR: period=2847384=2^3*3^2*71*557 y = rotl(y,21) - y; y = rotl(y,20); // RESR:period=1435175=5^2*7*59*139 return x ^ y; }

This combination generator uses only 7 clock cycles of arithmetic and is superb for implementation in hardware due to the few gates it would use and its parallelizability. It provides both extremely high speed and good quality, but it has a relatively short period of 2^{41.89}. To boost its period, you could add a CERS generator to it. Or it could be implemented using 64 bits as in the following code:

void SeedRsrResr64 (uint32_t seed) { uint32_t n; xx = 981906ULL; yy = 590009ULL; for (n=(seed>>16 )+20; n>0; n--) { xx -= rotl(xx,11); xx = rotl(xx,27); } for (n=(seed&0xffff)+20; n>0; n--) { yy = rotl(yy,21) - yy; yy = rotl(yy,20); } } uint64_t RandRsrResr64 () { // Combined period = 2^85.01 xx -= rotl(xx,21); xx = rotl(xx,36); // RSR: period=3931871863377=3*733*1223*1462001 yy = rotl(yy,43) - yy; yy = rotl(yy,27); // RESR: period=9925159703554=2*53*93633582109 return xx ^ yy; }

On a 64-bit processor, this still uses only 7 clock cycles of math. But now it has a large period, and it still passes BigCrush. RAR, RSR, and RESR are excellent subcycle generators.

**RESDRA: x = rotl(x,R1) - x; x = x + rotl(x,R2);
RSDRES: x = x - rotl(x,R1); x = rotl(x,R2) – x;**

RESDRA and RSDRES are worthy generators for 64 bits, and should be a little more random than even RAR/RSR/RESR. You've already seen RESDRA in ` RandRersResrResdra`

, and RSDRES is a similar fine performer. But they both consume 4 clock cycles of arithmetic, so avoid them if you need top speed.

**RERA: x = rotl(x,R1) + rotl(x,R2);**

RERA is simply the sum of two rotations, and it combines well with XORs. Try these three pairs of left rotations in a 32-bit combination generator: (25,27), (19,29), and (5,23). Initialize them to 1, 1, and 2 respectively, and seed with a loop. That combination generator has a period of 2^{49.14}, and passes all tests. But RERA is not as powerful as the similar RERS.

**RERS: x = rotl(x,R1) – rot(x,R2);**

RERS is the difference of two rotations, and includes the RS and RES generators as special cases when a rotation is zero. So RERS encompasses a useful family of generators, and you've seen several in combination generators presented earlier, such as `Rand2RersRs`

, which uses three of them. It's a good performer, but probably not quite as random as RAR/RSR/RESR. It's prudent to mix types of subcycle generators within a combination generator, and RERS is a good choice for mixing because its last operation is not a rotation, so it's fundamentally a different kind of generator than RAR/RSR/RESR — perfect for adding to a mix. Table 8 presents a list of selected 64-bit generators of the types just described:

An annoying pattern you'll notice in this list is that the generators with the largest periods usually use a rotation of 21 or 43, reducing your ability to use a variety of rotations in the combination generator. This pattern is not true of RERS, making it that much more useful in a combination generator. The complete cycle tables for most of these generators took about a week each to compute due to their large maximum periods of about 2^{43}.

A few more generators use shifts instead of rotations. These are ISDLS, IESDLS, and IADLA, plus a couple more permutations obtained by changing the order of the add and subtract operators. They're fine generators, and I've created ISDLS generators in 64 bits as well as 32 bits. You can create a good 64-bit combination generator with three ISDLS subcycle generators using the following shift-pairs: (22,31), (11,13), and (7,19). Initialize each generator to 1 and seed with loops. All of these subcycle generators require 4 clock cycles of arithmetic, so I suggest they be used only when a rotate instruction is not available.

### Theory Is Needed

Other than what we've gleaned from test results, we know very little about this new family of subcycle-based generators. There is no theory whatsoever about them. Most of these generators use a rotation and an add/subtract/multiply, which are difficult to analyze together because they function in the different number-theoretic fields of `GF(2)`

and integers (`mod 2`

). I'm hoping that researchers in universities (and industry) who are expert in theoretical analysis will be able to provide some insights into these generators.
^{w}

For example, a way of determining the period of the important CMR generator (and the other reversibles) would be especially helpful for 64-bit applications, because their average periods are 2^{63}, which is too large to traverse by brute force. Presently, the best we can do is spend hours of CPU-time traversing a portion of a cycle, say 2^{45} numbers, and then we know the period is at least that large. And if that's larger than the number of random values required by an application, it's fine. It would be better to know the exact period, and that will require theoretical work. Even constraints on the period would be helpful. Will the period be odd or even? How large can the period be? Is 2^{32}-1 or the prime number 2^{32}-5 attainable? Are there anomalies in the numbers output by these generators? The spectral test [4] was devised to gauge the quality of multipliers for the LCG, and I suspect that other tests can be devised for some of these subcycle generators. Do some multipliers for CMR and CMRES produce better randomness than others? Probably, but we need some theory to reduce the search-time for good multipliers. For generators that do not use a multiply, do some constants and rotations produce better randomness than others? Again, probably so, but some theory is required to tease out the good constants. Does combining the outputs of multiple subcycle generators introduce anomalies? Testing indicates "no," but theory would tell us for certain.

### Recommendations

My final recommendations are to use 64-bit combination generators if you have access to 64-bit instructions, and to start with the outstanding CMRES subcycle generator. For 32 bits, start with CMR, which is almost as good. If a multiplication is too slow, the various reversible generators using left-shifts work well, and for 64 bits, the irreversible generators with two rotations (RAR, RSR, RESR, etc.) are fine performers. But I consider CMR and CMRES to be the most important generators presented in this article. Use them if you can.

This article only begins to explore the possibilities of subcycle-based generators. I have tried many more primitive generators than I've reported here. Some were rejected because they were no better than these, some because their periods were too short, and the remainder because they failed tests. There are probably others awaiting discovery. Also, more work can be done with combinations of subcycle generators. For example, combining six 64-bit or eight 32-bit generators can produce a combination generator having a period of about 2^{256}, which could be useful as a stream cipher in cryptography. An example is provided in the downloadable files.

Based on many rigorous tests, I am confident that subcycle-based generators will work well for most applications needing random numbers. I encourage those who explore these generators to publish their results for the benefit of the state of the art. If you publish an article or write a PhD dissertation about subcycle generators, I'd appreciate an email.

You can obtain my files containing tables of cycles for all the subcycle generators (both 32 bit and 64 bit) presented in this article, plus additional generators that might be worth investigating. The zip-file includes the source code for my cycle-finder program. It also includes source for my command-line interface to the TestU01 library, which contains all the 100-plus generators I tested. Click here to download the zip file.

### References

[1] Pierre L'Ecuyer, Richard Simard, August 2007, TestU01: A C library for empirical testing of random number generators, *ACM Transactions on Mathematical Software (TOMS)*, Volume 33 (Issue 4).

[2] Marsaglia, George, July 2003, Xorshift RNGs,*Journal of Statistical Software*, Volume 8 (Issue 14).

[3] Intel 64 and IA-32 Architectures Optimization Reference Manual [PDF], Order number 248966-023a, Jan. 2011, pg 724, Intel Corporation.

[4] Knuth, Donald, 1981, Seminumerical Algorithms, The Art of Computer Programming Vol 2, Second Edition, pp 89-113, Addison-Wesley.

*Mark Overton is a lead Software Engineer at Northrop Grumman and has several patents in image processing. He can be contacted at [email protected].*

Fast, High-Quality, Parallel Random Number Generators: Part 1