# Random Numbers in a Range Using Generic Programming

#### Source Code Accompanies This Article. Download It Now.

Michael is a doctoral candidate in the Computer Science Department of Ben-Gurion University. He can be contacted at orlovm@cs.bgu.ac.il.

Consider the generation of random numbers in an interval, say the generation of an integer in {0,1,...,m-1} where the probability of each integer is 1/m. Usually, you have a source of randomness that provides random bits in chunks of architecture word size—something like the wrapper around the mtrand library (bedaux.net/mtrand); see Listing One.

Mtrand implements the Mersenne Twister random-number generator (www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html), which is a high-quality source of randomness. By "high quality," I mean that you don't need to consider low-order bits of words produced by the generator to be of inferior quality than high-order bits. In principle, you can take the value returned by next() modulo m to produce an integer in the range [0,m-1].

However, this approach is problematic when m is comparable in size to the number of values representable by value_t. For instance, assume that the type of value_t is an unsigned 10-bit integer, and can thus contain values in the [0,1023] range. Consider m=684, which is approximately 2/3 of 1024. Taking a random value_t modulo m results in the lower half of [0,m-1] range having twice the probability of its upper half.

The regular approach to this problem is to ignore it (effectively assuming that m is low enough that the difference in probabilities is insignificant), or to reject the values in the "remainder" portion of the value_t range; see Listing Two.

Taking the previous example with m=684, the constants in Listing Two are:

• M=1023, the maximal value representable by value_t.
• rem=340, the remainder size.
• last=683, the last value in the range of "safe" values returned by next().

Is the remainder values rejection approach costly? It is, especially for values of m close to, but above, the half-range of value_t. For such values, the expected number of calls to next() for each call to next(max) is 2 (this follows from the probability of rejection being 1/2). Because the cost of random bits varies between "moderate" (for Mersenne Twister) to "expensive" (for cryptography-oriented generators, especially hardware implementations), the running time of the relevant program portions doubles in such cases (unless you settle for the low-quality linear congruential generators).

Is it wasteful? The rejected values are thrown away, but perhaps their inherent entropy can still be used—maybe to reduce the probability of rejection on the next iteration of the do/while loop? For instance, in the example, once rnd is rejected, and just before the next loop iteration, rnd-last-1 is uniformly distributed in the [0,rem-1] range. You could partition the [0,m-1] range into exact rem buckets, and in the next iteration draw a value in the corresponding bucket using m=rem. What's problematic in this solution is that the conditions for its application are quite rare because it requires that rem divides m. It can happen—with a value_t type of 10 bits, two possibilities are m=768 and m=320—however, it is possible to do a little better using the Greatest Common Divisor (GCD) algorithm.

Recall that the GCD algorithm computes the maximal number that divides its two arguments, in logarithmic number of arithmetic operations. Let's return to m=684 and rem=340. Both are divisible by 4 (which is the GCD). Thus, if rnd is rejected in the first loop iteration, you can find out in which quarter of the remainder values rnd is located, map it to the corresponding quarter of values in the [0,m-1] range, and let m=684/4=171 in the next iteration. Then, the probability of rejection becomes 1/6 instead of 1/3 (and can in general drop to a much lower value).

It turns out, however, that for randomness sources that are only moderately expensive, the GCD computation cancels any running time benefit attained by reducing the expected number of calls to next(). If you are writing this code for a library, you could specify that it assumes usage with only "expensive" random-number generators, or implement memoization. However, in the first case, the code's generic nature suffers, and in the second case, a penalty is incurred on users who don't need perfect distribution of values in the [0,m-1] range.

This gloomy situation does have a win-win solution if you are using a language that allows compile-time computations (C++, for instance), and if you are willing to limit the approach outlined earlier to values of m known at compile time.

```class Random
{
public:
typedef unsigned value_t;
Random(value_t seed) : mtrand(seed) { }
virtual ~Random() { }
value_t next() { return mtrand(); }
private:
MTRand_int32 mtrand;
};

```
Listing One

```
class SimpleRange : public Random
{
public:
SimpleRange(value_t seed) : Random(seed) { }
using Random::next;
value_t next(value_t max);
};
// Implementation
SimpleRange::value_t SimpleRange::next(value_t max)
{
const value_t M    = std::numeric_limits<value_t>::max();
const value_t rem  = (M - (max - 1)) % max;
const value_t last = M - rem;
value_t rnd;
// Once rnd is in the safe range of
// [0 .. last], return rnd % max
do
// rnd <- [0 .. M]
rnd = next();
while (rnd > last);
return rnd % max;
}
```
Listing Two

### More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.

# First C Compiler Now on Github

The earliest known C compiler by the legendary Dennis Ritchie has been published on the repository.

Quick Read

# HTML5 Mobile Development: Seven Good Ideas (and Three Bad Ones)

HTML5 Mobile Development: Seven Good Ideas (and Three Bad Ones)

Quick Read

# Building Bare Metal ARM Systems with GNU

All you need to know to get up and running... and programming on ARM

Quick Read

# Amazon's Vogels Challenges IT: Rethink App Dev

Amazon Web Services CTO says promised land of cloud computing requires a new generation of applications that follow different principles.

Quick Read

# How to Select a PaaS Partner

Eventually, the vast majority of Web applications will run on a platform-as-a-service, or PaaS, vendor's infrastructure. To help sort out the options, we sent out a matrix with more than 70 decision points to a variety of PaaS providers.

Quick Read

More "Best of the Web" >>