Random Numbers in a Range Using Generic Programming

The Greatest Common Divisor algorithm computes the largest number that divides its two arguments, and it can be used to map random numbers into a range more efficiently.


March 19, 2008
URL:http://www.drdobbs.com/tools/random-numbers-in-a-range-using-generic/206904716

Michael is a doctoral candidate in the Computer Science Department of Ben-Gurion University. He can be contacted at [email protected].


Consider the generation of random numbers in an interval, say the generation of an integer in {0,1,...,m-1} where the probability of each integer is 1/m. Usually, you have a source of randomness that provides random bits in chunks of architecture word size—something like the wrapper around the mtrand library (bedaux.net/mtrand); see Listing One.

Mtrand implements the Mersenne Twister random-number generator (www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html), which is a high-quality source of randomness. By "high quality," I mean that you don't need to consider low-order bits of words produced by the generator to be of lower quality than high-order bits. In principle, you can take the value returned by next() modulo m to produce an integer in the range [0,m-1].

However, this approach is problematic when m is comparable in size to the number of values representable by value_t. For instance, assume that the type of value_t is an unsigned 10-bit integer, and can thus contain values in the [0,1023] range. Consider m=684, which is approximately 2/3 of 1024. Taking a random value_t modulo m results in the lower half of the [0,m-1] range having twice the probability of its upper half.

The usual approach to this problem is to ignore it (effectively assuming that m is low enough that the difference in probabilities is insignificant), or to reject the values in the "remainder" portion of the value_t range; see Listing Two.

Taking the previous example with m=684, the constants in Listing Two are M=1023, rem=(1023-683) mod 684=340, and last=1023-340=683.

Is rejecting the remainder values costly? It is, especially for values of m close to, but above, half the range of value_t. For such values, the expected number of calls to next() per call to next(max) is 2 (this follows from the probability of rejection being 1/2). Because the cost of random bits varies between "moderate" (for Mersenne Twister) and "expensive" (for cryptography-oriented generators, especially hardware implementations), the running time of the relevant program portions doubles in such cases (unless you settle for the low-quality linear congruential generators).

Is it wasteful? The rejected values are thrown away, but perhaps their inherent entropy can still be used—maybe to reduce the probability of rejection on the next iteration of the do/while loop? For instance, in the example, once rnd is rejected, and just before the next loop iteration, rnd-last-1 is uniformly distributed in the [0,rem-1] range. You could partition the [0,m-1] range into exactly rem buckets, and in the next iteration draw a value in the corresponding bucket using m=rem. What's problematic in this solution is that the conditions for its application are quite rare, because it requires that rem divides m. It can happen—with a value_t type of 10 bits, two possibilities are m=768 and m=320—however, it is possible to do a little better using the Greatest Common Divisor (GCD) algorithm.

Recall that the GCD algorithm computes the maximal number that divides its two arguments, in a logarithmic number of arithmetic operations. Let's return to m=684 and rem=340. Both are divisible by 4 (which is the GCD). Thus, if rnd is rejected in the first loop iteration, you can find out in which quarter of the remainder values rnd is located, map it to the corresponding quarter of values in the [0,m-1] range, and let m=684/4=171 in the next iteration. Then, the probability of rejection becomes 1/6 instead of 1/3 (and can in general drop to a much lower value).

It turns out, however, that for randomness sources that are only moderately expensive, the GCD computation cancels any running time benefit attained by reducing the expected number of calls to next(). If you are writing this code for a library, you could specify that it assumes usage with only "expensive" random-number generators, or implement memoization. However, in the first case, the code's generic nature suffers, and in the second case, a penalty is incurred on users who don't need perfect distribution of values in the [0,m-1] range.

This gloomy situation does have a win-win solution if you are using a language that allows compile-time computations (C++, for instance), and if you are willing to limit the approach outlined earlier to values of m known at compile time.

#include "mtrand.h"  // MTRand_int32; header name as distributed by the mtrand library

class Random
{
public:
   typedef unsigned value_t;
   Random(value_t seed) : mtrand(seed) { }
   virtual ~Random() { }
   value_t next() { return mtrand(); }
private:
   MTRand_int32 mtrand;
};

Listing One


class SimpleRange : public Random
{
public:
   SimpleRange(value_t seed) : Random(seed) { }
   using Random::next;
   value_t next(value_t max);
};
#include <limits>  // std::numeric_limits
// Implementation
SimpleRange::value_t SimpleRange::next(value_t max)
{
   const value_t M    = std::numeric_limits<value_t>::max();
   const value_t rem  = (M - (max - 1)) % max;
   const value_t last = M - rem;
   value_t rnd;
   // Once rnd is in the safe range of
   // [0 .. last], return rnd % max
   do
      // rnd <- [0 .. M]
      rnd = next();
   while (rnd > last);
   return rnd % max;
}
Listing Two

Templatized Implementation

Because all the constants computed in Listing Two need to be available at compile time, you need to replace std::numeric_limits<...>::max() with integer traits; see Listing Three.

GCD computation is easy with C++ templates; see Listing Four.

Now you can compute the constants that were defined in Listing Two. Because these constants are derived from m, the parameter to next(max), they are also kept in a traits structure specialized on max; see Listing Five.

You don't just compute rem, last, and gcd, but also determine the dispatch ID, which is used to select one of the three variants of the next(max) implementation (actually next<max>(), because m is now a compile-time constant). The NO_LOOP variant is selected when there are no remainder values in the value_t range and no loop is necessary; unless a very strange platform is used, this means m is a power of 2. The RETRY_SAME variant is essentially the SimpleRange implementation, and is selected when m and rem are relatively prime, so that m cannot be reduced once a random value is rejected. Finally, the REDUCE_MAX variant is selected when m and rem share a common factor, which allows reduction of m, as in the m=684 and rem=340 example discussed previously. Because C++ doesn't allow partial specialization of member functions, and you need partial specialization on the dispatch ID, a private class Dispatch is used. Because Dispatch is specialized for all three values of dispatch_t, only a declaration is necessary for the nonspecialized template (see Listing Six).

Listings Seven and Eight show the straightforward specializations for the NO_LOOP and RETRY_SAME dispatch IDs.

Listing Nine shows the REDUCE_MAX specialization, which implements the previously described buckets-based approach, with a small variation—the remainder range is not partitioned into GCD subranges, but instead a modulo GCD value is used. That is, low-order bits of rnd are used instead of the high-order bits.


#include <climits>  // UINT_MAX, INT_MAX

template <typename> struct traits;

template <> struct traits<unsigned>
{ static const unsigned max_value = UINT_MAX; };

template <> struct traits<int>
{ static const int max_value = INT_MAX; };

Listing Three

#include <stdint.h>  // uintmax_t

template <uintmax_t a, uintmax_t b> struct gcd
{ static const uintmax_t value = gcd<b, a%b>::value; };
{ static const uintmax_t value = gcd<b, a%b>::value; };
template <uintmax_t a> struct gcd<a, 0>
{ static const uintmax_t value = a; };

Listing Four

namespace random_dispatch
{
   enum dispatch_t
   {
     NO_LOOP,    // rem == 0
     RETRY_SAME, // gcd == 1
     REDUCE_MAX  // gcd != 1
   };
}
template <typename value_t, value_t max>
struct random_traits
{
private:
   static const value_t M   = traits<value_t>::max_value;
   static const value_t rem = (M - (max - 1)) % max;
public:
   static const value_t last = M - rem;
   static const value_t gcd  = gcd<max,rem>::value;
   static const random_dispatch::dispatch_t dispatch =
      rem == 0 ? random_dispatch::NO_LOOP
      : (gcd == 1 ? random_dispatch::RETRY_SAME
         : random_dispatch::REDUCE_MAX);
};
Listing Five

class SmartRange : public Random
{
public:
   SmartRange(value_t seed) : Random(seed) { }
   using Random::next;
   template <value_t max> value_t next()
   { return detail::SmartRange::Dispatch<max>::next(*this); }
};
namespace detail { namespace SmartRange
{
   template <Random::value_t max,
      random_dispatch::dispatch_t =    
      random_traits<Random::value_t, max>::dispatch>
   class Dispatch;
}}
Listing Six

template <Random::value_t max>
class Dispatch<max, random_dispatch::NO_LOOP>
{
   typedef Random::value_t value_t;
public:
   static value_t next(Random& random)
   {
      // rnd <- [0 .. M]
      const value_t rnd = random.next();
      return rnd % max;
   }
};
Listing Seven

template <Random::value_t max>
class Dispatch<max, random_dispatch::RETRY_SAME>
{
   typedef Random::value_t             value_t;
   typedef random_traits<value_t, max> random_traits;
public:
   static value_t next(Random& random)
   {
      value_t rnd;
      // Once rnd is in the safe range of
      // [0 .. last], return rnd % max
      do
         // rnd <- [0 .. M]
         rnd = random.next();
      while (rnd > random_traits::last);
      return rnd % max;
   }
};
Listing Eight

template <Random::value_t max>
class Dispatch<max, random_dispatch::REDUCE_MAX>
{
   typedef Random::value_t value_t;
   typedef random_traits<value_t, max> random_traits;
public: 
   static value_t next(Random& random)
   {
      // rnd <- [0 .. M]
      const value_t rnd = random.next();

      // If rnd is in the safe range of
      // [0 .. last], return rnd % max
      if (rnd <= random_traits::last)
         return rnd % max;
      // Otherwise, we have the partial random value
      // in [last+1 .. M] range, and use it if possible
      // (this can happen only once, but it doesn't
      // matter for the implementation).
      else
         return max/random_traits::gcd
            * ((rnd - (random_traits::last + 1))
               % random_traits::gcd)
            + Dispatch<max/random_traits::gcd>::next(random);
   }
};
Listing Nine

Evaluation

The supplied code (available online at www.ddj.com/code/) can be used to estimate the runtime optimization resulting from the use of SmartRange. The theoretical limit is 25 percent off the original runtime: going down from the expected 2 calls to next() per call to next(max) when the probability of rejection is 1/2, to 1.5 expected calls to next() when the probability of the first rejection is 1/2 and, after the reduction of m, the probability of another rejection is 0 (one call × probability 1/2, plus two calls × probability 1/2).

This result is what happens with the chosen Mersenne Twister implementation, when defining, for instance, m=2^31+32 on 32-bit platforms. Actually, SmartRange fares even better than the aforementioned 25 percent, probably due to more information being available to the compiler in this case; see Table 1.

Approach       next()/next(max)   Time
SimpleRange    1.99976            1.30s (0.53s w/inlining)
SmartRange     1.50794            0.35s

Table 1: Comparing SimpleRange with SmartRange on m = 2^31 + 32, with 10,000,000 iterations.

Conclusion

The computation of the remainder range size rem is carefully written to avoid overflows when value_t is an unsigned integer (as it is in the Random wrapper class). Mathematically, if you define m as the desired range exclusive maximum, M as the maximum number representable by value_t plus 1 (that is, making the previously used M greater by 1), and r as the remainder range size, then r=M mod m. Moreover, the GCD value that we compute in random_traits is actually:


g=gcd(m,r)=gcd(m,M mod m)=gcd(M,m)

Because on most architectures M is a power of 2 (do you know a platform on which it is not true?), g is actually 2 to the power of the number of trailing zeros in the binary representation of m. Thus, if we wish to forgo platform neutrality, the approach I present in this article might be efficient even when m is not known at compile time, especially if appropriate bit-twiddling hacks are employed (graphics.stanford.edu/~seander/bithacks.html).

Can REDUCE_MAX happen more than once for a single call to next(max)? The answer is no, but proving it is left as an exercise to interested readers.
