Multithreaded Pseudo-Random Number Generator Library (MTPRNG)
To test the concept of generating pseudo-random values via a threaded implementation, we introduce MTPRNG (short for "Multi Threaded Pseudo Random Generator"), a static library that other applications can use in place of the common Standard C99 implementation. MTPRNG is available for download from the project homepage.
The aim of the design was to create a "base case" for RNG via threads. In other words, efficiency was not a goal; rather, we developed the library to implement our concept in a straightforward manner while trying to maintain some level of simplicity. This simplicity was necessary to ensure that we implemented the threading properly, as thread development can become rather complicated. It was also necessary that our implementation be free of deadlock and other resource starvation, as such weaknesses would tarnish the validity of our study. While we cannot vouch that our system is free of these weaknesses, it has run to completion without any lockups, and the use of a mutex helps prevent resource mishandling.
Again, we implemented our RNG in a fashion similar to that of the Standard C99 specification. This similarity provides us with another RNG with similar parameters against which we could perform our analysis. The following is how the ISO/IEC 9899:TC3 C99 Standard Draft defines what the rand() routine should return:
- int rand(void);
- The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX.
- The implementation shall behave as if no library function calls the rand function.
- The rand function returns a pseudo-random integer.
- The value of the RAND_MAX macro shall be at least 32767.
MTPRNG utilizes the non-determinism property of a threaded application. The pool of random data, from which the RNG value is picked, is a shared resource that all the threads have access to. This data is randomized based on the number of bits that are to be returned. The C99 specification states that the rand() routine is to return an integer; therefore, our implementation is set up to return an integer's worth of bits. The range of values returned by the C99 rand() is that of a signed integer from 0 to RAND_MAX, and RAND_MAX is required to be at least 2^15 - 1, or 32767. The actual size an architecture uses for an integer is not defined; rather, it must hold at least the number of bits needed to represent the standard's minimum range of values (-32767 to 32767). In the case of Linux, an integer is 32 bits.
MTPRNG uses threads to represent each bit of a 32-bit integer. This is horrendous performance-wise: since a bit has two possible values, 0 or 1, we use two threads for each bit, one holding a value of '0' and the other a value of '1' for that same bit. Because we wanted to maintain simplicity in how we implemented the RNG, this gives 64 threads to represent 32 bits. When a random value is requested, a mutex on a shared object is released, and the first 32 threads that see this mutex available enter the critical section, where they can manipulate the shared critical resource. In certain cases a bit will be set by one thread and then unset by the thread holding its opposing value. In our smaller-sized testing the critical reference did not appear to get mangled; in other words, when a thread reported that it had set its value, the collective result of all threads produced the expected value. Therefore, on an x86 architecture running a Linux kernel, our accesses to the critical reference appeared atomic in nature.
Mutual exclusion is another property that is often considered in a concurrent environment. The context of mutual exclusion is usually associated with access to a shared resource. Mutexes are utilized to prevent a mis-ordering of accesses between threads on a shared resource. When a shared resource is accessed, the mutex allows a thread to block other threads so that the outcome of the thread's operation on that resource is predictable and not confused by another thread trying to access the resource at the same time. Therefore, access to such a resource is said to be mutually exclusive between threads. Alongside progress and bounded waiting, mutual exclusion is one of the three properties a concurrent application must satisfy for its handling of critical sections to be considered "correct" (see Modern Multithreading: Implementing, Testing, and Debugging Multithreaded Java and C++/Pthreads/Win32 Programs by Richard H. Carver and Kuo-Chung Tai).
MTPRNG utilizes a mutex to atomize access to the pool of random data that is to be manipulated. When the lock becomes available, a single thread can acquire it, enter its critical section, and inject its bit value into the proper location of the random data block of memory. Once that thread has updated the state of the random number generation, the lock is released and another thread can do the same. Just before releasing the mutex, the thread increments a counter that lets the parent thread know how many threads have manipulated the random data pool. Once enough bits have been set (in our case, once half of the total number of threads have accessed the critical section), the parent thread sets the bounded-wait variable, signaling all threads to stop manipulating the random pool. This mutex prevents all of the threads from accessing the random data at once, or more than one thread from manipulating the data at a time. While the data is supposed to be pseudo-random, a property inherited from the scheduler, the reason for the mutual exclusion was to ensure that only one thread at a time has atomic access to the critical reference (i.e., the random data pool). Such exclusion was implemented to ensure that the threads are summoned in a non-deterministic order, one at a time, to manipulate the pool. This ordering of threads by the scheduler might provide insight as to which kernel was used, if the random data is not truly random but instead consists of a pattern.
The parent thread monitors the request for a random value. When a value is requested, it sets a value that all threads are concurrently looking at. This action is effectively a bounded wait that the threads are performing. When the value is set, the threads are then allowed to enter their critical section, one at a time, to manipulate the random data pool. After the pool is filled, the random value is returned to the application requesting it, and the pool is reset.
The mutex and bounded-waiting concepts outlined in the previous paragraphs are in place to assure that the RNG will return a value and not incur an unexpected infinite delay. The atomicity and the count of accesses to the pool, per random number request, aid the progress of the threads so that they do not starve or stall (i.e., livelock or deadlock). It is important to reduce any potential for starvation, as stalling for a random number is not acceptable: the user requests data, not an excessive or never-ending wait on the shared resource.
Once RNGs are constructed and implemented, they are usually submitted to empirical statistical tests that try to detect statistical deficiencies by looking for evidence against the hypothesis of randomness. A test is defined by a statistic T, a function of a fixed set of numbers, whose distribution is known. As per P. L'Ecuyer's paper "Random Numbers" in the International Encyclopedia of the Social and Behavioral Sciences, there is no universal test or battery of tests that can guarantee, when passed, that a given generator is fully reliable for all kinds of applications. However, passing many tests may improve one's confidence in the RNG.
When it comes to MTPRNG, our model is far from perfect. However, such a model deserves study, as its flaws might be correctable, which could result in a well-rounded RNG. Likewise, the utility of such a concept might stretch outside the bounds of merely providing values, as it could be suited to serve as a scheduler profiling mechanism. It has been observed that the randomness of the numbers our implementation produces varies with the type of underlying kernel being used. Since the same compiler and library were used on two separate systems with different kernels, the kernels do seem to play an important role in characterizing the randomness. For instance, the Linux kernel, which schedules each thread as its own task and uses the Completely Fair Scheduler, generated considerably more random values over a number of samples than the Solaris kernel, which uses its own fair scheduler. Such results allude to the notion that fingerprinting of the operating system and library (both kernels used POSIX thread generation) can play an important role in the RNG model.
Though this has not yet been tested, the authors are confident that the processor architecture would play an important part in the model as well.
We conducted some tests to compare the randomness of our RNG to that of a standard RNG implemented with a seed and algorithmic computations. We used two of the "tuftests" to analyze the randomness (see Some Difficult-to-pass Tests of Randomness by George Marsaglia and Wai Wan Tsang). These tests are simplified variations of the battery of tests provided by the common diehard suite; therefore, for a general perspective on our implementation, there was no need to run the full diehard set. We compare a sample distribution from our RNG with the standard provided by a number of presumably good RNGs, "presumably good" meaning that they produce results similar to those of a well-tested and accepted RNG. We decided to take 100,000 samples from our RNG and compare them with the lagged-Fibonacci RNG using the gcd() and bday() tuftests as a benchmark.
The results of our tests are as follows. A probability value, or "p-value," of 0.0 suggests that the values analyzed are completely random, as opposed to the poorer-fitting maximum of the range, 1.0. It was clear that the smaller the number of samples per implementation, the further the p-value fell from 1.0. For a larger sample size, our RNG could not clear the gcd() tuftest, but the same is true of the linear congruential generator.
We then carried out a test for 100,000 samples, the results of which are displayed in Table 2. These results suggest that, as the number of samples reached 100,000, our randomness indicator approached 1.0. For the kernel-specific analysis, the smaller sample size indicated a discrepancy in the results, thus potentially providing a value with which to fingerprint the operating system.
We can thus cautiously claim that threads scheduled on different kernels result in varied patterns of generated random numbers. It may be that the kernel schedulers play a role in the thread synchronization operation. However, more research needs to be conducted on this method of fingerprinting before we can come to an appropriate conclusion.
The second tuftest conducted was the bday() test. This test is an extension of the iterated-spacings test. Marsaglia's birthday test bday() treats each 32-bit number from the RNG as a birthday in a year of 2^32 days. The test generates a set of such birthdays, sorts them, and then computes the spacings between consecutive values; the number of duplicate spacings is approximately Poisson-distributed, with a mean of 4 for Marsaglia's parameters. This process is repeated to get a sample of 5000 from the Poisson distribution. A goodness-of-fit test is then applied to this sample, and the result is inserted into a chi-square distribution function to get a p-value. In our case the bday() test showed mixed results: the duplicates expected and observed varied based on the number of samples. With more duplicates, the expected and observed values were closer together than with fewer. The detailed results can be obtained from our original study (see CS-706: Random Number Generation via Threading). The original study, due to complications and time constraints, conducted its tests on a set of values from MTPRNG using fewer than 100,000 samples.
Mere observation of a set of MTPRNG-generated random values suggests they are less than random. While such findings are not desired, we attribute the non-randomness to the kernel scheduler. This conclusion, while we cannot state it with complete certainty, suggests the approach might be useful as a kernel profiling/fingerprinting tool.
Conclusion and Other Work
It is clear that our thread-based, seedless RNG is not perfect, as it is an indirect function of the kernel scheduler, the library, and the underlying computational architecture. Though our implementation does not scale well with more samples, we plan to carry out further investigations on the per-thread lifecycle. By further exploring concepts in the concurrent programming realm, one can consider performing reachability testing on our threads. Such testing allows each thread to exercise a sequence of events based on various race conditions. The various states might well provide a way of adding more nondeterminism to our RNG. Coordinating the generated random values with the state flow chart, per thread, could provide hints as to how to increase the RNG's potential randomness. With the different "reachable" states, the library could then analyze the results and cull certain threading states that produce less random data.

There are some RNGs which use parallel threads per seed, where multiple threads access the same generator or different generators. JAPARA, for instance, provides a selection of random number generator algorithms that allow the independent generation of random number streams on different threads, so there is no need for synchronized methods for generating the random numbers. Perhaps the simplest approach to constructing a parallel random number generator would be to set the seed once when the first generator object is instantiated; all subsequent generator objects instantiated in other threads would then be automatically seeded to ensure that the sequences for each generator instance do not overlap. This is the approach used in the GNU Scientific Library Reference Manual and the Substream package. All of these approaches differ from our work, however, since they fall in the family of parallel seed-based RNGs.
In summary, we used concurrency and thread synchronization as a means to generate random numbers. Although the implementation may not have strong randomness, it provides an initial concept to build upon. There are numerous methodologies by which we could enhance the design of our library, such as increasing the number of threads or combining it with other RNGs.