Random number generation (RNG) is an important and desired property for any Monte Carlo simulation or cryptography application. The strength of such systems depend on the appearance of randomness amongst a desired set of generated numbers. Since the random numbers required are often in high orders of magnitude, computer-based systems for RNG are widely used. However, these systems often fall short of the goal in producing an apparent patternless stream of values, even though they still might meet some statistical tests for fitness. Such fitness tests are used to analyze the randomness of numbers to ensure that the values are void of any discernible patterns. The numerous applications of randomness have led to many different methods of generating random data. These methods vary regarding the fitness, or how unpredictable/statistically random their generated values appear, and how quickly the values can be created.
In Practical Random Number Generation in Software, John Viega demonstrated that there is a large gap between the theory and practice for random number generation and proposed various solutions to the real world pseudo-RNGs (PRNG). Here we reference all random number generators as being RNG and do not try to further qualify them into being pseudo versus non-pseudo random number generators.
In this article we present the idea of using threaded concurrent systems for seedless random number generation. The motivation comes from the notion that each thread executes independently of its concurrent brethren, spawned by a single parent process. These threads are responsible for flipping the bits of a block of memory which is then used as a random value.
Broadly speaking, there are two forms of random number generators:
- Generators that rely on mathematical algorithms, such as linear congruential generators. They are often provided via a specific platform's implementation of the Standard C99 rand() routine.
- Generators that derive their pseudo-random values from external or seemingly random sources, such as radio waves or motions of the mouse.
It is not uncommon for RNGs generated from a deterministic environment to be based on mathematical algorithms, which inherently limit the randomness of the system. Such algorithms can replicate a pattern over a period of time and quantity, which can eventually compromise the randomness of the system. The computational algorithms that produce long sequences of apparently random results are in fact completely determined by a shorter initial value, known as a seed or key.
Here we explore the idea of using threads as a seedless means of producing pseudo-random values in a deterministic system without relying on the properties of a mathematical algorithm. We can thus uniformly eliminate the issues associated with seeded RNGs. Besides we rely on an implementation which is similar to Standard C99 rand() routine, so all that needs to be done is replacing the standard function calls with our library calls. Finally, non-determinism by massively parallel thread executions is a concept unexplored till date. Though there are RNGs which use parallel threads per seed -- for selective seed policy the idea of using a thread lifecycle for generating a random number itself is new.
Concurrent Systems and Non-Determinism
In a concurrent system, each thread, by itself, is a lightweight process. Such threads are scheduled to execute once a processor is available. Multiple threads can be executed in parallel, on a multiprocessor system since they all are a part of the same process. While these threads are executing, any of them can be stopped and the order in which they are paused may be completely random for every turn. Since there is no priority among the threads, the operation of stopping and restarting any number of threads is carried in a non-deterministic manner. Thus, for each of the stop and restart operations, no particular thread can be prototyped to follow a pattern, unless the operating system scheduler has some influence. In other words, the programmer cannot make any assumptions about the order in which threads are scheduled.
Non-determinism is an aspect linked to the operating system process scheduler. Most of the modern operating systems use a hybrid between the deterministic and non-deterministic scheduling techniques. Real time, processor affinity or priority processes, require deterministic scheduling, whereas other processes or threads use a fair non-deterministic algorithm.
In such "fair algorithms" each process is given a time quantum for its execution. This is usually done to load balance a system effectively or achieve a target quality of service so that no one thread keeps the CPU to itself. When it comes to using threads, since every thread is scheduled to run in a process, the order of scheduling is not given much importance. Most of the schedulers use a random thread pickup, or a round-robin ordering, to execute threads. In fact in the case of some of the modern multicore systems, multiple threads can be executed in parallel on different cores of the system. This parallel execution results in the threads being executed and scheduled in a random manner. Such non-determinism depends on the processor architecture, libraries used, concurrent programming methodology, number of threads used to get an idea of the parallel operation at that instant.
The concept of concurrent programming and threading outlines the concept for our RNG. Our goal is to use the non-deterministic thread behavior to manipulate a block of memory. This block will then be converted to an integer and outputs as a single random value. We rely on the fact that since each thread uniquely represents a bit boolean value and since the threads are arbitrarily executing from our pool per operation, the random number generated must also be independent of the previous output.
Operating System Scheduling
The non-determinism of how a thread is accessed, when only a shared mutex amongst the threads prevents them from accessing a single shared resource, is a function dependant on the operating system's kernel scheduler. This scheduling of computational resources is the sole purpose of a process scheduler. Likewise, there are numerous scheduling algorithms that kernels can utilize to determine which process has time to execute operations on the system. This scheduling of process executions provides a kernel the ability to multi-task. Adding to the non-determinism potential of a threaded application, is user input. Consider a concurrent application running on the same system that a hypothetical threaded application is simultaneously running on. If any one of the applications is awaiting user input, the non-determinism of the system is further complicated. Depending on how "quick" the user is to respond, and what the user might have to do, the kernel must determine if other precesses are to be executed while awaiting user input. The scheduling framework is one of the primary driving factors that our threaded RNG relies on.
On a similar note, it might also be possible to extract patterns of the scheduler and determine the operating system that was using our tool to generate random values.
While this patterning would obviously not demonstrate successful pseudo-randomness of our concept, it could be interesting for anyone exploring finger-printing of operating systems.