Introduction and Summary
As I see it, there are two kinds of random number generators (RNGs) needed by most applications, namely—
- statistical-random generators, which seek to generate numbers that follow a uniform random distribution, and
- unpredictable-random generators, which seek to generate numbers that are cost-prohibitive to predict.
This document covers:
- Statistical-random and unpredictable-random generators, as well as recommendations on their use and properties.
- A discussion on when an application that needs numbers that "seem" random should specify its own "seed" (the initial state that the numbers are based on).
- An explanation of which programming language interfaces implement statistical-random and unpredictable-random generators, as well as advice on implementing them in programming languages.
- Issues that arise when shuffling with an RNG.
This document does not cover:
- Testing an RNG implementation for adequate random number generation.
- Applications for which the selection of RNGs is constrained by statutory or regulatory requirements.
The following table summarizes the kinds of RNGs covered in this document:
|Kind of RNG ||When to Use This RNG ||Examples |
|Unpredictable-Random ||In information security cases, or when speed is not a concern. || |
|Statistical-Random ||When information security is not a concern, but speed is. See also "Shuffling". || |
|Seeded PRNG ||When generating reproducible results in a way not practical otherwise. ||Statistical-random quality PRNG with custom seed |
The following definitions are helpful in better understanding this document.
- Random number generator (RNG). Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed(1).
- Pseudorandom number generator (PRNG). A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time) and without making explicit use of nondeterminism.
- Seed. Arbitrary data for initializing the state of a PRNG.
- State length. The maximum size of the seed a PRNG can take to initialize its state without truncating or compressing that seed.
- Period. The maximum number of values in a generated sequence for a PRNG before that sequence repeats. The period will not be greater than 2^L, where L is the PRNG's state length in bits.
- Stable. A programming interface is stable if it has no behavior that is unspecified, implementation-dependent, nondeterministic, or subject to future change.
- Information security. Defined in ISO/IEC 27000.
Unpredictable-random implementations (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that are cost-prohibitive to predict. Such implementations are indispensable in information security contexts, such as—
- generating keying material, such as encryption keys,
- generating random passwords, nonces, or session identifiers,
- generating "salts" to vary hash codes of the same password,
- use in communications between two networked computers,
- use in transfer, transport, messaging, and other communication protocols, and
- cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage.
They are also useful in cases where the application generates random numbers so infrequently that the RNG's speed is not a concern.
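As a minimal sketch of the uses listed above, Python's standard secrets module (which draws on the operating system's unpredictable-random source) can produce keying material, session identifiers, and salts. The sizes chosen below (a 32-byte key, a 16-byte salt) are illustrative assumptions, not requirements of this document.

```python
import secrets

# 256 bits of keying material (the size is an illustrative assumption)
key = secrets.token_bytes(32)

# URL-safe session identifier derived from 32 random bytes
session_id = secrets.token_urlsafe(32)

# Per-password salt, so equal passwords get different hash codes
salt = secrets.token_bytes(16)

# Die roll from 1 through 6, e.g. for a networked multiplayer game
die_roll = secrets.randbelow(6) + 1
```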
An unpredictable-random implementation ultimately relies on one or more nondeterministic sources (sources that don't always return the same output for the same input) for random number generation. Sources that are reasonably fast for most applications (for instance, by producing very many random bits per second), especially sources implemented in hardware, are highly advantageous here, since an implementation for which such sources are available can rely less on PRNGs, which are deterministic and benefit from reseeding as explained later.
An unpredictable-random implementation generates uniformly distributed random bits such that it would be at least cost-prohibitive for an outside party to guess either prior or future unseen bits of the random sequence correctly with more than a 50% chance per bit, even with knowledge of the randomness-generating procedure, the implementation's internal state at the given point in time, and/or extremely many outputs of the RNG. (If the sequence was generated directly by a PRNG, ensuring future bits are unguessable this way should be done wherever the implementation finds it feasible; for example, see "Seeding and Reseeding".)
Seeding and Reseeding
If an unpredictable-random implementation uses a PRNG, the following requirements apply.
The PRNG's state length must be at least 128 bits and should be at least 256 bits.
Before an instance of the RNG generates a random number, it must have been initialized ("seeded") with an unpredictable seed, defined as follows. The seed—
- must consist of data which meets the quality requirement described earlier, which does not contain, in whole or in part, the PRNG's own output, and which ultimately derives from one or more nondeterministic sources (such data may be mixed with other arbitrary data as long as the result is no less cost-prohibitive to predict), and
- must be at least the same size as the PRNG's state length.
The RNG should be reseeded from time to time (using a newly generated unpredictable seed) to help ensure the unguessability of the output. If the implementation reseeds, it must do so before it generates more than 2^67 bits without reseeding and should do so as often as feasible (whenever doing so would not slow applications undesirably).
Examples of unpredictable-random implementations include the following:
- /dev/random device on many Unix-based operating systems, which generally uses only nondeterministic sources; however, in some implementations of the device it can block for seconds at a time, especially if not enough randomness ("entropy") is available.
- /dev/urandom device on many Unix-based operating systems, which often relies on both a PRNG and the same nondeterministic sources used by
- BCryptGenRandom method in recent Windows-based systems.
- Two-source extractors, multi-source extractors, or cryptographic hash functions that take very hard-to-predict signals from two or more nondeterministic sources as input. Such sources include, where available—
- disk access timings,
- keystroke timings,
- thermal noise, and
- A. Seznec's technique called hardware volatile entropy gathering and expansion, provided a high-resolution counter is available.
Statistical-random generators are used, for example, in simulations, numerical integration, and many games to bring an element of chance and variation to the application, with the goal that each possible outcome is equally likely. However, statistical-random generators are generally suitable only if—
- information security is not involved, and
- the application generates random numbers so frequently that it would slow down undesirably if an unpredictable-random implementation were used instead.
If more than 20 items are being shuffled, a concerned application would be well advised to use alternatives to this kind of implementation (see "Shuffling").
A statistical-random implementation is usually implemented with a PRNG, but it can also be implemented in a way similar to an unpredictable-random implementation, provided it remains reasonably fast.
A statistical-random implementation generates random bits, each of which is uniformly randomly distributed independently of the other bits, at least for nearly all practical purposes. If the implementation uses a PRNG, that PRNG algorithm's expected number of state transitions before a cycle occurs and its expected number of state transitions during a cycle must each be at least 2^32. The RNG need not be equidistributed.
Seeding and Reseeding
If a statistical-random implementation uses a PRNG, the following requirements apply.
The PRNG's state length must be at least 64 bits, should be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.
Before an instance of the RNG generates a random number, it must have been initialized ("seeded") with a seed described as follows. The seed—
- must consist of data not known a priori by the implementation, such as random bits from an unpredictable-random implementation,
- must not contain, in whole or in part, the RNG's own output,
- must not be a fixed value, a nearly fixed value, or a user-entered value,
- is encouraged not to consist of a timestamp (especially not a timestamp with millisecond or coarser granularity)(2), and
- must be at least the same size as the PRNG's state length.
The implementation is encouraged to reseed itself from time to time (using a newly generated seed as described earlier), especially if the PRNG has a state length less than 238 bits. If the implementation reseeds, it should do so before it generates more values than the square root of the PRNG's period without reseeding.
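The seeding and reseeding rules above can be sketched as follows. This is a minimal sketch, assuming the xorshift64* algorithm (with its upper 32 bits as output) as the PRNG and Python's os.urandom as the seed source; the names XorShift64Star and reseed_after are hypothetical, chosen only for illustration. Since xorshift64* has a period of 2^64 - 1, the default reseeding threshold is about the square root of that period.

```python
import os

MASK64 = (1 << 64) - 1

class XorShift64Star:
    """Sketch of a statistical-random PRNG that seeds and reseeds itself."""

    def __init__(self, reseed_after=1 << 32):
        # Roughly the square root of the period (2^64 - 1)
        self.reseed_after = reseed_after
        self._reseed()

    def _reseed(self):
        # The seed is as long as the state (64 bits), comes from an
        # unpredictable-random source, and must be nonzero for xorshift.
        state = 0
        while state == 0:
            state = int.from_bytes(os.urandom(8), "little")
        self.state = state
        self.count = 0

    def next32(self):
        """Return the next 32-bit value, reseeding first if needed."""
        if self.count >= self.reseed_after:
            self._reseed()
        x = self.state
        x ^= x >> 12
        x ^= (x << 25) & MASK64
        x ^= x >> 27
        self.state = x
        self.count += 1
        # Multiply, then keep the upper 32 bits of the 64-bit product
        return ((x * 0x2545F4914F6CDD1D) & MASK64) >> 32
```

A caller would create one instance (for example, `rng = XorShift64Star()`) and call `rng.next32()` repeatedly; no seed is ever passed in by the application.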
Examples and Non-Examples
Examples of statistical-random generators include the following:
- XorShift* 128/64 (state length 128 bits; nonzero seed).
- XorShift* 64/32 (state length 64 bits; nonzero seed).
- xoroshiro128+ (state length 128 bits; nonzero seed; but see the note in the source code about the lowest bit of the PRNG's outputs).
- Lehmer128 (state length 128 bits).
- JKISS on top of page 3 of Jones 2010 (state length 128 bits; seed with four 32-bit nonzero pieces).
- std::ranlux48 engine (state length 577 bits; nonzero seed).
- PCG (classes named
pcg64_fast; state lengths 127, 255, and 127 bits, respectively) by Melissa O'Neill.
Non-examples include the following:
Seeded Random Generators
In addition, some applications use pseudorandom number generators (PRNGs) to generate results based on apparently-random principles, starting from a known initial state, or "seed". Such applications usually care about reproducible results. (Note that in the definitions for unpredictable-random and statistical-random generators given earlier, the PRNGs involved are automatically seeded before use.)
An application should use a PRNG with a seed it specifies (rather than an automatically-initialized PRNG or another kind of RNG) only if—
1. the initial state (the seed) from which the "random" result will be generated:
   - is hard-coded,
   - was based on user-entered data,
   - is known to the application and was generated using an unpredictable-random or statistical-random implementation (as defined earlier),
   - is a verifiable random number (as defined later), or
   - is based on a timestamp (but only if the reproducible result is not intended to vary during the time specified on the timestamp and within the timestamp's granularity; for example, a year/month/day timestamp for a result that varies only daily),
2. the application might need to generate the same "random" result multiple times,
3. the application either:
   - makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
   - finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as:
     - by saving the result to a file,
     - by storing the random numbers for the feature generating the result to "replay" later, or
     - by distributing the results or the random numbers to networked users as they are generated, and
4. any feature using that random number generation method to generate that "random" result will remain backward compatible with respect to the "random" results it generates, for as long as that feature is still in use by the application.
Meeting recommendation 4 is aided by using stable PRNGs; see "Definitions" and the following examples:
- java.util.Random is stable.
- The C rand method is not stable (because the algorithm it uses is unspecified).
- C++'s random number distribution classes, such as std::uniform_int_distribution, are not stable (because the algorithms they use are implementation-defined according to the specification).
- System.Random is not stable (because its generation behavior could change in the future).
Seedable PRNG Recommendations
Which PRNG to use for generating reproducible results depends on the application. But as recommendations, any PRNG algorithm selected for producing reproducible results—
- should meet or exceed the quality requirements of a statistical-random implementation,
- should be reasonably fast, and
- should have a state length of 64 bits or greater.
Custom seeds can come into play in the following situations, among others.
Many kinds of games generate game content using apparently-random principles, such as—
- procedurally generated maps for a role-playing game,
- shuffling a digital deck of cards for a solitaire game, or
- a game board or puzzle board that normally varies every session,
where the game might need to generate the same content of that kind multiple times.
In general, such a game should use a PRNG with a custom seed for such purposes only if—
1. generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use (see recommendations 2 and 3), or
2. the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content (see recommendations 2 and 3).
Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.
Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. Under recommendation 4, the game—
- may change the algorithm it uses to generate random maps, but
- should use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
- should continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.
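The map "code" scenario above can be sketched as follows, assuming a code made of one letter naming the generation algorithm followed by the seed in hexadecimal. Everything here is hypothetical and for illustration only: the code format, the names terrain_a, terrain_b, and map_from_code, and the choice of xorshift64* as a fully specified PRNG.

```python
MASK64 = (1 << 64) - 1

def xorshift64star(seed):
    """Yield 32-bit values from a fully specified (hence stable) PRNG."""
    x = seed & MASK64
    if x == 0:
        raise ValueError("seed must be nonzero")
    while True:
        x ^= x >> 12
        x ^= (x << 25) & MASK64
        x ^= x >> 27
        yield ((x * 0x2545F4914F6CDD1D) & MASK64) >> 32

def terrain_a(seed, size):
    """Original algorithm: four terrain types."""
    rng = xorshift64star(seed)
    return [next(rng) % 4 for _ in range(size)]

def terrain_b(seed, size):
    """Newer algorithm: eight terrain types."""
    rng = xorshift64star(seed)
    return [next(rng) % 8 for _ in range(size)]

# The leading letter of a code names the algorithm, so codes issued
# for different algorithms cannot be confused with one another.
ALGORITHMS = {"A": terrain_a, "B": terrain_b}

def map_from_code(code, size=16):
    """Regenerate a map from a code such as "A1F" (algorithm A, seed 0x1F)."""
    algorithm = ALGORITHMS.get(code[:1])
    if algorithm is None:
        raise ValueError("unknown code version")
    return algorithm(int(code[1:], 16), size)
```

Because terrain_a is kept unchanged after terrain_b is introduced, a code beginning with "A" keeps producing the same map for as long as the feature exists.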
A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the method meets recommendation 4).
Verifiable Random Numbers
Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, of the information used to derive such numbers—
- at least some of it is not known by anyone until some time after the announcement is made that those numbers will be generated, but all of it will eventually be publicly available, and
- some of it can be disclosed in the announcement that those numbers will be generated.
One process to generate verifiable random numbers is described in RFC 3797 (to the extent its advice is not specific to the Internet Engineering Task Force or its Nominations Committee). Although the source code given in that RFC uses the MD5 algorithm, the process does not preclude the use of hash functions stronger than MD5 (see the last paragraph of section 3.3 of that RFC).
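The general idea of deriving a verifiable seed can be sketched as follows. This is not the exact procedure of RFC 3797; it is a simplified illustration that assumes SHA-256 as the hash function and a list of publicly announced text inputs (for example, future lottery results). The function name verifiable_seed and its parameters are hypothetical.

```python
import hashlib

def verifiable_seed(public_inputs, seed_bytes=16):
    """Derive a seed from a list of publicly verifiable strings."""
    h = hashlib.sha256()
    for item in public_inputs:
        data = item.encode("utf-8")
        # Length-prefix each input so that different lists cannot
        # produce the same byte stream when concatenated.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    # Use the first seed_bytes bytes of the hash code as the seed
    return int.from_bytes(h.digest()[:seed_bytes], "big")
```

Anyone who later learns the same inputs can call the function again and confirm that the announced seed was derived from them.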
Randomly generated numbers can serve as noise, that is, a randomized variation in images and sound. (See also Red Blob Games, "Noise Functions and Map Generation")(3). In general, the same considerations apply to any RNGs the noise implementation uses as in other cases.
However, special care should be taken if the noise implementation implements cellular noise, value noise, or gradient noise (such as Perlin noise) and uses one of the following techniques:
- The implementation should use a table of "hard-coded" gradients or hash values only if the noise generation meets the seeding recommendations (treating the table as the seed).
- If the noise implementation incorporates a hash function—
- that hash function should be reasonably fast, be stable (see "Definitions"), and have the so-called avalanche property, and
- the noise implementation should be initialized in advance with arbitrary data of fixed length to provide to the hash function as part of its input, if the seeding recommendations apply to the noise generation.
Wherever feasible, a cellular, value, or gradient noise implementation should use an RNG to initialize a table of gradients or hash values in advance, to be used later by the noise function (a function that outputs seemingly random numbers given an n-dimensional point).
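A hash-based value noise function along these lines can be sketched as follows. This is a minimal sketch, assuming keyed BLAKE2b from Python's hashlib as the stable hash function (it is fully specified, though not necessarily the fastest choice) and bilinear interpolation between lattice points; the names lattice_value and value_noise are hypothetical. The key plays the role of the fixed-length data the noise implementation is initialized with in advance.

```python
import hashlib
import math
import struct

def lattice_value(seed, ix, iy):
    """Hash an integer lattice point to a value in [0, 1)."""
    digest = hashlib.blake2b(struct.pack("<qq", ix, iy),
                             key=seed, digest_size=8).digest()
    # Keep the top 53 bits so the result is strictly less than 1
    return (int.from_bytes(digest, "little") >> 11) / 2.0**53

def value_noise(seed, x, y):
    """2D value noise: bilinear interpolation of hashed lattice values."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    v00 = lattice_value(seed, ix, iy)
    v10 = lattice_value(seed, ix + 1, iy)
    v01 = lattice_value(seed, ix, iy + 1)
    v11 = lattice_value(seed, ix + 1, iy + 1)
    top = v00 + (v10 - v00) * fx
    bottom = v01 + (v11 - v01) * fx
    return top + (bottom - top) * fy
```

For example, `value_noise(b"example seed", 2.25, -7.5)` returns the same value every time it is called with the same seed and point.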
Programming Language APIs
The following table lists application programming interfaces (APIs) implementing unpredictable-random and statistical-random RNGs for popular programming languages. Note the following:
- In single-threaded applications, for each kind of RNG, it's encouraged to create a single instance of the RNG on application startup and use that instance throughout the application.
- In multithreaded applications, for each kind of RNG, it's encouraged to either—
- create a single thread-safe instance of the RNG on application startup and use that instance throughout the application, or
- store separate and independently-initialized instances of the RNG in thread-local storage, so that each thread accesses a different instance (this might not always be ideal for unpredictable-random RNGs).
- Methods and libraries mentioned in the "Statistical-random" column need to be initialized with a full-length seed before use (for example, a seed generated using an implementation in the "Unpredictable-random" column).
- The mention of a third-party library in this section does not imply sponsorship or endorsement of that library, or imply a preference of that library over others. The list is not comprehensive.
|Language ||Unpredictable-random ||Statistical-random ||Other |
|C/C++ (G) ||(C) || |
xoroshiro128plus.c (128-bit nonzero seed);
xorshift128plus.c (128-bit nonzero seed)
|Python || |
secrets.SystemRandom (since Python 3.6);
|ihaque/xorshift library (128-bit nonzero seed; default seed uses |
random.seed() (19,936-bit seed) (A)
|Java (D) ||(C); |
|grunka/xorshift ( |
crypto.randomBytes(byteCount) (node.js only)
Math.random() (ranges from 0 through 1) (B)
|Ruby ||(C); |
SecureRandom class (
| || |
Random#rand() (ranges from 0 through 1) (A) (E);
Random#rand(N) (integer) (A) (E);
Random.new(seed) (default seed uses entropy)
(A) Default general RNG implements the Mersenne Twister, which doesn't meet the statistical-random requirements, strictly speaking, but might be adequate for many applications due to its extremely long period.
Math.random is implemented using
Math.random is "implementation-dependent", though, according to the ECMAScript specification.
(C) See "Advice for New Programming Language APIs" for implementation notes for unpredictable-random implementations.
(D) The java.util.Random class uses a 48-bit seed, so it doesn't meet the statistical-random requirements. However, a subclass of java.util.Random might be implemented to meet those requirements.
(E) In my opinion, Ruby's Random#rand method presents a beautiful and simple API for random number generation.
(F) At least in Unix-based systems, calling the SecureRandom constructor that takes a byte array is recommended. The byte array should be data described in note (C).
(G) std::random_device, introduced in C++11, is not recommended because its specification leaves much to be desired. For example, std::random_device can fall back to a pseudorandom number generator of unspecified quality without much warning.
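The thread-local option for multithreaded applications, described earlier in this section, can be sketched as follows. This sketch assumes Python's random.Random (the Mersenne Twister of note (A)) as the statistical-random generator, seeded with 624 32-bit words from os.urandom so that the seed is about as long as the generator's state; the function name thread_rng is hypothetical.

```python
import os
import random
import threading

_local = threading.local()

def thread_rng():
    """Return this thread's own RNG, creating and seeding it on first use."""
    rng = getattr(_local, "rng", None)
    if rng is None:
        # 2496 bytes = 624 32-bit words, seeded independently per thread
        # from the operating system's unpredictable-random source
        rng = random.Random(int.from_bytes(os.urandom(2496), "big"))
        _local.rng = rng
    return rng
```

Each thread calls `thread_rng()` whenever it needs random numbers and always gets back its own instance, so no locking is needed around the generator.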
Advice for New Programming Language APIs
Wherever possible, applications should use existing libraries and techniques that already meet the requirements for unpredictable-random and statistical-random RNGs. For example—
- an unpredictable-random implementation can:
  - read from the /dev/random devices in most Unix-based systems (using the read system calls where available),
  - call the getentropy method on OpenBSD, or
  - call the BCryptGenRandom API in recent Windows-based systems,
  and only use other techniques if the existing solutions are inadequate in certain respects or in certain circumstances, and
- a statistical-random implementation can use a PRNG algorithm mentioned as an example in the statistical-random generator section.
If existing solutions are inadequate, a programming language API could implement unpredictable-random and statistical-random RNGs by filling an output byte buffer with random bytes, where each bit in each byte will be randomly set to 0 or 1. For instance, a C language API for unpredictable-random generators could look like the following:
int random(uint8_t *bytes, size_t size);
where "bytes" is a pointer to a byte array, "size" is the number of random bytes to generate, and the method returns 0 if it succeeds and nonzero otherwise. Any programming language API that implements such RNGs by filling a byte buffer ought to run in amortized linear time in the number of random bytes the API will generate.
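A rough Python analogue of such a buffer-filling interface is sketched below, assuming os.urandom as the underlying unpredictable-random source. The function name fill_random is hypothetical, and the integer return code simply mirrors the C convention described above rather than usual Python practice (which would raise an exception).

```python
import os

def fill_random(buffer):
    """Fill a bytearray in place with random bytes.

    Returns 0 on success and nonzero otherwise.
    """
    try:
        # Slice assignment overwrites the existing contents in place
        buffer[:] = os.urandom(len(buffer))
    except (NotImplementedError, OSError):
        # No usable randomness source was available
        return 1
    return 0
```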
Unpredictable-random and statistical-random implementations—
- should be reasonably fast for most applications, and
- should be safe for concurrent use by multiple threads, whenever convenient.
My document on random number generation methods includes details on eleven uniform random number methods; in my opinion, a new programming language's standard library ought to include those eleven methods separately for unpredictable-random and for statistical-random generators. That document also discusses how to implement other methods to generate random numbers or integers that follow a given distribution (such as a normal, geometric, binomial, or discrete weighted distribution) or fall within a given range.
There are special considerations in play when applications use RNGs to shuffle a list of items.
The first consideration touches on the shuffling method. The Fisher–Yates shuffle method does a substantially unbiased shuffle of a list, assuming the RNG it uses can choose from among all permutations of that list. However, that method is also easy to mess up (see also Jeff Atwood, "The danger of naïveté"); I give a correct implementation in another document.
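For illustration, a minimal sketch of the Fisher-Yates method in Python follows; it is not necessarily the implementation given in the other document. It assumes secrets.randbelow as the source of uniform random integers, and the function name shuffle is chosen here for illustration.

```python
import secrets

def shuffle(items):
    """Shuffle a list in place with the Fisher-Yates method."""
    for i in range(len(items) - 1, 0, -1):
        # j must be uniform over 0 through i inclusive. Choosing from
        # the whole list every time is the classic mistake that
        # biases the result.
        j = secrets.randbelow(i + 1)
        items[i], items[j] = items[j], items[i]
    return items
```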
Choosing from Among All Permutations
The second consideration is present if the application uses PRNGs for shuffling. If the PRNG's period is less than the number of distinct permutations (arrangements) of a list, then there are some permutations that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a sufficiently large list size, can't be done by any computer in a reasonable time.)
The number of distinct permutations is the multinomial coefficient m! / (w_1! × w_2! × ... × w_n!), where m is the list's size, n is the number of different items in the list, x! means "x factorial", and w_i is the number of times the item identified by i appears in the list. (This reduces to n! if the list consists of n different items.)
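As a small worked example, the multinomial coefficient can be computed directly and compared with a PRNG's period. The function names distinct_permutations and reachable_by_period are hypothetical.

```python
from collections import Counter
from math import factorial

def distinct_permutations(items):
    """Multinomial coefficient m! / (w_1! * w_2! * ... * w_n!)."""
    count = factorial(len(items))
    for w in Counter(items).values():
        count //= factorial(w)  # each division is exact
    return count

def reachable_by_period(items, period):
    """True if a PRNG with this period could, at best, reach every permutation."""
    return period >= distinct_permutations(items)
```

For instance, a 52-card deck has 52! distinct permutations, which exceeds 2^64, so `reachable_by_period(list(range(52)), 2**64)` is false: a PRNG with a 64-bit state cannot choose some shuffles of that deck.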
Formulas suggesting state lengths for PRNGs are implemented below in Python. For example, to shuffle a 52-item list, a PRNG with state length 226 or more is suggested, and to shuffle two 52-item lists of identical contents together, a PRNG with state length 500 or more is suggested.

    def fac(x):
        """ Calculates factorial of x. """
        if x<=1: return 1
        ret=1
        for i in range(x): ret=ret*(i+1)
        return ret

    def ceillog2(x):
        """ Calculates base-2 logarithm of x, rounded up. """
        ret=0; needCeil=True
        while x>1:
            # Round up once if any 1 bit is shifted out
            one=needCeil and ((x&1)!=0)
            x=x>>1
            if one: ret+=1; needCeil=False
            ret+=1
        return ret

    def stateLengthN(n):
        """ Suggested state length for PRNGs that shuffle
        a list of n items. """
        return ceillog2(fac(n))

    def stateLengthNChooseK(n, k):
        """ Suggested state length for PRNGs that choose k
        different items randomly from a list of n items
        (see RFC 3797, sec. 3.3) """
        return ceillog2(fac(n)//(fac(k)*fac(n-k)))

    def stateLengthDecks(numDecks, numCards):
        """ Suggested state length for PRNGs that shuffle
        multiple decks of cards in one. """
        return ceillog2(fac(numDecks*numCards)//(fac(numDecks)**numCards))
Whenever a statistical-random implementation or seeded RNG is otherwise called for, an application is encouraged to choose a PRNG with a state length suggested by the formulas above (and with the highest feasible period for that state length), where the choice of PRNG is based on—
- the maximum size of lists the application is expected to shuffle, if that number is less than 100; otherwise,
- the average size of such lists; or, if the application chooses,
- the application shuffling 100-item lists (which usually means a state length of 525 or greater).
(Practically speaking, for sufficiently large list sizes, any given PRNG will not be able to randomly choose some permutations of the list. See also "Lack of randomness" in the BigDeal document by van Staveren.)
The PRNG chosen this way—
- should meet or exceed the quality requirements of a statistical-random implementation, and
- should have been initialized automatically with an unpredictable seed before use.
A seemingly random number can be generated from arbitrary data using a hash function.
A hash function is a function that takes an arbitrary input of any size (such as a sequence of bytes or a sequence of characters) and returns an output with a fixed size. That output is also known as a hash code. (By definition, hash functions are deterministic. The definition includes a PRNG that takes the input as a seed and outputs a random number of fixed size(4).)
A hash code can be used as follows:
- The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on random number generation methods for techniques.)
- If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.
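The second use can be sketched as follows, assuming SHA-256 from Python's hashlib as the hash function; the function name bits_from_data is hypothetical. The returned integer could also serve as a seed for a PRNG, which covers the first use.

```python
import hashlib

def bits_from_data(data, nbits):
    """Take nbits seemingly random bits from the SHA-256 hash code of data."""
    if not 0 < nbits <= 256:
        raise ValueError("SHA-256 provides at most 256 bits")
    code = int.from_bytes(hashlib.sha256(data).digest(), "big")
    # Keep the top nbits bits of the 256-bit hash code
    return code >> (256 - nbits)
```

For example, `bits_from_data(b"level 3", 64)` always returns the same 64-bit integer for the same input bytes.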
For such purposes, applications should choose hash functions designed such that—
- every bit of the input affects every bit of the output without a clear preference for 0 or 1 (the so-called avalanche property), and
- if the hash function's use implicates information security, then—
- it is at least cost-prohibitive to find an unknown input that leads to a given output (the one-way property), and
- it is at least cost-prohibitive to find a second input that leads to the same output as that of a given input (second-preimage resistance, which is closely related to collision resistance).
GPU Programming Environments
Because, in general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU)—
- have limited access to some system resources compared with other programming environments,
- are designed for parallel execution, and
- do not store state,
random number generators for such environments are often designed as hash functions, because their output is determined solely by the input rather than both the input and state (as with PRNGs). Moreover, some of the hash functions which have been written in GLSL give undesirable results in computers whose GPUs support only 16-bit binary floating point numbers and no other kinds of numbers, which makes such GPUs an important consideration when choosing a hash function.
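Such a stateless generator can be sketched as follows. The sketch is written in Python for consistency with the other examples and assumes 32-bit unsigned integer arithmetic, which GPUs limited to 16-bit floating point numbers would not provide. The mixing steps are the 32-bit finalizer of MurmurHash3; the function pixel_random, its seed handling, and the added constant are hypothetical choices for illustration.

```python
MASK32 = 0xFFFFFFFF

def mix32(h):
    """MurmurHash3's 32-bit finalizer, used here for its avalanche behavior."""
    h &= MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h

def pixel_random(x, y, seed=0):
    """Stateless pseudorandom value in [0, 1) for integer coordinates."""
    # The constant keeps an all-zero input from hashing to zero
    h = mix32(seed ^ 0x9E3779B9)
    h = mix32(h ^ x)
    h = mix32(h ^ y)
    return h / 2.0**32
```

Because the output depends only on the inputs, every invocation (for example, one per pixel, running in parallel) can compute its value without sharing or storing any state.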
In this document, I made the distinction between statistical-random and unpredictable-random generators because that is how programming languages often present random number generators: they usually offer a general-purpose RNG (such as C's rand or Java's java.util.Random) and sometimes an RNG intended for information security purposes (such as
What has motivated me to write a more rigorous definition of random number generators is the fact that many applications still use weak RNGs. In my opinion, this is largely because most popular programming languages today—
- specify few and weak requirements on RNGs (such as C's
- specify a relatively weak general-purpose RNG (such as Java's java.util.Random, although it also includes a much stronger
- implement RNGs by default that leave a bit to be desired (particularly the Mersenne Twister algorithm found in PHP's mt_rand as well as in Python and Ruby),
- seed RNGs with a timestamp by default (such as the .NET Framework implementation of
- leave the default seeding fixed.
In conclusion, most applications that require random numbers usually want either unpredictability ("cryptographic security"), or speed and high quality. I believe that RNGs that meet the descriptions specified in the Unpredictable-Random Generators and Statistical-Random Generators sections will meet the needs of those applications.
In addition, this document recommends using unpredictable-random implementations in many cases, especially in information security contexts, and recommends easier programming interfaces for both unpredictable-random and statistical-random implementations in new programming languages.
- the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
- Lee Daniel Crocker, who reviewed this document and gave comments.
Request for Comments
Feel free to send comments. They could help improve this page.
Comments on any aspect of the document are welcome, but answers to the following would be particularly appreciated.
- Have I characterized the randomness needs of applications properly?
- Did I cover the vast majority of applications that require randomness?
- Are there existing programming language APIs or software libraries, not mentioned in this document, that already meet the requirements for unpredictable-random or statistical-random RNGs?
- Are there certain kinds of applications that require a different kind of RNG (unpredictable-random, statistical-random, seeded, etc.) than I recommended?
- In a typical computer a consumer would have today:
- How many random numbers per second would an unpredictable-random implementation generate? A statistical-random implementation?
- How many random numbers per second does a typical application using RNGs generate? Are there applications that usually generate considerably more random numbers than that per second?
(1) If a number generator uses a nonuniform distribution, but otherwise meets this definition, then it can be converted to one with a uniform distribution, at least in theory, by applying the nonuniform distribution's cumulative distribution function (CDF) to each generated number. A CDF returns, for each number, the probability for a randomly generated variable to be equal to or less than that number; the probability is 0 or greater and 1 or less. Further details on CDFs or this kind of conversion are outside the scope of this document.
(2) This statement appears because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.
(3) Noise implementations include cellular noise, value noise, gradient noise, colored noise (including white noise and pink noise), and noise following a Gaussian or other probability distribution. A noise implementation can use fractional Brownian motion to combine several layers of cellular, value, or gradient noise by calling the underlying noise function several times.
Note that usual implementations of noise (other than cellular, value, or gradient noise) don't sample each point of the sample space more than once; rather, all the samples are generated (e.g., with an RNG), then, for colored noise, a filter is applied to the samples.
(4) Note that some PRNGs (such as xorshift128+) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.