Introduction and Summary
Many applications rely on random number generators (RNGs); these RNGs include—
- statistical RNGs, which seek to generate numbers that follow a uniform random distribution,
- cryptographic RNGs, which seek to generate numbers that are cost-prohibitive to predict, and
- seeded PRNGs (pseudorandom number generators), which generate numbers that "seem" random given an initial "seed".
This document covers:
- Statistical and cryptographic RNGs, as well as recommendations on their use and properties.
- A discussion on when an application that needs numbers that "seem" random should specify their own "seed" (the initial state that the numbers are based on).
- An explanation of what programming language interfaces implement statistical and cryptographic RNGs, as well as advice on implementing them in programming languages.
- Issues on shuffling with an RNG.
This document does not cover:
- Testing an RNG implementation for correctness or adequate random number generation.
- Applications for which the selection of RNGs is constrained by statutory or regulatory requirements.
The following table summarizes the kinds of RNGs covered in this document:
|Kind of RNG||When to Use This RNG||Examples|
|Cryptographic RNG||In information security cases, or when speed is not a concern.|
|Statistical RNG||When information security is not a concern, but speed is. See also "Shuffling".|
|Seeded PRNG||When generating reproducible results in a way not practical otherwise.||High-quality PRNG with custom seed|
The following definitions are helpful in better understanding this document.
- Random number generator (RNG). Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed(1).
- Pseudorandom number generator (PRNG). A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time), and in which its state can be initialized and possibly reinitialized with arbitrary data.
- Seed. Arbitrary data for initializing the state of a PRNG.
- State length. The maximum size of the seed a PRNG can take to initialize its state without truncating or compressing that seed.
- Period. The maximum number of values in a generated sequence for a PRNG before that sequence repeats. The period will not be greater than 2L where L is the PRNG's state length.
- Stable. A programming interface is stable if it has no behavior that is unspecified, implementation-dependent, nondeterministic, or subject to future change.
- Information security. Defined in ISO/IEC 27000.
Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that are cost-prohibitive to predict. Such RNGs are indispensable in information security contexts, such as—
- generating keying material, such as encryption keys,
- generating random passwords, nonces, or session identifiers,
- generating "salts" to vary hash codes of the same password,
- use in communications between two networked computers,
- use in transfer, transport, messaging, and other communication protocols, and
- cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage.
They are also useful in cases where the application generates random numbers so infrequently that the RNG's speed is not a concern.
A cryptographic RNG ultimately relies on one or more nondeterministic sources, as described later in this section.
A cryptographic RNG implementation generates uniformly distributed random bits such that it would be at least cost-prohibitive for an outside party to guess either prior or future unseen bits of the random sequence correctly with more than a 50% chance per bit, even with knowledge of the randomness-generating procedure, the implementation's internal state at the given point in time, and/or extremely many outputs of the RNG. (If the sequence was generated directly by a PRNG, ensuring future bits are unguessable this way should be done wherever the implementation finds it feasible; for example, see "Seeding and Reseeding".)
Seeding and Reseeding
If a cryptographic RNG implementation uses a PRNG, the following requirements apply.
The PRNG's state length must be at least 128 bits and should be at least 256 bits. The security strength used by the RNG must be at least 112 bits, should be at least 128 bits, and may equal the PRNG's state length.
Before an instance of the RNG generates a random number, it must have been initialized ("seeded") with a seed defined as follows. The seed—
- must have as many bits as the PRNG's state length,
- must consist of data that ultimately derives from the output of one or more nondeterministic sources, where the output is at least as hard to predict as ideal random data with as many bits as the security strength, and
- may be mixed with arbitrary data other than the seed as long as the result is no easier to predict.
The RNG should be reseeded, using a newly generated seed as described earlier, to help ensure the unguessability of its output. If the implementation reseeds, it must do so before it generates more than 267 bits without reseeding, and should do so as often as feasible (whenever doing so would not slow down applications undesirably).
A cryptographic RNG ultimately relies on one or more nondeterministic sources (sources that don't always return the same output for the same input) for random number generation.(2) Examples of nondeterministic sources are—
- disk access timings,
- keystroke timings,
- thermal noise, and
- the output generated with A. Seznec's technique called hardware volatile entropy gathering and expansion, provided a high-resolution counter is available.
A value called entropy measures how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends min-entropy as the entropy measure and also details how nondeterministic sources can be used for information security.
If a cryptographic RNG implementation uses a PRNG, the output of the strongest nondeterministic source used to derive a seed ought to have as many bits of min-entropy as the security strength. If the implementation does not use a PRNG, the output of the strongest nondeterministic source used to derive an RNG output ought to have as many bits of min-entropy as the RNG output's size in bits.
Examples of cryptographic RNG implementations include the following:
/dev/random device on many Unix-based operating systems, which generally uses only nondeterministic sources; however, in some implementations of the device it can block for seconds at a time, especially if not enough randomness is available.
/dev/urandom device on many Unix-based operating systems, which often relies on both a PRNG and the same nondeterministic sources used by
BCryptGenRandom method in recent versions of Windows. (An independent analysis, published in 2007, showed flaws in an earlier version of
- Two-source extractors, multi-source extractors, or cryptographic hash functions that take very hard-to-predict signals from two or more nondeterministic sources as input.
- An RNG implementation complying with NIST SP 800-90A. The SP 800-90 series goes into further detail on how RNGs appropriate for information security can be constructed, and inspired much of the "Cryptographic RNGs" section.
Statistical RNGs are used, for example, in simulations, numerical integration, and many games to bring an element of chance and variation to the application, with the goal that each possible outcome is equally likely. However, statistical RNGs are generally suitable only if—
- information security is not involved, and
- the application generates random numbers so frequently that it would slow down undesirably if a cryptographic RNG were used instead.
If more than 20 items are being shuffled, a concerned application would be well advised to use alternatives to this kind of implementation (see "Shuffling").
A statistical RNG is usually implemented with a PRNG, but can also be implemented in a similar way as a cryptographic RNG provided it remains reasonably fast.
A statistical RNG generates random bits, each of which is uniformly randomly distributed independently of the other bits, at least for nearly all practical purposes. If the implementation uses a PRNG, that PRNG algorithm must either satisfy the one-way property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of
BigCrush, part of L'Ecuyer and Simard's "TestU01". The RNG need not be equidistributed.
Seeding and Reseeding
If a statistical RNG implementation uses a PRNG, the following requirements apply.
The PRNG's state length must be at least 64 bits, should be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.
Before an instance of the RNG generates a random number, it must have been initialized ("seeded") with a seed described as follows. The seed—
- must have as many bits as the PRNG's state length,
- must consist of data ultimately derived from the output of one or more nondeterministic sources (for example, the system clock) and/or cryptographic RNGs, where the output should cover a state space of at least as many bits as the PRNG's state length(3), and
- may be mixed with arbitrary data other than the seed.
The implementation is encouraged to reseed itself from time to time (using a newly generated seed as described earlier), especially if the PRNG has a state length less than 238 bits. If the implementation reseeds, it should do so before it generates more values than the square root of the PRNG's period without reseeding.
Examples and Non-Examples
Examples of statistical RNGs include the following:
- `xoshiro256`** (state length 256 bits; nonzero seed).
xoroshiro128+ (state length 128 bits; nonzero seed — but see note in the source code about the lowest bit of the PRNG's outputs).
Lehmer128 (state length 128 bits).
- XorShift* 128/64 (state length 128 bits; nonzero seed).
- XorShift* 64/32 (state length 64 bits; nonzero seed).
JKISS on top of page 3 of Jones 2010 (state length 128 bits; seed with four 32-bit nonzero pieces).
std::ranlux48 engine (state length 577 bits; nonzero seed).
The following also count as statistical RNGs, but are not preferred:
Non-examples include the following:
- Any linear congruential generator with modulus 263 or less (such as
java.util.Random and C++'s
std::minstd_rand0 engines) has a state length of less than 64 bits.
In addition, some applications use pseudorandom number generators (PRNGs) to generate results based on apparent randomness, starting from a known initial state, or "seed". Such applications usually care about reproducible results. (Note that in the definitions for cryptographic and statistical RNGs given earlier, the PRNGs involved are automatically seeded before use.)
An application should use a PRNG with a seed it specifies (rather than an automatically-initialized PRNG or another kind of RNG) only if—
- the initial state (the seed) which the "random" result will be generated from—
- is hard-coded,
- derived from user-entered data,
- is known to the application and was generated using a cryptographic or statistical RNG (as defined earlier),
- is a verifiable random number (as defined later), or
- is based on a timestamp (but only if the reproducible result is not intended to vary during the time specified on the timestamp and within the timestamp's granularity; for example, a year/month/day timestamp for a result that varies only daily),
- the application might need to generate the same "random" result multiple times,
- the application either—
- makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
- finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
- by saving the result to a file,
- by storing the random numbers for the feature generating the result to "replay" later, or
- by distributing the results or the random numbers to networked users as they are generated, and
- any feature using that random number generation method to generate that "random" result will remain backward compatible with respect to the "random" results it generates, for as long as that feature is still in use by the application.
Meeting recommendation 4 is aided by using stable PRNGs; see "Definitions" and the following examples:
java.util.Random is stable.
- The C
rand method is not stable (because the algorithm it uses is unspecified).
- C++'s random number distribution classes, such as
std::uniform_int_distribution, are not stable (because the algorithms they use are implementation-defined according to the specification).
System.Random is not stable (because its generation behavior could change in the future).
Recommendations for Seeded PRNGs
Which PRNG to use for generating reproducible results depends on the application. But as recommendations, any PRNG algorithm selected for producing reproducible results—
- should meet or exceed the quality requirements of a statistical RNG,
- should be reasonably fast, and
- should have a state length of 64 bits or greater.
Custom seeds can come into play in the following situations, among others.
Many kinds of games generate game content based on apparent randomness, such as—
- procedurally generated maps for a role-playing game,
- shuffling a digital deck of cards for a solitaire game, or
- a game board or puzzle board that normally varies every session,
where the game might need to generate the same content of that kind multiple times.
In general, such a game should use a PRNG with a custom seed for such purposes only if—
- generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use (see recommendations 2 and 3), or
- the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content (see recommendations 2 and 3).
Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.
Example: Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. Under recommendation 4, the game—
- may change the algorithm it uses to generate random maps, but
- should use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
- should continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.
A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the method meets recommendation 4).
Verifiable Random Numbers
Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, of the information used to derive such numbers—
- at least some of it is not known by anyone until some time after the announcement is made that those numbers will be generated, but all of it will eventually be publicly available, and
- some of it can be disclosed in the announcement that those numbers will be generated.
One process to generate verifiable random numbers is described in RFC 3797 (to the extent its advice is not specific to the Internet Engineering Task Force or its Nominations Committee). Although the source code given in that RFC uses the MD5 algorithm, the process does not preclude the use of hash functions stronger than MD5 (see the last paragraph of section 3.3 of that RFC).
Randomly generated numbers can serve as noise, that is, a randomized variation in images and sound. (See also Red Blob Games, "Noise Functions and Map Generation")(4). In general, the same considerations apply to any RNGs the noise implementation uses as in other cases.
However, a noise implementation that implements cellular noise, value noise, or gradient noise (such as Perlin noise) should, wherever feasible, use an RNG to initialize a table of gradients or hash values in advance, to be used later by the noise function (a function that outputs seemingly random numbers given an n-dimensional point). Those gradients or hash values may be "hard-coded" instead if the seeding recommendations apply to the noise generation (treating the hard-coded values as the seed).
If a cellular, value, or gradient noise implementation incorporates a hash function—
- that hash function should be reasonably fast, be stable (see "Definitions"), and have the so-called avalanche property, and
- if the seeding recommendations apply to the noise generation, the implementation should be initialized in advance with arbitrary data of fixed length to provide to the hash function as part of its input.
Programming Language APIs
The following table lists application programming interfaces (APIs) for
cryptographic and statistical RNGs for popular programming languages. Note the following:
- In single-threaded applications, for each kind of RNG, it's encouraged to create a single instance of the RNG on application startup and use that instance throughout the application.
- In multithreaded applications, for each kind of RNG, it's encouraged to either—
- create a single thread-safe instance of the RNG on application startup and use that instance throughout the application, or
- store separate and independently-initialized instances of the RNG in thread-local storage, so that each thread accesses a different instance (this might not always be ideal for cryptographic RNG implementations).
- Methods and libraries mentioned in the "Statistical" column need to be initialized with a seed before use (for example, a seed generated using an implementation in the "Cryptographic" column).
- The mention of a third-party library in this section does not imply sponsorship or endorsement
of that library, or imply a preference of that library over others. The list is not comprehensive.
xoroshiro128plus.c (128-bit nonzero seed);
xorshift128plus.c (128-bit nonzero seed)
secrets.SystemRandom (since Python 3.6);
|ihaque/xorshift library (128-bit nonzero seed; default seed uses |
random.seed() (19,936-bit seed) (A)
|Java (D)||(C); |
crypto.randomBytes(byteCount) (node.js only)
Math.random() (ranges from 0 through 1) (B)
SecureRandom class (
Random#rand() (ranges from 0 through 1) (A) (E);
Random#rand(N) (integer) (A) (E);
Random.new(seed) (default seed uses nondeterministic data)
random_int() (since PHP 7)
(A) General RNG implements Mersenne Twister, which is not preferred for a statistical RNG. PHP's
mt_rand() implements or implemented a flawed version of Mersenne Twister.
Math.random is implemented using
Math.random is "implementation-dependent", though, according to the ECMAScript specification.
(C) See "Advice for New Programming Language APIs" for implementation notes for cryptographic RNG implementations.
java.util.Random class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of
java.util.Random might be implemented to meet those requirements.
(E) In my opinion, Ruby's
Random#rand method presents a beautiful and simple API for random number generation.
(F) At least in Unix-based systems, calling the
SecureRandom constructor that takes a byte array is recommended. The byte array should be data described in note (C).
std::random_device, introduced in C++11, is not recommended because its specification leaves considerably much to be desired. For example,
std::random_device can fall back to a pseudorandom number generator of unspecified quality without much warning.
Advice for New Programming Language APIs
Wherever possible, applications should use existing libraries and techniques that already meet the requirements for cryptographic and statistical RNGs. For example—
a cryptographic RNG implementation can—
- read from the
/dev/random devices in most Unix-based systems (using the
read system calls where available),
- call the
getentropy method on OpenBSD, or
- call the
BCryptGenRandom API in recent Windows-based systems,
and only use other techniques if the existing solutions are inadequate in certain respects or in certain circumstances, and
- a statistical RNG implementation can use a PRNG algorithm mentioned as an example in the statistical RNGs section.
If existing solutions are inadequate, a programming language API could implement cryptographic and statistical RNGs by filling an output byte buffer with random bytes, where each bit in each byte will be randomly set to 0 or 1. For instance, a C language API for such RNGs could look like the following:
int random(uint8_t bytes, size_t size);, where "bytes" is a pointer to a byte array, and "size" is the number of random bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise. Any programming language API that implements such RNGs by filling a byte buffer ought to run in amortized linear time on the number of random bytes the API will generate.
Cryptographic and statistical RNG implementations—
- should be reasonably fast for most applications, and
- should be safe for concurrent use by multiple threads, whenever convenient.
My document on random number generation methods includes details on
eleven uniform random number methods; in my opinion, a new programming language's standard library ought to include
those eleven methods separately for cryptographic and for statistical RNGs. That document also
discusses how to implement other methods to generate random numbers or integers that follow a given distribution (such
as a normal, geometric, binomial, or discrete weighted distribution) or fall within a given range.
There are special considerations in play when applications use RNGs to shuffle a list of items.
The first consideration touches on the shuffling method. The Fisher–Yates shuffle method does a substantially unbiased shuffle of a list, assuming the RNG it uses can choose from among all permutations of that list. However, that method is also easy to mess up (see also Jeff Atwood, "The danger of naïveté"); I give a correct implementation in another document.
Choosing from Among All Permutations
The second consideration is present if the application uses PRNGs for shuffling. If the PRNG's period is less than the number of distinct permutations (arrangements) of a list, then there are some permutations that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a sufficiently large list size, can't be done by any computer in a reasonable time.)
The number of distinct permutations is the multinomial coefficient m! / (w1! × w2! × ... × wn!), where m is the list's size, n is the number of different items in the list, x! means "x factorial", and wi is the number of times the item identified by i appears in the list. (This reduces to n!, if the list consists of n different items).
Formulas suggesting state lengths for PRNGs are implemented below in Python. For example, to shuffle a 52-item list, a PRNG with state length 226 or more is suggested, and to shuffle two 52-item lists of identical contents together, a PRNG with state length 500 or more is suggested.
""" Calculates factorial of x. """
if x<=1: return 1
for i in range(x): ret=ret*(i+1)
""" Calculates base-2 logarithm of x, rounded up. """
one=needCeil and ((x&1)!=0)
""" Suggested state length for PRNGs that shuffle
a list of n items. """
def stateLengthNChooseK(n, k):
""" Suggested state length for PRNGs that choose k
different items randomly from a list of n items
(see RFC 3797, sec. 3.3) """
def stateLengthDecks(numDecks, numCards):
""" Suggested state length for PRNGs that shuffle
multiple decks of cards in one. """
return ceillog2(fac(numDecks*numCards)/ \
Whenever a statistical RNG implementation or seeded PRNG is otherwise called for, an application is encouraged to choose a PRNG with a state length suggested by the formulas above (and with the highest feasible period for that state length), where the choice of PRNG is based on—
- the maximum size of lists the application is expected to shuffle, if that number is less than 100; otherwise,
- the average size of such lists; or, if the application chooses,
- the application shuffling 100-item lists (which usually means a state length of 525 or greater).
(Practically speaking, for sufficiently large list sizes, any given PRNG will not be able to randomly choose some permutations of the list. See also "Lack of randomness" in the BigDeal document by van Staveren.)
The PRNG chosen this way should meet or exceed the quality requirements of a statistical RNG implementation.
A seemingly random number can be generated from arbitrary data using a hash function.
A hash function is a function that takes an arbitrary input of any size (such as a sequence of bytes or a sequence of characters) and returns an output with a fixed size. That output is also known as a hash code. (By definition, hash functions are deterministic. The definition includes a PRNG that takes the input as a seed and outputs a random number of fixed size(5).)
A hash code can be used as follows:
- The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on random number generation methods for techniques.)
- If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.
For such purposes, applications should choose hash functions designed such that every bit of the input affects every bit of the output without a clear preference for 0 or 1 (the so-called avalanche property). Hash functions used in information security contexts should be designed such that finding an unknown second input that leads to the same output as that of a given input is cost-prohibitive (the one-way property) and so is finding an unknown input that leads to a given output (collision resistance), and should have other information security properties depending on the application.
GPU Programming Environments
Because, in general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU)—
- have limited access to some system resources compared with other programming environments,
- are designed for parallel execution, and
- do not store state,
random number generators for such environments are often designed as hash functions, because their output is determined solely by the input rather than both the input and state (as with PRNGs). Moreover, some of the hash functions which have been written in GLSL give undesirable results in computers whose GPUs support only 16-bit binary floating point numbers and no other kinds of numbers, which makes such GPUs an important consideration when choosing a hash function.
In this document, I made the distinction between statistical and cryptographic RNGs because that is how programming languages often present random number generators — they usually offer a general-purpose RNG (such as C's
rand or Java's
java.util.Random) and sometimes an RNG intended for information security purposes (such as
What has motivated me to write a more rigorous definition of random number generators is the fact that many applications still use weak RNGs. In my opinion, this is largely because most popular programming languages today—
- specify few and weak requirements on RNGs (such as C's
- specify a relatively weak general-purpose RNG (such as Java's
java.math.Random, although it also includes a much stronger
- implement RNGs by default that leave something to be desired (particularly the Mersenne Twister algorithm found in PHP's
mt_rand as well as in Python and Ruby),
- seed RNGs with a timestamp by default (such as the .NET Framework implementation of
- leave the default seeding fixed.
In conclusion, most applications that require random numbers usually want either unpredictability ("cryptographic security"), or speed and high quality. I believe that RNGs that meet the descriptions specified in the Cryptographic RNGs and Statistical RNGs sections will meet the needs of those applications.
In addition, this document recommends using cryptographic RNG implementations in many cases, especially in information security contexts, and recommends easier programming interfaces for both cryptographic and statistical RNGs in new programming languages.
- the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
- Lee Daniel Crocker, who reviewed this document and gave comments.
Request for Comments
Feel free to send comments. They could help improve this page.
Comments on any aspect of the document are welcome, but answers to the following would be particularly appreciated.
- Have I characterized the randomness needs of applications properly?
- Did I cover the vast majority of applications that require randomness?
- Are there existing programming language APIs or software libraries, not mentioned in this document, that already meet the requirements for cryptographic or statistical RNGs?
- Are there certain kinds of applications that require a different kind of RNG (cryptographic, statistical, seeded, etc.) than I recommended?
- In a typical computer a consumer would have today, how many random numbers per second does a typical application using RNGs generate? Are there applications that usually generate considerably more random numbers than that per second?
(1) If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using unbiasing or randomness extraction methods that it is outside the scope of this document to describe.
(2) Nondeterministic sources that are reasonably fast for most applications (for instance, by enabling very many seeds to be generated per second), especially sources implemented in hardware, are highly advantageous in a cryptographic RNG.
(3) Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.
(4) Noise implementations include cellular noise, value noise, gradient noise, colored noise (including white noise and pink noise), and noise following a Gaussian or other probability distribution. A noise implementation can use fractional Brownian motion to combine several layers of cellular, value, or gradient noise by calling the underlying noise function several times.
Note that usual implementations of noise (other than cellular, value, or gradient noise) don't sample each point of the sample space more than once; rather, all the samples are generated (e.g., with an RNG), then, for colored noise, a filter is applied to the samples.
(5) Note that some PRNGs (such as
xorshift128+) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.