Introduction and Summary
Many applications rely on random number generators (RNGs);
however, it's not enough for random numbers to merely "look random". But unfortunately, most popular programming languages today—
- specify few and weak requirements on their built-in RNGs (such as C's
- specify a relatively weak general-purpose RNG (such as Java's
- implement RNGs by default that leave something to be desired (such as Mersenne Twister),
- seed RNGs with a timestamp by default (such as the .NET Framework implementation of
- use fixed seeds by default in their RNGs (as is the case in MATLAB and C; see also the question titled "Matlab rand and c++ rand()" on Stack Overflow),
so that as a result, many applications use RNGs, especially built-in RNGs, that have little assurance of high quality or security. That is why this document discusses high-quality RNGs and suggests existing implementations of them.
This document covers:
- Statistical and cryptographic RNGs(1) and seeded PRNGs (pseudorandom number generators), as well as recommendations on their use and properties.
- A discussion on when an application that needs numbers that "seem" random ought to specify their own "seed" (the initial state that the numbers are based on).
- Nondeterministic sources, entropy, and seed generation.
- Existing implementations of RNGs.
- Guidance for implementations of RNGs designed for reuse by applications.
- Issues on shuffling with an RNG.
This document does not cover:
- Testing an RNG implementation for correctness or adequate random number generation (e.g., Dörre and Klebanov 2016(2)).
- Generation of random numbers or keying material based at least in part on a password (e.g. key derivation functions).
- Generation of random numbers that follow a nonuniform distribution; I discuss this topic in another document.
- Low-discrepancy sequences (quasirandom sequences), such as Sobol sequences. Their structure differs in an essential way from independent uniform random numbers.
- Applications for which the selection of RNGs is constrained by regulatory requirements.
The following table summarizes the kinds of RNGs covered in this document:
|Kind of RNG||When to Use This RNG||Examples|
|A cryptographic RNG seeks to generate numbers that are cost-prohibitive to predict.||In information security cases, or when speed is not a concern.|
|A statistical RNG seeks to generate numbers that are uniformly random for most practical purposes.||When information security is not a concern, but speed is.|
|A seeded PRNG generates numbers that "seem" random given an initial "seed".||When generating reproducible "randomness" in a way not practical otherwise.||High-quality PRNG with custom seed|
About This Document
This is an open-source document; for an updated version, see the source code or its rendering on GitHub. You can send comments on this document either on CodeProject or on the GitHub issues page.
The following definitions are helpful in better understanding this document.
- Random number generator (RNG). Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed(3).
- Pseudorandom number generator (PRNG). A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time), and in which its state can be initialized and possibly reinitialized with arbitrary data.
- Seed. Arbitrary data for initializing the state of a PRNG.
- State length. The maximum size of the seed a PRNG can take to initialize its state without shortening or compressing that seed.
- Information security. Defined in ISO/IEC 27000.
- MUST, SHOULD, SHOULD NOT, MAY, RECOMMENDED, NOT RECOMMENDED. As defined in RFC 2119 and RFC 8174.
Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that not only "look random", but are cost-prohibitive to predict. Cryptographic RNGs are RECOMMENDED for applications that use random numbers for information security, such as—
- generating keying material, such as encryption keys,
- generating random passwords, nonces, or session identifiers,
- generating "salts" to vary hash codes of the same password,
- use in communications between two networked computers (including in data transfer, data transport, and messaging), and
- cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage,
as well as for applications that generate random numbers so infrequently that the RNG's speed is not a concern.
See "Cryptographic RNGs: Requirements" for requirements, and see "Existing RNG APIs in Programming Languages" for existing APIs.
Examples of cryptographic RNG implementations are:
- Randomness extractors or cryptographic hash functions that take very hard-to-predict signals from two or more nondeterministic sources as input.
- A "fast-key-erasure" random number generator described by D.J. Bernstein in his blog (Bernstein 2017)(4).
- An RNG implementation complying with NIST SP 800-90A. The SP 800-90 series goes into further detail on how RNGs appropriate for information security can be constructed, and inspired much of the "Cryptographic RNGs" section.
Compared to general-purpose computing devices such as desktop computers and smartphones, resource-constrained devices ("embedded" devices) are much less likely to have a cryptographic RNG available (Wetzels 2017)(5), although methods exist for implementing a cryptographic RNG on the Arduino (Peng 2017)(6).
Statistical RNGs are RNGs with a high quality of statistical randomness, but are not necessarily appropriate for information security. Even here, though, the numbers delivered by a statistical RNG not only "look random".
An application SHOULD NOT use RNGs with weaker quality than that of statistical RNGs.
An application SHOULD NOT use statistical RNGs in information security contexts, and it SHOULD NOT use them unless it generates random numbers so frequently that it would slow down undesirably if a cryptographic RNG were used instead.
See "Statistical RNGs: Requirements" for requirements.
Examples and Non-Examples
Examples of statistical RNGs include the following:
- xoshiro256** (state length 256 bits; nonzero seed).
- xoroshiro128** (state length 128 bits; nonzero seed).
Lehmer128 (for each: state length 128 bits; odd seed, so effectively 127 bits state length).
- XorShift* 128/64 (state length 128 bits; nonzero seed).
- XorShift* 64/32 (state length 64 bits; nonzero seed).
JLKISS64, described in (Jones 2007/2010)(7).
std::ranlux48 engine (state length 577 bits; nonzero seed).
- PCG (
pcg64_fast classes), by Melissa O'Neill. See also a critique by S. Vigna.
- Other examples include B. Jenkins's "A small noncryptographic PRNG" (sometimes called
jsf), C. Doty-Humphrey's
msws (Widynski 2017)(8), and D. Blackman's
The following also count as statistical RNGs, but are not preferred:
- Mersenne Twister shows a systematic failure in
BigCrush's LinearComp test. (See also (Vigna 2016)(9).)
xorshift128+. As described by (Blackman and Vigna 2018)(10), these linear PRNGs use weak scramblers, so that each output's lowest bits have low linear complexity even though the output as a whole has excellent statistical randomness. See also "Testing lowest bits in isolation".
Non-examples include the following:
- Any linear congruential generator with modulus 263 or less (such as
java.util.Random and C++'s
std::minstd_rand0 engines) has a state length of less than 64 bits.
System.Random, as implemented in the .NET Framework 4.7, can take a seed of at most 32 bits, so has a state length of at most 32 bits.
Some applications (such as simulations, machine learning, and some games) use pseudorandom number generators (PRNGs) to generate apparently "random" numbers starting from a known initial state, or "seed". Such applications usually care about repeatable "randomness". (Note that in the definitions for cryptographic and statistical RNGs given later, the PRNGs involved are automatically seeded before use.)
When to Use a Seeded PRNG
By using a seeded PRNG for repeatable "randomness", an application will be tied to that PRNG or its implementation. For this reason, an application SHOULD NOT use a seeded PRNG (rather than an automatically-initialized PRNG or another kind of RNG) unless—
- the application might need to generate the same "random" result multiple times,
- the application either—
- makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
- finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
- by saving the result to a file,
- by storing the "random" numbers for the feature generating the result to "replay" later, or
- by distributing the "random" numbers or results to networked users as they are generated, and
- any feature that uses such a PRNG to generate that "random" result will be consistent and deterministic, both across time and across supported hardware and operating systems, with respect to the "random" results it generates, for as long as that feature is still in use by the application.
Note: Meeting statement 3 is not exactly trivial.(11) Using only documented PRNGs, with implementation-independent behavior that will not change in the future, can help.
java.util.Random is one such PRNG, but none of the following is such a PRNG:
- The C
rand method, as well as C++'s random number distribution classes, such as
std::uniform_int_distribution, use implementation-defined algorithms for random number generation.
System.Random has random number generation behavior that could change in the future.
Which Seeded PRNG to Use
If an application decides to use a seeded PRNG for repeatable "randomness", that PRNG SHOULD meet or exceed the requirements of a statistical RNG (except the seed is application-defined instead) and SHOULD be reasonably fast.
Seed Generation for Seeded PRNGs
As much as possible, an application SHOULD generate seeds for a seeded PRNG using hard-to-predict data (e.g., the output of a cryptographic RNG) or as described in Nondeterministic Sources and Seed Generation.
Custom seeds can come into play in the following situations, among others.
Many kinds of games generate game content based on apparent randomness, such as—
- procedurally generated maps for a role-playing game,
- shuffling a digital deck of cards for a solitaire game, or
- a game board or puzzle board that normally varies every session,
where the game might need to generate the same content of that kind multiple times.
In general, such a game SHOULD NOT use a PRNG with a custom seed for such purposes unless—
- generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use, or
- the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content.
Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.
Example: Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. In this case, the game—
- MAY change the algorithm it uses to generate random maps, but
- SHOULD use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
- SHOULD continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.
A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the test ensures backward compatibility).
Noise is a randomized variation in images, sound, and other data. (See also Red Blob Games, "Noise Functions and Map Generation"). For the purposes of RNG recommendations, there are two kinds of noise:
Procedural noise is generated using a noise function, which is a function that takes an n-dimensional point and, optionally, additional data such as gradients or hash values, and outputs a seemingly random number.(12) Procedural noise includes cellular noise, value noise, and gradient noise (such as Perlin noise). The same instance of a procedural implementation SHOULD NOT output different "random" numbers at different times for the same n-dimensional point. Such an instance SHOULD be initialized either with "hard-coded" data only or RNG-generated data.
Nonprocedural noise is generated using the help of an RNG. Nonprocedural noise includes colored noise (including white noise and pink noise), periodic noise, and noise following a Gaussian or other probability distribution. For nonprocedural noise, the same considerations apply to any RNGs the noise implementation uses as in cases not involving noise.
Verifiable Random Numbers
Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, such information includes random numbers and/or uncertain data to be determined and publicly disclosed in the future. Generating verifiable randomness has been described in RFC 3797, in (Lenstra et al., 2015)(13), in (Boneh et al., 2018)(14) (which introduces the concept of verifiable delay functions, functions whose output deliberately takes time to compute but is easy to verify), and elsewhere.
Nondeterministic Sources and Seed Generation
To generate random numbers, RNGs ultimately rely on nondeterministic sources, that is, sources that don't return the same output for the same input every time. Such sources are used to help generate a seed for a PRNG, for example. The best nondeterministic sources for this purpose are those whose output is very hard to predict.
Examples of Nondeterministic Sources
Examples of nondeterministic sources are—
- disk access timings,
- timings of keystrokes and/or other input device interactions,
- thermal noise,
- the output of assembly instructions specially dedicated to random number generation, such as RdSeed,
- the output generated with A. Seznec's technique called hardware volatile entropy gathering and expansion (HAVEGE), provided a high-resolution counter is available, and
- differences between two high-resolution counter values taken in quick succession (such as in "Jitter RNG"; see (Müller)(15)).
RFC 4086, "Randomness Requirements for Security", section 3, contains a survey of nondeterministic sources.
- Online services that make random numbers available to applications, as well as outputs of audio and video devices (see RFC 4086 sec. 3.2.1), are additional nondeterministic sources. However, online services require Internet or other network access, and some of them require access credentials. Also, many mobile operating systems require applications to declare network, camera, and microphone access to users upon installation. For these reasons, these kinds of sources are NOT RECOMMENDED if other approaches are adequate.
- For noncryptographic RNGs, timestamps from the system clock are commonly used. Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.(16)
- For general-purpose use, nondeterministic sources that enable many high-quality seeds per second to be generated are highly advantageous, especially for a cryptographic RNG.
Entropy is a value that describes how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends min-entropy as the entropy measure. Characterizing a nondeterministic source's entropy is nontrivial and beyond the scope of this document. See also RFC 4086 section 2.
In general, especially for cryptographic RNGs, to generate an N-bit seed, enough data needs to be gathered from nondeterministic sources to reach N bits of entropy or more.
Once data with enough entropy is gathered, it might need to be condensed into a seed to initialize a PRNG with. Following (Cliff et al., 2009)(17), it is suggested to generate an N-bit seed by using an HMAC (hash-based message authentication code) algorithm, with outputs at least N times 3 bits long, on data with at least N times 3 bits of entropy, then truncating the output to N bits. See also NIST SP 800-90B sec. 18.104.22.168 and RFC 4086 sec. 4.2 and 5.2.
Existing RNG APIs in Programming Languages
As much as possible, applications SHOULD use existing libraries and techniques that already meet the requirements for cryptographic and statistical RNGs. The following table lists application programming interfaces (APIs) for such RNGs for popular programming languages.
- Methods and libraries mentioned in the "Statistical" column need to be initialized with a seed before use (for example, a seed generated using an implementation in the "Cryptographic" column).
- The mention of a third-party library in this section does not imply sponsorship or endorsement of that library, or imply a preference of that library over others. The list is not comprehensive.
- See also Paragon's blog post on existing cryptographic RNGs.
|.NET (incl. C# and VB.NET) (H)|
|airbreather/AirBreather.Common library (XorShift1024Star, XorShift128Plus, XoroShiro128Plus)|
xoroshiro128plus.c (128-bit nonzero seed);
xorshift128plus.c (128-bit nonzero seed); frostburn/jkiss library
secrets.SystemRandom (since Python 3.6);
pypcg package; ihaque/xorshift library (128-bit nonzero seed; default seed uses
|Java (A) (D)||(C); |
XORShift128Plus); jenetics/prngine (
crypto.randomBytes(byteCount) (node.js only);
random-number-csprng package (node.js only);
|Ruby (A) (E)||(C); |
SecureRandom.rand() (ranges from 0 to 1 exclusive) (E);
SecureRandom.rand(N) (integer) (E) (for both,
random_bytes() (both since PHP 7)
(A) The general RNGs of recent versions of Python and Ruby implement Mersenne Twister, which is not preferred for a statistical RNG. PHP's
mt_rand() implements or implemented a flawed version of Mersenne Twister.
prngine, a Java library, also has
MT19937_64Random classes that implement Mersenne Twister.
Math.random() (which ranges from 0 to 1 exclusive) is implemented using
xorshift128+ (or a variant) in the V8 engine, Firefox, and certain other modern browsers as of late 2017;
Math.random() uses an "implementation-dependent algorithm or strategy", though (see ECMAScript sec. 22.214.171.124).
(C) A cryptographic RNG implementation can—
- read from the
/dev/urandom device in most Unix-based systems (using the
read system calls where available)(18),
- call the
arc4random_buf method on FreeBSD or macOS,
- call the
getentropy method on OpenBSD, or
- call the
BCryptGenRandom API in Windows 7 and later,
and only use other techniques if the existing ones are inadequate for the application.
java.util.Random class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of
java.util.Random might be implemented to meet those requirements.
SecureRandom.rand methods present a beautiful and simple API for random number generation, in my opinion. Namely,
rand() returns a number from 0 to 1 exclusive, and
rand(N) returns an integer from 0 to N exclusive.
(F) Calling the
setSeed method of
SecureRandom before use is RECOMMENDED. The data passed to the method SHOULD be data described in note (C). (Despite the name,
setSeed supplements the existing seed, according to the documentation.) See also (Klyubin 2013)(19). Using the
"SHA1PRNG" is NOT RECOMMENDED, because of weaknesses in seeding and RNG quality in implementations as of 2013 (Michaelis et al., 2013)(20).
std::random_device, introduced in C++11, is NOT RECOMMENDED because its specification leaves considerably much to be desired. For example,
std::random_device can fall back to a pseudorandom number generator of unspecified quality without much warning. At best,
std::random_device SHOULD only be used to supplement other techniques for random number generation.
(H) The .NET Framework's
System.Random class uses a seed of at most 32 bits, so doesn't meet the statistical RNG requirements. However, a subclass of
System.Random might be implemented to meet those requirements.
How to Initialize RNGs
For cryptographic RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use.
For statistical RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use. (If doing so is a performance bottleneck for the application and a PRNG is used, the application MAY instead create one RNG instance for each thread, as described below for seeded PRNGs.)
For seeded PRNGs, to reduce the chance of correlated random numbers or identical random number sequences, an application is encouraged to create one or more instances of a PRNG, where each instance—
- is accessible to only one thread, task, or subtask of the application (such as with thread-local storage),
- is initialized with a seed that is unrelated to the other seeds (using sequential or linearly related seeds can cause undesirable correlations in some PRNGs), and
- MAY use a different conforming RNG scheme from the others.
(L'Ecuyer et al. 2015)(21), section 4, goes in greater detail on ways to initialize PRNGs for generating random numbers in parallel, including how to ensure repeatable "randomness" this way if that is desired.
In a list with
N different items, there are
N factorial (that is,
1 * 2 * ... * N, or
N!) ways to arrange the items in that list. These ways are called permutations(22).
An application can shuffle a list—
- by generating a random integer at least 0 and less than the number of permutations, and converting that integer to a permutation, or
- by doing a Fisher–Yates shuffle (which is unfortunately easy to mess up — see (Atwood 2007)(23) — and is implemented correctly in another document of mine).
Either way, however, if a PRNG's period is less than the number of permutations, then there are some permutations that that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a list big enough, can't be done by any computer in a reasonable time. A PRNG's period is the maximum size of a "random" number sequence it can generate before that sequence repeats; this size is no greater than 2L where L is the PRNG's state length.)
On the other hand, for a list big enough, it's generally more important to have shuffles act random than to choose from among all permutations.
An application that shuffles a list can do the shuffling—
- using a cryptographic RNG, preferably one with a security strength of
B bits or greater, or
- if a noncryptographic RNG is otherwise appropriate, using a PRNG that—
- has a state length of
B bits or greater, and
- qualifies as a statistical RNG except it's initialized with a seed derived from data with at least
B bits of entropy, or "randomness".
B can usually be calculated for different lists using the Python code in the appendix; see also (van Staveren 2000, "Lack of randomness")(24). For example,
B is 226 (bits) for a 52-item list. An application MAY limit
B to 256 or greater, in cases when variety of permutations is not important.
GPU Programming Environments
In general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU) are stateless (they read from and write to data without storing any state themselves). Approaches that have been used for random number generation in GPU environments include—
- using hash functions, whose output is determined solely by the input rather than both the input and state (as with PRNGs)(25), and
- sampling "noise textures" with random data in each pixel (Peters 2016)(26).
(L'Ecuyer et al. 2015)(27) discusses parallel generation of random numbers using PRNGs, especially on GPUs.
A seemingly random number can be generated from arbitrary data using a hash function.
A hash function is a function that takes an arbitrary input of any size (such as an array of 8-bit bytes or a sequence of characters) and returns an output with a fixed number of bits. That output is also known as a hash code. (By definition, hash functions are deterministic(28).)
A hash code can be used as follows:
- The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on random number generation methods for techniques.)
- If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.
Useful properties of some hash functions include—
- the avalanche property (every bit of the input affects every bit of the output without a clear preference for 0 or 1),
- collision resistance (finding two different inputs that lead to a given output is cost-prohibitive), and
- the one-way property (finding an unknown input that leads to a given output is cost-prohibitive) (see NIST SP 800-108).
Hash functions not used for information security SHOULD have the avalanche property (e.g, MurmurHash3, xxHash, CityHash). Hash functions used for information security SHOULD have the collision resistance, avalanche, and one-way properties (e.g., SHA2-256, BLAKE2).
Guidelines for New RNG APIs
This section contains guidelines for those seeking to implement RNGs designed for wide reuse (such as in a programming language's standard library). As mentioned earlier, an application SHOULD use existing RNG implementations whenever possible.
This section contains suggested requirements on cryptographic and statistical RNGs that a new programming language can choose to adopt.
Cryptographic RNGs: Requirements
A cryptographic RNG generates uniformly distributed random bits such that it would be cost-prohibitive for an outside party to correctly guess, with more than a 50% chance per bit, prior or future unseen outputs of that RNG after knowing how the RNG works and/or extremely many outputs of the RNG, or prior unseen outputs of that RNG after knowing the RNG's internal state at the given point in time.
If a cryptographic RNG implementation uses a PRNG, the following requirements apply.
The PRNG's state length MUST be at least 128 bits and SHOULD be at least 256 bits. The security strength used by the RNG MUST be at least 112 bits, SHOULD be at least 128 bits, and is less than or equal to the PRNG's state length.
Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—
- MUST have as many bits as the PRNG's state length,
- MUST consist of data that ultimately derives from the output of one or more nondeterministic sources, where the output is at least as hard to predict as ideal random data with as many bits as the security strength, and
- MAY be mixed with arbitrary data other than the seed as long as the result is no easier to predict(29).
The RNG SHOULD reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of bits without reseeding, that threshold SHOULD be 267 or less.
Statistical RNGs: Requirements
A statistical RNG generates random bits, each of which is uniformly distributed independently of the other bits, at least for nearly all practical purposes outside of information security. If the implementation uses a PRNG, that PRNG algorithm MUST either satisfy the collision resistance property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of
BigCrush, part of L'Ecuyer and Simard's "TestU01". The RNG need not be perfectly equidistributed.
If a statistical RNG implementation uses a PRNG, the following requirements apply.
The PRNG's state length MUST be at least 64 bits, SHOULD be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.
Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—
- MUST have as many bits as the PRNG's state length,
- MUST consist of data that ultimately derives from the output of one or more nondeterministic sources and/or cryptographic RNGs, where the output is encouraged to cover a state space of at least as many bits as the PRNG's state length, and
- MAY be mixed with arbitrary data other than the seed.
The RNG MAY reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of values without reseeding, that threshold SHOULD be 2L or less, where L is the PRNG's state length.
Implementing New RNG APIs
A programming language API designed for reuse by applications could implement RNGs using the following guidelines:
- The RNG API can include a method that fills one or more memory units (such as 8-bit bytes) completely with random bits (see example 1).
- If the API implements an automatically-initialized RNG, it SHOULD NOT allow applications to initialize that same RNG with a seed for repeatable "randomness"(30) (it MAY provide a separate PRNG to accept such a seed). See example 2.
- If the API provides a PRNG that an application can seed for repeatable "randomness", it SHOULD document that PRNG and any methods the API provides that use that PRNG (such as shuffling and Gaussian number generation), and SHOULD NOT change that PRNG or those methods in a way that would change the "random" numbers they deliver for a given seed. See example 2.
- My document on random number generation methods includes details on twelve uniform random number methods. In my opinion, a new programming language's standard library ought to include those twelve methods separately for cryptographic and for statistical RNGs.
- A C language RNG method for filling memory could look like the following:
int random(uint8_t bytes, size_t size);, where
bytes is a pointer to an array of 8-bit bytes, and
size is the number of random 8-bit bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise.
- A Java API that follows these guidelines can contain two classes: a
RandomGen class that implements an unspecified but general-purpose RNG, and a
RandomStable class that implements a PCG PRNG that is documented and will not change in the future.
RandomStable includes a constructor that takes a seed for repeatable "randomness", while
RandomGen does not. Both classes include methods to generate uniform random numbers, but
RandomStable specifies the exact algorithms to those methods and
RandomGen does not. At any time in the future,
RandomGen can change its implementation to use a different RNG while remaining backward compatible, while
RandomStable has to use the same algorithms for all time to remain backward compatible, especially because it takes a seed for repeatable "randomness".
- the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
- Lee Daniel Crocker, who reviewed this document and gave comments.
(1) A distinction between statistical and cryptographic RNGs seems natural, because many programming languages offer a general-purpose RNG (such as C's
rand or Java's
java.util.Random) and sometimes an RNG intended for information security purposes (such as Ruby's
(2) F. Dörre and V. Klebanov, "Practical Detection of Entropy Loss in Pseudo-Random Number Generators", 2016.
(3) If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using unbiasing, deskewing, or randomness extraction (see RFC 4086 sec. 4 or Cliff et al. 2009 for further discussion).
(4) Bernstein, D.J. "Fast-key-erasure random number generators", Jun. 23, 2017.
(5) Wetzels, J., "33C3: Analyzing Embedded Operating System Random Number Generators", samvartaka.github.io, Jan. 3, 2017.
(6) B. Peng, "Two Fast Methods of Generating True Random Numbers on the Arduino", GitHub Gist, December 2017.
(7) Jones, D., "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", 2007/2010.
(8) Widynski, B., "Middle Square Weyl Sequence RNG", arXiv:1704.00358v1 [cs.CR], 2017.
(9) S. Vigna, "An experimental exploration of Marsaglia's
xorshift generators, scrambled", 2016.
(10) Blackman, D., Vigna, S. "Scrambled Linear Pseudorandom Number Generators", 2018.
(11) An algorithm can become nondeterministic in many ways. For example, different results can happen because parallel operations can finish in a different order, items can be assigned differently to hash table buckets, or floating-point addition can be carried out in a different order.
Implementations of floating-point numbers and floating-point math can also differ, especially for math functions for which specifications don't always require delivering as-accurate-as-possible results (examples are the x87
FSIN instruction and the difference between
StrictMath in Java). Applications SHOULD avoid generating "random" floating-point numbers when they desire repeatable "randomness".
(12) Noise functions include functions that combine several outputs of a noise function, including by fractional Brownian motion. By definition, noise functions are deterministic.
(13) Lenstra, A.K., Wesolowski, B. "A random zoo: sloth, unicorn, and trx", 2015.
(14) Boneh, D., Bonneau, J., et al. "Verifiable Delay Functions", 2018.
(15) Müller, S. "CPU Time Jitter Based Non-Physical True Random Number Generator".
(16) For example, many questions on Stack Overflow highlight the pitfalls of creating a new instance of .NET's
System.Random each time a random number is needed, rather than only once in the application. See also the section "How to Initialize RNGs".
(17) Cliff, Y., Boyd, C., Gonzalez Nieto, J. "How to Extract and Expand Randomness: A Summary and Explanation of Existing Results", 2009.
(18) Using the similar
/dev/random is NOT RECOMMENDED, since in some implementations it can block for seconds at a time, especially if not enough randomness is available. See also "Myths about /dev/urandom".
(19) A. Klyubin, "Some SecureRandom Thoughts", Android Developers Blog, Aug. 14, 2013.
(20) Michaelis, K., Meyer, C., and Schwenk, J. "Randomly Failed! The State of Randomness in Current Java Implementations", 2013.
(21) P. L'Ecuyer, D. Munger, et al. "Random Numbers for Parallel Computers: Requirements and Methods, With Emphasis on GPUs". April 17, 2015.
(22) More generally, a list has
N! / (W_1! * W_2! * ... * W_K!) permutations (a multinomial coefficient), where
N is the list's size,
K is the number of different items in the list, and
W_i is the number of times the item identified by
i appears in the list. However, this number is never more than
N! and suggests using less randomness, so an application need not use this more complicated formula and MAY assume that a list has
N! permutations even if some of its items occur more than once.
(23) Atwood, Jeff. "The danger of naïveté", Dec. 7, 2007.
(24) van Staveren, Hans. "Big Deal: A new program for dealing bridge hands", Sep. 8, 2000
(25) The only binary floating-point numbers supported by some GPUs are 16-bit (with 10 significant bits of precision), notably not 32- or 64-bit as is otherwise common. An application ought to choose hash functions that deliver acceptable "noise" regardless of the size of floating-point numbers supported by the GPU.
(26) C. Peters, "Free blue noise textures", Moments in Graphics, Dec. 22, 2016. That article discusses the sampling of blue noise, not independent uniformly-distributed random numbers, but a similar approach applies to textures of noise however generated.
(27) P. L'Ecuyer, D. Munger, et al. "Random Numbers for Parallel Computers: Requirements and Methods, With Emphasis on GPUs". April 17, 2015.
(28) Note that although PRNGs can also act like hash functions (if they're seeded with the input and the PRNG is "large enough" for the input), some PRNGs (such as
xorshift128+) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.
(29) Such arbitrary data can include process identifiers, time stamps, environment variables, random numbers, virtual machine guest identifiers, and/or other data specific to the session or to the instance of the RNG. See also NIST SP800-90A and the references below.
Everspaugh, A., Zhai, Y., et al. "Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG", 2014.
Ristenpart, T., Yilek, S. "When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography", 2010.
Math.random() implementation to use a variant of
Math.random() to be seeded.) Nevertheless, APIs can still allow applications to provide additional input ("entropy") to the RNG in order to increase its randomness rather than to ensure repeatability.
Suggested Entropy Size
The following Python code suggests how many bits of entropy are needed for shuffling. For example:
- To shuffle an
n-item list, the suggested bits of entropy is at least as high as the base-2 logarithm, rounded up, of
- To shuffle a 52-item list, at least 226 bits of entropy is suggested (
- To shuffle two 52-item lists of identical contents together, at least 500 bits of entropy is suggested (
from math import factorial as fac
""" Calculates base-2 logarithm, rounded up, of x. """
one=needCeil and ((x&1)!=0)
""" Suggested state length (or bits of entropy)
for PRNGs that shuffle
a list of n items. """
def stateLengthNChooseK(n, k):
""" Suggested state length/entropy for PRNGs that choose k
different items randomly from a list of n items
(see RFC 3797, sec. 3.3) """
def stateLengthDecks(numDecks, numCards):
""" Suggested state length/entropy for PRNGs that shuffle
multiple decks of cards in one. """
return ceillog2(fac(numDecks*numCards)/ \