Click here to Skip to main content
13,900,622 members
Click here to Skip to main content
Add your own
alternative version

Tagged as

Stats

90.4K views
100 bookmarked
Posted 5 Mar 2016
Licenced Public Domain

Random Number Generator Recommendations for Applications

, 11 Mar 2019
Rate this:
Please Sign up or sign in to vote.
Most apps that use random numbers care about either unpredictability, speed/high quality, or repeatability. This article explains the three kinds of RNGs and gives recommendations on each kind.

Introduction and Summary

Many applications rely on random number generators (RNGs); however, it's not enough for random numbers to merely "look random". But unfortunately, most popular programming languages today—

  • specify few and weak requirements on their built-in RNGs (such as C's rand),
  • specify a relatively weak general-purpose RNG (such as Java's java.math.Random),
  • implement RNGs by default that leave something to be desired (such as Mersenne Twister),
  • seed RNGs with a timestamp by default (such as the .NET Framework implementation of System.Random), and/or
  • use fixed seeds by default in their RNGs (as is the case in MATLAB and C; see also the question titled "Matlab rand and c++ rand()" on Stack Overflow),

so that as a result, many applications use RNGs, especially built-in RNGs, that have little assurance of high quality or security. That is why this document discusses high-quality RNGs and suggests existing implementations of them.

This document covers:

  • Statistical and cryptographic RNGs(1) and seeded PRNGs (pseudorandom number generators), as well as recommendations on their use and properties.
  • A discussion on when an application that needs numbers that "seem" random ought to specify their own "seed" (the initial state that the numbers are based on).
  • Nondeterministic sources, entropy, and seed generation.
  • Existing implementations of RNGs.
  • Guidance for implementations of RNGs designed for reuse by applications.
  • Issues on shuffling with an RNG.

This document does not cover:

  • Testing an RNG implementation for correctness or adequate random number generation (e.g., Dörre and Klebanov 2016(2)).
  • Generation of random numbers or keying material based at least in part on a password (e.g. key derivation functions).
  • Generation of random numbers that follow a nonuniform distribution; I discuss this topic in another document.
  • Low-discrepancy sequences (quasirandom sequences), such as Sobol sequences. Their structure differs in an essential way from independent uniform random numbers.
  • Applications for which the selection of RNGs is constrained by regulatory requirements.

The following table summarizes the kinds of RNGs covered in this document:

Kind of RNGWhen to Use This RNGExamples
A cryptographic RNG seeks to generate numbers that are cost-prohibitive to predict.In information security cases, or when speed is not a concern./dev/urandom, BCryptGenRandom
A statistical RNG seeks to generate numbers that are uniformly random for most practical purposes.When information security is not a concern, but speed is.pcg64, xoshiro256**
A seeded PRNG generates numbers that "seem" random given an initial "seed".When generating reproducible "randomness" in a way not practical otherwise.High-quality PRNG with custom seed

About This Document

This is an open-source document; for an updated version, see the source code or its rendering on GitHub. You can send comments on this document either on CodeProject or on the GitHub issues page.

Contents

Definitions

The following definitions are helpful in better understanding this document.

  • Random number generator (RNG). Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed(3).
  • Pseudorandom number generator (PRNG). A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time), and in which its state can be initialized and possibly reinitialized with arbitrary data.
  • Seed. Arbitrary data for initializing the state of a PRNG.
  • State length. The maximum size of the seed a PRNG can take to initialize its state without shortening or compressing that seed.
  • Information security. Defined in ISO/IEC 27000.
  • MUST, SHOULD, SHOULD NOT, MAY, RECOMMENDED, NOT RECOMMENDED. As defined in RFC 2119 and RFC 8174.

Cryptographic RNGs

Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that not only "look random", but are cost-prohibitive to predict. Cryptographic RNGs are RECOMMENDED for applications that use random numbers for information security, such as—

  • generating keying material, such as encryption keys,
  • generating random passwords, nonces, or session identifiers,
  • generating "salts" to vary hash codes of the same password,
  • use in communications between two networked computers (including in data transfer, data transport, and messaging), and
  • cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage,

as well as for applications that generate random numbers so infrequently that the RNG's speed is not a concern.

See "Cryptographic RNGs: Requirements" for requirements, and see "Existing RNG APIs in Programming Languages" for existing APIs.

Examples

Examples of cryptographic RNG implementations are:

  • Randomness extractors or cryptographic hash functions that take very hard-to-predict signals from two or more nondeterministic sources as input.
  • A "fast-key-erasure" random number generator described by D.J. Bernstein in his blog (Bernstein 2017)(4).
  • An RNG implementation complying with NIST SP 800-90A. The SP 800-90 series goes into further detail on how RNGs appropriate for information security can be constructed, and inspired much of the "Cryptographic RNGs" section.

Resource-Constrained Devices

Compared to general-purpose computing devices such as desktop computers and smartphones, resource-constrained devices ("embedded" devices) are much less likely to have a cryptographic RNG available (Wetzels 2017)(5), although methods exist for implementing a cryptographic RNG on the Arduino (Peng 2017)(6).

Statistical RNGs

Statistical RNGs are RNGs with a high quality of statistical randomness, but are not necessarily appropriate for information security. Even here, though, the numbers delivered by a statistical RNG not only "look random".

An application SHOULD NOT use RNGs with weaker quality than that of statistical RNGs.

An application SHOULD NOT use statistical RNGs in information security contexts, and it SHOULD NOT use them unless it generates random numbers so frequently that it would slow down undesirably if a cryptographic RNG were used instead.

See "Statistical RNGs: Requirements" for requirements.

Examples and Non-Examples

Examples of statistical RNGs include the following:

  • xoshiro256** (state length 256 bits; nonzero seed).
  • xoroshiro128** (state length 128 bits; nonzero seed).
  • Lehmer64 and Lehmer128 (for each: state length 128 bits; odd seed, so effectively 127 bits state length).
  • XorShift* 128/64 (state length 128 bits; nonzero seed).
  • XorShift* 64/32 (state length 64 bits; nonzero seed).
  • JKISS, JKISS32, JLKISS, JLKISS64, described in (Jones 2007/2010)(7).
  • C++'s std::ranlux48 engine (state length 577 bits; nonzero seed).
  • PCG (pcg32, pcg64, and pcg64_fast classes), by Melissa O'Neill. See also a critique by S. Vigna.
  • Other examples include B. Jenkins's "A small noncryptographic PRNG" (sometimes called jsf), C. Doty-Humphrey's sfc, msws (Widynski 2017)(8), and D. Blackman's gjrand.

The following also count as statistical RNGs, but are not preferred:

  • Mersenne Twister shows a systematic failure in BigCrush's LinearComp test. (See also (Vigna 2016)(9).)
  • xoroshiro128+, xoshiro256+, and xorshift128+. As described by (Blackman and Vigna 2018)(10), these linear PRNGs use weak scramblers, so that each output's lowest bits have low linear complexity even though the output as a whole has excellent statistical randomness. See also "Testing lowest bits in isolation".

Non-examples include the following:

  • Any linear congruential generator with modulus 263 or less (such as java.util.Random and C++'s std::minstd_rand and std::minstd_rand0 engines) has a state length of less than 64 bits.
  • System.Random, as implemented in the .NET Framework 4.7, can take a seed of at most 32 bits, so has a state length of at most 32 bits.

Seeded PRNGs

Some applications (such as simulations, machine learning, and some games) use pseudorandom number generators (PRNGs) to generate apparently "random" numbers starting from a known initial state, or "seed". Such applications usually care about repeatable "randomness". (Note that in the definitions for cryptographic and statistical RNGs given later, the PRNGs involved are automatically seeded before use.)

When to Use a Seeded PRNG

By using a seeded PRNG for repeatable "randomness", an application will be tied to that PRNG or its implementation. For this reason, an application SHOULD NOT use a seeded PRNG (rather than an automatically-initialized PRNG or another kind of RNG) unless—

  1. the application might need to generate the same "random" result multiple times,
  2. the application either—
    • makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
    • finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
      • by saving the result to a file,
      • by storing the "random" numbers for the feature generating the result to "replay" later, or
      • by distributing the "random" numbers or results to networked users as they are generated, and
  3. any feature that uses such a PRNG to generate that "random" result will be consistent and deterministic, both across time and across supported hardware and operating systems, with respect to the "random" results it generates, for as long as that feature is still in use by the application.

Note: Meeting statement 3 is not exactly trivial.(11) Using only documented PRNGs, with implementation-independent behavior that will not change in the future, can help. java.util.Random is one such PRNG, but none of the following is such a PRNG:

  • The C rand method, as well as C++'s random number distribution classes, such as std::uniform_int_distribution, use implementation-defined algorithms for random number generation.
  • .NET's System.Random has random number generation behavior that could change in the future.

Which Seeded PRNG to Use

If an application decides to use a seeded PRNG for repeatable "randomness", that PRNG SHOULD meet or exceed the requirements of a statistical RNG (except the seed is application-defined instead) and SHOULD be reasonably fast.

Seed Generation for Seeded PRNGs

As much as possible, an application SHOULD generate seeds for a seeded PRNG using hard-to-predict data (e.g., the output of a cryptographic RNG) or as described in Nondeterministic Sources and Seed Generation.

Examples

Custom seeds can come into play in the following situations, among others.

Games

Many kinds of games generate game content based on apparent randomness, such as—

  • procedurally generated maps for a role-playing game,
  • shuffling a digital deck of cards for a solitaire game, or
  • a game board or puzzle board that normally varies every session,

where the game might need to generate the same content of that kind multiple times.

In general, such a game SHOULD NOT use a PRNG with a custom seed for such purposes unless—

  1. generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use, or
  2. the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content.

Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.

Example: Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. In this case, the game—

  • MAY change the algorithm it uses to generate random maps, but
  • SHOULD use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
  • SHOULD continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.

Unit Tests

A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the test ensures backward compatibility).

Noise

Noise is a randomized variation in images, sound, and other data. (See also Red Blob Games, "Noise Functions and Map Generation"). For the purposes of RNG recommendations, there are two kinds of noise:

  1. Procedural noise is generated using a noise function, which is a function that takes an n-dimensional point and, optionally, additional data such as gradients or hash values, and outputs a seemingly random number.(12) Procedural noise includes cellular noise, value noise, and gradient noise (such as Perlin noise). The same instance of a procedural implementation SHOULD NOT output different "random" numbers at different times for the same n-dimensional point. Such an instance SHOULD be initialized either with "hard-coded" data only or RNG-generated data.

  2. Nonprocedural noise is generated using the help of an RNG. Nonprocedural noise includes colored noise (including white noise and pink noise), periodic noise, and noise following a Gaussian or other probability distribution. For nonprocedural noise, the same considerations apply to any RNGs the noise implementation uses as in cases not involving noise.

Verifiable Random Numbers

Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, such information includes random numbers and/or uncertain data to be determined and publicly disclosed in the future. Generating verifiable randomness has been described in RFC 3797, in (Lenstra et al., 2015)(13), in (Boneh et al., 2018)(14) (which introduces the concept of verifiable delay functions, functions whose output deliberately takes time to compute but is easy to verify), and elsewhere.

Nondeterministic Sources and Seed Generation

To generate random numbers, RNGs ultimately rely on nondeterministic sources, that is, sources that don't return the same output for the same input every time. Such sources are used to help generate a seed for a PRNG, for example. The best nondeterministic sources for this purpose are those whose output is very hard to predict.

Examples of Nondeterministic Sources

Examples of nondeterministic sources are—

  • disk access timings,
  • timings of keystrokes and/or other input device interactions,
  • thermal noise,
  • the output of assembly instructions specially dedicated to random number generation, such as RdSeed,
  • the output generated with A. Seznec's technique called hardware volatile entropy gathering and expansion (HAVEGE), provided a high-resolution counter is available, and
  • differences between two high-resolution counter values taken in quick succession (such as in "Jitter RNG"; see (Müller)(15)).

RFC 4086, "Randomness Requirements for Security", section 3, contains a survey of nondeterministic sources.

Notes:

  1. Online services that make random numbers available to applications, as well as outputs of audio and video devices (see RFC 4086 sec. 3.2.1), are additional nondeterministic sources. However, online services require Internet or other network access, and some of them require access credentials. Also, many mobile operating systems require applications to declare network, camera, and microphone access to users upon installation. For these reasons, these kinds of sources are NOT RECOMMENDED if other approaches are adequate.
  2. For noncryptographic RNGs, timestamps from the system clock are commonly used. Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.(16)
  3. For general-purpose use, nondeterministic sources that enable many high-quality seeds per second to be generated are highly advantageous, especially for a cryptographic RNG.

Entropy

Entropy is a value that describes how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends min-entropy as the entropy measure. Characterizing a nondeterministic source's entropy is nontrivial and beyond the scope of this document. See also RFC 4086 section 2.

Seed Generation

In general, especially for cryptographic RNGs, to generate an N-bit seed, enough data needs to be gathered from nondeterministic sources to reach N bits of entropy or more.

Once data with enough entropy is gathered, it might need to be condensed into a seed to initialize a PRNG with. Following (Cliff et al., 2009)(17), it is suggested to generate an N-bit seed by using an HMAC (hash-based message authentication code) algorithm, with outputs at least N times 3 bits long, on data with at least N times 3 bits of entropy, then truncating the output to N bits. See also NIST SP 800-90B sec. 3.1.5.1 and RFC 4086 sec. 4.2 and 5.2.

Existing RNG APIs in Programming Languages

As much as possible, applications SHOULD use existing libraries and techniques that already meet the requirements for cryptographic and statistical RNGs. The following table lists application programming interfaces (APIs) for such RNGs for popular programming languages.

  • Methods and libraries mentioned in the "Statistical" column need to be initialized with a seed before use (for example, a seed generated using an implementation in the "Cryptographic" column).
  • The mention of a third-party library in this section does not imply sponsorship or endorsement of that library, or imply a preference of that library over others. The list is not comprehensive.
  • See also Paragon's blog post on existing cryptographic RNGs.
LanguageCryptographicStatistical
.NET (incl. C# and VB.NET) (H)RNGCryptoServiceProvider in System.Security.Cryptography namespaceairbreather/AirBreather.Common library (XorShift1024Star, XorShift128Plus, XoroShiro128Plus)
C/C++ (G)(C)xoroshiro128plus.c (128-bit nonzero seed); xorshift128plus.c (128-bit nonzero seed); frostburn/jkiss library
Python (A)secrets.SystemRandom (since Python 3.6); os.urandom()pypcg package; ihaque/xorshift library (128-bit nonzero seed; default seed uses os.urandom())
Java (A) (D)(C); java.security.SecureRandom (F)grunka/xorshift (XORShift1024Star or XORShift128Plus); jenetics/prngine (KISS32Random, KISS64Random)
JavaScript (B)crypto.randomBytes(byteCount) (node.js only); random-number-csprng package (node.js only); crypto.getRandomValues() (Web)pcg-random or xoroshiro128starstar package
Ruby (A) (E)(C); SecureRandom.rand() (ranges from 0 to 1 exclusive) (E); SecureRandom.rand(N) (integer) (E) (for both, require 'securerandom'); sysrandom gem
PHP (A)random_int(), random_bytes() (both since PHP 7)
Gocrypto/rand package
Other Languages(C)

(A) The general RNGs of recent versions of Python and Ruby implement Mersenne Twister, which is not preferred for a statistical RNG. PHP's mt_rand() implements or implemented a flawed version of Mersenne Twister. prngine, a Java library, also has MT19937_32Random and MT19937_64Random classes that implement Mersenne Twister.

(B) JavaScript's Math.random() (which ranges from 0 to 1 exclusive) is implemented using xorshift128+ (or a variant) in the V8 engine, Firefox, and certain other modern browsers as of late 2017; Math.random() uses an "implementation-dependent algorithm or strategy", though (see ECMAScript sec. 20.2.2.27).

(C) A cryptographic RNG implementation can—

  • read from the /dev/urandom device in most Unix-based systems (using the open and read system calls where available)(18),
  • call the arc4random or arc4random_buf method on FreeBSD or macOS,
  • call the getentropy method on OpenBSD, or
  • call the BCryptGenRandom API in Windows 7 and later,

and only use other techniques if the existing ones are inadequate for the application.

(D) Java's java.util.Random class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of java.util.Random might be implemented to meet those requirements.

(E) Ruby's Random#rand and SecureRandom.rand methods present a beautiful and simple API for random number generation, in my opinion. Namely, rand() returns a number from 0 to 1 exclusive, and rand(N) returns an integer from 0 to N exclusive.

(F) Calling the setSeed method of SecureRandom before use is RECOMMENDED. The data passed to the method SHOULD be data described in note (C). (Despite the name, setSeed supplements the existing seed, according to the documentation.) See also (Klyubin 2013)(19). Using the SecureRandom implementation "SHA1PRNG" is NOT RECOMMENDED, because of weaknesses in seeding and RNG quality in implementations as of 2013 (Michaelis et al., 2013)(20).

(G) std::random_device, introduced in C++11, is NOT RECOMMENDED because its specification leaves considerably much to be desired. For example, std::random_device can fall back to a pseudorandom number generator of unspecified quality without much warning. At best, std::random_device SHOULD only be used to supplement other techniques for random number generation.

(H) The .NET Framework's System.Random class uses a seed of at most 32 bits, so doesn't meet the statistical RNG requirements. However, a subclass of System.Random might be implemented to meet those requirements.

RNG Topics

 

How to Initialize RNGs

For cryptographic RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use.

For statistical RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use. (If doing so is a performance bottleneck for the application and a PRNG is used, the application MAY instead create one RNG instance for each thread, as described below for seeded PRNGs.)

For seeded PRNGs, to reduce the chance of correlated random numbers or identical random number sequences, an application is encouraged to create one or more instances of a PRNG, where each instance—

  • is accessible to only one thread, task, or subtask of the application (such as with thread-local storage),
  • is initialized with a seed that is unrelated to the other seeds (using sequential or linearly related seeds can cause undesirable correlations in some PRNGs), and
  • MAY use a different conforming RNG scheme from the others.

(L'Ecuyer et al. 2015)(21), section 4, goes in greater detail on ways to initialize PRNGs for generating random numbers in parallel, including how to ensure repeatable "randomness" this way if that is desired.

Shuffling

In a list with N different items, there are N factorial (that is, 1 * 2 * ... * N, or N!) ways to arrange the items in that list. These ways are called permutations(22).

An application can shuffle a list

  • by generating a random integer at least 0 and less than the number of permutations, and converting that integer to a permutation, or
  • by doing a Fisher–Yates shuffle (which is unfortunately easy to mess up — see (Atwood 2007)(23) — and is implemented correctly in another document of mine).

Either way, however, if a PRNG's period is less than the number of permutations, then there are some permutations that that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a list big enough, can't be done by any computer in a reasonable time. A PRNG's period is the maximum size of a "random" number sequence it can generate before that sequence repeats; this size is no greater than 2L where L is the PRNG's state length.)

On the other hand, for a list big enough, it's generally more important to have shuffles act random than to choose from among all permutations.

An application that shuffles a list can do the shuffling—

  1. using a cryptographic RNG, preferably one with a security strength of B bits or greater, or
  2. if a noncryptographic RNG is otherwise appropriate, using a PRNG that—
    • has a state length of B bits or greater, and
    • qualifies as a statistical RNG except it's initialized with a seed derived from data with at least B bits of entropy, or "randomness".

Here, B can usually be calculated for different lists using the Python code in the appendix; see also (van Staveren 2000, "Lack of randomness")(24). For example, B is 226 (bits) for a 52-item list. An application MAY limit B to 256 or greater, in cases when variety of permutations is not important.

GPU Programming Environments

In general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU) are stateless (they read from and write to data without storing any state themselves). Approaches that have been used for random number generation in GPU environments include—

  • using hash functions, whose output is determined solely by the input rather than both the input and state (as with PRNGs)(25), and
  • sampling "noise textures" with random data in each pixel (Peters 2016)(26).

(L'Ecuyer et al. 2015)(27) discusses parallel generation of random numbers using PRNGs, especially on GPUs.

Hash Functions

A seemingly random number can be generated from arbitrary data using a hash function.

A hash function is a function that takes an arbitrary input of any size (such as an array of 8-bit bytes or a sequence of characters) and returns an output with a fixed number of bits. That output is also known as a hash code. (By definition, hash functions are deterministic(28).)

A hash code can be used as follows:

  • The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on random number generation methods for techniques.)
  • If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.

Useful properties of some hash functions include—

  • the avalanche property (every bit of the input affects every bit of the output without a clear preference for 0 or 1),
  • collision resistance (finding two different inputs that lead to a given output is cost-prohibitive), and
  • the one-way property (finding an unknown input that leads to a given output is cost-prohibitive) (see NIST SP 800-108).

Hash functions not used for information security SHOULD have the avalanche property (e.g, MurmurHash3, xxHash, CityHash). Hash functions used for information security SHOULD have the collision resistance, avalanche, and one-way properties (e.g., SHA2-256, BLAKE2).

Guidelines for New RNG APIs

This section contains guidelines for those seeking to implement RNGs designed for wide reuse (such as in a programming language's standard library). As mentioned earlier, an application SHOULD use existing RNG implementations whenever possible.

This section contains suggested requirements on cryptographic and statistical RNGs that a new programming language can choose to adopt.

Cryptographic RNGs: Requirements

A cryptographic RNG generates uniformly distributed random bits such that it would be cost-prohibitive for an outside party to correctly guess, with more than a 50% chance per bit, prior or future unseen outputs of that RNG after knowing how the RNG works and/or extremely many outputs of the RNG, or prior unseen outputs of that RNG after knowing the RNG's internal state at the given point in time.

If a cryptographic RNG implementation uses a PRNG, the following requirements apply.

  1. The PRNG's state length MUST be at least 128 bits and SHOULD be at least 256 bits. The security strength used by the RNG MUST be at least 112 bits, SHOULD be at least 128 bits, and is less than or equal to the PRNG's state length.

  2. Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—

    • MUST have as many bits as the PRNG's state length,
    • MUST consist of data that ultimately derives from the output of one or more nondeterministic sources, where the output is at least as hard to predict as ideal random data with as many bits as the security strength, and
    • MAY be mixed with arbitrary data other than the seed as long as the result is no easier to predict(29).
  3. The RNG SHOULD reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of bits without reseeding, that threshold SHOULD be 267 or less.

Statistical RNGs: Requirements

A statistical RNG generates random bits, each of which is uniformly distributed independently of the other bits, at least for nearly all practical purposes outside of information security. If the implementation uses a PRNG, that PRNG algorithm MUST either satisfy the collision resistance property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of BigCrush, part of L'Ecuyer and Simard's "TestU01". The RNG need not be perfectly equidistributed.

If a statistical RNG implementation uses a PRNG, the following requirements apply.

  1. The PRNG's state length MUST be at least 64 bits, SHOULD be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.

  2. Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—

    • MUST have as many bits as the PRNG's state length,
    • MUST consist of data that ultimately derives from the output of one or more nondeterministic sources and/or cryptographic RNGs, where the output is encouraged to cover a state space of at least as many bits as the PRNG's state length, and
    • MAY be mixed with arbitrary data other than the seed.
  3. The RNG MAY reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of values without reseeding, that threshold SHOULD be 2L or less, where L is the PRNG's state length.

Implementing New RNG APIs

A programming language API designed for reuse by applications could implement RNGs using the following guidelines:

  1. The RNG API can include a method that fills one or more memory units (such as 8-bit bytes) completely with random bits (see example 1).
  2. If the API implements an automatically-initialized RNG, it SHOULD NOT allow applications to initialize that same RNG with a seed for repeatable "randomness"(30) (it MAY provide a separate PRNG to accept such a seed). See example 2.
  3. If the API provides a PRNG that an application can seed for repeatable "randomness", it SHOULD document that PRNG and any methods the API provides that use that PRNG (such as shuffling and Gaussian number generation), and SHOULD NOT change that PRNG or those methods in a way that would change the "random" numbers they deliver for a given seed. See example 2.
  4. My document on random number generation methods includes details on twelve uniform random number methods. In my opinion, a new programming language's standard library ought to include those twelve methods separately for cryptographic and for statistical RNGs.

Examples:

  1. A C language RNG method for filling memory could look like the following: int random(uint8_t[] bytes, size_t size);, where bytes is a pointer to an array of 8-bit bytes, and size is the number of random 8-bit bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise.
  2. A Java API that follows these guidelines can contain two classes: a RandomGen class that implements an unspecified but general-purpose RNG, and a RandomStable class that implements a PCG PRNG that is documented and will not change in the future. RandomStable includes a constructor that takes a seed for repeatable "randomness", while RandomGen does not. Both classes include methods to generate uniform random numbers, but RandomStable specifies the exact algorithms to those methods and RandomGen does not. At any time in the future, RandomGen can change its implementation to use a different RNG while remaining backward compatible, while RandomStable has to use the same algorithms for all time to remain backward compatible, especially because it takes a seed for repeatable "randomness".

Acknowledgments

I acknowledge—

  • the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
  • Lee Daniel Crocker, who reviewed this document and gave comments.

Notes

(1) A distinction between statistical and cryptographic RNGs seems natural, because many programming languages offer a general-purpose RNG (such as C's rand or Java's java.util.Random) and sometimes an RNG intended for information security purposes (such as Ruby's SecureRandom).

(2) F. Dörre and V. Klebanov, "Practical Detection of Entropy Loss in Pseudo-Random Number Generators", 2016.

(3) If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using unbiasing, deskewing, or randomness extraction (see RFC 4086 sec. 4 or Cliff et al. 2009 for further discussion).

(4) Bernstein, D.J. "Fast-key-erasure random number generators", Jun. 23, 2017.

(5) Wetzels, J., "33C3: Analyzing Embedded Operating System Random Number Generators", samvartaka.github.io, Jan. 3, 2017.

(6) B. Peng, "Two Fast Methods of Generating True Random Numbers on the Arduino", GitHub Gist, December 2017.

(7) Jones, D., "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", 2007/2010.

(8) Widynski, B., "Middle Square Weyl Sequence RNG", arXiv:1704.00358v1 [cs.CR], 2017.

(9) S. Vigna, "An experimental exploration of Marsaglia's xorshift generators, scrambled", 2016.

(10) Blackman, D., Vigna, S. "Scrambled Linear Pseudorandom Number Generators", 2018.

(11) An algorithm can become nondeterministic in many ways. For example, different results can happen because parallel operations can finish in a different order, items can be assigned differently to hash table buckets, or floating-point addition can be carried out in a different order.

Implementations of floating-point numbers and floating-point math can also differ, especially for math functions for which specifications don't always require delivering as-accurate-as-possible results (examples are the x87 FSIN instruction and the difference between Math and StrictMath in Java). Applications SHOULD avoid generating "random" floating-point numbers when they desire repeatable "randomness".

(12) Noise functions include functions that combine several outputs of a noise function, including by fractional Brownian motion. By definition, noise functions are deterministic.

(13) Lenstra, A.K., Wesolowski, B. "A random zoo: sloth, unicorn, and trx", 2015.

(14) Boneh, D., Bonneau, J., et al. "Verifiable Delay Functions", 2018.

(15) Müller, S. "CPU Time Jitter Based Non-Physical True Random Number Generator".

(16) For example, many questions on Stack Overflow highlight the pitfalls of creating a new instance of .NET's System.Random each time a random number is needed, rather than only once in the application. See also the section "How to Initialize RNGs".

(17) Cliff, Y., Boyd, C., Gonzalez Nieto, J. "How to Extract and Expand Randomness: A Summary and Explanation of Existing Results", 2009.

(18) Using the similar /dev/random is NOT RECOMMENDED, since in some implementations it can block for seconds at a time, especially if not enough randomness is available. See also "Myths about /dev/urandom".

(19) A. Klyubin, "Some SecureRandom Thoughts", Android Developers Blog, Aug. 14, 2013.

(20) Michaelis, K., Meyer, C., and Schwenk, J. "Randomly Failed! The State of Randomness in Current Java Implementations", 2013.

(21) P. L'Ecuyer, D. Munger, et al. "Random Numbers for Parallel Computers: Requirements and Methods, With Emphasis on GPUs". April 17, 2015.

(22) More generally, a list has N! / (W_1! * W_2! * ... * W_K!) permutations (a multinomial coefficient), where N is the list's size, K is the number of different items in the list, and W_i is the number of times the item identified by i appears in the list. However, this number is never more than N! and suggests using less randomness, so an application need not use this more complicated formula and MAY assume that a list has N! permutations even if some of its items occur more than once.

(23) Atwood, Jeff. "The danger of naïveté", Dec. 7, 2007.

(24) van Staveren, Hans. "Big Deal: A new program for dealing bridge hands", Sep. 8, 2000

(25) The only binary floating-point numbers supported by some GPUs are 16-bit (with 10 significant bits of precision), notably not 32- or 64-bit as is otherwise common. An application ought to choose hash functions that deliver acceptable "noise" regardless of the size of floating-point numbers supported by the GPU.

(26) C. Peters, "Free blue noise textures", Moments in Graphics, Dec. 22, 2016. That article discusses the sampling of blue noise, not independent uniformly-distributed random numbers, but a similar approach applies to textures of noise however generated.

(27) P. L'Ecuyer, D. Munger, et al. "Random Numbers for Parallel Computers: Requirements and Methods, With Emphasis on GPUs". April 17, 2015.

(28) Note that although PRNGs can also act like hash functions (if they're seeded with the input and the PRNG is "large enough" for the input), some PRNGs (such as xorshift128+) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.

(29) Such arbitrary data can include process identifiers, time stamps, environment variables, random numbers, virtual machine guest identifiers, and/or other data specific to the session or to the instance of the RNG. See also NIST SP800-90A and the references below.
Everspaugh, A., Zhai, Y., et al. "Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG", 2014.
Ristenpart, T., Yilek, S. "When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography", 2010.

(30) Allowing applications to do so would hamper forward compatibility — the API would then be less free to change how the RNG is implemented in the future (e.g., to use a cryptographic or otherwise "better" RNG), or to make improvements or bug fixes in methods that use that RNG (such as shuffling and Gaussian number generation). (As a notable example, the V8 JavaScript engine recently changed its Math.random() implementation to use a variant of xorshift128+, which is backward compatible because nothing in JavaScript allows Math.random() to be seeded.) Nevertheless, APIs can still allow applications to provide additional input ("entropy") to the RNG in order to increase its randomness rather than to ensure repeatability.

Appendix

 

Suggested Entropy Size

The following Python code suggests how many bits of entropy are needed for shuffling. For example:

  • To shuffle an n-item list, the suggested bits of entropy is at least as high as the base-2 logarithm, rounded up, of n! (stateLengthN(n)).
  • To shuffle a 52-item list, at least 226 bits of entropy is suggested (stateLengthN(52)).
  • To shuffle two 52-item lists of identical contents together, at least 500 bits of entropy is suggested (stateLengthDecks(2, 52)).

 

from math import factorial as fac

def ceillog2(x):
    """ Calculates base-2 logarithm, rounded up, of x. """
    ret=0
    needCeil=True
    while x>1:
       one=needCeil and ((x&1)!=0)
       x=x>>1
       if one:
         ret+=1; needCeil=False
       ret+=1
    return ret

def stateLengthN(n):
  """ Suggested state length (or bits of entropy)
     for PRNGs that shuffle
    a list of n items. """
  return ceillog2(fac(n))

def stateLengthNChooseK(n, k):
  """ Suggested state length/entropy for PRNGs that choose k
   different items randomly from a list of n items
   (see RFC 3797, sec. 3.3) """
  return ceillog2(fac(n)/(fac(k)*fac(n-k)))

def stateLengthDecks(numDecks, numCards):
  """ Suggested state length/entropy for PRNGs that shuffle
    multiple decks of cards in one. """
  return ceillog2(fac(numDecks*numCards)/ \
      (fac(numDecks)**numCards))

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

Share

About the Author

Peter Occil
United States United States
No Biography provided

You may also be interested in...

Pro

Comments and Discussions

 
GeneralMessage Closed Pin
24-Oct-18 19:02
memberMember 1403197524-Oct-18 19:02 
QuestionWell written, well explained - now a C#/VB.NET addendum Pin
MSBassSinger20-Aug-18 7:19
professionalMSBassSinger20-Aug-18 7:19 
QuestionRandom.org Pin
Blair01116-May-18 10:01
memberBlair01116-May-18 10:01 
GeneralWeighted shuffle Pin
Member 1296954116-May-18 8:33
memberMember 1296954116-May-18 8:33 
QuestionGreat article... Pin
Zdenek Sedlak14-May-18 23:23
professionalZdenek Sedlak14-May-18 23:23 
PraiseGreat article Pin
Michael Haephrati27-Apr-18 8:42
mvpMichael Haephrati27-Apr-18 8:42 
Praisev Pin
BillW3322-Jan-18 8:35
professionalBillW3322-Jan-18 8:35 
QuestionGreat article Pin
Bill Eckard20-Jan-18 10:30
memberBill Eckard20-Jan-18 10:30 
QuestionMessage Closed Pin
20-Jan-18 8:00
memberkanchipuramsarees20-Jan-18 8:00 
QuestionRandom number by a machine is nor possible, all will repeat. Pin
Member 1126470618-Jan-18 21:59
memberMember 1126470618-Jan-18 21:59 
AnswerRe: Random number by a machine is nor possible, all will repeat. Pin
A_Griffin18-Jan-18 23:14
memberA_Griffin18-Jan-18 23:14 
AnswerRe: Random number by a machine is nor possible, all will repeat. Pin
sx200819-Jan-18 10:30
membersx200819-Jan-18 10:30 
GeneralRe: Random number by a machine is nor possible, all will repeat. Pin
Member 1290021119-Jan-18 21:24
memberMember 1290021119-Jan-18 21:24 
Generalawesome explanation Pin
Jenny326312-Nov-17 18:42
memberJenny326312-Nov-17 18:42 
GeneralLike the way of explaining Pin
Member 133918593-Sep-17 19:45
memberMember 133918593-Sep-17 19:45 
GeneralMy vote of 5 Pin
Arthur V. Ratz3-Aug-17 1:48
mvaArthur V. Ratz3-Aug-17 1:48 
QuestionConstrained RNGs Pin
RogerCreagh16-Jul-17 22:07
memberRogerCreagh16-Jul-17 22:07 
AnswerRe: Constrained RNGs Pin
Peter Occil17-Jul-17 3:39
memberPeter Occil17-Jul-17 3:39 
GeneralRe: Constrained RNGs Pin
degski11-Sep-18 18:20
memberdegski11-Sep-18 18:20 
GeneralRe: Constrained RNGs Pin
Peter Occil12-Sep-18 2:13
memberPeter Occil12-Sep-18 2:13 
Questionquery seeding recomendations Pin
Member 133012369-Jul-17 17:28
memberMember 133012369-Jul-17 17:28 
AnswerRe: query seeding recomendations Pin
Peter Occil9-Jul-17 18:28
memberPeter Occil9-Jul-17 18:28 
GeneralRe: query seeding recomendations Pin
Member 1330123611-Jul-17 17:32
memberMember 1330123611-Jul-17 17:32 
AnswerRe: query seeding recomendations Pin
Peter Occil11-Jul-17 3:24
memberPeter Occil11-Jul-17 3:24 
GeneralRe: query seeding recomendations Pin
Member 1330123611-Jul-17 17:59
memberMember 1330123611-Jul-17 17:59 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Cookies | Terms of Use | Mobile
Web02 | 2.8.190306.1 | Last Updated 11 Mar 2019
Article Copyright 2016 by Peter Occil
Everything else Copyright © CodeProject, 1999-2019
Layout: fixed | fluid