## Introduction and Summary

Many applications rely on random number generators (RNGs); these RNGs include—

*statistical RNGs*, which seek to generate numbers that follow a uniform random distribution,*cryptographic RNGs*, which seek to generate numbers that are cost-prohibitive to predict, and*seeded PRNGs* (pseudorandom number generators), which generate numbers that "seem" random given an initial "seed".

**This document covers:**

- Statistical and cryptographic RNGs, as well as recommendations on their use and properties.
- A discussion on when an application that needs numbers that "seem" random SHOULD specify their own "seed" (the initial state that the numbers are based on).
- Nondeterministic RNGs, entropy, and seed generation.
- An explanation of how to implement RNGs in programming code, including APIs that help in doing so.
- Issues on shuffling with an RNG.

**This document does not cover:**

- Testing an RNG implementation for correctness or adequate random number generation (e.g., Dörre and Klebanov 2016
^{(1)}). - Generation of random numbers or keying material based at least in part on a password (e.g.
*key derivation functions*). - Generation of random numbers that follow a nonuniform distribution; I discuss this topic in
**another document**. - Low-discrepancy sequences (quasirandom sequences), such as Sobol sequences. Their structure differs in an essential way from independent uniform random numbers.
- Applications for which the selection of RNGs is constrained by regulatory requirements.
- Key exchange protocols, including those based on a secret password (
*PAKEs*).

**The following table summarizes the kinds of RNGs covered in this document:**

Kind of RNG | When to Use This RNG | Examples |
---|

**Cryptographic RNG** | In information security cases, or when speed is not a concern. | `/dev/urandom` , `BCryptGenRandom` |

**Statistical RNG** | When information security is not a concern, but speed is. See also **"Shuffling"**. | `xoroshiro128+` , `xorshift128+` |

**Seeded PRNG** | When generating reproducible "randomness" in a way not practical otherwise. | High-quality PRNG with custom seed |

### About This Document

**This is an open-source document; for an updated version, see the** **source code** **or its** **rendering on GitHub****. You can send comments on this document either on** **CodeProject** **or on the** **GitHub issues page****.**

## Contents

## Definitions

The following definitions are helpful in better understanding this document.

**Random number generator (RNG).** Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed^{(2)}.**Pseudorandom number generator (PRNG).** A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time), and in which its state can be initialized and possibly reinitialized with arbitrary data.**Seed.** Arbitrary data for initializing the state of a PRNG.**State length.** The maximum size of the seed a PRNG can take to initialize its state without shortening or compressing that seed.**Period.** The maximum number of values in a generated sequence for a PRNG before that sequence repeats. The period will not be greater than 2^{L} where *L* is the PRNG's *state length*.**Stable.** A programming interface is *stable* if it has no behavior that is unspecified, implementation-dependent, nondeterministic, or subject to future change.**Information security.** Defined in ISO/IEC 27000.**Nondeterministic source.** Data source that does not always return the same output for the same input.**MUST, SHOULD, SHOULD NOT, MAY, RECOMMENDED, NOT RECOMMENDED.** As defined in RFC 2119 and RFC 8174.

## Cryptographic RNGs

Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that are cost-prohibitive to predict. Cryptographic RNGs are RECOMMENDED for applications that use random numbers for information security, such as—

- generating keying material, such as encryption keys,
- generating random passwords, nonces, or session identifiers,
- generating "salts" to vary hash codes of the same password,
- use in communications between two networked computers (including in data transfer, data transport, and messaging), and
- cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage,

as well as for applications that generate random numbers so infrequently that the RNG's speed is not a concern.

### Requirements

A cryptographic RNG implementation generates uniformly distributed random bits such that it would be at least cost-prohibitive for an outside party to guess prior unseen bits of the random sequence correctly with more than a 50% chance per bit, even with knowledge of the randomness-generating procedure, the implementation's internal state at the given point in time, and/or extremely many outputs of the RNG.

If a cryptographic RNG implementation uses a PRNG, the following requirements apply:

The PRNG's *state length* MUST be at least 128 bits and SHOULD be at least 256 bits. The *security strength* used by the RNG MUST be at least 112 bits, SHOULD be at least 128 bits, and is less than or equal to the PRNG's *state length*.

Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed defined as follows. The seed—

- MUST have as many bits as the PRNG's
*state length*, - MUST consist of data that ultimately derives from the output of one or more
**nondeterministic sources**, where the output is at least as hard to predict as ideal random data with as many bits as the *security strength*, and - MAY be mixed with arbitrary data other than the seed as long as the result is no easier to predict
^{(3)}.

The RNG SHOULD reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds, it SHOULD do so as often as feasible (whenever doing so would not slow down applications undesirably). If the RNG reseeds if it would generate more than a threshold number of bits without reseeding, that threshold SHOULD be 2^{67} or less.

### Examples

Examples of cryptographic RNG implementations include the following:

- The
`/dev/random`

device on many Unix-based operating systems, which generally uses only nondeterministic sources; however, in some implementations of the device it can block for seconds at a time, especially if not enough randomness is available. - The
`/dev/urandom`

device on many Unix-based operating systems, which often relies on both a PRNG and the same nondeterministic sources used by `/dev/random`

. - The
`BCryptGenRandom`

method in Windows 7 and later. - Two-source extractors, multi-source extractors, or cryptographic
**hash functions** that take very hard-to-predict signals from two or more nondeterministic sources as input. - A "fast-key-erasure" random number generator described by D.J. Bernstein in his blog (Bernstein 2017)
^{(4)}. - An RNG implementation complying with NIST SP 800-90A. The SP 800-90 series goes into further detail on how RNGs appropriate for information security can be constructed, and inspired much of the "Cryptographic RNGs" section.

### Resource-Constrained Devices

Unlike with general-purpose computing devices such as desktops and smartphones, resource-constrained devices ("embedded" devices) are much less likely to have a cryptographic RNG available (Wetzels 2017)^{(5)}, although methods exist for implementing a cryptographic RNG on the Arduino (Peng 2017)^{(6)}.

## Statistical RNGs

Statistical RNGs are used, for example, in simulations, machine learning, numerical integration, and many games to bring an element of chance and variation to the application. However, statistical RNGs are generally suitable only if—

- information security is not involved, and
- the application generates random numbers so frequently that it would slow down undesirably if a cryptographic RNG were used instead.

A statistical RNG is usually implemented with a PRNG, but can also be implemented in a similar way as a cryptographic RNG provided it remains reasonably fast.

### Requirements

A statistical RNG generates random bits, each of which is uniformly distributed independently of the other bits, at least for nearly all practical purposes. If the implementation uses a PRNG, that PRNG algorithm MUST either satisfy the *collision resistance* property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of `BigCrush`

, part of L'Ecuyer and Simard's "TestU01". The RNG need not be perfectly equidistributed.

If a statistical RNG implementation uses a PRNG, the following requirements apply.

The PRNG's *state length* MUST be at least 64 bits, SHOULD be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.

Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—

- MUST have as many bits as the PRNG's
*state length*, - MUST consist of data that ultimately derives from the output of one or more
**nondeterministic sources** and/or cryptographic RNGs, where the output is encouraged to cover a state space of at least as many bits as the PRNG's *state length*, and - MAY be mixed with arbitrary data other than the seed.

The RNG MAY reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of values without reseeding, that threshold SHOULD be the PRNG's period's square root or less.

### Examples and Non-Examples

Examples of statistical RNGs include the following:

**xoshiro256**** (state length 256 bits; nonzero seed).**xoroshiro128**** (state length 128 bits; nonzero seed).`Lehmer64`

and `Lehmer128`

(for each: state length 128 bits; odd seed, so effectively 127 bits state length).- XorShift* 128/64 (state length 128 bits; nonzero seed).
- XorShift* 64/32 (state length 64 bits; nonzero seed).
`JKISS`

, `JKISS32`

, `JLKISS`

, `JLKISS64`

, described in (Jones 2007/2010)^{(7)}.- C++'s
`std::ranlux48`

engine (state length 577 bits; nonzero seed). - PCG (
`pcg32`

, `pcg64`

, and `pcg64_fast`

classes), by Melissa O'Neill. See also a **critique by S. Vigna**. - Other examples include B. Jenkins's "A small noncryptographic PRNG" (sometimes called
`jsf`

), C. Doty-Humphrey's `sfc`

, `msws`

(Widynski 2017)^{(8)}, and D. Blackman's `gjrand`

.

The following also count as statistical RNGs, but are not preferred:

- Mersenne Twister shows a
**systematic failure** in `BigCrush`

's LinearComp test. (See also (Vigna 2016)^{(9)}.) `xoroshiro128+`

, `xoshiro256+`

, and `xorshift128+`

. As described by (Blackman and Vigna 2018)^{(10)}, these linear PRNGs use weak scramblers, so that each output's lowest bits have low linear complexity even though the output as a whole has excellent statistical randomness. See also **"Testing lowest bits in isolation"**.

Non-examples include the following:

- Any
**linear congruential generator** with modulus 2^{63} or less (such as `java.util.Random`

and C++'s `std::minstd_rand`

and `std::minstd_rand0`

engines) has a *state length* of less than 64 bits. `System.Random`

, as implemented in the .NET Framework 4.7, can take a seed of at most 32 bits, so has a state length of at most 32 bits.

## Seeded PRNGs

In addition, some applications use pseudorandom number generators (PRNGs) to generate apparently "random" numbers starting from a known initial state, or "seed". Such applications usually care about repeatable "randomness". (Note that in the definitions for **cryptographic** and **statistical** RNGs given earlier, the PRNGs involved are automatically seeded before use.)

### When to Use a Seeded PRNG

An application SHOULD NOT use a PRNG with a seed it specifies (rather than an automatically-initialized PRNG or another kind of RNG) unless—

- the application might need to generate the same "random" result multiple times,
- the application either—
- makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
- finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
- by saving the result to a file,
- by storing the "random" numbers for the feature generating the result to "replay" later, or
- by distributing the "random" numbers or results to networked users as they are generated, and

- any feature that uses such a PRNG to generate that "random" result will remain backward compatible with respect to the "random" results it generates, for as long as that feature is still in use by the application.

**Note:** Meeting statement 3 is aided by using *stable* PRNGs; see **"Definitions"** and the following examples:

`java.util.Random`

is stable.- The C
`rand`

method is not stable (because the algorithm it uses is unspecified). - C++'s random number distribution classes, such as
`std::uniform_int_distribution`

, are not stable (because the algorithms they use are implementation-defined according to the specification). - .NET's
`System.Random`

is not stable (because its generation behavior could change in the future).

Using seeded RNGs ties the application to a particular RNG or RNG implementation, which is why they are NOT RECOMMENDED except in the limited circumstances given above.

### Which Seeded PRNG to Use

If an application decides to use a seeded PRNG for repeatable "randomness", that PRNG SHOULD meet or exceed the requirements of a **statistical RNG** (except it may use a custom seed) and SHOULD be reasonably fast.

### Seed Generation for Seeded PRNGs

As much as possible, an application SHOULD generate seeds for a seeded PRNG—

### Examples

Custom seeds can come into play in the following situations, among others.

#### Games

Many kinds of games generate game content based on apparent randomness, such as—

- procedurally generated maps for a role-playing game,
**shuffling** a digital deck of cards for a solitaire game, or- a game board or puzzle board that normally varies every session,

where the game might need to generate the same content of that kind multiple times.

In general, such a game SHOULD NOT use a PRNG with a custom seed for such purposes unless—

- generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use, or
- the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content.

Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.

**Example:** Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. In this case, the game—

- MAY change the algorithm it uses to generate random maps, but
- SHOULD use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
- SHOULD continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.

#### Unit Tests

A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the test ensures backward compatibility).

#### Noise

*Noise* is a randomized variation in images, sound, and other data. (See also Red Blob Games, **"Noise Functions and Map Generation"**). For the purposes of RNG recommendations, there are two kinds of noise:

*Procedural noise* is generated using a *noise function*, which is a function that outputs seemingly random numbers given an *n*-dimensional point and, optionally, additional data (such as gradients or hash values).^{(11)} Procedural noise includes **cellular noise**, **value noise**, and **gradient noise** (such as **Perlin noise**). As much as possible, procedural noise implementations SHOULD **use an RNG to generate the additional data** for the noise function in advance. If using a **custom-seeded PRNG** is appropriate for the application, the additional data MAY be **"hard-coded"** instead. If the noise function **incorporates a** *hash function*, that hash function SHOULD be reasonably fast, be *stable* (see **"Definitions"**), and have the so-called *avalanche property*.

*Nonprocedural noise* is generated using the help of an RNG. Nonprocedural noise includes **colored noise** (including white noise and pink noise), periodic noise, and noise following a Gaussian or other **probability distribution**. For nonprocedural noise, the same considerations apply to any RNGs the noise implementation uses as in cases not involving noise.

#### Verifiable Random Numbers

*Verifiable random numbers* are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, such information includes random numbers and/or uncertain data to be determined and publicly disclosed in the future. Generating verifiable randomness has been described in **RFC 3797**, in (Lenstra et al., 2015)^{(12)}, in (Boneh et al., 2018)^{(13)} (which introduces the concept of *verifiable delay functions*, functions whose output deliberately takes time to compute but is easy to verify), and elsewhere.

## Nondeterministic Sources and Seed Generation

RNGs ultimately rely on nondeterministic sources to generate random numbers. Such sources are used to help generate a *seed* for a PRNG, for example. The best nondeterministic sources for this purpose are those whose output is very hard to predict.

### Examples of Nondeterministic Sources

Examples of nondeterministic sources are—

- disk access timings,
- timings of keystrokes and/or other input device interactions,
- thermal noise,
- the output of assembly instructions specially dedicated to random number generation, such as RdSeed,
- the output generated with A. Seznec's technique called hardware volatile entropy gathering and expansion (HAVEGE), provided a high-resolution counter is available, and
- differences between two high-resolution counter values taken in quick succession (such as in "Jitter RNG"; see (Müller)
^{(14)}).

RFC 4086, "Randomness Requirements for Security", section 3, contains a survey of nondeterministic sources.

**Notes:**

- Online services that make random numbers available to applications, as well as outputs of audio and video devices (see RFC 4086 sec. 3.2.1), are additional nondeterministic sources. However, online services require Internet or other network access, and some of them require access credentials. Also, many mobile operating systems require applications to declare network, camera, and microphone access to users upon installation. For these reasons, these kinds of sources are NOT RECOMMENDED if other approaches are adequate.
- For noncryptographic RNGs, timestamps from the system clock are commonly used. Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.
- For general-purpose use, nondeterministic sources that enable many high-quality seeds per second to be generated are highly advantageous, especially for a cryptographic RNG.

### Entropy

*Entropy* is a value that describes how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends *min-entropy* as the entropy measure. Characterizing a nondeterministic source's entropy is nontrivial and beyond the scope of this document. See also RFC 4086 section 2.

### Seed Generation

In general, especially for cryptographic RNGs, **to generate an N-bit seed, enough data needs to be gathered from nondeterministic sources to reach N bits of entropy or more**.

Once data with enough entropy is gathered, it might need to be condensed into a seed to initialize a PRNG with. Following (Cliff et al., 2009)^{(15)}, it is suggested to generate an N-bit seed by using an HMAC or "cascade" hash function (such as SHA-256 or SHA-512), with outputs at least N times 3 bits long, on data with at least N times 3 bits of entropy, then truncating the output to N bits. See also NIST SP 800-90B sec. 3.1.5.1 and RFC 4086 sec. 4.2 and 5.2.

## Implementing RNGs in Programming Languages

As much as possible, **applications SHOULD use existing libraries and techniques** that already meet the requirements for cryptographic and statistical RNGs.

The following table lists application programming interfaces (APIs) for
cryptographic and statistical RNGs for popular programming languages. Note the following:

- Methods and libraries mentioned in the "Statistical" column need to be initialized with a seed before use (for example, a seed generated using an implementation in the "Cryptographic" column).
- The mention of a third-party library in this section does not imply sponsorship or endorsement of that library, or imply a preference of that library over others. The list is not comprehensive.

Language | Cryptographic | Statistical | Other |
---|

.NET (incl. C# and VB.NET) (H) | `RNGCryptoServiceProvider` in `System.Security.Cryptography` namespace | **airbreather/AirBreather.Common library** (XorShift1024Star, XorShift128Plus, XoroShiro128Plus) | |

C/C++ (G) | (C) | `xoroshiro128plus.c` (128-bit nonzero seed); `xorshift128plus.c` (128-bit nonzero seed); **frostburn/jkiss** library | |

Python | `secrets.SystemRandom` (since Python 3.6); `os.urandom()` | **ihaque/xorshift** library (128-bit nonzero seed; default seed uses `os.urandom()` ) | `random.getrandbits()` (A); `random.seed()` (19,936-bit seed) (A) |

Java (D) | (C); `java.security.SecureRandom` (F) | **grunka/xorshift** (`XORShift1024Star` or `XORShift128Plus` ); **jenetics/prngine** (`KISS32Random` , `KISS64Random` ) | prngine library (`MT19937_32Random` , `MT19937_64Random` ) |

JavaScript (B) | `crypto.randomBytes(byteCount)` or `random-number-csprng` package (node.js only); `crypto.getRandomValues()` (Web) | `xorshift` library | |

Ruby | (C); `SecureRandom.rand()` (ranges from 0 through 1) (E); `SecureRandom.rand(N)` (integer) (E) (for both, `require 'securerandom'` ) | | `Random#rand()` (ranges from 0 through 1) (A) (E); `Random#rand(N)` (integer) (A) (E); `Random.new(seed)` (default seed uses nondeterministic data) |

PHP | `random_int()` , `random_bytes()` (both since PHP 7) | | `mt_rand()` (A) |

(A) General RNG implements **Mersenne Twister**, which is not preferred for a statistical RNG. PHP's `mt_rand()`

implements or implemented a flawed version of Mersenne Twister.

(B) JavaScript's `Math.random`

(which ranges from 0 through 1) is implemented using `xorshift128+`

(or a variant) in the V8 engine, Firefox, and certain other modern browsers as of late 2017; the exact algorithm to be used by JavaScript's `Math.random`

is "implementation-dependent", though, according to the ECMAScript specification.

(C) A cryptographic RNG implementation can—

- read from the
`/dev/urandom`

device in most Unix-based systems (using the `open`

and `read`

system calls where available), - call the
`getentropy`

method on OpenBSD, or - call the
`BCryptGenRandom`

API in Windows 7 and later,

and only use other techniques if the existing ones are inadequate for the application.

(D) Java's `java.util.Random`

class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of `java.util.Random`

might be implemented to meet those requirements.

(E) In my opinion, Ruby's `Random#rand`

and `SecureRandom.rand`

methods present a beautiful and simple API for random number generation.

(F) Calling the `setSeed`

method of `SecureRandom`

before use is RECOMMENDED. The data passed to the method SHOULD be data described in note (C). (Despite the name, `setSeed`

*supplements* the existing seed, according to the documentation.) See also (Klyubin 2013)^{(16)}.

(G) `std::random_device`

, introduced in C++11, is NOT RECOMMENDED because its specification leaves considerably much to be desired. For example, `std::random_device`

can fall back to a pseudorandom number generator of unspecified quality without much warning. At best, `std::random_device`

SHOULD only be used to supplement other techniques for random number generation.

(H) The .NET Framework's `System.Random`

class uses a seed of at most 32 bits, so doesn't meet the statistical RNG requirements. However, a subclass of `System.Random`

might be implemented to meet those requirements.

In the limited cases where existing solutions are inadequate, a **programming language API** could implement cryptographic and statistical RNGs by filling one or more memory units (such as 8-bit bytes) completely with random bits.

**Example:** A C language API for an RNG could look like the following: `int random(uint8_t[] bytes, size_t size);`

, where `bytes`

is a pointer to an array of 8-bit bytes, and `size`

is the number of random 8-bit bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise.

If an API implements an automatically-initialized RNG, it SHOULD NOT allow applications to initialize that same RNG with a seed for repeatable "randomness"^{(17)} (it MAY provide a separate PRNG to accept such a seed). If an API provides a PRNG that an application can seed for repeatable "randomness", that PRNG SHOULD be *stable* and documented, and so SHOULD any methods the API provides that use that PRNG (such as shuffling and Gaussian number generation).

My document on **random number generation methods** includes details on ten uniform random number methods. In my opinion, a new programming language's standard library ought to include those ten methods separately for cryptographic and for statistical RNGs.

## RNG Topics

### How to Initialize RNGs

For cryptographic RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use.

For statistical and seeded RNGs, to **reduce the chance of correlated random numbers or identical random number sequences**, an application is encouraged to create—

- one thread-safe instance of an RNG for the entire application to use, or
- one instance of an RNG for each thread of the application, where each instance—
- is accessible to only one thread (such as with thread-local storage), and
- is initialized with a seed that is unrelated to the other seeds (using sequential or linearly related seeds can cause
**undesirable correlations** in some PRNGs).

An application that generates **random numbers in parallel** can also do one or both of—

- using a different conforming RNG scheme for each instance, and
- using a conforming RNG scheme specially designed for parallel random number generation (such as so-called "splittable" PRNGs).

(Many questions on *Stack Overflow* highlight the pitfalls of creating a new RNG instance each time a random number is needed, rather than only once in the application. This is notably the case with the .NET generator `System.Random`

.)

### Shuffling

In a list with `N`

different items, there are `N`

factorial (that is, `1 * 2 * ... * N`

, or `N!`

) ways to arrange the items in that list. These ways are called *permutations*^{(18)}.

An application can **shuffle a list**—

- by generating a random integer at least 0 and less than the number of permutations, and converting that integer to a permutation, or
- by doing a
**Fisher–Yates shuffle** (which is unfortunately easy to mess up — see (Atwood)^{(19)} — and is implemented correctly in **another document of mine**).

Either way, however, if a PRNG's period is less than the number of permutations, then there are **some permutations that that PRNG can't choose** when it shuffles that list. (This is not the same as *generating* all permutations of a list, which, for a list big enough, can't be done by any computer in a reasonable time.)

If an application uses PRNGs for shuffling purposes, it is encouraged to—

- choose a PRNG with a state length
`B`

or greater, then - gather data with at least
`B`

bits of *entropy* (randomness), then **generate a full-length seed** with the data gathered this way, then- pass the seed to the chosen PRNG, then
- use the PRNG to do a Fisher–Yates shuffle.

Here, `B`

can usually be calculated for different lists using the Python code in the **appendix**; see also (van Staveren 2000, "Lack of randomness")^{(20)}. For example, `B`

is 226 bits for a 52-item list. (However, if information security is not involved, an application can instead choose any number 256 or more for `B`

and follow the steps above accordingly — for a list big enough, it's generally more important to have shuffles act random than to choose from among all permutations.)

The PRNG chosen this way SHOULD meet or exceed the requirements of a statistical RNG (except it uses a seed generated as given above) and SHOULD have the highest feasible period for its state length.

### GPU Programming Environments

In general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU) are stateless (they take data in and give data out without storing any state themselves), so random number generators for such environments are often designed as **hash functions**, because their output is determined solely by the input rather than both the input and state (as with PRNGs).

However, some of the hash functions which have been written in GLSL give undesirable results in computers whose GPUs support only 16-bit binary floating point numbers and no other kinds of numbers, which makes such GPUs an important consideration when choosing a hash function.

## Hash Functions

A seemingly random number can be generated from arbitrary data using a *hash function*.

A *hash function* is a function that takes an arbitrary input of any size (such as an array of 8-bit bytes or a sequence of characters) and returns an output with a fixed number of bits. That output is also known as a *hash code*. (By definition, hash functions are deterministic^{(21)}.)

A hash code can be used as follows:

- The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on
**random number generation methods** for techniques.) - If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.

Useful properties of some hash functions include—

- the
*avalanche property* (every bit of the input affects every bit of the output without a clear preference for 0 or 1), *collision resistance* (finding two different inputs that lead to a given output is cost-prohibitive), and- the
*one-way property* (finding an unknown input that leads to a given output is cost-prohibitive) (see NIST SP 800-108).

Hash functions used for information security SHOULD have the collision resistance and one-way properties (e.g., SHA2-256, BLAKE2). Hash functions not used for information security SHOULD have the avalanche property (e.g, MurmurHash3, xxHash, CityHash).

## Motivation

What has motivated me to write a more rigorous definition of random number generators is the fact that many applications still use weak RNGs. In my opinion, this is largely because most popular programming languages today—

- specify few and weak requirements on RNGs (such as
**C's **`rand`

), - specify a relatively weak general-purpose RNG (such as Java's
`java.math.Random`

, although it also includes a much stronger `SecureRandom`

class), - implement RNGs by default that leave something to be desired (particularly the Mersenne Twister algorithm found in PHP's
`mt_rand`

as well as in Python and Ruby), - seed RNGs with a timestamp by default (such as the
**.NET Framework implementation of **`System.Random`

), and/or - leave the default seeding fixed (as is the case in
**MATLAB**; see also the question titled "Matlab rand and c++ rand()" on *Stack Overflow*).

Many programming languages offer a general-purpose RNG (such as C's `rand`

or Java's `java.util.Random`

) and sometimes an RNG intended for information security purposes (such as `java.security.SecureRandom`

). Thus, a distinction between *statistical* and *cryptographic* RNGs seems natural.

## Conclusion

Random numbers that merely "look random" are not enough for most applications. That is why this document defines **cryptographic RNGs** and **statistical RNGs**; I believe RNGs that meet either category will fulfill the expectations of many applications as regards random numbers. In general:

- For
*statistical RNGs*, the random numbers not only "look random", but are shown to behave like random numbers through statistical tests. - For
*cryptographic RNGs*, the random numbers not only "look random", but are virtually unpredictable.

In addition, this document recommends using cryptographic RNGs in many cases, especially in information security contexts, and recommends easier programming interfaces for both cryptographic and statistical RNGs in new programming languages.

I acknowledge—

- the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
- Lee Daniel Crocker, who reviewed this document and gave comments.

## Notes

^{(1)} F. Dörre and V. Klebanov, "Practical Detection of Entropy Loss in Pseudo-Random Number Generators", CCS '16, 2016.

^{(2)} If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using *unbiasing*, *deskewing*, or *randomness extraction* (see RFC 4086 sec. 4 or Cliff et al. 2009 for further discussion).

^{(3)} Such arbitrary data can include process identifiers, time stamps, environment variables, random numbers, virtual machine guest identifiers, and/or other data specific to the session or to the instance of the RNG. See also NIST SP800-90A and the references below.

Everspaugh, A., Zhai, Y., et al. "Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG", 2014.

Ristenpart, T., Yilek, S. "When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography", 2010.

^{(4)} Bernstein, D.J. "Fast-key-erasure random number generators", Jun. 23, 2017.

^{(5)} Wetzels, J., "33C3: Analyzing Embedded Operating System Random Number Generators", samvartaka.github.io, Jan. 3, 2017.

^{(6)} B. Peng, "Two Fast Methods of Generating True Random Numbers on the Arduino", GitHub Gist, December 2017.

^{(7)} Jones, D., "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", 2007/2010.

^{(8)} Widynski, B., "Middle Square Weyl Sequence RNG", arXiv:1704.00358v1 [cs.CR], 2017.

^{(9)} S. Vigna, "**An experimental exploration of Marsaglia's **`xorshift`

generators, scrambled", 2016.

^{(10)} Blackman, D., Vigna, S. "Scrambled Linear Pseudorandom Number Generators", 2018.

^{(11)} Noise functions include functions that combine several outputs of a noise function, including by **fractional Brownian motion**. By definition, noise functions are deterministic.

^{(12)} Lenstra, A.K., Wesolowski, B. "A random zoo: sloth, unicorn, and trx", 2015.

^{(13)} Boneh, D., Bonneau, J., et al. "Verifiable Delay Functions", 2018.

^{(14)} Müller, S. "CPU Time Jitter Based Non-Physical True Random Number Generator".

^{(15)} Cliff, Y., Boyd, C., Gonzalez Nieto, J. "How to Extract and Expand Randomness: A Summary and Explanation of Existing Results", 2009.

^{(16)} A. Klyubin, "Some SecureRandom Thoughts", Android Developers Blog, Aug. 14, 2013.

^{(17)} Allowing applications to do so would hamper forward compatibility — the API would then be less free to change how the RNG is implemented in the future (e.g., to use a cryptographic or otherwise "better" RNG), or to make improvements or bug fixes in methods that use that RNG (such as shuffling and Gaussian number generation). (As a notable example, the V8 JavaScript engine recently changed its `Math.random()`

implementation to use a variant of `xorshift128+`

, which is backward compatible because nothing in JavaScript allows `Math.random()`

to be seeded.) Nevertheless, APIs can still allow applications to provide additional input ("entropy") to the RNG in order to increase its randomness rather than to ensure repeatability.

^{(18)} More generally, a list has `N! / (W_1! * W_2! * ... * W_K!)`

permutations (a **multinomial coefficient**), where `N`

is the list's size, `K`

is the number of different items in the list, and `W_i`

is the number of times the item identified by `i`

appears in the list. However, this number is never more than `N!`

and suggests using less randomness, so an application need not use this more complicated formula and MAY assume that a list has `N!`

permutations even if some of its items occur more than once.

^{(19)} Atwood, Jeff. "**The danger of naïveté**".

^{(20)} van Staveren, Hans. **"Big Deal: A new program for dealing bridge hands"**, Sep. 8, 2000

^{(21)} Note that although PRNGs can also act like hash functions (if they're seeded with the input and the PRNG is "large enough" for the input), some PRNGs (such as `xorshift128+`

) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.

## Appendix

### Suggested Entropy Size

The following Python code suggests how many bits of entropy are needed for shuffling. For example:

- To shuffle an
`n`

-item list, the suggested bits of entropy is at least as high as the base-2 logarithm, rounded up, of `n!`

(`stateLengthN(n)`

). - To shuffle a 52-item list, at least 226 bits of entropy is suggested (
`stateLengthN(52)`

). - To shuffle two 52-item lists of identical contents together, at least 500 bits of entropy is suggested (
`stateLengthDecks(2, 52)`

).

from math import factorial as fac
def ceillog2(x):
""" Calculates base-2 logarithm, rounded up, of x. """
ret=0
needCeil=True
while x>1:
one=needCeil and ((x&1)!=0)
x=x>>1
if one:
ret+=1; needCeil=False
ret+=1
return ret
def stateLengthN(n):
""" Suggested state length (or bits of entropy)
for PRNGs that shuffle
a list of n items. """
return ceillog2(fac(n))
def stateLengthNChooseK(n, k):
""" Suggested state length/entropy for PRNGs that choose k
different items randomly from a list of n items
(see RFC 3797, sec. 3.3) """
return ceillog2(fac(n)/(fac(k)*fac(n-k)))
def stateLengthDecks(numDecks, numCards):
""" Suggested state length/entropy for PRNGs that shuffle
multiple decks of cards in one. """
return ceillog2(fac(numDecks*numCards)/ \
(fac(numDecks)**numCards))