## Introduction and Summary

Many applications rely on random number generators (RNGs); these RNGs include—

*statistical RNGs*, which seek to generate numbers that follow a uniform random distribution,*cryptographic RNGs*, which seek to generate numbers that are cost-prohibitive to predict, and*seeded PRNGs* (pseudorandom number generators), which generate numbers that "seem" random given an initial "seed".

**This document covers:**

- Statistical and cryptographic RNGs, as well as recommendations on their use and properties.
- A discussion on when an application that needs numbers that "seem" random SHOULD specify their own "seed" (the initial state that the numbers are based on).
- Nondeterministic RNGs, entropy, and seed generation.
- An explanation of how to implement RNGs in programming code, including APIs that help in doing so.
- Issues on shuffling with an RNG.

**This document does not cover:**

- Testing an RNG implementation for correctness or adequate random number generation.
- Generation of random numbers or keying material based at least in part on a password (e.g.
*key derivation functions*, *password authenticated key exchange*). - Applications for which the selection of RNGs is constrained by statutory or regulatory requirements.

**The following table summarizes the kinds of RNGs covered in this document:**

Kind of RNG | When to Use This RNG | Examples |
---|

**Cryptographic RNG** | In information security cases, or when speed is not a concern. | `/dev/urandom` , `BCryptGenRandom` |

**Statistical RNG** | When information security is not a concern, but speed is. See also **"Shuffling"**. | `xoroshiro128+` , `xorshift128+` |

**Seeded PRNG** | When generating reproducible "randomness" in a way not practical otherwise. | High-quality PRNG with custom seed |

### About This Document

**This is an open-source document; for an updated version, see the** **source code** **or its** **rendering on GitHub****. You can send comments on this document either on** **CodeProject** **or on the** **GitHub issues page****.**

## Contents

## Definitions

The following definitions are helpful in better understanding this document.

**Random number generator (RNG).** Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed^{(1)}.**Pseudorandom number generator (PRNG).** A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time), and in which its state can be initialized and possibly reinitialized with arbitrary data.**Seed.** Arbitrary data for initializing the state of a PRNG.**State length.** The maximum size of the seed a PRNG can take to initialize its state without shortening or compressing that seed.**Period.** The maximum number of values in a generated sequence for a PRNG before that sequence repeats. The period will not be greater than 2^{L} where *L* is the PRNG's *state length*.**Stable.** A programming interface is *stable* if it has no behavior that is unspecified, implementation-dependent, nondeterministic, or subject to future change.**Information security.** Defined in ISO/IEC 27000.**Nondeterministic source.** Data source that does not always return the same output for the same input.**MUST, SHOULD, SHOULD NOT, MAY, RECOMMENDED, NOT RECOMMENDED.** As defined in RFC 2119 and RFC 8174.

## Cryptographic RNGs

Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that are cost-prohibitive to predict. Cryptographic RNGs are RECOMMENDED for applications that use random numbers for information security, such as—

- generating keying material, such as encryption keys,
- generating random passwords, nonces, or session identifiers,
- generating "salts" to vary hash codes of the same password,
- use in communications between two networked computers (including in data transfer, data transport, and messaging), and
- cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage,

as well as for applications that generate random numbers so infrequently that the RNG's speed is not a concern.

### Quality

A cryptographic RNG implementation generates uniformly distributed random bits such that it would be at least cost-prohibitive for an outside party to guess prior unseen bits of the random sequence correctly with more than a 50% chance per bit, even with knowledge of the randomness-generating procedure, the implementation's internal state at the given point in time, and/or extremely many outputs of the RNG.

### Seeding and Reseeding

If a cryptographic RNG implementation uses a PRNG, the following requirements apply.

The PRNG's *state length* MUST be at least 128 bits and SHOULD be at least 256 bits. The *security strength* used by the RNG MUST be at least 112 bits, SHOULD be at least 128 bits, and MAY equal the PRNG's *state length*.

Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed defined as follows. The seed—

- MUST have as many bits as the PRNG's
*state length*, - MUST consist of data that ultimately derives from the output of one or more
**nondeterministic sources**, where the output is at least as hard to predict as ideal random data with as many bits as the *security strength*, and - MAY be mixed with arbitrary data other than the seed as long as the result is no easier to predict
^{(2)}.

The RNG MAY reseed itself from time to time, using a newly generated seed as described earlier. If the implementation reseeds, it SHOULD do so as often as feasible (whenever doing so would not slow down applications undesirably). If the RNG reseeds if it would generate more than a threshold number of bits without reseeding, that threshold SHOULD be 2^{67} or less.

### Examples

Examples of cryptographic RNG implementations include the following:

- The
`/dev/random`

device on many Unix-based operating systems, which generally uses only nondeterministic sources; however, in some implementations of the device it can block for seconds at a time, especially if not enough randomness is available. - The
`/dev/urandom`

device on many Unix-based operating systems, which often relies on both a PRNG and the same nondeterministic sources used by `/dev/random`

. - The
`BCryptGenRandom`

method in Windows 7 and later. - Two-source extractors, multi-source extractors, or cryptographic
**hash functions** that take very hard-to-predict signals from two or more nondeterministic sources as input. - A "fast-key-erasure" random number generator described by D.J. Bernstein in his blog (Bernstein 2017)
^{(3)}. - An RNG implementation complying with NIST SP 800-90A. The SP 800-90 series goes into further detail on how RNGs appropriate for information security can be constructed, and inspired much of the "Cryptographic RNGs" section.

## Statistical RNGs

Statistical RNGs are used, for example, in simulations, numerical integration, and many games to bring an element of chance and variation to the application, with the goal that each possible outcome is equally likely. However, statistical RNGs are generally suitable only if—

- information security is not involved, and
- the application generates random numbers so frequently that it would slow down undesirably if a cryptographic RNG were used instead.

A statistical RNG is usually implemented with a PRNG, but can also be implemented in a similar way as a cryptographic RNG provided it remains reasonably fast.

**Note:** If more than 20 items are being shuffled, a concerned application would be well advised to use alternatives to a statistical RNG implementation (see **"Shuffling"**).

### Quality

A statistical RNG generates random bits, each of which is uniformly distributed independently of the other bits, at least for nearly all practical purposes. If the implementation uses a PRNG, that PRNG algorithm MUST either satisfy the *one-way property* or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of `BigCrush`

, part of L'Ecuyer and Simard's "TestU01". The RNG need not be perfectly equidistributed.

### Seeding and Reseeding

If a statistical RNG implementation uses a PRNG, the following requirements apply.

The PRNG's *state length* MUST be at least 64 bits, SHOULD be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.

Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—

- MUST have as many bits as the PRNG's
*state length*, - MUST consist of data that ultimately derives from the output of one or more
**nondeterministic sources** and/or cryptographic RNGs, where the output is encouraged to cover a state space of at least as many bits as the PRNG's *state length*, and - MAY be mixed with arbitrary data other than the seed.

The RNG MAY reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of values without reseeding, that threshold SHOULD be the PRNG's period's square root or less.

### Examples and Non-Examples

Examples of statistical RNGs include the following:

**xoshiro256**** (state length 256 bits; nonzero seed).**xoroshiro128**** (state length 128 bits; nonzero seed).`Lehmer64`

and `Lehmer128`

(for each: state length 128 bits; odd seed, so effectively 127 bits state length).- XorShift* 128/64 (state length 128 bits; nonzero seed).
- XorShift* 64/32 (state length 64 bits; nonzero seed).
`JKISS`

, `JKISS32`

, `JLKISS`

, `JLKISS64`

, described in (Jones 2007/2010)^{(4)}.- C++'s
`std::ranlux48`

engine (state length 577 bits; nonzero seed). - PCG (
`pcg32`

, `pcg64`

, and `pcg64_fast`

classes), by Melissa O'Neill. See also a **critique by S. Vigna**. - Other examples include B. Jenkins's "A small noncryptographic PRNG" (sometimes called
`jsf`

), C. Doty-Humphrey's `sfc`

, `msws`

(Widynski 2017)^{(5)}, and D. Blackman's `gjrand`

.

The following also count as statistical RNGs, but are not preferred:

- Mersenne Twister shows a
**systematic failure** in `BigCrush`

's LinearComp test. (See also (Vigna 2016)^{(6)}.) `xoroshiro128+`

, `xoshiro256+`

, and `xorshift128+`

. As described by (Blackman and Vigna 2018)^{(7)}, these linear PRNGs use weak scramblers, so that each output's lowest bits have low linear complexity even though the output as a whole has excellent statistical randomness. See also **"Testing lowest bits in isolation"**.

Non-examples include the following:

- Any
**linear congruential generator** with modulus 2^{63} or less (such as `java.util.Random`

and C++'s `std::minstd_rand`

and `std::minstd_rand0`

engines) has a *state length* of less than 64 bits. `System.Random`

, as implemented in the .NET Framework 4.7, can take a seed of at most 32 bits, so has a state length of at most 32 bits.

## Seeded PRNGs

In addition, some applications use pseudorandom number generators (PRNGs) to generate results based on apparent randomness, starting from a known initial state, or "seed". Such applications usually care about repeatable "randomness". (Note that in the definitions for **cryptographic** and **statistical** RNGs given earlier, the PRNGs involved are automatically seeded before use.)

### When to Use a Seeded PRNG

An application SHOULD NOT use a PRNG with a seed it specifies (rather than an automatically-initialized PRNG or another kind of RNG) unless—

- the application might need to generate the same "random" result multiple times,
- the application either—
- makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
- finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
- by saving the result to a file,
- by storing the "random" numbers for the feature generating the result to "replay" later, or
- by distributing the results or the "random" numbers to networked users as they are generated, and

- any feature that uses such a PRNG to generate that "random" result will remain backward compatible with respect to the "random" results it generates, for as long as that feature is still in use by the application.

**Note:** Meeting statement 3 is aided by using *stable* PRNGs; see **"Definitions"** and the following examples:

`java.util.Random`

is stable.- The C
`rand`

method is not stable (because the algorithm it uses is unspecified). - C++'s random number distribution classes, such as
`std::uniform_int_distribution`

, are not stable (because the algorithms they use are implementation-defined according to the specification). - .NET's
`System.Random`

is not stable (because its generation behavior could change in the future).

Among other things, using seeded PRNGs only in the limited circumstances given above makes the application less dependent on a particular RNG or on a particular RNG implementation, meaning the application is free to change the RNG if needed.

### Which Seeded PRNG to Use

If an application decides to use a seeded PRNG for repeatable "randomness", that PRNG SHOULD—

- meet or exceed the quality requirements of a
**statistical RNG**, - be reasonably fast, and
- have a
*state length* of 64 bits or greater.

### Seed Generation for Seeded PRNGs

As much as possible, an application SHOULD generate seeds for a seeded PRNG—

### Examples

Custom seeds can come into play in the following situations, among others.

#### Games

Many kinds of games generate game content based on apparent randomness, such as—

- procedurally generated maps for a role-playing game,
**shuffling** a digital deck of cards for a solitaire game, or- a game board or puzzle board that normally varies every session,

where the game might need to generate the same content of that kind multiple times.

In general, such a game SHOULD NOT use a PRNG with a custom seed for such purposes unless—

- generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use, or
- the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content.

Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.

**Example:** Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. In this case, the game—

- MAY change the algorithm it uses to generate random maps, but
- SHOULD use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
- SHOULD continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.

#### Unit Tests

A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the test ensures backward compatibility).

#### Verifiable Random Numbers

*Verifiable random numbers* are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, such information includes random numbers and/or uncertain data to be determined and publicly disclosed in the future. Generating verifiable randomness has been described in **RFC 3797**, in (Lenstra et al., 2015)^{(8)}, in (Boneh et al., 2018)^{(9)} (which introduces the concept of *verifiable delay functions*, functions whose output deliberately takes time to compute but is easy to verify), and elsewhere.

#### Noise

*Noise* is a randomized variation in images, sound, and other data. (See also Red Blob Games, **"Noise Functions and Map Generation"**). For the purposes of RNG recommendations, there are two kinds of noise:

*Procedural noise* is generated using a *noise function*, which is a function that outputs seemingly random numbers given an *n*-dimensional point and, optionally, additional data (such as gradients or hash values).^{(10)} Procedural noise includes **cellular noise**, **value noise**, and **gradient noise** (such as **Perlin noise**). As much as possible, procedural noise implementations SHOULD **use an RNG to generate the additional data** for the noise function in advance. If using a **custom-seeded PRNG** is appropriate for the application, the additional data MAY be **"hard-coded"** instead. If the noise function **incorporates a** *hash function*, that hash function SHOULD be reasonably fast, be *stable* (see **"Definitions"**), and have the so-called *avalanche property*.

*Nonprocedural noise* is generated using the help of an RNG. Nonprocedural noise includes **colored noise** (including white noise and pink noise), periodic noise, and noise following a Gaussian or other **probability distribution**. For nonprocedural noise, the same considerations apply to any RNGs the noise implementation uses as in cases not involving noise.

## Nondeterministic Sources and Seed Generation

RNGs ultimately rely on nondeterministic sources to generate random numbers. Such sources are used to help generate a *seed* for a PRNG, for example. The best nondeterministic sources for this purpose are those whose output is very hard to predict.

### Examples of Nondeterministic Sources

Examples of nondeterministic sources are—

- disk access timings,
- timings of keystrokes and/or other input device interactions,
- thermal noise,
- the output generated with A. Seznec's technique called hardware volatile entropy gathering and expansion (HAVEGE), provided a high-resolution counter is available, and
- differences between two high-resolution counter values taken in quick succession (such as in "Jitter RNG"; see (Müller)
^{(11)}).

RFC 4086, "Randomness Requirements for Security", section 3, contains a survey of nondeterministic sources.

**Notes:**

- Online services that make random numbers available to applications, as well as outputs of audio and video devices (see RFC 4086 sec. 3.2.1), are additional nondeterministic sources. However, online services require Internet or other network access, and some of them require access credentials. Also, many mobile operating systems require applications to declare network, camera, and microphone access to users upon installation. For these reasons, these kinds of sources are NOT RECOMMENDED if other approaches are adequate.
- For noncryptographic RNGs, timestamps from the system clock are commonly used. Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.
- For general-purpose use, nondeterministic sources that enable many high-quality seeds per second to be generated are highly advantageous, especially for a cryptographic RNG.

### Entropy

*Entropy* is a value that describes how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends *min-entropy* as the entropy measure and also details how nondeterministic sources can be used for information security. See also RFC 4086 section 2.

### Seed Generation

In general, especially for cryptographic RNGs, **to generate an N-bit seed, enough data needs to be gathered from nondeterministic sources to reach N bits of entropy or more**.

Once data with enough entropy is gathered, it might need to be condensed into a seed to initialize a PRNG with. Following (Cliff et al., 2009)^{(12)}, it is suggested to generate an N-bit seed by using an HMAC or "cascade" hash function (such as SHA-256 or SHA-512), with outputs at least N times 3 bits long, on data with at least N times 3 bits of entropy, then truncating the output to N bits. See also NIST SP 800-90B sec. 3.1.5.1 and RFC 4086 sec. 4.2 and 5.2.

## Implementing RNGs in Programming Languages

As much as possible, **applications SHOULD use existing libraries and techniques** that already meet the requirements for cryptographic and statistical RNGs.

The following table lists application programming interfaces (APIs) for
cryptographic and statistical RNGs for popular programming languages. Note the following:

- Methods and libraries mentioned in the "Statistical" column need to be initialized with a seed before use (for example, a seed generated using an implementation in the "Cryptographic" column).
- The mention of a third-party library in this section does not imply sponsorship or endorsement of that library, or imply a preference of that library over others. The list is not comprehensive.

Language | Cryptographic | Statistical | Other |
---|

.NET (incl. C# and VB.NET) (H) | `RNGCryptoServiceProvider` in `System.Security.Cryptography` namespace | **airbreather/AirBreather.Common library** (XorShift1024Star, XorShift128Plus, XoroShiro128Plus) | |

C/C++ (G) | (C) | `xoroshiro128plus.c` (128-bit nonzero seed); `xorshift128plus.c` (128-bit nonzero seed); **frostburn/jkiss** library | |

Python | `secrets.SystemRandom` (since Python 3.6); `os.urandom()` | **ihaque/xorshift** library (128-bit nonzero seed; default seed uses `os.urandom()` ) | `random.getrandbits()` (A); `random.seed()` (19,936-bit seed) (A) |

Java (D) | (C); `java.security.SecureRandom` (F) | **grunka/xorshift** (`XORShift1024Star` or `XORShift128Plus` ); **jenetics/prngine** (`KISS32Random` , `KISS64Random` ) | prngine library (`MT19937_32Random` , `MT19937_64Random` ) |

JavaScript | `crypto.randomBytes(byteCount)` (node.js only) | `xorshift` library | `Math.random()` (ranges from 0 through 1) (B) |

Ruby | (C); `SecureRandom.rand()` (ranges from 0 through 1) (E); `SecureRandom.rand(N)` (integer) (E) (for both, `require 'securerandom'` ) | | `Random#rand()` (ranges from 0 through 1) (A) (E); `Random#rand(N)` (integer) (A) (E); `Random.new(seed)` (default seed uses nondeterministic data) |

PHP | `random_int()` , `random_bytes()` (both since PHP 7) | | `mt_rand()` (A) |

(A) General RNG implements **Mersenne Twister**, which is not preferred for a statistical RNG. PHP's `mt_rand()`

implements or implemented a flawed version of Mersenne Twister.

(B) JavaScript's `Math.random`

is implemented using `xorshift128+`

(or a variant) in the V8 engine, Firefox, and certain other modern browsers as of late 2017; the exact algorithm to be used by JavaScript's `Math.random`

is "implementation-dependent", though, according to the ECMAScript specification.

(C) A cryptographic RNG implementation can—

- read from the
`/dev/urandom`

and/or `/dev/random`

devices in most Unix-based systems (using the `open`

and `read`

system calls where available), - call the
`getentropy`

method on OpenBSD, or - call the
`BCryptGenRandom`

API in Windows 7 and later,

and only use other techniques if the existing ones are inadequate for the application.

(D) Java's `java.util.Random`

class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of `java.util.Random`

might be implemented to meet those requirements.

(E) In my opinion, Ruby's `Random#rand`

and `SecureRandom.rand`

methods present a beautiful and simple API for random number generation.

(F) Calling the `setSeed`

method of `SecureRandom`

before use is RECOMMENDED. The data passed to the method SHOULD be data described in note (C). (Despite the name, `setSeed`

*supplements* the existing seed, according to the documentation.)

(G) `std::random_device`

, introduced in C++11, is NOT RECOMMENDED because its specification leaves considerably much to be desired. For example, `std::random_device`

can fall back to a pseudorandom number generator of unspecified quality without much warning. At best, `std::random_device`

SHOULD only be used to supplement other techniques for random number generation.

(H) The .NET Framework's `System.Random`

class uses a seed of at most 32 bits, so doesn't meet the statistical RNG requirements. However, a subclass of `System.Random`

might be implemented to meet those requirements.

In the limited cases where existing solutions are inadequate, a programming language API could implement cryptographic and statistical RNGs by filling an output byte buffer with random bytes, where each bit in each byte will be randomly set to 0 or 1. Such an API is RECOMMENDED to be reasonably fast for most applications, and to be safe for concurrent use by multiple threads, whenever convenient.

**Example:** A C language API for such RNGs could look like the following: `int random(uint8_t[] bytes, size_t size);`

, where "bytes" is a pointer to a byte array, and "size" is the number of random bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise.

My document on **random number generation methods** includes details on ten uniform random number methods. In my opinion, a new programming language's standard library ought to include those ten methods separately for cryptographic and for statistical RNGs. That document also discusses how to implement other methods to generate random numbers or integers that follow a given distribution (such as a normal, geometric, binomial, or weighted distribution) or fall within a given range.

## RNG Topics

### How to Initialize RNGs

For cryptographic RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use.

For statistical and seeded RNGs, to **reduce the chance of correlated random numbers or identical random number sequences**, an application is encouraged to create—

- one thread-safe instance of an RNG for the entire application to use, or
- one instance of an RNG for each thread of the application, where each instance—
- is accessible to only one thread (such as with thread-local storage), and
- is initialized with a seed that is unrelated to the other seeds (using sequential or linearly related seeds can cause
**undesirable correlations** in some PRNGs).

An application that generates **random numbers in parallel** can also do one or both of—

- using a different conforming RNG scheme for each instance, and
- using a conforming RNG scheme specially designed for parallel random number generation (such as so-called "splittable" PRNGs).

(Many questions on *Stack Overflow* highlight the pitfalls of creating a new RNG instance each time a random number is needed, rather than only once in the application. This is notably the case with the .NET generator `System.Random`

.)

### Shuffling

In a list with `N`

different items, there are `N`

factorial (that is, `1 * 2 * ... * N`

or `N!`

) ways to arrange the items in that list. These ways are called *permutations*.

**Note:** More generally, a list has `M! / (W_1! * W_2! * ... * W_N!)`

permutations (a **multinomial coefficient**), where `M`

is the list's size, `N`

is the number of different items in the list, and `W_i`

is the number of times the item identified by `i`

appears in the list.

An application can **shuffle a list**—

- by generating a random integer as low as 0 and as high as the number of permutations minus 1, and converting that integer to a permutation, or
- by doing a
**Fisher–Yates shuffle** (which is unfortunately easy to mess up — see (Atwood)^{(13)} — and is implemented correctly in **another document of mine**).

Either way, however, if a PRNG's period is less than the number of permutations, then there are **some permutations that PRNG can't choose** when it shuffles that list. (This is not the same as *generating* all permutations of a list, which, for a sufficiently large list size, can't be done by any computer in a reasonable time.)

To give all permutations a chance, an application needs to generate data with at least `B`

bits of *entropy* (randomness), where `B`

is the number of bits needed to store `P`

, where `P`

is the number of permutations minus 1 (e.g., `B`

is 226 bits for a 52-item list); see also (van Staveren 2000, "Lack of randomness")^{(14)}. (`B`

can be calculated for different lists using the Python code found in the **appendix**.)

If an application will use PRNGs for shuffling, an application is encouraged to generate data with at least `B`

bits of entropy, **generate a seed** with that data, then pass the seed to a PRNG with state length `B`

or greater. For general-purpose use (but not when information security is involved), an application can use the number 525 instead of `B`

(and so assume that lists of up to 100 items will be shuffled). (Practically speaking, for sufficiently large list sizes, any given PRNG will not be able to randomly choose some permutations of the list. `xoroshiro1024**`

and `ranlux48`

are two examples of PRNGs with state lengths of at least 525.)

The PRNG chosen this way SHOULD meet at least the quality requirements of a statistical RNG implementation, SHOULD have the highest feasible period for its state length, and SHOULD be initialized with a full-length seed.

### GPU Programming Environments

In general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU)—

- have limited access to some system resources compared with other programming environments,
- are designed for parallel execution, and
- do not store state,

so random number generators for such environments are often designed as **hash functions**, because their output is determined solely by the input rather than both the input and state (as with PRNGs). Moreover, some of the hash functions which have been written in GLSL give undesirable results in computers whose GPUs support only 16-bit binary floating point numbers and no other kinds of numbers, which makes such GPUs an important consideration when choosing a hash function.

## Hash Functions

A seemingly random number can be generated from arbitrary data using a *hash function*.

A *hash function* is a function that takes an arbitrary input of any size (such as a sequence of bytes or a sequence of characters) and returns an output with a fixed number of bits. That output is also known as a *hash code*. (By definition, hash functions are deterministic^{(15)}.)

A hash code can be used as follows:

- The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on
**random number generation methods** for techniques.) - If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.

Useful properties of some hash functions include—

- the
*avalanche property* (every bit of the input affects every bit of the output without a clear preference for 0 or 1), - the
*one-way property* (finding an unknown second input that leads to the same output as that of a given input is cost-prohibitive), and *collision resistance* (finding an unknown input that leads to a given output is cost-prohibitive).

Applications SHOULD choose hash functions with the avalanche property. Hash functions used for information security SHOULD also have the collision resistance and one-way properties.

## Motivation

In this document, I made the distinction between *statistical* and *cryptographic* RNGs because that is how many programming languages present random number generators — they usually offer a general-purpose RNG (such as C's `rand`

or Java's `java.util.Random`

) and sometimes an RNG intended for information security purposes (such as `java.security.SecureRandom`

).

What has motivated me to write a more rigorous definition of random number generators is the fact that many applications still use weak RNGs. In my opinion, this is largely because most popular programming languages today—

- specify few and weak requirements on RNGs (such as
**C's **`rand`

), - specify a relatively weak general-purpose RNG (such as Java's
`java.math.Random`

, although it also includes a much stronger `SecureRandom`

class), - implement RNGs by default that leave something to be desired (particularly the Mersenne Twister algorithm found in PHP's
`mt_rand`

as well as in Python and Ruby), - seed RNGs with a timestamp by default (such as the
**.NET Framework implementation of **`System.Random`

), and/or - leave the default seeding fixed (as is the case in
**MATLAB**; see also the question titled "Matlab rand and c++ rand()" on *Stack Overflow*).

## Conclusion

Random numbers that merely "look random" are not enough for most applications. That is why this document defines **cryptographic RNGs** and **statistical RNGs**; I believe RNGs that meet either category will fulfill the expectations of many applications as regards random numbers. In general:

- For
*statistical RNGs*, the random numbers not only "look random", but are shown to behave like random numbers through statistical tests. - For
*cryptographic RNGs*, the random numbers not only "look random", but are virtually unpredictable.

In addition, this document recommends using cryptographic RNGs in many cases, especially in information security contexts, and recommends easier programming interfaces for both cryptographic and statistical RNGs in new programming languages.

I acknowledge—

- the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
- Lee Daniel Crocker, who reviewed this document and gave comments.

## Notes

^{(1)} If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using *unbiasing*, *deskewing*, or *randomness extraction* (see RFC 4086 sec. 4 or Cliff et al. 2009 for further discussion).

^{(2)} Such arbitrary data can include process identifiers, time stamps, environment variables, random numbers, virtual machine guest identifiers, and/or other data specific to the session or to the instance of the RNG. See also NIST SP800-90A and the references below.

Everspaugh, A., Zhai, Y., et al. "Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG", 2014.

Ristenpart, T., Yilek, S. "When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography", 2010.

^{(3)} Bernstein, D.J. "Fast-key-erasure random number generators", Jun. 23, 2017.

^{(4)} Jones, D., "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", 2007/2010.

^{(5)} Widynski, B., "Middle Square Weyl Sequence RNG", arXiv:1704.00358v1 [cs.CR], 2017

^{(6)} S. Vigna, "**An experimental exploration of Marsaglia's **`xorshift`

generators, scrambled", 2016.

^{(7)} Blackman, D., Vigna, S. "Scrambled Linear Pseudorandom Number Generators", 2018.

^{(8)} Lenstra, A.K., Wesolowski, B. "A random zoo: sloth, unicorn, and trx", 2015.

^{(9)} Boneh, D., Bonneau, J., et al. "Verifiable Delay Functions", 2018.

^{(10)} Noise functions include functions that combine several outputs of a noise function, including by **fractional Brownian motion**. By definition, noise functions are deterministic.

^{(11)} Müller, S. "CPU Time Jitter Based Non-Physical True Random Number Generator".

^{(12)} Cliff, Y., Boyd, C., Gonzalez Nieto, J. "How to Extract and Expand Randomness: A Summary and Explanation of Existing Results", 2009.

^{(13)} Atwood, Jeff. "**The danger of naïveté**".

^{(14)} van Staveren, Hans. **"Big Deal: A new program for dealing bridge hands"**, Sep. 8, 2000

^{(15)} Note that although PRNGs can also act like hash functions (if they're seeded with the input and the PRNG is "large enough" for the input), some PRNGs (such as `xorshift128+`

) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.

## Appendix

### Suggested Entropy Size

The following Python code suggests how many bits of entropy are needed for shuffling. For example—

- to shuffle a 52-item list, it is suggested to use a PRNG with state length 226 or more, initialized with a seed with at least 226 bits of entropy (
`stateLengthN(52)`

), and - to shuffle two 52-item lists of identical contents together, then the suggested state length and bits of entropy are 500 or more (
`stateLengthDecks(2, 52)`

).

def fac(x):
""" Calculates factorial of x. """
if x<=1: return 1
ret=1
for i in range(x): ret=ret*(i+1)
return ret
def ceillog2(x):
""" Calculates base-2 logarithm of x, rounded up. """
ret=0
needCeil=True
while x>1:
one=needCeil and ((x&1)!=0)
x=x>>1
if one:
ret+=1; needCeil=False
ret+=1
return ret
def stateLengthN(n):
""" Suggested state length (or bits of entropy)
for PRNGs that shuffle
a list of n items. """
return ceillog2(fac(n))
def stateLengthNChooseK(n, k):
""" Suggested state length/entropy for PRNGs that choose k
different items randomly from a list of n items
(see RFC 3797, sec. 3.3) """
return ceillog2(fac(n)/(fac(k)*fac(n-k)))
def stateLengthDecks(numDecks, numCards):
""" Suggested state length/entropy for PRNGs that shuffle
multiple decks of cards in one. """
return ceillog2(fac(numDecks*numCards)/ \
(fac(numDecks)**numCards))