Click here to Skip to main content
13,767,544 members
Click here to Skip to main content
Add your own
alternative version

Tagged as

Stats

80.5K views
92 bookmarked
Posted 5 Mar 2016
Licenced Public Domain

Random Number Generator Recommendations for Applications

, 24 Oct 2018
Rate this:
Please Sign up or sign in to vote.
Most apps that use random numbers care about either unpredictability, speed/high quality, or repeatability. This article explains the three kinds of RNGs and gives recommendations on each kind.

Introduction and Summary

Many applications rely on random number generators (RNGs); these RNGs include—

  • statistical RNGs, which seek to generate numbers that follow a uniform random distribution,
  • cryptographic RNGs, which seek to generate numbers that are cost-prohibitive to predict, and
  • seeded PRNGs (pseudorandom number generators), which generate numbers that "seem" random given an initial "seed".

This document covers:

  • Statistical and cryptographic RNGs, as well as recommendations on their use and properties.
  • A discussion on when an application that needs numbers that "seem" random SHOULD specify their own "seed" (the initial state that the numbers are based on).
  • Nondeterministic RNGs, entropy, and seed generation.
  • An explanation of how to implement RNGs in programming code, including APIs that help in doing so.
  • Issues on shuffling with an RNG.

This document does not cover:

  • Testing an RNG implementation for correctness or adequate random number generation (e.g., Dörre and Klebanov 2016(1)).
  • Generation of random numbers or keying material based at least in part on a password (e.g. key derivation functions).
  • Generation of random numbers that follow a nonuniform distribution; I discuss this topic in another document.
  • Low-discrepancy sequences (quasirandom sequences), such as Sobol sequences. Their structure differs in an essential way from independent uniform random numbers.
  • Applications for which the selection of RNGs is constrained by regulatory requirements.
  • Key exchange protocols, including those based on a secret password (PAKEs).

The following table summarizes the kinds of RNGs covered in this document:

Kind of RNGWhen to Use This RNGExamples
Cryptographic RNGIn information security cases, or when speed is not a concern./dev/urandom, BCryptGenRandom
Statistical RNGWhen information security is not a concern, but speed is. See also "Shuffling".xoroshiro128+, xorshift128+
Seeded PRNGWhen generating reproducible "randomness" in a way not practical otherwise.High-quality PRNG with custom seed

About This Document

This is an open-source document; for an updated version, see the source code or its rendering on GitHub. You can send comments on this document either on CodeProject or on the GitHub issues page.

Contents

Definitions

The following definitions are helpful in better understanding this document.

  • Random number generator (RNG). Software and/or hardware that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed(2).
  • Pseudorandom number generator (PRNG). A random number generator that outputs seemingly random numbers using a deterministic algorithm (that is, an algorithm that returns the same output for the same input and state every time), and in which its state can be initialized and possibly reinitialized with arbitrary data.
  • Seed. Arbitrary data for initializing the state of a PRNG.
  • State length. The maximum size of the seed a PRNG can take to initialize its state without shortening or compressing that seed.
  • Period. The maximum number of values in a generated sequence for a PRNG before that sequence repeats. The period will not be greater than 2L where L is the PRNG's state length.
  • Stable. A programming interface is stable if it has no behavior that is unspecified, implementation-dependent, nondeterministic, or subject to future change.
  • Information security. Defined in ISO/IEC 27000.
  • Nondeterministic source. Data source that does not always return the same output for the same input.
  • MUST, SHOULD, SHOULD NOT, MAY, RECOMMENDED, NOT RECOMMENDED. As defined in RFC 2119 and RFC 8174.

Cryptographic RNGs

Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that are cost-prohibitive to predict. Cryptographic RNGs are RECOMMENDED for applications that use random numbers for information security, such as—

  • generating keying material, such as encryption keys,
  • generating random passwords, nonces, or session identifiers,
  • generating "salts" to vary hash codes of the same password,
  • use in communications between two networked computers (including in data transfer, data transport, and messaging), and
  • cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage,

as well as for applications that generate random numbers so infrequently that the RNG's speed is not a concern.

Requirements

A cryptographic RNG implementation generates uniformly distributed random bits such that it would be at least cost-prohibitive for an outside party to guess prior unseen bits of the random sequence correctly with more than a 50% chance per bit, even with knowledge of the randomness-generating procedure, the implementation's internal state at the given point in time, and/or extremely many outputs of the RNG.

If a cryptographic RNG implementation uses a PRNG, the following requirements apply:

  1. The PRNG's state length MUST be at least 128 bits and SHOULD be at least 256 bits. The security strength used by the RNG MUST be at least 112 bits, SHOULD be at least 128 bits, and is less than or equal to the PRNG's state length.

  2. Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed defined as follows. The seed—

    • MUST have as many bits as the PRNG's state length,
    • MUST consist of data that ultimately derives from the output of one or more nondeterministic sources, where the output is at least as hard to predict as ideal random data with as many bits as the security strength, and
    • MAY be mixed with arbitrary data other than the seed as long as the result is no easier to predict(3).
  3. The RNG SHOULD reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds, it SHOULD do so as often as feasible (whenever doing so would not slow down applications undesirably). If the RNG reseeds if it would generate more than a threshold number of bits without reseeding, that threshold SHOULD be 267 or less.

Examples

Examples of cryptographic RNG implementations include the following:

  • The /dev/random device on many Unix-based operating systems, which generally uses only nondeterministic sources; however, in some implementations of the device it can block for seconds at a time, especially if not enough randomness is available.
  • The /dev/urandom device on many Unix-based operating systems, which often relies on both a PRNG and the same nondeterministic sources used by /dev/random.
  • The BCryptGenRandom method in Windows 7 and later.
  • Two-source extractors, multi-source extractors, or cryptographic hash functions that take very hard-to-predict signals from two or more nondeterministic sources as input.
  • A "fast-key-erasure" random number generator described by D.J. Bernstein in his blog (Bernstein 2017)(4).
  • An RNG implementation complying with NIST SP 800-90A. The SP 800-90 series goes into further detail on how RNGs appropriate for information security can be constructed, and inspired much of the "Cryptographic RNGs" section.

Resource-Constrained Devices

Unlike with general-purpose computing devices such as desktops and smartphones, resource-constrained devices ("embedded" devices) are much less likely to have a cryptographic RNG available (Wetzels 2017)(5), although methods exist for implementing a cryptographic RNG on the Arduino (Peng 2017)(6).

Statistical RNGs

Statistical RNGs are used, for example, in simulations, machine learning, numerical integration, and many games to bring an element of chance and variation to the application. However, statistical RNGs are generally suitable only if—

  • information security is not involved, and
  • the application generates random numbers so frequently that it would slow down undesirably if a cryptographic RNG were used instead.

A statistical RNG is usually implemented with a PRNG, but can also be implemented in a similar way as a cryptographic RNG provided it remains reasonably fast.

Requirements

A statistical RNG generates random bits, each of which is uniformly distributed independently of the other bits, at least for nearly all practical purposes. If the implementation uses a PRNG, that PRNG algorithm MUST either satisfy the collision resistance property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of BigCrush, part of L'Ecuyer and Simard's "TestU01". The RNG need not be perfectly equidistributed.

If a statistical RNG implementation uses a PRNG, the following requirements apply.

  1. The PRNG's state length MUST be at least 64 bits, SHOULD be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.

  2. Before an instance of the RNG generates a random number, it MUST have been initialized ("seeded") with a seed described as follows. The seed—

    • MUST have as many bits as the PRNG's state length,
    • MUST consist of data that ultimately derives from the output of one or more nondeterministic sources and/or cryptographic RNGs, where the output is encouraged to cover a state space of at least as many bits as the PRNG's state length, and
    • MAY be mixed with arbitrary data other than the seed.
  3. The RNG MAY reseed itself from time to time, using a newly generated seed as described earlier. If the RNG reseeds if it would generate more than a threshold number of values without reseeding, that threshold SHOULD be the PRNG's period's square root or less.

Examples and Non-Examples

Examples of statistical RNGs include the following:

  • xoshiro256** (state length 256 bits; nonzero seed).
  • xoroshiro128** (state length 128 bits; nonzero seed).
  • Lehmer64 and Lehmer128 (for each: state length 128 bits; odd seed, so effectively 127 bits state length).
  • XorShift* 128/64 (state length 128 bits; nonzero seed).
  • XorShift* 64/32 (state length 64 bits; nonzero seed).
  • JKISS, JKISS32, JLKISS, JLKISS64, described in (Jones 2007/2010)(7).
  • C++'s std::ranlux48 engine (state length 577 bits; nonzero seed).
  • PCG (pcg32, pcg64, and pcg64_fast classes), by Melissa O'Neill. See also a critique by S. Vigna.
  • Other examples include B. Jenkins's "A small noncryptographic PRNG" (sometimes called jsf), C. Doty-Humphrey's sfc, msws (Widynski 2017)(8), and D. Blackman's gjrand.

The following also count as statistical RNGs, but are not preferred:

  • Mersenne Twister shows a systematic failure in BigCrush's LinearComp test. (See also (Vigna 2016)(9).)
  • xoroshiro128+, xoshiro256+, and xorshift128+. As described by (Blackman and Vigna 2018)(10), these linear PRNGs use weak scramblers, so that each output's lowest bits have low linear complexity even though the output as a whole has excellent statistical randomness. See also "Testing lowest bits in isolation".

Non-examples include the following:

  • Any linear congruential generator with modulus 263 or less (such as java.util.Random and C++'s std::minstd_rand and std::minstd_rand0 engines) has a state length of less than 64 bits.
  • System.Random, as implemented in the .NET Framework 4.7, can take a seed of at most 32 bits, so has a state length of at most 32 bits.

Seeded PRNGs

In addition, some applications use pseudorandom number generators (PRNGs) to generate apparently "random" numbers starting from a known initial state, or "seed". Such applications usually care about repeatable "randomness". (Note that in the definitions for cryptographic and statistical RNGs given earlier, the PRNGs involved are automatically seeded before use.)

When to Use a Seeded PRNG

An application SHOULD NOT use a PRNG with a seed it specifies (rather than an automatically-initialized PRNG or another kind of RNG) unless—

  1. the application might need to generate the same "random" result multiple times,
  2. the application either—
    • makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
    • finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
      • by saving the result to a file,
      • by storing the "random" numbers for the feature generating the result to "replay" later, or
      • by distributing the "random" numbers or results to networked users as they are generated, and
  3. any feature that uses such a PRNG to generate that "random" result will remain backward compatible with respect to the "random" results it generates, for as long as that feature is still in use by the application.

Note: Meeting statement 3 is aided by using stable PRNGs; see "Definitions" and the following examples:

  • java.util.Random is stable.
  • The C rand method is not stable (because the algorithm it uses is unspecified).
  • C++'s random number distribution classes, such as std::uniform_int_distribution, are not stable (because the algorithms they use are implementation-defined according to the specification).
  • .NET's System.Random is not stable (because its generation behavior could change in the future).

Using seeded RNGs ties the application to a particular RNG or RNG implementation, which is why they are NOT RECOMMENDED except in the limited circumstances given above.

Which Seeded PRNG to Use

If an application decides to use a seeded PRNG for repeatable "randomness", that PRNG SHOULD meet or exceed the requirements of a statistical RNG (except it may use a custom seed) and SHOULD be reasonably fast.

Seed Generation for Seeded PRNGs

As much as possible, an application SHOULD generate seeds for a seeded PRNG—

Examples

Custom seeds can come into play in the following situations, among others.

Games

Many kinds of games generate game content based on apparent randomness, such as—

  • procedurally generated maps for a role-playing game,
  • shuffling a digital deck of cards for a solitaire game, or
  • a game board or puzzle board that normally varies every session,

where the game might need to generate the same content of that kind multiple times.

In general, such a game SHOULD NOT use a PRNG with a custom seed for such purposes unless—

  1. generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use, or
  2. the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content.

Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.

Example: Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. In this case, the game—

  • MAY change the algorithm it uses to generate random maps, but
  • SHOULD use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
  • SHOULD continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.

Unit Tests

A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the test ensures backward compatibility).

Noise

Noise is a randomized variation in images, sound, and other data. (See also Red Blob Games, "Noise Functions and Map Generation"). For the purposes of RNG recommendations, there are two kinds of noise:

  1. Procedural noise is generated using a noise function, which is a function that outputs seemingly random numbers given an n-dimensional point and, optionally, additional data (such as gradients or hash values).(11) Procedural noise includes cellular noise, value noise, and gradient noise (such as Perlin noise). As much as possible, procedural noise implementations SHOULD use an RNG to generate the additional data for the noise function in advance. If using a custom-seeded PRNG is appropriate for the application, the additional data MAY be "hard-coded" instead. If the noise function incorporates a hash function, that hash function SHOULD be reasonably fast, be stable (see "Definitions"), and have the so-called avalanche property.

  2. Nonprocedural noise is generated using the help of an RNG. Nonprocedural noise includes colored noise (including white noise and pink noise), periodic noise, and noise following a Gaussian or other probability distribution. For nonprocedural noise, the same considerations apply to any RNGs the noise implementation uses as in cases not involving noise.

Verifiable Random Numbers

Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, such information includes random numbers and/or uncertain data to be determined and publicly disclosed in the future. Generating verifiable randomness has been described in RFC 3797, in (Lenstra et al., 2015)(12), in (Boneh et al., 2018)(13) (which introduces the concept of verifiable delay functions, functions whose output deliberately takes time to compute but is easy to verify), and elsewhere.

Nondeterministic Sources and Seed Generation

RNGs ultimately rely on nondeterministic sources to generate random numbers. Such sources are used to help generate a seed for a PRNG, for example. The best nondeterministic sources for this purpose are those whose output is very hard to predict.

Examples of Nondeterministic Sources

Examples of nondeterministic sources are—

  • disk access timings,
  • timings of keystrokes and/or other input device interactions,
  • thermal noise,
  • the output of assembly instructions specially dedicated to random number generation, such as RdSeed,
  • the output generated with A. Seznec's technique called hardware volatile entropy gathering and expansion (HAVEGE), provided a high-resolution counter is available, and
  • differences between two high-resolution counter values taken in quick succession (such as in "Jitter RNG"; see (Müller)(14)).

RFC 4086, "Randomness Requirements for Security", section 3, contains a survey of nondeterministic sources.

Notes:

  1. Online services that make random numbers available to applications, as well as outputs of audio and video devices (see RFC 4086 sec. 3.2.1), are additional nondeterministic sources. However, online services require Internet or other network access, and some of them require access credentials. Also, many mobile operating systems require applications to declare network, camera, and microphone access to users upon installation. For these reasons, these kinds of sources are NOT RECOMMENDED if other approaches are adequate.
  2. For noncryptographic RNGs, timestamps from the system clock are commonly used. Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.
  3. For general-purpose use, nondeterministic sources that enable many high-quality seeds per second to be generated are highly advantageous, especially for a cryptographic RNG.

Entropy

Entropy is a value that describes how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends min-entropy as the entropy measure. Characterizing a nondeterministic source's entropy is nontrivial and beyond the scope of this document. See also RFC 4086 section 2.

Seed Generation

In general, especially for cryptographic RNGs, to generate an N-bit seed, enough data needs to be gathered from nondeterministic sources to reach N bits of entropy or more.

Once data with enough entropy is gathered, it might need to be condensed into a seed to initialize a PRNG with. Following (Cliff et al., 2009)(15), it is suggested to generate an N-bit seed by using an HMAC or "cascade" hash function (such as SHA-256 or SHA-512), with outputs at least N times 3 bits long, on data with at least N times 3 bits of entropy, then truncating the output to N bits. See also NIST SP 800-90B sec. 3.1.5.1 and RFC 4086 sec. 4.2 and 5.2.

Implementing RNGs in Programming Languages

As much as possible, applications SHOULD use existing libraries and techniques that already meet the requirements for cryptographic and statistical RNGs.

The following table lists application programming interfaces (APIs) for cryptographic and statistical RNGs for popular programming languages. Note the following:

  • Methods and libraries mentioned in the "Statistical" column need to be initialized with a seed before use (for example, a seed generated using an implementation in the "Cryptographic" column).
  • The mention of a third-party library in this section does not imply sponsorship or endorsement of that library, or imply a preference of that library over others. The list is not comprehensive.
LanguageCryptographicStatisticalOther
.NET (incl. C# and VB.NET) (H)RNGCryptoServiceProvider in System.Security.Cryptography namespaceairbreather/AirBreather.Common library (XorShift1024Star, XorShift128Plus, XoroShiro128Plus)
C/C++ (G)(C)xoroshiro128plus.c (128-bit nonzero seed); xorshift128plus.c (128-bit nonzero seed); frostburn/jkiss library
Pythonsecrets.SystemRandom (since Python 3.6); os.urandom()ihaque/xorshift library (128-bit nonzero seed; default seed uses os.urandom())random.getrandbits() (A); random.seed() (19,936-bit seed) (A)
Java (D)(C); java.security.SecureRandom (F)grunka/xorshift (XORShift1024Star or XORShift128Plus); jenetics/prngine (KISS32Random, KISS64Random)prngine library (MT19937_32Random, MT19937_64Random)
JavaScript (B)crypto.randomBytes(byteCount) or random-number-csprng package (node.js only); crypto.getRandomValues() (Web)xorshift library
Ruby(C); SecureRandom.rand() (ranges from 0 through 1) (E); SecureRandom.rand(N) (integer) (E) (for both, require 'securerandom')Random#rand() (ranges from 0 through 1) (A) (E); Random#rand(N) (integer) (A) (E); Random.new(seed) (default seed uses nondeterministic data)
PHPrandom_int(), random_bytes() (both since PHP 7)mt_rand() (A)

(A) General RNG implements Mersenne Twister, which is not preferred for a statistical RNG. PHP's mt_rand() implements or implemented a flawed version of Mersenne Twister.

(B) JavaScript's Math.random (which ranges from 0 through 1) is implemented using xorshift128+ (or a variant) in the V8 engine, Firefox, and certain other modern browsers as of late 2017; the exact algorithm to be used by JavaScript's Math.random is "implementation-dependent", though, according to the ECMAScript specification.

(C) A cryptographic RNG implementation can—

  • read from the /dev/urandom device in most Unix-based systems (using the open and read system calls where available),
  • call the getentropy method on OpenBSD, or
  • call the BCryptGenRandom API in Windows 7 and later,

and only use other techniques if the existing ones are inadequate for the application.

(D) Java's java.util.Random class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of java.util.Random might be implemented to meet those requirements.

(E) In my opinion, Ruby's Random#rand and SecureRandom.rand methods present a beautiful and simple API for random number generation.

(F) Calling the setSeed method of SecureRandom before use is RECOMMENDED. The data passed to the method SHOULD be data described in note (C). (Despite the name, setSeed supplements the existing seed, according to the documentation.) See also (Klyubin 2013)(16).

(G) std::random_device, introduced in C++11, is NOT RECOMMENDED because its specification leaves considerably much to be desired. For example, std::random_device can fall back to a pseudorandom number generator of unspecified quality without much warning. At best, std::random_device SHOULD only be used to supplement other techniques for random number generation.

(H) The .NET Framework's System.Random class uses a seed of at most 32 bits, so doesn't meet the statistical RNG requirements. However, a subclass of System.Random might be implemented to meet those requirements.


In the limited cases where existing solutions are inadequate, a programming language API could implement cryptographic and statistical RNGs by filling one or more memory units (such as 8-bit bytes) completely with random bits.

Example: A C language API for an RNG could look like the following: int random(uint8_t[] bytes, size_t size);, where bytes is a pointer to an array of 8-bit bytes, and size is the number of random 8-bit bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise.

If an API implements an automatically-initialized RNG, it SHOULD NOT allow applications to initialize that same RNG with a seed for repeatable "randomness"(17) (it MAY provide a separate PRNG to accept such a seed). If an API provides a PRNG that an application can seed for repeatable "randomness", that PRNG SHOULD be stable and documented, and so SHOULD any methods the API provides that use that PRNG (such as shuffling and Gaussian number generation).

My document on random number generation methods includes details on ten uniform random number methods. In my opinion, a new programming language's standard library ought to include those ten methods separately for cryptographic and for statistical RNGs.

RNG Topics

 

How to Initialize RNGs

For cryptographic RNGs, an application SHOULD use only one thread-safe instance of the RNG for the entire application to use.

For statistical and seeded RNGs, to reduce the chance of correlated random numbers or identical random number sequences, an application is encouraged to create—

  • one thread-safe instance of an RNG for the entire application to use, or
  • one instance of an RNG for each thread of the application, where each instance—
    • is accessible to only one thread (such as with thread-local storage), and
    • is initialized with a seed that is unrelated to the other seeds (using sequential or linearly related seeds can cause undesirable correlations in some PRNGs).

An application that generates random numbers in parallel can also do one or both of—

  • using a different conforming RNG scheme for each instance, and
  • using a conforming RNG scheme specially designed for parallel random number generation (such as so-called "splittable" PRNGs).

(Many questions on Stack Overflow highlight the pitfalls of creating a new RNG instance each time a random number is needed, rather than only once in the application. This is notably the case with the .NET generator System.Random.)

Shuffling

In a list with N different items, there are N factorial (that is, 1 * 2 * ... * N, or N!) ways to arrange the items in that list. These ways are called permutations(18).

An application can shuffle a list

  • by generating a random integer at least 0 and less than the number of permutations, and converting that integer to a permutation, or
  • by doing a Fisher–Yates shuffle (which is unfortunately easy to mess up — see (Atwood)(19) — and is implemented correctly in another document of mine).

Either way, however, if a PRNG's period is less than the number of permutations, then there are some permutations that that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a list big enough, can't be done by any computer in a reasonable time.)

If an application uses PRNGs for shuffling purposes, it is encouraged to—

  1. choose a PRNG with a state length B or greater, then
  2. gather data with at least B bits of entropy (randomness), then
  3. generate a full-length seed with the data gathered this way, then
  4. pass the seed to the chosen PRNG, then
  5. use the PRNG to do a Fisher–Yates shuffle.

Here, B can usually be calculated for different lists using the Python code in the appendix; see also (van Staveren 2000, "Lack of randomness")(20). For example, B is 226 bits for a 52-item list. (However, if information security is not involved, an application can instead choose any number 256 or more for B and follow the steps above accordingly — for a list big enough, it's generally more important to have shuffles act random than to choose from among all permutations.)

The PRNG chosen this way SHOULD meet or exceed the requirements of a statistical RNG (except it uses a seed generated as given above) and SHOULD have the highest feasible period for its state length.

GPU Programming Environments

In general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU) are stateless (they take data in and give data out without storing any state themselves), so random number generators for such environments are often designed as hash functions, because their output is determined solely by the input rather than both the input and state (as with PRNGs).

However, some of the hash functions which have been written in GLSL give undesirable results in computers whose GPUs support only 16-bit binary floating point numbers and no other kinds of numbers, which makes such GPUs an important consideration when choosing a hash function.

Hash Functions

A seemingly random number can be generated from arbitrary data using a hash function.

A hash function is a function that takes an arbitrary input of any size (such as an array of 8-bit bytes or a sequence of characters) and returns an output with a fixed number of bits. That output is also known as a hash code. (By definition, hash functions are deterministic(21).)

A hash code can be used as follows:

  • The hash code can serve as a seed for a PRNG, and the desired random numbers can be generated from that PRNG. (See my document on random number generation methods for techniques.)
  • If a number of random bits is needed, and the hash code has at least that many bits, then that many bits can instead be taken directly from the hash code.

Useful properties of some hash functions include—

  • the avalanche property (every bit of the input affects every bit of the output without a clear preference for 0 or 1),
  • collision resistance (finding two different inputs that lead to a given output is cost-prohibitive), and
  • the one-way property (finding an unknown input that leads to a given output is cost-prohibitive) (see NIST SP 800-108).

Hash functions used for information security SHOULD have the collision resistance and one-way properties (e.g., SHA2-256, BLAKE2). Hash functions not used for information security SHOULD have the avalanche property (e.g, MurmurHash3, xxHash, CityHash).

Motivation

What has motivated me to write a more rigorous definition of random number generators is the fact that many applications still use weak RNGs. In my opinion, this is largely because most popular programming languages today—

  • specify few and weak requirements on RNGs (such as C's rand),
  • specify a relatively weak general-purpose RNG (such as Java's java.math.Random, although it also includes a much stronger SecureRandom class),
  • implement RNGs by default that leave something to be desired (particularly the Mersenne Twister algorithm found in PHP's mt_rand as well as in Python and Ruby),
  • seed RNGs with a timestamp by default (such as the .NET Framework implementation of System.Random), and/or
  • leave the default seeding fixed (as is the case in MATLAB; see also the question titled "Matlab rand and c++ rand()" on Stack Overflow).

Many programming languages offer a general-purpose RNG (such as C's rand or Java's java.util.Random) and sometimes an RNG intended for information security purposes (such as java.security.SecureRandom). Thus, a distinction between statistical and cryptographic RNGs seems natural.

Conclusion

Random numbers that merely "look random" are not enough for most applications. That is why this document defines cryptographic RNGs and statistical RNGs; I believe RNGs that meet either category will fulfill the expectations of many applications as regards random numbers. In general:

  • For statistical RNGs, the random numbers not only "look random", but are shown to behave like random numbers through statistical tests.
  • For cryptographic RNGs, the random numbers not only "look random", but are virtually unpredictable.

In addition, this document recommends using cryptographic RNGs in many cases, especially in information security contexts, and recommends easier programming interfaces for both cryptographic and statistical RNGs in new programming languages.

I acknowledge—

  • the commenters to the CodeProject version of this page (as well as a similar article of mine on CodeProject), including "Cryptonite" and member 3027120, and
  • Lee Daniel Crocker, who reviewed this document and gave comments.

Notes

(1) F. Dörre and V. Klebanov, "Practical Detection of Entropy Loss in Pseudo-Random Number Generators", CCS '16, 2016.

(2) If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using unbiasing, deskewing, or randomness extraction (see RFC 4086 sec. 4 or Cliff et al. 2009 for further discussion).

(3) Such arbitrary data can include process identifiers, time stamps, environment variables, random numbers, virtual machine guest identifiers, and/or other data specific to the session or to the instance of the RNG. See also NIST SP800-90A and the references below.
Everspaugh, A., Zhai, Y., et al. "Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG", 2014.
Ristenpart, T., Yilek, S. "When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography", 2010.

(4) Bernstein, D.J. "Fast-key-erasure random number generators", Jun. 23, 2017.

(5) Wetzels, J., "33C3: Analyzing Embedded Operating System Random Number Generators", samvartaka.github.io, Jan. 3, 2017.

(6) B. Peng, "Two Fast Methods of Generating True Random Numbers on the Arduino", GitHub Gist, December 2017.

(7) Jones, D., "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", 2007/2010.

(8) Widynski, B., "Middle Square Weyl Sequence RNG", arXiv:1704.00358v1 [cs.CR], 2017.

(9) S. Vigna, "An experimental exploration of Marsaglia's xorshift generators, scrambled", 2016.

(10) Blackman, D., Vigna, S. "Scrambled Linear Pseudorandom Number Generators", 2018.

(11) Noise functions include functions that combine several outputs of a noise function, including by fractional Brownian motion. By definition, noise functions are deterministic.

(12) Lenstra, A.K., Wesolowski, B. "A random zoo: sloth, unicorn, and trx", 2015.

(13) Boneh, D., Bonneau, J., et al. "Verifiable Delay Functions", 2018.

(14) Müller, S. "CPU Time Jitter Based Non-Physical True Random Number Generator".

(15) Cliff, Y., Boyd, C., Gonzalez Nieto, J. "How to Extract and Expand Randomness: A Summary and Explanation of Existing Results", 2009.

(16) A. Klyubin, "Some SecureRandom Thoughts", Android Developers Blog, Aug. 14, 2013.

(17) Allowing applications to do so would hamper forward compatibility — the API would then be less free to change how the RNG is implemented in the future (e.g., to use a cryptographic or otherwise "better" RNG), or to make improvements or bug fixes in methods that use that RNG (such as shuffling and Gaussian number generation). (As a notable example, the V8 JavaScript engine recently changed its Math.random() implementation to use a variant of xorshift128+, which is backward compatible because nothing in JavaScript allows Math.random() to be seeded.) Nevertheless, APIs can still allow applications to provide additional input ("entropy") to the RNG in order to increase its randomness rather than to ensure repeatability.

(18) More generally, a list has N! / (W_1! * W_2! * ... * W_K!) permutations (a multinomial coefficient), where N is the list's size, K is the number of different items in the list, and W_i is the number of times the item identified by i appears in the list. However, this number is never more than N! and suggests using less randomness, so an application need not use this more complicated formula and MAY assume that a list has N! permutations even if some of its items occur more than once.

(19) Atwood, Jeff. "The danger of naïveté".

(20) van Staveren, Hans. "Big Deal: A new program for dealing bridge hands", Sep. 8, 2000

(21) Note that although PRNGs can also act like hash functions (if they're seeded with the input and the PRNG is "large enough" for the input), some PRNGs (such as xorshift128+) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.

Appendix

 

Suggested Entropy Size

The following Python code suggests how many bits of entropy are needed for shuffling. For example:

  • To shuffle an n-item list, the suggested bits of entropy is at least as high as the base-2 logarithm, rounded up, of n! (stateLengthN(n)).
  • To shuffle a 52-item list, at least 226 bits of entropy is suggested (stateLengthN(52)).
  • To shuffle two 52-item lists of identical contents together, at least 500 bits of entropy is suggested (stateLengthDecks(2, 52)).

 

from math import factorial as fac

def ceillog2(x):
    """ Calculates base-2 logarithm, rounded up, of x. """
    ret=0
    needCeil=True
    while x>1:
       one=needCeil and ((x&1)!=0)
       x=x>>1
       if one:
         ret+=1; needCeil=False
       ret+=1
    return ret

def stateLengthN(n):
  """ Suggested state length (or bits of entropy)
     for PRNGs that shuffle
    a list of n items. """
  return ceillog2(fac(n))

def stateLengthNChooseK(n, k):
  """ Suggested state length/entropy for PRNGs that choose k
   different items randomly from a list of n items
   (see RFC 3797, sec. 3.3) """
  return ceillog2(fac(n)/(fac(k)*fac(n-k)))

def stateLengthDecks(numDecks, numCards):
  """ Suggested state length/entropy for PRNGs that shuffle
    multiple decks of cards in one. """
  return ceillog2(fac(numDecks*numCards)/ \
      (fac(numDecks)**numCards))

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

Share

About the Author

Peter Occil
United States United States
No Biography provided

You may also be interested in...

Pro

Comments and Discussions

 
GeneralMessage Closed Pin
24-Oct-18 20:02
memberMember 1403197524-Oct-18 20:02 
QuestionWell written, well explained - now a C#/VB.NET addendum Pin
MSBassSinger20-Aug-18 8:19
professionalMSBassSinger20-Aug-18 8:19 
QuestionRandom.org Pin
Blair01116-May-18 11:01
memberBlair01116-May-18 11:01 
GeneralWeighted shuffle Pin
Member 1296954116-May-18 9:33
memberMember 1296954116-May-18 9:33 
QuestionGreat article... Pin
Zdenek Sedlak15-May-18 0:23
professionalZdenek Sedlak15-May-18 0:23 
PraiseGreat article Pin
Michael Haephrati27-Apr-18 9:42
mvpMichael Haephrati27-Apr-18 9:42 
Praisev Pin
BillW3322-Jan-18 9:35
professionalBillW3322-Jan-18 9:35 
QuestionGreat article Pin
Bill Eckard20-Jan-18 11:30
memberBill Eckard20-Jan-18 11:30 
QuestionMessage Closed Pin
20-Jan-18 9:00
memberkanchipuramsarees20-Jan-18 9:00 
QuestionRandom number by a machine is nor possible, all will repeat. Pin
Member 1126470618-Jan-18 22:59
memberMember 1126470618-Jan-18 22:59 
AnswerRe: Random number by a machine is nor possible, all will repeat. Pin
A_Griffin19-Jan-18 0:14
memberA_Griffin19-Jan-18 0:14 
AnswerRe: Random number by a machine is nor possible, all will repeat. Pin
sx200819-Jan-18 11:30
membersx200819-Jan-18 11:30 
GeneralRe: Random number by a machine is nor possible, all will repeat. Pin
Member 1290021119-Jan-18 22:24
memberMember 1290021119-Jan-18 22:24 
Generalawesome explanation Pin
Jenny326312-Nov-17 19:42
memberJenny326312-Nov-17 19:42 
GeneralLike the way of explaining Pin
Member 133918593-Sep-17 20:45
memberMember 133918593-Sep-17 20:45 
GeneralMy vote of 5 Pin
Arthur V. Ratz3-Aug-17 2:48
mvpArthur V. Ratz3-Aug-17 2:48 
QuestionConstrained RNGs Pin
RogerCreagh16-Jul-17 23:07
memberRogerCreagh16-Jul-17 23:07 
AnswerRe: Constrained RNGs Pin
Peter Occil17-Jul-17 4:39
memberPeter Occil17-Jul-17 4:39 
GeneralRe: Constrained RNGs Pin
degski11-Sep-18 19:20
memberdegski11-Sep-18 19:20 
GeneralRe: Constrained RNGs Pin
Peter Occil12-Sep-18 3:13
memberPeter Occil12-Sep-18 3:13 
Questionquery seeding recomendations Pin
Member 133012369-Jul-17 18:28
memberMember 133012369-Jul-17 18:28 
AnswerRe: query seeding recomendations Pin
Peter Occil9-Jul-17 19:28
memberPeter Occil9-Jul-17 19:28 
GeneralRe: query seeding recomendations Pin
Member 1330123611-Jul-17 18:32
memberMember 1330123611-Jul-17 18:32 
AnswerRe: query seeding recomendations Pin
Peter Occil11-Jul-17 4:24
memberPeter Occil11-Jul-17 4:24 
GeneralRe: query seeding recomendations Pin
Member 1330123611-Jul-17 18:59
memberMember 1330123611-Jul-17 18:59 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Cookies | Terms of Use | Mobile
Web06-2016 | 2.8.181116.1 | Last Updated 24 Oct 2018
Article Copyright 2016 by Peter Occil
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid