## Introduction

This page introduces an implementation of *partially-sampled* exponential random numbers. Called *e-rands* in this document, they represent incomplete numbers whose bits are sampled only when necessary, giving them potentially arbitrary precision.

Moreover, this document includes methods that operate on e-rands using only uniform random bits and without relying on floating-point arithmetic (except for conversion purposes in the example method `exprand`). The methods also support e-rands with an arbitrary rate parameter (λ) greater than 0.

There are papers that discuss generating exponential random numbers using random bits (Flajolet and Saheb 1982)^{(1)}, (Karney 2014)^{(2)}, (Devroye and Gravel 2018)^{(3)}, (Thomas and Luk 2008)^{(4)}, but none I am aware of deals with generating partially-sampled exponential random numbers with an arbitrary rate, not just a rate of 1.

## About the Exponential Distribution

The *exponential distribution* takes a parameter λ. Informally speaking, a random number that follows an exponential distribution is the amount of time between one event and the next, where λ is the expected number of events per unit of time. Such a number has the probability density λ\*exp(-λ\*x) for x ≥ 0. Usually, λ is equal to 1.

An exponential random number is commonly generated as `-ln(1 - RNDU01())`, where `RNDU01()` is a uniform random number in the interval [0, 1). (This particular formula is not robust, though, for reasons that are outside the scope of this document; see (Pedersen 2018)^{(9)}.) This page presents an alternative way to sample exponential random numbers.
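For concreteness, the common idiom just mentioned can be sketched as follows (`naive_exprand` is a name chosen here for illustration; it shares the robustness issues noted above):

```
import math
import random

def naive_exprand(lam):
    """Common floating-point idiom for an exponential random
    number with rate 'lam'; not robust, as discussed above."""
    return -math.log(1.0 - random.random()) / lam

# The sample mean of exponential numbers with rate 2
# should be close to 1/2.
random.seed(1)
mean = sum(naive_exprand(2.0) for _ in range(100000)) / 100000
```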

## Code

The following Python code implements partially-sampled exponential random numbers, called e-rands in this document for convenience (similarly to Karney's "u-rands" for partially-sampled uniform random numbers (Karney 2014)^{(2)}). It implements a way to determine whether one e-rand is less than another, as well as a way to fill an e-rand as necessary to give its fractional part a given number of bits.

It makes use of two observations (based on the parameter λ of the exponential distribution):

- While a coin flip with probability of heads of exp(-λ) is heads, the exponential random number is increased by 1.
- If a coin flip with probability of heads of 1/(1+exp(λ/2^{k})) is heads, the exponential random number is increased by 2^{-k}, where *k* > 0 is an integer.

(Devroye and Gravel 2018)^{(3)} already made these observations in their Appendix, but only for λ = 1.
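The first observation can be illustrated with a quick sketch (it uses a floating-point coin flip for simplicity, unlike the bit-only code below): counting how many times a coin with heads probability exp(-λ) comes up heads in a row gives the integer part of an exponential random number.

```
import math
import random

def integer_part(lam):
    """Counts consecutive heads of a coin whose heads probability
    is exp(-lam); the count is distributed as the integer part of
    an exponential random number with rate 'lam'."""
    count = 0
    while random.random() < math.exp(-lam):
        count += 1
    return count

# For rate 1, P(integer part = k) = (1 - exp(-1)) * exp(-k),
# so the expected value is 1/(e - 1), about 0.582.
random.seed(1)
avg = sum(integer_part(1.0) for _ in range(100000)) / 100000
```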

To implement these probabilities using just random bits, the code uses two algorithms:

- One to simulate a probability of the form `exp(-x/y)` (`zero_or_one_exp_minus`; (Canonne et al. 2020)^{(5)}).
- One to simulate a probability of the form `1/(1+exp(lamda/pow(2, prec)))` (`logisticexp`; (Morina et al. 2019)^{(6)}).

Both algorithms are included in the Python code below. (Note that `zero_or_one_exp_minus` uses `random.randint`, which does not necessarily use only random bits; it could be replaced with a random-bit-only algorithm, such as the Fast Dice Roller or the Bernoulli method, both of which were presented by Lumbroso (2013)^{(7)}.)
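As a standalone sanity check of these two probabilities (a sketch that duplicates the two samplers from the code below so it can run on its own), the empirical rate of 1s can be compared with the target probabilities exp(-1/2) ≈ 0.6065 and 1/(1+exp(1/2)) ≈ 0.3775:

```
import math
import random

def zero_or_one_exp_minus(x, y):
    """Returns 1 with probability exp(-x/y) (Canonne et al. 2020)."""
    if y <= 0 or x < 0:
        raise ValueError
    if x > y:
        xf = x // y  # Integer part of x/y
        x = x % y  # Fractional part of x/y
        if x > 0 and zero_or_one_exp_minus(x, y) == 0:
            return 0
        for _ in range(xf):
            if zero_or_one_exp_minus(1, 1) == 0:
                return 0
        return 1
    r = 1
    oy = y
    while True:
        if random.randint(0, y - 1) >= x:
            return r
        r = 1 - r
        y = y + oy

def logisticexp(ln, ld, prec):
    """Returns 1 with probability 1/(1+exp(ln/(ld*2^prec)))."""
    denom = ld * 2**prec
    while True:
        if random.randint(0, 1) == 0:
            return 0
        if zero_or_one_exp_minus(ln, denom) == 1:
            return 1

random.seed(1)
n = 50000
p1 = sum(zero_or_one_exp_minus(1, 2) for _ in range(n)) / n
p2 = sum(logisticexp(1, 1, 1) for _ in range(n)) / n
```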

```
import random

def logisticexp(ln, ld, prec):
    """ Returns 1 with probability 1/(1+exp(ln/(ld*2^prec))). """
    denom = ld * 2**prec
    while True:
        if random.randint(0, 1) == 0:
            return 0
        if zero_or_one_exp_minus(ln, denom) == 1:
            return 1

def exprandnew(lamdanum=1, lamdaden=1):
    """ Returns an object to serve as a partially-sampled
        exponential random number with the given
        rate 'lamdanum'/'lamdaden'.  The object is a list of five numbers:
        the first is a multiple of 2^X, the second is X, the third is the integer
        part (initially -1 to indicate the integer part wasn't sampled yet),
        and the fourth and fifth are the lamda parameter's
        numerator and denominator, respectively.  Default for 'lamdanum'
        and 'lamdaden' is 1.
        The number created by this method will be "empty"
        (no bits sampled yet). """
    return [0, 0, -1, lamdanum, lamdaden]

def exprandfill(a, bits):
    """ Fills the unsampled bits of the given exponential random number
        'a' as necessary to make a number whose fractional part
        has 'bits' many bits.  If the number's fractional part already has
        that many bits or more, the number is rounded using the round-to-nearest,
        ties-to-even rounding rule.  Returns the resulting number as an
        integer equal to the number multiplied by 2^'bits'. """
    # Sample the integer part if that hasn't been done yet.
    if a[2] == -1:
        a[2] = 0
        while zero_or_one_exp_minus(a[3], a[4]) == 1:
            a[2] += 1
    if a[1] > bits:
        # Shift out all excess bits but the first.
        aa = a[0] >> (a[1] - bits - 1)
        # Check the first excess bit; if set, round up.
        ret = aa >> 1 if (aa & 1) == 0 else (aa >> 1) + 1
        return ret | (a[2] << bits)
    # Sample the rest of the fractional part if necessary.
    while a[1] < bits:
        index = a[1]
        a[1] += 1
        a[0] = (a[0] << 1) | logisticexp(a[3], a[4], index + 1)
    return a[0] | (a[2] << bits)

def exprandless(a, b):
    """ Determines whether one partially-sampled exponential number
        is less than another; returns True if so and False otherwise.
        During the comparison, additional bits will be sampled in both
        numbers if necessary for the comparison. """
    # Sample each number's integer part if that hasn't been done yet.
    if a[2] == -1:
        a[2] = 0
        while zero_or_one_exp_minus(a[3], a[4]) == 1:
            a[2] += 1
    if b[2] == -1:
        b[2] = 0
        while zero_or_one_exp_minus(b[3], b[4]) == 1:
            b[2] += 1
    # Compare the integer parts.
    if a[2] < b[2]:
        return True
    if a[2] > b[2]:
        return False
    # Compare the fractional parts bit by bit,
    # sampling further bits as necessary.
    index = 0
    while True:
        if a[1] < index:
            raise ValueError
        if b[1] < index:
            raise ValueError
        # Sample the next bit of a's fractional part if necessary.
        if a[1] <= index:
            a[1] += 1
            a[0] = logisticexp(a[3], a[4], index + 1) | (a[0] << 1)
        # Sample the next bit of b's fractional part if necessary.
        if b[1] <= index:
            b[1] += 1
            b[0] = logisticexp(b[3], b[4], index + 1) | (b[0] << 1)
        aa = (a[0] >> (a[1] - 1 - index)) & 1
        bb = (b[0] >> (b[1] - 1 - index)) & 1
        if aa < bb:
            return True
        if aa > bb:
            return False
        index += 1

def zero_or_one_exp_minus(x, y):
    """ Generates 1 with probability exp(-x/y); 0 otherwise.
        Reference:
        Canonne, C., Kamath, G., Steinke, T., "[The Discrete Gaussian
        for Differential Privacy](https://arxiv.org/abs/2004.00010v2)",
        arXiv:2004.00010v2 [cs.DS], 2020. """
    if y <= 0 or x < 0:
        raise ValueError
    if x > y:
        xf = x // y  # Get integer part (exact integer division)
        x = x % y  # Reduce to fraction
        if x > 0 and zero_or_one_exp_minus(x, y) == 0:
            return 0
        for i in range(1, xf + 1):
            if zero_or_one_exp_minus(1, 1) == 0:
                return 0
        return 1
    r = 1
    oy = y
    while True:
        # NOTE: randint is used in this example, but
        # it could be replaced by a random-bit-only
        # algorithm, such as the Fast Dice Roller (Lumbroso 2013)
        # or the Bernoulli method given in the same paper.
        if random.randint(0, y - 1) >= x:
            return r
        r = 0 if r == 1 else 1
        y = y + oy

# Example of use
def exprand(lam):
    return exprandfill(exprandnew(lam), 53) * 1.0 / (1 << 53)
```

## Correctness Testing

To test the correctness of the `exprandfill` method, the Kolmogorov–Smirnov test was applied with various values of λ and the default precision of 53, using SciPy's `kstest` method. The code for the test is very simple: `kst = scipy.stats.kstest(ksample, lambda x: scipy.stats.expon.cdf(x, scale=1/lamda))`, where `ksample` is a sample of random numbers generated using the `exprand` method above. Note that SciPy uses a two-sided Kolmogorov–Smirnov test by default.

The table below shows the results of the correctness testing. For each λ, five samples of 50,000 numbers each were taken, and the results show the lowest and highest Kolmogorov–Smirnov statistics and p-values achieved across the five samples. Note that a p-value extremely close to 0 or 1 strongly indicates that the samples do not come from the corresponding exponential distribution.

λ | Statistic | p-value |
---|---|---|
1/10 | 0.00233-0.00435 | 0.29954-0.94867 |
1/4 | 0.00254-0.00738 | 0.00864-0.90282 |
1/2 | 0.00195-0.00521 | 0.13238-0.99139 |
2/3 | 0.00295-0.00457 | 0.24659-0.77715 |
3/4 | 0.00190-0.00636 | 0.03514-0.99381 |
9/10 | 0.00226-0.00474 | 0.21032-0.96029 |
1 | 0.00267-0.00601 | 0.05389-0.86676 |
2 | 0.00293-0.00684 | 0.01870-0.78310 |
3 | 0.00284-0.00675 | 0.02091-0.81589 |
5 | 0.00256-0.00546 | 0.10130-0.89935 |
10 | 0.00279-0.00528 | 0.12358-0.82974 |
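A scaled-down version of this test can be sketched without SciPy by computing the two-sided Kolmogorov–Smirnov statistic directly against the exponential CDF 1 - exp(-λx). (The snippet below applies it to Python's `random.expovariate` rather than `exprand` so it is self-contained; `ks_statistic` is a helper name chosen here for illustration.)

```
import math
import random

def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov statistic: the largest absolute
    gap between the empirical CDF of 'sample' and 'cdf'."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, abs(fx - i / n), abs(fx - (i + 1) / n))
    return d

random.seed(1)
lamda = 2.0
sample = [random.expovariate(lamda) for _ in range(10000)]
stat = ks_statistic(sample, lambda x: 1.0 - math.exp(-lamda * x))
```

A small statistic (well below the critical value, about 0.0195 at the 0.001 level for this sample size) is consistent with the sample coming from the exponential distribution.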

To test the correctness of `exprandless`, a two-independent-sample T-test was applied to scores involving e-rands and scores involving the Python `random.expovariate` method. Specifically, the score is the number of times the first exponential number compares as less than the second; for the same λ, this event should ideally be as likely as the event that it compares as greater. The Python code that follows the table calculates this score for e-rands and `expovariate`. Here, too, the code for the test is very simple: `kst = scipy.stats.ttest_ind(exppyscores, exprandscores)`, where `exppyscores` and `exprandscores` are each lists of 20 results from `exppyscore` or `exprandscore`, respectively.

The table below shows the results of the correctness testing. For each pair of parameters, the results show the lowest and highest T-test statistics and p-values achieved across the 20 results. Note that a p-value extremely close to 0 or 1 strongly indicates that exponential random numbers are not compared as less or greater with the expected probability.

Left λ | Right λ | Statistic | p-value |
---|---|---|---|
1/10 | 1/10 | -1.21015 – 0.93682 | 0.23369 – 0.75610 |
1/10 | 1/2 | -1.25248 – 3.56291 | 0.00101 – 0.39963 |
1/10 | 1 | -0.76586 – 1.07628 | 0.28859 – 0.94709 |
1/10 | 2 | -1.80624 – 1.58347 | 0.07881 – 0.90802 |
1/10 | 5 | -0.16197 – 1.78700 | 0.08192 – 0.87219 |
1/2 | 1/10 | -1.46973 – 1.40308 | 0.14987 – 0.74549 |
1/2 | 1/2 | -0.79555 – 1.21538 | 0.23172 – 0.93613 |
1/2 | 1 | -0.90496 – 0.11113 | 0.37119 – 0.91210 |
1/2 | 2 | -1.32157 – -0.07066 | 0.19421 – 0.94404 |
1/2 | 5 | -0.55135 – 1.85604 | 0.07122 – 0.76994 |
1 | 1/10 | -1.27023 – 0.73501 | 0.21173 – 0.87314 |
1 | 1/2 | -2.33246 – 0.66827 | 0.02507 – 0.58741 |
1 | 1 | -1.24446 – 0.84555 | 0.22095 – 0.90587 |
1 | 2 | -1.13643 – 0.84148 | 0.26289 – 0.95717 |
1 | 5 | -0.70037 – 1.46778 | 0.15039 – 0.86996 |
2 | 1/10 | -0.77675 – 1.15350 | 0.25591 – 0.97870 |
2 | 1/2 | -0.23122 – 1.20764 | 0.23465 – 0.91855 |
2 | 1 | -0.92273 – -0.05904 | 0.36197 – 0.95323 |
2 | 2 | -1.88150 – 0.64096 | 0.06758 – 0.73056 |
2 | 5 | -0.08315 – 1.01951 | 0.31441 – 0.93417 |
5 | 1/10 | -0.60921 – 1.54606 | 0.13038 – 0.91563 |
5 | 1/2 | -1.30038 – 1.43602 | 0.15918 – 0.86349 |
5 | 1 | -1.22803 – 1.35380 | 0.18380 – 0.64158 |
5 | 2 | -1.83124 – 1.40222 | 0.07491 – 0.66075 |
5 | 5 | -0.97110 – 2.00904 | 0.05168 – 0.74398 |

```
def exppyscore(ln, ld, ln2, ld2):
    return sum(1 if random.expovariate(ln * 1.0 / ld) < random.expovariate(ln2 * 1.0 / ld2) \
        else 0 for i in range(1000))

def exprandscore(ln, ld, ln2, ld2):
    return sum(1 if exprandless(exprandnew(ln, ld), exprandnew(ln2, ld2)) \
        else 0 for i in range(1000))
```

## Application to Weighted Reservoir Sampling

**Weighted reservoir sampling** (choosing an item at random from a list of unknown size, where each item has a weight) is often implemented by—

- assigning each item a *weight* (an integer 0 or greater) as it's encountered; call it *w*,
- giving each item an exponential random number with λ = *w*, called the item's key, and
- choosing the item with the smallest key

(see also (Efraimidis 2015)^{(8)}). However, using fully-sampled exponential random numbers as keys (such as the naïve idiom `-ln(1-RNDU01())/w` in binary64) can lead to inexact sampling: because the keys have limited precision, it's possible for multiple items to have the same key (which can make sampling those items depend on their order rather than on randomness), and the maximum weight is unknown in advance. Partially-sampled e-rands, as given in this document, eliminate the problem of inexact sampling. This is notably because the `exprandless` method returns one of only two answers—either "less" or "greater"—and samples bits from both e-rands as necessary so that they will differ from each other by the end of the operation. Another reason is that partially-sampled e-rands have potentially arbitrary precision.
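For illustration, the scheme above can be sketched with ordinary floating-point keys, the very representation whose shortcomings were just discussed; using e-rand keys compared with `exprandless` instead would avoid those shortcomings. (`weighted_choice_by_keys` is a name chosen here for illustration.)

```
import random

def weighted_choice_by_keys(items, weights):
    """Picks an item with probability proportional to its weight
    by giving each item an exponential key with rate equal to its
    weight and choosing the item with the smallest key.
    Floating-point keys are used here for simplicity, so the
    precision caveats discussed above apply."""
    best = None
    best_key = None
    for i, w in enumerate(weights):
        if w <= 0:
            continue  # Zero-weight items are never chosen.
        key = random.expovariate(w)
        if best_key is None or key < best_key:
            best, best_key = i, key
    return items[best]

# With weights 1 and 3, the second item should be chosen
# about 3/4 of the time.
random.seed(1)
wins = sum(1 for _ in range(10000)
           if weighted_choice_by_keys(["a", "b"], [1, 3]) == "b")
```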

## Notes

^{(1)} Philippe Flajolet, Nasser Saheb. The complexity of generating an exponentially distributed variate. [Research Report] RR-0159, INRIA. 1982. inria-00076400.

^{(2)} Karney, C.F.F., "**Sampling exactly from the normal distribution**", arXiv:1303.6257v2 [physics.comp-ph], 2014.

^{(3)} Devroye, L., Gravel, C., "**Sampling with arbitrary precision**", arXiv:1502.02539v5 [cs.IT], 2018.

^{(4)} Thomas, D.B. and Luk, W., 2008, September. Sampling from the exponential distribution using independent bernoulli variates. In 2008 International Conference on Field Programmable Logic and Applications (pp. 239-244). IEEE.

^{(5)} Canonne, C., Kamath, G., Steinke, T., "**The Discrete Gaussian for Differential Privacy**", arXiv:2004.00010v2 [cs.DS], 2020.

^{(6)} Morina, G., Łatuszyński, K., et al., "From the Bernoulli Factory to a Dice Enterprise via Perfect Sampling of Markov Chains", 2019.

^{(7)} Lumbroso, J., "**Optimal Discrete Uniform Generation from Coin Flips, and Applications**", arXiv:1304.1916 [cs.DS], 2013.

^{(8)} Efraimidis, P. "**Weighted Random Sampling over Data Streams**", arXiv:1012.0256v2 [cs.DS], 2015.

^{(9)} Pedersen, K., "**Reconditioning your quantile function**", arXiv:1704.07949v3 [stat.CO], 2018.