Click here to Skip to main content
14,576,812 members

Partially-Sampled Exponential Random Numbers

Rate this:
5.00 (1 vote)
Please Sign up or sign in to vote.
5.00 (1 vote)
30 Jun 2020Public Domain
A Python implementation of partially-sampled exponential random numbers, or e-rands.
This page introduces partially-sampled exponential random numbers. Called e-rands, they represent incomplete numbers whose contents are determined only when necessary, making them have potentially arbitrary precision. This page includes an implementation of e-rands in Python, correctness test results, and an application to weighted random sampling of a data stream of unknown size.

Introduction

This page introduces an implementation of partially-sampled exponential random numbers. Called e-rands in this document, they represent incomplete numbers whose contents are determined only when necessary, making them have potentially arbitrary precision.

Moreover, this document includes methods that operate on e-rands in a way that uses only uniform random bits, and without relying on floating-point arithmetic (except for conversion purposes in the example method exprand). Also, the methods support e-rands with an arbitrary rate parameter (λ) greater than 0.

There are papers that discuss generating exponential random numbers using random bits (Flajolet and Saheb 1982)(1), (Karney 2014)(2), (Devroye and Gravel 2018)(3), (Thomas and Luk 2008)(4), but none I am aware of deal with generating partially-sampled exponential random numbers using an arbitrary rate, not just 1.

About the Exponential Distribution

The exponential distribution takes a parameter λ. Informally speaking, a random number that follows an exponential distribution is the number of units of time between one event and the next, and λ is the expected average number of events per unit of time. Usually, λ is equal to 1.

An exponential random number is commonly generated as follows: -ln(1 - RNDU01()), where RNDU01() is a uniform random number in the interval [**0, 1). (This particular formula is not robust, though, for reasons that are outside the scope of this document, but see (Pedersen 2018)(9)**.) This page presents an alternative way to sample exponential random numbers.

Code

The following Python code implements partially-sampled exponential random numbers, called e-rands in this document for convenience (similarly to Karney's "u-rands" for partially-sampled uniform random numbers (Karney 2014)(2)). It implements a way to determine whether one e-rand is less than another, as well as a way to fill an e-rand as necessary to give its fractional part a given number of bits.

It makes use of two observations (based on the parameter λ of the exponential distribution):

  • While a coin flip with probability of heads of exp(-λ) is heads, the exponential random number is increased by 1.
  • If a coin flip with probability of heads of 1/(1+exp(λ/2k)) is heads, the exponential random number is increased by 2-k, where k > 0 is an integer.

(Devroye and Gravel 2018)(3) already made these observations in their Appendix, but only for λ = 1.

To implement these probabilities using just random bits, the code uses two algorithms:

  • One to simulate a probability of the form exp(-x/y) (zero_or_one_exp_minus; (Canonne et al. 2020)(5)).
  • One to simulate a probability of the form 1/(1+exp(lamda/pow(2, prec))) (logisticexp (Morina et al. 2019)(6)).

Both algorithms are included in the Python code below. (Note that zero_or_one_exp_minus uses random.randint which does not necessarily use only random bits; it could be replaced with a random-bit-only algorithm such as FastDiceRoller or Bernoulli, both of which were presented by Lumbroso (2013)(7).)

<code>
import random

def logisticexp(ln, ld, prec):
        denom=ld*2**prec
        while True:
           if random.randint(0,1)==0: return 0
           if zero_or_one_exp_minus(ln, denom) == 1: return 1

def exprandnew(lamdanum=1, lamdaden=1):
     """ Returns an object to serve as a partially-sampled
          exponential random number with the given
          rate 'lamdanum'/'lamdaden'.  The object is a list of five numbers:
          the first is a multiple of 2^X, the second is X, the third is the integer
          part (initially -1 to indicate the integer part wasn't sampled yet),
          and the fourth and fifth are the lamda parameter's
          numerator and denominator, respectively.  Default for 'lamdanum'
          and 'lamdaden' is 1.
          The number created by this method will be "empty"
          (no bits sampled yet).
          """
     return [0, 0, -1, lamdanum, lamdaden]

def exprandfill(a, bits):
    """ Fills the unsampled bits of the given exponential random number
           'a' as necessary to make a number whose fractional part
           has 'bits' many bits.  If the number's fractional part already has
           that many bits or more, the number is rounded using the round-to-nearest,
           ties to even rounding rule.  Returns the resulting number as a
           multiple of 2^'bits'. """
    # Fill the integer if necessary.
    if a[2]==-1:
        a[2]=0
        while zero_or_one_exp_minus(a[3], a[4]) == 1:
            a[2]+=1
    if a[1] > bits:
        # Shifting bits beyond the first excess bit.
        aa = a[0] >> (a[1] - bits - 1)
        # Check the excess bit; if odd, round up.
        ret=aa >> 1 if (aa & 1) == 0 else (aa >> 1) + 1
        return ret|(a[2]<<bits)
    # Fill the fractional part if necessary.
    while a[1] < bits:
       index = a[1]
       a[1]+=1
       a[0]=(a[0]<<1)|logisticexp(a[3], a[4], index+1)
    return a[0]|(a[2]<<bits)

def exprandless(a, b):
        """ Determines whether one partially-sampled exponential number
           is less than another; returns
           true if so and false otherwise.  During
           the comparison, additional bits will be sampled in both numbers
           if necessary for the comparison. """
        # Check integer part of exponentials
        if a[2] == -1:
            a[2] = 0
            while zero_or_one_exp_minus(a[3], a[4]) == 1:
                a[2] += 1
        if b[2] == -1:
            b[2] = 0
            while zero_or_one_exp_minus(b[3], b[4]) == 1:
                b[2] += 1
        if a[2] < b[2]:
            return True
        if a[2] > b[2]:
            return False
        index = 0
        while True:
            # Fill with next bit in a's exponential number
            if a[1] < index:
                raise ValueError
            if b[1] < index:
                raise ValueError
            if a[1] <= index:
                a[1] += 1
                a[0] = logisticexp(a[3], a[4], index + 1) | (a[0] << 1)
            # Fill with next bit in b's exponential number
            if b[1] <= index:
                b[1] += 1
                b[0] = logisticexp(b[3], b[4], index + 1) | (b[0] << 1)
            aa = (a[0] >> (a[1] - 1 - index)) & 1
            bb = (b[0] >> (b[1] - 1 - index)) & 1
            if aa < bb:
                return True
            if aa > bb:
                return False
            index += 1

def zero_or_one_exp_minus(x, y):
        """ Generates 1 with probability exp(-px/py); 0 otherwise.
               Reference:
               Canonne, C., Kamath, G., Steinke, T., "[The Discrete Gaussian
               for Differential Privacy](https://arxiv.org/abs/2004.00010v2)", arXiv:2004.00010v2 [cs.DS], 2020. """
        if y <= 0 or x < 0:
            raise ValueError
        if x > y:
            xf = int(x / y)  # Get integer part
            x = x % y  # Reduce to fraction
            if x > 0 and zero_or_one_exp_minus(x, y) == 0:
                return 0
            for i in range(1, xf + 1):
                if zero_or_one_exp_minus(1, 1) == 0:
                    return 0
            return 1
        r = 1
        oy = y
        while True:
            # NOTE: randint is used in this example, but
            # it could be replaced by a random-bit-only
            # algorithm such as the Fast Dice Roller (Lumbroso 2013),
            # or the Bernoulli method given in the same paper.
            if random.randint(0, y-1) >= x:
                return r
            if r == 1:
                r = 0
            else:
                r = 1
            y = y + oy

# Example of use
def exprand(lam):
   return exprandfill(exprandnew(lam),53)*1.0/(1<<53)

Correctness Testing

To test the correctness of the exprandfill method, the Kolmogorov–Smirnov test was applied with various values of λ and the default precision of 53, using SciPy's kstest method. The code for the test is very simple: kst = scipy.stats.kstest(ksample, lambda x: scipy.stats.expon.cdf(x, scale=1/lamda)), where ksample is a sample of random numbers generated using the exprand method above. Note that SciPy uses a two-sided Kolmogorov–Smirnov test by default.

The table below shows the results of the correctness testing. For each parameter, five samples with 50,000 numbers per sample were taken, and results show the lowest and highest Kolmogorov–Smirnov statistics and p-values achieved for the five samples. Note that a p-value extremely close to 0 or 1 strongly indicates that the samples do not come from the corresponding exponential distribution.

λ Statistic p-value
1/10 0.00233-0.00435 0.29954-0.94867
1/4 0.00254-0.00738 0.00864-0.90282
1/2 0.00195-0.00521 0.13238-0.99139
2/3 0.00295-0.00457 0.24659-0.77715
3/4 0.00190-0.00636 0.03514-0.99381
9/10 0.00226-0.00474 0.21032-0.96029
1 0.00267-0.00601 0.05389-0.86676
2 0.00293-0.00684 0.01870-0.78310
3 0.00284-0.00675 0.02091-0.81589
5 0.00256-0.00546 0.10130-0.89935
10 0.00279-0.00528 0.12358-0.82974

To test the correctness of exprandless, a two-independent-sample T-test was applied to scores involving e-rands and scores involving the Python random.expovariate method. Specifically, the score is calculated as the number of times one exponential number compares as less than another; for the same λ this event should ideally be as likely as the event that it compares as greater. The Python code that follows the table calculates this score for e-rands and expovariate. Even here, the code for the test is very simple: kst = scipy.stats.ttest_ind(exppyscores, exprandscores), where exppyscores and exprandscores are each lists of 20 results from exppyscore or exprandscore, respectively.

The table below shows the results of the correctness testing. For each pair of parameters, results show the lowest and highest T-test statistics and p-values achieved for the 20 results. Note that a p-value extremely close to 0 or 1 strongly indicates that exponential random numbers are not compared as less or greater with the expected probability.

Left λRight λStatisticp-value
1/101/10-1.21015 – 0.936820.23369 – 0.75610
1/101/2-1.25248 – 3.562910.00101 – 0.39963
1/101-0.76586 – 1.076280.28859 – 0.94709
1/102-1.80624 – 1.583470.07881 – 0.90802
1/105-0.16197 – 1.787000.08192 – 0.87219
1/21/10-1.46973 – 1.403080.14987 – 0.74549
1/21/2-0.79555 – 1.215380.23172 – 0.93613
1/21-0.90496 – 0.111130.37119 – 0.91210
1/22-1.32157 – -0.070660.19421 – 0.94404
1/25-0.55135 – 1.856040.07122 – 0.76994
11/10-1.27023 – 0.735010.21173 – 0.87314
11/2-2.33246 – 0.668270.02507 – 0.58741
11-1.24446 – 0.845550.22095 – 0.90587
12-1.13643 – 0.841480.26289 – 0.95717
15-0.70037 – 1.467780.15039 – 0.86996
21/10-0.77675 – 1.153500.25591 – 0.97870
21/2-0.23122 – 1.207640.23465 – 0.91855
21-0.92273 – -0.059040.36197 – 0.95323
22-1.88150 – 0.640960.06758 – 0.73056
25-0.08315 – 1.019510.31441 – 0.93417
51/10-0.60921 – 1.546060.13038 – 0.91563
51/2-1.30038 – 1.436020.15918 – 0.86349
51-1.22803 – 1.353800.18380 – 0.64158
52-1.83124 – 1.402220.07491 – 0.66075
55-0.97110 – 2.009040.05168 – 0.74398
def exppyscore(ln,ld,ln2,ld2):
        return sum(1 if random.expovariate(ln*1.0/ld)<random.expovariate(ln2*1.0/ld2) \
              else 0 for i in range(1000))

def exprandscore(ln,ld,ln2,ld2):
        return sum(1 if exprandless(exprandnew(ln,ld), exprandnew(ln2,ld2)) \
              else 0 for i in range(1000))

Application to Weighted Reservoir Sampling

Weighted reservoir sampling (choosing an item at random from a list of unknown size) is often implemented by—

  • assigning each item a weight (an integer 0 or greater) as it's encountered, call it w,
  • giving each item an exponential random number with λ = w, call it a key, and
  • choosing the item with the smallest key

(see also (Efraimidis 2015)(8)). However, using fully-sampled exponential random numbers as keys (such as the naïve idiom -ln(1-RNDU01())/w in binary64) can lead to inexact sampling, since the keys have a limited precision, it's possible for multiple items to have the same random key (which can make sampling those items depend on their order rather than on randomness), and the maximum weight is unknown. Partially-sampled e-rands, as given in this document, eliminate the problem of inexact sampling. This is notably because the exprandless method returns one of only two answers—either "less" or "greater"—and samples from both e-rands as necessary so that they will differ from each other by the end of the operation. Another reason is that partially-sampled e-rands have potentially arbitrary precision.

Notes

(1) Philippe Flajolet, Nasser Saheb. The complexity of generating an exponentially distributed variate. [Research Report] RR-0159, INRIA. 1982. inria-00076400.

(2) Karney, C.F.F., "Sampling exactly from the normal distribution", arXiv:1303.6257v2 [physics.comp-ph], 2014.

(3) Devroye, L., Gravel, C., "Sampling with arbitrary precision", arXiv:1502.02539v5 [cs.IT], 2018.

(4) Thomas, D.B. and Luk, W., 2008, September. Sampling from the exponential distribution using independent bernoulli variates. In 2008 International Conference on Field Programmable Logic and Applications (pp. 239-244). IEEE.

(5) Canonne, C., Kamath, G., Steinke, T., "The Discrete Gaussian for Differential Privacy", arXiv:2004.00010v2 [cs.DS], 2020.

(6) Morina, G., Łatuszyński, K., et al., "From the Bernoulli Factory to a Dice Enterprise via Perfect Sampling of Markov Chains", 2019.

(7) Lumbroso, J., "Optimal Discrete Uniform Generation from Coin Flips, and Applications", arXiv:1304.1916 [cs.DS].

(8) Efraimidis, P. "Weighted Random Sampling over Data Streams", arXiv:1012.0256v2 [cs.DS], 2015.

(9) Pedersen, K., "Reconditioning your quantile function", arXiv:1704.07949v3 [stat.CO], 2018

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

Share

About the Author

Peter Occil
United States United States
No Biography provided

Comments and Discussions

 
-- There are no messages in this forum --
Article
Posted 30 Jun 2020

Tagged as

Stats

1.6K views
1 bookmarked