Finding Probability Distribution Parameters from Percentiles

John D. Cook

4.18/5 (8 votes)

Feb 3, 2010

BSD

4 min read

74231

657

How to determine the parameters of a probability distribution given two percentile constraints

Download source - 1.43 KB

Introduction

Suppose there's a 10% chance of something being less than 30 and a 90% chance of it being less than 60. Once you pick a probability distribution family (normal, gamma, etc.) you need a way of determining what parameters will satisfy your two requirements.

More precisely, suppose a random variable X has a two-parameter distribution. The problem is to find values of those parameters so that Pr(X < x₁) = p₁ and Pr(X < x₂) = p₂.

This article will show how to compute these parameters for normal, Cauchy, Weibull, gamma, and inverse gamma distributions using Python's SciPy library.

The problem of finding parameters to satisfy two percentile equations is practical. It comes up, for example, in determining prior distributions in Bayesian statistics. This article also serves as a brief introduction to using SciPy.

Background

The mathematical solution to the problem posed here is described in the technical report Determining distribution parameters from quantiles. Here we explain how to implement in Python the calculations in that report. Here is an article giving more motivation for the problem.

Using the Code

The code described here is very simple to call. Suppose you want to find the mean and standard deviation for a normal distribution. You simply call normal_parameters with the appropriate arguments. It returns the mean and standard deviation as a pair.

(mean, stdev) = normal_parameters(x1, p1, x2, p2)

The functions for other distributions are similar:

cauchy_parameters
weibull_parameters
gamma_parameters
inverse_gamma_parameters
beta_parameters

All functions take the same four arguments and all return two parameters. However, the meaning of the parameters is different for each distribution. For the normal distribution, the problem is finding μ and σ so that F(x₁) = p₁ and F(x₂) = p₂ where F(x) is defined by:

$F(x) = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^x \exp\left( -\frac{(t-\mu)^2}{2\sigma^2}\right) \, dt$

For the Cauchy distribution, the problem is similar except now:

$F(x) = \frac{1}{\pi\sigma} \int_{-\infty}^x \frac{dt}{ 1 + \left(\frac{t-\mu}{\sigma} \right)^2 }}$

For the Weibull distribution, the problem is finding γ and β where:

$F(x) = \frac{\gamma}{\beta^\gamma} \int_0^x t^{\gamma-1} \exp\left(-(t/\beta)^\gamma \right)\, dt$

For the gamma distribution, the problem is finding α and β where:

$F(x) = \frac{1}{\Gamma(\alpha)\,\beta^\alpha} \int_0^x t^{\alpha-1} \exp(-x/\beta)\, dt$

For the inverse gamma distribution, the problem is finding α and β where:

$F(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^x t^{-\alpha-1}\exp(-\beta/t)\, dt$

For the beta distribution, the problem is finding a and b where:

$F(x) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\, dt$

Looking Inside the Code

The code presented here depends critically on SciPy, a large library for scientific computing in Python. For an introduction to SciPy, see the CodeProject article Getting started with the SciPy (Scientific Python) library.

For the normal and Cauchy distributions, the location parameter is given by:

$\frac{x_1 F^{-1}(p_2) - x_2 F^{-1}(p_1)}{F^{-1}(p_2) - F^{-1}(p_1)}$

and the scale parameter is given by:

$\frac{x_2 - x_1}{F^{-1}(p_2) - F^{-1}(p_1)}$

where F(x) is the CDF of the normal or Cauchy distribution as in the previous section. The inverse of the CDF is given by the ppt method in SciPy. Here "ppt" stands for "percentile point function." Other libraries may call this the quantile function. The probability distribution classes are located in scipy.stats.

The parameters for the Weibull distribution can be given by a simple formula not requiring any SciPy functionality. The inverse gamma parameters are also easy to find since the inverse gamma problem can be reduced to the problem of finding parameters for the gamma distribution.

The gamma distribution parameters cannot be obtained so simply. The code to find these parameters illustrates SciPy and the functools module.

def gamma_parameters(x1, p1, x2, p2):
    
    # Standardize so that x1 < x2 and p1 < p2
    if p1 > p2:
        (p1, p2) = (p2, p1)
        (x1, x2) = (x2, x1)
    
    # function to find roots of for gamma distribution parameters
    def objective(alpha):
        return stats.gamma.ppf(p2, alpha) / stats.gamma.ppf(p1, alpha) - x2/x1
    
    # The objective function we're wanting to find a root of is decreasing.
    # We need to find an interval over which is goes from positive to negative.
    left = right = 1.0
    while objective(left) < 0.0:
        left /= 2
    while objective(right) > 0.0:
        right *= 2
    alpha = optimize.bisect(objective, left, right)
    beta = x1 / stats.gamma.ppf(p1, alpha)
    
    return (alpha, beta)

Notice the auxiliary function objective defined inside the gamma_parameters function. A nice feature of Python is that you can easily define little functions like this using closures. Only the gamma_parameters needs this function so we define it inside gamma_parameters to keep from cluttering the outer scope.

To find where the function objective is 0, we use one of the root-finding methods in the scipy.optimize. The code here uses the bisect method, an algorithm that is slow compared to other root-finding algorithms but is very reliable.

For the beta distribution, we have to direct method for solving for the distribution parameters. Instead we solve an optimization problem using the SciPy function fmin. We search for the parameters which come closest to satisfying the percentile constraints. In fact these parameters will satisfy the constraints exactly.

def beta_parameters(x1, p1, x2, p2):
    "Find parameters for a beta random variable X 
	so that P(X > x1) = p1 and P(X > x2) = p2."

    def square(x): 
        return x*x

    def objective(v):
        (a, b) = v
        temp  = square( stats.beta.cdf(x1, a, b) - p1 )
        temp += square( stats.beta.cdf(x2, a, b) - p2 )
        return temp
    
    # arbitrary initial guess of (3, 3) for parameters
    xopt = optimize.fmin(objective, (3, 3))
    return (xopt[0], xopt[1])

The approach used to calculate the beta distribution parameters could be used for other distributions as well. In fact, we could have used this approach for all distributions. However, the specialized methods used for the other distributions are more accurate and more efficient.

History

3^rd February, 2010: Initial version
4^th February, 2010: Article updated
26^th July, 2010: Beta distribution added