Click here to Skip to main content
Click here to Skip to main content

Finding Probability Distribution Parameters from Percentiles

, 26 Jun 2014
Rate this:
Please Sign up or sign in to vote.
How to determine the parameters of a probability distribution given two percentile constraints

Introduction

Suppose there's a 10% chance of something being less than 30 and a 90% chance of it being less than 60. Once you pick a probability distribution family (normal, gamma, etc.) you need a way of determining what parameters will satisfy your two requirements.

More precisely, suppose a random variable X has a two-parameter distribution. The problem is to find values of those parameters so that Pr(X < x1) = p1 and Pr(X < x2) = p2.

This article will show how to compute these parameters for normal, Cauchy, Weibull, gamma, and inverse gamma distributions using Python's SciPy library.

The problem of finding parameters to satisfy two percentile equations is practical. It comes up, for example, in determining prior distributions in Bayesian statistics. This article also serves as a brief introduction to using SciPy.

Background

The mathematical solution to the problem posed here is described in the technical report Determining distribution parameters from quantiles. Here we explain how to implement in Python the calculations in that report. Here is an article giving more motivation for the problem.

Using the Code

The code described here is very simple to call. Suppose you want to find the mean and standard deviation for a normal distribution. You simply call normal_parameters with the appropriate arguments. It returns the mean and standard deviation as a pair.

(mean, stdev) = normal_parameters(x1, p1, x2, p2)

The functions for other distributions are similar:

  • cauchy_parameters
  • weibull_parameters
  • gamma_parameters
  • inverse_gamma_parameters
  • beta_parameters

All functions take the same four arguments and all return two parameters. However, the meaning of the parameters is different for each distribution. For the normal distribution, the problem is finding μ and σ so that F(x1) = p1 and F(x2) = p2 where F(x) is defined by:

F(x) = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^x  \exp\left( -\frac{(t-\mu)^2}{2\sigma^2}\right) \, dt

For the Cauchy distribution, the problem is similar except now:

F(x) = \frac{1}{\pi\sigma} \int_{-\infty}^x \frac{dt}{ 1 + \left(\frac{t-\mu}{\sigma} \right)^2 }}

For the Weibull distribution, the problem is finding γ and β where:

F(x) = \frac{\gamma}{\beta^\gamma} \int_0^x t^{\gamma-1} \exp\left(-(t/\beta)^\gamma \right)\, dt

For the gamma distribution, the problem is finding α and β where:

F(x) = \frac{1}{\Gamma(\alpha)\,\beta^\alpha} \int_0^x t^{\alpha-1} \exp(-x/\beta)\, dt

For the inverse gamma distribution, the problem is finding α and β where:

F(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^x t^{-\alpha-1}\exp(-\beta/t)\, dt

For the beta distribution, the problem is finding a and b where:

F(x) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\, dt

Looking Inside the Code

The code presented here depends critically on SciPy, a large library for scientific computing in Python. For an introduction to SciPy, see the CodeProject article Getting started with the SciPy (Scientific Python) library.

For the normal and Cauchy distributions, the location parameter is given by:

\frac{x_1 F^{-1}(p_2) - x_2 F^{-1}(p_1)}{F^{-1}(p_2) - F^{-1}(p_1)}

and the scale parameter is given by:

\frac{x_2 - x_1}{F^{-1}(p_2) - F^{-1}(p_1)}

where F(x) is the CDF of the normal or Cauchy distribution as in the previous section. The inverse of the CDF is given by the ppt method in SciPy. Here "ppt" stands for "percentile point function." Other libraries may call this the quantile function. The probability distribution classes are located in scipy.stats.

The parameters for the Weibull distribution can be given by a simple formula not requiring any SciPy functionality. The inverse gamma parameters are also easy to find since the inverse gamma problem can be reduced to the problem of finding parameters for the gamma distribution.

The gamma distribution parameters cannot be obtained so simply. The code to find these parameters illustrates SciPy and the functools module.

def gamma_parameters(x1, p1, x2, p2):
    
    # Standardize so that x1 < x2 and p1 < p2
    if p1 > p2:
        (p1, p2) = (p2, p1)
        (x1, x2) = (x2, x1)
    
    # function to find roots of for gamma distribution parameters
    def objective(alpha):
        return stats.gamma.ppf(p2, alpha) / stats.gamma.ppf(p1, alpha) - x2/x1
    
    # The objective function we're wanting to find a root of is decreasing.
    # We need to find an interval over which is goes from positive to negative.
    left = right = 1.0
    while objective(left) < 0.0:
        left /= 2
    while objective(right) > 0.0:
        right *= 2
    alpha = optimize.bisect(objective, left, right)
    beta = x1 / stats.gamma.ppf(p1, alpha)
    
    return (alpha, beta)

Notice the auxiliary function objective defined inside the gamma_parameters function. A nice feature of Python is that you can easily define little functions like this using closures. Only the gamma_parameters needs this function so we define it inside gamma_parameters to keep from cluttering the outer scope.

To find where the function objective is 0, we use one of the root-finding methods in the scipy.optimize. The code here uses the bisect method, an algorithm that is slow compared to other root-finding algorithms but is very reliable.

For the beta distribution, we have to direct method for solving for the distribution parameters. Instead we solve an optimization problem using the SciPy function fmin. We search for the parameters which come closest to satisfying the percentile constraints. In fact these parameters will satisfy the constraints exactly.

def beta_parameters(x1, p1, x2, p2):
    "Find parameters for a beta random variable X 
	so that P(X > x1) = p1 and P(X > x2) = p2."

    def square(x): 
        return x*x

    def objective(v):
        (a, b) = v
        temp  = square( stats.beta.cdf(x1, a, b) - p1 )
        temp += square( stats.beta.cdf(x2, a, b) - p2 )
        return temp
    
    # arbitrary initial guess of (3, 3) for parameters
    xopt = optimize.fmin(objective, (3, 3))
    return (xopt[0], xopt[1])	

The approach used to calculate the beta distribution parameters could be used for other distributions as well. In fact, we could have used this approach for all distributions. However, the specialized methods used for the other distributions are more accurate and more efficient.

History

  • 3rd February, 2010: Initial version
  • 4th February, 2010: Article updated
  • 26th July, 2010: Beta distribution added

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

About the Author

John D. Cook

United States United States
I am an independent consultant in software development and applied mathematics. I help companies learn from their data to make better decisions.
 
Check out my blog or send me a note.
 

 

Follow on   Twitter   Google+

Comments and Discussions

 
QuestionLooking for a C# version Pinmembervictorbos21-Oct-11 4:15 
Greetings John.
 
Given your impressive expertise on this subject, I thought you might have answers/suggestions for two problems I am grappling with:
1) Where might I find C# equivalent for this problem, i.e., estimating parameters for a distribution?
2) Where I might find pseudo code (or better yet C# code) for an efficient way two get the distribution of the sum of two discrete "ill-behaved" (i.e., no clear distribution type) real values? Not being a whiz at math or numerical techniques, I have so far tackled it the dumb brute force way, which is very inefficient.)
 
Any tips/pointers would be greatly appreciated. Thanks in advance.
 
--VVX
AnswerRe: Looking for a C# version PinmemberJohn D. Cook21-Oct-11 4:36 
GeneralRe: Looking for a C# version Pinmembervictorbos21-Oct-11 5:02 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140721.1 | Last Updated 26 Jun 2014
Article Copyright 2010 by John D. Cook
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid