13,201,763 members (68,255 online)
Add your own
alternative version

#### Stats

43.6K views
461 downloads
15 bookmarked
Posted 3 Feb 2010

# Finding Probability Distribution Parameters from Percentiles

, 26 Jun 2014
 Rate this:
Please Sign up or sign in to vote.
How to determine the parameters of a probability distribution given two percentile constraints

## Introduction

Suppose there's a 10% chance of something being less than 30 and a 90% chance of it being less than 60. Once you pick a probability distribution family (normal, gamma, etc.) you need a way of determining what parameters will satisfy your two requirements.

More precisely, suppose a random variable X has a two-parameter distribution. The problem is to find values of those parameters so that Pr(X < x1) = p1 and Pr(X < x2) = p2.

This article will show how to compute these parameters for normal, Cauchy, Weibull, gamma, and inverse gamma distributions using Python's SciPy library.

The problem of finding parameters to satisfy two percentile equations is practical. It comes up, for example, in determining prior distributions in Bayesian statistics. This article also serves as a brief introduction to using SciPy.

## Background

The mathematical solution to the problem posed here is described in the technical report Determining distribution parameters from quantiles. Here we explain how to implement in Python the calculations in that report. Here is an article giving more motivation for the problem.

## Using the Code

The code described here is very simple to call. Suppose you want to find the mean and standard deviation for a normal distribution. You simply call normal_parameters with the appropriate arguments. It returns the mean and standard deviation as a pair.

(mean, stdev) = normal_parameters(x1, p1, x2, p2)

The functions for other distributions are similar:

• cauchy_parameters
• weibull_parameters
• gamma_parameters
• inverse_gamma_parameters
• beta_parameters

All functions take the same four arguments and all return two parameters. However, the meaning of the parameters is different for each distribution. For the normal distribution, the problem is finding μ and σ so that F(x1) = p1 and F(x2) = p2 where F(x) is defined by:

For the Cauchy distribution, the problem is similar except now:

For the Weibull distribution, the problem is finding γ and β where:

For the gamma distribution, the problem is finding α and β where:

For the inverse gamma distribution, the problem is finding α and β where:

For the beta distribution, the problem is finding a and b where:

## Looking Inside the Code

The code presented here depends critically on SciPy, a large library for scientific computing in Python. For an introduction to SciPy, see the CodeProject article Getting started with the SciPy (Scientific Python) library.

For the normal and Cauchy distributions, the location parameter is given by:

and the scale parameter is given by:

where F(x) is the CDF of the normal or Cauchy distribution as in the previous section. The inverse of the CDF is given by the ppt method in SciPy. Here "ppt" stands for "percentile point function." Other libraries may call this the quantile function. The probability distribution classes are located in scipy.stats.

The parameters for the Weibull distribution can be given by a simple formula not requiring any SciPy functionality. The inverse gamma parameters are also easy to find since the inverse gamma problem can be reduced to the problem of finding parameters for the gamma distribution.

The gamma distribution parameters cannot be obtained so simply. The code to find these parameters illustrates SciPy and the functools module.

def gamma_parameters(x1, p1, x2, p2):

# Standardize so that x1 < x2 and p1 < p2
if p1 > p2:
(p1, p2) = (p2, p1)
(x1, x2) = (x2, x1)

# function to find roots of for gamma distribution parameters
def objective(alpha):
return stats.gamma.ppf(p2, alpha) / stats.gamma.ppf(p1, alpha) - x2/x1

# The objective function we're wanting to find a root of is decreasing.
# We need to find an interval over which is goes from positive to negative.
left = right = 1.0
while objective(left) < 0.0:
left /= 2
while objective(right) > 0.0:
right *= 2
alpha = optimize.bisect(objective, left, right)
beta = x1 / stats.gamma.ppf(p1, alpha)

return (alpha, beta)

Notice the auxiliary function objective defined inside the gamma_parameters function. A nice feature of Python is that you can easily define little functions like this using closures. Only the gamma_parameters needs this function so we define it inside gamma_parameters to keep from cluttering the outer scope.

To find where the function objective is 0, we use one of the root-finding methods in the scipy.optimize. The code here uses the bisect method, an algorithm that is slow compared to other root-finding algorithms but is very reliable.

For the beta distribution, we have to direct method for solving for the distribution parameters. Instead we solve an optimization problem using the SciPy function fmin. We search for the parameters which come closest to satisfying the percentile constraints. In fact these parameters will satisfy the constraints exactly.

def beta_parameters(x1, p1, x2, p2):
"Find parameters for a beta random variable X
so that P(X > x1) = p1 and P(X > x2) = p2."

def square(x):
return x*x

def objective(v):
(a, b) = v
temp  = square( stats.beta.cdf(x1, a, b) - p1 )
temp += square( stats.beta.cdf(x2, a, b) - p2 )
return temp

# arbitrary initial guess of (3, 3) for parameters
xopt = optimize.fmin(objective, (3, 3))
return (xopt[0], xopt[1])	

The approach used to calculate the beta distribution parameters could be used for other distributions as well. In fact, we could have used this approach for all distributions. However, the specialized methods used for the other distributions are more accurate and more efficient.

## History

• 3rd February, 2010: Initial version
• 4th February, 2010: Article updated
• 26th July, 2010: Beta distribution added

## License

This article, along with any associated source code and files, is licensed under The BSD License

## About the Author

 Singular Value Consulting United States
I am an independent consultant in software development and applied mathematics. I help companies learn from their data to make better decisions.

Check out my blog or send me a note.

## Comments and Discussions

 First Prev Next
 Looking for a C# version victorbos21-Oct-11 4:15 victorbos 21-Oct-11 4:15
 Re: Looking for a C# version John D. Cook21-Oct-11 4:36 John D. Cook 21-Oct-11 4:36
 Here is a GUI program -- it happens to be written in C# (and C++) -- for computing these things: https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=6[^] If you don't have a distribution type, I assume you have data from each of the distributions you want to add. You could randomly select (with replacement) samples from each, add them, and create a histogram.
 Re: Looking for a C# version victorbos21-Oct-11 5:02 victorbos 21-Oct-11 5:02
 Last Visit: 31-Dec-99 18:00     Last Update: 23-Oct-17 14:36 Refresh 1

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.171020.1 | Last Updated 26 Jun 2014
Article Copyright 2010 by John D. Cook
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid