Click here to Skip to main content
13,195,821 members (52,954 online)
Click here to Skip to main content
Add your own
alternative version


37 bookmarked
Posted 31 Jul 2012

How to Calculate the Chi-Squared P-Value

, 5 Nov 2012
Rate this:
Please Sign up or sign in to vote.
Learn the Secrets of the Universe


Have you ever wondered how those big tables of P-Values for the Chi-Squared Test are Calculated?

Have you ever wanted to calculate those values yourself?

Have you ever asked for help online to calculate those values, and someone gave you an annoying link to a webpage that had either the tables or a form applet that would calculate it for you, with no mention What. So. Ever of the actual FORMULA!?!?!?!? 

Then you're in good company. Keep on reading, this is for you.


The Chi-Squared Distribution is probably the most widely used distribution in Statistics today. It is most well known for its use in Pearson's Chi-Squared Test which is used to measure goodness of fit.

int Observed[256];
int Expected[256]; 
double CriticalValue = 0.0;
double XSqr;
for(int I = 0; I < 256; I++)
    XSqr = Observed[I] - Expected[I];
    CriticalValue += ((XSqr * XSqr) / Expected[I]);
   Degrees of Freedom = 255 (The number of measure values - 1)
   Test Statistic = CriticalValue

Above is a brief example of the first part of Pearson's Chi-Squared Test. This demonstrates how we obtain the two values we need to calculate the P-Value: the Degrees of Freedom and the Critical Value. 

What is this 'P', and why is it so Valuable?

For those of you who are new to this, the P-Value is the probability that the difference between the Observed and the Expected values would happen purely by chance. For instance, each side of a fair coin should have an equal probability of occurring.

Heads = 50.0%
Tails = 50.0%

So in theory if you flip a coin 1,000 times, Heads and Tails should each occur 500 times. In reality, this rarely happens. You're much more likely to have Heads land head up 507 times, and Tails 493, or vice versa. So if it's a little bit different, that's understandable.

But when does that difference become just to much. A fair coin will probably land Heads up 507/1000. But what if it's 515? What if it's 600?  

This is what the P-Value is for. It allows you to Objectively assign a probability to an Observed value randomly occurring by comparing it with the Expected Value.

Using the code

Above I gave a brief example of how you calculate the Critical Value and Degrees of Freedom for a Chi-Squared Test. This was for demonstration only, and is not in the included code, as you will undoubtedly have to customize it for your own situation. If you have any problems with this, there are plenty of excellent tutorials online demonstrating how to calculate these values. I would highly recommend any number of these.

But that's not what I'm here to look at. I'm here to show you how to Calculate the actual P-Value. So let's look at some code.

double chisqr(int Dof, double Cv)
    if(Cv < 0 || Dof < 1)
        return 0.0;
    double K = ((double)Dof) * 0.5;
    double X = Cv * 0.5;
    if(Dof == 2)
	return exp(-1.0 * X);
    double PValue = igf(K, X);
    if(isnan(PValue) || isinf(PValue) || PValue <= 1e-8)
        return 1e-14;

    PValue /= gamma(K);
    //PValue /= tgamma(K); 
    return (1.0 - PValue);

This is the main function that you will be using, found in chisqr.c. It's very simple; you put in your values, and it spits out the P-Value; You'll notice, if you only have 2 Degrees of Freedom then you're in luck. There is a shortcut to calculating the P-Value using only one call to exp(), which is considerably faster than chisqr() would be otherwise

There are two other functions that you will need to in order to finish calculating the P-Value.

static double igf(double S, double Z)
    if(Z < 0.0)
	return 0.0;
    double Sc = (1.0 / S);
    Sc *= pow(Z, S);
    Sc *= exp(-Z);
    double Sum = 1.0;
    double Nom = 1.0;
    double Denom = 1.0;
    for(int I = 0; I < 200; I++)
	Nom *= Z;
	Denom *= S;
	Sum += (Nom / Denom);
    return Sum * Sc;

Next up is the Incomplete Gamma Function. This is what computes the first part of the P-Value. If you're looking for an explanation as to how this relates to calculating the Chi-Squared Value, you can find it in this section on Wiki's Chi Square Page.

As a side note, in chisqr.c. I give a brief explanation as to why I have the for loop go for 200 iterations. The short answer is, this is what I arrived at by trial and error. If you have any experience in this area and know anything more about this, please tell me in the comments below.

And last but not least... 

double gamma(double N)
    const long double SQRT2PI = 2.5066282746310005024157652848110452530069867406099383;
    long double Z = (long double)N;
    long double Sc = powl((Z + A), (Z + 0.5));
    Sc *= expl(-1.0 * (Z + A));
    Sc /= Z;
    long double F = 1.0;
    long double Ck;
    long double Sum = SQRT2PI;

    for(int K = 1; K < A; K++)
	Ck = powl(A - K, K - 0.5);
	Ck *= expl(A - K);
	Ck /= F;
	Sum += (Ck / Z);
	F *= (-1.0 * K);
    return (double)(Sum * Sc);

Say hello to the Gamma Function. For those of you who are unfamiliar with it, the Gamma Function allows you to compute the Factorial of decimals (i.e. 5.5!).

Quick Sidenote. If you've never used the Gamma Function before, there's something you should know about it. It will actually compute the factorial of the Number you put in, Minus 1. Gamma(X) = (X - 1)! So if you put in Gamma(5) you won't get 120, you'll get 24. "Why", you ask? I don't know: mathematicians are just weird like that. I think it's how they punish society for calling them nerds for all those years, but that's just my theory.

Depending whatever language you're using, you're standard math library might already have the gamma function built in. For instance in C, math.h you can calculate the gamma function by calling tgamma(). I would suggest you check to see if you have it, as the implementations in the standard libraries are usually more efficient, faster, and more accurate.

Now the Gamma Function you see above is an implementation of Spouge's Approximation. The reason that I like it, is that it allows you to compute the Gamma Function to any accuracy you desire, so long as you have a datatype big enough to hold the required data. You can implement this using an Arbitrarily Precise Library (in fact, I think it already is) and calculate your desired accuracy.

It is not, however, the fastest version of the Gamma Function. There are several approximations that actually give a value that is more than accurate enough for a Chi-Squared Test.

double approx_gamma(double Z)
    const double RECIP_E = 0.36787944117144232159552377016147;  // RECIP_E = (E^-1) = (1.0 / E)
    const double TWOPI = 6.283185307179586476925286766559;  // TWOPI = 2.0 * PI

    double D = 1.0 / (10.0 * Z);
    D = 1.0 / ((12 * Z) - D);
    D = (D + Z) * RECIP_E;
    D = pow(D, Z);
    D *= sqrt(TWOPI / Z);
    return D;

This is a simple version of Stirling's Approximation. This will give you around 6-7 digits of accuracy. I compared the P-Value I got using this one versus using my other implementation and the one in math.h.

// Degrees_of_Freedom = 255
// Critical_Value = 290.285192
// Output = P_Value
// chisqr(Degrees_of_Freedom, Critical_Value) = P_Value

chisqr(255, 290.285192) = 0.06364235  // Using tgamma() in math.h
chisqr(255, 290.285192) = 0.06364235  // Using gamma()
chisqr(255, 290.285192) = 0.06364235  // Using approx_gamma()  

So yeah, tried the gamma function in the math library, my implementation of Spouge's Approximation, and Stirling's Approximation printed them out to the screen: no difference. Some of the lower bits are probably different, but this is already much more accurate than any table or calculator you'd find online. So unless you are going for pure accuracy, I'd suggest sticking with approx_gamma().


Jeez, that's a lot of code. Well thank you everyone for sticking with me through all of this. I've found the algorithms to be very important, and I'm happy I get to share them with you all. If you have any questions about, please feel free to ask and I'll do my best to answer them. If you want to know anything about the specific formulas themselves, then I'd suggest looking in the links I gave.  If you have any comments, criticism, suggestions, or improvements, please feel free to post a comment.

One last thing. Every once in a while over the last few years I've been in need of these functions and have scrambled about trying to find them, often to find them placed under heavy and restrictive licenses. This in and of itself isn't a bad thing, but considering how important these are and how popular they are in math and testing equipment, I was simply amazed that there were so few implementations freely available.

So, to make sure this is absolutely clear. 

This code is place in the Public Domain, and may be used by anyone for any reason for any purpose with no charge, no permission needed, etc.  

I'll admit these aren't the best implementations of these functions, but they work, and they're better than nothing at all. 


Jacob W.


8/2/2012: I was doing some testing when I realized I had forgotten to account for when the PValue is nan or inf. I've updated the example with a corrected version, as well as the ZIP file. My apologies if this confused anybody.

11/6/2012:  Uploaded   ChiSqr() has had a significant upgrade.   I've modified the code so that it now uses Natural Logarithms to help calculate the ChiSqr P-Value.  This gives it a huge increase in range and accuracy.  It is slower though.  Sadly I haven't found away to improve this one aspect.  Despite that, I would highly recommend using this new Version over the original, as I believe it is more than worth it.  I'll update the examples in the Article in a few days.  Once again, all code is placed in the Public Domain.



This article, along with any associated source code and files, is licensed under A Public Domain dedication


About the Author

Jacob F. W.
United States United States
No Biography provided

You may also be interested in...

Comments and Discussions

Question[C++] optimization? Pin
Member 125142264-Oct-17 23:38
memberMember 125142264-Oct-17 23:38 
Questionvariable A? Pin
William M. Nelson12-Jun-14 7:30
memberWilliam M. Nelson12-Jun-14 7:30 
AnswerRe: variable A? Pin
Jacob F. W.12-Jun-14 7:46
memberJacob F. W.12-Jun-14 7:46 
GeneralThere are some errors in the updated version Pin
Member 107884681-May-14 9:29
memberMember 107884681-May-14 9:29 
GeneralRe: There are some errors in the updated version Pin
jcfried23-May-15 10:12
memberjcfried23-May-15 10:12 
GeneralRe: There are some errors in the updated version Pin
Member 125142264-Oct-17 5:26
memberMember 125142264-Oct-17 5:26 
QuestionFunction is not giving the right result Pin
SweetInfection22-Apr-14 13:13
memberSweetInfection22-Apr-14 13:13 
AnswerRe: Function is not giving the right result Pin
Jacob F. W.10-Jun-14 13:29
memberJacob F. W.10-Jun-14 13:29 
GeneralRe: Function is not giving the right result Pin
SweetInfection10-Jun-14 14:10
memberSweetInfection10-Jun-14 14:10 
GeneralRe: Function is not giving the right result Pin
Member 125142264-Oct-17 4:28
memberMember 125142264-Oct-17 4:28 
GeneralFantastic - would love to chat about algorithms. Pin
Member 1076472021-Apr-14 11:22
memberMember 1076472021-Apr-14 11:22 
GeneralVery good Pin
Łukasz Kamisiński4-Jul-13 19:25
memberŁukasz Kamisiński4-Jul-13 19:25 
GeneralRe: Very good Pin
Jacob F. W.7-Jul-13 13:45
memberJacob F. W.7-Jul-13 13:45 
GeneralRe: Very good Pin
Member 1022398326-Aug-13 14:17
memberMember 1022398326-Aug-13 14:17 
GeneralRe: Very good Pin
Jacob F. W.26-Aug-13 15:45
memberJacob F. W.26-Aug-13 15:45 
Questiona Python version Pin
PySams16-Aug-12 1:11
memberPySams16-Aug-12 1:11 
GeneralMy vote of 5 Pin
DQNOK9-Aug-12 4:50
memberDQNOK9-Aug-12 4:50 
GeneralMy vote of 5 Pin
eslipak6-Aug-12 23:10
membereslipak6-Aug-12 23:10 
AnswerPer your question.... Pin
Andrew Rissing6-Aug-12 10:58
memberAndrew Rissing6-Aug-12 10:58 
GeneralRe: Per your question.... Pin
Jacob F. W.6-Aug-12 11:50
memberJacob F. W.6-Aug-12 11:50 
GeneralRe: Per your question.... Pin
DQNOK10-Aug-12 12:49
memberDQNOK10-Aug-12 12:49 
The issue is the convergence criteria (obviously). You can quit looping when the value of (Nom / Denom) is so small that it is no longer changing the value of Sum.

That is, you could alter the loop with a test like the following:

for(int I = 0; I < 200; I++)
   double testsum;
   Nom *= Z;
   Denom *= S;
   testsum = Sum + (Nom / Denom);
   if( Sum == testsum ) // (Nom / Denom) too small to matter,
   Sum = testsum;

The Numerical Recipes book says this algorithm for the *lower* incomplete gamma function converges well for values of Z less than approx S+1, and that for larger values of Z, you should use the continued fraction algorithm (for the *upper* incomplete gamma function, then form gamma(S) - upper_igf(S,Z) as the return value). Since you're using the series algorithm for ALL values of Z, you will either want a very large number of iterations, or an exit strategy. You're better off using the exit strategy since, as you've pointed out, if you go too far you start losing precision.

I looked at my own implementation of the lower incomplete gamma function and saw that I had the number of iterations set to
I = 11 + sqrt(S) + 9*sqrt(Z); //a trial and error value.

But that's because I implemented it using Horner's rule in order to avoid a floating point division in each iteration of the loop, as well as the convergence test I noted above. Divisions are notoriously slow compared to multiplications, so I try to work them out of my code whenever possible. But to use Horner's rule, you have to know ahead of time how many iterations there are. Thus, the computation. I won't swear that it will work in all situations. I go ahead and use a loop like yours once I exit the Horner's section, just to make sure I've converged.

Again, thanks for a great article, and for sharing.
Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code.


QuestionHave you heard about the "Numerical Recipies in C"? Pin
SMDtheone6-Aug-12 7:51
memberSMDtheone6-Aug-12 7:51 
AnswerRe: Have you heard about the "Numerical Recipies in C"? Pin
DQNOK9-Aug-12 5:05
memberDQNOK9-Aug-12 5:05 
GeneralRe: Have you heard about the "Numerical Recipies in C"? Pin
Don Clugston13-Aug-12 20:46
memberDon Clugston13-Aug-12 20:46 
GeneralMy vote of 5 Pin
K4HVDs5-Aug-12 9:31
memberK4HVDs5-Aug-12 9:31 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.171019.1 | Last Updated 6 Nov 2012
Article Copyright 2012 by Jacob F. W.
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid