12,956,234 members (65,932 online)
alternative version

#### Stats

62.4K views
36 bookmarked
Posted 31 Jul 2012

# How to Calculate the Chi-Squared P-Value

, 5 Nov 2012 Public Domain
 Rate this:
Learn the Secrets of the Universe

## Introduction

Have you ever wondered how those big tables of P-Values for the Chi-Squared Test are Calculated?

Have you ever wanted to calculate those values yourself?

Have you ever asked for help online to calculate those values, and someone gave you an annoying link to a webpage that had either the tables or a form applet that would calculate it for you, with no mention What. So. Ever of the actual FORMULA!?!?!?!?

Then you're in good company. Keep on reading, this is for you.

## Background

The Chi-Squared Distribution is probably the most widely used distribution in Statistics today. It is most well known for its use in Pearson's Chi-Squared Test which is used to measure goodness of fit.

```int Observed[256];
int Expected[256];
double CriticalValue = 0.0;
double XSqr;

for(int I = 0; I < 256; I++)
{
XSqr = Observed[I] - Expected[I];
CriticalValue += ((XSqr * XSqr) / Expected[I]);
}

/*
Degrees of Freedom = 255 (The number of measure values - 1)
Test Statistic = CriticalValue
*/```

Above is a brief example of the first part of Pearson's Chi-Squared Test. This demonstrates how we obtain the two values we need to calculate the P-Value: the Degrees of Freedom and the Critical Value.

## What is this 'P', and why is it so Valuable?

For those of you who are new to this, the P-Value is the probability that the difference between the Observed and the Expected values would happen purely by chance. For instance, each side of a fair coin should have an equal probability of occurring.

Tails = 50.0%

So in theory if you flip a coin 1,000 times, Heads and Tails should each occur 500 times. In reality, this rarely happens. You're much more likely to have Heads land head up 507 times, and Tails 493, or vice versa. So if it's a little bit different, that's understandable.

But when does that difference become just to much. A fair coin will probably land Heads up 507/1000. But what if it's 515? What if it's 600?

This is what the P-Value is for. It allows you to Objectively assign a probability to an Observed value randomly occurring by comparing it with the Expected Value.

## Using the code

Above I gave a brief example of how you calculate the Critical Value and Degrees of Freedom for a Chi-Squared Test. This was for demonstration only, and is not in the included code, as you will undoubtedly have to customize it for your own situation. If you have any problems with this, there are plenty of excellent tutorials online demonstrating how to calculate these values. I would highly recommend any number of these.

But that's not what I'm here to look at. I'm here to show you how to Calculate the actual P-Value. So let's look at some code.

```double chisqr(int Dof, double Cv)
{
if(Cv < 0 || Dof < 1)
{
return 0.0;
}
double K = ((double)Dof) * 0.5;
double X = Cv * 0.5;
if(Dof == 2)
{
return exp(-1.0 * X);
}

double PValue = igf(K, X);
if(isnan(PValue) || isinf(PValue) || PValue <= 1e-8)
{
return 1e-14;
}

PValue /= gamma(K);
//PValue /= tgamma(K);

return (1.0 - PValue);
}```

This is the main function that you will be using, found in chisqr.c. It's very simple; you put in your values, and it spits out the P-Value; You'll notice, if you only have 2 Degrees of Freedom then you're in luck. There is a shortcut to calculating the P-Value using only one call to `exp()`, which is considerably faster than `chisqr()` would be otherwise

There are two other functions that you will need to in order to finish calculating the P-Value.

```static double igf(double S, double Z)
{
if(Z < 0.0)
{
return 0.0;
}
double Sc = (1.0 / S);
Sc *= pow(Z, S);
Sc *= exp(-Z);

double Sum = 1.0;
double Nom = 1.0;
double Denom = 1.0;

for(int I = 0; I < 200; I++)
{
Nom *= Z;
S++;
Denom *= S;
Sum += (Nom / Denom);
}

return Sum * Sc;
}```

Next up is the Incomplete Gamma Function. This is what computes the first part of the P-Value. If you're looking for an explanation as to how this relates to calculating the Chi-Squared Value, you can find it in this section on Wiki's Chi Square Page.

As a side note, in chisqr.c. I give a brief explanation as to why I have the for loop go for `200` iterations. The short answer is, this is what I arrived at by trial and error. If you have any experience in this area and know anything more about this, please tell me in the comments below.

And last but not least...

```double gamma(double N)
{
const long double SQRT2PI = 2.5066282746310005024157652848110452530069867406099383;

long double Z = (long double)N;
long double Sc = powl((Z + A), (Z + 0.5));
Sc *= expl(-1.0 * (Z + A));
Sc /= Z;

long double F = 1.0;
long double Ck;
long double Sum = SQRT2PI;

for(int K = 1; K < A; K++)
{
Z++;
Ck = powl(A - K, K - 0.5);
Ck *= expl(A - K);
Ck /= F;

Sum += (Ck / Z);

F *= (-1.0 * K);
}

return (double)(Sum * Sc);
}```

Say hello to the Gamma Function. For those of you who are unfamiliar with it, the Gamma Function allows you to compute the Factorial of decimals (i.e. 5.5!).

Quick Sidenote. If you've never used the Gamma Function before, there's something you should know about it. It will actually compute the factorial of the Number you put in, Minus 1. `Gamma(X)` = (X - 1)! So if you put in `Gamma(5)` you won't get 120, you'll get 24. "Why", you ask? I don't know: mathematicians are just weird like that. I think it's how they punish society for calling them nerds for all those years, but that's just my theory.

Depending whatever language you're using, you're standard math library might already have the gamma function built in. For instance in C, math.h you can calculate the gamma function by calling `tgamma()`. I would suggest you check to see if you have it, as the implementations in the standard libraries are usually more efficient, faster, and more accurate.

Now the Gamma Function you see above is an implementation of Spouge's Approximation. The reason that I like it, is that it allows you to compute the Gamma Function to any accuracy you desire, so long as you have a datatype big enough to hold the required data. You can implement this using an Arbitrarily Precise Library (in fact, I think it already is) and calculate your desired accuracy.

It is not, however, the fastest version of the Gamma Function. There are several approximations that actually give a value that is more than accurate enough for a Chi-Squared Test.

```double approx_gamma(double Z)
{
const double RECIP_E = 0.36787944117144232159552377016147;  // RECIP_E = (E^-1) = (1.0 / E)
const double TWOPI = 6.283185307179586476925286766559;  // TWOPI = 2.0 * PI

double D = 1.0 / (10.0 * Z);
D = 1.0 / ((12 * Z) - D);
D = (D + Z) * RECIP_E;
D = pow(D, Z);
D *= sqrt(TWOPI / Z);

return D;
} ```

This is a simple version of Stirling's Approximation. This will give you around 6-7 digits of accuracy. I compared the P-Value I got using this one versus using my other implementation and the one in math.h.

```// Degrees_of_Freedom = 255
// Critical_Value = 290.285192
// Output = P_Value
// chisqr(Degrees_of_Freedom, Critical_Value) = P_Value

chisqr(255, 290.285192) = 0.06364235  // Using tgamma() in math.h
chisqr(255, 290.285192) = 0.06364235  // Using gamma()
chisqr(255, 290.285192) = 0.06364235  // Using approx_gamma()  ```

So yeah, tried the gamma function in the math library, my implementation of Spouge's Approximation, and Stirling's Approximation printed them out to the screen: no difference. Some of the lower bits are probably different, but this is already much more accurate than any table or calculator you'd find online. So unless you are going for pure accuracy, I'd suggest sticking with `approx_gamma()`.

## Conclusion

Jeez, that's a lot of code. Well thank you everyone for sticking with me through all of this. I've found the algorithms to be very important, and I'm happy I get to share them with you all. If you have any questions about, please feel free to ask and I'll do my best to answer them. If you want to know anything about the specific formulas themselves, then I'd suggest looking in the links I gave.  If you have any comments, criticism, suggestions, or improvements, please feel free to post a comment.

One last thing. Every once in a while over the last few years I've been in need of these functions and have scrambled about trying to find them, often to find them placed under heavy and restrictive licenses. This in and of itself isn't a bad thing, but considering how important these are and how popular they are in math and testing equipment, I was simply amazed that there were so few implementations freely available.

So, to make sure this is absolutely clear.

This code is place in the Public Domain, and may be used by anyone for any reason for any purpose with no charge, no permission needed, etc.

I'll admit these aren't the best implementations of these functions, but they work, and they're better than nothing at all.

Thanks,

Jacob W.

## Update

8/2/2012: I was doing some testing when I realized I had forgotten to account for when the `PValue` is `nan` or` inf`. I've updated the example with a corrected version, as well as the ZIP file. My apologies if this confused anybody.

11/6/2012:  Uploaded `chisqr_v2.zip`.   ChiSqr() has had a significant upgrade.   I've modified the code so that it now uses Natural Logarithms to help calculate the ChiSqr P-Value.  This gives it a huge increase in range and accuracy.  It is slower though.  Sadly I haven't found away to improve this one aspect.  Despite that, I would highly recommend using this new Version over the original, as I believe it is more than worth it.  I'll update the examples in the Article in a few days.  Once again, all code is placed in the Public Domain.

## Share

 United States
No Biography provided

## You may also be interested in...

 First Prev Next
 Message Removed Member 1290497714-Dec-16 18:31 Member 12904977 14-Dec-16 18:31
 variable A? William M. Nelson12-Jun-14 7:30 William M. Nelson 12-Jun-14 7:30
 Re: variable A? Jacob F. W.12-Jun-14 7:46 Jacob F. W. 12-Jun-14 7:46
 There are some errors in the updated version Member 107884681-May-14 9:29 Member 10788468 1-May-14 9:29
 Re: There are some errors in the updated version jcfried23-May-15 10:12 jcfried 23-May-15 10:12
 Function is not giving the right result SweetInfection22-Apr-14 13:13 SweetInfection 22-Apr-14 13:13
 Re: Function is not giving the right result Jacob F. W.10-Jun-14 13:29 Jacob F. W. 10-Jun-14 13:29
 Re: Function is not giving the right result SweetInfection10-Jun-14 14:10 SweetInfection 10-Jun-14 14:10
 Fantastic - would love to chat about algorithms. Member 1076472021-Apr-14 11:22 Member 10764720 21-Apr-14 11:22
 Very good Łukasz Kamisiński4-Jul-13 19:25 Łukasz Kamisiński 4-Jul-13 19:25
 Re: Very good Jacob F. W.7-Jul-13 13:45 Jacob F. W. 7-Jul-13 13:45
 Re: Very good Member 1022398326-Aug-13 14:17 Member 10223983 26-Aug-13 14:17
 Re: Very good Jacob F. W.26-Aug-13 15:45 Jacob F. W. 26-Aug-13 15:45
 a Python version PySams16-Aug-12 1:11 PySams 16-Aug-12 1:11
 Hi all I just wanna share a Python version of this code. see ya
 My vote of 5 DQNOK9-Aug-12 4:50 DQNOK 9-Aug-12 4:50
 My vote of 5 eslipak6-Aug-12 23:10 eslipak 6-Aug-12 23:10
 Per your question.... Andrew Rissing6-Aug-12 10:58 Andrew Rissing 6-Aug-12 10:58
 Re: Per your question.... Jacob F. W.6-Aug-12 11:50 Jacob F. W. 6-Aug-12 11:50
 Re: Per your question.... DQNOK10-Aug-12 12:49 DQNOK 10-Aug-12 12:49
 Have you heard about the "Numerical Recipies in C"? SMDtheone6-Aug-12 7:51 SMDtheone 6-Aug-12 7:51
 Re: Have you heard about the "Numerical Recipies in C"? DQNOK9-Aug-12 5:05 DQNOK 9-Aug-12 5:05
 Re: Have you heard about the "Numerical Recipies in C"? Don Clugston13-Aug-12 20:46 Don Clugston 13-Aug-12 20:46
 My vote of 5 K4HVDs5-Aug-12 9:31 K4HVDs 5-Aug-12 9:31
 Last Visit: 31-Dec-99 18:00     Last Update: 29-May-17 11:51 Refresh 1