## Introduction

The 80-20 rule applies: even with the advances in statistics, most of our work requires only univariate descriptive statistics, that is, the calculation of the mean, standard deviation, range, skewness, kurtosis, percentiles, quartiles, and so on. This article describes a simple way to construct a set of classes that implement descriptive statistics in C#. The emphasis is on ease of use at the user's end.

## Requirements

To run the code, you need to have the following:

- .NET Framework 2.0 or above
- Microsoft Visual Studio 2005, if you want to open the project files included in the download
- NUnit 2.4, if you want to run the unit tests included in the download

The download included in this article is implemented as a class library. You will need to add a reference to the project to use its functionality.

The download also includes NUnit tests in case you want to change the code and run your own unit tests.

## The Code

The goal of the design is to simplify usage. We envisage that the user will write code like the following to get the desired results, in a simple three-step process:

- Instantiate a `Descriptive` object
- Invoke its `.Analyze()` method
- Retrieve results from its `.Result` object

Here is a typical user’s code:

    double[] x = {1, 2, 4, 7, 8, 9, 10, 12};
    Descriptive desp = new Descriptive(x);
    desp.Analyze();
    Console.WriteLine("Result is: " + desp.Result.FirstQuartile.ToString());

Two classes are implemented:

- `DescriptiveResult`
- `Descriptive`

`DescriptiveResult` is the class of the result object, which holds the analysis results. In our implementation, the `.Result` member variable of `Descriptive` is an instance of this class, which is defined as follows:

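    public class DescriptiveResult
    {
        // Sorted copy of the data, filled in by Descriptive.Analyze()
        internal double[] sortedData;

        public DescriptiveResult() { }

        public uint Count;
        public double Sum;
        public double Mean;
        public double GeometricMean;
        public double HarmonicMean;
        public double Min;
        public double Max;
        public double Range;
        public double Variance;
        public double StdDev;
        public double Skewness;
        public double Kurtosis;
        public double IQR;
        public double Median;
        public double FirstQuartile;
        public double ThirdQuartile;

        internal double SumOfError;
        internal double SumOfErrorSquare;

        // The listing above stops before the Percentile member function.
        // The body below is an assumed reconstruction that delegates to
        // Descriptive.percentile (shown later in this article).
        public double Percentile(double percent)
        {
            return Descriptive.percentile(sortedData, percent);
        }
    }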

For simplicity, most member variables are implemented as `public` variables. The only member function, `Percentile`, takes the desired percentile as its argument (as a percentage, e.g. 30 for 30%) and returns the percentile result.

The following table lists the available results (assuming that the `Descriptive` object you use is named `desp`):

| Result | Result stored in variable |
| --- | --- |
| Number of data points | `desp.Result.Count` |
| Minimum value | `desp.Result.Min` |
| Maximum value | `desp.Result.Max` |
| Range of values | `desp.Result.Range` |
| Sum of values | `desp.Result.Sum` |
| Arithmetic mean | `desp.Result.Mean` |
| Geometric mean | `desp.Result.GeometricMean` |
| Harmonic mean | `desp.Result.HarmonicMean` |
| Sample variance | `desp.Result.Variance` |
| Sample standard deviation | `desp.Result.StdDev` |
| Skewness of the distribution | `desp.Result.Skewness` |
| Kurtosis of the distribution | `desp.Result.Kurtosis` |
| Interquartile range | `desp.Result.IQR` |
| Median (50% percentile) | `desp.Result.Median` |
| First quartile (25% percentile) | `desp.Result.FirstQuartile` |
| Third quartile (75% percentile) | `desp.Result.ThirdQuartile` |
| Percentile | `desp.Result.Percentile()` * |

* The argument of `Percentile()` is a value from 0 to 100 that indicates the desired percentile.
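For reference, the sample variance, skewness and kurtosis entries in the table can be written as the estimators below. This is a reading of the `Analyze()` implementation given later in this article, not an independent definition. The symbols are notation introduced here: $n$ is the number of data points, $x_i$ the data values, $\bar{x}$ the arithmetic mean and $s$ the sample standard deviation.

$$
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2, \qquad
s = \sqrt{s^2} \\
\text{Skewness} &= \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{3} \\
\text{Kurtosis} &= \frac{n(n+1)(n-1)}{(n-2)(n-3)}\cdot
\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{4}}{\left(\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\right)^{2}}
-\frac{3(n-1)^{2}}{(n-2)(n-3)}
\end{aligned}
$$

Because of the denominators, the skewness needs at least three data points and the kurtosis at least four. The subtracted term makes the kurtosis an excess kurtosis, which is close to zero for normally distributed data.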

## Descriptive Class

The `Descriptive` class does all the analysis, and it is implemented as follows:

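    public class Descriptive
    {
        private double[] data;        // the user's data, in its original order
        private double[] sortedData;  // sorted copy, used for percentiles

        public DescriptiveResult Result = new DescriptiveResult();

        #region Constructors
        public Descriptive() { }

        public Descriptive(double[] dataVariable)
        {
            data = dataVariable;
        }
        #endregion // Constructors

        // Analyze() and percentile(), listed separately in this article,
        // complete the class.
    }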

Note that we need a `sortedData` array to facilitate percentile and quartile-related statistics. It stores a sorted copy of the user data.
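A minimal sketch of why a separate sorted array is kept: sorting a copy leaves the caller's data in its original order. `SortSketch` and `SortedCopy` are hypothetical names used only for illustration; the article's own code performs the same copy-and-sort inside `Analyze()`.

```csharp
using System;

public static class SortSketch
{
    // Hypothetical helper (not part of the article's download): returns a
    // sorted copy of the input so the caller's array keeps its original order.
    public static double[] SortedCopy(double[] data)
    {
        double[] sorted = new double[data.Length];
        data.CopyTo(sorted, 0);   // copy first, so the sort never touches 'data'
        Array.Sort(sorted);       // in-place ascending sort of the copy
        return sorted;
    }
}
```

Calling `SortSketch.SortedCopy(x)` on an unsorted array `x` returns a new ascending array while `x` itself is left unchanged.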

The constructor of the `Descriptive` class allows the user to assign the data array during object instantiation:

    double[] x = {1, 2, 4, 7, 8, 9, 10, 12};
    Descriptive desp = new Descriptive(x);

Once the `Descriptive` object is instantiated, the user only needs to call the `.Analyze()` method to perform the analysis. Subsequently, the user can retrieve the analysis results from the `.Result` object in the `Descriptive` object.

The `Analyze()` method is implemented as follows:

    public void Analyze()
    {
        // Reset the results
        Result.Count = 0;
        Result.Min = Result.Max = Result.Range = Result.Mean =
            Result.Sum = Result.StdDev = Result.Variance = 0.0d;

        double sumOfSquare = 0.0d;
        double sumOfESquare = 0.0d;
        double[] squares = new double[data.Length];
        double cumProduct = 1.0d;     // running product, for the geometric mean
        double cumReciprocal = 0.0d;  // running sum of 1/x, for the harmonic mean

        // First pass: min, max, sum, product and sum of reciprocals
        for (int i = 0; i < data.Length; i++)
        {
            if (i == 0)
            {
                Result.Min = data[i];
                Result.Max = data[i];
                Result.Mean = data[i];
                Result.Range = 0.0d;
            }
            else
            {
                if (data[i] < Result.Min) Result.Min = data[i];
                if (data[i] > Result.Max) Result.Max = data[i];
            }
            Result.Sum += data[i];
            squares[i] = Math.Pow(data[i], 2);
            sumOfSquare += squares[i];
            cumProduct *= data[i];
            cumReciprocal += 1.0d / data[i];
        }

        Result.Count = (uint)data.Length;
        double n = (double)Result.Count;
        Result.Mean = Result.Sum / n;
        Result.GeometricMean = Math.Pow(cumProduct, 1.0 / n);
        Result.HarmonicMean = 1.0d / (cumReciprocal / n);
        Result.Range = Result.Max - Result.Min;

        // Second pass: sums of powers of the deviations from the mean
        double m1 = 0.0d;
        double m2 = 0.0d;
        double m3 = 0.0d;
        double m4 = 0.0d;
        for (int i = 0; i < data.Length; i++)
        {
            double m = data[i] - Result.Mean;
            double mPow2 = m * m;
            double mPow3 = mPow2 * m;
            double mPow4 = mPow3 * m;
            m1 += Math.Abs(m);
            m2 += mPow2;
            m3 += mPow3;
            m4 += mPow4;
        }
        Result.SumOfError = m1;
        Result.SumOfErrorSquare = m2;
        sumOfESquare = m2;

        // Sample variance and standard deviation (n - 1 in the denominator)
        Result.Variance = sumOfESquare / ((double)Result.Count - 1);
        Result.StdDev = Math.Sqrt(Result.Variance);

        // Sample skewness from the standardized deviations
        double skewCum = 0.0d;
        for (int i = 0; i < data.Length; i++)
        {
            skewCum += Math.Pow((data[i] - Result.Mean) / Result.StdDev, 3);
        }
        Result.Skewness = n / (n - 1) / (n - 2) * skewCum;

        // Sample (excess) kurtosis
        double m2_2 = Math.Pow(sumOfESquare, 2);
        Result.Kurtosis = ((n + 1) * n * (n - 1)) / ((n - 2) * (n - 3)) *
            (m4 / m2_2) -
            3 * Math.Pow(n - 1, 2) / ((n - 2) * (n - 3));

        // Sort a copy of the data for the percentile-based statistics
        sortedData = new double[data.Length];
        data.CopyTo(sortedData, 0);
        Array.Sort(sortedData);

        // Keep a sorted copy in the result object as well
        Result.sortedData = new double[data.Length];
        sortedData.CopyTo(Result.sortedData, 0);

        Result.FirstQuartile = percentile(sortedData, 25);
        Result.ThirdQuartile = percentile(sortedData, 75);
        Result.Median = percentile(sortedData, 50);
        Result.IQR = percentile(sortedData, 75) - percentile(sortedData, 25);
    }
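One caveat about the listing above: `cumProduct` multiplies all the data values together, so for long arrays or large values it can overflow to infinity (or underflow to zero) before the n-th root is taken. A hedged alternative, which is not part of the article's download, is to average the logarithms instead. `MeanSketch` and its `GeometricMean` method are hypothetical names, and the sketch assumes every value is strictly positive.

```csharp
using System;

public static class MeanSketch
{
    // Hypothetical helper: geometric mean computed as exp(mean(ln x)).
    // Assumption: every value is strictly positive; otherwise it throws.
    public static double GeometricMean(double[] data)
    {
        double sumOfLogs = 0.0d;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] <= 0.0d)
                throw new ArgumentException("Values must be strictly positive.");
            sumOfLogs += Math.Log(data[i]);  // logs add where the raw values multiply
        }
        // Averaging the logs and exponentiating equals taking the n-th root
        // of the product, without ever forming the product itself.
        return Math.Exp(sumOfLogs / data.Length);
    }
}
```

For three values of `1e200` the direct product exceeds the range of `double`, whereas the log-based version stays finite and returns a value close to `1e200`.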

The calculations of the descriptive statistics are quite straightforward, except for the percentile function (and the quartile calculations that depend on it), which is a little tricky. Therefore, I have a separate function to handle it, as follows:

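    internal static double percentile(double[] sortedData, double p)
    {
        // Boundary cases: the largest value for p >= 100, the smallest for
        // p <= 0. A single-element array has no right-hand neighbour to
        // interpolate with, so its only value is returned.
        if (p >= 100.0d) return sortedData[sortedData.Length - 1];
        if (p <= 0.0d || sortedData.Length == 1) return sortedData[0];

        double position = (double)(sortedData.Length + 1) * p / 100.0;
        double leftNumber = 0.0d, rightNumber = 0.0d;

        // 1-based fractional rank of the requested percentile
        double n = p / 100.0d * (sortedData.Length - 1) + 1.0d;

        if (position >= 1)
        {
            // The two sorted values on either side of the fractional rank
            leftNumber = sortedData[(int)System.Math.Floor(n) - 1];
            rightNumber = sortedData[(int)System.Math.Floor(n)];
        }
        else
        {
            // Very small p: interpolate between the first two values
            leftNumber = sortedData[0];
            rightNumber = sortedData[1];
        }

        if (leftNumber == rightNumber)
            return leftNumber;
        else
        {
            // Linear interpolation using the fractional part of the rank
            double part = n - System.Math.Floor(n);
            return leftNumber + part * (rightNumber - leftNumber);
        }
    }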

The percentile algorithm is derived from Amir Aczel’s book "Complete Business Statistics".
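As a reading of the code above (the notation is introduced here, not taken from the book), the function interpolates linearly between two neighbouring sorted values. With $N$ sorted values $x_{(1)} \le \dots \le x_{(N)}$ and a percentage $p$ with $0 \le p < 100$:

$$
r = \frac{p}{100}\,(N-1) + 1, \qquad k = \lfloor r \rfloor, \qquad
P_p = x_{(k)} + (r-k)\left(x_{(k+1)} - x_{(k)}\right)
$$

For the sample data $\{1, 2, 4, 7, 8, 9, 10, 12\}$ used earlier, $N = 8$ and $p = 25$ give $r = 0.25 \cdot 7 + 1 = 2.75$ and $k = 2$, so the first quartile is

$$
P_{25} = x_{(2)} + 0.75\left(x_{(3)} - x_{(2)}\right) = 2 + 0.75\,(4-2) = 3.5
$$

The same steps give a median of $7.5$ and a third quartile of $9.25$, hence an interquartile range of $5.75$.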

## Conclusion

The descriptive statistics program presented here provides a simple way to obtain commonly used descriptive statistics, including standard deviations, skewness, kurtosis, percentiles, quartiles, etc.

## History

- 28th June, 2008: Initial post

## About the Author

Jan Low, PhD, is a senior software architect at Foundasoft.com, Malaysia. He is also the author of various text analysis software, statistical libraries, image processing libraries, and security encryption components. He programs primarily in C#, C++ and VB.NET.

Occupation: Senior software architect

Location: Malaysia