
A User-Friendly C# Descriptive Statistic Class

28 Jun 2008, CPOL
An article on most commonly used descriptive statistics, including standard deviations, skewness, kurtosis, percentiles, quartiles, etc.

Introduction

The 80-20 rule applies: despite the advances in statistics, most of our work requires only univariate descriptive statistics, such as the mean, standard deviation, range, skewness, kurtosis, percentiles, and quartiles. This article describes a simple way to construct a set of classes that implement descriptive statistics in C#. The emphasis is on ease of use for the person calling the classes.

Requirements

To run the code, you need to have the following:

  • .NET Framework 2.0 or above
  • Microsoft Visual Studio 2005 if you want to open the project files included in the download
  • NUnit 2.4 if you want to run the unit tests included in the download

The download included in this article is implemented as a class library. You will need to add a reference to the project to use its functionality.

The download also includes NUnit tests in case you want to change the code and run your own unit tests.

The Code

The goal of the code design is to simplify usage. We envisage that the user will write code like the following to get the desired results. This is a simple three-step process:

  1. Instantiate a Descriptive object
  2. Invoke its .Analyze() method
  3. Retrieve results from its .Result object

Here is a typical user’s code:

double[] x  = {1, 2, 4, 7, 8, 9, 10, 12};
Descriptive desp = new Descriptive(x);
desp.Analyze(); // analyze the data
Console.WriteLine("Result is: " + desp.Result.FirstQuartile.ToString());

Two classes are implemented:

  • DescriptiveResult
  • Descriptive

DescriptiveResult is the class of the result object, which holds the analysis results. The .Result member variable of Descriptive is an instance of this class, which is defined as follows:

    /// <summary>
    /// The result class that holds the analysis results
    /// </summary>
    public class DescriptiveResult
    {
        // sortedData is used to calculate percentiles
        internal double[] sortedData;

        /// <summary>
        /// DescriptiveResult default constructor
        /// </summary>
        public DescriptiveResult() { }

        /// <summary>
        /// Count
        /// </summary>
        public uint Count;
        /// <summary>
        /// Sum
        /// </summary>
        public double Sum;
        /// <summary>
        /// Arithmetic mean
        /// </summary>
        public double Mean;
        /// <summary>
        /// Geometric mean
        /// </summary>
        public double GeometricMean;
        /// <summary>
        /// Harmonic mean
        /// </summary>
        public double HarmonicMean;
        /// <summary>
        /// Minimum value
        /// </summary>
        public double Min;
        /// <summary>
        /// Maximum value
        /// </summary>
        public double Max;
        /// <summary>
        /// The range of the values
        /// </summary>
        public double Range;
        /// <summary>
        /// Sample variance
        /// </summary>
        public double Variance;
        /// <summary>
        /// Sample standard deviation
        /// </summary>
        public double StdDev;
        /// <summary>
        /// Skewness of the data distribution
        /// </summary>
        public double Skewness;
        /// <summary>
        /// Kurtosis of the data distribution
        /// </summary>
        public double Kurtosis;
        /// <summary>
        /// Interquartile range
        /// </summary>
        public double IQR;
        /// <summary>
        /// Median, or second quartile, or at 50 percentile
        /// </summary>
        public double Median;
        /// <summary>
        /// First quartile, at 25 percentile
        /// </summary>
        public double FirstQuartile;
        /// <summary>
        /// Third quartile, at 75 percentile
        /// </summary>
        public double ThirdQuartile;

        /// <summary>
        /// Sum of Error
        /// </summary>
        internal double SumOfError;

        /// <summary>
        /// The sum of the squares of errors
        /// </summary>
        internal double SumOfErrorSquare;
        // NOTE: the article's listing is cut off after the doc comment below.
        // The method body is an assumed reconstruction that delegates to
        // Descriptive.percentile; check it against the downloaded source.
        /// <summary>
        /// Percentile
        /// </summary>
        /// <param name="percent">Percentile, between 0 and 100</param>
        /// <returns>Percentile</returns>
        public double Percentile(double percent)
        {
            return Descriptive.percentile(sortedData, percent);
        }
    }

For simplicity, most members are implemented as public fields. The only member function, Percentile, takes a percentage as its argument (e.g. 30 for 30%) and returns the corresponding percentile.

The following table lists the available results (assuming that the Descriptive object name you use is desp):

Result                              Result stored in variable
Number of data points               desp.Result.Count
Minimum value                       desp.Result.Min
Maximum value                       desp.Result.Max
Range of values                     desp.Result.Range
Sum of values                       desp.Result.Sum
Arithmetic mean                     desp.Result.Mean
Geometric mean                      desp.Result.GeometricMean
Harmonic mean                       desp.Result.HarmonicMean
Sample variance                     desp.Result.Variance
Sample standard deviation           desp.Result.StdDev
Skewness of the distribution        desp.Result.Skewness
Kurtosis of the distribution        desp.Result.Kurtosis
Interquartile range                 desp.Result.IQR
Median (50th percentile)            desp.Result.Median
First quartile (25th percentile)    desp.Result.FirstQuartile
Third quartile (75th percentile)    desp.Result.ThirdQuartile
Percentile                          desp.Result.Percentile()*

* The argument of Percentile() is a value from 0 to 100 that indicates the desired percentile.
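
One caveat on the GeometricMean entry: the geometric mean is only defined for positive data, and the Analyze() listing in the next section accumulates it as a running product (cumProduct), which can overflow to infinity for long arrays of large values. The following is a minimal sketch of an alternative that averages logarithms instead, assuming strictly positive input. The MeanSketch class and its method are hypothetical and not part of the download.

```csharp
using System;

// Hypothetical helper, not part of the article's download.
public static class MeanSketch
{
    // Geometric mean computed as exp(mean(ln x)).
    // Assumes every value is strictly positive; throws otherwise.
    public static double GeometricMean(double[] data)
    {
        if (data == null || data.Length == 0)
            throw new ArgumentException("data must contain at least one value", "data");

        double logSum = 0.0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] <= 0.0)
                throw new ArgumentException("geometric mean requires positive values", "data");
            logSum += Math.Log(data[i]); // sum of logs replaces the running product
        }
        return Math.Exp(logSum / data.Length);
    }
}
```

Summing logarithms keeps the intermediate value small even when the product of the raw values would exceed the range of a double.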

Descriptive Class

The Descriptive class does all the analysis, and it is implemented as follows:

/// <summary>
/// Descriptive class
/// </summary>
public class Descriptive
{
    private double[] data;
    private double[] sortedData;

    /// <summary>
    /// Descriptive results
    /// </summary>
    public DescriptiveResult Result = new DescriptiveResult();

    #region Constructors
    /// <summary>
    /// Descriptive analysis default constructor
    /// </summary>
    public Descriptive() { } // default empty constructor

    /// <summary>
    /// Descriptive analysis constructor
    /// </summary>
    /// <param name="dataVariable">Data array</param>
    public Descriptive(double[] dataVariable)
    {
       data = dataVariable;
    }
    #endregion //  Constructors

Note that we need a sortedData array to support percentile and quartile-related statistics. It stores a sorted copy of the user data.
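
A separate sorted copy is kept, rather than sorting the data in place, because Array.Sort reorders the array it is given and would otherwise change the caller's array. A minimal sketch of the copy-then-sort step follows. The SortedCopyExample class and SortedCopy method are hypothetical names used for illustration only.

```csharp
using System;

// Hypothetical helper, not part of the article's download.
public static class SortedCopyExample
{
    // Returns an ascending sorted copy; the input array is left untouched.
    public static double[] SortedCopy(double[] data)
    {
        if (data == null) throw new ArgumentNullException("data");

        double[] copy = (double[])data.Clone(); // shallow copy is enough for doubles
        Array.Sort(copy);                       // sorts the copy in place
        return copy;
    }
}
```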

The constructor of Descriptive class allows the user to assign the data array during the object instantiation:

double[] x  = {1, 2, 4, 7, 8, 9, 10, 12};
Descriptive desp = new Descriptive(x);

Once the Descriptive object is instantiated, the user only needs to call the .Analyze() method to perform the analysis. Subsequently, the user can retrieve the analysis results from the .Result object in the Descriptive object.

The Analyze() method is implemented as follows:

/// <summary>
/// Run the analysis to obtain descriptive information of the data
/// </summary>
public void Analyze()
{
// initializations
Result.Count = 0;
Result.Min = Result.Max = Result.Range = Result.Mean =
Result.Sum = Result.StdDev = Result.Variance = 0.0d;

double sumOfSquare = 0.0d;
double sumOfESquare = 0.0d; // must initialize

double[] squares = new double[data.Length];
double cumProduct = 1.0d; // to calculate geometric mean
double cumReciprocal = 0.0d; // to calculate harmonic mean

// First iteration
for (int i = 0; i < data.Length; i++)
{
    if (i==0) // first data point
    {
        Result.Min = data[i];
        Result.Max = data[i];
        Result.Mean = data[i];
        Result.Range = 0.0d;
    }
    else
    { // not the first data point
        if (data[i] < Result.Min) Result.Min = data[i];
        if (data[i] > Result.Max) Result.Max = data[i];
    }
    Result.Sum += data[i];
    squares[i] = Math.Pow(data[i], 2); //TODO: may not be necessary
    sumOfSquare += squares[i];

    cumProduct *= data[i];
    cumReciprocal += 1.0d / data[i];
}

Result.Count = (uint)data.Length;
double n = (double)Result.Count; // use a shorter variable in double type
Result.Mean = Result.Sum / n;
Result.GeometricMean = Math.Pow(cumProduct, 1.0 / n);
// see http://mathworld.wolfram.com/HarmonicMean.html
Result.HarmonicMean = 1.0d / (cumReciprocal / n); 
Result.Range = Result.Max - Result.Min;

// second loop: accumulate the sum of absolute deviations and the sums of
// the 2nd, 3rd and 4th powers of the deviations from the mean
double m1 = 0.0d; // sum of absolute deviations
double m2 = 0.0d; // sum of squared deviations
double m3 = 0.0d; // sum of cubed deviations
double m4 = 0.0d; // sum of 4th-power deviations, for kurtosis calculation
for (int i = 0; i < data.Length; i++)
{
    double m = data[i] - Result.Mean;
    double mPow2 = m * m;
    double mPow3 = mPow2 * m;
    double mPow4 = mPow3 * m;

    m1 += Math.Abs(m);

    m2 += mPow2;

    // accumulate cubed deviations
    m3 += mPow3; // Math.Pow((data[i] - mean), 3);

    // accumulate 4th-power deviations (used for kurtosis)
    m4 += mPow4; // Math.Pow((data[i] - mean), 4);

}

Result.SumOfError = m1;
Result.SumOfErrorSquare = m2; // Added for Excel function DEVSQ
sumOfESquare = m2;

// var and standard deviation
Result.Variance = sumOfESquare / ((double)Result.Count - 1);
Result.StdDev = Math.Sqrt(Result.Variance);

// using Excel approach
double skewCum = 0.0d; // the cum part of SKEW formula
for (int i = 0; i < data.Length; i++)
{
    skewCum += Math.Pow((data[i] - Result.Mean) / Result.StdDev, 3);
}
Result.Skewness = n / (n - 1) / (n - 2) * skewCum;

// kurtosis: see http://en.wikipedia.org/wiki/Kurtosis (heading: Sample Kurtosis)
double m2_2 = Math.Pow(sumOfESquare, 2);
Result.Kurtosis = ((n + 1) * n * (n - 1)) / ((n - 2) * (n - 3)) *
    (m4 / m2_2) -
    3 * Math.Pow(n - 1, 2) / ((n - 2) * (n - 3)); // second last formula for G2

// calculate quartiles
sortedData = new double[data.Length];
data.CopyTo(sortedData, 0);
Array.Sort(sortedData);

// copy the sorted data to result object so that
// user can calculate percentile easily
Result.sortedData = new double[data.Length];
sortedData.CopyTo(Result.sortedData, 0);

Result.FirstQuartile = percentile(sortedData, 25);
Result.ThirdQuartile = percentile(sortedData, 75);
Result.Median = percentile(sortedData, 50);
Result.IQR = Result.ThirdQuartile - Result.FirstQuartile;

} // end of method Analyze
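
For reference, the statistics that Analyze() evaluates can be written out as below. These are read directly off the listing above, with $n$ standing for Count, $\bar{x}$ for Mean and $s$ for StdDev. The notation is added here for illustration.

```latex
\begin{aligned}
\text{GeometricMean} &= \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n},
\qquad
\text{HarmonicMean} = \frac{n}{\sum_{i=1}^{n} 1/x_i} \\[4pt]
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \\[4pt]
\text{Skewness} &= \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{3} \\[4pt]
\text{Kurtosis} &= \frac{(n+1)\,n\,(n-1)}{(n-2)(n-3)}\cdot
\frac{\sum_{i=1}^{n}(x_i-\bar{x})^{4}}{\Bigl(\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Bigr)^{2}}
-\frac{3\,(n-1)^{2}}{(n-2)(n-3)}
\end{aligned}
```

The denominators vanish for $n = 1$ (variance), $n \le 2$ (skewness) and $n \le 3$ (kurtosis), so all of these results are only meaningful with at least four data points. Skewness and kurtosis are also undefined when every value is identical, because the deviations are then all zero.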

The calculations of descriptive statistics are quite straightforward, except for the percentile function (and the subsequent quartile calculations), which is a little tricky. Therefore, I have a separate function to handle it, as follows:

/// <summary>
/// Calculate percentile of a sorted data set
/// </summary>
/// <param name="sortedData">array of double values, sorted in ascending order</param>
/// <param name="p">percentile, value 0-100</param>
/// <returns>the interpolated value at percentile p</returns>
internal static double percentile(double[] sortedData, double p)
{
    // algo derived from Aczel pg 15 bottom
    // a single-element array has no right neighbour, so return its only value
    if (p >= 100.0d || sortedData.Length == 1) return sortedData[sortedData.Length - 1];

    double position = (double)(sortedData.Length + 1) * p / 100.0;
    double leftNumber = 0.0d, rightNumber = 0.0d;

    // 1-based fractional rank of the requested percentile
    double n = p / 100.0d * (sortedData.Length - 1) + 1.0d;

    if (position >= 1)
    {
        leftNumber = sortedData[(int)System.Math.Floor(n) - 1];
        rightNumber = sortedData[(int)System.Math.Floor(n)];
    }
    else
    {
        leftNumber = sortedData[0]; // first data
        rightNumber = sortedData[1]; // second data
    }

    if (leftNumber == rightNumber)
        return leftNumber;
    else
    {
        // interpolate linearly between the two neighbouring values
        double part = n - System.Math.Floor(n);
        return leftNumber + part * (rightNumber - leftNumber);
    }
} // end of internal function percentile

The percentile algorithm is derived from Amir Aczel’s book "Complete Business Statistics".
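
As a worked check, the interpolation in the listing above amounts to computing a fractional rank p/100 × (N − 1) into the sorted array and interpolating linearly between the two neighbouring values. The sketch below restates only that step in standalone form. The PercentileSketch class, its method name and its argument checks are assumptions made for illustration, not the article's code.

```csharp
using System;

// Hypothetical standalone restatement, not part of the article's download.
public static class PercentileSketch
{
    // Linear interpolation between the two closest ranks of an array
    // that is already sorted in ascending order. p is in percent (0-100).
    public static double Percentile(double[] sortedData, double p)
    {
        if (sortedData == null || sortedData.Length == 0)
            throw new ArgumentException("sortedData must contain at least one value", "sortedData");
        if (p < 0.0 || p > 100.0)
            throw new ArgumentOutOfRangeException("p", "p must be between 0 and 100");
        if (sortedData.Length == 1 || p >= 100.0)
            return sortedData[sortedData.Length - 1];

        double rank = p / 100.0 * (sortedData.Length - 1); // 0-based fractional rank
        int lower = (int)Math.Floor(rank);                 // index of the left neighbour
        double fraction = rank - lower;                    // distance towards the right neighbour
        return sortedData[lower] + fraction * (sortedData[lower + 1] - sortedData[lower]);
    }
}
```

For the sample data {1, 2, 4, 7, 8, 9, 10, 12} used earlier, this gives 3.5 for the 25th percentile, 7.5 for the median and 9.25 for the 75th percentile.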

Conclusion

The descriptive statistics program presented here provides a simple way to obtain commonly used descriptive statistics, including standard deviations, skewness, kurtosis, percentiles, quartiles, etc.

History

  • 28th June, 2008: Initial post

About the Author

Jan Low, PhD, is a senior software architect at Foundasoft.com, Malaysia. He is also the author of various text analysis software, statistical libraries, image processing libraries, and security encryption components. He programs primarily in C#, C++ and VB.NET.
Occupation: Senior software architect
Location: Malaysia

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Last Updated 28 Jun 2008
Article Copyright 2008 by Jan Low, PhD