Tip/Trick

A computational statistics class

19 Nov 2013 | CPOL | 1 min read
A computational statistics class in C#

Introduction

This is a computational statistics class written in C#. The public methods are described below.

Public methods

  • Constructor: public statistics(params double[] list)
  • Update the design: public void update(params double[] list)
  • Compute the mode: public double mode() - if there is more than one mode, NaN is returned.
  • Compute the size of the design: public int length()
  • Compute the minimum value of the design: public double min()
  • Compute the maximum value of the design: public double max()
  • Compute the first quartile: public double Q1()
  • Compute the median: public double Q2()
  • Compute the third quartile: public double Q3()
  • Compute the average: public double mean()
  • Compute the range: public double range()
  • Compute the interquartile range: public double IQ()
  • Compute the middle of the range: public double middle_of_range()
  • Compute the sample variance: public double var()
  • Compute the standard deviation: public double s()
  • Compute the Yule index: public double YULE()
  • Compute the standard score (z-score) of a given member of the design: public double Z(double member)
  • Compute the covariance: public double cov(statistics s), public static double cov(statistics s1, statistics s2)
  • Compute the correlation coefficient: public double r(statistics design), public static double r(statistics design1, statistics design2)
  • Compute the "a" factor (slope) of the linear function of design: public double a(statistics design)
  • Compute the "a" factor (slope) of the linear function of design2: public static double a(statistics design1, statistics design2)
  • Compute the "b" factor (intercept) of the linear function of design: public double b(statistics design)
  • Compute the "b" factor (intercept) of the linear function of design2: public static double b(statistics design1, statistics design2)
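
The "a" and "b" factors in the list above are the slope and intercept of a least-squares line. The sketch below is a minimal standalone illustration of that calculation, not the article's class: the names LineFit, Mean, Cov, Slope, and Intercept are hypothetical, and it assumes x plays the role of the design passed as the argument (the predictor) and y the role of the design being predicted.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the article's class.
public static class LineFit
{
    public static double Mean(double[] v)
    {
        double sum = 0;
        foreach (double value in v) sum += value;
        return sum / v.Length;   // NaN for an empty array (0.0 / 0)
    }

    // Sample covariance; Cov(x, x) is the sample variance of x.
    public static double Cov(double[] x, double[] y)
    {
        if (x.Length != y.Length) return double.NaN;
        double mx = Mean(x), my = Mean(y), sum = 0;
        for (int i = 0; i < x.Length; i++)
            sum += (x[i] - mx) * (y[i] - my);
        return sum / (x.Length - 1);
    }

    // Slope "a" of the line y = a*x + b.
    public static double Slope(double[] x, double[] y)
    {
        return Cov(x, y) / Cov(x, x);
    }

    // Intercept "b" of the line y = a*x + b.
    public static double Intercept(double[] x, double[] y)
    {
        return Mean(y) - Slope(x, y) * Mean(x);
    }
}
```

For x = {1, 2, 3} and y = {3, 5, 7}, Slope returns 2 and Intercept returns 1, which matches the line y = 2x + 1.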

Source code

C#
using System;
 
namespace statistic
{
    
    public class statistics
    {
 
      private double[] list;
 
      public statistics(params double[] list)
      {
        this.list=list;
      }
      
      public void update(params double[] list)
      {
         this.list=list;
      }
 
      // Returns the most frequent value, or NaN if several values tie or the list is empty.
      public double mode()
      {
         try
         {
              double[] i=new double[list.Length];
              list.CopyTo(i,0);
              sort(i);
              double val_mode=i[0],help_val_mode=i[0];
              int old_counter=0,new_counter=0;
              int j=0;
              for (;j<=i.Length-1;j++)
               if (i[j]==help_val_mode) new_counter++;
               else if (new_counter>old_counter) 
               {
                   old_counter=new_counter;
                   new_counter=1;
                   help_val_mode=i[j];
                   val_mode=i[j-1];
               }
               else if (new_counter==old_counter) 
               {
                   val_mode=double.NaN;
                   help_val_mode=i[j];
                   new_counter=1;
               }
               else 
               {
                   help_val_mode=i[j];
                   new_counter=1;
               }
              if (new_counter>old_counter) val_mode=i[j-1]; 
              else if (new_counter==old_counter) val_mode=double.NaN;
              return val_mode;
         }
         catch (Exception)
         {
            return double.NaN;
         }
      }
 
      public int length()
      {
         return list.Length;
      }
 
      public double min()
      {
         double minimum=double.PositiveInfinity;
         for (int i=0;i<=list.Length-1;i++)
                  if (list[i]<minimum) minimum=list[i];
         return minimum;
      }
 
      public double max()
      {
         double maximum=double.NegativeInfinity;
         for (int i=0;i<=list.Length-1;i++)
                  if (list[i]>maximum) maximum=list[i];
         return maximum;
      }   
    
      public double Q1()
      {
         return Qi(0.25);
      }
 
      public double Q2()
      {
         return Qi(0.5);
      }
 
      public double Q3()
      {
         return Qi(0.75);
      }
 
      public double mean()
      {
         try
         {
          double sum=0;
          for (int i=0;i<=list.Length-1;i++)
                   sum+=list[i];
          return sum/list.Length;
         }
         catch (Exception)
         {
                  return double.NaN;
         }
      }
 
      public double range()
      {
         double minimum=min();
         double maximum=max();
         return (maximum-minimum);
      }
 
      public double IQ()
      { 
         return Q3()-Q1();
      }
 
      public double middle_of_range()
      {
         double minimum=min();
         double maximum=max();
         return (minimum+maximum)/2; 
      }
 
      // Sample variance via the shortcut formula (sum of x^2 - n*mean^2) / (n - 1).
      public double var()
      {
         try
         {
              double s=0;
              for (int i=0;i<=list.Length-1;i++)
                       s+=Math.Pow(list[i],2);
              return (s-list.Length*Math.Pow(mean(),2))/(list.Length-1);
         }
         catch (Exception)
         {
            return double.NaN;
         }
      }
 
      public double s()
      {
         return Math.Sqrt(var());
      }
 
      // Yule index (quartile skewness): ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1).
      public double YULE()
      {
         try
         {
            return ((Q3()-Q2())-(Q2()-Q1()))/(Q3()-Q1());
         }
         catch (Exception)
         {
            return double.NaN;
         }
      }
 
      // Standard score of a member: (member - mean) / s; NaN if the member is not in the list.
      public double Z(double member)
      {
         try
         {
            if (exist(member)) return (member-mean())/s();
                else return double.NaN;
         }
         catch(Exception)
         {
            return double.NaN;
         }
      }
 
      public double cov(statistics s)
      {
         try
         {
              if (this.length()!=s.length()) return double.NaN;
              int len=this.length();
              double sum_mul=0;
              for (int i=0;i<=len-1;i++)
                sum_mul+=(this.list[i]*s.list[i]);
              return (sum_mul-len*this.mean()*s.mean())/(len-1);
         }
         catch(Exception)
         {
            return double.NaN;
         }
      }
 
      public static double cov(statistics s1,statistics s2)
      {
         try
         {
              if (s1.length()!=s2.length()) return double.NaN;
              int len=s1.length();
              double sum_mul=0;
              for (int i=0;i<=len-1;i++)
                sum_mul+=(s1.list[i]*s2.list[i]);
              return (sum_mul-len*s1.mean()*s2.mean())/(len-1);
         }
         catch(Exception)
         {
            return double.NaN;
         }
      }
 
      public double r(statistics design)
      {
         try
         {
            return this.cov(design)/(this.s()*design.s());
         }
         catch(Exception)
         {
            return double.NaN;
         }
      }
 
      public static double r(statistics design1,statistics design2)
      {
         try
         {
            return cov(design1,design2)/(design1.s()*design2.s());
         }
         catch(Exception)
         {
            return double.NaN;
         }
      }
 
      // Slope of the least-squares line that predicts this design from the given design.
      public double a(statistics design)
      {
         try
         {
            return this.cov(design)/(Math.Pow(design.s(),2));
         }
         catch(Exception)
         {
             return double.NaN;
         }
      }
 
      public static double a(statistics design1,statistics design2)
      {
         try
         {
            return cov(design1,design2)/(Math.Pow(design2.s(),2));
         }
         catch (Exception)
         {
            return double.NaN;
         }
      }
 
      // Intercept of the least-squares line that predicts this design from the given design.
      public double b(statistics design)
      {
         return this.mean()-this.a(design)*design.mean();
      }
 
      public static double b(statistics design1,statistics design2)
      {
         return design1.mean()-a(design1,design2)*design2.mean();
      }
 
      // Quantile of order i (0 < i < 1): averages two neighbors when n*i is a whole number.
      private double Qi(double i)
      {
         try
         {
              double[] j=new double[list.Length];
              list.CopyTo(j,0);
              sort(j);
              if (Math.Ceiling(list.Length*i)==list.Length*i) 
                  return (j[(int)(list.Length*i-1)]+j[(int)(list.Length*i)])/2;
              else return j[((int)(Math.Ceiling(list.Length*i)))-1];
          }
         catch(Exception)
         {
            return double.NaN;
         }
      }
      
      private void sort(double[] i)
      {
         double[] temp=new double[i.Length];
          merge_sort(i,temp,0,i.Length-1);
      }
 
      private void  merge_sort(double[] source,
           double[] temp,int left,int right)
      {
         int mid;
         if (left<right) 
         {
              mid=(left+right) / 2;
              merge_sort(source,temp,left,mid);
              merge_sort(source,temp,mid+1,right);
              merge(source,temp,left,mid+1,right);
         }
      }
 
      private void  merge(double[] source,double[] temp,
           int left,int mid,int right)
      {
         int i,left_end,num_elements,tmp_pos;
         left_end=mid - 1;
         tmp_pos=left;
         num_elements=right - left + 1;
         while ((left <= left_end) && (mid <= right)) 
         {
              if (source[left] <= source[mid]) 
              {            
                   temp[tmp_pos]= source[left];
                   tmp_pos++;
                   left++;
              }
              else
              {   
                   temp[tmp_pos] = source[mid];
                   tmp_pos++;
                   mid++;
              }
         }
         while (left <= left_end) 
         {
              temp[tmp_pos]= source[left];
              left++;
              tmp_pos++;
         }
         while (mid <= right) 
         {
              temp[tmp_pos]= source[mid];
              mid++;
              tmp_pos++;
         }
         for (i=1;i<=num_elements;i++)
         {
              source[right]= temp[right];
              right--;
         }
      }
 
      private bool exist(double member)
      {
         bool is_exist=false;
         int i=0;
         while (i<=list.Length-1 && !is_exist)
         {
           is_exist=(list[i]==member);
           i++;
         }
         return is_exist;
      }
    }
}

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Israel Israel

Comments and Discussions

 
Question: Some feedback about the code
Bill_Hallahan, 19-Nov-13 14:07
I'm holding off on rating this article. Perhaps it will be updated.

Names
Several names could be better. For example, "a" is ambiguous.

In addition to "mode", it would be nice if the class had methods named "mean", "median", "variance", and "standard deviation", to name just a few common names from statistics. Some of the methods have unusual names. Without studying the code further, I can't be sure that all those statistical operations are provided. The common statistical names would make that clear.

Also, there are no comments in the code at all. That can be okay for code that does mathematics, but without clear method names, comments are necessary to make the code easy to use.

Miscellaneous issues
As Sébastien Lorion mentioned below, use Array.Sort() to sort items. It's an efficient sort.
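
As a rough illustration of that suggestion, the sketch below computes a quantile by calling Array.Sort on a copy of the data instead of using a hand-written merge sort. It follows the same rule as the article's Qi method; the names SortedQuantile and Quantile are hypothetical, and the guard that returns NaN for empty input or p outside (0, 1) is an assumption added here.

```csharp
using System;

// Hypothetical helper illustrating Array.Sort; not the article's code.
public static class SortedQuantile
{
    // Quantile of order p (0 < p < 1), using the same rule as the article's Qi method.
    public static double Quantile(double[] data, double p)
    {
        if (data == null || data.Length == 0 || p <= 0 || p >= 1) return double.NaN;

        // Sort a copy so the caller's array keeps its original order.
        double[] sorted = (double[])data.Clone();
        Array.Sort(sorted);

        double position = sorted.Length * p;
        int index = (int)Math.Ceiling(position);
        if (index == position)
            return (sorted[index - 1] + sorted[index]) / 2;  // whole number: average two neighbors
        return sorted[index - 1];                            // otherwise: round the position up
    }
}
```

For the data {7, 1, 3, 5}, Quantile(data, 0.5) gives 4, the average of the two middle values after sorting.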

Also, there is some unusual code:
C#
for (int i=0;i<=list.Length-1;i++)

The more common way to write this is:
C#
for (int i = 0; i < list.Length; i++)

The main point isn't the spacing, although that makes the loop much more readable. The main point is using '<' instead of '<=' and not subtracting 1 from the length. This is slightly less calculation and an almost universal convention in C, C++, and C# for loops over a data structure whose indices range from 0 to length - 1.

Even though I code in a very different style, I have no issue with the style you use; however, I do expect code in a file to use the same style throughout.
As an aside on why I write code the way I do: I, and others, find it much easier to read code that has a space on either side of an equals sign or comparison operator, and a space after a comma. Your code has this style on these lines:
C#
num_elements=right - left + 1;
 while ((left <= left_end) && (mid <= right))

But in the for loop above, you use a different style.

Similar to the comment above:
C#
while (i<=list.Length-1 && !is_exist)

The following is simpler, does less calculation, and follows the more common convention.
C#
while (i < list.Length && !is_exist)


Exceptions
Catching all exceptions is generally considered bad practice. That catch block might execute because of a stack overflow exception, in which case merely returning NaN is not the best behavior. In general, an exception like a stack overflow indicates a serious bug, and you want the program to terminate.

Quoting: http://msdn.microsoft.com/en-us/library/vstudio/0yd65esw.aspx

"Although the catch clause can be used without arguments to catch any type of exception, this usage is not recommended. In general, you should only catch those exceptions that you know how to recover from. Therefore, you should always specify an object argument derived from System.Exception For example:"
C#
catch(Exception)
{
   return double.NaN;
}

You want something like the following (the exception types are only examples):

C#
catch(DivideByZeroException)
{
   return double.NaN;
}
catch(OverflowException)
{
   return double.NaN;
}
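
One caveat worth checking before choosing exception types: in C#, dividing a double by zero does not throw DivideByZeroException (that exception is raised for integer and decimal division); the result is NaN or an infinity. Under that assumption, an explicit guard can stand in for the catch block. The sketch below is illustrative only, and the names SafeStats, Mean, and Divide are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SafeStats
{
    public static double Mean(double[] values)
    {
        // Explicit guard instead of catch (Exception): handle the known bad inputs up front.
        if (values == null || values.Length == 0) return double.NaN;

        double sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum / values.Length;
    }

    // Floating-point division by zero yields a special value rather than an exception.
    public static double Divide(double numerator, double denominator)
    {
        return numerator / denominator;
    }
}
```

With this shape, any exception that does escape points to a real bug and is not silently turned into NaN.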

I believe that, in C#, returning inside a catch block is okay, but I wouldn't do it for maintenance reasons. If someone adds a 'finally' block after a catch handler that returns, the finally block still runs after the return value has been evaluated but before control goes back to the caller. This will likely confuse some people.
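
The ordering described above can be observed with a small sketch. The names FinallyOrder, Log, and ParseOrNaN are hypothetical and exist only for this illustration; the method records which blocks run when a catch handler returns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical demo class; records the order in which the blocks run.
public static class FinallyOrder
{
    public static readonly List<string> Log = new List<string>();

    public static double ParseOrNaN(string text)
    {
        try
        {
            // Throws FormatException for non-numeric text.
            return double.Parse(text, CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            Log.Add("catch");
            return double.NaN;     // the return value is decided here...
        }
        finally
        {
            Log.Add("finally");    // ...but this still runs before the caller resumes
        }
    }
}
```

Calling FinallyOrder.ParseOrNaN("abc") returns NaN and leaves the log as "catch" followed by "finally", so the finally block runs even though the catch handler has already executed its return statement.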

See http://msdn.microsoft.com/en-us/library/vstudio/dszsf989.aspx for using a finally-handler.

Accuracy
I copied this from John D. Cook's post below so you could have all the issues in one place. He made a great point about calculating the standard deviation. Also, the book he recommends at the link is an excellent resource, as are all the books in the series.

http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
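
A commonly used alternative to the sum-of-squares formula in var() is Welford's one-pass update, which avoids subtracting two large, nearly equal numbers. The sketch below is a minimal illustration under that assumption, not a drop-in replacement for the article's class; the names RunningVariance and SampleVariance are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class RunningVariance
{
    // Sample variance by Welford's one-pass update; NaN when fewer than two values.
    public static double SampleVariance(double[] values)
    {
        if (values == null || values.Length < 2) return double.NaN;

        double mean = 0;   // running mean of the values seen so far
        double m2 = 0;     // running sum of squared deviations from the mean
        int n = 0;

        foreach (double x in values)
        {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);   // uses the updated mean
        }
        return m2 / (n - 1);
    }
}
```

The standard deviation is then Math.Sqrt(RunningVariance.SampleVariance(values)). For values such as 1e9 + 4, 1e9 + 7, 1e9 + 13, and 1e9 + 16, where the spread is tiny compared with the magnitude, this update still gives a sample variance of 30.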


Good effort. I hope you improve this.

Bill

modified 19-Nov-13 21:39pm.

Suggestion: Pearson and Spearman correlation
HiltonFerraz, 8-Oct-13 7:55
Question: How use?
RSaldanha, 4-Aug-10 13:37
General: My vote of 1
Deep Ash, 23-Feb-10 1:54
General: My vote of 2
Saar Yahalom, 15-Aug-09 11:31
General: Re: My vote of 2
laduran, 6-May-10 11:33
General: Variance
Ashwini Pillutla, 11-Jun-09 10:52
General: Potential inaccuracy
John D. Cook, 22-Oct-08 6:40
General: covariance
jimrob7818-Jan-07 6:31
General: QuantLib
Tomaž Štih, 7-Nov-04 21:44
General: Re: QuantLib
loizzi, 2-Sep-05 7:35
General: Re: QuantLib
anybudy, 8-Apr-07 8:14
General: Array.Sort() ...
Sebastien Lorion, 7-Nov-04 20:09
