65.9K
CodeProject is changing. Read more.
Home

Least Squares Regression for Quadratic Curve Fitting

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.96/5 (23 votes)

Mar 4, 2010

CPOL

2 min read

viewsIcon

133498

downloadIcon

3057

A C# class for Least Squares Regression for Quadratic Curve Fitting.

Introduction

A recent software project had a requirement to derive the equation of a quadratic curve from a series of data points. That is to say, to determine a, b, and c, where y = ax2 + bx + c. Having determined a, b, and c, I would also need a value for R-squared (the coefficient of determination).

A quick search of Google failed to bring up a suitable C# class; quite possibly, a more thorough search would have done so.

I had a vague recollection of something called 'Least Squares Regression', so back to Google I went.

References

Least Squares Regression:

R square:

Cramer's rule:

Using the class

Declare an instance:

LstSquQuadRegr solvr = new LstSquQuadRegr();

Pass it some point pairs (at least 3):

solvr.AddPoints(x1, y1);
solvr.AddPoints(x2, y2);
solvr.AddPoints(x3, y3);
solvr.AddPoints(x4, y4);

Get the values:

double the_a_term = solvr.aTerm();  
double the_b_term = solvr.bTerm();
double the_c_term = solvr.cTerm();
double the_rSquare = solvr.rSquare();

The Theory

y = ax^2 + bx + c

We have a series of points (x1,y1), (x2,y2) ... (xn,yn).

for i = 1 to n

We want the values of a, b, and c that minimise the sum of squares of the deviations of yi from a*xi^2 + bxi + c. Such values will give the best-fitting quadratic equation.

Let the sum of the squares of the deviations be:

               n
   F(a,b,c) = SUM (a*xi^2 + bxi + c - yi)^2.
              i=1

dF/da = SUM 2*(a*xi^2+b*xi+c-yi)*xi^2 = 0,
dF/db = SUM 2*(a*xi^2+b*xi+c-yi)*xi = 0,
dF/dc = SUM 2*(a*xi^2+b*xi+c-yi) = 0.

(Here, all sums range over i = 1, 2, ..., n.) Dividing by 2 and rearranging gives these three simultaneous linear equations containing the three unknowns a, b, and c:

(SUM xi^4)*a + (SUM xi^3)*b + (SUM xi^2)*c = SUM xi^2*yi,
(SUM xi^3)*a + (SUM xi^2)*b +   (SUM xi)*c = SUM xi*yi,
(SUM xi^2)*a +   (SUM xi)*b +    (SUM 1)*c = SUM yi.

Using notation Sjk to mean the sum of x_i^j*y_i^k:

a*S40 + b*S30 + c*S20 = S21
a*S30 + b*S20 + c*S10 = S11
a*S20 + b*S10 + c*S00 = S01

Solve the simultaneous equations using Cramer's law:

  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ] 

  D = [ S40  S30  S20 ] 
      [ S30  S20  S10 ] 
      [ S20  S10  S00 ]  
  
    = S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20)

 Da = [ S21  S30  S20 ]
      [ S11  S20  S10 ] 
      [ S01  S10  S00 ]  

    = S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20)

 Db = [ S40  S21  S20 ] 
      [ S30  S11  S10 ] 
      [ S20  S01  S00 ]  

    = S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20)
  
 Dc = [ S40  S30  S21 ] 
      [ S30  S20  S11 ] 
      [ S20  S10  S01 ] 
  
    = S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21)  

a = Da/D
b = Db/D
c = Dc/D

R square

R2 = 1 - (residual sum of squares / total sum of squares).

                        n
total sum of squares = SUM (yi - y_mean)^2.
                       i=1

This is the sum of the squares of the differences between the measured y values and the mean y value.

                            n
 residual sum of squares = SUM (yi - yi_predicted)^2.
                           i=1

This is the sum of the squares of the difference between the measured y values and the values of y predicted by the equation.

The Code

A bunch of helper methods calculate all the various sums of squares. When calculating the values of a, b, and c, I used the sjk notation above as I found it easier to keep track.

/******************************************************************************
                          Class LstSquQuadRegr
     A C#  Class for Least Squares Regression for Quadratic Curve Fitting
                          Alex Etchells  2010    
******************************************************************************/

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;


public  class LstSquQuadRegr
{
     /* instance variables */
    ArrayList pointArray = new ArrayList(); 
    private int numOfEntries; 
    private double[] pointpair;          

    /*constructor */
    public LstSquQuadRegr()
    {
        numOfEntries = 0;
        pointpair = new double[2];
    }

    /*instance methods */    
    /// <summary>
    /// add point pairs
    /// </summary>
    /// <param name="x">x value</param>
    /// <param name="y">y value</param>
    public void AddPoints(double x, double y) 
    {
        pointpair = new double[2]; 
        numOfEntries +=1; 
        pointpair[0] = x; 
        pointpair[1] = y;
        pointArray.Add(pointpair);
    }

    /// <summary>
    /// returns the a term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>a term</returns>
    public double aTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k. 
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //a = Da/D
        return (s21*(s20 * s00 - s10 * s10) - 
                s11*(s30 * s00 - s10 * s20) + 
                s01*(s30 * s10 - s20 * s20))
                /
                (s40*(s20 * s00 - s10 * s10) -
                 s30*(s30 * s00 - s10 * s20) + 
                 s20*(s30 * s10 - s20 * s20));
    }

    /// <summary>
    /// returns the b term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>b term</returns>
    public double bTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //b = Db/D
        return (s40*(s11 * s00 - s01 * s10) - 
                s30*(s21 * s00 - s01 * s20) + 
                s20*(s21 * s10 - s11 * s20))
                /
                (s40 * (s20 * s00 - s10 * s10) - 
                 s30 * (s30 * s00 - s10 * s20) + 
                 s20 * (s30 * s10 - s20 * s20));
    }

    /// <summary>
    /// returns the c term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>c term</returns>
    public double cTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
                       "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //c = Dc/D
        return (s40*(s20 * s01 - s10 * s11) - 
                s30*(s30 * s01 - s10 * s21) + 
                s20*(s30 * s11 - s20 * s21))
                /
                (s40 * (s20 * s00 - s10 * s10) - 
                 s30 * (s30 * s00 - s10 * s20) + 
                 s20 * (s30 * s10 - s20 * s20));
    }
    
    public double rSquare() // get r-squared
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        // 1 - (residual sum of squares / total sum of squares)
        return 1 - getSSerr() / getSStot();
    }
   

    /*helper methods*/
    private double getSx() // get sum of x
    {
        double Sx = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx += ppair[0];
        }
        return Sx;
    }

    private double getSy() // get sum of y
    {
        double Sy = 0;
        foreach (double[] ppair in pointArray)
        {
            Sy += ppair[1];
        }
        return Sy;
    }

    private double getSx2() // get sum of x^2
    {
        double Sx2 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx2 += Math.Pow(ppair[0], 2); // sum of x^2
        }
        return Sx2;
    }

    private double getSx3() // get sum of x^3
    {
        double Sx3 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx3 += Math.Pow(ppair[0], 3); // sum of x^3
        }
        return Sx3;
    }

    private double getSx4() // get sum of x^4
    {
        double Sx4 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx4 += Math.Pow(ppair[0], 4); // sum of x^4
        }
        return Sx4;
    }

    private double getSxy() // get sum of x*y
    {
        double Sxy = 0;
        foreach (double[] ppair in pointArray)
        {
            Sxy += ppair[0] * ppair[1]; // sum of x*y
        }
        return Sxy;
    }

    private double getSx2y() // get sum of x^2*y
    {
        double Sx2y = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx2y += Math.Pow(ppair[0], 2) * ppair[1]; // sum of x^2*y
        }
        return Sx2y;
    }

    private double getYMean() // mean value of y
    {
        double y_tot = 0;
        foreach (double[] ppair in pointArray)
        {
            y_tot += ppair[1]; 
        }
        return y_tot/numOfEntries;
    }

    private double getSStot() // total sum of squares
    {
        //the sum of the squares of the differences between 
        //the measured y values and the mean y value
        double ss_tot = 0;
        foreach (double[] ppair in pointArray)
        {
            ss_tot += Math.Pow(ppair[1] - getYMean(), 2);
        }
        return ss_tot;
    }

    private double getSSerr() // residual sum of squares
    {
        //the sum of the squares of te difference between 
        //the measured y values and the values of y predicted by the equation
        double ss_err = 0;
        foreach (double[] ppair in pointArray)
        {
            ss_err += Math.Pow(ppair[1] - getPredictedY(ppair[0]), 2);
        }
        return ss_err;
    }

    private double getPredictedY(double x)
    {
        //returns value of y predicted by the equation for a given value of x
        return aTerm() * Math.Pow(x, 2) + bTerm() * x + cTerm();
    }
}

Points of Interest

That's it really - it seems to agree pretty closely with the values given by Excel. I hope it saves someone else the bother of working it out.