Least Squares Regression for Quadratic Curve Fitting

Alex@UEA

Rate me:

4.95/5 (22 votes)

9 Mar 2010CPOL2 min read

128.8K

A C# class for Least Squares Regression for Quadratic Curve Fitting.

Download source code - 1.44 KB

Introduction

A recent software project had a requirement to derive the equation of a quadratic curve from a series of data points. That is to say, to determine a, b, and c, where y = ax² + bx + c. Having determined a, b, and c, I would also need a value for R-squared (the coefficient of determination).

A quick search of Google failed to bring up a suitable C# class; quite possibly, a more thorough search would have done so.

I had a vague recollection of something called 'Least Squares Regression', so back to Google I went.

References

Least Squares Regression:

R square:

http://en.wikipedia.org/wiki/R_square

Cramer's rule:

http://www.purplemath.com/modules/cramers.htm

Using the class

Declare an instance:

LstSquQuadRegr solvr = new LstSquQuadRegr();

Pass it some point pairs (at least 3):

solvr.AddPoints(x1, y1);
solvr.AddPoints(x2, y2);
solvr.AddPoints(x3, y3);
solvr.AddPoints(x4, y4);

Get the values:

double the_a_term = solvr.aTerm();  
double the_b_term = solvr.bTerm();
double the_c_term = solvr.cTerm();
double the_rSquare = solvr.rSquare();

The Theory

y = ax^2 + bx + c

We have a series of points (x1,y1), (x2,y2) ... (xn,yn).

for i = 1 to n

We want the values of a, b, and c that minimise the sum of squares of the deviations of y_i from a*x_i^2 + bx_i + c. Such values will give the best-fitting quadratic equation.

Let the sum of the squares of the deviations be:

               n
   F(a,b,c) = SUM (a*xi^2 + bxi + c - yi)^2.
              i=1

dF/da = SUM 2*(a*xi^2+b*xi+c-yi)*xi^2 = 0,
dF/db = SUM 2*(a*xi^2+b*xi+c-yi)*xi = 0,
dF/dc = SUM 2*(a*xi^2+b*xi+c-yi) = 0.

(Here, all sums range over i = 1, 2, ..., n.) Dividing by 2 and rearranging gives these three simultaneous linear equations containing the three unknowns a, b, and c:

(SUM xi^4)*a + (SUM xi^3)*b + (SUM xi^2)*c = SUM xi^2*yi,
(SUM xi^3)*a + (SUM xi^2)*b +   (SUM xi)*c = SUM xi*yi,
(SUM xi^2)*a +   (SUM xi)*b +    (SUM 1)*c = SUM yi.

Using notation S_jk to mean the sum of x_i^j*y_i^k:

a*S40 + b*S30 + c*S20 = S21
a*S30 + b*S20 + c*S10 = S11
a*S20 + b*S10 + c*S00 = S01

Solve the simultaneous equations using Cramer's law:

  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ] 

  D = [ S40  S30  S20 ] 
      [ S30  S20  S10 ] 
      [ S20  S10  S00 ]  
  
    = S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20)

 Da = [ S21  S30  S20 ]
      [ S11  S20  S10 ] 
      [ S01  S10  S00 ]  

    = S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20)

 Db = [ S40  S21  S20 ] 
      [ S30  S11  S10 ] 
      [ S20  S01  S00 ]  

    = S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20)
  
 Dc = [ S40  S30  S21 ] 
      [ S30  S20  S11 ] 
      [ S20  S10  S01 ] 
  
    = S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21)  

a = Da/D
b = Db/D
c = Dc/D

R square

R² = 1 - (residual sum of squares / total sum of squares).

                        n
total sum of squares = SUM (yi - y_mean)^2.
                       i=1

This is the sum of the squares of the differences between the measured y values and the mean y value.

                           n
residual sum of squares = SUM (yi - yi_predicted)^2.
                          i=1

This is the sum of the squares of the difference between the measured y values and the values of y predicted by the equation.

The Code

A bunch of helper methods calculate all the various sums of squares. When calculating the values of a, b, and c, I used the s_jk notation above as I found it easier to keep track.

/******************************************************************************
                          Class LstSquQuadRegr
     A C#  Class for Least Squares Regression for Quadratic Curve Fitting
                          Alex Etchells  2010    
******************************************************************************/

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;


public  class LstSquQuadRegr
{
     /* instance variables */
    ArrayList pointArray = new ArrayList(); 
    private int numOfEntries; 
    private double[] pointpair;          

    /*constructor */
    public LstSquQuadRegr()
    {
        numOfEntries = 0;
        pointpair = new double[2];
    }

    /*instance methods */    
    /// <summary>
    /// add point pairs
    /// </summary>
    /// <param name="x">x value</param>
    /// <param name="y">y value</param>
    public void AddPoints(double x, double y) 
    {
        pointpair = new double[2]; 
        numOfEntries +=1; 
        pointpair[0] = x; 
        pointpair[1] = y;
        pointArray.Add(pointpair);
    }

    /// <summary>
    /// returns the a term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>a term</returns>
    public double aTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k. 
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //a = Da/D
        return (s21*(s20 * s00 - s10 * s10) - 
                s11*(s30 * s00 - s10 * s20) + 
                s01*(s30 * s10 - s20 * s20))
                /
                (s40*(s20 * s00 - s10 * s10) -
                 s30*(s30 * s00 - s10 * s20) + 
                 s20*(s30 * s10 - s20 * s20));
    }

    /// <summary>
    /// returns the b term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>b term</returns>
    public double bTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //b = Db/D
        return (s40*(s11 * s00 - s01 * s10) - 
                s30*(s21 * s00 - s01 * s20) + 
                s20*(s21 * s10 - s11 * s20))
                /
                (s40 * (s20 * s00 - s10 * s10) - 
                 s30 * (s30 * s00 - s10 * s20) + 
                 s20 * (s30 * s10 - s20 * s20));
    }

    /// <summary>
    /// returns the c term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>c term</returns>
    public double cTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
                       "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //c = Dc/D
        return (s40*(s20 * s01 - s10 * s11) - 
                s30*(s30 * s01 - s10 * s21) + 
                s20*(s30 * s11 - s20 * s21))
                /
                (s40 * (s20 * s00 - s10 * s10) - 
                 s30 * (s30 * s00 - s10 * s20) + 
                 s20 * (s30 * s10 - s20 * s20));
    }
    
    public double rSquare() // get r-squared
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        // 1 - (residual sum of squares / total sum of squares)
        return 1 - getSSerr() / getSStot();
    }
   

    /*helper methods*/
    private double getSx() // get sum of x
    {
        double Sx = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx += ppair[0];
        }
        return Sx;
    }

    private double getSy() // get sum of y
    {
        double Sy = 0;
        foreach (double[] ppair in pointArray)
        {
            Sy += ppair[1];
        }
        return Sy;
    }

    private double getSx2() // get sum of x^2
    {
        double Sx2 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx2 += Math.Pow(ppair[0], 2); // sum of x^2
        }
        return Sx2;
    }

    private double getSx3() // get sum of x^3
    {
        double Sx3 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx3 += Math.Pow(ppair[0], 3); // sum of x^3
        }
        return Sx3;
    }

    private double getSx4() // get sum of x^4
    {
        double Sx4 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx4 += Math.Pow(ppair[0], 4); // sum of x^4
        }
        return Sx4;
    }

    private double getSxy() // get sum of x*y
    {
        double Sxy = 0;
        foreach (double[] ppair in pointArray)
        {
            Sxy += ppair[0] * ppair[1]; // sum of x*y
        }
        return Sxy;
    }

    private double getSx2y() // get sum of x^2*y
    {
        double Sx2y = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx2y += Math.Pow(ppair[0], 2) * ppair[1]; // sum of x^2*y
        }
        return Sx2y;
    }

    private double getYMean() // mean value of y
    {
        double y_tot = 0;
        foreach (double[] ppair in pointArray)
        {
            y_tot += ppair[1]; 
        }
        return y_tot/numOfEntries;
    }

    private double getSStot() // total sum of squares
    {
        //the sum of the squares of the differences between 
        //the measured y values and the mean y value
        double ss_tot = 0;
        foreach (double[] ppair in pointArray)
        {
            ss_tot += Math.Pow(ppair[1] - getYMean(), 2);
        }
        return ss_tot;
    }

    private double getSSerr() // residual sum of squares
    {
        //the sum of the squares of te difference between 
        //the measured y values and the values of y predicted by the equation
        double ss_err = 0;
        foreach (double[] ppair in pointArray)
        {
            ss_err += Math.Pow(ppair[1] - getPredictedY(ppair[0]), 2);
        }
        return ss_err;
    }

    private double getPredictedY(double x)
    {
        //returns value of y predicted by the equation for a given value of x
        return aTerm() * Math.Pow(x, 2) + bTerm() * x + cTerm();
    }
}

Points of Interest

That's it really - it seems to agree pretty closely with the values given by Excel. I hope it saves someone else the bother of working it out.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Alex@UEA

Software Developer University Of East Anglia, UK

United Kingdom

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

/****************************************************************************** Class LstSquQuadRegr A C# Class for Least Squares Regression for Quadratic Curve Fitting Alex Etchells 2010 version 2 using decimals to resolve rounding errors ******************************************************************************/ using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Collections; public class LstSquQuadRegr { /* instance variables */ ArrayList pointArray = new ArrayList(); private int numOfEntries; private decimal[] pointpair; /*constructor */ public LstSquQuadRegr() { numOfEntries = 0; pointpair = new decimal[2]; } /*instance methods */ public void AddPoints(double x, double y) { pointpair = new decimal[2]; numOfEntries +=1; pointpair[0] = Convert.ToDecimal(x); pointpair[1] = Convert.ToDecimal(y); pointArray.Add(pointpair); } /* y = ax^2 + bx + c notation Sjk to mean the sum of x_i^j*y_i^k. 2a*S40 + 2c*S20 + 2b*S30 - 2*S21 = 0 2b*S20 + 2a*S30 + 2c*S10 - 2*S11 = 0 2a*S20 + 2c*S00 + 2b*S10 - 2*S01 = 0 solve the system of simultaneous equations using Cramer's law [ S40 S30 S20 ] [ a ] [ S21 ] [ S30 S20 S10 ] [ b ] = [ S11 ] [ S20 S10 S00 ] [ c ] [ S01 ] D = [ S40 S30 S20 ] [ S30 S20 S10 ] [ S20 S10 S00 ] S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20) Da = [ S21 S30 S20 ] [ S11 S20 S10 ] [ S01 S10 S00 ] S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20) Db = [ S40 S21 S20 ] [ S30 S11 S10 ] [ S20 S01 S00 ] S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20) Dc = [ S40 S30 S21 ] [ S30 S20 S11 ] [ S20 S10 S01 ] S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21) */ private decimal deciATerm() { if (numOfEntries < 3) { throw new InvalidOperationException("Insufficient pairs of co-ordinates"); } //notation sjk to mean the sum of x_i^j*y_i^k. decimal s40 = getSxy(4, 0); decimal s30 = getSxy(3, 0); decimal s20 = getSxy(2, 0); decimal s10 = getSxy(1, 0); decimal s00 = numOfEntries; decimal s21 = getSxy(2, 1); decimal s11 = getSxy(1, 1); decimal s01 = getSxy(0, 1); // Da / D return (s21 * (s20 * s00 - s10 * s10) - s11 * (s30 * s00 - s10 * s20) + s01 * (s30 * s10 - s20 * s20)) / (s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20)); } public double aTerm() { return Convert.ToDouble(deciATerm()); } private decimal deciBTerm() { if (numOfEntries < 3) { throw new InvalidOperationException("Insufficient pairs of co-ordinates"); } //notation sjk to mean the sum of x_i^j*y_i^k. decimal s40 = getSxy(4, 0); decimal s30 = getSxy(3, 0); decimal s20 = getSxy(2, 0); decimal s10 = getSxy(1, 0); decimal s00 = numOfEntries; decimal s21 = getSxy(2, 1); decimal s11 = getSxy(1, 1); decimal s01 = getSxy(0, 1); // Db / D return (s40 * (s11 * s00 - s01 * s10) - s30 * (s21 * s00 - s01 * s20) + s20 * (s21 * s10 - s11 * s20)) / (s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20)); } public double bTerm() { return Convert.ToDouble(deciBTerm()); } private decimal deciCTerm() { if (numOfEntries < 3) { throw new InvalidOperationException("Insufficient pairs of co-ordinates"); } //notation sjk to mean the sum of x_i^j*y_i^k. decimal s40 = getSxy(4, 0); decimal s30 = getSxy(3, 0); decimal s20 = getSxy(2, 0); decimal s10 = getSxy(1, 0); decimal s00 = numOfEntries; decimal s21 = getSxy(2, 1); decimal s11 = getSxy(1, 1); decimal s01 = getSxy(0, 1); // Dc / D return(s40*(s20 * s01 - s10 * s11) - s30*(s30 * s01 - s10 * s21) + s20*(s30 * s11 - s20 * s21)) / (s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20)); } public double cTerm() { return Convert.ToDouble(deciCTerm()); } public double rSquare() // get rsquare { if (numOfEntries < 3) { throw new InvalidOperationException("Insufficient pairs of co-ordinates"); } return Convert.ToDouble(1 - getSSerr() / getSStot()); } /*helper methods*/ private decimal getSxy(int xPower, int yPower) // get sum of x^xPower * y^yPower { decimal Sxy = 0; foreach (decimal[] ppair in pointArray) { decimal xToPower = 1; for (int i = 0; i < xPower; i++) { xToPower = xToPower * ppair[0]; } decimal yToPower = 1; for (int i = 0; i < yPower; i++) { yToPower = yToPower * ppair[1]; } Sxy += xToPower * yToPower; } return Sxy; } private decimal getYMean() { decimal y_tot = 0; foreach (decimal[] ppair in pointArray) { y_tot += ppair[1]; } return y_tot/numOfEntries; } private decimal getSStot() { decimal ss_tot = 0; foreach (decimal[] ppair in pointArray) { ss_tot += (ppair[1] - getYMean()) * (ppair[1] - getYMean()); } return ss_tot; } private decimal getSSerr() { decimal ss_err = 0; foreach (decimal[] ppair in pointArray) { ss_err += (ppair[1] - getPredictedY(ppair[0])) * (ppair[1] - getPredictedY(ppair[0])); } return ss_err; } private decimal getPredictedY(decimal x) { return deciATerm() * x * x + deciBTerm() * x + deciCTerm(); } }

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.