Least Squares Regression for Quadratic Curve Fitting
A C# class for Least Squares Regression for Quadratic Curve Fitting.
Introduction
A recent software project had a requirement to derive the equation of a quadratic curve from a series of data points. That is to say, to determine a, b, and c, where y = ax2 + bx + c. Having determined a, b, and c, I would also need a value for R-squared (the coefficient of determination).
A quick search of Google failed to bring up a suitable C# class; quite possibly, a more thorough search would have done so.
I had a vague recollection of something called 'Least Squares Regression', so back to Google I went.
References
Least Squares Regression:
- http://mathforum.org/library/drmath/view/72047.html
- http://mathforum.org/library/drmath/view/53796.html
R square:
Cramer's rule:
Using the class
Declare an instance:
LstSquQuadRegr solvr = new LstSquQuadRegr();
Pass it some point pairs (at least 3):
solvr.AddPoints(x1, y1);
solvr.AddPoints(x2, y2);
solvr.AddPoints(x3, y3);
solvr.AddPoints(x4, y4);
Get the values:
double the_a_term = solvr.aTerm();
double the_b_term = solvr.bTerm();
double the_c_term = solvr.cTerm();
double the_rSquare = solvr.rSquare();
The Theory
y = ax^2 + bx + c
We have a series of points (x1,y1), (x2,y2) ... (xn,yn).
for i = 1 to n
We want the values of a, b, and c that minimise the sum of squares of the deviations of yi from a*xi^2 + bxi + c. Such values will give the best-fitting quadratic equation.
Let the sum of the squares of the deviations be:
n
F(a,b,c) = SUM (a*xi^2 + bxi + c - yi)^2.
i=1
dF/da = SUM 2*(a*xi^2+b*xi+c-yi)*xi^2 = 0,
dF/db = SUM 2*(a*xi^2+b*xi+c-yi)*xi = 0,
dF/dc = SUM 2*(a*xi^2+b*xi+c-yi) = 0.
(Here, all sums range over i = 1, 2, ..., n.) Dividing by 2 and rearranging gives these three simultaneous linear equations containing the three unknowns a, b, and c:
(SUM xi^4)*a + (SUM xi^3)*b + (SUM xi^2)*c = SUM xi^2*yi,
(SUM xi^3)*a + (SUM xi^2)*b + (SUM xi)*c = SUM xi*yi,
(SUM xi^2)*a + (SUM xi)*b + (SUM 1)*c = SUM yi.
Using notation Sjk to mean the sum of x_i^j*y_i^k:
a*S40 + b*S30 + c*S20 = S21
a*S30 + b*S20 + c*S10 = S11
a*S20 + b*S10 + c*S00 = S01
Solve the simultaneous equations using Cramer's law:
[ S40 S30 S20 ] [ a ] [ S21 ]
[ S30 S20 S10 ] [ b ] = [ S11 ]
[ S20 S10 S00 ] [ c ] [ S01 ]
D = [ S40 S30 S20 ]
[ S30 S20 S10 ]
[ S20 S10 S00 ]
= S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20)
Da = [ S21 S30 S20 ]
[ S11 S20 S10 ]
[ S01 S10 S00 ]
= S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20)
Db = [ S40 S21 S20 ]
[ S30 S11 S10 ]
[ S20 S01 S00 ]
= S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20)
Dc = [ S40 S30 S21 ]
[ S30 S20 S11 ]
[ S20 S10 S01 ]
= S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21)
a = Da/D
b = Db/D
c = Dc/D
R square
R2 = 1 - (residual sum of squares / total sum of squares).
n
total sum of squares = SUM (yi - y_mean)^2.
i=1
This is the sum of the squares of the differences between the measured y values and the mean y value.
n
residual sum of squares = SUM (yi - yi_predicted)^2.
i=1
This is the sum of the squares of the difference between the measured y values and the values of y predicted by the equation.
The Code
A bunch of helper methods calculate all the various sums of squares. When calculating the values of a, b, and c, I used the sjk notation above as I found it easier to keep track.
/******************************************************************************
Class LstSquQuadRegr
A C# Class for Least Squares Regression for Quadratic Curve Fitting
Alex Etchells 2010
******************************************************************************/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
public class LstSquQuadRegr
{
/* instance variables */
ArrayList pointArray = new ArrayList();
private int numOfEntries;
private double[] pointpair;
/*constructor */
public LstSquQuadRegr()
{
numOfEntries = 0;
pointpair = new double[2];
}
/*instance methods */
/// <summary>
/// add point pairs
/// </summary>
/// <param name="x">x value</param>
/// <param name="y">y value</param>
public void AddPoints(double x, double y)
{
pointpair = new double[2];
numOfEntries +=1;
pointpair[0] = x;
pointpair[1] = y;
pointArray.Add(pointpair);
}
/// <summary>
/// returns the a term of the equation ax^2 + bx + c
/// </summary>
/// <returns>a term</returns>
public double aTerm()
{
if (numOfEntries < 3)
{
throw new InvalidOperationException(
"Insufficient pairs of co-ordinates");
}
//notation sjk to mean the sum of x_i^j*y_i^k.
double s40 = getSx4(); //sum of x^4
double s30 = getSx3(); //sum of x^3
double s20 = getSx2(); //sum of x^2
double s10 = getSx(); //sum of x
double s00 = numOfEntries;
//sum of x^0 * y^0 ie 1 * number of entries
double s21 = getSx2y(); //sum of x^2*y
double s11 = getSxy(); //sum of x*y
double s01 = getSy(); //sum of y
//a = Da/D
return (s21*(s20 * s00 - s10 * s10) -
s11*(s30 * s00 - s10 * s20) +
s01*(s30 * s10 - s20 * s20))
/
(s40*(s20 * s00 - s10 * s10) -
s30*(s30 * s00 - s10 * s20) +
s20*(s30 * s10 - s20 * s20));
}
/// <summary>
/// returns the b term of the equation ax^2 + bx + c
/// </summary>
/// <returns>b term</returns>
public double bTerm()
{
if (numOfEntries < 3)
{
throw new InvalidOperationException(
"Insufficient pairs of co-ordinates");
}
//notation sjk to mean the sum of x_i^j*y_i^k.
double s40 = getSx4(); //sum of x^4
double s30 = getSx3(); //sum of x^3
double s20 = getSx2(); //sum of x^2
double s10 = getSx(); //sum of x
double s00 = numOfEntries;
//sum of x^0 * y^0 ie 1 * number of entries
double s21 = getSx2y(); //sum of x^2*y
double s11 = getSxy(); //sum of x*y
double s01 = getSy(); //sum of y
//b = Db/D
return (s40*(s11 * s00 - s01 * s10) -
s30*(s21 * s00 - s01 * s20) +
s20*(s21 * s10 - s11 * s20))
/
(s40 * (s20 * s00 - s10 * s10) -
s30 * (s30 * s00 - s10 * s20) +
s20 * (s30 * s10 - s20 * s20));
}
/// <summary>
/// returns the c term of the equation ax^2 + bx + c
/// </summary>
/// <returns>c term</returns>
public double cTerm()
{
if (numOfEntries < 3)
{
throw new InvalidOperationException(
"Insufficient pairs of co-ordinates");
}
//notation sjk to mean the sum of x_i^j*y_i^k.
double s40 = getSx4(); //sum of x^4
double s30 = getSx3(); //sum of x^3
double s20 = getSx2(); //sum of x^2
double s10 = getSx(); //sum of x
double s00 = numOfEntries;
//sum of x^0 * y^0 ie 1 * number of entries
double s21 = getSx2y(); //sum of x^2*y
double s11 = getSxy(); //sum of x*y
double s01 = getSy(); //sum of y
//c = Dc/D
return (s40*(s20 * s01 - s10 * s11) -
s30*(s30 * s01 - s10 * s21) +
s20*(s30 * s11 - s20 * s21))
/
(s40 * (s20 * s00 - s10 * s10) -
s30 * (s30 * s00 - s10 * s20) +
s20 * (s30 * s10 - s20 * s20));
}
public double rSquare() // get r-squared
{
if (numOfEntries < 3)
{
throw new InvalidOperationException(
"Insufficient pairs of co-ordinates");
}
// 1 - (residual sum of squares / total sum of squares)
return 1 - getSSerr() / getSStot();
}
/*helper methods*/
private double getSx() // get sum of x
{
double Sx = 0;
foreach (double[] ppair in pointArray)
{
Sx += ppair[0];
}
return Sx;
}
private double getSy() // get sum of y
{
double Sy = 0;
foreach (double[] ppair in pointArray)
{
Sy += ppair[1];
}
return Sy;
}
private double getSx2() // get sum of x^2
{
double Sx2 = 0;
foreach (double[] ppair in pointArray)
{
Sx2 += Math.Pow(ppair[0], 2); // sum of x^2
}
return Sx2;
}
private double getSx3() // get sum of x^3
{
double Sx3 = 0;
foreach (double[] ppair in pointArray)
{
Sx3 += Math.Pow(ppair[0], 3); // sum of x^3
}
return Sx3;
}
private double getSx4() // get sum of x^4
{
double Sx4 = 0;
foreach (double[] ppair in pointArray)
{
Sx4 += Math.Pow(ppair[0], 4); // sum of x^4
}
return Sx4;
}
private double getSxy() // get sum of x*y
{
double Sxy = 0;
foreach (double[] ppair in pointArray)
{
Sxy += ppair[0] * ppair[1]; // sum of x*y
}
return Sxy;
}
private double getSx2y() // get sum of x^2*y
{
double Sx2y = 0;
foreach (double[] ppair in pointArray)
{
Sx2y += Math.Pow(ppair[0], 2) * ppair[1]; // sum of x^2*y
}
return Sx2y;
}
private double getYMean() // mean value of y
{
double y_tot = 0;
foreach (double[] ppair in pointArray)
{
y_tot += ppair[1];
}
return y_tot/numOfEntries;
}
private double getSStot() // total sum of squares
{
//the sum of the squares of the differences between
//the measured y values and the mean y value
double ss_tot = 0;
foreach (double[] ppair in pointArray)
{
ss_tot += Math.Pow(ppair[1] - getYMean(), 2);
}
return ss_tot;
}
private double getSSerr() // residual sum of squares
{
//the sum of the squares of te difference between
//the measured y values and the values of y predicted by the equation
double ss_err = 0;
foreach (double[] ppair in pointArray)
{
ss_err += Math.Pow(ppair[1] - getPredictedY(ppair[0]), 2);
}
return ss_err;
}
private double getPredictedY(double x)
{
//returns value of y predicted by the equation for a given value of x
return aTerm() * Math.Pow(x, 2) + bTerm() * x + cTerm();
}
}
Points of Interest
That's it really - it seems to agree pretty closely with the values given by Excel. I hope it saves someone else the bother of working it out.