Click here to Skip to main content
Click here to Skip to main content

Least Squares Regression for Quadratic Curve Fitting

By , 9 Mar 2010
 

Introduction

A recent software project had a requirement to derive the equation of a quadratic curve from a series of data points. That is to say, to determine a, b, and c, where y = ax2 + bx + c. Having determined a, b, and c, I would also need a value for R-squared (the coefficient of determination).

A quick search of Google failed to bring up a suitable C# class; quite possibly, a more thorough search would have done so.

I had a vague recollection of something called 'Least Squares Regression', so back to Google I went.

References

Least Squares Regression:

R square:

Cramer's rule:

Using the class

Declare an instance:

LstSquQuadRegr solvr = new LstSquQuadRegr();

Pass it some point pairs (at least 3):

solvr.AddPoints(x1, y1);
solvr.AddPoints(x2, y2);
solvr.AddPoints(x3, y3);
solvr.AddPoints(x4, y4);

Get the values:

double the_a_term = solvr.aTerm();  
double the_b_term = solvr.bTerm();
double the_c_term = solvr.cTerm();
double the_rSquare = solvr.rSquare();

The Theory

y = ax^2 + bx + c

We have a series of points (x1,y1), (x2,y2) ... (xn,yn).

for i = 1 to n

We want the values of a, b, and c that minimise the sum of squares of the deviations of yi from a*xi^2 + bxi + c. Such values will give the best-fitting quadratic equation.

Let the sum of the squares of the deviations be:

               n
   F(a,b,c) = SUM (a*xi^2 + bxi + c - yi)^2.
              i=1

dF/da = SUM 2*(a*xi^2+b*xi+c-yi)*xi^2 = 0,
dF/db = SUM 2*(a*xi^2+b*xi+c-yi)*xi = 0,
dF/dc = SUM 2*(a*xi^2+b*xi+c-yi) = 0.

(Here, all sums range over i = 1, 2, ..., n.) Dividing by 2 and rearranging gives these three simultaneous linear equations containing the three unknowns a, b, and c:

(SUM xi^4)*a + (SUM xi^3)*b + (SUM xi^2)*c = SUM xi^2*yi,
(SUM xi^3)*a + (SUM xi^2)*b +   (SUM xi)*c = SUM xi*yi,
(SUM xi^2)*a +   (SUM xi)*b +    (SUM 1)*c = SUM yi.

Using notation Sjk to mean the sum of x_i^j*y_i^k:

a*S40 + b*S30 + c*S20 = S21
a*S30 + b*S20 + c*S10 = S11
a*S20 + b*S10 + c*S00 = S01

Solve the simultaneous equations using Cramer's law:

  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ] 

  D = [ S40  S30  S20 ] 
      [ S30  S20  S10 ] 
      [ S20  S10  S00 ]  
  
    = S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20)

 Da = [ S21  S30  S20 ]
      [ S11  S20  S10 ] 
      [ S01  S10  S00 ]  

    = S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20)

 Db = [ S40  S21  S20 ] 
      [ S30  S11  S10 ] 
      [ S20  S01  S00 ]  

    = S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20)
  
 Dc = [ S40  S30  S21 ] 
      [ S30  S20  S11 ] 
      [ S20  S10  S01 ] 
  
    = S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21)  

a = Da/D
b = Db/D
c = Dc/D

R square

R2 = 1 - (residual sum of squares / total sum of squares).

                        n
total sum of squares = SUM (yi - y_mean)^2.
                       i=1

This is the sum of the squares of the differences between the measured y values and the mean y value.

                            n
 residual sum of squares = SUM (yi - yi_predicted)^2.
                           i=1

This is the sum of the squares of the difference between the measured y values and the values of y predicted by the equation.

The Code

A bunch of helper methods calculate all the various sums of squares. When calculating the values of a, b, and c, I used the sjk notation above as I found it easier to keep track.

/******************************************************************************
                          Class LstSquQuadRegr
     A C#  Class for Least Squares Regression for Quadratic Curve Fitting
                          Alex Etchells  2010    
******************************************************************************/

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;


public  class LstSquQuadRegr
{
     /* instance variables */
    ArrayList pointArray = new ArrayList(); 
    private int numOfEntries; 
    private double[] pointpair;          

    /*constructor */
    public LstSquQuadRegr()
    {
        numOfEntries = 0;
        pointpair = new double[2];
    }

    /*instance methods */    
    /// <summary>
    /// add point pairs
    /// </summary>
    /// <param name="x">x value</param>
    /// <param name="y">y value</param>
    public void AddPoints(double x, double y) 
    {
        pointpair = new double[2]; 
        numOfEntries +=1; 
        pointpair[0] = x; 
        pointpair[1] = y;
        pointArray.Add(pointpair);
    }

    /// <summary>
    /// returns the a term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>a term</returns>
    public double aTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k. 
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //a = Da/D
        return (s21*(s20 * s00 - s10 * s10) - 
                s11*(s30 * s00 - s10 * s20) + 
                s01*(s30 * s10 - s20 * s20))
                /
                (s40*(s20 * s00 - s10 * s10) -
                 s30*(s30 * s00 - s10 * s20) + 
                 s20*(s30 * s10 - s20 * s20));
    }

    /// <summary>
    /// returns the b term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>b term</returns>
    public double bTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //b = Db/D
        return (s40*(s11 * s00 - s01 * s10) - 
                s30*(s21 * s00 - s01 * s20) + 
                s20*(s21 * s10 - s11 * s20))
                /
                (s40 * (s20 * s00 - s10 * s10) - 
                 s30 * (s30 * s00 - s10 * s20) + 
                 s20 * (s30 * s10 - s20 * s20));
    }

    /// <summary>
    /// returns the c term of the equation ax^2 + bx + c
    /// </summary>
    /// <returns>c term</returns>
    public double cTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
                       "Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.
        double s40 = getSx4(); //sum of x^4
        double s30 = getSx3(); //sum of x^3
        double s20 = getSx2(); //sum of x^2
        double s10 = getSx();  //sum of x
        double s00 = numOfEntries;
        //sum of x^0 * y^0  ie 1 * number of entries

        double s21 = getSx2y(); //sum of x^2*y
        double s11 = getSxy();  //sum of x*y
        double s01 = getSy();   //sum of y

        //c = Dc/D
        return (s40*(s20 * s01 - s10 * s11) - 
                s30*(s30 * s01 - s10 * s21) + 
                s20*(s30 * s11 - s20 * s21))
                /
                (s40 * (s20 * s00 - s10 * s10) - 
                 s30 * (s30 * s00 - s10 * s20) + 
                 s20 * (s30 * s10 - s20 * s20));
    }
    
    public double rSquare() // get r-squared
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException(
               "Insufficient pairs of co-ordinates");
        }
        // 1 - (residual sum of squares / total sum of squares)
        return 1 - getSSerr() / getSStot();
    }
   

    /*helper methods*/
    private double getSx() // get sum of x
    {
        double Sx = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx += ppair[0];
        }
        return Sx;
    }

    private double getSy() // get sum of y
    {
        double Sy = 0;
        foreach (double[] ppair in pointArray)
        {
            Sy += ppair[1];
        }
        return Sy;
    }

    private double getSx2() // get sum of x^2
    {
        double Sx2 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx2 += Math.Pow(ppair[0], 2); // sum of x^2
        }
        return Sx2;
    }

    private double getSx3() // get sum of x^3
    {
        double Sx3 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx3 += Math.Pow(ppair[0], 3); // sum of x^3
        }
        return Sx3;
    }

    private double getSx4() // get sum of x^4
    {
        double Sx4 = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx4 += Math.Pow(ppair[0], 4); // sum of x^4
        }
        return Sx4;
    }

    private double getSxy() // get sum of x*y
    {
        double Sxy = 0;
        foreach (double[] ppair in pointArray)
        {
            Sxy += ppair[0] * ppair[1]; // sum of x*y
        }
        return Sxy;
    }

    private double getSx2y() // get sum of x^2*y
    {
        double Sx2y = 0;
        foreach (double[] ppair in pointArray)
        {
            Sx2y += Math.Pow(ppair[0], 2) * ppair[1]; // sum of x^2*y
        }
        return Sx2y;
    }

    private double getYMean() // mean value of y
    {
        double y_tot = 0;
        foreach (double[] ppair in pointArray)
        {
            y_tot += ppair[1]; 
        }
        return y_tot/numOfEntries;
    }

    private double getSStot() // total sum of squares
    {
        //the sum of the squares of the differences between 
        //the measured y values and the mean y value
        double ss_tot = 0;
        foreach (double[] ppair in pointArray)
        {
            ss_tot += Math.Pow(ppair[1] - getYMean(), 2);
        }
        return ss_tot;
    }

    private double getSSerr() // residual sum of squares
    {
        //the sum of the squares of te difference between 
        //the measured y values and the values of y predicted by the equation
        double ss_err = 0;
        foreach (double[] ppair in pointArray)
        {
            ss_err += Math.Pow(ppair[1] - getPredictedY(ppair[0]), 2);
        }
        return ss_err;
    }

    private double getPredictedY(double x)
    {
        //returns value of y predicted by the equation for a given value of x
        return aTerm() * Math.Pow(x, 2) + bTerm() * x + cTerm();
    }
}

Points of Interest

That's it really - it seems to agree pretty closely with the values given by Excel. I hope it saves someone else the bother of working it out.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alex@UEA
Software Developer University Of East Anglia, UK
United Kingdom United Kingdom
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralSingle C# static function versionmemberJuraj Lutisan5 May '13 - 23:43 
Here is code in one function. Implementation uses only two foreach cycles and the meaning of a, b, c parameters follows a mathematical convention: y = a + b*x + c*x*x.
 
public static void SimpleParabolicRegression(IEnumerable<double> vectorX, IEnumerable<double> vectorY, out double a, out double b, out double c, out double r)
{
    int n = 0;
    double sx = 0, sy = 0, sx2 = 0,  sx3 = 0, sx4 = 0, sxy = 0, sx2y = 0;
    foreach (var item in vectorX.Zip(vectorY, (x, y) => new { x, y }))
    {
        n++;
        sx  += item.x;
        sx2 += item.x * item.x;
        sx3 += item.x * item.x * item.x;
        sx4 += item.x * item.x * item.x * item.x;
        sy  += item.y;
        sxy += item.x * item.y;
        sx2y += item.x * item.x * item.y;
    }
 
    if (n < 3)
        throw new ArgumentException("Dimension of data set have to be 3 or more.");
 
    double d = (sx4 * (sx2 * n - sx * sx) - sx3 * (sx3 * n - sx * sx2) + sx2 * (sx3 * sx - sx2 * sx2));
 
    c = (sx2y * (sx2 * n - sx * sx) - sxy * (sx3 * n - sx * sx2) + sy * (sx3 * sx - sx2 * sx2)) / d;
    b = (sx4 * (sxy * n - sy * sx) - sx3 * (sx2y * n - sy * sx2) + sx2 * (sx2y * sx - sxy * sx2)) / d;
    a = (sx4 * (sx2 * sy - sx * sxy) - sx3 * (sx3 * sy - sx * sx2y) + sx2 * (sx3 * sxy - sx2 * sx2y)) / d;
 
    double ssErr = 0, ssTot = 0, yMean = sy/(double)n;
    foreach (var item in vectorX.Zip(vectorY, (x, y) => new { x, y }))
    {
        double err = item.y - (a + b*item.x + c*item.x*item.x);
        ssErr += err * err;
        double dif = item.y - yMean;
        ssTot += dif * dif;
    }
 
    r = Math.Sqrt(1.0 - ssErr / ssTot);
}

GeneralRe: Single C# static function versionmemberAlex@UEA6 May '13 - 21:32 
Neat!
How does it cope with the problems pointed out by Prcy in the 'Incorrect results' message below?
GeneralRe: Single C# static function versionmemberJuraj Lutisan6 May '13 - 23:52 
If you need minimalize truncation error during multiplication and division you can use decimal instead of double (or create generic version of my function). Remember that speed is slower in this case. For the most of situations double is enough.
QuestionC++ versionmemberRobin Imrie8 Nov '12 - 0:33 
For those of you who are interested I have ported this class to c++. Here is the code...
 
TCheckedVariable.h
///-------------------------------------------------------------------------------------------------
// file:	TCheckedVariable.h
//
// summary:	Declares the checked variable class
//
///-------------------------------------------------------------------------------------------------
#pragma once
#include <utility>

 
///-------------------------------------------------------------------------------------------------
/// \class	TCheckedVariable
///
/// \brief	Checked variable class - provides a wrapper arround a variable to allow for checking
///			that it has been initialized or not.
///-------------------------------------------------------------------------------------------------
template<typename T>
class TCheckedVariable : private std::pair<bool,T>
{
public:
 
	TCheckedVariable(void)
	{
		first = false;
	}
 
	TCheckedVariable( T t )
	{
		first = true;
		second = t;
	}
 
	bool IsInitialised() const
	{
		return first;
	}
 
	void Reset()
	{
		first = false;
	}
 
	const T& operator=( const T &rhs )
	{
		first = true;
		second = rhs;
		return second;
	}
 
	operator T() const
	{
		return second;
	}
};
 
CurveFit.h
///-------------------------------------------------------------------------------------------------
// file:	CurveFit.h
//
// summary:	Declares the curve fit class
//
///-------------------------------------------------------------------------------------------------
#pragma once
#include <vector>
#include "TCheckedVariable.h"

///-------------------------------------------------------------------------------------------------
/// \class	CDataPoint
///
/// \brief	Data point class.
///-------------------------------------------------------------------------------------------------
class CDataPoint
{
public:
	CDataPoint() : m_x(0.0), m_y(0.0) {}
	CDataPoint( double x, double y ) : m_x(x), m_y(y) {}
 
	// attributes
protected:
	double m_x;
	double m_y;
 
public:
	inline double x() const { return m_x; }
	inline double y() const	{ return m_y; }
	inline void x( double x ) { m_x = x; }
	inline void y( double y ) { m_y = y; }
};
 
typedef std::vector<CDataPoint> CDataPointArray;
 
///-------------------------------------------------------------------------------------------------
/// \class	CCurveFit
///
/// \brief	Curve fit.
///-------------------------------------------------------------------------------------------------
class CCurveFit
{
public:
	CCurveFit(void);
	~CCurveFit(void);	
 
	// attributes
protected:
	CDataPointArray m_dataPoints;
 
	TCheckedVariable<double> m_a;
	TCheckedVariable<double> m_b;
	TCheckedVariable<double> m_c;
 
public:
	void AddPoints( double x, double y );
	void AddPoints( const CDataPoint &dp );
	void AddPoints( const CDataPointArray &dpArray );
 
	// operations
public:
	// Gets the a term of the equation ax^2 + bx + c.
	double GetATerm();
	
	// Gets the b term of the equation ax^2 + bx + c.
	double GetBTerm();
 
	// Gets c term of the equation ax^2 + bx + c.
	double GetCTerm();
 
	/// Gets r squared value.
	double GetRSquare();
 
protected:
	// helper functions

	// Gets sum if x^nXPower * y^nYPower.
	double getSxy( int nXPower, int nYPower );
	double getYMean();
	double getSStot();
	double getSSerr();
	double getPredictedY( double x );
};
 
 
and curvefit.cpp
///-------------------------------------------------------------------------------------------------
// file:	CurveFit.cpp
//
// summary:	Implements the curve fit class
//
///-------------------------------------------------------------------------------------------------
#include "StdAfx.h"
#include "CurveFit.h"

const int MIN_VALUES = 3;
 
///-------------------------------------------------------------------------------------------------
/// \fn	CCurveFit::CCurveFit(void)
///
/// \brief	Default constructor.
///-------------------------------------------------------------------------------------------------
CCurveFit::CCurveFit(void)
{
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	CCurveFit::~CCurveFit(void)
///
/// \brief	Destructor.
///-------------------------------------------------------------------------------------------------
CCurveFit::~CCurveFit(void)
{
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	void CCurveFit::AddPoints( double x, double y )
///
/// \brief	Adds the points to 'y'.
///
/// \param	x	The double to process.
/// \param	y	The double to process.
///-------------------------------------------------------------------------------------------------
void CCurveFit::AddPoints( double x, double y )
{
	AddPoints( CDataPoint( x, y ) );
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	void CCurveFit::AddPoints( const CDataPoint &dp )
///
/// \brief	Adds the points.
///
/// \param	dp	The dp.
///-------------------------------------------------------------------------------------------------
void CCurveFit::AddPoints( const CDataPoint &dp )
{
	m_dataPoints.push_back( dp );
	m_a.Reset();
	m_b.Reset();
	m_c.Reset();
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	void CCurveFit::AddPoints( const CDataPointArray &dpArray )
///
/// \brief	Adds the points.
///
/// \param	dpArray	Array of dps.
///-------------------------------------------------------------------------------------------------
void CCurveFit::AddPoints( const CDataPointArray &dpArray )
{
	m_dataPoints.insert( 
		m_dataPoints.rbegin().base(), 
		dpArray.begin(),
		dpArray.end() );
 
	m_a.Reset();
	m_b.Reset();
	m_c.Reset();
}
 
/*
  y = ax^2 + bx + c
 
  notation Sjk to mean the sum of x_i^j*y_i^k. 
  2a*S40 + 2c*S20 + 2b*S30 - 2*S21 = 0
  2b*S20 + 2a*S30 + 2c*S10 - 2*S11 = 0
  2a*S20 + 2c*S00 + 2b*S10 - 2*S01 = 0
  
  solve the system of simultaneous equations using Cramer's law
 
  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ] 
 
  D = [ S40  S30  S20 ] 
	  [ S30  S20  S10 ] 
	  [ S20  S10  S00 ]  
  
	  S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20)
 
 Da = [ S21  S30  S20 ]
	  [ S11  S20  S10 ] 
	  [ S01  S10  S00 ]  
 
	  S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20)
 
 Db = [ S40  S21  S20 ] 
	  [ S30  S11  S10 ] 
	  [ S20  S01  S00 ]  
 
	  S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20)
  
 Dc = [ S40  S30  S21 ] 
	  [ S30  S20  S11 ] 
	  [ S20  S10  S01 ] 
  
	  S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21)  
 
 */
 

///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::GetATerm();
///
/// \brief	Gets the a term of the equation ax^2 + bx + c.
///
/// \return	a term.
///-------------------------------------------------------------------------------------------------
double CCurveFit::GetATerm()
{
	if( m_dataPoints.size() < MIN_VALUES )
	{
		throw std::exception( "Insufficient pairs of co-ordinates" );
	}
 
	if( m_a.IsInitialised() )
		return m_a;
 
	// notation sjk to mean the sum of x_i^j_i^k.
	double s40 = getSxy( 4, 0 );
	double s30 = getSxy( 3, 0 );
	double s20 = getSxy( 2, 0 );
	double s10 = getSxy( 1, 0 );
	double s00 = (double)m_dataPoints.size();
 
	double s21 = getSxy( 2, 1 );
	double s11 = getSxy( 1, 1 );
	double s01 = getSxy( 0, 1 );
 
	// Da / D
	m_a = (s21 * (s20 * s00 - s10 * s10) - s11 * (s30 * s00 - s10 * s20) + s01 * (s30 * s10 - s20 * s20))
				/
				(s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20));
 
	return m_a;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::GetBTerm();
///
/// \brief	Gets the b term of the equation ax^2 + bx + c.
///
/// \return	The b term.
///-------------------------------------------------------------------------------------------------
double CCurveFit::GetBTerm()
{
	if( m_dataPoints.size() < MIN_VALUES )
	{
		throw std::exception( "Insufficient pairs of co-ordinates" );
	}
 
	if( m_b.IsInitialised() )
		return m_b;
 
	// notation sjk to mean the sum of x_i^j_i^k.
	double s40 = getSxy( 4, 0 );
	double s30 = getSxy( 3, 0 );
	double s20 = getSxy( 2, 0 );
	double s10 = getSxy( 1, 0 );
	double s00 = (double)m_dataPoints.size();
 
	double s21 = getSxy( 2, 1 );
	double s11 = getSxy( 1, 1 );
	double s01 = getSxy( 0, 1 );
 
	//   Db / D
	m_b = (s40 * (s11 * s00 - s01 * s10) - s30 * (s21 * s00 - s01 * s20) + s20 * (s21 * s10 - s11 * s20))
		/
		(s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20));
 
	return m_b;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::GetCTerm();
///
/// \brief	Gets c term of the equation ax^2 + bx + c.
///
/// \return	The c term.
///-------------------------------------------------------------------------------------------------
double CCurveFit::GetCTerm()
{
	if( m_dataPoints.size() < MIN_VALUES )
	{
		throw std::exception( "Insufficient pairs of co-ordinates" );
	}
 
	if( m_c.IsInitialised() )
		return m_c;
 
	// notation sjk to mean the sum of x_i^j_i^k.
	double s40 = getSxy( 4, 0 );
	double s30 = getSxy( 3, 0 );
	double s20 = getSxy( 2, 0 );
	double s10 = getSxy( 1, 0 );
	double s00 = (double)m_dataPoints.size();
 
	double s21 = getSxy( 2, 1 );
	double s11 = getSxy( 1, 1 );
	double s01 = getSxy( 0, 1 );
 
	//   Dc / D
	m_c = (s40*(s20 * s01 - s10 * s11) - s30*(s30 * s01 - s10 * s21) + s20*(s30 * s11 - s20 * s21))
			/
			(s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20));
 
	return m_c;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::GetRSquare();
///
/// \brief	Gets r squared.
///
/// \return	The r squared value.
///-------------------------------------------------------------------------------------------------
double CCurveFit::GetRSquare()
{
	if( m_dataPoints.size() < MIN_VALUES )
	{
		throw std::exception( "Insufficient pairs of co-ordinates" );
	}
 
	return (1.0 - getSSerr() / getSStot());
 
}
 
// helper functions

///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::getSxy( int nXPower, int nYPower );
///
/// \brief	Gets sum if x^nXPower * y^nYPower.
///
/// \param	nXPower	The x power.
/// \param	nYPower	The y power.
///
/// \return	The sxy.
///-------------------------------------------------------------------------------------------------
double CCurveFit::getSxy( int nXPower, int nYPower )
{
	double dSxy = 0.0;
 
	std::for_each( m_dataPoints.begin(), m_dataPoints.end(),
		[ &dSxy, nXPower, nYPower ]( const CDataPoint &dp )
	{
		dSxy += pow( dp.x(), nXPower ) * pow( dp.y(), nYPower );
	} );
 
	return dSxy;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::getYMean()
///
/// \brief	Get y coordinate mean.
///
/// \return	The calculated y coordinate mean.
///-------------------------------------------------------------------------------------------------
double CCurveFit::getYMean()
{
	double dTotal = 0.0;
	int nCount = 0;
	std::for_each( m_dataPoints.begin(), m_dataPoints.end(),
		[ &dTotal, &nCount ]( const CDataPoint &dp )
	{
		dTotal += dp.y();
		nCount++;
	} );
 
	return dTotal / nCount;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::getSStot()
///
/// \brief	Gets s stot.
///
/// \return	The s stot.
///-------------------------------------------------------------------------------------------------
double CCurveFit::getSStot()
{
	double ssTot = 0.0;
	double dYMean = getYMean();
 
	std::for_each( m_dataPoints.begin(), m_dataPoints.end(),
		[ &ssTot, dYMean ]( const CDataPoint &dp )
	{
		ssTot += pow( (dp.y() * dYMean), 2 );
	} );
 
	return ssTot;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::getSSerr()
///
/// \brief	Gets s serr.
///
/// \return	The s serr.
///-------------------------------------------------------------------------------------------------
double CCurveFit::getSSerr()
{
	double ssErr = 0.0;
 
	auto pThis = this;
 
	std::for_each( m_dataPoints.begin(), m_dataPoints.end(),
		[ &ssErr, pThis ]( const CDataPoint &dp )
	{
		ssErr += pow( (dp.y() - pThis->getPredictedY( dp.x() )), 2 );
	} );
	
	return ssErr;
}
 
///-------------------------------------------------------------------------------------------------
/// \fn	double CCurveFit::getPredictedY( double x )
///
/// \brief	Gets predicted y coordinate.
///
/// \param	x	The double to process.
///
/// \return	The predicted y coordinate.
///-------------------------------------------------------------------------------------------------
double CCurveFit::getPredictedY( double x )
{
	return (GetATerm() * pow( x, 2 )) + (GetBTerm() * x) + GetCTerm();
}
Thanks,
Robin.

GeneralMy vote of 5memberNicoD22 May '12 - 0:18 
Exactly what I was looking for!!1 - Thanks a lot...
GeneralMy vote of 5membernaeem_libra25 Dec '11 - 4:56 
Appreciated Work. 5/5
QuestionNice work, but a suggestion:memberm00se123428 Nov '11 - 22:18 
Thanks for this, no struggling with my mathbook this morning thanks to your work.
 
The readability of this code is an advantage, I would however suggest a different way to find the parameters: now you are going through the Arraylist multiple times; especially with big datasets I suspect there has to a big advantage in using something like this instead:
 
    private double getParameters()
    {
        double Sx = 0;
        double Sy = 0;
 
        foreach (double[] ppair in pointArray)
        {
            Sx += ppair[0];
            Sy += ppair[1];
            //Etcetera...
        }
 
    }

AnswerRe: Nice work, but a suggestion:memberAlex@UEA28 Nov '11 - 22:57 
Hi m00se1234 and thanks for the suggestion
 
I note that you are using double rather than decimal which suggests you may be using the old version
 
There is an updated version contained in the 'incorrect results[^]' comment below
 
Alex
Generalincorrect resultsmemberPrcy9 Dec '10 - 11:14 
I tried the class and got in some cases incorrect results:
 
LstSquQuadRegr l = new LstSquQuadRegr();
l.AddPoints(400, 1);
l.AddPoints(401, 3);
l.AddPoints(402, 1);
Debug.WriteLine("a:"+l.cTerm().ToString());

---> a:-2 correct
 

LstSquQuadRegr l = new LstSquQuadRegr();
l.AddPoints(400, 1);
l.AddPoints(401, 3);
l.AddPoints(402.0000000001, 1);
Debug.WriteLine("a:"+l.cTerm().ToString());

---> a:1,00581686019371 incorrect!!!
 

I found rounding errors as cause of the trouble. A quick fix is to use decimal insteat of double (replace all double by decimal, fix the Math.Pow-compiler errors by casting or using multiplication insteat of Math.Pow).
But the better solution would be to optimitze the calculation of D, Da ...
GeneralRe: incorrect resultsmemberAlex@UEA10 Dec '10 - 0:23 
thanks for pointing that one out Prcy!
 
(incidentally your debug lines as written are returning the C term although your illustrative values are the A term.)
 
I have taken the approach of using decimals and using multiplication instead of Math.Pow.
This also gave me the opportunity to add a generic getSxy(int xPower, int yPower) // get sum of x^xPower * y^yPower method - something which had bugged me before
 

I want to slot this new version straight into those projects which already use this class,
so data is still entered and returned as doubles.
 
Here is the updated version
/******************************************************************************
                          Class LstSquQuadRegr
     A C#  Class for Least Squares Regression for Quadratic Curve Fitting
                          Alex Etchells  2010    
          version 2 using decimals to resolve rounding errors
******************************************************************************/
 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
 

public  class LstSquQuadRegr
{
     /* instance variables */
    ArrayList pointArray = new ArrayList(); 
    private int numOfEntries; 
    private decimal[] pointpair;
 
    /*constructor */
    public LstSquQuadRegr()
    {
        numOfEntries = 0;
        pointpair = new decimal[2];
    }
 
    /*instance methods */
    public void AddPoints(double x, double y) 
    {
        pointpair = new decimal[2]; 
        numOfEntries +=1; 
        pointpair[0] = Convert.ToDecimal(x);
        pointpair[1] = Convert.ToDecimal(y);
        pointArray.Add(pointpair);
    }
 
/*
  y = ax^2 + bx + c
 
  notation Sjk to mean the sum of x_i^j*y_i^k. 
  2a*S40 + 2c*S20 + 2b*S30 - 2*S21 = 0
  2b*S20 + 2a*S30 + 2c*S10 - 2*S11 = 0
  2a*S20 + 2c*S00 + 2b*S10 - 2*S01 = 0
  
  solve the system of simultaneous equations using Cramer's law
 
  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ] 
 
  D = [ S40  S30  S20 ] 
      [ S30  S20  S10 ] 
      [ S20  S10  S00 ]  
  
      S40(S20*S00 - S10*S10) - S30(S30*S00 - S10*S20) + S20(S30*S10 - S20*S20)
 
 Da = [ S21  S30  S20 ]
      [ S11  S20  S10 ] 
      [ S01  S10  S00 ]  
 
      S21(S20*S00 - S10*S10) - S11(S30*S00 - S10*S20) + S01(S30*S10 - S20*S20)
 
 Db = [ S40  S21  S20 ] 
      [ S30  S11  S10 ] 
      [ S20  S01  S00 ]  
 
      S40(S11*S00 - S01*S10) - S30(S21*S00 - S01*S20) + S20(S21*S10 - S11*S20)
  
 Dc = [ S40  S30  S21 ] 
      [ S30  S20  S11 ] 
      [ S20  S10  S01 ] 
  
      S40(S20*S01 - S10*S11) - S30(S30*S01 - S10*S21) + S20(S30*S11 - S20*S21)  
 
 */
 
    private decimal deciATerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException("Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k. 
        decimal s40 = getSxy(4, 0);
        decimal s30 = getSxy(3, 0);
        decimal s20 = getSxy(2, 0);
        decimal s10 = getSxy(1, 0);
        decimal s00 = numOfEntries;
 
        decimal s21 = getSxy(2, 1);
        decimal s11 = getSxy(1, 1);
        decimal s01 = getSxy(0, 1);
 
        //   Da / D
        return (s21 * (s20 * s00 - s10 * s10) - s11 * (s30 * s00 - s10 * s20) + s01 * (s30 * s10 - s20 * s20))
                /
                (s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20));
    }
 
    public double aTerm()
    {
        return Convert.ToDouble(deciATerm());
    }
 
    private decimal deciBTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException("Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k. 
        decimal s40 = getSxy(4, 0);
        decimal s30 = getSxy(3, 0);
        decimal s20 = getSxy(2, 0);
        decimal s10 = getSxy(1, 0);
        decimal s00 = numOfEntries;
 
        decimal s21 = getSxy(2, 1);
        decimal s11 = getSxy(1, 1);
        decimal s01 = getSxy(0, 1);
 
        //   Db / D
        return (s40 * (s11 * s00 - s01 * s10) - s30 * (s21 * s00 - s01 * s20) + s20 * (s21 * s10 - s11 * s20))
        /
        (s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20));
    }
 
    public double bTerm()
    {
       return Convert.ToDouble(deciBTerm());
    }
 
    private decimal deciCTerm()
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException("Insufficient pairs of co-ordinates");
        }
        //notation sjk to mean the sum of x_i^j*y_i^k.  
        decimal s40 = getSxy(4, 0);
        decimal s30 = getSxy(3, 0);
        decimal s20 = getSxy(2, 0);
        decimal s10 = getSxy(1, 0);
        decimal s00 = numOfEntries;
 
        decimal s21 = getSxy(2, 1);
        decimal s11 = getSxy(1, 1);
        decimal s01 = getSxy(0, 1);
 
        //   Dc / D
        return(s40*(s20 * s01 - s10 * s11) - s30*(s30 * s01 - s10 * s21) + s20*(s30 * s11 - s20 * s21))
                /
                (s40 * (s20 * s00 - s10 * s10) - s30 * (s30 * s00 - s10 * s20) + s20 * (s30 * s10 - s20 * s20));
    }
 
    public double cTerm()
    {
        return Convert.ToDouble(deciCTerm());
    }
    
    public double rSquare() // get rsquare
    {
        if (numOfEntries < 3)
        {
            throw new InvalidOperationException("Insufficient pairs of co-ordinates");
        }
        return Convert.ToDouble(1 - getSSerr() / getSStot());
    }
   
 
    /*helper methods*/
    private decimal getSxy(int xPower, int yPower) // get sum of x^xPower * y^yPower
    {
        decimal Sxy = 0;
        foreach (decimal[] ppair in pointArray)
        {
            decimal xToPower = 1;
            for (int i = 0; i < xPower; i++)
            {
                xToPower = xToPower * ppair[0];
            }
            
            decimal yToPower = 1;
            for (int i = 0; i < yPower; i++)
            {
                yToPower = yToPower * ppair[1];
            }
            Sxy += xToPower * yToPower;
        }
        return Sxy;
    }
 

    private decimal getYMean()
    {
        decimal y_tot = 0;
        foreach (decimal[] ppair in pointArray)
        {
            y_tot += ppair[1]; 
        }
        return y_tot/numOfEntries;
    }
 
    private decimal getSStot()
    {
        decimal ss_tot = 0;
        foreach (decimal[] ppair in pointArray)
        {
            ss_tot += (ppair[1] - getYMean()) *  (ppair[1] - getYMean());
        }
        return ss_tot;
    }
 
    private decimal getSSerr()
    {
        decimal ss_err = 0;
        foreach (decimal[] ppair in pointArray)
        {
            ss_err += (ppair[1] - getPredictedY(ppair[0])) * (ppair[1] - getPredictedY(ppair[0]));
        }
        return ss_err;
    }
 

    private decimal getPredictedY(decimal x)
    {
        return deciATerm() * x * x + deciBTerm() * x + deciCTerm();
    }
}
 
 
cheers Alex
GeneralMe Like!memberJohn Devron27 Sep '10 - 8:03 
Poke tongue | ;-P Very clean solution. Works perfectly in my application. Thanks so much for saving me the trouble of re-inventing the wheel. It behaves beautifully as a jitter free filter!
Generallooks nice but high order polynomial is bettermemberlastguy30 Aug '10 - 14:38 
this one is 2nd order
GeneralRe: looks nice but high order polynomial is better [modified]memberAlex@UEA6 Oct '10 - 23:27 
lastguy wrote:
looks nice but high order polynomial is better

 
well that's me told!
 
lastguy wrote:
this one is 2nd order

 
well so it is!
clue's in the title mate.
 

whilst I'm here - I'm still wondering why I wrote
 
    private double getSx2() // get sum of x^2
    private double getSx3() // get sum of x^3
etc, etc
 
rather than
   private double getSx(int power) // get sum of x^power
 
or better still
   private double getSxy(int xPower, int yPower) // get sum of x^xPower * y^yPower
 
still as my late father always said: 'if it aint broke, don't fix it' Smile | :)

modified on Monday, October 11, 2010 4:49 AM

GeneralGood jobmvpPete O'Hanlon4 Mar '10 - 20:57 
Congratulations on an excellent article. This gets my 5.

"WPF has many lovers. It's a veritable porn star!" - Josh Smith

As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.

My blog | My articles | MoXAML PowerToys | Onyx



GeneralI've been looking for one of thesememberDon Kackman4 Mar '10 - 10:33 
thanks Smile | :)
10 PRINT Software is hard. - D Knuth
20 GOTO 10

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web03 | 2.6.130516.1 | Last Updated 9 Mar 2010
Article Copyright 2010 by Alex@UEA
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid