Best-fitting line, circle and ellipse

Alasdair Craig

4.82/5 (40 votes)

Dec 5, 2013

CPOL

7 min read

123260

10823

Library for least-squares best-fitting of lines, circles and rotated ellipses

Introduction

This library is for determining the best-fitting 2D line, circle or rotated ellipse of a set of input points.

Best-fitting shapes where red dots = input and blue = adjusted.

Take for example a set of 2D x,y points that closely but not accurately approximates a circle. Then there is a centre point and radius that represents the best circle that matches the points. This library will give you that point and radius. Similarly for a line: you get the gradient and y-axis intercept, and ellipse: the centre, major and minor axes and rotation.

Background

Best-fitting shapes have uses in engineering, surveying and computational geometry. There is plenty of literature out there on this, but very little reusable code seems to exist. This library offers a means to incorporate this ability into a client application. It does so using the least-squares method.

Using the code

The code is standard C++ with a dependency on the Boost ublas library for matrix operations. A Visual Studio 2008 solution file, and a Makefile are provided for building on Windows and Linux respectively. Boost needs to be pre-installed with the relevant directory paths setup in your build environment.

General overview

UML class diagram of the best-fit library OO design.

Use the factory class to make an instance of the BestFit-derived computation object.
Decide what std::ostream, if any, you want textual output of the computation to go to. e.g. Output to the console or to a std::stringstream, or none at all.
Create two instances of the BestFitIO struct, one for input parameters and one for output.
At the very least, two members need to be set for the input struct: numPoints is the number of points to be provided, and points being the starting address of a sequence of pairs of doubles being the x,y points.
The output struct can be left as default constructed, though see the Further input and output section below for additional options.
Call Compute() on the computation object, passing in the I/O structs.
If needed, get the results from the output BestFitIO instance, or rely simply on what was output to the ostream.
delete the factory-made object.

A simple example

Here is an example of computing the best-fitting circle of 10 test points that approximate a circle of unit radius centered at coordinate (100,200). The true centre of these points is then found to be (99.969,200.016) and the radius 0.977.

#include <iostream>
#include <sstream>
#include "BestFit.h"

int main(int argc, char * argv[])
{
    int count = 10;
    double points[20] = { 
        99.9510022,201.01559,
        100.0701559,199.016258,
        100.921381,200.0685969,
        99.031849,199.895546,
        99.469357,200.869111,
        100.525856,200.793908,
        99.13055,200.56031,
        100.787632,199.483885,
        100.498142,199.169206,
        99.363449,199.276086
    };

    // Input settings specify the source coordinates.
    BestFitIO input;
    input.numPoints = count;
    input.points = points;

    // Output settings in this simple example left with default values.
    BestFitIO output;

    // Create the best-fitting circle object, and make it do the work.
    int type = BestFitFactory::Circle;
    BestFit *b = BestFitFactory::Create(type, std::cout);
    b->Compute(input, output);

    ... // output contains solved parameters, and optionally the adjusted points.

    // The factory allocated memory needs to be freed.  
    delete b;
    return 0;
}

Get the results

After computation the BestFitIO::outputFields contains the parameters of the shape - all of type double. The following enumerations are used to index into this array:

enum { LineGradient, LineYIntercept };
enum { CircleCentreX, CircleCentreY, CircleRadius };
enum { EllipseCentreX, EllipseCentreY, EllipseMajor, EllipseMinor, EllipseRotation };

For example, to retrieve the solved parameters of the circle in the above sample:

double centre_x = output.outputFields[BestFitIO::CircleCentreX];
double centre_y = output.outputFields[BestFitIO::CircleCentreY];
double radius   = output.outputFields[BestFitIO::CircleRadius];

In addition, some quality statistics are output to the specified ostream, in this case std::cout. More or less output is set with the input BestFitIO::verbosity setting. No output is also possible with the one-parameter override of the factory constructor.

Global check of the adjustment     ***PASSES***
Check of the evaluated unknowns    ***PASSES***
Number of observations             10
Number of unknowns                 3
Degrees of freedom                 7
Iterations until convergence       6
Variance                           0.000690
Std. dev. observation unit weight  0.026277
***********************************
Adj. circle centre X               99.968823
Adj. circle centre Y               200.016389
Adj. circle radius                 0.977187

This was an example where a circle was the best-fitting shape - the same procedure is followed for a line or ellipse.

Further input and output

In addition to solving for the geometry parameters, the library can also calculate:

the adjusted points - for each input point, the equivalent point that lies on the best-fitting shape
the residuals - the corrections to the input points

In the simple case above, no adjusted points were output. However, if output.wantAdjustedObs=true, the output.points array will contain the points described in (1) above. Either allocate memory for output.points, or a trick to save memory is to set output.points = input.points. The input points will then be overwritten with the adjusted points. Also by setting output.wantResiduals=true, the output.residuals array is allocated and populated with the corrections described in (2) above. Both these predicates default to false.

BestFitIO output;
output.points = new double[20]; // allocate space for adjusted points.
output.wantAdjustedObs = true;  // output.points will get populated
output.wantResiduals = true;    // output.residuals will get allocated and populated

if (bestFit.Compute(input, output))
{
    ... // extract data from output.points and output.residuals
    delete [] output.residuals;
    delete [] output.points;
}

The Demo application

As a testing tool and a graphical way of demonstrating the use of the library I wrote a demo application. This is dependent on MFC and the Dundas Ultimate Grid and therefore is Windows only. Note that by using the library it is then also dependent on Boost.

Demo application using best-fit as a dependent library.

The demo application uses the best-fit code as a statically linked-in library and allows you to:

See the provisional and adjusted coordinate data, and a graphical view of the points;
Right-click to add, and left-click to remove points from the graphical view;
See the results of the adjustment dynamically change;
Open and save the points to file;
Generate test data of a random distribution of points approximating given line, circle or ellipse parameters.

Points of Interest

Command-line program

The library code can also be compiled as a stand-alone console program. This takes stream input from stdio. To build as such, all that you need to do is to undefine BEST_FIT_LIB in the preprocessor defines and set the build target as an .exe. For Linux the Makefile has this mode as the default target. When building with BEST_FIT_LIB defined there is an additional dependency on Boost program_options for parsing of command line switches. Command line usage is as follows:

$ ./best-fit
Usage: best-fit [options] NUM_POINTS X,Y ...
Allowed options:
  -t [ --type ] arg     type of equation (0=line, 1=circle, 2=ellipse)
  -v [ --verbose ] arg  0=simple, 1=default, 2=verbose, 3=more verbose, etc.
  -h [ --help ]         produce this message

Speed/performance metrics

Testing the speed of best-fit on a Core 2 Duo shows the following speed stats for increasing number of points.

	No. of unknowns	100 pts	1 000 pts	10 000 pts
Line	2	<1ms	2ms	15ms
Circle	3	<1ms	4ms	60ms
Ellipse	5	1ms	10ms	124ms

There is a truly massive difference in matrix multiplication time between the Release and Debug builds of Boost uBlas. This is well documented on the Internet. I had unfortunately missed this in an earlier version of this article which showed drastically incorrect timings. This has been corrected and the above timings are with NDEBUG defined to disable the time consuming debug mode checks.

Algorithm

Linear least squares adjustment is the algorithm behind the code. Specifically the parametric case of the adjustment. Given provisional values for the unknowns, this least squares method solves for small adjustments that refine the provisionals which are then reused iteratively until the solution converges. Matrix operations are integral to this, unfortunately including matrix inversion, which means it will be slow for large matrices.

In the case of the best-fitting line, only y values are treated as observations. The x value is treated as invariable. The sum of the squares of the shortest vertical distance of each point to the line is minimised. This is the ordinary parametric case of the least squares adjustment.

In the case of the circle and the rotated ellipse, both the x and the y values are treated as observations. The sum of the squares of the shortest distance of each point to the shape is minimised. This is a special case of the general least squares adjustment known as the quasi-parametric case.

Both the parametric and quasi-parametric case are solved in the same way. The difference is in the formulation of the weight matrix and the meaning of the residuals. The residuals of the quasi-parametric case do not immediately equate to a useful geometric quantity as with the parametric case.

The solution vector, (or corrections to the provisional unknowns) is given by:

$the solution vector$

and the vector of residuals, (or corrections to the observations) is given by:

$vector of residuals$

where A is known as the design matrix of the observations, P the weight matrix and ℓ the observation vector. The inverse that exists in the equation for the solution vector is what necessitates a matrix inversion and negatively affects the performance.

To formulate the design matrix the equations of the geometry shapes need to be linear. The equation of the straight line is already linear, but the equations for a circle and a rotated ellipse need to be linearised first by Taylor expansion.

Provisional values for the unknowns are first determined by approximation. Each coordinate observation and the provisional unknown values are substituted into the linearised equation to form one row of the A matrix. The ℓ matrix is the observed less provisional.

The solution vector x is then determined, and added to the provisional values to refine them. If the change was sufficiently small then the solution has converged and the provisional values are final. Otherwise the new provisionals are used to reformulate the matrices A, P and ℓ before iterating again.

In conclusion, this is one approach, not the only approach, and not necessarily the best approach. However this approach doesn't need to concern itself with differences between lines, circles and ellipses, and this makes it convenient for concise implementation as a general purpose library.

Credit and source: Heinz Rüther, Advanced Engineering Surveying lecture notes (unpublished), Dept. of Geomatics, University of Cape Town, 1996.

History

Dec 2013: First release;
Jan 2014: Added the algorithm section based on feedback in the comments.
Feb 2014: A few bug fixes and test-suite improvements, changed the Makefile to default to a Release build, but most importantly updated the timings section.
Jan 2017: Added demo-app as a precompiled executable for easy download, based on feedback in the comments.