## Introduction

This library is for determining the best-fitting 2D line, circle or rotated ellipse of a set of input points.

Take for example a set of 2D x,y points that closely but not accurately approximates a circle. Then there is a centre point and radius that represents the best circle that matches the points. This library will give you that point and radius. Similarly for a line: you get the gradient and y-axis intercept, and ellipse: the centre, major and minor axes and rotation.

## Background

Best-fitting shapes have uses in engineering, surveying and computational geometry. There is plenty of literature out there on this, but very little reusable code seems to exist. This library offers a means to incorporate this ability into a client application. It does so using the least-squares method.

## Using the code

The code is standard C++ with a dependency on the Boost ublas library for matrix operations. A Visual Studio 2008 solution file, and a Makefile are provided for building on Windows and Linux respectively. Boost needs to be pre-installed with the relevant directory paths setup in your build environment.

### General overview

- Use the factory class to make an instance of the
`BestFit`

-derived computation object.
- Decide what std::ostream, if any, you want textual output of the computation to go to. e.g. Output to the console or to a std::stringstream, or none at all.
- Create two instances of the
`BestFitIO`

struct, one for input parameters and one for output.
- At the very least, two members need to be set for the input struct:
`numPoints`

is the number of points to be provided, and `points`

being the starting address of a sequence of pairs of doubles being the x,y points.
- The output struct can be left as default constructed, though see the Further input and output section below for additional options.
- Call
`Compute()`

on the computation object, passing in the I/O structs.
- If needed, get the results from the output
`BestFitIO`

instance, or rely simply on what was output to the ostream.
`delete`

the factory-made object.

### A simple example

Here is an example of computing the best-fitting circle of 10 test points that approximate a circle of unit radius centered at coordinate (100,200). The true centre of these points is then found to be (99.969,200.016) and the radius 0.977.

#include <iostream>
#include <sstream>
#include "BestFit.h"
int main(int argc, char * argv[])
{
int count = 10;
double points[20] = {
99.9510022,201.01559,
100.0701559,199.016258,
100.921381,200.0685969,
99.031849,199.895546,
99.469357,200.869111,
100.525856,200.793908,
99.13055,200.56031,
100.787632,199.483885,
100.498142,199.169206,
99.363449,199.276086
};
BestFitIO input;
input.numPoints = count;
input.points = points;
BestFitIO output;
int type = BestFitFactory::Circle;
BestFit *b = BestFitFactory::Create(type, std::cout);
b->Compute(input, output);
...
delete b;
return 0;
}

### Get the results

After computation the `BestFitIO::outputFields`

contains the parameters of the shape - all of type `double`

. The following enumerations are used to index into this array:

enum { LineGradient, LineYIntercept };
enum { CircleCentreX, CircleCentreY, CircleRadius };
enum { EllipseCentreX, EllipseCentreY, EllipseMajor, EllipseMinor, EllipseRotation };

For example, to retrieve the solved parameters of the circle in the above sample:

double centre_x = output.outputFields[BestFitIO::CircleCentreX];
double centre_y = output.outputFields[BestFitIO::CircleCentreY];
double radius = output.outputFields[BestFitIO::CircleRadius];

In addition, some quality statistics are output to the specified ostream, in this case std::cout. More or less output is set with the input `BestFitIO::verbosity`

setting. No output is also possible with the one-parameter override of the factory constructor.

Global check of the adjustment ***PASSES***
Check of the evaluated unknowns ***PASSES***
Number of observations 10
Number of unknowns 3
Degrees of freedom 7
Iterations until convergence 6
Variance 0.000690
Std. dev. observation unit weight 0.026277
***********************************
Adj. circle centre X 99.968823
Adj. circle centre Y 200.016389
Adj. circle radius 0.977187

This was an example where a circle was the best-fitting shape - the same procedure is followed for a line or ellipse.

In addition to solving for the geometry parameters, the library can also calculate:

- the adjusted points - for each input point, the equivalent point that lies on the best-fitting shape
- the residuals - the corrections to the input points

In the simple case above, no adjusted points were output. However, if `output.wantAdjustedObs=true`

, the output.points array will contain the points described in (1) above. Either allocate memory for output.points, or a trick to save memory is to set `output.points = input.points`

. The input points will then be overwritten with the adjusted points. Also by setting `output.wantResiduals=true`

, the output.residuals array is allocated and populated with the corrections described in (2) above. Both these predicates default to false.

BestFitIO output;
output.points = new double[20]; output.wantAdjustedObs = true; output.wantResiduals = true;
if (bestFit.Compute(input, output))
{
... delete [] output.residuals;
delete [] output.points;
}

## The Demo application

As a testing tool and a graphical way of demonstrating the use of the library I wrote a demo application. This is dependent on MFC and the Dundas Ultimate Grid and therefore is Windows only. Note that by using the library it is then also dependent on Boost.

The demo application uses the *best-fit* code as a statically linked-in library and allows you to:

- See the provisional and adjusted coordinate data, and a graphical view of the points;
- Right-click to add, and left-click to remove points from the graphical view;
- See the results of the adjustment dynamically change;
- Open and save the points to file;
- Generate test data of a random distribution of points approximating given line, circle or ellipse parameters.

## Points of Interest

### Command-line program

The library code can also be compiled as a stand-alone console program. This takes stream input from stdio. To build as such, all that you need to do is to undefine `BEST_FIT_LIB`

in the preprocessor defines and set the build target as an .exe. For Linux the Makefile has this mode as the default target. When building with `BEST_FIT_LIB`

defined there is an additional dependency on Boost program_options for parsing of command line switches. Command line usage is as follows:

**$ ./best-fit**
Usage: best-fit [options] NUM_POINTS X,Y ...
Allowed options:
-t [ --type ] arg type of equation (0=line, 1=circle, 2=ellipse)
-v [ --verbose ] arg 0=simple, 1=default, 2=verbose, 3=more verbose, etc.
-h [ --help ] produce this message

### Speed/performance metrics

Testing the speed of *best-fit* on a Core 2 Duo shows the following speed stats for increasing number of points.

| No. of unknowns | 100 pts | 1 000 pts | 10 000 pts |
---|

Line | 2 | <1ms | 2ms | 15ms |
---|

Circle | 3 | <1ms | 4ms | 60ms |
---|

Ellipse | 5 | 1ms | 10ms | 124ms |
---|

There is a truly massive difference in matrix multiplication time between the Release and Debug builds of Boost uBlas. This is well documented on the Internet. I had unfortunately missed this in an earlier version of this article which showed drastically incorrect timings. This has been corrected and the above timings are with NDEBUG defined to disable the time consuming debug mode checks.

## Algorithm

Linear least squares adjustment is the algorithm behind the code. Specifically the parametric case of the adjustment. Given provisional values for the unknowns, this least squares method solves for small adjustments that refine the provisionals which are then reused iteratively until the solution converges. Matrix operations are integral to this, unfortunately including matrix inversion, which means it will be slow for large matrices.

In the case of the best-fitting line, only `y`

values are treated as observations. The `x`

value is treated as invariable. The sum of the squares of the shortest vertical distance of each point to the line is minimised. This is the ordinary parametric case of the least squares adjustment.

In the case of the circle and the rotated ellipse, both the `x`

and the `y`

values are treated as observations. The sum of the squares of the shortest distance of each point to the shape is minimised. This is a special case of the general least squares adjustment known as the quasi-parametric case.

Both the parametric and quasi-parametric case are solved in the same way. The difference is in the formulation of the weight matrix and the meaning of the residuals. The residuals of the quasi-parametric case do not immediately equate to a useful geometric quantity as with the parametric case.

The solution vector, (or corrections to the provisional unknowns) is given by:

<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>x
<mo>=
<msup>
<mrow>
<mo>(
<msup>
<mi>A
<mn>T
<mi>P
<mi>A
<mo>)
<mn>-1
<msup>
<mi>A
<mn>T
<mi>P
<mi>ℓ

and the vector of residuals, (or corrections to the observations) is given by:

<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>v
<mo>=
<mi>A
<mi>x
<mo>-
<mi>ℓ

where A is known as the design matrix of the observations, P the weight matrix and ℓ the observation vector. The inverse that exists in the equation for the solution vector is what necessitates a matrix inversion and negatively affects the performance.

To formulate the design matrix the equations of the geometry shapes need to be linear. The equation of the straight line is already linear, but the equations for a circle and a rotated ellipse need to be linearised first by Taylor expansion.

Provisional values for the unknowns are first determined by approximation. Each coordinate observation and the provisional unknown values are substituted into the linearised equation to form one row of the A matrix. The ℓ matrix is the observed less provisional.

The solution vector x is then determined, and added to the provisional values to refine them. If the change was sufficiently small then the solution has converged and the provisional values are final. Otherwise the new provisionals are used to reformulate the matrices A, P and ℓ before iterating again.

In conclusion, this is one approach, not the only approach, and not necessarily the best approach. However this approach doesn't need to concern itself with differences between lines, circles and ellipses, and this makes it convenient for concise implementation as a general purpose library.

**Credit and source:** Heinz Rüther, Advanced Engineering Surveying lecture notes (unpublished), Dept. of Geomatics, University of Cape Town, 1996.

## History

- Dec 2013: First release;
- Jan 2014: Added the algorithm section based on feedback in the comments.
- Feb 2014: A few bug fixes and test-suite improvements, changed the Makefile to default to a Release build, but most importantly updated the timings section.