Click here to Skip to main content
13,004,446 members (72,726 online)
Click here to Skip to main content
Add your own
alternative version


34 bookmarked
Posted 24 Feb 2012

Introduction to Numerical Solutions

, 24 Feb 2012
Rate this:
Please Sign up or sign in to vote.
An introduction to numerical solver algorithms with general purpose demonstration code.
<pre />


One of the most common problems is to compute an answer to an equation. If it is a simple equation with one unknown variable and the variable can be isolated it’s easy to solve, such as computing y:

y = a*x +b, y = a0 + a1*x + a2*x^2, y = a0*e^x, y = x^(a/x + b)

The problem is much harder when solving for a, b based on measured x, y values. Common examples include solving for the location of a GPS receiver from a set of measurements, curve fitting an NMR return to a Lorentz distribution, computing the apparent time constants of a heat system from recorded data, or computing the solution to a transcendental function.

The classical technique to solve numerically is to compute an error function, and search (guess) for a solution that minimizes the error. Different problems lend to different approaches. But it is often through an iterative process.

This article describes some classical approaches, and presents a novel approach blending the advantages of some of the techniques presented that is often useful. Often the “optimal” technique depends on how much computing power and time is available, the accuracy required, and the smoothness and shape of the error function. Finding an optimal technique is in part subjective and relative to the problem, and it is an art. The provided new approach is meant to provide an interesting example.

The source code provided, and it is relatively simple to use, the caller defines an error function (delegate/callback) and an initial estimate of the function’s coefficients. As an API, a large number of problems can be solved with a few calls without understanding the mathematics behind it.


These are very flexible techniques. The solution converges quickly if you can describe what you do want (or not) as an error function to minimize, preferably with a smooth monotonic slope.

Example 1

Suppose you have a set of data points, and you need to calculate the line that “best” fits that line.

y = mx + b

In this case, y and x are known, but given any two pairs of y and x, a different pair of m and b are calculated due to measurement noise. The goal is to minimize the error of the points.

The error function could be, given a “b” and an “m”, compute a predicted y for each known x, then:

Error = sum( (y-ypred)*(y-ypred)) for all y and ypredicted.

This is a simple problem to solve with least squares, the provided source code demonstrates how to fit a line or a quadratic to a data set. Both a linear and quadratic solver (Solve_FirstOrder, Solve_SecondOrder) calls are provided. Note this minimizes the square of the error and not the delta error which is often more desirable if there is noise in the x, y measurements.

LinearLeastSquares lls = new LinearLeastSquares();
double [] x = {1, 2, 3, 4, 5};
double [] y = {3, 5, 7, 9, 11};
lls.Solve_FirstOrder(x, y);
if ((1 = lls.Solution[0])&& (2 == lls.Solution[1]))
    // Found the correct line that passes through y at a given x.
    // y = lls.Solution[1] * x + lls.Solution[0];</p>

Example 2

Computing where you are from several noisy distance measurements or angle measurements from known points (GPS / triangulation). In this case if you were where your guess (coefficients) said you were, then the actual measurements would have a certain value, and the difference between predicted and actual is the error. In this case, the error transformed by derivative of value vs parameter translates the error into an update to the estimate. This is the basis of what is called non-linear least squares, or the Gauss-Newton method. In its simplest form:

J*dA = R

dA is adjustment to coefficients as a vector,

R is error at each x position as a vector.

J is a matrix called the Jacobian, the coordinate transform from error to coefficient

1. A = initial estimate

2. dA = J^-1 * R

3. A = A + dA

4. Stop if magnitude of R is small enough, else go to step 2.

The matrix grows in both number of measurements and coefficients, and often it isn’t square, and when squared up multiplying by the transpose of J, it sometimes isn’t invertible. In these cases, a pseudo inverse is used. The problem with the pseudo inverse is it sometimes isn’t accurate due to round off problems, different matrix packages use different approaches to minimize the problem. Sometimes this method doesn’t converge to a solution without modifying it.

For these reasons, an iterative non-linear code example isn’t provided, it would be too large and complex, and in general, a creative solution guessing scheme converges faster.

Example 3

Suppose heat is transferred from A to B, and B to C. We might now the heater current applied at A, and the temperature at C, but nothing directly about how heat was transferred through B.

B = (B-A)*e^(-t/tba) - (C-B)*e^(-t/tbc)

You have the values of C and A, but not B, nor the time constants tba and tbc, and exponential functions are far from linear. If there is a unique solution that minimize the error (what we measured minus what the guessed at constants would predict) then the problem can be solved for. In this case the non-linear least squares would also work but as more coefficients are added, that solution might converge to different local minimums given different initial estimates and not converge to the ideal answer.

// Given many x and y, solve for a, b, c.
// These are the "real" parameters we're trying to guess at.
double a = 1.001, b = 2.002, c = 3.003;

// Generate the fake data, y = a*x^2 + b*x*e^x + c*e^(x + 1)
for (int i = 0; i < data.Length; i++) // data.Length is 100.
    double di = (double)(i * 0.1);
    data[i] = a*di*di + b*di*System.Math.Exp(di) + c*System.Math.Exp(di + 1);

// Set up each parameter's search criteria.
Solver.Parameter pa = new Solver.Parameter(
                "a",     // Name.
                1.05,       // Center
                1.2,     // Max
                0.8,     // Min
                0.1,     // Initial step size
                0.0001);  // Final step size
Solver.Parameter pb = new Solver.Parameter("b", 2.05, 2.1, 1.9, 0.1, 0.0001);
Solver.Parameter pc = new Solver.Parameter("c", 3.05, 3.1, 2.9, 0.01, 0.0001);

// Pre-Solve
//   ComputeError    The error function.
//   200             200 generations, each generation is only 10 calculations
//   10              10 walks per variable before moving on.
//   1.25            How much over testing is done per generation.
//   0.00005         After reaching this error stop trying.
Solver s = new Solver();
s.Solve_DirectSearch(ComputeError, 10, 10, 1.25, 0.00005);
pa.CenterValue = pa.Solution;
pb.CenterValue = pb.Solution;
pc.CenterValue = pc.Solution;

// Now re-solve with slope variate, slope variate assumes your close enough
// to the solution that the errors can more or less fit a quadratic and the
// trench of a sadle point can be "walked" to improve the solution.
double err = s.Solve_SlopeVariate(ComputeError, 20, 5, 1.25, 0.00000000005);

// Our provided "error" we can convert to stdev error.
err = System.Math.Sqrt(err) / data.Length;
label1.Text = 
    "Actual A, B, C = " + a + ", " + b + ", " + c + "\r\n" + 
    "Solved A, B, C = " + pa.Solution + ", " + pb.Solution + ", " + pc.Solution + 
    "\r\nError = " + err; 


Most approaches used by the author or seen in documentation require an error function, and then guess, improve the guess and repeat, a rough sketch of this is:

  1. Initial guess
  2. Compute error
  3. Refine guess
  4. Compute error
  5. If error isn’t small enough, go to step 3, else repeat.

What varies per algorithm is how the refine step is done:


Create a set of possible coefficients, repeat with a single “best” or a few “best” values.

Direct Search:

Vary one parameter at a time, repeat while reducing the step size.

Least Squares:

Treat the error as a translatable vector space from the solution.

Slope Variation:

Curve fit (least squares) the effect of one coefficient on another, direct search in pairs.

Key Feature Comparison

Plotting the error vs. any two coefficients produces a surface plot. Picture trying to find the lowest spot on a map, but only knowing the altitudes of the places visited.

· Least Squares flattens the errors into a solution, but if the map is too complex that won’t work.

· Iterative least squares uses the slope to get magnitude and direction of where to try next, but if the second derivative (rate of change of slope, or Hessian) or higher order things are at play, then the direction and or magnitude may lead to a relatively equivalent error in a new location, the solution might not converge. Inverting large matrixes is time consuming.

· Using higher order derivatives (Hessian matrix) can help if the curve is very smooth (no noise), and there are few divots in the surface (local minima) the algorithm might lock onto.

· Checking a geometric pattern to compute a search region (simplex method) or a grid pattern, and then refining the search region iteratively (genetic) around likely candidates works well if there is noise and local minima, but it takes of computations. For 5 coefficients, a single 20 test point search is 20^5 = 3,200,000 error calculations. If each error calculation computes the sum of the errors for 10k data points, and 20 iterations, this may be 640 billion calculations. Solving the same problem at 10 Hz would likely require over a 20 GHz processor.

· Varying only one parameter at a time is called a pattern search or direct search, or Hook-Jeeves algorithm. In this case the solution is explored starting at one location walking in one strait line at a time, varying step size. In general, setup appropriately, Hook-Jeeves overshoots/undershoots and explores to find the solution. The disadvantages is it may lock onto a local minimum, or it may jump over a local minimum. With a large number of coefficients, varying each in order can also tend to jump across local minima. Testing for 5 coefficients with 10k data points may take 1000 iterations or 50 million calculations instead of 640 billion. At 10Hz, this might be done with a dual core processor, GPU or an ASIC.

Theory of the new algorithm

There are many cases why an algorithm may fail to converge on the correct solution. Suppose, that the error map of two coefficients has a narrow deep trench, at some angle to the coefficient axis. That is usually the reason why other methods that are good at initially converging on the right solution stop converging. In this case, most searches will not lock onto the trench, and follow it and instead lock onto some cross section across it, and reduce step size without converging on the correct solution. Noise, and local minima will make matters worse.

This is an error plot of two coefficients in a complex equation (lighter is less error, ideal solution in center).

The error has a dominant trench from left to right that most algorithms will lock onto, some will lock into the center trench, but most will not be able to converge within it very well. The solution optimized within the center diagonal well would require the algorithm to detect, vary and trace along that point.

There are several approaches that “rotate” to the trench using Hessian or slopes but they are susceptible to noise, and work best when the trench isn’t curved, and there are not ripples in the surface.

For two coefficients a, and b, a slice through the map at a given b, sampled a few times at a step size provides a line. When that line can be approximated by a least squares curve, and that curve has a bottom, then that bottom point is the bottom of the trench in that slice. Linking the “bottom” points of multiple slices and then least squares fitting again provides the line that is the bottom of the trench for coefficients a and b. Due to noise, etc. when the step size is large enough it will skip over local minima and other ripples in the error surface that are small. The algorithm will also track a curved trench. As the step size is reduced the algorithm will continue to converge.

To ensure the refinement is real, and not due to noise the refinement is done as a pair, and only applied if the same refinement can be made when starting with “a” OR by starting with “b”.

The algorithm has several nice properties:

  • · Axises are chosen in pairs (every combination), so if any combination could be improved the algorithm rapidly converges.
  • · Subsampling at an interval that reduces locks onto major features of the error function and tends to ignore local minima.
  • · When the algorithm does lock onto a local minima, the affect of refining another pair of axis will generally “pop” the system out of that minimum.
  • · Algorithm converges extremely quickly.
  • · Uses least squares to fit a quadratic and assumes another algorithm has provided a relatively good initial estimate.

The algorithm has one significant weaknesses but can be covered by combining it with a another algorithm. If there isn’t a dominant minimum within the range of values being examined the algorithm doesn’t work. When first providing likely “estimate” locations via a genetic grid search this isn’t a problem.


Performance is relative to the equation chosen. There are large sets of equations that are best suited for each algorithm, the following demonstrates a typical case where the “Slope Variation” approach works better than others.

y = (1.001)*x^2 + (2.002)*x*e^x + (3.003)*e^(x + 1)

Paris Genetic 0.9, 2, 3.00999 Error 0.4099, 10k error function evaluations.

Paris Genetic Multiple 0.9, 2, 3.00999 Error 0.4099, 40k error function evaluations.

Direct Search 1.13, 2.00259, 3.00055, Error of 0.2157, 2k error function evaluations

Slope Variation 0.96177, 2.00163, 3.00439, Error of 0.047, 200 error function evaluations.

Slope Variation 1.000994, 2.0019999, 3.00300, Error of 6.35 x10^-6, 19.7k error function evaluations.

The Paris and Direct search techniques, could only marginally improve the error over time and converged slowly if at all towards the end. The Slope Variation method converged rapidly, and continues to converge rapidly, where other techniques do not.


When paired with an initial method to produce a refined estimate, the new method will often converge far more quickly and continue to converge almost to an arbitrary level of precision. These features make it a good approach to refine a likely solution quickly.


The zip file includes a pdf with a lot of the math done out by hand for some of the methods. Most use the same coefficient names or similar to what is shown in wikipedia to avoid confusion.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Software Developer (Senior) KMC Systems
United States United States
Phil is a Principal Software developer focusing on weird yet practical algorithms that run the gamut of embedded and desktop (PID loops, Kalman filters, FFTs, client-server SOAP bindings, ASIC design, communication protocols, game engines, robotics).

In his personal life he is a part time mad scientist, full time dad, and studies small circle jujitsu, plays guitar and piano.

You may also be interested in...


Comments and Discussions

QuestionI have a 7 parameter equation - Can I modify your program to perform curve fit passing in the initial data set? Pin
NGErndt22-Sep-14 12:36
memberNGErndt22-Sep-14 12:36 
QuestionHow could you apply this when given an empirical dataset (x and y) and an equation with multiple parameters that need to be optimized Pin
NGErndt17-Sep-14 13:03
memberNGErndt17-Sep-14 13:03 
AnswerRe: How could you apply this when given an empirical dataset (x and y) and an equation with multiple parameters that need to be optimized Pin
HoshiKata16-Mar-16 6:46
memberHoshiKata16-Mar-16 6:46 
QuestionYou have left out the 2 most important methods of fitting data to equations Pin
MicroImaging24-Feb-12 11:56
memberMicroImaging24-Feb-12 11:56 
You have left out comparing your results to the Standard nonlinear least squares parametric solver Levenberg-Marquardt algorithm">

And an algorithm I have successfully used with either L1, L2, or LInfinity criteria pioneered by Nelder and Mead in Applied Statistics. I use this algorithm over and over because no derivatives are ever computed, or even estimated unlike the Levenberg Marquardt algorithm.

I would be interested in your comparison using the several equations including the very non-linear and highly sensitive electrochemical double layer equations where one of the coefficients you are trying to solve for appears in the the term e^a(E-E0) where E is applied voltage.
AnswerRe: You have left out the 2 most important methods of fitting data to equations Pin
HoshiKata25-Feb-12 1:06
memberHoshiKata25-Feb-12 1:06 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170627.1 | Last Updated 24 Feb 2012
Article Copyright 2012 by HoshiKata
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid