Applications often require solving nonlinear equations. The standard form of the problem is to solve for a number x such that f(x) = 0. This article presents three algorithms for solving this problem and describes the advantages and disadvantages of each. The algorithms are implemented in C# 4.0.
The two most well-known algorithms for root-finding are the bisection method and Newton’s method. In a nutshell, the former is slow but robust and the latter is fast but not robust. Brent’s method is robust and usually much faster than the bisection method.
The bisection method is perfectly reliable. Suppose you know that f(a) is negative and f(b) is positive. As long as f is a continuous function, there must be some value of x between a and b where f(x) = 0. The bisection method is guaranteed to find x within the specified tolerance. In fact, the error after n steps of the bisection method, |f(x)| is guaranteed to be less than |f(b) - f(a)| 2-n. Roughly speaking, the method produces one bit of precision in the answer at each step.
There are a classes of functions (for example, convex functions) for which Newton’s method is guaranteed to converge, starting from any initial guess. However, the method is very often used in practice when there is no theoretical guarantee that it will work. Often this approach succeeds. But when Newton’s method fails, it can fail spectacularly. Instead of getting closer to the desired solution, each iteration goes further from the solution. If the function being optimized is approximately flat, the method can shoot extremely far from the solution in a single step.
When Newton’s method does work, it works very well. Once it starts to converge, the number of correct bits in the result doubles at each step. Whereas bisection cuts the error in half at each step, Newton’s method squares the error at each step.
I saw a textbook once that introduced Newton’s method by quoting this nursery rhyme that sums up the method’s behavior.
There once was a girl
who had a big curl
right in the middle of her forehead.
When she was good
she was very, very good,
but when she was bad she was horrid.
Brent’s method, like the bisection method, is guaranteed to converge. In the worst case, it can be a little slower than the bisection method. However, its rate of convergence is often closer to that of Newton’s method than that of the bisection method. If the function is “nice” (technically, its derivative is Lipshitz continuous), then the error at each step is raised to the power 1.6 in the next step. (The exact power is the golden ratio: (1 + v5)/2 = 1.1618...)
To compare the rates of convergence of the three methods, you might see something like the following for running the three methods. Suppose you start each method with an initial guess at the root you want to find, and your initial guess is accurate to within 1/2, i.e. one bit.
After one iteration, the bisection answer would be accurate to two bits. After the following iteration, the answer is good to three bits. The sequence of bit accuracies might be 1, 2, 3, 4, ...
If your initial guess is close enough to the solution, the accuracy of Newton’s method would be 1, 2, 4, 8, 16, ... bits.
The accuracy for Brent’s method might be something like 1, 2, 3, 5, 8, 13, 22, ... bits. (You might notice that these are Fibonacci numbers. This is not a coincidence. The ratio of consecutive Fibonacci numbers is approximately the golden ratio and the approximation improves as the numbers get larger.)
For many applications, Brent’s method is ideal. It is as robust as the bisection method, but has speed comparable to Newton’s method. Another advantage of Brent’s method over Newton’s method is that the former does not require a derivative function as an argument but the latter does. (Brent’s method requires that a derivative exists in order to guarantee fast convergence, but the algorithm does not require you to write a function to evaluate the derivative.)
Using the Code
RootFinding has all
static methods; there is no need to instantiate an object before using its methods. The class has six methods, two overloaded versions each of
Newton. Each algorithm name corresponds to two overloaded methods, one simplified and one verbose. The verbose methods have two additional parameters:
iterationsUsed, the number of algorithm steps used to produce the solution, and
errorEstimate, an estimate of the error remaining in the solution.
The mathematical theory of root-finding always solves problems of the form f(x) = 0. If you wanted to solve an equation such as x + sin(x) = 8, you would first define a new function g(x) = x + sin(x) - 8 and then solve g(x) = 0. This standardization is fine in theory, but in practice it is a bit awkward since one often wants to solve equations of the form f(x) = c for some constant c. For convenience, the methods in
RootFinding allow the user to specify a value
target which represents this constant c and defaults to 0.
The code for Brent’s method is taken from the book Algorithms for Minimization Without Derivatives by the creator of the algorithm, Richard P. Brent. The book gives the source code for the algorithm in Algol 60. This code was translated into C# while preserving the original variable names. There is even a
goto statement. I chose to maintain correspondence to the original code over modernizing the style of the code.
Here is an example of using the simplified interface to Brent’s algorithm.
double root = RootFinding.Brent( new FunctionOfOneVariable(f), -2, 1.5);
And here is an example of using the more verbose interface to the same algorithm.
double root = RootFinding.Brent(
new FunctionOfOneVariable(f), 1.0, 5.0, 1e-10, 0.2, out iterationsUsed, out errorEstimate );
Points of Interest
Brent’s method and the bisection method both require starting values a and b such that the function f(x) has opposite signs at these two points. How do you find these points to start with? You often know good starting points from the context of your problem. If you know that such points must exist but you don't know specific values, you can search for starting values. For example, you could start with a = b and then make a smaller and b larger until f(a) and f(b) have opposite signs.
- 6 May 2010: Initial release