## Introduction

The most obvious way of evaluating mathematical functions might not work. For example, how would you calculate `log(x+1)/x`

? The most obvious approach would be to add `1`

to `x`

, take the log, then divide by `x`

. For some values of `x`

, that will produce an accurate result. For other values, it will be off by 100%.

This article gives three examples where the most direct evaluation of functions can fail. Along the way, it gives four general rules for accurately evaluating functions.

The code included with the article applies the principles in the article to evaluating nine functions that often come up in statistics.

## Example 1: Quadratic Equation

Suppose you want to find the roots of `3x<sup>2</sup> + 10<sup>9</sup>x + 5 = 0`

. This will be simple: just apply the quadratic equation. What could go wrong? Let's see.

double a = 3.0, b = 1.0e9, c = 5.0;
double d = b*b - 4.0*a*c;
double r1 = (-b - sqrt(d))/(2.0*a);
double r2 = (-b + sqrt(d))/(2.0*a);

This code correctly computes `r1`

to all displayed digits but calculates `r2`

as `0`

even though the correct value to machine precision is `<nobr>-4.5e-8</nobr>`

. The absolute error in computing `r2`

is small but the relative error is 100%. Why is one root computed accurately and the other not?

The cardinal rule in numerical analysis is to avoid subtracting nearly equal numbers. The more nearly equal two numbers are, the more precision is lost in the subtraction. Since `b`

is so much larger than `a `

and `c`

, the variable `sqrt(d)`

is very nearly the same as `b`

. The variables `b`

and `sqrt(d)`

agree to 12 decimal places, so about 12 decimal places are lost in the subtraction. How do we compute `r2 `

more accurately?

**Rule 1**: Use algebraic trickery to avoid loss of precision.

There's an equivalent but much less familiar form of the quadratic equation. Multiply the numerator and denominator of the standard form by the numerator with the ± sign turned upside down and simplify. This new form adds `b`

and `sqrt(d)`

in the denominator, avoiding the problem of subtracting nearly equal numbers. The revised code below is completely accurate:

double a = 3.0, b = 1.0e9, c = 5.0;
double d = b*b - 4.0*a*c;
double r1 = (-b - sqrt(d))/(2.0*a);
double r2 = -2.0*c/(b + sqrt(d));

There's still some inaccuracy since `d`

equals `b*b`

in machine precision, but we've improved our error rate from 100% down to 10%. For more accuracy, abandon the quadratic equation and use a root finding algorithm.

Along these same lines, see Comparing three methods of computing standard deviation for numerical examples showing how three algebraically equivalent formulas can have very different behavior. A similar phenomena happens when fitting a line to data.

## Example 2: Ratios of Factorials

Probability calculations often involve taking the ratio of very big numbers to produce a moderate-sized number. The final result may fit inside a `double`

with room to spare, but the intermediate results would overflow. For example, suppose you need to calculate the number of ways to select 10 objects from a set of 200. This is 200!/(190! 10!), about 2.2 e16. But 200! and 190! would overflow the range of a `double`

.

There are two ways around this problem. Both use the following rule.

**Rule 2**: Use algebraic trickery to avoid overflow.

The first trick is to recognize that 200! = 200*199*198*...*191*190! and so 200!/(190! 10!) = 200*199*198*...*191/10!. This certainly works, but it's limited to factorials. A more general technique is to use logarithms to avoid overflow: take the logarithm of the expression you want to evaluate and then exponentiate the result. In this example, log( 200!/(190! 10!) ) = log(200!) - log(190!) - log(10!). If you have code that calculates the logarithm of factorials directly without calculating factorials first, you could use it to find the logarithm of the result you want, then apply the `exp`

function. For more on this example, see Five tips for floating point computing.

## Example 3: log(1 + x)

Now let's look at the example of computing `log(x+1)`

mentioned in the introduction. Consider the following code:

double x = 1e-16;
double y = log(1 + x)/x;

This code sets `y`

to 0, even though the correct value equals 1 to machine precision. What went wrong?

Double precision numbers are accurate to about 15 decimal places, so `1 + x`

equals `1`

to machine precision. The log of `1`

is zero, so y is set to zero. But for small values of `x`

, `log(1 + x)`

is approximately `x`

, and so `log(1+x)/x`

is approximately `1`

. That means the code above for computing `log(1 + x)/x`

returns a result with 100% relative error. If `x`

is not so small that `1 + x `

equals `1`

in the machine, we can still have problems. If `x`

is moderately small, the bits in `x`

are not totally lost in computing `1 + x`

, but some of them are. The closer `x`

is to `0`

, the more bits are lost.

This problem is similar to the problem with the quadratic equation, so we might try a similar solution. Unfortunately, algebraic trickery won't help. So we try our second rule.

**Rule 3**: Use analytical approximations to avoid loss of precision.

The numerical analysts favorite form of approximation is power series. The power series for `log(1 + x) is x + x<sup>2</sup>/2! + x<sup>3</sup>/3! + `

... For small values of `x`

, simply returning `x`

for `log(1 + x) `

is an improvement. This would work well for the smallest values of `x`

, but for some not-so-small values, this will not be accurate enough, but neither will directly computing `log(1 + x)`

. The sample code included with this article uses a rational approximation for small values of `x`

and direct calculation for larger values.

## Example 4: Inverse Logit

Next let's look at computing `f(x) = e<sup>x</sup>/(1 + e<sup>x</sup>)`

. (Statisticians call this the "inverse logit" function because it is the inverse of the function they call the "logit" function.) The most direct approach would be to compute `exp(x)/(1 + exp(x))`

. Let's see where that can break down.

double x = 1000;
double t = exp(x);
double y = t/(1.0 + t);

Printing out `y`

gives `-1.#IND`

. This is because the calculation for `t`

overflowed, producing a number larger than could fit in a `double`

. But we can figure out the value of `y`

easily. If `t`

is so big that we can't store it, then `1 + t`

is essentially the same as `t`

and the ratio is very nearly `1`

. This suggests we find out which values of `x`

are so big that `f(x)`

will equal `1`

to machine precision, then just return `1`

for those values of `x`

to avoid the possibility of overflow. This is our third rule.

**Rule 4**: Don't calculate a result you can accurately predict.

The header file *float.h*

has a constant `DBL_EPSILON`

which is the smallest `double`

precision that we can add to `1`

without getting `1`

back. A little algebra shows that if `x`

is bigger than `-log(DBL_EPSILON)`

, then `f(x)`

will equal `1`

to machine precision. So here's a code snippet for calculating `f(x)`

for large values of `x`

:

const double x_max = -log(DBL_EPSILON);
if (x > x_max) return 1.0;

## Using the Code

The code provided with this article calculates seven functions that come up in statistics. Each avoids problems of overflow, underflow, or loss of precision that could occur for large negative arguments, large positive arguments, or arguments near zero.

`LogOnePlusX`

computes `log(1 + x)`

as in Example 3 `ExpMinusOne`

computes `e<sup>x</sup>-1`

`Logit `

computes `log(x/(1-x)) `

`LogitInverse`

computes `e<sup>x</sup>/(1 + e<sup>x</sup>)`

as discussed in Example 4 `LogLogitInverse`

computes `log(e<sup>x</sup>/(1 + e<sup>x</sup>)) `

`LogitInverseDifference`

computes `LogitInverse(x) - LogitInverse(y) `

`LogOnePlusExpX`

computes `log(1 + exp(x)) `

`ComplementaryLogLog`

computes `log( -log(1 - x) )`

`ComplementaryLogLogInverse`

computes `1.0 - exp(-exp(x))`

## Points of Interest

The solutions presented here look unnecessary or even wrong at first. If this kind of code isn't well commented, someone will come through and "simplify" it incorrectly. They'll be proud of all the unnecessary clutter they removed. And if they do not test extreme values, their new code will appear to work correctly. The wrong answers and NaNs will only appear later.

## History

- 16
^{th} April, 2008: Initial post - 18
^{th} April, 2008: Revised article to clarify a couple of points - 9
^{th} October, 2008: Added links to further examples - 29
^{th} October, 2008: Added examples