Introduction
Kalman filters let you filter out noise and combine different measurements to compute an answer. They are a particularly powerful type of filter, and mathematically elegant. Without a matrix math package they are typically hard to compute, so the source code includes examples of simple filters and a general case built on a simple matrix package. Developed by Rudolf Kalman and others as an ideal way to estimate something by measuring something, the filter's deliberately loose formulation (estimate something by measuring something) is exactly where its power lies: there doesn't need to be a direct relationship between what is measured and what is estimated.
Background
Example 1
Suppose you need to measure a car's position. Your position measurement is accurate to +/- 10 meters at any one time, but your velocity measurement is accurate to 0.1 meters per second. If, further, the velocity noise is due to a different physical process than the position noise, the velocity and position measurements can be combined in a Kalman filter to compute the position. The result will be particularly accurate because the error biasing any one estimate due to position errors will be mitigated by the velocity, and vice versa.
Example 2
Suppose you have GPS ranges to satellites (called pseudoranges). The errors of any one set of measurements will be largely uncorrelated: when one is high, another may well be low. The measurements have separate clock-based errors and pass through different areas of the atmosphere. Computing position from the pseudoranges can be done with matrix least squares, which can be rearranged as an iterative non-linear solution that improves an estimate of the location until the error approaches zero. Good GPS systems usually take that formulation a bit further and rewrite it as a Kalman filter. A Kalman filter for navigation can also fold in Doppler (a different kind of noise), accumulated carrier, fractional carrier, accelerometers, etc.
Demystifying Kalman Filters
There are typically two sets of equations: one to update the "state" with a new measurement, and another to use the latest computed state to predict what the system is or will be in the future. For example, if you get measurements 10 times a second but need to make rocket course corrections 100 times a second, you update the filter at 10 Hz, and in between updates you predict the state forward to make the course corrections, as sketched below.
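As a rough sketch of that pattern (the rates, the Filter type, and the ReadSensor/ApplyCorrection calls here are all illustrative, not from the article's source code):

double ReadSensor();            // hypothetical: returns the latest measurement
void   ApplyCorrection(double); // hypothetical: issues a course correction

struct Filter                   // stand-in for the filter developed below
{
    double x = 0, v = 0;        // position and velocity state
    void   Update(double m, double dt);               // Kalman update, 10 Hz
    double Predict(double dt) { return x + dt * v; }  // cheap roll-forward
};

void ControlLoop(Filter &f)
{
    for (int tick = 0; ; ++tick)
    {
        if (tick % 10 == 0)                 // every 10th tick: 10 Hz update
            f.Update(ReadSensor(), 0.1);
        // 100 Hz: predict forward from the last update, no new measurement.
        ApplyCorrection(f.Predict((tick % 10) * 0.01));
    }
}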
The general idea is to blend each measurement into the current estimate using a gain based on how certain the last estimate was. If the estimates are stable, the gain is small and only a small fraction of the measurement is filtered in.
Predict:
X = F*X + H*U Rolls state (X) forward to new time.
P = F*P*F^T + Q Rolls the uncertainty forward in time.
Update:
Y = M - H*X         Called the innovation: measurement minus the state transformed by H.
S = H*P*H^T + R     Residual covariance: the covariance transformed by H, plus R.
K = P*H^T*S^-1      Kalman gain = variance / residual covariance.
X = X + K*Y         Update the state with the gained innovation.
P = (I - K*H)*P     Update the covariance to this time.
Where:
X = State.
F = Rolls X forward, typically by some time delta.
U = Adds in values per unit time dt.
P = Covariance: how each state element varies relative to the others.
Y = Residual (the delta between the measurement and the last state).
M = Measurement.
S = Residual covariance.
R = Minimal innovation covariance; keeps the filter from locking on to a solution.
K = Kalman gain.
Q = Minimal update covariance of P; keeps P from getting too small.
H = Rolls the actual state into the predicted measurement.
I = Identity matrix.
The key to a good Kalman filter design, in my experience, is noting how each measurement can be transformed into the state, and how the noises of the state and the measurements match up in terms of how they are (or are not) related. In the equations above, each letter is not a single number but an object (a matrix).
Keeping the order (matrix size) small, or at least sparse (lots of zeros in the matrix), is important. The key thing in the equations above is that S^-1 is the inverse of S (the equivalent of multiplying by 1/S). The inverse of S isn't always something you can just compute, so usually a pseudo-inverse is used. This can be hard to compute if there are values near zero (round-off). In short: either do a lot of reading and work and learn about the different approaches, or keep the matrix small and/or use a math package.
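To make the shapes concrete, here is a minimal sketch of the equations above with every matrix collapsed to 1x1 (a single state, F = 1, H = 1), so S^-1 becomes ordinary division and no math package is needed. The names and tuning values are illustrative:

// Minimal 1-state Kalman filter for a slowly varying value.
struct ScalarKalman
{
    double x = 0;      // state
    double p = 1;      // covariance (how uncertain the state is)
    double q = 1e-4;   // Q: minimal update covariance, keeps p from collapsing
    double r = 0.25;   // R: measurement noise covariance

    double Update(double m)
    {
        p += q;              // predict: P = F*P*F^T + Q with F = 1
        double y = m - x;    // innovation: Y = M - H*X with H = 1
        double s = p + r;    // residual covariance: S = H*P*H^T + R
        double k = p / s;    // Kalman gain: K = P*H^T*S^-1, just a division
        x += k * y;          // X = X + K*Y
        p *= (1 - k);        // P = (I - K*H)*P
        return x;
    }
};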
It is usually best, from an efficiency and ease-of-use standpoint, to break the problem into small matrix-based processes and chain them together, as sketched below. Each process in the chain will likely be easier to understand, describe, and test. Each will also have its own special cases for handling reset, coasting due to lack of data, bounds checking due to boundary conditions, etc.
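A sketch of that chaining idea, reusing the scalar filter above (the stage roles are illustrative): each stage stays small and testable, and each could carry its own reset or coast logic.

// Two small, independent stages chained together.
double FilterChain(double raw, ScalarKalman &deNoise, ScalarKalman &trend)
{
    double smoothed = deNoise.Update(raw);  // stage 1: knock down sensor noise
    return trend.Update(smoothed);          // stage 2: track the slower trend
}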
Using the code
Suppose you get a position measurement when you can (the time between measurements varies), and you know the position is noisy, but the average velocity of what you're measuring should change slowly and smoothly. This calls for a second-order, time-varying Kalman filter.
X = a vector: X[0] = position, X[1] = velocity.
P = a 2x2 matrix (4 numbers).
Q = minimal covariance (2x2).
R = a single value.
H = [1, 0]; we measure only position, so only the first state element maps to the measurement.
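The walkthrough below manipulates member arrays directly; a plausible set of declarations for them (the class name is hypothetical, and the article's actual source may differ) is:

class PositionVelocityFilter   // hypothetical name for the filter below
{
    double m_x[2];   // state: m_x[0] = position, m_x[1] = velocity
    double m_p[4];   // 2x2 covariance P, row-major: p00, p01, p10, p11
    double m_q[4];   // 2x2 minimal covariance Q, row-major
    double m_r;      // scalar measurement noise R
    // ... methods from the walkthrough below ...
};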
To estimate forward from this time by a delta (dt):
X = F*X + H*U
P = F*P*F^T + Q (computed below during the update).
F = 2x2 matrix: [1, dt], [0, 1].
U = 0. If we knew the position always increased at some baseline velocity, U would be a function of dt. Here, H*U = 0.
X[0] = F[0,0]*X[0] + F[0,1]*X[1] = 1*X[0] + dt*X[1] = X[0] + dt*X[1]
X[1] = F[1,0]*X[0] + F[1,1]*X[1] = 0*X[0] + 1*X[1] = X[1]
// Predict the position dt seconds ahead without changing the filter state.
double EstimatePrediction(double dt)
{
    return m_x[0] + (dt * m_x[1]);
}
To update:
Y = M - H*X         The innovation: measurement minus the state transformed by H.
S = H*P*H^T + R     Residual covariance: the covariance transformed by H, plus R.
K = P*H^T*S^-1      Kalman gain = variance / residual covariance.
X = X + K*Y         Update the state with the gained innovation.
P = (I - K*H)*P     Update the covariance to this time.
Predict:
X = F*X + H*U       Rolls the state (X) forward to the new time.
P = F*P*F^T + Q     Rolls the uncertainty forward in time.
The X = F*X + H*U step converts into code using the expansion above:
oldX = m_x[0];
m_x[0] += dt * m_x[1];  // From above; oldX is saved for the velocity residual below.
Converting P requires some math:

P = [1, dt] [p00, p01] [1,  0] + [q00, q01]
    [0,  1] [p10, p11] [dt, 1]   [q10, q11]

P = [p00 + dt*p10, p01 + dt*p11] [1,  0] + [q00, q01]
    [p10,          p11         ] [dt, 1]   [q10, q11]

P = [p00 + dt*p10 + dt*p01 + dt*dt*p11, p01 + dt*p11] + [q00, q01]
    [p10 + dt*p11,                      p11         ]   [q10, q11]
In code:
m_p[0] = m_p[0] + dt*(m_p[2] + m_p[1]) + dt*dt*m_p[3] + m_q[0];
m_p[1] = m_p[1] + dt*m_p[3] + m_q[1];
m_p[2] = m_p[2] + dt*m_p[3] + m_q[2];
m_p[3] = m_p[3] + m_q[3];
Y = M - H*X
y0 = m - m_x[0];
y1 = ((m - oldX) / dt) - m_x[1];  // implied velocity from the position change
S = H*P*H^T + R
Because H = [1, 0] this is easy, and s is a single value, not a matrix to invert:
s = m_p[0] + m_r;
K = P*H^T*S^-1
k = m_p[0] / s;
X = X + K*Y
m_x[0] += y0 * k;
m_x[1] += y1 * k;
P = (I - K*H)*P
Because H is [1, 0] and K is a scalar in this case:
for (int i = 0; i < 4; i++) m_p[i] = (1 - k) * m_p[i];
The following fills out the API:

double Variance() { return m_p[0]; }
double Value()    { return m_x[0]; }

void Reset(
    double qx,
    double qv,
    double r,
    double pd,
    double ix)
{
    m_q[0] = qx; m_q[3] = qv;  // process noise on the diagonal
    m_q[1] = m_q[2] = 0;
    m_r = r;
    m_p[0] = m_p[3] = pd;
    m_p[1] = m_p[2] = 0;
    m_x[0] = ix;
    m_x[1] = 0;
}
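Putting the pieces together, the whole measurement step might look like this: a sketch assembled from the fragments above (the method name Update is mine; the article's source may arrange it differently):

// Sketch: one combined predict + update step for the filter above.
void Update(double m, double dt)
{
    double oldX = m_x[0];

    // Predict: X = F*X (H*U = 0), P = F*P*F^T + Q.
    m_x[0] += dt * m_x[1];
    m_p[0] += dt * (m_p[2] + m_p[1]) + dt * dt * m_p[3] + m_q[0];
    m_p[1] += dt * m_p[3] + m_q[1];
    m_p[2] += dt * m_p[3] + m_q[2];
    m_p[3] += m_q[3];

    // Update: innovation, residual covariance, gain, state, covariance.
    double y0 = m - m_x[0];
    double y1 = ((m - oldX) / dt) - m_x[1];  // implied velocity residual
    double s  = m_p[0] + m_r;                // S is scalar because H = [1, 0]
    double k  = m_p[0] / s;
    m_x[0] += y0 * k;
    m_x[1] += y1 * k;
    for (int i = 0; i < 4; i++) m_p[i] = (1 - k) * m_p[i];
}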
Another example
Suppose you had two measurements of the same thing: say, position measured by GPS and velocity measured by an accelerometer. The measurement techniques do not vary the same way, because the sources of noise are unrelated (little noise cross-correlation), and the amount of noise is typical of a measurement system: it is Gaussian. The position noise is large, say 15 meters, but the velocity noise is low, say 0.01 m/s.
This is a good example of how a Kalman filter can use low-noise velocity information to fix up noisy position information. The filter learns the velocity over time, and essentially comes to trust the velocity information more than the position measurement when updating the position estimate.
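One common way to structure this (a sketch, not the article's code) is to fold each measurement in as its own sequential update, since the noises are uncorrelated; here H = [1, 0] for the position measurement and H = [0, 1] for the velocity measurement, with the R values taken from the noise figures above:

// Sketch: fold two uncorrelated measurements in as sequential updates.
// State: x[0] = position, x[1] = velocity; p is the 2x2 covariance, row-major.
void UpdatePosVel(double x[2], double p[4], double mPos, double mVel)
{
    const double rPos = 15.0 * 15.0;    // GPS position variance, (15 m)^2
    const double rVel = 0.01 * 0.01;    // velocity variance, (0.01 m/s)^2

    // Position measurement: H = [1, 0].
    double s  = p[0] + rPos;            // S = H*P*H^T + R
    double k0 = p[0] / s, k1 = p[2] / s;  // K = P*H^T*S^-1, a 2-vector
    double y  = mPos - x[0];
    x[0] += k0 * y;  x[1] += k1 * y;
    double a = p[0], b = p[1];          // P = (I - K*H)*P
    p[0] -= k0 * a;  p[1] -= k0 * b;
    p[2] -= k1 * a;  p[3] -= k1 * b;

    // Velocity measurement: H = [0, 1].
    s  = p[3] + rVel;
    k0 = p[1] / s;  k1 = p[3] / s;
    y  = mVel - x[1];
    x[0] += k0 * y;  x[1] += k1 * y;
    a = p[2];  b = p[3];
    p[0] -= k0 * a;  p[1] -= k0 * b;
    p[2] -= k1 * a;  p[3] -= k1 * b;
}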
If we introduced a jump in position, the position would jump and the filter would re-adapt. If the change evolved with the signal, as most tracking-signal problems do, you get scenarios such as the one graphed above: the velocity went from 0.4 to -0.4 in the middle of the test, and the abrupt change in velocity was smoothed based on the noise value used. If we reduced the Qv value from 0.01 to, say, 0.001, the filter would rely on the velocity too much.
Example of tracking a parabola with a 2nd order Kalman filter:
Note: the top right is the path; the top left is the error, showing the measurement error and the Kalman filter's error from the ideal. The Gain plot is the magnitude of the determinant of the Kalman gain matrix (which isn't a single value).
Conclusion
This is meant to be just an introduction, a jumping-off point for someone new to Kalman filtering.
On the job, I've developed Kalman filters for image object tracking, missile guidance, uplink telemetry timing, and GPS tracking and navigation. Some were small (2-4 states), some large (13), some very large (23). Some took days, others weeks. The most effective and reliable were always the small ones.
It usually isn't easy, especially if you have three or more states. By far the most common mistake is making a filter too big and never quite finding out what's wrong. Figuring out the cross-interactions is a science, and it's often good to make a filter deliberately a little more or less aggressive in how it smooths or how quickly it adapts; but doing that for 20 seemingly unrelated things means tuning 20*20 often non-linear interactions, by hand.
Kalman filters always work best when they are kept as basic as possible.