Extended Kalman Filters are even more interesting because they let you do sensor fusion and such
My simple head space (as I was taught and re-learned through experience, and have passed on):
1. A Kalman Gain close to 1 or 0 is a warning sign that careful consideration is needed.
This fact can be brought up immediately in example #5 and continued.
2a. K close to 1.0 can be bad because..., however for some applications (dynamic models) it can be acceptable since...
2b. K close to 0.0 can be bad because..., however for some applications (dynamic models) it can be acceptable since...
3. To address the problem from step 2, as a first step, for those applications where K close to zero or one is bad, a fudge factor term (called Q for reasons discussed later) can be added to the Kalman Gain computation.
3a. Choosing the correct fudge factor for the application is often very difficult and may require many simulation runs (a parameter study) with different measurement sequences (including some expected off-nominals) and various values for the process noise.
Remember, we are designing a filter, likely for a new application (or a non-trivial extension of an existing one), so all the elements of an engineering design are needed: make solution hypotheses, test them, refine them, test them again with greater realism and eventually real-world data, and continue to refine the solution.
4. For the easy case of a simple application with only a few unknown states, the process noise can be guesstimated from experience. For more complex applications (perhaps with dozens of unknown states to estimate), a more rigorous approach to selecting the correct mathematical description of the process noise is needed.
-- End of Fudge Factor discussion --
5. Here you can introduce the notion that the state dynamics cannot model everything, and that the unmodeled part can be approximated by process noise. For example, an unmodeled constant acceleration gives a \(\Delta t^4\) term in the process noise.
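A sketch of where that \(\Delta t^4\) comes from (my own derivation, consistent with the \(\boldsymbol{Q}\) matrix used later in the tutorial): an unmodeled constant acceleration \(a\) perturbs the constant-velocity state \([r, v]^T\) over one step, and taking the expectation of the outer product gives

```latex
% Effect of an unmodeled constant acceleration a over one step \Delta t
% on the state [r, v]^T:
\boldsymbol{w}=\begin{bmatrix}\tfrac{\Delta t^{2}}{2}\\ \Delta t\end{bmatrix}a,
\qquad
\boldsymbol{Q}=E\!\left[\boldsymbol{w}\boldsymbol{w}^{T}\right]
=\begin{bmatrix}\tfrac{\Delta t^{4}}{4}&\tfrac{\Delta t^{3}}{2}\\[0.3em]
\tfrac{\Delta t^{3}}{2}&\Delta t^{2}\end{bmatrix}\sigma_{a}^{2}
```

so the position variance picks up the \(\Delta t^4\) factor.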
Here are some sentences I think are wrong or misleading
"As you can see, the Kalman Gain gradually decreases; therefore, the KF converges." However, the Kalman Filter may converge to garbage. This garbage could be a "lag", or just plain wrong.
"The process noise produces estimation errors." A well-chosen process noise is important to reduce estimation errors over an ensemble of conditions, by accommodating a range of unmodeled state dynamics. A poorly chosen process noise may not improve anything.
Open the Sendspin live demo in your browser: https://www.sendspin-audio.com/#live-demo
Some more info on Kalman implementation here https://github.com/Sendspin/time-filter/blob/main/docs%2Fthe...
I recently updated the homepage of my Kalman Filter tutorial with a new example based on a simple radar tracking problem. The goal was to make the Kalman Filter understandable to anyone with basic knowledge of statistics and linear algebra, without requiring advanced mathematics.
The example starts with a radar measuring the distance to a moving object and gradually builds intuition around noisy measurements, prediction using a motion model, and how the Kalman Filter combines both. I also tried to keep the math minimal while still showing where the equations come from.
I would really appreciate feedback on clarity. Which parts are intuitive? Which parts are confusing? Is the math level appropriate?
If you have used Kalman Filters in practice, I would also be interested to hear whether this explanation aligns with your intuition.
1. understand weighted least squares and how you can update an initial estimate (prior mean and variance) with a new measurement and its uncertainty (i.e. inverse variance weighted least squares)
2. this works because the true mean hasn't changed between measurements. What if it did?
3. KF uses a model of how the mean changes to predict what it should be now based on the past, including an inflation factor on the uncertainty since predictions aren't perfect
4. after the prediction, it becomes the same problem as (1) except you use the predicted values as the initial estimate
There are some details about the measurement matrix (when your measurement is a linear combination of the true value -- the state) and the Kalman gain, but these all come from the least squares formulation.
Least squares is the key and you can prove it's optimal under certain assumptions (e.g. Bayesian MMSE).
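The four steps above can be sketched in a few lines of Python. This is a hypothetical scalar illustration of the least-squares view (all names and numbers are made up for the example):

```python
def update(prior_mean, prior_var, z, meas_var):
    """Step 1: inverse-variance weighted update with a new measurement z."""
    K = prior_var / (prior_var + meas_var)    # weight on the new measurement
    mean = prior_mean + K * (z - prior_mean)  # blend prior and measurement
    var = (1.0 - K) * prior_var               # combined uncertainty shrinks
    return mean, var

def predict(mean, var, velocity, dt, process_var):
    """Step 3: propagate the mean with a model and inflate the uncertainty."""
    return mean + velocity * dt, var + process_var

# Steps 2 and 4: the true mean may have moved between measurements,
# so predict first, then treat the prediction as the prior for the next update.
mean, var = update(10.0, 4.0, 15.0, 1.0)    # blend prior 10 (var 4) with z = 15 (var 1)
mean, var = predict(mean, var, velocity=1.0, dt=1.0, process_var=0.5)
mean, var = update(mean, var, 14.0, 1.0)
```

The gain `K` here is exactly the scalar Kalman gain; the matrix version generalizes the same weighting.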
1. Model of system
2. Internal state
3. How is optimal estimation defined
4. Covariance (statistics)
The Kalman filter gives the optimal estimate of the internal state and covariance of a system based on the measurements so far.
The Kalman process/filter is the mathematical solution to this problem as the system evolves based on inputs and observable measurements. It turns out that an internal state that includes both the estimated value and the covariance is all that is needed to fully capture the internal state of such a model.
It is important to understand that choosing a different model for what is optimal, for the uncertainty, or for the system, compared to what Rudolf Kalman presented, simply gives a different mathematical solution to this problem. Examples of different optimal solutions for different estimation models are the nonlinear Kalman filters and the Wiener filter.
---
I think the book on this topic by Alex Becker is great and possibly the best introduction to the subject. It has a lot of examples and builds the required intuition really well. All I was missing was a little more emphasis on mathematical rigor and a chapter about the LQG regulator, but you can find both of these in the original paper by Rudolf Kalman.
Yet virtually all tutorials stick to single-input examples, which is really an edge case. This site is no exception.
See for example: https://rlabbe.github.io/Kalman-and-Bayesian-Filters-in-Pyth...
Is there something in this particular resource that makes it worth buying?
Before I got into control theory, I'd read a lot of HN posts about Kalman filters being the "sensor fusion" algorithm, which is the wrong mental model. You can do sensor fusion with state estimation, but you can't do state estimation with sensor fusion.
The book goes further into topics like tuning, practical design considerations, common pitfalls, and additional examples. But there are definitely many good free resources out there, including the one you linked.
if you dont want to buy the book, most of the linear kalman filter stuff is available for free: https://kalmanfilter.net/kalman-filter-tutorial.html
The one important point that I think warrants a small paragraph near the end is that the example you gave is a way of doing forecasting (estimating the future state) and nowcasting (estimating the current state), but Kalman filters can also be used retrospectively to do retrocasting (using the present data to get a better estimate of the past).
Nowcasting and retrocasting are concepts that a lot of people have trouble with. That trouble is the crux of the Kalman filter ... combining (noisy) measurements with (noisy) dead reckoning gives us (better) knowledge. For complete symmetry, it is important to point out that we can't just use old measurements to describe the past any more than we should only use current and past measurements to define our estimate of the present.
I've been in the process of writing a tutorial on how PID controllers work for a much younger audience. As a result, I've been looking back at the original tutorials that made the subject click for me. I had several engineers try to explain PID control to me over the course of about a year, but I don't think I really got it until I ended up watching Terry Davis (yeah, the TempleOS guy) show off how to use PID control in SimStructure, using a hovering rocket as an example.
The way he built the concept up was to take each component and build on the control system until he had something that worked. He started off with a simple proportional controller that ended up having a steady-state error, with the rocket hovering beneath the target height. Once he had that and had pointed out the steady-state error, he implemented the integral term and showed how it resulted in overshoot. Once that was working, he implemented the derivative term to back the overshoot off until he had something that settled pretty quickly.
I'm not sure how you could do something similar for a Kalman Filter, but I did find it genuinely constructive to see the thought process behind adding each component of the equation.
Your early explanation of the filter (as a method for estimating the state of a system under uncertainty) was great but (unless I missed it) when you introduced the equations I wasn't clear that was the filter. I hope that makes sense.
Basically, a Kalman filter is part of a larger class of "estimators", which take the input data, and run additional processing on top of it to figure out the true measurement.
A very basic estimator, the low-pass filter, is also an "estimator": it rejects high-frequency noise and gives you essentially a moving average. But it is a static filter: it assumes that your process noise sits above a certain frequency, and that anything below that is an actual change in the measured variable.
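As an illustration of that static behavior (my own sketch, not from the comment above), a minimal exponential moving average is such a low-pass estimator; the smoothing factor `alpha` is a fixed, baked-in assumption about which frequencies are noise:

```python
def ema(samples, alpha=0.5):
    """Exponential moving average: a static low-pass estimator.

    alpha close to 1 trusts new samples (little smoothing);
    alpha close to 0 trusts the running estimate (heavy smoothing).
    """
    estimate = samples[0]          # initialize with the first sample
    out = [estimate]
    for x in samples[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
        out.append(estimate)
    return out

# A noisy roughly-constant signal gets pulled toward its underlying level:
print(ema([10.0, 12.0, 8.0, 11.0, 9.0], alpha=0.5))
```

Unlike the adaptive scheme described next, nothing here retunes `alpha` when the noise characteristics change.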
You can make the estimator better. Say you have some idea of how the process variable should behave. For a very simple case, say you are measuring temperature, and you know that the change in temperature is related to the current being put through a winding. You can capture that relationship in a model of the process, which runs alongside the measurement of the actual temperature. Now you have the noisy temperature reading and the predicted reading (which acts like a mean), and you can compute the covariance of the noise, which you can then use to tune the parameter of the low-pass filter. So if your noise changes in frequency for some reason, the filter will adjust and take care of it.
The Kalman filter is an enhanced version of the above, with the added feature of capturing correlation between process variables and using the measurement to update variables that are not directly measured. For example, if position and velocity are correlated, a refined position measurement from GPS will also correct the velocity estimate, even if you are not measuring velocity (since velocity is computed from an internal model).
The reason it can be kind of confusing is that it operates in linear matrix space, by design, so that it works with other tools that let you do further analysis. With that restriction to linear algebra, you have to assume a Gaussian noise profile and estimate process dependence as a covariance measure.
But the Kalman filter isn't the end-all, be-all of noise rejection. You can do any sort of estimation in nonlinear ways. For example, I designed an automated braking system for an aircraft that tracks a certain brake-force command by commanding a servo to press on a brake pedal. Instead of a Kalman filter, I ran tests on the system and got a 4D map of (position, pressure, servo_velocity) -> new_pressure, which I then inverted to get the required velocity for a target new pressure. So the process estimation was basically commanding the servo to move at a certain speed, getting the pressure, then using position, existing pressure, and pressure error to compute a new velocity, and so on.
The challenge would be to keep it intuitive and accessible without oversimplifying. Still, it could be an interesting direction to explore.
So it is right to say that the implementation of the KF is tightly coupled to the system. Getting that part right is usually the hardest step.
You're right that the term can feel vague without that context. I’ll consider adding a short clarification earlier in the introduction to make this clearer before diving into the math. Thanks for the suggestion.
Small clarification: nonlinear Kalman filters are suboptimal. EKF relies on linear approximations, and UKF uses heuristic approximations.
The derivation of the Q matrix is a separate topic and requires additional assumptions about the motion model and noise characteristics, which would have made the example significantly longer. I cover this topic in detail in the book.
I'll consider adding a brief explanation or reference to make that step clearer. Thanks for pointing this out.
In Kalman filter theory there are two different components:
- The system model
- The Kalman filter (the algorithm)
The state transition and measurement equations belong to the system model. They describe the physics of the system and can vary from one application to another.
The Kalman filter is the algorithm that uses this model to estimate the current state and predict the future state.
I'll consider making that distinction more explicit when introducing the equations. Thanks for pointing this out.
Higher sampling rates can help in some cases, especially when tracking fast dynamics or reducing measurement noise through repeated updates. However, the main strength of the Kalman filter is combining a model with noisy measurements, not necessarily relying on high sampling rates.
In practice, Kalman filters can work well even with relatively low-rate measurements, as long as the model captures the system dynamics reasonably well.
I also agree that it's often something you design into the system rather than applying as a post-processing step.
All of that stuff is used in industry because a lot of regulation (for things like aircraft) basically requires your control laws to be linear so that you can prove stability.
In reality, when you get into nonlinear control, you can do a lot more. I did a research project in college where we had an autonomous underwater glider that could only get GPS lock when it surfaced, and had to rely on a shitty MEMS IMU underwater. I actually proposed doing a neural network for control, but it got shot down because "neural nets are black boxes" lol.
"If you can't explain it simply, you don't understand it well enough."
Albert Einstein
The Kalman Filter is an algorithm for estimating and predicting the state of a system in the presence of uncertainty, such as measurement noise or influences of unknown external factors. The Kalman Filter is an essential tool in areas like object tracking, navigation, robotics, and control. For instance, it can be applied to estimate the trajectory of a computer mouse by reducing noise and compensating for hand jitter, resulting in a more stable motion path.
In addition to engineering, the Kalman Filter finds applications in financial market analysis, such as detecting stock price trends in noisy market data, and in meteorological applications for weather prediction.
Although the Kalman Filter is a simple concept, many educational resources present it through complex mathematical explanations and lack real-world examples or illustrations. This gives the impression that the topic is more complex than it actually is.
This guide presents an alternative approach that uses hands-on numerical examples and simple explanations to make the Kalman Filter easy to understand. It also includes examples with bad design scenarios, where the Kalman Filter fails to track the object correctly, and discusses methods for correcting such issues.
By the end, you will not only understand the underlying concepts and mathematics but also be able to design and implement the Kalman Filter on your own.
This project explains the Kalman Filter at three levels of depth, allowing you to choose the path that best fits your background and learning goals:
Example-driven guide to Kalman Filter
We begin by formulating the problem to understand why we need an algorithm for state estimation and prediction.
To illustrate this, consider the example of a tracking radar:
Suppose we have a radar that tracks an aircraft. In this scenario, the aircraft is the system, and the quantity to be estimated is its position, which represents the system state.
The radar samples the target by steering a narrow pencil beam toward it and provides position measurements of the aircraft. Based on these measurements, we can estimate the system state (the aircraft's position).
To track the aircraft, the radar must revisit the target at regular intervals by pointing the pencil beam in its direction. This means the radar must be able to predict the aircraft's future position for the next beam. If it fails to do so, the beam may be pointed in the wrong direction, resulting in a loss of track. To make this prediction, we need some knowledge about how the aircraft moves. In other words, we need a model that describes the system's behavior over time, known as the dynamic model.
To simplify the example, let us consider a one-dimensional world in which the aircraft moves along a straight line either toward the radar or away from it.

The system state is defined as the range of the airplane from the radar, denoted by \( r \). The radar sends a pulse toward the airplane, which reflects off the target and returns to the radar. By measuring the time elapsed between the transmission and reception of the pulse and knowing that the pulse is an electromagnetic wave traveling at the speed of light, the radar can easily calculate the airplane's range \( r \). In addition to range, the radar can also measure the airplane's velocity \( v \), just like a police radar gun detects a car's speed by using the Doppler effect.
Let us assume that at time \( t_{0} \), the radar measures the aircraft's range and velocity with very high accuracy and precision. The measured range is 10,000 meters, and the velocity is 200 meters per second. This gives us the system state:
\[ r_{t_{0}} = 10,000m \]
\[ v_{t_{0}} = 200m/s \]
The next step is to predict the system state at time \( t_{1}=t_{0}+\Delta t \), where \( \Delta t \) is the target revisit time. Given that the aircraft is expected to maintain a constant velocity, a constant velocity dynamic model can be used to predict its future position.
The distance traveled during the time interval \( \Delta t \) is given by:
\[ \Delta r = v \cdot \Delta t \]
Assuming a sampling interval of 5 seconds, the predicted position at time \( t_{1} \) is:
\[ r_{t_{1}} = r_{t_{0}} + \Delta r = 10,000 + 200 \cdot 5 = 11,000m \]
This is an elementary algorithm built on simple principles. The current system state is derived from the measurement, and the dynamic model is used to predict the future state.

In real life, things are more complex. First, radar measurements are not perfectly precise. They are affected by noise and contain a certain level of randomness. If ten different radars measured the aircraft's range at the same moment, they would produce ten slightly different results. These results would likely be close to each other, but not identical. This variation in measurements is caused by measurement noise.
This leads to a new question: How certain is our estimate? We need an algorithm that not only provides an estimate but also tells us how reliable that estimate is.
Another issue is the accuracy of the dynamic model. While we may assume that the aircraft moves at a constant velocity, external factors such as wind can introduce deviations from this assumption. These unpredictable influences are referred to as process noise.
Just as we want to assess the certainty of our measurement-based estimate, we also want to understand the level of confidence in our prediction.
The Kalman Filter is a state estimation algorithm that provides both an estimate of the current state and a prediction of the future state, along with a measure of their uncertainty. Moreover, it is an optimal algorithm that minimizes state estimation uncertainty. That is why the Kalman Filter has become such a widely used and trusted algorithm.

Let us begin with a simple example: a one-dimensional radar that measures range and velocity by transmitting a pulse toward an aircraft and receiving the reflected echo. The time delay between pulse transmission and echo reception provides information about the aircraft range \(r\), and the frequency shift of the reflected echo provides information about the aircraft velocity \(v\) (Doppler effect).
In this example, the system state is described by both the aircraft range \(r\) and velocity \(v\). We define the system state by the vector \(\boldsymbol{x}\), which includes both quantities:
\[ \boldsymbol{x}=\left[\begin{matrix}r\\v\\\end{matrix}\right] \]
We denote vectors by lowercase bold letters and matrices by uppercase bold letters.
Because the system state includes more than one variable, we use linear algebra tools, such as vectors and matrices, to describe the mathematics of the Kalman Filter. If you are not comfortable with linear algebra, please review the One-Dimensional Kalman Filter section in the online tutorial or in the book. It presents the Kalman Filter equations and their derivation using high-school-level mathematics, along with four fully solved examples.
In this example, we will use the first measurement to initialize the Kalman Filter (for more information on initialization techniques and their impact on the Kalman Filter performance, refer to Chapter 21 of the book). At time \(t_0\), the radar measures a range of \(10,000m\) and a velocity of \(200m/s\). The measurements are denoted by the letter \(\boldsymbol{z}\).
We stack the measurements into the measurement vector \(\boldsymbol{z}\):
\[ \boldsymbol{z}_0=\left[\begin{matrix}10{,}000\\200\\\end{matrix}\right] \]
The subscript \(0\) indicates time \(t_0\).
The measurement does not reflect the exact system state. Measurements are corrupted by random noise; therefore, each measurement is a random variable.
Can we trust this measurement? How certain is it? Each measurement is accompanied by a squared measurement uncertainty value (sometimes called the measurement error). This squared uncertainty is the measurement's variance. You can read more about variance in the Essential Background I section. For a more detailed discussion of measurement uncertainty, see the Kalman Filter in One Dimension section.
In radar systems, measurement uncertainty is largely determined by the ratio of received signal strength to noise. The higher the signal-to-noise ratio, the lower the measurement variance, and the greater our confidence in the measurement.
The following figure compares low-signal and high-signal cases in the presence of noise.

Let us assume that the standard deviation of the range measurement is \( 4m \) and the standard deviation of the velocity measurement is \( 0.5m/s \). Since variance is the square of the standard deviation, the squared measurement uncertainty (denoted by \( \boldsymbol{R} \)) is:
\[ \boldsymbol{R}_0=\left[\begin{matrix}16&0\\0&0.25\\\end{matrix}\right] \]
\( \boldsymbol{R} \) is a covariance matrix. The main diagonal elements contain the variances, and the off-diagonal elements are the covariances between measurements.
\[ \boldsymbol{R}=\left[\begin{matrix}\sigma_r^2&\sigma_{rv}^2\\[0.5em]\sigma_{vr}^2&\sigma_v^2\\\end{matrix}\right] \]
In this example, we assume that errors in the range and velocity measurements are not related to each other, so the off-diagonal elements of the measurement covariance matrix are set to zero.
For a refresher on variance and standard deviation, see the Essential Background I section of the online tutorial.
For a refresher on covariance matrices, see the Essential Background II section.
During initialization, the only information we have is a single measurement. In this example, the measurement and the system state are described by the same quantities (\(r\) and \(v\)). Therefore, we can use the measurement as the initial estimate of the system state. This can be done only during the initialization step:
\[ \boldsymbol{\hat{x}}_{0,0}=\boldsymbol{z}_0=\left[\begin{matrix}10{,}000\\200\\\end{matrix}\right] \]
The subscript \(0,0\) means that the estimate is for time \(t_0\), and that it was also calculated at time \(t_0\).
We now predict the next state. Assume the target revisit time is 5 seconds \((\Delta t=5s)\), therefore \(t_1=5s\).
To estimate the future system state, we must describe how the system evolves over time. In this example, we assume a constant velocity dynamic model (the motion model):
\[ v_{1} = v_{0} = v \] \[ r_{1} = r_{0} + v_{0}\Delta t \]
(For examples of accelerating dynamic models, refer to Chapter 9 of the book.)
Let us describe the dynamic model in a matrix form:
\[ {\hat{\boldsymbol{x}}}_{1,0}=\boldsymbol{F}{\hat{\boldsymbol{x}}}_{0,0} \]
The subscript \(1,0\) means that \( \hat{\boldsymbol{x}}_{1,0} \) is our estimate of the system state at time \(t_1\), computed using information available at time \(t_0\). In other words, it is a prediction of the future state.
The matrix \( \boldsymbol{F} \) is called the state transition matrix and describes how the system state evolves over time:
\[ {\hat{\boldsymbol{x}}}_{1,0}=\left[\begin{matrix}{\hat{r}}_{1,0}\\{\hat{v}}_{1,0}\\\end{matrix}\right]=\left[\begin{matrix}1&\Delta t\\0&1\\\end{matrix}\right]\left[\begin{matrix}{\hat{r}}_{0,0}\\{\hat{v}}_{0,0}\\\end{matrix}\right]=\left[\begin{matrix}1&5\\0&1\\\end{matrix}\right]\left[\begin{matrix}10,000\\200\\\end{matrix}\right]=\left[\begin{matrix}11,000\\200\\\end{matrix}\right] \]
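As a numerical cross-check (a NumPy sketch of my own, not code from the tutorial), the same state extrapolation is:

```python
import numpy as np

dt = 5.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])           # constant-velocity state transition matrix
x_00 = np.array([10_000.0, 200.0])   # estimate at t0: [range (m), velocity (m/s)]

x_10 = F @ x_00                      # predicted state at t1: range 11,000 m, velocity 200 m/s
print(x_10)
```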
Appendix C of the book describes a method for modeling the dynamics of any linear system.
The equation
\[ {\hat{\boldsymbol{x}}}_{n+1,n}=\boldsymbol{F}{\hat{\boldsymbol{x}}}_{n,n} \]
is the state extrapolation (prediction) equation. It tells us how to compute the next state from the current one. It takes our current state estimate and uses the system's motion model to predict the state at the next time step.
The full form of the state extrapolation equation is:
\[ {\hat{\boldsymbol{x}}}_{n+1,n}=\boldsymbol{F}{\hat{\boldsymbol{x}}}_{n,n} + \boldsymbol{G}\boldsymbol{u}_n \]
where \(\boldsymbol{G}\) is the control matrix and \(\boldsymbol{u}_n\) is the input vector. The input vector represents additional information provided to the Kalman Filter, such as readings from an onboard accelerometer.
In this simple example, we assume there is no input, so \(\boldsymbol{u}_n=0\).
For an example that includes an input term, see the State Extrapolation Equation page of the online tutorial or the fully solved Example 10 in the book.
Every measurement and every estimate in the Kalman Filter comes with uncertainty information. After predicting the next state, we should also ask: how precise is this prediction?
The squared uncertainty of the current state estimate is represented by the covariance matrix:
\[ \boldsymbol{P}_{0,0}=\left[\begin{matrix}16&0\\0&0.25\\\end{matrix}\right] \]
However, the prediction covariance is not computed as:
\[ \textcolor{red}{\xcancel{\textcolor{black}{ \boldsymbol{P}_{1,0}=\boldsymbol{F}\boldsymbol{P}_{0,0} }}} \]
This is because \(\boldsymbol{P}\) is a covariance matrix, and variances and covariances involve squared terms.
The covariance extrapolation equation (without the process noise) is given by:
\[ \boldsymbol{P}_{n+1,n}=\boldsymbol{F}\boldsymbol{P}_{n,n}\boldsymbol{F}^T \]
You can find the full derivation in the Covariance Extrapolation Equation section of the online tutorial.
For our example:
$$ \boldsymbol{P}_{1,0}=\boldsymbol{F}\boldsymbol{P}_{0,0}\boldsymbol{F}^T=\left[\begin{matrix}1&5\\0&1\\\end{matrix}\right]\left[\begin{matrix}16&0\\0&0.25\\\end{matrix}\right]\left[\begin{matrix}1&0\\5&1\\\end{matrix}\right]=\left[\begin{matrix}1&5\\0&1\\\end{matrix}\right]\left[\begin{matrix}16&0\\1.25&0.25\\\end{matrix}\right]=\left[\begin{matrix}\colorbox{yellow}{$22.25$}&1.25\\1.25&\colorbox{yellow}{$0.25$}\\\end{matrix}\right] $$
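The same covariance extrapolation can be verified numerically (again a NumPy sketch under the tutorial's numbers):

```python
import numpy as np

dt = 5.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])           # constant-velocity state transition matrix
P_00 = np.array([[16.0, 0.00],
                 [0.00, 0.25]])      # initial covariance: sigma_r^2 = 16, sigma_v^2 = 0.25

P_10 = F @ P_00 @ F.T                # extrapolated covariance (process noise not yet added)
print(P_10)                          # range variance grows from 16 to 22.25
```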
Look at the main diagonal of the covariance matrix.
The velocity variance \(\sigma_v^2\) is still \(0.25 \, m^2/s^2\). It did not change because the dynamic model assumes constant velocity.
In contrast, the range variance \(\sigma_r^2\) increased from \(16m^2\) to \(22.25m^2\). This reflects the fact that uncertainty in velocity leads to increasing uncertainty in range over time.
As noted earlier, the assumption of constant-velocity dynamics is not fully accurate. In reality, the aircraft's velocity can be affected by external and unknown factors, such as wind. As a result, the actual prediction uncertainty is higher than what the simple model predicts.
These unpredictable influences are called process noise and are denoted by \(\boldsymbol{Q}\). To take these effects into account, we add \(\boldsymbol{Q}\) to the prediction covariance equation:
\[ \boldsymbol{P}_{n+1,n}=\boldsymbol{F}\boldsymbol{P}_{n,n}\boldsymbol{F}^T + \boldsymbol{Q}\]
To gain intuition about how process noise affects Kalman Filter performance, see Example 6 in the online tutorial.
Let us assume that the standard deviation of the random acceleration is \(\sigma_a=0.2m/s^2\). This represents uncertainty in random aircraft acceleration caused by unpredictable environmental influences.
Consequently, the random acceleration variance \(\sigma_a^2=0.04m^2/s^4\).
For our example, the process noise matrix is given by:
$$ \boldsymbol{Q} = \left[\begin{matrix} \frac{\Delta t^4}{4} & \frac{\Delta t^3}{2} \\[0.5em] \frac{\Delta t^3}{2} & \Delta t^2 \end{matrix}\right] \sigma_a^2 $$
With \(\Delta t=5\mathrm{s}\) and \(\sigma_a^2=0.04\,\mathrm{m}^2/\mathrm{s}^4\), this becomes:
$$ \boldsymbol{Q}=\left[\begin{matrix}\frac{625}{4}&\frac{125}{2}\\[0.5em] \frac{125}{2}&25\\\end{matrix}\right]0.04=\left[\begin{matrix}6.25&2.5\\2.5&1\\\end{matrix}\right] $$
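Numerically (the same assumed NumPy sketch, not the tutorial's own code), the process noise matrix works out to:

```python
import numpy as np

dt = 5.0
sigma_a2 = 0.2 ** 2                  # variance of the random acceleration, (m/s^2)^2
Q = sigma_a2 * np.array([[dt**4 / 4, dt**3 / 2],
                         [dt**3 / 2, dt**2]])
print(Q)                             # approximately [[6.25, 2.5], [2.5, 1.0]]
```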
The derivation of the process noise matrix is presented in Section 8.2.2 of the book.
After adding the process noise, the squared uncertainty of our prediction is:
$$ \boldsymbol{P}_{1,0}=\boldsymbol{F}\boldsymbol{P}_{0,0}\boldsymbol{F}^T+\boldsymbol{Q}\ =\left[\begin{matrix}22.25&1.25\\1.25&0.25\\\end{matrix}\right]+\left[\begin{matrix}6.25&2.5\\2.5&1\\\end{matrix}\right]\ =\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right] $$
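The covariance extrapolation can be checked the same way (a sketch, assuming \(\boldsymbol{P}_{0,0}=\mathrm{diag}(16,\ 0.25)\), the covariance taken from the first measurement as described in the text):

```python
import numpy as np

F = np.array([[1.0, 5.0],
              [0.0, 1.0]])         # constant-velocity transition, dt = 5 s
P00 = np.array([[16.0, 0.0],
                [0.0, 0.25]])      # initial covariance from the first measurement
Q = np.array([[6.25, 2.5],
              [2.5, 1.0]])         # process-noise matrix computed above

P10 = F @ P00 @ F.T + Q            # → [[28.5, 3.75], [3.75, 1.25]]
```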
Initialization
We initialized the Kalman Filter by using the first measurement as the initial state estimate \( {\hat{\boldsymbol{x}}}_{0,0} \), and the measurement covariance as the initial state covariance \(\boldsymbol{P}_{0,0}\).
Note that this can be done only during the initialization phase.
Prediction
We predicted the state and its uncertainty at the next time step, when the radar revisits the aircraft. The Kalman Filter prediction equations are:
State Extrapolation Equation
\[ {\hat{\boldsymbol{x}}}_{n+1,n}=\boldsymbol{F}{\hat{\boldsymbol{x}}}_{n,n} + \boldsymbol{G}\boldsymbol{u}_n \]
Covariance Extrapolation Equation
\[ \boldsymbol{P}_{n+1,n}=\boldsymbol{F}\boldsymbol{P}_{n,n}\boldsymbol{F}^T + \boldsymbol{Q}\]
where \(\boldsymbol{F}\) is the state transition matrix, \(\boldsymbol{G}\) is the control matrix, \(\boldsymbol{u}_n\) is the input variable, \(\boldsymbol{P}_{n,n}\) is the current estimate covariance, and \(\boldsymbol{Q}\) is the process noise matrix.
Assume the second measurement at \(t_1\):
\[ \boldsymbol{z}_1=\left[\begin{matrix}11{,}020\\202\\\end{matrix}\right] \]
Due to a strong noise spike during this measurement, the signal-to-noise ratio is significantly lower than for the first measurement. As a result, the uncertainty of the second measurement is higher.
Let us assume that the standard deviation of the range measurement is \(6m\) and the standard deviation of the velocity measurement is \(1.5m/s\). The corresponding measurement covariance matrix is:
\[ \boldsymbol{R}_1=\left[\begin{matrix}\colorbox{yellow}{$36$}&0\\0&\colorbox{yellow}{$2.25$}\\\end{matrix}\right] \]
We want to estimate the current system state \(\hat{\boldsymbol{x}}_{1,1}\). At time \(t_1\), we have two pieces of information: the prediction \(\hat{\boldsymbol{x}}_{1,0}\) with covariance \(\boldsymbol{P}_{1,0}\), and the measurement \(\boldsymbol{z}_1\) with covariance \(\boldsymbol{R}_1\).
Which one should we trust?
Intuitively, we might prefer to use the measurement as the current estimate, that is \(\hat{\boldsymbol{x}}_{1,1}=\boldsymbol{z}_1\), because it is more up to date than the prediction.
On the other hand, the measurement is also noisier. If we compare the main diagonal elements of the prediction covariance \(\boldsymbol{P}_{1,0}\) with the measurement covariance \(\boldsymbol{R}_1\), we see that the prediction uncertainty is smaller than the measurement uncertainty:
\[ \boldsymbol{P}_{1,0}=\left[\begin{matrix}\colorbox{yellow}{$28.5$}&3.75\\3.75&\colorbox{yellow}{$1.25$}\\\end{matrix}\right] \]
So perhaps we should ignore the new measurement and keep the prediction, that is \(\hat{\boldsymbol{x}}_{1,1}=\hat{\boldsymbol{x}}_{1,0}\)?
In this case, we lose the new information provided by the current measurement.
The key idea of the Kalman Filter is that we do neither. Instead, we combine the prediction and the measurement, giving more weight to the one with lower uncertainty.
The solution is a weighted average between the measurement and the prediction:
\[ \hat{x}_{1,1}=K_1 z_1 + (1-K_1)\hat{x}_{1,0}, \quad 0\leq K_1 \leq 1 \]
Here, the weight \(K_1\) is the Kalman Gain. It determines how much weight is given to the measurement versus the prediction in a way that minimizes the uncertainty of the estimate. This is what makes the Kalman Filter an optimal filter (as long as the system and noise behave according to the assumptions of the model).
I will introduce the Kalman gain equation shortly, but first let us focus on the State Update Equation. In matrix form, it is written as:
\[ \hat{\boldsymbol{x}}_{1,1}=\boldsymbol{K}_1\boldsymbol{z}_1 + (\boldsymbol{I} - \boldsymbol{K}_1)\hat{\boldsymbol{x}}_{1,0} \]
where \(\boldsymbol{I}\) is the identity matrix (a square matrix with ones on the main diagonal and zeros elsewhere).
Let us rewrite this equation:
\[ \hat{\boldsymbol{x}}_{1,1}=\boldsymbol{K}_1\boldsymbol{z}_1 + \hat{\boldsymbol{x}}_{1,0} - \boldsymbol{K}_1\hat{\boldsymbol{x}}_{1,0}=\hat{\boldsymbol{x}}_{1,0}+\boldsymbol{K}_1(\boldsymbol{z}_1 - \hat{\boldsymbol{x}}_{1,0}) \]
This form shows that the updated state is the prediction \(\hat{\boldsymbol{x}}_{1,0}\) plus a correction term \(\boldsymbol{K}_1\left(\boldsymbol{z}_1 - \hat{\boldsymbol{x}}_{1,0}\right)\).
The correction is proportional to the difference between the measurement and the prediction \(\boldsymbol{z}_1 - \hat{\boldsymbol{x}}_{1,0}\), which is called the innovation or residual.
In our example, both the system state and the measurement are vectors that represent the same physical quantities (range and velocity). Therefore, we can directly subtract \(\hat{\boldsymbol{x}}_{1,0}\) from \(\boldsymbol{z}_1\).
However, this is not always the case. In general, the measurement and the system state may belong to different physical domains. For example, a digital thermometer measures an electrical signal, while the system state is the temperature.
For this reason, the predicted state must first be transformed into the measurement domain:
\[ \boldsymbol{H} \hat{\boldsymbol{x}}_{1,0} \]
The matrix \(\boldsymbol{H}\) is called the observation matrix (or measurement matrix). It maps the state variables to the quantities that are actually measured.
In our example, the observation matrix is simply the identity matrix:
\[ \boldsymbol{H}=\left[\begin{matrix}1&0\\0&1\\\end{matrix}\right]=\boldsymbol{I} \]
For more information about the observation matrix, see the Measurement Equation section of the online tutorial and Examples 9 and 10 in the book.
We can now rewrite the state update equation as:
\[ \hat{\boldsymbol{x}}_{1,1}=\hat{\boldsymbol{x}}_{1,0}+\boldsymbol{K}_1(\boldsymbol{z}_1 - \boldsymbol{H}\hat{\boldsymbol{x}}_{1,0}) \]
The innovation \(\boldsymbol{z}_1 - \boldsymbol{H}\hat{\boldsymbol{x}}_{1,0}\) represents new information.
The Kalman gain determines how much this new information should change the predicted state, that is, how strongly we correct the prediction.
In a one-dimensional case, the Kalman Gain is given by:
\[ K_n=\frac{p_{n,\ n-1}}{p_{n,\ n-1}+r_n} \]
where \(p_{n,n-1}\) is the variance of the prediction and \(r_n\) is the variance of the measurement.
The Kalman gain is chosen to minimize the variance of the updated estimate \(p_{n,n}\), which is why the Kalman Filter is optimal.
To build intuition and see the full derivation in the one-dimensional case, see the Kalman Filter in One Dimension section of the online tutorial.
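The optimality claim is easy to verify numerically. For the variance of a weighted average \(K z + (1-K)\hat{x}\) of independent estimates, namely \((1-K)^2 p + K^2 r\), the minimizer over all weights \(K\) is exactly \(p/(p+r)\). A sketch using the range variances from our example:

```python
import numpy as np

p_pred, r = 28.5, 36.0                 # prediction and measurement variances (range)
K_opt = p_pred / (p_pred + r)          # the 1-D Kalman gain, ≈ 0.442

# Variance of the weighted average for every candidate gain in [0, 1]
Ks = np.linspace(0.0, 1.0, 1001)
variances = (1 - Ks)**2 * p_pred + Ks**2 * r

# The minimum is attained at the Kalman gain
assert abs(Ks[np.argmin(variances)] - K_opt) < 1e-3
```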
For the multivariate Kalman Filter, the Kalman gain becomes a matrix and is given by:
\[ \boldsymbol{K}_n=\boldsymbol{P}_{n,n-1}\boldsymbol{H}^T\left(\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^T+\boldsymbol{R}_n\right)^{-1} \]
For the derivation of the multivariate Kalman Gain Equation, see the Kalman Gain section of the online tutorial.
Let us calculate the Kalman Gain for \(t_1\):
\[ \boldsymbol{K}_1=\boldsymbol{P}_{1,0}\boldsymbol{H}^T\left(\boldsymbol{H}\boldsymbol{P}_{1,0}\boldsymbol{H}^T+\boldsymbol{R}_1\right)^{-1} \]
In our example, \(\boldsymbol{H}=\boldsymbol{I}\) and \(\boldsymbol{H}^T=\boldsymbol{I}\).
Substitute the matrices:
\[ \boldsymbol{P}_{1,0}=\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right], \quad \boldsymbol{R}_1=\left[\begin{matrix}36&0\\0&2.25\\\end{matrix}\right] \]
\[ \boldsymbol{K}_1=\boldsymbol{P}_{1,0}\boldsymbol{H}^T\left(\boldsymbol{H}\boldsymbol{P}_{1,0}\boldsymbol{H}^T+\boldsymbol{R}_1\right)^{-1}=\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]\left[\begin{matrix}1&0\\0&1\\\end{matrix}\right]\left(\left[\begin{matrix}1&0\\0&1\\\end{matrix}\right]\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]\left[\begin{matrix}1&0\\0&1\\\end{matrix}\right]+\left[\begin{matrix}36&0\\0&2.25\\\end{matrix}\right]\right)^{-1} \]
\[ =\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]\left(\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]+\left[\begin{matrix}36&0\\0&2.25\\\end{matrix}\right]\right)^{-1} =\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]\left(\left[\begin{matrix}64.5&3.75\\3.75&3.5\\\end{matrix}\right]\right)^{-1} \]
\[ =\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]\left[\begin{matrix}0.0165&-0.0177\\-0.0177&0.3047\\\end{matrix}\right]=\left[\begin{matrix}0.4048&0.6377\\0.0399&0.3144\\\end{matrix}\right] \]
\[ \boldsymbol{K}_1=\left[\begin{matrix}0.4048&0.6377\\0.0399&0.3144\\\end{matrix}\right] \]
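The same gain computation in NumPy (a sketch; `np.linalg.inv` is fine for a 2×2 matrix, although solving a linear system is usually preferred in production code):

```python
import numpy as np

P10 = np.array([[28.5, 3.75],
                [3.75, 1.25]])          # prediction covariance
R1 = np.array([[36.0, 0.0],
               [0.0, 2.25]])            # measurement covariance
H = np.eye(2)                           # observation matrix (identity here)

S = H @ P10 @ H.T + R1                  # innovation covariance
K1 = P10 @ H.T @ np.linalg.inv(S)       # → ≈ [[0.405, 0.638], [0.040, 0.314]]
```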
The updated state estimate is:
\[ \hat{\boldsymbol{x}}_{1,1}=\hat{\boldsymbol{x}}_{1,0}+\boldsymbol{K}_1(\boldsymbol{z}_1 - \boldsymbol{H}\hat{\boldsymbol{x}}_{1,0}) \]
In our example, \(\boldsymbol{H}=\boldsymbol{I}\), so the innovation is simply:
\[ \boldsymbol{z}_1 - \boldsymbol{I}\hat{\boldsymbol{x}}_{1,0}=\boldsymbol{z}_1 - \hat{\boldsymbol{x}}_{1,0}=\left[\begin{matrix}11{,}020\\202\\\end{matrix}\right] - \left[\begin{matrix}11{,}000\\200\\\end{matrix}\right]=\left[\begin{matrix}20\\2\\\end{matrix}\right] \]
Now apply the correction:
\[ \boldsymbol{K}_1\left[\begin{matrix}20\\2\\\end{matrix}\right]=\left[\begin{matrix}0.4048&0.6377\\0.0399&0.3144\\\end{matrix}\right]\left[\begin{matrix}20\\2\\\end{matrix}\right]=\left[\begin{matrix}9.37\\1.43\\\end{matrix}\right] \]
Finally:
\[ \hat{\boldsymbol{x}}_{1,1}=\left[\begin{matrix}11{,}000\\200\\\end{matrix}\right]+\left[\begin{matrix}9.37\\1.43\\\end{matrix}\right]=\left[\begin{matrix}11{,}009.37\\201.43\\\end{matrix}\right] \]
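The state update above, reproduced numerically (a sketch using the rounded gain from the text):

```python
import numpy as np

x10 = np.array([11_000.0, 200.0])       # predicted state [range m, velocity m/s]
z1 = np.array([11_020.0, 202.0])        # second measurement
K1 = np.array([[0.4048, 0.6377],
               [0.0399, 0.3144]])       # Kalman gain computed above

x11 = x10 + K1 @ (z1 - x10)             # H = I, so the innovation is z1 - x10
# → ≈ [11009.37, 201.43]
```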
Once we have estimated the current state, we also want to quantify the uncertainty of that estimate.
In a one-dimensional case, the Covariance Update Equation is:
\[ p_{n,n}=(1-K_n)p_{n,\ n-1} \]
For the derivation, see the Kalman Filter in One Dimension section of the online tutorial.
For the multivariate Kalman Filter, the covariance update equation is commonly written in a numerically stable form, known as the Joseph form, which was introduced by Peter Joseph.
\[ \boldsymbol{P}_{n,n}=(\boldsymbol{I} - \boldsymbol{K}_n\boldsymbol{H})\boldsymbol{P}_{n,n-1}(\boldsymbol{I} - \boldsymbol{K}_n\boldsymbol{H})^T + \boldsymbol{K}_n\boldsymbol{R}_n\boldsymbol{K}_n^T \]
where \(\boldsymbol{I}\) is the identity matrix, \(\boldsymbol{K}_n\) is the Kalman Gain, \(\boldsymbol{H}\) is the observation matrix, \(\boldsymbol{P}_{n,n-1}\) is the prediction covariance, and \(\boldsymbol{R}_n\) is the measurement covariance.
For the derivation, see the Covariance Update Equation section of the online tutorial.
In the literature, you will also often see the simplified covariance update:
\[ \boldsymbol{P}_{n,n}=(\boldsymbol{I} - \boldsymbol{K}_n\boldsymbol{H})\boldsymbol{P}_{n,n-1} \]
For its derivation, see the Simplified Covariance Update Equation section.
Both forms give the same result in exact arithmetic. However, for computer implementations, the Joseph form is generally preferred because it is more numerically stable.
For this example only, let us use the simplified covariance update equation:
\[ \boldsymbol{P}_{1,1}=(\boldsymbol{I} - \boldsymbol{K}_1\boldsymbol{H})\boldsymbol{P}_{1,0} \]
In our example, \(\boldsymbol{H}=\boldsymbol{I}\), so:
\[ \boldsymbol{P}_{1,1}=(\boldsymbol{I} - \boldsymbol{K}_1)\boldsymbol{P}_{1,0} \]
Now substitute the matrices:
\[ \boldsymbol{P}_{1,1}=\left(\left[\begin{matrix}1&0\\0&1\\\end{matrix}\right] - \left[\begin{matrix}0.4048&0.6377\\0.0399&0.3144\\\end{matrix}\right]\right)\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right] \]
\[ =\left[\begin{matrix}0.5952&-0.6377\\-0.0399&0.6856\\\end{matrix}\right]\left[\begin{matrix}28.5&3.75\\3.75&1.25\\\end{matrix}\right]=\left[\begin{matrix}14.57&1.43\\1.43&0.71\\\end{matrix}\right] \]
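With the optimal gain, the simplified form and the Joseph form agree, which we can confirm numerically (a sketch):

```python
import numpy as np

P10 = np.array([[28.5, 3.75], [3.75, 1.25]])
R1 = np.array([[36.0, 0.0], [0.0, 2.25]])
H = np.eye(2)
K1 = P10 @ H.T @ np.linalg.inv(H @ P10 @ H.T + R1)
I = np.eye(2)

P11_simple = (I - K1 @ H) @ P10                    # simplified form

A = I - K1 @ H
P11_joseph = A @ P10 @ A.T + K1 @ R1 @ K1.T        # Joseph form

# Both → ≈ [[14.57, 1.43], [1.43, 0.71]]
```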
The uncertainty of the updated estimate is lower than both the prediction uncertainty and the measurement uncertainty:
\[ \boldsymbol{P}_{1,1}=\left[\begin{matrix}\colorbox{yellow}{$14.57$}&1.43\\1.43&\colorbox{yellow}{$0.71$}\\\end{matrix}\right] \quad \boldsymbol{P}_{1,0}=\left[\begin{matrix}\colorbox{yellow}{$28.5$}&3.75\\3.75&\colorbox{yellow}{$1.25$}\\\end{matrix}\right] \quad \boldsymbol{R}_1=\left[\begin{matrix}\colorbox{yellow}{$36$}&0\\0&\colorbox{yellow}{$2.25$}\\\end{matrix}\right] \]
By combining the measurement with the prediction, and weighting them using the Kalman gain, we obtain an estimate with lower uncertainty.
Under the filter's model assumptions, adding new information never increases the estimation uncertainty, even when that information is itself highly uncertain. See the Sensor Fusion chapter in the book and Appendices G and H for the mathematical proof. From a theoretical point of view, new measurements should never be ignored.
In practice, however, it is often necessary to reject certain measurements. See the Outlier Treatment chapter in the book for practical methods of handling unreliable measurements.
The prediction step of Iteration 1 (from \( t_1 \) to \( t_2 \) ) is identical to the prediction step of Iteration 0 (from \( t_0 \) to \( t_1 \) ) except that we now start from the updated estimate \(\hat{\boldsymbol{x}}_{1,1}\) and \(\boldsymbol{P}_{1,1}\).
\[ \hat{\boldsymbol{x}}_{2,1}=\boldsymbol{F}\hat{\boldsymbol{x}}_{1,1} \]
\[ \hat{\boldsymbol{x}}_{2,1}=\left[\begin{matrix}1&5\\0&1\\\end{matrix}\right]\left[\begin{matrix}11{,}009.37\\201.43\\\end{matrix}\right]=\left[\begin{matrix}12{,}016.5\\201.43\\\end{matrix}\right] \]
\[ \boldsymbol{P}_{2,1}=\boldsymbol{F}\boldsymbol{P}_{1,1}\boldsymbol{F}^\top + \boldsymbol{Q} \]
\[ \boldsymbol{P}_{2,1}=\ \left[\begin{matrix}1&5\\0&1\\\end{matrix}\right]\left[\begin{matrix}14.57&1.43\\1.43&0.71\\\end{matrix}\right]\left[\begin{matrix}1&0\\5&1\\\end{matrix}\right]+\left[\begin{matrix}6.25&2.5\\2.5&1\\\end{matrix}\right]=\left[\begin{matrix}52.86&7.47\\7.47&1.71\\\end{matrix}\right] \]
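This prediction step, reproduced numerically (a sketch using the rounded values from the text):

```python
import numpy as np

F = np.array([[1.0, 5.0], [0.0, 1.0]])
Q = np.array([[6.25, 2.5], [2.5, 1.0]])
x11 = np.array([11_009.37, 201.43])            # updated state estimate
P11 = np.array([[14.57, 1.43], [1.43, 0.71]])  # updated covariance

x21 = F @ x11                       # → ≈ [12016.52, 201.43]
P21 = F @ P11 @ F.T + Q             # → ≈ [[52.87, 7.48], [7.48, 1.71]]
```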
Notice that both variances increase again during the prediction step. This happens because, as time passes without a new measurement, uncertainty naturally grows. In particular, uncertainty in velocity causes additional uncertainty in range, which is why the range variance increases more rapidly than the velocity variance.
Update
We estimated the current system state \(\hat{\boldsymbol{x}}_{1,1}\) as a weighted combination of the predicted state \(\hat{\boldsymbol{x}}_{1,0}\) and the measurement \(\boldsymbol{z}_1\).
The weighting is determined by the Kalman Gain \(K_1\). The Kalman Gain is computed from the predicted state covariance \(\boldsymbol{P}_{1,0}\) and the measurement covariance \(\boldsymbol{R}_1\), and it minimizes the uncertainty of the updated estimate \(\boldsymbol{P}_{1,1}\).
The Kalman Filter update equations are:
State Update Equation
\[ \hat{\boldsymbol{x}}_{n,n}=\hat{\boldsymbol{x}}_{n,n-1}+\boldsymbol{K}_n\left(\boldsymbol{z}_n\ -\ \boldsymbol{H}\hat{\boldsymbol{x}}_{n,n-1}\right) \]
Covariance Update Equation (Joseph form)
\[ \boldsymbol{P}_{n,n}=\left(\boldsymbol{I}-\boldsymbol{K}_n\boldsymbol{H}\right)\boldsymbol{P}_{n,n-1}\left(\boldsymbol{I}-\boldsymbol{K}_n\boldsymbol{H}\right)^T+\boldsymbol{K}_n\boldsymbol{R}_n\boldsymbol{K}_n^T \]
Or its simplified form
\[\boldsymbol{P}_{n,n}=\left(\boldsymbol{I}-\boldsymbol{K}_n\boldsymbol{H}\right)\boldsymbol{P}_{n,n-1}\]
Kalman Gain equation
\[ \boldsymbol{K}_n=\ \boldsymbol{P}_{n,n-1}\boldsymbol{H}^T\left(\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^T+\boldsymbol{R}_n\right)^{-1}\]
where \(\boldsymbol{z}_n\) is the measurement, \(\boldsymbol{H}\) is the observation matrix, \(\boldsymbol{K}_n\) is the Kalman Gain, \(\boldsymbol{I}\) is the identity matrix, \(\boldsymbol{P}_{n,n-1}\) is the prediction covariance, and \(\boldsymbol{R}_n\) is the measurement covariance.
Prediction
The prediction step in Iteration 1 is the same as in Iteration 0.
We propagate the current state estimate and its covariance forward to the next time step, when the radar revisits the aircraft, using the state transition model.
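The predict-update loop can be sketched as two helper functions (hypothetical names; a minimal sketch assuming the linear model of this example, with the initial state inferred from the prediction \([11{,}000,\ 200]\) and \(\Delta t = 5\,\mathrm{s}\)):

```python
import numpy as np

def predict(x, P, F, Q):
    """Propagate the state estimate and covariance to the next time step."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Fold a new measurement into the prediction via the Kalman gain."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)                  # state update
    I_KH = np.eye(len(x)) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T      # Joseph-form covariance update
    return x, P

# One full cycle with the numbers from this example
F = np.array([[1.0, 5.0], [0.0, 1.0]])
Q = np.array([[6.25, 2.5], [2.5, 1.0]])
H = np.eye(2)

x0 = np.array([10_000.0, 200.0])             # inferred initial state (assumption)
P0 = np.array([[16.0, 0.0], [0.0, 0.25]])    # covariance of the first measurement

x_pred, P_pred = predict(x0, P0, F, Q)       # → [11000, 200]
z1 = np.array([11_020.0, 202.0])
R1 = np.array([[36.0, 0.0], [0.0, 2.25]])
x_upd, P_upd = update(x_pred, P_pred, z1, H, R1)
# x_upd → ≈ [11009.37, 201.43]
```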
This simple example was used to illustrate the main concepts of the Kalman Filter and its three phases: initialization (which happens only at the start of operation), prediction, and update.
After initialization, the Kalman Filter operates in a continuous predict-update loop, as shown in the figure below.

This example demonstrates the core ideas behind the Kalman Filter and its predict-update cycle.