## Introduction and Background

This article will teach you how to estimate motion in a stream of images. I saw that some articles already exist on CodeProject for motion detection. To be precise whereas motion detection amounts to just detecting if there is any movement, motion estimation means you have to estimate the direction of motion also. Motion estimation has been extensively studied by both computer vision and human vision researchers for more than 25 years. It is also commonly referred to as optical flow computation especially in the computer vision literature. The algorithm that I will use in this article was given by Watson & Ahumada (WA) in the following paper which I highly encourage you to read:

The beauty of this algorithm is that it is a fairly accurate model of how our brain itself senses motion. In order to understand the algorithm some familiarity with Fourier analysis may be needed. The algorithm works by resolving the input stimulus into its component sinusoidal gratings oriented in different directions and estimating the motion of those gratings. This approach works well because determining the motion of a sine grating is easy: there exists a simple relationship between the spatial and temporal frequencies of the grating and its velocity (ω_{x}v_{x}+ω_{y}v_{y}+ω_{t}=0 where ω_{x},ω_{y} are the spatial frequencies of the grating v_{x},v_{y} is the velocity of the grating and ω_{t} is the temporal frequency of the grating). Having given you the necessary pointers that explain the algorithm in detail, let me now move on to how you can use the algorithm for optical flow estimation in your own applications.

## Using the Code

The *WAv2.zip* file available for download with this article has two projects:

`WAv2`

: This is a class library (DLL) that contains the functions to do optical flow computation. It uses two DLLs: *fft.dll* and *mapack.dll* which should appear in the *bin/Debug* directory of `WAv2 `

project.
`Examples`

: This is a project which shows you how to use the functions in *WAv2.dll* to do optical flow computation. Six test cases of optical flow computation are covered: simple, medium, complex, blocks, grid, vcbox. In each case 8 images are input to the WA algorithm; the algorithm analyzes the images and computes the resulting optical flow. You will need to download the images for these test cases from this website. Download the *simple.tar.gz*, *medium.tar.gz*, *complex.tar.gz*, *blocks.tar.gz*, *grid.tar.gz*, *vcbox.tar.gz* files and unzip them into some folder. I created animated gifs of these test cases so that you can see the motion in them. Below are shown the animated gifs and also the optical flow extracted by my code for all the 6 test cases:

Simple:

Medium:

Complex:

Blocks:

Grid:

Vcbox:

The most important function in *WAv2.cs* is:

private float[][,,] _WAComputeOpticalFlow(ComplexF[,,] I,
double fr, int N, int[] LevelsToCompute)

This is where the optical flow computation is done. Note that I have declared the function as `private`

and there is no need for you to call it directly. Instead there are a number of `public`

functions in *WAv2.cs* with names `WAComputeOpticalFlow`

all of which at some point call `_WAComputeOpticalFlow`

. You should use one of these functions for your optical flow computation. I will now go through an example in detail; as we work through the example, it will become clear what the different arguments to `_WAComputeOpticalFlow`

mean. The example I choose is the test case: simple. In the file *examples.cs* there is a function `Simple()`

which looks like this:

static void Simple()
{
string dir = "c:/c#/optic_flow_test_data/simple/";
int N = 8;
string[] filenames = new string[N];
for(int i = 0; i < N; i++)
filenames[i] = dir + "anim."+(i+10)+".tif";
WAdetector WA = new WAdetector();
float[,,] f = WA.WAComputeOpticalFlow(filenames);
float[,,] quiver_data = new float[f.GetLength(0),
f.GetLength(1),2];
for(int y = 0; y < quiver_data.GetLength(0); y++)
for(int x = 0; x < quiver_data.GetLength(1); x++)
{
quiver_data[y,x,0] = f[y,x,0]*f[y,x,2];
quiver_data[y,x,1] = f[y,x,1]*f[y,x,2];
}
Bitmap bmp = WAdetector.Quiver(quiver_data);
int date = DateTime.Now.Year*1000 +
DateTime.Now.Month*100 + DateTime.Now.Day;
int time = DateTime.Now.Hour*1000 +
DateTime.Now.Minute*100 + DateTime.Now.Second;
string filename = dir + "simple_flow-" +
date + "-" + time + ".bmp";
bmp.Save(filename);
}

The variable `dir`

points to the directory where the images are stored. Change the value of this variable accordingly if you store the images in some location other than *c:/c#/optic_flow_test_data/simple/*. `N=8`

images will be given as input to the optical flow algorithm. You will understand why I have chosen 8 images as we work through the example. `filenames`

just stores the names of the files. The optical flow computation is done by the following statement:

float[,,] f = WA.WAComputeOpticalFlow(filenames);

The optical flow is returned in the variable `f`

. `f[y,x,0]`

is v_{x}, the velocity in x direction at location (x,y). `f[y,x,1]`

is v_{y}, the velocity in the `y`

direction at location (x,y). `f[y,x,2]`

is a weight given to the velocity estimate (higher weight means the velocity estimate is more accurate). The velocity estimates are returned in a left-handed coordinate system meaning that `v`

_{x} is positive to the right and v_{y }is positive to the bottom.

The optical flow can be visualized as a quiver plot using the function `WAdetector.Quiver`

. In my quiver plots I usually scale the velocity estimates by their weight. This is what is done by lines 10-16. Lines 18-21 save the quiver bitmap as a BMP file. The quiver plot has been shown above. Looking at the quiver plot you can see the algorithm correctly captures the rotation of the square and the movement of the sphere in the input image sequence. Let us now look more closely at what the following line does:

float[,,] f = WA.WAComputeOpticalFlow(filenames);

The function `WAComputeOpticalFlow(filenames)`

computes the optical flow for the sequence of images whose filenames are specified in the array `filenames`

. Currently I support only two formats: `PixelFormat.Format8bppIndexed`

which is what many GIF files are and `PixelFormat.Format24bppRgb`

which is what many TIF and JPG files are.

`WAComputeOpticalFlow(filenames)`

uses default values of parameters needed by the WA algorithm and calls the function `public float[][,,] WAComputeOpticalFlow(int sampling_rate, int fd, int T, int num_orientations, int[] levels_to_compute, string[] filename) `

with the default parameters as shown by following code snippet:

public float[,,] WAComputeOpticalFlow(string[] filename)
{
Bitmap bmp;
try
{
bmp = new Bitmap(filename[0]);
}
catch
{
throw new Exception("ReadImage: unable to read file " +
filename[0]);
}
int h = bmp.Height;
int w = bmp.Width;
int x = (int) Math.Round(0.5*(Math.Log(h,2) + Math.Log(w,2)));
int n = x - 5;
if (n < 0) n = 0;
if (n > 3) n = 3;
int[] levels_to_compute = new int[]{n};
float[][,,] f = WAComputeOpticalFlow(80, 2, 16, 10,
levels_to_compute, filename);
return f[0];
}

The most important parameter setting you may wish to understand is the timing information. If you have used any optical flow algorithms before, many of them just take two images as input and compute the flow between them. Well, if I show you a sequence of images at a frame rate of 1Hz there is no chance that you will see motion in them. When you go to the theater the images are played at a rate of 24-30Hz or so because it is only then that the perception of motion occurs. Because the WA algorithm is a model of human visual motion sensing you need to specify such timing information to the algorithm in addition to giving it the sequence of images. There are two things I wish you to understand:

- First, the motion that your brain senses at time
`t`

is based on what your eyes saw from time `t-T`

to `t`

where `T`

is approximately 200ms. The intensity matrix `I(y,x,n)`

that is input to `_WAComputeOpticalFlow`

and computed by `WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename)`

represents discrete-time samples of what your eyes saw in this 200ms period. You specify the sampling rate at which these samples were taken by the variable `sampling_rate`

. If there are a total of `M`

samples which is the length of `I`

along the `n`

dimension (`M=I.GetLength(2)`

) then `M/sampling_rate`

should equal `T`

=200ms. Because the algorithm takes a FFT (Fast Fourier Transform) of `I`

it requires `M`

to be a power of 2. The higher `M`

is, the better; it should at least be 16 otherwise the optical flow estimates may not be very accurate. Using `M=16`

we get a `sampling_rate`

of 16/0.2=80Hz.
- The next thing you want to consider is what is the frame duration (
`frame duration = 1/frame rate`

) of the images that would be input to the algorithm. As I mentioned before when you go to a theater the images are played at a certain frame rate to create the perception of motion. It is found that a frame duration of about 30ms corresponding to a frame rate of about 33Hz is optimal for motion perception. In the function `WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename)`

I constrain the frame duration to be an integer multiple of `1/sampling_rate`

; frame duration = `fd/sampling_rate`

. It will become clear in just a moment why I do this. With `sampling_rate=80`

Hz I find a frame duration of 2/80=25ms works well in practice.

Let me summarise: with `sampling_rate`

=80Hz, `M`

=16 the intensity matrix `I`

in `WAComputeOpticalFlow`

represents 16 samples of what your eyes saw over a time period of `T`

=16/80=200ms. With a frame duration of `2/80=25ms `

each image stays on the eye or the retina for 25ms. If you have really understood the above discussion you understand this means each image will have to be repeated `fd=2`

times in the intensity matrix (25ms = 2 samples at a sampling rate of 80Hz). This is how the intensity matrix is created in `WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename)`

. By now, hopefully you can also understand why I constrained the frame duration to be an integer multiple of `1/sampling_rate`

. I leave it as an exercise to you to write a function in which the frame duration is not constrained in this manner if you may happen to need such a function.

There are two other parameters of `WAComputeOpticalFlow`

that need some explanation: `num_orientations`

and `levels_to_compute`

. `num_orientations`

specifies the number of orientations/gratings into which the input will be resolved. Just set `num_orinetations`

=10 and don't mess with it. `levels_to_compute`

is an array through which you specify the spatial scales or levels at which you are interested in getting the optical flow information. If you are familiar with Fourier analysis you know the Fourier Transform allows an image to be expressed in terms of its spatial frequency components. The different levels of the WA algorithm are sensitive to different spatial frequencies of the input as shown in following figure:

In my current implementation, I do not allow for computing optical flow at levels higher than 3. However, you can change this to your heart's content by modifying the value of `private const int MAXLEVELS`

in *WAv2.cs*. In my implementation, if the input has a resolution of `h`

by `w`

pixels, then level `i`

returns optical flow computed at a resolution of `2`^{-i}h

by `2`^{-i}w

pixels. It is possible to write an implementation in which the resolution does not decrease as the level number increases. However, it may be something that is unwarranted, because the low frequency components of an image represent its coarse structure and supersampling this coarse structure may not give any additional information.

Let me now address the practical question: what should be the value of `levels_to_compute`

? The first thing to note is that the entries of `levels_to_compute`

have to be in ascending order, otherwise the optical flow computation will crash. The level at which you may want optical flow to be computed is really application-dependent. You should compute the flow at all levels meaning `levels_to_compute = new int[]{0,1,2,3}`

and then narrow down the levels that seem to work best or are sufficient for you. If you are familiar with Gaussian Pyramids that's what the `levels_to_compute`

is. Level `i `

instructs the algorithm to compute optical flow at level `i`

of the Gaussian Pyramid of the input.

If you are confused, don't worry. Use lines 13-19 of the function `WAComputeOpticalFlow`

above as a guideline to choose the level at which to compute the optical flow. I have found this guideline to be useful in choosing what level to use at least in my applications. The guideline is based on the established fact that the power spectrum of natural images displays a power law behavior and `1/f`^{2}

frequency dependence, e.g. read following paper: Modeling the Power Spectra of Natural Images: Statistics and Information by van der Schaaf A. & van Hateren J.H. in Vision Research, Volume 36, Number 17, September 1996 , pp. 2759-2770(12) (a free version of the article is available here).

What this means is that most of the power is concentrated at low spatial frequencies and it rapidly falls off with increase in frequency. Therefore you may be better off computing optical flow at higher levels because that's where most of the power is. As you will experiment with the different levels, you may find that the lower levels are well tuned for detecting very small displacements whereas higher levels respond better to relatively larger displacements; this is because very small displacements will get washed out at higher levels of the pyramid due to Gaussian blurring and subsampling.

One way to test this out is to create a display of random dots undergoing radial motion. The displacement of a dot is directly proportional to its distance from the center of expansion/contraction. When I made such a display (512 by 512 pixel resolution) the following are the optical flows I got at successively higher levels.

At level 0 (center frequency = 1/4 cycles/pixel):

At level 1 (center frequency = 1/8 cycles/pixel):

At level 2 (center frequency = 1/16 cycles/pixel):

At level 3 (center frequency = 1/32 cycles/pixel):

At level 4 (center frequency = 1/64 cycles/pixel):

You can see from the above images that lower levels capture flow at center (small displacements), whereas higher levels capture flow at periphery (larger displacements).

Enough said. Just a few more pointers: what does the return value of `WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename)`

mean? Let the return value be stored in a variable `f`

. Then `f[i][y,x,0]`

is v_{x}, the velocity in the x direction; `f[i][y,x,1]`

is v_{y}, the velocity in the y direction; `f[i][y,x,2]`

is a weight associated with the`(v`_{x},v_{y})

estimate and all are computed at location (x,y) and level `i`

. The velocity estimates are returned in a left-handed coordinate system meaning that v_{x }is positive to the right and v_{y }is positive to the bottom.

Visualizing the optical flow as a quiver plot using `WAdetector.Quiver`

is great but sometimes you may want to store and read the raw optical flow. The functions `public static void SaveOpticalFlow(float[,,] f, string filename)`

and `public static float[,,] ReadOpticalFlow(string filename)`

do just that.

## Wrap Up

This is an advanced topic and I don't know how well of a job I have done explaining the code. I solicit your comments to make this article better.