External Run Time Downloads
For a quick start, make sure you have a Windows-compatible webcam, Microsoft's Java Virtual Machine (or, failing that, Sun's), and Sun's Java Media Framework configured to recognise your webcam device. If you do not have these, see the steps below.
You then just run the executable in the bin, perform in front of your webcam, and enjoy. On a reasonable PC you will see a fast capture of the summary movement. That is all there is to it; then you can start thinking about all the wonderful applications, and about how this could be made to work in C# alongside all your other favourite solutions.
Now from the beginning
Let's make sure we can all get the executable to run. You must install Java on your PC, and for this application it should ideally be Microsoft's JVM. You can find this around the Internet, or it is available above. Install the shorter-named package first, then the longer-named patch. You will find Sun's JMF at the links above; follow the instructions to install it and to capture your video devices. You will then have JMStudio on your desktop and should be able to capture your webcam. If not, go to Preferences, capture your devices, commit, and try again. On older versions of Windows a reboot would often help, but it should be reasonably easy to get this far, and this is far enough to run the software presented here.
We will revisit later why Java, why Microsoft, and why Sun's JMF; i.e. why not J#, yet.
What is really going on here?
Movement detection has been an evolving science, but many have taken similar paths to software solutions via various pixel-handling algorithms. Many look to separate out a representative image background so that visiting foreground objects can be identified by the difference; others monitor pixels stochastically to similar effect; some look to track features. The processing between frames can be significant, and one has to strike a balance so that the frame rate does not drop too low.
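To make the background-separation idea concrete, here is a little sketch of my own (it is purely illustrative and is not taken from this application's source): a slowly updated running-average background is kept, and foreground pixels are those that differ from it by more than a threshold. All the names and constants here are my inventions.

```java
// Illustrative only: a running-average background model, one grey value
// per pixel. Foreground = pixels differing from the background by more
// than a threshold; the background then absorbs a little of each frame.
public class BackgroundModel {
    private final double[] background; // running-average grey levels
    private final double alpha;        // update rate, e.g. 0.05

    public BackgroundModel(int[] firstFrame, double alpha) {
        this.alpha = alpha;
        background = new double[firstFrame.length];
        for (int i = 0; i < firstFrame.length; i++) {
            background[i] = firstFrame[i];
        }
    }

    // Returns a foreground mask and folds the new frame into the model.
    public boolean[] update(int[] frame, int threshold) {
        boolean[] fg = new boolean[frame.length];
        for (int i = 0; i < frame.length; i++) {
            fg[i] = Math.abs(frame[i] - background[i]) > threshold;
            background[i] = (1 - alpha) * background[i] + alpha * frame[i];
        }
        return fg;
    }
}
```

The per-pixel work in that update loop is exactly the inter-frame processing cost the paragraph warns about.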
The algorithm presented here does not bother with any of that processing and cuts to the chase by identifying movement on the fly as separate objects. It is fast, relies on none of the solutions mentioned above, therefore avoids the problems recognised with each of them, and is accurate. It opens up many further applications, not least because it involves no learning or feature memory, which lends it to rapidly changing or moving scenes. Those possibilities, though, are outside the scope of this basic simplified version.
How does one use it?
It fires up in a demo mode. You will see a Java console window with some progress and debug information, which you can ignore. There is the primary display window (below), which shows your webcam video stream overlaid with various means of identifying movement objects.
You will also see other frames with options; these are a cut-down GUI from a larger application.
It is at this point that I confess to not being a professional programmer; this is to prepare you for looking into the source code, from which that will be abundantly clear. I have my own style and my own way of doing things, but I still believe you should be able to make sense of the important functionality. I also have to say that I am much more interested in the algorithms than the GUI, and much of that latter part of the code is untested, for which I apologize, but my stamina and time are finite.
I bring this up now because, at this first pass of writing, I am not going to delve into all the features even of this simplified version. You can see that the larger frame (below) includes many options that are obvious as you read them or from the source, or that are largely redundant in this context or better handled by questions or trial. I am therefore only going to talk about the important choices, which should also help you discern what to actually read.
The Cam or URL option is set to Cam, but if you tick URL and Save, then on restart it will look to an address for streaming image files, e.g. networked video servers. I do not suggest you rush to do this, because the results are not as good as the cam directly. If you do, though, set up the addresses in the imgurls.txt file in the repository folder. It is very important that each frame is fresh and not repeated, so if the image is not updated rapidly you should set Msecs per hit higher, then save and restart. Note the clipping options. The kernx/y values are best left at unity unless image resolutions from the data source are very high. This frame usually has to be saved and requires a restart; not so for the smaller frames.
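If you want a feel for what reading imgurls.txt might involve, here is a small sketch of my own. I am assuming one address per line with blank lines ignored; the actual format used by the application is not documented here, so treat this as illustration only.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader for an imgurls.txt-style file: one streaming
// image address per line, blank lines skipped. The real file format
// may differ; check the source before relying on this.
public class ImgUrlLoader {
    public static List<String> load(File file) throws IOException {
        List<String> urls = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    urls.add(line);
                }
            }
        }
        return urls;
    }
}
```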
The smaller frame (below) has one important field: the Gradient Tolerance. This is the one you aim for first, reducing noise but not so much as to lose motion capture and its presentation in the display. The parent of this application has options for Gaussian smoothing, which are very useful but fog up the code and slow the processing. You work with this tolerance alongside other options I will introduce you to.
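One plausible reading of the tolerance, sketched as code of my own (again, not the application's actual source): a pixel counts as active when its change between consecutive frames exceeds the tolerance, so raising the tolerance suppresses sensor noise at the cost of missing faint motion.

```java
// Illustrative sketch of a gradient-tolerance test: count the pixels
// whose grey level changed by more than the tolerance between frames.
// A higher tolerance means fewer pixels survive, i.e. less noise.
public class Activity {
    public static int activePixels(int[] prev, int[] curr, int tolerance) {
        int count = 0;
        for (int i = 0; i < curr.length; i++) {
            if (Math.abs(curr[i] - prev[i]) > tolerance) {
                count++;
            }
        }
        return count;
    }
}
```

A count like this is the kind of single number a slider bar could threshold to quantify activity.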
If you have not yet explored the GUI menu items, now is the time. First, though, take a moment to click the mouse inside the primary display, then move it a little and repeat. You will see the rectangle reset to a new position, with the processing running within it. Now ensure that the primary display frame has focus, and hit the first letter of a colour on the keyboard, e.g. R for Red. Then click in two different positions within the display again. You can have many of these rectangles running in parallel, and they can overlap. Another way of setting them is via the GUI; see Set/Edit in the menu of the display frame.
Look through the menu items of the larger frame and find Camera, then Videosource. If you click in this, an avatar appears which follows the mouse and is available for testing. Now go to the smaller frame, find the Rectangles menu, and for a rectangle within Activity find Configuration. Another frame (below) will pop up; beyond the obvious items there are Capture Range Pixels and Internal Border Pixels. Note that if you change anything in these frames you should hit Reset to effect the change.
Capture Range Pixels determines whether a cluster of moving pixels is connected to any other. Clusters are built recursively by enclosing them in an octagon of tightest fit, all external angles being 45 degrees. If, by these distance measures in pixels, another cluster lies within range, it is merged. For example, if this range is 320 and the display is 320 x 240, then all movement will be identified as one object. Similarly, if the range is 20 in the same display, then smaller objects separated by at least 20 pixels will be marked as individual objects.
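To illustrate the merge rule, here is a simplified sketch of my own that uses plain axis-aligned bounding boxes in place of the tightest-fit octagons the algorithm actually uses: two clusters merge when the gap between their boxes is within the capture range, and merging takes the union of the boxes.

```java
// Illustrative only: capture-range merging with bounding boxes instead
// of octagons. A box is {minX, minY, maxX, maxY}.
public class ClusterMerge {
    // True when the pixel gap between boxes a and b is within range
    // on both axes (touching or overlapping boxes give a gap of 0).
    public static boolean withinRange(int[] a, int[] b, int range) {
        int dx = Math.max(0, Math.max(b[0] - a[2], a[0] - b[2]));
        int dy = Math.max(0, Math.max(b[1] - a[3], a[1] - b[3]));
        return dx <= range && dy <= range;
    }

    // The merged cluster is the smallest box enclosing both.
    public static int[] merge(int[] a, int[] b) {
        return new int[] {
            Math.min(a[0], b[0]), Math.min(a[1], b[1]),
            Math.max(a[2], b[2]), Math.max(a[3], b[3])
        };
    }
}
```

With a range of 320 on a 320 x 240 display, withinRange is true for any two clusters, so everything merges into one object, just as the text describes; with a range of 20, clusters more than 20 pixels apart stay separate.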
The Internal Border Pixels setting affects the presentation of the drawing options and is best left alone.
The Gradient Tolerance mentioned previously, together with these options, can be changed on the fly to achieve acceptable visual and data performance: balancing noise against the visual data and the thresholded slider bar that quantifies activity.
The User button retrieves the configuration saved for that frame, if any; Factory restores the defaults.
Why Java?
Well, why not, given the nature of the purpose and the solution, and its potential use across platforms? But for image-data handling of this kind, with the need for fast processing, the better choice is probably C++. What is interesting, though, is that Microsoft's JVM handles this code extremely well: it is very fast and uniform. If you run it using Sun's JVM it is notably slower, and what I suspect is garbage collection introduces often-noticeable pauses into the flow of the video stream.
The natural next step, given how well the Microsoft JVM performed, was to try J# using Visual Studio .NET. At least a couple of very good articles on CodeProject pointed the way to doing this by either DirectX or Avicap32. The port across to J# of a larger version of this application was wonderfully straightforward in the main; the frame grabbing took some time but was worth it. I never managed to make up my mind which approach to capturing the image data was better. I put much effort into optimising the array handling and the rendering of image data.
However, although I knew this was the way to go, moving to .NET with all else it has to offer via J# as the natural step forward, it failed me; or at least it did the way I did it, or with the status of the beta versions at that time. I could not achieve the same performance on the .NET runtime as with the Microsoft JVM, especially when rendering several images to displays on the screen simultaneously, as the fuller product requires. My only route forward was to return to the Microsoft JVM and to consider paths forward using C++ or DSPs. My preference, though, was to progress to .NET.
Why am I here writing this?
Well, I really appreciated the help I found here when converting from Java to J#, especially on how to grab and handle image data from video devices. So I am, in my own way, giving something back. I also hope that someone, somewhere in a community of this size, will pick up on my plaudits for the Microsoft JVM and note my disappointment that there is no clear path forward for applications like this through to .NET. My further selfish intellectual and commercial interest is in exploring algorithms such as these, and other end-of-network digital solutions.
What may come from this article is some shared interest in this algorithm per se, and perhaps some further development to port it across to .NET, addressing the performance issues as far as one can. Note that it would still work very well in .NET, because the frame rate remains very adequate, but the faster it can be made, the more analysis can occur between frames. In my opinion, very fast on-the-fly motion or activity quantification opens up many uses and new directions.
The source and executable
You can download these from the links above.
The executable is run.bat; if you wish to try it using Sun's Java, edit sun.bat.
The source is pure Java and the main class is jmfcam05.java. Remember I am not a programmer, so do not expect great quality in style or presentation, but I hope you find it useful. Note the licensing, and that there are absolutely no warranties.
There may be more resources at http://www.exactfutures.com/ including extended versions of The Andyble Algorithm and more in The Algol Collection.
- New article: 1st July 2006