I did a little bit of the theory at university.
Basically, you first need to find the differences between two consecutive frames of the video. This gives you the movement between the two frames.
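To make the frame-difference step concrete, here is a minimal sketch. It assumes each frame is a grayscale image stored as a list of rows of 0-255 pixel values; the function name `frame_difference` and the threshold of 25 are my own choices for illustration, not something from a particular library.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary mask: 1 where a pixel changed by more than threshold."""
    return [
        [1 if abs(curr - prev) > threshold else 0
         for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Two tiny 3x4 "frames": a bright object appears in the middle row.
prev_frame = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
curr_frame = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 10, 12, 10],   # the change of 2 is below the threshold (noise)
]

mask = frame_difference(prev_frame, curr_frame)
for row in mask:
    print(row)
```

The threshold is there so that small brightness changes (sensor noise, compression) are not counted as movement. You would probably need to tune it for your footage.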
Second, once you have the differences, you can do your pattern matching on them to try to work out what has moved (e.g. a person or an object; a tip would be to start very general and then refine). Do this by comparing the shape of the change between the two frames. For some objects you might need to accumulate the change over more than one frame before they can be identified. There are various ways of pattern matching: http://en.wikipedia.org/wiki/Object_recognition
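As a rough sketch of "start very general": group the changed pixels into connected regions, then label each region purely by the shape of its bounding box. This assumes the change mask is a list of rows of 0/1 values. The names `changed_regions` and `classify`, and the "twice as tall as wide" rules, are hypothetical placeholders for whatever patterns you end up with, not a real recognition method.

```python
from collections import deque


def changed_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this changed pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify(box):
    """Very general first-pass label based only on bounding-box proportions."""
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    if height >= 2 * width:
        return "tall (person-like)"    # assumed rule, refine later
    if width >= 2 * height:
        return "wide (vehicle-like)"   # assumed rule, refine later
    return None                        # no pattern matched


mask = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
boxes = changed_regions(mask)
labels = [classify(box) for box in boxes]
print(list(zip(boxes, labels)))
```

Once the general labels work, you could refine them with stricter shape checks or a proper recognition technique from the article linked above.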
If you can't figure out what it was, add it to a list of undefined objects and continue to the next frame. (If this occurs a lot, go back and check whether it should be added to your patterns for matching.)
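The undefined-object list can be sketched like this. It assumes each frame has already been reduced to a list of bounding boxes `(top, left, bottom, right)` and that a classifier returns `None` when nothing matches. The stand-in `classify`, the function `collect_unknowns`, and the `review_after` cutoff are all made up for illustration.

```python
from collections import Counter


def classify(box):
    """Stand-in classifier: only knows one pattern, 'tall'."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return "tall" if height >= 2 * width else None


def collect_unknowns(frames_of_boxes, review_after=3):
    """Count unmatched shapes and return those seen often enough to review."""
    unknown_counts = Counter()
    for boxes in frames_of_boxes:
        for box in boxes:
            if classify(box) is None:
                top, left, bottom, right = box
                # Use (height, width) as a crude signature of the unknown shape.
                unknown_counts[(bottom - top + 1, right - left + 1)] += 1
    return [shape for shape, count in unknown_counts.items()
            if count >= review_after]


frames_of_boxes = [
    [(0, 0, 3, 0), (0, 2, 1, 3)],   # a tall region and an unmatched 2x2 region
    [(0, 2, 1, 3)],                 # the 2x2 region again
    [(1, 2, 2, 3), (0, 0, 0, 4)],   # 2x2 once more, plus a one-off 1x5 region
]
to_review = collect_unknowns(frames_of_boxes)
print(to_review)
```

Anything that ends up in `to_review` keeps showing up without a match, so it is a good candidate for a new pattern.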
So, just to recap: find the changes, pattern match on the changes, then learn whether there is a missing pattern.
Hopefully this helps clear things up a bit.