This is the first part of a series of articles introducing the concepts and software behind a framework that delivers motion and touch as services, both locally and over the internet, all with the aid of your regular webcam. Yes, you heard (or read) that right: motion and touch as services. Actually, there's a bit more: with this tech, some time in the near future, you will be able to develop apps that replicate 3D forms across a network!
All of the software will be open source with very permissive licensing, except for a couple of core libraries, which are patent-pending freeware (the intellectual property rights are mine). The core libraries currently host code for image analysis, but will also host code to simulate texture and replicate 3D forms.
The computer can almost become a living thing.
What is it Joe?
“Sir, it the Kinect, Wii and Sony Move... But on steroids! It will rockses your World and it are a wonderful concept! I speechles, it have cutted my tongue into piece!”
Photo credit to Smoobs; http://www.flickr.com/photos/smoo/
Here goes! A few words about what will be possible:
The framework will allow you to invent your own gestures, effects and commands to control devices. It will also be flexible enough to adapt to existing games with no modifications to your code.
With a simple everyday webcam and a computing device (tablet PC etc.), the new framework will open up all sorts of exciting prospects to extend human-computer interactions into an ecosystem of enhanced sensory exchanges. For example, imagine software that gives you the freedom to create crazy stuff that lets you point at one computer to transfer data to the next. Personally, I've always dreamt of boxing or taekwondo tournaments across the web (any good ol' bashing). Perhaps we will achieve some semblance of that, limited mainly by connection and round-trip speeds; sadly, though, it would require a top-end webcam that can take crisp shots of fast-moving objects.
What about the ability to feel the fabric of your brand new sofa while sitting at your desk, before ordering over the web? How about an enhanced movie experience where you can feel the creepy crawlies in a horror film while sitting in your favourite chair in your sitting room? Dream of the possibilities of having your website's contact and support pages, your forum, Flash, Silverlight and HTML5 bits etc. interacting in novel ways with gestures... Even actuators, motors, robots and other mechanical devices!
Possibilities! Possibilities!! Possibilities!!!
Every aspect of the technology's development will be done with your participation: there are things I can show you, and there are things you can teach me as well. As code is released here, the real-life framework and servers are being built. Watch out for roll-outs of both incremental code and new explanatory articles every fortnight or so. The current article deals with the most basic concepts of the framework.
I've always been intrigued by real-time object recognition, even starting an open source project on www.Codeplex.com in 2006. Since 2010, I've been seriously engaged in tackling real-time object recognition, and started a journey of grit, sweat, failures, the odd success and gruelling restarts on www.KC36.com. The lessons learnt were many.
A Hint At The Methodology and Supported Platforms
The first and most important lesson was that the Cartesian coordinate system is a thoroughly unnatural and confusing choice for image analysis. Firstly, its representation of angles and lengths is inaccurate; secondly, nature prefers circles and other conics. As an example, the pupils of our eyes are round, while our field of vision is conical. (I digress; indeed, throughout observable nature, straight lines are always found to be local approximations of curves. Straight lines are merely one of our convenient mathematical concepts.)
Ahem! How do you represent conics with a pile of squares? Not happening, sir!
We solve the problem by breaking our image into polygons. In order to break the image into polygons, we find the edges (or depths) and approximate curves with a polyline so that each object in the image becomes a polygon. Doing this enables us to simplify computations and execute them at increased speed.
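To make the idea of approximating a curve with a polyline concrete, here is a minimal sketch. It is not the code in KC36.Native (that code is not published); it uses the well-known Ramer-Douglas-Peucker simplification as a stand-in, and the names `Point`, `approximateWithPolyline` and the `tolerance` parameter are my own assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type for this sketch; not part of KC36.
struct Point { double x, y; };

// Perpendicular distance from p to the line through a and b
// (falls back to plain point distance when a and b coincide).
static double distanceToLine(const Point& p, const Point& a, const Point& b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Ramer-Douglas-Peucker step: keep the point farthest from the chord
// first..last if it is farther than the tolerance, then recurse on both halves.
static void simplifyRange(const std::vector<Point>& pts, std::size_t first,
                          std::size_t last, double tolerance,
                          std::vector<Point>& out) {
    double maxDist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = distanceToLine(pts[i], pts[first], pts[last]);
        if (d > maxDist) { maxDist = d; index = i; }
    }
    if (maxDist > tolerance) {
        simplifyRange(pts, first, index, tolerance, out);
        simplifyRange(pts, index, last, tolerance, out);
    } else {
        out.push_back(pts[last]);  // the chord is a good enough approximation
    }
}

// Returns a polyline with fewer vertices that stays within `tolerance`
// (in pixels) of the original chain of edge points.
std::vector<Point> approximateWithPolyline(const std::vector<Point>& pts,
                                           double tolerance) {
    if (pts.size() < 3) return pts;
    std::vector<Point> out;
    out.push_back(pts.front());
    simplifyRange(pts, 0, pts.size() - 1, tolerance, out);
    return out;
}
```

Fed an L-shaped chain of five points, this keeps only the two ends and the corner. Fewer vertices per object means less work in every later step.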
More details about this area will be provided in later articles in this series.
The platforms that will eventually be supported are:
- Windows (C++, C#)
- Android (via C++ and C#/Mono)
- Linux (via C++ and C#/Mono)
- iOS (via C++)
- Mac (via C++)
At this time, only Windows and the Pbgra32 image format are supported. However, the library is written in ANSI C++ and has no dependencies, so as soon as other image formats are supported, there will be few or no issues compiling it for other platforms.
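For readers who have not met Pbgra32 before: it is a 32-bit format with four bytes per pixel stored in blue, green, red, alpha order, with the colour channels premultiplied by alpha. The sketch below shows how a pixel could be addressed in such a buffer from C++. The names `Pbgra32Pixel`, `readPixel` and `brightness` are hypothetical and are not taken from KC36.Native; the brightness formula (integer Rec. 601 luma weights) is likewise just one common choice, not necessarily the one the library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One Pbgra32 pixel as it sits in memory: blue, green, red, alpha.
struct Pbgra32Pixel { std::uint8_t b, g, r, a; };

// Reads the pixel at (x, y). `stride` is the number of bytes per row,
// which is at least width * 4 for this format.
inline Pbgra32Pixel readPixel(const std::uint8_t* pixels, int stride, int x, int y) {
    assert(pixels != nullptr && stride > 0 && x >= 0 && y >= 0);
    const std::uint8_t* p = pixels
        + static_cast<std::size_t>(y) * static_cast<std::size_t>(stride)
        + static_cast<std::size_t>(x) * 4;
    return Pbgra32Pixel{p[0], p[1], p[2], p[3]};
}

// Approximate brightness in the range 0..255 (assumed formula, see above).
inline int brightness(const Pbgra32Pixel& px) {
    return (299 * px.r + 587 * px.g + 114 * px.b) / 1000;
}
```

Supporting another image format would mostly mean swapping out a function like `readPixel`, while the analysis that works on brightness values stays the same.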
Intro To The Edge Detection Code
The edge detection code does a number of things, the major ones being:
- Edge detection
- Edge categorising
- Edge indexing and sorting
- Polyline segment angle approximations
- Curve approximation with polylines
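To give a feel for the first and fourth items in the list above, here is a rough sketch of a brightness-difference edge test and a segment angle calculation. This is emphatically not the patent-pending code in KC36.Native. It is a textbook-style illustration, and `markEdges`, `segmentAngle`, `tolerance` and `stepDegrees` are names I have assumed for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Marks a pixel as an edge (1) when its brightness differs from its right or
// lower neighbour by more than `tolerance`. `gray` holds one brightness value
// (0..255) per pixel, row by row.
std::vector<std::uint8_t> markEdges(const std::vector<std::uint8_t>& gray,
                                    int width, int height, int tolerance) {
    std::vector<std::uint8_t> edges(gray.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            const int here = gray[i];
            const bool right = x + 1 < width &&
                std::abs(here - gray[i + 1]) > tolerance;
            const bool below = y + 1 < height &&
                std::abs(here - gray[i + width]) > tolerance;
            edges[i] = (right || below) ? 1 : 0;
        }
    }
    return edges;
}

// Angle of the segment (x0, y0) -> (x1, y1) in degrees, in [0, 360),
// snapped to the nearest multiple of `stepDegrees` (which must be > 0).
// Note that image y grows downwards, so angles follow screen orientation.
int segmentAngle(int x0, int y0, int x1, int y1, int stepDegrees) {
    const double pi = 3.14159265358979323846;
    double deg = std::atan2(static_cast<double>(y1 - y0),
                            static_cast<double>(x1 - x0)) * 180.0 / pi;
    if (deg < 0.0) deg += 360.0;
    const int snapped = static_cast<int>(std::lround(deg / stepDegrees)) * stepDegrees;
    return snapped % 360;
}
```

With a tolerance of 177 (the default used later in this article), only very strong brightness jumps count as edges; lower the value and busier pics produce many more edge pixels.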
Firstly, it must be stressed that the app and libraries shown below have no GPU acceleration and are unoptimised (except for the SSE switch in Visual C++), yet execution times are very competitive, even on my 7-year-old, barely crawling granny 'puter that's running an advanced OS its primeval BIOS utterly rejects like it's the spawn of Satan. Here are the specs of my good ol' workhorse:
OS: Windows Server 2008 Standard Service Pack 2 (32-Bit)
Make: Asus Pundit P4S8L
Processor: Intel Celeron 2.40GHz
RAM: 1.5 GB
The pics below show the result of finding, categorising, indexing and sorting the edges (plus all the other goodies mentioned at the start of this section) of a 615 x 407 pic.
14 milliseconds, not bad, eh?!
The code and library for the app below are supplied with this article, so you can do your own tests. Your mileage may vary and you will find that the busyness of your test pics also plays a part in the execution time.
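If you want to time native C++ code yourself, a small helper along these lines will do. It is a sketch using only the standard `std::chrono` facilities; `millisecondsTaken` and `bestOfRuns` are names I made up for the example, and the supplied app may well measure time differently.

```cpp
#include <algorithm>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double millisecondsTaken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Runs `work` `runs` times (at least once) and returns the fastest time,
// which is less sensitive to whatever else the machine is doing.
template <typename Work>
double bestOfRuns(int runs, Work&& work) {
    double best = millisecondsTaken(work);
    for (int i = 1; i < runs; ++i) {
        best = std::min(best, millisecondsTaken(work));
    }
    return best;
}
```

For example, `bestOfRuns(10, [&] { analyseImage(pixels); })` would report the fastest of ten runs, where `analyseImage` stands for whatever function you are measuring. Taking the best of several runs smooths out the odd slow run caused by background activity.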
NOTE: We do not use a webcam yet; for now, it is not required to show what is possible. Webcams will come in subsequent articles.
Fig. 1. Edge categorisation
Fig. 2. Curve approximation
Fig. 3. Segment angle approximation
For some reason, it's slow on 64-bit machines, though. I will look into it and provide a 64-bit version at a later date.
NOTE: to avoid System.IO.FileNotFoundException, 64-bit users will need to install Microsoft Visual C++ 2010 SP1 Redistributable Package (x86) http://www.microsoft.com/en-gb/download/details.aspx?id=8328
The software for this article is made up of two projects, KC36.Client and KC36.NET. KC36.Client is the GUI, while KC36.NET wraps the native library, KC36.Native. The native library contains the core analysis functions.
What This Code Release Does
In a few short words, it organises an image into polylines, which can then be searched for patterns. Fairly simple and straightforward. The code block below shows how to use the current files; it is quite similar to the code in the Execute() method of the GUI file, ClientMainUI.cs.
The code in the native library is patent pending, so I am a bit constrained with it as far as details go; one day in the future, it might become open source, but for now, the fear of the anti-innovation Apple/Samsung battle is the beginning of wisdom.
Using The Code
internal class Class1
{
    internal unsafe void TestIt(Image image)
    {
        // Despite the name, this is a byte count: Pbgra32 uses 4 bytes per pixel.
        int bitsPerPixel = 4;
        // Tolerance settings; the names suggest the GUI controls they mirror.
        int trkBarMinBrightnessDiffTolerance = 177;
        int cmbEdgeAngleTolerance = 1;

        Bitmap bitmap = (Bitmap)image;
        byte[] pixelArray = Tools.GetBytes(bitmap);
        int pixelArrayCount = pixelArray.Length;
        int stride = bitmap.Width * bitsPerPixel;
        int width = bitmap.Width;
        int height = bitmap.Height;

        // The rest of this call is cut off in the source listing.
        Wrapper.InitialiseCore(width, height, stride,

        // Results exposed by the wrapper.
        int* polylineAngles = Wrapper.Directions;
        int* indices = Wrapper.DirectionIndices;
        int* edgeMetrics = Wrapper.EdgeMetrics;
        int featurePointCount = Wrapper.FeaturePointCount;
        int segmentPropertyCount = Wrapper.SegmentPropertyCount;
        int[] segments = new int[segmentPropertyCount];
        int* segmentsNative = Wrapper.SegmentProperties;
    }
}
Experiment with various settings, get used to them and see what you get. These controls are the beginning of an exciting and epochal journey to rewrite the way the Internet and computer interactions are defined.
To use the downloaded files, unzip to a suitable directory. You will find another zip file called KC36.Native.zip and a license file called “License (for KC36.Native).txt.” Read the license, then unzip the code into the same directory. The license is quite restrictive for the intro release, but that's only because some of the code will be obsolete in the next week or so; subsequent licenses will be much freer. By the way, both KC36.Client and KC36.NET are MIT licensed and as free as air.
That’s all for now, folks! The next article will introduce the object recognition code; it will link to the intro above and fill in any technical gaps. Expect it a week from now.
Points of Interest
The code is slow on 64-bit machines; that's something that is being worked on. Also, to avoid System.IO.FileNotFoundException, 64-bit users will need to install Microsoft Visual C++ 2010 SP1 Redistributable Package (x86) http://www.microsoft.com/en-gb/download/details.aspx?id=8328