This is the final post in the series on the Ultimate Coder Challenge: Going Perceptual. The applications are in and now it's do or die. The biggest challenge to our contestants at this point isn't the code, or the SDK, or their idea. It's our hardware. It's the installers. It's our ability to understand what they were trying to do and that has proven to be difficult in some cases.
Perceptual computing is a way of interacting with a computer in a more natural, human way. You speak or you gesture, or you may even want to just swipe. It's not about the keyboard and mouse. The biggest problem is that what we think makes a good method of interaction (cue Minority Report) is actually a terrible way to interact. It's tiring, there's no feedback, and gestures are in 3D, not 2D, and not constrained to a box or button and so can be ambiguous. Further, a touch gesture has a definite start and stop. It starts when you touch and ends when you stop touching. When does a gesture in mid-air start and stop?
Further, and as has been mentioned elsewhere, there's no standard. We all know how to pinch-zoom, or swipe to scroll, or even push a button. How do you push a button in 3D space? How do you pinch to zoom, without the initial contraction of the fingers being misinterpreted as a shrink action first?
So on to the challengers:
Lee's work I've written about extensively, usually while ruefully shaking my head at the madman. He shamelessly takes on way more than he can chew and after endless dead ends, delays, setbacks and possibly a broken keyboard or two he comes out with something a little special. His virtual conference is, for me, an excellent example of what's possible with perceptual computing: a UI that revolves around you. You're in a virtual 3D environment talking to others and as you turn, the viewport turns with you.
Except there's one huge problem with this approach: in a video conference you never turn. If you turn then you're no longer looking at the screen in front of you. Further, as you turn, the mass-tracking (not head tracking) turns the virtual camera in the direction that you're turning, meaning the image on the screen pans, meaning, well, that you don't actually need to turn to view it. Kind of like an anti-Catch-22.
It's very cool though. Very cool.
Sixense's puppetry demo is complete. It's the ultimate kids' toy and allows two people (or one talented person) to host and record a virtual puppet show. It works, but I can't help but feel that so much effort went into things like scenes and a story that the creators forgot what puppet shows are about. Dialogue. Well, and violence, for those who fondly remember the Punch and Judy days, but mostly dialogue and often costumes. So why not ditch the fancy scenes and instead allow quick changes of clothes? Or have amusing sound effects when one puppet bonks another puppet on the head? That would focus the players on thinking about their puppet rather than trying to control a puppet that keeps flying all over the countryside.
It's an excellent example of 3D free form gesture based interaction with a computer, and for that they score highly. Most importantly their user feedback on interpreting hand gestures is by far the best of any contestant.
Code Monkeys' Stargate application shows promise in (for me) the ultimate goal of a head tracking video game. Unfortunately the reality never lived up to the dream and the sheer effort involved in trying to control the targeting quickly took away from the luster. I did get it to work, but the headache afterwards was not worth it.
Infrared5 followed a similar path to Code Monkeys with their Kiwi challenge, however they introduced an extra variable by having the application controlled via a smartphone application. The connection between the game and the smartphone was seamless and slick. I could not, however, control anything but the fire button from my iPhone, which made the game unplayable for me. Restarting didn't solve the issue. Further, I often ended up with a black landscape in my viewport that no amount of yelling, tilting, clicking or swiping would get me out of, and closing the app via alt-tab (there's no close button I could find) was near impossible because no sooner had you popped out of the app than you were thrust back into it, black soulless void and all.
Pete has created an image processing application that uses a series of gestures to activate filters. His biggest problem is there is no defined, standard set of gestures one can call on to immediately dive into his app. There's no help button, so you're left guessing at gestures. Thumb up and down, swiping up and down, and in my case swearing like a sailor. An excellent attempt at making gesture input a natural part of the interaction with the application, but let down, I think, by the maturity of the platform.
Eskil has demonstrated an abstraction layer that allows developers to take advantage of perceptual computing without needing to write the boilerplate code. In fact, without needing to know anything about the nuts and bolts at all. This is a tremendous achievement given the short time available.
A short note on the perceptual computing camera: I'm typing this review on my Lenovo Yoga with the camera perched on the top of the screen. The camera is heavy, as I've mentioned before, but it's only now, after hours of clenching my fist, twiddling my fingers, bobbing my head frantically and yelling 'Engage' in various accents to try and interact with the game, that I've realised I'm getting tired of always hiking the screen back up to the vertical position. The camera makes the lid sag due to the weight, and when it's not sagging it's trying to tip the laptop base over apex. It needs to be lighter and it needs to be way smaller.
Another general comment on the use of the camera as an input device is that onscreen feedback is critical. If you don't get feedback on what the computer thinks you're doing you go nowhere. You can't debug. I cannot overstate how important feedback is.
I do also need to comment on the Lenovo itself. I love the feel of the keyboard and most especially the palm rest. So very comfortable. But please, to anyone who is thinking of manufacturing a laptop keyboard, DO NOT reduce the width of the right shift key and squeeze in the up arrow in the space created. It means I'm constantly hitting the up arrow. Constantly. It's doing my head in. Apart from that the screen is crisp, the battery life good, and the flip back keyboard weirdly useful. Propping this thing on a table with the keyboard folded back to watch a movie or flick through the news is brilliant.
As to perceptual computing, my overwhelming feeling is "it's coming". But it's not here yet. We are trying to make a computer be like the real world. You turn or look or speak or grab at something that isn't there in the hope that the computer will mimic or replicate your intent virtually. This, to me, is akin to skeuomorphism: changing something to be like something else, like Apple making its calendar application leather bound, or having ebook readers show an animated page turn. A computer is not the real world, and it doesn't have the limitations of the real world (so to speak) so why mimic it?
I think the future of perceptual computing will most likely be subtle. The computer may recognise you or your voice, and will recognise when you are in front of the computer, when you're looking at it, and when you're not. The end of screen savers, really, since it would just go to sleep. Gestures such as brushing away an app to close it, or flicking it to move it to another screen would be intuitive, and voice recognition would allow utility commands such as searching or bookmarking to be carried out without needing to take your hands off the keyboard or, indeed, outside of the current application.
Gaze tracking is another incredibly important, yet not currently functioning (at least on the hardware I have in front of me) feature that has a myriad of uses. My immediate use would be for eye tracking in UI testing, but even simpler could be things like auto scrolling, or even auto-hiding of elements when they are not being looked at directly. The eye does, however, jump around an awful lot so the smoothing algorithm will need to be heavily weighted.
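To make "heavily weighted" a little more concrete, here's a rough sketch of one common way to do it: an exponential moving average over the raw gaze points. This isn't taken from the SDK or from any contestant's code. The function name `smooth_gaze`, the `alpha` parameter and the idea of gaze arriving as plain (x, y) screen coordinates are all my own assumptions for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


def smooth_gaze(samples: Iterable[Point], alpha: float = 0.1) -> List[Point]:
    """Exponentially smooth a stream of raw (x, y) gaze points.

    alpha is the weight given to each new sample. A small alpha means the
    history is weighted heavily, which damps the eye's rapid jumps at the
    cost of the smoothed point lagging behind where you are really looking.
    (Hypothetical helper: names and defaults are illustrative only.)
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    smoothed: List[Point] = []
    current: Optional[Point] = None
    for x, y in samples:
        if current is None:
            # First sample: nothing to blend with yet.
            current = (x, y)
        else:
            # Move a fraction alpha of the way towards the new sample.
            current = (
                current[0] + alpha * (x - current[0]),
                current[1] + alpha * (y - current[1]),
            )
        smoothed.append(current)
    return smoothed
```

With `alpha=0.1`, a single wild jump of 100 pixels only drags the smoothed point about 10 pixels, so auto scrolling wouldn't lurch every time your eye flicks away. The trade-off is lag: the lower the alpha, the longer the cursor takes to catch up with a genuine change of gaze.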
The contestants achieved some incredible feats of patience, innovation, creativity and problem solving. I take my hat off to them for their perseverance and outright foolishness in taking on a challenge that many of them were so unprepared for, yet so willing to have a go and rise to the occasion. Rise they did, so well done, guys. Get some rest and have a beer. You've earned it.
The Code Project | Co-founder
Microsoft C++ MVP