Ultimate Coder – Week 2: Blog posting

Pete O'Hanlon

Feb 26, 2013

CPOL


This is a copy of the post I made on the Intel site. For the duration of the contest, I am posting a weekly blog digest of my progress with using the Perceptual Computing items. The first week’s post is really a scene setter where I explain how I got to this point, and detail bits and pieces about the app I intend to build. My fellow competitors are planning to build some amazing applications, so I’d suggest that you head on over to the main Ultimate Coder site and follow along with their progress as well. I’d like to take this opportunity to wish them good luck.

Week 1

Well, this is the first week of coding for the Perceptual Computing challenge, and I thought it might be interesting for you to know how I’m approaching developing the application, what I see the challenges as being, and any roadblocks that I hit on the way. I must say, up front, that this is going to be a long post precisely because there’s so much to put in here. I’ll be rambling on about decisions I make, and I’ll even post some code in for you to have a look at if you’re interested.

As I’m not just blogging for developers here, writing these posts is certainly going to be interesting because I don’t want to bog you down with technical details that you aren’t interested in if you just want to know about my thought processes instead, but I don’t want to leave you wondering how I did something if you are interested in it. Please let me know if there’s anything that you’d like clarification on, but also, please let me know if the article weighs in too heavily on the technical side.

Day 1.

Well, what a day I’ve had with Huda. A lot of what I want to do with Huda is sitting in my head, so I thought I’d start out by roughing out a very, very basic layout of what I wanted to put into place. Armed with my trusty copy of Expression Blend, I mocked out a rough interface which allowed me to get a feel for sizing and positioning. What I really wanted to get the feel of was, would Huda really fit into a layout that was going to allow panels to fly backwards and forwards, and yet still allow the user to see the underlying photo. I want the “chrome” to be unobtrusive, but stylish.

As you can see, this representation is spartan at best, and if this was the end goal of what I was going to put into Huda, I would hang my head in shame, but as it’s a mockup, it’s good enough for my purposes. I’ve divided the screen into three rough areas at the moment. At the right, we have a list of all the filters that have been applied to the image, in the order they were applied. The user is going to be able to drag things around in this list using a variety of inputs, so the text is going to be large enough to cope with a less accurate touch point than from a mouse alone.

The middle part of the picture represents the pictures that are available for editing in the currently selected folder. When the user clicks on a folder in the left hand panel, this rearranges to show that folder at the top, and all its children – and the pictures will appear in the centre of the screen. The user will click on a picture to open it for editing. I’ve taken this approach, rather than just using a standard Open File dialog, because I want the user to be able to use non-keyboard/mouse input, and the standard dialogs aren’t optimised for things like touch. This does have the advantage of allowing me to really play with the styling and provide a completely unified experience across the different areas of the application.

Well, now that I’ve finished roughing out the first part of the interface, it’s time for me to actually write some code. I’ve decided that the initial codebase is going to be broken down into four projects – I’m using WPF, C#, .NET 4.5 and Visual Studio Ultimate 2012 on Windows 8 Professional for those who care about such things - and it looks like this:

Goldlight.Common provides common items such as interfaces that are used in the different projects, and definitions for things like WeakEvents.
Goldlight.Perceptual.Sdk is the actual meat of the SDK code. Initially this will be kept simple, but I will expand and enhance this as we go through the weeks.
Goldlight.Windows8 contains the plumbing necessary to use Ultrabook™ features such as flipping the display into tablet mode, and it isolates the UI from having to know about all the plumbing that has to be put in place to use the WinRT libraries.
Huda is the actual application, so I’m going to spend most of this week and next week deep in this part, with some time spent in Goldlight.Perceptual.Sdk.

When I start writing a UI, I tend to just rough-block things in as a first draft. So that’s what I did today. I’ve created a basic page and removed the standard Windows chrome. I’m doing this because I want to have fine grained control of the interface when it transitions between desktop and tablet mode. The styling is incredibly easy to apply, so that’s where I started.

A quick note if you’re not familiar with WPF development. When styling WPF applications, it’s generally a good idea to put the styling in something called a ResourceDictionary. If you’re familiar with CSS, this is WPF’s equivalent of a separate stylesheet file. I won’t bore you with what this file actually looks like at this point, but please let me know if you would like more information. Once I’ve fine tuned some of the styling, I’ll make this file available and show how it can be used to transition the interface over – this will play a large part when we convert our application from desktop to tablet mode, so it makes sense to put the plumbing in place right at the start.

My first pass on the UI involved creating a couple of basic usercontrols that animate when the user brings the mouse over them or touches them; giving a visual cue that there’s something of interest in this area. I’ve deliberately created them to be ugly – this is a large part of my WPF workflow – concentrate on putting the basics in place and then refine them. I work almost exclusively with a development pattern called MVVM (Model View ViewModel), which basically allows me to decouple the view from the underlying application logic. This is a standard pattern for WPF zealots like myself, and I find that it really shines in situations like this, where I just quickly want to get some logic in place.
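
For readers who haven’t met MVVM before, a minimal sketch of the idea might look something like the following. The names here (FolderViewModel, SelectedFolder, ChildFolders) are hypothetical and are not taken from Huda’s source; the only framework piece I’m relying on is the standard INotifyPropertyChanged interface.

```csharp
using System;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical view model for illustration only. It knows nothing about
// ListBoxes, TreeViews or any other control.
public class FolderViewModel : INotifyPropertyChanged
{
    private string selectedFolder;

    public event PropertyChangedEventHandler PropertyChanged;

    public FolderViewModel()
    {
        ChildFolders = new ObservableCollection<string>();
    }

    // The view binds to this collection; which control displays it is
    // entirely the view's business.
    public ObservableCollection<string> ChildFolders { get; private set; }

    public string SelectedFolder
    {
        get { return selectedFolder; }
        set
        {
            // Only notify when the value actually changes.
            if (selectedFolder == value) return;
            selectedFolder = value;
            OnPropertyChanged();
        }
    }

    protected void OnPropertyChanged([CallerMemberName] string propertyName = null)
    {
        var handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```

Because the view model only exposes data and change notifications, the view that binds to it can be swapped for a different one without touching this class.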

The first usercontrol I put in place is just an empty shell that will hold the filters that the user has added. As I need to get an image on the screen before I can add any filters, I don’t want to spend too much time on this just yet – I just needed to have it in the UI as a placeholder, primarily so that I can see how gesture code will affect the UI.

The second control is more interesting. This control represents the current selected folder and all its children. My first pass of this was just to put a ListBox in place in this control, and to have the control expand and contract as the user interacts with it. The ListBox holds the children of the current folder, so I put a button in place to allow the user to display the images for the top level folder. When I run the application, it quickly becomes apparent to me that this doesn’t work as a UI design, so I will revisit this with alternative ideas.

I could have left the application here, happy that I had the beginnings of a user interface in place. Granted, it doesn’t do much right now – it displays the child folders associated with My Pictures, and that’s about it, but it does work. However, what’s the point of my doing this development if I don’t bring in gesture and voice control? In the following video, once Huda has started, I’m controlling the interface entirely with gestures and voice recognition (when I say filter, a message box is displayed saying filter – not the most startling achievement, but it’s pretty cool how easy it is to do). Because I’m going to issue a voice command for this first demonstration, I decided not to do a voice over – it would just sound weird if I paused halfway through a sentence to say “Filter”.

Huda – the very first draft. Day 1, and boy is it ugly.

As you can see – that interface is ugly, but it’s really useful to get an idea of what works and what doesn’t. As I’m not happy with the folder view, that’s what I’ll work on tidying up in day 2.

Note: I’m not going to publish videos every day, and I’m not going to publish a day by day account of the development. The first couple of days are really important in terms of starting the process off and these are the points where I can really make quick wins, but later on, when I start really being finicky with the styling, you really aren’t going to be interested in knowing that I changed a TextBlock.FontSize from 13.333 to 18.666 – at least I hope you’re not.

The important thing for me, at the end of day 1, is that I have something to see from both sides. I have a basic UI in place; there’s a long way to go with it yet, but it’s on the screen, and there’s actual Perceptual work going on there, and it’s actually pretty easy to get the basics in place. More importantly, my initial experiments have shown that the gestures are quite jerky, and getting any form of fine grained control is going to take quite a bit of ingenuity. Unfortunately, while I can get the voice recognition to work, it appears to be competing for processing time with the gesture code – both of which are running as background tasks.

One of the tasks I’ll have to undertake is to profile the application to see where the hold up is – I have suspicions that the weak events might be partly to blame, but I’ll need to verify this. Basically, a weak event is a convenience that allows a developer to write code that isn’t going to be adversely affected if they forget to release an event. While this is a standard pattern in the WPF world, it does have an overhead, and that might not be best when we need to eke out the last drop of performance. I might have to put the onus on any consumer of this library to remember to unhook any events that they have hooked up.
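
To make the weak event idea a little more concrete, here is a deliberately simplified sketch. It is not the WeakEvent class from Goldlight.Common; SimpleWeakEvent and everything inside it are made up for illustration. The point is that the event holds only a weak reference to each subscriber, so a subscriber that forgets to unhook can still be garbage collected, at the price of extra bookkeeping and a reflection call on every invocation.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Illustrative only: a simplified weak event, not the Goldlight.Common one.
public class SimpleWeakEvent<TArgs> where TArgs : EventArgs
{
    private sealed class Entry
    {
        public WeakReference Target;   // null for static handlers
        public MethodInfo Method;
    }

    private readonly List<Entry> entries = new List<Entry>();
    private readonly object sync = new object();

    public void Add(EventHandler<TArgs> handler)
    {
        if (handler == null) return;
        lock (sync)
        {
            entries.Add(new Entry
            {
                // Hold the subscriber weakly so we never keep it alive.
                Target = handler.Target == null ? null : new WeakReference(handler.Target),
                Method = handler.Method
            });
        }
    }

    public void Remove(EventHandler<TArgs> handler)
    {
        if (handler == null) return;
        lock (sync)
        {
            // Remove the most recently added matching subscription.
            int index = entries.FindLastIndex(e =>
                e.Method.Equals(handler.Method) &&
                ReferenceEquals(e.Target == null ? null : e.Target.Target, handler.Target));
            if (index >= 0) entries.RemoveAt(index);
        }
    }

    public void Invoke(object sender, TArgs args)
    {
        Entry[] snapshot;
        lock (sync)
        {
            // Drop subscribers that have already been garbage collected.
            entries.RemoveAll(e => e.Target != null && !e.Target.IsAlive);
            snapshot = entries.ToArray();
        }

        foreach (var entry in snapshot)
        {
            object target = entry.Target == null ? null : entry.Target.Target;
            if (entry.Target != null && target == null) continue; // collected since the snapshot

            // The reflection call is one likely source of the overhead.
            entry.Method.Invoke(target, new object[] { sender, args });
        }
    }
}
```

One thing to watch with any weak event of this shape: a lambda that captures local variables is only kept alive by whoever holds the delegate, so subscribing with a throwaway lambda can result in the handler silently disappearing.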

Here’s the gesture recognition code that I put in place today. I know it’s not perfect and there’s a lot that needs doing to it to make it production level code, but as it’s the end of the first day I’m pretty happy with it:

using Goldlight.Perceptual.Sdk.Events;
using Goldlight.Windows8.Mvvm;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace Goldlight.Perceptual.Sdk
{
    public class GesturePipeline : AsyncPipelineBase
    {
        private WeakEvent gestureRaised = new WeakEvent();

        public event EventHandler HandMoved
        {
            add { gestureRaised.Add(value); }
            remove { gestureRaised.Remove(value); }
        }

        public GesturePipeline()
        {
            EnableGesture();
        }

        public override void OnGestureSetup(ref PXCMGesture.ProfileInfo pinfo)
        {
            // Limit how close we have to get.
            pinfo.activationDistance = 75;
            base.OnGestureSetup(ref pinfo);
        }

        public override bool OnNewFrame()
        {
            // We only query the gesture if we are connected. If not, we shouldn't
            // attempt to query the gesture.
            try
            {
                if (!IsDisconnected())
                {
                    var gesture = QueryGesture();
                    PXCMGesture.GeoNode nodeData;
                    var status = gesture.QueryNodeData(0, GetSearchPattern(), out nodeData);
                    if (status >= pxcmStatus.PXCM_STATUS_NO_ERROR)
                    {
                        var handler = gestureRaised;
                        if (handler != null)
                        {
                            // Pass the hand position (in image coordinates) on to any listeners.
                            handler.Invoke(new GestureEventArgs(nodeData.positionImage.x,
                                nodeData.positionImage.y, nodeData.body.ToString()));
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                // Error handling to go here...
                Debug.WriteLine(ex.ToString());
            }
            return base.OnNewFrame();
        }

        private PXCMGesture.GeoNode.Label GetSearchPattern()
        {
            return PXCMGesture.GeoNode.Label.LABEL_BODY_HAND_PRIMARY |
                PXCMGesture.GeoNode.Label.LABEL_FINGER_INDEX |
                PXCMGesture.GeoNode.Label.LABEL_FINGER_MIDDLE |
                PXCMGesture.GeoNode.Label.LABEL_FINGER_PINKY |
                PXCMGesture.GeoNode.Label.LABEL_FINGER_RING |
                PXCMGesture.GeoNode.Label.LABEL_FINGER_THUMB |
                PXCMGesture.GeoNode.Label.LABEL_HAND_FINGERTIP;
        }
    } 
}

At this stage, I’d just like to offer my thanks to Grasshopper.iics for the idea of tying the hand gesture to the mouse. As a quick way to demonstrate that the gesture was working, it’s a great idea. As I need to track individual fingers, it’s not a viable long term solution, but as a way to say “oh yes, that is working”, it’s invaluable.
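
Tying the hand to the mouse mostly comes down to scaling the hand position from camera image space into screen space. The following is a rough sketch of that mapping rather than the code Huda uses. HandToScreenMapper and ScreenPoint are hypothetical names, and the image size and whether the x axis needs mirroring are assumptions that you would need to check against your own camera.

```csharp
using System;

// Hypothetical helper types for illustration only.
public struct ScreenPoint
{
    public readonly int X;
    public readonly int Y;

    public ScreenPoint(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class HandToScreenMapper
{
    // Maps a point in camera image coordinates to screen pixel coordinates.
    // imageWidth/imageHeight describe the camera image (an assumption the
    // caller supplies), and mirrorX flips the horizontal axis so that moving
    // the hand right moves the cursor right.
    public static ScreenPoint Map(float imageX, float imageY,
        int imageWidth, int imageHeight,
        int screenWidth, int screenHeight, bool mirrorX)
    {
        if (imageWidth <= 0 || imageHeight <= 0)
        {
            throw new ArgumentOutOfRangeException("imageWidth", "Image dimensions must be positive.");
        }

        // Normalise to the 0..1 range, clamping anything outside the image.
        float nx = Clamp01(imageX / imageWidth);
        float ny = Clamp01(imageY / imageHeight);
        if (mirrorX) nx = 1f - nx;

        // Scale up to the last addressable pixel on each axis.
        int sx = (int)Math.Round(nx * (screenWidth - 1));
        int sy = (int)Math.Round(ny * (screenHeight - 1));
        return new ScreenPoint(sx, sy);
    }

    private static float Clamp01(float value)
    {
        return value < 0f ? 0f : (value > 1f ? 1f : value);
    }
}
```

The resulting point could then be handed to whatever moves the cursor; that part is left out here because it depends on how the application chooses to drive the mouse.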

Day 2

I’ve had a night to think about the folder display, and as I said yesterday, I’m really not happy with the whole button/list approach to the interface. What had started off as an attempt to try to steer clear of the whole file system as logical tree metaphor just feels too alien to me, and I suspect that I would end up having to rework a lot of the styling there to make this appear to be a logical tree. Plus, I really need to hook something up in the UI so that I can select folders and trigger the reload of the selected folder along with child folders. We’ll attend to the styling first.

As I’ve stated that we are going to present this part of the interface as though it’s a tree, it makes sense for us to actually use a tree, so I’m going to rip out the list and button, and replace them with a simple tree. As I’m using MVVM, the only thing I have to update is my UI code (known as the View). My internal logic is going to remain exactly the same – this is why I love MVVM. More importantly, this highlights why I start off with rough blocks. I like the fact that I can quickly get a feel for what’s working before I invest too much time in it. If you’re a developer using a tech stack that you’re comfortable with, and you have a technology that allows you to take this rapid iterative approach, I cannot recommend this quick, rough prototyping enough. It’s saved me from a lot of pain any number of times, and I suspect that it will do the same for you.

The second thing I’m going to do is hook selecting a tree node up to actually doing the refresh. Again, I put most of the plumbing in place for this yesterday – all I need to do today is actually hook the tree selection to the underlying logic.

Now, I really want to play around with the styling a little bit. I’m going to restyle the folder tree so that it looks a bit more attractive, and I’m going to change the filter control and the folder view so that the user can drag them around the interface – including using touch, as I feel that this really helps to make the Ultrabook™ stand out from other form factors. Having played around with the code, I’ve now got this in place:

I’ve added a lot of plumbing code to support the touch drag here. I’m leaving this part of the code "open" and unfinished right now because I want to add support into this for dragging through gestures. By doing this, I won’t have to touch the UI elements code to make them work, they should just work because they are responding to commands from this code.

Day 3+

I’ve really been pushing to get the application to the point where the user can select a photo from my own folder browser and picture selector combo, but the first thing I thought I would address was the voice control. When I really sat down with the code I’d put in place, I realised that I was doing more than I really needed to. Basically, in my architecture, I was creating a set of commands that the application would use as a sort of command and control option and while that seemed to me to be a logical choice when I put it in, sober reflection pointed out to me that I was overcomplicating things. Basically, the only thing that needs to know about the commands is the application itself, so why was I supplying this to the perceptual part? If I let the Perceptual SDK just let me know about all the voice data it receives, the different parts of Huda could cherry pick as they saw fit. Two minutes of tidy up code, and it’s responding nicely.

As a quick aside here, the voice recognition doesn’t send you a stream of words as you speak. It waits until you’ve paused and then it sends you a phrase. This means that you have to be a little bit patient; you can’t expect to say “Filter” in Huda and for the filters to pop up a millisecond later, because the voice recognition portion is waiting to see if you’ve finished that phrase.

Fortunately, this means that my voice code is currently insanely simple:

using Goldlight.Perceptual.Sdk.Events;
using Goldlight.Windows8.Mvvm;
using System;
namespace Goldlight.Perceptual.Sdk
{
    /// <summary>
    /// Manages the whole voice pipeline.
    /// </summary>
    public class VoicePipeline : AsyncPipelineBase
    {
        private WeakEvent voiceRecognized = new WeakEvent();

        /// <summary>
        /// Event raised when the voice data has been recognized.
        /// </summary>
        public event EventHandler VoiceRecognized
        {
            add { voiceRecognized.Add(value); }
            remove { voiceRecognized.Remove(value); }
        }

        /// <summary>
        /// Instantiates a new instance of <see cref="VoicePipeline"/>.
        /// </summary>
        public VoicePipeline() : base()
        {
            EnableVoiceRecognition();
        }

        public override void OnRecognized(ref PXCMVoiceRecognition.Recognition data)
        {
            var handler = voiceRecognized;

            if (handler != null)
            {
                handler.Invoke(new VoiceEventArgs(data.dictation));
            }
            base.OnRecognized(ref data);
        }
    } 
}

The call to EnableVoiceRecognition lets the SDK know that this piece of functionality is interested in handling voice recognition (cunningly enough – I love descriptive method names). I could have put this functionality into the same class as I’m using for the gesture recognition, but there are a number of reasons I’ve chosen not to, the top two reasons being:

I develop in an Object Oriented language, so I’d be violating best practices by “mixing concerns” here. This basically means that a class should be responsible for doing one thing, and one thing only. If it has to do more than one thing, then you need more than one class.
I want to be able to pick and mix what I use and where I use it in my main application. If I have more than one piece of functionality in one class then I have to start putting unnecessarily complicated logic in there to separate the different parts out into areas that I want to use.

The OnRecognized piece of code lets me pick up the phrases as they are recognized, and I just forward those phrases on to Huda. As Huda is going to have to decide what to do when it gets a command, I’m going to let it see all of them and just choose the ones it wants to deal with. This is an incredibly trivial operation.
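
On the Huda side, cherry picking the phrases it cares about can be as simple as a dictionary lookup. Here’s a minimal sketch of that idea. VoiceCommandRouter and its members are hypothetical, and it takes the recognised phrase as a plain string so that it doesn’t assume anything about the shape of VoiceEventArgs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only: maps recognised phrases to actions.
public class VoiceCommandRouter
{
    // Case-insensitive so that "Filter" and "filter" are treated the same.
    private readonly Dictionary<string, Action> commands =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    public void Register(string phrase, Action action)
    {
        if (string.IsNullOrWhiteSpace(phrase))
        {
            throw new ArgumentException("A phrase is required.", "phrase");
        }
        if (action == null)
        {
            throw new ArgumentNullException("action");
        }
        commands[phrase.Trim()] = action;
    }

    // Returns true if the phrase matched a registered command.
    // Anything the application hasn't registered is simply ignored.
    public bool Dispatch(string phrase)
    {
        if (string.IsNullOrWhiteSpace(phrase)) return false;

        Action action;
        if (!commands.TryGetValue(phrase.Trim(), out action)) return false;

        action();
        return true;
    }
}
```

Each part of the application could register only the phrases it is interested in, and the handler for the voice event would just pass every recognised phrase to Dispatch.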

“Wow Pete, that’s a lot of text to say not a lot” I hear you say. Well, I would if you were actually talking to me. It could be the voices in my head supplying your dialogue here. The bottom line here is that Huda now has the ability to recognise commands much more readily than it did at the start of the week, and it recognizes them while I’m waving my arms about moving the cursor around the screen. That’s exciting. Not dangerous exciting. Just exciting in the way that it means that I don’t have to sacrifice part of my desired feature set in the first week. So far so good on the voice recognition front.

By the end of the week, Huda is now in the position where it displays images for the user to select, based off whether or not there are pictures in a particular folder. This is real progress, and I’m happy that we can use this to push forwards in week 2. Better still though, the user can select one of those pictures and it opens up in the window in the background.

I’m not quite happy with the look of the back button in the folders – it’s still too disconnected, so I’ve changed it. Next week, I’ll add folder representations so that it’s apparent what these actually are, but as that’s just a minor template change, I’m going to leave it for now. Here’s a sample of Huda in action, opening up folders and choosing a picture to view.

Week 1 – Folders and photo opening almost complete

Keeping my head together

So, what do I do to keep my attention on the project? How do I keep focused? Music and a lot of Cola. So, for your delectation, this week’s playlist included:

AC/DC – For those about to rock (one of my favourites)
David Lee Roth – Eat ‘em and Smile/Skyscraper
Andrea Bocelli – Romanza
Black Veil Brides – Set the world on fire
Deep Purple – Purpendicular
Herb Ellis and Joe Pass – Two for the Road
The Angels – Two minute warning
Chickenfoot – Chickenfoot III

Each week, I’ll let you know what I’ve been listening to, and let’s see if you can judge what part of the application goes with what album. Who knows, there may be a correlation on something or other in there – someone may even end up getting a grant out of this.

Final thoughts for week 1

This has been a busy week. Huda has reached a stage where I can start adding the real meat of the application – the filters. This is going to be part of my push next week; getting the filter management and photo saving into place. The photo is there for the user to see, but that’s nowhere near enough so we’ll be looking at bringing the different parts together, and really making that UI pop out. By the end of next week, we should have all the filters in place – including screens for things like saturation filters. This is where we’ll start to see the benefits of the Ultrabook because we’ll offer touch optimised and keyboard/mouse/touch optimised screens depending on how the Ultrabook is being used.

Right now, the Perceptual features aren’t too refined, and I’ve not even begun to scratch the surface of what I want to do with the Ultrabook. Sure, I can drag things around and select them with touch, but that’s not really utilising the features in a way that I’d like. Next week, I’m incorporating some user interface elements that morph depending on whether you are using the Ultrabook in a desktop mode, or as a tablet. For example, I’ll be adding some colour adjustment filters where you can adjust values using text boxes in desktop mode, but I’ll be using another mechanism for adjusting these values when it’s purely tablet (again, I don’t want to spoil the surprise here, but I think there is a pretty cool alternative way of doing this).

The big challenge this week has been putting together this blog entry. This is the area where the solo contestants have the biggest disadvantage – time blogging is time we aren’t coding, so there’s a fine line we have to tread here.

One thing I haven’t articulated is why I’m using WPF over WinRT/Metro for the application development. As I’ve hinted, I’ve a long history with WPF, and I’m a huge fan of it for developing applications. On the surface (no pun intended), WinRT XAML apps would appear to be a no brainer as a choice, but there are things that I can do quickly in WPF that will take me longer to achieve with WinRT XAML, simply because WPF is feature rich and XAML support in Windows 8 has a way to go to match this. That’s not to say that I won’t port this support across to WinRT at some point, but as speed is of the essence here, I want to be able to use what I know well, and to draw on my huge backlog of code as and when I need to.

I’d like to thank the judges, and my fellow contestants for their support, ideas and comments this week. No matter what happens at the end of this contest, I can’t help thinking that we are all winners. Some of the ideas that the other teams have are so way out there, that I can’t help but want to incorporate more. Whether or not I get the time to add extra is something that is up for grabs, but right now I think that I want to try and bring gaze into the mix as an input, possibly to help people with accessibility issues – an area that I haven’t really explored with the gesture SDK yet.