
Getting the most of Kinect SDK in C# - Part 2 of ?: ImageStreams

Jarek Kruza, 6 Jan 2012
A series about experimenting with Kinect for Windows SDK.

Introduction

This is part two of a series documenting my experiments with the Kinect for Windows SDK. After the first few articles, the series should serve as a solid walkthrough for beginners and a handy reference for more advanced developers.

Part one described the initialization of the Kinect SDK including parameters for the image capturing engine. Please refer to part one for details about initialization and general background.

Series table of contents:

  1. Initialization
  2. ImageStreams
  3. Coming soon...

Background

The Kinect device has two cameras:

  • Video camera - RGB camera for capturing images
  • Depth camera - infrared camera used to capture depth data

This article focuses on getting and processing the data acquired by these cameras.

What are ImageStreams?

ImageStream is a class provided by the Kinect SDK for accessing data captured by the Kinect cameras. Each Kinect Runtime exposes two streams:

  • VideoStream - has to be opened with ImageStreamType.Video and ImageType.Color, ImageType.ColorYuv, or ImageType.ColorYuvRaw.
  • DepthStream - has to be opened with ImageStreamType.Depth and ImageType.Depth or ImageType.DepthAndPlayerIndex.

As described in part one, each stream has to be opened with Open() after Runtime initialization. The third parameter required by the stream is an ImageResolution - 80x60, 320x240, 640x480, or 1280x1024. Please note that DepthStream has a maximum resolution of 640x480, and different ImageType values support different resolutions.
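Putting the two Open() calls together might look like the following sketch, based on the beta SDK's Microsoft.Research.Kinect.Nui namespace (the buffer-pool size of 2 and the chosen resolutions are just example values):

```csharp
using Microsoft.Research.Kinect.Nui;

Runtime nui = new Runtime();

// The RuntimeOptions must match the streams you intend to open.
nui.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseDepthAndPlayerIndex);

// Second parameter is the number of frame buffers the runtime may queue.
nui.VideoStream.Open(ImageStreamType.Video, 2,
                     ImageResolution.Resolution640x480, ImageType.Color);

// DepthStream tops out at 640x480; DepthAndPlayerIndex is commonly used at 320x240.
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);
```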

Using an ImageStream is quite simple: you can either poll with the GetNextFrame method or attach to the events exposed by the Runtime: DepthFrameReady and VideoFrameReady.

When using events, you receive the frame data through ImageFrameReadyEventArgs.ImageFrame.
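Wiring up an event handler might look like this fragment (a sketch; the handler name is my own):

```csharp
nui.VideoFrameReady += nui_VideoFrameReady;

void nui_VideoFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    // The frame and its metadata arrive through the event args.
    PlanarImage image = e.ImageFrame.Image;
    // ... process image.Bits here ...
}
```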

Accessing image data

Whichever method you use, you will get an ImageFrame, which holds the image data itself in its Image field along with metadata such as:

  • Type - contains the type of image (ImageType) - useful in case you use the same handler for both types of ImageStream
  • FrameNumber
  • Timestamp
  • Resolution

As FrameNumber and Timestamp seem to be quite accurate, they are very useful if you need to detect lost frames, measure the time between frames, or keep video and depth in sync - or, conversely, to skip frames when you don't need a new image more often than once a second.
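For example, a handler can compare consecutive FrameNumber values to detect dropped frames (a sketch; the _lastFrameNumber field is my own addition for illustration):

```csharp
private int _lastFrameNumber = -1;

void nui_VideoFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    int current = e.ImageFrame.FrameNumber;
    if (_lastFrameNumber >= 0 && current != _lastFrameNumber + 1)
    {
        // One or more frames were dropped between the last event and this one.
        Console.WriteLine("Lost {0} frame(s)", current - _lastFrameNumber - 1);
    }
    _lastFrameNumber = current;
}
```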

PlanarImage

The Kinect SDK provides its own class for keeping captured images. It is as simple as it can be - it holds Width, Height, BytesPerPixel, and raw data in byte[] Bits.

Video frames hold information in 32-bit XRGB or 16-bit UYVY format.

Depth frames come in two different formats, depending on whether the stream was opened as Depth or DepthAndPlayerIndex:

  • 12-bit depth data (stored in two bytes with upper 4 bits unused)
  • 3-bit player index (bits 0-2) and 12-bit depth data (starting at bit 3)

A depth value of 0 means that the object at this position is either too close or too far.

The PlanarImageHelper class included in the sources simplifies access to individual pixels:

public class PlanarImageHelper
{
    private PlanarImage _img;

    public PlanarImage Image { get { return _img; } }

    public PlanarImageHelper(PlanarImage src)
    {
        _img = src;
    }

    // Offset of the first byte of the pixel at (x, y).
    private int BaseOffset(int x, int y)
    {
        return (y * _img.Width + x) * _img.BytesPerPixel;
    }

    // Video frames are stored as 32-bit XRGB (byte order: B, G, R, X).
    public Byte GetRedAt(int x, int y)
    {
        return _img.Bits[BaseOffset(x, y) + 2];
    }

    public Byte GetGreenAt(int x, int y)
    {
        return _img.Bits[BaseOffset(x, y) + 1];
    }

    public Byte GetBlueAt(int x, int y)
    {
        return _img.Bits[BaseOffset(x, y) + 0];
    }

    // Valid only for DepthAndPlayerIndex frames: the player index lives in bits 0-2.
    public int GetPlayerAt(int x, int y)
    {
        return _img.Bits[BaseOffset(x, y)] & 0x07;
    }

    public int GetDepthAt(int x, int y, bool hasPlayerData)
    {
        int baseByte = BaseOffset(x, y);
        if (baseByte < 0 || baseByte + 1 >= _img.Bits.Length)
        {
            return 0; // out of range - treat it like "no depth data"
        }
        if (hasPlayerData)
        {
            // Depth starts at bit 3; shift the player index out of the low byte.
            return (_img.Bits[baseByte + 1] << 5) | (_img.Bits[baseByte] >> 3);
        }
        else
        {
            // Plain depth: a little-endian 16-bit value with the upper bits unused.
            return (_img.Bits[baseByte + 1] << 8) | (_img.Bits[baseByte]);
        }
    }
}
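A typical use of the helper inside a depth event handler might look like this (a sketch sampling the centre pixel; depth values are in millimetres):

```csharp
void nui_DepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    var helper = new PlanarImageHelper(e.ImageFrame.Image);

    // Depth and player index at the centre of the frame.
    int cx = e.ImageFrame.Image.Width / 2;
    int cy = e.ImageFrame.Image.Height / 2;
    int depth = helper.GetDepthAt(cx, cy, hasPlayerData: true);
    int player = helper.GetPlayerAt(cx, cy);
}
```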

ImageStreamTest

In the attached source code, you will find the ImageStreamTest application. It is a simple illustration of ImageStreams usage and depth data utilisation.

On the left side of the window, you can choose the effect applied on the image based on depth data:

  • None - just the video frames as captured
  • Depth - each pixel on the video frame is compared with depth data for the same point and replaced with white if it does not fall in the range set by sliders
  • Player - all pixels that do not contain player index are replaced with white
  • Background - a not entirely successful attempt to show the background only: pixels without a player index are copied to a background image, and pixels with a player index are replaced with the remembered background
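The "Depth" effect boils down to a loop like the following simplified sketch (minDepth/maxDepth stand for the slider values, and video and depth frames are assumed to share the same resolution - the real code has to map between the two):

```csharp
// Blank out every video pixel whose depth falls outside [minDepth, maxDepth].
for (int y = 0; y < video.Image.Height; y++)
{
    for (int x = 0; x < video.Image.Width; x++)
    {
        int d = depthHelper.GetDepthAt(x, y, hasPlayerData: true);
        if (d < minDepth || d > maxDepth)
        {
            int offset = (y * video.Image.Width + x) * video.Image.BytesPerPixel;
            video.Image.Bits[offset + 0] = 255; // B
            video.Image.Bits[offset + 1] = 255; // G
            video.Image.Bits[offset + 2] = 255; // R
        }
    }
}
```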

How to process the images

It depends on your needs. As you can see, in my example I chose the "iterative" method, because it is very simple to write and very clear to read. On the other hand, its performance is poor.

Since a depth frame can be treated as a grayscale image, you can achieve the same effects as in my example using filters found in any good image processing library - threshold and mask.

First, you have to decide what you really need. If you are building an augmented reality application, you will need high quality video and fast image blending. If you will only analyse part of the image from time to time (face recognition, for example), you still need hi-res images but not a high fps, which means you can skip processing every frame in the event handler and get frames on demand instead.
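Polling on demand instead of handling every event might look like this sketch (the argument to GetNextFrame is a timeout in milliseconds):

```csharp
// Block for up to 100 ms waiting for the next video frame.
ImageFrame frame = nui.VideoStream.GetNextFrame(100);
if (frame != null) // null here assumes no frame arrived in time - worth verifying against the SDK docs
{
    // Analyse the frame at your own pace, e.g. run face detection here.
}
```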

As you can see from the previous sections, the Kinect SDK provides images in a very raw format. This means they can easily be converted to anything you need: most graphics libraries can take this raw byte array and create their internal image representation in the most efficient way.
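For example, a 32-bit XRGB PlanarImage maps directly onto a WPF BitmapSource (a sketch for ImageType.Color frames):

```csharp
using System.Windows.Media;
using System.Windows.Media.Imaging;

BitmapSource ToBitmapSource(PlanarImage img)
{
    // Bgr32 matches the byte order of ImageType.Color frames (B, G, R, X).
    return BitmapSource.Create(
        img.Width, img.Height,
        96, 96,                         // DPI
        PixelFormats.Bgr32,
        null,                           // no palette
        img.Bits,
        img.Width * img.BytesPerPixel); // stride in bytes
}
```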

Points of interest

If your needs are mostly image processing with depth map aid, you should stop here and look for some image processing library.

But if you really want to get the most out of the Kinect NUI, move on to the next big thing: the skeleton tracking engine.

History

  • 2012-01-06: Initial submission.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)

About the Author

Jarek Kruza
Flextronics
Poland
Programming since the age of 10, with his first commercial app sold at 16.
 
Holds a Bachelor's degree in Computer Science and has worked as a Linux administrator and software developer.
 
Currently slightly over 30, working as an IT Project Manager for Flextronics.
 
Still coding for fun and/or money.

Article Copyright 2012 by Jarek Kruza