Over the past few days, I have received many requests about Kinect color-to-depth pixel mapping. As you probably already know, the Kinect streams are not aligned out of the box. The RGB and depth cameras have different resolutions and their points of view are slightly shifted. As a result, more and more people have been asking me (either in the blog comments or by email) how to align the color and depth streams properly. The most common application they want to build is a cool green-screen effect, just like the following video:
View on YouTube
As you can see, the pretty girl is tracked by the Kinect sensor and the background is totally removed. I can replace the background with a solid color, a gradient fill, or even a random image!
Nice, huh? So, I created a simple project that maps a player’s depth values to the corresponding color pixels. This way, I could remove the background and replace it with something else. The source code is hosted on GitHub as a separate project. It is also part of Vitruvius.
Read the tutorial to understand how Kinect coordinate mapping works and create the application by yourself.
How background removal works
When we refer to “background removal”, we mean keeping the pixels that form the user and removing everything else. The depth camera of the Kinect sensor comes in handy for determining which pixels belong to a user’s body. However, we need the RGB color values, not the depth distances, so we have to work out which RGB values correspond to the user’s depth values. Confused? Please don’t be.
Using Kinect, each point in space has the following information:
- Color value: Red + Green + Blue
- Depth value: The distance from the sensor
The depth camera gives us the depth value and the RGB camera provides us with the color value. We map those values using CoordinateMapper. CoordinateMapper is a useful Kinect property that determines which color values correspond to each depth value (and vice versa).
Please note that the RGB frames (1920×1080) are wider than the depth frames (512×424). As a result, not every color pixel has a corresponding depth mapping. However, body tracking is performed primarily using the depth sensor, so no need to worry about missing values.
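To make the size difference concrete, here is a small sketch of the buffer sizes involved. The FrameSizes class and its method names are made up for illustration and are not part of the Kinect SDK or Vitruvius. The resolutions are the ones quoted above, and the 4 bytes per pixel assume a BGRA layout.

```csharp
// Hypothetical helper for illustration only; not part of the Kinect SDK or Vitruvius.
public static class FrameSizes
{
    // Assumption: each color pixel takes 4 bytes (blue, green, red, alpha).
    public const int BytesPerPixel = 4;

    // Number of pixels in a frame of the given dimensions.
    public static int PixelCount(int width, int height)
    {
        return width * height;
    }

    // Number of bytes needed to store a BGRA frame of the given dimensions.
    public static int BgraByteCount(int width, int height)
    {
        return width * height * BytesPerPixel;
    }
}
```

With these numbers, FrameSizes.PixelCount(512, 424) gives 217,088 depth pixels, while FrameSizes.BgraByteCount(1920, 1080) gives 8,294,400 bytes for one color frame. An output image sized like the depth frame needs FrameSizes.BgraByteCount(512, 424), or 868,352 bytes.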
In the GitHub project I shared, you can use the following code to remove the background and get the green-screen effect:
void Reader_MultiSourceFrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
{
    var reference = e.FrameReference.AcquireFrame();
    if (reference == null) return; // the frame may no longer be available
    using (var colorFrame = reference.ColorFrameReference.AcquireFrame()) // frames must be disposed after use
    using (var depthFrame = reference.DepthFrameReference.AcquireFrame())
    using (var bodyIndexFrame = reference.BodyIndexFrameReference.AcquireFrame())
    {
        if (colorFrame != null && depthFrame != null && bodyIndexFrame != null)
        {
            camera.Source = _backgroundRemovalTool.GreenScreen(colorFrame, depthFrame, bodyIndexFrame);
        }
    }
}
As you can see, the whole magic relies on a single class, BackgroundRemovalTool. To remove the background, we need the color frame data, the depth frame data and, of course, the body data.
The BackgroundRemovalTool class has the following fields:
- WriteableBitmap _bitmap: The final image with the background removed
- ushort[] _depthData: The depth values of a depth frame
- byte[] _bodyData: The information about the bodies standing in front of the sensor
- byte[] _colorData: The RGB values of a color frame
- byte[] _displayPixels: The RGB values of the mapped frame
- ColorSpacePoint[] _colorPoints: The color points that the depth values map to
The WriteableBitmap serves as the image source for the final bitmap image. The CoordinateMapper is passed as a parameter from the connected Kinect sensor.
Let’s head to the GreenScreen method. Firstly, we need to get the dimensions of each frame (remember, frames have different widths and heights):
int colorWidth = colorFrame.FrameDescription.Width;
int colorHeight = colorFrame.FrameDescription.Height;
int depthWidth = depthFrame.FrameDescription.Width;
int depthHeight = depthFrame.FrameDescription.Height;
int bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
int bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
Then, we need to initialize the arrays. Initialization happens only once, so that we avoid allocating memory every time a new frame arrives.
if (_bitmap == null)
{
    _depthData = new ushort[depthWidth * depthHeight];
    _bodyData = new byte[depthWidth * depthHeight];
    _colorData = new byte[colorWidth * colorHeight * BYTES_PER_PIXEL];
    _displayPixels = new byte[depthWidth * depthHeight * BYTES_PER_PIXEL];
    _colorPoints = new ColorSpacePoint[depthWidth * depthHeight];
    _bitmap = new WriteableBitmap(depthWidth, depthHeight, DPI, DPI, FORMAT, null);
}
We now need to populate the arrays with new frame data. Before doing so, we check that the array lengths correspond to the dimensions we found earlier:
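if (((depthWidth * depthHeight) == _depthData.Length) &&
    ((colorWidth * colorHeight * BYTES_PER_PIXEL) == _colorData.Length) &&
    ((bodyIndexWidth * bodyIndexHeight) == _bodyData.Length))
{
    depthFrame.CopyFrameDataToArray(_depthData);
    bodyIndexFrame.CopyFrameDataToArray(_bodyData);
    if (colorFrame.RawColorImageFormat == ColorImageFormat.Bgra)
        colorFrame.CopyRawFrameDataToArray(_colorData); // already BGRA, copy as-is
    else
        colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
    // The mapping, the pixel loop and the bitmap update shown next also belong inside this block.
}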
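It’s time to use the coordinate mapper now. The coordinate mapper maps each depth value to a point in the color frame and stores the results in the _colorPoints array. In the Kinect SDK, this is a single call to MapDepthFrameToColorSpace (the field name _coordinateMapper is assumed here). The output array is also cleared, so that pixels without a body stay transparent:
_coordinateMapper.MapDepthFrameToColorSpace(_depthData, _colorPoints);
Array.Clear(_displayPixels, 0, _displayPixels.Length);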
That’s it! The mapping has been done. What we have to do is specify which pixels belong to human bodies and add them to the _displayPixels array. So, we loop through the depth values and update the _displayPixels array accordingly.
for (int y = 0; y < depthHeight; ++y)
{
    for (int x = 0; x < depthWidth; ++x)
    {
        int depthIndex = (y * depthWidth) + x;
        byte player = _bodyData[depthIndex];
        // 0xff means that the pixel does not belong to any body.
        if (player != 0xff)
        {
            ColorSpacePoint colorPoint = _colorPoints[depthIndex];
            // Round to the nearest color pixel.
            int colorX = (int)Math.Floor(colorPoint.X + 0.5);
            int colorY = (int)Math.Floor(colorPoint.Y + 0.5);
            if ((colorX >= 0) && (colorX < colorWidth) && (colorY >= 0) && (colorY < colorHeight))
            {
                int colorIndex = ((colorY * colorWidth) + colorX) * BYTES_PER_PIXEL;
                int displayIndex = depthIndex * BYTES_PER_PIXEL;
                _displayPixels[displayIndex + 0] = _colorData[colorIndex];
                _displayPixels[displayIndex + 1] = _colorData[colorIndex + 1];
                _displayPixels[displayIndex + 2] = _colorData[colorIndex + 2];
                _displayPixels[displayIndex + 3] = 0xff; // fully opaque
            }
        }
    }
}
This would result in a bitmap with transparent pixels for a background and colored pixels for the human bodies. Finally, here is how the WriteableBitmap is updated:
_bitmap.Lock();
Marshal.Copy(_displayPixels, 0, _bitmap.BackBuffer, _displayPixels.Length);
_bitmap.AddDirtyRect(new Int32Rect(0, 0, depthWidth, depthHeight));
_bitmap.Unlock();
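If you want to experiment with the per-pixel mapping loop without a sensor attached, here is a minimal sketch that works on plain arrays. Everything in it is hypothetical: the GreenScreenSketch class, the Compose method, and the two float arrays that stand in for the X and Y of each ColorSpacePoint. It assumes a BGRA layout (4 bytes per pixel), and it adds one guard that the loop above does not have: the range check is written so that NaN or infinite coordinates are rejected before the cast to int.

```csharp
using System;

// Hypothetical, SDK-free sketch of the per-pixel loop; not the Vitruvius implementation.
public static class GreenScreenSketch
{
    public const int BytesPerPixel = 4; // assumption: BGRA
    public const byte NoBody = 0xff;    // body index value for "no body at this pixel"

    // bodyIndex:      one byte per depth pixel
    // colorX, colorY: mapped color coordinates per depth pixel
    //                 (stand-ins for ColorSpacePoint.X and ColorSpacePoint.Y)
    // colorData:      BGRA bytes of the color frame
    public static byte[] Compose(
        byte[] bodyIndex, float[] colorX, float[] colorY,
        byte[] colorData, int colorWidth, int colorHeight)
    {
        // A new array is zero-filled, so every pixel starts out transparent.
        var display = new byte[bodyIndex.Length * BytesPerPixel];

        for (int depthIndex = 0; depthIndex < bodyIndex.Length; depthIndex++)
        {
            if (bodyIndex[depthIndex] == NoBody) continue;

            // Round to the nearest color pixel.
            double rx = Math.Floor(colorX[depthIndex] + 0.5);
            double ry = Math.Floor(colorY[depthIndex] + 0.5);

            // Negated range check: NaN and infinities fail it and are skipped.
            if (!(rx >= 0 && rx < colorWidth && ry >= 0 && ry < colorHeight)) continue;

            int colorIndex = (((int)ry * colorWidth) + (int)rx) * BytesPerPixel;
            int displayIndex = depthIndex * BytesPerPixel;

            display[displayIndex + 0] = colorData[colorIndex + 0];
            display[displayIndex + 1] = colorData[colorIndex + 1];
            display[displayIndex + 2] = colorData[colorIndex + 2];
            display[displayIndex + 3] = 0xff; // fully opaque
        }

        return display;
    }
}
```

For example, take a two-pixel depth frame where only the first pixel belongs to a body and maps to color coordinates (1.2, 0), and a 2×1 color frame with the bytes { 10, 20, 30, 255, 40, 50, 60, 255 }. The result is { 40, 50, 60, 255, 0, 0, 0, 0 }: the body pixel takes the second color pixel, and the other pixel stays transparent.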
Back in the XAML code, you can change the background of the Grid (or whatever element sits behind the Image element) to get the background of your choice. For example, this code produces a solid green background:
<Grid>
  <Grid.Background><SolidColorBrush Color="Green" /></Grid.Background>
  <Image Name="camera" />
</Grid>
While this code results in a football stadium background:
<Grid>
  <Grid.Background><ImageBrush ImageSource="/Soccer.jpg" /></Grid.Background>
  <Image Name="camera" />
</Grid>
Enjoy and share if you like it!
View the complete source code.
The BackgroundRemovalTool is part of Vitruvius, an open-source library that will speed up the development of your Kinect projects. Vitruvius supports both version 1 and version 2 sensors, so you can use it for any kind of Kinect project. Download it and give it a try.
PS 2: New Kinect book - 20% off
This blog post is part of a new ebook I am publishing in a few days. The book is an in-depth developer guide about Kinect, using simple language and step-by-step examples. You'll learn usability tips, performance tricks and best practices for implementing robust Kinect apps. Please meet Kinect Essentials, the essence of my 3 years of teaching, writing and developing for the Kinect platform. Oh, did I mention that you'll get a 20% discount if you simply subscribe now? Hurry up!
Subscribe here for 20% off
The post Background removal using Kinect 2 (green screen effect) appeared first on Vangos Pterneas.