How to Use Kinect HD Face

Vangos Pterneas

Jun 6, 2015

CPOL

In my previous article, I demonstrated how you can access the 2D positions of the eyes, nose, and mouth using Microsoft’s Kinect Face API. The Face API provides some basic, yet impressive, functionality: with just a few lines of C# code, we can detect the X and Y coordinates of 5 facial points (the two eyes, the nose, and the two mouth corners) and identify a few facial expressions. This is pretty cool for basic applications, like Augmented Reality games, but what if you need more advanced functionality from your app?

Recently, we decided to extend our Kinetisense project with advanced facial capabilities. More specifically, we needed access to more facial points, including the lips, jaw, and cheeks. Moreover, we needed the X, Y, and Z position of each point in 3D space. The Kinect Face API could not help us, since it was too limited for our scope of work.

Thankfully, Microsoft has implemented a second Face API within the latest Kinect SDK v2. This API is called HD Face and is designed to blow your mind!

At the time of writing, HD Face is the most advanced face tracking library out there. Not only does it detect the human face, but it also allows you to access over 1,000 facial points in the 3D space. Real-time. Within a few milliseconds. Not convinced? I developed a basic program that displays all of these points. Creepy, huh?!

In this article, I am going to show you how to access all these points and display them on a canvas. I’ll also show you how to use Kinect HD Face efficiently and get the most out of it.

Prerequisites

Kinect for Xbox One (v2) sensor with the Kinect Adapter for Windows (or a Kinect for Windows v2 sensor)
Kinect for Windows v2 SDK
Windows 8.1 or higher
Visual Studio 2013 or higher
A dedicated USB 3 port

Source Code

Download the source code from GitHub
- .NET 4.5
- WinRT

Tutorial

Although Kinect HD Face is truly powerful, you’ll notice that it’s also poorly documented. The lack of documentation makes it hard to understand what’s going on inside the API. This is because HD Face is meant to provide advanced, low-level functionality: it gives us access to raw facial data, and we, the developers, are responsible for interpreting that data properly and using it in our applications. Let me guide you through the whole process.

Step 1: Create a New Project

Let’s start by creating a new project. Launch Visual Studio and select File -> New Project. Select C# as your programming language and choose either the WPF or the Windows Store app template. Give your project a name and start coding.

Step 2: Import the Required Assemblies

To use Kinect HD Face, we need to import 2 assemblies: Microsoft.Kinect.dll and Microsoft.Kinect.Face.dll. Right-click your project name and select “Add Reference”. Navigate to the Extensions tab and select both assemblies.

If you are using WinRT, Microsoft.Kinect is called WindowsPreview.Kinect.

Step 3: XAML

The user interface is pretty simple. Open your MainWindow.xaml or MainPage.xaml file and place a drawing canvas within your grid. Preferably, place the canvas inside a Viewbox element, which scales the Canvas proportionally as the window size changes, with no additional effort on your part.

<Viewbox Grid.Row="1">
      <Canvas Name="canvas" Width="512" Height="424" />
</Viewbox>
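The canvas is 512 x 424 because that is the resolution of the Kinect v2 depth frame, so a point mapped into depth space lands directly on canvas coordinates. The Viewbox then scales the whole canvas uniformly (its default Stretch is Uniform). As a minimal sketch of that arithmetic outside WPF, for example if you ever draw without a Viewbox, here is the scale factor it would pick; the ViewboxMath class is a hypothetical helper, not part of WPF or the Kinect SDK.

```csharp
using System;

// Hypothetical helper: reproduces the uniform scaling that a WPF Viewbox
// (default Stretch="Uniform") applies to the fixed 512x424 canvas.
public static class ViewboxMath
{
    public const double CanvasWidth = 512.0;   // Kinect v2 depth frame width, in pixels
    public const double CanvasHeight = 424.0;  // Kinect v2 depth frame height, in pixels

    // Largest scale factor that fits the whole canvas into the available
    // area without changing its aspect ratio.
    public static double UniformScale(double availableWidth, double availableHeight)
    {
        if (availableWidth <= 0 || availableHeight <= 0)
            throw new ArgumentException("Available size must be positive.");

        return Math.Min(availableWidth / CanvasWidth, availableHeight / CanvasHeight);
    }
}
```

For a 1024 x 848 window area the factor is 2.0, so every depth pixel becomes a 2 x 2 block on screen, while a wide but short area is limited by its height.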

Step 4: Declare the Kinect HD Face Objects

After typing the XAML code, open the corresponding C# file (MainWindow.xaml.cs or MainPage.xaml.cs) and import the Kinect namespaces.

For .NET 4.5, import the following:

using Microsoft.Kinect;
using Microsoft.Kinect.Face;

For WinRT, import the following:

using WindowsPreview.Kinect;
using Microsoft.Kinect.Face;

So far, so good. Now, let’s declare the required objects. Like Kinect Face Basics, we need to define the proper body source, body reader, HD face source, and HD face reader:

// Provides a Kinect sensor reference.
private KinectSensor _sensor = null;

// Acquires body frame data.
private BodyFrameSource _bodySource = null;

// Reads body frame data.
private BodyFrameReader _bodyReader = null;

// Acquires HD face data.
private HighDefinitionFaceFrameSource _faceSource = null;

// Reads HD face data.
private HighDefinitionFaceFrameReader _faceReader = null;

// Required to access the face vertices.
private FaceAlignment _faceAlignment = null;

// Required to access the face model points.
private FaceModel _faceModel = null;

// Used to display 1,000 points on screen.
private List<Ellipse> _points = new List<Ellipse>();

Step 5: Initialize Kinect and the Body/Face Sources

As usual, we first need to initialize the Kinect sensor, as well as the frame readers. HD Face works just like any other frame type: we need a face source and a face reader, and the face reader is opened from the face source. We also need a body source and a body reader because each face corresponds to a specific body: you can’t track a face without tracking its body first. The FrameArrived event fires whenever the sensor has new face data for us.

_sensor = KinectSensor.GetDefault();

if (_sensor != null)
{
    // Listen for body data.
    _bodySource = _sensor.BodyFrameSource;
    _bodyReader = _bodySource.OpenReader();
    _bodyReader.FrameArrived += BodyReader_FrameArrived;

    // Listen for HD face data.
    _faceSource = new HighDefinitionFaceFrameSource(_sensor);
    _faceReader = _faceSource.OpenReader();
    _faceReader.FrameArrived += FaceReader_FrameArrived;

    _faceModel = new FaceModel();
    _faceAlignment = new FaceAlignment();

    // Start tracking!
    _sensor.Open();
}

Step 6: Connect a Body with a Face

The next step is a little tricky: this is where we connect a body to a face. How do we do this? Simply by setting the TrackingId property of the face source to the TrackingId of the body.

private void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            Body[] bodies = new Body[frame.BodyCount];
            frame.GetAndRefreshBodyData(bodies);
            
            Body body = bodies.Where(b => b.IsTracked).FirstOrDefault();
            
            if (!_faceSource.IsTrackingIdValid)
            {
                if (body != null)
                {
                    _faceSource.TrackingId = body.TrackingId;
                }
            }
        }
    }
}
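The binding rule in the handler above is simple: attach the face source to the first tracked body, and only if it is not already following someone. To make that rule easy to test without a sensor, here is a minimal sketch using hypothetical stand-in types. BodyInfo and FaceBinding are illustrative only, and treating a TrackingId of 0 as "unbound" is a modeling assumption; the real HighDefinitionFaceFrameSource reports this through IsTrackingIdValid.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's Body class: only the two
// properties that the binding logic reads.
public struct BodyInfo
{
    public bool IsTracked;
    public ulong TrackingId;
}

// Hypothetical stand-in for HighDefinitionFaceFrameSource.
// In this model, a TrackingId of 0 means "not bound to any body".
public class FaceBinding
{
    public ulong TrackingId { get; private set; }

    public bool IsTrackingIdValid { get { return TrackingId != 0; } }

    // Mirrors BodyReader_FrameArrived: bind to the first tracked body,
    // but only if the face source is not already following someone.
    public void Update(IEnumerable<BodyInfo> bodies)
    {
        if (IsTrackingIdValid) return;

        foreach (BodyInfo body in bodies)
        {
            if (body.IsTracked)
            {
                TrackingId = body.TrackingId;
                return;
            }
        }
    }

    // Models the tracked body leaving the scene, so the next frame can rebind.
    public void Reset()
    {
        TrackingId = 0;
    }
}
```

Keeping the binding "sticky" means the face points will not jump from one person to another when several people stand in front of the sensor; a new body is picked only after the current one is lost.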

So, we have connected a face with a body. Let’s access the face points now.

Step 7: Get and Update the Facial Points!

Now, dive into the FaceReader_FrameArrived event handler. We need to check two conditions before accessing any data. First, the frame must not be null. Second, the frame must contain a tracked face. Once both conditions hold, we can call the GetAndRefreshFaceAlignmentResult method, which updates the facial points and properties.

The facial points are given as an array of vertices. A vertex is a 3D point (with X, Y, and Z coordinates) that describes the corner of a geometric triangle, and we can use the vertices to construct a 3D mesh of the face. For the sake of simplicity, we’ll just draw each vertex as a dot, using its X and Y position. Microsoft’s SDK Browser contains a sample that renders a 3D mesh of the face, which you can experiment with.
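To make the vertex/triangle relationship concrete, here is a minimal sketch of how a mesh is described: a list of vertices plus a flat list of indices, where every three consecutive indices name the corners of one triangle. As far as I know, the SDK exposes the face model's triangle list as FaceModel.TriangleIndices, but this sketch uses plain arrays and a hypothetical Vec3 type instead of the SDK types, and computes each triangle's area from two of its edges.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal 3D point type; the SDK uses CameraSpacePoint instead.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    public static Vec3 operator -(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);
    }

    public static Vec3 Cross(Vec3 a, Vec3 b)
    {
        return new Vec3(a.Y * b.Z - a.Z * b.Y,
                        a.Z * b.X - a.X * b.Z,
                        a.X * b.Y - a.Y * b.X);
    }

    public float Length() { return (float)Math.Sqrt(X * X + Y * Y + Z * Z); }
}

public static class FaceMesh
{
    // Every three consecutive entries in 'indices' name the corners of one
    // triangle. Returns the area of each triangle, in the vertices' units squared.
    public static List<float> TriangleAreas(IReadOnlyList<Vec3> vertices, IReadOnlyList<uint> indices)
    {
        if (indices.Count % 3 != 0)
            throw new ArgumentException("Index count must be a multiple of 3.");

        var areas = new List<float>(indices.Count / 3);
        for (int i = 0; i < indices.Count; i += 3)
        {
            Vec3 a = vertices[(int)indices[i]];
            Vec3 b = vertices[(int)indices[i + 1]];
            Vec3 c = vertices[(int)indices[i + 2]];

            // Half the length of the cross product of two edges is the triangle's area.
            areas.Add(0.5f * Vec3.Cross(b - a, c - a).Length());
        }
        return areas;
    }
}
```

Since Kinect vertices are in meters, summing these areas would give a rough surface area of the tracked face in square meters.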

private void FaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null && frame.IsFaceTracked)
        {
            frame.GetAndRefreshFaceAlignmentResult(_faceAlignment);
            UpdateFacePoints();
        }
    }
}

private void UpdateFacePoints()
{
    if (_faceModel == null) return;
    
    var vertices = _faceModel.CalculateVerticesForAlignment(_faceAlignment);
}

As you can see, vertices is a list of CameraSpacePoint structures. CameraSpacePoint is a Kinect-specific structure that describes a point in 3D camera space: its X, Y, and Z coordinates, in meters, relative to the sensor.

Hint: we have already used CameraSpacePoints when we performed body tracking; the Position of every body joint is a CameraSpacePoint.
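Because the coordinates are plain meters, you can measure the face directly. As a minimal sketch, here are two measurements you might compute from the vertices: the distance between two points in millimeters, and the average Z of a set of points as a rough face-to-sensor distance. CameraPoint is a hypothetical stand-in that mirrors the X, Y, and Z fields of CameraSpacePoint, so the code runs without the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors the three fields of the SDK's
// CameraSpacePoint: X, Y and Z, all in meters, relative to the sensor.
public struct CameraPoint
{
    public float X, Y, Z;

    public CameraPoint(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class CameraSpaceMath
{
    // Straight-line distance between two camera-space points,
    // converted from meters to millimeters.
    public static double DistanceMillimeters(CameraPoint a, CameraPoint b)
    {
        double dx = a.X - b.X, dy = a.Y - b.Y, dz = a.Z - b.Z;
        return Math.Sqrt(dx * dx + dy * dy + dz * dz) * 1000.0;
    }

    // Average Z of a set of points: a rough "how far is the face
    // from the sensor" value, in meters.
    public static double AverageDepthMeters(IReadOnlyList<CameraPoint> points)
    {
        if (points.Count == 0) throw new ArgumentException("No points given.");

        double sum = 0;
        foreach (CameraPoint p in points) sum += p.Z;
        return sum / points.Count;
    }
}
```

For example, two points that differ by 3 cm along X and 4 cm along Y are 50 mm apart.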

Step 8: Draw the Points on Screen

And now, the fun part: we have a list of CameraSpacePoint objects and a list of Ellipse objects. We’ll add the ellipses to the canvas and set their exact X and Y positions.

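Caution: the X, Y, and Z coordinates are measured in meters! To find the corresponding pixel values, we’ll use the Coordinate Mapper, a built-in mechanism that converts 3D camera-space positions to 2D positions in the depth (or color) frame. Since our canvas has the same size as the depth frame (512x424), mapping each vertex to depth space gives us its canvas coordinates directly.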
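To give an intuition for what such a mapping does, here is a simplified pinhole-camera projection from meters to pixels. This is NOT how the SDK's CoordinateMapper is implemented: the focal lengths below are hypothetical placeholder values, the sign conventions are a simplification, and the real mapper uses the sensor's calibrated intrinsics and lens-distortion correction. It does show why a point with no valid depth produces infinite coordinates, which is the kind of value the float.IsInfinity check in the drawing code guards against.

```csharp
using System;

// Hypothetical 2D result type, similar in spirit to DepthSpacePoint.
public struct DepthPixel
{
    public float X, Y;

    public DepthPixel(float x, float y) { X = x; Y = y; }
}

// A simplified pinhole projection, NOT the SDK's CoordinateMapper.
public static class PinholeProjection
{
    public const float FocalLengthX = 365f; // hypothetical, in pixels
    public const float FocalLengthY = 365f; // hypothetical, in pixels
    public const float PrincipalX = 256f;   // center of a 512-pixel-wide frame
    public const float PrincipalY = 212f;   // center of a 424-pixel-tall frame

    // Projects a camera-space point (meters, +Y up) to pixel coordinates (+Y down).
    // Points at or behind the sensor cannot be projected, so both coordinates
    // are returned as negative infinity to mark the result as invalid.
    public static DepthPixel Project(float x, float y, float z)
    {
        if (z <= 0f)
            return new DepthPixel(float.NegativeInfinity, float.NegativeInfinity);

        float u = PrincipalX + FocalLengthX * x / z;
        float v = PrincipalY - FocalLengthY * y / z;
        return new DepthPixel(u, v);
    }
}
```

A point straight in front of the sensor lands at the image center (256, 212), and points further away (larger Z) move closer to the center, which is why the face looks smaller when you step back.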

private void UpdateFacePoints()
{
    if (_faceModel == null) return;

    var vertices = _faceModel.CalculateVerticesForAlignment(_faceAlignment);

    if (vertices.Count > 0)
    {
        // Create the ellipses only once, the first time we receive vertices,
        // and reuse them on every subsequent frame.
        if (_points.Count == 0)
        {
            for (int index = 0; index < vertices.Count; index++)
            {
                Ellipse ellipse = new Ellipse
                {
                    Width = 2.0,
                    Height = 2.0,
                    Fill = new SolidColorBrush(Colors.Blue)
                };

                _points.Add(ellipse);
            }

            foreach (Ellipse ellipse in _points)
            {
                canvas.Children.Add(ellipse);
            }
        }

        for (int index = 0; index < vertices.Count; index++)
        {
            CameraSpacePoint vertex = vertices[index];

            // Convert the 3D position (meters) to a 2D depth-frame position (pixels).
            DepthSpacePoint point = _sensor.CoordinateMapper.MapCameraPointToDepthSpace(vertex);

            // Skip vertices that cannot be mapped, but keep updating the rest.
            if (float.IsInfinity(point.X) || float.IsInfinity(point.Y)) continue;

            Ellipse ellipse = _points[index];

            Canvas.SetLeft(ellipse, point.X);
            Canvas.SetTop(ellipse, point.Y);
        }
    }
}

That’s it. Build the application and run it. Stand between 0.5 and 2 meters from the sensor. Here’s the result:

[Screenshot: Kinect HD Face 1]
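One detail worth knowing: Canvas.SetLeft and Canvas.SetTop position an ellipse by its top-left corner, so each 2-pixel dot is drawn slightly off its true position. Here is a minimal sketch that keeps the layout math separate from WPF: it centers a dot of a given diameter on each mapped point and skips invalid points individually. DotLayout and DotPosition are hypothetical helpers, not SDK or WPF types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: where to place one dot on the canvas.
public struct DotPosition
{
    public int Index;    // which vertex this dot belongs to
    public double Left;  // value to pass to Canvas.SetLeft
    public double Top;   // value to pass to Canvas.SetTop
}

public static class DotLayout
{
    // Given mapped pixel coordinates, returns Left/Top values that center
    // a dot of the given diameter on each point. Points with infinite or
    // NaN coordinates are skipped one by one instead of aborting the frame.
    public static List<DotPosition> Layout(IReadOnlyList<float> xs, IReadOnlyList<float> ys, double diameter)
    {
        var result = new List<DotPosition>();
        int count = Math.Min(xs.Count, ys.Count);

        for (int i = 0; i < count; i++)
        {
            float x = xs[i], y = ys[i];
            if (float.IsInfinity(x) || float.IsInfinity(y) || float.IsNaN(x) || float.IsNaN(y))
                continue;

            result.Add(new DotPosition
            {
                Index = i,
                Left = x - diameter / 2.0,
                Top = y - diameter / 2.0
            });
        }
        return result;
    }
}
```

Each returned Index tells you which ellipse in the _points list to move, so the WPF side reduces to two Canvas setter calls per dot.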

But Wait!

OK, we drew the points on screen. So what? Is there a way to actually understand what each point is? How can we identify where the eyes are? How can we detect the jaw? The API has no built-in mechanism that gives us a human-friendly representation of the face data. We need to handle over 1,000 points in 3D space manually!

Don’t worry, though. Each vertex has a specific index number, and knowing the index number, you can deduce which part of the face the vertex belongs to. For example, vertices 1086, 820, 824, 840, 847, 850, 807, 782, and 755 belong to the left eyebrow.

Similarly, you can work out accurate semantics for every point. Just play with the API, experiment with its capabilities, and build your own next-gen facial applications!
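Once you know which indices make up a region, grouping them is straightforward. Here is a minimal sketch that stores the left-eyebrow indices listed above, picks those vertices out of the full vertex list, and computes their centroid (average position), which you could use to place a label or to follow how far the eyebrow moves between frames. FacePoint and FaceRegions are hypothetical helpers; in a real app you would pass in the CameraSpacePoint list returned by CalculateVerticesForAlignment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a camera-space vertex (meters).
public struct FacePoint
{
    public float X, Y, Z;

    public FacePoint(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class FaceRegions
{
    // Vertex indices that belong to the left eyebrow (from the article).
    public static readonly int[] LeftEyebrow = { 1086, 820, 824, 840, 847, 850, 807, 782, 755 };

    // Picks the vertices of one region out of the full vertex list.
    public static List<FacePoint> Select(IReadOnlyList<FacePoint> vertices, IReadOnlyList<int> region)
    {
        var result = new List<FacePoint>(region.Count);
        foreach (int index in region)
        {
            if (index < 0 || index >= vertices.Count)
                throw new ArgumentOutOfRangeException("region", "Vertex index " + index + " is out of range.");

            result.Add(vertices[index]);
        }
        return result;
    }

    // Average position of the region's vertices.
    public static FacePoint Centroid(IReadOnlyList<FacePoint> points)
    {
        if (points.Count == 0) throw new ArgumentException("Region is empty.");

        float x = 0, y = 0, z = 0;
        foreach (FacePoint p in points) { x += p.X; y += p.Y; z += p.Z; }
        return new FacePoint(x / points.Count, y / points.Count, z / points.Count);
    }
}
```

Adding another region (jaw, lips, cheeks) is just another index array, so the semantics you discover by experimenting can live in one place.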

If you wish, you can use the Color, Depth, or Infrared bitmap generator to display the camera view behind the face. Keep in mind that rendering bitmaps and face points at the same time may cause performance issues in your application, so handle with care and don’t overuse your resources.
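One possible way to keep rendering under control is to cap how often the UI is redrawn. Here is a minimal sketch of a time-based throttle that lets at most one redraw through per interval. RenderThrottle is a hypothetical helper, and whether throttling is the right fix for your app depends on where the time is actually spent.

```csharp
using System;

// Hypothetical helper: allows at most one redraw per minimum interval.
public class RenderThrottle
{
    private readonly TimeSpan _minInterval;
    private TimeSpan? _lastRender;

    public RenderThrottle(TimeSpan minInterval)
    {
        _minInterval = minInterval;
    }

    // 'now' is passed in (for example, a Stopwatch's Elapsed value)
    // so the logic can be tested without real timing.
    public bool ShouldRender(TimeSpan now)
    {
        if (_lastRender.HasValue && now - _lastRender.Value < _minInterval)
            return false;

        _lastRender = now;
        return true;
    }
}
```

In FaceReader_FrameArrived, you could keep calling GetAndRefreshFaceAlignmentResult on every frame but call UpdateFacePoints only when ShouldRender returns true, so the face data stays current while the canvas is redrawn less often.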

PS: I’ve been quite silent during the past few months. It was not my intention and I really apologize for that. My team was busy developing the Orthosense app for Intel’s International Competition. We won the GRAND PRIZE and we were featured on USA Today. From now on, I promise I’ll be more active in the Kinect community. Please keep sending me your comments and emails.

Till the next time, enjoy Kinecting!

The post How to use Kinect HD Face appeared first on Vangos Pterneas.