
Sitting posture recognition with Kinect sensor

24 Oct 2011 · CPOL · 6 min read
Recognition of concentrating, non-concentrating, sleeping, and raise-hand postures.

Introduction

Many specialists predict that in the near future, a new revolution in information technologies will occur. This revolution will be connected with new computer abilities to segment, track, and understand human poses, gestures, and emotional expressions. For this, computers must begin to use new types of video sensors that provide 3D video. The Kinect sensor is the first of these new sensors. It has two cameras: a traditional color video camera and an infrared sensor that measures depth, position, and motion. The Kinect sensor started as a sensor for the Xbox 360 game system about a year ago, but almost immediately many software developers began trying to use it for recognition of human poses and gestures. More information can be found at www.kinecthacks.com.

[Figure: the Kinect sensor (KinnectSensor.jpg)]

My article is devoted to research on sitting posture recognition. Sitting posture recognition is based on human skeleton tracking. There are three software packages that can perform human skeleton tracking with the Kinect sensor: the OpenNI/PrimeSense NITE library, the Microsoft Kinect Research SDK, and the libfreenect library. I have used the first two. On their basis, I developed C# WPF applications in which I combined color video streams and skeleton images.

These applications run under Microsoft Windows 7 and .NET Framework 4.0. For their compilation, you need Microsoft Visual Studio 2010. You may find instructions to install the OpenNi/PrimeSense Nite library and the Microsoft Kinect Research SDK at www.kinecthacks.com.

Background

The sitting posture recognition algorithm is based on human skeleton tracking and on obtaining the three coordinates (xs, ys, zs), (xh, yh, zh), and (xk, yk, zk) of the positions of the human Shoulder (denoted as S), Hip (denoted as H), and Knee (denoted as K).

A sitting posture is related to the angle a between the line HK (from hip to knee) and the line HS (from hip to shoulder).

We distinguish the left body-side angle a, between the "center hip to left knee" vector and the "center hip to center shoulder" vector, and the right body-side angle a, between the "center hip to right knee" vector and the "center hip to center shoulder" vector.

From angle a and the hand's position, the human sitting posture can be determined and classified as one of four specified types - sleeping, concentrating, raising hand, and non-concentrating - as given in the table below.

Angle a (degrees)   Hand posture   Sitting posture
0 ~ 40              down           sleeping
40 ~ 80             down           non-concentrating
80 ~ 100            down           concentrating
                    up             raising hand
100 ~ 180           down           non-concentrating
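
The classification above can be sketched in C#: angle a is obtained from the dot product of the hip-to-knee and hip-to-shoulder vectors, and the table is applied to the result. This is a minimal illustration with names of my own choosing (PostureClassifier, AngleBetween, Classify), not the article's actual source; here any hand-up posture is treated as raising a hand, which is one possible reading of the table.

```csharp
using System;

public static class PostureClassifier
{
    // Angle in degrees between two 3D vectors, computed via the dot product.
    public static double AngleBetween(double[] u, double[] v)
    {
        double dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
        double lu = Math.Sqrt(u[0] * u[0] + u[1] * u[1] + u[2] * u[2]);
        double lv = Math.Sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        return Math.Acos(dot / (lu * lv)) * 180.0 / Math.PI;
    }

    // Maps angle a (hip->knee vs. hip->shoulder) and hand state to a posture
    // label, following the table above.
    public static string Classify(double a, bool handUp)
    {
        if (handUp) return "raising hand";
        if (a < 40) return "sleeping";
        if (a < 80) return "non-concentrating";
        if (a <= 100) return "concentrating";
        return "non-concentrating";
    }
}
```

For example, a roughly upright torso over horizontal thighs gives a near 90 degrees, which classifies as concentrating when the hands are down.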

Using the Code

I had two problems combining a color video stream and a skeleton image.

The first problem was how to place them simply in one control in a window. For this, I used a simple WPF form in both applications, containing a StatusBar control and a Grid panel. The Grid panel contains an Image control and a Canvas control of the same size.

XML
<Window x:Class="RecognitionPose.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="User tracking with Microsoft SDK" Height="600" 
        Width="862" Loaded="Window_Loaded" 
        DataContext="{Binding}">
    <DockPanel LastChildFill="True">
        
        <StatusBar Name="statusBar" 
             MinHeight="40" DockPanel.Dock="Bottom">
            <StatusBarItem>
                <TextBlock Name="textBlock" 
                   Background="LemonChiffon" 
                   FontSize='10'> Ready </TextBlock>
            </StatusBarItem>
        </StatusBar>
        <Grid DockPanel.Dock="Top">
            <Image Name="imgCamera" Width="820" 
               ClipToBounds="True" Margin="10,0" />
            <Canvas Width="820" Height="510" 
               Name="skeleton"   ClipToBounds="True"/>
        </Grid>
    </DockPanel>
</Window>

The second problem of working with both OpenNI/PrimeSense Nite and the Microsoft SDK is that the events of refreshing video frames and skeleton frames occur non-synchronously.

To solve this, in the Microsoft SDK case, I call the main method RecognizePose of my Recognition class in the SkeletonFrameReady event handler, after the imgCamera and skeleton controls are refreshed. The VideoFrameReady event handler synchronizes with it simply by copying the current video frame into a temporary PlanarImage variable:

C#
planarImage = ImageFrame.Image;

and then copying this temp variable to imgCamera.Source in the SkeletonFrameReady event handler:

C#
imgCamera.Source = BitmapSource.Create(planarImage.Width, planarImage.Height,
  194,194,PixelFormats.Bgr32, null, planarImage.Bits, 
  planarImage.Width * planarImage.BytesPerPixel);
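
The underlying pattern is a shared "latest frame" slot: the video handler publishes each new frame, and the skeleton handler reads whatever is current when it fires. The sketch below illustrates this with a generic, lock-protected slot; the type and member names are my own stand-ins (the real code stores the SDK's PlanarImage and runs inside the Kinect event handlers).

```csharp
using System;

// Lock-protected holder for the most recent frame. Publish() is called
// from the video-frame handler; Read() from the skeleton-frame handler.
public sealed class LatestFrameSlot<T> where T : class
{
    private readonly object gate = new object();
    private T latest;

    // Overwrites the slot with the newest frame.
    public void Publish(T frame) { lock (gate) { latest = frame; } }

    // Returns the most recent frame, or null before the first one arrives.
    public T Read() { lock (gate) { return latest; } }
}
```

Because both Kinect events are raised asynchronously, the lock guarantees the skeleton handler never observes a partially updated reference.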

For the OpenNI/PrimeSense NITE case, I use the NuiVision library (http://www.codeproject.com/Articles/169161/Kinect-and-WPF-Complete-body-tracking), written by Vangos Pterneas, to synchronize the video frame and skeleton recognition events. I call the RecognizePose method in the UsersUpdated event handler of this library.

For sitting posture recognition, the main problem was to find the distance and angle of the human relative to the Kinect sensor at which recognition is stable. For this purpose, I added five parameters to the application settings to control the algorithm's behavior:

  • isDebug - if true, shows information about the current human location on the status bar;
  • confidenceAngle - controls the allowed difference between the left and right body-side angles a; if the difference exceeds this level, we assume the recognition isn't stable;
  • standPoseFactor - distinguishes the sitting and standing poses; if the current human height multiplied by this factor is more than the initial human height in the standing pose, we assume the current pose is also a standing pose;
  • isAutomaticChoiceAngle - chooses between automatically taking angle a from the body side nearest to the camera (true) and calculating angle a as the average of the left and right body-side angles (false);
  • shiftAngle - a shift angle subtracted from angle a to remove skeleton recognition error.
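
The way these parameters combine the two body-side angles might look like the following sketch. This is my reconstruction from the descriptions above, not the article's source; the method name Select and the use of depth to pick the side nearest the camera are assumptions.

```csharp
using System;

public static class AngleSelector
{
    // Combines the left and right body-side angles into one angle a,
    // or returns null when recognition is considered unstable.
    public static double? Select(double leftAngle, double rightAngle,
                                 double leftDepth, double rightDepth,
                                 double confidenceAngle,
                                 bool isAutomaticChoiceAngle,
                                 double shiftAngle)
    {
        // confidenceAngle: sides disagreeing by more than this means
        // the recognition isn't stable.
        if (Math.Abs(leftAngle - rightAngle) > confidenceAngle) return null;

        // isAutomaticChoiceAngle: either take the side nearest to the
        // camera (smaller depth) or average the two sides.
        double a = isAutomaticChoiceAngle
            ? (leftDepth <= rightDepth ? leftAngle : rightAngle)
            : (leftAngle + rightAngle) / 2.0;

        // shiftAngle: correction subtracted to remove skeleton
        // recognition error.
        return a - shiftAngle;
    }
}
```

With the recommended settings below (confidenceAngle=50, shiftAngle=20), side angles of 90 and 100 degrees would pass the stability check and yield a = 70 in automatic mode.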

I found that the most stable sitting recognition occurs when these parameters have these values:

  • confidenceAngle=50 degrees;
  • standPoseFactor=1.1;
  • isAutomaticChoiceAngle=true;
  • shiftAngle=20.

The Kinect sensor is located on the floor, the distance between the Kinect sensor and the sitting human is about 2 meters, and the human body is turned at a 45-degree angle relative to the sensor.

[Figure: the application tracking a sitting person (NiteOlga.jpg)]

The advantage of this sitting location is that the Kinect sensor can constantly track the parts of the human body that are necessary for recognition:

  • two knees;
  • one hip;
  • two shoulders;
  • two hands;
  • the head.

For other human locations, this isn't so. For example, in a frontal location, the sensor does not reliably track the hip; in a profile location, the sensor tracks only one side of the body, right or left.

Points of Interest

I made two movies about using these two applications.

From the movies, we can conclude that recognition works well with both software packages. However, the applications could be improved significantly by extending the zone of sitting locations where recognition is stable. For this, we would need not one but two or more Kinect sensors.

I think these applications may be used in any area where it is necessary to monitor human behavior in a sitting pose. For cases when the human state becomes non-concentrating or sleeping, the applications may be enhanced with feedback that sends an alarm, alert, or emergency signal. They could also be used in universities to collect statistics about student activity during seminars and labs: the applications would calculate the average time a student spends concentrating or non-concentrating during a seminar and the number of times they raise their hand, and the professor could take these statistics into account in individual work with the student.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Altair
Russian Federation
Ph.D. Image processing, neural nets, C++, C#, OpenCV, ASP.NET MVC, JScript, Qt, SQL, Kinect, Silverlight
