This page explains the design decisions I've made in the process of building things for the TiLAR project.

Kinect Library

Interface Library: OpenNI/NITE

There are two main libraries that will interface with the Kinect and provide skeletal data. The first is the OpenNI/NITE stack created by PrimeSense, and the second is the Microsoft Kinect SDK.

When this project was started, OpenNI was chosen, if only because it was at the time the only such library available. There were rumors of an official SDK from Microsoft, but it went unconfirmed for several months. Since the release of the Kinect SDK several months ago, I've taken a few looks at it to see whether it would work for our purposes, and believe that it could. However, as there is no pressing feature that would yet make us switch, I haven't focused on it.

OpenNI/NITE

OpenNI is an open source stack for interfacing with generic Natural Interaction devices and protocols, originally created and released by PrimeSense. NITE is the closed source OpenNI module that handles skeletal tracking, gesture detection, and other related features. This stack is complemented by avin2's fork of the PrimeSense sensor driver, which is compatible with the Kinect.

Pros

  • Cross platform (Windows, Linux, Mac OSX)
  • C, with C++ wrapper
  • Simple support for gesture detection
  • Can track several people simultaneously (theoretically - I haven't really tested this)
  • Has been around longer; many projects are already based on it

Cons

  • Calibration is not as flexible
  • Skeletal tracking is not as fine-grained

Microsoft Kinect SDK

The official Kinect SDK for developing Windows applications.

Pros

  • Very accurate skeleton tracking (can even track wrists, for example)
  • C++, .NET
  • Supported by Microsoft
  • Access to microphone and motor (with some advanced noise cancellation technology to boot)

Cons

  • Windows 7 support only
  • Hasn't been around for as long
  • From what I've seen on blogs and in other people's write-ups, it uses more processing power

Design

Component Model

This model is defined by a base entity that contains a set of components. The base entity in this case is the Kinect class, which provides the low-level OpenNI interfaces to the Kinect by way of the depth image, RGB image, user tracking, and skeleton data. The final components are KinectImage and KinectSkeleton, which provide the high-level abstraction that external classes can use.

Breaking this functionality out into components simplifies adding new features, since a clean new component can be created with the desired abilities. Each component has direct access to the raw data provided by OpenNI and NITE through the Kinect entity, but doesn't need to maintain that data itself. This allows multiple components to work off the same data set without repeated calculations or duplicated storage.
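
As a rough sketch (assuming the OpenNI 1.x C++ wrapper; the KinectComponent base class and its Update() hook are illustrative names, not the actual TiLAR code), the structure looks something like this:

  #include <XnCppWrapper.h>
  #include <vector>

  class Kinect;  // forward declaration

  // Base class for components; each one reads shared data through the entity.
  class KinectComponent {
  public:
      explicit KinectComponent(Kinect& kinect) : m_kinect(kinect) {}
      virtual ~KinectComponent() {}
      virtual void Update() = 0;  // called once per frame with fresh data
  protected:
      Kinect& m_kinect;
  };

  // The base entity: owns the OpenNI context/generators and the raw data.
  class Kinect {
  public:
      bool Init() {
          if (m_context.Init() != XN_STATUS_OK) return false;
          if (m_depth.Create(m_context) != XN_STATUS_OK) return false;
          if (m_image.Create(m_context) != XN_STATUS_OK) return false;
          if (m_user.Create(m_context) != XN_STATUS_OK) return false;
          return m_context.StartGeneratingAll() == XN_STATUS_OK;
      }

      void AddComponent(KinectComponent* component) {
          m_components.push_back(component);
      }

      // Pump one frame of data, then let every component process it.
      void Update() {
          m_context.WaitAndUpdateAll();
          for (size_t i = 0; i < m_components.size(); ++i)
              m_components[i]->Update();
      }

      // Raw OpenNI accessors that components like KinectImage and
      // KinectSkeleton build their high-level abstractions on.
      xn::DepthGenerator& Depth() { return m_depth; }
      xn::ImageGenerator& Image() { return m_image; }
      xn::UserGenerator&  User()  { return m_user; }

  private:
      xn::Context        m_context;
      xn::DepthGenerator m_depth;
      xn::ImageGenerator m_image;
      xn::UserGenerator  m_user;
      std::vector<KinectComponent*> m_components;  // not owned here
  };

Adding new functionality, say gesture detection, would then just mean writing another KinectComponent subclass and registering it with AddComponent().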

Pros

  • Easy to add new functionality, as functionality is segregated
  • Data and processing are not duplicated, since the raw data lives in a single shared object

Alternatives/Previous Designs

Originally, I had one large object that handled skeleton tracking, pose detection, hand tracking, image generation, and everything else. It quickly became too large to extend. Code that modified the tracking state, stored the last hand seen, or enabled and disabled pose watching all seemed to conflict. But I also didn't want every object that needed access to the Kinect to have to create its own set of Kinect callbacks and storage objects.

Workflow

The current version of the Kinect library is designed to make the workflow fluid.

Calibration

Calibration is now done online, rather than at a dedicated time as it was originally. The user can still calibrate whenever needed, but the workflow feels more fluid when tracking is lost and found repeatedly in a short time, and the process better reflects what is actually going on. The drawback is that there is no longer an easy way to indicate the status of calibration, or to alert the user to what they need to do to calibrate.
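
A rough sketch of the online flow against the OpenNI 1.x callback API ("Psi" is OpenNI's name for what I call the 'Y' pose; g_user is a global xn::UserGenerator kept only for brevity):

  #include <XnCppWrapper.h>

  xn::UserGenerator g_user;  // assumed initialized elsewhere

  void XN_CALLBACK_TYPE OnNewUser(xn::UserGenerator& gen, XnUserID id, void*) {
      // Watch for the calibration pose as soon as a user appears, so
      // calibration can happen at any time rather than in a fixed phase.
      gen.GetPoseDetectionCap().StartPoseDetection("Psi", id);
  }

  void XN_CALLBACK_TYPE OnPoseDetected(xn::PoseDetectionCapability& cap,
                                       const XnChar*, XnUserID id, void*) {
      cap.StopPoseDetection(id);
      g_user.GetSkeletonCap().RequestCalibration(id, TRUE);
  }

  void XN_CALLBACK_TYPE OnCalibrationEnd(xn::SkeletonCapability& cap,
                                         XnUserID id, XnBool success, void*) {
      if (success)
          cap.StartTracking(id);  // begin streaming skeleton data
      else
          g_user.GetPoseDetectionCap().StartPoseDetection("Psi", id);  // retry
  }

  void RegisterCallbacks() {
      XnCallbackHandle h1, h2, h3;
      g_user.RegisterUserCallbacks(OnNewUser, NULL, NULL, h1);
      g_user.GetPoseDetectionCap().RegisterToPoseCallbacks(OnPoseDetected,
                                                           NULL, NULL, h2);
      g_user.GetSkeletonCap().RegisterCalibrationCallbacks(NULL,
                                                           OnCalibrationEnd,
                                                           NULL, h3);
  }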

A new feature is saved calibrations. On the first run of the program, the first calibration data is saved to disk. All subsequent calibrations use this saved base, which makes calibration much quicker. The change was extremely noticeable: not only does recalibration happen faster, but calibrating from a file no longer requires the user to make a 'Y' pose. The downside is that in the rare case when a calibration file is not present, the user must know to make the 'Y' pose, and must wait longer for the calibration to complete.
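
OpenNI 1.x exposes save/load calls on the skeleton capability that support exactly this; a minimal sketch, assuming that API (the file name here is illustrative):

  #include <XnCppWrapper.h>
  #include <XnOS.h>  // xnOSDoesFileExist

  static const XnChar* CALIBRATION_FILE = "calibration.bin";

  // Try to reuse a saved calibration; returns false so the caller can fall
  // back to the pose-based flow when no usable file exists.
  bool TryLoadCalibration(xn::UserGenerator& user, XnUserID id) {
      XnBool exists = FALSE;
      xnOSDoesFileExist(CALIBRATION_FILE, &exists);
      if (!exists) return false;
      if (user.GetSkeletonCap().LoadCalibrationDataFromFile(id, CALIBRATION_FILE)
              != XN_STATUS_OK)
          return false;
      return user.GetSkeletonCap().StartTracking(id) == XN_STATUS_OK;
  }

  // After the first successful pose-based calibration, persist it for reuse.
  void SaveCalibration(xn::UserGenerator& user, XnUserID id) {
      user.GetSkeletonCap().SaveCalibrationDataToFile(id, CALIBRATION_FILE);
  }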

Another design addition was the ability to reset the targeting if it is ever lost. This was added because, during initial testing, there were many times where it was unclear who the system was trying to target, where it was difficult to target a second person when one person had already been located, or where tracking would begin with the skeleton in a disjointed state. A 'kill' switch in the targeting makes it easy to reset things when they aren't as expected. Combined with online calibration and saved calibrations, this greatly improves usability.
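
A minimal sketch of what the 'kill' switch can do, again assuming the OpenNI 1.x API (the real cleanup may involve more state):

  #include <XnCppWrapper.h>

  // Drop all current tracking/calibration state and start targeting fresh.
  void ResetTargeting(xn::UserGenerator& user) {
      XnUserID ids[16];
      XnUInt16 count = 16;
      user.GetUsers(ids, count);  // fills ids; count becomes the actual number
      for (XnUInt16 i = 0; i < count; ++i) {
          // Reset discards the calibration and stops tracking for this user
          user.GetSkeletonCap().Reset(ids[i]);
          user.GetPoseDetectionCap().StartPoseDetection("Psi", ids[i]);
      }
  }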

In the original incarnation, calibration occurred before anything else could happen. This, however, caused bugs when a target was lost, or when trying to recalibrate while already in the calibration state. It did have the advantage of dedicating a section to explaining how to calibrate, along with a status bar showing the state of calibration.

Recording

The imagined workflow for recording involves the Wiimote. With the Wiimote in hand, the user can start and stop recording at any moment. At any point while preparing to record, or even during a recording (though it would cause some jumpiness), the user can reset the tracking state, which is a simple fix for many problems that might arise.

The overall workflow would hopefully go something like this:

  • The user starts the program
  • The interface opens, with the preview, timeline, Kinect image (with skeleton overlay, when tracking), and Wiimote state.
  • The user moves in the view of the Kinect until they are outlined on the image
  • The user is calibrated and a skeleton is overlaid on the image (possibly requiring a 'Y' pose)
  • The user loads the action they wish to edit, and scrubs the current time to the point they would like to insert a recording
  • The user makes the start pose for the recording
  • The user clicks a button on the Wiimote to start recording
  • The user performs the animation
  • The user clicks a button on the Wiimote to stop recording
  • (Possibly? Not sure of the best way to preview before applying the recording...) The preview window plays back the newly recorded section. The user then clicks a button on the Wiimote to confirm the recording.
    • On second thought, if undo is linked to the Wiimote, the user can simply record, see if it worked, and if not, undo and re-record.
      • This would probably be the most fluid, fastest, and easiest approach, as it wouldn't require a temporary timeline of sorts.
  • The recorded frames are inserted into the timeline at the position of the current selector, with all following frames bumped by the length of the recording (sketched below)
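
A minimal sketch of that insertion step, with Frame and Timeline as illustrative stand-ins for the real types:

  #include <vector>

  struct Frame { /* joint positions for one time step */ };

  typedef std::vector<Frame> Timeline;

  // Insert the recording at the selector; vector::insert shifts every
  // following frame back by the length of the recording.
  void InsertRecording(Timeline& timeline, size_t selector,
                       const Timeline& recording) {
      if (selector > timeline.size())
          selector = timeline.size();  // clamp to the end of the timeline
      timeline.insert(timeline.begin() + selector,
                      recording.begin(), recording.end());
  }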