Next: PARCOR Images Up: Gesture Recognition using HLAC Previous: Gesture Recognition using HLAC

Introduction

Interest in gesture recognition has been increasing rapidly because of its broad applicability to natural user interfaces. For example, Darrell et al. [1] and Nishimura et al. [2] used a dynamic programming (DP) technique to match input image sequences to learned models using normalized image correlation scores. On the other hand, Yamato et al. [3] and Starner et al. [4] used hidden Markov models (HMMs). However, the features that these methods extract from image sequences are simple. To obtain better features, Campbell et al. [5] utilized 3-D data gathered from stereo cameras. In this paper, we propose a new gesture recognition method that uses higher order local autocorrelation (HLAC) features [6,7] extracted from PARCOR images.

Since a person performing a gesture often moves within the image frame, features for gesture recognition should be invariant to global shifts of the person. We also have to cope with nonuniform changes in the speed of the gesture to be recognized. In speech recognition, the continuous speech waveform is usually converted to a sequence of equally spaced feature vectors, which are assumed to form an exact representation of the speech waveform and are regarded as stationary over the duration covered by a single vector. These feature vectors are then fed into a recognizer based on hidden Markov models. Here we adopt the same strategy for gesture recognition, using shift-invariant feature vectors extracted from a sequence of images.

To extract the dominant information from a sequence of images, we apply linear predictive coding (LPC) to the sequence of values at each pixel and construct PARCOR images, in which each pixel contains a PARCOR (partial autocorrelation) coefficient computed from the sequence of values at that pixel. HLAC features, which are inherently shift-invariant and computationally inexpensive, are then extracted from the PARCOR images. Once we have obtained feature vectors from a sequence of images, we can use an HMM-based recognizer, similar to the one used for continuous speech recognition, to cope with nonuniform changes in the speed of the gesture. This similarity between the gesture recognizer and speech recognition methods is desirable when speech and gesture are integrated in human-computer interaction systems [8]. For example, a coupled HMM [9] can be used to integrate speech and gesture.
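
The two feature-extraction steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper, and it rests on several assumptions: the function names `parcor_coefficients`, `parcor_image`, and `hlac_low_order` are hypothetical; the PARCOR coefficients are obtained with the Levinson-Durbin recursion on the biased autocorrelation of each mean-removed pixel sequence, with the sign chosen so that the first coefficient equals the lag-1 autocorrelation coefficient; and only the zeroth- and first-order local autocorrelations (5 features) are computed, with periodic image boundaries so that shift invariance holds exactly.

```python
import numpy as np


def parcor_coefficients(x, order):
    """First `order` PARCOR coefficients of a 1-D sequence (Levinson-Durbin).

    Assumptions: the mean is removed first, the biased autocorrelation is
    used, and the sign convention makes the first coefficient r[1] / r[0].
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased (unnormalized) autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    parcor = np.zeros(order)
    if r[0] <= 0.0:
        return parcor  # constant sequence: nothing to predict
    a = np.zeros(order + 1)  # prediction-error filter, a[0] = 1
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = acc / err
        parcor[m - 1] = k
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] - k * a_prev[m - i]
        a[m] = -k
        err *= 1.0 - k * k
        if err <= 0.0:
            break  # sequence is perfectly predictable; stop early
    return parcor


def parcor_image(frames, order):
    """Map frames of shape (T, H, W) to PARCOR images of shape (order, H, W)."""
    frames = np.asarray(frames, dtype=float)
    _, height, width = frames.shape
    out = np.zeros((order, height, width))
    for y in range(height):
        for x in range(width):
            out[:, y, x] = parcor_coefficients(frames[:, y, x], order)
    return out


def hlac_low_order(img):
    """Zeroth- and first-order local autocorrelations within a 3x3 window.

    Periodic boundaries (np.roll) are an assumption of this sketch.
    """
    img = np.asarray(img, dtype=float)
    feats = [img.sum()]  # zeroth order: sum of f(r)
    # First order: sum of f(r) * f(r + a) for the four distinct displacements.
    for dy, dx in [(0, 1), (1, 1), (1, 0), (1, -1)]:
        shifted = np.roll(img, shift=(-dy, -dx), axis=(0, 1))
        feats.append((img * shifted).sum())
    return np.array(feats)
```

For a sequence of frames, `hlac_low_order(parcor_image(frames, order)[0])` would give one shift-invariant feature vector per window of frames; a sequence of such vectors is the kind of input an HMM-based recognizer expects.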

In this paper, we explain how to construct PARCOR images from a sequence of images and how to compute HLAC features from the PARCOR images [10]. To confirm the effectiveness of the proposed method, we present experimental results of gesture recognition.


Takio Kurita
1998-03-13