Next: Bibliography Up: Gesture Recognition using HMM Previous: RWC Multi-modal Database of

HMM for gesture recognition

From the RWC database, the 17 gesture patterns shown in Figure 5, performed by 6 persons, were used in the following recognition experiment. First, each image in the image sequence was reduced to a half or a quarter of its original size. Then the first and second order PARCOR images were computed for each frame using online estimation of autocorrelations. The PARCOR images were reduced to half size again, and HLAC features were extracted from them. This yields a 70-dimensional feature vector for each frame. The sequence of feature vectors was fed into an HMM-based recognizer.
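The last step can be illustrated with a minimal sketch of likelihood-based recognition. It assumes one HMM per gesture, diagonal-covariance Gaussian emissions, and classification by the highest forward log-likelihood; none of these choices is stated in the text, and the dictionary layout, the model names `up` and `down`, and the toy parameters are hypothetical.

```python
import math

def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_likelihood(seq, hmm):
    """Forward algorithm in the log domain: log P(seq | hmm)."""
    n = len(hmm["pi"])

    def emit(j, x):
        return log_gauss_diag(x, hmm["means"][j], hmm["vars"][j])

    alpha = [safe_log(hmm["pi"][j]) + emit(j, seq[0]) for j in range(n)]
    for x in seq[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(hmm["A"][i][j])
                            for i in range(n)]) + emit(j, x)
                 for j in range(n)]
    return logsumexp(alpha)

def classify(seq, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(seq, models[name]))

# Toy left-to-right models with 2 states and 1-D features (hypothetical;
# the experiment uses 7 states and 70-dimensional feature vectors).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "up": {"pi": [1.0, 0.0], "A": left_to_right,
           "means": [[0.0], [1.0]], "vars": [[0.1], [0.1]]},
    "down": {"pi": [1.0, 0.0], "A": left_to_right,
             "means": [[1.0], [0.0]], "vars": [[0.1], [0.1]]},
}
seq = [[0.1], [0.0], [0.9], [1.1]]
label = classify(seq, models)
```

Working in the log domain avoids the numerical underflow that a product of many small densities would otherwise cause on long image sequences.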

In the first experiment, the parameters of the HMM with 7 hidden states shown in Figure 3 were estimated using all the samples, and the recognition rates were then evaluated on the same samples. The results are shown in Table 3 as ``closed''. In the second experiment, the samples of the first three repetitions were used to train the HMM, and the recognition rates were evaluated on the fourth repetition. The results are shown in Table 3 as ``open''. These rates are lower than expected, probably because the number of training samples is too small to estimate the parameters of such a high-dimensional HMM.
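The two evaluation protocols can be sketched as follows. The record fields (`gesture`, `person`, `repetition`), the function names, and the toy sample list are assumptions made for illustration only; they are not the layout of the RWC database.

```python
def split_closed(samples):
    """Closed test: train and evaluate on the same full sample set."""
    return list(samples), list(samples)

def split_open(samples, held_out_repetition=4):
    """Open test: train on the other repetitions, evaluate on the held-out one."""
    train = [s for s in samples if s["repetition"] != held_out_repetition]
    test = [s for s in samples if s["repetition"] == held_out_repetition]
    return train, test

def recognition_rate(predicted, actual):
    """Percentage of sequences whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical toy set: 2 gestures x 2 persons x 4 repetitions.
samples = [{"gesture": g, "person": p, "repetition": r}
           for g in ("g1", "g2") for p in (1, 2) for r in (1, 2, 3, 4)]
closed_train, closed_test = split_closed(samples)
open_train, open_test = split_open(samples)
rate = recognition_rate(["g1", "g2", "g1", "g1"], ["g1", "g2", "g2", "g1"])
```

In the open split no test sequence is seen during training, which is why the open rates are the more realistic estimate of performance on new data.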

To reduce the dimension of the feature vectors and to obtain uncorrelated features, we applied Principal Component Analysis (PCA) to the HLAC feature vectors. New features ${\bf z}$ are obtained as linear combinations of the mean-centered HLAC feature vector ${\bf x}$ with weights $C=[c_{ij}]$ as

\begin{displaymath}{\bf z} = C^T ({\bf x} - \bar{{\bf x}}).
\end{displaymath} (10)
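As a minimal sketch, equation (10) amounts to centering each feature vector and taking its inner product with every column of $C$. The function names, the toy 2-D data, and the hand-picked single-column $C$ below are hypothetical and serve only to show the arithmetic.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def project(x, mean, C):
    """Return z = C^T (x - mean), with C stored as M rows of k weights."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    k = len(C[0])
    return [sum(C[i][j] * centered[i] for i in range(len(centered)))
            for j in range(k)]

# Toy 2-D data lying on the line x1 = x2 (hypothetical, not HLAC features).
data = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
x_bar = mean_vector(data)
# One unit-length column pointing along that line, so M = 2 and k = 1.
C = [[1.0 / math.sqrt(2.0)], [1.0 / math.sqrt(2.0)]]
z = [project(x, x_bar, C) for x in data]
```

Here each 2-D point is represented by a single coordinate along the line, which is the kind of dimension reduction the projection is used for.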

The optimal coefficients $C$ are determined by computing the eigenvectors of the covariance matrix $\Sigma$ of the HLAC feature vectors. By setting a threshold on the cumulative proportion of variance retained by the first $k$ new features, namely

\begin{displaymath}\sum_{i=1}^k \lambda_i/\sum_{i=1}^M \lambda_i,
\end{displaymath}

to 0.99, where $\lambda_i$ is the $i$-th largest eigenvalue of $\Sigma$ and $M$ is the dimension of the HLAC feature vectors, the 70-dimensional feature vectors were reduced to 23- and 12-dimensional principal component feature vectors for the cases with $80 \times 60$ and $160 \times 120$ PARCOR images, respectively. These reductions suggest that the HLAC features of PARCOR images are highly redundant. The recognition rates are shown in Table 3 as ``closed (PCA)'' and ``open (PCA)''. Removing this redundancy by PCA improves the recognition rate in every condition; for example, the open rate with $160 \times 120$ PARCOR images rises from 61.03% to 82.11%.
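The choice of the number of retained components can be sketched as below. The sketch assumes that the threshold rule means "take the smallest $k$ whose cumulative ratio reaches the threshold"; the function name `num_components` and the toy eigenvalues are hypothetical.

```python
def num_components(eigenvalues, threshold=0.99):
    """Smallest k whose leading eigenvalues reach the cumulative ratio."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)

# Toy spectrum (hypothetical): the cumulative ratios are 0.6, 0.9, 0.995, ...
toy_eigenvalues = [600.0, 300.0, 95.0, 4.0, 1.0]
k = num_components(toy_eigenvalues)
```

A spectrum dominated by a few large eigenvalues gives a small $k$, which is how a 70-dimensional vector can shrink to a few dozen components at a 0.99 threshold.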


 
 
Table 3: Recognition rates.
Experiment      Size of PARCOR images    Rate (%)
closed          $80 \times 60$           61.76
closed          $160 \times 120$         71.08
open            $80 \times 60$           40.69
open            $160 \times 120$         61.03
closed (PCA)    $80 \times 60$           79.66
closed (PCA)    $160 \times 120$         91.18
open (PCA)      $80 \times 60$           66.67
open (PCA)      $160 \times 120$         82.11


  
Figure 5: Snapshots of gestures.


Takio Kurita
1998-03-13