Suo Qiu, Fuhao Qiu, Xiaoyi Zou, Junwu Weng, Simin Zhao
Yayu Luo, Guomin Liao, Guicong Xu
This is one of the main projects of the Lab of Next-Generation Human-Computer Interaction. It is supported by the Guangdong Specific Project of Generic Technology and Significant Scientific Discoveries. In this project, we implement a hand-gesture-based control system for the home environment. The project also covers optimization and evaluation of the control system.
Three groups support this project. One is responsible for the user interface on the television, one works on the hardware implementation (the system will eventually be integrated on a chip), and our group works on the core object tracking algorithm.
Our algorithm has two parts: a detector and a tracker. In the detector, we use an AdaBoost classifier, trained offline on Haar features, to detect the palm gesture. The tracker is based on Incremental Visual Tracking (IVT) by Ross et al. (2008). On top of IVT, we use skin color, one of the most important features of the human hand, to weight the features of the ROI. Because the appearance of the hand changes constantly, the subspace that describes it is updated every five frames. The motion model is a particle filter.
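To make the particle filter step concrete, here is a minimal sketch of one predict-and-weight iteration. It is not the project's implementation: the state is reduced to the hand center (x, y), the Gaussian random-walk motion, `noise_std`, and `toy_likelihood` are all assumptions made for illustration, and the real system would score each particle with the IVT subspace and skin color instead.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

State = Tuple[float, float]  # assumption: state is only the hand center (x, y)


def particle_filter_step(
    prev_state: State,
    likelihood: Callable[[State], float],
    num_particles: int = 600,
    noise_std: float = 4.0,  # assumed random-walk noise, in pixels
    rng: Optional[random.Random] = None,
) -> Tuple[State, List[State], List[float]]:
    """Sample particles around the previous state, weight them, and
    return the weighted-mean estimate with the particles and weights."""
    rng = rng or random.Random()
    particles = [
        (rng.gauss(prev_state[0], noise_std), rng.gauss(prev_state[1], noise_std))
        for _ in range(num_particles)
    ]
    weights = [likelihood(p) for p in particles]
    total = sum(weights)
    if total == 0.0:
        # No particle matched at all: keep the previous state.
        uniform = [1.0 / num_particles] * num_particles
        return prev_state, particles, uniform
    weights = [w / total for w in weights]
    est_x = sum(w * p[0] for w, p in zip(weights, particles))
    est_y = sum(w * p[1] for w, p in zip(weights, particles))
    return (est_x, est_y), particles, weights


def toy_likelihood(state: State, target: State = (50.0, 50.0), sigma: float = 2.0) -> float:
    """Hypothetical observation model: a Gaussian bump around a known target."""
    d2 = (state[0] - target[0]) ** 2 + (state[1] - target[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


estimate, particles, weights = particle_filter_step(
    (48.0, 49.0), toy_likelihood, rng=random.Random(0)
)
```

The estimate is pulled from the previous position toward the region where the likelihood is high, which is the behavior the tracker relies on from frame to frame.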
In this project, I am mainly responsible for two parts: improving the tracking algorithm, and solving the tracking drift and grabbing drift problems. I also participated in the evaluation of the system.
Our tracking algorithm uses six hundred particles in each frame. This has a large impact on computational cost, especially for real-time tracking, since every particle has to be evaluated in every frame. We therefore use optical flow in the particle selection procedure. As illustrated in Figure 1, only particles (green rectangles) that contain optical flow points (orange points) are selected for further calculation. This speeds up the feature matching step considerably. We use the Lucas–Kanade method to calculate the optical flow.
Figure 1. Particle Selection by Optical Flow
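The selection rule can be sketched as a simple containment test. This is an illustration under assumptions: particles are axis-aligned rectangles `(x, y, width, height)`, the optical flow points are already computed (for example by a Lucas–Kanade tracker), and the `min_points` threshold is a hypothetical parameter not taken from the project.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)
Point = Tuple[float, float]               # (x, y) of an optical flow point


def contains(rect: Rect, point: Point) -> bool:
    """True if the point lies inside the rectangle (borders included)."""
    x, y, w, h = rect
    px, py = point
    return x <= px <= x + w and y <= py <= y + h


def select_particles(
    particles: List[Rect], flow_points: List[Point], min_points: int = 1
) -> List[Rect]:
    """Keep only particles that contain at least `min_points` flow points."""
    selected = []
    for rect in particles:
        hits = sum(1 for p in flow_points if contains(rect, p))
        if hits >= min_points:
            selected.append(rect)
    return selected


particles = [(0, 0, 10, 10), (20, 20, 10, 10), (5, 5, 10, 10)]
flow_points = [(7, 7), (8, 6)]
kept = select_particles(particles, flow_points)
```

Only the particles in `kept` would go on to the more expensive feature matching step; the particle at (20, 20) contains no flow point and is skipped.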
IVT uses a constantly updated subspace to describe the target. However, it is only a generative model: only the appearance of the target is taken into consideration. We therefore also try to treat the negative samples, the background, as important information. The algorithm proposed by Lin et al. (2004) is also a subspace learning method. By using Fisher Discriminant Analysis, it takes both positive and negative samples into account. It also uses the Sequential Karhunen–Loeve algorithm (Levey and Lindenbaum, 2000) to update the subspace. However, this method needs to maintain two subspaces, which significantly reduces computational efficiency. The demo of Incremental Fisher Discriminant Analysis is linked below.
Drifting is one of the disadvantages of IVT. As tracking goes on, more and more background information is learned into the subspace, so IVT is not a long-term tracking algorithm. To improve its performance, we re-locate the target on the basis of the matching output of IVT so that the tracker locks onto the target precisely.
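For reference, the standard Fisher criterion that such a discriminative subspace optimizes can be written as below. This is the textbook form, given as background; the symbols are our own notation, and the exact class model used in the cited paper may differ.

```latex
% Textbook Fisher Discriminant Analysis; notation is an assumption of this note.
% X_c: samples of class c (target or background), n_c = |X_c|,
% m_c: mean of class c, m: mean of all samples.
S_B = \sum_{c} n_c \,(m_c - m)(m_c - m)^{\top}, \qquad
S_W = \sum_{c} \sum_{x \in X_c} (x - m_c)(x - m_c)^{\top},
\qquad
W^{*} = \arg\max_{W} \frac{\left| W^{\top} S_B W \right|}{\left| W^{\top} S_W W \right|}
```

The between-class scatter S_B rewards projections that separate target from background, while the within-class scatter S_W penalizes projections that spread each class out, which is why negative samples matter here and not in a purely generative model.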
(1) Skin Segmentation
(2) Binarized Image
Figure 2. Search of Max-Inscribed Circle
One way we tried to correct the drift is to search for the maximum inscribed circle of the hand and use the center of that circle as the final tracking output. The output of the tracking algorithm gives the approximate position of the target. Using the skin segmentation result, we then find the max-inscribed circle of the target and re-locate the target's position.
However, the result of this method relies heavily on the performance of skin segmentation. Also, because the hand shape changes, the position of the max-inscribed circle is not always unique. In the end, we therefore use an easier way to re-locate the position: we calculate the centroid of the back projection image of skin color and use it as the position of the target. This method works well.
Since the position of the target, the hand, is re-located in each frame, the bounding box fits the target tightly. Meanwhile, we use the position of the target to control the position of the mouse cursor on screen. Hence, when the user performs a grabbing gesture, the cursor may shift, which makes control uncomfortable. We therefore need to freeze the cursor while the user performs a grabbing gesture.
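As an illustration of the search, the sketch below finds the maximum inscribed circle of a small binary mask by brute force: for every skin pixel it measures the distance to the nearest non-skin pixel and keeps the largest. This is an assumption about how the search could be done, not the project's code; a real implementation would more likely use a distance transform, and the mask here is a made-up example.

```python
import math
from typing import List, Tuple


def max_inscribed_circle(mask: List[List[int]]) -> Tuple[Tuple[int, int], float]:
    """Return ((row, col), radius) of the largest circle inside the mask.

    `mask` is a binary image (1 = skin, 0 = background). Pixels just
    outside the image border are treated as background.
    """
    rows, cols = len(mask), len(mask[0])
    background = [
        (r, c)
        for r in range(-1, rows + 1)
        for c in range(-1, cols + 1)
        if not (0 <= r < rows and 0 <= c < cols) or mask[r][c] == 0
    ]
    best_center, best_radius = (-1, -1), 0.0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                continue
            # Distance from this skin pixel to the nearest background pixel.
            d = min(math.hypot(r - br, c - bc) for br, bc in background)
            if d > best_radius:
                best_center, best_radius = (r, c), d
    return best_center, best_radius


# Toy "hand": a 5x5 palm with a one-pixel finger sticking out at the top.
mask = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
center, radius = max_inscribed_circle(mask)
```

In this toy mask the center lands in the middle of the palm and ignores the finger, which is the property that makes the circle center attractive as a stable tracking output.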
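The centroid computation amounts to taking first-order image moments of the back projection. The sketch below assumes the back projection is given as a 2D list of non-negative skin-likelihood values; the function name and the toy image are hypothetical.

```python
from typing import List, Optional, Tuple


def weighted_centroid(back_projection: List[List[float]]) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of the image, weighted by pixel value.

    Returns None when the image contains no skin response at all.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(back_projection):
        for x, value in enumerate(row):
            m00 += value      # total mass
            m10 += x * value  # first moment along x
            m01 += y * value  # first moment along y
    if m00 == 0.0:
        return None
    return (m10 / m00, m01 / m00)


back_projection = [
    [0, 0, 0, 0],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
]
position = weighted_centroid(back_projection)
```

Unlike the inscribed-circle search, this result is always unique for a non-empty response and changes smoothly when a few pixels are misclassified.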
First, to solve this problem, we need to know the motion state of the hand. We considered three approaches: 1) Ellipse Fitting, 2) Rays Counting, 3) Dense Optical Flow. Ellipse Fitting uses an ellipse to represent the outer contour of the hand shape, as illustrated in Figure 3. Rays Counting measures the length of rays emitted upward from the center of the hand shape, and the longest ray length is used to represent the motion state of the hand, as illustrated in Figure 3 (not an original picture from our project). More specifically, if the hand has its fingers open, the rays will be long; otherwise, they will be short. However, the performance of both methods relies on the result of skin segmentation. If the hand cannot be separated perfectly, neither method works well, as shown in Figure 3(3).
Figure 3. Ellipse Fitting and Rays Counting of Hand Shape
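A minimal sketch of Rays Counting on a binary skin mask is shown below. It is a simplification under assumptions: the rays are purely vertical, one per column in a small window around the hand center, and `half_width` and the toy masks are hypothetical.

```python
from typing import List, Tuple


def longest_upward_ray(mask: List[List[int]], center: Tuple[int, int], half_width: int = 2) -> int:
    """Length in pixels of the longest vertical ray that starts on the
    center row and travels upward until it leaves the skin region."""
    cy, cx = center
    first = max(0, cx - half_width)
    last = min(len(mask[0]) - 1, cx + half_width)
    longest = 0
    for x in range(first, last + 1):
        length = 0
        y = cy
        while y >= 0 and mask[y][x]:
            length += 1
            y -= 1
        longest = max(longest, length)
    return longest


open_hand = [
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
fist = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
open_length = longest_upward_ray(open_hand, center=(4, 2))
fist_length = longest_upward_ray(fist, center=(4, 2))
```

The open hand gives a longer ray than the fist, but a single hole in the segmentation would cut a ray short, which is the weakness described above.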
Dense optical flow is therefore our final solution. We extract information from the dense optical flow of the target in several ways and, based on the different optical flow patterns, categorize the motion of the hand into six classes: PalmMove, Steady, Grab, Fist, FistMove and Open. The six classes are related to each other. For instance, the Grab state must be preceded by Steady, in which the hand freezes and does not move; the Fist state must be preceded by Grab, and so forth. Grab and Open are the states during which we need to freeze the mouse pointer. The state transition flow is shown in Figure 4. In this way, the grabbing drift problem is solved well.
Figure 4. State Transfer Flow
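The state logic can be sketched as a small finite-state machine. Only three facts come from the text: Steady precedes Grab, Grab precedes Fist, and the cursor is frozen during Grab and Open. Every other transition in the `ALLOWED` table is an assumption made to complete the example, and the per-frame class label is assumed to come from a separate optical flow classifier.

```python
# Allowed next states for each state. Steady -> Grab and Grab -> Fist are
# from the text; all other entries are assumptions for illustration.
ALLOWED = {
    "PalmMove": {"PalmMove", "Steady"},
    "Steady": {"Steady", "PalmMove", "Grab"},
    "Grab": {"Grab", "Fist"},
    "Fist": {"Fist", "FistMove", "Open"},
    "FistMove": {"FistMove", "Fist", "Open"},
    "Open": {"Open", "Steady", "PalmMove"},
}

# States during which the mouse pointer is frozen (from the text).
FREEZE = {"Grab", "Open"}


class HandStateMachine:
    """Tracks the hand motion state and rejects impossible transitions."""

    def __init__(self, state: str = "PalmMove") -> None:
        self.state = state

    def update(self, observed: str) -> str:
        """Move to the observed class if the transition is allowed,
        otherwise keep the current state."""
        if observed in ALLOWED[self.state]:
            self.state = observed
        return self.state

    def cursor_frozen(self) -> bool:
        return self.state in FREEZE


machine = HandStateMachine()
history = []
for label in ["Grab", "Steady", "Grab", "Fist", "FistMove", "Open", "PalmMove"]:
    state = machine.update(label)
    history.append((state, machine.cursor_frozen()))
```

The first "Grab" label is rejected because the hand has not been Steady yet, which shows how the ordering constraints filter out spurious classifications.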
Because of the complexity of this system, an error in any module may cause a system crash or make control uncomfortable. Evaluation of the system is therefore necessary. It includes, but is not limited to, the precision of initial detection, tracking and grabbing detection, as well as a comfort test of control. Take the comfort test as an example. We use test grids of different sizes, as illustrated in Figure 5, to test the comfort level of control. At the beginning, the tester is asked to use the hand to move the mouse cursor to the orange square shown in Figure 5, and we record related data during this procedure. When the cursor reaches its target, a new target appears at a random position. As the test goes on, the size of the squares in the grid changes, which makes the cursor harder to control since the target the tester needs to reach becomes smaller. By analyzing the control time and the cursor trajectory, we can classify the tracking algorithm into different comfort levels, which helps us improve our method.
(1) 5 by 5
(2) 10 by 10
Figure 5. Comfort Test Grid
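One possible way to summarize a single trial of the comfort test is sketched below. The text only states that control time and cursor trajectory are analyzed; the specific metrics here (duration, path length, and the ratio of straight-line distance to path length) and the sample format are assumptions, not the project's actual measures.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # (time in seconds, cursor x, cursor y)


def trial_metrics(trajectory: List[Sample]) -> Dict[str, float]:
    """Summarize one move of the cursor from its start to the target square."""
    if len(trajectory) < 2:
        raise ValueError("a trajectory needs at least two samples")
    duration = trajectory[-1][0] - trajectory[0][0]
    path_length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(trajectory, trajectory[1:])
    )
    straight = math.hypot(
        trajectory[-1][1] - trajectory[0][1],
        trajectory[-1][2] - trajectory[0][2],
    )
    # 1.0 means a perfectly straight move; lower values mean more wandering.
    efficiency = straight / path_length if path_length > 0 else 1.0
    return {"duration": duration, "path_length": path_length, "efficiency": efficiency}


trajectory = [(0.0, 0.0, 0.0), (0.5, 3.0, 0.0), (1.0, 3.0, 4.0)]
metrics = trial_metrics(trajectory)
```

Comparing such numbers across the 5 by 5 and 10 by 10 grids would show how much harder control becomes as the target squares shrink.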
Ross, David A., et al. "Incremental learning for robust visual tracking." International Journal of Computer Vision 77.1-3 (2008): 125-141.
Lin, Ruei-Sung, Ming-Hsuan Yang, and Stephen E. Levinson. "Object tracking using incremental fisher discriminant analysis." Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 2. IEEE, 2004.
Levey, A., and Michael Lindenbaum. "Sequential Karhunen-Loeve basis extraction and its application to images." Image Processing, IEEE Transactions on 9.8 (2000): 1371-1374.