Hand-Gesture Based Input Method -- Paw Input Method

Weilong Zheng, Junwu Weng, Mengxiao Hou, Xiaoyu Pan


This project was our course design for the Modern Electronic Design course; groups in the class were provided funding to support their projects. We designed a convenient Chinese input method for a hand-gesture control system, with which users can input Chinese text on such a system effectively. Our work consisted of two parts: the design of the Paw Chinese Input Method, and the implementation of both the input method and a simple hand-gesture control system. I was primarily responsible for implementing the hand-gesture control system.

Paw Input Method

Our Paw Chinese Input Method is inspired by Ye Fan's Aeviou Chinese Input Method. Chinese Pinyin has 23 initials and 37 finals. More specifically, when a user inputs the first Pinyin letter, there are 26 choices: the 23 initials plus the 3 vowels a, e and o. All 37 finals can be built from only 8 letters (a, e, v, i, o, u, n and g), so at every subsequent input step the number of choices is never greater than six.

  • (1) First Letters in Six Groups

  • (2) Top Group Selected

  • (3) Letter "b" Selected

  • Figure 1. User-Interface of Paw Input Method

Therefore, we first separate the 26 first-letter options into 6 groups, as shown in Figure 1(1). When a user wants to input the letter "b", for example, they click the top button first, and the 5 first letters in the top-button group appear on nearby buttons, as shown in Figure 1(2). Once the user selects "b" (the letter is entered at that moment), five candidate letters (never more than six) show up, as illustrated in Figure 1(3), and the user can continue the input process.
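The candidate-narrowing property can be sketched as a small prefix lookup over the finals. The snippet below uses an illustrative subset of the Pinyin finals (not the full 37-entry table) to show why each menu after the first letter needs at most six buttons:

```python
# Sketch of the candidate-narrowing idea behind the Paw layout.
# FINALS is an illustrative subset of Pinyin finals, not the full table.
FINALS = ["a", "ai", "an", "ang", "ao",
          "e", "ei", "en", "eng", "er",
          "i", "ia", "ian", "iang", "iao", "ie", "in", "ing",
          "o", "ong", "ou",
          "u", "ua", "uai", "uan", "uang", "ui", "un", "uo"]

def next_letter_candidates(prefix):
    """Letters that can legally follow `prefix` within a final."""
    return sorted({f[len(prefix)] for f in FINALS
                   if f.startswith(prefix) and len(f) > len(prefix)})

# After the first letter of a final is typed, the branching factor stays
# small -- this is what keeps every menu at six buttons or fewer.
for f in FINALS:
    for k in range(1, len(f)):
        assert len(next_letter_candidates(f[:k])) <= 6
```

For instance, after typing "a" the only legal continuations in this subset are "i", "n" and "o", so three buttons suffice for that menu.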

  • Figure 2. User Interface of Final Version

As we can see, by taking advantage of these features of Chinese Pinyin, this method effectively narrows down the control range. Only two hand shapes (waiting and confirmed) are needed for selection. Additionally, to implement the "backspace" and "turn page" commands, we expanded the gesture set to four by adding two motions: moving the fist leftward and downward. However, because of limited time and our lack of software development experience, we could not finish this project perfectly; many details still need to be considered. Our final UI is shown in Figure 2.
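The four-input control scheme amounts to a small dispatch table from (hand shape, coarse motion) pairs to UI commands. The sketch below is hypothetical, with illustrative names rather than the original implementation:

```python
# Hypothetical dispatch for the four recognized inputs: the two hand
# shapes ("waiting" palm, "confirmed" fist) plus two fist motions.
# All names here are illustrative assumptions, not the original code.
COMMANDS = {
    ("palm", None):   "hover",      # waiting: just track the cursor
    ("fist", None):   "confirm",    # fist held still: select a button
    ("fist", "left"): "backspace",  # fist moving leftward
    ("fist", "down"): "turn_page",  # fist moving downward
}

def interpret(shape, motion=None):
    """Map a recognized hand shape plus coarse motion to a UI command."""
    return COMMANDS.get((shape, motion), "hover")
```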

Hand-Control System Based on Kinect Camera

Object tracking in 2D images is a complicated topic in computer vision. To simplify our algorithm, we use a Kinect camera, illustrated in Figure 3, to capture the user's motion, since it outputs depth images, which are much easier to process.

  • Figure 3. Microsoft XBOX Kinect Camera

Image Pre-Processing

Depth images from the Kinect are represented in 16 bits, i.e., each pixel takes a value from 0 to 65535. First, we convert the 16-bit depth image to an 8-bit grayscale image so that it can be displayed correctly on a computer screen, as shown in Figure 4(1). Since the value of each pixel in the depth image represents the distance between the camera and the object in the scene, foreground and background can be separated easily. We binarize the gray depth image with a threshold of 100; the result is shown in Figure 4(2). Only when the hand is held closer to the camera than this threshold distance does it appear, isolated, in the binarized depth image.
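The two pre-processing steps can be sketched in NumPy. This is a minimal sketch assuming the 16-bit depth frame is already a NumPy array and that a simple bit-shift rescaling to 8 bits is acceptable (the actual scaling used in the project is not specified):

```python
import numpy as np

# Minimal sketch of the pre-processing step, assuming a 16-bit depth
# frame in a NumPy array (values 0..65535, smaller = closer).
def preprocess(depth16, thresh=100):
    # Scale the 16-bit depth down to an 8-bit gray image for display.
    gray = (depth16 >> 8).astype(np.uint8)
    # Binarize: pixels closer than the threshold become foreground,
    # so only a hand held near the camera survives.
    binary = np.where(gray < thresh, 255, 0).astype(np.uint8)
    return gray, binary
```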

  • (1) Gray Depth Image

  • (2) Binarized Depth Image

  • (3) Maximum Outer Contour

  • Figure 4. Procedure of Gesture Recognition Algorithm

Then we extract the contour of the hand from the depth image; this contour will be used in gesture recognition. The contour-finding method is described in [1] and is implemented in OpenCV. The result is shown in Figure 4(3).

Gesture Templates Matching

We use Hu invariant moments, proposed by Ming-Kuei Hu in [2], to describe the hand's shape. Hu moments form a 7-dimensional feature proved to be invariant to image scale, rotation and reflection. For the hand shape in each frame, we compute its Hu-moment feature using the algorithm in [2], then calculate the distance between this feature and the template features stored in the computer in advance. Some templates are shown in Figure 5: the pictures in the first row are templates of the fist (the "confirmed" gesture), and those in the second row are templates of the palm (the "waiting" gesture). The template closest to the hand in the current frame is taken as the current hand shape. This is the gesture-recognition procedure.
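The matching step can be sketched as follows. OpenCV provides Hu moments directly via `cv2.moments` and `cv2.HuMoments`; here the formulas from [2] are written out in NumPy for clarity, and nearest-template classification is a plain Euclidean argmin (the project's exact distance measure is not specified):

```python
import numpy as np

# Compute the 7 Hu invariant moments [2] of a binary hand mask.
def hu_moments(mask):
    ys, xs = np.nonzero(mask)
    m00 = len(xs)
    x, y = xs - xs.mean(), ys - ys.mean()
    # Normalized central moments: eta_pq = mu_pq / m00^(1 + (p+q)/2)
    def eta(p, q):
        return (x**p * y**q).sum() / m00 ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02)**2 + 4*n11**2,
        (n30 - 3*n12)**2 + (3*n21 - n03)**2,
        (n30 + n12)**2 + (n21 + n03)**2,
        (n30 - 3*n12)*(n30 + n12)*((n30 + n12)**2 - 3*(n21 + n03)**2)
        + (3*n21 - n03)*(n21 + n03)*(3*(n30 + n12)**2 - (n21 + n03)**2),
        (n20 - n02)*((n30 + n12)**2 - (n21 + n03)**2)
        + 4*n11*(n30 + n12)*(n21 + n03),
        (3*n21 - n03)*(n30 + n12)*((n30 + n12)**2 - 3*(n21 + n03)**2)
        - (n30 - 3*n12)*(n21 + n03)*(3*(n30 + n12)**2 - (n21 + n03)**2),
    ])

def classify(mask, templates):
    """templates: {label: hu_feature}; returns the nearest label."""
    f = hu_moments(mask)
    return min(templates, key=lambda k: np.linalg.norm(f - templates[k]))
```

Because the feature is rotation-invariant, a mask and its rotated copy produce (numerically) the same 7-vector, which is why only a few templates per gesture are needed.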

  • Figure 5. Templates of Hand Shape

Our template set includes only the two gestures, sampled at different angles. We do not need to store many templates in advance, since the hand's control range is small and Hu moments are invariant to scale, rotation and reflection.

Video Demo:

  • Reference

    [1] Suzuki, Satoshi. "Topological structural analysis of digitized binary images by border following." Computer Vision, Graphics, and Image Processing 30.1 (1985): 32-46.

    [2] Hu, Ming-Kuei. "Visual pattern recognition by moment invariants." IRE Transactions on Information Theory 8.2 (1962): 179-187.