A Novel Computerized Lip Reading System for Automatic Human Speech Recognition and Transcription
A novel computerized lip reading system was developed, trained, and tested to recognize and transcribe human speech. The system was built with the OpenCV2 libraries and the C++, C#, and MATLAB programming languages, and used the OuluVS English speech video database, which consists of 1,218 files covering 10 distinct phrases spoken by 20 individuals. Lip reading has applications in security, gaming, human-computer interaction, and deafness research. The first step in developing the system was to detect the speaker's face in every video frame using a facial recognition algorithm in OpenCV2. Once the speaker's mouth region was located, key points were placed on the inner outline of the lips, which allowed numerical features to be extracted from the changes in the positions of the speaker's lips over time. A total of 120 features were extracted, built from five coefficients generated by polynomial curve fitting of the lips, the 0th, 1st, and 2nd gradients, and four functional features (the minimum, mean, maximum, and standard deviation) of the lip key points. A novel mouth-phoneme model that relates phonemes and visemes using audio and visual information was developed, allowing direct conversion between lip movements and phonemes and, by extension, the lip reading of any word in the English language. Microsoft's Speech API was used to extract phonemes from the audio data in the OuluVS database, and WEKA (Waikato Environment for Knowledge Analysis) was used to train the lip reading system. Overall, the lip reading system achieved 86% accuracy on the OuluVS database, well above the 62% accuracy reported by other researchers using the same database. The AjayVS database, consisting of phrases not present in the OuluVS database, was created to further test the system, which achieved 89% accuracy on it. Future work will involve developing a real-time lip reading system and extending the system to other languages.
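The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the system's actual C++, C#, or MATLAB code. Several details are assumptions the abstract does not state: "five coefficients" is read as a degree-4 least-squares polynomial fitted to the lip key points of one frame, "gradients" are read as finite differences of each coefficient across consecutive frames, and the standard deviation is the population form. All function names (polyfit, diff, functionals, lip_features) are hypothetical. For a single lip contour this yields 5 x 3 x 4 = 60 values; the abstract does not say how the reported 120 features are composed, so the sketch does not try to reproduce that total.

```python
from statistics import mean, pstdev


def polyfit(xs, ys, degree=4):
    """Least-squares polynomial fit via the normal equations.

    Returns degree + 1 coefficients, lowest power first.
    Assumption: degree 4, so that one frame gives five coefficients.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def diff(series):
    """First finite difference of a time series (assumed meaning of 'gradient')."""
    return [b - a for a, b in zip(series, series[1:])]


def functionals(series):
    """The four functionals named in the abstract: min, mean, max, std."""
    return [min(series), mean(series), max(series), pstdev(series)]


def lip_features(frames, degree=4):
    """frames: one list of (x, y) lip key points per video frame (at least 3 frames).

    Returns, for each polynomial coefficient, the four functionals of its
    0th, 1st, and 2nd differences over time.
    """
    per_frame = [
        polyfit([p[0] for p in pts], [p[1] for p in pts], degree) for pts in frames
    ]
    features = []
    for k in range(degree + 1):
        series = [coeffs[k] for coeffs in per_frame]  # 0th gradient: the raw series
        d1 = diff(series)                             # 1st gradient
        d2 = diff(d1)                                 # 2nd gradient
        for s in (series, d1, d2):
            features.extend(functionals(s))
    return features


# Synthetic example: a parabolic "lip" that opens wider in each of four frames.
xs = [-2, -1, 0, 1, 2, 3]
frames = [[(x, a * x * x) for x in xs] for a in (1.0, 2.0, 3.0, 4.0)]
features = lip_features(frames)
print(len(features))
```

Summarizing each coefficient's trajectory with a fixed set of functionals gives a feature vector of constant length regardless of how many frames a phrase spans, which is what a classifier trained in a tool such as WEKA would typically need.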