Abstract
This work presents a framework for recognising mouthings in continuous sign language in a signer-independent way, without requiring manual annotations. Mouthings are lip movements that correspond to the pronunciation of words, or parts of words, during signing. Research on sign language recognition has focused extensively on the hands as features. However, sign language is multi-modal, and a full understanding, particularly with respect to its lexical variety, language idioms and grammatical structures, is not possible without further exploring the remaining information channels. To our knowledge, no previous work has explored dedicated viseme recognition in the context of sign language recognition. The approach is trained on over 180,000 unlabelled frames and reaches 47.1% precision on the frame level. Generalisation across individuals and the influence of context-dependent visemes are analysed.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Koller, O., Ney, H., Bowden, R. (2014). Read My Lips: Continuous Signer Independent Weakly Supervised Viseme Recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8689. Springer, Cham. https://doi.org/10.1007/978-3-319-10590-1_19
DOI: https://doi.org/10.1007/978-3-319-10590-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10589-5
Online ISBN: 978-3-319-10590-1
eBook Packages: Computer Science (R0)