Abstract
This paper presents the design and implementation of EmoAssist: a smartphone-based system to assist in dyadic conversations. The main goal of the system is to provide people who are blind or visually impaired with access to more non-verbal communication cues. The key functionalities of the system are to predict behavioral expressions (such as a yawn, a closed-lip smile, an open-lip smile, looking away, sleepiness, etc.) and 3-D affective dimensions (valence, arousal, and dominance) from visual cues in order to provide appropriate auditory feedback or responses. A number of challenges related to the data communication protocols, efficient tracking of the face, modeling of behavioral expressions/affective dimensions, the feedback mechanism, and system integration were addressed to build an effective and functional system. In addition, orientation-sensor information from the smartphone was used to correct image alignment and improve robustness for real-world use. Empirical studies show that EmoAssist can predict affective dimensions with acceptable accuracy (maximum correlation coefficients of 0.76 for valence, 0.78 for arousal, and 0.76 for dominance) in natural dyadic conversation. The overall minimum and maximum response times are 64.61 milliseconds and 128.22 milliseconds, respectively. Integrating sensor information to correct the orientation improved the accuracy of recognizing behavioral expressions by 16 % on average. A usability study with ten blind participants in social interaction shows that EmoAssist is highly acceptable, with an average acceptability rating of 6.0 on a Likert scale (where 1 and 7 are the lowest and highest possible ratings, respectively).
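To make the abstract's orientation-correction step concrete, the minimal sketch below illustrates how a camera frame could be rotated by the phone's roll angle before running face detection. This is not the authors' implementation; the use of OpenCV, the Haar-cascade detector, and all function and parameter names here are illustrative assumptions.

```python
# Illustrative sketch only: straighten a camera frame using the device roll
# angle (from the phone's orientation sensor), then look for the largest face.
import cv2
import numpy as np

def deskew_frame(frame_bgr, roll_degrees):
    """Rotate the frame by the negative roll angle so the face appears upright."""
    h, w = frame_bgr.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), -roll_degrees, 1.0)
    return cv2.warpAffine(frame_bgr, rot, (w, h))

def detect_largest_face(frame_bgr):
    """Return the largest detected face box (x, y, w, h), or None if no face is found."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return max(faces, key=lambda f: f[2] * f[3]) if len(faces) else None

# Usage example: a frame captured with a 15-degree roll is straightened first.
frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a camera frame
upright = deskew_frame(frame, roll_degrees=15.0)
print("face box:", detect_largest_face(upright))
```

Correcting the frame before detection is what allows a standard upright-face detector and tracker to keep working when the handheld phone is tilted, which is the robustness benefit the abstract attributes to using the orientation sensor.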
Notes
1. Yawn, closed-lip smile, looking away, open-lip smile, sleepy, etc.
2. Valence, Arousal, and Dominance (VAD).
Acknowledgments
We are grateful to the participants of our study, especially the “Design Team”, for actively helping with our research and for providing invaluable feedback. Any opinions, findings, and conclusions or recommendations expressed in this material are our own and do not necessarily reflect the views of the funding institution. We also thank our lab colleague Md Iftekhar Tanveer for sharing his code to extract facial features and head pose from the face tracker.
Additional information
This work was partially funded by the National Science Foundation (NSF-IIS-0746790), USA.
Cite this article
Rahman, A., Anam, A.I. & Yeasin, M. EmoAssist: emotion enabled assistive tool to enhance dyadic conversation for the blind. Multimed Tools Appl 76, 7699–7730 (2017). https://doi.org/10.1007/s11042-016-3295-4