Abstract
Facial expression recognition (FER) has become a research focus in affective computing, as it plays an important role in public security applications such as urban safety management and driving safety assistance systems. However, modeling the spatiotemporal information of facial expression sequences in a targeted manner, and integrating and exploiting it appropriately, remains challenging. In this paper, a facial expression recognition method based on a spatial-temporal decision fusion network (STDFN) is proposed. First, each facial expression sequence is divided into four sub-sequences according to face regions, and a BiLSTM is applied to each sub-sequence to extract local temporal features. This captures the local morphological changes of facial expressions in finer detail and makes full use of the temporal features of dynamic facial expressions. Then, VGG19 is used to extract shallow spatial features from the peak expression frame, and a squeeze-and-excitation module assigns channel weights to these features to obtain weighted spatial features. This selectively retains the informative spatial features and helps avoid overfitting. Finally, the temporal and spatial features are used separately to compute expression classification results, and a decision-level fusion module fuses the two results into the final FER result. Extensive experiments show that STDFN achieves accuracies of 98.83%, 89.31% and 82.86% on three FER datasets, CK+, Oulu-CASIA and MMI, respectively, demonstrating that it effectively improves FER recognition accuracy.
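Two of the steps above can be illustrated compactly: squeeze-and-excitation channel weighting of a spatial feature map, and decision-level fusion of the temporal and spatial classification results. The following is a minimal plain-Python sketch under stated assumptions, not the STDFN implementation. The SE part follows the standard squeeze-and-excitation pattern (global average pooling, a reducing fully connected layer with ReLU, an expanding fully connected layer with sigmoid gating, then channel rescaling) on a toy feature map rather than real VGG19 outputs; the weights `w1`, `w2` and the reduction ratio are made-up values, and biases are omitted. The abstract does not say how the fusion module combines the two results, so `fuse_decisions`, a weighted average of softmax probabilities with a hypothetical weight `alpha`, is an assumption. The BiLSTM temporal branch is not sketched.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(feature_map, w1, w2):
    """Reweight the channels of a C x H x W feature map, SE-style.

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C/r) x C weights of the reducing layer; w2: C x (C/r) weights
    of the expanding layer (toy values; biases omitted for brevity).
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC + ReLU reduces to C/r units, FC + sigmoid maps back to C.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [sigmoid(sum(w * hc for w, hc in zip(row, h))) for row in w2]
    # Scale: multiply every value in channel c by its weight s[c].
    scaled = [[[v * s[c] for v in row] for row in ch]
              for c, ch in enumerate(feature_map)]
    return scaled, s


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(temporal_logits, spatial_logits, alpha=0.5):
    """Hypothetical decision-level fusion (assumption, not the paper's rule):
    a weighted average of the class probabilities of the two branches."""
    p_t = softmax(temporal_logits)
    p_s = softmax(spatial_logits)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(p_t, p_s)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Toy usage: two 2x2 channels, reduction ratio r = 2 (so C/r = 1).
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, weights = squeeze_excite(fmap, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
label, probs = fuse_decisions([2.0, 0.0, 0.0], [0.0, 1.0, 0.0], alpha=0.5)
print(weights, label)  # weights are about [0.881, 0.119]; label is 0
```

In a trained network, `w1` and `w2` would be learned, and the fusion weight could be fixed or learned; here the temporal branch's confident vote for class 0 outweighs the spatial branch's weaker vote for class 1.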
Acknowledgments
This work is supported by the Natural Science Foundation of Shandong Province (No. ZR2020LZH008, ZR2021MF118, ZR2019MF071) and the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) (No. 2021CXGC010506, No. 2021SFGC0104).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, X., Yang, H., Zhang, X., Zheng, X., Li, W. (2023). Deep Spatio-Temporal Decision Fusion Network for Facial Expression Recognition. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol 13657. Springer, Cham. https://doi.org/10.1007/978-3-031-20102-8_9
DOI: https://doi.org/10.1007/978-3-031-20102-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20101-1
Online ISBN: 978-3-031-20102-8
eBook Packages: Computer Science, Computer Science (R0)