Deep Spatio-Temporal Decision Fusion Network for Facial Expression Recognition

  • Conference paper
Machine Learning for Cyber Security (ML4CS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13657)

Abstract

Facial expression recognition (FER) has become a research focus in affective computing because it already plays an important role in public security applications such as urban safety management and safe-driving assistance systems. Modeling the spatiotemporal information of facial expression sequences in a targeted manner, and then integrating and using it appropriately, remains challenging. This paper proposes an FER method based on a spatial-temporal decision fusion network (STDFN). First, each facial expression sequence is divided into four sub-sequences according to face regions, and a BiLSTM is applied to each sub-sequence to extract local temporal features. This captures the local morphological features of facial expressions in more detail and makes fuller use of the temporal features of dynamic facial expressions. Then, VGG19 is used to extract shallow spatial features from the peak expression frame, and a squeeze-and-excitation module assigns channel weights to these features to obtain weighted spatial features. This purposefully retains the valid spatial features and helps avoid overfitting. Finally, the temporal features and the spatial features are used separately to compute expression classification results, and a decision-level fusion module fuses the two results to obtain the final FER result. Extensive experiments on three FER datasets, CK+, Oulu-CASIA and MMI, show that STDFN achieves 98.83%, 89.31% and 82.86% accuracy, respectively, demonstrating that it effectively improves FER accuracy.
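
The final step of the abstract, decision-level fusion of the temporal and spatial classification results, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average of class probabilities used here is an assumption, as are the names `softmax`, `fuse_decisions`, `predict` and the weight `alpha`. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(temporal_logits, spatial_logits, alpha=0.5):
    """Fuse two per-class score vectors at the decision level.

    ASSUMPTION: the fusion is a weighted average of the two branches'
    softmax probabilities. `alpha` is a hypothetical weight on the
    temporal branch; the abstract does not specify the actual rule.
    """
    if len(temporal_logits) != len(spatial_logits):
        raise ValueError("both branches must score the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_temporal = softmax(temporal_logits)
    p_spatial = softmax(spatial_logits)
    return [
        alpha * t + (1.0 - alpha) * s
        for t, s in zip(p_temporal, p_spatial)
    ]


def predict(temporal_logits, spatial_logits, alpha=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_decisions(temporal_logits, spatial_logits, alpha)
    return max(range(len(fused)), key=fused.__getitem__)
```

With `alpha=1.0` the prediction depends only on the temporal branch, and with `alpha=0.0` only on the spatial branch. Intermediate values let the more confident branch dominate, which is one plausible reason to fuse decisions rather than concatenate features.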

Acknowledgments

This work was supported by the Natural Science Foundation of Shandong Province (Nos. ZR2020LZH008, ZR2021MF118, ZR2019MF071) and the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) (Nos. 2021CXGC010506, 2021SFGC0104).

Author information

Corresponding authors

Correspondence to Xiangwei Zheng or Wei Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., Yang, H., Zhang, X., Zheng, X., Li, W. (2023). Deep Spatio-Temporal Decision Fusion Network for Facial Expression Recognition. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol 13657. Springer, Cham. https://doi.org/10.1007/978-3-031-20102-8_9

  • DOI: https://doi.org/10.1007/978-3-031-20102-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20101-1

  • Online ISBN: 978-3-031-20102-8

  • eBook Packages: Computer Science, Computer Science (R0)
