Abstract
Although great progress has been made in facial expression recognition, it still faces challenges such as occlusion and pose changes in real-world scenario. To address this issue, we propose a simple yet effective multi-granularity and multi-scale feature fusion network (MM-Net) to achieve robust expression recognition without either manually extracting local patches or designing complex sub-networks. Specifically, we use a puzzle generator to divide the image into local regions of different granularity, which are then randomly shuffled and reorganized to form a new input image. By feeding the facial puzzles in order from fine-grained to coarse-grained, the network progressively mines the local fine-grained information, the coarse-grained information, and the global information. Besides, considering the subtle inter-class variation characteristic of different expressions, we use the multi-scale feature fusion strategy in the shallow feature extraction module to obtain global features with detailed information for capturing the subtle differences in facial expression images. Extensive experimental results on three in-the-wild FER benchmarks demonstrate the superiority of the proposed MM-Net compared to state-of-the-art methods.







Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data Availability
The RAF-DB datasets analyzed during the current study are available in http://www.whdeng.cn/raf/model1.html. The FERPlus datasets analyzed during the current study are available in https://github.com/Microsoft/FERPlus. The AffectNet datasets analyzed during the current study are available in http://mohammadmahoor.com/affectnet/. The FED-RO datasets analyzed during the current study are available in https://doi.org/10.1109/TIP.2018.2886767.
References
Li, S., Deng, W.: Deep facial expression recognition: a survey. IEEE Trans. Affect. Comput. 13(3), 1195–1215 (2020). https://doi.org/10.1109/TAFFC.2020.2981446
Lucey, P., Cohn, J.F., Kanade, T., et al.: The extended cohn-kanade dataset (ck+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, pp. 94–101 (2010)
Zhao, G., Huang, X., Taini, M., Li, S.Z., PietikäInen, M.: Facial expression recognition from near-infrared videos. Image Vis. Comput. 29(9), 607–619 (2011). https://doi.org/10.1016/j.imavis.2011.07.002
Pantic, M., Valstar, M., Rademaker, R., Maat, L.: Web-based database for facial expression analysis. In: 2005 IEEE International Conference on Multimedia and Expo, p. 5 (2005)
Kim, Y., Yoo, B., Kwak, Y., Choi, C., Kim, J.: Deep generative-contrastive networks for facial expression recognition. arXiv preprint (2017). arXiv:1703.07140
Zhang, K., Huang, Y., Du, Y., Wang, L.: Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017). https://doi.org/10.1109/TIP.2017.2689999
Yang, H., Ciftci, U., Yin, L.: Facial expression recognition by de-expression residue learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2168–2177 (2018)
Hazourli, A.R., Djeghri, A., Salam, H., Othmani, A.: Deep multi-facial patches aggregation network for facial expression recognition. arXiv preprint (2020). arXiv:2002.09298
Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2852–2861 (2017)
Goodfellow, I.J., Erhan, D., Carrier, P.L., et al.: Challenges in representation learning: A report on three machine learning contests. In: International Conference on Neural Information Processing, pp. 117–124 (2013)
Mollahosseini, A., Hasani, B., Mahoor, M.H.: Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2019). https://doi.org/10.1109/TAFFC.2017.2740923
Farzaneh, A.H., Qi, X.: Facial expression recognition in the wild via deep attentive center loss. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2401–2410 (2021)
Li, H., Xiao, X., Liu, X., Guo, J., Wen, G., Liang, P.: Heuristic objective for facial expression recognition. Vis. Comput. (2022). https://doi.org/10.1007/s00371-022-02619-7
Siqueira, H., Magg, S., Wermter, S.: Efficient facial feature learning with wide ensemble-based convolutional neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 5800–5809 (2020)
Cai, J., Meng, Z., Khan, A.S., et al.: Identity-free facial expression recognition using conditional generative adversarial network. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 1344–1348 (2021)
Zhang, F., Zhang, T., Mao, Q., Xu, C.: Joint pose and expression modeling for facial expression recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3359–3368 (2018)
Hammal, Z., Arguin, M., Gosselin, F.: Comparing a novel model based on the transferable belief model with humans during the recognition of partially occluded facial expressions. J. Vis. 9(2), 22–22 (2009). https://doi.org/10.1167/9.2.23
Ramírez Cornejo, J.Y., Pedrini, H.: Recognition of occluded facial expressions based on centrist features. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1298–1302 (2016)
Pan, B., Wang, S., Xia, B.: Occluded facial expression recognition enhanced through privileged information. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 566–573 (2019)
Adil, B., Nadjib, K.M., Yacine, L.: A novel approach for facial expression recognition. In: 2019 International Conference on Networking and Advanced Systems (ICNAS), pp. 1–5 (2019)
Zhao, Z., Liu, Q., Zhou, F.: Robust lightweight facial expression recognition network with label distribution training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 3510–3519 (2021)
Wang, K., Peng, X., Yang, J., Meng, D., Qiao, Y.: Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans. Image Process. 29, 4057–4069 (2020). https://doi.org/10.1109/TIP.2019.2956143
Li, Y., Zeng, J., Shan, S., Chen, X.: Occlusion aware facial expression recognition using cnn with attention mechanism. IEEE Trans. Image Process. 28(5), 2439–2450 (2019). https://doi.org/10.1109/TIP.2018.2886767
Du, R., Chang, D., Bhunia, A.K., Xie, J., Ma, Z., Song, Y.-Z., Guo, J.: Fine-grained visual classification via progressive multi-granularity training of jigsaw patches. In: European Conference on Computer Vision, pp. 153–168 (2020)
Ding, H., Zhou, P., Chellappa, R.: Occlusion-adaptive deep network for robust facial expression recognition. In: 2020 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–9 (2020)
Zhao, Z., Liu, Q., Wang, S.: Learning deep global multi-scale and local attention features for facial expression recognition in the wild. IEEE Trans. Image Process. 30, 6544–6556 (2021). https://doi.org/10.1109/TIP.2021.3093397
Ma, F., Sun, B., Li, S.: Facial expression recognition with visual transformers and attentional selective fusion. IEEE Trans. Affect. Comput. (2021). https://doi.org/10.1109/TAFFC.2021.3122146
Liang, X., Xu, L., Zhang, W., et al.: A convolution-transformer dual branch network for head-pose and occlusion facial expression recognition. Vis. Comput. (2022). https://doi.org/10.1007/s00371-022-02413-5
Liu, C., Hirota, K., Dai, Y.: Patch attention convolutional vision transformer for facial expression recognition with occlusion. Inf. Sci. 619, 781–794 (2023). https://doi.org/10.1016/j.ins.2022.11.068
Liao, L., Zhu, Y., Zheng, B., Jiang, X., Lin, J.: Fergcn: facial expression recognition based on graph convolution network. Mach. Vis. Appl. 33(3), 40 (2022). https://doi.org/10.1007/s00138-022-01288-9
Gao, H., Wu, M., Chen, Z., et al.: Ssa-icl: Multi-domain adaptive attention with intra-dataset continual learning for facial expression recognition. Neural Netw. 158, 228–238 (2023). https://doi.org/10.1016/j.neunet.2022.11.025
Ruan, D., Yan, Y., Lai, S., et al.: Feature decomposition and reconstruction learning for effective facial expression recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7656–7665 (2021)
Wang, K., Peng, X., Yang, J., Lu, S., Qiao, Y.: Suppressing uncertainties for large-scale facial expression recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6897–6906 (2020)
Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision, pp. 69–84 (2016)
Chen, Y., Bai, Y., Zhang, W., Mei, T.: Destruction and construction learning for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5152–5161 (2019)
Xia, H., Li, C., Tan, Y., Li, L., Song, S.: Destruction and reconstruction learning for facial expression recognition. IEEE Multimed. 28(2), 20–28 (2021). https://doi.org/10.1109/MMUL.2021.3076834
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014). arXiv:1409.1556
Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015)
Duta, I.C., Liu, L., Zhu, F., Shao, L.: Pyramidal convolution: rethinking convolutional neural networks for visual recognition. arXiv preprint (2020). arXiv:2006.11538
Gao, S., Cheng, M., Zhao, K., et al.: Res2net: a new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 652–662 (2021). https://doi.org/10.1109/TPAMI.2019.2938758
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Barsoum, E., Zhang, C., Ferrer, C.C., Zhang, Z.: Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 279–283 (2016)
Guo, Y., Zhang, L., Hu, Y., He, X., Gao, J.: Ms-celeb-1m: a dataset and benchmark for large-scale face recognition. In: European Conference on Computer Vision, pp. 87–102 (2016)
Paszke, A., Gross, S., Massa, F., et al.: Pytorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst., vol. 32 (2019)
Huang, C.: Combining convolutional neural networks for emotion recognition. In: 2017 IEEE MIT Undergraduate Research Technology Conference (URTC), pp. 1–4 (2017)
Su, C., Wei, J., Lin, D., Kong, L.: Using attention lsgb network for facial expression recognition. Pattern Anal. Appl. (2022). https://doi.org/10.1007/s10044-022-01124-w
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847 (2018)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos.62106054), the Science and Technology Project of Guangxi (Grant 2018GXNSFAA281351) and the Research Projects of Guangxi Normal University (Natural Sciences) (2021JC012).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xia, H., Lu, L. & Song, S. Feature fusion of multi-granularity and multi-scale for facial expression recognition. Vis Comput 40, 2035–2047 (2024). https://doi.org/10.1007/s00371-023-02900-3
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00371-023-02900-3