A Feature Fusion Network for Skeleton-Based Gesture Recognition

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14268)

Abstract

As technology has advanced, the demands placed on human-computer interaction have grown: beyond the traditional requirement of precision, interaction is now also expected to be natural and comfortable. Gesture is an innate form of human communication; it is highly intuitive and can serve as an effective means of natural human-computer interaction. In this paper, dynamic gestures are investigated based on the 3D skeletal information of gestures, and different cropping boxes are used to generate global and local datasets, according to whether the boxes depend on the motion trajectory of the gesture. By analyzing the geometric features of skeletal sequences, a dual-stream 3D CNN (Double_C3D) framework is proposed that performs fusion at the feature level. The framework relies on 3D heat map video streams, which are used as the input to the network. Finally, the Double_C3D framework is evaluated on the SHREC dynamic gesture recognition dataset and the JHMDB dynamic behavior recognition dataset, achieving accuracies of 91.72% and 70.54%, respectively.
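
The abstract does not spell out how the two datasets or the fusion step are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the global stream uses a single cropping box for the whole clip (so the motion trajectory is kept), that the local stream uses one box per frame (so the trajectory is discarded), that joints are rendered as Gaussian heat maps stacked over time, and that feature-level fusion is plain concatenation. All function names, the 2D joint coordinates, the grid size, and sigma are hypothetical choices made for illustration.

```python
import math

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def global_box(sequence):
    """Assumed global crop: one box for the whole clip, so motion is kept."""
    return bounding_box([p for frame in sequence for p in frame])

def local_boxes(sequence):
    """Assumed local crop: one box per frame, so the trajectory is removed."""
    return [bounding_box(frame) for frame in sequence]

def heatmap(frame, box, size=8, sigma=1.0):
    """Render each joint as a Gaussian blob on a size x size grid over `box`."""
    x0, y0, x1, y1 = box
    sx = (size - 1) / ((x1 - x0) or 1.0)  # guard against zero-width boxes
    sy = (size - 1) / ((y1 - y0) or 1.0)
    grid = [[0.0] * size for _ in range(size)]
    for jx, jy in frame:
        cx, cy = (jx - x0) * sx, (jy - y0) * sy
        for r in range(size):
            for c in range(size):
                d2 = (c - cx) ** 2 + (r - cy) ** 2
                grid[r][c] = max(grid[r][c], math.exp(-d2 / (2 * sigma ** 2)))
    return grid

def heatmap_volume(sequence, boxes, size=8, sigma=1.0):
    """Stack per-frame heat maps into a (time, height, width) volume."""
    return [heatmap(f, b, size, sigma) for f, b in zip(sequence, boxes)]

def fuse(features_a, features_b):
    """Feature-level fusion, assumed here to be simple concatenation."""
    return list(features_a) + list(features_b)

# Toy clip: two joints that shift to the right between two frames.
sequence = [[(0, 0), (1, 1)], [(2, 0), (3, 1)]]
g_box = global_box(sequence)
l_boxes = local_boxes(sequence)
global_stream = heatmap_volume(sequence, [g_box] * len(sequence))
local_stream = heatmap_volume(sequence, l_boxes)
fused = fuse([0.1, 0.2], [0.3])  # stand-ins for per-stream feature vectors
```

In this toy clip the two local heat maps are identical because the hand pose is unchanged, while the global heat maps differ because the hand has moved. That is the distinction between trajectory-independent and trajectory-dependent inputs that the sketch is meant to show. In a real pipeline each volume would feed its own 3D CNN stream before the fusion step.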

The authors would like to acknowledge the support from the National Natural Science Foundation of China (62006204, 52075530), the Guangdong Basic and Applied Basic Research Foundation (2022A1515011431), and Shenzhen Science and Technology Program (RCBS20210609104516043, JSGG20210802154004014). This work is also partially supported by the AiBle project co-financed by the European Regional Development Fund.

Author information

Correspondence to Qing Gao or Zhaojie Ju.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

You, X., Gao, Q., Gao, H., Ju, Z. (2023). A Feature Fusion Network for Skeleton-Based Gesture Recognition. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol 14268. Springer, Singapore. https://doi.org/10.1007/978-981-99-6486-4_6

  • DOI: https://doi.org/10.1007/978-981-99-6486-4_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6485-7

  • Online ISBN: 978-981-99-6486-4

  • eBook Packages: Computer Science, Computer Science (R0)
