Abstract
As human-computer interaction has developed, expectations of it have risen: beyond the precision offered by traditional methods, naturalness and comfort are increasingly sought. Gesture is an innate form of human communication; it is highly intuitive and can serve as an effective means of natural human-computer interaction. This paper investigates dynamic gestures using the 3D skeletal information of gestures. Different cropping boxes are applied to generate a global dataset and a local dataset, according to whether the data depend on the motion trajectory of the gesture. Based on an analysis of the geometric features of skeletal sequences, a dual-stream 3D CNN framework (Double_C3D) is proposed that performs fusion at the feature level and takes 3D heat map video streams as the network input. Finally, the Double_C3D framework is evaluated on the SHREC dynamic gesture recognition dataset and the JHMDB dynamic behavior recognition dataset, achieving accuracies of 91.72% and 70.54%, respectively.
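The abstract does not say how the cropping boxes are computed or how joints are rendered, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes 2D joint coordinates, a Gaussian blob per joint, a single box over the whole sequence for the "global" stream (motion trajectory preserved) and a per-frame box for the "local" stream (trajectory discarded). All function names, the heat map size, and `sigma` are hypothetical.

```python
import numpy as np

def crop_box(points, margin=0.1):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around points of shape (..., 2)."""
    pts = points.reshape(-1, 2)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    pad = margin * np.maximum(hi - lo, 1e-6)  # small padding; avoids a zero-size box
    return np.concatenate([lo - pad, hi + pad])

def render_heatmap(joints, box, size=32, sigma=1.5):
    """Render one frame's joints (J, 2) as a (size, size) heat map inside `box`."""
    x0, y0, x1, y1 = box
    # Map joint coordinates to pixel coordinates of the crop.
    px = (joints[:, 0] - x0) / (x1 - x0) * (size - 1)
    py = (joints[:, 1] - y0) / (y1 - y0) * (size - 1)
    ys, xs = np.mgrid[0:size, 0:size]
    # Squared distance from every pixel to every joint: shape (J, size, size).
    d2 = (xs[None] - px[:, None, None]) ** 2 + (ys[None] - py[:, None, None]) ** 2
    # One Gaussian per joint, merged by taking the maximum over joints.
    return np.exp(-d2 / (2 * sigma ** 2)).max(axis=0)

def global_and_local_streams(seq, size=32):
    """seq: (T, J, 2) joint coordinates. Returns two (T, size, size) volumes."""
    # Assumed "global" stream: one box for the whole sequence, so the
    # position of the hand inside the crop changes as the hand moves.
    g_box = crop_box(seq)
    global_vol = np.stack([render_heatmap(f, g_box, size) for f in seq])
    # Assumed "local" stream: one box per frame, so the hand is re-centred
    # in every frame and only its pose changes remain visible.
    local_vol = np.stack([render_heatmap(f, crop_box(f), size) for f in seq])
    return global_vol, local_vol
```

Under these assumptions, a rigid hand pose that merely translates across frames yields a local volume that is constant over time and a global volume that changes from frame to frame. In a dual-stream design such as the one the abstract describes, each volume would feed its own 3D CNN branch, with the two branches' features combined before classification.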
☆ The authors would like to acknowledge the support from the National Natural Science Foundation of China (62006204, 52075530), the Guangdong Basic and Applied Basic Research Foundation (2022A1515011431), and the Shenzhen Science and Technology Program (RCBS20210609104516043, JSGG20210802154004014). This work is also partially supported by the AiBle project co-financed by the European Regional Development Fund.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
You, X., Gao, Q., Gao, H., Ju, Z. (2023). A Feature Fusion Network for Skeleton-Based Gesture Recognition. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol 14268. Springer, Singapore. https://doi.org/10.1007/978-981-99-6486-4_6
DOI: https://doi.org/10.1007/978-981-99-6486-4_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6485-7
Online ISBN: 978-981-99-6486-4
eBook Packages: Computer Science (R0)