Abstract
Regularized multi-task learning (MTL) algorithms, which can fully exploit the relationships among different related tasks, have gradually been adopted in pattern recognition and computer vision, and many effective approaches based on regularized MTL have been proposed. Although promising human action recognition results have been achieved over the past decades, most existing action recognition algorithms focus on action descriptors, single/multi-view recognition, or multi-modality recognition; few works involve MTL, and a systematic evaluation of existing MTL algorithms for human action recognition is still lacking. In this paper, seven popular regularized MTL algorithms, in which different actions are treated as different tasks, are therefore systematically evaluated on two public multi-view action datasets. Specifically, dense trajectory features are first extracted for each view, a codebook shared by all views is then constructed by k-means, and each video is coded with this shared codebook. Moreover, depending on the regularized MTL algorithm, either all actions or only a subset of actions are considered related, and these actions are assigned to different tasks in MTL. Furthermore, the effect of the number of training samples drawn from different action views is evaluated for MTL. Large-scale experimental results show that: 1) regularized MTL is very useful for action recognition because it can uncover the latent relationships among different actions; 2) not all human actions are related, and if unrelated actions are grouped together in MTL, performance drops; 3) as the number of training samples from different views increases, the relationships among different actions can be more fully exploited, which improves action recognition accuracy.
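The pipeline described above (pooling dense trajectory descriptors across views, clustering them into a shared codebook with k-means, bag-of-words coding, then fitting one regularized model per action task) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and parameters are hypothetical, and the MTL step shown is a simple mean-regularized formulation in which each task's weights are pulled toward the shared mean, standing in for the seven evaluated algorithms.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_shared_codebook(descriptors_per_view, k, seed=0):
    """Pool local descriptors from all camera views and cluster them
    with k-means so that every view shares one codebook."""
    pooled = np.vstack(descriptors_per_view)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pooled)

def encode_video(descriptors, codebook):
    """Bag-of-words coding: histogram of nearest-codeword assignments,
    L1-normalized so videos of different lengths are comparable."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def mean_regularized_mtl(tasks, lam=1.0, rho=1.0, iters=50):
    """Toy mean-regularized MTL: each task t fits ridge weights w_t while
    rho pulls w_t toward the mean of all task weights, so related action
    tasks share information. Solved by alternating closed-form updates of
      (X_t^T X_t + (lam + rho) I) w_t = X_t^T y_t + rho * w_bar.
    `tasks` is a list of (X, y) pairs, one per action task."""
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))
    for _ in range(iters):
        w_bar = W.mean(axis=0)          # current shared mean
        for t, (X, y) in enumerate(tasks):
            A = X.T @ X + (lam + rho) * np.eye(d)
            b = X.T @ y + rho * w_bar
            W[t] = np.linalg.solve(A, b)
    return W
```

Setting `rho` large couples the tasks strongly (appropriate when actions are truly related), while `rho = 0` reduces to independent per-task ridge models, which mirrors the paper's finding that grouping unrelated actions into one MTL problem hurts performance.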
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. 61572357 and 61202168), the Tianjin Research Program of Application Foundation and Advanced Technology (Nos. 14JCZDJC31700 and 13JCQNJC0040), the Tianjin Municipal Natural Science Foundation (No. 13JCQNJC0040), and the China Scholarship Council (No. 201608120021).
Cite this article
Gao, Z., Li, S.H., Zhang, G.T. et al. Evaluation of regularized multi-task learning algorithms for single/multi-view human action recognition. Multimed Tools Appl 76, 20125–20148 (2017). https://doi.org/10.1007/s11042-017-4384-8