Abstract
Regularized multi-task learning (MTL) algorithms, which can fully exploit the relationships among different related tasks, have gradually been adopted in pattern recognition and computer vision, and many effective approaches based on regularized MTL have been proposed. Although promising human action recognition results have been achieved over the past decades, most existing action recognition algorithms focus on action descriptors, single/multi-view recognition, or multi-modality recognition; few works involve MTL, and a systematic evaluation of existing MTL algorithms for human action recognition is still lacking. In this paper, seven popular regularized MTL algorithms, in which different actions are treated as different tasks, are therefore systematically evaluated on two public multi-view action datasets. Specifically, dense trajectory features are first extracted for each view, a codebook shared by all views is then constructed by k-means, and each video is coded with this shared codebook. Moreover, depending on the regularized MTL algorithm, either all actions or only a subset of actions are considered related, and these actions are assigned to different tasks in MTL. Furthermore, the effect of the number of training samples drawn from different action views is evaluated for MTL. Large-scale experimental results show that: 1) regularized MTL is very useful for action recognition because it can uncover the latent relationships among different actions; 2) not all human actions are related, and if unrelated actions are grouped together in MTL, performance drops; 3) as the number of training samples from different views increases, the relationships among different actions can be more fully exploited, which improves action recognition accuracy.
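The pipeline described above (pooling dense trajectory descriptors across views, clustering them into a shared codebook with k-means, bag-of-words coding, then fitting one regularized model per action task) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and parameters are hypothetical, and the MTL step shown is a simple mean-regularized formulation in which each task's weights are pulled toward the shared mean, standing in for the seven evaluated algorithms.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_shared_codebook(descriptors_per_view, k, seed=0):
    """Pool local descriptors from all camera views and cluster them
    with k-means so that every view shares one codebook."""
    pooled = np.vstack(descriptors_per_view)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pooled)

def encode_video(descriptors, codebook):
    """Bag-of-words coding: histogram of nearest-codeword assignments,
    L1-normalized so videos of different lengths are comparable."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def mean_regularized_mtl(tasks, lam=1.0, rho=1.0, iters=50):
    """Toy mean-regularized MTL: each task t fits ridge weights w_t while
    rho pulls w_t toward the mean of all task weights, so related action
    tasks share information. Solved by alternating closed-form updates of
      (X_t^T X_t + (lam + rho) I) w_t = X_t^T y_t + rho * w_bar.
    `tasks` is a list of (X, y) pairs, one per action task."""
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))
    for _ in range(iters):
        w_bar = W.mean(axis=0)          # current shared mean
        for t, (X, y) in enumerate(tasks):
            A = X.T @ X + (lam + rho) * np.eye(d)
            b = X.T @ y + rho * w_bar
            W[t] = np.linalg.solve(A, b)
    return W
```

Setting `rho` large couples the tasks strongly (appropriate when actions are truly related), while `rho = 0` reduces to independent per-task ridge models, which mirrors the paper's finding that grouping unrelated actions into one MTL problem hurts performance.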
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. 61572357 and 61202168), the Tianjin Research Program of Application Foundation and Advanced Technology (Nos. 14JCZDJC31700 and 13JCQNJC0040), the Tianjin Municipal Natural Science Foundation (No. 13JCQNJC0040), and the China Scholarship Council (No. 201608120021).
Cite this article
Gao, Z., Li, S.H., Zhang, G.T. et al. Evaluation of regularized multi-task learning algorithms for single/multi-view human action recognition. Multimed Tools Appl 76, 20125–20148 (2017). https://doi.org/10.1007/s11042-017-4384-8