Abstract
Generalized canonical correlation analysis (GCCA) has been widely used for classification and regression problems. The key idea of GCCA is to map data from different views into a common space with minimum reconstruction error. However, GCCA employs the squared Frobenius norm as its distance metric when seeking the latent correlated space and has no specific strategy for coping with outliers; in real-world applications, outliers can therefore misguide GCCA's training and lead to suboptimal performance. This motivates a novel robust formulation of GCCA based on p-order (\(0<p\le 2\)) Frobenius-norm minimization, called RGCCA. RGCCA is difficult to solve because the p-order F-norm terms are nonsmooth and nonconvex, so an efficient iterative algorithm is developed and its convergence is analyzed theoretically. In addition, the parameters of RGCCA provide a useful trade-off between accuracy and training time, a property especially valuable for larger sample sizes. Empirical experiments and theoretical analysis demonstrate the effectiveness and robustness of RGCCA on both noiseless and noisy datasets.
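In the notation of the Appendix (views \({X}_{j}\), per-view projections \({U}_{j}\), shared representation \(G\)), the robust formulation can be reconstructed as

\[\underset{G,\left\{{U}_{j}\right\}}{\mathrm{min}}\sum_{j=1}^{J}{\Vert G-{U}_{j}^{T}{X}_{j}\Vert }_{F}^{p}\quad \mathrm{s.t.}\;G{G}^{T}={I}_{r},\qquad 0<p\le 2,\]

which reduces to standard (MAX-VAR) GCCA at \(p=2\). This display is a reading of the Appendix's notation, not a verbatim copy of the paper's numbered formulation (8).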








References
Sun L, Ji S, Ye J (2010) Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis. IEEE Trans Pattern Anal Mach Intell 33:194–200
Sarvestani RR, Boostani R (2017) FF-SKPCCA: kernel probabilistic canonical correlation analysis. Appl Intell 46:438–454
Elmadany NED, He Y, Guan L (2018) Information fusion for human action recognition via biset/multiset globality locality preserving canonical correlation analysis. IEEE Trans Image Process 27:5275–5287
Wong HS, Wang L, Chan R, Zeng T (2021) Deep tensor CCA for multi-view learning. IEEE Trans Big Data 8:1664–1677
Chen Z, Liang K, Ding SX, Yang C, Peng T, Yuan X (2021) A comparative study of deep neural network-aided canonical correlation analysis-based process monitoring and fault detection methods. IEEE Trans Neural Netw Learn Syst 33:6158–6172
Safayani M, Ahmadi SH, Afrabandpey H, Mirzaei A (2018) An EM based probabilistic two-dimensional CCA with application to face recognition. Appl Intell 48:755–770
Sun S, Xie X, Yang M (2015) Multiview uncorrelated discriminant analysis. IEEE Trans Cybern 46:3272–3284
Luo Y, Tao D, Ramamohanarao K, Xu C, Wen Y (2015) Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Trans Knowl Data Eng 27:3111–3124
Gao L, Qi L, Chen E, Guan L (2017) Discriminative multiple canonical correlation analysis for information fusion. IEEE Trans Image Process 27:1951–1965
Chen H, Chen Z, Chai Z, Jiang B, Huang B (2021) A single-side neural network-aided canonical correlation analysis with applications to fault diagnosis. IEEE Trans Cybern 52:9454–9466
Xiu X, Pan L, Yang Y, Liu W (2022) Efficient and fast joint sparse constrained canonical correlation analysis for fault detection. IEEE Trans Neural Netw Learn Syst:1–11. https://doi.org/10.1109/TNNLS.2022.3201881
Wang Y, Cang S, Yu H (2019) Mutual information inspired feature selection using kernel canonical correlation analysis. Expert Syst Appl 4:1–9
Chen L, Wang K, Li M, Wu M, Pedrycz W, Hirota K (2022) K-means clustering-based kernel canonical correlation analysis for multimodal emotion recognition in human-robot interaction. IEEE Trans Industr Electron 70:1016–1024
Andrew G, Arora R, Bilmes J, Livescu K (2013) Deep canonical correlation analysis. In: Proceedings of the 30th International Conference on Machine Learning, pp 1247–1255
Wang W, Arora R, Livescu K, Bilmes J (2015) On deep multi-view representation learning. In: Proceedings of the 32nd International Conference on Machine Learning, pp 1083–1092
Xiu X, Miao Z, Yang Y, Liu W (2021) Deep canonical correlation analysis using sparsity constrained optimization for nonlinear process monitoring. IEEE Trans Industr Inf 18:6690–6699
Yu Y, Tang S, Aizawa K, Aizawa A (2018) Category-based deep CCA for fine-grained venue discovery from multimodal data. IEEE Trans Neural Netw Learn Syst 30:1250–1258
Horst P (1961) Generalized canonical correlations and their application to experimental data. J Clin Psychol 17:331–347
Kanatsoulis CI, Fu X, Sidiropoulos ND, Hong M (2018) Structured SUMCOR multiview canonical correlation analysis for large-scale data. IEEE Trans Signal Process 67:306–319
Fu X, Huang K, Hong M, Sidiropoulos ND, So AM-C (2017) Scalable and flexible multiview MAX-VAR canonical correlation analysis. IEEE Trans Signal Process 65:4150–4165
Carroll JD (1968) Generalization of canonical correlation analysis to three or more sets of variables. In: Proceedings of the 76th annual convention of the American Psychological Association, pp 227–228
Lu C, Feng J, Chen Y, Liu W, Lin Z, Yan S (2019) Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans Pattern Anal Mach Intell 42:925–938
Gao Y, Lin T, Zhang Y, Luo S, Nie F (2021) Robust principal component analysis based on discriminant information. IEEE Trans Knowl Data Eng 35:1991–2003
Sørensen M, Kanatsoulis CI, Sidiropoulos ND (2021) Generalized canonical correlation analysis: a subspace intersection approach. IEEE Trans Signal Process 69:2452–2467
Zheng T, Ge H, Li J, Wang L (2021) Unsupervised multi-view representation learning with proximity guided representation and generalized canonical correlation analysis. Appl Intell 51:248–264
Gloaguen A, Philippe C, Frouin V, Gennari G, Dehaene-Lambertz G, Le Brusquet L, Tenenhaus A (2022) Multiway generalized canonical correlation analysis. Biostatistics 23:240–256
Chu D, Liao LZ, Ng MK, Zhang X (2013) Sparse canonical correlation analysis: new formulation and algorithm. IEEE Trans Pattern Anal Mach Intell 35:3050–3065
Hardoon DR, Shawe-Taylor J (2011) Sparse canonical correlation analysis. Mach Learn 83:331–353
Xu M, Zhu Z, Zhang X, Zhao Y, Li X (2019) Canonical correlation analysis with L2,1-norm for multiview data representation. IEEE Trans Cybern 50:4772–4782
Li Y, Yang M, Zhang Z (2018) A survey of multi-view representation learning. IEEE Trans Knowl Data Eng 31:1863–1883
Yang X, Liu W, Liu W, Tao D (2019) A survey on canonical correlation analysis. IEEE Trans Knowl Data Eng 33:2349–2368
Zhao H, Wang Z, Nie F (2018) A new formulation of linear discriminant analysis for robust dimensionality reduction. IEEE Trans Knowl Data Eng 31:629–640
Yu Y, Xu G, Jiang M, Zhu H, Dai D, Yan H (2019) Joint transformation learning via the L2,1-norm metric for robust graph matching. IEEE Trans Cybern 51:521–533
Nie F, Wang Z, Wang R, Wang Z, Li X (2019) Towards robust discriminative projections learning via non-greedy L2,1-norm minmax. IEEE Trans Pattern Anal Mach Intell 43:2086–2100
Bala R, Dagar A, Singh RP (2021) A novel online sequential extreme learning machine with L2,1-norm regularization for prediction problems. Appl Intell 51:1669–1689
Tenenhaus A, Tenenhaus M (2014) Regularized generalized canonical correlation analysis for multiblock or multigroup data analysis. Eur J Oper Res 238:391–403
Tenenhaus A, Philippe C, Frouin V (2015) Kernel generalized canonical correlation analysis. Comput Stat Data Anal 90:114–131
Tenenhaus M, Tenenhaus A, Groenen PJ (2017) Regularized generalized canonical correlation analysis: a framework for sequential multiblock component methods. Psychometrika 82:737–777
Li X, Xiu X, Liu W, Miao Z (2021) An efficient newton-based method for sparse generalized canonical correlation analysis. IEEE Signal Process Lett 29:125–129
LeCun Y (1998) The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/
Ionescu C, Papava D, Olaru V, Sminchisescu C (2013) Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36:1325–1339
Martin N, Maes H (1979) Multivariate analysis. Academic Press, London
Hardoon DR, Szedmak S, Shawe-Taylor J (2004) Canonical correlation analysis: an overview with application to learning methods. Neural Comput 16:2639–2664
Chen J, Wang G, Shen Y, Giannakis GB (2018) Canonical correlation analysis of datasets with a common source graph. IEEE Trans Signal Process 66:4398–4408
Wang Y, Shahrampour S (2021) ORCCA: optimal randomized canonical correlation analysis. IEEE Trans Neural Netw Learn Syst:1–13. https://doi.org/10.1109/TNNLS.2021.3124868
Fu X, Huang K, Papalexakis E, Song HA, Talukdar PP, Faloutsos C, Sidiropoulos N, Mitchell T (2018) Efficient and distributed generalized canonical correlation analysis for big multiview data. IEEE Trans Knowl Data Eng 31:2304–2318
Wang Q, Gao Q, Xie D, Gao X, Wang Y (2018) Robust DLPP with nongreedy L1-norm minimization and maximization. IEEE Trans Neural Netw Learn Syst 29:738–743
Yan H, Ye Q, Zhang T, Yu D-J, Yuan X, Xu Y, Fu L (2018) Least squares twin bounded support vector machines based on L1-norm distance metric for classification. Pattern Recogn 74:434–447
Jin J, Xiao R, Daly I, Miao Y, Wang X, Cichocki A (2020) Internal feature selection method of CSP based on L1-norm and Dempster-Shafer theory. IEEE Trans Neural Netw Learn Syst 32:4814–4825
Li Y, Sun H, Yan W, Cui Q (2021) R-CTSVM+: Robust capped L1-norm twin support vector machine with privileged information. Inf Sci 574:12–32
Lai Z, Xu Y, Yang J, Shen L, Zhang D (2016) Rotational invariant dimensionality reduction algorithms. IEEE Trans Cybern 47:3733–3746
Ye Q, Li Z, Fu L, Zhang Z, Yang W, Yang G (2019) Nonpeaked discriminant analysis for data representation. IEEE Trans Neural Netw Learn Syst 30:3818–3832
Ye Q, Huang P, Zhang Z, Zheng Y, Fu L, Yang W (2021) Multiview learning with robust double-sided twin SVM. IEEE Trans Cybern 52:12745–12758
Nakkala MR, Singh A, Rossi A (2021) Multi-start iterated local search, exact and matheuristic approaches for minimum capacitated dominating set problem. Appl Soft Comput 108:1–19
Mao J, Pan Q, Miao Z, Gao L (2021) An effective multi-start iterated greedy algorithm to minimize makespan for the distributed permutation flowshop scheduling problem with preventive maintenance. Expert Syst Appl 169:1–11
Ye Q, Zhao H, Li Z, Yang X, Gao S, Yin T, Ye N (2017) L1-norm distance minimization-based fast robust twin support vector k-plane clustering. IEEE Trans Neural Netw Learn Syst 29:4494–4503
Kim C, Klabjan D (2019) A simple and fast algorithm for L1-norm kernel PCA. IEEE Trans Pattern Anal Mach Intell 42:1842–1855
Li C, Ren P, Shao Y, Ye Y, Guo Y (2020) Generalized elastic net Lp-norm nonparallel support vector machine. Eng Appl Artif Intell 88:1–16
Yan H, Fu L, Hu J, Ye Q, Qi Y, Yu D-J (2022) Robust distance metric optimization driven GEPSVM classifier for pattern classification. Pattern Recogn 129:1–14
Zhang C, Fu H, Hu Q, Cao X, Xie Y, Tao D, Xu D (2018) Generalized latent multi-view subspace clustering. IEEE Trans Pattern Anal Mach Intell 42:86–99
Fu L, Li Z, Ye Q, Yin H, Liu Q, Chen X, Fan X, Yang W, Yang G (2020) Learning robust discriminant subspace based on joint L2,p- and L2,s-norm distance metrics. IEEE Trans Neural Netw Learn Syst 33:130–144
Ma J (2020) Capped L1-norm distance metric-based fast robust twin extreme learning machine. Appl Intell 50:3775–3787
Liu Y, Jia R, Liu Q, Zhang X, Sun H (2021) Crowd counting method based on the self-attention residual network. Appl Intell 51:427–440
Khouloud S, Ahlem M, Fadel T, Amel S (2022) W-net and inception residual network for skin lesion segmentation and classification. Appl Intell 52:3976–3994
Acknowledgements
This work was supported by the National Key Research and Development Program of China: Key Projects of International Scientific and Technological Innovation Cooperation between Governments (No. 2019YFE0123800), the National Natural Science Foundation of China (Nos. 62072243 and 62072246), the Natural Science Foundation of Jiangsu Province (BK20201304), the Foundation of National Defense Key Laboratory of Science and Technology (JZX7Y202001SY000901), the "333 Project" of Jiangsu Province (BRA2020044), and the EU's Horizon 2020 Program (LC-GV-05-2019).
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Convergence Analysis
This section proves that Algorithm 1 monotonically decreases the objective function value of (8) and converges to a local optimal solution. We first introduce Lemma 1.
Lemma 1 [61]: For any nonzero matrices \({V}^{\left(t+1\right)}\) and \({V}^{t}\), when \(0<p\le 2\), the following inequality holds:
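Stated in the form this lemma standardly takes in the \(L_{2,p}\)-norm minimization literature (the display is reconstructed here under that assumption):

\[{\Vert {V}^{\left(t+1\right)}\Vert }_{F}^{p}-\frac{p}{2}\frac{{\Vert {V}^{\left(t+1\right)}\Vert }_{F}^{2}}{{\Vert {V}^{t}\Vert }_{F}^{2-p}}\le {\Vert {V}^{t}\Vert }_{F}^{p}-\frac{p}{2}\frac{{\Vert {V}^{t}\Vert }_{F}^{2}}{{\Vert {V}^{t}\Vert }_{F}^{2-p}}.\]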
Theorem 1: In each iteration of Algorithm 1, we have
This means that Algorithm 1 monotonically decreases the objective function value of (8).
Proof: According to step 3 of Algorithm 1, in the \(\left(t+1\right)\)th iteration we have the following inequality:
Multiplying inequality (A.3) by \(-1\) and adding \(\sum_{j=1}^{J}tr\left({I}_{N}{d}_{j}^{t}\right)\) to both sides, we have
By simple algebra, (A.4) becomes,
Since \({I}_{N}-{P}_{j}\) is symmetric and idempotent, \({\Vert G\left({I}_{N}-{P}_{j}\right)\Vert }_{F}^{2}=tr\left(G\left({I}_{N}-{P}_{j}\right){G}^{T}\right)\) holds for each \(j\); thus (A.5) becomes
From the definitions of \({U}_{j}\) and \({P}_{j}\), (A.6) becomes,
From the definition of \({d}_{j}\), and denoting \({V}_{j}^{\left(t+1\right)}={G}^{\left(t+1\right)}-{U}_{j}^{T}{X}_{j}\) and \({V}_{j}^{t}={G}^{t}-{U}_{j}^{T}{X}_{j}\), we substitute \({V}_{j}^{\left(t+1\right)}\) and \({V}_{j}^{t}\) into (A.7) and, by simple algebra, rewrite it as follows:
According to Lemma 1, we have
Inequality (A.9) holds for each index \(j\); summing over \(j\), we have
Combining inequalities (A.8) and (A.10), by simple algebra, we have
According to the definitions of \({V}_{j}^{\left(t+1\right)}\) and \({V}_{j}^{t}\), (A.11) can be rewritten as follows:
Inequality (A.12) indicates that Algorithm 1 monotonically decreases the objective function value of (8) in each iteration; that is, \({G}^{*}\) moves toward the optimal solution as the iterations proceed. □
Theorem 2: Algorithm 1 converges to a local optimal solution of the objective function (8).
Proof: Problem (8) is a constrained extremum problem. We transform it into an unconstrained one via the Lagrange multiplier method; the Lagrange auxiliary function of (8) is as follows:
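Writing \(J\left(G\right)=\sum_{j=1}^{J}{\Vert G-{U}_{j}^{T}{X}_{j}\Vert }_{F}^{p}\) for the objective of (8), a generic reconstruction of this auxiliary function (the original display is not reproduced above, so the exact form is an assumption) is

\[{L}_{1}\left(G,\alpha \right)=J\left(G\right)+tr\left(\alpha \left(G{G}^{T}-{I}_{r}\right)\right),\]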
where \(\alpha\) is the Lagrange multiplier. To enforce the constraint \(G{G}^{T}={I}_{r}\), we take \(\alpha\) to be a diagonal matrix. According to the KKT conditions for the optimal solution, setting \(\partial {L}_{1}/\partial G=0\) yields
where \(\overline{M}=\left({I}_{N}-2M\right)\). By simple algebra, we obtain the optimal solution
According to the aforementioned analysis, the optimal solution of the objective function (12) can be obtained in step 3 of Algorithm 1. Therefore, the converged solution \({G}^{*}\) of Algorithm 1 satisfies the KKT condition of (12). The Lagrange auxiliary function of (12) is as follows:
where \(\overline{\alpha }\) is the Lagrange multiplier. Setting \(\partial {L}_{2}/\partial G=0\), we obtain the KKT condition of (12) as follows:
(A.17) is formally similar to (A.15); the key difference is that in (A.17) the diagonal matrix \(D\) is known at each iteration. Suppose Algorithm 1 reaches its optimal solution \({G}^{*}\) at the \(\left(t+1\right)\)th iteration, so that \({G}^{\left(t+1\right)}={G}^{*}={G}^{t}\). Then, by the definition of \(D\), (A.17) coincides with (A.15). Hence the converged solution of Algorithm 1 satisfies the KKT condition of the objective function (8), that is, \(\left(\partial {L}_{1}/\partial G\right)\big|_{G={G}^{*}}=0\), and is therefore a local optimal solution of (8). □
According to Theorem 1, the developed iterative algorithm searches for a local optimal solution of the objective function (8). However, it must solve an eigen-decomposition of the matrix \(DM\) at every iteration, which makes RGCCA computationally more expensive than the GCCA family of methods; this drawback is shared by other iterative algorithms [47, 48, 58, 61, 62]. In future work we will explore a theoretically guaranteed strategy to reduce this cost. Fortunately, RGCCA's parameters can be tuned to trade robustness against cost, which is especially useful for larger sample sizes. With such tuning, Algorithm 1 scales to large-scale datasets, indicating that the developed iterative algorithm is useful for practical applications.
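For concreteness, the following Python sketch illustrates the iterative scheme analyzed above under common MAX-VAR GCCA assumptions: per-view projections \({P}_{j}={X}_{j}^{T}{\left({X}_{j}{X}_{j}^{T}\right)}^{-1}{X}_{j}\), IRLS-style weights \({d}_{j}=\left(p/2\right){\Vert {V}_{j}\Vert }_{F}^{p-2}\), and a top-\(r\) eigen-decomposition of the weighted matrix at each step. The function name `rgcca_sketch`, the ridge term `reg`, and all default parameters are illustrative choices of ours, not the authors' exact Algorithm 1.

```python
import numpy as np

def rgcca_sketch(views, r, p=1.0, reg=1e-6, max_iter=50, tol=1e-6):
    """Illustrative IRLS-style solver for a robust MAX-VAR GCCA objective:
        min_G  sum_j ||G - U_j^T X_j||_F^p   s.t.  G G^T = I_r,
    with each view X_j of shape (m_j, N). A sketch of the general scheme,
    not the paper's exact Algorithm 1."""
    N = views[0].shape[1]
    I_N = np.eye(N)
    # Projection matrices P_j = X_j^T (X_j X_j^T + reg*I)^{-1} X_j
    # (a small ridge term is added for numerical stability).
    Ps = [X.T @ np.linalg.solve(X @ X.T + reg * np.eye(X.shape[0]), X)
          for X in views]
    d = np.ones(len(views))   # per-view weights, all equal at the start
    prev_obj = np.inf
    for _ in range(max_iter):
        # Weighted subproblem: max_G tr(G (sum_j d_j P_j) G^T) s.t. GG^T = I_r,
        # solved by the top-r eigenvectors of the weighted sum.
        Mw = sum(w * P for w, P in zip(d, Ps))
        _, eigvecs = np.linalg.eigh(Mw)   # eigenvalues in ascending order
        G = eigvecs[:, -r:].T             # (r, N) with orthonormal rows
        # Residual V_j = G (I_N - P_j) equals G - U_j^T X_j at the optimal U_j;
        # update the IRLS weights d_j = (p/2) ||V_j||_F^{p-2}.
        obj = 0.0
        for j, P in enumerate(Ps):
            res = np.linalg.norm(G @ (I_N - P), 'fro')
            obj += res ** p
            d[j] = (p / 2.0) * max(res, 1e-12) ** (p - 2)
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return G

# Toy usage: three random views of N = 100 samples.
rng = np.random.default_rng(0)
views = [rng.normal(size=(m, 100)) for m in (5, 8, 6)]
G = rgcca_sketch(views, r=3, p=1.0)
print(G.shape)                            # (3, 100)
print(np.allclose(G @ G.T, np.eye(3)))    # True: G G^T = I_r
```

At \(p=2\) the weights stay constant and a single eigen-decomposition suffices, recovering the non-iterative cost profile of standard GCCA; for \(p<2\), each iteration re-weights the views and repeats the eigen-decomposition, which is the source of the extra cost discussed above.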
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yan, H., Cheng, L., Ye, Q. et al. Robust generalized canonical correlation analysis. Appl Intell 53, 21140–21155 (2023). https://doi.org/10.1007/s10489-023-04666-6