Abstract
A new supervised feature extraction method suited to small sample size situations is proposed in this work. The method is based on first-order statistics, so there is no need to estimate scatter matrices; it therefore avoids the singularity problem that arises in small sample size situations while still achieving high performance in them. Moreover, because the algorithm exploits only first-order statistical moments, it is very fast, which makes it suitable for real-time hyperspectral scene analysis. The method constructs a matrix whose columns are obtained by averaging the training samples of the different classes. A new transform then maps the features from the original space into a low-dimensional space in which the new features are as different from one another as possible. Subsequently, to capture the inherent nonlinearity of the original data, the algorithm is extended using the kernel trick. In the experiments, four widely used hyperspectral datasets, namely Indian Pines, University of Pavia, Salinas, and Botswana, are classified. The results show that the proposed algorithm achieves state-of-the-art performance in small sample size situations.
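As a rough illustration of the first-order-statistics step described in the abstract, the sketch below builds a matrix A whose columns are the per-class mean spectra of the training samples. This is a minimal sketch of that single step only, assuming NumPy arrays; the function name class_mean_matrix and the array shapes are our own illustrative choices, not the authors' code, and the subsequent transform and kernel extension are not shown.

```python
import numpy as np

def class_mean_matrix(X, y):
    """Build the matrix A whose columns are the per-class mean spectra.

    X : (n_samples, n_bands) array of training spectra
    y : (n_samples,) array of integer class labels
    """
    classes = np.unique(y)
    # First-order statistics only: one mean vector per class, no scatter
    # matrices, so no singularity issue with few training samples.
    A = np.stack([X[y == c].mean(axis=0) for c in classes], axis=1)
    return A  # shape: (n_bands, n_classes)
```

Because only class means are estimated, the cost of this step grows linearly with the number of training samples and bands, which is consistent with the speed claim above.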







References
Bo C, Lu H, Wang D (2017) Spectral-spatial K-Nearest Neighbor approach for hyperspectral image classification. Multimedia Tools Appl. https://doi.org/10.1007/s11042-017-4403-9
Camps-Valls G, Shervashidze N, Borgwardt KM (2010) Spatio-spectral remote sensing image classification with graph kernels. IEEE Geosci Remote Sens Lett 7(4):741–745
Cao X, Han J, Yang S, Tao D, Jiao L (2016) Band selection and evaluation with spatial information. Int J Remote Sens 37(19):4501–4520
Cao X, Wei C, Han J, Jiao L (2017) Hyperspectral band selection using improved classification map. IEEE Geosci Remote Sens Lett
Chang CC, Lin CJ (2008) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chen LF, Mark Liao HY, Ko MT, Lin JC, Yu GJ (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recogn 33:1713–1726
Cui Y, Fan L (2012) Feature extraction using fuzzy maximum margin criterion. Neurocomputing 86:52–58
Dehghani H, Ghassemian H (2006) Measurement of uncertainty by the entropy: application to the classification of MSS data. Int J Remote Sens 27(18):4005–4014
Foody GM (2004) Thematic map comparison. Photogramm Eng Remote Sens 70(5):627–633
Ghimire D, Jeong S, Lee J, Park SH (2017) Facial expression recognition based on local region specific features and support vector machines. Multimedia Tools Appl 76(6):7803–7821
Hastie T, Buja A, Tibshirani R (1995) Penalized discriminant analysis. Ann Stat 23(1):73–102
Howland P, Park H (2004) Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans Pattern Anal Mach Intell 26(8):995–1006
Imani M, Ghassemian H (2014) Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation. IEEE Geosci Remote Sens Lett 11(11):1986–1990
Imani M, Ghassemian H (2014) Band clustering-based feature extraction for classification of hyperspectral images using limited training samples. IEEE Geosci Remote Sens Lett 11(8):1325–1329
Imani M, Ghassemian H (2015) Feature extraction using weighted training samples. IEEE Geosci Remote Sens Lett 12(7):1387–1391
Imani M, Ghassemian H (2015) Ridge regression-based feature extraction for hyperspectral data. Int J Remote Sens 36(6):1728–1742
Ji SW, Ye JP (2008) Generalized linear discriminant analysis: a unified framework and efficient model selection. IEEE Trans Neural Netw 19(10):1768–1782
Jiang J, Chen C, Yu Y, Jiang X, Ma J (2017) Spatial-aware collaborative representation for hyperspectral remote sensing image classification. IEEE Geosci Remote Sens Lett 14(3):404–408
Jiang X, Fang X, Chen Z, Gao J, Jiang J, Cai Z (2017) Supervised Gaussian process latent variable model for hyperspectral image classification. IEEE Geosci Remote Sens Lett 14(10):1760–1764
Kamandar M, Ghassemian H (2013) Linear feature extraction for hyperspectral images based on information theoretic learning. IEEE Geosci Remote Sens Lett 10(4):702–706
Landgrebe DA (2002) Hyperspectral image data analysis. IEEE Signal Process Mag 19(1):17–28
Li J et al (2015) Multiple feature learning for hyperspectral image classification. IEEE Trans Geosci Remote Sens 53(3):1592–1606
Lu J, Plataniotis KN, Venetsanopoulos AN (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26(2):181–191
Marconcini M, Camps-Valls G, Bruzzone L (2009) A composite semisupervised SVM for classification of hyperspectral images. IEEE Geosci Remote Sens Lett 6(2):234–238
Melgani F, Bruzzone L (2004) Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans Geosci Remote Sens 42(8):1778–1790
Ren Y, Liao L, Maybank S, Zhang Y, Liu X (2017) Hyperspectral image spectral-spatial feature extraction via tensor principal component analysis. IEEE Geosci Remote Sens Lett 14(9):1431–1435
Schacke K (2004) On the Kronecker product. Master’s Thesis, University of Waterloo
Shahdoosti HR, Javaheri N (2017) Pansharpening of clustered MS and pan images considering mixed pixels. IEEE Geosci Remote Sens Lett 14(6):826–830
Shahdoosti HR, Javaheri N (2018) A new hybrid feature extraction method in a dyadic scheme for classification of hyperspectral data. Int J Remote Sens 39(1):101–130
Shahdoosti HR, Mirzapour F (2017) Spectral–spatial feature extraction using orthogonal linear discriminant analysis for classification of hyperspectral data. Eur J Remote Sens 50(1):111–124
Shahshahani BM, Landgrebe DA (1994) The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans Geosci Remote Sens 32(5):1087–1095
Sharma A, Dubey A, Tripathi P, Kumar V (2010) Pose invariant virtual classifiers from single training image using novel hybrid-eigenfaces. Neurocomputing 73(10):1868–1880
Tong F, Tong H, Jiang J, Zhang Y (2017) Multiscale union regions adaptive sparse representation for hyperspectral image classification. Remote Sens 9(9):872
Wang JG, Lin YS, Yang WK, Yang JY (2008) Kernel maximum scatter difference based feature extraction and its application to face recognition. Pattern Recogn Lett 29:1832–1835
Wenjing T, Fei G, Renren D, Yujuan S, Ping L (2017) Face recognition based on the fusion of wavelet packet sub-images and Fisher linear discriminant. Multimedia Tools Appl 76(21):22725–22740
Xia J, Chanussot J, Du P, He X (2014) (Semi-)supervised probabilistic principal component analysis for hyperspectral remote sensing image classification. IEEE J Sel Topics Appl Earth Observ Remote Sens 7(6):2224–2236
Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
Yan D, Chu Y, Li L, Liu D (2017) Hyperspectral remote sensing image classification with information discriminative extreme learning machine. Multimedia Tools Appl. https://doi.org/10.1007/s11042-017-4494-3
Ye JP (2006) Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res 7:1183–1204
Ye JP, Li Q (2005) A two-stage linear discriminant analysis via QR-decomposition. IEEE Trans Pattern Anal Mach Intell 27(6):929–941
Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recogn 34:2067–2070
Zhou X, Li S, Tang F, Qin K, Hu S, Liu S (2017) Deep learning with grouped features for spatial spectral classification of hyperspectral images. IEEE Geosci Remote Sens Lett 14(1):97–101
Zhu M, Martinez AM (2006) Selecting principal components in a two-stage LDA algorithm. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol 1, pp 132–137
Acknowledgements
The authors would like to thank the six anonymous reviewers for their helpful suggestions, as well as the Associate Editor for handling the paper.
Appendix
Firstly, the derivation of Eq. (8) of the paper is described. Considering the equality \( 2\left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\mathbf{A}^T-2\lambda \mathbf{W}=\mathbf{0} \) (see Eq. (7)), one can apply the cs (column-stacking) operator to both sides of the equality:

\( 2\,\mathrm{cs}\left(\left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\mathbf{A}^T\right)-2\lambda\,\mathrm{cs}(\mathbf{W})=\mathbf{0}. \)  (16)

Using the equality \( \mathrm{cs}(\mathbf{a}\mathbf{b}\mathbf{c})=\left(\mathbf{c}^T\otimes \mathbf{a}\right)\mathrm{cs}(\mathbf{b}) \) [28], where \( \otimes \) is the Kronecker product, and noting that \( \mathbf{A}\mathbf{A}^T \) is symmetric, one can rewrite Eq. (16) as:

\( \left(\mathbf{A}\mathbf{A}^T\otimes \left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\right)\mathrm{cs}(\mathbf{W})=\lambda\,\mathrm{cs}(\mathbf{W}), \)  (17)

which is equal to Eq. (8).
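The column-stacking identity invoked above can be verified numerically. The following is a minimal sketch, assuming cs denotes column stacking (the vec operator); the shapes and random seed are arbitrary illustrative choices:

```python
import numpy as np

def cs(M):
    """Column-stacking operator: concatenate the columns of M into one vector."""
    return M.reshape(-1, order='F')  # 'F' = column-major (Fortran) order

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 3))
b = rng.normal(size=(3, 5))
c = rng.normal(size=(5, 2))

# Identity from [28]: cs(abc) = (c^T kron a) cs(b)
assert np.allclose(cs(a @ b @ c), np.kron(c.T, a) @ cs(b))
```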
Secondly, the reason why \( \mathrm{cs}(\mathbf{W}) \) is taken as the eigenvector corresponding to the largest eigenvalue of \( \mathbf{A}\mathbf{A}^T\otimes \left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right) \) is explained. Eq. (5) can be rewritten as:

\( g=\mathrm{tr}\left(\mathbf{A}^T\mathbf{W}^T\left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\right). \)  (18)

Because the trace is invariant under cyclic permutations, one may write:

\( g=\mathrm{tr}\left(\mathbf{W}^T\left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\mathbf{A}^T\right). \)  (19)

Using the equality \( \mathrm{tr}(\mathbf{a}^T\mathbf{b})=\mathrm{cs}(\mathbf{a})^T\mathrm{cs}(\mathbf{b}) \) [28], Eq. (19) can be rewritten as:

\( g=\mathrm{cs}(\mathbf{W})^T\,\mathrm{cs}\left(\left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\mathbf{A}^T\right). \)  (20)

Using the equality \( \mathrm{cs}(\mathbf{a}\mathbf{b}\mathbf{c})=\left(\mathbf{c}^T\otimes \mathbf{a}\right)\mathrm{cs}(\mathbf{b}) \) [28] together with the symmetry of \( \mathbf{A}\mathbf{A}^T \), one can write:

\( g=\mathrm{cs}(\mathbf{W})^T\left(\mathbf{A}\mathbf{A}^T\otimes \left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right)\right)\mathrm{cs}(\mathbf{W}). \)  (21)

Due to the fact that \( \mathrm{cs}(\mathbf{W}) \) is an eigenvector of \( \mathbf{A}\mathbf{A}^T\otimes \left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right) \) (see Eq. (8)), one can conclude:

\( g=\lambda\,\mathrm{cs}(\mathbf{W})^T\mathrm{cs}(\mathbf{W}). \)  (22)

Since \( \mathrm{cs}(\mathbf{W})^T\mathrm{cs}(\mathbf{W}) \) is fixed by the norm constraint on \( \mathbf{W} \), the maximum of \( g \) is obtained if \( \mathrm{cs}(\mathbf{W}) \) is the eigenvector corresponding to the largest eigenvalue of \( \mathbf{A}\mathbf{A}^T\otimes \left(\sum_{i=1}^d \boldsymbol{P}_i^T\boldsymbol{P}_i\right) \).
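As a sanity check on this conclusion, the sketch below draws random stand-ins for the P_i and A of the paper, takes the eigenvector of the largest eigenvalue of the Kronecker matrix, and confirms that it attains the maximum of g; all names and dimensions here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, k = 3, 6, 4                       # d matrices P_i; W has shape (m, k)
P = rng.normal(size=(d, m, m))          # random stand-ins for the P_i
A = rng.normal(size=(k, 5))             # random stand-in for the matrix A

S = sum(Pi.T @ Pi for Pi in P)          # sum_i P_i^T P_i
K = np.kron(A @ A.T, S)                 # AA^T kron (sum_i P_i^T P_i), symmetric

eigvals, eigvecs = np.linalg.eigh(K)    # eigh sorts eigenvalues ascending
w = eigvecs[:, -1]                      # unit eigenvector of the largest eigenvalue
W = w.reshape(m, k, order='F')          # undo the column stacking: cs(W) = w

# Objective after cyclic permutation, as in Eq. (19): g = tr(W^T S W A A^T)
g = np.trace(W.T @ S @ W @ A @ A.T)
assert np.isclose(g, eigvals[-1])       # g attains the largest eigenvalue
```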
Cite this article
Shahdoosti, H.R., Javaheri, N. A fast algorithm for feature extraction of hyperspectral images using the first order statistics. Multimed Tools Appl 77, 23633–23650 (2018). https://doi.org/10.1007/s11042-018-5695-0