A fast algorithm for feature extraction of hyperspectral images using the first order statistics

Abstract

A new supervised feature extraction method suited to small sample size situations is proposed in this work. The method is based on first-order statistics, so there is no need to estimate scatter matrices; it therefore avoids the singularity problem that arises in small sample size situations and achieves high performance in them. In addition, because the algorithm exploits only first-order statistical moments, it is very fast, which makes it suitable for real-time hyperspectral scene analysis. The method forms a matrix whose columns are obtained by averaging the training samples of each class. A new transform then maps the features from the original space into a low-dimensional space in which the new features are as different from one another as possible. Subsequently, to capture the inherent nonlinearity of the original data, the algorithm is extended using the kernel trick. In the experiments, four widely used hyperspectral datasets, namely Indian Pines, University of Pavia, Salinas, and Botswana, are classified. The experimental results show that the proposed algorithm achieves state-of-the-art results in small sample size situations.
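
As a minimal sketch of the first step described above, the code below builds the matrix A whose columns are the per-class averages of the training samples. This is not the paper's implementation: the data, names, and sizes are illustrative stand-ins, and the transform W itself is the solution of the paper's Eq. (8) (derived in the Appendix), so it is not recomputed here.

import numpy as np

def class_mean_matrix(X, y):
    # Columns of A are the averages of the training samples of each class;
    # only first-order statistics are used, so no scatter matrix is estimated.
    classes = np.unique(y)
    return np.stack([X[y == k].mean(axis=0) for k in classes], axis=1)

# Synthetic stand-in for labeled hyperspectral pixels: 20 samples, 50 bands, 4 classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
y = rng.integers(0, 4, size=20)

A = class_mean_matrix(X, y)   # shape (50, 4): one mean spectrum per class
# The transform W of Eq. (8) (see the Appendix) would then map each pixel x
# to its low-dimensional feature vector W @ x.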


References

  1. Bo C, Lu H, Wang D (2017) Spectral-spatial K-Nearest Neighbor approach for hyperspectral image classification. Multimedia Tools Appl. https://doi.org/10.1007/s11042-017-4403-9

  2. Camps-Valls G, Shervashidze N, Borgwardt KM (2010) Spatio-spectral remote sensing image classification with graph kernels. IEEE Geosci Remote Sens Lett 7(4):741–745

  3. Cao X, Han J, Yang S, Tao D, Jiao L (2016) Band selection and evaluation with spatial information. Int J Remote Sens 37(19):4501–4520

  4. Cao X, Wei C, Han J, Jiao L (2017) Hyperspectral band selection using improved classification map. IEEE Geosci Remote Sens Lett

  5. Chang CC, Lin CJ (2008) LIBSVM—a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm

  6. Chen LF, Mark Liao HY, Ko MT, Lin JC, Yu GJ (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recogn 33:1713–1726

  7. Cui Y, Fan L (2012) Feature extraction using fuzzy maximum margin criterion. Neurocomputing 86:52–58

  8. Dehghani H, Ghassemian H (2006) Measurement of uncertainty by the entropy: application to the classification of MSS data. Int J Remote Sens 27(18):4005–4014

  9. Foody GM (2004) Thematic map comparison. Photogramm Eng Remote Sens 70(5):627–633

  10. Ghimire D, Jeong S, Lee J, Park SH (2017) Facial expression recognition based on local region specific features and support vector machines. Multimedia Tools Appl 76(6):7803–7821

  11. Hastie T, Buja A, Tibshirane R (1995) Penalized discriminant analysis. Ann Stat 23(1):73–102

  12. Howland P, Park H (2004) Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans Pattern Anal Mach Intell 26(8):995–1006

  13. Imani M, Ghassemian H (2014) Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation. IEEE Geosci Remote Sens Lett 11(11):1986–1990

  14. Imani M, Ghassemian H (2014) Band clustering-based feature extraction for classification of hyperspectral images using limited training samples. IEEE Geosci Remote Sens Lett 11(8):1325–1329

  15. Imani M, Ghassemian H (2015) Feature extraction using weighted training samples. IEEE Geosci Remote Sens Lett 12(7):1387–1391

  16. Imani M, Ghassemian H (2015) Ridge regression-based feature extraction for hyperspectral data. Int J Remote Sens 36(6):1728–1742

  17. Ji SW, Ye JP (2008) Generalized linear discriminant analysis: a unified framework and efficient model selection. IEEE Trans Neural Netw 19(10):1768–1782

  18. Jiang J, Chen C, Yu Y, Jiang X, Ma J (2017) Spatial-aware collaborative representation for hyperspectral remote sensing image classification. IEEE Geosci Remote Sens Lett 14(3):404–408

  19. Jiang X, Fang X, Chen Z, Gao J, Jiang J, Cai Z (2017) Supervised Gaussian process latent variable model for hyperspectral image classification. IEEE Geosci Remote Sens Lett 14(10):1760–1764

  20. Kamandar M, Ghassemian H (2013) Linear feature extraction for hyperspectral images based on information theoretic learning. IEEE Geosci Remote Sens Lett 10(4):702–706

  21. Landgrebe DA (2002) Hyperspectral image data analysis. IEEE Signal Process Mag 19(1):17–28

  22. Li J et al (2015) Multiple feature learning for hyperspectral image classification. IEEE Trans Geosci Remote Sens 53(3):1592–1606

  23. Lu J, Plataniotis KN, Venetsanopoulos AN (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26(2):181–191

  24. Lu JW, Plataniotis K, Venetsanopoulos A (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26:181–191

  25. Marconcini M, Camps-Valls G, Bruzzone L (2009) A composite semisupervised SVM for classification of hyperspectral images. IEEE Geosci Remote Sens Lett 6(2):234–238

  26. Melgani F, Bruzzone L (2004) Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans Geosci Remote Sens 42(8):1778–1790

  27. Ren Y, Liao L, Maybank S, Zhang Y, Liu X (2017) Hyperspectral image spectral-spatial feature extraction via tensor principal component analysis. IEEE Geosci Remote Sens Lett 14(9):1431–1435

  28. Schacke K (2004) On the Kronecker product. Master’s Thesis, University of Waterloo

  29. Shahdoosti HR, Javaheri N (2017) Pansharpening of clustered MS and pan images considering mixed pixels. IEEE Geosci Remote Sens Lett 14(6):826–830

  30. Shahdoosti HR, Javaheri N (2018) A new hybrid feature extraction method in a dyadic scheme for classification of hyperspectral data. Int J Remote Sens 39(1):101–130

  31. Shahdoosti HR, Mirzapour F (2017) Spectral–spatial feature extraction using orthogonal linear discriminant analysis for classification of hyperspectral data. European Journal of Remote Sensing 50(1):111–124

  32. Shahshahani BM, Landgrebe DA (1994) The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans Geosci Remote Sens 32(5):1087–1095

  33. Sharma A, Dubey A, Tripathi P, Kumar V (2010) Pose invariant virtual classifiers from single training image using novel hybrid-eigenfaces. Neurocomputing 73(10):1868–1880

  34. Tong F, Tong H, Jiang J, Zhang Y (2017) Multiscale union regions adaptive sparse representation for hyperspectral image classification. Remote Sens 9(9):872

  35. Wang JG, Lin YS, Yang WK, Yang JY (2008) Kernel maximum scatter difference based feature extraction and its application to face recognition. Pattern Recogn Lett 29:1832–1835

  36. Wenjing T, Fei G, Renren D, Yujuan S, Ping L (2017) Face recognition based on the fusion of wavelet packet sub-images and fisher linear discriminant. Multimedia Tools Appl 76(21):22725–22740

  37. Xia J, Chanussot J, Du P, He X (2014) (Semi-)supervised probabilistic principal component analysis for hyperspectral remote sensing image classification. IEEE J Sel Topics Appl Earth Observ Remote Sens 7(6):2224–2236

  38. Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671

  39. Yan D, Chu Y, Li L, Liu D (2017) Hyperspectral remote sensing image classification with information discriminative extreme learning machine. Multimedia Tools Appl. https://doi.org/10.1007/s11042-017-4494-3

  40. Ye JP (2006) Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res 7:1183–1204

  41. Ye JP, Li Q (2005) A two-stage linear discriminant analysis via QR-decomposition. IEEE Trans Pattern Anal Mach Intell 27(6):929–941

  42. Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recogn 34:2067–2070

  43. Zhou X, Li S, Tang F, Qin K, Hu S, Liu S (2017) Deep learning with grouped features for spatial spectral classification of hyperspectral images. IEEE Geosci Remote Sens Lett 14(1):97–101

  44. Zhu M, Martinez AM (2006) Selecting principal components in a two-stage LDA algorithm. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol. 1, pp 132–137

Download references

Acknowledgements

The authors would like to thank the six anonymous reviewers for their helpful suggestions, as well as the Associate Editor for handling the paper.

Author information

Corresponding author

Correspondence to Hamid Reza Shahdoosti.

Appendix

First, the derivation of Eq. (8) of the paper is described. Starting from the equality \( 2\left(\sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\mathbf{A}^T - 2\lambda \mathbf{W} = 0 \) (see Eq. (7)), one can apply the cs operator, which stacks the columns of a matrix into a single vector, to both sides of the equality:

$$ 2\,\mathrm{cs}\left(\left(\sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right)\mathbf{W}\mathbf{A}\mathbf{A}^T\right) - 2\lambda\,\mathrm{cs}(\mathbf{W}) = 0 $$
(16)

Using the identity \( \mathrm{cs}(\mathbf{a}\mathbf{b}\mathbf{c}) = \left(\mathbf{c}^T \otimes \mathbf{a}\right)\mathrm{cs}(\mathbf{b}) \) [28], where ⊗ is the Kronecker product, one can rewrite Eq. (16) as:

$$ \left(\mathbf{A}\mathbf{A}^T \otimes \left(\sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right) - \lambda \mathbf{I}_{ld}\right)\mathrm{cs}(\mathbf{W}) = 0 $$
(17)

which is identical to Eq. (8).
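
The identity used above is the standard column-stacking (vec) property of the Kronecker product, and it can be checked numerically. The following minimal sketch (numpy, with arbitrary illustrative sizes) verifies both the general identity and the specific step from Eq. (16) to Eq. (17), where the transpose on \( \mathbf{A}\mathbf{A}^T \) disappears because \( \mathbf{A}\mathbf{A}^T \) is symmetric.

import numpy as np

cs = lambda M: M.flatten(order="F")   # column-stacking operator: stacks columns into one vector

rng = np.random.default_rng(0)

# General identity from [28]: cs(abc) = (c^T ⊗ a) cs(b).
a = rng.standard_normal((4, 3))
b = rng.standard_normal((3, 5))
c = rng.standard_normal((5, 2))
assert np.allclose(cs(a @ b @ c), np.kron(c.T, a) @ cs(b))

# The step from Eq. (16) to Eq. (17): with S = sum_i P_i^T P_i (d×d) and W (d×l),
# cs(S W AA^T) = ((AA^T)^T ⊗ S) cs(W) = (AA^T ⊗ S) cs(W), since AA^T is symmetric.
l, d = 6, 3                           # illustrative sizes only
S = rng.standard_normal((d, d))
W = rng.standard_normal((d, l))
A = rng.standard_normal((l, 4))
AAt = A @ A.T
assert np.allclose(cs(S @ W @ AAt), np.kron(AAt, S) @ cs(W))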

Second, the reason why \( \mathrm{cs}(\mathbf{W}) \) is the eigenvector corresponding to the largest eigenvalue of \( \mathbf{A}\mathbf{A}^T \otimes \left(\sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right) \) is explained. Eq. (5) can be rewritten as:

$$ g = \sum_{i=1}^{d} \operatorname{tr}\left(\mathbf{A}^T \mathbf{W}^T \boldsymbol{P}_i^T \boldsymbol{P}_i \mathbf{W}\mathbf{A}\right) $$
(18)

Because the trace is invariant under cyclic permutations, one may write:

$$ g = \sum_{i=1}^{d} \operatorname{tr}\left(\mathbf{W}^T \boldsymbol{P}_i^T \boldsymbol{P}_i \mathbf{W}\mathbf{A}\mathbf{A}^T\right) $$
(19)

Using the identity \( \operatorname{tr}(\mathbf{a}^T\mathbf{b}) = \mathrm{cs}(\mathbf{a})^T \mathrm{cs}(\mathbf{b}) \) [28], Eq. (19) can be rewritten as:

$$ g = \sum_{i=1}^{d} \mathrm{cs}(\mathbf{W})^T\, \mathrm{cs}\left(\boldsymbol{P}_i^T \boldsymbol{P}_i \mathbf{W}\mathbf{A}\mathbf{A}^T\right) $$
(20)

Using the identity \( \mathrm{cs}(\mathbf{a}\mathbf{b}\mathbf{c}) = \left(\mathbf{c}^T \otimes \mathbf{a}\right)\mathrm{cs}(\mathbf{b}) \) again [28], together with the symmetry of \( \mathbf{A}\mathbf{A}^T \), one can write:

$$ g = \sum_{i=1}^{d} \mathrm{cs}(\mathbf{W})^T \left(\mathbf{A}\mathbf{A}^T \otimes \boldsymbol{P}_i^T \boldsymbol{P}_i\right) \mathrm{cs}(\mathbf{W}) = \mathrm{cs}(\mathbf{W})^T \left(\mathbf{A}\mathbf{A}^T \otimes \sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right) \mathrm{cs}(\mathbf{W}) $$
(21)

Because \( \mathrm{cs}(\mathbf{W}) \) is an eigenvector of \( \mathbf{A}\mathbf{A}^T \otimes \left(\sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right) \) (see Eq. (8)), one can conclude:

$$ g = \lambda\, \mathrm{cs}(\mathbf{W})^T \mathrm{cs}(\mathbf{W}) = \lambda \operatorname{tr}\left(\mathbf{W}^T \mathbf{W}\right) = \lambda $$
(22)

where the last equality holds because \( \mathrm{cs}(\mathbf{W}) \) is a unit-norm eigenvector, so that \( \operatorname{tr}(\mathbf{W}^T \mathbf{W}) = \mathrm{cs}(\mathbf{W})^T \mathrm{cs}(\mathbf{W}) = 1 \).

Hence, the maximum of \( g \) is attained when \( \mathrm{cs}(\mathbf{W}) \) is the eigenvector corresponding to the largest eigenvalue of \( \mathbf{A}\mathbf{A}^T \otimes \left(\sum_{i=1}^{d} \boldsymbol{P}_i^T \boldsymbol{P}_i\right) \).
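
To make this conclusion concrete, the sketch below solves the eigenproblem numerically with random stand-ins for \( \mathbf{A} \) and the \( \boldsymbol{P}_i \) operators (which are defined in the body of the paper and not reproduced in this excerpt), and checks that the objective g of Eq. (18) indeed equals the largest eigenvalue λ.

import numpy as np

cs = lambda M: M.flatten(order="F")    # column-stacking operator

rng = np.random.default_rng(1)
l, d = 6, 3                            # illustrative sizes: l original bands, d extracted features
A = rng.standard_normal((l, 4))        # stand-in for the class-mean matrix A
P = [rng.standard_normal((d, d)) for _ in range(d)]   # stand-ins for the P_i operators
S = sum(Pi.T @ Pi for Pi in P)         # Σ P_i^T P_i: symmetric positive semidefinite

# Eq. (8): cs(W) is an eigenvector of AA^T ⊗ Σ P_i^T P_i (an ld × ld symmetric matrix).
M = np.kron(A @ A.T, S)
eigvals, eigvecs = np.linalg.eigh(M)   # ascending eigenvalues, unit-norm eigenvectors
lam = eigvals[-1]                      # largest eigenvalue
W = eigvecs[:, -1].reshape((d, l), order="F")   # undo the column stacking: cs(W) = eigvecs[:, -1]

# Eq. (18): g = Σ tr(A^T W^T P_i^T P_i W A); by Eq. (22), g = λ since tr(W^T W) = 1.
g = sum(np.trace(A.T @ W.T @ Pi.T @ Pi @ W @ A) for Pi in P)
assert np.isclose(g, lam)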


Cite this article

Shahdoosti, H.R., Javaheri, N. A fast algorithm for feature extraction of hyperspectral images using the first order statistics. Multimed Tools Appl 77, 23633–23650 (2018). https://doi.org/10.1007/s11042-018-5695-0
