Similar classes latent distribution modelling-based oversampling method for imbalanced image classification

Ye, Wei; Dong, Minggang; Wang, Yan; Gan, Guojun; Liu, Deao

doi:10.1007/s11227-022-05037-7

Published: 28 January 2023

Volume 79, pages 9985–10019 (2023)
The Journal of Supercomputing

Abstract

Learning an unbiased classifier from imbalanced image datasets is challenging because the classifier may be strongly biased toward the majority class. To address this issue, several generative model-based oversampling methods have been proposed. However, most of these methods pay little attention to boundary samples, so the samples they generate may contribute little to learning an unbiased classifier. In this paper, we focus on boundary samples and propose a similar classes latent distribution modelling-based oversampling method. Specifically, we first model each class as a distinct von Mises–Fisher (vMF) distribution, thereby aligning feature learning with the class distributions. Furthermore, we develop a distance minimization loss function that draws the latent representations of similar classes close to each other, so that the generator can capture more shared features during training. In addition, we propose a boundary sampling strategy that uses latent variables near the decision boundary to generate boundary samples. These samples expand the minority decision region and reshape the decision boundary. Experiments on four imbalanced image datasets show that the proposed method achieves promising performance in terms of Recall, Precision, F1-score, and G-mean.
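The abstract names the ingredients of the method but not their formulas. As a rough illustration only, the Python sketch below shows (i) how a latent vector can be drawn from a class-conditional vMF distribution on the unit hypersphere, using Wood's (1994) rejection sampler, and (ii) one hypothetical way to pick "boundary" latents: sample around a minority class's mean direction and keep the candidates whose cosine margin over a similar class's mean direction is smallest but still positive. The functions `sample_vmf` and `boundary_latents`, the margin rule, and all parameter values are assumptions made for illustration (they presume a cosine-style classifier whose class weights are the class mean directions); this is not the authors' implementation, and their exact boundary criterion is not given on this page.

```python
import math
import random


def _normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_vmf(mu, kappa, rng=random):
    """Draw one unit vector from vMF(mu, kappa) on S^{d-1}, d >= 2 (Wood, 1994)."""
    mu = _normalize(mu)
    d = len(mu)
    # Step 1: rejection-sample w = <x, e1>, the cosine to the mean direction.
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa ** 2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * math.log(1.0 - x0 ** 2)
    while True:
        z = rng.betavariate((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        if kappa * w + (d - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # Step 2: uniform direction in the subspace orthogonal to e1.
    v = _normalize([rng.gauss(0.0, 1.0) for _ in range(d - 1)])
    s = math.sqrt(max(0.0, 1.0 - w * w))
    x = [w] + [s * vi for vi in v]
    # Step 3: Householder reflection that maps e1 onto mu.
    h = [(1.0 if i == 0 else 0.0) - m for i, m in enumerate(mu)]
    hh = _dot(h, h)
    if hh < 1e-12:  # mu is already e1, no reflection needed
        return x
    coef = 2.0 * _dot(h, x) / hh
    return [xi - coef * hi for xi, hi in zip(x, h)]


def boundary_latents(mu_minority, mu_similar, kappa, n_candidates, n_keep, rng=random):
    """HYPOTHETICAL boundary sampling: minority-side latents nearest the similar class."""
    mu_a, mu_b = _normalize(mu_minority), _normalize(mu_similar)
    scored = []
    for _ in range(n_candidates):
        z = sample_vmf(mu_a, kappa, rng)
        # margin > 0: z is still closer (in cosine) to the minority direction
        margin = _dot(z, mu_a) - _dot(z, mu_b)
        if margin > 0:
            scored.append((margin, z))
    scored.sort(key=lambda t: t[0])  # smallest margin = closest to the boundary
    return [z for _, z in scored[:n_keep]]
```

For example, `boundary_latents([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], kappa=20.0, n_candidates=500, n_keep=10, rng=random.Random(0))` returns up to ten unit vectors on the minority side of the hyperplane that bisects the two mean directions, ordered from nearest to farthest from it. In a full pipeline such latents would be passed to the generator; a larger `kappa` concentrates each class more tightly around its mean direction, which leaves fewer candidates near the boundary.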

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61563012), the Guangxi Natural Science Foundation of China (No. 2021GXNSFAA220074), and the Guangxi Key Laboratory of Embedded Technology and Intelligent System Foundation (No. 2019-1-4).

Author information

Authors and Affiliations

School of Information Science and Engineering, Guilin University of Technology, Guilin, 541004, China
Wei Ye, Minggang Dong, Yan Wang, Guojun Gan & Deao Liu
Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin, 541004, China
Wei Ye, Minggang Dong, Yan Wang, Guojun Gan & Deao Liu

Contributions

WY contributed to writing—original draft, methodology, and validation. MD contributed to conceptualization, writing—review and editing, supervision, and funding acquisition. YW contributed to validation and coding. GG contributed to validation. DL contributed to coding.

Corresponding author

Correspondence to Minggang Dong.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendices

Appendix 1: Network architectures

See Tables 13, 14, 15, 16, 17, 18 and 19.

Table 13 The network architecture of the classification model LeNet-5

Table 14 The network architecture of the generator for MNIST and Fashion-MNIST

Table 15 The network architecture of the discriminator for MNIST and Fashion-MNIST

Table 16 The network architecture of the encoder for MNIST and Fashion-MNIST

Table 17 The network architecture of the generator for CIFAR-10 and CINIC-10

Table 18 The network architecture of the discriminator for CIFAR-10 and CINIC-10

Table 19 The network architecture of the encoder for CIFAR-10 and CINIC-10

Appendix 2: Examples of generated images

See Figs. 10, 11, 12 and 13.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, W., Dong, M., Wang, Y. et al. Similar classes latent distribution modelling-based oversampling method for imbalanced image classification. J Supercomput 79, 9985–10019 (2023). https://doi.org/10.1007/s11227-022-05037-7

Accepted: 29 December 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11227-022-05037-7