Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data

Liu, Jie

doi:10.1007/s00500-021-06532-4

Data analytics and machine learning
Published: 21 November 2021

Volume 26, pages 1141–1163 (2022)

Soft Computing

Jie Liu¹ (ORCID: orcid.org/0000-0003-0895-7598)

Abstract

Synthetic minority oversampling methods have proven to be an efficient solution to imbalanced data classification problems, and various strategies have been proposed for generating synthetic minority samples. However, noisy samples, which may cause the minority and majority classes to overlap, have not yet been properly handled so as to reduce their influence on the performance of a classification model. This paper proposes a new method, named Importance-SMOTE, in which only the borderline and edge samples of the minority class are oversampled. Synthetic minority samples are generated in proportion to the importance of each minority sample, which is calculated from the composition and distribution of its nearest neighbors. The position of each synthetic sample is determined by the relative importance of the paired neighbors. The proposed method is expected to yield a more precise estimate of the true decision surface and to reduce the influence of noisy samples. Experiments on various public imbalanced datasets and a real case study are conducted to demonstrate the effectiveness of the proposed method.
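The abstract names the ingredients of the method but not its formulas, so the following is only a minimal sketch of the general idea in plain Python, not the author's implementation. Every concrete choice in it is a hypothetical assumption made for illustration: the helper names (`importance_weights`, `allocate`, `importance_smote_sketch`), the importance score (fraction of majority neighbours multiplied by a distance-share term), the cumulative-rounding allocation, the restriction of partners to other seeds, and the interval used to draw the interpolation gap. The paper's separate "borderline" and "edge" categories are folded here into a single "mixed neighbourhood" test, because their exact definitions are not given in the abstract.

```python
import math
import random


def _nearest(i, X, candidates, k):
    """Return the indices of the k candidates closest to X[i] (Euclidean), excluding i."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def importance_weights(X, y, k=5, minority=1):
    """HYPOTHETICAL importance score for every minority sample.

    composition  = fraction of majority samples among the k nearest neighbours
    distribution = d_min / (d_min + d_maj), which grows when the majority
                   neighbours are closer than the minority neighbours
    Safe (no majority neighbours) and noisy (only majority neighbours)
    samples get weight 0, so they are never used as seeds.
    """
    weights = {}
    for i, label in enumerate(y):
        if label != minority:
            continue
        nbrs = _nearest(i, X, range(len(X)), k)
        maj = [j for j in nbrs if y[j] != minority]
        mino = [j for j in nbrs if y[j] == minority]
        if not maj or not mino:  # safe or noisy sample: not oversampled
            weights[i] = 0.0
            continue
        d_maj = sum(math.dist(X[i], X[j]) for j in maj) / len(maj)
        d_min = sum(math.dist(X[i], X[j]) for j in mino) / len(mino)
        spread = d_min / (d_min + d_maj) if d_min + d_maj > 0 else 0.5
        weights[i] = len(maj) / len(nbrs) * spread
    return weights


def allocate(weights, n_new):
    """Split n_new samples across seeds in proportion to their weights.

    Rounding the cumulative share keeps every count >= 0 and makes the
    counts sum exactly to n_new.
    """
    total = sum(weights.values())
    if total == 0:
        return {}
    counts, cum, prev = {}, 0.0, 0
    for i, w in weights.items():
        cum += w
        target = round(n_new * cum / total)
        if w > 0:
            counts[i] = target - prev
        prev = target
    return counts


def importance_smote_sketch(X, y, n_new, k=5, minority=1, seed=0):
    """Generate n_new synthetic minority points (HYPOTHETICAL scheme)."""
    rng = random.Random(seed)
    weights = importance_weights(X, y, k, minority)
    seeds = [i for i, w in weights.items() if w > 0]
    if len(seeds) < 2:
        return []  # no pair of seeds to interpolate between
    synthetic = []
    for i, count in allocate(weights, n_new).items():
        partners = _nearest(i, X, seeds, k)  # pair only with other seeds
        for _ in range(count):
            j = rng.choice(partners)
            # Expected gap is r: r > 1/2 pulls the point towards X[j], r < 1/2 towards X[i].
            r = weights[j] / (weights[i] + weights[j])
            # Uniform gap on the widest interval inside [0, 1] centred on r;
            # r == 1/2 reduces to plain SMOTE interpolation, gap ~ U(0, 1).
            lo, hi = (0.0, 2 * r) if r <= 0.5 else (2 * r - 1, 1.0)
            gap = rng.uniform(lo, hi)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


# Toy data: majority class 0 on the line y = 0, minority class 1 just above it,
# plus one minority point (4.5, 0) surrounded by majority points (treated as noise).
X = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (0, 1), (1, 1), (2, 1), (4.5, 0)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
new_points = importance_smote_sketch(X, y, n_new=6, k=3)
print(len(new_points))  # 6
```

In the toy run the isolated minority point at (4.5, 0) receives zero weight and never seeds a synthetic sample, which mirrors the stated goal of limiting the influence of noisy samples. The gap rule was chosen so that equal importances give ordinary SMOTE interpolation, while unequal importances shift the expected position towards the more important endpoint of the pair.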

Similar content being viewed by others

A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification (Article, 21 April 2022)

MaMiPot: a paradigm shift for the classification of imbalanced data (Article, 07 December 2022)

ASN-SMOTE: a synthetic minority oversampling method with adaptive qualified synthesizer selection (Article, open access, 21 January 2022)

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 52005027).

Funding

The author declares he has no financial interests.

Author information

Authors and Affiliations

School of Reliability and Systems Engineering, Beihang University, Beijing, China
Jie Liu

Corresponding author

Correspondence to Jie Liu.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 6, 7, 8 and 9.

Table 6 Experimental results with respect to F-Measure of KNN

Table 7 Experimental results with respect to F-Measure of CART

Table 8 Experimental results with respect to AUC (PRC) of KNN

Table 9 Experimental results with respect to AUC (PRC) of CART

About this article

Cite this article

Liu, J. Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data. Soft Comput 26, 1141–1163 (2022). https://doi.org/10.1007/s00500-021-06532-4

Accepted: 02 November 2021
Published: 21 November 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s00500-021-06532-4

Keywords
