Abstract
Label noise can be a major problem in classification tasks, since most machine learning algorithms rely on data labels in their inductive process. Consequently, various techniques for label noise identification have been investigated in the literature. The bias of each technique determines how suitable it is for a given dataset. Moreover, while some techniques flag a large number of examples as noisy and have a high false positive rate, others are very restrictive and therefore unable to identify all noisy examples. This paper investigates how label noise detection can be improved by using an ensemble of noise filtering techniques. These filters, both individual and ensemble-based, are experimentally compared. Another concern of this paper is the computational cost of ensembles, since, for a particular dataset, an individual technique can achieve the same predictive performance as an ensemble, in which case the individual technique should be preferred. To deal with this situation, this study also proposes the use of meta-learning to recommend the best filter for a new dataset. An extensive experimental evaluation of individual filters, ensemble filters and meta-learning was performed using public datasets with artificially injected label noise. The results show that ensembles of noise filters can improve noise filtering performance and that a recommendation system based on meta-learning can successfully recommend the best filtering technique for new datasets. A case study using a real dataset from the ecological niche modeling domain is also presented and evaluated, with the results validated by an expert.
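To make the core idea more concrete, the sketch below outlines one simple way an ensemble of label noise filters can be built by majority voting over individual filters. The particular filters (a classification filter instantiated with a decision tree and an SVM, plus an edited nearest-neighbor filter), the classifier choices and the voting threshold are illustrative assumptions; the sketch does not reproduce the exact ensemble or the ranking strategy evaluated in this paper.

# Illustrative sketch only: a majority-vote ensemble of label noise filters.
# The filter choices and the voting threshold are assumptions for demonstration,
# not the ensemble or ranking strategy proposed in the paper.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def classification_filter(clf, X, y, cv=10):
    """Flag an example as noisy if the classifier, trained on the
    remaining folds, disagrees with its recorded label."""
    pred = cross_val_predict(clf, X, y, cv=cv)
    return pred != y  # boolean mask of suspected noisy examples

def enn_filter(X, y, k=3, cv=10):
    """Edited nearest-neighbor style filter: flag examples misclassified
    by a k-NN model trained without them (via cross-validation)."""
    knn = KNeighborsClassifier(n_neighbors=k)
    pred = cross_val_predict(knn, X, y, cv=cv)
    return pred != y

def ensemble_filter(X, y, min_votes=2):
    """Combine individual filters by majority vote: an example is flagged
    only if at least `min_votes` filters consider it noisy."""
    votes = np.vstack([
        classification_filter(DecisionTreeClassifier(random_state=0), X, y),
        classification_filter(SVC(kernel="rbf", gamma="scale"), X, y),
        enn_filter(X, y, k=3),
    ]).sum(axis=0)
    return votes >= min_votes

# Example usage with a toy dataset and artificially injected label noise:
if __name__ == "__main__":
    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True)
    rng = np.random.RandomState(42)
    noisy_idx = rng.choice(len(y), size=15, replace=False)
    y_noisy = y.copy()
    y_noisy[noisy_idx] = (y_noisy[noisy_idx] + 1) % 3  # flip 10% of the labels
    flagged = ensemble_filter(X, y_noisy)
    print(f"Flagged {flagged.sum()} of {len(y)} examples as potentially noisy")

A more permissive consensus (min_votes=1) behaves like the union of the filters, while requiring all votes behaves like their intersection; the trade-off between false positives and undetected noisy examples is exactly the motivation for comparing individual filters and ensembles.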







Acknowledgments
The authors would like to thank FAPESP (processes 2011/14602-7 and 2012/22608-8), CNPq and CAPES for their financial support. The third author’s research was supported by the Natural Sciences and Engineering Research Council of Canada, by the CALDO Programme, and by the National Research Centre of Poland (NCN) Grant DEC-2013/09/B/ST6/01549. We are also very grateful to Dr. Augusto Hashimoto de Mendonça, who works at the Center for Water Resources & Applied Ecology of the Environmental Engineering Sciences Department, School of Engineering of São Carlos, University of São Paulo, and to Professor Dr. Giselda Durigan, from the Forestry Institute of the State of São Paulo, for their evaluation of the list of potentially noisy examples identified in the dataset of the non-native species H. coronarium.
Additional information
Responsible editor: Thomas Gärtner, Mirco Nanni, Andrea Passerini and Celine Robardet.
Appendix 1: Characterization and complexity measures
Table 4 summarizes the meta-features used to describe the noisy datasets: characterization and complexity measures.
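For illustration, the sketch below shows how a few simple characterization measures could be computed and used by a meta-model to recommend a filter for a new dataset. The measures shown, the Random Forest meta-model, and the placeholder names train_datasets and best_filter_labels are assumptions for demonstration only; they do not correspond to the full set of meta-features in Table 4 or to the exact recommendation procedure used in the paper.

# Illustrative sketch only: a handful of simple characterization measures and
# a meta-model that recommends a noise filter for a new dataset. The measures
# and the Random Forest meta-model are assumptions for demonstration; the
# paper's meta-feature set (Table 4) also includes data complexity measures.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def meta_features(X, y):
    """A small subset of characterization measures for a labeled dataset."""
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = -np.sum(p * np.log2(p))
    return np.array([
        n,                            # number of examples
        d,                            # number of attributes
        len(classes),                 # number of classes
        class_entropy,                # entropy of the class distribution
        counts.min() / counts.max(),  # class imbalance ratio
    ])

def build_recommender(train_datasets, best_filter_labels):
    """Meta-dataset: one row of meta-features per training dataset, labeled
    with the filter that performed best on it (e.g., "ENN", "CF", "ensemble").
    `train_datasets` and `best_filter_labels` are hypothetical placeholders."""
    M = np.vstack([meta_features(X, y) for X, y in train_datasets])
    meta_model = RandomForestClassifier(n_estimators=500, random_state=0)
    meta_model.fit(M, best_filter_labels)
    return meta_model

def recommend_filter(meta_model, X_new, y_new):
    """Recommend a filter for a new dataset from its meta-features."""
    return meta_model.predict(meta_features(X_new, y_new).reshape(1, -1))[0]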
Cite this article
Garcia, L.P.F., Lorena, A.C., Matwin, S. et al. Ensembles of label noise filters: a ranking approach. Data Min Knowl Disc 30, 1192–1216 (2016). https://doi.org/10.1007/s10618-016-0475-9