Abstract
In many important application domains, such as text categorization, biomolecular analysis, scene classification and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. In recent years, this has led to a substantial amount of research on feature selection methods that identify relevant and informative features for multi-label classification. However, the methods proposed for this task are scattered across the literature, with no common framework to describe them or to allow an objective comparison. Here, we revisit a categorization of existing multi-label classification methods and, as our main contribution, provide a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting. We conclude with concrete suggestions for future research in multi-label feature selection, derived from our categorization and analysis.
About this article
Cite this article
Pereira, R.B., Plastino, A., Zadrozny, B. et al. Categorizing feature selection methods for multi-label classification. Artif Intell Rev 49, 57–78 (2018). https://doi.org/10.1007/s10462-016-9516-4
DOI: https://doi.org/10.1007/s10462-016-9516-4