Abstract
Cancer is a leading cause of mortality in both developed and developing countries. Cancer classification based on microarray datasets has provided insight into possible treatment strategies. Microarray datasets are characterized by a very large number of genes (high dimensionality) and a small number of samples. Gene selection is therefore a challenging but necessary task in the analysis of microarray expression data, and the selected genes may reveal insight into the mechanism underlying a particular biological phenomenon. Several researchers have recently developed feature selection methods that use metaheuristic algorithms to interpret and analyze microarray data. Nevertheless, because microarray data contain few samples relative to their high dimensionality, several data mining approaches have failed to select the most relevant and informative genes. Incorporating multiple classifiers can enhance feature selection and classification performance. This study therefore proposes a cancer classification method in which particle swarm optimization and ensemble learning work together for feature selection and classification. The analysis indicates that the proposed method is effective for cancer classification based on microarray datasets: it achieves accuracies of 100%, 92.86%, 86.36%, 100%, and 85.71% on the leukemia, colon, breast cancer, ovarian, and central nervous system datasets, respectively, which outperforms most state-of-the-art methods and improves on the baseline ensemble method by 12%.
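To make the general idea concrete, the following is a minimal sketch of wrapper-style gene selection in which a binary particle swarm optimizer searches over gene subsets and a small hard-voting ensemble scores each subset. It is not the authors' implementation: the toy data, the base classifiers (nearest centroid, 1-NN, 3-NN), the sigmoid transfer function, the size penalty of 0.01, and all parameter values (`w`, `c1`, `c2`, swarm size, iterations) are assumptions chosen only for illustration.

```python
import math
import random

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def knn(train_X, train_y, x, k):
    """Predict by majority vote among the k nearest training samples."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(x, train_X[i]))
    votes = [train_y[i] for i in sorted(range(len(train_X)), key=sq_dist)[:k]]
    return max(sorted(set(votes)), key=votes.count)

def ensemble_accuracy(mask, train_X, train_y, test_X, test_y):
    """Accuracy of a hard-voting ensemble using only the genes where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    tr = [[row[j] for j in cols] for row in train_X]
    te = [[row[j] for j in cols] for row in test_X]
    correct = 0
    for x, y in zip(te, test_y):
        votes = [nearest_centroid(tr, train_y, x),
                 knn(tr, train_y, x, 1),
                 knn(tr, train_y, x, 3)]
        correct += max(sorted(set(votes)), key=votes.count) == y
    return correct / len(te)

def binary_pso(fitness, n_features, n_particles=8, n_iters=15,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness over 0/1 masks with a sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def make_toy_data(n_samples=40, n_genes=20, n_informative=3, seed=1):
    """Synthetic two-class data: only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        shift = 2.0 if label else -2.0
        X.append([rng.gauss(shift if j < n_informative else 0.0, 1.0)
                  for j in range(n_genes)])
        y.append(label)
    return X, y

X, y = make_toy_data()
train_X, train_y, valid_X, valid_y = X[:28], y[:28], X[28:], y[28:]

def fitness(mask):
    """Validation accuracy minus a small (assumed) penalty on subset size."""
    acc = ensemble_accuracy(mask, train_X, train_y, valid_X, valid_y)
    return acc - 0.01 * sum(mask) / len(mask)

best_mask, best_fit = binary_pso(fitness, n_features=len(X[0]))
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected genes:", selected, "fitness:", round(best_fit, 3))
```

Because the fitness here is measured on the same validation split that guides the search, the reported value is optimistic; an honest accuracy estimate would need a separate held-out test set or nested cross-validation.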





Availability of data and materials
Data are available upon request.
Code availability
Code is available upon request.
Abbreviations
- PSO: Particle swarm optimization
- SVM: Support vector machines
- KNN: K-nearest neighbors
- DT: Decision trees
- NB: Naïve Bayes
- RF: Random forest
- CNN: Convolutional neural network
- PCA: Principal component analysis
- GSP: Gene selection programming
- GEP: Gene expression programming
- GA: Genetic algorithm
- CNS: Central nervous system
- AML: Acute myeloblastic leukemia
- ALL: Acute lymphoblastic leukemia
- GWO: Grey wolf optimizer
- WOA: Whale optimization algorithm
- BAT: Bat algorithm
- MFO: Moth-flame optimization
- FFA: Firefly algorithm
- MVO: Multi-verse optimizer
- ROC: Receiver operating characteristic curve
Funding
This study did not receive external or internal funding.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest that are relevant to the content of this article.
Ethics approval
All information and data sources used in this study are identified in the article and are publicly available for research purposes.
Consent for publication
This study used publicly available datasets, which are cited appropriately.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Alrefai, N., Ibrahim, O. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput & Applic 34, 13513–13528 (2022). https://doi.org/10.1007/s00521-022-07147-y
DOI: https://doi.org/10.1007/s00521-022-07147-y