Abstract
In recent years, the relationship between data quality and machine learning models has attracted increasing attention. This work quantifies dataset quality using complexity measurements and experimentally analyzes their impact on the accuracy of the Extended Belief Rule Base (EBRB) model, a representative advanced rule-based system that has drawn considerable attention in the last few years. The aim is to identify the complexity measurements to which the accuracy of the EBRB system is most sensitive. In the experiments, we validate the impact of complexity measurements on the EBRB model across 108 classification datasets and perform a sensitivity analysis to determine the five complexity measurements that most strongly affect its performance. Based on these results, we provide guidance to help decision-makers with data preprocessing for the EBRB model, thereby improving its accuracy.
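The abstract does not say which complexity measurements were studied. As an illustration of what such a measurement can look like, the sketch below computes one classic example from the data complexity literature, the maximum Fisher's discriminant ratio for a two-class problem (Ho and Basu, 2002). The function names and the toy data are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(feature_a, feature_b):
    """Fisher's discriminant ratio for one feature of a two-class problem.

    Larger values mean the two classes are better separated on this feature.
    """
    gap = (mean(feature_a) - mean(feature_b)) ** 2
    spread = pvariance(feature_a) + pvariance(feature_b)
    if spread == 0:
        # Both classes are constant on this feature.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def max_fisher_ratio(class_a, class_b):
    """Maximum per-feature ratio; rows are samples, columns are features."""
    columns_a = zip(*class_a)
    columns_b = zip(*class_b)
    return max(fisher_ratio(a, b) for a, b in zip(columns_a, columns_b))


# Hypothetical toy data: the first feature separates the classes well,
# the second feature barely separates them.
class_a = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0)]
class_b = [(7.0, 5.5), (8.0, 6.5), (9.0, 7.5)]

print(max_fisher_ratio(class_a, class_b))
```

A high value suggests that at least one feature separates the classes cleanly, so the dataset is "easy" by this measure. A study like the one summarized above would compute many such measurements per dataset and relate them to classifier accuracy.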
Acknowledgements
The authors would like to extend their sincere thanks to the National-Local Joint Engineering Laboratory of System Credibility Automatic Verification for its generous support.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xian, Y., Zeng, G., Liu, J. (2024). Data Complexity and Its Effect on EBRB System Accuracy. In: Bravo, J., Nugent, C., Cleland, I. (eds) Proceedings of the International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2024). UCAmI 2024. Lecture Notes in Networks and Systems, vol 1212. Springer, Cham. https://doi.org/10.1007/978-3-031-77571-0_80
DOI: https://doi.org/10.1007/978-3-031-77571-0_80
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77570-3
Online ISBN: 978-3-031-77571-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)