
Abstract

In recent years, the relationship between data quality and machine learning models has attracted increasing attention. This work quantifies dataset quality using complexity measurements and analyses experimentally how these measurements affect the accuracy of the Extended Belief Rule Base (EBRB) model, a representative advanced rule-based system that has been widely studied in the last few years. The aim is to identify the complexity measurements to which the accuracy of the EBRB system is most sensitive. In the experimental section, we validate the impact of complexity measurements on the EBRB model across 108 classification datasets and perform a sensitivity analysis to determine the five complexity measurements that most strongly affect its performance. On this basis, we provide guidance to help decision-makers with data preprocessing for the EBRB model, thereby improving its accuracy.
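The sensitivity analysis described above can be illustrated with a minimal sketch based on permutation importance, assuming scikit-learn is available. Everything in it is an assumption made for illustration rather than the procedure or data of the paper: the complexity values and accuracies are synthetic, the names `measure_0` to `measure_7` are hypothetical placeholders, and a random forest regressor stands in as a surrogate that maps complexity measurements to accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Assumption: 108 datasets, each described by 8 hypothetical complexity measures.
n_datasets, n_measures = 108, 8
names = [f"measure_{i}" for i in range(n_measures)]
X = rng.uniform(0.0, 1.0, size=(n_datasets, n_measures))

# Assumption: synthetic classifier accuracy that depends strongly on
# measure_0, moderately on measure_3, and not at all on the others.
y = 0.9 - 0.5 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(0.0, 0.02, n_datasets)

# Surrogate model: predicts accuracy from the complexity measures.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Permutation importance: shuffle one measure at a time and record
# how much the surrogate's score drops.
result = permutation_importance(model, X, y, n_repeats=20, random_state=0)

# Rank measures from most to least influential and keep the top five.
order = np.argsort(result.importances_mean)[::-1]
top_five = [names[i] for i in order[:5]]

for i in order[:5]:
    print(f"{names[i]}: {result.importances_mean[i]:.4f}")
```

In this sketch the importances are computed on the same data used to fit the surrogate, which keeps the example short; scoring on held-out datasets would give a less optimistic estimate of sensitivity.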


Notes

  1. https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/inspection/_permutation_importance.py


Acknowledgements

The authors would like to extend their sincere thanks to the National-Local Joint Engineering Laboratory of System Credibility Automatic Verification for its generous support.

Corresponding author

Correspondence to Jun Liu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Xian, Y., Zeng, G., Liu, J. (2024). Data Complexity and Its Effect on EBRB System Accuracy. In: Bravo, J., Nugent, C., Cleland, I. (eds) Proceedings of the International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2024). UCAmI 2024. Lecture Notes in Networks and Systems, vol 1212. Springer, Cham. https://doi.org/10.1007/978-3-031-77571-0_80
