Evolution-Based Online Automated Machine Learning

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13280)

Abstract

Automated Machine Learning (AutoML) deals with finding well-performing machine learning models and their corresponding configurations without the need for machine learning experts. However, in an online learning scenario, where an AutoML instance runs on evolving data streams, the question of which model and configuration is best as the data distribution changes remains open. Algorithms developed for online learning settings rely on a few homogeneous models and do not consider data mining pipelines or the adaptation of their configuration. We therefore introduce EvoAutoML, an evolution-based online learning framework consisting of heterogeneous and connectable models that supports large and diverse configuration spaces and adapts to the online learning scenario. We present experiments with an implementation of EvoAutoML on a diverse set of synthetic and real datasets, and show that our proposed approach outperforms state-of-the-art online algorithms as well as strong ensemble baselines in a traditional test-then-train evaluation.

Notes

  1. Changes in data distributions or patterns are also referred to as concept drift [36].

  2. https://github.com/kulbachcedric/EvOAutoML.git

  3. https://github.com/kulbachcedric/EvOAutoML.git

References

  1. Agrawal, R., Imielinski, T., Swami, A.N.: Database mining: a performance perspective. IEEE TKDE 5(6), 914–925 (1993)

  2. Alberg, D., Last, M., Kandel, A.: Knowledge discovery in data streams with regression tree methods. Wiley Interdisc. DMKD 2(1), 69–78 (2012)

  3. Bahri, M., Bifet, A., Gama, J., Gomes, H.M., Maniu, S.: Data stream analysis: foundations, major tasks and tools. Wiley Interdisc.: DMKD 11(3), e1405 (2021)

  4. Bifet, A., Gavaldà, R.: Learning from time-changing data with adaptive windowing. In: SIAM ICD, pp. 443–448 (2007)

  5. Bifet, A., Gavaldà, R.: Adaptive learning from evolving data streams. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, J.-F. (eds.) IDA 2009. LNCS, vol. 5772, pp. 249–260. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03915-7_22

  6. Bifet, A., Gavaldà, R., Holmes, G., Pfahringer, B.: Machine Learning for Data Streams: With Practical Examples in MOA. MIT Press, Cambridge (2018)

  7. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. JMLR 11, 1601–1604 (2010)

  8. Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS (LNAI), vol. 6321, pp. 135–150. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15880-3_15

  9. Bifet, A., Read, J., Žliobaitė, I., Pfahringer, B., Holmes, G.: Pitfalls in benchmarking data stream classification and how to avoid them. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013. LNCS (LNAI), vol. 8188, pp. 465–479. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40988-2_30

  10. Breiman, L.: Random forests. ML 45(1), 5–32 (2001)

  11. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984)

  12. Brochu, E., Cora, V.M., de Freitas, N.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. CoRR (2010)

  13. Celik, B., Vanschoren, J.: Adaptation strategies for automated machine learning on evolving data. CoRR abs/2006.06480 (2020)

  14. Domingos, P.M., Hulten, G.: Mining high-speed data streams. In: Ramakrishnan, R., Stolfo, S.J., Bayardo, R.J., Parsa, I. (eds.) SIGKDD, pp. 71–80. ACM (2000)

  15. Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M., Hutter, F.: Auto-Sklearn 2.0: the next generation. CoRR (2020)

  16. Feurer, M., Klein, A., Eggensperger, E.A.: Efficient and robust automated machine learning. In: Cortes, C., Lawrence, N.D., Lee, D.D. (eds.) Advances in Neural Information Processing Systems 28: NIPS, pp. 2962–2970 (2015)

  17. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28645-5_29

  18. Gijsbers, P., Vanschoren, J.: GAMA: genetic automated machine learning assistant. J. Open Sour. Softw. 4(33), 1132 (2019)

  19. Gomes, H.M., et al.: Adaptive random forests for evolving data stream classification. ML 106(9–10), 1469–1495 (2017)

  20. Gomes, H.M., Read, J., Bifet, A.: Streaming random patches for evolving data stream classification. In: ICDM. IEEE (2019)

  21. Hall, M.A., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)

  22. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: SIGKDD, pp. 97–106. ACM (2001)

  23. Hutter, F., Kotthoff, L., Vanschoren, J. (eds.): Automated Machine Learning - Methods, Systems, Challenges. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05318-5

  24. Imbrea, A.: An empirical comparison of automated machine learning techniques for data streams. B.S. thesis, University of Twente (2020)

  25. Kotthoff, L., Thornton, C., Hoos, H.H., Hutter, F., Leyton-Brown, K.: Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA. JMLR 18, 25:1–25:5 (2017)

  26. Le, T.T., Fu, W., Moore, J.H.: Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36(1), 250–256 (2020)

  27. Montiel, J., et al.: River: machine learning for streaming data in Python (2020)

  28. Montiel, J., Read, J., Bifet, A., Abdessalem, T.: Scikit-multiflow: a multi-output streaming framework. JMLR 19, 72:1–72:5 (2018)

  29. Oza, N.C.: Online bagging and boosting. In: ICSMC, pp. 2340–2345. IEEE (2005)

  30. Oza, N.C., Russell, S.J.: Experimental comparisons of online and batch versions of bagging and boosting. In: Lee, D., Schkolnick, M., Provost, F.J., Srikant, R. (eds.) ACM SIGKDD, pp. 359–364. ACM (2001)

  31. Oza, N.C., Russell, S.J.: Online bagging and boosting. In: Richardson, T.S., Jaakkola, T.S. (eds.) Workshop on AISTATS (2001)

  32. Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: AAAI, pp. 4780–4789. AAAI (2019)

  33. van Rijn, J.N., Holmes, G., Pfahringer, B., Vanschoren, J.: Having a blast: meta-learning and heterogeneous ensembles for data streams. In: Aggarwal, C.C., Zhou, Z., Tuzhilin, A., Xiong, H., Wu, X. (eds.) ICDM, pp. 1003–1008 (2015)

  34. Stetsenko, P.: Machine learning with Python and H2O (2020). http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/PythonBooklet.pdf

  35. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: SIGKDD, pp. 847–855. ACM (2013)

  36. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. ML 23(1), 69–101 (1996)

  37. Zöller, M., Huber, M.F.: Survey on automated machine learning. CoRR abs/1904.12054 (2019)

Author information

Corresponding author

Correspondence to Cedric Kulbach.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kulbach, C., Montiel, J., Bahri, M., Heyden, M., Bifet, A. (2022). Evolution-Based Online Automated Machine Learning. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_37

  • DOI: https://doi.org/10.1007/978-3-031-05933-9_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05932-2

  • Online ISBN: 978-3-031-05933-9

  • eBook Packages: Computer Science, Computer Science (R0)
