Abstract
Automated Machine Learning (AutoML) deals with finding well-performing machine learning models and their corresponding configurations without the need for machine learning experts. However, in an online learning scenario, where an AutoML instance executes on evolving data streams, the question of the best model and its configuration with respect to changes in the data distribution remains open. Algorithms developed for online learning settings rely on few, homogeneous models and consider neither data mining pipelines nor the adaptation of their configuration. We therefore introduce EvoAutoML, an evolution-based online learning framework consisting of heterogeneous and connectable models that supports large and diverse configuration spaces and adapts to the online learning scenario. We present experiments with an implementation of EvoAutoML on a diverse set of synthetic and real datasets, and show that our proposed approach outperforms state-of-the-art online algorithms as well as strong ensemble baselines in a traditional test-then-train evaluation.
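The two ideas the abstract combines — test-then-train (prequential) evaluation and an evolving population of heterogeneous online models — can be illustrated with a toy sketch. This is not the authors' EvoAutoML implementation: the model classes, the mutation operator, and the `sampling_period` below are illustrative assumptions only.

```python
import random


class MajorityClass:
    """Online baseline: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


class ThresholdModel:
    """Online model with one mutable hyperparameter (the threshold)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x > self.threshold)

    def learn(self, x, y):
        pass  # keeps no state beyond its hyperparameter

    def mutate(self):
        return ThresholdModel(self.threshold + random.uniform(-0.05, 0.05))


def evolving_test_then_train(stream, population, sampling_period=50):
    """Test-then-train loop over a heterogeneous population: each arriving
    example first scores every model, then trains it; every
    `sampling_period` steps the worst model is replaced by a mutated
    copy of the best (only models exposing `mutate` can reproduce)."""
    correct = [0] * len(population)
    n = 0
    for x, y in stream:
        for i, model in enumerate(population):
            correct[i] += int(model.predict(x) == y)  # test first ...
            model.learn(x, y)                         # ... then train
        n += 1
        if n % sampling_period == 0:
            best = max(range(len(population)), key=correct.__getitem__)
            worst = min(range(len(population)), key=correct.__getitem__)
            if best != worst and hasattr(population[best], "mutate"):
                population[worst] = population[best].mutate()
                correct[worst] = correct[best]
    best = max(range(len(population)), key=correct.__getitem__)
    return population[best], correct[best] / n
```

On a stream labeled by `y = int(x > 0.7)`, the population drifts toward threshold models near 0.7 while the majority-class baseline is evolved away; this mirrors, in miniature, how an evolutionary online AutoML system searches a configuration space while the stream is being processed.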
Notes
- 1. Changes in data distributions or patterns are also referred to as concept drift [36].
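The concept drift mentioned in note 1 can be made concrete with a toy monitor. Adaptive-windowing detectors such as ADWIN [4] maintain a variable-length window with statistical guarantees; the sketch below is a deliberately simplified fixed-window stand-in, and the `window` and `threshold` parameters are illustrative assumptions.

```python
from collections import deque


def detect_drift(stream, window=60, threshold=0.25):
    """Toy drift monitor (simplified stand-in for adaptive-windowing
    detectors such as ADWIN): keeps a fixed window of the most recent
    values and flags a drift when the means of its two halves diverge
    by more than `threshold`. Returns the time steps at which drifts
    were reported."""
    buf = deque(maxlen=window)
    drifts = []
    for t, v in enumerate(stream):
        buf.append(v)
        if len(buf) == window:
            half = window // 2
            older = sum(list(buf)[:half]) / half
            newer = sum(list(buf)[half:]) / half
            if abs(newer - older) > threshold:
                drifts.append(t)
                buf.clear()  # restart monitoring after reporting a drift
    return drifts
```

On a stream whose mean jumps from 0.1 to 0.9 at step 100, the monitor reports a single drift shortly after the change point; an online AutoML system would use such a signal to trigger model or configuration adaptation.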
References
Agrawal, R., Imielinski, T., Swami, A.N.: Database mining: a performance perspective. IEEE TKDE 5(6), 914–925 (1993)
Alberg, D., Last, M., Kandel, A.: Knowledge discovery in data streams with regression tree methods. Wiley Interdisc. DMKD 2(1), 69–78 (2012)
Bahri, M., Bifet, A., Gama, J., Gomes, H.M., Maniu, S.: Data stream analysis: foundations, major tasks and tools. Wiley Interdisc. DMKD 11(3), e1405 (2021)
Bifet, A., Gavaldà, R.: Learning from time-changing data with adaptive windowing. In: SIAM SDM, pp. 443–448 (2007)
Bifet, A., Gavaldà, R.: Adaptive learning from evolving data streams. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, J.-F. (eds.) IDA 2009. LNCS, vol. 5772, pp. 249–260. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03915-7_22
Bifet, A., Gavaldà, R., Holmes, G., Pfahringer, B.: Machine Learning for Data Streams: With Practical Examples in MOA. MIT Press, Cambridge (2018)
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. JMLR 11, 1601–1604 (2010)
Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS (LNAI), vol. 6321, pp. 135–150. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15880-3_15
Bifet, A., Read, J., Žliobaitė, I., Pfahringer, B., Holmes, G.: Pitfalls in benchmarking data stream classification and how to avoid them. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013. LNCS (LNAI), vol. 8188, pp. 465–479. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40988-2_30
Breiman, L.: Random forests. ML 45(1), 5–32 (2001)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984)
Brochu, E., Cora, V.M., de Freitas, N.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. CoRR (2010)
Celik, B., Vanschoren, J.: Adaptation strategies for automated machine learning on evolving data. CoRR abs/2006.06480 (2020)
Domingos, P.M., Hulten, G.: Mining high-speed data streams. In: Ramakrishnan, R., Stolfo, S.J., Bayardo, R.J., Parsa, I. (eds.) SIGKDD, pp. 71–80. ACM (2000)
Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M., Hutter, F.: Auto-Sklearn 2.0: the next generation. CoRR (2020)
Feurer, M., Klein, A., Eggensperger, K., et al.: Efficient and robust automated machine learning. In: Cortes, C., Lawrence, N.D., Lee, D.D. (eds.) Advances in Neural Information Processing Systems 28: NIPS, pp. 2962–2970 (2015)
Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28645-5_29
Gijsbers, P., Vanschoren, J.: GAMA: genetic automated machine learning assistant. J. Open Sour. Softw. 4(33), 1132 (2019)
Gomes, H.M., et al.: Adaptive random forests for evolving data stream classification. ML 106(9–10), 1469–1495 (2017)
Gomes, H.M., Read, J., Bifet, A.: Streaming random patches for evolving data stream classification. In: ICDM. IEEE (2019)
Hall, M.A., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: SIGKDD, pp. 97–106. ACM (2001)
Hutter, F., Kotthoff, L., Vanschoren, J. (eds.): Automated Machine Learning - Methods, Systems, Challenges. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05318-5
Imbrea, A.: An empirical comparison of automated machine learning techniques for data streams. B.S. thesis, University of Twente (2020)
Kotthoff, L., Thornton, C., Hoos, H.H., Hutter, F., Leyton-Brown, K.: Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA. JMLR 18, 25:1–25:5 (2017)
Le, T.T., Fu, W., Moore, J.H.: Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36(1), 250–256 (2020)
Montiel, J., et al.: River: machine learning for streaming data in Python (2020)
Montiel, J., Read, J., Bifet, A., Abdessalem, T.: Scikit-multiflow: a multi-output streaming framework. JMLR 19, 72:1–72:5 (2018)
Oza, N.C.: Online bagging and boosting. In: ICSMC, pp. 2340–2345. IEEE (2005)
Oza, N.C., Russell, S.J.: Experimental comparisons of online and batch versions of bagging and boosting. In: Lee, D., Schkolnick, M., Provost, F.J., Srikant, R. (eds.) ACM SIGKDD, pp. 359–364. ACM (2001)
Oza, N.C., Russell, S.J.: Online bagging and boosting. In: Richardson, T.S., Jaakkola, T.S. (eds.) Workshop on AISTATS (2001)
Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: AAAI, pp. 4780–4789. AAAI (2019)
van Rijn, J.N., Holmes, G., Pfahringer, B., Vanschoren, J.: Having a blast: meta-learning and heterogeneous ensembles for data streams. In: Aggarwal, C.C., Zhou, Z., Tuzhilin, A., Xiong, H., Wu, X. (eds.) ICDM, pp. 1003–1008 (2015)
Stetsenko, P.: Machine learning with Python and H2O (2020). http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/PythonBooklet.pdf
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: SIGKDD, pp. 847–855. ACM (2013)
Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. ML 23(1), 69–101 (1996)
Zöller, M., Huber, M.F.: Survey on automated machine learning. CoRR abs/1904.12054 (2019)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kulbach, C., Montiel, J., Bahri, M., Heyden, M., Bifet, A. (2022). Evolution-Based Online Automated Machine Learning. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science(), vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_37
DOI: https://doi.org/10.1007/978-3-031-05933-9_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05932-2
Online ISBN: 978-3-031-05933-9
eBook Packages: Computer Science; Computer Science (R0)