
Dsa-PAML: a parallel automated machine learning system via dual-stacked autoencoder

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Finding a high-performance machine learning pipeline (ML pipeline) for a supervised learning task is time-consuming. It requires many choices, including preprocessing the dataset, selecting algorithms, tuning hyperparameters, and ensembling candidate models. As the number of candidate pipelines grows, a combinatorial explosion problem arises. This work presents a new automated machine learning (AutoML) system called Dsa-PAML to address this challenge by recommending, training, and ensembling suitable models for supervised learning tasks. Dsa-PAML is a parallel automated system based on a dual-stacked autoencoder (Dsa). First, meta-features of datasets and ML pipelines are used to alleviate the cold-start recommendation problem. Second, a novel dual-stacked autoencoder simultaneously learns the latent features of datasets and ML pipelines, efficiently capturing the collaborative relations between them and recommending suitable ML pipelines for a new dataset. Third, Dsa-PAML trains the recommended ML pipelines on the new dataset in parallel, which substantially reduces the time complexity of the proposed method. Finally, a parallel selective ensemble system is embedded in Dsa-PAML. It selects base models from the candidate ML pipelines according to their runtime, classification performance, and diversity on the validation set, enhancing Dsa-PAML's stability across most datasets. Extensive experiments on 30 UCI datasets show that our approach outperforms current state-of-the-art methods.
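
To make the recommendation mechanism concrete, here is a minimal sketch of the dual-autoencoder idea in PyTorch: one autoencoder embeds dataset meta-features and the other embeds ML-pipeline meta-features into a shared latent space, and the inner product of the two latent codes scores how well a pipeline suits a dataset. All dimensions, layer sizes, and the loss weighting below are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch: joint latent embedding of datasets and ML pipelines.
# The dimensions (46, 32, 16) and the dot-product score head are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAutoencoder(nn.Module):
    def __init__(self, d_dataset=46, d_pipeline=32, d_latent=16):
        super().__init__()
        # Autoencoder for dataset meta-features
        self.enc_d = nn.Sequential(nn.Linear(d_dataset, 64), nn.ReLU(),
                                   nn.Linear(64, d_latent))
        self.dec_d = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(),
                                   nn.Linear(64, d_dataset))
        # Autoencoder for pipeline meta-features
        self.enc_p = nn.Sequential(nn.Linear(d_pipeline, 64), nn.ReLU(),
                                   nn.Linear(64, d_latent))
        self.dec_p = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(),
                                   nn.Linear(64, d_pipeline))

    def forward(self, x_d, x_p):
        z_d, z_p = self.enc_d(x_d), self.enc_p(x_p)
        # Reconstruction keeps each latent code faithful to its inputs;
        # the dot product scores dataset-pipeline affinity.
        return self.dec_d(z_d), self.dec_p(z_p), (z_d * z_p).sum(dim=-1)

model = DualAutoencoder()
x_d = torch.randn(8, 46)   # batch of dataset meta-feature vectors
x_p = torch.randn(8, 32)   # batch of pipeline meta-feature vectors
perf = torch.rand(8)       # observed pipeline performance (training targets)
rec_d, rec_p, score = model(x_d, x_p)
loss = (F.mse_loss(rec_d, x_d) + F.mse_loss(rec_p, x_p)
        + F.mse_loss(score, perf))
loss.backward()
```

In a full system of this kind, the score head would be trained against pipeline performance records from a meta-knowledge base, and the top-scoring pipelines for a new dataset's meta-features would be handed to the parallel training and ensembling stages.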





Acknowledgements

This work was supported by the National Key R&D Program of China under Grant No. 2019B090916002.

Author information


Corresponding author

Correspondence to Xiaofeng Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: The details of ML pipelines in the MKB

The hyperparameter configurations of the data preprocessors and the ML classifiers are shown in Tables 11 and 12. The first column gives the model's name and the number of times the associated item appears in the MKB. The second column gives the hyperparameters and their ranges, where \(\{i, j, \ldots\}\) denotes a discrete set of values, \([i, j]\) denotes the closed interval between i and j, and \((i, j)\) denotes the open interval between i and j.

Table 11 The data preprocessors and their hyperparameter configurations
Table 12 The ML classifiers and their hyperparameter configurations
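
For concreteness, the notation above maps directly onto a programmatic search-space description. The sketch below is a hedged illustration: the preprocessor and classifier names and the specific ranges are example values written in the appendix's notation, not the actual contents of Tables 11 and 12.

```python
# Hedged example of encoding hyperparameter ranges; the names and values
# here are illustrative, not the paper's exact configuration.
import random

search_space = {
    "pca": {
        "n_components": ("closed", 0.5, 0.999),        # [i, j]: closed interval
    },
    "random_forest": {
        "n_estimators": ("discrete", [50, 100, 200]),  # {i, j, ...}: discrete set
        "max_features": ("open", 0.0, 1.0),            # (i, j): open interval
    },
}

def sample(spec):
    """Draw one hyperparameter value from a (kind, ...) specification."""
    if spec[0] == "discrete":
        return random.choice(spec[1])
    lo, hi = spec[1], spec[2]
    return random.uniform(lo, hi)  # continuous draw for either interval kind

config = {model: {name: sample(spec) for name, spec in params.items()}
          for model, params in search_space.items()}
print(config)
```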

Appendix B: The attributes of the test datasets

See Table 13 and Fig. 11.

Fig. 11 The size of the test datasets: the size of a small dataset is below 100 thousand, a medium dataset below 10 million, and a large dataset above 10 million

Table 13 The attributes of the test datasets
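
As a small worked example of the size categories in Fig. 11, the helper below classifies a dataset by its total number of entries. The thresholds come from the caption; treating size as rows times columns, and the function itself, are assumptions for illustration.

```python
# Hedged helper mirroring Fig. 11's categories; measuring "size" as
# rows * columns is an assumption.
def dataset_size_category(n_rows: int, n_cols: int) -> str:
    size = n_rows * n_cols
    if size < 100_000:        # small: fewer than 100 thousand entries
        return "small"
    if size < 10_000_000:     # medium: fewer than 10 million entries
        return "medium"
    return "large"            # large: more than 10 million entries

print(dataset_size_category(1_000, 50))       # -> small
print(dataset_size_category(100_000, 50))     # -> medium
print(dataset_size_category(2_000_000, 20))   # -> large
```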

About this article

Cite this article

Liu, P., Pan, F., Zhou, X. et al. Dsa-PAML: a parallel automated machine learning system via dual-stacked autoencoder. Neural Comput & Applic 34, 12985–13006 (2022). https://doi.org/10.1007/s00521-022-07119-2

