MLOps - Definitions, Tools and Challenges

Georgios Symeonidis, Evangelos Nerantzis, Apostolos Kazakis, and George A. Papakostas*
MLV Research Group, Dept. of Computer Science
International Hellenic University, Kavala, Greece
giorgos.simeonidis@athenarc.gr, e.nerantzis@athenarc.gr, apkazak@cs.ihu.gr, gpapak@cs.ihu.gr

2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC). DOI: 10.1109/CCWC54503.2022.9720902

Abstract—This paper is a concentrated overview of the Machine Learning Operations (MLOps) area. Our aim is to define the operation and the components of such systems by highlighting the current problems and trends. In this context we present the different tools and their usefulness in order to provide the corresponding guidelines. Moreover, the connection between MLOps and AutoML (Automated Machine Learning) is identified and how this combination could work is proposed. The novelty of our approach relies on the combination of state-of-the-art topics such as AutoML, explainability and sustainability in order to overcome the current challenges in MLOps, identifying them not only as the answer for the incorporation of ML models in production but also as a possible tool for efficient, robust and accurate machine learning models.

Keywords—MLOps; AutoML; machine learning; deployment; re-training; monitoring; explainability; robustness; sustainability; fairness

I. INTRODUCTION

Incorporating machine learning models in production is a challenge that has remained from the creation of the first models until now. For years data scientists, machine learning engineers, front-end engineers and production engineers have tried to find a way to work together and combine their knowledge in order to deploy production-ready models. This task has many difficulties and they are not easy to overcome, which is why only a small percentage of ML projects manage to reach production. In previous years a set of techniques and tools have been proposed and used in order to minimize such problems as much as possible. The development of these tools had multiple targets: data preprocessing, model creation, training, evaluation, deployment and monitoring are some of them. As the field of AI progresses, such tools are constantly emerging.

II. RELATED WORK

MLOps is a relatively new field and, as expected, there are not many review papers. In this section we initially report the review papers and then mention some of the most important and influential work in every task of the MLOps cycle (Figure 1). At first, Akshita Goyal [1] provides a general overview of MLOps; thereupon Yizhen Zhao [2] reviews the academic literature around ML in production in order to define the importance of MLOps. Moreover, Yue Zhou et al. [3] focus on the resource consumption in every step of the MLOps life-cycle. Besides reviews, there are plenty of papers about the applications of MLOps in different topics, such as an MLOps approach to cloud-native data pipeline design by I. Pölöskei [4], the application of MLOps in the prediction of lifestyle diseases by Manjunatha Reddy et al. [5], and SensiX++, which brings MLOps and multi-tenant model serving to sensory edge devices, by Chulhong Min et al. [6].

In terms of the different stages of MLOps, Sasu Mäkinen et al. [7] describe the importance of MLOps in the field of data science, based on a survey that collected responses from 331 professionals from 63 different countries. As for the data manipulation task, Cedric Renggli et al. [8] describe the significance of data quality for an MLOps system while demonstrating how different aspects of data quality propagate through various stages of machine learning development. Philipp Ruf et al. [9] examine the role and the connectivity of MLOps tools for every task in the MLOps cycle; they also present a recipe for the selection of the best open-source tools. Monitoring and the corresponding challenges were discussed by Janis Klaise et al. [10] using recent examples of production-ready solutions built with open-source tools. Finally, Damian A. Tamburri [11] presents the current trends and challenges, focusing on sustainability and explainability.

III. MLOPS

MLOps (Machine Learning Operations) stands for the collection of techniques and tools for the deployment of ML models in production [12]. It is the combination of DevOps and machine learning. DevOps [13] stands for a set of practices whose main purpose is to minimize the time needed for a software release, reducing the gap between software development and operations [14][15]. The two main principles of DevOps are Continuous Integration (CI) and Continuous Delivery (CD). Continuous integration is the practice by which software development organizations try to integrate code written by developer teams at frequent intervals, constantly testing the code and making small improvements each time based on the errors and weaknesses that result from the tests. This shortens the software development cycle [16]. Continuous delivery is the practice according to which there is constantly a new version of the software under development, ready to be installed for testing, evaluation and then production. With this practice, the software releases resulting from continuous integration, with the improvements and the new features, reach the end users much faster [17].

Figure 1. MLOps life-cycle.

After the great acceptance of DevOps and the practices of "continuous software development" in general [18][14], the need to apply the same principles that govern DevOps to machine learning models became imperative [12]. This is how these practices, called MLOps (Machine Learning Operations), came about. MLOps attempts to automate machine learning processes using DevOps practices and approaches; the two main DevOps principles it seeks to serve are Continuous Integration (CI) and Continuous Delivery (CD) [15]. Although it seems simple, in reality it is not. This is due to the fact that a machine learning model is not independent but is part of a wider software system, and consists not only of code but also of data. As the data is constantly changing, the model is constantly called upon to retrain on the new data that emerges. For this reason, MLOps introduces a new practice, in addition to CI and CD, that of Continuous Training (CT), which aims to automatically retrain the model where needed. From the above, it becomes clear that compared to DevOps, MLOps is much more complex and incorporates additional procedures involving data and models [19][9][20].

A. MLOps pipeline

While there are several attempts to capture and describe MLOps, the best known is the proposal of ThoughtWorks [21][22], which automates the life cycle of end-to-end machine learning applications (Figure 2). It is "a software engineering approach in which an interoperable team produces machine learning applications based on code, data and models in small, secure new versions that can be replicated and delivered reliably at any time, in short custom cycles". This approach includes three basic procedures: the collection, selection and preparation of data to be used in model training; the discovery and selection of the most efficient model after testing and experimenting with different models; and the deployment and delivery of the selected model to production. A simplified form of such a pipeline is shown in Figure 2.

Figure 2. MLOps pipeline.

After collecting, evaluating and selecting the data that will be used for training, we automate the process of creating and training models. This allows us to produce more than one model, which we can test and experiment with in order to produce a more efficient and effective model while recording the results of our tests. Then we have to resolve various issues related to the production of the model, as well as submit it to various tests in order to confirm its reliability before deploying it to production. Finally, we can monitor the model and collect the resulting new data, which will be used to retrain the model, thus ensuring its continuous improvement [23].
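To make the pipeline above concrete, the following Python sketch reduces its three procedures (data preparation, model selection through experimentation, deployment) to a few lines using scikit-learn. The dataset, the candidate models and the joblib file standing in for a model registry are illustrative assumptions for this paper, not part of the ThoughtWorks proposal itself.

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

def prepare_data():
    # Stand-in for the data collection/selection/preparation stage.
    X, y = load_iris(return_X_y=True)
    return train_test_split(X, y, test_size=0.2, random_state=0)

def select_model(X_train, y_train):
    # Train several candidates and keep the one with the best CV score,
    # i.e. the "testing and experimenting with different models" step.
    candidates = [LogisticRegression(max_iter=1000), DecisionTreeClassifier()]
    return max(candidates,
               key=lambda m: cross_val_score(m, X_train, y_train, cv=5).mean())

def deploy(model, path="model.joblib"):
    # Stand-in for shipping the selected model to production.
    joblib.dump(model, path)

X_train, X_test, y_train, y_test = prepare_data()
model = select_model(X_train, y_train).fit(X_train, y_train)
deploy(model)
print("held-out accuracy:", model.score(X_test, y_test))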
B. Maturity Levels

Depending on its level of automation, an MLOps system can be classified at a corresponding level [19]; the community has named these levels maturity levels. Although there is no universal maturity model, the two main ones were created by Google and Microsoft. The Google model [24] consists of three levels, and its structure is presented in Figure 3: MLOps level 0: Manual process; MLOps level 1: ML pipeline automation; MLOps level 2: CI/CD pipeline automation. The Microsoft model [25] consists of five levels, and its structure is presented in Figure 4: Level 1: No MLOps; Level 2: DevOps but no MLOps; Level 3: Automated Training; Level 4: Automated Model Deployment; Level 5: Full MLOps Automated Operations.
Figure 3. Google's maturity levels.

Figure 4. Microsoft's maturity levels.

IV. TOOLS AND PLATFORMS

In recent years many different tools have emerged to help automate the sequence of machine learning processes [26]. This section provides an overview of the different tools and of the requirements these tools meet. Note that different tools automate different phases of the machine learning workflow. The majority of tools come from the open-source community: half of all IT organizations use open-source tools for AI and ML, and the percentage is expected to be around two-thirds by 2023. On GitHub alone, there are 65 million developers and 3 million organizations contributing to 200 million projects. It is therefore not surprising that there are advanced sets of open-source tools in the landscape of machine learning and artificial intelligence. Open-source tools focus on specific tasks within MLOps instead of providing end-to-end machine learning life-cycle management, and these tools and platforms typically require a development environment in Python or R. The choice of tools for MLOps is based on the context of the respective ML solution and the operations setup.

A. Data Preprocessing Tools

Data preprocessing tools are divided into two main categories: data labeling tools and data versioning tools. Data labeling tools (also called annotation, tagging or sorting tools) label big data such as text, images or sound. Data labeling tools can in turn be divided into different categories depending on the task they perform: some are designed to annotate specific file types such as videos or images [27], and few of these tools can handle all file types. There are also different types of tags that differ from tool to tool; bounding boxes, polygonal annotations and semantic segmentation are the most common features in the labeling market. The choice of data labeling tool is an essential factor in the success of the machine learning model, so you need to specify the type of data labeling your organization needs [28]. Labeling accuracy is an important aspect of data labeling [29]: high-quality data creates better model performance. Data versioning tools (also called data version controls) manage different versions of data sets and store them in an accessible and well-organized way [30]. This allows data science teams to gain knowledge, such as identifying how changes affect model performance and understanding how data sets evolve. The most important data preprocessing tools are listed in Table I.

Table I. DATA PREPROCESSING TOOLS
Name | Status | Launched in | Use
iMerit | Private | 2012 | Data Preprocessing
Pachyderm | Private | 2014 | Data Versioning
Labelbox | Private | 2017 | Data Preprocessing
Prodigy | Private | 2017 | Data Preprocessing
Comet | Private | 2017 | Data Versioning
Data Version Control | Open Source | 2017 | Data Versioning
Qri | Open Source | 2018 | Data Versioning
Weights and Biases | Private | 2018 | Data Versioning
Delta Lake | Open Source | 2019 | Data Versioning
Doccano | Open Source | 2019 | Data Preprocessing
Snorkel | Private | 2019 | Data Preprocessing
Supervisely | Private | 2019 | Data Preprocessing
Segments.ai | Private | 2020 | Data Preprocessing
Dolt | Open Source | 2020 | Data Versioning
LakeFS | Open Source | 2020 | Data Versioning
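The core idea behind data versioning tools such as DVC, Delta Lake or LakeFS can be reduced to content-addressing: store each dataset snapshot under a hash of its contents and record that hash next to every experiment. The sketch below (file and directory names are hypothetical) shows only this mechanism; the listed tools add remote storage, branching and Git integration on top.

import hashlib
import shutil
from pathlib import Path

def version_dataset(src: str, store: str = "data_store") -> str:
    # Hash the file contents so identical data always maps to the same id.
    digest = hashlib.sha256(Path(src).read_bytes()).hexdigest()[:12]
    dst = Path(store) / f"{Path(src).stem}-{digest}{Path(src).suffix}"
    dst.parent.mkdir(exist_ok=True)
    shutil.copy(src, dst)  # keep an immutable, hash-named snapshot
    return digest          # log this id together with the trained model

# Usage: tag = version_dataset("train.csv"); storing `tag` with the experiment
# lets any model be traced back to the exact data it was trained on.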
B. Modeling Tools

The tools with which we extract features from a raw data set in order to create optimal training data sets are called feature engineering tools. Tools like these have the ability to speed up the feature extraction process [31] when applied to common applications and generic problems. To track the versions of the data of each experiment and its results, as well as to compare different experiments, we use experiment tracking tools, which store all the necessary information about the different experiments; developing machine learning projects involves running multiple experiments with different models, model parameters, or training data. Hyperparameter tuning or optimization tools automate the process of searching for and selecting hyperparameters that give optimal performance for machine learning models. Hyperparameters are the parameters of machine learning models, such as the size of a neural network or the type of regularization, that model developers can adjust to achieve different results [32]. The most important modeling tools are listed in Table II.

Table II. MODELING TOOLS
Name | Status | Launched in | Use
Hyperopt | Open Source | 2013 | Hyperparameter Optimization
SigOpt | Public | 2014 | Hyperparameter Optimization
Iguazio Data Science Platform | Private | 2014 | Feature Engineering
TsFresh | Private | 2016 | Feature Engineering
Featuretools | Private | 2017 | Feature Engineering
Comet | Private | 2017 | Experiment Tracking
Neptune.ai | Private | 2017 | Experiment Tracking
TensorBoard | Open Source | 2017 | Experiment Tracking
Google Vizier | Public | 2017 | Hyperparameter Optimization
Scikit-Optimize | Open Source | 2017 | Hyperparameter Optimization
dotData | Private | 2018 | Feature Engineering
Weights and Biases | Private | 2018 | Experiment Tracking
CML | Open Source | 2018 | Experiment Tracking
MLflow | Open Source | 2018 | Experiment Tracking
Optuna | Open Source | 2018 | Hyperparameter Optimization
Talos | Open Source | 2018 | Hyperparameter Optimization
AutoFeat | Open Source | 2019 | Feature Engineering
Feast | Private | 2019 | Feature Engineering
GuildAI | Open Source | 2019 | Experiment Tracking
Rasgo | Private | 2020 | Feature Engineering
ModelDB | Open Source | 2020 | Experiment Tracking
HopsWorks | Private | 2021 | Feature Engineering
Aim | Open Source | 2021 | Experiment Tracking
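As a brief illustration of the hyperparameter optimization tools in Table II, the following sketch uses Optuna to search over two hyperparameters of a random forest; the search space, dataset and trial budget are arbitrary choices for the example.

import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Optuna samples each hyperparameter from its declared range per trial.
    model = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 2, 16),
        random_state=0,
    )
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)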

C. Operationalization Tools

To facilitate the integration of ML models into a production environment, we use machine learning model deployment [33] tools. Machine learning model monitoring is a key aspect of every successful ML project, because ML model performance tends to decay after deployment due to changes in the input data flow over time [34]. Model monitoring tools detect data drifts and anomalies over time and allow setting up alerts in case of performance issues. Finally, we should not forget to mention that there are now tools available that cover the life cycle of an end-to-end machine learning application. The most important operationalization tools are listed in Table III.

Table III. OPERATIONALIZATION TOOLS
Name | Status | Launched in | Use
Google Cloud Platform | Public | 2008 | end-to-end
Microsoft Azure | Public | 2010 | end-to-end
H2O.ai | Open Source | 2012 | end-to-end
Unravel Data | Private | 2013 | Model Monitoring
Algorithmia | Private | 2014 | Model Deployment / Serving
Iguazio | Private | 2014 | end-to-end
Databricks | Private | 2015 | end-to-end
TensorFlow Serving | Open Source | 2016 | Model Deployment / Serving
Featuretools | Private | 2017 | Feature Engineering
Amazon SageMaker | Public | 2017 | end-to-end
Kubeflow | Open Source | 2018 | Model Deployment / Serving
OpenVINO | Open Source | 2018 | Model Deployment / Serving
Triton Inference Server | Open Source | 2018 | Model Deployment / Serving
Fiddler | Private | 2018 | Model Monitoring
Losswise | Private | 2018 | Model Monitoring
Alibaba Cloud ML Platform for AI | Public | 2018 | end-to-end
MLflow | Open Source | 2018 | end-to-end
BentoML | Open Source | 2019 | Model Deployment / Serving
Superwise.ai | Private | 2019 | Model Monitoring
MLRun | Open Source | 2019 | Model Monitoring
DataRobot | Private | 2019 | end-to-end
Seldon | Private | 2020 | Model Deployment / Serving
TorchServe | Open Source | 2020 | Model Deployment / Serving
KFServing | Open Source | 2020 | Model Deployment / Serving
Syndicai | Private | 2020 | Model Deployment / Serving
Arize | Private | 2020 | Model Monitoring
Evidently AI | Open Source | 2020 | Model Monitoring
WhyLabs | Open Source | 2020 | Model Monitoring
Cloudera | Public | 2020 | end-to-end
BodyWork | Open Source | 2021 | Model Deployment / Serving
Cortex | Private | 2021 | Model Deployment / Serving
Sagify | Open Source | 2021 | Model Deployment / Serving
Aporia | Open Source | 2021 | Model Monitoring
Deepchecks | Private | 2021 | Model Monitoring
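The alerting idea behind the monitoring entries in Table III can be sketched in a few lines: track a rolling quality metric and raise an alert when it falls below a threshold. The window size and threshold here are arbitrary assumptions; production tools add dashboards, drift detectors and alert routing on top of this basic loop.

from collections import deque

class AccuracyMonitor:
    def __init__(self, window: int = 500, threshold: float = 0.90):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.threshold = threshold

    def record(self, prediction, label) -> None:
        self.outcomes.append(int(prediction == label))
        acc = sum(self.outcomes) / len(self.outcomes)
        if len(self.outcomes) == self.outcomes.maxlen and acc < self.threshold:
            print(f"ALERT: rolling accuracy {acc:.3f} below {self.threshold}")

# In serving code, call monitor.record(pred, label) whenever delayed ground
# truth becomes available for a past prediction.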
D. The example of colossal companies

It is common for big companies to develop their own MLOps platforms in order to deploy fast, reliable and reproducible pipelines. The problems that led these companies to create their own platforms are mainly two. The first is the time needed to build and deliver a model to production [35]; the main goal is to reduce the time required from a few months to a few weeks. The second is that the stability of ML models in their predictions and the reproduction of these models under different conditions are always two of the most important goals. Some illustrative examples of such companies are Google with TFX (2019) [36], Uber with Michelangelo (2015) [37], Airbnb with Bighead (2017) [38] and Netflix with Metaflow (2020) [39].

E. How to choose the right tools

The MLOps life-cycle consists of different tasks. Every task has unique characteristics, and the corresponding tools are developed to match them. Hence, an efficient MLOps system depends on the choice of the right tools, both for each task and for the connectivity between them. Every challenge also has its own characteristics, and the right way to go depends on them [40]. There is no general recipe for choosing specific tools [9], but we can provide some general guidelines that help eliminate some tools and simplify the problem. There are tools that offer a variety of functionalities and tools that are more specialized. Generally, the fewer tools we use the better, because it is easier, for example, to achieve compatibility between three tools than between five. But some tasks require more flexibility, so the biggest challenge is to find the balance between flexibility and compatibility. For this reason it is important to make a list of the available tools that are capable of solving the individual problem in every task. Then we can check the compatibility between them in order to find the best way to go. This requires excellent knowledge of as many tools as possible from every team working on an MLOps system, so the list gets smaller when we add the pre-existing knowledge of these tools as a precondition. This is not always a solution, so we can also prefer tools that are easy to understand and use.
V. AUTOML

In the last few years more and more companies have tried to integrate machine learning models into the production process, and for this reason another software solution was created. AutoML is the process of automating the different tasks that the creation of an ML model requires [41]. Specifically, an AutoML pipeline contains data preparation, model creation, hyperparameter tuning, evaluation and validation. With these techniques a set of models is trained on the same data set, hyperparameter fine-tuning is applied, the models are evaluated and the best model is exported. Therefore the process of creating and selecting the appropriate model, as well as the preparation of the data, turns into a much simpler and more accessible process [42]. This is the reason why every year more and more companies turn their attention to AutoML. The combination of AutoML and MLOps simplifies the deployment of ML models in production and makes it much more feasible. In this section we give a brief introduction to the most modern AutoML tools and platforms, aiming at the combination of AutoML and MLOps.

Figure 5. AutoML vs. ML.

A. Tools and Platforms

Every year more and more tools and platforms are emerging [42]. AutoML platforms are services which are mainly accessible in the cloud; therefore they are not preferred for this task, although when a cloud-based MLOps platform has been selected it is possible to achieve better compatibility. There are also libraries and APIs written in Python and C++, which are much more preferable when an end-to-end cloud-based MLOps platform has not been chosen. The ones that stand out are Auto-sklearn [43], Auto-Keras [44], TPOT [45], Auto-PyTorch [46] and BigML [47]. The main platforms are Google Cloud AutoML [48], Akkio [49], H2O [50], Microsoft Azure AutoML [51] and Amazon SageMaker Autopilot [52]. The most important tools are listed in Table IV.

Table IV. AUTOML TOOLS AND PLATFORMS
Name | Class | Status
Auto-sklearn | Tool | Open Source
Auto-Keras | Tool | Open Source
TPOT | Tool | Open Source
Auto-PyTorch | Tool | Open Source
BigML | Tool and Platform | Commercial
Google Cloud AutoML | Platform | Commercial
Akkio | Platform | Commercial
H2O | Platform | Commercial
Microsoft Azure AutoML | Platform | Commercial
Amazon SageMaker Autopilot | Platform | Commercial
B. Combining MLOps and AutoML

It is obvious that the combination of the two techniques can be extremely effective [9], but there are still some pros and cons. AutoML requires vast computational power in order to perform, and although hardware improves every year and gets closer to overcoming this kind of challenge, AutoML will always be more computationally expensive than classic machine learning techniques, mostly because it performs the same tasks in much less time. We are also given much less flexibility: the AutoML tool works as a pipeline, so we have no control over the choices it makes, and AutoML therefore does not qualify for very specialized tasks. On the other hand, with AutoML retraining is a much easier and more straightforward task. As long as the new data are labeled, or the models use unsupervised techniques, we only have to feed the new data to the AutoML tool and deploy the new model. In conclusion, AutoML is a much quicker and more efficient process than the classic ML pipeline [53], which can be extremely beneficial in achieving efficient, high-maturity-level MLOps systems.
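The "feed the new data and deploy" workflow can be illustrated with Auto-sklearn [43]: given only a labeled dataset and a time budget, the library searches over models and hyperparameters and returns the best ensemble. The budgets and dataset below are arbitrary examples, not recommendations.

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), random_state=0)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # total search budget in seconds
    per_run_time_limit=30,        # cap for each candidate model
)
# One call covers model creation, hyperparameter tuning and ensembling.
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))

# Retraining on newly labeled data is just another fit() call, which is why
# AutoML pairs naturally with the Continuous Training loop of MLOps.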

VI. MLOPS CHALLENGES

In the past years, much research has tended to focus on the maturity levels of MLOps and the transition to fully automated pipelines [19]. Several challenges have been detected in this area and they are not always easy to overcome [54]. A low-maturity system relies on classical machine learning techniques and requires an extremely good connection between the individual working teams, such as data scientists, ML engineers and front-end engineers. Many technical problems arise from this separation of responsibilities and from the lack of compatibility from one step to another. The first challenge therefore lies in the creation of robust, efficient pipelines with strong compatibility. Constant evolution is another critical point of a high-maturity MLOps platform, so constant retraining moves to the top of the current challenges.

A. Efficient Pipelines

An MLOps system includes various pipelines [3]. Commonly, a data manipulation pipeline, a model creation pipeline and a deployment pipeline are mandatory. Each of these pipelines must be compatible with the others, in a way that optimizes flow and minimizes errors. From this aspect it is critical to choose the right tools for the creation and connection of these pipelines. The shape of the targets determines the best combination of tools and techniques: there is no ideal combination for every problem; rather, the problem determines the combination to be chosen. Also, it is always critical to use the same data preprocessing libraries in every pipeline, as illustrated in the sketch below; in this way we prevent the rise of multiple compatibility errors.
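A minimal sketch of that advice, assuming scikit-learn: fitting the preprocessing and the model as one pipeline object and serializing them together guarantees that the deployment pipeline transforms raw inputs exactly as the training pipeline did. The dataset and file name are placeholders.

import joblib
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Training pipeline: preprocessing and model fitted and saved as one artifact.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)
joblib.dump(pipe, "pipeline.joblib")

# Deployment pipeline: loading the same artifact rules out train/serve skew,
# since the identical transformer runs in front of the model.
serving_model = joblib.load("pipeline.joblib")
print(serving_model.predict(X[:3]))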
B. Re-Training

After monitoring and tracking your model performance, the next step is retraining your machine learning model [55]. The objective is to ensure that the quality of your model in production stays up to date. However, even if the pipelines are perfect, there are many problems that complicate or even make retraining impossible. From our point of view, the most important of them is new data manipulation.

1) New Data Manipulation: When a model is deployed in production, we feed it new, raw data to make predictions and use them to extract the final results. However, when we are using supervised learning, we do not have the corresponding labels at our disposal, so it is impossible to measure the accuracy and constantly evaluate the model. It is possible to perceive the robustness of the model only by evaluating the final results, which is not always an option. Even if we manage to evaluate the model and find low metrics on new data, the same problem arises again: in order to retrain (fine-tune) the model, the labels are prerequisites. Manually labeling the new data is a solution, but it slows down the process and fails at constant retraining tasks. One approach is to use the trained model to label the new data, or to use unsupervised learning instead of supervised learning, but this also depends on the type of the problem and the targets of the task. Finally, there are types of data where there is no need for labeling; the most common area that uses this kind of data is time series and forecasting.
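The "use the trained model to label the new data" approach mentioned above is often called pseudo-labeling. Here is one sketch under stated assumptions: scikit-learn, an arbitrary 0.95 confidence cutoff, and a toy split standing in for unlabeled production data.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_old, y_old, X_new = X[:1000], y[:1000], X[1000:]  # X_new arrives unlabeled

model = LogisticRegression(max_iter=2000).fit(X_old, y_old)
proba = model.predict_proba(X_new)
confident = proba.max(axis=1) >= 0.95               # keep confident predictions
pseudo_y = model.classes_[proba.argmax(axis=1)][confident]

# Retrain on the old labels plus the self-labeled confident samples.
X_retrain = np.vstack([X_old, X_new[confident]])
y_retrain = np.concatenate([y_old, pseudo_y])
model = LogisticRegression(max_iter=2000).fit(X_retrain, y_retrain)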
C. Monitoring

In most papers and articles, monitoring is positioned as one of the most important functions in MLOps [56]. This is because understanding the results helps in understanding the shortcomings of the entire system. This last section highlights the importance of monitoring not only for the accuracy of the model, but for every aspect of the system.

1) Data Monitoring: Monitoring the data can be extremely useful in many ways. Detecting outliers and drift is a way to prevent a failure of the model and to guide the right training. Constantly monitoring the shape of the incoming data and comparing it against the training data shows how far the two have drifted apart. There are many tools and techniques for data monitoring, and choosing the right ones also depends on the target; a minimal drift check is sketched at the end of this section.

2) Model Monitoring: Monitoring the accuracy of a model is a way to evaluate its performance on a batch of data at a precise moment. For a high-maturity system, we need to monitor more aspects of our model and of the whole system. In recent years, much research [10][11] has focused on sustainability, robustness [57], fairness and explainability [58]. The reason is that we need to know more about the structure of the model, its performance, and the reason why it works or does not work.

VII. CONCLUSION

In conclusion, MLOps is the most efficient way to incorporate ML models in production. Every year more enterprises use these techniques and more research is done in the area. But MLOps may also have another use: in addition to the application of ML models in production, a fully mature MLOps system with continuous training can lead us to more efficient and realistic ML models. Further, choosing the right tools for each job is a constant challenge. Although there are many papers and articles on the different tools, it is not easy to follow the guidelines and incorporate them in the most efficient way; sometimes we have to choose between flexibility and robustness with the respective pros and cons. Finally, monitoring is a stage that must be one of the main points of interest. Monitoring the state of the whole system in terms of sustainability, robustness, fairness and explainability is, from our point of view, the key to mature, automated, robust and efficient MLOps systems. For this reason, it is essential to develop models and techniques which enable this kind of monitoring, such as explainable machine learning models. AutoML is maybe the game changer in the chase for maturity and efficiency. For this reason, a more comprehensive and practical survey of the usage of AutoML in MLOps is necessary.

ACKNOWLEDGMENT

This work was supported by the MPhil program "Advanced Technologies in Informatics and Computers", hosted by the Department of Computer Science, International Hellenic University, Kavala, Greece.

REFERENCES

[1] A. Goyal, "Machine learning operations," International Journal of Information Technology Insights & Transformations [ISSN: 2581-5172 (online)], vol. 4, 2020. [Online]. Available: http://www.technology.eurekajournals.com/index.php/IJITIT/article/view/655

[2] Y. Zhao, "Machine learning in production: A literature review," 2021. [Online]. Available: https://scholar.google.com/

[3] Y. Zhou, Y. Yu, and B. Ding, "Towards MLOps: A case study of ML pipeline platform," Proceedings - 2020 International Conference on Artificial Intelligence and Computer Engineering, ICAICE 2020, pp. 494–500, 10 2020.

[4] I. Pölöskei, "MLOps approach in the cloud-native data pipeline design," Acta Technica Jaurinensis, 2021. [Online]. Available: https://acta.sze.hu/index.php/acta/article/view/581

[5] M. Reddy, B. Dattaprakash, S. S. Kammath, S. KN, and S. Manokaran, "Application of MLOps in prediction of lifestyle diseases," SPAST Abstracts, vol. 1, 2021. [Online]. Available: https://spast.org/techrep/article/view/942
[6] C. Min, A. Mathur, U. G. Acer, A. Montanari, and F. Kawsar, "SensiX++: Bringing MLOps and multi-tenant model serving to sensory edge devices," 9 2021. [Online]. Available: https://arxiv.org/abs/2109.03947v1

[7] S. Mäkinen, H. Skogström, V. Turku, E. Laaksonen, and T. Mikkonen, "Who needs MLOps: What data scientists seek to accomplish and how can MLOps help?" 2021.

[8] C. Renggli, L. Rimanic, N. M. Gürel, B. Karlaš, W. Wu, and C. Zhang, "A data quality-driven view of MLOps," 2 2021. [Online]. Available: https://arxiv.org/abs/2102.07750v1

[9] P. Ruf, M. Madan, C. Reich, and D. Ould-Abdeslam, "Demystifying MLOps and presenting a recipe for the selection of open-source tools," Applied Sciences, vol. 11, p. 8861, 9 2021. [Online]. Available: https://www.mdpi.com/2076-3417/11/19/8861

[10] J. Klaise, A. V. Looveren, C. Cox, G. Vacanti, and A. Coca, "Monitoring and explainability of models in production," 7 2020. [Online]. Available: https://arxiv.org/abs/2007.06299v1

[11] D. A. Tamburri, "Sustainable MLOps: Trends and challenges," Proceedings - 2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020, pp. 17–23, 9 2020.

[12] S. Alla and S. K. Adari, "What is MLOps?" in Beginning MLOps with MLFlow. Springer, 2021, pp. 79–124.

[13] S. Sharma, The DevOps Adoption Playbook: A Guide to Adopting DevOps in a Multi-Speed IT Enterprise. IBM Press, pp. 34–58.

[14] B. Fitzgerald and K.-J. Stol, "Continuous software engineering: A roadmap and agenda," Journal of Systems and Software, vol. 123, pp. 176–189, 1 2017.

[15] N. Gift and A. Deza, Practical MLOps: Operationalizing Machine Learning Models. O'Reilly Media, Inc., 2020.

[16] E. Raj, MLOps Using Azure Machine Learning: Rapidly Test, Build, and Manage Production-Ready Machine Learning Life Cycles at Scale. Packt Publishing Limited, pp. 45–62, 2021.

[17] I. Karamitsos, S. Albarhami, and C. Apostolopoulos, "Applying DevOps practices of continuous automation for machine learning," Information, vol. 11, p. 363, 7 2020. [Online]. Available: https://www.mdpi.com/2078-2489/11/7/363

[18] B. Fitzgerald and K.-J. Stol, "Continuous software engineering and beyond: Trends and challenges," Proceedings of the 1st International Workshop on Rapid Continuous Software Engineering - RCoSE 2014, vol. 14, 2014. [Online]. Available: http://dx.doi.org/10.1145/2593812.2593813

[19] M. M. John, H. H. Olsson, and J. Bosch, "Towards MLOps: A framework and maturity model," 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 1–8, 9 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9582569/

[20] M. Treveil and the Dataiku team, Introducing MLOps: How to Scale Machine Learning in the Enterprise, p. 185, 2020. [Online]. Available: https://www.oreilly.com/library/view/introducing-mlops/9781492083283/

[21] D. Sato, "ThoughtWorksInc/cd4ml-workshop: Repository with sample code and instructions for 'Continuous Intelligence' and 'Continuous Delivery for Machine Learning: CD4ML' workshops." [Online]. Available: https://github.com/ThoughtWorksInc/cd4ml-workshop

[22] T. Granlund, A. Kopponen, V. Stirbu, L. Myllyaho, and T. Mikkonen, "MLOps challenges in multi-organization setup: Experiences from two real-world cases." [Online]. Available: https://oraviz.io/

[23] D. Sato, A. Wider, and C. Windheuser, "Continuous delivery for machine learning." [Online]. Available: https://martinfowler.com/articles/cd4ml.html

[24] Google, "MLOps: Continuous delivery and automation pipelines in machine learning - Google Cloud." [Online]. Available: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

[25] Microsoft, "Machine learning operations maturity model - Azure Architecture Center - Microsoft Docs." [Online]. Available: https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model

[26] A. Felipe and V. Maya, "The state of MLOps," 2016.

[27] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos, "Machine learning on big data: Opportunities and challenges," Neurocomputing, vol. 237, pp. 350–361, 5 2017.

[28] T. G. Dietterich, "Machine learning for sequential data: A review," Lecture Notes in Computer Science, vol. 2396, pp. 15–30, 2002. [Online]. Available: https://link.springer.com/chapter/10.1007/3-540-70659-3_2

[29] T. Fredriksson, D. I. Mattos, J. Bosch, and H. H. Olsson, "Data labeling: An empirical investigation into industrial challenges and mitigation strategies," Lecture Notes in Computer Science, vol. 12562 LNCS, pp. 202–216, 11 2020. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-64148-1_13

[30] M. Armbrust, T. Das, L. Sun, B. Yavuz, S. Zhu, M. Murthy, J. Torres, H. van Hovell, A. Ionescu, A. Łuszczak, M. Świtakowski, M. Szafrański, X. Li, T. Ueshin, M. Mokhtar, P. Boncz, A. Ghodsi, S. Paranjpye, P. Senster, R. Xin, and M. Zaharia, "Delta Lake," Proceedings of the VLDB Endowment, vol. 13, pp. 3411–3424, 8 2020. [Online]. Available: https://dl.acm.org/doi/abs/10.14778/3415478.3415560

[31] S. Khalid, T. Khalil, and S. Nasreen, "A survey of feature selection and feature extraction techniques in machine learning," Proceedings of 2014 Science and Information Conference, SAI 2014, pp. 372–378, 10 2014.

[32] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, "Collaborative hyperparameter tuning," vol. 28, pp. 199–207, 5 2013. [Online]. Available: https://proceedings.mlr.press/v28/bardenet13.html

[33] L. Savu, "Cloud computing: Deployment models, delivery models, risks and research challenges," 2011 International Conference on Computer and Management, CAMAN 2011, 2011.

[34] J. de la Rúa Martínez, "Scalable architecture for automating machine learning model monitoring," 2020. [Online]. Available: http://oatd.org/oatd/record?record=oai

[35] J. Bosch and H. H. Olsson, "Digital for real: A multicase study on the digital transformation of companies in the embedded systems domain," Journal of Software: Evolution and Process, vol. 33, p. e2333, 5 2021. [Online]. Available: https://onlinelibrary.wiley.com/doi/full/10.1002/smr.2333

[36] "TensorFlow Extended (TFX) - ML production pipelines." [Online]. Available: https://www.tensorflow.org/tfx

[37] "Meet Michelangelo: Uber's machine learning platform." [Online]. Available: https://eng.uber.com/michelangelo-machine-learning-platform/

[38] "Bighead: Airbnb's end-to-end machine learning platform - Databricks." [Online]. Available: https://databricks.com/session/bighead-airbnbs-end-to-end-machine-learning-platform

[39] "Metaflow." [Online]. Available: https://metaflow.org/

[40] S. Alla and S. K. Adari, Beginning MLOps with MLFlow: Deploy Models in AWS SageMaker, Google Cloud, and Microsoft Azure, 2021. [Online]. Available: https://doi.org/10.1007/978-1-4842-6549-9

[41] S. K. Karmaker, M. M. Hassan, S. Ginn, M. J. Smith, L. Xu, C. Zhai, and K. Veeramachaneni, "AutoML to date and beyond: Challenges and opportunities," ACM Computing Surveys (CSUR), vol. 54, p. 175, 10 2021. [Online]. Available: https://dl.acm.org/doi/abs/10.1145/3470918

[42] P. Gijsbers, E. LeDell, J. Thomas, S. Poirier, B. Bischl, and J. Vanschoren, "An open source AutoML benchmark," 7 2019. [Online]. Available: https://arxiv.org/abs/1907.00909v1

[43] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, "Auto-sklearn 2.0: Hands-free AutoML via meta-learning," 7 2020. [Online]. Available: http://arxiv.org/abs/2007.04074

[44] "AutoKeras." [Online]. Available: https://autokeras.com/

[45] "TPOT." [Online]. Available: http://epistasislab.github.io/tpot/

[46] L. Zimmer, M. Lindauer, and F. Hutter, "Auto-PyTorch Tabular: Multi-fidelity metalearning for efficient and robust AutoDL," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, pp. 3079–3090, 6 2020. [Online]. Available: https://arxiv.org/abs/2006.13799v3

[47] BigML, "BigML.com." [Online]. Available: https://bigml.com/

[48] Google, "Cloud AutoML: Custom machine learning models - Google Cloud." [Online]. Available: https://cloud.google.com/automl

[49] Akkio, "Modern business runs on AI - no-code AI - Akkio." [Online]. Available: https://www.akkio.com/

[50] H2O, "H2O.ai - AI cloud platform." [Online]. Available: https://www.h2o.ai/

[51] Microsoft, "What is automated ML? - AutoML - Azure Machine Learning." [Online]. Available: https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml

[52] Amazon, "Amazon SageMaker Autopilot - Amazon SageMaker." [Online]. Available: https://aws.amazon.com/sagemaker/autopilot/

[53] M. Feurer and F. Hutter, "Practical automated machine learning for the AutoML challenge 2018," ICML 2018 AutoML Workshop, 2018.

[54] G. Fursin, "The Collective Knowledge project: Making ML models more portable and reproducible with open APIs, reusable best practices and MLOps," 6 2020. [Online]. Available: https://arxiv.org/abs/2006.07161v2

[55] S. Schelter, F. Biessmann, T. Januschowski, D. Salinas, S. Seufert, and G. Szarvas, "On challenges in machine learning model management," 2018.

[56] A. Banerjee, C.-C. Chen, C.-C. Hung, X. Huang, Y. Wang, and R. Chevesaran, "Challenges and experiences with MLOps for performance diagnostics in hybrid-cloud enterprise software deployments," 2020. [Online]. Available: https://www.vmware.com/solutions/trustvm-

[57] K. D. Apostolidis and G. A. Papakostas, "A survey on adversarial deep learning robustness in medical image analysis," Electronics, vol. 10, p. 2132, 2021.

[58] G. P. Avramidis, M. P. Avramidou, and G. A. Papakostas, "Rheumatoid arthritis diagnosis: Deep learning vs. humane," Applied Sciences, vol. 12, p. 10, 2022.
