MLOps
Other Challenges
In addition to these two primary challenges, there are
many other smaller inefficiencies that prevent businesses
from being able to scale AI projects (and for which, as we’ll
see later in this report, MLOps provides solutions). For
example, the idea of reproducibility: when companies do not operate with clear, reproducible workflows, it's very common for people in different parts of the company to unknowingly build exactly the same solution.
From a business perspective, getting to the 10th or 20th AI
project or use case usually still has a positive impact on the
balance sheet, but eventually, the marginal value of the
next use case is lower than the marginal costs (see Figures
1-1 and 1-2).
Figure 1-1. Cumulative revenues, costs, and profits over time (number
of use cases). Note that after use case 8, profit is decreasing due to
increased costs and stagnation of revenue.
Figure 1-2. Marginal revenue, cost, and profit over time (number of use
cases).
One might see these figures and conclude that the most
profitable way to approach AI projects is to only address
the top 5 to 10 most valuable use cases and stop. But this
does not take into account the continued cost of AI project
maintenance.
Adding ongoing maintenance costs on top of these marginal costs pushes the value of each additional use case negative even sooner, and eventually the balance sheet with it. Under those conditions it is not economically viable to keep scaling use cases, and it's a big mistake to think that the business will be able to easily generalize Enterprise AI everywhere simply by taking on more AI projects throughout the company.
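To make the arithmetic concrete, here is a minimal sketch in Python with entirely made-up numbers. It assumes that use cases are tackled in order of value (so marginal revenue declines) and that every model already in production keeps generating maintenance costs, which is roughly the dynamic behind Figures 1-1 and 1-2.

```python
# Entirely hypothetical numbers, chosen only to illustrate the shape of the
# curves in Figures 1-1 and 1-2.

def marginal_revenue(n: int) -> float:
    """Revenue added by the n-th use case (declining, because the most
    valuable use cases are tackled first)."""
    return 100 * 0.85 ** (n - 1)

BUILD_COST = 20   # rough one-time cost to develop and deploy a use case
MAINTENANCE = 2   # recurring cost per deployed model, per period

cumulative_profit = 0.0
for n in range(1, 16):
    portfolio_maintenance = MAINTENANCE * n  # all n deployed models need upkeep
    marginal_profit = marginal_revenue(n) - BUILD_COST - portfolio_maintenance
    cumulative_profit += marginal_profit
    print(f"use case {n:2d}: marginal profit {marginal_profit:6.1f}, "
          f"cumulative profit {cumulative_profit:7.1f}")
```

With these particular numbers, the marginal profit of a new use case turns negative after roughly the eighth one, mirroring the inflection point noted in Figure 1-1; only lowering the build or maintenance cost per use case moves that point further out.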
Ultimately, to continue seeing return on investment (ROI) from AI projects at scale while taking on ever more use cases, companies must find ways to decrease both the marginal costs and the incremental maintenance costs of Enterprise AI. Robust MLOps practices, again, are one part of the solution.
On top of the challenges of scaling, a lack of transparency and of workflow reusability generally goes hand in hand with poor data governance practices. Imagine if no one understands or has clear access to work by other members of the data team: in case of an audit, figuring out how data has been treated and transformed, as well as what data is being used for which models, becomes nearly impossible. As data team members leave and new ones are hired, this only gets more complicated.
For those on the business side, taking a deeper look into
the AI project life cycle and understanding how—and why—
it works is the starting point to addressing many of these
challenges. It helps bridge the gap between the needs and goals of the business and those of the technical side, to the benefit of the entire organization's Enterprise AI efforts.
Data scientists
Though most see data scientists’ role in the ML model
life cycle as strictly the model-building portion, it is
actually—or at least, it should be—much wider. From the
very beginning, data scientists need to be involved with
subject matter experts, understanding and helping to
frame business problems in such a way that they can
build a viable ML solution.
Architects
AI projects require resources, and architects help
properly allocate those resources to ensure optimal
performance of ML models. Without the architect role,
AI projects might not perform as expected once they are in production.
Introduce transparency
MLOps is a critical part of transparent strategies for
ML. Upper management, the C-suite, and data scientists
should all be able to understand what ML models are
being used by the business and what effect they’re
having. Beyond that, they should arguably be able to
drill down to understand the whole data pipeline behind
those ML models. MLOps, as described in this report,
can provide this level of transparency and
accountability.
Build Responsible AI
The reality is that introducing automation via ML models shifts the fundamental onus of accountability from the bottom of the hierarchy to the top. That is, decisions that were perhaps previously made by individual contributors operating within a set of guidelines (for example, what the price of a given product should be or whether or not a person should be accepted for a loan) are now being made by a machine.
Given the potential risks of AI projects as well as their
particular challenges, it’s easy to see the interplay
between MLOps and Responsible AI: teams must have
good MLOps principles to practice Responsible AI, and
Responsible AI necessitates MLOps strategies.
Scale
MLOps is important not only because it helps mitigate
the risk, but also because it is an essential component to
scaling ML efforts (and in turn benefiting from the
corresponding economies of scale). To go from the
business using one or a handful of models to tens,
hundreds, or thousands of models that positively impact
the business requires MLOps discipline.
Figure 1-5. The exponential growth of MLOps. This represents only the
growth of MLOps, not the parallel growth of the term ModelOps (subtle
differences explained in the sidebar MLOps versus ModelOps versus
AIOps).
Closing Thoughts
MLOps is critical—and will only continue to become more
so—to both scaling AI across an enterprise as well as
ensuring it is deployed in a way that minimizes risk. Both of
these are goals with which business leaders should be
deeply concerned.
While certain parts of MLOps can be quite technical, it's only by streamlining the entire AI life cycle that the business will be able to develop AI capabilities that scale its operations. That's why business leaders should not only understand the components and complexities of MLOps, but also have a seat at the table when deciding which tools and processes the organization will use to execute.
The next chapter is the first to dive into the detail of
MLOps, starting with the development of ML models
themselves. Again, the value of understanding MLOps
systems at this level of detail for business leaders is to be
able to drive efficiencies from business problems to
solutions. This is something to keep in mind throughout
Chapters 2–4.
1 This report will cover the other two components (deployment and
iteration) only at a high level. Those looking for more detail on
each component should read Introducing MLOps (O’Reilly).
Chapter 2. Developing and
Deploying Models
Data Selection
Data selection sounds simple, but can often be the most
arduous part of the journey once one delves into the details
to see what’s at stake and all the factors that influence data
reliability and accuracy. Key questions for finding data to
build ML models include (but are not limited to):
Responsible AI
A responsible use of machine learning (more commonly
referred to as Responsible AI) covers two main dimensions:
Intentionality
Ensuring that models are designed and behave in ways
aligned with their purpose. This includes assurance that
data used for AI projects comes from compliant and
unbiased sources, plus a collaborative approach to AI
projects that ensures multiple checks and balances on
potential model bias.
Intentionality also includes explainability, meaning the
results of AI systems should be explainable by humans
(ideally, not just the humans that created the system).
Accountability
Centrally controlling, managing, and auditing the
Enterprise AI effort—no shadow IT! Accountability is
about having an overall view of which teams are using
what data, how, and in which models.
It also includes the need for trust that data is reliable
and being collected in accordance with regulations as
well as a centralized understanding of which models are
used for what business processes. This is closely tied to
traceability—if something goes wrong, is it easy to find
where in the pipeline it happened?
Feature Engineering
Feature engineering is the process of taking raw data from
the selected datasets and transforming it into “features”
that better represent the underlying problem to be solved.
It includes data cleansing, which can represent the largest
part of a project in terms of time spent.
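As a purely illustrative sketch (the dataset, column names, and transformations below are hypothetical rather than drawn from any particular project), feature engineering on raw transaction data might look something like this:

```python
import numpy as np
import pandas as pd

# Hypothetical raw transaction data; the column names are invented for illustration.
raw = pd.DataFrame({
    "amount": [120.0, 15.5, 980.0, 42.0],
    "timestamp": pd.to_datetime([
        "2023-01-03 09:12", "2023-01-03 23:55",
        "2023-01-04 02:10", "2023-01-04 14:30",
    ]),
    "merchant_category": ["grocery", "travel", "electronics", "grocery"],
})

features = pd.DataFrame(index=raw.index)

# Transform a skewed numeric column so that very large values don't dominate.
features["log_amount"] = np.log1p(raw["amount"].clip(lower=0))

# Derive a feature that represents the underlying problem better than the raw
# timestamp does (for example, fraud patterns often differ at night).
features["is_night"] = raw["timestamp"].dt.hour.isin(range(0, 6)).astype(int)

# Encode a categorical column as one-hot indicator columns.
features = features.join(pd.get_dummies(raw["merchant_category"], prefix="cat"))

print(features)
```

Each derived column is a “feature” in the sense described above: a representation of the raw data that a model can learn from more directly than the original values.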
When it comes to feature creation and selection, the question of how many features are enough, and when to stop, comes up regularly. Adding more features may produce a more accurate or fairer model. However, it also comes with
downsides, all of which can have a significant impact on
MLOps strategies down the line:
Model Deployment
For people on the business side rather than the technical side, it
can be difficult to understand exactly what it means to
deploy an ML model or AI project, and more importantly,
why it matters. It’s probably not necessary for most readers
to understand the “how” of model deployment in detail—it
is quite a complex process that is mostly handled by data
engineers, software engineers, and/or DevOps.1
However, it is valuable to know the basics in order to be a
more productive participant in the AI project life cycle, and
more broadly, in MLOps processes. For example, when
detailing business requirements, it’s helpful to have a
baseline understanding of the types of model deployment in
order to have a richer discussion about what makes sense
for the use case at hand.
Deploying an ML model simply means integrating it into an
existing production environment. For example, a team
might have spent several months building a model to detect
fraudulent transactions. However, after that model is
developed, it needs to actually be deployed. In this case,
that means integrating it into existing processes so that it
can actually start scoring transactions and returning the
results.
There are two types of model deployment:
Model as a service, or live-scoring model
Requests are handled in real time. For the fraudulent
transaction example, this would mean as each
transaction happens, it is scored. This method is best
reserved for cases (like fraud) where predictions need to
happen right away.
Embedded model
Here the model is packaged into an application, which is
then published. A common example is an application
that provides batch scoring of requests. This type of
deployment is good if the model is used on a consistent
basis, but the business doesn’t necessarily require the
predictions in real time.
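To make the distinction more tangible, here is a hedged sketch of both patterns in Python (using Flask purely as an example web framework). The scoring logic, endpoint path, and file names are hypothetical placeholders; a real deployment involves considerably more engineering around packaging, security, and monitoring.

```python
import pandas as pd
from flask import Flask, request, jsonify

def score(transaction: dict) -> float:
    """Placeholder for a trained fraud model returning a fraud probability.
    In reality this would be a model produced by the training pipeline."""
    return 0.9 if transaction.get("amount", 0) > 1000 else 0.1

# Pattern 1: model as a service (live scoring).
# Each transaction is scored in real time as it arrives via an HTTP request.
app = Flask(__name__)

@app.route("/score", methods=["POST"])
def score_endpoint():
    transaction = request.get_json()
    return jsonify({"fraud_probability": score(transaction)})

# Pattern 2: embedded / batch scoring.
# The same model packaged into a job that scores a whole file on a schedule.
def score_batch(input_csv: str, output_csv: str) -> None:
    transactions = pd.read_csv(input_csv)
    transactions["fraud_probability"] = [
        score(row) for row in transactions.to_dict("records")
    ]
    transactions.to_csv(output_csv, index=False)

if __name__ == "__main__":
    app.run(port=8080)  # serve live scoring; or call score_batch(...) on a schedule
```

The point is simply that the same trained model can be exposed as a live service or packaged into a batch job; which one makes sense depends on whether the use case truly needs predictions in real time.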
Closing Thoughts
Though traditionally people on the business side, and
especially business leaders, aren’t the ones developing or
deploying ML models, they have a vested interest in
ensuring that they understand the processes and establish
firm MLOps guidelines to steer them.
In these stages, carelessness (and it's worth noting that blunders are usually accidental rather than malicious) can put the organization at risk: a poorly developed model can, at best, seriously affect revenue, customer service, or other processes; at worst, it can open the floodgates to a PR disaster.
For IT or DevOps
The concerns of the DevOps team are very familiar and
include questions like:
Ground truth
The ground truth is the correct answer to the question that
the model was asked to solve, for example, “Is this credit
card transaction actually fraudulent?” In knowing the
ground truth for all predictions a model has made, one can
judge how well that model is performing.
Sometimes ground truth is obtained rapidly after a
prediction, for example, in models deciding which
advertisements to display to a user on a web page. The
user is likely to click on the advertisements within seconds,
or not at all.
However, in many use cases, obtaining the ground truth is
much slower. If a model predicts that a transaction is
fraudulent, how can this be confirmed? In some cases,
verification may only take a few minutes, such as a phone
call placed to the cardholder. But what about the
transactions the model thought were OK but actually
weren’t? The best hope is that they will be reported by the
cardholder when they review their monthly transactions,
but this could happen up to a month after the event (or not
at all).
In the fraud example, ground truth isn’t going to enable
data science teams to monitor performance accurately on a
daily basis. If the situation requires rapid feedback, then
input drift may be a better approach.
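As a minimal sketch of what judging performance against ground truth can look like once delayed labels do arrive (the field names here are hypothetical), logged predictions are joined to whatever labels have come in, and only that subset is evaluated:

```python
import pandas as pd

# Predictions logged at scoring time (hypothetical fields).
predictions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5],
    "predicted_fraud": [1, 0, 0, 1, 0],
})

# Ground truth that trickles in later, e.g., from cardholder disputes.
# Note that some transactions (here, id 3) may never receive a label at all.
ground_truth = pd.DataFrame({
    "transaction_id": [1, 2, 4, 5],
    "actually_fraud": [1, 0, 0, 1],
})

# Join the two and evaluate only where ground truth is known.
evaluated = predictions.merge(ground_truth, on="transaction_id", how="inner")
accuracy = (evaluated["predicted_fraud"] == evaluated["actually_fraud"]).mean()
print(f"accuracy on {len(evaluated)} labeled predictions: {accuracy:.2f}")
```

The longer the labels take to arrive, the further this kind of evaluation lags behind reality, which is exactly the limitation the input drift approach below works around.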
Input drift
Input drift is based on the principle that a model is only
going to predict accurately if the data it was trained on is
an accurate reflection of the real world. So, if a comparison
of recent requests to a deployed model against the training
data shows distinct differences, then there is a strong
likelihood that the model performance is compromised.
This is the basis of input drift monitoring. The beauty of
this approach is that all the data required for this test
already exists—no need to wait for ground truth or any
other information.
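As a hedged illustration of the idea (not a description of any particular product's implementation), one common way to check for drift on a single numeric feature is a two-sample statistical test comparing the training data with recent scoring requests:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical distributions of one numeric feature (e.g., transaction amount):
# the training data versus the requests seen by the deployed model this week.
training_amounts = rng.lognormal(mean=3.0, sigma=1.0, size=5_000)
recent_amounts = rng.lognormal(mean=3.4, sigma=1.0, size=1_000)  # shifted: drift

# Kolmogorov-Smirnov test: a small p-value means the two samples are unlikely
# to come from the same distribution, i.e., the model's inputs have drifted.
statistic, p_value = ks_2samp(training_amounts, recent_amounts)
if p_value < 0.01:
    print(f"Input drift suspected (KS statistic={statistic:.3f}, p={p_value:.2g})")
else:
    print("No significant drift detected on this feature")
```

In practice, drift monitoring typically runs checks of this kind across many features on a schedule and raises an alert when enough of them shift.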
Figure 3-1. The difference between shadow testing and A/B testing
Closing Thoughts
Model monitoring and iteration is what many people
naturally think of when they hear MLOps. And while it’s
just one small part of a much larger process, it is
undoubtedly important. Many on the business side see AI
projects as something that can be built, implemented, and
will then just work. However, as seen in this section, this
often isn’t the case.
Unlike static software code, ML models—because of
constantly changing data—need to be carefully monitored
and possibly tweaked in order to achieve the expected
business results.
Chapter 4. Governance
Types of Governance
Applying good governance to MLOps is challenging. The
processes are complex, the technology is opaque, and the
dependence on data is fundamental. Governance initiatives
in MLOps broadly fall into one of two categories:
Data governance
A framework for ensuring appropriate use and
management of data.
Process governance
The use of well-defined processes to ensure that all
governance considerations have been addressed at the
correct point in the life cycle of the model, and that a
full and accurate record has been kept.
Data Governance
Data governance, which concerns itself with the data being
used—especially for model training—addresses questions
like:
Process Governance
The second type of governance is process governance,
which focuses on formalizing the steps in the MLOps
process and associating actions with those steps.
Today, process governance is most commonly found in
organizations with a traditionally heavy burden of
regulation and compliance, such as finance. Outside of
these organizations, it is rare. With ML creeping into all
spheres of commercial activity, and with rising concern
about Responsible AI, we will need new and innovative
solutions that can work for all businesses.
Those responsible for MLOps must manage the inherent
tension between different user profiles, striking a balance
between getting the job done efficiently, and protecting
against all possible threats. This balance can be found by
assessing the specific risk of each project and matching the
governance process to that risk level. There are several
dimensions to consider when assessing risk, including:
Tactical
Implement and enforce the vision
Operational
Execute on a daily basis
Closing Thoughts
It is hard to separate MLOps from its governance. It is not
possible to successfully manage the model life cycle,
mitigate the risks, and deliver value at scale without
governance. Governance impacts everything from how the business can acceptably exploit ML and which data and algorithms can be used, to the style of operationalization, monitoring, and retraining.
MLOps at scale is in its infancy. Few businesses are doing
it, and even fewer are doing it well—meaning it’s an
opportunity for businesses to set themselves apart and get
ahead in the race to AI. When planning to scale MLOps,
start with governance and use it to drive the process. Don’t
bolt it on at the end. Think through the policies; think about
using tooling to give a centralized view; engage across the
organization. It will take time and iteration, but ultimately
the business will be able to look back and be proud that it
took its responsibilities seriously.
People
As touched on in Chapter 1, the AI project life cycle must
involve different types of profiles with a wide range of skills
in order to be successful, and each of those people has a
role to play in MLOps. But the involvement of various
stakeholders isn’t about passing the project from team to
team at each step—collaboration between people is critical.
For example, subject matter experts usually come to the
table—or at least, they should come to the table—with
clearly defined goals, business questions, and/or key
performance indicators (KPIs) that they want to achieve or
address. In some cases, they might be extremely well
defined (e.g., “In order to hit our numbers for the quarter,
we need to reduce customer churn by 10%,” or “We’re
losing n dollars per quarter due to unscheduled
maintenance, how can we better predict downtime?”). In
other cases, less so (e.g., “Our service staff needs to better
understand our customers to upsell them” or “How can we
get people to buy more widgets?”).
In organizations with healthy processes, starting the ML model life cycle with a more-defined business question isn't necessarily an imperative, or even the ideal scenario.
Working with a less-defined business goal can be a good
opportunity for subject matter experts to work directly with
data scientists up front to better frame the problem and
brainstorm possible solutions before even beginning any
data exploration or model experimentation.
Subject matter experts have a role to play not only at the
beginning of the AI project life cycle, but the end
(postproduction) as well. Oftentimes, to understand if an
ML model is performing well or as expected, data scientists
need subject matter experts to close the feedback loop—
traditional metrics (accuracy, precision, recall, etc.) are not
enough.
For example, data scientists could build a simple churn prediction model with very high accuracy in production, and yet marketing still fails to prevent anyone from churning. From a business perspective, that means the model didn't work, and that's
important information that needs to make its way back to
those building the ML model so that they can find another
possible solution—e.g., introducing uplift modeling that
helps marketing better target potential churners who might
be receptive to marketing messaging.
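For readers curious what uplift modeling actually involves, here is a deliberately simplified, hypothetical sketch of one common approach (often called the two-model method) on synthetic data; real uplift projects are considerably more involved:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic historical data: customer features, whether they received a
# retention offer (treatment), and whether they churned anyway (outcome).
n = 5_000
X = rng.normal(size=(n, 3))
treated = rng.integers(0, 2, size=n)
# Made-up relationship: the offer mainly helps customers with a high first feature.
churn_logit = 0.5 * X[:, 0] - 0.8 * treated * (X[:, 0] > 0)
churned = rng.random(n) < 1 / (1 + np.exp(-churn_logit))

# Two-model approach: fit one churn model on treated customers, one on the rest.
model_treated = LogisticRegression().fit(X[treated == 1], churned[treated == 1])
model_control = LogisticRegression().fit(X[treated == 0], churned[treated == 0])

# Uplift = how much the offer is expected to reduce each customer's churn risk.
new_customers = rng.normal(size=(10, 3))
uplift = (model_control.predict_proba(new_customers)[:, 1]
          - model_treated.predict_proba(new_customers)[:, 1])
print("Estimated churn reduction if targeted:", np.round(uplift, 3))
```

Customers with the largest estimated reduction are the ones marketing would target, rather than simply those most likely to churn.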
To get started on a strong foundation with MLOps, it might
be worth looking at the steps AI projects must take at your
organization and who needs to be involved. This can be a
good starting point to making sure the right stakeholders
not only have a seat at the table, but that they can
effectively work with each other to develop, monitor, and
govern models that will not put the business at risk. For
example, are these people even using the same tools and
speaking the same language? (More on this in
“Technology”.)
Processes
MLOps is essentially an underlying system of processes—
essential tasks for not only efficiently scaling data science
and ML at the enterprise level, but also doing it in a way
that doesn’t put the business at risk. Teams that attempt to
deploy data science without proper MLOps practices in
place will face issues with model quality, continuity, or
worse—they will introduce models that have a real,
negative impact on the business (e.g., a model that makes
biased predictions that reflect poorly on the company).
MLOps is also, at a higher level, a critical part of
transparent strategies for machine learning. Upper
management and the C-suite, as well as data scientists, should be able to understand what ML models are deployed in production and what effect they're having on the business.
Beyond that, they should arguably be able to drill down to
understand the whole data pipeline behind those models.
MLOps, as described in this report, can provide this level of
transparency and accountability.
That being said, getting started involves formalizing the
steps in the MLOps process and associating actions with
those steps. Typically, these actions are reviews, sign-offs, and
the capture of supporting materials such as documentation.
The aim is twofold:
Technology
Unfortunately (but unsurprisingly), there is no magic bullet: no single MLOps tool can make all processes work perfectly. That being said, technology can help ensure that people work together (the importance of which was described in “People”) as well as guide processes.
Fortunately, more and more data science and ML platforms
allow for one system that checks all of these boxes and
more, including making other parts of the AI project life
cycle easier, like automating workflows and preserving
processing operations for repeatability. Some also allow for version control and experimental branch spin-offs to test out theories, which can then be merged, discarded, or kept.
The bottom line when it comes to getting started and
technology is that it’s important not to further fragment the
AI project life cycle with a slew of different tools that
further complicate the process, requiring additional work
to cobble together different technologies. MLOps is one
unified process, so tooling should unite all different people
and parts of processes into one place.
Closing Thoughts
In order for AI to become truly scalable and enact holistic
organizational change, enterprises must achieve alignment
across people, processes, and technology, as described
specifically in this section, but also throughout the entire
report. This task is far from a turnkey undertaking.
While this alignment is critical, building robust MLOps
practices doesn’t happen overnight, and it requires a
significant time investment from everyone within an
organization. Change management is an often overlooked,
but critical—and admittedly challenging—part of pivoting
an organization’s strategy around data. That is one area of
AI transformation, and of MLOps, where the business can
bring particular value and strengths that technical teams
might not be able to lead on their own. This fact further
underscores the need for business and technology experts
to work together toward common goals, of which MLOps is
just the beginning.
About the Authors
Mark Treveil has designed products in fields as diverse as
telecommunications, banking, and online trading. His own
startup led a revolution in governance in the UK local
government, where it still dominates. He is now part of the
Dataiku Product Team based in Paris.
Lynn Heidmann received her BA in journalism/mass
communications and anthropology from the University of
Wisconsin–Madison in 2008 and decided to bring her
passion for research and writing into the world of tech. She
spent seven years in the San Francisco Bay Area writing
and running operations with Google and subsequently
Niantic before moving to Paris to head content initiatives at
Dataiku. In her current role, Lynn follows and writes about
technological trends and developments in the world of data
and AI.