Chapter 2
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables computers to learn from data and
make decisions or predictions without being explicitly programmed. It involves developing algorithms that allow a
system to improve its performance on a task over time by recognizing patterns in data.
The key concepts behind ML are:
1. Data: ML algorithms learn from data, which can be in the form of numbers, text, images, or other formats.
2. Training: In ML, a model is trained using a dataset. The model adjusts its internal parameters to minimize
errors in predictions.
3. Models: The trained model is used to make predictions or decisions on new, unseen data.
4. Learning: ML systems learn from experience (data) and improve their accuracy without being
reprogrammed.
ML approaches are commonly grouped into three types:
1. Supervised Learning: The model is trained on labeled data, where both inputs and outputs are provided.
The goal is to learn a mapping from input to output (e.g., predicting house prices based on features like area
and location).
2. Unsupervised Learning: The model is given unlabeled data and tries to find hidden patterns or structures
(e.g., grouping customers based on purchasing behavior without predefined categories).
3. Reinforcement Learning: The model learns by interacting with an environment and receiving feedback in
the form of rewards or penalties, aiming to maximize long-term rewards.
2.2 ML Models
There are many commonly used models in ML, and we will abstain from giving an overview of all of them here. In
addition to common models, many model variations, novel architectures, and optimization strategies are published
on a weekly basis. In May 2019 alone, more than 13,000 papers were submitted to arXiv, a popular electronic archive of research where new models are frequently published. It is useful, however, to share an
overview of different categories of models and how they can be applied to different problems. To this end, I propose
here a simple taxonomy of models based on how they approach a problem. You can use it as a guide for selecting an
approach to tackle a particular ML problem. Because models and data are closely coupled in ML, you will notice
some overlap between this section and “Data types”. ML algorithms can be categorized based on whether they
require labels.
Here, a label refers to the presence in the data of an ideal output that a model should produce for a given example.
Supervised algorithms leverage datasets that contain labels for inputs, and they aim to learn a mapping from inputs
to labels. Unsupervised algorithms, on the other hand, do not require labels. Finally, weakly supervised algorithms
leverage labels that aren’t exactly the desired output but that resemble it in some way. Many product goals can be
tackled by both supervised and unsupervised algorithms.
Machine learning (ML) models can be broadly classified by the kind of learning they perform and the type of data they work with. The three main types are Supervised Learning, Unsupervised Learning, and Reinforcement Learning; semi-supervised and self-supervised learning, covered later in this section, combine or extend these paradigms. Each type can be further broken down into specific models or algorithms.
In supervised learning, the algorithm is trained on labeled data, meaning the input data is paired with the correct
output. The model learns a mapping from inputs to outputs and is then able to make predictions on unseen data.
Linear Regression: Used for predicting continuous values, such as predicting house prices based on
features like area, number of rooms, etc.
Logistic Regression: Used for classification tasks, especially for binary outcomes (e.g., spam vs. not
spam).
Support Vector Machines (SVM): A powerful classifier that works by finding a hyperplane that best
separates the data into classes.
Decision Trees: A tree-like model of decisions that splits data based on feature values. It’s interpretable but
can be prone to overfitting.
Random Forest: An ensemble method that uses multiple decision trees to improve accuracy and reduce
overfitting.
K-Nearest Neighbors (KNN): A classification algorithm that assigns a class to a data point based on the
majority class of its nearest neighbors.
Naive Bayes: A probabilistic classifier based on Bayes' theorem, suitable for text classification and other
probabilistic tasks.
Neural Networks: A network of nodes (neurons) inspired by biological neural networks, used for both
classification and regression, especially in deep learning.
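As a concrete illustration, the following minimal sketch (assuming Python with scikit-learn installed, and using the library's built-in breast-cancer dataset purely as a stand-in for real labeled data) trains two of the supervised models listed above and compares their accuracy on held-out data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Labeled data: X holds the input features, y holds the known class labels.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for model in (LogisticRegression(max_iter=5000), RandomForestClassifier(n_estimators=100)):
    model.fit(X_train, y_train)                          # learn a mapping from inputs to labels
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, round(accuracy, 3))      # evaluate on unseen data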
Unsupervised learning involves training a model on data that has no labels or explicit outputs. The goal is often to
find hidden patterns or groupings in the data.
K-Means Clustering: A clustering algorithm that partitions data into k distinct clusters based on similarity.
Hierarchical Clustering: Builds a hierarchy of clusters, represented as a tree-like structure (dendrogram).
Principal Component Analysis (PCA): A dimensionality reduction technique used to reduce the number
of features while retaining the essential information.
Gaussian Mixture Model (GMM): A probabilistic model that assumes all data points are generated from a
mixture of several Gaussian distributions.
Autoencoders: Neural networks designed for unsupervised learning, typically used for dimensionality
reduction or anomaly detection.
t-Distributed Stochastic Neighbor Embedding (t-SNE): A technique used to visualize high-dimensional
data in 2D or 3D space.
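A minimal sketch of two of these techniques (again assuming scikit-learn, with synthetic data standing in for real unlabeled data): K-Means groups the points into clusters, and PCA projects them down to two dimensions for visualization.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Unlabeled data: only X is available, there is no y.
X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=0)

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)  # cluster assignments
X_2d = PCA(n_components=2).fit_transform(X)                                # 10 features reduced to 2 components
print(clusters[:10], X_2d.shape)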
In reinforcement learning, an agent learns by interacting with an environment and receiving rewards or penalties, with the goal of maximizing cumulative reward. Common algorithms include:
Q-Learning: A model-free RL algorithm where the agent learns a policy by estimating the value of taking
certain actions in certain states.
Deep Q Networks (DQN): A combination of Q-Learning and deep learning, where a deep neural network
approximates the Q-values.
Policy Gradient Methods: A family of RL algorithms that optimize the policy directly, rather than the
value function.
Actor-Critic Models: These models combine value-based and policy-based methods, where the "actor"
selects actions and the "critic" evaluates them.
Proximal Policy Optimization (PPO): A more stable RL algorithm, widely used in training deep
reinforcement learning models.
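The sketch below shows tabular Q-Learning on a deliberately tiny, hypothetical "corridor" environment (five states, two actions), since the update rule is easier to see without a full RL library; it is an illustration of the idea, not a production setup.

import numpy as np

n_states, n_actions = 5, 2                 # states 0..4; action 0 = left, action 1 = right
Q = np.zeros((n_states, n_actions))        # Q-table of state-action values
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    """Hypothetical environment: reaching state 4 gives reward +1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: explore occasionally, otherwise take the best-known action
        action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)   # after training, action 1 (move right) should have the higher value in every state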
In Semi-Supervised Learning, the model is trained with a small amount of labeled data and a large amount of
unlabeled data. This approach leverages the vast amounts of unlabeled data, with the small labeled set guiding the
model’s learning.
Self-Training: The model is initially trained on the labeled data and then predicts labels for the unlabeled
data, which are added to the training set.
Co-Training: Two models are trained on different views of the data and help label the unlabeled data for
each other.
Graph-Based Methods: Use the relationships between data points (represented as a graph) to propagate
labels from labeled data to unlabeled data.
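A minimal self-training sketch (assuming scikit-learn; the synthetic data and the 95% confidence threshold are arbitrary choices for illustration): the model is fit on the few labeled examples, then repeatedly pseudo-labels the unlabeled examples it is most confident about.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy setup: 1000 examples, but only the first 50 keep their labels.
X, y_true = make_classification(n_samples=1000, n_informative=5, random_state=0)
y = y_true.copy()
y[50:] = -1                                   # -1 marks "unlabeled"

model = LogisticRegression(max_iter=1000)
for _ in range(5):                            # a few self-training rounds
    labeled = y != -1
    model.fit(X[labeled], y[labeled])                     # train on currently labeled data
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95                  # only trust confident predictions
    new_idx = np.where(~labeled)[0][confident]
    y[new_idx] = model.classes_[proba[confident].argmax(axis=1)]   # assign pseudo-labels

print("pseudo-labeled examples:", (y != -1).sum() - 50)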
Self-supervised learning is a type of unsupervised learning where the system learns by creating labels from the input
data itself. It generates pseudo-labels based on inherent structures within the data and learns to predict these labels.
Examples:
Contrastive Learning: Used in deep learning, where a model learns to distinguish between similar and
dissimilar data points by maximizing the agreement between positive pairs and minimizing the agreement
between negative pairs.
SimCLR: A self-supervised learning framework that learns representations of images by maximizing the
similarity between augmented versions of the same image.
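To make the contrastive idea concrete, here is a NumPy sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss used by SimCLR, written for readability rather than efficiency; z1 and z2 are assumed to be embeddings of two augmented views of the same batch of images, produced by some encoder network not shown here.

import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, d) embeddings of two augmented views of the same N examples."""
    z = np.concatenate([z1, z2], axis=0)                     # stack both views: (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)         # L2-normalize so dot product = cosine similarity
    sim = z @ z.T / temperature                              # (2N, 2N) pairwise similarities
    np.fill_diagonal(sim, -np.inf)                           # an example is never its own positive
    n = z1.shape[0]
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])   # index of each example's positive pair
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))      # log-softmax over all other examples
    return -log_prob[np.arange(2 * n), positives].mean()     # pull positives together, push negatives apart

# Example usage with random embeddings standing in for encoder outputs:
rng = np.random.default_rng(0)
print(nt_xent_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32))))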
2.3 Challenges in ML
1. Data Challenges
Data Cleaning and Preprocessing: Raw data is often noisy, incomplete, or inconsistent, which requires
significant cleaning, transformation, and normalization before it can be used for training. This step is
critical, as the model's performance heavily depends on the quality of the data it learns from.
Data Labeling: For supervised learning, acquiring accurately labeled data can be expensive, time-
consuming, and error-prone, especially when working with large datasets.
Imbalanced Data: In many real-world datasets, certain classes of data may be underrepresented, leading to
biased models that perform poorly on underrepresented classes.
Data Drift: Over time, the statistical properties of data can change (data drift), making previously trained
models less effective. Continuous monitoring and adaptation are needed to address this.
2. Model Development Challenges
Overfitting: A common problem where a model performs well on training data but poorly on unseen data
(test set or production data). This occurs when the model learns noise or patterns that do not generalize well
to new data.
Bias-Variance Tradeoff: Striking the right balance between a model's complexity (high variance) and its
simplicity (high bias) is often difficult. A model that is too complex may overfit, while a simpler model
may underperform.
3. Computational and Performance Challenges
Training Time: Some ML models, particularly deep learning models, can require significant
computational resources and time to train, especially on large datasets. This can make it difficult to iterate
quickly.
Inference Speed: Once a model is deployed, the inference (prediction) speed is crucial for real-time
applications. Ensuring low latency and high throughput during inference, especially in production
environments with large-scale data, can be a major challenge.
Resource Consumption: Many ML models, especially deep learning models, are resource-intensive in
terms of memory, CPU, and GPU usage. Optimizing these models to be resource-efficient is essential for
production environments.
4. Deployment Challenges
Model Versioning: Keeping track of different versions of models and ensuring the correct version is
deployed can become complex, especially when multiple models are in production simultaneously.
Deployment Pipelines: Building an automated and robust ML pipeline to handle the deployment process,
including testing, continuous integration, and monitoring, can be challenging.
Model Compatibility: Ensuring that the model works well across different platforms, environments, or
devices (e.g., on-premise servers, cloud infrastructure, edge devices) can introduce integration challenges.
5. Monitoring and Maintenance
Model Monitoring: Monitoring a model’s performance in production is essential for identifying any drop
in accuracy or problems that arise due to data drift or model degradation over time.
Detecting Concept Drift: Changes in the underlying distribution of the data (concept drift) can lead to
reduced model performance. Continuously retraining models with new data or adapting the model is
necessary to keep it relevant.
A/B Testing: When deploying a new model, it's important to perform A/B testing to compare it with the
previous version and evaluate improvements. This requires careful setup and analysis.
6. Interpretability and Explainability
Black-box Models: Many complex ML models, especially deep neural networks, function as “black
boxes” — meaning they are not easily interpretable. This is a challenge when stakeholders need to
understand how a model arrived at a particular decision, especially in regulated industries like healthcare,
finance, and law.
Accountability and Trust: As models make critical decisions, it is essential to ensure that they can be
trusted, and their outputs are understandable and explainable to end-users, especially in high-risk
applications.
7. Security and Privacy
Data Privacy: ML models can inadvertently memorize sensitive data, leading to privacy issues. Ensuring
that data used for training and predictions complies with privacy regulations like GDPR is crucial.
Adversarial Attacks: ML models are vulnerable to adversarial attacks, where small, intentionally crafted
changes to input data can cause the model to make incorrect predictions. Robustness to such attacks is
important for models in security-sensitive applications.
Model Theft and Reverse Engineering: In production, ML models can be reverse-engineered or stolen.
Securing models and their endpoints against unauthorized access or misuse is a key challenge.
8. Ethics, Fairness, and Compliance
Bias and Fairness: Ensuring that models do not perpetuate or amplify bias based on race, gender, age, etc.,
is a growing concern in ML. Bias can arise from biased data or unfair treatment of certain groups by the
model.
Regulations: In many industries (e.g., finance, healthcare), there are strict regulations about how data is
used and models are deployed. Ensuring compliance with these regulations can be challenging.
Ethical AI: Ensuring that AI systems operate in an ethical and responsible manner, without causing harm
or making discriminatory decisions, is an ongoing challenge.
9. Collaboration and Communication
Cross-functional Collaboration: Building ML models often requires collaboration between data scientists,
engineers, product managers, and domain experts. Miscommunication or lack of alignment between these
teams can hinder progress and lead to inefficiencies.
Communication of Results: Explaining technical results to non-technical stakeholders is often difficult,
yet essential for business decisions. Effective communication is required to translate model outputs into
actionable insights.
10. Cost Management
Computational Costs: Training large-scale models (e.g., deep learning models) can be expensive in terms
of computational resources, especially when using GPUs or cloud-based infrastructure. Optimizing these
costs while maintaining model performance is a key challenge.
Operational Costs: Running models in production at scale requires ongoing infrastructure and monitoring
costs. Efficiently managing these costs while ensuring high availability and low latency is critical.
2.4 ML Project Lifecycle and the Pitfalls of Focusing Only on ML Models
Define the Business Goal: This is the first and crucial step where the objective of the ML project is clearly defined.
The business problem that needs to be solved with ML is identified, and success criteria are established. It’s
important to understand what you want to achieve through the model, like improving sales, reducing costs, etc.
Collect and Prepare Data: Once the goal is defined, the next step is to gather relevant data from various sources.
This may involve extracting data from databases, APIs, or other sources. After collecting the data, it needs to be
cleaned and prepared (e.g., removing duplicates, handling missing values) so that it is ready for the model-building
process.
Build and Deploy Model: In this stage, different machine learning algorithms are applied to the data to build a
predictive model. Feature engineering, selection of the right algorithm, and training the model with the data are all
part of this step. Once the model is trained and evaluated, it is deployed into a production environment to be used by
the application.
Integrate with Application: After deploying the model, it needs to be integrated with a real-world application. This
involves using the model’s predictions in a practical system or service that can serve the end-users or business
processes. For instance, an ML model might be embedded into a recommendation engine or a customer service
chatbot.
Monitor Impact: The final stage is about continuously monitoring the model's performance and ensuring that it is
delivering the expected business outcomes. If the model’s accuracy or relevance declines over time (e.g., due to
changing data), it may need to be retrained, updated, or adjusted. Monitoring ensures the model remains useful and
effective in meeting the business goals.
Each stage is part of a cyclical process: after monitoring, the business goals might be refined, and the process begins again.
Common pitfalls of focusing only on the ML model include:
a) Using the wrong datasets, which can easily lead to inaccurate or biased results.
b) Finding out that historical features used to train the model are unavailable in the production or real-time environment.
c) Discovering there is no practical way to integrate the model's predictions into the current application.
d) Realizing the ML project costs more than the value it generates or, in a worst-case scenario, causes losses in revenue or customer satisfaction.
The machine learning process involves several key steps, starting from understanding the problem and preparing the
data, to building and deploying a model, and eventually monitoring its performance in a production environment.
This process is iterative and requires continuous refinement. Below is a detailed overview of the Machine Learning
Process:
1. Problem Definition
Before diving into the technical aspects, it’s crucial to understand the problem at hand. This involves:
Understanding Business Requirements: Determine what the goal of the project is (e.g., classification,
regression, recommendation). Understanding the business context will help in framing the problem
correctly.
Defining the Objective: This step involves specifying clear, measurable objectives, such as predicting
customer churn, classifying emails as spam or not, or forecasting sales.
Setting Evaluation Metrics: Metrics like accuracy, precision, recall, F1 score (for classification), or mean
squared error (MSE) for regression should be defined early on to measure the model's success.
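For reference, the standard definitions of these metrics are (TP, FP, FN denote true positives, false positives, and false negatives; y_i and ŷ_i the true and predicted values):

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2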
2. Data Collection
Data is the foundation of machine learning, so gathering high-quality data is critical. This step includes:
Collecting Data: Data can be collected from multiple sources like databases, APIs, web scraping, or third-
party datasets. The data could include structured data (tables), unstructured data (images, text), or semi-
structured data (XML, JSON).
Data Availability: Assess if the required data is available in sufficient quantities and whether it’s diverse
enough to train a robust model.
Understanding the Data: Data exploration helps in understanding its size, type, structure, and any
potential issues like missing values or outliers.
3. Data Preparation and Feature Engineering
Once the data is collected, it needs to be cleaned and transformed into a usable format. This stage involves:
Handling Missing Data: Missing values can be handled by techniques like imputation, removing rows
with missing values, or filling them with default values (e.g., mean, median).
Data Transformation: Data needs to be transformed into a format suitable for machine learning
algorithms. This may involve normalization (scaling numerical values) or encoding categorical data.
Outlier Detection: Identifying and handling outliers (extreme values) to prevent them from negatively
impacting the model.
Feature Engineering: Creating new features (derived variables) from existing ones can improve the
model’s performance. This might involve combining or transforming raw features, for example, generating
a “day of the week” feature from a timestamp.
Feature Selection: Selecting the most relevant features to reduce complexity and improve model
performance. Techniques like correlation analysis, decision trees, or L1 regularization can be used.
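The steps above can be sketched in a few lines of pandas and scikit-learn (the column names and values here are hypothetical, chosen only to show imputation, a derived day-of-week feature, one-hot encoding, and scaling):

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with a timestamp, a categorical column, and a missing value.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-06", "2024-01-07"]),
    "city": ["Mumbai", "Pune", "Mumbai"],
    "amount": [120.0, None, 340.0],
})

df["amount"] = df["amount"].fillna(df["amount"].median())        # imputation of missing values
df["day_of_week"] = df["timestamp"].dt.dayofweek                 # derived feature from the timestamp
df = pd.get_dummies(df, columns=["city"])                        # one-hot encoding of categorical data
df[["amount"]] = StandardScaler().fit_transform(df[["amount"]])  # normalization of numeric values
print(df.head())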
4. Exploratory Data Analysis (EDA)
EDA is an essential step to better understand the data, its patterns, and any relationships between features. This
process typically involves:
Visualizing the Data: Using plots (e.g., histograms, scatter plots, box plots) to uncover patterns,
correlations, and anomalies in the data.
Statistical Analysis: Computing summary statistics (e.g., mean, median, standard deviation) to get a sense
of data distribution.
Identifying Patterns: Understanding the relationships between different features to decide which variables
are important for the model.
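A typical first pass at EDA, assuming recent pandas and that the data has been loaded from a hypothetical data.csv (the histogram call additionally requires matplotlib):

import pandas as pd

df = pd.read_csv("data.csv")              # hypothetical dataset
print(df.shape)                           # number of rows and columns
print(df.dtypes)                          # data type of each column
print(df.describe())                      # mean, std, quartiles of numeric columns
print(df.isna().sum())                    # missing values per column
print(df.corr(numeric_only=True))         # pairwise correlations between numeric features
df.hist(figsize=(10, 8))                  # histograms to inspect each column's distribution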
5. Model Selection
In this stage, the appropriate machine learning model or algorithm is selected based on the problem type
(classification, regression, clustering, etc.) and the nature of the data. The models might include:
Supervised Learning: If you have labeled data, algorithms like Logistic Regression, Decision Trees,
Random Forests, Support Vector Machines, or Neural Networks might be appropriate.
Unsupervised Learning: For unlabeled data, clustering techniques like K-Means, DBSCAN, or
dimensionality reduction techniques like PCA or t-SNE might be used.
Reinforcement Learning: If the problem involves learning from an environment through trial and error
(e.g., gaming, robotics), reinforcement learning models such as Q-Learning or Deep Q Networks (DQN)
may be suitable.
6. Model Training
Once the model is selected, the next step is training it on the dataset. This involves:
Splitting the Data: Typically, the data is split into training, validation, and test sets. The model is trained
on the training set, tuned on the validation set, and evaluated on the test set.
Model Training: The training process involves feeding the data into the chosen model to learn the
underlying patterns. During training, the model adjusts its internal parameters (like weights in neural
networks) to minimize a loss function.
Hyperparameter Tuning: ML models often have hyperparameters (like learning rate, regularization
strength, or number of trees in a random forest) that need to be optimized. This can be done using
techniques like Grid Search, Random Search, or Bayesian Optimization.
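A compact sketch of these steps with scikit-learn (synthetic data; the grid of hyperparameters is arbitrary and only for illustration), where GridSearchCV's internal cross-validation plays the role of the validation set:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)   # hold out a test set

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}        # hyperparameters to tune
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)                                                 # trains and validates each combination
print(search.best_params_, search.score(X_test, y_test))                     # final check on unseen test data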
7. Model Evaluation
After training, the model’s performance is assessed using appropriate evaluation metrics based on the type of
problem. Common metrics include:
Classification Metrics: For classification tasks, metrics like accuracy, precision, recall, F1 score, and
AUC-ROC curve are used.
Regression Metrics: For regression tasks, metrics like mean squared error (MSE), mean absolute error
(MAE), R² are used.
Cross-Validation: Techniques like K-Fold Cross-Validation are used to ensure that the model performs
well on unseen data and is not overfitting.
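For example, with scikit-learn the classification metrics above and a 5-fold cross-validation can be obtained as follows (reusing a synthetic dataset purely for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))        # precision, recall, F1 per class
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc"))       # AUC-ROC across 5 folds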
8. Model Optimization and Tuning
Once a model has been evaluated, the next step is to optimize and fine-tune it. This involves:
Adjusting Hyperparameters: Fine-tuning the model’s hyperparameters using techniques like Grid
Search or Random Search to improve performance.
Regularization: Applying regularization techniques like L1, L2, or Dropout to prevent overfitting and
improve generalization.
Ensemble Methods: Combining multiple models (e.g., Random Forests, Boosting algorithms like
XGBoost or AdaBoost) to improve predictive performance.
9. Model Deployment
Once the model has been trained and optimized, it is ready to be deployed into a production environment. This
involves:
Model Serving: Setting up a system to serve the model for real-time predictions (e.g., using REST APIs,
cloud services, or on-premise servers).
Containerization: Packaging the model in containers (e.g., using Docker) for easy deployment and
scalability.
CI/CD Pipelines: Implementing Continuous Integration (CI) and Continuous Deployment (CD) pipelines
for automating model updates, monitoring, and maintenance.
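As one common pattern, a trained model can be served behind a small REST API; the sketch below assumes FastAPI, uvicorn, and joblib are installed, and that a model has already been saved to a hypothetical model.joblib file (the endpoint name and feature layout are likewise illustrative):

from typing import List

import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")          # previously trained and serialized model (hypothetical path)

class Features(BaseModel):
    values: List[float]                      # a flat list of numeric feature values

@app.post("/predict")
def predict(features: Features):
    X = np.array(features.values).reshape(1, -1)          # single example as a 2-D array
    return {"prediction": model.predict(X).tolist()}

# Launch locally (assuming this file is named app.py): uvicorn app:app --port 8000
# The same service can then be packaged into a Docker image and wired into a CI/CD pipeline.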
10. Monitoring and Maintenance
Monitoring: Track performance metrics to detect any drop in accuracy or other performance issues due to
data drift or model decay.
Model Retraining: As new data becomes available, the model might need to be retrained to adapt to
changes. This can be done periodically or triggered by performance degradation.
Logging and Alerts: Set up logging and alerting mechanisms to detect failures, errors, or performance
issues in real-time.
Retraining is typically triggered by:
Data Drift: Changes in the underlying data distribution over time that can lead to reduced model performance.
Concept Drift: Changes in the relationship between input features and target variables over time.
New Data: Incorporating new data into the model to improve its predictions.
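Data drift can be checked with simple statistical tests; the sketch below (assuming SciPy, with synthetic numbers standing in for a real feature) compares a feature's training-time distribution with recent production values using a two-sample Kolmogorov-Smirnov test:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=5000)      # feature values seen at training time
live_feature = rng.normal(0.3, 1.0, size=5000)       # recent production values (mean has shifted)

stat, p_value = ks_2samp(train_feature, live_feature)    # two-sample Kolmogorov-Smirnov test
if p_value < 0.01:
    print(f"possible data drift: KS statistic={stat:.3f}, p-value={p_value:.1e}")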
Source code control (also known as version control) is essential in managing the complexity of software
development and particularly crucial in machine learning (ML) projects. As ML models evolve over time, tracking
changes to the code, data, and models becomes increasingly important. Proper version control enables the team to
work collaboratively, maintain reproducibility, and manage complex changes.
History: Source code control, or version control, has been used in software development for decades, with
early systems like RCS. It evolved into modern distributed version control systems like Git.
Role in Version Control: In ML projects, source code control tracks changes to code, data, and model
artifacts. It ensures versioning, traceability, and the ability to roll back to previous states. This is crucial for
reproducibility.
Role in Collaboration: Source code control enables collaboration among data scientists, engineers, and
researchers. It allows multiple team members to work on code simultaneously, merge changes, and resolve
conflicts. It fosters efficient teamwork and maintains code quality.
Significance: It ensures that machine learning projects can be reliably reproduced, tested, and scaled. It also
provides a history of changes, which is valuable for debugging and auditing. Source code control is an
essential part of the MLOps lifecycle.
1. Versioning of Code, Data, and Models: ML projects involve not just code (which implements models and
algorithms) but also datasets and model parameters. Keeping track of different versions of code, data, and
trained models allows for efficient model iteration, comparison, and deployment. This is particularly
important when:
o Different versions of models need to be tested against each other.
o The team is iterating over experiments that require consistent tracking of changes.
o Reproducing past results for debugging or further development is necessary.
2. Collaboration: Source control enables multiple team members (data scientists, software engineers, product
managers) to work on the same project simultaneously without conflicts. Each team member can work on
different parts of the project, with changes being merged smoothly.
o Branching and Merging: Team members can work on different features or improvements
independently and then merge them into the main codebase.
o Collaboration Tools: Modern version control platforms (e.g., GitHub, GitLab, Bitbucket) provide
tools for reviewing and approving code changes, thus improving collaboration and transparency.
3. Reproducibility: In ML, reproducing experiments is crucial for validating models and ensuring
consistency. Version control helps ensure that the exact version of code, dependencies, and data that
produced a certain result can be retrieved. This reproducibility is particularly important in research and
regulated industries like healthcare or finance.
o By tagging specific commits or versions, teams can capture the state of the model at any point and
recreate it in the future.
4. Experiment Tracking: Machine learning projects involve frequent experimentation with different models,
hyperparameters, and datasets. Source control systems like Git can be used alongside experiment tracking
tools (such as MLflow or Weights & Biases) to track these experiments, log parameters, and store results.
This allows teams to:
o Compare different experiments.
o Rollback to earlier versions of models or code when needed.
5. Code Integrity and Quality: By using version control systems, teams can maintain code integrity through
automated checks, continuous integration, and testing. Source control tools support:
o Automated tests to check the validity of models or code changes.
o Code reviews, where team members review and suggest improvements to each other’s code
before it gets merged.
6. Model Deployment and Rollback: ML models evolve over time, and new models are deployed in
production based on experiments. Source control can help ensure that each model version is correctly
deployed and rolled back when necessary. This is especially useful when a newly deployed model leads to
performance issues or errors, as the previous stable version can be restored quickly.
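The list above mentions pairing version control with experiment tracking tools. As a minimal sketch (assuming MLflow is installed and logging to its default local store; the parameter and metric names are arbitrary), each run records the hyperparameters and results so that experiments can later be compared alongside the git history of the code that produced them:

import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

with mlflow.start_run():                                   # one tracked experiment run
    n_estimators = 200
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    cv_accuracy = cross_val_score(model, X, y, cv=5).mean()
    mlflow.log_param("n_estimators", n_estimators)         # hyperparameter used in this run
    mlflow.log_metric("cv_accuracy", float(cv_accuracy))   # result, comparable across runs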
Example: Your team is starting a new ML project and you have been asked to set up a code repository (on Azure or GitHub) for it. Perform the steps below and upload a screenshot of the commands as your answer.
4: Create three files in the root project directory: {your_name}.py, {your_surname}.py, and {your_sap_id}.py
3: Commit {your_name}.py with the message “{your name} committing this change for term test 1”
Several personas are involved in delivering ML projects, including the Data Scientist, the Data Engineer, the DevOps/IT Administrator, and the Compliance Officer/Legal, among others.
Each persona plays a vital role in the end-to-end ML platform, contributing to the success of machine learning
projects.
Model Retraining is a crucial aspect of maintaining the performance and accuracy of machine learning
models over time. As data evolves, models may become less effective due to various factors such as
changes in the underlying data distribution, new trends, or the introduction of new data. Retraining helps
ensure that models remain relevant and accurate in their predictions.
For example, COVID-19 abruptly changed human behavior across the globe. But the pandemic not only
significantly impacted human lives, it also disrupted ML models. Data engineers woke up to find that
their ML models, which were trained on pre-pandemic data sets, had suddenly drifted and were not
delivering reliable results.
The models’ performance degraded because the pre-pandemic data was not reflecting current behaviors
and therefore was no longer relevant or accurate. These models had to be retrained to ensure their validity and efficacy for the pandemic era. While COVID-19 is an extreme example, data keeps
changing because people change and the world changes. This means models trained on outdated data lose
relevance. Model retraining, also known as continuous training or continual training, is the act of training
models again and again on updated data and then redeploying them to production.
By retraining, data engineers can ensure the models are up-to-date, valid, and trustworthy. This ensures
the predictions and outputs of models are always accurate for the business use cases they were designed
to answer. If models aren’t retrained, they will become stale. Accurate models are essential for business
success. If an organization uses a model that provides inaccurate outputs, the result could be loss of
customers and profit. For example, if a fraud-detection model is inaccurate, either fraudsters get away with fraud, costing the company customers and perhaps millions in insurance claims, or there are too many false positives, frustrating end-users (who won’t be able to complete online purchases) and financially hurting the company’s customers (again, losing customers). Automating the process of model retraining makes it reliable and
optimized. Automation also reduces the chance of manual errors or data engineers forgetting to retrain
models. With automation, data engineers and data scientists can ensure their measurements are defensible
and quantitative and that explainability tests are set up.
Two related phenomena make retraining necessary:
Data drift
When the statistical distribution of production data is different from the baseline data used to train or
build the model. This happens when human behavior changes, training data was inaccurate, or there were
data quality issues.
Concept drift
When the statistical properties of the target variable change over time. In other words, the concept, or the
relations between the datasets, have drifted.
Retraining can be triggered in several ways:
Interval-based: According to a certain schedule or repeating interval; for example, retraining every
Sunday night or every end of the month. This ensures the models will always stay up-to-date since they
are constantly retrained. However, this method can be costly since resources are used even when
retraining is unnecessary.
Based on data changes: This type of retraining takes place when there are new data sets or when code
changes are made. Such retraining ensures adaptivity to engineering changes but might miss drift that
degrades the model performance.
Manually on-demand: This non-automated retraining method provides complete control for data scientists but is prone to errors and could mean retraining does not occur when needed.
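The three triggers above can be combined in a single retraining policy; the sketch below is deliberately schematic, with should_retrain and its inputs being hypothetical names rather than any particular framework's API:

import datetime

def should_retrain(last_trained, new_data_arrived, drift_detected,
                   max_age=datetime.timedelta(days=7)):
    """Decide whether to kick off retraining (hypothetical policy combining the three triggers)."""
    if datetime.datetime.now() - last_trained > max_age:   # interval-based schedule
        return True
    if new_data_arrived:                                   # new data sets or code changes
        return True
    if drift_detected:                                     # drift flagged by monitoring
        return True
    return False

# In a pipeline this check would typically run on a schedule, e.g.:
# if should_retrain(last_trained, new_data_arrived, drift_detected):
#     retrain_and_redeploy()        # hypothetical entry point into the training pipeline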
---------------------------------------------------------------------------------------------------------------------
Some Important Questions