01 Introduction-part 2
Points to be covered:
Part one:
What is AI?
When to use machine learning?
Overview of machine learning applications
What is machine learning?
Examples of how to apply machine learning
Part two:
Different types of learning.
Main challenges in Machine Learning.
Hyperparameter Tuning and Model Selection.
Outline of course content
Part two
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels), i.e., labeled data
• Unsupervised learning
– Given: training data without desired outputs, i.e., unlabeled data
• Semi-supervised learning
– Given: training data + a few desired outputs
• Self-supervised learning
– The model learns from the data itself by generating its own labels
from the input data.
• Reinforcement learning
– The model learns from rewards and penalties.
Supervised Learning
The spam filter is a good example of this: it is trained with many example
emails along with their class (spam or ham), and it must learn how to classify
new emails.
[Figure: class label, 1 (Malignant) or 0 (Benign), plotted against tumor size]
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is categorical == classification
[Figure: Breast Cancer (Malignant / Benign): class label, 1 (Malignant) or 0 (Benign), plotted against tumor size]
[Figure: Breast Cancer (Malignant / Benign): class label versus tumor size, with the tumor-size axis split into "Predict Benign" and "Predict Malignant" regions]
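The idea of learning f(x) from labeled pairs can be sketched in a few lines. This is a minimal illustration only: the training pairs are made up, and a single threshold on tumor size is just one simple choice for f(x), not a clinical method.

```python
# Hypothetical training pairs (x_i, y_i): tumor size and label
# (0 = benign, 1 = malignant). The numbers are invented for illustration.
training_data = [(1.0, 0), (1.5, 0), (2.0, 0), (3.5, 1), (4.0, 1), (5.0, 1)]

def fit_threshold(data):
    """Pick the threshold on x that misclassifies the fewest training examples."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds: midpoints between consecutive sorted sizes.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in data)

    return min(candidates, key=errors)

def predict(x, threshold):
    """f(x): 1 (malignant) if the size exceeds the threshold, else 0 (benign)."""
    return int(x > threshold)

threshold = fit_threshold(training_data)
print(threshold, predict(1.2, threshold), predict(4.5, threshold))
```

Because y takes only two values here, the learned function is a classifier.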
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute
Supervised Learning
Regression example:
Predict a target numeric value, such as the price of a car, given a set of
features (mileage, age, brand, etc.) called predictors. This sort of task is called
regression.
To train the system, you need to give it many examples of cars, including both
their predictors and their labels (i.e., their prices).
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is real-valued == regression
[Figure: September Arctic sea ice extent (1,000,000 sq km, y-axis from 0 to 9) by year, 1970 to 2020]
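A minimal sketch of regression with one feature, assuming a straight-line model fitted by ordinary least squares. The data (car age in years, price in thousands) is hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical examples: predictor (age) and label (price).
ages = [1.0, 2.0, 3.0, 4.0]
prices = [18.0, 16.0, 14.0, 12.0]

slope, intercept = fit_line(ages, prices)

def predict(age):
    """f(x): a real-valued prediction, which is what makes this regression."""
    return slope * age + intercept

print(slope, intercept, predict(5.0))
```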
Unsupervised Learning
In unsupervised learning, as you might guess, the
training data is unlabeled.
The model tries to learn without a teacher, so it must
find structure and patterns in the data on its own.
Given input
Output hidden
Clustering
If you use a hierarchical clustering algorithm, it may also subdivide each group
into smaller groups. This may help you target your posts for each group.
Another example of unsupervised learning
Unsupervised Learning
Dimensionality reduction
The goal is to simplify the data without losing too much
information. There are two approaches:
• Feature extraction: merge several correlated features into one.
For example, a car's mileage may be strongly correlated with its age,
so the dimensionality reduction algorithm will merge them into
one feature that represents the car's wear and tear.
• Feature selection: keep only the features, out of many, that are
most relevant to the problem.
Advantages of dimensionality reduction:
• The model will run faster.
• The data will take up less space and memory.
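Feature extraction can be sketched as projecting two correlated features onto their direction of largest variance (the first principal component). The sketch below assumes exactly two features on comparable scales; the (age, mileage) values and the name `wear` are hypothetical.

```python
import math

def first_principal_direction(points):
    """Return the mean and the unit direction of largest variance for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(points, mean, direction):
    """Replace each 2-D point by its coordinate along the given direction."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]

# Hypothetical cars: (age in years, mileage in units of 10,000 km).
cars = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]

mean, direction = first_principal_direction(cars)
wear = project(cars, mean, direction)  # one merged feature per car
print(wear)
```

Each car is now described by one number instead of two, at the cost of the small amount of variation that lies off the chosen direction.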
Unsupervised Learning
Anomaly detection
Anomaly detection, also known as outlier detection, is the process of
identifying unusual patterns or observations in data (known as outliers or
anomalies).
For example, detecting unusual credit card transactions to prevent fraud, or
automatically removing outliers from a dataset before feeding it to another
learning algorithm.
The system is shown mostly normal instances during training, so it learns to
recognize them and when it sees a new instance it can tell whether it looks like
a normal one or whether it is likely an anomaly.
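A minimal sketch of this train-on-normal-instances idea, assuming one numeric feature and a simple "more than k standard deviations from the mean" rule. The transaction amounts and the default k = 3 are illustrative assumptions; practical detectors are usually more elaborate.

```python
import statistics

# Hypothetical "mostly normal" transaction amounts seen during training.
normal_amounts = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]

def fit(values):
    """Learn what normal looks like: the mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomaly(x, mean, std, k=3.0):
    """Flag x if it lies more than k standard deviations from the mean."""
    return abs(x - mean) > k * std

mean, std = fit(normal_amounts)
print(is_anomaly(25.0, mean, std), is_anomaly(500.0, mean, std))
```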
Unsupervised Learning
Anomaly detection
The importance of anomaly detection across different domains:
• Finance: identifying fraudulent transactions.
• Cybersecurity: detecting intrusions.
• Healthcare: monitoring patients' vitals and identifying unusual readings that
could indicate a medical issue.
• IoT: detecting abnormal behavior in connected devices.
Unsupervised Learning
Association rule learning
In association rule learning, the goal is to dig into large amounts of data
and discover interesting relations between attributes.
For example, suppose you own a supermarket. Running an association rule
on your sales logs may reveal that people who purchase barbecue sauce and
potato chips also tend to buy steak.
Thus, you may want to place these items close to each other.
Unsupervised Learning
Association rule learning (Example)
Imagine a small dataset of transactions recorded by a supermarket.
Each transaction lists the items purchased by a customer:
• Transaction 1: Bread, Milk
• Transaction 2: Bread, Diapers, Juice, Eggs
• Transaction 3: Diapers, Juice, Cola
• Transaction 4: Bread, Milk, Diapers, Juice
• Transaction 5: Bread, Milk, Diapers, Cola
Discovering association rules:
Identifying itemsets: First, identify frequent itemsets (items that frequently appear
together), for example (Bread, Milk).
Calculating support: Divide the number of transactions that contain the itemset by the
total number of transactions. Example: Support(Bread, Milk) = 3/5 = 60%.
Generating rules: From the frequent itemsets, we generate rules. For example, one
rule could be Bread => Milk, meaning that customers who buy Bread tend to also buy
Milk. The rule is not absolute: Transaction 2 contains Bread without Milk.
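The support calculation can be checked with a short sketch over the five transactions listed in this example. The `confidence` function is an addition not covered on the slide: it is the standard measure of how often a rule holds, defined as Support(antecedent and consequent) / Support(antecedent).

```python
# The five supermarket transactions from the example.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Juice", "Eggs"},
    {"Diapers", "Juice", "Cola"},
    {"Bread", "Milk", "Diapers", "Juice"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also
    contain the consequent (addition to the slide, see lead-in)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Bread", "Milk"}))        # 3 of 5 transactions
print(confidence({"Bread"}, {"Milk"}))   # 3 of the 4 Bread transactions
```

A confidence below 100% for Bread => Milk reflects that one of the four Bread transactions has no Milk.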
Semi-supervised Learning
Example:
Some photo-hosting services, such as Google Photos, are good examples of
this.
Once you upload all your family photos to the service, it automatically
recognizes that the same person A shows up in photos 1, 5, and 11, while
another person B shows up in photos 2, 5, and 7. This is the unsupervised
part of the algorithm (clustering).
Now all the system needs is for you to tell it who these people are. Just one
label per person, and it is able to name everyone in every photo, which is
useful for searching photos.
Self-supervised Learning
Example tasks: predicting one part of an image given another part, or, for text,
predicting the next word in a sentence.
Reinforcement Learning
• Given a sequence of states and actions with (delayed)
rewards, output a policy
– A policy is a mapping from states to actions that tells you what to
do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
Reinforcement Learning
Key components of Reinforcement learning:
Agent: The learner or decision-maker.
[Figure: agent-environment interaction sequence: ... s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...]
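To make "states, actions, delayed rewards, output a policy" concrete, here is a minimal sketch using tabular Q-learning, which is one common algorithm among many. It assumes a hypothetical five-cell corridor as the "maze", with a reward only at the goal cell. The constants, function names and the uniformly random exploration are illustrative choices.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA, ALPHA = 0.9, 0.5   # discount factor and learning rate (assumed values)

def step(state, action):
    """Environment: move within the corridor; reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore uniformly at random
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# The policy maps each non-goal state to the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The reward arrives only at the final step, yet the learned values propagate it back to earlier states, which is how the delayed reward ends up shaping the policy.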
Types of Learning from Another Perspective
Another criterion used to classify Machine Learning systems is whether or
not the system can learn incrementally from a stream of incoming data.
• In batch learning, the system is incapable of learning incrementally: it
must be trained using all the available data. This will generally take a lot of
time and computing resources, so it is typically done offline.
First the system is trained, and then it is launched into production and runs
without learning anymore; it just applies what it has learned. This is called
offline learning.
Online learning
Most Machine Learning tasks are about making predictions. This means
that given a number of training examples, the system needs to be able to
generalize to examples it has never seen before.
The system learns the examples by heart, then generalizes to new cases
by comparing them to the learned examples (or a subset of
them), using a similarity measure.
Instance-based learning
Instance-based learning:
Example:
Instead of just flagging emails that are identical to known spam emails,
your spam filter could be programmed to also flag emails that are very
similar to known spam emails.
The system would flag an email as spam if it has many words in common
with a known spam email.
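A minimal sketch of this similarity-based spam filter, assuming "words in common" is counted over lowercase, whitespace-separated words. The stored spam emails and the threshold of 3 shared words are hypothetical.

```python
# Hypothetical known spam emails that the system has learned by heart.
known_spam = [
    "win a free prize now",
    "cheap loans click here now",
]

def words_in_common(a, b):
    """Similarity measure: number of distinct words shared by two emails."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def is_spam(email, threshold=3):
    """Flag the email if it shares enough words with any known spam email."""
    return any(words_in_common(email, spam) >= threshold for spam in known_spam)

print(is_spam("Click here for cheap loans"))   # shares 4 words with a stored spam
print(is_spam("Meeting moved to noon"))        # shares no words with stored spam
```

Nothing is fitted here: the stored examples and the similarity measure together are the whole model.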
Note:
Model-based learning
Model-based learning:
For example, suppose you want to know whether money makes people happy
(the GDP per capita example from the previous lecture).
Main challenges in ML
Main challenges:
Start with the data. Bad data may be due to:
• Poor-quality data
– Incomplete data
– Inconsistent data
– Outdated data
• Irrelevant features
– Feature engineering
Insufficient Quantity of Training Data
Insufficient Quantity of Training Data:
For a toddler to learn what an apple is, all it takes is for you to point to an
apple and say “apple” (possibly repeating this procedure a few times). Now
the child is able to recognize apples in all sorts of colors and shapes.
Machine Learning is not quite there yet; it takes a lot of data for most
Machine Learning algorithms to work properly. Even for very simple
problems you typically need thousands of examples, and for complex
problems such as image or speech recognition you may need millions of
examples.
Feature engineering involves:
• Feature selection: selecting the most useful features to train on among existing
features.
• Feature extraction: combining existing features to produce a more useful one
(as we saw earlier, dimensionality reduction algorithms can help).
• Creating new features by gathering new data.
Main challenges in ML
Main challenges :
Causes of overfitting :
Overfitting the Training Data
To solve overfitting:
Example:
Underfitting the Training Data
Underfitting the Training Data:
Development set:
Suppose that the linear model generalizes better, but you want to apply some
regularization to avoid overfitting.
The question is: how do you choose the value of the regularization hyperparameter?
One option is to train 100 different models using 100 different values for this
hyperparameter.
Hyperparameter Tuning and Model Selection
The problem is that you measured the generalization error multiple times on the test
set, and you adapted the model and hyperparameters to produce the best model for
that particular set. This means that the model is unlikely to perform as well on new
data.
A common solution to this problem is called holdout validation: you simply hold out
part of the training set to evaluate several candidate models and select the best one.
The new held-out set is called the validation set (or sometimes the development set, or
dev set).
Development set
1) Train multiple models with various hyperparameters on the reduced training set
(i.e., the full training set minus the validation set), and select the model that
performs best on the validation set.
2) After this holdout validation process, train the best model on the full training
set (including the validation set); this gives you the final model.
3) Lastly, evaluate this final model on the test set to get an estimate of the
generalization error.
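The three steps can be sketched as follows. This sketch assumes a one-feature ridge regression without intercept, so the regularization hyperparameter `lam` has a closed-form fit. The synthetic data, the 60/20/20 split and the candidate values are all illustrative assumptions.

```python
import random

def fit_ridge(data, lam):
    """Closed-form ridge fit of y = w * x (one feature, no intercept)."""
    return sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + lam)

def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

# Synthetic data (assumption): y = 2x plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(-3, 3)
    data.append((x, 2.0 * x + rng.gauss(0, 0.5)))

full_train, test_set = data[:80], data[80:]
reduced_train, validation_set = full_train[:60], full_train[60:]

# 1) Train one model per candidate value on the reduced training set and
#    keep the value whose model performs best on the validation set.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates,
               key=lambda lam: mse(fit_ridge(reduced_train, lam), validation_set))

# 2) Retrain with the selected value on the full training set.
final_w = fit_ridge(full_train, best_lam)

# 3) Evaluate once on the test set to estimate the generalization error.
test_error = mse(final_w, test_set)
print(best_lam, final_w, test_error)
```

The test set is touched exactly once, after the hyperparameter has been fixed, so the reported error is not biased by the selection.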
Development set
1) Data splitting: The data is divided into three parts: the training set, the
development/validation set, and the test set. The training set is used to train the
model, the development set to validate and tune it, and the test set to evaluate its
final performance.
2) Early stopping: The development set can be used to implement early stopping, which
is a form of regularization used to avoid overfitting. If performance on the
development set begins to degrade while performance on the training set
continues to improve, this is a sign that the model is starting to overfit, and
training can be stopped.
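A minimal sketch of the early-stopping rule, assuming the development-set loss is recorded after every epoch. The `patience` parameter (how many non-improving epochs to tolerate) and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the best development-set loss, scanning epochs in
    order and stopping once the loss has not improved for `patience` epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # development loss keeps degrading: stop training here
    return best_epoch

# Hypothetical development-set losses per epoch: improving, then degrading.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(early_stopping_epoch(val_losses))
```

In a real training loop the same check would run after each epoch, and the model weights saved at the best epoch would be restored.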
Cross over
Cross over: