
Introduction to Machine Learning
Points to be covered:
Part one:
 What is AI?
 When to use machine learning?
 Overview of machine learning applications
 What is machine learning?
 Examples of how to apply machine learning
Part two:
 Different types of learning.
 Main challenges in Machine Learning.
 Hyperparameter Tuning and Model Selection.
 Outline of course content

Part two
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels); i.e., labeled data
• Unsupervised learning
– Given: training data without desired outputs; i.e., unlabeled data
• Semi-supervised learning
– Given: training data + a few desired outputs
• Self-supervised learning
– The model learns from itself, by generating its own labels from the input data.
• Reinforcement learning
– The model learns from rewards and penalties.
Supervised Learning

Supervised learning is the type of ML in which the training data you feed to the algorithm includes the desired solutions, called labels. It is divided into:
 Classification
 Regression
Supervised Learning
Classification example:

The spam filter is a good example of this: it is trained with many example
emails along with their class (spam or ham), and it must learn how to classify
new emails.

A labeled training set for supervised learning (e.g., spam classification)


Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification

Example: breast cancer (malignant/benign)
(Figure: tumor size on the x-axis; the label, 1 (malignant) or 0 (benign), on the y-axis. A threshold on tumor size separates the "predict benign" region from the "predict malignant" region.)
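To make this concrete, here is a minimal sketch (not from the slides) of training a classifier on the single feature, tumor size, using scikit-learn's logistic regression. The sizes and labels below are invented purely for illustration.

from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical tumor sizes (cm) and labels: 0 = benign, 1 = malignant
X = np.array([[0.5], [1.0], [1.2], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Predict the class of a new, unseen tumor size
print(clf.predict([[2.8]]))        # categorical output, e.g. [1] = malignant
print(clf.predict_proba([[2.8]]))  # estimated class probabilities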
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute, e.g.:
- Uniformity of cell size
- Uniformity of cell shape
- Age
- Tumor size
Supervised Learning
Regression example:

Predict a target numeric value, such as the price of a car, given a set of
features (mileage, age, brand, etc.) called predictors. This sort of task is called
regression.
To train the system, you need to give it many examples of cars, including both
their predictors and their labels (i.e., their prices).

Regression
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is real-valued == regression

(Figure: September Arctic sea ice extent, in millions of square kilometers, plotted by year from 1970 to 2020 — a real-valued prediction target.)
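As a minimal sketch of regression (not the slide's actual data), the snippet below fits a linear trend to a numeric target with scikit-learn; the year and extent values are invented stand-ins, not real measurements.

from sklearn.linear_model import LinearRegression
import numpy as np

# Hypothetical (year, extent) pairs; extent in millions of sq km
years = np.array([[1980], [1990], [2000], [2010], [2020]])
extent = np.array([7.8, 6.2, 6.3, 4.9, 4.0])

reg = LinearRegression()
reg.fit(years, extent)

# The target is real-valued, so the prediction is a continuous number
print(reg.predict([[2025]]))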
Unsupervised Learning
In unsupervised learning, as you might guess, the training data is unlabeled.
The model tries to learn without a teacher, so it must find structure and patterns in the data on its own: the input is given, but the structure behind it is hidden.

(Figure: an unlabeled training set for unsupervised learning)
Unsupervised Learning
Here are some of the most important unsupervised learning algorithms:
• Clustering
• Dimensionality reduction
• Anomaly detection
• Association rule learning
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
Unsupervised Learning
For example, say you have a lot of data about your visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.
• Solution: the clustering algorithm tries to find structure and patterns in the data on its own, dividing the visitors into clusters.
For example:
• 40% of your visitors are males who love comic books and generally read in the evening
• 20% are young science lovers who visit during the weekends, and so on.

Clustering
If you use a hierarchical clustering algorithm, it may also subdivide each group
into smaller groups. This may help you target your posts for each group.
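A minimal sketch of the idea, assuming made-up visitor features (age, visits per week, average session minutes); k-means is one common clustering algorithm.

from sklearn.cluster import KMeans
import numpy as np

# Each row: [age, visits_per_week, avg_session_minutes] for one visitor
visitors = np.array([
    [34, 5, 40], [36, 6, 45], [35, 5, 42],   # one behavioral group
    [19, 2, 10], [21, 1, 12], [20, 2, 11],   # another group
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(visitors)  # no labels given: structure is found
print(labels)                          # e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)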
Unsupervised Learning
Dimensionality reduction
The goal is to simplify the data without losing too much information. It has two types:
 One way to do this is to merge several correlated features into one (feature extraction).
For example, a car's mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car's wear and tear.
 Another way is to select, from many features, those most relevant to the problem (feature selection).
Advantages of using dimensionality reduction:
 The model will run faster
 The data will take up less space and memory
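A minimal sketch of feature extraction with PCA, merging two correlated features (say, mileage and age) into a single "wear and tear" component; the numbers are invented for illustration.

from sklearn.decomposition import PCA
import numpy as np

# Columns: [mileage (10k km), age (years)] -- strongly correlated by design
cars = np.array([[1, 1], [3, 2], [5, 4], [7, 5], [9, 7]], dtype=float)

pca = PCA(n_components=1)          # feature extraction: 2 features -> 1
wear = pca.fit_transform(cars)     # a single "wear and tear" component
print(wear.ravel())
print(pca.explained_variance_ratio_)  # how much information is kept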
Unsupervised Learning
Anomaly detection
Anomaly detection, also known as outlier detection, is a process used to identify unusual patterns or observations in data, known as outliers or anomalies.
For example, detecting unusual credit card transactions to prevent fraud, or automatically removing outliers from a dataset before feeding it to another learning algorithm.
The system is shown mostly normal instances during training, so it learns to recognize them; when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly.

Anomaly detection
Unsupervised Learning
Anomaly detection
The importance of anomaly detection across different domains:
 Finance: identifying fraudulent transactions.
 Cybersecurity: detecting intrusions.
 Healthcare: monitoring patients' vitals and identifying unusual readings that could indicate a medical issue.
 IoT: detecting abnormal behavior in connected devices.
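A minimal sketch, assuming a single hypothetical feature (transaction amount); Isolation Forest is one common anomaly detection algorithm in scikit-learn.

from sklearn.ensemble import IsolationForest
import numpy as np

# Mostly "normal" transaction amounts, plus one obvious outlier
amounts = np.array([[25.0], [30.0], [22.0], [28.0], [27.0], [900.0]])

detector = IsolationForest(contamination=0.2, random_state=0)
flags = detector.fit_predict(amounts)  # +1 = looks normal, -1 = anomaly
print(flags)  # the 900.0 transaction should be flagged as -1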
Unsupervised Learning
Association rule learning
In association rule learning, the goal is to dig into large amounts of data and discover interesting relations between attributes.
For example, suppose you own a supermarket. Running an association rule
on your sales logs may reveal that people who purchase barbecue sauce and
potato chips also tend to buy steak.
Thus, you may want to place these items close to each other.
Unsupervised Learning
Association rule learning (Example)
Imagine a small dataset of transactions recorded by a supermarket.
Each transaction lists the items purchased by a customer:
• Transaction 1: Bread, Milk
• Transaction 2: Bread, Diapers, Juice, Eggs
• Transaction 3: Diapers, Juice, Cola
• Transaction 4: Bread, Milk, Diapers, Juice
• Transaction 5: Bread, Milk, Diapers, Cola
Discovering association rules:
Identifying itemsets: first identify frequent itemsets (sets of items that appear together often), for example (Bread, Milk).
Calculating support: divide the number of transactions containing the itemset by the total number of transactions. Example: Support(Bread, Milk) = 3/5 = 60%
Generating rules: from the frequent itemsets, we generate rules. For example, one rule could be Bread => Milk, meaning customers who buy Bread tend to also buy Milk.
Unsupervised Learning
Association rule learning (example, continued)
Discovering association rules:
Identifying itemsets: (Bread, Milk)
Calculating support: Support(Bread, Milk) = 3/5 = 60%
Generating rules: Bread => Milk
Calculating confidence: the confidence of the rule Bread => Milk equals the number of transactions containing both Bread and Milk divided by the number of transactions containing Bread.
So Confidence(Bread => Milk) = Support(Bread, Milk) / Support(Bread) = 60% / (4/5) = 75%
Calculating lift: lift measures how much more likely Bread and Milk are to be bought together than to be bought independently.
Lift(Bread => Milk) = Confidence(Bread => Milk) / Support(Milk) = 0.75 / 0.60 = 1.25
A lift value greater than 1 indicates that Bread and Milk are more likely to be bought together than separately.
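The arithmetic above can be checked with a short script over the five transactions; this is a plain-Python sketch, not a full association-rule miner such as Apriori.

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Juice", "Eggs"},
    {"Diapers", "Juice", "Cola"},
    {"Bread", "Milk", "Diapers", "Juice"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support(*items):
    """Fraction of transactions containing all the given items."""
    hits = sum(1 for t in transactions if set(items) <= t)
    return hits / len(transactions)

conf = support("Bread", "Milk") / support("Bread")   # P(Milk | Bread)
lift = conf / support("Milk")

print(f"Support(Bread, Milk) = {support('Bread', 'Milk'):.0%}")  # 60%
print(f"Confidence(Bread => Milk) = {conf:.0%}")                 # 75%
print(f"Lift(Bread => Milk) = {lift:.2f}")                       # 1.25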
Unsupervised Learning

(Figures: further clustering applications — organizing computing clusters, social network analysis, astronomical data analysis)
Semi-supervised Learning

Example:
Some photo-hosting services, such as Google Photos, are good examples of
this.
Once you upload all your family photos to the service, it automatically
recognizes that the same person A shows up in photos 1, 5, and 11, while
another person B shows up in photos 2, 5, and 7. This is the unsupervised
part of the algorithm (clustering).
Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
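A minimal sketch of the semi-supervised idea (not the Google Photos system): scikit-learn's LabelSpreading propagates a handful of labels through unlabeled data. The points and labels below are made up.

from sklearn.semi_supervised import LabelSpreading
import numpy as np

# Two natural clusters of points; only one point per cluster is labeled
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, -1, -1, 1, -1, -1])  # -1 means "unlabeled"

model = LabelSpreading(kernel="knn", n_neighbors=2)
model.fit(X, y)
print(model.transduction_)  # inferred labels for every point, e.g. [0 0 0 1 1 1]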
Self-supervised Learning

Examples: predict one part of an image given another part; or, for text, predict the next word in a sentence.
Reinforcement Learning
• Given a sequence of states and actions with (delayed)
rewards, output a policy
– A policy is a mapping from states → actions that tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand

Reinforcement Learning
Key components of Reinforcement Learning:
 Agent: the learner or decision-maker.
 Environment: the space in which the agent operates.
 State: a representation of the current situation or condition the agent is in.
 Actions: what the agent can do, or the choices it can make.
 Reward: feedback from the environment in response to the agent's actions.
 Policy: a strategy used by the agent to determine its actions based on the current state.
Reinforcement Learning
The Agent-Environment Interface
Idea:
Receive feedback in the form of rewards.
The agent's utility is defined by the reward function.
It must (learn to) act so as to maximize expected rewards.

Agent and environment interact at discrete time steps t = 0, 1, 2, ...
Agent observes state at step t: s_t ∈ S
produces action at step t: a_t ∈ A(s_t)
gets resulting reward: r_{t+1}
and resulting next state: s_{t+1}

This interaction produces a trajectory: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...
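A minimal sketch of reinforcement learning (not from the slides): tabular Q-learning on an invented four-state corridor, where reaching the last state yields a delayed reward of +1 and the learned policy is read off the Q-table.

import numpy as np

n_states, n_actions = 4, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9             # learning rate, discount factor
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != 3:                           # state 3 is the terminal goal
        a = int(rng.integers(n_actions))    # random exploration (off-policy)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 3 else 0.0     # delayed reward, only at the goal
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# The policy maps each state to its best action; here "right" (1) in states 0-2
print(np.argmax(Q, axis=1))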
Types of Learning From another
perspective
Another criterion used to classify Machine Learning systems is whether or
not the system can learn incrementally from a stream of incoming data.
• In batch learning, the system is incapable of learning incrementally: it
must be trained using all the available data. This will generally take a lot of
time and computing resources, so it is typically done offline.
First the system is trained, and then it is launched into production and runs
without learning anymore; it just applies what it has learned. This is called
offline learning.

• In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives. This is called online learning.
Batch learning

(Figures: batch learning features, and an example of an application that uses batch learning)

Online learning
 Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine's main memory (this is called out-of-core learning).
 The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.

(Figure: using online learning to handle huge datasets)
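A minimal sketch of online/out-of-core training, assuming a simulated stream of mini-batches: scikit-learn's SGDClassifier supports incremental learning through partial_fit.

from sklearn.linear_model import SGDClassifier
import numpy as np

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")  # "log_loss" in recent scikit-learn versions
classes = np.array([0, 1])            # must be declared on the first partial_fit

for step in range(100):                      # 100 mini-batches "arriving"
    X_batch = rng.normal(size=(32, 2))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.predict([[2.0, 2.0], [-2.0, -2.0]]))  # e.g. [1 0]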


Types of Learning From another
perspective
Generalization:
 One more way to categorize Machine Learning systems is by how they generalize.
 Most Machine Learning tasks are about making predictions. This means that, given a number of training examples, the system needs to be able to generalize to examples it has never seen before.
 Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.
 There are two main approaches to generalization:
• Instance-based learning
• Model-based learning
Instance-based learning
Instance-based learning:

The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure.
 For example, in the figure, the new instance would be classified as a triangle because the majority of the most similar instances belong to that class.

(Figure: instance-based learning)
Instance-based learning
Instance-based learning:

Example:
Instead of just flagging emails that are identical to known spam emails,
your spam filter could be programmed to also flag emails that are very
similar to known spam emails.

This requires a measure of similarity between two emails.


A (very basic) similarity measure between two emails could be to count
the number of words they have in common.

The system would flag an email as spam if it has many words in common
with a known spam email.

 Note: the most commonly used ML model that uses instance-based learning is k-nearest neighbors (KNN).
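A minimal KNN sketch (illustrative data): the model simply stores the training examples, then classifies a new point by a majority vote among its k nearest stored neighbors.

from sklearn.neighbors import KNeighborsClassifier
import numpy as np

X_train = np.array([[1.0], [1.5], [2.0], [6.0], [6.5], [7.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # similarity = Euclidean distance
knn.fit(X_train, y_train)                  # "learning" = storing the examples

print(knn.predict([[1.8], [6.2]]))  # e.g. [0 1]: votes of the 3 nearest points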
Model-based learning
Model-based learning:
 Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions.
 For example, suppose you want to know if money makes people happy: recall the GDP-per-capita example from the previous lecture.

(Figure: model-based learning)
Main challenges in ML
Main challenges:
 The main challenges may come from the data or from the model.
 Starting with the data (bad data), problems may be due to:
• Insufficient Quantity of Training Data
• Nonrepresentative Training Data
– Sampling Noise
– Sampling Bias
• Poor-Quality Data
– Incomplete Data
– Inconsistent Data
– Outdated Data
• Irrelevant Features
– Feature engineering
Insufficient Quantity of Training Data
Insufficient Quantity of Training Data:

For a toddler to learn what an apple is, all it takes is for you to point to an
apple and say “apple” (possibly repeating this procedure a few times). Now
the child is able to recognize apples in all sorts of colors and shapes.

Machine Learning is not quite there yet; it takes a lot of data for most
Machine Learning algorithms to work properly. Even for very simple
problems you typically need thousands of examples, and for complex
problems such as image or speech recognition you may need millions of
examples.

For example, AI models, especially deep learning models, require large amounts of data to learn effectively: even simple problems may need thousands of examples, and complex ones millions.
Nonrepresentative Training Data
Nonrepresentative Training Data:

In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to.

(Figure: a more representative training sample)
Nonrepresentative Training Data
Nonrepresentative Training Data:

It is crucial to use a training set that is representative of the cases you want to generalize to. This is often harder than it sounds: if the sample is too small, you will have sampling noise (nonrepresentative data as a result of chance).
 Even very large samples can be nonrepresentative if the sampling method is flawed. This is called sampling bias.
Nonrepresentative Training Data

(Figure: sampling noise and sampling bias)
Poor-Quality Data
Poor-Quality Data:
If your training data is full of errors, outliers, and noise (for example incomplete, inconsistent, or outdated records), it is harder for the system to detect the underlying patterns, so it is less likely to perform well.
Irrelevant features
Irrelevant features:

 A critical part of the success of a Machine Learning project is coming up with a good set of features to train on.
 This process, called feature engineering, involves:
Feature selection: selecting the most useful features to train on among existing features.
Feature extraction: combining existing features to produce a more useful one (as we saw earlier, dimensionality reduction algorithms can help).
Creating new features: gathering new data to produce new features.
Main challenges in ML
Main challenges:
 The main challenges may come from the data or from the model.
 A bad model may be due to:
• Overfitting the Training Data
• Underfitting the Training Data
Overfitting the Training Data
Overfitting the Training Data:
In Machine Learning this is called overfitting: it means that the model
performs well on the training data, but it does not generalize well.
Overfitting the Training Data
Causes of overfitting: overfitting happens when the model is too complex relative to the amount and noisiness of the training data.

To solve overfitting:
• Simplify the model (select one with fewer parameters, or constrain it through regularization)
• Gather more training data
• Reduce the noise in the training data (e.g., fix data errors and remove outliers)
Underfitting the Training Data
Underfitting the Training Data:

Underfitting is the opposite of overfitting: it occurs when your model is too


simple to learn the underlying structure of the data.
For example, a linear model of life satisfaction is prone to underfit; reality is
just more complex than the model, so its predictions are bound to be
inaccurate, even on the training examples.
Underfitting the Training Data
To solve underfitting:
• Select a more powerful model, with more parameters
• Feed better features to the learning algorithm (feature engineering)
• Reduce the constraints on the model (e.g., reduce regularization)
Hyperparameter tuning and Model
selection
Model selection:
Suppose you are hesitating between two models (say a linear model and a polynomial
model):

How can you decide?


One option is to train both and compare how well they generalize using the test set.

Development set:

Suppose that the linear model generalizes better, but you want to apply some
regularization to avoid overfitting.

The question is: how do you choose the value of the regularization hyperparameter?
One option is to train 100 different models using 100 different values for this
hyperparameter.
Hyperparameter tuning and Model
selection

The problem is that you measured the generalization error multiple times on the test
set, and you adapted the model and hyperparameters to produce the best model for
that particular set. This means that the model is unlikely to perform as well on new
data.

A common solution to this problem is called holdout validation: you simply hold out part of the training set to evaluate several candidate models and select the best one. The new held-out set is called the validation set (or sometimes the development set, or dev set).
Development set

Development set is used to :

1) Train multiple models with various hyperparameters on the reduced training set
(i.e., the full training set minus the validation set), and you select the model that
performs best on the validation set.

2) After this holdout validation process, you train the best model on the full training
set (including the validation set), and this gives you the final model.

3) Lastly, you evaluate this final model on the test set to get an estimate of the
generalization error.
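A minimal sketch of this holdout-validation workflow on synthetic data; the Ridge model and the candidate hyperparameter values are arbitrary choices for illustration.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)

# Full training set vs. test set, then a validation split inside training
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.25, random_state=0)

best_alpha, best_err = None, float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0]:               # candidate hyperparameters
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    err = mean_squared_error(y_val, model.predict(X_val))
    if err < best_err:
        best_alpha, best_err = alpha, err

# Retrain the best model on the full training set, then evaluate once on test
final = Ridge(alpha=best_alpha).fit(X_train_full, y_train_full)
print("chosen alpha:", best_alpha)
print("test MSE:", mean_squared_error(y_test, final.predict(X_test)))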
Development set

Development set includes:

1) Data splitting: Data is divided into three parts: the training set, the development/
validation set, and the test set. The training set is used to train the model, the
development set to validate and improve it, and the test set to evaluate its final
performance.

2) Early stopping: the development set can be used to implement early stopping, which is a form of regularization used to avoid overfitting. If the performance on the development set begins to degrade while the performance on the training set continues to improve, it is a sign that the model is starting to overfit, and training can be stopped (see the sketch below).
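A minimal early-stopping sketch on synthetic data, assuming an SGDRegressor trained one epoch at a time with warm_start; the patience value is arbitrary.

from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=300)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3, random_state=0)

# max_iter=1 + warm_start=True: each fit() call runs one more epoch
model = SGDRegressor(max_iter=1, tol=None, warm_start=True, random_state=0)
best_dev, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(1000):
    model.fit(X_train, y_train)                # one more training epoch
    dev_err = mean_squared_error(y_dev, model.predict(X_dev))
    if dev_err < best_dev:
        best_dev, bad_epochs = dev_err, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:             # dev error stopped improving
            print(f"early stop at epoch {epoch}, dev MSE {best_dev:.4f}")
            break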
Cross-validation

Cross-validation: the training set is split into k complementary subsets (folds); each candidate model is trained on the other folds and validated on the remaining one, rotating through all the folds, and the validation scores are averaged. A sketch follows below.

Cross-validation types include k-fold, stratified k-fold, and leave-one-out.
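A minimal k-fold cross-validation sketch on synthetic data using scikit-learn's cross_val_score.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression()
scores = cross_val_score(model, X, y, cv=5)  # 5 folds -> 5 validation scores
print(scores)          # one accuracy per fold
print(scores.mean())   # averaged estimate of generalization performance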


What We’ll Cover in this Course
• Supervised learning
– Linear regression
– Decision trees and Random forests
– Support vector machines
– Naive Bayes
– Neural networks
– Introduction to deep learning
• Unsupervised learning
– Clustering: k-means
• Evaluation
• Applications
