Lec 01 (ML) Introduction


Machine Learning

Dr. Faraz Akram

Riphah International University


Artificial Intelligence
Instructor: Dr. Faraz Akram
– Email: faraz.akram@riphah.edu.pk
– Office: B-202

Prerequisites:
– Programming (MATLAB)
– Probability and Statistics

Workload:
– About 2-3 assignments plus a project
2
Books
“Machine Learning” by Tom M. Mitchell
“An Introduction to Statistical Learning” by Gareth James et al.
“Machine Learning in Action” by Peter Harrington
“Pattern Recognition and Machine Learning” by C. M. Bishop

3
Class Attendance:
Minimum 75% class attendance is mandatory to appear in the
examinations.

Assessment Plan:

Assignments 10%
Project 10%
Midterm Examinations 30%
Final Examination 50%

4
Contents

What is Artificial Intelligence

Machine Learning Framework

Examples/Applications of ML

Why Machine Learning?

Types of learning


5
What is Intelligence?
• Is there a “holistic” definition for intelligence?
• Here are some definitions:
– the ability to comprehend; to understand and profit from experience
– a general mental capability that involves the ability to reason, plan,
solve problems, think abstractly, comprehend ideas and language,
and learn
– is effectively perceiving, interpreting and responding to the
environment

• None of these tells us what intelligence is, so instead, maybe we can enumerate a list of elements that an intelligence must be able to perform:
– perceive, reason and infer, solve problems, learn and adapt, apply common sense, apply analogy, recall, apply intuition, reach emotional states, achieve self-awareness

• Which of these are necessary for intelligence? Which are sufficient?

7
The Turing Test
• 1950: Alan Turing devised a test for intelligence called the Imitation Game
– Ask questions of two entities and receive answers from both
– If you can’t tell which of the entities is human and which is a computer program, then you are fooled, and we should therefore consider the computer to be intelligent

[Figure: an interrogator exchanging questions and answers with a hidden person and a hidden computer. Which is the person? Which is the computer?]
8
What is Artificial Intelligence?
What is learning?
• “Learning denotes changes in a system that ... enable a system to do the same task … more efficiently the next time.” - Herbert Simon

• “Learning is constructing or modifying representations of what is being experienced.” - Ryszard Michalski

• “Learning is making useful changes in our minds.” - Marvin Minsky

“Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge.”
11
Machine Learning: A Definition

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
T. Mitchell (1997). Machine Learning

If a computer program can improve how it performs a certain task based on past experience, then you can say it has learned.
12
The Extraction of Knowledge From Data

Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience.

Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases.

13
“Machine Learning is the field of
study that gives computers the
ability to learn without being
explicitly programmed.”
Arthur Samuel (1959)

14
Computers Program Themselves
• Traditional Programming: Data + Program → Computer → Output

• Machine Learning: Data + Output → Computer → Program

15
The Machine Learning Framework
• Apply a prediction function to a feature representation of
the image to get the desired output:

f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
16
The Machine Learning Framework

𝑦 = 𝑓(𝒙), where 𝑦 is the output, 𝑓 the prediction function, and 𝒙 the image features.

• Training: given a training set of labeled examples {(𝒙1, 𝑦1), … , (𝒙𝑁, 𝑦𝑁)}, estimate the prediction function 𝑓 by minimizing the prediction error on the training set
• Testing: apply 𝑓 to a never-before-seen test example 𝒙 and output the predicted value 𝑦 = 𝑓(𝒙)

17
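The training/testing loop above can be made concrete with a toy prediction function. This is only an illustrative sketch: the feature vectors, the labels, and the choice of a 1-nearest-neighbor rule as 𝑓 are all assumptions, not part of the slides.

```python
# A minimal sketch of the framework: "train" by memorizing labeled examples,
# then predict a never-before-seen x with a 1-nearest-neighbor rule as f.
# The feature vectors and labels below are made up for illustration.

def train(examples):
    # For 1-NN, training is simply storing the labeled training set.
    return list(examples)

def predict(model, x):
    # f(x): label of the stored feature vector closest to x.
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda ex: sq_dist(ex[0], x))[1]

training_set = [((1.0, 1.0), "apple"), ((1.2, 0.9), "apple"),
                ((5.0, 4.8), "cow"), ((5.2, 5.1), "cow")]
model = train(training_set)
print(predict(model, (1.1, 1.0)))  # a test example near the "apple" cluster
```

Note how training and testing are separate steps: the model is built once from labeled data, then applied to examples it has never seen.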
Steps
• Training: Training Images → Image Features → Machine Learning Algorithm (together with Training Labels) → Learned Model
• Testing: Test Image → Image Features → Learned Model → Prediction
Machine Learning
Applications/Examples
Can Computers Beat Humans at Chess?
• Chess playing is a classic AI problem
– well-defined problem
– very complex: difficult for humans to play well

[Figure: chess ratings from 1966 to 1997; computer programs rise from Deep Thought to Deep Blue, overtaking Garry Kasparov, the then World Champion.]

Conclusion: YES, today’s computers can beat even the best human.
Spam Detection
• Problem: Classify each e-mail message as SPAM or
non-SPAM

22
Voice / Speech Recognition

23
Medical Diagnosis
• Tumor Detection
• Predicting tumor cells as benign or
malignant
• Cancer prediction and prognosis.

24
Optical Character Recognition

25
Handwritten Character Recognition

26
Text or Document Classification
• Categorizing news stories as finance,
weather, entertainment, sports, etc.

27
Security

• Face Recognition
• Signature / Fingerprint / Iris
verification
• DNA fingerprinting

28
Fraud Detection
• Classifying credit card transactions as
legitimate or fraudulent

30
Fraud Detection
• Fraud costs the financial industry approx.
$80 billion annually

31
Games

Chess Backgammon

32
Unassisted Control Of Vehicle
(Robots, Navigation)

Self-Driving Car Robot


Self-driving cars that rely on machine learning to navigate
may soon be available to consumers. 33
Stock Price Prediction

34
Weather Prediction

Atmospheric Temperature Prediction


35
Clustering Images

36
Whale Detection From Audio
Recordings
• Prevent collisions with shipping traffic

Illustration of ships navigating safely around the habitat of whales.


37
Disease Prediction

38
According to a recent study, machine learning algorithms are expected to replace 25% of the jobs across the world in the next 10 years.

43
Machine Learning Framework
• Training: Training Images → Image Features → Machine Learning Algorithm (together with Training Labels) → Learned Model
• Testing: Test Image → Image Features → Learned Model → Prediction
Types of Learning
• Supervised: training data includes desired outputs
• Unsupervised: training data does not include desired outputs
• Semi-Supervised: training data includes a few desired outputs
• Reinforcement: rewards from a sequence of actions

47
Supervised Learning
• Supervised Learning: learn a function from a set of training examples which are pre-classified feature vectors.

feature vector → class
(square, red) → I
(square, blue) → I
(circle, red) → II
(circle, blue) → II
(triangle, red) → I
(triangle, green) → I
(ellipse, blue) → II
(ellipse, red) → II

Given a previously unseen feature vector, what is the rule that tells us if it is in class I or class II?
(circle, green) → ?
(triangle, blue) → ?

48
Supervised Learning

49
Supervised
Training data includes desired outputs

• Classification: discrete output, e.g. Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black
• Regression: continuous output; the target value can take on a range of values, say any value from 0.00 to 100.00, or -999 to 999, or −∞ to +∞

50
Summary: ML Techniques
1. Supervised: predictive model based on both input and output data (classification, regression)
2. Unsupervised: group and interpret data based only on input data (clustering)
3. Semi-supervised: input data is a mixture of labelled and unlabelled examples
4. Reinforcement: rewards from a sequence of actions

51
Classification vs Regression
• Classification: the output variable takes discrete values (class labels)
– Sorting incoming fish as sea bass or salmon
– Whether an email is genuine or spam
– Whether a tumor is cancerous or benign

• Regression: the output variable takes continuous values
– Predicting tomorrow’s stock market price given current market conditions
– Changes in temperature
– Fluctuations in power demand
– Electricity load forecasting

52
53
Classification: Example
• A fish processing plant wants to automate the process of sorting incoming fish according to species (salmon or sea bass)

[Figure: incoming fish pass a classifier above the sorting chamber and are routed to a salmon belt or a sea bass belt]

• The automation system consists of
• a conveyor belt for incoming products
• two conveyor belts for sorted products
• a pick-and-place robotic arm
• a vision system with an overhead CCD camera
• a computer to analyze images and control the robot arm

55
Sorting Fish
Problem Analysis
• Set up a camera and take some sample images to extract features
– Length
– Lightness
– Width
– Number and shape of fins
– Position of the mouth, etc.

Preprocessing
• Use a segmentation operation to isolate the fish from one another and from the background.
• Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features.
• The features are passed to a classifier.

56
57
Features
• A feature is any distinctive aspect, quality, or characteristic (e.g., color, height)

58
What makes a “good” feature vector?
• The quality of a feature vector is related to its ability to discriminate
examples from different classes
– Examples from the same class should have similar feature values
– Examples from different classes have different feature values

• More Feature properties

59
60
Feature for Classification
• Notice salmon tend to be shorter than sea bass

length: 2 4 8 10 12 14
Bass:   0 1 3 8 10 5
Salmon: 2 5 10 5 1 0

[Figure: histogram of fish counts by length for salmon and sea bass]

Can we select the length of the fish as a possible feature for discrimination?
61
Single Feature (length) Classifier
• Find the best threshold 𝑳
Fish length < 𝑳: classify as salmon
Fish length > 𝑳: classify as sea bass

For example, at L = 5, misclassified:
• 1 sea bass
• 16 salmon

Classification error = 17/50 = 34%
62
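The 34% figure can be checked directly from the count table above. A small sketch (in Python rather than the course’s MATLAB, purely for illustration):

```python
# Counts per length bin, taken from the table on the slide.
lengths = [2, 4, 8, 10, 12, 14]
bass    = [0, 1, 3, 8, 10, 5]    # sea bass
salmon  = [2, 5, 10, 5, 1, 0]

def error_at(L):
    # Rule: length < L -> salmon, length > L -> sea bass.
    # Misclassified: sea bass shorter than L, plus salmon longer than L.
    wrong_bass   = sum(b for length, b in zip(lengths, bass)   if length < L)
    wrong_salmon = sum(s for length, s in zip(lengths, salmon) if length > L)
    return (wrong_bass + wrong_salmon) / (sum(bass) + sum(salmon))

print(error_at(5))   # 17/50 = 0.34, the 34% on the slide
print(error_at(9))   # 10/50 = 0.20
```

Sweeping `error_at` over all candidate thresholds is exactly the “search all possible thresholds” step the slides describe.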
[Figure: the length histogram split at the threshold; fish to the left are classified as salmon, fish to the right as sea bass]

After searching through all possible thresholds L, the best is L = 9, and still 20% of fish are misclassified.

63
• Lesson learned
– Length is a poor feature alone

• What to do?
– Try another feature
– Salmon tends to be lighter
– Try average fish lightness
(Intensity of image pixels)

64
Single Feature (lightness) Classifier

lightness: 1 2 3 4 5
Bass:      0 1 2 10 12
Salmon:    6 10 6 1 0

[Figure: histogram of fish counts by lightness for bass and salmon]

• Now the fish are classified best at a lightness threshold of 3.5, with a classification error of 8%
65
Can Do Better by Combining Features
• Use both length and lightness features
• Feature vector: [length, lightness]

[Figure: fish plotted in the (length, lightness) plane; a decision boundary separates the two decision regions]

Classification error = 4%
66
Another Classifier

[Figure: a more complex decision boundary in the (length, lightness) plane]

Classification error = 0%

67
Example: Regression
• We want to predict a person’s height (in inches) from his weight (in pounds).
• A sample of ten people is available (for whom we know both height and weight).
• The purpose of regression is to come up with an equation of a line that fits through that cluster of points with the minimal amount of deviation (error) from the line.
• Once we have this regression equation, if we knew a person’s weight, we could then predict their height.

[Figure: scatter plot of height (inches) against weight (pounds) for the ten people]

𝑀𝑆𝐸 = (1/𝑛) Σᵢ₌₁ⁿ (𝑦ᵢ − ŷᵢ)²
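The line-fitting step and the MSE above can be sketched with ordinary least squares. The ten (weight, height) pairs below are invented for illustration; the slide’s actual sample is shown only in its figure.

```python
# Ordinary least squares for height ~ a*weight + b, then the MSE defined
# on the slide. The ten data points are made up for illustration.
weights = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
heights = [62, 63, 64, 65, 66, 67, 68, 69, 70, 71]

n = len(weights)
mean_w = sum(weights) / n
mean_h = sum(heights) / n

# Slope and intercept that minimize the sum of squared deviations.
a = (sum((w - mean_w) * (h - mean_h) for w, h in zip(weights, heights))
     / sum((w - mean_w) ** 2 for w in weights))
b = mean_h - a * mean_w

# MSE = (1/n) * sum of (y_i - yhat_i)^2 over the sample.
mse = sum((h - (a * w + b)) ** 2 for h, w in zip(heights, weights)) / n

# Predicted height for a hypothetical 125-pound person
# (about 64.5 with this perfectly linear toy data).
print(a * 125 + b)
```

With real, noisy data the MSE would be positive, and minimizing it is what picks out the best-fitting line.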
Example: Regression

[Figures: fits of increasing complexity to the height (inches) vs. weight (pounds) data: a linear fit, a second-order polynomial, a fourth-order polynomial, and a sixth-order polynomial]
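The four fits above illustrate increasing model flexibility. A sketch of the same effect, with invented data, standardized inputs for numerical stability, and `numpy.polyfit` standing in for whatever fitting tool the course uses:

```python
# Fitting polynomials of increasing order to the same points: the training
# error can only go down as the model gets more flexible, which is exactly
# why the high-order fits on these slides should be viewed with suspicion.
import numpy as np

weights = np.array([95, 105, 118, 126, 135, 149, 158, 166, 181, 195], float)
heights = np.array([61, 63, 62, 65, 66, 65, 68, 69, 70, 69], float)

# Standardize the inputs so the high powers stay well-conditioned.
x = (weights - weights.mean()) / weights.std()

for degree in (1, 2, 4, 6):
    coeffs = np.polyfit(x, heights, degree)              # least-squares fit
    mse = np.mean((np.polyval(coeffs, x) - heights) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

The printed training MSE never increases with the degree; whether the extra flexibility helps on new data is a separate question, taken up in the generalization slides.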
Unsupervised Learning
• Unsupervised Learning: all data is unlabeled, and the algorithms learn the inherent structure of the input data.
• No classes are given. The idea is to find patterns in the data. This generally involves clustering.

73
73
Clustering
• The organization of unlabeled data into similarity groups called clusters.

• A cluster is a collection of data items which are “similar” to one another and “dissimilar” to data items in other clusters.

74
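The clustering idea above can be sketched with a tiny one-dimensional k-means loop. Everything here (the data, the initial centers, the choice of k = 2) is made up for illustration.

```python
# A minimal 1-D k-means sketch: alternately assign each value to its nearest
# center, then move each center to the mean of its assigned cluster.
def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Empty clusters keep their old center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = kmeans_1d(data, centers=[0.0, 6.0])
print(centers)   # one center settles near 1.0, the other near 5.07
```

No labels are used anywhere: the groups emerge purely from the similarity of the values, which is exactly the definition of a cluster given above.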
Unsupervised Learning

75
Example: Unsupervised Learning
• Independent Component Analysis
– separate a combined signal into its original sources

76
Semi-supervised Learning
• Problems where you have a large amount of
input data (X) and only some of the data is
labeled (Y) are called semi-supervised learning
problems.
• These problems sit in between both supervised
and unsupervised learning.

77
Semi-supervised Learning

[Figure: the same data shown three ways: fully labeled (supervised learning), fully unlabeled (unsupervised learning), and partially labeled (semi-supervised learning)]
78
Reinforcement Learning
• Reinforcement Learning: learn from feedback after a decision is made.

• Trial-and-error search

• The algorithm is not told what actions to take; it has to discover by itself which actions yield the most rewards by trying them.

79
Summary: ML Techniques
1. Supervised: predictive model based on both input and output data (classification, regression)
2. Unsupervised: group and interpret data based only on input data (clustering)
3. Semi-supervised: input data is a mixture of labelled and unlabelled examples
4. Reinforcement: rewards from a sequence of actions

80
Example
• Suppose clinicians want to predict whether someone will
have a heart attack within a year. They have data on
previous patients, including age, weight, height, and blood
pressure. They know whether the previous patients had
heart attacks within a year. So the problem is combining the
existing data into a model that can predict whether a new
person will have a heart attack within a year.

What type of problem is this?


• Supervised ?
• Unsupervised?

Classification or Regression?
81
Example
• Suppose that we are interested in developing an
algorithm to predict a stock’s price based on previous
stock returns.
• We can train the method using stock returns from the
past 6 months.

What type of problem is this?


• Supervised ?
• Unsupervised?

Classification or Regression?

82
When Should We Use Machine Learning?
• Human expertise is absent
• Humans are unable to explain their expertise
• The solution changes with time
• The solution needs to be adapted to particular cases
• The problem size is too vast for our limited reasoning capabilities
83
Choosing the Right Algorithm
• There are dozens of supervised and unsupervised machine learning algorithms. Each takes a different approach to learning.

How to decide which algorithm to use?

• Classification: Nearest Neighbors, Support Vector Machines, Naïve Bayes, Discriminant Analysis, Random Forest, Artificial Neural Networks, Genetic Algorithms, Logistic Regression
• Regression: Linear Regression, Decision Tree Regression, Neural Networks, Stepwise Regression
• Clustering: k-Means, k-Medians, Expectation Maximization, Hierarchical Clustering, Gaussian Mixture, Hidden Markov Model
90
Why is it necessary to introduce so many different learning approaches, rather than just a single best method?

There is no best method or one-size-fits-all. Finding the right algorithm is partly just trial and error; even highly experienced data scientists can’t tell whether an algorithm will work without trying it out.

91
Important Concepts in Selecting an ML Procedure for a Specific Data Set

92
Training Accuracy
• How well did the classifier perform on the training data?

[Figure: the two fish classifiers side by side. Classifier-1: training error = 4%, training accuracy = 96%. Classifier-2: training error = 0%, training accuracy = 100%.]

93
Generalization
• In general, we do not really care how well the method works on the training data. Rather, we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data.

• The goal is for the classifier to perform well on new data.

• The ability to produce correct outputs on previously unseen examples is called generalization.

94
Classifier on training data
Training error = 0%

95
Test Classifier on new data
Test error = 25%

What went wrong?


96
What Went Wrong?
[Figure: the 0%-training-error boundary after 2 new samples are added, which it misclassifies]

• We always have only a limited amount of data, not all possible data.
• We should make sure the decision boundary does not adapt too closely to the particulars of the data we have at hand, but rather grasps the “big picture”.
97
What Went Wrong?

• Complicated boundaries overfit the data; they are too tuned to the particular training data at hand.
• Therefore, complicated boundaries tend not to generalize well to new data.
98
Generalization
[Figure: the same complex decision boundary shown on the training data and on the test data]

• The big question of learning theory: how to get good generalization with a limited number of examples
• Intuitive idea: favor simpler classifiers
• A simpler decision boundary may not fit the training data ideally but tends to generalize better to new data

99
Underfitting

• We can also underfit the data, i.e., use too simple a decision boundary
• The chosen model is not expressive enough
• There is no way to fit a linear decision boundary so that the training examples are well separated
• Training error is too high
• Test error is, of course, also high
100
Underfitting → Overfitting

underfitting “just right” overfitting

101
What Are We Seeking?

[Figure: error vs. flexibility/complexity. Training error decreases steadily with complexity; test error is high in the under-fitting region, reaches its minimum in the “just right” region, and rises again with over-fitting.]

103
Cross-Validation
• Training error/accuracy does not indicate how well the learner will perform on “new” data.
• Cross-validation is a better model evaluation method.

• Split your data set into two parts: one for training your model (Dtrain) and the other for validating your model (Dval). The error on the validation data is called the “validation error” (Eval).

104
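The split described above can be sketched in a few lines. The shuffling, the 80/20 split fraction, and the trivial majority-class “model” are all assumptions chosen just to make the flow concrete.

```python
# A minimal holdout-validation sketch: shuffle, split into Dtrain and Dval,
# "train" a trivial majority-class rule, and measure the validation error.
import random

def validation_error(examples, train_fraction=0.8, seed=0):
    rng = random.Random(seed)     # fixed seed for a reproducible split
    data = examples[:]
    rng.shuffle(data)
    cut = int(len(data) * train_fraction)
    d_train, d_val = data[:cut], data[cut:]
    # "Train": predict the most common label seen in Dtrain.
    labels = [y for _, y in d_train]
    majority = max(set(labels), key=labels.count)
    # Eval: fraction of Dval the learned rule gets wrong.
    wrong = sum(1 for _, y in d_val if y != majority)
    return wrong / len(d_val)

examples = [(x, "spam" if x % 3 == 0 else "ham") for x in range(30)]
print(validation_error(examples))
```

The key point is that Eval is computed on data the model never saw during training, unlike the training error of the previous slides.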
K-Fold Cross-Validation
• More accurate than using only one validation set.

Egen ≈ Eval = (1/K) Σₖ₌₁ᴷ Eval(k)

[Figure: the data split into K folds; each fold in turn serves as Dval while the remaining folds form Dtrain, yielding Eval(1), Eval(2), …, Eval(K)]

105
Leave-one-out cross validation
• Leave-one-out cross validation is K-fold cross validation
taken to its logical extreme, with K equal to N, the
number of data points in the set.
• That means that N separate times, the classifier is
trained on all the data except for one point and a
prediction is made for that point.

106
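Both K-fold cross-validation and its leave-one-out extreme can be sketched together. Only the fold mechanics come from the slides; the 1-nearest-neighbor learner and the toy data are assumptions for illustration.

```python
# K-fold cross-validation: each fold serves once as Dval, and Egen is
# estimated by averaging the K validation errors. Setting K equal to the
# number of examples gives leave-one-out cross validation.
def one_nn_error(train, val):
    # Error of a 1-nearest-neighbor rule trained on `train`, tested on `val`.
    wrong = 0
    for x, y in val:
        nearest = min(train, key=lambda ex: abs(ex[0] - x))
        wrong += nearest[1] != y
    return wrong / len(val)

def k_fold_cv(examples, k):
    errors = []
    for fold in range(k):
        d_val = examples[fold::k]                        # every k-th example
        d_train = [e for i, e in enumerate(examples) if i % k != fold]
        errors.append(one_nn_error(d_train, d_val))
    return sum(errors) / k                               # estimate of Egen

data = [(x, x >= 5) for x in range(10)]                  # label: upper half?
print(k_fold_cv(data, k=5))
print(k_fold_cv(data, k=len(data)))                      # leave-one-out
```

With K = N, the classifier is trained N separate times, each time on all the data except one point, exactly as described above.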
