Lec 01 (ML) Introduction
Prerequisites:
– Programming (MATLAB)
– Probability and Statistics
Workload:
– About 2–3 assignments + a project
Books
“Machine Learning” by Tom M. Mitchell
Class Attendance:
Minimum 75% class attendance is mandatory to appear in the
examinations.
Assessment Plan:
Assignments 10%
Project 10%
Midterm Examinations 30%
Final Examination 50%
Contents
Examples/Applications of ML
Types of learning
What is Intelligence?
• Is there a “holistic” definition for intelligence?
• Here are some definitions:
– the ability to comprehend; to understand and profit from experience
– a general mental capability that involves the ability to reason, plan,
solve problems, think abstractly, comprehend ideas and language,
and learn
– is effectively perceiving, interpreting and responding to the
environment
The Turing Test
• 1950 – Alan Turing devised a test for intelligence called the Imitation Game
– Ask questions of two hidden entities and receive answers from both
– If you can’t tell which of the entities is human and which is a computer program, then you are fooled and we should therefore consider the computer to be intelligent
[Figure: an interrogator sends questions to two hidden entities and receives answers from both. Which is the person? Which is the computer?]
What is Artificial Intelligence?
What is learning?
• “Learning denotes changes in a system that ... enable a system
to do the same task … more efficiently the next time.” - Herbert
Simon
“Machine Learning is the field of
study that gives computers the
ability to learn without being
explicitly programmed.”
Arthur Samuel (1959)
Computers Program Themselves
• Traditional Programming: Data + Program → Computer → Output
• Machine Learning: Data + Output → Computer → Program
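The contrast can be sketched in a few lines of Python. The toy task and all data below are hypothetical, not from the slides: in traditional programming a human writes the rule; in machine learning the rule (here a simple length threshold) is derived from data plus desired outputs.

```python
# Hypothetical toy task: label a fish "salmon" or "sea bass" by length.

# Traditional programming: a human hard-codes the rule.
def classify_by_hand(length):
    return "salmon" if length < 5 else "sea bass"

# Machine learning: the rule is derived from data + desired outputs.
def learn_threshold(lengths, labels):
    """Pick the cut-off that misclassifies the fewest training examples."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(lengths)):
        # Count examples where "shorter than t" disagrees with the label.
        err = sum((l < t) != (y == "salmon") for l, y in zip(lengths, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

lengths = [2, 3, 4, 8, 10, 12]
labels = ["salmon", "salmon", "salmon", "sea bass", "sea bass", "sea bass"]
threshold = learn_threshold(lengths, labels)   # learned from data: 8

def classify_learned(length):
    return "salmon" if length < threshold else "sea bass"
```

The hand-written rule encodes a human guess; the learned rule is whatever threshold best fits the labelled examples.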
The Machine Learning Framework
• Apply a prediction function to a feature representation of
the image to get the desired output:
f([image of an apple]) = “apple”
f([image of a tomato]) = “tomato”
f([image of a cow]) = “cow”
The Machine Learning Framework
𝑦 = 𝑓(𝒙)
where 𝑦 is the output, 𝑓 is the prediction function, and 𝒙 is the image feature.
Steps
• Training: training images → image features → machine learning algorithm (using training labels) → learned model
• Testing: test image → image features → learned prediction model → prediction
Machine Learning
Applications/Examples
Can Computers beat Humans at Chess?
• Chess Playing is a classic AI problem
– well-defined problem
– very complex: difficult for humans to play well
[Figure: chess ratings from 1966 to 1997 – Deep Thought and later Deep Blue climb from roughly 1400 toward 3000 points, overtaking Garry Kasparov (then World Champion) at about 2800.]
Conclusion: YES – today’s computers can beat even the best human.
Spam Detection
• Problem: Classify each e-mail message as SPAM or
non-SPAM
Voice / Speech Recognition
Medical Diagnosis
• Tumor Detection
• Predicting tumor cells as benign or
malignant
• Cancer prediction and prognosis.
Optical Character Recognition
Hand Written Character Recognition
Text or Document Classification
• Categorizing news stories as finance,
weather, entertainment, sports, etc.
Security
• Face Recognition
• Signature / Fingerprint / Iris
verification
• DNA fingerprinting
Fraud Detection
• Classifying credit card transactions as
legitimate or fraudulent
Fraud Detection
• Fraud costs the financial industry approx.
$80 billion annually
Games
Chess Backgammon
Unassisted Control Of Vehicle
(Robots, Navigation)
Weather Prediction
Whale Detection From Audio
Recordings
• Prevent collisions with shipping traffic
According to a recent study, machine learning algorithms are expected to replace 25% of jobs across the world in the next 10 years.
Machine Learning Framework
• Training: training images → image features → machine learning algorithm (using training labels) → learned model
• Testing: test image → image features → learned prediction model → prediction
Types of Learning
• Supervised – training data includes desired outputs
• Unsupervised – training data does not include desired outputs
• Semi-supervised – training data includes a few desired outputs
• Reinforcement – rewards from a sequence of actions
Supervised Learning
• Supervised Learning – learn a function from a set
of training examples which are preclassified
feature vectors.
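As a concrete sketch of learning from preclassified feature vectors, a 1-nearest-neighbour classifier is about the simplest supervised learner. The feature vectors and labels below are hypothetical:

```python
# Minimal 1-nearest-neighbour classifier over preclassified
# (feature_vector, label) training pairs. The data is hypothetical.

def predict(train, x):
    """Return the label of the training example closest to x."""
    def dist2(a, b):                      # squared Euclidean distance
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: dist2(pair[0], x))
    return label

train = [((1.0, 1.0), "salmon"), ((1.2, 0.9), "salmon"),
         ((3.0, 3.2), "sea bass"), ((3.1, 2.8), "sea bass")]
```

Calling `predict(train, (1.1, 1.0))` returns "salmon", since the nearest stored feature vector carries that label.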
Supervised (training data includes desired outputs)
• Classification – discrete output
• Regression – continuous output
Summary: ML Techniques
1. Supervised – predictive model based on both input and output data
2. Unsupervised – group and interpret data based only on input data
3. Semi-supervised – input data is a mixture of labelled and unlabelled examples
4. Reinforcement – rewards from a sequence of actions
Classification vs Regression
• Classification: the output variable takes discrete values
(class labels)
– Sorting incoming fish as sea bass or salmon
– Whether an email is genuine or spam
– Whether a tumor is cancerous or benign
Classification: Example
• A fish processing plant wants to automate the process of sorting incoming fish according to species (salmon or sea bass)
[Figure: incoming fish pass through a sorting chamber where a classifier routes each to the salmon or sea bass belt]
• The automation system consists of
• a conveyor belt for incoming products
• two conveyor belts for sorted products
• a pick-and-place robotic arm
• a vision system with an overhead CCD camera
• a computer to analyze images and control the robot arm
Sorting Fish
Problem Analysis
• Set up a camera and take some sample images to extract features
– Length
– Lightness
– Width
– Number and shape of fins
– Position of the mouth, etc…
Preprocessing
• Use a segmentation operation to isolate fish from one another and from the background.
• Information from a single fish is sent to a feature extractor whose
purpose is to reduce the data by measuring certain features.
• The features are passed to a classifier.
Features
• A feature is any distinctive aspect, quality, or characteristic (e.g., color, height)
What makes a “good” feature vector?
• The quality of a feature vector is related to its ability to discriminate
examples from different classes
– Examples from the same class should have similar feature values
– Examples from different classes should have different feature values
Feature for Classification
• Notice salmon tends to be shorter than sea bass

Length:  2   4   8  10  12  14
Bass:    0   1   3   8  10   5
Salmon:  2   5  10   5   1   0

[Figure: overlaid histograms of count vs length for salmon and sea bass]

Classification error = 17/50 = 34%
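The 34% figure can be reproduced from the histogram counts, assuming the decision boundary falls between the length-4 and length-8 bins (the value 5 below is an illustrative cut-off): fish shorter than the boundary are called salmon, the rest sea bass.

```python
# Recompute the classification error from the length histogram.
lengths = [2, 4, 8, 10, 12, 14]
bass    = [0, 1, 3, 8, 10, 5]    # counts per length bin
salmon  = [2, 5, 10, 5, 1, 0]

threshold = 5   # assumed boundary between the 4 and 8 bins
wrong_bass   = sum(b for l, b in zip(lengths, bass)   if l <  threshold)  # bass called salmon
wrong_salmon = sum(s for l, s in zip(lengths, salmon) if l >= threshold)  # salmon called bass

errors = wrong_bass + wrong_salmon   # 1 + 16 = 17
total  = sum(bass) + sum(salmon)     # 50 fish in all
rate   = errors / total              # 17/50 = 0.34
```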
[Figure: the length histograms with a decision boundary – fish on one side are classified as salmon, the rest as sea bass]
• Lesson learned
– Length is a poor feature alone
• What to do?
– Try another feature
– Salmon tends to be lighter
– Try average fish lightness
(Intensity of image pixels)
Single Feature (lightness) Classifier

Lightness: 1   2   3   4   5
Bass:      0   1   2  10  12
Salmon:    6  10   6   1   0

[Figure: histograms of count vs lightness, with a decision boundary splitting the axis into two decision regions]

Classification error = 4%
Another Classifier
[Figure: a more flexible decision boundary that separates the training data perfectly]
Classification error = 0%
Example: Regression
• We want to predict a person’s height (in inches) from his weight (in pounds).
• The purpose of regression is to come up with a function that maps weight to height.
[Figure: height (60–72 inches) vs weight (80–200 pounds) scatter plots, fitted in turn with a linear function, a second-order polynomial, and a fourth-order polynomial]
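The linear fit can be sketched with ordinary least squares. The (weight, height) pairs below are hypothetical, chosen to lie in the same range as the plot:

```python
# Ordinary least squares for a straight-line fit y = slope*x + intercept.
# The (weight, height) pairs are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

weights = [100, 120, 140, 160, 180, 200]         # pounds
heights = [62.0, 63.5, 65.0, 66.5, 68.0, 69.5]   # inches

slope, intercept = fit_line(weights, heights)

def predict_height(weight):
    return slope * weight + intercept
```

The second- and fourth-order polynomial fits on the slides follow the same idea with more coefficients; as the later overfitting discussion shows, extra flexibility is not automatically better.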
Unsupervised Learning
• Unsupervised Learning – all data is unlabeled and the algorithms learn the inherent structure from the input data.
• No classes are given. The idea is to find patterns
in the data. This generally involves clustering.
Clustering
• The organization of unlabeled
data into similarity groups
called clusters.
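A minimal k-means sketch on hypothetical one-dimensional data shows the idea; a real pipeline would typically use a library implementation such as scikit-learn's KMeans:

```python
# k-means on 1-D points: alternately assign each point to its nearest
# center, then move each center to the mean of its assigned points.
# The points and initial centers are hypothetical.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]          # two obvious groups
centers, clusters = kmeans(points, [0.0, 6.0])   # initial guesses
```

No labels are used anywhere: the two similarity groups emerge from the data alone.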
Example: Unsupervised Learning
• Independent Component Analysis
– separate a combined signal into its original sources
Semi-supervised Learning
• Problems where you have a large amount of
input data (X) and only some of the data is
labeled (Y) are called semi-supervised learning
problems.
• These problems sit in between both supervised
and unsupervised learning.
Reinforcement Learning
• Reinforcement Learning – learn from feedback
after a decision is made.
• Trial-and-error search
Summary: ML Techniques
1. Supervised – predictive model based on both input and output data
2. Unsupervised – group and interpret data based only on input data
3. Semi-supervised – input data is a mixture of labelled and unlabelled examples
4. Reinforcement – rewards from a sequence of actions
Example
• Suppose clinicians want to predict whether someone will
have a heart attack within a year. They have data on
previous patients, including age, weight, height, and blood
pressure. They know whether the previous patients had
heart attacks within a year. So the problem is combining the
existing data into a model that can predict whether a new
person will have a heart attack within a year.
Classification or Regression?
Example
• Suppose that we are interested in developing an
algorithm to predict a stock’s price based on previous
stock returns.
• We can train the method using stock returns from the
past 6 months.
Classification or Regression?
When Should We Use Machine Learning?
• Human expertise is absent
• Humans are unable to explain their expertise
• The solution changes with time
How to decide which algorithm to use?
• Classification: Nearest Neighbors, Support Vector Machines, Naïve Bayes, Discriminant Analysis, Random Forest, Artificial Neural Networks, Genetic Algorithms, Logistic Regression
• Regression: Linear Regression, Decision Tree Regression, Neural Networks, Stepwise Regression
• Clustering: k-Means, k-Medians, Expectation Maximization, Hierarchical Clustering, Gaussian Mixture, Hidden Markov Model
Why is it necessary to introduce so
many different learning approaches,
rather than just a single best
method?
Important concepts in selecting an ML procedure for a specific data set
Training Accuracy
• How well did the classifier perform on the training data?
[Figure: the two fish classifiers from the earlier slides]
• Classifier-1: training error = 4%, training accuracy = 96%
• Classifier-2: training error = 0%, training accuracy = 100%
Generalization
• In general, we do not really care how well the method
works on the training data. Rather, we are interested in
the accuracy of the predictions that we obtain when we
apply our method to previously unseen test data.
Classifier on training data
Training error = 0%
Test Classifier on new data
Test error = 25%
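The train/test gap can be sketched on a hypothetical 1-D dataset: the decision threshold is fit on the training portion only, and the held-out points then reveal a 25% test error like the slide's.

```python
# Holdout evaluation: learn on one portion, measure error on unseen data.
# All data points are hypothetical.

train = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
test  = [(2, "a"), (8, "b"), (4, "a"), (4.5, "b")]

# "Learn" a midpoint threshold from the training labels only.
a_max = max(x for x, y in train if y == "a")
b_min = min(x for x, y in train if y == "b")
threshold = (a_max + b_min) / 2          # 5.0 here

def classify(x):
    return "a" if x < threshold else "b"

train_error = sum(classify(x) != y for x, y in train) / len(train)  # 0.0
test_error  = sum(classify(x) != y for x, y in test)  / len(test)   # 0.25
```

Zero training error coexists with a 25% test error: training accuracy alone says little about generalization.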
Underfitting
What are we seeking?
[Figure: error vs model flexibility/complexity – training error falls steadily while test error is U-shaped; under-fitting on the left, over-fitting on the right, and a “just right” model at the minimum of the test error]
Cross-validation
• Training error/accuracy do not give an indication of
how well the learner will perform on “new” data.
• Cross-validation is a better model evaluation method.
K-Fold Cross-validation
• More accurate than using only one
validation set.
E_gen ≈ E_val = (1/K) Σ_{k=1}^{K} E_val(k)

[Figure: the data split into K folds – each fold serves once as D_val while the remaining folds form D_train]
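The averaging formula can be sketched directly; the `evaluate` callback below is a hypothetical stand-in for training a model on D_train and measuring its error on D_val.

```python
# K-fold cross-validation: each fold serves once as D_val while the
# remaining folds form D_train; the K validation errors are averaged.

def k_fold_cv(data, K, evaluate):
    errors = []
    for k in range(K):
        d_val   = data[k::K]                                  # fold k held out
        d_train = [x for i, x in enumerate(data) if i % K != k]
        errors.append(evaluate(d_train, d_val))
    return sum(errors) / K          # E_gen ≈ (1/K) * sum of E_val(k)

data = list(range(10))
# Dummy evaluate: report the validation fraction just to show the wiring.
avg = k_fold_cv(data, 5, lambda d_train, d_val: len(d_val) / len(data))
```

Every point is validated exactly once, which is why this estimate is more accurate than a single validation set.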