
COE 292

Introduction to Artificial Intelligence

Machine Learning – Part 1


Content on these slides was mostly developed by Dr. Akram F. Ahmed, COE Dept.

Outline
• Introduction
• AI and ML
• ML Strategies and Paradigms
• ML Training and Evaluation
• Supervised Learning
• Unsupervised Learning

Introduction
• Learning is one of the most important activities of human beings, and of living beings in general; it helps us adapt to the environment.
• Learning involves making changes to the learner that improve future inference and knowledge acquisition.
• Most of the knowledge in the real world is not formalized and not available in textual form, which makes it difficult for computers to learn and infer.

Machine Learning (ML)
• The concept of machine learning is based on the principles of training computing machines and enabling them to teach themselves.
• Machine learning aims to create theories and procedures (learning algorithms) that allow machines to learn.
• Machine learning is the subfield of computer science that gives “computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).

Machine Learning (ML)
• The computation undertaken by a learning system can be viewed as occurring at two distinct times: training time and consultation time.
• Training Time:
◦ the time prior to consultation time, during which data points (examples) are presented to train the machine learning algorithm. In other words, it is the time a new system spends training to get ready for consultation time.
• Consultation Time:
◦ the time between when a new data point is presented to the system and when the inference is completed.

AI and Machine Learning

AI and Machine Learning: Key Differences

Fundamental Strategies of Learning
• In the learning process, the learner transforms the information provided by a teacher (or the environment) into some new form that can be used in the future.
• The nature of this knowledge and of the transformation are the deciding factors for the type of learning strategy used.

Fundamental Strategies of Learning
• Rote Learning: the original knowledge is copied in the same form and stored in the knowledge base (KB); when needed in the future, it is retrieved as it was stored, without change (memorization).
• Learning by Instruction: requires the knowledge to be transformed into an operational form before it can be integrated into the knowledge base.
◦ An example is students in a class, where the teacher presents a number of facts. The basic transformations performed by a learner are selection and reformulation of the information provided by the teacher.

Fundamental Strategies of Learning
• Learning by Deduction: is carried out through a sequence of deductive inference steps using known facts.
◦ Example: if our KB contains the rule "a father's father is a grandfather," and we have the case that 𝑎 is the father of 𝑏 and 𝑏 is the father of 𝑐, then we can deduce that 𝑎 is the grandfather of 𝑐.
• Learning by Analogy: provides learning of new concepts through the use of similar known concepts or solutions.
◦ Example: solving exam questions, where we attempt to solve a new problem based on solutions to similar problems we have already worked through at home or in class.

Fundamental Strategies of Learning
• Learning by Induction (or Similarity): occurs when the learner experiences a number of instances or examples of the same problem and formulates a general concept.
◦ Example: learning that students who complete their homework by themselves and attend all classes on time achieve a higher Cumulative Percentile Index (CPI) is an example of inductive learning.
• Reinforcement Learning: is the study of decision making over time with consequences.
◦ Example: whenever a teacher appreciates a student's efforts by saying "very good" for asking an interesting question, or for answering one, that is reinforcement-based learning for the student.

Machine Learning Paradigms
• Machine Learning can be divided into the following classes:
◦ Supervised
 Learning is achieved while there is a teacher present for learning to take place
 There is ground truth available for the given training data
 The algorithm learns the relationship between features and labels during training
 The algorithm iteratively makes predictions on the training data and is corrected by the teacher

Ground Truth: the real (true) label(s), value(s) or class(es) associated with the given data. Example: given Ahmed’s face picture, the ground truth label could be his name and/or ID#.

Machine Learning Paradigms
• Machine Learning can be divided into the following classes:
◦ Unsupervised
 Ground truth is not available
 Learning is achieved without a teacher
 The algorithm learns patterns or groupings in the data during training
 Example: clustering methods such as 𝑘-means
 Has fewer models and evaluation methods than supervised learning

Machine Learning Paradigms
◦ Reinforcement
 “learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.” [Sutton & Barto]
◦ Semi-Supervised
 The ground truth data may be scarce or only partially available
◦ Self-Supervised
 Ground truth data is not available, but pseudo-labels are generated to conduct the learning process
 “Obtains supervisory signals from the data itself, often leveraging the underlying structure in the data” [Meta AI]

Machine Learning Paradigms
◦ Self-Supervised examples:
 “we can hide part of a sentence and predict the hidden words from the remaining words.”
 “We can also predict past or future frames in a video (hidden data) from current ones (observed data).”

Typical Problems for ML?
Classification:
Predict a class label for an input

Supervised vs. Unsupervised
• Supervised
◦ Example: distinguishing between two plant types
• Unsupervised
◦ Example: grouping data into classes

Supervised vs. Unsupervised – Examples
• You’re running a real-estate company, and you want to develop ML
algorithms to address each of two problems:
◦ Problem 1: You have a large database of housing prices for houses
of different sizes. You want to predict the prices of houses given
their sizes.
◦ Problem 2: You’d like the software to examine specific houses in your database and put each house in one of two categories: highly desirable or undesirable.
• Should you treat these as classification or as regression problems?

Supervised vs. Unsupervised – Examples
• Of the following examples, which would you address using an
unsupervised ML algorithm? (Check all that apply.)
◦ Given an email labeled as spam/not spam, learn a spam filter.
◦ Given a set of news articles found on the web, group them into sets of articles about the same story.
◦ Given a dataset of patients diagnosed as either having heart disease or not, learn to classify new patients as having heart disease or not.
◦ Given a database of university students, automatically discover sport skills and group students into different sport segments.

ML Training and Evaluation
• Cross Validation
• Underfitting and Overfitting

ML Training and Evaluation – Cross-validation
• In any Machine Learning problem, we are given a set of data with labels that tell us what this data means according to an expert in the field.
◦ Example: let's assume that we have collected data about heart disease from a large sample that covers all possible causes of having the disease. Furthermore, assume that we represent the entire collected data by the blue bar below, where each dot in the bar represents the data set collected from one person.

ML Training and Evaluation – Cross-validation
• In Machine Learning we need to do two things with this data:
1. Estimate the parameters of the machine learning model, i.e. use the data to guess the shape of the curve that best fits it, if a 2-D estimator is used.
◦ In Machine Learning, parameter estimation is called Training the algorithm.
2. Evaluate how well the learned parameters work, i.e. test how good a job the estimated curve does when presented with data it has never seen before.
◦ In Machine Learning, evaluating a method is called Testing the algorithm.

ML Training and Evaluation – Cross-validation
• Therefore in Machine learning:
◦ We need the data to train the machine learning algorithm.
◦ We need to test the trained ML model on data it hasn’t seen in
training, to make sure that the algorithm performs well.
• Question: where can we get training and testing data?
◦ Using the same data for training and testing does not work since
we do not know how the algorithm performs when it is given a set
of data it has not been trained on.
◦ Using all the data for training will not leave any data for testing

ML Training and Evaluation – Cross-validation
• Answer: we use the labeled data provided by an expert. In the heart disease example, we divide the collected data into a training set and a testing set.
◦ A common practice in Machine Learning is to use 75% of the data for training and 25% of the data for testing. This is called the holdout method.
◦ The question then is: which 25% to choose for testing and which 75% to choose for training?

ML Training and Evaluation – Cross-validation
• We use the k-fold cross-validation method:
◦ divide the data into a number of subsets (folds)
◦ in each round, use a (different) set of k−1 subsets for training and leave a single subset for testing

ML Training and Evaluation – Cross-validation
• Example: four-fold cross-validation: the data is divided into FOUR equal sets. For the heart disease example they are shown below.

• We then train and test the Machine Learning algorithm as follows:

◦ Sets 1,2,3 for training and Set 4 for testing
◦ Sets 1,2,4 for training and Set 3 for testing
◦ Sets 1,3,4 for training and Set 2 for testing
◦ Sets 2,3,4 for training and Set 1 for testing
• In practice, it is common to use 3-, 5- or 10-fold cross-validation.
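As an illustration, here is a minimal scikit-learn sketch of k-fold cross-validation; the dataset and the choice of classifier are made-up stand-ins, not the heart disease data from the slides:

```python
# Minimal sketch: 4-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 2))          # 100 samples, 2 features (made-up data)
y = rng.integers(0, 2, 100)       # binary labels (made-up data)

kf = KFold(n_splits=4, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])                  # train on 3 folds
    scores.append(model.score(X[test_idx], y[test_idx]))   # test on held-out fold
print("per-fold accuracy:", scores, "mean:", np.mean(scores))
```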
ML Training and Evaluation – Cross-validation
• k-fold cross-validation can help us obtain reliable estimates of the model’s generalization performance, that is, how well the model performs on unseen data.
• But its main disadvantage is the increased computational cost.

ML Training and Evaluation – Underfitting and Overfitting
• Suppose we have the dataset as shown below
◦ Data is labeled, red circles and blue circles
• How can we train and obtain the best classifier?

ML Training and Evaluation – Underfitting and Overfitting
• Idea 1: Let us try a linear classifier represented by a straight line:
◦ As can be seen, there are many blue points above the line that are misclassified.
◦ No matter how we rotate or shift the line, we will always have a high misclassification rate in training and testing.
◦ This is known as Underfitting: the model oversimplifies the complexity in the data.

ML Training and Evaluation – Underfitting and Overfitting
• Idea 2:
◦ Let's use a curve that can best separate the red class from the blue class.
◦ Let us divide our data into training and testing sets as shown below.

ML Training and Evaluation – Underfitting and Overfitting
• Idea 2:
◦ We can find the "wavy" curve that best fits all the points in the training set, as shown below (it fits the varying training data very well).

ML Training and Evaluation – Underfitting and Overfitting
• Idea 2:
◦ Now if we use the curve on the test data, we get the following: it does not do well with the testing data.
◦ As can be seen, many test points are not classified correctly.
◦ This is what we call Overfitting.

ML Training and Evaluation – Underfitting and Overfitting
• Idea 3:
◦ Allow for some misclassification, and we can get:
◦ This curve neither overfits nor underfits.
◦ There are some misclassifications, but within an acceptable range.

Supervised Learning - Classification
• Classification predicts the classes (categories/labels) to which the given examples belong and then assigns the examples to those categories.
• It assumes the following:
◦ the existence of some teacher (environment),
◦ a fitness function to measure the fitness of an example for a class, and
◦ some external method of classifying the training instances.
• A classifier typically learns with the help of a training set containing examples in which any given target example has been previously labeled.

Supervised Learning - Classification
• Examples of Supervised Learning algorithms:
◦ 𝑘 Nearest Neighbor (𝑘-NN)
◦ Support Vector Machines (SVM)

K-Nearest Neighbor (k-NN)
• Uses 𝑘 closest points (nearest neighbors) for
performing classification
• 𝑘 -Nearest Neighbor algorithms
classify a new example by comparing
it to all previously seen examples.
• The classifications of the 𝑘 most
similar previous cases are used for
predicting the classification of the
current example.

K-Nearest Neighbor (k-NN)
• The training examples are used for
◦ providing a library of sample cases
◦ re-scaling the similarity function to
maximize performance

K-Nearest Neighbor Algorithm
1. Calculate the distance between the test point and every training instance.
2. Pick the 𝑘 closest (nearest) training examples and assign the test instance to the most common category amongst these nearest neighbors.
◦ Voting over multiple (𝑘) neighbors helps decrease susceptibility to noise.
◦ Usually an odd value of 𝑘 is used to break ties.
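As an illustration, here is a minimal from-scratch sketch of these two steps, using Euclidean distance and a majority vote (the training data is made up):

```python
# Minimal k-NN sketch: distances to every training point, then majority vote.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)   # step 1: all distances
    nearest = np.argsort(dists)[:k]                   # step 2: k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority label

X_train = np.array([[1, 2], [2, 3], [8, 8], [9, 10]])
y_train = np.array(["red", "red", "blue", "blue"])
print(knn_predict(X_train, y_train, np.array([8, 9])))  # -> blue
```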

K-Nearest Neighbor (k-NN) – Examples
• Example: given training data (shown as solid circles) with two different attributes, a new point (hollow circle) has to be classified.
◦ It is assigned the most frequent label of its 𝑘 nearest neighbors, as shown below.
◦ Note that changing 𝑘 may lead to a different classification.

K-Nearest Neighbor: Distance Metrics
• 𝑘-NN methods assume a function for determining the similarity or distance between any two instances.
• Euclidean distance is the generic choice.
• Considering two patterns 𝑥 = (𝑥1, 𝑥2, …, 𝑥m) and 𝑧 = (𝑧1, 𝑧2, …, 𝑧m), the Euclidean distance between them is given by:

d(𝑥, 𝑧) = √((𝑥1 − 𝑧1)² + (𝑥2 − 𝑧2)² + … + (𝑥m − 𝑧m)²)

where m is the number of dimensions.

K-Nearest Neighbor: Distance Metrics
• Example: find the distance between 𝑥 = (3,5) and 𝑧 = (1,2).
◦ The Euclidean distance in 2 dimensions is
d(𝑥, 𝑧) = √((3 − 1)² + (5 − 2)²) = √(4 + 9) = √13 ≈ 3.61
• The Euclidean distance in higher dimensions works the same way.
◦ Example: to find the Euclidean distance between 784-dimensional vectors 𝑥 and 𝑧, sum the squared differences over all 784 coordinates and take the square root.
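A minimal numpy sketch of this computation (the two vectors are random stand-ins):

```python
# Euclidean distance between two 784-dimensional vectors.
import numpy as np

x = np.random.rand(784)
z = np.random.rand(784)
d = np.sqrt(np.sum((x - z) ** 2))   # equivalently: np.linalg.norm(x - z)
print(d)
```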

K-Nearest Neighbor: Examples
• Consider the data shown below, in which there are two classes of data indicated by O and X. A new data point at (0,0) is to be classified using the 𝑘-NN algorithm. Which of the following is true?
◦ 1-NN classifies the new data as X
◦ 5-NN classifies the new data as O
◦ 3-NN classifies the new data as X
◦ 1-NN cannot be used to classify the data since there are lots of points
◦ 1-NN classifies the new data as O

K-Nearest Neighbor: Examples
• Example: given the following data for diseased patients, find the survivability of the new patient (shown in green) using:
◦ 1-NN
◦ 3-NN
◦ 48-NN
• As we can see, 1-NN classifies the survivability as "Did Not Survive" while 3-NN classifies it as "Survived."
• Selecting 48-NN compares against the average of all the data, and so on.

K-Nearest Neighbor: Overfitting and Underfitting
• Depending on the dataset that you have, the value of 𝑘 determines whether you face underfitting or overfitting.
• In this example:
◦ If 𝑘 = 1, the result may be considered overfitting, since outliers in the training samples may cause some new samples to be misclassified, especially if they are close to an outlier.
◦ If the value of 𝑘 is set too large, then you may have underfitting.

K-Nearest Neighbor: Overfitting and Underfitting
• When 𝑘 is equal to the number of samples in the data, 𝑘-NN becomes an average comparator.
• A good value of 𝑘 may be 3, 4, 5, 6, or 7.

K-Nearest Neighbor: Overfitting and Underfitting

• In the above example, using few neighbors corresponds to high model complexity (overfitting), and using many neighbors corresponds to low model complexity (underfitting).
• In conclusion, the selection of the value of 𝑘 is crucial to ensure that the classifier works correctly and as per our needs.

K-Nearest Neighbor: Drawing decision boundaries
• For 1-NN, we can identify surfaces such that if a point lies within a specific surface, it will be classified with its nearest neighbor.
• In the following example, the dots represent the nodes we will be using for classification and the x is the new point that needs to be classified.
• Different distance metrics can change the decision surface, as shown on the right, where the metric used is shown under each picture.

1-Nearest Neighbor: Drawing decision boundaries
• Boundary lines are formed by the intersection of perpendicular
bisectors of every pair of points.
• Steps:
1. Examine the region where you think decision boundaries should occur.
2. Find the closest oppositely labeled points (+/-) and connect them, forming
a line.
3. Draw perpendicular bisectors of these lines.
4. Extend and join all bisectors

Drawing 1-NN decision boundaries: Examples
• Example:

Drawing 1-NN decision boundaries: Examples
• Example:

Drawing 1-NN decision boundaries: Examples
• Given the following data points and labels shown below, draw on
the graph the decision boundaries for 1-Nearest Neighbors
Point Coor. Class
A (0, 0) Black
B (0, -2) White
C (-2, 0) White
D (2, 2) White
E (1, 2) Black

K-Nearest Neighbor: Pros and Cons
• Pros
◦ It is extremely easy to implement.
◦ It requires no training prior to making real-time predictions, which makes the 𝑘-NN algorithm much faster than other algorithms that require training, e.g. SVM, linear regression, etc.
◦ Since the algorithm requires no training before making predictions, new data can be added seamlessly.
◦ There are only two parameters required to implement 𝑘-NN: the value of 𝑘 and the distance function (e.g. Euclidean, Manhattan, etc.).

K-Nearest Neighbor: Pros and Cons
• Cons
◦ Not efficient at test time: it requires computing the distance measure to every training example.
◦ The 𝑘-NN algorithm doesn't work well with high-dimensional data, because with a large number of dimensions it becomes difficult for the algorithm to compute meaningful distances.
◦ The 𝑘-NN algorithm doesn't work well with categorical features, since it is difficult to define a distance between dimensions with categorical features.

K-Nearest Neighbor: Implementation
• On Jupyter Notebook: kNN-Implementation.ipynb

ML Evaluation Metrics
• Different applications have very different goals.
• It is very important to choose an evaluation metric that matches the goal of the application.
• Accuracy, which is a measure of the overall performance of the model, is widely used, but many other metrics are possible, e.g. precision, recall and F1 score.

ML Evaluation Metrics
• Accuracy: for what fraction of all instances is the classifier's prediction correct
• Classification error (1 − Accuracy): for what fraction of all instances is the classifier's prediction incorrect
• Precision: how accurate the positive predictions are, i.e. what fraction of the positive predictions are correct
• Recall: coverage of the actual positive samples, i.e. what fraction of all positive instances does the classifier correctly identify as positive
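For reference, these metrics can be written in terms of the confusion-matrix counts (TP, TN, FP, FN) defined on the following slides:
◦ Accuracy = (TP + TN) / (TP + TN + FP + FN)
◦ Precision = TP / (TP + FP)
◦ Recall = TP / (TP + FN)
◦ F1 score = 2 × (Precision × Recall) / (Precision + Recall)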

K-Nearest Neighbor: Evaluating Model Performance
• A confusion matrix facilitates the calculation of these metrics.
• Confusion Matrix: a matrix in which the (m,n)th element is the number of examples of the mth class which were labeled, by the classifier, as belonging to the nth class.
◦ A confusion matrix for the Iris classifier looks like the following:

True label \ Predicted label | Iris-setosa | Iris-versicolor | Iris-virginica
Iris-setosa                  |     12      |        0        |       0
Iris-versicolor              |      0      |       11        |       0
Iris-virginica               |      0      |        1        |       6

Confusion Matrix for a Binary Classifier
• Suppose that the correct label is either 1 (+ve) or 0 (−ve). Then the confusion matrix is just 2×2, with a row for each true label and a column for each predicted label.
• For example, in the cell for true label 1 and predicted label 0, you would write the number of examples of class 1 which were misclassified as class 0.

False Positives & False Negatives
• TP (True Positives) = examples that were correctly labeled as “1”
• FN (False Negatives) = examples that should have been “1”, but were labeled as “0”
• FP (False Positives) = examples that should have been “0”, but were labeled as “1”
• TN (True Negatives) = examples that were correctly labeled as “0”

True label \ Predicted label |  1  |  0
1                            | TP  | FN
0                            | FP  | TN

Evaluating the Classification Algorithm: Example
• Example: consider again the Iris confusion matrix:

True label \ Predicted label | Iris-setosa | Iris-versicolor | Iris-virginica
Iris-setosa                  |     12      |        0        |       0
Iris-versicolor              |      0      |       11        |       0
Iris-virginica               |      0      |        1        |       6

• Treating Iris-setosa (IS) as the positive class gives TP = 12, FN = 0, FP = 0, TN = 18.
• Accuracy of predicting Iris-setosa (IS) = (12 + 18) / (12 + 18) = 100%

Evaluating the Classification Algorithm: Example
• Example (continued): treating Iris-versicolor (IVC) as the positive class gives TP = 11, FN = 0, FP = 1, TN = 18.
• Accuracy of predicting Iris-versicolor (IVC) = (11 + 18) / (11 + 18 + 1) = 97%

Evaluating the Classification Algorithm: Example
• Example (continued): treating Iris-virginica (IVN) as the positive class gives TP = 6, FN = 1, FP = 0, TN = 23.
• Accuracy of predicting Iris-virginica (IVN) = (6 + 23) / (6 + 23 + 1) = 97%

Evaluating the Classification Algorithm: Example
• Example (summary):
• Accuracy of predicting Iris-setosa (IS) = (12 + 18) / (12 + 18) = 100%
• Accuracy of predicting Iris-versicolor (IVC) = (11 + 18) / (11 + 18 + 1) = 97%
• Accuracy of predicting Iris-virginica (IVN) = (6 + 23) / (6 + 23 + 1) = 97%
• Overall accuracy = (12 + 11 + 6) / 30 = 97%
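A minimal numpy sketch deriving the per-class accuracies above from the confusion matrix:

```python
# Per-class and overall accuracy from the Iris confusion matrix.
import numpy as np

cm = np.array([[12, 0, 0],    # rows: true label, columns: predicted label
               [ 0, 11, 0],
               [ 0,  1, 6]])
total = cm.sum()
for i, name in enumerate(["Iris-setosa", "Iris-versicolor", "Iris-virginica"]):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = total - tp - fn - fp
    print(name, "accuracy:", (tp + tn) / total)
print("overall accuracy:", np.trace(cm) / total)
```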

Precision vs Recall
• Precision is used when the goal is to limit the number of false
positives
◦ Examples of Precision-oriented machine learning tasks:
 Search engine ranking, query suggestion
 Document classification
 Many customer-facing tasks (users remember failures!)
• Recall is used when it is important to avoid false negatives
◦ Examples of Recall-oriented machine learning tasks:
 Search and information extraction in legal discovery
 Tumor detection
 Often paired with a human expert to filter out false positives

COE 292
Introduction to Artificial Intelligence

Machine Learning – Part 2


Content on these slides was mostly developed by Dr. Akram F. Ahmed, COE Dept.

Outline
• Maximum Marginal Classifiers
• Perceptron
• Support Vector Machines (SVM)
• SVM Kernel Function
• SVMs vs Perceptron

Maximum Marginal Classifiers
• We would like to build an email Spam filter that can distinguish Spam emails from other emails.
• Suppose that we only want to filter out emails that ask you to purchase a product.
• The data analyst analyzed the emails and produced the following data (figure on the right).
• We want a classifier that will separate most of the Spam from the non-Spam.

Maximum Marginal Classifiers
• Given data on the occurrence of the word “buy” in emails,
• we attempt to find the best vertical line that can separate the Spam from the non-Spam, as shown (right).
• Since we are working in 1-dimensional space, this is the best we can do.
• In this way we have used the previous history, with a one-dimensional feature, to classify the Spam emails from the others.

Maximum Marginal Classifiers
• Suppose now we have a new set of data from which we want to classify Spam emails from others.
• However, this time the data includes the count of spelling mistakes along with the occurrence of the word “buy”.
• Clearly we could use a 1-D classifier where
◦ If #Buy + SpellingMistakes ≥ 4, then Spam

Maximum Marginal Classifiers
• We can also view the given data as a 2-D graph as shown below

Maximum Marginal Classifiers
• Using the rule that the sum of the spelling mistakes and the number of occurrences of “buy” is greater than or equal to 4, we sketch the line such that for any point (𝑥,𝑦) on the line we have 𝑥+𝑦=4.
• This is represented in the figure (right).

Maximum Marginal Classifiers
• Usually data is not as well arranged as in this example, and hence we need an algorithm that will place the line without any information provided by the user.
• The Perceptron is a linear classification algorithm.
◦ It learns a decision boundary that separates two classes using a line (called a hyperplane in higher dimensions) in the feature space.
◦ As such, it is appropriate for problems where the classes can be separated well by a line or linear model, referred to as linearly separable.

Perceptron Algorithm: Motivation
• In the figure below, which decision boundary is better?
◦ Both have zero training error (perfect training accuracy)
◦ But one of them seems intuitively better

Perceptron Algorithm: Motivation
• A better question is: which of the linear separators is optimal?
• We want the computer algorithm to find the best line that separates any two sets of data. How can we tell the computer to find this line?
◦ We need a sequence of steps that the computer can follow to find the best line.

Perceptron Algorithm: Introduction
• The Perceptron takes inputs, xi, aggregates them (as a weighted sum) and returns 1 if the aggregated sum is more than some threshold, else it returns 0.
• It supports real inputs as well as Boolean (0, 1), which makes it more useful and general.
• The output y is determined as follows:

y = 1 if Σi wi·xi > th, otherwise y = 0

◦ where th is some selected threshold. In other words, we take a weighted sum of the inputs and set the output to 1 only when the sum is more than an arbitrary threshold th.

Perceptron Algorithm: Introduction
• However, by convention, the thresholding parameter th is added as one of the inputs, with weight −th (see the figure).
• Here, w0 = −th is called the bias and x0 is always set to 1.
• Then the model’s mathematical representation becomes:

y = 1 if Σi=0..m wi·xi > 0, otherwise y = 0

Perceptron Algorithm: Introduction
• The coefficients of the model, wi, are referred to as input weights and are trained using the gradient descent optimization algorithm.
• The optimal weights and the bias will depend on the data.

Perceptron Algorithm: Overview
• Take the example below, which represents two sets of data that need to be classified:
◦ Find a line that separates the two classes of data (red and blue).
◦ We know that the line equation is given by ax + by + c = 0

Perceptron Algorithm: Overview
• The line equation is given by ax + by + c = 0
• The inputs, xi, are x1 = x, x2 = y and x0 = 1
• The weights are w1 = a and w2 = b, and the bias is w0 = c
• If y = 1, we are classifying the observation as red
• If y = 0, we are classifying the observation as blue

Perceptron Algorithm: Overview
• Points from the training dataset are selected randomly for training, one at a time.
• Each time, the Perceptron makes a prediction and the error is calculated.
• The weights of the model are then updated to reduce the error for the chosen point.
◦ This is called the Perceptron update rule.

Perceptron Algorithm: Overview
• This process is repeated for all examples in the training dataset; one full pass is called an epoch (iteration).
• This process of updating the model is then repeated for many epochs.
• The weights are updated by a small proportion of the error in each epoch; the proportion is controlled by a hyperparameter called the learning rate (λ), typically set to a small value.

Perceptron Algorithm: Overview
• The learning rate (λ) is typically set to a small value to ensure learning does not occur too quickly, which would cause premature convergence of the optimization (search) procedure for the model weights.
◦ weights(t + 1) = weights(t) + learning_rate * input_i
◦ where t is the epoch
◦ e.g. wi(t+1) = wi(t) + λ * xi
• Training is stopped when the error made by the model falls to a low level or no longer improves, or when a maximum number of epochs has been performed.

Perceptron Algorithm: Learning
• Step 1: Start with a random line that divides the space into a blue side and a red side. The equation of the line is given by 𝑎𝑥1 + 𝑏𝑥2 + 𝑐 = 0
• Step 2: Pick a large number 𝑛 (the number of repetitions, or epochs); we will pick 𝑛 = 1000
• Step 3: Pick a learning rate, usually small since we do not want any single point to have a large effect on the approximation: 𝜆 = 0.01

Perceptron Algorithm: Learning
• Step 4: Repeat 𝑛 times, over all points:
◦ 4.1: Pick a random point (𝑝,𝑞) from the given points
◦ 4.2: If the point is correctly classified, i.e.
▪ the point is blue and 𝑎𝑥1 + 𝑏𝑥2 + 𝑐 < 0, OR
▪ the point is red and 𝑎𝑥1 + 𝑏𝑥2 + 𝑐 > 0,
▪ then do nothing

Perceptron Algorithm: Learning
◦ 4.3: If the point is blue and 𝑎𝑥1 + 𝑏𝑥2 + 𝑐 > 0 [we want it < 0]
▪ 𝑎 = 𝑎 − 𝜆𝑥1 [pivot the line counterclockwise, fixing the y-intercept]
▪ 𝑏 = 𝑏 − 𝜆𝑥2 [pivot the line clockwise, fixing the x-intercept]
▪ 𝑐 = 𝑐 − 𝜆 [move the line up]
◦ 4.4: If the point is red and 𝑎𝑥1 + 𝑏𝑥2 + 𝑐 < 0 [we want it > 0]
▪ 𝑎 = 𝑎 + 𝜆𝑥1 [pivot the line clockwise, fixing the y-intercept]
▪ 𝑏 = 𝑏 + 𝜆𝑥2 [pivot the line counterclockwise, fixing the x-intercept]
▪ 𝑐 = 𝑐 + 𝜆 [move the line down]
• Step 5: You have found your line!
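A minimal Python sketch of Steps 1 through 5, implementing the update rules exactly as written above (the sample points and labels are made up for illustration; blue is treated as the negative side and red as the positive side):

```python
# Perceptron learning loop for the line a*x1 + b*x2 + c = 0.
import random

def train_perceptron(points, labels, n=1000, lam=0.01):
    a, b, c = random.random(), random.random(), random.random()  # Step 1
    for _ in range(n):                                           # Steps 2 and 4
        i = random.randrange(len(points))                        # 4.1: random point
        x1, x2 = points[i]
        value = a * x1 + b * x2 + c
        if labels[i] == "blue" and value > 0:        # 4.3: misclassified blue
            a, b, c = a - lam * x1, b - lam * x2, c - lam
        elif labels[i] == "red" and value < 0:       # 4.4: misclassified red
            a, b, c = a + lam * x1, b + lam * x2, c + lam
    return a, b, c                                   # Step 5: the learned line

points = [(1, 1), (2, 3), (6, 5), (7, 8)]
labels = ["blue", "blue", "red", "red"]
print(train_perceptron(points, labels))
```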

Perceptron Algorithm: Learning

(Figure: snapshots of the separating line after Steps 1 through 5.)

Perceptron Error
• Suppose that we ran our Perceptron algorithm and obtained the result shown:
◦ How can we calculate the error, so that we can compare two different runs of the algorithm?
• Note: the algorithm chooses points randomly and moves the line closer to a point if it is misclassified; thus, unless the number of epochs is very large, you may get a different line on each run.

Perceptron Error
• Idea: use the shortest distance between each misclassified point and the estimated line as the error measure for that point.
• The total error is then the sum of the distances for all misclassified points.
• In practice we do not use the distances themselves but something proportional to them.

Perceptron Error
• Example: suppose that a Perceptron classifier misclassified two test points located at (4,5) and (1,1) and returned the soft margin line equation 2𝑥 + 3𝑦 − 6 = 0, as shown.
• Calculate the Perceptron error.

Perceptron Error
• Solution: a measure proportional to the distance is obtained by substituting the point coordinates into the line equation of the soft margin and taking the absolute value. Substituting the point coordinates into the soft margin line equation, we get:
◦ For point (4,5): |2×4 + 3×5 − 6| = 17
◦ For point (1,1): |2×1 + 3×1 − 6| = 1
• The total error in this case is 17 + 1 = 18.
• Using this error calculation, different lines can be compared; the line with the least error is the best one.
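A tiny sketch of this calculation (the helper function is hypothetical, not from the slides):

```python
# Perceptron error: sum of |a*x + b*y + c| over the misclassified points.
def perceptron_error(line, misclassified):
    a, b, c = line                       # line a*x + b*y + c = 0
    return sum(abs(a * x + b * y + c) for x, y in misclassified)

print(perceptron_error((2, 3, -6), [(4, 5), (1, 1)]))  # -> 18
```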
Perceptron Algorithm Implementations
• On Jupyter Notebooks:
◦ Perceptron-Implementation
◦ Perceptron-Implementation-Scikit-Learn

Support Vector Machines (SVMs)

Support Vector Machine (SVM)
• SVM is the amalgamation of the maximum-margin idea with the Perceptron algorithm, in which higher dimensions may be used to classify the data.
• The main idea is to have a classifier that can find a line with the maximum margin to all support vectors, as shown.

Support Vector Machine (SVM)
• The figure below shows training data representing the weights of mice, where a red dot represents a non-obese mouse and a green dot an obese mouse.
• To classify a new mouse as obese or not, we need to define a threshold as a cut point between being obese or not.
• The best threshold is the mid-point between the edge nodes of the two clusters, as shown below:

Support Vector Machine (SVM)
• If we get a new observation, shown in black below, it is classified as not obese.
• The distance between the edge observation of a cluster and the threshold is called a margin.
• When we use the threshold that gives the maximum margin to make classifications, we call it a Maximum Marginal Classifier.

Support Vector Machine (SVM)
• What if our training data looked like this:
• There is an outlier that will cause the Maximum Marginal Classifier to look like this:
• The outlier node is classified as not obese!

Support Vector Machine (SVM)
• If we try to classify a new observation, shown in black, we will classify it as not obese, although it is very far away from the not-obese cluster and much closer to the obese one.
• Maximum Marginal Classifiers are very sensitive to outliers.
• What can we do about it?

Support Vector Machine (SVM)
• How about allowing some misclassification (i.e. some error) to help us classify new observations better?
• If we do not consider the outlier node, we get the same threshold as in the example without the outlier. Hence, allowing some misclassification may classify some training data incorrectly but increases the correct classification of newly observed data (a good trade-off).

Support Vector Machine (SVM)
• Choosing a threshold that allows misclassification is an example of the bias/variance trade-off, where the expert has to make a trade-off for better classification.
• Since we allowed some misclassification, the margin is called a soft margin, as opposed to a hard margin.
• How do we know that the soft margin is the best selected threshold?
◦ We use cross validation.

Support Vector Machine (SVM)
• When we use a soft margin to classify the data, we often refer to the classifier as a Support Vector Classifier.
• The name Support Vector Classifier comes from the fact that the observations on the edge of and within the soft margin are called support vectors.

Support Vector Machine (SVM)
• Suppose we have the drug dosage data shown below, where the red dots represent patients who were not cured while the green dots represent those who were cured.
• The data has lots of overlap.
• Basically, the data indicates that if the dosage is too low or too high the drug does not work; it only works if the dosage is right.

Support Vector Machine (SVM)
• How can we classify this problem?
◦ The Perceptron will not work, since the overlap is quite high.
◦ If we put the soft margin at any point, we will have lots of misclassification.
◦ We do not have any additional data to put on a second axis → Maximum Marginal Classifiers and Support Vector Classifiers will not work.
• What should we do?

Support Vector Machine (SVM)
• Transform the problem to a higher dimension by computing the values on the 𝑦-axis as 𝑦 = 𝑓(𝑥).
• We will use 𝑓(𝑥) = 𝑥².
• Plotting the 2-D data, we get the figure on the right.
• We can now draw a support vector classifier to separate the data; this helps us classify the group of people who were cured and those who were not.

Support Vector Machine (SVM)
• Now we can classify any new observation by plotting it in the new 2-D space (i.e. at the point (𝑥, 𝑥²)) and comparing it with our soft classifier line.
◦ If it is above the line, the dosage will not cure the patient; if it is below, the patient will be cured.
• In general, taking the data to a higher dimension may result in a better classification.
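A minimal numpy sketch of this lifting step (the dosage values and the cured range are made up to mimic the figure):

```python
# Lift 1-D dosage data to 2-D with the feature map f(x) = x**2.
import numpy as np

dosage = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
cured = (dosage >= 4) & (dosage <= 6)        # cured only for mid-range doses
X = np.column_stack([dosage, dosage ** 2])   # each point becomes (x, x^2)
# In this 2-D space, a straight line can separate cured from not cured.
print(X[cured])
```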

Support Vector Machine (SVM)
• To increase the dimensionality of the data:
◦ we either get a new set of data (as in the Spam email example, where we added the spelling mistakes as another dimension), or
◦ we manipulate the data to get the best result in the second dimension (as in the example above).

Support Vector Machine (SVM)
• To select a manipulation technique, SVM uses Kernel Functions to systematically find support vector classifiers in higher dimensions.

Kernel Function
• Example: the polynomial kernel
◦ One kernel function uses a polynomial of degree 𝑑, where 𝑑 starts from 1.
◦ When 𝑑 = 1, the polynomial kernel computes the relationship between each pair of 1-D observations to find the best support vector classifier.

Kernel Function
• Example: the polynomial kernel
◦ When 𝑑 = 2, the polynomial kernel computes the relationship between each pair of 2-D observations to find the best support vector classifier.

Kernel Function
• Example: the polynomial kernel
◦ When 𝑑 = 3, the polynomial kernel computes the relationship between each pair of 3-D observations to find the best support vector classifier.
• We can increase 𝑑 up to an appropriate level.
• Finally, we use cross validation to find the best value for 𝑑.
Remark: for SVM there are many kernel functions, such as the polynomial, radial basis function (RBF) and sigmoid kernels. We just have to find the best kernel for the data set under consideration.
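A minimal scikit-learn sketch of this procedure, comparing polynomial kernels of different degrees with cross-validation (the 1-D dosage-style dataset is made up for illustration):

```python
# Choosing the polynomial degree d by cross-validation with an SVM.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))             # 1-D dosage-like feature
y = ((X[:, 0] > 4) & (X[:, 0] < 6)).astype(int)  # cured only in the mid-range

for d in (1, 2, 3):
    clf = SVC(kernel="poly", degree=d, coef0=1)
    scores = cross_val_score(clf, X, y, cv=5)    # 5-fold cross-validation
    print(f"degree {d}: mean accuracy {scores.mean():.2f}")
```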

SVM Kernel: Example 1
• Suppose we have the following data for some problem that requires
classification

SVM Kernel: Example 1
• Using the square kernel and maximal margin we get

SVM Kernel: Example 2
• Suppose we have the following data, in which we want to classify the large yellow objects and small purple objects as category green, while the large purple objects and small yellow objects are category black.

SVM Kernel: Example 2
• We can use some kernel to stretch it out like:
◦ (Same figure viewed from different angles)

SVM Kernel: Example 2
• We can use some kernel to stretch it out like:
◦ (Same figure viewed from different angles)

Decision Hyperplane

SVM Kernel: Example 3

Support Vector Classifiers
• In 1-D, a support vector classifier is a single point within the 1-D space.
• In 2-D, a support vector classifier is a line within the 2-D space, i.e. the 𝑥𝑦 plane.
• In 3-D, a support vector classifier is a plane within the 3-D space, i.e. the 𝑥𝑦𝑧 cube.
• In higher dimensions, a support vector classifier is a hyperplane within that dimension.

SVM (without kernel) vs Perceptron

When does the SVM performance degrade?
• The SVM does not perform well if the following conditions occur:
◦ Data with lots of noise, since the discriminator location (the best separating line) depends on the few nearest data points.
◦ Choosing the wrong kernel, which may result in a bad classifier.
▪ Kernel selection is trial and error (generate and test).
▪ Experience plays a vital role in selecting the best kernel for a given data set.
▪ There is no single kernel that is best for all data sets.
◦ Using SVM with large data sets may be expensive, since the complexity of the calculations required by the kernel is often high.

SVM Implementation Example
• On Jupyter Notebook: SVM-Implementation.ipynb

COE 292
Introduction to Artificial Intelligence

Machine Learning – Part 3 – Unsupervised Learning


Presentation is based on the content developed by Dr. Akram F. Ahmed, COE Dept.

Overview
• Unsupervised learning eliminates the need for a teacher.
• The learner is solely responsible for forming its own concepts and evaluating them for learning.
• For example, scientists are usually not blessed with a teacher to help them pursue their research; instead, they propose hypotheses to explain the observations they have made, evaluate their hypotheses based on criteria like generality, simplicity, and elegance, and test these hypotheses through experiments they design themselves.

Clustering
• Grouping objects by similarity.
• Take all the data and ask: what are the typical groups in the data?
• 𝑘-means clustering is an example of a clustering algorithm.

K-Means Clustering
• 𝑘-means Clustering is used when you want to partition data into 𝑘 clusters
• The steps of the algorithm are shown below:

K-Means Clustering
• Example: Suppose we have some data represented as the following figure:

K-Means Clustering
• Step 1
o Suppose we want 2 clusters, i.e. 𝑘 = 2.
o Choose two random centroids, i.e. means, as shown in the figure.
o A centroid is the imaginary or real location representing the center of a cluster.

K-Means Clustering
• Step 2
o For each point, calculate the distance to both centroids and color it (associate it) with the color of the nearest centroid.
▪ (Note: there is a line between the centroids with the property that each point on it is equidistant from both centroids.)

K-Means Clustering
• Step 3
o Using the newly labeled points, calculate the location of the new centroid (i.e. shift each centroid to the mean of all similarly labeled points).

K-Means Clustering
• Step 4
o Repeat the same steps to get the locations of the new centroids.
o Stop when no node changes its association to a centroid (i.e. no point changes color).
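A minimal from-scratch sketch of Steps 1 through 4 for 𝑘 = 2 (the 2-D points are made up, centroids are initialized to randomly chosen data points, and, for simplicity, empty clusters are not handled):

```python
# k-means: assign points to the nearest centroid, move centroids, repeat.
import numpy as np

def kmeans(X, k=2, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]      # Step 1
    while True:
        # Step 2: associate each point with its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):                 # Step 4: stable
            return labels, centroids
        centroids = new_centroids

X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
labels, centroids = kmeans(X)
print(labels, centroids)
```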

K-Means Clustering - Examples

