Lec03 Classifiers KNN+DT

The document outlines foundational concepts in machine learning, focusing on classification methods such as k-Nearest Neighbors (k-NN) and Decision Trees. It discusses the mechanics of k-NN, including the importance of selecting the right 'k' value, the concept of distance-weighted neighbors, and improvements for efficiency. Additionally, it covers decision trees, their structure, impurity measures, and strategies for avoiding overfitting through pruning techniques.


Foundations of Machine Learning

Classifiers
Aug 2024

Vineeth N Balasubramanian
Classification Methods
• k-Nearest Neighbors
• Decision Trees
• Naïve Bayes
• Support Vector Machines
• Logistic Regression
• Neural Networks
• Ensemble Methods (Boosting, Random Forests)

k-Nearest Neighbors
• Basic idea:
• If it walks like a duck, quacks like a duck, then it’s probably a duck

[Figure: training records → compute the distance to the test record → choose the k "nearest" records]

k-Nearest Neighbors
• Majority vote within the k nearest neighbors

[Figure: a new point is classified as blue with k = 1 and as green with k = 3]

k-Nearest Neighbors
• Choosing k is important
• If k is too small, sensitive to noise points
• If k is too large, neighborhood may include points from other classes

[Figure: (a) 1-nearest, (b) 2-nearest, and (c) 3-nearest neighborhoods around a query point x]

k-Nearest Neighbors
• An arbitrary instance is represented by $(a_1(x), a_2(x), \ldots, a_n(x))$
• $a_i(x)$ denotes the $i$-th feature of instance $x$
• Euclidean distance between two instances:
$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$$
• In case of a continuous-valued target function
• Predict the mean value of the k nearest training examples
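To make the distance computation and the two prediction rules concrete, here is a minimal NumPy sketch (function and variable names are illustrative, not from the lecture): Euclidean distance, majority vote for classification, mean of the neighbors for a continuous-valued target.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, regression=False):
    """Predict for x_query from its k nearest training instances."""
    # d(x_query, x_i) = sqrt(sum_r (a_r(x_query) - a_r(x_i))^2) for every training instance
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]                 # indices of the k closest records
    if regression:
        return y_train[nearest].mean()              # continuous target: mean of the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote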

How to determine k
• Determined experimentally
• Start with k = 1 and use a held-out validation set to estimate the error rate of the classifier
• Repeat with k=k+2
• Choose the value of k for which the error rate is minimum
• Note: k typically an odd number to avoid ties in binary
classification
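A sketch of this selection loop, assuming the illustrative knn_predict function from the earlier sketch and a held-out validation split (names are mine, not the lecture's):

import numpy as np

def choose_k(X_train, y_train, X_val, y_val, max_k=15):
    """Try k = 1, 3, 5, ... and keep the k with the lowest validation error."""
    best_k, best_err = 1, float("inf")
    for k in range(1, max_k + 1, 2):        # odd k to avoid ties in binary classification
        preds = np.array([knn_predict(X_train, y_train, x, k=k) for x in X_val])
        err = np.mean(preds != y_val)       # validation error rate for this k
        if err < best_err:
            best_k, best_err = k, err
    return best_k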

k-Nearest Neighbors
• Eager Learning (Induction)
• Explicit description of target function on the whole training set
• Instance-based Learning (Transduction)
• Learning=storing all training instances
• Classification=assigning target function to a new instance
• Referred to as “Lazy” learning
Similar Keywords: K-Nearest Neighbors, Memory-Based Reasoning,
Example-Based Reasoning, Instance-Based Learning, Case-Based
Reasoning, Lazy Learning

Voronoi Diagram

The 1-NN decision surface is formed by the training examples: each Voronoi cell is labelled with the class of its training point.

Improvements
• Distance-Weighted Nearest Neighbors
• Assign weights to the neighbors based on their distance from the query point (e.g., the weight may be the inverse square of the distance)
• Scaling (normalization) attributes for fair computation of distances
• Measure “closeness” differently
• Finding “close” examples in a large training set quickly
• E.g. Efficient memory indexing using kd-trees
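As a hedged illustration of both ideas, the sketch below combines inverse-square distance weighting with a kd-tree lookup (here scipy.spatial.KDTree, one common implementation; the lecture does not prescribe a particular library, and the function name is illustrative):

import numpy as np
from scipy.spatial import KDTree   # kd-tree index for fast nearest-neighbor queries

def weighted_knn_predict(tree, y_train, x_query, k=5, eps=1e-12):
    """Distance-weighted k-NN vote: closer neighbors get larger (inverse-square) weights."""
    dists, idx = tree.query(x_query, k=k)        # k nearest neighbors and their distances
    weights = 1.0 / (dists ** 2 + eps)           # eps guards against division by zero
    votes = {}
    for w, i in zip(weights, idx):
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)             # class with the largest total weight

# Build the index once over scaled/normalized training features, then query many times:
# tree = KDTree(X_train); weighted_knn_predict(tree, y_train, x_new)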

k-NN: Summary
• Pros
• Highly effective inductive inference method for noisy training data and complex target
functions
• Target function for a whole space may be described as a combination of less complex
local approximations
• Trains very fast (“Lazy” learner)
• Cons
• Curse of dimensionality
• In high dimensions, nearly all of the volume of the unit hypersphere lies close to its surface (the inside is almost empty!), so points tend to be nearly equidistant. Check: http://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/chap1-high-dim-space.pdf
• Storage: all training examples are saved in memory
• A decision tree or linear classifier is much smaller
• Slow at query time
• Can be mitigated by presorting and indexing the training samples (e.g., with kd-trees)

Convergence of 1-NN
[Figure: class posteriors P(Y|x) around a query point x, its nearest neighbor, and the feature axes x1, x2]

As the training set grows, the nearest neighbor of x lies arbitrarily close to x, so its label $y_1$ is effectively drawn from the same posterior $\Pr(Y \mid x)$ as the true label y. Let $y^* = \arg\max_y \Pr(y \mid x)$ be the Bayes-optimal prediction. Then

$$
\begin{aligned}
P(\text{1-NN error}) &= 1 - \Pr(y = y_1) = 1 - \sum_{y'} \Pr(Y = y' \mid x)^2 \\
&= 1 - \Pr(y^* \mid x)^2 - \sum_{y' \neq y^*} \Pr(Y = y' \mid x)^2 \\
&\le 2\bigl(1 - \Pr(y^* \mid x)\bigr) = 2 \times (\text{Bayes optimal error rate})
\end{aligned}
$$

Convergence of 1-NN
It is possible to show that, as the size of the training data set approaches infinity, the one-nearest-neighbor classifier guarantees an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data), as derived above. We will see this later.

Non-parametric Density Estimation using kNNs
• k-Nearest Neighbor estimator
• Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and check the bin width:
$$\hat{p}(x) = \frac{k}{2 N d_k(x)}$$
where $d_k(x)$ is the distance to the k-th closest instance to x
• More on this later, when we move to unsupervised learning
Source: Ethem Alpaydin, Introduction to Machine Learning, 3rd Edition (Slides)
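A minimal 1-D sketch of this estimator, directly following the formula above (the function name is illustrative):

import numpy as np

def knn_density_1d(x, data, k=5):
    """k-NN density estimate: p_hat(x) = k / (2 * N * d_k(x)) for 1-D data."""
    data = np.asarray(data, dtype=float)
    N = len(data)
    d_k = np.sort(np.abs(data - x))[k - 1]   # distance to the k-th closest instance to x
    return k / (2.0 * N * d_k)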

Classification Methods
• k-Nearest Neighbors
• Decision Trees
• Naïve Bayes
• Support Vector Machines
• Logistic Regression
• Neural Networks
• Ensemble Methods (Boosting, Random Forests)

Decision Trees
• An efficient nonparametric method
• A hierarchical model
• Divide-and-conquer strategy
[Figure: an example decision tree with internal decision nodes and leaf nodes]
Source: Ethem Alpaydin, Introduction to Machine Learning, 3rd Edition (Slides)

Divide and Conquer
• Internal decision nodes
• Univariate: Uses a single attribute, xi
• Numeric xi :
• Binary split : xi > wm
• Discrete xi :
• n-way split for n possible values
• Multivariate: Uses more than one attribute, x
• Leaves
• Classification: Class labels, or proportions
• Regression: Numeric; the average of the r values at the leaf, or a local fit
• Learning is greedy; find the best split recursively
Source: Ethem Alpaydin, Introduction to Machine Learning, 3rd Edition (Slides)

Classification Trees (C4.5, J48)
• For node m, $N_m$ instances reach m, and $N_m^i$ of them belong to class $C_i$:
$$\hat{P}(C_i \mid x, m) \equiv p_m^i = \frac{N_m^i}{N_m}$$
• Node m is pure if $p_m^i$ is 0 or 1
• A measure of impurity is entropy:
$$I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$$
[Figure: entropy of a 2-class problem as a function of p]

Entropy in information theory specifies the average (expected) amount of information derived from observing an event.
Source: Ethem Alpaydin, Introduction to Machine Learning, 3rd Edition (Slides)
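A small sketch of the entropy impurity computed from per-class counts at a node (illustrative names, assuming NumPy):

import numpy as np

def entropy_impurity(counts):
    """I_m = -sum_i p_m^i * log2(p_m^i), computed from per-class counts at node m."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                              # convention: 0 * log2(0) = 0
    return -(p * np.log2(p)).sum()

# A pure node has impurity 0; a 50/50 two-class node has impurity 1 bit:
# entropy_impurity([10, 0]) -> 0.0, entropy_impurity([5, 5]) -> 1.0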

Classification Trees
• If node m is pure, generate a leaf and stop, otherwise split and continue
recursively
• Impurity after the split: $N_{mj}$ of the $N_m$ instances take branch j, and $N_{mj}^i$ of them belong to class $C_i$:
$$\hat{P}(C_i \mid x, m, j) \equiv p_{mj}^i = \frac{N_{mj}^i}{N_{mj}}$$
$$I'_m = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$$
• Information Gain: the expected reduction in the impurity measure after the split, $\Delta I = I_m - I'_m$


• Choose the best attribute(s) (with maximum information gain) to split
the remaining instances and make that attribute a decision node
• You can use the same logic to find the best splitting value too (see the sketch below)
Source: Ethem Alpaydin, Introduction to Machine Learning, 3rd Edition (Slides)
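The gain computation, sketched on top of the entropy_impurity helper from the earlier sketch (again illustrative, not the lecture's code):

import numpy as np

def information_gain(parent_counts, child_counts):
    """Gain = I_m - I'_m, where I'_m is the branch-size-weighted impurity of the children.
    parent_counts: per-class counts at node m; child_counts: one per-class count vector
    for each branch j of the candidate split."""
    N_m = float(np.sum(parent_counts))
    weighted_children = sum(
        (np.sum(c) / N_m) * entropy_impurity(c) for c in child_counts
    )
    return entropy_impurity(parent_counts) - weighted_children

# Splitting a [5, 5] node into branches [5, 1] and [0, 4] gives a gain of about 0.61 bits.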

Other Measures of Impurity
• The properties of a function $\phi$ measuring the impurity of a split:
• $\phi(1/2, 1/2) \ge \phi(p, 1-p)$, for any $p \in [0, 1]$
• $\phi(0, 1) = \phi(1, 0) = 0$
• $\phi(p, 1-p)$ is increasing in p on $[0, 1/2]$ and decreasing in p on $[1/2, 1]$
• Examples (other than entropy)
• Gini impurity/index: $\phi(p, 1-p) = 2p(1-p)$, i.e., $1 - \sum_{i=1}^{K} (p_m^i)^2$ in the K-class case
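For comparison with entropy, a matching sketch of the Gini index from per-class counts (illustrative):

import numpy as np

def gini_impurity(counts):
    """Gini index: 1 - sum_i (p_m^i)^2, computed from per-class counts at a node."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

# gini_impurity([5, 5]) -> 0.5, gini_impurity([10, 0]) -> 0.0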

Decision Trees: Example

[Figure: worked example of growing a decision tree, shown over three slides of figures]
Overfitting and Generalization
• Overfitting can occur with noisy training examples, and also when only a small number of examples is associated with a leaf node. How can this be handled?
• Pruning: Remove subtrees for better generalization
(decrease variance)
• Prepruning: Early stopping
• Postpruning: Grow the whole tree then prune subtrees which overfit
on the pruning set
• Prepruning is faster, postpruning is more accurate
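The lecture does not tie pruning to any particular library; as one concrete illustration of postpruning, the sketch below uses scikit-learn's cost-complexity pruning (the ccp_alpha parameter of DecisionTreeClassifier) and picks the pruning strength on a held-out pruning set:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def prune_by_validation(X, y, alphas=(0.0, 0.001, 0.01, 0.1)):
    """Grow trees with increasing pruning strength and keep the one that
    generalizes best to a held-out pruning set."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    best_score, best_tree = -1.0, None
    for a in alphas:                                   # larger alpha => more subtrees pruned
        tree = DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
        score = tree.score(X_val, y_val)               # accuracy on the pruning set
        if score > best_score:
            best_score, best_tree = score, tree
    return best_tree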

Overfitting and Generalization
• Occam’s Razor principle: when multiple hypotheses can
solve a problem, choose the simplest one
• a short hypothesis that fits the data is unlikely to be a coincidence
• a long hypothesis that fits the data might be a coincidence

• How to select “best” tree:


• Measure performance over training data
• Measure performance over separate validation data set
• Minimum Description Length: Minimize size(tree) +
size(misclassifications(tree))

Rule Extraction from Trees
• Convert tree to equivalent set
of rules
• Prune each rule independently of the others, by removing any preconditions whose removal improves the rule's estimated accuracy
• Sort final rules into desired
sequence for use
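A toy illustration of the idea (the attributes, values, and rules below are hypothetical, not taken from the lecture): each root-to-leaf path becomes one IF-THEN rule, and each rule's preconditions can then be pruned independently.

# Hypothetical rules read off a small two-level tree on (outlook, humidity):
rules = [
    {"if": {"outlook": "sunny", "humidity": "high"},   "then": "no"},
    {"if": {"outlook": "sunny", "humidity": "normal"}, "then": "yes"},
    {"if": {"outlook": "overcast"},                    "then": "yes"},
]

def apply_rules(example, rules, default="yes"):
    """Return the conclusion of the first rule whose preconditions all hold."""
    for rule in rules:
        if all(example.get(attr) == val for attr, val in rule["if"].items()):
            return rule["then"]
    return default

# apply_rules({"outlook": "sunny", "humidity": "high"}, rules) -> "no"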

Multivariate Trees

Readings
• Chapters 8 and 9, Ethem Alpaydin, Introduction to Machine Learning, 2nd Edition

