
Self Reading - KNN - Notes

The document discusses the concept of representing data as points in high-dimensional space and introduces the K-nearest neighbor (KNN) algorithm for classification. It explains how to compute distances between data points and emphasizes the importance of feature vectors in machine learning. Additionally, it addresses challenges such as the curse of dimensionality and the impact of feature scaling on the KNN classifier's performance.


3 | Geometry and Nearest Neighbors

Learning Objectives:
• Describe a data set as points in a high-dimensional space.
• Explain the curse of dimensionality.
• Compute distances between points in high-dimensional space.
• Implement a K-nearest neighbor model of learning.
• Draw decision boundaries.
• Implement the K-means algorithm for clustering.

Dependencies: Chapter 1

"Our brains have evolved to get us out of the rain, find where the berries are, and keep us from getting killed. Our brains did not evolve to help us grasp really large numbers or to look at things in a hundred thousand dimensions." – Ronald Graham

You can think of prediction tasks as mapping inputs (course reviews) to outputs (course ratings). As you learned in the previous chapter, decomposing an input into a collection of features (e.g., words that occur in the review) forms a useful abstraction for learning. Therefore, inputs are nothing more than lists of feature values. This suggests a geometric view of data, where we have one dimension for every feature. In this view, examples are points in a high-dimensional space.

Once we think of a data set as a collection of points in high-dimensional space, we can start performing geometric operations on this data. For instance, suppose you need to predict whether Alice will like Algorithms. Perhaps we can try to find another student who is most "similar" to Alice, in terms of favorite courses. Say this student is Jeremy. If Jeremy liked Algorithms, then we might guess that Alice will as well. This is an example of a nearest neighbor model of learning. By inspecting this model, we'll see a completely different set of answers to the key learning questions we discovered in Chapter 1.

3.1 From Data to Feature Vectors

An example is just a collection of feature values about that example, for instance the data in Table 1 from the Appendix. To a person, these features have meaning. One feature might count how many times the reviewer wrote "excellent" in a course review. Another might count the number of exclamation points. A third might tell us if any text is underlined in the review.

To a machine, the features themselves have no meaning. Only the feature values, and how they vary across examples, mean something to the machine. From this perspective, you can think about an example as being represented by a feature vector consisting of one "dimension" for each feature, where each dimension is simply some real value.

Consider a review that said "excellent" three times, had one exclamation point and no underlined text. This could be represented by the feature vector ⟨3, 1, 0⟩. An almost identical review that happened to have underlined text would have the feature vector ⟨3, 1, 1⟩.

Note, here, that we have imposed the convention that for binary features (yes/no features), the corresponding feature values are 0 and 1, respectively. This was an arbitrary choice. We could have made them 0.92 and 16.1 if we wanted. But 0/1 is convenient and helps us interpret the feature values. When we discuss practical issues in Chapter 5, you will see other reasons why 0/1 is a good choice.
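As a concrete illustration (a minimal sketch of our own, not from the text), the three features above could be extracted from a raw review string like this; the helper name and the way underlining is passed in are assumptions:

```python
def review_to_features(text: str, underlined: bool) -> list[float]:
    """Map a raw review to the three example features:
    (# of 'excellent', # of '!', any underlined text?)."""
    excellent_count = text.lower().count("excellent")
    exclamation_count = text.count("!")
    return [float(excellent_count), float(exclamation_count), 1.0 if underlined else 0.0]

# "excellent" three times, one '!', no underlining -> [3.0, 1.0, 0.0]
review = "Excellent lectures, excellent notes, excellent exams!"
print(review_to_features(review, underlined=False))
```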
Figure 3.1 shows the data from Table 1 in three views. These three views are constructed by considering two features at a time in different pairs. In all cases, the plusses denote positive examples and the minuses denote negative examples. In some cases, the points fall on top of each other, which is why you cannot see 20 unique points in all figures.

Figure 3.1: A figure showing projections of the data in two dimensions in three ways – see text. Top: the horizontal axis corresponds to the first feature (easy?) and the vertical axis corresponds to the second feature (AI?); Middle: horizontal is the second feature and vertical is the third (systems?); Bottom: horizontal is the first and vertical is the third. Truly, the data points would lie exactly on (0, 0) or (1, 0), etc., but they have been perturbed slightly to show duplicates.

Question: Match the example ids from Table 1 with the points in Figure 3.1.

The mapping from feature values to vectors is straightforward in the case of real-valued features (trivial) and binary features (mapped to zero or one). It is less clear what to do with categorical features. For example, if our goal is to identify whether an object in an image is a tomato, blueberry, cucumber or cockroach, we might want to know its color: is it Red, Blue, Green or Black?

One option would be to map Red to a value of 0, Blue to a value of 1, Green to a value of 2 and Black to a value of 3. The problem with this mapping is that it turns an unordered set (the set of colors) into an ordered set (the set {0, 1, 2, 3}). In itself, this is not necessarily a bad thing. But when we go to use these features, we will measure examples based on their distances to each other. By doing this mapping, we are essentially saying that Red and Blue are more similar (distance of 1) than Red and Black (distance of 3). This is probably not what we want to say!
A solution is to turn a categorical feature that can take four different values (say: Red, Blue, Green and Black) into four binary features (say: IsItRed?, IsItBlue?, IsItGreen? and IsItBlack?). In general, if we start from a categorical feature that takes V values, we can map it to V-many binary indicator features.

Question: The computer scientist in you might be saying: actually we could map it to log2 V-many binary features! Is this a good idea or not?

With that, you should be able to take a data set and map each example to a feature vector through the following mapping (a small code sketch follows the list):

• Real-valued features get copied directly.

• Binary features become 0 (for false) or 1 (for true).

• Categorical features with V possible values get mapped to V-many binary indicator features.
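Here is a small sketch of that mapping (our own, with invented feature names) for a single example with one real-valued feature, one binary feature, and one categorical feature taking V = 4 values:

```python
COLORS = ["Red", "Blue", "Green", "Black"]  # the V = 4 categorical values

def to_feature_vector(weight: float, is_ripe: bool, color: str) -> list[float]:
    """Real-valued features are copied, binary features become 0/1,
    and the categorical color becomes V-many binary indicator features."""
    indicators = [1.0 if color == c else 0.0 for c in COLORS]
    return [weight, 1.0 if is_ripe else 0.0] + indicators

# A ripe, 150-gram, red object -> [150.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(to_feature_vector(150.0, True, "Red"))
```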
After this mapping, you can think of a single example as a vector in a high-dimensional feature space. If you have D-many features (after expanding categorical features), then this feature vector will have D-many components. We will denote feature vectors as x = ⟨x_1, x_2, ..., x_D⟩, so that x_d denotes the value of the dth feature of x. Since these are vectors with real-valued components in D dimensions, we say that they belong to the space R^D.

For D = 2, our feature vectors are just points in the plane, like in Figure 3.1. For D = 3 this is three-dimensional space. For D > 3 it becomes quite hard to visualize. (You should resist the temptation to think of D = 4 as "time" – this will just make things confusing.) Unfortunately, for the sorts of problems you will encounter in machine learning, D ≈ 20 is considered "low dimensional," D ≈ 1000 is "medium dimensional" and D ≈ 100000 is "high dimensional."

Question: Can you think of problems (perhaps ones already mentioned in this book!) that are low dimensional? That are medium dimensional? That are high dimensional?

3.2 K-Nearest Neighbors

The biggest advantage to thinking of examples as vectors in a high-dimensional space is that it allows us to apply geometric concepts to machine learning. For instance, one of the most basic things that one can do in a vector space is compute distances. In two-dimensional space, the distance between ⟨2, 3⟩ and ⟨6, 1⟩ is given by \sqrt{(2-6)^2 + (3-1)^2} = \sqrt{20} \approx 4.47. In general, in D-dimensional space, the Euclidean distance between vectors a and b is given by Eq (3.1) (see Figure 3.2 for geometric intuition in three dimensions):

    d(a, b) = \left[ \sum_{d=1}^{D} (a_d - b_d)^2 \right]^{1/2}    (3.1)
Now that you have access to distances between examples, you can start thinking about what it means to learn again. Consider Figure 3.3. We have a collection of training data consisting of positive examples and negative examples. There is a test point marked by a question mark. Your job is to guess the correct label for that point.

Figure 3.2: A figure showing Euclidean distance in three dimensions. The lengths of the green segments are 0.6, 0.6 and 0.3 respectively, in the x-, y- and z-axes. The total distance between the red dot and the orange dot is therefore \sqrt{0.6^2 + 0.6^2 + 0.3^2} = 0.9.

Question: Verify that d from Eq (3.1) gives the same result (4.47) for the previous computation.

Most likely, you decided that the label of this test point is positive. One reason why you might have thought that is that you believe that the label for an example should be similar to the label of nearby points. This is an example of a new form of inductive bias.

The nearest neighbor classifier is built upon this insight. In comparison to decision trees, the algorithm is ridiculously simple. At training time, we simply store the entire training set. At test time, we get a test example x̂. To predict its label, we find the training example x that is most similar to x̂. In particular, we find the training example x that minimizes d(x, x̂). Since x is a training example, it has a corresponding label, y. We predict that the label of x̂ is also y.
Despite its simplicity, this nearest neighbor classifier is incredibly effective. (Some might say frustratingly effective.) However, it is particularly prone to overfitting label noise. Consider the data in Figure 3.4. You would probably want to label the test point positive. Unfortunately, its nearest neighbor happens to be negative. Since the nearest neighbor algorithm only looks at the single nearest neighbor, it cannot consider the "preponderance of evidence" that this point should probably actually be a positive example. It will make an unnecessary error.

Figure 3.4: A figure showing an easy NN classification problem where the test point is a ? and should be positive, but its NN is actually a negative point that's noisy.

A solution to this problem is to consider more than just the single nearest neighbor when making a classification decision. We can consider the K-nearest neighbors and let them vote on the correct class for this test point. If you consider the 3-nearest neighbors of the test point in Figure 3.4, you will see that two of them are positive and one is negative. Through voting, positive would win.

Question: Why is it a good idea to use an odd number for K?

The full algorithm for K-nearest neighbor classification is given in Algorithm 3. Note that there actually is no "training" phase for K-nearest neighbors. In this algorithm we have introduced five new conventions:

1. The training data is denoted by D.

2. We assume that there are N-many training examples.

3. These examples are pairs (x_1, y_1), (x_2, y_2), ..., (x_N, y_N). (Warning: do not confuse x_n, the nth training example, with x_d, the dth feature for example x.)

4. We use [] to denote an empty list and ⊕ to append an element to that list.

5. Our prediction on x̂ is called ŷ.

The first step in this algorithm is to compute distances from the test point to all training points (lines 2-4). The data points are then sorted according to distance. We then apply a clever trick of summing the class labels for each of the K nearest neighbors (lines 6-10) and using the sign of this sum as our prediction.

Question: Why is the sign of the sum computed in lines 6-10 the same as the majority vote of the associated training examples?

The big question, of course, is how to choose K. As we've seen, with K = 1, we run the risk of overfitting. On the other hand, if K is large (for instance, K = N), then KNN-Predict will always predict the majority class. Clearly that is underfitting. So, K is a hyperparameter of the KNN algorithm that allows us to trade off between overfitting (small value of K) and underfitting (large value of K).

Question: Why can't you simply pick the value of K that does best on the training data? In other words, why do we have to treat it like a hyperparameter rather than just a parameter?

Algorithm 3 KNN-Predict(D, K, x̂)

1:  S ← []
2:  for n = 1 to N do
3:      S ← S ⊕ ⟨d(x_n, x̂), n⟩      // store distance to training example n
4:  end for
5:  S ← sort(S)                      // put lowest-distance objects first
6:  ŷ ← 0
7:  for k = 1 to K do
8:      ⟨dist, n⟩ ← S_k              // n is the kth closest data point
9:      ŷ ← ŷ + y_n                  // vote according to the label for the nth training point
10: end for
11: return sign(ŷ)                   // return +1 if ŷ > 0 and −1 if ŷ < 0
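For reference, here is a minimal Python sketch of KNN-Predict following the same conventions (labels y_n ∈ {−1, +1}); the function name and the toy data are our own, not part of the pseudocode:

```python
import math

def knn_predict(data, k, x_hat):
    """data: list of (x_n, y_n) pairs with y_n in {-1, +1}; x_hat: query point.
    Mirrors Algorithm 3: compute all distances, sort, sum the K nearest labels."""
    # lines 2-4: store (distance to training example n, index n)
    scored = [(math.dist(x_n, x_hat), n) for n, (x_n, _) in enumerate(data)]
    scored.sort()                      # line 5: lowest-distance objects first
    y_hat = 0.0                        # line 6
    for dist, n in scored[:k]:         # lines 7-10: vote with the K nearest labels
        y_hat += data[n][1]
    return 1 if y_hat > 0 else -1      # line 11: sign of the vote

# Tiny example: three training points in 2-d, query near the two positives.
train = [((0.0, 0.0), +1), ((0.1, 0.2), +1), ((3.0, 3.0), -1)]
print(knn_predict(train, k=3, x_hat=(0.2, 0.1)))  # -> +1
```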

Figure 3.5: A figure of a ski and a snowboard.

One aspect of inductive bias that we've seen for KNN is that it assumes that nearby points should have the same label. Another aspect, which is quite different from decision trees, is that all features are equally important! Recall that for decision trees, the key question was which features are most useful for classification? The whole learning algorithm for a decision tree hinged on finding a small set of good features. This is all thrown away in KNN classifiers: every feature is used, and they are all used the same amount. This means that if you have data with only a few relevant features and lots of irrelevant features, KNN is likely to do poorly.

A related issue with KNN is feature scale. Suppose that we are trying to classify whether some object is a ski or a snowboard (see Figure 3.5). We are given two features about this data: the width and the height. As is standard in skiing, width is measured in millimeters and height is measured in centimeters. Since there are only two features, we can actually plot the entire training set; see Figure 3.6, where ski is the positive class. Based on this data, you might guess that a KNN classifier would do well.

Figure 3.6: Classification data for ski vs snowboard in 2d.

Suppose, however, that our measurement of the width was computed in centimeters (instead of millimeters). This yields the data shown in Figure 3.7. Since the width values are now tiny in comparison to the height values, a KNN classifier will effectively ignore the width values and classify almost purely based on height. The predicted class for the displayed test point has changed because of this feature scaling.

Figure 3.7: Classification data for ski vs snowboard in 2d, with width rescaled to cm.

We will discuss feature scaling more in Chapter 5. For now, it is just important to keep in mind that KNN does not have the power to decide which features are important.
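To see the scaling issue numerically, here is a small sketch with made-up measurements (not the data behind Figures 3.6 and 3.7) showing that the nearest neighbor of a query can flip when one feature's units change:

```python
import math

# (width, height) pairs; label +1 = ski, -1 = snowboard (illustrative numbers only)
train = [((100.0, 162.0), +1), ((180.0, 150.0), -1)]
query = (110.0, 151.0)

def nearest_label(examples, q):
    """Return the label of the single nearest training example to q."""
    return min(examples, key=lambda ex: math.dist(ex[0], q))[1]

print(nearest_label(train, query))  # widths on a comparable scale: ski (+1) wins

# Rescale width by 1/10 (e.g., mm -> cm): width differences become negligible.
train_cm = [((w / 10, h), y) for (w, h), y in train]
query_cm = (query[0] / 10, query[1])
print(nearest_label(train_cm, query_cm))  # height now dominates: snowboard (-1) wins
```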

MATH REVIEW | VECTOR ARITHMETIC AND VECTOR NORMS

A (real-valued) vector is just an array of real values, for instance x = ⟨1, 2.5, 6⟩ is a three-dimensional vector. In general, if x = ⟨x_1, x_2, ..., x_D⟩, then x_d is its dth component. So x_3 = 6 in the previous example.

Vector sums are computed pointwise, and are only defined when dimensions match, so ⟨1, 2.5, 6⟩ + ⟨2, −2.5, −3⟩ = ⟨3, 0, 3⟩. In general, if c = a + b then c_d = a_d + b_d for all d. Vector addition can be viewed geometrically as taking a vector a, then tacking on b to the end of it; the new end point is exactly c.

Vectors can be scaled by real values; for instance 2⟨1, 2.5, 6⟩ = ⟨2, 5, 12⟩; this is called scalar multiplication. In general, ax = ⟨ax_1, ax_2, ..., ax_D⟩.

The norm of a vector x, written ||x||, is its length. Unless otherwise specified, this is its Euclidean length, namely: ||x|| = \sqrt{\sum_d x_d^2}.
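These operations map directly onto array libraries; here is a minimal NumPy sketch (our own) of the sum, scalar multiplication and norm above:

```python
import numpy as np

x = np.array([1.0, 2.5, 6.0])
y = np.array([2.0, -2.5, -3.0])

print(x + y)              # pointwise sum: [3. 0. 3.]
print(2 * x)              # scalar multiplication: [ 2.  5. 12.]
print(np.linalg.norm(x))  # Euclidean norm: sqrt(1 + 6.25 + 36) ≈ 6.58
```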

3.3 Decision Boundaries

The standard way that we've been thinking about learning algorithms up to now is in the query model. Based on training data, you learn something. I then give you a query example and you have to guess its label.

An alternative, less passive, way to think about a learned model is to ask: what sort of test examples will it classify as positive, and what sort will it classify as negative? In Figure 3.9, we have a set of training data. The background of the image is colored blue in regions that would be classified as positive (if a query were issued there) and colored red in regions that would be classified as negative. This coloring is based on a 1-nearest neighbor classifier.

Figure 3.9: Decision boundary for 1-NN.

In Figure 3.9, there is a solid line separating the positive regions from the negative regions. This line is called the decision boundary for this classifier. It is the line with positive land on one side and negative land on the other side.

Decision boundaries are useful ways to visualize the complexity of a learned model. Intuitively, a learned model with a decision boundary that is really jagged (like the coastline of Norway) is really complex and prone to overfitting. A learned model with a decision boundary that is really simple (like the boundary between Arizona and Utah) is potentially underfit.

Figure 3.10: Decision boundary for KNN with K = 3.
Now that you know about decision boundaries, it is natural to ask: what do decision boundaries for decision trees look like? In order to answer this question, we have to be a bit more formal about how to build a decision tree on real-valued features. (Remember that the algorithm you learned in the previous chapter implicitly assumed binary feature values.) The idea is to allow the decision tree to ask questions of the form: "is the value of feature 5 greater than 0.2?" That is, for real-valued features, the decision tree nodes are parameterized by a feature and a threshold for that feature. An example decision tree for classifying skis versus snowboards is shown in Figure 3.11.

Figure 3.11: Decision tree for ski vs. snowboard.

Now that a decision tree can handle feature vectors, we can talk about decision boundaries. By example, the decision boundary for the decision tree in Figure 3.11 is shown in Figure 3.12. In the figure, space is first split in half according to the first query along one axis. Then, depending on which half of the space you look at, it is either split again along the other axis, or simply classified.

Figure 3.12: Decision boundary for the decision tree in the previous figure.

Figure 3.12 is a good visualization of decision boundaries for decision trees in general. Their decision boundaries are axis-aligned cuts. The cuts must be axis-aligned because nodes can only query on a single feature at a time. In this case, since the decision tree was so shallow, the decision boundary was relatively simple.

Question: What sort of data might yield a very simple decision boundary with a decision tree and a very complex decision boundary with 1-nearest neighbor? What about the other way around?
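To make the threshold-node idea above concrete, here is a tiny hand-written sketch of a depth-2 tree of the kind Figure 3.11 describes; the thresholds and units are invented for illustration, not taken from the figure:

```python
def tree_predict(width_mm: float, height_cm: float) -> str:
    """A tiny depth-2 decision tree with axis-aligned threshold tests.
    Thresholds are made up; a real tree would learn them from data."""
    if width_mm > 150.0:         # first query: split along the width axis
        return "snowboard"
    elif height_cm > 120.0:      # second query: split the remaining half along height
        return "ski"
    else:
        return "snowboard"

print(tree_predict(width_mm=100.0, height_cm=165.0))  # -> "ski"
```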

3.4 K-Means Clustering

Up through this point, you have learned all about supervised learning (in particular, binary classification). As another example of the use of geometric intuitions and data, we are going to temporarily consider an unsupervised learning problem. In unsupervised learning, our data consists only of examples x_n and does not contain corresponding labels. Your job is to make sense of this data, even though no one has provided you with correct labels. The particular notion of "making sense of" that we will talk about now is the clustering task.

Consider the data shown in Figure 3.13. Since this is unsupervised learning and we do not have access to labels, the data points are simply drawn as black dots. Your job is to split this data set into three clusters. That is, you should label each data point as A, B or C in whatever way you want.

Figure 3.13: Simple clustering data, with clusters in the upper-left, upper-right and bottom-center.

For this data set, it's pretty clear what you should do. You probably labeled the upper-left set of points A, the upper-right set of points B and the bottom set of points C. Or perhaps you permuted these labels. But chances are your clusters were the same as mine.

The K-means clustering algorithm is a particularly simple and effective approach to producing clusters on data like you see in Figure 3.13. The idea is to represent each cluster by its cluster center. Given cluster centers, we can simply assign each point to its nearest center.
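The excerpt ends here; as a generic illustration of the cluster-center idea (assign each point to its nearest center, then recompute each center as the mean of its assigned points), here is a minimal K-means sketch. It is a standard formulation, not code from this chapter:

```python
import math
import random

def kmeans(points, k, iterations=20):
    """points: list of (x, y) tuples. Returns (centers, assignments)."""
    centers = random.sample(points, k)  # initialize centers from the data
    for _ in range(iterations):
        # assignment step: each point goes to its nearest center
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # update step: each center becomes the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign

# Three obvious blobs, like Figure 3.13 (upper-left, upper-right, bottom).
data = [(0, 5), (0.2, 5.1), (5, 5), (5.1, 4.9), (2.5, 0), (2.6, 0.2)]
centers, assign = kmeans(data, k=3)
print(assign)  # points in the same blob end up with the same cluster id
```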
