New Data Science Module Nearest Neighbors

The document provides an overview of the k-Nearest Neighbors (kNN) classification method, explaining its general concept of classifying points based on the majority class of their neighbors. It includes examples of how to assign labels to data points, the importance of choosing an appropriate value for k, and the necessity of scaling features for accurate distance calculations. Additionally, it demonstrates the implementation of kNN in Python using the scikit-learn library with practical examples and visualizations.


BU MET CS-677: Data Science With Python, v.2.0

kNN - Nearest Neighbors Classification


General Idea

• points in the same class are usually "neighbors"
• assign the class held by the majority of a point's neighbors
• requires a notion of distance between points
• requires choosing k, the number of neighbors to consult
• note: with two classes, k must be odd for a simple majority to always exist
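
A minimal pure-Python sketch of this idea, using the same toy points as the scikit-learn illustration a few slides later (illustrative only, not the course's reference implementation):

from collections import Counter
import math

# toy labeled points: (x, y, label)
points = [(1, 2, "green"), (6, 4, "red"), (7, 5, "red"),
          (10, -1, "green"), (10, 2, "green"), (15, 2, "red")]

def knn_label(query, points, k):
    # sort the labeled points by Euclidean distance to the query
    by_distance = sorted(points, key=lambda p: math.dist(query, p[:2]))
    # majority vote among the labels of the k nearest points
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_label((3, 2), points, k=3))   # -> 'red'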


Example of kNN

[Figure: scatter plot of six labeled points (green and red) in the X-Y plane, with two unlabeled query points A and B]

• what labels for A and B?



Assigning a Label for A



[Figure: zoomed view of query point A and the five nearest labeled points x1-x5]

point   k   neighbors              majority
A       1   x1                     green
A       3   x1, x2, x3             red
A       5   x1, x2, x3, x4, x5     green


Assigning a Label for B



[Figure: zoomed view of query point B and the five nearest labeled points x1-x5]

point   k   neighbors              majority
B       1   x2                     red
B       3   x2, x3, x5             red
B       5   x1, x2, x3, x4, x5     green


How to Choose k
[Figure: the scatter plot from the kNN example, with query points A and B and their candidate neighbors x1-x5]

point   k   neighbors              majority
A       1   x1                     green
A       3   x1, x2, x3             red
A       5   x1, x2, x3, x4, x5     green
B       1   x2                     red
B       3   x2, x3, x5             red
B       5   x1, x2, x3, x4, x5     green
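
• the predicted label can flip as k changes (A: green, red, green for k = 1, 3, 5), so k should be chosen by measuring error on held-out data, as done in "Calculating k" below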


Illustration in Python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6],
     "Label": ["green", "red", "red",
               "green", "green", "red"],
     "X": [1, 6, 7, 10, 10, 15],
     "Y": [2, 4, 5, -1, 2, 2]},
    columns=["id", "Label", "X", "Y"])
X = data[["X", "Y"]].values
Y = data["Label"].values                # 1-D label vector
knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)
new_instance = np.array([[3, 2]])       # one row, two features
prediction = knn_classifier.predict(new_instance)

ipdb> prediction[0]
'red'
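
To see which training points cast the votes, scikit-learn's kneighbors method returns the distances and row indices of the k nearest neighbors (a quick check, assuming the objects above are still in scope):

distances, indices = knn_classifier.kneighbors(new_instance)
# indices refer to rows of X; for the query (3, 2) the three nearest
# rows are ids 1, 2, 3 with labels green, red, red -> majority 'red'
print(indices[0], data["Label"].values[indices[0]])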


A Numerical Example

object xi   Height (H)   Weight (W)   Foot (F)   Label (L)
x1          5.00         100           6         green
x2          5.50         150           8         green
x3          5.33         130           7         green
x4          5.75         150           9         green
x5          6.00         180          13         red
x6          5.92         190          11         red
x7          5.58         170          12         red
x8          5.92         165          10         red

• note the very different scales of the three features


What is the Label?

[Figure: 3D scatter of Height, Weight, and Foot for the eight labeled points]

(H=6, W=160, F=10) ↦ ?


kNN in Python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6, 7, 8],
     "Label": ["green", "green", "green", "green",
               "red", "red", "red", "red"],
     "Height": [5.00, 5.50, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
     "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
     "Foot": [6, 8, 7, 9, 13, 11, 12, 10]},
    columns=["id", "Height", "Weight", "Foot", "Label"])

X = data[["Height", "Weight", "Foot"]].values
Y = data["Label"].values            # 1-D label vector

# scale features to zero mean and unit variance
scaler = StandardScaler().fit(X)
X = scaler.transform(X)

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

# the new instance must be scaled with the *same* fitted scaler
new_instance = np.array([[6, 160, 10]])
new_instance_scaled = scaler.transform(new_instance)
prediction = knn_classifier.predict(new_instance_scaled)

ipdb> prediction[0]
'red'
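
The scale-then-classify steps can also be bundled with scikit-learn's make_pipeline, so the scaler is re-applied automatically at prediction time; a sketch, assuming the raw (unscaled) features are rebuilt from the DataFrame above:

from sklearn.pipeline import make_pipeline

X_raw = data[["Height", "Weight", "Foot"]].values
model = make_pipeline(StandardScaler(),
                      KNeighborsClassifier(n_neighbors=3))
model.fit(X_raw, Y)                    # fits the scaler, then the classifier
print(model.predict(np.array([[6, 160, 10]])))   # -> 'red', as above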

Result Without Scaling


import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6, 7, 8],
     "Label": ["green", "green", "green", "green",
               "red", "red", "red", "red"],
     "Height": [5.00, 5.50, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
     "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
     "Foot": [6, 8, 7, 9, 13, 11, 12, 10]},
    columns=["id", "Height", "Weight", "Foot", "Label"])

X = data[["Height", "Weight", "Foot"]].values   # raw, unscaled features
Y = data["Label"].values

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

new_instance = np.array([[6, 160, 10]])
prediction = knn_classifier.predict(new_instance)

ipdb> prediction[0]
'red'
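
• the prediction happens to agree with the scaled result here, but the two models consult different neighbor sets; the next two slides show why scaling matters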


Why Scaling?

[Figure: 3D scatter of the unscaled data; the points spread almost entirely along the Weight axis]

• (Euclidean) distances d(·) are dominated by one dimension: Weight, whose numeric range (100-190) dwarfs those of Height and Foot


Effect of Scaling

[Figure: 3D scatter of the same data after standardization; all three axes now span roughly -2 to 2]

• without scaling: d(x7, x8) < d(x4, x8)

• with scaling: d(x7, x8) > d(x4, x8)
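
A quick numerical check of these two inequalities, using the eight points from the numerical example (a sketch; printed distances are rounded):

import numpy as np
from sklearn.preprocessing import StandardScaler

# rows x1..x8: Height, Weight, Foot
X = np.array([[5.00, 100,  6], [5.50, 150,  8], [5.33, 130,  7],
              [5.75, 150,  9], [6.00, 180, 13], [5.92, 190, 11],
              [5.58, 170, 12], [5.92, 165, 10]])
X_scaled = StandardScaler().fit_transform(X)

def d(A, i, j):
    return np.linalg.norm(A[i] - A[j])

print(d(X, 6, 7), d(X, 3, 7))                # ~5.40 < ~15.03  (raw)
print(d(X_scaled, 6, 7), d(X_scaled, 3, 7))  # ~1.38 > ~0.88   (scaled)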



Calculating k
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6, 7, 8],
     "Label": ["green", "green", "green", "green",
               "red", "red", "red", "red"],
     "Height": [5.00, 5.50, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
     "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
     "Foot": [6, 8, 7, 9, 13, 11, 12, 10]},
    columns=["id", "Height", "Weight", "Foot", "Label"])

X = data[["Height", "Weight", "Foot"]].values
Y = data["Label"].values    # 1-D, so pred_k != Y_test compares elementwise

scaler = StandardScaler().fit(X)
X = scaler.transform(X)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.5, random_state=0)

error_rate = []
for k in [1, 3]:
    knn_classifier = KNeighborsClassifier(n_neighbors=k)
    knn_classifier.fit(X_train, Y_train)
    pred_k = knn_classifier.predict(X_test)
    error_rate.append(np.mean(pred_k != Y_test))

ipdb> error_rate
[0.5, 0.5]
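
• with only 8 points (4 in the test split), each misclassified point moves the error rate by 0.25, so these estimates are very coarse; the IRIS example below repeats the procedure on a larger dataset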

Calculating k for IRIS


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

url = r'https://archive.ics.uci.edu/ml/' + \
      r'machine-learning-databases/iris/iris.data'

iris_feature_names = ['sepal-length', 'sepal-width',
                      'petal-length', 'petal-width']

data = pd.read_csv(url, names=iris_feature_names + ['Class'])

# keep only two of the three classes for a binary problem
class_labels = ['Iris-versicolor', 'Iris-virginica']
data = data[data['Class'].isin(class_labels)]

X = data[iris_feature_names].values

scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

# encode the string class labels as integers (0/1)
le = LabelEncoder()
Y = le.fit_transform(data['Class'].values)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.5, random_state=3)


Calculating k for IRIS (cont'd)

from matplotlib.ticker import MaxNLocator

error_rate = []
for k in range(1, 21, 2):
    knn_classifier = KNeighborsClassifier(n_neighbors=k)
    knn_classifier.fit(X_train, Y_train)
    pred_k = knn_classifier.predict(X_test)
    error_rate.append(np.mean(pred_k != Y_test))

plt.figure(figsize=(10, 4))
ax = plt.gca()
ax.xaxis.set_major_locator(MaxNLocator(integer=True))   # integer ticks for k
plt.plot(range(1, 21, 2), error_rate, color='red', linestyle='dashed',
         marker='o', markerfacecolor='black', markersize=10)
plt.title('Error Rate vs. k for Iris Subset')
plt.xlabel('number of neighbors: k')
plt.ylabel('Error Rate')
plt.show()


Calculating k for IRIS


[Figure: Error Rate vs. k for Iris-versicolor and Iris-virginica; the error rate varies between roughly 0.02 and 0.06 over the odd values k = 1..19]


k for IRIS
[Figure: 3D scatter of sepal-length, sepal-width, and petal-length for Iris-setosa, Iris-versicolor, and Iris-virginica; setosa is clearly separated from the other two classes]

[Figure: Error Rate vs. k for Iris-setosa and Iris-virginica; the error rate is essentially 0 for every k, reflecting that separation]


A Categorical Dataset

Day   Weather    Temperature   Wind   Play
1     sunny      hot           low    no
2     rainy      mild          high   yes
3     sunny      cold          low    yes
4     rainy      cold          high   no
5     sunny      cold          high   yes
6     overcast   mild          low    yes
7     sunny      hot           low    yes
8     overcast   hot           high   yes
9     rainy      hot           high   no
10    rainy      mild          low    yes

• what label for x* = (sunny, cold, low)?
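
One-hot encoding (next slide) makes kNN applicable here: each differing attribute contributes 1² + 1² = 2 to the squared Euclidean distance between two encoded rows, so ranking neighbors by Euclidean distance is equivalent to ranking by the number of differing attributes. Note that x* agrees with Day 3 on all three attributes, so its nearest neighbor lies at distance 0.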


Python Code
import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame(
    {'Day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'Weather': ['sunny', 'rainy', 'sunny', 'rainy', 'sunny', 'overcast',
                 'sunny', 'overcast', 'rainy', 'rainy'],
     'Temperature': ['hot', 'mild', 'cold', 'cold', 'cold', 'mild',
                     'hot', 'hot', 'hot', 'mild'],
     'Wind': ['low', 'high', 'low', 'high', 'high', 'low', 'low',
              'high', 'high', 'low'],
     'Play': ['no', 'yes', 'yes', 'no', 'yes', 'yes', 'yes',
              'yes', 'no', 'yes']},
    columns=['Day', 'Weather', 'Temperature', 'Wind', 'Play'])

input_data = data[['Weather', 'Temperature', 'Wind']]
# one-hot encode each attribute; dummy columns come out alphabetically:
# overcast, rainy, sunny | cold, hot, mild | high, low
dummies = [pd.get_dummies(data[c]) for c in input_data.columns]
binary_data = pd.concat(dummies, axis=1)

X = binary_data.values

le = LabelEncoder()
Y = le.fit_transform(data['Play'].values)   # no -> 0, yes -> 1

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

# x* = (sunny, cold, low) in the dummy encoding above
new_instance = np.array([[0, 0, 1, 1, 0, 0, 0, 1]])
prediction = knn_classifier.predict(new_instance)

ipdb> prediction[0]
1
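
The encoded prediction can be mapped back to the original label with the fitted LabelEncoder (assuming le from the code above):

le.inverse_transform(prediction)   # -> array(['yes'], ...): the answer is 'yes'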

kNN: IRIS
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

url = r'https://archive.ics.uci.edu/ml/' + \
      r'machine-learning-databases/iris/iris.data'

iris_feature_names = ['sepal-length', 'sepal-width',
                      'petal-length', 'petal-width']
data = pd.read_csv(url, names=iris_feature_names + ['Class'])
class_labels = ['Iris-versicolor', 'Iris-virginica']
data = data[data['Class'].isin(class_labels)]

X = data[iris_feature_names].values

scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
le = LabelEncoder()
Y = le.fit_transform(data['Class'].values)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.5, random_state=3)
knn_classifier = KNeighborsClassifier(n_neighbors=15)
knn_classifier.fit(X_train, Y_train)
prediction = knn_classifier.predict(X_test)
error_rate = np.mean(prediction != Y_test)

ipdb> error_rate
0.06
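
• the two-class subset has 100 samples, so an error rate of 0.06 on the 50-point test set corresponds to 3 misclassified flowers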


Concepts Check:

(a) distances and neighbors
(b) nearest neighbor intuition
(c) the need for feature scaling
(d) how to choose k
(e) analyzing categorical data

