K-Nearest Neighbor: General Gist
General gist:
• Find the Euclidean distance (distance formula) between the new sample and each of the previous samples.
• Sort the distances obtained in ascending order.
• Pick the class of the new sample according to the classes of its nearest neighbors.
This algorithm is also known as a lazy learner algorithm because it does not actually train a model on the data set. We only need to compute the distances and then look at the nearest neighbors to classify a new sample.
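The test code below calls two helper functions, euclidean_distance and predict_classification, whose definitions appear earlier in the chapter. As a reminder, here is a minimal sketch of what they are assumed to look like (a plain-Python version working on lists of numbers):

from math import sqrt

def euclidean_distance(row1, row2):
    # Sum the squared differences over every feature column,
    # skipping the last column, which holds the class label.
    distance = 0.0
    for i in range(len(row1) - 1):
        distance += (row1[i] - row2[i]) ** 2
    return sqrt(distance)

def predict_classification(train, test_row, num_neighbors):
    # Distance from the test row to every training row; keep the
    # num_neighbors closest rows and vote on their class labels.
    distances = [(row, euclidean_distance(test_row, row)) for row in train]
    distances.sort(key=lambda pair: pair[1])
    neighbors = [pair[0] for pair in distances[:num_neighbors]]
    labels = [row[-1] for row in neighbors]
    return max(set(labels), key=labels.count)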
Now we will test all our functions by executing the main part of the code:
# Test the prediction function (main part of the code)
dataset = [[2.7810836, 2.550537003, 0],
[1.465489372, 2.362125076, 0],
[3.396561688, 4.400293529, 0],
[1.38807019, 1.850220317, 0],
[3.06407232, 3.005305973, 0],
[7.627531214, 2.759262235, 1],
[5.332441248, 2.088626775, 1],
[6.922596716, 1.77106367, 1],
[8.675418651, -0.242068655, 1],
[7.673756466, 3.508563011, 1]] # Sample data set
prediction = predict_classification(dataset, dataset[0], 3) # Call the predict_classification function
print('Expected %d, Got %d.' % (dataset[0][-1], prediction))
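With the helper functions sketched above, running this should print something like "Expected 0, Got 0.", since the first row's own class (0) dominates its three nearest neighbors.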
We pass the whole dataset as the first argument; this is the 'train' parameter. dataset[0] means we are sending the first row of the dataset as the new test sample.
Pandas is used for data manipulation and analysis. We will use pandas to read a CSV (comma separated values) file, which holds the data for three different types of flowers. We will import the pandas library under the alias 'pd'.
NumPy is a powerful scientific library designed for array operations, giving Python much of the mathematical edge that MATLAB provides. We will import numpy under the alias 'np'.
The operator module is used when sorting an iterable data type, to specify which column should be the subject of the sort.
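In code, these three imports look like:

import pandas as pd   # data manipulation and CSV reading
import numpy as np    # array operations (square, sqrt)
import operator       # itemgetter, used when sorting the distances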
The first step is to read the CSV file correctly. First download the .CSV file and place it in the working directory of the Python project.
When we open the .CSV file we will notice that the comma separated values do not have any column names, so we will provide the column names through code.
col = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'type']
iris = pd.read_csv("IrisDataset.csv", names=col)
print("First Five Entries:")
print(iris.head())
print("Columns of data set: ", iris.columns)
print("Dimension: ", iris.shape)
print("Size: ", iris.size)
Lists work like arrays in Python, so we put the column names as strings in a list. pd.read_csv creates an object which we store in the variable 'iris'. The first argument to this method is the file name of the .CSV file we placed in the directory, and the 'col' list is passed as the names argument, which labels the columns according to the comma separated values in the .CSV file.
We can print the first five data rows of the .CSV file by calling the head method on the object 'iris' as 'iris.head()'.
We can view the columns in a similar fashion as 'iris.columns'.
The dimensions and size of the data set can also be inspected. All of these methods and attributes are defined in the pandas library.
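Assuming the standard 150-row Iris data, 'iris.shape' would print (150, 5) and 'iris.size' would print 750 (150 rows times 5 columns).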
Once the data set is read into the 'iris' object, we can proceed to the first step of the algorithm: finding the distance between the sample point and each entry of the dataset.
test_set = [[7.9, 3.8, 6.4, 2.0]]
test_sample = pd.DataFrame(test_set)  # convert the new sample to a pandas DataFrame
length = test_sample.shape[1]         # number of feature columns
distances = {}
for x in range(len(dataset)):
    dist = euclidean_distance(test_sample, dataset.iloc[x], length)
    distances[x] = dist[0]
sorted_d = sorted(distances.items(), key=operator.itemgetter(1))
We first set the values at which we want to predict the type of the flower. To do this we make a list, set the values, and assign it to 'test_set'. The whole algorithm works on pandas objects, so we have to convert 'test_set' to a pandas DataFrame in the same shape as the 'iris' data we read earlier, with each value representing a separate column.
It is passed into an enclosing function where we find the length (number of columns) of the test_sample; this length is later passed into the euclidean_distance function.
We use a for loop to send each row of the dataset one by one. The loop runs once per row, hence 'len(dataset)' returns the number of rows. In the call to the euclidean_distance function we pass 'test_sample' as argument 1 and 'dataset.iloc[x]' as argument 2. iloc is a pandas indexer which stands for integer location; it returns the x-th row, where x is the argument of iloc. This loop runs for every row of 'dataset'. The third argument is 'length', which holds the length of the test_sample.
In the euclidean_distance function we first assign 0.0 to make distance a float variable.
The for loop runs up to the length of the test_sample we passed earlier. This is because we don't want to include the type of the flower while calculating distances; we only want to perform the calculation on the four features.
So the loop runs four times, producing the following equation:
Distance = √((X − x)² + (Y − y)² + (Z − z)² + (W − w)²)
where X, Y, Z, W are the four features of the 'test_sample' and x, y, z, w are the features of each row passed in as row2. This is achieved using numpy's square and sqrt methods, and the function returns the distance.
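The exact function body is not reproduced in this section; a minimal sketch matching the three-argument call used in the loop above (the names test_sample, row2 and length follow the description) could look like this:

def euclidean_distance(test_sample, row2, length):
    distance = 0.0                                  # start as a float
    for i in range(length):                         # only the four feature columns, not the type
        distance += np.square(test_sample[i] - row2.iloc[i])
    return np.sqrt(distance)                        # one-element result, hence dist[0] in the caller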
This function returns a distance for each of the 'len(dataset)' rows. Each value is put into a dictionary one by one; this dictionary is named 'distances'.
The next step is to sort the distances to the 'test_sample' in ascending order. This is achieved with the sorted function, where we take the items of the distances dictionary and sort with the distance values as the subject. This is specified by 'key=operator.itemgetter(1)', where 1 indicates the values in the dictionary. Likewise, itemgetter(0) would indicate the keys in the dictionary and sort with the keys as the subject, as in the small example below.
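For instance, with a small hypothetical distances dictionary:

distances = {0: 2.5, 1: 0.7, 2: 1.9}
sorted_d = sorted(distances.items(), key=operator.itemgetter(1))
# sorted_d is now [(1, 0.7), (2, 1.9), (0, 2.5)]: (key, value) tuples ordered by
# the distance values; itemgetter(0) would instead order them by the keys.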
Once we have the sorted dictionary, we move on to finding the nearest neighbors of the 'test_sample'.
neighbors = []
for x in range(k):
    neighbors.append(sorted_d[x][0])
We define a variable 'neighbors' as a list. The loop runs over the range defined by k: we take the first k keys (row indices) of the closest values from the sorted distances variable 'sorted_d', and we have our neighbors.
The last step in this process is to determine the class of the neighbors. We do this by first making a dictionary which holds all the types of flower present in the dataset.
This is stored in the variable 'counts'. A for loop then determines the type of each neighbor. This is done by taking the iloc value at the index of the closest neighbor; [-1] reads the row from the end, and the last column is the type of the flower. The whole statement assigns the flower type of the nearest neighbor to the 'response' variable.
We then increment the value in 'counts' each time a specific type of flower is encountered.
We reverse-sort the items of 'counts' in descending order to get the maximum count at the very first key: value position.
Finally we return the type of flower with the maximum count, which makes up our prediction.
knn is the enclosing function, which also makes the calls to euclidean_distance. The counting and voting code is not reproduced in this section; a sketch of how the pieces described above fit together is given below.
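A minimal sketch of the enclosing knn function, assuming the variable names used in the explanation (counts, response, sorted_d, neighbors) and the 'type' column name from the col list:

def knn(dataset, test_sample, k):
    length = test_sample.shape[1]                        # number of feature columns
    # Step 1: distance from the test sample to every row of the dataset
    distances = {}
    for x in range(len(dataset)):
        dist = euclidean_distance(test_sample, dataset.iloc[x], length)
        distances[x] = dist[0]
    # Step 2: sort the distances in ascending order
    sorted_d = sorted(distances.items(), key=operator.itemgetter(1))
    # Step 3: keep the row indices of the k nearest neighbors
    neighbors = []
    for x in range(k):
        neighbors.append(sorted_d[x][0])
    # Step 4: count the flower type of each neighbor
    counts = {flower: 0 for flower in dataset['type'].unique()}
    for x in range(len(neighbors)):
        response = dataset.iloc[neighbors[x]].iloc[-1]   # last column holds the type
        counts[response] += 1
    # Step 5: reverse-sort the counts and return the most frequent type
    sorted_counts = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
    return sorted_counts[0][0]

With the 'iris' DataFrame read earlier and the 'test_sample' built above, calling knn(iris, test_sample, 5) would return the predicted flower type; the value of k is a free choice.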