KNN ALGORITHM IN MACHINE LEARNING

K-Nearest Neighbors is a lazy learning algorithm that classifies new data points based on the majority class of its k nearest neighbors. It requires storing all training data and calculating distances between new and stored points, making it computationally expensive for large datasets. Preprocessing techniques like dimensionality reduction and attribute weighting can help address the "curse of dimensionality" issue KNN faces with high-dimensional data.

K-Nearest Neighbors

Day 4
Introduction
At its core, the algorithm works as follows (see the sketch after this list):

• Pick the number of neighbors (k) you want to use for classification or regression
• Choose a method to measure distances
• Keep a data set with labeled records
• For every new point, identify the k nearest neighbors using the distance measure you chose
• Let them vote if it is a classification problem, or take a mean/median for regression
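
The following is a minimal from-scratch sketch of those steps, assuming Python with NumPy and Euclidean distance; the function name and toy data are illustrative, not part of the original slides.

    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
        """Predict the label/value of x_new from its k nearest neighbors."""
        # Distance from the new point to every stored training point
        distances = np.linalg.norm(X_train - x_new, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        neighbor_labels = y_train[nearest]
        if task == "classification":
            # Majority vote among the k neighbors
            values, counts = np.unique(neighbor_labels, return_counts=True)
            return values[np.argmax(counts)]
        # Regression: average the neighbors' target values
        return neighbor_labels.mean()

    # Toy example: two features, binary labels
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0

Note that nothing is "learned" here: the training data is simply stored and all the work happens when a prediction is requested.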
Diagrammatically
• Here, N = 1: the new green point is labeled black because its single nearest neighbor is black.
• Here, N = 3: the new green point is labeled white based on the vote of its three nearest neighbors.
From the algorithm, it is clear that KNN is

• Lazy: This is a technical term! All the techniques we have seen so far have a "training phase" in which they try to identify a function from the training set, and then apply that function to the test data. Such learning is called "eager learning". K-NN, on the other hand, does not generalize: it uses all the training data (or a subset of it) during the testing phase. This type of learning is called lazy learning or instance-based learning.
• K-NN requires more time at prediction, as all data points are needed to make a decision.
• It requires more memory, as all training data must be stored. Each query costs on the order of N·d distance computations, where N is the number of training examples and d is the dimension of each sample, so it is very expensive for large data sets and high dimensions.
• Hence, a lot of effort must be spent on reducing N and d. K-NN does suffer from the curse of dimensionality.
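
As a rough illustration of that N·d query cost (my own sketch, not from the slides; the sizes are arbitrary):

    import numpy as np

    N, d = 100_000, 50                  # training set size and dimensionality
    X_train = np.random.rand(N, d)      # all N*d values must stay in memory
    x_new = np.random.rand(d)

    # One query touches every stored example: N distances, each over d features
    distances = np.linalg.norm(X_train - x_new, axis=1)
    k_nearest = np.argsort(distances)[:5]

Doubling either N or d roughly doubles the work per prediction, which is why reducing both matters so much for K-NN.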
Attributes
Handling the curse of dimensionality
• K-NN is heavily impacted by a huge number of dimensions
• Reduce the dimensions using:
  – Correlation, Principal Component Analysis
  – Gain Ratio, Information Gain (filter approaches: we may lose some attributes that are important)
  – Wrapper methods (forward selection, backward elimination)
  – Weighting attributes
• Scale the attributes
  – Attributes with a larger range can dominate the distance
To understand this, consider the pair of data points (0.1, 20) and (0.9, 720).
• The distance is almost completely dominated by (720 − 20) = 700. To avoid this, we standardize the attributes to force them onto a common value range (see the sketch after this list). The common techniques include:
• Taking logarithms when a variable varies over several orders of magnitude
• Dividing by the highest value to bring a variable between 0 and 1
• Standardizing (z-scores), which brings most of the data between −3 and 3
• Categorical and ordinal variables need to be converted to numeric
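
A minimal sketch of those two scaling options on the toy pair above (NumPy is an assumed tooling choice, not something the slides prescribe):

    import numpy as np

    X = np.array([[0.1,  20.0],
                  [0.9, 720.0]])

    # Divide each column by its largest value -> values roughly in [0, 1]
    X_minmax = X / X.max(axis=0)

    # Z-score standardization: zero mean, unit variance per column
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # After scaling, the second attribute no longer dominates the distance
    print(np.linalg.norm(X_minmax[0] - X_minmax[1]))
    print(np.linalg.norm(X_std[0] - X_std[1]))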
• Handling missing values
  – K-NN is heavily impacted by missing values
  – Imputation is one option (see the sketch after this list)
• Handling overfitting
  – Remove outliers (Wilson Editing)
• Speeding up KNN
  – Condensation
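
One way to do that imputation, sketched here with scikit-learn's KNNImputer (an assumed library choice; the slides do not name a specific tool):

    import numpy as np
    from sklearn.impute import KNNImputer

    X = np.array([[1.0, 2.0],
                  [2.0, np.nan],   # missing value to be filled in
                  [3.0, 6.0],
                  [4.0, 8.0]])

    # Each missing entry is replaced by the mean of that feature
    # over the k nearest rows that have the value present
    imputer = KNNImputer(n_neighbors=2)
    X_filled = imputer.fit_transform(X)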
Feature Engineering
• Library (class)
• kNN produces complex decision surfaces.
• As the complexity of the decision surface increases, accuracy decreases and we need more data.
• Increase k to decrease the over-fit.
• kNN gives no explicability.
• kNN is a distance-based method, so it handles only numeric variables; convert categorical/ordinal values into numerical ones.
• kNN works well in batch mode, not in real time.
• kNN fails when there are missing values (use kNN imputation in data pre-processing to fill them in).
• In kNN, training is easy but predictions are the expensive part (see the library sketch after this list).
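
A minimal end-to-end sketch using scikit-learn's KNeighborsClassifier (an assumed library choice, with an illustrative dataset) showing the cheap "fit", the scaling step, and where k is set:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale features so no single attribute dominates the distance
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # "Training" only stores the data; a larger n_neighbors smooths the decision surface
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_train, y_train)

    # Prediction is where the distance computations (the real work) happen
    print(model.score(X_test, y_test))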
