Homework 1 DSCI 552, Instructor: Mohammad Reza Rajati

1. Vertebral Column Data Set

This biomedical data set was built by Dr. Henrique da Mota during a medical residency
period in Lyon, France. Each patient in the data set is represented by six
biomechanical attributes derived from the shape and orientation of the pelvis
and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle,
sacral slope, pelvic radius and grade of spondylolisthesis. The following class labels
are used: Disk Hernia (DH), Spondylolisthesis (SL), Normal (NO) and Abnormal (AB).
In this exercise, we only focus on a binary classification task with NO = 0
and AB = 1.[1]
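A minimal sketch of the label conversion that footnote 1 asks for. It assumes (not verified here) that the two-class data file stores six whitespace-separated numeric columns followed by the label `AB` or `NO`. The function name `parse_vertebral` and the sample rows are made up for illustration and are not real patient records.

```python
# Hypothetical loader: assumes each line holds six whitespace-separated
# numeric features followed by a class label, either "AB" or "NO".
LABEL_MAP = {"NO": 0, "AB": 1}  # binary convention used in this exercise


def parse_vertebral(lines):
    """Return (X, y): X is a list of 6-float rows, y a list of 0/1 labels."""
    X, y = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        *features, label = parts
        if len(features) != 6:
            raise ValueError(f"expected 6 features, got {len(features)}: {line!r}")
        X.append([float(v) for v in features])
        y.append(LABEL_MAP[label])  # KeyError on an unexpected label
    return X, y


rows = [
    "60.0 20.0 40.0 40.0 100.0 0.0 AB",  # made-up values
    "40.0 15.0 35.0 25.0 120.0 5.0 NO",  # made-up values
]
X, y = parse_vertebral(rows)
print(y)  # [1, 0]
```

Because the function accepts any iterable of strings, an open file object can be passed directly, e.g. `with open("column_2C.dat") as f: X, y = parse_vertebral(f)`, where the file name is an assumption about the downloaded archive.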

(a) Download the Vertebral Column Data Set from: https://archive.ics.uci.edu/ml/datasets/Vertebral+Column.
(b) Pre-Processing and Exploratory data analysis:
i. Make scatterplots of the independent variables in the dataset. Use color to
show Classes 0 and 1.
ii. Make boxplots for each of the independent variables. Use color to show
Classes 0 and 1 (see ISLR p. 129).
iii. Select the first 70 rows of Class 0 and the first 140 rows of Class 1 as the
training set and the rest of the data as the test set.
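One way to implement the split in 1(b)iii is to walk the rows in file order and fill a per-class quota. This sketch assumes the labels were already converted to 0/1 and that "first rows" means first in file order; `split_train_test` is a hypothetical name.

```python
def split_train_test(X, y, n0=70, n1=140):
    """First n0 rows of class 0 and first n1 rows of class 1 go to training;
    every other row goes to the test set. Original row order is preserved."""
    quota = {0: n0, 1: n1}
    taken = {0: 0, 1: 0}
    train_idx, test_idx = [], []
    for i, label in enumerate(y):
        if taken[label] < quota[label]:
            train_idx.append(i)
            taken[label] += 1
        else:
            test_idx.append(i)
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test
```

With the two-class file (100 NO and 210 AB rows according to the UCI description), this gives 210 training rows and 100 test rows (30 of Class 0 and 70 of Class 1).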
(c) Classification using KNN on Vertebral Column Data Set
i. Write code for k-nearest neighbors with the Euclidean metric (or use a software
package).
ii. Test all the data in the test set with k nearest neighbors. Make decisions
by majority polling. Plot train and test errors in terms of k for
k ∈ {208, 205, ..., 7, 4, 1} (in reverse order). You are welcome to use smaller
increments of k. Which k* is the most suitable k among those values? Calculate
the confusion matrix, true positive rate, true negative rate, precision,
and F1-score when k = k*.[2]
iii. Since the computation time depends on the size of the training set, one may
only use a subset of the training set. Plot the best test error rate,[3] i.e. the
lowest test error over the candidate values of k, against the size of the training
set, for N ∈ {10, 20, 30, ..., 210}.[4] Note: for each N, select
your training set by choosing the first ⌊N/3⌋ rows of Class 0 and the first
N − ⌊N/3⌋ rows of Class 1 in the training set you created in 1(b)iii. Also, for
each N, select the optimal k from a set starting at k = 1 and increasing in steps of 5.
For example, if N = 200, the optimal k is selected from {1, 6, 11, ..., 196}.
This plot is called a Learning Curve.
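The steps in 1(c) can be sketched in plain Python as below. This is one possible implementation under stated assumptions, not a reference solution: class 1 (AB) is treated as the positive class, vote ties (possible for even k) are broken toward class 1, distance ties keep training-set order, and in the learning curve k is restricted to values not exceeding N. All function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X_train, y_train, x, k, dist=euclidean):
    """Majority vote among the k nearest training points (ties go to class 1)."""
    # sorted() is stable, so equal distances keep their training-set order
    order = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return 1 if votes[1] >= votes[0] else 0


def error_rate(X_train, y_train, X_eval, y_eval, k, dist=euclidean):
    """Fraction of evaluation points misclassified by k-NN."""
    wrong = sum(knn_predict(X_train, y_train, x, k, dist) != t
                for x, t in zip(X_eval, y_eval))
    return wrong / len(y_eval)


def best_k(X_train, y_train, X_eval, y_eval, ks, dist=euclidean):
    """Return (k*, error) minimizing the error; ties go to the first k in ks."""
    errors = {k: error_rate(X_train, y_train, X_eval, y_eval, k, dist) for k in ks}
    k_star = min(errors, key=errors.get)
    return k_star, errors[k_star]


def binary_metrics(y_true, y_pred):
    """Confusion matrix [[TN, FP], [FN, TP]] (rows = true class) and derived rates."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # recall / sensitivity
    tnr = tn / (tn + fp) if tn + fp else 0.0   # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"confusion": [[tn, fp], [fn, tp]], "tpr": tpr, "tnr": tnr,
            "precision": precision, "f1": f1}


def learning_curve_subset(X_train, y_train, N):
    """First floor(N/3) class-0 rows and first N - floor(N/3) class-1 rows."""
    n0 = N // 3
    idx0 = [i for i, t in enumerate(y_train) if t == 0][:n0]
    idx1 = [i for i, t in enumerate(y_train) if t == 1][:N - n0]
    idx = sorted(idx0 + idx1)  # keep the original row order
    return [X_train[i] for i in idx], [y_train[i] for i in idx]


# k grid for 1(c)ii: 208, 205, ..., 4, 1 (reverse order)
K_GRID = list(range(208, 0, -3))


def learning_curve(X_train, y_train, X_test, y_test, sizes=range(10, 211, 10)):
    """Best test error for each N, with k in {1, 6, 11, ...} and k <= N."""
    curve = []
    for N in sizes:
        Xs, ys = learning_curve_subset(X_train, y_train, N)
        _, err = best_k(Xs, ys, X_test, y_test, range(1, N + 1, 5))
        curve.append((N, err))
    return curve
```

The training error for 1(c)ii is `error_rate(X_train, y_train, X_train, y_train, k)`, and the metrics at k* come from `binary_metrics` applied to the test labels and the k*-NN predictions.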
Let us further explore some variants of KNN.
[1] Make sure that you convert labels to 0 and 1; otherwise you may not obtain correct answers.
[2] We will learn in the lectures what these mean; for now, research how they are computed and compute them.
[3] Obviously, use the test data you created in 1(b)iii.
[4] For extra practice, you are welcome to choose smaller increments of N.


(d) Replace the Euclidean metric with the following metrics[5] and test them. Summarize
the test errors (i.e., when k = k*) in a table. Use all of your training data
and select the best k from {1, 6, 11, ..., 196}.
i. Minkowski Distance:
A. which becomes Manhattan Distance with p = 1.
B. with log10(p) ∈ {0.1, 0.2, 0.3, ..., 1}. In this case, use the k* you found
for the Manhattan distance in 1(d)iA. What is the best log10(p)?
C. which becomes Chebyshev Distance with p → ∞.
ii. Mahalanobis Distance.[6]
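The Minkowski family in 1(d)i can be written as a single function of p, which turns the sweep in 1(d)iB into a loop over a list. This is a plain-Python sketch; the names `minkowski`, `P_GRID` and `K_GRID_D` are hypothetical. With scikit-learn, `KNeighborsClassifier(metric="minkowski", p=...)` covers the same family.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance (sum_i |a_i - b_i|^p)^(1/p) for p >= 1.
    p = 1 gives Manhattan; p = math.inf gives Chebyshev (the p -> infinity limit)."""
    diffs = [abs(ai - bi) for ai, bi in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def manhattan(a, b):
    return minkowski(a, b, 1)


def chebyshev(a, b):
    return minkowski(a, b, math.inf)


# 1(d)iB: log10(p) in {0.1, 0.2, ..., 1.0}, so p runs from about 1.26 up to 10.
P_GRID = [10 ** (j / 10) for j in range(1, 11)]

# Candidate k values for part (d): {1, 6, 11, ..., 196}.
K_GRID_D = list(range(1, 197, 5))
```

For 1(d)iB, k stays fixed at the Manhattan k*, and each p in `P_GRID` yields one distance function to evaluate, for example `functools.partial(minkowski, p=p)`.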
(e) The majority polling decision can be replaced by a weighted decision, in which the
weight of each point in voting is inversely proportional to its distance from the
query/test data point. In this case, closer neighbors of a query point have
a greater influence than neighbors that are farther away. Use weighted voting
with Euclidean, Manhattan, and Chebyshev distances and report the best test
errors when k ∈ {1, 6, 11, 16, ..., 196}.
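Weighted voting changes only the vote, not the neighbor search. The sketch below weights each of the k nearest neighbors by 1/d. How to treat a neighbor at distance zero is not specified above; here, as an assumption, only exact matches vote when any exist. Ties go to class 1, and the names `weighted_knn_predict`, `weighted_test_error` and `DISTANCES` are hypothetical. scikit-learn's `KNeighborsClassifier(weights="distance")` is the library counterpart.

```python
import math
from collections import defaultdict

# The three metrics requested in part (e), written out explicitly.
DISTANCES = {
    "euclidean": lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(ai - bi) for ai, bi in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(ai - bi) for ai, bi in zip(a, b)),
}


def weighted_knn_predict(X_train, y_train, x, k, dist):
    """Distance-weighted k-NN vote for binary labels 0/1 (ties go to class 1)."""
    neighbors = sorted(((dist(xi, x), yi) for xi, yi in zip(X_train, y_train)),
                       key=lambda pair: pair[0])[:k]
    scores = defaultdict(float)
    exact = [label for d, label in neighbors if d == 0]
    if exact:
        # Assumption: a query that coincides with training points is decided by them alone.
        for label in exact:
            scores[label] += 1.0
    else:
        for d, label in neighbors:
            scores[label] += 1.0 / d  # weight inversely proportional to distance
    return 1 if scores[1] >= scores[0] else 0


def weighted_test_error(X_train, y_train, X_test, y_test, k, dist):
    """Fraction of test points misclassified by distance-weighted k-NN."""
    wrong = sum(weighted_knn_predict(X_train, y_train, x, k, dist) != t
                for x, t in zip(X_test, y_test))
    return wrong / len(y_test)
```

The best error per metric is then `min(weighted_test_error(X_train, y_train, X_test, y_test, k, DISTANCES[name]) for k in range(1, 197, 5))`. A single close neighbor can now outvote several distant ones, which plain majority polling never allows.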
(f) What is the lowest training error rate you achieved in this homework?

[5] You can use sklearn.neighbors.DistanceMetric. Research what each distance means.
[6] Mahalanobis Distance requires inverting the covariance matrix of the data. When the covariance matrix
is singular or ill-conditioned, the data live in (or very close to) a linear subspace of the feature space. In this case, the features
have to be transformed into a reduced feature set in the linear subspace, which is equivalent to using a
pseudoinverse instead of an inverse.
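Footnote 6 can be made concrete with NumPy: estimate the covariance matrix from the training features, take its Moore-Penrose pseudoinverse with `np.linalg.pinv`, and use that matrix in the quadratic form. Estimating the covariance from the pooled training set (both classes together) is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def pinv_covariance(X_train):
    """Pseudoinverse of the feature covariance matrix (features are columns).
    np.linalg.pinv also works when the covariance matrix is singular."""
    cov = np.cov(np.asarray(X_train, dtype=float), rowvar=False)
    return np.linalg.pinv(cov)


def mahalanobis(a, b, VI):
    """sqrt((a - b)^T VI (a - b)), with VI the (pseudo)inverse covariance."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    q = float(d @ VI @ d)
    return float(np.sqrt(max(q, 0.0)))  # clamp tiny negative round-off
```

`functools.partial(mahalanobis, VI=VI)` turns this into a two-argument distance function. With scikit-learn, the same matrix can be supplied as `KNeighborsClassifier(metric="mahalanobis", metric_params={"VI": VI})`.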

K-Nearest Neighbor (KNN) Algorithm For Machine Learning
No ratings yet
K-Nearest Neighbor (KNN) Algorithm For Machine Learning
17 pages
Cost Management A Strategic Emphasis 8th Edition Blocher Digital Access
100% (2)
Cost Management A Strategic Emphasis 8th Edition Blocher Digital Access
405 pages
Machine Learning: Lecture # 2 Data Normalization, KNN & Minimum Distance
No ratings yet
Machine Learning: Lecture # 2 Data Normalization, KNN & Minimum Distance
74 pages
K-Nearest Neighbor On Python Ken Ocuma
100% (2)
K-Nearest Neighbor On Python Ken Ocuma
9 pages
Cs4758 KNN Lectureslides
No ratings yet
Cs4758 KNN Lectureslides
34 pages
Lab 8
No ratings yet
Lab 8
7 pages
Lesson 4 - Supervised Learning
No ratings yet
Lesson 4 - Supervised Learning
36 pages
KNN 2
No ratings yet
KNN 2
53 pages
K Nearest Neighbor Algorithm PDF
No ratings yet
K Nearest Neighbor Algorithm PDF
40 pages
ML Unit-2 (CEC)
No ratings yet
ML Unit-2 (CEC)
96 pages
K-Nearest Neighbors
No ratings yet
K-Nearest Neighbors
35 pages
Jntuk R20 ML Unit-Ii
No ratings yet
Jntuk R20 ML Unit-Ii
37 pages
Homework1 DSCI 552
No ratings yet
Homework1 DSCI 552
2 pages
Classification and K Nearest Neighbour Algorithm
No ratings yet
Classification and K Nearest Neighbour Algorithm
53 pages
Application of Proper Draping
50% (2)
Application of Proper Draping
9 pages
Amapl - SS316L - Dia 100 MM - HT - 24SL1214 - 2596.000 Kgs.
No ratings yet
Amapl - SS316L - Dia 100 MM - HT - 24SL1214 - 2596.000 Kgs.
4 pages
Recitation 8
No ratings yet
Recitation 8
5 pages
KNN Practice Set
No ratings yet
KNN Practice Set
5 pages
CS4780 Homework 5 SP24-2
No ratings yet
CS4780 Homework 5 SP24-2
7 pages
Stochastic Gradient Descent 1
No ratings yet
Stochastic Gradient Descent 1
42 pages
Q - Skills For Success - Level 1 - Reading and Writing Split
No ratings yet
Q - Skills For Success - Level 1 - Reading and Writing Split
116 pages
INSY446 - 5 - Classification Part 2
No ratings yet
INSY446 - 5 - Classification Part 2
37 pages
Ue21cs352a 20230830121058
No ratings yet
Ue21cs352a 20230830121058
18 pages
Supervised Learning
No ratings yet
Supervised Learning
10 pages
Hep & GIT Final MCQ 21 B
100% (4)
Hep & GIT Final MCQ 21 B
23 pages
Artificial Intelligence Lab 7
No ratings yet
Artificial Intelligence Lab 7
10 pages
Jntuk r20 ML Unit-II
No ratings yet
Jntuk r20 ML Unit-II
33 pages
ML Cat QNS
No ratings yet
ML Cat QNS
4 pages
04 KNN Implementation
No ratings yet
04 KNN Implementation
7 pages
Week 7 Nearest Neighbours
No ratings yet
Week 7 Nearest Neighbours
21 pages
BSE181055-Assignment 3
No ratings yet
BSE181055-Assignment 3
16 pages
MLLABDA2
No ratings yet
MLLABDA2
5 pages
Assignment 2 - ML FK
No ratings yet
Assignment 2 - ML FK
3 pages
Physics (Marks) Chemistry (Marks) Resultsdistance
No ratings yet
Physics (Marks) Chemistry (Marks) Resultsdistance
3 pages
NIT Part C D
No ratings yet
NIT Part C D
471 pages
400 (M) G Alfa Romeo 166 01
No ratings yet
400 (M) G Alfa Romeo 166 01
3 pages
HW02 - KNN DT
No ratings yet
HW02 - KNN DT
3 pages
Chapter 6 ML Classifications
100% (1)
Chapter 6 ML Classifications
51 pages
ML Unit 2 r20 Jntuk
No ratings yet
ML Unit 2 r20 Jntuk
34 pages
Science-Unit-Plann-Final 2
No ratings yet
Science-Unit-Plann-Final 2
111 pages
Cylinder Liner - Production Recommendation 0742048 3
No ratings yet
Cylinder Liner - Production Recommendation 0742048 3
17 pages
Ch2 - Lec2 - K Nearest Neighbour (KNN)
No ratings yet
Ch2 - Lec2 - K Nearest Neighbour (KNN)
18 pages
CSE445 NSU Week - 5
No ratings yet
CSE445 NSU Week - 5
26 pages
Lecture#2. K Nearest Neighbors
No ratings yet
Lecture#2. K Nearest Neighbors
10 pages
Assignment 2 Specification
No ratings yet
Assignment 2 Specification
3 pages
Distribution Restriction:: Approved For Public Release Distribution Is Unlimited
100% (1)
Distribution Restriction:: Approved For Public Release Distribution Is Unlimited
386 pages
(Ebook) Edexcel A Level Biology Studentbook 1 by Ed Lees ISBN 9781471807343, 1471807347 - Discover The Ebook With All Chapters in Just A Few Seconds
No ratings yet
(Ebook) Edexcel A Level Biology Studentbook 1 by Ed Lees ISBN 9781471807343, 1471807347 - Discover The Ebook With All Chapters in Just A Few Seconds
47 pages
A Complete Guide To K Nearest Neighbors Algorithm 1598272616
No ratings yet
A Complete Guide To K Nearest Neighbors Algorithm 1598272616
13 pages
CIS 520, Machine Learning, Fall 2015: Assignment 2 Due: Friday, September 18th, 11:59pm (Via Turnin)
No ratings yet
CIS 520, Machine Learning, Fall 2015: Assignment 2 Due: Friday, September 18th, 11:59pm (Via Turnin)
3 pages
Lecture10 Mid
No ratings yet
Lecture10 Mid
43 pages
Week 3. K-Nearest Neighbours (KNN) : Dr. Shuo Wang
No ratings yet
Week 3. K-Nearest Neighbours (KNN) : Dr. Shuo Wang
18 pages
10 EST Solution
No ratings yet
10 EST Solution
16 pages
KNN - Algorithm - SVM - Algorithm
No ratings yet
KNN - Algorithm - SVM - Algorithm
27 pages
François Quesnay
No ratings yet
François Quesnay
5 pages
Guidance On Road Markings
No ratings yet
Guidance On Road Markings
17 pages
Rail Gun
100% (1)
Rail Gun
20 pages
Lab 5
No ratings yet
Lab 5
2 pages
Week 07
No ratings yet
Week 07
24 pages
K-Nearest Neighbor
No ratings yet
K-Nearest Neighbor
22 pages
MS6711 Data Mining Homework 1: 1.1 Implement K-Means Manually (8 PTS)
No ratings yet
MS6711 Data Mining Homework 1: 1.1 Implement K-Means Manually (8 PTS)
6 pages
Cell Organelle Chart-1
No ratings yet
Cell Organelle Chart-1
4 pages
ML Assignment 3 Nptel 2019
No ratings yet
ML Assignment 3 Nptel 2019
26 pages
HW02 Sol - KNN DT
No ratings yet
HW02 Sol - KNN DT
8 pages
ML Lec-13
No ratings yet
ML Lec-13
17 pages
Full Charm SLD
0% (1)
Full Charm SLD
31 pages
Midterm - APS1070 - 2019 - 09 Fall
No ratings yet
Midterm - APS1070 - 2019 - 09 Fall
2 pages
2010 DG Challenger
No ratings yet
2010 DG Challenger
14 pages
Road Traffic Algorithm
No ratings yet
Road Traffic Algorithm
5 pages
African Traditional Religion (ATR)
100% (1)
African Traditional Religion (ATR)
18 pages
Assignment 3 B
No ratings yet
Assignment 3 B
7 pages
Introduction To Data Science Lecture 6 KG Sir OEC M 621 (E)
No ratings yet
Introduction To Data Science Lecture 6 KG Sir OEC M 621 (E)
8 pages
Writing Research Report
No ratings yet
Writing Research Report
33 pages
4 KNN Classifier
No ratings yet
4 KNN Classifier
6 pages
4 KNN Classifier
No ratings yet
4 KNN Classifier
6 pages
Machine Learning 20CSE09
No ratings yet
Machine Learning 20CSE09
3 pages
HW 02
No ratings yet
HW 02
3 pages
Presentation 1
No ratings yet
Presentation 1
24 pages
K-Nearest Neighbour Classifier: Prerequisite
No ratings yet
K-Nearest Neighbour Classifier: Prerequisite
6 pages
DWM - END SEM LAB Questions
No ratings yet
DWM - END SEM LAB Questions
9 pages
Bioplastic 2
No ratings yet
Bioplastic 2
13 pages
Catalogue Mitsubishi 6D24TC
No ratings yet
Catalogue Mitsubishi 6D24TC
2 pages
MCS-012 Block 3
No ratings yet
MCS-012 Block 3
94 pages
Ref - Integrity Problems of Concrete Piles - FPrimeC - FPrimeC Solutions Inc
No ratings yet
Ref - Integrity Problems of Concrete Piles - FPrimeC - FPrimeC Solutions Inc
7 pages
CBS-Manual July 2019
No ratings yet
CBS-Manual July 2019
8 pages
Modifying Cy8Cproto-062-4343W Psoc™ 6 Mcu Board To Work With An External Flash Memory
No ratings yet
Modifying Cy8Cproto-062-4343W Psoc™ 6 Mcu Board To Work With An External Flash Memory
26 pages
ICT Project Creation Process
No ratings yet
ICT Project Creation Process
3 pages
U Zaw Lin Aung (Chemistry) Grade 10 Time Allowed: 1:30hours
No ratings yet
U Zaw Lin Aung (Chemistry) Grade 10 Time Allowed: 1:30hours
1 page
Characteristics of Effective Technical Communication: Section .1
No ratings yet
Characteristics of Effective Technical Communication: Section .1
23 pages
B&C De10
No ratings yet
B&C De10
1 page
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Aaaaabbbbbbhw 1

Uploaded by

Aaaaabbbbbbhw 1

Uploaded by

Homework 1 DSCI 552, Instructor: Mohammad Reza Rajati

1. Vertebral Column Data Set

(a) Download the Vertebral Column Data Set from: https://archive.ics.uci.

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.