Heart Disease Prediction Using Weka Tools on Machine Learning
Anshu Garg, Jasleen Kaur
II. INTRODUCTION
The total number of deaths due to cardiovascular diseases stands at 17.3 million a year according to the WHO's causes-of-death statistics. Predicting cardiac arrhythmia in real life is therefore of great significance. In this project, we plan to develop a machine learning system that can classify a patient into different cardiac arrhythmic classes. The diagnosis of cardiac arrhythmia can be divided into various classes based on the electrocardiogram (ECG) readings and other attributes. The first class refers to the normal patient, while the other classes represent different types of cardiac arrhythmia such as tachycardia, bradycardia and coronary artery disease. This is a supervised learning problem.
IV. SURVEY
The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it into one of the 13 groups. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classification. Taking the cardiologist's as a gold standard, we aim to minimize this difference by means of machine learning tools.
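Measuring the gap to this gold standard can be made concrete as a misclassification (disagreement) rate; the sketch below uses made-up labels purely for illustration:

```python
# Minimal sketch: agreement between a program's classification and the
# cardiologist's gold standard, measured as a misclassification rate.
# All labels below are made up for illustration.

def misclassification_rate(gold, predicted):
    """Fraction of records where the program disagrees with the cardiologist."""
    assert len(gold) == len(predicted)
    disagreements = sum(1 for g, p in zip(gold, predicted) if g != p)
    return disagreements / len(gold)

# Class 1 = normal; classes 2 and up = hypothetical arrhythmia groups.
cardiologist = [1, 1, 2, 5, 1, 10, 2, 1]
program      = [1, 2, 2, 5, 1, 1, 2, 1]

print(misclassification_rate(cardiologist, program))  # 2 of 8 disagree -> 0.25
```

Minimizing this quantity over a learned classifier is exactly the objective the survey describes.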
V. SCOPE
These machine learning techniques can be deployed in hospitals where a large dataset is available, and can help doctors make more precise decisions and cut down the number of casualties due to heart diseases in the future.

VI. METHODOLOGY

A. Feature Selection:
Firstly, we removed some of the categorical features that were, 95 % of the time, indicating either all 0's or all 1's. If any training instance has a missing value for a given attribute, we set it to the mean plus or minus the standard deviation of that attribute, computed over the class the instance belongs to. If, for a given attribute, the majority of values are missing, then we discard that attribute and remove it from our training set.
The features can be grouped into 5 blocks –
- features concerning biographical characteristics, i.e., age, sex, height, weight and heart rate.
- features concerning average wave durations of each interval (PR interval, QRS complex, and ST intervals).
- features concerning vector angles of each wave.
- features concerning widths of each wave.

B. Random Forests and Decision Trees:
We implement a Random Forest classifier. The model works by repeatedly sampling, with replacement, a portion of the training dataset and fitting a decision tree to it. The number of trees refers to the number of times the dataset is randomly sampled. Moreover, in each sampling iteration, a random subset of the features is selected. In a decision tree, each internal node refers to one of the input variables and has edges to children for all possible values that the input can take; each leaf corresponds to a value of the class label given the values of the input variables represented by the path from the root node to that leaf. The number of trees and the number of leaves are learned via cross-validation.

C. Principal Component Analysis:
PCA is used to identify patterns in the data and then express the data in a way that highlights their similarities and differences. Primarily, we use PCA to reduce the number of dimensions by identifying the more important features, i.e., the principal components. The number of principal components is less than or equal to the smaller of the number of original variables and the number of observations. The first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
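The PCA computation described above can be sketched with a plain NumPy SVD on mean-centred data; the matrix and sizes below are illustrative stand-ins, not the actual arrhythmia feature matrix or the exact Weka implementation:

```python
import numpy as np

# Minimal PCA sketch: project data onto the directions of largest variance.
# X is a made-up (n_samples, n_features) matrix standing in for the
# arrhythmia feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 0] *= 10.0            # give one feature much larger variance

Xc = X - X.mean(axis=0)    # centre each feature
# SVD of the centred data: the rows of Vt are the principal directions,
# ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance explained by each component, in decreasing order; each
# component is orthogonal to the preceding ones.
explained_variance = S**2 / (len(X) - 1)

k = 2
X_reduced = Xc @ Vt[:k].T  # keep only the first k principal components

print(explained_variance)
print(X_reduced.shape)     # (100, 2)
```

Keeping only the leading components is what reduces the dimensionality before a classifier is trained.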
VII. MODELS
A. KNN (K-Nearest Neighbours)
B. Logistic Regression

VIII. RESULTS
We evaluated our models using two different methodologies. We show results for each algorithm, as well as vary other parameters for better results. Accuracy was improved by the careful feature selection described previously. The results are summarized below –

Training-Testing Size   K Neighbours   Training Accuracy   Test Accuracy
80%-20%                 6              99 %                70 %
70%-30%                 6              99 %                66 %
70%-30%                 6              67 %                62 %

IX. ANALYSIS
It is clear from the above data that the SVM and Logistic Regression algorithms are capable of automatically detecting arrhythmias with reliable accuracy (training accuracy = 98 % and test accuracy = 73 %). Furthermore, Random Forests consistently perform better than PCA in terms of feature selection. Our general approach in this project was as follows. We started with KNN and tried to obtain maximum accuracy for different values of K ranging from 3 to 13. Then we used Logistic Regression, which uses the sigmoid function, and ran it using gradient descent and Newton's method. Logistic Regression gave comparatively better results, with average accuracy around 73 %. The Naïve-Bayes classifier gave poor results due to the lack of enough training examples (452) and an excessive number of features. SVM using linear kernels gave the best results, with an average classification accuracy of around 99 % on the training set and 73 % on the test set.

X. ACKNOWLEDGEMENT
We are highly grateful to our professors ER. JYOTI ARORA and ER. DIPTI SHARMA for their continued guidance and support throughout the course of this project.
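As a closing illustration, the logistic-regression step described in the analysis (a sigmoid hypothesis fitted by batch gradient descent) could be sketched as follows; the toy data, learning rate and iteration count are assumptions for the sketch, not the authors' actual Weka setup:

```python
import numpy as np

# Sketch of binary logistic regression trained by batch gradient descent.
# The sigmoid maps a linear score to a probability in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """X: (n, d) features, y: (n,) labels in {0, 1}. Returns weights (d+1,)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y)          # gradient of the log-loss
        w -= lr * grad
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

# Toy, nearly separable data: label is 1 when the single feature exceeds 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(int)

w = fit_logistic(X, y)
acc = (predict(w, X) == y).mean()
print(acc)
```

Newton's method, also mentioned above, replaces the fixed-step gradient update with a step scaled by the inverse Hessian of the log-loss, which typically converges in far fewer iterations.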