
Practical- 12

Aim: To predict heart disease based on factors such as age, sex, resting blood pressure (trestbps), etc.
In [2]: # Importing essential libraries
import numpy as np
import pandas as pd

In [3]: # Loading the dataset


df = pd.read_csv('heart.csv')

Exploring the dataset


In [4]: # Returns number of rows and columns of the dataset
df.shape

Out[4]: (303, 14)

In [5]: # Returns an object with all of the column headers


df.columns

Out[5]: Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',


'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
dtype='object')

In [6]: # Returns the datatype of each column (float, int, string, bool, etc.)
df.dtypes

Out[6]: age int64


sex int64
cp int64
trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca int64
thal int64
target int64
dtype: object

In [7]: # Returns the first x rows when called as head(x); without an argument it returns the first 5
df.head()

Out[7]: age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target

0 63 1 3 145 233 1 0 150 0 2.3 0 0 1 1

1 37 1 2 130 250 0 1 187 0 3.5 0 0 2 1

2 41 0 1 130 204 0 0 172 0 1.4 2 0 2 1

3 56 1 1 120 236 0 1 178 0 0.8 2 0 2 1

4 57 0 0 120 354 0 1 163 1 0.6 2 0 2 1

In [8]: # Returns the last x rows when called as tail(x); without an argument it returns the last 5
df.tail()

Out[8]: age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target

298 57 0 0 140 241 0 1 123 1 0.2 1 0 3 0

299 45 1 3 110 264 0 1 132 0 1.2 1 0 3 0

300 68 1 0 144 193 1 1 141 0 3.4 1 2 3 0

301 57 1 0 130 131 0 1 115 1 1.2 1 1 3 0

302 57 0 1 130 236 0 0 174 0 0.0 1 1 2 0

In [9]: # Returns true for a column having null values, else false
df.isnull().any()

Out[9]: age False


sex False
cp False
trestbps False
chol False
fbs False
restecg False
thalach False
exang False
oldpeak False
slope False
ca False
thal False
target False
dtype: bool

In [10]: # Returns basic information on all columns


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 303 non-null int64
1 sex 303 non-null int64
2 cp 303 non-null int64
3 trestbps 303 non-null int64
4 chol 303 non-null int64
5 fbs 303 non-null int64
6 restecg 303 non-null int64
7 thalach 303 non-null int64
8 exang 303 non-null int64
9 oldpeak 303 non-null float64
10 slope 303 non-null int64
11 ca 303 non-null int64
12 thal 303 non-null int64
13 target 303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.3 KB

In [11]: # Returns basic statistics on numeric columns


df.describe().T

Out[11]: count mean std min 25% 50% 75% max

age 303.0 54.366337 9.082101 29.0 47.5 55.0 61.0 77.0

sex 303.0 0.683168 0.466011 0.0 0.0 1.0 1.0 1.0

cp 303.0 0.966997 1.032052 0.0 0.0 1.0 2.0 3.0

trestbps 303.0 131.623762 17.538143 94.0 120.0 130.0 140.0 200.0

chol 303.0 246.264026 51.830751 126.0 211.0 240.0 274.5 564.0

fbs 303.0 0.148515 0.356198 0.0 0.0 0.0 0.0 1.0

restecg 303.0 0.528053 0.525860 0.0 0.0 1.0 1.0 2.0

thalach 303.0 149.646865 22.905161 71.0 133.5 153.0 166.0 202.0

exang 303.0 0.326733 0.469794 0.0 0.0 0.0 1.0 1.0

oldpeak 303.0 1.039604 1.161075 0.0 0.0 0.8 1.6 6.2

slope 303.0 1.399340 0.616226 0.0 1.0 1.0 2.0 2.0

ca 303.0 0.729373 1.022606 0.0 0.0 0.0 1.0 4.0

thal 303.0 2.313531 0.612277 0.0 2.0 2.0 3.0 3.0

target 303.0 0.544554 0.498835 0.0 0.0 1.0 1.0 1.0

Data Visualization
In [12]: # Importing essential libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [13]: # Plotting histogram for the entire dataset


fig = plt.figure(figsize = (15,15))
ax = fig.gca()
g = df.hist(ax=ax)

In [14]: # Visualization to check if the dataset is balanced or not


g = sns.countplot(x='target', data=df)
plt.xlabel('Target')
plt.ylabel('Count')

Out[14]: Text(0, 0.5, 'Count')

Feature Engineering
Feature Selection
In [15]: # Selecting correlated features using Heatmap

# Get correlation of all the features of the dataset


corr_matrix = df.corr()
top_corr_features = corr_matrix.index

# Plotting the heatmap


plt.figure(figsize=(20,20))
sns.heatmap(data=df[top_corr_features].corr(), annot=True, cmap='RdYlGn')

Out[15]: <Axes: >

Data Preprocessing

Handling categorical features


After exploring the dataset, I observed that the categorical variables should be converted into dummy variables using 'get_dummies()'. Although the dataset contains no string columns, it is still necessary to encode the features 'sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca' and 'thal' this way.

Example: Consider the 'sex' column. It is a binary feature whose values are 0 and 1. Keeping it as-is would lead the algorithm to treat 0 as a lower value and 1 as a higher value, which should not be the case, since gender is not an ordinal feature.

In [16]: dataset = pd.get_dummies(df, columns=['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal'])
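To illustrate what this encoding does, here is a small toy sketch (not part of the original notebook; the toy DataFrame below is hypothetical):

import pandas as pd

# Hypothetical toy frame with a binary 'sex' column
toy = pd.DataFrame({'sex': [0, 1, 1, 0]})

# get_dummies() replaces 'sex' with indicator columns sex_0 and sex_1
# (booleans in recent pandas versions, 0/1 integers in older ones)
print(pd.get_dummies(toy, columns=['sex']))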

Feature Scaling
In [17]: dataset.columns

Out[17]: Index(['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'target', 'sex_0',


'sex_1', 'cp_0', 'cp_1', 'cp_2', 'cp_3', 'fbs_0', 'fbs_1', 'restecg_0',
'restecg_1', 'restecg_2', 'exang_0', 'exang_1', 'slope_0', 'slope_1',
'slope_2', 'ca_0', 'ca_1', 'ca_2', 'ca_3', 'ca_4', 'thal_0', 'thal_1',
'thal_2', 'thal_3'],
dtype='object')

In [18]: from sklearn.preprocessing import StandardScaler


standScaler = StandardScaler()
columns_to_scale = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
dataset[columns_to_scale] = standScaler.fit_transform(dataset[columns_to_scale])

In [19]: dataset.head()

Out[19]: age trestbps chol thalach oldpeak target sex_0 sex_1 cp_0 cp_1 ... slope_2 ca_0 ca_1 ca_2 ca_3 ca_4 thal_0 thal_1 thal_2 thal_3

0 0.952197 0.763956 -0.256334 0.015443 1.087338 1 False True False False ... False True False False False False False True False False

1 -1.915313 -0.092738 0.072199 1.633471 2.122573 1 False True False False ... False True False False False False False False True False

2 -1.474158 -0.092738 -0.816773 0.977514 0.310912 1 True False False True ... True True False False False False False False True False

3 0.180175 -0.663867 -0.198357 1.239897 -0.206705 1 False True False True ... True True False False False False False False True False

4 0.290464 -0.663867 2.082050 0.583939 -0.379244 1 True False True False ... True True False False False False False False True False

5 rows × 31 columns

In [20]: # Splitting the dataset into dependent and independent features


X = dataset.drop('target', axis=1)
y = dataset['target']

Model Building
I will be experimenting with 3 algorithms:

1. KNeighbors Classifier
2. Decision Tree Classifier
3. Random Forest Classifier

KNeighbors Classifier Model


In [21]: # Importing essential libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

In [22]: # Finding the best accuracy for knn algorithm using cross_val_score
knn_scores = []
for i in range(1, 21):
    knn_classifier = KNeighborsClassifier(n_neighbors=i)
    cvs_scores = cross_val_score(knn_classifier, X, y, cv=10)
    knn_scores.append(round(cvs_scores.mean(), 3))

In [23]: # Plotting the results of knn_scores


plt.figure(figsize=(20,15))
plt.plot([k for k in range(1, 21)], knn_scores, color='red')
for i in range(1, 21):
    plt.text(i, knn_scores[i-1], (i, knn_scores[i-1]))
plt.xticks([i for i in range(1, 21)])
plt.xlabel('Number of Neighbors (K)')
plt.ylabel('Scores')
plt.title('K Neighbors Classifier scores for different K values')

Out[23]: Text(0.5, 1.0, 'K Neighbors Classifier scores for different K values')

In [24]: # Training the knn classifier model with k value as 12


knn_classifier = KNeighborsClassifier(n_neighbors=12)
cvs_scores = cross_val_score(knn_classifier, X, y, cv=10)
print("KNeighbours Classifier Accuracy with K=12 is: {}%".format(round(cvs_scores.mean(), 4)*100))

KNeighbours Classifier Accuracy with K=12 is: 84.48%

Decision Tree Classifier


In [25]: # Importing essential libraries
from sklearn.tree import DecisionTreeClassifier

In [26]: # Finding the best accuracy for decision tree algorithm using cross_val_score
decision_scores = []
for i in range(1, 11):
    decision_classifier = DecisionTreeClassifier(max_depth=i)
    cvs_scores = cross_val_score(decision_classifier, X, y, cv=10)
    decision_scores.append(round(cvs_scores.mean(), 3))

In [27]: # Plotting the results of decision_scores


plt.figure(figsize=(20,15))
plt.plot([i for i in range(1, 11)], decision_scores, color='red')
for i in range(1, 11):
    plt.text(i, decision_scores[i-1], (i, decision_scores[i-1]))
plt.xticks([i for i in range(1, 11)])
plt.xlabel('Depth of Decision Tree (N)')
plt.ylabel('Scores')
plt.title('Decision Tree Classifier scores for different depth values')

Out[27]: Text(0.5, 1.0, 'Decision Tree Classifier scores for different depth values')

In [28]: # Training the decision tree classifier model with max_depth value as 3
decision_classifier = DecisionTreeClassifier(max_depth=3)
cvs_scores = cross_val_score(decision_classifier, X, y, cv=10)
print("Decision Tree Classifier Accuracy with max_depth=3 is: {}%".format(round(cvs_scores.mean(), 4)*100))

Decision Tree Classifier Accuracy with max_depth=3 is: 78.51%

Random Forest Classifier


In [29]: # Importing essential libraries
from sklearn.ensemble import RandomForestClassifier

In [30]: # Finding the best accuracy for random forest algorithm using cross_val_score
forest_scores = []
for i in range(10, 101, 10):
    forest_classifier = RandomForestClassifier(n_estimators=i)
    cvs_scores = cross_val_score(forest_classifier, X, y, cv=5)
    forest_scores.append(round(cvs_scores.mean(), 3))

In [31]: # Plotting the results of forest_scores


plt.figure(figsize=(20,15))
plt.plot([n for n in range(10, 101, 10)], forest_scores, color='red')
for i in range(1, 11):
    plt.text(i*10, forest_scores[i-1], (i*10, forest_scores[i-1]))
plt.xticks([i for i in range(10, 101, 10)])
plt.xlabel('Number of Estimators (N)')
plt.ylabel('Scores')
plt.title('Random Forest Classifier scores for different N values')

Out[31]: Text(0.5, 1.0, 'Random Forest Classifier scores for different N values')

In [32]: # Training the random forest classifier model with n value as 90


forest_classifier = RandomForestClassifier(n_estimators=90)
cvs_scores = cross_val_score(forest_classifier, X, y, cv=5)
print("Random Forest Classifier Accuracy with n_estimators=90 is: {}%".format(round(cvs_scores.mean(), 4)*100))

Random Forest Classifier Accuracy with n_estimators=90 is: 82.80999999999999%
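Since the notebook reports only cross-validation scores, here is a minimal follow-up sketch (not part of the original run) of how the best-scoring configuration could be fitted and evaluated on a held-out split; the 80/20 split and random_state are assumptions made purely for illustration:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical hold-out split (80/20); the original notebook uses only cross-validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the best-scoring configuration found above and score it on the unseen split
final_model = RandomForestClassifier(n_estimators=90, random_state=42)
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)
print("Hold-out accuracy:", round(accuracy_score(y_test, y_pred), 4))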


GitHub link: https://github.com/Shubam85/Heart-disease-prediction.git
