This notebook walks through a data analysis workflow for a heart disease dataset using Python and pandas: loading the data, removing duplicates, checking for missing values, handling outliers, and exploring correlations before fitting logistic regression and decision tree classifiers. The raw file contains 1,025 rows and 14 columns (302 unique records after dropping duplicates), comprising various health-related features and a target variable indicating the presence of heart disease.

TI18_DSBDA_5th_HeartDisease

January 30, 2025

[1]: import pandas as pd

[3]: df=pd.read_csv("/home/bcl07/heart_disease.csv")

[4]: df

[4]: age sex cp trestbps chol fbs restecg thalach exang oldpeak \
0 52 1 0 125 212 0 1 168 0 1.0
1 53 1 0 140 203 1 0 155 1 3.1
2 70 1 0 145 174 0 1 125 1 2.6
3 61 1 0 148 203 0 1 161 0 0.0
4 62 0 0 138 294 1 1 106 0 1.9
… … … .. … … … … … … …
1020 59 1 1 140 221 0 1 164 1 0.0
1021 60 1 0 125 258 0 0 141 1 2.8
1022 47 1 0 110 275 0 0 118 1 1.0
1023 50 0 0 110 254 0 0 159 0 0.0
1024 54 1 0 120 188 0 1 113 0 1.4

slope ca thal target


0 2 2 3 0
1 0 0 3 0
2 0 0 3 0
3 2 1 3 0
4 1 3 2 0
… … .. … …
1020 2 0 2 1
1021 1 1 3 0
1022 1 1 2 0
1023 2 0 2 1
1024 1 1 3 0

[1025 rows x 14 columns]

[5]: df.columns

[5]: Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
dtype='object')

[6]: df.isnull().sum()

[6]: age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

[8]: df=df.drop_duplicates()

[9]: df.describe()

[9]: age sex cp trestbps chol fbs \


count 302.00000 302.000000 302.000000 302.000000 302.000000 302.000000
mean 54.42053 0.682119 0.963576 131.602649 246.500000 0.149007
std 9.04797 0.466426 1.032044 17.563394 51.753489 0.356686
min 29.00000 0.000000 0.000000 94.000000 126.000000 0.000000
25% 48.00000 0.000000 0.000000 120.000000 211.000000 0.000000
50% 55.50000 1.000000 1.000000 130.000000 240.500000 0.000000
75% 61.00000 1.000000 2.000000 140.000000 274.750000 0.000000
max 77.00000 1.000000 3.000000 200.000000 564.000000 1.000000

restecg thalach exang oldpeak slope ca \


count 302.000000 302.000000 302.000000 302.000000 302.000000 302.000000
mean 0.526490 149.569536 0.327815 1.043046 1.397351 0.718543
std 0.526027 22.903527 0.470196 1.161452 0.616274 1.006748
min 0.000000 71.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 133.250000 0.000000 0.000000 1.000000 0.000000
50% 1.000000 152.500000 0.000000 0.800000 1.000000 0.000000
75% 1.000000 166.000000 1.000000 1.600000 2.000000 1.000000
max 2.000000 202.000000 1.000000 6.200000 2.000000 4.000000

thal target

count 302.000000 302.000000
mean 2.314570 0.543046
std 0.613026 0.498970
min 0.000000 0.000000
25% 2.000000 0.000000
50% 2.000000 1.000000
75% 3.000000 1.000000
max 3.000000 1.000000

[10]: df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 302 entries, 0 to 878
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 302 non-null int64
1 sex 302 non-null int64
2 cp 302 non-null int64
3 trestbps 302 non-null int64
4 chol 302 non-null int64
5 fbs 302 non-null int64
6 restecg 302 non-null int64
7 thalach 302 non-null int64
8 exang 302 non-null int64
9 oldpeak 302 non-null float64
10 slope 302 non-null int64
11 ca 302 non-null int64
12 thal 302 non-null int64
13 target 302 non-null int64
dtypes: float64(1), int64(13)
memory usage: 35.4 KB

[11]: df.isna().sum()

[11]: age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

[12]: df.head()

[12]: age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \
0 52 1 0 125 212 0 1 168 0 1.0 2
1 53 1 0 140 203 1 0 155 1 3.1 0
2 70 1 0 145 174 0 1 125 1 2.6 0
3 61 1 0 148 203 0 1 161 0 0.0 2
4 62 0 0 138 294 1 1 106 0 1.9 1

ca thal target
0 2 3 0
1 0 3 0
2 0 3 0
3 1 3 0
4 3 2 0

[13]: df.fbs.unique()

[13]: array([0, 1])

[16]: subSet1 = df[['age','cp','chol','thal']]

[18]: subSet2 = df[['exang','slope','target']]

[19]: merged_df = subSet1.merge(right=subSet2, how='cross')  # cross join: every pairing of rows
merged_df.head()

[19]: age cp chol thal exang slope target


0 52 0 212 3 0 2 0
1 52 0 212 3 1 0 0
2 52 0 212 3 1 0 0
3 52 0 212 3 0 2 0
4 52 0 212 3 0 1 0
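
Note that how='cross' forms a cross join, pairing every row of subSet1 with every row of subSet2, which is why the head above repeats the same age/cp/chol/thal values. If the intent is simply to view the two column subsets side by side for the same patients, a row-aligned combination is one alternative (a minimal sketch, not part of the original run):

# Sketch: column-wise concatenation keeps the original row alignment
combined = pd.concat([subSet1, subSet2], axis=1)
combined.head()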

[20]: df.columns

[20]: Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
dtype='object')

[21]: def remove_outliers(column):
    # Keep only values within 1.5 * IQR of the 25th-75th percentile range
    Q1 = column.quantile(0.25)
    Q3 = column.quantile(0.75)
    IQR = Q3 - Q1
    threshold = 1.5 * IQR
    outlier_mask = (column < Q1 - threshold) | (column > Q3 + threshold)
    return column[~outlier_mask]

[23]: col_name = ['cp','thal','exang','oldpeak','slope','ca']

for col in col_name:
    # Outlier positions become NaN via index alignment and are dropped later
    df[col] = remove_outliers(df[col])

/tmp/ipykernel_10564/1228815343.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-
docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col] = remove_outliers(df[col])

(The same SettingWithCopyWarning is emitted once for each of the six columns.)
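
The SettingWithCopyWarning appears because df came from drop_duplicates() and pandas cannot tell whether the later column assignments should affect the original frame. A minimal sketch of one common way to silence it, reusing the same remove_outliers function and col_name list, is to work on an explicit copy:

# Sketch: take an explicit copy so later column assignments are unambiguous
df = df.drop_duplicates().copy()
for col in col_name:
    # outlier positions become NaN through index alignment and are dropped later
    df[col] = remove_outliers(df[col])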

[28]: from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.linear_model import LogisticRegression

[29]: for col in col_name:
    plt.figure(figsize=(10, 6))  # Adjust the figure size if needed
    sns.boxplot(data=df[col])
    plt.title(col)
    plt.show()

[Boxplots of cp, thal, exang, oldpeak, slope, and ca appear here in the original output.]
[30]: df = df.dropna()

[31]: df.isna().sum()

[31]: age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

[32]: df = df.drop('fbs',axis=1)

[34]: correlations = df.corr()['target'].drop('target')

# Print correlations
print("Correlation with the Target:")
print(correlations)
print()

# Plot correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

Correlation with the Target:


age -0.199970
sex -0.300311
cp 0.408985
trestbps -0.132882
chol -0.053834
restecg 0.125710
thalach 0.398870
exang -0.435511
oldpeak -0.436247
slope 0.327420
ca -0.459629
thal -0.389514
Name: target, dtype: float64

[37]: x = df[['cp','thal','exang','oldpeak','slope','ca']]
y = df.target
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0)

x_train.shape,x_test.shape,y_train.shape,y_test.shape

[37]: ((219, 6), (55, 6), (219,), (55,))

[38]: from sklearn.preprocessing import StandardScaler

[39]: scaler = StandardScaler()

[40]: x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

[42]: import numpy as np

[43]: y_train = np.array(y_train).reshape(-1, 1)  # column vectors; note scikit-learn expects 1-D y
y_test = np.array(y_test).reshape(-1, 1)

[44]: y_train.shape

[44]: (219, 1)

[45]: model = LogisticRegression()
model.fit(x_train_scaled, y_train)

# Make predictions on the test set
y_pred = model.predict(x_test_scaled)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.8181818181818182
/home/bcl07/.local/lib/python3.8/site-packages/sklearn/utils/validation.py:1183:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples, ), for example using
ravel().
y = column_or_1d(y, warn=True)
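
The DataConversionWarning is triggered by the column-vector reshape of y_train in cell [43]; scikit-learn estimators expect a 1-D target array. A minimal sketch of the same fit without the warning, reusing the variables defined above:

# Sketch: pass 1-D targets to avoid the DataConversionWarning
model = LogisticRegression()
model.fit(x_train_scaled, y_train.ravel())
y_pred = model.predict(x_test_scaled)
print("Accuracy:", accuracy_score(y_test.ravel(), y_pred))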

[46]: # Classification model using Decision Tree
from sklearn.tree import DecisionTreeClassifier
tc = DecisionTreeClassifier(criterion='entropy')
tc.fit(x_train_scaled, y_train)
y_pred = tc.predict(x_test_scaled)

# Test-set metrics (predictions on x_test_scaled)
print("Test Accuracy Score :", accuracy_score(y_pred, y_test))
print("Test Confusion Matrix :", confusion_matrix(y_pred, y_test))

Test Accuracy Score : 0.7454545454545455

Test Confusion Matrix : [[21 5]
 [ 9 20]]
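
As an optional follow-up (not part of the original run), the confusion matrix can also be visualized with the seaborn heatmap imported earlier; a minimal sketch reusing y_test and y_pred from the decision tree cell:

# Sketch: plot the decision tree confusion matrix as an annotated heatmap
cm = confusion_matrix(y_test.ravel(), y_pred)
plt.figure(figsize=(4, 3))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Decision Tree Confusion Matrix')
plt.show()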

[ ]:

