
AIML Practical 05 22105A2021

Uploaded by

Quereshi Naushin

Naushin Quereshi

22105A2021
AIML

Practical 5: Classify patients with heart disease


Classify patients into two categories: whether they have heart disease (1) or do not have heart disease (0). Plot the classification result.
Also, experiment and specify which feature can be removed to improve the classification accuracy.
Use the UCI Heart Disease dataset.
• Description: The UCI Heart Disease dataset contains 303 patient records, with 13 clinical features commonly used to predict the presence of heart disease.
• Features: The dataset includes attributes like age, gender, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression), slope of the peak exercise ST segment, the number of major vessels colored by fluoroscopy, and thalassemia.
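The task asks for a 0/1 label, but in the original UCI distribution of the Cleveland data the diagnosis column (`num`, loaded in this notebook as `target`) takes values 0 to 4: 0 means no disease and 1 to 4 mean disease is present. A minimal sketch of collapsing it to the binary label, assuming the column is already numeric; `binarize_target` is a hypothetical helper name and the values below are made up for illustration. If a CSV already stores 0/1, the mapping leaves it unchanged.

```python
import pandas as pd


def binarize_target(num: pd.Series) -> pd.Series:
    """Map UCI 'num' (0 = no disease, 1-4 = disease) to a 0/1 label."""
    # Any value above 0 counts as heart disease present.
    return (num > 0).astype(int)


# Hypothetical 'num' values, for illustration only.
num = pd.Series([0, 2, 0, 1, 4, 3])
labels = binarize_target(num)
print(labels.tolist())  # [0, 1, 0, 1, 1, 1]
```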
Submit neatly labelled code in a Jupyter notebook, which you can create in Google Colab. Ensure that the code runs and prints the required output correctly. Choose an appropriate visualization technique to depict the data/results.
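One simple way to depict the data before modelling is a class-balance bar chart; it also gives the accuracy of always predicting the larger class, which is the floor any classifier should beat. A minimal sketch, assuming the label is a 0/1 pandas Series like the notebook's `y`; `plot_class_balance` is a hypothetical helper and the five labels in the demo are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_class_balance(y: pd.Series):
    """Draw a bar chart of patients per class; return counts, baseline, axes."""
    # Reindex so both classes appear even if one is absent from y.
    counts = y.value_counts().reindex([0, 1], fill_value=0)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(['No Heart Disease (0)', 'Heart Disease (1)'], counts.to_numpy(),
           color=['tab:blue', 'tab:red'])
    ax.set_ylabel('Number of patients')
    ax.set_title('Class balance')
    # Share of the majority class = accuracy of always predicting that class.
    baseline = counts.max() / counts.sum()
    return counts, baseline, ax


# Made-up labels, for illustration only.
counts, baseline, ax = plot_class_balance(pd.Series([0, 1, 1, 0, 1]))
print(f"Majority-class baseline accuracy: {baseline:.2f}")
plt.show()
```

On the notebook's data this would be called as `plot_class_balance(y)` after the target has been binarized.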
CODE:
# Load the dataset (replace the path below with the actual file path if the
# file is stored elsewhere). In Google Colab, an uploaded file can be read
# directly from '/content/HeartDisease (1).csv'.
import pandas as pd

data = pd.read_csv('/content/HeartDisease (1).csv', header=None)

# Rename the columns
column_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs',
                'restecg', 'thalach',
                'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']
data.columns = column_names

# Display the column types and the first few rows (before cleaning)
data.info()
data.head()
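With `header=None`, pandas cannot parse columns containing the '?' placeholder as numbers, so `data.info()` reports them as non-numeric (in the UCI Cleveland file these are typically `ca` and `thal`). A short sketch for locating those placeholders, run on three made-up rows in the same header-less 14-column format (not real patient records). The second half uses `na_values='?'`, a standard `pd.read_csv` option that turns the placeholder into NaN at load time and could stand in for the later `replace`/`to_numeric` loop.

```python
import io

import pandas as pd

# Three made-up rows in the header-less 14-column format (illustrative only).
RAW = """63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,286,0,2,108,1,1.5,2,?,3,2
57,0,4,120,354,0,0,163,1,0.6,1,0,?,1
"""

data = pd.read_csv(io.StringIO(RAW), header=None)

# Count '?' placeholders per column and keep only the affected columns.
missing = (data.astype(str) == '?').sum()
missing = missing[missing > 0]
print(missing)

# Alternative: let read_csv convert '?' to NaN while loading.
clean = pd.read_csv(io.StringIO(RAW), header=None, na_values='?')
print(clean.isna().sum().sum())  # 2
```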
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('/content/HeartDisease (1).csv', header=None)

# Rename the columns
column_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs',
                'restecg', 'thalach',
                'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']
data.columns = column_names

# Replace '?' with NaN and convert columns to numeric
data = data.replace('?', np.nan)
for column in data.columns:
    data[column] = pd.to_numeric(data[column], errors='coerce')

# Drop rows with NaN values
data = data.dropna()

# Separate features (X) and target (y).
# The UCI 'target' column ranges from 0 to 4; any value above 0 means heart
# disease is present, so map it to 1 (disease) / 0 (no disease).
X = data.drop('target', axis=1)
y = (data['target'] > 0).astype(int)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Logistic Regression model
model = LogisticRegression(random_state=42)
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Plot confusion matrix (rows = actual class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['No Heart Disease', 'Heart Disease'],
            yticklabels=['No Heart Disease', 'Heart Disease'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Experiment with feature removal: retrain without each feature in turn and
# compare with the full-feature accuracy printed above
for feature in X.columns:
    X_temp = X.drop(feature, axis=1)
    X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_test_split(
        X_temp, y, test_size=0.2, random_state=42
    )

    scaler_temp = StandardScaler()
    X_train_temp_scaled = scaler_temp.fit_transform(X_train_temp)
    X_test_temp_scaled = scaler_temp.transform(X_test_temp)

    model_temp = LogisticRegression(random_state=42)
    model_temp.fit(X_train_temp_scaled, y_train_temp)

    y_pred_temp = model_temp.predict(X_test_temp_scaled)
    accuracy_temp = accuracy_score(y_test_temp, y_pred_temp)

    print(f"Accuracy without {feature}: {accuracy_temp:.2f}")
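The feature-removal loop scores each removal on a single 80/20 split. If about 297 complete rows remain after dropping missing values (as in the standard Cleveland file, where 6 of the 303 rows contain '?'), the test set holds 60 patients, so one extra misclassification changes accuracy by 1/60 ≈ 0.017, roughly the resolution of the two-decimal printout. A sketch of a steadier comparison, swapping in 5-fold stratified cross-validation with the scaler placed inside a scikit-learn pipeline so it is re-fitted on each training fold. `ablation_scores` is a hypothetical helper, and the demo runs on synthetic data from `make_classification`, not on the heart-disease data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def ablation_scores(X: pd.DataFrame, y, n_splits: int = 5, seed: int = 42) -> pd.Series:
    """Mean cross-validated accuracy with all features and with each one dropped."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Scaling inside the pipeline avoids leaking test-fold statistics.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = {'(none removed)': cross_val_score(model, X, y, cv=cv).mean()}
    for feature in X.columns:
        reduced = X.drop(columns=feature)
        scores[feature] = cross_val_score(model, reduced, y, cv=cv).mean()
    return pd.Series(scores).sort_values(ascending=False)


# Synthetic stand-in data (NOT the heart-disease dataset): 297 rows x 13 features.
X_demo, y_demo = make_classification(n_samples=297, n_features=13,
                                     n_informative=5, random_state=0)
X_demo = pd.DataFrame(X_demo, columns=[f'f{i}' for i in range(13)])
scores = ablation_scores(X_demo, y_demo)
print(scores.round(3))
```

On the notebook's data this would be called as `ablation_scores(X, y)`. Features whose score is at or above the `(none removed)` row are removal candidates, and `scores.plot.barh()` draws the comparison as a bar chart.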

Output:
