1stTask.ipynb - Colab

The document outlines a group project on predicting COVID-19 test results, covering data preprocessing, class-imbalance handling, and model training. Key steps include loading the data, down-sampling the majority class, balancing the training split with SMOTE, and evaluating Random Forest, Gradient Boosting, and Decision Tree classifiers on accuracy, precision, recall, and F1-score. All three models reach 96-97% accuracy, but with only 8 positive cases among 1,508 test samples that figure is driven almost entirely by the negative class: positive-class precision is 0.02 to 0.07 and positive-class recall is 0.12 to 0.50.

Uploaded by

mavep24656

4/8/25, 7:44 PM 1stTask.ipynb - Colab

Group Members

1. 21/04905 James Wainaina Githirwa
2. 21/04883 Fabian Ndung'u
3. 21/06700 Peter Kamau
4. 21/04956 Oliver Samwel
5. 21/05462 Purity Njenga
6. 21/05041 Caleb Sirma
7. 21/05119 Bramwel Wanyoike
8. 19/02645 Ian Karanja

Import Libraries

This step imports the necessary libraries for data manipulation, modeling, and visualization.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, precision_recall_curve,
                             PrecisionRecallDisplay)
from imblearn.over_sampling import SMOTE
from sklearn.utils import resample
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

# Suppress FutureWarnings
warnings.filterwarnings("ignore", category=FutureWarning)

Load and Preprocess Data

This function loads the dataset and performs initial preprocessing, including removing non-predictive columns and handling missing values.

# Load and preprocess data
def load_and_preprocess(filepath):
    data = pd.read_csv(filepath)

    # Remove non-predictive columns
    non_predictive = [
        "rapid_flu_results", "rapid_strep_results",
        "cxr_findings", "cxr_impression", "cxr_label", "cxr_link",
        "batch_date", "test_name", "swab_type"
    ]
    data.drop(columns=[col for col in non_predictive if col in data.columns], inplace=True)

    # Handle missing values: rows without a target label cannot be used
    data.dropna(subset=["covid19_test_results"], inplace=True)

    # Numerical imputation (column mean)
    num_cols = data.select_dtypes(include=['number']).columns
    data[num_cols] = data[num_cols].fillna(data[num_cols].mean())

    # Categorical imputation (most frequent value)
    cat_cols = data.select_dtypes(exclude=['number']).columns.drop('covid19_test_results')
    for col in cat_cols:
        data[col] = data[col].fillna(data[col].mode()[0])

    return data
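To make the imputation rules concrete, here is a minimal sketch on a hypothetical four-row frame. The column names `temperature` and `cough` are assumptions for illustration and are not taken from the dataset; the fills mirror the mean and mode rules in `load_and_preprocess`.

```python
import pandas as pd

# Hypothetical toy data (not from the real dataset)
toy = pd.DataFrame({
    "temperature": [36.5, None, 38.0, 37.0],
    "cough": ["no", "yes", None, "no"],
})

# Numeric columns: fill gaps with the column mean
num_cols = toy.select_dtypes(include=["number"]).columns
toy[num_cols] = toy[num_cols].fillna(toy[num_cols].mean())

# Non-numeric columns: fill gaps with the most frequent value
for col in toy.select_dtypes(exclude=["number"]).columns:
    toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy)
```

Because the notebook computes these fills on the full dataset before splitting, test rows influence the imputed values; computing the means and modes on the training split alone would avoid that.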

Handle Class Imbalance

This function reduces the class imbalance by down-sampling the majority class (negative cases) to 5,000 rows without replacement; all positive cases are kept.

# Handle class imbalance
def balance_dataset(data):
    majority = data[data['covid19_test_results'] == 'Negative']
    minority = data[data['covid19_test_results'] == 'Positive']

    # Downsample majority
    majority_down = resample(majority,
                             replace=False,
                             n_samples=5000,
                             random_state=42)

    return pd.concat([majority_down, minority], ignore_index=True)
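Note that `resample(..., replace=False, n_samples=5000)` raises a `ValueError` when the majority class has fewer than 5,000 rows. The sketch below shows one possible guard. It is not part of the original notebook, and the function name `balance_dataset_safe` and the `n_majority` parameter are hypothetical.

```python
import pandas as pd
from sklearn.utils import resample

def balance_dataset_safe(data, target="covid19_test_results", n_majority=5000):
    """Down-sample negatives to at most n_majority rows; keep all positives."""
    majority = data[data[target] == "Negative"]
    minority = data[data[target] == "Positive"]

    # Never ask for more rows than exist when sampling without replacement
    n_keep = min(n_majority, len(majority))
    majority_down = resample(majority, replace=False,
                             n_samples=n_keep, random_state=42)
    return pd.concat([majority_down, minority], ignore_index=True)

# Hypothetical toy frame: 10 negatives, 2 positives
toy = pd.DataFrame({
    "covid19_test_results": ["Negative"] * 10 + ["Positive"] * 2,
    "age": range(12),
})
balanced = balance_dataset_safe(toy, n_majority=4)
print(balanced["covid19_test_results"].value_counts())
```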

Main Execution

This section runs the main workflow: it loads and preprocesses the dataset, down-samples the majority class, one-hot encodes the features, and maps the target labels to 0 (Negative) and 1 (Positive).

# Main execution
data = load_and_preprocess('coronavirusdataset.csv')
balanced_data = balance_dataset(data)

# Preprocess for modeling

X = pd.get_dummies(balanced_data.drop('covid19_test_results', axis=1), drop_first=True)
y = balanced_data['covid19_test_results'].map({'Negative': 0, 'Positive': 1})
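As a small illustration of what `pd.get_dummies(..., drop_first=True)` produces, the sketch below encodes a hypothetical frame; the columns `age` and `sex` are assumptions, not columns confirmed to exist in the dataset.

```python
import pandas as pd

# Hypothetical toy features (not from the real dataset)
toy = pd.DataFrame({"age": [30, 45, 52], "sex": ["F", "M", "F"]})

# Numeric columns pass through; each categorical column becomes one
# indicator column per level, minus the first level (drop_first=True)
X_toy = pd.get_dummies(toy, drop_first=True)
print(list(X_toy.columns))  # ['age', 'sex_M']
```

Encoding before the split, as the notebook does, guarantees that the training and test sets share the same indicator columns.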

Split Data

This step splits the data into training and testing sets, ensuring stratification based on the target variable.

# Split data before resampling
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    stratify=y,
    random_state=42
)
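The effect of `stratify=y` can be checked on label counts alone. The sketch below assumes 5,000 negatives and 26 positives. That class mix is inferred from the outputs printed in this notebook (3,500 negatives in the training split, 1,500 negatives and 8 positives in the test split) and is not stated by the authors.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Assumed class mix, inferred from the reported outputs
y = [0] * 5000 + [1] * 26
idx = list(range(len(y)))  # stand-in for the feature matrix

idx_train, idx_test, y_train, y_test = train_test_split(
    idx, y, test_size=0.3, stratify=y, random_state=42
)

print("train:", Counter(y_train))
print("test: ", Counter(y_test))
```

With so few positives, every positive-class metric rests on a handful of test rows: one prediction more or less shifts positive recall by 0.125.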

Apply SMOTE

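This step applies SMOTE to the training split only, synthesizing minority-class (positive) samples until both classes have the same count. The test split is left untouched, so evaluation still reflects the original class mix.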

# Apply SMOTE only to training data

smote = SMOTE(sampling_strategy='auto', random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)

print("\nClass distribution after resampling:")

print(pd.Series(y_res).value_counts())

Class distribution after resampling:

covid19_test_results
0 3500
1 3500
Name: count, dtype: int64
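SMOTE creates each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The following is a simplified pure-Python illustration of that single interpolation step, not imblearn's implementation; the helper name `smote_point` and the example points are hypothetical.

```python
import random

def smote_point(x, neighbor, rng):
    """Return a random point on the segment between x and neighbor."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(42)
x = [1.0, 2.0]          # a minority sample (hypothetical)
neighbor = [3.0, 6.0]   # one of its minority-class neighbours (hypothetical)

synthetic = smote_point(x, neighbor, rng)
print(synthetic)
```

Interpolating one-hot encoded columns can produce fractional values between 0 and 1. imblearn's `SMOTENC` is designed for mixed categorical and numeric features and may be worth considering for data like this.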

Model Training and Evaluation

This section defines the models, trains them, and evaluates their performance using accuracy, classification reports, confusion matrices, and precision-recall curves.

# Model training and evaluation
models = {
    "Random Forest": RandomForestClassifier(class_weight='balanced', random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(class_weight='balanced', random_state=42)
}

for name, model in models.items():
    # Training
    model.fit(X_res, y_res)

    # Prediction
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:,1] if hasattr(model, "predict_proba") else [0]*len(y_test)

    # Evaluation
    print(f"\n{name} Evaluation:")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))

    # Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(6,4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=['Predicted Negative', 'Predicted Positive'],
                yticklabels=['Actual Negative', 'Actual Positive'])
    plt.title(f"{name} Confusion Matrix")
    plt.show()

    # Precision-Recall Curve
    precision, recall, _ = precision_recall_curve(y_test, y_proba)
    disp = PrecisionRecallDisplay(precision=precision, recall=recall)
    disp.plot()
    plt.title(f"{name} Precision-Recall Curve")
    plt.show()

    # Feature Importance (if available)
    if hasattr(model, 'feature_importances_'):
        importances = pd.Series(model.feature_importances_, index=X.columns)
        top_features = importances.sort_values(ascending=False).head(10)

        plt.figure(figsize=(10,6))
        top_features.sort_values().plot.barh(color='darkgreen')
        plt.title(f"{name} - Top 10 Features")
        plt.xlabel("Importance Score")
        plt.tight_layout()
        plt.show()
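Each point on a precision-recall curve corresponds to one decision threshold applied to the predicted probabilities. A minimal sketch of that computation follows, using hypothetical labels and scores rather than the notebook's outputs; the function name `precision_recall_at` is also hypothetical.

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for t, s in pairs if s >= threshold and t == 1)
    fp = sum(1 for t, s in pairs if s >= threshold and t == 0)
    fn = sum(1 for t, s in pairs if s < threshold and t == 1)
    # With no positive predictions, precision is set to 1.0 by convention here
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and predicted probabilities
y_true = [0, 0, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]

for thr in (0.3, 0.5, 0.85):
    p, r = precision_recall_at(y_true, y_score, thr)
    print(f"threshold={thr:.2f} precision={p:.2f} recall={r:.2f}")
```

The `predict` call in the loop effectively applies a 0.5 cut-off; sweeping the threshold over `y_proba` exposes the precision-recall trade-off that the curve plots.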

Random Forest Evaluation:
Accuracy: 0.9714854111405835
              precision    recall  f1-score   support

    Negative       1.00      0.98      0.99      1500
    Positive       0.05      0.25      0.09         8

    accuracy                           0.97      1508
   macro avg       0.52      0.61      0.54      1508
weighted avg       0.99      0.97      0.98      1508

Gradient Boosting Evaluation:
Accuracy: 0.9602122015915119
              precision    recall  f1-score   support

    Negative       1.00      0.96      0.98      1500
    Positive       0.07      0.50      0.12         8

    accuracy                           0.96      1508
   macro avg       0.53      0.73      0.55      1508
weighted avg       0.99      0.96      0.98      1508

Decision Tree Evaluation:
Accuracy: 0.9602122015915119
              precision    recall  f1-score   support

    Negative       1.00      0.96      0.98      1500
    Positive       0.02      0.12      0.03         8

    accuracy                           0.96      1508
   macro avg       0.51      0.54      0.51      1508
weighted avg       0.99      0.96      0.97      1508
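The confusion-matrix plots are not preserved in this export, but the underlying counts can be reconstructed from the printed accuracy, the positive-class recall, and the class supports. The sketch below does that arithmetic. The resulting counts are inferred from the rounded report values rather than copied from the notebook, and the function name `reconstruct_counts` is hypothetical.

```python
def reconstruct_counts(accuracy, pos_recall, n_neg, n_pos):
    """Infer confusion-matrix counts from accuracy, positive recall, supports."""
    total = n_neg + n_pos
    correct = round(accuracy * total)   # TP + TN
    tp = round(pos_recall * n_pos)      # recall = TP / n_pos
    tn = correct - tp
    return {"TN": tn, "FP": n_neg - tn, "FN": n_pos - tp, "TP": tp}

# (accuracy, positive-class recall) as printed in the reports
reports = {
    "Random Forest": (0.9714854111405835, 0.25),
    "Gradient Boosting": (0.9602122015915119, 0.50),
    "Decision Tree": (0.9602122015915119, 0.12),
}

for name, (acc, rec) in reports.items():
    c = reconstruct_counts(acc, rec, n_neg=1500, n_pos=8)
    precision = c["TP"] / (c["TP"] + c["FP"])
    print(name, c, f"positive precision={precision:.2f}")

# Accuracy of a trivial classifier that always predicts Negative
print("always-negative baseline:", 1500 / 1508)
```

The reconstructed positive-class precision values agree with the printed reports (0.05, 0.07, 0.02). A classifier that always predicts Negative would score about 0.995 accuracy on this test split, higher than all three models, which is why accuracy alone is a poor guide here.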
