SFA Paper 9
Machine Learning)
Abstract
This study introduces a machine learning-based model to predict students' final
grades, focusing on handling imbalanced datasets in grade classification. Five
machine learning algorithms were compared for accuracy: Decision Tree (J48),
Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (kNN), and
Logistic Regression (LR). By incorporating the Synthetic Minority Oversampling
Technique (SMOTE) and feature selection (FS), the model achieved up to 99.6%
accuracy. This work aims to help educators identify at-risk students and
improve academic outcomes.
Introduction
Higher education institutions collect large amounts of student data, but
predicting academic performance from it remains a challenge. Traditional
methods struggle with imbalanced datasets, in which some grade categories
contain far fewer students than others, and this leads to inaccurate grade
predictions. This study tackles these challenges by developing a robust machine
learning model that classifies student grades into five categories (exceptional,
excellent, distinction, pass, fail). The approach improves prediction accuracy,
helping educators make better-informed decisions about how to support students.
Methodology
The proposed framework follows four steps:
1. Data Collection: Student grades from two core courses over multiple
years were analyzed.
2. Data Preprocessing: Grades were categorized into five groups.
Imbalanced class issues were addressed using SMOTE and feature
selection algorithms.
3. Model Training: Five machine learning algorithms were applied, and
their accuracy was compared using 10-fold cross-validation.
4. Evaluation: Accuracy, precision, recall, and F-measure were used to
determine the best model.
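The training and evaluation steps above can be illustrated with a small sketch. It is not the study's implementation (the study used WEKA); it is a minimal, standard-library Python example of how samples might be split into stratified folds for k-fold cross-validation and how accuracy, precision, recall, and F-measure are computed per class. The function names `stratified_folds`, `accuracy`, and `per_class_metrics` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    # Deal the samples of each class round-robin across the folds.
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[position % k].append(idx)
            position += 1
    return folds


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


# Toy labels, invented purely for illustration (not data from the study).
y_true = ["pass", "pass", "fail", "fail", "pass"]
y_pred = ["pass", "fail", "fail", "fail", "pass"]
print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred))
```

Reporting the metrics per class matters with imbalanced grades: overall accuracy can look high even when a rare category such as "fail" is rarely predicted correctly.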
Technology Used
1. Algorithms:
   Decision Tree (J48)
   Support Vector Machine (SVM)
   Naïve Bayes (NB)
   K-Nearest Neighbor (kNN)
   Logistic Regression (LR)
2. Data Techniques:
   SMOTE: Balances the class distribution by generating synthetic samples
   for the minority classes.
   Feature Selection: Reduces the number of input attributes, which can
   improve accuracy.
3. Tools:
   WEKA: A data mining tool used for model implementation and
   analysis.
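To make the SMOTE idea concrete, here is a minimal sketch in standard-library Python. It is a simplified variant written for illustration, not the WEKA filter used in the study: it picks a random minority sample, chooses one of its k nearest minority-class neighbours, and creates a new point somewhere on the line segment between them. The function name `smote`, its parameters, and the example feature vectors are all assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical feature vectors for a rare grade category (invented values).
rare_class = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
new_points = smote(rare_class, n_new=5, k=2)
print(len(new_points))
```

Because every synthetic point is a blend of two real minority samples, the new points stay inside the region the minority class already occupies, rather than being exact duplicates as in plain random oversampling.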
Future Scope
Advanced Techniques: Incorporate more sophisticated algorithms like
neural networks or ensemble models.
Expanding Features: Include more student attributes such as
attendance, demographics, and behavioral data.
Scalability: Apply the model to larger, more diverse datasets for better
generalization.
Real-Time Monitoring: Develop systems to predict student performance
dynamically throughout a semester.