Predicting Exam Scores Using Linear Regression in Python
Predicting Exam Scores Using Linear Regression in Python
Objective
To build and train a simple linear regression model that predicts
students’ exam scores based on the number of hours they study.
Background
Linear regression is one of the simplest and most widely used
algorithms in machine learning. It models the relationship between a
dependent variable and one or more independent variables by fitting a
linear equation to observed data.
In this case, we attempt to understand and predict how study time
affects exam performance.
Dataset Description
A synthetic dataset is manually created with two features:
Hours: Number of hours spent studying.
Scores: Corresponding exam scores.
Example:
Methodology
1. Data Preparation
The dataset is defined using a Python dictionary and converted
into a Pandas DataFrame for easier manipulation and
visualization.
2. Data Visualization
A scatter plot is created using matplotlib to visualize the linear
relationship between hours studied and exam scores.
This step helps confirm the assumption that the relationship may
be linear.
3. Feature Selection and Splitting
Features (Hours) and labels (Scores) are separated.
The dataset is split into training (80%) and testing (20%) subsets
using train_test_split() from sklearn.
4. Model Training
A linear regression model is instantiated and trained using the
training data.
model.fit() is used to find the best-fitting line for the training set.
Code
Output
Applications
Academic Planning: Estimating exam performance based on
preparation time.
Tutoring Services: Personalizing study plans.
EdTech Platforms: Building predictive analytics dashboards for
learners.
Conclusion
This case study effectively demonstrates how linear regression can be
used to model and predict real-world outcomes—in this case, student
performance. The approach is simple yet powerful, laying the
foundation for more complex predictive modeling tasks.