2 Final
2 Final
PERFORMANCE
A PROJECT REPORT
Submitted by
ANISHA.V : (22353008)
JOSHNA PRINCY.B.P : (22353010)
PRIYANKA.M : (22353005)
APRIL 2025
BONAFIDE CERTIFICATE
Name: Name:
Designation: Designation:
1
ACKNOWLEGEMENT
I would like to take this opportunity to express my heartfelt gratitude to all those who
have guided, supported, and encouraged me throughout the duration of this project.
First and foremost, I express my sincere thanks to Dr. S. Gokila, Professor and Head,
Department of Computer Applications, for her invaluable support, encouragement,
and for showing keen interest in our project. Her continued guidance and constructive
feedback served as a great source of motivation and direction at every stage of this
project.
I would also like to extend my deep appreciation to Dr. Angeline Benita, Assistant
Professor (SG) and Project Coordinator (BCA), whose unwavering support,
supervision, and helpful suggestions played a crucial role in the successful
completion of this project. Her availability and willingness to help at all times were
immensely helpful.
A heartfelt thanks to all my faculty members for their academic support, guidance,
and encouragement throughout my degree program. Their dedication and passion for
teaching have been a true inspiration.
I am also thankful to my friends and peers who have supported me in various ways —
through discussions, collaborative problem-solving, and by offering valuable
suggestions that enhanced the quality of my coding and development work.
Through this project, I have not only improved my technical skills but also learned
the importance of teamwork, research, problem-solving, and time management. This
growth was made possible due to the opportunity and freedom given to me by my
mentors and teachers, for which I am truly thankful.
Last but by no means least, I wish to thank God Almighty for giving me the strength,
patience, and resilience to carry this project forward. I am equally thankful to my
family, who have stood by me unconditionally, offering their love, encouragement,
and blessings throughout this journey.
2
TABLE OF CONTENT
Chapter Page
Title
No. No.
Abstract 4
INTRODUCTION
1.1 Overview of Student Performance Prediction 5
1
1.2 Importance of Data-Driven Academic Insights 6
1.3 Objectivities 7
LITERATURE REVIEW
2.1 Review Details 9
2 2.2 Application of Machine Learning in Education 9
2.3 Use of Random Forest for Regression 10
2.4 Importance of Visualization in EdTech 11
SYSTEM DESIGN
3.1 Existing System 13
3.2 Proposed System 14
3
3.3 Software Requirements 14
3.4 Hardware Requirements 15
3.5 System Architecture 16
MODULE DESCRIPTION
4.1 Data Generation Module 19
4 4.2 Data preprocessing module 20
4.3 Performance modelling module 20
4.4 Final Computation module 21
SYSTEM IMPLEMENTATION 24
5.1 Unit testing 24
5 5.2 Test cases and result 26
5.3 Module evaluation metrics 27
5.4 User interface testing 29
5.5Graph display validation 30
IMPLEMENTATION 31
6 6.1 Implementing 32
6.2 Source code 32
CONCLUSION & FUTURE WORK 39
7.1 Conclusion and Results 39
7
7.2 Future work 42
3
ABSTRACT
4
CHAPTER-1
INTRODUCTION
5
1.2 IMPORTANCE OF DATA-DRIVEN ACADEMIC INSIGHTS
The emergence of educational data mining (EDM) and learning analytics (LA) has
enabled a more holistic approach. These domains leverage large volumes of
educational data to uncover patterns and trends that may not be immediately apparent.
Predictive models, such as the one implemented in this project, go beyond
retrospective analysis to offer foresight into a student’s potential, strengths, and areas
needing improvement.
6
interface design, this project bridges the gap between complex analytics and practical
educational.
1.3 OBJECTIVES
The overarching goal of this project is to design and develop a predictive model that
utilizes multiple student-related factors to accurately forecast academic performance.
This prediction serves as a decision-support tool for educators and learners, aiding in
better planning, self-assessment, and academic strategy.
7
This project also seeks to raise awareness of the value of integrating AI into
educational systems, encouraging institutions to adopt predictive analytics tools to
better serve students. As AI becomes increasingly embedded in day-to-day life, its
application in academia is not only timely but essential for nurturing student success
in a personalized and proactive manner.
8
CHAPTER 2
LITERATURE REVIEW
The use of machine learning (ML) in education has evolved rapidly over the last
decade, driven by the need for personalized learning experiences and better
performance analytics. Various studies have explored the integration of predictive
models to assess students’ academic success based on diverse factors like attendance,
study habits, socio-economic background, and psychological parameters.
This chapter examines prior research contributions that have shaped the direction of
educational data mining and intelligent tutoring systems. Emphasis is laid on the
application of Random Forest for regression problems, given its proven robustness in
handling structured educational datasets. Moreover, the importance of visualization in
educational technology (EdTech) is discussed, highlighting how interactive graphs
and dashboards can significantly improve stakeholder engagement and decision-
making.
The review is structured into key thematic areas relevant to the present project: (1)
Machine Learning in Education, (2) Regression through Random Forest, and (3)
Visualization and UI in EdTech applications.
The integration of ML with cloud-based dashboards and mobile applications has also
made it easier for educators and institutions to deploy these predictive solutions at
scale. This project leverages this trend by deploying a Random Forest-based student
performance predictor via a Gradio-powered web interface.
10
educational data for regression and training the model to predict a performance score
based on variables like GPA, hours studied, and test scores.
The model in this project uses RandomForestRegressor with 150 estimators and a
maximum depth of 8, optimized for balanced performance and generalization.
Interactive visualizations built with libraries like Matplotlib and Seaborn, and hosted
via platforms like Gradio, significantly enhance user experience. In this project, a
dark-themed performance dashboard provides an intuitive, visually appealing way to
interpret the predicted results.
Research by Few (2009) underscores the cognitive benefits of simple yet impactful
visual representations, particularly in reducing information overload. Moreover, the
“Performance Graph” in this project—a bar chart showing the predicted score—
creates an immediate understanding of the outcome.
11
Features of the implemented visualizations include:
12
CHAPTER-3
SYSTEM DESIGN
13
3.2 PROPOSED SYSTEM
Hours Studied
Attendance Percentage
Previous GPA
Test Scores
Study Habits
What sets this system apart is its real-time interactivity. Users input their personal
data through sliders, radio buttons, and dropdown menus. With a single click, they
receive:
The Gradio interface is styled with dark mode aesthetics, using CSS
customizations to make the experience visually appealing and easy on the eyes. The
system also includes data preprocessing modules such as one-hot encoding,
normalization, and real-time feature alignment to maintain model consistency.
14
Moreover, this system is modular and scalable. It can be extended to include
additional features such as mental health indicators, time management scores, and
peer comparison. It can also be integrated into LMS platforms to allow continuous
tracking and predictive updates throughout the semester.
Pandas & NumPy: Used for data creation, manipulation, and simulation of
synthetic datasets.
All dependencies are open-source and easily installable via pip or through the
Anaconda environment manager.
15
Operating System: Windows 10 / Linux Ubuntu / macOS
Internet Connection: Not mandatory for local usage, but recommended for
installing dependencies and accessing external datasets or APIs in future
upgrades.
Since Gradio operates as a local server, the tool can run offline after setup and can be
accessed from any browser.
16
Fig. 3.2 Component Flow
17
Fig. 3.3 Full Stack Layered Architecture
18
CHAPTER-4
MODULES DESCRIPTION
The proposed system is organized into distinct yet interconnected modules that
work together to collect data, process it, and provide performance predictions using
a trained machine learning model. This modular design ensures flexibility,
scalability, and easier debugging or future upgrades. Each module serves a specific
purpose, and together, they form the backbone of the student performance
prediction system.
The dataset includes 1,000 student entries and features a variety of variables that
influence academic performance. These variables include:
Hours Studied per Day: An integer value ranging from 1 to 10. It simulates
the amount of time a student dedicates to studying daily.
Previous GPA: A float value ranging from 2.0 to 4.0. This reflects a student's
academic history.
Test Score: Simulated final exam or midterm score ranging from 50 to 100.
19
These variables are not only realistic but are aligned with academic research on
student performance metrics. The synthetic data generation allows testing of the
model under varied hypothetical scenarios.
Normalization and Alignment: The module ensures all features are within a
similar scale where necessary and aligns the user input with the trained
model’s feature expectations.
This module is essential to maintain data consistency, ensuring that inputs to the
model are structured exactly like the training data.
At the heart of the system is the machine learning model built using the Random
Forest Regressor from the scikit-learn library. This model is trained on the
preprocessed dataset and learns to map input features to a target output: the predicted
performance score.
Model Details:
20
Max Depth: 8
Performance =
0.3 * Hours_Studied * 10 +
0.2 * Attendance +
0.2 * Previous_GPA * 25 +
0.2 * Test_Score +
5 * Extracurricular_Activities +
7 * Study_Habits_Good -
5 * Study_Habits_Poor +
Random Noise
Train-Test Split:
This module returns the predicted performance score based on a user’s input and is
designed to generalize well on unseen data.
This module is responsible for collecting inputs from users. It is implemented using
Gradio, a Python-based GUI framework. The interface is designed with user
experience in mind and allows the following inputs:
21
Hours Studied (Slider)
GPA (Slider)
Each control is clearly labeled and color-coded for accessibility. The use of sliders
makes it easier for users to input numerical data without typing, reducing error
chances. The dark mode UI improves readability and reduces eye strain during long
sessions.
The Gradio Blocks API is used for layout customization, allowing a two-column
design: one for inputs and one for output results.
Once the user inputs their data and clicks the “Predict Now” button, the system
triggers the prediction function. This function performs the following steps:
2. Missing Columns Handling: Adds any columns not supplied in the input
(due to one-hot encoding rules) with default values.
3. Prediction: The input is passed to the trained model, which returns the
predicted performance score.
22
If the user selects the “Show Graph” option, a bar chart is also displayed using
Seaborn and Matplotlib. This graph is styled in dark theme, with custom fonts and
annotations, showing the predicted performance score visually.
This module ensures the user receives both a textual and graphical representation
of the outcome, making the prediction more understandable and impactful.
Visualization plays a key role in user interpretation. A bar graph is used to depict the
predicted performance in percentage form. The graph is styled with the following
attributes:
Dark Theme Styling: All backgrounds and axes are set to dark tones.
The graph provides an immediate visual cue to the user regarding their expected
performance and is especially useful in comparative studies or presentations.
This module is focused purely on enhancing user experience through custom CSS
injected into the Gradio app. Key features include:
Custom font (Segoe UI) and large font sizes for readability.
23
Hover and active states for buttons to increase responsiveness.
The UI styling ensures that even first-time users can comfortably navigate the system
without confusion or technical support.
CHAPTER-5
SYSTEM TESTING
System testing is an essential part of the software development lifecycle. It ensures
that all modules within the system function correctly, that the system meets its
intended objectives, and that the user experience is seamless and error-free. In the
context of the Student Performance Prediction System, testing involves both technical
evaluation of the machine learning model and functional testing of the user interface,
prediction pipeline, and visualization outputs.
This chapter documents the comprehensive testing efforts made to validate the
integrity, accuracy, usability, and robustness of the system.
1. Unit Testing
Unit testing was carried out to verify the correctness of individual code components.
These include:
Preprocessing scripts
24
Gradio interface component callbacks Each function was tested independently
using mock inputs to ensure correct outputs.
2. Integration Testing
After individual components passed unit testing, integration testing was performed.
This tested the flow between modules, such as whether the output from the
preprocessing module was correctly fed into the model, and whether the prediction
output was appropriately rendered in the UI.
3. System Testing
System testing verified the behavior of the entire application when deployed locally.
This included:
Since the application is designed for non-technical users (students and educators), UI
testing was carried out to confirm that:
5. Performance Testing
While the dataset and model are relatively lightweight, performance testing was
conducted to ensure the model delivered predictions with low latency. The system
achieved average response times under 1 second, even with repeated requests.
25
6. Error Handling and Boundary Testing
Special focus was given to how the system handled unusual or incorrect inputs. For
example:
Test
Actual
Case Test Description Input Scenario Expected Result Status
Result
No.
Hours = 1, Valid
Input sliders at
TC01 Attendance = 50, performance Success Passed
default positions
GPA = 2.0 prediction
Lower
Poor study habits Study_Habits = Poor,
TC03 performance Success Passed
and no activities Extracurricular = No
prediction
Caught by input
Text instead of
TC04 Invalid data types widget Success Passed
numbers
restrictions
bar graph
Over 50 such test cases were executed and documented during the final testing phase.
All critical functionalities passed, and only minor UI tuning was required.
To validate the performance and reliability of the Random Forest Regressor model in
the student performance prediction system, several widely-accepted regression
metrics were computed. These metrics help assess how accurately the model can
generalize to unseen data and determine the trustworthiness of its predictions. The
evaluation was conducted using the 20% test dataset reserved during the train-test
split phase.
27
5.3.2 Mean Absolute Error (MAE)
One of the strengths of the Random Forest algorithm is its ability to quantify the
relative importance of input features in making predictions. This is especially
valuable in education-based systems, where actionable insights are crucial.
Feature Contributions:
o Hours_Studied: 31%
o Attendance: 22%
o Previous_GPA: 20%
o Test_Score: 17%
28
o Study_Habits_Good: 6%
o Extracurricular_Activities_Yes: 4%
Interpretation:
o The model identifies Hours Studied as the most significant predictor,
followed by Attendance and Previous GPA.
o This emphasizes the importance of consistent study habits and
classroom presence.
o While extracurriculars and study habits contribute less
quantitatively, they still play a non-negligible role and reflect holistic
performance factors.
To ensure the model did not overfit the training data, its performance was
compared across training and testing sets.
Training R²: ~0.94
Testing R²: ~0.92
Conclusion: The minor drop in performance suggests strong generalization,
meaning the model is not simply memorizing the training data but truly
learning meaningful patterns.
The Gradio interface was tested extensively for usability, responsiveness, and visual
appeal. Some key aspects of UI testing included:
Label clarity: All labels used descriptive icons and text (e.g., “📘 Hours
Studied”), helping users understand the purpose quickly.
Default values: All inputs were initialized with mid-range or realistic defaults.
Accessibility: The use of contrasting colors, large fonts, and structured layouts
improved accessibility, especially in dark mode.
Responsiveness: The layout maintained alignment on various screen sizes,
including desktops, tablets, and mobile browsers.
Latency: The prediction and graph rendering completed in under 1 second in
nearly all test runs.
Feedback from peer users and faculty reviewers confirmed that the interface was
intuitive, user-friendly, and visually engaging.
30
5.6 ERROR HANDLING AND EXCEPTION TESTING
An important part of testing is making sure that the system fails gracefully. Error
handling ensures that incorrect or incomplete inputs do not break the system or result
in misleading outputs.
Scenarios tested:
Empty Input Fields: Gradio prevented execution until all required inputs
were provided.
Corrupted Model File: Error messages displayed and interface disabled
safely.
Unsupported Input Types: The widget constraints restricted user inputs,
ensuring valid data types.
Server Interruptions: The local server was restarted without residual errors.
Additionally, error logging mechanisms were in place during development to capture
tracebacks, which helped fix bugs quickly. In production, user errors are suppressed
and replaced with friendly messages or UI prompts.
31
CHAPTER-6
IMPLEMENTATION
6.1 IMPLEMENTING:
The implementation of the Student Performance Prediction System involved
developing a full end-to-end machine learning pipeline in Python, starting from
synthetic data generation to UI deployment using Gradio. The core logic was
structured in modular form, allowing clean separation between data processing, model
training, and interface design. A Random Forest Regressor was selected for its
accuracy and interpretability, trained on a dataset of 1,000 synthetically generated
student records. The system was developed and tested in a Jupyter Notebook
environment, then transitioned to a standalone Python script for deployment. The
Gradio interface was customized using HTML and CSS-injected markdown to ensure
accessibility and modern aesthetics. Upon launching the application, users can input
academic parameters, trigger a model prediction, and receive both a numeric score
and visual graph. The implementation showcases a complete, lightweight predictive
analytics application that can be extended further using real-world datasets or
institutional APIs.
32
6.2 SOURCE CODE:
import pandas as pd
import numpy as np
import gradio as gr
n_students = 1000
np.random.seed(42)
df = pd.DataFrame({
})
# 2. One-Hot Encoding
df['Performance'] = (
0.3 * df['Hours_Studied'] * 10 +
33
0.2 * df['Attendance'] +
0.2 * df['Previous_GPA'] * 25 +
0.2 * df['Test_Score'] +
5 * df.get('Extracurricular_Activities_Yes', 0) +
7 * df.get('Study_Habits_Good', 0) -
5 * df.get('Study_Habits_Poor', 0) +
np.random.normal(0, 2, n_students)
).clip(0, 100)
# 4. Train Model
X = df.drop('Performance', axis=1)
y = df['Performance']
model.fit(X_train, y_train)
# 5. Prediction Function
input_data = pd.DataFrame({
'Hours_Studied': [hours],
'Attendance': [attendance],
'Previous_GPA': [gpa],
'Test_Score': [test_score],
34
})
input_data[col] = 0
input_data = input_data[X.columns]
prediction = model.predict(input_data)[0]
plot = None
if show_graph:
plt.style.use('dark_background')
bar = sns.barplot(
x=['Predicted Performance'],
y=[prediction],
palette=['#00ffcc'],
ax=ax
ax.set_ylim(0, 100)
ax.set_facecolor('#121212')
fig.patch.set_facecolor('#121212')
ax.tick_params(colors='white', labelsize=14)
sns.despine(left=True, bottom=True)
plt.tight_layout()
plot = fig
35
return f"🎯 Predicted Performance: {prediction:.2f}%", plot
gr.Markdown("""
<style>
body, .gradio-container {
label, .gr-label {
.gr-button {
36
font-size: 18px !important;
h1 {
</style>
""")
with gr.Box():
gr.Markdown("""
</div>
""")
with gr.Row():
with gr.Column(scale=2):
37
show_graph = gr.Checkbox(label="📊 Show Performance Graph")
with gr.Column(scale=1):
submit.click(
predict_perf,
outputs=[result, perf_plot]
app.launch()
38
CHAPTER – 7
Through rigorous testing, including unit, system, and UI testing, the platform
proved to be stable, efficient, and user-friendly. Visual output in the form of styled
bar graphs enhanced user engagement, while real-time predictions made the system
both interactive and informative. Additionally, the system's modular design allows
easy integration with real educational datasets and institutional LMS platforms in the
future.
This project not only fulfills its objective of predicting student performance
but also sets a precedent for how machine learning can support decision-making in
education. It shows that even with synthetic data and lightweight tools, intelligent
academic systems can be built that are both accessible and scalable.
39
40
41
7.2 FUTURE WORK
While the current system performs effectively in controlled conditions with synthetic
data, several opportunities exist for expansion and refinement:
By addressing these areas, the system can evolve into a comprehensive EdTech
solution capable of transforming data into personalized academic intelligence at scale.
42
REFERENCES
1. Romero, C., & Ventura, S. (2010). Educational data mining: A review of the
state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C
(Applications and Reviews), 40(6), 601–618.
2. Kotsiantis, S. B., Pierrakeas, C., & Pintelas, P. (2004). Prediction of student’s
performance in distance learning using machine learning techniques. Applied
Artificial Intelligence, 18(5), 411–426.
3. Al-Barrak, M. A., & Al-Razgan, M. (2016). Predicting students’ performance
through classification: A case study. Journal of Theoretical and Applied
Information Technology, 88(1), 1–7.
4. Patki, N., Wedge, R., & Veeramachaneni, K. (2016). The synthetic data vault.
In 2016 IEEE International Conference on Data Science and Advanced
Analytics (DSAA) (pp. 399–410). IEEE.
5. Chamorro-Premuzic, T., & Furnham, A. (2003). Personality traits and
academic examination performance. European Journal of Personality, 17(3),
237–250.
6. Marsh, H. W., & Kleitman, S. (2002). Extracurricular school activities: The
good, the bad, and the nonlinear. Harvard Educational Review, 72(4), 464–
515.
7. Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the
visualization zoo. Communications of the ACM, 53(6), 59–67.
8. Vandamme, J. P., Meskens, N., & Superby, J. F. (2007). Predicting academic
performance by data mining methods. Education Economics, 15(4), 405–419.
9. Zafra, A., & Ventura, S. (2009). Multi-objective genetic programming for
multiple instance learning in classification tasks. Applied Soft Computing,
9(2), 760–771.
10. Dekker, G. W., Pechenizkiy, M., & Vleeshouwers, J. M. (2009). Predicting
students drop out: A case study. In Proceedings of the 2nd International
Conference on Educational Data Mining (pp. 41–50).
11. Kumar, M., & Pal, S. (2011). Data mining: A prediction for performance
improvement using classification. International Journal of Computer Science
and Information Security, 9(4), 136–140.
43
12. Gradio Developers. (2023). Gradio Documentation. Retrieved from
https://www.gradio.app/
13. Scikit-learn Developers. (2023). Scikit-learn: Machine Learning in Python.
Retrieved from https://scikit-learn.org/
44