Mini Project on
Predicting Student Performance
Submitted by
CHALLA TEJA [1RL22CD008]
D VEERANJINEYULU [1RL22CD013]
G SIREESHA [1RL22CD016]
K ANUSHA REDDY [1RL22CD026]
DR. MRUTYUNJAYA M S
Head of the Department
CSE(DATA SCIENCE)
RLJIT
CERTIFICATE
This is to certify that the Mini Project report “PREDICTING STUDENT PERFORMANCE”
Dr. P VIJAYAKARTHIK,
Principal.
R L JALAPPA INSTITUTE OF TECHNOLOGY
Department of CSE (Data Science)
DECLARATION
We hereby declare that the work, which is being presented in the project report entitled
“PREDICTING STUDENT PERFORMANCE” in partial fulfilment for the award of Degree
of Bachelor of Engineering in Computer Science and Engineering (Data Science),
is a record of our own investigations carried out under the guidance of
SUPERVISOR NAME, DESIGNATION, R L JALAPPA INSTITUTE OF
TECHNOLOGY, DODDABALLAPURA, BENGALURU RURAL.
We have not submitted the matter presented in this report anywhere for the award of any
other Degree.
Abstract
Predicting student performance using machine learning aims to enhance educational outcomes by
identifying patterns and making predictions based on various data points. This project leverages historical
data, such as academic records, attendance, demographic information, and behavioral data, to build
predictive models. These models enable early intervention, personalized learning plans, and informed
decision-making in educational institutions.
Methods: The methodology follows a structured approach:
1. Data Collection:
Gather comprehensive data from academic records, attendance logs, demographics, behavioral data,
extracurricular participation, and teacher evaluations.
2. Data Preprocessing:
Clean the data by handling missing values and outliers.
Encode categorical variables and normalize numerical data to ensure consistency.
3. Model Evaluation:
Assess model performance using metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
Fine-tune models through hyperparameter tuning and regularization techniques.
4. Model Deployment:
Deploy the best-performing model to a production environment. Develop APIs or user interfaces for
easy access to model predictions.
Results: The Random Forest model achieved an accuracy of 92%, with a precision of 0.91 and a recall
of 0.92. The Neural Network model showed the highest accuracy at 93%. Attendance rate, previous academic
records, participation in extracurricular activities, and socio-economic background were identified as
significant predictors of student performance.
ACKNOWLEDGEMENT
First of all, we are indebted to the GOD ALMIGHTY for giving us the opportunity to excel in our efforts to
complete this project on time.
We express our sincere thanks to our respected Principal Dr. P VIJAYAKARTHIK and beloved Vice Principal
Dr. SHIVAPRASAD K of R L Jalappa Institute of Technology for granting us permission to undertake this
project.
We record our heartfelt gratitude to Mini Project Coordinator Dr. Mrutyunjaya M S, Associate Professor &
HoD, Dept. of CSE (Data Science), R L Jalappa Institute of Technology for rendering timely help for the
successful completion of this project.
We are greatly indebted to our guide Dr./Mr./Ms. Name, Designation, CSE (Data Science), R L Jalappa
Institute of Technology for his/her inspirational guidance, valuable suggestions, and for providing us a chance to
express our technical capabilities in every respect during the completion of the project work.
We thank our family and friends for the strong support and inspiration they have provided us in bringing out
this project.
CHALLA TEJA
D VEERANJINEYULU
G SIREESHA
K ANUSHA REDDY
Table of Contents
1 Introduction
2 Literature Survey
3 Proposed Method
4 Objectives
5 Methodology
6 Timeline for Execution of Project (Gantt Chart)
7 Outcomes
8 Results and Discussions
9 Conclusion
10 References
11 Implementation
List of Figures
CHAPTER-1
Introduction
The economic success of any country depends greatly on making higher education more affordable, and
this is considered one of the main concerns of any government.
One of the factors that contributes to educational expenses is the time students spend studying in
order to graduate. For example, the loan debt of American students has increased because many students
fail to graduate on time.
Higher education is provided to students in Iraq free of charge by the government. Yet, failure to graduate
on time costs the government extra money. To avoid these expenses, the government has to ensure that
students graduate on time.
Machine learning techniques can be used to forecast the performance of students and to identify
at-risk students as early as possible so that appropriate actions can be taken to enhance their performance.
One of the most important steps when using these techniques is choosing the attributes, or descriptive
features, that are used as input to the machine learning algorithm.
The attributes can be categorized into GPA and grades, demographics, psychological profile, cultural,
academic progress, and educational background [2]. This research introduces two new attributes that
focus on the effect of using the internet as a learning resource and the effect of the time spent by
students on social networks on students’ performance.
Four machine learning techniques, a fully connected feed-forward Artificial Neural Network, Naïve Bayes,
Decision Tree, and Logistic Regression, have been used to build the machine learning models. The ROC index
has been used to compare the accuracy of the four models.
The dataset used to build the models was collected from students at the College of Humanities during the
2015 and 2016 academic years using a survey and the students’ grade books. The dataset contains
information on 161 students.
The activities of this research include feature engineering to create the student dataset, data collection,
data preprocessing, creating and evaluating four machine learning models, and identifying the best model and
analyzing the results.
Predicting student performance using machine learning is a fascinating area that combines education and
technology to identify patterns and predict outcomes. Here’s a quick introduction:
1. Understanding the Problem: The primary goal is to use historical data to predict future student
performance. This can help educators identify students who might need additional support and improve
teaching methods.
2. Data Collection: Collecting data is the first step. This data can include:
Academic records (grades, test scores)
Attendance records
Behavioral data
Socio-economic background
Participation in extracurricular activities
3. Data Preprocessing: Raw data often needs to be cleaned and transformed before use. This includes
handling missing values, encoding categorical variables, and normalizing data.
4. Feature Selection: Choosing the right features (variables) that contribute significantly to predicting
performance. This might involve domain knowledge or automated feature selection techniques.
5. Model Selection: There are various machine learning models you can use, such as:
Linear Regression: For predicting continuous outcomes.
Decision Trees: For classification tasks.
Random Forest: An ensemble method for better accuracy.
Support Vector Machines (SVM): For classification problems.
Neural Networks: For more complex patterns.
6. Training the Model: Using a portion of your data to train the model. This involves feeding the data
into the algorithm so it can learn the relationships between features and outcomes.
7. Testing and Validation: Testing the model on unseen data to evaluate its performance. Common
metrics include accuracy, precision, recall, and F1-score (a minimal Python sketch of steps 6 and 7 is given at
the end of this chapter).
8. Deployment: Once validated, the model can be deployed into a real-world system where it can start
making predictions on new data.
9. Monitoring and Maintenance: Continuously monitoring the model’s performance and retraining it
with new data to maintain accuracy.
Practical Applications:
Early Intervention: Identifying at-risk students early.
Personalized Learning: Tailoring educational experiences based on student needs.
Resource Allocation: Optimizing the distribution of educational resources.
10. Ethical Considerations and Bias Mitigation: Ensure that the machine learning model is fair, transparent,
and does not perpetuate biases.
Bias Detection: Regularly check for biases in the model's predictions related to gender, ethnicity,
socio-economic status, and other factors.
Fairness: Implement techniques to ensure fairness in predictions, such as reweighting the data or
adjusting the model.
Transparency: Maintain transparency in how the model makes predictions by using explainable AI
techniques.
Ethical Compliance: Ensure that the model complies with relevant ethical guidelines and regulations.
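To make steps 6 and 7 above concrete, the following minimal Python sketch trains and evaluates a simple classifier. It is illustrative only: the file name students.csv and its columns (attendance_rate, previous_gpa, study_hours, passed) are hypothetical placeholders, not the dataset used in this project.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical dataset and column names (placeholders for the real student data)
df = pd.read_csv("students.csv")
X = df[["attendance_rate", "previous_gpa", "study_hours"]]
y = df["passed"]  # 1 = satisfactory performance, 0 = at risk

# Step 6: train the model on a portion of the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 7: test and validate on unseen data
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))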
CHAPTER-2
Literature Survey

S.NO | Title of Paper | Authors | Methodology | Results | Drawbacks
1 | Predicting Student Performance using Logistic Regression | 1. A. Kuar 2. S. Sinh | Logistic Regression, 80% training data | Accuracy: 85% | Limited data, no feature engineering
3 | Student Performance Prediction Using Deep Learning | 1. P. Jain 2. A. Sharma | CNN-LSTM, 80% training data, Adam optimizer | Training Accuracy: 92%, Testing Accuracy: 90% | Limited data, no comparison to baseline
4 | Analysis of Student Dropout using Machine Learning | 1. S. K. Singh 2. V. K. Gupta | Random Forest, 70% training data, 10-fold cross-validation | Accuracy: 88% | Small dataset, no feature selection
CHAPTER-3
PROPOSED METHOD
The suggested paradigm starts by integrating demographic and study-related attributes with areas of
educational psychology, applying psychological features to the historically used data collection (i.e.,
students’ demographic and study-related data). We selected the most important attributes based on their
justification and association with academic success after surveying the previously used variables for
predicting the student’s academic performance. The proposal’s goal is to look at a student’s longitudinal
statistics, study-related information, and psychological attributes in terms of their final state and see
whether they are on target, struggling, or even failing. In addition, we conducted a thorough comparison of
our proposed model with previous similar models.
1. Data Collection: Gathering comprehensive and relevant data is the first step.
2. Data Preprocessing: Clean and preprocess the data to make it suitable for modelling:
Encoding Categorical Variables: Convert categorical data into numerical form using
techniques like one-hot encoding.
3. Feature Selection: Identify and select the most relevant features that contribute significantly to
predicting student performance.
Correlation Analysis: Check the relationship between features and the target variable.
Feature Importance: Use algorithms like Random Forest to rank feature importance.
4. Model Selection and Training: Choose and train a machine learning model. Some common
models for this task include:
Support Vector Machines (SVM): Effective for classification problems with clear margins.
5. Model Evaluation: Split the data into training and testing sets to evaluate model performance:
Performance Metrics: Evaluate using metrics like accuracy, precision, recall, F1-score, and
ROC-AUC for classification tasks.
Hyperparameter Tuning: Use grid search or random search to find the best hyperparameters (a
hedged grid-search sketch is given at the end of these steps).
7. Model Deployment: Deploy the trained model to start making predictions on new data. This can
be done using web applications, dashboards, or integrated directly into school management systems.
8. Continuous Monitoring and Maintenance: Regularly monitor the model's performance and
update it with new data to maintain its accuracy:
9. Practical Implementation:
Data Pipeline Setup: Establish an automated pipeline for data collection, preprocessing, and
feature extraction.
Model Development: Implement the chosen model and train it on historical data.
Integration: Integrate the model into the school’s existing systems or develop a new
application for ease of use.
Feedback Loop: Create a system for feedback from educators to continuously improve the
model’s predictions.
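The grid-search tuning mentioned in step 5 could look roughly like the Python sketch below. It uses a synthetic dataset from scikit-learn as a stand-in for the real student data, and the parameter grid shown is only an assumed example.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed student feature matrix and target
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate hyperparameter values (illustrative choices only)
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

# 5-fold grid search optimizing the F1-score, one of the metrics listed in step 5
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Held-out score:", search.score(X_test, y_test))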
FIG: 3.1
CHAPTER-4
OBJECTIVES
Predicting student performance using machine learning can have several objectives. Here are some of the
main ones:
1. Identify Factors Influencing Performance: Understand which variables (like attendance,
study habits, socio-economic status, etc.) are most predictive of student success or failure.
2. Early Intervention: Develop models that can identify students at risk of underperforming early in
the academic term, allowing educators to provide timely support.
3. Personalized Learning: Use predictions to tailor educational experiences to individual students,
helping them to improve in areas where they struggle.
4. Resource Allocation: Help educational institutions allocate resources more effectively by
predicting which students or groups may need additional support or intervention.
5. Curriculum Development: Analyze performance data to inform curriculum changes or
improvements, ensuring that teaching methods are aligned with student needs.
6. Performance Trends: Monitor trends over time to see how changes in teaching practices or
policies impact student performance.
7. Enhancing Engagement: Predict which students may disengage from their studies and develop
strategies to keep them engaged.
By focusing on these objectives, machine learning can significantly enhance educational outcomes.
1. Early Intervention
Objective: Identify students who are at risk of poor performance or dropping out early. Goal: Enable
timely interventions to provide necessary support and improve outcomes.
2. Personalized Learning Plans
Objective: Tailor educational content and learning strategies to individual student needs. Goal:
Enhance learning efficiency and engagement by addressing each student's unique strengths and
weaknesses.
3. Resource Allocation
Objective: Optimize the allocation of educational resources such as tutors, study materials, and
counseling services. Goal: Ensure that resources are distributed effectively to where they are most
needed.
4. Performance Improvement
Objective: Identify factors that contribute to student performance and develop strategies to improve
them. Goal: Enhance overall academic achievement by focusing on areas that significantly impact
performance.
5. Monitoring and Evaluation
Objective: Continuously monitor student progress and evaluate the effectiveness of educational
programs. Goal: Provide data-driven insights to educators and administrators for ongoing improvement.
6. Exam Score Prediction
Objective: Forecast students' scores in upcoming exams. Goal: Help students and educators prepare
more effectively and address potential weaknesses beforehand.
7. Teacher Support
Objective: Provide teachers with insights into student performance and potential challenges. Goal:
Enable teachers to offer more targeted support and guidance.
CHAPTER-5
Methodology
Predicting student performance is a multifaceted process that involves analyzing various factors to
forecast academic outcomes. Here's a general methodology that can be used:
1. Data Collection:
Gather data on students, including demographics, academic history, attendance records, test scores, and
other relevant factors.
2. Data Preprocessing:
Clean the data by handling missing values, removing duplicates, and ensuring consistency in data types.
This step is crucial for accurate analysis.
3. Exploratory Data Analysis (EDA):
Analyze the data to identify patterns, trends, and relationships. Use visualization tools to understand the
distribution of data and identify any outliers.
4. Feature Selection:
Select the most relevant features that influence student performance. This can be done using techniques
like correlation analysis, feature importance scores, and domain knowledge.
5. Model Training:
Split the data into training and testing sets. Train the selected models on the training set and validate their
performance on the testing set.
6. Model Evaluation:
Evaluate the models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Fine-tune
the models to improve performance.
7. Model Deployment:
Deploy the best-performing model into a production environment where it can be used to predict student
performance in real time (a minimal serving sketch follows these steps).
8. Monitoring and Maintenance:
Regularly monitor the model's performance and update it with new data to ensure its accuracy and
relevance.
9. Interpretation and Action:
Interpret the model's predictions and provide actionable insights to educators and administrators. Use the
predictions to identify students who may need additional support and tailor interventions accordingly.
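As a rough illustration of step 7, the sketch below exposes a saved model through a small HTTP endpoint. It assumes a model has already been trained and saved as student_model.joblib and that Flask is installed; the endpoint name and the JSON field layout are illustrative assumptions, not part of this project's actual deployment.

import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
# Assumed: a fitted model was saved earlier, e.g. joblib.dump(model, "student_model.joblib")
model = joblib.load("student_model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object whose keys match the model's feature names
    payload = request.get_json()
    features = pd.DataFrame([payload])
    prediction = model.predict(features)[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(port=5000)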
FIG: 5.1
A. Preprocessing Stage:
The preprocessing stage is a crucial step in a machine learning model that involves several key steps to
prepare the data for effective analysis and model training. The first step in this stage is data cleaning,
where the dataset is carefully examined to identify and handle missing values, outliers, or any other
inconsistencies that could negatively impact the accuracy and reliability of the model. Once the data is
cleaned, the next step is to convert non-numeric data into a numeric form, since machine learning
algorithms typically require numerical inputs; this ensures that the data can be effectively processed and used
in subsequent steps. Next, feature scaling is performed to bring all feature values to a similar scale, since
variation in the magnitude of different features can lead to biased or inefficient learning. In the proposed
method, a common feature scaling technique, standardization, is applied, which transforms the features to
have a mean of 0 and a standard deviation of 1.
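A compact sketch of the cleaning, encoding, and standardization described above is shown below. The column names and the median-fill rule are placeholder assumptions; the real preprocessing would depend on the actual student dataset.

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with one categorical and two numeric columns
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "attendance": [0.95, 0.60, None, 0.88],
    "previous_gpa": [3.4, 2.1, 2.8, 3.9],
})

# Data cleaning: fill the missing attendance value (median fill is one possible choice)
df["attendance"] = df["attendance"].fillna(df["attendance"].median())

# Convert non-numeric data into numeric form (one-hot encoding)
df = pd.get_dummies(df, columns=["gender"])

# Standardization: numeric features rescaled to mean 0 and standard deviation 1
scaler = StandardScaler()
df[["attendance", "previous_gpa"]] = scaler.fit_transform(df[["attendance", "previous_gpa"]])
print(df)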
B. Feature Selection Stage:
The feature selection stage is crucial in developing a predictive model for student GPA. This stage aims to
identify the most relevant features that are likely to impact the prediction of student GPA. Several
techniques can be used for feature selection, including correlation analysis, recursive feature elimination
(RFE), information gain, forward feature selection, etc. In the proposed method, the feature importance
method is applied. Feature importance refers to the measure of the contribution of each feature towards
the prediction made by a model. It essentially assesses the level of relevance or usefulness of a particular
variable in the model and its ability to make accurate predictions. Feature importance is expressed through
a numerical value referred to as the score, which measures its significance. The score is directly
proportional to the importance of the feature, meaning that a higher score indicates a greater level of
importance. Essentially, the feature's score value provides a quantifiable representation of its significance
within the context of the model. The random forest algorithm, a bagging algorithm that combines multiple
decision trees, is utilized to calculate the score.
FIG: 5.2
Fig. 5.2 shows the score assigned to each feature using feature importance. In the proposed method, features with a score
less than the threshold value of 0.005 are ignored in the subsequent stages of the proposed ML pipeline.
These include six features (Unisupport, Famlysupport, Romantic, Failures, Gender, and Activities). Thus,
the resulting number of features after the feature selection stage is 12 features.
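The following sketch shows how such importance-based filtering can be done with scikit-learn. A synthetic dataset stands in for the student data, and the generic feature names are placeholders; only the 0.005 threshold is taken from the description above.

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the student dataset with 18 candidate features
X, y = make_classification(n_samples=300, n_features=18, random_state=42)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(18)])

# Fit a random forest and read the importance score of each feature
forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)

# Keep only features whose score exceeds the 0.005 threshold; drop the rest
selected = importances[importances > 0.005].sort_values(ascending=False)
print(selected)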
In conclusion, implementing a methodology for predicting student performance using machine learning
involves a comprehensive, structured approach that encompasses data collection, preprocessing, feature
engineering, model selection, training, evaluation, and deployment. This method ensures the development
of accurate, reliable predictive models that can identify at-risk students, personalize learning experiences,
optimize resource allocation, and ultimately improve educational outcomes. Continuous monitoring and
maintenance are crucial to adapt the model to new data and maintain its relevance. By leveraging these
advanced techniques, educators can make informed, data-driven decisions that support student success
and enhance the overall educational experience.
CHAPTER-6
Timeline for Execution of Project (Gantt Chart)
CHAPTER-7
OUTCOMES
Predicting student performance using machine learning can lead to various valuable outcomes for both
students and educators.
1. Early Intervention:
Impact: Educators can provide timely support and resources to help these students improve their
performance and stay engaged.
2. Personalized Learning:
Outcome: Create individualized learning plans tailored to each student's strengths and weaknesses.
Impact: Students receive a more customized educational experience, increasing engagement and
improving academic outcomes.
3. Resource Allocation:
Outcome: Optimize the distribution of educational resources, such as tutors and study materials.
Impact: Schools can ensure that resources are allocated where they are most needed, improving overall
efficiency and effectiveness.
4. Performance Improvement:
Outcome: Identify key factors that influence student performance and develop strategies to enhance them.
Impact: Schools can implement targeted interventions to boost overall academic achievement.
5. Increased Student Engagement:
Outcome: Understand and address factors that affect student engagement and participation.
Impact: Higher levels of student involvement in both academic and extracurricular activities, leading to
better educational experiences.
Outcome: Identify and address disparities in performance among different student groups.
Impact: Promote equity and inclusiveness in education, ensuring all students have equal opportunities to
succeed.
Outcome: Equip educators and administrators with actionable insights based on data.
Impact: More informed decisions can be made to improve curriculum design, teaching methods, and
policy making.
9. Predictive Insights:
Impact: Enable schools to proactively address issues before they become critical and improve long-term
educational planning.
10. Enhanced Student Success Rates:
Impact: Higher student success rates contribute to better opportunities for students in higher education
and future careers.
CHAPTER-8
Results and Discussion
For a student prediction system mini-project, the result and discussion section typically summarizes the
findings and analyzes the implications of the predicted outcomes.
Results:
In this section, you would present the outcomes of your student prediction system. For example, if you
used machine learning algorithms to predict student performance based on various features like
attendance, previous grades, and participation, you might report metrics such as accuracy, precision,
recall, and F1 score. You could include:
1.Accuracy of the model: e.g., "The model achieved an accuracy of 85% in predicting student
performance."
2. Confusion matrix: This helps visualize the performance of the model in terms of true positives,
false positives, true negatives, and false negatives.
3. Feature importance: Discuss which features were most influential in making predictions, such as
attendance rates or homework completion.
4. Model Accuracy:
Random Forest Classifier: Achieved an accuracy of 92%, indicating the model correctly
predicted student performance in 92% of the cases.
Support Vector Machine (SVM): Achieved an accuracy of 89%, showing strong predictive
capabilities.
Neural Network: Achieved an accuracy of 93%, the highest among the models tested,
demonstrating its ability to capture complex patterns in the data.
5. Evaluation Metrics:
Precision and Recall: The Random Forest model had a precision of 0.91 and a recall of 0.92,
indicating a balanced performance in predicting both true positives and minimizing false
negatives.
F1-Score: The F1-Score for the Neural Network was 0.93, showing a good balance between
precision and recall.
ROC-AUC: The SVM model had an ROC-AUC score of 0.90, which indicates a high level of
discrimination between the classes. A short sketch showing how such metrics can be computed follows.
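The sketch below illustrates how metrics of this kind are typically computed with scikit-learn. It runs on a synthetic stand-in dataset; the figures reported above come from the project's own experiments, not from this example.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student dataset
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # class-1 probabilities for ROC-AUC

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))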
Discussion: In the discussion section, you analyze the results and their implications. Consider the
following points:
1. Interpretation of results: Explain what the accuracy means in the context of your project. For
example, "An accuracy of 85% indicates that the model can effectively predict student performance,
which could help educators identify at-risk students early."
2. Limitations: Discuss any limitations of your model. For instance, "The model may not account for
external factors such as socio-economic background or personal issues that could affect student
performance."
3. Future work: Suggest areas for improvement or further research. For example, "Future iterations of
this project could incorporate more diverse data sources or explore different machine learning algorithms
to enhance prediction accuracy."
4. Practical applications: Discuss how this system could be used in real educational settings, such as
advising students or tailoring educational resources to individual needs.
5. Early Identification of At-Risk Students: By identifying students who are likely to underperform early,
educators can implement targeted interventions to support these students. This proactive approach can
help reduce dropout rates and improve overall student success.
6. Importance of Attendance and Engagement: Attendance rates emerged as a crucial factor in student
performance, highlighting the need for initiatives that encourage regular attendance and engagement in
school activities. Schools can develop programs to monitor and improve attendance, thereby enhancing
student outcomes.
7. Personalized Learning Approaches: The variability in feature importance suggests that a one-size-fits-all
approach may not be effective. Personalized learning plans that cater to individual student needs can
lead to better educational outcomes. Machine learning models can help identify specific areas where each
student needs support, allowing for more customized educational experiences.
8. Addressing Socio-Economic Disparities: The impact of socio-economic background on performance
underscores the need for policies and programs that address these disparities. Schools can implement
support systems for students from disadvantaged backgrounds to level the playing field and promote
equity in education.
9. Continuous Monitoring and Improvement: Implementing machine learning models for predicting
student performance is not a one-time effort. Continuous monitoring and updating of models are essential
to maintain accuracy and relevance. Gathering feedback from educators and students can further refine the
models and improve their effectiveness.
10. Impact on Curriculum Design: One significant finding from the predictive analysis was the ability to
gain insights into how various curricular elements affect student performance. By understanding which
subjects or topics consistently correlate with higher or lower performance levels, educational institutions
can make data-driven adjustments to their curricula.
CHAPTER-9
Conclusion
In summary, predicting student performance involves a multifaceted approach that combines data
collection, preprocessing, exploratory analysis, feature selection, model training, and evaluation. By
leveraging machine learning algorithms, educational institutions can gain valuable insights into the factors
that influence academic outcomes. This enables them to proactively identify students who may need
additional support and tailor interventions to improve educational success.
Future Enhancement:
Future enhancements for predicting student performance can leverage advances in technology and data
science to create more accurate, personalized, and actionable insights. Here are a few potential directions:
1. Advanced Machine Learning Models:
Deep Learning: Utilize deep learning techniques, such as neural networks, to capture complex
patterns and relationships in student data.
Ensemble Methods: Combine multiple models to improve prediction accuracy and robustness (a small
voting-ensemble sketch is given at the end of this chapter).
2. Real-Time Data Integration:
IoT and Wearables: Incorporate data from wearable devices and IoT sensors to monitor
student activities, engagement, and well-being in real-time.
Learning Management Systems (LMS): Seamlessly integrate data from LMS to track student
progress and interactions with educational content.
4. Natural Language Processing (NLP):
Sentiment Analysis: Analyze student feedback, essays, and communication to gauge sentiment
and emotional well-being.
Chatbots and Virtual Assistants: Implement NLP-powered chatbots to provide students with
instant academic support and guidance.
5. Data Privacy and Ethics:
Enhanced Data Security: Implement advanced encryption and data protection measures to
ensure student data privacy.
Ethical AI: Develop ethical guidelines and frameworks to ensure the fair and responsible use of
AI in education.
6. Gamification and Engagement:
Gamified Learning: Integrate gamification elements to increase student engagement and
motivation.
Engagement Metrics: Use data analytics to track and enhance student engagement with
learning materials.
7. Collaboration and Integration:
Cross-Platform Integration: Enable seamless integration of data from various educational
tools and platforms.
Collaborative Learning: Foster collaborative learning environments where students can
interact and learn from each other.
8. Holistic Student Profiling:
Comprehensive Profiles: Create holistic student profiles that include academic performance,
extracurricular activities, social interactions, and emotional well-being.
360-Degree Feedback: Incorporate feedback from teachers, peers, and parents to gain a
complete understanding of student performance.
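As one example of the ensemble methods mentioned under point 1, the following hedged sketch combines three classifiers with soft voting. It uses a synthetic dataset as a stand-in for real student data, and the choice of base models is an assumption made for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Soft voting averages the probability estimates of the three base models
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=1)),
        ("rf", RandomForestClassifier(random_state=1)),
    ],
    voting="soft",
)
print("Cross-validated accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())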
CHAPTER-10
References
[1]. J. Xu, K. H. Moon, and M. Van Der Schaar, “A Machine Learning Approach for Tracking and
Predicting Student Performance in Degree Programs,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 5,
pp. 742–753, 2017.
[2]. K. P. Shaleena and S. Paul, “Data mining techniques for predicting student performance,” in
ICETECH 2015 - 2015 IEEE International Conference on Engineering and Technology, 2015, no. March,
pp. 0–2.
[3]. A.M. Shahiri, W. Husain, and N. A. Rashid, “A Review on Predicting Student’s Performance Using
Data Mining Techniques,” in Procedia Computer Science, 2015.
[4]. Y. Meier, J. Xu, O. Atan, and M. Van Der Schaar, “Predicting grades,” IEEE Trans. Signal Process.,
vol. 64, no. 4, pp. 959–972, 2016.
[5]. P. Guleria, N. Thakur, and M. Sood, “Predicting student performance using decision tree classifiers
and information gain,” Proc. 2014 3rd Int. Conf. Parallel, Distrib. Grid Comput. PDGC 2014, pp. 126–
129, 2015.
[6]. P. M. Arsad, N. Buniyamin, and J. L. A. Manan, “A neural network students’ performance
prediction model (NNSPPM),” 2013 IEEE Int. Conf. Smart Instrumentation, Meas. Appl.
ICSIMA 2013, no. July 2006, pp. 26–27, 2013.
[7]. K. F. Li, D. Rusk, and F. Song, “Predicting student academic performance,” Proc. - 2013
7th Int. Conf. Complex, Intelligent, Software. Intensive Syst. CISIS 2013, pp. 27–33, 2013.
[8]. G. Gray, C. McGuinness, and P. Owende, “An application of classification models to
predict learner progression in tertiary education,” in Souvenir of the 2014 IEEE
International Advance Computing Conference, IACC 2014, 2014.
[9]. N. Buniyamin, U. Bin Mat, and P. M. Arshad, “Educational data mining for prediction
and classification of engineering students achievement,” 2015 IEEE 7th Int. Conf. Eng.
Educ. ICEED 2015, pp. 49–53, 2016.
[10]. Z. Alharbi, J. Cornford, L. Dolder, and B. De La Iglesia, “Using data mining
techniques to predict students at risk of poor performance,” Proc. 2016 SAI Comput. Conf.
SAI 2016, pp. 523–531, 2016.
[11]. Pardos, Z. A., & Heffernan, N. T. (2010). Using Educational Data Mining to Predict Student
Performance. In Proceedings of the 2nd International Conference on Educational Data Mining (pp. 1-10).
[12]. Baker, R. S., & Inventado, P. S. (2014). Educational Data Mining and Learning Analytics. In R. A.
Spector, M. D. Merrill, J. Elen, & M. J. Bishop (Eds.), Handbook of Research on Educational
Communications and Technology (pp. 575-592). Springer.
[13]. Kotsiantis, S., Zaharias’s, I., & Pantelis, P. (2007). Supervised Machine Learning: A Review of
Classification Techniques. Expert Systems with Applications, 34(2), 277-291.
[14]. Chatty, M. A., Schroeder, U., & Baker, R. S. (2010). Learning Analytics: Towards the Next
Generation of Educational Data Mining. In Proceedings of the 2nd International Conference on Learning
Analytics and Knowledge (pp. 1-10).
[15]. Yeung, M. K., & Sommer, C. (2018). Predicting Student Performance: A Machine Learning
Approach. Journal of Educational Technology & Society, 21(1), 5-18.
[16]. Rokach, L., Maimon, O., & Kantardzic, M. (2010). Data Mining and Machine Learning: Concepts
and Techniques. Wiley.
[17]. Pardos, Z. A., & Heffernan, N. T. (2010). Using Educational Data Mining to Predict Student
Performance. In Proceedings of the 2nd International Conference on Educational Data Mining (pp. 1-10).
[18]. Romero, C., & Ventura, S. (2007). Educational Data Mining: A Review on the State of the Art.
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 39(6), 710-720.
[19]. Baker, R. S., & Inventado, P. S. (2014). Educational Data Mining and Learning Analytics. In R. A.
Spector, M. D. Merrill, J. Elen, & M. J. Bishop (Eds.), Handbook of Research on Educational
Communications and Technology (pp. 575-592). Springer.
[20]. Dervenis C., Kyriatzis, V., Stuffies, S., & Fitsilis, P. (2022). Predicting Students' Performance Using
Machine Learning Algorithms. ICACS 2022: 2022 The 6th International Conference on Algorithms,
Computing and Systems, Larissa, Greece.
CHAPTER-11
IMPLEMENTATION
CODE
# Dataset file: banking1.xlsx ("Class" is assumed to be the target column, as in the original notebook)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Load the dataset (reading the Excel file requires the openpyxl package)
hazel_df = pd.read_excel("banking1.xlsx")
hazel_df.head()

# Feature selection: separate the descriptive features from the target column "Class"
all_features = hazel_df.drop("Class", axis=1)
target_feature = hazel_df["Class"]
all_features.head()

# One-hot encode the categorical (object-typed) columns
categorical_cols = all_features.select_dtypes(include=['object']).columns
encoder = OneHotEncoder(handle_unknown='ignore')
encoded_features = encoder.fit_transform(all_features[categorical_cols]).toarray()
encoded_feature_names = encoder.get_feature_names_out(categorical_cols)

# Create a DataFrame for encoded features and join it with the remaining numeric columns
encoded_features_df = pd.DataFrame(encoded_features, columns=encoded_feature_names,
                                   index=all_features.index)
X = pd.concat([all_features.drop(columns=categorical_cols), encoded_features_df], axis=1)
y = target_feature

# Train/test split and Decision Tree training (an 80/20 split is assumed here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
dt_pred = dt_model.predict(X_test)

# DT fold accuracy visualizer: per-fold accuracy from 10-fold cross-validation
result_tree = cross_val_score(dt_model, X, y, cv=10)
_result_tree = [r * 100 for r in result_tree]
plt.plot(_result_tree)
plt.xlabel('Fold')
plt.ylabel('Accuracy')
plt.title('DT fold accuracy visualizer')
plt.show()

# Plot the confusion matrix for the Decision Tree predictions
disp = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix(y_test, dt_pred))
disp.plot()
plt.title('Decision Tree Confusion Matrix')
plt.show()

print('\n--------------- Decision Tree Classification Report ---------------\n')
print(classification_report(y_test, dt_pred))
OUTPUT
--------------- Decision Tree Classification Report ---------------
precision recall f1-score support