
Final Paper

Uploaded by

Motti Zachariah

Survey on Educational Data in order to Predict Student Academic Success


Dr. Ambily Merlin Kuruvilla, Associate Professor, PG Department of Computer Applications and
Artificial Intelligence, Saintgits College of Applied Sciences.

Kannan Saji, Second Year M.Sc. AI, PG Department of Computer Applications and Artificial
Intelligence, Saintgits College of Applied Sciences.

Motti Zachariah Varghese, Third Year BCA Student, PG Department of Computer Applications and
Artificial Intelligence, Saintgits College of Applied Sciences.

Jeeval Jolly Jacob, Third Year BCA Student, PG Department of Computer Applications and Artificial
Intelligence, Saintgits College of Applied Sciences.

Kevin Justees, Third Year BCA Student, PG Department of Computer Applications and Artificial
Intelligence, Saintgits College of Applied Sciences.

Abstract
In light of the significant impact that student performance has on their future employability,
educational institutions bear a profound responsibility in ensuring students' academic success.
Predicting student academic achievement has been an ongoing area of research across various
academic domains. Leveraging machine learning algorithms, we can forecast student performance by
analysing a wide array of factors, including personal, psychological, and environmental aspects.
The classroom context plays a pivotal role in students' academic performance. School resources, class
sizes, curriculum quality, and extracurricular activities collectively influence student engagement and
success. Furthermore, effective teaching practices, pedagogical strategies, and teacher-student
relationships exert a profound influence on academic outcomes. Various tools, such as standardized
tests, instructor evaluations, coursework, and projects, are employed to assess student performance,
providing insights into students' subject-specific expertise and comprehension. Ongoing
monitoring and feedback processes reveal areas where students may require additional support and
interventions, which calls for a multifaceted approach to enhancing academic performance.
This study surveyed several ML algorithms for predicting students' academic success. The dataset used
for the study was collected from the Kaggle machine learning repository. The cleaned data was fed into
several ML algorithms: Logistic Regression (LR), Multilayer Perceptron (MLP), Naïve Bayes (NB), and
k-Nearest Neighbors (KNN). The results show that the LR algorithm outperforms the other algorithms.
Keywords: Machine Learning (ML), Logistic Regression (LR), Multilayer Perceptron (MLP), k-Nearest
Neighbors (KNN).

1. INTRODUCTION
In today's fiercely competitive world, students' academic performance plays a vital role in
shaping their future prospects. As educational institutions constantly strive to enhance student
achievement, they are increasingly turning to data analysis and machine learning for valuable
insights.
Python, a widely used programming language in data science and machine learning, provides
researchers with a rich set of libraries for analyzing and visualizing large datasets. In this study, we
use Python to analyze a dataset and forecast student performance in exams, drawing on influential
factors such as attendance, study hours, and prior grades.

To ensure the validity and correctness of the dataset, the study begins with a data cleaning and
preprocessing phase. The data is then analyzed using a range of statistical techniques, such as
regression analysis and data visualization, with the goal of finding patterns and trends in the data.
The ultimate goal of the research is to apply ML techniques to build a predictive model that can
accurately anticipate student performance on exams. Overall, this study highlights the synergy
between data analysis and ML for improving student performance and academic accomplishment. The
knowledge gained from this research can inform educational practice and policy, ultimately enabling
students to reach their full academic potential.
Understanding and predicting students' academic performance has become crucial for educators,
educational organizations, and legislators. As the emphasis on personalized instruction and resource
allocation has increased, examining educational data has become a useful tool for predicting student
academic progress. This introduction gives an overview of why examining educational data matters
and how it can offer insights into student performance and accomplishments.
Educational data constitutes a vast repository of information encompassing academic records,
standardized test scores, attendance logs, demographic particulars, and socio-economic indicators. By
aggregating and meticulously scrutinizing these data points, both researchers and educators stand to
gain valuable insights into student behavior, trends, and academic performance. These insights are
instrumental in identifying students at risk of underperformance, fine-tuning instructional
methodologies, and optimizing resource allocation.
The capacity to predict student academic success carries substantial implications for educational
institutions. Early identification of students grappling with academic challenges allows for timely
intervention and the provision of tailored support, mitigating academic setbacks and enhancing the
probability of success. Furthermore, the ability to forecast student performance offers valuable
guidance for pedagogical strategies, enabling educators to customize their teaching methods to cater to
the diverse needs of their students.
Moreover, the analysis of educational data empowers evidence-based decision-making at both
institutional and policy levels. By delving into patterns and trends discernible in student data,
educational policymakers can pinpoint systemic issues, conceive targeted interventions, and allocate
resources judiciously. This data-driven approach has the potential to engender more equitable
education systems that prioritize student success and foster the development of effective educational
policies.
While the exploration of educational data holds immense promise, it is not devoid of challenges.
Safeguarding data privacy, preserving data integrity, and adhering to ethical considerations are pivotal
aspects that necessitate scrupulous attention. The implementation of robust data governance, informed
consent protocols, and adherence to privacy regulations remains imperative in protecting students' data
and upholding trust in data analysis practices.

2. LITERATURE REVIEW
ML algorithms, well-known for their ability to analyze large datasets autonomously, provide
exceptional efficiency in pattern identification and anomaly detection. These algorithms have the
unique capacity to independently learn from data and use this newfound knowledge to build predictive
models that may reliably anticipate future occurrences. As a result, ML algorithms are ubiquitous
across sectors, efficiently utilizing real-time data for a wide range of applications.
Investigating student academic performance requires a thorough analysis of the many elements that
influence academic attainment within an educational system. Prior research has examined the roles
played by socioeconomic issues, parental participation, teacher quality, the learning environment,
student engagement, technological integration, psychological components, school climate, and
assessment techniques. These studies consistently highlight a supportive family environment,
effective pedagogical approaches, a favorable school climate, and active student participation as
critical drivers of academic achievement. They also highlight the impact of socioeconomic position,
motivation, self-regulation, and cognitive ability on students' academic achievement. A thorough
understanding of student academic performance therefore means acknowledging the interplay of these
components, which collectively define the learning experience and contribute to educational results.
Evans Austin Brew, Benjamin Nketiah, and Richard Koranteng [6] performed research that gives a
critical perspective on academic achievement, diving into the elements impacting student results in
senior high schools. The study emphasizes the critical significance of truancy as a hindrance to
academic achievement, suggesting collaborative efforts to eliminate truancy and promote regular
school attendance.
Furthermore, the study highlights the importance of parents' educational levels and wealth. Educating
parents on the crucial necessity of providing an education for their children emerges as a critical
suggestion, as increased understanding can allow them to play a more active part in supporting their
children's academic journeys. The availability and sufficiency of textbooks, libraries, and practical
laboratories are acknowledged as essential components of a complete education, necessitating
continual monitoring and modifications to satisfy students' changing requirements.
The availability of nutritious food in schools is also emphasized as an important element, with
appropriate nutrition favorably influencing children's educational outcomes. Furthermore, the study
emphasizes the importance of skilled instructors, calling for ongoing monitoring and professional
development to give them the skills and expertise needed to deliver quality education. To
achieve these goals, governments and funding organizations are encouraged to invest in
implementation research, which will identify gaps and bottlenecks in the education system and
inform effective development measures. According to the study, following these recommendations can
improve academic performance and help pupils achieve their goals.
Through their literature review, Balqis Albreiki, Nazar Zaki, and Hany Alashwal [7] provide
insight into the changing landscape of education. They investigate improvements in data gathering
systems and data mining techniques, highlighting their effectiveness in better understanding
educational systems. The review focuses on improving student performance by predicting at-risk
students and dropouts. However, one notable limitation is the lack of corrective measures that provide
fast feedback. Future research in this area aims to create efficient ensemble methods and dynamic
approaches for performance prediction, with a focus on both static and dynamic data. The review
underlines the promise of ML techniques for forecasting student success.
Gyan Prakash Srivastava and Dr. Ritu Sharma [8] shed light on the factors that influence
academic achievement in secondary school students. Their research highlights absenteeism as a
significant barrier to academic progress, potentially leading to school dropout. Furthermore, they
identify several elements that greatly influence academic achievement, such as parental education and
income levels, access to textbooks, libraries, and practical laboratories, lunch availability, and
instructor quality. Students with positive exposure to these factors are more likely to thrive
academically than those without it. The paper emphasizes the complex interplay between academic
achievement and other student needs, providing a comprehensive view of the factors involved.

3. PROPOSED METHODOLOGY

3.1 Logistic Regression


Logistic regression, a fundamental statistical method for binary classification tasks, estimates the
likelihood of an event occurring based on input parameters. Its foundations lie in the methods for
categorical data analysis developed by mathematicians and statisticians in the nineteenth century.
The fundamental equation of the generalized linear model is

$$h(E(x)) = \alpha + \beta y_1 + \gamma y_2$$

where $h()$ is the link function, $E(x)$ is the expectation of the target variable, and
$\alpha + \beta y_1 + \gamma y_2$ is the linear predictor. Writing $z$ for the linear predictor, the
logistic model is:

$$\frac{p(x)}{1 - p(x)} = e^{z}$$

$$\log\left[\frac{p(x)}{1 - p(x)}\right] = z$$

$$\log\left[\frac{p(x)}{1 - p(x)}\right] = w \cdot X + b$$

$$p(X; b, w) = \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}} = \frac{1}{1 + e^{-(w \cdot X + b)}}$$
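The relationship between the log-odds and the predicted probability can be checked numerically. The following is a minimal sketch rather than the code used in the study: the weights `w`, the bias `b`, and the two example features (study hours and attendance rate) are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    """p(X; b, w) for one feature vector x, weight vector w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def log_odds(p):
    """Logit log(p / (1 - p)), the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical parameters and input (not taken from the paper's dataset).
w = [0.8, 2.0]   # weights for study hours and attendance rate
b = -3.0         # bias
x = [2.5, 0.9]   # one student: 2.5 study hours, 90% attendance

z = sum(wi * xi for wi, xi in zip(w, x)) + b
p = predict_proba(x, w, b)
print("linear predictor z:", round(z, 4))
print("predicted probability p:", round(p, 4))
print("log-odds recovered from p:", round(log_odds(p), 4))
```

Applying the logit to the predicted probability recovers the linear predictor, which is the identity stated by the equations above.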

3.2 Naive Bayes


Naive Bayes, a classification algorithm, is based on Bayes' theorem and the essential assumption of
feature independence. In Python, the scikit-learn library can be used to implement the Naive Bayes
algorithm. There are several Naive Bayes classifiers available in the library, including Gaussian,
Multinomial, and Bernoulli variations.
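Given a feature vector $B = (s_1, s_2, \ldots, s_n)$ and a class variable $j$, Bayes' theorem states that:

$$p(j \mid B) = \frac{p(B \mid j)\, p(j)}{p(B)}$$

Here $p(j \mid B)$ is the posterior probability, computed from the likelihood $p(B \mid j)$, the prior
probability $p(j)$, and the evidence $p(B)$.
Using the chain rule, the likelihood $p(B \mid j)$ can be decomposed as:

$$p(B \mid j) = p(s_1, s_2, \ldots, s_n \mid j) = p(s_1 \mid s_2, \ldots, s_n, j)\, p(s_2 \mid s_3, \ldots, s_n, j) \cdots p(s_n \mid j)$$

Under conditional independence, this simplifies to:

$$p(B \mid j) = p(s_1 \mid j)\, p(s_2 \mid j) \cdots p(s_n \mid j)$$

The posterior probability then becomes:

$$p(j \mid B) = \frac{p(j) \prod_{i=1}^{n} p(s_i \mid j)}{p(B)}$$

The Naive Bayes classifier combines this model with a decision mechanism that picks the most likely
hypothesis. This is known as the maximum a posteriori (MAP) decision rule:

$$\hat{j} = \operatorname*{argmax}_{j}\; p(j) \prod_{i=1}^{n} p(s_i \mid j)$$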
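The MAP decision rule can be illustrated with a small from-scratch classifier. This is a sketch under stated assumptions, not the implementation used in the study: the features are assumed to be binary, Laplace smoothing (`alpha`) is added so that no probability is zero, and the toy rows and the labels "pass" and "fail" are hypothetical. Sums of logarithms replace the product to avoid numerical underflow, which does not change the argmax.

```python
import math
from collections import Counter

def fit(rows, labels, alpha=1.0):
    """Estimate p(j) and p(s_i = 1 | j) for binary features with Laplace smoothing."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    ones = {j: [0] * n_features for j in class_counts}
    for row, j in zip(rows, labels):
        for i, s in enumerate(row):
            ones[j][i] += s
    priors = {j: c / len(labels) for j, c in class_counts.items()}
    likelihoods = {
        j: [(ones[j][i] + alpha) / (class_counts[j] + 2 * alpha)
            for i in range(n_features)]
        for j in class_counts
    }
    return priors, likelihoods

def predict(row, priors, likelihoods):
    """MAP decision: argmax_j of log p(j) + sum_i log p(s_i | j)."""
    scores = {}
    for j, prior in priors.items():
        score = math.log(prior)
        for s, p1 in zip(row, likelihoods[j]):
            score += math.log(p1 if s == 1 else 1.0 - p1)
        scores[j] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: [attends_regularly, studies_daily]
rows = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

priors, likelihoods = fit(rows, labels)
print(predict([1, 1], priors, likelihoods))
print(predict([0, 0], priors, likelihoods))
```

For real datasets, the Gaussian, Multinomial, and Bernoulli classifiers in scikit-learn mentioned earlier cover continuous, count, and binary features respectively.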

3.3 Multilayer Perceptron


Multilayer perceptron networks (MLPs) are a type of neural network distinguished by multiple layers
of interconnected nodes, or neurons, organized in a feedforward fashion. The node weights in an MLP
classifier are adjusted by corrections that minimize the error in the overall output for the nth data
point:

$$\varepsilon(n) = \frac{1}{2} \sum_{j \,\in\, \text{output nodes}} e_j^{2}(n)$$

where $e_j(n)$ is the error at output node $j$. Using gradient descent, the change in each weight
$\omega_{ji}$ is

$$\Delta\omega_{ji}(n) = -\eta\, \frac{\partial \varepsilon(n)}{\partial c_j(n)}\, b_i(n)$$

where $\eta$ is the learning rate, $b_i(n)$ is the output of the prior neuron $i$, and $c_j(n)$ is the
induced local field of neuron $j$, that is, the weighted sum of its input connections. The derivative
to be computed depends on the induced local field $c_j$, which itself varies. For an output node, it
can be shown that

$$-\frac{\partial \varepsilon(n)}{\partial c_j(n)} = e_j(n)\, \phi'(c_j(n))$$

where $\phi'$ is the derivative of the activation function, which itself does not vary.

The analysis is more difficult for the change in weights of a hidden node. However, it can be shown
that the relevant derivative is

$$-\frac{\partial \varepsilon(n)}{\partial c_j(n)} = \phi'(c_j(n)) \sum_{l} -\frac{\partial \varepsilon(n)}{\partial c_l(n)}\, \omega_{lj}(n)$$

This depends on the change in weights of the lth nodes, which represent the output layer. The hidden
layer weights are therefore updated from the output layer errors through the derivative of the
activation function, which is why this approach is called backpropagation.
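The two derivative expressions translate directly into one gradient-descent update. The sketch below is illustrative only and assumes a single hidden layer, a sigmoid activation, no bias terms, and hypothetical initial weights; none of these choices are stated in the paper. The variable `delta` stands for the negative derivative of the error with respect to the induced local field.

```python
import math

def phi(c):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-c))

def phi_prime(c):
    """Derivative of the sigmoid activation."""
    s = phi(c)
    return s * (1.0 - s)

def train_step(x, target, w_hidden, w_out, eta=0.5):
    """One update for a one-hidden-layer network; returns the error before the update.

    w_hidden[j][i] connects input i to hidden node j.
    w_out[l][j] connects hidden node j to output node l.
    """
    # Forward pass: induced local fields c and node outputs b.
    c_hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w_hidden]
    b_hidden = [phi(c) for c in c_hidden]
    c_out = [sum(w * bj for w, bj in zip(row, b_hidden)) for row in w_out]
    y = [phi(c) for c in c_out]

    # Output nodes: delta_l = e_l * phi'(c_l), with e_l = target_l - y_l.
    e = [t - yl for t, yl in zip(target, y)]
    delta_out = [el * phi_prime(cl) for el, cl in zip(e, c_out)]

    # Hidden nodes: delta_j = phi'(c_j) * sum_l delta_l * w_out[l][j].
    delta_hidden = [
        phi_prime(c_hidden[j])
        * sum(delta_out[l] * w_out[l][j] for l in range(len(w_out)))
        for j in range(len(w_hidden))
    ]

    # Weight change: eta * delta_j * (output of the prior node i).
    for l, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * delta_out[l] * b_hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_hidden[j] * x[i]

    return 0.5 * sum(el ** 2 for el in e)

# Hypothetical initial weights and a single training example.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.2, -0.1]]
x, target = [1.0, 0.5], [1.0]

losses = [train_step(x, target, w_hidden, w_out) for _ in range(50)]
print("first error:", round(losses[0], 4), "last error:", round(losses[-1], 4))
```

The hidden-layer deltas are computed before any weight is changed, so both layers are updated from the same forward pass.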

3.4 KNN Algorithm


The algorithm works by locating, for each data point in the test dataset, its k nearest neighbors in
the training dataset and assigning the majority class among them. The KNN classifier can be sensitive
to the choice of distance metric, as well as to the scale of the features.
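The sensitivity to feature scale can be seen in a small example. The following is a sketch, not the study's code: it assumes Euclidean distance, k = 3, min-max scaling, and a hypothetical toy dataset with two features on very different scales (attendance as a fraction, previous marks out of 100).

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def min_max_scaler(rows):
    """Return a function that rescales each feature to [0, 1] based on `rows`."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]

    def scale(row):
        return [(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]

    return scale

# Hypothetical toy data: [attendance fraction, previous marks]
train_X = [[0.95, 55], [0.90, 56], [0.92, 57], [0.40, 64], [0.35, 65], [0.45, 66]]
train_y = ["pass", "pass", "pass", "fail", "fail", "fail"]
query = [0.93, 63]

# Without scaling, the marks column dominates the distance.
raw_prediction = knn_predict(train_X, train_y, query)

# With min-max scaling, both features contribute comparably.
scale = min_max_scaler(train_X)
scaled_X = [scale(row) for row in train_X]
scaled_prediction = knn_predict(scaled_X, train_y, scale(query))

print("unscaled:", raw_prediction, "| scaled:", scaled_prediction)
```

The same query receives different labels before and after scaling, which is why features are usually normalized before a distance-based classifier is applied.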

Results And Discussion


This section describes each classifier's final result. It consists of two parts: Performance
Evaluation and Performance Analysis of the Applied Models. The effectiveness of ML models can be
assessed using a variety of metrics, of which Precision, Recall, F1-score, and Accuracy are the most
important ones considered here. During the testing phase, these scores are computed from the values
of the confusion matrix.

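$$\text{Precision} = \frac{\mathrm{TPI}}{\mathrm{TPI} + \mathrm{FPI}}$$

$$\text{Recall} = \frac{\mathrm{TPI}}{\mathrm{TPI} + \mathrm{FNI}}$$

$$\text{Accuracy} = \frac{\mathrm{TPI} + \mathrm{TNI}}{\mathrm{TPI} + \mathrm{FPI} + \mathrm{FNI} + \mathrm{TNI}}$$

$$\text{F1-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

TPI = True Positive Instances, TNI = True Negative Instances, FPI = False Positive Instances, FNI =
False Negative Instances.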
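As a worked example, the four metrics can be computed directly from confusion-matrix counts. This is a minimal sketch: the counts passed in below are hypothetical and are not the study's results.

```python
def classification_metrics(tpi, fpi, fni, tni):
    """Precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tpi / (tpi + fpi) if (tpi + fpi) else 0.0
    recall = tpi / (tpi + fni) if (tpi + fni) else 0.0
    accuracy = (tpi + tni) / (tpi + fpi + fni + tni)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tpi=30, fpi=10, fni=10, tni=50)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The guards against zero denominators return 0.0 when a model predicts no positive instances, which is one common convention among several.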

Model                    Precision    Recall    F1-score    Accuracy
Logistic Regression      0.72         0.75      0.73        91.89%
Naive Bayes              0.52         0.82      0.69        70.83%
KNN Classification       0.57         0.51      0.54        75.337%
Multilayer perceptron    0.59         0.77      0.67        80.75%

Table 1: Students Academic Performance

[Bar chart: Precision, Recall, F1-score, and Accuracy (vertical axis from 0 to 1) for Logistic
Regression, Naive Bayes, KNN Classification, and Multilayer perceptron]

Figure 1: Students Academic Performance

4. CONCLUSION
A detailed analysis of the student academic performance data leads to several important
conclusions. A variety of factors, including parental education level, study time, and attendance,
have a profound impact on academic achievement. Notably, students who spend more time studying and
have a consistent attendance record tend to do better academically, whereas students whose parents
have lower levels of education tend to do worse.
A variety of ML algorithms prove useful for exploiting these insights and predicting students'
academic achievement. These include support vector machines (SVMs), decision trees, random forests,
and linear regression, which can take a variety of factors into account and predict a student's grade
from the data they receive. Furthermore, classification algorithms such as k-nearest neighbors (KNN),
Naive Bayes, and multilayer perceptrons (MLPs) can identify performance groups such as
high-performing, average-performing, and low-performing students. This classification makes it easier
to spot students who are at risk of falling behind and allows for interventions and support systems
that are specifically designed to improve their academic performance.
In conclusion, the student academic performance dataset is a useful tool for comprehending the
complex web of variables that affect academic success. We can better understand the complex
connections between these variables and academic success by utilizing the power of ML techniques.
This in turn gives us the ability to improve our predictive and categorization models, encouraging the
use of data to promote academic advancement and student support.

5. REFERENCES

1. P. R. Pintrich, R. W. Marx, and R. A. Boyle, “Beyond cold conceptual change: the role of
motivational beliefs and classroom contextual factors in the process of conceptual
change,” Review of Educational Research, vol. 63, no. 2, pp. 167–199, 1993.

2. A. Bandura, “Perceived self-efficacy in cognitive development and functioning,” Educational
Psychologist, vol. 28, no. 2, pp. 117–148, 1993.

3. E. Yukselturk and S. Bulut, “Predictors for student success in an online course,” Educational
Technology and Society, vol. 10, no. 2, pp. 71–83, 2007.

4. N. Mousoulides and G. Philippou, “Students' motivational beliefs, self-regulation and
mathematics achievement,” The Psychology of Mathematics Education, vol. 3, pp. 321–328,
2005.

5. A. Marcou and G. Philippou, “Motivational beliefs, self-regulated learning and mathematical
problem solving,” Group for the Psychology of Mathematics Education, vol. 3, pp. 297–304,
2005.

6. B. J. Zimmerman, “A social cognitive view of self-regulated academic learning,” Journal of
Educational Psychology, vol. 81, no. 3, pp. 329–339, 1989.

7. B. J. Zimmerman and M. Martinez-Pons, “Student differences in self-regulated learning:
relating grade, sex, and giftedness to self-efficacy and strategy use,” Journal of Educational
Psychology, vol. 82, no. 1, pp. 51–59, 1990.

8. T. G. Duncan and W. J. McKeachie, “The making of the motivated strategies for learning
questionnaire,” Educational Psychologist, vol. 40, no. 2, pp. 117–128, 2005.

9. P. R. Pintrich, D. A. Smith, T. Garcia, and W. J. McKeachie, “Reliability and predictive validity
of the motivated strategies for learning questionnaire (MSLQ),” Educational and Psychological
Measurement, vol. 53, no. 3, pp. 801–813, 1993.

10. C. Gbollie and M. David, “Aligning expansion and quality in higher education: an imperative
to Liberia's economic growth and development,” Journal of Education and Practice, vol. 5, no.
12, pp. 139–150, 2014.

11. P. R. Pintrich, “The role of metacognitive knowledge in learning, teaching, and
assessing,” Theory into Practice, vol. 41, no. 4, pp. 219–225, 2002.

12. R. Schmidt and Y. Watanabe, “Motivation, strategy use, and pedagogical preferences in
foreign language learning,” in Motivation and Second Language Acquisition, Z. Dörnyei and R.
Schmidt, Eds., University of Hawaii, Second Language Teaching and Curriculum Center,
Honolulu, Hawaii, USA, Technical Report #23, pp. 313–359, 2001.

13. X. Xu, “The relationship between language learning motivation and the choice of language
learning strategies among Chinese graduates,” International Journal of English Linguistics,
vol. 1, no. 2, 2011.

14. N.-D. Yang, “The relationship between EFL learners' beliefs and learning strategy
use,” System, vol. 27, no. 4, pp. 515–535, 1999.

15. P. R. Pintrich, R. W. Roeser, and E. A. M. de Groot, “Classroom and individual differences in
early adolescents' motivation and self-regulated learning,” The Journal of Early Adolescence,
vol. 14, no. 2, pp. 139–161, 1994.

16. B. J. Zimmerman, “Self-regulating academic learning and achievement: the emergence of a
social cognitive perspective,” Educational Psychology Review, vol. 2, no. 2, pp. 173–201,
1990.

17. J. Gasco, A. Goñi, and J. D. Villarroel, “Sex differences in mathematics motivation in 8th and
9th grade,” Procedia - Social and Behavioral Sciences, vol. 116, pp. 1026–1031, 2014.

18. N. V. Balaji, “Improved artificial neural network through metaheuristic methods and
rough set theory for modern medical diagnosis,” Indian Journal of Computer Science and
Engineering (IJCSE).

19. N. V. Balaji, “Heart disease prediction system using correlation based feature selection
with multilayer perceptron approach,” IOP Conference Series: Materials Science and
Engineering.
