Journal Publications
Raunak Bedi
Abstract
Predicting students’ academic performance is a major concern for universities and schools
because it helps them design mechanisms to reduce dropouts and improve academic results.
Many routine student activities have been automated, and the software used for technology-based
learning now generates large volumes of data. Processing and analyzing these data properly can
yield vital insights into students’ knowledge and into the relationship between students and
their homework. This information can feed promising methods and algorithms for predicting
student performance.
This study analyzes various machine learning models used to predict students’ performance. It
presents an in-depth review of studies that examine data from online learning environments to
predict students’ outcomes with machine learning techniques. The review helps identify the
online course features used to predict learners’ outcomes, the prediction outputs, the strategies
and methodologies of feature extraction, the evaluation metrics, and the challenges and
limitations of analyzing the outcomes.
Introduction
Online learning has revolutionized the way people learn, and education has never been so
convenient and affordable for billions of people worldwide. Despite the rising interest in and
benefits of distance and online learning, universities and schools remain highly concerned about
students’ retention and academic performance, as well as low degree/certification completion
rates and high dropout rates.
Shalom Presidency School
A-199, 1st floor, Sector 55, Gurgaon, Haryana, 122011, India
bedi.raunak@gmail.com
Phone No=+91 9354980255
Dropping out of or failing an online program or course is usually an important parameter that
institutional authorities use to assess course/program quality and allocate resources.

Low certification rates and high dropout rates are also a major risk to the profitability,
funding, and reputation of an institution (Arce et al., 2015). These outcomes have a vast impact
on students’ well-being, self-esteem, odds of graduating, and employment (Arce et al., 2015;
Larusson & White, 2014). Hence, it is important to find more efficient methods to forecast
students’ performance as early as possible, so that students, educators, and institutions can
take the necessary measures to improve online learning experiences and build intervention
strategies that meet students’ needs. With rising interest in online learning and the big data
that students produce by interacting with online learning environments, researchers have
proposed several machine learning methods to predict students’ performance and improve their
learning outcomes.
1.1 Background
algorithm can determine whether an image has object (Watt et al., 2020).

Additionally, machine learning can learn which products a customer might like. After analyzing
the customer’s previous products, the system suggests new products that might interest the
customer (Witten et al., 2016). All such examples share a similar principle: the system processes
and identifies the data, and then uses this knowledge to make future decisions. The growth in
available data has made such applications more effective. Machine learning is categorized into
supervised and unsupervised learning according to the type of input. In supervised learning, the
input data belongs to a known class structure (Mitchell, 2007; Kumar et al., 2012). This input
data is called training data. The algorithm aims to build a model that predicts one property from
other properties. After the model is created, it processes data with a class structure similar to
that of the input data. In unsupervised learning, the input data has no known class structure,
and the algorithm aims to reveal the structure of the data (Mitchell, 1997; Sugiyama, 2015).
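The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. The student data below (weekly study hours and average quiz score) is invented for illustration, and the two techniques shown, a 1-nearest-neighbour classifier and a small k-means routine, are assumptions chosen for brevity rather than methods taken from the studies reviewed here. The features are left unscaled to keep the example short.

```python
from math import dist

# Hypothetical toy data: (weekly study hours, average quiz score) per student.
features = [(2.0, 35.0), (3.0, 40.0), (1.5, 30.0),
            (8.0, 80.0), (9.0, 85.0), (7.5, 78.0)]
# Known classes for each student; together with `features` this is the training data.
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]


def predict_1nn(train_x, train_y, query):
    """Supervised: return the label of the training point closest to `query`."""
    nearest = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[nearest]


def kmeans(points, k=2, iterations=10):
    """Unsupervised: group points into k clusters without using any labels."""
    centroids = list(points[:k])  # deterministic start: the first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment


# Supervised use: predict the class of a new, unlabeled student.
prediction = predict_1nn(features, labels, (8.5, 82.0))
# Unsupervised use: discover groups in the same features, ignoring `labels`.
clusters = kmeans(features, k=2)
print(prediction)
print(clusters)
```

In the supervised call the labels drive the prediction, whereas the clustering routine never sees them and only returns group indices whose meaning has to be interpreted afterwards.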
dropout rates (Kalles and Pierrakeas, 2006). Areas outside online learning are major contexts
where performance or dropout prediction is widely used for research purposes. The purposes of
these studies vary. Some seek the best prediction approach, while others aim to determine the
viability of machine learning for predicting student performance or dropout.

A study was conducted at the “Eindhoven University of Technology” to determine the effectiveness
of machine learning in predicting students’ dropout rate (Dekker et al., 2009). The basic
methodology was to build prediction models with various machine learning approaches such as
Logit, BayesNet, and CART, and then to compare the models’ prediction outcomes in terms of
effectiveness. The J48 classifier built the most efficient model. A group of researchers from
three Indian universities conducted a similar study. They analyzed a dataset of university
students using various algorithms and then compared recall and precision. The ADT decision tree
algorithm gave the most accurate results (Yadav et al., 2012). However, predicting students’
performance, rather than dropout, is more relevant to this study, and some studies have done so.
Kalles & Pierrakeas (2006) conducted a study at the “Hellenic Open University” to analyze the use
of ML in distance education. They used decision trees and genetic algorithms to build a
predictive model and compared the results for accuracy. The “Genetically Evolved Decision Trees
(GATREE)” model provided the most accurate results.

machine learning models, they also used ensemble techniques and compared the results. They found
the best results with decision trees. Behavioral features were the other area the researchers
focused on. They built models with and without these features, and including the behavioral
features improved the prediction results.

Cortez & Silva (2008) conducted a study on performance prediction at the “University of Minho,
Portugal”. The dataset recorded whether students had passed exams in Portuguese language and
Mathematics. They used ML models such as random forests, decision trees, support vector machines,
and neural networks and compared them for accuracy. They also compared a dataset that included
previous exam results with one that did not include past grades. Adding previous results improved
performance.

2.1 Research Gap
E-learning has become a common form of education and a vital part of the development of online
education (Giannakos & Vlamos, 2013). It has become a common phenomenon across the world because
of the impact of COVID-19 and because of its high temporal and spatial flexibility, rich
educational materials, and low learning curve. However, teachers cannot easily perceive the
learning status of students in this mode (Qu et al., 2019), which raises concerns over the
quality of online learning. Hence, this study focuses on education, specifically on predicting
student performance. It also evaluates the effectiveness of various machine learning approaches.
Recall, precision, F-
Research Methodology
Recommendation systems gather data on user choices for a range of elements like
websites, books, applications, travel destinations, and e-learning. In the context of student
performance, explicit or implicit information can be acquired by gathering scores and monitoring
behavior such as downloading and visiting study materials (Bobadilla et al., 2013). Recommender
systems consider various data sources for predictions. They manage factors like novelty,
precision, stability, and dispersion.
4.2.1. Collaborative Filtering

Collaborative filtering plays a vital role in prediction, even though it is used alongside other
filtering methods such as knowledge-based, content-based, or social approaches (Bobadilla et al.,
2013). Just as decisions are based on previous knowledge and experience, collaborative filtering
is used to perform prediction. Some studies have predicted various aspects of student performance
with collaborative filtering. For example, Bydžovská (2015) found similarities between students
whose knowledge was represented as a set of grades from earlier courses.

prediction model for academic performance (Iyanda et al., 2018). Finally, the potential of ANNs
for predicting learning outcomes is compared to a “multivariate LR model” in medical education
(Dharmasaroja & Kingkaew, 2016).

Results

Applications of techniques like collaborative filtering, machine learning, artificial neural
networks, and recommender systems can consider different types of information for predicting
students’ behavior, for example, task grades and demographic characteristics. A study by the
“Hellenic Open University” is a good starting point, where researchers applied various supervised
and
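As an illustration of the collaborative filtering idea described in Section 4.2.1, the following minimal sketch predicts a student's grade in one course from the grades of similar students. It is not the method of Bydžovská (2015) or of any other cited study. The student identifiers, courses, grades, and the similarity measure (one divided by one plus the mean absolute grade difference over shared courses) are all assumptions made for this example.

```python
# Hypothetical grades (0-100) from earlier courses; all names and values are invented.
grades = {
    "s1": {"math": 90, "physics": 85, "programming": 88},
    "s2": {"math": 60, "physics": 55, "programming": 58},
    "s3": {"math": 88, "physics": 90},  # programming grade is unknown
}


def similarity(a, b):
    """Similarity in (0, 1]: higher when grades in shared courses are closer.

    Assumed measure: 1 / (1 + mean absolute grade difference).
    Returns 0.0 when the two students share no courses.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[c] - b[c]) for c in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def predict_grade(all_grades, student, course):
    """Similarity-weighted average of other students' grades in `course`.

    Returns None when no other student has a grade for that course.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_grades in all_grades.items():
        if other == student or course not in other_grades:
            continue
        # Exclude the target course so similarity uses earlier courses only.
        earlier = {c: g for c, g in other_grades.items() if c != course}
        w = similarity(all_grades[student], earlier)
        weighted_sum += w * other_grades[course]
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None


predicted = predict_grade(grades, "s3", "programming")
print(round(predicted, 1))
```

Because the earlier grades of s3 are much closer to those of s1 than to those of s2, the prediction is pulled towards the programming grade of s1. A practical system would need many more students and a more careful similarity measure.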