Research Paper, 2020
Abstract—The growth and development of predictive models has brought about considerable change in today's world. Predictive modelling of academic performance has transformed a number of institutions by helping them improve their students' academic performance. This paper presents a computational predictive model that uses artificial neural networks to predict whether a student will pass or fail. The model is unique in the current literature because it is specifically designed to evaluate the effectiveness of predictive strategies on neural networks as well as on five additional algorithms. The experimental results show that Artificial Neural Networks outperformed the eXtreme Gradient Boosting (XGBoost), Logistic Regression, Support Vector Machine, Naive Bayes, and Random Forest algorithms for academic performance prediction.

Keywords—Classification modelling; data mining; higher education institutions; accuracy; academic performance

I. INTRODUCTION
Public higher education providers are institutions that have been established and funded by the state through the Department of Higher Education and Training (DHET). Public providers include universities, universities of technology, and comprehensive universities. Private providers are owned by private organizations or individuals. Higher education institutions (HEIs) operate in an increasingly complex and challenging environment: competition has increased, and previously anticipated government funding has become scarce [1]. In such circumstances, HEIs must succeed financially or they will go out of business [2]. In their quest for survival, HEIs commonly increase their student intake and try to improve their success rates. Because much government and private funding depends on institutions' throughput rates, being able to predict the chances of success of any new student is very important. This study aims to improve the pass rates of students in a particular private academic institution by providing a classification model that helps identify students at risk of failing a program. Once such students are identified, the institution can provide targeted support to those who need it. The author in [3] notes that the purpose of identifying students at risk of dropout or attrition early enough is to provide the necessary support and interventions, with the goal of reducing dropouts and increasing retention, performance, and graduation rates.

Choosing a data mining technique that suits the scenario at hand is important for identifying useful patterns. In this article, the factors that affect students' pass rates are identified and used in the classification model. The following algorithms are applied to construct the classification model: Artificial Neural Networks, Logistic Regression, XGBoost, Support Vector Machine (SVM), Naive Bayes, and Random Forest.

The rest of this article is structured as follows: Section II presents the literature review. Sections III and IV describe the data and the methodology, respectively. Section V presents and discusses the results, and Section VI presents conclusions and recommendations.

II. LITERATURE REVIEW
In research conducted by [4], the researchers explored the applicability of the Fuzzy C-Means clustering technique to the academic performance of students. They found that the Fuzzy C-Means clustering algorithm serves as a good benchmark for monitoring student progression in the educational domain. The author in [5] also proposed a fuzzy logic-based expert system that periodically evaluates student performance and gives students feedback on their progress within a data grid environment. The system uses fuzzy logic theory to build a decision-making process based on fuzzy rules, which assesses whether a student's performance is very poor, poor, average, good, or excellent.

In an attempt to identify the main attributes that may affect the performance of engineering students, [6] applied data mining techniques such as k-Means clustering and decision trees to the records of 1500 students enrolled in various engineering subjects. The author in [7] investigated the impact of classroom attendance and gender on the academic performance of university students in an Organic Chemistry course. Data were collected through a survey involving real-time documentation of each student's attendance at each class lesson over a three-month period. The findings show that attendance had a significant impact on performance. In another study, [8] analysed the impact of class attendance, practical work, and assignments in a course on the success rate.
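Reference [6] names k-Means clustering among its techniques. As an illustration only, the following Python sketch shows the standard k-Means procedure (Lloyd's algorithm) applied to a single attribute, such as a list of marks. The function name `kmeans_1d`, the deterministic initialisation, and the sample marks are assumptions made for this example; they do not reproduce the setup used in [6].

```python
from statistics import mean


def kmeans_1d(values, k, iterations=100):
    """Group 1-D values (e.g. marks) into k clusters with Lloyd's algorithm.

    Returns (centroids, clusters). Centroids are initialised from evenly
    spaced positions in the sorted data so results are reproducible
    (an assumption; random initialisation is also common).
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: assignments will not change
            break
        centroids = updated
    return centroids, clusters


marks = [35, 42, 38, 55, 61, 58, 78, 85, 81]  # hypothetical marks
centroids, clusters = kmeans_1d(marks, k=3)
print(clusters)  # [[35, 42, 38], [55, 61, 58], [78, 85, 81]]
```

With these made-up marks and `k = 3`, the sketch converges to low, middle, and high groups with centroids of about 38.3, 58, and 81.3. Real student records usually combine several attributes, which would call for a multi-dimensional distance in place of `abs`.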
415 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 11, No. 9, 2020
The authors of [8] found that the number of assignments given had a negative impact on academic performance. They used C4.5 as the classification algorithm for their work. Several other studies have shown that class attendance is an important predictor of academic outcomes, concluding that students who attend more classes generally earn higher final grades [9].

According to [10], socioeconomic conditions are one of the factors that influence a student's ability to succeed. This is supported by [11], who state that student poverty and a lack of sufficient funding have consistently been cited as key reasons for student academic failure and progression difficulties. In the study by [12], marks from four academic batches of Computer Science & Information Technology (CS&IT) students were used to predict performance: the records of 347 undergraduate students were mined with classifiers such as decision trees, neural networks, and Naive Bayes.

In another study, [13] applied Naïve Bayes to classify student evaluations. Their dataset consisted of the following parameters: age, place of birth, gender, high school status (public or private), department in high school, organization activeness, age at the start of high school, and progress GPA score.

Discriminant analysis was used by [14] to predict the success and failure of students in a specific physics course. Discriminant analysis is similar to multiple regression, except that the outcome being predicted is categorical. They used this technique to derive a function containing the variables that should be used to predict a student's success. They collected data for 1622 students enrolled in an Electricity & Magnetism course, which had a high failure rate, and initially identified many possible predictors, such as SAT grade, math GPA, and overall GPA. In another study, [15] applied predictive modelling techniques to identify students at risk of dropping out of their registered qualification, using Support Vector Machine, Naïve Bayes, decision tree, K-nearest neighbors, and Random Forest on data from 1156 students.

III. DATA DESCRIPTION
This research followed a quantitative approach. Questionnaires were administered anonymously at private academic institutions to protect the privacy and anonymity of the participants. The questionnaires were distributed in two ways: handed out manually and through the online survey tool SurveyMonkey. The dataset consisted of the following attributes:
Study hours per week.
Bursary: whether a student has a bursary or not.
Class attendance.
Student workload (number of modules registered).
Full-time study or attendance through part-time classes.
English language proficiency marks.
Number of employed parents or guardians.
Group assignment marks.
Test marks.
Individual assignment marks.

The scatterplot in Fig. 1 shows the distribution of students' test marks in relation to their individual assignment marks. Most students perform well in both tests and individual assignments, with a few outliers who perform very well in individual assignments but poorly in tests. According to the scatterplot, most students' test marks fall approximately between 40 and 80, and their individual assignment marks between 50 and 90. This suggests that students generally perform better in individual assignments than in tests.

Fig. 1. Scatterplot of Test and Individual Assignment marks.

The scatterplot in Fig. 2 for test and group assignment marks shows that a large proportion of students perform very well in group assignments, in which they take part in research activities. By comparison, many students fail the tests, as shown by the large concentration of test marks below 50 relative to those above 50. This could provide a basis for intervention by the private institution to help students prepare better for tests.

Fig. 2. Scatterplot of Test and Group Assignment marks.
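The attributes listed above map naturally onto one record per questionnaire response. The Python sketch below is a hypothetical illustration of that structure and of the two kinds of summary discussed for Fig. 1 and Fig. 2: the share of marks inside an approximate range, and the share of test marks below 50. The field names, the percentage unit for attendance, and the `PASS_MARK` of 50 used to label a test as failed are assumptions; this section does not define the model's actual pass/fail target.

```python
from dataclasses import dataclass

PASS_MARK = 50  # assumption: boundary below which a test counts as failed (cf. Fig. 2)


@dataclass
class StudentRecord:
    """One questionnaire response (hypothetical field names)."""
    study_hours_per_week: float
    has_bursary: bool
    class_attendance_pct: float       # assumed: percentage of classes attended
    modules_registered: int           # student workload
    full_time: bool                   # False means part-time classes
    english_proficiency_mark: float
    employed_parents_or_guardians: int
    group_assignment_mark: float
    test_mark: float
    individual_assignment_mark: float


def fails_test(record: StudentRecord, pass_mark: float = PASS_MARK) -> bool:
    """Return True when the test mark falls below the pass mark."""
    return record.test_mark < pass_mark


def share_in_range(marks, low, high):
    """Fraction of marks m with low <= m <= high (0.0 for an empty list)."""
    if not marks:
        return 0.0
    return sum(low <= m <= high for m in marks) / len(marks)


def summarise(records):
    """Summaries in the spirit of the Fig. 1 and Fig. 2 discussion."""
    if not records:
        raise ValueError("no records to summarise")
    tests = [r.test_mark for r in records]
    individual = [r.individual_assignment_mark for r in records]
    return {
        "test_fail_rate": sum(fails_test(r) for r in records) / len(records),
        "tests_in_40_to_80": share_in_range(tests, 40, 80),
        "individual_in_50_to_90": share_in_range(individual, 50, 90),
    }


# Two made-up records, for illustration only.
sample = [
    StudentRecord(10, True, 90.0, 4, True, 70.0, 2, 80.0, 45.0, 75.0),
    StudentRecord(5, False, 60.0, 6, False, 55.0, 1, 85.0, 65.0, 95.0),
]
print(summarise(sample))
# {'test_fail_rate': 0.5, 'tests_in_40_to_80': 1.0, 'individual_in_50_to_90': 0.5}
```

In practice, a binary label such as `fails_test` would serve as the target for classifiers like those listed in the introduction, with the remaining fields as input features.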
[6] V. Sreenivasarao and C. G. Yohannes, "Improving academic performance of students of defence university based on data warehousing and data mining," Global Journal of Computer Science and Technology, 2012.
[7] O. D. Ayodele, "Class attendance and academic performance of second year university students in an organic chemistry course," African Journal of Chemical Education, vol. 7, no. 1, pp. 63-75, 2017.
[8] N. A. Yassein, R. G. M. Helali, and S. B. Mohomad, "Predicting student academic performance in KSA using data mining techniques," Journal of Information Technology and Software Engineering, vol. 7, no. 5, pp. 1-5, 2017.
[9] A. Kirby and B. McElroy, "The effect of attendance on grade for first year economics students in University College Cork," 2003.
[10] D. E. Roby, "Research on school attendance and student achievement: A study of Ohio schools," Educational Research Quarterly, vol. 28, no. 1, pp. 3-16, 2004.
[11] S. Mngomezulu, R. Dhunpath, and N. Munro, "Does financial assistance undermine academic success? Experiences of 'at risk' students in a South African university," Journal of Education (University of KwaZulu-Natal), no. 68, pp. 131-148, 2017.
[12] R. Asif, A. Merceron, and M. K. Pathan, "Predicting student academic performance at degree level: a case study," International Journal of Intelligent Systems and Applications, vol. 7, no. 1, p. 49, 2014.
[13] N. Dengen, E. Budiman, M. Wati, and U. Hairah, "Student Academic Evaluation using Naïve Bayes Classifier Algorithm," in 2018 2nd East Indonesia Conference on Computer and Information Technology (EIConCIT), 2018: IEEE, pp. 104-107.
[14] E. W. Thomas, M. J. Marr, A. Thomas, R. M. Hume, and N. Walker, "Using discriminant analysis to identify students at risk," in Technology-Based Re-Engineering Engineering Education Proceedings of Frontiers in Education FIE'96 26th Annual Conference, 1996, vol. 1: IEEE, pp. 185-188.
[15] R. Lottering, R. Hans, and M. Lall, "A model for the identification of students at risk of dropout at a university of technology," in 2020 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), 2020: IEEE, pp. 1-8.
[16] I. H. Witten and E. Frank, "Data mining: practical machine learning tools and techniques with Java implementations," ACM SIGMOD Record, vol. 31, no. 1, pp. 76-77, 2002.
[17] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[18] J. Gao, X. He, and L. Deng, "Deep learning for web search and natural language processing," 2015.
[19] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[20] D.-M. Tsai and C.-C. Lin, "Fuzzy C-means based clustering for linearly and nonlinearly separable data," Pattern Recognition, vol. 44, no. 8, pp. 1750-1760, 2011.