Paper 7

This paper presents the use of various classification techniques like Naive Bayes, Decision Trees, and Multilayer Perceptrons to predict student academic performance using attributes like school, gender, age, family size, parents' education level, and employment from an educational data set. The techniques are applied using the WEKA data mining tool and their results are analyzed based on performance metrics like accuracy, precision, and recall. Decision Trees are found to have the best performance in correctly classifying students' final grades compared to the other techniques.


2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON)

GLA University, Mathura, Oct 26-28, 2017

Predicting Academic Performance of Student Using Classification Techniques

Sagardeep Roy
Department of Computer Science and Engineering
ASET, Amity University
Noida, India
roy.sagard@gmail.com

Anchal Garg
Department of Computer Science and Engineering
ASET, Amity University
Noida, India
agarg@amity.edu

978-1-5386-3004-4/17/$31.00 ©2017 IEEE

Abstract—Data mining methods are applied to educational data with the intent of enhancing teaching methods, improving the quality of teaching, identifying weak students, and identifying factors that influence students' academic performance. This use of data mining methods to raise the quality of education and to identify students who need improvement is termed educational data mining (EDM). EDM has become a major research interest for many researchers. The primary function of educational data mining is the prediction of students' academic performance [1]. Predicting students' academic performance helps in identifying students who are likely to drop out, students who are weak and need improvement, and students who are good at academics but whose performance has lately deteriorated. The intent of this paper is to determine factors that can influence a student's academic performance.

Keywords—Educational Data Mining, Performance Prediction, Classification, Decision Trees, WEKA.

I. INTRODUCTION
Education systems all over the world have changed rapidly since extensive research began in the fields of EDM and learning analytics (LA). EDM is the use of data mining (DM) methods, machine learning methods and various statistical techniques in education. EDM uses these techniques to analyze data from educational institutions, to uncover patterns in students' behavior and to predict the performance of students. The main goal of this research field is to help students improve their skills, and to find out what hinders students from achieving success and how to overcome it. This is done using various data mining techniques, machine learning algorithms and statistical techniques. Learning analytics is similar to educational data mining: it is the collection and analysis of usage data associated with student learning [2]. Initially EDM and LA were the major topics in education, but predicting student performance slowly became more popular, as the primary objective of the topic is to analyze and predict student performance so that students can perform better. This paper presents different classification methods by which the performance of students can be predicted. The paper uses a dataset from the UCI Machine Learning Repository. The data attributes include students' marks and demographic, social and school-related features. This dataset was previously used in a study where different DM techniques such as Random Forest, Neural Networks, SVMs and regression techniques were applied to predict secondary school student performance. The authors concluded that it is possible to predict final grades if previous grades are known [3]. The rest of the paper is organized as follows. Related literature is reviewed in the second section. The third section describes the methodology and the implementation using the WEKA tool. The fourth section explains the outcomes and findings. The conclusion is given in the fifth and last section of the paper.

II. LITERATURE REVIEW
Many papers have been published in the application areas of EDM and LA, and much research has been done on predicting the academic performance of students. In [4], research was done to predict student drop-out; the results showed that Decision Trees give good accuracy. In [5], an artificial neural network (ANN) technique was applied to predict the academic performance of students. In [6], three classical DM methods, namely Decision Tree, Naïve Bayes and Neural Networks, were implemented and used to predict student academic achievement; Naïve Bayes produced better results. In [7], Kabakchieva used the CRISP-DM model to predict student academic performance. A Decision Tree (J48) classifier, two Bayesian classifiers (Naïve Bayes and BayesNet), a Nearest Neighbor algorithm (IBk) and two rule learners (OneR and JRip) were applied. The results confirmed that the Decision Tree (J48) has better accuracy than the rule learner (JRip) and IBk, while the Bayesian classifiers had poor accuracy. In [8], regression analysis was applied to predict students' marks in a distance learning system. In [9], Genetic Programming was used with DM techniques to predict student failure. CRISP-DM is more popular with researchers than other DM process models such as SEMMA, which was developed by one of the largest producers of analytics software, SAS Institute [10]. SEMMA concentrates only on modeling tasks and leaves out the business aspects, unlike CRISP-DM, which has a “Business Understanding” phase. SEMMA is intended to serve only users of SAS Enterprise Miner [11]. In [12], C. Romero et al., instead of using traditional classification algorithms to predict the classes pass or fail, introduced and applied classification via clustering. In [13], Jia et al. predicted student retention by combining SVM and neural networks to improve classification accuracy.

III. METHODOLOGY
In this paper, we have used WEKA (Waikato Environment for Knowledge Analysis). It is a prominent data mining tool
written in Java, developed at the University of Waikato, New Zealand. It is non-proprietary and freely available [14]. We have used this tool for pre-processing, classification and visualization of the data.

TABLE I. NAMES AND DESCRIPTION OF ATTRIBUTES

Attribute    Description                               Value
SC           School                                    {GP, MS}
G            Gender                                    {F, M}
A            Age                                       15-22
AD           Type of Address                           {U, R}
FS           Size of Family                            {LE3, GT3}
PS           Parent's Cohabitation Status              {T, A}
ME           Educational Qualification of Mother       {None, Basic, Middle School, Secondary School, Graduate}
FE           Educational Qualification of Father       {None, Basic, Middle School, Secondary School, Graduate}
MJ           Employment type of Mother                 {Educator, Healthcare, Civil Service, Home, Other}
FJ           Employment type of Father                 {Educator, Healthcare, Civil Service, Home, Other}
RSN          Reason for opting a certain school        {distance, reputation, course preference, Other}
GD           Guardian of Student                       {M, F, O}
TT           Time taken to travel to school            {<15min, 15-30 min, 30min-1hr, >1hr}
ST           Weekly study time                         {<2hrs, 2-5hrs, 5-10hrs, >10hrs}
FL           No. of previous failures                  n if 1<=n<3, else 4
C            Coaching                                  {Y, N}
FES          Educational support given by family       {Y, N}
EPC          Extra paid classes                        {Y, N}
ECA          Extra-curricular activities               {Y, N}
NUS          Studied nursery                           {Y, N}
HI           Pursue higher education                   {Y, N}
NET          Access to internet                        {Y, N}
RS           Relationship status, i.e. in a            {Y, N}
             relationship or not
FR           Family relationship                       {Very Bad, Bad, Medium, Good, Very Good}
FT           Free time after school                    {Very Low, Low, Medium, High, Very High}
HO           Hanging out with friends                  {Very Low, Low, Medium, High, Very High}
DA           Workday alcohol consumption               {Very Low, Low, Medium, High, Very High}
WA           Weekend alcohol consumption               {Very Low, Low, Medium, High, Very High}
HL           Current health status                     {Very Bad, Bad, Medium, Good, Very Good}
AB           Absences in school                        0-93
G1           First period grade                        0-20
G2           Second period grade                       0-20
Result(G3)   Final grade                               {A, B, C, D, F}

IV. RESULTS AND DISCUSSIONS
In this section, we discuss the outcomes and findings obtained from the output of the Naïve Bayes classifier, the J48 Decision Tree and the MLP. Further, we analyze the dataset and compare attributes with final grades to see which attributes are the strongest predictors. The performance of an algorithm is evaluated on the basis of precision and recall. The confusion matrices of J48, Naïve Bayes and MLP are shown in Table II, Table III and Table IV. The TP rate, FP rate, precision and recall of each classifier are shown in Table V, Table VI and Table VII. Table VIII compares the three classifiers on the basis of the time taken to build the model and the correctly classified instances.

TABLE II. CONFUSION MATRIX OF J48

              G3 Prediction
Actual    Fail     D     B     A     C
Fail       109    21     0     0     0
D           16    70     0     0    17
B            0     0    46     3    11
A            0     0    11    29     0
C            0    12    12     0    38

TABLE III. CONFUSION MATRIX OF NAÏVE BAYES

              G3 Prediction
Actual    Fail     D     B     A     C
Fail        93    37     0     0     0
D           11    69     3     0    20
B            0     0    48     6     6
A            0     0    11    28     1
C            0    16    13     0    33

TABLE IV. CONFUSION MATRIX OF MLP

              G3 Prediction
Actual    Fail     D     B     A     C
Fail        93    33     0     0     4
D           34    44     7     0    18
B            1     6    26    13    14
A            0     2    12    23     3
C            4    28    10     4    16

TABLE V. PERFORMANCE OF NAÏVE BAYES

CLASSIFIER   TP RATE   FP RATE   PRECISION   RECALL   CLASS
NB           0.700     0.017     0.824       0.700    A(16-20)
NB           0.800     0.081     0.640       0.800    B(14-15)
NB           0.532     0.081     0.550       0.532    C(12-13)
NB           0.670     0.182     0.566       0.670    D(10-11)
NB           0.715     0.042     0.894       0.715    FAIL
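The per-class figures in Tables V to VII follow directly from the confusion matrices in Tables II to IV. As a rough check, the sketch below recomputes TP rate, FP rate, precision, recall and overall accuracy for Naïve Bayes from the counts in Table III. The function and variable names are illustrative assumptions and are not part of WEKA; only the counts come from the paper.

```python
# Confusion matrix of Naive Bayes, copied from Table III.
# Rows are actual classes, columns are predicted classes.
CLASSES = ["Fail", "D", "B", "A", "C"]
NB_MATRIX = [
    [93, 37, 0, 0, 0],
    [11, 69, 3, 0, 20],
    [0, 0, 48, 6, 6],
    [0, 0, 11, 28, 1],
    [0, 16, 13, 0, 33],
]


def per_class_metrics(matrix, classes):
    """Return TP rate, FP rate, precision and recall for every class."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(classes):
        tp = matrix[i][i]                          # correctly predicted as this class
        actual = sum(matrix[i])                    # all instances of this class
        predicted = sum(row[i] for row in matrix)  # all predictions of this class
        fp = predicted - tp                        # other classes predicted as this one
        negatives = total - actual                 # instances of all other classes
        recall = tp / actual if actual else 0.0
        metrics[label] = {
            "tp_rate": recall,  # TP rate and recall are the same quantity
            "fp_rate": fp / negatives if negatives else 0.0,
            "precision": tp / predicted if predicted else 0.0,
            "recall": recall,
        }
    return metrics


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for label, m in per_class_metrics(NB_MATRIX, CLASSES).items():
        print(label, {name: round(value, 3) for name, value in m.items()})
    print("correctly classified: %.4f %%" % (100 * accuracy(NB_MATRIX)))
```

For class A this gives a recall of 28/40 = 0.700 and a precision of 28/34 = 0.824, matching Table V, and the accuracy of 271/395 matches the 68.6076 % reported for Naïve Bayes in Table VIII. Swapping in the matrices of Table II or Table IV reproduces the J48 and MLP figures in the same way.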

TABLE VI. PERFORMANCE OF MLP

CLASSIFIER   TP RATE   FP RATE   PRECISION   RECALL   CLASS
MLP          0.575     0.048     0.575       0.575    A(16-20)
MLP          0.433     0.087     0.473       0.433    B(14-15)
MLP          0.258     0.117     0.291       0.258    C(12-13)
MLP          0.427     0.236     0.389       0.427    D(10-11)
MLP          0.715     0.147     0.705       0.715    FAIL

TABLE VII. PERFORMANCE OF J48

CLASSIFIER   TP RATE   FP RATE   PRECISION   RECALL   CLASS
J48          0.725     0.008     0.906       0.725    A(16-20)
J48          0.767     0.069     0.667       0.767    B(14-15)
J48          0.613     0.084     0.576       0.613    C(12-13)
J48          0.680     0.113     0.680       0.680    D(10-11)
J48          0.838     0.060     0.872       0.838    FAIL

TABLE VIII. PERFORMANCE COMPARISON

                                   NAÏVE BAYES     MLP              J48
TIME TAKEN TO BUILD MODEL          0.03 SECONDS    29.06 SECONDS    0.17 SECONDS
CORRECTLY CLASSIFIED INSTANCES     68.6076 %       51.1392 %        73.9241 %
INCORRECTLY CLASSIFIED INSTANCES   31.3924 %       48.8608 %        26.0759 %

Clearly, we can see that G1 and G2 are the strongest predictors. This was expected, as student achievement is highly affected by previous performance [15]. Let us look at other variables that might influence final grades by doing exploratory data analysis. Drop-out students have been identified. There were no students with a score of “0” in the first period, i.e. G1. Thirteen students scored zero in the second period, i.e. G2, and these same students also scored “0” in the final period, meaning all these students are drop-outs. One interesting pattern among the drop-out students is that they all studied in the same school, i.e. GP. Almost all of these students fall in the 15-16 age category. We removed G1, G2 and the number of previous failures (FL), which are the strongest predictors, as we are trying to find out which factors other than previous academic performance can influence final grades. Fitting a Random Forest model, we find from the variable importance plot shown in Fig. 1 that the top factors influencing final grades are the student's wish to pursue higher education, the student's study time, weekend and workday alcohol consumption, the mother's education and the student's romantic relationship status. The variables compared with final grades are given in Table IX.

TABLE IX. VARIABLES COMPARED WITH FINAL GRADES

S. No.   Variables compared with Final Grade
1        Alcohol Consumption (Workday and Weekend)
2        Health
3        Romantic Relationship
4        Parent's Education

Alcohol Consumption
When the alcohol consumption given in the dataset is compared with the final grade, we find that alcohol consumption on weekdays does affect academic performance, whereas weekend alcohol consumption does not affect grades as much. This can be seen in Fig. 2 and Fig. 3. We also compared both “WA” and “DA” with absences and found that more drinking leads to more absences. So we can say that alcohol consumption does affect academic performance.

Health
One interesting finding is that students with poor health status scored higher than the students who had a health score of “very good”. So we cannot conclude that healthier students tend to score high.

Romantic Relationship
When we compared romantic relationships with academic performance, we found that the mean final grade of students who are not in a romantic relationship is greater than that of students who are. Students who are not in a relationship scored higher grades more frequently than students who are in a relationship.

Parents Education
We wanted to see how parents' education affects the final grade. When both the mother's and the father's education were compared with final grades, we found that the education level of both parents is an important factor for scoring good grades. As the education level of the mother increases, the desire to pursue higher education increases. We observe that as the education level of mother and father increases, the probability of having zero number of failures decreases.

When parents' education level is compared with internet access and study time, some useful observations emerge. As the education level of the mother increases, the probability of the student having internet access increases. The mean study time also increases with the mother's education level. For the father's education there is no such pattern. We also found that as the education level of the mother increases, the percentage of students who consume a high level of alcohol during weekends decreases.
Thus, we see how important the mother's education is for a student scoring good grades. So we can conclude that alcohol consumption (both weekend and workday) and parents' education are strong predictors.
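Each comparison in this section amounts to grouping students by one attribute and summarizing another. A minimal sketch of that step in Python follows, assuming each student is a dictionary keyed by the attribute names of Table I, with the final grade stored as a number from 0 to 20 under "G3". The helper names are hypothetical, and the three sample rows are made up purely to show the call pattern; they are not taken from the dataset and say nothing about the actual results.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Mean of value_key for each level of group_key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {level: sums[level] / counts[level] for level in counts}


def share_by_group(rows, group_key, predicate):
    """Fraction of rows in each level of group_key that satisfy predicate."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        counts[row[group_key]] += 1
        if predicate(row):
            hits[row[group_key]] += 1
    return {level: hits[level] / counts[level] for level in counts}


# Made-up rows for illustration only, NOT records from the UCI dataset.
sample = [
    {"RS": "Y", "NET": "Y", "G3": 10},
    {"RS": "N", "NET": "Y", "G3": 14},
    {"RS": "N", "NET": "N", "G3": 12},
]

if __name__ == "__main__":
    # Mean final grade by relationship status (RS).
    print(mean_by_group(sample, "RS", "G3"))
    # Share of students with internet access (NET) by relationship status.
    print(share_by_group(sample, "RS", lambda row: row["NET"] == "Y"))
```

The same two helpers would cover the other comparisons described above, for example mean study time by mother's education, or the share of students with internet access by mother's education, once the real records are loaded.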

Fig. 1. Variable Importance plot

Fig. 2. Weekday Alcohol Consumption

Fig. 3. Weekend Alcohol Consumption

V. CONCLUSION
Predicting students' academic performance is of great concern to educational institutes. It helps us to identify the abilities of students, their interests and their weaknesses. Student performance can be influenced by different types of attributes, which can be social, demographic or school-related. Here we used Naïve Bayes, the J48 decision tree and MLP for classification. Naïve Bayes had 68.61% accuracy, J48 had 73.92% accuracy and MLP had 51.14% accuracy. Previous grades are highly linearly correlated with final grades. Without previous grades it is very difficult to predict final grades, but we can analyze the data and find which other attributes play an important role in predicting final grades. The top factors influencing final grades are weekend and workday alcohol consumption and parents' education. Romantic relationships also slightly influence academic performance. The education level of the mother is another important factor in determining a student's good scores.

REFERENCES

[1] C. Romero, S. Ventura, “Educational Data Mining: A Review of the State of the Art,” IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 40, no. 6, 2010, pp. 601-618.
[2] G. Siemens, Learning and Academic Analytics, 5 August 2011, http://www.learninganalytics.net/?p=131
[3] P. Cortez and A. Silva, “Using Data Mining to Predict Secondary School Student Performance”, In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
[4] G. W. Dekker, M. Pechenizkiy and J. M. Vleeshouwers, “Predicting Students Drop Out: A Case Study”, 2nd International Conference On Educational Data Mining Proceedings, Cordoba, Spain, July 1-3, 2009.
[5] M. F. Musso, E. Kyndt, E. C. Cascallar and F. Dochy, “Predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks”, Frontline Learning Research, 1(1):42–71, 2013.
[6] E. Osmanbegović, M. Suljić, “Data Mining Approach for Predicting Student Performance”, Journal of Economics and Business, Vol. X, Issue 1, May 2012.

[7] D. Kabakchieva, “Predicting Student Performance by Using Data Mining Methods for Classification”, Cybernetics and Information Technologies - The Journal of Institute of Information and Communication Technologies of Bulgarian Academy of Sciences, DOI: 10.2478/cait-2013-0006
[8] S. B. Kotsiantis, “Use of machine learning techniques for educational proposes: a decision support system for forecasting students grades”, Artificial Intelligence Review, 37(4):331–344, 2012.
[9] C. M. Vera, A. Cano, C. Romero, S. Ventura, “Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data”, DOI 10.1007/s10489-012-0374-8
[10] A. Azevedo and M. F. Santos, “KDD, SEMMA and CRISP-DM: a parallel overview”, In Proceedings of the IADIS European Conference on Data Mining 2008, pp 182-185. Archived January 9, 2013, at the Wayback Machine.
[11] S. S. Rohanizadeh and M. B. Moghadam, “A Proposed Data Mining Methodology and its Application to Industrial Procedures”, Journal of Industrial Engineering 4 (2009) pp 37-50.
[12] C. Romero, M. López, J. M. Luna, S. Ventura, “Predicting students’ final performance from participation in on-line discussion forums”, ELSEVIER, Computers & Education 68 (2013) 458–472, 2013.
[13] J. W. Jia and M. Mareboyana, “Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention”, In Proceedings of the World Congress on Engineering and Computer Science, volume 1, 2013.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutmann, & I. Witten (2009), “The WEKA Data Mining Software: An Update”, ACM SIGKDD Explorations Newsletter, 11(1), 10-18. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.148.3671
[15] S. Kotsiantis, C. Pierrakeas and P. Pintelas (2004), “Predicting Students’ Performance in Distance Learning Using Machine Learning Techniques”, Applied Artificial Intelligence (AAI), 18, no. 5, 411–426.
