Data Mining: A Prediction of Performer or Underperformer Using Classification

This document summarizes a research paper that uses data mining techniques to predict student performance at institutions of higher education. Specifically, it uses a Bayesian classification method on previous student performance data to classify current students as either performers or underperformers. The goal is to help institutions identify struggling students early to reduce dropout rates and improve overall performance. It provides background on data mining and classification techniques. It then describes Bayesian classification in more detail and discusses how the authors applied it to educational data from various degree colleges in India to predict student performance.

Umesh Kumar Pandey et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (2), 2011, 686-690

Data Mining: A prediction of performer or underperformer using classification

Umesh Kumar Pandey, S. Pal
VBS Purvanchal University, Jaunpur

Abstract: Nowadays, student records form a large set of data with valuable information hidden in it. Data mining techniques can help to find this hidden information. In this paper, a data mining technique named Bayes classification is applied to these data to help an institution. Institutions can identify those students who consistently perform well. This study will help institutions reduce the dropout ratio to a significant level and improve the performance level of the institution.

Keywords: Data mining, classification, predictive model, Bayesian classification.

I. INTRODUCTION
As the cost of processing power and storage comes down, storing data has become easier and cheaper, so the amount of data held in educational databases is increasing rapidly. To benefit from such large databases, different data mining techniques have been developed and are used to find hidden relationships between variables.

Data mining, sometimes also called Knowledge Discovery in Databases (KDD), can be very useful in a student-centered educational system. Within the KDD process, different means of data mining analysis can be used to extract important information from the database, such as classification, clustering, association, decision trees and neural networks [5].

Classification is perhaps the most familiar and most popular data mining technique. Prediction can be thought of as classifying an attribute value into one of a set of possible classes. Prediction is often viewed as forecasting a continuous value, while classification forecasts a discrete value [10].

This study aims to analyze the previous year's data and to predict whether a new-year student will be a performer or an underperformer. Such applications can help both instructors and students to enhance the quality of education. The classification is carried out using a Bayesian classification method.

II. DATA MINING
Data mining techniques operate on large volumes of data to discover hidden patterns and relationships helpful in decision making [3]. Alternatively, it has been called exploratory analysis, data-driven discovery and deductive learning. Data mining access of a database differs from traditional access in several ways: query, data and output [10]. A data mining algorithm is a well-defined procedure that takes data as input and produces output in the form of models or patterns. The term well-defined indicates that the procedure can be precisely encoded as a finite set of rules [4]. The structures discovered during the data mining process can describe the entire data set (or most of it), and they are then called "models". There are also cases in which the discovered structures capture only some local properties of the data, and in that case the term "pattern" is used [5].

III. BACKGROUND AND RELATED WORK
Data mining is an emerging methodology used in the educational field to enhance our understanding of the learning process, focusing on identifying, extracting and evaluating variables related to the learning process of students [2]. Data mining can be applied to a number of different applications such as data summarization, learning classification rules, finding associations, analyzing changes and detecting anomalies [8]. Educational data mining is used to identify and enhance educational processes, which can improve decision making [6]. Gabrilson uses data mining prediction techniques to identify the most effective factors in determining a student's test score, and then adjusts these factors to improve the student's test score in the following year [7]. Luan uses data mining to group students in order to determine which students can easily pile up their courses and which take courses over a longer period of time [9].

IV. CLASSIFICATION
Predictive modeling is the process by which a model is created or chosen to try to best predict the probability of an outcome. In many cases the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data [16].

Classification is a predictive data mining technique that makes predictions about values of data using known results found from different data [10]. Predictive models have the specific aim of allowing us to predict the unknown value of a variable of interest given known values of other variables. Predictive modeling can be thought of as


learning a mapping from an input set of vector measurements x to a scalar output y [4]. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. Classification algorithms often describe these classes by looking at the characteristics of data already known to belong to the classes [10].

V. BAYESIAN CLASSIFICATION
Bayes classification is based on Bayes' rule of conditional probability. Bayes' rule is a technique to estimate the likelihood of a property given a set of data as evidence or input. For two hypotheses h_1 and h_2 and observed data x_i, Bayes' rule (Bayes' theorem) is:

$$P(h_1 \mid x_i) = \frac{P(x_i \mid h_1)\,P(h_1)}{P(x_i \mid h_1)\,P(h_1) + P(x_i \mid h_2)\,P(h_2)}$$

The approach is called "naïve" because it assumes independence between the various attribute values. Naïve Bayes classification can be viewed as both a descriptive and a predictive type of algorithm. The probabilities are descriptive and are then used to predict the class membership for a target tuple. The naïve Bayes approach has several advantages: it is easy to use; unlike other classification approaches, only one scan of the training data is required; and it easily handles missing values by simply omitting that probability [10]. An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix. In spite of their naive design and apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations [16].

Table 1

                                 Division
Attribute         Value          I     II    III   FAIL
Caste Category    GEN            100   120   34    46
                  OBC            61    78    19    12
                  SC/ST          29    50    28    23
Language Medium   ENGLISH        100   70    18    6
                  HINDI          90    178   63    65
Class             BA(NC)         12    76    54    22
                  BA(CA)         43    72    4     10
                  BSc(Bio.)      80    50    15    18
                  BSc(Math)      54    20    6     8
                  Bcom           1     30    12    13

VI. EDUCATIONAL DATA MINING IN HIGHER EDUCATION

Providing higher education to all sectors of a nation's population means confronting social inequalities deeply rooted in history, culture and economic structure that influence an individual's ability to compete. Quality assurance in higher education has risen to the top of the policy agenda in many nations [12].

Educational Data Mining is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students and the settings in which they learn [15]. Mining in an educational environment is called Educational Data Mining; it is concerned with developing new methods to discover knowledge from educational databases [6]. A lack of deep and sufficient knowledge of the higher education system may prevent system management from achieving quality objectives; data mining methodology can help bridge these knowledge gaps in the higher education system [11].

VII. APPLICATION

In this study, data were gathered from different degree colleges affiliated with Dr. R. M. L. Awadh University, Faizabad, India. These data are analyzed using a classification method to predict students' performance. To apply this technique, the following steps are performed in sequence:

1) Data set: The data set used in this study was obtained by sampling from the computer science departments of different colleges, for the PGDCA course of session 2009-10. The initial size of the data set is 600.
2) Database software: Microsoft SQL Server 2005 was used to store the data. MSSQL Server was chosen because it is compatible and efficient to use with a relational database management system.
3) Application software: The Matlab environment is used as the programming environment. Matlab is suitable for developing applications with MSSQL Server 2005.
4) Data mining process: The data exploration and presentation process consisted of the following steps:

   i) Data preparation: In this step the data maintained in different tables were joined into a single table in 1NF. After the joining process all errors were removed.

   ii) Data selection and transformation: In this step only those fields required for data mining were selected, for example sex, language medium, stream of bachelor


degree and division obtained. The data are transformed into the format of Table 1. The heading Division shows the division obtained by the student in the PGDCA final exam. Caste category shows the demographic classification of the student as defined by the GOI (Government of India). Language medium is the medium in which the student passed his/her graduation program. Class is the stream in which a student graduated to gain admission to PGDCA. It is categorized as BA (NC), BA (CA), B Sc (Bio), B Sc (Math) and B Com. BA (NC) denotes those BA students who did not take a calculative subject in the BA program, and BA (CA) denotes those who did.

   iii) Implementation of mining model: Given a training set, the naïve Bayes algorithm first estimates the prior probability P(Cj) for each class by counting how often each class occurs in the training data. For each attribute value xi, occurrences can be counted to determine P(xi). Similarly, the probability P(xi | Cj) can be estimated by counting how often each value occurs in the class in the training data. When classifying a target tuple, the conditional and prior probabilities generated from the training set are used to make the prediction. For a tuple ti with p attribute values xi1, ..., xip, P(ti | Cj) is estimated by

$$P(t_i \mid C_j) = \prod_{k=1}^{p} P(x_{ik} \mid C_j)$$

   To calculate P(ti) we can estimate the likelihood that ti is in each class. The probability that ti is in a class is the product of the conditional probabilities for each attribute value. The class with the highest probability is the one chosen for the tuple [10].

   iv) Results and discussion: Applying Bayesian classification to the data of Table 1 produces Table 2, and Figure 1 shows a pictorial representation of Table 2. Table 2 shows the division with the highest probability for each combination of medium, category and class. For example, if a student is of OBC category and BA with non-calculative subject, then in either language medium it can be predicted that s/he will score second division marks in the final exam.
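The counting procedure of step iii) can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation: the function and variable names are hypothetical, the counts are copied from Table 1, and the priors are assumed to come from the caste-category rows (which sum to 600). Because the per-division totals in Table 1 differ slightly between attributes, the probabilities this sketch produces need not match those reported in Table 2.

```python
# Illustrative sketch only: NOT the authors' Matlab implementation.
# Counts are copied from Table 1; the estimation choices are assumptions.

DIVISIONS = ["I", "II", "III", "FAIL"]

# COUNTS[attribute][value] = students per division, in DIVISIONS order
COUNTS = {
    "category": {
        "GEN": [100, 120, 34, 46],
        "OBC": [61, 78, 19, 12],
        "SC/ST": [29, 50, 28, 23],
    },
    "medium": {
        "ENGLISH": [100, 70, 18, 6],
        "HINDI": [90, 178, 63, 65],
    },
    "class": {
        "BA(NC)": [12, 76, 54, 22],
        "BA(CA)": [43, 72, 4, 10],
        "BSc(Bio.)": [80, 50, 15, 18],
        "BSc(Math)": [54, 20, 6, 8],
        "Bcom": [1, 30, 12, 13],
    },
}


def division_totals(attribute):
    """Sum the counts of one attribute over all its values, per division."""
    rows = list(COUNTS[attribute].values())
    return [sum(row[j] for row in rows) for j in range(len(DIVISIONS))]


def posterior(student):
    """Return {division: normalized naive Bayes score} for one student.

    `student` maps attribute name -> value, e.g. {"medium": "ENGLISH"}.
    """
    # Assumption: priors P(Cj) are taken from the caste-category rows.
    prior_counts = division_totals("category")
    n = sum(prior_counts)
    scores = []
    for j in range(len(DIVISIONS)):
        score = prior_counts[j] / n  # P(Cj)
        for attribute, value in student.items():
            # P(xik | Cj): share of division j's students with this value
            score *= COUNTS[attribute][value][j] / division_totals(attribute)[j]
        scores.append(score)
    total = sum(scores)  # normalizing constant, plays the role of P(ti)
    return {d: s / total for d, s in zip(DIVISIONS, scores)}


def predict(student):
    """Choose the division with the highest posterior probability."""
    post = posterior(student)
    return max(post, key=post.get)


if __name__ == "__main__":
    example = {"medium": "ENGLISH", "category": "GEN", "class": "BA(NC)"}
    print(predict(example))  # prints "II"
```

The sketch applies no smoothing, so an attribute value with a zero count in some division would force that division's score to zero; a production classifier would normally guard against that.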

[Figure 1: pictorial representation of Table 2. Vertical axis: probability (0.00 to 0.90). Horizontal axis: class (BA(NC), BA(CA), BSc(Bio.), BSc(Math), Bcom), grouped by caste category (Gen, OBC, SC/ST) and language medium (English, Hindi), each labelled with its predicted division.]

Figure 1


Medium    Category   Division   Class        Probability
English   Gen        II         BA(NC)       0.549218
                     I          BA(CA)       0.527331
                     I          BSc(Bio.)    0.712938
                     I          BSc(Math)    0.805456
                     II         Bcom         0.685979
          OBC        II         BA(NC)       0.596068
                     I          BA(CA)       0.520268
                     I          BSc(Bio.)    0.717771
                     I          BSc(Math)    0.810201
                     II         Bcom         0.758005
          SC/ST      II         BA(NC)       0.471241
                     II         BA(CA)       0.507795
                     I          BSc(Bio.)    0.601875
                     I          BSc(Math)    0.715802
                     II         Bcom         0.594066
Hindi     Gen        II         BA(NC)       0.467961
                     II         BA(CA)       0.585719
                     I          BSc(Bio.)    0.376518
                     I          BSc(Math)    0.504112
                     II         Bcom         0.484983
          OBC        II         BA(NC)       0.568261
                     II         BA(CA)       0.652269
                     I          BSc(Bio.)    0.428287
                     I          BSc(Math)    0.553675
                     II         Bcom         0.634676
          SC/ST      III        BA(NC)       0.384808
                     II         BA(CA)       0.600667
                     II         BSc(Bio.)    0.335702
                     I          BSc(Math)    0.373642
                     FAIL       Bcom         0.456478

Table 2
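Once Table 2 is available, predicting a new student's division reduces to a table lookup. The following Python sketch shows one possible encoding. The names `TABLE_2`, `CLASSES` and `predicted_division` are hypothetical and not part of the original study; the values are copied from Table 2.

```python
# Illustrative lookup of Table 2; all names here are hypothetical helpers.

CLASSES = ["BA(NC)", "BA(CA)", "BSc(Bio.)", "BSc(Math)", "Bcom"]

# (medium, category) -> [(predicted division, probability)] in CLASSES order
TABLE_2 = {
    ("English", "Gen"): [("II", 0.549218), ("I", 0.527331), ("I", 0.712938),
                         ("I", 0.805456), ("II", 0.685979)],
    ("English", "OBC"): [("II", 0.596068), ("I", 0.520268), ("I", 0.717771),
                         ("I", 0.810201), ("II", 0.758005)],
    ("English", "SC/ST"): [("II", 0.471241), ("II", 0.507795), ("I", 0.601875),
                           ("I", 0.715802), ("II", 0.594066)],
    ("Hindi", "Gen"): [("II", 0.467961), ("II", 0.585719), ("I", 0.376518),
                       ("I", 0.504112), ("II", 0.484983)],
    ("Hindi", "OBC"): [("II", 0.568261), ("II", 0.652269), ("I", 0.428287),
                       ("I", 0.553675), ("II", 0.634676)],
    ("Hindi", "SC/ST"): [("III", 0.384808), ("II", 0.600667), ("II", 0.335702),
                         ("I", 0.373642), ("FAIL", 0.456478)],
}


def predicted_division(medium, category, student_class):
    """Return (division, probability) for one student profile from Table 2."""
    return TABLE_2[(medium, category)][CLASSES.index(student_class)]


if __name__ == "__main__":
    # OBC category, BA with non-calculative subject, in both media
    print(predicted_division("English", "OBC", "BA(NC)"))
    print(predicted_division("Hindi", "OBC", "BA(NC)"))
```

Both calls return division II, which agrees with the example given in the results discussion for an OBC student with a non-calculative BA.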
VIII. CONCLUSION
In this paper, the Bayesian classification method is used on a student database to predict students' divisions on the basis of the previous year's database. This study will help students and teachers to improve the students' divisions. It will also help to identify those students who need special attention, so as to reduce the failure ratio and take appropriate action at the right time.

REFERENCES
[1]. Akpınar, H., "Veri Tabanlarında Bilgi Keşfi ve Veri Madenciliği", İ.Ü. İşletme Fakültesi Dergisi, No. 1 (1-22), April 2000
[2]. Alaa el-Halees (2009) Mining Students Data to Analyze e-learning Behavior: A Case Study.
[3]. Connolly T., C. Begg and A. Strachan (1999) Database systems: A Practical Approach to Design, Implementation, and Management (3rd Ed.). Harlow: Addison-Wesley.687
[4]. David Hand, Heikki Mannila, Padhraic Smyth, "Principles of Data Mining", PHI
[5]. Elena susena, "Using data mining techniques in higher education", National Defence University "Carol I", Bucharest, 68-72.
[6]. Erdoğan, Ş. Z., "Veri Madenciliği ve Veri Madenciliğinde Kullanılan K-Means Algoritmasının Öğrenci Veri Tabanında


Uygulanması", Master's Thesis, İstanbul Üniversitesi, 2004.
[7]. Gabrilson, S., Fabro, D. D. M., Valduriez, P., 2008. Towards the efficient development of model transformations using model weaving and matching transformations, Software and Systems Modeling 2003. Data Mining with CRCT Scores. Office of Information Technology, Georgia Department of Education.
[8]. Han, J. W., Kamber, M., 2006. Data Mining: Concepts and Techniques, 2nd Edition, The Morgan Kaufmann Series in Data Management Systems, Gray, J. Series Editor, Morgan Kaufmann Publishers.
[9]. Luan, J., 2002. Data mining and knowledge management in higher education – potential applications. In Proceedings of AIR Forum, Toronto, Canada
[10]. Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics".
[11]. Shaeela Ayesha, Tasleem Mustafa, Ahsaan Raza star, M Inayat Khan, "Data mining Model for higher education", ISSN 1450-216X Vol.43 No.1 (2010) pp24-29
[12]. Philip G Altbach, Liz Reisberg and Laura E. Rumbley, a report prepared for the UNESCO 2009 world conference on higher education.
[13]. Thearling, K., "An Introduction to Data Mining", http://thearling.com/text/dmwhite/dm
[14]. Westphal, C., Blaxton, T., 1998. Data Mining Solutions, John Wiley.
[15]. www.educationaldatamining.org
[16]. http://en.wikipedia.org/wiki/Predictive_modelling
