
IEEE SoutheastCon 2020

Prediction of Graduate Admission using Multiple Supervised Machine Learning Models

Zain Bitar
Electrical Engineering Department
Princess Sumaya University for Technology
Amman, Jordan
zainbitarr@gmail.com

Amjed Al-Mousa
Computer Engineering Department
Princess Sumaya University for Technology
Amman, Jordan
a.almousa@psut.edu.jo

978-1-7281-6861-6/20/$31.00 ©2020 IEEE

Abstract– In response to the highly competitive job market of present times, an increased interest in graduate studies has arisen. This has not only burdened applicants but has also increased the workload on university admission faculty. Any chance of abridging the admission process has impelled applicants and faculty alike to look for faster, more efficient, and more accurate methods of predicting admissions. The goal of this paper is to implement and compare several supervised predictive analysis methods on a labeled dataset based on real applications to the prestigious university of UCLA; regression, classification, and ensemble methods are the supervised methods employed for prediction. The dataset relies profoundly on the academic performance of the applicants during their undergraduate years. The coefficient of determination, as well as precision and accuracy, are the measures used to compare the different models. All predictive methods produced accurate results; however, certain methods proved more promising than others. Predictions were obtained within short time frames, which in turn will cut down the time spent in the admission process.

Keywords- admissions, graduate studies, Regression, Classification, Ensemble methods, coefficient of determination

I. INTRODUCTION

Graduate programs have gained increased popularity, seeing that most students find themselves looking for a chance to continue their education after completing their undergraduate study; this is usually a result of students gaining a better perspective of what educational field they would like to pursue. Nevertheless, most prestigious schools require a minimum standard of academic performance based on previously taken standardized exam scores and cumulative GPA, as well as several other academic measures; this is highlighted in several papers [1, 2]. With this increased demand for graduate admission, registration offices have been pressured with looking over thousands of applications.

The designed predictor will provide registration offices with a good overview of hundreds or even thousands of applications at a time with high accuracy; this is also very useful from an applicant's point of view, where it saves them money on application fees and time, such that the applicant can still get a chance of early admission. Features highly correlated with the admission rate will be highlighted, elucidating to the applicants what will affect their chances of admission. The methods will be trained to predict admission per the UCLA admission rules, which rely on the applicant's previous educational record. The predictor will not consider any non-educational features.

Several supervised machine learning algorithms will be utilized in this paper to predict the rate of acceptance as a percentage: SVM (support vector machines), Logistic Regression, Linear Regression, Decision Trees, and Random Forest. Regression models will be compared according to their coefficient of determination, denoted by R², whilst classification models will be compared according to their accuracy, precision, and recall. Ensemble methods such as boosting, bagging, and stacking will be used to enhance the accuracy of the classifiers; several studies have focused on the impact of ensemble methods on the future of machine learning [3].

II. RELATED WORK

With the flourishing of machine learning in the twentieth century and its ability to make everyday tasks effortless, its applicability has also extended to educational fields. A very interesting domain where machine learning has been implemented within the educational field is predicting the admission of an applicant into an educational institution. In several countries, such as China, the application process can be competitive, especially with the ability of students to get early admission. One approach was to employ supervised methods such as Logistic Regression on a university dataset to predict whether students with different educational abilities would get early acceptance [1].

Another take on machine learning applicability in the educational admission field is a recommender that uses KNN to recommend the best-suited university for each applicant, based on an academic dataset with features that test academic performance similar to those in the dataset used here [2].

Usually, datasets do not come in a single language, and especially if, as in [4], one is building a universal predictor, the data will more likely contain unnecessary features. In [3], the predictor was able to use Decision Tree modeling; this, along with recommender systems as in [2], are two strong modeling approaches used in the prediction process. That predictor also provided applicants a view of which features have the highest influence on their admission rate. Similarly, the predictor to be designed here should also be able to use Decision Trees to provide good approximations; however, further enhancement of this model will be presented later in this paper.

The first take on this implementation is in [5], which introduces a predictor known as GRADE; GRADE preprocesses numerical and categorical data such as GRE score, GPA, and school name. The predictive analysis method of [4] is logistic regression infused with the log of odds; like that work, this paper computes, compares, and enhances performance measures such as the ROC and AUC.
III. DATA ANALYSIS & PREPROCESSING

The dataset used [6] is composed of five hundred instances with no null entries and no categorical attributes; each instance in the dataset represents an applicant. The dataset was acquired from UCLA's admittance history data. The number of attributes given in the dataset is eight, all numeric:

I. GRE Score (Graduate Record Examinations): measures general knowledge of undergraduate-level Math and English. This score ranges from 260 to 340.

II. TOEFL Score (Test of English as a Foreign Language): measures the student's English abilities. This score ranges from 0 to 120.

III. SOP (Statement of Purpose): a letter written by the applicant explaining the purpose of their application. This is scored on a range from one to five.

IV. LOR (Letter of Recommendation): reflects the weight of the recommendation provided for the applicant. This is scored on a range from one to five.

V. CGPA (Cumulative GPA): based on the academic performance of the applicant in undergraduate studies. This is scored on a range from one to ten.

VI. University Rating: based on the reputation of the applicant's previous university. This is scored on a range from one to five.

VII. Research Experience: a binary value based on whether the applicant has any research familiarity. This value is either one or zero.

VIII. Chance of Admission: the rate of admission into graduate school. This attribute is the target value and is predicted as a rate from zero to one.

The dataset also included a "Serial No." attribute, which had no benefit to the data frame considering it duplicated the numeric index of the instances. To understand the distribution of the data and to gain a better statistical understanding, Table 1 describes the data using the Pandas (Python Data Analysis Library) describe function.
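As a rough illustration, the following Python sketch reproduces this step; the CSV file name and the exact column labels are assumptions, since the paper does not list them.

```python
import pandas as pd

# Load the admission dataset (file name assumed).
df = pd.read_csv("Admission_Predict_Ver1.1.csv")

# Drop the redundant index-like column (column name assumed).
df = df.drop(columns=["Serial No."])

# Summary statistics corresponding to Table 1.
print(df.describe())
```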
TABLE 1: The statistical characteristics of the data

Label        Count   Mean    Std     Max    Min
GRE          500     316.4   11.29   340    290
TOEFL        500     107.2   6.08    120    92.0
Uni Rating   500     3.11    1.14    5.0    1.0
SOP          500     3.37    0.99    5.0    1.0
LOR          500     3.48    0.92    5.0    1.0
CGPA         500     8.57    0.60    9.0    6.8
Research     500     0.56    0.50    1.0    0.0
Admittance   500     0.72    0.14    0.97   0.34

As seen in Table 1, the data scales differ profoundly across attributes, and this can affect the SVM when fitting the data with the "widest fitting street". To avoid this effect, a preprocessing technique known as Standard Scaling is imported from Scikit-learn; this technique was chosen over min-max scaling given that the data might include several outliers.

Correlation matrices help provide an intuitive idea of which predictive analysis methods would be more useful in finding a suitable model. Fig.1 employs the Matplotlib and Seaborn libraries to plot the correlation matrix, using a color map to emphasize the number scale in colors. It can be seen that all features have a positive correlation with the admission rate. The highest-impacting attribute on the admission rate is "CGPA", at a substantial value of 0.88, whilst "Research" has the lowest value among all features, at 0.54.

Fig.1: Correlation matrix using all eight attributes
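A minimal sketch of the scaling and correlation steps, assuming the df frame from the sketch above; the target column name and the plot styling are assumptions.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler

# Standardize the features to zero mean and unit variance;
# chosen over min-max scaling because of possible outliers.
features = df.drop(columns=["Chance of Admit"])   # target column name assumed
X_scaled = StandardScaler().fit_transform(features)

# Correlation matrix over all eight attributes, as in Fig.1.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()
```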

Now, to visualize the relationships among the features themselves, a scatter plot is shown in Fig.2 using the Matplotlib scatter function. This plot emphasizes the relationships between the three features most correlated with the admittance rate: "GRE_Score", "TOEFL_Score", and "CGPA". As suggested by Fig.1, the "GRE_Score" and "TOEFL_Score" have a high correlation with each other as well as with the "CGPA". To clarify this correlation, the scatter plot in Fig.2 represents a 3D relationship in a 2D plot using the color map feature. The scatter plot captures the intrinsically linear trend amongst the three largest-impact features on the admittance rate.

Fig.2: Correlation between the GRE_Score, CGPA, and TOEFL_Score

"University Rating", which does not show a very promising correlation value with the admittance rate, nevertheless proves to have a clear relationship with it. This can be better elucidated using a bar graph under the assumption that an admittance rate above 0.75 guarantees acceptance of the applicant. Fig.3 shows how "University Rating" does affect the admittance rate proportionally, with a mild exception: the number of admitted applicants with a "University Rating" of four is slightly greater than that of five.

Fig.3: Number of applicants admitted along with their university rating

IV. MODELING METHODS

Several supervised predictive analysis methods can be implemented on the data, whether these methods fall under Regression, Classification, or Ensemble (applicable to both regression and classification). Innately, modeling the dataset was more likely to be done using regression algorithms; nevertheless, classification can also be done by slightly altering the target attribute. An admittance rate above 0.75 is considered a definite admission, or a binary 1; anything below is considered a binary zero. Given that in Table 1 the mean of the admit rate is about 0.72, a threshold slightly higher than the mean is reasonable.

Ensemble techniques such as voting classifiers, bagging, boosting, and stacking will be introduced to improve the accuracy of the classification process; ensemble methods have seen widespread use, as shown in papers [7, 8] comparing and upgrading their performance in terms of accuracy. All of the algorithms below are used after splitting the data into test data (20%) and training data (80%).
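A minimal sketch of this setup, continuing from the preprocessing sketches above (df and X_scaled); the random seed is an assumption, as the paper does not state one.

```python
from sklearn.model_selection import train_test_split

# Continuous target for regression, and a thresholded binary target
# for classification: "definite admission" above a rate of 0.75.
y_reg = df["Chance of Admit"]            # target column name assumed
y_cls = (y_reg > 0.75).astype(int)

# 80% training / 20% test split, as used throughout the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_cls, test_size=0.2, random_state=42)
```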

A. Regression Models

Several different regression algorithms will be used in pursuit of the best model according to the performance measure (coefficient of determination). Based on Fig.1 and Fig.2, Linear Regression models seem promising. After importing the Linear Regression class from Scikit-Learn, the model was fitted and later used to predict the test data. To make sure the model was not overfitting the data, cross-validation was done using three folds, with scoring based on the root mean square error, as found in Table 2. Three folds produced a decent variation of data and data size in each fold. The performance of this model was tested by computing the R² score on the test data along with the predicted values.
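A compact sketch of this procedure, under the same assumptions as above (X_scaled and y_reg from the earlier sketches):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Separate 80/20 split against the continuous admittance rate.
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_scaled, y_reg, test_size=0.2, random_state=42)

lin_reg = LinearRegression().fit(Xr_train, yr_train)

# Three-fold cross-validation scored by RMSE, as reported in Table 2.
rmse = -cross_val_score(lin_reg, Xr_train, yr_train, cv=3,
                        scoring="neg_root_mean_squared_error")
print(rmse)

# Coefficient of determination on the held-out test data.
print(r2_score(yr_test, lin_reg.predict(Xr_test)))
```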
SVR (Support Vector Regression) was used in the expectation that the performance measure might increase; however, in contrast to Linear Regression, SVR has several hyper-parameters that can be adjusted to realize the lowest mean square error: ε (epsilon) and C. Subsequently, to maximize the performance of the SVR, GridSearch was imported from Scikit-Learn to establish the best hyper-parameter values of C=1.5 and ε=0.1, along with a linear kernel. Cross-validation was then performed, with the values found in Table 2. To test the performance of the model, the same R² score was evaluated on the test data along with the SVR predicted values.
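A sketch of the grid search described here; the candidate grids are assumptions, while C=1.5, ε=0.1, and the linear kernel are the values the paper reports finding.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {"C": [0.5, 1.0, 1.5, 2.0],
              "epsilon": [0.05, 0.1, 0.2],
              "kernel": ["linear"]}
svr_search = GridSearchCV(SVR(), param_grid, cv=3,
                          scoring="neg_root_mean_squared_error")
svr_search.fit(Xr_train, yr_train)
print(svr_search.best_params_)   # paper reports C=1.5, epsilon=0.1
```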
For the final regression algorithms, Decision Tree Regressors, along with their ensemble variant, Random Forest Regressors, were implemented. Similar to SVR, the Decision Tree Regressor hyper-parameters were computed using GridSearch; the values acquired were max depth=5 and min samples split=50. The cross-validation scores of both algorithms are found in Table 2, and the R² score was also evaluated on the predicted values of both algorithms.
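A sketch of these two regressors with the tree hyper-parameters the paper reports; the forest settings are assumptions.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(max_depth=5, min_samples_split=50)
forest_reg = RandomForestRegressor(random_state=42)   # settings assumed

# Three-fold RMSE for both models, as tabulated in Table 2.
for model in (tree_reg, forest_reg):
    rmse = -cross_val_score(model, Xr_train, yr_train, cv=3,
                            scoring="neg_root_mean_squared_error")
    print(type(model).__name__, rmse)
```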

TABLE 2: Cross-validation RMSE scores of all four regression algorithms

Regression                 First Fold   Second Fold   Third Fold
Linear Regression          0.0643       0.0615        0.0568
SVR                        0.0668       0.0724        0.0675
Decision Tree Regression   0.0765       0.0737        0.0667
Random Forest Regression   0.0702       0.0654        0.0619

A bar graph of all four algorithms was used to compare which algorithm had the best performance in terms of modeling the data, shown in Fig.4. It can be deduced that Linear Regression is the best-fit algorithm: it matches the data with an R² score of 0.819. The second best-fitting model is the Random Forest Regressor, at a value of 0.792; ensemble algorithms are proven to improve performance, especially in the case of weak learners.

Fig.4: R² scores of four different regression algorithms

B. Classification Models

After splitting the data into training and testing sets, the admittance rate, also known as the target feature, is transformed into a binary value: an admittance rate greater than 0.75 is considered a binary one, anything else a zero. Three different classification algorithms, Decision Trees, Logistic Regression, and SVC (support vector classifier), are compared based on accuracy, recall, precision, and AUC.

The first algorithm to be used is Logistic Regression; this algorithm relies on a sigmoid function and a threshold. Similar to the Linear Regression algorithm, Logistic Regression places weights on each feature, with the cost function (1). The algorithm works on minimizing this cost function using gradient descent:

J(θ) = −(1/m) Σᵢ [ yᵢ log(hθ(xᵢ)) + (1 − yᵢ) log(1 − hθ(xᵢ)) ]        (1)

where hθ(x) is the sigmoid hypothesis and m is the number of training instances.
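A sketch of this classifier, continuing from the binary split above; Scikit-learn's LogisticRegression minimizes this regularized log loss internally, so no explicit gradient-descent loop is needed. C=1 is the value the paper reports finding below.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score

log_reg = LogisticRegression(C=1.0).fit(X_train, y_train)

# Mean three-fold cross-validation accuracy (paper reports 0.8946).
print(cross_val_score(log_reg, X_train, y_train, cv=3).mean())

# Confusion matrix on the test data, as in Fig.5.
print(confusion_matrix(y_test, log_reg.predict(X_test)))
```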

GridSearch was used to find the optimal value of the generalization factor "C", which was found to be one. This model provided an adequate accuracy, as shown in Table 3. To ensure the model did not over-fit the data, cross-validation was done; the cross-validation accuracy of this classifier over three folds has a mean value of 0.8946. Precision and recall values can be readily calculated from the confusion matrix in Fig.5.

Fig.5: Confusion matrix of the predicted values using Logistic Regression

The second algorithm used is SVC; this classifier is known as a strong classifier with minimal computational power requirements. All hyper-parameters were determined using GridSearch, with the following results: C=10, degree=1.0, gamma=0.01, and kernel='linear'. The mean cross-validation score over three folds is 0.89663; given such a high cross-validation score, the model looks promising. A confusion matrix, shown in Fig.6, is used to show the model's performance measures.

Fig.6: Confusion matrix of the predicted values using SVC
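A sketch of the SVC grid search; the candidate grid is an assumption, while the best values quoted above are the paper's.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [1, 10, 100],
              "gamma": [0.01, 0.1],
              "kernel": ["linear", "rbf"]}
svc_search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
svc_search.fit(X_train, y_train)
svc = svc_search.best_estimator_   # paper reports C=10, gamma=0.01, linear kernel
print(svc_search.best_score_)
```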

The final classification algorithm is the Decision Tree. Even though this algorithm is fairly trivial and considered a weak learner, it has a rather logical approach. The cost function in this algorithm is based on the CART principle. CART is known for being a greedy algorithm: it looks for the best split at the current layer without taking into account the best split for further layers. Equation (2) represents the CART cost function and its reliance on the Gini index G and the number of instances m; G_left and G_right represent the Gini measures, while m_left and m_right represent the number of instances to the left and right of the split, respectively.

J(k, t_k) = (m_left / m) G_left + (m_right / m) G_right        (2)

To put the model to use, GridSearch was used once more to find the following hyper-parameters: max depth and minimum samples per leaf. The ideal values, found using a threefold cross-validation GridSearch, are as follows: max depth = 4 and minimum samples per leaf = 10. The mean cross-validation score over three folds is 0.8776. A confusion matrix, in Fig.7, is computed to measure the prediction performance.

Fig.7: Confusion matrix of the predicted values using Decision Trees
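A sketch of the corresponding grid search, with the candidate grids assumed; max_depth=4 and min_samples_leaf=10 are the reported values.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [5, 10, 20]},
    cv=3, scoring="accuracy")
tree_search.fit(X_train, y_train)
tree_clf = tree_search.best_estimator_
print(tree_search.best_score_)   # paper reports a mean of 0.8776
```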

A significant performance measure is the ROC curve along with its AUC. The AUC ensures that the predictor focuses on the false negatives ("FN"); this ensures that the predictor does not miss any eligible applicant. Fig.8 graphs the ROC of all three classifiers based on their decision functions; the trends of all the predictors look identical. A more accurate indicator for these curves is the AUC, which is calculated and tabulated in Table 3.

Fig.8: The ROC curves of all three classifiers

The performance measures accuracy, AUC, precision, and recall are all documented in Table 3, where SVC and Logistic Regression gave identical and credible results; however, Logistic Regression outperformed SVC when it came to having a high recall rate without misidentifying the applicants that were accepted. The weakest classifier in terms of performance is the Decision Tree classifier; to improve its performance, a better variant of this classifier, the Random Forest, can be used instead.

TABLE 3: Performance measures of all classifiers

             Decision Tree   Logistic Regression   SVC
Accuracy     0.87            0.91                  0.91
AUC          0.9251          0.9643                0.9613
Precision    0.7949          0.8461                0.8461
Recall       0.8611          0.9167                0.9167
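A sketch of how these curves can be produced, assuming the fitted log_reg, svc, and tree_clf from the sketches above; decision trees expose class probabilities rather than a decision function, so predict_proba is used there.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

scores = {"Logistic Regression": log_reg.decision_function(X_test),
          "SVC": svc.decision_function(X_test),
          "Decision Tree": tree_clf.predict_proba(X_test)[:, 1]}
for name, s in scores.items():
    fpr, tpr, _ = roc_curve(y_test, s)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.4f})")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```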
C. Ensemble Methods

The initiative behind using ensembles is to improve the performance measure (accuracy) of weak classifiers: ensemble methods employ several different or identical classifiers, in parallel or sequentially, on the principle that accuracy is thereby more likely to be enhanced. Four different ensemble methods have been employed: Voting classifiers, Bagging, AdaBoost, and Stacking.

The first ensemble method used is the Voting Classifier, which was implemented on the dataset using the previously implemented classifiers: Decision Tree, SVC, and Logistic Regression. The voting hyper-parameter chosen is hard voting: SVC does not provide a probability measure, thus one is committed to using this hyper-parameter. The accuracy of the voting classifier is 0.91. This accuracy is intuitive: Logistic Regression and SVC are not only strong classifiers but also share a similar linear classification technique in terms of a linear kernel; however, compared with the Decision Tree alone, the voting classifier does have improved accuracy.
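A sketch of this ensemble over the three fitted classifiers (hard voting, since SVC without probability=True exposes no probability estimates):

```python
from sklearn.ensemble import VotingClassifier

voting_clf = VotingClassifier(
    estimators=[("lr", log_reg), ("svc", svc), ("tree", tree_clf)],
    voting="hard")
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))   # paper reports 0.91
```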
Now for Bagging: this ensemble method uses one classifier with randomly sampled, repeated instances. Bagging works by increasing the training data using bootstrap samples; this working principle is used for generalization. To ensure the bagging classifier has optimal performance, a GridSearch was computed to obtain the following hyper-parameters: max samples=0.6, max features=4, bootstrap=True, and number of estimators=20. Bagging using Decision Trees improved the Decision Tree classifier's accuracy to a value of 0.9125, which is higher than the accuracy of the other three classifiers. Bagging can only effectively work on unstable classifiers that have a high tendency to over-fit.
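A sketch with the grid-searched values the paper reports (note the estimator keyword is base_estimator in scikit-learn versions before 1.2):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=20, max_samples=0.6, max_features=4, bootstrap=True)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))   # paper reports 0.9125
```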
Boosting, on the other hand, works differently; it places weights on each instance and each model, which in turn affects the performance of the next model by focusing on the misclassified instances. AdaBoost is the boosting algorithm used in this paper; AdaBoost helps increase the variance of under-fitted models. AdaBoost using the Decision Tree classifier is very effective in enhancing the estimator's accuracy; the downside to boosting is that the classifiers can only work sequentially, which is much more time-consuming.
considered in this research will provide a minimum accuracy
Using GridSearch and importing AdaBoost from Scikit-learn, the recommended maximum number of estimators is 10, at a learning rate of 0.85. Employing AdaBoost on Decision Trees, we get an accuracy of 0.9. Given that ten estimators are used, each with an individual weight, a weight function is utilized to compute the weights of all ten classifiers, shown in Fig.9. The initial estimators are more likely to have a higher weight, given that Decision Trees split first on the features with better performance measures (Gini index). Using AdaBoost on SVM classifiers is not effective, due to them being inherently very stable classifiers [7]; the same can be said about Logistic Regression classifiers.

Fig.9: Weights of all ten trees in order
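A sketch with the reported settings; the depth of the base tree is an assumption (same scikit-learn keyword caveat as above):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),   # base depth assumed
    n_estimators=10, learning_rate=0.85)
ada_clf.fit(X_train, y_train)
print(ada_clf.score(X_test, y_test))   # paper reports about 0.9
print(ada_clf.estimator_weights_)      # per-estimator weights, as in Fig.9
```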
The final ensemble method used is Stacking; Stacking is a hybrid of Boosting and Bagging, where the stacked classifiers work with models in parallel as well as in sequence. A meta-classifier is employed to train on all the data predicted by the modeling layers before it. Stacking is known for being the most effective ensemble method at increasing the accuracy of several classifiers. The Stacking model is imported from the MLXTEND library, using four classifiers: SVC, Random Forest, and Gaussian Naive Bayes, with Logistic Regression as the meta-classifier. After training the Stacking classifier and testing it on the test data, its accuracy reached a value of 0.925, which is superior amongst all classifiers and all other ensemble methods.
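A sketch of this stack using the MLXTEND API named in the text; the level-one model settings are assumptions.

```python
from mlxtend.classifier import StackingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

stack_clf = StackingClassifier(
    classifiers=[SVC(), RandomForestClassifier(), GaussianNB()],
    meta_classifier=LogisticRegression())
stack_clf.fit(X_train, y_train)
print(stack_clf.score(X_test, y_test))   # paper reports 0.925
```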

V. CONCLUSION

This paper predicted the graduate admission of applicants, where the dataset was made of eight important features related to educational history, by using and comparing supervised machine learning models. This research documented the performance of several models, whether via the coefficient of determination for regression or accuracy and AUC for classification. Ensemble techniques were also implemented to improve accuracy, especially for weak or unstable classifiers. On the given dataset, any model considered in this research will provide a minimum accuracy of 0.87 and a minimum R² score of 0.724; moreover, using an ensemble technique improved the accuracy to 0.925. In general, all models can provide a good enough prediction, given that the training data had double the number of accepted applicants compared to rejected applicants. This predictor proved to work competently; however, given the dataset, it works only on the educational aspect of the applicant, without taking into account personal data. This predictor has great potential to affect the admission process worldwide. The predictor presented can be implemented not only by university admission faculties [5] but also at recruiting agencies or human resources departments. Implementing this predictor would reduce the time needed to analyze applicants' CVs, allowing human resources to focus on the applicants that are more eligible.

REFERENCES

[1] Chen, Y., Pan, C. C., Yang, G. K., & Bai, J. (2014, August). Intelligent decision system for accessing academic performance of candidates for early admission to university. In 2014 10th International Conference on Natural Computation (ICNC) (pp. 687-692). IEEE.
[2] Hasan, M., Ahmed, S., Abdullah, D. M., & Rahman, M. S. (2016, May). Graduate school recommender system: Assisting admission seekers to apply for graduate studies in appropriate graduate schools. In 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV) (pp. 502-507). IEEE.
[3] Dietterich, T. G. (1997). Machine-learning research. AI Magazine, 18(4), 97-136.
[4] Yazdipour, S., & Taherian, N. (2017, December). Data Driven Decision Support to Fund Graduate Studies in Abroad Universities. In 2017 International Conference on Machine Learning and Data Science (MLDS) (pp. 44-50). IEEE.
[5] Waters, A., & Miikkulainen, R. (2014). GRADE: Machine learning support for graduate admissions. AI Magazine, 35(1), 64.
[6] Acharya, M. S., Armaan, A., & Antony, A. S. (2019). A comparison of regression models for prediction of graduate admissions. In 2019 IEEE International Conference on Computational Intelligence in Data Science. IEEE.
[7] Wickramaratna, J., Holden, S., & Buxton, B. (2001, July). Performance degradation in boosting. In International Workshop on Multiple Classifier Systems (pp. 11-21). Springer, Berlin, Heidelberg.
