Mini Proj RCT 222 PDF

The document provides an introduction and overview of using machine learning models to predict graduate school admission chances. Specifically: 1) It discusses how the number of students applying for graduate studies has increased, motivating the study of using grades and other factors to predict admission chances. 2) It presents the dataset used, which contains 500 rows of data on 7 variables like GRE scores, TOEFL scores, and university ratings for past applicants. 3) It proposes developing an admission predictor using linear regression with dimensionality reduction to provide users a chance of admission ranging from 0 to 1 based on their information.

CHAPTER 1

INTRODUCTION

The world markets are developing rapidly and are continuously looking for the
best knowledge and experience among people. Young workers who want to stand
out in their jobs are always seeking higher degrees that can help them
improve their skills and knowledge.
OVERVIEW
As a result, the number of students applying for graduate studies has increased
in the last decade [1]–[4]. This fact has motivated us to study students' grades
and their chances of admission to master's programs, which can help universities
predict the number of master's applicants each year and provision the needed
resources. The dataset [5] used in this work belongs to the educational domain.
Admission is a dataset with 500 rows that contains 7 different independent
variables, which are:
• Graduate Record Exam (GRE) score. The score is out of 340 points.
• Test of English as a Foreign Language (TOEFL) score, which is out of 120 points.
• University Rating (Uni.Rating), which indicates the ranking of the applicant's
bachelor's university among other universities. The score is out of 5.
• Statement of Purpose (SOP) strength, Letter of Recommendation (LOR) strength,
undergraduate CGPA, and Research experience, which are the remaining columns of
the dataset as used in Chapter 6.
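As an illustration, rows of the dataset can be pictured as a small pandas DataFrame. The values below are invented; only the column names mirror the admission dataset used later in this report.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the admission dataset,
# the values themselves are made up for illustration.
sample = pd.DataFrame({
    "GRE Score": [337, 316],
    "TOEFL Score": [118, 104],
    "University Rating": [4, 3],
    "SOP": [4.5, 3.0],
    "LOR": [4.5, 3.5],
    "CGPA": [9.65, 8.00],
    "Research": [1, 1],
    "Chance of Admit": [0.92, 0.72],
})
print(sample.shape)  # 2 rows, 7 feature columns plus the target column
```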

PROBLEM STATEMENT
College admission prediction is the task of estimating whether a student will be
admitted to a college or not. Deep learning is used in various computer vision
applications and related studies; here, however, we develop an admission
predictor using linear regression in machine learning.

EXISTING SYSTEM
In the existing system, many machine learning algorithms are used for the
prediction of graduate admission. The existing system compares machine learning
algorithms on the basis of accuracy; the algorithms include Linear Regression,
Support Vector Regression, and Random Forest Regression. In this system, Linear
Regression performs best on the dataset, with a low MSE and a high R2 score.
Figure 2 shows the sample dataset, which contains 500 rows and 7 independent
variables. Figure 1 shows the architecture of the existing system.
PROPOSED SYSTEM
The proposed system consists of four regression models. Out of those, we use
Linear Regression with Dimensionality Reduction, which is also a highly accurate
model. A user interface is provided through which an actor can interact with the
system. The algorithm with improved accuracy acts as the backend for the user
interface. Whenever an actor (student/consultancy) provides data to the user
interface, it shows the resulting Chance of Admission, ranging from 0 to 1.
Figure 3 shows the architecture of the proposed system.
User Manual. There are several steps for using the system: 1. Initially, the
user has to open our website and provide all the required details.
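A minimal sketch of the backend idea described above, assuming a trained regression model: the actor's inputs are passed to the model and the raw prediction is clipped into the [0, 1] range. The tiny training set and the helper name predict_chance are hypothetical, not part of the actual system.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [GRE, TOEFL, University Rating] -> chance of admission
X = np.array([[337, 118, 4], [316, 104, 3], [322, 110, 3], [300, 97, 2]], dtype=float)
y = np.array([0.92, 0.72, 0.80, 0.58])
model = LinearRegression().fit(X, y)

def predict_chance(gre, toefl, rating):
    """Return the model's chance of admission, clipped to the [0, 1] range."""
    raw = model.predict(np.array([[gre, toefl, rating]], dtype=float))[0]
    return float(np.clip(raw, 0.0, 1.0))
```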

CHAPTER 2
LITERATURE SURVEY

M. Injadat et al. [1] proposed a comparative approach by developing four
machine learning regression models (linear regression, support vector machine,
decision tree, and random forest) for predictive analytics of graduate admission
chances. They then computed error functions for the developed models and
compared their performances to select the best performing one; linear regression
was the best performing model, with an R2 score of 0.72.
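The comparison described in [1] can be sketched as follows: fit each candidate regressor, compute MSE and R2 on a held-out split, and keep the model with the best R2. The synthetic data here merely stands in for the admission dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the admission data: 7 features, one numeric target
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "support vector machine": SVR(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (mean_squared_error(y_test, pred), r2_score(y_test, pred))

# Pick the model with the highest R2 on the held-out split
best = max(scores, key=lambda name: scores[name][1])
print(best, scores[best])
```

On this linear synthetic data, linear regression comes out on top, matching the outcome reported in [1].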
F. Salo et al. [2] proposed a comparison of different regression models. The
developed models are a gradient boosting regressor and a linear regression
model. The gradient boosting regressor achieved a score of 0.84, surpassing the
performance of the linear regression model. They also computed other performance
error metrics, such as mean absolute error, mean squared error, and root mean
squared error.
A. Moubayed et al. [3] used neural networks to build a decision support system
for evaluating applications submitted by international students to the
university. The model was intended to predict the performance of prospective
students by comparing them with students currently studying at the university
who had a similar profile at the time of their application. Based on the current
students' profiles, the model predicted whether a prospective student should be
granted admission. Since the comparisons were made only with students who were
already admitted to the university, and the data of rejected applicants was
excluded from the analysis, the model turned out to be less effective due to the
problem of class imbalance.
A. B. Nassif et al. [4] looked at the problem from a college perspective,
predicting the probability that a student will enroll in the college after being
accepted into its courses. They used the K-Means algorithm to cluster students
based on various factors such as feedback, family income, family occupation,
parents' qualification, motivation, and so on, to predict whether a student will
enroll at the college or not. Depending on the similarity of the attributes,
students are assembled into clusters and decisions are made. The goal of the
model is to increase the enrolment of students at the college.
M. S. Acharya et al. [5] used machine learning and predictive modeling to build
a model for evaluating the admission policies and benchmarks at Tennessee Tech
University. J48, a well-known variant of the C4.5 algorithm, was used to create
the model. Like the models mentioned above, they used various features of the
student profile to evaluate the chances of admission to the college. The model
performed well in predicting the true positive cases, where the student had a
good profile and was likely to secure admission, but it failed to correctly
identify the true negatives, that is, the students who do not satisfy the
defined criteria.
S. S. Shapiro et al. [6] predicted the yield of college admission using machine
learning techniques. Yield rate can be defined as the rate at which students who
have been granted admission by the college actually enroll for the course.
Multiple machine learning algorithms such as Random Forest, Logistic Regression,
and SVM were used to create the model.
G. K. Uyanık et al. [7] provide a review of previous work on predicting the
chances of students' enrolment in universities. There have been several projects
and studies on topics related to student admission into universities. (Bibodi et
al. (n.d.)) used multiple machine learning models to create a system that helps
students shortlist the universities suitable for them; a second model was also
created to help colleges decide on the enrolment of a student.
C. López-Martín et al. [8] developed a model that can provide the list of
universities/colleges best suited for a student based on their academic records
and college admission criteria. The model was developed by applying data mining
techniques and knowledge discovery rules to the university's existing in-house
admission prediction system. (Mane) conducted similar research that predicted
the chance of a student getting admission into college based on their Senior
Secondary School, Higher Secondary School, and Common Entrance Examination
scores, using the pattern growth approach to association rule mining. The
performance of both models was good; the only drawback was that the problem
statement was centric to a single university.
O. Mahdi et al. [9] describe research conducted by (Jamison) in which the yield
of college admission was predicted using machine learning techniques. Yield rate
can be defined as the rate at which students who have been granted admission by
the university actually enroll for the course. Multiple machine learning
algorithms such as Random Forest, Logistic Regression, and SVM were used to
create the model; the models were compared based on their performance and
accuracy. Random Forest outperformed the other models with 86% accuracy and was
thus used to create the system. The factors that proved significant in
predicting a successful application were also highlighted.
N. S. Altman et al. [10] describe the GRADE system, developed to support the
admission process for graduate students in the University of Texas at Austin
Department of Computer Science. The main objective of the project was to develop
a system that helps the admission committee of the university make better and
faster decisions. Logistic regression and SVM were used to create the model;
both models performed equally well, and the final system was developed using
logistic regression due to its simplicity. The time required by the admission
committee to review the applications was reduced by 74%, but human intervention
was still required to make the final decision on the status of the application.

M. Azzeh et al. [11] created a similar model to predict the enrolment of a
student in the university based on factors like SAT score, GPA, residency, race,
etc. The model was created using the multiple logistic regression algorithm; it
was able to achieve an accuracy of only 67%.
T. K. Ho et al. [12] also refer to research conducted by Jamison in which the
yield of college admission was predicted using machine learning techniques.
Yield rate can be defined as the rate at which students who have been granted
admission by the college enroll for the course. Multiple machine learning
algorithms such as Random Forest, Logistic Regression, and SVM were used to
create the model.

A. B. Nassif et al. [13] proposed different machine learning algorithms for
predicting the chances of admission. The models are K-Nearest Neighbors, Linear
Regression, Ridge Regression, and Random Forest. These are trained on features
that have a high impact on the probability of admission. Out of the generated
models, the linear regression model achieved 79% accuracy.

D. E. Rumelhart et al. [14] proposed a project that uses attributes like GRE,
TOEFL, CGPA, research papers, etc. According to these scores, the chance of
admission is calculated. The developed model has 93% accuracy. The performance
is analyzed through various metrics such as accuracy, precision, F-measure,
recall, and area under the receiver operator curve.

M. S. Acharya et al. [15] note that machine learning models are used for
training and testing in different ways by researchers. For example, some chose
to use ensemble learning for an efficient result, with performance analyzed
through Weka in order to decide which model performs best based on the mean
absolute error (MAE) value, while others use multiple machine learning models
and pick the one that performs best. The performance is analyzed through various
metrics such as accuracy, precision, F-measure, recall, and area under the
receiver operator curve.

CHAPTER 3

SYSTEM DESIGN
In this chapter, the various UML diagrams of the College Admission
Predictor are represented and the various functionalities are explained.

3.1 UNIFIED MODELLING LANGUAGE


Unified Modeling Language (UML) is a standardized modeling language
enabling developers to specify, visualize, construct, and document the artifacts
of a software system. Thus, UML makes these artifacts scalable, secure, and
robust in execution. It uses graphic notation to create visual models of
software systems. UML is designed to enable users to develop an expressive,
ready-to-use visual modeling language. In addition, it supports high-level
development concepts such as frameworks, patterns, and collaborations. Some of
the UML diagrams are discussed below.

3.1.1 USE CASE DIAGRAM OF COLLEGE ADMISSION PREDICTOR


Use case diagrams are used for high-level requirement analysis of a system.
When the requirements of a system are analyzed, the functionalities are captured
in use cases. It can therefore be said that use cases are nothing but the system
functionalities written in an organized manner. The second thing relevant to use
cases is the actors. Actors can be defined as something that interacts with the
system; an actor can be a human user, an internal application, or an external
application. Use case diagrams are used to gather the requirements of a system,
including internal and external influences. These requirements are mostly design
requirements. Hence, when a system is analyzed to gather its functionalities,
use cases are prepared and actors are identified. The functionalities are
represented as use cases in the diagram. Each use case is a function that the
user or the server can access. The names of the use cases reflect the
functionalities performed, because the main purpose of the functionalities is to
identify the requirements. To add extra notes that should be clarified to the
user, note structures can be added to the use case diagram. Only the main
relationships between the actors and the functionalities are shown, because
showing every relationship would clutter the diagram. The use case diagram
shown in Figure 3.1 provides details of the College Admission Predictor.

Fig 3.1 Use case diagram of college admission predictor

3.1.2 SEQUENCE DIAGRAM OF COLLEGE ADMISSION PREDICTOR

Sequence diagrams model the flow of logic within the system in a visual manner,
enabling one to both document and validate the logic; they are commonly used for
both analysis and design purposes.

Fig 3.2 Sequence diagram of college admission predictor
3.1.3 ACTIVITY DIAGRAM OF COLLEGE ADMISSION PREDICTOR

An activity is a particular operation of the system. An activity diagram is
suitable for modeling the activity flow of the system. Activity diagrams are not
only used for visualizing the dynamic nature of a system; they are also used to
construct the executable system by using forward and reverse engineering
techniques. The only thing missing in an activity diagram is the message part.

Fig 3.3 Activity diagram of college admission predictor

3.1.4 COMPONENT DIAGRAM OF COLLEGE ADMISSION PREDICTOR

A component diagram displays the structural relationships of the components of a
software system. These are mostly used when working with complex systems that
have many components, such as sensor nodes, cluster heads, and base stations. It
does not describe the functionality of the system, but it describes the
components used to realize those functionalities. Components communicate with
each other using interfaces, and the interfaces are linked using connectors.
Figure 3.4 shows the component diagram.

Fig 3.4 Component diagram of college admission predictor

3.1.5 DEPLOYMENT DIAGRAM OF COLLEGE ADMISSION PREDICTOR

A deployment diagram shows the hardware of the system and the software running
on that hardware. Deployment diagrams are useful when a software solution is
deployed across multiple machines, such as sensor nodes, cluster heads, and base
stations, each with a unique configuration. Figure 3.5 represents the deployment
diagram for the developed application and shows how the modules are deployed in
the system.

Fig 3.5 Deployment diagram of college admission predictor

3.1.6 PACKAGE DIAGRAM OF COLLEGE ADMISSION PREDICTOR

Package diagrams are used to reflect the organization of packages and their
elements. When used to represent class elements, package diagrams provide a
visualization of the namespaces. Package diagrams are used to structure
high-level system elements.

Package diagrams can be used to simplify complex class diagrams by grouping
classes into packages. A package is a collection of logically related UML
elements.

Packages are depicted as file folders and can be used in any UML diagram.
Figure 3.6 represents the package diagram for the developed application and
shows how the elements are logically related.

Fig 3.6 Package diagram of college admission predictor

A package is a namespace used to group together elements that are semantically
related and might change together. It is a general-purpose mechanism to organize
elements into groups to provide better structure for the system model.

CHAPTER 4
SYSTEM ARCHITECTURE
In this chapter, the System Architecture for the College
Admission Predictor is represented and the modules are explained.

4.1 ARCHITECTURE DIAGRAM:

Figure 4.1 System Architecture of College Admission Predictor

4.2 ARCHITECTURE DESCRIPTION:
This section gives a detailed description of the system modules and discusses
the working of each module, as shown in Figure 4.1.

4.2.1 Training and Testing :


As shown in the architecture diagram, the dataset is preprocessed first, and
then training and testing are carried out. Different methods are used to find
the chance of admission in the graduate admission predictor.
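The step described above (preprocess, then split into training and test sets) might look like this sketch; the small frame is invented and only mirrors some of the dataset's column names.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame mirroring a slice of the admission dataset
df = pd.DataFrame({"GRE Score": [337, 316, 322, 300, 314, 330],
                   "CGPA": [9.65, 8.00, 8.67, 8.21, 8.30, 9.34],
                   "Chance of Admit": [0.92, 0.72, 0.80, 0.65, 0.60, 0.90]})

df = df.dropna()  # a simple preprocessing step: drop incomplete rows

X = df.drop(columns=["Chance of Admit"])  # independent variables
y = df["Chance of Admit"]                 # dependent variable

# 80% of rows for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```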

4.2.2 Linear Regression:


Multiple linear regression, also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the
outcome of a response variable. The goal of multiple linear regression is to
model the linear relationship between the independent variables and the
dependent variable. Multiple regression analysis is an extension of simple
linear regression and is used for describing and making predictions based on
linear relationships between predictor variables (i.e., independent variables)
and a response variable (i.e., a dependent variable). Although multiple
regression analysis is simpler than many other types of statistical modelling
methods, there are still some crucial steps that must be taken to ensure the
validity of the results. Multiple linear regression (MLR) is used to determine a
mathematical relationship among a number of random variables; in other terms,
MLR examines how multiple independent variables are related to one dependent
variable.
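Concretely, MLR fits coefficients so that y ≈ b0 + b1·x1 + ... + bn·xn. A toy check on noise-free data (recovering known coefficients) might look like this; the data is fabricated for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two explanatory variables, one response: y = 2*x1 + 3*x2 + 1 (no noise)
X = np.array([[1, 2], [2, 1], [3, 3], [4, 2], [5, 5]], dtype=float)
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

model = LinearRegression().fit(X, y)
print(model.coef_)       # recovers approximately [2., 3.]
print(model.intercept_)  # recovers approximately 1.0
```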

4.2.3 Random Forest:


The random forest algorithm is one of the most popular and powerful machine
learning algorithms and is capable of performing both regression and
classification tasks. This algorithm builds a forest from a number of decision
trees; therefore, the more data is available, the more accurate and robust the
results will be. The random forest method can handle large datasets with higher
dimensionality without overfitting the model. In addition, it can handle missing
values and maintain accuracy on data with missing entries.

4.2.4 Decision:
The decision is made as to which algorithm should be used to get a better result
and better accuracy; the data provided by the user is then passed to that
algorithm to obtain the result.

CHAPTER 5
SYSTEM IMPLEMENTATION

Here we are implementing our model as per the following figure.

Fig 5.1 Implementation Diagram

For any model, we provide the dataset as the input, preprocess it, then select
the model and measure its accuracy. The algorithm that provides the highest
accuracy is considered the best algorithm for our model.

Steps to implement proposed work:

Step 1: Importing Libraries. The required libraries are pandas, numpy, and the
multiple linear regression model for prediction. Pandas is used for performing
operations on data frames. Furthermore, using numpy, we perform the necessary
mathematical operations.

Numpy:

Numpy is a Python package whose name stands for Numerical Python. It is the core
library for scientific computing; it contains a powerful n-dimensional array
object and provides tools for integrating C, C++, etc. It is also useful for
linear algebra, random number generation, and more.

Numpy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object and tools for working with these
arrays. It is the fundamental package for scientific computing with Python.
Besides its obvious scientific uses, Numpy can also be used as an efficient
multi-dimensional container of generic data. An array in Numpy is a table of
elements (usually numbers), all of the same type, indexed by a tuple of positive
integers.

In Numpy, the number of dimensions of an array is called the rank of the array.
A tuple of integers giving the size of the array along each dimension is known
as the shape of the array. The array class in Numpy is called ndarray. Elements
in Numpy arrays are accessed using square brackets, and arrays can be
initialized using nested Python lists.
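The terms above (ndarray, rank, shape, square-bracket indexing) in a few lines:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # built from nested Python lists

print(type(a).__name__)  # ndarray -> the Numpy array class
print(a.ndim)            # 2       -> the "rank" of the array
print(a.shape)           # (2, 3)  -> size along each dimension
print(a[1, 2])           # 6       -> element access with square brackets
```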

Pandas:

Pandas is an open-source Python library providing high-performance data
manipulation and analysis tools built on its powerful data structures. The name
Pandas is derived from "panel data", an econometrics term for multidimensional
data.

In 2008, developer Wes McKinney started developing pandas out of the need for a
high-performance, flexible tool for data analysis.

Prior to Pandas, Python was mostly used for data preparation and contributed
relatively little to data analysis itself. Pandas solved this problem. Using
Pandas, we can accomplish five typical steps in the processing and analysis of
data, regardless of the origin of the data: load, prepare, manipulate, model,
and analyze.

Python with Pandas is used in a wide range of fields, including academic and
commercial domains such as finance, economics, statistics, and analytics.
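The load, prepare, manipulate, and analyze steps in miniature. The frame here is hypothetical; the groupby mirrors the University Rating aggregation used later in Chapter 6.

```python
import pandas as pd

# load (here: construct in memory instead of read_csv)
df = pd.DataFrame({"University Rating": [3, 3, 4, 4],
                   "CGPA": [8.0, 8.5, 9.0, 9.5]})

# prepare: drop incomplete rows, if any
df = df.dropna()

# manipulate / analyze: mean CGPA per university rating
means = df.groupby("University Rating")["CGPA"].mean()
print(means.loc[3], means.loc[4])  # 8.25 9.25
```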

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

Step 2: Reading the dataset

To pick the right variables, a basic understanding of the dataset is enough to
judge whether the data is relevant, of high quality, and of adequate volume. As
part of our model-building efforts, we select the best predictor variables for
our model.

Reading the csv file:

df = pd.read_csv("Admission_Predict_Ver1.1.csv")

Step 3: Training and Test Sets.

The dataset should be divided into two subsets:

training set: a subset used to train the model.

test set: a subset used to test the trained model.

The split should meet the following two conditions: each subset is large enough
to yield statistically meaningful results, and each is representative of the
dataset as a whole. In other words, don't pick a test set with different
characteristics than the training set.

X = df.iloc[:, :-1].values

y = df.iloc[:, 8].values

Scikit-learn provides a range of supervised and unsupervised learning algorithms
via a consistent interface in Python.

It is licensed under a permissive simplified BSD license and is distributed with
many Linux distributions, encouraging academic and commercial use. The library
is built upon SciPy (Scientific Python), which must be installed before
scikit-learn can be used.

# sklearn.cross_validation was removed; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

The algorithm can be used for predicting an output vector y given an input
matrix X. In the first step, a tree ensemble is generated with gradient
boosting. The trees are then used to form rules, where the path to each node in
each tree forms one rule. A rule is a binary decision about whether an
observation is in a given node, which depends on the input features that were
used in the splits. The ensemble of rules, together with the original input
features, is then used as input to an L1-regularized linear model, also called
the Lasso, which estimates the effect of each rule on the output target while at
the same time shrinking many of those effects to zero.

We can use RuleFit for predicting a numeric response (categorical responses are
not yet implemented). The input has to be a numpy matrix with only numeric
values.
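A simplified, RuleFit-style sketch of the two steps described above, assuming scikit-learn: leaf memberships from a gradient-boosted ensemble serve as binary rule indicators, which are concatenated with the original features and fed to a Lasso. (Real RuleFit enumerates the path to every node, not just leaf membership; this is a deliberately reduced illustration on synthetic data.)

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import OneHotEncoder

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Step 1: generate a tree ensemble with gradient boosting
gb = GradientBoostingRegressor(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

# Step 2: the leaf each sample falls into acts as one binary "rule" feature
leaves = gb.apply(X).reshape(X.shape[0], -1)
rules = OneHotEncoder().fit_transform(leaves).toarray()

# L1-regularized linear model over original features plus rule indicators;
# the Lasso shrinks many of the rule weights exactly to zero
Z = np.hstack([X, rules])
lasso = Lasso(alpha=10.0).fit(Z, y)
print((lasso.coef_ == 0).sum(), "of", lasso.coef_.size, "weights are zero")
```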

Step 4: Fit the multiple linear regression model to our training set.

Step 5: Create an object called regressor in the LinearRegression class.

Step 6: Fit the linear regression model to the training set. We use the fit
method; the arguments of the fit method are the training sets.

Step 7: Predict the test set results.
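Steps 4 through 7 above, in one self-contained fragment (synthetic data stands in for the admission dataset):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=7, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 5: create an object called regressor in the LinearRegression class
regressor = LinearRegression()

# Steps 4 and 6: fit the model; the arguments of fit are the training sets
regressor.fit(X_train, y_train)

# Step 7: predict the test set results
y_pred = regressor.predict(X_test)
print(y_pred.shape)  # one prediction per test row
```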


CHAPTER 6

CODING AND SCREENSHOTS

6.1 CODING:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
admission_df =
pd.read_csv(r"C:\Users\LENOVO\Desktop\Admission_Predict_Ver1.1.csv")
admission_df.head()
admission_df.isnull().sum()
admission_df.info()
df_university=admission_df.groupby(by = "University Rating").mean()
df_university
admission_df.hist(bins=30,figsize=(20,20),color='b')
sns.pairplot(admission_df)
corr_matrix = admission_df.corr()
plt.figure(figsize=(12,12))
sns.heatmap(corr_matrix, annot=True)
plt.show()
admission_df.columns
x = admission_df.drop(columns=["Serial No.", "Chance of Admit "])  # keep the 7 feature columns
x.head()
y=admission_df["Chance of Admit "]
y.head()
x.shape
y.shape

# strip the trailing space from 'LOR ' and shorten the target column name
admission_df = admission_df.rename(columns={'LOR ': 'LOR', 'Chance of Admit ': 'chance'})
from sklearn.model_selection import train_test_split
X_train, X_test,y_train,y_test = train_test_split(x,y,test_size=0.20)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, accuracy_score
LinearRegression_model = LinearRegression()
LinearRegression_model.fit(X_train,y_train)
n, m = 1, 7  # read one input row of seven feature values
cy_train=[1 if chance > 0.83 else 0 for chance in y_train]
cy_train=np.array(cy_train)
cy_test=[1 if chance > 0.83 else 0 for chance in y_test]
cy_test=np.array(cy_test)
print(cy_test)
array = np.array([input().strip().split() for _ in range(n)], float)
pred=LinearRegression_model.predict(array)
print(pred[0]*100)
accuracy_LinearRegression = LinearRegression_model.score(X_test,y_test)
accuracy_LinearRegression
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam
ANN_model =keras.Sequential()
ANN_model.add(Dense(50,input_dim =7))
ANN_model.add(Activation('relu'))
ANN_model.add(Dense(150))
ANN_model.add(Activation('relu'))
ANN_model.add(Dropout(0.5))
ANN_model.add(Dense(150))
ANN_model.add(Activation('relu'))
ANN_model.add(Dropout(0.5))
ANN_model.add(Dense(50))
ANN_model.add(Activation('linear'))
ANN_model.add(Dense(1))

ANN_model.compile(loss = "mse", optimizer="adam")


ANN_model.summary()
epochs_hist = ANN_model.fit(X_train, y_train, epochs = 100, batch_size = 20)
result = ANN_model.evaluate(X_test, y_test)
accuracy_ANN = 1 - result  # rough proxy: 1 minus the test-set MSE loss
print("Accuracy : {}".format(accuracy_ANN))
n, m = 1,7
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X_train, cy_train)
from sklearn.metrics import classification_report
print(classification_report(cy_test, lr.predict(X_test)))
cy = lr.predict(X_test)
from sklearn.metrics import confusion_matrix
lr_confm = confusion_matrix(cy, cy_test)
print("confusion_matrix")
print(lr_confm)
sns.heatmap(lr_confm, annot=True, fmt='.2f', xticklabels=["Admitted", "Rejected"],
            yticklabels=["Admitted", "Rejected"])
plt.ylabel('Actual Class')
plt.xlabel('Predicted Class')
plt.title('Logistic Regression')
plt.show()
array = np.array([input().strip().split() for _ in range(n)], float)
pred=ANN_model.predict(array)
print(pred[0]*100)
epochs_hist.history.keys()
plt.plot(epochs_hist.history['loss'])
plt.title("Model Loss Progress During Training")
plt.xlabel("Epoch")
plt.ylabel("Training Loss")
#zplt.legend(["Training Loss"])
from sklearn.tree import DecisionTreeRegressor
dr=DecisionTreeRegressor()
dr.fit(X_train, cy_train)
from sklearn.metrics import classification_report
# threshold the regressor's outputs so the report receives binary labels
print(classification_report(cy_test, (dr.predict(X_test) > 0.5).astype(int)))
cy = (dr.predict(X_test) > 0.5).astype(int)
from sklearn.metrics import confusion_matrix
dr_confm = confusion_matrix(cy, cy_test)
print("confusion_matrix")
print(dr_confm)
DecisionTree_model = DecisionTreeRegressor()
DecisionTree_model.fit(X_train, y_train)
accuracy_DecisionTree = DecisionTree_model.score(X_test, y_test)
print(accuracy_DecisionTree)
n, m = 1, 7
array = np.array([input().strip().split() for _ in range(n)], float)
pred = DecisionTree_model.predict(array)
print(pred[0] * 100)
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor()
rf.fit(X_train, cy_train)
print(classification_report(cy_test, rf.predict(X_test)))
cy = rf.predict(X_test)
rf_confm = confusion_matrix(cy_test, cy)
print("Random Forest")
print("confusion_matrix")
print(rf_confm)
RandomForest_model = RandomForestRegressor(n_estimators=100,max_depth = 10)
RandomForest_model.fit(X_train,y_train)
n, m = 1,7
array = np.array([input().strip().split() for _ in range(n)], float)
pred=RandomForest_model.predict(array)
print(pred[0] * 100)
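The input-and-predict pattern repeated throughout this listing (read one row of seven whitespace-separated values, predict, scale to a percentage) can be wrapped in a small validated helper. This is an illustrative sketch only; predict_chance and the stub model are hypothetical names, not part of the report's code:

```python
import numpy as np

def predict_chance(model, line, n_features=7):
    """Parse one whitespace-separated input row and return the model's
    predicted chance of admission as a percentage."""
    values = line.strip().split()
    if len(values) != n_features:
        raise ValueError(f"expected {n_features} values, got {len(values)}")
    row = np.array(values, dtype=float).reshape(1, -1)  # shape (1, n_features)
    return float(model.predict(row)[0]) * 100.0
```

Validating the field count before calling `predict` avoids the shape errors that a malformed input line would otherwise raise deep inside the model.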
6.2 SCREENSHOTS:
Dataset used:
Fig 6.1 Dataset before processing
CORRELATED GRAPH:
Fig 6.2 Correlated Graph
Fig 6.3 Dataset After processing
OUTPUT SCREENSHOTS:
Fig 6.4 Confusion Matrix of Logistic Regression
Fig 6.5 Logistic Regression Confusion Matrix and Classification Report
Fig 6.6 Linear Regression Confusion Matrix and Classification Report
Fig 6.7 Input and Chance of Admit using linear regression
CHAPTER 7
CONCLUSION AND FUTURE WORK
CONCLUSION:
The main objective of this work was to compare machine learning algorithms and
identify the one best suited to predicting the chance of admission to foreign
universities, by developing a prototype system for students aspiring to pursue their
education in the USA. Multiple machine learning algorithms were applied, and
Random Forest proved the best fit for the system when compared with the multiple
linear and polynomial models. Students can use this model to evaluate their chances
of being shortlisted by a particular university, with an average accuracy of 91%. The
proposed work identifies only the chance of obtaining a seat, not which university
will offer admission; in future, a model can be developed that returns a list of
universities in which admission can be obtained.
FUTURE WORK:
In future work, more models can be trained on additional datasets to find the one
that performs best, and the system can be extended to return a list of universities in
which the applicant is likely to obtain admission. The machine learning models
applied to predict a student's chance of admission to a master's program included
multiple linear regression, k-nearest neighbour, random forest, and the multilayer
perceptron; experiments show that the multilayer perceptron surpasses the other
models.
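The four-way comparison described above could be sketched with scikit-learn as follows. This is a hedged illustration, not the report's actual experiment: the data is synthetic (7 features, continuous target, mirroring the admission dataset's shape) and all hyperparameters are assumptions.

```python
# Illustrative sketch: compare the four candidate regressors named in the
# future-work section by 5-fold cross-validated R^2 on synthetic data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((500, 7))
y = (X @ rng.random(7)) / 7  # stand-in target in roughly [0, 1]

candidates = {
    "Multiple Linear Regression": LinearRegression(),
    "k-Nearest Neighbour": KNeighborsRegressor(n_neighbors=5),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=1),
    "Multilayer Perceptron": MLPRegressor(hidden_layer_sizes=(50,),
                                          max_iter=2000, random_state=1),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```

On real data, the ranking would of course depend on preprocessing and hyperparameter tuning; cross-validation simply gives each model the same evaluation protocol.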
REFERENCES
[1] M. Injadat, A. Moubayed, A. B. Nassif, and A. Shami, "Multi-split Optimized Bagging Ensemble Model Selection for Multi-class Educational Data Mining," Appl. Intell., vol. 50, pp. 4506–4528, 2020.
[2] F. Salo, M. Injadat, A. Moubayed, A. B. Nassif, and A. Essex, "Clustering Enabled Classification using Ensemble Feature Selection for Intrusion Detection," in 2019 International Conference on Computing, Networking and Communications (ICNC), 2019, pp. 276–281.
[3] M. N. Injadat, A. Moubayed, A. B. Nassif, and A. Shami, "Systematic ensemble model selection approach for educational data mining," Knowledge-Based Syst., vol. 200, p. 105992, Jul. 2020.
[4] A. Moubayed, M. Injadat, A. B. Nassif, H. Lutfiyya, and A. Shami, "E-Learning: Challenges and Research Opportunities Using Machine Learning Data Analytics," IEEE Access, 2018.
[5] M. S. Acharya, A. Armaan, and A. S. Antony, "A Comparison of Regression Models for Prediction of Graduate Admissions," Kaggle, 2018.
[6] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality," 1965.
[7] G. K. Uyanık and N. Güler, "A Study on Multiple Linear Regression Analysis," Procedia - Soc. Behav. Sci., vol. 106, pp. 234–240, 2013.
[8] C. López-Martín, Y. Villuendas-Rey, M. Azzeh, A. Bou Nassif, and S. Banitaan, "Transformed k-nearest neighborhood output distance minimization for predicting the defect density of software projects," J. Syst. Softw., vol. 167, p. 110592, Sep. 2020.
[9] A. B. Nassif, O. Mahdi, Q. Nasir, M. A. Talib, and M. Azzeh, "Machine Learning Classifications of Coronary Artery Disease," in 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2018, pp. 1–6.
[10] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," Am. Stat., vol. 46, no. 3, pp. 175–185, 1992.
[11] A. B. Nassif, M. Azzeh, L. F. Capretz, and D. Ho, "A comparison between decision trees and decision tree forest models for software development effort estimation," in 2013 3rd International Conference on Communications and Information Technology (ICCIT), 2013, pp. 220–224.
[12] T. K. Ho, Random Decision Forests. USA: IEEE Computer Society, 1995.
[13] A. B. Nassif, "Software Size and Effort Estimation from Use Case Diagrams Using Regression and Soft Computing Models," University of Western Ontario, 2012.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," MIT Press, Cambridge, MA, vol. 1, pp. 318–362, 1986.
[15] M. S. Acharya, A. Armaan, and A. S. Antony, "A comparison of regression models for prediction of graduate admissions," ICCIDS 2019 - 2nd Int. Conf. Comput. Intell. Data Sci. Proc., pp. 1–5, 2019.