Mini Proj RCT 222 PDF
INTRODUCTION
World markets are developing rapidly and continuously seek the best knowledge
and experience among people. Young workers who want to stand out in their jobs
often pursue higher degrees that can help them improve their skills and
knowledge.
OVERVIEW
As a result, the number of students applying for graduate studies has increased
in the last decade [1]–[4]. This has motivated us to study students' grades and
their chances of admission to master's programs, which can help universities
predict the likelihood of accepting the master's students who apply each year
and provide the needed resources. The dataset [5] presented in this paper belongs
to the educational domain.
Admission is a dataset with 500 rows that contains 7 different independent
variables, which are:
• Graduate Record Exam (GRE) score. The score will be out of 340 points.
• Test of English as a Foreign Language (TOEFL) score, which will be out of
120 points.
• University Rating (Uni.Rating), which indicates the ranking of the bachelor's
university among the other universities. The score will be out of 5.
PROBLEM STATEMENT
The college admission predictor estimates whether a student is likely to be
admitted to a college. Deep learning is used in various computer vision
applications and related studies. Here we develop an admission predictor using
linear regression, a machine learning technique.
EXISTING SYSTEM
In the existing system, several machine learning algorithms are used to predict
graduate admission. The existing system compares four machine learning
algorithms on the basis of accuracy, including Linear Regression, Support
Vector Regression and Random Forest Regression. In this system, Linear Regression
performs best on the dataset, with a low MSE and a high R² score. Figure 2 shows
a sample of the dataset, which contains 500 rows and 7 independent variables.
Figure 1 shows the architecture of the existing system.
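The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the existing system's code: it assumes scikit-learn is available and uses a synthetic stand-in for the admission data (500 rows, 7 features, made-up weights), so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the admission data (assumption): 500 rows, 7 features,
# and a target that is roughly linear in the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
true_weights = np.array([0.5, 0.3, 0.1, 0.1, 0.1, 0.8, 0.2])  # made-up weights
y = X @ true_weights + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit every model on the same split and record MSE and R2 on the test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }
    print(f"{name}: MSE={results[name]['mse']:.4f}, R2={results[name]['r2']:.4f}")
```

A lower MSE and a higher R² on the held-out test set indicate the better model for this kind of comparison.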
PROPOSED SYSTEM
The proposed system consists of four regression models. Of these, we use Linear
Regression with dimensionality reduction, which is also a highly accurate model.
A user interface is provided through which an actor can interact with the system.
The algorithm with the best accuracy acts as the backend for the user interface.
Whenever an actor (student or consultancy) provides data through the user
interface, the system shows the resulting chance of admission, which ranges from
0 to 1.
Figure 3: Architecture of Proposed System
User Manual
There are several steps for using this:
1. Initially the user has to open our website and provide all
CHAPTER 2
LITERATURE SURVEY
universities/colleges which are best suited for a student based on their
academic records and college admission criteria. The model was developed by
applying data mining techniques and knowledge discovery rules to the already existing
in-house admission prediction system of the university. (Mane) conducted similar
research that predicted the chance of a student getting admission to college based on
their Senior Secondary School, Higher Secondary School and Common Entrance
Examination scores, using the pattern growth approach to association rule mining.
Both models performed well; the only drawback was that the problem statement was
centered on a single university.
O. Mahdi et al. [9]: In research conducted by (Jamison), the yield of college
admission was predicted using machine learning techniques. Yield rate can be defined
as the rate at which students who have been granted admission by the university
actually enroll in the course. Multiple machine learning algorithms, such as Random
Forest, Logistic Regression and SVM, were used to create the model. The models were
compared based on their performance and accuracy; Random Forest outperformed the
other models with 86% accuracy and was thus used to create the system. The factors
that proved to be significant in predicting a successful application were also
highlighted.
N. S. Altman et al. [10]: The GRADE system was developed to support the
admission process for graduate students in the University of Texas at Austin
Department of Computer Science. The main objective of the project was to develop a
system that can help the admission committee of the university make better and faster
decisions. Logistic regression and SVM were used to create the model. Both models
performed equally well, and the final system was developed using logistic regression
due to its simplicity. The time required by the admission committee to review the
applications was reduced by 74%, but human intervention was still required to make
the final decision on the status of the application.
M. Azzeh et al. [11] created a similar model to predict the enrolment of a
student in the university based on factors such as SAT score, GPA score, residency,
race, etc. The model was created using the Multiple Logistic Regression algorithm,
and it achieved an accuracy rate of only 67%.
T. K. Ho et al. [12]: In research conducted by Jamison, the yield of college
admission was predicted using machine learning techniques. Yield rate can be defined
as the rate at which students who have been granted admission by the college enroll
in the course. Multiple machine learning algorithms, such as Random Forest, Logistic
Regression and SVM, were used to create the model.
model that performs the best. The performance is analyzed through various
metrics such as accuracy, precision, F-measure, recall and area under the receiver
operating characteristic curve. For example, some authors chose to use ensemble
learning for an efficient result. In other work, the performance is analyzed
through Weka in order to decide which model performs best based on the mean
absolute error (MAE) value, while others use multiple machine learning models and
pick the model that performs the best.
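As a small illustration of the metrics named above, the following sketch computes them with scikit-learn. The labels and scores are hypothetical values invented for this example, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical ground-truth labels and model scores, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1])

# Turn scores into hard class predictions (the 0.5 threshold is an assumption).
y_pred = (y_score >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f_measure": f1_score(y_true, y_pred),
    # The area under the ROC curve is computed from the scores, not the hard labels.
    "roc_auc": roc_auc_score(y_true, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall and F-measure depend on the chosen threshold, whereas the area under the curve summarizes the ranking quality over all thresholds.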
CHAPTER 3
SYSTEM DESIGN
In this chapter, the various UML diagrams for the College Admission
Predictor are presented and the various functionalities are explained.
Sequence diagrams model the flow of logic within the system in a visual manner,
making it possible to both document and validate the logic, and are commonly
used for both analysis and design purposes.
Fig 3.1 sequence diagram of college admission predictor
3.1.3 ACTIVITY DIAGRAM OF COLLEGE ADMISSION PREDICTOR
Fig 3.1 activity diagram of college admission predictor
The interfaces are linked using connectors. Figure 3.5 shows a component
diagram.
The deployment diagram in Figure 3.6 shows how the modules get
deployed in the system.
Fig 3.6 deployment diagram of college admission predictor
Package diagrams can be used to simplify complex class diagrams by grouping
classes into packages. A package is a collection of logically related UML
elements.
Packages are depicted as file folders and can be used on any of the UML diagrams.
Figure 3.7 shows the package diagram for the developed application, which
represents how the elements are logically related.
Fig 3.7 package diagram of college admission predictor
CHAPTER 4
SYSTEM ARCHITECTURE
In this chapter, the system architecture for the College
Admission Predictor is presented and the modules are explained.
4.2 ARCHITECTURE DESCRIPTION:
This section gives a detailed description of the system modules and the
working of each module, as shown in Figure 4.1.
4.2.4 Decision:
A decision is made about which algorithm should be used to get better results and
better accuracy. The user then passes in the data and receives the result. Multiple
regression relates several explanatory (independent) variables to a response
variable (i.e. a dependent variable). Although multiple regression analysis is simpler
than many other types of statistical modelling methods, there are still some crucial
steps that must be taken to ensure the validity of the results.
CHAPTER 5
SYSTEM IMPLEMENTATION
For any model, we provide the dataset as the input and preprocess it; we then
select the model and measure its accuracy. The algorithm that provides the highest
accuracy is considered the best algorithm for our model.
Step 1: Importing Libraries
The required libraries are pandas, numpy and a multiple linear regression model
for prediction. Pandas is used for performing operations on data frames.
Furthermore, using numpy, we will perform the necessary mathematical operations.
Numpy:
Numpy is a Python package whose name stands for Numerical Python. It is the core
library for scientific computing; it contains a powerful n-dimensional array object
and provides tools for integrating C, C++, etc. It is also useful for linear algebra,
random number generation, etc.
In Numpy, the number of dimensions of an array is called the rank of the array. A
tuple of integers giving the size of the array along each dimension is known as the
shape of the array. The array class in Numpy is called ndarray. Elements in Numpy
arrays are accessed using square brackets, and arrays can be initialized using nested
Python lists.
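The terms above can be illustrated with a short sketch. The array values are arbitrary and chosen only for this example.

```python
import numpy as np

# Initialize a two-dimensional array from nested Python lists.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(type(arr).__name__)  # name of the array class
print(arr.ndim)            # rank: number of dimensions
print(arr.shape)           # shape: size along each dimension
print(arr[1, 2])           # element in the second row, third column
```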
Pandas:
In 2008, developer Wes McKinney started developing pandas when in need of a
high-performance, flexible tool for data analysis.
Prior to Pandas, Python was mainly used for data preparation and contributed very
little to data analysis. Pandas solved this problem. Using Pandas, we can
accomplish five typical steps in the processing and analysis of data, regardless of
the origin of the data: load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and commercial domains,
including finance, economics, statistics, analytics, etc.
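The five steps can be sketched on a tiny example. The CSV content, column names and values below are hypothetical and exist only for illustration; the straight-line fit with numpy stands in for the modelling step.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a real file (values are made up).
csv_text = """GRE,CGPA,Research,Chance
320,8.5,1,0.80
310,8.0,0,0.65
,7.5,0,0.50
330,9.2,1,0.92
300,7.8,0,0.55
"""

# Load: read the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: drop rows that contain missing values.
df = df.dropna()

# Manipulate: add a derived column.
df["CGPA_scaled"] = df["CGPA"] / 10.0

# Model: fit a straight line Chance = slope * CGPA + intercept.
slope, intercept = np.polyfit(df["CGPA"].to_numpy(), df["Chance"].to_numpy(), deg=1)

# Analyze: average chance with and without research experience.
mean_by_research = df.groupby("Research")["Chance"].mean()
print(mean_by_research)
```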
import numpy as np
import pandas as pd
To pick the right variables, a basic understanding of the dataset is needed to know
that the data is relevant, of high quality, and of adequate volume. As part of our
model building efforts, we are working to select the best predictor variables for our
model.
df= pd.read_csv(body)
The test set should be representative of the data set as a whole. In other words,
don't pick a test set with different characteristics than the training set.
X = df.iloc[:, :-1].values  # all columns except the last are features
y = df.iloc[:, -1].values   # the last column is the target
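One common way to obtain a test set with the same characteristics as the training set is a shuffled random split. The sketch below assumes scikit-learn and uses placeholder data (500 rows, 7 features, random targets) rather than the admission dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 500 rows, 7 features, target between 0.3 and 1.0.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 7))
y = rng.uniform(0.3, 1.0, size=500)

# Shuffle before splitting so that both parts are drawn from the same distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

print(X_train.shape, X_test.shape)
# Similar target means are a quick sanity check that the split is representative.
print(round(float(y_train.mean()), 3), round(float(y_test.mean()), 3))
```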
The algorithm can be used for predicting an output vector y given an input matrix
X. In the first step, a tree ensemble is generated with gradient boosting. The trees
are then used to form rules, where the path to each node in each tree forms one rule.
A rule is a binary decision on whether an observation falls in a given node, which
depends on the input features that were used in the splits. The ensemble of rules,
together with the original input features, is then fed into an L1-regularized linear
model, also called Lasso, which estimates the effect of each rule on the output target
while at the same time shrinking many of those effects to zero.
We can use RuleFit for predicting a numeric response (categorical not yet implemented).
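The idea can be sketched with scikit-learn building blocks. This is a simplified illustration under stated assumptions, not the RuleFit implementation itself: it uses synthetic data, derives rules only from the leaf each observation lands in (rather than from every node on the path), and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import OneHotEncoder

# Synthetic data (assumption): the target depends on a threshold rule plus a linear term.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 2.0 * (X[:, 0] > 0.5) + X[:, 1] + rng.normal(scale=0.05, size=300)

# Step 1: generate a small tree ensemble with gradient boosting.
gbr = GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=0)
gbr.fit(X, y)

# Step 2: turn the trees into binary rule features. Each column answers
# "does this observation end up in this leaf of this tree?".
leaves = gbr.apply(X)  # shape (n_samples, n_estimators)
rule_features = OneHotEncoder().fit_transform(leaves).toarray()

# Step 3: fit an L1-regularized linear model on the rules plus the original features.
design = np.hstack([rule_features, X])
lasso = Lasso(alpha=0.01, max_iter=10000)
lasso.fit(design, y)

n_zero = int(np.sum(lasso.coef_ == 0))
print(f"{design.shape[1]} candidate terms, {n_zero} shrunk to exactly zero")
```

The L1 penalty is what removes most rules from the final model, leaving a small set of rules and linear terms that can be inspected directly.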
Step 6: Fit the linear regression model to the training set. We use the fit method;
its arguments are the training sets.
Step 7: Predicting the test set results
3.2.2 Code for Multi Linear Regression
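Steps 6 and 7 can be sketched as follows, assuming scikit-learn's LinearRegression. The data is synthetic and noise-free, with made-up coefficients, so that the fitted model can be checked against the known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free linear data (assumption) with known coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + 4.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)     # Step 6: fit on the training set
y_pred = regressor.predict(X_test)  # Step 7: predict the test set results

print(regressor.coef_, regressor.intercept_)
```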
CHAPTER 6
6.1 CODING:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
admission_df = pd.read_csv(r"C:\Users\LENOVO\Desktop\Admission_Predict_Ver1.1.csv")
admission_df.head()
admission_df.isnull().sum()
admission_df.info()
df_university=admission_df.groupby(by = "University Rating").mean()
df_university
admission_df.hist(bins=30,figsize=(20,20),color='b')
sns.pairplot(admission_df)
corr_matrix = admission_df.corr()
plt.figure(figsize=(12,12))
sns.heatmap(corr_matrix, annot=True)
plt.show()
admission_df.columns
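# Assumption: the CSV's row-ID column is named "Serial No."; dropping it leaves the 7 features
x=admission_df.drop(columns=["Serial No.", "Chance of Admit "], errors="ignore")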
x.head()
y=admission_df["Chance of Admit "]
y.head()
x.shape
y.shape
# Strip the trailing space from 'LOR ' and shorten the target column name
admission_df = admission_df.rename(columns={'LOR ': 'LOR', 'Chance of Admit ': 'chance'})
from sklearn.model_selection import train_test_split
X_train, X_test,y_train,y_test = train_test_split(x,y,test_size=0.20)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, accuracy_score
LinearRegression_model = LinearRegression()
LinearRegression_model.fit(X_train,y_train)
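n, m = 1, 7  # n input rows, m features per row
# Binarize the targets: 1 when the chance of admission exceeds 0.83, else 0
cy_train = np.array([1 if chance > 0.83 else 0 for chance in y_train])
cy_test = np.array([1 if chance > 0.83 else 0 for chance in y_test])
print(cy_test)
# Read n rows of whitespace-separated feature values from standard input
array = np.array([input().strip().split() for _ in range(n)], float)
pred = LinearRegression_model.predict(array)
print(pred[0] * 100)  # predicted chance of admission as a percentage
# For a regressor, score() returns the R^2 coefficient, not a classification accuracy
accuracy_LinearRegression = LinearRegression_model.score(X_test, y_test)
accuracy_LinearRegression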
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam
ANN_model =keras.Sequential()
ANN_model.add(Dense(50,input_dim =7))
ANN_model.add(Activation('relu'))
ANN_model.add(Dense(150))
ANN_model.add(Activation('relu'))
ANN_model.add(Dropout(0.5))
ANN_model.add(Dense(150))
ANN_model.add(Activation('relu'))
ANN_model.add(Dropout(0.5))
ANN_model.add(Dense(50))
ANN_model.add(Activation('linear'))
ANN_model.add(Dense(1))
CORRELATED GRAPH:
Fig 6.3 Dataset After processing
OUTPUT SCREENSHOTS:
Fig 6.6 Linear Regression Confusion Matrix and Classification Report
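A confusion matrix and classification report for a regression model can be obtained by thresholding both the true and the predicted chances, in the same way the code above builds cy_test with a cutoff of 0.83. The sketch below assumes scikit-learn; the true and predicted values are hypothetical and are not the project's results.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

threshold = 0.83  # same cutoff as used for cy_train and cy_test

# Hypothetical true chances and regression predictions, for illustration only.
y_test = np.array([0.92, 0.76, 0.85, 0.64, 0.90, 0.80, 0.88, 0.55])
y_pred = np.array([0.89, 0.79, 0.81, 0.66, 0.93, 0.84, 0.86, 0.58])

# Convert continuous chances into binary classes.
cy_test = (y_test > threshold).astype(int)
cy_pred = (y_pred > threshold).astype(int)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(cy_test, cy_pred)
print(cm)
print(classification_report(cy_test, cy_pred))
```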
CHAPTER 7
CONCLUSION AND FUTURE WORK
CONCLUSION:
The main objective of this work is to compare machine learning algorithms and find
the most suitable one for predicting the chance of admission to foreign
universities, by developing a prototype of a system that can be used by students
aspiring to pursue their education in the USA. Multiple machine learning algorithms
were used in the proposed work. Random Forest proved to be the best fit for
development of the system when compared with the Multi Linear and Polynomial models.
This model can be used by students to evaluate their chances of getting shortlisted
at a particular university, with an average accuracy of 91%. The proposed work can
only identify the chance of getting a seat; it cannot identify which university will
offer one. So, in future we can develop a model which gives a list of universities
in which admission can be obtained.
FUTURE WORK:
As for future work, more models can be evaluated on more datasets to learn which
model gives the best performance. We can also develop a model which gives a list
of universities in which admission can be obtained. In this work, machine learning
models were applied to predict the opportunity of a student being admitted to a
master's program. The machine learning models included multiple linear regression,
k-nearest neighbour, random forest, and Multilayer Perceptron. Experiments show that
the Multilayer Perceptron model surpasses the other models.
REFERENCES
Joint Symposium on Artificial Intelligence and Natural Language Processing
(iSAI-NLP), 2018, pp. 1–6.
[12] T. K. Ho, Random Decision Forests. USA: IEEE Computer Society, 1995.
[13] A. B. Nassif, “Software Size and Effort Estimation from Use Case
Diagrams Using Regression and Soft Computing Models,” University of
Western Ontario, 2012.