Abstracts DS
Abstract:
This study leverages machine learning to analyze the intricacies of college admissions using data
from the 2019 admissions cycle. We explore how demographic details, academic performance, and other
factors influence decisions through a comprehensive data analysis process. After thorough data
cleaning, we explore the data's hidden patterns, create new features, and extract key insights
before deploying diverse machine learning algorithms. Our goal is not only to predict but also to
understand the complex factors influencing admissions decisions, contributing to the ongoing
discussion on data-driven decision-making in higher education.
The college admissions process stands as a pivotal gateway for individuals seeking to advance
their education and shape their future careers. Global job markets are developing rapidly and
continuously seek people with the best knowledge and experience. Young workers who
want to stand out in their jobs therefore look for higher degrees that can help them
improve their skills and knowledge.
In today's rapidly evolving educational landscape, understanding the intricate dynamics that
influence admissions decisions is paramount for both prospective students and educational
institutions alike. Leveraging advanced data science techniques, particularly machine learning,
offers a promising avenue to unravel the complexities of this process and glean valuable insights.
Against this backdrop, this study embarks on a journey to harness the power of machine learning
to dissect the intricacies of college admissions. Focusing on the 2019 admissions cycle, we aim to
scrutinize a rich repository of data encompassing demographic details, academic performance
metrics, and various other factors that may influence admissions outcomes. Through a systematic
and rigorous analytical process, we endeavor to unlock hidden patterns, uncover latent insights,
and shed light on the multifaceted decision-making processes that underpin college admissions.
The significance of analyzing college admissions extends beyond individual aspirations; it
encompasses broader societal implications and institutional strategies. With universities facing
increasing pressures to enhance diversity, equity, and inclusion, there is a growing need for data-
driven approaches to inform admissions policies and practices. By looking into the data
generated during the admissions process, institutions can glean actionable intelligence to
optimize their recruitment strategies, improve student retention, and foster an inclusive campus
environment.
Moreover, for prospective students navigating the labyrinthine terrain of college admissions,
access to transparent and data-driven insights can be empowering. Understanding the factors that
influence admissions decisions allows applicants to make informed choices, tailor their
applications strategically, and maximize their chances of securing acceptance to institutions that
align with their academic and personal aspirations.
By employing a diverse array of machine learning algorithms, including established methods such as
random forests and support vector machines, we aspire to not only predict admissions outcomes
but also interpret the underlying narratives embedded within the data. Our objective extends
beyond simple prediction; we seek to unravel the "stories" behind the data, unearthing the
nuanced interplay of factors that shape admissions decisions and painting a comprehensive
portrait of the decision-making landscape.
Through this interdisciplinary endeavor at the intersection of data science and higher education,
we aspire to contribute to the ongoing discourse surrounding data-driven decision-making in
academia. By elucidating the intricate dynamics of college admissions, we aim to equip
stakeholders with actionable insights to foster greater transparency, equity, and effectiveness in
the admissions process. This study serves as a testament to the transformative potential of data
science in higher education, illuminating new pathways for informed decision-making and
advancing the quest for educational excellence and inclusivity.
Literature Review
Machine Learning (ML) has been increasingly utilized in the field of college admissions,
providing valuable insights and predictions to assist students in their application process (Wang
& Shi, 2016). These technologies allow schools to sift through large data sets, evaluating
thousands of applications more efficiently.
Various ML models and techniques, such as logistic regression, neural networks, decision trees,
and random forests, have been explored and compared for modeling and forecasting admission
outcomes (Acharya et al., 2019). Logistic regression models can be trained on historical data
(such as past applicants' GPAs and test scores) to estimate the probability of admission.
Decision trees, on the other hand, can handle both numerical and categorical features, while
random forests provide robust predictions.
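As a rough illustration of how a logistic regression model turns applicant features into a probability of admission, the sketch below applies the logistic function to a weighted sum of two features. The function name, the choice of features, and the coefficients are all hypothetical and hand-picked for illustration; a real model would learn its coefficients from historical application data.

```python
import math


def admission_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of applicant features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for (GPA on a 4-point scale, test score percentile).
# These are NOT fitted values; they only show the shape of the computation.
weights = [1.2, 0.03]
bias = -6.0

p = admission_probability([3.5, 80.0], weights, bias)
print(f"Estimated probability of admission: {p:.2f}")
```

The output always lies strictly between 0 and 1, which is what makes the logistic form convenient for admit/reject outcomes.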
These models consider a range of factors such as GRE Score, TOEFL Score, University
Ranking, Proposal Statement, Recommendation Letter, Undergraduate GPA, and Study
Experience (Priyadarshini et al., 2023). This aids students in identifying universities suitable for
their profile. In practice, the choice of model depends on the specific problem, available data,
interpretability, and computational resources.
Despite the potential benefits, there are challenges in applying ML for college admissions
predictions (Li et al., 2023). Issues related to data fairness, model fairness, and the inherently
subjective nature of the admissions process have been identified.
These concerns underscore the importance of fairness and ethics in predictive modeling for this
application (Priyadarshini et al., 2023). While ML offers exciting possibilities for streamlining
admissions processes, institutions must carefully weigh the benefits against potential
pitfalls. Ensuring fairness, transparency, and a human touch remains essential in shaping the
future of college admissions.
In conclusion, while ML holds promise in streamlining the college admissions process and
assisting students in making informed decisions, careful consideration must be given to the
ethical implications and potential biases inherent in ML models. Further research is needed to
refine these models and address the identified challenges.
EXPERIMENTAL SETUP
The dataset used is admission data with 400 rows. The seven independent variables listed below
are:
• Graduate Record Exam (GRE) score. The score will be out of 340 points.
• Test of English as a Foreign Language (TOEFL) score, which will be out of 120 points.
• University Rating (Uni. Rating) that indicates the ranking of the candidate's undergraduate
university among other universities. The score will be out of 5.
• Statement of purpose (SOP) which is a document written to show the candidate's life,
ambitions, and motivations for the chosen degree/university. The score will be out of 5 points.
• Letter of Recommendation Strength (LOR) which verifies the candidate's professional
experience, builds credibility, boosts confidence, and attests to the candidate's competency. The
score is out of 5 points.
• Undergraduate GPA (CGPA) out of 10.
• Research Experience that can support the application, such as publishing research papers at
conferences, or working as a research assistant with a university professor (either 0 or 1).
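The value ranges in the list above can be checked during data cleaning. The following is a minimal sketch, not the study's actual cleaning code: the field names and the lower bounds of 0 are assumptions made for illustration, while the upper bounds come from the list.

```python
# Upper bounds come from the variable list; the lower bounds of 0 and the
# field names are assumptions made for this sketch.
FEATURE_RANGES = {
    "gre": (0, 340),
    "toefl": (0, 120),
    "university_rating": (0, 5),
    "sop": (0, 5),
    "lor": (0, 5),
    "cgpa": (0, 10),
}


def validate_applicant(record):
    """Return a list of problems found in one applicant record."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not low <= record[name] <= high:
            problems.append(f"{name}: {record[name]} outside [{low}, {high}]")
    # Research experience is a binary flag rather than a score.
    if "research" not in record:
        problems.append("research: missing")
    elif record["research"] not in (0, 1):
        problems.append("research: must be 0 or 1")
    return problems


# Made-up record for illustration only; not taken from the dataset.
applicant = {"gre": 320, "toefl": 110, "university_rating": 4,
             "sop": 4.5, "lor": 4.0, "cgpa": 8.9, "research": 1}
print(validate_applicant(applicant))
```

Returning a list of problems rather than raising on the first one makes it easier to summarize data quality across all rows.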
In the College Admissions analysis, we used correlation to show how each feature
relates to the output and how one feature relates to another. Correlation gives a
measure of the strength of the association between two variables.
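The text does not say which correlation coefficient was used. As a sketch, assuming the common Pearson coefficient, the strength of association between each feature and the output could be computed and ranked as follows. The helper names and the toy values are illustrative only and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Sort features by the absolute value of their correlation with target."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up toy values for illustration only.
columns = {"cgpa": [8.0, 8.5, 9.0, 9.5], "research": [0, 1, 0, 1]}
chance_of_admit = [0.60, 0.70, 0.80, 0.90]

for name, r in rank_features(columns, chance_of_admit):
    print(f"{name}: r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one.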
Other visualizations used to get an insight into what the data looks like include pie
charts, histograms, line plots, and reg plots. They provide a visual interpretation of
numerical data.
MODEL EVALUATION
In this work, we split the original dataset into training (80%) and test (20%) sets and then train
three machine learning models, including the Random Forest Regressor and Gradient Boosting
models, on the training data. We then use the trained models to predict the Chance of Admit.
Model performance was measured with the median squared error, mean squared error,
mean absolute error, explained variance, and coefficient of determination (R² score). All
three models were trained and run in a Jupyter notebook under Anaconda. After
training, each model was used to predict the chance of admission and then evaluated to choose
the model with the lowest error rate. For this comparison we used the R² score, where a higher
value indicates a better fit. We find that the best model by R² score is the Random Forest
Regressor.
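The evaluation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the prediction values are made up, and the models themselves are not trained here. The metrics follow their standard definitions, and "median squared error" is read literally as the median of the squared errors.

```python
import random
from statistics import mean, median


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 80/20 into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def regression_report(y_true, y_pred):
    """Compute the five evaluation metrics named in the text."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    e_bar = mean(errors)
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    var_res = sum((e - e_bar) ** 2 for e in errors)
    return {
        "median_squared_error": median(e * e for e in errors),
        "mse": mean(e * e for e in errors),
        "mae": mean(abs(e) for e in errors),
        "explained_variance": 1 - var_res / ss_tot,
        "r2": 1 - ss_res / ss_tot,
    }


def best_by_r2(y_true, predictions):
    """Return the name of the model with the highest R2, plus all scores."""
    scores = {name: regression_report(y_true, pred)["r2"]
              for name, pred in predictions.items()}
    return max(scores, key=scores.get), scores


# Made-up test targets and predictions, for illustration only.
y_test = [0.92, 0.76, 0.72, 0.80, 0.65]
predictions = {
    "random_forest": [0.90, 0.78, 0.70, 0.81, 0.66],
    "gradient_boosting": [0.85, 0.80, 0.75, 0.78, 0.70],
}
best, scores = best_by_r2(y_test, predictions)
print(best, scores)
```

With 400 rows, the split yields 320 training rows and 80 test rows. Fixing the seed keeps the split reproducible, so every model is compared on the same test set.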
Conclusion
The field of college admission prediction becomes increasingly relevant as more individuals
choose to earn a degree. In this study, machine learning models were applied to estimate
a student's chances of admission. Using machine learning techniques to
analyze data from the 2019 admission cycle, we uncovered hidden patterns and
identified key factors influencing admissions decisions. As for future work, more models can be
evaluated on more current datasets to find the model that gives the best performance.
Prospects
As we gather more information on students, the model can be enhanced in the future. Natural
language processing methods can also be used to assess the Statement of Purpose essays and
Letters of Recommendation and provide insightful information. In addition, past trends may be
taken into consideration when adjusting the attribute weighting.
References
Wang, Z., & Shi, Y. (2016). Prediction of the admission lines of college entrance examination
based on machine learning. In 2016 2nd IEEE International Conference on Computer and
Communications (ICCC) (pp. 332-335). IEEE. doi:
10.1109/CompComm.2016.7924718.
Basu, T., Buckmire, R., & Tweneboah, O. (2022). An Application of Machine Learning to
College Admissions: The Summer Melt Problem. Journal of Machine Learning for
Modeling and Computing, 3(4).
Acharya, M. S., Armaan, A., & Antony, A. S. (2019, February). A comparison of regression
models for prediction of graduate admissions. In 2019 International Conference on
Computational Intelligence in Data Science (ICCIDS) (pp. 1-5). IEEE.
Priyadarshini, A., Martinez-Neda, B., & Gago-Masague, S. (2023, September). Admission
Prediction in Undergraduate Applications: An Interpretable Deep Learning Approach.
In 2023 Fifth International Conference on Transdisciplinary AI (TransAI) (pp. 135-140).
IEEE.
Li, L., Sha, L., Li, Y., Raković, M., Rong, J., Joksimovic, S., ... & Chen, G. (2023, March).
Moral machines or the tyranny of the majority? A systematic review on predictive bias in
education. In LAK23: 13th international learning analytics and knowledge
conference (pp. 499-508).
[Figure: plot of Stretch (vertical axis, 0 to 160) against Force (horizontal axis, 5 to 45)]