INSTITUTE OF ENGINEERING
NATIONAL COLLEGE OF ENGINEERING
A
MIDTERM PROGRESS REPORT
ON
“AUTOMATED RESUME SCREENING USING
NATURAL LANGUAGE PROCESSING”
SUBMITTED BY:
PRASAMSHA PANDAY (NCE078BCT026)
SABIN PYAKUREL (NCE078BCT036)
PRATIK PANDE (NCE078BECT028)
SUDIP GHIMIRE (NCE078BCT041)
SUBMITTED TO:
DEPARTMENT OF ELECTRONICS & COMPUTER
ENGINEERING
LALITPUR, NEPAL
JANUARY, 2025
Certificate
This is to certify that the work carried out by Mrs. Prasamsha Panday, Mr. Pratik
Pande, Mr. Sabin Pyakurel and Mr. Sudip Ghimire for the project entitled “Resume
Screening Using NLP” for the award of the degree of Bachelor of Computer
Engineering of the Institute of Engineering is based on their authentic work. We
have the pleasure of forwarding their project. The project was carried out under
our supervision, and all the materials included, as well as the software product,
are the result of their yearlong authentic work.
Acknowledgments
We would like to express our sincere gratitude to all those who have contributed
to the completion of this project.
First and foremost, we would like to thank our project supervisor, Er. Suroj Burlakoti,
for his invaluable guidance, continuous support, and encouragement throughout
the course of this project.
We are also deeply grateful to the Department of Electronics and Computer
Engineering for providing the resources and infrastructure necessary to carry out
this work.
We would like to acknowledge the support of the respected teachers of our college
for their sincere advice, constant guidance, supervision and continuous
encouragement throughout the project.
Last but not least, we would like to express our heartfelt gratitude to all the
students and teachers who helped us, directly or indirectly, in this project.
Thank you all for your contributions.
Abstract
Automated resume screening using Natural Language Processing (NLP) refers to
the use of AI-driven software to analyze job applicants’ resumes in an automated
fashion. In today’s competitive job market, hiring has become a challenging and
time-consuming process, especially when it comes to reviewing a large number
of resumes. Traditional methods of manual resume screening are not only inef-
ficient but can also introduce unintentional bias into the selection process. The
project, titled “Automated Resume Screening Using NLP,” aims to address these
challenges by creating a web-based application that streamlines the recruitment
process through automation. Using advanced Natural Language Processing (NLP)
techniques, this system will analyze resumes, extract essential details such as skills,
experience, qualifications, and job titles, and match them against specific job re-
quirements. By employing algorithms like cosine similarity, the application will
rank resumes based on how well they align with the job description, helping re-
cruiters identify top candidates efficiently. Additionally, the system is designed
to promote fairness by focusing solely on job-related information, ensuring con-
sistent and unbiased evaluations. The ultimate goal of this project is to improve
the hiring process by making it faster, more accurate, and equitable, benefiting
both employers and job seekers alike. The project achieves an accuracy of 96.55
percent, which is considerably higher than that of preceding projects.
Contents
Certificate
Acknowledgments
Abstract
List of Figures
1 Introduction
1.1 Background
1.2 Problem statements
1.3 Objectives
1.4 Scope
2 Literature Review
2.1 Related theory
3 System Analysis
3.1 Requirement specification
3.1.1 Functional Requirements
3.1.2 Non-Functional Requirements
3.2 Feasibility study
3.2.1 Technical Feasibility
3.2.2 Economic Feasibility
3.2.3 Legal Feasibility
3.2.4 Time Feasibility
4 Methodology
4.1 System Design
4.2 Activity Diagram
4.3 Sequence Diagram
4.4 Data Collection
4.5 Data Preprocessing
4.6 Vector Generation Using Word Embeddings
4.7 Model Training: Random Forest Classifier
4.8 Model Evaluation
4.9 Similarity Calculation
4.10 Frontend and Backend Development
4.11 Tools Used
References
A APPENDIX
List of Figures
4.1 Block Diagram of the System
4.2 Activity Diagram of the System
4.3 Sequence Diagram of the System
4.4 Training of Model
List of Abbreviations
HR Human Resources
NER Named Entity Recognition
LSTM Long Short-Term Memory
VSM Vector Space Model
BERT Bidirectional Encoder Representations from Transformers
PDF Portable Document Format
NLP Natural Language Processing
CBOW Continuous Bag of Words
1. Introduction
1.1 Background
Finding and hiring qualified employees is a critical function within Human Re-
sources (HR), especially in large and ever-changing job markets. Every month,
millions of individuals enter the workforce, creating a high volume of applications
for each open position. Such a high volume can make it difficult to efficiently identify
the best candidates.
One of the main challenges HR departments face is time and efficiency. Resumes come
in a variety of formats, which makes manual screening and shortlisting of applicants
time-consuming and error-prone. Effectively evaluating resumes also requires a
deep understanding of the specific skills and experience needed for the role, which
can be inconsistent within HR teams. This creates a situation where qualified
candidates might be overlooked due to inefficient screening processes, while HR
departments spend excessive time sifting through applications.
1.3 Objectives
The general objective of the project is to design an employee selection system using
a classification algorithm. The specific objectives are:
• To design and develop a web application to screen resumes effectively and
efficiently for the IT sector.
1.4 Scope
This project aims to transform how recruitment works by automating the process
of screening and ranking resumes. It will make a big difference in areas like the IT
sector, where candidates apply for different job positions. By using machine learning
and natural language processing, it helps employers quickly find the best candi-
dates while saving time and effort. The system also promotes fairness by giving
all job seekers an equal chance and ensuring resumes match job descriptions more
accurately. Ultimately, it will improve communication between employers and
candidates, making the hiring process smoother and more effective. The scope of
this project is currently limited to processing resumes written in English and following
standard resume formats. Additionally, the screening process mainly focuses
on matching keywords and phrases that are most relevant to the job descriptions
provided by employers. While this approach is effective for structured and clearly
defined resumes, future improvements could make the system more versatile by
supporting different formats and multiple languages, making it even more inclusive
and user-friendly.
2. Literature Review
The volume of job applications has increased exponentially with the advent of
online job portals, necessitating the use of automated systems to manage and pri-
oritize candidate profiles. Resume ranking programs, leveraging advancements in
artificial intelligence and machine learning, offer promising solutions to streamline
this process. By employing algorithms capable of analyzing and evaluating re-
sumes based on predefined criteria, these programs aim to enhance the efficiency
and accuracy of candidate selection processes. This literature review explores the
evolution, methodologies, challenges, and advancements in resume ranking sys-
tems, providing insights into their effectiveness and potential impact on modern
recruitment practices.
Recent research demonstrates the effectiveness of machine learning and related
methodologies for ranking resumes through various innovative approaches.
A paper by Chirag Daryani [1] employed various Named Entity Recognition
(NER) approaches to assess similarity between categorized resume data and job
requirements. Techniques included Rule-Based algorithms, regular expressions,
and Bidirectional-LSTM with Conditional Random Field algorithms. The spaCy
module, pre-trained on resume samples, identified entities like names, phone num-
bers, and educational institutions. A content-based recommendation system uti-
lized vectorization, TF-IDF, and cosine similarity measures to rank resumes based
on their fit for job requirements. Vectorization transformed text into numerical
vectors essential for machine learning models. TF-IDF scores reflected term impor-
tance, while cosine similarity computed similarity between resume and job query
vectors. The system used the Vector Space Model (VSM) to represent resumes
and job descriptions, facilitating similarity calculations and candidate ranking.
Performance testing with Software Developer Engineer resumes validated candi-
date rankings based on cosine similarity scores.
In their paper, Tomas Mikolov and his team introduced two new models that
quickly create word vector representations from large datasets. These models improve
word similarity tasks, performing better than older neural network methods
in both accuracy and speed. Impressively, the techniques can generate high-quality
word vectors from a 1.6 billion-word dataset in less than a day. These word repre-
sentations set new standards for capturing both grammatical and meaning-based
similarities, marking a significant advancement in natural language processing.
The book “Speech and Language Processing” by Dan Jurafsky and James H. Martin
explores a variety of topics related to natural language processing. It discusses
text preprocessing methods such as tokenization and stemming, as well as vector
embeddings through n-gram models, and offer a thorough introduction to neural
networks. The book also dives into text classification, covering multi-class classi-
fiers like Naive Bayes, along with more advanced techniques like sequence labeling
and machine translation. It offers in-depth insights into both foundational and
contemporary methods utilized in NLP.
A paper [2] used text preprocessing to remove numbers and convert text to lowercase,
followed by BERT-based extractive summarization for job vacancies using
bert-base-uncased. Summaries were limited to 10 sentences, determined by the
elbow method. Text representation involved converting resumes and vacancies
into numeric vectors. Cosine similarity computed between these vectors deter-
mined match scores, sorted for final ranking.
One article compared the performance of two multiclass classifiers (Random Forest
Classifier and Naive Bayes Classifier) and found that the Random Forest Classifier
performed better, with a lower error rate of 2% on a test dataset with non-linear
relationships among the features, than the Naive Bayes Classifier, which had an error
rate of 6.2%. However, this was not the case for another test dataset in which the
features were largely independent. In that case, the Naive Bayes Classifier
achieved a better error rate of 1% than the Random Forest Classifier.
Pradeep Kumar Roy, in their research [3], created a system to minimize the cost of
hiring new candidates for job positions in a company. They
focused on 3 major problems in this process:
• Finding out if the candidate is fit for the job role.
They applied various NLP techniques for text preprocessing, used TF-IDF for
vectorization, and performed the classification with machine learning algorithms:
Random Forest with 38.9 percent accuracy, Multinomial
Naïve Bayes with 44.39 percent, Logistic Regression with 62.4 percent, and
Linear Support Vector Machine Classifier, which obtained the highest accuracy
of 78.53 percent.
In enhancing their project, we integrated Word2Vec embeddings[4] with a lin-
ear Support Vector Machine (SVM) classifier, aiming to augment resume parsing
capabilities. Word2Vec embeddings were employed to capture semantic relation-
ships between words, enriching the system’s understanding of resume content.
The linear SVM classifier utilized these embeddings to classify resumes based on
extracted features such as skills, experiences, and project details. Additionally,
emphasis was placed on user interface (UI) design to provide an intuitive and ef-
ficient experience for recruiters and hiring managers. This integration not only
aimed to improve accuracy in resume parsing but also sought to enhance usabil-
ity through a well-designed interface, addressing both technical and user-centric
aspects of the project.
angle between the vectors; that is, it is the dot product of the vectors divided by
the product of their lengths. It follows that the cosine similarity does not depend
on the magnitudes of the vectors, but only on their angle. The cosine similarity
always belongs to the interval [-1,1].
For example, two proportional vectors have a cosine similarity of 1, two orthogonal
vectors have a similarity of 0, and two opposite vectors have a similarity of -1. In
some contexts, the component values of the vectors cannot be negative, in which
case the cosine similarity is bounded in [0,1].
For example, in information retrieval and text mining, each word is assigned a
different coordinate and a document is represented by the vector of the numbers
of occurrences of each word in the document. Cosine similarity then gives a useful
measure of how similar two documents are likely to be, in terms of their subject
matter, and independently of the length of the documents.
The technique is also used to measure cohesion within clusters in the field of data
mining.
One advantage of cosine similarity is its low complexity, especially for sparse vec-
tors: only the non-zero coordinates need to be considered.
Other names for cosine similarity include Orchini similarity and Tucker coefficient
of congruence; the Otsuka–Ochiai similarity is cosine similarity applied to binary
data.
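The definition above can be illustrated with a minimal sketch in Python. The helper names `term_counts` and `cosine_similarity` and the two sample texts are illustrative assumptions and are not taken from the project code. Documents are represented by word-occurrence counts, and only the non-zero coordinates are visited.

```python
import math
from collections import Counter


def term_counts(text):
    """Represent a document as a sparse vector of word-occurrence counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Dot product of two sparse vectors divided by the product of their lengths."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, used only for illustration.
resume = term_counts("python developer with django and machine learning experience")
job = term_counts("looking for a python developer with django experience")

score = cosine_similarity(resume, job)
# Adding a Counter to itself doubles every count: same direction, twice the length.
doubled = cosine_similarity(resume + resume, job)
```

The two sample texts share five of their eight words, so `score` is about 0.625. `doubled` has the same value, which shows that the measure depends only on the angle between the vectors and not on document length.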
2.2.3 Machine Learning
Machine learning is a field of artificial intelligence (AI) that allows computers to
learn without being explicitly programmed. Machine learning algorithms use data
to learn how to perform tasks such as classification, prediction, and clustering.
ML algorithms are mathematical models that use different datasets in the form
of text, audio, images and videos to help the machine learn, improving
its performance in each iteration.
ML algorithms can be used to perform a variety of tasks like:
1. Classification: This is the task of assigning a label to an input. For example, a
machine learning algorithm could be used to classify images as either cats or dogs.
2. Prediction: This is the task of predicting a future value based on past data.
For example, a machine learning algorithm could be used to predict the weather
or the stock market.
3. Clustering: This is the task of grouping similar data together. For example, a
machine learning algorithm could be used to group customers together based on
their buying behavior.
Some of the exciting applications of ML technology are fraud detection, spam fil-
tering, medical diagnosis, self-driving cars, recommendation systems, etc.
2.2.4 Word2Vec
Word2Vec is a widely used technique in natural language processing (NLP) for
learning vector representations of words. Developed by Tomas Mikolov and his
team at Google in 2013, Word2Vec aims to convert words into dense, continuous
vector spaces, enabling the model to capture intricate semantic relationships and
contextual meanings of words.
The core idea behind Word2Vec is to map each word to a vector in a high-
dimensional space where semantically similar words are positioned close to each
other. This mapping allows the model to capture meaningful relationships be-
tween words based on their usage in large text corpora. For example, words with
similar meanings or functions, such as ”king” and ”queen,” will have vector rep-
resentations that are close in this space.
Word2Vec employs two primary model architectures to generate these word vec-
tors: the Continuous Bag of Words (CBOW) model and the Skip-Gram model.
The CBOW model predicts a target word based on its surrounding context words.
For instance, in the sentence “the cat sat on the mat,” given the context words
“cat” and “on,” CBOW would be trained to predict the target word “sat.” On the
other hand, the Skip-Gram model works in reverse; it uses a target word to predict
the surrounding context words. For example, given the target word “sat,” Skip-Gram
would attempt to predict context words like “cat” and “on.”
During training, Word2Vec uses a neural network to adjust the word vectors such
that the probability of predicting the correct context words (or target words) is
maximized. This process ensures that the learned vectors reflect meaningful rela-
tionships and similarities between words, making them useful for a variety of NLP
tasks, such as text classification, sentiment analysis, and machine translation. By
representing words in a continuous vector space, Word2Vec provides a powerful
tool for understanding and processing natural language.
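The difference between the two architectures lies in how training examples are formed from a sentence. The following sketch, written in plain Python, builds those examples only; it does not train a neural network. The function names, the window size of 2 and the sample sentence are assumptions made for illustration.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs: CBOW predicts the target from its context."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Return (target_word, context_word) pairs: Skip-Gram predicts each context word from the target."""
    pairs = []
    for context, target in cbow_pairs(tokens, window):
        pairs.extend((target, word) for word in context)
    return pairs


sentence = "the cat sat on the mat".split()
cbow = cbow_pairs(sentence)
skipgram = skipgram_pairs(sentence)
# cbow[2] is (["the", "cat", "on", "the"], "sat")
```

For the word “sat”, CBOW receives the four neighbouring words as a single input, while Skip-Gram produces four separate examples, one per neighbouring word.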
2.2.5 Natural Language Processing(NLP)
• Stemming:
Stemming is a process that reduces words to their root form by stripping
suffixes. This technique helps in standardizing words and reducing
dimensionality in text analysis. For example, the words “running” and
“runs” are both reduced to “run.” Stemming algorithms, such as the
Porter Stemmer and Snowball Stemmer, apply heuristic rules to remove
common suffixes, so they do not always produce actual
words. For instance, “fishing” and “fished” are both stemmed to “fish,”
but the Porter Stemmer reduces “happily” to “happili,” which is not a
dictionary word.
Example: “running” → “run”
“happily” → “happili”
• Lemmatization:
Lemmatization is a more sophisticated approach than stemming, focusing
on reducing words to their base or dictionary form called a lemma. Unlike
stemming, lemmatization considers the context and the part of speech to
ensure the resulting lemma is a valid word. For example, ”better” is lem-
matized to ”good,” and ”running” is lemmatized to ”run.” Lemmatization
often uses lexical databases like WordNet for accuracy.
Example: ”running” → ”run”
”better” → ”good”
• Tokenization:
Tokenization involves breaking down text into smaller units, such as words,
phrases, or sentences. This process is essential for many NLP tasks as it
simplifies the text into manageable pieces. Tokenization can be word-level
(breaking text into words) or sentence-level (breaking text into sentences).
Example: ”Hello world!” → [”Hello”, ”world!”]
• Stop Word Removal:
Stop words are common words that
have little significance for text analysis, such as “the,” “is,” “in,” etc.
Removing stop words helps in focusing on the more meaningful words in the
text. Example: “The quick brown fox” → [“quick”, “brown”, “fox”]
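The preprocessing steps described in this section can be chained as in the sketch below. It uses only the Python standard library. The stop-word list is a small illustrative sample, and `toy_stem` is a deliberately crude suffix stripper written for this example, not the Porter or Snowball algorithm. Lemmatization is omitted because it needs a lexical database.

```python
import re

# Small illustrative stop-word list; real systems use much larger lists.
STOP_WORDS = {"the", "is", "in", "a", "an", "and", "of", "to"}


def tokenize(text):
    """Word-level tokenization: lowercase the text and keep alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token):
    """Very rough suffix stripping for illustration; NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("The quick brown fox is jumping in the fields")
filtered = remove_stop_words(tokens)
stems = [toy_stem(t) for t in filtered]
```

Unlike a whitespace split, this tokenizer also discards punctuation and digits, which is one possible design choice among several. The toy stemmer maps “jumping” to “jump” but would turn “running” into “runn”, which is why established stemmers carry additional rules.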
3. System Analysis
8. Report Generation:
- The system should generate reports summarizing the ranking and classification
results.
• Skill Set: The development team has the necessary skills and expertise to
build and maintain the system.
• Scalability: The system architecture can handle future growth in the volume
of resumes and job descriptions.
• Hardware and software costs for development and deployment (servers, stor-
age, etc.)
4. Methodology
This flowchart represents a job-matching system. Job seekers upload resumes, and
job providers post job descriptions through a user interface. The data is stored
in respective databases and processed using text preprocessing and Word2Vec
embeddings. Cosine similarity ranks resumes based on relevance, while a Random
Forest model categorizes job postings. The system helps match candidates to
suitable jobs efficiently.
4.2 Activity Diagram
This activity diagram represents a job processing system. It starts with user
interaction, where job providers post and manage jobs, while job seekers view jobs
and upload resumes. The system stores job descriptions and resumes in respective
databases. These are processed through text preprocessing, followed by ranking
resumes and matching job categories. The process ensures efficient job matching
between job providers and job seekers.
This sequence diagram represents the job processing system’s workflow. Users log
in, select a job, or upload a resume through the frontend (Django). The backend
stores the data and sends it for processing. Machine learning (Jupyter) applies
text preprocessing, converts text using Word2Vec/FastText, and classifies resumes
using Random Forest Classifier. The processed data is stored, and job matches
with scores are sent back to the frontend. Finally, users see the matching job
results.
The methodology is divided into several key stages: data collection, data
preprocessing, feature extraction, model training, and evaluation.
• Stopword Removal: Common words such as ”the”, ”and”, ”is”, etc., that do
not contribute meaningful information, were removed.
• Lemmatization: Words were reduced to their base or root form (e.g., “run-
ning” to “run”) to ensure consistency and improve analysis.
These preprocessing steps helped clean the data and reduce noise, preparing the
resumes for feature extraction.
was applied. A pre-trained word embedding model, Word2Vec, was used to represent each
word in the resume as a vector. These embeddings were chosen because they
capture semantic meaning and relationships between words, helping the model
understand context and job-related terms in resumes.
This process converted each resume into a numerical vector that captured the es-
sential information and context from the text, which was then used as input for
the machine learning model.
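The report does not state how the word vectors of a resume are combined into one vector. A common choice, assumed here purely for illustration, is to average the vectors of all words that have an embedding. The three-dimensional embeddings and the name `document_vector` below are hypothetical.

```python
# Hypothetical 3-dimensional embeddings; a real Word2Vec model would supply
# vectors with far more dimensions for a large vocabulary.
TOY_EMBEDDINGS = {
    "python": [0.9, 0.1, 0.0],
    "django": [0.8, 0.2, 0.1],
    "design": [0.1, 0.9, 0.3],
}


def document_vector(tokens, embeddings, dim=3):
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    # zip(*known) groups the values of each dimension across all word vectors.
    return [sum(values) / len(known) for values in zip(*known)]


resume_tokens = ["python", "django", "developer"]  # "developer" is out of vocabulary
resume_vector = document_vector(resume_tokens, TOY_EMBEDDINGS)
```

Words without an embedding are skipped, so the result here is the mean of the vectors for “python” and “django”, roughly [0.85, 0.15, 0.05]. The resulting fixed-length vector has the same size for every resume, which is what a classifier requires as input.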
The next step was to train a machine learning model using the processed and
vectorized data. A Random Forest Classifier was used as the classification model to
assign each resume to its respective category. During model
training, the following steps were carried out:
• Data Splitting: The dataset was divided into training and validation sets
using an 80-20 split. The training set (80 percent) was used to train the
model, while the validation set (20 percent) was kept aside to evaluate the
model’s performance on unseen data.
• Model Training: The Random Forest classifier was trained on the feature
vectors of the resumes, with each vector labeled according to its correspond-
ing job category.
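The data splitting step can be sketched as follows. This is a minimal standard-library version written for illustration: the function name, the fixed seed and the synthetic dataset are assumptions, and the project may have used a library routine instead.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    """Shuffle the samples and split them into training and validation lists."""
    shuffled = list(samples)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical (feature_vector, label) pairs standing in for vectorized resumes.
dataset = [([float(i), float(i % 3)], "category_%d" % (i % 4)) for i in range(100)]
train_set, validation_set = train_validation_split(dataset)
```

With 100 samples this yields 80 training and 20 validation examples. The training list would then be passed to a Random Forest implementation, for example scikit-learn's `RandomForestClassifier`, although the report does not name the library that was used.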
3. Recall
Recall, also known as sensitivity or true positive rate, is a performance metric
used in binary classification tasks. It measures the proportion of actual positive
instances that are correctly identified by the model.
Mathematically, recall is calculated using the formula:
\[ \text{Recall} = \frac{TP}{TP + FN} \]
4. Precision
Precision is a performance metric used in binary classification tasks that measures
the proportion of correctly predicted positive instances out of all instances
predicted as positive by the model.
\[ \text{Precision} = \frac{TP}{TP + FP} \]
5. F1-score
The F1 score is a performance metric commonly used in binary classification
tasks, which considers both precision and recall to provide a balanced measure of a
model’s performance. It is the harmonic mean of precision and recall, emphasizing
the balance between the two metrics.
\[ F_1 = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \]
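The three formulas can be checked with a short worked example. The counts below (17 true positives, 1 false positive, 0 false negatives) are assumed values for a single class, chosen for illustration. In a multi-class setting the same formulas are applied to each class in turn, treating that class as positive and all others as negative.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Assumed counts for one class, for illustration only.
tp, fp, fn = 17, 1, 0
p = precision(tp, fp)  # 17 / 18
r = recall(tp, fn)     # 17 / 17
f1 = f1_score(p, r)
print(round(p, 3), round(r, 3), round(f1, 3))  # prints: 0.944 1.0 0.971
```

A single false positive lowers precision to 17/18 while recall stays at 1.0, and the F1 score falls between the two.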
• Django: Django, a Python framework, has been used for backend development,
handling the server-side logic, database interactions, and integration
of the NLP model to process resumes.
• TensorFlow: TensorFlow has been utilized for training and evaluating the
NLP model, which helps in automatically analyzing and scoring resumes
based on the job requirements.
5. Results and Discussion
5.1 Results
The resume screening product developed in this project was deployed and tested
to check its efficiency and effectiveness. The project succeeded in ranking
resumes automatically, relieving job recruiters of the tiring work of evaluating
resumes manually. The Random Forest classification model integrated into the
application was, for the most part, able to classify the resumes correctly into
their respective job categories. In addition, the product provides a platform
for job recruiters to post job vacancies and for job seekers to apply for those
vacancies.
For the class “Cybersecurity Specialist”:
Precision: 0.944, i.e., out of the instances predicted as “Cybersecurity
Specialist”, 94.4% were correct.
Recall: 1.00, i.e., all 17 actual “Cybersecurity Specialist” instances were
correctly identified.
F1-score: 0.971 (harmonic mean of precision and recall).
Support: 17 instances of “Cybersecurity Specialist” in the test set.
5.4 Analysis of Confusion Matrix
Cloud Engineer:
TP: 18 (Correctly classified as Cloud Engineer)
FN: 2 (1 misclassified as Cybersecurity Specialist, 1 as Data Scientist)
FP: 0 (No other class was wrongly predicted as Cloud Engineer)
TN: Remaining 106 instances
Cybersecurity Specialist:
TP: 17 (Correctly classified as Cybersecurity Specialist)
FN: 0 (No Cybersecurity Specialist instance was misclassified)
FP: 1 (1 Cloud Engineer instance was wrongly predicted as Cybersecurity Specialist)
TN: Remaining 110 instances
Data Scientist:
TP: 14 (Correctly classified as Data Scientist)
FN: 1 (1 misclassified as DevOps Engineer)
FP: 1 (1 wrongly predicted as Data Scientist from Cloud Engineer)
TN: Remaining 111 instances
DevOps Engineer:
TP: 15 (Correctly classified as DevOps Engineer)
FN: 1 (1 misclassified as Data Scientist)
FP: 1 (1 wrongly predicted as DevOps Engineer from Data Scientist)
TN: Remaining 109 instances
Graphics Designer:
TP: 11 (Correctly classified as Graphics Designer)
FN: 0 (No misclassification)
FP: 0 (No other class wrongly predicted as Graphics Designer)
TN: Remaining 115 instances
Robotics Engineer:
TP: 17 (Correctly classified as Robotics Engineer)
FN: 2 (1 misclassified as Machine Learning Engineer, 1 as Software Developer)
FP: 1 (1 wrongly predicted as Robotics Engineer from Machine Learning Engi-
neer)
TN: Remaining 108 instances
Software Developer:
TP: 21 (Correctly classified as Software Developer)
FN: 0 (No misclassification)
FP: 0 (No other class wrongly predicted as Software Developer)
TN: Remaining 117 instances
6. Conclusion and Future Enhancements
6.1 Conclusion
The Resume Screening system developed using NLP, Cosine Similarity, and Ran-
dom Forest algorithms provides an efficient and automated solution for resume
screening in recruitment processes. By automating the tedious task of evaluat-
ing resumes, it allows recruiters to focus on high-value tasks such as interviewing
and final decision-making. The system is capable of extracting key data from
resumes, calculating similarity scores with job descriptions, and ranking candi-
dates accordingly, significantly reducing the manual effort involved in candidate
selection. Through the integration of machine learning models, the system not only
offers efficiency but also accuracy, ensuring that the most qualified candidates are
prioritized. Despite facing challenges in handling varied resume formats and op-
timizing the algorithms, the project has demonstrated the potential of leveraging
NLP and machine learning to enhance recruitment processes.
In conclusion, the project successfully addresses a critical need in the recruitment
industry, and with further enhancements, it has the potential to become a robust
solution for organizations looking to streamline their hiring process.
deep learning-based models, can be integrated to better handle different
resume formats and improve data extraction accuracy.
• Support for More File Formats: In the future, the system could support
additional file formats beyond DOCX and PDF, such as TXT, RTF, and
ODT, making it more versatile for different user needs.
References
[1] Chirag Daryani. An automated resume screening system using natural lan-
guage processing and similarity. Ethics And Information Technology, 2020.
[3] Pradeep Kumar Roy. A machine learning approach for automation of resume
recommendation system. Procedia Computer Science, 2020.
Appendix A. APPENDIX
Figure A.3: Job Seeker View
Figure A.5: Job Provider View