Research Paper
Shubham Shaurya
Department of Computer Science PRASUNET
Company Internship Program
shubhamshauryabgp@gmail.com
Abstract
The modern recruitment landscape demands swift, precise, and intelligent solutions to match job
seekers with suitable roles. This research presents the design and implementation of an automated
Resume and Job Description Compatibility Analyzer. The system utilizes Natural Language Processing
(NLP) and Machine Learning (ML) techniques to extract relevant information such as skills, education,
and experience from candidate resumes and job descriptions. A classification model predicts the job
category based on resume content, while fuzzy matching techniques compare resume skills with job
requirements to generate a match score. The solution is deployed through an interactive
Streamlit-based web application, aiming to assist recruiters and job seekers in making informed
decisions with efficiency and accuracy.
Index Terms
Resume Matching, Job Description Analysis, NLP, Machine Learning, Resume Classification, Streamlit,
Fuzzy Matching, Skill Extraction.
1. Introduction
Hiring the right candidate is crucial for organizational growth, yet the process often involves screening
hundreds of resumes manually. This project proposes a machine-driven approach that not only
classifies resumes into job categories using a trained model but also compares the resume with a job
description to evaluate compatibility. By automating resume parsing, keyword extraction, and skill-
matching logic, the system saves time and improves decision-making in talent acquisition.
2. Problem Statement
• Difficulty in identifying the most compatible resume from a large applicant pool.
This project aims to bridge the gap by developing a system that analyzes both resumes and job
descriptions using NLP and ML techniques to predict relevance and category.
• Scikit-learn: For implementing TF-IDF vectorization, Logistic Regression, and Random Forest
classifiers.
• NLTK & spaCy: For text processing, tokenization, and named entity recognition.
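As a rough illustration of how the scikit-learn components listed above can be combined, the following sketch chains a TF-IDF vectorizer with a Logistic Regression classifier. The training texts, labels, and variable names are made up for the example and are not taken from the project's dataset; the hyperparameters are scikit-learn defaults apart from max_iter, not the values used in this work.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples (assumption): stand-ins for cleaned resume text.
train_texts = [
    "python machine learning pandas statistics",
    "deep learning python tensorflow data analysis",
    "java spring hibernate microservices",
    "java j2ee servlets rest api",
    "recruitment payroll employee relations onboarding",
    "talent acquisition hr policies payroll",
]
train_labels = [
    "Data Science", "Data Science",
    "Java Developer", "Java Developer",
    "HR", "HR",
]

# The pipeline first converts text to TF-IDF features, then fits the classifier.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_texts, train_labels)

# Predict the category of an unseen (already cleaned) resume snippet.
predicted_category = classifier.predict(["python pandas machine learning projects"])[0]
print(predicted_category)
```

Wrapping both steps in one pipeline means the same vocabulary and weights learned during training are applied at prediction time, which is convenient when the model is later loaded inside a web application.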
4. Dataset Description
The project used the UpdatedResumeDataSet.csv file from Kaggle, which contained resumes
classified into multiple job categories (e.g., Data Scientist, Java Developer, HR). The resumes were
preprocessed to remove noise, lowercase all text, and remove stopwords.
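The cleaning steps described above (noise removal, lowercasing, stopword removal) could look roughly like the sketch below. The function name clean_resume, the regular expressions, and the short stopword list are assumptions made for illustration; the project itself may rely on a fuller stopword list such as the one shipped with NLTK.

```python
import re

# Hypothetical, deliberately small stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "with", "for", "on", "is"}

def clean_resume(text: str) -> str:
    """Lowercase the text, strip URLs, emails and symbols, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\S+@\S+", " ", text)                 # remove email addresses
    # Keep letters plus '+', '#' and '.' so terms like "c++" or "c#" survive.
    text = re.sub(r"[^a-z+#.\s]", " ", text)
    tokens = [tok.strip(".") for tok in text.split()]    # trim sentence-final dots
    return " ".join(tok for tok in tokens if tok and tok not in STOPWORDS)

cleaned = clean_resume(
    "Experienced in Python and C++! Contact: me@example.com http://x.io"
)
print(cleaned)
```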
A separate CSV (job_title_des.csv) was used to fetch real-world job descriptions for compatibility
comparison.
5. Methodology
5.1 Preprocessing
• Tokenization: Converting resume and job description text into meaningful tokens.
• Vectorization: TF-IDF vectorizer was used to convert cleaned text into numerical form.
• Skills: Extracted using NLP-based POS tagging and verified against a skills.txt database.
• Education & Experience: Extracted using keyword detection and regex patterns.
• Fuzzy Matching: Used to match extracted skills with job description keywords.
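The extraction steps listed above can be sketched as follows. This is a simplified illustration only: it skips the POS-tagging stage and checks the text directly against a skill list, and the skill set, regex patterns, and function names are assumptions rather than the project's actual code or the real contents of skills.txt.

```python
import re

# Hypothetical stand-in for the entries of skills.txt.
KNOWN_SKILLS = {"python", "java", "sql", "machine learning", "docker"}

# Illustrative patterns; real resumes would need a broader set of variants.
DEGREE_PATTERN = re.compile(
    r"\b(b\.?tech|m\.?tech|b\.?sc|m\.?sc|mba|ph\.?d|bachelor|master)\b",
    re.IGNORECASE,
)
EXPERIENCE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*\+?\s*(?:years?|yrs?)", re.IGNORECASE
)

def extract_skills(text, known_skills=KNOWN_SKILLS):
    """Return known skills that occur in the text as whole terms."""
    lowered = text.lower()
    found = []
    for skill in known_skills:
        # Lookarounds stop "java" from matching inside "javascript".
        pattern = r"(?<![a-z0-9])" + re.escape(skill) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.append(skill)
    return sorted(found)

def extract_education(text):
    """Return the distinct degree keywords found in the text."""
    return sorted({m.group(1).lower() for m in DEGREE_PATTERN.finditer(text)})

def extract_experience_years(text):
    """Return the largest 'N years' figure mentioned, or None if absent."""
    years = [float(m.group(1)) for m in EXPERIENCE_PATTERN.finditer(text)]
    return max(years) if years else None

sample = ("B.Tech in Computer Science with 3+ years of experience "
          "in Python, SQL and JavaScript.")
print(extract_skills(sample), extract_education(sample),
      extract_experience_years(sample))
```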
6. Application Workflow
4. Paste Job Description (Optional): System compares and lists matched/missing skills.
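The comparison step can be illustrated with a small fuzzy-matching sketch. The paper does not name the fuzzy-matching library, so this example substitutes Python's standard difflib module; the 0.8 similarity threshold, the percentage-based score, and the function names are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def compare_skills(resume_skills, job_skills, threshold=0.8):
    """Split job skills into matched and missing, and compute a match score.

    A job skill counts as matched when its best similarity to any resume
    skill reaches the threshold (an assumed cut-off, not a tuned value).
    """
    matched, missing = [], []
    for required in job_skills:
        best = max(
            (similarity(required, have) for have in resume_skills),
            default=0.0,
        )
        (matched if best >= threshold else missing).append(required)
    # Score: percentage of required skills that were matched.
    score = 100.0 * len(matched) / len(job_skills) if job_skills else 0.0
    return matched, missing, score

matched, missing, score = compare_skills(
    resume_skills=["Python", "scikit learn", "SQL"],
    job_skills=["python", "scikit-learn", "docker", "sql"],
)
print(matched, missing, score)
```

Fuzzy comparison lets near-identical spellings such as "scikit learn" and "scikit-learn" count as the same skill, which exact string matching would miss.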
7. Results
• Top Skills: System displays top 5 extracted skills and highlights matched ones.
• Category Prediction: High accuracy in predicting categories like Data Science, DevOps, HR,
etc.
8. Deployment
While the classifier worked well for technical resumes, skill extraction sometimes yielded irrelevant
results due to diverse formatting. Integration with spaCy improved skill extraction from context.
Future work may include using BERT-based models for deeper semantic understanding.
10. Conclusion
This system successfully demonstrates how a combination of NLP and ML can automate the
resume-job matching process. By combining skill extraction, job category prediction, and fuzzy
matching, the application provides actionable insights to both job seekers and employers. The
solution significantly reduces manual effort, improves match accuracy, and enhances hiring efficiency.