Ra Report Final
“RESUME ANALYZER”
SUBMITTED BY
GUIDED BY
Prof. Kanchan Bhale
DEPARTMENT OF INFORMATION TECHNOLOGY
2024-2025
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
HINJEWADI, PUNE-57
Certificate
This is to certify that the SRS report entitled
“RESUME ANALYZER”
have partially completed the Project entitled “Resume Analyzer”, under my guidance in partial
fulfillment of the requirement for the Project Based Learning in S.E. Information Technology of
International Institute of Information Technology, Hinjewadi, by Savitribai Phule Pune
University for the academic year 2024 – 2025.
Acknowledgement
With great joy and gratitude, we submit this project report entitled "Resume Analyzer" as part of the
Project-Based Learning (PBL) in Second Year Information Technology at International Institute of
Information Technology, Hinjewadi, Pune.
This project has been a very enriching and learning experience for us. We are very privileged to have
worked on this project under the able guidance of Prof. Kanchan Bhale. Her regular advice, motivation,
and constructive criticism have been of great help during the course of developing this project. Her
mentorship guided us to formulate and streamline our ideas into an operational system.
We would like to convey our heartfelt gratitude to Dr. Jyoti Surve, Information Technology Department
Head, whose guidance and support ensured a healthy learning environment and facilitated valuable
resources being made available to us throughout the duration of this project.
Our heartfelt gratitude also goes to Dr. Vaishali Patil, our esteemed Principal, for granting the
infrastructure, facilities, and academic environment necessary for carrying out the project smoothly and
efficiently.
In addition, we convey our sincere appreciation to all the faculty members of the Information
Technology Department, and management for their support and motivation during the project period.
This project has played an important role in our academic knowledge, personal development, and insight
into real-world applications of Natural Language Processing and Machine Learning in the field of
automated resume assessment systems.
Subodh Unavane
Prashant Wadile
Limesh Warude
Prasanna Vishwas
Contents
1.1 Overview
2 Literature Survey
2.1 Research Paper Summary
3.1 Introduction
3.1.1 Purpose
3.2.1 Frontend
3.2.2 Backend
3.6.2 ER Diagram
4 System Design
5 Technical Specification
6 Result Discussion
7 Glossary
8 Conclusion
9 References
List of Figures
List of Tables
1 Literature Survey
2 Phase Wise Implementation
Abstract
Resume Analyzer is a Flask-based web application that simplifies the campus recruitment process by
providing a structured platform for analyzing student resumes. Students can register, upload their
resumes, and receive automated skill-based feedback and course suggestions, while Training and
Placement Officers (TPOs) can view the analyzed data through a dedicated dashboard.
By integrating natural language processing (NLP), user authentication, and database management,
Resume Analyzer automates the otherwise labor-intensive process of manual resume screening. This
reduces human error and improves accuracy and consistency in evaluating candidates' capabilities
and readiness for industry demands.
The system performs resume assessments in a standard format and makes the results available in
real time to authorized users. This centralized assessment platform enables TPOs to make informed
decisions, provide customized career advice, and match students to prospective career
opportunities based on their skills and skill gaps.
Chapter 1
Introduction to Project Topic
1.1 Overview
Resume screening is an important step in the recruitment and selection process in today's
competitive job market. With hundreds of applications arriving for every vacancy, however, manual
screening is time-consuming, cumbersome, and subject to inconsistency and human bias. The problem
is not unique to industry: Training and Placement Officers (TPOs) in educational institutions face
it during campus recruitment.
The Resume Analyzer project meets this challenge with an automated and scalable system that
analyzes student resumes using Natural Language Processing (NLP). The system is built on Flask (a
Python web framework) and uses PDF parsing, regular expressions, and keyword matching to extract
structured information such as candidate name, contact details, education, skills, and experience
from uploaded resumes.
After parsing, the system scores each resume against a domain-based keyword dataset and highlights
skill gaps. Based on this analysis, it recommends courses through which students can improve their
profiles in targeted domains: Web Development, Data Science, Android, iOS, and UI/UX.
Students can register, log in, upload a resume, and view their scores and recommendations.
TPOs have access to all student scores and data through a centralized dashboard, allowing them to
make data-driven placement decisions.
Overall, the system increases efficiency, reduces manual effort, and supports data-driven
decision-making in college placement processes.
Resume Analyzer is a web-based system that helps Training and Placement Officers (TPOs) screen
student resumes efficiently. Developed using Flask, Python, and Natural Language Processing (NLP),
the system offers a platform where students can register, log in, and upload their resumes in PDF
format.
Once a resume is uploaded, the system automatically identifies key information such as name, email
address, phone number, educational background, skills, and work experience using text extraction
and pattern-matching rules. It then compares the extracted data against a predefined list of
domain-related keywords to assess the quality of the resume and compute a resume score.
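The keyword comparison described above can be sketched as follows. This is a minimal illustration rather than the project's actual scoring code: the `DOMAIN_KEYWORDS` dictionary, the `score_resume` function, and the percentage-based score are all assumptions made for the example.

```python
import re

# Hypothetical keyword lists; the project's real keyword dataset is not shown in this report.
DOMAIN_KEYWORDS = {
    "Web Development": ["html", "css", "javascript", "flask", "django"],
    "Data Science": ["python", "pandas", "machine learning", "statistics", "sql"],
}


def score_resume(text, domain_keywords=DOMAIN_KEYWORDS):
    """Return matched keywords, missing keywords, and a 0-100 score for each domain."""
    # Lowercase and collapse whitespace so multi-word keywords match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    report = {}
    for domain, keywords in domain_keywords.items():
        if not keywords:
            continue  # skip empty keyword lists to avoid dividing by zero
        matched = [
            kw for kw in keywords
            # Require non-alphanumeric boundaries so "sql" does not match inside "mysql".
            if re.search(r"(?<![a-z0-9])" + re.escape(kw) + r"(?![a-z0-9])", normalized)
        ]
        missing = [kw for kw in keywords if kw not in matched]
        score = round(100 * len(matched) / len(keywords))
        report[domain] = {"matched": matched, "missing": missing, "score": score}
    return report


sample = "Built REST APIs with Flask; front end in HTML and CSS. Used Python and pandas."
report = score_resume(sample)
print(report["Web Development"]["score"])  # 3 of 5 keywords matched -> 60
```

A plain percentage of matched keywords is only one possible scoring rule; weighting keywords or sections differently would fit the same structure.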
Based on the skill or domain gaps identified, the system also recommends appropriate online
courses so that students can strengthen their profiles and become industry-ready. At the same time,
TPOs are provided with a dashboard where they can see all student submissions, scores, and suggested
improvements in one central place.
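A course recommendation step of this kind could look roughly like the sketch below. The `COURSE_CATALOGUE` mapping, the placeholder course titles, and the `recommend_courses` function are hypothetical and only illustrate the idea of mapping missing skills to courses.

```python
# Hypothetical catalogue keyed by skill; the titles are placeholders, not the project's data.
COURSE_CATALOGUE = {
    "javascript": ["Introductory JavaScript course"],
    "django": ["Django for beginners course"],
    "sql": ["SQL fundamentals course"],
}


def recommend_courses(missing_skills, catalogue=COURSE_CATALOGUE, limit=5):
    """Collect course suggestions for each missing skill, preserving order without duplicates."""
    suggestions = []
    for skill in missing_skills:
        # Skills with no catalogue entry simply contribute no suggestions.
        for course in catalogue.get(skill.lower(), []):
            if course not in suggestions:
                suggestions.append(course)
    return suggestions[:limit]


courses = recommend_courses(["JavaScript", "django", "kotlin"])
print(courses)
```

Here "kotlin" has no catalogue entry, so only the JavaScript and Django courses are returned.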
Resume Analyzer helps ensure that resume assessments are performed uniformly, reducing the need
for manual screening and the potential for human error. It brings organization, precision, and
instant accessibility to the placement process, making it a useful tool for educational
institutions preparing for campus placements.
In universities, Training and Placement Officers (TPOs) screen hundreds of student resumes during
campus recruitment drives. Traditionally this is done manually, which is not only inefficient and
time-consuming but also prone to human error, inconsistency, and bias. Resumes arrive in different
formats and structures, which adds complexity and makes it difficult to identify suitable
candidates quickly and objectively.
Students, in turn, are often unaware of the gaps in their skill sets and do not receive individualized
guidance on improving their resumes to meet industry standards. There is no uniform technique for
assessing the quality of a resume or for proposing focused courses that can make students more
employable.
Thus, there is a need for an automated, intelligent, and scalable mechanism that addresses these problems.
Resume Analyzer addresses this need with a web-based solution built on Natural Language
Processing (NLP) that helps both students and TPOs streamline the recruitment preparation
process.
1.4 Applying Software Engineering Approach
1. Requirement Analysis:
During this stage, the project objectives and non-functional and functional requirements were clearly
outlined. The system was designed to support the upload of resumes, NLP parsing, resume scoring,
and course suggestion for students and a TPO dashboard.
2. System Design:
The system architecture, ER diagrams, and data flow were prepared based on requirements. Critical
modules like login/registration, upload file, resume parser, and dashboards were organized for
execution.
3. Implementation:
The system was implemented with Python (Flask framework) as the backend and simple HTML/CSS
as the frontend. Libraries such as pdfminer, re (Python's built-in regular expression module), and
spaCy were used for parsing and NLP operations. Code was modularized to improve readability and
testability.
4. Testing:
The application was tested with resumes of various formats and industries to validate correct
extraction of candidate information and correct functioning of the resume scoring mechanism.
The final system was deployed and tested locally, and it has been designed with scalability in mind
so that it can later be deployed on a college intranet or a cloud hosting service.
Clean coding practices, version control with Git, and comprehensive documentation were used
throughout development. As a result, the system should be dependable, maintainable, and easy to
extend with future features such as analytics, live updates, or integration with external platforms.
Chapter 2
Our Resume Analyzer using NLP addresses the limitations of existing solutions by combining
rule-based and NLP-based techniques for accurate and structured data extraction. The project
is designed to be lightweight, cost-effective, and adaptable for educational or recruitment purposes,
balancing flexibility and performance.
Summary:
The article addresses the problem of effectively screening and ranking a large number of resumes during
the hiring process. It proposes an automated system that employs Natural Language Processing (NLP)
and machine learning (ML) to make the process efficient. The system identifies important
information in resumes, matches candidates' qualifications with job specifications, and ranks resumes
according to their suitability. This approach aims to cut down the time recruiters spend on
manual filtering and to increase the probability of selecting the top candidate.
• The mechanism incorporates NLP and ML for classifying resumes and ranking applicants.
• MaLSTM model (Siamese networks + LSTM with Manhattan distance) is utilized for ranking
candidates according to similarity between the resume and the job description.
• The system pulls information such as skills, experience, and education from resumes.
• The system architecture consists of a user interface, a processing block, and a database.
The study concludes that resume ranking via machine learning proves more efficient compared to
conventional methods using human intuition. The proposed approach will aid in constructing future
headhunting solutions through candidate ranking and re-ranking as per hiring behavior.
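The Manhattan-distance similarity used by MaLSTM-style models can be illustrated with a short sketch. In the usual formulation, the two LSTM encodings are compared with exp(-L1 distance), which yields a value in (0, 1]. The vectors below are made-up stand-ins for the encodings a trained Siamese LSTM would produce, and the function and variable names are assumptions for this example only.

```python
import math


def manhattan_similarity(h_resume, h_job):
    """Similarity exp(-||h_resume - h_job||_1); equals 1.0 for identical vectors."""
    if len(h_resume) != len(h_job):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(h_resume, h_job))
    return math.exp(-l1_distance)


# Hypothetical encodings; a real system would obtain these from the trained LSTM encoders.
job_encoding = [0.2, 0.4, 0.1]
resume_encodings = {
    "A": [0.2, 0.5, 0.1],
    "B": [0.9, 0.0, 0.6],
}

# Rank resumes from most to least similar to the job description.
ranked = sorted(
    resume_encodings,
    key=lambda name: manhattan_similarity(resume_encodings[name], job_encoding),
    reverse=True,
)
print(ranked)
```

Resume A differs from the job encoding by an L1 distance of about 0.1 and resume B by about 1.6, so A is ranked first.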
2.2 Research Paper Summary 2
Kamal Chandwani et al. (2024).
Smart Resume Analyzer using NLP.
International Journal of Advanced Research in Engineering, Science and Management (IJARESM),
12(5).
🔗 https://www.ijaresm.com/smart-resume-analyzer-using-nlp
Summary
This research study discusses the application of artificial intelligence (AI) for enhancing resume analysis
and refining the recruitment process. The researchers built a resume analyzer with the help of natural
language processing (NLP) and machine learning algorithms. The objective is to design a system
that can effectively extract prominent information from resumes and determine candidate suitability
correctly. The research investigates how various AI methods handle the analysis of resumes across
various industries and job levels. It also looks into how the formatting and content of resumes affect
candidate selection. The research findings seek to offer insights that will enable recruiters to hire
more effectively and advise job seekers on how to produce effective resumes.
• The resume analyzer built utilizes NLP to extract key information such as skills, experience,
education, and certifications.
• The system is made user-friendly, with an intuitive interface and adjustable features to
support recruiters in making decisions.
The study concludes that AI can transform resume analysis and recruitment. By using sophisticated NLP
methods and developing custom analyzers, recruiters can gain deeper insights into potential candidates.
However, it is important to avoid common pitfalls such as over-reliance on keywords and the omission
of soft skills. Ongoing refinement, user input, and bias mitigation are necessary to build a more
equitable and streamlined recruitment process.
Summary:-
This paper proposes a model for the extraction of significant details from semi-structured CVs/resumes
and prioritizing them in terms of a given employer's needs.
Three key phases comprise the process: segmenting the CV/resume based on subject, retrieving
structured data from unstructured format, and ranking this data on the basis of a decision tree
algorithm.
The CV/resume is rendered as HTML for enabling the extraction of structured data. The system employs
the ID3 decision tree algorithm to classify and weight the extracted information, with the positively
weighted data being employed to train the system for future application.
• The ID3 decision tree algorithm is employed for classification and weighting of data.
• The performance of the system is tested using other classifier algorithms such as logistic
regression.
According to the research, the proposed model is capable of extracting and ranking information from
CVs/resumes efficiently. Using NLP and machine learning, the system seeks to automate and make
the resume screening process more efficient.
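ID3, the decision tree algorithm mentioned in this summary, chooses at each node the attribute with the highest information gain. The sketch below shows that selection step only. The resume attributes, labels, and function names are invented for illustration and do not come from the summarized paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical training data: two boolean attributes per resume and a recruiter decision.
rows = [
    {"has_python": True, "has_degree": True},
    {"has_python": True, "has_degree": False},
    {"has_python": False, "has_degree": True},
    {"has_python": False, "has_degree": False},
]
labels = ["shortlist", "shortlist", "reject", "reject"]

# ID3 would split first on the attribute with the largest information gain.
best_feature = max(rows[0], key=lambda f: information_gain(rows, labels, f))
print(best_feature)
```

In this toy data set, `has_python` separates the two classes perfectly (gain of 1 bit), while `has_degree` carries no information (gain of 0), so the first split is on `has_python`.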
Summary:-
This research paper discusses the issue of high-volume resume processing during recruitment. It
suggests a resume parser based on machine learning and natural language processing (NLP) to
process resumes and identify key information. The objective is to make the extraction of important
information from resumes, which may be in different formats, automatic to assist HR departments
and recruiters in finding appropriate candidates more effectively.
• The solution involves extracting data from resumes using NLP techniques.
• The system seeks to transform unstructured resume data into a structured one for simpler
analysis.
• The analysis is conducted using machine learning algorithms such as Support Vector Machines and
Random Forest.
The research concludes that the suggested method is able to simplify the hiring process through the
automation of resume evaluation. The automation seeks to free time for both recruiters and
applicants, enhance candidate choice, and enhance general recruitment efficiency.
Chapter 3
Software Requirements Specification
3.1 Introduction
This chapter outlines the software requirements of the Resume Analyzer system. It defines the
objectives, scope, and constraints that guide the design and development process. The goal is to
ensure that all functional and non-functional requirements are clearly identified and addressed to
create a reliable, efficient, and scalable application.
3.1.1 Purpose
The primary purpose of the Resume Analyzer system is to automate the resume screening process by
using Natural Language Processing (NLP) techniques. It enables extraction of relevant candidate
information such as name, contact details, education, skills, and experience from uploaded PDF
files. The system streamlines the recruitment process by offering structured data that assists
recruiters or placement officers in faster and more accurate shortlisting of candidates.
3.1.3 Project Scope
3.3.1 User Interfaces
3.3.2 Hardware Interfaces
3.3.3 Software Interfaces
• Python 3.10.6
• Libraries: PyPDF2, spaCy, pandas
• Web: Flask Framework
The Resume Analyzer system is designed with the following key software quality attributes in mind:
• Reliability: The system must consistently extract accurate information from various resume formats
without crashing or malfunctioning.
• Maintainability: The modular structure of the code (separate functions for text extraction, parsing,
and display) allows for easy updates, debugging, and future enhancements.
• Scalability: Designed to support future improvements like batch resume processing or AI-based
analysis without major changes to the architecture.
The Data Flow Diagram (Level 0) represents the overall flow of data within the Resume Analyzer
system.
Data Flow Diagram:-The Data Flow Diagram of our Resume Analyzer illustrates the step-by-step
process where a student uploads a resume, the backend extracts text, applies NLP to identify key
fields like skills and education, and finally generates structured data with a resume score and course
recommendations visible to the TPO.
Figure 3.1: Data Flow Diagram
3.6.2 Entity Relationship Diagram
The ER Diagram models the logical relationships between data entities in the system:
Candidate: Stores basic information like name, email, and contact number.
Education: Stores educational qualifications such as degree, institution, and year.
Skills: Stores a list of identified technical and soft skills.
Experience: Stores past job roles, company names, and duration.
Relationships:
A Candidate can have one-to-many relationships with Education, Skills, and Experience entities.
Entity Relationship Diagram:- The ER diagram of our Resume Analyzer system shows how entities
like Student, Resume, Skills, Education, and Domain are linked—where each student uploads one
resume, which is parsed into related skills and education fields, and matched to relevant domains
using the backend keyword dataset.
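The one-to-many relationships between Candidate and the Education, Skills, and Experience entities can be sketched as a relational schema. The example uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for the MySQL database used by the project. The table names, column names, and sample record are assumptions derived from the entity descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each child table references candidate(candidate_id), giving the one-to-many links.
conn.executescript("""
CREATE TABLE candidate (
    candidate_id   INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    email          TEXT NOT NULL UNIQUE,
    contact_number TEXT
);
CREATE TABLE education (
    education_id INTEGER PRIMARY KEY,
    candidate_id INTEGER NOT NULL REFERENCES candidate(candidate_id) ON DELETE CASCADE,
    degree       TEXT,
    institution  TEXT,
    year         INTEGER
);
CREATE TABLE skill (
    skill_id     INTEGER PRIMARY KEY,
    candidate_id INTEGER NOT NULL REFERENCES candidate(candidate_id) ON DELETE CASCADE,
    skill_name   TEXT NOT NULL
);
CREATE TABLE experience (
    experience_id INTEGER PRIMARY KEY,
    candidate_id  INTEGER NOT NULL REFERENCES candidate(candidate_id) ON DELETE CASCADE,
    job_role      TEXT,
    company_name  TEXT,
    duration      TEXT
);
""")

# Hypothetical sample record: one candidate with two skills.
cursor = conn.execute(
    "INSERT INTO candidate (name, email, contact_number) VALUES (?, ?, ?)",
    ("Asha Rao", "asha@example.com", "9876543210"),
)
candidate_id = cursor.lastrowid
conn.executemany(
    "INSERT INTO skill (candidate_id, skill_name) VALUES (?, ?)",
    [(candidate_id, "Python"), (candidate_id, "Flask")],
)

skills = [
    row[0]
    for row in conn.execute(
        "SELECT skill_name FROM skill WHERE candidate_id = ? ORDER BY skill_name",
        (candidate_id,),
    )
]
print(skills)
```

Storing skills, education, and experience in separate child tables keeps each candidate row small and lets a dashboard filter candidates by any of these attributes with a simple join.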
Chapter 4
System Design
The Resume Analyzer system is structured to process resumes efficiently using a layered architecture.
The system is divided into modules so that each task is handled independently and can be easily
maintained or upgraded in the future.
The system architecture of the Resume Analyzer defines the flow of data and how different modules
interact with one another. It follows a modular architecture, ensuring separation of concerns and
making the system scalable and maintainable.
System Architecture Diagram:- The system architecture of the Resume Analyzer represents the interaction
between the user interface, Flask-based backend, NLP engine, and the database to enable automated resume
parsing and analysis.
1. User Upload Interface:
The user uploads a resume in PDF format through a command-line interface or web-based
dashboard.
2. PDF Text Extraction Layer:
The uploaded PDF is processed using PyPDF2, which reads and extracts the raw textual content.
3. NLP Processing Layer:
The extracted text is passed to spaCy, which performs Named Entity Recognition (NER) to detect
fields like:
Name
Email
Phone Number
Education
Skills
Experience
4. Regex Extraction:
In addition to NLP, regular expressions are applied for pattern-based extraction of contact
information such as email IDs and phone numbers.
5. Data Structuring Module:
Parsed data is cleaned and formatted into a structured dictionary or JSON format. Optionally, it can
be stored in a CSV or database for future use.
6. Output Display Layer:
The extracted and structured data is displayed back to the user or exported, based on the
implementation type.
This layered structure improves readability, reusability, and debugging during development.
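The regex extraction and data structuring steps (items 4 and 5 above) can be sketched as follows. This is an illustrative example under stated assumptions, not the project's code: the patterns, the assumed phone format (an optional +91 or 0 prefix followed by a 10-digit mobile number), the "Skills:" line convention, and the function names are all hypothetical.

```python
import json
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed phone format: optional "+91" or "0" prefix, then a 10-digit number starting with 6-9.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?|0)?([6-9]\d{9})(?!\d)")
# Assumed layout: skills listed on one line as "Skills: a, b, c".
SKILLS_RE = re.compile(r"(?im)^skills\s*:\s*(.+)$")


def structure_resume(text):
    """Extract contact details and skills from raw resume text into a dictionary."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    skills_line = SKILLS_RE.search(text)
    skills = [s.strip() for s in skills_line.group(1).split(",")] if skills_line else []
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(1) if phone else None,
        "skills": skills,
    }


raw_text = (
    "Asha Rao\n"
    "Email: asha.rao@example.com | Phone: +91-9876543210\n"
    "Skills: Python, Flask"
)
record = structure_resume(raw_text)
print(json.dumps(record, indent=2))  # structured output ready for storage or display
```

Fields that cannot be found are returned as `None` or an empty list, so later layers can tell a missing value apart from an extraction error.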
The Use Case Diagram showcases the interaction between the user (actor) and the Resume Analyzer
system. It helps in identifying all the services (use cases) offered by the system from the user's
perspective.
Actors:
Use Cases:
Upload Resume
Register
View Resume Score
Login
View All Resumes
UML Diagram:- The UML diagram illustrates the main classes of the Resume Analyzer system (such as
ResumeParser, TextExtractor, EntityRecognizer, and User) and their relationships, showing how data flows
and objects interact within the application.
The Class Diagram defines the main classes in the system and the relationships between them. It reflects
the object-oriented structure of the project.
🔹 1. User
user_id: int
name: str
email: str
password: str
role: str (e.g., "Student", "TPO")
🔹 4. ResumeParser
resume_text: str
extracted_skills: list
extracted_education: list
extracted_experience: list
extracted_contact: dict (email, phone)
🔹 5. NLPAnalyzer
keywords_dataset: dict
matched_keywords: list
score: int
🔹 6. CourseRecommender
domain_keywords: dict
missing_skills: list
suggested_courses: list
🔹 7. DatabaseManager
connection_string: str
student_records: list
resume_data: dict
course_mapping: dict
Relationships:
These classes allow for encapsulated logic and cleaner integration of new modules in the future.
Class Diagram:- The class diagram outlines the core components of the system, including classes like
ResumeParser, TextExtractor, EntityRecognizer, and ResultFormatter, highlighting their attributes
and interactions within the object-oriented structure of the application.
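A few of the classes listed above can be sketched as Python dataclasses to show how their attributes fit together. The attribute names follow the class listing. The role check, the `analyze` method, and the sample data are assumptions added for illustration and are not taken from the project's source.

```python
from dataclasses import dataclass, field

VALID_ROLES = {"Student", "TPO"}


@dataclass
class User:
    user_id: int
    name: str
    email: str
    password: str  # assumption: a real system would store a password hash here
    role: str = "Student"

    def __post_init__(self):
        # Reject roles other than the two named in the class listing.
        if self.role not in VALID_ROLES:
            raise ValueError(f"unknown role: {self.role}")


@dataclass
class ResumeParser:
    resume_text: str
    extracted_skills: list = field(default_factory=list)
    extracted_education: list = field(default_factory=list)
    extracted_experience: list = field(default_factory=list)
    extracted_contact: dict = field(default_factory=dict)


@dataclass
class NLPAnalyzer:
    keywords_dataset: dict
    matched_keywords: list = field(default_factory=list)
    score: int = 0

    def analyze(self, parsed):
        """Hypothetical scoring: percentage of known keywords found among parsed skills."""
        known = {kw.lower() for kws in self.keywords_dataset.values() for kw in kws}
        found = {skill.lower() for skill in parsed.extracted_skills}
        self.matched_keywords = sorted(known & found)
        self.score = round(100 * len(self.matched_keywords) / len(known)) if known else 0
        return self.score


parsed = ResumeParser(resume_text="...", extracted_skills=["Flask", "HTML", "Docker"])
analyzer = NLPAnalyzer(keywords_dataset={"Web Development": ["HTML", "CSS", "Flask", "JavaScript"]})
score = analyzer.analyze(parsed)
print(score, analyzer.matched_keywords)
```

Keeping parsing results (`ResumeParser`) separate from scoring (`NLPAnalyzer`) mirrors the separation of concerns described for the class diagram.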
Chapter 5
Technical Specification
This chapter covers the frontend (HTML, CSS, JavaScript), the backend (Flask, Python), the database
(MySQL), and system requirements such as authentication and automated report generation. The import
section of the backend is listed below.
# Standard library
import base64
import datetime
import io
import json
import os
import random
import re
import time

# Third-party: PDF parsing, NLP, data handling, plotting, database access
import pandas as pd
import pdfplumber
import plotly.express as px
import pymysql
import PyPDF2
import requests
import spacy
from pdfminer3.converter import TextConverter
from pdfminer3.layout import LAParams, LTTextBox
from pdfminer3.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer3.pdfpage import PDFPage
from PIL import Image
from pyresparser import ResumeParser
from yt_dlp import YoutubeDL

# Project-local course lists used for recommendations
from Courses import (
    ds_course, web_course, android_course, ios_course, uiux_course,
    resume_videos, interview_videos, cybersecurity_course, game_dev_course,
    blockchain_course, cloud_computing_course, big_data_course,
    networking_course,
    robotics_course, quantum_computing_course, ai_course, vr_ar_course
)

# Load the small English spaCy model once at import time
nlp = spacy.load('en_core_web_sm')
5.1.2 HTML Code (Snippet)
The snippets below are excerpts from the HTML templates rendered by the Flask backend: the home
page, the student login form, and the admin (TPO) login form. The templates use template blocks to
show status and error messages and CSS utility classes for form layout and alert styling.
1. Home.html
2. Login.html
{% if message %}
<div class="alert alert-info">{{ message }}</div>
{% endif %} {% if error %}
<div class="alert alert-danger">{{ error }}</div>
{% endif %}
required
/>
</div>
<div class="d-grid">
<button type="submit" class="btn btn-primary">Login</button>
</div>
<div class="mt-4 text-center">
<p>
Don't have an account?
<a href="{{ url_for('register') }}">Register here</a>
</p>
<hr />
<p class="small text-muted">
For TPO Login, use the credentials provided by your institution
</p>
</div>
</form>
</div>
</div>
</div>
</div>
{% endblock %}
3. Admin_Login.html
{% if error %}
<div class="alert alert-danger">{{ error }}</div>
{% endif %}
<form method="POST">
<div class="mb-3">
<label for="username" class="form-label">Username</label>
<input
type="text"
class="form-control"
id="username"
name="username"
required
/>
</div>
<div class="mb-3">
<label for="password" class="form-label">Password</label>
<input
type="password"
class="form-control"
id="password"
name="password"
required
/>
</div>
<div class="d-grid">
<button type="submit" class="btn btn-primary">Sign In</button>
</div>
</form>
</div>
</div>
</div>
</div>
{% endblock %}
Chapter 6
RESULT DISCUSSION:
This chapter presents the output screens of the Resume Analyzer system: the home page, the student
login with its resulting output, and the admin (TPO) login with its resulting output.
1. Home Page
2.Student Login:
Output:
3.Admin Login:
Output:
Chapter 7:
Glossary:
Resume Analyzer – A system that reads and interprets resumes using Natural Language Processing
to extract structured data.
Natural Language Processing (NLP) – A branch of AI used to analyze and understand human
language, enabling the system to identify useful information from resumes.
spaCy – An open-source Python library used in the project for processing and extracting named
entities such as names, degrees, and skills from text.
PyPDF2 – A Python library used to extract raw text from PDF resume files.
Named Entity Recognition (NER) – An NLP technique used to locate and classify entities such as
names, organizations, locations, and dates within text.
Candidate – The person whose resume is uploaded and analyzed by the system.
Recruiter / TPO – The user who accesses analyzed resume data to evaluate candidate
qualifications and suitability.
Dashboard – A user interface for the Training and Placement Officer (TPO) to view structured
resume data and scores.
PDF Resume – The input document format accepted by the system for text extraction and analysis.
Regex (Regular Expressions) – A pattern-matching tool used to extract specific data like emails
and phone numbers from raw text.
Text Extraction – The process of converting PDF content into plain text using libraries like
PyPDF2.
Structured Output – The formatted and organized result of resume analysis, which includes fields
such as name, skills, and education.
Resume Score – A numerical evaluation of a resume’s quality or relevance based on extracted
content and predefined metrics.
Authentication – The process that allows only authorized users (students and TPOs) to log in and
access the system functionalities.
Course Recommendation – A system feature that suggests additional courses based on the skills
detected in a student's resume.
Chapter 8
Conclusion
The Resume Analyzer using Natural Language Processing provides an efficient, automated solution
for Training and Placement Officers (TPOs) in colleges to screen and analyze student resumes. By
using Python libraries such as spaCy and PyPDF2, the system extracts essential details like name,
contact info, education, skills, and experience from PDF files and presents them in a structured
format. The inclusion of a resume scoring feature makes it even more helpful in evaluating
candidates based on predefined parameters.
This tool reduces manual workload, minimizes human errors, and speeds up the shortlisting process.
It demonstrates how NLP can be practically applied in academic and recruitment environments. The
modular design ensures maintainability and offers a base for future enhancements tailored to campus
placement needs.
By using this system, colleges can move towards a more digitized and data-driven placement process.
It brings transparency, speed, and consistency, making it easier for recruiters to find the right
candidates and for students to better understand their resume strengths.
Future Scope:
1. Multiformat and Multilingual Support: The system can be extended to accept .docx, .txt, and
image-based resumes through OCR, as well as support regional languages to cater to a wider range of
students.
2. Web-Based Portal for TPOs: A user-friendly web dashboard can be developed for TPOs to
upload resumes, track resume scores, and filter candidates based on different attributes like skills,
branch, or academic year.
3. Integration with Company Portals: The system can connect with external job portals or
employer APIs to automatically match student profiles with available job opportunities.
4. Cloud-Based Deployment: Hosting the system on cloud platforms like AWS or Google Cloud
will enable access across departments or institutions, allowing centralized and scalable placement
management.
5. Student Feedback and Analytics Module: A feedback system from recruiters and tracking
analytics for placed students can be added to improve future recruitment strategies and understand
hiring trends.
Chapter 9
References