Data Analysis Project Anandita
A PROJECT REPORT
Submitted by
ANANDITA SAMANT
BACHELOR OF BUSINESS
ADMINISTRATION
SCHOOL OF MANAGEMENT
BHUBANESWAR CAMPUS
CENTURION UNIVERSITY OF TECHNOLOGY AND MANAGEMENT
ODISHA
MAY 2025
BONAFIDE CERTIFICATE
Certified that this project report “DATA ANALYSIS THROUGH MICROSOFT EXCEL”
is the bona fide work of ANANDITA SAMANT, who carried out the project work under
my supervision. This is to further certify, to the best of my knowledge, that this project
has not been carried out earlier in this institute or the university.
SIGNATURE
Certified that the above-mentioned project has been duly carried out as per the
norms of the college and statutes of the university.
SIGNATURE
(Dr. Ronismita Mishra / Dr. Anshuman Jena)
HEAD OF THE DEPARTMENT / DEAN OF THE SCHOOL
Lecturer of School of Management
DEPARTMENT SEAL
DECLARATION
I hereby declare that the project entitled “Data Analysis Through Microsoft Excel” submitted
for the 2nd Semester BBA is my original work and that the project has not formed the basis
for the award of any Degree / Diploma or any other similar title in any other University
/ Institute.
Place: Bhubaneswar
Date: 14 / 05 / 2025
ACKNOWLEDGEMENTS
I am highly grateful to my team members, who showed keen interest and gave invaluable
support in the progress and successful completion of my project work.
I am indebted to my parents for their constant encouragement, co-operation and
help. Words of gratitude are not enough to describe the accommodation and fortitude
which they have shown throughout my endeavor.
Place: Bhubaneswar
Date: 14 / 04 / 2025
TABLE OF CONTENTS
CERTIFICATE i
DECLARATION ii
ACKNOWLEDGEMENT iii
LIST OF ACRONYMS iv
LIST OF TABLE v
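1. CHAPTER – 1 DO A PRELIMINARY EXPLORATION OF THE DATASET AND NOTE DOWN ANYTHING
THAT STANDS OUT.
2. CHAPTER – 2 FIND HOW MANY BLANK CELLS THERE ARE IN GENDER COLUMN.
3. CHAPTER – 3 CHANGE CITY DEVELOPMENT INDEX COLUMN TO PERCENTAGE AND
CLEAN ANY DATA WITH UNNECESSARY CHARACTERS.
4. CHAPTER – 4 FORMAT NO_ENROLLMENT PROPERLY TO "NO ENROLLMENT". WHILE
FORMATTING, KEEP ORIGINAL COLUMN, DO IT IN A SEPARATE COLUMN AND HIDE
THE ORIGINAL COLUMN.
5. CHAPTER – 5 FORMAT EXPERIENCE COLUMN FROM ">20" TO "20+" USING "=IF".
6. CHAPTER – 6 PROPERLY FORMAT COMPANY SIZE COLUMN - CHANGE DATA TYPE TO
TEXT TO FIX OCT-49 TO 10-49.
7. CHAPTER – 7 SORT DATASET IN ASCENDING MANNER ACCORDING TO ENROLLEE ID.
8. CHAPTER – 8 COUNT TOTAL NUMBER OF PEOPLE LOOKING FOR A NEW JOB. (1) IN
TARGET COLUMN.
9. CHAPTER – 9 VLOOKUP USING ENROLLEE ID "10653" TO FIND CORRESPONDING
TRAINING HOURS.
10. CHAPTER – 10 FILTER AND REMOVE ALL ENROLLEES WITH LAST NEW JOB LESS THAN
OR EQUAL TO 2.
11. CHAPTER – 11 FILTER OUT ALL ENROLLEES WITH EXPERIENCE >=5,<=10.
12. CHAPTER – 12 FILTER OUT ENROLLEES WITH EDUCATION <GRADUATE.
13. CHAPTER – 13 FILTER ALL ENROLLEES TO SHOW THOSE WITH RELEVANT EXPERIENCE
ONLY.
14. CHAPTER – 14 FILTER OUT ANY ENROLLEE THAT HAS ANY DATA POINT AS A BLANK.
15. CHAPTER – 15 FIND BEST OVERALL ENROLLEE AS A CANDIDATE ACCORDING TO
INFORMATION, USE YOUR OWN JUDGEMENT AND JUSTIFY WHY THE SPECIFIC ENROLLEE IS
THE BEST CHOICE.
16. CHAPTER – 16 USE PIVOT TABLE AND PIVOT CHART ALONG WITH APPROPRIATE
SLICERS TO VISUALISE THE REMAINDER OF THE DATASET AND SHOW REASONS FOR
YOUR CHOICE OF THE BEST ENROLLEE. THOSE WHO DO NOT HAVE THE FEATURE,
MENTION YOUR EXCEL VERSION AND THAT THE FEATURE IS NOT
AVAILABLE FOR YOU.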
TASK-1 Do a preliminary exploration of the dataset and note down anything that
stands out.
TASK- 2 FIND HOW MANY BLANK CELLS THERE ARE IN
GENDER COLUMN.
The Excel screenshot shows a dataset, specifically from Sheet1, which contains
information about individuals (enrollees) potentially seeking new job opportunities. Each
row represents a unique person, and the columns provide detailed attributes such as
enrollee ID, city, city development index, gender, relevant experience, enrollment status,
education, major discipline, work experience, company size and type, years since the last
new job, training hours, and a target variable indicating if the person is currently looking
for a job (where 1 = yes and 0 = no).
In cell D2002, a formula =COUNTBLANK(D2:D2001) has been used to count the
number of blank cells in the gender column (Column D). The result, 448, indicates that
gender information is missing for 448 enrollees in the dataset. This is an important
observation because such missing data can affect analysis, particularly if gender-based
comparisons or diversity-focused evaluations are being made. Identifying and addressing
this gap would be a necessary step during the data cleaning process to ensure the dataset's
completeness and reliability.
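For readers who want to cross-check the COUNTBLANK result outside Excel, the same count can be
sketched in Python. The sample rows and the helper name count_blank are made up for illustration
and are not taken from the dataset; only the column name "gender" comes from the report.

```python
# Hypothetical sample rows; the real sheet has 2,000 data rows (D2:D2001).
rows = [
    {"enrollee_id": 1, "gender": "Male"},
    {"enrollee_id": 2, "gender": ""},
    {"enrollee_id": 3, "gender": "Female"},
    {"enrollee_id": 4, "gender": None},
]


def count_blank(records, column):
    """Count records whose value in `column` is missing or an empty string,
    which is what =COUNTBLANK does for a range of cells."""
    return sum(1 for r in records if r.get(column) in (None, ""))


blank_genders = count_blank(rows, "gender")
print(blank_genders)
```

Running the same helper on every column gives a quick overview of where data is missing.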
TASK-3 CHANGE CITY DEVELOPMENT INDEX COLUMN TO
PERCENTAGE AND CLEAN ANY DATA WITH UNNECESSARY
CHARACTERS.
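Remove any extra characters if present (such as symbols or text) so that every value is a plain decimal,
and format the column as a percentage. Excel's Percentage format displays the stored decimal multiplied
by 100 (0.92 is shown as 92%), so the values do not also need to be multiplied by 100 by hand. This is
useful because expressing the city development index as a percentage makes it easier for people to
understand and compare the development levels between cities.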
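The cleaning step can be sketched in Python as follows. This is a minimal illustration, assuming
that the unwanted characters are anything other than digits and the decimal point; the sample
values and the helper name to_percentage are hypothetical. The multiplication by 100 here is only
for building the display text, since Python has no cell format to do it.

```python
import re


def to_percentage(raw):
    """Strip everything except digits and the decimal point, then format
    the remaining decimal as a percentage string with one decimal place."""
    cleaned = re.sub(r"[^0-9.]", "", str(raw))
    if cleaned in ("", "."):
        return ""  # nothing numeric left, so leave the cell blank
    return f"{float(cleaned) * 100:.1f}%"


# Made-up examples of values with stray characters.
samples = ["0.92", " 0.776*", "0.624#"]
percentages = [to_percentage(s) for s in samples]
print(percentages)
```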
TASK-4 FORMAT NO_ENROLLMENT PROPERLY TO "NO ENROLLMENT".
WHILE FORMATTING, KEEP ORIGINAL COLUMN, DO IT IN A SEPARATE
COLUMN AND HIDE THE ORIGINAL COLUMN.
Create a new column where each entry like "no_enrollment" is formatted as "No Enrollment" (with
proper capitalization), while keeping the original column hidden. This improves data presentation and
makes the values more readable and professional without losing the raw data.
This section of the dataset shows the new, formatted version of the “enrolled_university” column,
which I created with an IF formula to make the data cleaner and more readable. I added a new
column next to the original one and entered the formula =IF(G2="no_enrollment", "No Enrollment",
G2). The formula checks whether the value in cell G2 (from the original "enrolled_university"
column) equals “no_enrollment”. If it does, it returns the more readable “No Enrollment”, with
proper capitalization and spacing. If the value is something else, such as “Full time course” or
“Part time course”, it keeps that value unchanged.
I then dragged the formula down the new column to apply it to all rows. As a result, the raw value
"no_enrollment" is converted into the user-friendly "No Enrollment", making the dataset easier to
read and present. The original column is kept for reference and can be hidden afterward to avoid
confusion. This step matters in data cleaning because properly formatted values are easier to
understand and better suited to reports, dashboards, and visualizations.
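The logic of the IF formula can be sketched in Python. The function name format_enrollment is
hypothetical; the three sample values are the ones mentioned in this section. As in the sheet, the
original list is left untouched and the result goes into a separate list.

```python
def format_enrollment(value):
    """Python counterpart of =IF(G2="no_enrollment", "No Enrollment", G2).
    Note: Excel's = comparison ignores letter case, while Python's == does not."""
    return "No Enrollment" if value == "no_enrollment" else value


original = ["no_enrollment", "Full time course", "Part time course"]
formatted = [format_enrollment(v) for v in original]  # the new, separate column
print(formatted)
```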
TASK- 5 FORMAT EXPERIENCE COLUMN FROM ">20" TO "20+" USING "=IF".
Use a formula like =IF(K2=">20","20+",K2) to replace entries like ">20" with "20+" in a new column.
This makes the experience values more user-friendly and consistent for interpretation or reporting.
In the Excel sheet visible in the screenshot, I standardized the experience values to make them
consistent and ready for analysis. In column J, I used a formula to recategorize the values
from column K, which represent a count of years or a range, producing entries such as "0",
"6", "9" and "20+".
The formula used in cell J2 is:
=IF(K2=">20","20+",IF(K2="<1","0",K2))
It returns "20+" when K2 contains ">20", "0" when K2 contains "<1", and the original value in
every other case.
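The nested IF can be read as a chain of checks, sketched here in Python. The function name
format_experience and the sample list are illustrative assumptions; the three branches follow the
formula in cell J2.

```python
def format_experience(value):
    """Mirror the nested IF: '>20' becomes '20+', '<1' becomes '0',
    and any other value is returned unchanged."""
    if value == ">20":
        return "20+"
    if value == "<1":
        return "0"
    return value


# Made-up sample of raw experience entries.
raw_experience = [">20", "<1", "6", "9"]
clean_experience = [format_experience(v) for v in raw_experience]
print(clean_experience)
```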
TASK-6 PROPERLY FORMAT COMPANY SIZE COLUMN - CHANGE DATA
TYPE TO TEXT TO FIX OCT-49 TO 10-49.
Format the "company size" column and fix date-like errors such as "Oct-49". Change the data
type of the company size column to text to prevent Excel from auto-converting entries
like "10-49" into dates like "Oct-49". This step is necessary because it preserves the
original meaning of the data and prevents misinterpretation.
In this step, I addressed the formatting issue in the "Company Size" column (Column
N) of the Excel dataset. Some entries, such as "10-49", were being interpreted as dates
and auto-converted by Excel into "Oct-49", which misrepresents the actual data. To fix
this, I selected the entire column and opened the "Format Cells" dialog box. Under the
"Number" tab, I changed the cell format to "Text". This ensures that Excel treats all
values in the column as literal text strings rather than trying to parse them as dates or
numbers. The Text format only protects values entered or imported after it is applied; a
cell that Excel has already converted to a date has to be re-entered as 10-49 once the
column is formatted as text.
Converting the format to text preserves the integrity of categorical data such as
"10-49", "<10" and "10000+" and avoids future misinterpretation by Excel. This is a
critical data cleaning step, especially in datasets where categorical ranges are easily
confused with date formats.
TASK-7 SORT DATASET IN ASCENDING MANNER ACCORDING
TO ENROLLEE ID
Use the Sort function to arrange all rows from the smallest to largest enrollee ID. Sorting
makes the dataset orderly and helps in easily locating specific records, especially when
using lookup functions.
In this step, I sorted the dataset in ascending order based on the "enrollee_id" column.
I clicked the dropdown arrow in the header cell of the "enrollee_id" column (Column A),
which opened the filter menu, and selected “Sort Smallest to Largest” from the available
options. Excel then rearranged all rows in the spreadsheet so that the enrollee records
run from the lowest ID number to the highest.
Sorting puts the dataset in a logical order, which is particularly useful for tasks such as
VLOOKUPs, checking for duplicates, or reviewing individual records systematically. It is
a key data preparation step for making the dataset more accessible and analysis-ready.
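The effect of “Sort Smallest to Largest” can be sketched in Python. The rows below are invented
for illustration; the point is that whole rows move together, so each ID keeps its own data.

```python
# Hypothetical rows in their unsorted order.
rows = [
    {"enrollee_id": 10653, "training_hours": 12},
    {"enrollee_id": 41, "training_hours": 30},
    {"enrollee_id": 2050, "training_hours": 7},
]

# sorted() returns a new list ordered by enrollee_id, smallest first,
# and leaves the original list as it was.
sorted_rows = sorted(rows, key=lambda r: r["enrollee_id"])
print([r["enrollee_id"] for r in sorted_rows])
```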
TASK-8 COUNT TOTAL NUMBER OF PEOPLE LOOKING FOR A
NEW JOB. (1) IN TARGET COLUMN.
Count total number of people looking for a new job (where target = 1)
Use a formula like =COUNTIF(target_range, 1) to count how many enrollees have their
"target" value marked as 1. This shows how many people in the dataset are actively
seeking new job opportunities.
In the Excel sheet shown, I used the COUNTIF function in cell Q1 to count the individuals
marked with a "1" in the 'target' column (column 'P'), that is, those who are actively
looking for a new job. The current total is 505.
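As a small illustration of what COUNTIF does, the count can be sketched in Python. The list of
target values is made up and is much shorter than the real column.

```python
# Hypothetical target values: 1 = looking for a new job, 0 = not looking.
targets = [1, 0, 0, 1, 1, 0]

# Equivalent of =COUNTIF(target_range, 1): count the entries equal to 1.
job_seekers = sum(1 for t in targets if t == 1)
print(job_seekers)
```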
TASK-9 VLOOKUP USING ENROLLEE ID "10653" TO FIND
CORRESPONDING TRAINING HOURS.
Here I use the VLOOKUP function, for example in cell C6. It takes the enrollee ID held in
cell A6 (the ID "41" in the row shown) and searches for it in the first column of the data
range, which runs from A2 at the top left down to O2004. The third argument, 13, tells
Excel that once it finds the ID in the first column, it should move across to the 13th
column of that same row and return the value there. The last argument, 0, requires an
exact match for the enrollee ID, so a value that is merely close is not accepted. In this
way the formula pulls the 'training_hours' for each person based on their ID; the same
formula with 10653 as the lookup value gives the training hours asked for in the task.
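How an exact-match VLOOKUP works can be sketched in Python. The helper vlookup_exact and the
three-column sample table are hypothetical; in the real sheet the training hours sit in the 13th
column rather than the 3rd, and the hours shown here are invented.

```python
def vlookup_exact(table, lookup_value, col_index):
    """Scan the first column of `table` for `lookup_value` and return the
    value from the 1-based column `col_index` of the first matching row.
    Returns None when nothing matches (Excel would show #N/A)."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None


# Made-up rows: enrollee_id, city, training_hours.
table = [
    (41, "city_a", 25),
    (10653, "city_b", 18),
]

hours = vlookup_exact(table, 10653, 3)
print(hours)
```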
TASK- 10 FILTER AND REMOVE ALL ENROLLEES WITH LAST
NEW JOB LESS THAN OR EQUAL TO 2.
Filter and remove all enrollees with "last new job" less than or equal to 2
Apply a filter to find the rows where the "last new job" value is 2 or less, and remove
them. This narrows the data to candidates who are either more experienced or have a more
stable job history, which may be more desirable.
Here I am filtering the "last_new_job" column (column 'N'). I clicked the dropdown arrow
next to the column title, which opened the filter dialog, and set it to show only the rows
where "last_new_job" is less than or equal to 2. This temporarily hides everyone who got
their last new job more than two years ago, so the rows left on screen are exactly the
ones the task asks me to remove. Once those rows are deleted and the filter is cleared,
the dataset keeps only the enrollees whose last job change was more than two years ago.
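The filter-and-remove step can be sketched in Python. This assumes, for simplicity, that
"last_new_job" holds plain numbers; the rows are invented for illustration.

```python
# Hypothetical rows; last_new_job is the number of years since the last job change.
rows = [
    {"enrollee_id": 1, "last_new_job": 1},
    {"enrollee_id": 2, "last_new_job": 4},
    {"enrollee_id": 3, "last_new_job": 2},
    {"enrollee_id": 4, "last_new_job": 3},
]

# Rows with a value of 2 or less are the ones to remove ...
to_remove = [r for r in rows if r["last_new_job"] <= 2]
# ... so the cleaned dataset keeps only values greater than 2.
kept = [r for r in rows if r["last_new_job"] > 2]
print([r["enrollee_id"] for r in kept])
```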
TASK-11 FILTER OUT ALL ENROLLEES WITH EXPERIENCE >=5,<=10.
Filter out enrollees with experience between 5 and 10 years
Use number filters to exclude all rows where experience is 5, 6, 7, 8, 9, or 10. This is
useful when focusing on junior or senior-level candidates only and skipping the mid-range.
Here I am working with the "experience" column (column 'J') to filter out everyone with a
middling amount of experience, between 5 and 10 years. I clicked the filter arrow and set
the filter to show the rows where "experience" is greater than or equal to 0 and less than
or equal to 4, together with the rows where "experience" is greater than or equal to 11.
These two conditions keep the enrollees with less experience and those with more
experience, and hide everyone in the 5 to 10 year range. This lets me focus on either the
newcomers or the more seasoned people in the dataset.
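The two filter conditions can be sketched in Python as a single test. The sketch assumes the
experience values are whole numbers of years; the helper name outside_mid_range and the sample
list are hypothetical.

```python
def outside_mid_range(years):
    """True for 0 to 4 years or 11 and above, i.e. everything
    except the 5 to 10 year range that the task filters out."""
    return years <= 4 or years >= 11


# Made-up experience values in years.
experience = [0, 3, 5, 7, 10, 11, 20]
kept = [y for y in experience if outside_mid_range(y)]
print(kept)
```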
TASK-12 FILTER OUT ENROLLEES WITH EDUCATION <GRADUATE.
Filter out enrollees with education less than "Graduate"
Filter out entries such as "High School" or "Primary School" to keep only those who are
graduates or above. This ensures that only candidates with a minimum education level are
considered for certain roles.
In this screenshot, a filter is being applied to the "education_level" column of the dataset
to exclude candidates with lower educational qualifications. Specifically, only
"Graduate," "Masters," and "PhD" levels are selected, while "High School," "Primary
School," and blank entries are unchecked. This filtering step is crucial for narrowing
down the candidate pool to those who meet a minimum education threshold, which is
often required for more skilled or specialized job roles. By focusing only on higher
education levels, this step helps ensure that the analysis considers only candidates likely
to meet certain professional or technical requirements.
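The checkbox selection described above can be sketched in Python as a membership test. The set of
kept levels comes from this section; the sample rows are invented.

```python
# Levels left checked in the filter, as listed in this section.
KEEP_LEVELS = {"Graduate", "Masters", "PhD"}

# Hypothetical rows, including a blank education level.
rows = [
    {"enrollee_id": 1, "education_level": "Graduate"},
    {"enrollee_id": 2, "education_level": "High School"},
    {"enrollee_id": 3, "education_level": ""},
    {"enrollee_id": 4, "education_level": "PhD"},
]

# Keep a row only if its level is one of the checked values.
kept = [r for r in rows if r["education_level"] in KEEP_LEVELS]
print([r["enrollee_id"] for r in kept])
```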
TASK- 13 FILTER ALL ENROLLEES TO SHOW THOSE WITH
RELEVANT EXPERIENCE ONLY.
Filter to show only enrollees with relevant experience
Use the filter to keep only those who have “Has relevant experience” in the respective
column. This focuses the analysis on candidates who are more likely to fit into the job
quickly because of relevant past work.
In the "relevant_experience" column (column 'F'), I want to see only the people whose
experience is considered relevant. I clicked the filter arrow next to the column title and
unchecked the box next to "No relevant experience". Now only the rows marked "Has
relevant experience" are visible, and all other rows are hidden. This helps me focus on
the candidates who are more likely to hit the ground running because they have the right
kind of background.
TASK-14 FILTER OUT ANY ENROLLEE THAT HAS ANY DATA
POINT AS A BLANK.
Filter out any enrollee that has any data point as blank
Apply a filter or use a formula to detect and remove any row that has a missing value in
any column. Clean data without blanks ensures better accuracy in analysis and avoids
errors in charts or calculations.
Here I am cleaning up the data. Looking at the whole spreadsheet, I want to get rid of any
row that has missing information, that is, a blank cell in any of the columns: city,
gender, relevant experience, education, major, and so on. The filtering has not yet been
applied in this screenshot. One approach is to go through each column, one by one, and
filter out the blanks. Alternatively, Excel's Go To Special dialog can select all the blank
cells in the sheet at once, after which the entire rows can be deleted. This makes sure
that my analysis and charts are not distorted by incomplete data.
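The rule "drop a row if any cell is blank" can be sketched in Python. The helper name is_complete
and the sample rows, including the city labels, are made up for illustration.

```python
def is_complete(row):
    """True only when no value in the row is missing or an empty string."""
    return all(value not in (None, "") for value in row.values())


# Hypothetical rows with some missing values.
rows = [
    {"enrollee_id": 1, "city": "city_a", "gender": "Male"},
    {"enrollee_id": 2, "city": "city_b", "gender": ""},
    {"enrollee_id": 3, "city": None, "gender": "Female"},
]

complete_rows = [r for r in rows if is_complete(r)]
print([r["enrollee_id"] for r in complete_rows])
```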
TASK-15 FIND BEST OVERALL ENROLLEE AS A CANDIDATE
ACCORDING TO INFORMATION, USE YOUR OWN JUDGEMENT AND
JUSTIFY WHY THE SPECIFIC ENROLLEE IS THE BEST CHOICE.
Here I have taken the cleaned-up data and tried to work out who the best candidate is. This
is not something a single formula can point to. Instead, I look at everything together:
education level, amount of experience, whether it is the right kind of experience, and how
long it has been since the last job change. I am looking for someone who ticks many of the
boxes. For example, someone with a high level of education, a good amount of relevant
experience and a large number of training hours would probably be a very strong candidate.
In effect, I am weighing the different factors and trying to find the best balance of
qualities in one person.
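One way to make such a judgement repeatable is to compare candidates on an explicit priority
order. The Python sketch below is only an illustration and is not the method used in this report:
the priority order (education first, then experience, then training hours), the education ranks
and the candidates are all assumptions.

```python
# Assumed ranking of education levels; a higher number is better.
EDUCATION_RANK = {"Graduate": 1, "Masters": 2, "PhD": 3}

# Made-up candidates for illustration only.
candidates = [
    {"enrollee_id": 101, "education_level": "Graduate", "experience": 12, "training_hours": 40},
    {"enrollee_id": 102, "education_level": "PhD", "experience": 15, "training_hours": 90},
    {"enrollee_id": 103, "education_level": "PhD", "experience": 15, "training_hours": 20},
]


def priority_key(candidate):
    """Compare by education rank first, then experience, then training hours."""
    return (
        EDUCATION_RANK[candidate["education_level"]],
        candidate["experience"],
        candidate["training_hours"],
    )


best = max(candidates, key=priority_key)
print(best["enrollee_id"])
```

A different priority order, or a weighted score, would be just as easy to substitute; the choice
should follow the requirements of the role.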
TASK-16 USE PIVOT TABLE AND PIVOT CHART ALONG WITH APPROPRIATE
SLICERS TO VISUALISE THE REMAINDER OF THE DATASET AND SHOW REASONS
FOR YOUR CHOICE OF THE BEST ENROLLEE. THOSE WHO DO NOT HAVE THE
FEATURE, MENTION YOUR EXCEL VERSION AND THAT THE FEATURE IS NOT
AVAILABLE FOR YOU
Use a Pivot Table and Pivot Chart with slicers to visualize data and support your best
candidate choice
Create a pivot table to summarize key information (e.g., average training hours by
education level) and use a pivot chart with slicers for interactivity. This visualization
makes your choice of the best candidate clearer and easier to present, showing how they
compare with others in important areas.
Here I have taken the pivot table I made and added slicers. The boxes for 'enrollee_id',
'city_development_index', 'experience' and 'training_hours' are interactive filters for the
pivot table and for any charts built from it. If I click a specific enrollee ID in that
slicer, the pivot table shows only the data related to that person. The same applies to the
city index, experience level and training hours. This makes it easy to drill down into the
data and see how different groups or individuals compare on these key measures. For
example, I could click the highest training hours in the slicer and see which education
levels are most common for those individuals. It is a handy way to explore the data and
visually back up why I picked a certain person as the best candidate.
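The kind of summary a pivot table produces, such as average training hours by education level, can
be sketched in Python. The helper name average_by and the records are hypothetical; the grouping
and averaging is what the pivot table does behind the scenes.

```python
from collections import defaultdict


def average_by(records, group_key, value_key):
    """Group records by `group_key` and return the mean of `value_key`
    for each group, like a pivot table set to Average."""
    totals = defaultdict(lambda: [0, 0])  # group -> [running sum, count]
    for record in records:
        bucket = totals[record[group_key]]
        bucket[0] += record[value_key]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up records for illustration only.
records = [
    {"education_level": "Graduate", "training_hours": 40},
    {"education_level": "Graduate", "training_hours": 60},
    {"education_level": "Masters", "training_hours": 30},
]

summary = average_by(records, "education_level", "training_hours")
print(summary)
```

Filtering the records before calling the helper plays the role of a slicer.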
ASSESSMENT
Internal:
SL NO   RUBRICS                                                           FULL MARK   MARKS OBTAINED   REMARKS
1       Understanding the relevance, scope and dimension of the project   10
2       Methodology                                                       10
3       Quality of Analysis and Results                                   10
4       Interpretations and Conclusions                                   10
5       Report                                                            10
        Total                                                             50
COURSE OUTCOME (COs) ATTAINMENT
1 2 3 4 5 6 7 8 9 10
LOW HIGH
➢ Learning Gap (if any):
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
➢ Books / Manuals Referred:
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________