
Data Analysis Project Anandita

The document is a project report by Anandita Samant on data analysis using Microsoft Excel, submitted for a Bachelor of Business Administration degree at Centurion University. It includes a bonafide certificate, declaration, acknowledgments, and a detailed table of contents outlining various tasks performed in Excel, such as data cleaning, formatting, and analysis. The report emphasizes the importance of data integrity and presentation in making informed decisions based on the dataset of individuals seeking new job opportunities.


DATA ANALYSIS THROUGH MICROSOFT EXCEL

A PROJECT REPORT

Submitted by

ANANDITA SAMANT

in partial fulfillment for the award of the degree of

BACHELOR OF BUSINESS
ADMINISTRATION

SCHOOL OF MANAGEMENT

BHUBANESWAR CAMPUS
CENTURION UNIVERSITY OF TECHNOLOGY AND MANAGEMENT
ODISHA

MAY 2025

BONAFIDE CERTIFICATE

Certified that this project report “DATA ANALYSIS THROUGH MICROSOFT EXCEL”
is the bonafide work of ANANDITA SAMANT, who carried out the project work under
my supervision. This is to further certify, to the best of my knowledge, that this project
has not been carried out earlier in this institute and the university.

SIGNATURE

(Mr. Imad Mohammed)

Certified that the above mentioned project has been duly carried out as per the
norms of the college and statutes of the university.

SIGNATURE
(Dr. Ronismita Mishra / Dr. Anshuman Jena)
HEAD OF THE DEPARTMENT / DEAN OF THE SCHOOL
Lecturer of School of Management

DEPARTMENT SEAL

DECLARATION

I hereby declare that the project entitled “Data Analysis Through Microsoft Excel” submitted
for the 2nd Semester BBA is my original work and the project has not formed the basis
for the award of any Degree / Diploma or any other similar titles in any other University
/ Institute.

Name of the Student: ANANDITA SAMANT

Signature of the Student:

Registration No: 240409120011

Place: Bhubaneswar

Date: 14 / 05 / 2025

ACKNOWLEDGEMENTS

I wish to express my profound and sincere gratitude to Mr. Imad Mohammed,
Department of Applied Sciences, SoET, Bhubaneswar Campus, who guided me into
the intricacies of this project nonchalantly with matchless magnanimity.
I thank Dr. Ronismita Mishra, Head of the Department, SOM, Bhubaneswar
Campus and Dr. Anshuman Jena, Dean, School of Management, Bhubaneswar
Campus for extending their support during the course of this investigation.

I would be failing in my duty if I don’t acknowledge the cooperation rendered
during various stages of image interpretation by team members.

I am highly grateful to team members who evinced keen interest and invaluable
support in the progress and successful completion of my project work.
I am indebted to my parents for their constant encouragement, co-operation and
help. Words of gratitude are not enough to describe the accommodation and fortitude
which they have shown throughout my endeavor.

Name of the Student: ANANDITA SAMANT

Signature of the Student:

Registration No: 240409120011

Place: Bhubaneswar

Date: 14 / 04 / 2025

TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

CERTIFICATE i

DECLARATION ii
ACKNOWLEDGEMENT iii
LIST OF ACRONYMS iv
LIST OF TABLE v

1. CHAPTER – 1 DO A PRELIMINARY EXPLORATION OF THE DATASET AND NOTE DOWN ANYTHING THAT STANDS OUT.

2. CHAPTER – 2 FIND HOW MANY BLANK CELLS THERE ARE IN GENDER COLUMN.

3. CHAPTER – 3 CHANGE CITY DEVELOPMENT INDEX COLUMN TO PERCENTAGE AND CLEAN ANY DATA WITH UNNECESSARY CHARACTERS.

4. CHAPTER – 4 FORMAT NO_ENROLLMENT PROPERLY TO "NO ENROLLMENT". WHILE FORMATTING, KEEP ORIGINAL COLUMN, DO IT IN A SEPARATE COLUMN AND HIDE THE ORIGINAL COLUMN.

5. CHAPTER – 5 FORMAT EXPERIENCE COLUMN FROM ">20" TO "20+" USING "=IF".

6. CHAPTER – 6 PROPERLY FORMAT COMPANY SIZE COLUMN - CHANGE DATA TYPE TO TEXT TO FIX OCT-49 TO 10-49.

7. CHAPTER – 7 SORT DATASET IN ASCENDING MANNER ACCORDING TO ENROLLEE ID.

8. CHAPTER – 8 COUNT TOTAL NUMBER OF PEOPLE LOOKING FOR A NEW JOB. (1) IN TARGET COLUMN.

9. CHAPTER – 9 VLOOKUP USING ENROLLEE ID "10653" TO FIND CORRESPONDING TRAINING HOURS.

10. CHAPTER – 10 FILTER AND REMOVE ALL ENROLLEES WITH LAST NEW JOB LESS THAN OR EQUAL TO 2.

11. CHAPTER – 11 FILTER OUT ALL ENROLLEES WITH EXPERIENCE >=5,<=10.

12. CHAPTER – 12 FILTER OUT ENROLLEES WITH EDUCATION <GRADUATE.

13. CHAPTER – 13 FILTER ALL ENROLLEES TO SHOW THOSE WITH RELEVANT EXPERIENCE ONLY.

14. CHAPTER – 14 FILTER OUT ANY ENROLLEE THAT HAS ANY DATA POINT AS A BLANK.

15. CHAPTER – 15 FIND BEST OVERALL ENROLLEE AS A CANDIDATE ACCORDING TO INFORMATION, USE YOUR OWN JUDGEMENT AND JUSTIFY WHY THE SPECIFIC ENROLLEE IS THE BEST CHOICE.

16. CHAPTER – 16 USE PIVOT TABLE AND PIVOT CHART ALONG WITH APPROPRIATE SLICERS TO VISUALISE THE REMAINDER OF THE DATASET AND SHOW REASONS FOR YOUR CHOICE OF THE BEST ENROLLEE. THOSE WHO DO NOT HAVE THE FEATURE, MENTION YOUR EXCEL VERSION AND THAT THE FEATURE IS NOT AVAILABLE FOR YOU.

TASK-1 Do a preliminary exploration of the dataset and note down anything that
stands out.

Preliminary exploration of the dataset


Start by quickly scanning through the dataset to understand what types of columns and data
are included, check for inconsistencies, outliers, and missing values, and make notes on
anything unusual. This step is important because it helps you get familiar with the dataset
and guides you in deciding what kind of cleaning and analysis are needed next. The dataset
is about people in the data sector who are looking to move to a new job. The final task is to
find the best candidate among all of the data points. There are 2000 rows and 14 columns.

TASK- 2 FIND HOW MANY BLANK CELLS THERE ARE IN
GENDER COLUMN.

Count blank cells in the "gender" column


Use the =COUNTBLANK() function on the gender column to find out how many cells
do not contain any value. This is important because missing gender data might affect
analysis or decision-making later, especially if gender is a factor you're analyzing.

The Excel screenshot shows a dataset, specifically from Sheet1, which contains
information about individuals (enrollees) potentially seeking new job opportunities. Each
row represents a unique person, and the columns provide detailed attributes such as
enrollee ID, city, city development index, gender, relevant experience, enrollment status,
education, major discipline, work experience, company size and type, years since the last
new job, training hours, and a target variable indicating if the person is currently looking
for a job (where 1 = yes and 0 = no).
In cell D2002, a formula =COUNTBLANK(D2:D2001) has been used to count the
number of blank cells in the gender column (Column D). The result, 448, indicates that
gender information is missing for 448 enrollees in the dataset. This is an important
observation because such missing data can affect analysis, particularly if gender-based
comparisons or diversity-focused evaluations are being made. Identifying and addressing
this gap would be a necessary step during the data cleaning process to ensure the dataset's
completeness and reliability.
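
As a cross-check on the blank count, the sketch below assumes the gender values sit in D2:D2001, as in the formula above. The fourth formula is an extra added here for illustration: it also counts cells that hold only spaces, which COUNTBLANK does not treat as blank.

```excel
=COUNTBLANK(D2:D2001)
=COUNTA(D2:D2001)
=ROWS(D2:D2001)
=SUMPRODUCT(--(LEN(TRIM(D2:D2001))=0))
```

If the column holds plain typed values, the first two results should add up to the third. If the fourth result is larger than the first, some cells that look empty actually contain spaces and may need cleaning before they are counted as missing.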

TASK-3 CHANGE CITY DEVELOPMENT INDEX COLUMN TO
PERCENTAGE AND CLEAN ANY DATA WITH UNNECESSARY
CHARACTERS.

Remove any extra characters if present (like symbols or text) and format the column as a percentage.
Excel's percentage format already shows each decimal multiplied by 100, so the values should not be
multiplied by hand first. This is useful because expressing the city development index as a percentage
makes it easier for people to understand and compare the development levels between cities.

To convert the values in the "city_development_index" column to percentage format in


Excel, start by clicking on the column header (in this case, Column C) to highlight the
entire column. Then, go to the "Home" tab on the Excel ribbon at the top of the screen.
Within the "Number" group, click on the dropdown menu where it usually shows
“General,” “Number,” or some other format. From the list, select "Percentage". Once
selected, Excel displays the decimal values (such as 0.92, 0.75, or 0.68) as percentages
(92%, 75%, or 68%): the displayed value is scaled by 100 and given a percent sign, while
the stored value stays the same. This helps make the data more understandable,
especially for people who are more familiar with percentage scales than decimals.
To control how many decimal places are shown, use the Increase Decimal ( .0→.00 ) or
Decrease Decimal buttons right next to the format box. If needed, make sure that the
column doesn't include any non-numeric data or text characters before converting,
because those will cause errors or prevent proper formatting. Formatting the
"city_development_index" column this way makes it easier to visually compare how
developed different cities are and can help in filtering or sorting enrollees based on their
location's development level during analysis.
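
One possible way to do the cleaning step in a helper column is sketched below. It assumes the index values are in C2:C2001 and that the unwanted character is a stray "%" typed into some cells. The report does not say which characters were actually present, so the "%" and the "CHECK" label are placeholders to be replaced with whatever fits the data.

```excel
=IFERROR(VALUE(TRIM(SUBSTITUTE(C2,"%",""))),"CHECK")
=COUNT(C2:C2001)
```

The first formula strips the character and any surrounding spaces, converts the result to a number, and writes CHECK for anything that still cannot be read as a number (blank cells included). The second counts the numeric cells in the original column; a result below the number of data rows suggests that some entries are stored as text or are missing.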

TASK-4 FORMAT NO_ENROLLMENT PROPERLY TO "NO ENROLLMENT".
WHILE FORMATTING, KEEP ORIGINAL COLUMN, DO IT IN A SEPARATE
COLUMN AND HIDE THE ORIGINAL COLUMN.

Create a new column where each entry like "no_enrollment" is formatted as "No Enrollment" (with
proper capitalization), while keeping the original column hidden. This improves data presentation and
makes the values more readable and professional without losing the raw data.

This section of the dataset shows how you created a new formatted version of the “enrolled_university”
column using an IF formula to make the data cleaner and more readable. Specifically, you created a new
column next to the original one and entered the formula =IF(G2="no_enrollment", "No Enrollment",
G2). What this formula does is check whether the value in cell G2 (from the original
"enrolled_university" column) equals “no_enrollment.” If it does, it changes the text to a more readable
format—“No Enrollment”—with proper capitalization and spacing. If the value is something else, like
“Full time course” or “Part time course,” it simply keeps that same value unchanged.
You then dragged this formula down the entire new column to apply it to all rows. As a result, the
original technical format (like "no_enrollment") is converted into cleaner, user-friendly terms such as
"No Enrollment," making the dataset easier to read and present. You kept the original column for
reference but created this cleaner column separately and may have hidden the original one afterward to
avoid confusion. This step is essential in data cleaning because properly formatted values improve
understanding and are more suitable for reports, dashboards, and visualizations.
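
One detail is worth checking with this kind of formula: when the original cell is empty, an IF that falls through to G2 returns 0 rather than a blank. A variant that keeps blanks blank might look like this, assuming the original values are in column G as described above:

```excel
=IF(G2="","",IF(G2="no_enrollment","No Enrollment",G2))
```

The outer IF handles the empty case first, so the new column does not show zeros where the enrollment status is missing.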

TASK- 5 FORMAT EXPERIENCE COLUMN FROM ">20" TO "20+" USING "=IF".

Use a formula like =IF(K2=">20","20+",K2) to replace entries like ">20" with "20+" in a new column.
This makes the experience values more user-friendly and consistent for interpretation or reporting.

In the Excel sheet visible in the screenshot, you have performed data transformation and
standardization on a column of numeric values to enhance consistency and prepare the
dataset for analysis. Specifically, in column J, you have used a formula to recategorize
values from column K, which appear to represent a numeric count or range (such as "0",
"6", "9", "20+").
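The formula used in cell J2 is:

=IF(K2=">20","20+",IF(K2="<1","0",K2))

It rewrites ">20" as "20+" and "<1" as "0", and copies every other value unchanged. The
quotation marks must be typed as straight quotes for Excel to accept the formula.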

TASK-6 PROPERLY FORMAT COMPANY SIZE COLUMN - CHANGE DATA
TYPE TO TEXT TO FIX OCT-49 TO 10-49.

Format the "company size" column and fix date-like errors such as "Oct-49". Change the data
type of the company size column to text to prevent Excel from auto-converting entries
like "10-49" into dates like "Oct-49". This step is necessary because it preserves the
original meaning of the data and prevents misinterpretation.

In this step, you addressed the formatting issue in the "Company Size" column (Column
N) of your Excel dataset. Some entries like "10-49" were mistakenly being interpreted as
dates—specifically, being auto-converted by Excel into a format such as "Oct-49", which
misrepresents the actual data. To fix this, you selected the entire column, then opened the
"Format Cells" dialog box. Inside the dialog, under the "Number" tab, you changed the
cell format to "Text". This action ensures that Excel treats all values in the column as
literal text strings rather than trying to parse them as dates or numbers.
By converting the format to text, you preserved the integrity of categorical data such as
"10-49", "<10", "10000+", etc., and avoided future misinterpretation by Excel. This is a
critical data cleaning step, especially in datasets where categorical ranges are easily
confused with date formats.
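
Where Excel has already stored an entry as a date, switching the format to Text may show the date's serial number instead of the original label. A repair formula for a helper column could look like the sketch below. It assumes the company size values are in column N, that the affected entries were originally "10-49", and that Excel is using English format codes.

```excel
=IF(N2="","",IF(ISNUMBER(N2),TEXT(N2,"m-yy"),N2))
```

A date is stored as a number, so ISNUMBER picks out the converted cells and TEXT writes the month and two-digit year back as "10-49". Text entries such as "<10" or "10000+" pass through unchanged, and empty cells stay empty.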

TASK-7 SORT DATASET IN ASCENDING MANNER ACCORDING
TO ENROLLEE ID

Use the Sort function to arrange all rows from the smallest to largest enrollee ID. Sorting
makes the dataset orderly and helps in easily locating specific records, especially when
using lookup functions.

In this step, you sorted the dataset in ascending order based on the "enrollee_id" column.
To do this, you clicked the dropdown arrow in the header cell of the "enrollee_id" column
(Column A). This opened a filter menu, where you selected “Sort Smallest to Largest”
from the available options. Excel then automatically rearranged all rows in the
spreadsheet so that the enrollee records were organized from the lowest ID number to the
highest.
This sorting action ensures the dataset is structured in a logical order, which is
particularly useful for tasks like VLOOKUPs, checking for duplicates, or performing
systematic reviews of individual records. It's a key data preparation step for making the
dataset more accessible and analysis-ready.
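
Since the sorted list is meant to help with duplicate checks, a helper-column test along these lines could be added. It assumes the IDs are in A2:A2001 and the full data block is A2:N2001; both ranges are assumptions. The SORT function exists only in newer Excel versions (Microsoft 365 or Excel 2021).

```excel
=COUNTIF($A$2:$A$2001,A2)>1
=SORT(A2:N2001,1,1)
```

The first formula returns TRUE for any ID that appears more than once. The second returns a copy of the block sorted by its first column in ascending order and leaves the original rows untouched, which can be useful for comparing against the manually sorted sheet.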

TASK-8 COUNT TOTAL NUMBER OF PEOPLE LOOKING FOR A
NEW JOB. (1) IN TARGET COLUMN.

Count total number of people looking for a new job (where target = 1)
Use a formula like =COUNTIF(target_range, 1) to count how many enrollees have their "target"
value marked as 1. This helps you understand how many people in the dataset are actively
seeking new job opportunities.

In the Excel sheet shown, I am analyzing the job seeker data. I used the COUNTIF function
in cell Q1 to calculate the total number of individuals marked with a "1" in the 'target'
column (column 'P'), indicating those who are actively looking for a new job. The count
currently totals 505.
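
The count described above can be written out with concrete ranges. This sketch assumes the target values are in P2:P2001 (column P, 2000 data rows) and contain only 0 and 1.

```excel
=COUNTIF(P2:P2001,1)
=SUM(P2:P2001)
=COUNTIF(P2:P2001,1)/COUNT(P2:P2001)
```

Under that assumption the first two results should agree, which is a quick sanity check on the column. The third gives the share of enrollees looking for a job and can be formatted as a percentage.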

TASK-9 VLOOKUP USING ENROLLEE ID "10653" TO FIND
CORRESPONDING TRAINING HOURS.

Use VLOOKUP with enrollee ID "10653" to find their training hours


Use a VLOOKUP function like =VLOOKUP(10653, table_range, column_index,
FALSE) to find the number of training hours for that specific ID. This demonstrates how
to retrieve targeted data efficiently from a large dataset.

So what I'm doing here is using VLOOKUP in this cell, C6. It looks for whatever number is
in another cell, A6, which holds the enrollee ID "41". Then it searches the whole block of
data, from the top left at A2 all the way down to O2004. The number 13 tells it, "Okay,
once you find that '41' in the first column, go over to the 13th column in that same row
and grab whatever's there." And that last zero (the same as FALSE) means it has to be an
exact match for that enrollee ID, not something merely close. This is how I pull out the
'training_hours' for each person based on their ID, and the same formula works for ID
10653 once that value is in the lookup cell.
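
For the ID named in the task, the lookup might be written as below. The table range A2:O2004 and the column index 13 are taken from the description above; the "Not found" text and the INDEX/MATCH variant are additions for illustration.

```excel
=IFERROR(VLOOKUP(10653,$A$2:$O$2004,13,FALSE),"Not found")
=INDEX($M$2:$M$2004,MATCH(10653,$A$2:$A$2004,0))
```

Column M is the 13th column of a range that starts at column A, so both formulas read the same cell. If the IDs happen to be stored as text, the number 10653 will not match and the lookup value would need to be written as "10653" in quotes.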

TASK- 10 FILTER AND REMOVE ALL ENROLLEES WITH LAST
NEW JOB LESS THAN OR EQUAL TO 2.

Filter and remove all enrollees with "last new job" less than or equal to 2
Apply a filter to remove any rows where the "last new job" value is 1 or 2. This helps
narrow the data to candidates who are either more experienced or have more stable job
history, which may be more desirable.

So what I'm doing right here is filtering this column, the one that says "last_new_job",
that's column 'N'. I've clicked on that little arrow next to the title, and now this box
popped up. I'm telling it to only show me the rows where the number in that
"last_new_job" column is less than or equal to 2. That pulls together all the rows the task
wants removed, so I can select and delete them and then clear the filter. What's left is
the people whose last job change was more than two years ago, which fits the aim of
keeping candidates with a more stable job history.
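
Before removing rows, it can help to know how many will go. A minimal check, assuming "last_new_job" is in N2:N2001 and the values 1 and 2 are stored as numbers:

```excel
=COUNTIF(N2:N2001,"<=2")
=AND(ISNUMBER(N2),N2<=2)
```

The first formula counts the numeric cells that are 2 or less; text entries and blanks are not counted. The second, placed in a helper column and filled down, returns TRUE for exactly the rows the filter should catch, which makes it easy to confirm the deletion afterwards.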

TASK-11 FILTER OUT ALL ENROLLEES WITH EXPERIENCE >=5,<=10.

Filter out enrollees with experience between 5 and 10 years
Use number filters to exclude all rows where experience is 5, 6, 7, 8, 9, or 10. This might
be useful if you're focusing on junior or senior-level candidates only, skipping the mid-range.

What I'm doing here with the "experience" column, that's column 'J', is trying to filter
out all the people who have a middling amount of experience, like between 5 and 10
years. I clicked on that little filter arrow, and now I'm telling it to show me the rows
where the "experience" is either greater than or equal to 0 and less than or equal to 4, and
then also where the "experience" is greater than or equal to 11. Basically, I'm setting up
two conditions to grab the folks with less experience and the folks with more experience,
and just hide everyone in that 5 to 10 year range. This helps me focus on either the
newbies or the more seasoned people in the dataset.
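
The same rule can be expressed as a helper column, which avoids juggling several filter conditions. This is a sketch that assumes the experience values are in J2:J2001 and that the plain year counts are stored as numbers.

```excel
=AND(ISNUMBER(J2),J2>=5,J2<=10)
=COUNTIFS(J2:J2001,">=5",J2:J2001,"<=10")
```

The first formula returns TRUE for rows in the 5 to 10 year range, so filtering the helper column to FALSE leaves the junior and senior enrollees. Text entries such as "20+" evaluate to FALSE and are kept. The second formula counts how many rows fall in the excluded range.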

TASK-12 FILTER OUT ENROLLEES WITH EDUCATION <GRADUATE.

Filter out enrollees with education less than "Graduate"
Filter out entries such as "High School" or "Primary School" to keep only those who are
graduates or above. This ensures you're only considering candidates with a minimum
education level for certain roles.

In this screenshot, a filter is being applied to the "education_level" column of the dataset
to exclude candidates with lower educational qualifications. Specifically, only
"Graduate," "Masters," and "PhD" levels are selected, while "High School," "Primary
School," and blank entries are unchecked. This filtering step is crucial for narrowing
down the candidate pool to those who meet a minimum education threshold, which is
often required for more skilled or specialized job roles. By focusing only on higher
education levels, this step helps ensure that the analysis considers only candidates likely
to meet certain professional or technical requirements.
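
The selection of the three accepted levels can also be written as a formula. The sketch below assumes the education level is in column H; the report does not state the column letter, so H is a placeholder.

```excel
=ISNUMBER(MATCH(H2,{"Graduate","Masters","PhD"},0))
```

MATCH looks for the cell's value in the list of accepted levels, and ISNUMBER turns a successful match into TRUE. Blank cells and the lower levels return FALSE, mirroring the unchecked boxes in the filter.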

TASK- 13 FILTER ALL ENROLLEES TO SHOW THOSE WITH
RELEVANT EXPERIENCE ONLY.

Filter to show only enrollees with relevant experience
Use the filter to keep only those who have “Has relevant experience” in the respective
column. This focuses the analysis on candidates who are more likely to fit into the job
quickly due to relevant past work.

What I'm doing with this "relevant_experience" column, that's column 'F', is I'm trying to
see only the people who actually have experience that's considered relevant. I clicked on
that little filter arrow next to the column title, and then I went down and unchecked the
box next to "No relevant experience". This way, only the rows where it says "Has
relevant experience" will be visible, and all the other folks who don't have that marked
will be hidden. It just helps me focus on the candidates who are more likely to hit the
ground running because they've got the right kind of background.

TASK-14 FILTER OUT ANY ENROLLEE THAT HAS ANY DATA
POINT AS A BLANK.

Filter out any enrollee that has any data point as blank
Apply a filter or use a formula to detect and remove any row that has a missing value in
any column. Clean data without blanks ensures better accuracy in analysis and avoids
errors in charts or calculations.

So what I'm doing here is trying to clean up my data a bit. I'm looking at the whole
spreadsheet, and what I want to do is get rid of any rows where there's any missing
information, like a blank cell in any of these columns – city, gender, relevant experience,
education, major, you name it. I haven't actually done the filtering yet in this picture, but
what I would do is probably go through each column, one by one, and filter out the
blanks. Or, sometimes Excel has a special way to select all the blank cells in the whole
sheet at once, and then I can just delete those entire rows. This makes sure that when I
start doing my analysis or making charts, I'm not getting messed up by incomplete data.
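
A helper column is one way to find incomplete rows in a single pass instead of filtering each column in turn. The sketch assumes the 14 data columns occupy A to N, which is an assumption based on the column count given in Task 1.

```excel
=COUNTBLANK(A2:N2)=0
```

Filled down the sheet, this returns TRUE for rows with no empty cells and FALSE for rows with at least one. Filtering the helper column to FALSE shows the rows to delete.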

TASK-15 FIND BEST OVERALL ENROLLEE AS A CANDIDATE
ACCORDING TO INFORMATION, USE YOUR OWN JUDGEMENT AND
JUSTIFY WHY THE SPECIFIC ENROLLEE IS THE BEST CHOICE.

Find the best overall enrollee as a candidate based on all information


Analyze the dataset after cleaning and filtering, then choose the most suitable candidate
based on a balance of education, experience, skills, and other qualities. Justify your
choice with clear reasoning—like having the highest training hours, relevant experience,
and a strong work history.

So what I've done here is I've taken all that cleaned-up data, and I'm trying to figure out
who the best candidate is. It's not really something you can just point to with a formula.
What I'm doing is looking at everything together like their education level, how much
experience they have, if it's the right kind of experience, and maybe even how long it's
been since their last job. I'm trying to find someone who ticks a lot of the good boxes. For
example, maybe someone has a really high education, a good amount of relevant
experience, and they've also done a lot of training hours. That kind of person would
probably be a really strong candidate. It's like I'm weighing all the different things and
trying to find the best balance of qualities in one person.
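
Although the choice rests on judgement, a rough score can help shortlist candidates for a closer look. Everything in this sketch is hypothetical: the weights, the education column H, the score column R, and the use of column M for training hours (the 13th column, as in Task 9). It is an aid to the judgement described above, not the method used in the report.

```excel
=IFERROR(MATCH(H2,{"Graduate","Masters","PhD"},0),0)+2*(F2="Has relevant experience")+M2/MAX($M$2:$M$2001)
=INDEX($A$2:$A$2001,MATCH(MAX($R$2:$R$2001),$R$2:$R$2001,0))
```

The first formula, entered in R2 and filled down, gives 1 to 3 points for education, 2 points for relevant experience, and up to 1 point for training hours relative to the maximum. The second returns the enrollee ID with the highest score, which can then be compared with the candidate picked by inspection.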

TASK-16 USE PIVOT TABLE AND PIVOT CHART ALONG WITH APPROPRIATE
SLICERS TO VISUALISE THE REMAINDER OF THE DATASET AND SHOW REASONS
FOR YOUR CHOICE OF THE BEST ENROLLEE. THOSE WHO DO NOT HAVE THE
FEATURE, MENTION YOUR EXCEL VERSION AND THAT THE FEATURE IS NOT
AVAILABLE FOR YOU

Use a Pivot Table and Pivot Chart with slicers to visualize data and support your best candidate choice
Create a pivot table to summarize key information (e.g., average training hours by
education level) and use a pivot chart with slicers for interactivity. This visualization
makes your choice of the best candidate clearer and easier to present, showing how they
compare with others in important areas.

What I've done here is I've taken that pivot table I made, and now I'm using these things
called "slicers". You see those little boxes down there for 'enrollee_id',
'city_development_index', 'experience', and 'training_hours'? These are like interactive
filters for my pivot table and any charts I might make from it. So, if I click on a specific
enrollee ID in that slicer, the pivot table will only show me the data related to that person.
Same for the city index, experience level, or training hours. This makes it super easy to
drill down into the data and see how different groups or individuals compare on these key
things. For example, I could click on the highest training hours in the slicer and see which
education levels are most common for those individuals. It's a really handy way to
explore the data and visually back up why I picked a certain person as the best candidate.
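
Pivot table figures can be spot-checked with ordinary worksheet formulas. The sketch below assumes training hours are in M2:M2001, education level in H2:H2001, and relevant experience in F2:F2001; the column letters for education and the choice of "Graduate" are placeholders.

```excel
=AVERAGEIFS($M$2:$M$2001,$H$2:$H$2001,"Graduate")
=COUNTIFS($H$2:$H$2001,"Graduate",$F$2:$F$2001,"Has relevant experience")
```

The first formula should match the pivot table's average training hours for graduates when no slicer is active. The second counts graduates with relevant experience. Worksheet formulas ignore slicers and include rows hidden by filters, so they describe the full remaining dataset.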

ASSESSMENT
Internal:
SL NO   RUBRICS                                                            FULL MARK   MARKS OBTAINED   REMARKS
1       Understanding the relevance, scope and dimension of the project   10
2       Methodology                                                        10
3       Quality of Analysis and Results                                    10
4       Interpretations and Conclusions                                    10
5       Report                                                             10
        Total                                                              50

Date: Signature of the Faculty

COURSE OUTCOME (COs) ATTAINMENT

➢ Expected Course Outcomes (COs):


(Refer to COs Statement in the Syllabus)
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
➢ Course Outcome Attained:
How would you rate your learning of the subject based on the specified COs?

1 2 3 4 5 6 7 8 9 10
LOW HIGH
➢ Learning Gap (if any):
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
➢ Books / Manuals Referred:
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________

Date: Signature of the Student


➢ Suggestions / Recommendations:
(By the Course Faculty)
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________

Date: Signature of the Faculty
