Employee Data Analysis System (IP Class 12) (2024-25)

The document provides an overview of the Python Pandas library, detailing its features for data manipulation and analysis, including data structures like Series, DataFrame, and Panel. It also covers CSV file handling, data visualization using Matplotlib, and outlines a project objective focused on salary and compensation analysis. Additionally, it includes source code for a system managing employee information and visualizing salary data.

Uploaded by

Andrea B
Copyright © All Rights Reserved

OVERVIEW OF PYTHON PANDAS:

Pandas is a Python library that offers efficient,
adaptable, and expressive data structures specifically
crafted for data manipulation. Wes McKinney began
developing Pandas in 2008 for the purpose of data
analysis within the Python ecosystem. Data analysis
involves extensive processing, including tasks like
restructuring, cleaning, and merging. Used together
with tools such as NumPy, SciPy, and Cython, Pandas
supports the five fundamental steps in data processing
and analysis: loading, preparing, manipulating,
modeling, and analyzing data.

FEATURES OF PANDAS
Pandas, a widely adopted library within the scientific Python
ecosystem for conducting data analysis, excels in various data
processing tasks and encompasses the following notable
characteristics:

It possesses the ability to effortlessly read from and write to diverse data formats,
accommodating different data types such as integers, floats, and doubles.

Columns within a Pandas data structure can be seamlessly inserted or removed to
facilitate dynamic data manipulations.

Pandas supports the group-by operation, enabling efficient data aggregation
and transformations. Additionally, it facilitates high-performance merging
and joining of datasets.

With robust Input-Output (I/O) capabilities, Pandas can efficiently retrieve
data directly from sources like a MySQL database, populating it into a
structured data frame.

Pandas excels in the selection of subsets from extensive datasets and offers
seamless integration of multiple datasets.

It provides a straightforward mechanism for identifying and filling missing
data, enhancing data integrity and completeness.

Pandas enables the application of operations to distinct groups within the
data, promoting targeted and group-specific data manipulations.

The library supports the reshaping of data into various forms, allowing users
to tailor datasets to their specific analytical requirements.
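Several of the features listed above can be sketched in a few lines. This is only an illustration: the employee names, departments, cities, and salary figures are invented sample data, and filling the missing salary with the column mean is just one possible choice.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing salary.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "dept": ["HR", "IT", "IT", "HR"],
    "salary": [30000.0, 45000.0, None, 35000.0],
})

# Identify and fill missing data (here with the column mean).
missing_count = int(df["salary"].isna().sum())
df["salary"] = df["salary"].fillna(df["salary"].mean())

# Insert a column, then remove it again.
df["bonus"] = df["salary"] * 0.10
columns_with_bonus = list(df.columns)
df = df.drop(columns=["bonus"])

# Group-by aggregation: average salary per department.
avg_by_dept = df.groupby("dept")["salary"].mean()

# Merge with a second (hypothetical) dataset on a shared key.
locations = pd.DataFrame({"dept": ["HR", "IT"], "city": ["Delhi", "Pune"]})
merged = df.merge(locations, on="dept", how="left")

print(avg_by_dept)
print(merged)
```

Reading from a MySQL database is left out of the sketch because it needs a running database server and a connection.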

Data Structure in Pandas


In computer science, a data structure is a means of
organizing and storing data for effective accessibility
and manipulation. It encompasses a set of data values along
with operations that can be applied to that data, facilitating
efficient storage, retrieval, and modification. Within Pandas,
three primary data structures have been provided:

➤ Series: This is a one-dimensional structure designed for storing
elements of uniform data type that can be modified as
needed.

➤ DataFrame: Representing a two-dimensional structure, it
accommodates mutable data of various types,
offering versatility in handling heterogeneous
datasets.

➤ Panel: Serving as a three-dimensional arrangement, a Panel
provides a structured way of storing items. Panel was
deprecated and has been removed from recent Pandas
releases, so current code works with Series and
DataFrame only.
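A minimal sketch of the two structures still available in current Pandas is given here. The index labels, names, and salary values are made up for illustration.

```python
import pandas as pd

# Series: one-dimensional, a single data type, values can be changed.
salaries = pd.Series([30000, 45000, 38000], index=["E1", "E2", "E3"])
salaries["E2"] = 47000

# DataFrame: two-dimensional, each column may hold a different type.
employees = pd.DataFrame({
    "empno": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meena"],
    "salary": [30000, 47000, 38000],
})

print(salaries.ndim, employees.ndim)
print(employees.dtypes)
```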

CSV FILE
CSV (Comma Separated Values) serves as a straightforward
file format for storing tabular data, resembling a
spreadsheet or database. This plain text format organizes
data records into lines, with each record containing one or
more fields separated by commas, giving rise to the name
"CSV."

CSV Format Characteristics:

Each table row corresponds to a line in the CSV file.

Field values within a row are delimited by commas.

Advantages of CSV Format:

Simplicity, compactness, and ubiquity in data storage.

A widely adopted format for data interchange.

Creating and Reading CSV File:

CSV, being a text file, can be crafted and modified using any text
editor. Commonly, a CSV file is generated by exporting data from a
spreadsheet or database. CSV files adhere to a standard structure
where columns are separated by a delimiter (comma, semicolon,
space, or tab), and each new line signifies a new row.
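The row and delimiter structure described above can be seen with Python's built-in csv module. The sketch below parses the same made-up records twice, once separated by commas and once by semicolons; the record contents are sample values only.

```python
import csv
import io

# Hypothetical CSV text: one record per line, fields separated by a delimiter.
comma_text = "empno,name,salary\n1,Asha,30000\n2,Ravi,45000\n"
semicolon_text = comma_text.replace(",", ";")

# csv.reader splits each line into a list of field values.
comma_rows = list(csv.reader(io.StringIO(comma_text)))
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))

print(comma_rows)
```

Both calls give the same rows, which shows that only the delimiter differs between the two files, not the data.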

Loading Data from CSV to Data Frames:

Pandas offers two functions, namely read_csv() and to_csv(),
which are utilized for moving data between CSV files and DataFrames.
The read_csv() function imports tabular data from a CSV file into a
DataFrame, while the to_csv() function converts DataFrame data into
CSV format for storage.

Reading from a CSV File to Data Frame:

The read_csv function from the Pandas package allows the import of
tabular data from CSV files into a Pandas DataFrame by specifying
the file name as a parameter.
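A short sketch of read_csv() follows. To keep it self-contained, it reads from an in-memory text buffer instead of a file on disk; passing a file name such as 'Emp.csv' works the same way. The column names match those used in the project's source code, while the two records are invented.

```python
import io
import pandas as pd

# Sample CSV content held in memory (stands in for a file on disk).
csv_text = (
    "empno,name,dept,gender,salary\n"
    "1,Asha,HR,F,30000\n"
    "2,Ravi,IT,M,45000\n"
)

# read_csv() accepts a file name or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```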

Storing Data Frame Data to CSV File:

Pandas to_csv() function converts DataFrame content into CSV data.
It can either write the CSV data into a file specified by a file object or
return the CSV data as a string.
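Both behaviours of to_csv() can be sketched as follows. The DataFrame contents are sample values, and an in-memory buffer is used in place of a real file so that nothing is written to disk.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "empno": [1, 2],
    "name": ["Asha", "Ravi"],
    "salary": [30000, 45000],
})

# With no target given, to_csv() returns the CSV data as a string.
csv_string = df.to_csv(index=False)

# With a file object given, to_csv() writes the CSV data into it.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

print(csv_string)
```

Passing index=False leaves out the row index, which is also what the project's own save option does.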
DATA VISUALIZATION
Data visualization is a crucial step in understanding and
analyzing information effectively. By representing complex
data through charts or graphs, the human brain can process
information more efficiently than through spreadsheets or
textual reports. Visualization, or Data Visualization, aids in
identifying patterns, relationships, and outliers, providing a
quick understanding of complex problems.

MATPLOTLIB
Matplotlib, an open-source 2D plotting library, is a go-to tool
for visualizing figures in Python. Renowned for its ease of use
and versatility, Matplotlib supports static, animated, and
interactive 2D plots or figures. It grants users control over
every aspect of a figure, allowing for interactive and non-
interactive plotting and offering various output formats.

Types of Visualization:
Matplotlib offers a range of visualizations, including Line
Plots, Scatter Plots, Histograms, Box Plots, Bar Charts, and
Pie Charts.
Matplotlib Figure Components:
Figure: The outermost canvas containing one or more axes
(plots/subplots).

Axes: Represents an individual plot, with a figure possibly
containing multiple axes.

Axis: Number line-like objects defining graph limits.

Artist: Everything visible on the figure, such as text
objects and line collections.

Labels: Text on the x and y-axes describing what each
axis represents.

Title: Describes the content of the graph.

Legend: Explains different data sets in the chart.

xticks() and yticks(): Set tick locations and labels for
the x and y-axes.
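The components listed above can be pointed out in a small sketch. The data values are made up, and the "Agg" backend is an assumption made here so that the code runs without opening a window; in an interactive session that line can be dropped and plt.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no display needed
import matplotlib.pyplot as plt

# Figure is the canvas; ax is the single Axes (plot) placed on it.
fig, ax = plt.subplots()

# The plotted line is an Artist; label= is the text shown in the legend.
line, = ax.plot([1, 2, 3], [30000, 45000, 38000], label="Salary")

ax.set_xlabel("Empno")      # label for the x-axis
ax.set_ylabel("Salary")     # label for the y-axis
ax.set_title("Salary Chart")
ax.set_xticks([1, 2, 3])    # tick locations on the x-axis
ax.legend()
```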
Line Plot/Chart:
A type of plot displaying data points connected by straight lines,
Line Plots are commonly used to visualize trends over time.
Matplotlib's plot() function allows users to create Line Plots with
customizable grid, axis labels, title, and display options.
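A minimal line plot sketch is given below. The monthly figures are invented, and the "Agg" backend is assumed only so the code runs without a display; use plt.show() in an interactive session to view the chart.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical trend data: total salary paid in each month.
months = [1, 2, 3, 4, 5, 6]
salary_paid = [200, 210, 205, 230, 240, 260]

plt.figure()
lines = plt.plot(months, salary_paid, marker="o")  # points joined by straight lines
plt.grid(True)                                     # show the grid
plt.xlabel("Month")
plt.ylabel("Salary paid (thousands)")
plt.title("Salary Trend")
```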

Bar Plot/Chart:
Representing categorical data with rectangular bars, Bar Charts
are useful for comparing numeric values across different
categories. Matplotlib's bar() function facilitates the creation of
Bar Charts with configurable characteristics like bar width and
color.
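A bar chart with a chosen bar width and color might look like the sketch below. The names and salaries are sample values, and the "Agg" backend is again an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical categorical data: one bar per employee.
names = ["Asha", "Ravi", "Meena"]
salaries = [30000, 45000, 38000]

plt.figure()
bars = plt.bar(names, salaries, width=0.5, color="teal")
plt.xlabel("Names")
plt.ylabel("Salary")
plt.title("Salary Graph")
```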
OBJECTIVE OF PROJECT

Salary and Compensation Analysis:

Create a system to manage salary information, including basic salary,
allowances, and deductions. Provide tools for analyzing
compensation data and generating salary-related reports.

Employee Information Management:

Develop a system to store and manage employee information
efficiently, including personal details, contact information, and
job-related data.

Data Validation and Verification:

Implement validation checks to ensure accurate and consistent data
entry. Verify and validate user inputs to maintain data integrity.
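One possible form of such a validation check is sketched below. The function name validate_record and the specific rules (positive employee number, non-empty name, non-negative salary) are assumptions made for illustration; they are not part of the project's source code.

```python
def validate_record(record):
    """Return a list of problems found in a (hypothetical) employee record."""
    errors = []

    empno = record.get("empno")
    if not isinstance(empno, int) or empno <= 0:
        errors.append("empno must be a positive whole number")

    if not str(record.get("name", "")).strip():
        errors.append("name must not be empty")

    salary = record.get("salary")
    if not isinstance(salary, (int, float)) or salary < 0:
        errors.append("salary must be a non-negative number")

    return errors


good = {"empno": 1, "name": "Asha", "salary": 30000}
bad = {"empno": -5, "name": "  ", "salary": 30000}

print(validate_record(good))
print(validate_record(bad))
```

A record would be added to the data only when the returned list is empty.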

User-Friendly Interface:

Design an intuitive and user-friendly interface for ease of use. Ensure
that employees, managers, and administrators can navigate the
system effortlessly.
CSV FILE
SOURCE CODE
# HERE WE HAVE THE MENU-DRIVEN PROGRAM
# THAT HAS BEEN CREATED TO AID OUR PROJECT.

# IMPORTING REQUIRED LIBRARIES:

import pandas as pd
import matplotlib.pyplot as plt

df=None    # holds the employee records once the CSV file has been read
ch='Y'
while ch=='Y':
    print('1.Read CSV file')
    print('2.Show all records')
    print('3.Show the name of female employees')
    print('4.Search Record')
    print('5.Add new record')
    print('6.Delete record')
    print('7.Modify record')
    print('8.Show salary chart using line graph')
    print('9.Show salary chart using bar graph')
    print('10.Save data into csv file')
    choice=int(input('Enter your choice: '))
    if choice==1:
        # read the csv file
        df=pd.read_csv(r'C:\Users\ujjua\OneDrive\Desktop\Newfolder\Emp.csv')
        print('file opened')
    elif df is None:
        # every other option needs the data loaded by choice 1
        print('read the CSV file first (choice 1)')
    elif choice==2:
        print(df)
    elif choice==3:
        print(df[df['gender']=='F']['name'])
    elif choice==4:
        e=int(input('Enter emp no to search: '))
        inx=df[df.empno==e].index.values    # to get index value
        if len(inx)==0:
            print('record not found')
        else:
            print(df[df.empno==e])
    elif choice==5:
        e=int(input('Enter emp no\t'))
        n=input('Enter name\t')
        d=input('Enter dept\t')
        s=int(input('Enter salary\t'))
        g=input('Enter gender\t')
        # DataFrame.append() was removed in pandas 2.0, so pd.concat() is used
        new_row=pd.DataFrame([{'empno':e,'name':n,'dept':d,'gender':g,'salary':s}])
        df=pd.concat([df,new_row],ignore_index=True)
        print('record added')
    elif choice==6:
        e=int(input('Enter emp no to delete: '))
        inx=df[df.empno==e].index.values
        if len(inx)==0:
            print('record not found')
        else:
            print(df[df.empno==e])
            df=df[df['empno']!=e]
            print('record deleted')
            df.index=range(len(df))    # rearrange index numbers
    elif choice==7:
        e=int(input('Enter emp no to modify: '))
        inx=df[df.empno==e].index.values    # to get index value
        if len(inx)==0:
            print('record not found')
        else:
            print(df[df.empno==e])
            n=input('Enter new name: ')
            d=input('Enter new dept: ')
            s=int(input('Enter new salary: '))
            g=input('Enter new gender: ')
            df.loc[inx,'name']=n
            df.loc[inx,'dept']=d
            df.loc[inx,'salary']=s
            df.loc[inx,'gender']=g
            print('record updated')
    elif choice==8:
        plt.ylabel('Salary')
        plt.xlabel('Empno')
        plt.plot(df['empno'],df['salary'])
        plt.title('Salary Chart')
        plt.show()
    elif choice==9:
        plt.bar(df['name'],df['salary'])
        plt.title('Salary Graph')
        plt.xlabel('Names')
        plt.ylabel('Salary')
        plt.show()
    elif choice==10:
        # saved in the current working directory
        df.to_csv('emp.csv',index=False)
        print('file saved')
    ch=input('Do you want to continue? (Y/N) ').upper()
OUTPUT SCREEN

[Output screenshots for the menu choices appeared here in the original; the images are not available.]
BIBLIOGRAPHY

For successfully completing my project, I have taken
help from the following websites:

https://ncert.nic.in/textbook.php?leip1=0-7
https://www.geeksforgeeks.org/bar-plot-in-matplotlib/
