
REGD. NO. 238W1A5464 DATA SCIENCE USING PYTHON LABORATORY-23AI&DS4354 ACADEMIC YEAR: 2024-2025

Lab Session 07: Perform the following operations using Pandas

Date of the Session: 17/02/2025 Time of the Session: 12:30AM to 1:00PM

Pre-Lab Task: Write the answers before entering the lab.


1. What does NaN stand for in Pandas, and why do missing values occur in a dataset?
A. NaN stands for Not a Number in Pandas. It is used to represent missing or undefined values in a dataset.
Missing values can occur due to various reasons:
a. Data collection errors (e.g., missing fields in a survey)
b. Data entry errors (e.g., missing values in a database)
c. Absence of data (e.g., a product or customer may not have a value for a certain attribute)
d. Merging datasets where some values do not match
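
As a rough illustration of reason (d), the sketch below merges two small tables and then counts the missing values. The tables and the column names (CustomerID, Amount) are made up for this example and are not part of the lab data.

```python
import pandas as pd

# Hypothetical tables: customer 3 has no matching row in `orders`.
customers = pd.DataFrame({'CustomerID': [1, 2, 3],
                          'Name': ['Alice', 'Bob', 'Charlie']})
orders = pd.DataFrame({'CustomerID': [1, 2],
                       'Amount': [250, 400]})

# A left merge keeps every customer; unmatched rows get NaN in 'Amount'.
merged = customers.merge(orders, on='CustomerID', how='left')
print(merged)

# isna() marks the missing cells; sum() counts them per column.
missing_counts = merged.isna().sum()
print(missing_counts)
```

Counting NaN values per column like this is a common first check before deciding how to fill or drop them.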

2. How can we fill NaN values in a Pandas DataFrame with a specific string?
A. You can use the fillna() function to replace NaN values with a specific string:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'City': ['New York', None, 'Chicago', None]}
df = pd.DataFrame(data)
df_filled = df.fillna("Unknown")
print(df_filled)

3. What is the purpose of the sort_values() function in Pandas?


A. The sort_values() function sorts a DataFrame by one or more columns in either ascending or
descending order. Organizing the data this way makes it easier to analyze.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, 30, 35, 40]})
df_sorted = df.sort_values(by='Age')
print(df_sorted)
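
Since sort_values() accepts more than one column, a short sketch of multi-column sorting may help. The Department column is a hypothetical addition to the data above; passing lists to by and ascending sorts departments alphabetically and, within each department, ages from highest to lowest.

```python
import pandas as pd

# 'Department' is an assumed extra column, added only for illustration.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Department': ['IT', 'HR', 'IT', 'HR'],
                   'Age': [24, 30, 35, 40]})

# Sort by Department (A to Z), then by Age (largest first) within each department.
df_sorted = df.sort_values(by=['Department', 'Age'], ascending=[True, False])
print(df_sorted)
```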

4. How does the groupby() function work, and when should it be used?
A. The groupby() function in Pandas is used to group data based on one or more columns and then apply an
aggregate function (like sum, mean, count, etc.) on each group.
• Usage: It is used when you want to analyze subsets of data and perform aggregate calculations on
these subsets.
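
A minimal sketch of this split-and-aggregate idea is given here, using a small made-up Category/Value table. The agg() call applies several aggregate functions to each group at once.

```python
import pandas as pd

# Small illustrative table; the values are assumptions for this sketch.
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'],
                   'Value': [10, 20, 30, 40]})

# Split rows by Category, then compute sum, mean and count for each group.
summary = df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])
print(summary)
```

Each row of the result corresponds to one group, and each column to one aggregate.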

5. Can you explain a real-world scenario where sorting and grouping data is essential?
A. Scenario: In a sales report analysis, sorting and grouping data is essential for understanding performance
across different product categories or regions.
• Sorting: To find the top-selling products or regions, you can sort the sales data by the total revenue
in descending order. This helps identify high performers at a glance.
• Grouping: To calculate total revenue for each region or category, you can group the sales data by
region or product category and then calculate the sum of sales. This helps compare the performance
of different regions or categories.
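
The scenario above could be sketched roughly as follows. The sales figures, region names and column names are invented for illustration only.

```python
import pandas as pd

# Hypothetical sales records (not real data).
sales = pd.DataFrame({'Region': ['North', 'South', 'North', 'East', 'South'],
                      'Product': ['Laptop', 'Phone', 'Phone', 'Laptop', 'Tablet'],
                      'Revenue': [1200, 800, 600, 1500, 400]})

# Grouping: total revenue per region.
revenue_by_region = sales.groupby('Region')['Revenue'].sum()

# Sorting: rank the regions from highest to lowest total revenue.
ranked = revenue_by_region.sort_values(ascending=False)
print(ranked)
```

Grouping first and sorting second puts the best-performing region at the top of the result.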

LAB No. 07 VELAGAPUDI RAMAKRISHNA SIDDHARTHA ENGINEERING COLLEGE Page |



In Lab Task:

a. Filling NaN with string

Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'City': ['New York', None, 'Chicago', None]}
df = pd.DataFrame(data)
df_filled = df.fillna("Unknown")
print(df_filled)

Output:
      Name      City
0    Alice  New York
1      Bob   Unknown
2  Charlie   Chicago
3    David   Unknown

b. Sorting based on column values

Code:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, 30, 35, 40]}
df = pd.DataFrame(data)
df_sorted = df.sort_values(by='Age', ascending=True)
print(df_sorted)

Output:
      Name  Age
0    Alice   24
1      Bob   30
2  Charlie   35
3    David   40

c. groupby()

Code:
data = {'Category': ['A', 'B', 'A', 'B'],
'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Category')['Value'].sum()
print(grouped_df)

Output:
Category
A    40
B    60
Name: Value, dtype: int64


Post Lab Task:


a. Write a Python code snippet to fill all NaN values in a DataFrame with the string "Missing".
A. Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'City': ['New York', None, 'Chicago', None],
        'Age': [24, None, 35, None]}
df = pd.DataFrame(data)
df_filled = df.fillna("Missing")
print(df_filled)
Output:
      Name      City      Age
0    Alice  New York     24.0
1      Bob   Missing  Missing
2  Charlie   Chicago     35.0
3    David   Missing  Missing
Note: Age prints as 24.0 and 35.0 because the None entries make pandas store the column as floats before filling.

b. Given a DataFrame with a "Salary" column, how would you sort it in descending order?
A. Code:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Salary': [50000, 60000, 70000, 55000]}
df = pd.DataFrame(data)
df_sorted = df.sort_values(by='Salary', ascending=False)
print(df_sorted)
Output:
      Name  Salary
2  Charlie   70000
1      Bob   60000
3    David   55000
0    Alice   50000

c. How can you group a DataFrame by a "Department" column and calculate the average salary for each
department?
A. Code:
data = {'Department': ['HR', 'IT', 'HR', 'IT'],
'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Department')['Salary'].mean()
print(grouped_df)
Output:
Department
HR    52500.0
IT    65000.0
Name: Salary, dtype: float64

d. What happens when you use multiple columns in groupby()? Provide an example scenario.
A. When using multiple columns in groupby(), the DataFrame is grouped by the unique combinations of
values from those columns.
Example scenario: You have a dataset of employees and want to calculate the average salary by both
"Department" and "Gender".
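
One possible sketch of this scenario is shown here. The employee records are invented for illustration; only the "Department" and "Gender" column names come from the answer above.

```python
import pandas as pd

# Hypothetical employee records.
employees = pd.DataFrame({'Department': ['HR', 'HR', 'IT', 'IT', 'IT'],
                          'Gender': ['F', 'M', 'F', 'M', 'M'],
                          'Salary': [50000, 55000, 60000, 70000, 80000]})

# Passing a list of columns forms one group per unique (Department, Gender) pair.
avg_salary = employees.groupby(['Department', 'Gender'])['Salary'].mean()
print(avg_salary)

# reset_index() turns the two-level index back into ordinary columns.
avg_salary_table = avg_salary.reset_index()
print(avg_salary_table)
```

The grouped result is indexed by both columns, so a single group can be looked up with a tuple such as avg_salary.loc[('IT', 'M')].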


e. How would you handle a dataset where multiple columns contain NaN values and need different
replacement strategies?
A. You can pass a dictionary to the fillna() method, mapping each column name to its own replacement
value, so that every column gets a different strategy for replacing NaN values.
Code:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, None, 35, None],
'City': [None, 'Los Angeles', 'Chicago', None]}
df = pd.DataFrame(data)
replacement_values = {'Age': 30, 'City': 'Unknown'}
df_filled = df.fillna(replacement_values)
print(df_filled)
Output:
      Name   Age         City
0    Alice  24.0      Unknown
1      Bob  30.0  Los Angeles
2  Charlie  35.0      Chicago
3    David  30.0      Unknown
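
The replacement values in the dictionary do not have to be hard-coded. As a variation on the answer above, the sketch below computes the Age fill value from the data itself; choosing the column mean is an assumption made for this example, not a requirement of the task.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [24, None, 35, None],
                   'City': [None, 'Los Angeles', 'Chicago', None]})

# Assumed strategy: numeric column -> column mean, text column -> fixed label.
# mean() skips NaN by default, so it averages 24 and 35 only.
fill_values = {'Age': df['Age'].mean(), 'City': 'Unknown'}
df_filled = df.fillna(fill_values)
print(df_filled)
```

Columns that are not listed in the dictionary are left unchanged.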

Student's Signature

(For Evaluator’s use only)


Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured:_______ out of ________

Signature of the Evaluator with Date:
