Lab Session 07: Perform Following Operations Using Pandas
2. How can we fill NaN values in a Pandas DataFrame with a specific string?
A. You can use the fillna() function to replace NaN values with a specific string:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'City': ['New York', None, 'Chicago', None]}
df = pd.DataFrame(data)
df_filled = df.fillna("Unknown")
print(df_filled)
4. How does the groupby() function work, and when should it be used?
A. The groupby() function in Pandas splits the rows of a DataFrame into groups based on the values in one or
more columns, then applies an aggregate function (such as sum, mean, or count) to each group.
Usage: Use it when you want to analyze subsets of the data and compute aggregate values for each
subset.
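As a rough illustration of the split-then-aggregate idea described above, the sketch below computes several aggregates per group in one call using agg(). The column names and values are made up for this example and are not part of the lab data.

```python
import pandas as pd

# Hypothetical data, invented only for this illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50],
})

# Split the rows by Category, then compute three aggregates for each group
summary = df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])
print(summary)
```

Each row of summary corresponds to one group, and each column to one of the requested aggregate functions.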
5. Can you explain a real-world scenario where sorting and grouping data is essential?
A. Scenario: In a sales report analysis, sorting and grouping data is essential for understanding performance
across different product categories or regions.
Sorting: To find the top-selling products or regions, you can sort the sales data by the total revenue
in descending order. This helps identify high-performers at a glance.
Grouping: To calculate total revenue for each region or category, you can group the sales data by
region or product category and then calculate the sum of sales. This helps compare the performance
of different regions or categories.
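The scenario above could be sketched as follows. The sales figures, region names, and column names are assumptions chosen for illustration, not real data.

```python
import pandas as pd

# Hypothetical sales records, invented only for this illustration
sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'East', 'South'],
    'Product': ['Pen', 'Pen', 'Book', 'Book', 'Bag'],
    'Revenue': [100, 250, 300, 150, 200],
})

# Grouping: total revenue for each region
revenue_by_region = sales.groupby('Region')['Revenue'].sum()

# Sorting: rank the regions from highest to lowest total revenue
ranked = revenue_by_region.sort_values(ascending=False)
print(ranked)
```

Grouping first and sorting second puts the best-performing region at the top of the result.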
In Lab Task:
a. fillna()
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'City': ['New York', None, 'Chicago', None]}
df = pd.DataFrame(data)
df_filled = df.fillna("Unknown")
print(df_filled)
Output:
      Name      City
0    Alice  New York
1      Bob   Unknown
2  Charlie   Chicago
3    David   Unknown
b. sort_values()
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, 30, 35, 40]}
df = pd.DataFrame(data)
# Sort the rows by Age, smallest value first
df_sorted = df.sort_values(by='Age', ascending=True)
print(df_sorted)
Output:
      Name  Age
0    Alice   24
1      Bob   30
2  Charlie   35
3    David   40
c. groupby()
Code:
import pandas as pd
data = {'Category': ['A', 'B', 'A', 'B'],
'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)
# Group the rows by Category and add up Value within each group
grouped_df = df.groupby('Category')['Value'].sum()
print(grouped_df)
Output:
Category
A    40
B    60
Name: Value, dtype: int64
b. Given a DataFrame with a "Salary" column, how would you sort it in descending order?
A. Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Salary': [50000, 60000, 70000, 55000]}
df = pd.DataFrame(data)
# ascending=False puts the highest salary first
df_sorted = df.sort_values(by='Salary', ascending=False)
print(df_sorted)
Output:
      Name  Salary
2  Charlie   70000
1      Bob   60000
3    David   55000
0    Alice   50000
c. How can you group a DataFrame by a "Department" column and calculate the average salary for each
department?
A. Code:
import pandas as pd
data = {'Department': ['HR', 'IT', 'HR', 'IT'],
'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
# Group the rows by Department and average Salary within each group
grouped_df = df.groupby('Department')['Salary'].mean()
print(grouped_df)
Output:
Department
HR    52500.0
IT    65000.0
Name: Salary, dtype: float64
d. What happens when you use multiple columns in groupby()? Provide an example scenario.
A. When using multiple columns in groupby(), the DataFrame is grouped by the unique combinations of
values from those columns.
Example scenario: You have a dataset of employees and want to calculate the average salary by both
"Department" and "Gender".
e. How would you handle a dataset where multiple columns contain NaN values and need different
replacement strategies?
A. You can pass a dictionary to the fillna() method, mapping each column name to the value that should
replace NaN in that column.
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, None, 35, None],
'City': [None, 'Los Angeles', 'Chicago', None]}
df = pd.DataFrame(data)
# Map each column to its own replacement value
replacement_values = {'Age': 30, 'City': 'Unknown'}
df_filled = df.fillna(replacement_values)
print(df_filled)
Output:
      Name   Age         City
0    Alice  24.0      Unknown
1      Bob  30.0  Los Angeles
2  Charlie  35.0      Chicago
3    David  30.0      Unknown
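The replacement values in the dictionary do not have to be fixed constants. As one possible variation, the sketch below fills a numeric column with its mean and a text column with its most frequent value. The choice of mean and mode is an assumption made for illustration, and the data is slightly changed from the example above so that "City" has a clear most frequent value.

```python
import pandas as pd

# Hypothetical data, adapted from the example above for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, None, 35, None],
    'City': [None, 'Chicago', 'Chicago', None],
})

# Compute a different replacement value for each column
replacement_values = {
    'Age': df['Age'].mean(),       # mean of the non-missing ages
    'City': df['City'].mode()[0],  # most frequent non-missing city
}

df_filled = df.fillna(replacement_values)
print(df_filled)
```

Both mean() and mode() ignore NaN values by default, so the replacements are computed from the observed values only.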
Student's Signature