Pandas_Worksheet

The document outlines a series of tasks to be performed using the Pandas library in Python, including operations on Series and DataFrames, such as sorting, handling missing values, calculating statistics, and merging data. It also includes specific tasks related to the penguins and Titanic datasets, requiring data loading, filtering, and analysis. Additionally, it covers operations on a family income dataset and attendance records from workshops, emphasizing data manipulation and analysis techniques.

Uploaded by U Soni

1. Perform the following operations using Pandas Series object(s):

a. Create a series with 5 elements. Display the series sorted on index and also sorted on
values separately.
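
One possible approach to item (a), sketched with made-up sample values and string labels (both are assumptions, since the worksheet does not prescribe the data). It uses `Series.sort_index` and `Series.sort_values`, each of which returns a new Series and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data: five values with a deliberately unordered index.
s = pd.Series([30, 10, 50, 20, 40], index=["d", "a", "e", "b", "c"])

by_index = s.sort_index()    # ordered by label: a, b, c, d, e
by_value = s.sort_values()   # ordered by value: 10, 20, 30, 40, 50

print("Sorted on index:")
print(by_index)
print("Sorted on values:")
print(by_value)
```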

b. Create a series with N elements with some duplicate values. Find the minimum and
maximum ranks assigned to the values using ‘first’ and ‘max’ methods.
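
A minimal sketch for item (b), assuming N = 6 and made-up values with repeats. The wording of the task can be read in more than one way; this version simply ranks the same Series twice with `Series.rank`, once with `method="first"` (ties are broken by order of appearance) and once with `method="max"` (every tied value receives the highest rank of its group), and then shows the two results side by side.

```python
import pandas as pd

# Hypothetical data with duplicates (N = 6).
s = pd.Series([7, 3, 7, 1, 3, 7])

rank_first = s.rank(method="first")  # ties ranked in order of appearance
rank_max = s.rank(method="max")      # ties share the largest rank in the group

comparison = pd.DataFrame({"value": s, "first": rank_first, "max": rank_max})
print(comparison)
```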

c. Display the index value of the minimum and maximum element of a Series.
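
For item (c), `Series.idxmin` and `Series.idxmax` return the index label (not the position) of the smallest and largest element. The sample values and labels below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series([30, 10, 50, 20, 40], index=["d", "a", "e", "b", "c"])

label_of_min = s.idxmin()  # label of the smallest value
label_of_max = s.idxmax()  # label of the largest value

print("Index of minimum:", label_of_min)
print("Index of maximum:", label_of_max)
```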

2. Create a data frame having at least 3 columns and 50 rows to store numeric data
generated using a random function. Replace 10% of the values with null values whose index
positions are generated using a random function. Do the following:

a. Identify and count missing values in a data frame.
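
The setup for task 2 and item (a) might look like the sketch below. The column names `A`, `B`, `C`, the fixed seed, and the use of NumPy's `default_rng` as the "random function" are all assumptions. Ten percent of the 150 cells (15 cells) are chosen without replacement, so exactly 15 values end up missing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seed fixed only to make the run repeatable

# 50 rows x 3 columns of random floats in [0, 1).
df = pd.DataFrame(rng.random((50, 3)), columns=["A", "B", "C"])

# Choose 10% of all cells at random (no repeats) and blank them out.
n_missing = df.size // 10
flat_positions = rng.choice(df.size, size=n_missing, replace=False)
rows, cols = np.unravel_index(flat_positions, df.shape)
for r, c in zip(rows, cols):
    df.iat[r, c] = np.nan

# (a) Identify and count missing values.
missing_mask = df.isna()                  # True where a value is missing
missing_per_column = missing_mask.sum()   # count per column
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing:", total_missing)
```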

b. Drop the column having more than 5 null values.
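
A sketch for item (b). To keep the outcome predictable, the nulls here are placed at fixed positions rather than random ones (an assumption made only for this illustration): column `A` gets 8 nulls, `B` gets 2, and `C` gets 5, so only `A` exceeds the threshold.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((50, 3)), columns=["A", "B", "C"])

# Assumed null layout for illustration: A has 8 nulls, B has 2, C has 5.
df.iloc[:8, 0] = np.nan
df.iloc[:2, 1] = np.nan
df.iloc[:5, 2] = np.nan

null_counts = df.isna().sum()
columns_to_drop = null_counts[null_counts > 5].index  # strictly more than 5
trimmed = df.drop(columns=columns_to_drop)

print("Dropped:", list(columns_to_drop))
print("Remaining:", list(trimmed.columns))
```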

c. Identify the row label having maximum of the sum of all values in a row and drop that row.
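
Item (c) can be sketched as follows, again with assumed column names and seed. `DataFrame.sum(axis=1)` adds up each row (skipping nulls by default), `idxmax` gives the label of the row with the largest total, and `drop` removes it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((50, 3)), columns=["A", "B", "C"])

row_sums = df.sum(axis=1)          # one total per row; NaN is skipped by default
max_label = row_sums.idxmax()      # label of the row with the largest total
df_without_max = df.drop(index=max_label)

print("Row with the maximum sum:", max_label)
print("Rows remaining:", len(df_without_max))
```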


d. Sort the data frame on the basis of the first column.

e. Remove all duplicates from the first column.

f. Find the correlation between the first and second columns and the covariance between
the second and third columns.

g. Discretize the second column and create 5 bins.
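
The remaining items of task 2 could be sketched together as below. Random integers are used instead of floats (an assumption) so that the first column actually contains duplicates. "Remove all duplicates" is read here as keeping the first row for each repeated value, which is the default behaviour of `drop_duplicates`; equal-width binning with `pd.cut` is likewise one choice among several (`pd.qcut` would give equal-frequency bins).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
df = pd.DataFrame(
    rng.integers(0, 20, size=(50, 3)).astype(float),
    columns=["A", "B", "C"],
)

# Sort on the first column.
sorted_df = df.sort_values(by="A")

# Remove rows whose first-column value has already appeared.
deduped = df.drop_duplicates(subset="A")

# Correlation (A vs B) and covariance (B vs C).
corr_ab = df["A"].corr(df["B"])
cov_bc = df["B"].cov(df["C"])

# Discretize the second column into 5 equal-width bins.
b_binned = pd.cut(df["B"], bins=5)

print(sorted_df.head())
print("Unique values kept in A:", len(deduped))
print("corr(A, B):", corr_ab)
print("cov(B, C):", cov_bc)
print(b_binned.value_counts().sort_index())
```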


3. Write a program using the Pandas library to perform the following operations on the
penguins dataset from the Seaborn library:

a. Load the penguins dataset into a Pandas dataframe.

b. Determine the number of observations/records and the number of attributes in the
dataframe.

c. Display the names of the attributes, row indexes, and data types of each attribute in
the dataframe.

d. Display the first 5 and last 5 records of the dataframe.

e. Retrieve the values of the second column for the third and fourth records.

f. Display a summary of the data distribution for all attributes in the dataframe.
g. Compute the pairwise correlation between all attributes in the dataframe.
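
The steps of task 3 can be sketched as below. With Seaborn installed and network access available, the data would normally be loaded with `sns.load_dataset("penguins")`. So that the sketch runs on its own, it uses a six-row stand-in with the same column names; the values in it are invented for illustration and are not real measurements. Correlation is restricted to numeric columns, because text columns such as `species` have no pairwise correlation.

```python
import pandas as pd

# (a) In practice:
#     import seaborn as sns
#     penguins = sns.load_dataset("penguins")
# Stand-in with the same columns; values are made up for illustration.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "island": ["Torgersen", "Biscoe", "Dream", "Dream", "Biscoe", "Biscoe"],
    "bill_length_mm": [39.0, 38.5, 48.0, 50.5, 46.0, 49.5],
    "bill_depth_mm": [18.5, 18.0, 18.0, 19.5, 14.0, 15.5],
    "flipper_length_mm": [181.0, 186.0, 195.0, 200.0, 215.0, 222.0],
    "body_mass_g": [3750.0, 3600.0, 3700.0, 3900.0, 4900.0, 5500.0],
    "sex": ["Male", "Female", "Female", "Male", "Female", "Male"],
})

# (b) Number of records and attributes.
n_records, n_attributes = penguins.shape
print(n_records, "records,", n_attributes, "attributes")

# (c) Attribute names, row index, and data types.
print(penguins.columns.tolist())
print(penguins.index)
print(penguins.dtypes)

# (d) First 5 and last 5 records.
print(penguins.head(5))
print(penguins.tail(5))

# (e) Second column, third and fourth records (positions 2 and 3).
second_col_values = penguins.iloc[2:4, 1]
print(second_col_values)

# (f) Summary of the distribution of every attribute.
summary = penguins.describe(include="all")
print(summary)

# (g) Pairwise correlation between the numeric attributes.
correlations = penguins.select_dtypes(include="number").corr()
print(correlations)
```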

4. Consider the Titanic dataset, which contains information about passengers on board the
Titanic, including their age, gender, passenger class, survival status, and other attributes.
Write a program using the Pandas library to perform the following operations on the Titanic
dataset:

a. Load the Titanic dataset into a Pandas DataFrame.

b. Check for any duplicate records and missing values in the dataset and handle them
appropriately.

c. Calculate and display the total number of passengers who survived and those who did not.
d. Filter the DataFrame to select only the records of passengers who were under the age of
18.

e. Calculate the average age of the passengers in each passenger class.

f. Calculate the total fare paid by the passengers of first class.


g. Create a new column in the DataFrame called "Family Size" that represents the total
number of family members (including the passenger) on board.

h. Calculate the correlation between age and fare attributes of the dataset.

i. Compute descriptive statistics for any numeric attribute, grouped by gender.

j. Create a contingency table that shows the count of passengers based on their survival
status (survived or not) and passenger class (first, second, or third class).
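
One way the operations of task 4 might be put together is sketched below. The worksheet does not say where the Titanic data comes from, so the lowercase column names (`survived`, `pclass`, `sex`, `age`, `sibsp`, `parch`, `fare`) follow Seaborn's copy of the dataset as an assumption; a CSV from another source may use different names. The seven-row stand-in is invented so that the sketch runs on its own, and filling missing ages with the median is only one of several reasonable ways to handle them.

```python
import numpy as np
import pandas as pd

# (a) In practice: titanic = sns.load_dataset("titanic") or pd.read_csv(...).
# Made-up stand-in; the last row duplicates the second and one age is missing.
titanic = pd.DataFrame({
    "survived": [1, 0, 1, 0, 0, 1, 0],
    "pclass":   [1, 3, 3, 1, 2, 2, 3],
    "sex":      ["female", "male", "female", "male", "male", "female", "male"],
    "age":      [38.0, 22.0, 4.0, 54.0, np.nan, 14.0, 22.0],
    "sibsp":    [1, 1, 1, 0, 0, 1, 1],
    "parch":    [0, 0, 1, 0, 0, 0, 0],
    "fare":     [71.0, 7.25, 16.5, 52.0, 13.0, 30.0, 7.25],
})

# (b) Check for duplicates and missing values, then handle them.
n_duplicates = int(titanic.duplicated().sum())
missing_per_column = titanic.isna().sum()
titanic = titanic.drop_duplicates().reset_index(drop=True)
titanic["age"] = titanic["age"].fillna(titanic["age"].median())

# (c) Survivors versus non-survivors.
survival_counts = titanic["survived"].value_counts()

# (d) Passengers under 18.
minors = titanic[titanic["age"] < 18]

# (e) Average age per passenger class.
mean_age_by_class = titanic.groupby("pclass")["age"].mean()

# (f) Total fare paid by first-class passengers.
first_class_fare = titanic.loc[titanic["pclass"] == 1, "fare"].sum()

# (g) Family size = siblings/spouses + parents/children + the passenger.
titanic["Family Size"] = titanic["sibsp"] + titanic["parch"] + 1

# (h) Correlation between age and fare.
age_fare_corr = titanic["age"].corr(titanic["fare"])

# (i) Descriptive statistics of a numeric attribute, by gender.
fare_stats_by_sex = titanic.groupby("sex")["fare"].describe()

# (j) Contingency table: survival status versus passenger class.
contingency = pd.crosstab(titanic["survived"], titanic["pclass"])

print(survival_counts, mean_age_by_class, fare_stats_by_sex, contingency, sep="\n\n")
```
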
5. Consider the following data frame containing a family name, gender of the family member
and her/his monthly income in each record.

a. Calculate and display familywise gross monthly income.

b. Calculate and display the member with the highest monthly income.
c. Calculate and display monthly income of all members with income greater than Rs.
60000.00.

d. Calculate and display the average monthly income of the female members.
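
The data frame for task 5 is not reproduced in this copy of the worksheet, so the sketch below builds a hypothetical one. The column names `FamilyName`, `Gender`, and `MonthlyIncome`, the `F`/`M` coding, and every value are assumptions; they should be swapped for the real table before use.

```python
import pandas as pd

# Hypothetical stand-in for the worksheet's table.
income = pd.DataFrame({
    "FamilyName": ["Shah", "Shah", "Verma", "Verma", "Verma"],
    "Gender": ["F", "M", "F", "M", "F"],
    "MonthlyIncome": [70000.0, 45000.0, 62000.0, 58000.0, 30000.0],
})

# (a) Gross monthly income per family.
family_income = income.groupby("FamilyName")["MonthlyIncome"].sum()

# (b) Record of the member with the highest monthly income.
top_earner = income.loc[income["MonthlyIncome"].idxmax()]

# (c) Members earning more than Rs. 60000.00.
high_earners = income[income["MonthlyIncome"] > 60000.00]

# (d) Average monthly income of the female members.
female_mean = income.loc[income["Gender"] == "F", "MonthlyIncome"].mean()

print(family_income)
print(top_earner)
print(high_earners)
print("Average income of female members:", female_mean)
```
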
6. Consider two Excel files having attendance of two workshops. Each file has three fields,
‘Name’, ‘Date’, and ‘Duration’ (in minutes), where names are unique within a file. Note that
duration may take one of three values (30, 40, 50) only. Import the data into two data frames
and do the following:

a. Perform merging of the two data frames to find the names of students who had attended
both workshops.

b. Find names of all students who have attended a single workshop only.

c. Merge two data frames row-wise and find the total number of records in the data frame.

d. Merge two data frames row-wise and use two columns viz. names and dates as multi-row
indexes.

e. Generate descriptive statistics for this hierarchical data frame.
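
The steps of task 6 can be sketched as follows. Real data would be imported with `pd.read_excel`, which needs an Excel engine such as `openpyxl`; the file names in the comment are hypothetical. To stay runnable without any files, the sketch uses two small made-up frames with the three fields named in the task. An outer merge with `indicator=True` is one convenient way to separate names found in both files from names found in only one.

```python
import pandas as pd

# In practice (hypothetical file names):
#     w1 = pd.read_excel("workshop1.xlsx")
#     w2 = pd.read_excel("workshop2.xlsx")
# Made-up stand-ins with the fields Name, Date, Duration.
w1 = pd.DataFrame({
    "Name": ["Anu", "Bala", "Chitra", "Dev"],
    "Date": pd.to_datetime(["2024-01-10"] * 4),
    "Duration": [30, 40, 50, 30],
})
w2 = pd.DataFrame({
    "Name": ["Bala", "Dev", "Esha"],
    "Date": pd.to_datetime(["2024-01-17"] * 3),
    "Duration": [40, 50, 30],
})

# (a) Students who attended both workshops (inner join on Name).
both = pd.merge(w1, w2, on="Name", suffixes=("_w1", "_w2"))
both_names = both["Name"].tolist()

# (b) Students who attended exactly one workshop.
outer = pd.merge(w1, w2, on="Name", how="outer", indicator=True)
single_names = outer.loc[outer["_merge"] != "both", "Name"].tolist()

# (c) Row-wise merge (concatenation) and total record count.
stacked = pd.concat([w1, w2], ignore_index=True)
total_records = len(stacked)

# (d) Row-wise merge with Name and Date as a two-level row index.
indexed = pd.concat([w1, w2]).set_index(["Name", "Date"])

# (e) Descriptive statistics for the hierarchical frame.
stats = indexed.describe()

print("Both workshops:", both_names)
print("Single workshop only:", single_names)
print("Total records:", total_records)
print(indexed)
print(stats)
```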
