Worksheet - Pandas
Ans:
0 1
1 2
2 2
3 7
4 Sachin
dtype: object
0 1
1 2
2 2
dtype: object
2 Write a program in Python to find the maximum value over the index axis of a DataFrame.
Ans:
# importing pandas as pd
import pandas as pd
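The answer above breaks off after the import. A minimal sketch of one way to do it, assuming a small hypothetical DataFrame (the column names and values below are illustrative and not from the worksheet): max(axis=0) works down the index and returns the maximum of each column.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'A': [4, 9, 2], 'B': [7, 1, 8]})

# axis=0 runs down the index, giving one maximum per column
col_max = df.max(axis=0)
print(col_max)
```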
Ans:
www.python4csip.com
4 Write a python program to sort the following data according to ascending order
of Age.
Name     Age  Designation
Sanjeev  37   Manager
Keshav   42   Clerk
Rahul    38   Accountant
Ans:
import pandas as pd
name=pd.Series(['Sanjeev','Keshav','Rahul'])
age=pd.Series([37,42,38])
designation=pd.Series(['Manager','Clerk','Accountant'])
d1={'Name':name,'Age':age,'Designation':designation}
df=pd.DataFrame(d1)
print(df)
df1=df.sort_values(by='Age')
print(df1)
Ans:
import pandas as pd
name=pd.Series(['Sanjeev','Keshav','Rahul'])
age=pd.Series([37,42,38])
designation=pd.Series(['Manager','Clerk','Accountant'])
d1={'Name':name,'Age':age,'Designation':designation}
df=pd.DataFrame(d1)
print(df)
df2=df.sort_values(by='Name',ascending=False)
print(df2)
6 Which of the following can be used as data in Pandas?
1. A Python dictionary
2. An ndarray
3. A scalar value
4. All of the above
Ans:
Ans:
3. Value,size
Ans:
1. True
Ans:
4. None
Ans:
1. Dataframe
12 What will be the output of df.iloc[3:7,3:6]?
Ans:
It will display the rows at positions 3 to 6 and the columns at positions 3 to 5 of
the DataFrame 'df'; the end of each slice is excluded.
13 How do you select the rows where age is missing?
1. df[df['age'].isnull]
2. df[df['age']==NaN]
3. df[df['age']==0]
4. None
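As a sketch of how missing values are usually selected, assuming a small hypothetical DataFrame with an 'age' column: isnull() has to be called (with parentheses) to produce a boolean mask, and comparing against NaN with == never matches because NaN is not equal to itself.

```python
import pandas as pd
import numpy as np

# Hypothetical data, for illustration only
df = pd.DataFrame({'name': ['Asha', 'Ravi', 'Meena'],
                   'age': [25, np.nan, 31]})

# isnull() returns a boolean mask that is True where age is missing
missing = df[df['age'].isnull()]
print(missing)

# Comparing with == never matches, because NaN != NaN
never = df[df['age'] == np.nan]
print(len(never))
```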
Ans:
'BidPrice':[13,12,7,10,17,15],
'Runs':[1000,2400,900,200,3600,3700]}
df=pd.DataFrame(d)
print(df)
print(df.iloc[:2,:])
print(df.iloc[-3:,:])
15 Write a command to Find most expensive Player.
Ans:
print(df[df['BidPrice']==df['BidPrice'].max()])
16 Write a command to Print total players per team.
Ans:
print(df.groupby('Team').Player.count())
17 Write a command to Find player who had highest BidPrice from each team.
Ans:
val=df.groupby('Team')
print(val[['Player','BidPrice']].max())
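Note that max() on a group works column by column, so the Player it reports is the alphabetically last name in each team, not necessarily the player with the highest bid. One possible alternative, sketched with hypothetical data (only the column names Team, Player and BidPrice come from the worksheet), uses idxmax() to pick the whole row with the highest BidPrice in each team.

```python
import pandas as pd

# Hypothetical data; only the column names follow the worksheet
df = pd.DataFrame({'Team': ['X', 'X', 'Y', 'Y'],
                   'Player': ['Zed', 'Amy', 'Bob', 'Cal'],
                   'BidPrice': [10, 15, 12, 7]})

# idxmax() gives the row label of the highest BidPrice in each team
top = df.loc[df.groupby('Team')['BidPrice'].idxmax()]
print(top)
```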
1. Mathematician
2. Statistician
3. Software Programmer
4. All of the above
Ans:
4 All the above
22 Which built-in database module is used with Python?
1. Mysql
2. Pysqlite
3. Sqlite3
4. Pysqln
Ans:
3 Sqlite3
23 How can you drop columns in python that contain NaN?
Ans:
df1.dropna(axis=1)
Ans:
df1.dropna(axis=0)
25 A Series is ______________ array, which is labelled and ______________ type.
Ans:
Ans:
4 All
Ans:
4.6
29 How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df3=df1.merge(df2, on='key', how='inner')
1. 3
2. 4
3. 5
4. 6
Ans:
1. 3
30 How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df3=df1.merge(df2, on='key', how='right')
1. 3
2. 4
3. 5
4. 6
Ans:
2. 4
31 How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df3=df1.merge(df2, on='key', how='left')
1. 3
2. 4
3. 5
4. 6
Ans:
3. 5
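The three answers above can be checked with a short script. The key 'b' appears twice in df2, so every join that keeps the 'b' row of df1 produces two rows for it.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c', 'd'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['a', 'b', 'e', 'b'], 'value': [5, 6, 7, 8]})

# Count the rows produced by each kind of join on 'key'
counts = {how: len(df1.merge(df2, on='key', how=how))
          for how in ['inner', 'right', 'left']}
print(counts)
```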
Ans:
pop()
33 A ____________ is an interactive way to quickly summarize large amounts of data.
Ans:
Pivot table
34 The _______________ method is used to rename the existing indexes in a data frame.
Ans:
rename
35 The __________________ argument of the sort_values() method prevents it from
creating a new data frame.
Ans:
inplace
36 Write a program in Python to calculate the sum of marks in the CS subject in a
given dataset-
'CS':[45,55,78,95,99,97], 'IP':[87,89,98,94,78,77]
Ans:
import pandas as pd
d1={ 'CS':[45,55,78,95,99,97], 'IP':[87,89,98,94,78,77] }
df=pd.DataFrame(d1)
print(df['CS'].sum())
37 Write a Python program to create a data frame with headings (CS and IP) from
the list given below-
[[79,92],[86,96],[85,91],[80,99]]
Ans:
import pandas as pd
l=[[79,92],[86,96],[85,91],[80,99]]
df=pd.DataFrame(l,columns=['CS','IP'])
print(df)
38 How can you find the total number of rows and columns in a data frame?
Ans:
df.shape
39
MaxTemp  MinTemp  City       RainFall
45       30       Delhi      25.6
34       24       Guwahati   41.5
48       34       Chennai    36.8
32       22       Bengaluru  40.2
44       29       Mumbai     38.5
39       37       Jaipur     24.9
Ans:
print(df.sum(axis=0))
40 Based on the above data frame df, Write a command to compute mean of
column MaxTemp.
Ans:
print(df['MaxTemp'].mean())
41 Based on the above data frame df, Write a command to compute average
MinTemp, RainFall for first 4 rows.
Ans:
df[['MinTemp', 'RainFall']][:4].mean()
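For reference, a sketch that rebuilds the data frame from the table in question 39 and runs the commands from questions 40 and 41; the variable names for the results are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'MaxTemp': [45, 34, 48, 32, 44, 39],
    'MinTemp': [30, 24, 34, 22, 29, 37],
    'City': ['Delhi', 'Guwahati', 'Chennai', 'Bengaluru', 'Mumbai', 'Jaipur'],
    'RainFall': [25.6, 41.5, 36.8, 40.2, 38.5, 24.9]})

# Question 40: mean of the MaxTemp column
max_temp_mean = df['MaxTemp'].mean()
print(max_temp_mean)

# Question 41: mean of MinTemp and RainFall over the first 4 rows
first4_mean = df[['MinTemp', 'RainFall']][:4].mean()
print(first4_mean)
```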
42 Which method is used to read the data from MySQL database through Data
Frame?
Ans:
read_sql_query()
Ans:
execute()
44 What will be the output of following code?
import pandas as pd
df = pd.DataFrame([45,50,41,56], index = [True, False, True, False])
print(df.iloc[True])
Ans:
It will raise an error such as "Cannot index by location index with a non-integer
key", because iloc accepts only integer positions.
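A minimal sketch of this behaviour, catching the exception so the script keeps running. It assumes a reasonably recent pandas, where the error raised is a TypeError.

```python
import pandas as pd

df = pd.DataFrame([45, 50, 41, 56], index=[True, False, True, False])

try:
    print(df.iloc[True])
    raised = False
except TypeError as err:
    # iloc is position based and rejects the boolean key
    raised = True
    print('Error:', err)
```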
Two functions for pivoting are: pivot() and pivot_table()
52. Write a python code to create a dataframe with appropriate headings from the
list given below:
['S101', 'Amy', 70], ['S102', 'Risha', 69], ['S104', 'Susan', 75], ['S105', 'George', 82]
import pandas as pd
L=[['S101','Amy',70], ['S102','Risha',69], ['S104','Susan',75], ['S105','George',82]]
df=pd.DataFrame(L,index=[1,2,3,4],columns=['ID','Name','Points'])
print(df)
53. Consider the following dataframe, and answer the questions given below:
import pandas as pd
df = pd.DataFrame({"Quarter1":[2000, 4000, 5000, 4400, 10000],
"Quarter2":[5800, 2500, 5400, 3000, 2900],
"Quarter3":[20000, 16000, 7000, 3600, 8200],
"Quarter4":[1400, 3700, 1700, 2000, 6000]})
Write the code to find mean value from above dataframe df over the index and
column axis. (Skip NaN value)
print(df.mean(axis=0,skipna=True))
print(df.mean(axis=1,skipna=True))
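To make the two axes concrete, the sketch below spot-checks one value from each result: axis=0 works down the index and returns one mean per column, while axis=1 works across the columns and returns one mean per row.

```python
import pandas as pd

df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})

col_means = df.mean(axis=0, skipna=True)  # one value per column
row_means = df.mean(axis=1, skipna=True)  # one value per row

print(col_means['Quarter1'])  # mean of the Quarter1 column
print(row_means.loc[0])       # mean of the first row
```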
54. Use sum() function to find the sum of all the values over the index axis.
print(df.sum(axis=0))
55. Find the median of the dataframe df.
print(df.median())
56. Find the output of the following code:
import pandas as pd
data = [{'a': 10, 'b': 20},{'a': 6, 'b': 32, 'c': 22}]
df1 = pd.DataFrame(data,columns=['a','b'])
df2 = pd.DataFrame(data,columns=['a','b1'])
print(df1)
print(df2)
a b
0 10 20
1 6 32
a b1
0 10 NaN
1 6 NaN
57.
import pandas as pd
x1=[[10,150],[40,451],[15,302],[40,703]]
df1=pd.DataFrame(x1,columns=['mark1','mark2'])
x2=[[30,20],[20,25],[20,30],[5,30]]
df2=pd.DataFrame(x2,columns=['mark1','mark2'])
print(df1)
print(df2)
60. To change index label of df1 from 0 to zero and from 1 to one.
df1=df1.rename(index={0:'zero',1:'one'})
62. For the given code fill in the blanks so that we get the desired output with
maximum value for Quantity and Average Value for Cost:
import pandas as pd
import numpy as np
d={'Product':['Apple','Pear','Banana','Grapes'],'Quantity':[100,150,200,250],
'Cost':[1000,1500,1200,900]}
df = pd.DataFrame(d)
df1 = ______________
print(df1)
Quantity 250.0
Cost 1150.0
dtype: float64
df1=pd.Series([df['Quantity'].max(),df['Cost'].mean()],index=['Quantity','Cost'])
import pandas as pd
df1=pd.DataFrame({'Icecream':['Vanila','ButterScotch','Caramel'] ,
'Cookies':['Goodday','Britannia', 'Oreo']})
df2=pd.DataFrame({'Chocolate':['DairyMilk','Kitkat'],'Icecream':['Vanila','ButterScotch'],
'Cookies':['Hide and Seek','Britannia']})
df2.reindex_like(df1)
print(df2)
Chocolate Icecream Cookies
0 DairyMilk Vanila Hide and Seek
1 Kitkat ButterScotch Britannia
print(df1.add(df2))
df1=df1.sort_values(by='Second',ascending=False)
df2=df2.rename(index={0:'a',1:'b',2:'c',3:'d'})
70. To display those rows in df1 where value of third column is more than 45.
print(df1[df1['Third']>45])
import pandas as pd
student_df=pd.DataFrame({'Name':['Ananmay','Aditi','Mehak','Kriti'],'Class':['XI','XI','XI','XI'],
'Marks':[95,82,65,45]},index=[1,2,3,4])
data={'Name':'Sohail','Class':'XII','Marks':77}
newstd=pd.DataFrame(data,index=[5])
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
student_df=pd.concat([student_df,newstd])
73. Jitesh wants to sort a DataFrame df. He has written the following code.
df=pd.DataFrame({"a":[13, 24, 43, 4],"b":[51, 26, 37, 48]})
print(df)
df.sort_values('a')
print(df)
He is getting an output which is showing original DataFrame and not the sorted
DataFrame. Identify the error and suggest the correction so that the sorted
DataFrame is printed.
sort_values() returns a new sorted DataFrame and leaves the original
unchanged, so df still prints in its original order. The correction is:
df.sort_values('a',inplace=True)
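A short sketch of the two usual fixes, using the DataFrame from the question: assign the sorted result to a name, or sort df itself in place.

```python
import pandas as pd

df = pd.DataFrame({"a": [13, 24, 43, 4], "b": [51, 26, 37, 48]})

# Option 1: keep df unchanged and store the sorted copy under a new name
sorted_df = df.sort_values('a')
print(sorted_df)

# Option 2: sort df itself; with inplace=True the method returns None
df.sort_values('a', inplace=True)
print(df)
```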
74. Write a command to display the name of the company and the highest car price
from DataFrame having data about cars.
import pandas as pd
car={'Name':['Innova','Tavera','Royal','Scorpio'],'Price':[300000,800000,250000,650000]}
df=pd.DataFrame(car,index=[1,2,3,4])
print(df[df.Price==df.Price.max()])
75. Write a command in python to Print the total number of records in the
DataFrame.
print(df1.count())
76. Consider a DataFrame ‘df’ created using the dictionary given below, answer
the questions given below:
77. Write a command to create a pivot table based on ‘qualify’ column and display
sum of the score and attempt columns.
print(df.pivot_table(columns=['qualify'],values=['score','attempts'],aggfunc='sum'))
78. Write a command to display the names of students who have qualified.
print(df[df['qualify']=='yes'].name)
79. Consider the following DataFrame df and answer the questions given below:
80. Write command to compute mean of every column of the data frame.
print(df.mean(axis=0))
81. Write command to add one more row to the data frame with data [5,12,33,3]
82.
Emp_ID  Name    Dept     Salary  Status
100     Kabir   IT       34000   Regular
110     Rishav  Finance  28500   Regular
120     Seema   IT       13500   Contract
130     David   IT       41000   Regular
140     Ruchi   HRD      17000   Contract
Consider the above Data frame as df.
Write a Python Code to calculate the average salary of the Regular employees
and the Contract employees separately.
print(df.groupby('Status')['Salary'].mean())
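A sketch that rebuilds the table as a DataFrame and runs the grouping. Selecting the Salary column before calling mean() keeps the text columns out of the calculation, which recent pandas versions would otherwise reject.

```python
import pandas as pd

df = pd.DataFrame({
    'Emp_ID': [100, 110, 120, 130, 140],
    'Name': ['Kabir', 'Rishav', 'Seema', 'David', 'Ruchi'],
    'Dept': ['IT', 'Finance', 'IT', 'IT', 'HRD'],
    'Salary': [34000, 28500, 13500, 41000, 17000],
    'Status': ['Regular', 'Regular', 'Contract', 'Regular', 'Contract']})

# One average per Status value, computed on the Salary column only
avg_salary = df.groupby('Status')['Salary'].mean()
print(avg_salary)
```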
83. Write a Python Code to print the dataframe in the descending order of Salary.
df=df.sort_values(by='Salary',ascending=False)
print(df)
84. Write a Python Code to update the Salary of all Contract employees to Rs
19000
df.loc[df.Status=='Contract','Salary']=19000
85. Write a Python Code to count the total number of employees in each
department.
print(df.groupby('Dept').count().Name)
86. Write a Python Code to display the maximum salary of the “Contract” staff.
print(df[df['Status']=='Contract']['Salary'].max())
print(df.iloc[3:4,:])
del df['Status']
89. Write a Python Code to display the maximum salary of all employees in the
‘IT’ department.
print(df[df.Dept=='IT']['Salary'].max())
90. Write a Python Code to delete the 1st and the last record.
df=df.drop([0,4])
print(df[df>50].count().sum())
93. Write Python Code to count the number of even numbers and number of odd
numbers in the dataframe.
print('No of Even Numbers:',df[df%2==0].count().sum())
print('No of Odd Numbers:',df[df%2==1].count().sum())
94. Consider the following data frame df.
employee  Sales   Quarter  State
Sahay     125600  1        Delhi
George    235600  1        Tamil Nadu
Priya     213400  1        Kerala
Manila    189000  1        Haryana
Raina     456000  1        West Bengal
Manila    172000  2        Haryana
Priya     201400  2        Kerala
import pandas as pd
data={'employee':['Sahay','George','Priya','Manila','Raina','Manila','Priya'],
'Sales':[125600,235600,213400,189000,456000,172000,201400],
'Quarter':[1,1,1,1,1,2,2],
'State':['Delhi','Tamil Nadu','Kerala','Haryana','West Bengal','Haryana','Kerala']}
df=pd.DataFrame(data)
print(df)
95. Write Python Program to find total sales per state.
print(df.groupby('State')['Sales'].sum())
print(df.groupby('employee')['Sales'].sum())
97. Write a Python program to find average sales both employee-wise and state-wise.
print(df.groupby(['employee','State'])['Sales'].mean())
98. Write Python Program to find mean,median and minimum sale statewise.
print(df.groupby('State')['Sales'].mean())
print(df.groupby('State')['Sales'].median())
print(df.groupby('State')['Sales'].min())
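The three statistics can also be computed in a single call with agg(). A sketch using the data from question 94:

```python
import pandas as pd

data = {'employee': ['Sahay', 'George', 'Priya', 'Manila', 'Raina', 'Manila', 'Priya'],
        'Sales': [125600, 235600, 213400, 189000, 456000, 172000, 201400],
        'Quarter': [1, 1, 1, 1, 1, 2, 2],
        'State': ['Delhi', 'Tamil Nadu', 'Kerala', 'Haryana', 'West Bengal',
                  'Haryana', 'Kerala']}
df = pd.DataFrame(data)

# One row per State, one column per statistic
stats = df.groupby('State')['Sales'].agg(['mean', 'median', 'min'])
print(stats)
```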
99. Write Python Program to find maximum sales quarter-wise.
print(df.groupby('Quarter')['Sales'].max())
100. Write a Python program to create a pivot table with State as the index, Sales as
the values, and the maximum Sales in each State.
print(df.pivot_table(index='State',values='Sales',aggfunc='max'))
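For this particular request, a pivot table with one index column and aggfunc='max' holds the same numbers as a groupby on that column. A sketch comparing the two, again with the data from question 94:

```python
import pandas as pd

data = {'employee': ['Sahay', 'George', 'Priya', 'Manila', 'Raina', 'Manila', 'Priya'],
        'Sales': [125600, 235600, 213400, 189000, 456000, 172000, 201400],
        'Quarter': [1, 1, 1, 1, 1, 2, 2],
        'State': ['Delhi', 'Tamil Nadu', 'Kerala', 'Haryana', 'West Bengal',
                  'Haryana', 'Kerala']}
df = pd.DataFrame(data)

# Pivot table: State as the index, maximum Sales as the value
pivot = df.pivot_table(index='State', values='Sales', aggfunc='max')
print(pivot)

# The same maxima via groupby, returned as a Series
grouped = df.groupby('State')['Sales'].max()
print(grouped)
```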