Data Analysis 6060
(2000) P.T.O.
6060 2

(i) arr2[1][1]
(ii) arr1[:2, -1]
(iii) arr1 * 3
(iv) arr1 > 5
(v) arr2[2] = 4

(b) List and describe different types of sampling of data. (5)

(c) Consider the Series object Company having 'Company_Name' as index and Profit (in Crores) as values: (3)

6060 11

(iii) Reshape array num to an array of size 4x4.
(iv) Replace the diagonal elements of array num with 0.
(v) To create an array of 1's with the same shape and type as the given array num.

(b) Consider the DataFrame Score given below:

Name  Class  Score1  Score2  Score3
A     1      85      90      88
B     2      74      86      80
C     1      83      71      92
D     2      64      68      73
E     2      77      62      72
F     1      90      87      92
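Parts (iii) to (v) on page 11 each correspond to a single NumPy call. The array num is defined outside this excerpt, so the sketch below assumes a hypothetical 16-element `num = np.arange(16)`. Any 16-element array reshapes to 4x4 in the same way.

```python
import numpy as np

# Hypothetical stand-in: num is defined elsewhere in the paper.
# Any 16-element array can be reshaped to 4x4.
num = np.arange(16)

# (iii) Reshape num to a 4x4 array.
num = num.reshape(4, 4)

# (iv) Set the diagonal elements to 0 (np.fill_diagonal modifies num in place).
np.fill_diagonal(num, 0)

# (v) Array of 1's with the same shape and dtype as num.
ones = np.ones_like(num)

print(num)
print(ones)
```

`np.fill_diagonal` returns `None`, so assigning its result back to `num` would lose the array.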
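The sampling question in (b) on page 2 can be illustrated with three common schemes in pandas: simple random, stratified and systematic sampling. The Score table from page 11 is borrowed here only as convenient sample data. Three choices are illustrative assumptions: Class as the stratum, a step of 2 for the systematic sample, and the fixed `random_state`. `groupby(...).sample` needs pandas 1.1 or later.

```python
import pandas as pd

# Score table from page 11, used only as sample data.
Score = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Class": [1, 2, 1, 2, 2, 1],
    "Score1": [85, 74, 83, 64, 77, 90],
    "Score2": [90, 86, 71, 68, 62, 87],
    "Score3": [88, 80, 92, 73, 72, 92],
})

# Simple random sampling: every row has the same chance of being chosen.
simple = Score.sample(n=3, random_state=1)

# Stratified sampling: draw separately within each stratum (here, each Class).
stratified = Score.groupby("Class").sample(n=1, random_state=1)

# Systematic sampling: take every k-th row from a starting point (k = 2, start = 0).
systematic = Score.iloc[::2]

print(simple, stratified, systematic, sep="\n\n")
```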
6060 10

6. (a) Consider the pandas series s2 = pd.Series([2, 4, 6, 8, 10, 12]).
Write python code to plot the cumulative sum of s2. Set the x limit to [0, 10] and the y limit to [0, 50]. Set the style of the line graph to a dot (.) pattern and the marker to a star shape. Set appropriate values for xticks and yticks. (5)

(b)
Number    One  Two  Three
State
Ohio      10   11   12
Colorado  13   14   15

Provide the output of the following commands.
(i) df.stack()
(ii) df.unstack(level=0)

(c) Consider the series a given below and write commands to perform the following operations:
a = pd.Series([6, np.nan, -4, np.nan, 3, 8, np.nan, 5])

6060 3

Company_Name  Profit
TCS           350
Reliance      200
L&T           800
Wipro         150

(i) To display the Company_Name having profit > 250.
(ii) To display the index.
(iii) To assign name 'Company_Name' to index.

(d) Write a python code to draw a scatter plot comparing monthly revenue (in Crores) and monthly expenditure (in Crores) of a company for year 2021. (5)
revenue = [581, 684, 739, 563, 856, 716, 589, 820, 792, 695, 770, 812]
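Below is one possible answer to 6(a), offered as a sketch. It reads "dot (.) pattern" as Matplotlib's dotted line style `':'`, because in Matplotlib format strings `'.'` denotes a point marker, not a line style. That reading is an interpretation, not something the question states. The tick spacing is also a free choice. The Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to plot interactively
import matplotlib.pyplot as plt
import pandas as pd

s2 = pd.Series([2, 4, 6, 8, 10, 12])
cum = s2.cumsum()  # running total: 2, 6, 12, 20, 30, 42

fig, ax = plt.subplots()
# ':' draws a dotted line; '*' puts a star marker at each point.
ax.plot(cum.index, cum.values, linestyle=":", marker="*")

ax.set_xlim(0, 10)
ax.set_ylim(0, 50)
ax.set_xticks(range(0, 11, 2))   # a tick every 2 along x (arbitrary choice)
ax.set_yticks(range(0, 51, 10))  # a tick every 10 along y (arbitrary choice)
ax.set_xlabel("Index")
ax.set_ylabel("Cumulative sum of s2")

# With an interactive backend, plt.show() displays the figure;
# fig.savefig("cumsum.png") writes it to a file instead.
```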
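For (b) on page 10, the requested outputs can be checked by rebuilding df. The cell values 10 to 15 are a best reading of a partly garbled scan of the table. The shape of each result does not depend on them. When the row index has a single level, `unstack(level=0)` moves State into the inner level of a Series, and Number becomes the outer level.

```python
import pandas as pd

# df as read from the table: index named State, columns named Number.
df = pd.DataFrame(
    [[10, 11, 12], [13, 14, 15]],
    index=pd.Index(["Ohio", "Colorado"], name="State"),
    columns=pd.Index(["One", "Two", "Three"], name="Number"),
)

# (i) stack() pivots the column labels into an inner index level,
# giving a Series indexed by (State, Number).
stacked = df.stack()
print(stacked)

# (ii) With a single-level row index, unstack(level=0) returns a Series
# indexed by (Number, State).
unstacked = df.unstack(level=0)
print(unstacked)
```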
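Parts (i) to (iii) of the Company question on page 3 (question (c) from page 2) reduce to boolean indexing and the index attribute. A minimal sketch:

```python
import pandas as pd

# Company Series: company names as index, Profit (in Crores) as values.
Company = pd.Series([350, 200, 800, 150],
                    index=["TCS", "Reliance", "L&T", "Wipro"])

# (i) Names of companies with profit > 250: filter with a boolean mask, keep the index.
high = Company[Company > 250].index
print(list(high))

# (ii) Display the index.
print(Company.index)

# (iii) Name the index 'Company_Name' (Company.rename_axis("Company_Name")
# would return a renamed copy instead of changing Company in place).
Company.index.name = "Company_Name"
```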
6060 4

Import necessary libraries. Assign the title of the plot as 'Revenue vs Expenditure' and label the y-axis as 'Expenditure'. Assign red color to 'Expenditure' data points and green color to 'Revenue' data points.

(e) Define correlation and covariance. Outline the difference between the two. (5)

(f) Create a DataFrame having five rows and four columns and populate it with random values in the range 1 to 100. Set the index of the rows as ['L', 'M', 'N', 'O', 'P'] and column indexes as ['Col1', 'Col2', 'Col3', 'Col4']. (4)

s1.reindex(range(10), method='ffill')

6060 9

5. (a) Define categorical and interval data. Give an example of each. (4)

(b) What is hierarchical indexing? Why do we use hierarchical indexing in pandas? Which pandas feature enables you to have multiple index levels on an axis? Give an example of hierarchical indexing. (6)

(c) Consider the data frame df2 given below: (5)

   Name   Age
0  Rohit  10
1  Amit   13
2  Ankur  12

Write python commands to perform the following operations:
(iv) Check if the entry 'Rohit' exists in df2.
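Question (e) on page 4 contrasts covariance and correlation. As a worked illustration, the sketch below computes both by hand for the Area and Rent columns of the House_Rent table in Question 2(a), then checks the results against NumPy. Covariance is measured in units of Area times units of Rent, so it changes when either variable is rescaled. Correlation divides out both standard deviations, so it is unitless and always lies in [-1, 1].

```python
import numpy as np

# Area and Rent columns of the House_Rent table in Question 2(a).
area = np.array([1100, 800, 900, 250, 1000, 1200, 400, 250, 375, 900, 1286, 600, 800])
rent = np.array([10000, 16000, 22000, 5000, 23000, 25000, 7000, 6500, 6000,
                 8500, 35000, 8000, 12000])

# Sample covariance: mean product of deviations from the means (n - 1 divisor).
cov = np.sum((area - area.mean()) * (rent - rent.mean())) / (len(area) - 1)

# Pearson correlation: covariance divided by both sample standard deviations.
corr = cov / (area.std(ddof=1) * rent.std(ddof=1))

# Rescaling Area multiplies the covariance by the same factor,
# but leaves the correlation unchanged.
cov_scaled = np.cov(area * 10, rent)[0, 1]
corr_scaled = np.corrcoef(area * 10, rent)[0, 1]

print(cov, corr)
```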
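Here is a sketch for (f). It assumes "random values in the range 1 to 100" means random integers with both endpoints included. NumPy's `Generator.integers` excludes its upper bound by default, so the upper bound is 101. The seed is arbitrary and only makes the output reproducible. The older call `np.random.randint(1, 101, size=(5, 4))` draws from the same range.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seeded only for reproducibility

# integers(1, 101) draws from 1..100 inclusive (upper bound is exclusive).
values = rng.integers(1, 101, size=(5, 4))

df = pd.DataFrame(values,
                  index=["L", "M", "N", "O", "P"],
                  columns=["Col1", "Col2", "Col3", "Col4"])
print(df)
```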
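For 5(b): pandas represents multiple index levels on one axis with `MultiIndex`. Below is a minimal example. The year/quarter sales figures are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sales figures with a two-level row index: year (outer), quarter (inner).
sales = pd.Series(
    [120, 135, 150, 160],
    index=pd.MultiIndex.from_tuples(
        [(2021, "Q1"), (2021, "Q2"), (2022, "Q1"), (2022, "Q2")],
        names=["year", "quarter"],
    ),
)

# Selecting on the outer level returns the sub-series indexed by quarter.
print(sales.loc[2021])

# A full (outer, inner) key selects a single value.
print(sales.loc[(2022, "Q2")])

# unstack() turns the inner level into columns: a 2-D view of 1-D data.
print(sales.unstack())
```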
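For 5(c)(iv), note one pitfall: applying `in` directly to a Series checks the index labels, not the values. The membership test therefore has to run on the column's values. This sketch assumes the question refers to the Name column.

```python
import pandas as pd

df2 = pd.DataFrame({"Name": ["Rohit", "Amit", "Ankur"], "Age": [10, 13, 12]})

# 'Rohit' in df2["Name"] would test the index (0, 1, 2), so use .values instead.
exists = "Rohit" in df2["Name"].values

# Equivalent vectorised check: True if any Name equals 'Rohit'.
exists_alt = df2["Name"].isin(["Rohit"]).any()

print(exists, exists_alt)
```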
6060 8

(i) Read the file test.csv into a DataFrame data.
(ii) Print the first 10 rows of data.
(iii) Display the 5 summary statistics for each column of data.
(iv) Remove the rows with all null values.
(v) Identify duplicate values in data.

(c) Consider the following piece of code and give the output: (5)

import pandas as pd
df1 = pd.DataFrame({'id': [1, 3, 6, 7], 'val': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'id': [1, 2, 3, 5, 6, 8], 'val': ['p', 'q', 'r', 's', 't', 'u']})
df3 = pd.merge(df1, df2, on='id', how='outer')
print(df3)

How many NaN values are there in the data frame df3? Write a pandas command to replace NaN with the last known valid value in df3.

6060 5

Section B

2. (a) Consider the following DataFrame House_Rent given below: (10)

Rooms  Area  Bathroom  Furnishing Status  Rent
2      1100  2         Unfurnished        10000
2      800   1         Semi-Furnished     16000
2      900   2         Furnished          22000
1      250   1         Unfurnished        5000
2      1000  2         Semi-Furnished     23000
3      1200  2         Semi-Furnished     25000
1      400   1         Unfurnished        7000
1      250   1         Furnished          6500
1      375   1         Unfurnished        6000
3      900   2         Unfurnished        8500
3      1286  2         Furnished          35000
2      600   1         Semi-Furnished     8000
2      800   1         Unfurnished        12000

Write python commands to perform the following operations:
(i) Find the index of the house with maximum rent.
(ii) Sort the dataframe House_Rent on "Area".
(iii) Calculate total Area and total rent.
(iv) Compute the count of houses having rooms 1, 2, 3, etc.
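This is a sketch for parts (i) to (v) on page 8. The file test.csv is not part of this paper, so a small hypothetical CSV string stands in for it. With the real file, `pd.read_csv("test.csv")` replaces the `StringIO` call. "5 summary statistics" is read here as the five-number summary (minimum, first quartile, median, third quartile, maximum). `data.describe()` reports those five plus count, mean and standard deviation, and may be what the question intends.

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv so the sketch runs on its own.
csv_text = """a,b
1,10
2,
,
1,10
3,30
"""

data = pd.read_csv(io.StringIO(csv_text))  # (i) read into a DataFrame

print(data.head(10))  # (ii) first 10 rows (fewer if the file is shorter)

# (iii) Five-number summary per numeric column: min, Q1, median, Q3, max.
five_num = data.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(five_num)

data = data.dropna(how="all")  # (iv) drop rows in which every value is null

dup_mask = data.duplicated()  # (v) True for rows that repeat an earlier row
print(data[dup_mask])
```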
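The merge question in (c) on page 8 can be checked directly. In the outer join, ids 2, 5 and 8 occur only in df2 and id 7 occurs only in df1. As a result, df3 has four NaN values: three in `val_x` and one in `val_y`. Forward fill depends on row order, and pandas versions differ on whether outer-join keys come out sorted. The sketch therefore sorts on `id` before filling. Treating id order as the intended "last known" order is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 3, 6, 7], 'val': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'id': [1, 2, 3, 5, 6, 8], 'val': ['p', 'q', 'r', 's', 't', 'u']})

# Outer join keeps every id from both frames; the clashing 'val' columns
# get the default suffixes _x (from df1) and _y (from df2).
df3 = pd.merge(df1, df2, on='id', how='outer')
print(df3)

# Count missing values across the whole frame.
n_nan = int(df3.isna().sum().sum())
print(n_nan)

# Forward fill: each NaN takes the last valid value above it in its column.
filled = df3.sort_values('id').ffill()
print(filled)
```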
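Parts (i) to (iv) of Question 2(a) can be sketched together with part (v), which continues on page 6. The sketch rebuilds House_Rent from the table above with the default 0-based row index.

```python
import pandas as pd

House_Rent = pd.DataFrame({
    "Rooms": [2, 2, 2, 1, 2, 3, 1, 1, 1, 3, 3, 2, 2],
    "Area": [1100, 800, 900, 250, 1000, 1200, 400, 250, 375, 900, 1286, 600, 800],
    "Bathroom": [2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1],
    "Furnishing Status": ["Unfurnished", "Semi-Furnished", "Furnished", "Unfurnished",
                          "Semi-Furnished", "Semi-Furnished", "Unfurnished", "Furnished",
                          "Unfurnished", "Unfurnished", "Furnished", "Semi-Furnished",
                          "Unfurnished"],
    "Rent": [10000, 16000, 22000, 5000, 23000, 25000, 7000, 6500, 6000,
             8500, 35000, 8000, 12000],
})

# (i) Index label of the house with the maximum rent.
max_idx = House_Rent["Rent"].idxmax()

# (ii) Sort on Area (ascending by default; returns a new DataFrame).
by_area = House_Rent.sort_values(by="Area")

# (iii) Total Area and total Rent.
totals = House_Rent[["Area", "Rent"]].sum()

# (iv) Number of houses for each room count.
room_counts = House_Rent["Rooms"].value_counts().sort_index()

# (v) Hierarchical (Multi)Index built from two existing columns.
df = House_Rent.set_index(["Rooms", "Furnishing Status"])

print(max_idx, by_area, totals, room_counts, df, sep="\n\n")
```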
6060 6

(v) Create a new DataFrame df having a hierarchical index on columns "Rooms" and "Furnishing Status".

(b) Refer to DataFrame House_Rent given in question

6060 7

(c) Consider numpy array arr given below: (5)

arr = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15],