Data Analysis 6060

The document is a question paper for a Data Analysis and Visualization course, containing various programming tasks related to Python, Numpy, and Pandas. It includes commands to manipulate dataframes, perform statistical operations, and visualize data. The paper is structured into two sections, with specific instructions for candidates on how to attempt the questions.

Uploaded by hackerthing78
Copyright © All Rights Reserved

[This question paper contains 12 printed pages.]

Your Roll No. ....................

Sr. No. of Question Paper : 6060
Unique Paper Code : 2344001201
Name of the Paper : Data Analysis and Visualization Using Python
Name of the Course : Computer Science: Generic Elective (G.E.)
(NEP-UGCF-2022)
Semester : II
Duration : 3 Hours                              Maximum Marks : 90

Instructions for Candidates

1. Write your Roll No. on the top immediately on receipt of this question paper.

2. This question paper has two sections, A and B.

3. Question 1 in Section A is compulsory.

4. Attempt any 4 questions from Section B.

5. Parts of a question must be attempted together.

6. Section A carries 30 marks and each question in Section B carries 15 marks.

7. Use of calculator is not allowed.

(2000)                                                          P.T.O.

Section A

Assume numpy has been imported as np and pandas has been imported as pd.

1. (a) Consider the following numpy arrays:

           arr1 = np.array([[4, 3, 2], [1, 9, 6]])
           arr2 = np.array([[3, 7, 5], [2, 9, 8], [5, 1, 6]])

       Give the output of the following commands:                    (5)

       (i)   arr2[1][1]
       (ii)  arr1[:2, -1]
       (iii) arr1 * 3
       (iv)  arr1 > 5
       (v)   arr2[2] = 4

   (b) List and describe different types of sampling of data.         (5)

   (c) Consider the Series object Company having 'Company_Name' as index
       and Profit (in Crores) as values:                               (3)

           Company_Name   Profit
           TCS               350
           Reliance          200
           L&T               800
           Wipro             150

       Write the python commands to perform the following operations:

       (i)   To display the Company_Name having profit > 250.
       (ii)  To display the index.
       (iii) To assign name 'Company_Name' to index.

   (d) Write a python code to draw a scatter plot comparing monthly
       revenue (in Crores) and monthly expenditure (in Crores) of a
       company for year 2021.                                          (5)

           revenue = [581, 684, 739, 563, 856, 716, 589, 820, 792, 695, 770, 812]
           expenditure = [631, 545, 435, 532, 688, 540, 485, 679, 709, 535]

       Import necessary libraries. Assign the title of the plot as
       'Revenue vs Expenditure' and label the y-axis as 'Expenditure'.
       Assign red color to 'Expenditure' data points and green color to
       'Revenue' data points.

   (e) Define correlation and covariance. Outline the difference between
       the two.                                                        (5)

   (f) Create a DataFrame having five rows and four columns and populate
       it with random values in the range 1 to 100. Set the index of the
       rows as ['L', 'M', 'N', 'O', 'P'] and the column indexes as
       ['Col1', 'Col2', 'Col3', 'Col4'].                               (3)

   (g) Give the output of the following code:                          (4)

           import pandas as pd
           s1 = pd.Series(['Certificate', 'Bachelor', 'Master', 'Doctorate'],
                          index=[2, 4, 6, 8])
           s1.reindex(range(10), method='ffill')
           print(s1)

Section B

2. (a) Consider the following DataFrame House_Rent given below:       (10)

           Rooms  Area  Bathroom  Furnishing Status   Rent
             2    1100      2     Unfurnished        10000
             2     800      1     Semi-Furnished     16000
             2     900      2     Furnished          22000
             1     250      1     Unfurnished         5000
             2    1000      2     Semi-Furnished     23000
             3    1200      2     Semi-Furnished     25000
             1     400      1     Unfurnished         7000
             1     250      1     Furnished           6500
             1     375      1     Unfurnished         6000
             3     900      2     Unfurnished         8500
             3    1286      2     Furnished          35000
             2     600      1     Semi-Furnished      8000
             2     800      1     Unfurnished        12000

       Write python commands to perform the following operations:

       (i)   Find the index of the house with maximum rent.
       (ii)  Sort the dataframe House_Rent on "Area".
       (iii) Calculate total Area and total rent.
       (iv)  Compute the count of houses having 1, 2, 3, etc. rooms.
       (v)   Create a new DataFrame df having a hierarchical index on
             columns "Rooms" and "Furnishing Status".

   (b) Refer to DataFrame House_Rent given in question 2(a). Write a
       python code to plot a bar plot displaying the number of Furnished,
       Unfurnished and Semi-Furnished houses. Import appropriate
       libraries. The title of the graph should be "House Data". Give
       appropriate labels for the x and y axes. Save the figure with name
       "house.jpg".                                                    (5)

3. (a) Write python code to create a numpy array a1 containing 50
       floating point values in the range 0 to 1. Put the data of numpy
       array a1 into 5 bins. Set the precision to 4. Assign names to bins
       as ['Small', 'Medium', 'Large', 'x-Large', 'xx-Large'].         (5)

   (b) Write a numpy code to create a 3D array a3 of size 4 x 5 x 3 of
       random numbers in the range 1 to 60 and swap axis 1 with axis 2.
       Identify the number of matrices in the array a3, the dimension of
       a matrix in array a3 and the data type of array a3.             (5)

   (c) Consider numpy array arr given below:                           (5)

           arr = [[ 0,  1,  2,  3],
                  [ 4,  5,  6,  7],
                  [ 8,  9, 10, 11],
                  [12, 13, 14, 15],
                  [16, 17, 18, 19],
                  [20, 21, 22, 23]]

       Write numpy commands to retrieve the following elements:

       (i)   (1, 4), (3, 1), (5, 0), and (2, 3)
       (ii)  Retrieve rows 0, 2, 4 (use positive index).
       (iii) Retrieve rows 1, 3, 5 (use negative index).
       (iv)  Retrieve values greater than 10.
       (v)   Retrieve rows 1 to 4.

4. (a) What is data wrangling? Identify the possible issues that can
       arise in the data wrangling process.                            (5)

   (b) Consider a csv file test.csv having 3 columns and 50 rows. Write
       python commands to perform the following operations:            (5)

       (i)   Read the file test.csv into a DataFrame data.
       (ii)  Print the first 10 rows of data.
       (iii) Display the 5 summary statistics for each column of data.
       (iv)  Remove the rows with all null values.
       (v)   Identify duplicate values in data.

   (c) Consider the following piece of code and give the output:       (5)

           import pandas as pd
           df1 = pd.DataFrame({'id': [1, 3, 6, 7], 'val': ['a', 'b', 'c', 'd']})
           df2 = pd.DataFrame({'id': [1, 2, 3, 5, 6, 8],
                               'val': ['p', 'q', 'r', 's', 't', 'u']})
           df3 = pd.merge(df1, df2, on='id', how='outer')
           print(df3)

       How many NaN values are there in the data frame df3? Write a pandas
       command to replace NaN with the last known valid value in df3.

5. (a) Define categorical and interval data. Give an example of each.  (4)

   (b) What is hierarchical indexing? Why do we use hierarchical indexing
       in pandas? Which pandas feature enables you to have multiple index
       levels on an axis? Give an example of hierarchical indexing.    (6)

   (c) Consider the data frame df2 given below:                        (5)

              Name   Age
           0  Rohit   10
           1  Amit    13
           2  Ankur   12

       Write python commands to perform the following operations:

       (i)   Create a new object df3 by reindexing df2 with row index
             [0, 1, 2, 3, 4] and column index ['x', 'y'].
       (ii)  Delete the entry of 'Amit' from df3.
       (iii) Rename the index of df2 as [1, 2, 3].
       (iv)  Check if the entry 'Rohit' exists in df2.
       (v)   Modify the Age of 'Ankur' to 15 using the loc command.

6. (a) Consider the pandas series s2 = pd.Series([2, 4, 6, 8, 10, 12]).
       Write python code to plot the cumulative sum of s2. Set the x limit
       to [0, 10] and the y limit to [0, 50]. Set the style of the line
       graph to dot (.) pattern and the marker to star shape. Set
       appropriate values for xticks and yticks.                       (5)

   (b) Consider dataframe df given below:                              (4)

           Number    One  Two  Three
           State
           Ohio        0    1      2
           Colorado    3    4      5

       Provide the output of the following commands:

       (i)   df.stack()
       (ii)  df.unstack(level=0)

   (c) Consider the series a given below and write commands to perform
       the following operations:                                       (6)

           a = pd.Series([6, np.nan, -4, np.nan, 3, 8, np.nan, 5])

       (i)   Sort the values and keep NaN in initial positions.
       (ii)  Assign rank in descending order.
       (iii) Retrieve all values except NaN.

7. (a) Write numpy commands to perform the following operations on
       array num:                                                      (5)

       (i)   Create an array num containing values from 31 to 46.
       (ii)  Convert the datatype of array num to floating type data.
       (iii) Reshape array num to an array of size 4 x 4.
       (iv)  Replace the diagonal elements of array num with 0.
       (v)   Create an array of 1's with the same shape and type as the
             given array num.

   (b) Consider the dataframe Score given below:

           Name  Class  Score1  Score2  Score3
           A       1      85      90      88
           B       2      74      86      80
           C       1      83      71      92
           D       2      64      68      73
           E       2      77      62      72
           F       1      90      87      92

       Give the output of the following commands:

       (i)   Score[['Name', 'Class']]
       (ii)  Score[Score['Class'] == 1]['Name']
       (iii) Score[Score['Score3'] < 80]
       (iv)  Score['Class'].value_counts().sort_index()
       (v)   Score.sum(axis="columns")

       Write a function diff to compute the difference between the maximum
       and minimum of each column of dataframe Score and apply it to
       dataframe Score.                                               (10)
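For Question 1(a), the expected outputs can be checked with a short script. This is a minimal sketch assuming a standard NumPy install. Note that (v) is an assignment, so it prints nothing by itself; it broadcasts the scalar 4 over every element of row 2 of arr2.

```python
import numpy as np

# Arrays exactly as given in Question 1(a)
arr1 = np.array([[4, 3, 2], [1, 9, 6]])
arr2 = np.array([[3, 7, 5], [2, 9, 8], [5, 1, 6]])

print(arr2[1][1])     # select row 1, then element 1 of that row
print(arr1[:2, -1])   # last column of the first two rows
print(arr1 * 3)       # element-wise multiplication by a scalar
print(arr1 > 5)       # element-wise comparison gives a boolean array

arr2[2] = 4           # assignment: the scalar 4 fills the whole of row 2
print(arr2)           # shows the modified array
```

Running it prints 9, then [2 6], then [[12 9 6] [3 27 18]], then a boolean array that is True only where arr1 holds 9 and 6, and finally arr2 with its last row replaced by [4 4 4].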
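Question 1(c) works on a Series of profits indexed by company name. A minimal sketch of the three operations follows, assuming the Series is built from the table shown; the variable name high_profit is illustrative only.

```python
import pandas as pd

# Profit (in Crores) indexed by company name, as in Question 1(c)
Company = pd.Series([350, 200, 800, 150], index=["TCS", "Reliance", "L&T", "Wipro"])

# (i) Boolean filter on the values, then take the matching index labels
high_profit = Company[Company > 250].index
print(list(high_profit))

# (ii) The index object itself
print(Company.index)

# (iii) Give the index a name; it is shown above the labels when printed
Company.index.name = "Company_Name"
print(Company)
```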
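For Question 1(f), one way to build the DataFrame is to draw integers with NumPy's random Generator. The sketch assumes "random values in the range 1 to 100" means integers from 1 to 100 inclusive and uses no seed, so the values change on every run.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()  # unseeded generator: different values each run

# integers(low, high) excludes high, so 101 gives values from 1 to 100
df = pd.DataFrame(
    rng.integers(1, 101, size=(5, 4)),
    index=["L", "M", "N", "O", "P"],
    columns=["Col1", "Col2", "Col3", "Col4"],
)
print(df)
```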
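The key point in Question 1(g) is that reindex returns a new Series rather than modifying s1, and the code in the question discards that result. The sketch below keeps the reindexed result in an illustrative variable named filled so both can be compared.

```python
import pandas as pd

s1 = pd.Series(["Certificate", "Bachelor", "Master", "Doctorate"], index=[2, 4, 6, 8])

# reindex does not change s1; keep its result to see the forward fill
filled = s1.reindex(range(10), method="ffill")

print(s1)      # the original four entries, unchanged
print(filled)  # labels 0 and 1 have no earlier value, so they stay NaN
```

So print(s1) in the question shows only the original labels 2, 4, 6 and 8. The forward-filled version carries each degree forward to the next odd label, with NaN at labels 0 and 1.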
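Question 2(a) can be answered with a handful of standard pandas calls. This sketch transcribes the House_Rent table, uses illustrative result names (max_rent_idx, by_area, totals, room_counts), and reads (v) as making "Rooms" and "Furnishing Status" a two-level row index with set_index. That is one reasonable interpretation of the wording.

```python
import pandas as pd

# Data transcribed from the House_Rent table in Question 2(a)
House_Rent = pd.DataFrame({
    "Rooms": [2, 2, 2, 1, 2, 3, 1, 1, 1, 3, 3, 2, 2],
    "Area": [1100, 800, 900, 250, 1000, 1200, 400, 250, 375, 900, 1286, 600, 800],
    "Bathroom": [2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1],
    "Furnishing Status": ["Unfurnished", "Semi-Furnished", "Furnished", "Unfurnished",
                          "Semi-Furnished", "Semi-Furnished", "Unfurnished", "Furnished",
                          "Unfurnished", "Unfurnished", "Furnished", "Semi-Furnished",
                          "Unfurnished"],
    "Rent": [10000, 16000, 22000, 5000, 23000, 25000, 7000, 6500, 6000, 8500,
             35000, 8000, 12000],
})

max_rent_idx = House_Rent["Rent"].idxmax()                      # (i) label of the highest rent
by_area = House_Rent.sort_values("Area")                        # (ii) ascending by Area
totals = House_Rent[["Area", "Rent"]].sum()                     # (iii) column totals
room_counts = House_Rent["Rooms"].value_counts().sort_index()   # (iv) houses per room count
df = House_Rent.set_index(["Rooms", "Furnishing Status"])       # (v) two-level row index

print(max_rent_idx, by_area, totals, room_counts, df, sep="\n\n")
```

With this data, idxmax returns label 10 (the 35000 row), the totals are 9861 for Area and 184000 for Rent, and there are 4, 6 and 3 houses with 1, 2 and 3 rooms respectively.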
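For Question 3(a), pd.cut with an integer number of bins splits the data range into equal-width intervals and attaches the given names. A minimal sketch, assuming the 50 values come from NumPy's Generator.random (no seed, so bin counts vary per run):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()
a1 = rng.random(50)          # 50 floats in [0.0, 1.0)

labels = ["Small", "Medium", "Large", "x-Large", "xx-Large"]
# 5 equal-width bins; precision=4 sets how many decimals the bin edges use
bins = pd.cut(a1, 5, labels=labels, precision=4)

print(pd.Series(bins).value_counts())
```

When labels are supplied, the precision setting is not visible in the output; calling pd.cut(a1, 5, precision=4) without labels shows the interval edges rounded to 4 decimal places.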
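Question 3(b) is about shapes after swapping axes. The sketch assumes "random numbers in range 1 to 60" means integers from 1 to 60 inclusive. After swapaxes(1, 2), the 4 x 5 x 3 array becomes 4 x 3 x 5, that is 4 matrices of 3 rows by 5 columns.

```python
import numpy as np

rng = np.random.default_rng()
a3 = rng.integers(1, 61, size=(4, 5, 3))   # integers 1..60, shape 4 x 5 x 3
a3 = a3.swapaxes(1, 2)                     # shape becomes 4 x 3 x 5

print("number of matrices:", a3.shape[0])
print("dimension of each matrix:", a3.shape[1:])
print("data type:", a3.dtype)              # a platform integer type, typically int64
```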
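For Question 3(c), the array equals np.arange(24) reshaped to 6 x 4, which makes the indexing easy to check. One caveat: a 6 x 4 array has columns 0 to 3 only, so arr[1, 4] raises an IndexError. The sketch therefore applies paired (fancy) indexing to the other three coordinate pairs, and it reads "rows 1 to 4" as inclusive of row 4.

```python
import numpy as np

arr = np.arange(24).reshape(6, 4)   # same values as the array in Question 3(c)

# (i) Paired row/column lists pick (3, 1), (5, 0) and (2, 3);
#     (1, 4) is left out because column 4 does not exist
picked = arr[[3, 5, 2], [1, 0, 3]]

rows_pos = arr[[0, 2, 4]]           # (ii) rows 0, 2, 4
rows_neg = arr[[-5, -3, -1]]        # (iii) rows 1, 3, 5 counted from the end
greater = arr[arr > 10]             # (iv) boolean mask, returns a flat array
rows_1_to_4 = arr[1:5]              # (v) slice stop is exclusive, so rows 1..4

print(picked, rows_pos, rows_neg, greater, rows_1_to_4, sep="\n\n")
```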
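In Question 4(c), both frames have a 'val' column, so the merge adds the default suffixes _x and _y, and the outer join keeps every id from either frame. This sketch adds sort=True so that rows are explicitly ordered by id; the variable names nan_count and filled are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 3, 6, 7], "val": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"id": [1, 2, 3, 5, 6, 8], "val": ["p", "q", "r", "s", "t", "u"]})

# Outer join on 'id'; ids missing from one side get NaN in that side's column
df3 = pd.merge(df1, df2, on="id", how="outer", sort=True)
print(df3)

nan_count = int(df3.isna().sum().sum())
print("NaN values:", nan_count)

filled = df3.ffill()   # carry the last valid value in each column forward
print(filled)
```

Ids 2, 5 and 8 exist only in df2 and id 7 only in df1, giving 3 NaN values in val_x and 1 in val_y, 4 in total.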
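Question 5(b) asks for an example of hierarchical indexing; in pandas this is provided by MultiIndex. A minimal sketch with hypothetical values: passing a list of two arrays as the index creates a two-level MultiIndex.

```python
import pandas as pd

# Hypothetical data; the two inner lists become the outer and inner index levels
data = pd.Series([10, 20, 30, 40], index=[["a", "a", "b", "b"], [1, 2, 1, 2]])

print(data.index)        # a MultiIndex with two levels
print(data["a"])         # partial indexing on the outer level
print(data.unstack())    # the inner level becomes the columns of a DataFrame
```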
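Question 5(c) can be sketched as below. Two interpretations are assumed. First, since neither 'x' nor 'y' exists in df2, reindexing leaves every cell of df3 as NaN, so "the entry of 'Amit'" is read as Amit's row label 1. Second, (iv) is read as a check on the Name values. The variable name exists is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({"Name": ["Rohit", "Amit", "Ankur"], "Age": [10, 13, 12]})

# (i) Conform to new row and column labels; unknown labels are filled with NaN
df3 = df2.reindex(index=[0, 1, 2, 3, 4], columns=["x", "y"])

# (ii) 'Amit' was at row label 1, so drop that label from df3
df3 = df3.drop(index=1)

# (iii) Replace the row labels of df2
df2.index = [1, 2, 3]

# (iv) Test the column's values; plain `in` on a Series checks its index instead
exists = "Rohit" in df2["Name"].values
print(exists)

# (v) Select the row by a boolean condition and the column by label inside .loc
df2.loc[df2["Name"] == "Ankur", "Age"] = 15
print(df3, df2, sep="\n\n")
```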
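For Question 6(a), a minimal matplotlib sketch follows. It assumes "dot (.) pattern" means a dotted line, which matplotlib spells linestyle=":". In matplotlib, "." is a point marker rather than a line style. The Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # off-screen rendering; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

s2 = pd.Series([2, 4, 6, 8, 10, 12])
cum = s2.cumsum()                # running total of s2

fig, ax = plt.subplots()
# Dotted line with star-shaped markers at each point
ax.plot(cum.index, cum.values, linestyle=":", marker="*")
ax.set_xlim(0, 10)
ax.set_ylim(0, 50)
ax.set_xticks(range(0, 11, 2))   # ticks chosen to fit inside the limits
ax.set_yticks(range(0, 51, 10))
# plt.show() would display the figure in an interactive session
```

The cumulative sums are 2, 6, 12, 20, 30 and 42, so every point falls inside the requested y limit of 50.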
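The outputs in Question 6(b) can be reproduced by rebuilding df with named row and column indexes. The sketch assumes the table values are 0 to 5, as reconstructed above. stack() moves the columns into an inner row level. Because df has a single-level row index, unstack(level=0) also returns a Series, indexed by (Number, State).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=pd.Index(["Ohio", "Colorado"], name="State"),
    columns=pd.Index(["One", "Two", "Three"], name="Number"),
)

stacked = df.stack()               # Series indexed by (State, Number)
unstacked = df.unstack(level=0)    # Series indexed by (Number, State)

print(stacked)
print(unstacked)
```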
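Question 6(c) maps onto three Series methods. This sketch reads "keep NaN in initial positions" as placing the missing values at the start of the sorted result. The names sorted_a, ranks and non_null are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([6, np.nan, -4, np.nan, 3, 8, np.nan, 5])

# (i) Ascending sort with the missing values placed first
sorted_a = a.sort_values(na_position="first")

# (ii) Descending rank: the largest value gets rank 1; NaN entries stay NaN
ranks = a.rank(ascending=False)

# (iii) Drop the missing values; the original index labels are kept
non_null = a.dropna()

print(sorted_a, ranks, non_null, sep="\n\n")
```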
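Question 7(a) chains five NumPy operations. The range 31 to 46 holds exactly 16 values, which is why a 4 x 4 reshape works. A minimal sketch; the name ones is illustrative.

```python
import numpy as np

num = np.arange(31, 47)      # (i) 31, 32, ..., 46 (the stop value is exclusive)
num = num.astype(float)      # (ii) convert to floating point
num = num.reshape(4, 4)      # (iii) 16 values fit a 4 x 4 shape
np.fill_diagonal(num, 0)     # (iv) set the main diagonal to 0, in place
ones = np.ones_like(num)     # (v) 1's with the same shape and dtype as num

print(num)
print(ones)
```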
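For Question 7(b), the sketch below rebuilds Score and evaluates each command. One version caveat applies to (v): in pandas 2.x, sum defaults to numeric_only=False and also tries to add the strings in Name, which fails with a TypeError. Earlier pandas releases dropped such columns automatically. The sketch therefore passes numeric_only=True. For the same reason, diff is applied only to the numeric columns, since subtracting strings is not defined.

```python
import pandas as pd

# Score table transcribed from Question 7(b)
Score = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Class": [1, 2, 1, 2, 2, 1],
    "Score1": [85, 74, 83, 64, 77, 90],
    "Score2": [90, 86, 71, 68, 62, 87],
    "Score3": [88, 80, 92, 73, 72, 92],
})

print(Score[["Name", "Class"]])                  # (i) a two-column DataFrame
print(Score[Score["Class"] == 1]["Name"])        # (ii) names of class 1 students
print(Score[Score["Score3"] < 80])               # (iii) rows with Score3 below 80
class_counts = Score["Class"].value_counts().sort_index()
print(class_counts)                              # (iv) students per class
row_totals = Score.sum(axis="columns", numeric_only=True)
print(row_totals)                                # (v) row sums of numeric columns

def diff(col):
    """Difference between the largest and smallest value of one column."""
    return col.max() - col.min()

# apply() calls diff once per column of the numeric sub-frame
spread = Score.select_dtypes(include="number").apply(diff)
print(spread)
```

With this data, (ii) lists A, C and F; (iii) returns the rows for D and E; and diff gives 1, 26, 28 and 20 for Class, Score1, Score2 and Score3.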
