
GE02 (DAVP) VIVA Questionnaire

The document is a comprehensive VIVA questionnaire for a course on Data Analysis and Visualization using Python, focusing on libraries such as NumPy, Pandas, and Matplotlib. It includes a wide range of questions covering fundamental concepts, methods, and functionalities of these libraries, as well as practical coding tasks. The questions are designed to assess understanding of data structures, operations, and data manipulation techniques in Python.

Uploaded by

U Soni
Copyright
© All Rights Reserved

GE02: Data Analysis & Visualization using Python (Sem-II)

VIVA Questionnaire

Questions based on NumPy

1. What is the purpose of NumPy library?


2. Define the ndarray object of NumPy. Briefly explain the following properties of an ndarray object:
a. shape
b. dtype
c. ndim
3. In what ways are NumPy arrays better than the built-in data structures of Python?
OR
Explain how the homogeneous ndarrays of NumPy are better than the heterogeneous lists of Python.
4. State the names of the various constructor methods available for ndarray creation in the NumPy library.
5. Differentiate between the array(), asarray() and arange() methods used in NumPy.
6. Differentiate between ones()/zeros()/full()/empty() and ones_like()/zeros_like()/full_like()/empty_like()
methods used in NumPy.
7. Differentiate between zeros() and empty() methods used in NumPy.
8. Write the name(s) of the method(s) that can be used to create an identity matrix in NumPy.
9. Differentiate between eye() and identity() methods used in NumPy.
10. Differentiate between rand() and randn() methods for random number generation in NumPy.
11. Differentiate between shape and reshape() for an ndarray, with an example.
12. Differentiate between dtype and type() in Python, with an example.
13. Differentiate between int32 and uint32 data types.
14. Differentiate between int32 and int16 data types.
15. Differentiate between int32 and uint32 data types.
16. Can we specify the data type of an ndarray during its creation? If so, then how?
17. Can we change the data type of an ndarray once it has been created? If yes, then how?
18. Which two concepts of ndarray objects are primarily responsible for making batch computations possible on
them?
19. What is Vectorization?
20. What is broadcasting?
21. What is slicing?
22. What do you mean by a bare slice?
23. Differentiate between a view and a copy in the NumPy library, with the help of an example.
24. What are the various ways through which we can obtain a view and copy of an array?
25. Differentiate between Boolean indexing and fancy indexing with example(s).
26. For a 4x4 2D array, what will be the output of the following statements?
import numpy
a1 = numpy.arange(16).reshape((4,4))
a. print(a1*2)
b. print(a1[a1>10])
c. print(a1**2)
d. print(a1>10)
e. a1[:2 , :2] = 0
print(a1)
f. a1[2: , 2:] = 1
print(a1)
g. a1[ : , 1:] = -1
print(a1)
h. print(a1[[3, 1, 2]])
i. print(a1[[-3, -1, -2]])
j. print(a1[[2,3]][:,[-2,-3]])
k. print(a1[:,[-3,-1,-2]])
l. print(a1[[-3, -1, -2],[0, 3, 1]])
27. Create a 6x6 array of zeros and fill in the values as instructed (using fancy indexing only):
a. Fill all even rows with 1 and all odd rows with -1.
b. Print the last 3 columns of the 2D array.
28. Create a 6x6 array of zeros and fill in the values as instructed (using fancy indexing only):
a. Fill all the diagonal elements with 1.
b. Print the last 3 rows of the 2D array.
29. Differentiate amongst the T attribute, swapaxes() and transpose() of a NumPy array with the help of an
illustrative example.
30. For a 3D array of shape (2,3,4), write down the shape of the resultant transposed 3D array after applying the T
attribute.
31. For a (3x2x2) 3D array, what will be the output of the following statements?
import numpy
a1 = numpy.arange(12).reshape((3,2,2))
a. print(a1.T)
b. print(a1.swapaxes(0,2))
c. print(a1.transpose((1,0,2)))
32. For the following 2D array, write down the appropriate commands that will perform the required slicing.
(The required slice is highlighted in each of parts (a) to (f).)
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
[Figure: parts (a) to (f), each showing this array with a different slice highlighted]
33. Create a random integer 1D array of 8 elements and split it into 4 equal-sized partitions. Use these partitions to
create two 2x2 arrays, namely a1 and a2. Perform floor division between a1 and a2 and store the result in
a3. Scale up the values of a2 by a power of 2 and store the result in a4. Find the correlation between
arrays a3 and a4.
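A minimal NumPy sketch, assuming NumPy 1.17 or later (for default_rng), that exercises several ideas from the questions above: ndarray attributes, broadcasting, Boolean and fancy indexing, views versus copies, axis reordering, and one possible reading of Q33. The Q33 arrays are named q1 and q2 here (the question's a1 and a2) so they do not clash with the Q26 array; "scale up by a power of 2" is assumed to mean squaring, and the seed and value range are arbitrary choices.

```python
import numpy as np

# Q2: basic ndarray attributes
a1 = np.arange(16).reshape((4, 4))
print(a1.shape, a1.dtype, a1.ndim)   # dtype is platform dependent, e.g. int64

# Q19/Q20: vectorized arithmetic; the scalar 2 is broadcast to every element
doubled = a1 * 2

# Q25: Boolean indexing returns a 1D array of the elements where the mask is True
big = a1[a1 > 10]

# Q25/Q26(h): fancy indexing with one list selects whole rows in that order (a copy)
rows = a1[[3, 1, 2]]

# Q26(l): two lists pair up as (row, col): (-3, 0), (-1, 3), (-2, 1)
picked = a1[[-3, -1, -2], [0, 3, 1]]

# Q23: a basic slice is a view, so writing through it changes a1 ...
top_left = a1[:2, :2]
top_left[:] = 0
# ... while .copy() is independent of the original
independent = a1.copy()
independent[0, 0] = 99

# Q30/Q31: T reverses all axes, swapaxes swaps two, transpose takes an explicit order
t_shape = np.arange(24).reshape((2, 3, 4)).T.shape
b = np.arange(12).reshape((3, 2, 2))
swapped_shape = b.swapaxes(0, 2).shape
reordered_shape = b.transpose((1, 0, 2)).shape

# Q33 (one reading): "scale up by a power of 2" is taken here to mean squaring
rng = np.random.default_rng(seed=0)      # seed only for reproducibility
values = rng.integers(1, 10, size=8)     # low=1 avoids floor division by zero
p1, p2, p3, p4 = np.split(values, 4)     # four partitions of two elements each
q1 = np.vstack([p1, p2])                 # 2x2 array (a1 in Q33)
q2 = np.vstack([p3, p4])                 # 2x2 array (a2 in Q33)
a3 = q1 // q2
a4 = q2 ** 2
corr = np.corrcoef(a3.ravel(), a4.ravel())[0, 1]   # NaN if a3 or a4 is constant
print(corr)
```

Because fancy indexing returns copies, rows and picked keep their values even after the view writes zeros into the top-left corner of a1.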
Questions based on the Pandas library

1. What is the purpose of Pandas library?


2. Differentiate between a Series and a 1D NumPy array.
3. Differentiate between a dataframe and a 2D NumPy array.
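A short sketch of the differences behind questions 2 and 3, assuming NumPy and pandas are installed; the data are made up for illustration. It shows label-based access and index alignment in a Series, and per-column dtypes in a dataframe versus a single dtype for a 2D ndarray.

```python
import numpy as np
import pandas as pd

# A Series carries an index of labels; a 1D ndarray only has positions
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"], s.iloc[1])          # label access and positional access

# Series arithmetic aligns on labels: unmatched labels ("a", "d") give NaN
other = pd.Series([1, 2, 3], index=["b", "c", "d"])
aligned = s + other

# The same arithmetic on ndarrays is purely positional
positional = np.array([10, 20, 30]) + np.array([1, 2, 3])

# A 2D ndarray has a single dtype; a dataframe keeps one dtype per column
mixed_array = np.array([[1, "x"], [2, "y"]])   # every element becomes a string
frame = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
print(mixed_array.dtype)
print(frame.dtypes)
```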

Questions on Series object

4. Is it possible to create a heterogeneous Series object? If so, explain how.


5. What will be the output of the following statements?

6. Differentiate between isnull() and notnull() methods of a Series object.


7. What will be the output of the following statements?

8. Briefly explain the following properties of a Series object:


a. S1.name
b. S1.index.name
c. S1.dtype
d. S1.ndim
e. S1.size
f. S1.shape
g. S1.empty
h. Also differentiate between the S1.size, S1.shape and S1.ndim properties.
9. What will be the output of the following statements? (Slicing in Series)

10. If a Series object is size-immutable, then how can we still perform data manipulation operations on it? Justify.
11. Once a Series object has been created, how can we add elements to it or remove elements from it?
12. Differentiate between reindexing and renaming the indexes of a Series object.
13. If I want to change the size of a Series object, which concept can be used: reindexing or renaming of
indexes?
14. What do you mean by forward fill and backward fill methods? And where do we use them?
15. What will be the output of the following code snippet? (forward filling & backward filling)

16. What will be the output of the following statements? (Arithmetic operations between different-sized Series
objects)

17. Is it possible to sort a Series object based on its indexes? If yes, then explain how.
18. What do you mean by ranking a series object? What are the various methods available through which we
can rank a series object?
19. What will be the output of the following statements?
20. State the method of Pandas library that can be used to:
a. Obtain only unique values present in a Series.
b. Obtain the frequency of each distinct element of a Series.
c. Inspect the membership of a given value in a Series.
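A sketch of the Series operations asked about above: isnull()/notnull(), reindexing with forward fill, renaming, sorting by index, ranking, and unique()/value_counts()/isin(). The statements the questions refer to are not reproduced in this text, so the series below are hypothetical examples rather than the originals.

```python
import pandas as pd

# Q6: isnull() and notnull() are element-wise complements
s1 = pd.Series([4.0, None, 7.0, 4.0])
missing = s1.isnull()
present = s1.notnull()

# Q12/Q13: reindex can change the size; method="ffill" copies the previous value forward
colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
filled = colors.reindex(range(6), method="ffill")
# rename only relabels existing entries, so the size never changes
renamed = colors.rename(index={0: "zero"})

# Q17: sort by index labels instead of by values
by_label = pd.Series([3, 1, 2], index=["c", "a", "b"]).sort_index()

# Q18: ties share the average rank by default; method="first" breaks ties by position
nums = pd.Series([7, -5, 7, 4])
avg_rank = nums.rank()
first_rank = nums.rank(method="first")

# Q20: unique values, frequency of each value, and membership test
letters = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])
uniq = letters.unique()
counts = letters.value_counts()
member = letters.isin(["b", "c"])
```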

Questions on Dataframe object

21. What will be the output of the following statements? (Creation of dataframes)

22. What will the following statements generate? (Aliasing vs copying)

23. Briefly explain the following properties of a dataframe object:


a. df.index
b. df.columns
c. df.axes
d. df.dtypes
e. df.size VS len(df)
f. df.T
24. Differentiate between loc and iloc operators of a dataframe.
25. Differentiate between at and iat operators of a dataframe.
26. Differentiate between loc and at operators of a dataframe.
27. For the following dataframe, what will the following statements generate: (Slicing & Indexing)
a. print(df1.A)
b. print(df1[["A","B","C"]])
c. print(df1[["B","C","A"]])
d. print(df1.loc["II", : ])
e. print(df1.loc["I":"II", : ])
f. print(df1.loc[ :"II", : ])
g. print(df1.loc["II": , : ])
h. print(df1.loc[ : ,"B":"C"])
i. print(df1.loc[ : , :"C"])
j. print(df1.loc[ : ,"B": ])
k. print(df1.loc[ : ,"A":"C"])
l. print(df1.loc["II":"III","A":"B"])
m. print(df1.iloc[ :3])
n. print(df1.iloc[1: ])
o. print(df1.at["I","C"])
p. print(df1.iat[0,2])
q. print(df1.C["I"])
r. print(df1[df1["B"]>40])

28. Differentiate between del and drop for a dataframe.


29. What do you mean by Boolean indexing of a dataframe? Give one use-case scenario where Boolean indexing
of a dataframe will be useful.
30. “All statistical methods provided by the Pandas library share three common parameters”. State them.
31. What is the purpose of describe() for dataframe objects?
32. Differentiate between df.describe() and df.describe(include='all').
33. Differentiate between df.sum() and df.cumsum().
34. Differentiate between corr() and corrwith() methods of computing correlation for dataframes.
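The dataframe used in Q27 is not reproduced in this text, so the sketch below builds a hypothetical df1 with row labels "I", "II", "III" and columns A, B, C (the numbers are invented). It illustrates loc vs iloc, at vs iat, Boolean indexing, sum() vs cumsum(), describe(), and drop vs del.

```python
import pandas as pd

# Hypothetical stand-in for the Q27 dataframe
df1 = pd.DataFrame(
    {"A": [10, 20, 30], "B": [35, 45, 55], "C": [1, 2, 3]},
    index=["I", "II", "III"],
)

label_rows = df1.loc["I":"II", :]   # label slice: the end label is included
pos_rows = df1.iloc[:2]             # position slice: the end position is excluded
cols = df1.loc[:, "B":"C"]          # columns B through C
scalar_at = df1.at["I", "C"]        # single cell by labels
scalar_iat = df1.iat[0, 2]          # the same cell by integer positions
filtered = df1[df1["B"] > 40]       # Boolean indexing keeps rows II and III

print(df1.sum())                    # one total per column
print(df1.cumsum())                 # running totals down each column
summary = df1.describe()            # count, mean, std, min, quartiles, max per numeric column

reduced = df1.drop(columns=["C"])   # drop returns a new dataframe; df1 keeps C
df2 = df1.copy()
del df2["C"]                        # del removes the column from df2 in place
```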

Questions on exporting (and importing) to (& from) CSV files

35. State the functions used to read from and write to csv files using dataframe objects of Pandas library.
36. Briefly explain the purpose of the following parameters used in the to_csv() method:
a. sep
b. index
c. header
d. columns
e. na_rep
37. Briefly explain the purpose of the following parameters used in the read_csv() method:
a. sep
b. index_col
c. header
d. names
e. skiprows
38. Write a Python program that reads n rows from a CSV file into a dataframe.
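A possible answer to Q38, with a few of the to_csv()/read_csv() parameters from Q36 and Q37, offered as a sketch: it writes a small made-up dataframe to a temporary file (the name sample.csv is hypothetical) and reads it back with nrows and skiprows. The helper read_first_rows is an illustrative name, not a pandas function.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_first_rows(path, n):
    """Read only the first n data rows of a CSV file into a dataframe."""
    return pd.read_csv(path, nrows=n)


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5.0, np.nan, 7.0, 8.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")        # hypothetical file name
    # sep sets the delimiter, index=False drops the row labels,
    # na_rep writes missing values as the text NULL
    df.to_csv(path, sep=",", index=False, na_rep="NULL")

    first_two = read_first_rows(path, 2)

    # skiprows=[1] skips file line 1 (the first data row; line 0 is the header);
    # na_values makes read_csv treat NULL as missing again
    skipped = pd.read_csv(path, skiprows=[1], na_values=["NULL"])

print(first_two)
print(skipped)
```

In a real program, path would point at an existing CSV file and n would come from the user.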

Questions on handling missing values, duplicates & outliers; mapping and binning

39. State the methods of Pandas library that can be used to check whether a series or dataframe contains missing
values or not.
40. Differentiate between isnull() & notnull() methods.
41. What are the various ways to deal with the missing values present in a series or dataframe?
42. How can we delete only those rows of a dataframe which contain only null values?
43. Is it possible to delete only those rows of a dataframe which do not contain even a minimum number of non-null
values? If yes, explain how.
44. Is it possible to delete those columns from a dataframe which contain any null values?
45. In how many ways can we specify a fill value to replace the missing values of a dataframe?
46. What do you mean by forward filling & backward filling in context of treating missing values in a
dataframe?
47. How to check for duplicate rows present in a dataframe?
48. How to get rid of duplicate rows present in a dataframe?
49. Differentiate between df.duplicated() and df.drop_duplicates().
50. Differentiate between df.drop_duplicates() and df.drop_duplicates(["A","B"]), provided df has four
columns, namely A, B, C and D.
51. Differentiate between df.drop_duplicates() and df.drop_duplicates(keep="last").
52. What will be the output of the following statements:
a. print(s1.replace([-999,-1000,-100],0))
b. print(s1.replace([-999,-1000,-100],[9,3,2]))
c. print(s1.replace({-999:0,-1000:1,-100:2}))
53. How does the map() method work? State one use-case scenario for it.
54. What do you mean by binning? State one use-case scenario of binning.
55. Differentiate between equal-width and equal-depth binning.
56. Differentiate between cut() and qcut() methods of pandas library.
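A sketch covering the missing-value, duplicate, replace(), map() and binning questions, on small invented data. dropna(how="all"), dropna(thresh=...) and dropna(axis=1) correspond to Q42 to Q44; cut() produces equal-width bins and qcut() equal-frequency (equal-depth) bins.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan, 4.0],
    "B": [np.nan, np.nan, 3.0, 4.0],
    "C": [np.nan, np.nan, np.nan, 4.0],
})
print(df.isnull().any())                     # Q39: which columns contain NaN

only_null_rows_gone = df.dropna(how="all")   # Q42: drop rows where every value is NaN
at_least_two = df.dropna(thresh=2)           # Q43: keep rows with >= 2 non-null values
no_nan_cols = df.assign(D=[1, 2, 3, 4]).dropna(axis=1)   # Q44: drop columns with any NaN
filled_const = df.fillna(0)                  # Q45: one fill value everywhere
filled_dict = df.fillna({"A": -1, "B": -2})  # Q45: a fill value per column (C untouched)
forward = df.ffill()                         # Q46: forward fill down each column

# Q47 to Q51: duplicate rows
dup = pd.DataFrame({"k1": ["one", "two"] * 3 + ["two"],
                    "k2": [1, 1, 2, 3, 3, 4, 4]})
flags = dup.duplicated()                     # True where a row repeats an earlier one
unique_rows = dup.drop_duplicates()
by_k1 = dup.drop_duplicates(["k1"])          # judge duplicates on column k1 only
by_k1_last = dup.drop_duplicates(["k1"], keep="last")

# Q52: replace several sentinel values at once with a dict
sentinels = pd.Series([1.0, -999.0, 2.0, -999.0, -1000.0, 3.0])
replaced = sentinels.replace({-999: 0, -1000: 1})

# Q53: map() transforms each element through a dict (or a function)
grades = pd.Series(["A", "B", "A"]).map({"A": 4, "B": 3})

# Q55/Q56: cut() -> equal-width bins, qcut() -> equal-frequency bins
ages = pd.Series([20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32])
width_bins = pd.cut(ages, bins=3)
depth_bins = pd.qcut(ages, q=4)
print(width_bins.value_counts())
print(depth_bins.value_counts())
```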
Questions on merging of data frames & Hierarchical indexing in Series & dataframes

57. How does the merge() method work?


58. Differentiate amongst one-to-one join, one-to-many join and many-to-many join between two data sets.
59. Differentiate between left and right join.
60. Differentiate between inner and outer join.
61. When the two dataframes (involved in merging) do not have any common column, then how do we specify
the key columns for both dataframes?
62. Can we merge two dataframes based on more than one key column? If so, explain how.
63. If the two dataframes (involved in merging) have two common columns, and they are being merged based on
only one of the common columns, then how do we differentiate between the second common column
appearing twice in the merged dataset?
64. What do you mean by hierarchical indexing?
65. What do you mean by multi-indexed dataframe/series object?
66. In a multi-indexed series object, suppose there are two levels of row indexes, then is it possible to swap these
two levels of row indexes? If so, then explain how?
67. Differentiate between stack() and unstack() methods for a multi-indexed series or dataframe object.
68. Differentiate between df.set_index() and df.reset_index() methods for a multi-indexed series or dataframe
object.
69. What do you mean by a multi-indexed object?
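A sketch of merge() and hierarchical indexing on invented data: left_on/right_on for frames without a common key column (Q61), how="outer" versus the default inner join (Q60), suffixes for a repeated non-key column (Q63), swaplevel() (Q66), unstack()/stack() (Q67), and set_index()/reset_index() (Q68).

```python
import pandas as pd

# Invented data with no common column, so each side names its own key
left = pd.DataFrame({"lkey": ["b", "b", "a", "c"], "data1": [0, 1, 2, 3]})
right = pd.DataFrame({"rkey": ["a", "b", "d"], "data2": [10, 20, 30]})
inner = pd.merge(left, right, left_on="lkey", right_on="rkey")   # default how="inner"
outer = pd.merge(left, right, left_on="lkey", right_on="rkey", how="outer")

# Both frames also share "value"; merging on "key" only, suffixes tell the copies apart
df_a = pd.DataFrame({"key": ["x", "y"], "value": [1, 2]})
df_b = pd.DataFrame({"key": ["x", "y"], "value": [3, 4]})
both = pd.merge(df_a, df_b, on="key", suffixes=("_left", "_right"))

# Hierarchical index: two levels of row labels on a Series
s = pd.Series([1, 2, 3, 4], index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
swapped = s.swaplevel(0, 1)   # the inner level becomes the outer level
table = s.unstack()           # inner level -> columns (a 2x2 dataframe)
back = table.stack()          # columns -> inner row level again

# set_index() builds a MultiIndex from columns; reset_index() turns it back into columns
frame = pd.DataFrame({"k1": ["a", "a", "b"], "k2": [1, 2, 1], "v": [5, 6, 7]})
indexed = frame.set_index(["k1", "k2"])
flat = indexed.reset_index()
```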

Questions on groupby, aggregation and pivoting

70. What do you mean by split-apply-combine paradigm for series & dataframe objects?
71. What will be the output of the following statements for the given dataframe:

grouped_df = df1.groupby(df1["key1"])

a. print(grouped_df["data1"].agg(['sum','mean']))
b. print(grouped_df["data1"].agg([('SUM','sum'),('MEAN','mean')]))
c. print(grouped_df.agg({'data1':['count','sum'], 'data2':'mean'}))
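Q71's dataframe is not reproduced here, so this sketch assumes a hypothetical df1 with a key1 column and two numeric columns, data1 and data2, and runs the three aggregation forms from the question as one illustration of split-apply-combine.

```python
import pandas as pd

# Hypothetical stand-in for the Q71 dataframe
df1 = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [1, 2, 3, 4, 5],
    "data2": [10, 20, 30, 40, 50],
})

# split: one group per distinct key1 value
grouped_df = df1.groupby(df1["key1"])

# apply + combine: several functions on one column
plain = grouped_df["data1"].agg(["sum", "mean"])
# (name, function) tuples rename the result columns
renamed = grouped_df["data1"].agg([("SUM", "sum"), ("MEAN", "mean")])
# a dict applies different functions to different columns
mixed = grouped_df.agg({"data1": ["count", "sum"], "data2": "mean"})

print(plain)
print(renamed)
print(mixed)
```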
Questions based on Matplotlib

1. What is the purpose of the Matplotlib & Seaborn libraries?


2. What do you mean by figure and subplot objects?

3. Explain the following terms-


a. figsize
b. dpi
c. facecolor vs edgecolor
d. set_title() (set_title() vs title())
e. suptitle()
f. set_xlabel()/set_ylabel() (set_xlabel()/set_ylabel() vs xlabel()/ylabel())
g. set_xlim()/set_ylim()
h. set_xticks()/set_yticks()
i. set_xticklabels()/set_yticklabels()
4. State the methods that can be used to add title and axes labels to a figure object.
5. State the methods that can be used to add title and axes labels to a subplot object.
6. What are the various methods available to create subplots in a figure object?
7. Differentiate between add_subplot() and subplots() methods.

8. While plotting a line graph, explain the different parameters of the plot() method of a subplot (Axes) object.
9. What are the various ways to specify a color to the color parameter?
10. What are the various values of the linestyle parameter?
11. What are the various values of the linewidth parameter?
12. What are the various values of the marker parameter?
13. What are the various values of the markersize parameter?
14. How can we add a legend to a graph?
15. “splots[0][0].plot(np.random.rand(10), "r--^", linewidth=2, markersize=4)” --- What does “r--^” indicate?

16. What do you mean by Annotations? How can you add annotations to a graph?
17. What is the purpose of adding shapes to a graph? How can we do so?
18. What is the method that can be used to export a graph from a Jupyter notebook to a PDF?
19. Which methods are available in Matplotlib and seaborn through which we can plot bar charts for
series/dataframe objects? (df.plot.bar(), df.plot.barh(), sns.barplot())
20. Differentiate between histogram and density plot.
21. What is KDE?
22. Which methods are available in Matplotlib and seaborn through which we can plot histogram, density and
KDE (kernel density estimation) curves for series/dataframe objects?
(s1.plot.hist(), df.plot.hist(), df.plot.density(), df.plot.kde(), sns.histplot())
23. What is the purpose of scatter plot? How does a pairplot relate to a scatterplot?
24. Which methods are available in Matplotlib and seaborn through which we can plot scatter plots & pairplot
for series/dataframe objects? (df.plot.scatter(), sns.pairplot(df))
25. What is a heatmap?
26. Which methods are available in Matplotlib and seaborn through which we can plot heatmap for dataframe
objects? (plt.imshow(df1, cmap='seismic'))
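A sketch of the Matplotlib calls named in the questions above, assuming only Matplotlib and NumPy (seaborn is not used): a 2x2 grid from plt.subplots(), the "r--^" format string (red, dashed line, triangle-up markers), title, axis labels, limits, ticks, a legend, an annotation, a rectangle shape, a histogram, a scatter plot, and export to PDF with savefig(). The Agg backend, the random data and the temporary output path are assumptions so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")                 # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle

rng = np.random.default_rng(seed=1)   # seed only for reproducibility

fig, splots = plt.subplots(2, 2, figsize=(8, 6), dpi=100)
fig.suptitle("Figure-level title")

ax = splots[0][0]
# "r--^" = red color, dashed line style, triangle-up markers
ax.plot(rng.random(10), "r--^", linewidth=2, markersize=4, label="random")
ax.set_title("Subplot title")
ax.set_xlabel("index")
ax.set_ylabel("value")
ax.set_xlim(0, 9)
ax.set_xticks([0, 3, 6, 9])
ax.set_xticklabels(["zero", "three", "six", "nine"])
ax.legend(loc="best")
ax.annotate("note", xy=(5, 0.5), xytext=(6, 0.9),
            arrowprops=dict(arrowstyle="->"))

# A shape: a semi-transparent rectangle in data coordinates (default limits 0 to 1)
splots[0][1].add_patch(Rectangle((0.2, 0.2), 0.4, 0.3, color="green", alpha=0.4))

# A histogram normalized as a density, and a scatter plot
splots[1][0].hist(rng.standard_normal(200), bins=20, density=True)
x = rng.random(50)
splots[1][1].scatter(x, 2 * x + rng.normal(0, 0.1, 50))

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "figure.pdf")    # hypothetical output file
    fig.savefig(out)                          # format inferred from the .pdf extension
    saved = os.path.getsize(out) > 0
```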
