Priority Questions

FDS Priority Questions

Uploaded by

sainaresh2727
Mount Zion College of Engineering & Technology

To Make Man Whole!!

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS3352 – FOUNDATIONS OF DATA SCIENCE

UNIT I INTRODUCTION
P3 Data Science: Benefits and uses – facets of data
P1*Data Science Process: Overview
P2Defining research goals
P1*Retrieving data
P1*Data preparation
P1*Exploratory Data analysis
P1*Build the model
P2Presenting findings and building applications
P3Data Mining and Data Warehousing
P2Basic Statistical descriptions of Data
PART A - PRIORITY 1

1) Define data science and its life cycle.
2) What is big data? List the V’s of big data.
3) List the categories of data used in data science.
4) Define outliers with an example.
5) Define data warehouse, data mart and data lake.
6) List the steps in data cleansing.
7) How will you handle missing data?
8) What are the different ways of combining data?
9) Sketch the components of big data technologies.
10) Define statistics and its types.

PART A - PRIORITY 2

1) Identify the components of data science.


2) List the issues with real world data.
3) Identify the important contents of a project charter.
4) Mention the benefits of data preparation phase.
5) What is the implication of erroneous data for analysis?
6) What is a confusion matrix?

PART A - PRIORITY 3

1) List the common evaluation metrics used to measure the performance of models.
2) How will you combine data from different data sources?
3) Define Euclidean distance.
4) What is machine learning?

PART B – PRIORITY 1

1) Give an overview of the data science process.


2) Explain the different stages of data preparation.
3) Describe the approaches for data exploration.
4) Explain the different types of retrieving data for analysis

PART B – PRIORITY 2

1) Discuss the significance of setting research goals for a data science project.
2) Briefly describe the tools for data science model building.
3) Explain the basic statistical descriptions of data.
4) Explain the benefits and uses of data science.

PART B – PRIORITY 3

1) Elaborate any 5 application domains of data science.


2) Describe the facets or categories of data for data mining.

UNIT II DESCRIBING DATA

P3  Types of Data
P2  Types of Variables
P1* Describing Data with Tables and Graphs
P1* Describing Data with Averages
P1* Describing Variability
P2  Normal Distributions and Standard (z) Scores
PART A – PRIORITY 1

1) Define population and sample.


2) What is a frequency distribution?
3) Write the guidelines for frequency distributions.
4) What is a relative frequency distribution?
5) What is a cumulative frequency distribution?
6) List the instructions to find the median.
7) Define range and variance.
8) Define standard deviation.
9) What are degrees of freedom?
10) What is the interquartile range?
11) Write the steps to calculate the IQR.
12) Define z score.
13) What is the standard normal curve?

PART A – PRIORITY 2
1) What is causation?
2) What is a positive relationship?
3) What is a negative relationship?
4) Define linear relationship.
5) What is a curvilinear relationship?
6) What is the standard error of estimate?
7) When does the regression fallacy occur?

PART A – PRIORITY 3

1) Differentiate constant and variable.


2) Write the steps to convert a histogram to a frequency histogram.
3) Give an example of a stem and leaf display.
4) What are the typical shapes of graphs?
5) Draw a positively skewed distribution graph.
6) Draw a negatively skewed distribution graph.
7) Define bar graph and misleading graph.

PART B – PRIORITY 1

1) Explain the different types of frequency distribution with suitable examples and
diagrams.
2) Construct the histogram and convert it to a frequency polygon for the following data:
138, 139, 139, 145, 145, 150, 145, 136, 150, 152, 144, 138, 138, 150, 149, 133, 134,
152, 155, 151
3) Using the computation formula for the sum of squares, calculate the population
standard deviation for the scores in (a) and the sample standard deviation for the
scores in (b).
a) 1, 3, 7, 2, 0, 4, 7, 3
b) 10, 8, 5, 0, 1, 1, 7, 9, 2
6) Determine the values of the range and the IQR for the following sets of data.
a) Retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63
b) Residence changes: 1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4
7) Suppose that the burning times of electric light bulbs approximate a normal curve
with a mean of 1200 hours and a standard deviation of 120 hours. What proportion
of lights burn for
(a) less than 960 hours?
(b) more than 1500 hours?
(c) within 50 hours of the mean?
(d) between 1300 and 1400 hours?
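
For checking hand calculations on questions 3 and 7 above, the following is a minimal Python sketch rather than a model answer. It assumes the usual computational formula SS = ΣX² − (ΣX)²/n, dividing by n for a population and by n − 1 for a sample, and it uses the standard library's `statistics.NormalDist` for the normal-curve proportions. The function names are illustrative only. Because the code works with exact z values, the proportions can differ in the last digits from answers read off a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def sum_of_squares(scores):
    """Computational formula: SS = sum(X^2) - (sum(X))^2 / n."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def population_sd(scores):
    """Population standard deviation: sqrt(SS / n)."""
    return sqrt(sum_of_squares(scores) / len(scores))


def sample_sd(scores):
    """Sample standard deviation: sqrt(SS / (n - 1))."""
    return sqrt(sum_of_squares(scores) / (len(scores) - 1))


# Question 3: scores (a) treated as a population, scores (b) as a sample.
scores_a = [1, 3, 7, 2, 0, 4, 7, 3]
scores_b = [10, 8, 5, 0, 1, 1, 7, 9, 2]
print("population SD (a):", round(population_sd(scores_a), 4))
print("sample SD (b):", round(sample_sd(scores_b), 4))

# Question 7: burning times assumed normal, mean 1200 h, SD 120 h.
bulbs = NormalDist(mu=1200, sigma=120)
p_less_960 = bulbs.cdf(960)                       # area below 960
p_more_1500 = 1 - bulbs.cdf(1500)                 # area above 1500
p_within_50 = bulbs.cdf(1250) - bulbs.cdf(1150)   # 1150 to 1250
p_1300_1400 = bulbs.cdf(1400) - bulbs.cdf(1300)   # 1300 to 1400

for label, p in [
    ("(a) < 960 h", p_less_960),
    ("(b) > 1500 h", p_more_1500),
    ("(c) within 50 h of mean", p_within_50),
    ("(d) 1300 to 1400 h", p_1300_1400),
]:
    print(label, round(p, 4))
```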

PART B – PRIORITY 2

1) Discuss the methods to measure the variability for qualitative and ranked data.
2) Construct the frequency table and draw a histogram and stem and leaf display for the
following data: 139, 145, 150, 145, 136, 150, 152, 144, 138, 138
3) Compute the mean, median and mode for the following data sets:
a) 45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70
b) 26.9, 26.3, 28.7, 27.4, 26.6, 27.4, 26.9, 26.9
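
The averages asked for in question 3 can be cross-checked with Python's standard `statistics` module. This is a small sketch, not a worked solution: the helper name `describe` is made up for illustration, and `multimode` is used so that a data set with more than one mode would report all of them.

```python
from statistics import mean, median, multimode

data_a = [45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70]
data_b = [26.9, 26.3, 28.7, 27.4, 26.6, 27.4, 26.9, 26.9]


def describe(values):
    """Return the mean, median and list of modes for a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),  # a list, in case of ties
    }


summary_a = describe(data_a)
summary_b = describe(data_b)
print("a:", summary_a)
print("b:", summary_b)
```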

PART B – PRIORITY 3

1) Explain the types of data and types of variables



UNIT III DESCRIBING RELATIONSHIPS

Priority 1

• Correlation - Definition
• Correlation coefficient for quantitative data
• Computational formula for correlation coefficient
• Regression - regression line
• Least squares regression line
• Multiple regression equations

Priority 2

• Standard error of estimate
• Interpretation of r^2

Priority 3

• Scatter plots
• Regression towards the mean
PART A – PRIORITY 1

1) Define scatter plot.


2) Define Pearson correlation coefficient.
3) Define regression.
4) Differentiate between simple and multiple linear regression.
5) Define the least squares regression equation.
6) State the interpretation of r^2.
7) Define multiple regression.
8) Define regression towards the mean.
9) Differentiate between correlation and regression.
10) Define correlation matrix.
11) What is the standard error of estimate?
PART A – PRIORITY 2
1) What are the types of correlation?
2) What is causation?
3) Differentiate between linear and nonlinear relationships.
4) List the types of nonlinear relationships.
5) What is a curvilinear relationship?
6) What is an outlier?

PART A – PRIORITY 3

1) When does regression fallacy occur?


2) Give the least squares regression equation.
3) State the multiple regression equation.

PART B – PRIORITY 1

1) Calculate the coefficient of correlation between the expenditure on advertising and
sales of the company from the following data.

Advertising Expenditure (in '000 Rs)   165  166  167  168  167  169  170  172
Sales (in Lakhs)                       167  168  165  172  168  172  169  171

2) In an investigation into prediction using the stars and planets a celebrated astrologist
Horace Cope predicted the ages at which thirteen young people would first marry. The
complete data, of predicted and actual ages at first marriage, are now available and are
summarised in the table.
Person   Predicted Age (x years)   Actual Age (y years)
A        24                        23
B        30                        31
C        28                        28
D        36                        35
E        20                        20
F        22                        25
G        31                        45
H        28                        30
I        21                        22
J        29                        27
K        40                        40
L        25                        27
M        27                        26

i. Draw a scatter diagram of these data.


ii. Calculate the equation of the regression line of y on x and draw this line on the
scatter diagram.

3) Conduct a multiple regression analysis by finding the regression model on the
following data set.

Y X1 X2
140 60 22
155 62 25
159 67 24
179 70 20
192 71 15
200 72 14
212 75 14
215 78 11
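
Questions 1 and 2 of this part can be cross-checked with a short Python sketch. It assumes the computational formulas SP_xy = ΣXY − (ΣX)(ΣY)/n and SS_x = ΣX² − (ΣX)²/n, with r = SP_xy / sqrt(SS_x · SS_y) and a least squares line y' = bx + a where b = SP_xy / SS_x and a = mean(y) − b · mean(x). The function names are illustrative, and the multiple regression in question 3 is not covered here.

```python
from math import sqrt


def sums(x, y):
    """Return SP_xy, SS_x and SS_y using the computational formulas."""
    n = len(x)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return sp_xy, ss_x, ss_y


def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    sp_xy, ss_x, ss_y = sums(x, y)
    return sp_xy / sqrt(ss_x * ss_y)


def least_squares_line(x, y):
    """Slope b and intercept a of the regression line of y on x."""
    sp_xy, ss_x, _ = sums(x, y)
    slope = sp_xy / ss_x
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept


# Question 1: advertising expenditure versus sales.
advertising = [165, 166, 167, 168, 167, 169, 170, 172]
sales = [167, 168, 165, 172, 168, 172, 169, 171]
r_ads = pearson_r(advertising, sales)
print("r =", round(r_ads, 3))

# Question 2: predicted (x) versus actual (y) age at first marriage.
predicted = [24, 30, 28, 36, 20, 22, 31, 28, 21, 29, 40, 25, 27]
actual = [23, 31, 28, 35, 20, 25, 45, 30, 22, 27, 40, 27, 26]
slope, intercept = least_squares_line(predicted, actual)
print("y' = %.3f x + %.3f" % (slope, intercept))
```

A least squares line always passes through the point (mean of x, mean of y), which gives a quick sanity check on a hand-computed answer.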

PART B – PRIORITY 2

1) Find the standard error of the estimate of the mean weight of high school football
players using the following weight data.

Player Number      1    2    3    4    5    6    7    8    9    10
Weight in pounds   150  203  176  190  168  193  189  178  197  172

2) What is the significance of r^2? Give a detailed interpretation of r^2.
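
For question 1 of this part, a hedged sketch follows. It reads the question as asking for the standard error of the mean, computed as s / sqrt(n) with s the sample standard deviation (n − 1 in the denominator). That reading is an assumption, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weights in pounds for the ten players listed in the question.
weights = [150, 203, 176, 190, 168, 193, 189, 178, 197, 172]


def standard_error_of_mean(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


mean_weight = mean(weights)
se_mean = standard_error_of_mean(weights)
print("mean weight:", mean_weight)
print("standard error of the mean:", round(se_mean, 4))
```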

PART B – PRIORITY 3

1) What are scatter plots? Elaborate on the various types with suitable examples.
2) Explain regression towards the mean.


UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING
Priority 1
• Comparisons, masks, Boolean logic
• Hierarchical indexing
• Combining datasets
• Aggregation and grouping
• Pivot tables
Priority 2
• Data manipulation with Pandas
• Data indexing and selection
• Operating on data
• Missing data
Priority 3
• Basics of NumPy arrays
• Aggregations
• Computations on arrays
• Fancy indexing
• Structured arrays
PART A – PRIORITY 1
1. What is NumPy used for in Python?
2. Write a Python program to create an array.
3. Write the output of the following NumPy code.

• np.array([3,14,4,2,3])

• np.array([1,2,3,4],dtype='float32')

• np.array([range(i,i+3) for i in [2,4,6]])

• np.zeros(10,dtype=int)

• np.ones((3,5),dtype=float)

• np.full((3,5),3.14)

• np.arange(0,20,2)

• np.linspace(0,1,5)

• np.random.random((3,3))

• np.random.normal(0,1,(3,3))

4. What is a DataFrame?
5. How can a pandas DataFrame be constructed?
6. What are indexers?
7. How can missing data be handled in Python?
8. How can operations be performed on null values in pandas data structures?
9. Define hierarchical indexing.
10. What is a pivot table?
11. What is fancy indexing?
12. What is combined indexing?
13. What are the arithmetic operations implemented in NumPy?
14. What is a structured array in NumPy?
15. Mention the purpose of iloc, loc, ix.
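
The array-creation calls listed under question 3 can be tried out with a sketch like the one below. It uses the standard NumPy spellings (`np.linspace`, `np.zeros(10, dtype=int)`, `np.random.normal(0, 1, (3, 3))`), which is an assumption about what the garbled originals intended. The variable names are illustrative, and the two random arrays change on every run, so only their shape is fixed.

```python
import numpy as np

float_array = np.array([1, 2, 3, 4], dtype="float32")    # explicit dtype
nested = np.array([range(i, i + 3) for i in [2, 4, 6]])  # 3x3 from ranges
zeros = np.zeros(10, dtype=int)            # ten integer zeros
ones = np.ones((3, 5), dtype=float)        # 3x5 array of 1.0
filled = np.full((3, 5), 3.14)             # 3x5 array filled with 3.14
evens = np.arange(0, 20, 2)                # 0, 2, ..., 18
grid = np.linspace(0, 1, 5)                # five evenly spaced values in [0, 1]
uniform = np.random.random((3, 3))         # uniform samples in [0, 1)
normal = np.random.normal(0, 1, (3, 3))    # mean 0, standard deviation 1

for name, arr in [("nested", nested), ("evens", evens), ("grid", grid)]:
    print(name, arr.tolist())
```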

PART A – PRIORITY 2

1) What are index preservation and index alignment?


2) Map Python operators to pandas methods.
3) Mention the function names that operate on null values in pandas.
4) How will you slice and index a MultiIndex?
5) What are the methods for data aggregation on multi-indices?
PART A – PRIORITY 3
1) How will you concatenate arrays using numpy?
2) What are the categories of joins?
3) What are the methods used for groupby in pandas?
4) What are the basics of numpy arrays?
5) Define series object.

PART B – PRIORITY 1

1. Explain hierarchical indexing using pandas with an example.
2. Write a Python program to explain data indexing and selection using pandas.
3. How is the pandas library used in data science for handling missing data? Explain in
detail.
4. Explain combining datasets, aggregation and grouping in pandas.
5. Explain pivot tables in pandas.
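
As a starting point for questions 1, 4 and 5 above, here is a compact pandas sketch. The sales records, the manager names and all variable names are invented purely for illustration. It touches hierarchical indexing (a Series keyed by region and year), grouping with aggregation, combining datasets with `pd.merge`, and a pivot table.

```python
import pandas as pd

# Hypothetical sales records, invented for this example.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North", "South"],
    "year": [2022, 2023, 2022, 2023, 2022, 2023],
    "product": ["A", "A", "A", "B", "B", "B"],
    "units": [10, 12, 7, 9, 4, 6],
})

# Grouping on two keys yields a Series with a hierarchical (Multi)Index.
by_region_year = sales.groupby(["region", "year"])["units"].sum()
north_2022 = by_region_year.loc[("North", 2022)]  # full key -> one value
north_all = by_region_year.loc["North"]           # partial key -> Series by year

# Combining datasets: attach a (hypothetical) manager to each sale.
managers = pd.DataFrame({"region": ["North", "South"],
                         "manager": ["Asha", "Ravi"]})
merged = pd.merge(sales, managers, on="region")

# Pivot table: regions as rows, years as columns, summed units as values.
pivot = sales.pivot_table(values="units", index="region",
                          columns="year", aggfunc="sum")

print(by_region_year)
print(pivot)
```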

PART B – PRIORITY 2

1. Using Boolean masking, extract from the array
np.array([3,4,6,10,24,89,45,43,46,99,100]) all the numbers
a. Which are not divisible by 3
b. Which are divisible by 5
c. Which are divisible by 3 and 5
d. Which are divisible by 3 and set them to 42
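
One possible way to approach this exercise is sketched below; it is not the only valid answer. Each comparison on a NumPy array produces a Boolean mask, masks are combined with `&` (with parentheses around each condition), and assigning through a mask changes the selected elements in place. Part (d) works on a copy so the original array stays available; the variable names are illustrative.

```python
import numpy as np

x = np.array([3, 4, 6, 10, 24, 89, 45, 43, 46, 99, 100])

# (a) not divisible by 3
not_div_3 = x[x % 3 != 0]

# (b) divisible by 5
div_5 = x[x % 5 == 0]

# (c) divisible by both 3 and 5
div_3_and_5 = x[(x % 3 == 0) & (x % 5 == 0)]

# (d) set every element divisible by 3 to 42, on a copy
replaced = x.copy()
replaced[replaced % 3 == 0] = 42

print(not_div_3)    # [  4  10  89  43  46 100]
print(div_5)        # [ 10  45 100]
print(div_3_and_5)  # [45]
print(replaced)
```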

PART B – PRIORITY 3

1) Explain fancy indexing in detail using a Python program.
2) Write a Python program to create structured arrays.
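
A minimal sketch for both questions follows, with made-up values and field names. Fancy indexing means passing a list or array of indices instead of a single index or slice. A structured array gives each element named fields with their own data types.

```python
import numpy as np

# Fancy indexing on a 1-D array: pick positions 0, 3 and 7.
x = np.arange(10, 100, 10)           # 10, 20, ..., 90
picked = x[[0, 3, 7]]                # values at those positions

# Fancy indexing on a 2-D array: row and column indices are paired up.
grid = np.arange(12).reshape(3, 4)
pairs = grid[[0, 1, 2], [2, 1, 3]]   # elements (0,2), (1,1), (2,3)

# Structured array with named, typed fields (hypothetical records).
people = np.zeros(3, dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")])
people["name"] = ["Alice", "Bob", "Cathy"]
people["age"] = [25, 45, 37]
people["weight"] = [55.0, 85.5, 68.0]

# Boolean masks work on fields too: names of people under 40.
under_40 = people[people["age"] < 40]["name"]

print(picked)     # [10 40 80]
print(pairs)      # [ 2  5 11]
print(under_40)   # ['Alice' 'Cathy']
```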


UNIT V DATA VISUALIZATION
Priority 1
• Three Dimensional Plotting
• Geographic Data with Basemap
• Visualization with Seaborn
Priority 2
• Density and contour plots
• Histograms – legends – colors – subplots
• Text and annotation
• Visualizing errors
• Customizing colors
Priority 3
• Importing Matplotlib
• Line plots
• Scatter plots
PART A - PRIORITY 1

1) What is the purpose of matplotlib?


2) Describe the dual interface of Matplotlib.
3) How do you draw a simple line plot using Matplotlib?
4) Write the syntax to draw a scatter plot using Matplotlib.
5) Write the difference between the plot and scatter functions.
6) Define contour plot.
7) Which functions can be used to draw contour plots?
8) What is the purpose of using a histogram?
9) Write the source code to draw a simple histogram.
10) How do you create a three-dimensional wireframe plot?
11) Define surface plot.
12) What is the use of Seaborn?
13) Write the significance of data visualization.
14) Define Kernel Density Estimation.

PART A - PRIORITY 2

1. What is the purpose of error bars?
2. Enumerate the classes of colormaps used in scatter plots.
3. Comment on text transforms.
4. List the ways to customize Matplotlib.
5. What is a density plot?
6. What are pair plots?
7. What are factor plots and surface plots in Seaborn?

PART A - PRIORITY 3

1. Write the features of the Seaborn module.
2. Write short notes on the Basemap toolkit.

PART B - PRIORITY 1

1. Explain density and contour plots in Matplotlib.
2. Describe text and annotation in detail with a Python program.
3. Illustrate customization and three-dimensional plotting.
4. Explain geographic data visualization with Basemap.
5. Write a Python program to visualize various plots using Seaborn.

PART B - PRIORITY 2

1. How can a histogram be implemented using Matplotlib?
2. How will you customize the legends in Matplotlib?
3. Describe subplots in Matplotlib.
4. Illustrate simple line plots and their attributes in Matplotlib.
5. Write a Python program to visualize a dataset using a scatter plot and explain its
parameters.
PART B - PRIORITY 3

1. Explain the concept of adding single and multiple legends to the plot.
2. Describe how to customize colors.
3. Write a python program to draw histogram for any dataset.
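
A possible starting point for question 3 is sketched below. The twenty values are borrowed from the Unit II histogram exercise purely as sample data, and the bin count, colors and labels are arbitrary choices. The non-interactive `Agg` backend is selected so the script also runs without a display; in an interactive session that line can be dropped and `plt.show()` called instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Sample data reused from the Unit II exercise, for illustration only.
data = [138, 139, 139, 145, 145, 150, 145, 136, 150, 152,
        144, 138, 138, 150, 149, 133, 134, 152, 155, 151]

fig, ax = plt.subplots()

# hist returns the bin counts, the bin edges and the drawn bar patches.
counts, bin_edges, patches = ax.hist(data, bins=5,
                                     color="steelblue", edgecolor="black")

ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of sample data")

# Uncomment to write the figure to disk:
# fig.savefig("histogram.png")

print("counts per bin:", counts.tolist())
```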
