Priority Questions
UNIT I INTRODUCTION
P3 Data Science: Benefits and uses – facets of data
P1* Data Science Process: Overview
P2 Defining research goals
P1* Retrieving data
P1* Data preparation
P1* Exploratory Data analysis
P1* Build the model
P2 Presenting findings and building applications
P3 Data Mining and Data Warehousing
P2 Basic Statistical descriptions of Data
PART A - PRIORITY 1
PART A - PRIORITY 2
PART A - PRIORITY 3
1) List the common evaluation metrics used to measure the performance of models.
2) How will you combine data from different data sources?
3) Define Euclidean distance.
4) What is machine learning?
PART B – PRIORITY 1
PART B – PRIORITY 2
1) Discuss the significance of setting research goals for a data science project.
2) Briefly describe the tools for data science model building.
3) Explain the basic statistical descriptions of data.
4) Explain the benefits and uses of data science.
PART B – PRIORITY 3
P3 Types of Data
P2 Types of Variables
P1* Describing Data with Tables and Graphs
P1* Describing Data with Averages
P1* Describing Variability
P2 Normal Distributions and Standard (z) Scores
PART A – PRIORITY 1
PART A – PRIORITY 2
1) What is causation?
2) What is a positive relationship?
3) What is a negative relationship?
4) Define linear relationship.
5) What is a curvilinear relationship?
6) What is the standard error of estimate?
7) When does the regression fallacy occur?
Mount Zion College of Engineering & Technology
To Make Man Whole!!
PART A – PRIORITY 3
PART B – PRIORITY 1
1) Explain the different types of frequency distribution with suitable examples and
diagrams.
2) Construct the histogram and convert it to a frequency polygon for the following data:
138, 139, 139, 145, 145, 150, 145, 136, 150, 152, 144, 138, 138, 150, 149, 133, 134,
152, 155, 151
3) Using the computation formula for the sum of squares, calculate the population
standard deviation for the scores in (a) and the sample standard deviation for the
scores in (b).
a) 1, 3, 7, 2, 0, 4, 7, 3
b) 10, 8, 5, 0, 1, 1, 7, 9, 2
6) Determine the values of the range and the IQR for the following sets of data.
a) Retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63
b) Residence changes: 1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4
7) Suppose that the burning times of electric light bulbs approximate a normal curve
with a mean of 1200 hours and a standard deviation of 120 hours. What proportion
of lights burn for
(a) less than 960 hours?
(b) more than 1500 hours?
(c) within 50 hours of the mean?
(d) between 1300 and 1400 hours?
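The proportions asked for in question 7 can be checked numerically. The sketch below assumes the burning times are exactly normal and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import NormalDist

# Burning times: mean 1200 hours, standard deviation 120 hours
bulbs = NormalDist(mu=1200, sigma=120)

# (a) less than 960 hours, i.e. z = (960 - 1200) / 120 = -2.0
p_less_960 = bulbs.cdf(960)

# (b) more than 1500 hours, i.e. z = 2.5
p_more_1500 = 1 - bulbs.cdf(1500)

# (c) within 50 hours of the mean, i.e. between 1150 and 1250 hours
p_within_50 = bulbs.cdf(1250) - bulbs.cdf(1150)

# (d) between 1300 and 1400 hours
p_1300_1400 = bulbs.cdf(1400) - bulbs.cdf(1300)

for label, p in [("a", p_less_960), ("b", p_more_1500),
                 ("c", p_within_50), ("d", p_1300_1400)]:
    print(f"({label}) {p:.4f}")
```

A hand solution would convert each boundary to a z score and read the area from a standard normal table; the results should agree with the printed values up to table rounding.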
PART B – PRIORITY 2
1) Discuss the methods to measure the variability for qualitative and ranked data.
2) Construct the frequency table and draw the histogram and stem-and-leaf display for the
following data: 139, 145, 150, 145, 136, 150, 152, 144, 138, 138
3) Compute the mean, median and mode for the following data sets:
a) 45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70
b) 26.9, 26.3, 28.7, 27.4, 26.6, 27.4, 26.9, 26.9
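The averages in question 3 can be verified with the Python standard-library `statistics` module. This is a minimal sketch; the names `data_a` and `data_b` are chosen here for illustration.

```python
import statistics

# Data sets (a) and (b) from question 3
data_a = [45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70]
data_b = [26.9, 26.3, 28.7, 27.4, 26.6, 27.4, 26.9, 26.9]

for name, data in [("a", data_a), ("b", data_b)]:
    mean = statistics.mean(data)      # sum of values divided by their count
    median = statistics.median(data)  # middle value of the sorted data
    mode = statistics.mode(data)      # most frequent value
    print(f"({name}) mean={mean:.4f} median={median} mode={mode}")
```

Both data sets have a single most frequent value, so `statistics.mode` is unambiguous here; for multimodal data, `statistics.multimode` returns every mode.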
PART B – PRIORITY 3
Priority 1
Correlation - Definition
correlation coefficient for quantitative data
computational formula for correlation coefficient
Regression - regression line
least squares regression line
multiple regression equations
Priority 2
6) What is an outlier?
PART A – PRIORITY 3
PART B – PRIORITY 1
2) In an investigation into prediction using the stars and planets a celebrated astrologist
Horace Cope predicted the ages at which thirteen young people would first marry. The
complete data, of predicted and actual ages at first marriage, are now available and are
summarised in the table.
Person   Predicted Age (x years)   Actual Age (y years)
A 24 23
B 30 31
C 28 28
D 36 35
E 20 20
F 22 25
G 31 45
H 28 30
I 21 22
J 29 27
K 40 40
L 25 27
M 27 26
ii. Calculate the equation of the regression line of y on x and draw this line on the
scatter
Y X1 X2
140 60 22
155 62 25
159 67 24
179 70 20
192 71 15
200 72 14
212 75 14
215 78 11
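Both regression data sets above can be fitted numerically with NumPy. The sketch below makes two assumptions: the first fit is the least squares line of y on x asked for in part ii of question 2, and the Y, X1, X2 table is meant for a multiple regression of Y on X1 and X2 (its question text is not shown here). All variable names are illustrative.

```python
import numpy as np

# Question 2: predicted (x) and actual (y) ages at first marriage, persons A to M
x = np.array([24, 30, 28, 36, 20, 22, 31, 28, 21, 29, 40, 25, 27], dtype=float)
y = np.array([23, 31, 28, 35, 20, 25, 45, 30, 22, 27, 40, 27, 26], dtype=float)

# Degree-1 polynomial fit returns [slope, intercept] of the least squares line
slope, intercept = np.polyfit(x, y, 1)
print(f"y = {intercept:.3f} + {slope:.3f} x")

# Y, X1, X2 table (assumed to be a multiple regression exercise)
Y = np.array([140, 155, 159, 179, 192, 200, 212, 215], dtype=float)
X1 = np.array([60, 62, 67, 70, 71, 72, 75, 78], dtype=float)
X2 = np.array([22, 25, 24, 20, 15, 14, 14, 11], dtype=float)

# Design matrix with a column of ones for the intercept term
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2 = coef
print(f"Y = {b0:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
```

`np.linalg.lstsq` minimises the sum of squared residuals, which is the same criterion used in the hand computation of a least squares regression equation.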
PART B – PRIORITY 2
1) Find the standard error of the estimate of the mean weight of high school football
players using the following player weights.
Player Number      1   2   3   4   5   6   7   8   9   10
Weight in pounds   150 203 176 190 168 193 189 178 197 172
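The calculation can be sketched as follows. It assumes the question asks for the standard error of the mean, computed as the sample standard deviation (n - 1 in the denominator) divided by the square root of the sample size; the variable names are illustrative.

```python
import math
import statistics

# Weights in pounds for players 1 to 10
weights = [150, 203, 176, 190, 168, 193, 189, 178, 197, 172]

n = len(weights)
mean_weight = statistics.mean(weights)

# Sample standard deviation (divides the sum of squares by n - 1)
sample_sd = statistics.stdev(weights)

# Standard error of the mean: s / sqrt(n)
standard_error = sample_sd / math.sqrt(n)

print(f"mean = {mean_weight:.1f} lb, s = {sample_sd:.2f} lb, SE = {standard_error:.2f} lb")
```

If the ten weights were treated as a whole population instead, `statistics.pstdev` would replace `statistics.stdev`.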
PART B – PRIORITY 3
1) What are scatter plots? Elaborate on the various types with suitable examples.
2) Explain regression towards the mean.
• np.array([3,14,4,2,3])
• np.array([1,2,3,4],dtype='float32')
• np.zeros(10,dtype=int)
• np.ones((3,5),dtype=float)
• np.full((3,5),3.14)
• np.arange(0,20,2)
• np.linspace(0,1,5)
• np.random.random((3,3))
• np.random.normal(0,1,(3,3))
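The array-creation calls listed above can be run together as a short script. This is a sketch only: the length 10 passed to `np.zeros` is an assumption about the intended call, and the variable names are chosen here for illustration.

```python
import numpy as np

floats32 = np.array([1, 2, 3, 4], dtype='float32')  # explicit element type
zeros = np.zeros(10, dtype=int)                     # ten integer zeros (length assumed)
ones = np.ones((3, 5), dtype=float)                 # 3x5 array of 1.0
filled = np.full((3, 5), 3.14)                      # 3x5 array filled with 3.14
evens = np.arange(0, 20, 2)                         # 0, 2, ..., 18 (stop is excluded)
spaced = np.linspace(0, 1, 5)                       # 5 evenly spaced values from 0 to 1
uniform = np.random.random((3, 3))                  # uniform random values in [0, 1)
normal = np.random.normal(0, 1, (3, 3))             # mean 0, standard deviation 1

for name, arr in [("floats32", floats32), ("zeros", zeros), ("ones", ones),
                  ("filled", filled), ("evens", evens), ("spaced", spaced),
                  ("uniform", uniform), ("normal", normal)]:
    print(name, arr.shape, arr.dtype)
```

Printing the shape and dtype of each result is a quick way to confirm that every call produces the kind of array its arguments describe.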
PART A – PRIORITY 2
PART B – PRIORITY 1
PART B – PRIORITY 2
PART B – PRIORITY 3
PART A - PRIORITY 2
PART A - PRIORITY 3
PART B - PRIORITY 1
PART B - PRIORITY 2
1. Explain the concept of adding single and multiple legends to the plot.
2. Describe how to customize colors.
3. Write a Python program to draw a histogram for any dataset.
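One possible shape for the program in question 3 is sketched below using Matplotlib. The data set, the choice of 5 bins, the axis labels, and the output file name `histogram.png` are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt

# Example data set (any list of numbers will do)
values = [138, 139, 139, 145, 145, 150, 145, 136, 150, 152,
          144, 138, 138, 150, 149, 133, 134, 152, 155, 151]

fig, ax = plt.subplots()

# hist returns the count in each bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(values, bins=5, edgecolor="black")

ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of the sample data")

# Save the figure to a file; the file name is arbitrary
fig.savefig("histogram.png")
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called in place of `fig.savefig` to display the plot in a window.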