Esa - QP - Ue19-20cs203 - SDS

Uploaded by

pes1ug23cs690

SRN

PES University, Bengaluru UE19/20CS203

(Established under Karnataka Act No. 16 of 2013)

MAY 2022: END SEMESTER ASSESSMENT (ESA) B TECH III SEMESTER

UE19/20CS203 – STATISTICS FOR DATA SCIENCE
Time: 3 Hrs Answer All Questions Max Marks: 100
• Answer all questions in the same order as given and to the point.
• Do not write the answer directly; write out all the steps taken to solve the problem.
• Only the data tables required to solve the given problems are on the last page.
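1 a) What is sampling? Mention the different probability sampling techniques. Explain any three with examples. 8
Solution:
Sampling is the process of selecting observations (a sample) in order to make inferences that can be generalized to the population. The probability sampling techniques are simple random sampling, systematic sampling, stratified sampling and cluster sampling.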

Simple random sampling, as the name suggests, is an entirely random method of selecting the sample.
● Here, each subject or unit in the population has an equal chance of being selected.
● The sampling frame should include the whole population.
● A table of random numbers or a lottery system is used to determine which units are to be selected.
Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling.
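The lottery idea above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical sampling frame of 100 numbered units; `random.Random.sample` draws without replacement, so every unit has the same chance of being selected.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement so that every unit in the sampling
    frame has the same chance of selection (a computerised lottery)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


frame = list(range(1, 101))  # hypothetical sampling frame of 100 numbered units
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```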
Systematic sampling
When to use: When the project budget is tight and there is little time to complete it.
Key thing: Find the sampling interval k and select every kth member, where k = N / n (N = population size, n = sample size).
How: Assign numbers to each population member.
Selection: Randomly select the first member from among the first k, and then select every kth member after it.
Advantages: Easy to select; the sample is evenly spread over the entire reference population; cost effective.
Disadvantages: The sample may be biased (for example, if the list has a periodic pattern that matches k); not every possible sample has an equal chance of being selected; all elements between two successive selected elements are ignored.
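A minimal sketch of the selection rule, assuming a hypothetical numbered list of N = 100 members and using integer division for k when N is not an exact multiple of n (that rounding choice is an assumption, not part of the scheme):

```python
import random


def systematic_sample(population, n, seed=None):
    """Select every kth member after a random start among the first k."""
    k = len(population) // n                     # sampling interval, k = N / n
    start = random.Random(seed).randrange(k)     # random start: index 0 .. k-1
    return population[start::k][:n]              # every kth member, n in total


population = list(range(1, 101))  # hypothetical numbered list of N = 100 members
sample = systematic_sample(population, 10, seed=3)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined, which is why not every possible sample can occur.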

Stratified sampling is the type of sampling in which the population is divided into two or more groups, called strata, based on a shared characteristic or trait.
Then a simple random sample is selected from each group.
The selected samples are combined into one sample.
The strata do not overlap, but together they represent the entire population.
The shared characteristic on which the population is divided could be gender, educational attainment, income, age, etc.
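The three steps (split into strata, sample each stratum, combine) can be sketched as follows. This assumes a hypothetical population of 20 students tagged by gender and an equal number of units drawn per stratum; proportional allocation is a common alternative that the scheme does not specify.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into non-overlapping strata, then take a simple random
    sample of n_per_stratum units from each stratum and combine them."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


# Hypothetical population: 20 students tagged with gender "F" or "M".
students = [(f"S{i:02d}", "F" if i % 2 else "M") for i in range(20)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=3, seed=1)
print(sample)
```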
Cluster Sampling
When to use: When the population is already broken up into groups (clusters).
Key thing: Each group contains heterogeneous members.
How: The population is divided into non-overlapping areas (clusters).
Each cluster is a miniature (microcosm) of the population.
Selection: Clusters are selected randomly, and either all elements in the chosen clusters are included or elements are chosen from them using simple random sampling.
Advantages: More convenient for geographically dispersed populations; lower travel cost; simplified administration of the survey.
Disadvantages: Statistically less efficient; sampling error is higher than for simple random sampling.
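Both selection options described above (keep every element of a chosen cluster, or take a simple random sample inside it) fit in one small function. The localities and household labels below are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose n_clusters whole clusters. If per_cluster is None, keep
    every element of each chosen cluster; otherwise take a simple random
    sample of per_cluster elements from each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)
        else:
            sample[name] = rng.sample(members, per_cluster)
    return sample


# Hypothetical clusters: four localities, each with its own households.
localities = {
    "North": ["N1", "N2", "N3", "N4"],
    "South": ["S1", "S2", "S3"],
    "East": ["E1", "E2", "E3", "E4", "E5"],
    "West": ["W1", "W2"],
}
whole = cluster_sample(localities, n_clusters=2, seed=7)
two_stage = cluster_sample(localities, n_clusters=2, per_cluster=2, seed=7)
print(whole)
print(two_stage)
```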

b) What is web scraping? With a neat diagram explain the components of a web scraper. 1+5
Solution:

Web scraping is like any other Extract-Transform-Load (ETL) process. A web scraper crawls websites, extracts data from them, transforms the data into a usable structured format, and loads it into a file or database for subsequent use.
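Following these ETL steps, a typical web scraper has the following components: a crawler that fetches the web pages, an extractor (parser) that pulls the required data out of the HTML, a transformer that converts it into a structured format, and a loader that stores it in a file or database.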
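The extract, transform and load stages can be sketched with Python's standard library. This is a minimal, offline sketch: the crawl step is replaced by a hypothetical inline HTML string (a real scraper would fetch pages over HTTP, for example with `urllib.request.urlopen`), and the table layout, field names and CSV output are assumptions chosen for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page standing in for what the crawler would fetch.
SAMPLE_HTML = """
<table>
  <tr><td>Widget</td><td>19.99</td></tr>
  <tr><td>Gadget</td><td>5.50</td></tr>
</table>
"""


class CellExtractor(HTMLParser):
    """Extract step: collect the text of <td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def extract(html):
    parser = CellExtractor()
    parser.feed(html)
    return parser.rows


def transform(rows):
    """Transform step: turn raw strings into typed, structured records."""
    return [{"name": name, "price": float(price)} for name, price in rows]


def load(records, out):
    """Load step: write the records as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(records)


records = transform(extract(SAMPLE_HTML))
buffer = io.StringIO()
load(records, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; swapping `io.StringIO()` for an opened file or a database insert changes only the load stage.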
c) For the following data 1x6
30 75 79 80 80 105 126 138 149 179 179 191
223 232 232 236 240 242 245 247 254 274 384 470
Compute the mean, median, mode and the 5%, 10%, and 20% trimmed means

Solution:
The mean is found by averaging together all 24 numbers, which produces a value of
195.42.
The median is the average of the 12th and 13th numbers, which is (191 + 223)/2 =
207.00.
The data are trimodal: 80, 179 and 232 each occur twice, so all three are modes.
To compute the 5% trimmed mean, we drop 5% of the data from each end. This comes to (0.05)(24) = 1.2 observations.
We round 1.2 to 1 and trim one observation off each end.
The 5% trimmed mean is the average of the remaining 22 numbers:
$\frac{75 + 79 + \cdots + 274 + 384}{22} = 190.45$
To compute the 10% trimmed mean, round off (0.1)(24) = 2.4 to 2.
Drop 2 observations from each end, and then average the remaining 20:
$\frac{79 + 80 + \cdots + 254 + 274}{20} = 186.55$
To compute the 20% trimmed mean, round off (0.2)(24) = 4.8 to 5. Drop 5 observations from each end, and then average the remaining 14:
$\frac{105 + 126 + \cdots + 242 + 245}{14} = 194.07$
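The arithmetic above can be checked with Python's `statistics` module. The `trimmed_mean` helper is a hypothetical name; it follows the solution's rule of rounding (proportion × n) to the nearest integer. Library functions such as `scipy.stats.trim_mean` truncate instead, which would trim only 4 values for the 20% case. Python's `round` also sends exact halves to the nearest even number, which does not arise for these three proportions.

```python
import statistics

data = [30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
        223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470]


def trimmed_mean(values, proportion):
    """Drop round(proportion * n) observations from each end, then average."""
    ordered = sorted(values)
    g = round(proportion * len(ordered))
    return statistics.mean(ordered[g:len(ordered) - g])


print("mean  ", round(statistics.mean(data), 2))
print("median", statistics.median(data))
print("modes ", statistics.multimode(data))
for p in (0.05, 0.10, 0.20):
    print(f"{p:.0%} trimmed mean", round(trimmed_mean(data, p), 2))
```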

2 a) The four sides of a rectangular frame consist of two pieces selected from a population whose mean length is 30 cm with standard deviation 0.1 cm, and two pieces selected from a population whose mean length is 45 cm with standard deviation 0.3 cm. 1+2+2
i. Find the mean perimeter of the rectangular frame.
ii. Assuming the four pieces are chosen independently, find the standard deviation of
the perimeter.
Solution:
Let X1 and X2 denote the lengths of the pieces chosen from the population with mean 30
and standard deviation 0.1, and let Y1 and Y2 denote the lengths of the pieces chosen from
the population with mean 45 and standard deviation 0.3.

i. $\mu_{X_1+X_2+Y_1+Y_2} = \mu_{X_1} + \mu_{X_2} + \mu_{Y_1} + \mu_{Y_2} = 30 + 30 + 45 + 45 = 150$ cm

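ii. Since the four pieces are independent, their variances add:
$\sigma^2_{X_1+X_2+Y_1+Y_2} = \sigma^2_{X_1} + \sigma^2_{X_2} + \sigma^2_{Y_1} + \sigma^2_{Y_2} = 0.1^2 + 0.1^2 + 0.3^2 + 0.3^2 = 0.20$, so $\sigma_{X_1+X_2+Y_1+Y_2} = \sqrt{0.20} \approx 0.447$ cm.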
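A quick numerical check of the perimeter's mean and standard deviation, assuming (as the question states) that the four pieces are independent, so that means add and variances add:

```python
import math

means = [30, 30, 45, 45]      # X1, X2, Y1, Y2 (cm)
sds = [0.1, 0.1, 0.3, 0.3]    # standard deviations (cm)

mean_perimeter = sum(means)                            # means always add
sd_perimeter = math.sqrt(sum(s ** 2 for s in sds))     # variances add (independence)
print(mean_perimeter, round(sd_perimeter, 3))
```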
b) IC chips often contain surface imperfections. For a certain type of IC chip, 9% contain no imperfections, 22% contain 1 imperfection, 26% contain 2 imperfections, 20% contain 3 imperfections, 12% contain 4 imperfections, and the remaining 11% contain 5 imperfections. Let Y represent the number of imperfections in a randomly chosen chip. What are the possible values for Y? Is Y discrete or continuous? Find P(Y = y) for each possible value y. 1+2+2

Solution
The possible values for Y are the integers 0, 1, 2, 3, 4, and 5. The random variable Y is discrete, because it takes on only integer values. Nine percent of the outcomes in the sample space are assigned the value 0. Therefore P(Y = 0) = 0.09. Similarly, P(Y = 1) = 0.22, P(Y = 2) = 0.26, P(Y = 3) = 0.20, P(Y = 4) = 0.12, and P(Y = 5) = 0.11.
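The probability mass function can be stored as a dictionary and checked for validity: every probability lies in [0, 1] and the probabilities sum to 1. The helper name `is_valid_pmf` is hypothetical.

```python
import math

# P(Y = y) for the number of imperfections y on a chip.
pmf = {0: 0.09, 1: 0.22, 2: 0.26, 3: 0.20, 4: 0.12, 5: 0.11}


def is_valid_pmf(probabilities):
    """A pmf needs probabilities in [0, 1] that sum to 1 (up to float error)."""
    values = probabilities.values()
    return all(0 <= p <= 1 for p in values) and math.isclose(sum(values), 1.0)


print(is_valid_pmf(pmf), pmf[2])
```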

c) X is a continuous random variable with the probability density function given below. 2+3

It is verified that µX = 50 and σX = 0.45. Compute the probability that X is outside the interval 49.1 – 50.9. How close is this probability to the Chebyshev's inequality bound?

Solution:
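The density itself is not reproduced here, so the exact probability cannot be recomputed. As a sketch of the Chebyshev side only: the interval 49.1 – 50.9 is μ ± kσ with k = 0.9 / 0.45 = 2, so Chebyshev's inequality gives P(|X − μ| ≥ 2σ) ≤ 1/k² = 0.25, and the exact probability obtained from the density is compared against this bound.

```python
mu, sigma = 50, 0.45
lower, upper = 49.1, 50.9

k = (upper - mu) / sigma   # the interval is mu ± k*sigma (here k = 2)
bound = 1 / k ** 2         # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
print(round(k, 6), round(bound, 6))
```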
d) A company produces "20 ounce" jars of chilli sauce. The true amounts of sauce in the jars of this brand follow a normal distribution. Suppose the company's "20 ounce" jars follow a normal distribution with mean µ = 20.2 ounces and standard deviation σ = 0.125 ounces. What proportion of the sauce jars contain between 20 and 20.3 ounces of sauce? 2+3

Solution:
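Standardizing the two limits gives z = (20 − 20.2)/0.125 = −1.6 and z = (20.3 − 20.2)/0.125 = 0.8, so the required proportion is P(−1.6 < Z < 0.8). A sketch of that computation with `statistics.NormalDist` from the standard library, which should agree with the table value of about 0.7333:

```python
from statistics import NormalDist

jar = NormalDist(mu=20.2, sigma=0.125)

z_low = (20.0 - jar.mean) / jar.stdev    # about -1.6
z_high = (20.3 - jar.mean) / jar.stdev   # about 0.8
proportion = jar.cdf(20.3) - jar.cdf(20.0)
print(round(z_low, 2), round(z_high, 2), round(proportion, 4))
```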

3 a) Let X1, . . . , Xn be a random sample from a population with the Poisson(λ) distribution. 5
Find the MLE of λ.

Solution:
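A sketch of the standard derivation: write the likelihood of the sample, take logs, set the derivative to zero, and confirm that the second derivative is negative.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \\
\ln L(\lambda) &= -n\lambda + \Big(\sum_{i=1}^{n} x_i\Big)\ln\lambda - \sum_{i=1}^{n}\ln(x_i!) \\
\frac{d}{d\lambda}\ln L(\lambda) &= -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \\
\frac{d^2}{d\lambda^2}\ln L(\lambda) &= -\frac{1}{\lambda^2}\sum_{i=1}^{n} x_i < 0
\quad \text{(when } \textstyle\sum x_i > 0\text{), so this is a maximum.}
\end{align*}
```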
b) Let X1 and X2 be independent, each with unknown mean μ and known variance σ² = 1. Let . Find the bias, variance, and mean squared error of . 5

Solution:
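The estimator in this question did not survive extraction. As a hedged sketch, assuming it is a linear combination a1·X1 + a2·X2 (the weights a1 and a2 are hypothetical), the three quantities follow from E[a1X1 + a2X2] = (a1 + a2)μ, from independence for the variance, and from MSE = variance + bias².

```python
def linear_estimator_summary(a1, a2, mu, sigma2=1.0):
    """Bias, variance and MSE of a1*X1 + a2*X2 for independent X1, X2 with
    common mean mu and variance sigma2 (a1, a2 are hypothetical weights)."""
    bias = (a1 + a2 - 1) * mu                 # E[a1 X1 + a2 X2] - mu
    variance = (a1 ** 2 + a2 ** 2) * sigma2   # independence: no covariance term
    mse = variance + bias ** 2                # MSE = variance + bias^2
    return bias, variance, mse


# Example: the sample mean (X1 + X2)/2 is unbiased for any mu.
print(linear_estimator_summary(0.5, 0.5, mu=3))
```

Because the bias generally depends on the unknown μ, the function takes μ as an input; an estimator whose weights sum to 1 is unbiased for every μ.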

c) A random sample of n = 50 boys showed a mean daily intake of protein products equal to 756 grams, with a standard deviation of 35 grams. 2+2
i. Find a 95% confidence interval for the population average µ.
ii. Find a 99% confidence interval for µ, the population average daily intake
of protein products for boys.

Solution:
i. $\bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 756 \pm 1.96\frac{35}{\sqrt{50}} = 756 \pm 9.70$
or 746.30 < μ < 765.70 grams.

ii. $\bar{x} \pm 2.58\frac{s}{\sqrt{n}} = 756 \pm 2.58\frac{35}{\sqrt{50}} = 756 \pm 12.77$
or 743.23 < μ < 768.77 grams.
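Both intervals use the large-sample formula x̄ ± z·s/√n. A small sketch (the helper name `z_interval` is hypothetical) reproduces them with the table values 1.96 and 2.58; using the unrounded 99% value 2.576 would give a margin of about 12.75 instead of 12.77.

```python
import math


def z_interval(xbar, s, n, z):
    """Large-sample confidence interval xbar ± z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


ci95 = z_interval(756, 35, 50, 1.96)
ci99 = z_interval(756, 35, 50, 2.58)
print([round(v, 2) for v in ci95], [round(v, 2) for v in ci99])
```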

d) Estimate the confidence intervals for the following: 3+3
i. A group of 78 people enrolled in a weight-loss program that involved adhering to a special diet and to a daily exercise program. After six months, their mean weight loss was 25 pounds, with a sample standard deviation of 9 pounds. A second group of 43 people went on the diet but didn't exercise. After six months, their mean weight loss was 14 pounds, with a sample standard deviation of 7 pounds. Find a 95% confidence interval for the mean difference between the weight losses.

ii. In a random sample of 150 customers of a high-speed internet provider, 63 said that their service had been interrupted one or more times in the past month. Find a 95% confidence interval for the proportion of customers whose service was interrupted one or more times in the past month.

Solution:
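A sketch of both large-sample 95% intervals. For (i) it assumes the usual interval for a difference of means, (x̄1 − x̄2) ± 1.96·√(s1²/n1 + s2²/n2). For (ii) it assumes the traditional interval p̂ ± 1.96·√(p̂(1 − p̂)/n); some textbooks instead use the adjusted form with p̃ = (x + 2)/(n + 4), which gives slightly different limits. The helper names are hypothetical.

```python
import math


def diff_means_interval(x1, s1, n1, x2, s2, n2, z=1.96):
    """Large-sample interval for mu1 - mu2 from two independent samples."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    d = x1 - x2
    return d - z * se, d + z * se


def proportion_interval(x, n, z=1.96):
    """Traditional large-sample interval p_hat ± z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


diff_ci = diff_means_interval(25, 9, 78, 14, 7, 43)   # diet + exercise vs diet only
prop_ci = proportion_interval(63, 150)                # interrupted service
print([round(v, 2) for v in diff_ci], [round(v, 3) for v in prop_ci])
```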

4 a) A marketing company claims that it receives an 8% response rate from its mailing. To test this claim, a random sample of 500 was surveyed, with 30 responses. Test at the α = 0.05 significance level. 1+1+2+1

Solution:

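First, check that the normal approximation applies:
$n p_0 = (500)(0.08) = 40$ and $n(1 - p_0) = (500)(0.92) = 460$, both sufficiently large.

H0: p = 0.08   H1: p ≠ 0.08

α = 0.05, n = 500, $\hat{p} = 30/500 = 0.06$
Critical values: ±1.96 (reject H0 if |Z| > 1.96)
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.06 - 0.08}{\sqrt{0.08(1 - 0.08)/500}} = -1.648$

Decision: since −1.96 < −1.648 < 1.96, do not reject H0 at α = 0.05.

There isn't sufficient evidence to reject the company's claim of an 8% response rate.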
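The same one-proportion z-test as a short sketch (the function name is hypothetical). It also reports the two-sided P-value of about 0.099, which leads to the same decision at α = 0.05.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided large-sample z-test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # null standard error uses p0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value, abs(z) > z_crit


z, p_value, reject = one_proportion_z_test(30, 500, 0.08)
print(round(z, 3), round(p_value, 4), reject)
```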
4 b) Recently, many IT companies have been experimenting with work from home (WFH), allowing employees to work at home on their computers. Among other things, WFH is supposed to reduce the number of sick days taken. Suppose that at one firm, it is known that over the past few years employees have taken a mean of 5.4 sick days. This year, the firm introduces WFH. Management chooses a simple random sample of 80 employees to follow in detail, and, at the end of the year, these employees average 4.5 sick days with a standard deviation of 2.7 days. Let μ represent the mean number of sick days for all employees of the firm. 2+2+2
i. Find the P-value for testing H0 :μ ≥ 5.4 versus H1 :μ < 5.4.
ii. Do you believe it is plausible that the mean number of sick days is at least 5.4, or
are you convinced that it is less than 5.4? Explain your reasoning.
iii. Is the result statistically significant at the 5% level?

Solution:

Yes, the result is statistically significant at the 5% level.
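A sketch of the large-sample P-value computation, treating the sample standard deviation as σ because n = 80 is large. The test statistic is z = (4.5 − 5.4)/(2.7/√80) ≈ −2.98, and the P-value is the lower-tail area because H1 is μ < 5.4.

```python
import math
from statistics import NormalDist

xbar, mu0, s, n = 4.5, 5.4, 2.7, 80

z = (xbar - mu0) / (s / math.sqrt(n))   # about -2.98
p_value = NormalDist().cdf(z)           # lower-tail area, since H1: mu < 5.4
significant = p_value < 0.05
print(round(z, 2), round(p_value, 4), significant)
```

A P-value of about 0.0014 makes it very unlikely that the mean is still at least 5.4.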

c) For the given table of observed values, 2+2

i. Construct the corresponding table of expected values.
ii. If appropriate, perform the chi-square test for the null hypothesis that the row and
column outcomes are independent. If not appropriate, explain why.

Solution:
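The observed table is not reproduced here, so the sketch below uses a clearly hypothetical 2 × 3 table only to show the mechanics. Each expected count is (row total × column total)/grand total, and the statistic is Σ(O − E)²/E with (r − 1)(c − 1) degrees of freedom. It also applies the common rule of thumb that the test is appropriate only when every expected count is at least 5.

```python
def expected_counts(observed):
    """E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


# Hypothetical 2x3 table; NOT the exam's data.
observed = [[10, 20, 30],
            [20, 20, 20]]
expected = expected_counts(observed)
appropriate = all(e >= 5 for row in expected for e in row)   # rule of thumb
print(expected, appropriate, round(chi_square_statistic(observed), 3))
```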
d) A test is made of the hypotheses H0: μ ≤ 25 versus H1: μ > 25. For each of the following situations, determine whether the decision was correct, a type I error occurred, or a type II error occurred. 5x1
i. μ = 23, H0 is rejected.
ii. μ = 25, H0 is not rejected.
iii. μ = 29, H0 is not rejected.
iv. μ = 27, H0 is rejected.
v. μ = 20, H0 is not rejected.

Solution:

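i. μ = 23: H0 is true and rejected, so a type I error occurred.
ii. μ = 25: Correct decision. H0 is true and not rejected.
iii. μ = 29: H0 is false and not rejected, so a type II error occurred.
iv. μ = 27: Correct decision. H0 is false and rejected.
v. μ = 20: Correct decision. H0 is true and not rejected.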
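The decision rule behind these answers is compact: a type I error means rejecting a true H0, and a type II error means failing to reject a false H0. A sketch for this pair of hypotheses (the function name is hypothetical):

```python
def classify_decision(mu, rejected, mu0=25):
    """Classify a decision for H0: mu <= mu0 versus H1: mu > mu0."""
    h0_true = mu <= mu0
    if h0_true and rejected:
        return "Type I error"
    if not h0_true and not rejected:
        return "Type II error"
    return "Correct decision"


cases = [(23, True), (25, False), (29, False), (27, True), (20, False)]
for mu, rejected in cases:
    print(mu, rejected, classify_decision(mu, rejected))
```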

5 a) Find the power of the 5% level test of H0: μ ≤ 80 versus H1: μ > 80 for the mean yield of the new process under the alternative μ = 82, assuming n = 50 and σ = 5. 1+2+2

Solution:
The population standard deviation for the new process is σ = 5, so the standard error of the sample mean is $\sigma/\sqrt{n} = 5/\sqrt{50} = 0.707$. Since H1 is μ > 80, the rejection region is the upper 5% of the null distribution of $\bar{X}$. The critical point has a z-score of 1.645, so its value is 80 + (1.645)(0.707) = 81.16. We will reject H0 if $\bar{X} \ge 81.16$. This is the rejection region.

Under the alternative μ = 82, the z-score of the critical point 81.16 is z = (81.16 − 82)/0.707 = −1.19. The area to the right of z = −1.19 is 0.8830. This is the power.
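The two steps (find the critical point under H0, then find the probability of exceeding it under μ = 82) can be sketched with `statistics.NormalDist`. Because it uses unrounded z-values, the result is about 0.882 rather than the table-based 0.8830. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test of H0: mu <= mu0 at the alternative mu1."""
    se = sigma / math.sqrt(n)
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject H0 if xbar >= critical
    power = 1 - NormalDist(mu1, se).cdf(critical)           # P(reject | mu = mu1)
    return critical, power


critical, power = power_upper_tail(80, 82, 5, 50)
print(round(critical, 2), round(power, 3))
```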

b) State the assumptions for Errors in Linear Models. 2

Solution:
Assumptions for Errors in Linear Models:
In the simplest situation, the following assumptions are satisfied:
1. The errors ε1, …, εn are random and independent. In particular, the magnitude of any error εi does not influence the value of the next error εi+1.
2. The errors ε1, …, εn all have mean 0.
3. The errors ε1, …, εn all have the same variance, which we denote by σ².
4. The errors ε1, …, εn are normally distributed.
c) What is a confounding variable? How can we reduce the risk of confounding? 3

Solution:

❖ A confounding variable is a variable that influences both the independent variable and the dependent variable, causing a spurious correlation between them.
❖ It may interfere with your analysis and ruin your experiment by producing misleading results.
❖ Confounding variables can cause two major problems:
▪ Increase variance
▪ Introduce bias
❖ Confounding variables are like extra independent variables that have a hidden effect on your dependent variables.
❖ A confounding variable may be the actual cause of an observed correlation, so every study must take confounders into account and find ways of dealing with them.
❖ In controlled experiments, one way to avoid confounding is to choose the values of certain factors in such a way that there is no correlation between those factors.
d) The details pertaining to the number of hours spent by students preparing for the SDS final exam and the marks scored (on a scale of 0–100) are provided in the following table. Using these values, estimate the marks scored by a student who has spent 2.35 hours. 10

Solution:
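❖ We first need to obtain the least-squares line $y = \hat{\beta}_0 + \hat{\beta}_1 x$, whose coefficients are given by
▪ $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
▪ $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$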
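The exam's data table is not reproduced here, so this sketch applies the least-squares formulas to hypothetical (hours, marks) pairs that lie exactly on marks = 20 + 20 × hours, making the result easy to verify by hand. It then predicts the marks for 2.35 hours. With the real table, only the two data lists would change.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    return b0, b1


# Hypothetical data, NOT the exam's table.
hours = [1.0, 1.5, 2.0, 2.5, 3.0]
marks = [40, 50, 60, 70, 80]
b0, b1 = least_squares_line(hours, marks)
predicted = b0 + b1 * 2.35
print(b0, b1, round(predicted, 2))
```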
