Esa - QP - Ue19-20cs203 - SDS
Simple random sampling, as the name suggests, is an entirely random method of selecting the
sample.
● Here, each subject or unit in the population has an equal chance of being selected.
● The sampling frame should include the whole population.
● A table of random numbers or a lottery system is used to determine which units are to be
selected.
Simple random sampling is always an EPS (equal probability of selection) design, but not all
EPS designs are simple random sampling.
Systematic sampling
When to Use: When the project budget is tight and there is little time to complete it.
Key Thing: Compute the sampling interval k = N / n, where N is the population size and n is
the sample size, and select every kth member.
How: Assign a number to each population member.
Selection: Randomly select the first member from among the first k, then select every kth
member after it.
Advantages: Easy to select, sample is spread evenly over the entire reference population, cost
effective.
Disadvantages: The sample may be biased if the list has a periodic pattern, not every
combination of elements has an equal chance of selection, and all elements between two
consecutive kth elements are ignored.
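The selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and the population of 100 numbered units are assumptions, and N is taken to be a multiple of n so that k = N / n is a whole number.

```python
import random

def systematic_sample(population, n, rng=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = rng or random.Random()
    N = len(population)
    k = N // n                     # sampling interval k = N / n (integer part)
    start = rng.randrange(k)       # random start position within the first k units
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 101))   # hypothetical population numbered 1..100
sample = systematic_sample(population, n=10, rng=random.Random(0))
print(sample)                      # 10 units, each 10 positions apart
```

Only the starting point is random; once it is fixed, the rest of the sample is determined.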
SRN
Stratified sampling is the type of sampling in which the population is divided into 2 or more
groups called strata based on a shared characteristic or trait.
Then simple random samples are selected from each group.
The selected 2 or more samples are combined into one.
The strata or groups do not overlap, and together they represent the entire population.
The shared characteristics based on which the population is divided could be gender,
educational attainment, income, age etc.
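A minimal sketch of the stratified procedure, assuming proportional allocation (the same sampling fraction in every stratum), which the text above does not specify. The names `stratified_sample`, `stratum_of` and the 60/40 population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, frac, rng=None):
    """Split units into strata, draw a simple random sample from each, combine."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)          # non-overlapping groups
    sample = []
    for members in strata.values():
        size = max(1, round(frac * len(members)))
        sample.extend(rng.sample(members, size))  # SRS within the stratum
    return sample

# Hypothetical population: 60 units in stratum "F" and 40 in stratum "M".
people = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(people, stratum_of=lambda p: p[0], frac=0.1,
                           rng=random.Random(1))
counts = {g: sum(1 for p in sample if p[0] == g) for g in ("F", "M")}
print(counts)   # {'F': 6, 'M': 4}
```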
Cluster Sampling
When to Use: When the population is already broken up into groups (clusters).
Key Thing: Heterogeneous members in each group.
How: The population is divided into non-overlapping areas (clusters).
Each cluster is a miniature or microcosm of the population.
Selection: Clusters are selected randomly; then either all elements of the selected clusters
are included, or elements are chosen from them using simple random sampling.
Advantages: More convenient for geographically dispersed populations, Less travel cost,
Simplified administration of the survey.
Disadvantages: Statistically less efficient, sampling error is higher, and
problems are greater than with simple random sampling.
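The one-stage version of cluster sampling (every element of each selected cluster is included) might look like the sketch below. The cluster names and sizes are made up for illustration, and `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, m, rng=None):
    """Randomly pick m clusters and include every element of the chosen clusters."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), m)        # random selection of clusters
    return chosen, [u for c in chosen for u in clusters[c]]

# Hypothetical population: 8 villages (clusters) with 5 households each.
clusters = {f"village{j}": [f"village{j}-house{i}" for i in range(5)]
            for j in range(8)}
chosen, sample = cluster_sample(clusters, m=3, rng=random.Random(2))
print(chosen)
print(len(sample))   # 15 households: 3 clusters x 5 each
```

For the two-stage variant, a simple random sample would be drawn inside each chosen cluster instead of taking all of its elements.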
b) What is web scraping? With a neat diagram explain the components of a web scraper. 1+5
Solution:
Web scraping is like any other Extract-Transform-Load (ETL) process. Web scrapers crawl
websites, extract data from them, transform the data into a usable structured format, and load
it into a file or database for subsequent use.
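The extract, transform and load stages can be illustrated with a small offline sketch using only the Python standard library. The HTML snippet, the class `CellExtractor` and the helpers `transform` and `load` are assumptions made for this example; a real scraper would also need a crawl step that downloads pages, which is left out here.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded web page.
HTML = """
<table>
  <tr><td>Widget</td><td>$4.50</td></tr>
  <tr><td>Gadget</td><td>$12.00</td></tr>
</table>
"""

class CellExtractor(HTMLParser):
    """Extract: collect the text of every <td>, grouped by <tr>."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_td and self._row is not None:
            self._row.append(data.strip())

def transform(rows):
    """Transform: turn raw cell text into typed, structured records."""
    return [{"name": name, "price": float(price.lstrip("$"))}
            for name, price in rows]

def load(records, fh):
    """Load: write the records to a CSV file-like object."""
    writer = csv.DictWriter(fh, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(records)

parser = CellExtractor()
parser.feed(HTML)
records = transform(parser.rows)
out = io.StringIO()          # stands in for a file or database
load(records, out)
print(out.getvalue())
```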
A typical web scraper has the following components.
c) For the following data 1x6
30 75 79 80 80 105 126 138 149 179 179 191
223 232 232 236 240 242 245 247 254 274 384 470
Compute the mean, median, mode and the 5%, 10%, and 20% trimmed means
Solution:
The mean is found by averaging together all 24 numbers, which produces a value of
195.42.
The median is the average of the 12th and 13th numbers, which is (191 + 223)/2 =
207.00.
The data are trimodal: the modes are 80, 179, and 232, each of which occurs twice.
To compute the 5% trimmed mean, we must drop 5%
of the data from each end. This comes to (0.05)(24) = 1.2 observations.
We round 1.2 to 1, and trim one observation off each end.
The 5% trimmed mean is the average of the remaining 22 numbers:
(75 + 79 + ··· + 274 + 384)/22 = 190.45
To compute the 10% trimmed mean, round off (0.1)(24) = 2.4 to 2.
Drop 2 observations from each end, and then average the remaining 20:
(79 + 80 + ··· + 254 + 274)/20 = 186.55
To compute the 20% trimmed mean, round off (0.2)(24) = 4.8 to 5. Drop 5 observations
from each end, and then average the remaining 14:
(105 + 126 + ··· + 242 + 245)/14 = 194.07
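These figures can be checked with a short Python sketch. The helper `trimmed_mean` is written for this example and uses the same rule as the solution: round the number of observations to trim to the nearest whole number and drop that many from each end.

```python
from statistics import mean, median, multimode

data = [30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
        223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470]

def trimmed_mean(values, pct):
    """Drop round(pct% of n) observations from each end, then average the rest."""
    xs = sorted(values)
    cut = int(pct / 100 * len(xs) + 0.5)   # round to the nearest whole count
    return mean(xs[cut:len(xs) - cut])

print(round(mean(data), 2))               # 195.42
print(median(data))                       # 207.0
print(multimode(data))                    # [80, 179, 232]
for pct in (5, 10, 20):
    print(pct, round(trimmed_mean(data, pct), 2))   # 190.45, 186.55, 194.07
```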
2 a) The four sides of a rectangular frame consist of two pieces selected from a population 1+2+2
whose mean length is 30 cm with standard deviation 0.1 cm, and two pieces selected from
a population whose mean length is 45 cm with standard deviation 0.3 cm.
i. Find the mean perimeter of the rectangular frame.
ii. Assuming the four pieces are chosen independently, find the standard deviation of
the perimeter.
Solution:
Let X1 and X2 denote the lengths of the pieces chosen from the population with mean 30
and standard deviation 0.1, and let Y1 and Y2 denote the lengths of the pieces chosen from
the population with mean 45 and standard deviation 0.3.
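i. The perimeter is P = X1 + X2 + Y1 + Y2, so its mean is
µP = 30 + 30 + 45 + 45 = 150 cm.
ii. Because the four pieces are independent, the variances add:
σP = √(0.1² + 0.1² + 0.3² + 0.3²) = √0.20 ≈ 0.447 cm.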
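As a numerical check on the frame problem, the mean and standard deviation of the perimeter P = X1 + X2 + Y1 + Y2 can be computed directly, using the facts that means always add and that variances add for independent pieces. The variable names are chosen for this sketch only.

```python
from math import sqrt

means = [30, 30, 45, 45]        # means of X1, X2, Y1, Y2 in cm
sds = [0.1, 0.1, 0.3, 0.3]      # standard deviations of X1, X2, Y1, Y2 in cm

mean_perimeter = sum(means)                      # means add
sd_perimeter = sqrt(sum(s ** 2 for s in sds))    # variances add under independence

print(mean_perimeter)            # 150
print(round(sd_perimeter, 3))    # 0.447
```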
b) IC chips often contain surface imperfections. For a certain type of IC chip, 9% contain no 1+2+2
imperfections, 22% contain 1 imperfection, 26% contain 2 imperfections, 20% contain 3
imperfections, 12% contain 4 imperfections, and the remaining 11% contain 5
imperfections. Let Y represent the number of imperfections in a randomly chosen chip.
What are the possible values for Y? Is Y discrete or continuous? Find P(Y = y) for each
possible value y.
Solution
The possible values for Y are the integers 0, 1, 2, 3, 4, and 5. The random variable Y is discrete,
because it takes on only integer values. Nine percent of the outcomes in the sample space are
assigned the value 0. Therefore P(Y = 0) = 0.09. Similarly P(Y = 1) = 0.22, P(Y = 2) = 0.26, P(Y =
3) = 0.20, P(Y = 4) = 0.12, and P(Y = 5) = 0.11.
c) X is a continuous Random Variable with the probability density function as given below. 2+3
It is verified that µX = 50 and σX = 0.45. Compute the probability that X lies outside the
interval 49.1 to 50.9. How close is this probability to the Chebyshev's inequality bound?
Solution:
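The density itself is not reproduced here, so the exact probability cannot be recomputed from this text. The Chebyshev side of the comparison depends only on µX and σX, and can be sketched as follows: the interval endpoints lie k standard deviations from the mean, and Chebyshev's inequality gives P(|X − µX| ≥ kσX) ≤ 1/k².

```python
mu, sigma = 50, 0.45
lower, upper = 49.1, 50.9

k = (upper - mu) / sigma        # interval half-width in standard deviations
bound = 1 / k ** 2              # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2

print(round(k, 2))              # 2.0
print(round(bound, 2))          # 0.25
```

The exact probability computed from the density is then compared against this upper bound of 0.25.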
d) A company produces “20 ounce” jars of chilli sauce. The true amounts of sauce in the 2+3
jars of this brand follow a normal distribution with mean µ = 20.2 ounces and standard
deviation σ = 0.125 ounces. What proportion of the sauce jars contain between 20 and 20.3
ounces of sauce?
Solution:
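One way to work this out is to standardize both limits and take the difference of the normal cumulative probabilities: z = (20 − 20.2)/0.125 = −1.6 and z = (20.3 − 20.2)/0.125 = 0.8. The sketch below does the same computation with `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

jars = NormalDist(mu=20.2, sigma=0.125)

z_low = (20 - 20.2) / 0.125       # about -1.6
z_high = (20.3 - 20.2) / 0.125    # about 0.8
proportion = jars.cdf(20.3) - jars.cdf(20)

print(round(z_low, 2), round(z_high, 2))   # -1.6 0.8
print(round(proportion, 4))                # 0.7333
```

So roughly 73% of jars contain between 20 and 20.3 ounces.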
3 a) Let X1, . . . , Xn be a random sample from a population with the Poisson(λ) distribution. 5
Find the MLE of λ.
Solution:
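The standard derivation runs as follows. The log-likelihood is ln L(λ) = (Σ xi) ln λ − nλ − Σ ln(xi!). Setting its derivative (Σ xi)/λ − n equal to zero gives λ̂ = (Σ xi)/n, the sample mean, and the second derivative −(Σ xi)/λ² is negative, so this is a maximum. The sketch below checks the result numerically on a made-up sample by maximizing the log-likelihood over a grid; the data and the grid are assumptions for illustration.

```python
from math import lgamma, log

def poisson_loglik(lam, xs):
    """Log-likelihood of a Poisson(lam) sample; lgamma(x + 1) equals ln(x!)."""
    return sum(x * log(lam) - lam - lgamma(x + 1) for x in xs)

xs = [2, 0, 3, 1, 4, 2, 2]                       # hypothetical sample, mean 2.0
grid = [i / 1000 for i in range(1, 10001)]       # candidate lambdas 0.001..10.0
lam_hat = max(grid, key=lambda lam: poisson_loglik(lam, xs))

print(lam_hat, sum(xs) / len(xs))                # maximizer agrees with the sample mean
```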
b) Let X1 and X2 be independent, each with unknown mean μ and known variance 5
Solution:
c) A random sample of n = 50 boys showed a mean average daily intake of protein products 2+2
equal to 756 grams with a standard deviation of 35 grams.
i. Find a 95% confidence interval for the population average µ.
ii. Find a 99% confidence interval for µ, the population average daily intake
of protein products for boys.
Solution:
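i. 95% confidence interval:
x̄ ± 1.96 s/√n = 756 ± 1.96 (35/√50) = 756 ± 9.70,
or 746.30 < μ < 765.70 grams.
ii. 99% confidence interval:
x̄ ± 2.58 s/√n = 756 ± 2.58 (35/√50) = 756 ± 12.77,
or 743.23 < μ < 768.77 grams.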
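Both intervals follow the large-sample formula x̄ ± z·s/√n, with z = 1.96 for 95% and z = 2.58 for 99%. A short sketch of the arithmetic, with `z_interval` as a helper written for this example:

```python
from math import sqrt

def z_interval(xbar, s, n, z):
    """Large-sample confidence interval: xbar +/- z * s / sqrt(n)."""
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

lo95, hi95 = z_interval(756, 35, 50, 1.96)
lo99, hi99 = z_interval(756, 35, 50, 2.58)

print(round(lo95, 2), round(hi95, 2))   # 746.3 765.7
print(round(lo99, 2), round(hi99, 2))   # 743.23 768.77
```

The 99% interval is wider because more confidence requires a larger multiple of the standard error.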
Solution:
4 a) A marketing company claims that it receives 8% responses from its mailing. To test this 1+1+2+1
claim, a random sample of 500 were surveyed with 30 responses. Test at the α = .05
significance level.
Solution:
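Hypotheses: H0: p = .08 versus H1: p ≠ .08.
First, check the sample size: n p0 = (500)(.08) = 40 and n(1 − p0) = (500)(.92) = 460 are
both comfortably large, so the normal approximation is reasonable.
Determine the region of rejection:
α = .05
n = 500, p̂ = 30/500 = .06
Critical values: ± 1.96
Z = (p̂ − p0)/√(p0(1 − p0)/n) = (.06 − .08)/√(.08(1 − .08)/500) = −1.648
Since −1.96 < −1.648 < 1.96, Z does not fall in the rejection region, so H0 is not rejected
at the .05 level.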
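The test statistic for this one-sample proportion test can be reproduced with the sketch below. It assumes a two-sided alternative, which matches the critical values ± 1.96; the variable names are chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

n, responses, p0, alpha = 500, 30, 0.08, 0.05

p_hat = responses / n                              # sample proportion, 0.06
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)         # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # two-sided critical value
reject = abs(z) > z_crit

print(round(z, 3))        # -1.648
print(round(z_crit, 2))   # 1.96
print(reject)             # False
```

Because |z| is smaller than the critical value, the data do not contradict the claimed 8% response rate at the .05 level.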
Solution:
Solution:
d) A test is made of the hypotheses H0 :μ ≤ 25 versus H1 :μ > 25. For each of the following 5x1
situations, determine whether the decision was correct, a type I error
occurred, or a type II error occurred.
i. μ = 23, H0 is rejected. ii. μ = 25, H0 is not rejected.
iii. μ = 29, H0 is not rejected. iv. μ = 27, H0 is rejected.
v. μ = 20, H0 is not rejected
Solution:
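The rule is: a type I error occurs when H0 is true but rejected, and a type II error occurs when H0 is false but not rejected. Here H0 is true exactly when µ ≤ 25. The sketch below applies that rule to the five situations; `classify` is a helper written for this example.

```python
def classify(mu, rejected, mu0=25):
    """Classify a test decision for H0: mu <= mu0 versus H1: mu > mu0."""
    h0_true = mu <= mu0
    if rejected:
        return "type I error" if h0_true else "correct decision"
    return "correct decision" if h0_true else "type II error"

cases = [("i", 23, True), ("ii", 25, False), ("iii", 29, False),
         ("iv", 27, True), ("v", 20, False)]
results = {label: classify(mu, rejected) for label, mu, rejected in cases}
for label, outcome in results.items():
    print(label, outcome)
# i type I error
# ii correct decision
# iii type II error
# iv correct decision
# v correct decision
```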
5 a) Find the power of the 5% level test of H0: μ ≤ 80 versus H1: μ > 80 for the mean yield of 1+2+2
the new process under the alternative μ = 82, assuming n = 50 and σ = 5.
Solution:
Assume that the population standard deviation for the new process is σ = 5, so the standard
error of the sample mean X̄ is σ/√n = 5/√50 = 0.707. For a 5% level test, the upper 5% of
the null distribution is the rejection region. The critical point has a z-score of 1.645, so
its value is 80 + (1.645)(0.707) = 81.16. We will reject H0 if X̄ ≥ 81.16.
Under the alternative µ = 82, the z-score of the critical point 81.16 is
z = (81.16 − 82)/0.707 = −1.19. The area to the right of z = −1.19 is 0.8830.
This is the power.
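The same power calculation can be sketched in Python with `statistics.NormalDist`. Because this version does not round the z-score to two decimals before looking up the area, it gives roughly 0.88 rather than exactly the table value 0.8830.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt, sigma, n, alpha = 80, 82, 5, 50, 0.05

se = sigma / sqrt(n)                                # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha)            # about 1.645
critical_point = mu0 + z_crit * se                  # reject H0 if the mean is at least this
power = 1 - NormalDist(mu_alt, se).cdf(critical_point)

print(round(se, 3))               # 0.707
print(round(critical_point, 2))   # 81.16
print(round(power, 2))            # 0.88
```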
Solution:
Solution:
❖ We first need to obtain the least-squares line, which is given by,
▪
▪