Seminar Week 4 - With Solutions - Fullpage

Uploaded by Anika Jain


ETF1100 Business Statistics

Week 4
Understanding Statistical Uncertainty
Charanjit Kaur
Week 4: Samples and Sampling Distributions
Learning Outcomes:
• Revisiting the Normal Distribution
• Understanding the process and purpose of random sampling
• Identifying possible biases from different methods of sampling
• Understanding the law of large numbers & the central limit theorem (CLT)
• Using the sampling distribution of a statistic to express uncertainty
Probability Distribution: Normal Distribution
• The normal distribution is the most widely used distribution in statistics
• It is a symmetric, bell-shaped distribution
• It is fully described by two parameters: the mean and the standard deviation (Stdev)

Normal Distribution
• Notation: $X \sim N(\text{Mean}, \text{Stdev})$, i.e. $X \sim N(\mu, \sigma)$
• Skewness = 0
• Mean = Median = Mode
Calculating Normal Distribution Probabilities using Excel
=NORM.INV(probability, mean, standard deviation)
This calculates the value x such that the probability of obtaining a value LOWER than x equals the entered probability.
→ i.e. the percentile at the desired probability

Calculate the 10th percentile of this price distribution

$P(\text{Price} < \text{Price}^*) = 10\%$
=NORM.INV(probability, mean, standard deviation)
$\text{Price}^*$ =NORM.INV(0.1,1215,419) = 678.03
10% of houses are priced lower than $678.03k (prices are recorded in $000s).
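The same percentile can be reproduced outside Excel. A minimal Python sketch (Python 3.8+ assumed), where the standard-library `statistics.NormalDist.inv_cdf` plays the role of `NORM.INV`:

```python
from statistics import NormalDist

# House prices in $000s, modelled as Normal(mean=1215, stdev=419) as on the slide
price_dist = NormalDist(mu=1215, sigma=419)

# Python analogue of Excel's =NORM.INV(0.1, 1215, 419):
# the price below which 10% of the distribution lies
price_10th = price_dist.inv_cdf(0.10)
print(f"10th percentile: {price_10th:.2f} ($000s)")

# Check: the CDF evaluated at that price returns the probability we started from
print(f"P(Price < {price_10th:.2f}) = {price_dist.cdf(price_10th):.2f}")
```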
STANDARD Normal Distribution
• A special case: the STANDARD normal distribution, $Z = \frac{X - \mu}{\sigma}$

• Mean = 0 and Stdev = 1

• Used in statistics to assess statistical uncertainty (this week… and next week)

Standard Normal Distribution

• Notation: $Z \sim N(0, 1)$
• Skewness = 0
• Mean = Median = Mode
Normal Distribution & Standard normal distribution

$X \sim N(\text{Mean}, \text{Stdev})$ and $Z \sim N(0, 1)$

House Price Distribution
$\text{Price}_{(\$000s)} \sim N(\text{Mean} = 1215, \text{Stdev} = 419)$
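Standardising converts any normal probability into a standard normal one. A short sketch, using the slide's parameters and the 10th-percentile price of 678.03, showing that P(X < x) and P(Z < z) agree:

```python
from statistics import NormalDist

mu, sigma = 1215, 419   # house price parameters in $000s (from the slide)
price = 678.03          # the 10th percentile from NORM.INV(0.1, 1215, 419)

# Standardise: Z = (X - mu) / sigma
z = (price - mu) / sigma
print(f"z = {z:.4f}")

# The probability is the same on the X scale and on the Z scale
p_x = NormalDist(mu, sigma).cdf(price)
p_z = NormalDist(0, 1).cdf(z)
print(f"P(X < {price}) = {p_x:.4f}, P(Z < {z:.4f}) = {p_z:.4f}")
```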

Is the normal distribution always a good approximation for numerical data?

• It depends on the nature of the observed data distribution
• Not appropriate for skewed distributions
• It is used most often, in its STANDARD NORMAL form, in statistical analysis and hypothesis testing
Purpose of Random Sampling
Let’s first go back to “statistics”

Statistics: The study of the collection, organisation, analysis, and interpretation of data.

How is statistics used in decision making?

→ By drawing general conclusions about the population from a sample of data
Basic Concepts of Statistics: Population and Sample
Sample:
A subset of the population selected for analysis.
• Often chosen randomly
• Preferably representative of the population
Statistic (Estimate): Computable summaries of the sample
• Sample mean ($\bar{x}$) and
• Sample standard deviation ($s$)

Population:
All members of a group about which you want to draw a conclusion.
Parameter: A measurable characteristic of a population
• population mean (𝜇)
• population standard deviation (𝜎)
Characteristics of Random Sample
Representative
• sample is randomly chosen from the whole population.
• characteristics of people who respond to a survey do not differ significantly from those
in the same sample who do not respond.
• members of the sample exhibit features consistent with the general population.
• all members of the population are equally likely to be chosen for the sample.
• sampling is not done based on voluntary participation.
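Drawing a simple random sample, where every member of the population is equally likely to be chosen, can be sketched with Python's `random.sample`. The sampling frame of 10,000 customer IDs and the sample size of 200 are hypothetical, chosen only for illustration:

```python
import random

# Hypothetical sampling frame: 10,000 customer IDs (illustration only)
population = list(range(1, 10_001))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Simple random sample without replacement: every ID has the same chance of
# selection, and the random number generator (not volunteers) decides who is in
sample = rng.sample(population, k=200)
print(sample[:10])
```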
Representative Sample
Representative sample is determined by:
1) Data collection process (sampling design)
2) Survey design → wording design of the questions/form.
3) Sample size → with a sufficiently large sample, the sample statistic tends to be closer to the population parameter
Biased sample:
• Non-representative statistics
• Invalid inference → invalid conclusions. This can lead to catastrophic outcomes if used in business decisions

Potential biases:
• Selection bias – members of the population have unequal chances of being chosen
• Non-response bias – the data collection process leads to systematic non-response from certain groups
Identifying potential sampling bias: Examples
A marketing study aimed at analysing metro train users’ satisfaction is being
conducted for the Melbourne metropolitan zone.
• Sample 1: Data is collected by verbal surveys at all train stations in the metropolitan zone. The surveyor randomly chooses passengers across all operating times over a one-week period.
→Minimal selection bias

• Sample 2: Data is collected by a mandatory survey of all train passengers arriving at and exiting Caulfield station between 8-9 am on Friday morning.
→ Selection bias, both by geographical location and respondent demographics
Identifying potential sampling bias: Example
Political party X is conducting a poll of voters' opinions and voting tendencies for the upcoming election.
Data Sample: obtained by phone survey, with calls made to randomly chosen registered landline numbers between 9 am and 5 pm, Monday to Friday.

Is this data sample a random sample?

Statistics is UNCERTAIN

Statistical analysis depends on random sampling of data

→ Different samples can generate slightly different estimates

Statistics is about quantifying the uncertainty of the sample estimate

If data is to influence decisions, decision-makers need to be able to understand the extent of this uncertainty
Expectation
$\bar{x}$ is an estimate of $E(X) = \mu$

• Expectation of the random variable X
• This represents the “theoretical” population mean
• In real-world problems this is our “unknown”
• In a simulation, it can be calculated from probability theory
Expectation
$E(X) = \mu$
• E.g. A simple trial of rolling a fair six-sided die and recording the face value
• Six possible values: {1,2,3,4,5,6}
• Each outcome has equal probability of 1/6
• Random variable X is the face value of each roll
• What is the expectation of X? From probability theory:
$E(X) = \frac{1}{6}\times 1 + \frac{1}{6}\times 2 + \frac{1}{6}\times 3 + \frac{1}{6}\times 4 + \frac{1}{6}\times 5 + \frac{1}{6}\times 6 = 3.5$

We can estimate this expectation by conducting experiments and collecting data.
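The exact expectation and a data-based estimate of it can be compared directly. A minimal sketch, assuming a fair die simulated with Python's `random` module (the seed and the 10,000 rolls are arbitrary choices):

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Exact expectation from probability theory: sum of (probability x value)
expected = sum(Fraction(1, 6) * x for x in faces)
print(expected, float(expected))   # 7/2 3.5

# Estimate the same quantity from simulated data
rng = random.Random(1)
rolls = [rng.choice(faces) for _ in range(10_000)]
estimate = sum(rolls) / len(rolls)
print(f"sample mean of {len(rolls)} rolls: {estimate:.3f}")
```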

Does sample size matter?

Key takeaways:
• Larger samples → the estimate gets closer and closer to the truth (the law of large numbers)
• A new sample gives a different path → statistical variability
• When the sample size is small, the estimate is more variable
• ALL of this assumes a truly random sample!
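The takeaways about sample size can be checked by repeating the dice experiment many times. In this sketch (the 500 repetitions and the sample sizes 10 and 1,000 are illustrative assumptions), the spread of the estimates shrinks as n grows:

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed for reproducibility

def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die."""
    return statistics.fmean(rng.randint(1, 6) for _ in range(n))

# Repeat the whole experiment 500 times for a small and a large sample size
spread = {}
for n in (10, 1000):
    means = [sample_mean(n) for _ in range(500)]
    spread[n] = statistics.stdev(means)
    print(f"n={n:4d}: estimates from {min(means):.2f} to {max(means):.2f}, "
          f"stdev of estimates = {spread[n]:.3f}")
```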
Sampling Distribution

Up to this point, you might think: just collect more high-quality data!
But this is not always possible:
• Limited access to individuals/experiments
• Limited monetary resources
• Some business problems are inherently small-data problems
The sampling distribution is a statistical tool that helps quantify the
uncertainty of the estimate for a given data set.
The basics of Sampling Distribution
• Sample statistic is only an estimate of the truth
• Since the sample may vary, any sample statistic is not exact; it carries variation/error around the true value.
• The smaller the error, the greater the accuracy.
• We need to take this variability into account if we want to analyse the statistic.
• Suppose we take data samples repeatedly and compute the sample mean of each one. The distribution of these sample means is the sampling distribution of the sample mean, and it portrays the variability of the estimate.
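When every possible sample can be listed, the sampling distribution of the sample mean can be computed exactly instead of simulated. A small sketch, assuming a fair die and samples of size n = 2 drawn with replacement, so there are 6 × 6 = 36 equally likely samples:

```python
import itertools
import math
import statistics

faces = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

# Every possible ordered sample of size n drawn with replacement (6**n of them)
all_samples = list(itertools.product(faces, repeat=n))

# The sampling distribution of the sample mean: one x-bar per possible sample
xbars = [statistics.fmean(s) for s in all_samples]

print(len(xbars))               # 36
print(statistics.fmean(xbars))  # 3.5, centred on the true mean

# Its spread over all samples equals sigma / sqrt(n)
sigma = statistics.pstdev(faces)
print(statistics.pstdev(xbars), sigma / math.sqrt(n))
```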
Sampling Distribution of the Sample Mean

Statistical theory gives us a result (Central Limit Theorem):

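• If the sample size $n$ is large, $\bar{x} \sim N\left(\text{Mean}, \frac{\text{Std deviation}}{\sqrt{\text{sample size}}}\right)$, with standard error $SE(\bar{x}) = \frac{s}{\sqrt{n}}$
• This approximation holds regardless of the shape of the population distribution (provided its standard deviation is finite)
• Magic? NO! Just the beauty of statistics.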

https://ebsmonash.shinyapps.io/cltdemo/
The following are based on all possible samples of size n.

Sampling Distribution of the Sample Mean

Key takeaways:
• The sample mean $\bar{x}$ is centred around the true mean
• Its uncertainty is measured by the standard error
$SE(\bar{x}) = \frac{s}{\sqrt{n}}$
• For any sample size n > 1, the standard error is smaller than the standard deviation
• Larger sample size → $\bar{x}$ is a more precise estimate of the true population mean
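The Central Limit Theorem can be checked by simulation even when the population is far from normal. This sketch assumes a hypothetical exponential population (mean 1, standard deviation 1, strongly right-skewed), samples of size n = 50 and 5,000 repetitions:

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical right-skewed population: exponential with mean 1 and stdev 1
mu, sigma = 1.0, 1.0
n, reps = 50, 5_000

means = []
for _ in range(reps):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))

se_theory = sigma / math.sqrt(n)
print(f"mean of x-bars  : {statistics.fmean(means):.3f} (true mean {mu})")
print(f"stdev of x-bars : {statistics.stdev(means):.4f}")
print(f"sigma / sqrt(n) : {se_theory:.4f}")

# If x-bar is approximately normal, about 95% of x-bars lie within mu +/- 1.96 SE
share = sum(abs(m - mu) < 1.96 * se_theory for m in means) / reps
print(f"share within 1.96 SE: {share:.3f}")

# In practice sigma is unknown, so a single sample gives the estimate s / sqrt(n)
se_hat = statistics.stdev(sample) / math.sqrt(n)
print(f"SE estimated from the last sample: {se_hat:.4f}")
```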
Estimation of the true population mean
There are two types of estimates:
1) Point Estimate
A single value that estimates a population parameter

2) Interval Estimate
A range of values within which the population parameter probably lies. This range is
known as a confidence interval estimate

Point estimates do not indicate uncertainty (sampling error).

Better approach: give a range of values within which the unknown population parameter is
thought likely to lie. We refer to this range of plausible values as a confidence interval

Confidence interval = plausible range of the unknown population mean at a chosen level of confidence
Confidence Interval: Basic Format
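$\bar{X} - Z\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z\frac{s}{\sqrt{n}}$ OR $\bar{x} \pm Z\frac{s}{\sqrt{n}}$

• Point Estimate: $\bar{X}$, the estimate
• Z: a value that embodies the desired level of confidence
• Standard error: $\frac{\text{Standard deviation}}{\sqrt{n}}$, a measure of the error associated with the point estimate
• Margin of error: $Z\frac{s}{\sqrt{n}}$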
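The basic format translates directly into a small function. A sketch under the assumption of a large sample (so Z is appropriate), with hypothetical simulated data standing in for a real data set; `mean_ci` is an illustrative name, not a library function:

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_ci(data, confidence=0.95):
    """Large-sample CI for the population mean: x-bar +/- Z * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)                 # point estimate
    se = statistics.stdev(data) / math.sqrt(n)    # standard error s / sqrt(n)
    alpha = 1 - confidence                        # level of significance
    z = NormalDist().inv_cdf(1 - alpha / 2)       # like =NORM.S.INV(1 - alpha/2)
    margin = z * se                               # margin of error
    return xbar - margin, xbar + margin

# Hypothetical data: 200 simulated observations (illustration only)
rng = random.Random(3)
data = [rng.gauss(50, 10) for _ in range(200)]
lower, upper = mean_ci(data)
print(f"95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```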
Width of the Confidence Interval

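Lower Confidence Limit (lower boundary): $\bar{x} - Z\frac{s}{\sqrt{n}}$
Upper Confidence Limit (upper boundary): $\bar{x} + Z\frac{s}{\sqrt{n}}$
Width of the confidence interval: the distance between the two limits, which enclose a central probability of $1-\alpha$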
The width of a confidence interval indicates the precision of the estimate.

Note:
• (1-𝛼) is referred to as the level of confidence
• 𝛼 is referred to as the level of significance. It is the total probability left in the two “tail ends” outside the confidence interval
E.g. for a 95% confidence interval, 𝛼 = 1 − 0.95 = 0.05
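The link between the level of confidence, α and the critical value can be seen by computing Z for a few confidence levels. A minimal sketch using `statistics.NormalDist`, whose `inv_cdf` on the standard normal corresponds to Excel's `NORM.S.INV`:

```python
from statistics import NormalDist

std_normal = NormalDist()   # Z ~ N(0, 1)
crit = {}

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence                    # level of significance
    z = std_normal.inv_cdf(1 - alpha / 2)     # leaves alpha/2 in each tail
    crit[confidence] = z
    # The area between -z and +z should equal the level of confidence
    central = std_normal.cdf(z) - std_normal.cdf(-z)
    print(f"{confidence:.0%} confidence: alpha = {alpha:.2f}, "
          f"z = {z:.4f}, central area = {central:.2f}")
```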
Factors that affect the width of a Confidence Interval Estimate
• If the standard deviation (𝜎) ↑, the spread of the distribution is larger: standard error $\frac{\text{std deviation}}{\sqrt{n}}$ ↑, width ↑, estimate is less precise
• If the sample size (n) ↑, standard error $\frac{\text{std deviation}}{\sqrt{n}}$ ↓, width ↓, estimate is more precise
The bigger the sample, the more information we have, which increases the precision of the interval estimate of the population mean and narrows the interval.
• If the level of confidence (1-α) ↑, the critical value Z ↑, width ↑, the estimate is less precise
The more confident we want to be, the more values we need to include in our confidence interval, and the wider the interval.

$\bar{x} \pm Z\frac{s}{\sqrt{n}}$
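Each factor can be varied one at a time in the width formula 2 × Z × s/√n. The baseline values (s = 400, n = 100) are hypothetical, and `ci_width` is an illustrative helper:

```python
import math
from statistics import NormalDist

def ci_width(s, n, confidence=0.95):
    """Width of x-bar +/- Z * s / sqrt(n), which is 2 * Z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / math.sqrt(n)

base = ci_width(s=400, n=100)
print(f"baseline (s=400, n=100, 95%): {base:.1f}")
print(f"larger s (s=800)            : {ci_width(800, 100):.1f}")        # wider
print(f"larger n (n=400)            : {ci_width(400, 400):.1f}")        # narrower
print(f"higher confidence (99%)     : {ci_width(400, 100, 0.99):.1f}")  # wider
```

Quadrupling n only halves the width, because n enters the formula through its square root.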
Confidence Interval in Repeated Sampling Context

We repeatedly select a sample of 𝑛 observations and, for each sample, construct a 95% confidence interval for the population mean.

We would expect about 95% of these intervals to contain the population mean, while about 5% of the intervals would not.

(Source: Lind, Marchal and Wathen, Statistical Techniques in Business Economics, 2021, 18th edition)
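The repeated-sampling interpretation can be simulated. This sketch assumes a hypothetical normal population (μ = 100, σ = 15), samples of n = 40 and 2,000 repetitions; because it uses s and Z rather than the t distribution, the observed coverage is typically slightly below 95%:

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(11)
mu, sigma = 100.0, 15.0        # hypothetical population parameters
n, reps = 40, 2_000
z = NormalDist().inv_cdf(0.975)

hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    if xbar - z * se < mu < xbar + z * se:   # does this interval capture mu?
        hits += 1

coverage = hits / reps
print(f"{coverage:.1%} of {reps} intervals contain mu")
```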
Confidence Interval – Average house prices
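Calculate the 95% confidence interval of the population mean house price
• Standard error: $SE(\bar{x}) = \frac{s}{\sqrt{n}}$ (already given here), from the standard deviation estimate $s$ and the sample size $n$
• 95% C.I.: $\bar{x} \pm z_{1-\frac{\alpha}{2}}\frac{s}{\sqrt{n}}$
• With $\alpha = 0.05$: $z_{1-\frac{\alpha}{2}}$ =NORM.S.INV(1 − 0.05/2)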
[1170.195, 1260.729]
We are 95% confident that the true average house price in this Melbourne
suburb is between $1,170,195 and $1,260,729.
Note: Remember that price is recorded in thousands of dollars.
Confidence Interval – Average house prices
Houses in the school zone are more expensive, on average, than houses outside the school zone.

Let’s use the concept of confidence intervals to validate/invalidate this claim


Is the claim TRUE or FALSE?

Calculation of the 95% confidence interval ($000s)
                   School Zone    Outside School Zone
alpha              0.05           0.05
xbar               1258.7863      1046.0448
SE(xbar)           26.0798        44.1962
z_(1-alpha/2)      1.959963985    1.959963985
Lower bound        1207.670856    959.4218151
Upper bound        1309.901663    1132.667737
Both the lower and the upper bound for houses within the school zone are higher, and the two intervals do not overlap. This indicates a clear difference in average prices, so the claim is supported.
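The table can be reproduced from its x̄ and SE(x̄) rows. A short sketch; because the table's x̄ and SE values are rounded, the bounds may differ from the table in the fourth decimal place:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.959963985 in the table

# x-bar and SE(x-bar) as reported in the table (prices in $000s)
groups = {
    "School Zone": (1258.7863, 26.0798),
    "Outside School Zone": (1046.0448, 44.1962),
}

bounds = {}
for name, (xbar, se) in groups.items():
    bounds[name] = (xbar - z * se, xbar + z * se)
    print(f"{name:20s}: [{bounds[name][0]:.2f}, {bounds[name][1]:.2f}]")

# Non-overlap check: the lowest plausible mean inside the zone exceeds
# the highest plausible mean outside it
inside_lo = bounds["School Zone"][0]
outside_hi = bounds["Outside School Zone"][1]
print("intervals overlap:", inside_lo <= outside_hi)
```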
Sampling Distribution of Proportion
Statistical theory ALSO gives us a result (Central Limit Theorem):
Here, $\pi$ = unknown population proportion
• Proportions are estimated by $p = \frac{X}{n} = \frac{\text{number of items with the characteristic of interest}}{\text{sample size}}$
• If the sample size $n$ is large, $p \sim N(\pi, SE(p))$ where $SE(p) = \sqrt{\frac{p(1-p)}{n}}$
• Lower & upper bounds of a $(1-\alpha)$ confidence interval for the population proportion:
$100(1-\alpha)\%$ C.I. $= p \pm z_{1-\frac{\alpha}{2}}\, SE(p)$
Confidence Interval – Marketing Survey
More than half of our potential market have tried our frozen food product

Let’s use the concept of confidence intervals to validate/invalidate this claim

Contingency table from Week 3 Tutorial
Count of Person ID by Gender, as a proportion of all respondents (N = 856)
Have tried     Female     Male       Grand Total
No             0.21262    0.20911    0.42173
Yes            0.25234    0.32593    0.57827   ← p
Grand Total    0.46495    0.53505    1.00000
95% confidence interval
alpha            0.05
p_hat            0.57827
SE(p_hat)        0.016878955
z_(1-alpha/2)    1.959963985
Lower bound      0.545188884
Upper bound      0.611353172

We can be 95% confident that the population proportion of the potential market who have tried our product is between 54.52% and 61.14%.
This entire range lies above one half, which supports the claim.
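The survey interval can be recomputed from p̂ and N. A sketch using the rounded p̂ = 0.57827, so the last digits may differ slightly from the table:

```python
import math
from statistics import NormalDist

p_hat = 0.57827   # share of the N = 856 respondents who have tried the product
n = 856
alpha = 0.05

se = math.sqrt(p_hat * (1 - p_hat) / n)    # SE(p) = sqrt(p(1 - p) / n)
z = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = p_hat - z * se, p_hat + z * se

print(f"SE = {se:.6f}")
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
print("entire interval above 0.5:", lower > 0.5)
```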
Sampling Distribution – SPECIAL CASE
SPECIAL CASE: what happens when the sample size 𝒏 is small?
• If you are confident that your data come from a normally distributed population:
$\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim \text{Student-}t(df = n - 1)$
• Confidence intervals can be constructed by
95% C.I.: $\bar{x} \pm t_{1-\frac{\alpha}{2},\,df}\frac{s}{\sqrt{n}}$
• Here, $t_{1-\frac{\alpha}{2},\,df}$ is calculated by “=T.INV(1-alpha/2,df)”
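Python's standard library has no Student-t quantile function, so this sketch approximates `T.INV` by integrating the t density numerically (Simpson's rule) and inverting it by bisection. In practice a statistics library would normally be used instead; `t_pdf`, `t_cdf` and `t_inv` are illustrative helpers, and the ten data values are hypothetical:

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T < x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(p, df, hi=100.0):
    """Approximate analogue of Excel's =T.INV(p, df) for 0.5 <= p < 1, by bisection."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical small sample (n = 10), assumed to come from a normal population
data = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5, 12.4, 10.8]
n = len(data)
xbar = statistics.fmean(data)
s = statistics.stdev(data)
alpha = 0.05
t_crit = t_inv(1 - alpha / 2, df=n - 1)   # plays the role of =T.INV(1-alpha/2, df)
margin = t_crit * s / math.sqrt(n)
print(f"t = {t_crit:.4f}; 95% CI: [{xbar - margin:.3f}, {xbar + margin:.3f}]")
```

With df = 9 the critical value is about 2.262, noticeably larger than Z = 1.96, so small-sample intervals are wider.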
Samples and Sampling Distributions

❖ Representative Sample
▪ Sample design
▪ Survey design
▪ Sample size
❖ Sampling distribution of the sample statistic
▪ General concepts
▪ Relationship with population distribution and sample size
▪ Confidence interval calculation and interpretations
