Week 9+10+11

Uploaded by

lpthao.work
STATISTICS

IN ECONOMICS AND BUSINESS

Nguyen Huyen Trang


Faculty of Statistics - National Economics University
trangtk@neu.edu.vn
LECTURE 7: INFERENTIAL STATISTICS

Inferential Statistics
• Estimation: approximating the actual values of population parameters
• Hypothesis Testing: confirming or disproving claims about the population parameters

[Diagram: Population (population parameters) → Sampling → Random Sample (sample statistics)]
OUTLINE

Probability
• Probability of discrete variable
• Probability of continuous variable
• Normal distribution

Estimation
• Point Estimation
• Interval Estimation

Hypothesis Testing
• Hypothesis testing for a population mean
• Hypothesis testing for a population proportion
TYPE OF STATISTICS
PROBABILITY

• Probability is a quantitative measure of uncertainty: the chance that an uncertain event will occur.
• Probability: Subjective and Objective
• Probability of event A is denoted by P(A)
➢ 0 ≤ P(event) ≤ 1
➢ P(certain) = 1
➢ P(impossible) = 0
• P(A) > P(B): A is more likely to occur than B
PROBABILITY

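Probability scale: Impossible (0, 0%) · Unlikely · Equal Chances (0.5 = ½, 50%) · Likely · Certain (1, 100%)

• When a meteorologist states that the chance of rain is 50%, rain and no rain have equal chances.
• If the chance of rain rises to 80%, rain is likely.
• If the chance drops to 20%, rain is unlikely.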
CLASSICAL DEFINITION

• Total number of basic outcomes: $N$
• Number of outcomes favorable to event A: $N_A$
• Probability of event A:
$P(A) = \frac{N_A}{N}$

• Probability is a population proportion


RANDOM VARIABLE

• A random variable $X$ is a variable whose value is determined by chance
• Discrete variable: $X = (x_1, x_2, \ldots, x_n)$
➢ Number of items: $X = (0, 1, 2, \ldots)$
➢ Score of test: $X = (0, 1, 2, \ldots, 100)$
• Continuous variable: $X = (x_{min}; x_{max})$
➢ Time
➢ Temperature
➢ Length, Weight
PROBABILITY OF DISCRETE VARIABLE

▪ Finite number of outcomes: X = (x1, x2, ..., xn)
➢ Roll a die once
Let X be the number showing on top → X could be 1, 2, ..., 6
➢ Toss a coin twice
Let X be the number of heads → X could be 0, 1, 2
▪ Denote: P(X = xi) = pi

Value        x1   x2   ...   xn   Sum
Probability  p1   p2   ...   pn   1.0
PROBABILITY OF DISCRETE VARIABLE

Toss a coin twice. Let X be the number of heads.

4 possible outcomes: TT, TH, HT, HH

Probability Distribution
x Value   Probability
0         1/4 = .25
1         2/4 = .50
2         1/4 = .25

[Bar chart: Probability (.25, .50, .25) against x = 0, 1, 2]
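The table above can be reproduced by listing every equally likely outcome and applying the classical definition P(A) = N_A / N. This is a minimal sketch assuming a fair coin; the function name `heads_distribution` is hypothetical and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses):
    """Return {x: P(X = x)}, where X is the number of heads in n_tosses fair tosses."""
    # Every sequence of H/T is one basic outcome; all are equally likely.
    outcomes = list(product("HT", repeat=n_tosses))
    # N_A for each event "X = x": how many outcomes contain exactly x heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}


distribution = heads_distribution(2)
for x, p in distribution.items():
    print(x, p, float(p))
```

The probabilities are kept as exact fractions so that their sum can be checked to equal 1.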
DISCRETE PROBABILITY DISTRIBUTION

The probability distribution for a random variable describes how probabilities are distributed over the values of the random variable.

Discrete Probability Distributions:
• Binomial
• Hypergeometric
• Poisson
• ...
DISCRETE PROBABILITY DISTRIBUTION
PROBABILITY OF CONTINUOUS VARIABLE

Outcomes vary along a continuous scale.

Continuous Probability Distributions:
• Uniform
• Normal
• Exponential
• ...
PROBABILITY OF CONTINUOUS VARIABLE

Central Limit Theorem

As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of the shape of the population.

[Figure: sampling distribution of the sample mean]
PROBABILITY OF CONTINUOUS VARIABLE
NORMAL DISTRIBUTION

▪ Described by Carl Friedrich Gauss (1777-1855) in 1809
▪ The most important probability distribution in statistics and an important tool in analysis
NORMAL DISTRIBUTION

$X \sim N(\mu, \sigma^2)$
➢ Bell-shaped
➢ Symmetrical: Mean = Median = Mode
➢ Never touches the x-axis
➢ Total area under curve is 1.00
➢ Probability of any single value = 0
NORMAL DISTRIBUTION

By varying the parameters μ and σ, we obtain different normal distributions.
NORMAL DISTRIBUTION

Changing μ shifts the distribution left or right.
Changing σ increases or decreases the spread.

[Figure: normal density f(x) against x, centered at μ with spread σ]
STANDARDIZING

Any normally distributed variable can be represented by the "standard" normal distribution.
Standardizing is the process of converting any Normal random variable to a Standard Normal random variable.
The standard normal distribution is:
• A special form of the normal distribution or bell curve.
• Mean = 0
• Standard deviation = 1 in all cases
STANDARDIZED NORMAL VARIABLE
• $X \sim N(\mu, \sigma^2)$
• $Z = \frac{X - \mu}{\sigma}$
• $Z \sim N(0, 1)$
• $P(Z < 1) =$
• $P(Z < 1.15) =$
• $P(Z > 3) =$
• $P(-1 < Z < 1.3) =$

[Figure: standard normal density curve, Z from -4 to 4, density from 0 to 0.5]
Z TABLE

P(Z < 0.72) = 0.7642
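Where a printed Z table is not at hand, the same cumulative probabilities can be computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal variable

p_table = Z.cdf(0.72)                # table lookup quoted above, about 0.7642
p_less_1 = Z.cdf(1)                  # P(Z < 1)
p_less_115 = Z.cdf(1.15)             # P(Z < 1.15)
p_greater_3 = 1 - Z.cdf(3)           # P(Z > 3): complement of the left tail
p_between = Z.cdf(1.3) - Z.cdf(-1)   # P(-1 < Z < 1.3): difference of two left tails

for name, value in [("P(Z<0.72)", p_table), ("P(Z<1)", p_less_1),
                    ("P(Z<1.15)", p_less_115), ("P(Z>3)", p_greater_3),
                    ("P(-1<Z<1.3)", p_between)]:
    print(name, round(value, 4))
```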
THREE SIGMA RULE
THE STANDARDIZED NORMAL TABLE

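To find P(a < X < b) when X is normally distributed:

1. Standardize X-values to Z-values: $Z = \frac{X - \mu}{\sigma}$

2. Use the Cumulative Normal Table:

$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = F\left(\frac{b - \mu}{\sigma}\right) - F\left(\frac{a - \mu}{\sigma}\right)$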
EXAMPLE

$X \sim N(100, 16)$
• $P(X < 104) =$
• $P(X > 92) =$
• $P(94 < X < 102) =$
• Probability that X differs from the mean by no more than one standard deviation =
• $P(X > a) = 0.2 \Rightarrow a = ?$
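One way to check answers to this example is to let the standard library do the standardizing. This is a sketch assuming that N(100, 16) lists the variance, so σ = 4; the variable names are hypothetical.

```python
from statistics import NormalDist

# N(100, 16) is read as mean 100 and variance 16, so sigma = sqrt(16) = 4.
X = NormalDist(mu=100, sigma=16 ** 0.5)

p_below_104 = X.cdf(104)                    # z = (104 - 100) / 4 = 1
p_above_92 = 1 - X.cdf(92)                  # z = (92 - 100) / 4 = -2
p_between = X.cdf(102) - X.cdf(94)          # F(0.5) - F(-1.5)
p_within_one_sd = X.cdf(104) - X.cdf(96)    # |X - mu| <= sigma
a = X.inv_cdf(1 - 0.2)                      # P(X > a) = 0.2 means P(X < a) = 0.8

print(round(p_below_104, 4), round(p_above_92, 4), round(p_between, 4))
print(round(p_within_one_sd, 4), round(a, 2))
```

The last line inverts the cumulative distribution, which corresponds to reading the Z table "backwards" from a probability to a z-value.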
CENTRAL LIMIT THEOREM

• As n→∞, the distribution of the sample mean becomes Normal, with center μ and standard deviation $\sigma/\sqrt{n}$.
• This happens regardless of the shape of the original population.
• i.e. $\bar{X}$ approximately follows a Normal distribution with $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
CENTRAL LIMIT THEOREM
WHAT SIZE N?

• If the distribution of X (the population) is normal, then for all n the sample mean will follow a normal distribution.
• If the distribution of X is very far from normal, then we will need a large n to see the normality of the distribution of the sample mean.
• In all cases, as n gets larger, the distribution of the mean gets closer to normal.
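A small simulation can make this visible. The sketch below assumes an Exponential(1) population (strongly skewed, with μ = 1 and σ = 1), which is an illustrative choice rather than anything from the lecture, and compares the spread of simulated sample means with σ/√n. The helper name `simulate_sample_means` is hypothetical.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(n, repetitions, rng):
    """Draw `repetitions` samples of size n from Exponential(1); return their means."""
    return [
        fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]


rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0     # mean and standard deviation of Exponential(1)

results = {}
for n in (2, 30, 200):
    means = simulate_sample_means(n, 5000, rng)
    # Center and spread of the simulated sampling distribution of the mean.
    results[n] = (fmean(means), pstdev(means))
    print(n, round(results[n][0], 3), round(results[n][1], 3),
          "theory:", round(sigma / n ** 0.5, 3))
```

For each n the simulated standard deviation should sit close to σ/√n, and a histogram of `means` would look more bell-shaped as n grows.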
HOW DOES THIS HELP?

• This means that if we have a large enough sample, we can always find probabilities concerning the sample mean, since it will have an approximately normal distribution no matter what the original distribution is.
CENTRAL LIMIT THEOREM
EXERCISE 1

The average height of Vietnamese women is 1.6m, with a standard deviation of 0.2m. If I choose 25 women at random, what is the probability that their average height is less than 1.53m?
EXERCISE 2

Suppose that annual percentage salary increases for the chief executive officers of all mid-size corporations are normally distributed with mean 12.2% and standard deviation 3.6%. A random sample of nine observations from this population of percentage salary increases is taken. What is the probability that the sample mean will be less than 10%?
EXERCISE 3

A manufacturer claims that the life of its spark plugs is normally distributed with mean 36,000 miles and standard deviation 4,000 miles. For a random sample of sixteen of these plugs, the average life was found to be 34,500 miles. If the manufacturer’s claim is correct, what would be the probability of finding a sample mean this small or smaller?
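All three exercises use the same calculation: treat the sample mean as N(μ, σ²/n) and find a left-tail probability. The following is a sketch of that calculation, not an official solution; for Exercise 1 it assumes the sample mean is approximately normal, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(threshold, mu, sigma, n):
    """P(sample mean < threshold), assuming the sample mean ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return NormalDist(mu, standard_error).cdf(threshold)


p_heights = prob_sample_mean_below(1.53, mu=1.6, sigma=0.2, n=25)        # z = -1.75
p_salary = prob_sample_mean_below(10, mu=12.2, sigma=3.6, n=9)           # z is about -1.83
p_plugs = prob_sample_mean_below(34_500, mu=36_000, sigma=4_000, n=16)   # z = -1.5

print(round(p_heights, 4), round(p_salary, 4), round(p_plugs, 4))
```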
ESTIMATION

The aim of estimation is to determine the approximate value of a population parameter using statistics calculated from a sample drawn from that population.
▪ As an example, we estimate the mean of a population using the mean of a sample drawn from that population. That is, the sample mean is an estimator of the population mean.
▪ The actual statistic we calculate from the sample is called an estimate of the population parameter. For example, a calculated sample mean is an estimate of the population mean.
ESTIMATORS

There are two types of estimators:

• Point estimate: a single value or point.
e.g. sample mean = 4 is a point estimate of the population mean, μ.
• Interval estimate: draws inferences about a population by estimating a parameter using an interval (range).
e.g. We are 95% confident that the unknown mean score lies between 56 and 78.
INTERVAL ESTIMATORS FOR µ,
σ IS KNOWN

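• We know that, for a normal population (or a large sample), $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
• We also know that, for a standard normal distribution, 95% of the area is contained between -1.96 and +1.96,
i.e. $P(-1.96 < Z < 1.96) = 0.95$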
INTERVAL ESTIMATORS FOR µ,
σ IS KNOWN

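• Put these things together:
$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
• And rearranging:
$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$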
INTERVAL ESTIMATORS FOR µ,
σ IS KNOWN

• The interval $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ is called a 95% confidence interval for μ.
• What this means:
In repeated sampling, 95% of the intervals created this way would contain μ and 5% would not.
• We can change the confidence level by changing the 1.96
❑Use 1.645 to get a 90% confidence interval
❑Use 2.575 to get a 99% confidence interval
EXAMPLE 1

Suppose we know from experience that a random variable $X \sim N(\mu, 1.66)$, and for a sample of size 10 from this population, the sample mean is 1.58. What can you infer about the population mean? (using a 95% confidence interval for μ)
GENERAL NOTATION

• In general, a 100(1-α)% confidence interval estimator for μ is given by
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• Confidence level: 100(1-α)%, the probability that an interval constructed this way contains the parameter.
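The general interval can be sketched as a short function and applied to the data of Example 1. This assumes that N(μ, 1.66) states the variance, so σ = √1.66; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for mu when sigma is known: sample_mean ± z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Example 1 data: variance 1.66 (assumed), n = 10, sample mean 1.58.
lower, upper = z_confidence_interval(1.58, sigma=sqrt(1.66), n=10)
print(round(lower, 2), round(upper, 2))  # 0.78 2.38
```

The result agrees with the interval from 0.78 to 2.38 quoted later in the lecture.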
WHAT DOES 100(1- α)% MEAN?

• If we want 95% confidence, α=0.05 (or 5%).

• If we want 90% confidence, α=0.10 (or 10%).

• If we want 99% confidence, α=0.01 (or 1%).

→ Helps to look up z-table


WHAT DOES Zα/2 MEAN?

We want to find the middle 100(1-α)% area of the standard normal curve:
• So the area left in each tail will be α/2.
• Zα/2 is the point which marks off an area of α/2 in the upper tail.
• Need to look up normal tables to find this!
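As an alternative to the table lookup, Zα/2 can be computed as the point with cumulative area 1 − α/2 to its left. A minimal sketch with a hypothetical helper name:

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2}: the point leaving area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


critical_values = {level: z_critical(level) for level in (0.90, 0.95, 0.98, 0.99)}
for level, z in critical_values.items():
    print(f"{level:.0%} confidence: z = {z:.3f}")
```

Printed tables give these to fewer digits, which is why values such as 2.33 and 2.575 appear in the slides.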
FACTORS INFLUENCING THE WIDTH OF
THE INTERVAL

• σ is fixed; it can’t be changed.
• Vary the sample size: as n gets bigger, the interval gets narrower.
• Vary the confidence level: if we want to be more confident, we simply change the 1.96 to another value from the standard normal. 2.33 will give 98% confidence and 2.575 will give 99% confidence; increasing the confidence level makes the interval wider.
IMPORTANT

• Remember that it is the INTERVAL that changes from sample to sample.
• μ is a fixed and constant value. It is either within the interval or not.
• You should interpret a 95% confidence interval as saying, “In repeated sampling, 95% of such intervals created would contain the true population mean” or “We are 95% confident that the true population mean lies between 0.78 and 2.38.”
EXAMPLE 2

The average height of a sample of 25 men is found to be 178cm. Assume that the standard deviation of male heights is known to be 10cm, and that heights follow a normal distribution.
Find
1. A 95% confidence interval for the population mean height.
2. A 90% confidence interval for the population mean height.
INTERVAL ESTIMATORS FOR µ,
σ IS UNKNOWN

• We can’t simply substitute s in for σ, since
$\frac{\bar{X} - \mu}{s/\sqrt{n}}$
does not have a standard normal distribution!
• However, it does follow a known distribution: it follows a t-distribution with n-1 degrees of freedom. The statistic is called the t-statistic:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
ABOUT THE T-DISTRIBUTION

• Developed by William Sealy Gosset, published under the pseudonym “Student”.
• Called “Student’s t-distribution”
• It is symmetric around 0 and mound shaped (like a normal), but has a higher variance than the standard normal distribution.
• The higher the degrees of freedom, the more normal the curve looks.
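DEGREES OF FREEDOM (DF)

• Number of observations whose values are free to vary after calculating the sample mean
• E.g. for a sample of n = 3 with sample mean 2 (so the values must sum to 6):
X1 = 1 (or another value)
X2 = 2 (or another value)
X3 = 3 (can’t be changed once X1 and X2 are chosen)
→ df = n - 1 = 2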
STANDARD NORMAL DISTRIBUTION
AND STUDENT’S DISTRIBUTION
ABOUT THE T-DISTRIBUTION

• Bell-shaped
• Symmetric
• More spread out than the standard normal

STUDENT’S TABLE

$t_{0.025,\,9} = 2.262$
HINTS FOR USING THE T-TABLES

• The bottom row has df=∞; these are the standard normal values.
If df is very large, use Z tables even if σ is unknown.
• If the exact df is not in the table, use whatever df is closest.


INTERVAL ESTIMATORS FOR µ,
σ IS UNKNOWN

$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$

Note:
(i) The population must follow a normal distribution for the t-statistic to apply
(ii) Use the t-table to find the t-value
EXAMPLE 3

A random sample has size n = 25, $\bar{x}$ = 50, s = 8. Use a 95% confidence level to estimate the true population mean.
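A sketch of the calculation for this example follows. The Python standard library has no t-distribution, so the critical value t(0.025, 24) = 2.064 is entered by hand as a value read from a Student's t-table; that constant and the function name are assumptions of this sketch, not part of the slides.

```python
from math import sqrt

# Assumed table value: t_{0.025, 24} for n - 1 = 24 degrees of freedom.
T_CRITICAL_0025_DF24 = 2.064


def t_confidence_interval(sample_mean, s, n, t_critical):
    """Interval for mu when sigma is unknown: sample_mean ± t * s / sqrt(n)."""
    margin = t_critical * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = t_confidence_interval(50, s=8, n=25, t_critical=T_CRITICAL_0025_DF24)
print(round(lower, 2), round(upper, 2))
```

Passing the critical value as a parameter keeps the table lookup explicit, as in the hand calculation.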
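CONFIDENCE INTERVAL FOR MEAN

Can the population standard deviation σ be assumed known?
• Yes: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• No: Is the sample size large (n ≥ 30)?
➢ Yes: $\bar{x} \pm z_{\alpha/2}\frac{S}{\sqrt{n}}$
➢ No: $\bar{x} \pm t_{\alpha/2}\frac{S}{\sqrt{n}}$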
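CONFIDENCE INTERVAL
FOR PROPORTION

$\bar{p} \pm z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$

• Point Estimation: $\bar{p}$
• Lower Confidence Limit: $\bar{p} - z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
• Upper Confidence Limit: $\bar{p} + z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
• Width of confidence interval: the distance between the lower and upper limits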
EXAMPLE

According to a 2010 report from the American Council on Education, females make up 57% of the college population in the United States. Students in a statistics class at Tallahassee Community College want to determine the proportion of female students at TCC. They select a random sample of 135 TCC students and find that 72 are female. Find a 95% confidence interval for the proportion of females at the college.
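The interval for this example can be sketched as follows, using the sample proportion 72/135 and the normal approximation. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Interval for p: p_bar ± z_{alpha/2} * sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = successes / n                               # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin


lower, upper = proportion_confidence_interval(72, 135)
print(round(lower, 4), round(upper, 4))
```

The national figure of 57% can then be compared with the resulting limits.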
DETERMINE THE SAMPLE SIZE

• Suppose that before we gather data, we know that we want to get an average within a certain distance of the true population value.
• We can use the Central Limit Theorem to find the minimum sample size required to meet this condition, if the standard deviation of the population is known.
EXAMPLE 4

Assume that the standard deviation of a population is 5. I want to estimate the true population mean to within a range of 3, with 99% confidence. What is the minimum sample size required?
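Setting the margin of error z(α/2)·σ/√n to be no larger than a target E and solving for n gives n ≥ (z(α/2)·σ/E)². The sketch below applies this to the example. Because "a range of 3" can be read either as a margin of ±3 or as a total interval width of 3 (margin 1.5), both readings are shown; which one is intended is an assumption, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin_of_error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin_of_error) ** 2)  # always round up


n_if_margin_3 = required_sample_size(sigma=5, margin_of_error=3, confidence=0.99)
n_if_width_3 = required_sample_size(sigma=5, margin_of_error=1.5, confidence=0.99)
print(n_if_margin_3, n_if_width_3)
```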
SPSS

Analyze > Descriptive Statistics > Explore …


OUTPUT
HYPOTHESIS TESTING

• Steps of hypothesis testing

• Hypothesis testing for a population mean

• Hypothesis testing for a population proportion


HYPOTHESIS TESTING STEPS

▪ State the null and alternative hypotheses

▪ Test statistic: calculate the value of the test statistic

▪ Compare with the critical value

▪ Conclude about the hypothesis


NULL AND ALTERNATIVE HYPOTHESIS

• Convert the research question to null and alternative hypotheses

• The null hypothesis (H0) is a claim of “no difference in the population”. It always contains the equality sign “=”

• The alternative hypothesis (H1) claims “H0 is false”. The sign of H1 (>, <, ≠) decides the type of test
NULL AND ALTERNATIVE HYPOTHESIS
COMPARING WITH CRITICAL VALUE

▪ Level of significance

                              Actual Situation
Decision                      Rain (H0 True)         Not rain (H0 False)
Umbrella (Not Reject H0)      No Error (1 - α)       Type II Error (β)
No umbrella (Reject H0)       Type I Error (α)       No Error (1 - β)

Significance level (α) = The probability of Type I Error
= The probability of rejecting a true null hypothesis
TEST STATISTICS

• Large sample case:

σ known: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
σ unknown: $z = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$

• Rejection rule:

Left tail      Right tail     Two tail
Z < -Zα        Z > Zα         |Z| > Zα/2


TEST STATISTICS

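• Small sample case (population approximately normal):

σ known: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
σ unknown: $t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$

• Rejection rule (σ unknown; with σ known, use the Z rule of the large sample case):

Left tail          Right tail         Two tail
t < -tα, n-1       t > tα, n-1        |t| > tα/2, n-1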
COMPARING WITH CRITICAL VALUE

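Level of significance = α. The critical value marks the boundary of the rejection region (shaded in the figure).

• Two-tail test: H0: μ = 3, H1: μ ≠ 3. Rejection region: area α/2 in each tail.
• Upper-tail test: H0: μ ≤ 3, H1: μ > 3. Rejection region: area α in the upper tail.
• Lower-tail test: H0: μ ≥ 3, H1: μ < 3. Rejection region: area α in the lower tail.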
COMPARING WITH CRITICAL VALUE
CONCLUDING

• Reject H0

• Not reject H0

At alpha = 0.05, there is (not) sufficient evidence to conclude that…
SUMMARY OF TEST STATISTICS

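n > 30?
• Yes: σ known?
➢ Yes: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
➢ No: use s to estimate σ, $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
• No: Population approximately normal?
➢ Yes: σ known?
   - Yes: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
   - No: use s to estimate σ, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
➢ No: Increase n to > 30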
EXERCISE

An accountant claims to be able to complete a standard tax return in at most an hour. For a random sample of 24 tax returns, the accountant averaged 63.2 minutes with a standard deviation of 7.7 minutes. Is there sufficient evidence to suggest that the accountant’s claim is incorrect?
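One way to set this exercise up is as a right-tail t test with H0: μ ≤ 60 and H1: μ > 60, since n = 24 is small and σ is unknown. The sketch below assumes α = 0.05 (the exercise does not state a level), a normal population, and the table value t(0.05, 23) = 1.714 entered by hand; the names are hypothetical.

```python
from math import sqrt

# Assumed table value: t_{0.05, 23} for n - 1 = 23 degrees of freedom.
T_CRITICAL_005_DF23 = 1.714


def one_sample_t_statistic(sample_mean, mu_0, s, n):
    """t = (sample_mean - mu_0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu_0) / (s / sqrt(n))


t_stat = one_sample_t_statistic(63.2, mu_0=60, s=7.7, n=24)
reject_h0 = t_stat > T_CRITICAL_005_DF23  # right-tail rejection rule: t > t_{alpha, n-1}
print(round(t_stat, 3), reject_h0)
```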
HYPOTHESIS PAIR

- “Average waiting time of passengers is 5 minutes”
- “Mean of households’ consumption expenditure is $500 per month”
- “The government reports that average income is $2400”
- “Expected value of price is $10”
- “Mean of consumption was 2 mil. VND”
HYPOTHESIS PAIR

• Average income is higher than $2400

• Mean of wage is lower than $300

• Average profit has changed from $90,000

• GPA is less than or equal to 5.5

• The price is more than or equal to $150

• Average revenue is not smaller than $2000

EXAMPLE

How many tissues should a Kleenex package contain?

Researchers determined that 60 tissues is the average number of tissues used during a cold. Suppose a random sample of 100 Kleenex users yielded the following data on the number of tissues used during a cold: $\bar{x}$ = 52, s = 22.
1. Give the null and alternative hypotheses to determine if the number of tissues used during a cold is not 60.
2. Calculate the value of the test statistic
3. Give the critical value if alpha = 0.05
4. Make the conclusion
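The four steps can be sketched in code for a two-tail test of H0: μ = 60 against H1: μ ≠ 60. With n = 100 the large sample case applies, so s stands in for σ and the statistic is compared with Zα/2. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(sample_mean, mu_0, s, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for H0: mu = mu_0 vs H1: mu != mu_0."""
    z = (sample_mean - mu_0) / (s / sqrt(n))       # test statistic
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, z_critical, abs(z) > z_critical      # two-tail rule: |Z| > Z_{alpha/2}


z_stat, z_crit, reject_h0 = two_tail_z_test(52, mu_0=60, s=22, n=100)
print(round(z_stat, 3), round(z_crit, 3), reject_h0)
```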
SPSS

Analyze > Compare Means > One-Sample T Test…


OUTPUT
TEST ABOUT
THE POPULATION PROPORTION

Hypothesis:
• Two tail:
H0: p = p0
H1: p ≠ p0
• Left tail:
H0: p = p0
H1: p < p0
• Right tail:
H0: p = p0
H1: p > p0
TEST STATISTICS

• Large sample (np ≥ 5 and n(1-p) ≥ 5)

$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$

where: $\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}}$

• Rejection rule

Left tail      Right tail     Two tail
Z < -Zα        Z > Zα         |Z| > Zα/2


EXAMPLE

For a Christmas and New Year’s week, the National Safety Council estimated that 500 people would be killed and 25,000 injured on the nation’s roads. The NSC claimed that 50% of the accidents would be caused by drunk driving.
A sample of 120 accidents showed that 67 were caused by drunk driving. Use these data to test the NSC’s claim with α = 0.05.
EXAMPLE

• Hypothesis
H0: p = .5
H1: p ≠ .5
• Test Statistic
$\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{.5(1 - .5)}{120}} = .045644$
$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{(67/120) - .5}{.045644} = 1.278$

• Reject H0 if z < -1.96 or z > 1.96

• Conclusion: Do not reject H0.
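The arithmetic of this worked example can be checked with a short sketch. The function name is hypothetical; the inputs are the figures given in the example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p_0, alpha=0.05):
    """Two-tail test of H0: p = p_0. Returns (sigma_p_bar, z, reject H0?)."""
    p_bar = successes / n
    sigma_p_bar = sqrt(p_0 * (1 - p_0) / n)   # standard error computed under H0
    z = (p_bar - p_0) / sigma_p_bar
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return sigma_p_bar, z, abs(z) > z_critical


sigma_p_bar, z_stat, reject_h0 = proportion_z_test(67, 120, p_0=0.5)
print(round(sigma_p_bar, 6), round(z_stat, 3), reject_h0)
```

Note that the standard error uses the hypothesized p0, not the sample proportion, which matches the formula given for the test statistic.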
