IEP102 Group 3
2BSIE-A
CONTENT
STATISTICAL INFERENCE:
USE S TO ESTIMATE UNKNOWN σ, USE A T TEST STATISTIC AND POOLED STANDARD DEVIATION
TWO POPULATION PROPORTIONS
GOAL: FORM A CONFIDENCE INTERVAL FOR OR TEST A HYPOTHESIS ABOUT TWO POPULATION PROPORTIONS
ASSUMPTIONS:
ESTIMATION FOR TWO POPULATIONS
ESTIMATING TWO POPULATION VALUES:
POPULATION MEANS, INDEPENDENT SAMPLES
PAIRED SAMPLES
POPULATION PROPORTIONS
EXAMPLES:
1.66
THE CONFIDENCE INTERVAL FOR
IS:
PAIRED SAMPLES
ASSUMPTIONS:
POPULATIONS ARE NORMALLY DISTRIBUTED
N IS THE NUMBER OF PAIRS IN THE PAIRED SAMPLES
THE CONFIDENCE INTERVAL FOR POPULATION MEANS, INDEPENDENT SAMPLES, IS:
CALCULATING THE TEST STATISTIC
The test statistic is:
HYPOTHESIS TESTING
solution:
2.040
2.040
HYPOTHESIS TESTING FOR PAIRED SAMPLES
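CONFIDENCE INTERVAL FOR 2 POPULATION PROPORTIONS
THE CONFIDENCE INTERVAL FOR $p_1 - p_2$ IS:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$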
HYPOTHESIS TESTS FOR 2 POPULATION PROPORTIONS
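PAIRED SAMPLES
THE POINT ESTIMATE FOR THE POPULATION MEAN PAIRED DIFFERENCE IS
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$
THE SAMPLE STANDARD DEVIATION IS
$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$$
WHERE $d_i$ IS THE DIFFERENCE FOR PAIR $i$ AND $n$ IS THE NUMBER OF PAIRS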
RATIO OF
VARIANCES
HYPOTHESIS TESTING FOR THE RATIO OF THE VARIANCES OF TWO NORMALLY
DISTRIBUTED POPULATIONS
HYPOTHESIS TESTING FOR THE DIFFERENCE IN THE PROPORTIONS OF TWO
POPULATIONS
TESTING HYPOTHESES AND CONFIDENCE
INTERVAL
There is a relationship between confidence intervals and
hypothesis testing. When the null hypothesis is rejected in a
hypothesis-testing situation, the confidence interval for the
mean using the same level of significance will not contain the
hypothesized mean. Likewise, when the null hypothesis is not
rejected, the confidence interval computed using the same
level of significance will contain the hypothesized mean.
EXAMPLE
Sugar Production: Sugar is packed in 5-pound bags.
An inspector suspects the
bags may not contain 5 pounds. A sample
of 50 bags produces a mean of 4.6
pounds and a standard deviation of 0.7
pound. Is there enough evidence to
conclude that the bags do not contain 5
pounds as stated, at α = 0.05? Also, find the
95% confidence interval of the true mean.
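A minimal sketch of one way to work this example, assuming the normal approximation is acceptable because the sample of 50 bags is large (a t test with 49 degrees of freedom would give a slightly wider interval and the same conclusion). The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from the example: 50 bags, mean 4.6 lb, SD 0.7 lb, claimed mean 5 lb.
n, sample_mean, sample_sd, mu0, alpha = 50, 4.6, 0.7, 5.0, 0.05

se = sample_sd / sqrt(n)                       # estimated standard error of the mean
z = (sample_mean - mu0) / se                   # test statistic in standard units
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail critical value, about 1.96

reject = abs(z) > z_crit                       # two-tail decision rule
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
contains_mu0 = ci[0] <= mu0 <= ci[1]

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), contains 5: {contains_mu0}")
```

The test statistic is about -4.04, well beyond the critical value, so the null hypothesis is rejected, and the interval of roughly 4.41 to 4.79 pounds does not contain 5. The two approaches agree.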
In summary, then, when the null hypothesis is rejected at a significance level of α, the confidence interval
computed at the 1 − α level will not contain the value of the mean that is stated in the null hypothesis. On
the other hand, when the null hypothesis is not rejected, the confidence interval computed at the same
significance level will contain the value of the mean stated in the null hypothesis. These results are true for
other hypothesis testing situations and are not limited to means tests. The relationship between
confidence intervals and hypothesis testing presented here is valid for two-tailed tests. The relationship
between one-tailed hypothesis tests and one sided or one-tailed confidence intervals is also valid.
The hypothesis testing and confidence interval results always agree. To understand the basis of this
agreement, remember how confidence levels and significance levels function:
A confidence level determines the distance between the sample mean and the confidence limits.
A significance level determines the distance between the null hypothesis value and the critical regions.
Both of these concepts specify a distance from the mean to a limit. Surprise! These distances are
precisely the same length.
Both distances equal the critical value multiplied by the standard error of the mean. For
our fuel cost example, this distance is $63.57.
P-value and significance level approach: If the sample mean is more than $63.57 from the null hypothesis
mean, the sample mean falls within the critical region, and the difference is statistically significant.
Confidence interval approach: If the null hypothesis mean is more than $63.57 from the sample mean, the
interval does not contain this value, and the difference is statistically significant.
Approximate Hypothesis Tests: the z Test and the t Test
The z test and the t test are two common tests of the hypothesis that a population mean
equals a particular value and of the hypothesis that two population means are
equal.
They are based on approximations to the probability distribution of the test
statistic when the null hypothesis is true, so their significance levels are not
exactly what they claim to be.
The tests are accurate if the sample size is large and the population is nearly
normal, but they can be inaccurate otherwise.
The z test uses the normal approximation, while the t test uses the Student’s t
curve.
Hypothesis tests and confidence intervals are related.
TESTING HYPOTHESES AND CONFIDENCE
INTERVAL USING Z STATISTICS
A Z-test is a type of statistical hypothesis test where the test statistic
follows a normal distribution.
Requirement: a sample size greater than 30. This helps ensure that the sample mean
comes from a sampling distribution that is approximately normal. By the central limit theorem, the
distribution of the sample mean is approximately normal when the sample is large (more than about 30
data points is a common rule of thumb), provided the population has a finite variance.
NOTE: If the test statistic is more extreme than the critical value (greater or lower, depending on
the test we are conducting), we reject the null hypothesis in favor of the alternative hypothesis, because
the sample mean differs from the hypothesized population mean by a statistically significant amount.
Z-TEST EXAMPLE
We have a probability model for how the observations arise, assuming the null hypothesis is true.
Typically, the model is that under the null hypothesis, the data are like random draws with or
without replacement from a box of numbered tickets.
Under the null hypothesis, the test statistic X, converted to standard units, has a probability
histogram that can be approximated well by the normal curve.
Under the null hypothesis, we can find the expected value of the test statistic, X.
Under the null hypothesis, either we can find the SE of the test statistic, SE(X), or we can estimate
SE(X)accurately enough to ignore the error of the estimate of the SE. Let se denote either the exact
SE of X under the null hypothesis, or the estimated value of SE(X) under the null hypothesis.
Then, under the null hypothesis, the probability histogram of the Z statistic Z=(X−E(X))/se is
approximated well by the normal curve, and we can use the normal approximation to select the
rejection region for the test using Z as the test statistic. If the null hypothesis is true,
$P(Z < z_a) \approx a$,
$P(Z > z_{1-a}) \approx a$, and
$P(|Z| > z_{1-a/2}) \approx a$.
These three approximations yield three different z tests of the hypothesis that μ=μ0 at
approximate significance level a:
In a left-tail test, the probability of a TYPE 1 ERROR is approximately the area of the left tail
of the normal curve, from minus infinity to $z_a$.
In a right-tail test, the probability of a TYPE 1 ERROR is approximately the area of the right
tail of the normal curve, from $z_{1-a}$ to infinity.
In a two-tail test, the probability of a TYPE 1 ERROR is approximately the sum of the areas
of both tails of the normal curve, the left tail from minus infinity to $z_{a/2}$ and the right tail
from $z_{1-a/2}$ to infinity.
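The three rejection regions can be sketched in a few lines of Python. This is an illustration only: the function name `z_rejects` and its `tail` argument are hypothetical, and the quantiles come from `statistics.NormalDist` in the standard library.

```python
from statistics import NormalDist

def z_rejects(z, a, tail):
    """Return True if a z test at approximate significance level a rejects the null."""
    std = NormalDist()                           # the standard normal curve
    if tail == "left":
        return z < std.inv_cdf(a)                # reject when Z < z_a
    if tail == "right":
        return z > std.inv_cdf(1 - a)            # reject when Z > z_{1-a}
    if tail == "two":
        return abs(z) > std.inv_cdf(1 - a / 2)   # reject when |Z| > z_{1-a/2}
    raise ValueError("tail must be 'left', 'right', or 'two'")

# The same z score can lead to different decisions in different tests.
print(z_rejects(-1.7, 0.05, "left"))   # compares -1.7 with z_0.05, about -1.645
print(z_rejects(-1.7, 0.05, "two"))    # compares 1.7 with z_0.975, about 1.96
```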
These three tests are called z tests. The observed value of Z is called the Z score.
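Which of these three tests, if any, should one use? The answer depends on the expected value of Z under
the alternative hypothesis: use a left-tail test if it is less than zero, a right-tail test if it is greater
than zero, and a two-tail test if it could be either.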
The P value depends on the type of z test (left-tail, right-tail or two-tail) and the observed value
of the Z statistic, which is the test statistic in standard units under the null hypothesis.
The P value is calculated using the normal approximation to the probability distribution of the Z
statistic under the null hypothesis.
P VALUE FOR Z TEST
Each of the three z tests gives us a family of procedures for testing the null hypothesis at any
(approximate) significance level a between 0 and 100%—we just use the appropriate quantile of the
normal curve. This makes it particularly easy to find the P value for a z test. Recall that the P value is
the smallest significance level for which we would reject the null hypothesis, among a family of tests
of the null hypothesis at different significance levels.
Suppose the z score (the observed value of Z) is x. In a left-tail test, the P value is the area under the
normal curve to the left of x: Had we chosen the significance level a so that $z_a = x$, we would have
rejected the null hypothesis, but we would not have rejected it for any smaller value of a, because for
all smaller values of a, $z_a < x$. Similarly, for a right-tail z test, the P value is the area under the normal
curve to the right of x: If $x = z_{1-a}$ we would reject the null hypothesis at approximate significance level
a, but not at smaller significance levels. For a two-tail z test, the P value is the sum of the area under
the normal curve to the left of −|x| and the area under the normal curve to the right of |x|.
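The three P value rules above can be written as a short function. This is a sketch: the name `z_p_value` and the `tail` labels are hypothetical, and the areas come from the standard normal curve in `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_p_value(x, tail):
    """P value of a z test whose observed z score is x."""
    std = NormalDist()
    if tail == "left":
        return std.cdf(x)              # area to the left of x
    if tail == "right":
        return 1 - std.cdf(x)          # area to the right of x
    if tail == "two":
        return 2 * std.cdf(-abs(x))    # area left of -|x| plus area right of |x|
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    print(tail, round(z_p_value(-1.96, tail), 4))
```

By the symmetry of the normal curve, the two tail areas in the two-tail case are equal, which is why the sketch doubles the left one.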
Finding P values and specifying the rejection region for the z test involves the probability distribution
of Z under the assumption that the null hypothesis is true. Rarely is the alternative hypothesis
sufficiently detailed to specify the probability distribution of Z completely, but often the alternative
does help us choose intelligently among left-tail, right-tail, and two-tail z tests. This is perhaps the
most important issue in deciding which hypothesis to take as the null hypothesis and which as the
alternative: We calculate the significance level under the null hypothesis, and that calculation must
be tractable.
Z TEST FOR POPULATION PERCENTAGE
Suppose we have a population of N units of which G are labeled "1" and the rest are labeled "0." Let
p=G/N be the population percentage. Consider testing the null hypothesis that p=p0 against the
alternative hypothesis that p≠p0, using a random sample of n units drawn with replacement. (We
could assume instead that N>>n and allow the draws to be without replacement.)
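Let $\phi$ be the sample percentage. Under the null hypothesis, $\phi$ has expected value $p_0$ and SE
$\sqrt{p_0(1-p_0)/n}$, so define
$$Z = \frac{\phi - p_0}{\sqrt{p_0(1-p_0)/n}}.$$
Provided n is large and p0 is not too close to zero or 100% (say n×p0>30 and n×(1−p0)>30), the probability
histogram of Z will be approximated reasonably well by the normal curve, and we can use it as the Z
statistic in a z test. For example, if we reject the null hypothesis when |Z|>1.96, the significance level of
the test will be about 5%.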
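A minimal sketch of a two-tail z test for a population percentage, assuming the Z statistic is the sample percentage minus p0, divided by the SE of the sample percentage computed under the null hypothesis. The function name and the counts (230 units labeled "1" in a sample of 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-tail z test of the null hypothesis p = p0 for a 0-1 population."""
    sample_pct = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)              # SE of the sample percentage if p = p0
    z = (sample_pct - p0) / se0
    p_value = 2 * NormalDist().cdf(-abs(z))    # two-tail P value
    return z, p_value

# Hypothetical data: n*p0 = 200 and n*(1-p0) = 200, both well above 30.
z, p_value = proportion_z_test(successes=230, n=400, p0=0.5)
print(f"Z = {z:.2f}, P value = {p_value:.4f}, reject at 5%: {abs(z) > 1.96}")
```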
Z TEST FOR POPULATION MEAN
Consider testing the null hypothesis that the population mean μ is equal to a specific null value μ0,
against the alternative hypothesis that μ<μ0, on the basis of a random sample with replacement of size
n. Recall that the sample mean M of n random draws with or without replacement from a box of
numbered tickets is an unbiased estimator of the population mean μ: If
$$M = \frac{1}{n}(X_1 + X_2 + \cdots + X_n),$$
then
$$E(M) = \mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N),$$
where N is the size of the population and $x_1, \ldots, x_N$ are the numbers in the box. The population mean determines the expected value of the
sample mean. The SE of the sample mean of a random sample with replacement is
$$SE(M) = \frac{SD(\text{box})}{\sqrt{n}},$$
where SD(box) is the SD of the list of all the numbers in the box, and n is the sample size. As a special
case, the sample percentage $\phi$ of n independent random draws from a 0-1 box is an unbiased
estimator of the population percentage p, with SE equal to
$$SE(\phi) = \sqrt{\frac{p(1-p)}{n}}.$$
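A minimal sketch of the left-tail z test for a population mean, assuming the sample is large enough that the sample standard deviation can stand in for SD(box), so that Z is the sample mean minus μ0, divided by the estimated SE. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_z_test_left(sample, mu0):
    """Left-tail z test of the null hypothesis mu = mu0 against mu < mu0."""
    n = len(sample)
    m = mean(sample)                    # sample mean M
    se = stdev(sample) / sqrt(n)        # estimated SE(M): sample SD in place of SD(box)
    z = (m - mu0) / se
    return z, NormalDist().cdf(z)       # left-tail P value: area to the left of z

# Hypothetical sample of 100 draws taking the values 48 to 52 equally often.
sample = [48 + (i % 5) for i in range(100)]
z, p_value = mean_z_test_left(sample, mu0=50.5)
print(f"Z = {z:.2f}, P value = {p_value:.5f}")
```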
Z TEST FOR A DIFFERENCE OF POPULATION MEANS
Paired Samples
Consider a population of N individuals, each of whom is labeled with two numbers. For example, the N individuals might be a
group of doctors, and the two numbers that label each doctor might be the annual payments to the doctor by an HMO under
the terms of the current contract and under the terms of a proposed revision of the contract. Let the two numbers
associated with individual i be ci and ti. (Think of c as control and t as treatment. In this example, control is the current
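contract, and treatment is the proposed contract.) Let μc be the population mean of the N values
{c1,c2,…,cN},
and let μt be the population mean of the N values
{t1,t2,…,tN}.
Let μ=μt−μc be the difference between the two population means. We would like to test the null hypothesis that μ=μ0
against the alternative hypothesis that μ<μ0. With μ0=$0, this null hypothesis is that the average annual payment to doctors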
under the proposed revision would be the same as the average payment under the current contract, and the alternative is
that on average doctors would be paid less under the new contract than under the current contract. With μ0=−$5,000, this
null hypothesis is that the proposed contract would save the HMO an average of $5,000 per doctor, compared with the
current contract; the alternative is that under the proposed contract, the HMO would save even more than that. With
μ0=$1,000, this null hypothesis is that doctors would be paid an average of $1,000 more per year under the new contract
than under the old one; the alternative hypothesis is that on average doctors would be paid less than an additional $1,000
per year under the new contract—perhaps even less than they are paid under the current contract. For the remainder of this
example, we shall take μ0=$1,000.
The data on which we shall base the test are observations of both ci and ti for a sample of n individuals
chosen at random with replacement from the population of N individuals (or a simple random sample of size
n<<N): We select n doctors at random from the N doctors under contract to the HMO, record the current
annual payments to them, and calculate what the payments to them would be under the terms of the new
contract. This is called a paired sample, because the samples from the population of control values and from
the population of treatment values come in pairs: one value for control and one for treatment for each
individual in the sample. Testing the hypothesis that the difference between two population means is equal to
μ0 using a paired sample is just the problem of testing the hypothesis that the population mean μ of the set
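of differences
{t1−c1, t2−c2, …, tN−cN}
is equal to μ0. Denote the n (random) observed values of ci and ti by {C1,C2,…,Cn} and {T1,T2,…,Tn}, respectively. The
sample mean M of the differences between the observed values of ti and ci is the difference of the two sample means:
$$M = \frac{1}{n}\sum_{j=1}^{n}(T_j - C_j) = \frac{1}{n}\sum_{j=1}^{n}T_j - \frac{1}{n}\sum_{j=1}^{n}C_j.$$
Let s be the sample standard deviation of the n observed differences. If the null hypothesis is true, the statistic
$$Z = \frac{M - \mu_0}{s/\sqrt{n}}$$
has expected value zero, and when n is large the probability histogram of Z can be approximated well by the normal curve. Thus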
we can use Z as the Z statistic in a z test of the null hypothesis that μ=μ0. Under the alternative hypothesis that μ<μ0 (doctors
on the average are paid less than an additional $1,000 per year under the new contract), the expected value of Z is less than
zero, so we should use a left-tail z test. Under the alternative hypothesis μ≠μ0 (on average, the difference in average annual
payments to doctors is not an increase of $1,000, but some other number instead), the expected value of Z could be positive or
negative, so we would use a two-tail z test. Under the alternative hypothesis that μ>μ0 (on average, under the new contract,
doctors are paid more than an additional $1,000 per year), the expected value of Z would be greater than zero, so we should use
a right-tail z test.
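A minimal sketch of the paired-sample z test with μ0 = $1,000 and a left-tail alternative, assuming Z is the mean of the observed differences minus μ0, divided by the sample SD of the differences over the square root of n. The function name and the payment figures are hypothetical, not data from the HMO example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def paired_z_test_left(control, treatment, mu0):
    """Left-tail z test that the mean of (treatment - control) equals mu0."""
    diffs = [t - c for c, t in zip(control, treatment)]   # one difference per pair
    n = len(diffs)
    m = mean(diffs)                          # M, the sample mean of the differences
    se = stdev(diffs) / sqrt(n)              # estimated SE of M
    z = (m - mu0) / se
    return m, z, NormalDist().cdf(z)         # left-tail P value

# Hypothetical payments for 60 doctors under the current and proposed contracts.
control = [100_000 + 500 * (i % 7) for i in range(60)]
treatment = [c + 750 + 100 * (i % 5) for i, c in enumerate(control)]

m, z, p_value = paired_z_test_left(control, treatment, mu0=1000)
print(f"mean difference = {m:.0f}, Z = {z:.2f}, P value = {p_value:.4f}")
```

Working with the differences within each pair, rather than with the two samples separately, is what makes this a one-sample problem.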
Independent Samples
Consider two separate populations of numbers, with population means μt and μc, respectively. Let μ=μt−μc
be the difference between the two population means. We would like to test the null hypothesis that μ=μ0
against the alternative hypothesis that μ>μ0. For example, let μt be the average annual payment by an HMO to
doctors in the Los Angeles area, and let μc be the average annual payment by the same HMO to doctors in
the San Francisco area. Then the null hypothesis with μ0=0 is that the HMO pays doctors in the two regions
the same amount annually, on average; the alternative hypothesis is that, on average, the HMO pays doctors
in the Los Angeles area more than doctors in the San Francisco area. Suppose we draw a random sample of size nt with
replacement from the first population, and independently draw a random sample of size nc with replacement
from the second population. Let Mt and Mc be the sample means of the two samples, respectively, and let
$$M = M_t - M_c$$
be the difference between the two sample means. Because the expected value of Mt is μt and the expected
value of Mc is μc, the expected value of M is
$$E(M) = E(M_t) - E(M_c) = \mu_t - \mu_c = \mu.$$
Because the two random samples are independent, Mt and −Mc are independent random variables, and the
SE of their sum is
$$SE(M) = \sqrt{SE^2(M_t) + SE^2(M_c)}.$$
Let st and sc be the sample standard deviations of the two samples, respectively. If nt and nc are both
very large, the two sample standard deviations are likely to be close to the standard deviations of the
corresponding populations, and so $s_t/\sqrt{n_t}$ is likely to be close to SE(Mt), and $s_c/\sqrt{n_c}$ is likely to be close to
SE(Mc). Therefore, if the null hypothesis is true, the statistic
$$Z = \frac{M - \mu_0}{\sqrt{s_t^2/n_t + s_c^2/n_c}}$$
has expected value zero and its probability histogram is approximated well by the normal curve, so we
can use it as the Z statistic in a z test.
Under the alternative hypothesis that μ>μ0, the expected value of Z is greater than zero, so it is appropriate to use a right-tail z test.
If the alternative hypothesis were μ≠μ0, under the alternative the expected value of Z could be greater
than zero or less than zero, so it would be appropriate to use a two-tail z test. If the alternative hypothesis
were μ<μ0, under the alternative the expected value of Z would be less than zero, so it would be
appropriate to use a left-tail z test.
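A minimal sketch of the independent-samples z test with a right-tail alternative, assuming Z is the difference of the two sample means minus μ0, divided by the square root of the sum of the two squared estimated SEs. The function name and the payment figures (in thousands of dollars) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def two_sample_z_test_right(sample_t, sample_c, mu0=0.0):
    """Right-tail z test that the difference of population means equals mu0."""
    m = mean(sample_t) - mean(sample_c)              # M = Mt - Mc
    se = sqrt(stdev(sample_t) ** 2 / len(sample_t)   # estimated SE(Mt) squared
              + stdev(sample_c) ** 2 / len(sample_c))  # plus estimated SE(Mc) squared
    z = (m - mu0) / se
    return m, z, 1 - NormalDist().cdf(z)             # right-tail P value

# Hypothetical annual payments (thousands of dollars) in the two regions.
sample_t = [200 + 10 * (i % 5) for i in range(50)]   # first population, nt = 50
sample_c = [192 + 10 * (i % 5) for i in range(40)]   # second population, nc = 40

m, z, p_value = two_sample_z_test_right(sample_t, sample_c, mu0=0.0)
print(f"M = {m:.1f}, Z = {z:.2f}, P value = {p_value:.4f}")
```

Unlike the paired case, the two samples here may have different sizes, and the observations are never matched up.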
Nearly Normally Distributed Populations
A list is nearly normally distributed if the normal curve is a good approximation to the histogram of the list
transformed to standard units. That is, if the list has mean μ and standard deviation SD, then for every pair
of values a<b, the fraction of numbers in the list between a and b is approximately the area under the normal
curve between (a−μ)/SD and (b−μ)/SD. The histogram of a list that is approximately normally distributed is
(nearly) symmetric about some point, and is (nearly) bell-shaped.
No finite population can be exactly normally distributed, because the area under the normal curve
between every two distinct values is strictly positive—no matter how large or small the values nor how
close together they are. No population that contains only a finite number of distinct values can be exactly
normally distributed, for the same reason. In particular, populations that contain only zeros and ones are
not approximately normally distributed, so results for the sample mean of samples drawn from nearly
normally distributed populations need not apply to the sample percentage of samples drawn from 0-1
boxes. Such results will be more accurate for the sample percentage when the population percentage is
close to 50% than when the population percentage is close to 0% or 100%, because then the histogram
of population values is more nearly symmetric.
TESTING HYPOTHESES AND CONFIDENCE
INTERVAL USING T STATISTICS
T Test for a Mean
When the population standard deviation is unknown, the
z test is not normally used for testing hypotheses
involving means. A different test, called the t test, is
used. The distribution of the variable should be
approximately normal.
The t distribution is similar to the standard normal distribution in the following ways.
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x axis.
The t distribution differs from the standard normal distribution in the following ways.
1. The variance is greater than 1.
2. The t distribution is a family of curves based on the degrees of freedom, which is a number
related to sample size. (The symbol for degrees of freedom is d.f.)
3. As the sample size increases, the t distribution approaches the normal distribution.
STUDENT'S T-CURVE
Student's t curve is similar to the normal curve, but broader. It is positive, has a single
maximum, and is symmetric about zero. The total area under Student's t curve is 100%.
Student's t curve approximates some probability histograms more accurately than the
normal curve does. There are actually infinitely many Student t curves, one for each positive
integer value of the degrees of freedom. As the degrees of freedom increases, the difference
between Student's t curve and the normal curve decreases.
T Test for the Mean of a Nearly Normally Distributed Population
Consider testing the null hypothesis that μ=μ0 using the sample mean M and sample
standard deviation s of a random sample of size n drawn with replacement from a
population that is known to have a nearly normal distribution. Define
$$T = \frac{M - \mu_0}{s/\sqrt{n}}.$$
If the null hypothesis is true, Student's t curve with n-1 degrees of freedom will be an accurate approximation to the
probability histogram of T, so that
$$P(T < t_{n-1,a}) \approx a, \quad P(T > t_{n-1,1-a}) \approx a, \quad P(|T| > t_{n-1,1-a/2}) \approx a,$$
where $t_{n-1,q}$ denotes the q quantile of Student's t curve with n-1 degrees of freedom.
As we saw earlier for the Z statistic, these three approximations give three
tests of the null hypothesis μ=μ0 at approximate significance level a: a left-tail t test, a
right-tail t test, and a two-tail t test:
Use a left-tail test if, under the alternative hypothesis, the expected value of T is less than zero.
Use a right-tail test if, under the alternative hypothesis, the expected value of T is greater than zero.
Use a two-tail test if, under the alternative hypothesis, the expected value of T is not zero, but could be
less than or greater than zero.
Let t be the observed value of T.
In a left-tail t test, the P-value is the area under Student's t curve with n-1 degrees of
freedom from minus infinity to t.
In a right-tail t test, the P-value is the area under Student's t curve with n-1 degrees of
freedom from t to infinity.
In a two-tail t test, the P-value is the sum of the areas under Student's t curve with n-1 degrees of freedom
from minus infinity to -|t| and from |t| to infinity.
The appropriate curve to use to find the rejection region is Student's t curve with n-1 degrees of
freedom, where n is the number of individuals in the sample.
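A minimal sketch of these t tests using only the Python standard library. It assumes T is the sample mean minus μ0, divided by s over the square root of n, and it approximates areas under Student's t curve by numerical integration (Simpson's rule) because the standard library has no t distribution. The function names and the data are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev

def t_density(t, df):
    """Height of Student's t curve with df degrees of freedom at the point t."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))

def t_cdf(t, df, steps=2000):
    """Area under Student's t curve to the left of t (Simpson's rule, even steps)."""
    h = abs(t) / steps
    total = t_density(0, df) + t_density(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3                 # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area   # the curve is symmetric about 0

def t_test(sample, mu0, tail):
    """Return the observed T and its P-value for a left, right, or two-tail test."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    df = n - 1
    if tail == "left":
        p = t_cdf(t, df)                 # area from minus infinity to t
    elif tail == "right":
        p = 1 - t_cdf(t, df)             # area from t to infinity
    else:
        p = 2 * t_cdf(-abs(t), df)       # both tails beyond -|t| and |t|
    return t, p

# Hypothetical sample of 10 measurements from a nearly normal population.
sample = [9.8, 10.2, 10.4, 9.9, 10.6, 10.1, 10.3, 10.5, 10.0, 10.2]
t, p_value = t_test(sample, mu0=10.0, tail="two")
print(f"T = {t:.2f} with {len(sample) - 1} d.f., P-value = {p_value:.3f}")
```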
GROUP 3 MEMBERS:
Biñas, Mica
Cabe, Aira Jane
Payot, Rizza Mae
Puyat, Rocelle
Buanghog, Argie
Cris, Jacob Ivan
Ebale, Karl Kenneth
Leosala, John Rafael
Odon, Nathaniel
Piloton, Brandon
Reynaldo, Karl Andrei
Santonia, Jhon Mark
Tono, Kahlil Alexis