IEP102 Group 3

This document discusses hypothesis testing for comparing two populations. It covers testing for differences in population means and proportions using independent samples. The key steps outlined are stating the null and alternative hypotheses, selecting an appropriate test statistic such as z-test, t-test, or chi-square test, determining the significance level, collecting and analyzing sample data, computing the test statistic and p-value, and making a decision to reject or fail to reject the null hypothesis based on comparing the p-value to the significance level. Examples of applying these techniques to compare things like weather patterns, student exam scores, call duration times, and voter preferences are also provided.

GROUP 3

IEP102
2BSIE-A
CONTENT

01 STATISTICAL INFERENCE: HYPOTHESIS TESTING FOR TWO POPULATIONS
02 TESTING HYPOTHESES AND CONFIDENCE INTERVAL USING Z STATISTICS
03 TESTING HYPOTHESES AND CONFIDENCE INTERVAL USING T STATISTICS
STATISTICAL INFERENCE:
HYPOTHESIS TESTING FOR TWO POPULATIONS
Hypothesis testing for two
populations is a statistical technique
used to compare two groups or
populations and determine if there is
a significant difference between their
means, variances, or proportions. It is
commonly used in various fields,
including social sciences, medicine,
and business, to draw conclusions
based on sample data.
TO PERFORM HYPOTHESIS TESTING FOR TWO POPULATIONS, THE FOLLOWING STEPS ARE TYPICALLY FOLLOWED:

STEP 1: State the null and alternative hypotheses. The null hypothesis (H0) represents the assumption that there is no significant difference between the two populations. The alternative hypothesis (Ha) represents the claim or research hypothesis that there is a significant difference between the two populations. It can be one-sided or two-sided, depending on the research question.

STEP 2: Select an appropriate test statistic. The choice of the test statistic depends on the nature of the data and the specific hypothesis being tested. Commonly used test statistics include the t-test, z-test, chi-square test, and F-test.

STEP 3: Determine the significance level (α). The significance level, denoted by α, represents the threshold below which the null hypothesis is rejected. Commonly used values for α are 0.05 and 0.01, corresponding to 5% and 1% significance levels, respectively.
STEP 4: Collect and analyze the data. Random samples are collected from each population, and relevant statistics (e.g., sample means, variances, or proportions) are computed.

STEP 5: Compute the test statistic and p-value. The test statistic is calculated using the sample data and the chosen test statistic formula. The p-value is the probability of observing a test statistic as extreme as the one computed, assuming the null hypothesis is true.

STEP 6: Make a decision. Compare the p-value to the significance level (α). If the p-value is less than α, the null hypothesis is rejected in favor of the alternative hypothesis; this indicates that there is sufficient evidence to support the claim of a significant difference between the populations. If the p-value is greater than or equal to α, the null hypothesis is not rejected; this suggests that there is not enough evidence to conclude a significant difference between the populations.
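The six steps above can be sketched in code. Below is a minimal two-sample z test, assuming the population standard deviations are known; the means, standard deviations, and sample sizes are hypothetical illustration values:

```python
from math import sqrt, erf

def normal_cdf(x):
    # standard normal CDF, computed from the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_z_test(mean1, mean2, sd1, sd2, n1, n2, alpha=0.05):
    """Steps 4-6: compute the z statistic, its two-sided p-value,
    and the reject / fail-to-reject decision."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    z = (mean1 - mean2) / se               # H0: mu1 - mu2 = 0
    p = 2 * (1 - normal_cdf(abs(z)))       # two-sided p-value
    return z, p, p < alpha

# hypothetical exam scores for two classes of 40 students each
z, p, reject = two_sample_z_test(78.0, 74.0, 8.0, 9.0, 40, 40)
```

Here z ≈ 2.10 and p ≈ 0.036, so at α = 0.05 the null hypothesis of equal means would be rejected for these illustrative numbers.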
STEP 7: Draw conclusions. Based on the decision made in Step 6, conclusions are drawn regarding the research question or hypothesis being tested. It's worth noting that the specific details and formulas for hypothesis testing may vary depending on the characteristics of the data and the test being used. It is essential to consult appropriate statistical references or software for conducting hypothesis tests accurately.

Examples:
Is the weather in Chicago significantly different than it was 10 years ago?
Do students who have access to audiovisual aids for a class perform better than students who do not?
Does the variability in the duration of a call decrease when the signal reception is improved?
Do voters from one group overwhelmingly prefer one candidate over another in a local election compared to another group of voters?

All of the above examples have one thing in common: they deal with two populations and how they compare and contrast. We will again deal here with hypothesis testing for means, variances, and proportions. Visually, we discuss the following:
DIFFERENT DATA SOURCES

POPULATION MEANS, INDEPENDENT SAMPLES:
*UNRELATED
*INDEPENDENT
*SAMPLE SELECTED FROM ONE POPULATION HAS NO EFFECT ON THE SAMPLE SELECTED FROM THE OTHER POPULATION
*USE THE DIFFERENCE BETWEEN THE TWO SAMPLE MEANS
*USE A Z TEST OR POOLED-VARIANCE T TEST
THE TEST STATISTIC FOR POPULATION MEANS, INDEPENDENT SAMPLES

WHEN σ1 AND σ2 ARE KNOWN AND BOTH POPULATIONS ARE NORMAL OR BOTH SAMPLE SIZES ARE AT LEAST 30, THE TEST STATISTIC IS A Z-VALUE:

z = [ (x̄1 − x̄2) − (μ1 − μ2) ] / √( σ1²/n1 + σ2²/n2 )

AND THE STANDARD ERROR OF x̄1 − x̄2 IS:

σ_(x̄1 − x̄2) = √( σ1²/n1 + σ2²/n2 )

DIFFERENCE BETWEEN TWO MEANS

POPULATION MEANS, INDEPENDENT SAMPLES
GOAL: FORM A CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN 2 POPULATION MEANS, μ1 − μ2

THE CONFIDENCE INTERVAL FOR μ1 − μ2 IS:

(x̄1 − x̄2) ± z_(α/2) √( σ1²/n1 + σ2²/n2 )
HYPOTHESIS TEST FOR POPULATION MEANS, INDEPENDENT SAMPLES

*WHEN σ1 AND σ2 ARE KNOWN, USE A Z TEST STATISTIC
*WHEN σ1 AND σ2 ARE UNKNOWN, USE S TO ESTIMATE THE UNKNOWN σ; USE A T TEST STATISTIC AND THE POOLED STANDARD DEVIATION
TWO POPULATION PROPORTIONS

POPULATION PROPORTIONS
GOAL: FORM A CONFIDENCE INTERVAL FOR OR TEST A HYPOTHESIS ABOUT THE DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS, p1 − p2

ASSUMPTIONS:
n1 p1 ≥ 5, n1 (1 − p1) ≥ 5
n2 p2 ≥ 5, n2 (1 − p2) ≥ 5
ESTIMATION FOR TWO POPULATIONS

ESTIMATING TWO POPULATION VALUES:
*POPULATION MEANS, INDEPENDENT SAMPLES (EXAMPLE: GROUP 1 VS. INDEPENDENT GROUP 2)
*POPULATION MEANS, PAIRED SAMPLES (EXAMPLE: SAME GROUP BEFORE VS. AFTER TREATMENT)
*POPULATION PROPORTIONS (EXAMPLE: PROPORTION 1 VS. PROPORTION 2)
HYPOTHESIS TESTS FOR TWO POPULATION PROPORTIONS
POPULATION PROPORTIONS

PAIRED SAMPLES EXAMPLE
Assume you send your salespeople to a customer service training workshop. Is the training effective? You collect the following data:
TWO POPULATION PROPORTIONS EXAMPLE
POPULATION PROPORTIONS

Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?
In a random sample, 36 of 72 men and 31 of 50 women indicated they would vote Yes.
Test at the .05 level of significance.
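Working the example from the counts above (36 of 72 men, 31 of 50 women), and pooling the two proportions under the null hypothesis, gives the following sketch:

```python
from math import sqrt, erf

def normal_cdf(x):
    # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

# sample data from the example: 36 of 72 men, 31 of 50 women said Yes
x1, n1, x2, n2 = 36, 72, 31, 50
p1, p2 = x1 / n1, x2 / n2               # .50 and .62
p_pool = (x1 + x2) / (n1 + n2)          # pooled estimate under H0: p1 = p2
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                      # about -1.31
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-tailed p-value, about .19
```

Since the p-value (≈ .19) exceeds .05, the null hypothesis of equal proportions is not rejected for these data.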
FORMING INTERVAL ESTIMATES:
POPULATION MEANS, INDEPENDENT SAMPLES

*THE POPULATION VARIANCES ARE ASSUMED EQUAL, SO USE THE 2 SAMPLE STANDARD DEVIATIONS AND POOL THEM TO ESTIMATE σ
*THE TEST STATISTIC IS A T-VALUE WITH n1 + n2 − 2 DEGREES OF FREEDOM
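The pooled-variance t statistic described above can be sketched as follows; the sample numbers in the usage line are hypothetical:

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled-variance t statistic with n1 + n2 - 2 degrees of freedom,
    assuming the two population variances are equal."""
    # pool the two sample variances, weighting by degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# hypothetical samples; compare t to a t-table value at the chosen alpha
t, df = pooled_t(5.2, 1.1, 12, 4.5, 0.9, 15)
```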
PAIRED SAMPLES SOLUTION
HAS THE TRAINING MADE A DIFFERENCE IN THE NUMBER OF COMPLAINTS (AT THE 0.01 LEVEL)?

TEST STATISTIC: 1.66
THE CONFIDENCE INTERVAL FOR μd IS:

PAIRED SAMPLES

d̄ ± t_(α/2) ( s_d / √n )

n IS THE NUMBER OF PAIRS IN THE PAIRED SAMPLE

POPULATION MEANS, INDEPENDENT SAMPLES
THE POOLED STANDARD DEVIATION IS:

s_p = √( [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) )
POPULATION MEANS, INDEPENDENT SAMPLES

ASSUMPTIONS:
*POPULATIONS ARE NORMALLY DISTRIBUTED
*THE POPULATIONS HAVE EQUAL VARIANCES
*SAMPLES ARE INDEPENDENT

HYPOTHESIS TESTING FOR PAIRED SAMPLES

THE TEST STATISTIC FOR μd IS:

PAIRED SAMPLES

t = (d̄ − μd) / ( s_d / √n )

n IS THE NUMBER OF PAIRS IN THE PAIRED SAMPLES
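A minimal sketch of this paired-samples statistic; the before/after complaint counts are hypothetical illustration values:

```python
from math import sqrt

def paired_t(before, after, mu_d=0.0):
    """t = (dbar - mu_d) / (s_d / sqrt(n)), with n - 1 degrees of freedom."""
    diffs = [a - b for b, a in zip(before, after)]  # d_i = after_i - before_i
    n = len(diffs)
    dbar = sum(diffs) / n                           # point estimate of mu_d
    s_d = sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))
    return (dbar - mu_d) / (s_d / sqrt(n)), n - 1

# hypothetical complaint counts for 5 salespeople, before and after training
t, df = paired_t(before=[6, 20, 3, 0, 4], after=[4, 6, 2, 0, 0])
```

With these numbers, t ≈ −1.66 on 4 degrees of freedom, which would be compared to a t-table critical value at the chosen significance level.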
POPULATION MEANS, INDEPENDENT SAMPLES
THE CONFIDENCE INTERVAL FOR μ1 − μ2 IS:

(x̄1 − x̄2) ± t_(α/2) s_p √( 1/n1 + 1/n2 )
CALCULATING THE TEST STATISTIC
The test statistic is:

t = [ (x̄1 − x̄2) − (μ1 − μ2) ] / ( s_p √( 1/n1 + 1/n2 ) )

HYPOTHESIS TESTING
Solution: t = 2.040
HYPOTHESIS TESTING FOR PAIRED SAMPLES
PAIRED SAMPLES

CONFIDENCE INTERVAL FOR 2 POPULATION PROPORTIONS

POPULATION PROPORTIONS
THE CONFIDENCE INTERVAL FOR p1 − p2 IS:

(p̂1 − p̂2) ± z_(α/2) √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )
HYPOTHESIS TESTS FOR 2 POPULATION PROPORTIONS

POPULATION PROPORTIONS

Since we begin by assuming the null hypothesis is true, we assume p1 = p2 and pool the two estimates.

The pooled estimate for the overall proportion is:

p̄ = (x1 + x2) / (n1 + n2)

where x1 and x2 are the numbers of successes from samples 1 and 2.
HYPOTHESIS TESTS FOR 2 POPULATION PROPORTIONS

The test statistic is:

z = (p̂1 − p̂2) / √( p̄ (1 − p̄) (1/n1 + 1/n2) )

EXAMPLE

THE HYPOTHESIS TEST IS:
H0: p1 − p2 = 0 (the two proportions are equal)
HA: p1 − p2 ≠ 0 (there is a significant difference)

THE SAMPLE PROPORTIONS ARE:
MEN: p̂1 = 36/72 = .50
WOMEN: p̂2 = 31/50 = .62

THE POOLED ESTIMATE FOR THE OVERALL PROPORTION IS:
p̄ = (x1 + x2) / (n1 + n2) = (36 + 31) / (72 + 50) = 67/122 = .549
CONTINGENCY TABLE EXAMPLE
LEFT-HANDED VS. GENDER
DOMINANT HAND: LEFT VS. RIGHT
GENDER: MALE VS. FEMALE

SAMPLE RESULTS IN A CONTINGENCY TABLE

LOGIC OF THE TEST:
OBSERVED VS. EXPECTED FREQUENCY
EXAMPLE: 2 POPULATION FREQUENCY
PAIRED DIFFERENCES

PAIRED SAMPLES
THE POINT ESTIMATE FOR THE POPULATION MEAN PAIRED DIFFERENCE IS:

d̄ = ( Σ di ) / n

THE SAMPLE STANDARD DEVIATION IS:

s_d = √( Σ (di − d̄)² / (n − 1) )

n IS THE NUMBER OF PAIRS IN THE PAIRED SAMPLE

EXPECTED CELL FREQUENCIES
HYPOTHESIS TEST FOR 2 POPULATION MEANS, INDEPENDENT SAMPLES

HYPOTHESIS TESTING FOR THE RATIO OF THE VARIANCES OF TWO NORMALLY DISTRIBUTED POPULATIONS
As must be obvious by now, we are taking each two-population confidence interval and adapting it to the hypothesis-testing procedure. Next up is the ratio of the two unknown variances of two normally distributed populations.

RATIO OF VARIANCES:

F = s1² / s2²
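A minimal sketch of the variance-ratio statistic, assuming normal populations; the sample standard deviations and sizes are hypothetical, and F would be compared to an F-table critical value with the stated degrees of freedom:

```python
def f_statistic(s1, n1, s2, n2):
    """F ratio of two sample variances; the larger variance is placed
    in the numerator so that F >= 1."""
    v1, v2 = s1**2, s2**2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# hypothetical samples: s1 = 3.0 (n1 = 21), s2 = 2.0 (n2 = 16)
F, df1, df2 = f_statistic(3.0, 21, 2.0, 16)   # F = 2.25 with (20, 15) d.f.
```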
HYPOTHESIS TESTING FOR THE DIFFERENCE IN THE PROPORTIONS OF TWO POPULATIONS
TESTING HYPOTHESES AND CONFIDENCE
INTERVAL
There is a relationship between confidence intervals and
hypothesis testing. When the null hypothesis is rejected in a
hypothesis-testing situation, the confidence interval for the
mean using the same level of significance will not contain the
hypothesized mean. Likewise, when the null hypothesis is not
rejected, the confidence interval computed using the same
level of significance will contain the hypothesized mean.
EXAMPLE
Sugar Production: Sugar is packed in 5-pound bags. An inspector suspects the bags may not contain 5 pounds. A sample of 50 bags produces a mean of 4.6 pounds and a standard deviation of 0.7 pound. Is there enough evidence to conclude that the bags do not contain 5 pounds as stated, at α = 0.05? Also, find the 95% confidence interval of the true mean.
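A worked sketch of the sugar example; the critical value 2.010 used below is an approximate two-tailed t-table entry for 49 degrees of freedom at α = 0.05:

```python
from math import sqrt

n, xbar, s, mu0 = 50, 4.6, 0.7, 5.0
se = s / sqrt(n)                  # about 0.099
t = (xbar - mu0) / se             # about -4.04
t_crit = 2.010                    # approximate t-table value, d.f. = 49, alpha = .05
reject = abs(t) > t_crit          # True: conclude the bags do not contain 5 pounds

# 95% confidence interval for the true mean: roughly (4.40, 4.80)
lo, hi = xbar - t_crit * se, xbar + t_crit * se
contains_mu0 = lo <= mu0 <= hi    # False: the interval agrees with the rejection
```

The null hypothesis is rejected and, consistently, the 95% confidence interval does not contain the hypothesized mean of 5 pounds.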

In summary, then, when the null hypothesis is rejected at a significance level of α, the confidence interval computed at the 1 − α level will not contain the value of the mean that is stated in the null hypothesis. On the other hand, when the null hypothesis is not rejected, the confidence interval computed at the same significance level will contain the value of the mean stated in the null hypothesis. These results are true for other hypothesis-testing situations and are not limited to means tests. The relationship between confidence intervals and hypothesis testing presented here is valid for two-tailed tests. The relationship between one-tailed hypothesis tests and one-sided (one-tailed) confidence intervals is also valid.
The hypothesis testing and confidence interval results always agree. To understand the basis of this
agreement, remember how confidence levels and significance levels function:

A confidence level determines the distance between the sample mean and the confidence limits.

A significance level determines the distance between the null hypothesis value and the critical regions.

Both of these concepts specify a distance from the mean to a limit. Surprise! These distances are
precisely the same length.

A 1-sample t-test calculates this distance as follows:

The critical t-value * standard error of the mean

Interpreting these statistics goes beyond the scope of this article. But, using this equation, the distance for
our fuel cost example is $63.57.

P-value and significance level approach: If the sample mean is more than $63.57 from the null hypothesis
mean, the sample mean falls within the critical region, and the difference is statistically significant.

Confidence interval approach: If the null hypothesis mean is more than $63.57 from the sample mean, the
interval does not contain this value, and the difference is statistically significant.
Approximate Hypothesis Tests: the z Test and the t Test

The z test and the t test are the two common tests of the hypothesis that a population mean equals a particular value, and of the hypothesis that two population means are equal.
They are based on approximations to the probability distribution of the test statistic when the null hypothesis is true, so their significance levels are not exactly what they claim to be.
The tests are accurate if the sample size is large and the population is nearly normal, but they can be inaccurate otherwise.
The z test uses the normal approximation, while the t test uses the Student's t curve.
Hypothesis tests and confidence intervals are related.
Hypothesis tests and confidence intervals are related.
TESTING HYPOTHESES AND CONFIDENCE INTERVAL USING Z STATISTICS
A Z-test is a type of statistical hypothesis test where the test statistic follows a normal distribution.

The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistic is from the population's mean.

Z-tests are among the most common statistical tests conducted in fields such as healthcare and data science. Therefore, it's an essential concept to understand.
REQUIREMENTS FOR A Z-TEST
In order to conduct a Z-test, your statistics need to meet a few requirements, including:

A sample size that's greater than 30. This is because we want to ensure our sample mean comes from a distribution that is normal. As stated by the central limit theorem, the distribution of the sample mean can be approximated as normal if the sample contains more than 30 data points.

The standard deviation and mean of the population are known.

The sample data is collected/acquired randomly.
Z-TEST STEPS

1. STATE THE NULL HYPOTHESIS. The null hypothesis, or Ho, is always what the researcher believes to be true for the researched population: Ho: μ = μ0.
2. STATE THE ALTERNATIVE HYPOTHESIS. The alternate hypothesis, or H1, is always the opposite of the null hypothesis. This happens when the sample mean is not equal to the population mean: H1: μ is not equal to μ0.
3. CHOOSE YOUR CRITICAL VALUE. The critical value determines whether the null hypothesis is rejected or not. This is based on confidence intervals.
4. CALCULATE YOUR Z-TEST from: the sample mean, μ_1; the population mean, μ_0; the number of data points in the sample, n; and the population's standard deviation, σ:

z = (μ_1 − μ_0) / ( σ / √n )

NOTE: If the test statistic is greater (or lower, depending on the test we are conducting) than the critical value, then the alternate hypothesis is accepted, because the sample's mean differs from the population mean by a statistically significant amount.
Z-TEST EXAMPLE

A school says that its pupils are, on average, smarter than those of other schools. It takes a sample of 50 students whose average IQ measures to be 110. The population, or the rest of the schools, has an average IQ of 100 and a standard deviation of 20.
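A worked sketch of this example using a right-tail z test (the claim is that the school's mean is higher than 100):

```python
from math import sqrt, erf

def normal_cdf(x):
    # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

n, xbar, mu0, sigma = 50, 110, 100, 20
z = (xbar - mu0) / (sigma / sqrt(n))   # about 3.54
p_value = 1 - normal_cdf(z)            # right-tail p-value, about .0002
```

Since the p-value is far below common significance levels such as .05, the sample supports the school's claim.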
Suppose that we are testing a null hypothesis using a test statistic X, and the following conditions
hold:

We have a probability model for how the observations arise, assuming the null hypothesis is true.
Typically, the model is that under the null hypothesis, the data are like random draws with or
without replacement from a box of numbered tickets.

Under the null hypothesis, the test statistic X, converted to standard units, has a probability
histogram that can be approximated well by the normal curve.

Under the null hypothesis, we can find the expected value of the test statistic, X.

Under the null hypothesis, either we can find the SE of the test statistic, SE(X), or we can estimate SE(X) accurately enough to ignore the error of the estimate of the SE. Let se denote either the exact SE of X under the null hypothesis, or the estimated value of SE(X) under the null hypothesis.
Then, under the null hypothesis, the probability histogram of the Z statistic Z = (X − E(X))/se is approximated well by the normal curve, and we can use the normal approximation to select the rejection region for the test using Z as the test statistic. If the null hypothesis is true,

P(Z < za) ≈ a,

P(Z > z1−a) ≈ a, and

P(|Z| > z1−a/2) ≈ a.

These three approximations yield three different z tests of the hypothesis that μ = μ0 at approximate significance level a:

Reject the null hypothesis whenever Z < za (left-tail z test)
Reject the null hypothesis whenever Z > z1−a (right-tail z test)
Reject the null hypothesis whenever |Z| > z1−a/2 (two-tail z test)
The word "tail" refers to the tails of the normal curve:

In a left-tail test, the probability of a TYPE 1 ERROR is approximately the area of the left tail of the normal curve, from minus infinity to za.

In a right-tail test, the probability of a TYPE 1 ERROR is approximately the area of the right tail of the normal curve, from z1−a to infinity.

In a two-tail test, the probability of a TYPE 1 ERROR is approximately the sum of the areas of both tails of the normal curve, the left tail from minus infinity to za/2 and the right tail from z1−a/2 to infinity.

These three tests are called z tests. The observed value of Z is called the Z score.
Which of these three tests, if any, should one use?

If, under the alternative hypothesis, E(Z) < 0, use the left-tail test.
If, under the alternative hypothesis, E(Z) > 0, use the right-tail test.
If, under the alternative hypothesis, it is possible that E(Z) < 0 and it is possible that E(Z) > 0, use the two-tail test.
If, under the alternative hypothesis, E(Z) = 0, consult a statistician.

The P value depends on the type of z test (left-tail, right-tail or two-tail) and the observed value
of the Z statistic, which is the test statistic in standard units under the null hypothesis.

P value is calculated using the normal approximation to the probability distribution of the Z
statistic under the null hypothesis.
P VALUE FOR Z TEST

Each of the three z tests gives us a family of procedures for testing the null hypothesis at any
(approximate) significance level a between 0 and 100%—we just use the appropriate quantile of the
normal curve. This makes it particularly easy to find the P value for a z test. Recall that the P value is
the smallest significance level for which we would reject the null hypothesis, among a family of tests
of the null hypothesis at different significance levels.

Suppose the z score (the observed value of Z) is x. In a left-tail test, the P value is the area under the
normal curve to the left of x: Had we chosen the significance level a so that za=x, we would have
rejected the null hypothesis, but we would not have rejected it for any smaller value of a, because for
all smaller values of a, za<x. Similarly, for a right-tail z test, the P value is the area under the normal
curve to the right of x: If x=z1−a we would reject the null hypothesis at approximate significance level
a, but not at smaller significance levels. For a two-tail z test, the P value is the sum of the area under
the normal curve to the left of −|x| and the area under the normal curve to the right of |x|.

Finding P values and specifying the rejection region for the z test involves the probability distribution
of Z under the assumption that the null hypothesis is true. Rarely is the alternative hypothesis
sufficiently detailed to specify the probability distribution of Z completely, but often the alternative
does help us choose intelligently among left-tail, right-tail, and two-tail z tests. This is perhaps the
most important issue in deciding which hypothesis to take as the null hypothesis and which as the
alternative: We calculate the significance level under the null hypothesis, and that calculation must
be tractable.
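The three P value cases can be collected into one small function; a minimal sketch using the standard normal CDF:

```python
from math import sqrt, erf

def normal_cdf(x):
    # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

def z_p_value(z, tail):
    """P value of an observed z score under the three z tests."""
    if tail == "left":
        return normal_cdf(z)                 # area to the left of z
    if tail == "right":
        return 1 - normal_cdf(z)             # area to the right of z
    return 2 * (1 - normal_cdf(abs(z)))      # two-tail: both tail areas

# e.g. a z score of -1.96 gives a left-tail P value of about .025
```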
Z TEST FOR A POPULATION PERCENTAGE

Suppose we have a population of N units of which G are labeled "1" and the rest are labeled "0." Let p = G/N be the population percentage. Consider testing the null hypothesis that p = p0 against the alternative hypothesis that p ≠ p0, using a random sample of n units drawn with replacement. (We could assume instead that N >> n and allow the draws to be without replacement.)

Under the null hypothesis, the sample percentage ϕ has expected value E(ϕ) = p0 and standard error

SE(ϕ) = ( p0 (1 − p0) / n )^(1/2).

Let Z be ϕ transformed to standard units:

Z = (ϕ − p0) / SE(ϕ).

Provided n is large and p0 is not too close to zero or 100% (say n×p > 30 and n×(1−p) > 30), the probability histogram of Z will be approximated reasonably well by the normal curve, and we can use it as the Z statistic in a z test. For example, if we reject the null hypothesis when |Z| > 1.96, the significance level of the test will be about 5%.
Z TEST FOR A POPULATION MEAN

Consider testing the null hypothesis that the population mean μ is equal to a specific null value μ0, against the alternative hypothesis that μ < μ0, on the basis of a random sample with replacement of size n. Recall that the sample mean M of n random draws with or without replacement from a box of numbered tickets is an unbiased estimator of the population mean μ:

E(M) = μ = (sum of all N numbers in the box) / N,

where N is the size of the population. The population mean determines the expected value of the sample mean. The SE of the sample mean of a random sample with replacement is

SE(M) = SD(box) / n^(1/2),

where SD(box) is the SD of the list of all the numbers in the box, and n is the sample size. As a special case, the sample percentage ϕ of n independent random draws from a 0-1 box is an unbiased estimator of the population percentage p, with SE equal to

SE(ϕ) = ( p (1 − p) / n )^(1/2).
Z TEST FOR A DIFFERENCE OF POPULATION MEANS
Paired Samples

Consider a population of N individuals, each of whom is labeled with two numbers. For example, the N individuals might be a
group of doctors, and the two numbers that label each doctor might be the annual payments to the doctor by an HMO under
the terms of the current contract and under the terms of a proposed revision of the contract. Let the two numbers
associated with individual i be ci and ti. (Think of c as control and t as treatment. In this example, control is the current
contract, and treatment is the proposed contract.) Let μc be the population mean of the N values

{c1,c2,…,cN},

and let μt be the population mean of the N values

{t1,t2,…,tN}.

Suppose we want to test the null hypothesis that

μ = μt − μc = μ0

against the alternative hypothesis that μ<μ0. With μ0=$0, this null hypothesis is that the average annual payment to doctors
under the proposed revision would be the same as the average payment under the current contract, and the alternative is
that on average doctors would be paid less under the new contract than under the current contract. With μ0=−$5,000, this
null hypothesis is that the proposed contract would save the HMO an average of $5,000 per doctor, compared with the
current contract; the alternative is that under the proposed contract, the HMO would save even more than that. With
μ0=$1,000, this null hypothesis is that doctors would be paid an average of $1,000 more per year under the new contract
than under the old one; the alternative hypothesis is that on average doctors would be paid less than an additional $1,000
per year under the new contract—perhaps even less than they are paid under the current contract. For the remainder of this
example, we shall take μ0=$1,000.
The data on which we shall base the test are observations of both ci and ti for a sample of n individuals chosen at random with replacement from the population of N individuals (or a simple random sample of size n << N): We select n doctors at random from the N doctors under contract to the HMO, record the current annual payments to them, and calculate what the payments to them would be under the terms of the new contract. This is called a paired sample, because the samples from the population of control values and from the population of treatment values come in pairs: one value for control and one for treatment for each individual in the sample. Testing the hypothesis that the difference between two population means is equal to μ0 using a paired sample is just the problem of testing the hypothesis that the population mean μ of the set of differences

{d1 = t1 − c1, d2 = t2 − c2, …, dN = tN − cN}

is equal to μ0. Denote the n (random) observed values of ci and ti by {C1,C2,…,Cn} and {T1,T2,…,Tn}, respectively. The sample mean M of the differences between the observed values of ti and ci is the difference of the two sample means:

M = (sample mean of observed values of ti) − (sample mean of observed values of ci).
M is an unbiased estimator of μ, and if n is large, the normal approximation to its probability histogram will be accurate. The SE of M is the population standard deviation of the N values {d1,d2,…,dN}, which we shall denote SD(d), divided by the square root of the sample size, n^(1/2). Let sd denote the sample standard deviation of the n observed differences (Ti−Ci), i=1,2,…,n (recall that M is the sample mean of the observed differences). If the sample size n is large, sd is very likely to be close to SD(d), and so, under the null hypothesis,

Z = (M − μ0) / ( sd / n^(1/2) )

has expected value zero, and when n is large the probability histogram of Z can be approximated well by the normal curve. Thus we can use Z as the Z statistic in a z test of the null hypothesis that μ = μ0. Under the alternative hypothesis that μ < μ0 (doctors on the average are paid less than an additional $1,000 per year under the new contract), the expected value of Z is less than zero, so we should use a left-tail z test. Under the alternative hypothesis μ ≠ μ0 (on average, the difference in average annual payments to doctors is not an increase of $1,000, but some other number instead), the expected value of Z could be positive or negative, so we would use a two-tail z test. Under the alternative hypothesis that μ > μ0 (on average, under the new contract, doctors are paid more than an additional $1,000 per year), the expected value of Z would be greater than zero, so we should use a right-tail z test.
Independent Samples

Consider two separate populations of numbers, with population means μt and μc, respectively. Let μ = μt − μc be the difference between the two population means. We would like to test the null hypothesis that μ = μ0 against the alternative hypothesis that μ > μ0. For example, let μt be the average annual payment by an HMO to doctors in the Los Angeles area, and let μc be the average annual payment by the same HMO to doctors in the San Francisco area. Then the null hypothesis with μ0 = 0 is that the HMO pays doctors in the two regions the same amount annually, on average; the alternative hypothesis is that the average annual payment by the HMO to doctors differs between the two areas. Suppose we draw a random sample of size nt with replacement from the first population, and independently draw a random sample of size nc with replacement from the second population. Let Mt and Mc be the sample means of the two samples, respectively, and let

M = Mt − Mc

be the difference between the two sample means. Because the expected value of Mt is μt and the expected value of Mc is μc, the expected value of M is

E(M) = μt − μc = μ.

Because the two random samples are independent, Mt and −Mc are independent random variables, and the SE of their sum is

SE(M) = ( SE²(Mt) + SE²(Mc) )^(1/2).

Let st and sc be the sample standard deviations of the two samples, respectively. If nt and nc are both very large, the two sample standard deviations are likely to be close to the standard deviations of the corresponding populations, and so st/nt^(1/2) is likely to be close to SE(Mt), and sc/nc^(1/2) is likely to be close to SE(Mc). It follows that

se = ( st²/nt + sc²/nc )^(1/2)

is likely to be close to SE(M). Under the null hypothesis, the statistic

Z = (M − μ0) / se

has expected value zero and its probability histogram is approximated well by the normal curve, so we can use it as the Z statistic in a z test.

Under the alternative hypothesis μ > μ0, the expected value of Z is greater than zero, so it is appropriate to use a right-tail z test.

If the alternative hypothesis were μ≠μ0, under the alternative the expected value of Z could be greater
than zero or less than zero, so it would be appropriate to use a two-tail z test. If the alternative hypothesis
were μ<μ0, under the alternative the expected value of Z would be less than zero, so it would be
appropriate to use a left-tail z test.
Nearly Normally Distributed Populations

A list is nearly normally distributed if the normal curve is a good approximation to the histogram of the list transformed to standard units: if the list has mean μ and standard deviation SD then, for every pair of values a < b, the fraction of the list's values between a and b is approximately the area under the normal curve between (a − μ)/SD and (b − μ)/SD. The histogram of a list that is approximately normally distributed is (nearly) symmetric about some point, and is (nearly) bell-shaped.

No finite population can be exactly normally distributed, because the area under the normal curve
between every two distinct values is strictly positive—no matter how large or small the values nor how
close together they are. No population that contains only a finite number of distinct values can be exactly
normally distributed, for the same reason. In particular, populations that contain only zeros and ones are
not approximately normally distributed, so results for the sample mean of samples drawn from nearly
normally distributed populations need not apply to the sample percentage of samples drawn from 0-1
boxes. Such results will be more accurate for the sample percentage when the population percentage is
close to 50% than when the population percentage is close to 0% or 100%, because then the histogram
of population values is more nearly symmetric.
TESTING HYPOTHESES AND CONFIDENCE
INTERVAL USING T STATISTICS
T Test for a Mean
When the population standard deviation is unknown, the
z test is not normally used for testing hypotheses
involving means. A different test, called the t test, is
used. The distribution of the variable should be
approximately normal.
The t distribution is similar to the standard normal distribution in the following ways.

1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x axis.

The t distribution differs from the standard normal distribution in the following ways.
1. The variance is greater than 1.
2. The t distribution is a family of curves based on the degrees of freedom, which is a number
related to sample size. (Recall that the symbol for degrees of freedom is d.f. See Section 7–2
for an explanation of degrees of freedom.)
3. As the sample size increases, the t distribution approaches the normal distribution.

The t test is defined next.


The formula for the t test is similar to the formula for the z test. But since the population standard deviation σ is unknown, the sample standard deviation s is used instead. The critical values for the t test are given in Table F in Appendix C. For a one-tailed test, find the a level by looking at the top row of the table and finding the appropriate column. Find the degrees of freedom by looking down the left-hand column. Notice that the degrees of freedom are given for values from 1 through 30, then at intervals above 30. When the degrees of freedom are above 30, some textbooks will tell you to use the nearest table value; however, in this textbook, you should always round down to the nearest table value. For example, if d.f. = 59, use d.f. = 55 to find the critical value or values. This is a conservative approach. As the degrees of freedom get larger, the critical values approach the z values.
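The round-down rule can be sketched as a simple table lookup. The d.f. rows listed here are an assumption about how Table F is laid out above 30 (intervals of 5, then 10), not a reproduction of the actual table:

```python
# Sketch of the round-down rule for Table F when d.f. is above 30.
# Rows 1..30 appear individually; the rows above 30 are assumed,
# for illustration only, to be listed at wider intervals.
table_df = list(range(1, 31)) + [35, 40, 45, 50, 55, 60, 70, 80, 90, 100]

def table_row(df):
    """Largest listed d.f. that does not exceed the actual d.f."""
    return max(d for d in table_df if d <= df)

# d.f. = 59 is not a table row, so round down to 55, as in the text.
print(table_row(59))  # -> 55
```

Rounding down rather than to the nearest row gives a slightly larger critical value, which is why the text calls this the conservative approach.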
T TEST
The z test is based on the normal distribution and requires a large sample size to be
accurate.
The t test is based on Student’s t distribution and can be used for small sample sizes.
The t distribution is wider than the normal distribution and depends on the sample size. The
smaller the sample size, the more spread out the t distribution is.

STUDENT'S T-CURVE
Student's t curve is similar to the normal curve, but broader. It is positive, has a single
maximum, and is symmetric about zero. The total area under Student's t curve is 100%.
Student's t curve approximates some probability histograms more accurately than the
normal curve does. There are actually infinitely many Student t curves, one for each positive
integer value of the degrees of freedom. As the degrees of freedom increases, the difference
between Student's t curve and the normal curve decreases.
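The narrowing gap between the two curves can be checked numerically from the standard formula for Student's t density (a stdlib-only sketch; the density formula is the textbook one, not an approximation):

```python
import math

def t_density(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_density(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# The t curve is broader: its peak at 0 sits below the normal peak
# and rises toward it as the degrees of freedom grow.
for df in (2, 5, 30, 300):
    print(df, round(t_density(0, df), 4))
print("normal", round(normal_density(0), 4))
```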
T Test for the Mean of a Nearly Normally Distributed Population
Consider testing the null hypothesis that μ = μ0 using the sample mean M and sample standard deviation s of a random sample of size n drawn with replacement from a population that is known to have a nearly normal distribution. Define

T = (M − μ0) / (s / √n)

Student's t curve with n − 1 degrees of freedom will be an accurate approximation to the probability histogram of T.
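Computing T is mechanical; a minimal sketch, assuming the definition T = (M − μ0)/(s/√n) with s the sample standard deviation (divisor n − 1). The sample values here are hypothetical:

```python
import math
import statistics

def t_statistic(sample, mu0):
    """T = (M - mu0) / (s / sqrt(n)) for a one-sample t test."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, divisor n - 1
    return (m - mu0) / (s / math.sqrt(n))

# Hypothetical sample, testing the null hypothesis mu = 10.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
print(round(t_statistic(sample, 10), 3))  # -> 1.342
```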
As we saw earlier in this chapter for the Z statistic, these three approximations give three
tests of the null hypothesis μ=μ0 at approximate significance level a—a left-tail t test, a
right-tail t test, and a two-tail t test:

Reject the null hypothesis if T < t(n−1, a) (left-tail)

Reject the null hypothesis if T > t(n−1, 1−a) (right-tail)

Reject the null hypothesis if |T| > t(n−1, 1−a/2) (two-tail)

Use a left-tail test if the expected value of T is less than zero.

Use a right-tail test if the expected value of T is greater than zero.

Use a two-tail test if the expected value of T is not zero, but could be either less than or greater than zero.

Consult a statistician if the expected value of T is zero.
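The three rejection rules can be sketched as one decision function. The critical values below are standard t-table values for 9 degrees of freedom; the observed t values are made up for illustration:

```python
def t_decision(t, tail, crit):
    """Apply the t-test rejection rules.

    tail is 'left', 'right', or 'two'; crit is the relevant table value:
    t(n-1, a) for left, t(n-1, 1-a) for right, t(n-1, 1-a/2) for two.
    Returns True if the null hypothesis is rejected."""
    if tail == "left":
        return t < crit
    if tail == "right":
        return t > crit
    if tail == "two":
        return abs(t) > crit
    raise ValueError("tail must be 'left', 'right', or 'two'")

# With 9 d.f. and a = 5%: t(9, 0.95) ~ 1.833 and t(9, 0.975) ~ 2.262.
print(t_decision(2.1, "right", 1.833))   # -> True  (reject)
print(t_decision(2.1, "two", 2.262))     # -> False (do not reject)
print(t_decision(-2.5, "left", -1.833))  # -> True  (reject)
```

Note that the same observed t of 2.1 is significant in the right-tail test but not in the two-tail test, because the two-tail critical value is larger.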


P-values for t tests are computed the same way as P-values for z tests.

In a left-tail t test, the P-value is the area under Student's t curve with n − 1 degrees of freedom from minus infinity to t.

In a right-tail t test, the P-value is the area under Student's t curve with n − 1 degrees of freedom from t to infinity.

In a two-tail t test, the P-value is the area under Student's t curve with n − 1 degrees of freedom to the left of −|t| plus the area to the right of |t|.
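Since each P-value is just an area under the t curve, it can be approximated by numerically integrating the t density. A stdlib-only sketch (the trapezoid rule and the cutoff at 50 are crude but adequate for moderate degrees of freedom):

```python
import math

def right_tail_area(t_obs, df, upper=50.0, steps=20000):
    """Area under Student's t curve with df degrees of freedom from
    t_obs to infinity, by the trapezoid rule; the tail beyond `upper`
    is negligible for moderate df."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    f = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t_obs) / steps
    area = 0.5 * (f(t_obs) + f(upper))
    for i in range(1, steps):
        area += f(t_obs + i * h)
    return area * h

df, t = 9, 2.262  # 2.262 is about the 97.5th percentile for 9 d.f.
p_right = right_tail_area(t, df)           # right-tail P-value
p_left = right_tail_area(-t, df)           # left tail, by symmetry:
                                           # area(-inf, t) = area(-t, inf)
p_two = 2 * right_tail_area(abs(t), df)    # two-tail P-value
print(round(p_right, 3), round(p_two, 3))  # -> 0.025 0.05
```

In practice a statistics library would supply the t distribution's CDF directly; the integration here only makes the "P-value is an area" idea concrete.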

The T statistic to test the null hypothesis that μ = μ0 is

T = (M − μ0) / (s / √n)

The appropriate curve to use to find the rejection region is Student's t curve with n − 1 degrees of freedom, where n is the number of individuals in the sample.
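Putting the pieces together, an end-to-end two-tail test at a = 5% might look like the following sketch. The data and the null value are hypothetical; the critical value 2.365 is the standard table value t(7, 0.975) for n − 1 = 7 degrees of freedom:

```python
import math
import statistics

# Hypothetical two-tail t test of H0: mu = 100 at a = 5%.
sample = [102.1, 98.4, 101.7, 99.9, 103.2, 100.8, 97.6, 101.3]
mu0 = 100
n = len(sample)
M = statistics.mean(sample)
s = statistics.stdev(sample)   # sample SD, divisor n - 1
T = (M - mu0) / (s / math.sqrt(n))

crit = 2.365                   # t(7, 0.975) from a standard t table
reject = abs(T) > crit
print(round(T, 3), reject)     # -> 0.934 False
```

Here |T| falls well inside the critical value, so the sample gives no evidence against μ = 100.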
GROUP 3 MEMBERS:
Biñas, Mica
Cabe, Aira Jane
Payot, Rizza Mae
Puyat, Rocelle
Odon, Nathaniel
Piloton, Brandon
Reynaldo, Karl Andrei
Santonia, Jhon Mark
Tono, Kahlil Alexis
Buanghog, Argie
Cris, Jacob Ivan
Ebale, Karl Kenneth
Leosala, John Rafael

INSTRUCTOR: Mr. John Lester Fabello
