REVIEWER
IMPORTANT FORMULAE
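Z-TEST FOR PROPORTION
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$
Here $\hat{p}$ is the sample proportion, $p$ is the hypothesized proportion, $q = 1 - p$, and $n$ is the sample size.
There are only two outcomes, and the probability of success does not change from trial to trial (the outcomes of each trial are independent).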
TWO DEPENDENT SAMPLE T-TEST
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$
WHERE:
$$\bar{D} = \frac{\sum D}{n} \quad \text{AND} \quad S_D = \sqrt{\frac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}}$$
The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of observations. Common applications of the paired sample t-test include case-control studies or repeated-measures designs. Suppose you are interested in evaluating the effectiveness of a company training program. One approach you might consider would be to measure the performance of a sample of employees before and after completing the program, and analyze the differences using a paired sample t-test.
PEARSON PRODUCT-MOMENT CORRELATION
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$$
The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far the data points are from this line of best fit (i.e., how well the data points fit this line).
SPEARMAN RANK-ORDER CORRELATION
$$\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
The Spearman's rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (ρ, also signified by r_s) measures the strength and direction of association between two ranked variables.
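The two correlation formulas above can be checked with a short Python sketch. The data (`hours`, `scores`) are made up for illustration, and the helper `ranks` assumes there are no tied values, which is also what the simple Spearman formula assumes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from the raw-sums formula in the formula list."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rho = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), D = rank difference."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 71, 80]

r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(round(r, 4), round(rho, 2))
```

Both coefficients come out close to 1 here because the made-up scores rise almost steadily with hours.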
EXAMPLE # 1
A principal at a certain school claims that the students in his school are of above-average intelligence. The IQ scores of a random sample of thirty students have a mean of 112. Is there sufficient evidence to support the principal's claim? The mean population IQ is 100 with a standard deviation of 15.
Step 1:
State the Null hypothesis and Alternative Hypothesis
H0: μ = 100
H1: μ > 100
The fact that we are looking for scores “greater than” a certain point means that this is a one-tailed test.
Step 2:
State the alpha level. If you aren’t given an alpha level, use 5% (0.05).
Step 3:
Find the rejection region area (given by your alpha level above) from the z-table. An upper-tail area of 0.05 corresponds to a z-score of 1.645.
Step 4:
Find the test statistic using this formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{112 - 100}{15 / \sqrt{30}} \approx 4.38$$
Step 5:
If Step 4 is greater than Step 3, reject the null hypothesis. If it's less than Step 3, you cannot reject the null hypothesis. In this case, it is greater (4.38 > 1.645), so you can reject the null.
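Step 6: Conclusion
At the 0.05 level, there is sufficient evidence to support the principal's claim that the mean IQ of students in his school is above 100.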
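The steps of Example #1 can be reproduced with a small Python sketch. The function name `one_sample_z` is an illustrative choice; the inputs are the values stated in the problem (sample mean 112, population mean 100, standard deviation 15, n = 30). With those inputs the statistic evaluates to about 4.38.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

alpha = 0.05
z = one_sample_z(112, 100, 15, 30)

# One-tailed (upper) critical value: the z-score with area alpha to its right.
z_critical = NormalDist().inv_cdf(1 - alpha)

reject_null = z > z_critical
print(round(z, 2), round(z_critical, 3), reject_null)
```

The critical value comes from the standard library instead of a printed z-table, so it matches the table value of 1.645 only after rounding.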
EXAMPLE #2
Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or negative
effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an
effect.
Step 1:
State the null hypothesis and Alternative hypothesis:
H0: μ = 100
H1: μ ≠ 100
Step 2:
State your alpha level. We’ll use 0.05 for this example. As this is a two-tailed test, split the alpha into two.
0.05/2=0.025
Step 3: Find the z-score associated with your alpha level. You're looking for the area in one tail only. The z-score for a cumulative area of 0.975 (1 - 0.025 = 0.975) is 1.96. As this is a two-tailed test, you would also consider the left tail (z = -1.96).
Step 4: Find the test statistic using this formula:
z = (140 - 100)/(15/√30) ≈ 14.61
Step 5: If Step 4 is less than -1.96 or greater than 1.96 (Step 3), reject the null hypothesis. In this case, it is greater, so you can reject the null.
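Step 6: Conclusion
At the 0.05 level, there is sufficient evidence that the raw cornstarch diet had an effect on blood glucose levels.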
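A two-tailed version of the same calculation, as a hedged sketch: the function `two_tailed_z_test` is a hypothetical helper, not part of any library, and it uses the numbers given in Example #2. It also reports a two-sided p-value, which the example itself does not compute.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Return (z, critical value, two-sided p-value, reject?)."""
    std_normal = NormalDist()
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Split alpha across both tails, as in Step 2 of the example.
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, z_critical, p_value, abs(z) > z_critical

z, z_critical, p_value, reject_null = two_tailed_z_test(140, 100, 15, 30)
print(round(z, 2), round(z_critical, 2), reject_null)
```

Comparing `abs(z)` with a single positive critical value is equivalent to checking z < -1.96 or z > 1.96.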
ONE SAMPLE T – TEST
In the population, the average IQ is 100. A team of scientists wants to test a new medication to see if it has either a positive or negative effect on intelligence, or no effect at all. A sample of 30 participants who have taken the medication has a mean of 140 with a standard deviation of 20. Did the medication affect intelligence? Use alpha = 0.05.
1. State the Hypotheses
H0: μ = 100
H1: μ ≠ 100
2. State Alpha
Alpha = 0.05
3. Calculate Degrees of Freedom
df = n - 1 = 30 - 1 = 29
4. State Decision Rule
Using an alpha of 0.05 with a two-tailed test and 29 degrees of freedom, the rejection region is split equally between the two tails of the t distribution.
Use the t-table to look up a two-tailed test with 29 degrees of freedom and an alpha of 0.05. We find a critical value of 2.0452. Thus, our decision rule for this two-tailed test is:
If t is less than -2.0452, or greater than 2.0452, reject the null hypothesis.
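5. Calculate Test Statistic
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{140 - 100}{20 / \sqrt{30}} = \frac{40}{3.65} \approx 10.96$$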
6. State Results
t = 10.96
Result: Reject the null hypothesis.
7. State Conclusion
Medication significantly affected intelligence, t = 10.96, p < 0.05.
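The one-sample t calculation can be sketched in Python as follows. The helper name `one_sample_t` is an assumption for illustration, and the critical value is simply the t-table value quoted in the decision rule, since the standard library has no t distribution. Without rounding the standard error the statistic is about 10.95; rounding the standard error to 3.65 first gives the 10.96 reported above.

```python
from math import sqrt

def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

n = 30
df = n - 1            # degrees of freedom, 29
T_CRITICAL = 2.0452   # two-tailed, alpha = 0.05, df = 29 (t-table value from the text)

t = one_sample_t(140, 100, 20, n)
reject_null = abs(t) > T_CRITICAL
print(df, round(t, 2), reject_null)
```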
TWO INDEPENDENT SAMPLE Z – TEST
Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.
Men Women
Characteristic n S n s
Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.
Step 1. Set up hypotheses and determine the level of significance.
H0: μ1 = μ2
H1: μ1 ≠ μ2
α = 0.05
Step 2. Select the appropriate test statistic.
Because both samples are large (> 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s1²/s2². Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5²/20.1² = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Step 3. Set up the decision rule.
This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < -1.960 or if Z > 1.960.
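Step 4. Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we first compute Sp, the pooled estimate of the common standard deviation:
$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$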
Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.
Now the test statistic: Z = 2.66.
Step 5. Conclusion.
We reject H0 because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in mean systolic blood pressures between men and
women. The p-value is p < 0.010.
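The variance-ratio check, the pooled standard deviation, and the pooled test statistic described in this example can be sketched in Python. Only the two standard deviations (17.5 and 20.1) come from the text; the sample sizes and means below are hypothetical placeholders chosen for illustration, so the resulting statistic is not the 2.66 reported above.

```python
from math import sqrt

def variance_ratio(s1, s2):
    """Ratio of sample variances; the guideline accepts values between 0.5 and 2."""
    return s1 ** 2 / s2 ** 2

def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of the common standard deviation, Sp."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_statistic(mean1, mean2, n1, s1, n2, s2):
    """(mean1 - mean2) / (Sp * sqrt(1/n1 + 1/n2)); used as Z or t depending on sample size."""
    sp = pooled_sd(n1, s1, n2, s2)
    return (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))

# Standard deviations from the text (men = group 1, women = group 2).
ratio = variance_ratio(17.5, 20.1)
equal_variances_ok = 0.5 <= ratio <= 2

# Hypothetical sample sizes and means, NOT the study data.
n1, mean1, s1 = 40, 130.0, 17.5
n2, mean2, s2 = 50, 120.0, 20.1

sp = pooled_sd(n1, s1, n2, s2)
z = pooled_statistic(mean1, mean2, n1, s1, n2, s2)
print(round(ratio, 2), equal_variances_ok, round(sp, 2), round(z, 2))
```

As the text notes, `sp` lands between the two group standard deviations and sits closer to the one from the larger group.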
Example # 2
A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are
enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is
asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.
Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test
using the five-step approach.
Step 2. Select the appropriate test statistic.
Because both samples are small (< 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances, s1²/s2² = 28.7²/30.3² = 0.90, falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad df = n_1 + n_2 - 2$$
We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common
standard deviation.
Step 5. Conclusion.
We reject H0 because -2.92 < -1.701. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6
weeks as compared to patients taking placebo, p < 0.005.
TWO INDEPENDENT SAMPLE T – TEST
TWO DEPENDENT SAMPLE T – TEST
The management of Resale Furniture, a chain of second-hand furniture stores in Metro Manila, designed an incentive plan for salespeople. To evaluate this innovative incentive plan, 10 salespeople were selected at random and their weekly incomes before and after the incentive plan were recorded.
Salesperson  1      2      3      4      5      6      7      8      9      10
Before       P2000  P3000  P2500  P3100  P1900  P1750  P2300  P3210  P2340  P1870
After        P3500  P4120  P3800  P4200  P1820  P1600  P3400  P1900  P2340  P3290
Was there a significant increase in the typical salesperson’s weekly income due to the innovative incentive plan? Use 0.05 significance level.
H0: 𝜇𝐷 ≤ 0
H1: 𝜇𝐷 > 0
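Step 4: Compute the test statistic, taking D = After - Before for each salesperson.
MEAN DIFFERENCE
$$\bar{D} = \frac{\sum D}{n} = \frac{6000}{10} = 600$$
STANDARD DEVIATION
$$S_D = \sqrt{\frac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}} = \sqrt{\frac{11{,}375{,}800 - \dfrac{6000^2}{10}}{9}} \approx 929.50$$
TEST VALUE
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{600 - 0}{929.50 / \sqrt{10}} \approx 2.041$$
With df = n - 1 = 9 and α = 0.05 (one-tailed), the critical value from the t-table is 1.833. Because 2.041 > 1.833, reject H0: the typical salesperson's weekly income increased significantly under the incentive plan.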
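The figures in this example can be recomputed from the before/after table with a short Python sketch. It follows the paired t formulas from the formula list and takes each difference as After minus Before; the variable names are illustrative.

```python
from math import sqrt

# Weekly incomes (pesos) from the table, in salesperson order 1..10.
before = [2000, 3000, 2500, 3100, 1900, 1750, 2300, 3210, 2340, 1870]
after = [3500, 4120, 3800, 4200, 1820, 1600, 3400, 1900, 2340, 3290]

diffs = [a - b for a, b in zip(after, before)]   # D = After - Before
n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

mean_d = sum_d / n
sd_d = sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))

mu_d = 0  # hypothesized mean difference under H0
t = (mean_d - mu_d) / (sd_d / sqrt(n))
print(sum_d, mean_d, round(sd_d, 2), round(t, 3))
```

The script gives a mean difference of 600, a standard deviation of about 929.50, and a test value of about 2.041, matching the worked values.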
Z TEST FOR PROPORTION
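The proportion statistic from the formula list, Z = (p̂ - p)/√(pq/n), can be sketched in Python as follows. The counts and the hypothesized proportion are made-up values for illustration, and `proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Z statistic for a single proportion against hypothesized value p0."""
    p_hat = successes / n        # sample proportion
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)

# Hypothetical data: 58 successes in 100 independent trials, H0: p = 0.50.
z = proportion_z(58, 100, 0.50)

# Two-tailed critical value at alpha = 0.05.
z_critical = NormalDist().inv_cdf(0.975)
reject_null = abs(z) > z_critical
print(round(z, 2), round(z_critical, 2), reject_null)
```

Note that the standard error uses the hypothesized p, not p̂, because the statistic is evaluated under the null hypothesis.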
TERMINOLOGIES
• Null Hypothesis – Statement about the probability model generating the data, e.g. the data are from a normal distribution with mean 0, the data are from a
hypergeometric distribution, the data in the two-way table are from a multinomial distribution with independence between row and column probabilities. This is
the hypothesis we are interested in rejecting.
• Test statistic – We use the test statistic to see if the data observed are in agreement with the model. The test statistic is typically a simple statistic computed
from the data and large values (positive, negative, or both) indicate unlikely outcomes, or rare events.
• Null distribution – Under the assumption of the null hypothesis, the test statistic has a probability distribution called the null distribution. We use this
distribution to compute the chance of observing a test statistic as rare as ours.
• p-value – The chance of observing a test statistic as extreme as ours (or more extreme), computed using the null distribution of the test statistic.
• α or significance level – a probability which is fixed in advance of making the hypothesis test. If the observed p-value is smaller than the significance level then
the null hypothesis is rejected. This level is set in advance to keep the size of the observed significance level from influencing the decision to reject. For example,
if you set α = 0.01 and the p-value is 0.012 then the null hypothesis is not rejected, even though it is quite close to 0.01.
• Highly statistically significant – when the p-value is below α = 1%. This is a commonly accepted level at which the null hypothesis would be rejected.
• Rejection – when we reject a null hypothesis, we are not proving it wrong. We say that the data do not support the null hypothesis; that the chance of observing data like ours is so small that we no longer think the null hypothesis is true.
• Alternative Hypothesis – An alternative to the null hypothesis. If the experiment that gives rise to the data is well designed, then a small p-value can be taken as support for the alternative; but if there are flaws in the design, a small p-value may result even when the null hypothesis is true. There are similarities between confidence intervals and hypothesis tests.
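The relationship between the p-value and α in the list above can be illustrated with a minimal Python sketch. Both helper functions are hypothetical names, and the p-value calculation assumes the test statistic follows a standard normal null distribution.

```python
from statistics import NormalDist

def p_value_from_z(z, tails=2):
    """p-value for a z statistic under a standard normal null distribution.

    With tails=1 this is only meaningful when z lies in the direction
    stated by the alternative hypothesis.
    """
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail

def reject_null(p_value, alpha):
    """Reject H0 only when the p-value is smaller than the preset alpha."""
    return p_value < alpha

# The example from the significance-level entry: alpha = 0.01, p = 0.012.
print(reject_null(0.012, 0.01))   # not rejected, even though p is close to alpha
print(reject_null(0.012, 0.05))   # rejected at the 5% level

p = p_value_from_z(1.96)          # two-sided p-value, roughly 0.05
print(round(p, 3))
```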
Types of Error
Type I error occurs when the null hypothesis is true but is rejected because the test statistic exceeds the critical value for the significance level α.
Type II error occurs when the null hypothesis is false, but the data do not indicate that it should be rejected. This situation could be considered a "false negative" result.
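The two error types can be made concrete with a rough Python sketch for an upper-tailed one-sample z-test. The Type I error rate equals α by construction, while the Type II error rate β depends on how far the true mean is from the hypothesized one. The true mean of 105 is an assumed value for illustration; the other inputs reuse the IQ setting from the z-test examples.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for an upper-tailed one-sample z-test.

    Probability of failing to reject H0: mu = mu0 when the true mean is mu_true.
    """
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    # How many standard errors the true mean lies above mu0.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    return std_normal.cdf(z_critical - shift)

alpha = 0.05                      # Type I error rate, fixed in advance
beta = type_ii_error(100, 105, 15, 30, alpha)   # assumed true mean of 105
power = 1 - beta
print(round(beta, 3), round(power, 3))
```

Under these assumptions β is a little over 0.4, so a test with a 5% Type I error rate would still miss a true 5-point shift in a sizable share of samples of size 30.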