
Orca Share Media1521296717980

The document provides formulas and explanations for common statistical tests including one sample and two sample z-tests, one sample and two sample t-tests, tests of proportions, Pearson correlation, and Spearman rank correlation. Examples are given for conducting a one sample z-test to test if a sample mean comes from a given population mean and standard deviation. The document defines what each test is used for and the assumptions that must be met to apply each test.

Uploaded by

Ace Patriarca
Copyright
© © All Rights Reserved

STATISTICS AND PROBABILITY

REVIEWER

IMPORTANT FORMULAE

TEST FORMULA WHEN TO USE


ONE SAMPLE Z-TEST (n is greater than or equal to 30)
Formula: $z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
When to use: The one-sample Z test is used when we want to know whether our sample comes from a particular population. It is used only for tests of the sample mean. Thus, our hypothesis tests whether the average of our sample ($\bar{X}$) suggests that our students come from a population with a known mean $\mu$ or from a different population.

ONE SAMPLE T-TEST (n is less than 30)
Formula: $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$, where $s$ is the sample standard deviation
When to use: The one-sample t-test is a member of the t-test family. All the tests in the t-test family compare differences in mean scores of continuous-level (interval or ratio), normally distributed data. Unlike the independent and dependent sample t-tests, the one-sample t-test works with only one mean score: the mean of a single sample.

DEGREES OF FREEDOM FOR T-TEST
Formula: $df = n - 1$
When to use: Used in identifying the critical region for the t distribution.

TWO INDEPENDENT Z-TEST (n is greater than or equal to 30)
Formula: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
When to use: When you have two normally distributed but independent populations and σ is known.

TWO INDEPENDENT T-TEST (n is less than 30; variances are assumed to be equal)
Formula: $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
When to use: Your data must meet the following requirements:
1. Dependent variable that is continuous (i.e., interval or ratio level)
2. Independent variable that is categorical (i.e., two or more groups)
3. Cases that have values on both the dependent and independent variables
4. Independent samples/groups (i.e., independence of observations)
5. Random sample of data from the population
6. Normal distribution (approximately) of the dependent variable for each group
7. Homogeneity of variances (i.e., variances approximately equal across groups)
8. No outliers

Z-TEST FOR PROPORTION
Formula: $Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$
When to use: There are only two outcomes and the probability of success does not change from trial to trial (the outcomes of each trial are independent).

TWO DEPENDENT T-TEST
Formula: $t = \dfrac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$, where $\bar{D} = \dfrac{\sum D}{n}$ and $S_D = \sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}}$
When to use: The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of observations. Common applications of the paired sample t-test include case-control studies or repeated-measures designs. Suppose you are interested in evaluating the effectiveness of a company training program. One approach you might consider would be to measure the performance of a sample of employees before and after completing the program, and analyze the differences using a paired sample t-test.

PEARSON PRODUCT
Formula: $r = \dfrac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
When to use: The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this new model/line of best fit).

SPEARMAN RANK
Formula: $\rho = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$
When to use: The Spearman's rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (ρ, also signified by rs) measures the strength and direction of association between two ranked variables.
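
As a rough illustration of the two correlation formulas in the table, the sketch below computes r and ρ in plain Python. The data in `x` and `y` are hypothetical values chosen only for illustration, the function names are not from the reviewer, and the Spearman shortcut formula is assumed to be applied to data with no tied values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r using the computational formula from the table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def ranks(values):
    """Rank 1 is the smallest value. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rho = 1 - 6*sum(D^2) / (n*(n^2 - 1)), D = rank difference."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical paired observations (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"r = {r:.2f}, rho = {rho:.2f}")
```

Because these hypothetical values are already ranks, r and ρ coincide here; with raw measurements the two coefficients generally differ.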

ONE SAMPLE Z TEST

EXAMPLE # 1

A principal at a certain school claims that the students in his school are of above-average intelligence. The IQ scores of a random sample of thirty students have a mean of 112.5. Is there sufficient evidence to support the principal's claim? The mean population IQ is 100 with a standard deviation of 15.
Step 1:
State the Null hypothesis and Alternative Hypothesis
H0: μ = 100
H1: μ > 100
The fact that we are looking for scores “greater than” a certain point means that this is a one-tailed test.
Step 2:
State the alpha level. If you aren’t given an alpha level, use 5% (0.05).
Step 3:
Find the rejection region area (given by your alpha level above) from the z-table. An upper-tail area of 0.05 corresponds to a z-score of 1.645.
Step 4:
Find the test statistic using this formula:
$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

For this set of data: z = (112.5 − 100) / (15/√30) = 4.56.

Step 5:
If Step 4 is greater than Step 3, reject the null hypothesis. If it’s less than Step 3, you cannot reject the null hypothesis. In this case, it is greater (4.56 > 1.645), so you can reject
the null.
Step 6: Conclusion
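
The steps of Example 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the sample mean is taken as 112.5, the value used in the Step 4 calculation.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 100       # population mean under H0
sigma = 15      # population standard deviation
n = 30          # sample size
x_bar = 112.5   # sample mean used in the Step 4 calculation
alpha = 0.05    # significance level

# Test statistic: z = (x_bar - mu) / (sigma / sqrt(n))
z = (x_bar - mu0) / (sigma / sqrt(n))

# Upper-tail critical value for a one-tailed test
z_critical = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z > z_critical
print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```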

EXAMPLE #2
Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or negative
effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an
effect.
Step 1:
State the null hypothesis and Alternative hypothesis:
H0:μ=100
H1: μ ≠ 100
Step 2:
State your alpha level. We’ll use 0.05 for this example. As this is a two-tailed test, split the alpha into two.
0.05/2=0.025
Step 3: Find the z-score associated with your alpha level. You're looking for the area in one tail only. The z-score for a cumulative area of 0.975 (1 − 0.025 = 0.975) is 1.96. As this is a two-tailed test, you would also be considering the left tail (z = −1.96).
Step 4: Find the test statistic using this formula:
$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

z = (140 − 100)/(15/√30) = 14.61.
Step 5: If Step 4 is less than -1.96 or greater than 1.96 (Step 3), reject the null hypothesis. In this case, it is greater, so you can reject the null.
Step 6: Conclusion
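
A two-tailed version of the same calculation might look like the sketch below. The function name `two_tailed_z_test` is hypothetical, and the p-value line is an addition that the worked example does not compute.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value, reject H0?) for H1: mu != mu0."""
    std_normal = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Split alpha across both tails, as in Step 2
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-tailed p-value: area beyond |z| in both tails
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, z_critical, p_value, abs(z) > z_critical

# Example 2: sample mean 140, population mean 100, sigma 15, n = 30
z, z_critical, p_value, reject_h0 = two_tailed_z_test(140, 100, 15, 30)
print(f"z = {z:.2f}, critical values = +/-{z_critical:.2f}, reject H0: {reject_h0}")
```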

ONE SAMPLE T – TEST

In the population, the average IQ is 100. A team of scientists wants to test a new medication to see if it has either a positive or negative effect on intelligence, or no effect at all. A
sample of 30 participants who have taken the medication has a mean of 140 with a standard deviation of 20. Did the medication affect intelligence? Use alpha = 0.05.

1. Define Null and Alternative Hypotheses
H0: μ = 100
H1: μ ≠ 100

2. State Alpha
Alpha = 0.05
3. Calculate Degrees of Freedom
df = n - 1 = 30 - 1 = 29
4. State Decision Rule
Using an alpha of 0.05 with a two-tailed test with 29 degrees of freedom, the rejection region is split equally between the two tails of the t distribution.
Use the t-table to look up a two-tailed test with 29 degrees of freedom and an alpha of 0.05. We find a critical value of 2.0452. Thus, our decision rule for this two-tailed test is:
If t is less than -2.0452, or greater than 2.0452, reject the null hypothesis.
5. Calculate Test Statistic
t = (X̄ − μ)/(s/√n) = (140 − 100)/(20/√30) = 40/3.65 ≈ 10.96

6. State Results
t = 10.96
Result: Reject the null hypothesis.
7. State Conclusion
Medication significantly affected intelligence, t = 10.96, p < 0.05.
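
The one-sample t-test above can be sketched in Python as follows. The critical value is copied from the t-table lookup in the decision rule rather than computed, because the standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt

mu0 = 100           # population mean under H0
x_bar = 140         # sample mean
s = 20              # sample standard deviation
n = 30              # sample size
t_critical = 2.0452 # two-tailed, df = 29, alpha = 0.05 (from the t-table)

df = n - 1
standard_error = s / sqrt(n)
t = (x_bar - mu0) / standard_error

# Two-tailed decision rule: reject if t < -2.0452 or t > 2.0452
reject_h0 = abs(t) > t_critical
print(f"df = {df}, t = {t:.3f}, reject H0: {reject_h0}")
```

Without intermediate rounding the statistic is about 10.95; the reported 10.96 is obtained when the standard error is first rounded to 3.65. The decision is the same either way.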

TWO INDEPENDENT SAMPLE Z – TEST

Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.

Characteristic | Men: n | Men: mean | Men: s | Women: n | Women: mean | Women: s
Systolic Blood Pressure | 1,623 | 128.2 | 17.5 | 1,911 | 126.5 | 20.1
Diastolic Blood Pressure | 1,622 | 75.6 | 9.8 | 1,910 | 72.6 | 9.7
Total Serum Cholesterol | 1,544 | 192.4 | 35.2 | 1,766 | 207.1 | 36.7
Weight | 1,612 | 194.0 | 33.8 | 1,894 | 157.7 | 34.6
Height | 1,545 | 68.9 | 2.7 | 1,781 | 63.4 | 2.5
Body Mass Index | 1,545 | 28.8 | 4.6 | 1,781 | 27.6 | 5.9

Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.

• Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2

H1: μ1 ≠ μ2

α = 0.05

• Step 2. Select the appropriate test statistic.

Because both samples are large (> 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, $s_1^2/s_2^2$. Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is $17.5^2/20.1^2 = 0.76$, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is

$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

• Step 3. Set up decision rule.

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < -1.960 or if Z > 1.960.

• Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation.

$S_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(1623 - 1)(17.5)^2 + (1911 - 1)(20.1)^2}{1623 + 1911 - 2}} = \sqrt{359.12} \approx 18.95$

Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.
Now the test statistic:

$Z = \dfrac{128.2 - 126.5}{18.95 \sqrt{\dfrac{1}{1623} + \dfrac{1}{1911}}} = \dfrac{1.7}{0.640} \approx 2.66$

• Step 5. Conclusion.

We reject H0 because 2.66 > 1.960. We have statistically significant evidence at α = 0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.
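
The systolic blood pressure comparison can be reproduced from the summary statistics alone. The sketch below is one possible way to organize the calculation; the helper names `pooled_sd` and `pooled_statistic` are hypothetical.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of the common standard deviation, Sp."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_statistic(n1, mean1, s1, n2, mean2, s2):
    """Test statistic (mean1 - mean2) / (Sp * sqrt(1/n1 + 1/n2))."""
    sp = pooled_sd(n1, s1, n2, s2)
    return (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))

# Systolic blood pressure: men are group 1, women are group 2
n_men, mean_men, s_men = 1623, 128.2, 17.5
n_women, mean_women, s_women = 1911, 126.5, 20.1

# Guideline check: ratio of sample variances should fall between 0.5 and 2
variance_ratio = s_men ** 2 / s_women ** 2

sp = pooled_sd(n_men, s_men, n_women, s_women)
z = pooled_statistic(n_men, mean_men, s_men, n_women, mean_women, s_women)

# Two-tailed decision rule at the 5% level
reject_h0 = abs(z) > 1.960
print(f"ratio = {variance_ratio:.2f}, Sp = {sp:.2f}, Z = {z:.2f}, reject H0: {reject_h0}")
```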

Example # 2

A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are
enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is
asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.

Treatment Sample Size Mean Standard Deviation

New Drug 15 195.9 28.7

Placebo 15 227.4 30.3

Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test
using the five-step approach.

• Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2

H1: μ1 < μ2

α = 0.05

• Step 2. Select the appropriate test statistic.

Because both samples are small (< 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances, $s_1^2/s_2^2 = 28.7^2/30.3^2 = 0.90$, falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

• Step 3. Set up decision rule.

This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t table. In order to determine the critical value of t we need degrees of freedom, df, defined as df = n1 + n2 - 2 = 15 + 15 - 2 = 28. The critical value for a lower-tailed test with df = 28 and α = 0.05 is -1.701 and the decision rule is: Reject H0 if t < -1.701.

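• Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation.

$S_p = \sqrt{\dfrac{(15 - 1)(28.7)^2 + (15 - 1)(30.3)^2}{15 + 15 - 2}} = \sqrt{870.89} \approx 29.5$

Now the test statistic,

$t = \dfrac{195.9 - 227.4}{29.5 \sqrt{\dfrac{1}{15} + \dfrac{1}{15}}} = \dfrac{-31.5}{10.77} \approx -2.92$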

• Step 5. Conclusion.

We reject H0 because -2.92 < -1.701. We have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005.
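
For the cholesterol trial, the lower-tailed pooled t-test can be sketched as below. The critical value -1.701 is taken from the decision rule in Step 3 rather than computed, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the trial table
n1, mean1, s1 = 15, 195.9, 28.7   # new drug (group 1)
n2, mean2, s2 = 15, 227.4, 30.3   # placebo (group 2)
t_critical = -1.701               # lower-tailed, df = 28, alpha = 0.05 (t-table)

df = n1 + n2 - 2

# Pooled estimate of the common standard deviation
sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)

# Test statistic for H0: mu1 = mu2 against H1: mu1 < mu2
t = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))

reject_h0 = t < t_critical
print(f"df = {df}, Sp = {sp:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```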
TWO INDEPENDENT SAMPLE T – TEST
TWO DEPENDENT SAMPLE T – TEST

The management of Resale Furniture, a chain of second-hand furniture stores in Metro Manila, designed an incentive plan for salespeople. To evaluate this innovative incentive plan, 10 salespeople were selected at random and their weekly incomes before and after the incentive plan were recorded.

Sales person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Before | P2000 | P3000 | P2500 | P3100 | P1900 | P1750 | P2300 | P3210 | P2340 | P1870
After | P3500 | P4120 | P3800 | P4200 | P1820 | P1600 | P3400 | P1900 | P2340 | P3290
Was there a significant increase in the typical salesperson’s weekly income due to the innovative incentive plan? Use 0.05 significance level.

Step 1: state null and alternative hypothesis

H0: 𝜇𝐷 ≤ 0

H1: 𝜇𝐷 > 0

Step 2: the level of significance is 0.05

Step 3: Determine the critical region

Df = 9 and tcritical = 1.833

Step 4: compute

SALES PERSON | BEFORE (X1) | AFTER (X2) | D = X2 – X1 | D² = (X2 – X1)²
1 | 2000 | 3500 | 1500 | 2250000
2 | 3000 | 4120 | 1120 | 1254400
3 | 2500 | 3800 | 1300 | 1690000
4 | 3100 | 4200 | 1100 | 1210000
5 | 1900 | 1820 | -80 | 6400
6 | 1750 | 1600 | -150 | 22500
7 | 2300 | 3400 | 1100 | 1210000
8 | 3210 | 1900 | -1310 | 1716100
9 | 2340 | 2340 | 0 | 0
10 | 1870 | 3290 | 1420 | 2016400
Total | | | ∑D = 6000 | ∑D² = 11375800
DETERMINE THE MEAN OF THE DIFFERENCES

$\bar{D} = \dfrac{\sum D}{n} = \dfrac{6000}{10} = 600$

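STANDARD DEVIATION

$S_D = \sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}} = \sqrt{\dfrac{11375800 - \dfrac{(6000)^2}{10}}{9}} = 929.50$

TEST VALUE

$t = \dfrac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \dfrac{600 - 0}{929.50 / \sqrt{10}} = 2.041$

Because the test value 2.041 exceeds the critical value 1.833, H0 is rejected at the 0.05 significance level.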
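
The paired calculation can be reproduced directly from the before and after incomes. This is a minimal sketch using the standard library; `statistics.stdev` computes the sample standard deviation (n − 1 in the denominator), which matches the S_D formula, and the critical value 1.833 is copied from Step 3.

```python
from math import sqrt
from statistics import mean, stdev

before = [2000, 3000, 2500, 3100, 1900, 1750, 2300, 3210, 2340, 1870]
after = [3500, 4120, 3800, 4200, 1820, 1600, 3400, 1900, 2340, 3290]
t_critical = 1.833   # upper-tailed, df = 9, alpha = 0.05 (from Step 3)

# D = X2 - X1 for each salesperson
d = [x2 - x1 for x1, x2 in zip(before, after)]
n = len(d)

sum_d = sum(d)
sum_d2 = sum(value ** 2 for value in d)

d_bar = mean(d)      # mean of the differences
s_d = stdev(d)       # sample standard deviation of the differences

# Test value under H0: mu_D = 0
t = (d_bar - 0) / (s_d / sqrt(n))

reject_h0 = t > t_critical
print(f"sum D = {sum_d}, sum D^2 = {sum_d2}")
print(f"mean = {d_bar}, S_D = {s_d:.2f}, t = {t:.3f}, reject H0: {reject_h0}")
```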
Z TEST FOR PROPORTION
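
The reviewer gives no worked example for this test. As a minimal sketch of the formula Z = (p̂ − p)/√(pq/n), the code below uses hypothetical numbers (60 successes in 100 trials, tested against p = 0.5); the function name and the data are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-tailed z-test for a single proportion against H0: p = p0."""
    p_hat = successes / n          # sample proportion
    q0 = 1 - p0                    # probability of failure under H0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return p_hat, z, abs(z) > z_critical

# Hypothetical data: 60 successes out of 100 trials, H0: p = 0.5
p_hat, z, reject_h0 = proportion_z_test(60, 100, 0.5)
print(f"p_hat = {p_hat:.2f}, Z = {z:.2f}, reject H0: {reject_h0}")
```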
TERMINOLOGIES

• Null Hypothesis – Statement about the probability model generating the data, e.g. the data are from a normal distribution with mean 0, the data are from a
hypergeometric distribution, the data in the two-way table are from a multinomial distribution with independence between row and column probabilities. This is
the hypothesis we are interested in rejecting.

• Test statistic – We use the test statistic to see if the data observed are in agreement with the model. The test statistic is typically a simple statistic computed
from the data and large values (positive, negative, or both) indicate unlikely outcomes, or rare events.

• Null distribution – Under the assumption of the null hypothesis, the test statistic has a probability distribution called the null distribution. We use this
distribution to compute the chance of observing a test statistic as rare as ours.

• p-value – The chance of observing a test statistic as extreme as ours (or more extreme), computed using the null distribution of the test statistic.

• observed significance level – another name for the p-value.

• α or significance level – a probability which is fixed in advance of making the hypothesis test. If the observed p-value is smaller than the significance level then
the null hypothesis is rejected. This level is set in advance to keep the size of the observed significance level from influencing the decision to reject. For example,
if you set α = 0.01 and the p-value is 0.012 then the null hypothesis is not rejected, even though it is quite close to 0.01.

• statistically significant – when the p-value is below α = 5%.

• highly statistically significant – when the p-value is below α = 1%. These are commonly accepted levels at which the hypothesis would be rejected.

• Rejection – when we reject a null hypothesis, we are not proving it wrong. We say that the data do not support the null hypothesis; that the chance of observing data like ours is so small that we no longer think the null hypothesis is true.

• Alternative Hypothesis – An alternative to the null hypothesis. If the experiment that gives rise to the data is well designed, then we are willing to accept the alternative when the null hypothesis is rejected; but if there are flaws in the design, a small p-value may result even when the null hypothesis is true. There are similarities between confidence intervals and hypothesis tests.
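
The relationship between the p-value and α described in the list above can be sketched in a few lines. The helper names are hypothetical, and the p-value function assumes a z statistic that falls in the direction of the alternative when `tails=1`.

```python
from statistics import NormalDist

def p_value_from_z(z, tails=2):
    """Chance of a z statistic at least as extreme as the one observed.

    tails=2 doubles the upper-tail area; tails=1 assumes z lies in the
    direction stated by the alternative hypothesis.
    """
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail

def decide(p_value, alpha):
    """Reject H0 only when the p-value is smaller than the fixed alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# The example from the text: alpha = 0.01 and p = 0.012
print(decide(0.012, 0.01))

# The same p-value judged at the 5% level
print(decide(0.012, 0.05))

# z = 1.96 corresponds to a two-tailed p-value of about 0.05
print(round(p_value_from_z(1.96), 3))
```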

Types of Error

Type I error occurs when the null hypothesis is true, but it is rejected because the test statistic exceeds the critical value for the significance level α.

Type II error occurs when the null hypothesis is false, but the data do not indicate that it should be rejected. This situation could be considered a "false negative" result.
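
To make the two error types concrete, the sketch below computes the probability of a Type II error (β) for an upper-tailed one-sample z-test. It reuses the IQ setting (μ = 100, σ = 15, n = 30, α = 0.05); the true mean of 105 is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of failing to reject H0: mu = mu0 (upper-tailed z-test)
    when the population mean is actually mu_true."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu0, in standard-error units
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Chance that the test statistic still falls below the critical value
    return std_normal.cdf(z_critical - shift)

alpha = 0.05                                  # Type I error rate, fixed in advance
beta = type_ii_error(100, 105, 15, 30, alpha) # Type II error rate at mu = 105
power = 1 - beta

print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")
```

When the true mean equals the null value, the function returns 1 − α, the probability of correctly retaining a true null hypothesis.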
