Biostatistics


Types of Data:

There are basically two types of random variables, and they yield two types of data: numerical and categorical. A chi square (x²) statistic is used to investigate whether distributions of categorical variables differ from one another. Categorical variables yield data in categories, and numerical variables yield data in numerical form. Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no." In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous. The table below may help you see the differences between these two types of variables.

 Data Type               Question Type               Possible Responses
 Categorical             What is your sex?           male or female
 Numerical (Discrete)    How many cars do you own?   two or three
 Numerical (Continuous)  How tall are you?           72 inches

Notice that discrete data arise from a counting process, while continuous data arise from a measuring process.

The Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups.
(note: Chi square tests can only be used on actual numbers and not on percentages, proportions, means, etc.)

2 x 2 Contingency Table

There are several types of chi square tests, depending on the way the data were collected and the hypothesis being tested. We'll begin with the simplest case: a 2 x 2 contingency table. If we set the 2 x 2 table to the general notation shown below in Table 1, using the letters a, b, c, and d to denote the contents of the cells, then we would have the following table:

Table 1. General notation for a 2 x 2 contingency table.


              Variable 1
 Variable 2   Data type 1   Data type 2   Totals
 Category 1   a             b             a + b
 Category 2   c             d             c + d
 Total        a + c         b + d         a + b + c + d = N

For a 2 x 2 contingency table the Chi Square statistic is calculated by the formula:

Chi square = N(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)]

Note: notice that the four components of the denominator are the four totals from the table columns and rows.

Suppose you conducted a drug trial on a group of animals and you hypothesized that the animals receiving the drug would
survive better than those that did not receive the drug. You conduct the study and collect the following data:

Ho: The survival of the animals is independent of drug treatment.

Ha: The survival of the animals is associated with drug treatment.

Table 2. Number of animals that survived a treatment.


               Dead   Alive   Total
 Treated       36     14      50
 Not treated   30     25      55
 Total         66     39      105

Applying the formula above we get:

Chi square = 105[(36)(25) - (14)(30)]² / [(50)(55)(39)(66)] = 3.418
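To make the arithmetic concrete, here is a minimal Python sketch (an illustration, not part of the original handout) of the shortcut formula applied to Table 2; the names a, b, c and d follow the notation of Table 1:

# Chi square for a 2 x 2 table via the shortcut formula (counts from Table 2).
a, b = 36, 14   # treated: dead, alive
c, d = 30, 25   # not treated: dead, alive
N = a + b + c + d

chi_square = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(round(chi_square, 3))  # 3.418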

Before we can proceed we need to know how many degrees of freedom we have. When a comparison is made between one sample and another, a simple rule is that the degrees of freedom equal (number of columns minus one) x (number of rows minus one), not counting the totals for rows or columns. For our data this gives (2 - 1) x (2 - 1) = 1.

We now have our chi square statistic (x² = 3.418), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with 1 degree of freedom and reading along the row we find our value of x² (3.418) lies between 2.706 and 3.841. The corresponding probability is 0.05 < P < 0.10. This is above the conventionally accepted significance level of 0.05 or 5%, so the null hypothesis that the two distributions are the same cannot be rejected. In other words, when the computed x² statistic exceeds the critical value in the table for a 0.05 probability level, we can reject the null hypothesis of equal distributions. Since our x² statistic (3.418) did not exceed the critical value for the 0.05 probability level (3.841), we fail to reject the null hypothesis that the survival of the animals is independent of drug treatment (i.e. the drug had no effect on survival).

Table 3. Chi Square distribution table.

probability level (alpha)


Df 0.50 0.10 0.05 0.02 0.01 0.001

1 0.455 2.706 3.841 5.412 6.635 10.827

2 1.386 4.605 5.991 7.824 9.210 13.815

3 2.366 6.251 7.815 9.837 11.345 16.268

4 3.357 7.779 9.488 11.668 13.277 18.465

5 4.351 9.236 11.070 13.388 15.086 20.517
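If SciPy is available, the critical values in Table 3 can be reproduced with its chi-square quantile function; a minimal sketch for the df = 1 row (the snippet is an illustration, not part of the original text):

# Reproducing the df = 1 row of Table 3 (assumes SciPy is installed).
from scipy.stats import chi2

alphas = [0.50, 0.10, 0.05, 0.02, 0.01, 0.001]
print([round(chi2.ppf(1 - a, df=1), 3) for a in alphas])
# [0.455, 2.706, 3.841, 5.412, 6.635, 10.828]; Table 3 lists 10.827 (older rounding)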


Chi Square Goodness of Fit (One Sample Test)

This test allows us to compare a collection of categorical data with some theoretical expected distribution. This test is often used in genetics to compare the results of a cross with the theoretical distribution based on genetic theory. Suppose you performed a simple monohybrid cross between two individuals that were heterozygous for the trait of interest.

Aa x Aa

The results of your cross are shown in Table 4.

Table 4. Results of a monohybrid cross between two heterozygotes for the 'a' gene.

           A     a     Totals
 A         10    42    52
 a         33    15    48
 Totals    43    57    100

The phenotypic ratio is 85 of the A-type and 15 of the a-type (homozygous recessive). In a monohybrid cross between two heterozygotes, however, we would have predicted a 3:1 ratio of phenotypes. In other words, we would have expected to get 75 A-type and 25 a-type. Are our results different?

Calculate the chi square statistic x² by completing the following steps:

1. For each observed number in the table subtract the corresponding expected number (O - E).
2. Square the difference [ (O - E)² ].
3. Divide the squares obtained for each cell in the table by the expected number for that cell [ (O - E)² / E ].
4. Sum all the values for (O - E)² / E. This is the chi square statistic.

For our example, the calculation would be:

           Observed   Expected   (O - E)   (O - E)²   (O - E)² / E
 A-type    85         75         10        100        1.33
 a-type    15         25         -10       100        4.00
 Total     100        100                             5.33

x² = 5.33
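The same steps can be expressed as a short Python sketch (an illustration, not part of the original text):

# Chi square goodness of fit, following steps 1-4 above.
observed = {"A-type": 85, "a-type": 15}
expected = {"A-type": 75, "a-type": 25}   # a 3:1 ratio applied to 100 offspring

chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
print(round(chi_square, 2))  # 5.33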

We now have our chi square statistic (x² = 5.33), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with 1 degree of freedom and reading along the row we find our value of x² (5.33) lies between 3.841 and 5.412. The corresponding probability is 0.02 < P < 0.05. This is smaller than the conventionally accepted significance level of 0.05 or 5%, so the null hypothesis that the two distributions are the same is rejected. In other words, when the computed x² statistic exceeds the critical value in the table for a 0.05 probability level, we can reject the null hypothesis of equal distributions. Since our x² statistic (5.33) exceeded the critical value for the 0.05 probability level (3.841), we reject the null hypothesis that the observed values of our cross are the same as the theoretical distribution of a 3:1 ratio.


To put this into context, it means that we do not have a 3:1 ratio of A_ to aa offspring.

Chi Square Test of Independence

For a contingency table that has r rows and c columns, the chi square test can be thought of as a test of independence. In a test of independence the null and alternative hypotheses are:

Ho: The two categorical variables are independent.

Ha: The two categorical variables are related.

We can use the equation

Chi Square = Σ (fo - fe)² / fe

Here fo denotes the frequency of the observed data and fe is the frequency of the expected values. The general table would look something like the one below:

                 Category I   Category II   Category III   Row Totals
 Sample A        a            b             c              a + b + c
 Sample B        d            e             f              d + e + f
 Sample C        g            h             i              g + h + i
 Column Totals   a + d + g    b + e + h     c + f + i      a+b+c+d+e+f+g+h+i = N

Now we need to calculate the expected value for each cell in the table, and we can do that using the row total times the column total divided by the grand total (N). For example, for cell a the expected value would be (a+b+c)(a+d+g)/N.

Once the expected values have been calculated for each cell, we can use the same procedure as before for a simple 2 x 2 table.

 Observed   Expected   |O - E|   (O - E)²   (O - E)² / E

Suppose you have the following categorical data set.

Table 5. Incidence of three types of malaria in three tropical regions.

              Asia   Africa   South America   Totals
 Malaria A    31     14       45              90
 Malaria B    2      5        53              60
 Malaria C    53     45       2               100
 Totals       86     64       100             250

We could now set up the following table:

 Observed   Expected   |O - E|   (O - E)²   (O - E)² / E
 31         30.96      0.04      0.0016     0.0000516
 14         23.04      9.04      81.72      3.546
 45         36.00      9.00      81.00      2.25
 2          20.64      18.64     347.45     16.83
 5          15.36      10.36     107.33     6.99
 53         24.00      29.00     841.00     35.04
 53         34.40      18.60     345.96     10.06
 45         25.60      19.40     376.36     14.70
 2          40.00      38.00     1444.00    36.10

Chi Square = 125.516

Degrees of Freedom = (c - 1)(r - 1) = (2)(2) = 4
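The whole procedure can also be sketched in a few lines of Python (an illustration under the formulas above; the tiny difference from 125.516 comes from rounding the intermediate values in the hand calculation):

# Chi square test of independence for the malaria data (Table 5).
# Expected count for a cell = row total x column total / grand total (N).
table = [
    [31, 14, 45],   # Malaria A: Asia, Africa, South America
    [2, 5, 53],     # Malaria B
    [53, 45, 2],    # Malaria C
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
N = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / N
        chi_square += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)
print(round(chi_square, 3), df)  # 125.519 4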



Entering Table 3 with df = 4, the critical value for alpha = 0.05 is 9.488. Reject Ho because 125.516 is greater than 9.488.

Thus, we would reject the null hypothesis that there is no relationship between location and type of malaria. Our data tell us there is a relationship between type of malaria and location, but that is all they tell us.


Chi-Square Test

Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.

The formula for calculating chi-square (χ²) is:

χ² = Σ (o - e)² / e

That is, chi-square is the sum of the squared difference between observed (o) and expected (e) data (or the deviation, d), divided by the expected data in all possible categories.

For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Your hypothesis is that the allele for green is dominant to the allele for yellow and that the parent plants were both heterozygous for this trait. If your hypothesis is true, then the predicted ratio of offspring from this cross would be 3:1 (based on Mendel's laws), as predicted from the results of the Punnett square (Figure B.1).

Figure B.1 - Punnett Square. Predicted offspring from cross between green and yellow-seeded plants. Green (G) is dominant (3/4
green; 1/4 yellow).

To calculate χ², first determine the number expected in each category. If the ratio is 3:1 and the total number of observed individuals is 880, then the expected numerical values should be 660 green and 220 yellow.
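As a quick illustration in Python (the snippet is mine, not the manual's):

# Expected counts under the hypothesized 3:1 ratio for 880 offspring.
total = 880
expected_green = total * 3 / 4    # 660.0
expected_yellow = total * 1 / 4   # 220.0
print(expected_green, expected_yellow)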

Chi-square requires that you use numerical values, not percentages or ratios.

Then calculate χ² using this formula, as shown in Table B.1. Note that we get a value of 2.668 for χ². But what does this number mean? Here's how to interpret the χ² value:

1. Determine degrees of freedom (df). Degrees of freedom can be calculated as the number of categories in the problem minus 1. In our example, there are two categories (green and yellow); therefore, there is 1 degree of freedom.

2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis. The relative standard commonly used in biological research is p = 0.05. The p value is the probability that the deviation of the observed from that expected is due to chance alone (no other forces acting). If the p value is greater than 0.05, the deviation is considered attributable to chance; a deviation so large that it would occur by chance alone 5% of the time or less leads you to reject the hypothesis.

3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of freedom, locate the value closest to your calculated chi-square in the table. Determine the closest p (probability) value associated with your chi-square and degrees of freedom. In this case (χ² = 2.668), the p value is about 0.10, which means that there is a 10% probability that any deviation from expected results is due to chance only. Based on our standard of p = 0.05, this is within the range of acceptable deviation. In terms of your hypothesis for this example, the observed chi-square is not significantly different from expected. The observed numbers are consistent with those expected under Mendel's law.

Step-by-Step Procedure for Testing Your Hypothesis and Calculating Chi-Square

1. State the hypothesis being tested and the predicted results. Gather the data by conducting the proper experiment (or, if
working genetics problems, use the data provided in the problem).

2. Determine the expected numbers for each observational class. Remember to use numbers, not percentages. 

Chi-square should not be calculated if the expected value in any category is less than 5.

3. Calculate χ² using the formula. Complete all calculations to three significant digits. Round off your answer to two significant digits.

4. Use the chi-square distribution table to determine significance of the value.

a. Determine degrees of freedom and locate the value in the appropriate column.

b. Locate the value closest to your calculated χ² on that degrees of freedom (df) row.
c. Move up the column to determine the p value.
5. State your conclusion in terms of your hypothesis.

a. If the p value for the calculated χ² is p > 0.05, accept your hypothesis. The deviation is small enough that chance alone accounts for it. A p value of 0.6, for example, means that there is a 60% probability that any deviation from expected is due to chance only. This is within the range of acceptable deviation.

b. If the p value for the calculated χ² is p < 0.05, reject your hypothesis, and conclude that some factor other than chance is operating for the deviation to be so great. For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone. Therefore, other factors must be involved.

The chi-square test will be used to test for the "goodness of fit" between observed and expected data from several laboratory investigations in this lab manual.

Table B.1
Calculating Chi-Square

                     Green    Yellow
Observed (o)         639      241
Expected (e)         660      220
Deviation (o - e)    -21      21
Deviation² (d²)      441      441
d²/e                 0.668    2.0

χ² = Σ d²/e = 2.668
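If SciPy is available, Table B.1 can be checked directly with its goodness-of-fit routine (a sketch, not part of the manual):

# Goodness of fit for the pea-seed data in Table B.1 (assumes SciPy is installed).
from scipy.stats import chisquare

result = chisquare(f_obs=[639, 241], f_exp=[660, 220])
print(result.statistic)  # ~2.67 (Table B.1 gets 2.668 after rounding d²/e to 2.0)
print(result.pvalue)     # ~0.10, matching the interpretation in step 3 above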

Table B.2
Chi-Square Distribution

Degrees of Freedom (df), by Probability (p):

df 0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.001
1 0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83
2 0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.82
3 0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.27
4 0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47
5 1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52
6 1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.46
7 2.17 2.83 3.82 4.67 6.35 8.38 9.80 12.02 14.07 18.48 24.32
8 2.73 3.49 4.59 5.53 7.34 9.52 11.03 13.36 15.51 20.09 26.12
9 3.32 4.17 5.38 6.39 8.34 10.66 12.24 14.68 16.92 21.67 27.88
10 3.94 4.86 6.18 7.27 9.34 11.78 13.44 15.99 18.31 23.21 29.59
(Values in the columns from p = 0.95 through p = 0.10 are nonsignificant; values in the columns from p = 0.05 through p = 0.001 are significant.)

Z-test
Description
The Z-test compares sample and population means to determine if there is a significant difference.
It requires a simple random sample from a population with a Normal distribution and where the mean is known.
Calculation
The z measure is calculated as:

z = (x̄ - μ) / SE

where x̄ is the sample mean to be standardized, μ (mu) is the population mean and SE is the standard error of the mean:

SE = σ / √n

where σ (sigma) is the population standard deviation and n is the sample size.

The z value is then looked up in a z-table. A negative z value means the sample mean is below the population mean (the sign is ignored in the lookup table).
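A minimal sketch of the calculation in Python, with illustrative numbers that are not from the text:

# z for a sample mean against a known population mean (made-up numbers).
from math import sqrt

x_bar = 105.0   # sample mean
mu = 100.0      # known population mean
sigma = 15.0    # known population standard deviation
n = 36          # sample size

se = sigma / sqrt(n)    # standard error of the mean
z = (x_bar - mu) / se
print(round(z, 2))      # 2.0, beyond 1.96, so significant at the 5% level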
Discussion
The Z-test is typically used with standardized tests, checking whether the scores from a particular sample are within or outside the standard test performance.
The z value indicates the number of standard deviation units of the sample from the population mean.
Note that the z-test is not the same as the z-score, although they are closely related.

The Z-test is a statistical test in which the normal distribution is applied; it is basically used for dealing with problems relating to large samples, when n ≥ 30.
n = sample size
For example, suppose a person wants to test whether tea and coffee are equally popular in a particular town. He can take a sample of size, say, 500 from the town, out of which suppose 280 are tea drinkers. To test the hypothesis, he can use a Z-test.
Z-TESTS FOR DIFFERENT PURPOSES
There are different types of Z-tests, each for a different purpose. Some of the popular types are outlined below:

1. The z test for a single proportion is used to test a hypothesis on a specific value of the population proportion.

Statistically speaking, we test the null hypothesis H0: p = p0 against the alternative hypothesis H1: p ≠ p0, where p is the population proportion and p0 is a specific value of the population proportion we would like to test for acceptance.

The example on tea drinkers explained above requires this test (a worked sketch follows after this list). In that example, p0 = 0.5. Notice that in this particular example, proportion refers to the proportion of tea drinkers.

2. The z test for the difference of proportions is used to test the hypothesis that two populations have the same proportion.

For example, suppose one is interested in testing whether there is any significant difference in the habit of tea drinking between male and female citizens of a town. In such a situation, the Z-test for the difference of proportions can be applied.

One would have to obtain two independent samples from the town, one from males and the other from females, and determine the proportion of tea drinkers in each sample in order to perform this test.

3. The z-test for a single mean is used to test a hypothesis on a specific value of the population mean.

Statistically speaking, we test the null hypothesis H0: μ = μ0 against the alternative hypothesis H1: μ ≠ μ0, where μ is the population mean and μ0 is a specific value of the population mean that we would like to test for acceptance.

Unlike the t-test for a single mean, this test is used if n ≥ 30 and the population standard deviation is known.

4. The z test for a single variance is used to test a hypothesis on a specific value of the population variance.

Statistically speaking, we test the null hypothesis H0: σ = σ0 against H1: σ ≠ σ0, where σ is the population standard deviation and σ0 is a specific value of it that we would like to test for acceptance.

In other words, this test enables us to test whether the given sample has been drawn from a population with specific variance σ0². Unlike the chi square test for a single variance, this test is used if n ≥ 30.

5. Z-test for testing equality of variance is used to test the hypothesis of equality of two population variances when the
sample size of each sample is 30 or larger.
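As an illustration of type 1 above, here is a sketch of the tea-drinkers example in Python; the standard error uses the usual sqrt(p0(1 - p0)/n) form under H0, and the snippet is an illustration rather than part of the original text:

# z test for a single proportion: tea-drinkers example (n = 500, 280 tea drinkers).
from math import sqrt

n = 500
x = 280           # observed tea drinkers
p0 = 0.5          # hypothesized population proportion under H0

p_hat = x / n
se = sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
z = (p_hat - p0) / se
print(round(z, 2))  # 2.68, beyond 1.96, so H0: p = 0.5 is rejected at the 5% level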

ASSUMPTION

Irrespective of the type of Z-test used, it is assumed that the populations from which the samples are drawn are normal.

Data types that can be analysed with z-tests

 data points should be independent from each other
 the z-test is preferable when n is greater than 30
 the distributions should be normal if n is small; if n > 30 the distribution of the data does not have to be normal
 the variances of the samples should be the same (F-test)
 all individuals must be selected at random from the population
 all individuals must have equal chance of being selected
 sample sizes should be as equal as possible but some differences are allowed

Data types that can be analysed with t-tests

 data sets should be independent from each other except in the case of the paired-sample t-test
 where n < 30 the t-tests should be used
 the distributions should be normal for the equal and unequal variance t-tests (K-S test or Shapiro-Wilk)
 the variances of the samples should be the same (F-test) for the equal variance t-test
 all individuals must be selected at random from the population
 all individuals must have equal chance of being selected
 sample sizes should be as equal as possible but some differences are allowed

Limitations of the tests

if you do not find a significant difference in your data, you cannot say that the samples are the same

Introduction to the z and t-tests

The z-test and t-test are basically the same; they compare two means to suggest whether both samples come from the same population. There are, however, variations on the theme for the t-test. If you have a sample and wish to compare it with a known mean (e.g. the national average), the single sample t-test is available. If both of your samples are not independent of each other and have some factor in common, i.e. geographical location or before/after treatment, the paired sample t-test can be applied. There are also two variations on the two sample t-test: the first uses samples that do not have equal variances and the second uses samples whose variances are equal.
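All four variants map onto standard SciPy calls; a sketch with made-up data (assumes SciPy is installed):

# The four t-test variants described above, via scipy.stats (illustrative data).
from scipy import stats

sample1 = [1.2, 1.4, 1.1, 1.6, 1.3]
sample2 = [1.0, 1.1, 0.9, 1.2, 1.0]

print(stats.ttest_1samp(sample1, popmean=1.25))            # single sample vs a known mean
print(stats.ttest_rel(sample1, sample2))                   # paired samples
print(stats.ttest_ind(sample1, sample2, equal_var=True))   # two samples, equal variances
print(stats.ttest_ind(sample1, sample2, equal_var=False))  # two samples, unequal variances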

It is well publicised that female students are currently doing better than male students! It could be speculated that this is due to brain size differences. To assess differences between a set of male students' brains and female students' brains, a z or t-test could be used. This is an important issue (as I'm sure you'll realise, lads) and we should use substantial numbers of measurements. Several universities and colleges are visited and a set of male brain volumes and a set of female brain volumes are gathered (I leave it to your imagination how the brain sizes are obtained!).

Hypotheses

Data arrangement

Excel can apply the z or t-tests to data arranged in rows or in columns, but statistical packages nearly always expect the data in columns, placed side by side.

Results and interpretation

Degrees of freedom:

For the z-test degrees of freedom are not required since z-scores of 1.96 and 2.58 are used for 5% and 1%
respectively.

For the unequal and equal variance t-tests: df = (n1 + n2) - 2

For the paired sample t-test: df = number of pairs - 1

The output from the z and t-tests is always similar, and there are several values you need to look for:

You can check that the program has used the right data by making sure that the means (1.81 and 1.66 for the t-test), number of observations (32, 32) and degrees of freedom (62) are correct. The information you then need in order to reject or accept your H0 is in the bottom five values. The t Stat value is the calculated value relating to your data. This must be compared with the two t Critical values, depending on whether you have decided on a one or two-tail test (do not confuse these terms with the one or two-way ANOVA). If the calculated value exceeds the critical value, the H0 must be rejected at the level of confidence you selected before the test was executed. Both the one and two-tailed results confirm that the H0 must be rejected and the HA accepted.

We can also use the P(T<=t) values to ascertain the precise probability rather than the one specified beforehand. For the results of the t-test above, the probability of the differences occurring by chance for the one-tail test is 2.3 x 10^-9 % (from 2.3E-11 x 100). All the above P-values denote highly significant differences.
The T-Test

The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate
whenever you want to compare the means of two groups, and especially appropriate as the analysis for the posttest-only two-
group randomized experimental design.

Figure 1. Idealized distributions for treated and comparison group posttest values.

Figure 1 shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the
idealized distribution -- the actual distribution would usually be depicted with a histogram or bar graph. The figure indicates
where the control and treatment group means are located. The question the t-test addresses is whether the means are
statistically different.

What does it mean to say that the averages for two groups are statistically different? Consider the three situations shown in
Figure 2. The first thing to notice about the three situations is that the difference between the means is the same in all three.
But, you should also notice that the three situations don't look the same -- they tell very different stories. The top example shows
a case with moderate variability of scores within each group. The second situation shows the high variability case. The third
shows the case with low variability. Clearly, we would conclude that the two groups appear most different or distinct in the
bottom or low-variability case. Why? Because there is relatively little overlap between the two bell-shaped curves. In the high
variability case, the group difference appears least striking because the two bell-shaped distributions overlap so much.

Figure 2. Three scenarios for differences between means.

This leads us to a very important conclusion: when we are looking at the differences between scores for two groups, we have to
judge the difference between their means relative to the spread or variability of their scores. The t-test does just this.

Statistical Analysis of the t-test

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The
bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-
to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or
treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may
make it harder to see the group difference. Figure 3 shows the formula for the t-test and how the numerator and denominator
are related to the distributions.
Figure 3. Formula for the t-test.

The top part of the formula is easy to compute -- just find the difference between the means. The bottom part is called
the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in
that group. We add these two values and then take their square root. The specific formula is given in Figure 4:

Figure 4. Formula for the Standard error of the difference between the means.

Remember that the variance is simply the square of the standard deviation.

The final formula for the t-test is shown in Figure 5:

Figure 5. Formula for the t-test.
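Since the figures are not reproduced here, the formula they show can be written out from the description above: t = (mean1 - mean2) / SE(difference), where SE(difference) = sqrt(var1/n1 + var2/n2). A short Python sketch with made-up scores:

# t-value from the formula described above (illustrative posttest scores).
from math import sqrt
from statistics import mean, variance

group1 = [5.1, 4.8, 5.5, 5.0, 5.3]   # treatment group posttest
group2 = [4.4, 4.6, 4.2, 4.7, 4.5]   # control group posttest

se_diff = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
t = (mean(group1) - mean(group2)) / se_diff
df = len(group1) + len(group2) - 2   # degrees of freedom, as described below
print(round(t, 2), df)               # 4.45 8 for these made-up numbers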

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value
you have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the
groups is not likely to have been a chance finding. To test the significance, you need to set a risk level (called the alpha level). In
most social research, the "rule of thumb" is to set the alpha level at .05. This means that five times out of a hundred you would
find a statistically significant difference between the means even if there was none (i.e., by "chance"). You also need to
determine the degrees of freedom (df) for the test. In the t-test, the degrees of freedom is the sum of the persons in both groups
minus 2. Given the alpha level, the df, and the t-value, you can look the t-value up in a standard table of significance (available as
an appendix in the back of most statistics texts) to determine whether the t-value is large enough to be significant. If it is, you can conclude that the difference between the means for the two groups is statistically significant (even given the variability). Fortunately,
statistical computer programs routinely print the significance test results and save you the trouble of looking them up in a table.

The t-test, one-way Analysis of Variance (ANOVA) and a form of regression analysis are mathematically equivalent (see the
statistical analysis of the posttest-only randomized experimental design) and would yield identical results.
