Parametric and Non
STATISTICAL TOOLS
1. CHI SQUARE
2. SPEARMAN RHO
3. T TEST
4. Z TEST
5. ANOVA
often ordinal, meaning it does not rely on numbers, but rather on ranking or
order sorts
CHI SQUARE
actual observed data (or model results). The data used in calculating a chi square
variables, and drawn from a large enough sample. Chi square is often used in
hypothesis testing.
Chi square is used to decide whether there is any difference between the
A Chi square test is designed to analyze categorical data. That means that the data has
been counted and divided into categories. It will not work with parametric or
continuous data. For example, if you want to test whether attending class influences
how students perform on an exam, using test scores (from 0-100) as data would not be
appropriate for a Chi square test. However, arranging students into the categories
“Pass” or “Fail” would. Additionally, the data in a Chi square grid should not be in the
form of percentages, or anything other than frequency (count) data. Thus, by dividing
a class of 54 into groups according to whether they attended class and whether they
            PASS   FAIL
ATTENDED    25     6
SKIPPED     8      15
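The attendance table above can be run through a Chi square test of independence. The following is a minimal sketch in plain Python, not a definitive implementation; the function name `chi_square_independence` is a hypothetical helper introduced only for illustration, and the counts are the ones from the table.

```python
def chi_square_independence(table):
    """Return (chi2, expected, dof) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of a cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, expected, dof


# Rows: ATTENDED, SKIPPED. Columns: PASS, FAIL.
observed = [[25, 6], [8, 15]]
chi2, expected, dof = chi_square_independence(observed)
print(round(chi2, 3), dof)
```

For this table the statistic is about 11.69 on 1 degree of freedom, which is above the 0.05 critical value of 3.841, so the counts suggest that attendance and passing are associated.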
FORMULA OF A CHI SQUARE:
χ² = Σ (O − E)² / E, where O is the observed frequency and E is the expected frequency of each cell.
The Chi square test is calculated by evaluating the cell frequencies that involve the
variables. The expected frequency of each cell is then compared with the
actual observed frequency. The expected frequency of a cell is calculated as the
product of the total number of observations in its row and the total number of
observations in its column, divided by the total sample size.
The researcher should know that the greater the difference between the observed
and expected cell frequency, the larger the value of the Chi square statistic in the Chi
square test. In order to determine if the association between the two variables exists,
NOTE that any statistical test that uses the chi square distribution can be called chi
square.
EXAMPLE PROBLEM
- Employee Satisfaction
- Has the employees' satisfaction with the working conditions of the company
increased or decreased?
A business owner had been working to improve employee relations in his company.
He predicted that he met his goal of increasing employee satisfaction from 65% to
80%. Employees from four departments were asked if they were satisfied with the
working conditions of the company. The results are shown in the following table:
Satisfaction 12 38 8
Dissatisfaction 7 19 1
Total 19 57 9
We can use chi square to determine whether the results support or reject the
Our first step is to calculate the predicted values so that we can compare them to the
actual values from the survey. The predicted number of satisfied employees is 80%
of the total number of employees in each department. This leaves the remaining
20% as the number of dissatisfied employees. For example, the predicted number of
following table shows the observed and expected values for each department. The
observed values are in bold and the expected values are in parentheses.
The next step is to use the chi square table found at the beginning of the lesson to
find the p-value. Because our data has four categories (the four departments in the
company), our degrees of freedom are three. Following the row for three degrees of
freedom, we want to find the nearest value to the chi square value of 11.6806. The
accepted or rejected. Since our p-value is less than 0.05, the hypothesis should be
rejected. In other words, the data does not support the business manager's prediction
EXAMPLE PROBLEM
-Fair Gamble
Many casinos use card-dealing machines to deal cards at random. Occasionally, the
machine is tested to ensure an equal likelihood of dealing for each suit. To conduct
the test, 1,500 cards are dealt from the machine, while the number of cards in each
suit is counted. Theoretically, 375 cards should be dealt from each suit. As you can
see from the results in our table, this is not the case:
We can use chi square to determine if the discrepancies are significant. If the
discrepancies are significant, then the game would not be fair. Measures would need
to be taken to ensure that the game is fair.
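The card-dealing check is a Chi square goodness-of-fit test against an expected count of 375 per suit. The sketch below shows the mechanics only. The observed counts are HYPOTHETICAL, because the results table is not reproduced here, and `chi_square_goodness_of_fit` is an illustrative helper name. The critical value 7.815 is the standard table value for three degrees of freedom at the 0.05 level.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL counts for the four suits (must total 1,500 cards);
# replace them with the counts actually recorded from the machine.
observed_counts = [402, 358, 374, 366]
expected_counts = [375] * 4  # equal likelihood for each suit

CRITICAL_005_DF3 = 7.815  # chi square table, 3 degrees of freedom, alpha = 0.05

chi2_cards = chi_square_goodness_of_fit(observed_counts, expected_counts)
is_significant = chi2_cards > CRITICAL_005_DF3
print(round(chi2_cards, 3), is_significant)
```

With these made-up counts the statistic stays below the critical value, so the discrepancies would not be judged significant. Real counts may of course lead to the opposite conclusion.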
a relationship between two variables. The result will always be between 1 and minus 1.
is the nonparametric version of the Pearson correlation coefficient. Your data must
ρ = 1 − (6 Σ d_i²) / (n(n² − 1))
Where n is the number of data points of the two variables and d_i is the difference
in the ranks of the ith element of each random variable considered. The Spearman
The closer ρ is to zero, the weaker the association between the ranks.
EXAMPLE PROBLEM
A photographer has noticed that a freshly mixed batch of chemicals will develop
photographs faster than an old batch of chemicals. The photographer keeps records
1 9
35 44
4 12
38 49
6 14
40 52
Use this data to find the regression equation for predicting time as a function of age
- Does a freshly mixed batch of chemicals develop photographs faster than an old
batch of chemicals?
EXAMPLE PROBLEM
on the basis of the order in which language skills appear in development. Not being
entirely confident that she has selected the correct ordering of skills, she asks
another professional to rank the items from 1 to 15 in terms of the order in which he
Investigator: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Consultant: 1 3 2 4 7 5 6 8 10 9 11 12 15 13 14
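The two rankings above can be compared with Spearman's rho. The following is a minimal sketch, assuming the inputs are already ranks with no ties (which holds for these two lists); `spearman_rho` is an illustrative helper name, not a library function.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho from two lists of untied ranks:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


investigator = list(range(1, 16))
consultant = [1, 3, 2, 4, 7, 5, 6, 8, 10, 9, 11, 12, 15, 13, 14]

rho = spearman_rho(investigator, consultant)
print(round(rho, 4))  # 0.9714
```

Here the squared rank differences sum to 16, so rho = 1 − 96/3360, which indicates very strong agreement between the two orderings.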
technique that can show whether and how strongly pairs of variables are related. For
example, height and weight are related; taller people tend to be heavier than shorter
people. The relationship isn't perfect. People of the same height vary in weight, and
you can easily think of two people you know where the shorter one is heavier than
the taller one. Nonetheless, the average weight of people 5'5'' is less than the
average weight of people 5'6'', and their average weight is less than that of people
5'7'', etc. Correlation can tell you just how much of the variation in people's weights is
Although this correlation is fairly obvious, your data may contain unsuspected
correlations. You may also suspect there are correlations, but don't know which are
of your data.
ordered pairs (x,y) that make up a sample from a population. The correlation
there is likely a strong linear relationship between the two variables, with a positive
slope.
• If r is close to -1, we say that the variables are negatively correlated. This means
there is likely a strong linear relationship between the two variables, with a negative
slope.
• If r is close to 0, we say that the variables are not correlated. This means that
there is likely no linear relationship between the two variables; however, the
+1), squaring them makes them easier to understand. The square of the coefficient
(or r square) is equal to the percent of the variation in one variable that is related to
the variation in the other. After squaring r, ignore the decimal point. An r of .5 means
25% of the variation is related (.5 squared =.25). An r value of .7 means 49% of the
A correlation report can also show a second result of each test - statistical
significance. In this case, the significance level will tell you how likely it is that the
correlations reported may be due to chance in the form of random sampling error. If
you are working with small sample sizes, choose a report format that includes the
scientists in China wanted to know if there was a relationship between how weedy
rice populations are different genetically. The goal was to find out the evolutionary
potential of the rice. Pearson’s correlation between the two groups was analyzed. It
showed a positive Pearson Product Moment correlation of between 0.783 and 0.895
for weedy rice populations. This figure is quite high, which suggested a fairly strong
relationship.
SUBJECT   AGE X   GLUCOSE LEVEL Y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and
y².
SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be
43 × 99 = 4,257.
SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257
2         21      65                1365
3         25      79                1975
4         42      75                3150
5         57      87                4959
6         59      81                4779
Step 3: Take the square of the numbers in the x column, and put the result in the x²
column.
SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257    1849
2         21      65                1365    441
3         25      79                1975    625
4         42      75                3150    1764
5         57      87                4959    3249
6         59      81                4779    3481
Step 4: Take the square of the numbers in the y column, and put the result in the y²
column.
SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Step 5: Add up all of the numbers in the columns and put the result at the bottom of
the column. The Greek letter sigma (Σ) is a short way of saying “sum of.”
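SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Σ         247     486               20485   11409   40022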
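Step 6: Use the following correlation coefficient formula:
r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])
With n = 6 and the column totals:
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
r = [6(20,485) − (247)(486)] / √([6(11,409) − 247²][6(40,022) − 486²])
= 2,868 / √(7,445 × 3,936)
= 0.5298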
The range of the correlation coefficient is from -1 to 1. Our result is 0.5298 or
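The five steps of the worked example can be checked with a few lines of Python. This is a sketch of the same computational formula applied to the age and glucose data from the table; `pearson_r` is an illustrative helper name rather than a library call.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from the column sums used in the worked example."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

r = pearson_r(ages, glucose)
print(round(r, 4))       # 0.5298
print(round(r ** 2, 2))  # 0.28
```

Squaring the result gives roughly 0.28, which in the terms used earlier means about 28% of the variation in one variable is related to the variation in the other.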
EXAMPLE PROBLEM
This time, Olivia is talking to a friend about a new television show. She notices that
some of her classmates aren't familiar with the show, but they are carrying around a
lot of library books. She wonders if the more library books a student owns, the less
television they watch. She collects data from her friends about how many library
- Do students who own more library books watch less television?
T TEST
The t test tells you how significant the differences between groups are; in other
have happened by chance. A t-test allows us to compare the average values of two
data sets and determine if they came from the same population. In the above examples, if
we were to take a sample of students from class A and another sample of students from
class B, we would not expect them to have exactly the same mean and standard
deviation. Similarly, samples taken from the placebo-fed control group and those
taken from the drug-prescribed group should have a slightly different mean and
standard deviation.
difference between the means of two groups, which may be related in certain
features. It is mostly used when the data sets, like the data set recorded as the
outcome from flipping a coin 100 times, would follow a normal distribution.
A Paired sample t-test compares means from the same group at different times (say,
A One sample t-test tests the mean of a single group against a known mean.
You probably don’t want to calculate the test by hand (the math can get very messy).
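The three kinds of t test described above differ only in how the t statistic is formed. The sketch below uses Python's standard `statistics` module. The function names and all sample values are HYPOTHETICAL and chosen only for illustration, and the independent version assumes equal variances (the pooled, Student's form). Turning t into a p-value still requires a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance


def independent_t(sample_a, sample_b):
    """Student's t for two independent samples (pooled variance)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2  # t statistic, degrees of freedom


def one_sample_t(sample, known_mean):
    """t for a single group tested against a known mean."""
    n = len(sample)
    t = (mean(sample) - known_mean) / sqrt(variance(sample) / n)
    return t, n - 1


def paired_t(before, after):
    """Paired t: a one sample test on the within-pair differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)


# HYPOTHETICAL measurements for two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]
t_stat, dof_t = independent_t(group_a, group_b)
print(round(t_stat, 3), dof_t)
```

For the made-up groups above the means are 7 and 3 and the pooled variance is 2.5, so the statistic comes out at about 4 on 8 degrees of freedom.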
EXAMPLE PROBLEM
A drug company may want to test a new cancer drug to find out if it improves life
expectancy. In an experiment, there’s always a control group (a group who are given
a placebo, or “sugar pill”). The control group may show an average life expectancy of
+5 years, while the group taking the new drug might have a life expectancy of +6
years. It would seem that the drug might work. But it could be due to a fluke. To test
this, researchers would use a Student’s t-test to find out if the results are repeatable
EXAMPLE PROBLEM
result in a decrease in food intake (in this case, chocolate chips) in rats. Rats
habenula. Following a ten day recovery period, rats (kept at 80 percent body weight)
are tested for the number of chocolate chips consumed during a 10 minute period of
time both with and without electrical stimulation. The testing conditions are
counterbalanced.
- Can electrical stimulation, compared with no electrical stimulation, decrease
A z-score is the number of standard deviations from the mean a data point is. But
more technically it’s a measure of how many standard deviations below or above the
population mean a raw score is. A z-score is also known as a standard score and it
deviations (which would fall to the far left of the normal distribution curve) up to +3
standard deviations (which would fall to the far right of the normal distribution curve).
In order to use a z-score, you need to know the mean μ and also the population
standard deviation σ.
Z-scores are a way to compare results from a test to a “normal” population. Results
from tests or surveys have thousands of possible results and units. However, those
results can often seem meaningless. For example, knowing that someone’s weight is
150 pounds might be good information, but if you want to compare it to the “average”
some weights are recorded in kilograms). A z-score can tell you where that person’s
and a standard deviation (σ) of 25. Assuming a normal distribution, your z score
would be:
z = (x – μ) / σ
The z score tells you how many standard deviations from the mean your score is. In
this example, your score is 1.6 standard deviations above the mean.
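The z score formula is a one-line calculation. In the sketch below the standard deviation of 25 comes from the example, while the raw score of 190 and the mean of 150 are ASSUMED values, chosen only because they reproduce the stated result of 1.6; `z_score` is an illustrative helper name.

```python
def z_score(x, population_mean, population_sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - population_mean) / population_sd


# ASSUMED raw score and mean; sigma = 25 as in the example.
z = z_score(190, 150, 25)
print(z)  # 1.6
```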
You may also see the z score formula written as z = (x – x̄) / s. This is
exactly the same formula as z = (x – μ) / σ, except that x̄ (the sample mean) is used
instead of μ (the population mean) and s (the sample standard deviation) is used
instead of σ (the population standard deviation). However, the steps for solving it are
The z-score in the center of the curve is zero. The z-scores to the right of the mean
are positive and the z-scores to the left of the mean are negative. If you look up the
score in the z-table, you can tell what percentage of the population is above or below
your score. The table below shows a z-score of 2.0 highlighted, showing .9772
(which converts to 97.72%). If you look at the same score (2.0) of the normal
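A z-table lookup can be reproduced with the standard normal cumulative distribution function. This short sketch uses `math.erf` from the Python standard library; `standard_normal_cdf` is an illustrative helper name.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Proportion of a standard normal population that lies below z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


below = standard_normal_cdf(2.0)
print(round(below, 4))      # 0.9772, the z-table entry for z = 2.0
print(round(1 - below, 4))  # 0.0228, the proportion above z = 2.0
```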
Boys of a certain age are known to have a mean weight μ=85 pounds. A complaint is
made that the boys living in municipal children’s home are underfed. As one bit of
evidence n=25 boys (at the same age) are weighed and found to have a mean
SOLUTION:
The null hypothesis is μ=85, and the alternative hypothesis is μ <85. In general, we
follows the standard normal distribution. It is actually a bit irrelevant here whether or
not the weights are normally distributed, because the sample size n = 25 is large
enough for the Central Limit Theorem to apply defined above, follows at least
EXAMPLE PROBLEM
university in PA claims that his graduate students earn more than this. He surveys
46 randomly selected students and finds their average salary is $13,445 with a
significant. In other words, they help you to figure out if you need to reject the null
hypothesis or accept the alternate hypothesis. Basically, you’re testing groups to see
that is used to check if the means of two or more groups are significantly different
from each other. ANOVA checks the impact of one or more factors by comparing the
effective or not.
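FORMULA
F = MS_{between} / MS_{error}
Where,
F = ANOVA coefficient
MS_{between} = mean square between the groups
MS_{error} = mean square within the groups (error)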
ANIMALS
Chicken 5 20 2
Cats 7 12 1
Ducks 3 16 4
Solution:
Chicken 5 20 2 4
Cats 5 12 1 1
Ducks 5 16 4 16
EXAMPLE PROBLEM
Suppose you want to determine whether the brand of laundry detergent used and
the temperature affect the amount of dirt removed from your laundry. To this end,
you buy two different brands of detergent (“Super” and “Best”) and choose three
different temperature levels (“cold”, “warm”, and “hot”). Then you divide your laundry
randomly into 6×r piles of equal size and assign r piles to each combination of
(“Super” and “Best”) and (“cold”, “warm”, and “hot”). In this example, we are
H0D: The amount of dirt removed does not depend on the type of detergent
H0T : The amount of dirt removed does not depend on the temperature
One says the experiment has two factors (Factor Detergent, Factor Temperature) at
a = 2(Super and Best) and b = 3(cold, warm and hot) levels. Thus there are ab = 3×2
loads in total. The amounts Yijk of dirt removed when washing sub pile k (k = 1,2,3,4)
Solution:
Mr 5 11 11 9
ONE WAY ANOVA
A ONE WAY ANOVA is a type of statistical test that compares the variance in
the group means within the sample whilst considering only one independent variable or
mutually exclusive theories about our data. Before we can generate a hypothesis, we
need to have a question about our data that we want an answer to.
there is a difference between them. Within each group there should be three or more
observations. One way ANOVA assumes that each group comes from an
approximately normal distribution and that the variability within the groups is roughly
constant. The factors are arranged so that experiments are columns and subjects
are rows; this is how you must enter your data in the StatsDirect workbook. The
overall F test is fairly robust to small deviations from these assumptions but you
could use the Kruskal-Wallis test as an alternative to one way ANOVA if there was
any doubt.
Numerically, one way ANOVA is a generalisation of the two sample t test. The F
statistic compares the variability between the groups to the variability within the
groups:
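The ratio of between-group to within-group variability can be sketched in plain Python as follows. The group values are HYPOTHETICAL and exist only to make the arithmetic easy to follow, and `one_way_anova_f` is an illustrative helper name.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    ms_between = ss_between / (k - 1)        # df = number of groups - 1
    ms_within = ss_within / (n_total - k)    # df = observations - groups
    return ms_between / ms_within


# HYPOTHETICAL scores for three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

For these made-up groups the between-group sum of squares is 42 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 21. The statistic is then compared against an F table with (2, 6) degrees of freedom.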
EXAMPLE PROBLEM
therapy (CCT) against a control condition. Subjects were randomly assigned to the
SOLUTION
MS_{error} = \frac{20.25 + 30.25 + 33.64}{3} = 28.05
Note: this is just the average within-group variance; it is not sensitive to group mean differences!
Calculating the remaining error (or within) terms for the ANOVA table:
EXAMPLE PROBLEM
among four algebra curricula is difficult to understand. Eighth grade students are
randomly assigned to one of the four groups. Their state achievement test scores
are compared at the end of the year. Use the ONE WAY statistical procedure to
SOLUTION
Note: this is just the average within-group variance; it is not sensitive to group mean
differences!
Calculating the remaining error (or within) terms for the ANOVA table:
The two-way ANOVA compares the mean differences between groups that
have been split on two independent variables (called factors). The primary
purpose of a two-way ANOVA is to understand if there is an interaction between the
two independent variables on the dependent variable. For example, you may want to
The interaction term in a two-way ANOVA informs you whether the effect of one of
your independent variables on the dependent variable is the same for all values of
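For a balanced design (the same number of observations in every cell), the two main effects and the interaction term of a two-way ANOVA can be sketched as below. This is an illustration under that balanced-design assumption, not a general implementation. The data are HYPOTHETICAL and `two_way_anova` is an illustrative helper name.

```python
def two_way_anova(cells):
    """F ratios for factor A, factor B and their interaction.

    cells[i][j] holds the replicate measurements for level i of factor A
    and level j of factor B. Assumes a balanced design with r >= 2.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])

    def mean(values):
        return sum(values) / len(values)

    cell_means = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    a_means = [mean(cell_means[i]) for i in range(a)]
    b_means = [mean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(a_means)

    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_error = sum((y - cell_means[i][j]) ** 2
                   for i in range(a) for j in range(b) for y in cells[i][j])

    ms_error = ss_error / (a * b * (r - 1))
    return {
        "F_A": (ss_a / (a - 1)) / ms_error,
        "F_B": (ss_b / (b - 1)) / ms_error,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_error,
    }


# HYPOTHETICAL data: 2 levels of factor A, 2 levels of factor B, 2 replicates.
data = [
    [[1, 3], [5, 7]],
    [[3, 5], [7, 9]],
]
result = two_way_anova(data)
print(result)
```

In this made-up data set the effect of factor B is the same at both levels of factor A, so the interaction F ratio is zero while both main effects are nonzero.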
EXAMPLE PROBLEM
agitation levels in patients who are in the early and middle stages of Alzheimer’s
disease. Patients were selected to participate in the study based on their stage of
Alzheimer’s disease. Three forms of music were tested: Easy listening, Mozart, and
piano interludes. While listening to music, agitation levels were recorded for the
- Which form of music affects the patients' heart rate (beats per minute)?
INTERLUDE LISTENING
Post hoc means to analyze the results of your experimental data. Post hoc tests are
designed for situations in which the researcher has already obtained a significant
omnibus F-test with a factor that consists of three or more means and additional
information is needed on which means are significantly different from each other.
A post hoc study is conducted using data that has already been collected.
Using this data, the researcher conducts new analyses for new objectives, which
were not planned before the experiment. Analyses of pooled data from previously