Choosing The Right Statistical Test
Statistical tests are used in hypothesis testing. They can be used to determine whether a predictor variable has a statistically significant relationship with an outcome variable, or to estimate the difference between two or more groups.
For a statistical test to be valid, your sample size needs to be large enough to approximate the
true distribution of the population being studied.
Statistical assumptions
Statistical tests make some common assumptions about the data they are testing:
1. Homogeneity of variance: the variance within each group being compared is similar among
all groups. If one group has much more variation than others, it will limit the test’s
effectiveness.
2. Normality of data: the data follows a normal distribution (a bell curve). This assumption
applies only to quantitative data.
If your data do not meet the assumptions of normality or homogeneity of variance, you may be
able to perform a nonparametric statistical test, which lets you make comparisons with far fewer
assumptions about the data distribution.
Types of variables
The types of variables you have usually determine what type of statistical test you can use.
Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of
quantitative variables include:
Continuous (ratio variables): represent measures and can usually be divided into units
smaller than one (e.g. 0.75 grams).
Discrete (integer variables): represent counts and usually can’t be divided into units smaller
than one (e.g. 1 tree).
Categorical variables represent groupings of things (e.g. the different tree species in a forest).
Types of categorical variables include:
Ordinal: represent data with an order (e.g. rankings).
Nominal: represent group names (e.g. brands or species names).
Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).
Parametric tests
Parametric tests usually have stricter requirements than nonparametric tests, and are able to
make stronger inferences from the data. They can only be conducted with data that adheres to
the common assumptions of statistical tests.
The most common types of parametric test include regression tests, comparison tests, and
correlation tests.
Regression tests
Regression is a statistical method used to examine the relationship between two or more
variables. In the simplest terms, it helps us understand how the typical value of the dependent
variable (what you are trying to predict) changes when any one of the independent variables (the
predictors) is varied while the others are held constant; when confounding variables cannot be
held constant, the regression adjusts for them statistically.
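As a concrete illustration, here is a minimal sketch of a simple linear regression in Python using scipy.stats.linregress. The hours_studied and exam_score values are hypothetical illustration data, not measurements from this article.

```python
# A minimal sketch of simple linear regression with scipy.stats.linregress.
# The data below are hypothetical illustration values.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]          # independent variable (predictor)
exam_score    = [52, 55, 61, 64, 70, 72, 78, 85]  # dependent variable (outcome)

result = stats.linregress(hours_studied, exam_score)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"p-value = {result.pvalue:.4f}")  # tests H0: slope == 0
```

The slope estimates how much the outcome changes per unit change in the predictor, and the p-value tests whether that slope differs significantly from zero.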
Comparison tests
Comparison tests look for differences among group means. They can be used to test the effect of
a categorical variable on the mean value of some other characteristic.
The Student's t-test is used to compare the means of two groups or to compare a
sample mean to a known value when the sample size is small and the population standard
deviation is unknown.
Example: Imagine you want to compare the effectiveness of two different brands of headache
medication. You randomly assign 10 patients to Brand A and another 10 to Brand B and measure
the time it takes for the headache to resolve. You would use a t-test to determine if there's a
significant difference in the mean resolution times between the two brands.
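A sketch of that example in Python, using an independent two-sample t-test via scipy.stats.ttest_ind; the resolution times below are made-up illustration values, not real measurements.

```python
# Hypothetical headache-resolution times (minutes) for two medication brands,
# compared with an independent two-sample t-test.
from scipy import stats

brand_a = [42, 38, 45, 50, 39, 47, 41, 44, 46, 40]  # 10 patients on Brand A
brand_b = [48, 52, 46, 55, 49, 51, 47, 53, 50, 54]  # 10 patients on Brand B

t_stat, p_value = stats.ttest_ind(brand_a, brand_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) suggests a significant difference in mean times.
```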
ANOVA tests are used when comparing the means of more than two groups (e.g.,
the average heights of children, teenagers, and adults); MANOVA extends this to designs with
more than one dependent variable.
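Following the height example, here is a minimal one-way ANOVA sketch using scipy.stats.f_oneway; the heights are hypothetical.

```python
# One-way ANOVA across three hypothetical groups of heights (cm).
from scipy import stats

children  = [120, 125, 130, 128, 122]
teenagers = [155, 160, 165, 158, 162]
adults    = [170, 175, 168, 172, 178]

f_stat, p_value = stats.f_oneway(children, teenagers, adults)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one group mean differs from the others.
```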
Correlation tests
Correlation tests check whether variables are related without hypothesizing a cause-and-effect
relationship.
The correlation coefficient is measured on a scale that varies from +1 through 0 to −1. Complete
correlation between two variables is expressed by either +1 or −1. When one variable increases as
the other increases, the correlation is positive; when one decreases as the other increases, it is
negative.
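A short sketch of computing Pearson's correlation coefficient with scipy.stats.pearsonr; the paired measurements are hypothetical.

```python
# Pearson correlation between two hypothetical paired measurements.
from scipy import stats

x = [2.1, 3.4, 4.0, 5.5, 6.1, 7.3]
y = [1.8, 3.0, 4.2, 5.1, 6.4, 7.0]

r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p_value:.4f}")  # r near +1 indicates strong positive correlation
```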
Nonparametric tests
Nonparametric tests don’t make as many assumptions about the data, and are useful when one
or more of the common statistical assumptions are violated. However, the inferences they make
aren’t as strong as with parametric tests.
The Chi-square test is used to determine whether there is a significant association between two
categorical variables. It compares the observed frequencies in each category to the frequencies
you would expect if there were no association between the categories.
Chi-square is the standard method and works best when you have a large number of subjects in
each category. It provides an approximate P value and can also be calculated by hand. This is also
known as the chi-square test of independence. Fisher's exact test is used to calculate exact P
values for small sample sizes.
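A minimal sketch of a chi-square test of independence on a 2×2 contingency table using scipy.stats.chi2_contingency; the observed counts are hypothetical.

```python
# Chi-square test of independence on a hypothetical 2x2 table.
# Rows: treatment vs. control; columns: improved vs. not improved.
from scipy import stats

observed = [[30, 10],
            [20, 25]]

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
# For very small expected counts in a 2x2 table, stats.fisher_exact(observed)
# is the usual alternative, as noted above.
```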
The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-based
nonparametric test that can be used to determine if there are statistically significant differences
between two or more groups of an independent variable on a continuous or ordinal dependent
variable. The null hypothesis of the Kruskal-Wallis test is that the mean ranks of the groups are
the same.
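A minimal sketch of the Kruskal-Wallis H test across three groups using scipy.stats.kruskal; the ordinal ratings below are hypothetical illustration data.

```python
# Kruskal-Wallis H test on three hypothetical groups of ordinal ratings.
from scipy import stats

group_1 = [3, 4, 2, 5, 4]
group_2 = [5, 6, 7, 6, 5]
group_3 = [8, 7, 9, 8, 7]

h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one group's distribution of ranks differs.
```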