3. Normality Test and Homogeneity
Visual inspection of the distribution may be used for assessing normality, although this
approach is usually unreliable and does not guarantee that the distribution is normal. The
frequency distribution (histogram), stem-and-leaf plot, boxplot, P-P plot (probability-probability
plot), and Q-Q plot (quantile-quantile plot) are used for checking normality visually. The
frequency distribution, which plots the observed values against their frequency, provides both a
visual judgment about whether the distribution is bell-shaped and insight into gaps in the data
and outlying values. Normality tests are supplementary to the graphical assessment of normality.
The main tests for the assessment of normality are the Kolmogorov-Smirnov (K-S) test and the
Shapiro-Wilk test.
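As an illustration only, the following Python sketch (using SciPy and Matplotlib, with randomly
generated data standing in for real measurements) draws a histogram and a Q-Q plot and then runs
the Shapiro-Wilk and Kolmogorov-Smirnov tests:

# A minimal sketch of the normality checks described above; the sample
# data are hypothetical, generated only for demonstration.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)  # hypothetical sample

# Visual checks: histogram and Q-Q plot against the normal distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(data, bins=20)
ax1.set_title("Histogram")
stats.probplot(data, dist="norm", plot=ax2)
ax2.set_title("Q-Q plot")
plt.show()

# Formal tests: Shapiro-Wilk and Kolmogorov-Smirnov
w_stat, w_p = stats.shapiro(data)
# For K-S the reference distribution's parameters must be supplied
ks_stat, ks_p = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
print(f"Shapiro-Wilk:       W = {w_stat:.3f}, p = {w_p:.3f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")

A large p-value in either test is consistent with normality; a small p-value suggests the data
depart from a normal distribution.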
1. Most statistical tests rest upon the assumption of normality. Deviations from normality, called
non-normality, render those statistical tests inaccurate, so it is important to know if your data
are normal or non-normal.
2. Tests that rely upon the assumption of normality are called parametric tests. If your data are
not normal, then you would use statistical tests that do not rely upon the assumption of
normality, called non-parametric tests. Non-parametric tests are less powerful than parametric
tests, which means the non-parametric tests have less ability to detect real differences or
variability in your data. In other words, you want to conduct parametric tests because you
want to increase your chances of finding significant results. (A short comparison of the two
kinds of test is sketched after this list.)
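As a hedged illustration, the sketch below runs a parametric test (the independent-samples t-test)
and its non-parametric counterpart (the Mann-Whitney U test) on the same two hypothetical samples;
the data are invented for demonstration only.

# Comparing a parametric and a non-parametric test on the same samples
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, size=30)          # roughly normal sample
group_b = rng.exponential(scale=50, size=30)   # clearly skewed sample

t_stat, t_p = stats.ttest_ind(group_a, group_b)        # parametric
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)     # non-parametric
print(f"t-test:       t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.3f}")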
In the context of statistical analysis, we often talk about the null hypothesis and the alternative
hypothesis. If we are to compare method A with method B regarding their superiority, and if we
proceed on the assumption that both methods are equally good, then this assumption is termed the
null hypothesis. As against this, if we think that method A is superior or that method B is
inferior, we are then stating what is termed the alternative hypothesis. The null hypothesis is
generally symbolized as H0 and the alternative hypothesis as Ha.
The level of significance is always some percentage (usually 5%) which should be chosen with great
care, thought and reason. In case we take the significance level at 5 per cent, then this implies that H0
will be rejected when the sampling result (i.e., observed evidence) has a less than 0.05 probability of
occurring if H0 is true. In other words, the 5 per cent level of significance means that the researcher is
willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be
true. Thus the significance level is the maximum value of the probability of rejecting H0 when it is
true, and it is usually determined in advance, before testing the hypothesis.
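The decision rule this describes can be sketched in a few lines of Python; the one-sample t-test
and the data below are hypothetical stand-ins, used only to show how the observed p-value is
compared against the pre-chosen significance level.

# Reject H0 when the p-value falls below the chosen significance level
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=52, scale=10, size=40)   # hypothetical measurements

alpha = 0.05                                     # significance level, fixed in advance
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)   # H0: population mean = 50

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")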
Homogeneity Test:
A data set is homogeneous if it is made up of things (i.e. people, cells or traits) that are similar to each
other. For example, a data set made up of 20-year-old college students enrolled in Physics 101 is a
homogeneous sample.
Running statistical tests for homogeneity becomes important when performing any kind of data
analysis, as many hypothesis tests run on the assumption that the data has some type of
homogeneity. For example, an ANOVA test assumes that the variances of different populations are
equal (i.e. homogeneous).
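As a sketch of that assumption check, the example below (with three invented groups) uses Levene's
test from SciPy to assess equality of variances before running a one-way ANOVA:

# Checking the equal-variance (homogeneity) assumption before ANOVA
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.normal(50, 10, size=25)
group2 = rng.normal(55, 10, size=25)
group3 = rng.normal(60, 25, size=25)   # deliberately larger spread

lev_stat, lev_p = stats.levene(group1, group2, group3)
print(f"Levene's test:  W = {lev_stat:.3f}, p = {lev_p:.3f}")

# If the equal-variance assumption is tenable (large p-value above),
# a standard one-way ANOVA can proceed:
f_stat, f_p = stats.f_oneway(group1, group2, group3)
print(f"One-way ANOVA:  F = {f_stat:.3f}, p = {f_p:.3f}")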
The chi-square test of homogeneity tests whether two populations come from the same unknown
distribution (if they do, then they are homogeneous). The test is run the same way as the standard
chi-square test: the Χ2 statistic is computed, and the null hypothesis (that the data come from the
same distribution) is either accepted or rejected. In other words, the test checks whether different
columns (or rows) of data in a table come from the same population or not, i.e., whether the
differences are consistent with being explained by sampling error alone.
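A minimal sketch of this test in Python is shown below; the counts in the table are invented for
illustration (rows represent samples from two populations, columns are response categories), and
scipy.stats.chi2_contingency is used to compute the Χ2 statistic and p-value.

# Chi-square test of homogeneity on a 2 x 3 table of observed counts
import numpy as np
from scipy import stats

observed = np.array([
    [30, 50, 20],   # sample from population 1
    [35, 45, 20],   # sample from population 2
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3f}")
# A small p-value leads to rejecting H0 that the samples come from the
# same distribution, i.e. the populations are not homogeneous.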