Chapter 12
Non-Parametric Tests
Contents
Relevance - Advantages and Disadvantages
Test for Randomness of a Series of Observations - Run Test
Test for Specified Mean or Median of a Population - Signed Rank Test
Test for Goodness of Fit of a Distribution - Kolmogorov-Smirnov Test
Test for Comparing Two Populations - Kolmogorov-Smirnov Test
Test for Equality of Two Means - Mann-Whitney (U) Test
Learning Objectives
This Chapter aims to
highlight the importance of non-parametric tests when the assumptions underlying the tests of significance described in Chapters X and XI, on Statistical Inference and ANOVA respectively, may not be valid.
describe certain non-parametric tests of significance relating to randomness, the mean of a population, the means of two or more populations, rank correlation, etc.
Relevance
Non-parametric tests are relevant in situations such as the following:
The form of the population distribution under consideration is not known, or some assumption for a parametric test is not valid or is doubtful.
The hypothesis to be tested does not relate to the parameter of a population.
The numerical accuracy of the collected data is not fully assured.
Results are required rather quickly through simple calculations.
Test for Randomness of a Series of Observations - Run Test
This test is used for testing whether the observations in a sample occur in a certain order or in a random order. The hypotheses are:
H0 : The sequence of observations is random
H1 : The sequence of observations is not random
The only condition for validity of the test is that the observations in the sample be obtained under similar conditions.
First, all the observations are arranged in the order in which they were collected. Then the median is calculated. Each observation larger than the median is given a + sign, and each observation below the median is given a − sign.
If there is an odd number of observations, the median observation is ignored. This ensures that the number of + signs is equal to the number of − signs.
A succession of values with the same sign is called a run and the number of runs, R, gives an idea of the randomness of the observations.
This is the test statistic. If the value of R is too low, it indicates a trend in the observations; if the value of R is too high, it indicates the presence of some factor causing regular fluctuations in the observations.
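The counting of runs described above can be sketched in a few lines of Python; the data values here are hypothetical, chosen only to illustrate the mechanics:

```python
import statistics

# Hypothetical series of observations, in order of collection.
data = [12, 15, 9, 11, 18, 7, 14, 10, 16, 8]

med = statistics.median(data)

# Mark each observation above the median '+' and below it '-';
# observations equal to the median are ignored.
signs = ['+' if x > med else '-' for x in data if x != med]

# A run is a maximal succession of identical signs;
# count the runs R by counting sign changes.
R = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

The value of R is then compared with the critical values from a table of runs: too few or too many runs leads to rejection of H0.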
Test for Specified Mean or Median of a Population - Signed Rank (T) Test
The statistic T is defined as the minimum of the sum of positive ranks and the sum of negative ranks. The critical value of T at the 5% level of significance is found from the relevant table. If the calculated value of T is less than the critical value, the null hypothesis is rejected; otherwise it is not rejected.
Note that the criterion in rank-based tests is the reverse of that in parametric tests, where the null hypothesis is rejected if the calculated value exceeds the tabulated value.
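The calculation of T can be sketched as follows; the signed differences are hypothetical, and zero differences are assumed to have been dropped already:

```python
# Hypothetical signed differences (observation minus hypothesised median),
# with zero differences already removed.
diffs = [3, -1, 4, -2, 5, 6, -1, 2]

# Rank the absolute differences from smallest to largest,
# assigning tied values their average rank.
abs_sorted = sorted(abs(d) for d in diffs)

def avg_rank(v):
    first = abs_sorted.index(v) + 1   # rank of the first occurrence
    count = abs_sorted.count(v)       # number of tied values
    return first + (count - 1) / 2    # average rank of the tie group

pos = sum(avg_rank(abs(d)) for d in diffs if d > 0)  # sum of positive ranks
neg = sum(avg_rank(abs(d)) for d in diffs if d < 0)  # sum of negative ranks
T = min(pos, neg)                                    # the test statistic
```

A useful arithmetic check: the two rank sums must add up to n(n + 1)/2, the sum of all ranks.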
Test for Goodness of Fit of a Distribution - Kolmogorov-Smirnov Test
The test is used to investigate the significance of the difference between the observed and expected cumulative distribution functions for a variable with a specified theoretical distribution, which could be Binomial, Poisson, Normal or Exponential. It tests whether the observations could reasonably have come from the specified distribution.
Null Hypothesis
H0 : The sample observations come from the specified theoretical distribution.
The testing procedure envisages calculation of the observed and expected cumulative distribution functions, denoted by Fo(x) and Fe(x) respectively, derived from the sample. The comparison of the two distributions over the various values of the variable is measured by the test statistic
D = max | Fo(x) − Fe(x) |
If the value of D is small, the null hypothesis is likely to be accepted; if it is large, the null hypothesis is likely to be rejected.
The Chi-square test is the most popular test of goodness of fit. On comparing the two tests, we note that the K-S test is easier to apply.
While the χ²-test is specially meant for categorical data, the K-S test is applicable to random samples from continuous populations.
The K-S statistic utilises each of the n observations. Hence, the K-S test makes better use of the available information than the Chi-square statistic.
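A minimal sketch of the one-sample K-S calculation, assuming (hypothetically) that the specified distribution is Normal with mean 50 and standard deviation 10, and using Python's standard-library NormalDist for the theoretical CDF:

```python
from statistics import NormalDist

# Hypothetical sample, tested for fit to a Normal(50, 10) distribution.
x = sorted([39, 43, 47, 49, 50, 52, 53, 55, 58, 61])
n = len(x)
norm = NormalDist(mu=50, sigma=10)

# D is the largest gap between the observed CDF Fo and the expected CDF Fe.
# Fo is a step function, so it must be checked both just before and at
# each observation, where it jumps from i/n to (i+1)/n.
D = max(
    max(abs((i + 1) / n - norm.cdf(v)), abs(i / n - norm.cdf(v)))
    for i, v in enumerate(x)
)
```

The computed D is then compared with the K-S critical value for the given n; H0 is rejected if D exceeds it.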
Test for Comparing Two Populations - Kolmogorov-Smirnov Test
This test is used for testing whether two samples come from identical population distributions. The hypotheses are:
H0 : F1(x) = F2(x), i.e. the two populations are identical
H1 : F1(x) ≠ F2(x), i.e. the two populations are not the same
There are no assumptions to be made about the populations. However, for reliable results, the samples should be sufficiently large, say 15 or more observations each.
Procedure
Given samples of sizes n1 and n2 from the two populations, the cumulative distribution functions F1(x) and F2(x) can be determined and plotted. The maximum value of the difference between the plotted values is then found and compared with the critical value obtained from the concerned table. If the observed value exceeds the critical value, the null hypothesis that the two population distributions are identical is rejected.
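The procedure above can be sketched as follows, with two hypothetical samples; since each empirical CDF is a step function, the maximum gap can only occur at an observed value, so scanning the pooled observations suffices:

```python
import bisect

# Two hypothetical samples, kept sorted.
s1 = sorted([1, 2, 3, 4, 5])
s2 = sorted([3, 4, 5, 6, 7])

def ecdf(sample, v):
    """Empirical CDF: the fraction of the sample that is <= v."""
    return bisect.bisect_right(sample, v) / len(sample)

# Maximum difference between the two empirical CDFs,
# evaluated at every observed value in either sample.
D = max(abs(ecdf(s1, v) - ecdf(s2, v)) for v in s1 + s2)
```

As in the one-sample case, H0 is rejected if D exceeds the tabulated critical value for the given n1 and n2.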
Test for Equality of Two Means - Mann-Whitney (U) Test
For using the U test, all observations are combined and ranked as one group of data, from smallest to largest. The largest negative score receives the lowest rank. In the case of ties, the average rank is assigned. After the ranking, the rank values for each sample are totalled. The U statistic is calculated as follows:
U1 = n1 n2 + n1(n1 + 1)/2 − R1
or
U2 = n1 n2 + n2(n2 + 1)/2 − R2
where n1 = number of observations in sample 1; n2 = number of observations in sample 2; R1 = sum of ranks in sample 1; R2 = sum of ranks in sample 2. For testing purposes, the smaller of the two U values is used.
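The ranking and the two U values can be sketched in Python with hypothetical samples:

```python
# Two hypothetical samples.
sample1 = [7, 3, 9, 5]
sample2 = [4, 8, 2, 6, 10]

combined = sorted(sample1 + sample2)

def rank(v):
    """Rank in the combined ordering, averaging over ties."""
    first = combined.index(v) + 1
    count = combined.count(v)
    return first + (count - 1) / 2

n1, n2 = len(sample1), len(sample2)
R1 = sum(rank(v) for v in sample1)   # sum of ranks in sample 1
R2 = sum(rank(v) for v in sample2)   # sum of ranks in sample 2

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2
U = min(U1, U2)                      # the smaller value is used for testing
```

A handy check on the arithmetic: U1 + U2 always equals n1 n2.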
Test for Equality of Means of Several Populations - Kruskal-Wallis (H) Test
This test is analogous to ANOVA and is used to test the significance of the differences among the means of several groups recorded in terms of ranks of the observations within a group. If the original data are recorded as absolute values, they can be converted into ranks. The hypotheses, as in ANOVA, are:
H0 : μ1 = μ2 = μ3 (there can be even more than 3 means)
H1 : Not all means are equal
It may be recalled that H0 is the same as in ANOVA. However, here the ranks of the observations are used and not the actual observations.
Procedure
Assign combined ranks to the observations in all the samples, from smallest to largest. The rank sum of each sample is then calculated. The test statistic H is calculated as follows:

H = [12 / (n(n + 1))] Σ (Tj² / nj) − 3(n + 1)

where
Tj = sum of ranks for treatment j
nj = number of observations for treatment j
n = Σ nj = total number of observations
k = number of treatments
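The calculation of H can be sketched with hypothetical data for k = 3 treatments:

```python
# Hypothetical observations for k = 3 treatments.
groups = [[68, 72, 75], [80, 83, 79], [60, 62, 65]]

pooled = sorted(v for g in groups for v in g)

def rank(v):
    """Combined rank over all observations, averaging over ties."""
    first = pooled.index(v) + 1
    count = pooled.count(v)
    return first + (count - 1) / 2

n = len(pooled)
# H = [12 / (n(n + 1))] * sum(Tj^2 / nj) - 3(n + 1)
H = (12 / (n * (n + 1))
     * sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
     - 3 * (n + 1))
```

H is then compared with the chi-square critical value with k − 1 degrees of freedom (for sufficiently large samples).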
Test for Given Samples to be from the Same Population - Friedman's Test
Friedman's test is a non-parametric test of the hypothesis that a given number of samples have been drawn from the same population. The test is similar to ANOVA, but it does not require the assumptions of normality and equal variance. It is carried out with the data in terms of ranks of the observations rather than their actual values. It is used whenever the number of samples k is greater than or equal to 3 and the sample sizes are equal (say n), as in two-way analysis of variance; it is therefore sometimes referred to as two-way ANOVA by ranks. The null hypothesis to be tested is that all the k samples have come from identical populations.
The test statistic is calculated as follows:

F = [12 / (n k(k + 1))] Σ Rj² − 3n(k + 1)

where k = number of samples (brands) = 3 (in the illustration); n = number of observations for each sample (brand) = 6 (in the illustration); Rj = sum of ranks of the jth sample (brand).
Although statistical tables exist for the sampling distribution of Friedman's F, they are not readily available for various values of n and k. The sampling distribution of F can, however, be approximated by a χ² (chi-square) distribution with k − 1 degrees of freedom.
The chi-square distribution table value is compared with the calculated value.
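The computation of F can be sketched as follows; the rank matrix is hypothetical, with n = 6 blocks each ranking k = 3 brands:

```python
# Hypothetical rank matrix: each row is one block (e.g. one consumer),
# containing the ranks 1..k given to the k = 3 brands.
ranks = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
    [2, 3, 1],
    [1, 2, 3],
]

n, k = len(ranks), len(ranks[0])

# Rank sum Rj for each brand (column).
R = [sum(row[j] for row in ranks) for j in range(k)]

# F = [12 / (n k(k + 1))] * sum(Rj^2) - 3n(k + 1)
F = 12 / (n * k * (k + 1)) * sum(r * r for r in R) - 3 * n * (k + 1)
```

As a check, the column rank sums must total n k(k + 1)/2, since each row contributes the ranks 1 to k.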
Rank Correlation - Spearman's Test
Spearman's rank correlation coefficient is calculated as

rs = 1 − [6 Σ di²] / [n(n² − 1)]

where di is the difference between the two ranks given to the ith individual / unit / object, and n is the number of individuals ranked.
No separate statistic needs to be calculated for testing the significance of the rank correlation. The calculated value of rs is itself compared with the tabulated value of rs, given in the Appendix, at the 5% or 1% level of significance. If the calculated value is more than the tabulated value, the null hypothesis that there is no correlation between the two rankings is rejected. The hypotheses are as follows:
H0 : ρs = 0
H1 : ρs ≠ 0
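The formula for rs can be sketched directly, using hypothetical ranks given by two judges to the same five contestants:

```python
# Hypothetical ranks given by two judges to the same 5 contestants.
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 4, 3, 5]

n = len(ranks_a)

# Sum of squared differences between the paired ranks.
sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))

# rs = 1 - 6 * sum(di^2) / (n(n^2 - 1))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
```

The computed rs is then compared with the tabulated critical value of rs for the given n, as described above.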