STAT22209 - Nonparametric Statistics


Advanced Statistics II

(PST22209 / FST22209 / ESNRM22209)

Nonparametric Statistics

R.M. KAPILA RATHNAYAKA

B.Sc. Special (Math. & Stat.) (Ruhuna), M.Sc. (Industrial Mathematics) (USJ),
M.Sc. (Stat.) (WHUT, China),
Ph.D. (Applied Statistics, WHUT)
• Before discussing nonparametric techniques, we should consider why the methods we usually use are called parametric.

• Parametric tests involve the estimation of at least one population parameter (such as the mean or the standard deviation).

• Such tests are based on the assumption that the population being studied has a normal distribution.
08/04/2024 2
• What happens if you want to study a
population that is not normally
distributed?

• Then we use a nonparametric test.


Nonparametric Test Procedures

1. Do not involve population parameters.
Example: hypotheses about probability distributions or independence.

2. Data may be measured on any scale:
• ratio or interval,
• ordinal or nominal.
Advantages of nonparametric tests

1. The main advantage of a nonparametric test is that we do not assume that the population is normally distributed (for example, the data may be skewed right, or positively skewed).

2. They need not involve population parameters.

3. Some nonparametric tests can be applied to data available only as ranks, signs or symbols.

4. They make fewer assumptions and usually do not require much calculation, so they can be done quickly.

5. If the sample sizes are small, there is often no alternative other than a nonparametric test.
Hypothesis Testing Procedures

• Parametric: Z test, t test, one-way ANOVA
• Nonparametric: Wilcoxon rank sum test, Kruskal-Wallis H-test

Many more tests exist!
Parametric and nonparametric tests for comparing two or more groups
One-Sample Runs Test of Randomness
• This test is used to check whether a given sample (a sequence of observations) is random or biased.

• The one-sample runs test of significance is commonly used as a nonparametric test of randomness in a sample.

• Null hypothesis: the sample items were selected randomly.

• Meaning: chosen without preference or bias.

Run (r)

• A run is a sequence of identical symbols that is followed and preceded by a different symbol or by no symbol at all. The number of runs in a sample is denoted r.
Example (01):
Suppose we toss a coin 10 times and get
T H T T H H H T H H
Is this coin unbiased (random) or biased?
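A minimal Python sketch of the run count for this sequence, plus an exact small-sample p-value. It assumes that under H0 every arrangement of the observed H's and T's is equally likely, and it uses the two-sided rule p = min(1, 2·min(P(R ≤ r), P(R ≥ r))), which is one common convention and not necessarily the one behind printed runs-test tables. The function names are hypothetical.

```python
from itertools import combinations


def count_runs(seq):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def exact_runs_pvalue(seq):
    """Observed r and an exact two-sided p-value for the one-sample runs test.

    Under H0 every arrangement of the observed symbols is equally likely, so
    all C(n, n1) arrangements are enumerated (feasible for small samples only).
    """
    first, second = sorted(set(seq))  # unpacking fails unless there are two symbols
    n, n1 = len(seq), seq.count(first)
    r_obs = count_runs(seq)
    runs = []
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        runs.append(count_runs([first if i in chosen else second for i in range(n)]))
    at_most = sum(r <= r_obs for r in runs)
    at_least = sum(r >= r_obs for r in runs)
    return r_obs, min(1.0, 2 * min(at_most, at_least) / len(runs))


tosses = "T H T T H H H T H H".split()
r, p = exact_runs_pvalue(tosses)
print(r, p)  # 6 1.0
```

Here r = 6 with four T's and six H's, a typical value under randomness, so the hypothesis of an unbiased coin is not rejected.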
Example (03):
Marks in statistics of 12 students are as follows.
66 06 74 27 92 95 04 35 26
80 46 78
Test whether the abilities of the students are the same.
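The slides do not say how the marks are turned into two symbols. One common approach, assumed here, is to code each mark as A (above the sample median) or B (below it) and then apply the runs test. The Python sketch below does this with the same exact-enumeration idea; the function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def count_runs(seq):
    """Number of maximal blocks of identical consecutive symbols."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b) if seq else 0


def exact_runs_pvalue(seq):
    """Observed r and exact two-sided p = min(1, 2 * min(P(R <= r), P(R >= r)))."""
    first, second = sorted(set(seq))  # exactly two symbols are expected
    n, n1, r_obs = len(seq), seq.count(first), count_runs(seq)
    runs = []
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        runs.append(count_runs([first if i in chosen else second for i in range(n)]))
    at_most = sum(r <= r_obs for r in runs)
    at_least = sum(r >= r_obs for r in runs)
    return r_obs, min(1.0, 2 * min(at_most, at_least) / len(runs))


marks = [66, 6, 74, 27, 92, 95, 4, 35, 26, 80, 46, 78]
m = median(marks)  # 56.0, the average of the two middle marks (46 and 66)
# Assumption: A = above the median, B = below it; marks equal to m are dropped.
coded = ["A" if x > m else "B" for x in marks if x != m]
r, p = exact_runs_pvalue(coded)
print("".join(coded), r, round(p, 3))  # ABABAABBBABA 9 0.351
```

Under this coding there are 9 runs among 6 A's and 6 B's, and p ≈ 0.35, so randomness would not be rejected at the 0.05 level.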
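The sampling distribution of the r statistic
• The one-sample runs test is based on the idea that too few or too many runs show that the items were not chosen randomly.
• The mean: $\mu_r = \frac{2n_1n_2}{n_1+n_2} + 1$

• The standard error: $\sigma_r = \sqrt{\frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1+n_2)^2(n_1+n_2-1)}}$

• Standardizing the sample r statistic: $z = \frac{r - \mu_r}{\sigma_r}$

where n1 and n2 are the numbers of each of the two kinds of symbols.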
Problem (n > 20)
• A manufacturer of breakfast cereal uses a machine to insert one of two types of toys at random in each box. The company wants randomness so that every child in the neighborhood does not get the same toy.

• Testers choose samples of 60 successive boxes to see whether the machine is properly mixing the two types of toys.

• Using the symbols A and B to represent the two types of toys, a tester reported that one such batch looked like this:
The batch
B,A,B,B,B,A,A,A,B,B,A,B,B,B,B,A,A,A,A,B,
A,B,A,A,B,B,B,A,A,B,A,A,A,A,B,B,A,B,B,A,
A,A,A,B,B,A,B,B,B,B,A,A,B,B,A,B,A,A,B,B

• n1=29 number of boxes containing toy A

• n2=31 number of boxes containing toy B

• r = 29 number of runs
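• Hypothesis:

H0: The toys are randomly mixed.

H1: The toys are not randomly mixed.


• The sampling distribution of r can be closely approximated by the normal distribution if either n1 or n2 is larger than 20.
• α = 0.05

• Here the mean of r is $\mu_r = \frac{2(29)(31)}{60} + 1 \approx 30.97$, its standard error is $\sigma_r \approx 3.84$, and $z = (29 - 30.97)/3.84 \approx -0.51$. Since −1.96 < z < 1.96, accept the null hypothesis.

• Conclusion: the toys are being inserted in the boxes in random order.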
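A Python sketch (standard library only) that counts the A's, B's and runs in the reported batch and applies the usual large-sample mean $2n_1n_2/(n_1+n_2) + 1$ and standard error of r. It is illustrative only; the variable names are not from the slides.

```python
from math import erfc, sqrt

batch = (
    "B,A,B,B,B,A,A,A,B,B,A,B,B,B,B,A,A,A,A,B,"
    "A,B,A,A,B,B,B,A,A,B,A,A,A,A,B,B,A,B,B,A,"
    "A,A,A,B,B,A,B,B,B,B,A,A,B,B,A,B,A,A,B,B"
).split(",")

n1 = batch.count("A")  # boxes containing toy A
n2 = batch.count("B")  # boxes containing toy B
r = 1 + sum(1 for a, b in zip(batch, batch[1:]) if a != b)  # number of runs

n = n1 + n2
mu_r = 2 * n1 * n2 / n + 1
sigma_r = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1)))
z = (r - mu_r) / sigma_r
p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

print(n1, n2, r, round(mu_r, 2), round(sigma_r, 2), round(z, 2))  # 29 31 29 30.97 3.84 -0.51
```

The computed |z| is well below 1.96, which agrees with accepting H0.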
Mann-Whitney U test (Wilcoxon rank sum test)
• The Mann-Whitney U test is one of the best-known nonparametric statistical significance tests.

• It is sometimes also called the Mann-Whitney-Wilcoxon test.

• It is used to compare two independent samples: it tests whether the two samples come from the same population, i.e. whether the two population distributions (often summarized by their medians) are equal.
Applications:
• The Mann-Whitney U test is used in many fields, but most frequently in psychology, the food industry and business.
Assumptions:
• Because the Mann-Whitney U test is nonparametric, it makes no assumption about the form of the population distributions.

• However, it does assume that:

1. The samples drawn from the populations are random.

2. The observations are independent within the samples.

3. The measurement scale is at least ordinal.

Steps
• Let two independent samples of sizes n1 and n2 be
$(x_1, x_2, \ldots, x_{n_1})$ and $(y_1, y_2, \ldots, y_{n_2})$.
Combine the samples and arrange them in order of magnitude.

• Count the number of y values preceding each x value; the total count is
$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$,
where R2 is the rank sum of the y's.

• Count the number of x values preceding each y value; the total count is
$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$,
where R1 is the rank sum of the x's.

• The test statistic is $U = \min(U_1, U_2)$. Compare U with the tabulated critical value U*, or find the p-value: reject H0 if U ≤ U* (equivalently, if the p-value is below α).

Hypothesis:
H0: The corresponding populations are identical (same).
H1: The populations are different.
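A minimal Python sketch of these steps, assuming average ranks for ties. It computes R1, R2, U1 and U2 from the rank-sum formulas and, as a check, also counts the preceding values directly. The helper names (`midranks`, `mann_whitney_u`, `count_preceding`) are hypothetical, and the toy samples are made up for illustration.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # position of the first copy of v
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """R1, R2, U1, U2 and U = min(U1, U2) from the rank-sum formulas."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1  # number of x values preceding each y
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2  # number of y values preceding each x
    return r1, r2, u1, u2, min(u1, u2)


def count_preceding(a, b):
    """Directly count pairs with a_i < b_j; a tied pair counts 1/2."""
    return sum(1.0 if ai < bj else 0.5 if ai == bj else 0.0 for ai in a for bj in b)


x, y = [1, 3, 5], [2, 4, 6]  # hypothetical toy samples
print(mann_whitney_u(x, y))                          # (9.0, 12.0, 6.0, 3.0, 3.0)
print(count_preceding(x, y), count_preceding(y, x))  # 6.0 3.0
```

The direct counts agree with the formulas, and U1 + U2 = n1·n2 always holds.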
Example
• The effectiveness of advertising for two rival products (Brand X and Brand Y) was compared.

• Market research at a local shopping center was carried out, with the participants being shown adverts for two rival brands of coffee, which they then rated on the overall likelihood of buying the product.

• Half of the participants gave ratings for one of the products; the other half gave ratings for the other product.
The data are ratings (ordinal data), and hence a nonparametric test is appropriate: the Mann-Whitney U test.
• R1 (rank sum of the first group): 3 + 4 + 1.5 + 7.5 + 1.5 + 5.5 = 23

• R2 (rank sum of the second group): 11 + 9 + 5.5 + 12 + 7.5 + 10 = 55

• With n1 = n2 = 6: U1 = 36 + 21 − 23 = 34 and U2 = 36 + 21 − 55 = 2, so U = min(U1, U2) = 2.

• Our obtained U is less than the critical value of U for a 0.05 significance level, so H0 is rejected: the two brands are rated differently.
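Because the rating table itself is not reproduced here, the following Python sketch works only from the two lists of ranks above. It recomputes U and, as an alternative to a critical-value table, an exact permutation p-value, assuming that under H0 every split of the 12 pooled ranks into two groups of 6 is equally likely. This is an illustrative check, not the procedure used in the slides.

```python
from itertools import combinations

ranks_1 = [3, 4, 1.5, 7.5, 1.5, 5.5]  # first group's ranks (R1 = 23)
ranks_2 = [11, 9, 5.5, 12, 7.5, 10]   # second group's ranks (R2 = 55)
n1, n2 = len(ranks_1), len(ranks_2)


def u_from_rank_sum(r1):
    """U1 = n1*n2 + n1(n1+1)/2 - R1 and U2 = n1*n2 - U1; return min(U1, U2)."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)


u_obs = u_from_rank_sum(sum(ranks_1))
pooled = ranks_1 + ranks_2
splits = list(combinations(pooled, n1))  # every way to choose the first group's ranks
extreme = sum(u_from_rank_sum(sum(s)) <= u_obs for s in splits)
p_value = extreme / len(splits)
print(u_obs, len(splits), extreme, round(p_value, 4))  # 2.0 924 10 0.0108
```

Only 10 of the 924 equally likely splits give a U as small as 2, so the two-sided p-value (about 0.011) is below 0.05, consistent with rejecting H0.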
Example :
The following data show the marks obtained for a question paper
by 10 male students and 10 female students.

Male : 31 25 38 33 42 40 44 26 43 35
Female : 44 30 34 47 35 32 35 47 48 34

Test the hypothesis that there is no difference between the average marks of male and female students.
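One way to work this exercise, sketched in Python: rank the 20 marks jointly with average ranks for ties, compute U, then use the large-sample normal approximation with mean $n_1n_2/2$ and standard deviation $\sqrt{n_1n_2(n_1+n_2+1)/12}$. Those two formulas and the omission of a tie correction are assumptions of this sketch; with n1 = n2 = 10 a table of critical U values would usually be used instead.

```python
from collections import Counter
from math import erfc, sqrt


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


male = [31, 25, 38, 33, 42, 40, 44, 26, 43, 35]
female = [44, 30, 34, 47, 35, 32, 35, 47, 48, 34]
n1, n2 = len(male), len(female)

ranks = midranks(male + female)
r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

# Large-sample normal approximation (no tie correction in this sketch).
mu_u = n1 * n2 / 2
sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u
p_two_sided = erfc(abs(z) / sqrt(2))

print(r1, r2, u1, u2, u)  # 93.5 116.5 61.5 38.5 38.5
print(round(z, 2), round(p_two_sided, 3))
```

Under these assumptions U = 38.5 and z ≈ −0.87 (two-sided p ≈ 0.38), so the hypothesis of no difference would not be rejected at the 0.05 level.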
Wilcoxon Signed Rank Test
• Another popular nonparametric test for matched or paired data is the Wilcoxon signed rank test.

• It is based on difference scores, but in addition to analyzing the signs of the differences, it also takes into account the magnitudes of the observed differences.

• It can be used as an alternative to the paired Student's t-test (t-test for matched pairs, or t-test for dependent samples) when the population cannot be assumed to be normally distributed.
1. State the null hypothesis: in this case, that the median difference is equal to zero.
2. Calculate each paired difference, $d_i = x_i - y_i$, where $(x_i, y_i)$ are the pairs of observations.
3. Rank the differences, ignoring the signs (i.e. assign rank 1 to the smallest $|d_i|$, rank 2 to the next, etc.).

4. Label each rank with its sign, according to the sign of $d_i$.

5. Calculate
W+, the sum of the ranks of the positive differences,
W−, the sum of the ranks of the negative differences.
6. Choose W = max(W−, W+).

7. Use tables of critical values for the Wilcoxon signed rank test to find the probability of observing a value of W or one more extreme.
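Steps 2 to 6 can be sketched in Python as below. This is a minimal sketch that drops zero differences and averages tied ranks, as in the treatment of ties that follows; `signed_rank_sums` and `midranks` are hypothetical helper names, and the demonstration data are made up.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def signed_rank_sums(diffs):
    """Return (W+, W-, W = max(W-, W+), n) for paired differences d_i.

    Zero differences are dropped and n is reduced accordingly; tied |d_i|
    values share the average of the ranks they occupy.
    """
    d = [v for v in diffs if v != 0]
    ranks = midranks([abs(v) for v in d])
    w_plus = sum(rk for rk, v in zip(ranks, d) if v > 0)
    w_minus = sum(rk for rk, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus, max(w_minus, w_plus), len(d)


print(signed_rank_sums([1.0, -2.0, 3.0, 0.0]))  # hypothetical data: (4.0, 2.0, 4.0, 3)
```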
Normal approximation

• If the number of pairs n is such that n(n+1)/2 is large enough (> 20), a normal approximation can be used with
$\mu_W = \frac{n(n+1)}{4}, \quad \sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}, \quad z = \frac{W - \mu_W}{\sigma_W}.$
There are two types of tied observations that may arise when using the Wilcoxon signed rank test:
• Observations in the sample may be exactly equal to M, the hypothesized median (i.e. 0 in the case of paired differences). Ignore such observations and adjust n accordingly.

• Two or more observations/differences may be equal. If so, average the ranks across the tied observations and reduce the variance by $(t^3 - t)/48$ for each group of t tied ranks.
Example
• The table below shows the hours of relief provided by two
analgesic drugs in 12 patients suffering from arthritis.

• Is there any evidence that one drug provides longer relief than the other?
Solution:
• In this case our null hypothesis is that the median difference is zero.

• The actual differences (Drug B − Drug A) are:
+1.5, +2.1, +0.3, −0.2, +2.6, −0.1, +1.8, −0.6, +1.5, +2.0, +2.3, +12.4

• Ranking the differences and affixing a sign to each rank: the absolute differences 0.1, 0.2, 0.3, 0.6, 1.5, 1.5, 1.8, 2.0, 2.1, 2.3, 2.6, 12.4 receive ranks 1, 2, 3, 4, 5.5, 5.5, 7, 8, 9, 10, 11, 12, and the ranks of −0.1, −0.2 and −0.6 (1, 2 and 4) carry a negative sign.

• Calculating W+ and W− gives:
W− = 1 + 2 + 4 = 7
W+ = 3 + 5.5 + 5.5 + 7 + 8 + 9 + 10 + 11 + 12 = 71
W = max(W−, W+) = 71.
• We have n = 12, so n(n+1)/2 = 12 × 13/2 = 78 > 20.

• We can therefore use a normal approximation in this case.

• This gives a two-sided p-value of p = 0.012. There is strong evidence that Drug B provides more relief than Drug A.
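A Python sketch (standard library only) that reproduces W−, W+ and the p-value above, using the normal approximation with the tie correction $(t^3 - t)/48$; here the only tie is the pair of differences equal to 1.5. Variable names are illustrative.

```python
from collections import Counter
from math import erfc, sqrt


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


# Differences (Drug B - Drug A) for the 12 patients.
diffs = [1.5, 2.1, 0.3, -0.2, 2.6, -0.1, 1.8, -0.6, 1.5, 2.0, 2.3, 12.4]

d = [v for v in diffs if v != 0]  # there are no zero differences here
abs_d = [abs(v) for v in d]
ranks = midranks(abs_d)
w_plus = sum(rk for rk, v in zip(ranks, d) if v > 0)
w_minus = sum(rk for rk, v in zip(ranks, d) if v < 0)
w = max(w_minus, w_plus)

n = len(d)
mu_w = n * (n + 1) / 4
var_w = n * (n + 1) * (2 * n + 1) / 24
# Tie correction: subtract (t^3 - t)/48 for each group of t tied |d| values.
var_w -= sum((t ** 3 - t) / 48 for t in Counter(abs_d).values())
z = (w - mu_w) / sqrt(var_w)
p_two_sided = erfc(abs(z) / sqrt(2))
print(w_minus, w_plus, round(z, 2), round(p_two_sided, 3))  # 7.0 71.0 2.51 0.012
```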
Example:
• Data: Before introducing a new beer on the market, the brewery
wants to know whether people appreciate it more than an existing
beer. Fifteen people give marks to both beers (blind test).
Person  Mark (new)  Mark (old)  Difference
1 6 4 2
2 8 3 5
3 4 7 -3
4 8 6 2
5 9 5 4
6 6 8 -2
7 7 4 3
8 5 5 0
9 8 6 2
10 8 5 3
11 8 8 0
12 7 5 2
13 9 7 2
14 5 4 1
15 6 5 1
Person  New  Old  Difference  |Difference|  Rank of |Difference|  Positive?
1 6 4 2 2 7.5 yes
2 8 3 5 5 15 yes
3 4 7 -3 3 12 no
4 8 6 2 2 7.5 yes
5 9 5 4 4 14 yes
6 6 8 -2 2 7.5 no
7 7 4 3 3 12 yes
8 5 5 0 0 1.5 zero
9 8 6 2 2 7.5 yes
10 8 5 3 3 12 yes
11 8 8 0 0 1.5 zero
12 7 5 2 2 7.5 yes
13 9 7 2 2 7.5 yes
14 5 4 1 1 3.5 yes
15 6 5 1 1 3.5 yes
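Normal Approximation

Here n = 15 and W = 99: the positive ranks sum to 97.5, and half of the ranks of the two zero differences (3/2 = 1.5) is added. Then

$P(W \ge 99) = P\left(\frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \ge \frac{99 - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}\right) \approx P\left(Z \ge \frac{99 - 60}{\sqrt{310}}\right) = P(Z \ge 2.215) \approx 0.013.$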
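The beer calculation can be reproduced in Python under one stated assumption: as in the table, the two zero differences are ranked together with the others, and their rank sum (1.5 + 1.5 = 3) is split equally between the positive and negative sides. The rule given earlier in these slides (drop zeros and reduce n) would give a different n and W; this sketch follows the table instead.

```python
from collections import Counter
from math import erfc, sqrt


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


new = [6, 8, 4, 8, 9, 6, 7, 5, 8, 8, 8, 7, 9, 5, 6]
old = [4, 3, 7, 6, 5, 8, 4, 5, 6, 5, 8, 5, 7, 4, 5]
d = [a - b for a, b in zip(new, old)]

# Zeros are kept and ranked, as in the table.
ranks = midranks([abs(v) for v in d])
w_plus = sum(rk for rk, v in zip(ranks, d) if v > 0)
zero_ranks = sum(rk for rk, v in zip(ranks, d) if v == 0)
w = w_plus + zero_ranks / 2  # assumption: zero ranks split equally

n = len(d)
mu_w = n * (n + 1) / 4
sigma_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w - mu_w) / sigma_w
p_one_sided = 0.5 * erfc(z / sqrt(2))  # P(Z >= z)
print(w_plus, w, round(z, 3), round(p_one_sided, 3))  # 97.5 99.0 2.215 0.013
```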
Nonparametric tests for comparing three or more groups or conditions:

(a) Kruskal-Wallis test:
Similar to the Mann-Whitney test, except that it enables you to compare three or more groups rather than just two. Different subjects are used for each group.

(b) Friedman's test:
Can be used with three or more conditions when the same subjects (or matched blocks) take part in every condition.
Kruskal-Wallis H test
• The Kruskal-Wallis test is a nonparametric test, which means that the test does not assume your data come from a particular distribution.

• The test is the nonparametric alternative to one-way ANOVA and is used when the assumptions for ANOVA are not met (such as the assumption of normality).

• In statistics, the Kruskal-Wallis one-way analysis of variance by ranks is a nonparametric method for testing equality of population medians among groups.
Test statistic:

$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$

• In this test all observations are ranked jointly.

• $R_i$: sum of the ranks occupied by the $n_i$ observations of the i-th sample; n is the total number of observations and k is the number of samples.

• The H statistic is well approximated by the $\chi^2$ distribution with k − 1 degrees of freedom.
Example
A shoe company wants to know if three groups of workers
have different salaries:

Women: 23K, 41K, 54K, 66K, 78K.

Men: 45K, 55K, 60K, 70K, 72K

Minorities: 18K, 30K, 34K, 40K, 44K.


Solution
• Step 1: Sort the data for all groups/samples into ascending order in one combined set.
• Step 2: Assign ranks to the sorted data points. Give tied values the average rank.
• Step 3: Add up the ranks for each group/sample.
Women: 23K, 41K, 54K, 66K, 78K = 2 + 6 + 9 + 12 + 15 = 44.
Men: 45K, 55K, 60K, 70K, 72K = 8 + 10 + 11 + 13 + 14 = 56.
Minorities: 18K, 30K, 34K, 40K, 44K = 1 + 3 + 4 + 5 + 7 = 20.
• Step 4: Calculate the test statistic:

$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1) = \frac{12}{15 \times 16}\left(\frac{44^2}{5} + \frac{56^2}{5} + \frac{20^2}{5}\right) - 3 \times 16 = 6.72$

• Step 5: Find the critical chi-square value with c − 1 degrees of freedom, where c is the number of groups.

For 3 − 1 = 2 degrees of freedom and an alpha level of 0.05, the critical chi-square value is 5.99.

• Step 6: The critical chi-square value (5.99) is less than the test statistic (H = 6.72), so we reject the null hypothesis: there is evidence that the median salaries of the three groups differ.
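A minimal Python sketch of the H statistic as defined in these slides (joint ranking with average ranks for ties, no tie correction), applied to the salary data in thousands. The helper names are hypothetical, and 5.991 is the upper 5% point of the chi-square distribution with 2 degrees of freedom.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Rank sums R_i and H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)  # all observations ranked jointly
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return rank_sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)


women = [23, 41, 54, 66, 78]
men = [45, 55, 60, 70, 72]
minorities = [18, 30, 34, 40, 44]
rank_sums, h = kruskal_wallis_h([women, men, minorities])
CHI2_CRIT_DF2 = 5.991  # chi-square, 2 d.f., alpha = 0.05
print(rank_sums, round(h, 2), h > CHI2_CRIT_DF2)  # [44.0, 56.0, 20.0] 6.72 True
```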
Example 02:
Does it make any difference to students' comprehension of statistics whether the lectures are given in English, Sinhala or Singlish (both)?

Group A: lectures in English;
Group B: lectures in Sinhala;
Group C: lectures in both.

DV: student rating of the lecturer's intelligibility on a 100-point scale ("0" = "incomprehensible").
English (raw)  English (rank)  Sinhala (raw)  Sinhala (rank)  Both (raw)  Both (rank)
20             3.5             25             7.5             19          1.5
27             9               33             10              20          3.5
19             1.5             35             11              25          7.5
23             6               36             12              22          5

English: M = 22.25, SD = 3.59; Sinhala: M = 32.25, SD = 4.99; Both: M = 21.50, SD = 2.65

Step 1:
Rank the scores, ignoring which group they belong to. The lowest score gets the lowest rank; tied scores get the average rank.

Step 2:
Find the rank total Tc for each group: English 20, Sinhala 40.5, Both 17.5.

Step 3:
Find H.

$H = \frac{12}{N(N+1)} \sum \frac{T_c^2}{n_c} - 3(N+1)$

N is the total number of subjects;
Tc is the rank total for each group;
nc is the number of subjects in each group.

$\sum \frac{T_c^2}{n_c} = \frac{20^2}{4} + \frac{40.5^2}{4} + \frac{17.5^2}{4} = 586.62$

$H = \frac{12}{12 \times 13} \times 586.62 - 3 \times 13 = 6.12$
Step 4:
Degrees of freedom are the number of groups minus one. Here, d.f. = 3 − 1 = 2.

For 2 d.f., a chi-square value of 5.99 has p = .05 of occurring by chance. Our H = 6.12 exceeds 5.99, so H0 is rejected at the .05 level.

Conclusion:

The three groups differ significantly; the language in which statistics is taught does make a difference to the lecturer's intelligibility.
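The ranks, rank totals and H can be checked with a short Python sketch that ranks the raw scores jointly (average ranks for ties) and applies the same formula without a tie correction. Helper names are hypothetical.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Rank totals T_c and H = 12/(N(N+1)) * sum(T_c^2 / n_c) - 3(N+1)."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return rank_sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)


english = [20, 27, 19, 23]
sinhala = [25, 33, 35, 36]
both = [19, 20, 25, 22]
rank_sums, h = kruskal_wallis_h([english, sinhala, both])
print(rank_sums, round(h, 3), h > 5.991)  # [20.0, 40.5, 17.5] 6.125 True
```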
Example:
An experiment designed to compare three preventive methods against corrosion yielded the following results (in thousand).

Method A  77 54 67 24 71 66
Method B  66 41 59 65 62 64 52
Method C  49 52 69 47 56

Use the 0.05 level of significance to test the null hypothesis that the three samples come from identical populations.
Hypothesis:

H0: The corresponding populations are identical (same).

H1: The populations are different.
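The slides leave this exercise unsolved. The Python sketch below applies the H formula from these slides (joint ranking, average ranks for ties, no tie correction) and compares H with 5.991, the 5% critical value of chi-square with 2 d.f. Treat it as a check on a hand calculation rather than an official answer.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Rank sums R_i and H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return rank_sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)


method_a = [77, 54, 67, 24, 71, 66]
method_b = [66, 41, 59, 65, 62, 64, 52]
method_c = [49, 52, 69, 47, 56]
rank_sums, h = kruskal_wallis_h([method_a, method_b, method_c])
print(rank_sums, round(h, 2), h > 5.991)  # [71.5, 63.0, 36.5] 2.14 False
```

With H ≈ 2.14 below 5.991, this calculation would not reject the hypothesis that the three samples come from identical populations.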
Example:
Four different milling machines were being considered for purchase by a manufacturer. Potentially, the company would be purchasing hundreds of these machines, so it wanted to make sure it made the best decision.
Initially, five of each machine were borrowed, and each was randomly assigned to one of 20 technicians (all technicians were similar in skill).
Each machine was put through a series of tasks and rated using a standardized test. The data are:
Machine 1  Machine 2  Machine 3  Machine 4
24.5       28.4       26.1       32.2
23.5       34.2       28.3       34.3
26.4       29.5       24.3       36.2
27.1       32.2       26.2       35.6
29.9       30.1       27.8       32.5

Use the 0.05 level of significance to test the null hypothesis that the four samples come from identical populations.
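This exercise is also left unsolved in the slides. A Python sketch using the same H formula (no tie correction; the tie at 32.2 gets the average rank 14.5), compared with 7.815, the 5% critical value of chi-square with 4 − 1 = 3 d.f. It is a check on hand calculation, not an official answer.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Rank sums R_i and H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return rank_sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)


machines = [
    [24.5, 23.5, 26.4, 27.1, 29.9],  # Machine 1
    [28.4, 34.2, 29.5, 32.2, 30.1],  # Machine 2
    [26.1, 28.3, 24.3, 26.2, 27.8],  # Machine 3
    [32.2, 34.3, 36.2, 35.6, 32.5],  # Machine 4
]
rank_sums, h = kruskal_wallis_h(machines)
print(rank_sums, round(h, 2), h > 7.815)  # [29.0, 65.5, 28.0, 87.5] 14.55 True
```

With H ≈ 14.55 above 7.815, this calculation would reject the hypothesis that the four samples come from identical populations.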
Friedman test
• This method compares several related samples and can be used as a nonparametric alternative to two-way ANOVA.

• The power of this method is low with small samples, but it is the best method for nonparametric two-way analysis of variance with sample sizes of five or more.
Example
• A randomized block experiment was conducted to evaluate the effect of a drug treatment on enzyme activity.
• Three different drug therapies were given to four animals, with each animal belonging to a different litter.
Table: enzyme activity for Animals 1 to 4 (blocks) under Drugs 1 to 3 (treatments).

• The Friedman test provides the desired test of

H0: the three treatment (drug) effects are zero

H1: not all treatment effects are zero.


Block \ Treatment   1     2     ...   k
1                   X11   X12   ...   X1k
2                   X21   X22   ...   X2k
3                   X31   X32   ...   X3k
...                 ...   ...   ...   ...
b                   Xb1   Xb2   ...   Xbk

H0: All treatments have identical effects

Ha: At least one treatment is different from at least one other treatment
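The Friedman statistic itself is not written out in these slides. A commonly used form is $F_r = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1)$, where $R_j$ is the sum of the within-block ranks of treatment j, compared with $\chi^2$ on k − 1 degrees of freedom. The Python sketch below implements that form without a tie correction; the data are hypothetical, since the enzyme-activity values are not reproduced here.

```python
from collections import Counter


def midranks(values):
    """1-based ranks from smallest to largest; ties share their average rank."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def friedman_statistic(table):
    """table[i][j]: observation for block i under treatment j.

    Ranks are assigned within each block (row); R_j is the column rank sum.
    """
    b, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rk in enumerate(midranks(row)):
            rank_sums[j] += rk
    fr = 12 / (b * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * b * (k + 1)
    return rank_sums, fr


# Hypothetical data: 4 blocks (animals) x 3 treatments (drugs).
table = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [0.5, 1.5, 2.5],
    [3.0, 4.0, 5.0],
]
rank_sums, fr = friedman_statistic(table)
print(rank_sums, fr)  # [4.0, 8.0, 12.0] 8.0
```

In this made-up table every block ranks the treatments the same way, which gives the largest possible value for b = 4 and k = 3; it would be compared with 5.991 (chi-square, 2 d.f., α = 0.05).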


To illustrate the Friedman rank test, return to the fast-food chain study in which six raters (blocks) evaluated four restaurants.
We conclude that there are significant differences (as perceived by the raters) in the median service ratings at the four restaurants.
Contingency tables

• A contingency table (also known as a cross tabulation or crosstab) is a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables.

• They are heavily used in survey research, business intelligence, engineering and scientific research.

• They provide a basic picture of the interrelation between two variables and can help find interactions between them.
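Testing association in a contingency table
• A contingency table with r rows and c columns is referred to as an r×c table ("r×c" is read as "r by c").

• The row and column totals are called marginal frequencies.

• We find
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$,
where $O_{ij}$ is the observed frequency in cell (i, j) and $E_{ij}$ = (row i total × column j total)/(grand total) is the expected frequency under independence.

• If $\chi^2 > \chi^2_{\alpha}$ with (r − 1)(c − 1) degrees of freedom, reject the null hypothesis of independence at the α level of significance; otherwise fail to reject the null hypothesis.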
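A minimal Python sketch of this statistic, with expected counts taken from the marginal frequencies. The function name is hypothetical and the 2 × 2 table is made up so that every expected count is 15.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    E_ij = (row i total) * (column j total) / (grand total).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence([[10, 20], [20, 10]])  # hypothetical table
print(round(chi2, 3), df)  # 6.667 1
```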
Example
• Suppose that a researcher studied the relationship between
having the AIDS Syndrome and sexual preference.
• The study resulted in the following data for thirty subjects:

Y = "yes" and N = "no" for AIDS;
F = "female", M = "male" and B = "both" for SEXPREF.
Null Hypothesis (H0): There is no relationship between sexual preference and whether or not an individual has the AIDS Syndrome (independent).

Alternative Hypothesis (H1): There is a relationship between sexual preference and whether or not an individual has the AIDS Syndrome (dependent); that is, the AIDS Syndrome is not distributed similarly across the different levels of sexual preference.
Example
• Suppose that we wish to determine whether the opinions of the voting
residents of the Colombo Municipal Council area concerning a new tax
reform are independent of their levels of income.
• A random sample of 1000 registered voters is classified as to whether they are in a low, medium or high income bracket and whether or not they favor the new tax reform.

             Income level
Tax reform   Low   Medium   High   Total
For          182   213      203    598
Against      154   138      110    402
Total        336   351      313    1000

• Test the null hypothesis of independence between a voter's opinion concerning the new tax reform and the voter's level of income.
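A Python sketch of the test for this table, with expected counts computed from the marginal totals and the statistic compared with 5.991, the 5% critical value of chi-square with (2 − 1)(3 − 1) = 2 degrees of freedom. The function name is illustrative.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected count under H0
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


#           Low  Medium  High
votes = [[182, 213, 203],   # For
         [154, 138, 110]]   # Against
chi2, df = chi_square_independence(votes)
print(round(chi2, 2), df, chi2 > 5.991)  # 7.88 2 True
```

The computed χ² ≈ 7.88 exceeds 5.991, so at the 0.05 level this calculation would reject independence between opinion on the tax reform and income level.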