
Nonparametric Methods

Glyzel Grace M. Francisco


STAT1200 – Management Science
2nd Semester, 2022-2023

CENTRAL LUZON STATE UNIVERSITY


DEPARTMENT of STATISTICS

Learning Outcomes

At the end of this lesson, you should be able to:

1. Determine when to perform nonparametric tests.
2. Perform the Spearman rank correlation.
3. Perform the chi-square test for independence.

Parametric vs. Nonparametric Statistical Analysis

Parametric tests assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable. For example, Student's t-test for two independent samples is reliable only if each sample follows a normal distribution and if the sample variances are homogeneous.

Nonparametric tests do not rely on any assumed distribution. They can thus be applied even when the parametric conditions of validity are not met.

Advantages of Nonparametric Methods

1. They can be used to test population parameters when the variable is not
normally distributed.
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population
parameters.
4. In some cases, the computations are easier than those for the parametric
counterparts.
5. They are easy to understand.
6. There are fewer assumptions that have to be met, and the assumptions
are easier to verify.

Disadvantages of Nonparametric Methods
1. They are less sensitive than their parametric counterparts when the assumptions of
the parametric methods are met. Therefore, larger differences are needed before
the null hypothesis can be rejected.
2. They tend to use less information than the parametric tests. For example, the sign
test requires the researcher to determine only whether the data values are above or
below the median, not how much above or below the median each
value is.
3. They are less efficient than their parametric counterparts when the assumptions of
the parametric methods are met. That is, larger sample sizes are needed to
overcome the loss of information. For example, the nonparametric sign test is
about 60% as efficient as its parametric counterpart, the z test. Thus, a sample size
of 100 is needed for use of the sign test, compared with a sample size of 60 for use
of the z test to obtain the same results.
Assumptions of Nonparametric Statistics

1. The sample or samples are randomly selected.


2. If two or more samples are used, they must be independent of each other
unless otherwise stated.

Remarks:
• If the parametric assumptions can be met, the parametric methods are
preferred.
• When parametric assumptions cannot be met, the nonparametric
methods are a valuable tool for analyzing the data.

Selection of statistical tools
Conditions/Purposes | Parametric Test (Normal Distribution) | Nonparametric Test (Non-normal Distribution)
Compare a mean with a standard value | One-sample z-test (if σ is known) or one-sample t-test (if σ is unknown) | Wilcoxon test
Compare two means of unpaired data sets | Two independent samples z-test (if σ1 and σ2 are known) or two independent samples t-test (if σ1 and σ2 are unknown) | Mann-Whitney test
Compare two means of paired data sets | Paired-sample t-test | Wilcoxon test
Compare >2 means of unmatched data sets | One-way ANOVA | Kruskal-Wallis test
Compare >2 means of matched data sets | Multi-factor ANOVA | Friedman test
Find the relationship between two variables | Pearson's correlation | Spearman's correlation
Predict the values of one variable from another | Simple linear or nonlinear regression | Spearman's correlation
Find the relationship among several variables | Multiple regression (linear/nonlinear) | Kendall's coefficient of concordance

Assessing normality using different statistical graphs/plots


• A normal quantile plot (or normal probability plot) is a graph of points (x,y) where
each x value is from the original set of sample data, and each y value is the
corresponding z score that is a quantile value expected from the standard normal
distribution.
Procedure for determining whether it is reasonable to assume that sample data are
from a normally distributed population:
1. Histogram: Construct a histogram. Reject normality if the histogram departs
dramatically from a bell shape.
2. Outliers: Identify outliers. Reject normality if there is more than one outlier
present. (Just one outlier could be an error or the result of chance variation, but
be careful, because even a single outlier can have a dramatic effect on results.)

3. Normal quantile plot: If the histogram is basically symmetric and there is at most one outlier, use technology to generate a normal quantile plot. Use the following criteria to determine whether or not the distribution is normal. (These criteria can be used loosely for small samples, but they should be used more strictly for large samples.)

Normal Distribution:
The population distribution is normal if the pattern of the points is reasonably close to a straight line and the points do not show some systematic pattern that is not a straight-line pattern.

Not a Normal Distribution:
The population distribution is not normal if either or both of these two conditions applies:
• The points do not lie reasonably close to a straight line.
• The points show some systematic pattern that is not a straight-line pattern.

Later in this section we will describe the actual process of constructing a normal quantile plot, but for now we focus on interpreting such a plot.
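The checks above can be automated. Below is a minimal Python sketch (an illustration only, assuming NumPy, Matplotlib, and SciPy are installed) that draws the histogram of Step 1 and the normal quantile plot of Step 3 for a sample. Note that scipy.stats.probplot places the theoretical quantiles on the x-axis and the ordered sample values on the y-axis, so its axes are swapped relative to the (x, y) definition given above.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Illustrative sample only; replace with your own data.
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=100, scale=15, size=50)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Step 1: histogram -- reject normality if it departs dramatically from a bell shape.
    ax1.hist(sample, bins=10, edgecolor="black")
    ax1.set_title("Histogram")

    # Step 3: normal quantile plot -- points close to a straight line suggest normality.
    stats.probplot(sample, dist="norm", plot=ax2)
    ax2.set_title("Normal quantile plot")

    plt.tight_layout()
    plt.show()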
Example (Normal)
The first case shows a histogram of IQ scores that is close to being bell-shaped, so the
histogram suggests that the IQ scores are from a normal distribution. The
corresponding normal quantile plot shows points that are reasonably close to a
straight-line pattern, and the points do not show any other systematic pattern that is
not a straight line. It is safe to assume that these IQ scores are from a normally
distributed population.

Example (Uniform)
The second case shows a histogram of data having a uniform distribution. The
corresponding normal quantile plot suggests that the points are not normally
distributed because the points show a systematic pattern that is not a straight-
line pattern. These sample values are not from a population having a normal
distribution.

Spearman Rank Correlation Coefficient

• The Spearman rank correlation coefficient is a nonparametric statistic that uses ranks to determine if there is a relationship between two variables.
• The computations for the rank correlation coefficient are simpler than those for the Pearson coefficient and involve ranking each set of data.
• The difference in ranks is found, and r_s is computed by using these differences.
• If both sets of data have the same ranks, r_s will be +1.
• If the sets of data are ranked in exactly the opposite way, r_s will be −1.
• If there is no relationship between the rankings, r_s will be near 0.

Assumptions for Spearman’s Rank Correlation Coefficient
1. The sample is a random sample.
2. The data consist of two measurements or observations taken on the
same individual.
Formula for Computing the Spearman Rank Correlation Coefficient

r_s = 1 − (6 Σd²) / (n(n² − 1))

where d = difference in ranks and n = number of data pairs.

Decision Rule: Reject Ho if |r_s| ≥ critical value.


Steps in Performing Spearman’s Rank Correlation Coefficient
Step 1: State the hypotheses.
Step 2: Find the critical value.
Step 3: Find the test value.
a. Rank the values in each data set.
b. Subtract the rankings for each pair of data values
c. Square the differences.
d. Find the sum of the squares.
e. Substitute in the formula of 𝑟𝑠
Step 4: Make the decision.
Step 5: Summarize the results.
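Steps 3a to 3e can be sketched in a few lines of Python. The function below (spearman_rs is our own name, not from the slides) assumes NumPy and SciPy are available and that there are no tied ranks, which is when the d² formula is exact.

    import numpy as np
    from scipy import stats

    def spearman_rs(x, y):
        """Spearman rank correlation r_s via the d^2 formula (assumes no tied ranks)."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(x)
        rank_x = stats.rankdata(x)        # Step 3a: rank each data set (1 = smallest)
        rank_y = stats.rankdata(y)
        d = rank_x - rank_y               # Step 3b: difference in ranks for each pair
        sum_d2 = np.sum(d ** 2)           # Steps 3c-3d: square the differences and sum them
        return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))   # Step 3e: substitute into the formula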
Example 1

Find the Spearman rank correlation coefficient for the following data, which represent the number of hospitals and nursing homes in each of seven randomly selected states. At the 0.05 level of significance, is there enough evidence to conclude that there is a correlation between the two?

Hospitals | Nursing Homes
107 | 230
61 | 134
202 | 704
133 | 376
145 | 431
117 | 538
108 | 373

(Critical values are read from the table of critical values for the rank correlation coefficient.)

Claim: There is a correlation between the two variables.

1. Ho: ρ = 0 (There is no correlation between the number of hospitals and nursing homes.)
   Ha: ρ ≠ 0 (There is a correlation between the number of hospitals and nursing homes.) -> claim

2. Test: Spearman's Rank Correlation Coefficient
   n = 7, α = 0.05, critical value = 0.786

3. Decision Rule: Reject Ho if |r_s| ≥ 0.786.

4. Computation
a.) Rank each data set as shown in the table. Let X1 be the hospitals and X2 be the nursing homes.

Hospitals (X1) | Rank of X1 | Nursing Homes (X2) | Rank of X2 | d = rank of X1 − rank of X2 | d²
107 | 2 | 230 | 2 | 0 | 0
61 | 1 | 134 | 1 | 0 | 0
202 | 7 | 704 | 7 | 0 | 0
133 | 5 | 376 | 4 | 1 | 1
145 | 6 | 431 | 5 | 1 | 1
117 | 4 | 538 | 6 | −2 | 4
108 | 3 | 373 | 3 | 0 | 0

Σd² = 6

b.) Substitute into the formula for r_s:

r_s = 1 − (6 Σd²) / (n(n² − 1)) = 1 − 6(6) / (7(7² − 1)) = 0.8929
5. Decision
Since 0.8929 > critical value (0.786), we reject Ho.

6. Conclusion
At the 5% level of significance, the sample data support the claim that there is a correlation between the number of hospitals and nursing homes.
(See the note below for the wording of the final conclusion.)
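As an optional cross-check (a sketch, assuming SciPy is installed), scipy.stats.spearmanr should reproduce r_s ≈ 0.8929 for these data, and its p-value can likewise be compared with α = 0.05.

    from scipy import stats

    hospitals = [107, 61, 202, 133, 145, 117, 108]
    nursing_homes = [230, 134, 704, 376, 431, 538, 373]

    rs, p_value = stats.spearmanr(hospitals, nursing_homes)
    print(rs)       # approximately 0.8929
    print(p_value)  # compare with alpha = 0.05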




Claim: ρ ≠ 0. Decision: Reject Ho.
Since the claim is the alternative hypothesis and Ho is rejected, the final conclusion is worded as: there is enough evidence to support the claim.
Example 2
The following data show the final term exam scores in English and Math of 10 students. At the 0.01 level of significance, is there enough evidence to conclude that there is a correlation between the two variables?

English | Math
56 | 66
75 | 70
45 | 40
71 | 60
62 | 65
64 | 56
58 | 59
80 | 77
76 | 67
61 | 63

(Critical values are read from the table of critical values for the rank correlation coefficient.)

Claim: There is a correlation between the two variables.

1. Ho: ρ = 0 (There is no correlation between the English and Math scores.)
   Ha: ρ ≠ 0 (There is a correlation between the English and Math scores.) -> claim

2. Test: Spearman's Rank Correlation Coefficient
   n = 10, α = 0.01, critical value = 0.794

3. Decision Rule: Reject Ho if |r_s| ≥ 0.794.
4. Computation
a.) Rank each data set as shown in the table. Let X1 be the English scores and X2 be the Math scores.

English (X1) | Rank of X1 | Math (X2) | Rank of X2 | d = rank of X1 − rank of X2 | d²
56 | 9 | 66 | 4 | 5 | 25
75 | 3 | 70 | 2 | 1 | 1
45 | 10 | 40 | 10 | 0 | 0
71 | 4 | 60 | 7 | −3 | 9
62 | 6 | 65 | 5 | 1 | 1
64 | 5 | 56 | 9 | −4 | 16
58 | 8 | 59 | 8 | 0 | 0
80 | 1 | 77 | 1 | 0 | 0
76 | 2 | 67 | 3 | −1 | 1
61 | 7 | 63 | 6 | 1 | 1

Σd² = 54

b.) Substitute into the formula for r_s:

r_s = 1 − (6 Σd²) / (n(n² − 1)) = 1 − 6(54) / (10(10² − 1)) = 0.6727
5. Decision
Since 0.6727 < critical value (0.794), we fail to reject Ho.

6. Conclusion
At the 1% level of significance, there is not sufficient sample evidence to support the claim that there is a correlation between the English and Math scores of the students.
(See the note below for the wording of the final conclusion.)

Claim: ρ ≠ 0. Decision: Fail to reject Ho.
Since the claim is the alternative hypothesis and Ho is not rejected, the final conclusion is worded as: there is not sufficient evidence to support the claim.

Chi-Squared Test of Independence

• Tests the null hypothesis that the row variable and the column variable in a contingency table are not related.

Ho: The row variable and column variable are not related
Ha: The row variable and column variable are related

Assumptions:
• The sample data are randomly selected.
• For every cell in the contingency table, the expected frequency is at least 5.


Test Statistic Value:

χ²_c = Σ_{i=1}^{k} (O_i − E_i)² / E_i

where

E_i = (row total)(column total) / (grand total)

Decision Rule: Reject Ho if χ²_c > χ²_(α, (r−1)(c−1)), where r is the number of rows and c is the number of columns in the contingency table.
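A minimal Python sketch of this computation (a helper of our own, assuming NumPy; not part of the slides) builds the expected counts from the row, column, and grand totals and then sums (O − E)²/E over every cell. The critical value χ²_(α, df) can then be read from a chi-square table or obtained with scipy.stats.chi2.ppf(1 - alpha, df).

    import numpy as np

    def chi_square_independence(observed):
        """Chi-square test statistic and degrees of freedom for a contingency table."""
        observed = np.asarray(observed, dtype=float)
        row_totals = observed.sum(axis=1, keepdims=True)
        col_totals = observed.sum(axis=0, keepdims=True)
        grand_total = observed.sum()
        # E_ij = (row total)(column total) / (grand total)
        expected = row_totals * col_totals / grand_total
        # chi^2_c = sum over all cells of (O - E)^2 / E
        chi2_c = np.sum((observed - expected) ** 2 / expected)
        df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
        return chi2_c, df, expected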

Steps in Performing Chi-Squared Test of Independence


Step 1: State the hypotheses.
Step 2: Find the critical value.
Step 3: Find the test value.
a. First, find the expected values for each cell of the contingency table.
b. Find the test value using the formula for χ²_c.
Step 4: Make the decision.
Step 5: Summarize the results.

Example 1

Based on the table below, is there evidence to suggest that sex is related to
whether a person is left-handed or right-handed? Test at 0.05 level of
significance.

Sex | Left-handed | Right-handed | Total
Female | 12 | 108 | 120
Male | 24 | 156 | 180
Total | 36 | 264 | 300

Claim: Sex and hand preference are related.

1. Ho: Sex and hand preference are not related.
   Ha: Sex and hand preference are related. -> claim

2. Test: Chi-Squared Test of Independence
   df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1 (see the decision rule above)
   α = 0.05, critical value = 3.841

3. Decision Rule: Reject Ho if χ²_c > 3.841.


4. Computation
First, find the expected values for each cell of the contingency table, using E_ij = (row total)(column total) / (grand total):

E_11 = (120)(36)/300 = 14.4     E_12 = (120)(264)/300 = 105.6
E_21 = (180)(36)/300 = 21.6     E_22 = (180)(264)/300 = 158.4

Then compute the test value:

χ²_c = Σ (O_i − E_i)² / E_i
     = (12 − 14.4)²/14.4 + (108 − 105.6)²/105.6 + (24 − 21.6)²/21.6 + (156 − 158.4)²/158.4
     = 0.7576


5. Decision
Since 0.7576 < critical value (3.841), we fail to reject Ho.

6. Conclusion
At the 5% level of significance, there is not enough evidence to conclude that sex and hand preference are related.
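As an optional cross-check (a sketch, assuming SciPy is installed), scipy.stats.chi2_contingency reproduces the expected counts and the test statistic; correction=False turns off the Yates continuity correction so that the value matches the hand computation above.

    from scipy import stats

    observed = [[12, 108],
                [24, 156]]

    chi2_c, p_value, df, expected = stats.chi2_contingency(observed, correction=False)
    print(chi2_c)    # approximately 0.7576
    print(expected)  # [[ 14.4 105.6] [ 21.6 158.4]]
    print(p_value)   # greater than 0.05, so we fail to reject Ho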

