LECTURE 2 - Introduction to Nonparametric Methods
Administrative Notes
1) SPSS can have glitches. If you experience difficulty reading in data: (1) make sure your original file is closed; (2) restart SPSS; (3) restart your computer.
1) Prediction:
Predicted CancerMortality = 114.72 + 9.23 × Radiation
Predicted CancerMortality = 114.72 + 9.23 × 5
Predicted CancerMortality = 160.87
2) Interpretation:
Estimates β̂i's
General
β̂0 = intercept = average value of Y when all the X's = 0
β̂j = "slopes" associated with predictor Xj; a 1-unit change (usually an increase) in Xj is associated with a β̂j change in Y
In the context of our radiation example.
β̂0 or b0 (intercept): 114.72; the average cancer mortality rate is 114.72 deaths per 100,000 people when there is no radiation exposure.
β̂1 or b1 (slope): 9.23; for every one-point increase in radiation exposure, the cancer mortality rate increases by 9.23 deaths per 100,000 people.
Confidence Intervals for the β̂i's
β̂0: 95% CI [95.69, 133.74]
On average, in areas where there is no radiation exposure, the cancer mortality rate is between 96 and 134 deaths per 100,000 people. Since this entire interval is above 0, we are (95%) confident that there is cancer mortality even without radiation exposure.
β̂1: 95% CI [5.88, 12.59]
On average, every extra point of radiation exposure is associated with an increase in mortality rate of between 5.88 and 12.59 deaths per 100,000. Again, the whole CI is above 0 => positive relationship between radiation and mortality rate.
SPSS Output
Parameter Estimates (Dependent Variable: CancerMortality)

Parameter    B         Std. Error   t        Sig.     95% CI Lower   95% CI Upper
Intercept    114.716   8.046        14.258   0.000    95.691         133.741
Radiation    9.231     1.419        6.507    0.000    5.877          12.586
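For illustration, here is a minimal Python sketch of the same simple regression fit; the file name radiation.csv and the column names are hypothetical stand-ins for whatever the actual data set contains.

    # Minimal sketch: the same simple linear regression in Python (statsmodels).
    # The file name and column names below are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("radiation.csv")                  # hypothetical data file
    fit = smf.ols("CancerMortality ~ Radiation", data=df).fit()

    print(fit.params)          # b0 (intercept) and b1 (slope for Radiation)
    print(fit.conf_int(0.05))  # 95% confidence intervals for the coefficients
    print(fit.summary())       # table analogous to SPSS "Parameter Estimates"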
Assumptions:
- Y is continuous
- Average relationship between Y and X’s is linear
- X’s are on the correct scale
- Errors (ε's) are:
o Mean zero
o Independent
o Normally distributed
o Have constant variance (same degree of certainty in the measurements for each subject / set of X values)
(Like having a random sample from a common distribution.)
This class is about what we do when one or more of the OLS assumptions is violated.
Thus far, we have dealt with independent observations. But what happens when the observations are NOT independent of each other?
Example: Effect of COVID quarantine and Exercise Program
X=weight before COVID quarantine
Y=weight after COVID quarantine
Z=X-Y= amount of weight lost (note: Z<0 indicates weight gained)
Paired t-test setting
Non-parametric Methods
Nonparametric/semi-parametric methods: use these techniques when you do not know or don't wish to make assumptions about the distribution of
- The data: i.e. the X or Y variables or the errors
- The parameter estimates (e.g. the β̂'s)
There are many such techniques:
- Classical methods: principally based on ranks; examples include Spearman rank
correlation, Wilcoxon rank-sum, Wilcoxon signed-rank tests, the Kruskal-Wallis test, etc.
o These generalize the Pearson correlation, two-sample t-test, paired t-test, and ANOVA, respectively
- Modern methods: smoothing, permutation tests, the bootstrap and other simulation or
resampling methods.
Idea is to let the data tell you about the distribution or nature of the relationships. These
techniques make fewer (but not NO) assumptions.
- Want to do standard tests or fit models making as few assumptions as possible about
distribution of the estimator for parameter of interest.
Classical Approaches: Mostly based on ranks. The basic procedure is as follows:
1. Order (Rank) the values in your data set from smallest to largest
2. Replace the original values by their ranks
3. Ties get the average of the ranks
4. Run the analysis you wanted to do on the ranks
Key: Distribution of ranks is “known”
- it’s uniform on 1,2,….,n
- so we can easily get the distribution of our test statistic
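To make steps 1-3 concrete, here is a minimal Python sketch (assuming scipy is available; the data values are made up):

    # Rank the values from smallest to largest; ties get the average of their ranks.
    import numpy as np
    from scipy.stats import rankdata

    x = np.array([3.1, 7.4, 7.4, 2.0, 9.8])       # made-up data
    ranks = rankdata(x, method="average")
    print(ranks)                                  # [2.  3.5 3.5 1.  5. ]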
Advantages of Non-parametric Methods:
• Easy: can be used with most standard methods
• Know that you are using the “right” distribution
• Works no matter what underlying distribution
• Robust: even if you have an extreme outlier, at worst its rank is 1 or n
Disadvantages of Non-parametric Methods:
• Lose lots of information when we throw away original values
• Non-parametric methods tend to be less powerful than the corresponding parametric
method would be if its assumptions were correct
Non-parametric version: the Spearman rank correlation
- Rank X’s and Y’s from 1 to n (preserve the pairs)
- Calculate Pearson correlation of the ranks
- Answer is still between -1 to 1 and the interpretation is largely the same BUT the
Spearman correlation doesn’t measure linear relationships
- Spearman correlation measures whether X and Y have the same ordering
(monotonicity; monotonic relationships)
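A minimal Python sketch (made-up data, assuming scipy) showing that the Spearman correlation is just the Pearson correlation computed on the ranks:

    # Spearman correlation = Pearson correlation of the ranks (pairs preserved).
    import numpy as np
    from scipy.stats import pearsonr, spearmanr, rankdata

    x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])      # made-up data with one extreme value
    y = np.array([2.1, 3.9, 9.0, 15.5, 400.0])    # increases monotonically with x

    r_pearson = pearsonr(x, y)[0]                         # pulled around by the extreme pair
    r_spearman = spearmanr(x, y)[0]                       # measures monotonic association
    r_by_hand = pearsonr(rankdata(x), rankdata(y))[0]     # identical to r_spearman

    print(r_pearson, r_spearman, r_by_hand)               # r_spearman = r_by_hand = 1.0 here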
When would we use Spearman correlation?
- Distributions of X and Y are very non-normal
- Sample size is small or there are extreme outliers
- If you want the strength of a non-linear (but monotonic) relationship
- If data are ordinal, but not interval scaled
How do we choose between Pearson and Spearman correlations?
- The two methods can produce quite similar answers (e.g. when the data match the
parametric assumptions of normality); in this case, you feel very confident about the
answer and it doesn’t matter which you use
- They can produce quite different answers (e.g. relationship is non-linear; there are
outliers); how big the difference depends on the exact data pattern. In this case, you
need to try to identify what caused the difference and decide which is more
appropriate based on context/analytic goals. This is an art rather than a rigid rule.
Bottom line: if in doubt, calculate both and compare the results.
Figure A: The Spearman correlation captures a perfect non-linear (monotonic) relationship without having to figure out a transformation.
Possible shuffles: the ranks of X are fixed at 1, 2, 3. Under H0, all 6 possible orders of the Y ranks are equally likely, and the 6 resulting correlations of the X and Y ranks give the null distribution of the Spearman correlation:

Y ranks   Spearman correlation
1 2 3     1 (observed value)
1 3 2     0.5
2 1 3     0.5
2 3 1     -0.5
3 1 2     -0.5
3 2 1     -1
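A minimal Python sketch that reproduces this null distribution by brute force (for larger n the same idea applies, just with n! possible orders):

    # Exact null distribution of the Spearman correlation for n = 3:
    # enumerate all 3! = 6 equally likely orderings of the Y ranks.
    from itertools import permutations
    import numpy as np
    from scipy.stats import pearsonr

    x_ranks = np.array([1, 2, 3])
    null_values = [round(pearsonr(x_ranks, np.array(p))[0], 2)
                   for p in permutations([1, 2, 3])]

    print(sorted(null_values))     # [-1.0, -0.5, -0.5, 0.5, 0.5, 1.0]
    # One-sided p-value for the observed value of 1: P(correlation >= 1 under H0) = 1/6
    print(sum(v >= 1.0 for v in null_values) / len(null_values))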
Rank Analogues for t-tests and ANOVA
Paired t-test setting
Example: Effect of Diet and Exercise Program
X=weight before diet/exercise program
Y=weight after the program
Z=X-Y= amount of weight lost (note: Z<0 indicates weight gained)
What I want to know is whether the program helps people lose weight. How do we define
“helping”?
Scenario 2: We could ask instead whether people on the program are more likely to lose weight
than not to lose weight.
Let P= P(Z>0)= probability of losing weight
Then we can write our hypotheses
H0: P ≤ ½ -people are no more likely to have lost (than gained wt.); half or fewer lose weight
HA: P > ½ -more than half of people lose weight
You can think of this as asking whether median weight loss is above zero.
How do we actually perform the hypothesis test?
Sign Test: Focuses on H0: P ≤ ½ versus HA: P > ½
- Throws away all info except whether or not the person lost weight. Test statistic is
the number or fraction of people in data set who lost weight.
In general, suppose we have a sample of size n and let n* be the number of subjects who have a difference (either + or -) between their two values (i.e. a non-zero weight change). We then calculate the number/fraction of these with a positive value.
Why n*? The convention is that pairs with no difference tell you nothing about the direction of the effect. People with ties (the same value before and after) tell you nothing about the direction of change and are usually dropped.
- However, we can argue in our context that weight losses of zero are "failures" and count them as "non-losses". The clinician/researcher needs to decide!
- Under H0 (using P = ½ as the boundary condition), s+, the number of pairs with a positive difference, has a binomial distribution with parameters n* and p = ½ (like flipping a fair coin)
The p-value is the probability of seeing data as or more extreme than what you observed (more favorable to HA), assuming H0 is true.
In weight loss example, p-value is the probability that “this” many people or more in our
sample would have lost weight if the program didn’t work.
Example: n= 9 people
n* = 8 people had weight change
s+ = 7 of these 8 lost weight
H0: P ≤ ½ versus HA: P > ½ recall p=probability of losing weight
p-value=probability that at least 7 out of 8 people would lose weight if program did not work
p-value = P(s+ ≥ 7 | p = 1/2, n* = 8) = P(s+ = 7) + P(s+ = 8)

In general, P(s+ = k) = (n* choose k) p^k (1 - p)^(n* - k), where (n choose k) = n!/(k!(n - k)!) and n! = n ∙ (n-1) ∙ … ∙ 3 ∙ 2 ∙ 1

In our example, n* = 8 and p = ½ under H0, so
p-value = P(s+ = 7) + P(s+ = 8) = (8 choose 7)(½)^7(½)^1 + (8 choose 8)(½)^8(½)^0 = .035
In general, let the computer do this. The conclusion for the 1-sided test using α = .05 is that we reject H0 (p-value = 0.035) and conclude that people on the diet program do "lose weight", i.e. they are more likely to have a lower weight after the program than before.
Two-sided test would be:
p-value= P (s+ = 0) + P (s+ = 1) + P (s+= 7) + P (s+= 8) = .07 (by symmetry) and we’d fail to
reject at .05
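A minimal Python sketch of the exact binomial calculation for this example (assuming scipy):

    # Exact sign-test p-values for the weight-loss example: n* = 8, s+ = 7, P = 1/2 under H0.
    from scipy.stats import binom

    n_star = 8
    p_one_sided = binom.pmf(7, n_star, 0.5) + binom.pmf(8, n_star, 0.5)   # P(s+ >= 7)
    p_two_sided = 2 * p_one_sided          # add the equally extreme low tail, P(0) + P(1)

    print(round(p_one_sided, 3), round(p_two_sided, 3))                   # 0.035 0.07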
A problem with the sign test is that it has rather low power because it throws away all info about
the actual values (even ranks) and we really care about magnitude of the changes in our example.
Wilcoxon signed-rank test:
- same set up as the sign test but technically instead of testing whether the median
difference is zero (i.e. P=1/2), Wilcoxon signed-rank tests whether the difference scores
are symmetric about zero. It takes into account the magnitude as well as the signs but
only using ranks, not the original values to avoid distributional assumptions.
Procedure:
1. take differences: Z=X-Y
2. Take |Z|, absolute value of the differences (i.e. size but not direction) and rank them from
smallest to largest
3. Our test statistic W+ = sum of the ranks for the subjects with positive changes (of course
W- works just as well)
High W+ means either
(i) lots of positive differences (most people lost weight)
(ii) All the people with the highest ranks had positive changes (biggest weight changes
were losses)
a. High W- or low W+ means opposite
b. W- ≈ W+ suggests there was no change in either direction
We need to get a p-value associated with our test statistic W+ (or W-).
To get the distribution of W+ under H0, you (or better, the computer!) write down all the possible assignments of signs to the ranks (i.e. all possible values of W+) for your sample size, order them, and look at where your observed value falls on the list.
- Why are we doing this? Under H0, each rank (subject) is equally likely to be a loss or a gain, so all sign/rank combinations are equally likely. We calculate the rank sum for each combination, and the proportion of combinations giving each result is the probability associated with that value of W+.
Note: Need n ≥ 5 or there is no hope of significance
- May have ties: use average ranks, but this makes calculating the W+ distribution even harder
- It turns out there's also a normal approximation for the distribution of W+ if n is large enough (n > 20).
Example: Weight data W+ = 30 (7 + 2 + 4.5 + 8 + 1 + 4.5 + 3); W- = 6
This is obviously tilted in favor of weight loss, but because the one person who gained weight had a fairly high rank and the sample is small, the p-value = .0508 (1-sided; look up via table or computer; p = 0.10 for the two-sided test), so we just fail to reject H0.
ID   Weight Before (X)   Weight After (Y)   Change (X-Y)   Sign   |X-Y|   Rank of |X-Y|   Signed Rank
1    125                 110                15             +      15      7               7
2    115                 112                3              +      3       2               2
3    130                 125                5              +      5       4.5             4.5
4    140                 140                0              (no change; dropped)
5    115                 124                -9             -      9       6               -6
6    140                 123                17             +      17      8               8
7    125                 123                2              +      2       1               1
8    140                 135                5              +      5       4.5             4.5
9    135                 131                4              +      4       3               3
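A minimal Python sketch of the signed-rank calculation for the weight data above; the W+ computation mirrors the table, and the scipy call at the end is included for comparison (its exact p-value behavior depends on the version and arguments).

    # Wilcoxon signed-rank statistic for the weight data, computed from ranks.
    import numpy as np
    from scipy.stats import rankdata, wilcoxon

    before = np.array([125, 115, 130, 140, 115, 140, 125, 140, 135])
    after  = np.array([110, 112, 125, 140, 124, 123, 123, 135, 131])
    z = before - after                       # weight lost; subject 4 has a zero change

    z_nz = z[z != 0]                         # drop the zero difference, leaving n* = 8
    ranks = rankdata(np.abs(z_nz))           # rank |differences|; ties get average ranks
    w_plus = ranks[z_nz > 0].sum()
    print(w_plus)                            # 30.0

    print(wilcoxon(z, alternative="greater"))   # scipy drops zeros by default and reports a p-value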
Wilcoxon Rank Sum Test (two independent groups)
Under H0 we're assuming the x's and y's are exchangeable (i.e. the group labels don't really matter; there is no systematic difference between the groups)
Most general way to write this mathematically: if X is a randomly selected member of group 1
and Y is a randomly selected member of group 2 then
H0: P (X ≥ Y) = P (Y ≥ X) = ½
HA: P (X ≥ Y) = P (Y ≥ X) ≠ ½ for two-sided test
This is really asking whether X is consistently/systematically bigger than Y, or vice versa; that is, whether one of the groups tends to have higher values than the other group.
If the distribution of the Y’s (group 2) has the same shape as the distribution of the X’s (Figure A
and B) but is just shifted higher or lower by a fixed amount (location shift) then the above is
equivalent to a comparison of group medians (or means). Hence, the Wilcoxon test is often
stated as a test of medians.
If you further assume that the distributions are symmetric, then it’s the usual test of means since
mean=median.
[Figure: two identically shaped, symmetric distributions with centers µ1 and µ2 (mean = median in each), separated by a location shift]
Figure A: There is a location shift, and distributions are symmetrical. Testing means, medians or
distributions are equivalent hypotheses.
Note: if the distributions are symmetric but not identical shape then mean/median test is NOT the
same as testing of distributions.
Figure B: There is a location shift, but the distributions are not symmetric. Within each group the mean is NOT the same as the median, but testing the means is the same as testing the medians.
Figure C: The shapes of the distributions are different (no symmetry or location shift); all the tests are different; Y is systematically higher than X, but the means or medians could still be equal.
Wilcoxon Rank Sum Test focuses on:
- Put all the observations (both groups: X and Y) together and rank them from smallest to
largest; Ties get average rank
- Calculate the sum of the ranks for group 1 (R1) or group 2 (R2); it doesn't matter which one; this is the test statistic.
- Idea is that under H0 the sums of ranks for 2 groups should be about the same (Any value
should be equally likely to come from either group.) If the groups are different sizes, it's the
average ranks that should be the same. The way this is presented on computer output is the
“expected” rank sum for each group under H0 adjusting for sample sizes.
- To figure out the p-value associated with R1 or R2, we need to write down all the possible ways the ranks for the n1 + n2 total subjects could have been split into groups of size n1 and n2, calculate the associated R1 and R2 for each split, and see where our observed values fall.
Example: 4 people in a Phase II clinical trial (evaluating safety and a little efficacy); 2 get assigned the new treatment and 2 get control/placebo.
The outcome measures how well people respond; assume high values = good.
Original values:
Control: 60, 62
New Treatment: 71, 80
Let's focus on TX group: How many ways are there for the ranks to be allocated?
(4 choose 2) = 6 ways
Ranks of Treatment   Sum of treatment ranks (Rtx)
1,2                   3
1,3                   4
1,4                   5
2,3                   5
2,4                   6
3,4                   7 (observed)
If there’s no treatment effect, the various rank values (1,2,3,4) are equally likely to be in either
group
P(Rtx = 3) = 1/6
P(Rtx = 4) = 1/6
P(Rtx = 5) = 2/6 = 1/3
P(Rtx = 6) = 1/6
P(Rtx = 7) = 1/6
p-value = P(the tx group would have done at least this well in our sample if tx didn't work)
= P(Rtx ≥ 7) = P(Rtx = 7)
= 1/6
= 0.167
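A minimal Python sketch of this exact calculation, enumerating all (4 choose 2) = 6 possible rank allocations:

    # Exact null distribution of the treatment rank sum for the 4-person trial.
    from itertools import combinations

    control, treatment = [60, 62], [71, 80]
    pooled = sorted(control + treatment)
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}       # no ties here

    observed = sum(rank_of[v] for v in treatment)            # 3 + 4 = 7

    null_sums = [sum(c) for c in combinations([1, 2, 3, 4], 2)]
    print(sorted(null_sums))                                 # [3, 4, 5, 5, 6, 7]

    p_value = sum(s >= observed for s in null_sums) / len(null_sums)
    print(p_value)                                           # 0.1667 (1/6)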
Kruskal-Wallis Test
- non-parametric analogue of ANOVA; it allows you to compare whether values of an outcome are systematically different (larger versus smaller) across 3 or more groups
- it's a straightforward extension of the Wilcoxon rank sum test.
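A minimal usage sketch (made-up data, assuming scipy):

    # Kruskal-Wallis test comparing one outcome across 3 groups (made-up data).
    from scipy.stats import kruskal

    group_a = [12, 15, 14, 10]
    group_b = [22, 25, 19, 30]
    group_c = [11, 13, 16, 12]

    stat, p = kruskal(group_a, group_b, group_c)
    print(stat, p)      # a small p suggests at least one group is systematically shifted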
Simulation Based Non-Parametric Tests
- Classical rank tests depend on the idea that the ranks are uniformly distributed and under
the null hypothesis, H0, are “randomly distributed” (equally likely to occur in any
particular combination)
- This means that you can get the null distribution of the test statistic for a rank based
method by simply writing down all the possible combinations of ranks and calculating
the corresponding test statistic values. This is a special case of something called a
permutation test.
Permutation Test
- A permutation test tries to approximate the null distribution of your test statistic without
making assumptions about the underlying data distribution or invoking a theoretical
argument.
- In essence, we let the data tell us what the null distribution looks like;
- Instead of using a Z, t, F, or χ2 table, we “simulate” a data-set specific table.
- In a permutation test, H0 is generally that there is “no relationship” between two
constructs of interest and HA is that there is a relationship.
- The key idea in creating the null distribution of the test statistic is to "break" the relationship that may (or may not) be present in the data by shuffling or permuting the values of one of the variables.
Procedure
Step 1: Pick your hypotheses. Usually H0 is that there is "no relationship" between two key variables and HA is that there is a relationship, but you need to be very clear about which variables you are relating!
Step 2: Pick a test statistic. This should be an intuitive measure of how well your data match H0 vs HA. It can be a traditional statistic (e.g. sample correlation, t-score, etc.) or something else (e.g. difference in medians, rate ratio, etc.). Calculate the test statistic for the observed data.
Step 3: Get the distribution of your test statistic under H0 by picking an appropriate way to reshuffle your data to break the relationship, and calculate the test statistic for each of the reshuffled (simulated) data sets. Order the resulting test statistic values.
Step 4: To get the p-value, find where your observed test statistic from Step 2 falls among the ordered values from Step 3 (i.e. its percentile).
Example: Permutation Test for Group Difference
Outcome: # of papers published last year by assistant professors in
(1) Math Dept: 1, 2, 6      x̄1 = 3, med1 = 2
(2) Biostat Dept: 4, 9, 11   x̄2 = 8, med2 = 9
Goal: We want to see who publishes “more” which could mean on average (means), in terms of
the center of the distribution (median), whether a randomly selected biostatistician usually has a
higher value than a randomly selected mathematician (distribution), etc.
- We want to compare groups say by means or medians.
- We will look at permutation tests based on both the difference in means and the
difference in medians.
o For the means, we could use a t-test, but the n's are small and there appears to be one unusual point in each group, so the usual t-statistic may not have a t distribution and we don't know the distribution of the sample mean.
o The distribution of the difference in medians is not so simple (sadly…) and in any case depends on the underlying data distribution, which we don't want to assume.
Test statistics
(a) Means: the traditional choice is the t-statistic
t = (8 - 3) / (3.16 × sqrt(1/3 + 1/3)) = 1.93
(3.16 is the pooled standard deviation of the two groups)
What do we shuffle? Here the prospective relationship is between # of papers and department
membership so we permute the group labels to break the relationship between number of
publications and field of study.
This amounts to choosing which 3 people get the “biostat” label. How many are there?
n = 6 profs; need k = 3 in each group
(6 choose 3) = 6!/(3!·3!) = (6∙5∙4∙3∙2∙1)/((3∙2∙1)∙(3∙2∙1)) = 20
[Table: the 20 possible reassignments of the Math/Biostat labels to the six publication counts (1, 2, 6, 4, 9, 11), each with its resulting test statistics]
The figure shows histograms of all possible permutation values for the test statistics.
Our t-score was the second highest (1-sided p-value=2/20 = 0.1) and the median difference was
tied for the highest (1-sided p-value=2/20 = 0.1). Not good enough to establish a significant
difference in publication rates between biostatistics and math (bummer!)
- In this case, it was possible to list out all the possible permutations of the data but if the
sample sizes are larger, this will not be feasible even with a computer. Ordinarily we
would just do a random subset of permutations, say P=1000 and get an estimate.
- Aside: Why use the t-statistic in the first part of this example even though the t-test assumptions are unmet?
o The t-statistic still gives us an intuitively reasonable measure of whether the average publication levels differ, standardized by the variability; we just don't want to assume we can use a t-table to calculate the corresponding p-value.
- The smaller the α you want to use, the larger the # of permutations you need to get a good approximation of the tail probabilities.
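Here is a minimal Python sketch of the random-shuffle version of this permutation test, applied to the publications example with the difference in group means as the test statistic (with only 20 possible label assignments we could enumerate them all instead):

    # Permutation test for the publications example via random shuffles of the labels.
    import numpy as np

    rng = np.random.default_rng(0)
    papers = np.array([1, 2, 6, 4, 9, 11])       # all six professors
    labels = np.array([0, 0, 0, 1, 1, 1])        # 0 = Math, 1 = Biostat

    def mean_diff(values, labs):
        return values[labs == 1].mean() - values[labs == 0].mean()

    observed = mean_diff(papers, labels)         # 8 - 3 = 5

    n_perm = 10_000
    null = np.array([mean_diff(papers, rng.permutation(labels)) for _ in range(n_perm)])

    p_value = np.mean(null >= observed)          # one-sided p-value
    print(observed, p_value)                     # p should be close to the exact 2/20 = 0.1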
The Bootstrap
-Permutation procedures try to generate the null distribution of a statistic without making many
assumptions
-There are other inferential situations when you might want the "actual" distribution under the "true" parameter value rather than the null distribution; e.g. to get a confidence interval, we're really interested in the distribution under HA
-The bootstrap is a simulation procedure for looking at the "actual" distribution of your parameter estimate or test statistic without making assumptions about either the distribution of the data or of the estimator/test statistic, other than that we have a representative sample.
When do we use the bootstrap?
-If the distribution of estimator/statistic is unknown/hard to derive analytically
-Use it even for standard statistics (x̄, s, β̂) where the usual assumptions may be dubious (non-normality, outliers, small n)
In general, suppose we're interested in a population parameter θ, and we estimate it using a statistic, θ̂, based on a sample of size n: {x1, x2, …, xn}
Question: How can we estimate the distribution (i.e. probabilities, standard error, etc.) of θ̂? We need this to do inference.
-Key idea is to mimic the relationship between the sample we observed and the population.
-Ideally, to get the distribution of θ̂ based on a sample of size n, we would draw lots of samples of size n from the population, calculate θ̂ for each of them, and look at the resulting distribution (e.g. a histogram)
Problem: We can’t afford to do this and even if we could, we’d just want to combine points into
one big sample to get better estimates!
Solution: Bootstrap approach relies on the idea that my sample is “like” or is representative of
the population (standard basis for statistical inference)
- so…. taking samples from my original sample should be “like” sampling from the
original population. But I can take as many “resamples” as I want from my original data
set!
- The bootstrap samples need to be the same size as the original sample so that the parameter estimates θ̂* behave the same way as the original θ̂.
- This means that we need to sample with replacement or we won’t get any variation. Some
of the original values will appear more than once in a given bootstrap sample and others
will not appear at all.
- Because the original sample is smaller than the full population, resampling from it gives a coarser estimate of the distribution of θ̂; the smaller the sample, the worse this problem is.
- Bootstrap does NOT save you from small sample sizes or “create” new data – it just
makes maximum use of the data you have
- We can use the bootstrap parameter values, the θ̂*'s, to estimate anything we want about the distribution of θ̂.
Conceptual Picture
Procedure:
1. Take the original sample of size n and calculate the estimator/test statistic of interest, θ̂.
2. Obtain B bootstrap samples of size n with replacement from the original sample (some values occur multiple times while others are left out of any given resample.)
3. For each bootstrap sample, compute the statistic of interest to get θ̂*1, …, θ̂*B
4. Use the θ̂*'s to learn anything you want about the distribution of θ̂
(a) Make a histogram of the θ̂*'s to get the shape of the distribution
(b) You can order the θ̂*'s to get percentiles/probabilities associated with the distribution of θ̂
(c) You can calculate the standard deviation of the θ̂*'s to get a standard error estimate for θ̂, i.e. to estimate the uncertainty of θ̂ and calculate confidence intervals. This is especially useful if you do not have a formula for the standard error of θ̂; you can simply estimate it by the standard deviation of the θ̂*'s
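A minimal generic sketch of steps 2-4 and use (c); here the statistic is the sample median, but any function of the data could be plugged in (the data values are made up):

    # Generic bootstrap loop: resample WITH replacement B times, same size n each time,
    # recompute the statistic, and use the spread of the resulting theta-hat-star values.
    import numpy as np

    rng = np.random.default_rng(1)

    def bootstrap_distribution(data, statistic, B=1000):
        data = np.asarray(data)
        n = len(data)
        return np.array([statistic(rng.choice(data, size=n, replace=True))
                         for _ in range(B)])

    sample = np.array([3.2, 1.0, 4.8, 2.2, 7.5, 0.4, 5.1, 2.9])    # made-up data
    boot_medians = bootstrap_distribution(sample, np.median, B=1000)

    print(np.median(sample))           # theta-hat from the original sample
    print(boot_medians.std(ddof=1))    # bootstrap standard error estimate (use (c) above)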
The bootstrap helps you do inference about θ (which is fixed but unknown) by describing the behavior of θ̂, which is supposed to be a good estimate of θ.
It works in most (though not all) situations. It is totally non-parametric but it doesn’t solve all
problems. It does not solve having a small sample, no new data are created and it does not help
you if original sample was bad.
One of the major uses of the bootstrap is to get uncertainty estimates/CIs for "difficult" parameter estimates (i.e. ones whose distribution we don't know or don't want to assume).
You could take the bootstrap estimate of the standard error (i.e. the standard deviation of θ̂*1, …, θ̂*B from the bootstrap samples) and create a standard confidence interval of the form θ̂ ± 2 × (bootstrap standard error).
But….. this "2" implicitly assumes that θ̂ is normally distributed! (so 95% of values lie w/in 2 SDs).
If our situation is that θ̂ IS normal and we just didn't have a formula for the standard error, this would be ok. If θ̂ is not normal we can do the following instead:
Bootstrap Percentile Confidence Interval
(1) Take your bootstrap estimates, θ̂*1, …, θ̂*B, and order them from smallest to largest
(2) Identify the desired confidence level 1 - α
α ↔ 100(1 - α)% CI
.05 ↔ 95% CI
and get the α/2 and 1 - α/2 percentiles of your set of θ̂*'s and use these as the lower and upper bounds of your CI. For a 95% CI, we take the 2.5% and 97.5% values, e.g. if we have B = 200 bootstrap samples these are the 5th and 195th values of the ordered θ̂*'s
Notes: You can do this for any α and any B. If you use a small α you need a bigger number of bootstrap samples, B, to get good estimates of the edges of the interval.
How many bootstrap samples do we need? Rough rules of thumb:
-B = 50-200 is pretty good for getting a standard error estimate
-B = 500-2000 is usually good for a CI
-You can always check by running it a few times and seeing if the estimate/CI changes
-The more complicated the distribution of θ̂, or the smaller α is, the more samples you need.
-Extra bootstrap samples are almost free; it never hurts to do too many.
-In theory you can get θ̂* for all possible bootstrap samples and get the "ideal" bootstrap distribution. However, there are n^n possible samples, which makes even a computer choke.
-In practice you pick a random selection from the possible samples, which gives an unbiased approximation to the ideal value.
-The bootstrap can be used for things besides s.e.'s/CIs. In particular, you can estimate the bias (systematic error) in your parameter estimate and correct for it. Along with this you can get "bias-corrected bootstrap CIs". (The formula is a bit messy/technical so we're not going to worry about the hard calculations.)
Empirical Bootstrap Example
Empirical just means we resample from our original data.
Variable x; sample of size n=4
values: 0,2,4,10
Suppose we’re interested in the mean and median of x.
The sample mean x̄ might not be normal (n is small, and there is one "unusual" value of 10; we can't really tell if this is an outlier given the sample size). For the median, we don't know the distribution in general.
The sample values are x̄ = 4, m = 3
Let’s get bootstrap CIs:
There are 4^4 = 256 possible bootstrap samples (counting order)
Bootstrap sample   mean   median
0,0,0,0            0      0
0,0,0,2            .5     0
0,0,2,0            .5     0
… (and so on for all 256 samples)
I generated them all. The bootstrap means look fairly normal even with n = 4, but the distribution of the sample median is skewed.
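A minimal Python sketch of this enumeration for x = (0, 2, 4, 10), including 95% percentile intervals for the mean and the median:

    # "Ideal" empirical bootstrap for x = (0, 2, 4, 10): enumerate all 4^4 = 256 resamples.
    from itertools import product
    import numpy as np

    x = np.array([0, 2, 4, 10])
    samples = np.array(list(product(x, repeat=len(x))))      # all 256 resamples, counting order

    boot_means = samples.mean(axis=1)
    boot_medians = np.median(samples, axis=1)

    print(np.percentile(boot_means, [2.5, 97.5]))             # 95% percentile CI for the mean
    print(np.percentile(boot_medians, [2.5, 97.5]))           # 95% percentile CI for the median
    # With larger n we would draw B random resamples instead of enumerating all n^n of them.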