Lesson 3

This document covers the Pearson correlation coefficient, detailing its interpretation, calculation, and significance testing. It explains the strength and direction of relationships between quantitative variables, and when to use Pearson versus Spearman's correlation coefficients. Additionally, it provides a step-by-step guide for hypothesis testing related to correlation coefficients.

Uploaded by

euziaheunice
Copyright
© All Rights Reserved
Correlational and regression analysis
MS102 – Lesson 3
Learning outcomes
At the end of the lesson, students must have:
▪ interpreted correlation and regression on various
datasets,
▪ calculated and interpreted the correlation and
regression, and
▪ performed data mining.
Pearson correlation
coefficient (r)
➢ is the most widely used correlation coefficient and
is known by many names:
✓ Pearson’s r
✓ Bivariate correlation
✓ Pearson product-moment correlation coefficient
(PPMCC)
✓ The correlation coefficient
Pearson correlation
coefficient (r)
➢ is a descriptive statistic: it summarizes the
characteristics of a dataset.
➢ Specifically, it describes the strength
and direction of the linear relationship
between two quantitative variables.
Pearson correlation coefficient (r) value    Strength    Direction
Greater than .5                              Strong      Positive
Between .3 and .5                            Moderate    Positive
Between 0 and .3                             Weak        Positive
0                                            None        None
Between 0 and –.3                            Weak        Negative
Between –.3 and –.5                          Moderate    Negative
Less than –.5                                Strong      Negative
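The cut-offs in the table can be sketched as a small lookup function. This is only an illustration: the name describe_r is made up here, and because the table does not say which band the boundary values .3 and .5 belong to, the sketch assumes they count as "Moderate".

```python
def describe_r(r: float) -> tuple[str, str]:
    """Return (strength, direction) for a Pearson r, using the table's cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return ("None", "None")
    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    # Assumption: exactly .3 and exactly .5 are treated as "Moderate",
    # since the table leaves the boundary values unassigned.
    if size > 0.5:
        strength = "Strong"
    elif size >= 0.3:
        strength = "Moderate"
    else:
        strength = "Weak"
    return (strength, direction)


print(describe_r(0.47))  # ('Moderate', 'Positive')
```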
Pearson correlation
coefficient (r)
➢ is also an inferential statistic: it can be
used to test statistical hypotheses.
➢ Specifically, we can test whether two
variables have a significant relationship.
Visualizing the Pearson
correlation coefficient
➢ Another way to think of the Pearson
correlation coefficient (r) is as a measure of
how close the observations are to a line of
best fit.
➢ When the slope is negative, r is negative.
➢ When the slope is positive, r is positive.
When the correlation coefficient r = 1, it indicates a perfect
positive linear relationship between two variables.

Example: if you were comparing the number of hours studied and
test scores, r = 1 would mean that for every additional hour of
study, the test score increases by a constant amount.
When r = −1 in Pearson correlation, it indicates a perfect
negative linear relationship between two variables.

Example: if you were comparing the amount of time spent watching
TV and test scores, r = −1 would mean that for every additional
hour spent watching TV, the test score decreases by a constant
amount.
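The two perfect cases can be checked numerically. The sketch below uses made-up data (a hypothetical 8-point gain per hour of study and a 6-point loss per hour of TV) and a helper named pearson_r that is not part of the lesson. It computes r from deviations around the means, which is algebraically equivalent to the usual sum-based formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed from deviations around the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
# Made-up scores: each extra hour changes the score by a constant amount.
study_scores = [50 + 8 * h for h in hours]  # rises by 8 per hour
tv_scores = [90 - 6 * h for h in hours]     # falls by 6 per hour

r_study = pearson_r(hours, study_scores)
r_tv = pearson_r(hours, tv_scores)
print(round(r_study, 6), round(r_tv, 6))  # 1.0 -1.0
```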
When r is greater than .5 or less than –.5, the
points are close to the line of best fit:
➢ If r > 0.5, it suggests a strong positive
relationship between the two variables.
➢ If r < –0.5, it suggests a strong negative
relationship.
When r is between 0 and .3 or between 0 and
–.3, the points are far from the line of best fit:
➢ A weak positive correlation suggests a very slight tendency
for one variable to increase as the other increases, but the
relationship is weak and there is a lot of variation or noise in
the data.
➢ A weak negative correlation means a very slight tendency
for one variable to decrease as the other increases, but
again, the relationship is weak.
When r is 0, a line of best fit is not helpful in
describing the relationship between the
variables.
When to use the Pearson
correlation coefficient?
The Pearson correlation coefficient is a good choice
when all of the following are true:
➢ Both variables are quantitative: You will need to
use a different method if either of the variables is
qualitative.
➢ The variables are normally distributed: You can
create a histogram of each variable to verify
whether the distributions are approximately normal.
It’s not a problem if the variables are a little
non-normal.
➢ The data have no outliers: Outliers are observations that
don’t follow the same patterns as the rest of the data. A
scatterplot is one way to check for outliers—look for
points that are far away from the others.
➢ The relationship is linear: “Linear” means that the
relationship between the two variables can be described
reasonably well by a straight line. You can use a
scatterplot to check whether the relationship between two
variables is linear.
Pearson vs. Spearman’s rank
correlation coefficients
Spearman’s rank correlation coefficient is another
widely used correlation coefficient. It’s a better choice
than the Pearson correlation coefficient when one or
more of the following is true:
➢ The variables are ordinal.
➢ The variables aren’t normally distributed.
➢ The data includes outliers.
➢ The relationship between the variables is non-linear and
monotonic.
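The last case can be illustrated with a relationship that always increases but is not a straight line. The sketch below assumes the common definition of Spearman's coefficient as Pearson's r applied to the ranks of the data. The helper names pearson_r, ranks, and spearman_rho, and the y = x³ data, are made up for the illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed from deviations around the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's coefficient as Pearson's r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic, but curved rather than linear

pearson_value = pearson_r(xs, ys)      # below 1: the points bend away from a line
spearman_value = spearman_rho(xs, ys)  # 1.0: the ordering is perfectly preserved
print(pearson_value, spearman_value)
```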
Calculating the Pearson
correlation coefficient
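Assuming the lesson uses the usual computational form of Pearson's r, which is the form that the sums of x, y, x², y², and xy in the worked example feed into, the formula can be written as follows, where n is the sample size:

```latex
\[
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[\, n\sum x^{2} - \left(\sum x\right)^{2} \right]
                \left[\, n\sum y^{2} - \left(\sum y\right)^{2} \right]}}
\]
```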
Testing for the significance of the
Pearson correlation coefficient
➢ The Pearson correlation of the sample is r.
➢ It is an estimate of rho (ρ), the Pearson correlation
of the population.
➢ Knowing r and n (the sample size), we can infer
whether ρ is significantly different from 0.

✓ Null hypothesis (H0): ρ = 0
✓ Alternative hypothesis (Ha): ρ ≠ 0
Steps to test the
hypothesis:
Step 1: Calculate the t value (a test statistic)
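A minimal sketch of this step, assuming the standard test statistic for a Pearson correlation, t = r√(n − 2) / √(1 − r²). The function name t_statistic is made up here. The inputs r = 0.47 and n = 10 are what the newborn weight and length data in this lesson work out to, with r rounded to two decimals.

```python
from math import sqrt


def t_statistic(r: float, n: int) -> float:
    """t value for testing H0: rho = 0, with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t_value = t_statistic(0.47, 10)
print(round(t_value, 3))  # 1.506
```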
Step 2: Find the critical value of t
You can find the critical value of t (t*) in a t table. To
use the table, you need to know three things:
➢ The degrees of freedom (df): For Pearson correlation
tests, the formula is df = n – 2.
➢ Significance level (α): By convention, the significance
level is usually .05.
➢ One-tailed or two-tailed: Most often, two-tailed is an
appropriate choice for correlations.
For example, in a test with α = 0.05, the critical t value
is in the column for 0.05 under the one-tailed or the
two-tailed heading, whichever matches your test, in the
row for your degrees of freedom.
Example: Finding the critical value of t
For a two-tailed test of significance at α = .05
and df = 8, the critical value of t (t*) is 2.306.
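If no t table is at hand, the critical value can be approximated numerically. The sketch below is one possible approach, not the lesson's method. It integrates the Student t density with Simpson's rule and then searches for the cut-off by bisection, using only the standard library. The names t_pdf, t_cdf, and t_critical are made up for the illustration, and the use of math.gamma limits it to moderate degrees of freedom.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), from Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(alpha, df, two_tailed=True):
    """Critical value t* found by bisection on the CDF."""
    target = 1 - alpha / 2 if two_tailed else 1 - alpha
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 8), 3))  # 2.306
```

For a one-tailed test, pass two_tailed=False.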
Step 3: Compare the t value to the critical value
Determine if the absolute t value is greater than the
critical value of t. “Absolute” means that if the t value is
negative you should ignore the minus sign.
➢ Example: Comparing the t value to the critical value of
t (t*)
t = 1.506, t* = 2.306
The t value is less than the critical value of t.
Step 4: Decide whether to reject the null hypothesis
➢ If the absolute t value is greater than the critical value, the
relationship is statistically significant (p < α).
The data allows you to reject the null hypothesis and support
the alternative hypothesis.
➢ If the absolute t value is less than the critical value, the
relationship is not statistically significant (p > α).
The data doesn’t allow you to reject the null hypothesis and
doesn’t provide support for the alternative hypothesis.
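The comparison and the decision can be sketched in a few lines, using the t value and critical value from the example in this lesson. The function name correlation_is_significant is made up for the illustration.

```python
def correlation_is_significant(t_value: float, t_critical: float) -> bool:
    """True when the absolute t value exceeds the critical value t*."""
    return abs(t_value) > t_critical


t_value = 1.506  # from the example (n = 10, so df = 8)
t_star = 2.306   # two-tailed critical value at alpha = .05, df = 8

if correlation_is_significant(t_value, t_star):
    decision = "reject H0"
else:
    decision = "fail to reject H0"
print(decision)  # fail to reject H0
```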
https://www.scribbr.com/statistics/pearson-correlation-coefficient/
Imagine that you’re studying the relationship between newborns’
weight and length. You have the weights and lengths of the 10
babies born last month at your local hospital. After you convert the
imperial measurements to metric, you enter the data in a table:

Weight (kg)    Length (cm)
3.63           53.1
3.02           49.7
3.82           48.4
3.42           54.2
3.59           54.9
2.87           43.7
3.03           47.2
3.46           45.2
3.36           54.4
3.3            50.4
Step 1: Calculate the sums of x and y
Start by renaming the variables to “x” and “y.” It doesn’t matter
which variable is called x and which is called y; the formula will
give the same answer either way.

Next, add up the values of x and y. (In the formula, this step is
indicated by the Σ symbol, which means “take the sum of”.)
Example: Calculating the sums of x and y
Weight = x
Length = y

Σx = 3.63 + 3.02 + 3.82 + 3.42 + 3.59 + 2.87 + 3.03 + 3.46 + 3.36 + 3.30
Σx = 33.5

Σy = 53.1 + 49.7 + 48.4 + 54.2 + 54.9 + 43.7 + 47.2 + 45.2 + 54.4 + 50.4
Σy = 501.2
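The same sums can be checked with a short script. The variable names weight_kg and length_cm are chosen here for illustration, and the values are the ones from the data table.

```python
# Data from the table: weight is x, length is y.
weight_kg = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]
length_cm = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

sum_x = sum(weight_kg)
sum_y = sum(length_cm)

# Rounding hides tiny floating-point residue from adding decimals.
print(round(sum_x, 2), round(sum_y, 2))  # 33.5 501.2
```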
Step 2: Calculate x² and y² and their sums

Create two new columns that contain the squares of x and y.
Take the sums of the new columns.
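As a sketch of this step, the squared columns and their sums can be built from the table data as follows (the variable names are illustrative only):

```python
# Data from the table: weight is x, length is y.
x = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]
y = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

x_squared = [value ** 2 for value in x]  # new column: x squared
y_squared = [value ** 2 for value in y]  # new column: y squared

sum_x2 = sum(x_squared)
sum_y2 = sum(y_squared)
print(round(sum_x2, 4), round(sum_y2, 2))  # 113.0432 25264.0
```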


Step 3: Calculate the cross product and its sum

In a final column, multiply together x and y (this is called the
cross product). Take the sum of the new column.
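A matching sketch for the cross-product column, again with illustrative variable names and the table data:

```python
# Data from the table: weight is x, length is y.
x = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]
y = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

# New column: each baby's weight multiplied by the same baby's length.
cross_products = [xi * yi for xi, yi in zip(x, y)]

sum_xy = sum(cross_products)
print(round(sum_xy, 3))  # 1684.121
```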
Step 4: Calculate r

Use the formula and the numbers you calculated in the previous
steps to find r.
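Putting the steps together, and assuming the usual computational formula r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]), the calculation for the newborn data can be sketched as follows. The variable names are illustrative only.

```python
from math import sqrt

# Data from the table: weight is x, length is y.
x = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]
y = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

n = len(x)
sum_x = sum(x)                                  # Step 1
sum_y = sum(y)
sum_x2 = sum(v ** 2 for v in x)                 # Step 2
sum_y2 = sum(v ** 2 for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # Step 3

# Step 4: plug the sums into the formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))  # 0.47
```

By the strength table earlier in the lesson, r = 0.47 falls in the moderate positive band.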