
Pearson Correlation

The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ ("rho"). The Pearson Correlation is a parametric measure.

This measure is also known as:

 Pearson’s correlation
 Pearson product-moment correlation (PPMC)

Common Uses

The bivariate Pearson Correlation is commonly used to measure the following:

 Correlations among pairs of variables
 Correlations within and between sets of variables

The bivariate Pearson correlation indicates the following:

 Whether a statistically significant linear relationship exists between two continuous variables
 The strength of a linear relationship (i.e., how close the relationship is to being a perfectly straight line)
 The direction of a linear relationship (increasing or decreasing)

Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among categorical variables. If you wish to understand relationships that involve categorical variables and/or non-linear relationships, you will need to choose another measure of association, such as the non-parametric chi-square test of independence.

Note: The bivariate Pearson Correlation only reveals associations among continuous variables. The bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the correlation coefficient is.

Data Requirements

Your data must meet the following requirements:

1. Two or more continuous variables (i.e., interval or ratio level)
2. Cases that have values on both variables
3. Linear relationship between the variables
4. Independent cases (i.e., independence of observations)

 There is no relationship between the values of variables between cases.
This means that:
 the values for all variables across cases are unrelated
 for any case, the value for any variable cannot influence the
value of any variable for other cases
 no case can influence another case on any variable
 The bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated.

5. Bivariate normality
 Each pair of variables is bivariately normally distributed
 Each pair of variables is bivariately normally distributed at all levels of
the other variable(s)
 This assumption ensures that the variables are linearly related;
violations of this assumption may indicate that non-linear relationships
among variables exist. Linearity can be assessed visually using a
scatterplot of the data.
6. Random sample of data from the population
7. No outliers

Hypotheses

The null hypothesis (H0) and alternative hypothesis (H1) of the significance test for
correlation can be expressed in the following ways, depending on whether a one-
tailed or two-tailed test is requested:

Two-tailed significance test:

H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
H1: ρ ≠ 0 ("the population correlation coefficient is not 0; a nonzero correlation could exist")

One-tailed significance test:

H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
H1: ρ > 0 ("the population correlation coefficient is greater than 0; a positive correlation could exist")
OR
H1: ρ < 0 ("the population correlation coefficient is less than 0; a negative correlation could exist")

where ρ is the population correlation coefficient.
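SPSS evaluates these hypotheses with the standard t test for a Pearson correlation, t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. As a sketch outside SPSS (plain Python; the r and n values reuse the height/weight example that appears later in this document):

```python
import math

def pearson_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Height/weight example from the output section: r = .513, n = 354
t = pearson_t_statistic(0.513, 354)
# t is roughly 11.2, far beyond the two-tailed critical value
# (~1.97 at alpha = .05 for these degrees of freedom), so H0 is rejected.
```

Converting t to an exact p-value requires the t-distribution CDF (e.g. `scipy.stats.t.sf`), which SPSS handles internally.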

The sample correlation coefficient, r, is computed as

r = cov(x, y) / √( var(x) · var(y) )

where cov(x, y) is the sample covariance of x and y; var(x) is the sample variance of x; and var(y) is the sample variance of y.
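As a sketch of this formula in Python (using numpy; the data values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# r = cov(x, y) / sqrt(var(x) * var(y)); ddof=1 gives the sample
# covariance/variance, though the choice of ddof cancels in the ratio.
r = np.cov(x, y, ddof=1)[0, 1] / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

# np.corrcoef computes the same quantity directly
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```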

Correlation can take on any value in the range [-1, 1]. The sign of the correlation
coefficient indicates the direction of the relationship, while the magnitude of the
correlation (how close it is to -1 or +1) indicates the strength of the relationship.

 -1 : perfectly negative linear relationship
 0 : no relationship
 +1 : perfectly positive linear relationship

The strength can be assessed by these general guidelines [1] (which may vary by
discipline):

 .1 < | r | < .3 … small / weak correlation
 .3 < | r | < .5 … medium / moderate correlation
 .5 < | r | ……… large / strong correlation
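These cut-offs can be sketched as a small helper function (the thresholds follow the guidelines above; how to treat the exact boundary values is a choice, since the guidelines leave them open):

```python
def correlation_strength(r):
    """Classify |r| using the rough guidelines above (discipline-dependent)."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large / strong"
    if magnitude >= 0.3:
        return "medium / moderate"
    if magnitude >= 0.1:
        return "small / weak"
    return "negligible"
```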

Note: The direction and strength of a correlation are two distinct properties. The scatterplots below show correlations of r = +0.90, r = 0.00, and r = -0.90, respectively. The strength of the nonzero correlations is the same: 0.90. But the direction of the correlations is different: a negative correlation corresponds to a decreasing relationship, while a positive correlation corresponds to an increasing relationship.

[Three scatterplots: r = -0.90, r = 0.00, r = +0.90]

Note that the r = 0.00 correlation has no discernible increasing or decreasing linear pattern in this particular graph. However, keep in mind that Pearson correlation is only capable of detecting linear associations, so it is possible to have a pair of variables with a strong nonlinear relationship and a small Pearson correlation coefficient. It is good practice to create scatterplots of your variables to corroborate your correlation coefficients.
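This point can be demonstrated directly (a sketch using numpy with synthetic data): y below is a perfect function of x, yet the Pearson correlation is essentially zero because the relationship is U-shaped rather than linear.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 101)
y = x ** 2  # perfect, but nonlinear (U-shaped), dependence on x

r = np.corrcoef(x, y)[0, 1]
# r is ~0: cov(x, x^2) vanishes because x is symmetric around 0,
# even though y is completely determined by x.
```

A scatterplot of x against y would reveal the parabola immediately, which is exactly why the visual check is recommended.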

Data Set-Up

Your dataset should include two or more continuous numeric variables, each defined
as scale, which will be used in the analysis.

Each row in the dataset should represent one unique subject, person, or unit. All of
the measurements taken on that person or unit should appear in that row. If
measurements for one subject appear on multiple rows -- for example, if you have
measurements from different time points on separate rows -- you should reshape
your data to "wide" format before you compute the correlations.
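Outside SPSS, the same long-to-wide reshape can be sketched with pandas (all column names here are hypothetical, invented for illustration):

```python
import pandas as pd

# Long format: one row per subject per time point (hypothetical columns)
long_df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3],
    "time":    ["t1", "t2", "t1", "t2", "t1", "t2"],
    "score":   [10.0, 12.0, 9.0, 11.0, 14.0, 15.0],
})

# Wide format: one row per subject, one column per time point
wide_df = long_df.pivot(index="subject", columns="time", values="score")

# Correlations are then computed between the wide columns, e.g. t1 vs t2
r = wide_df["t1"].corr(wide_df["t2"])
```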

Run a Bivariate Pearson Correlation


To run a bivariate Pearson Correlation in SPSS, click Analyze > Correlate > Bivariate.

The Bivariate Correlations window opens, where you will specify the variables to be
used in the analysis. All of the variables in your dataset appear in the list on the left
side. To select variables for the analysis, select the variables in the list on the left and
click the blue arrow button to move them to the right, in the Variables field.

A Variables: The variables to be used in the bivariate Pearson Correlation. You
must select at least two continuous variables, but may select more than two. The test
will produce correlation coefficients for each pair of variables in this list.
B Correlation Coefficients: There are multiple types of correlation coefficients.
By default, Pearson is selected. Selecting Pearson will produce the test statistics for
a bivariate Pearson Correlation.
C Test of Significance: Click Two-tailed or One-tailed, depending on your
desired significance test. SPSS uses a two-tailed test by default.
D Flag significant correlations: Checking this option will mark statistically significant correlations in the output with asterisks. By default, SPSS flags significance at the alpha = 0.05 level with one asterisk (*) and at the alpha = 0.01 level with two asterisks (**); there is no separate flag for the alpha = 0.001 level (such correlations are flagged as alpha = 0.01).
E Options: Clicking Options will open a window where you can specify which Statistics to include (i.e., Means and standard deviations, Cross-product deviations and covariances) and how to address Missing Values (i.e., Exclude cases pairwise or Exclude cases listwise). Note that the pairwise/listwise setting does not affect your computations if you are only entering two variables, but can make a very large difference if you are entering three or more variables into the correlation procedure.

OUTPUT TABLES
The results will display the correlations in a table, labeled Correlations.

A Correlation of Height with itself (r=1), and the number of non-missing observations for height (n=408).
B Correlation of height and weight (r=0.513), based on n=354 observations with pairwise non-missing values.
C Correlation of height and weight (r=0.513), based on n=354 observations with pairwise non-missing values.
D Correlation of weight with itself (r=1), and the number of non-missing observations for weight (n=376).

The important cells we want to look at are either B or C. (Cells B and C are identical,
because they include information about the same pair of variables.) Cells B and C
contain the correlation coefficient for the correlation between height and weight, its p-value, and the number of complete pairwise observations that the calculation was based on.

The correlations in the main diagonal (cells A and D) are all equal to 1. This is
because a variable is always perfectly correlated with itself. Notice, however, that the
sample sizes are different in cell A (n=408) versus cell D (n=376). This is because of
missing data -- there are more missing observations for variable Weight than there
are for variable Height.
If you have opted to flag significant correlations, SPSS will mark a 0.05 significance level with one asterisk (*) and a 0.01 significance level with two asterisks (**). In cell B (repeated in cell C), we can see that the Pearson correlation coefficient for height and weight is .513, which is significant (p < .001 for a two-tailed test), based on 354 complete observations (i.e., cases with non-missing values for both height and weight).
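The pairwise deletion behaviour seen in this table can be sketched outside SPSS (numpy, with made-up data containing missing values):

```python
import numpy as np

# Hypothetical data with missing values (np.nan)
height = np.array([170.0, 165.0, np.nan, 180.0, 175.0, 160.0])
weight = np.array([ 70.0, np.nan, 60.0,   85.0,  75.0,  55.0])

# Pairwise deletion: drop only the cases missing on THIS pair of variables
mask = ~np.isnan(height) & ~np.isnan(weight)
n_pairwise = int(mask.sum())  # complete cases for this particular pair
r = np.corrcoef(height[mask], weight[mask])[0, 1]
```

With listwise deletion, the same reduced set of cases would be used for every correlation in the matrix, which is why the pairwise/listwise choice only starts to matter once three or more variables are involved.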

DECISION AND CONCLUSIONS


Based on the results, we can state the following:

 Weight and height have a statistically significant linear relationship (p < .001).
 The direction of the relationship is positive (i.e., height and weight are positively
correlated), meaning that these variables tend to increase together (i.e., greater
height is associated with greater weight).
 The magnitude, or strength, of the association is approximately moderate (.3 < | r | <
.5).

Correlation Analysis

To demonstrate the use of correlation, we will explore the interrelationships among some of the variables included in the survey4ED.sav data file provided on the website accompanying this book. The survey was designed to explore the factors that affect respondents' psychological adjustment and wellbeing.

Examples of research questions:

1. Is there a relationship between the amount of control people have over their internal states and their levels of perceived stress?
2. Do people with high levels of perceived control experience lower levels of perceived stress?

PRELIMINARY ANALYSES FOR CORRELATION

Before performing a correlation analysis, it is a good idea to generate a scatterplot.
Inspection of the scatterplots also gives you a better idea of the nature of the
relationship between your variables.

Procedure for generating a scatterplot


1. From the menu at the top of the screen, click on Graphs, then select Legacy
Dialogs.
2. Click on Scatter/Plot and then Simple Scatter. Click Define.
3. Click on the first variable and move it into the Y-axis box (this will run
vertically). By convention, the dependent variable is usually placed along the
Y-axis (e.g. Total perceived stress: tpstress).
4. Click on the second variable and move to the X-axis box (this will run across
the page). This is usually the independent variable (e.g. Total PCOISS:
tpcoiss).
5. In the Label Cases by: box, you can put your ID variable so that outlying points
can be identified.
6. Click on OK.

Step 1: Checking for outliers


Check your scatterplot for outliers—that is, data points that are out on their own, either very
high or very low, or away from the main cluster of points. Extreme outliers are worth
checking: was the information entered correctly? Could these values be errors? Outliers can
seriously influence some analyses, so this is worth investigating.

Step 2: Inspecting the distribution of data points


The distribution of data points can tell you a number of things:
• Are the data points spread all over the place? This suggests a very low correlation.
• Are all the points neatly arranged in a narrow cigar shape? This suggests quite a strong
correlation.
• Could you draw a straight line through the main cluster of points, or would a curved line
better represent the points? If a curved line is evident (suggesting a curvilinear
relationship) Pearson correlation should not be used, as it assumes a linear relationship.
• What is the shape of the cluster? Is it even from one end to the other? Or does it start off
narrow and then get fatter? If this is the case, your data may be violating the assumption of
homoscedasticity.

Step 3: Determining the direction of the relationship between the variables


The scatterplot can tell you whether the relationship between your two variables is positive or
negative.

Procedure for requesting Pearson r or Spearman rho
1. From the menu at the top of the screen, click on Analyze, then select Correlate,
then Bivariate.
2. Select your two variables and move them into the box marked Variables (e.g.
Total perceived stress: tpstress, Total PCOISS: tpcoiss). If you wish you can
list a whole range of variables here, not just two. In the resulting matrix, the
correlation between all possible pairs of variables will be listed. This can be
quite large if you list more than just a few variables.
3. In the Correlation Coefficients section, the Pearson box is the default option. If
you wish to request the Spearman rho (the non-parametric alternative), tick this
box instead (or as well).
4. Click on the Options button. For Missing Values, click on the Exclude cases
pairwise box. Under Options, you can also obtain means and standard
deviations if you wish.
5. Click on Continue and then on OK.

                                            Total perceived   Total
                                            stress            PCOISS
Total perceived stress  Pearson Correlation       1           -.581**
                        Sig. (2-tailed)                        .000
                        N                       433            426
Total PCOISS            Pearson Correlation    -.581**           1
                        Sig. (2-tailed)        .000
                        N                       426            430
**. Correlation is significant at the 0.01 level (2-tailed).

INTERPRETATION OF OUTPUT FROM CORRELATION


For both Pearson and Spearman results, SPSS provides you with a table giving the correlation coefficients between each pair of variables listed, the significance level and the number of cases. The results for Pearson correlation are shown in the table headed Correlations.

Step 1: Checking the information about the sample. The first thing to look at in the table labelled Correlations is the N (number of cases). Is this correct? If there are a lot of missing data, you need to find out why. Did you forget to tick the Exclude cases pairwise box in the missing values options?

Step 2: Determining the direction of the relationship. The second thing to consider is the
direction of the relationship between the variables. Is there a negative sign in front of the
correlation coefficient value? If there is, this means there is a negative correlation between
the two variables (i.e. high scores on one are associated with low scores on the other). The
interpretation of this depends on the way the variables are scored.

Step 3: Determining the strength of the relationship. The third thing to consider in the output is the size of the value of the correlation coefficient. This can range from –1.00 to 1.00. This value will indicate the strength of the relationship between your two variables. A correlation of 0 indicates no relationship at all, a correlation of 1.0 indicates a perfect positive correlation, and a value of –1.0 indicates a perfect negative correlation. Small: r = 0.10 to 0.29; medium: r = 0.30 to 0.49; large: r = 0.50 to 1.0.

Step 4: Calculating the coefficient of determination. To get an idea of how much variance
your two variables share, you can also calculate what is referred to as the coefficient of
determination. Sounds impressive, but all you need to do is square your r value.
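For the example output above, squaring r gives the shared variance directly:

```python
r = -0.581            # correlation between perceived stress and PCOISS (from the table above)
r_squared = r ** 2    # coefficient of determination
shared_variance_pct = r_squared * 100
# The two variables share roughly 34 per cent of their variance.
```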

Step 5: Assessing the significance level. The next thing to consider is the significance level (listed as Sig. 2 tailed). The significance of r is strongly influenced by the size of the sample. In a small sample (e.g. n=30), you may have moderate correlations that do not reach statistical significance at the traditional p<.05 level. In large samples (N=100+), however, very small correlations (e.g. r=.2) may reach statistical significance. While you need to report statistical significance, you should focus on the strength of the relationship and the amount of shared variance.

PRESENTING THE RESULTS FROM CORRELATION


The results of the above example using Pearson correlation could be presented in a research report as follows:

The relationship between perceived control of internal states (as measured by the PCOISS) and
perceived stress (as measured by the Perceived Stress Scale) was investigated using Pearson
product-moment correlation coefficient. Preliminary analyses were performed to ensure no
violation of the assumptions of normality, linearity and homoscedasticity. There was a strong,
negative correlation between the two variables, r = –.58, n = 426, p < .001, with high levels of
perceived control associated with lower levels of perceived stress.
