Pearson's Correlation
Also known as: Pearson's correlation, the Pearson product-moment correlation (PPMC)
Common Uses
Data Requirements
4. Independence of cases
There is no relationship between the values of variables between cases.
This means that:
the values for all variables across cases are unrelated
for any case, the value for any variable cannot influence the
value of any variable for other cases
no case can influence another case on any variable
The bivariate Pearson correlation coefficient and corresponding
significance test are not robust when independence is violated.
5. Bivariate normality
Each pair of variables is bivariately normally distributed
Each pair of variables is bivariately normally distributed at all levels of
the other variable(s)
This assumption ensures that the variables are linearly related;
violations of this assumption may indicate that non-linear relationships
among variables exist. Linearity can be assessed visually using a
scatterplot of the data.
6. Random sample of data from the population
7. No outliers
Hypotheses
The null hypothesis (H0) and alternative hypothesis (H1) of the significance test for
correlation can be expressed in the following ways, depending on whether a one-
tailed or two-tailed test is requested:

Two-tailed test: H0: ρ = 0 (the population correlation coefficient is 0; there is no
association) versus H1: ρ ≠ 0 (the population correlation coefficient is not 0; a
nonzero correlation could exist).

One-tailed test: H0: ρ = 0 versus H1: ρ > 0 (a positive correlation could exist) or
H1: ρ < 0 (a negative correlation could exist).

Here ρ denotes the population correlation coefficient.
The sample correlation coefficient, r, is calculated as

r = cov(x, y) / sqrt(var(x) × var(y))

where cov(x, y) is the sample covariance of x and y; var(x) is the sample variance
of x; and var(y) is the sample variance of y.
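Although the tutorial works in SPSS, the covariance/variance definition above can be checked directly in a few lines of Python. This is a minimal sketch; the data values are made up for illustration.

```python
# Sketch: computing Pearson's r from sample covariance and variances,
# then comparing with NumPy's built-in correlation (illustrative data)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.2, 4.8, 5.1])

cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of x and y
var_x = np.var(x, ddof=1)             # sample variance of x
var_y = np.var(y, ddof=1)             # sample variance of y

r = cov_xy / np.sqrt(var_x * var_y)   # r = cov(x, y) / sqrt(var(x) * var(y))

print(round(r, 4))
print(round(np.corrcoef(x, y)[0, 1], 4))  # same value from NumPy directly
```

Both print statements give the same value, confirming that the formula and the built-in function agree.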
Correlation can take on any value in the range [-1, 1]. The sign of the correlation
coefficient indicates the direction of the relationship, while the magnitude of the
correlation (how close it is to -1 or +1) indicates the strength of the relationship.
The strength can be assessed by these general guidelines [1] (which may vary by
discipline):

.1 < |r| < .3 ... small / weak correlation
.3 < |r| < .5 ... medium / moderate correlation
.5 < |r| < 1 ... large / strong correlation
Note: The direction and strength of a correlation are two distinct properties. The
scatterplots below show correlations that are r = +0.90, r = 0.00, and r = -0.90,
respectively. The strength of the nonzero correlations is the same: 0.90. But the
direction of the correlations is different: a negative correlation corresponds to a
decreasing relationship, while a positive correlation corresponds to an increasing
relationship.
Note that the r = 0.00 correlation has no discernible increasing or decreasing linear
pattern in this particular graph. However, keep in mind that Pearson correlation is
only capable of detecting linear associations, so it is possible to have a pair of
variables with a strong nonlinear relationship and a small Pearson correlation
coefficient. It is good practice to create scatterplots of your variables to corroborate
your correlation coefficients.
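The point about nonlinear relationships is easy to demonstrate: in the sketch below, y is perfectly determined by x through a quadratic function, yet the Pearson correlation is essentially zero.

```python
# Sketch: a strong nonlinear (quadratic) relationship can yield a Pearson r
# near 0, because Pearson's r only detects linear association
import numpy as np

x = np.linspace(-3, 3, 101)
y = x ** 2                      # y is completely determined by x, but not linearly
r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))              # essentially zero despite the perfect relationship
```

A scatterplot of x against y would immediately reveal the U-shaped pattern that the correlation coefficient misses.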
Data Set-Up
Your dataset should include two or more continuous numeric variables, each defined
as scale, which will be used in the analysis.
Each row in the dataset should represent one unique subject, person, or unit. All of
the measurements taken on that person or unit should appear in that row. If
measurements for one subject appear on multiple rows -- for example, if you have
measurements from different time points on separate rows -- you should reshape
your data to "wide" format before you compute the correlations.
The Bivariate Correlations window opens, where you will specify the variables to be
used in the analysis. All of the variables in your dataset appear in the list on the left
side. To select variables for the analysis, select the variables in the list on the left and
click the blue arrow button to move them to the right, in the Variables field.
A Variables: The variables to be used in the bivariate Pearson Correlation. You
must select at least two continuous variables, but may select more than two. The test
will produce correlation coefficients for each pair of variables in this list.
B Correlation Coefficients: There are multiple types of correlation coefficients.
By default, Pearson is selected. Selecting Pearson will produce the test statistics for
a bivariate Pearson Correlation.
C Test of Significance: Click Two-tailed or One-tailed, depending on your
desired significance test. SPSS uses a two-tailed test by default.
D Flag significant correlations: Checking this option will mark statistically
significant correlations in the output with asterisks: one asterisk (*) for
significance at the alpha = 0.05 level and two asterisks (**) at the alpha = 0.01
level. SPSS does not flag the alpha = 0.001 level separately (it is marked the same
as alpha = 0.01).
E Options: Clicking Options will open a window where you can specify
which Statistics to include (i.e., Means and standard deviations, Cross-
product deviations and covariances) and how to address Missing
Values (i.e., Exclude cases pairwise or Exclude cases listwise). Note that the
pairwise/listwise setting does not affect your computations if you are only entering
two variables, but can make a very large difference if you are entering three or more
variables into the correlation procedure.
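The pairwise/listwise distinction can be illustrated outside SPSS with pandas (the data values below are made up to include missing entries): pairwise deletion uses all cases complete for each pair, while listwise deletion drops any row with a missing value anywhere.

```python
# Sketch: pairwise vs listwise deletion with three variables and missing data
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [2.0, 1.0, 4.0, np.nan, 5.0],
    "c": [5.0, 4.0, np.nan, 2.0, 1.0],
})

pairwise = df.corr()            # each pair uses all cases complete for that pair
listwise = df.dropna().corr()   # every pair uses only fully complete rows

print(pairwise.round(3))
print(listwise.round(3))
```

With this toy data the two settings give very different coefficients, because listwise deletion leaves only the two rows with no missing values at all.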
Output Tables
The results will display the correlations in a table, labeled Correlations.
The important cells we want to look at are either B or C. (Cells B and C are identical,
because they include information about the same pair of variables.) Cells B and C
contain the correlation coefficient for the correlation between height and weight, its
p-value, and the number of complete pairwise observations that the calculation was
based on.
The correlations in the main diagonal (cells A and D) are all equal to 1. This is
because a variable is always perfectly correlated with itself. Notice, however, that the
sample sizes are different in cell A (n=408) versus cell D (n=376). This is because of
missing data -- there are more missing observations for variable Weight than there
are for variable Height.
If you have opted to flag significant correlations, SPSS will mark a 0.05 significance
level with one asterisk (*) and a 0.01 significance level with two asterisks (**). In
cell B (repeated in cell C), we can see that the Pearson correlation coefficient for
height and weight is .513, which is significant (p < .001 for a two-tailed test), based
on 354 complete observations (i.e., cases with nonmissing values for both height and
weight).
Weight and height have a statistically significant linear relationship (p < .001).
The direction of the relationship is positive (i.e., height and weight are positively
correlated), meaning that these variables tend to increase together (i.e., greater
height is associated with greater weight).
The magnitude, or strength, of the association is approximately moderate (.3 < | r | <
.5).
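The same kind of coefficient-plus-p-value result can be reproduced in Python. This sketch uses simulated data with a built-in positive relationship, not the SPSS height/weight dataset, so the exact numbers differ from those above.

```python
# Sketch: a bivariate Pearson correlation with a p-value, analogous to the
# height/weight example (simulated data, not the tutorial's dataset)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=354)                 # simulated heights
weight = 0.5 * height + rng.normal(0, 8, size=354)     # positively related weights

r, p = stats.pearsonr(height, weight)
print(f"r = {r:.3f}, p = {p:.3g}, n = {len(height)}")
```

As in the SPSS output, the report consists of three pieces: the coefficient r, its p-value, and the number of complete pairs n.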
Correlation Analysis
Before performing a correlation analysis, it is a good idea to generate a scatterplot.
Inspecting the scatterplot gives you a better idea of the nature of the
relationship between your variables.
Procedure for requesting Pearson r or Spearman rho
1. From the menu at the top of the screen, click on Analyze, then select Correlate,
then Bivariate.
2. Select your two variables and move them into the box marked Variables (e.g.
Total perceived stress: tpstress, Total PCOISS: tpcoiss). If you wish you can
list a whole range of variables here, not just two. In the resulting matrix, the
correlation between all possible pairs of variables will be listed. This can be
quite large if you list more than just a few variables.
3. In the Correlation Coefficients section, the Pearson box is the default option. If
you wish to request the Spearman rho (the non-parametric alternative), tick this
box instead (or as well).
4. Click on the Options button. For Missing Values, click on the Exclude cases
pairwise box. Under Options, you can also obtain means and standard
deviations if you wish.
5. Click on Continue and then on OK.
Correlations

                                              Total perceived   Total PCOISS
                                              stress
Total perceived stress  Pearson Correlation         1              -.581**
                        Sig. (2-tailed)                              .000
                        N                          433               426
Total PCOISS            Pearson Correlation      -.581**              1
                        Sig. (2-tailed)           .000
                        N                          426               430

**. Correlation is significant at the 0.01 level (2-tailed).
Step 1: Checking the information about the sample. The first thing to look at in the table
labelled Correlations is the N (number of cases). Is this correct? If there are a lot of missing
data, you need to find out why. Did you forget to tick Exclude cases pairwise in the
Missing Values option?
Step 2: Determining the direction of the relationship. The second thing to consider is the
direction of the relationship between the variables. Is there a negative sign in front of the
correlation coefficient value? If there is, this means there is a negative correlation between
the two variables (i.e. high scores on one are associated with low scores on the other). The
interpretation of this depends on the way the variables are scored.
Step 3: Determining the strength of the relationship. The third thing to consider in the output
is the size of the value of the correlation coefficient. This can range from –1.00 to 1.00. This
value will indicate the strength of the relationship between your two variables. A correlation
of 0 indicates no relationship at all, a correlation of 1.0 indicates a perfect positive
correlation, and a value of –1.0 indicates a perfect negative correlation. Small r = 0.10 to
0.29; medium r = 0.30 to 0.49; large r = 0.50 to 1.0.
Step 4: Calculating the coefficient of determination. To get an idea of how much variance
your two variables share, you can also calculate what is referred to as the coefficient of
determination. Sounds impressive, but all you need to do is square your r value.
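As a worked example, squaring the r of –.58 reported in the write-up below gives the shared variance directly:

```python
# Sketch: coefficient of determination for the correlation reported in
# this tutorial's PCOISS/stress example (r = -.58)
r = -0.58
r_squared = r ** 2
print(round(r_squared, 4))                  # about 0.3364
print(f"{r_squared:.0%} shared variance")   # about 34%
```

So perceived control and perceived stress share roughly 34 per cent of their variance in that example.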
Step 5: Assessing the significance level. The next thing to consider is the significance level
(listed as Sig. 2 tailed). The significance of r is strongly influenced by the size of the
sample. In a small sample (e.g. n=30), you may have moderate correlations that do not reach
statistical significance at the traditional p<.05 level. In large samples (N=100+), however,
very small correlations (e.g. r=.2) may reach statistical significance. While you need to report
statistical significance, you should focus on the strength of the relationship and the amount of
shared variance.
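The effect of sample size on significance can be made concrete with the standard t transformation of r (a sketch; the sample sizes are chosen to echo the examples above):

```python
# Sketch: how sample size drives the significance of a fixed r = .30,
# using the t transformation t = r * sqrt((n - 2) / (1 - r^2))
import math
from scipy import stats

def p_value_for_r(r, n):
    """Two-tailed p-value for a Pearson r based on n observations."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

for n in (30, 100, 426):
    print(n, round(p_value_for_r(0.30, n), 4))
```

The same medium correlation of .30 fails to reach p < .05 at n = 30 but is comfortably significant at n = 100 and beyond, which is exactly why strength and shared variance deserve more attention than the p-value alone.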
The relationship between perceived control of internal states (as measured by the PCOISS) and
perceived stress (as measured by the Perceived Stress Scale) was investigated using Pearson
product-moment correlation coefficient. Preliminary analyses were performed to ensure no
violation of the assumptions of normality, linearity and homoscedasticity. There was a strong,
negative correlation between the two variables, r = –.58, n = 426, p < .001, with high levels of
perceived control associated with lower levels of perceived stress.