Multiple Regression in SPSS

Multiple regression
An introduction to multiple regression
Performing a multiple regression on SPSS
Section 1: An introduction to multiple regression
When using multiple regression in psychology, many researchers use the term
“independent variables” to identify those variables that they think will influence
some other “dependent variable”. We prefer to use the term “predictor variables” for
those variables that may be useful in predicting the scores on another variable that
we call the “criterion variable”. Thus, in our example above, type of occupation,
salary and years in full-time employment would emerge as significant predictor
variables, which allow us to estimate the criterion variable – how satisfied someone
is likely to be with their job. As we have pointed out before, human behaviour is
inherently noisy and therefore it is not possible to produce totally accurate
predictions, but multiple regression allows us to identify a set of predictor variables
which together provide a useful estimate of a participant’s likely score on a criterion
variable.
HOW DOES MULTIPLE REGRESSION RELATE TO CORRELATION AND ANALYSIS OF VARIANCE?
In ANOVA we can directly manipulate the factors and
measure the resulting change in the dependent variable. In multiple regression we
simply measure the naturally occurring scores on a number of predictor variables
and try to establish which set of the observed variables gives rise to the best
prediction of the criterion variable.
1. You can use this statistical technique when exploring linear relationships between
the predictor and criterion variables – that is, when the relationship follows a
straight line. (To examine non-linear relationships, special techniques can be used.)
2. The criterion variable that you are seeking to predict should be measured on a
continuous scale (such as interval or ratio scale). There is a separate regression
method called logistic regression that can be used for dichotomous dependent
variables (not covered here).
3. The predictor variables that you select should be measured on a ratio, interval, or
ordinal scale. A nominal predictor variable is legitimate but only if it is
dichotomous, i.e. there are no more than two categories. For example, sex is
acceptable (where male is coded as 1 and female as 0), but gender identity
(masculine, feminine and androgynous) could not be coded as a single variable.
Instead, you would create three different variables, each with two categories
(masculine/not masculine; feminine/not feminine; and androgynous/not
androgynous). The term dummy variable is used to describe this type of
dichotomous variable (see the syntax sketch after this list).
4. Multiple regression requires a large number of observations. The number of cases
(participants) must substantially exceed the number of predictor variables you are
using in your regression. The absolute minimum is that you have five times as many
participants as predictor variables. A more acceptable ratio is 10:1, but some people
argue that this should be as high as 40:1 for some statistical selection methods (see
page 210).
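A minimal syntax sketch of the dummy-variable coding described in point 3, assuming a hypothetical three-category variable named identity (coded 1 = masculine, 2 = feminine, 3 = androgynous); none of these variable names come from the example study:

* Create three dichotomous dummy variables from one nominal variable.
RECODE identity (1=1) (ELSE=0) INTO masc.
RECODE identity (2=1) (ELSE=0) INTO femin.
RECODE identity (3=1) (ELSE=0) INTO andro.
EXECUTE.

Each of the three new variables is then a legitimate dichotomous predictor.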
TERMINOLOGY
There are certain terms we need to clarify to allow you to understand the results of
this statistical technique.
Beta values. The standardised beta value is a measure of the contribution of each
predictor variable to the model, expressed in standard deviation units. When you
have only one predictor variable in your model, beta is equivalent to the correlation
coefficient between the predictor and the criterion variable.
Multicollinearity
When choosing a predictor variable you should select one that might be correlated
with the criterion variable, but that is not strongly correlated with the other predictor
variables. However, correlations amongst the predictor variables are not unusual.
The term multicollinearity (or collinearity) is used to describe the situation when a
high correlation is detected between two or more predictor variables. Such high
correlations cause problems when trying to draw inferences about the relative
contribution of each predictor variable to the success of the model. SPSS provides
you with a means of checking for this and we describe this below.
Selection methods
In the simplest, "simultaneous" method (called Enter in SPSS), all the predictor
variables are entered into the model together in a single block. In contrast,
"hierarchical" methods enter the variables into the model in a specified order. The
order specified should reflect some theoretical consideration or previous findings. If
you have no reason to believe that one variable is likely to be more important than
another, you should not use this method. As each variable is entered into the model
its contribution is assessed. If adding the variable does not significantly increase the
predictive power of the model then the variable is dropped.
In “statistical” methods, the order in which the predictor variables are entered into
(or taken out of) the model is determined according to the strength of their
correlation with the criterion variable. Actually there are several versions of this
method, called forward selection, backward selection and stepwise selection. In
Forward selection, SPSS enters the variables into the model one at a time in an
order determined by the strength of their correlation with the criterion variable. The
effect of adding each is assessed as it is entered, and variables that do not
significantly add to the success of the model are excluded.
In Backward selection, SPSS enters all the predictor variables into the model. The
weakest predictor variable is then removed and the regression re-calculated. If this
significantly weakens the model then the predictor variable is re-entered – otherwise
it is deleted. This procedure is then repeated until only useful predictor variables
remain in the model.
Stepwise is the most sophisticated of these statistical methods. Each variable is
entered in sequence and its value assessed. If adding the variable contributes to the
model then it is retained, but all other variables in the model are then re-tested to see
if they are still contributing to the success of the model. If they no longer contribute
significantly they are removed. Thus, this method should ensure that you end up
with the smallest possible set of predictor variables included in your model.
In addition to the Enter, Stepwise, Forward and Backward methods, SPSS also
offers the Remove method in which variables are removed from the model in a
block – the use of this method will not be described here.
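If you work from syntax rather than the dialogue boxes, the selection method is chosen with the METHOD subcommand of the REGRESSION command. A minimal sketch, using hypothetical variable names (crit, pred1 to pred3):

* Simultaneous entry (the Enter method).
REGRESSION
  /DEPENDENT crit
  /METHOD=ENTER pred1 pred2 pred3.

* A statistical method: replace ENTER with STEPWISE, FORWARD or BACKWARD.
REGRESSION
  /DEPENDENT crit
  /METHOD=STEPWISE pred1 pred2 pred3.

* A hierarchical analysis: a series of METHOD=ENTER subcommands,
  one block per step, in the order you specify.
REGRESSION
  /DEPENDENT crit
  /METHOD=ENTER pred1
  /METHOD=ENTER pred2 pred3.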
EXAMPLE STUDY
Further analysis was conducted on the data to determine whether the spelling
performance on this list of 48 words accurately reflected the children’s spelling
ability as estimated by a standardised spelling test. Children’s chronological age,
their reading age, their standardised reading score and their standardised spelling
score were chosen as the predictor variables. The criterion variable was the
percentage correct spelling score attained by each child using the list of 48 words.
For the purposes of this book, we have created a data file that will reproduce some
of the findings from this second analysis. As you will see, the standardised spelling
score derived from a validated test emerged as a strong predictor of the spelling
score achieved on the word list. The data file contains only a subset of the data
collected and is used here to demonstrate multiple regression. (These data are
available in the Appendix.)
Click on Analyze, then Regression, then Linear. You will then be presented with the
Linear Regression dialogue box shown below.
You now need to select the criterion (dependent) and the predictor (independent)
variables.
We have chosen to use the percentage correct spelling score (“spelperc”) as our
criterion variable. As our predictor variables we have used chronological age,
reading age, standardised reading score and standardised spelling score.
As we have a relatively small number of cases and do not have any strong
theoretical predictions, we recommend you select Enter (the simultaneous method).
This is usually the safest to adopt.
Now click on the Statistics button. This will bring up the Linear Regression:
Statistics dialogue box shown below.
Select Estimates.
The Collinearity diagnostics option gives some useful additional output that allows
you to assess whether you have a problem with collinearity in your data. The R
squared change option is useful if you have selected a statistical method such as
stepwise.
When you have selected the statistics options you require, click on the Continue
button. This will return you to the Linear Regression dialogue box. Now click on
the OK button. The output that will be produced is illustrated on the following pages.
Tip The SPSS multiple regression option was set to Exclude cases listwise. Hence,
although the researcher collected data from 52 participants, SPSS analysed the data
from only the 47 participants who had no missing values.
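For reference, the dialogue-box choices described above correspond to syntax along the following lines. This is a sketch: only the criterion variable name ("spelperc") comes from the data file, and the four predictor variable names are our invented stand-ins:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /DEPENDENT spelperc
  /METHOD=ENTER chronage readage stdread stdspell.

MISSING LISTWISE produces the listwise exclusion described in the Tip, and COLLIN TOL requests the collinearity diagnostics.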
[Descriptive Statistics table: reports the mean, standard deviation and number of cases (N = 47) for the criterion variable and each of the predictor variables.]

[Correlations table: this second table gives details of the correlation between each pair of variables. We want the predictor variables to correlate with the criterion variable, but not too strongly with one another. The values here are acceptable.]

Variables Entered/Removed

Model   Variables Entered                                   Variables Removed   Method
1       standardised spelling score, chronological age,     .                   Enter
        reading age, standardised reading score

a. All requested variables entered.
b. Dependent Variable: percentage correct spelling

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .923   .852       .838                9.6377

a. Predictors: (Constant), standardised spelling score, chronological age, reading age, standardised reading score

The Adjusted R Square value tells us that the model accounts for about 84% of the variance in the criterion variable.

ANOVA

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   22447.277        4    5611.819      60.417   .000
Residual     3901.149         42   92.884
Total        26348.426        46

a. Predictors: (Constant), standardised spelling score, chronological age, reading age, standardised reading score
b. Dependent Variable: percentage correct spelling

This table reports an ANOVA, which assesses the overall significance of our model. As p < 0.05, our model is significant.

Coefficients

Model 1                       B          Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)                    -232.079   30.500               -7.609   .000
chronological age             1.298      .252         .406    5.159    .000   .568        1.759
reading age                   -.162      .110         -.144   -1.469   .149   .365        2.737
standardised reading score    .530       .156         .394    3.393    .002   .262        3.820
standardised spelling score   1.254      .165         .786    7.584    .000   .329        3.044

a. Dependent Variable: percentage correct spelling

The Standardized Beta Coefficients give a measure of the contribution of each variable to the model: a large value indicates that a unit change in the predictor variable has a large effect on the criterion variable. The t and Sig. (p) values give a rough indication of the impact of each predictor variable – a big absolute t value and a small p value suggests that a predictor variable is having a large impact on the criterion variable. The Tolerance and VIF columns only appear if you requested Collinearity diagnostics.
The tolerance values are a measure of the correlation between the predictor
variables and can vary between 0 and 1. The closer to zero the tolerance value is for
a variable, the stronger the relationship between this and the other predictor
variables. You should worry about variables that have a very low tolerance. SPSS
will not include a predictor variable in a model if it has a tolerance of less than
0.0001. However, you may want to set your own criteria rather higher – perhaps
excluding any variable that has a tolerance level of less than 0.01. VIF is an
alternative measure of collinearity (in fact it is the reciprocal of tolerance) in which
a large value indicates a strong relationship between predictor variables.
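To make the relationship explicit: for each predictor, SPSS regresses that predictor on all of the other predictors; if $R_i^2$ is the resulting R square, then

$$\text{Tolerance}_i = 1 - R_i^2, \qquad \text{VIF}_i = \frac{1}{\text{Tolerance}_i}$$

For example, chronological age in the table above has a tolerance of .568, so its VIF is 1/.568 ≈ 1.76, the value SPSS reports.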
The following tables were produced when the same analysis was run using the Stepwise method.

Variables Entered/Removed

Model   Variables Entered             Variables Removed   Method
1       standardised spelling score   .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2       chronological age             .                   Stepwise (criteria as above)
3       standardised reading score    .                   Stepwise (criteria as above)

a. Dependent Variable: percentage correct spelling

This table shows us the order in which the variables were entered into and removed from our model. We can see that in this case three variables were added and none were removed.
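These entry and removal criteria are the SPSS defaults; from syntax they are set with the CRITERIA subcommand. A sketch, again using our invented predictor names:

REGRESSION
  /CRITERIA=PIN(.05) POUT(.10)
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT spelperc
  /METHOD=STEPWISE chronage readage stdread stdspell.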
Here we can see that model 1, which included only standardised spelling score,
accounted for 71% of the variance (Adjusted R² = 0.711). The inclusion of
chronological age in model 2 resulted in an additional 9% of the variance being
explained (R² change = 0.094). The final model 3 also included standardised reading
score, and this model accounted for 83% of the variance (Adjusted R² = 0.833).
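The Adjusted R² values in the table below can be checked by hand. For n cases and k predictors:

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$

For model 3 (n = 47, k = 3): 1 − (1 − .844) × 46/43 ≈ .833, the value SPSS reports.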
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .847   .717       .711                12.8708                      .717              114.055    1     45    .000
2       .900   .811       .802                10.6481                      .094              21.747     1     44    .000
3       .919   .844       .833                9.7665                       .034              9.302      1     43    .004

a. Predictors: (Constant), standardised spelling score
b. Predictors: (Constant), standardised spelling score, chronological age
c. Predictors: (Constant), standardised spelling score, chronological age, standardised reading score
ANOVA

[The ANOVA table for the stepwise analysis reports the Sum of Squares, df, Mean Square, F and Sig. values for each of the three models; the values are not reproduced here.]

a. Predictors: (Constant), standardised spelling score
b. Predictors: (Constant), standardised spelling score, chronological age
c. Predictors: (Constant), standardised spelling score, chronological age, standardised reading score
d. Dependent Variable: percentage correct spelling
Coefficients

Here SPSS reports the B, Beta, t and Sig. (p) values for each of the models. These were explained in the output from the Enter method.

Model                            B          Std. Error   Beta   t        Sig.   Tolerance   VIF
1  (Constant)                    -85.032    13.688              -6.212   .000
   standardised spelling score   1.352      .127         .847   10.680   .000   1.000       1.000
2  (Constant)                    -209.328   28.959              -7.228   .000
   standardised spelling score   1.576      .115         .987   13.679   .000   .827        1.209
   chronological age             1.075      .230         .336   4.663    .000   .827        1.209
3  (Constant)                    -209.171   26.562              -7.875   .000
   standardised spelling score   1.197      .163         .750   7.349    .000   .348        2.875
   chronological age             1.092      .211         .342   5.162    .000   .827        1.210
   standardised reading score    .406       .133         .301   3.050    .004   .371        2.698

a. Dependent Variable: percentage correct spelling
Excluded Variables

This table gives statistics for the variables that were excluded from each model.

Model                            Beta In   t        Sig.   Partial Correlation   Tolerance   VIF     Minimum Tolerance
1  chronological age             .336      4.663    .000   .575                  .827        1.209   .827
   reading age                   .208      2.249    .030   .321                  .675        1.481   .675
   standardised reading score    .288      2.317    .025   .330                  .371        2.696   .371
2  reading age                   .036      .395     .695   .060                  .517        1.933   .435
   standardised reading score    .301      3.050    .004   .422                  .371        2.698   .348
3  reading age                   -.144     -1.469   .149   -.221                 .365        2.737   .262

a. Predictors in the Model: (Constant), standardised spelling score
b. Predictors in the Model: (Constant), standardised spelling score, chronological age
c. Predictors in the Model: (Constant), standardised spelling score, chronological age, standardised reading score
d. Dependent Variable: percentage correct spelling
In your results section, you would report the significance of the model by citing the
F and the associated p value, along with the adjusted R square, which indicates the
strength of the model. So, for the final model reported above, we would write:
Adjusted R² = .833; F(3,43) = 77.7, p < 0.0005 (using the stepwise method). The
significant predictor variables would then be reported.
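As a quick arithmetic check, that F value follows directly from R² and the degrees of freedom (n = 47 cases, k = 3 predictors):

$$F(k,\; n-k-1) = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{.844/3}{(1-.844)/43} \approx 77.5$$

which agrees with the reported F(3,43) = 77.7 once the rounding of R² to three decimal places is taken into account.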