
DECISION-MAKING in STATISTICS
Meet #5

"Where do we go from here? It's not exactly clear."
"Should I stay? Should I go? I really don't know."
"Torn between two lovers, feeling like a fool,
loving both of you is breaking all the rules."

[Figure: DECISION TREE for selecting the appropriate statistical test. How many IVs? What type of test? One branch leads to Pearson Correlation / Regression (association), the other to T-test / ANOVA (difference). Source: www.researchgate.net/figure/A-basic-decision-tree-on-how-to-select-the-appropriate-statistical-test-is-shown_fig5_256303889]
Test of Association: Linear Regression / SPSS practice
#6. Eight motorists who own auto insurance policies from the same insurance
company observed that their individual monthly auto insurance premiums seem
to be dependent on the driver’s experience. Listed below are their years of
driving experience and their respective monthly auto insurance premium.

Ho: Driving-years experience does not predict monthly insurance premium (bx + a = 0)
Ha: Driving-years experience predicts monthly insurance premium (bx + a ≠ 0)
Source: University of Idaho; www.webpages.uidaho.edu
REVIEW: Assumptions for Regression
1. Interval level of measurement *
2. Related pairs / paired values *
3. Absence of outliers *
4. Normality of variables *
5. Linearity: the relationship is linear (a straight line)
6. Homoscedasticity: the distances between the points and the straight line show the same variation
7. Independent and dependent variables are at least moderately correlated
8. IVs NOT multicollinear / not highly correlated (multicollinearity is flagged at r ≥ .90)
Encode data in SPSS, make
sure measurement level is
scale for all variables

ASSUMPTION CHECK: NORMALITY

1. Check the normality assumption using skewness and kurtosis (click analyze, descriptive statistics, descriptives).
2. Transfer both variables to the variables box, click options and check mean, SD, kurtosis, and skewness, then click continue and OK.
The skewness and kurtosis values for both years of driving experience and monthly insurance premium fall within the acceptable ranges, so data normality is achieved.
Skewness: between -1 and +1
Kurtosis: between -3 and +3
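If you want to reproduce this check outside SPSS, here is a minimal Python sketch using pandas (an editorial addition, not part of the original slides); the column names and data values are hypothetical placeholders, not the eight motorists' actual data.

import pandas as pd

# Hypothetical illustration values, NOT the actual slide data
df = pd.DataFrame({
    "years_experience": [5, 2, 12, 9, 15, 6, 25, 16],
    "monthly_premium":  [64, 87, 50, 71, 44, 56, 42, 60],
})

# pandas, like SPSS Descriptives, reports skewness and excess kurtosis
print(df.agg(["mean", "std", "skew", "kurt"]))

# Rule of thumb from the slides:
# skewness between -1 and +1, kurtosis between -3 and +3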
Check the normality assumption for the DV:
(1) Analyze, descriptive statistics, explore.
(2) Transfer the DV to the dependent list box, click statistics, check outliers, click continue.**
**For assumption-check purposes, you may also include the IV when transferring to the dependent list.

(3) Click plots, check histogram and normality plots with tests (this will provide the Shapiro-Wilk statistic), click continue, click OK.
* Interpret Shapiro-Wilk: it must be non-significant for the normality-of-DV assumption to be met.

* No circles/asterisks/stars are indicated in the boxplot; thus, there are no outliers, and the assumption of absence of outliers is met.
* If there are, do not do another round of data cleaning yet; decide after checking other assumption statistics such as Cook's distance.
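The Shapiro-Wilk test and a boxplot-style outlier screen can be run the same way in Python with scipy; a sketch assuming the hypothetical df from the block above.

from scipy import stats

# Shapiro-Wilk on the DV: a NON-significant p (> .05) means
# the normality-of-DV assumption is met
w, p = stats.shapiro(df["monthly_premium"])
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Boxplot-style outlier screen: flag cases beyond 1.5 * IQR,
# the same rule SPSS uses to draw circles on the boxplot
q1, q3 = df["monthly_premium"].quantile([.25, .75])
iqr = q3 - q1
outliers = df[(df["monthly_premium"] < q1 - 1.5 * iqr) |
              (df["monthly_premium"] > q3 + 1.5 * iqr)]
print(outliers)  # an empty frame means the no-outliers assumption is met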
Other assumptions of regression can be checked using the ACTUAL REGRESSION ANALYSIS:
1. Click analyze, regression, linear.
2. Transfer the DV to the dependent box and the IV to the independent box, then click statistics.
3. Click on: estimates, model fit, R-squared change, descriptives, and collinearity diagnostics. Also check casewise diagnostics under residuals. Click continue, then click plots.
4. Under plots, put ZRESID in Y and ZPRED in X, check normal probability plots, click continue.
5. Click the save box and check Cook's distance. Leave the "include the covariance matrix" box ticked. Click continue, then click OK.
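The point-and-click run above can also be reproduced with statsmodels; a minimal sketch, again assuming the hypothetical df from the earlier block.

import statsmodels.api as sm

# Add the intercept term (the "a" in bx + a), then fit the model
X = sm.add_constant(df["years_experience"])
model = sm.OLS(df["monthly_premium"], X).fit()

# summary() prints the N, R-squared, the ANOVA F-test, and the
# coefficients table with p-values, mirroring the SPSS output
print(model.summary())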

The first table in the output viewer is the descriptive statistics table, which shows the mean, SD, and number of participants for each variable (the Ns are the same if there are no missing values).
The second table is the correlations table. Each variable's correlation with itself is always 1.00. The crossings for each pair of variables show their correlation; e.g., the correlation between monthly insurance and driving experience is r = -.768. The Sig. (1-tailed) portion of the table shows the p value of each correlation. For this example, the p value is significant (p = .013): driving-years experience and monthly insurance premium are significantly correlated.
(Adjusted R-squared is always smaller than or similar to R-squared.)

Both R² and adjusted R² indicate how much variance the predictors can explain in the outcome. In this case, the predictor explains 52.1% of the variance of the DV (adjusted R²); in other words, years of driving experience can explain 52.1% of the variance in the monthly insurance premium.

Use adjusted R-squared because it increases only when a new variable that improves the regression model is added, whereas R-squared keeps increasing as new variables are added, regardless of whether those variables are useful.
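For reference, adjusted R-squared penalizes R-squared for the number of predictors k given the sample size n:

\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

With the slide's own numbers (n = 8 motorists, k = 1 predictor, R² = (-.768)² ≈ .590), this gives 1 - (.410)(7/6) ≈ .521, matching the 52.1% reported above.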
The R-square change is tested with an F-test, referred to as the F-change. A significant F-change means that the variables added in that step significantly improved the prediction. For this round of regression, because all predictors are entered at once, there is no need to use it. If predictors were entered hierarchically or by block, the change statistics would be more useful.
The F-ratio in the ANOVA table tests whether the overall regression model is a good fit for the data. The table shows that the independent variable, driving-years experience, statistically significantly predicts the dependent variable, monthly insurance premium, F(1, 6) = 8.624, p = .026. This shows that the regression model is a good fit for the data.
Unstandardized and standardized coefficients indicate the change in the dependent variable for every change in each IV.
Standardized coefficients are used more often because they allow comparison across the regressors or predictors in the model.

You may write the interpretation of this result in this manner:
"Given the significance value of the predictor, p = .026, it can be said that driving-years experience significantly predicted the monthly insurance premium. For every one-year increase in driving experience, the monthly insurance premium decreases by 1.548 (unstandardized b); in standardized terms, each SD increase in driving experience corresponds to a .768 SD decrease in premium (β)."
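Because the least-squares line passes through the means of both variables, the intercept and the link between b and β can be verified from the descriptives and coefficients reported later in Tables 1 and 2 (a worked check using the slides' own numbers):

a = \bar{y} - b\,\bar{x} = 59.250 - (-1.548)(11.250) \approx 76.665

\beta = b \cdot \frac{s_x}{s_y} = -1.548 \times \frac{7.401}{14.917} \approx -.768

So the fitted line is \hat{y} = 76.665 - 1.548x; a motorist with 15 years of experience, for instance, would be predicted to pay about 76.665 - 1.548(15) ≈ 53.45 monthly.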
Assumption check: MULTICOLLINEARITY

(1) VIF values should not be more than 10 for variables to not be considered multicollinear. Here, the VIF for the lone IV is 1, indicating that the multicollinearity assumption is met (with only one predictor, multicollinearity cannot arise).

or (2) Correlations among the IVs may also be checked to determine multicollinearity. For some researchers, ideally, these correlations should be not more than .70.
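The VIF check can also be scripted; a sketch using statsmodels' variance_inflation_factor on the design matrix X from the regression sketch above.

from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF for each predictor column (index 0 is the constant, so skip it);
# with a single IV the VIF is trivially 1.00
for i in range(1, X.shape[1]):
    print(X.columns[i], variance_inflation_factor(X.values, i))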
Under the residual statistics table, look for the standardized residuals (Std. Residual): check that the minimum is not less than -3 and the maximum is not more than 3.
Also look at the Cook's distance values: if each is below 1, there is no need to worry about outliers exerting influence on the regression line (the Cook's distance value for each x, y pair can be seen in the data view).
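Standardized residuals and Cook's distances are available from the fitted statsmodels object as well; a sketch assuming the model from the regression sketch above.

influence = model.get_influence()

# Standardized residuals: all values should fall between -3 and +3
std_resid = influence.resid_studentized_internal
print(std_resid.min(), std_resid.max())

# Cook's distance per case: values below 1 mean no single (x, y) pair
# is exerting undue influence on the regression line
cooks_d, _ = influence.cooks_distance
print(cooks_d)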
For the Normal P-P plot, the goal is to check whether the points (the tiny circles) more or less fall on, or follow, the normality line, so as to assess normality of the residuals. A little deviation is alright. Normally distributed residuals suggest that linearity is present between predictors and outcome.

[Figure: example of a P-P plot with a drastic deviation from the normality line]
Assumption check: HOMOSCEDASTICITY

Ideally, a plot that looks like the dots/circles were shot out of a shotgun indicates that the data are homoscedastic: there is no pattern, and points are equally distributed above and below zero on the X axis and to the left and right of zero on the Y axis.
Homoscedastic data also suggest linearity. No point should fall outside of -3 and +3.

[Figure: example of a scatterplot that is not homoscedastic]
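Both residual plots can be reproduced with matplotlib and scipy; a sketch assuming the fitted model from above (note that scipy's probplot is technically a Q-Q plot, which is read the same way for this purpose).

import matplotlib.pyplot as plt
from scipy import stats

# Standardize residuals (ZRESID) and predicted values (ZPRED)
z_resid = (model.resid - model.resid.mean()) / model.resid.std()
z_pred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Normality of residuals: points should hug the diagonal line
stats.probplot(z_resid, dist="norm", plot=ax1)
ax1.set_title("Normality of residuals")

# ZRESID (Y) against ZPRED (X): a patternless "shotgun blast"
# centered on zero indicates homoscedasticity
ax2.scatter(z_pred, z_resid)
ax2.axhline(0, color="gray")
ax2.axvline(0, color="gray")
ax2.set_xlabel("ZPRED")
ax2.set_ylabel("ZRESID")

plt.tight_layout()
plt.show()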
Reporting Regression Analysis results

Table 1. Descriptive Statistics and Correlations of the Variables

                                M        SD       1       2
1. Driving-years experience     11.250   7.401    -       -.768*
2. Monthly insurance premium    59.250   14.917           -
*p < .05

Table 1 presents the relationship between the variables. The predictor, driving experience, is significantly correlated with the outcome variable of interest in this study, monthly insurance premium. The r = -.768 indicates a moderate to strong negative relationship between the predictor and the outcome variable.
Reporting Regression Analysis results

Table 2. Summary of Regression Analysis Predicting Monthly Insurance Premium

                            b        SE B    β       p
Driving-years experience    -1.548   .527    -.768   .026
Note: adjusted R² = .521

Given the significance value of the predictor, p = .026, it can be said that driving-years experience significantly predicted the monthly insurance premium. For every one-year increase in driving experience, the monthly insurance premium decreases by 1.548 (b); equivalently, a one-SD increase corresponds to a .768 SD decrease (β).
Reporting Regression Analysis results

The results showed that, as seen in Table 1, years of driving experience was significantly correlated with the monthly insurance premium (r = -.768, p = .026). This seems to indicate that years of driving experience is a possible predictor of the insurance premium being paid monthly.
The regression analysis in Table 2 shows that driving-years experience indeed significantly predicts the monthly insurance premium, F(1, 6) = 8.624, p = .026. As noted in Table 2, years of driving experience is able to account for, or explain, 52.1% of the variance in the insurance premium being paid monthly (adjusted R² = .521).
The results suggest that drivers who have longer driving experience are likely to pay a lower monthly insurance premium.
SPSS Demonstration: Multiple Linear Regression

Is analytical reasoning predicted by reading comprehension, mathematics, and geometry capacities?

Ho: Reading comprehension, mathematics, and geometry capacities do not predict analytical reasoning (bx + a = 0)
Ha: Reading comprehension, mathematics, and geometry capacities predict analytical reasoning (bx + a ≠ 0)

Slides 29-49 are from G. Conway's ppt.

Encode the data in SPSS, making sure that the measure is scale.
Check normality of the DV using analyze, descriptive statistics, explore.
Transfer the DV to the dependent list box, click statistics, check outliers, click continue.
Click plots, check histogram and normality plots with tests (this will provide the Shapiro-Wilk statistic), click continue, click OK.
Interpret Shapiro-Wilk: it must be non-significant for the normality-of-DV assumption to be met.
The boxplot may show outliers (as in this case), but do not delete values yet; decide only after checking other values (e.g., Cook's distance).
Skewness and kurtosis can also be checked for more evidence of normality of the variables (analyze, descriptive statistics, descriptives; transfer the variables of interest to the variables box, click options, check mean, SD, kurtosis, and skewness; then continue, then OK).
Other assumptions of regression can be checked using the actual regression analysis.
Click analyze, regression, linear.
Transfer the DV to the dependent box and the three IVs to the independent box, then click statistics. (Do note that this is multiple regression, so there have to be at least 2 IVs.)
Click estimates and model fit (if not checked automatically), R-squared change, descriptives, and collinearity diagnostics. Also check casewise diagnostics under residuals. Click continue, then click plots.
Under plots, put ZRESID in Y and ZPRED in X, check normal probability plots, click continue.
Click the save box and check Cook's distance. Leave the "include the covariance matrix" box ticked. Click continue, then click OK.
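The whole multiple-regression run can likewise be scripted; a self-contained sketch in which the variable names and the simulated scores are hypothetical stand-ins for the course data (only the sample size, n = 249, is taken from the reported F(3, 245)).

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated, correlated z-scores standing in for the actual data set
rng = np.random.default_rng(0)
n = 249
math = rng.normal(size=n)
reading = .66 * math + rng.normal(scale=.75, size=n)
geometry = .78 * math + rng.normal(scale=.63, size=n)
analytical = (.17 * reading + .54 * math + .22 * geometry
              + rng.normal(scale=.55, size=n))
df_mr = pd.DataFrame({"reading": reading, "math": math,
                      "geometry": geometry, "analytical": analytical})

# Fit the three-predictor model: DV = analytical reasoning
X = sm.add_constant(df_mr[["reading", "math", "geometry"]])
model_mr = sm.OLS(df_mr["analytical"], X).fit()
print(model_mr.summary())  # R-squared, F(3, 245), unstandardized b's

# Multicollinearity check: each IV's VIF should be below 10
for i in range(1, X.shape[1]):
    print(X.columns[i], round(variance_inflation_factor(X.values, i), 2))

# Standardized coefficients (beta): refit on z-scored variables
z = (df_mr - df_mr.mean()) / df_mr.std()
Xz = sm.add_constant(z[["reading", "math", "geometry"]])
print(sm.OLS(z["analytical"], Xz).fit().params)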

The first table in the output viewer is the descriptive statistics table, which shows the mean, SD, and number of participants for each variable (the Ns are the same if there are no missing values).
The second table is the correlations table. Each variable's correlation with itself is always 1.00; these 1.00 values form a diagonal line. The crossings for each pair of variables show their correlation; e.g., the correlation between reading comprehension and analytical reasoning is r = .635, between reading comprehension and mathematics r = .659, and between mathematics and geometry r = .780. The Sig. (1-tailed) portion of the table shows the p value of each correlation. For this example, all p values are significant (p < .001).
Again, both R² and adjusted R² indicate how much variance the predictors can explain in the outcome. In this case, the predictors can explain 69.6% of the variance of the DV; in other words, geometry, reading comprehension, and mathematics can explain 69.6% of the variance in analytical reasoning.
Use adjusted R-squared because it increases only when a new variable that improves the regression model is added, whereas R-squared keeps increasing as new variables are added, regardless of whether those variables are useful.
The R-square change is tested with an F-test, referred to as the F-change. A significant F-change means that the variables added in that step significantly improved the prediction. For this round of regression, because all predictors are entered at once, there is no need to use it. If predictors were entered hierarchically or by block, the change statistics would be more useful.
The F-ratio in the ANOVA table tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F(3, 245) = 189.852, p < .001 (i.e., the regression model is a good fit for the data).
VIF values should not be more than 10 for variables to not be considered multicollinear. Here, the VIFs for the variables are between 1.77 and 3.26, indicating that the multicollinearity assumption is met.
Correlations among the IVs may also be checked to determine multicollinearity. For some researchers, ideally, these correlations should be not more than .70.
Unstandardized and standardized coefficients indicate the change in the dependent variable for every change in each IV.
Standardized coefficients are used more often because they allow comparison across the regressors or predictors in the model.
Given the significance value of each of the predictors, p < .001, it can be said that all three predictors significantly predict analytical reasoning. For every unit increase in reading comprehension there is a .177 increase in analytical reasoning (β = .167); for every unit increase in mathematics, a .536 increase (β = .536); and for every unit increase in geometry, a .220 increase (β = .216).
Under the residual statistics table, look for the standardized residuals (Std. Residual): check that the minimum is not less than -3 and the maximum is not more than 3.
Also look at the Cook's distance values: if each is below 1, there is no need to worry about outliers exerting influence on the regression line (the Cook's distance value for each x, y pair can be seen in the data view).
For the Normal P-P plot, the goal is to check whether the points (the tiny circles) more or less fall on, or follow, the normality line, so as to assess normality of the residuals. A little deviation is alright. Normally distributed residuals suggest that linearity is present between predictors and outcome.

[Figure: example of a P-P plot with a drastic deviation from the normality line]
The next assumption to check is homoscedasticity. The scatterplot of the residuals will appear right below the normal P-P plot in the output. Ideally, a plot that looks like the dots/circles were shot out of a shotgun indicates that the data are homoscedastic: there is no pattern, and points are equally distributed above and below zero on the X axis and to the left and right of zero on the Y axis. Homoscedastic data also suggest linearity. No point should fall outside of -3 and +3.

[Figure: example of a scatterplot that is not homoscedastic]
This is how you report your regression analysis results in a table.

Table 1. Descriptive Statistics and Correlations of the Variables

                             M       SD      1       2       3       4
1. Reading Comprehension     .1315   .991    -       .659*   .528*   .635*
2. Mathematics               .0987   1.05            -       .780*   .815*
3. Geometry                  .1137   1.04                    -       .723*
4. Analytical Reasoning      .1671   1.05                            -
*p < .001
Table 1 presents the relationships among the variables. All predictors are significantly correlated with one another, with r values indicating moderate to strong positive relationships. All variables are also significantly correlated with analytical reasoning, which is the outcome variable of interest in the current research.
This is how you report your regression analysis results in a table.

Table 2. Summary of Multiple Regression Analysis for Variables Predicting Analytical Reasoning

                           b      SE B    β      p
Reading Comprehension      .177   .049    .167   .000
Geometry                   .220   .057    .216   .000
Mathematics                .536   .063    .536   .000
Reporting the Results
Reading comprehension was significantly correlated with analytical reasoning (r = .635, p < .001). This seems to indicate that reading comprehension is a possible predictor of analytical reasoning. Likewise, geometry (r = .723) and mathematics (r = .815) are significantly correlated with analytical reasoning at p < .001.
Regression analysis shows that there are significant predictors in the model, F(3, 245) = 189.852, p < .001. In fact, as indicated in Table 2, all three independent variables are significant predictors and are able to account for, or explain, 69.6% of the variance in analytical reasoning (adjusted R² = .696).
The results suggest that students who have high scores in geometry and in math, as well as in reading comprehension, also likely have higher analytical reasoning scores.
