Multiple Regression

This document discusses multiple regression analysis, focusing on how to model a dependent variable influenced by multiple independent variables. It explains the difference between bivariate and multiple regression, the interpretation of coefficients, and the use of dummy variables for nominal categories. Key concepts include controlling for other variables, standardized coefficients, and the significance of results in sociological research.


Multiple Regression 1

Sociology 5811 Lecture 22


Copyright © 2005 by Evan Schofer
Do not copy or distribute without
permission
Announcements
• None!
Multiple Regression
• Question: What if a dependent variable is
affected by more than one independent variable?
• Strategy #1: Do separate bivariate regressions
– One regression for each independent variable
• This yields separate slope estimates for each
independent variable
– Bivariate slope estimates implicitly assume that
neither independent variable mediates the other
– In reality, there might be no effect of family wealth
over and above education
Multiple Regression
• Job Prestige: Two separate regression models
Model 1 (Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE (1970)):

                                        B    Std. Error   Beta       t    Sig.
  (Constant)                         9.417      1.421             6.625   .000
  HIGHEST YEAR OF SCHOOL COMPLETED   2.488       .108     .520   23.056   .000

Model 2 (Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE (1970)):

                                        B    Std. Error   Beta       t    Sig.
  (Constant)                        35.608      1.290            27.611   .000
  RS FAMILY INCOME WHEN 16 YRS OLD   2.075       .446     .122    4.652   .000
Both variables have positive, significant slopes


Multiple Regression
• Idea #2: Use Multiple Regression
• Multiple regression can examine “partial”
relationships
– Partial = Relationships after the effects of other
variables have been “controlled” (taken into account)
• This lets you determine the effects of variables
“over and above” other variables
– And shows the relative impact of different factors on
a dependent variable
• And, you can use several independent variables
to improve your predictions of the dependent var
Multiple Regression
• Job Prestige: 2 variable multiple regression
Coefficients (Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE (1970)):

                                        B    Std. Error   Beta       t    Sig.
  (Constant)                         8.977      1.629             5.512   .000
  HIGHEST YEAR OF SCHOOL COMPLETED   2.487       .111     .520   22.403   .000
  RS FAMILY INCOME WHEN 16 YRS OLD    .178       .394     .011     .453   .651

The education slope is basically unchanged compared to the bivariate analysis. The family income slope decreases sharply (bivariate: b = 2.07), and the outcome of its hypothesis test changes – t < 1.96.
Multiple Regression
• Ex: Job Prestige: 2 variable multiple regression
• 1. Education has a large slope effect controlling
for (i.e. “over and above”) family income
• 2. Family income does not have much effect
controlling for education
• Despite a strong bivariate relationship
• Possible interpretations:
• Family income may lead to education, but education is the
critical predictor of job prestige
• Or, family income is wholly unrelated to job prestige… but
is coincidentally correlated with a variable that is related
to it (education), which generated a spurious “effect”.
The Multiple Regression Model
• A two-independent variable regression model:
Yi a  b1 X 1i  b2 X 2i  ei
• Note: There are now two X variables
• And a slope (b) is estimated for each one
• The full multiple regression model is:
Yi a  b1 X 1i  b2 X 2i   bk X ki  ei
• For k independent variables
Multiple Regression: Slopes
• Regression slope for the two-variable case:

    b1 = (sY / sX1) × (rYX1 – rYX2 rX1X2) / (1 – r²X1X2)

• b1 = slope for X1 – controlling for the other
independent variable X2
• b2 is computed symmetrically: swap the X1s and X2s
• Compare to the bivariate slope: bYX = rYX (sY / sX)
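As a check on the formula, here is a minimal Python sketch (the standard deviations and correlations below are made up for illustration, not taken from the lecture's GSS data). It shows that when rX1X2 = 0 the partial slope collapses to the bivariate slope:

```python
# Two-variable partial slope from correlations and standard deviations.
# All input values below are illustrative, not the lecture's data.
def partial_slope(s_y, s_x1, r_yx1, r_yx2, r_x1x2):
    """b1 = (sY/sX1) * (rYX1 - rYX2*rX1X2) / (1 - rX1X2^2)"""
    return (s_y / s_x1) * (r_yx1 - r_yx2 * r_x1x2) / (1 - r_x1x2 ** 2)

def bivariate_slope(s_y, s_x, r_yx):
    """b = rYX * (sY/sX)"""
    return r_yx * s_y / s_x

# With X1 and X2 uncorrelated (r_x1x2 = 0), the two agree:
b_partial = partial_slope(s_y=10.0, s_x1=2.0, r_yx1=0.5, r_yx2=0.3, r_x1x2=0.0)
b_biv = bivariate_slope(s_y=10.0, s_x=2.0, r_yx=0.5)
print(b_partial, b_biv)  # 2.5 2.5
```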
Multiple Regression Slopes
• Let’s look more closely at the formulas:

    b1 = (sY / sX1) × (rYX1 – rYX2 rX1X2) / (1 – r²X1X2)   versus   bYX = rYX (sY / sX)

• What happens to b1 if X1 and X2 are totally
uncorrelated?
• Answer: The formula reduces to the bivariate slope
• What if X1 and X2 are correlated with each other
AND X2 is more correlated with Y than X1?
Regression Slopes
• So, if two variables (X1, X2) are correlated and
both predict Y:
• The X variable that is more correlated with Y
will have a higher slope in multivariate
regression
– The slope of the less-correlated variable will shrink
• Thus, the slope for each variable is adjusted
according to how well the other variable predicts Y
– It is the slope “controlling” for other variables
Multiple Regression Slopes
• One last thing to keep in mind…
    b1 = (sY / sX1) × (rYX1 – rYX2 rX1X2) / (1 – r²X1X2)   versus   bYX = rYX (sY / sX)

• What happens to b1 if X1 and X2 are almost
perfectly correlated?
• Answer: The denominator approaches zero
• The slope “blows up”, approaching infinity
• Highly correlated independent variables can
cause trouble for regression models… watch out
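The blow-up can be seen numerically with a short sketch (values are illustrative; the formula is the two-variable slope from earlier in the lecture):

```python
# Partial slope as the correlation between X1 and X2 grows toward 1.
def partial_slope(s_y, s_x1, r_yx1, r_yx2, r_x1x2):
    return (s_y / s_x1) * (r_yx1 - r_yx2 * r_x1x2) / (1 - r_x1x2 ** 2)

# The denominator (1 - r^2) shrinks toward zero, inflating the slope:
for r12 in (0.0, 0.9, 0.99, 0.999):
    print(r12, partial_slope(1.0, 1.0, 0.5, 0.4, r12))
```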
Interpreting Results
• (Over)Simplified rules for interpretation
– Assumes good sample, measures, models, etc.
• Multivariate regression with two variables: A, B
• If slopes of A, B are the same as bivariate, then
each has an independent effect
• If A remains large and B shrinks to zero, we
typically conclude that the effect of B was spurious,
or operates through A
• If both A and B shrink a little, each has an effect,
but some overlap or mediation is occurring
Interpreting Multivariate Results
• Things to watch out for:
• 1. Remember: Correlation is not causation
– Ability to “control” for many variables can help detect
spurious relationships… but it isn’t perfect.
– Be aware that other (omitted) variables may be
affecting your model. Don’t over-interpret results.
• 2. Reverse causality
– Many sociological processes involve bi-directional
causality. Regression slopes (and correlations) do not
identify which variable “causes” the other.
• Ex: self-esteem and test scores.
Standardized Regression Coefficients
• Regression slopes reflect the units of the
independent variables
• Question: How do you compare how “strong” the
effects of two variables are if they have totally
different units?
• Example: Education, family wealth, job prestige
– Education measured in years, b = 2.5
– Family wealth measured on 1-5 scale, b = .18
– Which is a “bigger” effect? Units aren’t comparable!
• Answer: Create “standardized” coefficients
Standardized Regression Coefficients
• Standardized Coefficients
– Also called “Betas” or “Beta Weights”
– Symbol: Greek beta with an asterisk: β*
– Equivalent to Z-scoring (standardizing) all
independent variables before doing the regression
• Formula of the coefficient for Xj:

    β*j = bj (sXj / sY)

• Result: The unit is standard deviations
• Betas indicate the effect of a 1 standard
deviation change in Xj on Y
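A quick sketch of the beta formula in Python. The standard deviations used (roughly 3.1 years of education, 14.7 prestige points) are assumed, GSS-plausible values for illustration; they are not given in the slides:

```python
def standardized_beta(b, s_x, s_y):
    """beta* = b * (sX / sY): the slope re-expressed in SD units."""
    return b * s_x / s_y

# Education slope from the lecture's model, with assumed SDs:
beta_educ = standardized_beta(b=2.487, s_x=3.1, s_y=14.7)
print(round(beta_educ, 3))  # 0.524, close to the .520 in the output
```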
Standardized Regression Coefficients
• Ex: Education, family income, and job prestige:
Coefficients (Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE (1970)):

                                        B    Std. Error   Beta       t    Sig.
  (Constant)                         8.977      1.629             5.512   .000
  HIGHEST YEAR OF SCHOOL COMPLETED   2.487       .111     .520   22.403   .000
  RS FAMILY INCOME WHEN 16 YRS OLD    .178       .394     .011     .453   .651

An increase of 1 standard deviation in Education results in a .52 standard deviation increase in job prestige. What is the interpretation of the “family income” beta? Betas give you a sense of which variables “matter most”.
R-Square in Multiple Regression
• Multivariate R-square is much like bivariate:
    R² = SSREGRESSION / SSTOTAL
• But, SSregression is based on the multivariate
regression
• The addition of new variables results in better
prediction of Y, less error (e), higher R-square.
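The computation can be sketched in a few lines of Python on toy data; for an OLS fit with an intercept, 1 − SSresidual/SStotal equals SSregression/SStotal:

```python
# R-square from observed values and model predictions (toy data).
def r_squared(y, y_hat):
    y_mean = sum(y) / len(y)
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - ss_resid / ss_total  # = SS_regression / SS_total for OLS

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]  # predictions from some fitted model
print(r_squared(y, y_hat))
```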
R-Square in Multiple Regression
• Example (Model Summary; predictors: (Constant), INCOM16, EDUC):

    R = .522   R Square = .272   Adjusted R Square = .271   Std. Error of the Estimate = 12.41

• The R-square of .272 indicates that education and
parents’ wealth explain 27% of the variance in job prestige
• “Adjusted R-square” is a more conservative,
more accurate measure in multiple regression
– Generally, you should report Adjusted R-square.
Dummy Variables
• Question: How can we incorporate nominal
variables (e.g., race, gender) into regression?
• Option 1: Analyze each sub-group separately
– Generates different slope, constant for each group
• Option 2: Dummy variables
– “Dummy” = a dichotomous variable coded to
indicate the presence or absence of something
– Absence coded as zero, presence coded as 1.
Dummy Variables
• Strategy: Create a separate dummy variable for
all nominal categories
• Ex: Gender – make female & male variables
– DFEMALE: coded as 1 for all women, zero for men
– DMALE: coded as 1 for all men
• Next: Include all but one of the dummy variables in
a multiple regression model
• If two dummies, include 1; if 5 dummies, include 4.
Dummy Variables
• Question: Why can’t you include DFEMALE
and DMALE in the same regression model?
• Answer: They are perfectly correlated
(negatively): r = -1
– Result: Regression model “blows up”
• For any set of nominal categories, a full set of
dummies contains redundant information
– DMALE and DFEMALE contain same information
– Dropping one removes redundant information.
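The coding step can be sketched in Python (the data are made up; variable names follow the slides):

```python
# Build gender dummies, then drop one to avoid perfect redundancy.
genders = ["F", "M", "F", "F", "M"]  # made-up sample
dfemale = [1 if g == "F" else 0 for g in genders]
dmale = [1 if g == "M" else 0 for g in genders]

# Redundant: for every case, DFEMALE + DMALE = 1
assert all(f + m == 1 for f, m in zip(dfemale, dmale))

# So only DFEMALE enters the model; males are the reference group.
print(dfemale)  # [1, 0, 1, 1, 0]
```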
Dummy Variables:
Interpretation
• Consider the following regression equation:
Yi a  b1 INCOMEi  b2 DFEMALEi  ei
• Question: What if the case is a male?
• Answer: DFEMALE is 0, so the entire term
becomes zero.
– Result: Males are modeled using the familiar
regression model: a + b1X + e.
Dummy Variables:
Interpretation
• Consider the following regression equation:
Yi a  b1 INCOMEi  b2 DFEMALEi  ei
• Question: What if the case is a female?
• Answer: DFEMALE is 1, so b2(1) stays in the
equation (and is added to the constant)
– Result: Females are modeled using a different
regression line: (a+b2) + b1X + e
– Thus, the coefficient b2 reflects the difference
in the constant for women.
Dummy Variables:
Interpretation
• Remember, a different constant generates a
different line, either higher or lower
– Variable: DFEMALE (women = 1, men = 0)
– A positive coefficient (b) indicates that women are
consistently higher compared to men (on dep. var.)
– A negative coefficient indicates women are lower
• Example: If DFEMALE coeff = 1.2:
– “Women are on average 1.2 points higher than men”.
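A sketch of this interpretation, using the slide's hypothetical DFEMALE coefficient of 1.2 (the constant a and the income slope b1 below are made up):

```python
# Y = a + b1*X + b2*DFEMALE: the dummy shifts the line up or down.
a, b1, b2 = 3.0, 0.5, 1.2  # a and b1 assumed; b2 from the slide's example

def predict(x, dfemale):
    return a + b1 * x + b2 * dfemale

# Same X, different gender: predictions differ only in the constant.
print(predict(4, 0))  # man:   5.0
print(predict(4, 1))  # woman: 6.2, i.e. 1.2 higher
```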
Dummy Variables:
Interpretation
• Visually: Women = blue, Men = red

[Scatterplot: HAPPY (0–10) on the y-axis vs. INCOME (0–100,000) on the x-axis, showing the overall slope for all data points. Note: the lines for men and women have the same slope… but one is higher and the other is lower. The constant differs! If women=1, men=0: the constant (a) reflects men only; the dummy coefficient (b) reflects the increase for women (relative to men).]
Dummy Variables
• What if you want to compare more than 2 groups?
• Example: Race
– Coded 1=white, 2=black, 3=other (like GSS)
• Make 3 dummy variables:
– “DWHITE” is 1 for whites, 0 for everyone else
– “DBLACK” is 1 for Af. Am., 0 for everyone else
– “DOTHER” is 1 for “others”, 0 for everyone else
• Then, include two of the three variables in the
multiple regression model.
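The same coding applies with three categories (made-up data; the codes follow the GSS example in the slides):

```python
# Race dummies: 1=white, 2=black, 3=other (GSS-style coding).
race = [1, 2, 3, 1, 2]  # made-up sample
dwhite = [1 if r == 1 else 0 for r in race]
dblack = [1 if r == 2 else 0 for r in race]
dother = [1 if r == 3 else 0 for r in race]

# The full set always sums to 1 -- redundant with the constant --
# so one dummy (here DWHITE) is left out; whites become the reference.
assert all(w + b + o == 1 for w, b, o in zip(dwhite, dblack, dother))
print(dblack, dother)
```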
Dummy Variables:
Interpretation
• Ex: Job Prestige – Coefficients (Dependent Variable: PRESTIGE):

                  B    Std. Error   Beta       t     Sig.
  (Constant)   9.666      1.672             5.780    .000
  EDUC         2.476       .111     .517   22.271    .000
  INCOM16       .063       .397     .004     .158    .874
  DBLACK      -2.666      1.117    -.055   -2.388    .017
  DOTHER       1.114      1.777     .014     .627    .531

• The negative coefficient for DBLACK indicates a
lower level of job prestige compared to whites
– T- and P-values indicate whether the difference is significant.
Dummy Variables:
Interpretation
• Comments:
• 1. Dummy coefficients shouldn’t be called
slopes
– Referring to the “slope” of gender doesn’t make sense
– Rather, it is the difference in the constant (or “level”)
• 2. The contrast is always with the nominal
category that was left out of the equation
– If DFEMALE is included, the contrast is with males
– If DBLACK, DOTHER are included, coefficients
reflect difference in constant compared to whites.
Interaction Terms
• Question: What if you suspect that a variable has
a totally different slope for two different sub-
groups in your data?
• Example: Income and Happiness
– Perhaps men are more materialistic -- an extra dollar
increases their happiness a lot
– If women are less materialistic, each dollar has a
smaller effect on happiness (compared to men)
• Issue isn’t men = “more” or “less” than women
– Rather, the slope of a variable (income) differs across
groups
Interaction Terms
• Issue isn’t men = “more” or “less” than women
– Rather, the slope (coefficient) of a variable (income)
differs across groups
• Again, we want to specify a different regression
line for each group
– We want lines with different slopes, not parallel lines
that are higher or lower.
Interaction Terms
• Visually: Women = blue, Men = red

[Scatterplot: HAPPY (0–10) on the y-axis vs. INCOME (0–100,000) on the x-axis, showing the overall slope for all data points. Note: here, the slope for men and women differs. The effect of income on happiness (X1 on Y) varies with gender (X2). This is called an “interaction effect”.]
Interaction Terms
• Interaction effects: Differences in the
relationship (slope) between two variables for
each category of a third variable
• Option #1: Analyze each group separately
• Option #2: Multiply the two variables of interest:
(DFEMALE, INCOME) to create a new variable
– Called: DFEMALE*INCOME
– Add that variable to the multiple regression model.
Interaction Terms
• Consider the following regression equation:
Yi a  b1 INCOMEi  b2 DFEM * INCi  ei
• Question: What if the case is male?
• Answer: DFEMALE is 0, so b2(DFEM*INC)
drops out of the equation
– Result: Males are modeled using the ordinary
regression equation: a + b1X + e.
Interaction Terms
• Consider the following regression equation:
Yi a  b1 INCOMEi  b2 DFEM * INCi  ei
• Question: What if the case is female?
• Answer: DFEMALE is 1, so b2(DFEM*INC)
becomes b2*INCOME, which is added to b1
– Result: Females are modeled using a different
regression line: a + (b1+b2) X + e
– Thus, the coefficient b2 reflects the difference
in the slope of INCOME for women.
Interaction Terms
• Interpreting interaction terms:
• A positive b for DFEMALE*INCOME indicates
the slope for income is higher for women vs. men
– A negative effect indicates the slope is lower
– Size of coefficient indicates actual difference in slope
• Example: DFEMALE*INCOME. Observed b’s:
– Income: b = .5
– DFEMALE * INCOME: b = -.2
• Interpretation: Slope is .5 for men, .3 for women.
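This arithmetic can be checked directly with the slide's coefficients:

```python
# Group-specific income slopes implied by the interaction term.
b_income = 0.5        # slope of INCOME (slide's example)
b_interaction = -0.2  # slope of DFEMALE*INCOME (slide's example)

slope_men = b_income                    # DFEMALE = 0: term drops out
slope_women = b_income + b_interaction  # DFEMALE = 1

print(slope_men, slope_women)  # 0.5 0.3
```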
Interaction Terms
• Continuous variables can also interact
• Example: Effect of education and income on
happiness
– Perhaps highly educated people are less materialistic
– As education increases, the slope between
income and happiness would decrease
• Simply multiply Education and Income to create
the interaction term “EDUCATION*INCOME”
– And add it to the model
Interaction Terms
• How do you interpret continuous variable
interactions?
• Example: EDUCATION*INCOME: Coefficient = 2.0
• Answer: For each unit change in education, the
slope of income vs. happiness increases by 2
– Note: coefficient is symmetrical: For each unit
change in income, education slope increases by 2
– Dummy interactions result in slopes for each group
– Continuous interactions result in many slopes
• Each category of education*income has a different slope.
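A sketch of the continuous case (the interaction coefficient 2.0 is the slide's example; the baseline income coefficient here is made up):

```python
# Slope of income on happiness as a function of education:
# slope(income) = b_income + b_int * education
b_income, b_int = 1.0, 2.0  # b_income assumed; b_int from the slide

def income_slope(education):
    return b_income + b_int * education

# Each extra year of education raises the income slope by 2:
print(income_slope(10) - income_slope(9))  # 2.0
```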
Interaction Terms
• Comments:
• 1. If you make an interaction you should also
include the component variables in the model:
– A model with “DFEMALE * INCOME” should also
include DFEMALE and INCOME
– There is some debate on this issue… but that is the
safest course of action
• 2. Sometimes interaction terms are highly
correlated with their components
• Watch out for that.
Interaction Terms
• Question: Can you think of examples of two
variables that might interact?
• Either from your final project? Or anything else?
