TNDY - TA Session 2
Linear regression
y = α + βX
or, in this case:
income = α + β(age)
Coef: the coefficient β measures how much the dependent variable (income,
“Y”) changes, on average, when the independent variable (age, “X”) increases by 1 unit (1 year of life).
o Q: How would you interpret the coefficient of age in this regression? It tells us how much
income changes, on average, when age increases by one unit (one year).
t = coef/SE.
o Sign: The sign tells you whether the estimated coefficient lies to the left (negative) or to
the right (positive) of zero. Put another way, it tells you whether the
variables X and Y are negatively (or, inversely: “more of X, less of Y”) or positively
(“more of X, more of Y”) associated.
o Value: For the conventional significance level, we expect the absolute
value of t to be greater than 1.96 (rule of thumb) to reject the null hypothesis
that income and age are not related (i.e., that β = 0) at the 0.05 level of
significance.
Interpretation for t = 1.96: It means that β (the coef) is 1.96 standard
errors (SE) away from “0”. In statistical terms, this is a good indication that
our estimate of β (the quantification of the association between X and Y)
is large enough to be distinguishable from “0”. If the true β were “0”, an
estimate at least 1.96 SEs away from “0” (in either direction) would occur in
only about 5% of samples, which is why we reject the null hypothesis at
the 0.05 level.
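The t statistic can be recomputed by hand from a fitted model. A minimal sketch in Stata, assuming the dataset is already loaded and that income and age are stored as inctot and age (the variable names used in the regression command later in this handout):

```stata
* Simple regression of income on age
regress inctot age

* t = coef/SE, using the stored coefficient and standard error for age
display _b[age] / _se[age]

* Two-sided p-value for H0: β = 0, using the residual degrees of freedom
display 2 * ttail(e(df_r), abs(_b[age] / _se[age]))
```

The two displayed values should match the t and P>|t| columns that Stata reports for age in the regression table.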
y = α + β₁X + β₂Z
or,
income = α + β₁(age) + β₂(female)
What our new linear model is telling us is that the income of individuals is a function
of (i.e., depends on) their age and their gender.
o By “statistical control” we mean that, in this case, the association between
income (Y) and age (X) is independent of the gender (Z) of the individual (it is
estimated holding gender constant), and that the association between income (Y)
and gender (Z) is independent of the age (X) of the individual (it is estimated
holding age constant).
o By “independent” we mean that the estimated difference in income between males
and females (β₂) is not driven by differences in age between males and
females: it compares males and females of the same age. At the same time,
the estimated change in income as people age (β₁) is not driven by
differences in income and age between males and females: it compares people
of the same gender.
o That’s what a multiple regression does: It separates (isolates) the
associations between Y and each of the independent variables, and
quantifies their independent associations in the form of a coefficient
(the βs).
This is how you run the multiple regression model income = α + β₁(age) + β₂(female)
in Stata:
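A minimal sketch of the command, assuming the variables are named inctot, age, and female (coded 1 = female, 0 = male), as in the other commands and answers in this handout:

```stata
* Multiple regression of income on age and gender
regress inctot age female
```

The coefficient on age corresponds to β₁ and the coefficient on female to β₂.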
o Answer:
Because the variable is coded 1=female, 0=male, a 1-unit change is
the same as going from males to females.
Accordingly, the coef. for gender is the average difference between males
and females of the same age.
Because the coef. for being a female is negative (β₂ = −2202), we
can say that females show an income that is $2202 lower than that of
males, holding age constant.
In other words: The difference in income between males and
females is $2202, on average.
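This interpretation follows directly from the model. Holding age fixed and switching female from 0 to 1, the intercept and the age term cancel, and only β₂ remains:

```latex
\begin{aligned}
\widehat{\text{income}}(\text{age},\ \text{female}=1) - \widehat{\text{income}}(\text{age},\ \text{female}=0)
  &= (\alpha + \beta_1\,\text{age} + \beta_2) - (\alpha + \beta_1\,\text{age}) \\
  &= \beta_2 = -2202
\end{aligned}
```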
Let’s run a quick visualization (not from the model but from the observed data) of the income
differences by gender as individuals age:
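One way to draw such a plot from the observed data is to average income within each age and gender group and plot the two series. This is only a sketch under assumptions: it is not necessarily the plot used in the session, it assumes the variable names inctot, age, and female (1 = female, 0 = male), and it is meant to be run from a do-file because of the /// line continuations.

```stata
* Work on a temporary copy so the full dataset is restored afterwards
preserve

* Mean observed income for each combination of age and gender
collapse (mean) inctot, by(age female)

* One line per gender: mean income against age
twoway (line inctot age if female == 0, sort) ///
       (line inctot age if female == 1, sort), ///
       legend(order(1 "Male" 2 "Female")) ///
       xtitle("Age") ytitle("Mean income")

restore
```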
Imagine that you also wonder whether the differences in income as people age are related to the fact
that there are differences in age by race/ethnicity and, within each race/ethnicity, between
males and females. What to do? Yup: run a multiple regression model, now including “race”.
reg inctot age female i.race
o Q: Can you interpret the coefficient of race3? Asian individuals show an income that is $478
lower than that of white individuals, on average, holding age and gender constant.
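To see why each race coefficient is a comparison with the omitted category, the model can be written with one indicator per non-baseline category. The symbols γ_k and the indicator names race_k are notation introduced here for illustration, and the baseline is taken to be white, as the answer above implies:

```latex
\begin{aligned}
\text{income} &= \alpha + \beta_1(\text{age}) + \beta_2(\text{female})
  + \sum_{k \neq \text{white}} \gamma_k\,\text{race}_k \\
\gamma_3 &= \widehat{\text{income}}(\text{age},\ \text{female},\ \text{race}=3)
  - \widehat{\text{income}}(\text{age},\ \text{female},\ \text{race}=\text{white})
\end{aligned}
```

With age and gender held fixed, moving from the baseline category to category 3 changes predicted income by γ₃, which is the coefficient Stata reports for race3.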