10 - Linear Models
Ryan Marcus
Outline
● Course logistics
● Brief review
● Linear models: interpretation
● Linear models: analysis
Upcoming Deadlines
● HW2 due Mar 2nd (12 days)
● Midterm 1 on Mar 4th (14 days)
● Sample midterm released this week
Office Hours
● Average wait time: 3.5 minutes
● P95 wait time: 18 minutes
Is Yawning Contagious?
● Mythbusters episode: put random people into two rooms (CONTROL or TREAT), had someone yawn in the treatment room, and recorded how many people yawned.
● Effect: “no yawn” or “yawn”
Empirical P value
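A minimal sketch of an empirical p-value for the yawning experiment. It assumes a one-sided permutation test whose test statistic is the difference in yawn rates (TREAT minus CONTROL). The function name `permutation_test_diff_in_props` and the counts below are hypothetical and for illustration only; they are not the episode's recorded data.

```python
import numpy as np


def permutation_test_diff_in_props(treat, control, n_perms=10_000, seed=0):
    """One-sided permutation test: is the yawn rate higher in the treatment group?

    treat, control: arrays of 0/1 outcomes (1 = "yawn", 0 = "no yawn").
    Returns (observed difference in proportions, empirical p-value).
    """
    rng = np.random.default_rng(seed)
    treat = np.asarray(treat, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = treat.mean() - control.mean()

    # Null world: group labels are arbitrary, so shuffle the pooled outcomes
    pooled = np.concatenate([treat, control])
    n_treat = len(treat)
    null_stats = np.empty(n_perms)
    for i in range(n_perms):
        shuffled = rng.permutation(pooled)
        null_stats[i] = shuffled[:n_treat].mean() - shuffled[n_treat:].mean()

    # Empirical p-value: fraction of null-world stats at least as large as observed
    p_value = np.mean(null_stats >= observed)
    return observed, p_value


# Hypothetical counts (NOT the episode's data)
treat = np.array([1] * 10 + [0] * 24)
control = np.array([1] * 4 + [0] * 12)
obs, p = permutation_test_diff_in_props(treat, control)
print(f"observed difference = {obs:.3f}, empirical p-value = {p:.3f}")
```

The p-value here is just a fraction of simulated null-world outcomes. It is not the probability that the null hypothesis is true.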
Hypothesis Testing
● P values can be confusing.
Big improvement!
Vectorization: The Downside
● Downside to vectorization: memory usage
● Large arrays require a lot of memory
● Fix: batching!
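A sketch of batching applied to a vectorized permutation test, assuming the same two-group setup as the yawning example. Building every permutation at once would need an array of shape (n_perms, n). Processing `batch_size` permutations at a time keeps peak memory near (batch_size, n). The name `batched_null_stats` and the `argsort`-of-random-values shuffling trick are choices made for this sketch, not the lecture's code.

```python
import numpy as np


def batched_null_stats(pooled, n_treat, n_perms, batch_size=1_000, seed=0):
    """Vectorized permutation null distribution, computed in batches.

    A fully vectorized version builds an (n_perms, n) array at once;
    batching caps the largest temporary array at (batch_size, n).
    """
    rng = np.random.default_rng(seed)
    pooled = np.asarray(pooled, dtype=float)
    n = len(pooled)
    results = []
    remaining = n_perms
    while remaining > 0:
        b = min(batch_size, remaining)
        # Each row of argsort(random values) is an independent random permutation of 0..n-1
        idx = np.argsort(rng.random((b, n)), axis=1)
        shuffled = pooled[idx]  # shape (b, n): one shuffled copy per row
        diffs = shuffled[:, :n_treat].mean(axis=1) - shuffled[:, n_treat:].mean(axis=1)
        results.append(diffs)
        remaining -= b
    return np.concatenate(results)


pooled = np.array([1] * 14 + [0] * 36)  # hypothetical 0/1 outcomes
null_stats = batched_null_stats(pooled, n_treat=34, n_perms=2_500, batch_size=1_000)
print(null_stats.shape)
```

The final batch may be smaller than `batch_size`. That is why `b = min(batch_size, remaining)` is computed on every iteration.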
[Figure: life expectancy vs. GDP per capita; labeled points include the US and Equatorial Guinea]
Seems like a clear relationship: can we model it?
LifeExp = a * GDPpc + b
How do we find a and b?
● a, the slope: on average, when GDPpc increases by 1, life expectancy increases by 0.00028 years
● b, the intercept
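One way to find a and b with numpy is to fit a least-squares line. This sketch uses `np.polyfit` with `deg=1`, which returns `[slope, intercept]`. The helper name `fit_line` and the toy numbers are hypothetical; they are not the lecture's dataset, so the fitted values will not match 0.00028.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    # np.polyfit with deg=1 minimizes the squared error and returns [slope, intercept]
    a, b = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), deg=1)
    return a, b


# Hypothetical toy data (NOT the lecture's dataset)
gdp_pc = np.array([1_000, 5_000, 10_000, 20_000, 40_000, 60_000], dtype=float)
life_exp = np.array([60.0, 66.0, 70.0, 74.0, 78.0, 80.0])

a, b = fit_line(gdp_pc, life_exp)
print(f"a (slope) = {a:.6f} years per unit of GDPpc")
print(f"b (intercept) = {b:.2f} years")
```

The slope reads exactly as on the slide: the average change in predicted life expectancy per one-unit increase in GDPpc.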
Linear Models
LifeExp = a * GDPpc + b
● How do we know these are the “right” parameters?
● The squared error loss function is convex, so any minimum we find is the global minimum
● That minimum is unique, assuming your input features are linearly independent!
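For the one-feature model, the convexity argument leads to a closed form. The derivation below is the standard least-squares one; the notation x_i for GDPpc, y_i for LifeExp, X, and w is introduced here and does not come from the slides.

```latex
L(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2

\frac{\partial L}{\partial a} = 0,\quad \frac{\partial L}{\partial b} = 0
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}

X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix},\quad
w = \begin{bmatrix} a \\ b \end{bmatrix},\quad
X^\top X\, w = X^\top y
```

X^T X is invertible exactly when the columns of X are linearly independent; here that means the x_i are not all equal. In that case the minimizer w is unique, which is the condition stated on the slide.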
Use numpy to apply our
prediction to test input data
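Applying the fitted model to new inputs is a single vectorized numpy expression, with no Python loop. In this sketch, `a` is the slope from the slide. The intercept `b` and the test inputs are hypothetical placeholders, and `predict` is an illustrative helper name.

```python
import numpy as np


def predict(a, b, x):
    # Vectorized: applies LifeExp = a * GDPpc + b to every element at once
    return a * np.asarray(x, dtype=float) + b


a = 0.00028   # slope from the slide
b = 65.0      # hypothetical intercept, for illustration only
test_gdp_pc = np.array([2_000.0, 15_000.0, 50_000.0])  # hypothetical test inputs
print(predict(a, b, test_gdp_pc))
```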
Not good!
Not really a linear relationship, so our conclusion is suspect.
[Figure: fitted line vs. data; labeled points include Luxembourg, the US, and Equatorial Guinea]
Explained Variance
Explained variance = model variance / total variance
● Total variance: variance of life expectancy before modeling, i.e. around its mean
● Variance after modeling: variance of the residuals around the fitted line
● Model variance: the part of the total variance the model accounts for (total variance minus variance after modeling)
Compute R²
→ 39% of the variance is explained by the linear relationship between GDPpc and life expectancy
✅ 😭
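A sketch of computing R² as "1 minus (variance after modeling) / (variance before modeling)". It uses sums of squared errors, the usual definition, so it also works on held-out predictions. The helper name `r_squared` and the toy data are hypothetical, not the lecture's dataset.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - (variance after modeling) / (variance before modeling)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error left after modeling
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error around the mean (no model)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT the lecture's dataset)
x = np.array([1_000, 5_000, 10_000, 20_000, 40_000, 60_000], dtype=float)
y = np.array([60.0, 66.0, 70.0, 74.0, 78.0, 80.0])
a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b
print(f"R^2 = {r_squared(y, y_hat):.3f}")
```

For a least-squares line with an intercept, evaluated on its own training data, this value equals var(y_hat) / var(y). That is the "model variance / total variance" form.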
Overfitting
[Figure: model fits labeled R² = 0.396085, R² = 0.589182, and R² = 0.521163]
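A sketch of why a higher R² alone can mislead. It assumes synthetic data whose true relationship is linear and uses polynomial fits of increasing degree as the "more flexible model". The polynomial models are nested, so training R² cannot go down as the degree grows. R² on separate held-out data typically stops improving or drops. All data, degrees, and names here are hypothetical choices for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
# Hypothetical data: the true relationship is linear plus noise
x_train = rng.uniform(-1, 1, 20)
y_train = 2 * x_train + rng.normal(0, 0.5, 20)
x_test = rng.uniform(-1, 1, 200)
y_test = 2 * x_test + rng.normal(0, 0.5, 200)

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # more flexible model as degree grows
    train_r2 = r_squared(y_train, np.polyval(coeffs, x_train))
    test_r2 = r_squared(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_r2, test_r2)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```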
Significance?
● How do we know our results are significant?
● Hypothesis testing! Test stat = coefficient
● How do we fix it?
● Collect test stats from the null world
● Null hypothesis: input/output pairing is arbitrary within each year
● Preserves increased averages over time in the null world
[Figure: Only 2015]
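A sketch of the within-year null world. It assumes the test statistic is the fitted slope and uses a two-sided empirical p-value. Life expectancies are shuffled only among countries in the same year. Each year's average therefore stays the same, while the GDPpc/life-expectancy pairing within a year becomes arbitrary. The panel data, years, and function names are hypothetical.

```python
import numpy as np


def slope(x, y):
    # Test statistic: least-squares slope (np.polyfit deg=1 returns [slope, intercept])
    return np.polyfit(x, y, deg=1)[0]


def within_group_permutation_test(x, y, groups, n_perms=1_000, seed=0):
    """Two-sided permutation test for the slope, shuffling y only within each group.

    Shuffling inside each group (e.g. each year) keeps every group's mean of y
    intact, so increases in the averages over time survive in the null world.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    observed = slope(x, y)

    group_indices = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    null_stats = np.empty(n_perms)
    for i in range(n_perms):
        y_null = y.copy()
        for idx in group_indices:
            # Arbitrary input/output pairing within this group only
            y_null[idx] = rng.permutation(y[idx])
        null_stats[i] = slope(x, y_null)

    p_value = np.mean(np.abs(null_stats) >= np.abs(observed))
    return observed, p_value


# Hypothetical panel: 30 countries observed in each of 5 years
rng = np.random.default_rng(1)
years = np.repeat(np.arange(2011, 2016), 30)
gdp_pc = rng.uniform(1_000, 60_000, size=years.size)
life_exp = 60 + 0.0003 * gdp_pc + 0.5 * (years - 2011) + rng.normal(0, 2, size=years.size)

obs, p = within_group_permutation_test(gdp_pc, life_exp, years)
print(f"slope = {obs:.6f}, empirical p-value = {p:.3f}")
```

Restricting the analysis to a single year, such as 2015, is another way to avoid mixing trends over time into the test. The trade-off is fewer data points.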