Poisson Regression
Edps/Psych/Soc 589
Carolyn J. Anderson
Outline
GLMs for count data.
Poisson regression for counts.
Poisson regression for rates.
Inference and model checking.
Wald, Likelihood ratio, & Score test.
Checking Poisson regression.
Residuals.
Confidence intervals for fitted values (means).
Overdispersion.
Fitting GLMs (a little technical).
Newton-Raphson algorithm/Fisher scoring.
Statistical inference & the likelihood function.
“Deviance”.
Summary
C.J. Anderson (Illinois) Poisson Regression 2.1/ 59
Outline Poisson regression for counts Crab data SAS/R Poisson regression for rates Lung cancer SAS/R
More Examples. . .
Number of presidential appointments to the Supreme Court (King,
1987).
Number of children in a classroom that a child lists as being their
friend (unlimited nomination procedure, sociometric data).
Number of hard disk failures at UIUC during a year.
Number of deaths due to SARS (Yu, Chan & Fung, 2006).
Number of arrests resulting from 911 calls.
Number of orders of protection issued.
In some of these examples, we should consider “exposure” to the event,
i.e., “t”.
e.g., hard disk failures: here “exposure” could be the number of
hours of operation. Rather than model the number of failures (i.e., counts),
we would want to measure and model the failure “rate”:
Y/t = rate
log(µ) = α + βx
Since the log of the expected value of Y is a linear function of the explanatory
variable(s), the expected value of Y is a multiplicative function of x:
$\mu = \exp(\alpha + \beta x) = e^{\alpha} e^{\beta x}$
Interpretation of β
log(µ) = α + βx
Consider 2 values of x (x1 & x2 ) such that the difference between them
equals 1. For example, x1 = 10 and x2 = 11:
x2 = x1 + 1
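$\mu_1 = e^{\alpha} e^{\beta x_1} = e^{\alpha} e^{\beta(10)}$
$\mu_2 = e^{\alpha} e^{\beta x_2} = e^{\alpha} e^{\beta(x_1 + 1)} = e^{\alpha} e^{\beta x_1} e^{\beta} = e^{\alpha} e^{\beta(10)} e^{\beta}$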
Interpretation of β (continued)
When we look at a 1 unit increase in the explanatory variable (i.e.,
x2 − x1 = 1), we have
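If β = 0, then $e^{0} = 1$ and
$\mu_1 = e^{\alpha}$,
$\mu_2 = e^{\alpha}$.
µ = E(Y) is not related to x.
If β > 0, then $e^{\beta} > 1$ and
$\mu_1 = e^{\alpha} e^{\beta x_1}$,
$\mu_2 = e^{\alpha} e^{\beta x_2} = e^{\alpha} e^{\beta x_1} e^{\beta} = \mu_1 e^{\beta}$.
µ2 is $e^{\beta}$ times as large as µ1, so µ2 > µ1.
If β < 0, then $0 < e^{\beta} < 1$ and
$\mu_1 = e^{\alpha} e^{\beta x_1}$,
$\mu_2 = e^{\alpha} e^{\beta x_2} = e^{\alpha} e^{\beta x_1} e^{\beta} = \mu_1 e^{\beta}$.
µ2 is µ1 multiplied by $e^{\beta}$, so µ2 < µ1.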
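The multiplicative interpretation can be checked numerically. The sketch below uses hypothetical values of α and β (they are not estimates from any data in these notes) and shows that the ratio of expected counts for a 1-unit increase in x equals exp(β).

```r
# Hypothetical parameter values, chosen only for illustration.
alpha <- 0.5
beta  <- 0.2

# Expected count under the log-link model log(mu) = alpha + beta * x.
mu <- function(x) exp(alpha + beta * x)

mu1 <- mu(10)   # x1 = 10
mu2 <- mu(11)   # x2 = x1 + 1

# The ratio of the two means equals exp(beta), whatever x1 is.
ratio <- mu2 / mu1
print(c(mu1 = mu1, mu2 = mu2, ratio = ratio, exp_beta = exp(beta)))
```

Changing the sign of `beta` in the sketch gives a ratio below 1, which matches the β < 0 case.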
y_i = number of deaths
x_i = time point (quarter)

x_i   y_i      x_i   y_i
  1     0        8    18
  2     1        9    23
  3     2       10    31
  4     3       11    20
  5     1       12    25
  6     4       13    37
  7     9       14    45
R is even worse:
poi0 <- glm(count ~ month, data = aids, family = poisson(link = "identity"))
Error: no valid set of coefficients has been found: please supply starting values
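A possible way to see the contrast between the two links is sketched below, using the counts from the table of deaths by quarter. The column names `quarter` and `count` are naming choices made for this sketch (the call above uses `month`). The identity-link attempt is wrapped in `tryCatch()` so the script keeps running whether or not that fit succeeds.

```r
# Quarterly counts of deaths, as listed in the table of deaths by quarter.
aids <- data.frame(
  quarter = 1:14,
  count   = c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
)

# Identity link: mu = alpha + beta * quarter must stay positive for every
# observation, so the fit may stop with an error. tryCatch() captures the
# outcome either way without halting the script.
fit_identity <- tryCatch(
  glm(count ~ quarter, data = aids, family = poisson(link = "identity")),
  error = function(e) conditionMessage(e)
)
if (is.character(fit_identity)) {
  message("identity link failed: ", fit_identity)
}

# Log link (the default for poisson()): mu = exp(alpha + beta * quarter) is
# positive for any alpha and beta.
fit_log <- glm(count ~ quarter, data = aids, family = poisson(link = "log"))
print(coef(fit_log))
```

The log link removes the positivity problem because the linear predictor is exponentiated before it is used as a mean.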
Pattern in residuals.
Comparison in Log-Scale
$P(Y_i = y) = \dfrac{e^{-\hat{\mu}_i} \, \hat{\mu}_i^{\,y}}{y!}$
µ̂(quarter = 3) = 1.5606
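As a small illustration, the fitted probabilities for quarter 3 can be computed from the formula and from R's built-in `dpois()`. The value 1.5606 is the fitted mean quoted above; the range `0:5` is an arbitrary choice for display.

```r
mu_hat <- 1.5606   # fitted mean for quarter 3, taken from the text
y <- 0:5           # counts to evaluate (arbitrary display range)

# Poisson probabilities computed directly from the formula ...
p_formula <- exp(-mu_hat) * mu_hat^y / factorial(y)
# ... and with the built-in Poisson probability function.
p_dpois <- dpois(y, lambda = mu_hat)

print(round(cbind(y, p_formula, p_dpois), 4))
```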
A Smoother Look
The data were collapsed into 8 groups by their width (i.e., ≤ 23.25,
23.25–24.25, 24.25–25.25, ..., > 29.25).
From the figure of collapsed data, it looks like either a linear or a log
link might work.
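The grouping step could be sketched in R as follows. Only the four crab rows that are listed in these notes are used, so the summaries are illustrative and do not reproduce the figure; the names `width_group` and `collapsed` are introduced here for the sketch.

```r
# Four rows copied from the crab listing in these notes; the full data set
# has more rows, so the group summaries here are only illustrative.
crab <- data.frame(
  width  = c(28.3, 22.5, 26.0, 24.8),
  satell = c(8, 0, 9, 0)
)

# Seven cut points give the 8 width groups:
# <= 23.25, 23.25-24.25, ..., 28.25-29.25, > 29.25.
breaks <- c(-Inf, seq(23.25, 29.25, by = 1), Inf)
crab$width_group <- cut(crab$width, breaks = breaks, right = TRUE)

# Mean width and mean number of satellites within each occupied group.
collapsed <- aggregate(cbind(width, satell) ~ width_group, data = crab, FUN = mean)
print(collapsed)
```

Plotting the group means of `satell` against the group means of `width`, on the raw and on the log scale, is one way to judge which link looks more reasonable.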
The estimated model with the linear link:
SAS
data crab;
  input color spine width satell weight;
  datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
...
;
run;
R: Poisson regression
crab data.txt
color spine width satell weight
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
4 3 24.8 0 2100
...
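A minimal sketch of the R fit, assuming a Poisson regression of `satell` on `width` with the log link. Only the four rows shown above are used, so the coefficients printed here are not the estimates for the full crab data.

```r
# The four rows shown above. The full file would be read with something like
# read.table("crab_data.txt", header = TRUE); that file name is an assumption.
crab <- read.table(header = TRUE, text = "
color spine width satell weight
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
4 3 24.8 0 2100
")

# Poisson regression of satellite count on width with the log link.
fit <- glm(satell ~ width, data = crab, family = poisson(link = "log"))
print(coef(fit))

# exp(beta) is the multiplicative change in the expected count per unit of width.
print(exp(coef(fit)[["width"]]))
```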
The term “− log(t)” is an adjustment term and each individual may have a
different value of t.
− log(t) is referred to as an “offset”.
As a Multiplicative Model
The Poisson log-linear regression model with a log link for rate data is
$\log(\mu/t) = \alpha + \beta x$
$\mu/t = e^{\alpha} e^{\beta x}$
$\mu = t \, e^{\alpha} e^{\beta x}$
The expected value of counts depends on both t and x, both of which are
observations (i.e., neither is a parameter of the model).
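In R, one common way to fit such a rate model is to move the adjustment term to the right-hand side, log(µ) = log(t) + α + βx, and pass log(t) through `offset()`. The sketch below uses synthetic data; the variable names and the true values α = −3 and β = 0.8 are assumptions made only for illustration.

```r
set.seed(1)                        # synthetic data, for illustration only
n        <- 200
x        <- runif(n, 0, 2)         # explanatory variable
exposure <- runif(n, 50, 500)      # plays the role of t
y        <- rpois(n, exposure * exp(-3 + 0.8 * x))

# log(mu) = log(t) + alpha + beta * x: log(t) enters with coefficient fixed at 1.
fit <- glm(y ~ x + offset(log(exposure)), family = poisson(link = "log"))
b <- unname(coef(fit))

# Fitted counts equal t * exp(alpha_hat + beta_hat * x).
mu_hat <- exposure * exp(b[1] + b[2] * x)
print(b)
print(all.equal(unname(fitted(fit)), mu_hat))
```

No coefficient is estimated for the offset, which is consistent with t being an observed quantity and not a parameter.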
Estimated Parameters
Note: Poisson regression models for rate data are related to models for
“survival times”.
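Define
$\text{Fredericia} = \begin{cases} 1 & \text{if city is Fredericia} \\ 0 & \text{other city} \end{cases}$
That is,
$\log(Y/\text{pop}) = \begin{cases} \alpha + \beta_1 + \beta_2(\text{Age}) + \beta_3(\text{Age})^2 & \text{if Fredericia} \\ \alpha + \beta_2(\text{Age}) + \beta_3(\text{Age})^2 & \text{if other city} \end{cases}$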
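Subtracting the two cases at the same age makes the role of β1 explicit. The subscripts F and O (Fredericia and other city) are notation introduced here for the sketch.

```latex
\begin{align*}
\log(Y/\text{pop})_{F} - \log(Y/\text{pop})_{O}
  &= \bigl[\alpha + \beta_1 + \beta_2(\text{Age}) + \beta_3(\text{Age})^2\bigr]
   - \bigl[\alpha + \beta_2(\text{Age}) + \beta_3(\text{Age})^2\bigr] \\
  &= \beta_1, \\
\frac{(Y/\text{pop})_{F}}{(Y/\text{pop})_{O}} &= e^{\beta_1}.
\end{align*}
```

Under this model, the rate in Fredericia is $e^{\beta_1}$ times the rate in the other cities at every age, because the age terms cancel.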
R: Data
Next Steps