Shorten - Count Data Analysis
By Lemma Derseh
Presentation outline
Poisson regression
• The Poisson is different from the binomial, Bin(n, π), which takes on
values only up to some fixed n and leads to a proportion (out of n); a Poisson count has no fixed upper limit
• We assume that the covariates influence the mean of the counts (μ)
multiplicatively: as a covariate increases by 1 unit, the log of the
mean increases by β units, which implies that the mean is multiplied
by a “fold change” or “scale factor” of exp(β).
• The log link is the canonical link for the Poisson distribution in a GLM
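The multiplicative interpretation of the log link can be checked numerically. The sketch below is illustrative only: the intercept `b0`, slope `b1`, and covariate values are hypothetical numbers chosen to show the algebra, not estimates from any model in these slides. It also evaluates the Poisson probability mass function, which stays positive for arbitrarily large counts (unlike Bin(n, π), which is zero above n).

```python
import math


def poisson_mean(intercept: float, beta: float, x: float) -> float:
    """Mean under a Poisson log-linear model: log(mu) = intercept + beta * x."""
    return math.exp(intercept + beta * x)


def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson count with mean mu; y can be any nonnegative integer."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


# Hypothetical coefficients, used only to illustrate the log link.
b0, b1 = 0.5, 0.2
mu_at_2 = poisson_mean(b0, b1, 2.0)
mu_at_3 = poisson_mean(b0, b1, 3.0)

# A one-unit increase in x adds b1 to log(mu) (equal to b1 up to floating-point error) ...
print(math.log(mu_at_3) - math.log(mu_at_2))
# ... which multiplies mu by exp(b1), regardless of the starting value of x.
print(mu_at_3 / mu_at_2, math.exp(b1))
```

Because the change on the log scale is additive, the ratio of means does not depend on where x starts, which is why exp(β) can be reported as a single "fold change".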
Interpretation of the log-linear model
That is, the expected number of MIs per month increases by 2.6% for each
one-year increase in age
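Reading the 2.6% figure as 100(exp(β) − 1) for the age coefficient (an assumption, since the fitted coefficient itself is not shown in this section), the arithmetic below converts it back to the log scale and shows that the effect compounds multiplicatively over several years:

```latex
\begin{aligned}
100\,(e^{\beta_{\text{age}}} - 1) &= 2.6
  \;\Longrightarrow\; e^{\beta_{\text{age}}} = 1.026
  \;\Longrightarrow\; \beta_{\text{age}} = \ln(1.026) \approx 0.0257,\\
\frac{\mu(\text{age} + 10)}{\mu(\text{age})} &= e^{10\,\beta_{\text{age}}} = 1.026^{10} \approx 1.29 .
\end{aligned}
```

So a 10-year difference in age corresponds to roughly 29% more expected MIs, not 10 × 2.6% = 26%.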
Interpretation
(Stata Oriented)
Data: We use data from the Stata log provided by Long (1990) on the number
of publications produced by Ph.D. biochemists to illustrate the application of
the models mentioned. The variables in the data set are art: articles published in the last
three years of the Ph.D.; fem: coded one for females; mar: coded one if married;
kid5: number of children under age five; phd: prestige of the Ph.D. program; and
ment: articles published by the mentor in the last three years.
Data assessment
• Running sum art in Stata shows that the
mean number of articles is 1.69 and the variance is 3.71, more than twice
the mean, which suggests overdispersion relative to the Poisson assumption
that the variance equals the mean
We will also store the estimates for later use with the
command: estimates store poisson
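A quick way to quantify the mean–variance comparison is the dispersion ratio (sample variance divided by sample mean), which is close to 1 for Poisson-like counts. The sketch below applies it to the summary values quoted above and, separately, to a short hypothetical list of counts (not the Long data) to show the helper in use; `dispersion_ratio` is a name introduced here for illustration.

```python
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; near 1 for Poisson-like counts."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # n - 1 denominator, as in Stata's summarize
    return var / mean


# Summary values reported by `sum art`: variance is roughly 2.2 times the mean.
print(3.71 / 1.69)

# Hypothetical toy counts, only to demonstrate the helper.
toy = [0, 0, 0, 1, 1, 2, 3, 5, 7]
print(dispersion_ratio(toy))
```

A ratio well above 1 is a signal to consider overdispersed or zero-inflated alternatives rather than a formal test on its own.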
A Poisson Model
. glm art fem mar kid5 phd ment, family(poisson) nolog
Log likelihood = -1651.056316    AIC = 3.621981    BIC = -4564.031
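Stata's glm reports AIC scaled by the sample size, (−2 lnL + 2k)/N. The sketch below reproduces the reported value from the log likelihood. It assumes N = 915 biochemists in Long's data and k = 6 parameters (five covariates plus the constant); neither number is printed in the output above, so treat them as assumptions.

```python
def glm_aic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """AIC as reported by Stata's glm: (-2*lnL + 2*k) / N."""
    return (-2.0 * log_likelihood + 2.0 * n_params) / n_obs


# Assumptions: N = 915 observations, k = 6 estimated parameters.
aic = glm_aic(-1651.056316, n_params=6, n_obs=915)
print(round(aic, 6))  # 3.621981, the AIC shown in the glm output
```

Because this AIC is divided by N, it is comparable only across models fitted to the same observations.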
Zero-Inflated Poisson
------------------------------------------------------------------------------
             |                 OIM
         art |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.2091446   .0634047    -3.30   0.001    -.3334155   -.0848737
         mar |    .103751    .071111     1.46   0.145     -.035624     .243126
        kid5 |  -.1433196   .0474293    -3.02   0.003    -.2362793   -.0503599
         phd |  -.0061662   .0310086    -0.20   0.842     -.066942    .0546096
        ment |   .0180977   .0022948     7.89   0.000     .0135999    .0225955
       _cons |    .640839   .1213072     5.28   0.000     .4030814    .8785967
-------------+----------------------------------------------------------------
inflate      |
         fem |   .1097465   .2800813     0.39   0.695    -.4392028    .6586958
         mar |  -.3540107   .3176103    -1.11   0.265    -.9765155    .2684941
        kid5 |   .2171001    .196481     1.10   0.269    -.1679956    .6021958
         phd |   .0012702   .1452639     0.01   0.993    -.2834418    .2859821
        ment |   -.134111   .0452461    -2.96   0.003    -.2227918   -.0454302
       _cons |  -.5770618   .5093853    -1.13   0.257    -1.575439     .421315
------------------------------------------------------------------------------
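The z statistics and 95% intervals in the coefficient table above are Wald quantities: z = coef / SE and coef ± 1.959964 × SE. The sketch below recomputes them for the count ("art") equation and also exponentiates each coefficient, giving the multiplicative factor on the expected count for those not in the "always zero" class. The helper name `wald_summary` is introduced here for illustration; the numbers are copied from the table.

```python
import math

Z_975 = 1.959964  # standard normal 97.5% quantile, used for 95% Wald intervals


def wald_summary(coef: float, se: float):
    """Return the z statistic, 95% Wald interval, and exp(coef) for one table row."""
    z = coef / se
    lo, hi = coef - Z_975 * se, coef + Z_975 * se
    return z, (lo, hi), math.exp(coef)


# Count-equation coefficients and standard errors from the table.
count_eq = {
    "fem": (-0.2091446, 0.0634047),
    "mar": (0.103751, 0.071111),
    "kid5": (-0.1433196, 0.0474293),
    "phd": (-0.0061662, 0.0310086),
    "ment": (0.0180977, 0.0022948),
    "_cons": (0.640839, 0.1213072),
}

for name, (b, se) in count_eq.items():
    z, (lo, hi), factor = wald_summary(b, se)
    print(f"{name:>5}: z={z:6.2f}  CI=({lo:.7f}, {hi:.7f})  exp(b)={factor:.3f}")
```

For example, exp(−0.209) ≈ 0.81 for fem: among those in the count process, women are expected to publish about 19% fewer articles, holding the other covariates fixed.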
• Looking at the inflate equation, we see that the only significant
predictor (at the 5% level) of being in the “always zero” class is the number of
articles published by the mentor: each additional article by the mentor is
associated with about 12.6% lower odds of never publishing (exp(-0.134) ≈ 0.874)
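The inflate equation is a logit model (Stata's default for zip), so its coefficients are on the log-odds scale and must be exponentiated before reading them as percentages. The sketch below converts the ment coefficient and also shows how a zero-inflated Poisson combines the two sources of zeros, P(Y = 0) = π + (1 − π)e^(−μ). The helper names and the inputs π = 0.10, μ = 1.5 are hypothetical, chosen only to illustrate the formula.

```python
import math


def odds_ratio_pct_change(logit_coef: float) -> float:
    """Percent change in the odds for a one-unit increase, from a logit coefficient."""
    return 100.0 * (math.exp(logit_coef) - 1.0)


def zip_prob_zero(pi_always_zero: float, mu: float) -> float:
    """P(Y = 0) in a zero-inflated Poisson: structural zeros plus Poisson zeros."""
    return pi_always_zero + (1.0 - pi_always_zero) * math.exp(-mu)


# 'ment' coefficient from the inflate equation.
print(round(odds_ratio_pct_change(-0.134111), 1))  # -12.6

# Hypothetical inputs, only to show how the two processes combine.
print(zip_prob_zero(0.10, 1.5))
```

Reading −0.134 directly as "13.4%" only approximates the odds change for small coefficients; the exponentiated value gives the exact multiplicative effect.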