Poisson Regression
REGRESSION ANALYSIS
INTRODUCTION TO POISSON REGRESSION
PROTOTYPE EXAMPLE #1
We have data for 44 physicians working in
emergency medicine at a major hospital system.
The concern is the number of complaints each
physician received in the previous year. Why does
it differ from physician to physician? Can it be
explained by other factors? The available data
consist of the number of patient visits and four
covariates: revenue (dollars per hour), workload
at the emergency service (hours), gender
(Female/Male), and residency training in
emergency medicine (No/Yes).
Question:
Can we do Regression using the Normal
Error Regression Model?
Possible Issue:
The response, Number of Complaints, is
not on a continuous scale; it is a count and,
of course, not normally distributed.
PROTOTYPE EXAMPLE #2
Skin cancer data for different age groups in two
metropolitan areas, Minneapolis-St. Paul and Dallas-Fort
Worth: (1) Is there an age effect? If so, is it the same for
the two cities? (2) Is there a weather effect (a difference
between the two cities)? If so, is it the same for all age groups?
Skin Cancer Data
                 Minneapolis-St. Paul      Dallas-Ft. Worth
Age Group        Cases    Population       Cases    Population
15-24                1       172,675           4       181,343
25-34               16       123,063          38       146,207
35-44               30        96,216         119       121,374
45-54               71        92,051         221       111,353
55-64              102        72,159         259        83,004
65-74              130        54,722         310        55,932
75-84              133        32,185         226        29,007
85+                 40         8,328          65         7,538
Questions are regression-type
questions (about main effects or
marginal contributions and about
possible effect modifications).
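Before any modeling, the crude incidence rates (cases per 100,000 population) can be read straight off the skin cancer table. The sketch below is a Python illustration added for this purpose (the original material uses SAS); the dictionary name `SKIN_CANCER` and the function `crude_rates` are hypothetical, and the Dallas/Minneapolis ratios are unadjusted descriptive quantities, not fitted model estimates.

```python
# Crude skin cancer incidence rates computed directly from the table above.
# These are descriptive rates only, not fitted Poisson regression estimates.

SKIN_CANCER = {
    # age group: (MSP cases, MSP population, DFW cases, DFW population)
    "15-24": (1, 172675, 4, 181343),
    "25-34": (16, 123063, 38, 146207),
    "35-44": (30, 96216, 119, 121374),
    "45-54": (71, 92051, 221, 111353),
    "55-64": (102, 72159, 259, 83004),
    "65-74": (130, 54722, 310, 55932),
    "75-84": (133, 32185, 226, 29007),
    "85+": (40, 8328, 65, 7538),
}


def crude_rates(per=100_000):
    """Return {age group: (MSP rate, DFW rate, DFW/MSP rate ratio)}."""
    out = {}
    for age, (c_msp, n_msp, c_dfw, n_dfw) in SKIN_CANCER.items():
        r_msp = c_msp / n_msp * per
        r_dfw = c_dfw / n_dfw * per
        out[age] = (r_msp, r_dfw, r_dfw / r_msp)
    return out


if __name__ == "__main__":
    for age, (r_msp, r_dfw, ratio) in crude_rates().items():
        print(f"{age:>6}  MSP {r_msp:8.2f}  DFW {r_dfw:8.2f}  ratio {ratio:5.2f}")
```

In these data every Dallas/Minneapolis ratio exceeds 1 and the rates rise with age in both cities; whether the city ratio is constant across age groups is exactly the effect-modification question that the regression model addresses formally.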
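$$\ln E(Y_i) = \begin{cases} \beta_0 + \beta_1 x + \ln s_i & \text{if } X = x \\ \beta_0 + \beta_1 (x + 1) + \ln s_i & \text{if } X = x + 1 \end{cases}$$
Because $E(Y_i) = s_i \lambda_i$, subtracting the two lines cancels both the intercept and the offset $\ln s_i$, so
$$\frac{\lambda_i(X = x + 1)}{\lambda_i(X = x)} = \exp(\beta_1)$$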
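A quick numerical check of this cancellation: under the model E(Y) = s·exp(β0 + β1·x), the ratio of expected counts at X = x + 1 and X = x is exp(β1) no matter what x or the exposure s is. The coefficient values used below (β0 = -7.0, β1 = 0.3) are arbitrary placeholders, and the helper names are hypothetical.

```python
import math


def poisson_mean(beta0, beta1, x, s):
    """Expected count E(Y) = s * exp(beta0 + beta1 * x); ln(s) plays the role of the offset."""
    return s * math.exp(beta0 + beta1 * x)


def rate_ratio(beta0, beta1, x, s):
    """Ratio of expected counts at X = x + 1 versus X = x, for the same exposure s."""
    return poisson_mean(beta0, beta1, x + 1, s) / poisson_mean(beta0, beta1, x, s)


if __name__ == "__main__":
    # Placeholder coefficients; the ratio should equal exp(0.3) for every x and s.
    for x in (0.0, 2.5, 10.0):
        for s in (1.0, 500.0, 2000.0):
            print(x, s, rate_ratio(-7.0, 0.3, x, s), math.exp(0.3))
```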
The result indicates that the common perception is almost true: the
relationship between the number of complaints and lack of residency
training in emergency medicine is marginally significant (p = 0.0779).
The relative risk associated with no residency training is:
exp(0.3041) = 1.36
That is, physicians without residency training in emergency medicine
have an estimated complaint rate (per patient visit) about 36 percent
higher than those who were trained in the specialty.
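The relative risk is simply exp of the coefficient. The sketch below also backs out an approximate standard error from the reported p-value to form a 95% Wald interval; that step rests on an assumption not stated in the text, namely that p = 0.0779 is a two-sided Wald p-value, so treat the interval as illustrative only.

```python
import math
from statistics import NormalDist

beta_hat = 0.3041  # coefficient for "no residency training" (from the text)
p_value = 0.0779   # reported p-value (from the text)

# Relative risk (rate ratio) for no residency training versus training.
relative_risk = math.exp(beta_hat)

# ASSUMPTION: p is a two-sided Wald p-value, so |z| = beta / SE can be recovered from it.
z = NormalDist().inv_cdf(1 - p_value / 2)
se_approx = beta_hat / z

# Approximate 95% Wald confidence interval on the relative-risk scale.
z975 = NormalDist().inv_cdf(0.975)
ci_low = math.exp(beta_hat - z975 * se_approx)
ci_high = math.exp(beta_hat + z975 * se_approx)

if __name__ == "__main__":
    print(f"RR = {relative_risk:.2f}, SE(beta) ~ {se_approx:.4f}, 95% CI ~ ({ci_low:.2f}, {ci_high:.2f})")
```

Because p > 0.05, the interval necessarily includes 1, which matches the "marginally significant" reading.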
SAS SAMPLE
A SAS program would include these instructions:
DATA EMERGENCY;
   INPUT VISITS CASES RESIDENCY;
   LN = LOG(VISITS);   /* offset: log of the number of patient visits */
CARDS;
(Data)
;
PROC GENMOD DATA = EMERGENCY;
   CLASS RESIDENCY;
   MODEL CASES = RESIDENCY / DIST = POISSON LINK = LOG OFFSET = LN;
RUN;
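PROC GENMOD does the estimation internally. For readers who want to see what is being computed for a model with one covariate and an offset, here is a minimal Newton-Raphson sketch in Python (for the log link this coincides with Fisher scoring / IRLS). It is not GENMOD's implementation, the function name `fit_poisson_offset` is hypothetical, and the toy counts in the usage note are invented for the demonstration, not the emergency-medicine data.

```python
import math


def fit_poisson_offset(y, x, s, max_iter=50, tol=1e-10):
    """Fit ln E(Y_i) = ln(s_i) + b0 + b1 * x_i by Newton-Raphson.

    Returns (b0, b1, se_b1), with se_b1 taken from the inverse Fisher information.
    """
    b0, b1 = math.log(sum(y) / sum(s)), 0.0  # start from the pooled rate
    for _ in range(max_iter):
        # Score U = X'(y - mu) and information I = X' diag(mu) X for a 2-parameter model.
        u0 = u1 = i00 = i01 = i11 = 0.0
        for yi, xi, si in zip(y, x, s):
            mu = si * math.exp(b0 + b1 * xi)
            u0 += yi - mu
            u1 += (yi - mu) * xi
            i00 += mu
            i01 += mu * xi
            i11 += mu * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step = I^{-1} U, using the closed-form 2x2 inverse.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    se_b1 = math.sqrt(i00 / det)  # Var(b1) is the (2,2) element of I^{-1}
    return b0, b1, se_b1
```

Usage with invented toy data: `fit_poisson_offset([3, 5, 2, 8, 6, 9], [0, 0, 0, 1, 1, 1], [100, 150, 80, 120, 110, 140])`. With a single binary covariate, exp(b1) must equal the ratio of the two pooled rates (23/370 versus 10/330), which gives an easy check of the iteration.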
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
Since:
$$E(Y_i) = s_i \lambda(x_{1i}, x_{2i}, \ldots, x_{ki}) = s_i \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}) = s_i \exp\Big(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ji}\Big)$$
$$E(Y_i) = s_i \lambda(x_i) = s_i \exp(\beta_0 + \beta_1 \text{Hour}_i + \beta_2 \text{Hour}_i^2)$$
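The mean function above translates directly into code. This is a hypothetical helper, not taken from the source; the quadratic model is just the general form with the two covariates Hour and Hour², and any coefficient values passed in are placeholders.

```python
import math


def expected_count(s, beta0, betas, xs):
    """E(Y_i) = s_i * exp(beta0 + sum_j beta_j * x_ji)."""
    if len(betas) != len(xs):
        raise ValueError("need one coefficient per covariate")
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))  # linear predictor without the offset
    return s * math.exp(eta)


def expected_count_quadratic(s, beta0, beta1, beta2, hours):
    """Special case with the two covariates Hour and Hour^2."""
    return expected_count(s, beta0, [beta1, beta2], [hours, hours ** 2])
```

Note that the quadratic model is still linear in the parameters on the log scale, so it is fitted exactly like any other Poisson regression.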
TESTING FOR A GROUP OF VARIABLES
• The question is: “Does the addition of a group
of factors add significantly to the prediction of
Y over and above that achieved by other
factors?”
• The null hypothesis for this test may be stated
as: "Factors {Xi+1, Xi+2, …, Xi+m}, considered
together as a group, add no value to the prediction
of the mean of Y over and above the other factors
already included in the model." In other words,
$$H_0: \beta_{i+1} = \beta_{i+2} = \cdots = \beta_{i+m} = 0$$
• This “multiple contribution” test is often used
to test whether a similar group of variables,
such as demographic characteristics, is
important for the prediction of the mean of Y;
these variables have some trait in common.
• Another application: a collection of powers
and/or product terms. It is of interest to assess
powers and interaction effects collectively before
considering individual interaction terms in a
model. This reduces the total number of tests and
helps control the overall Type I error rate, which
may be inflated by multiple testing.
The process can also be used to test
for the contribution of a categorical
covariate which is represented by
several “dummy variables”.
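One common way to carry out this group test (assumed here; the slides do not name the statistic) is the likelihood ratio test: fit the model with and without the m extra terms and compare 2·(log-likelihood difference) with a chi-square distribution on m degrees of freedom. The helper names are hypothetical, and the chi-square tail is computed from the incomplete-gamma power series so the sketch needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function;
    adequate for moderate x, not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def group_lr_test(loglik_reduced, loglik_full, m):
    """Likelihood ratio test of H0: the m added coefficients are all zero."""
    g2 = 2.0 * (loglik_full - loglik_reduced)
    return g2, chi2_sf(g2, m)
```

For example, `group_lr_test(-100.0, -97.0, 2)` (hypothetical log-likelihoods) gives G² = 6 on 2 df. For a categorical covariate coded by dummy variables, m is the number of dummies, e.g. 7 for eight age groups.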
EXAMPLE #3
In this example, the dependent variable is the number of
cases of skin cancer. The data involve only two covariates,
age and location, both categorical. We use seven dummy
variables to represent the eight age groups (with the “85+”
age group as the baseline) and one for location (with
Minneapolis-St. Paul as the baseline).
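The dummy coding described for this example can be sketched as follows: each of the 16 cells of the skin cancer table becomes one observation with an intercept, seven age indicators (85+ as baseline), and one location indicator (Minneapolis-St. Paul as baseline). Using ln(population) as the offset is an assumption consistent with the s_i notation, not something the slide states; the function names are hypothetical.

```python
import math

AGE_GROUPS = ["15-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75-84", "85+"]
BASELINE_AGE = "85+"


def design_row(age, city_is_dallas):
    """Intercept, 7 age dummies (85+ is the baseline), 1 location dummy (MSP is the baseline)."""
    age_dummies = [1.0 if age == g else 0.0 for g in AGE_GROUPS if g != BASELINE_AGE]
    return [1.0] + age_dummies + [1.0 if city_is_dallas else 0.0]


def build_rows(table):
    """table: {age: (msp_cases, msp_pop, dfw_cases, dfw_pop)} -> list of (y, offset, row).

    ASSUMPTION: the offset is ln(population) for each age-by-city cell.
    """
    rows = []
    for age in AGE_GROUPS:
        msp_cases, msp_pop, dfw_cases, dfw_pop = table[age]
        rows.append((msp_cases, math.log(msp_pop), design_row(age, False)))
        rows.append((dfw_cases, math.log(dfw_pop), design_row(age, True)))
    return rows
```

The baseline cell (85+, Minneapolis-St. Paul) has only the intercept switched on, so each age coefficient is a log rate ratio relative to the 85+ group, and the location coefficient is the log Dallas/Minneapolis rate ratio.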
SEQUENTIAL ADJUSTMENT
In the type 3 analysis, we test the significance of
the effect of each factor added to the model
containing all other factors, as in most multiple
regression analyses; that is, we investigate the
additional contribution of the factor to the
explanation of the dependent variable. Sometimes,
however, we may be interested in a hierarchical or
sequential adjustment. For example, we may test
the linear term on its own (unadjusted) and then
the quadratic term adjusted for the linear term.
This can be achieved in PROC GENMOD by requesting
the type 1 analysis (the TYPE1 option in the MODEL
statement).
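A type 1 (sequential) analysis can be read off the deviances of a chain of nested models: each term's chi-square is the drop in deviance when it enters, adjusting only for terms entered before it. The sketch below handles single-degree-of-freedom terms, where the chi-square tail reduces to erfc. The deviance values in the usage note are hypothetical, and the function names are illustrative, not GENMOD output.

```python
import math


def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom, i.e. P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def type1_table(terms, deviances):
    """Sequential (type 1) likelihood ratio tests for single-df terms.

    deviances[0] is the intercept-only model; deviances[k] adds terms[k-1].
    Each term is tested adjusting only for the terms entered before it.
    """
    rows = []
    for k, term in enumerate(terms, start=1):
        chi_sq = deviances[k - 1] - deviances[k]  # deviance drop = LR chi-square
        rows.append((term, chi_sq, chi2_sf_1df(chi_sq)))
    return rows
```

For instance, `type1_table(["Hour", "Hour^2"], [80.0, 70.0, 68.5])` (hypothetical deviances) tests Hour unadjusted with chi-square 10.0, then Hour² adjusted for Hour with chi-square 1.5.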
OVERDISPERSION
The Poisson is a very special distribution; its
mean µ and its variance σ² are equal. If we use the
variance-to-mean ratio as a dispersion parameter,
then it is 1 in a standard Poisson model, less than
1 in an under-dispersed model, and greater than 1
in an over-dispersed model. Over-dispersion is
common in practice, and it is a serious concern:
an analysis that assumes the Poisson model often
underestimates standard errors and thus
overstates statistical significance (p-values
that are too small).
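A small simulation makes the variance-to-mean ratio concrete. Pure Poisson counts give a ratio near 1; a gamma-Poisson mixture (each count gets its own rate drawn from a gamma distribution with the same overall mean) is one standard way to generate over-dispersion, with Var(Y) = µ + µ²/shape. The mixture mechanism, the parameter values, and the function names are assumptions chosen for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def variance_to_mean(values):
    """Sample variance divided by sample mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return var / mean


def simulate(n=20000, mean=5.0, shape=2.0, seed=1):
    rng = random.Random(seed)
    poisson = [poisson_draw(mean, rng) for _ in range(n)]
    # Gamma-Poisson mixture: rate ~ Gamma(shape, scale=mean/shape), so E(rate) = mean
    # and Var(Y) = mean + mean**2 / shape, i.e. a ratio of 1 + mean/shape = 3.5 here.
    mixed = [poisson_draw(rng.gammavariate(shape, mean / shape), rng) for _ in range(n)]
    return variance_to_mean(poisson), variance_to_mean(mixed)
```

Running `simulate()` should give a ratio close to 1 for the Poisson sample and well above 1 for the mixture; fitting a Poisson model to the mixture data would understate its standard errors.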
MEASURING OVERDISPERSION
After a Poisson regression model is fitted,
dispersion is measured by the deviance or the
Pearson chi-square divided by its degrees of
freedom (number of observations minus number
of parameters); PROC GENMOD reports this ratio
in the Value/DF column. The deviance is
defined as twice the difference between the
maximum achievable log likelihood and the log
likelihood at the maximum likelihood
estimates of the regression parameters.
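Written out for the Poisson model, the definition above gives D = 2·Σ[y·ln(y/µ̂) − (y − µ̂)], and the Pearson statistic is X² = Σ(y − µ̂)²/µ̂. A minimal sketch follows (function names are hypothetical); the fitted means µ̂ would come from the fitted model.

```python
import math


def poisson_deviance(y, mu):
    """D = 2 * sum[y * ln(y / mu) - (y - mu)], with y * ln(y / mu) taken as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def pearson_chi_square(y, mu):
    """X^2 = sum (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def dispersion_ratio(statistic, n_obs, n_params):
    """Deviance or Pearson chi-square divided by its degrees of freedom."""
    return statistic / (n_obs - n_params)
```

For the emergency service example, the deviance 54.52 on 44 − 5 = 39 degrees of freedom gives `dispersion_ratio(54.52, 44, 5)`, about 1.398.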
EXAMPLE
Referring to the emergency service data set
with all four covariates, both indices are greater
than 1, indicating over-dispersion. In this example,
the sample size is 44, but five degrees of freedom
are lost to the estimation of the five regression
parameters, including the intercept, leaving 39.
Criterion            DF    Value    Value/DF
Deviance             39    54.52      1.3980
Pearson Chi-Square   39    54.42      1.3700
FITTING OVERDISPERSED MODEL
PROC GENMOD allows the specification of a
scale parameter to fit over-dispersed Poisson
regression models. Instead of a variance equal
to the mean,
Var(Y) = µ
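it allows the variance function to include a
multiplicative over-dispersion factor ϕ, either
specified by the user or estimated from the data
(for example, with the SCALE=DEVIANCE or
SCALE=PEARSON option in the MODEL statement):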
Var(Y) = ϕµ
EXAMPLE #4
Referring to the emergency service data set and
using all four covariates, fitting the “regular”
Poisson model gives the following results.
Variable Coefficient St Error z-Statistic p-Value
Intercept -8.1338 0.9220 -8.822 <0.0001
No Residency 0.2090 0.2012 1.039 0.2988
Female -0.1954 0.2182 -0.896 0.3703
Revenue 0.0016 0.0028 0.571 0.5775
Hours 0.0007 0.0004 1.750 0.0452
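Under Var(Y) = ϕµ the coefficient estimates are unchanged, but standard errors are multiplied by √ϕ, so z-statistics shrink and p-values grow. The sketch applies this usual quasi-likelihood adjustment to two rows of the table above, assuming ϕ is estimated by deviance/DF = 54.52/39; it is an illustration only and does not reproduce the PROC GENMOD output for the over-dispersed fit. Function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided normal p-value, P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def rescale_for_overdispersion(coef, se, phi):
    """Keep the coefficient, inflate its standard error by sqrt(phi), recompute z and p."""
    se_adj = se * math.sqrt(phi)
    z_adj = coef / se_adj
    return se_adj, z_adj, two_sided_p(z_adj)


if __name__ == "__main__":
    phi = 54.52 / 39  # ASSUMPTION: phi estimated by deviance / DF from the goodness-of-fit table
    for name, coef, se in [("No Residency", 0.2090, 0.2012), ("Female", -0.1954, 0.2182)]:
        se_adj, z_adj, p_adj = rescale_for_overdispersion(coef, se, phi)
        print(f"{name:<13} SE {se_adj:.4f}  z {z_adj:.3f}  p {p_adj:.4f}")
```

With ϕ = 1 the function returns the regular Poisson standard error and p-value unchanged, so the adjustment only ever makes results less significant when ϕ > 1.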