
Lecture 10 Spring 2017

Poisson regression is used to model count data that follows a Poisson distribution. It assumes the mean and variance of the count are equal. The model relates the log of the mean to predictors using a log link function. Maximum likelihood is used to estimate the regression coefficients. The coefficients are asymptotically normally distributed and can be used to perform statistical inference. The model can be extended to allow for overdispersion by adding a dispersion parameter.

Uploaded by

Madina

Poisson regression

Count data

Examples of count data:

- Number of hospitalizations over a period of time
- Number of passengers in a bus station
- Number of blood cells in a blood sample
- Number of typos in a book
- ...

Example: tortoise species data
The Galapagos Islands off the coast of Ecuador are great
locations for studying the factors that influence the development
and survival of different species. For each island, the data set
provides counts of the total number of tortoise species and of the
number of species that occur only on that one island (the endemics)
(Johnson and Raven, 1973).

Example: tortoise species data

This data set also contains the following geographic variables:

- Area: area in square km;
- Elevation: elevation in meters;
- Nearest: distance from nearest island;
- Scruz: distance from Santa Cruz (which is near the center of the Galapagos);
- Adjacent: area of adjacent island in square km.

Poisson distribution for count data

- The Poisson distribution can be defined via a counting process with the following properties:
  1. The expected number of events occurring in an interval of time is proportional to the length of the interval.
  2. The probability that two or more events occur in an infinitesimally small interval is 0.
  3. The numbers of events occurring in disjoint intervals are independent.
- The Poisson distribution is a good approximation to the Binomial distribution when the number of trials is large and the success probability is small.
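The Binomial approximation can be checked numerically. The sketch below compares the Binomial(n, p) pmf with the Poisson pmf with mean np; the values n = 1000 and p = 0.003 and the helper names are arbitrary choices for illustration, not part of the lecture.

```python
import math

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(K = k) for K ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Illustrative values only: many trials, small success probability.
n, p = 1000, 0.003
mu = n * p

# Largest pointwise gap between the two pmfs over k = 0, ..., 20.
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(21))
print(max_gap)
```

The gap shrinks as n grows with np held fixed, which is the regime described above.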
Poisson regression

Assume that the response $Y_i$ is a count, taking values $0, 1, 2, \ldots$. The distribution of $Y_i$ may be modelled by the Poisson distribution with mean $\mu_i$. That is,

$$Y_i \sim \mathrm{Poisson}(\mu_i),$$

which has the pmf $f(y) = \exp(-\mu)\,\mu^{y}/y!$ for $y = 0, 1, 2, \ldots$, where $\mu > 0$.

Link function

One common link function for Poisson regression is the log function. That is,

$$\log(\mu_i) = X_i^T \beta,$$

where $X_i$ is a $p$-dimensional predictor vector and $\beta$ is a $p$-dimensional vector of unknown coefficients. The link function implies that

$$\mu_i = \exp(X_i^T \beta).$$
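One consequence of the log link is that the mean is positive for any real-valued linear predictor, and a one-unit increase in a predictor multiplies the mean by the exponential of its coefficient. A minimal sketch, assuming hypothetical coefficient values and a helper name (`poisson_mean`) that does not come from the lecture:

```python
import math

def poisson_mean(x, beta):
    """Mean under the log link: mu = exp(x^T beta)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x^T beta
    return math.exp(eta)

beta = [0.5, 0.2]                        # hypothetical (intercept, slope)
mu_a = poisson_mean([1.0, 3.0], beta)
mu_b = poisson_mean([1.0, 4.0], beta)    # second predictor increased by one unit

print(mu_a, mu_b)
print(mu_b / mu_a, math.exp(beta[1]))    # the two numbers agree
```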

Maximum likelihood estimator

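The log-likelihood function of $\beta$ is

$$
\begin{aligned}
\ell(\beta) &= \log\Big\{\prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{Y_i}}{Y_i!}\Big\}
= \sum_{i=1}^{n} Y_i \log(\mu_i) - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log(Y_i!) \\
&= \sum_{i=1}^{n} Y_i X_i^T \beta - \sum_{i=1}^{n} \exp(X_i^T \beta) - \sum_{i=1}^{n} \log(Y_i!).
\end{aligned}
$$

Since the last sum does not depend on $\beta$, the MLE for $\beta$ is

$$\hat\beta = \arg\max_{\beta} \Big[\sum_{i=1}^{n} Y_i X_i^T \beta - \sum_{i=1}^{n} \exp(X_i^T \beta)\Big].$$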
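The log-likelihood above translates directly into code. The sketch below evaluates it at two candidate coefficient vectors; the data are toy numbers invented for illustration, and the function name is an assumption.

```python
import math

def poisson_loglik(beta, X, y):
    """l(beta) = sum_i { y_i x_i^T beta - exp(x_i^T beta) - log(y_i!) }."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))   # x_i^T beta
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)  # lgamma(y+1) = log(y!)
    return total

# Toy data invented for illustration: intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1, 2, 4, 6, 12]

ll_null = poisson_loglik([0.0, 0.0], X, y)
ll_alt = poisson_loglik([0.1, 0.6], X, y)
print(ll_null, ll_alt)   # the larger value indicates the better-supported beta
```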

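Score function and Hessian matrix

- The score function is
$$\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^{n} \{Y_i - \exp(X_i^T \beta)\} X_i.$$
- The MLE $\hat\beta$ is a solution of $\partial \ell(\beta)/\partial \beta = 0$.
- The Hessian matrix is
$$\frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} = -\sum_{i=1}^{n} X_i X_i^T \exp(X_i^T \beta) = -X^T V X,$$
where $X = (X_1, \ldots, X_n)^T$ is an $n \times p$ design matrix and $V = \mathrm{diag}\{\exp(X_1^T \beta), \ldots, \exp(X_n^T \beta)\}$.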
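The score equation has no closed-form solution in general. The slides do not name a solver; one standard choice is Newton-Raphson, which updates beta by subtracting the inverse Hessian times the score. The sketch below assumes p = 2 (an intercept in the first column plus one predictor) so that the 2x2 inverse can be written out by hand, and it uses toy data invented for illustration.

```python
import math

def score_and_hessian(beta, X, y):
    """Score vector and Hessian of the Poisson log-likelihood for p = 2."""
    s = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(X, y):
        mu = math.exp(xi[0] * beta[0] + xi[1] * beta[1])
        for j in range(2):
            s[j] += (yi - mu) * xi[j]            # sum (y_i - mu_i) x_ij
            for k in range(2):
                H[j][k] -= mu * xi[j] * xi[k]    # minus sum mu_i x_ij x_ik
    return s, H

def newton_poisson(X, y, n_iter=25):
    """Newton-Raphson iterations beta <- beta - H^{-1} s."""
    beta = [math.log(sum(y) / len(y)), 0.0]      # start at the intercept-only fit
    for _ in range(n_iter):
        s, H = score_and_hessian(beta, X, y)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step0 = (H[1][1] * s[0] - H[0][1] * s[1]) / det   # first entry of H^{-1} s
        step1 = (-H[1][0] * s[0] + H[0][0] * s[1]) / det  # second entry of H^{-1} s
        beta = [beta[0] - step0, beta[1] - step1]
    return beta

# Toy data invented for illustration.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1, 2, 4, 6, 12]

beta_hat = newton_poisson(X, y)
final_score, _ = score_and_hessian(beta_hat, X, y)
print(beta_hat, final_score)   # the score should be numerically zero at beta_hat
```

Because the Hessian is negative definite whenever the design matrix has full column rank, the log-likelihood is concave and a solution of the score equation is the maximizer.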

Asymptotic normality of β̂

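Applying the large-sample theory of the maximum likelihood estimator $\hat\beta$, we have, approximately for large $n$,

$$\hat\beta - \beta \sim N(0, (X^T V X)^{-1}).$$

Wald-type inference for $\beta$ can be based on this asymptotic normality, with $V$ evaluated at $\hat\beta$ in practice.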
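As an illustration of Wald-type inference, the sketch below computes standard errors, z statistics, two-sided p-values and 95% intervals from the inverse of X^T V X for p = 2. The design matrix is a toy example and `beta_hat` holds placeholder values standing in for a fitted MLE; neither comes from the lecture.

```python
import math

def wald_table(beta_hat, X):
    """Standard errors, z statistics and two-sided p-values for p = 2."""
    a = b = d = 0.0                       # entries of the symmetric matrix X^T V X
    for x0, x1 in X:
        mu = math.exp(x0 * beta_hat[0] + x1 * beta_hat[1])   # V_ii at beta_hat
        a += mu * x0 * x0
        b += mu * x0 * x1
        d += mu * x1 * x1
    det = a * d - b * b
    se = [math.sqrt(d / det), math.sqrt(a / det)]   # sqrt of diag of the inverse
    z = [bh / s for bh, s in zip(beta_hat, se)]     # Wald statistic for H0: beta_j = 0
    pval = [math.erfc(abs(zj) / math.sqrt(2.0)) for zj in z]  # 2 * P(N(0,1) > |z|)
    return se, z, pval

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]  # toy design
beta_hat = [0.15, 0.58]    # placeholder values, not an actual fit
se, z, pval = wald_table(beta_hat, X)
for j in range(2):
    lower, upper = beta_hat[j] - 1.96 * se[j], beta_hat[j] + 1.96 * se[j]
    print(j, se[j], z[j], pval[j], (lower, upper))
```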

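Deviance

- The log-likelihood for the saturated model, in which each $\mu_i$ is estimated by $Y_i$, is
$$\ell_{\mathrm{sat}} = \sum_{i=1}^{n} \{Y_i \log(Y_i) - Y_i\} + \text{Const}.$$
- The log-likelihood for the regression model with $\mu_i = \exp(X_i^T \beta)$, evaluated at the MLE, is
$$\ell(\hat\beta) = \sum_{i=1}^{n} \{Y_i \log(\hat\mu_i) - \hat\mu_i\} + \text{Const},$$
where $\hat\mu_i = \exp(X_i^T \hat\beta)$ and $\hat\beta$ is the MLE of $\beta$.
- The deviance is then defined as $D = 2\{\ell_{\mathrm{sat}} - \ell(\hat\beta)\}$, that is,
$$D = 2 \sum_{i=1}^{n} \{Y_i \log(Y_i/\hat\mu_i) - (Y_i - \hat\mu_i)\}.$$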
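The deviance formula can be computed directly from the observed counts and fitted means. The sketch below uses the usual convention that $Y_i \log(Y_i)$ is taken as 0 when $Y_i = 0$, which the slide does not state explicitly; the counts and fitted means are hypothetical values for illustration.

```python
import math

def poisson_deviance(y, mu_hat):
    """D = 2 * sum{ y log(y / mu) - (y - mu) }, with 0 * log(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

y = [0, 2, 5]                 # hypothetical counts
mu_hat = [0.5, 2.0, 4.0]      # hypothetical fitted means
D = poisson_deviance(y, mu_hat)
print(D)
```

Each term of the sum is non-negative and equals 0 only when the fitted mean matches the observation, so a larger deviance indicates a worse fit.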
Some remarks

- Likelihood-ratio-type inference can be conducted based on the deviance.
- The analysis of deviance can be carried out as in the logistic regression model.
- Model diagnostics and residual plots can also be done in the same way as for the logistic regression model.

Over or under dispersion

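- In the Poisson regression model, we assume that
$$E(Y_i) = \mathrm{Var}(Y_i) = \mu_i.$$
Because the mean and variance are forced to be equal, the model may not be flexible enough in practice.
- A generalization of the Poisson regression model is
$$E(Y_i) = \mu_i \quad \text{and} \quad \mathrm{Var}(Y_i) = \phi \mu_i,$$
where $\phi$ is the dispersion parameter; $\phi > 1$ corresponds to overdispersion and $\phi < 1$ to underdispersion.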
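The equal mean and variance property can be verified numerically from the pmf, and a crude first look at dispersion in a sample is the ratio of sample variance to sample mean. In the sketch below, the value of `mu` and the toy counts are invented for illustration; the ratio ignores covariates, so it is only a rough screen and not the estimator of the dispersion parameter used in the lecture.

```python
import math

def poisson_pmf(k, mu):
    """Poisson pmf computed on the log scale to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

mu = 3.5                                   # arbitrary illustrative mean
ks = range(200)                            # the tail beyond 200 is negligible here
mean = sum(k * poisson_pmf(k, mu) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)
print(mean, var)                           # both are numerically equal to mu

counts = [0, 1, 0, 7, 2, 0, 12, 1, 0, 3]   # toy counts invented for illustration
m = sum(counts) / len(counts)
s2 = sum((c - m) ** 2 for c in counts) / (len(counts) - 1)
ratio = s2 / m
print(ratio)   # a ratio well above 1 hints at overdispersion
```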

Quasi-likelihood

- Similar to the logistic regression model, the quasi log-likelihood for $\beta$ can be defined as
$$Q(\beta) = \sum_{i=1}^{n} \int_{Y_i}^{\mu_i} \frac{Y_i - \mu}{\phi V(\mu)}\, d\mu,$$
where $V(\mu) = \mu$ and $\mu_i = \exp(X_i^T \beta)$.
- The estimate of $\beta$ is the same as in the usual Poisson regression without a dispersion parameter.
- The asymptotic distribution of $\hat\beta$ is $\hat\beta - \beta \sim N(0, \phi (X^T V X)^{-1})$.
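To see why the estimate of $\beta$ is unchanged, it may help to evaluate the integral with $V(\mu) = \mu$ and differentiate. The following is a worked sketch of that step, using $\partial \mu_i / \partial \beta = \mu_i X_i$ and taking $Y_i \log(\mu_i / Y_i)$ as 0 when $Y_i = 0$:

$$
\begin{aligned}
Q(\beta) &= \frac{1}{\phi} \sum_{i=1}^{n} \int_{Y_i}^{\mu_i} \Big(\frac{Y_i}{\mu} - 1\Big)\, d\mu
= \frac{1}{\phi} \sum_{i=1}^{n} \{Y_i \log(\mu_i / Y_i) - (\mu_i - Y_i)\}, \\
\frac{\partial Q(\beta)}{\partial \beta} &= \frac{1}{\phi} \sum_{i=1}^{n} \Big(\frac{Y_i}{\mu_i} - 1\Big) \mu_i X_i
= \frac{1}{\phi} \sum_{i=1}^{n} \{Y_i - \exp(X_i^T \beta)\} X_i.
\end{aligned}
$$

The derivative is the Poisson score function divided by $\phi$, so setting it to zero gives the same $\hat\beta$ for any $\phi > 0$; only the variance of $\hat\beta$ is rescaled. Evaluated at $\hat\beta$, the first line also gives $Q(\hat\beta) = -D/(2\phi)$, where $D$ is the deviance.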

Estimation of dispersion parameter

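The dispersion parameter $\phi$ can be estimated by the Pearson chi-square statistic divided by its degrees of freedom,

$$\hat\phi = \frac{\sum_{i=1}^{n} (Y_i - \hat\mu_i)^2 / \hat\mu_i}{n - p},$$

where $\hat\mu_i = \exp(X_i^T \hat\beta)$.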
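A minimal sketch of this estimator follows. The counts, fitted means and the function name are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def pearson_dispersion(y, mu_hat, p):
    """phi_hat = sum{ (y - mu)^2 / mu } / (n - p)."""
    n = len(y)
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))  # Pearson chi-square
    return chi2 / (n - p)

y = [0, 3, 1, 8, 2]                    # hypothetical counts
mu_hat = [1.0, 2.0, 2.0, 5.0, 4.0]     # hypothetical fitted means
phi_hat = pearson_dispersion(y, mu_hat, p=2)
print(phi_hat)                         # about 1.6 for these numbers

# Under the quasi-likelihood model, Poisson standard errors are scaled by sqrt(phi_hat).
se_scale = math.sqrt(phi_hat)
print(se_scale)
```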
