SPE Poisson Logistic Regression
Janne Pitkäniemi
Finnish Cancer Registry
Tampere University
Elapse of time and Epidemiology
Epidemiology deals with the occurrence of events (disease) in populations observed
over time.
▶ The concepts of risk and rate are used to measure the frequency with which the
event (disease) cases occur.
▶ Risk is defined as D/N, where D is the number of people who developed the
disease during a pre-specified follow-up from 0 to t and N is the number of
disease-free people at the beginning of follow-up.
▶ Rate is defined as D/Y, where Y is the amount of person-time at risk observed
when following disease-free subjects from 0 to t.
▶ Note: risk increases with t, but rate can vary depending on the length of the
follow-up period.
▶ Virtually all prospective follow-up studies include loss to follow-up
(censoring), so risk must be estimated using the appropriate methods
described in this course.
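The two definitions can be illustrated with a minimal R sketch. The five-subject cohort below is hypothetical (invented for illustration, with no censoring) and the variable names are assumptions, not part of the course data.

```r
# Hypothetical cohort: 5 disease-free subjects followed from 0 to t = 5 years
d <- c(1, 0, 1, 0, 0)            # 1 = developed the disease, 0 = did not
y <- c(2.0, 5.0, 3.5, 5.0, 5.0)  # years at risk (cases stop contributing at diagnosis)

D <- sum(d)      # number of cases
N <- length(d)   # disease-free subjects at the start of follow-up
Y <- sum(y)      # total person-years at risk

risk <- D / N    # a proportion, no unit
rate <- D / Y    # cases per person-year
c(risk = risk, rate = rate)
```

Risk is a unitless proportion, whereas the rate carries the unit "per person-year", so the two numbers are not directly comparable.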
Points to be covered
▶ Incidence rates, rate ratios and rate differences from
follow-up studies can be computed by fitting Poisson regression models.
▶ Risk ratios and differences can be computed from binary data by fitting
logistic regression models.
The Estonian Biobank cohort: survival among the elderly
Follow-up of 60 random individuals aged 75-103 at recruitment, until death (•)
or censoring (o) in April 2014 (linkage with the Estonian Causes of Death
Registry). (time-scale: calendar time).
[Figure: follow-up lines of the 60 individuals, one row per individual; y-axis: index (0 to 60), x-axis: Time (calendar time).]
The Estonian Biobank cohort: survival among the elderly
Follow-up time for 60 random individuals aged 75-103 at recruitment (time-scale:
time in study).
[Figure: follow-up lines of the 60 individuals, one row per individual; y-axis: index (0 to 60), x-axis: time in study (0 to 8).]
Events, dates and risk time
▶ Mortality as the outcome:
d: indicator for status at exit:
1: death observed
0: censored alive
▶ Dates:
doe = date of entry to follow-up, dox = date of exit (end of follow-up)
▶ Follow-up time in years:
y = (dox - doe)/365.25
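As a small sketch of the follow-up time computation, assuming doe and dox hold the dates of entry and exit as R Date vectors. The three pairs of dates are hypothetical.

```r
# Hypothetical dates of entry (doe) and exit (dox) for three subjects
doe <- as.Date(c("2005-03-01", "2006-07-15", "2008-01-10"))
dox <- as.Date(c("2010-03-01", "2014-04-01", "2009-01-10"))

# Elapsed time in days, converted to years with the average year length 365.25
days <- as.numeric(difftime(dox, doe, units = "days"))
y <- days / 365.25
round(y, 2)
```

Dividing by 365.25 averages over leap years, so a subject followed for exactly one calendar year gets a value very close to, but not exactly, 1.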
Crude overall rate computed by hand and model
Total no. cases, person-years & rate (/1000 y):
> D <- sum( d ); Y <- sum( y ); R <- D/(Y/1000)
> round( c(D=D, Y=Y, R=R), 2)
       D        Y        R
  884.00 11678.24    75.70
R implementation of rate estimation with Poisson regression:
A model with an offset term:
> m1 <- glm( D ~ 1, family=poisson, offset=log(Y) )
A model with family=poisreg (Epi package):
> glm( cbind(D, Y) ~ 1, family=poisreg )
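\[ \log L(\lambda) = \sum_{i=1}^{n} \left[ \delta_i \log(\lambda) - \lambda y_i \right] \]
Solving the score equations:
\[ \frac{\partial \log L(\lambda)}{\partial \lambda} = \sum_{i} \left( \frac{\delta_i}{\lambda} - y_i \right) = \frac{D}{\lambda} - Y = 0 \quad \text{and} \quad D - \lambda Y = 0 \;\Rightarrow\; \hat{\lambda} = \frac{D}{Y} \]
Poisson regression model with one covariate:
\[ \log\left( \frac{\mu}{y} \right) = \beta_0 + \beta_1 x_1 \]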
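The closed-form result, rate estimate = D/Y, can be checked numerically. The sketch below uses only base R and the totals printed earlier in this section (D = 884, Y = 11678.24); the object names are illustrative.

```r
# Totals reported earlier: deaths and person-years
D <- 884
Y <- 11678.24

# Intercept-only Poisson model; the offset log(Y) turns the count model into a rate model
m1 <- glm(D ~ 1, family = poisson, offset = log(Y))

rate_glm  <- unname(exp(coef(m1))) * 1000   # fitted rate per 1000 person-years
rate_hand <- D / Y * 1000                   # closed-form maximum likelihood estimate
round(c(glm = rate_glm, hand = rate_hand), 2)
```

Both numbers should agree with the crude rate of about 75.7 per 1000 person-years computed by hand.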
Comparing rates: The Thorotrast Study
▶ Cohort of seriously ill patients in Denmark on whom angiography of the brain
was performed.
▶ Exposure: contrast medium used in angiography,
1. thor = thorotrast (with ²³²Th), used 1935-50
2. ctrl = other medium (?), used 1946-63
▶ Outcome of interest: death
Tabulating rates: thorotrast vs. control
Tabulating cases, person-years & rates by group
> stat.table( contrast,
+             list( N = count(),
+                   D = sum(d),
+                   Y = sum(y),
+                   rate = ratio(d,y,1000) ) )
============================================
contrast        N       D         Y     rate
============================================
ctrl         1236  797.00  30517.56    26.12
thor          807  748.00  19243.85    38.87
============================================
Rate ratio estimation with Poisson regression
▶ Include contrast as the explanatory variable (factor).
▶ Enter person-years in the units in which you want the rates (here per 1000 years).
> m2 <- glm( cbind(d, y/1000) ~ contrast, family=poisreg(link="log") )
> round( summary(m2)$coef, 4 )[, 1:2]
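For readers without the Epi package, a base R sketch of the same kind of model fitted to the grouped totals from the tabulation (797 deaths in 30517.56 person-years for ctrl, 748 in 19243.85 for thor). It uses the offset formulation rather than poisreg, and the names tab and m are illustrative assumptions.

```r
# Grouped totals from the tabulation of cases and person-years
tab <- data.frame(
  contrast = factor(c("ctrl", "thor")),
  D = c(797, 748),              # deaths
  Y = c(30517.56, 19243.85)     # person-years
)

# Poisson model with log person-time (in units of 1000 years) as offset
m <- glm(D ~ contrast, family = poisson, offset = log(Y / 1000), data = tab)

# Exponentiated coefficients with Wald intervals:
# row 1 = rate per 1000 years in ctrl, row 2 = rate ratio thor vs ctrl
est <- exp(cbind(Estimate = coef(m), confint.default(m)))
round(est, 3)

# Hand check: ratio of the two crude rates
rr_hand <- (748 / 19243.85) / (797 / 30517.56)
```

The exponentiated intercept is the rate in the reference group and the exponentiated contrast coefficient is the rate ratio, here roughly 1.49.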
Rate difference estimation with Poisson regression
▶ The cbind(d, y) response with the poisreg family also allows additive rate
models (identity link):
> contrast <- c(0,1)
> m5 <- glm( cbind(d, y/1000) ~ contrast,
+            family=poisreg(link="identity") )
> round( ci.exp(m5, Exp=FALSE), 3 )
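The rate difference can also be approximated by hand from the grouped totals. This is a sketch of the usual Wald calculation, not the model fit above; it assumes the variance of each rate is approximately D/Y² and uses illustrative object names.

```r
# Grouped totals: deaths and person-years (in units of 1000 years)
D <- c(ctrl = 797, thor = 748)
Y <- c(ctrl = 30517.56, thor = 19243.85) / 1000

rate <- D / Y                                # rates per 1000 person-years
rd   <- unname(rate["thor"] - rate["ctrl"])  # rate difference

# Poisson approximation: Var(D/Y) is roughly D / Y^2 in each group
se <- sqrt(sum(D / Y^2))
ci <- rd + c(-1, 1) * qnorm(0.975) * se
round(c(RD = rd, lower = ci[1], upper = ci[2]), 2)
```

The point estimate is the difference of the two tabulated rates, about 12.75 per 1000 person-years.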
Binary data: Treatment success Y/N
85 diabetes patients with foot wounds:
▶ Dalteparin (Dal)
▶ Placebo (Pl)
Treatment or placebo is given to diabetes patients; the design is prospective and
the outcome is recorded as better (Y) / worse (N). Is the probability of the outcome
more than 15%? If yes, use the risk difference or the risk ratio (RR).
            Treatment group
         Dalteparin   Placebo
Better           29        20
Worse            14        22
Total            43        42
\[ \hat{p}_{Dal} = \frac{29}{43} = 67\% \qquad \hat{p}_{Pl} = \frac{20}{42} = 48\% \]
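The basic effect measures follow directly from the four cell counts. A minimal base R sketch, with illustrative object names:

```r
# Cell counts from the 2x2 table
better <- c(Dal = 29, Pl = 20)
worse  <- c(Dal = 14, Pl = 22)
n <- better + worse                  # group sizes: 43 and 42

p  <- better / n                     # risk of "Better" in each group
rd <- unname(p["Dal"] - p["Pl"])     # risk difference
rr <- unname(p["Dal"] / p["Pl"])     # risk ratio
or <- unname((better["Dal"] / worse["Dal"]) /
             (better["Pl"]  / worse["Pl"]))   # odds ratio

round(c(p, RD = rd, RR = rr, OR = or), 4)
```

With risks this far from zero the odds ratio (about 2.28) is clearly larger than the risk ratio (about 1.42).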
Binary data: Crosstabulation analysis of 2x2 table
> library(Epi)
> dlt <- rbind( c(29,14), c(20,22) )
> colnames( dlt ) <- c("Better","Worse")
> rownames( dlt ) <- c("Dal","Pl")
> twoby2( dlt )
2 by 2 table analysis:
    Better Worse    P(Better)  95% conf. interval
Dal     29    14       0.6744    0.5226    0.7967
Pl      20    22       0.4762    0.3316    0.6249

                                    95% conf. interval
             Relative Risk:  1.4163    0.9694    2.0692
         Sample Odds Ratio:  2.2786    0.9456    5.4907
Conditional MLE Odds Ratio:  2.2560    0.8675    6.0405
    Probability difference:  0.1982   -0.0110    0.3850
Binary regression estimation of odds ratio
For grouped binary data, the response is a two-column matrix with columns
(successes,failures).
> library(Epi)
> library(xtable)
> dlt <- data.frame(rbind( c(29,14),c(20,22) ))
> colnames( dlt ) <- c("Better","Worse")
> dlt$trt <- c(1,0)
> b2<-glm(cbind(Better,Worse)~trt,
+ family=binomial(link="logit"),
+ data=dlt)
> xtable(round( ci.exp( b2 ), digits=6 ))
            exp(Est.)   2.5%  97.5%
(Intercept)      0.91   0.50   1.67
trt              2.28   0.95   5.49
▶ The default parameters in logistic regression are the odds (the intercept:
20/22 = 0.9090) and the odds ratio ((29/14)/(20/22) = 2.28).
▶ This is NOT what you want, because the odds ratio is a biased estimate of the
risk ratio (recall: if p > 10%, then p/(1−p) ≉ p).
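The size of the discrepancy can be worked out from the definitions. Writing p_1 and p_0 for the risks in the treated and placebo groups (notation introduced here for illustration), the odds ratio equals the risk ratio multiplied by a factor that is close to 1 only when both risks are small:

\[ \mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \frac{p_1}{p_0} \cdot \frac{1-p_0}{1-p_1} = \mathrm{RR} \cdot \frac{1-p_0}{1-p_1} \]

With the observed proportions p_1 = 29/43 and p_0 = 20/42:

\[ \mathrm{OR} = 1.416 \times \frac{22/42}{14/43} \approx 1.416 \times 1.609 \approx 2.28 \]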
Binary regression - Estimation of risk ratio (Relative risk)
> library(Epi)
> library(xtable)
> dlt <- data.frame(rbind( c(29,14),c(20,22) ))
> colnames( dlt ) <- c("Better","Worse")
> dlt$trt <- c(1,0)
> b2<-glm(cbind(Better,Worse)~trt,
+ family=binomial(link="log"),
+ data=dlt)
> xtable(round( ci.exp( b2 ), digits=6 ))
            exp(Est.)   2.5%  97.5%
(Intercept)      0.48   0.35   0.65
trt              1.42   0.97   2.07
Diabetics treated with Dalteparin are 1.4 times as likely to get better as those
treated with placebo.
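The same risk ratio and its Wald interval can be obtained without the Epi package. The sketch below refits the log-link model in base R and compares it with the textbook closed-form interval for a 2x2 table; confint.default() is used as a stand-in for ci.exp(), and the object names are illustrative.

```r
# Grouped binary data: successes and failures per treatment group
dlt <- data.frame(Better = c(29, 20), Worse = c(14, 22), trt = c(1, 0))

# Log-binomial model: exp(coefficient of trt) is the risk ratio
b <- glm(cbind(Better, Worse) ~ trt, family = binomial(link = "log"), data = dlt)
rr_glm <- exp(cbind(Estimate = coef(b), confint.default(b)))["trt", ]

# Closed-form Wald interval on the log scale: SE^2 = 1/a - 1/n1 + 1/c - 1/n0
log_rr  <- log((29 / 43) / (20 / 42))
se      <- sqrt(1 / 29 - 1 / 43 + 1 / 20 - 1 / 42)
rr_hand <- exp(log_rr + c(0, -1, 1) * qnorm(0.975) * se)

round(rbind(glm = rr_glm, hand = rr_hand), 2)
```

Both rows should agree with the estimate 1.42 (0.97 to 2.07) shown in the table above.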
Binary regression - Estimation of risk difference
> library(Epi)
> library(xtable)
> dlt <- data.frame(rbind( c(29,14),c(20,22) ))
> colnames( dlt ) <- c("Better","Worse")
> dlt$trt <- c(1,0)
> b2<-glm(cbind(Better,Worse)~trt,
+ family=binomial(link="identity"),
+ data=dlt)
> xtable(round( ci.exp( b2,Exp=F ), digits=6 ))
            Estimate   2.5%  97.5%
(Intercept)     0.48   0.33   0.63
trt             0.20  -0.01   0.40
About twenty percentage points more of the diabetics treated with Dalteparin get
better compared with diabetics treated with placebo.
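The risk difference and its Wald interval can be reproduced by hand from the cell counts. A base R sketch using the standard binomial variance p(1 − p)/n per group, with illustrative object names:

```r
# Successes and group sizes
x <- c(Dal = 29, Pl = 20)
n <- c(Dal = 43, Pl = 42)

p  <- x / n                              # observed proportions "Better"
rd <- unname(p["Dal"] - p["Pl"])         # risk difference

# Wald standard error: sum of the two binomial variances
se <- sqrt(sum(p * (1 - p) / n))
ci <- rd + c(-1, 1) * qnorm(0.975) * se
round(c(RD = rd, lower = ci[1], upper = ci[2]), 2)
```

This gives 0.20 with interval -0.01 to 0.40, the same as the identity-link fit. The twoby2 interval for the probability difference is slightly different because it is not a plain Wald interval.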
Conclusion: What did we learn?
▶ Rates, their ratios and differences can be analysed by Poisson regression.
▶ In Poisson models the response can be either:
▶ case indicator d with offset = log(y), or
▶ cases and person-years cbind(d,y) with the poisreg family (Epi package).
▶ Both may be fitted on either grouped data or individual records.
▶ Binary outcomes can be modelled with binary regression (binomial family with
logit, log or identity link for the odds ratio, risk ratio or risk difference).