
Some Surprising Results about Covariate Adjustment in Logistic Regression Models

Author(s): Laurence D. Robinson and Nicholas P. Jewell


Source: International Statistical Review / Revue Internationale de Statistique, Vol. 59, No. 2
(Aug., 1991), pp. 227-240
Published by: International Statistical Institute (ISI)
Stable URL: http://www.jstor.org/stable/1403444
Accessed: 16/06/2014 14:13

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.

International Statistical Institute (ISI) is collaborating with JSTOR to digitize, preserve and extend access to
International Statistical Review / Revue Internationale de Statistique.

http://www.jstor.org

This content downloaded from 185.2.32.21 on Mon, 16 Jun 2014 14:13:33 PM


All use subject to JSTOR Terms and Conditions
International Statistical Review (1991), 59, 2, pp. 227-240. Printed in Great Britain
© International Statistical Institute

Some Surprising Results About Covariate Adjustment in Logistic Regression Models

Laurence D. Robinson and Nicholas P. Jewell

Program in Biostatistics and Department of Statistics, University of California, Berkeley, CA 94720, USA

Summary

Results from classic linear regression regarding the effect of adjusting for covariates upon the precision of an estimator of exposure effect are often assumed to apply more generally to other types of regression models. In this paper we show that such an assumption is not justified in the case of logistic regression, where the effect of adjusting for covariates upon precision is quite different. For example, in classic linear regression the adjustment for a non-confounding predictive covariate results in improved precision, whereas such adjustment in logistic regression results in a loss of precision. However, when testing for a treatment effect in randomized studies, it is always more efficient to adjust for predictive covariates when logistic models are used, and thus in this regard the behavior of logistic regression is the same as that of classic linear regression.

Key words: Adjustment for covariates; Asymptotic relative efficiency; Classic linear regression; Logistic regression; Omitted covariate; Precision.

1 Introduction
The ability of covariance adjustment to improve the precision of estimates is a
long-standing idea in statistics that originated with R.A. Fisher (1932). In particular, in a
randomized experiment, when the assumptions of 'classic' linear regression apply,
adjustment for covariates that are associated with the response variable is not required to
obtain a valid estimate of the treatment effect, but nonetheless is desirable, as it will
improve the precision of the treatment effect estimate. This improvement in precision can
be explained in terms of a reduction of residual variance, an intuitive notion so persuasive
that it has become the conventional wisdom to assume that similar gains in precision will
be achieved with respect to regression models other than the classic, such as logistic
regression (Mantel & Haenszel, 1959; Mantel, 1989).
Recently, however, some authors have recognized that in some situations the
conventional wisdom regarding covariate adjustment does not apply. Wickramaratne &
Holford (1989) give a specific 2 x 2 x 2 contingency table example (for which a logistic
regression analysis is appropriate) in which the pooled and stratum specific (log) odds
ratio estimates are equal, and where the variance of the pooled (log) odds ratio estimate
is less than that of the stratified estimate. This point was also addressed by Breslow &
Day (1987).
In this paper it will be proven that adjustment for covariates always leads to a loss (or at best no gain) of precision with respect to logistic regression models. Section 2 outlines the details of the classic linear regression model, which are then used for comparison with logistic model results obtained in later sections. In § 3 logistic regression models which parametrize the 2 × 2 × 2 contingency table situation are introduced, and asymptotic


variance formulae for the pooled and adjusted estimates of exposure effect are stated. In § 4 it is demonstrated that the variance of the pooled estimate is always less than or equal to the variance of the adjusted estimate, and this result is then extended to the more general case of several strata. In § 5 it is demonstrated that the result of § 4 also applies to common finite sample estimates of the asymptotic variances. In § 6 a simple argument, involving the symmetric nature of logistic regression, is given which demonstrates that the conventional wisdom cannot apply to logistic regression. In § 7 the effects of certain key factors which influence precision are examined. In § 8, it is shown that it is always as or more efficient to adjust for covariates when testing for the presence of a treatment effect in randomized studies, in the context of a logistic regression model, despite the associated loss of precision demonstrated in § 4.

2 The 'Conventional (Classic Linear Regression) Wisdom'


Suppose the following two classic linear regression models provide valid descriptions of the structure of a population:

E(Y | X1) = a* + b1* X1,  var (Y | X1) = σ1² (a constant);  (1)

E(Y | X1, X2) = a + b1 X1 + b2 X2,  var (Y | X1, X2) = σ12² (a constant).  (2)

Suppose now that a large simple random sample is obtained from the population, and both models are fitted to the data via the method of least squares, resulting in estimators b̂1* and b̂1 (of b1* and b1, respectively). We will denote the asymptotic relative precision of the estimator b̂1 (relative) to the estimator b̂1* by ARP (b̂1 to b̂1*), and define it as follows:

ARP (b̂1 to b̂1*) = [var (b̂1)]⁻¹ / [var (b̂1*)]⁻¹ = var (b̂1*) / var (b̂1).

Thus, our measure of the precision of an estimator is the inverse of its asymptotic variance.
For the estimators b̂1* and b̂1 associated with models (1) and (2), the following formula for ARP (b̂1 to b̂1*) can be obtained:

ARP (b̂1 to b̂1*) = (1 − ρ12²) / (1 − ρY2·1²).

Here ρ12 is the simple correlation between the variables X1 and X2, and ρY2·1 is the partial correlation between the variables Y and X2 conditional on fixed X1.
The comparison of asymptotic variances is of particular interest when there is no confounding, i.e. when b1* = b1, and hence both estimators b̂1* and b̂1 are estimating the same unknown population parameter. For the classic linear regression models described above, there will be no confounding if one or both of the following two conditions holds.

Condition 1. ρ12 = 0.

Condition 2. ρY2·1 = 0 (note this is equivalent to b2 = 0).

When Condition 1 alone holds, ARP (b̂1 to b̂1*) = (1 − ρY2·1²)⁻¹ ≥ 1. It is this result that explains the desirability of adjusting for a predictive covariate in randomized studies, even though a valid estimate of treatment effect can be obtained without adjustment. When Condition 2 alone holds, ARP (b̂1 to b̂1*) = 1 − ρ12² ≤ 1. This result explains why it is undesirable to adjust for a non-predictive covariate which is correlated with the risk factor of interest. When both Conditions 1 and 2 hold, ARP (b̂1 to b̂1*) = 1.
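These two cases are easy to check numerically. The following Python sketch (illustrative only, with arbitrary correlation values; not code from the paper) evaluates ARP (b̂1 to b̂1*) = (1 − ρ12²)/(1 − ρY2·1²), obtaining the partial correlation ρY2·1 from the three pairwise correlations by the standard recursion formula.

```python
import math

def arp_linear(rho_y1, rho_y2, rho_12):
    """ARP(b1-hat to b1*-hat) = (1 - rho_12^2) / (1 - rho_{Y2.1}^2)
    for classic linear regression, computed from the three pairwise
    correlations of (Y, X1, X2)."""
    # partial correlation of Y and X2 given X1 (standard formula)
    rho_y2_1 = (rho_y2 - rho_y1 * rho_12) / math.sqrt(
        (1 - rho_y1**2) * (1 - rho_12**2))
    return (1 - rho_12**2) / (1 - rho_y2_1**2)

# Condition 1 (rho_12 = 0): adjusting for a predictive covariate helps.
print(arp_linear(0.5, 0.4, 0.0))    # about 1.2712, i.e. > 1

# Condition 2 (rho_{Y2.1} = 0, arranged here by rho_y2 = rho_y1 * rho_12):
# adjusting for a non-predictive correlated covariate hurts.
print(arp_linear(0.5, 0.15, 0.3))   # 0.91 = 1 - 0.3^2
```

The correlation values above are hypothetical; any triple satisfying the respective condition reproduces the same qualitative behavior.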


We now consider the behavior of the ARP (b̂1 to b̂1*) more generally, not restricting ourselves to conditions of no confounding. This provides important insight into key factors which influence the precision of the estimator b̂1. In general, the value of ARP (b̂1 to b̂1*) is seen to be (i) less than, (ii) equal to, or (iii) greater than 1 depending on whether ρY2·1² is (i) less than, (ii) equal to, or (iii) greater than ρ12². Thus a strong association between Y and X2 has a beneficial effect upon the precision of b̂1, whereas a strong association between X1 and X2 has a detrimental effect, and hence the precision of b̂1 reflects the competing effects of these Y−X2 and X1−X2 relationships.
It is the above behavior of the ARP (b̂1 to b̂1*), and more generally of the precision of b̂1, that we loosely refer to as the conventional wisdom. The purpose of this paper is to demonstrate that the conventional wisdom breaks down with respect to the logistic regression model.

3 The Logistic Regression Model


Let Y, X1, and X2 each be a dichotomous variable taking on the values 0 and 1. The variable Y will be considered the response variable, and the variables X1 and X2 potential risk factors. The variable X1 will be considered the risk factor of primary interest. Individuals for whom the value of X1 equals 1 will be referred to as 'exposed', and those for whom the value of X1 equals 0 as 'unexposed'. Individuals for whom the value of Y equals 1 will be referred to as 'diseased', and those for whom the value of Y equals 0 as 'non-diseased'. Of course, all of the following results and comments apply whatever the variables refer to in specific applications. Let us now assume the following two logistic regression models both provide a valid description of the population structure:

log { pr (Y = 1 | X1) / [1 − pr (Y = 1 | X1)] } = a* + b1* X1,  (3)

log { pr (Y = 1 | X1, X2) / [1 − pr (Y = 1 | X1, X2)] } = a + b1 X1 + b2 X2.  (4)

Model (3) always provides a valid description of the relationship between the dichotomous variables Y and X1, whereas model (4) imposes an assumption of no interaction (i.e. the variables X1 and X2 are assumed to have additive effects with respect to the log odds).
Suppose now that simple random samples of N1 exposed and N0 unexposed individuals are obtained, and that both logistic regression models are fit via the method of maximum likelihood, resulting in respective estimators b̂1* and b̂1. Standard likelihood theory techniques result in the following asymptotic variance formulae (Gart, 1962):

var (b̂1* | X1) = 1/(N1 p1) + 1/(N1 q1) + 1/(N0 p0) + 1/(N0 q0),  (5)

where pi = pr (Y = 1 | X1 = i) for i = 0, 1 and qi = 1 − pi;

var (b̂1 | X1, X2) = { [1/(N11 p11) + 1/(N11 q11) + 1/(N01 p01) + 1/(N01 q01)]⁻¹
  + [1/(N10 p10) + 1/(N10 q10) + 1/(N00 p00) + 1/(N00 q00)]⁻¹ }⁻¹.  (6)

Here pij = pr (Y = 1 | X1 = i, X2 = j) for i, j = 0, 1, qij = 1 − pij, and Nij equals the number of individuals sampled for whom X1 = i and X2 = j. Note that N11 + N10 = N1 and


N01 + N00 = N0. Also, here and throughout, the term asymptotic refers to both N0 and N1 tending to infinity.
The second variance formula given above is conditional on both X1 and X2. However, in accordance with the sampling scheme, in which prespecified numbers of exposed and unexposed individuals are sampled, but where the distribution of X2 is allowed to vary, for the purpose of defining the asymptotic relative efficiency we shall require the variance conditional only on X1, that is var (b̂1 | X1), and thus must take the expectation of var (b̂1 | X1, X2) with respect to X2. This results in the following formula:

var (b̂1 | X1) = { [1/(N1 c11 p11) + 1/(N1 c11 q11) + 1/(N0 c01 p01) + 1/(N0 c01 q01)]⁻¹
  + [1/(N1 c10 p10) + 1/(N1 c10 q10) + 1/(N0 c00 p00) + 1/(N0 c00 q00)]⁻¹ }⁻¹.  (7)

In the above formula cij = pr (X2 = j | X1 = i) for i, j = 0, 1.
X1= Nlcloqio
Table 1 gives the set of tables which represent the outcomes expected to result from the sampling scheme described above. In all tables in this paper D will denote 'diseased', D̄ 'non-diseased', E 'exposed', and Ē 'unexposed'. To avoid technical difficulties, we will assume that the population contains no 'structural zeroes' (McCullagh & Nelder, 1983, p. 61), so that none of the expected cell entries are 0.
The entries in the pooled table equal the sum of the corresponding entries in the two sub-tables, for example

N1 c10 p10 + N1 c11 p11 = N1 [c10 p10 + c11 p11] = N1 p1.

Furthermore, we see that var (b̂1* | X1) simply equals the sum of the inverses of the expected cell entries of the pooled table, and that var (b̂1 | X1) can be expressed as [V1⁻¹ + V0⁻¹]⁻¹, where Vj equals the sum of the inverses of the expected cell entries of the sub-table X2 = j, for j = 0, 1.
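The two variance formulae can be evaluated directly from the expected cell entries. The Python sketch below (an illustration with made-up probabilities, not code from the paper) implements var (b̂1* | X1) as the sum of inverse pooled cells, and var (b̂1 | X1) in the form [Σj Vj⁻¹]⁻¹, allowing an arbitrary number of strata J.

```python
def var_pooled(N1, N0, p1, p0):
    """Formula (5): var(b1*-hat | X1) = sum of inverses of the
    expected cells of the pooled table."""
    q1, q0 = 1 - p1, 1 - p0
    return 1/(N1*p1) + 1/(N1*q1) + 1/(N0*p0) + 1/(N0*q0)

def var_adjusted(N1, N0, c1, c0, p1j, p0j):
    """Formula (7), in the J-strata form: [sum_j V_j^{-1}]^{-1}, where
    V_j is the sum of inverses of the expected cells of sub-table X2 = j.
    c1[j] = pr(X2=j | X1=1), c0[j] = pr(X2=j | X1=0),
    p1j[j] = pr(Y=1 | X1=1, X2=j), p0j[j] = pr(Y=1 | X1=0, X2=j)."""
    inv_total = 0.0
    for j in range(len(c1)):
        Vj = (1/(N1*c1[j]*p1j[j]) + 1/(N1*c1[j]*(1 - p1j[j]))
              + 1/(N0*c0[j]*p0j[j]) + 1/(N0*c0[j]*(1 - p0j[j])))
        inv_total += 1/Vj
    return 1/inv_total

# made-up example; consistency requires p1 = 0.5*0.4 + 0.5*0.6 = 0.5
# and p0 = 0.6*0.2 + 0.4*0.35 = 0.26
vp = var_pooled(100, 100, 0.5, 0.26)
va = var_adjusted(100, 100, [0.5, 0.5], [0.6, 0.4], [0.4, 0.6], [0.2, 0.35])
print(vp <= va)  # True
```

Whatever non-degenerate probabilities are supplied, vp never exceeds va, which is the main result proven in § 4.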
The estimator b̂1 referred to above is the maximum likelihood estimator. Another estimator commonly used is the 'inverse variance weighted, stratified estimator' (Weinberg, 1985), in which the parameter b1 is estimated separately from the two sub-tables (using observed proportions), and then a weighted average of the two estimates is obtained, the weights being inversely proportional to their respective estimated variances. This estimator, also referred to as the 'Woolf estimator' (Woolf, 1955), can easily be computed by hand, whereas the maximum likelihood estimator generally requires an iterative scheme, and hence the use of a computer. It can be shown that the variance formulae given for the maximum likelihood estimator b̂1 also apply to the Woolf estimator (Gart, 1962).

4 The Asymptotic Relative Precision of b̂1 versus b̂1*

As in § 2, we will denote the asymptotic relative precision of the estimator b̂1 (relative) to the estimator b̂1* by ARP (b̂1 to b̂1*), and define it in terms of the ratio of the inverses of

Table 1
Expected cell frequencies for pooled and sub-tables from cohort design.

Pooled table:        E:  D = N1 p1,        D̄ = N1 q1,        Total = N1
                     Ē:  D = N0 p0,        D̄ = N0 q0,        Total = N0
Sub-table X2 = 0:    E:  D = N1 c10 p10,   D̄ = N1 c10 q10,   Total = N1 c10
                     Ē:  D = N0 c00 p00,   D̄ = N0 c00 q00,   Total = N0 c00
Sub-table X2 = 1:    E:  D = N1 c11 p11,   D̄ = N1 c11 q11,   Total = N1 c11
                     Ē:  D = N0 c01 p01,   D̄ = N0 c01 q01,   Total = N0 c01


asymptotic variances:

ARP (b̂1 to b̂1*) = [var (b̂1 | X1)]⁻¹ / [var (b̂1* | X1)]⁻¹ = var (b̂1* | X1) / var (b̂1 | X1).

Here the asymptotic variances are those stated in § 3. Note again that both of these asymptotic variances are conditional on X1, in accordance with the sampling scheme.
The main result of this paper is that ARP (b̂1 to b̂1*) ≤ 1, with equality occurring if and only if the variable X2 is independent of (Y, X1). Since ARP (b̂1 to b̂1*) ≤ 1 is equivalent to

[var (b̂1* | X1)]⁻¹ ≥ [var (b̂1 | X1)]⁻¹,

we must show that

1/(N1 p1) + 1/(N1 q1) + 1/(N0 p0) + 1/(N0 q0)
  ≤ { [1/(N1 c10 p10) + 1/(N1 c10 q10) + 1/(N0 c00 p00) + 1/(N0 c00 q00)]⁻¹
    + [1/(N1 c11 p11) + 1/(N1 c11 q11) + 1/(N0 c01 p01) + 1/(N0 c01 q01)]⁻¹ }⁻¹.
This result follows readily as an application of Minkowski's inequality (Hardy, Littlewood & Polya, 1952, pp. 30-31), which for our purposes may be stated as follows: assume all aij positive, for i = 1, ..., I and j = 0, ..., J − 1. For finite r < 1, but not equal to 0, we have the following:

[ sum_{i=1..I} ( sum_{j=0..J-1} aij )^r ]^{1/r} ≥ sum_{j=0..J-1} [ sum_{i=1..I} (aij)^r ]^{1/r},

with equality occurring if and only if aij = k aij', for all i and all choices of j ≠ j' and for some finite k > 0, where the value of k depends on the specific choice of j and j'.
For our particular application, we restate the above theorem for the specific case of I = 4, J = 2, and r = −1. This yields

[ 1/(a10 + a11) + 1/(a20 + a21) + 1/(a30 + a31) + 1/(a40 + a41) ]⁻¹
  ≥ [ 1/a10 + 1/a20 + 1/a30 + 1/a40 ]⁻¹ + [ 1/a11 + 1/a21 + 1/a31 + 1/a41 ]⁻¹,

with equality occurring if and only if ai1 = k ai0 for i = 1, ..., 4 and for some finite k > 0. From this we can immediately conclude that

[var (b̂1* | X1)]⁻¹ ≥ [var (b̂1 | X1)]⁻¹,

and hence that ARP (b̂1 to b̂1*) ≤ 1, as an application of Minkowski's inequality, where

a10 = N1 c10 p10,  a11 = N1 c11 p11,  a20 = N1 c10 q10,  a21 = N1 c11 q11,
a30 = N0 c00 p00,  a31 = N0 c01 p01,  a40 = N0 c00 q00,  a41 = N0 c01 q01.

In the above statement of Minkowski's inequality it is stated that ARP (b̂1 to b̂1*) = 1 if and only if ai1 = k ai0 for i = 1, ..., 4, a condition referred to as 'proportionality' by Hardy et al. (1952). It can be shown (Bishop, Fienberg & Holland, 1975, p. 47) that, for our application, such proportionality is equivalent to the variable X2 being independent of (Y, X1).
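The specialized inequality can be spot-checked numerically. A small Python sketch (illustrative only; the random search and the proportional example are not from the paper):

```python
import random
random.seed(1)

def lhs_rhs(a0, a1):
    """Minkowski's inequality with I = len(a0), J = 2, r = -1:
    [sum_i 1/(a_i0 + a_i1)]^{-1} >= [sum_i 1/a_i0]^{-1} + [sum_i 1/a_i1]^{-1}."""
    lhs = 1 / sum(1/(x + y) for x, y in zip(a0, a1))
    rhs = 1 / sum(1/x for x in a0) + 1 / sum(1/y for y in a1)
    return lhs, rhs

# the inequality holds for arbitrary positive cell entries
for _ in range(1000):
    a0 = [random.uniform(0.1, 10) for _ in range(4)]
    a1 = [random.uniform(0.1, 10) for _ in range(4)]
    lhs, rhs = lhs_rhs(a0, a1)
    assert lhs >= rhs - 1e-12

# equality under proportionality a_i1 = k * a_i0 (here k = 2)
lhs, rhs = lhs_rhs([1, 2, 3, 4], [2, 4, 6, 8])
print(abs(lhs - rhs) < 1e-9)  # True; both sides equal 1.44
```

In the paper's application the aij are expected cell frequencies, so the left side is the inverse pooled variance and the right side is V0⁻¹ + V1⁻¹, the inverse adjusted variance.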
We now examine the behavior of the ARP (b̂1 to b̂1*) for logistic regression, and compare it with the behavior seen for classic linear regression in § 2. As with regard to classic linear regression, of particular interest are those situations where there is no confounding.


For the logistic regression models (3) and (4) stated in § 3 there will be no confounding, that is b1* = b1, if one or both of the following two conditions holds (Gail, 1986).

Condition 1'. X1 and X2 are independent given Y.

Condition 2'. Y and X2 are independent given X1 (note this is equivalent to b2 = 0).

Condition 2' is very much analogous to the no confounding Condition 2 (ρY2·1 = 0) of § 2. In particular, for classic linear regression the condition 'Y and X2 independent given X1' does in fact imply that ρY2·1 = 0. When Condition 2' alone holds, ARP (b̂1 to b̂1*) < 1, which is the same result as was obtained for classic linear regression when the analogous no confounding Condition 2 holds. Thus we see that, for both logistic and classic linear regression, adjustment for a non-predictive covariate X2 which is associated with the predictor variable X1 results in a loss of precision.
Condition 1' may also be regarded as analogous to the no confounding Condition 1 (ρ12 = 0) of § 2, in that both conditions refer to a lack of an association between the variables X1 and X2. However, for logistic regression the absence of association is conditional on Y, which is not the case for classic linear regression. In particular, it is not Condition 1' which implies Condition 1 (ρ12 = 0) with respect to classic linear regression, but rather the condition 'X1 and X2 independent'. When Condition 1' alone holds, ARP (b̂1 to b̂1*) < 1, which is not consistent with the analogous result from classic linear regression, where we saw that ARP (b̂1 to b̂1*) > 1 when the no confounding Condition 1 holds. Thus, whereas adjusting for a non-confounding covariate X2 which is associated with the dependent variable Y (conditional on X1) results in a gain in precision with respect to classic linear regression, it results in a loss of precision with respect to logistic regression.
When both Conditions 1' and 2' hold, the variable X2 is independent of (Y, X1), and thus ARP (b̂1 to b̂1*) = 1. Furthermore, note that when X2 is independent of (Y, X1), with respect to classic linear regression, both ρ12 = 0 and ρY2·1 = 0. Thus, for both logistic and classic linear regression, ARP (b̂1 to b̂1*) = 1 when the variable X2 is independent of (Y, X1).
More generally, for classic linear regression we saw that the value of ARP (b̂1 to b̂1*) can be less than, equal to, or greater than 1 depending on the relative strengths of the Y−X2 and X1−X2 relationships, whereas for logistic regression the value of ARP (b̂1 to b̂1*) is always less than or equal to 1 (again with equality occurring if and only if X2 is independent of (Y, X1)). This suggests that, unlike classic linear regression, where the Y−X2 and X1−X2 relationships have opposing effects which compete with each other to determine the relative precision of b̂1, with respect to logistic regression these two relationships have similar effects which combine to cause an automatic loss of precision. Sections 6 and 7 give additional insight into the behavior of the ARP (b̂1 to b̂1*) with respect to logistic regression.
The result we have obtained for the case of two strata, i.e. two levels of the variable X2, can be extended to the more general case of J > 2 strata in a straightforward manner. In particular, the asymptotic variance of the maximum likelihood estimator of b1 can be shown to equal (Gart, 1962)

var (b̂1 | X1) = { sum_{j=0..J-1} [ 1/(N1 c1j p1j) + 1/(N1 c1j q1j) + 1/(N0 c0j p0j) + 1/(N0 c0j q0j) ]⁻¹ }⁻¹,

where

cij = pr (X2 = j | X1 = i),  pij = pr (Y = 1 | X1 = i, X2 = j),  (i = 0, 1; j = 0, ..., J − 1).

This asymptotic variance, which also pertains to the Woolf estimator of b1, is a simple extension of the formula given previously for the J = 2 strata case. The desired result,

[var (b̂1* | X1)]⁻¹ ≥ [var (b̂1 | X1)]⁻¹, then follows as an application of Minkowski's inequality with I = 4, r = −1, and J = the number of strata, in a manner completely analogous with the two strata case. This also allows extension of the asymptotic relative precision result to the case of adjustment for a set of discrete covariates.

5 The Relationship Between Estimated Variances

In § 4 it was proven that var (b̂1* | X1) ≤ var (b̂1 | X1), a result which pertains to the asymptotic variances. Suppose now that an actual set of data is obtained, and from that data set estimates of b1* and b1 are computed. In this section we will consider both the maximum likelihood estimator and the Woolf estimator of b1. Typically an investigator will also obtain estimates of the variances var (b̂1* | X1) and var (b̂1 | X1). Although the maximum likelihood estimator and the Woolf estimator of b1 have the same asymptotic variance, the method by which this variance is estimated is generally different. In this section we will examine the question of whether the result of § 4, which pertains to asymptotic variances, extends to their common finite sample estimates.
The estimation of var (b̂1* | X1) given by (5) is very straightforward. There is only one commonly used method for estimating var (b̂1* | X1), namely to substitute the maximum likelihood estimates p̂1 and p̂0 for p1 and p0, respectively. These maximum likelihood estimates are the observed proportions of diseased individuals (that is Y = 1) among exposed (that is X1 = 1) and unexposed (that is X1 = 0) individuals. Suppose now that the data is as given in Table 2. From this data set we obtain the estimated variance of b̂1*, denoted by vâr (b̂1*), as

vâr (b̂1*) = 1/(a10 + a11) + 1/(a20 + a21) + 1/(a30 + a31) + 1/(a40 + a41).

Let us now examine the issue of estimating the variance of b̂1. Usually, we further condition on X2 in calculating an estimated variance of b̂1. Therefore, regardless of whether the maximum likelihood estimator or the Woolf estimator has been used for b̂1, the estimate of var (b̂1 | X1, X2), which we shall denote by vâr (b̂1), is obtained by substituting estimates p̂ij (for the unknown probabilities pij, for i, j = 0, 1) into the asymptotic variance formula (6).
When the method of maximum likelihood is used to obtain an estimate b̂1, typically the estimates p̂ij which are substituted into the asymptotic variance formula are the maximum likelihood estimates based on the regression model. When the Woolf estimator is used, typically the estimates p̂ij which are substituted are the observed proportions, which, in this case, are not the same as the maximum likelihood estimates. Note, however, that it is actually valid to substitute either set of estimates p̂ij into the asymptotic variance formula regardless of which estimator b̂1 has been obtained (although the maximum likelihood estimates p̂ij will generally not be available when the Woolf estimator has been obtained).
First consider the vâr (b̂1) obtained by substituting the observed proportions in (6);
Table 2
Data for pooled and sub-tables arising from cohort studies.

Pooled table:        E:  D = a10 + a11,  D̄ = a20 + a21
                     Ē:  D = a30 + a31,  D̄ = a40 + a41
Sub-table X2 = 0:    E:  D = a10,  D̄ = a20
                     Ē:  D = a30,  D̄ = a40
Sub-table X2 = 1:    E:  D = a11,  D̄ = a21
                     Ē:  D = a31,  D̄ = a41


then

vâr (b̂1) = [ (1/a10 + 1/a20 + 1/a30 + 1/a40)⁻¹ + (1/a11 + 1/a21 + 1/a31 + 1/a41)⁻¹ ]⁻¹.

Thus it follows immediately from Minkowski's inequality that when the observed proportions p̂ij are used, vâr (b̂1*) ≤ vâr (b̂1), and hence we see that, in this case, the relationship between the asymptotic variances extends to these estimated variances.
Now consider the case where the maximum likelihood estimates p̂ij are used. It is a well known property of maximum likelihood estimation that the fitted sub-table cell frequencies must sum to the pooled table (Breslow & Day, 1980). Thus we have, for example, N10 p̂10 + N11 p̂11 equalling the total number of diseased, exposed (that is Y = 1, X1 = 1) individuals, which in the above data set is a10 + a11. However, this total number equals N1 p̂1, and thus we have N10 p̂10 + N11 p̂11 = N1 p̂1. Similarly we have N10 q̂10 + N11 q̂11 = N1 q̂1, etc. Thus, Minkowski's inequality also applies when the maximum likelihood estimates p̂ij are used, and once again we have the result vâr (b̂1*) ≤ vâr (b̂1).
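The finite-sample inequality of this section can be verified on any data set laid out as in Table 2. A brief Python sketch (with hypothetical counts, not data from the paper), using the observed-proportion estimates:

```python
def est_var_pooled(a):
    """var-hat(b1*-hat): sum of inverses of the pooled-table cells,
    where a[i] = (a_i0, a_i1) gives cell i of sub-tables X2=0 and X2=1."""
    return sum(1.0 / (ai0 + ai1) for ai0, ai1 in a)

def est_var_adjusted(a):
    """var-hat(b1-hat) from observed proportions:
    [V0^{-1} + V1^{-1}]^{-1}, V_j = sum of inverses of sub-table j cells."""
    V0 = sum(1.0 / ai0 for ai0, _ in a)
    V1 = sum(1.0 / ai1 for _, ai1 in a)
    return 1.0 / (1.0 / V0 + 1.0 / V1)

# hypothetical cohort counts, rows in the order E-D, E-Dbar, Ebar-D, Ebar-Dbar
data = [(20, 30), (15, 10), (8, 22), (25, 19)]
print(est_var_pooled(data) <= est_var_adjusted(data))  # True, by Minkowski
```

Because the inequality is an algebraic consequence of Minkowski's inequality, any table of positive counts gives the same ordering.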

6 The Symmetric Nature of the Logistic Regression Model

As in § 3, we will assume that the logistic regression models (3) and (4) provide a valid description of the structure of a three dichotomous variable system. It is a well known property of logistic regression that when models (3) and (4) are valid, models (8) and (9) are also valid (Breslow & Powers, 1978):

log { pr (X1 = 1 | Y) / [1 − pr (X1 = 1 | Y)] } = c* + d1* Y,  (8)

log { pr (X1 = 1 | Y, X2) / [1 − pr (X1 = 1 | Y, X2)] } = c + d1 Y + d2 X2.  (9)

Another well known property of logistic regression is that b1* = d1* and b1 = d1. Thus, we see that, for the purpose of estimating the parameters b1* and b1, we can either treat Y as the response variable and X1 as a predictor variable, or we can treat X1 as the response variable and Y as the predictor variable.
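This equality can be seen concretely in a single 2 × 2 table, since the sample odds ratio is the same number whichever variable is treated as the response. A small Python illustration (hypothetical counts, not from the paper):

```python
import math

# hypothetical counts n[(y, x1)] from one 2x2 table
n = {(1, 1): 30, (1, 0): 10, (0, 1): 20, (0, 0): 40}

# Y as response: log of odds(Y=1 | X1=1) over odds(Y=1 | X1=0)
b = math.log((n[(1, 1)] / n[(0, 1)]) / (n[(1, 0)] / n[(0, 0)]))

# X1 as response: log of odds(X1=1 | Y=1) over odds(X1=1 | Y=0)
d = math.log((n[(1, 1)] / n[(1, 0)]) / (n[(0, 1)] / n[(0, 0)]))

print(abs(b - d) < 1e-12)  # True; both equal log(6)
```

Both expressions reduce to log of the cross-product ratio n11 n00 / (n10 n01), which is why the roles of Y and X1 are interchangeable for estimating b1* and b1.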
Suppose now that the variables X1 and X2 are independent given Y, and that there exists a non-null association between X2 and Y given X1 (and thus b2 ≠ 0). Considering models (3) and (4), in which Y is the response variable and X1 a predictor variable, from the conventional wisdom we would expect that var (b̂1 | X1) < var (b̂1* | X1), that is ARP (b̂1 to b̂1*) > 1. Thus, we would expect adjustment for the covariate X2 to result in an increase in precision.
But now suppose the variable X1 is treated as the response variable and the variable Y as a predictor variable, as in models (8) and (9). In this situation, the covariate X2 and predictor variable Y are not independent given response variable X1, whereas X2 is independent of the response variable X1 given Y (and hence d2 = 0). Now, from the conventional wisdom we would expect that var (d̂1 | Y) > var (d̂1* | Y), which implies ARP (d̂1 to d̂1*) < 1. Thus, from this point of view, the conventional wisdom suggests that adjustment for the covariate X2 would result in a loss of precision.
Because of the asymptotic equivalence of the estimators b̂1* and d̂1*, and also of b̂1 and d̂1, we see that application of the conventional wisdom leads to a contradiction. Thus, from the 'symmetric nature' of the logistic regression model alone we can conclude that the conventional wisdom must break down with respect to logistic regression. Furthermore, the symmetric nature immediately suggests that the Y−X2 and X1−X2 relationships have similar effects which combine to influence precision, in contrast to the situation observed with respect to classic linear regression, as was discussed previously in § 4.
The previous argument was stated specifically with respect to logistic regression models (3), (4), (8), and (9), for which the covariate X2 is dichotomous, and where it was assumed that the variables X1 and X2 are independent given Y. However, the validity of the argument applies more generally to situations where there is confounding, and to situations involving adjustment for a set of covariates, some of which may be continuous. Thus, we strongly suspect that for logistic regression, when the risk factor of primary interest is dichotomous, adjustment for any set of covariates will result in a loss (or at best no gain) of precision.

7 The Effects of Key Factors which Influence Precision

In this section we look at the effects of certain key factors upon precision. First we examine how the strength of the Y−X2 association, as measured by b2, affects precision. Subsequently we consider the influence of the marginal distribution of Y.
The results of previous sections suggest that, for logistic regression, the stronger the association between the variables Y and X2, conditional on X1, that is the larger the magnitude of b2, the poorer the precision of the estimator b̂1. Furthermore, we might suspect that as the magnitude of b2 goes to infinity, the variance of b̂1 might also go to infinity. We will address these issues by examining the behavior of the ARP (b̂1 to b̂1*) as the value of b2 varies, while the values of other parameters are held fixed.
In a three dichotomous variable system there are 2³ − 1 = 7 parameters which are free to vary. Let us now assume that the variables X1 and X2 are independent given Y; that is we will focus our attention on a situation where there is no confounding. This assumption actually imposes two restrictions upon the three dichotomous variable system, i.e. independence of X1 and X2 at both levels of Y (also note that this assumption of conditional independence at both levels of Y implies that there is no interaction). Given these two restrictions, the three dichotomous variable system can now be parametrized by 7 − 2 = 5 parameters. We will parametrize this system with the following five parameters:

p1 = pr (Y = 1 | X1 = 1),  p0 = pr (Y = 1 | X1 = 0),  pr (X1 = 1),
m = pr (X2 = 1 | X1 = 1, Y = 1) = pr (X2 = 1 | X1 = 0, Y = 1) = pr (X2 = 1 | Y = 1),
k = pr (X2 = 1 | X1 = 1, Y = 0) = pr (X2 = 1 | X1 = 0, Y = 0) = pr (X2 = 1 | Y = 0).

The first three probabilities in the above list parametrize the pooled table (note that in a 'cohort' sampling scheme in which N1 exposed and N0 unexposed individuals are sampled, the parameter pr (X1 = 1) is fixed at N1/(N1 + N0) by the investigator). The last two parameters, m and k, determine how the pooled table gets distributed into the two sub-tables (corresponding to X2 = 0 and X2 = 1). Given this parametrization, we have

b2 = log { [m/(1 − m)] / [k/(1 − k)] }.

Suppose now that we vary the parameter m while holding the remaining four parameters fixed. In fixing the first three parameters, we have fixed the pooled table, and hence var (b̂1* | X1) also. Since b2 is a function of m, it varies with m. Since the distribution of the pooled table into the sub-tables varies with m, we see that var (b̂1 | X1), and hence ARP (b̂1 to b̂1*), also vary as m varies. Consider now the

This content downloaded from 185.2.32.21 on Mon, 16 Jun 2014 14:13:33 PM


All use subject to JSTOR Terms and Conditions
236 L.D. ROBINSON and N.P. JEWELL

Table 3
Population probabilities that form the basis of Fig. 1.

           Pooled table             Sub-table X2 = 0       Sub-table X2 = 1
        D      D̄      Total          D           D̄            D         D̄
  E   0.250  0.250   0.500     0.250(1 - m)   0.150      0.250m     0.100
  Ē   0.125  0.375   0.500     0.125(1 - m)   0.225      0.125m     0.150

population probabilities summarized in Table 3. In Table 3 the four parameters have been
fixed as follows: p1 = 0.50, p0 = 0.25, pr(X1 = 1) = 0.50, k = 0.40. Suppose that we also
arbitrarily fix the total sample size at 200. From this information we may immediately
obtain var(b̂1* | X1) = 0.09333. By varying the parameter m, a graph of ARP (b̂1 to b̂1*)
versus b2 is obtained, shown in Fig. 1.
From Fig. 1 we see that ARP (b̂1 to b̂1*) achieves a maximum value of 1 at b2 = 0, and
that it decreases monotonically as b2 moves away from 0 in either direction. Furthermore,
it is clear from this graph that ARP (b̂1 to b̂1*) reaches asymptotes as b2 goes to plus and
minus infinity. The values of these asymptotes can be easily computed. As m → 1, b2 → ∞,
and the X2 = 0 sub-table provides increasingly less information regarding the parameter
b1. Thus var(b̂1 | X1) can be computed solely from the X2 = 1 sub-table, where m has
been set equal to 1. This results in

var(b̂1 | X1) = 50⁻¹ + 20⁻¹ + 25⁻¹ + 30⁻¹ = 0.14333.

From this we obtain the asymptote as b2 → ∞ as 0.09333/0.14333 = 0.65116. In a similar
manner we obtain the asymptote as b2 → -∞ as 0.09333/0.11556 = 0.80769.
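These fixed points and asymptotes can be checked numerically. The sketch below is our own illustration, not code from the paper: the function names are ours, and the adjusted analysis is represented by the standard Woolf inverse-variance combination of the stratum-specific log odds ratios, which reproduces the asymptotic variances used above.

```python
import math

def woolf_var(a, b, c, d):
    # Woolf's large-sample variance of a 2x2 table's log odds ratio:
    # 1/a + 1/b + 1/c + 1/d, applied here to expected cell counts.
    return 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d

def b2(m, k=0.40):
    # Y-X2 log odds ratio given X1 under the conditional-independence model.
    return math.log((m / (1.0 - m)) / (k / (1.0 - k)))

def arp(m, k=0.40, n=200):
    # Pooled expected counts from Table 3 with total sample size n.
    var_pooled = woolf_var(0.250 * n, 0.250 * n, 0.125 * n, 0.375 * n)
    # Sub-table X2 = 0 receives fraction (1 - m) of each D cell and
    # (1 - k) of each D-bar cell; sub-table X2 = 1 receives the rest.
    v0 = woolf_var(0.250 * (1 - m) * n, 0.250 * (1 - k) * n,
                   0.125 * (1 - m) * n, 0.375 * (1 - k) * n)
    v1 = woolf_var(0.250 * m * n, 0.250 * k * n,
                   0.125 * m * n, 0.375 * k * n)
    # Adjusted analysis: inverse-variance (Woolf) combination of the two
    # stratum-specific log odds ratios.
    var_adj = 1.0 / (1.0 / v0 + 1.0 / v1)
    return var_pooled / var_adj

print(round(arp(0.40), 5))      # m = k, so b2 = 0 and ARP = 1.0
print(round(arp(1 - 1e-9), 5))  # b2 -> +infinity: asymptote 0.65116
print(round(arp(1e-9), 5))      # b2 -> -infinity: asymptote 0.80769
```

Setting m = k = 0.40 makes the sub-tables proportional to the pooled table, so no precision is lost; pushing m toward 1 or 0 reproduces the two asymptotes computed above.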
In this example we see that the loss of precision induced by adjustment for X2 increases
with the magnitude of b2 in both directions. However, the variance of b̂1 does not go off
to infinity as b2 goes to plus and minus infinity, but rather approaches asymptotes in both
directions. This reflects the fact that the variance depends not only on b2 but also on other
factors, particularly the marginal distributions of X2 and Y, and that these other factors
have been fixed at levels (by our specific choices of values for the four fixed parameters)
which limit the potential for loss of precision due to a strong Y - X2 association.

[Figure 1: ARP plotted on the vertical axis from 0.7 to 1.0, against b2 from -6 to 6.]

Figure 1. Plot of asymptotic relative precision (ARP) of the adjusted
estimator b̂1 to the pooled estimator b̂1*, against b2, holding all other
parameters fixed at values shown in Table 3.

Covariate Adjustment in Logistic Regression Models 237
Table 4
Population probabilities that form the basis of Fig. 2.

           Pooled table                   Sub-table X2 = 0            Sub-table X2 = 1
        D           D̄         Total       D            D̄              D            D̄
  E   0.500p1   0.500(1 - p1)  0.500   0.200p1   0.400(1 - p1)    0.300p1   0.100(1 - p1)
  Ē   0.500p0   0.500(1 - p0)  0.500   0.200p0   0.400(1 - p0)    0.300p0   0.100(1 - p0)

We now examine the effect of the marginal distribution of Y upon the asymptotic
relative precision by varying pr(Y = 1), while the values of other parameters are held
fixed. Again we will assume that the variables X1 and X2 are independent given Y, so that
the parametrization of the previous example also applies here. In this case we will fix the
values of the parameters pr(X1 = 1), m, and k, and then vary both p1 and p0 in such a
way as to hold

b1* = log{[p1/(1 - p1)] / [p0/(1 - p0)]}

fixed. As p1 and p0 vary, so does the overall incidence of disease pr(Y = 1). Table 4 gives
an example of certain population probabilities.
In Table 4 we have set pr(X1 = 1) = 0.50, k = 0.20, and m = 0.60 (and thus b2 = log 6).
Again we consider a total sample size of 200. We now vary both p1 and p0 so as to hold
b1* fixed at log 3, to obtain a graph of ARP (b̂1 to b̂1*) versus pr(Y = 1), shown in Fig. 2.
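The curve in Fig. 2 can be traced with the same Woolf-type calculation. In this sketch (our own illustration; the function names are ours), each p0 determines the p1 that keeps the pooled log odds ratio at log 3, and the ARP compares the pooled variance to the inverse-variance combination of the two sub-table variances from Table 4.

```python
def woolf_var(a, b, c, d):
    # Woolf's variance of the log odds ratio for one 2x2 table of
    # expected counts.
    return 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d

def arp_at(p0, n=200):
    # Hold the pooled log odds ratio b1* at log 3 by solving
    # p1/(1 - p1) = 3 * p0/(1 - p0) for p1.
    odds1 = 3.0 * p0 / (1.0 - p0)
    p1 = odds1 / (1.0 + odds1)
    q1, q0 = 1.0 - p1, 1.0 - p0
    # Table 4 expected counts: pooled table and the two sub-tables.
    var_pooled = woolf_var(0.5 * p1 * n, 0.5 * q1 * n,
                           0.5 * p0 * n, 0.5 * q0 * n)
    v0 = woolf_var(0.2 * p1 * n, 0.4 * q1 * n, 0.2 * p0 * n, 0.4 * q0 * n)
    v1 = woolf_var(0.3 * p1 * n, 0.1 * q1 * n, 0.3 * p0 * n, 0.1 * q0 * n)
    var_adj = 1.0 / (1.0 / v0 + 1.0 / v1)
    pr_y1 = 0.5 * (p1 + p0)  # overall incidence of disease
    return pr_y1, var_pooled / var_adj

for p0 in (0.02, 0.20, 0.35, 0.60, 0.90):
    pr_y1, ratio = arp_at(p0)
    print(f"pr(Y=1) = {pr_y1:.3f}   ARP = {ratio:.4f}")
```

Sweeping p0 from near 0 to near 1 shows the ARP dipping well below 1 when pr(Y = 1) is near 0.5 and returning toward 1 at both extremes, the shape described next.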
From Fig. 2 we see that for both small and large values of pr(Y = 1), the ARP (b̂1 to b̂1*)
is relatively close to the maximum value of 1, while for values of pr(Y = 1) closer to 0.5
the ARP (b̂1 to b̂1*) is further from 1. Thus, this particular example suggests that the
potential for loss of precision due to adjustment for covariates will tend to be greater in
cohort studies where the disease is relatively common. It must be noted, however, that
the minimum ARP (b̂1 to b̂1*) value does not occur exactly at pr(Y = 1) = 0.5, and that in
fact the minimum can occur at values of pr(Y = 1) quite far from 0.5 when the marginal
distribution of X2 is very skewed. Nonetheless, in most studies the marginal distribution

[Figure 2: ARP plotted on the vertical axis from 0.85 to 1.00, against pr(Y = 1) from 0.0 to 1.0.]

Figure 2. Plot of asymptotic relative precision (ARP) of the adjusted
estimator b̂1 to the pooled estimator b̂1*, against the probability of disease,
pr(Y = 1), holding all other parameters fixed at values shown in Table 4.


of X2 will not be highly skewed, so that the conclusion reached above remains valid. The
above example also suggests that the potential for loss of precision will tend to be
particularly great for case-control studies, where the oversampling of cases ensures a
relatively high frequency of disease in the sample.

8 Testing for No Treatment Effect in Randomized Studies


In this section we examine the use of the estimators b̂1 and b̂1* for testing the hypothesis
of no treatment effect in randomized studies (or other situations where X1 and X2 are
known to be independent), for both classic linear regression and logistic regression. As in
previous sections, we will assume that the variables X1 and X2 are dichotomous. The
variable X1 will now indicate whether a particular individual received treatment or not.
The random allocation of treatment and control to study subjects ensures that X1 and X2
are independent.

Let us now suppose that the response variable Y is such that the classic linear
regression models (1) and (2) of § 2 are valid. From the results of § 2 we know that
b1* = b1 and that b̂1 is a more precise estimator of the treatment effect than is b̂1*. It
follows immediately that if we wish to test the null hypothesis of no treatment effect,
which can be expressed as H0: b1 = b1* = 0, then test statistics based on the more precise
estimator b̂1 will generally have greater power than tests based on b̂1*.
Now suppose that the outcome variable Y is such that the logistic regression models (3)
and (4) of § 3 are valid. Independence of X1 and X2 does not in this case ensure no
confounding, so that generally b1 will not equal b1*. However, when b1 equals 0, b1* also
equals 0, and thus the null hypothesis of no treatment effect can be expressed as
H0: b1 = b1* = 0 in this case as well. Because for the logistic case b̂1* is a more precise
estimator than b̂1, at first glance we might suspect that tests of H0: b1 = b1* = 0 based on
b̂1* would give greater power. However, from the well known result that when X1 and X2
are independent, the value of the parameter b1* falls between 0 and b1 (Gail, 1986), we
also see that the point estimate of treatment effect b̂1* will tend to be smaller than the
estimate b̂1. Hence, upon closer examination it is not clear which type of hypothesis test,
that based on b̂1* or b̂1, will give greater power in testing H0.
Pitman (Cox & Hinkley, 1974, p. 338) developed a general definition of the asymptotic
relative efficiency of two hypothesis tests which may be applied to determine which of the
two types of test statistics gives greater power. In particular, we have

ARE (b̂1 to b̂1* at b1 = 0) = [lim(b1→0) db1*/db1]⁻² · lim(b1→0) {var(b̂1* | X1)/var(b̂1 | X1)}.
Here we are considering the parameter b1* as a function of b1. Notationally, let
E(Aj) = Σj pr(X2 = j)Aj, that is, expectation over the distribution of X2. According to this
notation, p1 = E(p1j) and p0 = E(p0j). Consequently, b1* can be expressed as

b1* = log{E(p1j)E(q0j) / [E(p0j)E(q1j)]},

and var(b̂1* | X1) can be expressed as

var(b̂1* | X1) = [N1 E(p1j)E(q1j)]⁻¹ + [N0 E(p0j)E(q0j)]⁻¹.
Also, using the independence of X1 and X2, the formula for var(b̂1 | X1) given by (7)
can be expressed as

var(b̂1 | X1) = [N1 E(p1j q1j)]⁻¹ + [N0 E(p0j q0j)]⁻¹.


Now, using the fact that

pij = exp(a + b1 i + b2 j) / [1 + exp(a + b1 i + b2 j)]

for i, j = 0, 1, we take the derivative of b1* with respect to b1 to obtain

db1*/db1 = E(p1j q1j) / [E(p1j)E(q1j)].
As b1 → 0, we also have p10 → p00 and p11 → p01, and thus

lim(b1→0) db1*/db1 = E(p0j q0j) / [E(p0j)E(q0j)].
Similarly,

lim(b1→0) var(b̂1* | X1)/var(b̂1 | X1) = lim(b1→0) [var(b̂1 | X1)]⁻¹/[var(b̂1* | X1)]⁻¹
                                      = E(p0j q0j) / [E(p0j)E(q0j)].
Finally, we obtain the result

ARE (b̂1 to b̂1* at b1 = 0) = {[E(p0j)E(q0j)] / E(p0j q0j)}² · E(p0j q0j) / [E(p0j)E(q0j)]
                           = E(p0j)E(q0j) / E(p0j q0j).
We immediately conclude that ARE (b̂1 to b̂1* at b1 = 0) ≥ 1, with equality occurring if
and only if X2 is independent of (Y, X1). Thus, to test the null hypothesis of no treatment
effect in a randomized study, it is always as or more efficient to adjust for the covariate X2
when logistic models are used. Thus, in this regard the logistic regression model behaves
similarly to the classic linear regression model. This result is essentially a special case of a
result of Gail, Tan & Piantadosi (1988), although these authors work with the score test
rather than the asymptotically equivalent Wald test. We note that it is straightforward to
extend the above derivation to allow for a discrete multivariate X2.
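The inequality can be verified numerically. The sketch below is our own illustration (the function name and example probabilities are ours): it evaluates ARE = E(p0j)E(q0j)/E(p0j q0j), and the identity E(p0j q0j) = E(p0j)E(q0j) - var(p0j) shows why the ratio is at least 1, with equality exactly when p0j does not vary with j.

```python
def are_adjusted_to_pooled(w, p0):
    # w[j] = pr(X2 = j); p0[j] = pr(Y = 1 | X1 = 0, X2 = j).
    # ARE (b1-hat to b1*-hat at b1 = 0) = E(p0j)E(q0j) / E(p0j q0j).
    Ep = sum(wj * pj for wj, pj in zip(w, p0))
    Epq = sum(wj * pj * (1.0 - pj) for wj, pj in zip(w, p0))
    return Ep * (1.0 - Ep) / Epq

# X2 predictive of Y (p0j varies with j): ARE > 1, adjustment helps the test.
print(round(are_adjusted_to_pooled([0.5, 0.5], [0.2, 0.6]), 6))  # 1.2
# X2 independent of Y (p0j constant): ARE = 1, no gain from adjusting.
print(round(are_adjusted_to_pooled([0.5, 0.5], [0.3, 0.3]), 6))  # 1.0
```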

9 Discussion
For classic linear regression models, the precision of the estimator b̂1 depends upon the
relative strengths of the Y - X2 and X1 - X2 associations. In particular, a strong Y - X2
association has a beneficial effect upon precision, whereas a strong X1 - X2 association
has a detrimental effect. It has been, heretofore, conventional wisdom to assume that the
above behavior of classic linear regression with respect to precision applies more
generally to other types of regression models. In this paper, however, we have shown that
the behavior of logistic regression with respect to precision is quite different from that of
classic linear regression. In particular, while a strong X1 - X2 association again has a
detrimental effect upon precision for logistic regression, a strong Y - X2 association also
has a detrimental effect. Consequently, whereas in classic linear regression adjustment for
predictive covariates can result in either increased or decreased precision, adjustment for
predictive covariates will always result in a loss of precision for logistic regression.
However, we have seen that for logistic regression, as for classic regression, adjustment
for predictive covariates results in greater efficiency when testing for a treatment effect in
randomized studies.
In any particular investigation, one may be interested in estimating b1, b1*, or both.
Given the behavior of logistic regression with respect to precision, when the parameter of
interest is b1 it seems plausible that in some situations it might be preferable to use the
biased but more precise b̂1* to estimate b1, rather than the unbiased but less precise b̂1, as
the estimator b̂1* may result in greater accuracy, as measured by mean square error. Work


is currently in progress to determine guidelines for when one might use b̂1* as an estimator
of b1. It should be noted that for large sample sizes the mean square error of b̂1* will be
dominated by its bias rather than its variance component, so that for sufficiently large
samples the adjusted estimator b̂1 will always be preferable.
Although we have largely focused on the cohort design, it is straightforward to show
that the results and arguments extend to other standard designs, including cross-sectional
studies. In further work we shall consider other common generalized linear models used
with dichotomous dependent variables and standard regression models used in survival
analysis applications.

Acknowledgements
We thank W.J. Redfearn for several illuminating discussions, and S. Selvin for showing us an early proof of a
special case of the result of § 4, as well as numerous invaluable discussions. We are also indebted to P. Armitage
for comments with regard to the material in § 8. Support for this research was provided in part by Grant
AI29162 from the National Institute of Allergy and Infectious Diseases. The manuscript was completed while
the second author visited the Department of Statistics, University of Oxford, with support from a travel grant
from the Burroughs Wellcome Fund.

References
Bishop, Y.M.M., Fienberg, S.E. & Holland, P.W. (1975). Discrete Multivariate Analysis: Theory and Practice.
Cambridge, Mass: MIT Press.
Breslow, N.E. & Day, N.E. (1980). Statistical Methods in Cancer Research, 1: The Analysis of Case-Control
Studies. Lyon, France: IARC Scientific Publications.
Breslow, N.E. & Day, N.E. (1987). Statistical Methods in Cancer Research, 2: The Design and Analysis of
Cohort Studies. Lyon, France: IARC Scientific Publications.
Breslow, N.E. & Powers, W. (1978). Are there two logistic regressions for retrospective studies? Biometrics 34,
100-105.
Cox, D.R. & Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall.
Fisher, R.A. (1932). Statistical Methods For Research Workers. Edinburgh: Oliver and Boyd (13th ed., 1958).
Gail, M.H. (1986). Adjusting for covariates that have the same distribution in exposed and unexposed cohorts.
In Modern Statistical Methods in Chronic Disease Epidemiology, Ed. S.H. Moolgavkar and R.L. Prentice,
pp. 3-18. New York: Wiley.
Gail, M.H., Tan, W.Y. & Piantadosi, S. (1988). Tests for no treatment effects in randomized clinical trials.
Biometrika 75, 57-64.
Gart, J.J. (1962). On the combination of relative risks. Biometrics 18, 601-610.
Hardy, G.H., Littlewood, J.E. & Polya, G. (1952). Inequalities. London: Cambridge University Press.
Mantel, N. (1989). Confounding in epidemiologic studies. Biometrics 45, 1317-18.
Mantel, N. & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of
disease. J. Nat. Cancer Inst. 22, 719-48.
McCullagh, P. & Nelder, J.A. (1983). Generalized Linear Models. London: Chapman and Hall.
Weinberg, C.R. (1985). On pooling across strata when frequency matching has been followed in a cohort study.
Biometrics 41, 117-27.
Wickramaratne, P.J. & Holford, T.R. (1989). Confounding in epidemiologic studies. Response. Biometrics 45,
1319-22.
Woolf, B. (1955). On estimating the relationship between blood group and disease. Ann. Human Genetics 19,
251-53.

Résumé
The results of classic linear regression analysis concerning the effect of covariate adjustment on the
precision of an estimator of exposure effect are often assumed to apply more generally to other types of
regression models. In this paper it is shown that such an assumption is not justified in the case of logistic
regression, where the effect of covariate adjustment on precision is quite different. For example, in classic
linear regression, adjustment for a non-confounding predictive covariate results in improved precision,
whereas the same adjustment in logistic regression results in a loss of precision. However, when a treatment
effect is tested in a randomized study, it is always more efficient to adjust for predictive covariates when a
logistic model is used, so that in this respect the behavior of logistic regression is identical to that of classic
linear regression.

[Received May 1990, accepted October 1990]

