
9. Binary Dependent Variables

• 9.1 Homogeneous models


– Logit, probit models
– Inference
– Tax preparers
• 9.2 Random effects models
• 9.3 Fixed effects models
• 9.4 Marginal models and GEE
• Appendix 9A - Likelihood calculations
9.1 Homogeneous models
• The response of interest, yit, now may be only a 0 or a 1, a binary
dependent variable.
– Typically indicates whether the ith subject possesses an
attribute at time t.
• Suppose that the probability that the response equals 1 is denoted
by Prob(yit = 1) = pit.
– Then, we may interpret the mean response to be the probability
that the response equals 1, that is,
E yit = 0 · Prob(yit = 0) + 1 · Prob(yit = 1) = pit.
– Further, straightforward calculations show that the variance is
related to the mean through the expression Var yit = pit (1 − pit).
Inadequacy of linear models
• Homogeneous means that we will not incorporate subject-specific
terms that account for heterogeneity.
• Linear models of the form yit = xit′β + εit are inadequate because:
– The expected response is a probability and thus must lie
between 0 and 1, although the linear combination, xit′β, may
vary between negative and positive infinity.
– Linear models assume homoscedasticity (constant variance) yet
the variance of the response depends on the mean which varies
over observations.
– The response must be either a 0 or 1 although the distribution of
the error term is typically regarded as continuous.
Using nonlinear functions of
explanatory variables
• In lieu of linear, or additive, functions, we express the probability of
the response being 1 as a nonlinear function of explanatory variables,
pit = π(xit′β).
• Two special cases are:
– the logit case, π(z) = e^z / (1 + e^z) = 1 / (1 + exp(−z)), and
– the probit case, π(z) = Φ(z), where Φ is the cumulative standard
normal distribution function.
• These two functions are similar. I focus on the logit case because it
permits closed-form expressions unlike the cumulative normal
distribution function.
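• A minimal Python sketch (my illustration, not from the text) comparing the
two response functions; their values are close once the argument is rescaled.

import numpy as np
from scipy.stats import norm

def logit_pi(z):
    """Logit case: pi(z) = e^z / (1 + e^z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3, 3, 7)
print(np.round(logit_pi(z), 3))   # logit probabilities
print(np.round(norm.cdf(z), 3))   # probit probabilities, Phi(z)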
Threshold interpretation
• Suppose that there exists an underlying linear model,
yit* = xit′β + εit*.
– The response is interpreted to be the “propensity” to possess a
characteristic.
– We do not observe the propensity but we do observe when the
propensity crosses a threshold, say 0.
– We observe
yit = 0 if yit* ≤ 0, and yit = 1 if yit* > 0.
• Using the logistic distribution function,
Prob(εit* ≤ a) = 1 / (1 + exp(−a)).
• Note that, by the symmetry of the logistic distribution,
Prob(−εit* ≤ xit′β) = Prob(εit* ≤ xit′β). Thus,
Prob(yit = 1) = Prob(yit* > 0) = Prob(εit* > −xit′β)
= 1 / (1 + exp(−xit′β)) = π(xit′β).
Random utility interpretation
• In economics applications, we think of an individual choosing
among c categories.
– Preferences among categories are indexed by an
unobserved utility function.
– We model utility as a function of an underlying value plus
random noise, that is, Uitj = uit(Vitj + εitj), j = 0, 1.
– If Uit1 > Uit0 , then denote this choice as yit = 1.
– Assuming that uit is a strictly increasing function, we have
Prob(yit = 1) = Prob(Uit0 < Uit1)
= Prob( uit(Vit0 + εit0) < uit(Vit1 + εit1) )
= Prob( εit0 − εit1 < Vit1 − Vit0 ).
• Parameterize the problem by taking Vit0 = 0 and Vit1 = xit′β.


• We may take the difference in the errors, εit0 − εit1, to be normal
or logistic, corresponding to the probit and logit cases (see the
sketch below).
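• A simulation sketch (assumed values, not from the text) of the random
utility interpretation: with i.i.d. standard Gumbel noise on each utility,
the error difference εit0 − εit1 is standard logistic, so the simulated
choice frequency matches the logit probability π(x′β).

import numpy as np

rng = np.random.default_rng(0)
beta, x = 0.8, 1.5                   # hypothetical slope and covariate value
V1 = x * beta                        # V_it1 = x'beta, with V_it0 = 0
n = 200_000

U0 = 0.0 + rng.gumbel(size=n)        # U_it0 = u(V_it0 + eps_it0), u = identity
U1 = V1 + rng.gumbel(size=n)         # U_it1 = u(V_it1 + eps_it1)
y = (U1 > U0).astype(int)            # y_it = 1 when U_it1 > U_it0

print(y.mean())                      # simulated Prob(y_it = 1)
print(1.0 / (1.0 + np.exp(-V1)))     # logit prediction pi(x'beta)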
Logistic regression
• This is another phrase used to describe the logit case.
• Using p = π(z), the inverse of π can be calculated as
z = π⁻¹(p) = ln( p/(1−p) ).
– Define logit (p) = ln ( p/(1-p) ) to be the logit function.
– Here, p/(1-p) is known as the odds ratio. It has a convenient
economic interpretation in terms of fair games.
• That is, suppose that p = 0.25. Then, the odds ratio is 0.333.
• The odds of winning are 0.333 to 1, or 1 to 3. If we bet $1, then in a
fair game we should win $3.
• The logistic regression model equates the linear combination of explanatory
variables with the logarithm of the odds ratio,
xit′β = ln( pit/(1−pit) ).
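• A small worked check of the fair-game example above:

import numpy as np

p = 0.25
odds = p / (1 - p)          # 0.333...; odds of 1 to 3
print(odds, np.log(odds))   # logit(p) = ln(p/(1-p)) = -1.0986...
print((1 - p) / p)          # a fair $1 bet pays $3 = 1/odds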
Parameter interpretation
• To interpret β = (β1, β2, …, βK)′, we begin by assuming that the jth
explanatory variable, xitj, is either 0 or 1.
• Then, with this notation, we may interpret
βj = (xit1, …, 1, …, xitK)′β − (xit1, …, 0, …, xitK)′β,
where the jth entry is set to 1 and then to 0.
• Thus,
βj = ln( Prob(yit = 1 | xitj = 1) / (1 − Prob(yit = 1 | xitj = 1)) )
− ln( Prob(yit = 1 | xitj = 0) / (1 − Prob(yit = 1 | xitj = 0)) ),
so that
e^βj = [ Prob(yit = 1 | xitj = 1) / (1 − Prob(yit = 1 | xitj = 1)) ]
/ [ Prob(yit = 1 | xitj = 0) / (1 − Prob(yit = 1 | xitj = 0)) ].
• To illustrate, if βj = 0.693, then exp(βj) = 2.
– The odds (for y = 1) are twice as great for xj = 1 as for xj = 0.
More parameter interpretation
• Similarly, assuming that the jth explanatory variable is
continuous, we have
βj = d/dxitj ( xit′β ) = d/dxitj ln( Prob(yit = 1 | xitj) / (1 − Prob(yit = 1 | xitj)) )
= [ d/dxitj ( Prob(yit = 1 | xitj) / (1 − Prob(yit = 1 | xitj)) ) ]
/ [ Prob(yit = 1 | xitj) / (1 − Prob(yit = 1 | xitj)) ].
• Thus, we may interpret βj as the proportional change in the
odds ratio, known as an elasticity in economics.
Parameter estimation
• The customary estimation method is maximum likelihood.
• The log likelihood of a single observation is
yit ln π(xit′β) + (1 − yit) ln(1 − π(xit′β)),
which equals ln(1 − π(xit′β)) if yit = 0 and ln π(xit′β) if yit = 1.
• The log likelihood of the data set is
Σit { yit ln π(xit′β) + (1 − yit) ln(1 − π(xit′β)) }.
• Taking partial derivatives with respect to β yields the score equations
Σit xit π′(xit′β) ( yit − π(xit′β) ) / [ π(xit′β)(1 − π(xit′β)) ] = 0.
– The solution of these equations, say bMLE, yields the maximum
likelihood estimate.
• The score equations can also be expressed as a generalized estimating
equation:
Σit ( ∂ E yit / ∂β ) ( Var yit )⁻¹ ( yit − E yit ) = 0,
• where
E yit = π(xit′β), ∂ E yit / ∂β = xit π′(xit′β), and
Var yit = π(xit′β)(1 − π(xit′β)).
For the logit function
• For the logit case, π′(z) = π(z)(1 − π(z)), so the score equations simplify
to the normal equations:
Σit xit ( yit − π(xit′β) ) = 0.
– The solution depends on the responses yit only through the vector of
statistics Σit xit yit.
• The solution of these equations yields the maximum
likelihood estimate bMLE.
• This method can be extended to provide standard errors for
the estimates (see the sketch below).
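• A minimal sketch (simulated data, not the text's tax preparers example)
solving the normal equations Σit xit ( yit − π(xit′β) ) = 0 by Newton-Raphson:

import numpy as np

rng = np.random.default_rng(1)
n, K = 1000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([-0.5, 1.0, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

b = np.zeros(K)                                # starting value
for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-X @ b))
    score = X.T @ (y - pi)                     # the normal equations
    W = pi * (1.0 - pi)                        # Var y_it = pi (1 - pi)
    b = b + np.linalg.solve((X * W[:, None]).T @ X, score)
    if np.max(np.abs(score)) < 1e-8:
        break

se = np.sqrt(np.diag(np.linalg.inv((X * W[:, None]).T @ X)))
print(np.round(b, 3))                          # b_MLE
print(np.round(se, 3))                         # model-based standard errors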
9.2 Random effects models
• We accommodate heterogeneity by incorporating subject-specific
intercepts of the form:
pit = π(αi + xit′β).
– We assume that the intercepts αi are realizations of random variables
from a common distribution.
• We estimate the parameters of the {αi} distribution and the K slope
parameters β.
• The random effects specification dramatically reduces the
number of parameters to be estimated compared to the Section 9.3 fixed
effects set-up.
– This is similar to the linear model case.
• This model is computationally difficult to evaluate.
Commonly used distributions
• We assume that subject-specific effects are independent and come from a
common distribution.
– It is customary to assume that the subject-specific effects are normally
distributed.
• We assume, conditional on subject-specific effects, that the responses are
independent. Thus, there is no serial correlation.
• There are two commonly used specifications of the conditional
distributions in the random effects panel data model.
– 1. A logistic model for the conditional distribution of a response. That is,
Prob(yit = 1 | αi) = π(αi + xit′β) = 1 / (1 + exp(−(αi + xit′β))).
– 2. A normal model for the conditional distribution of a response. That is,
Prob(yit = 1 | αi) = Φ(αi + xit′β),
where Φ is the standard normal distribution function.

Likelihood
• Let Prob(yit = 1 | αi) = π(αi + xit′β) denote the conditional
probability for both the logistic and normal models.
• Conditional on αi, the likelihood for the it-th observation is:
π(αi + xit′β)^yit (1 − π(αi + xit′β))^(1−yit),
which equals 1 − π(αi + xit′β) if yit = 0 and π(αi + xit′β) if yit = 1.
• Conditional on αi, the likelihood for the ith subject is:
∏t=1,…,Ti π(αi + xit′β)^yit (1 − π(αi + xit′β))^(1−yit).
• Thus, the (unconditional) likelihood for the ith subject is:
li = ∫ ∏t=1,…,Ti π(a + xit′β)^yit (1 − π(a + xit′β))^(1−yit) φ(a) da.
– Here, φ is the standard normal density function.
• Hence, the total log-likelihood is Σi ln li.
– Note: lots of evaluations of a numerical integral….
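• A minimal sketch (my illustration; αi taken as standard normal, matching the
φ(a) above) that evaluates one subject's li by Gauss-Hermite quadrature, a
standard way to handle that numerical integral:

import numpy as np
from numpy.polynomial.hermite import hermgauss

def subject_likelihood(y_i, X_i, beta, n_nodes=30):
    """l_i = int prod_t pi(a + x'b)^y (1 - pi(a + x'b))^(1-y) phi(a) da."""
    nodes, weights = hermgauss(n_nodes)   # rule for int f(x) exp(-x^2) dx
    a = np.sqrt(2.0) * nodes              # change of variables for phi(a)
    w = weights / np.sqrt(np.pi)
    eta = a[:, None] + (X_i @ beta)[None, :]
    pi = 1.0 / (1.0 + np.exp(-eta))
    cond = np.prod(np.where(y_i[None, :] == 1, pi, 1.0 - pi), axis=1)
    return float(w @ cond)

# Hypothetical subject with T_i = 4 observations and K = 2 covariates:
rng = np.random.default_rng(2)
X_i, y_i = rng.normal(size=(4, 2)), np.array([1, 0, 1, 1])
print(subject_likelihood(y_i, X_i, np.array([0.5, -0.3])))
# The total log-likelihood sums ln l_i over subjects i.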
Comparing logit to probit specification
• There are no important advantages or disadvantages when
choosing the conditional probability  to be:
– logit function (logit model)
– standard normal (probit model)
• The likelihood involves roughly the same amount of work to
evaluate and maximize, although the logit function is slightly
easier to evaluate than the standard normal distribution function.
• The probit model is slightly easier to interpret because
unconditional probabilities can be expressed in terms of the
standard normal distribution function.
• That is, with αi normal with mean zero and variance σα²,
Prob(yit = 1) = E Φ(αi + xit′β) = Φ( xit′β / √(1 + σα²) ).
9.3 Fixed effects models
• As with homogeneous models, we express the probability of the
response being 1 as a nonlinear function of linear combinations of
explanatory variables.
• To accommodate heterogeneity, we incorporate subject-specific
variables of the form:
pit = π(αi + xit′β).
– Here, the subject-specific effects account only for the intercepts and
do not include other variables.
– We assume that {αi} are fixed effects in this section.
• In this chapter, we assume that responses are serially uncorrelated.
• Important point: fitting panel data models with subject dummy variables
provides inconsistent parameter estimates….
Maximum likelihood estimation
• Unlike in random effects models, maximum likelihood estimators are inconsistent in
fixed effects models.
– The log likelihood of the data set is
Σit { yit ln π(αi + xit′β) + (1 − yit) ln(1 − π(αi + xit′β)) }.
– This log likelihood can still be maximized to yield maximum likelihood
estimators.
– However, as the number of subjects n tends to infinity, the number of parameters also tends to
infinity.
• Intuitively, our ability to estimate β is corrupted by our inability to estimate
consistently the subject-specific effects {αi}.
– In the linear case, we had that the maximum likelihood estimates are equivalent to the least
squares estimates.
• The least squares estimates of β were consistent.
• The least squares procedure “swept out” intercept estimators when producing
estimates of β.
Maximum likelihood estimation is
inconsistent
• Example 9.2 (Chamberlain 1978; Hsiao 1986).
– Let Ti = 2, K=1 and xi1 = 0 and xi2=1.
– Take derivatives of the likelihood function to get the
score functions – these are in display (9.8).
– From (9.8), the score functions are
∂L/∂αi = yi1 + yi2 − ( e^αi / (1 + e^αi) + e^(αi+β) / (1 + e^(αi+β)) ) = 0
– and
∂L/∂β = Σi ( yi2 − e^(αi+β) / (1 + e^(αi+β)) ) = 0.
– Appendix 9A.1
• Maximize this to get bMLE.
• Show that the probability limit of bMLE is 2β, and hence it is an
inconsistent estimator of β.
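• A simulation sketch of Example 9.2 (assumed heterogeneity distribution and
β = 1, my choices). For this design the conditional MLE is ln(n01/n10), where
n01 and n10 count the two discordant response patterns, and the Appendix 9A.1
calculation shows the unconditional MLE equals twice that, illustrating the
limit of 2β:

import numpy as np

rng = np.random.default_rng(4)
n, beta = 200_000, 1.0
alpha = rng.normal(0.0, 1.0, size=n)          # assumed N(0,1) fixed effects

y1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-alpha)))           # x_i1 = 0
y2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(alpha + beta))))  # x_i2 = 1

n01 = np.sum((y1 == 0) & (y2 == 1))           # discordant pairs (0, 1)
n10 = np.sum((y1 == 1) & (y2 == 0))           # discordant pairs (1, 0)
print(np.log(n01 / n10))                      # conditional MLE: near beta = 1
print(2 * np.log(n01 / n10))                  # unconditional MLE: near 2*beta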
Conditional maximum likelihood
estimation
• This estimation technique provides consistent estimates of the
beta coefficients.
– It is due to Chamberlain (1980) in the context of fixed
effects panel data models.
• Let’s consider the logit specification of π, so that
pit = π(αi + xit′β) = 1 / (1 + exp(−(αi + xit′β))).
• Big idea: With this specification, it turns out that Σt yit is a
sufficient statistic for αi.
– Thus, if we condition on Σt yit, then the distribution of the
responses will not depend on αi.
Example of the sufficiency
• To illustrate how to separate the intercept from the slope
effects, consider the case Ti = 2.
– Suppose that the sum, Σt yit = yi1 + yi2, equals either 0 or 2.
• If the sum equals 0, then Prob(yi1 = 0, yi2 = 0 | yi1 + yi2 = 0) = 1.
• If the sum equals 2, then Prob(yi1 = 1, yi2 = 1 | yi1 + yi2 = 2) = 1.
• Neither conditional probability depends on αi.
• Both conditional events are certain and will contribute nothing
to a conditional likelihood.
– If the sum equals 1,
Prob(yi1 + yi2 = 1) = Prob(yi1 = 0) Prob(yi2 = 1) + Prob(yi1 = 1) Prob(yi2 = 0)
= [ exp(αi + xi1′β) + exp(αi + xi2′β) ]
/ [ (1 + exp(αi + xi1′β)) (1 + exp(αi + xi2′β)) ].
Example of the sufficiency
• Thus,
Prob(yi1 = 0, yi2 = 1 | yi1 + yi2 = 1) = Prob(yi1 = 0) Prob(yi2 = 1) / Prob(yi1 + yi2 = 1)
= exp(αi + xi2′β) / [ exp(αi + xi1′β) + exp(αi + xi2′β) ]
= exp(xi2′β) / [ exp(xi1′β) + exp(xi2′β) ].
• This does not depend on αi.


– Note that if the jth explanatory variable is time-constant (xi1j = xi2j),
then the corresponding parameter βj disappears
from the conditional likelihood.
Conditional likelihood estimation
• Let Si be the random variable representing Σt yit and let sumi be
the realization of Σt yit.
• The conditional likelihood of the data set is
∏i=1,…,n [ pi1^yi1 pi2^yi2 ⋯ piTi^yiTi / Prob(Si = sumi) ].
– Note that the ratio equals one when sumi equals 0 or Ti.
– The distribution of Si is messy and is difficult to compute
for moderate-sized data sets with T more than 10.
• This provides a fix for the problem of “infinitely many
nuisance parameters.”
– Computationally difficult, hard to extend to more complex
models, hard to explain to consumers
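• A minimal sketch (simulated data, my illustration) of the Ti = 2 logit case:
conditioning on yi1 + yi2 = 1 reduces the conditional likelihood to an
ordinary logit of yi2 on the covariate difference xi2 − xi1, with no
intercept, using only the discordant pairs. statsmodels' plain Logit then
does the work:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, beta = 5000, np.array([1.0, -0.5])
alpha = rng.normal(0.0, 2.0, size=n)          # fixed effects, never estimated
x1, x2 = rng.normal(size=(n, 2)), rng.normal(size=(n, 2))
y1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(alpha + x1 @ beta))))
y2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(alpha + x2 @ beta))))

keep = (y1 + y2) == 1                         # sums of 0 or Ti drop out
res = sm.Logit(y2[keep], (x2 - x1)[keep]).fit(disp=0)
print(res.params)                             # consistent estimate of beta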
9.4 Marginal models and GEE
• Marginal models, also known as “population-averaged” models,
only require specification of the first two moments
– Means, variances and covariances
– Not a true probability model
– Ideal for moment estimation (GEE, GMM)
• Begin in the context of the random effects binary dependent
variable model
– The mean is E yit = μit = μit(β, τ) = ∫ π(a + xit′β) dF(a).
– The variance is Var yit = μit (1 − μit).
– The covariance is Cov(yir, yis) = ∫ π(a + xir′β) π(a + xis′β) dF(a) − μir μis.
GEE – generalized estimating equations
• This is a method of moments procedure
– Essentially the same as generalized method of moments
– One matches theoretical moments to sample moments, with
appropriate weighting.
• Idea – find the values of the parameters that satisfy
0K = Σi=1,…,n Gμ(bEE, τEE) Vi(bEE, τEE)⁻¹ ( yi − μi(bEE, τEE) ).
– We have already specified the variance matrix Vi.
– We also use a K × Ti matrix of derivatives
Gμ(β, τ) = ∂μi(β, τ) / ∂β = ( ∂μi1/∂β, …, ∂μiTi/∂β ).
– For binary variables, we have
∂μit/∂β = xit ∫ π′(a + xit′β) dF(a).
Marginal Model
• Choose the mean function to be μit = Φ(xit′β).
– Motivated by the probit specification
Prob(yit = 1) = E Φ(αi + xit′β) = Φ( xit′β / √(1 + σα²) ).
• For the variance function, consider Var yit = μit (1 − μit).
• Let Corr(yir, yis) denote the correlation between yir and yis.
– This is known as a working correlation.
• Use the exchangeable correlation structure specified as
Corr(yir, yis) = 1 for r = s, and Corr(yir, yis) = ρ for r ≠ s.
• Here, the motivation is that the latent variable αi is common to
all observations within a subject, thus inducing a common
correlation.
• The parameters τ = (σα, ρ) constitute the variance components.
Robust Standard Errors
• Model-based standard errors are taken from the square roots of
the diagonal elements of
( Σi=1,…,n Gμ(bEE, τEE) Vi(bEE, τEE)⁻¹ Gμ(bEE, τEE)′ )⁻¹.
• As an alternative, robust or empirical standard errors are taken from
the diagonal elements of
( Σi Gμ Vi⁻¹ Gμ′ )⁻¹ ( Σi Gμ Vi⁻¹ (yi − μi)(yi − μi)′ Vi⁻¹ Gμ′ ) ( Σi Gμ Vi⁻¹ Gμ′ )⁻¹.
• These are robust to misspecified heteroscedasticity as well as
time series correlation (see the sketch below).
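• A minimal sketch (simulated balanced panel, my illustration) fitting the
Section 9.4 marginal model with statsmodels' GEE: probit mean function,
exchangeable working correlation, then both model-based (“naive”) and robust
standard errors:

import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, T = 500, 5
ids = np.repeat(np.arange(n), T)                    # subject index i
alpha = np.repeat(rng.normal(size=n), T)            # latent alpha_i, sigma = 1
x = rng.normal(size=n * T)
y = rng.binomial(1, norm.cdf(alpha + 0.8 * x))      # random effects probit data

X = sm.add_constant(x)
res = sm.GEE(y, X, groups=ids,
             family=sm.families.Binomial(sm.families.links.Probit()),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
print(res.params)                              # slope near 0.8/sqrt(2), per the
                                               # Phi(x'b / sqrt(1 + sigma^2)) identity
print(res.standard_errors(cov_type="naive"))   # model-based standard errors
print(res.standard_errors(cov_type="robust"))  # empirical (sandwich) standard errors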
