
Logit and Probit Ordered and Multinomial Models


LOGIT AND PROBIT
Ordered and Multinomial Models
Vani K. Borooah

Series: Quantitative Applications in the Social Sciences
Series/Number 07-138

SAGE UNIVERSITY PAPERS

Series: Quantitative Applications in the Social Sciences

Series Editor: Michael S. Lewis-Beck, University of Iowa

Editorial Consultants
Richard A. Berk, Sociology, University of California, Los Angeles
William D. Berry, Political Science, Florida State University
Kenneth A. Bollen, Sociology, University of North Carolina, Chapel Hill
Linda B. Bourque, Public Health, University of California, Los Angeles
Jacques A. Hagenaars, Social Sciences, Tilburg University
Sally Jackson, Communications, University of Arizona
Richard M. Jaeger (recently deceased), Education, University of North Carolina, Greensboro
Gary King, Department of Government, Harvard University
Roger E. Kirk, Psychology, Baylor University
Helena Chmura Kraemer, Psychiatry and Behavioral Sciences, Stanford University
Peter Marsden, Sociology, Harvard University
Helmut Norpoth, Political Science, SUNY, Stony Brook
Frank L. Schmidt, Management and Organization, University of Iowa
Herbert Weisberg, Political Science, The Ohio State University

Publisher
Sara Miller McCune, Sage Publications, Inc.

LOGIT AND PROBIT
Ordered and Multinomial Models

VANI K. BOROOAH
University of Ulster

SAGE PUBLICATIONS
International Educational and Professional Publisher
Thousand Oaks London New Delhi

Copyright ©2002 by Sage Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information:

Sage Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com

Sage Publications Ltd.
6 Bonhill Street
London EC2A 4PU
United Kingdom

Sage Publications India Pvt. Ltd.
M-32 Market
Greater Kailash I
New Delhi 110 048 India

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data
Borooah, Vani K.
Logit and probit: ordered and multinomial models
p. cm. (Quantitative applications in the social sciences)
Includes bibliographical references.
ISBN 0-7619-2242-3
1. Social sciences, Statistical methods. 2. Probits. 3. [illegible]
I. Title. II. Sage university papers series. Quantitative applications in the social sciences; no. 07-138
HA31.7 .B67 2001
519.2'--dc21

This book is printed on acid-free paper.

05 06 10 9 8 7 6 5

Acquiring Editor: C. Deborah Laughton
Editorial Assistant: Veronica Novak
Production Editor: Denise Santoyo
Production Assistant: Kathryn Journey
Typesetter: Technical Typesetting Inc.

When citing a university paper, please use the proper form. Remember to cite the Sage University Paper series title and include paper number. One of the following formats can be adapted (depending on the style manual used):

(1) BOROOAH, V. K. (2001) Logit and Probit: Ordered and Multinomial Models. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-138. Thousand Oaks, CA: Sage.

OR

(2) Borooah, V. K. (2001). Logit and Probit: Ordered and Multinomial Models. (Sage University Papers Series on Quantitative Applications in the Social Sciences, series no. 07-138). Thousand Oaks, CA: Sage.

CONTENTS

Series Editor's Introduction
Preface
1. Introduction
2. Ordered Models
   Introduction
   Methodology
   Application to Deprivation Status
   Estimation Over Subsamples: Characteristics Versus Coefficients
3. Multinomial Logit
   Introduction
   A Random Utility Model
   The Class of Logit Models: Multinomial and Conditional
   Multinomial Logit
   Application to Occupational Outcomes
   Country of Birth
   Area of Residence
   Conditional Logit and the Independence of Irrelevant Alternatives
4. Program Listings
   Introduction
   Ordered Probit and Logit Programs
   Multinomial Logit Programs
Notes
References
About the Author

SERIES EDITOR'S INTRODUCTION

For ordinary least squares (OLS) to yield BLUE estimators, the classical regression assumptions must be met. Some of these assumptions are easier to meet than others. Further, the substantive consequences of their violation vary, from assumption to assumption. One assumption that can be hard to meet, and that has serious consequences for OLS interpretation if not met, is the assumption that the dependent variable is continuous. If instead the dependent variable is discrete, consisting of two or more outcome categories, then OLS poses serious inference problems. In such circumstances, maximum likelihood techniques such as logit or probit are generally more efficient. The Sage QASS series has given considerable attention to application of logit or probit when the dependent variable is dichotomous. Consult the relevant sections in No. 45, Linear Probability, Logit, and Probit Models, by Aldrich and Nelson; No. 86, Logit Modeling, by DeMaris; No. 101, Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models, by Liao; No. 106, Applied Logistic Regression Analysis, by Menard; and No. 132, Logistic Regression: A Primer, by Pampel. While these works may touch on discrete dependent variables with categories > 2, they do not emphasize them.

The monograph at hand is unique, because it attends exclusively to estimation when the dependent variable has multiple categories. After an introduction, dependent variables that are discrete and ordered are considered. For example, suppose a political scientist has election survey data, and wishes to explain the dependent variable of Political Interest, where respondents are scored 0 = low, 1 = moderate, 2 = high. The variable is discrete, with respondents falling into one of three categories. Further, the variable is ordered from "low" to "high" interest. With such an ordinal variable, we might say that someone who scored "high" has more political interest than someone scored "low," but we cannot say precisely how much more. Thus, OLS regression seems less desirable, ordered logit or ordered probit more desirable, for they accommodate this lower measurement level. Professor Borooah explicates both procedures in an effort to account for individual differences in social deprivation (measured in three categories, "not deprived," "mildly deprived," and "severely deprived"). One question that often comes up is whether logit is preferred over probit, or vice-versa. The fundamental theoretical difference between the two approaches concerns the distribution of the error term, logistic versus normal. In practice, as noted here, it is difficult to justify the selection of one over the other.

Treatment is eventually extended to multinomial, or non-ordered, dependent variables with categories > 2. For example, choice of religion, choice of neighborhood, choice of shopping center, choice of job. A key assumption of multinomial logit is the Independence of Irrelevant Alternatives (IIA). As Professor Borooah discusses, this assumption is both the strength and the weakness of the technique. He also makes the important but often forgotten distinction between odds-ratios and risk-ratios. With binary logit there is no difference between the two ratios; however, with multinomial logit outcomes are given in terms of risk-ratios.

The text concludes with helpful detail on the computer programs actually used to obtain the table results. This step-by-step explication of computing procedure allows readers to see how to run the analyses. The exposition is in STATA, but the author also points out other available software in SAS, SPSS, and LIMDEP. Overall, the monograph provides a current guide to estimating and interpreting results from the more complex discrete dependent variable models.

-Michael S. Lewis-Beck
Series Editor

PREFACE

There are many instances where the appropriate variable for analysis is merely a coding for some qualitative outcome. As a consequence, in such situations the dependent variable takes a discrete number of mutually exclusive, and collectively exhaustive, values. This is in contrast to other situations in which, at least conceptually, the dependent variable assumes a continuum of values. Although conventional regression methods are not appropriate for the statistical analysis of discrete dependent variables, one can, nonetheless, in the spirit of regression analysis, construct models which link the observed outcome to the values of certain "determining" or "explanatory" variables. This monograph discusses the estimation, simulation and interpretation of multiple (> 2) outcome models, with ordered and with unordered outcomes, against the backdrop of questions relating to socioeconomic inequality.

In preparing this monograph, I am grateful to Michael Lewis-Beck and to two anonymous referees for their valuable comments. I am also grateful to the Social Policy Association for permission to reproduce material from my paper, "Targeting social need: Why are deprivation levels in Northern Ireland higher for Catholics than for Protestants?" (Borooah, 2000) and to the Scottish Economic Society for permission to reproduce material from my paper, "How do employees of ethnic origin fare on the occupational ladder in Britain?" (Borooah, 2001). The results reported in this monograph are based on data from the 1991 Census for Great Britain. This data, which is Crown copyright, was kindly made available by the Census Microdata Unit at the Cathy Marsh Centre for Census and Survey Research, University of Manchester, through funding by JISC/ESRC/DENI. Needless to say, however, I alone remain responsible for the results, their interpretation and, indeed, for any errors that this work might contain.

1. INTRODUCTION

Kenneth Tynan, the theatre critic, once famously described his profession as consisting of people "who know the way, but can't drive the car." Researchers in the social sciences often find themselves in an analogous situation. Some can locate points on their research map but are uncertain how to get from one place to another. Others can trace their path through the suburbia of research but feel less confident of getting behind the steering wheel. A fortunate few can both navigate and pilot. This observation defines the broad purposes of this monograph, which are (a) to trace the main paths that lead through the landscape of ordered and multinomial logit models, and (b) to offer instruction in driving along these routes. But first, before gears are engaged, some words by way of introduction.

There are many instances where the appropriate variable for analysis is merely a coding for some qualitative outcome. Such models are known as qualitative choice models. For example, in judging a government's performance, a person strongly approves (coding = 1); approves (coding = 2); disapproves (coding = 3); or strongly disapproves (coding = 4). Or, a person votes Liberal (coding = 1), Conservative (coding = 2), or Labour (coding = 3). As a consequence, in such situations, the dependent variable takes a discrete number of mutually exclusive, and collectively exhaustive, values. This is in contrast to other situations in which, at least conceptually, the dependent variable assumes a continuum of values. Although conventional regression methods are not appropriate for the statistical analysis of discrete dependent variables, one can, nonetheless, in the spirit of regression analysis, construct models which link the observed outcome to the values of certain "determining" or "explanatory" variables. Qualitative choice models in which the dependent variable takes more
than two values are known as multiple outcome models. One may further subdivide the class of multiple outcome models into those involving (a) ordered outcomes (such as the example on the degree of approval of governmental performance, above) and (b) unordered outcomes (such as the voting example, above).

Qualitative choice models have become a growth industry in applied econometric analysis. Social scientists have always had an interest in the choice between mutually exclusive options. The increasing availability of survey data (in either cross-section or panel form) has meant that more and more scholars are able to translate their intellectual speculation into hard results. Scholars come to these data with a specific set of questions to which they seek answers. However, when these data are analyzed, the results are often shrouded in a Delphic obliqueness about what the "right" answers might be. In order to dispel this fog of ambiguity, textbook techniques have to be manipulated so that the derived results point clearly in the right direction. This monograph discusses the estimation, simulation and interpretation of multiple outcome models, with ordered and unordered outcomes, against the backdrop of questions relating to socioeconomic inequality.

Ordered and unordered models require different techniques for their respective analysis. Ordered models may be estimated by either logit methods, which are known as ordered logit models, or by probit methods, which are known as ordered probit models. Models where the outcomes are unordered are most easily estimated by logit methods. Although in principle it is possible to estimate such models by probit, for computational reasons it is often not feasible. For that reason, multiple outcome models with unordered outcomes are referred to as multinomial logit models.

Multinomial logit models may be conditional, which means that the choices between alternatives may depend not just upon the characteristics of the individual making the choice but also upon the attributes of the choice. For example, the choice by individuals of which shopping center to patronize may depend upon the attributes of the centers (number and variety of shops, the standard of upkeep of the centers), which do not vary across the individuals, and upon income and family size, which do vary across the individuals. A further complication is that individual characteristics and choice attributes may interact: the choice of shopping center may depend on the distance which an individual has to travel from his/her place of residence (residence being an individual's characteristic) to a particular shopping center (the location of the center being a choice attribute).

Ordered models are discussed in Chapter 2 and multinomial logit models are discussed in Chapter 3. Both chapters are united by two aims, (a) to convey an understanding of the underlying methodology of the models and (b) to impart an ability to use the models for research in the social sciences.

Meeting the first aim involves steering a safe path between the Scylla of oversimplification and the Charybdis of excessive technicality. I have tried to navigate this narrow passage without, I hope, holing the ship. However, in order to lighten its load, I have assumed that the reader has sufficient knowledge of the material that forms the "prequel" to this monograph. In particular, these include:

(1) The deficiencies of the linear probability model, or why the methods of ordinary regression analysis are not appropriate for analyzing models with discrete dependent variables.
(2) A familiarity with logit and probit methods as applied to models where the dependent variable has only two possible outcomes.

For those wishing to refresh their memories on these topics, earlier monographs in this series by Aldrich and Nelson (1984), DeMaris (1992), Liao (1994) and Menard (1995), as well as more general econometrics texts like Greene (2000), provide an excellent review.

Meeting the second aim is, in my view, more difficult. Using a model involves several layers of understanding. First, one needs to be clear about the questions to ask. Next, there is the problem of how to answer the questions. Typically, a particular research question can be answered in more than one way, and so it is important to know how these different ways differ, why they differ and what might be the best way for addressing the problem at hand. Lastly, after one has decided on the questions and how they are to be answered, there is the practical problem of going about obtaining the answers: of implementing one's research strategy by obtaining and interpreting results.

I have tried to address these issues, in the context of ordered logit and probit, and multinomial logit, models, by adopting a three-fold strategy. First, I have tried to secure the maximum possible overlap between the exposition of the methodology and the empirical analysis. There is very little that is set out in the theoretical sections of any chapter that is not echoed in the sections of that chapter concerned with the application of the methodology.

Second, I have anchored the exposition in two major pieces of applied work. I did so because I felt that having a single empirical (and
"real world") thread running through a chapter might better illuminate the use of a model than the separate strands of fragmented examples. Chapter 2, on ordered logit and probit, is grounded in empirical work on deprivation. This work uses data from the Northern Ireland census on nearly 14,000 individuals to examine where the roots of deprivation and inequalities in the deprivation experience between Catholics and Protestants might lie. Chapter 3, on multinomial logit models, is rooted in empirical work on occupational attainment by ethnic minorities in Britain. This work uses data from the British census on nearly 100,000 male full-time employees to examine the different chances of black Caribbeans, Indians, and whites of being in various occupational categories.

Third, the last chapter contains a complete listing of the computer programs used to generate the empirical results. There are, to my knowledge, at least four well-known and highly regarded pieces of software which, among other things, handle problems of the kind discussed in this monograph: SAS; SPSS v10.0 (a good introduction to the use of SAS and SPSS procedures in the analysis of events with ordered and unordered outcomes is provided by the Stat/Math Center at the Indiana University¹); LIMDEP (see Greene, 1995); and STATA (see STATA, 1999). It just so happens that, by a quirk of fate, I am most familiar with STATA. For this reason the programs of Chapter 4 are written in STATA code. Almost every line in these programs has a comment attached to it that explains what it is supposed to do and how it relates to the material of the earlier chapters. The earlier chapters map the route; Chapter 4 teaches how to drive the car!

2. ORDERED MODELS

Introduction

Suppose that there are $N$ persons (indexed $i = 1, \ldots, N$) for each of whom an "event" can occur. Suppose that this event has $M > 2$ outcomes, indexed $j = 1, \ldots, M$, where these outcomes are mutually exclusive and collectively exhaustive. Let the values taken by the variable $Y_i$ represent these outcomes for person $i$ such that: $Y_i = 1$ if the first outcome occurs for this person ($j = 1$); $Y_i = 2$ if the second outcome occurs ($j = 2$) and so on till $Y_i = M$ if the last outcome occurs ($j = M$). Suppose further that these outcomes are inherently ordered, by which is meant that the outcome associated with a higher value of the variable $Y_i$ is ranked higher than the outcome associated with a lower value of the variable. Another way of expressing this is to say that the dependent variable, $Y_i$, associated with the outcomes is ordinal: "stronger" outcomes are associated with higher values of the variable. However, this ordinal nature of the outcomes has no implication for differences in the strength of the outcomes; the outcome associated with $Y_i = 2$ is not twice as strong as that associated with $Y_i = 1$. Consequently, the actual values taken by an ordinal dependent variable are irrelevant, so long as larger values correspond to stronger outcomes: we could have defined $Y_i = 5$ if the first outcome occurred; $Y_i = 7$ if the second outcome occurred and so on.

An example of ordered outcomes is provided by a person's health status. The outcomes associated with this, e.g., "poor," "good," and "excellent," could be represented by a variable taking, respectively, the values 1, 2 and 3. In this example, the outcome associated with $Y_i = 3$ (excellent health) is better than that associated with $Y_i = 2$ (good health) and this, in turn, is better than the outcome associated with $Y_i = 1$ (poor health). Other examples of ordered outcomes are outcomes relating to the level of insurance coverage taken by a person (no cover, part cover, full cover) or outcomes relating to the employment status of working-age persons (inactive, unemployed, employed). One example of a nonordered outcome is a person's religion, e.g., $Y_i = 1$ for Christians, $Y_i = 2$ for Jews, $Y_i = 3$ for Muslims, and $Y_i = 4$ for Hindus. While the outcomes are all different in this example, they cannot be ranked and, therefore, cannot be regarded as ordered outcomes. To put it differently, the dependent variable associated with the religion outcomes is not ordinal.

When the outcomes are clearly ordered, one should take account of the fact that the dependent variable is both discrete and ordinal. For example, if the outcomes are coded 1, 2, 3, a linear regression would treat the difference between a 3 and a 2 identically to the difference between a 2 and a 1 whereas, in fact, the numbers are only a ranking and have no cardinal significance. On the other hand, to estimate an econometric relation with an ordinal dependent variable using the methods of multinomial logit (discussed in the next chapter) would mean that the information conveyed by the ordered nature of the data was being discarded. The most commonly used and appropriate methods for estimating models with more than two outcomes,²
when the dependent variable associated with the outcomes is both discrete and ordinal, are those of ordered logit and ordered probit.³

However, the above observation is subject to an important caveat. A critical assumption of ordered logit and probit is that of parallel slopes. The implications of this assumption are discussed in some detail below, but in essence it means that if there is a variable which affects the likelihood of a person being in the ordered categories (e.g., diet on health status) then it is assumed that the coefficients linking the variable value to the different outcomes will be the same across all the outcomes (a given diet will affect the likelihood of a person being in excellent health exactly as it will affect the likelihood of him or her being in poor health). If this assumption is invalid, so that the slope coefficients associated with a particular variable are different across the different outcomes (a given diet will affect the likelihood of a person being in excellent health differently than it will affect the likelihood of him or her being in poor health), then the methods of ordered logit and probit are no longer appropriate and the model ought to be estimated using the methods of multinomial logit (discussed in Chapter 3).

The fact that it is not always possible to unambiguously identify outcomes as ordinal provides another reason for being cautious in the use of ordered estimation methods. For example, where a person lives in a city (North, South, East, or West) is ostensibly a nonordered variable. But if one knew that certain parts of the city provided more salubrious living conditions than other parts, then a variable defining a person's place of residence could acquire an ordinal connotation with, say, living in the North ($Y_i = 4$) being "better" than living in the South ($Y_i = 3$). However, in the face of uncertainty about whether a variable is ordered or nonordered, a sensible rule might be to regard it as nonordered and, as a corollary, to estimate models using it as a dependent variable by the methods of multinomial logit. This rule is sensible because treating an outcome variable as ordered, when in fact it is nonordered, imposes a ranking on the outcomes that they do not possess and invokes the restrictive assumption of parallel slopes (referred to above and discussed in detail below), which is likely to bias the estimates. On the other hand, not treating an outcome variable as ordered, when in fact it is ordered, fails to impose a legitimate ranking on the outcomes. This omission may lead to a loss of efficiency, but it is unlikely to bias the estimates. In the face of these two possible errors, the loss of efficiency is a less serious error to make than that of biased estimates.

Another example of such ambiguity, and one that drives the next chapter on multinomial logit, is provided by a person's occupation. Whether a person works in an unskilled/semi-skilled, skilled, or professional/managerial job could be regarded as a matter of individual choice, although the constraints that affect this choice may differ from person to person and vary according to race and/or gender inter alia. On this choice of interpretation, occupational outcomes could be viewed as "nonordered," meaning there is nothing inherently desirable or undesirable in one type of occupational category over another. For this interpretation, the appropriate estimation method for analyzing occupational outcomes is multinomial logit. Indeed, this was precisely the method adopted by Schmidt and Strauss (1975) in their analysis of the occupation of 1,000 persons in terms of their education, experience, race and sex. This method was echoed in Greene (2000, p. 859) in providing an example of the use of multinomial logit.

On the other hand, if a university professor was asked whether he or she would prefer to have a banker or a janitor as his or her son-in-law, that professor might plump for the former. In expressing this preference, the professor is implicitly "ranking" occupations, with bankers being in higher ranked jobs (making more desirable sons-in-law) than janitors. However, the point is that this ranking is purely subjective⁴ (i.e., there is nothing inherently more desirable about being a banker than about being a janitor) and does not carry the objectivity that would be attached to the professor's preference that his or her son-in-law should enjoy good health rather than suffer ill health. The moral of the story is that it is better to treat outcomes as nonordered unless one has good reasons for imposing a ranking.⁵

Methodology

The methodology and underlying logic of ordered logit and probit models are perhaps best presented using a concrete example. Suppose that there are $N$ persons (indexed $i = 1, \ldots, N$) living in an area and that each person's "degree of deprivation" can be represented by the value of a variable $D_i$, such that higher values of $D_i$ represent higher degrees of deprivation. The value assumed by this "deprivation index" for a particular person, hereafter referred to as
his or her "deprivation score"--depends upon a variety of factors per- The 8 1, 82 ;::: O of E quation 2.2 are unknown parameters ( ô 1 < 8 2 ) to
taining to that person. Examples of such factors might include being be estimated along with the f3 k of Equation 2.1. A persmú classifica-
unemployed, being a single parent, and living in a particular area. tion in terms of deprivation levei depends upon whether or not his or
Suppose that the deprivation index, D;, is a linear function of K fac- her deprivation score, D;, crosses a threshold. The probabilities of Y;
tors ("determining variables") whose values, for individual í, are X ;k> taking values 1, 2 and 3 are given by
k = 1, .. . , K. This means that the deprivation index can be repre-
sented as Pr(Y; = 1) = Pr(Z;+e;::; ô 1 ) =Pr(e; - ~ 8 1 - Z;)

K Pr( Y; =2)=Pr(ôl:::: Z;+e; :::: o2) = Pr(ôl -Z; < S;:::: ô2- z,) (2.3)
D ; = Lf3kXik +e;= Z; +e;. (2.1) Pr(Y;=3 ) =Pr(Z ;+e;;::: 82 ) = Pr(si _: ô 2 - Z;) .
k= l

where f3k is the coef:ficient associated with the k 1h variable (k =


Each of the N observations is treated as a single draw trom a
1, ... , K) and Z; = 'Lf=t f3kXik· An increase in the value of the k 1h
multinomial distribution, and in this case the multinomial distribu-
factor for a particular person will cause his or her deprivation score
tion has three outcomes, not deprived, mildly deprived, and severely
to rise if f3k > O and fali if f3k < O. However, because the relationship
deprived. Suppose that of the N persom;, N 1 were not deprived, N 2
between the deprivation score and the deprivation-inducing factors is
were mildly deprived and N 3 were severely deprived. 6 Then th e like-
not an exact one (for example, there may be factors left out of the equation, or factors may be measured inaccurately), an error term, $\varepsilon_i$, is included in the equation to capture this inexactitude.

The problem with the formulation in Equation 2.1 is that the exact shade of a person's deprivation, as represented by the values of $D_i$, is difficult, if not impossible, to observe. The deprivation index is a latent variable which, though conceptually useful, is unobservable either in principle or in practice, and Equation 2.1 is a latent regression which, as it stands, cannot be estimated.

However, what can be observed is a person's deprivation level: a person can, for example, be classified as being "not deprived," "mildly deprived," or "severely deprived." A variable $Y_i$ can be associated with these deprivation levels, such that $Y_i = 1$ if the person is not deprived, $Y_i = 2$ if the person is mildly deprived, and $Y_i = 3$ if the person is severely deprived. In terms of the earlier discussion, $Y_i$ is an ordinal variable. The categorization of the persons in the sample in terms of the three levels of deprivation is implicitly based on the values of the latent variable $D_i$, in conjunction with "threshold" values $\delta_1$ and $\delta_2$, such that

$$Y_i = 1 \quad \text{if } D_i \le \delta_1$$
$$Y_i = 2 \quad \text{if } \delta_1 \le D_i \le \delta_2 \qquad (2.2)$$
$$Y_i = 3 \quad \text{if } D_i \ge \delta_2.$$

... likelihood of observing the sample, which is simply the product of the probabilities of the individual observations, is

$$L = [\Pr(Y_i = 1)]^{N_1}\,[\Pr(Y_i = 2)]^{N_2}\,[\Pr(Y_i = 3)]^{N_3}$$
$$\;\; = [F(\delta_1 - Z_i)]^{N_1}\,[F(\delta_2 - Z_i) - F(\delta_1 - Z_i)]^{N_2}\,[1 - F(\delta_2 - Z_i)]^{N_3}, \qquad (2.4)$$

where $F(x) = \Pr(\varepsilon_i < x)$ is the cumulative probability distribution of the error terms. If we knew the probability distribution of the error terms (that is, if we knew what $F(x)$ was), then we could choose as our estimates of $\beta_k$, $\delta_1$, and $\delta_2$ those values which maximized the likelihood of observing the sample observations.7 In the absence of such knowledge, we could assume that the error terms followed a particular probability distribution.

The difference between the ordered logit and the ordered probit models lies in the (assumed) distribution of $\varepsilon_i$, the error term in Equation 2.1. An ordered logit model is the result of assuming that $\varepsilon_i$ is logistically distributed, while an ordered probit model is the result of assuming that $\varepsilon_i$ is normally distributed. It is natural to ask which distribution is the appropriate one to use.8 The logistic distribution is similar to the normal except in the tails, which are considerably heavier.9 As Greene (2000) points out, "it is difficult to justify the
choice of one distribution over the other on theoretical grounds ... in most applications, it seems not to make much difference" (p. 815).

Using the estimated values $\hat{\beta}_k$ of the coefficients $\beta_k$ allows an estimated value $\hat{Z}_i = \sum_{k=1}^{K} \hat{\beta}_k X_{ik}$ to be computed for each individual in the sample. Using the $\hat{Z}_i$ in conjunction with $\hat{\delta}_1$ and $\hat{\delta}_2$, which are the estimated values of the cutoff parameters $\delta_1$ and $\delta_2$, allows the probabilities of being at different levels of deprivation to be estimated for every person in the sample. These estimates, respectively denoted $\hat{p}_{i1}$, $\hat{p}_{i2}$, and $\hat{p}_{i3}$, are computed as

$$\hat{p}_{i1} = \Pr(\varepsilon_i \le \hat{\delta}_1 - \hat{Z}_i) = F(\hat{\delta}_1 - \hat{Z}_i) \qquad (2.5a)$$
$$\hat{p}_{i2} = \Pr(\hat{\delta}_1 - \hat{Z}_i < \varepsilon_i \le \hat{\delta}_2 - \hat{Z}_i) = F(\hat{\delta}_2 - \hat{Z}_i) - F(\hat{\delta}_1 - \hat{Z}_i) \qquad (2.5b)$$
$$\hat{p}_{i3} = \Pr(\varepsilon_i \ge \hat{\delta}_2 - \hat{Z}_i) = 1 - F(\hat{\delta}_2 - \hat{Z}_i) \qquad (2.5c)$$

where $\sum_{j=1}^{3} \hat{p}_{ij} = 1$ for all $i = 1, \ldots, N$.

The model described above is also known as the proportional-odds model, because if one considers the odds-ratio with respect to some category $j = m$,

$$OR(m) = \frac{\Pr(Y_i \le m)}{\Pr(Y_i > m)},$$

then this ratio is independent of the category $m$. The odds ratio is assumed to be constant for all categories.10

A Clarification on Notation

The ordered regressions as estimated by STATA (whether logit or probit) do not explicitly include an intercept term: in other words, the $\beta_k$ ($k = 1, \ldots, K$) in Equation 2.1 are all slope coefficients. The intercept term is not explicitly shown because it is absorbed, in a manner to be shown below, into the cutoff points, $\delta_1$ and $\delta_2$. On the other hand, Greene (2000, p. 876), in his formulation of the ordered regression, explicitly includes an intercept term. His equivalent of Equation 2.1 is

$$D_i = \beta_0 + \sum_{k=1}^{K} \beta_k X_{ik} + \varepsilon_i = \beta_0 + Z_i + \varepsilon_i = W_i + \varepsilon_i \qquad (2.6)$$

where $\beta_0$ is the intercept term and $W_i = \beta_0 + Z_i$. Greene's cutoff points are denoted by $\mu_1$ and $\mu_2$ (which are different from STATA's cutoff points $\delta_1$ and $\delta_2$; precisely how they are different is shown below). Greene (2000) then sets the first cutoff point, $\mu_1$, to zero.11 Therefore, under his formulation, the equations for the probabilities $\hat{p}_{i1}$, $\hat{p}_{i2}$, and $\hat{p}_{i3}$ become

$$\hat{p}_{i1} = \Pr(\varepsilon_i \le -\hat{W}_i) = F(-\hat{W}_i) = F(-\hat{\beta}_0 - \hat{Z}_i) = F(\hat{\delta}_1 - \hat{Z}_i) \qquad (2.7a)$$
$$\hat{p}_{i2} = \Pr(-\hat{W}_i < \varepsilon_i \le \hat{\mu}_2 - \hat{W}_i) = F(\hat{\mu}_2 - \hat{W}_i) - F(-\hat{W}_i)$$
$$\;\; = F(\hat{\mu}_2 - \hat{\beta}_0 - \hat{Z}_i) - F(-\hat{\beta}_0 - \hat{Z}_i) = F(\hat{\delta}_2 - \hat{Z}_i) - F(\hat{\delta}_1 - \hat{Z}_i) \qquad (2.7b)$$
$$\hat{p}_{i3} = \Pr(\varepsilon_i \ge \hat{\mu}_2 - \hat{W}_i) = 1 - F(\hat{\mu}_2 - \hat{W}_i) = 1 - F(\hat{\mu}_2 - \hat{\beta}_0 - \hat{Z}_i) = 1 - F(\hat{\delta}_2 - \hat{Z}_i). \qquad (2.7c)$$

Equations 2.5a to 2.5c of STATA and Equations 2.7a to 2.7c of Greene (2000) are equivalent12 when $\delta_j = \mu_j - \beta_0$. That is to say, STATA's cutoff points are equal to Greene's cutoff points less the intercept term. In that sense, STATA absorbs the intercept term into its cutoff points.13

Ordered Logit

Under a logistic distribution, the cumulative distribution function of the random variable $X$ is

$$\Pr(X \le x) = \Lambda(x) = \exp(x)/[1 + \exp(x)] = 1/[1 + \exp(-x)], \qquad (2.8)$$

and so, if it is assumed that the error terms follow a logistic distribution,

$$\Pr(Y_i = 1) = \Lambda(\delta_1 - Z_i) = 1/[1 + \exp(Z_i - \delta_1)] \qquad (2.9a)$$
$$\Pr(Y_i = 2) = \Lambda(\delta_2 - Z_i) - \Lambda(\delta_1 - Z_i) = 1/[1 + \exp(Z_i - \delta_2)] - 1/[1 + \exp(Z_i - \delta_1)] \qquad (2.9b)$$
$$\Pr(Y_i = 3) = 1 - \Lambda(\delta_2 - Z_i) = 1 - 1/[1 + \exp(Z_i - \delta_2)]. \qquad (2.9c)$$
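The equivalence between the two parameterizations can be checked numerically. The following is a minimal Python sketch, not taken from the text: the function names and the values of `z`, `beta0`, and `mu2` are hypothetical and chosen only for illustration. It evaluates Equations 2.9a to 2.9c with STATA-style cutoffs and Equations 2.7a to 2.7c with Greene-style cutoffs (logistic $F$), using $\delta_j = \mu_j - \beta_0$ with $\mu_1 = 0$.

```python
import math

def logistic_cdf(x):
    """Lambda(x) = 1 / (1 + exp(-x)), as in Equation 2.8."""
    return 1.0 / (1.0 + math.exp(-x))

def probs_stata(z, delta1, delta2):
    """Equations 2.9a to 2.9c: no intercept, two free cutoffs."""
    p1 = logistic_cdf(delta1 - z)
    p2 = logistic_cdf(delta2 - z) - logistic_cdf(delta1 - z)
    p3 = 1.0 - logistic_cdf(delta2 - z)
    return p1, p2, p3

def probs_greene(z, beta0, mu2):
    """Equations 2.7a to 2.7c: explicit intercept, first cutoff fixed at 0."""
    w = beta0 + z
    p1 = logistic_cdf(-w)
    p2 = logistic_cdf(mu2 - w) - logistic_cdf(-w)
    p3 = 1.0 - logistic_cdf(mu2 - w)
    return p1, p2, p3

# Hypothetical illustrative values (assumptions, not estimates from the text).
z, beta0, mu2 = 0.4, -0.2, 1.9

# STATA cutoffs implied by Greene's: delta_j = mu_j - beta_0, with mu_1 = 0.
delta1, delta2 = 0.0 - beta0, mu2 - beta0

stata = probs_stata(z, delta1, delta2)
greene = probs_greene(z, beta0, mu2)
print(stata)
print(greene)
```

Both calls should print the same three probabilities, which sum to one; only the labelling of the intercept and cutoffs differs between the two formulations.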

The estimates of the $\beta_k$, $\delta_1$, and $\delta_2$ are obtained by maximizing the likelihood function (Equation 2.4), using the logistic distribution function $\Lambda(\cdot)$ in place of $F(\cdot)$.

Ordered Probit

The cumulative distribution of a standard normal variate14 (SNV) $X$ is

$$\Pr(X < x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp(-X^2/2)\, dX, \qquad (2.10)$$

and so, if it is assumed that the error terms are SNVs,

$$\Pr(Y_i = 1) = \Phi(\delta_1 - Z_i) \qquad (2.11a)$$
$$\Pr(Y_i = 2) = \Phi(\delta_2 - Z_i) - \Phi(\delta_1 - Z_i) \qquad (2.11b)$$
$$\Pr(Y_i = 3) = 1 - \Phi(\delta_2 - Z_i). \qquad (2.11c)$$

The estimates of the $\beta_k$, $\delta_1$, and $\delta_2$ are obtained by maximizing the likelihood function (Equation 2.4), using the normal distribution function $\Phi(\cdot)$ in place of $F(\cdot)$.

Marginal Effects: Continuous Variables

A natural question to ask is how the probabilities of the various outcomes would change when the value of one of the variables influencing the outcomes changes. For example, if age is a factor which influences deprivation, then how would a person's probability of being at the different deprivation levels (not deprived, mildly deprived, and severely deprived) be affected if he or she was a year older or younger? Because $Z_i$ enters each probability with a negative sign and $\partial Z_i / \partial X_{ik} = \beta_k$, the marginal effect on the three probabilities for person $i$ of a small change in $X_{ik}$ (the value of the $k$th determining variable for person $i$), under a logistic distribution, is

$$\frac{\partial \Pr(Y_i = 1)}{\partial X_{ik}} = \frac{d}{dZ_i}[\Lambda(\delta_1 - Z_i)] \frac{\partial Z_i}{\partial X_{ik}} = -\Lambda'(\delta_1 - Z_i)\beta_k \qquad (2.12a)$$
$$\frac{\partial \Pr(Y_i = 2)}{\partial X_{ik}} = \frac{d}{dZ_i}[\Lambda(\delta_2 - Z_i) - \Lambda(\delta_1 - Z_i)] \frac{\partial Z_i}{\partial X_{ik}} = [\Lambda'(\delta_1 - Z_i) - \Lambda'(\delta_2 - Z_i)]\beta_k \qquad (2.12b)$$
$$\frac{\partial \Pr(Y_i = 3)}{\partial X_{ik}} = \frac{d}{dZ_i}[1 - \Lambda(\delta_2 - Z_i)] \frac{\partial Z_i}{\partial X_{ik}} = \Lambda'(\delta_2 - Z_i)\beta_k, \qquad (2.12c)$$

and under a normal distribution is

$$\frac{\partial \Pr(Y_i = 1)}{\partial X_{ik}} = \frac{d}{dZ_i}[\Phi(\delta_1 - Z_i)] \frac{\partial Z_i}{\partial X_{ik}} = -\Phi'(\delta_1 - Z_i)\beta_k \qquad (2.13a)$$
$$\frac{\partial \Pr(Y_i = 2)}{\partial X_{ik}} = \frac{d}{dZ_i}[\Phi(\delta_2 - Z_i) - \Phi(\delta_1 - Z_i)] \frac{\partial Z_i}{\partial X_{ik}} = [\Phi'(\delta_1 - Z_i) - \Phi'(\delta_2 - Z_i)]\beta_k \qquad (2.13b)$$
$$\frac{\partial \Pr(Y_i = 3)}{\partial X_{ik}} = \frac{d}{dZ_i}[1 - \Phi(\delta_2 - Z_i)] \frac{\partial Z_i}{\partial X_{ik}} = \Phi'(\delta_2 - Z_i)\beta_k \qquad (2.13c)$$

where $\Lambda'(x) = d\Lambda(x)/dx$ and $\Phi'(x) = d\Phi(x)/dx$ are the probability density functions of the logistic and of the normal distributions, respectively. The marginal effects can be obtained by evaluating the appropriate density functions at the relevant points and multiplying by the associated coefficient. For example, from Equation 2.8, the density function of the logistic distribution is

$$\Lambda'(x) = \frac{d}{dx}\left[\frac{\exp(x)}{1 + \exp(x)}\right] = \frac{[1 + \exp(x)]\exp(x) - [\exp(x)]^2}{[1 + \exp(x)]^2} = \Lambda(x)[1 - \Lambda(x)].$$

The marginal effect under Equation 2.12a is given by

$$-\Lambda'(\delta_1 - Z_i)\beta_k = -\Lambda(\delta_1 - Z_i)[1 - \Lambda(\delta_1 - Z_i)]\beta_k = -\frac{1}{1 + \exp(Z_i - \delta_1)}\left[1 - \frac{1}{1 + \exp(Z_i - \delta_1)}\right]\beta_k.$$

Now if the value of the $k$th determining variable increases by a small amount and $\beta_k > 0$, then under both the logit and the probit model the probability of not being deprived must fall because, by Equations 2.12a and 2.13a, the derivative of $\Pr(Y_i = 1)$ has the opposite sign to $\beta_k$. Under both models too, the probability of being severely deprived must rise since, by Equations 2.12c and 2.13c, the derivative of $\Pr(Y_i = 3)$ has the same sign as $\beta_k$. However, it is not

clear what would happen to the middle probability. Depending upon how the other probabilities change, the probability of being mildly deprived, $\Pr(Y_i = 2)$, could either rise, fall, or remain unchanged.15 Consequently, given a change in a determining variable, it is impossible to infer the direction of change in all the probabilities from the sign of the coefficient associated with it. It is only the direction of change in the probabilities of the two extreme cases that will be unambiguously determined. For this reason, Greene (2000) cautions that "we must be very careful in interpreting the coefficients in this model... since this is the least obvious of the models" (p. 878).

Marginal Effects: Dummy Variables

However, the above approach for evaluating marginal effects is only appropriate when the determining variable is continuous, and not when it is a dummy variable. The effects of a dummy variable should be analyzed by comparing the probabilities that result when the dummy variable takes one value with the probabilities that result when it takes the other value, the values of the other variables remaining unchanged between the two comparisons. For example, suppose employment status is a factor which influences deprivation, and let $X_{ik} = 1$ if a person is unemployed and $X_{ik} = 0$ if he or she is employed. In order to analyze how a person's probabilities of being at the three deprivation levels would be affected if he or she moved from employment into unemployment, first evaluate $Z_i$ under the assumption that $X_{ik} = 1$ (call it $Z_i^1$) and use Equations 2.9a to 2.9c (or, if a probit model is used, Equations 2.11a to 2.11c) to calculate the three probabilities for $Y_i = 1$ (not deprived), $Y_i = 2$ (mildly deprived), and $Y_i = 3$ (severely deprived). Then, keeping the values of the other determining variables unchanged, evaluate $Z_i$ under the assumption that $X_{ik} = 0$ (call it $Z_i^0$) and recompute the three probabilities. Note that, from Equation 2.1, $Z_i^1 = Z_i^0 + \beta_k$. The difference between the two sets of probabilities is the effect of a person moving from employment ($X_{ik} = 0$) into unemployment ($X_{ik} = 1$), or vice versa, on his or her probability of being at different deprivation levels.

The Parallel Slopes Assumption

A critical assumption of the ordered logit and probit models is that the slope coefficients $\beta_k$ of Equation 2.1 do not vary according to the deprivation outcome being considered. That is to say, the ordered logit and probit models fit a parallel slopes cumulative model which, in the logit case, takes the following form:

$$\log \frac{p_1}{1 - p_1} = \alpha_1 + \sum_{k=1}^{K} \beta_k X_{ik},$$
$$\log \frac{p_1 + p_2}{1 - p_1 - p_2} = \alpha_2 + \sum_{k=1}^{K} \beta_k X_{ik},$$
$$\vdots$$
$$\log \frac{p_1 + p_2 + \cdots + p_{M-1}}{1 - p_1 - p_2 - \cdots - p_{M-1}} = \alpha_{M-1} + \sum_{k=1}^{K} \beta_k X_{ik},$$
$$p_1 + p_2 + \cdots + p_M = 1,$$

where $p_j = \Pr(Y_i = j)$. The validity of the parallel slopes assumption can be tested by estimating a multinomial logit model on the data (see Chapter 3). The multinomial logit model allows the slope coefficients $\beta_k$ to be different between the outcomes $j = 1, \ldots, M$. While the ordered logit model estimates $K$ coefficients, the multinomial logit model estimates $K(M - 1)$ parameters.16 If $L_1$ is the log-likelihood value from the ordered logit model and $L_2$ is the log-likelihood value from the multinomial logit model, then one can compute $2(L_2 - L_1)$ and compare it with $\chi^2(K(M - 2))$. Note that this is not strictly a likelihood-ratio test, because the ordered logit model is not nested within the multinomial logit model. Consequently, the test is only suggestive: a "very large" $\chi^2$ value would provide grounds for concern; a "moderately large" value would not (STATA, 1999, p. 480). However, if one does have reason for believing that the parallel slopes assumption is not valid, then the model ought to be estimated using the method of multinomial logit, notwithstanding the fact that the dependent variable is clearly ordinal.

Application to Deprivation Status

Defining and Constructing the Deprivation Index

There are $N$ persons, indexed $i = 1, \ldots, N$. A condition is defined as a "deprivation-inducing condition" (DIC) if the presence of that condition causes an individual to experience deprivation. Suppose

there are $K$ DICs, indexed $k = 1, \ldots, K$, and let $I_{ik}$ be a categorical variable with respect to DIC $k$ and person $i$, such that $I_{ik} = 1$ if the DIC is present for person $i$, and $I_{ik} = 0$ if it is absent. Then the deprivation level of person $i$, denoted $D_i^T$, is defined as

$$D_i^T = \sum_{k=1}^{K} \alpha_k I_{ik}, \qquad (2.14)$$

where $\alpha_k > 0$ is the weight attached to the $k$th DIC and is independent of the person being considered. If the weights relevant to the personal DICs are defined as $\alpha_k = 1 - p_k$, where $p_k$ represents the frequency with which condition $k$ occurs, then the $\alpha_k$ embody the notion of "relative deprivation," i.e., the smaller the frequency with which a particular DIC is experienced, the greater the weight attached to it when it is experienced. The use of such weights echoes the work of Desai and Shah (1988) who, in a re-examination of Townsend's (1979) original data, essentially argued that to be deprived of something that almost everyone has is more important than to be deprived of something that few people possess.

It is only by accident that the weights, $\alpha_k$, will sum to unity. They may, however, be normalized by defining $a_k = \alpha_k / n$, where $n = \sum_{k=1}^{K} \alpha_k$. Under this normalization, the deprivation level of person $i$ may be defined as

$$D_i = D_i^T n^{-1} = \left(\sum_{k=1}^{K} \alpha_k I_{ik}\right) n^{-1} = \sum_{k=1}^{K} a_k I_{ik}. \qquad (2.15)$$

Because $D_i$ is simply a scalar transform of $D_i^T$, the same ranking of individuals, in terms of their deprivation levels, will be obtained using $D_i$ as using $D_i^T$. However, in terms of their normalized weights, the deprivation index $D_i$ offers advantages of interpretation over $D_i^T$, and the subsequent analysis will, therefore, be conducted in terms of this measure. Since, by definition, $\sum_{k=1}^{K} a_k = 1$, Equation 2.15 implies that $0 \le D_i \le 1$: $D_i = 0$ when none of the DICs are present, that is, when $I_{ik} = 0$ for all $k = 1, \ldots, K$, and $D_i = 1$ when all the DICs are present, that is, when $I_{ik} = 1$ for all $k = 1, \ldots, K$.

A major problem in constructing a deprivation index lies in deciding on the DICs that should enter its construction. Reviewing the literature on the construction of deprivation indices, Nolan and Whelan (1996) pointed out that the role of tastes presented a major problem. If observed differences in living patterns could largely be ascribed to preferences rather than to resources, then the absence of particular items could not be taken as an indicator of want. For example, Piachaud (1987) has highlighted the considerable variation in the deprivation scores of households at similar income levels. Even if one could separate preferences from needs, the importance of the researcher in choosing the items that enter the deprivation index is seen as a further problem. Then there is the question of whether deprivation should be measured solely by reference to an individual's own circumstance or, also, by reference to his or her wider social and geographical environment. Borooah and Carcach (1997), for example, drew attention to the importance of neighborhood quality in determining the degree to which people were afraid of crime. Lastly, if one could arrive at a satisfactory set of indicators, there is the issue of how these are to be weighted in the construction of the overall index. Overarching these problems are the constraints imposed by the data; one can only construct a deprivation index from the data that are available, not from data that one might wish had been available. The deprivation index that I constructed was based upon data from the 1991 Census which gave information on the living circumstances of 13,164 individuals living in the region.17 These data tell us, for example, whether, at the time of the Census, a person:

• Lived in a household in which none of the members normally had the use of a car or a van;
• Lived in a household in which none of the members were earners;
• Lived in a household in which the members did not have exclusive use of an inside toilet;
• Lived in a house without any central heating;
• Lived in a house without a public supply of water piped into the house;
• Lived in a house not connected to a public sewer;
• Lived in a house that represented nonpermanent accommodation;
• Lived in a house for which the ratio of the number of residents to the number of rooms was greater than 1;
• Suffered from a long-term illness, health problem, or handicap which limited his or her daily activities or the work he or she could do.

Using this information, separate deprivation indices were constructed for retired and nonretired persons. Both indices were built

around the same set of DICs, set out above, but the weights attached to these DICs differed according to whether the person was retired or not. As mentioned above, the weight associated with a DIC reflected the frequency with which the relevant DIC was experienced: the lower the frequency, the greater the weight. The frequencies with which several of the DICs were experienced were, however, considerably different for the retired and nonretired parts of the sample, and the use of different weights in constructing deprivation indices for retired and nonretired persons reflected this difference. Depending on the calculated value of his or her deprivation index, each person was then assigned to one of three deprivation levels (not deprived, mildly deprived, and severely deprived18), and associated with each outcome ($j = 1, 2, 3$) and each person ($i = 1, \ldots, N$) was a value of $Y_i$, an ordinal dependent variable, such that:

• $Y_i = 1$ if the person was not deprived
• $Y_i = 2$ if the person was mildly deprived
• $Y_i = 3$ if the person was severely deprived

It should be emphasized that it is these data, on the values of $Y_i$, that would typically be available to a researcher for analysis. Of the total sample of 13,164 persons, 45.9% (6,042 persons) were classed as being not deprived, 34.9% (4,594 persons) were classed as being mildly deprived, and 19.2% (2,528 persons) were classed as being severely deprived.

Equation Specification

The determining variables used to "explain" a person's deprivation level were:

• $AGE_i$ in years, normalized by setting $AGE_i = 0$ for persons who were 16 years old;19
• $HIGHED_i = 1$, if the person had first, or higher, degree qualifications of UK standard; $HIGHED_i = 0$, otherwise;
• $MIDED_i = 1$, if the person had post-A level, but less than degree, qualifications20; $MIDED_i = 0$, otherwise;
• $RET_i = 1$, if the person was retired; $RET_i = 0$, otherwise;
• $INAC_i = 1$, if the person was economically inactive; $INAC_i = 0$, otherwise;
• $UE_i = 1$, if the person was unemployed; $UE_i = 0$, otherwise;
• $HNUM_i = 1$, if the number of persons in the household was six or more; $HNUM_i = 0$, otherwise;
• $SNPAR_i = 1$, if the person was a single parent; $SNPAR_i = 0$, otherwise;
• $AREA_{ai} = 1$, if the person was resident in area $a$ of Northern Ireland; $AREA_{ai} = 0$, otherwise. There were 10 such areas21 identified in the 1991 Census for Northern Ireland.

In addition to these variables, it was possible that the level of deprivation might depend on the sex of a person and, because in Northern Ireland Catholics are a relatively disadvantaged group, also upon his or her religion. (Of 13,164 persons in the sample, 7,243 were men and 5,921 were women; 4,364 were Catholics and 8,800 were Protestants22.) To account for this, two other variables were considered:

• $SEX_i = 1$, if the person was female; $SEX_i = 0$, otherwise;
• $CT_i = 1$, if the person was Catholic; $CT_i = 0$, otherwise.

Consequently, in the context of this application, Equation 2.1 was specified as

$$D_i = \beta_1 + \beta_2 SEX_i + \beta_3 CT_i + \beta_4 AGE_i + \beta_5 AGE_i^2 + \beta_6 HIGHED_i + \beta_7 MIDED_i$$
$$\;\; + \beta_8 RET_i + \beta_9 INAC_i + \beta_{10} UE_i + \beta_{11} HNUM_i + \beta_{12} SNPAR_i + \beta_{13} AREA_{2i} + \cdots + \beta_{21} AREA_{10i} + \varepsilon_i$$
$$\;\; = Z_i + \varepsilon_i. \qquad (2.16)$$

The squared value of the age variable ($AGE_i^2$ above) introduces a nonlinearity to the age effect: the marginal effect of an increase in age upon $D_i$ depends upon the age from which the increase takes place. If $\beta_4 < 0$ and $\beta_5 > 0$, then increasing a person's age reduces his or her deprivation score, but this effect is smaller the older the person is.23

The Equation Statistics

The estimated parameters $\hat{\beta}_k$, $\hat{\delta}_1$, and $\hat{\delta}_2$ maximize the likelihood of observing the sample in which $N_1 = 6042$, $N_2 = 4594$, and
$N_3 = 2528$ (see Equation 2.4). These estimates are shown in Table 2.1 for the ordered logit model and in Table 2.2 for the ordered probit model. The z-ratios in Tables 2.1 and 2.2 are the ratios of the estimated coefficients to their estimated standard errors: the z-ratios are (asymptotically) distributed as $N(0, 1)$ under the null hypothesis that the associated coefficients are zero.24

Greene (2000, pp. 831-833) has a number of suggestions for measuring the "goodness-of-fit" of equations with discrete dependent variables. He suggests that, at a minimum, one should report the maximized value of the log-likelihood function. The values of $L_1$ are the maximized log-likelihood values shown at the head of the tables (respectively, -12423.56 and -12426.25). Since the hypothesis that all the slopes in the model are zero is often interesting, the results of comparing the "full" model with an "intercept only" model should also be reported. The $\chi^2$ values at the head of Tables 2.1 and 2.2 (respectively, 2571.98 and 2566.6) are defined as $2(L_1 - L_0)$, where $L_0$ is the value of the log-likelihood function when the only explanatory variable was the constant term and $L_1$, as observed earlier, is the value of the log-likelihood function when all the explanatory variables were included; the degrees of freedom are equal to the number of slope coefficients estimated. These $\chi^2$ values decisively reject the null hypothesis that the model did not have greater explanatory power than an "intercept only" model.25

The "pseudo-$R^2$" is defined as $1 - L_1/L_0$ and is due to McFadden (1973). This is bounded from below by 0 and from above by 1. A value of 0 corresponds to all the slope coefficients being zero and a value of 1 corresponds to perfect prediction (that is, to $L_1 = 0$). Unfortunately, as Greene (2000) notes, the values between 0 and 1 have no natural interpretation, though it has been suggested that the pseudo-$R^2$ value increases as the fit of the model improves. Other measures have been suggested. Ben-Akiva and Lerman (1985) and Kay and Little (1986) suggested a fit measure which measured the average probability of correct prediction by the prediction rule. Cramer (1999) suggested a measure that corrected for the failure of the Ben-Akiva and Lerman measure to take into account the fact that, in unbalanced samples, the less frequent outcome will usually be predicted very badly. In their survey of pseudo-$R^2$ measures, Veall and Zimmerman (1996) argued that in models of the multinomial probit or logit type, only the McFadden (1973) measure "seemed worthwhile."

[TABLE 2.1: ordered logit estimates; table body not recoverable from the scan]
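The relationship between the reported statistics can be illustrated with a short calculation. The Python sketch below is an illustration only: the function name is hypothetical, and the only inputs are the log-likelihood and $\chi^2$ values quoted in the text. It backs out $L_0$ from $\chi^2 = 2(L_1 - L_0)$ and then forms the McFadden pseudo-$R^2$, $1 - L_1/L_0$; the resulting figures are derived here and are not quoted from Tables 2.1 and 2.2.

```python
def null_loglik_and_pseudo_r2(loglik_full, lr_chi2):
    """Recover L0 from chi2 = 2 (L1 - L0), then compute 1 - L1 / L0."""
    loglik_null = loglik_full - lr_chi2 / 2.0
    pseudo_r2 = 1.0 - loglik_full / loglik_null
    return loglik_null, pseudo_r2

# Values quoted in the text for the two fitted models.
logit_null, logit_r2 = null_loglik_and_pseudo_r2(-12423.56, 2571.98)
probit_null, probit_r2 = null_loglik_and_pseudo_r2(-12426.25, 2566.6)

print(f"ordered logit : L0 = {logit_null:.2f}, pseudo-R2 = {logit_r2:.4f}")
print(f"ordered probit: L0 = {probit_null:.2f}, pseudo-R2 = {probit_r2:.4f}")
```

Both models should imply the same intercept-only log-likelihood, since $L_0$ depends only on the sample proportions in the three categories and not on the assumed error distribution; this gives a quick consistency check on the reported figures.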
[TABLE 2.2: ordered probit estimates; table body not recoverable from the scan]

An alternative to "point" measures of goodness-of-fit might be to assess the predictive ability of the model. Such assessments are routine in models of binary choice, where the hits ($Y_i = 1$) and misses ($Y_i = 0$) predicted by the model on the basis of a prediction rule (say, $Y_i = 1$ if $\hat{p}_i > 0.5$, $Y_i = 0$ otherwise) are compared to the actual hits and misses. This procedure could be extended to multiple outcome models, where the predictions could be based on a rule whereby $Y_i = m$ if $\hat{p}_{im} = \max_j(\hat{p}_{ij})$. These predictions could then be compared to a "naive" model that predicted all cases to be in the modal category of the dependent variable, and the percentage reduction in error in moving from the naive to the full model could be computed. This approach is, however, not without pitfalls. First, unlike the case of the linear regression model, where the coefficients are chosen to maximize $R^2$, in discrete choice models the coefficient estimates do not maximize any goodness-of-fit measure. So to assess the model on the basis of goodness-of-fit, however measured, may be misleading. Second, the predictions are critically dependent on the prediction rule adopted, and the adopted rule may turn out to be quite inappropriate to the needs of the sample. For example, in a binary model, if the sample is unbalanced (that is, has many more 1s than 0s), then a rule that the model should predict the outcome for which the estimated probability is greatest might never predict a 1 (or a 0).

The Estimates

The coefficients in Tables 2.1 and 2.2 are all significantly different from zero except for the two area coefficients associated with living in County Down and in County Derry (variables dwn and dry in the tables). The items under the column "P > |z|" in Table 2.1 show that, under the null hypothesis that the coefficients on dwn and dry were zero, there was, respectively, a 46.9% and a 31.5% chance of observing a value in the distribution tails beyond the observed values, 0.0492598 and 0.0791624. The items under the column "P > |z|" in Table 2.2 show these chances as 59.1% and 44.5%, respectively. As regards the other coefficients, the items under the column "P > |z|" in Tables 2.1 and 2.2 show that, under the null hypothesis that they were zero, there was a "zero" chance of observing a value in the distribution tails beyond the observed values. The items under the column "95% Conf. Interval" show the limits outside which the estimates must lie if their associated coefficients are to be regarded as different
from zero at the 5% levei of significance. ln both Tables 2.1 and 2.2, these categories are defined in terms of the probabilities of the values
the estimates on dwn and dry coefficients lay within their 95% limits, of an underlying latent variable crossing particular thresholds, where
while the other coefficient estimates lay outside their limits. these thresholds are established by the values of the cutoff points.
ln interpreting the individual coefficients the importance of the ln the ordered logit points the two cutoffs were estimated27 as
ceteris paribus ("other things equal") clause must be emphasized. _cut1 = 0.1804476 and _cut2 = 2.058464 where, in the notation
The estimate on the variable ct was positive, indicating that ceteris of Equations 2.5a to 2.5c, 81 = _cut1 and 82 = _cut2. Consc-
paribus Catholics had a higher probability of being severely deprived quently, in the ordere_E logit model, the probability of a person being
and a lower probability of being not deprived than Protestants. This not deprived was fr(Z; + s; s 0.1804476), being mildly deprived was
is not to say that every Catholic had a higher probability of being Pr(0.1804476 S Z; + s; S 2.058464), and being severely deprived
severely deprived than every Protestant. Rather the correct interpre- was Pr(Z; + s; 2:: 2.058464). ln the orde red probit model the two
tation is that, given two persons who were similar in respect of every cutoffs were estimated as _cut1 = 0.1009536 and _cut2 = 1.212453.
characteristic except religion, the person who was Catholic was more ln the ordered pr_2bit model, the probability of a person being not
likely to be severely deprived and less likely to be not deprived than deprived was Pr(Z; + s; s 0.1009536), being mildly deprived was
the person who was Protestant. The estimate on the variable sex was Pr(O.l009536 S Z; + s; S 1.212453), and being severely deprived
negative, indicating that ceteris paribus women had a lower probability of being severely deprived and a higher probability of being not deprived than men. The other coefficients carry a similar interpretation. For example, the negative estimates on the variables highed and mided indicate that ceteris paribus persons with educational qualifications had a lower probability of being severely deprived and a higher probability of being not deprived than persons without educational qualifications. As the previous discussion indicated, the signs of the coefficient estimates allow only the direction of change in the probabilities of the extreme outcomes, following a change in the value of the associated variable, to be predicted. The direction of change in the probabilities of the intermediate outcomes cannot be inferred. For example, from an inspection of the estimates we cannot say whether the probability of a woman being mildly deprived is larger or smaller than that of a man.

The Cutoff Points

The estimated cutoff points are shown below the estimates. A display at the bottom of Tables 2.1 and 2.2 shows how the probabilities for the categories were computed from the fitted equation.26 This display was generated by including the "table" option in the syntax of the oprobit and ologit commands set out in the program listing in Chapter 4. The assumption behind ordinal regression is that the observed categories represent crude but correctly ordered differences on an underlying continuous scale. The probabilities of belonging to [...] was $\Pr(Z_i + \varepsilon_i \ge 1.212453)$. The reason the estimated cutoff points were different in the two models is that the slope coefficient estimates were also substantially different. However, the conjunction of the estimated slope and cutoff coefficients in each model meant that the predictions from the two models were very similar.

The Predicted Probabilities: Calculation From Individuals

Using the estimated $\hat{Z}_i$ (which, remembering that $\hat{Z}_i = \sum_{k=1}^{K} \hat{\beta}_k X_{ik}$, are computed using the estimates shown in Table 2.1, in conjunction with the values of the determining variables for every individual), STATA will predict, for each of the 13,164 individuals in the sample, the probabilities of belonging to the three different deprivation levels, by computing the $\hat{p}_{i1}$, $\hat{p}_{i2}$, and $\hat{p}_{i3}$ of, respectively, Equations 2.5a, 2.5b, and 2.5c. Table 2.3 shows, using the logistic distribution for the error term $\varepsilon_i$, these calculations for the first twenty-five persons in the sample. We see from Table 2.3 that, given his or her circumstances, person 17 had a very low probability of being severely deprived (4%) and a high probability of being not deprived (78%). On the other hand, person 2's circumstances meant that he or she had a very high probability of being severely deprived (69%) and a very low probability of being not deprived (6%). It is instructive to compare the predicted probabilities from the logit model with the predicted probabilities obtained from assuming that the $\varepsilon_i$ were normally distributed. These are shown in Table 2.4. A comparison of Tables 2.3 and 2.4 shows that, notwithstanding the differences in coefficient estimates between the logit and probit models, the predicted
probabilities are very similar. For every one of the 25 persons listed, the predicted probabilities of being at the three deprivation levels are not very different under the logit and probit formulations. In that sense, it did not matter which model was used; the predicted outcomes were very similar.

TABLE 2.3
Ordered Logit Calculated Probabilities of Being at Different Deprivation Levels: First 25 Persons in the Sample

Pnum   p1          p2          p3
1      0.2468799   0.4350583   0.3180617
2      0.0635542   0.2438714   0.6925744
3      0.1456977   0.3815914   0.472711
4      0.221723    0.4290327   0.3492444
5      0.1984064   0.4197503   0.3818433
6      0.0774309   0.2769676   0.6456016
7      0.3072922   0.4363922   0.2563156
8      0.6020886   0.3061397   0.0917717
9      0.6067917   0.303063    0.0901453
10     0.5668221   0.3285579   0.10462
11     0.3003936   0.4370245   0.2625819
12     0.4497833   0.3926532   0.1575635
13     0.1598765   0.3946234   0.4455001
14     0.5337384   0.348435    0.1178265
15     0.6265731   0.2899149   0.0835119
16     0.5928456   0.3121286   0.0950258
17     0.7833934   0.1760466   0.04056
18     0.5323462   0.3492447   0.1184092
19     0.5169058   0.3580674   0.1250268
20     0.6024157   0.3059263   0.091658
21     0.5469929   0.3406158   0.1123913
22     0.5918246   0.3127853   0.0953901
23     0.7823876   0.1768215   0.0407909
24     0.5389312   0.3453955   0.1156733
25     0.4514354   0.3918849   0.1566797

Pnum = person number
p1 = probability of being "not deprived"
p2 = probability of being "mildly deprived"
p3 = probability of being "severely deprived"

TABLE 2.4
Ordered Probit Calculated Probabilities of Being at Different Deprivation Levels: First 25 Persons in the Sample

Pnum   p1          p2          p3
1      0.2564767   0.4197788   0.3237445
2      0.0561468   0.2607234   0.6831297
3      0.148693    0.3789889   0.4723181
4      0.2297609   0.4152423   0.3549968
5      0.2077654   0.4091158   0.3831188
6      0.0723101   0.2918726   0.63581[?][?]
7      0.3164209   0.4204631   0.263116
8      0.5960203   0.3162006   0.0877791
9      0.6006402   0.3114694   0.0878905
10     0.5609338   0.3361012   0.1029651
11     0.3078035   0.4210712   0.2711253
12     0.4515077   0.3873188   0.1611736
13     0.1692549   0.3920921   0.438653
14     0.5286788   0.3530054   0.1183157
15     0.6201291   0.3016761   0.0781948
16     0.5868787   0.3215305   0.0915909
17     0.7813467   0.189157    0.0294963
18     0.5321902   0.3512332   0.1165766
19     0.5122991   0.3610432   0.1266577
20     0.5962939   0.3160395   0.0876666
21     0.5415704   0.3464162   0.1120134
22     0.5858997   0.3220953   0.092005
23     0.7803024   0.1899634   0.0297342
24     0.5342544   0.3501835   0.1155621
25     0.449369    0.3881331   0.1624979

Pnum = person number
p1 = probability of being "not deprived"
p2 = probability of being "mildly deprived"
p3 = probability of being "severely deprived"

The individual probabilities can be used to generate sample statistics of deprivation levels. These are shown in Table 2.5 for the logit model and in Table 2.6 for the probit model. The mean values of $\hat{p}_{i1}$, $\hat{p}_{i2}$, and $\hat{p}_{i3}$, as shown in Tables 2.5 and 2.6 (referred to as $\bar{p}_1$, $\bar{p}_2$, and $\bar{p}_3$), were slightly different between the logit and probit models. However, after rounding, these differences disappeared and the estimates were respectively: 46%, 35% and 19%. Under the logit model, the mean probabilities, as calculated above, are equal to the sample proportions in the three categories of deprivation.28 This is a property of the ordered logit model. The mean probabilities under the probit model are close, but not equal, to the sample proportions.
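
The calculation of the individual probabilities can be sketched in a few lines of Python. This is only an illustration, under the assumption that Equations 2.5a to 2.5c take the usual ordered-model form that matches the Stata display (Pr(xb + u < _cut1), and so on). The index values and cutoffs below are hypothetical, not the estimates of Table 2.1, and the function names are invented for the example. In practice the logit and probit cutoffs are estimated separately; the same pair is reused here only to keep the sketch short.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution (ordered logit)."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    """CDF of the standard normal distribution (ordered probit)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probs(z, cut1, cut2, cdf):
    """Return (p1, p2, p3) for one person with index value z."""
    p1 = cdf(cut1 - z)          # Pr(z + e < cut1): not deprived
    p3 = 1.0 - cdf(cut2 - z)    # Pr(z + e > cut2): severely deprived
    p2 = 1.0 - p1 - p3          # remainder: mildly deprived
    return p1, p2, p3

# Hypothetical index values and cutoffs (assumptions for illustration only).
z_values = [-1.5, -0.2, 0.6, 1.4, 2.3]
cut1, cut2 = 0.1, 2.0

logit_rows = [ordered_probs(z, cut1, cut2, logistic_cdf) for z in z_values]
probit_rows = [ordered_probs(z, cut1, cut2, normal_cdf) for z in z_values]

# Column means: the analogue of the mean probabilities discussed in the text.
logit_means = [sum(col) / len(col) for col in zip(*logit_rows)]

for z, row in zip(z_values, logit_rows):
    print(z, [round(p, 4) for p in row])
print("means", [round(m, 4) for m in logit_means])
```

Each row sums to one by construction, so the column means do as well, which is the property the text relies on when it notes that the mean probabilities sum to unity.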

This near-equality under the probit model is regularly observed in practice. The mean probabilities as set out in Tables 2.5 and 2.6 sum to precise unity. This is because the individual probability estimates (from which the mean is computed) sum to unity.

The median (50th percentile) values of $\hat{p}_{i1}$, $\hat{p}_{i2}$ and $\hat{p}_{i3}$, as shown in Tables 2.5 and 2.6, were also slightly different between the logit and probit models though, again, these differences disappeared after rounding, yielding estimates of the median probabilities: 48%, 37% and 14%, respectively. It is not necessary that the median probabilities sum to unity for either of the two models and, indeed, in this application they fall short of unity. The reason for this is that while, for any one person, the estimates must sum to unity, the median values of $\hat{p}_{i1}$, $\hat{p}_{i2}$ and $\hat{p}_{i3}$ need not relate to the same person. For example, if the values of $\hat{p}_{i1}$, $\hat{p}_{i2}$ and $\hat{p}_{i3}$ (i = 1, ..., 13,164) were arranged in ascending or descending order of magnitude, the person at the 50th percentile for $\hat{p}_{i1}$ may be different from the person at the 50th percentile for $\hat{p}_{i3}$. On the logit estimates (Table 2.5), the probability of being not deprived ($\hat{p}_{i1}$) at the lowest percentile was 6.7% and, within this class, the lowest estimated value was 1.9%. At the other end of the scale, the value of $\hat{p}_{i1}$ at the highest percentile was 83.1% and, within this class, the highest estimated value was 87%.

TABLE 2.5
Sample Statistics of Deprivation Levels: Logit Model

Pr(y = 1)
       Percentiles   Smallest
 1%    0.0675202     0.0185615
 5%    0.1466481     0.0198162
10%    0.1875675     0.0300967     Obs                  13164
25%    0.303902      0.0347162     Sum of Wgt.          13164
50%    0.4843619                   Mean                 0.4592781
                     Largest       Standard Deviation   0.1889648
75%    0.5993176     0.8697239
90%    0.6954355     0.8697425     Variance             0.0357077
95%    0.7368268     0.8699151     Skewness             -0.1866478
99%    0.8308569     0.8699151     Kurtosis             2.155329

Pr(y = 2)
       Percentiles   Smallest
 1%    0.1389147     0.0915195
 5%    0.2075179     0.0969696
10%    0.239251      0.1077327     Obs                  13164
25%    0.2989144     0.1077327     Sum of Wgt.          13164
50%    0.3674155                   Mean                 0.3493639
                     Largest       Standard Deviation   0.076207
75%    0.4150781     0.4377985
90%    0.4339717     0.4377985     Variance             0.0058075
95%    0.4366746     0.4377985     Skewness             -0.8337451
99%    0.4377566     0.4377985     Kurtosis             2.[?]17122

Pr(y = 3)
       Percentiles   Smallest
 1%    0.0301859     0.0223522
 5%    0.0517813     0.0223522
10%    0.0627571     0.0223856     Obs                  13164
25%    0.0927392     0.0223891     Sum of Wgt.          13164
50%    0.1399815                   Mean                 0.191358
                     Largest       Standard Deviation   0.1405856
75%    0.2593728     0.8095671
90%    0.3984033     0.8312855     Variance             0.0197643
95%    0.4708132     0.8832142     Skewness             1.451671
99%    0.6786129     0.889919      Kurtosis             4.969932

Predicted Probabilities: Calculation at the Mean

In the previous subsection the estimates of the mean probabilities were computed from the estimates of the individual probabilities, $\hat{p}_{i1}$, $\hat{p}_{i2}$, and $\hat{p}_{i3}$ (i = 1, ..., 13,164). Under the logit model the mean probabilities were exactly equal to the sample proportions, and under the probit model they were close to the sample proportions. However, there is an alternative way of calculating mean probabilities (see Greene, 2000, p. 879) and this leads to a different outcome from that set out above. Let $\bar{x}_k = \sum_{i=1}^{N} X_{ik}/N$ be the mean value of the kth determining variable and let $\bar{Z}$ be the mean value of the $\hat{Z}_i$:

$$\bar{Z} = \sum_{i=1}^{N} \hat{Z}_i / N = \sum_{i=1}^{N} \Big( \sum_{k} \hat{\beta}_k X_{ik} \Big) / N = \sum_{k} \hat{\beta}_k \sum_{i=1}^{N} X_{ik} / N = \sum_{k} \hat{\beta}_k \bar{x}_k \qquad (2.17)$$
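
Equation 2.17 says that averaging the index over persons gives the same number as evaluating the index at the variable means. The sketch below checks this on a tiny invented data set and also shows why the alternative calculation leads to a different outcome: the probability is a nonlinear function of the index, so the mean of the individual probabilities generally differs from the probability evaluated at the mean index. All coefficients, data values, and the cutoff are hypothetical, and the logistic form is assumed for illustration.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients and data: 4 persons, 2 determining variables.
beta = [0.8, -0.5]
X = [[1.0, 0.2], [0.0, 1.5], [1.0, 3.0], [0.0, 0.4]]
cut1 = 0.3  # hypothetical first cutoff
n = len(X)

# Index value for each person: Z_i = sum_k beta_k * X_ik
z = [sum(b * x for b, x in zip(beta, row)) for row in X]

# Left-hand side of Equation 2.17: the mean of the Z_i
z_bar_from_z = sum(z) / n

# Right-hand side: the index evaluated at the variable means
x_bar = [sum(col) / n for col in zip(*X)]
z_bar_from_x = sum(b * xb for b, xb in zip(beta, x_bar))

# Mean of individual probabilities versus probability at the mean index
p1_mean_of_probs = sum(logistic_cdf(cut1 - zi) for zi in z) / n
p1_at_mean_index = logistic_cdf(cut1 - z_bar_from_z)

print(z_bar_from_z, z_bar_from_x)          # the two agree
print(p1_mean_of_probs, p1_at_mean_index)  # the two differ
```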

With $\bar{Z}$ denoting the mean value of the $\hat{Z}_i$, compute $\tilde{p}_1$, $\tilde{p}_2$, and $\tilde{p}_3$ as

$$\tilde{p}_1 = \Pr(\varepsilon_i \le \delta_1 - \bar{Z}) \qquad (2.18a)$$

$$\tilde{p}_2 = \Pr(\delta_1 - \bar{Z} < \varepsilon_i \le \delta_2 - \bar{Z}) \qquad (2.18b)$$

$$\tilde{p}_3 = \Pr(\varepsilon_i \ge \delta_2 - \bar{Z}) \qquad (2.18c)$$

Then, in general: $\bar{p}_1 \ne \tilde{p}_1$, $\bar{p}_2 \ne \tilde{p}_2$, and $\bar{p}_3 \ne \tilde{p}_3$. This is not surprising because while both the $\tilde{p}_j$ and the $\bar{p}_j$ (j = 1, 2, 3) purport to measure the overall probability of being at the different levels of deprivation, they are computed very differently. The $\bar{p}_j$ are computed as the mean of the estimated individual probabilities, $\hat{p}_{ij}$, and the $\hat{p}_{ij}$ are computed from the values of the determining variables for the individual in conjunction with the estimated cutoff points. On the other hand, the $\tilde{p}_j$, which bypass the individual probabilities, are computed instead from the mean value, $\bar{Z}$, of the individual $\hat{Z}_i$ (or equivalently, calculated using the mean values, $\bar{x}_k$, of the determining variables, $X_{ik}$) in conjunction with the estimated cutoff points. Tables 2.7 and 2.8 compare the values of $\tilde{p}_j$ and $\bar{p}_j$ (j = 1, 2, 3) for the logit and probit models, respectively.

TABLE 2.6
Sample Statistics of Deprivation Levels: Probit Model

Pr(y = 1)
       Percentiles   Smallest
 1%    0.0610129     0.0089802
 5%    0.1495682     0.0095785
10%    0.1936998     0.0190097     Obs                  13164
25%    0.311691      0.0255515     Sum of Wgt.          13164
50%    0.4870896                   Mean                 0.4602124
                     Largest       Standard Deviation   0.1855043
75%    0.5928672     0.8747257
90%    0.6[?]91779   0.8747432     Variance             0.0344118
95%    0.7340055     0.8749362     Skewness             -0.212917
99%    0.8299778     0.8749362     Kurtosis             2.250704

Pr(y = 2)
       Percentiles   Smallest
 1%    0.1500103     0.0957713
 5%    0.2209043     0.09959
10%    0.2547602     0.1132009     Obs                  13164
25%    0.3088254     0.1132009     Sum of Wgt.          13164
50%    0.3662931                   Mean                 0.3488479
                     Largest       Standard Deviation   0.0663654
75%    0.4044376     0.421618
90%    0.418923      0.421618      Variance             0.0044044
95%    0.4207648     0.421618      Skewness             -1.039971
99%    0.421588      0.421618      Kurtosis             3.528536

Pr(y = 3)
       Percentiles   Smallest
 1%    0.0194342     0.011863
 5%    0.0412402     0.011863
10%    0.0542446     0.011892      Obs                  13164
25%    0.0890826     0.0118946     Sum of Wgt.          13164
50%    0.1402645                   Mean                 0.1909397
                     Largest       Standard Deviation   0.1432815
75%    0.2674853     0.7992974
90%    0.4023936     0.8322631     Variance             0.0205296
95%    0.4708187     0.8908314     Skewness             1.315678
99%    0.668156      0.8952485     Kurtosis             4.483449

TABLE 2.7
Overall Probability of Being at Different Deprivation Levels: Logit Model

                       CALCULATED AS:              CALCULATED FROM:
Probability of         Mean of Individual          Mean of Determining
Being:                 Probabilities: $\bar{p}$    Variables: $\tilde{p}$
Not deprived           0.459                       0.447
Mildly deprived        0.349                       0.394
Severely deprived      0.191                       0.159

Calculation of Marginal Probabilities: Continuous Variables

The only continuous variables in the model, as specified in Equation 2.16, are $AGE_i$ and $AGE_i^2$. In order to compute the effect of an increase of one year in the age of a person on the probabilities of his or her being at the three different deprivation levels, one needs to evaluate the appropriate density functions at the relevant points and multiply these by the coefficient estimates associated with the two variables.
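
A minimal sketch of this density-times-coefficient calculation is given here, assuming the standard ordered-model derivatives (the effect on the first category is minus the density at the first cutoff point times the coefficient, the effect on the last category is the density at the second cutoff point times the coefficient, and the middle category takes the difference). The index values, cutoffs, and age coefficients are hypothetical, and the function names are invented for the example. The AGE and AGE squared contributions are simply added, following the description in the text.

```python
import math

def logistic_pdf(x):
    """Density of the standard logistic distribution."""
    f = 1.0 / (1.0 + math.exp(-x))
    return f * (1.0 - f)

def marginal_effects(z, cut1, cut2, slope, pdf):
    """Effects of a unit change in a variable on (p1, p2, p3) at index z."""
    f1 = pdf(cut1 - z)   # density at the first relevant point
    f2 = pdf(cut2 - z)   # density at the second relevant point
    return (-f1 * slope, (f1 - f2) * slope, f2 * slope)

# Hypothetical inputs (assumptions for illustration only).
z_values = [-1.5, -0.2, 0.6, 1.4, 2.3]
cut1, cut2 = 0.1, 2.0
b_age, b_age2 = 0.02, 0.0005
slope = b_age + b_age2   # the two contributions are added, as in the text

# Evaluate at every individual's index value, then average.
effects = [marginal_effects(z, cut1, cut2, slope, logistic_pdf)
           for z in z_values]
mean_effects = [sum(col) / len(col) for col in zip(*effects)]

# Evaluate once, at the mean index value.
z_bar = sum(z_values) / len(z_values)
effects_at_mean = marginal_effects(z_bar, cut1, cut2, slope, logistic_pdf)

print([round(e, 5) for e in mean_effects])
print([round(e, 5) for e in effects_at_mean])
```

Because the three probabilities always sum to one, the three effects sum to zero under either evaluation, which gives a quick consistency check on any table of marginal effects.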

For the two age variables, the relevant expressions are Equations 2.12a to 2.12c and 2.13a to 2.13c, and the calculation can be done in either of two ways:

1. For each individual in the sample, evaluate the marginal effects at the relevant points for that individual. These relevant points for individual i (i = 1, ..., 13,164) are: $\delta_1 - \hat{Z}_i$ for the marginal effect on $\Pr(Y_i = 1)$; $\delta_2 - \hat{Z}_i$ and $\delta_1 - \hat{Z}_i$ for the marginal effect on $\Pr(Y_i = 2)$; and $\delta_2 - \hat{Z}_i$ for the marginal effect on $\Pr(Y_i = 3)$. Multiply these evaluations by the coefficient estimate ($\hat{\beta}_4$) on $AGE_i$ and also multiply these evaluations by the coefficient estimate ($\hat{\beta}_5$) on $AGE_i^2$. Add these two results to obtain, for each individual, the marginal effect on his or her probabilities of a small (one year) change in age. The mean of these individual effects then yields the average effect on the probabilities (of being at the different deprivation levels) of increasing the age of every person in the sample by one year.
2. Compute the mean value of $\hat{Z}_i$ over all the persons in the sample (see Equation 2.17). If this is denoted $\bar{Z}$, evaluate the marginal effects at the relevant points: $\delta_1 - \bar{Z}$ for the marginal effect on $\Pr(Y_i = 1)$; $\delta_2 - \bar{Z}$ and $\delta_1 - \bar{Z}$ for the marginal effect on $\Pr(Y_i = 2)$; and $\delta_2 - \bar{Z}$ for the marginal effect on $\Pr(Y_i = 3)$. Multiply these evaluations by the coefficient estimate on $AGE_i$ and also multiply these evaluations by the coefficient estimate on $AGE_i^2$. Add these two results to obtain the effect on the probabilities (of being at the different deprivation levels) of increasing the average age of the persons in the sample by one year.

TABLE 2.8
Overall Probability of Being at Different Deprivation Levels: Probit Model

                       CALCULATED AS:              CALCULATED FROM:
Probability of         Mean of Individual          Mean of Determining
Being:                 Probabilities: $\bar{p}$    Variables: $\tilde{p}$
Not deprived           0.460                       0.452
Mildly deprived        0.349                       0.387
Severely deprived      0.191                       0.161

Tables 2.9 and 2.10 show the marginal effects of an increase in age, computed in the two different ways described above, for the logit and probit models.

TABLE 2.9
Marginal Effect of an Additional Year of Age on the Probability of Being at Different Deprivation Levels: Logit Model

                       CALCULATED AS:          CALCULATED FROM:
Probability of         Mean of Individual      Mean of Determining
Being:                 Marginal Effects        Variables
Not deprived           -0.00523                -0.00607
Mildly deprived         0.00191                 0.00279
Severely deprived       0.00332                 0.00329

TABLE 2.10
Marginal Effect of an Additional Year of Age on the Probability of Being at Different Deprivation Levels: Probit Model

                       CALCULATED AS:          CALCULATED AT:
Probability of         Mean of Individual      Mean of Determining
Being:                 Marginal Effects        Variables
Not deprived           -0.00524                -0.00591
Mildly deprived         0.00171                 0.00226
Severely deprived       0.00353                 0.00365

Calculation of Marginal Probabilities: Dummy Variables

Earlier it was observed that the effects of a dummy variable should be analyzed by comparing the probabilities that result when the dummy variable takes one value with the probabilities that are the consequence of it taking the other value, the values of the other variables remaining unchanged between the two comparisons. This methodology is now used to analyze the effect of religion on the probabilities of being at different levels of deprivation by comparing the situation in which (for all i) $CT_i = 1$ (the religion dummy) with the situation in which (for all i) $CT_i = 0$. As with the case of continuous variables, this broad methodology can be implemented in either of two ways:

1. Suppose ceteris paribus that all the persons in the sample were Catholic so that $CT_i = 1$ (i = 1, ..., 13,164). Then for this scenario, $\hat{Z}_i$ (the estimated value of $Z_i$) is computed from Equation 2.16, using the coefficient estimates $\hat{\beta}_k$ and with $CT_i = 1$ for all i. Call this estimated
value $\hat{Z}_i^C$. Let $\hat{p}_{ij}^C$ denote the (computed) probability of person i being at deprivation level j (j = 1, 2, 3) in this hypothetical situation, where these probabilities are computed from Equations 2.5a to 2.5c (or, if a probit model is used, from Equations 2.11a to 2.11c) with $\hat{Z}_i = \hat{Z}_i^C$. Now suppose ceteris paribus that all the persons in the sample were Protestant so that $CT_i = 0$ for all i = 1, ..., 13,164 and let $\hat{Z}_i^P$ represent the estimated value of $Z_i$. From Equation 2.16, $\hat{Z}_i^C = \hat{Z}_i^P + \hat{\beta}_3$. Let $\hat{p}_{ij}^P$ denote the (computed) probability of person i being at deprivation level j (j = 1, 2, 3) under these hypothetical circumstances. For any person i in the sample, the difference between the $\hat{p}_{ij}^C$ and $\hat{p}_{ij}^P$ is entirely due to religion: the $\hat{p}_{ij}^C$ are computed using $\hat{Z}_i^C$ (that is, with the Catholic variable "switched on") and the $\hat{p}_{ij}^P$ are computed using $\hat{Z}_i^P$, with the Catholic variable "switched off," without the value of any other variable being altered. If $\bar{p}_j^C = \sum_{i=1}^{N} \hat{p}_{ij}^C / N$ and $\bar{p}_j^P = \sum_{i=1}^{N} \hat{p}_{ij}^P / N$ are the respective means of the two sets of probability estimates, then the difference between them measures the effect of religion on the mean probability of being at different deprivation levels.
2. An alternative is to compare the probabilities that result when the variable $CT_i$ takes its two different values across all the persons in the sample, with the values of the other variables, in each case, held at their sample means. Denote by $\bar{Z}^C$ and $\bar{Z}^P$ the values of $\bar{Z}$ when, respectively, $CT_i = 1$ and $CT_i = 0$ for all i = 1, ..., 13,164 with, in each case, $X_{ik} = \bar{x}_k$ for the other variables. From Equation 2.16, $\bar{Z}^C = \bar{Z}^P + \hat{\beta}_3$. This exercise, in effect, constructs a "straw person" who, apart from religion, embodies the average qualities of the sample and who is Catholic in one scenario and Protestant in another. Then, using $\bar{Z}^C$ and then $\bar{Z}^P$ in Equations 2.5a to 2.5c, estimates of the three probabilities (of being not deprived, mildly deprived, and severely deprived) can be computed for the two hypothetical situations in which first this person is Catholic and second this person is Protestant. If these two sets of probabilities are denoted, respectively, $q_j^C$ and $q_j^P$ (j = 1, 2, 3) then their difference measures the effect of religion on the probability of being at different deprivation levels.

Table 2.11 shows the values of $\bar{p}_j^C$ and $\bar{p}_j^P$ and of $q_j^C$ and $q_j^P$ (j = 1, 2, 3) computed under the logit model, and Table 2.12 does the same for the probit model. Two features of these tables are significant. First, for either of the two ways of computing marginal effects there was hardly any difference between the logit and probit probabilities. Second, for any one model there was considerable difference between the probabilities calculated in the two different ways. Notice that the probability of being severely deprived was significantly higher when the probabilities were computed as the mean of the individual probabilities than when they were computed from the average characteristics of the sample. This is not surprising. Being in a state of severe deprivation is the result of possessing "extreme" values of the deprivation determining variables. The influence of extreme values is dampened when the individual values are set equal to the sample averages. On the other hand, extreme values are allowed full play when the individual values are used in the probability calculations.

TABLE 2.11
The Effect of Religion on the Probability of Being at Different Deprivation Levels: Logit Model

                       CALCULATED AS:          CALCULATED AT:
                       Mean of Individual      Mean of Determining
Probability of         Marginal Effects        Variables
Being:                 CAT       PRT           CAT       PRT
Not deprived           0.432     0.472         0.417     0.462
Mildly deprived        0.360     0.346         0.407     0.387
Severely deprived      0.207     0.182         0.176     0.151

TABLE 2.12
The Effect of Religion on the Probability of Being at Different Deprivation Levels: Probit Model

                       CALCULATED AS:          CALCULATED AT:
                       Mean of Individual      Mean of Determining
Probability of         Marginal Effects        Variables
Being:                 CAT       PRT           CAT       PRT
Not deprived           0.433     0.473         0.421     0.466
Mildly deprived        0.359     0.346         0.398     0.382
Severely deprived      0.208     0.191         0.181     0.152

So which method is the appropriate one to use? The critical question is how the values of the other variables are to be held constant, when the dummy variable of interest takes its two different values. The second method, in effect, assigns to each individual the values of the sample means. But, of course, there is no sanctity to the mean. The common value assigned could be the median or any other value, and each of these different assignments would lead to a different outcome in terms of the calculated probabilities, $q_j$. The first methodology does not suffer from this defect. Since the individual values are not interfered with, there would be a unique outcome in terms of the calculated individual probabilities, and therefore a unique outcome, under the two scenarios, in terms of the mean29 probabilities, $\bar{p}_j$.

Estimation Over Subsamples: Characteristics Versus Coefficients

Religion had an effect on the probabilities of being at different deprivation levels (Tables 2.11 and 2.12) because of the presence of $\beta_3$ in Equation 2.16. This means that, under the first methodology, $\hat{Z}_i^C \ne \hat{Z}_i^P$ and that, under the second, $\bar{Z}^C \ne \bar{Z}^P$. Consequently, the probabilities of being deprived were different when for each person in the sample $CT_i = 1$ than when $CT_i = 0$. But this leads one to consider the possibility that for every variable in Equation 2.16, the "Catholic" coefficients are different from the "Protestant" coefficients. In other words, Equation 2.16 should have been estimated allowing the Catholic and Protestant coefficients to be different from each other. There are two (almost equivalent) ways of implementing this. The first is to estimate a single equation but allow each Catholic coefficient to be different from the corresponding Protestant coefficient in the equation. The second is to estimate two separate equations on the Catholic and Protestant subsamples. In order to implement the first strategy, define the equation as

$$
\begin{aligned}
D_i ={} & \beta_1 + \gamma_1 CT_i + \beta_2 SEX_i + \gamma_2 CT_i \cdot SEX_i \\
& + \beta_3 AGE_i + \gamma_3 CT_i \cdot AGE_i + \beta_4 AGE_i^2 + \gamma_4 CT_i \cdot AGE_i^2 \\
& + \beta_5 HIGHED_i + \gamma_5 CT_i \cdot HIGHED_i + \beta_6 MIDED_i + \gamma_6 CT_i \cdot MIDED_i \\
& + \beta_7 RET_i + \gamma_7 CT_i \cdot RET_i + \beta_8 INAC_i + \gamma_8 CT_i \cdot INAC_i \\
& + \beta_9 UE_i + \gamma_9 CT_i \cdot UE_i + \beta_{10} HNUM_i + \gamma_{10} CT_i \cdot HNUM_i \\
& + \beta_{11} SNPAR_i + \gamma_{11} CT_i \cdot SNPAR_i \\
& + \beta_{13} AREA2_i + \cdots + \beta_{21} AREA10_i \\
& + \gamma_{13} CT_i \cdot AREA2_i + \cdots + \gamma_{21} CT_i \cdot AREA10_i + \varepsilon_i \\
={} & Z_i + z_i + \varepsilon_i \qquad (2.19)
\end{aligned}
$$

By including "interaction variables" in the equation (that is, the original explanatory variables shown in Equation 2.16 multiplied by the Catholic dummy variable) the effect of a particular variable on deprivation is allowed to be different, depending upon whether the person concerned is Protestant or Catholic. In Equation 2.19, the $\beta_k$ are the "Protestant" coefficients30 and the $\gamma_k$, the coefficients attached to the interaction variables, represent the additional contribution to these coefficients resulting from being Catholic. The $\gamma_k$ values measure, therefore, the strength of the "interaction effects." A test of $\gamma_k = 0$ is, therefore, a test of the null hypothesis that there is no interaction between the kth explanatory variable and the religion of a person. To put it differently, $\gamma_k = 0$ implies that there is no difference between the Protestant and Catholic coefficients on the kth explanatory variable. The big advantage of this single equation "integrated" approach, over the two separate equations approach, is that it makes it possible to easily test whether some variable had a differential impact on the two groups. Indeed, the single equation "integrated" approach is explored in some depth in the subsequent chapter. However, to give a flavor of the second approach, Equation 2.16 was estimated separately on the Catholic and Protestant subsamples. Tables 2.13 and 2.14 show, respectively, the results of estimating Equation 2.16 separately for Protestants and for Catholics, when the errors are assumed to be logistically distributed.31

In the equation estimates shown in Tables 2.13 and 2.14, the variables dwn and dry were dropped. This is because their coefficients were individually (on the z-score) and jointly (on a likelihood ratio test) not significantly different from zero. This raises an important methodological point: when equations are used for prediction, should they contain all the variables, even though some of the coefficients may not be significantly different from zero?

TABLE 2.13
Ordered Logit on Deprivation in Northern Ireland: Protestant Subsample

Ordered Logit Estimates
Log Likelihood = -8124.0888; Number of obs = 8800; LR chi2(17) = 1527.75; Prob > chi2 = 0.0000; Pseudo R2 = 0.0859

y        Coefficient   Standard Error   z        P>|z|   [95% Conf. Interval]
sex      -0.2293014    0.0444134        -5.163   0.000   -0.31635     -0.1422527
age      -0.0275073    0.0060613        -4.538   0.000   -0.0393873   -0.0156273
age2      0.0008768    0.0001131         7.754   0.000    0.0006551    0.0010984
ret       0.6275465    0.1008167         6.225   0.000    0.4299493    0.8251437
inac      1.179892     0.0954488        12.362   0.000    0.9928158    1.366968
ue        1.556987     0.0848854        18.342   0.000    1.390615     1.723359
highed   -0.8085732    0.085388         -9.469   0.000   -0.9759306   -0.6412159
mided    -0.6320822    0.1490549        -4.241   0.000   -0.9242244   -0.33994
hnum      0.95203      0.08401          11.332   0.000    0.7873735    1.116686
snpar     0.3647026    0.0845436         4.314   0.000    0.1990001    0.5304051
ard      -0.4509469    0.063093         -7.147   0.000   -0.5746069   -0.327287
crk      -0.3349415    0.0715481        -4.681   0.000   -0.4751733   -0.1947098
ant       0.3530984    0.0758826         4.653   0.000    0.2043712    0.5018256
col       0.368263     0.0833618         4.418   0.000    0.2048769    0.5316491
arm       0.4664409    0.0988147         4.720   0.000    0.2727676    0.6601141
ban       0.2097729    0.076176          2.754   0.006    0.0604708    0.359075
frm       0.6888737    0.0980013         7.029   0.000    0.4967947    0.8809528

_cut1     0.1488535    0.0834433        (Ancillary parameters)
_cut2     2.04582      0.0869085

y    Probability                      Observed
1    Pr(xb + u < _cut1)               0.4970
2    Pr(_cut1 < xb + u < _cut2)       0.3399
3    Pr(_cut2 < xb + u)               0.1631

TABLE 2.14
Ordered Logit on Deprivation in Northern Ireland: Catholic Subsample

Ordered Logit Estimates
Log Likelihood = -4283.363; Number of obs = 4364; LR chi2(17) = 875.06; Prob > chi2 = 0.0000; Pseudo R2 = 0.0927

y        Coefficient   Standard Error   z        P>|z|   [95% Conf. Interval]
sex      -0.0430983    0.0621282        -0.694   0.488   -0.1648672    0.0786707
age      -0.0181642    0.0089546        -2.028   0.043   -0.0357148   -0.0006136
age2      0.00064      0.0001754         3.648   0.000    0.0002961    0.0009839
ret       0.6109521    0.1773859         3.444   0.001    0.2632821    0.958622
inac      1.487517     0.1299683        11.445   0.000    1.232784     1.742251
ue        1.364899     0.0906986        15.049   0.000    1.187133     1.542665
highed   -1.09291      0.1133639        -9.641   0.000   -1.315099    -0.8707208
mided    -0.7667852    0.2172056        -3.530   0.000   -1.1925      -0.3410701
hnum      0.9087931    0.0739094        12.296   0.000    0.7639333    1.053653
snpar     0.239649     0.0987062         2.428   0.015    0.0461884    0.4331096
ard      -0.5853662    0.16575          -3.532   0.000   -0.9102303   -0.2605022
crk      -0.3226664    0.1620424        -1.991   0.046   -0.6402637   -0.0050692
ant       0.2364182    0.1329838         1.778   0.075   -0.0242052    0.4970816
col       0.4795002    0.1039698         4.612   0.000    0.2757232    0.6832773
arm       0.2546794    0.0926459         2.749   0.006    0.0730968    0.4362621
ban       0.2025872    0.0954257         2.123   0.034    0.0155563    0.389618
frm       0.5243653    0.0950858         5.515   0.000    0.3380006    0.7107301

_cut1     0.0110645    0.1138485        (Ancillary parameters)
_cut2     1.866844     0.1178967

y    Probability                      Observed
1    Pr(xb + u < _cut1)               0.3824
2    Pr(_cut1 < xb + u < _cut2)       0.3682
3    Pr(_cut2 < xb + u)               0.249[?]
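
The two sets of subsample estimates can be compared informally, as sketched here. This is not the interaction test of Equation 2.19; it is a rough analogue that treats the Protestant and Catholic estimates as independent and forms a z statistic for their difference, an assumption added for this example. It also ignores the fact that coefficients from separately estimated ordered logit models are scaled by each group's own error variance, so the result is only indicative. The coefficient and standard error pairs are copied from Tables 2.13 and 2.14; the function names are invented.

```python
import math

# (coefficient, standard error) pairs copied from Tables 2.13 and 2.14.
protestant = {"sex": (-0.2293014, 0.0444134), "ue": (1.556987, 0.0848854)}
catholic = {"sex": (-0.0430983, 0.0621282), "ue": (1.364899, 0.0906986)}

def z_score(coef, se):
    """The z statistic reported in the tables: coefficient / standard error."""
    return coef / se

def difference_z(b1, se1, b2, se2):
    """z statistic for b2 - b1, assuming the two estimates are independent."""
    return (b2 - b1) / math.sqrt(se1 ** 2 + se2 ** 2)

for name in protestant:
    bp, sp = protestant[name]
    bc, sc = catholic[name]
    print(name,
          round(z_score(bp, sp), 3),               # Protestant z
          round(z_score(bc, sc), 3),               # Catholic z
          round(difference_z(bp, sp, bc, sc), 3))  # Catholic minus Protestant
```

On these numbers the difference in the sex coefficients looks larger relative to its standard error than the difference in the unemployment coefficients, which is the kind of group-specific effect the interaction terms of Equation 2.19 are designed to test formally.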

Or should equations used for prediction contain only those variables with significantly non-zero coefficients? One argument is that if one believed a priori that a variable had a legitimate place in the equation specification then one should persist with this belief and include it, no matter what. Another argument, however, is that because the purpose of estimation and prediction is to confront equation specification with data, to base predictions on the coefficient estimates obtained from the full specification may be misleading since it would allow variables, whose legitimacy in the specification had been explicitly "rejected" by the data, to influence the predictions. In the previous section, following the first argument, the predictions were based on the full specification; in this section, following the second argument, they are based on a restricted specification.

Table 2.15, below, shows that: 50% and 38% respectively of the Protestant and Catholic samples were not deprived; 34% and 37% respectively of the Protestant and Catholic samples were mildly deprived; and 16% and 25% respectively of the Protestant and Catholic samples were severely deprived. The fact that a smaller proportion of Catholics, compared to Protestants, were not deprived and that a higher proportion of Catholics were both mildly and severely deprived (see Table 2.15) could be due to two reasons. First, those characteristics which increased the probability of being deprived were disproportionately concentrated among Catholics and/or those characteristics which decreased the probability of being deprived were disproportionately concentrated among Protestants.32 Second, a particular attribute was penalized more harshly (if it was deprivation-increasing: for example, being unemployed) and/or was rewarded less generously (if it was deprivation-reducing: for example, having educational qualifications) if the person possessing the attribute was Catholic rather than Protestant.

TABLE 2.15
Deprivation in Northern Ireland by Religion

                PERCENTAGE OF SAMPLE THAT IS:
                Not Deprived   Mildly Deprived   Strongly Deprived
All persons     45.9           34.9              19.2
Catholics       38.2           36.8              24.9
Protestants     49.7           34.0              16.3

In order to determine how much of the Catholic-Protestant deprivation gap (defined as the difference in the proportions of Catholics and Protestants at different deprivation levels) was due to differences in characteristics, and how much was due to differences in coefficients, the econometric issue was posed in terms of the following questions:

1. What would have been the predicted probabilities of Protestants and Catholics being at different levels of deprivation if the characteristics possessed by each group had been evaluated using its own coefficient estimates? Denote these probabilities as, respectively, $\hat{p}_{ij}^P$ and $\hat{p}_{ij}^C$ (i = 1, ..., 13,164; j = 1, 2, 3) and their means as $\bar{p}_j^P$ and $\bar{p}_j^C$ (j = 1, 2, 3).33
2. What would have been the predicted probabilities of Protestants being at different levels of deprivation if their characteristics had been evaluated using Catholic coefficient estimates? Denote these "synthetic" probabilities as $\hat{q}_{ij}^P$ (i = 1, ..., 13,164; j = 1, 2, 3) and their mean as $\bar{q}_j^P$ (j = 1, 2, 3).
3. How do the $\bar{q}_j^P$ compare to the $\bar{p}_j^P$ and $\bar{p}_j^C$?

The values of these three probabilities, $\bar{p}_j^P$, $\bar{q}_j^P$, and $\bar{p}_j^C$, are shown in Table 2.16, not just for Catholics and Protestants in their entirety but also for subgroups within Catholics and Protestants. Table 2.16 shows that, calculated over all the persons in the sample, the average "own-coefficient"34 probabilities of Protestants being not deprived, mildly deprived and severely deprived were 50%, 34%, and 16% respectively. When Protestant characteristics were evaluated at Catholic coefficients, the probability of being not deprived fell to 46% and the probability of being severely deprived and of being mildly deprived rose to, respectively, 19% and 35%. This story was repeated when subgroups (single parents, retired persons, inactive persons, unemployed persons, and persons living in large families) from the sample were analyzed. With two exceptions (retired and unemployed persons) the probability of being not deprived always fell, and the probability of being severely deprived always rose, when Protestant characteristics were evaluated at Catholic coefficients though, naturally, the magnitude of these changes varied according to the subgroup being considered.35 The largest fall (in the probability of being not deprived) and the sharpest rise (in the probability of being strongly deprived) was recorded for persons who were economically inactive. The pattern with respect to the probability of being mildly
deprived was that the position of Protestant persons who were single parents, or retired, or unemployed or living in large families would have been unchanged,36 but that of persons who were inactive would have worsened, had their characteristics been evaluated at Catholic coefficients.

TABLE 2.16
Predicted Probabilities of Catholics and Protestants Being at Different Deprivation Levels*

                                            PREDICTED PROBABILITY OF BEING:
                                            Not        Mildly     Strongly
                                            Deprived   Deprived   Deprived
All Persons
  Protestants at Protestant coefficients    49.7       34.0       16.2
  Protestants at Catholic coefficients      45.8       35.3       18.9
  Catholics at Catholic coefficients        38.2       36.9       24.9
Single Parents
  Protestants at Protestant coefficients    37.2       38.2       24.6
  Protestants at Catholic coefficients      35.0       38.1       26.9
  Catholics at Catholic coefficients        27.3       37.8       34.8
Retired
  Protestants at Protestant coefficients    24.8       41.4       33.8
  Protestants at Catholic coefficients      24.6       40.7       34.6
  Catholics at Catholic coefficients        20.7       39.6       39.7
Inactive
  Protestants at Protestant coefficients    31.3       41.4       27.3
  Protestants at Catholic coefficients      20.7       38.9       40.5
  Catholics at Catholic coefficients        16.5       37.6       45.9
Unemployed
  Protestants at Protestant coefficients    21.1       40.3       38.6

The difference between the proportions of Protestants and Catholics in the different categories of deprivation, $\bar{p}_j^P - \bar{p}_j^C$, can be deconstructed as

$$\bar{p}_j^P - \bar{p}_j^C = (\bar{q}_j^P - \bar{p}_j^C) + (\bar{p}_j^P - \bar{q}_j^P) = A_j + B_j \qquad (2.20)$$

In Equation 2.20, $A_j$ represents that part of the "deprivation gap" between Catholics and Protestants that is due to intergroup differences in characteristics, and $B_j$ represents that part of the gap that is due to intergroup differences in coefficient values. These absolute differences can be expressed as proportions:

$$A_j^* = A_j / (\bar{p}_j^P - \bar{p}_j^C) \quad \text{and} \quad B_j^* = 1 - A_j^* \qquad (2.21)$$

and the values of $A_j^*$ and $B_j^*$ are shown, in percentage form, in Table 2.17.

Table 2.17 shows that, computed over all the persons in the sample, 66% of the deprivation gap ($\bar{p}_j^P - \bar{p}_j^C$) between Catholics and Protestants, with respect to those who were not deprived, was due to the fact that persons in the Catholic subsample had characteristics which were different from persons in the Protestant subsample. Thirty-four percent was due to the fact that Catholic characteristics were evaluated differently from Protestant characteristics. With respect to severe deprivation, 69% of the Catholic-Protestant gap was due to attribute differences, but with respect to mild deprivation only 55% of the Catholic-Protestant gap was due to attribute differences.
coefficients
1• I Protestants at Catholic 21.3 39.5 39.2 Turning to the specific subgroups, for penons who were single par-
'·- coefficients ents, retired, or unemployed, a comparatively large percentage of the
Catholics at Catholic 16.3 36.3 47.4 Catholic-Protestant gap, with respect to the different leveis of depri -
coefficients vation, was due to differences in characterist ics between Catholics and
ln Large Families
Protestants. Reiativeiy Iittle of the gap was dueto the fact that specific
Protcstants at Protestant 28.1 40.0 31.9
coefficients characteristics, when applied to Catholics, had more serious conse-
Protestants at Catholic 26.0 39.0 34.9 quences for deprivation than they did when applied to Protestants.
coefficients
Catholics at Catholic 23.2 38.6 38.2
coefficients
•c alculated from the ordered logit estim ares of Tables 2.13 and 2.14.
44 .f.5
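The decomposition of Equations 2.20 and 2.21 can be retraced numerically. The following is a minimal sketch in Python, assuming only the rounded "All Persons" probabilities printed in Table 2.16; the function and variable names are illustrative and are not taken from the text. Because the inputs are rounded to one decimal place, the shares it produces agree with those of Table 2.17 only to within about half a percentage point.

```python
# Predicted probabilities (%) for "All Persons", copied from Table 2.16.
# Order of outcomes: not deprived, mildly deprived, strongly deprived.
OUTCOMES = ["not deprived", "mildly deprived", "strongly deprived"]
p_prot = [49.7, 34.0, 16.2]   # Protestants at Protestant coefficients
q_prot = [45.8, 35.3, 18.9]   # Protestants at Catholic coefficients
p_cath = [38.2, 36.9, 24.9]   # Catholics at Catholic coefficients


def decompose(p_p, q_p, p_c):
    """Split the Protestant-Catholic gap for one outcome (Equations 2.20, 2.21)."""
    gap = p_p - p_c                # total gap
    a_j = q_p - p_c                # part due to differences in characteristics
    b_j = p_p - q_p                # part due to differences in coefficients
    a_share = 100.0 * a_j / gap    # A_j as a percentage of the gap
    b_share = 100.0 - a_share      # B_j as a percentage of the gap
    return gap, a_j, b_j, a_share, b_share


results = [decompose(*row) for row in zip(p_prot, q_prot, p_cath)]

for name, (gap, a_j, b_j, a_share, b_share) in zip(OUTCOMES, results):
    print(f"{name:18s} gap={gap:6.1f}  characteristics={a_share:5.1f}%  "
          f"coefficients={b_share:5.1f}%")
```

The same function can be applied to any of the subgroup blocks of Table 2.16 by substituting the corresponding three rows.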

TABLE 2.17
Contributions to the Deprivation Gap Between Catholics and Protestants*

                          PERCENTAGE CONTRIBUTIONS BY DIFFERENCES IN:
                          Characteristics (A_j*)   Coefficients (B_j*)
All Persons
  No-Deprivation                  65.8                   34.2
  Mild-Deprivation                54.7                   45.3
  Strong-Deprivation              69.4                   30.6
Single Parents
  No-Deprivation                  78.2                   21.8
  Mild-Deprivation                64.4                   35.6
  Strong-Deprivation              77.7                   22.3
Retired
  No-Deprivation                  96.1                    3.9
  Mild-Deprivation                63.1                   36.9
  Strong-Deprivation              85.7                   14.3
Inactive
  No-Deprivation                  28.3                   71.7
  Mild-Deprivation                33.9                   66.1
  Strong-Deprivation              29.4                   70.6
Unemployed
  No-Deprivation                 103.0                   -3.0
  Mild-Deprivation                81.7                   18.3
  Strong-Deprivation              93.5                    6.5
In Large Families
  No-Deprivation                  58.1                   41.9
  Mild-Deprivation                27.2                   72.8
  Strong-Deprivation              51.4                   48.6

*Calculated from the probability estimates of Table 2.16.

For example, for unemployed persons, over 93% of the Catholic-Protestant gap, with respect to severe deprivation, could be explained in terms of differences in characteristics between persons belonging to the two communities. However, for inactive persons differences in the coefficients used to evaluate characteristics (as opposed to differences in the characteristics themselves) accounted for over two-thirds of the deprivation gap between Catholics and Protestants. For persons living in large families, nearly half of the intercommunity deprivation gap in respect of severe deprivation and over 70% in respect of mild deprivation was due to differences in the coefficients used to evaluate the deprivation generating characteristics.

3. MULTINOMIAL LOGIT

Introduction

The previous chapter considered a class of models which were centered around events with multiple (>2) outcomes, where these outcomes were inherently ordered. In this class of models the dependent variable $Y_i$, when defining these outcomes for person $i$ ($Y_i = 1$, for the first outcome; $Y_i = 2$, for the second outcome; and so on, until $Y_i = M$, for the $M$th (last) outcome, $i = 1, \ldots, N$), was a discrete, ordinal variable. The appropriate methods of estimating such models were ordered logit and ordered probit. This chapter focuses on multiple outcome models where the outcomes are not ordered. The methodology of multinomial logit, which is the appropriate estimation method for this class of models, is explained and then applied to an analysis of occupation choice. An important property (and limitation) of multinomial logit is that of the Independence of Irrelevant Alternatives (IIA). This chapter concludes with a discussion of this property and of how its limitations might be circumvented.

Several real world events provide examples of unordered outcomes. The choice of transportation to work (by bus, train or car) is clearly such an example. Other examples of unordered outcomes are occupational choice (unskilled; skilled; professional; managerial), choice of residence location (north; south; east; west), and choice of party at elections (Conservative; Liberal; Labor). A word that is common to all the above examples is "choice." Since there is nothing inherently good or bad about the outcomes, individuals may be viewed as choosing, from the menu of available outcomes, that outcome which suits them the best. Indeed, the framework of utility maximization offers a good starting point for understanding the structure of multiple outcome models when the outcomes are unordered.

A Random Utility Model

Given a choice between $M$ alternatives (indexed $j = 1, \ldots, M$), the utility that the $i$th person ($i = 1, \ldots, N$) derives from the $j$th alternative may be represented as $U_{ij}$. Suppose that this utility is a linear function of $H$ factors (determining variables). Of these $H$ factors, suppose that $R$ factors are specific to the individual and have nothing to do with the nature of the choice and that $S$ ($H = R + S$)

factors are specific to the choice and have nothing to do with the individual. For example, in terms of choosing a mode of travel to work, a characteristic of the individual might be that he or she does not live near a train station. The fact that this individual rarely travels to work by train has nothing to do with the nature of train travel but is solely an outcome of where that person lives. On the other hand, the fact that rush hour traffic is heavy is an attribute of car travel that reduces every person's utility from going to work by car. Suppose that the values of the $R$ variables representing the characteristics of the $i$th person are $X_{ir}$, $r = 1, \ldots, R$ and that the values of the variables representing the attributes of the $j$th choice are $W_{js}$, $s = 1, \ldots, S$. The utility function may be written as

$$U_{ij} = \sum_{r=1}^{R} \beta_{jr} X_{ir} + \sum_{s=1}^{S} \gamma_{is} W_{js} + \varepsilon_{ij} = Z_{ij} + \varepsilon_{ij}, \tag{3.1}$$

where $\beta_{jr}$ is the coefficient associated with the $r$th characteristic ($r = 1, \ldots, R$) for the $j$th alternative, $\gamma_{is}$ is the coefficient associated with the $s$th attribute ($s = 1, \ldots, S$) for the $i$th person, and

$$Z_{ij} = \sum_{r=1}^{R} \beta_{jr} X_{ir} + \sum_{s=1}^{S} \gamma_{is} W_{js}. \tag{3.2}$$

An increase in $X_{ir}$, the value of the $r$th characteristic for person $i$, will cause his or her utility from choice $j$ to rise if $\beta_{jr} > 0$ and to fall if $\beta_{jr} < 0$. An increase in $W_{js}$, the value of the $s$th attribute for choice $j$, will cause utility to rise for person $i$ if $\gamma_{is} > 0$ and to fall if $\gamma_{is} < 0$. However, since the relationship between utility and its determining variables is not an exact one (for example, there may be factors left out of the equation or factors may be measured inaccurately), an error term, $\varepsilon_{ij}$, is included in the equation to capture this inexactitude. Hence the term random utility model.

A person will choose $j = m$ if and only if it offers, of all the available choices, the highest level of utility. In other words, if $Y_i$ is a random variable whose value ($j = 1, \ldots, M$) indicates the choice made by person $i$, the probability that person $i$ will choose alternative $m$ is

$$\Pr(Y_i = m) = \Pr(U_{im} > U_{ij}) \quad \text{for all } j = 1, \ldots, M,\ j \neq m$$
$$= \Pr(Z_{im} + \varepsilon_{im} > Z_{ij} + \varepsilon_{ij})$$
$$= \Pr(\varepsilon_{ij} - \varepsilon_{im} < Z_{im} - Z_{ij}) \quad \text{for all } j = 1, \ldots, M,\ j \neq m.$$

The Class of Logit Models: Multinomial and Conditional

McFadden (1973) has shown that if the $M$ error terms $\varepsilon_{ij}$ ($j = 1, \ldots, M$) are independently and identically distributed with Weibull distribution $F(\varepsilon_{ij}) = \exp[-\exp(-\varepsilon_{ij})]$, then

$$\Pr(Y_i = m) = \frac{\exp(Z_{im})}{\sum_{j=1}^{M} \exp(Z_{ij})}. \tag{3.3}$$

A model in which the probabilities of the different outcomes, $j = 1, \ldots, M$, are defined by Equation 3.3 is defined here as a generalized logit model. The term "generalized" conveys the fact that the model incorporates both characteristic effects and attribute effects, respectively the $X_{ir}$ and $W_{js}$ of Equation 3.1. Within the class of generalized logit models, two subclasses may be distinguished:

1. Multinomial logit models. These are models which incorporate only characteristic effects, so that for such models all the $\gamma_{is} = 0$ in Equation 3.1. In effect, such models apply when the data are individual specific.
2. Conditional logit models. These are models which incorporate only attribute effects, so that for such models all the $\beta_{jr} = 0$ in Equation 3.1. In effect, such models apply when the data are choice specific.

Multinomial Logit

The multinomial model is defined by Equation 3.3 but with the caveat that now, with the $\gamma_{is} = 0$,

$$Z_{ij} = \sum_{r=1}^{R} \beta_{jr} X_{ir}. \tag{3.4}$$

Because the probabilities $\Pr(Y_i = j)$ sum to 1 over all the choices (that is, $\sum_{j=1}^{M} \Pr(Y_i = j) = 1$), only $M - 1$ of the probabilities can be determined independently. Consequently the multinomial logit model of Equation 3.3 is indeterminate, as it is a system of $M$ equations in only $M - 1$ independent unknowns. A convenient normalization that solves the problem is to set $\beta_{1r} = 0$, $r = 1, \ldots, R$. Under this

normalization $Z_{i1} = 0$, and so from Equation 3.3

$$\Pr(Y_i = 1) = \frac{1}{1 + \sum_{j=2}^{M} \exp(Z_{ij})} \tag{3.5a}$$

$$\Pr(Y_i = m) = \frac{\exp(Z_{im})}{1 + \sum_{j=2}^{M} \exp(Z_{ij})}, \quad m = 2, \ldots, M. \tag{3.5b}$$

As a consequence of the normalization, the probabilities are uniquely determined so that Equation 3.5b represents a system of $M - 1$ equations in the $M - 1$ unknown probabilities, $\Pr(Y_i = m)$, with $\Pr(Y_i = 1)$ having been defined by Equation 3.5a through the normalization adopted.

From Equations 3.5a and 3.5b, the logarithm of the ratio of the probability of outcome $j = m$ to that of outcome $j = k$ is

$$\log\left(\frac{\Pr(Y_i = m)}{\Pr(Y_i = k)}\right) = \sum_{r=1}^{R} (\beta_{mr} - \beta_{kr}) X_{ir} = Z_{im} - Z_{ik}, \tag{3.6}$$

so that the logarithm of the risk-ratio (that is, the logarithm of the ratio of the probability of outcome $m$ to that of outcome $k$, or $\log[\Pr(Y_i = m)/\Pr(Y_i = k)]$) does not depend upon the other choices. The risk-ratio or, as it is sometimes referred to, the relative risk, $\Pr(Y_i = m)/\Pr(Y_i = k)$, can easily be calculated from the log risk-ratio by taking its exponential. If $k = 1$, the log risk-ratio is

$$\log\left(\frac{\Pr(Y_i = m)}{\Pr(Y_i = 1)}\right) = \sum_{r=1}^{R} \beta_{mr} X_{ir} = Z_{im}, \quad (m = 2, \ldots, M), \tag{3.7}$$

and the risk-ratio is

$$\frac{\Pr(Y_i = m)}{\Pr(Y_i = 1)} = \exp\left(\sum_{r=1}^{R} \beta_{mr} X_{ir}\right) = \exp(Z_{im}), \quad (m = 2, \ldots, M). \tag{3.8}$$

The risk-ratio (RR) should be distinguished from the odds-ratio (OR) where the latter refers to the probability of an outcome divided by 1 minus the probability of that outcome. Thus the odds-ratio for $j = m$ is

$$OR_m = \frac{\Pr(Y_i = m)}{1 - \Pr(Y_i = m)} = \frac{\Pr(Y_i = m)}{\Pr(Y_i = 1)} \cdot \frac{\Pr(Y_i = 1)}{1 - \Pr(Y_i = m)} = \frac{RR_m \Pr(Y_i = 1)}{1 - RR_m \Pr(Y_i = 1)}, \tag{3.9}$$

where $OR_m$ and $RR_m$ are, respectively, the odds-ratio and the risk-ratio associated with outcome $j = m$ (the latter relative to the "base" outcome $j = 1$).

In a binary model, there is no distinction between the RR and the OR since the base outcome $Y_i = 1$ is simply the outcome $Y_i \neq m$. In a model with more than two possible outcomes, the outcomes $Y_i = 1$ and $Y_i \neq m$ are different. The natural method in all logit models is to express results as the ratio of the likelihood of an outcome and the likelihood of some base outcome, that is, to compute the RR. In binary models, however, the RR is the OR and so results in such models are expressed in terms of the latter. In multinomial logit models, on the other hand, results are expressed in terms of RR and not in terms of OR since these are now different from each other.

Estimation and Prediction

Each of the $N$ observations on the dependent variable $Y_i$ ($i = 1, \ldots, N$) is treated as a single draw from a multinomial distribution with $M$ outcomes. Define a dummy variable $\delta_{ij} = 1$ if person $i$ makes choice $j$, $\delta_{ij} = 0$ otherwise, $j = 1, \ldots, M$. Then the likelihood of observing the sample is

$$L = \prod_{i=1}^{N} \prod_{j=1}^{M} [\Pr(Y_i = j)]^{\delta_{ij}} \;\Rightarrow\; \log L = \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij} \log \Pr(Y_i = j), \tag{3.10}$$

where $\Pr(Y_i = j)$ is defined by Equation 3.5a if $j = 1$, and by Equation 3.5b if $j > 1$. The parameter estimates $\hat{\beta}_{jr}$ ($j = 1, \ldots, M$; $r = 1, \ldots, R$) are chosen so as to maximize the likelihood function (Equation 3.10).

Given these estimates, for each person $i = 1, \ldots, N$ one can form estimates of $Z_{ij}$ using Equation 3.4, with the $\hat{\beta}_{jr}$ in place of $\beta_{jr}$, for every outcome $j = 1, \ldots, M$. Then, using these estimates, $\hat{Z}_{ij}$,

the predicted probabilities, $\hat{p}_{ij}$, can be computed from Equations 3.5a and 3.5b, for $i = 1, \ldots, N$ and $j = 1, \ldots, M$. Note that for every person, the estimated probabilities must sum to unity across all the outcomes, $\sum_{j=1}^{M} \hat{p}_{ij} = 1$. A property of the multinomial logit model is that the mean of the estimated individual probabilities for each outcome ($\bar{p}_j = \sum_{i=1}^{N} \hat{p}_{ij}/N$, $j = 1, \ldots, M$) is equal to the observed proportion of persons in that outcome category.

An alternative way of predicting probabilities from the multinomial logit model is to compute the mean of $\hat{Z}_{ij}$ across all the persons for each outcome $j = 1, \ldots, M$ as

$$\bar{Z}_j = \sum_{i=1}^{N} \hat{Z}_{ij}/N = \sum_{i=1}^{N} \left(\sum_{r=1}^{R} \hat{\beta}_{jr} X_{ir}\right) \Big/ N = \sum_{r=1}^{R} \hat{\beta}_{jr} \left(\sum_{i=1}^{N} X_{ir}/N\right) = \sum_{r=1}^{R} \hat{\beta}_{jr} \bar{X}_r, \tag{3.11}$$

and then to calculate the predicted probabilities as

$$\tilde{p}_1 = \frac{1}{1 + \sum_{j=2}^{M} \exp(\bar{Z}_j)} \tag{3.12a}$$

$$\tilde{p}_m = \frac{\exp(\bar{Z}_m)}{1 + \sum_{j=2}^{M} \exp(\bar{Z}_j)}, \quad m = 2, \ldots, M. \tag{3.12b}$$

Then, in general, $\bar{p}_1 \neq \tilde{p}_1$, $\bar{p}_2 \neq \tilde{p}_2$, and $\bar{p}_3 \neq \tilde{p}_3$, which is not surprising because although both the $\bar{p}_j$ and the $\tilde{p}_j$ ($j = 1, 2, 3$) purport to measure the overall probability of the different outcomes, they are computed very differently. The $\bar{p}_j$ are computed as the mean of the estimated individual probabilities, $\hat{p}_{ij}$, and the $\hat{p}_{ij}$ are computed from the values of the determining variables for the individual. On the other hand the $\tilde{p}_j$, which bypass the individual probabilities, are computed from the mean values $\bar{Z}_j$ of the individual $\hat{Z}_{ij}$ (or, equivalently, calculated using the mean values $\bar{X}_r$ of the determining variables $X_{ir}$).

Marginal Effects

The question of the marginal effect on the probabilities of the different outcomes of a small change in the value of the determining variables can be phrased in two separate ways. For a small change in $X_{ir}$ (the value, for person $i$, of the $r$th determining variable) what will be the change, for some outcome $m$, in the following?

1. The probability $\Pr(Y_i = m)$.
2. The risk-ratio $\Pr(Y_i = m)/\Pr(Y_i = 1)$.

The second question is relatively easy to answer, but the first is much more difficult. Taking the easier question first, from Equation 3.7,

$$\frac{\partial}{\partial X_{ir}} \log\left(\frac{\Pr(Y_i = m)}{\Pr(Y_i = 1)}\right) = \beta_{mr},$$

so that for a small change in $X_{ir}$ the direction of change in the risk-ratio can be inferred from the sign of the associated coefficient; the relative probability of $j = m$ increases if $\beta_{mr} > 0$ and decreases if $\beta_{mr} < 0$.

However, the direction of change in $\Pr(Y_i = m)$, the probability of observing outcome $j = m$, for a small change in $X_{ir}$ cannot be inferred from the sign of $\beta_{mr}$. The reason is that in a multinomial model a change in the value of a variable for a particular person affects for him or her the probability of every outcome. Since these probabilities are constrained to sum to unity, whether $\Pr(Y_i = m)$ goes up or down depends upon what happens to the other probabilities. Therefore in effect it depends not just upon the sign of $\beta_{mr}$ but also upon the size of that coefficient relative to the size of the other coefficients attached to the variable, that is, to the $\beta_{jr}$, $j = 1, \ldots, M$, $j \neq m$. Consequently, $\partial \Pr(Y_i = m)/\partial X_{ir}$ need not have the same sign as $\beta_{mr}$.

The most effective way of establishing the effect of a change in the value of a variable upon the outcome probabilities in a multinomial model is to compare the computed probabilities before and after the change with the values of the other variables left unchanged. This method is most useful when the determining variable under analysis is a dummy variable. If the $r$th variable is a dummy variable, so that $X_{ir} = 1$ or $X_{ir} = 0$, then first evaluate $\hat{Z}_{ij}$ for every outcome $j = 1, \ldots, M$ using Equation 3.4, under the assumption that for all persons $X_{ir} = 1$. Call this estimate $\hat{Z}_{ij}^1$. Then ceteris paribus evaluate $\hat{Z}_{ij}$ for every outcome $j = 1, \ldots, M$, under the assumption that for all persons $X_{ir} = 0$. Call this estimate $\hat{Z}_{ij}^0$ and note that $\hat{Z}_{ij}^1 = \hat{Z}_{ij}^0 + \hat{\beta}_{jr}$.
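The claim that the sign of a coefficient fixes the direction of change in the risk-ratio, but not in the probability itself, can be checked with a small numerical sketch. The Python fragment below assumes a three-outcome model with a single dummy variable and no intercepts; the coefficient values 0.5 and 2.0 are hypothetical and chosen only for illustration, and the function name `mnl_probabilities` is not from the text. Probabilities are computed from Equations 3.5a and 3.5b with outcome 1 as the base.

```python
import math


def mnl_probabilities(z):
    """Equations 3.5a and 3.5b.

    z holds Z_i2, ..., Z_iM; the base outcome has Z_i1 = 0 by normalization.
    Returns [Pr(Y=1), Pr(Y=2), ..., Pr(Y=M)].
    """
    exp_z = [math.exp(v) for v in z]
    denom = 1.0 + sum(exp_z)
    return [1.0 / denom] + [e / denom for e in exp_z]


# Hypothetical coefficients on one dummy variable X (illustration only).
beta = {2: 0.5, 3: 2.0}


def probs_at(x):
    """Outcome probabilities when the dummy variable takes the value x."""
    return mnl_probabilities([beta[2] * x, beta[3] * x])


p_x0 = probs_at(0)   # probabilities with X = 0
p_x1 = probs_at(1)   # probabilities with X = 1

rr2_x0 = p_x0[1] / p_x0[0]   # risk-ratio of outcome 2 relative to outcome 1
rr2_x1 = p_x1[1] / p_x1[0]

print("Pr(Y=2):", round(p_x0[1], 4), "->", round(p_x1[1], 4))
print("RR_2   :", round(rr2_x0, 4), "->", round(rr2_x1, 4))
```

In this sketch the coefficient for outcome 2 is positive, so its risk-ratio rises by the factor exp(0.5), yet the probability of outcome 2 falls because outcome 3 carries a much larger coefficient on the same variable.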

Use Equations 3.5a and 3.5b to compute the predicted probabilities, first using $Z_{ij} = \hat{Z}_{ij}^1$ (to obtain $\hat{p}_{ij}^1$) and then using $Z_{ij} = \hat{Z}_{ij}^0$ (to obtain $\hat{p}_{ij}^0$). Denote the mean probabilities, computed over all persons, as $\bar{p}_j^1 = \sum_{i=1}^{N} \hat{p}_{ij}^1/N$ and $\bar{p}_j^0 = \sum_{i=1}^{N} \hat{p}_{ij}^0/N$. The difference between the $\bar{p}_j^1$ and $\bar{p}_j^0$ is the "mean" effect of a change, for all persons, in the value of the $r$th determining variable on the probability of observing outcome $j$ ($j = 1, \ldots, M$).

An alternative way of keeping the values of the other variables unchanged while changes to the value of a variable are being analyzed is to set the values of the other variables to their mean values, $\bar{X}_s = \sum_{i=1}^{N} X_{is}/N$. Then define

$$\bar{Z}_j^1 = \hat{\beta}_{jr} + \sum_{s \neq r} \hat{\beta}_{js} \bar{X}_s \quad \text{and} \quad \bar{Z}_j^0 = \sum_{s \neq r} \hat{\beta}_{js} \bar{X}_s, \tag{3.13}$$

and use Equations 3.5a and 3.5b to compute the predicted probabilities, first using $Z_{ij} = \bar{Z}_j^1$ (to obtain $\tilde{p}_j^1$) and then using $Z_{ij} = \bar{Z}_j^0$ (to obtain $\tilde{p}_j^0$). The difference between the $\tilde{p}_j^1$ and $\tilde{p}_j^0$ is the effect of a change in the value of the $r$th determining variable for the "average person" (defined as a person with average values for all other variables) on the probability of observing outcome $j$ ($j = 1, \ldots, M$) for that person.

The effect of a change in the value of a determining variable upon the risk-ratio is much more straightforward. The difference in the log risk-ratio when, respectively, $X_{ir} = 1$ and $X_{ir} = 0$ is, from Equation 3.7, $\hat{Z}_{ij}^1 - \hat{Z}_{ij}^0 = \hat{\beta}_{jr}$, and so the risk-ratio with $X_{ir} = 1$ is $\exp(\hat{\beta}_{jr})$ times the risk-ratio with $X_{ir} = 0$. Thus the exponential value of a coefficient represents the factor by which the risk-ratio (for that outcome) changes for a one unit change in a determining variable. Needless to say, a one unit change is most appropriately considered for variables that are dummy variables.

Application to Occupational Outcomes

The Equation Specification

Information was extracted from the 1991 Census for Britain on the occupational class (and other characteristics) of male full-time employees who were between the ages of 25 and 45 years.³⁷ The occupational classes were unskilled/semiskilled (UNS), skilled manual/nonmanual (SKL), and professional/managerial/technical (PMT). Of these 98,732 men 96,297 were "white," 863 were "black," and 1,572 were (Asian) Indians. Table 3.1 shows the salient features of the sample statistics. This table demonstrates very clearly the differences in occupational status between whites, blacks, and Indians. For example, while 42% of white male employees were in PMT jobs, only 35% of Indians, and only 29% of blacks were in similar employment. Nor could these differences be explained away in terms of differences in characteristics: 17% of Indian employees had degrees, as opposed to 15% of white employees.

TABLE 3.1
Sample Statistics of Male Full-Time Employees, by Ethnic Group*

                                          White     Black    Indian
Sample Size                              96,297       863     1,572
Age (yrs)                                  35.4      32.6      34.9
% in Occupational Class
  Professional/Managerial/Technical        41.7      29.3      35.1
  Skilled Manual/Nonmanual                 40.7      50.2      39.8
  Unskilled/Semiskilled                    17.6      20.5      25.2
% With Post-18 Qualification
  Degree                                   15.1       7.9      17.4
  Subdegree                                 9.2       7.1       6.6
  No Post-18 Qualification                 75.7      85.0      76.0
% Born in
  Britain                                  96.2      54.6       7.6
  Overseas                                  3.8      45.4      92.4
% Living in
  London                                   10.1      51.5      43.6
  North                                    50.3      24.9      30.2
  South                                    39.6      23.6      26.2

*Information from the 1991 census for Britain.

In terms of the choice formulation set out earlier, it would appear that whites, blacks, and Indians face different sets of constraints in making their choices. But such observations beg the question of what lies behind these differences in constraints. It is possible that persons from minority ethnic groups face disadvantage in the labour market in relation to equally qualified white persons. It is also possible that persons from minority ethnic groups have less favorable worker characteristics (hereafter, referred to as characteristics) than white persons. Intergroup differences in occupational representation may, therefore, be the result of both ethnic and characteristics disadvantage. The crucial question is: how much of these differences is the result of ethnic disadvantage and how much is due to characteristics disadvantage? This section shows how the method of multinomial logit that is described in the previous section can be used to answer this question.

The starting point was to define the dependent variable $Y_i$ for each of the $i = 1, \ldots, 98{,}732$ men in the sample such that:

• $Y_i = 1$ if the person was employed in a UNS occupation.
• $Y_i = 2$ if the person was employed in a SKL occupation.
• $Y_i = 3$ if the person was employed in a PMT occupation.

The determining variables used in the multinomial specification were:

Age

• AGE in years: normalized by setting AGE = 0 for persons who were 25 years old.

Education³⁸

• HED = 1, if the person had degree-level qualifications; HED = 0, otherwise.
• MED = 1, if the person had post-A level, but less than degree, qualifications; MED = 0, otherwise.

Area of Study³⁹

• SCI = 1, if area was science related; SCI = 0, otherwise.
• BUS = 1, if area was business-studies related; BUS = 0, otherwise.

Ethnicity⁴⁰

• BLK = 1, if the person's ethnicity was Black-Caribbean; BLK = 0, otherwise.
• IND = 1, if the person's ethnicity was Indian; IND = 0, otherwise.

Country of Birth

• OVB = 1, if the person was born outside Britain; OVB = 0, otherwise.

Area of Residence⁴¹

• STH = 1, if living in the South of Britain⁴² (excluding London); STH = 0, otherwise.
• NTH = 1, if living in the North of Britain;⁴³ NTH = 0, otherwise.

On the basis of these variables the two (log) risk-ratio equations (Equation 3.7 of the previous section, for $j = 2, 3$) were specified as follows:

$$\begin{aligned}
\log\left(\frac{\Pr(Y_i = j)}{\Pr(Y_i = 1)}\right) ={} & \alpha_{j0} + \beta_{j0}\,BLK_i + \gamma_{j0}\,IND_i \\
& + \alpha_{j1}\,OVB_i + \beta_{j1}\,BLK_i \cdot OVB_i + \gamma_{j1}\,IND_i \cdot OVB_i \\
& + \alpha_{j2}\,HED_i + \beta_{j2}\,BLK_i \cdot HED_i + \gamma_{j2}\,IND_i \cdot HED_i \\
& + \alpha_{j3}\,MED_i + \beta_{j3}\,BLK_i \cdot MED_i + \gamma_{j3}\,IND_i \cdot MED_i \\
& + \alpha_{j4}\,NTH_i + \beta_{j4}\,BLK_i \cdot NTH_i + \gamma_{j4}\,IND_i \cdot NTH_i \\
& + \alpha_{j5}\,STH_i + \beta_{j5}\,BLK_i \cdot STH_i + \gamma_{j5}\,IND_i \cdot STH_i \\
& + \theta_{j1}\,AGE_i + \theta_{j2}\,AGE_i^2 + \theta_{j3}\,BUS_i + \theta_{j4}\,SCI_i \\
& + \theta_{j5}\,BUS_i \cdot HED_i + \theta_{j6}\,SCI_i \cdot HED_i = Z_{ij}
\end{aligned} \tag{3.14}$$

The outcome $j = 1$ (that is, being in the UNS occupational class) is hereafter referred to as the "base" outcome. The coefficients of this outcome are set to zero and the risk-ratios of the other outcomes are defined with respect to the probability of this base outcome.⁴⁴

In Equation 3.14, the $\alpha_{jr}$ are the "white" coefficients; the $\beta_{jr}$ and the $\gamma_{jr}$ represent the additional contribution to these coefficients resulting from being, respectively, black and Indian. On the other hand, the $\theta_{jk}$ are coefficients that are assumed not to have an "ethnic dimension," meaning that they are (assumed to be) invariant with respect to ethnicity. From Equation 3.14, the log risk-ratio is $\alpha_{j0}$ for a 25-year-old (AGE = 0), British-born (OVB = 0), white (BLK = IND = 0) male employee with no post-18 educational qualifications (HED = MED = 0), living in London (NTH = STH = 0). If such a person were, instead, black or Indian then his log risk-ratio would change by, respectively, $\beta_{j0}$ and $\gamma_{j0}$ to become, respectively, $(\alpha_{j0} + \beta_{j0})$ and $(\alpha_{j0} + \gamma_{j0})$.
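How Equation 3.14 turns a person's profile into log risk-ratios, and then into outcome probabilities through Equations 3.5a and 3.5b, can be sketched for a few profiles. The Python fragment below copies four coefficients per equation from the full-specification estimates of Table 3.2. It rests on an assumption that is not stated in the text: that the printed labels `_cons`, `ind`, `ovsbn`, and `indovs` correspond to the intercept, IND, OVB, and IND × OVB. All other variables are held at zero (a 25-year-old living in London with no post-18 qualification), and the function names are illustrative.

```python
import math

# Coefficients copied from Table 3.2 (y = 2 is SKL, y = 3 is PMT; y = 1, UNS, is the base).
# Assumed mapping of printed labels: _cons = intercept, ind = IND,
# ovsbn = OVB, indovs = IND x OVB.
coef = {
    2: {"_cons": 0.7755227, "ind": 0.4707812, "ovsbn": -0.109104, "indovs": -0.6064709},
    3: {"_cons": -0.0164003, "ind": 0.0918407, "ovsbn": 0.0932756, "indovs": -0.7893394},
}


def log_risk_ratio(j, ind, ovb):
    """Z_ij of Equation 3.14 with every variable other than IND and OVB set to zero."""
    c = coef[j]
    return c["_cons"] + c["ind"] * ind + c["ovsbn"] * ovb + c["indovs"] * ind * ovb


def probabilities(ind, ovb):
    """Pr(UNS), Pr(SKL), Pr(PMT) from Equations 3.5a and 3.5b."""
    exp_z = [math.exp(log_risk_ratio(j, ind, ovb)) for j in (2, 3)]
    denom = 1.0 + sum(exp_z)
    return [1.0 / denom] + [e / denom for e in exp_z]


profiles = {
    "white, British-born": (0, 0),
    "Indian, British-born": (1, 0),
    "Indian, overseas-born": (1, 1),
}

for label, (ind, ovb) in profiles.items():
    p_uns, p_skl, p_pmt = probabilities(ind, ovb)
    print(f"{label:22s} UNS={p_uns:.3f}  SKL={p_skl:.3f}  PMT={p_pmt:.3f}")
```

Comparing the rows of the output is an instance of the before-and-after comparison of computed probabilities described under Marginal Effects, applied here to a single reference profile rather than averaged over the sample.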

If, for example, $\beta_{j0} < 0$ then the ratio of the probability of being in occupational class $j$ to the probability of being in the UNS class would be higher for a white person than for an equivalent black person. The magnitudes of the coefficients $\beta_{j0}$ and $\gamma_{j0}$ measure the degree of "ethnic disadvantage" (with respect to occupational class $j$) faced by blacks and Indians vis-à-vis whites in the context of the characteristics set out above.⁴⁵ The interaction terms in Equation 3.14, involving the ethnic variables BLK and IND, allow the degree of ethnic disadvantage to vary with some of the nonethnic characteristics of a person. These nonethnic characteristics were country of birth, region of residence, and level of qualification. For men with the characteristics described above, the log risk-ratio of a white overseas born (OVB = 1) male being in occupational class $j$ is $\alpha_{j0} + \alpha_{j1}$, and that of a black overseas born (OVB = 1) male is $\alpha_{j0} + \alpha_{j1} + \beta_{j0} + \beta_{j1}$.

A set of nonethnic interactions that were included in Equation 3.14 was between the area of study and the level of qualification. These interactions have associated coefficients $\theta_{j5}$ and $\theta_{j6}$, and nonzero values for these would imply that in determining the value of the risk-ratio it was not just the area of study and the level of qualification, considered separately, that mattered but also how subject and qualifications fused to produce, for example, a graduate with a science-related degree.

The Equation Statistics

Equation 3.14 was estimated first without any restrictions imposed upon its coefficients and then with some of its coefficients constrained to be zero. Equation 3.14 contained a total of 48 coefficients, 24 each in the SKL ($j = 2$) and PMT ($j = 3$) equations. Of these 48 coefficients, 13 coefficients were set to zero. These coefficients set to zero were, individually, not significantly different from zero, and likelihood ratio tests with $\chi^2(13) = 8.1$ did not reject the joint hypothesis that they were all equal to zero. Tables 3.2 and 3.3 show, respectively, the results of estimating Equation 3.14 without and with the zero restrictions imposed. A comparison of the results with and without the zero restrictions imposed showed that imposing the zero restrictions did not qualitatively affect the estimates of the coefficients that were not set to zero. The z-ratios in Tables 3.2 and 3.3 are the ratios of the estimated coefficients to their estimated standard errors; the z-ratios are asymptotically distributed as $N(0, 1)$ under the null hypothesis that the associated coefficients are zero.⁴⁶

Greene (2000, pp. 831-833) has a number of suggestions for measuring the "goodness-of-fit" of equations with discrete dependent variables. At a minimum he suggests that one should report the maximized value of the log-likelihood function. The values of $L_1$ are the maximized log-likelihood values shown at the head of the tables (respectively, -86948.078 and -86952.126). Since the hypothesis that all the slopes in the model are zero is often interesting, the results of comparing the "full" model with an "intercept only" model should also be reported. The $\chi^2$ values at the head of Tables 3.2 and 3.3 (30880.36 and 30872.27) are defined as $2(L_1 - L_0)$ where $L_0$ is the value of the log-likelihood function when the only explanatory variable was the constant term and $L_1$, as observed earlier, is the value of the log-likelihood function when all the explanatory variables were included. The degrees of freedom are equal to the number of slope coefficients estimated. These $\chi^2$ values decisively reject the null hypothesis that the model did not have greater explanatory power than an "intercept only" model.

The "pseudo-$R^2$" is defined as $1 - L_1/L_0$ and is due to McFadden (1973). This is bounded from below by 0 and from above by 1. A 0 value corresponds to all the slope coefficients being zero, and a value of 1 corresponds to perfect prediction (that is, to $L_1 = 0$). Unfortunately, as Greene (2000) notes, the values between 0 and 1 have no natural interpretation, though it has been suggested that the pseudo-$R^2$ value increases as the fit of the model improves. Other measures have been suggested. Ben-Akiva and Lerman (1985) and Kay and Little (1986) suggested a fit measure that measured the average probability of correct prediction by the prediction rule. Cramer (1999) suggested a measure that corrected for the failure of the Ben-Akiva/Lerman measure to take into account that, in unbalanced samples, the less frequent outcome will usually be predicted very badly. In their survey of pseudo-$R^2$ measures, Veall and Zimmerman (1996) argued that in models of the multinomial probit or logit type, only the McFadden (1973) measure "seemed worthwhile."

An alternative to "point" measures of goodness-of-fit might be to assess the predictive ability of the model. Such assessments are routine in models of binary choice where the hits ($Y_i = 1$) and misses ($Y_i = 0$) predicted by the model on the basis of a prediction rule (e.g., $\hat{Y}_i = 1$ if $\hat{p}_i > 0.5$, $\hat{Y}_i = 0$, otherwise) are compared to the actual hits and misses. This procedure could be extended to

(text continues after Tables 3.2 and 3.3)

TABLE 3.2
V\
00 Multinirnial Logit Estirnation of Occupational Choice: Full Specification

Multinominal Regression Log Likelihood = -86948.078; Number of obs = 98732; LR x2 ( 46) = 30880.36; Prob > x2 = 0.0000; Pseudo R 2 = 0.1508

y Coefficient Standard Error z P> lzl (95 % Conf Interual]

y=2
age 0.0261899 0.0064884 4.036 0.000 0.0134729 0.0389069
age2 -0.0009772 0.000306 - 3.194 0.001 - 0.0015768 -0.0003775
blkcb 0.3655553 0.1672964 2.185 0.029 0.0376603 0.6934503
blkovs 0.08195 0.1887671 0.434 0.664 - 0.2880266 0.4519266
ind 0.4707812 0.2790797 1.687 0.092 - 0.0762051 1.017767
indovs - 0.6064709 0.2728072 -2. 223 0.026 -1.141163 - 0.0717786
ovsbn -0.1 09104 0.0540097 -2.020 0.043 - 0.214961 - 0.003247
north -0. 1445068 0.0355612 -4.064 0.000 - 0.2142055 -0.0748082
indnth - 0.4185296 0.1586727 -2.638 0.008 -0.7295225 -0.1075368
blknth - 0.7146398 0.2170033 - 3.293 0.001 - 1.139959 - 0.2893211
south -0.0776431 0.0365635 - 2.124 0.034 - 0.1493062 -0.0059799
indsth -0.3459856 0.1654501 -2.091 0.037 -0.6702619 - 0.0217094
blksth -0.6664'J42 0.225652 -2.954 0.003 -1.108764 -0.2242244
highed 1.083631 0.1822909 5.945 0.000 0.7263477 1.440915
indhe -0.2140724 0.4490211 - 0.477 0.634 -1.094138 0.6659928
blkhe -1.26384 0.7230326 - 1.748 0.080 -2.680958 0.1532775
mided 0.6060102 0.2709862 2.236 0.025 0.0748871 1.137133
indme -0.3332887 0.4428193 -0.753 0.452 - 1.201199 0.5346213
blkme 0.9548953 1.034856 0.923 0.356 -1.073386 2.983177
subbus 1.205355 0.3189165 3.780 0.000 O.SR02901 1.83042
subsci 0.5103959 0.2820694 1.809 O.D70 -0.0424499 1.063242
bus h - 0.6332213 0.4138681 -1.530 0.126 -1.444388 0.1779454
sei h - 0.795496 0.3545109 -2.244 0.025 -1.490325 --0.1006674
_cons 0.7755227 0.042255 18.353 0.000 0.6927043 0.858341

y=3
age 0.0893431 0.0073103 12.222 0.000 0.0750152 0.103671
age2 -0.0024609 0.0003397 -7.244 0.000 -0.0031266 -0.0017951
blkcb -0.0672642 0.1950314 -0.345 0.730 -0.4495187 0.3149904
blkovs -0.5708287 0.2236755 -2.552 0.011 -1.009225 -0.1324329
ind 0.0918407 0.3348558 0.274 0.784 -0.5644646 0.748146
indovs -0.7893394 0.3292557 -2.397 0.017 -1.434669 -0.1440101
ovsbn 0.0932756 0.0565007 1.651 0.099 -0.0174637 0.204015
north -0.6113055 0.037637 -16.242 0.000 -0.6850727 -0.5375384
indnth -0.4696822 0.1943706 -2.416 0.016 -0.8506416 -0.0887228
blknth -0.1352359 0.2598729 -0.520 0.603 -0.6445774 0.3741057
south -0.1697707 0.0383309 -4.429 0.000 -0.244898 -0.0946434
indsth -0.5134623 0.1951871 -2.631 0.009 -0.8960221 -0.1309025
blksth -0.418329 0.268549 -1.558 0.119 -0.9446753 0.1080174
highed 3.877128 0.1713991 22.620 0.000 3.541191 4.213064
indhe -0.1698475 0.4062421 -0.418 0.676 -0.9660674 0.6263724
blkhe -1.139974 0.6128429 -1.860 0.063 -2.341124 0.0611758
mided 2.850769 0.2490918 11.445 0.000 2.362558 3.33898
indme -0.2049551 0.4154452 -0.493 0.622 -1.019213 0.6093026
blkme 0.743211 1.023254 0.726 0.468 -1.262329 2.748751
subbus 0.8151578 0.298235 2.733 0.006 0.230628 1.399688
subsci 0.2500103 0.2599529 0.962 0.336 -0.2594879 0.7595085
bush -0.0414593 0.3898265 -0.106 0.915 -0.8055052 0.7225865
scih 0.1600076 0.3281777 0.488 0.626 -0.4832088 0.803224
_cons -0.0164003 0.0462775 -0.354 0.723 -0.1071026 0.0743019

(Outcome y = 1 is the comparison group.)
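As a reading aid for the columns of the table above, the following minimal sketch recomputes the z statistic and the 95% confidence interval of one row from its coefficient and standard error, and converts the coefficient into a risk-ratio multiplier by exponentiating it. The function name `summarize` and the constant `Z_975` are hypothetical names introduced for illustration; the two numbers passed in are taken from the `age` row of the y=2 block.

```python
import math

# 97.5th percentile of the standard normal, used for a two-sided 95% interval.
Z_975 = 1.959964

def summarize(coef, se):
    """Recompute z, the 95% interval, and exp(coef) for one table row."""
    half_width = Z_975 * se
    return {
        "z": coef / se,
        "ci": (coef - half_width, coef + half_width),
        # exp(coef) is the factor by which Pr(Y=j)/Pr(Y=1) is multiplied
        # when the variable rises by one unit, other things equal.
        "risk_ratio_multiplier": math.exp(coef),
    }

# 'age' row of the y=2 block: coefficient and standard error as printed.
age_row = summarize(0.0261899, 0.0064884)
print(age_row)
```

The recomputed z and interval should agree with the printed row up to rounding, which is also a convenient way to check a row whose digits are hard to read.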


TABLE 3.3
Multinomial Logit Estimation of Occupational Choice: Restricted Specification

Multinomial Regression Log Likelihood = -86952.126; Number of obs = 98732; LR chi2(33) = 30872.27; Prob > chi2 = 0.0000; Pseudo R2 = 0.151
y Coefficient Standard Error z P>|z| [95% Conf. Interval]
y=2
age 0.026323 0.0064795 4.062 0.000 0.0136234 0.0390227
age2 -0.0009813 0.0003057 -3.210 0.001 -0.0015805 -0.0003822
blk 0.426683 0.1116093 3.823 0.000 0.2079328 0.6454332
blkovs (dropped)
ind 0.4144723 0.2152989 1.925 0.054 -0.0075059 0.8364504
indovs -0.565894 0.2215624 -2.554 0.011 -1.000148 -0.1316396
ovsbn -0.106519 0.05151 -2.068 0.039 -0.2074767 -0.0055613
north -0.1438706 0.0351996 -4.087 0.000 -0.2128606 -0.0748806
indnth -0.4152823 0.1546391 -2.685 0.007 -0.7183694 -0.1121952
blknth -0.6568392 0.1761849 -3.728 0.000 -1.002155 -0.3115232
south -0.0765746 0.0362333 -2.113 0.035 -0.1475906 -0.0055585
indsth -0.3417737 0.1632409 -2.094 0.036 -0.66172 -0.0218275
blksth -0.6677784 0.2086548 -3.200 0.001 -1.076734 -0.2588226
highed 1.373433 0.1088435 12.618 0.000 1.160104 1.586763
indhe (dropped)
blkhe -1.312484 0.7222905 -1.817 0.069 -2.728147 0.1031797
mided 0.8347366 0.1476385 5.654 0.000 0.5453704 1.124103
indme (dropped)
blkme (dropped)
subbus 0.9348005 0.1923779 4.859 0.000 0.5577467 1.311854
subsci 0.2707989 0.1355025 1.998 0.046 0.0052189 0.5363788
bush -0.6003602 0.1675312 -3.584 0.000 -0.9287153 -0.272005
scih -0.9608239 0.1601125 -6.001 0.000 -1.274639 -0.6470093
_cons 0.7737729 0.041826 18.500 0.000 0.6917954 0.8557504

y=3
age 0.0895118 0.0073026 12.258 0.000 0.0751989 0.1038247
age2 -0.0024658 0.0003395 -7.263 0.000 -0.0031313 -0.0018004
blk (dropped)
blkovs -0.6781675 0.154667 -4.385 0.000 -0.9813092 -0.3750257
ind (dropped)
indovs -0.7063556 0.1295174 -5.454 0.000 -0.9602051 -0.4525062
ovsbn 0.0954136 0.0553704 1.723 0.085 -0.0131104 0.2039377
north -0.6105677 0.0368792 -16.556 0.000 -0.6828497 -0.5382858
indnth -0.4542501 0.181704 -2.500 0.012 -0.8103834 -0.0981168
blknth (dropped)
south -0.1682364 0.0376312 -4.471 0.000 -0.2419921 -0.0944807
indsth -0.501347 0.1887684 -2.656 0.008 -0.8713262 -0.1313678
blksth -0.4365967 0.2228379 -1.959 0.050 -0.8733511 0.0001576
highed 4.171942 0.0844821 49.383 0.000 4.00636 4.337524
indhe (dropped)
blkhe -1.198389 0.6072452 -1.973 0.048 -2.388567 -0.0082099
mided 3.089609 0.0695418 44.428 0.000 2.95331 3.225909
indme (dropped)
blkme (dropped)
subbus 0.5326858 0.1343444 3.965 0.000 0.2693757 0.795996
subsci (dropped)
bush (dropped)
scih (dropped)
_cons -0.0188423 0.0454545 -0.415 0.678 -0.1079314 0.0702469

(Outcome y = 1 is the comparison group.)
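Because the coefficients in Table 3.3 are expressed relative to the comparison outcome y = 1, a positive coefficient raises the risk-ratio Pr(Y = j)/Pr(Y = 1) but need not raise Pr(Y = j) itself. The sketch below illustrates this with the standard multinomial logit probability formula, using only the `_cons` and `highed` estimates from the table. It evaluates a purely hypothetical reference person for whom every other covariate is zero (an assumption made for illustration, since the coding of variables such as age is not shown here); `mnl_probabilities`, `CONS`, and `HIGHED` are names introduced for this example.

```python
import math

def mnl_probabilities(z):
    """Multinomial logit probabilities with outcome 1 as the base (index 0).

    z maps each non-base outcome j to its linear index Z_j.
    """
    denom = 1.0 + sum(math.exp(v) for v in z.values())
    probs = {1: 1.0 / denom}
    probs.update({j: math.exp(v) / denom for j, v in z.items()})
    return probs

# Estimates copied from Table 3.3 (y=2 is SKL, y=3 is PMT).
CONS = {2: 0.7737729, 3: -0.0188423}
HIGHED = {2: 1.373433, 3: 4.171942}

# Hypothetical person with all covariates at zero, without and with a degree.
base = mnl_probabilities(CONS)
with_degree = mnl_probabilities({j: CONS[j] + HIGHED[j] for j in CONS})

for label, p in (("no degree", base), ("degree", with_degree)):
    print(label, {j: round(v, 3) for j, v in p.items()},
          "SKL/UNS risk-ratio:", round(p[2] / p[1], 2))
```

In this illustration the SKL risk-ratio rises with a degree, as the positive `highed` coefficient for y=2 implies, while the SKL probability itself falls because the PMT probability rises by much more.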


Mlogit: likelihood-ratio test chi2(13) = 8.10; Prob > chi2 = 0.8372.
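The likelihood-ratio test above compares the restricted specification of Table 3.3 with the full specification of Table 3.2; the 13 degrees of freedom match the 13 coefficients marked "(dropped)". As a minimal sketch, assuming only the printed statistic and degrees of freedom, the p-value can be recovered with the Python standard library. `chi2_sf` is a hypothetical helper written for this example; it uses the power series for the regularized lower incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2(df) > x), via the incomplete-gamma series."""
    a, half_x = df / 2.0, x / 2.0
    if half_x <= 0.0:
        return 1.0
    # Series: gamma(a, t) = t^a e^(-t) * sum_{n>=0} t^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower

# LR statistic = 2 * (logL_full - logL_restricted); only its value is printed above.
lr_statistic, restrictions = 8.10, 13
p_value = chi2_sf(lr_statistic, restrictions)
print(round(p_value, 3))
```

A p-value this large means the restrictions are not rejected, which is consistent with the text's use of the restricted specification.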

multiple outcome models, where the predictions could be based on a rule whereby $Y_i = m$, if $\hat{p}_{im} = \max_j(\hat{p}_{ij})$. These predictions could then be compared to a "naive" model that predicted all cases to be in the modal category of the dependent variable, and the percentage reduction in error in moving from the naive to the full model could be computed. This approach is, however, not without pitfalls. First, unlike the case of the linear regression model, where the coefficients are chosen to maximize $R^2$, in discrete choice models the coefficient estimates do not maximize any goodness-of-fit measure. Therefore to assess the model on the basis of goodness-of-fit, however measured, may be misleading. Second, the predictions are critically dependent on the prediction rule adopted, and the adopted rule may turn out to be quite inappropriate to the needs of the sample. For example, if the sample is unbalanced in a binary model (that is, has many more 1s than 0s) then a rule that the model should predict the outcome for which the estimated probability is greatest might never predict a 1 (or a 0).

The Estimates

As the earlier discussion emphasized, the sign of a coefficient estimate in Tables 3.2 and 3.3 reflects the direction of change in the risk-ratio, $\Pr(Y_i = j)/\Pr(Y_i = 1)$, in response to a ceteris paribus change in the value of the variable to which the coefficient is attached. It does not reflect the direction of change in the individual probabilities $\Pr(Y_i = j)$. The estimation results reported in this subsection pertain to the restricted coefficients shown in Table 3.3. This was because when the full equation specification (as shown in Table 3.2) was confronted with the data, it was found that only a subset of these variables exerted a significant effect on the risk-ratios. In most cases, the variables that were excluded were interaction terms involving a nonethnic variable, $X$, and the ethnic variables, BLK and/or IND. This meant that while the variable $X$ by itself exerted a significant effect on the log risk-ratio of person $i$ belonging to a particular occupational class, the ethnicity of a person did not alter this effect.

The estimation results, shown in Table 3.3, identify three characteristics as being important for improving a person's risk-ratio of being in SKL or PMT employment:$^{47}$

• living in London;
• having post-18 educational qualifications, preferably a degree and preferably in a business-related subject; and
• being born in Britain.

Living in London conferred two benefits. First, there was a general benefit that accrued to all persons. This stemmed from the fact that the risk-ratio of being in SKL or in PMT employment of men living in the North or in the South was ceteris paribus lower than that of persons living in London$^{48}$ ($\hat{\alpha}_{j4}, \hat{\alpha}_{j5} < 0$, $j = 2, 3$ in Equation 3.14). Second, there was a specific benefit that accrued to blacks and Indians. By living in London, rather than outside London, blacks and Indians experienced a greater boost to their risk-ratio of SKL and PMT employment than did whites. The reduction in the risk-ratio, of being in SKL or PMT employment, for a person living outside London, relative to living in London, would have been greater if that person had been black$^{49}$ or Indian than if that person had been white ($\hat{\beta}_{j4}, \hat{\beta}_{j5}, \hat{\gamma}_{j4}, \hat{\gamma}_{j5} < 0$, $j = 2$ and $\hat{\beta}_{j5}, \hat{\gamma}_{j4}, \hat{\gamma}_{j5} < 0$, $j = 3$). In that sense, London was more kind to Indians and blacks, relative to whites, than was the rest of Britain. In particular, Indians living in London ($NTH_i = STH_i = 0$) did not face any disadvantage relative to whites, with respect to the risk-ratio of being in PMT employment since $\hat{\gamma}_{j0} = 0$, $j = 3$. Living outside London (e.g., $NTH_i = 1$) reduced this risk-ratio for all persons, but the reduction was greater for Indians than for whites. In that sense, the ethnic parity in PMT employment that Indians enjoyed with respect to whites in London was eroded outside London. The "London effect" operated most strongly in favor of blacks. Only 10% of white male full-time employees lived in London, but it was the area of residence of 44% of Indian, and 53% of black, male full-time employees.

Being born outside Britain was always a disadvantage: the risk-ratio of being in SKL and in PMT employment was greater for British-born than for overseas-born men ($\hat{\alpha}_{j1} < 0$, $j = 2, 3$ in Equation 3.14). However, given that the interaction of the birthplace variable with ethnicity was negative ($\hat{\beta}_{j1}, \hat{\gamma}_{j1} < 0$, $j = 2, 3$), the disadvantage of being born overseas was greater for Indians and blacks than it was for whites, i.e., the reduction in a person's risk-ratio that stemmed from being born overseas was greater for Indians and blacks than for whites.

Post-18 qualifications, both in the form of sub-degree and of degree-level (or higher) qualifications, raised the risk-ratio of being in a

SKL or PMT job, with the effect of degree-level qualifications being stronger than that of sub-degree qualifications. In addition to qualifications, the subject in which the qualification was obtained also mattered. Qualifications in business-studies type subjects provided the best means of entry into both SKL and PMT employment. The econometric finding was that for Indian men, $\hat{\gamma}_{j2}$ (HED) and $\hat{\gamma}_{j3}$ (MED) in Equation 3.14 were not significantly different from zero for both SKL ($j = 1$) and PMT ($j = 2$) employment. For black men, $\hat{\beta}_{j2}$ (HED) was significantly negative for PMT employment. Black men obtained a lower return on their degree-level qualifications than did Indian or white men, but Indian and white men received the same return.

The literature suggests that immigrants, particularly from developing countries, who had obtained their qualifications in their countries of origin suffered (whether justifiably or not) from a perception that such qualifications were "less worthy" than equivalent qualifications obtained in the host country. Unfortunately the data do not record, in the case of persons born overseas, their date of arrival in Britain. Consequently, while it is known that the majority of the black and Indian men in the sample were born overseas (Table 3.1) there is no information on their age at arrival in Britain and, by implication, no possibility of surmising where they might have obtained their post-18 qualifications (if any).

The Predicted Probabilities

Using the estimated $\hat{Z}_{ij}$ (which, remembering that $Z_{ij} = \sum_{r=1}^{R} \beta_{jr} X_{ir}$, are computed using the estimates shown in Table 3.3 in conjunction with the values of the determining variables for every individual) STATA will predict, for each of the 98,732 persons in the sample, the probabilities of belonging to the three different occupational classes by computing the $\hat{p}_{i1}$ from Equation 3.5a and the $\hat{p}_{i2}$ and $\hat{p}_{i3}$ from Equation 3.5b. The mean values of these individual probabilities, computed over all persons and then over the three groups, white, black, and Indian, are denoted, respectively, $\bar{p}_j$, $\bar{p}_{Wj}$, $\bar{p}_{Bj}$, and $\bar{p}_{Ij}$ ($j = 1, 2, 3$). These mean probabilities shown in Table 3.4 (Probabilities as mean of predicted individual probabilities) indicate, for example, that the predicted mean probabilities of all male employees and of white, black, and Indian male employees of being in PMT jobs were, respectively, 41.5%, 41.8%, 29.3%, and 35.1%. It should be emphasized that the predicted mean probabilities of all persons, and of persons from each ethnic group, of belonging to the three occupational classes are exactly equal to the corresponding sample proportions in each class.$^{50}$ This is a property of the multinomial logit model: the means of the predicted individual probabilities of the outcomes are always equal to the sample proportions for the outcomes.

TABLE 3.4
Predicted Probabilities of Whites, Blacks, and Indians Being in Different Occupational Classes*

                 Predicted Probability of Being:
                 UNS     SKL     PMT
Probabilities as mean of predicted individual probabilities
All persons      17.7    40.8    41.5
White            17.5    40.7    41.8
Black            20.5    50.2    29.3
Indian           25.1    39.8    35.1
Predicted probabilities at mean values of determining variables
All persons      14.3    41.8    43.9
White            14.2    41.7    44.1
Black            19.6    52.0    28.4
Indian           22.0    43.3    34.7

*Calculated from the multinomial logit estimates of Table 3.3.
UNS = Unskilled/Semiskilled manual
SKL = Skilled Manual/nonmanual
PMT = Professional/Managerial/Technical

An alternative way of predicting probabilities is to compute the mean of the $\hat{Z}_{ij}$ over all persons and over the white, black, and Indian groups. If these are denoted, respectively, $\bar{Z}_j$, $\bar{Z}_{Wj}$, $\bar{Z}_{Bj}$ and $\bar{Z}_{Ij}$, then Equation 3.12a can be used to calculate the probability of the first outcome for all persons and for each of the groups ($\tilde{p}_1$, $\tilde{p}_{W1}$, $\tilde{p}_{B1}$, and $\tilde{p}_{I1}$), and Equation 3.12b can be used to calculate the probabilities of the second and third outcomes for all persons and for each of the groups ($\tilde{p}_j$, $\tilde{p}_{Wj}$, $\tilde{p}_{Bj}$, and $\tilde{p}_{Ij}$, $j = 2, 3$).$^{51}$ These probabilities, shown in Table 3.4 (Predicted probabilities at mean values of determining variables), show that at the mean values of the determining variables the predicted probabilities of all, white, black, and Indian male

employees of being in PMT jobs were, respectively, 43.9%, 44.1%, 28.4%, and 34.7%. It is not surprising that these are different from the probabilities reported in the upper part of the table. For example, although both the $\bar{p}_{Wj}$ and the $\tilde{p}_{Wj}$ ($j = 1, 2, 3$) purport to measure the overall probability of white persons being in the different occupational classes, they are computed very differently. The $\bar{p}_{Wj}$ are computed as the mean (over the 96,297 white persons) of the estimated individual probabilities, $\hat{p}_{ij}$, where the $\hat{p}_{ij}$ are computed from the values of the determining variables for the individuals. On the other hand, the $\tilde{p}_{Wj}$, which bypass the individual probabilities, are computed, instead, from the mean value $\bar{Z}_{Wj}$ of the individual $\hat{Z}_{ij}$ of the 96,297 white persons in the sample.$^{52}$ In general, therefore, for any group $g$: $\bar{p}_{g1} \neq \tilde{p}_{g1}$; $\bar{p}_{g2} \neq \tilde{p}_{g2}$ and $\bar{p}_{g3} \neq \tilde{p}_{g3}$.

Uncovering Marginal Effects Through Simulations

In the earlier discussion it was emphasized that the direction of change in $\Pr(Y_i = j)$, the probability of observing outcome $j$ for a small change in $X_{ir}$, could not be inferred from the sign of $\beta_{jr}$ since $\partial \Pr(Y_i = j)/\partial X_{ir}$ need not have the same sign as $\beta_{jr}$. In the context of multinomial logit models, it was only the direction of change in the risk-ratios that could be predicted from the sign of the coefficients. As a consequence, the preceding discussion of the estimation results was cast in terms of such ratios.

Nonetheless, one may often be interested in the underlying probabilities, $\Pr(Y_i = j)$, rather than in the risk-ratios relative to some base outcome, $\Pr(Y_i = j)/\Pr(Y_i = 1)$. In the context of the present application, one might be interested in particular in how the probability of being in a specific occupational class differed between the ethnic groups. Since the coefficient estimates do not directly offer answers to such questions, the alternative is to view such results through the window of simulations. The results of the model can be made transparent by calculating the probabilities of the different outcomes in a variety of hypothetical situations.$^{53}$ More specifically, the effects of ethnicity can be analyzed by comparing the probabilities that result when the ethnic dummy variables take different values, the values of the other variables remaining unchanged between the comparisons.

Suppose that all the 98,732 persons in the sample were white, so that $IND_i = BLK_i = 0$ for all $i = 1, \ldots, N$. Let $p_{ij}^W$ denote the probability of person $i$ being in occupational class $j$ ($j = 1, 2, 3$) in this hypothetical situation. Now suppose that all the 98,732 persons in the sample were black, so that $BLK_i = 1$ for all $i = 1, \ldots, N$ and let $p_{ij}^B$ denote the probability of person $i$ being in occupational class $j$ ($j = 1, 2, 3$) under these hypothetical circumstances.$^{54}$ Note that because of the specification of Equation 3.14, the probabilities, $p_{ij}^W$ and $p_{ij}^B$ are the result of all the individuals in the sample having their characteristics evaluated at, respectively, white ($\alpha_{jr}$) and black ($\alpha_{jr} + \beta_{jr}$) coefficients. They are referred to hereafter as "ethnic" probabilities because they are computed from samples which differ only in respect of the ethnicity of the persons comprising the samples. The coefficient estimates, $\hat{\alpha}_{jr}$ and $\hat{\alpha}_{jr} + \hat{\beta}_{jr}$, when applied to Equations 3.5a and 3.5b yield estimates, $\hat{p}_{ij}^W$ and $\hat{p}_{ij}^B$, of the ethnic probabilities.

If $\bar{p}_j^W = \sum_{i=1}^{N} \hat{p}_{ij}^W / N$ and $\bar{p}_j^B = \sum_{i=1}^{N} \hat{p}_{ij}^B / N$ are the respective means of the individual probability estimates (where the latter was computed under the two hypothetical sets of circumstances) then differences between the mean ethnic probabilities $\bar{p}_j^W$ and $\bar{p}_j^B$ are entirely the result of different sets of coefficients (white and black) being applied to a given set of characteristics (that of the $N$ persons in the sample). These differences may, therefore, be attributed entirely to the unequal treatment of persons who, except in their ethnicity, are identical in every respect. More succinctly, they may be attributed to the fact that blacks face "ethnic disadvantage." A similar exercise can be performed for the hypothetical case when everyone in the sample is Indian.

The estimates of these ethnic probabilities are shown in the upper panel (Probabilities as mean of predicted individual probabilities) of Table 3.5 for the three groups, white, black, and Indian. This shows that if everyone in the sample had their characteristics evaluated using the black coefficients $\alpha_{jr} + \beta_{jr}$ instead of having them evaluated at the coefficients relevant to their ethnic group,$^{55}$ then 40% of the total sample would be in PMT jobs, 39% would be in SKL jobs, and 21% would be in UNS jobs.

An alternative to calculating the ethnic probabilities in this manner would be to compare the probabilities that result when the ethnic variables take their different values and the values of the other variables are held constant in each case to the mean of their sample values. Denote by $\bar{Z}_j^W$ and $\bar{Z}_j^B$ the mean values (computed over the 98,732 persons in the sample) of $\hat{Z}_{ij} = \sum_{r=1}^{R} \hat{\beta}_{jr} X_{ir}$ when everyone in the sample was (treated as) white and everyone in the sample was

(treated as) black.$^{56}$ Then the white and black probabilities of being in the different occupational classes (respectively denoted, $\tilde{p}_j^W$ and $\tilde{p}_j^B$, $j = 1, 2, 3$) are calculated using $\bar{Z}_j^W$ and $\bar{Z}_j^B$ in Equation 3.12a for the first outcome and in Equation 3.12b for the second and third outcomes. The values of these estimated probabilities are shown for whites, blacks, and Indians in Table 3.5 (Predicted probabilities at mean values of determining variables). When the probabilities are computed from the average characteristics of the sample, a larger proportion of whites and blacks (and a smaller proportion of Indians) are predicted to be in the PMT class ($\tilde{p}_j^W = 44.2\%$, $\tilde{p}_j^B = 41.0\%$, $\tilde{p}_j^I = 32.6\%$) than were predicted when the probabilities were computed as the mean of the individual probabilities ($\bar{p}_j^W = 41.8\%$, $\bar{p}_j^B = 39.8\%$, $\bar{p}_j^I = 33.7\%$). This is a consequence of Indians having a higher than average endowment of higher educational qualifications (see Table 3.1) which is a major determinant of success in obtaining PMT jobs. As a consequence, when everyone is reduced to the sample mean, Indians "lose out" and other groups "gain."

TABLE 3.5
Predicted "Ethnic" Probabilities of Whites, Blacks, and Indians Being in Different Occupational Classes*

                 Predicted Probability of Being in Occupations:
                 UNS     SKL     PMT
Probabilities as mean of predicted individual probabilities
White            17.5    40.7    41.8
Black            20.7    39.5    39.8
Indian           19.3    47.0    33.7
Sample proportions
White            17.5    40.7    41.8
Black            20.5    50.2    29.3
Indian           25.1    39.8    35.1
Predicted probabilities at mean values of determining variables
White            14.2    41.6    44.2
Black            19.3    39.7    41.0
Indian           16.6    50.8    32.6

*Calculated from the multinomial logit estimates of Table 3.3.
UNS = Unskilled/Semiskilled manual
SKL = Skilled Manual/nonmanual
PMT = Professional/Managerial/Technical

However, as Table 3.5 (Sample Proportions) shows, only 29% of black male employees were actually in PMT jobs. The sample proportions of blacks and whites in occupation class $j$, denoted, respectively, $s_j^B$ and $s_j^W$, will, in general, be different from the ethnic probabilities, $\bar{p}_j^B$ and $\bar{p}_j^W$ (and from $\tilde{p}_j^B$ and $\tilde{p}_j^W$). This reflects the fact that blacks and whites differ not just in terms of how they are treated but also in terms of their characteristics. The fact that the sample proportion of blacks in PMT jobs ($s_j^B = 29.3\%$) is less than their predicted ethnic probability ($\bar{p}_j^B = 39.8\%$ or $\tilde{p}_j^B = 41.0\%$) of being in such jobs is due to the fact that, relative to the sample in its entirety, the characteristics of blacks are less suited to PMT jobs. In other words, blacks, relative to whites, face, with respect to PMT jobs, both ethnic and characteristics disadvantage. The total of these separate disadvantages is referred to as the overall disadvantage.

Measuring Occupational Disadvantage

A measure of the ethnic disadvantage experienced, on average, by blacks vis-à-vis whites, with respect to occupational class $j$ is $\lambda_j^B$, where

$$\lambda_j^B = \bar{p}_j^B / \bar{p}_j^W. \qquad (3.15)$$

If the two probabilities are equal, then $\lambda_j^B = 1$ and there is no ethnic disadvantage. However, if $\bar{p}_j^B < \bar{p}_j^W$, then $\lambda_j^B < 1$ and there is black ethnic disadvantage$^{57}$ for occupational class $j$, with the size of this disadvantage being greater the further $\lambda_j^B$ is from 1.

A measure of the overall disadvantage experienced, on average, by blacks vis-à-vis whites, with respect to occupational class $j$ is $\mu_j^B$, where

$$\mu_j^B = s_j^B / s_j^W. \qquad (3.16)$$

If the two sample proportions are equal, then $\mu_j^B = 1$ and there is no overall disadvantage. However, if $s_j^B < s_j^W$, then $\mu_j^B < 1$ and there is black overall disadvantage$^{58}$ for occupational class $j$, with the size of this disadvantage being greater the further $\mu_j^B$ is from 1.

A measure of the characteristics disadvantage experienced, on average, by blacks vis-à-vis whites with respect to occupational class $j$ is

$\delta_j^B$, where $\delta_j^B$ is the ratio of the overall disadvantage to the ethnic disadvantage,

$$\delta_j^B = \mu_j^B / \lambda_j^B.$$

If $\delta_j^B = 1$, then blacks do not face a characteristics disadvantage since $\mu_j^B = \lambda_j^B$. In this case the ratio of the black and white sample proportions is equal to the corresponding ratio of the ethnic probabilities. If the latter ratio is less than 1, then it is entirely due to the identical characteristics of blacks and whites being evaluated more favorably for the latter than for the former. However, if $\delta_j^B < 1$, then there is a characteristics disadvantage, and the size of this disadvantage is greater the further $\delta_j^B$ is from 1. Blacks are penalized for their characteristics since then $\mu_j^B < \lambda_j^B$. They lose out by having inferior characteristics, in addition to (possibly) having these characteristics evaluated less favorably than they would have been had they been the characteristics of whites. Lastly, if $\delta_j^B > 1$, then blacks enjoy a characteristics advantage since then $\mu_j^B > \lambda_j^B$. Blacks draw the sting from possible unfair treatment by acquiring superior characteristics. Indeed, if the value of $\delta_j^B$ was sufficiently large then it would be possible for $\mu_j^B > 1$, even though $\lambda_j^B < 1$. In other words, the superior characteristics of blacks would more than neutralize any unfair treatment that they might experience. Table 3.6 shows estimates of the three disadvantages for each of the three ethnic groups.

TABLE 3.6
Estimates of Ethnic, Characteristics, and Overall Disadvantage* Faced by Blacks and Indians Relative to Whites

                 Ethnic          Characteristics   Overall
                 Disadvantage    Disadvantage      Disadvantage
                 (%)             (%)               (%)
Disadvantage calculated from probabilities as mean of predicted individual probabilities
Black/White
  j = 1 (SKL)    0.97            1.27              1.23
  j = 2 (PMT)    0.95            0.74              0.70
Indian/White
  j = 1 (SKL)    1.15            0.85              0.98
  j = 2 (PMT)    0.81            1.04              0.84
Disadvantage calculated from predicted probabilities at mean values of determining variables
Black/White
  j = 1 (SKL)    0.95            1.26              1.23
  j = 2 (PMT)    0.93            0.75              0.70
Indian/White
  j = 1 (SKL)    1.22            0.80              0.98
  j = 2 (PMT)    0.73            1.15              0.84

*Advantage if value > 1. Calculated from the figures of Table 3.5.
UNS = Unskilled/Semiskilled manual
SKL = Skilled Manual/nonmanual
PMT = Professional/Managerial/Technical

The figures under the column "Ethnic Disadvantage" in Table 3.6 indicate that if Indians and whites had been assigned a common set of characteristics (namely, the characteristics of the sample as a whole) then the probability of Indians being in the PMT class would have been 81% of the corresponding white probability. A similar exercise for blacks would have yielded a value of 95%. However, the fact that Indians and blacks, as distinct groups, had characteristics that were different from those of the sample, considered in its entirety, meant that the sample proportions of Indians and blacks in the PMT class were, respectively, 84% and 70% of the corresponding white proportion. This rise for Indians from the ethnic 81% to the sample 84%, and the corresponding fall for blacks from the ethnic 95% to the sample 70%, could be attributed, respectively, to "superior" Indian characteristics and "inferior" black characteristics. Table 3.6 shows that, with regard to PMT employment, Indians had a characteristics advantage of 4%, but blacks had a characteristics disadvantage of 26%, over their white counterparts. With respect to the SKL class, Indians had a characteristics disadvantage, relative to whites, of 15% while blacks had a characteristics advantage of 27% over their white counterparts.

One could also have defined ethnic disadvantage as $\tilde{p}_j^B / \tilde{p}_j^W$, that is, as the ratio of the probabilities (shown in the lower panel of Table 3.5) obtained by setting the values of all except the ethnic variables to their sample means. The characteristics disadvantage is now recomputed as the ratio of the overall disadvantage$^{59}$ and the new estimate of ethnic disadvantage. The new values of ethnic and characteristics disadvantage (as well as the unchanged value of overall disadvantage) are shown in Table 3.6.

Conditional Logit and the Independence of Irrelevant Alternatives

In a conditional logit model the outcome probabilities $\Pr(Y_i = j)$ depend only upon the choice attributes and not upon the characteristics of the individuals making the choices. In other words, the conditional logit model is defined by Equation 3.3 but with the caveat that now, with the $\beta_{jr} = 0$,

$$Z_{ij} = \sum_{s=1}^{S} \gamma_s W_{js}. \qquad (3.17)$$

Apart from this, the model is essentially the same as the multinomial logit model. Usually the reason for omitting the characteristics of the individuals is that data on individuals are not available. Consequently, the conditional logit model can be usefully thought of as a collective of persons (hereafter referred to as the "population") making choices between alternatives with different attributes, for example, commuters choosing between modes of travel or consumers choosing between supermarkets. With this in mind, the subscript $i$ relating to individuals is dropped in the exposition that follows. The dependent variable $Y = j$ relates to the choice made by the population and the coefficient $\gamma_s$ refers to the weight attached by the population to the $s$th attribute. As in the multinomial model, the risk-ratios of any two alternatives $j$ and $k$ in the conditional logit model are independent of the other available alternatives,

$$\log\left(\frac{\Pr(Y = j)}{\Pr(Y = k)}\right) = Z_j - Z_k = \sum_{s=1}^{S} (W_{js} - W_{ks})\,\gamma_s. \qquad (3.18)$$

In Equation 3.18 the log risk-ratio depends solely on the (differences between) attributes associated with $j$ and $k$ and is independent of the attributes associated with any other alternative.$^{60}$ This property, known as the Independence of Irrelevant Alternatives (IIA), is both the principal strength and the principal weakness of such models.$^{61}$ It is a strength because it allows the introduction of new alternatives without the need to re-estimate the model. For example, suppose a $(M + 1)$th alternative is introduced with attributes $W_{M+1,s}$, $s = 1, \ldots, S$. The estimated Z-value associated with this new alternative is: $\hat{Z}_{M+1} = \sum_{s=1}^{S} \hat{\gamma}_s W_{M+1,s}$, where the coefficient estimates $\hat{\gamma}_s$ have already been obtained. All that the introduction of a new alternative requires is that a new term in $\hat{Z}_{M+1}$ be added to the sum in the denominator of Equation 3.3 and for the probabilities to be recomputed. The reason that the existing coefficient estimates will also serve when the set of alternatives is expanded is that the addition of an alternative cannot change the relative risk with which existing alternatives are chosen.

The percentage change in the probability of choosing a particular alternative, given the introduction of a new alternative, is easily computed as

$$\frac{\Pr(Y_{M+1} = j) - \Pr(Y_M = j)}{\Pr(Y_M = j)} = \left(\frac{\exp(Z_j)}{\sum_{k=1}^{M+1} \exp(Z_k)}\right)\left(\frac{\sum_{k=1}^{M} \exp(Z_k)}{\exp(Z_j)}\right) - 1 = -\frac{\exp(Z_{M+1})}{\sum_{k=1}^{M+1} \exp(Z_k)} = -\Pr(Y_{M+1} = M + 1),$$

where $\Pr(Y_{M+1} = j)$ and $\Pr(Y_M = j)$ refer to the probability of choosing alternative $j$ when there are, respectively, $M + 1$ and $M$ alternatives available. As a consequence of introducing an additional alternative the probabilities of choosing the existing alternatives will all fall by the same percentage, which is equal to the probability of choosing the new alternative.

However, this property of a uniform percentage drop in all existing probabilities in the face of a new alternative is also a weakness because it implies that the cross-elasticity of demand for an existing alternative with respect to the new alternative is uniform across the alternatives.$^{62}$ For this to be valid, the alternatives must be viewed as completely distinct and independent. The fact that they are so viewed, which gives rise to IIA, stems from the assumption that the errors are independently and identically distributed.

How debilitating this limitation of the model is depends on the nature of the problem being analyzed. In a model of occupational choice, for example, where the alternatives might be thought to be well-defined and relatively immutable, the issue of introducing new

alternatives might not be a serious one.$^{63}$ On the other hand, with commuters' choice of travel mode the introduction of new modes of transport (new train lines, bus lanes, congestion charges on cars, etc.) must be considered a serious possibility.

A classic example of problems caused by the IIA assumption is the "red bus-blue bus" problem. Suppose that commuters have three choices for travel to work, car, train, or bus. Suppose, parenthetically, that all the buses are painted red and that the logit predictions of $\Pr(Y = j)$ are

• Car: 55.4
• Bus (Red): 23.0
• Train: 21.6.

The risk-ratio between car and bus travel is 55.4/23.0 = 2.4. Now the bus company decides to paint half its fleet blue. It is reasonable to expect that this purely cosmetic change would leave commuters' choices unaffected and that the same proportions, as shown above, would continue to travel by car, bus and train. However, since the model cannot distinguish between a spurious and a genuine new alternative, with a "fourth" alternative the predicted logit probabilities are$^{64}$

• Car: 45.1
• Bus (Red): 18.8
• Bus (Blue): 18.4
• Train: 17.7

Note that the probability of each of the existing alternatives, car, (red) bus, train has fallen by 18.4%, the probability of travel by blue bus. The risk-ratios between car and red bus (and between train and red bus) travel remain unchanged but, in order to accommodate this, some commuters who earlier travelled by car, train, or red bus have to change to travel by blue bus. As a consequence, by simply painting half its fleet blue, the bus company could increase the take-up of bus travel from 23% of all commuters to 37.2%!

A more substantive example might be found in politics. Suppose a political party X is in competition with two other parties, Y and Z. Using a multinomial logit model one can estimate the likelihood of voters voting for Z as a ratio of the likelihood of voting for party

hold similar (say, left-wing) views and which are generally regarded by voters as being not substantially different from each other. However, since the model cannot distinguish between a spurious and a genuine new alternative, with a "fourth" party the predicted logit probabilities of voting for a left-wing party would be predicted to rise substantially. This then is the great weakness of conditional logit: under IIA, the model offers no protection against spurious share inflation through the introduction of similar, or identical, alternatives.

Parenthetically one should note a further limitation that arises when one considers compound choices, for example, the simultaneous choice of travel time and travel mode. Then, as Domencich and McFadden (1996) show, the joint probability of choosing time $t$ and mode $j$, $\Pr(t \cap j)$, can be written as the product of: (the probability of choosing mode $j$, from the available choice of modes, at time $t$) and (the probability of choosing time $t$ from the available choice of times). In other words, an implication of IIA is that the utility function embodying a compound choice must be additively separable from the individual choices, $U(t, j) = \phi(j) + \psi(t)$.

Alternatives to the Logit Model

Before considering alternatives to the logit model it is important to examine whether the assumption of IIA is or is not valid. Suppose one believes that a subset of the choice set is irrelevant. Eliminating it from the choice set should not significantly alter the coefficient estimates. This thinking lies behind the Hausman and McFadden (1984) test. If $\hat{\gamma}_1$ is the vector of coefficient estimates based on the restricted set of choices, $\hat{\gamma}_0$ is the vector of coefficient estimates based on the full set of choices, and $\hat{V}_1$, $\hat{V}_0$ are the respective estimates of their covariance matrices, then the statistic:

$$(\hat{\gamma}_1 - \hat{\gamma}_0)'\,(\hat{V}_1 - \hat{V}_0)^{-1}\,(\hat{\gamma}_1 - \hat{\gamma}_0)$$

is distributed as $\chi^2(S)$ under the null hypothesis that the restrictions of IIA are validly imposed.

If this null hypothesis is not accepted, then one possibility is to specify the random utility function of Equation 3.1 as a multivariate probit model,
u j = L Ys Tf]s + êj êj ,...._ N(O, 2.) ,
X. Suppose now party X splits into two parties, X1 and X2, which s=l
76 77
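The red bus-blue bus arithmetic and the Hausman-McFadden quadratic form described above can be illustrated with a minimal Python sketch. It is not part of the monograph's STATA programs. The utilities are backed out from the shares quoted in the text, the blue-bus utility is chosen (as an assumption) so that its share is 18.4%, and the function names `logit_shares`, `solve`, and `hausman_mcfadden` are hypothetical helpers introduced only for this illustration. The resulting four-way shares are close to, but not exactly, the rounded figures quoted above.

```python
import math


def logit_shares(utilities):
    """Multinomial logit probabilities: exp(v_j) / sum_k exp(v_k)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def hausman_mcfadden(g1, g0, v1, v0):
    """Quadratic form (g1 - g0)' (V1 - V0)^{-1} (g1 - g0)."""
    d = [x - y for x, y in zip(g1, g0)]
    diff = [[x - y for x, y in zip(r1, r0)] for r1, r0 in zip(v1, v0)]
    return sum(di * xi for di, xi in zip(d, solve(diff, d)))


# Shares quoted in the text for car, red bus, train.
three = [0.554, 0.230, 0.216]
# Utilities that reproduce these shares (identified only up to a constant).
v = [math.log(p) for p in three]

# Assumption: the blue bus is given the utility that yields an 18.4% share.
v_blue = math.log(0.184 / (1 - 0.184))
four = logit_shares(v + [v_blue])  # order: car, red bus, train, blue bus

print("car / red bus before:", round(three[0] / three[1], 3))
print("car / red bus after: ", round(four[0] / four[1], 3))
print("total bus share after:", round(four[1] + four[3], 3))
```

Under IIA the car-to-red-bus ratio is identical before and after the blue bus is added, while the combined bus share rises, which is the spurious share inflation the text describes. The `hausman_mcfadden` helper only evaluates the quadratic form; comparing it with a $\chi^2$ critical value is left to the reader.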

With a nonscalar covariance matrix $\Sigma$, the errors of this multivariate probit model need no longer be independent. The problem with this model is the practical one of computing the multinormal integration and estimation of the unrestricted covariance matrix (Greene, 2000, p. 865).

Another way of relaxing the IIA restriction is to group the alternatives in such a way that the variances differ across the groups, but are the same within each group. Then the IIA assumption is relaxed between groups of alternatives but is maintained within each group. This is the method of nested logit. This topic is not pursued here but details and an application can be found in Greene (1995).

4. PROGRAM LISTINGS

Introduction

This chapter contains the computer listings of the STATA programs that generated the results discussed in the earlier chapters. The point was made in the introductory chapter (and it bears repeating) that there are at least four well-known, highly regarded pieces of software which, among other things, handle problems of the kind discussed in this monograph: SAS; SPSS v10.0 (a good introduction to the use of SAS and SPSS procedures in the analysis of events with ordered and unordered outcomes is provided by the Stat/Math Center at Indiana University⁶⁵); LIMDEP (see Greene, 1995); and STATA (see STATA, 1999). It just so happens that I am more familiar with STATA than with the others. For this reason, and this reason alone, the programs of this chapter are written in STATA code.

The next section contains the program listings of the ordered logit and probit models of Chapter 2, and section 3 contains the program listings of the multinomial logit model of Chapter 3. The programs are liberally interspersed with comments. These comments try to fulfill three aims:

• explain what a particular piece of program code is going to do (or has done);
• relate the piece of code to a specific table, referenced in the earlier chapters, so that the reader can see how the results were generated;
• relate the piece of code to specific equations, referenced in the earlier chapters, so that the reader can see the methodology employed to generate a particular result.

Ordered Probit and Logit Programs

version 6.0 /* Using STATA version 6.0 */;
use C:\SAGE\NI.dta /* Reading data that are in STATA format: 58 vars, 13,164 observations */;
#delimit; /* Commands will be terminated by ; */;
/* TITLE: ORDERED LOGIT AND PROBIT USING DEPRIVATION EXAMPLE */;
gen pnum = _n; /* Each person is given a number */;
/* y is dependent variable for ordered logit
y = 1 is not deprived; y = 2 is mildly deprived; y = 3 is severely deprived */;
tabulate y rc, col; tabulate y sex, col;
/* Oprobit equation is being estimated on entire subsample: Table 2.2 */;
oprobit y sex ct age age2 ret inac ue highed mided hnum snpar ard dwn crk ant col arm ban dry frm, table;
predict p1 p2 p3; /* Predicted probabilities are stored in p1, p2, p3 */;
sort pnum; /* Observations are sorted by person number in ascending order */;
list pnum p1 p2 p3 in 1/25, noobs; /* Predicted probabilities of first 25 persons are listed with person number: Table 2.4 */;
summarize p1 p2 p3, detail; /* Predicted probabilities are summarized with detail: Table 2.6 */;
predict z, xb; /* The value of Z for each person is computed */;
summarize z, mean; gen zm = r(mean); /* The mean of Z is computed and stored in zm */;
/* _b[_cut1] and _b[_cut2] store the estimated values of the cutoff points, delta_1 and delta_2 */;
gen p1 = normprob(_b[_cut1] - z);
gen p2 = normprob(_b[_cut2] - z) - normprob(_b[_cut1] - z);
gen p3 = 1 - normprob(_b[_cut2] - z);
/* For each person, p1, p2 and p3 are the predicted probabilities, calculated using Equations 2.5a to 2.5c.
They are the same as the predicted p1-p3 calculated earlier, above */;
gen q1 = normprob(_b[_cut1] - zm); gen q2 = normprob(_b[_cut2]

- zm) - normprob(_b[_cut1] - zm); gen q3 = 1 - normprob(_b[_cut2] - zm);
/* q1, q2 and q3 are the predicted probabilities, calculated setting values to sample means: Equations 2.18a to 2.18c */;
summarize p1 p2 p3; summarize q1 q2 q3; /* Comparing probabilities: Table 2.8 */;
drop p1 p2 p3 q1 q2 q3; /* Releasing variable names for subsequent use */;
/* Calculating Marginal Effects: Continuous Variables (Age) */;
gen a1 = normd(_b[_cut1] - z)*_b[age];
gen a2 = (normd(_b[_cut2] - z) - normd(_b[_cut1] - z))*_b[age];
gen a3 = -1*normd(_b[_cut2] - z)*_b[age];
/* a1-a3 are marginal effects for AGE calculated for each individual: Equations 2.13a to 2.13c */;
gen am1 = normd(_b[_cut1] - zm)*_b[age];
gen am2 = (normd(_b[_cut2] - zm) - normd(_b[_cut1] - zm))*_b[age];
gen am3 = -1*normd(_b[_cut2] - zm)*_b[age];
/* am1-am3 are marginal effects for AGE setting variable values to sample mean */;
gen b1 = normd(_b[_cut1] - z)*_b[age2];
gen b2 = (normd(_b[_cut2] - z) - normd(_b[_cut1] - z))*_b[age2];
gen b3 = -1*normd(_b[_cut2] - z)*_b[age2];
/* b1-b3 are marginal effects for AGE squared (age2) calculated for each individual: Equations 2.13a to 2.13c */;
gen bm1 = normd(_b[_cut1] - zm)*_b[age2];
gen bm2 = (normd(_b[_cut2] - zm) - normd(_b[_cut1] - zm))*_b[age2];
gen bm3 = -1*normd(_b[_cut2] - zm)*_b[age2];
/* bm1-bm3 are marginal effects for AGE squared (age2) setting variable values to sample mean */;
gen c1 = a1 + b1; gen c2 = a2 + b2; gen c3 = a3 + b3;
gen cm1 = am1 + bm1; gen cm2 = am2 + bm2; gen cm3 = am3 + bm3;
/* Add effects of AGE and AGE2 */;
summarize c1 c2 c3; summarize cm1 cm2 cm3; /* Table 2.10 */;
drop z zm c1 a1 b1 c2 a2 b2 c3 a3 b3 cm1 am1 bm1 cm2 am2 bm2 cm3 am3 bm3;
/* Calculating Marginal Effects: Dummy Variables (Religion) */;
gen cto = ct; /* Saving original values */;
replace ct = 1; /* Everyone is Catholic */;
predict p1 p2 p3; /* Predicted probabilities for each person when CT_i = 1 */;
predict z, xb; summarize z; gen zm = r(mean); /* The mean of Z is computed and stored in zm */;
gen q1 = normprob(_b[_cut1] - zm);
gen q2 = normprob(_b[_cut2] - zm) - normprob(_b[_cut1] - zm);
gen q3 = 1 - normprob(_b[_cut2] - zm);
/* q1, q2 and q3 are the predicted probabilities, calculated setting nonreligion values to sample means */;
summarize p1 p2 p3; summarize q1 q2 q3; /* Table 2.12 */;
drop p1 p2 p3 q1 q2 q3 z zm;
replace ct = 0; /* Everyone is Protestant */;
predict p1 p2 p3; /* Predicted probabilities for each person when CT_i = 0 */;
predict z, xb; summarize z; gen zm = r(mean); /* The mean of Z is computed and stored in zm */;
gen q1 = normprob(_b[_cut1] - zm);
gen q2 = normprob(_b[_cut2] - zm) - normprob(_b[_cut1] - zm);
gen q3 = 1 - normprob(_b[_cut2] - zm);
/* q1, q2 and q3 are the predicted probabilities, calculated setting nonreligion values to sample means */;
summarize p1 p2 p3; summarize q1 q2 q3; /* Table 2.12 */;
drop p1 p2 p3 q1 q2 q3 z zm;
replace ct = cto; /* Restoring original values */;
/* Ologit equation is being estimated on entire subsample: Table 2.1 */;
ologit y sex ct age age2 ret inac ue highed mided hnum snpar ard dwn crk ant col arm ban dry frm, table;
predict p1 p2 p3; /* Predicted probabilities are stored in p1, p2, p3 */;
sort pnum; /* Observations are sorted by person number in ascending order */;
list pnum p1 p2 p3 in 1/25, noobs; /* Predicted probabilities of first 25 persons are listed, with person number: Table 2.4 */;
summarize p1 p2 p3, detail; /* Predicted probabilities are

summarized with detail: Table 2.6 */;
predict z, xb; /* The value of Z for each person is computed */;
summarize z, mean; gen zm = r(mean); /* The mean of Z is computed and stored in zm */;
/* _b[_cut1] and _b[_cut2] store the estimated values of the cutoff points, delta_1 and delta_2 */;
gen p1 = 1/(1 + exp(z - _b[_cut1]));
gen p2 = 1/(1 + exp(z - _b[_cut2])) - 1/(1 + exp(z - _b[_cut1]));
gen p3 = 1 - 1/(1 + exp(z - _b[_cut2]));
/* For each person, p1, p2 and p3 are the predicted probabilities, calculated using Equations 2.5a to 2.5c.
They are the same as the predicted p1-p3 calculated earlier, above */;
gen q1 = 1/(1 + exp(zm - _b[_cut1]));
gen q2 = 1/(1 + exp(zm - _b[_cut2])) - 1/(1 + exp(zm - _b[_cut1]));
gen q3 = 1 - 1/(1 + exp(zm - _b[_cut2]));
/* q1, q2 and q3 are the predicted probabilities, calculated setting values to sample means: Equations 2.18a to 2.18c */;
summarize p1 p2 p3; summarize q1 q2 q3; /* Comparing probabilities: Table 2.7 */;
drop p1 p2 p3 q1 q2 q3; /* Releasing variable names for subsequent use */;
/* Calculating Marginal Effects: Continuous Variables (Age) */;
gen lden1 = 1/(1 + exp(z - _b[_cut1]));
gen lden2 = 1/(1 + exp(z - _b[_cut2]));
gen lden3 = 1/(1 + exp(z - _b[_cut2]));
/* Logit probabilities are calculated: Equations 2.9a to 2.9c */;
gen a1 = lden1*(1 - lden1)*_b[age];
gen b1 = lden1*(1 - lden1)*_b[age2];
gen c1 = a1 + b1;
gen a2 = (lden2*(1 - lden2) - lden1*(1 - lden1))*_b[age];
gen b2 = (lden2*(1 - lden2) - lden1*(1 - lden1))*_b[age2];
gen c2 = a2 + b2;
gen a3 = -lden3*(1 - lden3)*_b[age];
gen b3 = -lden3*(1 - lden3)*_b[age2];
gen c3 = a3 + b3;
/* a1-a3 are marginal effects for AGE calculated for each individual: Equations 2.12a to 2.12c */;
/* b1-b3 are marginal effects for AGE squared (age2) calculated for each individual: Equations 2.12a to 2.12c */;
/* c1-c3 is the sum of effects */;
replace lden1 = 1/(1 + exp(zm - _b[_cut1]));
replace lden2 = 1/(1 + exp(zm - _b[_cut2]));
replace lden3 = 1/(1 + exp(zm - _b[_cut2]));
/* Logit probabilities are calculated at mean */;
gen am1 = lden1*(1 - lden1)*_b[age];
gen bm1 = lden1*(1 - lden1)*_b[age2];
gen cm1 = am1 + bm1;
gen am2 = (lden2*(1 - lden2) - lden1*(1 - lden1))*_b[age];
gen bm2 = (lden2*(1 - lden2) - lden1*(1 - lden1))*_b[age2];
gen cm2 = am2 + bm2;
gen am3 = -lden3*(1 - lden3)*_b[age];
gen bm3 = -lden3*(1 - lden3)*_b[age2];
gen cm3 = am3 + bm3;
/* am1-am3 are marginal effects for AGE calculated at sample means */;
/* bm1-bm3 are marginal effects for AGE squared (age2) calculated at sample means */;
/* cm1-cm3 is the sum of effects */;
summarize c1 c2 c3; summarize cm1 cm2 cm3; /* Table 2.9 */;
drop z zm c1 a1 b1 c2 a2 b2 c3 a3 b3 cm1 am1 bm1 cm2 am2 bm2 cm3 am3 bm3;
/* Calculating Marginal Effects: Dummy Variables (Religion) */;
replace ct = 1; /* Everyone is Catholic */;
predict p1 p2 p3; /* Predicted probabilities for each person when CT_i = 1 */;
predict z, xb; summarize z; gen zm = r(mean); /* The mean of Z is computed and stored in zm */;
gen q1 = 1/(1 + exp(zm - _b[_cut1]));
gen q2 = 1/(1 + exp(zm - _b[_cut2])) - 1/(1 + exp(zm - _b[_cut1]));
gen q3 = 1 - 1/(1 + exp(zm - _b[_cut2]));
/* q1, q2 and q3 are the predicted probabilities, calculated setting nonreligion values to sample means */;

summarize p1 p2 p3; summarize q1 q2 q3; /* Table 2.11 */;
drop p1 p2 p3 q1 q2 q3 z zm;
replace ct = 0; /* Everyone is Protestant */;
predict p1 p2 p3; /* Predicted probabilities for each person when CT_i = 0 */;
predict z, xb; summarize z; gen zm = r(mean); /* The mean of Z is computed and stored in zm */;
gen q1 = 1/(1 + exp(zm - _b[_cut1]));
gen q2 = 1/(1 + exp(zm - _b[_cut2])) - 1/(1 + exp(zm - _b[_cut1]));
gen q3 = 1 - 1/(1 + exp(zm - _b[_cut2]));
/* q1, q2 and q3 are the predicted probabilities, calculated setting nonreligion values to sample means */;
summarize p1 p2 p3; summarize q1 q2 q3; /* Table 2.11 */;
drop p1 p2 p3 q1 q2 q3 z zm;
replace ct = cto; /* Restoring original values */;
/* Ologit equation is being estimated on Protestant subsample */;
ologit y sex age age2 ret inac ue highed mided hnum snpar ard dwn crk ant col arm ban dry frm if ct == 0, table;
lrtest, saving(0); /* Likelihood value saved for LR test */;
ologit y sex age age2 ret inac ue highed mided hnum snpar ard crk ant col arm ban frm if ct == 0, table;
lrtest; /* LR test on coefficients on dwn & dry jointly zero */;
/* Predictions are being made for Protestants using Protestant coefficients and means computed */;
predict ppa1 ppa2 ppa3 if ct == 0;
egen mppa1 = mean(ppa1); egen mppa2 = mean(ppa2); egen mppa3 = mean(ppa3);
predict pps1 pps2 pps3 if ct == 0 & snpar == 1;
egen mpps1 = mean(pps1); egen mpps2 = mean(pps2); egen mpps3 = mean(pps3);
predict ppr1 ppr2 ppr3 if ct == 0 & ret == 1;
egen mppr1 = mean(ppr1); egen mppr2 = mean(ppr2); egen mppr3 = mean(ppr3);
predict ppi1 ppi2 ppi3 if ct == 0 & inac == 1;
egen mppi1 = mean(ppi1); egen mppi2 = mean(ppi2); egen mppi3 = mean(ppi3);
predict ppu1 ppu2 ppu3 if ct == 0 & ue == 1;
egen mppu1 = mean(ppu1); egen mppu2 = mean(ppu2); egen mppu3 = mean(ppu3);
predict ppl1 ppl2 ppl3 if ct == 0 & resnum == 1;
egen mppl1 = mean(ppl1); egen mppl2 = mean(ppl2); egen mppl3 = mean(ppl3);
/* Ologit equation is being estimated on Catholic subsample */;
ologit y sex age age2 ret inac ue highed mided hnum snpar ard dwn crk ant col arm ban dry frm if ct == 1, table;
lrtest, saving(0); /* Likelihood value saved for LR test */;
ologit y sex age age2 ret inac ue highed mided hnum snpar ard crk ant col arm ban frm if ct == 1, table;
lrtest; /* LR test on coefficients on dwn & dry jointly zero */;
/* Predictions are being made for Catholics using Catholic coefficients and means computed */;
predict cca1 cca2 cca3 if ct == 1;
egen mcca1 = mean(cca1); egen mcca2 = mean(cca2); egen mcca3 = mean(cca3);
predict ccs1 ccs2 ccs3 if ct == 1 & snpar == 1;
egen mccs1 = mean(ccs1); egen mccs2 = mean(ccs2); egen mccs3 = mean(ccs3);
predict ccr1 ccr2 ccr3 if ct == 1 & ret == 1;
egen mccr1 = mean(ccr1); egen mccr2 = mean(ccr2); egen mccr3 = mean(ccr3);
predict cci1 cci2 cci3 if ct == 1 & inac == 1;
egen mcci1 = mean(cci1); egen mcci2 = mean(cci2); egen mcci3 = mean(cci3);
predict ccu1 ccu2 ccu3 if ct == 1 & ue == 1;
egen mccu1 = mean(ccu1); egen mccu2 = mean(ccu2); egen mccu3 = mean(ccu3);
predict ccl1 ccl2 ccl3 if ct == 1 & resnum == 1;
egen mccl1 = mean(ccl1); egen mccl2 = mean(ccl2); egen mccl3 = mean(ccl3);
/* Predictions are being made for Protestants using Catholic coefficients and means computed */;
predict pca1 pca2 pca3 if ct == 0;
egen mpca1 = mean(pca1); egen mpca2 = mean(pca2); egen mpca3 = mean(pca3);
predict pcs1 pcs2 pcs3 if ct == 0 & snpar == 1;
egen mpcs1 = mean(pcs1); egen mpcs2 = mean(pcs2); egen mpcs3 = mean(pcs3);
predict pcr1 pcr2 pcr3 if ct == 0 & ret == 1;

egen mpcr1 = mean(pcr1); egen mpcr2 = mean(pcr2); egen mpcr3 = mean(pcr3);
predict pci1 pci2 pci3 if ct == 0 & inac == 1;
egen mpci1 = mean(pci1); egen mpci2 = mean(pci2); egen mpci3 = mean(pci3);
predict pcu1 pcu2 pcu3 if ct == 0 & ue == 1;
egen mpcu1 = mean(pcu1); egen mpcu2 = mean(pcu2); egen mpcu3 = mean(pcu3);
predict pcl1 pcl2 pcl3 if ct == 0 & resnum == 1;
egen mpcl1 = mean(pcl1); egen mpcl2 = mean(pcl2); egen mpcl3 = mean(pcl3);
/* Now percentage contributions will be calculated */;
gen Aa1 = ((mppa1 - mpca1)/(mppa1 - mcca1))*100;
gen Aa2 = ((mppa2 - mpca2)/(mppa2 - mcca2))*100;
gen Aa3 = ((mppa3 - mpca3)/(mppa3 - mcca3))*100;
gen As1 = ((mpps1 - mpcs1)/(mpps1 - mccs1))*100;
gen As2 = ((mpps2 - mpcs2)/(mpps2 - mccs2))*100;
gen As3 = ((mpps3 - mpcs3)/(mpps3 - mccs3))*100;
gen Ar1 = ((mppr1 - mpcr1)/(mppr1 - mccr1))*100;
gen Ar2 = ((mppr2 - mpcr2)/(mppr2 - mccr2))*100;
gen Ar3 = ((mppr3 - mpcr3)/(mppr3 - mccr3))*100;
gen Ai1 = ((mppi1 - mpci1)/(mppi1 - mcci1))*100;
gen Ai2 = ((mppi2 - mpci2)/(mppi2 - mcci2))*100;
gen Ai3 = ((mppi3 - mpci3)/(mppi3 - mcci3))*100;
gen Au1 = ((mppu1 - mpcu1)/(mppu1 - mccu1))*100;
gen Au2 = ((mppu2 - mpcu2)/(mppu2 - mccu2))*100;
gen Au3 = ((mppu3 - mpcu3)/(mppu3 - mccu3))*100;
gen Al1 = ((mppl1 - mpcl1)/(mppl1 - mccl1))*100;
gen Al2 = ((mppl2 - mpcl2)/(mppl2 - mccl2))*100;
gen Al3 = ((mppl3 - mpcl3)/(mppl3 - mccl3))*100;
summarize
ppa1 ppa2 ppa3 pca1 pca2 pca3 cca1 cca2 cca3
pps1 pps2 pps3 pcs1 pcs2 pcs3 ccs1 ccs2 ccs3
ppr1 ppr2 ppr3 pcr1 pcr2 pcr3 ccr1 ccr2 ccr3
ppi1 ppi2 ppi3 pci1 pci2 pci3 cci1 cci2 cci3
ppu1 ppu2 ppu3 pcu1 pcu2 pcu3 ccu1 ccu2 ccu3
ppl1 ppl2 ppl3 pcl1 pcl2 pcl3 ccl1 ccl2 ccl3;
/* Table 2.16 */;
summarize Aa1 Aa2 Aa3 As1 As2 As3 Ar1 Ar2 Ar3 Ai1 Ai2 Ai3 Au1 Au2 Au3 Al1 Al2 Al3; /* Table 2.17 */;
/* End of program */;

Multinomial Logit Programs

version 6.0 /* Using STATA version 6.0 */
use c:\SAGE\GB.dta /* Reading data which are in STATA format: 24 vars 98732 observations:
data are for white, black and Indian full-time male employees, 25-45 years of age */;
#delimit;
/* TITLE: MULTINOMIAL LOGIT USING OCCUPATIONAL CLASS EXAMPLE */;
gen pnum = _n; /* Every person is assigned a number */;
/* y is dependent variable for multinomial logit
y = 1 if person is in unskilled/semiskilled manual occupation
y = 2 if person is in skilled manual/nonmanual occupation
y = 3 if person is in professional/managerial/technical occupation */;
/* Tabulating y for all men; white men; black men; Indian men */;
tab y; tab y if blk == 0 & ind == 0; tab y if blk == 1; tab y if ind == 1;
/* Now estimating multinomial logit equation, Table 3.2: see Equation 3.14 for equation specification.
Note: base(1) below sets outcome 1 as base value */;
mlogit y age age2 blk blkovs ind indovs ovsbn north indnth blknth south indsth blksth highed indhe blkhe mided indme blkme subbus subsci bush scih, base(1);
lrtest, saving(0); /* Saving log-likelihood value for LR test */;
/* Defining zero constraints:
blk, blkovs, dropped from equation for y = 2;
ind, blknth, scih, bush, subsci dropped from equation for y = 3;
indme, indhe, blkme dropped from both equations */;
constraint define 1 [2]blkovs = 0; constraint define 2 [3]blk = 0;
constraint define 3 [3]ind = 0; constraint define 4 [3]blknth = 0;
constraint define 5 [3]scih = 0; constraint define 6 [3]bush = 0;
constraint define 7 [3]subsci = 0; constraint define 8 indhe;
constraint define 9 indme; constraint define 10 blkme;

/* Now mlogit equation will be estimated with constraints 1-10 imposed, Table 3.3.
Note: constr(1-10) below imposes constraints */;
mlogit y age age2 blk blkovs ind indovs ovsbn north indnth blknth south indsth blksth highed indhe blkhe mided indme blkme subbus subsci bush scih, constr(1-10) base(1);
lrtest; /* Likelihood ratio test carried out: zero restrictions not rejected with chi2(13) = 8.1 */;
/* Predicted probabilities for EACH PERSON for outcomes 1, 2, 3 are stored, respectively, in p1, p2, p3 */;
predict p1, outcome(1); predict p2, outcome(2); predict p3, outcome(3);
/* Predicted probabilities are summarized for: all men, whites, blacks, Indians (Table 3.4, upper panel) */;
summarize p1 p2 p3; summarize p1 p2 p3 if blk == 0 & ind == 0; summarize p1 p2 p3 if blk == 1;
summarize p1 p2 p3 if ind == 1;
drop p1 p2 p3; /* Variables are released for subsequent use */;
/* The value of Z is calculated for EACH PERSON for outcomes 1, 2, 3 and stored in z1, z2, z3 */;
predict z1, outcome(1) xb; predict z2, outcome(2) xb; predict z3, outcome(3) xb;
/* The means of z1, z2, z3 are stored in zm1, zm2, zm3 */;
summarize z1, mean; gen zm1 = r(mean); summarize z2, mean; gen zm2 = r(mean); summarize z3, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b, for all men */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3; /* Predicted probabilities are shown for all men (Table 3.4, lower panel) */;
drop sum p1 p2 p3 zm1 zm2 zm3; /* Variables are released for subsequent use */;
/* Now computing the mean of z1 z2 z3 over white men and storing in zm1, zm2, zm3 */;
summarize z1 if blk == 0 & ind == 0, mean; gen zm1 = r(mean);
summarize z2 if blk == 0 & ind == 0, mean; gen zm2 = r(mean);
summarize z3 if blk == 0 & ind == 0, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b, for white men */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3 if blk == 0 & ind == 0; /* Predicted probabilities are shown for white men (Table 3.4, lower panel) */;
drop sum p1 p2 p3 zm1 zm2 zm3; /* Variables are released for subsequent use */;
/* Now computing the mean of z1 z2 z3 over black men and storing in zm1, zm2, zm3 */;
summarize z1 if blk == 1, mean; gen zm1 = r(mean);
summarize z2 if blk == 1, mean; gen zm2 = r(mean);
summarize z3 if blk == 1, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b, for black men */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3 if blk == 1; /* Predicted probabilities are shown for black men (Table 3.4, lower panel) */;
drop sum p1 p2 p3 zm1 zm2 zm3; /* Variables are released for subsequent use */;
/* Now computing the mean of z1 z2 z3 over Indian men and storing in zm1, zm2, zm3 */;
summarize z1 if ind == 1, mean; gen zm1 = r(mean);
summarize z2 if ind == 1, mean; gen zm2 = r(mean);
summarize z3 if ind == 1, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b, for Indian men */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3 if ind == 1; /* Predicted probabilities are shown for Indian men (Table 3.4, lower panel) */;
drop sum p1 p2 p3 zm1 zm2 zm3; drop z1 z2 z3; /* Variables are released for subsequent use */;
/* Ethnic simulations follow */;
gen blko = blk; gen indo = ind; /* Ethnic variables

saved */;
replace blk = 0; replace ind = 0; /* Everyone is white */;
/* All interaction variables involving blk and ind need to be recomputed */;
replace indnth = ind*north; replace indsth = ind*south;
replace blknth = blk*north; replace blksth = blk*south;
replace blkovs = blk*ovsbn; replace indovs = ind*ovsbn;
replace blkhe = blk*highed;
/* Predicted probabilities for EACH PERSON for outcomes 1, 2, 3 are stored, respectively, in p1, p2, p3 */;
predict p1, outcome(1); predict p2, outcome(2); predict p3, outcome(3);
summarize p1 p2 p3; /* Predicted probabilities are summarized over entire sample:
EVERYONE assumed white (Table 3.5, upper panel) */;
drop p1 p2 p3;
/* The value of Z is calculated for EACH PERSON for outcomes 1, 2, 3 and stored in z1, z2, z3: EVERYONE assumed white */;
predict z1, outcome(1) xb; predict z2, outcome(2) xb; predict z3, outcome(3) xb;
/* The mean of z1, z2, z3 are computed over entire sample and stored in zm1, zm2 zm3 */;
summarize z1, mean; gen zm1 = r(mean); summarize z2, mean; gen zm2 = r(mean); summarize z3, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b:
EVERYONE assumed white */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3; /* Predicted probabilities are summarized over entire sample:
EVERYONE assumed white (Table 3.5, lower panel) */;
drop z1 z2 z3 sum p1 p2 p3 zm1 zm2 zm3; /* Variables are released for subsequent use */;
replace blk = 1; /* Everyone is black */;
/* All interaction variables involving blk and ind need to be recomputed */;
replace blknth = blk*north; replace blksth = blk*south;
replace blkovs = blk*ovsbn; replace blkhe = blk*highed;
/* No need to replace ind* variables since ind is already 0 */;
/* Predicted probabilities for EACH PERSON for outcomes 1, 2, 3 are stored, respectively, in p1, p2, p3 */;
predict p1, outcome(1); predict p2, outcome(2); predict p3, outcome(3);
summarize p1 p2 p3; drop p1 p2 p3; /* Predicted probabilities are summarized over entire sample:
EVERYONE assumed black (Table 3.5, upper panel) */;
/* The value of Z is calculated for EACH PERSON for outcomes 1, 2, 3 and stored in z1, z2, z3: EVERYONE assumed black */;
predict z1, outcome(1) xb; predict z2, outcome(2) xb; predict z3, outcome(3) xb;
/* The mean of z1, z2, z3 are computed over entire sample and stored in zm1, zm2 zm3 */;
summarize z1, mean; gen zm1 = r(mean); summarize z2, mean; gen zm2 = r(mean); summarize z3, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b:
EVERYONE assumed black */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3; /* Predicted probabilities are summarized over entire sample:
EVERYONE assumed black (Table 3.5, lower panel) */;
drop z1 z2 z3 sum p1 p2 p3 zm1 zm2 zm3; /* Variables are released for subsequent use */;
replace ind = 1; replace blk = 0; /* Everyone is Indian */;
/* All interaction variables involving blk and ind need to be recomputed */;
replace indnth = ind*north; replace indsth = ind*south;
replace blknth = blk*north; replace blksth = blk*south;
replace blkovs = blk*ovsbn; replace indovs = ind*ovsbn;
replace blkhe = blk*highed;

/* Predicted probabilities for EACH PERSON for outcomes 1, 2, 3 are stored, respectively, in p1, p2, p3 */;
predict p1, outcome(1); predict p2, outcome(2); predict p3, outcome(3);
summarize p1 p2 p3; drop p1 p2 p3; /* Predicted probabilities are summarized over entire sample:
EVERYONE assumed Indian (Table 3.5, upper panel) */;
/* The value of Z is calculated for EACH PERSON for outcomes 1, 2, 3 and stored in z1, z2, z3: EVERYONE assumed Indian */;
predict z1, outcome(1) xb; predict z2, outcome(2) xb; predict z3, outcome(3) xb;
/* The mean of z1, z2, z3 are computed over entire sample and stored in zm1, zm2 zm3 */;
summarize z1, mean; gen zm1 = r(mean); summarize z2, mean; gen zm2 = r(mean); summarize z3, mean; gen zm3 = r(mean);
/* The probabilities of outcomes 1, 2, 3 are calculated using Equations 3.12a and 3.12b:
EVERYONE assumed Indian */;
gen sum = 1 + exp(zm2) + exp(zm3); gen p1 = 1/sum; gen p2 = exp(zm2)/sum; gen p3 = exp(zm3)/sum;
summarize p1 p2 p3; /* Predicted probabilities are summarized over entire sample:
EVERYONE assumed Indian (Table 3.5, lower panel) */;
drop z1 z2 z3 sum p1 p2 p3 zm1 zm2 zm3; /* Variables are released for subsequent use */;
replace blk = blko; replace ind = indo; /* Original variables restored */;
/* End of program */;

NOTES

1. Available at http://www.indiana.edu/~statmath/stat/all/cat/giant.html.
2. With just two outcomes, ordinary logit and probit methods can be used irrespective of whether the dependent variable is ordinal or nonordinal.
3. However, ordinal models may be based on distributions other than the logit or probit. For example, the log-log or negative log-log or complementary log-log distributions for skewed ordinal data (see Agresti, 1990). The general point is that ordered logit and probit models are a subset of a much larger set of options of dealing with ordinal dependent variables. Indeed, if the number of outcomes is large (e.g., greater than 20) then the methods of ordered logit and probit could become cumbersome and it might be preferable to use other methods (see Jöreskog and Sörbom, 1988, pp. 44-45, and 1993, pp. 1-17, for discussion of this point).
4. The ranking may be on the basis of income, in which case there is an objective hierarchy of occupations, or it may be based upon cultural assumptions and prejudice.
5. Better, that is, than treating outcomes as ordered, unless one had good reasons for not imposing a ranking.
6. Remembering that $N = N_1 + N_2 + N_3$.
7. Because of the manner in which they are computed, these estimates are termed maximum likelihood estimates.
8. There are other approaches to modelling ordinal outcomes, such as adjacent categories logit, stereotype logit, and continuation ratio logit models (see Agresti, 1996, pp. 216-220).
9. It closely resembles a t distribution with seven degrees of freedom (Greene, 2000, p. 815).
10. See Brant (1990) for a discussion of this property.
11. Greene (2000) actually indexes his cutoff points beginning with 0. So in his notation, $\mu_0 = 0$. For ease of comparison, however, I have begun the index for his cutoff points at 1.
12. Remember that Greene (2000, p. 876) writes $Z_i$ as $\beta' x$.
13. Since $\mu_1 = 0$, $\delta_1 = -\beta_0$, so that the first STATA cutoff point is the negative of the intercept.
14. Normally distributed with mean 0 and variance 1.
15. Remember that the three probabilities must sum to unity.
16. The coefficients associated with one of the outcomes have to be normalized for identifiability. This is discussed in the next chapter.
17. These individuals represent a 2% sample of the census records. For a fuller account of the data used see Borooah (2000a).
18. A person was regarded as not deprived if $D_i = 0$; as mildly deprived if $0 < D_i \le \bar{D}$; as severely deprived if $D_i > \bar{D}$, where $\bar{D}$ was the mean of the deprivation index.
19. After normalization, $AGE_i = 1$ for those who were 17 years old; $AGE_i = 2$ for those who were 18 years old; and so on.
20. These are qualifications generally obtained at 18+.
21. These were: Belfast (Area 1); Ards, Castlereagh, North Down (Area 2); Down, Lisburn (Area 3); Carrickfergus, Larne, Newtownabbey (Area 4); Antrim, Ballymena and Ballymoney (Area 5); Armagh, Newry & Mourne (Area 7); Coleraine, Cookstown, Magherafelt, Moyle (Area 6); Banbridge, Craigavon, Dungannon (Area 8); Derry, Limavady (Area 9); and Fermanagh, Omagh, Strabane (Area 10). Because of

multicollinearity, all 10 areas cannot be included in Equation 2.16. AREA 1 (Belfast) is the area that was dropped (aliased) from the equation.
22. All non-Roman Catholics are identified as "Protestants," although this latter group contained persons who either did not state a religion (7.2% of Northern Ireland residents) or declared that they had no religion (3.8% of Northern Ireland residents).
23. Note that (a) under STATA, $\beta_1 = 0$, and (b) from Equation 2.16, $dD_i/dAGE_i = \beta_4 + 2\beta_5 AGE_i$ ($j = 1, 2, 3$).
24. These ratios represent a form of the Wald statistic.
25. In order to test the parallel slopes assumption the model was also estimated using multinomial logit and this yielded a likelihood value (denoted $L_2$) of -12380.36. The computed value of $2(L_2 - L_1)$ was 86.4 (the comparison was with the ordered logit model) and on a strict likelihood-ratio test interpretation this exceeded the 5% critical $\chi^2(20)$ value of 31.41. However, as noted earlier, the value of $2(L_2 - L_1)$ is only suggestive since the statistic does not provide the basis for a likelihood-ratio test.
26. Note: xb + u (in the Tables) = $z_i + \varepsilon_i$ (in the text).
27. The cutoff points are really only coefficients of the model and are estimated along with the slope coefficients using maximum likelihood methods.
28. These proportions were set out at the start of this section.
29. Though the outcome would be different if the average was defined as the median (rather than the mean) of the individual probabilities.

48. Unless there is a specification to the contrary, the discussion in this section and subsequent sections always carries a ceteris paribus clause.
49. Though not for blacks in the North in PMT employment.
50. In other words, 41.5% of all men, 41.8% of white men, 29.3% of black men, and 35.1% of Indian men in the sample were in PMT jobs.
51. By replacing $z_i$ in the equations with, as appropriate, $Z_{Wj}$, $Z_{Bj}$ and $Z_{Ij}$.
52. Or equivalently calculated using the mean values $\bar{X}_{Wk}$ of the individual values of the determining variables $X_{ik}$ for the white persons in the sample.
53. It should be noted that this section presents simulation-based strategies concerned with a specific set of questions relating to differences in intergroup outcomes. For a more general approach to simulation strategies see Tomz, Wittenberg, and King (1999). Tomz et al. (1999) have developed a program, Clarify, that uses stochastic simulation to convert the raw output of statistical procedures into results that are of direct interest to researchers. This program is designed for use with STATA.
54. A similar exercise can be performed for the hypothetical case where everyone in the sample is Indian.
55. Which were $\alpha_j$ for whites and $\alpha_j + \gamma_j$ for Indians.
56. The $Z_{ij}$ are computed using the estimates shown in Table 3.3 in conjunction with the values of the determining variables for every individual. When everyone is
30. Note that under STATA {3 1 = O. white, IN D , = BLK, = O and when everone is black, BL K , = 1, in the estimated
31. For case of presentation, the subsequent discussion is entirely in terms of the Equation 3.14.
logit model. 57. On the other hand, if PJ> pl, then Af > 1 anJ there is an ethnic advantage.
32. For example, nearly 16% of Catholics, compared to 8% of Protestants, in the 58. On the other hand, if si > s) , then ~Lf > 1 and there is an overall bonus.
sample were single parents. 59. This depends only on the ratio of sample proportions which, of course, remain
33. Note that these predicted means would be identical to the sample proportions unchanged.
of Catholics and Protestants at the different deprivation leveis. óO. l n the multinomial model, log-odds-ra tio depended solely on the (differences
34. That is, Protestant attributes evaluated at Protestant coefficients. between) coefficients associated with j and k and was independent on the coefficients
35. ln the case of unemploye d and retired persons these probabilities hardly associa ted with any other outcome (Equation 3.6).
changed. 61. Domencich and McFadden (1996).
36. ln the sense of having the sarne average probability of being mildly deprived. 62. This drawback is shared by many empirically convenient functi onal forms in
37. These individuais represent a 2% sample of tbe census records. For a fuller eco nomics, for example the Cobb-Douglas or the constant elasticity of substituti on
account of the data used see Borooah (2000b ). (CES) functional forms.
38. The default was no post-18 qualifications. 63. Though it must be emphasized that new altcrna tives could be introduced by
39. Relevant only if the person had post-18 qualifications. The default area was splitting existing ones: for example, skilled manu a]Jnonmanual into skilled man ual and
Arts-related subjects. skilled nonmanual.
40. The default group was white. 64. See the discussion earlier in the section on how probabilities are recomputed
41. The default area was London. when a new alternative is introduced.
42. ln terms of the standard regions of 13ritain: East Midlands; East Anglia; the 65. Available at http://www.indiana.edu;- statmath/srat/all/cat/giant.html.
Southeast (excluding London); the Southwest.
43. The North; Yorkshire; West Midlands; Northwest; Wales; Scotland.
44. Note that any one of the three outcomes could have becn chosen as the base
outcome.
45 . Of colll·se, if the relevant coefficient for a particular group was positive, then it
would enjoy an "ethnic bonus."
46. These ratios represent a form of the Wald statistic.
47. Other than age: older persons were more likely to be in the higher occupational
classes or, in terms of the estimated coefficients of Equation 3.14, Ôi1 > O, j = 1, 2.
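The arithmetic reported in note 25 can be checked with a short sketch. The Python below is an illustration only, not the author's code: the function name `chi2_sf_even_df` and all variable names are hypothetical, and the function uses the closed-form upper-tail probability that holds for a chi-square distribution with an even number of degrees of freedom. As note 25 stresses, the comparison is only suggestive, because the statistic does not provide the basis for a formal likelihood-ratio test.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail probability equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term of the sum
    for k in range(1, df // 2):     # k = 1, ..., m - 1
        term *= half / k
        total += term
    return math.exp(-half) * total


# Values reported in note 25.
log_lik_mnl = -12380.36     # L2: multinomial logit
lr_stat = 86.4              # reported value of 2(L2 - L1)
df = 20                     # degrees of freedom quoted in the note
critical_5pct = 31.41       # 5% critical value quoted in the note

# Ordered logit value L1 implied by the two reported figures.
log_lik_ologit = log_lik_mnl - lr_stat / 2.0

# Tail probability at the quoted critical value (should be close to 0.05)
# and at the computed statistic (should be far smaller).
p_at_critical = chi2_sf_even_df(critical_5pct, df)
p_value = chi2_sf_even_df(lr_stat, df)

print(f"implied L1 = {log_lik_ologit:.2f}")
print(f"P(chi2(20) > 31.41) = {p_at_critical:.4f}")
print(f"P(chi2(20) > 86.4)  = {p_value:.3g}")
```

The closed form avoids any dependency beyond the standard library; for odd degrees of freedom a statistical library would be the more practical choice.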

References
AGRESTI, A. (1996). An introduction to categorical data analysis. New York: John Wiley.
ALDRICH, J. H., and NELSON, F. D. (1984). Linear probability, logit, and probit models. Quantitative Applications in the Social Sciences, 07-045. Beverly Hills, CA: Sage.
BEN-AKIVA, M., and LERMAN, S. (1985). Discrete choice analysis. London: MIT Press.
BOROOAH, V. K., and CARCACH, C. (1997). Fear and crime: Evidence from Australia. British Journal of Criminology, 37, 634-656.
BOROOAH, V. K. (2000). Targeting social need: Why are deprivation levels in Northern Ireland higher for Catholics than for Protestants? Journal of Social Policy, 29, 281-301.
BOROOAH, V. K. (2001). How do employees of ethnic origin fare on the occupational ladder in Britain? Scottish Journal of Political Economy, 48, 1-26.
BRANT, R. (1990). Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrika, 44, 131-140.
CRAMER, J. (1999). Predictive performance of the binary logit model in unbalanced samples. Journal of the Royal Statistical Society, Series D (The Statistician), 88, 85-94.
DEMARIS, A. (1992). Logit modelling. Quantitative Applications in the Social Sciences, 07-086. Newbury Park, CA: Sage.
DESAI, M., and SHAH, A. (1988). An econometric approach to the measurement of poverty. Oxford Economic Papers, 40, 505-522.
DOMENCICH, T. A., and McFADDEN, D. (1996). Urban travel demand: A behavioral analysis. Amsterdam: North-Holland.
GREENE, W. H. (1995). LIMDEP, version 7.0: User's manual. Bellport, NY: Econometric Software.
GREENE, W. H. (2000). Econometric analysis. Englewood Cliffs, NJ: Prentice Hall (4th ed.).
HAUSMAN, J., and McFADDEN, D. (1984). A specification test for the multinomial logit model. Econometrica, 52, 1219-1240.
HENSHER, D. (1986). Simultaneous estimation of hierarchical logit mode choice models (Working Paper No. 24). MacQuarie University, School of Economic and Financial Studies.
JÖRESKOG, K., and SÖRBOM, D. (1988). PRELIS: A program for multivariate data screening and data summarization. Chicago: Scientific Software Inc.
JÖRESKOG, K., and SÖRBOM, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Hillsdale, NJ: Lawrence Erlbaum.
KAY, R., and LITTLE, S. (1986). Assessing the fit of the logistic model: A case study of children with haemolytic uraemic syndrome. Applied Statistics, 35, 16-30.
LIAO, T. F. (1994). Interpreting probability models. Quantitative Applications in the Social Sciences, 07-101. Newbury Park, CA: Sage.
McFADDEN, D. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics. New York: Academic Press.
MENARD, S. (1995). Applied regression analysis. Quantitative Applications in the Social Sciences. Thousand Oaks, CA: Sage.
NOLAN, B., and WHELAN, C. T. (1996). Resources, deprivation and poverty. Oxford: Clarendon Press.
PIACHAUD, D. (1987). Problems in the definition of poverty. Journal of Social Policy, 16, 147-164.
SCHMIDT, P., and STRAUSS, R. P. (1975). The prediction of occupation using multiple logit models. International Economic Review, 13, 471-486.
STATA (1999). Stata reference manual release 6. College Station, TX: Stata Press.
TOMZ, M., WITTENBERG, J., and KING, G. (1999). CLARIFY: Software for interpreting and presenting statistical results, version 1.2.1. Cambridge, MA: Harvard University Press. (http://gking.harvard.edu/).
TOWNSEND, P. (1979). Poverty in the United Kingdom. Harmondsworth: Penguin.
VEALL, M. R., and ZIMMERMANN, K. F. (1996). Pseudo-R² measures for some common limited dependent variable models. Journal of Economic Surveys, 10, 241-260.

ABOUT THE AUTHOR

VANI K. BOROOAH is Professor of Applied Economics at the University of Ulster, a position he has held since 1987. Prior to that he was Senior Research Officer at the Department of Applied Economics at the University of Cambridge and Fellow and College Lecturer in Economics at Queens' College, Cambridge. Born and mostly educated in India, Borooah earned his PhD from the University of Southampton and his MA from the University of Bombay. His research focuses on poverty, inequality, and labor market outcomes, and he has published several academic papers and books on these subjects. He is particularly interested in the public policy implications of intergroup differences in economic and social outcomes. Borooah combines his study of the policy implications of intergroup differences with an interest in the economies and societies of developing countries. He has been President of the European Public Choice Society and President of the Irish Economics Association.

Quantitative Applications in the Social Sciences
A SAGE UNIVERSITY PAPERS SERIES

1. Analysis of Variance, 2nd Edition Iversen/Norpoth
2. Operations Research Methods Nagel/Neef
3. Causal Modeling, 2nd Edition Asher
4. Tests of Significance Henkel
5. Cohort Analysis Glenn
6. Canonical Analysis and Factor Comparison Levine
7. Analysis of Nominal Data, 2nd Edition Reynolds
8. Analysis of Ordinal Data Hildebrand/Laing/Rosenthal
9. Time Series Analysis, 2nd Edition Ostrom
10. Ecological Inference Langbein/Lichtman
11. Multidimensional Scaling Kruskal/Wish
12. Analysis of Covariance Wildt/Ahtola
13. Introduction to Factor Analysis Kim/Mueller
14. Factor Analysis Kim/Mueller
15. Multiple Indicators Sullivan/Feldman
16. Exploratory Data Analysis Hartwig/Dearing
17. Reliability and Validity Assessment Carmines/Zeller
18. Analyzing Panel Data Markus
19. Discriminant Analysis Klecka
20. Log-Linear Models Knoke/Burke
21. Interrupted Time Series Analysis McDowall/McCleary/Meidinger/Hay
22. Applied Regression Lewis-Beck
23. Research Designs Spector
24. Unidimensional Scaling McIver/Carmines
25. Magnitude Scaling Lodge
26. Multiattribute Evaluation Edwards/Newman
27. Dynamic Modeling Huckfeldt/Kohfeld/Likens
28. Network Analysis Knoke/Kuklinski
29. Interpreting and Using Regression Achen
30. Test Item Bias Osterlind
31. Mobility Tables Hout
32. Measures of Association Liebetrau
33. Confirmatory Factor Analysis Long
34. Covariance Structure Models Long
35. Introduction to Survey Sampling Kalton
36. Achievement Testing Bejar
37. Nonrecursive Causal Models Berry
38. Matrix Algebra Namboodiri
39. Introduction to Applied Demography Rives/Serow
40. Microcomputer Methods for Social Scientists, 2nd Edition Schrodt
41. Game Theory Zagare
42. Using Published Data Jacob
43. Bayesian Statistical Inference Iversen
44. Cluster Analysis Aldenderfer/Blashfield
45. Linear Probability, Logit, and Probit Models Aldrich/Nelson
46. Event History Analysis Allison
47. Canonical Correlation Analysis Thompson
48. Models for Innovation Diffusion Mahajan/Peterson
49. Basic Content Analysis, 2nd Edition Weber
50. Multiple Regression in Practice Berry/Feldman
51. Stochastic Parameter Regression Models Newbold/Bos
52. Using Microcomputers in Research Madron/Tate/Brookshire
53. Secondary Analysis of Survey Data Kiecolt/Nathan
54. Multivariate Analysis of Variance Bray/Maxwell
55. The Logic of Causal Order Davis
56. Introduction to Linear Goal Programming Ignizio
57. Understanding Regression Analysis Schroeder/Sjoquist/Stephan
58. Randomized Response Fox/Tracy
59. Meta-Analysis Wolf
60. Linear Programming Feiring
61. Multiple Comparisons Klockars/Sax
62. Information Theory Krippendorff
63. Survey Questions Converse/Presser
64. Latent Class Analysis McCutcheon
65. Three-Way Scaling and Clustering Arabie/Carroll/DeSarbo
66. Q Methodology McKeown/Thomas
67. Analyzing Decision Making Louviere
68. Rasch Models for Measurement Andrich
69. Principal Components Analysis Dunteman
70. Pooled Time Series Analysis Sayrs
71. Analyzing Complex Survey Data Lee/Forthofer/Lorimor
72. Interaction Effects in Multiple Regression Jaccard/Turrisi/Wan
73. Understanding Significance Testing Mohr
74. Experimental Design and Analysis Brown/Melamed
75. Metric Scaling Weller/Romney
76. Longitudinal Research Menard
77. Expert Systems Benfer/Brent/Furbee
78. Data Theory and Dimensional Analysis Jacoby
79. Regression Diagnostics Fox
80. Computer-Assisted Interviewing Saris
81. Contextual Analysis Iversen
82. Summated Rating Scale Construction Spector
83. Central Tendency and Variability Weisberg
84. ANOVA: Repeated Measures Girden
85. Processing Data Bourque/Clark
86. Logit Modeling DeMaris
87. Analytic Mapping and Geographic Databases Garson/Biggs
88. Working With Archival Data Elder/Pavalko/Clipp
89. Multiple Comparison Procedures Toothaker
90. Nonparametric Statistics Gibbons
91. Nonparametric Measures of Association Gibbons
92. Understanding Regression Assumptions Berry
93. Regression With Dummy Variables Hardy
94. Loglinear Models With Latent Variables Hagenaars
95. Bootstrapping Mooney/Duval
96. Maximum Likelihood Estimation Eliason
97. Ordinal Log-Linear Models Ishii-Kuntz
98. Random Factors in ANOVA Jackson/Brashers
99. Univariate Tests for Time Series Models Cromwell/Labys/Terraza
100. Multivariate Tests for Time Series Models Cromwell/Hannan/Labys/Terraza

Other volumes in this series listed on outside back cover
101. Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models Liao
102. Typologies and Taxonomies Bailey
103. Data Analysis: An Introduction Lewis-Beck
104. Multiple Attribute Decision Making Yoon/Hwang
105. Causal Analysis With Panel Data Finkel
106. Applied Logistic Regression Analysis, 2nd Edition Menard
107. Chaos and Catastrophe Theories Brown
108. Basic Math for Social Scientists: Concepts Hagle
109. Basic Math for Social Scientists: Problems and Solutions Hagle
110. Calculus Iversen
111. Regression Models: Censored, Sample Selected, or Truncated Data Breen
112. Tree Models of Similarity and Association James E. Corter
113. Computational Modeling Taber/Timpone
114. LISREL Approaches to Interaction Effects in Multiple Regression Jaccard/Wan
115. Analyzing Repeated Surveys Firebaugh
116. Monte Carlo Simulation Mooney
117. Statistical Graphics for Univariate and Bivariate Data Jacoby
118. Interaction Effects in Factorial Analysis of Variance Jaccard
119. Odds Ratios in the Analysis of Contingency Tables Rudas
120. Statistical Graphics for Visualizing Multivariate Data Jacoby
121. Applied Correspondence Analysis Clausen
122. Game Theory Topics Fink/Gates/Humes
123. Social Choice: Theory and Research Johnson
124. Neural Networks Abdi/Valentin/Edelman
125. Relating Statistics and Experimental Design: An Introduction Levin
126. Latent Class Scaling Analysis Dayton
127. Sorting Data: Collection and Analysis Coxon
128. Analyzing Documentary Accounts Hodson
129. Effect Size for ANOVA Designs Cortina/Nouri
130. Nonparametric Simple Regression: Smoothing Scatterplots Fox
131. Multiple and Generalized Nonparametric Regression Fox
132. Logistic Regression: A Primer Pampel
133. Translating Questionnaires and Other Research Instruments: Problems and Solutions Behling/Law
134. Generalized Linear Models: A Unified Approach Gill
135. Interaction Effects in Logistic Regression Jaccard
136. Missing Data Allison
137. Spline Regression Models Marsh/Cormier
138. Logit and Probit: Ordered and Multinomial Models Borooah

Visit our website at www.sagepub.com

ISBN 0-7619-2242-3

SAGE PUBLICATIONS
International Educational and Professional Publisher
Thousand Oaks, London, New Delhi