(Ebook) Applied Regression Analysis and Generalized Linear Models by John Fox ISBN 9781452205663, 1452205663
Applied Regression Analysis and Generalized Linear Models, 3rd Edition, John Fox
Author(s): John Fox
ISBN(s): 9781452205663, 1452205663
Edition: 3rd
Year: 2015
Language: english
THIRD EDITION
APPLIED REGRESSION
ANALYSIS and
GENERALIZED LINEAR
MODELS
For Bonnie and Jesse (yet again)
John Fox
McMaster University
FOR INFORMATION:
SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com
SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom
Copyright © 2016 by SAGE Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by any
means, electronic or mechanical, including photocopying, recording, or by any information
storage and retrieval system, without permission in writing from the publisher.
Preface
About the Author
I. DATA CRAFT
3. Examining Data
4. Transforming Data
14. Logit and Probit Models for Categorical Response Variables
23. Linear Mixed-Effects Models for Hierarchical and Longitudinal Data
Appendix A
References
Author Index
Subject Index
Data Set Index
Contents
Preface
About the Author
1. Statistical Models and Social Science
1.1 Statistical Models and Social Reality
1.2 Observation and Experiment
1.3 Populations and Samples
Exercise
Summary
Recommended Reading
I. DATA CRAFT
2. What Is Regression Analysis?
2.1 Preliminaries
2.2 Naive Nonparametric Regression
2.3 Local Averaging
Exercise
Summary
3. Examining Data
3.1 Univariate Displays
3.1.1 Histograms
3.1.2 Nonparametric Density Estimation
3.1.3 Quantile-Comparison Plots
3.1.4 Boxplots
3.2 Plotting Bivariate Data
3.3 Plotting Multivariate Data
3.3.1 Scatterplot Matrices
3.3.2 Coded Scatterplots
3.3.3 Three-Dimensional Scatterplots
3.3.4 Conditioning Plots
Exercises
Summary
Recommended Reading
4. Transforming Data
4.1 The Family of Powers and Roots
4.2 Transforming Skewness
4.3 Transforming Nonlinearity
4.4 Transforming Nonconstant Spread
4.5 Transforming Proportions
4.6 Estimating Transformations as Parameters*
Exercises
Summary
Recommended Reading
Appendix A
References
Author Index
Subject Index
Data Set Index
Preface
Linear models, their variants, and extensions (the most important of which are generalized
linear models) are among the most useful and widely used statistical tools for social
research. This book aims to provide an accessible, in-depth, modern treatment of regression
analysis, linear models, generalized linear models, and closely related methods.
The book should be of interest to students and researchers in the social sciences. Although
the specific choice of methods and examples reflects this readership, I expect that the book will
prove useful in other disciplines that employ regression models for data analysis and in courses
on applied regression and generalized linear models where the subject matter of applications is
not of special concern.
I have endeavored to make the text as accessible as possible (but no more accessible than
possible—i.e., I have resisted watering down the material unduly). With the exception of four
chapters, several sections, and a few shorter passages, the prerequisite for reading the book is a
course in basic applied statistics that covers the elements of statistical data analysis and infer-
ence. To the extent that I could without doing violence to the material, I have tried to present
even relatively advanced topics (such as methods for handling missing data and bootstrapping)
in a manner consistent with this prerequisite.
Many topics (e.g., logistic regression in Chapter 14) are introduced with an example that
motivates the statistics or (as in the case of bootstrapping, in Chapter 21) by appealing to familiar
material. The general mode of presentation is from the specific to the general: Consequently,
simple and multiple linear regression are introduced before the general linear model, and linear,
logit, and probit models are introduced before generalized linear models, which subsume all the
previous topics. Indeed, I could start with the generalized linear mixed-effects model (GLMM),
described in the final chapter of the book, and develop all these other topics as special cases
of the GLMM, but that would produce a much more abstract and difficult treatment (cf., e.g.,
Stroup, 2013).
The exposition of regression analysis starts (in Chapter 2) with an elementary discussion of
nonparametric regression, developing the notion of regression as a conditional average—in the
absence of restrictive assumptions about the nature of the relationship between the response and
explanatory variables. This approach begins closer to the data than the traditional starting point
of linear least-squares regression and should make readers skeptical about glib assumptions of
linearity, constant variance, and so on.
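The idea of regression as a conditional average can be made concrete with a small sketch. The Python code below is an illustration only, not the book's own software (the author's stated choice is R), and the numbers are invented toy values rather than real survey data. The function name `binned_means` and the bin width are assumptions made for the example. It groups observations into bins of the explanatory variable and averages the response within each bin, which is roughly the naive local averaging that Chapter 2 takes as its starting point.

```python
from collections import defaultdict


def binned_means(x, y, width):
    """Average y within bins of x of the given width (naive local averaging)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        # Observations whose x falls in the same bin share a key.
        groups[int(xi // width)].append(yi)
    # Map each bin's lower edge to the mean response in that bin.
    return {k * width: sum(v) / len(v) for k, v in sorted(groups.items())}


# Invented toy data: years of education and income (in $1000s).
education = [8, 9, 12, 12, 13, 16, 17, 18]
income = [20, 24, 30, 34, 38, 50, 58, 66]

means = binned_means(education, income, width=4)
for lower_edge, mean_income in means.items():
    print(f"education in [{lower_edge}, {lower_edge + 4}): mean income {mean_income:.1f}")
```

Nothing in this calculation assumes that the conditional averages fall on a straight line; whether they do is something to be checked against the data.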
More difficult chapters and sections are marked with asterisks. These parts of the text can be
omitted without loss of continuity, but they provide greater understanding and depth, along with
coverage of some topics that depend on more extensive mathematical or statistical background.
I do not, however, wish to exaggerate the background that is required for this “more difficult’’
material: All that is necessary is some exposure to matrices, elementary linear algebra, elementary
differential calculus, and some basic ideas from probability and mathematical statistics. Appen-
dices to the text provide the background required for understanding the more advanced material.
All chapters include summary information in boxes interspersed with the text and at the end
of the chapter, and most conclude with recommendations for additional reading. You will find
theoretically focused exercises at the end of most chapters, some extending the material in the
text. More difficult, and occasionally challenging, exercises are marked with asterisks. In addi-
tion, data-analytic exercises for each chapter are available on the website for the book, along
with the associated data sets.
Synopsis
Chapter 1 discusses the role of statistical data analysis in social science, expressing the
point of view that statistical models are essentially descriptive, not direct (if abstract)
representations of social processes. This perspective provides the foundation for the
data-analytic focus of the text.
Chapter 2 introduces the notion of regression analysis as tracing the conditional distribution
of a response variable as a function of one or several explanatory variables. This idea is ini-
tially explored “nonparametrically,’’ in the absence of a restrictive statistical model for the
data (a topic developed more extensively in Chapter 18).
Chapter 3 describes a variety of graphical tools for examining data. These methods are use-
ful both as a preliminary to statistical modeling and to assist in the diagnostic checking of a
model that has been fit to data (as discussed, e.g., in Part III).
Chapter 5 discusses linear least-squares regression. Linear regression is the prototypical lin-
ear model, and its direct extension is the subject of Chapters 7 to 10.
Chapter 6, on statistical inference in regression, develops tools for testing hypotheses and
constructing confidence intervals that apply generally to linear models. This chapter also
introduces the basic methodological distinction between empirical and structural relation-
ships—a distinction central to understanding causal inference in nonexperimental research.
Chapter 7 shows how “dummy variables’’ can be employed to extend the regression model
to qualitative explanatory variables (or “factors’’). Interactions among explanatory variables
are introduced in this context.
Chapter 8, on analysis of variance models, deals with linear models in which all the explan-
atory variables are factors.
Chapter 9* develops the statistical theory of linear models, providing the foundation for
much of the material in Chapters 5 to 8 along with some additional, and more general, results.
This chapter also includes an introduction to instrumental-variables estimation and two-stage
least squares.
Chapter 10* applies vector geometry to linear models, allowing us literally to visualize the
structure and properties of these models. Many topics are revisited from the geometric per-
spective, and central concepts—such as “degrees of freedom’’ —are given a natural and com-
pelling interpretation.
1 I believe that it was Michael Friendly of York University who introduced me to the term data craft, a term that aptly
characterizes the content of this section and, indeed, of the book more generally.
Chapter 11 deals with the detection of unusual and influential data in linear models.
Chapter 12 describes methods for diagnosing a variety of problems, including non-normally
distributed errors, nonconstant error variance, and nonlinearity. Some more advanced mate-
rial in this chapter shows how the method of maximum likelihood can be employed for select-
ing transformations.
Chapter 13 takes up the problem of collinearity—the difficulties for estimation that ensue
when the explanatory variables in a linear model are highly correlated.
Chapter 14 takes up linear-like logit and probit models for qualitative and ordinal categorical
response variables. This is an important topic because of the ubiquity of categorical data in
the social sciences (and elsewhere).
Chapter 15 describes the generalized linear model, showing how it encompasses linear, logit,
and probit models along with statistical models (such as Poisson and gamma regression mod-
els) not previously encountered in the text. The chapter includes a treatment of diagnostic
methods for generalized linear models, extending much of the material in Part III, and ends
with an introduction to inference for linear and generalized linear models in complex survey
samples.
Chapter 16 describes time-series regression, where the observations are ordered in time
and hence cannot usually be treated as statistically independent. The chapter introduces the
method of generalized least squares, which can take account of serially correlated errors in
regression.
Chapter 17 takes up nonlinear regression models, showing how some nonlinear models can be
fit by linear least squares after transforming the model to linearity, while other, fundamentally
nonlinear, models require the method of nonlinear least squares. The chapter includes treat-
ments of polynomial regression and regression splines, the latter closely related to the topic of
the subsequent chapter.
Chapter 18 introduces nonparametric regression analysis, which traces the dependence of the
response on the explanatory variables in a regression without assuming a particular functional
form for their relationship. This chapter contains a discussion of generalized nonparametric
regression, including generalized additive models.
Chapter 19 describes methods of robust regression analysis, which are capable of automati-
cally discounting unusual data.
Chapter 20 discusses missing data, explaining the potential pitfalls lurking in common
approaches to missing data, such as complete-case analysis, and describing more sophisticated
methods, such as multiple imputation of missing values. This is an important topic because
social science data sets are often characterized by a large proportion of missing data.
Chapter 21 introduces the “bootstrap,’’ a computationally intensive simulation method for
constructing confidence intervals and hypothesis tests. In its most common nonparametric
form, the bootstrap does not make strong distributional assumptions about the data, and it
can be made to reflect the manner in which the data were collected (e.g., in complex survey
sampling designs).
Chapter 22 describes methods for model selection, model averaging in the face of model
uncertainty, and model validation. Automatic methods of model selection and model averag-
ing, I argue, are most useful when a statistical model is to be employed for prediction, less
so when the emphasis is on interpretation. Validation is a simple method for drawing honest
statistical inferences when—as is commonly the case—the data are employed both to select a
statistical model and to estimate its parameters.
Chapter 23 introduces linear mixed-effects models and describes the fundamental issues that
arise in the analysis of clustered data through models that incorporate random effects. Illus-
trative applications include both hierarchical and longitudinal data.
Chapter 24 describes generalized linear mixed-effects models for non-normally distributed
response variables, such as logistic regression for a dichotomous response, and Poisson and
related regression models for count data. The chapter also introduces nonlinear mixed-effects
models for fitting fundamentally nonlinear equations to clustered data.
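As a rough illustration of the nonparametric bootstrap mentioned in the synopsis of Chapter 21, the following Python sketch resamples a small data set with replacement and reads a percentile confidence interval off the resampled statistics. It is a minimal sketch under stated assumptions: the function name `bootstrap_percentile_ci`, the number of replications, and the toy data are all invented for the example, and the simple percentile interval shown here is only one of several bootstrap intervals.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data), resampling cases with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Recompute the statistic on each bootstrap sample of the same size n.
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lower = boot_stats[round((alpha / 2) * reps)]
    upper = boot_stats[round((1 - alpha / 2) * reps) - 1]
    return lower, upper


# Invented toy sample (e.g., incomes in $1000s).
sample = [12, 15, 9, 22, 30, 18, 25, 11, 40, 16]

lower, upper = bootstrap_percentile_ci(sample, statistics.mean)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"approximate 95% percentile interval: ({lower:.2f}, {upper:.2f})")
```

The resampling step treats the observations as an independent random sample; a complex survey design would call for a resampling scheme that mirrors how the data were collected.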
Appendices
Several appendices provide background, principally—but not exclusively—for the starred
portions of the text. With the exception of Appendix A, which is printed at the back of the book,
all the appendices are on the website for the book.
Computing
Nearly all the examples in this text employ real data from the social sciences, many of them
previously analyzed and published. The online exercises that involve data analysis also almost
all use real data drawn from various areas of application. I encourage readers to analyze their
own data as well.
The data sets for examples and exercises can be downloaded free of charge via the World
Wide Web; point your web browser at www.sagepub.com/fox3e. Appendices and exercises are
distributed as portable document format (PDF) files.
I occasionally comment in passing on computational matters, but the book generally ignores
the finer points of statistical computing in favor of methods that are computationally simple. I
feel that this approach facilitates learning. Thus, for example, linear least-squares coefficients
are obtained by solving the normal equations formed from sums of squares and products of the
variables rather than by a more numerically stable method. Once basic techniques are absorbed,
the data analyst has recourse to carefully designed programs for statistical computations.
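To make the computational point concrete, here is a minimal Python sketch of simple-regression coefficients obtained by solving the normal equations formed from sums of squares and products. The function name `normal_equations_fit` and the toy data are assumptions made for illustration. Carefully designed statistical software commonly avoids this route in favor of an orthogonal (for example, QR) decomposition, which is less sensitive to rounding error; the version below favors transparency over numerical stability.

```python
def normal_equations_fit(x, y):
    """Intercept and slope of a simple regression from the 2 x 2 normal equations."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # Normal equations in the unknowns a (intercept) and b (slope):
    #   n * a     + sum_x  * b = sum_y
    #   sum_x * a + sum_xx * b = sum_xy
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("x has no variation; the slope is not identified")
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    return a, b


# Toy data constructed to lie exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = normal_equations_fit(x, y)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
```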
I think that it is a mistake to tie a general discussion of linear and related statistical models
too closely to particular software. Any reasonably capable statistical software will do almost
everything described in this book. My current personal choice of statistical software, both for
research and for teaching, is R—a free, open-source implementation of the S statistical pro-
gramming language and computing environment (Ihaka & Gentleman, 1996; R Core Team,
2014). R is now the dominant statistical software among statisticians; it is used increasingly
in the social sciences but is by no means dominant there. I have coauthored a separate book
(Fox & Weisberg, 2011) that provides a general introduction to R and that describes its use in
applied regression analysis.
graded. I distribute answers after the homework is collected. There are midterm and final take-
home exams (after the review classes), also focused on data analysis.
• I used the material in the predecessors of Chapters 1 to 15 and the several appendices for
a two-semester course for social science graduate students (at York University in Toronto)
with similar statistical preparation. For this second, more intensive, course, background
topics (such as linear algebra) were introduced as required and constituted about one fifth
of the course. The organization of the course was similar to the first one.
Both courses include some treatment of statistical computing, with more information on pro-
gramming in the second course. For students with the requisite mathematical and statistical
background, it should be possible to cover almost all the text in a reasonably paced two-semester
course.
In learning statistics, it is important for the reader to participate actively, both by working
through the arguments presented in the book and, even more important, by applying methods
to data. Statistical data analysis is a craft, and, like any craft, developing proficiency requires
effort and practice. Reworking examples is a good place to start, and I have presented illustra-
tions in such a manner as to facilitate reanalysis and further analysis of the data.
Where possible, I have relegated formal “proofs’’ and derivations to exercises, which never-
theless typically provide some guidance to the reader. I believe that this type of material is best
learned constructively. As well, including too much algebraic detail in the body of the text invites
readers to lose the statistical forest for the mathematical trees. You can decide for yourself (or
your students) whether or not to work the theoretical exercises. It is my experience that some
people feel that the process of working through derivations cements their understanding of the
statistical material, while others find this activity tedious and pointless. Some of the theoretical
exercises, marked with asterisks, are comparatively difficult. (Difficulty is assessed relative to
the material in the text, so the threshold is higher in starred sections and chapters.)
In preparing the data-analytic exercises, I have tried to find data sets of some intrinsic inter-
est that embody a variety of characteristics. In many instances, I try to supply some direction
in the data-analytic exercises, but—like all real-data analysis—these exercises are fundamen-
tally open-ended. It is therefore important for instructors to set aside time to discuss data-ana-
lytic exercises in class, both before and after students tackle them. Although students often miss
important features of the data in their initial analyses, this experience—properly approached and
integrated—is an unavoidable part of learning the craft of data analysis.
A few exercises, marked with pound signs (#), are meant for “hand” computation. Hand
computation (i.e., with a calculator) is tedious, and is practical only for unrealistically small
problems, but it sometimes serves to make statistical procedures more concrete (and increases our
admiration for our pre-computer-era predecessors). Similarly, despite the emphasis in the text
on analyzing real data, a small number of exercises generate simulated data to clarify certain
properties of statistical methods.
I struggled with the placement of cross-references to exercises and to other parts of the text,
trying brackets [too distracting!], marginal boxes (too imprecise), and finally settling on tradi-
tional footnotes.2 I suggest that you ignore both the cross-references and the other footnotes on
first reading of the text.3
Finally, a word about style: I try to use the first person singular (“I”) when I express
opinions. “We” is reserved for you (the reader) and me.
2 Footnotes are a bit awkward, but you don’t have to read them.
3 Footnotes other than cross-references generally develop small points and elaborations.
Acknowledgments
Many individuals have helped me in the preparation of this book.
I am grateful to the York University Statistical Consulting Service study group, which read,
commented on, and corrected errors in the manuscript, both of the previous edition of the book
and of the new section on mixed-effects models introduced in this edition.
A number of friends and colleagues donated their data for illustrations and exercises—
implicitly subjecting their research to scrutiny and criticism.
Several individuals contributed to this book by making helpful comments on it and its prede-
cessors (Fox, 1984, 1997, 2008): Patricia Ahmed, University of Kentucky; Robert Andersen; A.
Alexander Beaujean, Baylor University; Ken Bollen; John Brehm, University of Chicago; Gene
Denzel; Shirley Dowdy; Michael Friendly; E. C. Hedberg, NORC at the University of Chicago; Paul
Herzberg; Paul Johnston; Michael S. Lynch, University of Georgia; Vida Maralani, Yale Univer-
sity; William Mason; Georges Monette; A. M. Parkhurst, University of Nebraska-Lincoln; Doug
Rivers; Paul D. Sampson, University of Washington; Corey S. Sparks, The University of Texas at
San Antonio; Robert Stine; and Sanford Weisberg. I am also in debt to Paul Johnson’s students at
the University of Kansas, to William Mason’s students at UCLA, to Georges Monette’s students
at York University, to participants at the Inter-University Consortium for Political and Social
Research Summer Program in Robert Andersen’s advanced regression course, and to my
students at McMaster University, all of whom were exposed to various versions of the second
edition of this text prior to publication and who improved the book through their criticism,
suggestions, and—occasionally—informative incomprehension.
Edward Ng capably assisted in the preparation of some of the figures that appear in the book.
C. Deborah Laughton, Lisa Cuevas, Sean Connelly, and—most recently—Vicki Knight,
my editors at Sage Publications, were patient and supportive throughout the several years that
I worked on the various editions of the book.
I have been very lucky to have colleagues and collaborators who have been a constant source
of ideas and inspiration—in particular, Michael Friendly and Georges Monette at York Univer-
sity in Toronto and Sanford Weisberg at the University of Minnesota. I am sure that they will
recognize their influence on this book. I owe a special debt to Georges Monette for his contri-
butions, both direct and indirect, to the new chapters on mixed-effects models in this edition.
Georges generously shared his materials on mixed-effects models with me, and I have benefited
from his insights on the subject (and others) over a period of many years.
Finally, a number of readers have contributed corrections to earlier editions of the text, and
I thank them individually in the posted errata to these editions. Paul Laumans deserves partic-
ular mention for his assiduous pursuit of typographical errors. No doubt I’ll have occasion in
due course to thank readers for corrections to the current edition.
If, after all this help and the opportunity to prepare a new edition of the book, deficiencies
remain, then I alone am at fault.
John Fox
Toronto, Canada
August 2014
About the Author
1. Statistical Models and Social Science
The social world is exquisitely complex and rich. From the improbable moment of birth,
each of our lives is governed by chance and contingency. The statistical models typically
used to analyze social data—and, in particular, the models considered in this book—are, in
contrast, ludicrously simple. How can simple statistical models help us to understand a com-
plex social reality? As the statistician George Box famously remarked (e.g., in Box, 1979),
‘‘All models are wrong but some are useful’’ (p. 202). Can statistical models be useful in the
social sciences?
This is a book on data analysis and statistics, not on the philosophy of the social sciences. I
will, therefore, address this question, and related issues, very briefly here. Nevertheless, I feel
that it is useful to begin with a consideration of the role of data analysis in the larger process of
social research. You need not agree with the point of view that I express in this chapter to
make productive use of the statistical tools presented in the remainder of the book, but the
emphasis and specific choice of methods in the text partly reflect the ideas in this chapter. You
may wish to reread this material after you study the methods described in the sequel.
Each of these precarious occurrences clearly affected my income, as have other events—
some significant, some small—too numerous and too tedious to mention, even if I were aware
of them all. If, for some perverse reason, you were truly interested in my income (and, perhaps,
in other matters more private), you could study my biography and through that study arrive at
a detailed (if inevitably incomplete) understanding. It is clearly impossible, however, to pursue
this strategy for many individuals or, more to the point, for individuals in general.
Nor is an understanding of income in general inconsequential, because income inequality is
an (increasingly, as it turns out) important feature of our society. If such an understanding
hinges on a literal description of the process by which each of us receives an income, then the
enterprise is clearly hopeless. We might, alternatively, try to capture significant features of the
process in general without attempting to predict the outcome for specific individuals. One
could draw formal analogies (largely unproductively, I expect, although some have tried) to
chaotic physical processes, such as the determination of weather and earthquakes.
Concrete mathematical theories purporting to describe social processes sometimes appear in
the social sciences (e.g., in economics and in some areas of psychology), but they are relatively
rare.1 If a theory, like Newton’s laws of motion, is mathematically concrete, then, to be sure,
there are difficulties in applying and testing it; but, with some ingenuity, experiments and
observations can be devised to estimate the free parameters of the theory (a gravitational con-
stant, for example) and to assess the fit of the theory to the resulting data.
In the social sciences, verbal theories abound. These social theories tend to be vague, ellipti-
cal, and highly qualified. Often, they are, at least partially, a codification of ‘‘common sense.’’
I believe that vague social theories are potentially useful abstractions for understanding an
intrinsically complex social reality, but how can such theories be linked empirically to that
reality?
A vague social theory may lead us to expect, for example, that racial prejudice is the partial
consequence of an ‘‘authoritarian personality,’’ which, in turn, is a product of rigid childrear-
ing. Each of these terms requires elaboration and procedures of assessment or measurement.
Other social theories may lead us to expect that higher levels of education should be associated
with higher levels of income, perhaps because the value of labor power is enhanced by train-
ing, because occupations requiring higher levels of education are of greater functional impor-
tance, because those with higher levels of education are in relatively short supply, or because
people with high educational attainment are more capable in the first place. In any event, we
need to consider how to assess income and education, how to examine their relationship, and
what other variables need to be included.2
Statistical models of the type considered in this book are grossly simplified descriptions of
complex social reality. Imagine that we have data from a social survey of a large sample of
employed individuals. Imagine further, anticipating the statistical methods described in subse-
quent chapters, that we regress these individuals’ income on a variety of putatively relevant
characteristics, such as their level of education, gender, race, region of residence, and so on.
We recognize that a model of this sort will fail to account perfectly for individuals’ incomes,
so our model includes a ‘‘residual,’’ meant to capture the component of income unaccounted
for by the systematic part of the model, which incorporates the “effects” on income of
education, gender, and so forth.
1 The methods for fitting nonlinear models described in Chapter 17 are sometimes appropriate to the rare theories in
social science that are mathematically concrete.
2 See Section 1.2.
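The split of each observed income into a systematic part and a residual can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not an analysis from the book: the data are invented, only one explanatory variable (education) is used rather than the several mentioned above, and the helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented toy data: years of education and income (in $1000s).
education = [8, 10, 12, 12, 14, 16, 16, 18]
income = [21, 30, 28, 41, 39, 62, 45, 70]

intercept, slope = fit_line(education, income)

# Systematic part of the model: the fitted value for each individual.
fitted = [intercept + slope * e for e in education]
# Residual: the component of income not accounted for by the systematic part.
residuals = [y - f for y, f in zip(income, fitted)]

for e, y, f, r in zip(education, income, fitted, residuals):
    print(f"education={e:2d}  income={y:3d}  fitted={f:6.2f}  residual={r:6.2f}")
```

By construction the least-squares residuals sum to zero and are uncorrelated with education in the sample; that is a property of the fitting method, not evidence that the model describes how any individual's income came about.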
The residuals for our model are likely very large. Even if the residuals were small, however,
we would still need to consider the relationships among our social ‘‘theory,’’ the statistical
model that we have fit to the data, and the social ‘‘reality’’ that we seek to understand. Social
reality, along with our methods of observation, produces the data; our theory aims to explain
the data, and the model to describe them. That, I think, is the key point: Statistical models are
almost always fundamentally descriptive.
I believe that a statistical model cannot, and is not literally meant, to capture the social pro-
cess by which incomes are ‘‘determined.’’ As I argued above, individuals receive their incomes
as a result of their almost unimaginably complex personal histories. No regression model, not
even one including a residual, can reproduce this process: It is not as if my income is partly
determined by my education, gender, race, and so on, and partly by the detailed trajectory of
my life. It is, therefore, not sensible, at the level of real social processes, to relegate chance and
contingency to a random term that is simply added to the systematic part of a statistical model.
The unfortunate tendency to reify statistical models—to forget that they are descriptive summa-
ries, not literal accounts of social processes—can only serve to discredit quantitative data anal-
ysis in the social sciences.
Nevertheless, and despite the rich chaos of individuals’ lives, social theories imply a struc-
ture to income inequality. Statistical models are capable of capturing and describing that struc-
ture or at least significant aspects of it. Moreover, social research is often motivated by
questions rather than by hypotheses: Has income inequality between men and women changed
recently? Is there a relationship between public concern over crime and the level of crime?
Data analysis can help to answer these questions, which frequently are of practical—as well as
theoretical—concern. Finally, if we proceed carefully, data analysis can assist us in the discov-
ery of social facts that initially escape our hypotheses and questions.
It is, in my view, a paradox that the statistical models that are at the heart of most modern
quantitative social science are at once taken too seriously and not seriously enough by many
practitioners of social science. On one hand, social scientists write about simple statistical mod-
els as if they were direct representations of the social processes that they purport to describe. On
the other hand, there is frequently a failure to attend to the descriptive accuracy of these models.
As a shorthand, reference to the “effect” of education on income is innocuous. That the
shorthand often comes to dominate the interpretation of statistical models is reflected, for
example, in much of the social science literature that employs structural-equation models (once
commonly termed “causal models,” a usage that has thankfully declined). There is, I believe,
a valid sense in which income is “affected” by education, because the complex real process by
which individuals’ incomes are determined is partly conditioned by their levels of education,
but—as I have argued above—one should not mistake the model for the process.[3]
[3] There is the danger here of simply substituting one term (“conditioned by”) for another (“affected by”), but the point
is deeper than that: Education affects income because the choices and constraints that partly structure individuals’ lives
change systematically with their level of education. Many highly paid occupations in our society are closed to individuals
who lack a university education, for example. To recognize this fact, and to examine its descriptive reflection in a
statistical summary, is different from claiming that a university education literally adds an increment to individuals’
incomes.
Although statistical models are very simple in comparison to social reality, they typically
incorporate strong claims about the descriptive pattern of data. These claims rarely reflect the
substantive social theories, hypotheses, or questions that motivate the use of the statistical models,
and they are very often wrong. For example, it is common in social research to assume a
priori, and without reflection, that the relationship between two variables, such as income and
education, is linear. Now, we may well have good reason to believe that income tends to be
higher at higher levels of education, but there is no reason to suppose that this relationship is
linear. Our practice of data analysis should reflect our ignorance as well as our knowledge.
A statistical model is of no practical use if it is an inaccurate description of the data, and we
will, therefore, pay close attention to the descriptive accuracy of statistical models. Unhappily,
the converse is not true, for a statistical model may be descriptively accurate but of little practi-
cal use; it may even be descriptively accurate but substantively misleading. We will explore
these issues briefly in the next two sections, which tie the interpretation of statistical models to
the manner in which data are collected.
With few exceptions, statistical data analysis describes the outcomes of real social pro-
cesses and not the processes themselves. It is therefore important to attend to the descrip-
tive accuracy of statistical models and to refrain from reifying them.
                 Leave Granted? (%)
Judge            Yes     No     Total (%)    Number of cases
Pratte             9     91        100              57
Linden             9     91        100              32
Stone             12     88        100              43
Iacobucci         12     88        100              33
Décary            20     80        100              80
Hugessen          26     74        100              65
Urie              29     71        100              21
MacGuigan         30     70        100              90
Heald             30     70        100              46
Mahoney           34     66        100              44
Marceau           36     64        100              50
Desjardins        49     51        100              47
All judges        25     75        100             608
SOURCE: Adapted from Table 1 in Greene and Shaffer, “Leave to Appeal
and Leave to Commence Judicial Review in Canada’s Refugee-Determination
System: Is the Process Fair?” International Journal of Refugee
Law, 1992, Vol. 4, No. 1, p. 77, by permission of Oxford University Press.
the judges reflect differences in their propensities to grant leave to appeal.[4] The cases were,
however, assigned to the judges not randomly but on a rotating basis, with a single judge hear-
ing all of the cases that arrived at the court in a particular week. In defending the current refu-
gee determination process, expert witnesses for the Crown argued that the observed differences
among the judges might therefore be due to characteristics that systematically differentiated the
cases that different judges happened to hear.
It is possible, in practice, to “control” statistically for such extraneous “confounding” variables
as may explicitly be identified, but it is not, in principle, possible to control for all relevant
explanatory variables, because we can never be certain that all relevant variables have
been identified.[5] Nevertheless, I would argue, the data in Table 1.1 establish a prima facie case
for systematic differences in the judges’ propensities to grant leave to appeal to refugee clai-
mants. Careful researchers control statistically for potentially relevant variables that they can
identify; cogent critics demonstrate that an omitted confounding variable accounts for the
observed association between judges and decisions or at least argue persuasively that a specific
omitted variable may be responsible for this association—they do not simply maintain the
abstract possibility that such a variable may exist.
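As a rough illustration of what a prima facie case might look like numerically, the percentages in Table 1.1 can be converted back to approximate counts and compared with what a single common grant rate would produce. The Python sketch below is illustrative only. It assumes that the first percentage column of the table is the "leave granted" column, the counts are reconstructed from rounded percentages (so they approximate, and are not, the original tallies), and 19.675 is the standard 5% critical value of chi-square on 11 degrees of freedom.

```python
# Percent granted leave and number of cases per judge, transcribed from Table 1.1.
# Assumption: the first percentage column is "leave granted".
judges = {
    "Pratte": (9, 57), "Linden": (9, 32), "Stone": (12, 43),
    "Iacobucci": (12, 33), "Décary": (20, 80), "Hugessen": (26, 65),
    "Urie": (29, 21), "MacGuigan": (30, 90), "Heald": (30, 46),
    "Mahoney": (34, 44), "Marceau": (36, 50), "Desjardins": (49, 47),
}

# Approximate counts: the table reports rounded percentages, so these are
# reconstructions rather than the original tallies.
granted = {j: round(p * n / 100) for j, (p, n) in judges.items()}
total_cases = sum(n for _, n in judges.values())
overall_rate = sum(granted.values()) / total_cases

# Pearson chi-square statistic for the 12 x 2 table (judge by decision).
chi_square = 0.0
for judge, (_, n) in judges.items():
    cells = ((granted[judge], overall_rate),
             (n - granted[judge], 1 - overall_rate))
    for observed, rate in cells:
        expected = n * rate
        chi_square += (observed - expected) ** 2 / expected

df = len(judges) - 1          # (12 - 1) * (2 - 1)
CRITICAL_5_PERCENT = 19.675   # chi-square critical value, 11 df, alpha = .05

print(f"overall grant rate: {overall_rate:.3f}")
print(f"chi-square = {chi_square:.1f} on {df} df")
print("exceeds 5% critical value:", chi_square > CRITICAL_5_PERCENT)
```

A statistic above the critical value suggests only that the differences among judges are larger than chance alone would readily produce. It does not address the Crown's objection that the cases heard by different judges may have differed systematically.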
[4] Even so, this inference is not reasonably construed as a representation of the cognitive process by which judges arrive
at their determinations. Following the argument in the previous section, it is unlikely that we could ever trace out that
process in detail; it is quite possible, for example, that a specific judge would make different decisions faced with the
same case on different occasions.
[5] See the further discussion of the refugee data in Section 22.3.1.
1. The omitted variable must influence the response. For example, if the gender of the
refugee applicant has no impact on the judges’ decisions, then it is irrelevant to control
statistically for gender.
2. The omitted variable must be related as well to the explanatory variable that is the focus
of the research. Even if the judges’ decisions are influenced by the gender of the appli-
cants, the relationship between outcome and judge will be unchanged by controlling for
gender (e.g., by looking separately at male and female applicants) unless the gender of
the applicants is also related to judges—that is, unless the different judges heard cases
with substantially different proportions of male and female applicants.
The strength of randomized experimentation derives from the second point: If cases were ran-
domly assigned to judges, then there would be no systematic tendency for them to hear cases
with differing proportions of men and women—or, for that matter, with systematic differences
of any kind.
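The two conditions can be checked with simple arithmetic. The following Python sketch uses entirely hypothetical grant rates and case mixes for two judges, A and B (none of these numbers come from Table 1.1). In every scenario judge B is 0.10 more likely than judge A to grant leave within each gender; the question is whether the overall difference between the judges, ignoring gender, still equals 0.10.

```python
def marginal_rate(rate_by_gender, share_women):
    """A judge's overall grant rate, averaging over the mix of applicants."""
    return (share_women * rate_by_gender["women"]
            + (1 - share_women) * rate_by_gender["men"])


def judge_gap(rates, share_women):
    """Difference (judge B minus judge A) in overall grant rates."""
    return (marginal_rate(rates["B"], share_women["B"])
            - marginal_rate(rates["A"], share_women["A"]))


# Hypothetical rates: gender influences the decision, and judge B is 0.10
# more lenient than judge A for women and for men alike.
gender_matters = {"A": {"women": 0.40, "men": 0.20},
                  "B": {"women": 0.50, "men": 0.30}}
# Hypothetical rates: gender has no influence on the decision (condition 1 fails).
gender_irrelevant = {"A": {"women": 0.30, "men": 0.30},
                     "B": {"women": 0.40, "men": 0.40}}

same_mix = {"A": 0.5, "B": 0.5}       # gender unrelated to judge (condition 2 fails)
different_mix = {"A": 0.2, "B": 0.8}  # gender related to judge

print(round(judge_gap(gender_irrelevant, different_mix), 2))  # 0.1
print(round(judge_gap(gender_matters, same_mix), 2))          # 0.1
print(round(judge_gap(gender_matters, different_mix), 2))     # 0.22
```

Only when both conditions hold does the uncontrolled comparison (0.22) depart from the within-gender difference (0.10); when either condition fails, controlling for gender leaves the comparison between judges unchanged.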
It is, however, misleading to conclude that causal inferences are completely unambiguous in
experimental research, even within the bounds of statistical uncertainty (expressed, for exam-
ple, in the p-value of a statistical test). Although we can unambiguously ascribe an observed
difference to an experimental manipulation, we cannot unambiguously identify that manipula-
tion with the explanatory variable that is the focus of our research.
In a randomized drug study, for example, in which patients are prescribed a new drug or an
inactive placebo, we may establish with virtual certainty that there was greater average
improvement among those receiving the drug, but we cannot be sure that this difference is due
(or solely due) to the putative active ingredient in the drug. Perhaps the experimenters inadver-
tently conveyed their enthusiasm for the drug to the patients who received it, influencing the
patients’ responses, or perhaps the bitter taste of the drug subconsciously convinced these
patients of its potency.
Experimenters try to rule out alternative interpretations of this kind by following careful
experimental practices, such as “double-blind” delivery of treatments (neither the subject nor
the experimenter knows whether the subject is administered the drug or the placebo), and by
holding constant potentially influential characteristics deemed to be extraneous to the research
(the taste, color, shape, etc., of the drug and placebo are carefully matched). One can never be
certain, however, that all relevant variables are held constant in this manner. Although the
degree of certainty achieved is typically much greater in a randomized experiment than in an
observational study, the distinction is less clear-cut than it at first appears.
Causal inferences are most certain—if not completely definitive—in randomized experi-
ments, but observational data can also be reasonably marshaled as evidence of causation.
Good experimental practice seeks to avoid confounding experimentally manipulated expla-
natory variables with other variables that can influence the response variable. Sound analy-
sis of observational data seeks to control statistically for potentially confounding variables.
[6] These points are developed more formally in Sections 6.3 and 9.7.
1.2 Observation and Experiment 7
Figure 1.1 Simple “causal model” relating education, income, and prestige of occupations.
Education is a common prior cause of both income and prestige; income intervenes
causally between education and prestige. [Diagram not reproduced; its nodes are Education, Income, and Prestige.]
In subsequent chapters, we will have occasion to examine observational data on the prestige,
educational level, and income level of occupations. It will materialize that occupations with
higher levels of education tend to have higher prestige and that occupations with higher levels
of income also tend to have higher prestige. The income and educational levels of occupations
are themselves positively related. As a consequence, when education is controlled statistically,
the relationship between prestige and income grows smaller; likewise, when income is con-
trolled, the relationship between prestige and education grows smaller. In neither case, how-
ever, does the relationship disappear.
How are we to understand the pattern of statistical associations among the three variables? It
is helpful in this context to entertain an informal “causal model” for the data, as in Figure 1.1.
That is, the educational level of occupations influences (potentially) both their income level
and their prestige, while income potentially influences prestige. The association between prestige
and income is “spurious” (i.e., not causal) to the degree that it is a consequence of the
mutual dependence of these two variables on education; the reduction in this association when
education is controlled represents the removal of the spurious component. In contrast, the causal
relationship between education and prestige is partly mediated by the “intervening variable”
income; the reduction in this association when income is controlled represents the
articulation of an “indirect” effect of education on prestige (i.e., through income).
In the former case, we partly explain away the association between income and prestige:
Part of the relationship is “really” due to education. In the latter case, we partly explain the
association between education and prestige: Part of the relationship is mediated by income.
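The pattern described here can be reproduced with simulated data. The Python sketch below generates artificial "occupations" from a causal structure like that of Figure 1.1, using hypothetical coefficients (0.6, 0.5, and 0.4) and a fixed random seed that are assumptions of the example, not estimates from real data. Statistical control is carried out by residualizing on the control variable, which gives the same slope as including that variable in a multiple regression.

```python
import random


def slope(x, y):
    """Least-squares slope from the regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from the least-squares regression of y on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def partial_slope(x, y, control):
    """Slope of y on x with `control` held constant (residualize both on it)."""
    return slope(residuals(control, x), residuals(control, y))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
# Hypothetical structure: education -> income, education -> prestige,
# and income -> prestige.
education = [rng.gauss(0, 1) for _ in range(n)]
income = [0.6 * e + rng.gauss(0, 1) for e in education]
prestige = [0.5 * e + 0.4 * i + rng.gauss(0, 1)
            for e, i in zip(education, income)]

simple_income = slope(income, prestige)
partial_income = partial_slope(income, prestige, control=education)
simple_education = slope(education, prestige)
partial_education = partial_slope(education, prestige, control=income)

print(f"prestige on income:    {simple_income:.2f} -> {partial_income:.2f} "
      "controlling for education")
print(f"prestige on education: {simple_education:.2f} -> {partial_education:.2f} "
      "controlling for income")
```

Under these assumptions both slopes shrink when the other variable is controlled, and neither falls to zero. The first reduction removes a spurious component; the second sets aside the indirect effect of education that operates through income.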
Causal interpretation of observational data is always risky, especially—as here—when the data
are cross-sectional (i.e., collected at one point in time) rather than longitudinal (where the data
are collected over time). Nevertheless, it is usually impossible, impractical, or immoral to collect
experimental data in the social sciences, and longitudinal data are often hard to come by.[7]
Moreover, the essential difficulty of causal interpretation in nonexperimental investigations—
due to potentially confounding variables that are left uncontrolled—applies to longitudinal as
well as to cross-sectional observational data.[8]
The notion of “cause” and its relationship to statistical data analysis are notoriously difficult
ideas. A relatively strict view requires an experimentally manipulable explanatory variable, at
least one that is manipulable in principle.[9] This is a particularly sticky point because, in social
science, many explanatory variables are intrinsically not subject to direct manipulation, even in
principle. Thus, for example, according to the strict view, gender cannot be considered a cause
of income, even if it can be shown (perhaps after controlling for other determinants of income)
that men and women systematically differ in their incomes, because an individual’s gender cannot
be changed.[10]
I believe that treating nonmanipulable explanatory variables, such as gender, as potential
causes is, at the very least, a useful shorthand. Men earn higher incomes than women because
women are (by one account) concentrated into lower paying jobs, work fewer hours, are
directly discriminated against, and so on (see, e.g., Ornstein, 1983). Explanations of this sort
are perfectly reasonable and are subject to statistical examination; the sense of “cause” here
may be weaker than the narrow one, but it is nevertheless useful.
[7] Experiments with human beings also frequently distort the processes that they purport to study: Although it might well
be possible, for example, to recruit judges to an experimental study of judicial decision making, the artificiality of the
situation could easily affect their simulated decisions. Even if the study entailed real judicial judgments, the mere act of
observation might influence the judges’ decisions—they might become more careful, for example.
[8] We will take up the analysis of longitudinal data in Chapters 23 and 24 on mixed-effects models.
[9] For clear presentations of this point of view, see, for example, Holland (1986) and Berk (2004).
[10] This statement is, of course, arguable: There are historically many instances in which individuals have changed their
gender, for example by disguise, not to mention surgery. Despite some fuzziness, however, I believe that the essential
point—that some explanatory variables are not (normally) subject to manipulation—is valid. A more subtle point is that
in certain circumstances, we could imagine experimentally manipulating the apparent gender of an individual, for
example, on a job application.
1.3 Populations and Samples
population of random rearrangements of the subjects, even when these subjects are not sampled
from some larger population. If, for example, we find a highly “statistically significant” difference
between two experimental groups of subjects in a randomized experiment, then we can be
sure, with practical certainty, that the difference was due to the experimental manipulation. The
rub here is that our interest almost surely extends beyond this specific group of subjects to
some larger—often ill-defined—population.
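Inference to the population of random rearrangements can be made concrete with a small randomization (permutation) test. The Python sketch below uses made-up outcome scores for 12 subjects, 6 in each group; the scores and group sizes are hypothetical. Because the groups are small, every one of the possible reassignments of subjects to groups can be enumerated exactly rather than sampled.

```python
from itertools import combinations
from statistics import mean

# Hypothetical outcome scores for 12 subjects; not data from the text.
treatment = [12, 15, 14, 18, 16, 17]
control = [10, 11, 13, 9, 12, 14]

scores = treatment + control
observed = mean(treatment) - mean(control)

# Enumerate every way of assigning 6 of the 12 subjects to the first group.
extreme = 0
arrangements = 0
for chosen in combinations(range(len(scores)), len(treatment)):
    chosen_set = set(chosen)
    group_1 = [scores[i] for i in chosen]
    group_2 = [s for i, s in enumerate(scores) if i not in chosen_set]
    arrangements += 1
    # Small tolerance guards against floating-point ties being missed.
    if abs(mean(group_1) - mean(group_2)) >= abs(observed) - 1e-12:
        extreme += 1

p_value = extreme / arrangements
print(f"observed difference in means: {observed:.2f}")
print(f"{extreme} of {arrangements} rearrangements are at least as extreme")
print(f"two-sided randomization p-value: {p_value:.4f}")
```

The resulting p-value refers only to rearrangements of these 12 subjects. It says nothing, by itself, about any larger population from which they might have come.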
Even when subjects in an experimental or observational investigation are literally sampled at
random from a real population, we usually wish to generalize beyond that population. There
are exceptions—election polling comes immediately to mind—but our interest is seldom con-
fined to the population that is directly sampled. This point is perhaps clearest when no sam-
pling is involved—that is, when we have data on every individual in a real population.
Suppose, for example, that we examine data on population density and crime rates for all
large U.S. cities and find only a weak association between the two variables. Suppose further
that a standard test of statistical significance indicates that this association is so weak that it
easily could have been the product of “chance.”[11] Is there any sense in which this information
is interpretable? After all, we have before us data on the entire population of large U.S. cities
at a particular historical juncture.
Because our interest inheres not directly—at least not exclusively—in these specific cities
but in the complex social processes by which density and crime are determined, we can reason-
ably imagine a different outcome. Were we to replay history conceptually, we would not
observe precisely the same crime rates and population density statistics, dependent as these are
on a myriad of contingent and chancy events; indeed, if the ambit of our conceptual replay of
history is sufficiently broad, then the identities of the cities themselves might change.
(Imagine, for example, that Henry Hudson had not survived his trip to the New World or, if he
survived it, that the capital of the United States had remained in New York. Less momentously,
imagine that Fred Smith had not gotten drunk and killed a friend in a brawl, reducing the num-
ber of homicides in New York by one.) It is, in this context, reasonable to draw statistical infer-
ences to the process that produced the currently existing populations of cities. Similar
considerations arise in the analysis of historical statistics, for example, of time-series data.[12]
Much interesting data in the social sciences—and elsewhere—are collected haphazardly.
The data constitute neither a sample drawn at random from a larger population nor a coherently
defined population. Experimental randomization provides a basis for making statistical infer-
ences to the population of rearrangements of a haphazardly selected group of subjects, but that
is in itself cold comfort. For example, an educational experiment is conducted with students
recruited from a school that is conveniently available. We are interested in drawing conclusions
about the efficacy of teaching methods for students in general, however, not just for the stu-
dents who participated in the study.
Haphazard data are also employed in many observational studies—for example, volunteers
are recruited from among university students to study the association between eating disorders
and overexercise. Once more, our interest transcends this specific group of volunteers.
To rule out haphazardly collected data would be a terrible waste; it is, instead, prudent to be
careful and critical in the interpretation of the data. We should try, for example, to satisfy ourselves
that our haphazard group does not differ in presumably important ways from the larger
population of interest, or to control statistically for variables thought to be relevant to the phenomena
under study.
[11] Cf. the critical discussion of crime and population density in Freedman (1975).
[12] See Chapter 16 for a discussion of regression analysis with time-series data.
Statistical inference can speak to the internal stability of patterns in haphazardly collected
data and—most clearly in experimental data—to causation. Generalization from haphazardly
collected data to a broader population, however, is inherently a matter of judgment.
Randomization and good sampling design are desirable in social research, but they are
not prerequisites for drawing statistical inferences. Even when randomization or random
sampling is employed, we typically want to generalize beyond the strict bounds of statis-
tical inference.
Exercise
Exercise 1.1. Imagine that students in an introductory statistics course complete 20 assign-
ments during two semesters. Each assignment is worth 1% of a student’s final grade, and stu-
dents get credit for assignments that are turned in on time and that show reasonable effort. The
instructor of the course is interested in whether doing the homework contributes to learning,
and (anticipating material to be taken up in Chapters 5 and 6), she observes a linear, moder-
ately strong, and highly statistically significant relationship between the students’ grades on the
final exam in the course and the number of homework assignments that they completed. For
concreteness, imagine that for each additional assignment completed, the students’ grades on
average were 1.5 points higher (so that, e.g., students completing all of the assignments on average
scored 30 points higher on the exam than those who completed none of the assignments).
(a) Can this result be taken as evidence that completing homework assignments causes
higher grades on the final exam? Why or why not?
(b) Is it possible to design an experimental study that could provide more convincing evi-
dence that completing homework assignments causes higher exam grades? If not, why
not? If so, how might such an experiment be designed?
(c) Is it possible to marshal stronger observational evidence that completing homework
assignments causes higher exam grades? If not, why not? If so, how?
Summary
• With few exceptions, statistical data analysis describes the outcomes of real social processes
and not the processes themselves. It is therefore important to attend to the
descriptive accuracy of statistical models and to refrain from reifying them.
• Causal inferences are most certain—if not completely definitive—in randomized experiments,
but observational data can also be reasonably marshaled as evidence of causation.
Good experimental practice seeks to avoid confounding experimentally manipulated
explanatory variables with other variables that can influence the response variable.
Sound analysis of observational data seeks to control statistically for potentially confounding
variables.
• In analyzing observational data, it is important to distinguish between a variable that is a
common prior cause of an explanatory and response variable and a variable that intervenes
causally between the two.
• It is overly restrictive to limit the notion of statistical causation to explanatory variables
that are manipulated experimentally, to explanatory variables that are manipulable in
principle, or to data that are collected over time.
• Randomization and good sampling design are desirable in social research, but they are
not prerequisites for drawing statistical inferences. Even when randomization or random
sampling is employed, we typically want to generalize beyond the strict bounds of statistical
inference.
Recommended Reading
• Chance and contingency are recurrent themes in Stephen Gould’s fine essays on natural
history; see, in particular, Gould (1989). I believe that these themes are relevant to the
social sciences as well, and Gould’s work has strongly influenced the presentation in
Section 1.1.
• The legitimacy of causal inferences in nonexperimental research is and has been a hotly
debated topic. Sir R. A. Fisher, for example, famously argued in the 1950s that there
was no good evidence that smoking causes lung cancer, because the epidemiological
evidence for the relationship between the two was, at that time, based on observational
data (see, e.g., the review of Fisher’s work on lung cancer and smoking in Stolley,
1991). Perhaps the most vocal recent critic of the use of observational data was David
Freedman. See, for example, Freedman’s (1987) critique of structural-equation modeling
in the social sciences and the commentary that follows it.
• A great deal of recent work on causal inference in statistics has been motivated by
“Rubin’s causal model.” For a summary and many references, see Rubin (2004). A
very clear presentation of Rubin’s model, followed by interesting commentary, appears
in Holland (1986). Pearl (2009) develops a different account of causal inference from
nonexperimental data using directed graphs. For an accessible, book-length treatment of
these ideas, combining Rubin’s “counterfactual” approach with Pearl’s, see Morgan
and Winship (2007). Also see Murnane and Willett (2011), who focus their discussion
on research in education.
• Berk (2004) provides an extended, careful discussion, from a point of view different
from mine, of many of the issues raised in this chapter.
• The place of sampling and randomization in statistical investigations has also been
widely discussed and debated in the literature on research design. The classic presentation
of the issues in Campbell and Stanley (1963) is still worth reading, as is Kish
(1987). In statistics, these themes are reflected in the distinction between model-based
and design-based inference (see, e.g., Koch & Gillings, 1983) and in the notion of superpopulation
inference (see, e.g., Thompson, 1988).
• Achen (1982) argues eloquently for the descriptive interpretation of statistical models,
illustrating his argument with effective examples.
PART I
Data Craft
2 What Is Regression Analysis?
As mentioned in Chapter 1, statistical data analysis is a craft, part art (in the sense of a
skill developed through practice) and part science (in the sense of systematic, formal
knowledge). Introductions to applied statistics typically convey some of the craft of data analy-
sis but tend to focus on basic concepts and the logic of statistical inference. This and the next
two chapters develop some of the elements of statistical data analysis:
• The current chapter introduces regression analysis in a general context, tracing the conditional
distribution of a response variable as a function of one or several explanatory
variables. There is also some discussion of practical methods for looking at regressions
with a minimum of prespecified assumptions about the data.
• Chapter 3 describes graphical methods for looking at data, including methods for examining
the distributions of individual variables, relationships between pairs of variables,
and relationships among several variables.
• Chapter 4 takes up methods for transforming variables to make them better behaved—
for example, to render the distribution of a variable more symmetric or to make the relationship
between two variables more nearly linear.
Figure 2.1 is a scatterplot showing the relationship between hourly wages (in dollars) and for-
mal education (in years) for a sample of 14,601 employed Canadians. The line in the plot
shows the mean value of wages for each level of education and represents (in one sense) the
regression of wages on education.[1] Although there are many observations in this scatterplot,
few individuals in the sample have education below, say, 5 years, and so the mean wages at
low levels of education cannot be precisely estimated from the sample, despite its large overall
size. Discounting, therefore, variation in average wages at very low levels of education, it
appears as if average wages are relatively flat until about 10 years of education, at which point
they rise gradually and steadily with education.
Figure 2.1 raises several issues that we will take up in subsequent chapters:[2] Because of the
large number of points in the plot and the discreteness of education (which is represented as
number of years completed), the plot is difficult to examine. It is, however, reasonably clear
that the distribution of wages at fixed levels of education is positively skewed. One such conditional
distribution is shown in the histogram in Figure 2.2. The mean is a problematic measure
of the center of a skewed distribution, and so basing the regression on the mean is not a good
idea for such data. It is also clear that the relationship between hourly wages and education is
not linear—that is, not reasonably summarized by a straight line—and so the common reflex to
summarize relationships between quantitative variables with lines is also not a good idea here.
[1] See Exercise 5.2 for the original statistical meaning of the term “regression.”
[2] See, in particular, Chapter 3 on examining data and Chapter 4 on transforming data.
Figure 2.1 A scatterplot showing the relationship between hourly wages (in dollars) and education
(in years) for a sample of 14,601 employed Canadians. The line connects the
mean wages at the various levels of education. The data are drawn from the 1994
Survey of Labour and Income Dynamics (SLID). [Plot not reproduced. Horizontal axis: Education (years).]
Figure 2.2 The conditional distribution of hourly wages for the 3,384 employed Canadians in the
SLID, who had 12 years of education. The vertical axis is scaled as density, which
means that the total area of the bars of the histogram is 1. Moreover, because each bar
of the histogram has a width of 1, the height of the bar also (and coincidentally) represents
the proportion of the sample in the corresponding interval of wage rates. The
vertical broken line is at the mean wage rate for those with 12 years of education:
$12.94. [Plot not reproduced. Horizontal axis: Hourly Wage Rate (dollars); vertical axis: Estimated Probability Density.]
Thinking more abstractly, regression analysis, broadly construed, traces the distribution of a
response variable (denoted by $Y$)—or some characteristic of this distribution (such as its
mean)—as a function of one or more explanatory variables ($X_1, \ldots, X_k$):[3]
$$p(y \mid x_1, \ldots, x_k) = f(x_1, \ldots, x_k) \qquad (2.1)$$
Here, $p(y \mid x_1, \ldots, x_k)$ represents the probability (or, for continuous $Y$, the probability density)
of observing the specific value $y$ of the response variable, conditional on a set of specific values
($x_1, \ldots, x_k$) of the explanatory variables, and $p(Y \mid x_1, \ldots, x_k)$ is the probability distribution
of $Y$ (or the density function of $Y$) for these specific values of the $X$s.[4] In the relationship
between the response variable wages ($Y$) and the single explanatory variable education ($X$), for
example, $p(Y \mid x)$ represents the population distribution of wages for all individuals who share
the specific value $x$ of education (e.g., 12 years). Figure 2.1 is therefore the sample analog of
the population conditional distribution of $Y$.
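In a sample, the analog of the conditional distribution of wages at a given level of education is simply the set of wages observed for individuals at that level, and one characteristic of it is the conditional mean. The following Python sketch illustrates the idea with a handful of hypothetical (education, wage) pairs; these are invented values, not the SLID data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (education in years, hourly wage in dollars) pairs.
observations = [
    (10, 8.0), (10, 9.5), (10, 14.0),
    (12, 9.0), (12, 10.5), (12, 11.0), (12, 21.5),
    (16, 14.0), (16, 17.5), (16, 19.0), (16, 33.5),
]

# Sample conditional distributions: the wages observed at each value of x.
wages_by_education = defaultdict(list)
for education, wage in observations:
    wages_by_education[education].append(wage)

# One characteristic of each conditional distribution: its mean.
conditional_means = {x: mean(w) for x, w in sorted(wages_by_education.items())}

for x, m in conditional_means.items():
    print(f"education = {x:2d}: wages = {wages_by_education[x]}, mean = {m:.2f}")
```

Connecting the conditional means across the values of education gives the kind of line drawn in Figure 2.1.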
The relationship of $Y$ to the $X$s is of particular interest when we entertain the possibility that
the $X$s affect $Y$ or—more weakly—when we wish to use the $X$s to predict the value of $Y$.
Primarily for convenience of exposition, I will initially use the term regression analysis to refer
to those cases in which both $Y$ and the $X$s are quantitative (as opposed to qualitative) variables.[5]
This chapter introduces basic concepts of regression analysis in a very general setting
and explores some simple methods of regression analysis that make very weak assumptions
about the structure of the data.
2.1 Preliminaries
Figure 2.3 illustrates the regression of a continuous $Y$ on a single, discrete $X$, which takes on
several values, labeled $x_1, x_2, \ldots, x_5$. Alternatively, you can think of $X$ as a continuous variable
for which $x_1, x_2, \ldots, x_5$ are specific representative values. As illustrated in the figure, the values
of $X$ need not be evenly spaced. For concreteness, imagine (as in Figure 2.1) that $Y$ represents
wages, that $X$ represents years of formal education, and that the graph shows the conditional
distribution $p(Y \mid x)$ of wages for some of the values of education.
[3] The response variable is often called the dependent variable, and the explanatory variables are often called independent
variables or predictors.
[4] If the concept of (or notation for) a conditional distribution is unfamiliar, you should consult online Appendix D on
probability and estimation. Please keep in mind more generally that background information is located in the appendixes,
available on the website for the book.
[5] Later in the book, we will have occasion to consider statistical models in which the explanatory variables (Chapters 7
and 8) and the response variable (Chapter 14) are qualitative/categorical variables. This material is centrally important
because categorical variables are very common in the social sciences.
Figure 2.3 Population regression of $Y$ on $X$. The conditional distribution of $Y$, $p(Y \mid x)$, is shown for
each of a few values of $X$. The distribution of $Y$ at $X = x_1$ is positively skewed; at
$X = x_2$, it is bimodal; at $X = x_3$, it is heavy tailed; at $X = x_4$, it has greater spread than
at $X = x_5$. The conditional means of $Y$ given $X$, that is, $\mu_1, \ldots, \mu_5$, are not a linear
function of $X$. [Plot not reproduced. Horizontal axis: $X$, with values $x_1, \ldots, x_5$; the curves show $p(Y \mid x)$ with conditional means $\mu_1, \ldots, \mu_5$.]
Most discussions of regression analysis begin by assuming that the conditional distribution
of the response variable, $p(Y \mid x_1, \ldots, x_k)$, is a normal distribution; that the variance of Y
conditional on the Xs is everywhere the same regardless of the specific values of $x_1, \ldots, x_k$; and that
the expected value (the mean) of Y is a linear function of the Xs:
$$\mu \equiv E(Y \mid x_1, \ldots, x_k) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k \qquad (2.2)$$
This utopian situation is depicted for a single X in Figure 2.4. As we will see,[6] the assumptions
of normality, common variance, and linearity, along with independent random sampling, lead
to linear least-squares estimation of the model in Equation 2.2. In this chapter, in contrast, we
will pursue the notion of regression with as few assumptions as possible.
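As a rough illustration of the idealized situation described by Equation 2.2, the following sketch simulates data for a single X in which Y is normally distributed around a straight line with constant variance, and then fits the line by least squares. The parameter values (alpha = 5, beta = 1.5, sigma = 3), the range of X, and the function names are assumptions made up for the example; they do not come from the text.

```python
import random
import statistics

def simulate_linear_model(n, alpha, beta, sigma, seed=0):
    """Draw (x, y) pairs with y | x ~ Normal(alpha + beta * x, sigma**2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 20) for _ in range(n)]  # hypothetical range for X
    ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
    return xs, ys

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for a single X."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Illustrative parameter values only.
xs, ys = simulate_linear_model(n=2000, alpha=5.0, beta=1.5, sigma=3.0)
a_hat, b_hat = least_squares(xs, ys)
print(f"intercept = {a_hat:.2f}, slope = {b_hat:.2f}")
```

When the three assumptions hold, as they do here by construction, the fitted intercept and slope should land close to the values used to generate the data.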
Figure 2.3 illustrates why we should not be too hasty to make the assumptions of normality,
equal variance, and linearity:
• Skewness. If the conditional distribution of Y is skewed, as is $p(Y \mid x_1)$, then the mean
will not be a good summary of its center. This is the case as well in Figure 2.1, where
the (sample) conditional distributions of wages given education are all positively
skewed.
• Multiple modes. If the conditional distribution of Y is multimodal, as is $p(Y \mid x_2)$, then it
is intrinsically unreasonable to summarize its center by a single number.
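The two points above can be checked numerically with a small simulation. The sketch below draws from a positively skewed distribution and from a bimodal mixture and looks at how well the mean summarizes each. The particular distributions (a log-normal, and an equal mixture of two normals) and all parameter values are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Positively skewed "wages": log-normal draws (an assumed, illustrative distribution).
skewed = [rng.lognormvariate(2.5, 0.8) for _ in range(10_000)]
mean_skewed = statistics.fmean(skewed)
median_skewed = statistics.median(skewed)

# Bimodal: an equal mixture of two well-separated normal distributions.
bimodal = [rng.gauss(10, 1) if rng.random() < 0.5 else rng.gauss(30, 1)
           for _ in range(10_000)]
mean_bimodal = statistics.fmean(bimodal)

# Share of bimodal observations lying within 5 units of the mean.
near_mean = sum(abs(y - mean_bimodal) < 5 for y in bimodal) / len(bimodal)

print(f"skewed:  mean = {mean_skewed:.1f}, median = {median_skewed:.1f}")
print(f"bimodal: mean = {mean_bimodal:.1f}, share near mean = {near_mean:.3f}")
```

In the skewed case the long right tail pulls the mean above the median; in the bimodal case the mean falls between the two modes, where almost no observations lie.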
[6] Chapters 6 and 9.
Figure 2.4 Common assumptions in regression analysis: The conditional distributions $p(Y \mid x)$ are
all normal distributions with the same variance, and the conditional means of Y (here
$\mu_1, \ldots, \mu_5$) are all on a straight line, $E(Y) = \alpha + \beta x$.
This is not to say, of course, that linear regression analysis or, more generally, linear statistical
models, are of little practical use. Much of this book is devoted to the exposition of linear models.
It is, however, prudent to begin with an appreciation of the limitations of linear models
because their effective use in data analysis frequently depends on adapting to these limitations:
We may, for example, transform data to make the assumptions of normality, equal variance,
and linearity more nearly correct.[7]
There are two additional advantages to approaching regression analysis from a general
perspective: First, an appreciation of the practical difficulties of fitting the very general model in
Equation 2.1 to data motivates the specification of more restrictive models, such as the usual
linear regression model. Second, modern methods of nonparametric regression, while not quite
as general as the model in Equation 2.1, are emerging as practical alternatives to the more
traditional linear models.
[7] See Chapters 4 and 12.
The balance of the present chapter is devoted to an initial foray into the territory of nonparametric
regression. I will begin by taking a direct or “naïve” approach to the problem and then
will extend this approach by local averaging. In the process, we will encounter for the first time
a number of recurring themes in this book, including the direct examination of data by graphical
displays, smoothing to clarify patterns in data, and the detection and treatment of unusual
data.[8]
[8] More sophisticated methods for nonparametric regression are discussed in Chapter 18.
[9] We will explore other approaches to displaying distributions in the next chapter.
[10] Think of a graph like Figure 2.1 that shows the population conditional means, $\mu \mid x$: the values that we now
want to estimate from our sample.
[11] This is an interesting (and unusual) problem in several respects: First, although it is more reasonable to suppose that
actual weight “affects” the report than vice versa, our desire to use the report to predict actual weight (presumably
because it is easier to elicit a verbal report than actually to weigh people) motivates treating measured weight as the
response variable. Second, this is one of those comparatively rare instances in which a linear-regression equation is a
natural specification, because if people are unbiased reporters of their weight, then we should have $\mu = x$
(i.e., expected reported weight equal to actual weight). Finally, if people are accurate as well as unbiased reporters of
their weight, then the conditional variance of Y given x should be very small.
2.2 Naive Nonparametric Regression
[Scatterplot; horizontal axis: Reported Weight (kg).]
Figure 2.5 Naive nonparametric regression of measured weight on reported weight, each in kilograms.
The range of reported weight has been dissected into five bins (separated by broken
lines), each containing about 20 observations. The solid line connects the averages
of measured weight and reported weight in the five bins, shown as filled circles. The
dotted line around which the points cluster is $Y = X$. The fourth observation is an outlier.
Because of the very different ranges of measured and reported weight (due to the
outlier), the scales for the axes are different, and the line $Y = X$ is not at 45 degrees.
Even if the sample is large, replicated values of X will be rare because X is continuous.[12] In
the absence of replicated Xs, we cannot directly examine the conditional distribution of Y
given X, and we cannot directly calculate conditional means. If we indeed have a large sample
of individuals at our disposal, however, then we can dissect the range of X into many narrow
intervals, or bins, of reported weight, each bin containing many observations; within each such
bin, we can display the conditional distribution of measured weight and estimate the
conditional mean of Y with great precision.
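A minimal sketch of this large-sample strategy follows, using simulated data rather than the weight data discussed in the text. The population mean function `true_mean`, the error standard deviation, the sample size, and the number of bins are all hypothetical choices. The code dissects the range of X into 100 narrow, equal-width bins and compares each bin's average Y with the population conditional mean at the bin midpoint.

```python
import random

def true_mean(x):
    """Hypothetical (nonlinear) population conditional mean of Y given x."""
    return 10 + 0.2 * x ** 2

rng = random.Random(2)
n, n_bins, x_max = 200_000, 100, 10.0
sums = [0.0] * n_bins
counts = [0] * n_bins

for _ in range(n):
    x = rng.uniform(0, x_max)
    y = rng.gauss(true_mean(x), 2.0)
    b = min(int(x / x_max * n_bins), n_bins - 1)  # index of the bin containing x
    sums[b] += y
    counts[b] += 1

# Estimated conditional mean in each bin, and the worst error over all bins.
bin_means = [s / c for s, c in zip(sums, counts)]
midpoints = [(b + 0.5) * x_max / n_bins for b in range(n_bins)]
max_error = max(abs(m - true_mean(x)) for m, x in zip(bin_means, midpoints))

print(f"smallest bin count = {min(counts)}, largest error = {max_error:.3f}")
```

Because every narrow bin still holds a large number of observations, the bin averages track the curved mean function closely without any assumption about its form.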
In very large samples, and when the explanatory variables are discrete, it is possible to
estimate a regression by directly examining the conditional distribution of Y given the
Xs. When the explanatory variables are continuous, we can proceed similarly by dissecting
the Xs into a large number of narrow bins.
If, as is more typical, we have only a relatively small sample, then we have to make do with fewer
bins, each containing relatively few observations. This situation is illustrated in Figure 2.5, using
data on reported and measured weight for each of 101 Canadian women engaged in regular
exercise.[13] A partially contrasting example, using the prestige and income levels of 102
Canadian occupations in 1971, is shown in Figure 2.6.[14]
[Scatterplot; vertical axis: Prestige.]
Figure 2.6 Naive nonparametric regression of occupational prestige on average income for 102
Canadian occupations in 1971. The range of income has been dissected into five bins,
each containing about 20 observations. The line connects the average prestige and
income scores in the five bins, shown as filled circles.
[12] No numerical data are literally continuous, of course, because data are always recorded to some finite number of
digits, and in the current example, people would be unlikely to report their weights in fractions of a kilogram. This is
why tied values are possible. Individuals’ measured weights (Y, in the example), however, could well be measured to
greater precision. The philosophical issues surrounding continuity are subtle but essentially irrelevant to us: For
practical purposes, a variable is continuous when it takes on many different values.
The X-axes in Figures 2.5 and 2.6 are carved into five unequal-width bins, each bin containing
approximately 20 observations (the middle bin contains the extra observations). The nonparametric
regression line displayed on each plot is calculated by connecting the points defined
by the conditional response variable means $\bar{Y}$ and the explanatory variable means $\bar{X}$ in the five
intervals.
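The binning-and-averaging procedure just described can be sketched in a few lines. The data below are simulated stand-ins (101 hypothetical reported and measured weights, rounded to the nearest kilogram), not the data plotted in Figure 2.5, and the function name `naive_regression` is invented for the example. One simplification to note: observations with tied X values may fall on either side of a bin boundary.

```python
import random
import statistics

def naive_regression(xs, ys, n_bins=5):
    """Sort on x, cut into bins of about equal count, and average x and y per bin."""
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    base, extra = divmod(len(pairs), n_bins)
    sizes = [base] * n_bins
    sizes[n_bins // 2] += extra  # the middle bin takes the leftover observations
    points, start = [], 0
    for size in sizes:
        chunk = pairs[start:start + size]
        x_bar = statistics.fmean([x for x, _ in chunk])
        y_bar = statistics.fmean([y for _, y in chunk])
        points.append((x_bar, y_bar))
        start += size
    return sizes, points

# Simulated, hypothetical data: measured weight equals reported weight plus noise.
rng = random.Random(3)
reported = [round(rng.gauss(57, 7)) for _ in range(101)]
measured = [round(x + rng.gauss(0, 2)) for x in reported]

sizes, points = naive_regression(reported, measured)
for x_bar, y_bar in points:
    print(f"mean reported = {x_bar:.1f}, mean measured = {y_bar:.1f}")
```

Connecting the five (mean X, mean Y) points in order gives the naive nonparametric regression line; the bins have unequal widths because they are defined by counts rather than by the scale of X.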
Recalling our purpose, which is to estimate the model in Equation 2.3, there are two sources
of error in this simple procedure of binning and averaging:
• Sampling error (variance). The conditional sample means $\bar{Y}$ will, of course, change if
we select a new sample (even if we could retain the same selection of xs). Sampling
error is minimized by using a small number of relatively wide bins, each with many
observations.
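The effect of bin width on sampling error can be seen in a small simulation. The sketch below holds a set of 100 x values fixed, repeatedly draws new Y values, and records how much the bin means vary from sample to sample when the data are cut into 5 wide bins versus 20 narrow ones. The mean function (mu = x), the error standard deviation, and the number of replications are assumptions for illustration, and the sketch addresses sampling error only.

```python
import random
import statistics

def bin_means(ys, n_bins):
    """Average y within consecutive, equal-count bins (ys are ordered by x)."""
    size = len(ys) // n_bins
    return [statistics.fmean(ys[i * size:(i + 1) * size]) for i in range(n_bins)]

def average_sd_of_bin_means(xs, n_bins, sigma, n_reps, rng):
    """Typical SD of a bin mean across new samples of Y at the same xs."""
    reps = []
    for _ in range(n_reps):
        ys = [rng.gauss(x, sigma) for x in xs]  # assumed mean function: mu = x
        reps.append(bin_means(ys, n_bins))
    # SD across replications for each bin, then averaged over the bins.
    return statistics.fmean([statistics.stdev(col) for col in zip(*reps)])

rng = random.Random(4)
xs = sorted(rng.uniform(40, 80) for _ in range(100))  # fixed selection of xs
sd_wide = average_sd_of_bin_means(xs, n_bins=5, sigma=5.0, n_reps=500, rng=rng)
sd_narrow = average_sd_of_bin_means(xs, n_bins=20, sigma=5.0, n_reps=500, rng=rng)

print(f"5 bins: SD = {sd_wide:.2f}; 20 bins: SD = {sd_narrow:.2f}")
```

With 20 observations per bin rather than 5, each bin mean averages over four times as many values, so its standard deviation across samples is roughly half as large.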
[13] These data were generously made available to me by Caroline Davis of York University, who used them as part of a
larger study; see Davis (1990). The error in the data described below was located by Professor Davis. The 101 women
were volunteers for the study, not a true sample from a larger population.
The observant reader will have noticed that there are apparently fewer than 101 points in Figure 2.5: Because both
measured and reported weight are given to the nearest kilogram, many points are overplotted (i.e., lie on top of one
another). We will learn to deal with overplotting in Chapter 3.
[14] The Canadian occupational prestige data are described in Fox and Suschnigg (1989). Although there are many more
occupations in the Canadian census, these 102 do not constitute a random sample from the larger population of
occupations. Justification for treating the 102 occupations as a sample implicitly rests on the claim that they are “typical” of
the population, at least with respect to the relationship between prestige and income, a problematic, if arguable, claim.
first wife! How intolerably she suffered! But she never utters a word.
It is Dick Edgeworth who tells her story in complete ignorance that
he is doing anything of the kind. "It was a singular trait of character
in my wife," he observes, "who had never shown any uneasiness at
my intimacy with Sir Francis Delaval, that she should take a strong
dislike to Mr. Day. A more dangerous and seductive companion than
the one, or a more moral and improving companion than the other,
could not be found in England." It was, indeed, very singular.
For the first Mrs. Edgeworth was a penniless girl, the daughter of a
ruined country gentleman, who sat over his fire picking cinders from
the hearth and throwing them into the grate, while from time to time
he ejaculated "Hein! Heing!" as yet another scheme for making his
fortune came into his head. She had had no education. An itinerant
writing-master had taught her to form a few words. When Dick
Edgeworth was an undergraduate and rode over from Oxford she fell
in love with him and married him in order to escape the poverty and
the mystery and the dirt, and to have a husband and children like
other women. But with what result? Gigantic wheels ran downhill
with the bricklayer's son inside them. Sailing carriages took flight
and almost wrecked four stage coaches. Machines did cut turnips,
but not very efficiently. Her little boy was allowed to roam the
country like a poor man's son, bare-legged, untaught. And Mr. Day,
coming to breakfast and staying to dinner, argued incessantly about
scientific principles and the laws of nature.
But here we encounter one of the pitfalls of this nocturnal rambling
among forgotten worthies. It is so difficult to keep, as we must with
highly authenticated people, strictly to the facts. It is so difficult to
refrain from making scenes which, if the past could be recalled,
might perhaps be found lacking in accuracy. With a character like
Thomas Day, in particular, whose history surpasses the bounds of
the credible, we find ourselves oozing amazement, like a sponge
which has absorbed so much that it can retain no more but fairly
drips. Certain scenes have the fascination which belongs rather to
the abundance of fiction than to the sobriety of fact. For instance,
we conjure up all the drama of poor Mrs. Edgeworth's daily life; her
bewilderment, her loneliness, her despair, how she must have
wondered whether any one really wanted machines to climb walls,
and assured the gentlemen that turnips were better cut simply with
a knife, and so blundered and floundered and been snubbed that
she dreaded the almost daily arrival of the tall young man with his
pompous, melancholy face, marked by the smallpox, his profusion of
uncombed black hair, and his finical cleanliness of hands and person.
He talked fast, fluently, incessantly, for hours at a time about
philosophy and nature, and M. Rousseau. Yet it was her house; she
had to see to his meals, and, though he ate as though he were half
asleep, his appetite was enormous. But it was no use complaining to
her husband. Edgeworth said, "She lamented about trifles." He went
on to say: "The lamenting of a female with whom we live does not
render home delightful." And then, with his obtuse open-
mindedness, he asked her what she had to complain of? Did he ever
leave her alone? In the five or six years of their married life he had
slept from home not more than five or six times. Mr. Day could
corroborate that. Mr. Day corroborated everything that Mr.
Edgeworth said. He egged him on with his experiments. He told him
to leave his son without education. He did not care a rap what the
people of Henley said. In short, he was at the bottom of all the
absurdities and extravagances which made Mrs. Edgeworth's life a
burden to her.
Yet let us choose another scene—one of the last that poor Mrs.
Edgeworth was to behold. She was returning from Lyons, and Mr.
Day was her escort. A more singular figure, as he stood on the deck
of the packet which took them to Dover, very tall, very upright, one
finger in the breast of his coat, letting the wind blow his hair out,
dressed absurdly, though in the height of fashion, wild, romantic, yet
at the same time authoritative and pompous, could scarcely be
imagined; and this strange creature, who loathed women, was in
charge of a lady who was about to become a mother, had adopted
two orphan girls, and had set himself to win the hand of Miss
Elizabeth Sneyd by standing between boards for six hours daily in
order to learn to dance. Now and again he pointed his toe with rigid
precision; then, waking from the congenial dream into which the
dark clouds, the flying waters, and the shadow of England upon the
horizon had thrown him, he rapped out an order in the smart,
affected tones of a man of the world. The sailors stared, but they
obeyed. There was something sincere about him, something proudly
indifferent to what you thought; yes, something comforting and
humane, too, so that Mrs. Edgeworth for her part was determined
never to laugh at him again. But men were strange; life was difficult,
and with a sigh of bewilderment, perhaps of relief, poor Mrs.
Edgeworth landed at Dover, was brought to bed of a daughter, and
died.
Day meanwhile proceeded to Lichfield. Elizabeth Sneyd, of course,
refused him—gave a great cry, people said; exclaimed that she had
loved Day the blackguard, but hated Day the gentleman, and rushed
from the room. And then, they said, a terrible thing happened. Mr.
Day, in his rage, bethought him of the orphan, Sabrina Sydney,
whom he had bred to be his wife; visited her at Sutton Coldfield;
flew into a passion at the sight of her; fired a pistol at her skirts,
poured melted sealing wax over her arms, and boxed her ears. "No;
I could never have done that," Mr. Edgeworth used to say, when
people described the scene. And whenever to the end of his life he
thought of Thomas Day he fell silent. So great, so passionate, so
inconsistent—his life had been a tragedy, and in thinking of his
friend, the best friend he had ever had, Richard Edgeworth fell
silent.
It is almost the only occasion upon which silence is recorded of him.
To muse, to repent, to contemplate were foreign to his nature. His
wife and friends and children are silhouetted with extreme vividness
upon a broad disc of interminable chatter. Upon no other
background could we realise so clearly the sharp fragment of his first
wife, or the shades and depths which make up the character, at once
humane and brutal, advanced and hidebound of the inconsistent
philosopher, Thomas Day. But his power is not limited to people;
landscapes, groups, societies seem, even as he describes them, to
split off from him, to be projected away, so that we are able to run
just ahead of him and anticipate his coming. They are brought out
all the more vividly by the extreme incongruity which so often marks
his comment and stamps his presence; they live with a peculiar
beauty, fantastic, solemn, mysterious, in contrast with Edgeworth,
who is none of these things. In particular, he brings before us a
garden in Cheshire, the garden of a parsonage, an ancient but
commodious parsonage.
One pushed through a white gate and found oneself in a grass court,
small but well kept, with roses growing in the hedges and grapes
hanging from the walls. But what, in the name of wonder, were
those objects in the middle of the grass plot? Through the dusk of
an autumn evening there shone out an enormous white globe.
Round it at various distances were others of different sizes—the
planets and their satellites, it seemed. But who could have placed
them there, and why? The house was silent; the windows shut;
nobody was stirring. Then, furtively peeping from behind a curtain,
appeared for a second the face of an elderly man, handsome,
dishevelled, distraught. It vanished.
In some mysterious way, human beings inflict their own vagaries
upon nature. Moths and birds must have flitted more silently through
the little garden; over everything must have brooded the same
fantastic peace. Then, red-faced, garrulous, inquisitive, in burst
Richard Lovell Edgeworth. He looked at the globes; he satisfied
himself that they were of "accurate design and workmanlike
construction". He knocked at the door. He knocked and knocked. No
one came. At length, as his impatience was overcoming him, slowly
the latch was undone, gradually the door was opened; a clergyman,
neglected, unkempt, but still a gentleman, stood before him.
Edgeworth named himself, and they retired to a parlour littered with
books and papers and valuable furniture now fallen to decay. At last,
unable to control his curiosity any longer, Edgeworth asked what
were the globes in the garden? Instantly the clergyman displayed
extreme agitation. It was his son who had made them, he
exclaimed; a boy of genius, a boy of the greatest industry, and of
virtue and acquirements far beyond his age. But he had died. His
wife had died. Edgeworth tried to turn the conversation, but in vain.
The poor man rushed on passionately, incoherently about his son,
his genius, his death. "It struck me that his grief had injured his
understanding," said Edgeworth, and he was becoming more and
more uncomfortable when the door opened and a girl of fourteen or
fifteen, entering with a tea-tray in her hand, suddenly changed the
course of his host's conversation. Indeed, she was beautiful; dressed
in white; her nose a shade too prominent, perhaps—but no, her
proportions were exquisitely right. "She is a scholar and an artist!"
the clergyman exclaimed as she left the room. But why did she leave
the room? If she was his daughter why did she not preside at the
tea-table? Was she his mistress? Who was she? And why was the
house in this state of litter and decay? Why was the front door
locked? Why was the clergyman apparently a prisoner, and what was
his secret story? Questions began to crowd into Edgeworth's head as
he sat drinking his tea; but he could only shake his head and make
one last reflection, "I feared that something was not right," as he
shut the white wicket gate behind him, and left alone for ever in the
untidy house among the planets and their satellites, the mad
clergyman and the lovely girl.
II
LAETITIA PILKINGTON
Let us bother the librarian once again. Let us ask him to reach down,
dust, and hand over to us that little brown book over there, the
Memoirs of Mrs. Pilkington, three volumes bound in one, printed by
Peter Hoey in Dublin, MDCCLXXVI. The deepest obscurity shades her
retreat; the dust lies heavy on her tomb—one board is loose, that is
to say, and nobody has read her since early in the last century when
a reader, presumably a lady, whether disgusted by her obscenity or
stricken by the hand of death, left off in the middle and marked her
place with a faded list of goods and groceries. If ever a woman
wanted a champion, it is obviously Laetitia Pilkington. Who then was
she?
Can you imagine a very extraordinary cross between Moll Flanders
and Lady Ritchie, between a rolling and rollicking woman of the
town and a lady of breeding and refinement? Laetitia Pilkington
(1712-1759) was something of the sort—shady, shifty, adventurous,
and yet, like Thackeray's daughter, like Miss Mitford, like Madame de
Sévigné and Jane Austen and Maria Edgeworth, so imbued with the
old traditions of her sex that she wrote, as ladies talk, to give
pleasure. Throughout her Memoirs, we can never forget that it is her
wish to entertain, her unhappy fate to sob. Dabbing her eyes and
controlling her anguish, she begs us to forgive an odious breach of
manners which only the suffering of a lifetime, the intolerable
persecutions of Mr. P——n, the malignant, she must say the h——h,
spite of Lady C——t can excuse. For who should know better than
the Earl of Killmallock's great-granddaughter that it is the part of a
lady to hide her sufferings? Thus Laetitia is in the great tradition of
English women of letters. It is her duty to entertain; it is her instinct
to conceal. Still, though her room near the Royal Exchange is
threadbare, and the table is spread with old play-bills instead of a
cloth, and the butter is served in a shoe, and Mr. Worsdale has used
the teapot to fetch small beer that very morning, still she presides,
still she entertains. Her language is a trifle coarse, perhaps. But who
taught her English? The great Doctor Swift.
In all her wanderings, which were many, and in her failings, which
were great, she looked back to those early Irish days when Swift
had pinched her into propriety of speech. He had beaten her for
fumbling at a drawer: he had daubed her cheeks with burnt cork to
try her temper; he had bade her pull off her shoes and stockings
and stand against the wainscot and let him measure her. At first she
had refused; then she had yielded. "Why," said the Dean, "I
suspected you had either broken Stockings or foul toes, and in either
case should have delighted to expose you." Three feet two inches
was all she measured, he declared, though, as Laetitia complained,
the weight of Swift's hand on her head had made her shrink to half
her size. But she was foolish to complain. Probably she owed her
intimacy to that very fact—she was only three feet two. Swift had
lived a lifetime among the giants; now there was a charm in dwarfs.
He took the little creature into his library. "'Well,' said he, 'I have
brought you here to show you all the Money I got when I was in the
Ministry, but don't steal any of it.' 'I won't, indeed, Sir,' said I; so he
opened a Cabinet, and showed me a whole parcel of empty drawers.
'Bless me,' says he, 'the Money is flown.'" There was a charm in her
surprise; there was a charm in her humility. He could beat her and
bully her, make her shout when he was deaf, force her husband to
drink the lees of the wine, pay their cab fares, stuff guineas into a
piece of gingerbread, and relent surprisingly, as if there were
something grimly pleasing to him in the thought of so foolish a
midget setting up to have a life and a mind of her own. For with
Swift she was herself; it was the effect of his genius. She had to pull
off her stockings if he told her to. So, though his satire terrified her,
and she found it highly unpleasant to dine at the Deanery and see
him watching, in the great glass which hung before him for that
purpose, the butler stealing beer at the sideboard, she knew that it
was a privilege to walk with him in his garden; to hear him talk of
Mr. Pope and quote Hudibras; and then be hustled back in the rain
to save coach hire, and then to sit chatting in the parlour with Mrs.
Brent, the housekeeper, about the Dean's oddity and charity, and
how the sixpence he saved on the coach he gave to the lame old
man who sold gingerbread at the corner, while the Dean dashed up
the front stairs and down the back so violently that she was afraid
he would fall and hurt himself.
But memories of great men are no infallible specific. They fall upon
the race of life like beams from a lighthouse. They flash, they shock,
they reveal, they vanish. To remember Swift was of little avail to
Laetitia when the troubles of life came thick about her. Mr. Pilkington
left her for Widow W—rr—n. Her father—her dear father—died. The
sheriff's officers insulted her. She was deserted in an empty house
with two children to provide for. The tea chest was secured, the
garden gate locked, and the bills left unpaid. And still she was young
and attractive and gay, with an inordinate passion for scribbling
verses and an incredible hunger for reading books. It was this that
was her undoing. The book was fascinating and the hour late. The
gentleman would not lend it, but would stay till she had finished.
They sat in her bedroom. It was highly indiscreet, she owned.
Suddenly twelve watchmen broke through the kitchen window, and
Mr. Pilkington appeared with a cambric handkerchief tied about his
neck. Swords were drawn and heads broken. As for her excuse, how
could one expect Mr. Pilkington and the twelve watchmen to believe
that? Only reading! Only sitting up late to finish a new book! Mr.
Pilkington and the watchmen interpreted the situation as such men
would. But lovers of learning, she is persuaded, will understand her
passion and deplore its consequences.
And now what was she to do? Reading had played her false, but still
she could write. Ever since she could form her letters, indeed, she
had written, with incredible speed and considerable grace, odes,
addresses, apostrophes to Miss Hoadley, to the Recorder of Dublin,
to Dr. Delville's place in the country. "Hail, happy Delville, blissful
seat!" "Is there a man whose fixed and steady gaze——"—the verses
flowed without the slightest difficulty on the slightest occasion. Now,
therefore, crossing to England, she set up, as her advertisement had
it, to write letters upon any subject, except the law, for twelve pence
ready money, and no trust given. She lodged opposite White's
Chocolate House, and there, in the evening, as she watered her
flowers on the leads, the noble gentlemen in the window across the
road drank her health, sent her over a bottle of burgundy; and later
she heard old Colonel——crying, "Poke after me, my lord, poke after
me," as he shepherded the D—— of M—lb—gh up her dark stairs.
That lovely gentleman, who honoured his title by wearing it, kissed
her, complimented her, opened his pocketbook, and left her with a
banknote for fifty pounds upon Sir Francis Child. Such tributes
stimulated her pen to astonishing outbursts of impromptu gratitude.
If, on the other hand, a gentleman refused to buy or a lady hinted
impropriety, this same flowery pen writhed and twisted in agonies of
hate and vituperation. "Had I said that your F——r died Blaspheming
the Almighty", one of her accusations begins, but the end is
unprintable. Great ladies were accused of every depravity, and the
clergy, unless their taste in poetry was above reproach, suffered an
incessant castigation. Mr. Pilkington, she never forgot, was a
clergyman.
Slowly but surely the Earl of Killmallock's great-granddaughter
descended in the social scale. From St. James's Street and its noble
benefactors she migrated to Green Street to lodge with Lord Stair's
valet de chambre and his wife, who washed for persons of
distinction. She, who had dallied with dukes, was glad for company's
sake to take a hand at quadrille with footmen and laundresses and
Grub Street writers, who, as they drank porter, sipped green tea,
and smoked tobacco, told stories of the utmost scurrility about their
masters and mistresses. The spiciness of their conversation made
amends for the vulgarity of their manners. From them Laetitia picked
up those anecdotes of the great which sprinkled her pages with
dashes and served her purpose when subscribers failed and
landladies grew insolent. Indeed, it was a hard life—to trudge to
Chelsea in the snow wearing nothing but a chintz gown and be put
off with a beggarly half-crown by Sir Hans Sloane; next to tramp to
Ormond Street and extract two guineas from the odious Dr. Meade,
which, in her glee, she tossed in the air and lost in a crack of the
floor; to be insulted by footmen; to sit down to a dish of boiling
water because her landlady must not guess that a pinch of tea was
beyond her means. Twice on moonlight nights, with the lime trees in
flower, she wandered in St. James's Park and contemplated suicide
in Rosamond's Pond. Once, musing among the tombs in Westminster
Abbey, the door was locked on her, and she had to spend the night
in the pulpit wrapped in a carpet from the Communion Table to
protect herself from the assaults of rats. "I long to listen to the
young-ey'd cherubims!" she exclaimed. But a very different fate was
in store for her. In spite of Mr. Colley Cibber, and Mr. Richardson,
who supplied her first with gilt-edged notepaper and then with baby
linen, those harpies, her landladies, after drinking her ale, devouring
her lobsters, and failing often for years at a time to comb their hair,
succeeded in driving Swift's friend, and the Earl's great-
granddaughter, to be imprisoned with common debtors in the
Marshalsea.
Bitterly she cursed her husband who had made her a lady of
adventure instead of what nature intended, "a harmless household
dove". More and more wildly she ransacked her brains for
anecdotes, memories, scandals, views about the bottomless nature
of the sea, the inflammable character of the earth—anything that
would fill a page and earn her a guinea. She remembered that she
had eaten plovers' eggs with Swift. "Here, Hussey," said he, "is a
Plover's egg. King William used to give crowns apiece for them. . . ."
Swift never laughed, she remembered. He used to suck in his cheeks
instead of laughing. And what else could she remember? A great
many gentlemen, a great many landladies; how the window was
thrown up when her father died, and her sister came downstairs
with the sugar-basin, laughing. All had been bitterness and struggle,
except that she had loved Shakespeare, known Swift, and kept
through all the shifts and shades of an adventurous career a gay
spirit, something of a lady's breeding, and the gallantry which, at the
end of her short life, led her to crack her joke and enjoy her duck
with death at her heart and duns at her pillow.
III
MISS ORMEROD[8]
The trees stood massively in all their summer foliage spotted and
grouped upon a meadow which sloped gently down from the big
white house. There were unmistakable signs of the year 1835 both
in the trees and in the sky, for modern trees are not nearly so
voluminous as these ones, and the sky of those days had a kind of
pale diffusion in its texture which was different from the more
concentrated tone of the skies we know.
Mr. George Ormerod stepped from the drawing-room window of
Sedbury House, Gloucestershire, wearing a tall furry hat and white
trousers strapped under his instep; he was closely, though
deferentially, followed by a lady wearing a yellow-spotted dress over
a crinoline, and behind her, singly and arm in arm, came nine
children in nankeen jackets and long white drawers. They were
going to see the water let out of a pond.
The youngest child, Eleanor, a little girl with a pale face, rather
elongated features, and black hair, was left by herself in the
drawing-room, a large sallow apartment with pillars, two
chandeliers, for some reason enclosed in holland bags, and several
octagonal tables some of inlaid wood and others of greenish
malachite. At one of these little Eleanor Ormerod was seated in a
high chair.
"Now, Eleanor," said her mother, as the party assembled for the
expedition to the pond, "here are some pretty beetles. Don't touch
the glass. Don't get down from your chair, and when we come back
little George will tell you all about it."
So saying, Mrs. Ormerod placed a tumbler of water containing about
half a dozen great water grubs in the middle of the malachite table,
at a safe distance from the child, and followed her husband down
the slope of old-fashioned turf towards a cluster of extremely old-
fashioned sheep; opening, directly she stepped on to the terrace, a
tiny parasol of bottle green silk with a bottle green fringe, though
the sky was like nothing so much as a flock bed covered with a
counterpane of white dimity.
The plump pale grubs gyrated slowly round and round in the
tumbler. So simple an entertainment must surely soon have ceased
to satisfy. Surely Eleanor would shake the tumbler, upset the grubs,
and scramble down from her chair. Why, even a grown person can
hardly watch those grubs crawling down the glass wall, then floating
to the surface, without a sense of boredom not untinged with
disgust. But the child sat perfectly still. Was it her custom, then, to
be entertained by the gyrations of grubs? Her eyes were reflective,
even critical. But they shone with increasing excitement. She beat
one hand upon the edge of the table. What was the reason? One of
the grubs had ceased to float: he lay at the bottom; the rest,
descending, proceeded to tear him to pieces.
"And how has little Eleanor enjoyed herself?" asked Mr. Ormerod, in
rather a deep voice, stepping into the room and with a slight air of
heat and of fatigue upon his face.
"Papa," said Eleanor, almost interrupting her father in her eagerness
to impart her observation, "I saw one of the grubs fall down and the
rest came and ate him!"
"Nonsense, Eleanor," said Mr. Ormerod. "You are not telling the
truth." He looked severely at the tumbler in which the beetles were
still gyrating as before.
"Papa, it was true!"
"Eleanor, little girls are not allowed to contradict their fathers," said
Mrs. Ormerod, coming in through the window, and closing her green
parasol with a snap.
"Let this be a lesson," Mr. Ormerod began, signing to the other
children to approach, when the door opened, and the servant
announced,
"Captain Fenton."
Captain Fenton "was at times thought to be tedious in his recurrence
to the charge of the Scots Greys in which he had served at the battle
of Waterloo."
But what is this crowd gathered round the door of the George Hotel
in Chepstow? A faint cheer rises from the bottom of the hill. Up
comes the mail coach, horses steaming, panels mud-splashed.
"Make way! Make way!" cries the ostler and the vehicle dashes into
the courtyard, pulls up sharp before the door. Down jumps the
coachman, the horses are led off, and a fine team of spanking greys
is harnessed with incredible speed in their stead. Upon all this—
coachman, horses, coach, and passengers—the crowd looked with
gaping admiration every Wednesday evening all through the year.
But to-day, the twelfth of March, 1852, as the coachman settled his
rug, and stretched his hands for the reins, he observed that instead
of being fixed upon him, the eyes of the people of Chepstow darted
this way and that. Heads were jerked. Arms flung out. Here a hat
swooped in a semi-circle. Off drove the coach almost unnoticed. As
it turned the corner all the outside passengers craned their necks,
and one gentleman rose to his feet and shouted, "There! there!
there!" before he was bowled into eternity. It was an insect—a red-
winged insect. Out the people of Chepstow poured into the high
road; down the hill they ran; always the insect flew in front of them;
at length by Chepstow Bridge a young man, throwing his bandanna
over the blade of an oar, captured it alive and presented it to a
highly respectable elderly gentleman who now came puffing upon
the scene—Samuel Budge, doctor, of Chepstow. By Samuel Budge it
was presented to Miss Ormerod; by her sent to a professor at
Oxford. And he, declaring it "a fine specimen of the rose
underwinged locust" added the gratifying information that it "was
the first of the kind to be captured so far west."
And so, at the age of twenty-four Miss Eleanor Ormerod was thought
the proper person to receive the gift of a locust.
When Eleanor Ormerod appeared at archery meetings and croquet
tournaments young men pulled their whiskers and young ladies
looked grave. It was so difficult to make friends with a girl who could
talk of nothing but black beetles and earwigs—"Yes, that's what she
likes, isn't it queer?—Why, the other day Ellen, Mama's maid, heard
from Jane, who's under-kitchenmaid at Sedbury House, that Eleanor
tried to boil a beetle in the kitchen saucepan and he wouldn't die,
and swam round and round, and she got into a terrible state and
sent the groom all the way to Gloucester to fetch chloroform—all for
an insect, my dear!—and she gives the cottagers shillings to collect
beetles for her—and she spends hours in her bedroom cutting them
up—and she climbs trees like a boy to find wasps' nests—oh, you
can't think what they don't say about her in the village—for she does
look so odd, dressed anyhow, with that great big nose and those
bright little eyes, so like a caterpillar herself, I always think—but of
course she's wonderfully clever and very good, too, both of them.
Georgiana has a lending library for the cottagers, and Eleanor never
misses a service—but there she is—that short pale girl in the large
bonnet. Do go and talk to her, for I'm sure I'm too stupid, but you'd
find plenty to say—" But neither Fred nor Arthur, Henry nor William
found anything to say—
". . . probably the lecturer would have been equally well pleased
had none of her own sex put in an appearance."
This comment upon a lecture delivered in the year 1889 throws
some light, perhaps, upon archery meetings in the 'fifties.
It being nine o'clock on a February night some time about 1862 all
the Ormerods were in the library; Mr. Ormerod making architectural
designs at a table; Mrs. Ormerod lying on a sofa making pencil
drawings upon grey paper; Eleanor making a model of a snake to
serve as a paper weight; Georgiana making a copy of the font in
Tidenham Church; some of the others examining books with
beautiful illustrations; while at intervals someone rose, unlocked the
wire book case, took down a volume for instruction or
entertainment, and perused it beneath the chandelier.
Mr. Ormerod required complete silence for his studies. His word was
law, even to the dogs, who, in the absence of their master,
instinctively obeyed the eldest male person in the room. Some
whispered colloquy there might be between Mrs. Ormerod and her
daughters—
"The draught under the pew was really worse than ever this
morning, Mama—"
"And we could only unfasten the latch in the chancel because
Eleanor happened to have her ruler with her—"
"—hm—m—m. Dr. Armstrong—Hm—m—m—"
"—Anyhow things aren't as bad with us as they are at Kinghampton.
They say Mrs. Briscoe's Newfoundland dog follows her right up to
the chancel rails when she takes the sacrament—"
"And the turkey is still sitting on its eggs in the pulpit."
—"The period of incubation for a turkey is between three and four
weeks"—said Eleanor, thoughtfully looking up from her cast of the
snake and forgetting, in the interest of her subject, to speak in a
whisper.
"Am I to be allowed no peace in my own house?" Mr. Ormerod
exclaimed angrily, rapping with his ruler on the table, upon which
Mrs. Ormerod half shut one eye and squeezed a little blob of
Chinese white on to her high light, and they remained silent until the
servants came in, when everyone, with the exception of Mrs.
Ormerod, fell on their knees. For she, poor lady, suffered from a
chronic complaint and left the family party forever a year or two
later, when the green sofa was moved into the corner, and the
drawings given to her nieces in memory of her. But Mr. Ormerod
went on making architectural drawings at nine p.m. every night
(save on Sundays when he read a sermon) until he too lay upon the
green sofa, which had not been used since Mrs. Ormerod lay there,
but still looked much the same. "We deeply felt the happiness of
ministering to his welfare," Miss Ormerod wrote, "for he would not
hear of our leaving him for even twenty-four hours and he objected
to visits from my brothers excepting occasionally for a short time.
They, not being used to the gentle ways necessary for an aged
invalid, worried him . . . the Thursday following, the 9th October,
1873, he passed gently away at the mature age of eighty-seven
years." Oh, graves in country churchyards—respectable burials—
mature old gentlemen—D.C.L., LL.D., F.R.S., F.S.A.—lots of letters
come after your names, but lots of women are buried with you!
"If you're sure I'm not in your way," said Miss Lipscomb unstrapping
her paint box and planting her tripod firmly in the path, "—I'll try to
get a picture of those lovely hydrangeas against the sky—What
flowers you have in Penzance!"
The market gardener crossed his hands on his hoe, slowly twined a
piece of bass round his finger, looked at the sky, said something
about the sun, also about the prevalence of lady artists, and then,
with a nod of his head, observed sententiously that it was to a lady
that he owed everything he had.
"Ah?" said Miss Lipscomb, flattered, but already much occupied with
her composition.
"A lady with a queer-sounding name," said Mr. Pascoe, "but that's
the lady I've called my little girl after—I don't think there's such
another in Christendom."
Of course it was Miss Ormerod, equally of course Miss Lipscomb was
the sister of Miss Ormerod's family doctor; and so she did no
sketching that morning, but left with a handsome bunch of grapes
instead—for every flower had drooped, ruin had stared him in the
face—he had written, not believing one bit what they told him—to
the lady with the queer name, back there came a book "In-ju-ri-ous
In-sects," with the page turned down, perhaps by her very hand,
also a letter which he kept at home under the clock, but he knew
every word by heart, since it was due to what she said there that he
wasn't a ruined man—and the tears ran down his face and Miss
Lipscomb, clearing a space on the lodging-house table, wrote the
whole story to her brother.
"The prejudice against Paris Green certainly seems to be dying
down," said Miss Ormerod when she read it.—"But now," she sighed
rather heavily, being no longer young and much afflicted with the
gout, "now it's the sparrows."
One might have thought that they would have left her alone—
innocent dirt-grey birds, taking more than their share of the
breakfast crumbs, otherwise inoffensive. But once you look through
a microscope—once you see the Hessian and the Bot as they really
are—there's no peace for an elderly lady pacing her terrace on a fine
May morning. For example, why, when there are crumbs enough for
all, do only the sparrows get them? Why not swallows or martins?
Why—oh, here come the servants for prayers—
"Forgive us our trespasses as we forgive them that trespass against
us. . . . For thine is the Kingdom and the power and the glory, for
ever and ever. Amen—"
"The Times, ma'am—"
"Thank you, Dixon. . . . The Queen's birthday! We must drink her
Majesty's health in the old white port, Dixon. Home Rule—tut—tut—
tut. All that madman Gladstone. My father would have thought the
world was coming to an end, and I'm not at all sure that it isn't. I
must talk to Dr. Lipscomb—"
Yet all the time in the tail of her eye she saw myriads of sparrows,
and retiring to the study proclaimed in a pamphlet of which 36,000
copies were gratuitously distributed that the sparrow is a pest.
"When he eats an insect," she said to her sister Georgiana, "which
isn't often, it's one of the few insects that one wants to keep—one of
the very few," she added with a touch of acidity natural to one
whose investigations have all tended to the discredit of the insect
race.
"But there'll be some very unpleasant consequences to face," she
concluded—"Very unpleasant indeed."
Happily the port was now brought in, the servants assembled; and
Miss Ormerod, rising to her feet, gave the toast "Her Blessed
Majesty." She was extremely loyal, and moreover she liked nothing
better than a glass of her father's old white port. She kept his pigtail,
too, in a box.
Such being her disposition it went hard with her to analyse the
sparrow's crop, for the sparrow she felt, symbolises something of the
homely virtue of English domestic life, and to proclaim it stuffed with
deceit was disloyal to much that she, and her fathers before her,
held dear. Sure enough the clergy—the Rev. J. E. Walker—
denounced her for her brutality; "God Save the Sparrow!" exclaimed
the Animal's Friend; and Miss Carrington, of the Humanitarian
League, replied in a leaflet described by Miss Ormerod as "spirity,
discourteous, and inaccurate."
"Well," said Miss Ormerod to her sister, "it did me no harm before to
be threatened to be shot at, also hanged in effigy, and other little
attentions."
"Still it was very disagreeable, Eleanor—more disagreeable I believe,
to me than to you," said Georgiana. Soon Georgiana died. She had
however finished the beautiful series of insect diagrams at which she
worked every morning in the dining-room and they were presented
to Edinburgh University. But Eleanor was never the same woman
after that.
Dear forest fly—flour moths—weevils—grouse and cheese flies—
beetles—foreign correspondents—eel worms—ladybirds—wheat
midges—resignation from the Royal Agricultural Society—gall mites—
boot beetles—Announcement of honorary degree to be conferred—
feelings of appreciation and anxiety—paper on wasps—last annual
report—warnings of serious illness—proposed pension—gradual loss
of strength—Finally Death.
That is life, so they say.
"It does no good to keep people waiting for an answer," sighed Miss
Ormerod, "though I don't feel as able as I did since that unlucky
accident at Waterloo. And no one realises what the strain of the
work is—often I'm the only lady in the room, and the gentlemen so
learned, though I've always found them most helpful, most generous
in every way. But I'm growing old, Miss Hartwell, that's what it is.
That's what led me to be thinking of this difficult matter of flour
infestation in the middle of the road so that I didn't see the horse
until he had poked his nose into my ear. . . . Then there's this
nonsense about a pension. What could possess Mr. Barron to think
of such a thing? I should feel inexpressibly lowered if I accepted a
pension. Why, I don't altogether like writing LL.D. after my name,
though Georgie would have liked it. All I ask is to be let go on in my
own quiet way. Now where is Messrs. Langridge's sample? We must
take that first. 'Gentlemen, I have examined your sample and find . . .'"
"If any one deserves a thorough good rest it's you, Miss Ormerod,"
said Dr. Lipscomb, who had grown a little white over the ears. "I
should say the farmers of England ought to set up a statue to you,
bring offerings of corn and wine—make you a kind of Goddess, eh—
what was her name?"
"Not a very shapely figure for a Goddess," said Miss Ormerod with a
little laugh. "I should enjoy the wine though. You're not going to cut
me off my one glass of port surely?"
"You must remember," said Dr. Lipscomb, shaking his head, "how
much your life means to others."
"Well, I don't know about that," said Miss Ormerod, pondering a
little. "To be sure, I've chosen my epitaph. 'She introduced Paris
Green into England,' and there might be a word or two about the
Hessian fly—that, I do believe, was a good piece of work."
"No need to think about epitaphs yet," said Dr. Lipscomb.
"Our lives are in the hands of the Lord," said Miss Ormerod simply.
Dr. Lipscomb bent his head and looked out of the window. Miss
Ormerod remained silent.
"English entomologists care little or nothing for objects of practical
importance," she exclaimed suddenly. "Take this question of flour
infestation—I can't say how many grey hairs that hasn't grown me."
"Figuratively speaking, Miss Ormerod," said Dr. Lipscomb, for her
hair was still raven black.
"Well, I do believe all good work is done in concert," Miss Ormerod
continued. "It is often a great comfort to me to think that."
"It's beginning to rain," said Dr. Lipscomb. "How will your enemies
like that, Miss Ormerod?"
"Hot or cold, wet or dry, insects always flourish!" cried Miss
Ormerod, energetically sitting up in bed.
"Old Miss Ormerod is dead," said Mr. Drummond, opening The Times
on Saturday, July 20th, 1901.
"Old Miss Ormerod?" asked Mrs. Drummond.