
guj75772_ch01.qxd 31/07/2008 11:00 AM Page 15

Chapter 1
The Nature of Regression Analysis
As mentioned in the Introduction, regression is a main tool of econometrics, and in this
chapter we consider very briefly the nature of this tool.

1.1 Historical Origin of the Term Regression


The term regression was introduced by Francis Galton. In a famous paper, Galton found
that, although there was a tendency for tall parents to have tall children and for short par-
ents to have short children, the average height of children born of parents of a given height
tended to move or “regress” toward the average height in the population as a whole.1 In
other words, the height of the children of unusually tall or unusually short parents tends to
move toward the average height of the population. Galton’s law of universal regression was
confirmed by his friend Karl Pearson, who collected more than a thousand records of
heights of members of family groups.2 He found that the average height of sons of a group
of tall fathers was less than their fathers’ height and the average height of sons of a group
of short fathers was greater than their fathers’ height, thus “regressing” tall and short sons
alike toward the average height of all men. In the words of Galton, this was “regression to
mediocrity.”

1.2 The Modern Interpretation of Regression


The modern interpretation of regression is, however, quite different. Broadly speaking, we
may say
Regression analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables, with a view to
estimating and/or predicting the (population) mean or average value of the former in terms of
the known or fixed (in repeated sampling) values of the latter.

1. Francis Galton, “Family Likeness in Stature,” Proceedings of Royal Society, London, vol. 40, 1886, pp. 42–72.
2. K. Pearson and A. Lee, “On the Laws of Inheritance,” Biometrika, vol. 2, Nov. 1903, pp. 357–462.


16 Part One Single-Equation Regression Models

The full import of this view of regression analysis will become clearer as we progress, but
a few simple examples will make the basic concept quite clear.

Examples
1. Reconsider Galton’s law of universal regression. Galton was interested in finding out
why there was a stability in the distribution of heights in a population. But in the modern
view our concern is not with this explanation but rather with finding out how the average
height of sons changes, given the fathers’ height. In other words, our concern is with pre-
dicting the average height of sons knowing the height of their fathers. To see how this can
be done, consider Figure 1.1, which is a scatter diagram, or scattergram. This figure
shows the distribution of heights of sons in a hypothetical population corresponding to the
given or fixed values of the father’s height. Notice that corresponding to any given height of
a father is a range or distribution of the heights of the sons. However, notice that despite the
variability of the height of sons for a given value of father’s height, the average height of
sons generally increases as the height of the father increases. To show this clearly, the cir-
cled crosses in the figure indicate the average height of sons corresponding to a given
height of the father. Connecting these averages, we obtain the line shown in the figure. This
line, as we shall see, is known as the regression line. It shows how the average height of
sons increases with the father’s height.3
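The procedure in this example — fix the father’s height, average the sons’ heights at that value, and connect the means — can be sketched numerically. The data below are simulated for illustration (the population is hypothetical, as in Figure 1.1); the true slope of 0.5 is an assumption built into the simulation, not an estimate from Galton’s records.

```python
# Sketch of Example 1: conditional means of son's height at each fixed
# father's height, plus the least-squares line through all the points.
# The data are simulated; the true slope of 0.5 is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
father = np.repeat(np.arange(60, 76), 20)                # fixed values, 60-75 in.
son = 0.5 * father + 34 + rng.normal(0, 2, father.size)  # slope < 1 by design

# The "circled crosses" of Figure 1.1: mean son's height per father's height
heights = np.unique(father)
cond_mean = np.array([son[father == h].mean() for h in heights])

# The regression line connecting (approximately) those means
slope, intercept = np.polyfit(father, son, 1)
print(slope)  # positive but less than 1: Galton's "regression to mediocrity"
```

The fitted slope comes out positive but below 1, which is exactly the pattern footnote 3 asks the reader to explain.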
2. Consider the scattergram in Figure 1.2, which gives the distribution in a hypothetical
population of heights of boys measured at fixed ages. Corresponding to any given age, we
have a range, or distribution, of heights. Obviously, not all boys of a given age are likely to
have identical heights. But height on the average increases with age (of course, up to a

FIGURE 1.1 Hypothetical distribution of sons’ heights corresponding to given heights of fathers. [Scatter diagram: father’s height in inches (60–75) on the horizontal axis, son’s height in inches (60–75) on the vertical axis; circled crosses mark the mean son’s height at each father’s height, connected by an upward-sloping line.]

3. At this stage of the development of the subject matter, we shall call this regression line simply the line connecting the mean, or average, value of the dependent variable (son’s height) corresponding to the given value of the explanatory variable (father’s height). Note that this line has a positive slope but the slope is less than 1, which is in conformity with Galton’s regression to mediocrity. (Why?)

FIGURE 1.2 Hypothetical distribution of heights corresponding to selected ages. [Scatter diagram: age in years (10–14) on the horizontal axis, height in inches (40–70) on the vertical axis; circled points mark the mean height at each age.]

certain age), which can be seen clearly if we draw a line (the regression line) through the cir-
cled points that represent the average height at the given ages. Thus, knowing the age, we
may be able to predict from the regression line the average height corresponding to that age.
3. Turning to economic examples, an economist may be interested in studying the de-
pendence of personal consumption expenditure on aftertax or disposable real personal in-
come. Such an analysis may be helpful in estimating the marginal propensity to consume
(MPC), that is, average change in consumption expenditure for, say, a dollar’s worth of
change in real income (see Figure 1.3).
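The MPC described here is simply the slope of a fitted line of consumption expenditure on disposable income. A minimal sketch, with invented figures in which the true MPC is set to 0.8:

```python
# Estimating a marginal propensity to consume (MPC) as the slope of a
# least-squares line of consumption on disposable income.
# The figures are hypothetical, constructed so the true MPC is 0.8.
import numpy as np

income = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])
consumption = (0.8 * income + 10
               + np.array([1., -2., 0.5, 1.5, -1., 0., 2., -1.5, 0.5, -1.]))

mpc, intercept = np.polyfit(income, consumption, 1)
# mpc estimates the average change in consumption per dollar of income
print(round(mpc, 2))
```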
4. A monopolist who can fix the price or output (but not both) may want to find out
the response of the demand for a product to changes in price. Such an experiment may
enable the estimation of the price elasticity (i.e., price responsiveness) of the demand for the
product and may help determine the most profitable price.
5. A labor economist may want to study the rate of change of money wages in relation to
the unemployment rate. The historical data are shown in the scattergram given in Figure 1.3.
The curve in Figure 1.3 is an example of the celebrated Phillips curve relating changes in the
money wages to the unemployment rate. Such a scattergram may enable the labor economist
to predict the average change in money wages given a certain unemployment rate. Such
knowledge may be helpful in stating something about the inflationary process in an econ-
omy, for increases in money wages are likely to be reflected in increased prices.
6. From monetary economics it is known that, other things remaining the same, the
higher the rate of inflation π, the lower the proportion k of their income that people would
want to hold in the form of money, as depicted in Figure 1.4. The slope of this line repre-
sents the change in k given a change in the inflation rate. A quantitative analysis of this
relationship will enable the monetary economist to predict the amount of money, as a
proportion of their income, that people would want to hold at various rates of inflation.
7. The marketing director of a company may want to know how the demand for the
company’s product is related to, say, advertising expenditure. Such a study will be of
considerable help in finding out the elasticity of demand with respect to advertising ex-
penditure, that is, the percent change in demand in response to, say, a 1 percent change in
the advertising budget. This knowledge may be helpful in determining the “optimum”
advertising budget.
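If one assumes a constant-elasticity demand relationship (an assumption for illustration, not something the example specifies), the advertising elasticity is the slope of log demand on log advertising expenditure. A sketch with invented figures:

```python
# Elasticity of demand with respect to advertising expenditure.
# Under a constant-elasticity relationship Q = A * X**e, the elasticity e
# is the slope of log(Q) on log(X). The figures below are invented, with
# the true elasticity set to 0.3.
import numpy as np

advertising = np.array([10., 20., 40., 80., 160.])
demand = 100.0 * advertising ** 0.3          # exact power law, for clarity

elasticity = np.polyfit(np.log(advertising), np.log(demand), 1)[0]
print(round(elasticity, 3))  # recovers 0.3: a 1% rise in advertising -> 0.3% rise in demand
```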

FIGURE 1.3 Hypothetical Phillips curve. [Downward-sloping curve: unemployment rate (%) on the horizontal axis, rate of change of money wages on the vertical axis.]

FIGURE 1.4 Money holding in relation to the inflation rate π. [Downward-sloping line: inflation rate π on the horizontal axis, k = Money/Income on the vertical axis.]

8. Finally, an agronomist may be interested in studying the dependence of a particular


crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and fertilizer. Such
a dependence analysis may enable the prediction or forecasting of the average crop yield,
given information about the explanatory variables.
The reader can supply scores of such examples of the dependence of one variable on one
or more other variables. The techniques of regression analysis discussed in this text are
specially designed to study such dependence among variables.

1.3 Statistical versus Deterministic Relationships


From the examples cited in Section 1.2, the reader will notice that in regression analysis
we are concerned with what is known as the statistical, not functional or deterministic,
dependence among variables, such as those of classical physics. In statistical relation-
ships among variables we essentially deal with random or stochastic4 variables, that is,
variables that have probability distributions. In functional or deterministic dependency,
on the other hand, we also deal with variables, but these variables are not random or
stochastic.
The dependence of crop yield on temperature, rainfall, sunshine, and fertilizer, for
example, is statistical in nature in the sense that the explanatory variables, although
certainly important, will not enable the agronomist to predict crop yield exactly because of
errors involved in measuring these variables as well as a host of other factors (variables)
that collectively affect the yield but may be difficult to identify individually. Thus, there is
bound to be some “intrinsic” or random variability in the dependent-variable crop yield that
cannot be fully explained no matter how many explanatory variables we consider.
In deterministic phenomena, on the other hand, we deal with relationships of the type, say, exhibited by Newton’s law of gravity, which states: Every particle in the universe attracts every other particle with a force directly proportional to the product of their masses and inversely proportional to the square of the distance between them. Symbolically, F = k(m₁m₂/r²), where F = force, m₁ and m₂ are the masses of the two particles, r = distance, and k = constant of proportionality. Another example is Ohm’s law, which states: For metallic conductors over a limited range of temperature the current C is proportional to the voltage V; that is, C = k₁V, where k₁ is the constant of proportionality. Other examples of such deterministic relationships are Boyle’s gas law, Kirchhoff’s law of electricity, and Newton’s law of motion.
In this text we are not concerned with such deterministic relationships. Of course, if there are errors of measurement, say, in the k of Newton’s law of gravity, the otherwise deterministic relationship becomes a statistical relationship. In this situation, force can be predicted only approximately from the given value of k (and m₁, m₂, and r), which contains errors. The variable F in this case becomes a random variable.
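The contrast can be made concrete: the gravity formula returns the same F on every evaluation, but once k (or any input) is measured with error, F varies from draw to draw and becomes a random variable. A sketch with arbitrary illustrative numbers:

```python
# Deterministic vs. statistical relationships.
# F = k * m1 * m2 / r**2 is exact; introducing measurement error in k
# turns F into a random variable.
import random

def gravity(k, m1, m2, r):
    return k * m1 * m2 / r**2

m1, m2, r, k = 5.0, 3.0, 2.0, 6.674e-11   # illustrative values

exact = gravity(k, m1, m2, r)             # same answer on every call
assert gravity(k, m1, m2, r) == exact

random.seed(1)
# 1% measurement error in k: five draws give five different forces
noisy = [gravity(k * (1 + random.gauss(0, 0.01)), m1, m2, r) for _ in range(5)]
print(exact, noisy)                       # noisy values scatter around exact
```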

1.4 Regression versus Causation


Although regression analysis deals with the dependence of one variable on other variables,
it does not necessarily imply causation. In the words of Kendall and Stuart, “A statistical
relationship, however strong and however suggestive, can never establish causal connec-
tion: our ideas of causation must come from outside statistics, ultimately from some theory
or other.”5

4. The word stochastic comes from the Greek word stokhos meaning “a bull’s eye.” The outcome of throwing darts on a dart board is a stochastic process, that is, a process fraught with misses.
5. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin Publishers, New York, vol. 2, 1961, chap. 26, p. 279.

In the crop-yield example cited previously, there is no statistical reason to assume that
rainfall does not depend on crop yield. The fact that we treat crop yield as dependent on
rainfall (among other things) is due to nonstatistical considerations: Common sense
suggests that the relationship cannot be reversed, for we cannot control rainfall by varying
crop yield.
In all the examples cited in Section 1.2 the point to note is that a statistical relationship
in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori
or theoretical considerations. Thus, in the third example cited, one can invoke economic
theory in saying that consumption expenditure depends on real income.6

1.5 Regression versus Correlation


Closely related to but conceptually very much different from regression analysis is
correlation analysis, where the primary objective is to measure the strength or degree of
linear association between two variables. The correlation coefficient, which we shall
study in detail in Chapter 3, measures this strength of (linear) association. For example, we
may be interested in finding the correlation (coefficient) between smoking and lung cancer,
between scores on statistics and mathematics examinations, between high school grades
and college grades, and so on. In regression analysis, as already noted, we are not primar-
ily interested in such a measure. Instead, we try to estimate or predict the average value of
one variable on the basis of the fixed values of other variables. Thus, we may want to know
whether we can predict the average score on a statistics examination by knowing a student’s
score on a mathematics examination.
Regression and correlation have some fundamental differences that are worth mention-
ing. In regression analysis there is an asymmetry in the way the dependent and explanatory
variables are treated. The dependent variable is assumed to be statistical, random, or sto-
chastic, that is, to have a probability distribution. The explanatory variables, on the other
hand, are assumed to have fixed values (in repeated sampling),7 which was made explicit in
the definition of regression given in Section 1.2. Thus, in Figure 1.2 we assumed that the
variable age was fixed at given levels and height measurements were obtained at these
levels. In correlation analysis, on the other hand, we treat any (two) variables symmetri-
cally; there is no distinction between the dependent and explanatory variables. After all, the
correlation between scores on mathematics and statistics examinations is the same as that
between scores on statistics and mathematics examinations. Moreover, both variables
are assumed to be random. As we shall see, most of the correlation theory is based on the
assumption of randomness of variables, whereas most of the regression theory to be
expounded in this book is conditional upon the assumption that the dependent variable is
stochastic but the explanatory variables are fixed or nonstochastic.8
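The asymmetry is easy to verify numerically: the correlation of X with Y equals that of Y with X, but the regression slope of Y on X generally differs from that of X on Y (the two slopes multiply to the squared correlation). A sketch with invented exam scores:

```python
# Correlation is symmetric; regression is not.
# The exam scores below are invented for illustration.
import numpy as np

math_score = np.array([55., 60., 65., 70., 75., 80., 85., 90.])
stat_score = np.array([58., 59., 68., 66., 78., 79., 84., 95.])

r_xy = np.corrcoef(math_score, stat_score)[0, 1]
r_yx = np.corrcoef(stat_score, math_score)[0, 1]
assert np.isclose(r_xy, r_yx)            # same either way round

slope_y_on_x = np.polyfit(math_score, stat_score, 1)[0]
slope_x_on_y = np.polyfit(stat_score, math_score, 1)[0]
# the two slopes differ unless the correlation is exactly +/-1
print(r_xy, slope_y_on_x, slope_x_on_y)
```

Note the identity slope(Y on X) × slope(X on Y) = r², which makes the asymmetry explicit.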

6. But as we shall see in Chapter 3, classical regression analysis is based on the assumption that the model used in the analysis is the correct model. Therefore, the direction of causality may be implicit in the model postulated.
7. It is crucial to note that the explanatory variables may be intrinsically stochastic, but for the purpose of regression analysis we assume that their values are fixed in repeated sampling (that is, X assumes the same values in various samples), thus rendering them in effect nonrandom or nonstochastic. But more on this in Chapter 3, Sec. 3.2.
8. In advanced treatment of econometrics, one can relax the assumption that the explanatory variables are nonstochastic (see introduction to Part 2).

1.6 Terminology and Notation


Before we proceed to a formal analysis of regression theory, let us dwell briefly on the
matter of terminology and notation. In the literature the terms dependent variable and
explanatory variable are described variously. A representative list is:

Dependent variable        Explanatory variable
Explained variable        Independent variable
Predictand                Predictor
Regressand                Regressor
Response                  Stimulus
Endogenous                Exogenous
Outcome                   Covariate
Controlled variable       Control variable

Although it is a matter of personal taste and tradition, in this text we will use the dependent
variable/explanatory variable or the more neutral regressand and regressor terminology.
If we are studying the dependence of a variable on only a single explanatory variable,
such as that of consumption expenditure on real income, such a study is known as simple,
or two-variable, regression analysis. However, if we are studying the dependence of one
variable on more than one explanatory variable, as in the crop-yield, rainfall, temperature,
sunshine, and fertilizer example, it is known as multiple regression analysis. In other
words, in two-variable regression there is only one explanatory variable, whereas in multi-
ple regression there is more than one explanatory variable.
The term random is a synonym for the term stochastic. As noted earlier, a random or
stochastic variable is a variable that can take on any set of values, positive or negative, with
a given probability.9
Unless stated otherwise, the letter Y will denote the dependent variable and the X’s (X₁, X₂, . . . , Xₖ) will denote the explanatory variables, Xₖ being the kth explanatory variable. The subscript i or t will denote the ith or the tth observation or value. Xₖᵢ (or Xₖₜ) will denote the ith (or tth) observation on variable Xₖ. N (or T) will denote the total number of observations or values in the population, and n (or t) the total number of observations in a sample. As a matter of convention, the observation subscript i will be used for
cross-sectional data (i.e., data collected at one point in time) and the subscript t will be
used for time series data (i.e., data collected over a period of time). The nature of cross-
sectional and time series data, as well as the important topic of the nature and sources of
data for empirical analysis, is discussed in the following section.

9. See Appendix A for formal definition and further details.

1.7 The Nature and Sources of Data for Economic Analysis10


The success of any econometric analysis ultimately depends on the availability of the
appropriate data. It is therefore essential that we spend some time discussing the nature,
sources, and limitations of the data that one may encounter in empirical analysis.

Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., combination of time series and cross-section) data.

Time Series Data


The data shown in Table 1.1 of the Introduction are an example of time series data. A time
series is a set of observations on the values that a variable takes at different times. Such data
may be collected at regular time intervals, such as daily (e.g., stock prices, weather
reports), weekly (e.g., money supply figures), monthly (e.g., the unemployment rate, the
Consumer Price Index [CPI]), quarterly (e.g., GDP), annually (e.g., government
budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or
decennially, that is, every 10 years (e.g., the census of population). Sometimes data are
available both quarterly and annually, as in the case of the data on GDP and consumer
expenditure. With the advent of high-speed computers, data can now be collected over an
extremely short interval of time, such as the data on stock prices, which can be obtained
literally continuously (the so-called real-time quote).
Although time series data are used heavily in econometric studies, they present special
problems for econometricians. As we will show in chapters on time series econometrics
later on, most empirical work based on time series data assumes that the underlying time
series is stationary. Although it is too early to introduce the precise technical meaning of
stationarity at this juncture, loosely speaking, a time series is stationary if its mean and
variance do not vary systematically over time. To see what this means, consider Figure 1.5,
which depicts the behavior of the M1 money supply in the United States from January
1951 to September 1999. (The actual data are given in Exercise 1.4.) As you can see from
this figure, the M1 money supply shows a steady upward trend as well as variability over
the years, suggesting that the M1 time series is not stationary.11 We will explore this topic
fully in Chapter 21.
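The rough check described in footnote 11 — comparing subperiod means and standard deviations — can be sketched as follows. The series here is simulated (a drifting random walk standing in for M1, and a flat series for contrast), not the actual data.

```python
# Rough stationarity check: split a series into subperiods and compare
# their means. A trending series (like M1) shows subperiod means that
# drift upward; a stationary one does not. Both series are simulated.
import numpy as np

rng = np.random.default_rng(42)
n = 480                                          # 40 years of monthly data
trending = np.cumsum(rng.normal(2.0, 5.0, n))    # random walk with drift
stationary = rng.normal(100.0, 5.0, n)           # constant mean and variance

def subperiod_means(x, k=4):
    # mean of each of k consecutive subperiods
    return [chunk.mean() for chunk in np.array_split(x, k)]

print(subperiod_means(trending))    # means tend to drift upward
print(subperiod_means(stationary))  # means stay near 100
```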

Cross-Section Data
Cross-section data are data on one or more variables collected at the same point in time,
such as the census of population conducted by the Census Bureau every 10 years (the lat-
est being in year 2000), the surveys of consumer expenditures conducted by the University
of Michigan, and, of course, the opinion polls by Gallup and umpteen other organizations.
A concrete example of cross-sectional data is given in Table 1.1. This table gives data on
egg production and egg prices for the 50 states in the union for 1990 and 1991. For each

10
For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and
Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, chap. 3.
11
To see this more clearly, we divided the data into four time periods: 1951:01 to 1962:12; 1963:01
to 1974:12; 1975:01 to 1986:12, and 1987:01 to 1999:09: For these subperiods the mean values of
the money supply (with corresponding standard deviations in parentheses) were, respectively, 165.88
(23.27), 323.20 (72.66), 788.12 (195.43), and 1099 (27.84), all figures in billions of dollars. This is a
rough indication of the fact that the money supply over the entire period was not stationary.

FIGURE 1.5 M1 money supply: United States, 1951:01–1999:09. [Time series plot: years (55–95 on the horizontal axis), M1 in billions of dollars (0–1200) on the vertical axis; the series trends steadily upward.]

year the data on the 50 states are cross-sectional data. Thus, in Table 1.1 we have two cross-
sectional samples.
Just as time series data create their own special problems (because of the stationarity
issue), cross-sectional data too have their own problems, specifically the problem of hetero-
geneity. From the data given in Table 1.1 we see that we have some states that produce huge
amounts of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska). When
we include such heterogeneous units in a statistical analysis, the size or scale effect must be
taken into account so as not to mix apples with oranges. To see this clearly, we plot in Fig-
ure 1.6 the data on eggs produced and their prices in 50 states for the year 1990. This figure
shows how widely scattered the observations are. In Chapter 11 we will see how the scale
effect can be an important factor in assessing relationships among economic variables.

Pooled Data
Pooled, or combined, data contain elements of both time series and cross-section data. The
data in Table 1.1 are an example of pooled data. For each year we have 50 cross-sectional
observations and for each state we have two time series observations on prices and output
of eggs, a total of 100 pooled (or combined) observations. Likewise, the data given in
Exercise 1.1 are pooled data in that the Consumer Price Index (CPI) for each country
for 1980–2005 is time series data, whereas the data on the CPI for the seven countries
for a single year are cross-sectional data. In the pooled data we have 182 observations—
26 annual observations for each of the seven countries.

Panel, Longitudinal, or Micropanel Data


This is a special type of pooled data in which the same cross-sectional unit (say, a family or
a firm) is surveyed over time. For example, the U.S. Department of Commerce carries out
a census of housing at periodic intervals. At each periodic survey the same household
(or the people living at the same address) is interviewed to find out if there has been any
change in the housing and financial conditions of that household since the last survey. By
interviewing the same household periodically, the panel data provide very useful informa-
tion on the dynamics of household behavior, as we shall see in Chapter 16.

FIGURE 1.6 Relationship between eggs produced and prices, 1990. [Scatter diagram: number of eggs produced in millions (0–8000) on the horizontal axis, price of eggs per dozen in cents (40–160) on the vertical axis; the observations are widely scattered.]

TABLE 1.1 U.S. Egg Production


State Y1 Y2 X1 X2 State Y1 Y2 X1 X2
AL 2,206 2,186 92.7 91.4 MT 172 164 68.0 66.0
AK 0.7 0.7 151.0 149.0 NE 1,202 1,400 50.3 48.9
AZ 73 74 61.0 56.0 NV 2.2 1.8 53.9 52.7
AR 3,620 3,737 86.3 91.8 NH 43 49 109.0 104.0
CA 7,472 7,444 63.4 58.4 NJ 442 491 85.0 83.0
CO 788 873 77.8 73.0 NM 283 302 74.0 70.0
CT 1,029 948 106.0 104.0 NY 975 987 68.1 64.0
DE 168 164 117.0 113.0 NC 3,033 3,045 82.8 78.7
FL 2,586 2,537 62.0 57.2 ND 51 45 55.2 48.0
GA 4,302 4,301 80.6 80.8 OH 4,667 4,637 59.1 54.7
HI 227.5 224.5 85.0 85.5 OK 869 830 101.0 100.0
ID 187 203 79.1 72.9 OR 652 686 77.0 74.6
IL 793 809 65.0 70.5 PA 4,976 5,130 61.0 52.0
IN 5,445 5,290 62.7 60.1 RI 53 50 102.0 99.0
IA 2,151 2,247 56.5 53.0 SC 1,422 1,420 70.1 65.9
KS 404 389 54.5 47.8 SD 435 602 48.0 45.8
KY 412 483 67.7 73.5 TN 277 279 71.0 80.7
LA 273 254 115.0 115.0 TX 3,317 3,356 76.7 72.6
ME 1,069 1,070 101.0 97.0 UT 456 486 64.0 59.0
MD 885 898 76.6 75.4 VT 31 30 106.0 102.0
MA 235 237 105.0 102.0 VA 943 988 86.3 81.2
MI 1,406 1,396 58.0 53.8 WA 1,287 1,313 74.1 71.5
MN 2,499 2,697 57.7 54.0 WV 136 174 104.0 109.0
MS 1,434 1,468 87.8 86.7 WI 910 873 60.1 54.0
MO 1,580 1,622 55.4 51.5 WY 1.7 1.7 83.0 83.0

Note: Y1 = eggs produced in 1990 (millions); Y2 = eggs produced in 1991 (millions); X1 = price per dozen (cents) in 1990; X2 = price per dozen (cents) in 1991.
Source: World Almanac, 1993, p. 119. The data are from the Economic Research Service, U.S. Department of Agriculture.

As a concrete example, consider the data given in Table 1.2. The data in the table, orig-
inally collected by Y. Grunfeld, refer to the real investment, the real value of the firm, and
the real capital stock of four U.S. companies, namely, General Electric (GE), U.S. Steel
(US), General Motors (GM), and Westinghouse (WEST), for the period 1935–1954.12
Since the data are for several companies collected over a number of years, this is a classic
example of panel data. In this table, the number of observations for each company is the
same, but this is not always the case. If all the companies have the same number of obser-
vations, we have what is called a balanced panel. If the number of observations is not the
same for each company, it is called an unbalanced panel. In Chapter 16, Panel Data
Regression Models, we will examine such data and show how to estimate such models.
Grunfeld’s purpose in collecting these data was to find out how real gross investment (I)
depends on the real value of the firm (F) a year earlier and real capital stock (C) a year
earlier. Since the companies included in the sample operate in the same capital market, by
studying them together, Grunfeld wanted to find out if they had similar investment functions.
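A balanced panel like Table 1.2 — every cross-sectional unit observed the same number of times — can be represented and checked as below. The miniature panel uses a few gross-investment values from the GE and US columns of Table 1.2; the `is_balanced` helper is an illustrative construction, not a standard library routine.

```python
# A panel is balanced when every cross-sectional unit (here, a firm)
# has the same number of time observations. The investment figures are
# taken from the GE and US columns of Table 1.2 (1935-1937).
from collections import Counter

panel = [
    ("GE", 1935, 33.1), ("GE", 1936, 45.0), ("GE", 1937, 77.2),
    ("US", 1935, 209.9), ("US", 1936, 355.3), ("US", 1937, 469.9),
]

def is_balanced(rows):
    # count observations per firm; balanced iff all counts are equal
    counts = Counter(firm for firm, year, inv in rows)
    return len(set(counts.values())) == 1

print(is_balanced(panel))                         # True: 3 obs per firm
print(is_balanced(panel + [("GE", 1938, 44.6)]))  # False: unbalanced
```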

The Sources of Data13


The data used in empirical analysis may be collected by a governmental agency (e.g., the
Department of Commerce), an international agency (e.g., the International Monetary Fund
[IMF] or the World Bank), a private organization (e.g., the Standard & Poor’s Corporation), or
an individual. Literally, there are thousands of such agencies collecting data for one purpose
or another.
The Internet
The Internet has literally revolutionized data gathering. If you just “surf the net” with a
keyword (e.g., exchange rates), you will be swamped with all kinds of data sources. In
Appendix E we provide some of the frequently visited websites that provide economic and
financial data of all sorts. Most of the data can be downloaded without much cost. You may
want to bookmark the various websites that might provide you with useful economic data.
The data collected by various agencies may be experimental or nonexperimental.
In experimental data, often collected in the natural sciences, the investigator may want to
collect data while holding certain factors constant in order to assess the impact of some
factors on a given phenomenon. For instance, in assessing the impact of obesity on blood
pressure, the researcher would want to collect data while holding constant the eating,
smoking, and drinking habits of the people in order to minimize the influence of these
variables on blood pressure.
In the social sciences, the data that one generally encounters are nonexperimental in
nature, that is, not subject to the control of the researcher.14 For example, the data on GNP,
unemployment, stock prices, etc., are not directly under the control of the investigator. As we
shall see, this lack of control often creates special problems for the researcher in pinning
down the exact cause or causes affecting a particular situation. For example, is it the money
supply that determines the (nominal) GDP or is it the other way around?

12. Y. Grunfeld, “The Determinants of Corporate Investment,” unpublished PhD thesis, Department of Economics, University of Chicago, 1958. These data have become a workhorse for illustrating panel data regression models.
13. For an illuminating account, see Albert T. Somers, The U.S. Economy Demystified: What the Major Economic Statistics Mean and their Significance for Business, D.C. Heath, Lexington, Mass., 1985.
14. In the social sciences too sometimes one can have a controlled experiment. An example is given in Exercise 1.6.

TABLE 1.2 Investment Data for Four Companies, 1935–1954


                 GE                                        US
Observation     I       F−1      C−1      Observation     I       F−1      C−1
1935 33.1 1170.6 97.8 1935 209.9 1362.4 53.8
1936 45.0 2015.8 104.4 1936 355.3 1807.1 50.5
1937 77.2 2803.3 118.0 1937 469.9 2673.3 118.1
1938 44.6 2039.7 156.2 1938 262.3 1801.9 260.2
1939 48.1 2256.2 172.6 1939 230.4 1957.3 312.7
1940 74.4 2132.2 186.6 1940 361.6 2202.9 254.2
1941 113.0 1834.1 220.9 1941 472.8 2380.5 261.4
1942 91.9 1588.0 287.8 1942 445.6 2168.6 298.7
1943 61.3 1749.4 319.9 1943 361.6 1985.1 301.8
1944 56.8 1687.2 321.3 1944 288.2 1813.9 279.1
1945 93.6 2007.7 319.6 1945 258.7 1850.2 213.8
1946 159.9 2208.3 346.0 1946 420.3 2067.7 232.6
1947 147.2 1656.7 456.4 1947 420.5 1796.7 264.8
1948 146.3 1604.4 543.4 1948 494.5 1625.8 306.9
1949 98.3 1431.8 618.3 1949 405.1 1667.0 351.1
1950 93.5 1610.5 647.4 1950 418.8 1677.4 357.8
1951 135.2 1819.4 671.3 1951 588.2 2289.5 341.1
1952 157.3 2079.7 726.1 1952 645.2 2159.4 444.2
1953 179.5 2371.6 800.3 1953 641.0 2031.3 623.6
1954 189.6 2759.9 888.9 1954 459.3 2115.5 669.7
GM WEST
1935 317.6 3078.5 2.8 1935 12.93 191.5 1.8
1936 391.8 4661.7 52.6 1936 25.90 516.0 0.8
1937 410.6 5387.1 156.9 1937 35.05 729.0 7.4
1938 257.7 2792.2 209.2 1938 22.89 560.4 18.1
1939 330.8 4313.2 203.4 1939 18.84 519.9 23.5
1940 461.2 4643.9 207.2 1940 28.57 628.5 26.5
1941 512.0 4551.2 255.2 1941 48.51 537.1 36.2
1942 448.0 3244.1 303.7 1942 43.34 561.2 60.8
1943 499.6 4053.7 264.1 1943 37.02 617.2 84.4
1944 547.5 4379.3 201.6 1944 37.81 626.7 91.2
1945 561.2 4840.9 265.0 1945 39.27 737.2 92.4
1946 688.1 4900.0 402.2 1946 53.46 760.5 86.0
1947 568.9 3526.5 761.5 1947 55.56 581.4 111.1
1948 529.2 3245.7 922.4 1948 49.56 662.3 130.6
1949 555.1 3700.2 1020.1 1949 32.04 583.8 141.8
1950 642.9 3755.6 1099.0 1950 32.24 635.2 136.7
1951 755.9 4833.0 1207.7 1951 54.38 732.8 129.7
1952 891.2 4924.9 1430.5 1952 71.78 864.1 145.5
1953 1304.4 6241.7 1777.3 1953 90.08 1193.5 174.8
1954 1486.7 5593.6 2226.3 1954 68.60 1188.9 213.5

Notes: Y = I = gross investment = additions to plant and equipment plus maintenance and repairs, in millions of dollars deflated by P1.
X2 = F = value of the firm = price of common and preferred shares at Dec. 31 (or average price of Dec. 31 and Jan. 31 of the following year) times
number of common and preferred shares outstanding plus total book value of debt at Dec. 31, in millions of dollars deflated by P2.
X3 = C = stock of plant and equipment = accumulated sum of net additions to plant and equipment deflated by P1 minus depreciation allowance
deflated by P3. In these definitions:
P1 = implicit price deflator of producers’ durable equipment (1947 = 100).
P2 = implicit price deflator of GNP (1947 = 100).
P3 = depreciation expense deflator = 10-year moving average of wholesale price index of metals and metal products (1947 = 100).
Source: Reproduced from H. D. Vinod and Aman Ullah, Recent Advances in Regression Methods, Marcel Dekker, New York, 1981, pp. 259–261.

Chapter 1 The Nature of Regression Analysis 27

The Accuracy of Data15


Although plenty of data are available for economic research, the quality of the data is often
not that good. There are several reasons for that.

1. As noted, most social science data are nonexperimental in nature. Therefore, there is the
possibility of observational errors, either of omission or commission.
2. Even in experimentally collected data, errors of measurement arise from approxima-
tions and roundoffs.
3. In questionnaire-type surveys, the problem of nonresponse can be serious; a researcher
is lucky to get a 40 percent response rate to a questionnaire. Analysis based on such a
partial response rate may not truly reflect the behavior of the 60 percent who did not re-
spond, thereby leading to what is known as (sample) selectivity bias. Then there is the
further problem that those who do respond to the questionnaire may not answer all the
questions, especially questions of a financially sensitive nature, thus leading to additional
selectivity bias.
4. The sampling methods used in obtaining the data may vary so widely that it is often dif-
ficult to compare the results obtained from the various samples.
5. Economic data are generally available at a highly aggregate level. For example, most
macrodata (e.g., GNP, employment, inflation, unemployment) are available for the econ-
omy as a whole or at the most for some broad geographical regions. Such highly aggre-
gated data may not tell us much about the individuals or microunits that may be the
ultimate object of study.
6. Because of confidentiality, certain data can be published only in highly aggregate form.
The IRS, for example, is not allowed by law to disclose data on individual tax returns;
it can only release some broad summary data. Therefore, if one wants to find out how
much individuals with a certain level of income spent on health care, one cannot do so
except at a very highly aggregate level. Such macroanalysis often fails to reveal the dy-
namics of the behavior of the microunits. Similarly, the Department of Commerce,
which conducts the census of business every 5 years, is not allowed to disclose infor-
mation on production, employment, energy consumption, research and development
expenditure, etc., at the firm level. It is therefore difficult to study the interfirm differences
on these items.

Because of all of these and many other problems, the researcher should always keep
in mind that the results of research are only as good as the quality of the data. There-
fore, if in given situations researchers find that the results of the research are “unsatisfac-
tory,” the cause may be not that they used the wrong model but that the quality of the data
was poor. Unfortunately, because of the nonexperimental nature of the data used in most
social science studies, researchers very often have no choice but to depend on the available
data. But they should always keep in mind that the data used may not be the best and should
try not to be too dogmatic about the results obtained from a given study, especially when
the quality of the data is suspect.

A Note on the Measurement Scales of Variables16


The variables that we will generally encounter fall into four broad categories: ratio scale,
interval scale, ordinal scale, and nominal scale. It is important that we understand each.

15 For a critical review, see O. Morgenstern, The Accuracy of Economic Observations, 2d ed., Princeton
University Press, Princeton, N.J., 1963.
16 The following discussion relies heavily on Aris Spanos, Probability Theory and Statistical Inference:
Econometric Modeling with Observational Data, Cambridge University Press, New York, 1999, p. 24.

Ratio Scale
For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 − X1)
are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the
values along the scale. Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful.
Most economic variables belong to this category. Thus, it is meaningful to ask how big
this year’s GDP is compared with the previous year’s GDP. Personal income, measured
in dollars, is a ratio variable; someone earning $100,000 is making twice as much as an-
other person earning $50,000 (before taxes are assessed, of course!).
Interval Scale
An interval scale variable satisfies the last two properties of the ratio scale variable but not
the first. Thus, the distance between two time periods, say (2000 − 1995), is meaningful, but
not the ratio of two time periods (2000/1995). At 11:00 a.m. PST on August 11, 2007,
Portland, Oregon, reported a temperature of 60 degrees Fahrenheit while Tallahassee,
Florida, reached 90 degrees. Temperature is not measured on a ratio scale since it does not
make sense to claim that Tallahassee was 50 percent warmer than Portland. This is mainly
due to the fact that the Fahrenheit scale does not use 0 degrees as a natural base.
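The claim about Fahrenheit can be verified numerically: a ratio of two temperatures changes when the units change, whereas a difference merely rescales by a fixed factor. A minimal Python sketch using the two temperatures quoted above:

```python
# Interval-scale check: Fahrenheit ratios are not meaningful because the
# zero point of the scale is arbitrary. Converting to Celsius changes the
# ratio but rescales the difference consistently.

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

portland_f, tallahassee_f = 60.0, 90.0
portland_c, tallahassee_c = f_to_c(portland_f), f_to_c(tallahassee_f)

ratio_f = tallahassee_f / portland_f   # 1.5 ("50 percent warmer"?)
ratio_c = tallahassee_c / portland_c   # about 2.07 -- a different answer!

diff_f = tallahassee_f - portland_f    # 30 degrees F
diff_c = tallahassee_c - portland_c    # 30 * 5/9, about 16.67 degrees C

print(ratio_f, ratio_c, diff_f, diff_c)
```

The ratio depends on the units chosen, so the "50 percent warmer" claim has no scale-free meaning; the difference, by contrast, simply rescales by the conversion factor 5/9.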
Ordinal Scale
A variable belongs to this category only if it satisfies the third property of the ratio scale
(i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class
(upper, middle, lower). For these variables the ordering exists but the distances between the
categories cannot be quantified. Students of economics will recall the indifference curves
between two goods. Each higher indifference curve indicates a higher level of utility, but
one cannot quantify by how much one indifference curve is higher than the others.
Nominal Scale
Variables in this category have none of the features of the ratio scale variables. Variables
such as gender (male, female) and marital status (married, unmarried, divorced, separated)
simply denote categories. Question: why can such variables not be expressed on the ratio,
interval, or ordinal scales?
As we shall see, econometric techniques that may be suitable for ratio scale variables
may not be suitable for nominal scale variables. Therefore, it is important to bear in mind
the distinctions among the four types of measurement scales discussed above.
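One practical consequence, previewed here: nominal variables typically enter regression models through dummy (indicator) variables, since any single numeric coding would impose an ordering and spacing that the categories do not possess. A sketch using only the Python standard library (the helper `one_hot` is illustrative, not from the text):

```python
# One-hot (dummy) encoding of a nominal variable. Assigning codes such as
# married=1, unmarried=2, divorced=3 would wrongly imply both an ordering
# and a distance between categories; 0/1 indicators avoid that.

def one_hot(values, categories):
    """Return a list of 0/1 dummy vectors, one per observation."""
    return [[1 if v == c else 0 for c in categories] for v in values]

categories = ["married", "unmarried", "divorced", "separated"]
sample = ["married", "divorced", "married", "separated"]

dummies = one_hot(sample, categories)
print(dummies)
# Each row contains exactly one 1; no ordering among categories is implied.
```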

Summary and Conclusions

1. The key idea behind regression analysis is the statistical dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables.
2. The objective of such analysis is to estimate and/or predict the mean or average value of the
dependent variable on the basis of the known or fixed values of the explanatory variables.
3. In practice the success of regression analysis depends on the availability of the appro-
priate data. This chapter discussed the nature, sources, and limitations of the data that
are generally available for research, especially in the social sciences.
4. In any research, the researcher should clearly state the sources of the data used in
the analysis, their definitions, their methods of collection, and any gaps or omissions
in the data as well as any revisions in the data. Keep in mind that the macroeconomic
data published by the government are often revised.
5. Since the reader may not have the time, energy, or resources to track down the data, the
reader has the right to presume that the data used by the researcher have been properly
gathered and that the computations and analysis are correct.

EXERCISES

1.1. Table 1.3 gives data on the Consumer Price Index (CPI) for seven industrialized
countries with 1982–1984 = 100 as the base of the index.
a. From the given data, compute the inflation rate for each country.17
b. Plot the inflation rate for each country against time (i.e., use the horizontal axis for
time and the vertical axis for the inflation rate).
c. What broad conclusions can you draw about the inflation experience in the seven
countries?
d. Which country’s inflation rate seems to be most variable? Can you offer any
explanation?
1.2. a. Using Table 1.3, plot the inflation rate of Canada, France, Germany, Italy, Japan,
and the United Kingdom against the United States inflation rate.
b. Comment generally about the behavior of the inflation rate in the six countries
vis-à-vis the U.S. inflation rate.
c. If you find that the six countries’ inflation rates move in the same direction as the
U.S. inflation rate, would that suggest that U.S. inflation “causes” inflation in the
other countries? Why or why not?

TABLE 1.3 CPI in Seven Industrial Countries, 1980–2005 (1982–1984 = 100)
Source: Economic Report of the President, 2007, Table 108, p. 354.

Year U.S. Canada Japan France Germany Italy U.K.
1980 82.4 76.1 91.0 72.2 86.7 63.9 78.5
1981 90.9 85.6 95.3 81.8 92.2 75.5 87.9
1982 96.5 94.9 98.1 91.7 97.0 87.8 95.4
1983 99.6 100.4 99.8 100.3 100.3 100.8 99.8
1984 103.9 104.7 102.1 108.0 102.7 111.4 104.8
1985 107.6 109.0 104.2 114.3 104.8 121.7 111.1
1986 109.6 113.5 104.9 117.2 104.6 128.9 114.9
1987 113.6 118.4 104.9 121.1 104.9 135.1 119.7
1988 118.3 123.2 105.6 124.3 106.3 141.9 125.6
1989 124.0 129.3 108.0 128.7 109.2 150.7 135.4
1990 130.7 135.5 111.4 132.9 112.2 160.4 148.2
1991 136.2 143.1 115.0 137.2 116.3 170.5 156.9
1992 140.3 145.3 117.0 140.4 122.2 179.5 162.7
1993 144.5 147.9 118.5 143.4 127.6 187.7 165.3
1994 148.2 148.2 119.3 145.8 131.1 195.3 169.3
1995 152.4 151.4 119.2 148.4 133.3 205.6 175.2
1996 156.9 153.8 119.3 151.4 135.3 213.8 179.4
1997 160.5 156.3 121.5 153.2 137.8 218.2 185.1
1998 163.0 157.8 122.2 154.2 139.1 222.5 191.4
1999 166.6 160.5 121.8 155.0 140.0 226.2 194.3
2000 172.2 164.9 121.0 157.6 142.0 231.9 200.1
2001 177.1 169.1 120.1 160.2 144.8 238.3 203.6
2002 179.9 172.9 119.0 163.3 146.7 244.3 207.0
2003 184.0 177.7 118.7 166.7 148.3 250.8 213.0
2004 188.9 181.0 118.7 170.3 150.8 256.3 219.4
2005 195.3 184.9 118.3 173.2 153.7 261.3 225.6

17 Subtract from the current year’s CPI the CPI from the previous year, divide the difference by the
previous year’s CPI, and multiply the result by 100. Thus, the inflation rate for Canada for 1981 is
[(85.6 − 76.1)/76.1] × 100 = 12.48% (approx.).
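The arithmetic in footnote 17 can be checked in a few lines of Python (the CPI figures are Canada's, from Table 1.3):

```python
# Inflation rate = percentage change in the CPI from the previous year.

def inflation_rate(cpi_current, cpi_previous):
    """Year-over-year inflation rate in percent."""
    return (cpi_current - cpi_previous) / cpi_previous * 100

# Canada, from Table 1.3: CPI was 76.1 in 1980 and 85.6 in 1981.
canada_1981 = inflation_rate(85.6, 76.1)
print(round(canada_1981, 2))  # 12.48, matching the footnote
```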

1.3. Table 1.4 gives the foreign exchange rates for nine industrialized countries for the
years 1985–2006. Except for the United Kingdom, the exchange rate is defined as
the units of foreign currency for one U.S. dollar; for the United Kingdom, it is defined
as the number of U.S. dollars for one U.K. pound.
a. Plot these exchange rates against time and comment on the general behavior of the
exchange rates over the given time period.
b. The dollar is said to appreciate if it can buy more units of a foreign currency.
Contrarily, it is said to depreciate if it buys fewer units of a foreign currency. Over
the time period 1985–2006, what has been the general behavior of the U.S. dollar?
Incidentally, look up any textbook on macroeconomics or international economics
to find out what factors determine the appreciation or depreciation of a currency.
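The appreciation/depreciation test described in part (b) amounts to checking the sign of the percentage change in the number of foreign-currency units per dollar (for the U.K., where the quote is inverted, the sign test flips). A small Python illustration using Japan's figures from Table 1.4:

```python
# For currencies quoted as foreign units per U.S. dollar, a positive change
# means the dollar appreciated (it buys more foreign currency); a negative
# change means it depreciated.

def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100

yen_1985, yen_2006 = 238.47, 116.31   # Japan, from Table 1.4
change = pct_change(yen_2006, yen_1985)

verdict = "appreciated" if change > 0 else "depreciated"
print(f"Dollar {verdict} against the yen by {abs(change):.1f}%")
```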
1.4. The data behind the M1 money supply in Figure 1.5 are given in Table 1.5. Can you
give reasons why the money supply has been increasing over the time period shown in
the table?
1.5. Suppose you were to develop an economic model of criminal activities, say, the hours
spent in criminal activities (e.g., selling illegal drugs). What variables would you con-
sider in developing such a model? See if your model matches the one developed by the
Nobel laureate economist Gary Becker.18

TABLE 1.4 Exchange Rates for Nine Countries: 1985–2006

South United
Year Australia Canada China P. R. Japan Mexico Korea Sweden Switzerland Kingdom
1985 0.7003 1.3659 2.9434 238.47 0.257 872.45 8.6032 2.4552 1.2974
1986 0.6709 1.3896 3.4616 168.35 0.612 884.60 7.1273 1.7979 1.4677
1987 0.7014 1.3259 3.7314 144.60 1.378 826.16 6.3469 1.4918 1.6398
1988 0.7841 1.2306 3.7314 128.17 2.273 734.52 6.1370 1.4643 1.7813
1989 0.7919 1.1842 3.7673 138.07 2.461 674.13 6.4559 1.6369 1.6382
1990 0.7807 1.1668 4.7921 145.00 2.813 710.64 5.9231 1.3901 1.7841
1991 0.7787 1.1460 5.3337 134.59 3.018 736.73 6.0521 1.4356 1.7674
1992 0.7352 1.2085 5.5206 126.78 3.095 784.66 5.8258 1.4064 1.7663
1993 0.6799 1.2902 5.7795 111.08 3.116 805.75 7.7956 1.4781 1.5016
1994 0.7316 1.3664 8.6397 102.18 3.385 806.93 7.7161 1.3667 1.5319
1995 0.7407 1.3725 8.3700 93.96 6.447 772.69 7.1406 1.1812 1.5785
1996 0.7828 1.3638 8.3389 108.78 7.600 805.00 6.7082 1.2361 1.5607
1997 0.7437 1.3849 8.3193 121.06 7.918 953.19 7.6446 1.4514 1.6376
1998 0.6291 1.4836 8.3008 130.99 9.152 1,400.40 7.9522 1.4506 1.6573
1999 0.6454 1.4858 8.2783 113.73 9.553 1,189.84 8.2740 1.5045 1.6172
2000 0.5815 1.4855 8.2784 107.80 9.459 1,130.90 9.1735 1.6904 1.5156
2001 0.5169 1.5487 8.2770 121.57 9.337 1,292.02 10.3425 1.6891 1.4396
2002 0.5437 1.5704 8.2771 125.22 9.663 1,250.31 9.7233 1.5567 1.5025
2003 0.6524 1.4008 8.2772 115.94 10.793 1,192.08 8.0787 1.3450 1.6347
2004 0.7365 1.3017 8.2768 108.15 11.290 1,145.24 7.3480 1.2428 1.8330
2005 0.7627 1.2115 8.1936 110.11 10.894 1,023.75 7.4710 1.2459 1.8204
2006 0.7535 1.1340 7.9723 116.31 10.906 954.32 7.3718 1.2532 1.8434

Source: Economic Report of the President, 2007, Table B–110, p. 356.

18 G. S. Becker, “Crime and Punishment: An Economic Approach,” Journal of Political Economy, vol. 76,
1968, pp. 169–217.

TABLE 1.5 Seasonally Adjusted M1 Supply: 1959:01–1999:07 (billions of dollars)
Source: Board of Governors, Federal Reserve Bank, USA.

1959:01 138.8900 139.3900 139.7400 139.6900 140.6800 141.1700
1959:07 141.7000 141.9000 141.0100 140.4700 140.3800 139.9500
1960:01 139.9800 139.8700 139.7500 139.5600 139.6100 139.5800
1960:07 140.1800 141.3100 141.1800 140.9200 140.8600 140.6900
1961:01 141.0600 141.6000 141.8700 142.1300 142.6600 142.8800
1961:07 142.9200 143.4900 143.7800 144.1400 144.7600 145.2000
1962:01 145.2400 145.6600 145.9600 146.4000 146.8400 146.5800
1962:07 146.4600 146.5700 146.3000 146.7100 147.2900 147.8200
1963:01 148.2600 148.9000 149.1700 149.7000 150.3900 150.4300
1963:07 151.3400 151.7800 151.9800 152.5500 153.6500 153.2900
1964:01 153.7400 154.3100 154.4800 154.7700 155.3300 155.6200
1964:07 156.8000 157.8200 158.7500 159.2400 159.9600 160.3000
1965:01 160.7100 160.9400 161.4700 162.0300 161.7000 162.1900
1965:07 163.0500 163.6800 164.8500 165.9700 166.7100 167.8500
1966:01 169.0800 169.6200 170.5100 171.8100 171.3300 171.5700
1966:07 170.3100 170.8100 171.9700 171.1600 171.3800 172.0300
1967:01 171.8600 172.9900 174.8100 174.1700 175.6800 177.0200
1967:07 178.1300 179.7100 180.6800 181.6400 182.3800 183.2600
1968:01 184.3300 184.7100 185.4700 186.6000 187.9900 189.4200
1968:07 190.4900 191.8400 192.7400 194.0200 196.0200 197.4100
1969:01 198.6900 199.3500 200.0200 200.7100 200.8100 201.2700
1969:07 201.6600 201.7300 202.1000 202.9000 203.5700 203.8800
1970:01 206.2200 205.0000 205.7500 206.7200 207.2200 207.5400
1970:07 207.9800 209.9300 211.8000 212.8800 213.6600 214.4100
1971:01 215.5400 217.4200 218.7700 220.0000 222.0200 223.4500
1971:07 224.8500 225.5800 226.4700 227.1600 227.7600 228.3200
1972:01 230.0900 232.3200 234.3000 235.5800 235.8900 236.6200
1972:07 238.7900 240.9300 243.1800 245.0200 246.4100 249.2500
1973:01 251.4700 252.1500 251.6700 252.7400 254.8900 256.6900
1973:07 257.5400 257.7600 257.8600 259.0400 260.9800 262.8800
1974:01 263.7600 265.3100 266.6800 267.2000 267.5600 268.4400
1974:07 269.2700 270.1200 271.0500 272.3500 273.7100 274.2000
1975:01 273.9000 275.0000 276.4200 276.1700 279.2000 282.4300
1975:07 283.6800 284.1500 285.6900 285.3900 286.8300 287.0700
1976:01 288.4200 290.7600 292.7000 294.6600 295.9300 296.1600
1976:07 297.2000 299.0500 299.6700 302.0400 303.5900 306.2500
1977:01 308.2600 311.5400 313.9400 316.0200 317.1900 318.7100
1977:07 320.1900 322.2700 324.4800 326.4000 328.6400 330.8700
1978:01 334.4000 335.3000 336.9600 339.9200 344.8600 346.8000
1978:07 347.6300 349.6600 352.2600 353.3500 355.4100 357.2800
1979:01 358.6000 359.9100 362.4500 368.0500 369.5900 373.3400
1979:07 377.2100 378.8200 379.2800 380.8700 380.8100 381.7700
1980:01 385.8500 389.7000 388.1300 383.4400 384.6000 389.4600
1980:07 394.9100 400.0600 405.3600 409.0600 410.3700 408.0600
1981:01 410.8300 414.3800 418.6900 427.0600 424.4300 425.5000
1981:07 427.9000 427.8500 427.4600 428.4500 430.8800 436.1700
1982:01 442.1300 441.4900 442.3700 446.7800 446.5300 447.8900
1982:07 449.0900 452.4900 457.5000 464.5700 471.1200 474.3000
1983:01 476.6800 483.8500 490.1800 492.7700 499.7800 504.3500
1983:07 508.9600 511.6000 513.4100 517.2100 518.5300 520.7900
1984:01 524.4000 526.9900 530.7800 534.0300 536.5900 540.5400
1984:07 542.1300 542.3900 543.8600 543.8700 547.3200 551.1900

(Continued)

TABLE 1.5 (Continued)

1985:01 555.6600 562.4800 565.7400 569.5500 575.0700 583.1700
1985:07 590.8200 598.0600 604.4700 607.9100 611.8300 619.3600
1986:01 620.4000 624.1400 632.8100 640.3500 652.0100 661.5200
1986:07 672.2000 680.7700 688.5100 695.2600 705.2400 724.2800
1987:01 729.3400 729.8400 733.0100 743.3900 746.0000 743.7200
1987:07 744.9600 746.9600 748.6600 756.5000 752.8300 749.6800
1988:01 755.5500 757.0700 761.1800 767.5700 771.6800 779.1000
1988:07 783.4000 785.0800 784.8200 783.6300 784.4600 786.2600
1989:01 784.9200 783.4000 782.7400 778.8200 774.7900 774.2200
1989:07 779.7100 781.1400 782.2000 787.0500 787.9500 792.5700
1990:01 794.9300 797.6500 801.2500 806.2400 804.3600 810.3300
1990:07 811.8000 817.8500 821.8300 820.3000 822.0600 824.5600
1991:01 826.7300 832.4000 838.6200 842.7300 848.9600 858.3300
1991:07 862.9500 868.6500 871.5600 878.4000 887.9500 896.7000
1992:01 910.4900 925.1300 936.0000 943.8900 950.7800 954.7100
1992:07 964.6000 975.7100 988.8400 1004.340 1016.040 1024.450
1993:01 1030.900 1033.150 1037.990 1047.470 1066.220 1075.610
1993:07 1085.880 1095.560 1105.430 1113.800 1123.900 1129.310
1994:01 1132.200 1136.130 1139.910 1141.420 1142.850 1145.650
1994:07 1151.490 1151.390 1152.440 1150.410 1150.440 1149.750
1995:01 1150.640 1146.740 1146.520 1149.480 1144.650 1144.240
1995:07 1146.500 1146.100 1142.270 1136.430 1133.550 1126.730
1996:01 1122.580 1117.530 1122.590 1124.520 1116.300 1115.470
1996:07 1112.340 1102.180 1095.610 1082.560 1080.490 1081.340
1997:01 1080.520 1076.200 1072.420 1067.450 1063.370 1065.990
1997:07 1067.570 1072.080 1064.820 1062.060 1067.530 1074.870
1998:01 1073.810 1076.020 1080.650 1082.090 1078.170 1077.780
1998:07 1075.370 1072.210 1074.650 1080.400 1088.960 1093.350
1999:01 1091.000 1092.650 1102.010 1108.400 1104.750 1101.110
1999:07 1099.530 1102.400 1093.460

1.6. Controlled experiments in economics: On April 7, 2000, President Clinton signed into
law a bill passed by both Houses of the U.S. Congress that lifted earnings limitations
on Social Security recipients. Until then, recipients between the ages of 65 and 69 who
earned more than $17,000 a year would lose $1 worth of Social Security benefit for
every $3 of income earned in excess of $17,000. How would you devise a study to
assess the impact of this change in the law? Note: There was no income limitation for
recipients over the age of 70 under the old law.
1.7. The data presented in Table 1.6 were published in the March 1, 1984, issue of The Wall
Street Journal. They relate to the advertising budget (in millions of dollars) of 21 firms
for 1983 and millions of impressions retained per week by the viewers of the products
of these firms. The data are based on a survey of 4000 adults in which users of the
products were asked to cite a commercial they had seen for the product category in the
past week.
a. Plot impressions on the vertical axis and advertising expenditure on the horizontal
axis.
b. What can you say about the nature of the relationship between the two variables?
c. Looking at your graph, do you think it pays to advertise? Think about all those
commercials shown on Super Bowl Sunday or during the World Series.
Note: We will explore further the data given in Table 1.6 in subsequent chapters.
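Although the exercise asks only for a plot, the direction of the relationship in part (b) can be previewed numerically with the sample correlation coefficient. A sketch in Python using the 21 (impressions, expenditure) pairs from Table 1.6 (standard library only; this is an illustration, not part of the original exercise):

```python
import math

# (impressions in millions, expenditure in millions of 1983 dollars),
# the 21 firms of Table 1.6.
data = [
    (32.1, 50.1), (99.6, 74.1), (11.7, 19.3), (21.9, 22.9), (60.8, 82.4),
    (78.6, 40.1), (92.4, 185.9), (50.7, 26.9), (21.4, 20.4), (40.1, 166.2),
    (40.8, 27.0), (10.4, 45.6), (88.9, 154.9), (12.0, 5.0), (29.2, 49.7),
    (38.0, 26.9), (10.0, 5.7), (12.3, 7.6), (23.4, 9.2), (71.1, 32.4),
    (4.4, 6.1),
]

def pearson_r(pairs):
    """Sample Pearson correlation coefficient of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(data)
print(f"r = {r:.2f}")  # positive: bigger ad budgets go with more impressions
```

A positive r agrees with what the scatter plot in part (a) should suggest, though correlation alone does not settle whether it "pays" to advertise.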

TABLE 1.6 Impact of Advertising Expenditure
Source: http://lib.stat.cmu.edu/DASL/Datafiles/tvadsdat.html.

Firm                  Impressions, millions    Expenditure, millions of 1983 dollars
1. Miller Lite 32.1 50.1
2. Pepsi 99.6 74.1
3. Stroh’s 11.7 19.3
4. Fed’l Express 21.9 22.9
5. Burger King 60.8 82.4
6. Coca-Cola 78.6 40.1
7. McDonald’s 92.4 185.9
8. MCI 50.7 26.9
9. Diet Cola 21.4 20.4
10. Ford 40.1 166.2
11. Levi’s 40.8 27.0
12. Bud Lite 10.4 45.6
13. ATT/Bell 88.9 154.9
14. Calvin Klein 12.0 5.0
15. Wendy’s 29.2 49.7
16. Polaroid 38.0 26.9
17. Shasta 10.0 5.7
18. Meow Mix 12.3 7.6
19. Oscar Meyer 23.4 9.2
20. Crest 71.1 32.4
21. Kibbles ‘N Bits 4.4 6.1
