Chapter 1
The Nature of Regression Analysis
As mentioned in the Introduction, regression is a main tool of econometrics, and in this
chapter we consider very briefly the nature of this tool.
1
Francis Galton, “Family Likeness in Stature,” Proceedings of Royal Society, London, vol. 40, 1886,
pp. 42–72.
2
K. Pearson and A. Lee, “On the Laws of Inheritance,” Biometrika, vol. 2, Nov. 1903, pp. 357–462.
guj75772_ch01.qxd 31/07/2008 11:00 AM Page 16
The full import of this view of regression analysis will become clearer as we progress, but
a few simple examples will make the basic concept quite clear.
Examples
1. Reconsider Galton’s law of universal regression. Galton was interested in finding out
why there was a stability in the distribution of heights in a population. But in the modern
view our concern is not with this explanation but rather with finding out how the average
height of sons changes, given the fathers’ height. In other words, our concern is with pre-
dicting the average height of sons knowing the height of their fathers. To see how this can
be done, consider Figure 1.1, which is a scatter diagram, or scattergram. This figure
shows the distribution of heights of sons in a hypothetical population corresponding to the
given or fixed values of the father’s height. Notice that corresponding to any given height of
a father is a range or distribution of the heights of the sons. However, notice that despite the
variability of the height of sons for a given value of father’s height, the average height of
sons generally increases as the height of the father increases. To show this clearly, the cir-
cled crosses in the figure indicate the average height of sons corresponding to a given
height of the father. Connecting these averages, we obtain the line shown in the figure. This
line, as we shall see, is known as the regression line. It shows how the average height of
sons increases with the father’s height.3
2. Consider the scattergram in Figure 1.2, which gives the distribution in a hypothetical
population of heights of boys measured at fixed ages. Corresponding to any given age, we
have a range, or distribution, of heights. Obviously, not all boys of a given age are likely to
have identical heights. But height on the average increases with age (of course, up to a
certain age), which can be seen clearly if we draw a line (the regression line) through the
circled points that represent the average height at the given ages. Thus, knowing the age, we
may be able to predict from the regression line the average height corresponding to that age.
FIGURE 1.1 Hypothetical distribution of sons’ heights corresponding to given heights of fathers. (Horizontal axis: Father's height, inches; vertical axis: Son's height, inches; circled crosses mark mean values.)
FIGURE 1.2 (Horizontal axis: Age, years; vertical axis: Height, inches.)
3
At this stage of the development of the subject matter, we shall call this regression line simply the
line connecting the mean, or average, value of the dependent variable (son’s height) corresponding to
the given value of the explanatory variable (father’s height). Note that this line has a positive slope but
the slope is less than 1, which is in conformity with Galton’s regression to mediocrity. (Why?)
3. Turning to economic examples, an economist may be interested in studying the
dependence of personal consumption expenditure on after-tax or disposable real personal
income. Such an analysis may be helpful in estimating the marginal propensity to consume
(MPC), that is, the average change in consumption expenditure for, say, a dollar’s worth of
change in real income (see Figure 1.3).
4. A monopolist who can fix the price or output (but not both) may want to find out
the response of the demand for a product to changes in price. Such an experiment may
enable the estimation of the price elasticity (i.e., price responsiveness) of the demand for the
product and may help determine the most profitable price.
5. A labor economist may want to study the rate of change of money wages in relation to
the unemployment rate. The historical data are shown in the scattergram given in Figure 1.3.
The curve in Figure 1.3 is an example of the celebrated Phillips curve relating changes in the
money wages to the unemployment rate. Such a scattergram may enable the labor economist
to predict the average change in money wages given a certain unemployment rate. Such
knowledge may be helpful in stating something about the inflationary process in an econ-
omy, for increases in money wages are likely to be reflected in increased prices.
6. From monetary economics it is known that, other things remaining the same, the
higher the rate of inflation π, the lower the proportion k of their income that people would
want to hold in the form of money, as depicted in Figure 1.4. The slope of this line repre-
sents the change in k given a change in the inflation rate. A quantitative analysis of this
relationship will enable the monetary economist to predict the amount of money, as a
proportion of their income, that people would want to hold at various rates of inflation.
7. The marketing director of a company may want to know how the demand for the
company’s product is related to, say, advertising expenditure. Such a study will be of
considerable help in finding out the elasticity of demand with respect to advertising ex-
penditure, that is, the percent change in demand in response to, say, a 1 percent change in
the advertising budget. This knowledge may be helpful in determining the “optimum”
advertising budget.
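Example 1 describes the regression line as the line connecting the average heights of sons at each given height of the father. The following is a minimal sketch of that idea, assuming a small set of made-up (father, son) height pairs; the numbers and the function name `conditional_means` are illustrative and do not come from Figure 1.1.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (father's height, son's height) pairs in inches.
# These values are invented for illustration only.
pairs = [
    (60, 61), (60, 63), (60, 65),
    (65, 64), (65, 66), (65, 68),
    (70, 67), (70, 69), (70, 71),
    (75, 70), (75, 72), (75, 74),
]

def conditional_means(observations):
    """Return {x: mean of y over all observations sharing that x}."""
    groups = defaultdict(list)
    for x, y in observations:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

means = conditional_means(pairs)
for father, avg_son in means.items():
    print(f"father = {father} in.   mean height of sons = {avg_son:.1f} in.")
```

In this invented sample the mean height of sons rises by 3 inches for every 5-inch increase in the father's height, so the line through the means has a positive slope that is less than 1.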
FIGURE 1.3 Hypothetical Phillips curve.
FIGURE 1.4 Money holding in relation to the inflation rate π. (Vertical axis: k = Money/Income; horizontal axis: Inflation rate π.)
4
The word stochastic comes from the Greek word stokhos meaning “a bull’s eye.” The outcome of
throwing darts on a dart board is a stochastic process, that is, a process fraught with misses.
5
M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin Publishers, New York,
vol. 2, 1961, chap. 26, p. 279.
In the crop-yield example cited previously, there is no statistical reason to assume that
rainfall does not depend on crop yield. The fact that we treat crop yield as dependent on
rainfall (among other things) is due to nonstatistical considerations: Common sense
suggests that the relationship cannot be reversed, for we cannot control rainfall by varying
crop yield.
In all the examples cited in Section 1.2 the point to note is that a statistical relationship
in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori
or theoretical considerations. Thus, in the third example cited, one can invoke economic
theory in saying that consumption expenditure depends on real income.6
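One way to see why a statistical relationship cannot by itself indicate the direction of dependence is that a measure of association such as the correlation coefficient is symmetric in the two variables. The sketch below assumes invented rainfall and crop-yield figures; the helper `correlation` is our own illustrative function, not something defined in the text.

```python
from statistics import mean

def correlation(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data, invented for illustration only.
rainfall = [10.0, 14.0, 18.0, 22.0, 26.0]
crop_yield = [2.1, 2.9, 3.2, 4.0, 4.3]

r_xy = correlation(rainfall, crop_yield)   # yield "explained by" rainfall
r_yx = correlation(crop_yield, rainfall)   # rainfall "explained by" yield
print(r_xy, r_yx)
```

The two numbers coincide, so the data alone give no reason to treat one variable rather than the other as dependent; that choice rests on nonstatistical considerations.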
6
But as we shall see in Chapter 3, classical regression analysis is based on the assumption that the
model used in the analysis is the correct model. Therefore, the direction of causality may be implicit
in the model postulated.
7
It is crucial to note that the explanatory variables may be intrinsically stochastic, but for the purpose
of regression analysis we assume that their values are fixed in repeated sampling (that is, X assumes
the same values in various samples), thus rendering them in effect nonrandom or nonstochastic. But
more on this in Chapter 3, Sec. 3.2.
8
In advanced treatment of econometrics, one can relax the assumption that the explanatory variables
are nonstochastic (see introduction to Part 2).
Although it is a matter of personal taste and tradition, in this text we will use the dependent
variable/explanatory variable or the more neutral regressand and regressor terminology.
If we are studying the dependence of a variable on only a single explanatory variable,
such as that of consumption expenditure on real income, such a study is known as simple,
or two-variable, regression analysis. However, if we are studying the dependence of one
variable on more than one explanatory variable, as in the crop-yield, rainfall, temperature,
sunshine, and fertilizer example, it is known as multiple regression analysis. In other
words, in two-variable regression there is only one explanatory variable, whereas in multi-
ple regression there is more than one explanatory variable.
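The distinction between two-variable and multiple regression can be sketched as follows. This is only an illustration under stated assumptions: the data are invented, the crop yield is generated without any random disturbance so that the coefficients can be checked by eye, and the function `ols` is our own small least-squares routine based on the normal equations, which the text has not yet introduced.

```python
def ols(y, regressors):
    """Return least-squares coefficients [intercept, b1, ..., bk]."""
    n = len(y)
    # Design matrix: a column of ones (intercept) followed by the regressors.
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: yield = 1 + 0.2 * rainfall + 0.5 * fertilizer, no disturbance.
rainfall = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
fertilizer = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
crop_yield = [1.0 + 0.2 * r + 0.5 * f for r, f in zip(rainfall, fertilizer)]

simple = ols(crop_yield, [rainfall])                # one explanatory variable
multiple = ols(crop_yield, [rainfall, fertilizer])  # two explanatory variables
print("two-variable regression:", simple)
print("multiple regression:    ", multiple)
```

With both explanatory variables included, the routine recovers the coefficients used to generate the data (up to rounding); with rainfall alone, the estimated slope on rainfall differs because fertilizer is left out.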
The term random is a synonym for the term stochastic. As noted earlier, a random or
stochastic variable is a variable that can take on any set of values, positive or negative, with
a given probability.9
Unless stated otherwise, the letter Y will denote the dependent variable and the X’s
($X_1, X_2, \ldots, X_k$) will denote the explanatory variables, $X_k$ being the kth explanatory
variable. The subscript i or t will denote the ith or the tth observation or value. $X_{ki}$ (or $X_{kt}$)
will denote the ith (or tth) observation on variable $X_k$. N (or T) will denote the total
number of observations or values in the population, and n (or t) the total number of obser-
vations in a sample. As a matter of convention, the observation subscript i will be used for
cross-sectional data (i.e., data collected at one point in time) and the subscript t will be
used for time series data (i.e., data collected over a period of time). The nature of cross-
sectional and time series data, as well as the important topic of the nature and sources of
data for empirical analysis, is discussed in the following section.
9
See Appendix A for formal definition and further details.
Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., combination of time series and cross-section) data.
Cross-Section Data
Cross-section data are data on one or more variables collected at the same point in time,
such as the census of population conducted by the Census Bureau every 10 years (the lat-
est being in year 2000), the surveys of consumer expenditures conducted by the University
of Michigan, and, of course, the opinion polls by Gallup and umpteen other organizations.
A concrete example of cross-sectional data is given in Table 1.1. This table gives data on
egg production and egg prices for the 50 states in the union for 1990 and 1991. For each
year the data on the 50 states are cross-sectional data. Thus, in Table 1.1 we have two
cross-sectional samples.
10
For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and
Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, chap. 3.
11
To see this more clearly, we divided the data into four time periods: 1951:01 to 1962:12; 1963:01
to 1974:12; 1975:01 to 1986:12; and 1987:01 to 1999:09. For these subperiods the mean values of
the money supply (with corresponding standard deviations in parentheses) were, respectively, 165.88
(23.27), 323.20 (72.66), 788.12 (195.43), and 1099 (27.84), all figures in billions of dollars. This is a
rough indication of the fact that the money supply over the entire period was not stationary.
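The rough check described in footnote 11, splitting a time series into subperiods and comparing their means and standard deviations, can be sketched as below. The monthly series here is invented (a simple upward trend) and is not the M1 data; the function name `subperiod_stats` and the use of the sample standard deviation are our assumptions.

```python
from statistics import mean, stdev

# Hypothetical monthly series keyed by (year, month), 1951:01 to 1954:12.
# The values follow an invented upward trend; they are NOT the M1 figures.
series = {(1951 + m // 12, m % 12 + 1): 100.0 + 2.0 * m for m in range(48)}

def subperiod_stats(data, periods):
    """Return {(start, end): (mean, sample std. dev.)} for each subperiod."""
    out = {}
    for start, end in periods:
        # (year, month) tuples compare chronologically.
        vals = [v for key, v in sorted(data.items()) if start <= key <= end]
        out[(start, end)] = (mean(vals), stdev(vals))
    return out

periods = [((1951, 1), (1952, 12)), ((1953, 1), (1954, 12))]
stats = subperiod_stats(series, periods)
for (start, end), (avg, sd) in stats.items():
    print(f"{start[0]}:{start[1]:02d} to {end[0]}:{end[1]:02d}  mean = {avg:.2f}  sd = {sd:.2f}")
```

When the subperiod means differ markedly, as they do for this trending series, that is a rough sign that the series is not stationary over the whole period.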
Just as time series data create their own special problems (because of the stationarity
issue), cross-sectional data too have their own problems, specifically the problem of hetero-
geneity. From the data given in Table 1.1 we see that we have some states that produce huge
amounts of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska). When
we include such heterogeneous units in a statistical analysis, the size or scale effect must be
taken into account so as not to mix apples with oranges. To see this clearly, we plot in Fig-
ure 1.6 the data on eggs produced and their prices in 50 states for the year 1990. This figure
shows how widely scattered the observations are. In Chapter 11 we will see how the scale
effect can be an important factor in assessing relationships among economic variables.
Pooled Data
In pooled, or combined, data are elements of both time series and cross-section data. The
data in Table 1.1 are an example of pooled data. For each year we have 50 cross-sectional
observations and for each state we have two time series observations on prices and output
of eggs, a total of 100 pooled (or combined) observations. Likewise, the data given in
Exercise 1.1 are pooled data in that the Consumer Price Index (CPI) for each country
for 1980–2005 is time series data, whereas the data on the CPI for the seven countries
for a single year are cross-sectional data. In the pooled data we have 182 observations—
26 annual observations for each of the seven countries.
FIGURE 1.6 (Horizontal axis: Number of eggs produced (millions).)
As a concrete example, consider the data given in Table 1.2. The data in the table, orig-
inally collected by Y. Grunfeld, refer to the real investment, the real value of the firm, and
the real capital stock of four U.S. companies, namely, General Electric (GE), U.S. Steel
(US), General Motors (GM), and Westinghouse (WEST), for the period 1935–1954.12
Since the data are for several companies collected over a number of years, this is a classic
example of panel data. In this table, the number of observations for each company is the
same, but this is not always the case. If all the companies have the same number of obser-
vations, we have what is called a balanced panel. If the number of observations is not the
same for each company, it is called an unbalanced panel. In Chapter 16, Panel Data
Regression Models, we will examine such data and show how to estimate such models.
Grunfeld’s purpose in collecting these data was to find out how real gross investment (I)
depends on the real value of the firm (F) a year earlier and real capital stock (C) a year
earlier. Since the companies included in the sample operate in the same capital market, by
studying them together, Grunfeld wanted to find out if they had similar investment functions.
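The definition of a balanced panel given above (the same number of observations for every company) is easy to check mechanically. The sketch below assumes a list of (company, year) keys built to mirror the layout of Table 1.2; it contains no actual values from the table, and the function name `is_balanced` is ours.

```python
from collections import Counter

# Hypothetical observation keys: four companies, each observed in 1935-1954.
firms = ("GE", "US", "GM", "WEST")
observations = [(firm, year) for firm in firms for year in range(1935, 1955)]

def is_balanced(obs):
    """True if every cross-sectional unit has the same number of observations."""
    counts = Counter(unit for unit, _ in obs)
    return len(set(counts.values())) == 1

print(len(observations), is_balanced(observations))

# Dropping the last three years for one company makes the panel unbalanced.
shortened = observations[:-3]
print(len(shortened), is_balanced(shortened))
```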
12
Y. Grunfeld, “The Determinants of Corporate Investment,” unpublished PhD thesis, Department of
Economics, University of Chicago, 1958. These data have become a workhorse for illustrating panel
data regression models.
13
For an illuminating account, see Albert T. Somers, The U.S. Economy Demystified: What the Major
Economic Statistics Mean and their Significance for Business, D.C. Heath, Lexington, Mass., 1985.
14
In the social sciences, too, one can sometimes have a controlled experiment. An example is given in
Exercise 1.6.
Notes: Y = I = gross investment = additions to plant and equipment plus maintenance and repairs, in millions of dollars deflated by P1.
X2 = F = value of the firm = price of common and preferred shares at Dec. 31 (or average price of Dec. 31 and Jan. 31 of the following year) times
number of common and preferred shares outstanding plus total book value of debt at Dec. 31, in millions of dollars deflated by P2.
X3 = C = stock of plant and equipment = accumulated sum of net additions to plant and equipment deflated by P1 minus depreciation allowance
deflated by P3 in these definitions.
P1 = implicit price deflator of producers’ durable equipment (1947 = 100).
P2 = implicit price deflator of GNP (1947 = 100).
P3 = depreciation expense deflator = 10-year moving average of wholesale price index of metals and metal products (1947 = 100).
Source: Reproduced from H. D. Vinod and Aman Ullah, Recent Advances in Regression Methods, Marcel Dekker, New York, 1981, pp. 259–261.
1. As noted, most social science data are nonexperimental in nature. Therefore, there is the
possibility of observational errors, either of omission or commission.
2. Even in experimentally collected data, errors of measurement arise from approxima-
tions and roundoffs.
3. In questionnaire-type surveys, the problem of nonresponse can be serious; a researcher
is lucky to get a 40 percent response rate to a questionnaire. Analysis based on such a
partial response rate may not truly reflect the behavior of the 60 percent who did not re-
spond, thereby leading to what is known as (sample) selectivity bias. Then there is the
further problem that those who do respond to the questionnaire may not answer all the
questions, especially questions of a financially sensitive nature, thus leading to additional
selectivity bias.
4. The sampling methods used in obtaining the data may vary so widely that it is often dif-
ficult to compare the results obtained from the various samples.
5. Economic data are generally available at a highly aggregate level. For example, most
macrodata (e.g., GNP, employment, inflation, unemployment) are available for the econ-
omy as a whole or at the most for some broad geographical regions. Such highly aggre-
gated data may not tell us much about the individuals or microunits that may be the
ultimate object of study.
6. Because of confidentiality, certain data can be published only in highly aggregate form.
The IRS, for example, is not allowed by law to disclose data on individual tax returns;
it can only release some broad summary data. Therefore, if one wants to find out how
much individuals with a certain level of income spent on health care, one cannot do so
except at a very highly aggregate level. Such macroanalysis often fails to reveal the dy-
namics of the behavior of the microunits. Similarly, the Department of Commerce,
which conducts the census of business every 5 years, is not allowed to disclose infor-
mation on production, employment, energy consumption, research and development
expenditure, etc., at the firm level. It is therefore difficult to study the interfirm differences
on these items.
Because of all of these and many other problems, the researcher should always keep
in mind that the results of research are only as good as the quality of the data. There-
fore, if in given situations researchers find that the results of the research are “unsatisfac-
tory,” the cause may be not that they used the wrong model but that the quality of the data
was poor. Unfortunately, because of the nonexperimental nature of the data used in most
social science studies, researchers very often have no choice but to depend on the available
data. But they should always keep in mind that the data used may not be the best and should
try not to be too dogmatic about the results obtained from a given study, especially when
the quality of the data is suspect.
15
For a critical review, see O. Morgenstern, The Accuracy of Economic Observations, 2d ed., Princeton
University Press, Princeton, N.J., 1963.
16
The following discussion relies heavily on Aris Spanos, Probability Theory and Statistical Inference:
Econometric Modeling with Observational Data, Cambridge University Press, New York, 1999, p. 24.
Ratio Scale
For a variable X taking two values, $X_1$ and $X_2$, the ratio $X_1/X_2$ and the distance $(X_2 - X_1)$
are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the
values along the scale. Therefore, comparisons such as $X_2 \le X_1$ or $X_2 \ge X_1$ are meaningful.
Most economic variables belong to this category. Thus, it is meaningful to ask how big
this year’s GDP is compared with the previous year’s GDP. Personal income, measured
in dollars, is a ratio variable; someone earning $100,000 is making twice as much as an-
other person earning $50,000 (before taxes are assessed, of course!).
Interval Scale
An interval scale variable satisfies the last two properties of the ratio scale variable but not
the first. Thus, the distance between two time periods, say (2000–1995) is meaningful, but
not the ratio of two time periods (2000/1995). At 11:00 a.m. PST on August 11, 2007,
Portland, Oregon, reported a temperature of 60 degrees Fahrenheit while Tallahassee,
Florida, reached 90 degrees. Temperature is not measured on a ratio scale since it does not
make sense to claim that Tallahassee was 50 percent warmer than Portland. This is mainly
due to the fact that the Fahrenheit scale does not use 0 degrees as a natural base.
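A short calculation may make the point concrete. The sketch below takes the two temperatures from the text and converts them to Celsius with the standard formula; the Celsius comparison is our own addition for illustration. If the ratio of two temperatures were meaningful, it would not change with the unit of measurement.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion from degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0

portland_f, tallahassee_f = 60.0, 90.0
portland_c = fahrenheit_to_celsius(portland_f)
tallahassee_c = fahrenheit_to_celsius(tallahassee_f)

ratio_f = tallahassee_f / portland_f   # ratio in Fahrenheit
ratio_c = tallahassee_c / portland_c   # ratio in Celsius
diff_f = tallahassee_f - portland_f    # distance in Fahrenheit
diff_c = tallahassee_c - portland_c    # distance in Celsius

print(f"ratio: {ratio_f:.2f} (Fahrenheit) vs. {ratio_c:.2f} (Celsius)")
print(f"difference: {diff_f:.2f} F vs. {diff_c:.2f} C")
```

The ratio is 1.5 in Fahrenheit but about 2.07 in Celsius, so "50 percent warmer" is an artifact of the scale. The difference, by contrast, is merely rescaled by the constant factor 5/9, which is why distances remain meaningful on an interval scale.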
Ordinal Scale
A variable belongs to this category only if it satisfies the third property of the ratio scale
(i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class
(upper, middle, lower). For these variables the ordering exists but the distances between the
categories cannot be quantified. Students of economics will recall the indifference curves
between two goods. Each higher indifference curve indicates a higher level of utility, but
one cannot quantify by how much one indifference curve is higher than the others.
Nominal Scale
Variables in this category have none of the features of the ratio scale variables. Variables
such as gender (male, female) and marital status (married, unmarried, divorced, separated)
simply denote categories. Question: Why can such variables not be
expressed on the ratio, interval, or ordinal scales?
As we shall see, econometric techniques that may be suitable for ratio scale variables
may not be suitable for nominal scale variables. Therefore, it is important to bear in mind
the distinctions among the four types of measurement scales discussed above.
Summary and Conclusions
1. The key idea behind regression analysis is the statistical dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables.
2. The objective of such analysis is to estimate and/or predict the mean or average value of the
dependent variable on the basis of the known or fixed values of the explanatory variables.
3. In practice the success of regression analysis depends on the availability of the appro-
priate data. This chapter discussed the nature, sources, and limitations of the data that
are generally available for research, especially in the social sciences.
4. In any research, the researcher should clearly state the sources of the data used in
the analysis, their definitions, their methods of collection, and any gaps or omissions
in the data as well as any revisions in the data. Keep in mind that the macroeconomic
data published by the government are often revised.
5. Since the reader may not have the time, energy, or resources to track down the data, the
reader has the right to presume that the data used by the researcher have been properly
gathered and that the computations and analysis are correct.
EXERCISES
1.1. Table 1.3 gives data on the Consumer Price Index (CPI) for seven industrialized
countries with 1982–1984 = 100 as the base of the index.
a. From the given data, compute the inflation rate for each country.17
b. Plot the inflation rate for each country against time (i.e., use the horizontal axis for
time and the vertical axis for the inflation rate).
c. What broad conclusions can you draw about the inflation experience in the seven
countries?
d. Which country’s inflation rate seems to be most variable? Can you offer any
explanation?
1.2. a. Using Table 1.3, plot the inflation rate of Canada, France, Germany, Italy, Japan,
and the United Kingdom against the United States inflation rate.
b. Comment generally about the behavior of the inflation rate in the six countries
vis-à-vis the U.S. inflation rate.
c. If you find that the six countries’ inflation rates move in the same direction as the
U.S. inflation rate, would that suggest that U.S. inflation “causes” inflation in the
other countries? Why or why not?
TABLE 1.3 CPI in Seven Industrial Countries, 1980–2005 (1982–1984 = 100)
Year U.S. Canada Japan France Germany Italy U.K.
1980 82.4 76.1 91.0 72.2 86.7 63.9 78.5
1981 90.9 85.6 95.3 81.8 92.2 75.5 87.9
1982 96.5 94.9 98.1 91.7 97.0 87.8 95.4
1983 99.6 100.4 99.8 100.3 100.3 100.8 99.8
1984 103.9 104.7 102.1 108.0 102.7 111.4 104.8
1985 107.6 109.0 104.2 114.3 104.8 121.7 111.1
1986 109.6 113.5 104.9 117.2 104.6 128.9 114.9
1987 113.6 118.4 104.9 121.1 104.9 135.1 119.7
1988 118.3 123.2 105.6 124.3 106.3 141.9 125.6
1989 124.0 129.3 108.0 128.7 109.2 150.7 135.4
1990 130.7 135.5 111.4 132.9 112.2 160.4 148.2
1991 136.2 143.1 115.0 137.2 116.3 170.5 156.9
1992 140.3 145.3 117.0 140.4 122.2 179.5 162.7
1993 144.5 147.9 118.5 143.4 127.6 187.7 165.3
1994 148.2 148.2 119.3 145.8 131.1 195.3 169.3
1995 152.4 151.4 119.2 148.4 133.3 205.6 175.2
1996 156.9 153.8 119.3 151.4 135.3 213.8 179.4
1997 160.5 156.3 121.5 153.2 137.8 218.2 185.1
1998 163.0 157.8 122.2 154.2 139.1 222.5 191.4
1999 166.6 160.5 121.8 155.0 140.0 226.2 194.3
2000 172.2 164.9 121.0 157.6 142.0 231.9 200.1
2001 177.1 169.1 120.1 160.2 144.8 238.3 203.6
2002 179.9 172.9 119.0 163.3 146.7 244.3 207.0
2003 184.0 177.7 118.7 166.7 148.3 250.8 213.0
2004 188.9 181.0 118.7 170.3 150.8 256.3 219.4
2005 195.3 184.9 118.3 173.2 153.7 261.3 225.6
Source: Economic Report of the President, 2007, Table 108, p. 354.
17
Subtract from the current year’s CPI the CPI from the previous year, divide the difference by the
previous year’s CPI, and multiply the result by 100. Thus, the inflation rate for Canada for 1981 is
[(85.6 − 76.1)/76.1] × 100 = 12.48% (approx.).
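The calculation in footnote 17 can be sketched in a few lines. The CPI values below are the first four Canadian entries of Table 1.3; the function name `inflation_rates` is ours.

```python
def inflation_rates(cpi):
    """Year-over-year percent change in a CPI series (one value fewer than the input)."""
    return [(cur - prev) / prev * 100.0 for prev, cur in zip(cpi, cpi[1:])]

# Canada, 1980-1983, taken from Table 1.3.
canada_cpi = [76.1, 85.6, 94.9, 100.4]
rates = inflation_rates(canada_cpi)

for year, rate in zip(range(1981, 1984), rates):
    print(f"{year}: {rate:.2f}%")
```

The first value reproduces the 12.48 percent figure given in the footnote for 1981. Note that no inflation rate can be computed for 1980, the first year in the table, because the previous year's CPI is not available.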
1.3. Table 1.4 gives the foreign exchange rates for nine industrialized countries for the
years 1985–2006. Except for the United Kingdom, the exchange rate is defined as
the units of foreign currency for one U.S. dollar; for the United Kingdom, it is defined
as the number of U.S. dollars for one U.K. pound.
a. Plot these exchange rates against time and comment on the general behavior of the
exchange rates over the given time period.
b. The dollar is said to appreciate if it can buy more units of a foreign currency.
Contrarily, it is said to depreciate if it buys fewer units of a foreign currency. Over
the time period 1985–2006, what has been the general behavior of the U.S. dollar?
Incidentally, look up any textbook on macroeconomics or international economics
to find out what factors determine the appreciation or depreciation of a currency.
1.4. The data behind the M1 money supply in Figure 1.5 are given in Table 1.5. Can you
give reasons why the money supply has been increasing over the time period shown in
the table?
1.5. Suppose you were to develop an economic model of criminal activities, say, the hours
spent in criminal activities (e.g., selling illegal drugs). What variables would you con-
sider in developing such a model? See if your model matches the one developed by the
Nobel laureate economist Gary Becker.18
Year Australia Canada China P.R. Japan Mexico South Korea Sweden Switzerland United Kingdom
1985 0.7003 1.3659 2.9434 238.47 0.257 872.45 8.6032 2.4552 1.2974
1986 0.6709 1.3896 3.4616 168.35 0.612 884.60 7.1273 1.7979 1.4677
1987 0.7014 1.3259 3.7314 144.60 1.378 826.16 6.3469 1.4918 1.6398
1988 0.7841 1.2306 3.7314 128.17 2.273 734.52 6.1370 1.4643 1.7813
1989 0.7919 1.1842 3.7673 138.07 2.461 674.13 6.4559 1.6369 1.6382
1990 0.7807 1.1668 4.7921 145.00 2.813 710.64 5.9231 1.3901 1.7841
1991 0.7787 1.1460 5.3337 134.59 3.018 736.73 6.0521 1.4356 1.7674
1992 0.7352 1.2085 5.5206 126.78 3.095 784.66 5.8258 1.4064 1.7663
1993 0.6799 1.2902 5.7795 111.08 3.116 805.75 7.7956 1.4781 1.5016
1994 0.7316 1.3664 8.6397 102.18 3.385 806.93 7.7161 1.3667 1.5319
1995 0.7407 1.3725 8.3700 93.96 6.447 772.69 7.1406 1.1812 1.5785
1996 0.7828 1.3638 8.3389 108.78 7.600 805.00 6.7082 1.2361 1.5607
1997 0.7437 1.3849 8.3193 121.06 7.918 953.19 7.6446 1.4514 1.6376
1998 0.6291 1.4836 8.3008 130.99 9.152 1,400.40 7.9522 1.4506 1.6573
1999 0.6454 1.4858 8.2783 113.73 9.553 1,189.84 8.2740 1.5045 1.6172
2000 0.5815 1.4855 8.2784 107.80 9.459 1,130.90 9.1735 1.6904 1.5156
2001 0.5169 1.5487 8.2770 121.57 9.337 1,292.02 10.3425 1.6891 1.4396
2002 0.5437 1.5704 8.2771 125.22 9.663 1,250.31 9.7233 1.5567 1.5025
2003 0.6524 1.4008 8.2772 115.94 10.793 1,192.08 8.0787 1.3450 1.6347
2004 0.7365 1.3017 8.2768 108.15 11.290 1,145.24 7.3480 1.2428 1.8330
2005 0.7627 1.2115 8.1936 110.11 10.894 1,023.75 7.4710 1.2459 1.8204
2006 0.7535 1.1340 7.9723 116.31 10.906 954.32 7.3718 1.2532 1.8434
18
G. S. Becker, “Crime and Punishment: An Economic Approach,” Journal of Political Economy, vol. 76,
1968, pp. 169–217.
1.6. Controlled experiments in economics: On April 7, 2000, President Clinton signed into
law a bill passed by both Houses of the U.S. Congress that lifted earnings limitations
on Social Security recipients. Until then, recipients between the ages of 65 and 69 who
earned more than $17,000 a year would lose $1 worth of Social Security benefit for
every $3 of income earned in excess of $17,000. How would you devise a study to
assess the impact of this change in the law? Note: There was no income limitation for
recipients over the age of 70 under the old law.
1.7. The data presented in Table 1.6 were published in the March 1, 1984, issue of The Wall
Street Journal. They relate to the advertising budget (in millions of dollars) of 21 firms
for 1983 and millions of impressions retained per week by the viewers of the products
of these firms. The data are based on a survey of 4000 adults in which users of the
products were asked to cite a commercial they had seen for the product category in the
past week.
a. Plot impressions on the vertical axis and advertising expenditure on the horizontal
axis.
b. What can you say about the nature of the relationship between the two variables?
c. Looking at your graph, do you think it pays to advertise? Think about all those
commercials shown on Super Bowl Sunday or during the World Series.
Note: We will explore further the data given in Table 1.6 in subsequent chapters.
TABLE 1.6 Impact of Advertising Expenditure
Firm; Impressions, millions; Expenditure, millions of 1983 dollars
1. Miller Lite 32.1 50.1
2. Pepsi 99.6 74.1
3. Stroh’s 11.7 19.3
4. Fed’l Express 21.9 22.9
5. Burger King 60.8 82.4
6. Coca-Cola 78.6 40.1
7. McDonald’s 92.4 185.9
8. MCI 50.7 26.9
9. Diet Cola 21.4 20.4
10. Ford 40.1 166.2
11. Levi’s 40.8 27.0
12. Bud Lite 10.4 45.6
13. ATT/Bell 88.9 154.9
14. Calvin Klein 12.0 5.0
15. Wendy’s 29.2 49.7
16. Polaroid 38.0 26.9
17. Shasta 10.0 5.7
18. Meow Mix 12.3 7.6
19. Oscar Meyer 23.4 9.2
20. Crest 71.1 32.4
21. Kibbles ‘N Bits 4.4 6.1
Source: http://lib.stat.cmu.edu/DASL/Datafiles/tvadsdat.html.