Computational Techniques
for Econometrics
and Economic Analysis
edited by
D. A. Belsley
Boston College, Chestnut Hill, U.S.A.
Preface vii
Part One:
The Computer and Econometric Methods
Computational Aspects of Nonparametric Simulation Estimation
Ravi Bansal, A. Ronald Gallant, Robert Hussey, and George Tauchen 3
On the Accuracy and Efficiency of GMM Estimators: A Monte Carlo Study
A. J. Hughes Hallett and Yue Ma 23
A Bootstrap Estimator for Dynamic Optimization Models
Albert J. Reed and Charles Hallahan 45
Computation of Optimum Control Functions by Lagrange Multipliers
Gregory C. Chow 65
Part Two:
The Computer and Economic Analysis
Computational Approaches to Learning with Control Theory
David Kendrick 75
Computability, Complexity and Economics
Alfred Lorn Norman 89
Robust Min-Max Decisions with Rival Models
Berç Rustem 109
Part Three:
Computational Techniques for Econometrics
Wavelets in Macroeconomics: An Introduction
William L. Goffe 137
MatClass: A Matrix Class for C++
C. R. Birchenhall 151
Parallel Implementations of Primal and Dual Algorithms for Matrix
Balancing
Ismail Chabini, Omar Drissi-Kaïtouni and Michael Florian 173
Part Four:
The Computer and Econometric Studies
Variational Inequalities for the Computation of Financial Equilibria in the
Presence of Taxes and Price Controls
Anna Nagurney and June Dong 189
Modeling Dynamic Resource Adjustment Using Iterative Least Squares
Agapi Somwaru, Eldon Ball and Utpal Vasavada 207
Intensity of Takeover Defenses: The Empirical Evidence
Atreya Chakraborty and Christopher F. Baum 219
Index 235
Preface
his method of Lagrange multipliers for solving the standard optimal control problem
over the more usual method of solving the Bellman equations.
Continuing his tradition of seeing what the cutting edge has to offer economic and
econometric analysis, W.L. Goffe, in "Wavelets in Macroeconomics: An Introduction," examines the usefulness of wavelets for characterizing macroeconomic time
series.
C.R. Birchenhall, in "MatClass: A Matrix Class for C++," provides an introduc-
tion to object-oriented programming along with an actual C++ object class library
in a context of interest to econometricians: a set of numerical classes that allows the
user ready development of numerous econometric procedures.
In "Parallel Implementations of Primal and Dual Algorithms for Matrix Balanc-
ing," I. Chabini, O. Drissi-Kailouni, and M. Florian exploit the power of parallel
processing (within the accessible and inexpensive "286" MS-DOS world) to bring
the computational task of matrix balancing, both with primal and dual algorithms,
more nearly into line.
David A. Belsley
PART ONE
Computational Aspects of Nonparametric Simulation Estimation

RAVI BANSAL, A. RONALD GALLANT, ROBERT HUSSEY, AND GEORGE TAUCHEN

ABSTRACT. This paper develops a nonparametric estimator for structural equilibrium models
that combines numerical solution techniques for nonlinear rational expectations models with
nonparametric statistical techniques for characterizing the dynamic properties of time series
data. The estimator uses the score function from a nonparametric estimate of the law
of motion of the observed data to define a GMM criterion function. In effect, it forces the
economic model to generate simulated data so as to match a nonparametric estimate of the
conditional density of the observed data. It differs from other simulated method of moments
estimators in using the nonparametric density estimate, thereby allowing the data to dictate
what features of the data are important for the structural model to match. The components
of the scoring function characterize important kinds of nonlinearity in the data, including
properties such as nonnormality and stochastic volatility.
The nonparametric density estimate is obtained using the Gallant-Tauchen seminonpara-
metric (SNP) model. The simulated data that solve the economic model are obtained using
Marcet's method of parameterized expectations. The paper gives a detailed description of the
method of parameterized expectations applied to an equilibrium monetary model. It shows that
the choice of the specification of the Euler equations and the manner of testing convergence
have large effects on the rate of convergence of the solution procedure. It also reviews sev-
eral optimization algorithms for minimizing the GMM objective function. The Nelder-Mead
simplex method is found to be far more successful than others for our estimation problem.
1. INTRODUCTION
models (Gallant and Tauchen, 1989, 1992), have been used to discover and charac-
terize important forms of nonlinear behavior in economic time series, especially in
financial time series. Linear Gaussian models cannot explain such nonlinear behav-
ior in actual data. Thus, nonlinear structural models must be examined to see the
extent to which they can explain the nonlinear behavior found in actual economic
data. This paper shows how statistical techniques can be combined with numerical
solution techniques to estimate nonlinear structural equilibrium models.
The most common approach for estimation of nonlinear structural models is
probably generalized method of moments (GMM) applied to Euler equations, as de-
veloped in Hansen and Singleton (1982). This technique has been widely employed in
financial economics and macroeconomics, though it is a limited information method
and has shortcomings. For example, the estimation can encounter problems when
there are unobserved variables, as is the case for the model we consider in Section
2 where the decision interval is a week, but some of the data are observed monthly.
Also it does not provide an estimate of the law of motion of the economic variables.
Thus, if the model is rejected, little information is available regarding the properties
of the observed data that the model has failed to capture.
In this paper we describe an alternative strategy for estimating nonlinear structural
models that was first applied in Bansal, Gallant, Hussey, and Tauchen (1992). The
approach is similar to the simulated method of moments estimators of Duffie and
Singleton (1989) and Ingram and Lee (1991). However, unlike those estimators,
which match preselected moments of the data, our estimator minimizes a GMM
criterion based on the score function of a nonparametric estimator of the conditional
density of the observed data. In effect, the estimator uses as a standard of comparison
a nonparametric estimate of the law of motion of the observed data. By selecting the
GMM criterion in this way, we allow the observed data to determine the dynamic
properties the structural model must match.
The estimator works by combining the method of parameterized expectations for
numerically solving a nonlinear structural equilibrium model (Marcet, 1991; den
Haan and Marcet, 1990) with the seminonparametric (SNP) method for estimating
the conditional density of actual data (Gallant and Tauchen, 1989, 1992). For a par-
ticular setting of the parameters of the structural model, the method of parameterized
expectations generates simulated data that solve the model. The model parameters
are then estimated by searching for the parameter values that minimize a GMM crite-
rion function based on the scoring function of the SNP conditional density estimate.
The nonparametric structural estimator thus has three components: (1) using SNP to
estimate the conditional density of actual data, (2) using the method of parameter-
ized expectations to obtain simulated data that satisfy the structural model, and (3)
estimating the underlying structural parameters by using an optimization algorithm
that finds those parameter values that minimize the GMM criterion function.
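Schematically, the three components fit together as in the sketch below. This is not the authors' code; solve_model, snp_score, the weighting matrix W, and the starting value lam0 are placeholders for the objects described in the text.

```python
import numpy as np

def gmm_objective(lam, solve_model, snp_score, W):
    """GMM criterion built from the mean SNP score evaluated on simulated data.

    solve_model(lam) -> simulated data satisfying the structural model (component 2);
    snp_score(data)  -> per-observation score of the SNP density fitted to the actual data (component 1).
    """
    sim = solve_model(lam)              # solve and simulate the model at lam
    m = snp_score(sim).mean(axis=0)     # mean score; near zero when the model fits well
    return m @ W @ m                    # quadratic form minimized over lam (component 3)

# Component 3 amounts to a derivative-free search over lam, e.g.
# scipy.optimize.minimize(gmm_objective, lam0, args=(solve_model, snp_score, W),
#                         method="Nelder-Mead")
```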
Below we discuss in detail how the estimator works in the context of a two-country
equilibrium monetary model. The model is based on Lucas (1982), Svensson (1985),
and Bansal (1990), and is developed in full detail in Bansal, Gallant, Hussey, and
Tauchen (1992). It accommodates time non-separabilities in preferences (Dunn and
Singleton, 1986) and money via a transactions cost technology (Feenstra, 1986). In
effect, the model is a nonlinear filter that maps exogenous endowment and money
supply processes into endogenous nominal processes, including exchange rates, in-
terest rates, and forward rates. We show how this nonlinear dynamic model can be
solved and simulated for estimation and evaluation.
In applying our estimator to this model, we find that there are several choices
available to the researcher that greatly affect the estimator's success and rate of
convergence. For example, the form in which one specifies the Euler equations
on which the parameterized expectations algorithm operates can significantly affect
the speed of convergence. This is an important finding, since our estimator uses
this algorithm repeatedly at different model parameter values. Also, the means for
testing convergence can have important consequences; we find it best to test for
convergence of the projection used in parameterized expectations instead of testing
for convergence of the coefficients representing the projection. Finally, we find that
the complexity of our estimation procedure causes some optimization algorithms
to have greater success in minimizing the GMM objective function. Among the
optimization techniques we tried are gradient search methods, simulated annealing,
and simplex methods. In Section 3.1 below we discuss how these methods work and
their strengths and weaknesses for our type of optimization problem.
The rest of the paper is organized as follows: Section 2 specifies the illustrative
monetary model and describes the simulation estimator. Section 3 discusses practical
aspects of implementing the estimator, including solving the model with parameter-
ized expectations and optimizing the GMM objective function to estimate the model
parameters. Concluding remarks comprise the final section.
$$E_0 \sum_{t=0}^{\infty} \beta^t \left[ \left( c_{1t}^{*\,\delta}\, c_{2t}^{*\,1-\delta} \right)^{1-\gamma} - 1 \right] \Big/ (1-\gamma)\,,$$
where $0 < \beta < 1$, $0 < \delta < 1$, $\gamma > 0$, and where $c^*_{1t}$ and $c^*_{2t}$ are the consumption services from goods produced in countries 1 and 2, respectively. Preferences are of the constant relative risk aversion (CRRA) type in terms of the composite consumption goods. The parameter $\gamma$ is the coefficient of relative risk aversion, $\delta$ determines the allocation of expenditure between the two services, and $\beta$ is the subjective discount factor. If $\gamma = 1$, then preferences collapse to log-utility
$$E_0 \sum_{t=0}^{\infty} \beta^t \left[ \delta \log c^*_{1t} + (1-\delta) \log c^*_{2t} \right].$$
Consumption services are related to acquisitions of goods by
$$c^*_{it} = c_{it} + \sum_{j=1}^{L_c} \kappa_{ij}\, c_{i,t-j}\,, \qquad i = 1, 2,$$
where $c_{1t}$ and $c_{2t}$ are the acquisitions of goods, the $\kappa_{ij}$ determine the extent to which past acquisitions of goods provide services (and hence utility) in the current period, and $L_c$ is the lag length. If $L_c = 0$, then the utility function collapses to the standard time separable case where $c^*_{1t} = c_{1t}$ and $c^*_{2t} = c_{2t}$. If the nonseparability parameters $\kappa_{ij}$ are positive, then past acquisitions of goods provide services today. If they are negative, then there is habit persistence. Other patterns are possible as well. Recent acquisitions of goods can provide services today, while acquisitions further in the past contribute to habit persistence.
We introduce money into the model via a transaction-costs technology. The
underlying justification for transactions costs is that the acquisition of goods is costly
both in terms of resources and time. Money, by its presence, economizes on these
costs and hence is valued in equilibrium. Transaction costs, $\psi(c, m)$, in our model are
an increasing function of the amount of goods consumed C and a decreasing function
of the magnitude of real balances m held by the consumer in the trading period. The
functional form we use for the transaction-costs technology is
where $e_t$ is the spot exchange rate and $f_{kt}$ is the k-period forward exchange rate, with both rates defined in units of country 1's currency per unit of country 2's currency. $w_{1t}$ and $w_{2t}$ are the stochastic endowments of goods within the two countries. Lump sum transfers of $q_{1t}$ and $q_{2t}$ units of currency are made by the government at time t.
These transfers are known to the agent at the beginning of period t but can be used
for carrying out transactions only in period t + 1.
The stationary decision problem facing the agent delivers the following Euler
equations for the asset holdings $M_{1,t+1}$ and $M_{2,t+1}$:
$$\frac{MU_{c_{it}}}{1+\psi_{c_{it}}} = E_t\!\left[\beta\, \frac{MU_{c_{i,t+1}}}{1+\psi_{c_{i,t+1}}}\, \bigl(1-\psi_{m_{i,t+1}}\bigr)\, \frac{P_{it}}{P_{i,t+1}} \right], \qquad i = 1, 2,$$
where $MU_{c_{it}}$ is the marginal utility of $c_{it}$, and $\psi_{c_{it}}$ and $\psi_{m_{it}}$ are the derivatives of transaction costs, $\psi(c_{it}, m_{it})$, with respect to the first and second arguments, respectively. Transactions costs modify the returns to the two monies, $M_{1t}$ and $M_{2t}$. We would expect $P_{1t}/P_{1,t+1}$ to be the return at time $t+1$ for carrying forward an extra unit of country one's currency today. However, because of transaction costs, every extra unit of currency carried forward also lowers transaction costs in the next period by a real amount, $-\psi_{m_{i,t+1}}$, so the total return is given by $\bigl[(1 - \psi_{m_{i,t+1}})\, P_{it}/P_{i,t+1}\bigr]$.
The model also delivers an intratemporal restriction on the choice of goods $c_{1t}$ and $c_{2t}$,
i = 1,2.
The parameter vector of the structural economic model is
$$\lambda = \bigl(\beta,\ \gamma,\ \delta,\ \psi_0,\ \alpha,\ \kappa_{11}, \ldots, \kappa_{1L_c},\ \kappa_{21}, \ldots, \kappa_{2L_c},\ a_0',\ \mathrm{vec}(A)',\ \mathrm{vech}(\Omega^{1/2})'\bigr)'.$$
For each value of $\lambda$ the model defines a nonlinear mapping from the strictly exogenous process $\{S_t\}$ to an output process $\{U_t\}$. The output process is
$$U_t = \bigl(dM_{1t},\ dM_{2t},\ dw_{1t},\ dw_{2t},\ dc_{1t},\ dc_{2t},\ dP_{1t},\ dP_{2t},\ R_{1t},\ f_t/e_t,\ de_t\bigr)',$$
which is an 11-vector containing the elements of $S_t$ along with the gross consumption growth rates, the gross inflation rates, the four-period interest rate in country 1, the ratio of the four-period forward exchange rate to the spot rate, and the gross growth rate of the spot exchange rate. It proves convenient also to include the elements of $S_t$ in the output process, mapping them directly with an identity map. The particular set of variables comprising the remaining elements of $U_t$ are those endogenous variables that turn out to be of interest for various aspects of the analysis of the model and the empirical work.
The mapping from $(\{S_t\}, \lambda)$ to the endogenous elements of $U_t$ is defined by the solution to the nonlinear rational expectations model. In practice, we use Marcet's method of parameterized expectations (Marcet, 1991; den Haan and Marcet, 1990) to approximate the map. Given a value of $\lambda$, the method "solves" the model in the sense of determining simulated realizations of the variables that satisfy the Euler equations. In what follows, $\{U^{\lambda}_t\}$ denotes a realization of the output process given $\lambda$ and a realization of $\{S_t\}$. A complete description of how we apply the method of parameterized expectations to this problem is given in Section 3.1 below.
subscript K denotes the Kth model in the hierarchy. The length of $\theta_{Kn}$ depends on the model. In practice, K is determined by a model selection criterion that slowly expands the model with sample size n and thereby ensures consistency. For the Kth
model in the hierarchy, the corresponding $\tilde{\theta}_{Kn}$ solves the first-order condition
$$\frac{1}{n}\sum_{t=t_0}^{n}\frac{\partial}{\partial\theta}\log f_K\!\bigl(y_t \mid x_t,\ \tilde{\theta}_{Kn}\bigr)=0\,,$$
and the corresponding condition defining the structural estimator is
$$\frac{1}{N}\sum_{\tau=\tau_0}^{N}\frac{\partial}{\partial\theta}\log f_K\!\bigl(y^{\lambda}_{\tau} \mid x^{\lambda}_{\tau},\ \tilde{\theta}_{Kn}\bigr)=0\,.$$
The left-hand side is the gradient of the log likelihood function evaluated at a simulated realization $\{y^{\lambda}_{\tau}\}_{\tau=\tau_0}^{N}$ and at the $\tilde{\theta}_{Kn}$ determined by fitting the Kth SNP model to the actual data $\{y_t\}_{t=t_0}^{n}$. If the length of $\lambda$, $\ell_\lambda$, is less than the length of $\theta_K$, $\ell_K$, then the model is overidentified (under the order condition) and a GMM criterion is used to minimize the length of the left-hand side with respect to a suitable weighting matrix.
Interestingly, this approach defines a consistent and asymptotically normal estimator irrespective of the particular SNP model used, so long as $\ell_K \geq \ell_\lambda$ and an identification condition is met. In practice, we implement the estimator using the particular SNP model that emerges from the specification search in the nonparametric estimation of $f(\cdot \mid \cdot)$. The choice of K is thus data-determined. This selection rule forces the scoring function to be appropriate for the particular sample at hand. The scoring function of the fitted SNP model contains just those indicators important to fit the data and no more. Also, because the fitted SNP model has the interpretation of a nonparametric maximum-likelihood estimator, the information equality from maximum likelihood theory provides a convenient simplification that greatly facilitates estimation of the weighting matrix for the GMM estimation.
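The information equality invoked here is the standard maximum likelihood identity; restated for the fitted SNP density as background (not quoted from the chapter), it reads
$$E\!\left[\frac{\partial \log f_K(y|x,\theta)}{\partial \theta}\, \frac{\partial \log f_K(y|x,\theta)}{\partial \theta'}\right] = -\,E\!\left[\frac{\partial^2 \log f_K(y|x,\theta)}{\partial \theta\, \partial \theta'}\right],$$
so the information matrix, and hence the GMM weighting matrix, can be estimated from the outer product of the fitted scores without computing second derivatives.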
to form the GMM objective function for the nonparametric structural estimator. The
third is optimization of the objective function. Each of these components is described
in detail below.
We use the method of parameterized expectations (Marcet, 1991; den Haan and
Marcet, 1990) to obtain simulated data that satisfy the Euler equations of the structural
economic model. In essence, this method approximates conditional expectations of
certain terms with the projections of those terms on a polynomial in the state variables.
The method uses Euler equations to iterate between postulated values of time series
and projections based on those postulated values until those values and projections
each converge. This procedure will be explained more fully below.
We find that the specification of the Euler equations greatly affects the speed with
which the parameterized expectations algorithm converges. From Section 2, the first
two Euler equations are
$$\frac{MU_{c_{it}}}{1+\psi_{c_{it}}} = E_t\!\left[\beta\, \frac{MU_{c_{i,t+1}}}{1+\psi_{c_{i,t+1}}}\, \bigl(1-\psi_{m_{i,t+1}}\bigr)\, \frac{P_{it}}{P_{i,t+1}} \right], \qquad i = 1, 2.$$
Using the definition of the velocity of money, $V_{it} = c_{it} P_{it} / M_{it}$, $i = 1, 2$, one form in which these equations can be rewritten is
$$V_{it} = \frac{E_t\!\left(\dfrac{MU_{c_{it}}}{1+\psi_{c_{it}}}\right)}{E_t\!\left[\beta\, MU_{c_{i,t+1}}\left(\dfrac{dc_{i,t+1}}{dM_{i,t+1}}\right)\left(\dfrac{1}{V_{i,t+1}}\right)\dfrac{1-\psi_{m_{i,t+1}}}{1+\psi_{c_{i,t+1}}}\right]}\,, \qquad i = 1, 2,$$
or, more compactly,
$$V_{it} = \frac{E_t\!\left[f_{i1}\!\left(dc_{1,t-L_c+2},\, dc_{2,t-L_c+2},\, \ldots,\, dc_{1,t+L_c+1},\, dc_{2,t+L_c+1},\, V_t,\, V_{i,t+1},\, dM_{i,t+1};\, \lambda\right)\right]}{E_t\!\left[f_{i2}\!\left(dc_{1,t-L_c+2},\, dc_{2,t-L_c+2},\, \ldots,\, dc_{1,t+L_c+1},\, dc_{2,t+L_c+1},\, V_t,\, V_{i,t+1},\, dM_{i,t+1};\, \lambda\right)\right]}\,, \qquad i = 1, 2,$$
where the $f_{ij}(\cdot)$ are particular functional forms too complex to be written out here.
The market clearing conditions of the model imply that the consumption series can be recovered from the endowment and velocity series. The conditional expectations in the velocity equations are approximated as
$$E_t\!\left[F_{ij,t}\right] = \exp\!\bigl[\mathrm{poly}(S_t,\ \nu_{ij})\bigr],$$
where $F_{ij,t}$ denotes the term inside the corresponding expectation, poly(·) is a polynomial in $S_t$, and $\nu_{ij}$ is the vector of its coefficients. We choose to use an exponential polynomial because economic theory implies that $E_t[F_{ij,t}]$ should be positive. In practice, the polynomial we use consists of linear and squared terms of the elements of $S_t$.
Below is a description of the algorithm for solving for the equilibrium velocity
series given a vector A. In every instance, the ranges of the indices are i = 1, 2 and
j = 1, 2; superscripts indicate iteration numbers.
Step 1.
Simulate a realization of $\{u_t\}$, where $u_t$ is iid $N(0, \Omega)$.
Step 2.
From some initial $S_0$, generate a realization of $\{S_t\}$ using
$$S_t = a_0 + A\, S_{t-1} + u_t\,.$$
Step 3.
Determine starting realizations of the velocity series $\{V^0_{it}\}$.
We consider two possible ways to do this. The first is to specify starting values
for $\nu^0_{ij}$, perhaps values of $\nu_{ij}$ obtained from a previous solution at a nearby $\lambda$. Then, given $\nu^0_{ij}$ and some initial observations on velocities $V^0_{it}$, $t = 0, \ldots, L_c$, the remaining elements of the starting velocity series for $t = L_c + 1, \ldots, T$, can be determined recursively using
$$V^0_{it} = \exp\!\bigl[\mathrm{poly}(S^0_t,\ \nu^0_{i1})\bigr] \big/ \exp\!\bigl[\mathrm{poly}(S^0_t,\ \nu^0_{i2})\bigr].$$
This structure is recursive because $S^0_t$ contains $dc^0_{i,t-1}$. A drawback to this approach
is that the simulated time series produced by the solution procedure are dependent
upon the starting values, so any attempt to replicate the solution exactly would require
knowing those starting values.
A second approach for establishing starting realizations of the velocity series
would set $V^0_{1t}$ and $V^0_{2t}$ to be constants for all t. For these constants, one could
calculate steady-state values for the two velocities, or simply set the velocities equal
to 1. This latter approach still produces convergence in a relatively small number of
iterations.
Regardless of the approach used to determine starting values of velocity, if one
uses the procedure described below to improve the stability of the algorithm by
dampening iteration updates, starting values must also be specified for the polynomial
coefficients $\nu^0_{ij}$. We recommend setting all of the coefficients to zero except the constants. This means that $E_t(F_{ij,t}) = \exp[\mathrm{poly}(S_t, \nu_{ij})]$ reduces to $E_t(F_{ij,t}) = \exp[\mathrm{constant}_{ij}]$. The constants can be set equal to the log of the unconditional means
of the Fij,t's. Setting the initial polynomial coefficients in this way gives a very
stable position from which to start the iterations.
Step 4.
Iteration k: Using the $V^{k-1}_{it}$ series, calculate the $F^{k-1}_{ij,t}$ and regress each of these four on a linearized version of $\exp[\mathrm{poly}(S^{k-1}_t, \nu^k_{ij})]$ to estimate $\nu^k_{ij}$. The linearization is done around $\nu^{k-1}_{ij}$.
A linearized version of the exponential function is used to allow one to perform
linear regressions rather than nonlinear regressions at each iteration. When the iterations have converged, the value of the exponential function coincides with the value of its linearized version at the point at which we want to evaluate it.
Den Haan and Marcet (1990) actually suggest a more gradual way of modifying
the guesses at the polynomial coefficients from iteration to iteration. Rather than
setting $\nu^k_{ij}$ equal to the coefficients obtained from the regressions, one can set $\nu^k_{ij}$ equal to a convex combination of those coefficients, call them $b^k_{ij}$, and the guess at the coefficients from the previous iteration as
$$\nu^k_{ij} = p\, b^k_{ij} + (1 - p)\, \nu^{k-1}_{ij}\,,$$
where $0 < p \leq 1$. This procedure has the effect of dampening the speed with which
the guesses at the coefficients are updated. The smaller is p, the more gradually the
coefficients are modified from one iteration to the next. One might want to use this
gradual updating scheme to stabilize iterations that are not well behaved. For the
model in this paper, we were always able to set p = 1, which implies no dampening
in updating the coefficients.
Step 5.
Determine the two $V^k_{it}$ series according to
$$V^k_{it} = \exp\!\bigl[\mathrm{poly}(S^{k-1}_t,\ \nu^k_{i1})\bigr] \big/ \exp\!\bigl[\mathrm{poly}(S^{k-1}_t,\ \nu^k_{i2})\bigr],$$
and the two $dc^k_{it}$ series from the market clearing conditions.
Step 6.
Repeat steps 4 and 5 until the velocity series converge. Convergence is tested on the velocity series themselves rather than on the polynomial coefficients. In summary, the algorithm alternates between estimating the conditional expectations implied by a postulated realization of the velocity processes (which amounts to estimating the $\nu_{ij}$'s) and updating the postulated values of the velocity processes based on the estimated conditional expectation values. The procedure continues until the velocity processes converge.
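The following is a compact sketch of the iteration in Steps 1 through 6, intended only to make the fixed-point structure concrete. The helper functions F_fn and c_fn, the construction of the state matrix, and the use of a log-linear regression in place of the linearized exponential regression described above are all simplifying assumptions rather than the authors' implementation.

```python
import numpy as np

def parameterized_expectations(S, F_fn, c_fn, nu0, p=1.0, tol=1e-6, max_iter=500):
    """Iterate on the velocity series until the parameterized expectations converge.

    S     : (T, k) array of simulated state variables (Steps 1-2).
    F_fn  : F_fn(V, S) -> dict mapping (i, j) to the realized F_{ij,t} series.
    c_fn  : c_fn(V, S) -> consumption-growth series implied by market clearing.
    nu0   : dict of starting polynomial coefficients (Step 3).
    p     : damping weight on newly estimated coefficients (Step 4).
    """
    def poly(S):
        # constant, linear, and squared terms of the state variables
        return np.column_stack([np.ones(len(S)), S, S ** 2])

    X = poly(S)
    nu = {ij: np.asarray(v, dtype=float) for ij, v in nu0.items()}
    V = {i: np.exp(X @ nu[(i, 1)]) / np.exp(X @ nu[(i, 2)]) for i in (1, 2)}

    for _ in range(max_iter):
        F = F_fn(V, S)                                   # Step 4: realized F_{ij,t} terms
        new_nu = {}
        for ij, f in F.items():                          # regress (here: log F on the polynomial)
            b, *_ = np.linalg.lstsq(X, np.log(f), rcond=None)
            new_nu[ij] = p * b + (1.0 - p) * nu[ij]      # damped coefficient update
        V_new = {i: np.exp(X @ new_nu[(i, 1)]) / np.exp(X @ new_nu[(i, 2)]) for i in (1, 2)}
        # Step 6: test convergence of the projected velocity series, not the coefficients
        converged = max(np.max(np.abs(V_new[i] - V[i])) for i in (1, 2)) < tol
        V, nu = V_new, new_nu
        if converged:
            break
    return V, c_fn(V, S), nu                             # Step 5: velocities and consumption growth
```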
Once the equilibrium velocity and consumption growth series have been determined
from the first two Euler equations, the four-period interest rate series in country 1,
the premium of the four-period forward exchange rate over the spot rate, and the
exchange rate growth can be determined from the remaining Euler equations without
additional iterations. The Euler equations can be written in terms of conditional expectations of two further functions, $f_{13}$ and $f_{23}$, whose particular forms are again too complex to be written out here. In these equations $dP_{it} = (dM_{it}\, V_{it})/(dc_{it}\, V_{i,t-1})$, the gross inflation rate in each country. The conditional
expectations terms in the equations are each estimated by regressing the value of the
function inside the expectations operator on a polynomial in St. The polynomial we
use consists of the elements of St raised to the first, second, third, and fourth powers.
The resulting simulation values are used to form $\{y^{\lambda}_{\tau}\}$.
The time required to solve the structural economic model at some value of $\lambda$ is an important consideration, since our nonparametric estimator requires solutions at many different values of $\lambda$ in finding the value that minimizes the GMM objective function. When we use simulated time series of length 1000 to solve the model (excluding an initial discarded 500 observations), convergence for most values of $\lambda$
is achieved in approximately one minute on a SUN SPARCstation 2.
the conditional density on $\lambda^0$ and write $p(y|x)$, but we always make explicit the dependence of the joint density $p(y, x, \lambda^0)$ on $\lambda^0$, because that becomes important. The SNP estimator is a sieve estimator that is based on the sequence of models $\{f_K(y|x, \theta_K)\}_{K=0}^{\infty}$, where $\theta_K \in \Theta_K \subseteq \Re^{\ell_K}$, $\Theta_K \subseteq \Theta_{K+1}$, and where $f_K(y|x, \theta_K)$ is a truncated Hermite series expansion. This hierarchy of models can, under regularity conditions, approximate $p(y|x)$ well in the sense
Thus, $\tilde{\theta}_{Kn}$ is the estimated parameter vector of a Kth order SNP model fitted to the data by maximum likelihood. The estimator $\hat{\lambda}$ is the solution of the GMM estimation problem
$$\hat{\lambda} = \arg\min_{\lambda \in \Lambda}\ s_n(\lambda)\,,$$
where $s_n(\lambda)$ is a quadratic form in the simulation-based score vector above, with weighting matrix
$$\tilde{W}_n = \left[\frac{1}{n}\sum_{t=t_0}^{n}\left(\frac{\partial}{\partial\theta}\log f_K(y_t \mid x_t,\ \tilde{\theta}_{Kn})\right)\!\left(\frac{\partial}{\partial\theta}\log f_K(y_t \mid x_t,\ \tilde{\theta}_{Kn})\right)'\,\right]^{-1},$$
which is the natural estimate of the inverse of the information matrix based on the gradient-outer-product formula. This choice makes the minimized value of the GMM objective function, $s_n(\hat{\lambda})$, approximately $\chi^2(\ell_K - \ell_\lambda)$ for large K.
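A minimal sketch of how such a weighting matrix might be computed from the fitted scores is given below; the array layout and the function name are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def score_weighting_matrix(scores):
    """Inverse of the gradient-outer-product estimate of the information matrix.

    scores : (n, ell_K) array whose rows are the SNP score vectors
             d/d(theta) log f_K(y_t | x_t, theta) evaluated at the fitted
             parameters on the actual data.
    """
    n = scores.shape[0]
    info = scores.T @ scores / n   # gradient-outer-product estimate of the information matrix
    return np.linalg.inv(info)     # weighting matrix for the GMM criterion
```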
Below we consider several different algorithms for minimizing sn (A). Regardless
of the algorithm, it is advantageous to control the interface between the optimizer
and the economic model by scaling the optimizer's guesses at the parameter values
to be within a range in accordance with the economic theory behind our model.
For example, in our model it only makes sense for $\delta$ to be between 0 and 1, so we
constrain the optimizer to attempt solutions only with such values. These constraints
are imposed by using various forms of logistic transformations.
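One common way to impose such range restrictions is a logistic reparameterization, so that the optimizer searches over an unconstrained value while the model always receives a parameter inside its admissible interval. A minimal sketch follows; the interval shown for $\delta$ comes from the text, while the function names are illustrative.

```python
import numpy as np

def to_bounded(z, lo=0.0, hi=1.0):
    """Map an unconstrained optimizer value z into (lo, hi) via a logistic transform."""
    return lo + (hi - lo) / (1.0 + np.exp(-z))

def to_unbounded(x, lo=0.0, hi=1.0):
    """Inverse map, useful for converting a starting value such as delta = 0.5."""
    return np.log((x - lo) / (hi - x))
```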
The basic computational task for the estimator is to evaluate $\hat{\lambda} = \arg\min_{\lambda \in \Lambda} \{s_n(\lambda)\}$.
This minimization is not straightforward for our problem because of the large number
of parameters to be estimated (between 37 and 41 depending upon whether one, two,
or three lags of consumption services enter the utility function) and because analytical
derivatives of the objective function with respect to A are not available. We tried
four different algorithms for minimizing the objective function and found significant
differences across algorithms for our problem.
point problems.
Gradient search methods use numerical derivatives in place of analytical deriva-
tives when the latter are unavailable. For our type of problem, this does not work well.
The computations turn out to be about as demanding as would be those for analytical derivatives. Approximating the gradient of the objective function $\partial s_n(\lambda)/\partial\lambda$ at a single point $\lambda$ entails computing the simulated process $\{y^{\lambda}_{\tau}\}$ after small perturbations in each element of $\lambda$. With $\ell_\lambda$ on the order of 37 to 41, this entails, at a minimum, recom-
puting the equilibrium of the model that many additional times just to approximate a
single one-sided gradient. The net effect is to generate about as many function calls
as would a naive grid-search. In fact, our experience suggests that a naive grid search
might even work better. In the course of approximating $\partial s_n(\lambda)/\partial\lambda$ via perturbing $\lambda$ and forming difference quotients, values of $\lambda$ that produce sharp improvement in the objective function are uncovered quite by happenstance. Neither NPSOL nor DFP retains and makes use subsequently of these particularly promising values of $\lambda$; the effort that goes into computing the equilibrium for these $\lambda$ is lost. Simple grid search would retain these $\lambda$'s.
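To make the cost argument concrete, a one-sided finite-difference gradient requires one extra objective evaluation per parameter, and here each evaluation means re-solving the model. The sketch below (the step size and signature are illustrative) shows why a 37 to 41 parameter problem needs roughly 40 model solutions just to approximate one gradient.

```python
import numpy as np

def one_sided_gradient(objective, lam, h=1e-4):
    """One-sided finite-difference gradient: len(lam) + 1 objective evaluations."""
    lam = np.asarray(lam, dtype=float)
    f0 = objective(lam)                   # one model solution at the base point
    grad = np.empty_like(lam)
    for i in range(lam.size):             # one additional model solution per parameter
        step = np.zeros_like(lam)
        step[i] = h
        grad[i] = (objective(lam + step) - f0) / h
    return grad
```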
The elements of V and T are tuning parameters that must be selected in advance and
are adjusted throughout the course of the iterations. We used the defaults. There are
additional tuning parameters that determine when these adjustments occur. Again,
we accepted the defaults.
The algorithm was defeated by the large number of function evaluations that it
requires. Most exasperating was its insistence on exploring unprofitable parameter
values. After making some promising initial progress the algorithm would plateau
far from an optimum and give no indication that further progress could be achieved
if the iterations were permitted to continue.
is available in the GQOPT optimization package (Quandt and Goldfeld, 1991). The
method works as follows: We begin the minimization of a function of $\ell_\lambda$ variables by constructing a simplex of $(\ell_\lambda + 1)$ points in $\ell_\lambda$-dimensional space: $\lambda_0, \lambda_1, \ldots, \lambda_{\ell_\lambda}$. We denote the value of the function at point $\lambda_i$ by $s_i$. The lowest, highest, and second-highest values are
$$s_l = \min_i(s_i)\,, \qquad s_h = \max_i(s_i)\,, \qquad s_{hh} = \max_{i \neq h}(s_i)\,,$$
corresponding to points $\lambda_l$, $\lambda_h$, and $\lambda_{hh}$. We also define the notation $[\lambda_i \lambda_j]$ to indicate the distance from $\lambda_i$ to $\lambda_j$.
The algorithm works by replacing $\lambda_h$ in the simplex continuously by another point with a lower function value. Three operations are used to search for such a new point: reflection, contraction, and expansion. Each is undertaken relative to the centroid $\bar{\lambda}$ of the simplex points excluding $\lambda_h$. The centroid is constructed as
$$\bar{\lambda} = \frac{1}{\ell_\lambda} \sum_{i \neq h} \lambda_i\,.$$
The reflection of $\lambda_h$ through the centroid is the point
$$\lambda_r = (1 + \alpha_r)\bar{\lambda} - \alpha_r \lambda_h\,,$$
where $\alpha_r > 0$ is the reflection coefficient. $\lambda_r$ lies on the line between $\lambda_h$ and $\bar{\lambda}$, on the far side of $\bar{\lambda}$, and $\alpha_r$ is the ratio of the distance $[\lambda_r \bar{\lambda}]$ to $[\lambda_h \bar{\lambda}]$. If $s_l < s_r \leq s_{hh}$, we replace $\lambda_h$ with $\lambda_r$ and start the process again with this new simplex.
If reflection has produced a new minimum ($s_r < s_l$), we search for an even lower function value by expanding the reflection. The expansion point is defined by
$$\lambda_e = \alpha_e \lambda_r + (1 - \alpha_e)\bar{\lambda}\,,$$
where $\alpha_e > 1$ is the expansion coefficient that defines the ratio of the distance $[\lambda_e \bar{\lambda}]$ to $[\lambda_r \bar{\lambda}]$. $\lambda_e$ is farther out than $\lambda_r$ on the line between $\lambda_h$ and $\bar{\lambda}$. If $s_e < s_r$, $\lambda_h$ is replaced in the simplex by $\lambda_e$. Otherwise, the expansion has failed and $\lambda_r$ replaces $\lambda_h$. The process is then restarted with the new simplex.
If reflection of $\lambda_h$ has not even produced a function value less than $s_{hh}$, which means that replacing $\lambda_h$ with $\lambda_r$ would leave $s_r$ the maximum, we rename $\lambda_h$ to be either the old $\lambda_h$ or $\lambda_r$, whichever has a lower function value. Then we attempt to find an improved point by constructing the contraction
$$\lambda_c = \alpha_c \lambda_h + (1 - \alpha_c)\bar{\lambda}\,,$$
where $0 < \alpha_c < 1$. The contraction coefficient $\alpha_c$ is the ratio of the distance $[\lambda_c \bar{\lambda}]$ to $[\lambda_h \bar{\lambda}]$. If $s_c < s_h$, then the contraction has succeeded, and we replace $\lambda_h$ with $\lambda_c$ and restart the process. If this contraction has failed, we construct a new simplex by contracting all the points toward the one with the lowest function value, which is accomplished by replacing the $\lambda_i$'s with $(\lambda_i + \lambda_l)/2$. Then the process of updating the simplex restarts.
Nelder and Mead suggest stopping their procedure when the standard deviation of the $\lambda_i$'s is less than some critical value. In our empirical work, we strengthen this stopping rule by restarting the algorithm several times from the value on which the Nelder-Mead procedure settles. When this restarting leads to no further significant
improvement in the objective function value, we accept the best point as the minimum
of the function. In implementing the algorithm, we also found it advantageous to
modify the error handling procedures of the Nelder-Mead code provided in GQOPT
slightly to allow us to start the procedure with a wider ranging simplex.
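A compact sketch of a single simplex update, implementing the reflection, expansion, and contraction operations just described, is given below. The coefficient defaults and array conventions are illustrative rather than those of the GQOPT code; in practice one would typically call a library routine such as scipy.optimize.minimize with method="Nelder-Mead".

```python
import numpy as np

def nelder_mead_step(simplex, values, f, a_r=1.0, a_e=2.0, a_c=0.5):
    """One Nelder-Mead update; simplex (m+1, m) and values (m+1,) are modified in place."""
    h = np.argmax(values)                         # worst point lambda_h
    l = np.argmin(values)                         # best point lambda_l
    s_hh = np.max(np.delete(values, h))           # second-highest value
    centroid = np.delete(simplex, h, axis=0).mean(axis=0)

    refl = (1 + a_r) * centroid - a_r * simplex[h]            # reflection
    s_r = f(refl)
    if values[l] <= s_r <= s_hh:                              # accept reflection
        simplex[h], values[h] = refl, s_r
    elif s_r < values[l]:                                     # try expansion
        exp_pt = a_e * refl + (1 - a_e) * centroid
        s_e = f(exp_pt)
        simplex[h], values[h] = (exp_pt, s_e) if s_e < s_r else (refl, s_r)
    else:                                                     # reflection failed: contract
        if s_r < values[h]:                                   # keep the better of lambda_h, lambda_r
            simplex[h], values[h] = refl, s_r
        contr = a_c * simplex[h] + (1 - a_c) * centroid
        s_c = f(contr)
        if s_c < values[h]:
            simplex[h], values[h] = contr, s_c
        else:                                                 # shrink all points toward lambda_l
            simplex[:] = (simplex + simplex[l]) / 2.0
            values[:] = np.apply_along_axis(f, 1, simplex)
    return simplex, values
```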
The Nelder-Mead simplex method was far more successful than the other methods
we tried for minimizing our objective function. There are two aspects of this method
that we believe are responsible for its success. First, the method finds new lower
points on the objective surface without estimating derivatives. Second, by using
the operations of reflection, expansion, and contraction, the Nelder-Mead method
is designed to jump over ridges in the objective surface easily in searching for new
lower points. This property can be important in preventing an optimization algorithm
from shutting down too early. Despite these advantages, however, the performance
of the Nelder-Mead method is not completely satisfactory, because it requires a very
large number of function calls to find the minimum of the function. Given the number
of parameters in our model and the complexity of evaluating the objective function
at any one point, the method can occupy several weeks of computing time on a Sun
SPARCstation. Even though this computing demand is substantial and far greater
than we expected from the outset of this project, we still consider our nonparametric
structural estimator very successful in achieving our goal of estimating a nonlinear
rational expectations model and fully accounting for the complex nonlinear dynamics
of actual time series in that estimation.
Results from applying this estimator to the illustrative monetary model are avail-
able in Bansal, Gallant, Hussey, and Tauchen (1992).
4. CONCLUSION
ACKNOWLEDGEMENTS
This material is based upon work supported by the National Science Foundation
under Grants No. SES-8808015 and SES-90-23083. We thank Geert Bekaert,
Lars Hansen, David Hsieh, Ellen McGrattan, Tom Sargent, and many seminar and
conference participants for helpful comments at various stages of this research.
REFERENCES
Bansal, R., 1990, "Can non-separabilities explain exchange rate movements and risk premia?",
Carnegie Mellon University, Ph.D. dissertation.
Bansal, R., A. R. Gallant, R. Hussey and G. Tauchen, 1992, "Nonparametric estimation of
structural models for high-frequency currency market data", Duke University, manuscript.
Bollerslev, T., 1986, "Generalized autoregressive conditional heteroskedasticity", Journal of
Econometrics 31, 307-327.
den Haan, W. J. and A. Marcet, 1990, "Solving the stochastic growth model by parameterizing
expectations", Journal of Business and Economic Statistics 8, 31-4.
Duffie, D. and K. J. Singleton, 1989, "Simulated moments estimation of markov models of
asset prices", Stanford University, Graduate School of Business, manuscript.
Dunn, Kenneth and K. J. Singleton, 1986, "Modeling the term structure of interest rates under
non-separable utility and durability of goods", Journal of Financial Economics 17, 27-55.
Engle, R. F., 1982, "Autoregressive conditional heteroscedasticity with estimates of the vari-
ance of United Kingdom inflation", Econometrica 50, 987-1007.
Feenstra, R. C., 1986, "Functional equivalence between liquidity costs and the utility of
money", Journal of Monetary Economics 17,271-291.
Gallant, A. R. and G. Tauchen, 1989, "Seminonparametric estimation of conditionally con-
strained heterogeneous processes: asset pricing applications", Econometrica 57, 1091-
1120.
Gallant, A. R. and G. Tauchen, 1992, "A nonparametric approach to nonlinear time series:
estimation and simulation", in David Brillinger, Peter Caines, John Zeweke, Emanuel
Paryen, Murray Rosenblatt, and Murad S. Taggu (eds.), New Directions in Time Series
Analysis, Part II, New York: Springer-Verlag, 71-92.
Gill, P. E., W. Murray, M. A. Saunders and M. H. Wright, 1986, "User's guide for NPSOL
(version 4.0): a Fortran package for nonlinear programming", Technical Report SOL 86-2,
Palo Alto: Systems Optimization Laboratory, Stanford University.
Goffe, W. L., G. D. Ferrier, and J. Rodgers, 1992, "Global Optimization of statistical functions:
Preliminary results" in Hans M. Amman, David A. Belsley, and Louis F. Pau (eds.), Com-
putational Economics and Econometrics, Advanced Studies in Theoretical and Applied
Econometrics, Vol. 22, 19-32, Boston: Kluwer Academic Publishers.
Hansen, L. P., 1982, "Large sample properties of generalized method of moments estimators",
Econometrica 50, 1029-1054.
Hansen, L. P. and T. J. Sargent, 1980, "Formulation and estimation of dynamic linear rational
expectations models", Journal of Economic Dynamics and Control 2, 7-46.
Hansen, L. P. and K. J. Singleton, 1982, "Generalized instrumental variables estimators of
nonlinear rational expectations models", Econometrica 50, 1269-1286.
Ingram, B. F. and B. S. Lee, 1991, "Simulation estimation of time-series models", Journal of
Econometrics 47, 197-205.
Judd, K. L., 1991, "Minimum weighted least residual methods for solving aggregate growth
models", Federal Reserve Bank of Minneapolis, Institute of Empirical Macroeconomics,
manuscript.
Lucas, R. E., Jr., 1982, "Interest rates and currency prices in a two-country world", Journal of
Monetary Economics 10, 335-360.
Marcet, A., 1991, "Solution of nonlinear models by parameterizing expectations: an applica-
tion to asset pricing with production", manuscript.
McCallum, B. T., 1983, "On non-uniqueness in rational expectations models: an attempt at
perspective", Journal of Monetary Economics 11, 139-168.
Nelder, J. A. and R. Mead, 1965, "A simplex method for function minimization", The Computer
Journal 7, 308-313.
Quandt, R. E. and S. M. Goldfeld, 1991, GQOPT/PC, Princeton, N.J.
Stigum, M., 1990, The money market, 3rd ed., Homewood, Ill.: Dow Jones-Irwin.
Svensson, L. E. 0., 1985, "Currency prices, terms of trade and interest rates: a general
equilibrium asset-pricing cash-in-advance approach", Journal of International Economics
18, 17-41.
Tauchen, G., 1990, "Associate editor's introduction", Journal of Business and Economic
Statistics 8, 1.
Taylor, J. B. and H. Uhlig, 1990, "Solving nonlinear stochastic growth models: a comparison
of alternative solution methods", Journal of Business and Economic Statistics 8, 1-17.
On the Accuracy and Efficiency of GMM Estimators: A Monte Carlo Study

A.J. HUGHES HALLETT AND YUE MA
ABSTRACT. GMM estimators are now widely used in econometric and financial analysis.
Their asymptotic properties are well known, but we have little knowledge of their small sample
properties or their rate of convergence to their limiting distribution. This paper reports small
sample Monte Carlo evidence which helps discriminate between the many GMM estimators
proposed in the literature. We add a new GMM estimator which delivers better finite sample
properties. We also test whether biases in the parameter estimates are either significant or
significantly different between estimators. We conclude that they are, with both relative and
absolute biases depending on sample size, fitting criterion, non-normality of disturbances, and
parameter size.
1. INTRODUCTION
One of the most interesting developments in econometric theory over the past decade
has been the introduction of the Generalized Method of Moments (GMM) estimators.
Not only is this a significant development because it offers a new and more flexible
approach to estimation, it also opens up an estimation methodology that is particularly
well suited to a range of problems - such as the econometrics of financial markets -
where the form of the probability distributions, as well as their parameters, plays an
important role.
The theoretical properties of GMM estimators - consistency, asymptotic effi-
ciency and sufficiency - were established rapidly after Hansen first introduced the
concept (Hansen, 1982). These properties are established in Duffie and Singleton
(1989), Smith and Spencer (1991) and Deaton and Laroque (1992). However, few
results have been presented on the GMM's small sample properties or rate of conver-
gence to consistency. This would provide important information on the general relia-
bility of GMM estimators. It seems that we lack such information because, although
the principle of GMM estimation is well defined, there is no obvious agreement on
the algorithms to be used for computing the estimates themselves. The theoretical
contributions have been vague on implementation and the choice of fitting criterion.
This paper examines 7 different suggestions from the recent literature.
The first purpose of this paper is to provide some empirical experience that helps
the user discriminate between different GMM estimation techniques. Second, we
introduce a new GMM estimator which, in our experiments at least, produces better
finite sample results than any of the other techniques reported in the literature. Third,
we draw a distinction between the case where we have to estimate a few model
parameters conditionally on an assumed distribution for the random components (the
traditional econometric approach) and the more general problem of fitting a whole
distribution or probability model.
Most GMM estimators can be specified within the framework established by Hansen
(1982). That framework exploits the general orthogonality condition
$$E\bigl[g(x_t, \beta)\bigr] = 0\,, \qquad\qquad (1)$$
with
$$g(x_t, \beta) = f(y_t) - \hat{f}(x_t, \beta)\,, \qquad\qquad (2)$$
where $f(y_t)$ is any function of the observed data, and $\hat{f}(x_t, \beta)$ is its fitted counterpart under the maintained hypothesis and chosen parameter values $\beta$.
In practice we have to define the best fit in some metric, i.e., we choose $\beta$ by solving
$$\hat{\beta} = \arg\min_{\beta}\ \bigl\| W\, g_T(\beta) \bigr\|_{r}\,, \qquad\qquad (3)$$
where the value of r defines the norm and W the weighting function. This is the GMM strategy when the $f(\cdot)$ represent a series of moments from the probability distribution of $y_t$; that is, $f(\cdot)$ defines the sample moments, and $\hat{f}(\cdot)$ represents the fitted moments given $x_t$ and the choice of $\beta$. In many cases we do not have an
analytic maintained hypothesis, so the fitted moments $\hat{f}(\cdot)$ have to be constructed
by numerical simulation with pseudo-data replicated many times through the model
to generate numerical evaluations of those moments. That variant is the method of
simulated moments.
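As an illustration of this strategy, the sketch below implements a generic simulated-moments criterion of the form (3) with r = 2: sample moments are compared with moments computed from pseudo-data generated under the maintained hypothesis. The use of four raw moments, the number of replications, and the identity weighting are illustrative assumptions, not the specific estimators studied in this chapter.

```python
import numpy as np

def simulated_moments_objective(beta, sample, simulate, n_rep=10, W=None):
    """Generic simulated-moments criterion: quadratic form in sample minus fitted moments.

    simulate(beta, size) -> pseudo-data drawn under the maintained hypothesis at beta.
    """
    def first_moments(x, k=4):
        return np.array([np.mean(x ** j) for j in range(1, k + 1)])

    m_sample = first_moments(np.asarray(sample))
    m_fitted = np.mean([first_moments(simulate(beta, len(sample)))
                        for _ in range(n_rep)], axis=0)
    g = m_sample - m_fitted
    W = np.eye(len(g)) if W is None else W
    return g @ W @ g                      # r = 2 norm with weighting matrix W
```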
Now, if (1) is correct, the sample moment
$$g_T(\beta) = \sum_{t=1}^{T} g(x_t, \beta)\,\Big/\,T$$
should be close to zero when evaluated at the true parameter values. In the present context the elements of $g_T(\beta)$ take the form $m_i - \mu_i$, where $m_i$ is the i-th sample moment and $\mu_i$ is the corresponding central moment from the probability density function expressed in terms of the parameters of the underlying theoretical model. The simplest GMM estimator minimises
$$J_T(\beta) = g_T(\beta)'\, g_T(\beta) = \sum_{j} (m_j - \mu_j)^2\,,$$
while Smith and Spencer's (1991) version considers only the first three moments,
$$J_{3T}(\beta) = g_T(\beta)'\, g_T(\beta) = \sum_{j=1}^{3} (m_j - \mu_j)^2\,.$$
This estimator has the advantage that each element in the objective function is ex-
pressed in the same units of measurement. It therefore produces a result that is
independent of those units, and hence of the weighting of the moments implicit in
the previous two estimators.
(4) The Newey and West (1987) version of the GMM estimator introduces the weighting matrix $W_T = V_T^{-1}$, where
$$V_T = \sum_{t=1}^{T} g(x_t, \beta^*)\, g'(x_t, \beta^*)\,\Big/\,T\,,$$
$$a_{t-1} = \frac{x_t + x_{t-1}}{2}\,, \qquad a_0 = 0\,, \qquad a_T = +\infty\,,$$
and $f(x, \beta)$ is the probability density function of $x_t$. Then $u_t$ represents the difference between the actual frequency of observations in the interval $[a_{t-1}, a_t]$ and the theoretical probability of getting an observation in the same interval according to the maintained hypothesis. We then pick $\beta$ to minimise the "error" between the two probabilities, that is, to maximise the fit between the observed relative frequencies and the maintained probability. To do that, write $u = (u_1, u_2, \ldots, u_T)'$, together with
1 Other values are obviously possible for $\beta^*$, and it would certainly be possible to iterate on the $\beta^*$ values to give a fully converged GLS style estimator.
$$g(x_t, \beta) = T \cdot Z_t'\, u_t\,, \qquad \text{and} \qquad g_T(\beta) = \sum_{t=1}^{T} g(x_t, \beta)\,\Big/\,T = Z'u\,.$$
(6) Finally, we also compare the GMM estimators with the maximum likelihood (ML) estimator. Suppose the pdf is $f(x_t, \beta)$. The ML estimator maximises
$$J(\beta) = \prod_{t=1}^{T} f(x_t, \beta)\,.$$
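In practice the product is maximized through its logarithm. Purely as a hedged illustration (using the gamma case with a shape-rate parameterization, which may differ from the convention used in the chapter), the ML benchmark can be computed along the following lines:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

def fit_gamma_ml(x):
    """Maximize J(beta) = prod f(x_t, beta) by minimizing the negative log likelihood."""
    def negloglik(params):
        r, lam = params                                   # shape r, rate lam (assumed parameterization)
        return -np.sum(gamma.logpdf(x, a=r, scale=1.0 / lam))
    res = minimize(negloglik, x0=np.array([1.0, 1.0]),
                   bounds=[(1e-6, None), (1e-6, None)], method="L-BFGS-B")
    return res.x                                          # estimates of (r, lam)
```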
(7) The existing literature sheds very little light on the small sample properties
of GMM estimators. There appear to be just two studies that attempt to do so.
Tauchen (1986) and Gregory and Smith (1990) both look at the performance of
GMM estimators in a very particular model of assets in macroeconomic performance.
Gregory and Smith find (i) that small sample bias increases as the size of the parameter
being estimated increases (as we do below), (ii) the confidence intervals on the GMM
estimates shrink significantly with increasing sample size (as we do), and (iii) the rate
of convergence to consistency slows noticeably when there are stronger dynamics in
the model. Tauchen concentrates on the instrumental variable version of the GMM
estimator in the same model and finds that the finite sample results are not sensitive
to the choice of instruments. But all of this is done with just 2 sample sizes, 2 GMM
estimators, 2 parameter settings, and 1 maintained hypothesis.
Deaton and Laroque (1992), by contrast, use a different model and report poor
estimates of the underlying distributions with small and medium sample sizes. No
details are given; but evidently rather large samples may be needed to achieve the
desired asymptotic properties - depending on the model, parameter values, estimation
criterion, and distributional context chosen. That calls for closer investigation, both
between different GMM estimators and relative to traditional estimation techniques.
To compare the performances of the various GMM estimators and the ML estimator
in finite samples, we ran a series of Monte Carlo experiments using sample sizes of
20 and 200 and a variety of parameter values. First, we took 5 cases of the normal distribution with parameter values $(\mu, \sigma^2) = \{(0, 1),\ (0, 0.25),\ (0, 2),\ (2, 0.25),\ (2, 2)\}$; then 3 cases of the gamma distribution with parameters $(r, \lambda) = \{(1, 3),\ (3, 1),\ (1, 1)\}$; and finally 3 examples of the beta distribution with $(p, q) = \{(1, 3),\ (3, 1),\ (1, 1)\}$.

TABLE 1

In each experiment, 500 estimation replications were carried out. The
NAG Library Fortran subroutines were used to generate the random pseudo data,
Table 1 (continued)

SAMPLE SIZE = 20
TRUE PARAM. (μ, σ²) = (0, 1)                                                  (d.f. = 4)
METHOD             bias        variance     bias        variance
AHH                0.01103     0.04674     -0.13735     0.09080      3.8      43%
HNW                0.01298     0.05960     -0.18649     0.09332      4.2
D-L               -0.01365     0.46749      0.19824     2.65313      5.2
DS                 0.01402     0.06273     -0.23965     0.94469      6.1
SIMPLE/SS(3)/ML    0.01403     0.06374     -0.29827     0.10405      6.2      20%
and each of the 7 estimators was applied to the resulting 500 "data" sets. In this,
the start-up seeds were randomised by the clock and a quasi-Newton algorithm was
used to find a minimum of a non-linear function subject to fixed upper and lower
bounds for the range of possible parameter values. For the estimates themselves, the
selection criteria are (1) unbiasedness,
$$\mathrm{bias} = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_i\ -\ \beta\,,$$
where $\beta$ represents the true parameter value and N is the number of replications ($N = 500$); and (2) efficiency,
$$\mathrm{variance} = \frac{1}{N-1} \sum_{i=1}^{N} \bigl(\hat{\beta}_i - \bar{\beta}\bigr)^2\,, \qquad \text{where } \bar{\beta} = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_i\,.$$

TABLE 2
The choice of Normal, Gamma, and Beta densities in these experiments covers the
wide variety of distribution shapes that are found in economic and financial data.
Table 2 (continued)

SAMPLE SIZE = 20
TRUE PARAM. (r, λ) = (3, 1)                                                   (d.f. = 4)
METHOD      bias        variance     bias        variance
AHH         0.55093     1.67360      0.18058     0.22084      5.2      27%
HNW         0.65978     2.00182      0.21690     0.25896      5.9
D-L         0.77432     8.77511      0.20008     1.42427      6.4
DS          1.52956     5.66632      0.48944     0.50706      6.7
SS(3)       1.84638     7.06289      0.55156     0.71606      7.0
ML          1.60091     6.13546      0.51914     0.69212      6.8      8%
SIMPLE      2.28789     6.09199      0.77963     0.68508      7.3
Tables 1 to 3 contain the results of our Monte Carlo parameter estimation experiments
for the Normal, Gamma, and Beta distribution cases, respectively. For each parameter estimate we report the bias and variance achieved by our 7 GMM techniques across the
500 Monte Carlo replications. For reasons of space we present only the results for the
small sample size experiments (T = 20) and for the large sample sizes (T = 200).
Results for the intervening cases (T = 50, 100 etc.) are available from the authors.
In what follows, we take the bias in an estimated parameter to be an indicator of
the relative accuracy of the given estimator in the specified circumstances, and the
variance to be an indicator of the estimator's reliability (or sensitivity to "outliers" in
the data).
TABLE 3
Both criteria, small sample bias and small sample efficiency, put our own GMM
estimator (denoted AHH here) in first place for performance and the Hansen-Newey-
West estimator (HNW) in second place. There are a total of 88 comparisons here 2, and
there is just one case where our GMM estimator does not perform best [the maximum
likelihood technique produces a marginally smaller variance for the second parameter
estimate in the N(2, 0.25) and T = 200 case]. Similarly there are just two cases
2 11 distributions (of 3 types) each with 2 parameters judged by 2 criteria in 2 sample size
experiments.
Table 3 (continued)

SAMPLE SIZE = 20
TRUE PARAM. (p, q) = (1, 3)                                                   (d.f. = 4)
METHOD      bias        variance     bias        variance
AHH         0.15040     0.14359      0.60575     1.95128      4.5      34%
HNW         0.15425     0.18224      0.60940     2.25979      4.7
D-L         0.40169     0.93666      1.23680     6.73138      5.3
DS          0.15920     0.18723      0.62354     2.30046      4.8
SS(3)       0.16143     0.18834      0.63016     2.30985      4.9
ML          0.47709     0.46201      1.63302     5.00642      5.7      14%
SIMPLE      0.42583     1.47930      1.22201     9.34737      6.0
The most obvious feature of these results is that the methods are largely unbiased and
efficient. An exception is the poor performance of the Deaton-Laroque estimator in
small samples (T = 20). This poor performance is concentrated in the variances of
these parameter estimates, which are often (but not always) 50 to 500 times larger than those of the other estimators, particularly for the second parameter. There are fewer problems with the bias of the estimates, although 7 out of the 10 bias results show signs opposite to those of the other estimators. Things look better in large samples.
The Deaton-Laroque estimator generally produces better results than the methods of
simulated moments or maximum likelihood and captures third place for large values
of T. Consequently it appears that the Deaton-Laroque method requires much larger
samples to achieve reasonable statistical properties. This sensitivity or unreliability
in small sample sizes is also shared by the maximum likelihood estimates and also
appears in the Beta and Gamma distribution results below, although the Deaton-
Laroque estimator will not be singled out there since all the estimators (beyond the
best two) do badly in those exercises.
A second feature is that both the bias and the variance of the estimates fall roughly by a factor of 10 with a 10-fold increase in the sample size, suggesting that estimates by any of the 7 methods converge on statistical consistency at the rate of $O(T^{-1})$.
This feature does not seem to vary much among the different techniques. Thus, the
performance ranking remains as described above: our GMM estimator dominates
the Hansen-Newey-West estimator in every case, and the latter in turn dominates all
others. Moreover the degree of dominance of our estimator over the Hansen-Newey-
West estimator is usually larger than the dominance of the latter over the next best.
Finally, both the bias and the variance of the estimates rise somewhat with larger values of $\sigma^2$ (the distribution's second parameter) and rather less so with $\mu$ (its first parameter), but these tendencies are weak compared to the results which follow in
Tables 2 and 3.
The most awkward result in Table 1, therefore, is the poor performance of the
maximum likelihood estimator. In this exercise, it produces independent estimates
of $\mu$ and $\sigma^2$ and should be efficient at any sample size; its results should be at least as
good as any of the others. One explanation why this is not so is that the differences
observed in Table 1 may not be statistically significant but are simply the result of
different numerical procedures. This possibility is examined in Section 5 below.
The estimates in Table 2 show much greater bias and inefficiency - particularly in
small samples, where the estimates of at least one of the two parameters are really
very poor. It seems that consistency here requires a substantially larger sample size
than for problems involving normally distributed variables.
Having said that, our GMM estimator still dominates Hansen-Newey-West, but
by a smaller margin than the latter dominates the next best. In that sense, the best two
both pull ahead of the pack in small samples. This implies that there is an increasing
relative (but not absolute) reliability as the quality of the estimates starts to fall. In
any event, it now matters more which estimator one chooses. Moreover, the choice
between estimators is wider in that, even in larger samples, the biases and variances
of the parameter estimates may be 8-10 times larger if the "wrong" estimator is used.
Once again, both the Deaton-Laroque and the maximum likelihood estimators appear
to be more unreliable and inaccurate than the others in small samples. On the other
hand, the biases and variances from the two best techniques fall by factors of 10 or
more when the sample size is increased from 20 to 200, suggesting that the estimators
are still converging on consistency a little faster than $O(T^{-1})$.
Finally, both the bias and variances tend to increase with the size of the underlying
parameter but, interestingly, not with the size of the other parameter value.
As in the Gamma distribution results, the estimates in Table 3 show relatively large
biases and variances in small samples. Consistency therefore requires fairly large
samples, though not as large as the Gamma distribution estimates.
As before, our GMM estimator dominates all others, and the Hansen-Newey-West
estimator comes second, for accuracy, reliability, and mean square errors. The degree
of dominance is reasonably large again, which is consistent with the proposition that
these two estimators pull ahead of the pack as the quality of the estimates starts to
fall. The simple method of moments and the Deaton-Laroque estimators continue to
perform badly in small samples, and the two best methods show biases and variances
failing by factors of 10 or more as the sample increases from 20 to 200. So once
again, consistency appears to be achieved at a rate of more than $O(T^{-1})$.
The most striking feature of these results is the relatively poor performance of the
maximum likelihood estimators. Numerically, the maximum likelihood estimators
produce the worst or second worst bias results in 19 out of the 20 Normal distribution
tests (Table 1),6 out of the 12 Gamma distribution tests (Table 2), and 9 out of 12
Beta distribution tests (Table 3). Similarly they show the largest or second largest
variances in 18 out of 20 tests in Table 1, in 2 of the 12 tests in Table 2, and in 6 out
of 12 tests in Table 3. So, for accuracy and reliability, maximum likelihood performs
relatively badly compared to the leading GMM estimators.
These results are remarkable because we know that, in the case of normally
distributed variables at least, maximum likelihood estimates are independently dis-
tributed and efficient in the sense of actually reaching the Cramer-Rao lower bound
(Mood, Graybill and Boes, 1974, chapter 7). Theoretically, they cannot be beaten.
For the Gamma and Beta distribution, the theory is not so clear. First, maximum
likelihood estimates are now no longer independent of one another. Estimators that
pay little attention to the higher order moments of the distribution being fitted may
be able to secure lower biases (or variances) in their parameter estimates at the
implicit cost of higher biases in those higher order moments which are unpenalised.
Such trade-offs are not available for an estimator that tries to fit the entire likelihood
function.
Second, the unbiasedness and efficiency properties of maximum likelihood esti-
mation are now only asymptotic, and our sample sizes of 20 to 200 may be too small
to capture those properties. Thus maximum likelihood estimates may have produced
worse results than some of the GMM estimators because the maximum likelihood
estimators' sampling distributions converge more slowly (in both mean and variance)
to their limiting distribution.
These arguments can explain why the maximum likelihood estimates are worse
than some of the GMM estimates in the Gamma and Beta tests, but not why they
are worse in the Normal distribution tests. In this latter case, there are only two
possible explanations: the maximum likelihood biases (variances) in Table 1 are
not statistically significant, and/or they are the result of numerical instabilities in the
algorithm used to compute them. The implications of these two explanations are
quite different, however. If the biases (variances) are not significantly different from
zero (or each other), then there are no problems with the estimating techniques, and,
statistically it does not matter which is chosen. But if they are significant, it would
be worthwhile to examine the numerical properties of different maximum likelihood
algorithms to eliminate (as far as possible) any problems of numerical instability.
Our recorded estimate of the first parameter (the mean) is the average of N replications, $\bar{X} = N^{-1}\sum_i \bar{X}_i = (NT)^{-1}\sum_i\sum_j X_{ij}$. If the underlying observations $X_{ij}$ are drawn from a Normal$(\mu, \sigma^2)$ distribution, then $\bar{X} \sim N(\mu, \sigma^2/NT)$ exactly, and our recorded bias is distributed as $N(0, \sigma^2/NT)$ in Table 1. Similarly our recorded estimate of the second parameter (the distribution's variance) is the average of N replications; i.e., $\bar{s}^2 = N^{-1}\sum_i s_i^2$, where each $(T-1)s_i^2/\sigma^2$ is distributed $\chi^2_{(T-1)}$ and has variance $2(T-1)$. Hence $s_i^2$ has variance $2\sigma^4/(T-1)$ and $\bar{s}^2 \stackrel{a}{\sim} N\bigl(\sigma^2,\ 2\sigma^4/N(T-1)\bigr)$, so the bias in the latter is distributed asymptotically as $N\bigl(0,\ 2\sigma^4/N(T-1)\bigr)$.
For the results in Table 1, we have $T = 20$ or $200$, $N = 500$, and five different generating distributions. The bias in the first parameter estimate (Table 1, column 1) will therefore be significantly different from zero at a 5% level if it lies outside the interval $\pm 1.96\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{NT}$. Similarly, the biases in the second parameter estimate will be significant at a 5% level if they lie outside the interval $\pm 1.96\sigma_s$, where $\sigma_s = \sigma^2\sqrt{2/N(T-1)}$. Table 4 summarises $\sigma_{\bar{x}}$ and $\sigma_s$ for the five different distributions represented in Table 1.
Evidently the maximum likelihood estimates of the mean ($\mu$) show significant biases at a 5% level only in the N(0, 1/4) case with $T = 200$. The remaining 9 maximum likelihood estimators show no significant biases. These tests are exact. The tests of bias in the estimates of the variance parameter are asymptotic with respect to a "sample" size of 500, but show a greater number of significant biases. In fact, significant biases appear in all 11 second parameter estimates.

TABLE 4

              σ_x̄                              σ_s
         σ² = 0.25   σ² = 1   σ² = 2      σ² = 0.25   σ² = 1   σ² = 2
T = 20     .005        .01      .014        .004        .015     .029
T = 200    .0016       .0032    .0045       .001        .005     .009
Hence maximum likelihood estimation has produced some significant biases,
more when estimating of the second parameter than the first. Both the presence of
significant biases in half the cases and the fact that these biases tend to appear in both
the variance parameter and the larger samples for the estimated mean suggest that
numerical instability is at least part of the reason for the poor maximum likelihood
performance. Indeed the estimates for ai and a; from Table 4 are smaller than
the variances of the estimates actually recorded in columns 2 and 4 of Table 1.
The computed distributions of our estimates are therefore very much wider than the
Cramer-Rao lower bound would imply, which is symptomatic of numerical instability.
However, that is not the real issue. The crucial question is, are these biases actually
larger (in a statistical sense) than those coming from the GMM estimators? The
difficulty here is that the exact distributions of the parameter estimates obtained from
the GMM techniques are not analytically tractable, since their estimating equations
do not admit a closed form solution. Thus, we cannot obtain an exact variance of the
parameter estimates $\bar{x}_i$ and $s_i^2$ to which we could apply the central limit theorem to derive tests of the biases in $\bar{x}$ or $\bar{s}^2$. However, we can use the maximum likelihood
values already obtained to estimate those variances. That gives the test results in
Table 7.
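The mechanics of these tests are straightforward. The sketch below (Python/NumPy, not part of the paper; the recorded biases in the example call are illustrative, not taken from Table 1) flags a recorded bias as significant whenever it exceeds 1.96 times the relevant normal-theory standard error, with the maximum likelihood estimate of $\sigma^2$ standing in for the unknown variance in the GMM cases.

```python
import numpy as np

def flag_significant_bias(recorded_bias, sigma2, T, N=500, z=1.96):
    """Flag a recorded Monte Carlo bias as significant at the 5% level,
    using sigma/sqrt(NT) for the mean and sigma^2*sqrt(2/(N(T-1)))
    for the variance, as described in the text."""
    se_mean = np.sqrt(sigma2) / np.sqrt(N * T)
    se_var = sigma2 * np.sqrt(2.0 / (N * (T - 1)))
    bias_mean, bias_var = recorded_bias
    return abs(bias_mean) > z * se_mean, abs(bias_var) > z * se_var

# illustrative numbers only
print(flag_significant_bias(recorded_bias=(0.004, -0.03), sigma2=1.0, T=20))
```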
Thus, whereas it is possible to argue that maximum likelihood estimators do
not provide any significant biases in the first parameter, and that the actual biases
observed are the result of numerical instabilities in the algorithm used to maximise
the likelihood function, the same cannot be said for the GMM estimators. With our
tests, there is a much higher incidence of significant bias in both parameters and
both sample sizes. The chief offenders are the Deaton-Laroque (DL) and the Method
of Simulated Moments (DS) estimators. At the other end of the scale, our own
GMM estimator and the HNW estimator produced no significant biases in the first
parameter estimate and fewer in the second. Hence there are significant differences
in accuracy and reliability between the AHH and HNW estimators on the one hand,
and the remaining GMM estimators on the other.
Here formal testing for bias is difficult since the exact distribution of estimates, even
for maximum likelihood, of the two parameters is unknown. Thus, we are unable to
determine the variances of those estimators to which the Central Limit Theorem might
otherwise apply. Further we cannot substitute the estimated variances (maximum
likelihood or otherwise) obtained in Table 2, for biased parameter estimates entail
biased variance estimates (there being no independence property now). Any formal
justification for this approach has thus disappeared.
But even if conventional asymptotic tests are not possible, a conditional test can
be used that is a sufficient (but not necessary) condition for detecting significant
biases. The maximum likelihood estimates of a Gamma distribution are obtained by solving

$r = \bar{x}\lambda \qquad \text{and} \qquad \lambda = e^{\psi(r)} \Big/ \Big(\textstyle\prod_{i=1}^{T} x_i\Big)^{1/T}$   (4)

simultaneously for $r$ and $\lambda$, where $\bar{x} = T^{-1}\sum_{i=1}^{T} x_i$ and the $x_i$ are the random drawings in a sample of size $T$. The function $\psi(r)$ is the Digamma function, $\psi(r) = d\log\Gamma(r)/dr$, where $\Gamma(r) = (r-1)!$ for integer $r$. Note that $\psi'(r) > 0$, so $\psi$ is monotonically increasing in $r$ for $r \geq 1$.
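For readers who want to reproduce the estimator, equation (4) is easy to solve numerically once $\lambda$ is eliminated via $\lambda = r/\bar{x}$. The following sketch (Python with SciPy, which the authors do not use; the sample is simulated purely for illustration) solves the resulting single equation in $r$.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def gamma_mle(x):
    """Solve r = xbar*lam and lam = exp(psi(r)) / geometric_mean(x)
    jointly for (r, lam), as in equation (4)."""
    xbar = x.mean()
    log_gmean = np.log(x).mean()                    # log of the geometric mean
    # eliminating lam = r / xbar leaves one equation in r:
    #   digamma(r) - log(r) = log_gmean - log(xbar)
    f = lambda r: digamma(r) - np.log(r) - (log_gmean - np.log(xbar))
    r_hat = brentq(f, 1e-6, 1e6)
    lam_hat = r_hat / xbar
    return r_hat, lam_hat

rng = np.random.default_rng(0)
sample = rng.gamma(shape=3.0, scale=1.0, size=20)   # Gamma(r=3, lambda=1), T=20
print(gamma_mle(sample))
```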
However, for testing purposes, we can form conditional estimates of $r$ and $\lambda$ by inserting the true values of $\lambda$ and $r$ from the underlying distribution on the right of (4). Call these conditional estimates $r^*$ and $\lambda^*$, and let the actual estimates obtained by solving (4) be $\hat{r}$ and $\hat{\lambda}$. Then, with positively biased estimates, the probability of any particular positive bias in $\hat{r}$ or $\hat{\lambda}$ is less than the probability of the same bias in $r^*$ or $\lambda^*$ under the null of unbiasedness. Hence a sufficient condition for the biases in Table 2 to be significant (at the 5% level) is that they should be significant for $r^*$ or $\lambda^*$. Indeed (4) implies that $r^* \sim N(r,\ \lambda^2\sigma^2/NT)$. Using the fact that $\sigma^2 = r/\lambda^2$ for each of the three Gamma distributions estimated, we find the maximum likelihood estimates of $r$ to be significantly biased3 for both the small and large samples. The consistency and asymptotic efficiency of GMM estimators allow us to extend these asymptotic tests to the other estimators of $r$ in Table 2. Once again, all estimates show significant biases.
Hence we conclude that, in the case of the Gamma distribution tests, all estimators
show significant biases that do not vanish with larger sample sizes. The sampling distributions evidently converge slowly on their asymptotic distributions, in terms both of unbiasedness and of variances, which remain larger than in the limit (compare the values of $\lambda^2\sigma^2/NT$ with column 2 of Table 2).

TABLE 5
Biases in the estimated means and variances of gamma distributed variables from Table 2 ($\mu = \hat{r}/\hat{\lambda}$; $\sigma^2 = \hat{r}/\hat{\lambda}^2$).

True parameters                      T = 20                        T = 200
and distribution   Estimator   Bias in mean  Bias in variance  Bias in mean  Bias in variance
(3,1)              AHH             .008          -.452            .0113         -.02
μ = 3              HNW             .007          -.529            .0115         -.034
σ² = 3             DL              .145          -.379           -.2015          .225
                   DS             -.212          -.958           -.0106         -.188
                   SS(3)           .124          -.987            .040          -.131
                   ML              .029         -1.006            .0142         -.147
                   Simple         -.029         -1.330           -.016          -.331

It is clear that both our preferred GMM
estimators (AHH first, and then HNW) are more accurate and more reliable (having
smaller biases and lower variances) than their rivals - including maximum likelihood.
But, this does not cause them to be unbiased or near-minimum variance. In fact these
results are purely relative: while our own GMM estimator is preferable to the others,
it is not necessarily good.
And this is as far as we can go. Conditional tests on $\lambda$ itself are not possible since the variance of the distribution of the inverse geometric mean, $(\prod x_i)^{-1/T}$, is not known and the central limit theorem cannot be applied. Beyond this, we can only look at the biases in the estimated means ($= \hat{r}/\hat{\lambda}$) and variances ($= \hat{r}/\hat{\lambda}^2$)
numerically. These figures are given in Table 5, but formal tests are not possible
since both are derived from ratios of nonindependently distributed random variables.
It is clear from Table 5 that the biases in the mean are systematically smaller than
those in the variance, and they vary less across estimators than do those for the
variance estimates. 4
These results illustrate an important point. General statements indicating that a
particular estimator is more accurate, or converges faster to its asymptotic distribution,
can be extremely misleading. In this exercise the means have been well estimated in
all cases. The variances are less well estimated - but their fit is still good compared
to many of the estimates of the T, A parameters. And such results are easily obtained,
since even significant biases in T and A of the same sign will offset each other to
produce means or variances with relatively little bias. That is, the quality of the results
obtained from estimating particular characteristics may be quite different from those
obtained from fitting the distribution as a whole. Hence it matters whether the real
objective is to fit particular parameters or the distribution as a whole.
Here not even conditional tests are available to determine the significance of the
biases in the maximum likelihood estimates of Table 3. These estimates arise from
solving
simultaneously for p and q, a process that does not yield a tractable closed-form
solution. At best one can inspect the numerical biases in Table 3 or the equivalent
bias results in Table 6. But, just as these, Table 6 shows how easily numerically
"significant" biases in the parameter estimates can offset one another to give appar-
ently unbiased mean and variance estimates. Both are estimated with much smaller
numerical biases than are p and q themselves. There is no clear tendency here for the
variance to be more biased than the mean, and both biases show a stronger tendency
to diminish with increasing T. Nor is there any apparent ranking of biases across
estimators. Yet the general message is the same: it matters for estimation whether
one focuses on particular characteristics of the distribution or its entirety.
To test the goodness of fit of the entire distribution implied by each replication underlying the results in Tables 1 to 3, we have used the traditional $\chi^2$ test: the likelihood ratio goodness-of-fit test (Kendall and Stewart (1974)). The mean $\chi^2$ statistics, for each estimation technique under review, are given in Tables 1 to 3.

4 Our own GMM estimator generally does better than the other estimators in Table 5. On the other hand, the bias in the variance estimates converges to zero with increasing T, but there is little convergence of the biases in the means.

TABLE 6
Biases in the estimated means and variances of beta distributed variables from Table 3 ($\mu = p/(p+q)$; $\sigma^2 = pq/[(p+q)^2(p+q+1)]$).

True parameters                      T = 20                        T = 200
and distribution   Estimator   Bias in mean  Bias in variance  Bias in mean  Bias in variance
(1,3)              AHH            .0081         -.0056            .0005          .0003
μ = 1/4            HNW            .0077         -.0056            .0007         -.0048
σ² = .0375         DL             .0014         -.0094            .0007         -.0003
                   DS             .0076         -.0057            .0007         -.0004
                   SS(3)          .0076         -.0058            .0011         -.0006
                   ML             .0083         -.0117            .0007         -.0011
                   Simple         .0025         -.0091            .0024          .0002
The conventional goodness of fit test would accept the null hypothesis that the
observations fitted by the named technique conformed to a normal, gamma or beta
distribution, respectively, if the associated $\chi^2$ test statistics were less than the critical
values of 27.6 (for a 5% significance level and T = 200). For T = 20, the critical
value is 9.5. Every estimator therefore passes this test easily, even in the smaller
samples, and the null hypothesis is correctly accepted.
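The quoted critical values correspond to the 5% upper-tail points of $\chi^2$ distributions with roughly 17 and 4 degrees of freedom, which presumably reflect the number of cells used at the two sample sizes (the text does not report the cell counts). A quick check, assuming Python with SciPy:

```python
from scipy.stats import chi2

# 5% upper-tail critical values; the degrees of freedom (17 and 4) are inferred
# from the quoted thresholds, not stated explicitly in the text.
print(round(chi2.ppf(0.95, df=17), 1))   # 27.6
print(round(chi2.ppf(0.95, df=4), 1))    # 9.5
```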
It is clear, however, that these tests are considerably more powerful in the larger samples.
TABLE 7
Tests of the significance of the estimates' bias for the Normal($\mu$, $\sigma^2$) cases.
Columns: $\hat{\mu}$ and $\hat{\sigma}^2$ for each of N(0,1), N(0,2), N(0,1/4), N(2,2), N(2,1/4).

T = 200
  AHH                   * * *
  HNW                   * * * * *
  DL                    * * * * * * * *
  DS                    * * * * * *
  SIMPLE, SS(3), ML     * * * * * *

T = 20
  AHH                   * * * * *
  HNW                   * * * * *
  DL                    * * * * * * * *
  DS                    * * * * *
  SIMPLE, SS(3), ML     * * * * *

Note: * indicates a significant bias at the 5% level.
7. CONCLUSIONS
Basically, our concerns about the poor small sample properties of GMM estimators have been borne out. While we have observed a fairly rapid rate of convergence
towards consistency and asymptotic efficiency, there is still evidence of statistically
significant biases and large variances, even in the larger samples.
Just how bad the small sample properties actually are depends on the particular
estimation technique chosen. It matters which GMM estimator is used and which
numerical implementation of the maximum likelihood estimator is applied. In these
exercises there is a clear ranking: our own GMM estimator performs best, followed
by the Hansen-Newey-West estimator, and then the Method of Simulated Moments.
The Deaton-Laroque estimator shows a great deal of variability in small samples, but performs relatively well in larger samples.
Moreover, it appears that the differences between the performance of these es-
timators widen as we depart from the classical assumptions of large samples and
normally distributed variables. We find the results are sensitive to the sample size,
the form of fitting criterion, non-normality in the underlying distribution, and the
size of the parameter being estimated. We also find that most estimators do worse in terms of efficiency than of unbiasedness. Nevertheless, the GMM estimators are all fairly good at fitting probability distributions in their entirety, even in relatively small samples.
APPENDIX

where

$\Gamma(r) = \int_0^{\infty} s^{r-1} e^{-s}\, ds.$

Then

$\mu_1 = \frac{p}{p+q}, \qquad \mu_2 = \frac{pq}{(p+q+1)(p+q)^2}, \qquad \mu_3 = \frac{2pq(q-p)}{(p+q+2)(p+q+1)(p+q)^3}.$
ACKNOWLEDGEMENTS
We are grateful to Dave Belsley, Gregor Smith, Jim Powell, Robin Lumsdaine and
participants of the Econometrics Seminar at Princeton for their comments.
REFERENCES
Deaton, A.S. and Laroque, G. (1992) On the behaviour of commodity prices, Review of
Economic Studies, 59, 1-24.
Duffie, D. and Singleton K.J. (1989) Simulated Moments Estimation of Markov Models of
Asset Prices, Stanford University Discussion Paper, Stanford, CA
Gregory, A. and G. Smith (1990) "Calibration as Estimation", Econometric Reviews, 9, pp. 57-
89.
Hansen, L.P. (1982) Large sample properties of generalised Method of Moments Estimators,
Econometrica, Vol. 50, pp 1029-1054.
Hughes Hallett, A.J. (1992) Stabilising earnings in a volatile market, paper presented at the Royal Economic Society Conference, London (April).
Kendall, M.G. and Stewart, A. (1973) The Advanced Theory of Statistics, Vol. 2, Third Edition, Griffin & Co., London.
Mood, A., F. Graybill and D. Boes (1974) Introduction to the Theory of Statistics, McGraw-Hill, New York.
Newey, W.K. and West, K.D. (1987) A simple, positive semi-definite, heteroscedasticity and autocorrelation consistent covariance matrix, Econometrica, 55, pp. 703-708.
Smith, G. and Spencer M. (1991) Estimation and testing in models of exchange rate target
zones and process switching, in P. Krugman and M. Miller (eds), Exchange rate targets
and currency bands, Cambridge University Press, Cambridge and New York.
Tauchen, G. (1986) Statistical Properties of Generalised Method of Moments Estimators of
Structural Parameters Obtained from Financial Market Data, Journal of Business and
Economic Statistics, 4, pp.397-425.
ALBERT J. REED AND CHARLES HALLAHAN
1. INTRODUCTION
problems, but we apply it here to dynamic and stochastic problems. The stochastic
regulator encompasses a wide range of dynamic and stochastic models, and dynamic
and stochastic models provide a rich interpretation of economic data. These models
readily differentiate among the response of an endogenous variable to an actual
change, to a perfectly expected change, and to an unexpected change in an exogenous
variable. Furthermore the problem addresses the Lucas critique by recognizing that
such responses are not invariant to systematic changes in policy.
After discussing the stochastic regulator problem in Section 2, the bootstrap
estimator is presented in Section 3. Section 4 provides an example of interest to
agricultural economists, and Section 5 summarizes the paper.
2. THE STOCHASTIC REGULATOR PROBLEM

Here we review the setup of the stochastic regulator, its solution, and the conditions
that deliver the solution. A more thorough treatment can be found in Sargent (1987b,
Chapter 1). An understanding of the stochastic regulator is crucial to understanding
the estimation procedure.
Consider a general dynamic and stochastic optimization problem defined by a vector of state variables $x = [x_1' : x_2']'$ and a vector of control variables $u$. The problem is to find the control sequence $\{u_t\}$ satisfying

$V(x_0) = \max_{\{u_t\}} \mathcal{E}_0 \sum_{t=0}^{\infty} \beta^t r(x_t, u_t)$

subject to

$x_{1t+1} = g_1(x_t, u_t),$

where

$\mathcal{E}\left[ V(g(x_t, u_t, \varepsilon_{t+1})) \mid x_t \right] = \int V(g(x_t, u_t, \varepsilon_{t+1}))\, dG(\varepsilon).$

If

$\partial g_1 / \partial x_1 = 0$

and $x_{1t}$ does not Granger cause $x_{2t}$ (i.e., $\partial g_2/\partial x_1 = 0$), then, because $\partial g_2/\partial u_t = 0$, the necessary conditions reduce to
The above conditions are termed Euler equations and have a convenient structure. The parameters of the Euler equations contain only the parameters of $\partial g_1/\partial u_t$, $\beta$, and the parameters of the return function. Unlike the first-order conditions for more general problems, the Euler equations do not contain $V'(x_{t+1})$, which complicates estimation efforts because it changes systematically over an iterative solution procedure and presumably over a data sample. For this reason the proposed estimation procedure applies to dynamic problems in which the state and control variables can be written so that $\partial g_1/\partial x_1 = 0$.
The Euler equations are unobservable because of the expectations operator. If $e_t$ is a forecast error and the $x_{t-j}$ ($j = 0, 1, \ldots$) are elements of an information set, the Rational Expectations Hypothesis (REH) states $\mathcal{E}(e_t \mid x_t, x_{t-1}, \ldots) = 0$. If the parameters of the Euler equations can be expressed in terms of the parameter vector $\theta$, the forecast error can be written as a function $e_t(\theta)$ of the data and $\theta$, and the GMM estimator of $\theta$ is
$\hat{\theta} = \arg\min_{\theta} S(\theta, V),$

where

$S(\theta, V) = m_n(\theta, x)'\, V\, m_n(\theta, x)$

and

$m_n(\theta, x) = n^{-1} \sum_{t=1}^{n} e_t(\theta) \otimes z_t.$
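As a concrete illustration of this criterion, the following sketch (Python/NumPy, which is not part of the chapter; the moment function and instruments are placeholders) evaluates $S(\theta, V)$ for Euler-equation residuals interacted with instruments.

```python
import numpy as np

def gmm_objective(theta, euler_residual, Z, V):
    """S(theta, V) = m_n(theta)' V m_n(theta), with
    m_n(theta) = n^{-1} sum_t e_t(theta) (x) z_t  (Kronecker product).

    euler_residual(theta) must return an (n, k) array of Euler-equation
    errors; Z is an (n, q) array of instruments.  Both stand in for the
    model-specific objects described in the text."""
    E = euler_residual(theta)                        # (n, k)
    n = E.shape[0]
    # Kronecker product e_t (x) z_t for each t, then average over t
    moments = np.einsum('tk,tq->tkq', E, Z).reshape(n, -1)
    m = moments.mean(axis=0)
    return m @ V @ m

# toy usage: one residual per period, a constant and one lagged instrument
rng = np.random.default_rng(0)
y = rng.normal(size=100)
Z = np.column_stack([np.ones(99), y[:-1]])
resid = lambda theta: (y[1:] - theta * y[:-1])[:, None]
print(gmm_objective(0.0, resid, Z, np.eye(2)))
```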
To obtain a closed-form solution to recursive dynamic and stochastic optimization
problems, one must compromise on the functional form. The Stochastic Optimal
Linear Regulator specifies a quadratic objective function and linear constraints. This
class of models takes the form
$V(x_0) = \max_{\{u_t\}} \mathcal{E}_0 \sum_{t=0}^{\infty} \beta^t \left\{ [x_t',\ u_t'] \begin{bmatrix} T & W \\ W' & q \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix} \right\}$

subject to

$x_{t+1} = a x_t + b u_t + \varepsilon_{t+1},$

where

$x_t = \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}, \qquad T = \begin{bmatrix} T_{11} & T_{21}' \\ T_{21} & 0 \end{bmatrix}.$
The infinite time horizon problem is re-cast as a two period problem comprising
Bellman's equation
Assuming $x_{1t}$ does not Granger cause $x_{2t}$, we have $a_{21} = 0$ and $b_2 = 0$, and the problem is recursive. If, in addition, $a_{11} = 0$, the Euler equations serve as a set of necessary conditions. Notice that the Euler equations contain only the parameters $b_1$ and the parameters of the objective function.
Now, define $b = [b_1' : b_2']'$ and the matrix $a$ with elements $a_{ij}$, and make the transformations

$v_t = q^{-1} W' x_t + u_t, \qquad Q = q, \qquad B = b.$
This permits the problem to be re-stated in standard linear regulator form, with solution

$v_t = -F x_t,$

where $F$ is obtained from the associated Riccati equations.
The above discussion indicates that convergence of the Riccati equations gives rise to an important function. This function maps the parameters of the stochastic regulator (i.e., $T_{11}$, $T_{21}$, $W$, $q$, $a_{12}$, $b_1$, and $a_{22}$) to the reduced-form coefficients $A - BF$.
The above discussion also reveals that iterations on the Riccati equations amount to solving the dynamic problem 'backwards'. In the two-period reformulation of
the problem, period t's value function is defined as the maximum of the current
period return and the next period's expected value function. Period t - l's value
function is defined as the maximum of period t - l's return function and period
t's expected value function. Back substituting next period's value function into the
current period's condition yields a sequence of optimal controls. In short, the solution
procedure proceeds forward by computing past values of the optimal control.
By definition, finite time horizon problems are bounded, and their solution requires beginning in the terminal period and ending in the starting period. However, infinite time horizon problems require bounded value functions, which in turn require that distant-period return functions and their controls approach zero. Notice that if $P_0 = 0$, $F_0 = 0$, $A - BF_0 = 0$, and $B \neq 0$, the control in period $T$ (i.e., $v_T$) goes to 0 as $T \to \infty$. Hence, setting $P_0 = 0$ and iterating on the Riccati equation until the matrix $P$ converges to a fixed point is equivalent to solving the infinite time horizon problem backwards.
The reduced-form solution of the stochastic regulator describes the movement of
economic data in four different, but interrelated, dimensions. First, the reduced-form
is not invariant to systematic changes in policy. A systematic change in a policy
variable within the $x_2$ vector is represented by a change in the $a_{22}$ coefficient. The
solution procedure indicates a change in policy will not only alter the A matrix, but
also will alter F, and therefore alter decision rules of agents. Hence, the problem
addresses the Lucas critique of econometric policy evaluation in which reduced forms
are not invariant to changes in policy.
Second, like any regression model, the reduced form coefficients measure the
response of next period's state vector to a one unit change in the current state vector.
Third, the reduced-form describes the response of the economy to $\varepsilon_{t+1}$, the vector of
exogenous shocks. Specifically, such a change cannot be predicted either by agents
in the model or by the econometrician, based on the current period state variables.
The above setup implies a serially correlated response of the state variables to a
single, uncorrelated shock. A persistently higher path of food prices following a
drought describes a serially correlated response to a single, uncorrelated surprise. A
bounded regulator problem implies a stable A - BF matrix (one with eigenvalues
less than unity in modulus). A stable A - BF matrix implies the state variables can
be expressed as a function of current and past shocks. In particular, let the matrix H
capture the instantaneous causality (covariance) between elements of the $\varepsilon_t$ vector, and define $e_t$ as the vector of uncorrelated errors (Sargent, 1978). The inverted
system is
$x_t = \sum_{j=0}^{\infty} (A - BF)^j H e_{t-j}.$
The coefficients of this impulse response function measure the contribution of past
shocks on the current state vector. Equivalently, the coefficients measure the persis-
tent movement of the state vector following a single shock.
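A minimal sketch of how these impulse response coefficients can be computed by powering $A - BF$ (Python/NumPy; the matrices are illustrative, not estimates from the chapter):

```python
import numpy as np

def impulse_responses(A_BF, H, horizon):
    """Coefficients of x_t = sum_j (A - BF)^j H e_{t-j}: the response of the
    state vector j periods after a one-unit uncorrelated shock."""
    coeffs, M = [], np.eye(A_BF.shape[0])
    for _ in range(horizon + 1):
        coeffs.append(M @ H)
        M = A_BF @ M            # next power of (A - BF)
    return coeffs

A_BF = np.array([[0.8, 0.1], [0.0, 0.5]])   # a stable reduced form, H = I
for j, C in enumerate(impulse_responses(A_BF, np.eye(2), 3)):
    print(j, C.round(3))
```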
Fourth, it can be shown that the linear (in variables) Euler equations
can be factored into symmetric 'feedback' and 'feedforward' terms, and the endoge-
nous state vector XIt can be expressed as a function of the future expected stream
of the exogenous state variables $\{x_{2t}\}$ (Sargent, 1987a, Ch. 14). Since $\{x_{2t+j}\}$ is assumed known, the prediction equations describing the stochastic path of $x_2$ are ig-
nored. The computation of this 'perfect foresight' solution is detailed in the Appendix
for the example given in a subsequent section.
The proposed estimation procedure enables the analyst to compute and make
approximately correct inferences about the above responses. Successful computation
permits a rigorous interpretation of the economic time series data.
3. A BOOTSTRAP ESTIMATE
The parameters of the model described in the previous section are estimated using
a bootstrapping procedure and Bayes' Theorem. The prior density is an indicator
function that is diffuse when the boundary conditions hold and 0 otherwise. The
Bayesian bootstrap procedure permits valid inference on all of the parameters and
response coefficients.
The most convenient way to explain how the bootstrap procedure is applied here
is to examine the four fundamental components of the model. These are
$\hat{\theta} = \arg\min_{\theta} S(\theta, V), \quad \text{where} \quad S(\theta, V) = m_n(\theta, x)'\, V\, m_n(\theta, x) \quad \text{and} \quad m_n(\theta, x) = n^{-1}\sum_{t=1}^{n} e_t(\theta) \otimes z_t.$
3. Constraints
In the unrestricted reduced form, $\beta_{12}$ is a 'free' parameter. In the stochastic regulator, $\beta_{12}$ is a function of $\beta_{11}$ and $\beta_{21}$. This function, or restriction, may be impossible to impose on an econometric reduced-form representation. Conceptually, however, both reduced forms satisfy a similar regression structure because both residuals satisfy the condition $\mathcal{E}(\varepsilon_{t+1} x_t) = 0$, and either regression structure could be estimated using a Seemingly Unrelated Regressions (SUR) estimator. The essence of the proposed procedure is to generate bootstrap samples using the unrestricted reduced form, restrict the bootstrap estimates to satisfy the boundary conditions, and compute the restricted reduced form.
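As an illustration of the first of these steps, the sketch below (Python/NumPy; the regression layout and variable names are placeholders, and the SUR and GMM estimation stages of the procedure are omitted) draws one residual-bootstrap sample from an unrestricted reduced form.

```python
import numpy as np

def residual_bootstrap(X, Y, rng):
    """One bootstrap sample from an unrestricted reduced form Y = X B + U:
    estimate B by least squares, resample rows of the residuals with
    replacement, and rebuild a pseudo data set Y* = X B_hat + U*."""
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B_hat
    idx = rng.integers(0, U.shape[0], size=U.shape[0])
    return X @ B_hat + U[idx], B_hat

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(90), rng.normal(size=(90, 2))])   # illustrative sample size
Y = X @ np.array([[0.5, 1.0], [0.2, 0.0], [0.0, 0.3]]) + rng.normal(scale=0.1, size=(90, 2))
Y_star, B_hat = residual_bootstrap(X, Y, rng)
print(B_hat.round(2))
```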
GMM estimates of $\theta$ are computed from the sample Euler equations using both the original data and the bootstrap samples. Bootstrap 'T' statistics are used to make draws on the parameters $\beta_{21}$ and $\theta$ from the approximate likelihood function.
The Riccati equations are evaluated at the parameter values of the problem. Convergence within $J$ iterations implies the boundary conditions hold, and the prior is given a value of one. $A - BF$ is computed for draws that converge. Nonconvergence implies the boundary conditions do not hold in $J$ iterations. In this case, the prior density is assigned a value of zero.
The key to implementing the above procedure lies in drawing the parameters
from the bootstrap T statistic. The problem is similar to that of Geweke (1986) who
had the convenience of exact inference in a linear regression model with normally
distributed error terms. There, the pivotal element is distributed as a multivariate
Student-t and can be drawn from a random number generator and added to the OLS
estimate to obtain parameter draws from the likelihood. Here, the bootstrap 'T'
statistic may not be pivotal, but we assume it is nearly so, so that the likelihood can
conveniently be factored.
Bickel and Freedman (1981) and Freedman (1981) provide the conditions under
which the distribution of a bootstrap estimate approximates the distribution of the
statistic - roughly, the conditional distribution of the bootstrap sample must eventually
approach the distribution of the sample. When this condition holds, the conditional
distribution of the bootstrap pivot approaches the distribution of the theoretical pivot.
This result is important for both frequentist and Bayesian inference. It enables
frequentists to construct accurate confidence intervals when the distribution of the
sample is unknown. For a Bayesian analysis, the moments of the posterior density of
the parameters must be computed. The posterior density is proportional to the product
of a prior density and the likelihood function. Boos and Monahan (1986) factor the
likelihood function into a function of the data and a function of the theoretical pivot.
This factorization is performed under the assumption that the statistic is sufficient.
Bickel and Freedman's (1981) result permits Boos and Monahan (1986) to replace
the unobserved pivot with the bootstrap pivot in order to approximate the posterior
density.
This result is central to our method. It permits us to make draws from the support of the approximate likelihood function using bootstrap pivots. SUR estimates of the unrestricted parameter vector $\beta = [\beta_{11}', \beta_{12}', \beta_{21}']'$ and GMM estimates of the parameters $\theta$ deliver the point estimates $b = [b_{11}', b_{12}', b_{21}']'$ and the point estimate $\hat{\theta}$. In addition, each estimator provides the covariance matrices $C_b$ and $C_{\hat{\theta}}$. The theoretical pivot for the parameter $\beta$ is $T_1 = C_b^{-1/2}(b - \beta)$, and the bootstrap pivot is $T_1^* = C_{b^*}^{-1/2}(b^* - b)$. Since the distribution of $T_1$ is close to that of $T_1^*$, we set $T_1$ equal to $T_1^*$ and solve for $\beta$ to obtain a draw from the approximate likelihood.
The subvector $(\beta_{21}' : \theta')'$ is used to construct the matrices of the stochastic regulator problem and the Riccati equations. For a draw in which the Riccati equations converge, the restricted response coefficient matrix $A - BF$ is computed. Means and standard deviations are then computed over these 'successful' draws. We illustrate this method in the next section.
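A compact sketch of the two key steps — equating the bootstrap pivot to the theoretical pivot to draw $\beta$, and discarding draws whose Riccati equations fail to converge — might look as follows (Python with NumPy/SciPy; riccati_converges stands in for the user's own convergence check and is not defined in the chapter):

```python
import numpy as np
from scipy.linalg import sqrtm

def pivot_draw(b, C_b, b_star, C_b_star):
    """Equate the theoretical pivot C_b^{-1/2}(b - beta) to a bootstrap pivot
    C_{b*}^{-1/2}(b* - b), i.e. beta = b - C_b^{1/2} C_{b*}^{-1/2} (b* - b)."""
    T1_star = np.linalg.solve(sqrtm(C_b_star), b_star - b)
    return b - sqrtm(C_b) @ T1_star

def posterior_sample(b, C_b, bootstrap_estimates, riccati_converges):
    """Keep only draws whose implied regulator satisfies the boundary
    conditions (indicator prior); riccati_converges returns True/False."""
    C_b_star = np.cov(np.asarray(bootstrap_estimates).T)
    draws = []
    for b_star in bootstrap_estimates:
        beta = pivot_draw(b, C_b, b_star, C_b_star)
        if riccati_converges(beta):
            draws.append(beta)
    return np.array(draws)

# toy usage with a trivial convergence check
rng = np.random.default_rng(0)
b, C_b = np.zeros(3), np.eye(3)
boot = [b + rng.normal(scale=0.1, size=3) for _ in range(200)]
print(posterior_sample(b, C_b, boot, riccati_converges=lambda beta: True).shape)
```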
4. AN EXAMPLE
One statistic of interest to agricultural economists is the food price margin. The
food price margin is the difference between the value of a particular food item and
the price paid to farmers for the farm component of the good. Hence, the food
price margin defines the value added to the item by the processing sector. Empirical
research in this area attempts to predict how food price margins change in response
to a variety of exogenous shifters. Wohlgenant (1989) recognizes that nonfarm and
farm factors of production are substitutes in the manufacture of food and explores the
implications of input substitution for the movement of food price margins. Estimates
of the parameters are obtained from functions derived from static duality theory. An
earlier study, Wohlgenant (1985) explores the movement of food price margins over
time. Estimates of the parameters of a univariate dynamic and stochastic optimization
problem are computed. The problem illustrated in this section shows that multivariate
relationships among factors of production need not be sacrificed to obtain parameter
estimates of a dynamic and stochastic economic model.
Our example has the following specification:
The representative food processor's objective function
is an expected discounted sum of period returns in the input levels $lab_t$, $far_t$, $ene_t$ and their changes $\Delta lab_t$, $\Delta far_t$, $\Delta ene_t$; the demand shifter $dem_t$ enters through the inverted consumer demand function, and the estimated return function is reported in Table 1.
The input demand equations

$\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} =
\begin{bmatrix} \rho_{11} & \rho_{12} & \rho_{13} \\ \rho_{21} & \rho_{22} & \rho_{23} \\ \rho_{31} & \rho_{32} & \rho_{33} \end{bmatrix}
\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix} +
\begin{bmatrix} \rho_{14} & \rho_{15} & \rho_{16} & \rho_{17} & \rho_{18} & \rho_{19} \\ \rho_{24} & \rho_{25} & \rho_{26} & \rho_{27} & \rho_{28} & \rho_{29} \\ \rho_{34} & \rho_{35} & \rho_{36} & \rho_{37} & \rho_{38} & \rho_{39} \end{bmatrix}
\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}$
The price margin
$\begin{pmatrix} p_t \\ r_t \end{pmatrix} =
\begin{bmatrix} \omega_{11} & \omega_{12} & \omega_{13} \\ \omega_{21} & \omega_{22} & \omega_{23} \end{bmatrix}
\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix} +
\begin{bmatrix} \omega_{14} & \omega_{15} & \omega_{16} & \omega_{17} & \omega_{18} & \omega_{19} \\ \omega_{24} & \omega_{25} & \omega_{26} & \omega_{27} & \omega_{28} & \omega_{29} \end{bmatrix}
\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}$
The model describes a typical food processing firm. This firm employs labor ($lab$), farm inputs ($far$), and energy ($ene$) in the production of food. The firm's production process is described by a linear production function. Each period the processing firm receives the price of food ($p$) and pays wages ($wag$), the farm price ($r$), and the energy price ($enpr$). The model also describes a typical farm supplier. This supplier receives the price $r$ for the farm inputs sold to the processor.
The processing firm incurs two types of internal capital costs associated with
utilizing the three factors. First, it incurs a long-run-returns-to-scale cost associated
with combining capital and the three factors. Returns-to-scale-cost parameters are
embedded in the H matrix (with elements hij). Wohlgenant (1989) could not reject
constant returns to scale for most of the food processing industries. We impose this
restriction with h22 = O. Second, the processing firm incurs short-run capital costs
of adjustment, whose parameters are embedded in the D matrix (having elements
dij ). While the farm firm experiences long-run constant returns to scale, capital costs
associated with output adjustments are captured in the parameter c.
The processing industry aggregate faces a consumer demand function for food
output as well as the cost function of the farm sector. The variable dem represents
the stochastic shifts in consumer demand, and $\lambda_1$ represents the slope of the inverted demand function.1 At the beginning of each period, a shock occurs to wages,
energy prices, and demand shifts. These shocks define a set of Markov processes
described by three linear difference equations. A change in the parameters of these
difference equations represents a change in economic policy. The problem is to find
the sequences of labor, farm, and energy that maximizes the expected social welfare
1 The $\lambda_1$ and $\alpha_i$ parameters are obtained or derived from previous empirical studies [Huang (1988), Putnam (1989)]. Using the sample means of the data, the demand shifter, $dem$, is evaluated as the residual of the consumer demand function.
function. In turn, this solution implies sequences of equilibrium food and farm prices.
We used quarterly, U.S. beef industry data from 1965.1 to 1988.4 to construct
the variable sequences of the model. Data sources and a description of the variable
construction are available from the authors upon request. Two observations are lost
to lags in the model. Four observations are lost to fourth differences. Hence, 90
observations are used in the estimation. Bootstrap samples of size 90 are drawn.
Aggregating the representative processor's and the representative farm supplier's objective functions gives the following dynamic programming problem:

$V^{(3)}(x_0) = \max \mathcal{E}_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(3)} \qquad \text{and} \qquad V^{(4)}(x_0) = \max \mathcal{E}_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(4)},$

where $\pi_t^{(3)}$ and $\pi_t^{(4)}$ denote the corresponding period return functions,
subject to the equations of motion given above. We compute the posterior distribution
of the objective function parameters and the linear stochastic difference equations.
The prior is assigned a value of 1 if the Riccati equations associated with $V^{(4)}$ converge within 150 iterations. Otherwise, the prior is assigned a value of zero. 664 of the 1000 draws from the bootstrap likelihood resulted in convergent Riccati equations.
We also compute the posterior for $A - BF$, whose elements are the response coefficients of the reduced-form input demand functions. Combining $A - BF$ with
the consumer demand function gives the parameters of the food price equation.
Combining A - BF with the farm supplier's Euler equations gives the parameters
of the farm price equation. The food price and the farm price functions constitute the
food price margin function.
In Tables 1 to 3, we report the means and standard deviations (in parentheses) of
the posterior distribution. We assume a quadratic loss function. Therefore, the mean
represents our parameter estimate because it minimizes the loss function (Zellner,
1987). The standard error serves as the measure of dispersion of the posterior.
Table 1 reports the estimates of the parameters of the stochastic regulator, its
reduced-form solution, and the price margin functions. The negative estimate of
$h_{11}$ suggests that labor is a capital-saving input in the long run in the beef industry. The results also suggest that firms consume capital when they adjust labor ($d_{11} > 0$). However, they can offset capital adjustment costs by substituting farm inputs for labor ($d_{12} < 0$). Our estimate of the parameter $c$ (62.6) indicates that the short-run supply of
farm inputs facing the processing industry is upward sloping.
Table 1 also reports the parameter estimates of the equation of motion. The results
indicate the demand shifter displays oscillating (complex roots) patterns. The average
period from peak-to-peak is approximately one month (about 1/3 of a quarter). Also,
the results indicate that changes in energy prices have been more permanent than
have changes in wages.
Estimates of the coefficients of the equilibrium input demand functions are re-
ported in Table 1. Coefficient estimates of the reduced form are composite functions
of all or many of the parameters of the problem. Hence, the standard deviations
associated with the composite coefficients embody the standard deviations of many
parameters. The composite coefficients sometimes capture opposite effects. We es-
timate a negative steady-state cost of capital associated with labor. We also estimate
a positive dynamic cost of capital associated with labor ($h_{11} < 0$, $d_{11} > 0$). The
result of these offsetting effects is a positive response of labor to current period wages
(0.173). Apparently, it is the negative steady-state costs that induce firms to hire less
labor when consumer demand increases (-0.111). Our results are consistent with
TABLE 1
Parameter estimates, beef model.*

The return function:

$-(wag_t,\ r_t,\ enpr_t)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix}
\;-\; \tfrac{1}{2}\,(lab_t,\ far_t,\ ene_t)\, H \begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix}
\;-\; \tfrac{1}{2}\,(\Delta lab_t,\ \Delta far_t,\ \Delta ene_t)\, D \begin{pmatrix} \Delta lab_t \\ \Delta far_t \\ \Delta ene_t \end{pmatrix},$

with estimates

$H = \begin{bmatrix} -10.2 & .000 & .000 \\ .000 & .000 & .000 \\ .000 & .000 & 6.34 \end{bmatrix}$  (standard errors: 8.1 for $h_{11}$, 5.3 for $h_{33}$),

$D = \begin{bmatrix} 32.1 & -48.0 & .000 \\ -48.0 & 1.00 & .000 \\ .000 & .000 & 4.72 \end{bmatrix}$  (standard errors: 29.2, 43.3, 43.3, 4.5).
The equations of motion:

$\begin{pmatrix} wag_{t+1} \\ enpr_{t+1} \\ dem_{t+1} \end{pmatrix} =
\begin{bmatrix} 1.00 & .000 & .000 \\ .000 & .953 & .000 \\ .000 & .000 & .995 \end{bmatrix}
\begin{pmatrix} wag_t \\ enpr_t \\ dem_t \end{pmatrix} +
\begin{bmatrix} -.21 & .000 & .000 \\ .000 & -.037 & .000 \\ .000 & .000 & -.286 \end{bmatrix}
\begin{pmatrix} wag_{t-1} \\ enpr_{t-1} \\ dem_{t-1} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t+1} \\ \varepsilon_{2t+1} \\ \varepsilon_{3t+1} \end{pmatrix}$

(standard errors: .10, .12, .12 for the first matrix and .09, .10, .10 for the second).
Table 1 (continued)
The decision rule:

$\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} =
\begin{bmatrix} .268 & .042 & .007 \\ -.39 & .915 & .003 \\ .002 & -.038 & .292 \end{bmatrix}
\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix} +
\begin{bmatrix} .173 & -.011 & -.013 & .000 & -.111 & .008 \\ .014 & .001 & .012 & -.000 & .007 & -.009 \\ -.014 & .001 & -.197 & -.002 & .104 & -.007 \end{bmatrix}
\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}$

Standard errors, first matrix (by row): (.30) (.09) (.02); (.28) (.21) (.02); (.07) (.12) (.19).
Standard errors, second matrix (by row): (.49) (.05) (.05) (.00) (.18) (.03); (.22) (.02) (.11) (.01) (.12) (.02); (.06) (.01) (.55) (.05) (.22) (.05).
The price margin equations.
* Reported values are means of the posterior, and the numbers in parentheses are standard
errors of the posterior.
the notion that consumers have shifted toward products containing more nonfarm
inputs.
These results are also used to trace the impacts of exogenous changes on the
food price margin. Our results indicate that a wage increase induces an increase in
labor demand. However, positive adjustment costs associated with labor dampen
this increase. In response to a wage increase, firms substitute farm inputs for labor.
This raises the demand for farm inputs and increases farm prices. Our results suggest
the increase in wages raises the marginal costs of processing. However, the larger
increase in farm prices narrows the food price margin. The results also suggest a weak
relationship between the demand for farm inputs and a (positive) shift in consumer
demand. Our point estimate is slightly negative (-0.007). Hence, we estimate that
the price margin widens when consumers increase their demand for beef.
TABLE 2
Estimates of the perfect foresight solution, beef prices.*

                      Food Price, p_t                            Farm Price, r_t
 j    E_t wag_{t+j}  E_t enpr_{t+j}  E_t dem_{t+j}    E_t wag_{t+j}  E_t enpr_{t+j}  E_t dem_{t+j}
 1      -0.0489        -0.0084          1.0136           3.5922         1.9117         -3.6690
        (.109)         (.056)           (.082)           (5.38)         (3.18)         (5.42)
TABLE 3
Impulse response estimates, beef model.*

                      Food Price, p_t                            Farm Price, r_t
 j     wag_{t-j}    enpr_{t-j}    dem_{t-j}          wag_{t-j}    enpr_{t-j}    dem_{t-j}
 0      0.0000        0.0000        0.0000            2.1067       -0.0265       -2.2515
        0.0000        0.0000        0.0000            (2.41)        (.736)        (2.60)
farm inputs for labor before an expected wage increase. Our results also indicate the
price and quantity demanded of farm commodities change before a known increase
in energy price.
The impulse response coefficients are reported in Table 3. Our results measure the
change in food and farm prices following a shock to an exogenous variable. These
coefficients account for the contemporaneous relationships among the various shocks.
Hence, it is difficult to provide an intuitive explanation of the results presented in
Table 3.
5. CONCLUSIONS
This study uses the Bayesian bootstrap to compute econometric estimates of stochas-
tic, dynamic programming problems. Typically, statistical inferences on the reduced-
form coefficients of such a problem are difficult because of the complex cross-
equation restrictions that characterize such solutions. Likewise, direct estimation of
the problem's parameters requires the estimates to adhere to boundary conditions,
which when imposed, require classical techniques to be significantly modified or
discarded (since the boundary condition cannot be checked by evaluating, for ex-
ample, the eigenvalues of a matrix). By contrast, our procedure combines textbook
algorithms with the Bayesian bootstrap to form an estimator that is well suited to
impose such a restriction.
The estimator holds value for analysts facing difficulties imposing restrictions
on any econometric model. It should also be useful for analysts pursuing a Bayesian
analysis, but who are uncomfortable with the usual assumption of normally distributed
error terms. All that is required is a statistical representation from which the analyst
can draw bootstrap samples of the variables of the model. The estimator could be
used, for example, to estimate static duality models when one is concerned with
imposing the required curvature restrictions.
REFERENCES
Baxter, M., M.J. Crucini, and K.G. Rouwenhorst: 1990, 'Solving the stochastic growth model
by a discrete state-space, euler-equation approach', Journal of Business and Economic
Statistics 8, 19-21.
Bickel, P.J., and D.A. Freedman: 1981, 'Some asymptotic theory for the bootstrap', The
Annals of Statistics 9, 1196-1217.
Boos, D.D., and J.F. Monahan: 1986, 'Bootstrap methods using prior information', Biometrika 73, 77-83.
Christiano, L.J.: 1990, 'Solving the stochastic growth model by linear-quadratic approximation
and by value-function iteration', Journal of Business and Economic Statistics 8, 23-26.
Coleman, W.J.: 1990, 'Solving the stochastic growth model by policy-function iteration',
Journal of Business and Economic Statistics 8, 27-29.
den Haan, W.J., and A. Marcet: 1990, 'Solving the stochastic growth model by parameterizing
expectations', Journal of Business and Economic Statistics 8, 31-34.
Eckstein, Z.: 1985, 'The dynamics of agriculture supply: a reconsideration', American Journal
of Agricultural Economics 67,204-214.
Freedman, D.A.: 1981, 'Bootstrapping regression models', The Annals of Statistics 9, 1218-
1228.
Gagnon, J.E.: 1990, 'Solving the stochastic growth model by deterministic extended path',
Journal of Business and Economic Statistics 8, 35-38.
Gallant, A.R.: 1987, Nonlinear Statistical Models, New York: John Wiley and Sons.
Gallant, A.R., and G.H. Golub: 1984, 'Imposing curvature restrictions on flexible functional
forms', Journal of Econometrics 26, 295-321.
Geweke, J.: 1986, 'Exact inference in the inequality constrained normal linear regression
model' , Journal of Applied Econometrics 1, 127-141.
Huang, K.: 1988, 'An inverse demand system for U.S. composite goods', American Journal
of Agricultural Economics 70, 902-909.
Labadie, P.: 1990, 'Solving the stochastic growth model by using a recursive mapping based
on least squares projection', Journal of Business and Economic Statistics 8, 39-40.
Lucas, R.E.: 1976, 'Econometric policy evaluation: a critique', The Phillips Curve and the
Labor Market (K. Brunner and A. Meltzer eds) Volume 1 of Carnegie-Rochester Con-
ferences in Public Policy, a supplementary series to the Journal of Monetary Economics,
Amsterdam: North Holland.
McGrattan, E.R.: 1990, 'Solving the stochastic growth model by linear-quadratic approxima-
tion', Journal of Business and Economic Statistics 8, 41-44.
Miranda, M.J., and J.W. Glauber: 1991, 'Estimation of dynamic nonlinear rational expectations models of commodity markets with private and government stockholding'. Paper
presented at the annual meetings of the American Agricultural Economics Association.
Manhattan, Kansas. August 4-7, 1991.
Putnam, J.J.: 1989, Food Consumption, Prices, and Expenditures, USDA/ERS Statistical Bulletin No. 773.
Sargent, T.J.: 1987a, Macroeconomic Theory, Boston: Academic Press.
Sargent, T.J.: 1987b, Dynamic Macroeconomic Theory, Cambridge: Harvard University Press.
Sargent, T.J.: 1978, 'Estimation of dynamic demand schedules under rational expectations',
Journal of Political Economy, 86, 1009-1044.
Sims, C.: 1990, 'Solving the stochastic growth model by backsolving with a particular
nonlinear form for the decision rule', Journal of Business and Economic Statistics 8,
45-48.
Tauchen, G.: 1990, 'Solving the stochastic growth model by using quadrature methods and
value-function iterations', Journal of Business and Economic Statistics 8, 49-51.
Taylor J.B. and H. Uhlig: 1990, 'Solving nonlinear stochastic growth models: a comparison
of alternative solution methods', Journal of Business and Economic Statistics 8, 1-17.
Wohlgenant, M.K.: 1989, 'Demand for farm output in a complete system of demand functions', American Journal of Agricultural Economics, 71, 241-252.
Wohlgenant, M.K.: 1985, 'Competitive storage, rational expectations, and short-run food price
determination', American Journal of Agricultural Economics, 67,739-748.
Zellner, A.: 1987b, An Introduction to Bayesian Inference in Econometrics, Malibar: Robert
E. Krieger Publishing Company.
GREGORY C. CHOW
ABSTRACT. An algorithm is proposed to compute the optimal control function without solving
for the value function in the Bellman equation of dynamic programming. The method is to
solve a pair of vector equations for the control variables and the Lagrange multipliers associated
with a set of first-order conditions for an optimal stochastic control problem. It approximates
the vector control function and the vector Lagrangean function locally for each value of the
state variables by linear functions. An example illustrates that such a local approximation is
better than global approximations of the value function.
Previously (Chow 1992a, 1993) I have shown that the optimum control function
of a standard optimum control problem can be derived more conveniently by using
Lagrange multipliers than by solving the Bellman partial differential equation for the
value function. This derivation also provides numerical methods for computing the
value of the optimum control corresponding to a given value of the state variable
that are more accurate than those based on solving the Bellman equation. This paper
explains the gain in numerical accuracy and illustrates it by example.
Consider the following standard optimum control problem in discrete time (an anal-
ogous problem in continuous time is considered in Chow (1993), and the results of
this paper apply equally well to that problem). Let $x_t$ be a column vector of $p$ state variables and $u_t$ be a vector of $q$ control variables. Let $r$ be a concave and twice differentiable function and $\beta$ be a discount factor. $E_t$ denotes conditional expectation given information at time $t$, which includes $x_t$. The problem is
$\max_{\{u_t\}} E_0 \sum_{t=0}^{\infty} \beta^t r(x_t, u_t)$   (1)

subject to

$x_{t+1} = f(x_t, u_t) + \varepsilon_{t+1},$   (2)

where $\varepsilon_{t+1}$ is an i.i.d. random vector with mean zero and covariance matrix $\Sigma$.
Chow (1992a) solves this problem by introducing the $p \times 1$ vector $\lambda_t$ of Lagrange multipliers and setting to zero the derivatives of the Lagrangean expression

$\mathcal{L} = E_0 \sum_{t=0}^{\infty} \left\{ \beta^t r(x_t, u_t) - \beta^{t+1} \lambda_{t+1}' \left[ x_{t+1} - f(x_t, u_t) - \varepsilon_{t+1} \right] \right\}$   (3)

with respect to $u_t$ and $x_t$, which gives

$\partial r/\partial u_t + \beta\, (\partial f'/\partial u_t)\, E_t \lambda_{t+1} = 0$   (4)

$\lambda_t = \partial r/\partial x_t + \beta\, (\partial f'/\partial x_t)\, E_t \lambda_{t+1}.$   (5)
The optimum control at time t is obtained by solving equations (4) and (5) for Ut and
At.
The difficult part in solving these equations is the evaluation of the conditional expectation $E_t\lambda_{t+1}$, a problem to be treated shortly. We first point out the main
differences between this approach and that of solving the Bellman partial differential
equation for the value function V(x). First, it is not necessary to know the value
function to derive the optimum control function since the latter is a functional, not of
V, but of the vector $\lambda$ of derivatives of V with respect to the state variables. Thus,
obtaining the value function V requires more than is needed to obtain the optimum
control function and hence solves a more difficult problem than necessary. For
example, in the problem of static demand theory derived from maximizing consumer
utility subject to a budget constraint, Bellman's method amounts to finding the indirect
utility function by solving a partial differential equation, whereas we would apply
the method of Lagrange multipliers in obtaining the demand function. Second, our
equation (5) could be obtained by differentiating the Bellman equation with respect
to the state variables. This is a very important first order condition for optimality,
but it is ignored when one tries to solve the Bellman equation for the value function
and thus makes the solution of the optimum control problem more difficult. Third,
for most realistic applied problems an analytical solution for the value function is not
available. A common practice when solving the Bellman equation is to use a global
approximation to the value function when deriving the optimum control function.
By contrast, in solving equations (4) and (5) for a given $x_t$ we avoid a global approximation to the Lagrange function, using instead a linear function that approximates $\lambda$ locally in the neighborhood of each $x_t$. This typically yields a more accurate approximation to the Lagrange function and hence to the corresponding
value function in the Bellman approach.
To provide a numerical method for solving the first order conditions (4) and (5), we approximate the Lagrange function in the neighborhood of $x_t$ by a linear function,

$\lambda(x) = H_t x + h_t,$   (6)

where the $t$ subscripts of the parameters $H_t$ and $h_t$ indicate that the linear function (6) applies to points not too far from $x_t$, in particular to $x_{t+1}$. Thus

$E_t \lambda_{t+1} = H_t f(x_t, u_t) + h_t.$   (7)
Taking $x_t$ as given, we try to solve (4) and (5) for $u_t$ and $\lambda_t$, using (7) for $E_t\lambda_{t+1}$. Substituting (7) into (4) yields

$\frac{\partial r}{\partial u_t} + \beta\, \frac{\partial f'}{\partial u_t}\left( H_t f + h_t \right) = 0.$   (8)

Assuming tentatively $H_t$ and $h_t$ to be known, we solve (8) for $u_t$ using linear approximations of $\partial r/\partial u_t$ and $f$ (equation (9)) and of $\partial r/\partial x_t$,

$\frac{\partial r}{\partial x_t} = K_{11t} x_t + K_{12t} u_t + k_{1t},$   (10)
where the time subscripts on the parameters of the linear functions indicate that the functions are valid for values of $x$ and $u$ near $x_t$ and the optimal $u_t^*$. These parameters are obtained by evaluating the partial derivatives of $r$ and $f$ at $x_t$ and some initial value for $u_t^*$, the latter to be revised after each iteration. Substituting (9) and (10) into (8) gives

$u_t = G_t x_t + g_t,$   (12)

where $G_t$ and $g_t$ are given by equations (13) and (14).
To find the parameters $H_t$ and $h_t$ for $\lambda_t$, we substitute (6), (7), (9), (10) and (12) into (5), which yields equations (16) and (17) for $H_t$ and $h_t$.
To solve equations (4) and (5) numerically, we assume some initial value for the optimal $u_t^*$ and linearize $\partial r/\partial u_t$, $\partial r/\partial x_t$ and $f$ about $x_t$ and this value of $u_t^*$, as in (9) and (10). We then solve the pair of equations (13) and (16) iteratively for $G_t$ and $H_t$. Given $G_t$ and $H_t$, the pair of equations (14) and (17) can be solved iteratively for $g_t$ and $h_t$. The value of the optimal control $u_t^*$ is found as $G_t x_t + g_t$. This value is then used to relinearize $\partial r/\partial u$, $\partial r/\partial x$ and $f$, and the process is repeated until convergence.
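For intuition, the following sketch (Python/NumPy; it is not taken from the paper) carries out the iteration in the special case where $r$ is exactly quadratic and $f$ exactly linear, $r(x,u) = -x'Rx - u'Qu$ and $f(x,u) = Ax + Bu$, so that the linear approximations are exact, $\lambda(x) = Hx$, and the fixed point for $G$ and $H$ can be found by iterating conditions (4) and (5) directly.

```python
import numpy as np

def chow_lq(A, B, R, Q, beta, tol=1e-10, max_iter=500):
    """Solve conditions (4)-(5) for the special case r = -x'Rx - u'Qu,
    f = Ax + Bu, with lambda(x) = Hx (h = 0 by symmetry).  Then
      (4): -2Q u + beta B' H (Ax + Bu) = 0   =>  u = G x
      (5):  H x  = -2R x + beta A' H (A + BG) x
    and these two maps are iterated to a fixed point."""
    H = np.zeros_like(A)
    for _ in range(max_iter):
        G = np.linalg.solve(2.0 * Q - beta * B.T @ H @ B, beta * B.T @ H @ A)
        H_new = -2.0 * R + beta * A.T @ H @ (A + B @ G)
        if np.max(np.abs(H_new - H)) < tol:
            return G, H_new
        H = H_new
    raise RuntimeError("no convergence")

# scalar check against the textbook discounted LQR solution
G, H = chow_lq(A=np.array([[0.9]]), B=np.array([[1.0]]),
               R=np.array([[1.0]]), Q=np.array([[1.0]]), beta=0.95)
print(G.round(3), H.round(3))   # approx. -0.525 and -2.945, so u = Gx, lambda = Hx
```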
The reader may recognize that the numerical method suggested in this paper amounts to solving the well known matrix Riccati equations (13) and (16) for $G_t$ and $H_t$ in linear-quadratic control problems. However, there are two important differences from the standard treatment of stochastic control by dynamic programming. First, our derivation is different, as it does not use the value function at all. Second, we emphasize the solution of the two equations (4) and (5) for $u_t$ and $\lambda_t$ while treating $x_t$ as given. We have avoided global approximations to the functions $u(x)$ and $\lambda(x)$, which can lead to large errors. We employ linear approximations to $u(x)$ and $\lambda(x)$ only locally about a given $x_t$ and build up the nonlinear functions $u(x)$ and $\lambda(x)$ from these locally linear approximations for different $x_t$. To generalize our second point, we can choose other methods to solve equations (4) and (5) for a given $x_t$. We could, for example, use a quadratic approximation to $\lambda(x)$ as discussed in Chow (1992a). We leave other numerical methods for solving equations (4) and (5) to future research.
3. AN ILLUSTRATIVE EXAMPLE
The first equation assumes $x_{1t} = \log A_t$ to be a random walk with a drift $\gamma$, $\varepsilon_t$ being a random shock to technology. The second equation gives the evolution of the capital stock $x_{2t}$, with $\delta$ denoting the rate of depreciation and investment being the difference between output $q_{t-1}$, given by the production function, and consumption $u_{1,t-1}$. The utility function $r$ in (1) is assumed to be
TABLE 1
Optimal control variables corresponding to selected state variables.
      u1      u2      x1      x2
TABLE 2
Parameters of linear optimal control functions.
      G11      G12      G21      G22      g1      g2
1. 1.255 0.0546 -0.219 -0.0095 0.165 1.223
2. 1.207 0.0513 -0.213 -0.0090 0.558 1.112
3. 1.280 0.0531 -0.209 -0.0087 0.384 1.107
4. 1.169 0.0471 -0.199 -0.0080 1.059 1.055
5. 1.197 0.0474 -0.199 -0.0079 1.085 1.071
6. 1.170 0.0455 -0.198 -0.0077 1.382 1.086
7. 1.263 0.0479 -0.203 -0.0077 1.182 1.136
8. 1.470 0.0532 -0.210 -0.0076 0.517 1.195
9. 1.602 0.0545 -0.211 -0.0072 0.207 1.202
10. 1.714 0.0550 -0.206 -0.0066 -0.048 1.167
11. 1.789 0.0547 -0.202 -0.0062 -0.123 1.149
12. 2.009 0.0581 -0.207 -0.0060 -0.957 1.164
13. 1.974 0.0543 -0.196 -0.0054 -0.495 1.090
14. 2.420 0.0638 -0.205 -0.0054 -2.500 1.142
15. 2.542 0.0633 -0.204 -0.0051 -2.820 1.122
16. 2.316 0.0548 -0.193 -0.0046 -1.354 1.042
17. 2.558 0.0581 -0.192 -0.0044 -2.262 1.036
18. 3.021 0.0652 -0.195 -0.0042 -4.254 1.050
19. 3.372 0.0693 -0.198 -0.0041 -5.693 1.063
TABLE 3
Regressions of coefficients of linear control functions on
state variables (t statistics in parentheses).
                         Explanatory variables
Dependent variables        x1        x2        R²
the parameters of the locally linear approximations change with the state variables.
To describe the changes, Table 3 presents linear regressions of four parameters on the
two state variables and the accompanying t statistics and R2 for descriptive purposes
only (as the regressions are not based on a stochastic model).
For this example, we set the maximum number of iterations for solving the pair of equations (13) and (16) for $G_t$ and $H_t$ to 25, given each value of $u_t^*$ used in linearizing $\partial r/\partial u$, $\partial r/\partial x$ and $f$. For our criterion of convergence to three significant figures, the maximum number of 25 is found to be better than 50 or 20. Once the optimal linear control function is found for $x_1$ and $x_2$ as of 1951.1, the optimum $u_t^*$, $G_t$, $H_t$, $g_t$ and $h_t$ can be used as initial values to compute the optimal linear control
function corresponding to Xl and X2 as of 1953.1, and so forth. It takes about eight
hours on a 486 personal computer to maximize a likelihood function with respect to
the five parameters using a simulated annealing maximization algorithm (see Goffe,
Ferrier and Rogers, 1992) which evaluates the likelihood function about 14,000 times
(or about two seconds per evaluation of the likelihood function). At each evaluation,
one must find the linear optimal control function for the given parameters, compute
the residuals of the observed values of the control variables from the computed
optimal values for 152 quarters, compute the value of the likelihood function, and
determine the new values of the five parameters for the next functional evaluation,
which may be time consuming. Hence, merely computing the linear optimal control
function for a given set of parameters in our example should take less than one second
on a 486 computer using Gauss.
In this paper, I have shown how locally linear optimal control functions can be
computed for a standard stochastic control problem in discrete time. The algorithm
is based on solving two equations for the vectors of control variables and Lagrange
multipliers, given the vector of state variables. It is easy to implement using a personal
computer. It can serve as an important component of an algorithm for the statistical
estimation of the parameters of a stochastic control problem in econometrics.
ACKNOWLEDGEMENTS
The author would like to thank Chunsheng Zhou for excellent programming assistance
in obtaining the numerical results reported in this paper and David Belsley for helpful
comments on an early draft.
REFERENCES
Chow, Gregory C., "Dynamic optimization without dynamic programming," Economic Mod-
elling, 9 (1992a), 3-9.
Chow, Gregory C., "Statistical estimation and testing of a real business cycle model," Princeton
University, Econometric Research Program, Research Memorandum No. 365 (1992b)
Chow, Gregory C.,"Optimal control without solving the Bellman equation," Journal of Eco-
nomic Dynamics and Control, 17 (1993).
Goffe, William L., Gary Ferrier and John Rogers, "Global optimization of statistical functions,"
in Computational Economics and Econometrics, Vol. 1, eds. Hans M. Amman, D. A.
Belsley, and Louis F. Pau, Dordrecht: Kluwer, 1992.
King, Robert G., Charles I. Plosser and S. T. Rebelo, "Production, growth, and business cycles:
II. New Directions," Journal of Monetary Economics, 21 (1988), 309-342.
Watson, Mark W., "Measures of fit for calibrated models," Northwestern University and
Federal Reserve Bank of Chicago, mimeo, 1990.
PART TWO
ABSTRACT. Macroeconomics has just passed through a period in which it was assumed
that everyone knew everything. Now hopefully we are moving into a period where those
assumptions will be replaced with the more realistic ones that different actors have differ-
ent information and learn in different ways. One approach to implementing these kinds of
assumptions is available from control theory.
This paper discusses the learning procedures that are used in a variety of control theory
methods. These methods begin with deterministic control with and without state variable
and parameter updating. They also include two kinds of stochastic control: passive and active. With passive learning stochastic control, control variables are chosen while considering
the uncertainty in parameter estimates, but no attention is paid to the potential impact of
today's control variables on future learning. By contrast, active learning control seeks a
balance between reaching today's goals and gaining information that makes it easier to reach
tomorrow's goals.
INTRODUCTION
We have just passed through a period in which the key assumptions in macroeconomic
theory were that everyone knew everything. Now hopefully we are moving to a new
period in which it is assumed that the various actors have different information about
the economy; moreover, they learn, but they do so in different ways.
Recently Abhay Pethe (1992) has suggested that we are now in a position to
develop dynamic empirical macroeconomic models in which some actors learn in
a sophisticated fashion by engaging in active learning with dual control techniques
while other actors learn only incidentally as new observations arrive and are processed
to form new estimates. One subset of this latter group considers the uncertainty in
the economic system when they choose their actions for the next period. The other
subset ignores the uncertainty in choosing a course of action for the next period.
Finally, there is a fourth group that does not even bother to update their parameter
estimates as additional observations are obtained.
While it is possible that one or more of these subgroups will be empty in any real
economy, the starting assumption that different actors have different information,
choose their actions in different ways, and learn in different ways seems a much
more realistic and solid foundation for macroeconomics than the assumptions of the
previous era.
However, in the new period the analysis of macroeconomic systems will require
different tools than those used in the previous era. While many results from the
previous period could be obtained with analytical mathematics, the tools of the new
era are much more likely to be computational. In anticipation of this, the current
paper reviews the state of the art with regard to one set of tools that could serve well
in the new era. These are the methods of control theory, which date back to the work
of Simon (1956) and Theil (1957) as well as Aoki (1967), Livesey (1971), MacRae
(1972), Prescott (1972), Pindyck (1973), Chow (1975) and Abel (1975). These
methods are now enjoying a resurgence as attention turns once again to learning in
economic systems.
The resurgence is also being abetted by technical changes in computer hardware
and software that have continued at a rapid pace in the last two decades. Control
methods that were difficult to use twenty years ago on mainframe computers can now
be used on ubiquitous desktop computers. Also, supercomputers, some with parallel
processing capabilities, are rapidly opening an era in which even active learning
stochastic control methods can be used on economic models of substantial size.
It is in this context that this paper examines the current state of the art in numerical
methods for control theory beginning with deterministic systems and passing through
passive learning methods to end with active learning systems. The emphasis is not
on the scope of the activity, since no attempt is made to be comprehensive. Rather
the focus will be on areas where new developments in hardware and software offer
us new opportunities. Also, some major problems that stand in our pathway will
be highlighted. Deterministic problems will be discussed first, followed by passive
learning and active learning problems.
1. DETERMINISTIC CONTROL
Since all uncertainty is ignored in solving deterministic control problems, one is free
to use either quadratic-linear or general nonlinear methods. Consider first the quadratic-
linear methods and then progress to the general nonlinear problems.
The deterministic quadratic-linear tracking problem is written as: find the controls
(u_k), k = 0, ..., N-1, to minimize the cost functional

J = (1/2)[x_N - x̃_N]' W_N [x_N - x̃_N]
    + (1/2) Σ_{k=0}^{N-1} { [x_k - x̃_k]' W [x_k - x̃_k] + [u_k - ũ_k]' Λ [u_k - ũ_k] } ,   (1)
where the penalty matrices W_N, W, and Λ and the desired paths x̃_k and ũ_k are defined as
in problem (3) below, subject to linear system equations in the states x_k and controls u_k.
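To make the computation concrete, the following minimal Python sketch (not from the chapter) computes the locally linear feedback rule u_k = G_k x_k + g_k for problem (1) by a backward Riccati sweep and then simulates the controlled path; it assumes the linear law of motion x_{k+1} = A x_k + B u_k + c with x_0 given.

import numpy as np

def ql_tracking(A, B, c, W, Lam, WN, x_tilde, u_tilde, x0):
    # Backward Riccati sweep for the deterministic tracking problem (1),
    # assuming the linear system x_{k+1} = A x_k + B u_k + c.
    N = len(u_tilde)                       # number of decision periods
    K, p = WN, -WN @ x_tilde[N]            # terminal quadratic and linear terms
    G, g = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):
        H = Lam + B.T @ K @ B
        S = B.T @ K @ A
        s = B.T @ (K @ c + p) - Lam @ u_tilde[k]
        G[k] = -np.linalg.solve(H, S)      # feedback gain
        g[k] = -np.linalg.solve(H, s)      # feedback offset
        p = -W @ x_tilde[k] + A.T @ (K @ c + p) + S.T @ g[k]
        K = W + A.T @ K @ A + S.T @ G[k]
    # Forward pass: apply the locally linear rule u_k = G_k x_k + g_k.
    x, xs, us = x0, [x0], []
    for k in range(N):
        u = G[k] @ x + g[k]
        x = A @ x + B @ u + c
        us.append(u)
        xs.append(x)
    return np.array(xs), np.array(us)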
Decision makers who use stochastic methods fall into two groups. Individuals in the
first group use passive learning methods. Decision makers in this group consider the
uncertainty in the system equation parameters while determining policies; however,
no consideration is given to the effect of the decisions on future learning. In contrast,
individuals who use active learning methods consider the possibility of perturbing
the system in order to decrease parameter uncertainty in the future.
There are two sources of uncertainty in passive learning models: (1) additive
error terms and (2) unknown parameters. There is also the possibility of state variable
measurement error in passive learning models; however, we delay the discussion of
measurement error until the next section of this paper.
The most basic passive learning quadratic-linear tracking problem is written as:
find the controls (u_k), k = 0, ..., N-1, to minimize

J = E{ (1/2)[x_N - x̃_N]' W_N [x_N - x̃_N]
      + (1/2) Σ_{k=0}^{N-1} ( [x_k - x̃_k]' W [x_k - x̃_k] + [u_k - ũ_k]' Λ [u_k - ũ_k] ) } ,   (3)

where

E = expectations operator,
x_k = state vector - an n vector,
x̃_k = desired state vector - an n vector,
u_k = control vector - an m vector,
ũ_k = desired control vector - an m vector,
W_N = symmetric state variable penalty matrix at terminal period, N,
W = symmetric state variable penalty matrix for periods 0 thru N - 1,
Λ = symmetric control variable penalty matrix for periods 0 thru N - 1,

subject to the system equations, with

ξ_k ~ N(0, Q) ,
θ_0 ~ N(θ̂_0, Σ^θθ_{0|0}) ,

where
In this method, the covariance of the parameters of the system equations, Σ^θθ,
plays a major role in the choice of controls. The policy makers avoid controls
that add to the uncertainty in the system by choosing controls that are associated
with parameters with low uncertainty or by choosing combinations of controls that are
associated with parameters that have negative covariances. Thus there is a motivation
to hold a "portfolio" of controls that has relatively low uncertainty.
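A toy numerical illustration (the numbers are assumptions, not from the text): if the next state depends on two controls through uncertain coefficients b = (b_1, b_2) with covariance Σ^bb, the parameter uncertainty that a control choice u adds to the next state has variance u'Σ^bb u, which a mixed portfolio of controls can reduce when the covariances are negative.

import numpy as np

# With uncertain coefficients b1, b2 on the two controls, the variance that a
# control vector u adds to the next state is u' Sigma_bb u.
u = np.array([1.0, 1.0])
sigma_indep = np.array([[0.25, 0.0], [0.0, 0.25]])     # uncorrelated parameters
sigma_negcov = np.array([[0.25, -0.2], [-0.2, 0.25]])  # negatively correlated
print(u @ sigma_indep @ u)    # 0.50
print(u @ sigma_negcov @ u)   # 0.10: the "portfolio" of controls is less risky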
As is shown in Kendrick (1981, Ch. 6), passive learning controls can be computed
with a variant of the Riccati method that is computationally very efficient. However,
the calculations that involve the covariance of the parameters make this method
somewhat less efficient than deterministic methods. Thus, the loss in moving from
deterministic to passive learning stochastic methods is not computational efficiency
so much as restriction on model specification. In deterministic methods one can easily
move from quadratic-linear to general nonlinear specifications. However, stochastic
control methods in the algorithms used in this paper are restricted to linear systems
equations and normal distributions.
The reason for this restriction is that one needs to be able to map the uncertainty in
one period into the next period with dynamic equations. It is a desirable property of
such systems that the form of the distributions remain unchanged from one period to
the next. For example, linear relationships can be used to map normal distributions in
one period into normal distributions in the next period. In contrast, a quadratic rela-
tionship would map a normal distribution in one period into a chi square distribution
in the next period and a Wishart distribution in the third period.
Restricting systems equations, and therefore econometric models, to linear equa-
tions is a high price to pay for being able to do stochastic control. Hopefully this
restriction will soon be broken by advances in numerical methods. One promising
approach to nonlinear models is Matulka and Neck (1992).
Passive learning models were formerly solved on mainframe computers. How-
ever, the personal computers in use today are fast enough to permit solution of these
models on the desktop. For example, the DUAL code of Amman and Kendrick
(1991) has recently been made available on IBM PC's and compatibles. This code
has both passive and active learning capabilities, but for the time being it is expected
that most usage on personal computers will be in the passive mode. This will change
shortly with the widespread use of faster CISC and RISC microprocessors as is
discussed below.
In summary, with passive learning stochastic control the choice of control vari-
ables in each period is affected by the covariance of the parameters of the system
equation. Also, as with deterministic control there is updating of the parameter esti-
mates and of the state variables in each period. We do not define separate names for
the different updating behaviors because it seems sensible that any decision maker
Next we consider the actor who considers the effects of the choice of control variables
in the current period on the future covariance of the parameters. This actor is
sophisticated enough to realize that perturbations to the system today will yield
improved parameter estimates that enable him to control the economic system better
in the future. He is also sophisticated enough to know that if the elements in the
covariance matrix are small there will be little payoff to active learning efforts.
Moreover, he knows that even if the elements in the covariance matrix are small it
may be worthwhile to attempt to learn if the additive system noises are large.
The model for this actor may be written as a general quadratic-linear tracking
problem, which is to choose the control path (u_k), k = 0, ..., N-1, to minimize

J = E{ (1/2)[x_N - x̃_N]' W_N [x_N - x̃_N]
      + (1/2) Σ_{k=0}^{N-1} ( [x_k - x̃_k]' W [x_k - x̃_k] + [u_k - ũ_k]' Λ [u_k - ũ_k] ) } ,   (5)

where

E = expectations operator,
x_k = state vector - an n vector,
x̃_k = desired state vector - an n vector,
u_k = control vector - an m vector,
ũ_k = desired control vector - an m vector,
W_N = symmetric state variable penalty matrix at terminal period, N,
W = symmetric state variable penalty matrix for periods 0 thru N - 1,
Λ = symmetric control variable penalty matrix for periods 0 thru N - 1,
subject to

with x_0 given,

where

(7)

and the first order Markov process

(8)

where the vectors ξ_k, ζ_k, η_k, x_0, and θ_0 are assumed to be mutually independent,
normally distributed random vectors with known means and covariances (positive
semi-definite), and where
Measurement error is also included in this model. Thus the state variables are not
observed directly but rather through a noisy process. Of course as the sizes of the
measurement errors decrease the gain to active learning efforts will increase.
The presence of measurement error in models with distributed lags also raises
the following issue: Normally some data are collected in each time period, and flash
estimates are issued before the full data set has been collected and processed. Thus
the most recent state estimate will be the noisiest while state variables from several
periods ago will have less noise associated with them. So there is a premium on using
data from several periods ago in the feedback rule. However, to control a system well
one wants to use the most recent state variables. This tradeoff between recent states
with noisy measurements and lagged states with less noisy measurement has not yet
been studied numerically. However, the computer code to facilitate such work is
already available.
The problem setup here is general enough to include not only measurement errors
but also time varying parameters. This level of sophistication
has not yet been programmed into our numerical codes, but the mathematical deriva-
tions and separate program development have been done by Tucci (1989). When
time varying parameters are present, the parameter covariance elements are likely
to be larger, so there will be more gain from active learning efforts. On the other
hand parameters learned today will be changing in the future; therefore there is less
potential gain from learning. This tradeoff has not yet been studied numerically.
Active learning stochastic control can be done with the DUAL code mentioned
above. This program has recently been modified and versions developed for super-
computers and workstations as well as mainframes. We have versions running on
Cray and IBM supercomputers, IBM mainframes and SUN and IBM workstations.
In addition a version for IBM PC's and compatibles has recently been developed. We
have discovered that we can solve small active learning problems even on IBM AT
computers with 80286 chips and substantially larger models on IBM PS/2 computers
with 80386 chips. Thus we are confident that the 486 chips and beyond will have
the capability to solve active learning stochastic control problems with a number of
states and controls.
Also we have found that it is possible to do large numbers of Monte Carlo runs on
very small models using SUN and IBM workstations. So far these experiments have
shown that actors who are sophisticated enough to employ active learning techniques
will not necessarily perform better on average than actors who use passive learning
stochastic control methods or even in some cases deterministic methods, cf. Amman
and Kendrick (1994). However, we are treating these results with some caution
because of the possibility that nonconvexities in the cost-to-go can affect them.
More than ten years ago Kendrick (1978) and Norman, Norman, and Palash
(1979) first encountered nonconvexities in active learning stochastic control prob-
lems. However, these results were obtained with computer codes of such complexity
that it was uncertain whether or not the nonconvexities were fundamental.
Also, the codes and computers of that time were not fast enough to permit detailed
studies of the problem. Recently, however, Mizrach (1991) has cast new light on this
problem by providing detailed derivations for the single-state, single-control problem
of MacRae (1972). He found that the nonconvexity was not a passing phenomenon
but rather was fundamental to active learning problems solved with the Tse and
Bar-Shalom (1973) algorithm.
Amman and Kendrick (1992) then followed Mizrach's work, using numerical
experiments to confirm his results and identifying the cause of the nonconvexities as
the initial covariance of the unknown parameter. As an aid to understanding this result,
consider the MacRae model that is stated below.
The MacRae model was chosen for this work because it is the simplest possible
adaptive control problem. If nonconvexities occur in this problem then one can
expect that they will also appear in more complex models. The MacRae model is
x_0 = 0,   (11)

with x̃_k = 0 and ũ_k = 0 for all k.
This problem has been solved using the dual control algorithm of Tse and Bar-Shalom
as described in detail in Ch. 11 of Kendrick (1981). At period k of an N-period
model with N - k periods to go, the total cost-to-go can be written as

J_{N-k} = J_{D,N-k} + J_{C,N-k} + J_{P,N-k} ,   (12)
where the D, C, and P subscripts represent the deterministic, cautionary, and probing
components, respectively. The deterministic term includes all of the nonstochastic
elements. The cautionary term is a function of Σ_{k+1|k}, i.e., of the uncertainty in the
next period before a new control can be applied. The probing term can be written as
J_{P,N-k} = (1/2) tr Σ_{j=k+1}^{N-1} ( R_j Σ^θθ_{j|j} ) ,   (13)

where R_j is a Riccati-like term and Σ^θθ_{j|j} is the covariance matrix of the unknown
parameters in period j after updating with data through period j. Notice that the
probing term is a function of the parameter covariance matrix for all periods from
the current to the terminal period.
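As a purely numerical illustration of (13) (the matrices below are illustrative assumptions, not taken from the model), the probing cost is half the sum of the traces of the products R_j Σ^θθ_{j|j} over the remaining periods:

import numpy as np

R = [np.diag([2.0, 1.0]), np.diag([1.5, 0.5])]           # Riccati-like terms R_j
Sigma = [np.array([[0.4, -0.1], [-0.1, 0.3]]),            # parameter covariances
         np.array([[0.2, 0.0], [0.0, 0.1]])]              # after updating in period j
J_probe = 0.5 * sum(np.trace(Rj @ Sj) for Rj, Sj in zip(R, Sigma))
print(J_probe)   # 0.5 * (1.10 + 0.35) = 0.725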
Fig. 1. Effects of σ² on the total cost-to-go (panels for σ² = 0.5, 1.0, 2.0, and 4.0, each plotting J_N against the initial control u_0).
It is this probing term that is the primary source of nonconvexities. In fact, Amman
and Kendrick (1992) have shown that the nonconvexities can be switched off and on
by altering the initial variance of the uncertain b parameter. An example of this sort
is shown in Figure 1. With the setting of σ² = 0.5 at the top of the figure, the cost-to-go
function remains a convex function of the initial period control, u_0. However, as σ²
increases, the nonconvexity appears and causes two local optima for the problem.
In addition to the nonconvexities from the probing term, Amman and Kendrick
also found that there are combinations of parameter values which, in conjunction
with large values of σ², will result in nonconvexities also arising in the cautionary
term.
So the bad news is that the nonconvexities appear to be fundamental to active
learning stochastic control problems. However, the good news is that there may be
some regularities about these nonconvexities that can be exploited to design efficient
solution algorithms. Also, even if brute force grid search methods must be employed,
computer speeds are increasing so rapidly that models that exhibit nonconvexities
can be solved.
Finally, there is some prospect that the parameter values in empirical economic
models will be such that the nonconvexities occur only rarely. It will take some time
and effort to establish this fact, but one can be hopeful that this will occur.
4. CONCLUSIONS
Economists do not need to use the unrealistic assumption that all economic actors
know everything. Rather, there are tools at hand that allow us to portray different
actors as having different information and as being able to learn as time passes. Moreover,
there are available algorithms and computer codes for modeling different kinds of
learning behavior in different actors. Some actors may be so sophisticated as to use
active learning methods in which they probe the system in order to improve parameter
estimates over time. Other actors may be sophisticated enough to consider the
covariance of parameters in choosing their control variables but learn only passively
with the arrival of new information. Other actors may be so unsophisticated that they
do not even update parameter estimates when new observations arrive.
Since computer speeds have increased greatly in recent years, we can now model
all these kinds of behaviors using code that operates on supercomputers, workstations,
and even personal computers.
However, the most sophisticated methods that involve active learning can give
rise to nonconvexities in the cost-to-go, so caution must be exercised until we can
learn more about when these nonconvexities arise and how to solve active learning
problems when they do occur.
REFERENCES
Abel, Andrew (1975), "A Comparison of Three Control Algorithms to the Monetarist-Fiscalist
Debate," Annals of Economic and Social Measurement, Vol. 4, No. 2, pp. 239-252, Spring.
Amman, Hans M. and David A. Kendrick (1991), "A User's Guide for DUAL, A Program
for Quadratic-Linear Stochastic Control Problems, Version 3.0", Technical Paper T90--94,
Center for Economic Research, The University of Texas, Austin, Texas 78712.
Amman, Hans M. and David A. Kendrick (1992), "Nonconvexities in Stochastic Control
Models", Paper 92-91, Center for Economic Research, The University of Texas, Austin,
Texas, 78712.
Amman, Hans M. and David A. Kendrick (1994), "Active Learning - Monte Carlo Results,"
forthcoming in 1994 in Vol. 18 of the Journal of Economic Dynamics and Control.
Aoki, Masanao (1967), Optimization of Stochastic Systems, Academic Press, New York.
Chow, Gregory (1975), Analysis and Control of Dynamic Systems, John Wiley and Sons, Inc.,
New York.
Drud, Arne (1992), "CONOPT - A Large Scale GRG Code," forthcoming in the ORSA Journal
on Computing.
Fair, Ray (1984), Specification, Estimation and Analysis of Macroeconometric Models, Har-
vard University Press, Cambridge, Mass. 02138.
Hatheway, Lawrence (1992), Modeling International Economic Interdependence: An Appli-
cation of Feedback Nash Dynamic Games, Ph.D. Dissertation, Department of Economics,
The University of Texas, Austin, Texas 78712.
Kendrick, David A. (1978), "Non-convexities from Probing an Adaptive Control Problem,"
Journal of Economic Letters, Vol. 1, pp. 347-351.
Kendrick, David A. (1981), Stochastic Control for Economic Models, McGraw-Hill Book
Company, New York.
Livesey, David A. (1971), "Optimizing Short-Term Economic Policy," Economic Journal,
Vol. 81, pp. 525-546.
MacRae, Elizabeth Chase (1972), "Linear Decision with Experimentation," Annals of Eco-
nomic and Social Measurement, Vol. 1, No. 4, October, pp. 437-448.
Matulka, Josef and Reinhard Neck (1992), "A New Algorithm for Optimum Stochastic Control
on Nonlinear Economic Models," forthcoming in the European Journal of Operations
Research.
Mizrach, Bruce (1991), "Non-Convexities in a Stochastic Control Problem with Learning,"
Journal of Economic Dynamics and Control, Vol. 15, No.3, pp. 515-538.
Norman, A., M. Norman and C. Palash (1979), "Multiple Relative Maxima in Optimal Macroe-
conomic Policy: An Illustration", Southern Economic Journal, 46, 274-279.
Parasuk, Chartchai (1989), Application of Optimal Control Techniques in Calculating Equi-
librium Exchange Rates, Ph.D. Dissertation, Department of Economics, The University of
Texas, Austin, Texas 78712.
Park, Jin-Seok (1992), A Macroeconomic Model of Monopoly: A Theoretical Simulation
Approach and Optimal Control Applications, Ph.D. dissertation in progress, Department
of Economics, University of Texas, Austin, Texas 78712.
Pethe, Abhay (1992), "Using Stochastic Control in Economics: Some Issues", Working Paper
92-5, Center for Economic Research, The University of Texas, Austin, Texas, 78712.
Pindyck, Robert S. (1973), Optimal Planning for Economic Stabilization, North Holland
Publishing Co., Amsterdam.
Prescott, E. C. (1972), "The Multi-period Control Problem under Uncertainty," Econometrica,
Vol. 40, pp. 1043-1058.
Simon, H. A. (1956), "Dynamic Programming under Uncertainty with a Quadratic Criterion
Function," Econometrica, Vol. 24, pp. 74-81, January.
Theil, H. (1957), "A Note on Certainty Equivalence in Dynamic Planning," Econometrica,
Vol. 25, pp. 346-349, April.
Tse, Edison and Yaakov Bar-Shalom (1973), "An Actively Adaptive Control for Linear Sys-
tems with Random Parameters," IEEE Transactions on Automatic Control, Vol. AC-17,
pp. 38-52, February.
Tucci, Marco (1989), Time Varying Parameters in Adaptive Control, Center for Economic
Research, The University of Texas, Austin, Texas 78712.
Turnovsky, Stephen J. (1973), "Optimal Stabilization Policies for Deterministic and Stochastic
Linear Systems", Review of Economic Studies, Vol. 40.
Turnovsky, Stephen J. (1977), Macroeconomic Analysis and Stabilization Policy, Cambridge
University Press, London.
ALFRED LORN NORMAN
ABSTRACT. Herbert Simon advocates that economists should study procedural rationality
instead of substantive rationality. One approach to studying procedural rationality is to con-
sider algorithmic representations of procedures, which can then be studied using the concepts
of computability and complexity. For some time, game theorists have considered the issue of
computability and have employed automata to study bounded rationality. Outside game theory
very little research has been performed. Very simple examples of the traditional economic
optimization models can require transfinite computations. The impact of procedural rationality
on economics depends on the computational resources available to economic agents.
1. INTRODUCTION
H. Simon (1976) suggests that the proper study of rationality in economics is pro-
cedural rationality. Simon believes that procedural rationality should encompass
the cognitive process in searching for solutions to problems. This study should be
performed using computational mathematics, which he defines as the analysis of
the relative efficiencies of different computational processes for solving problems of
various kinds. "The search for computational efficiency is a search for procedural
rationality, ... " In this paper, problem-solving processes are formalized as algorithms
for solving economic problems. Placed in an algorithmic format, procedural ratio-
nality can be studied using the theory of computability and complexity developed by
mathematicians and computer scientists.
In Section 2 the concepts of computability and complexity are presented. The
traditional format of computability is for finite representations. One example is finite
sequences from a finite alphabet. Another is the study of functions f : N^n → N^k,
n ≥ 0, k > 0, where N is the natural numbers 0, 1, 2, .... While this model
is appropriate for studying finite state game theory, it is not applicable to most
traditional single agent optimization problems, such as the theory of the firm or the
consumer, defined as optimization problems over the reals. To study the complexity
of such problems, the information-based complexity concept of Traub, Wasilkowski
and Wozniakowski (1988) is recommended. This approach encompasses both finitely
representable combinatorial complexity and optimization over the reals. An
important question in complexity theory is whether a problem is tractable, that is,
whether it can be computed with polynomial resources.
One application of complexity theory is determining the computational cost of
achieving accuracy in algorithms used in numerical analysis such as integration.
Economists should perform such analyses for algorithms used in optimization models
and econometrics. A start in this direction has been made by Norman and Jung
(1977), Norman (1981, 1994) and Rustem and Velupillai (1987) in the area of linear
quadratic control. In this paper, we focus on the relationship between computability
and complexity and economic theory with special emphasis on bounded rationality.
In this paper we focus on computational complexity and do not consider dynamic
complexity arising from chaotic behavior, even though Spear (1989a) demonstrates
that the two concepts are related.
In section 3 the literature concerning computability, complexity and bounded
rationality in finite action game theory is considered. This literature dates back at
least to Rabin's (1957) demonstration of the existence of a noncomputable strategy.
More recently Binmore (1990) and Canning (1992) have considered the impact
of restricting players to computable algorithms. Since Aumann's (1981) suggestion,
game theorists have modeled bounded rationality by replacing players with automata.
A brief survey of this literature is presented. For automata theory there are two
types of complexity: the computational complexity of computing the best-response
automaton and the strategic complexity of implementing the strategy. Overall, game
theory contains many problems currently considered intractable.
Outside of game theory very little research in economics has been done on com-
putability and complexity and their relationship to bounded rationality. The literature
concerning the theory of the firm and the theory of the consumer is considered in
Section 4. Norman (1994) demonstrates that very simple models of the firm can
require transfinite computations to determine profit maximization. Also, such trans-
finite problems cannot be ignored by appealing to concepts such as ε-rationality,
because the computational complexity of ε-optimization can be exponential, that is,
intractable. Beja (1989) and Rustem and Velupillai (1989) demonstrate a fatal flaw in
the traditional choice model. Norman (1992) proposes a new discrete-mathematics
consumer model for choice with technological change.
In Section 5 we briefly consider two miscellaneous, unrelated articles. The first
is Spear's (1989) use of computability theory to characterize the identification of
a rational expectations equilibrium. The second is Norman's (1987) use of compu-
tational complexity to characterize alternative mechanisms to clear the Ostroy-Starr
(1974) household exchange problem.
Section 6 forecasts the impact of computability and complexity on economics.
If bounded rationality is interpreted as optimization with a computational resource
restriction, the impact on economic theory depends on whether the restriction is
computability, tractability or linearity.
Finally, the reader is warned that because symbol usage generally follows the
references, some symbols are used for several purposes in the paper.
There are several approaches to the theory of computability that include recursive
functions, Turing machines, algorithms, and rewrite systems. Because these alterna-
tives are equivalent up to a coding (transformation), the choice selected should be
that most accessible to the reader. While mathematicians, and hence economic theo-
rists, generally prefer the recursive function approach, the readers of Computational
Economics are likely to prefer an algorithmic approach that is intuitively obvious to
economists with some computer programming experience.
Let us consider the algorithmic approach to computability of Sommerhalder
and van Westrhenen (1988), which analyzes the properties of simple-algorithmic-
language, SAL(N), programs. A SAL(N) program is a mathematical entity defined
as a quadruple (n, k, p, P), where P is a sequence of SAL statements, and the
variables occurring in the sequence P belong to x_1, ..., x_p ∈ N^p. Of these p
variables, n ≥ 0 are input variables and k > 0 are output variables. There are two
types of SAL statements:
1. Assignment statements: (Note: we will use ←, not :=, for assignment)
x_i ← 0
x_i ← x_j
x_i ← x_j + 1 (Successor)
x_i ← x_j ⊖ 1 (if x_j = 0 then x_i ← 0 else x_i ← x_j - 1) (Predecessor)
2. While statement:
while x_i ≠ 0 do S od, where S is a sequence of SAL statements.
The set F(SAL(N)) contains all functions f : N^n → N^k, n ≥ 0, k > 0, for
which there exists a SAL(N) program (n, k, p, P) which, given an input (x_1, ..., x_n),
computes (x_1, ..., x_k) = f(x_1, ..., x_n) as output in a finite number of steps. If the
program computes an output for at least one input, the function is a partial recursive
function, and if the program computes an output for every input, the function is a total
recursive function. The set F(SAL(N)) is equivalent to the set of recursive functions.
(See Sommerhalder and van Westrhenen, 1988.)
In SAL programs, arithmetic operations such as +, -, ×, and ÷ must be con-
structed as macros. For example, the addition macro x_i ← ADD(x_i, x_j) can be
constructed as

while x_j ≠ 0 do
x_i ← x_i + 1
x_j ← x_j ⊖ 1
od
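For readers who prefer a running illustration, the ADD macro can be mimicked in Python (a hedged sketch, not part of SAL itself) using only the zero, successor, predecessor, and while-not-zero primitives:

def successor(x):
    return x + 1

def predecessor(x):
    # the "monus" operation: 0 if x == 0, else x - 1
    return 0 if x == 0 else x - 1

def add(xi, xj):
    # ADD macro: repeatedly apply successor to xi and predecessor to xj
    while xj != 0:
        xi = successor(xi)
        xj = predecessor(xj)
    return xi

print(add(3, 4))   # 7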
Adding the standard arithmetic operations as statements in SAL would decrease the
number of statements required to compute a function; nevertheless, if a function was
not computable without arithmetic statements, it would not become computable with
arithmetic statements. Computability addresses the issue of what can be computed
in a finite number of statements, not the number of statements. For simplicity, it is
desirable to keep the instruction set of SAL to the minimum.
(1)

where U stands for a pair consisting of the information N and the algorithm ψ. w(·) is a
vector whose ith element is the number of operations performed on the ith element
of r or n, as designated. This cost function is closely related to the time needed
to perform the computation. To determine the total time, the cost vectors would be
replaced with the time needed to perform the associated operations.
In this paper we concern ourselves only with the worst-case setting of complexity.
Here the error and cost of approximation are defined over all problem elements as
follows:
Σ_{x∈J} x = Σ_{x∈(Q-J)} x .   (6)
For simplicity, consider the special case where Q consists of just three numbers. We
introduce a SAL macro for addition called ADD. The critical steps in an ND-SAL(N)
program would be three statements (i = 1,2,3):
The program would terminate only if x_4 equals x_5. After these three statements have
been executed, there are eight possibilities:
Case    x_4                  x_5
1       x_1 + x_2 + x_3      0
2       x_1 + x_2            x_3
3       x_1 + x_3            x_2
4       x_2 + x_3            x_1
5       x_1                  x_2 + x_3
6       x_2                  x_1 + x_3
7       x_3                  x_1 + x_2
8       0                    x_1 + x_2 + x_3
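The eight cases are exactly the assignments of the elements of Q = {x_1, x_2, x_3} to the two sides of equation (6). A small Python sketch (the function name and the sample inputs are illustrative assumptions) makes the brute-force check explicit:

from itertools import product

def partition_exists(q):
    # Try every assignment of each element to the x4 side (J) or the
    # x5 side (Q - J); accept if the two sums agree, as in equation (6).
    for choice in product([0, 1], repeat=len(q)):
        x4 = sum(x for x, c in zip(q, choice) if c == 0)
        x5 = sum(x for x, c in zip(q, choice) if c == 1)
        if x4 == x5:
            return True
    return False

print(partition_exists([3, 1, 2]))   # True:  3 = 1 + 2
print(partition_exists([3, 1, 1]))   # False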
Game theory is the only field of economics that has generated a literature concerning
computability, complexity, and bounded rationality. This is not totally surprising
since finite action game theory is one of the few economic subjects fitting the tradi-
tional computational models for problems represented by either natural numbers or
finite sequences from a finite alphabet. The first topic we consider is the impact of
the concept of computability on game theory. Next we consider an example of an
NP-complete problem in game theory and finally, we consider finite automata as a
form of bounded rationality.
It has been known at least since Rabin's (1957) paper that there exist games with
noncomputable optimal strategies. We now present a simple number-theoretic
example due to Jones (1982).
Example 1. An Arithmetical Game of Length 5
There are two players 1 and 2 who take turns assigning nonnegative integer values
to the variables of a polynomial:
player 1 picks XI
player 2 picks X2
player 1 picks X3
player 2 picks X4
player 1 picks X5
The payoff matrix for this game is symmetric. Both players' actions are
c (cooperate) and d (defect). The first entry in each element of the payoff matrix
represents the payoff to player 1 and the second the payoff to player 2.
This game has one Nash equilibrium, (d, d). While both players would be bet-
ter off cooperating, (c, c), this action combination is not stable because either
player could improve its own position by switching actions. In the repeated prisoner's
dilemma, the problem is to determine the circumstances under which the two play-
ers would cooperate to achieve a higher payoff. Intuitively it would seem likely
that they would have incentives to cooperate. Let us consider this problem. Let
a_t ∈ {(c, c), (c, d), (d, c), (d, d)} be the action combination selected by players 1 and
2 in period t. A history h of length l(h) = k is {a_i, a_{i+1}, ..., a_{i+k}}, and H^T is the
set of all histories of length strictly less than T. A strategy for player i in period t is
f_i^t : H^{t-1} → {c, d}; that is, a strategy provides a rule for action given all possible
histories. One method of calculating the payoffs in a repeated game is the average
payoff. Let f^1 = (f_1(h^0), f_2(h^0)), the strategy combination for the first stage. Then,
recursively for t = 2, 3, ..., T: f^t = f(f^1, f^2, ..., f^{t-1}). Let P(f^t) be the payoff to
the two players in period t when they use strategy combination f^t. Then the average
payoff to the two players is P̄(f) = (1/T) Σ_{t=1}^{T} P(f^t).
Let us now describe two very simple strategies for the repeated prisoner's
dilemma. Since the game is symmetric, we only need describe the strategies for
player 1. These two are
a. Constant defect: f_1(h) → d for all histories h.
b. Tit-for-tat: f_1(h^0) → c and f_1(h) → a_2, where a_2 is player 2's action in the last
element of h. (That is, initially cooperate and afterwards execute the last action
taken by player 2.)
As the game is repeated, the set H^t of all possible histories, which is also the domain
of the strategy, increases exponentially. Nevertheless, for the average-payoff, T-
period, repeated prisoner's dilemma game, the only Nash equilibrium is the action
combination (d, d) each period.
Kalai's approach for studying bounded rationality in repeated games is full au-
tomation, where both players are replaced by automata. An automaton is a triple
((M, m^0), B, T), where M is the set of states of the automaton. The behavior func-
tion B : M → {c, d} prescribes an action for player 1 at every state of the automaton.
The transition function T : M × A → M transits the automaton to a new state from
an old one as a function of the action combinations of both players. The automata
for the two strategies listed above can be represented as follows.
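A Python sketch of the two automata as ((M, m^0), B, T) triples (the state names and the play() helper are illustrative assumptions):

CONSTANT_DEFECT = {
    "states": {"D"}, "start": "D",
    "behavior": {"D": "d"},
    # the transition ignores the action combination entirely
    "transition": lambda state, pair: "D",
}

TIT_FOR_TAT = {
    "states": {"C", "D"}, "start": "C",
    "behavior": {"C": "c", "D": "d"},
    # move to the state matching player 2's last action
    "transition": lambda state, pair: "C" if pair[1] == "c" else "D",
}

def play(automaton, opponent_actions):
    # Run player 1's automaton against a fixed sequence of player 2 actions.
    state, history = automaton["start"], []
    for opp in opponent_actions:
        own = automaton["behavior"][state]
        history.append(own)
        state = automaton["transition"](state, (own, opp))
    return history

print(play(TIT_FOR_TAT, ["c", "d", "c"]))   # ['c', 'c', 'd']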
Neyman (1985) maintains that the two-person, repeated prisoner's dilemma played
by automata can result in a cooperative Nash equilibrium. Let P^T_{m_1,m_2} represent
T repetitions with the average payoff criterion, where each player i chooses an
automaton of size not exceeding m_i. Neyman asserts that, if 2 ≤ m_1, m_2 ≤
T - 1, then there is a Nash equilibrium pair of automata of P^T_{m_1,m_2} that prescribes
cooperation throughout P^T. This occurs because restricting the size of the automata
prevents the usual backward induction. Zemel (1985) introduces small talk into
the finitely repeated prisoner's dilemma as an alternative approach to explaining
cooperation.
Next let us consider Ben-Porath's (1986) result concerning the advantage of
having a bigger automaton in an infinitely repeated two-person zero-sum game,
Z^∞_{m_1,m_2}. Since zero-sum games have a value in mixed strategies, every player can
guarantee his/her pure strategy Z-maxmin value with an automaton of size one.
Ben-Porath's result concerning the advantage of being bigger is that for every given
positive integer m_1, there is a positive integer m_2, and an automaton A_2 of size m_2,
such that for every automaton A_1 of size m_1, player 1's payoff is no more than the
pure strategy Z-maxmin value of player 1.
Rubinstein (1986) and Abreu and Rubinstein (1988) have investigated the choice
of automata when the number of states is costly. Also, games with finite actions
have the desirable property that all equilibrium payoffs can be well approximated by
equilibria of bounded complexity. This idea is pursued in the papers of Kalai and
Stanford (1988) and Ben-Porath and Peleg (1987).
In addition to characterizing the behavior of automata, game theorists have also
investigated the computational complexity of computing the best-response automaton
under various conditions. Gilboa (1988) considers the problem of computing the best-
response automaton, A_1, for player 1 in a repeated game G with n players and n - 1
finite automata, (A_2, ..., A_n), for the remaining players in G. He demonstrates
that the computational complexity of both problem (1) - determining whether a
particular A_1 is a best-response automaton - and problem (2) - finding a best-response
automaton A_1 - is polynomial. If the number of players is unrestricted, problem (1)
is NP-complete and problem (2) is not polynomial. Ben-Porath (1990) demonstrates
that for a repeated two-person game where player 2 plays a mixed automaton strategy
with finite support, problem (1) is an NP-complete problem and problem (2) does
not have a polynomial solution. Papadimitriou (1992) considers the relationship
between the computational complexity of determining a best-response strategy and
the strategic complexity in a repeated prisoner's dilemma. If an upper bound is
placed on the number of states of the best-response automaton, the problem is an
NP-complete problem; whereas, if no bound is imposed, the problem is polynomial.
Finally, game theorists are in the process of developing a complexity measure
for implementing an automaton. Kalai-Stanford (1988) define the complexity of a
strategy to be its size (the number of states of the smallest automaton prescribing
it). In general the amount of information needed for playing a strategy equals the
complexity of the strategy; that is, the complexity of a strategy, f, equals the number
of equivalence classes of histories it induces. Banks and Sundaram (1990) propose
an alternative strategic complexity concept that includes a measure of the need to
monitor the opponent's action. Lipman and Srivastava (1990) propose a strategic
complexity measure based on the details of the history required by the strategy. They
are interested in the frequency with which perturbations in history change the induced
strategy. Papadimitriou's (1992) result indicates that achieving a specified Kalai-
Stanford strategic complexity increases the computational complexity of computing
the best response automaton.
The original calculus-based models of profit and utility maximization are defined
over the reals - for example, the positive orthant of ℝ^n. Consequently, the traditional
computability and complexity arguments based on either the natural numbers or finite
representations from a finite alphabet are not applicable.
In order to demonstrate just how simple a noncomputable optimization problem
can be, we consider the problem presented in Norman (1994), which employs the
information-based complexity model. A monopolist has a linear production process,
faces a linear inverse demand function, and has a profit function for t = 1, 2, ..., T:

p_t = a - d q_t ,     q_t = β x_t + ε_t ,     π_t = p_t q_t - c(q_t) ,   (8)

where a and d are known, q_t is the tth observation of net output, x_t is the tth
level of the production process, β is the unknown scalar parameter, and ε_t is the
tth unobserved disturbance term. The ε_t are iid normal with mean zero and known
variance one. Since the complexity results are invariant to defining the cost function
as a zero, linear, or quadratic function, the cost function is defined as c(q_t) = 0 to
simplify the notation.
Given a normal prior on β at time t = 1, the prior information on β at time t
is a normal distribution N(m_t, h_t), where h_t is the precision, updated by h_t = h_{t-1} +
x²_{t-1}, and m_t is the mean, updated by m_t = (m_{t-1}h_{t-1} + q_{t-1}x_{t-1})/h_t, as
sketched below. For this paper let us consider two cases:
1. The agent knows β precisely. He or she has either been given precise
knowledge of β or has observed the production process a countable number of times,
so that his or her prior on β has asymptotically converged to N(β, ∞).
2. The agent's prior information on β is represented by N(m_1, h_1), where h_1 has
a very small positive value.
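A minimal Python sketch of the updating step (assuming, as above, q_t = β x_t + ε_t with unit-variance normal errors; the numbers are illustrative):

def update_prior(m_prev, h_prev, x_prev, q_prev):
    # Normal prior N(m, h) on beta, with h the precision.
    h_t = h_prev + x_prev ** 2
    m_t = (m_prev * h_prev + q_prev * x_prev) / h_t
    return m_t, h_t

# Case 2: a diffuse prior (very small positive h_1).
m, h = 0.0, 1e-6
for x, q in [(1.0, 0.9), (2.0, 2.1), (3.0, 3.2)]:
    m, h = update_prior(m, h, x, q)
print(m, h)   # the mean approaches the least-squares estimate of beta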
The monopolist is interested in maximizing his expected discounted profit over a
finite time horizon:

J_T = sup_{x^T} E[ Σ_{t=1}^{T} τ^{t-1} p_t(x_t) q_t(x_t) | q^{t-1}, x^{t-1} ] ,   (9)

where τ is the discount factor, q^{t-1} is (q_1, q_2, ..., q_{t-1}) and x^{t-1} is (x_1, x_2, ..., x_{t-1}).
q^{t-1} and x^{t-1} represent the fact that the decision maker anticipates complete infor-
mation that is observed exactly and without delay.
First consider the optimization problem where β is a known parameter. The
optimal x_t can be exactly determined as a function of the parameters of f ∈ F
without recourse to the information operator as

x_t* = a / (2dβ) .   (10)
E[ Q_1(q_1) / Q_2(q_1) ] ,   (12)

where Q_1(q_1) and Q_2(q_1) are quadratic forms in the normal variable q_1. This expec-
tation cannot be carried out explicitly to give an analytic closed expression. This
implies that the 0-complexity of this problem with an unknown parameter is transfinite.
Norman (1993) uses these two cases to provide a Bayesian explanation of Knight's
concepts of risk and uncertainty. Risk is where the parameters and distributions of
the decision problem are known, and uncertainty is where at least one parameter
or distribution is not known. The conjecture is that, for nonlinear problems, the ε-
computational complexity of an uncertainty problem always lies in an equal or higher
computational class than that for the equivalent risk problem.
The reader might have the illusion that transfinite problems are an oddity in eco-
nomics. The author asserts that the opposite is likely to be the case. Readers who
are not familiar with computational complexity, but who have some knowledge of
numerical analysis, should realize that all those problems for which traditional
numerical analysis focused on the asymptotic convergence of alternative algorithms are
transfinite computational problems. The author asserts that most of the standard cal-
culus optimization problems in the theory of the consumer and the firm are transfinite.
Only special cases, such as quadratic problems, are computable. Also, expressions
that are defined by infinite series are frequently not computable. Another example is
the traditional asymptotic convergence theory of econometric estimators.
The reader might assume that the problem can be circumvented by appealing to
ε-rational arguments; that is, by using ε-approximations which can be computed in
a finite number of computations. If the constraint is that these approximations be
tractable, in the sense of having polynomial costs with respect to the growth parameters
of the problem, using ε-approximations is not always possible.
Consider the discrete-time, stationary, infinite horizon discounted stochastic con-
trol problem requiring computation of a fixed point J* of the nonlinear operator T
(acting on a space of functions on the set S ⊂ ℝ^n) defined by Bellman's equation

(T J)(x) = inf_{u∈U} [ g(x, u) + α ∫_S J(y) P(y|x, u) dy ] ,   ∀x ∈ S.   (13)
Here, U ⊂ ℝ^m is the control space, g(x, u) is the cost incurred if the current state is
x and control u is applied, α ∈ (0, 1) is a discount factor, and P(y|x, u) is a stochastic
kernel that specifies the probability distribution of the next state y when the current
state is x and control u is applied. Then J*(x) is interpreted as the value of the expected
discounted cost, starting from state x, provided that the control actions are chosen
optimally. A variation of this model has been considered by economists [for example,
Easley and Kiefer (1988)] investigating parameter estimation in an estimation and
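For readers who want to see the operator T at work, here is a discretized Python sketch of value iteration for (13); the grids, stage costs, and transition kernel are toy assumptions, not part of the chapter.

import numpy as np

n_s, n_u, alpha = 5, 3, 0.9
rng = np.random.default_rng(0)
g = rng.random((n_s, n_u))                    # g(x, u): stage cost on a grid
P = rng.random((n_u, n_s, n_s))               # P[u, x, y]: transition kernel
P /= P.sum(axis=2, keepdims=True)             # normalize to probabilities

J = np.zeros(n_s)
for _ in range(500):                           # repeated application of T
    Q = g + alpha * np.einsum('uxy,y->xu', P, J)
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-10:
        break
    J = J_new
print(J)                                       # approximate fixed point J*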
where p_it is the price of the ith item in the tth period, I_t is the income in the tth
period, and b_it ∈ B^I_t if b_it ∈ B_t and p_it ≤ I_t.
Because of the high rate of technological change in the marketplace and the usually
long time interval between purchases, the consumer of durable goods generally faces
a new set of alternatives possessing new technological attributes. We assume that the
consumer searches for his preferred item by ranking his alternatives. This ranking
operation is costly, because it requires real resources in the form of mental effort,
time, and travel expenses. Given the rapid rate of technological change, we assume
that the consumer's preferences are not given a priori but are determined, to the
extent that this can be done efficiently, in the consumer's search for the preferred item.
We model the ranking of two items as a binary operation, R(b_it, b_jt), which the
consumer must execute to determine his preferences between two items, b_it and
b_jt. This operation is modeled as a primitive operation with positive costs, and no
attempt is made to model the human neural network. We assume that the cost, c, of
comparing items is invariant to the two items being compared. The reflexive binary
ranking operation R(b_it, b_jt) is assumed to have the following cost: given any two
unranked b_it and b_jt ∈ B, C(R(b_it, b_jt)), the cost of executing R(b_it, b_jt), is c. If
b_it and b_jt have been ranked, the cost of remembering R(b_it, b_jt) is 0. Also, the
consumer could rank alternatives if he or she chose; however, given the cost, this
might not be optimal. In addition, the consumer expends resources to determine
which items in his or her consumption set are budget feasible: for any b_it ∈ B, the
cost of performing F(b_it) is k.
The consumer's search to find an optimal consumption bundle depends on market
organization. The type of organization considered is a consumer selecting a new
TV from a wall of TVs presented in an electronics discount store. Consequently,
the consumer's search can be conceptualized as one through an unordered sequence
to find a preferred item satisfying a budget constraint; the consumer's search can
be modeled as an algorithm. Organized in this fashion, characterizing an efficient
search is equivalent to determining the combinatorial computational complexity of
the choice problem.
The computational complexity of finding the preferred item in a one-time choice
problem is n. An efficient algorithm, then, is a variation of finding the largest
number in a sequence. Thus, in a one-time choice problem, it is never efficient to
develop a complete preference ordering, which is a variation of sorting a file and
has a computational complexity of n ln n. Consequently, if ranking alternatives is
expensive, a procedural rational consumer facing technological change would never
determine a complete preference ordering, a fundamental assumption of a substantive
rational consumer.
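A short Python sketch of this one-pass search (the item list, feasible(), and rank() below are illustrative stand-ins for the budget check F and the ranking operation R):

def choose(items, feasible, rank):
    # Scan the items once, keeping the best feasible item seen so far,
    # so at most n feasibility checks and n - 1 rankings are performed.
    best = None
    for item in items:
        if not feasible(item):
            continue
        if best is None or rank(item, best):
            best = item
    return best

tvs = [("tv_a", 300), ("tv_b", 450), ("tv_c", 250)]
print(choose(tvs,
             feasible=lambda b: b[1] <= 400,          # budget constraint
             rank=lambda b1, b2: b1[1] > b2[1]))      # stand-in preference
# ('tv_a', 300)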
5. TWO PAPERS
In this section we consider two separate, unrelated papers. First, Spear (1989) demon-
strates how the imposition of computability on a rational expectations equilibrium,
REE, with incomplete information implies that such equilibria are not identifiable.
Second, Norman (1987) demonstrates how complexity theory can be used to create
a theory of money. These papers are related only in that they provide some insights
into the range of topics in economics to which the concepts of computability and
complexity might be applied.
Spear considers a two-period overlapping-generations model. To use finite rep-
resentation computability theory, he assumes the economy has a countable number
of states, S. The set Φ consists of total recursive functions on S, where total means
that the associated SAL(N) programs stop for all states s ∈ S. The economy maps
admissible forecasts φ_0 into temporary equilibrium (T.E.) price functions φ_1 ∈ Φ. This
mapping, g : Φ → Φ, which, given the assumptions, is g : N → N, is assumed total
recursive and has a fixed point.
Spear considers the problem of determining the circumstances under which agents
can identify the rational expectations equilibrium, REE. For the problem under
consideration, identification means the ability to construct an algorithm that can
decide in a finite (not asymptotic) number of steps which function among a class of
recursive functions has generated an observed sequence of ordered pairs of numbers
of the form (j, f[j]). The two basic results for complete information are (1) if the
T.E. price function is primitive recursive, agents can identify it; however, if φ_{g[i]} is
not primitive recursive, identification may not be possible, and (2) if the function g is
primitive recursive, it can be identified in the limit. Primitive recursive functions are
those that can be computed by SAL(N) programs that do not employ while statements.
(Sequences of assignment statements can be executed a specified number of times
with times statements.)
With incomplete information the basic result is: there is no effective procedure
for determining when a given model-consistent updating scheme yields an REE, unless
R_g is empty.
In the second paper, Norman (1987) constructs a theory of money based on the
complexity of barter exchange.
The monetary model employed is the Ostroy-Starr (1974) household exchange
problem: let W and Z be n × H matrices representing the initial endowments and
excess demands of the H households, with columns representing households and
rows representing goods. The entries of W are non-negative. A positive entry of Z
indicates an excess demand, a negative entry an excess supply. Given an n-vector price
p whose elements are all positive, the system (p, Z, W) satisfies, for i = 1, 2, ..., n and
j = 1, 2, ..., H, the following restrictions:

p'Z = 0 ,     Z_ij + W_ij ≥ 0 ,     Σ_{j=1}^{H} Z_ij = 0 .   (7)
These conditions state that the value of each household's excess demands equals
the value of its excess supplies, and the excess supply of any good cannot exceed
its respective endowment. In addition, aggregate excess demand equals aggregate
excess supply.
In this model the general equilibrium auctioneer has generated a set of equi-
librium prices, and the task remains to find a set of trades that clear the resulting
household excess demands. In a manner analogous to the creation of the auctioneer,
a broker is created to arrange a clearing sequence, a set of trades that will reduce all
household excess demands to zero. The difficulty of the broker's task depends on
the conditions imposed on each trade. For all exchange mechanisms considered, all
trades must satisfy the condition that the value of the goods received by a
household equals the value of the goods sent, without credit. If no other conditions
are imposed on the exchange mechanism, the broker can simply exchange all excess
demands simultaneously. The computational complexity of the resulting "command
exchange" mechanism is nH.
Because bilateral barter will not clear the household exchange model in general,
multiparty barter in the form of chains is considered. In a chain, household j_1
receives good i_1 and sends good i_2. Household j_2 receives good i_2 and sends good
i_3. Household j_m receives good i_m and sends good i_1. The value of the goods being
traded, y, is equal in all cases. The computational complexity of the multiparty barter
exchange mechanism is the minimum of (n²H, nH²). Introducing money reduces
the complexity of the exchange mechanism to nH.
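A toy Python sketch of the monetary count (the excess demand matrix Z below is an illustrative assumption): with money, each household simply sells its excess supplies and buys its excess demands, so the number of elementary trades is bounded by the number of nonzero entries of Z, i.e. by nH.

import numpy as np

Z = np.array([[ 1, -1,  0],     # rows: goods, columns: households
              [-1,  0,  1],
              [ 0,  1, -1]])
assert np.allclose(Z.sum(axis=1), 0)         # aggregate excess demand is zero
monetary_trades = np.count_nonzero(Z)        # one trade per nonzero entry <= n*H
print(monetary_trades)                       # 6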
6. CONCLUDING REMARKS
REFERENCES
1. Abreu, D. and Rubinstein, A. 1988, "The structure of Nash equilibrium in repeated games
with finite automata", Econometrica, vol 56, No.6.
2. Aho, A. V., J. E. Hopcroft and J. D. Ullman, 1974, The Design and Analysis of Computer
Algorithms (Addison-Wesley: Reading).
3. Aumann, R. J., 1981, "Survey of repeated games", in Essays in Game theory and
Mathematical Economics in Honor of Oskar Morgenstern (Bibliographisches Institut:
Mannheim).
4. Banks, J. S. and R. K. Sundaram, 1990, "Repeated games, finite automata, and complex-
ity", Games and Economic Behavior, vol 2, pp. 97-119.
5. Beja, A. 1989, "Finite and infinite complexity in axioms of rational choice or Sen's
characterization of preference-compatibility cannot be improved", Journal of Economic
Theory, vol 49, pp. 339-346.
6. Ben-Porath, E. 1986, "Repeated games with finite automata", IMSSS, Stanford Univer-
sity (manuscript).
7. Ben-Porath, E. and Peleg, B. 1987, "On the Folk theorem and finite automata", The
Hebrew University (discussion paper).
8. Ben-Porath, E., 1990, "The complexity of computing a best response automaton in
repeated games with mixed strategies", Games and Economic Behavior, vol 2, pp. 1-12.
9. Binmore, K. 1990, Essays on the Foundations of Game Theory, (Basil Blackwell, Ox-
ford).
10. Blum, M., 1967, "A machine independent theory of the complexity of recursive func-
tions", J. ACM, vol 14, pp. 322-336.
11. Canning, D. 1992, "Rationality, Computability, and Nash Equilibrium", Econometrica,
Vol 60, No 4, pp. 877-888.
12. Chow, Chee-Seng and John N. Tsitsiklis, 1989, "The Complexity of Dynamic Program-
ming", Journal of Complexity, 5,466-488.
13. Easley, David and N. M. Kiefer, 1988, "Controlling a stochastic process with unknown
parameters", Econometrica, Vol 56, No. 5, 1045-1064.
14. Jones, J. P., 1982, "Some Undecidable Determined Games", International Journal of
Game Theory, vol. 11, Issue 2, pp. 63-70.
15. Gilboa, Itzhak, 1988, ''The complexity of computing best-response automata in repeated
games", Journal of Economic Theory, vol 45, pp. 342-352.
16. Hartmanis, Juris, 1989, "Overview of Computational Complexity Theory", in Hartmanis,
J. (ed.), Computational Complexity Theory (American Mathematical Society: Providence).
17. Kalai, E., 1990, "Bounded Rationality and Strategic Complexity in Repeated Games", in
Ichiishi, T., A. Neyman, and Y. Tauman (eds), Game Theory and Applications, (Academic
Publishers, San Diego).
18. Kalai, E. and W. Stanford, 1988, "Finite rationality and interpersonal complexity in
repeated games", Econometrica, vol 56, 2, pp. 397-410.
19. Lipman, B. L. and S. Srivastava, 1990, "Informational requirements and strategic com-
plexity in repeated games", Games and Economic Behavior, vol 2, pp. 273-290.
20. Lipman, B. L. 1991, "How to decide how to decide how to ... : Modeling limited ratio-
nality", Econometrica, vol 59, No.4, pp. 1105-1125.
21. Matijasevič, J. V., 1971, "On recursive unsolvability of Hilbert's tenth problem", Pro-
ceedings of the Fourth International Congress on Logic, Methodology and Philosophy
of Science, Bucharest, Amsterdam 1973, pp. 89-110.
22. Neyman, A., 1985, "Bounded complexity justifies cooperation in the finitely repeated
prisoner's dilemma", Economics Letters, Vol 19, pp. 227-229.
23. Norman, A., 1981, "On the control of structural models", Journal of Econometrics, Vol
15, pp. 13-24.
24. Norman, Alfred L., 1987, "A Theory of Monetary Exchange", Review of Economic
Studies, 54, 499-517.
25. Norman, Alfred L., 1992, "On the complexity of consumer choice, Department of
Economics", The University of Texas at Austin, (manuscript) Presented at the 1992
Society of Economics and Control Summer Conference, Montreal.
26. Norman, Alfred L., 1994, "On the Complexity of Linear Quadratic Control", European
Journal of Operations Research, 73, 1-12.
27. Norman, Alfred L., 1994, "Risk, Uncertainty and Complexity", Journal of Economic
Dynamics and Control, 18,231-249.
28. Norman, Alfred L. and Woo S. Jung, 1977, "Linear Quadratic Control Theory For Models
With Long Lags", Econometrica, 45, no.4, 905-917.
29. Ostroy, J. and R. Starr, 1974, "Money and the Decentralization of Exchange", Econo-
metrica, vol 42, pp. 1093-1113.
30. Papadimitriou, C. H., 1992, "On players with a bounded number of states", Games and
Economic Behavior, Vol 4, pp. 122-131.
31. Papadimitriou, C. H. and K. Steiglitz, 1982, Combinatorial Optimization: Algorithms
and Complexity, (Prentice-Hall: Englewood Cliffs).
32. Prasad K. and J. S. Kelly, 1990, "NP-Completeness of some problems concerning voting
games", International Journal of Game Theory, Vol 19, pp. 1-9.
33. Rabin, M. O., 1957, "Effective computability of winning strategies", M. Dresher et al.
(eds), Contributions to the Theory of Games, Annals of Mathematical Studies, Vol 39,
pp. 147-157.
34. Rubinstein, A. 1986, "Finite automata play the repeated prisoner's dilemma", Journal of
Economic Theory, vol 39, pp. 83-96.
35. Rustem, B. and K. Velupillai, 1987, "Objective Functions and the complexity of policy
design", Journal of Economic Dynamics and Control, vol 11, pp. 185-192.
36. Rustem, Band K. Velupillai, 1990, "Rationality, computability, and complexity", Journal
of Economic Dynamics and Control, vol 14, pp. 419-432.
37. Simon, H. A., 1976, "From substantive to procedural rationality", S. Latsis (ed), Method
and Appraisal in Economics, (Cambridge University Press, Cambridge).
38. Sommerhalder, R. and S. van Westrhenen, 1988, The Theory of Computability: Programs, Machines, Effectiveness and Feasibility, (Addison Wesley: Wokingham).
39. Spear, S. E., 1989a, "When are small frictions negligible?", in Barnett, W., J. Geweke,
and K. Shell (eds), Economic complexity: Chaos, sunspots, bubbles, and nonlinearity,
(Cambridge University Press, Cambridge).
40. Spear, S. E., 1989, "Learning Rational Expectations under computability constraints",
Econometrica, Vol 57, No.4, pp. 889-910.
41. Traub, J.F., G. W. Wasilkowski and H. Wozniakowski, 1988, Information Based Com-
plexity, (Academic Press, Inc., Boston).
42. Zemel, E., 1985, "Small talk and cooperation: A note on bounded rationality", Journal
of Economic Theory, vol 49, No.1, pp. 1-9.
BERÇ RUSTEM
ABSTRACT. In the presence of rival models of the same system, an optimal policy can be
computed to take account of all the models. A min-max, worst-case design problem is an extreme case of the ordinary pooling of the models for policy optimization. It is shown that,
due to its noninferiority, the min-max strategy corresponds to the robust policy. If such a robust
policy happens to have too high a political cost to be implemented, an alternative pooling can
be formulated using the robust pooling as a guide.
An algorithm is described for solving the constrained min-max problem. This consists of
a sequential quadratic programming subproblem, a stepsize strategy based on a differentiable
penalty function and an adaptive rule for updating the penalty parameter.
The global convergence and local convergence rate of the algorithm are established in
Rustem (1992). In this paper, we discuss the numerical convergence properties of the algorithm
and related issues such as the convergence of the stepsize to unity and the properties of the
penalty parameter.
where Y and U are, respectively, the endogenous or output variables and policy
instruments or controls of the system. J is the policy objective function and F is the
model of the economy. In general, F is nonlinear with respect to Y and U. Problem
(1) is essentially a static transcription of a dynamic optimization problem in discrete
time, where
$$U = \begin{bmatrix} u_1 \\ \vdots \\ u_T \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix},$$
with $u_t \in \mathbb{R}^n$ and $y_t \in \mathbb{R}^m$ denoting the control and endogenous variable vectors at time period $t$. The optimization covers the periods $t = 1, \ldots, T$. Thus, $Y \in \mathbb{R}^{m \times T}$, $U \in \mathbb{R}^{n \times T}$, $F : \mathbb{F} \subset \mathbb{R}^{n_x} \rightarrow \mathbb{R}^{T \times m}$ and $J : \mathbb{J} \subset \mathbb{R}^{n_x} \rightarrow \mathbb{R}^1$ ($n_x = T \times (m + n)$).
The vector valued function F is essentially an econometric model comprising a
system of nonlinear difference equations represented in static form for time periods
t = 1, ... ,T.
The formulation of the policy optimization problem (1) is, in practice, an over-
simplification. Originating from rival economic theories, there exist rival models
purporting to represent the same system. The problem of forecasting under similar
circumstances has been approached through forecast pooling by Granger and New-
bold (1977) and, more recently, by Makridakis and Winkler (1983) and Lawrence et
al., (1986). In the presence of rival models, the policy maker may also wish to take
account of all existing rival models in the design of optimal policy. One strategy in
such a situation is to adopt the worst case design problem
$$\min_{y^1, \ldots, y^{m_{mod}},\, U}\; \max_{i}\; \left\{ J^i(y^i, U) \;\middle|\; F^i(y^i, U) = 0;\; i = 1, \ldots, m_{mod} \right\}, \qquad (2)$$
where there are i = 1, ... , mmod rival models, with yi, F i , respectively, denoting
the dependent (or endogenous) variable vector and the equations of the ith model.
This strategy is an extension of a suboptimal approach originally discussed in Chow
(1979). Problem (2) seeks the optimal strategy corresponding to the most adverse
circumstance due to choice of model. All rival models are assumed known. The
solution of (2) clearly does not provide insurance against the eventuality that an
unknown (mmod + 1)st model happens to represent the economy; it is just a robust
strategy against known competing "scenarios". A similar, less extreme formulation
is also discussed below, utilizing the dual approach to (2).
The optimization procedure considered below does not distinguish between $Y$ and $U$. We can thus define a general vector $x = \begin{bmatrix} Y \\ U \end{bmatrix}$ to rewrite the min-max problem as
$$\min_{x}\; \max_{i \in \{1, \ldots, m_{mod}\}}\; \left\{ J^i(x) \;\middle|\; F(x) = 0 \right\}, \qquad (3)$$
where $F$ subsumes all the models. The formulation above is slightly more general than the original min-max problem above. Other equivalent formulations are
discussed in Rustem (1987, 1989, 1992).
Algorithms for solving (3) have been considered by a number of authors, including
Charalambous and Conn (1978), Coleman (1978), Conn (1979), Demyanov and
Malomezov (1974), Demyanov and Pevnyi (1972), Dutta and Vidyasagar (1977),
Han (1978; 1981), Murray and Overton (1980). In the constrained case, discussed in
some of these studies, global and local convergence rates have not been established
(e.g. Coleman, 1978; Dutta and Vidyasagar, 1977). In this and the next section, a
dual approach to (3), adopted originally by Medanic and Andjelic (1971, 1972) and
Cohen (1981), is initially utilized. Subsequently, both dual and primal approaches
are used to formulate a superlinearly convergent algorithm.
To introduce the basic terminology, let $x \in \mathbb{R}^{n_x}$ and let $F : \mathbb{R}^{n_x} \rightarrow \mathbb{R}^{m \times T}$ and $J$ be twice continuously differentiable functions with
$$J = [J^1, J^2, \ldots, J^{m_{mod}}]^T.$$
Let $\mathbf{1}$ be the $m_{mod}$-dimensional vector whose elements are all unity. We define the inner product of two vectors, $y$ and $w$, of the same dimension as $\langle y, w \rangle = \sum_i y_i w_i$. The min-max problem (3) can then be restated as
$$\min_{x, v}\; \left\{ v \;\middle|\; F(x) = 0,\; J(x) \le \mathbf{1} v \right\}, \qquad (4)$$
where $v \in \mathbb{R}^1$, and as the dual problem
$$\min_{x}\; \max_{\alpha}\; \left\{ \langle \alpha, J(x) \rangle \;\middle|\; F(x) = 0,\; \alpha \ge 0,\; \langle \mathbf{1}, \alpha \rangle = 1 \right\}. \qquad (5)$$
The following two results are used to introduce the dual approach to this problem.
Lemma (1). The min-max problem (3) is equivalent to problem (5).
Proof. This result, initially proved by Medanic and Andjelic (1971; 1972) and also Cohen (1981), follows from the fact that the maximum of $m_{mod}$ numbers is equal to the maximum of their convex combination. □
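In other words, for any real numbers $J^1, \ldots, J^{m_{mod}}$,
$$\max_{1 \le i \le m_{mod}} J^i \;=\; \max_{\{\alpha \ge 0,\; \langle \mathbf{1}, \alpha \rangle = 1\}} \; \sum_{i=1}^{m_{mod}} \alpha_i J^i,$$
since a convex combination of the $J^i$ can never exceed the largest of them, and that value is attained by placing all the weight on a maximizing index.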
In Medanic and Andjelic (1971; 1972), the model is assumed to be linear, and the solution of (5) without the constraints $F(x) = 0$ is obtained using an iterative algorithm that projects $\alpha$ onto $\mathbb{R}^{m_{mod}}_{+}$. In Cohen (1981), the iterative nature of the projection is avoided by dispensing with the equality constraint in $\mathbb{R}^{m_{mod}}_{+}$ but including
a normalization in a transformed objective function. Although the resulting objective
function is not necessarily concave in the maximization variables, the algorithm
proposed ensures convergence to the saddle point. The algorithm proposed in Cohen
(1981) for nonlinear systems utilizes a simple projection procedure but is essentially
first order.
Let $\alpha^*$ be the value of $\alpha$ that solves (5). It can be shown, by examining the first order conditions of (5) and (4), that $\alpha^*$ is also the shadow price associated with the inequality constraints in (4). An important feature of (5) that makes it preferable to (3) is that $\alpha_i$ can also be interpreted as the importance attached by the policy maker to the model $F^i(x) = 0$. There may be cases in which the min-max solution $\alpha^*$ may be too extreme to implement. The policy maker may then wish to assign a value to $\alpha$, in a neighbourhood of $\alpha^*$, and determine a more acceptable policy by minimizing $\langle \alpha, J(x) \rangle$, with respect to $x$, for the given $\alpha$. Another interpretation of (5) is in terms of the robust character of min-max policies. This is discussed in the following
Lemma:
Lemma (2). Let there exist a min-max solution to (5), denoted by $(x^*, \alpha^*)$, and let $J$ and $F$ be once differentiable at $(x^*, \alpha^*)$. Further, let strict complementarity hold for $\alpha \ge 0$ at this solution. Then, for $i, j, \ell \in \{1, 2, \ldots, m_{mod}\}$
(6b)
(6c)
(7)
where $\lambda^*$, $\mu^*$, $\eta^*$ are the multipliers of $F(x) = 0$, $\alpha \ge 0$ and $\langle \mathbf{1}, \alpha \rangle = 1$, respectively.
Necessity in case (i) can be shown by considering (7), which, for $\alpha^*_i, \alpha^*_j \in (0, 1)$ yields $\alpha^*_i \mu^*_i = \alpha^*_j \mu^*_j = 0$, and then $\mu^*_i = \mu^*_j = 0$. Using (6) we have $J^i(x^*) = J^j(x^*)$. Sufficiency is established using $J^i(x^*) = J^j(x^*)$ and noting that
(8a)
Since $J^\ell(x^*) - J^m(x^*) < 0$, we have $\alpha^*_\ell = 0$. Given that $\alpha^*_\ell = 0$, $\forall \ell$, $J^\ell(x^*) < J^m(x^*)$, we can use (2) for those $i, j$ for which $J^i(x^*) = J^j(x^*)$ to establish $\mu^*_i = \mu^*_j = 0$. By strict complementarity this implies that $\alpha^*_i, \alpha^*_j \in (0, 1)$.
Case (iii) can be established noting that for $\alpha^*_i = 1$, we have $\mu^*_i = 0$, $\alpha^*_j = 0$, $\forall j \ne i$ and, by strict complementarity, $\mu^*_j > 0$. From (6) we thus obtain
where $\lambda \in \mathbb{R}^{m \times T}$, $\mu \in \mathbb{R}^{m_{mod}}_{+} = \{\mu \in \mathbb{R}^{m_{mod}} \mid \mu \ge 0\}$ and $\eta \in \mathbb{R}^1$ are the multipliers associated with $F(x) = 0$, $\alpha \ge 0$ and $\langle \mathbf{1}, \alpha \rangle = 1$, respectively. The characterization
of the min-max solution of (2) as a saddle point requires the relaxation of convexity
assumptions (see Demyanov and Malomezov, 1974; Cohen, 1981). In order to
achieve this characterization, we modify (9) by augmenting it with a penalty function.
Hence, we define the augmented Lagrangian by
$$L_a(x, \alpha, \lambda, \mu, \eta, c) = L(x, \alpha, \lambda, \mu, \eta) + \frac{c}{2}\, \langle F(x), F(x) \rangle, \qquad (10)$$
(11a)
where
(11b)
The quadratic objective function used to compute the direction of progress is given
by
or, alternatively, by
The second derivatives due to the penalty term in the augmented Lagrangian (i.e. $c \sum_{j} \nabla^2 F^j(x_k)\, F^j(x_k)$) are not included in (12). The reason for this is discussed in Rustem (1992). Furthermore, since $F^j(x^*) = 0$ at the solution $x^*$, ignoring this term does not affect the asymptotic properties of the algorithm. The values $\alpha_k$ and $\lambda_k$ are given by the solution to the quadratic subproblem in the previous iteration. The
direction of progress at each iteration of the algorithm is determined by the quadratic
subproblem
(13a)
Since the min-max subproblem is more complex, we also consider the quadratic
programming subproblem
The two subproblems are equivalent, but (13,b) involves fewer variables. It is shown below that the multipliers associated with the inequalities are the values $\alpha$ and that the solution of either subproblem satisfies common convergence properties.
Let the value of $(d, \alpha, v)$ solving (13) be denoted by $(d_k, \alpha_{k+1}, v_{k+1})$. The stepsize along $d_k$ is defined using the equivalent min-max formulation (3). Thus, consider the function
1 i.e. $\langle v, \hat{H}_k v \rangle > 0$ for all $v \ne 0$.
$$\psi(x) = \max_{i \in \{1, 2, \ldots, m_{mod}\}} \{ J^i(x) \}$$
and
$$\psi_k(x) = \max_{i \in \{1, 2, \ldots, m_{mod}\}} \{ J^i(x_k) + \langle \nabla J^i(x_k), x - x_k \rangle \}.$$
The stepsize strategy determines $\tau_k$ as the largest value of $\tau = \gamma^j$, $\gamma \in (0, 1)$, $j = 0, 1, 2, \ldots$ such that $x_{k+1}$ given by
The stepsize $\tau_k$ determined by (14) basically ensures that $x_{k+1}$ simultaneously reduces the main objective and maintains or improves the feasibility with respect to the constraints. The penalty term used to measure this feasibility is quadratic and consistent with the augmented Lagrangian (10). It is shown in Rustem (1992; Theorem 4.1) that (14) can always be fulfilled by the algorithm.
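To fix ideas, the mechanics of this backtracking rule can be sketched in code. The sketch is schematic only: the merit function and the acceptance inequality below are generic stand-ins (a sufficient-decrease test on an assumed penalty function of the form $\psi(x) + \frac{c}{2}\langle F(x), F(x) \rangle$), not the precise inequality (14).

#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Schematic backtracking stepsize: try tau = gamma^j for j = 0, 1, 2, ...
// "merit" stands in for a penalty function such as psi(x) + (c/2)<F(x),F(x)>;
// the acceptance test is a generic sufficient-decrease condition, used only
// to illustrate the mechanics of a test like (14), not its exact form.
double backtrack(const Vec& x, const Vec& d, double gamma, double eps,
                 double predictedDecrease,
                 const std::function<double(const Vec&)>& merit) {
    double tau = 1.0;                               // try the full step first
    const double m0 = merit(x);
    for (int j = 0; j < 50; ++j) {                  // tau = gamma^j
        Vec trial(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) trial[i] = x[i] + tau * d[i];
        if (merit(trial) <= m0 - eps * tau * predictedDecrease)
            return tau;                             // sufficient decrease achieved
        tau *= gamma;
    }
    return tau;                                     // smallest stepsize tried
}

int main() {
    // toy merit function and a descent direction for it
    auto merit = [](const Vec& v) { return v[0] * v[0] + v[1] * v[1]; };
    Vec x = {1.0, 1.0}, d = {-1.0, -1.0};
    double tau = backtrack(x, d, 0.5, 0.1, 4.0, merit);
    return tau > 0.0 ? 0 : 1;
}

The full step $\tau = 1$ is always tried first, which matters for the attainment of unit stepsizes and the superlinear convergence discussed below.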
The determination of the penalty parameter $c$ is an important aspect of the algorithm. This is discussed in the following description:
The Algorithm
Step 0: Given $x_0$, $c_0 \in [0, \infty)$, small positive numbers $\delta$, $\rho$, $\epsilon$, $\gamma$ such that $\delta \in (0, \infty)$, $\rho \in (0, 1)$, $\epsilon \in (0, \frac{1}{2}]$, $\gamma \in (0, 1)$, and $\hat{H}_0$, set $k = 0$.
Step 1: Compute $\nabla J_k$ and $\nabla F_k$. Solve the quadratic subproblem (13) (choosing (13,a) or (13,b) defines a particular algorithm) to obtain $d_k$, $\alpha_{k+1}$, and the associated multiplier vector $\lambda_{k+1}$. In (13,a), we also compute $\mu_{k+1}, \eta_{k+1}$, and in (13,b) we also compute $v_{k+1}$.
Step 3: If
Step 4: Find the smallest nonnegative integer $j_k$ such that $\tau_k = \gamma^{j_k}$ and $x_{k+1} = x_k + \tau_k d_k$ satisfy the inequality (14).
In Step 3, the penalty parameter $c_{k+1}$ is adjusted to ensure that progress towards feasibility is maintained. In particular, $c_{k+1}$ is chosen to make sure that the direction $d_k$ computed by the quadratic subproblem is a descent direction for the penalty function $\psi(x_k) + \frac{c_{k+1}}{2} \langle F_k, F_k \rangle$.
In Rustem (1992), it is shown that $d_k$ is a descent direction, that $c_k$ determined by (15) is not increased indefinitely, that the algorithm converges to a local solution of the min-max problem, that the stepsize $\tau_k$ converges to unity, and that the local convergence rate near the solution is Q- or two-step Q-superlinear, depending on the accuracy of the approximate Hessian $\hat{H}_k$.
4. NUMERICAL EXPERIMENTS
In this section, we illustrate the behaviour of the method with a few test examples.
The objective is to highlight the characteristics of the algorithm along with certain
properties of min-max problems. Specifically, we show the attainment of unit stepsizes ($\tau_k = 1$), the way in which the penalty parameter $c_k$ achieves the constant value $c^*$, and the numbers of iterations and function evaluations needed to reach the
solution in each case. The attainment of a constant penalty parameter is important
for numerical stability. The achievement of unit steps is important in ensuring rapid
superlinear convergence (Rustem, 1992).
We also show the progress of the algorithm towards the min-max solution, which
exhibits certain robustness characteristics predicted by theory. As discussed in
Lemma 2, if the min-max over three functions $J^1$, $J^2$ and $J^3$ is being computed, then, at the solution, $J^1 = J^2 > J^3$ iff $\alpha_1, \alpha_2 \in (0, 1]$ and $\alpha_3 = 0$, or $J^1 > J^2 \ge J^3$ iff $\alpha_1 = 1$ and $\alpha_2 = \alpha_3 = 0$.² Lemma 2 states this in greater generality, and the
examples illustrate it. Since a is chosen to maximize the Lagrangian (9), the solution
2 Suppose that the state of the world is described by, say, three rival theories, one of which is to turn out to be the actual state. With $J^1 = J^2 > J^3$ at the min-max solution, the decision maker need not care, as far as the objective function values are concerned, if the actual state turns out to be $J^1$ or $J^2$. If it is $J^3$, then the decision maker is better off. The Lagrange multiplier vector $\alpha$ indicates this in the min-max formulation (4) and the associated subproblem (13,b). The robustness aspect is underlined by Lemma 2.
can be seen as a robust optimum in the sense of a worst-case design problem. The
figures describing the convergence of the algorithms also illustrate the process of
convergence of the objective functions to the min-max optima.
We consider six test examples. Three of these are unconstrained min-max prob-
lems in which we study the achievement of unit steplengths, and three are constrained
problems in which we study both the achievement of unit stepsizes and a constant
penalty parameter value c*. The approximate Hessian computation uses the BFGS
updating formula and, for constrained problems, its modification discussed in Powell
(1978). The Hessian approximation is done on the second derivative terms arising
from the Lagrangian (i.e. the first two terms on the right of (12)) whereas the exact
                                     Hessian evaluation scheme
                                     Direct            BFGS
α*                                   0.4304811740      0.4304811740
                                     0.5695188260      0.5695188360
                                     0.0               0.0
k_τ | τ_k = 1; ∀k ≥ k_τ ³            1                 3
No. of J evaluations                 5                 7
No. of iterations                    5                 7
The same example was also computed with the initial value of x changed to [0,0].
All the results are identical except that the algorithms took 6 iterations and function
evaluations for the direct Hessian case and 8 iterations and function evaluations for
the Hessian approximated by the BFGS formula.
3 $k_\tau$ denotes the iteration at which $\tau_k = 1$ was reached and unit stepsizes were maintained $\forall k \ge k_\tau$.
                                     Direct            BFGS
α*                                   0.33333333        0.33333333
                                     0.5               0.5
                                     0.16666667        0.16666667
k_τ | τ_k = 1; ∀k ≥ k_τ
No. of J evaluations                 5                 5
No. of iterations                    5                 5
The same example was also computed from the initial value [0,0]. The same result is reached in 7 iterations for both Hessian evaluation schemes, and unit stepsizes are again achieved at iteration 1.
$$\left\{\, J^1(x) = \exp\!\left(\frac{x_1^2}{100} + (x_2 - 1)^2\right);\quad J^2(x) = \exp\!\left(\frac{x_1^2}{100} + (x_2 + 1)^2\right) \right\}$$
Initial values: $x_0^T = [50.0, 0.05]$; $\alpha_0^T = [0.5, 0.5]$.

                                     Direct            BFGS
α*                                   0.5               0.5
                                     0.5               0.5
k_τ | τ_k = 1; ∀k ≥ k_τ
No. of J evaluations                 8                 11
No. of iterations                    8                 11
If the algorithm is started from the initial point [1.0, 1.1], the same result is obtained
in 7 (direct) and 11 (BFGS) iterations.
k_τ | τ_k = 1; ∀k ≥ k_τ              4       3       3       3
No. of iterations                    10      9       9       9
4 Iteration number at which $c_k$ reached a value $c_*$ such that $c_k = c_*$ $\forall k$ after this iteration.
Initial value $x_0^T = [1.9, 2.1]$, from which the algorithm converges to a local solution.
                                     c_0 = 1.0                 c_0 = 10.0
                                     Direct      BFGS          Direct      BFGS
k_τ | τ_k = 1; ∀k ≥ k_τ              4           3             3           3
k_* | c_k = c_*; ∀k ≥ k_* ⁴          7           2             5           5
No. of iterations                    10          9             9           9
k_τ | τ_k = 1; ∀k ≥ k_τ
k_* | c_k = c_*; ∀k ≥ k_* ⁴
No. of iterations                    8       8       8       8
Initial value $x_0^T = [2, 4]$, from which the algorithm converges to a local solution.
                                     c_0 = 1.0                 c_0 = 10.0
                                     Hessian evaluation scheme Hessian evaluation scheme
                                     Direct      BFGS          Direct      BFGS
k_τ | τ_k = 1; ∀k ≥ k_τ              3           2
k_* | c_k = c_*; ∀k ≥ k_* ⁴          4           4             4           3
No. of iterations                    10          9             8           8
k_τ | τ_k = 1; ∀k ≥ k_τ              15
k_* | c_k = c_*; ∀k ≥ k_* ⁴
No. of fn. eval.                     20      9       20      26
No. of iterations                    20      9       20      19
                                     c_0 = 1.0                           c_0 = 10.0
                                     Hessian evaluation scheme           Hessian evaluation scheme
                                     Direct          BFGS                Direct          BFGS
x_*                                  1.000000069     1.0000003456        1.0000003456    1.0000003456
                                     1.000000069     1.0000003456        1.0000003456    1.0000003456
k_τ | τ_k = 1; ∀k ≥ k_τ              5               5                   5               5
No. of iterations                    20              20                  20              20
Figures 1-3 below describe the behaviour of the objective functions as the algorithm
proceeds to the optimum. The figures for the constrained examples 4-6 correspond
to the results for the initial penalty parameter Co = 10.
Fig. 2. The convergence of the objective functions vs iteration number for examples 3-4 in Section 4. "Direct" signifies the direct (exact, as opposed
to approximate BFGS) evaluation of Hessians.
Fig. 3. The convergence of the objective functions vs iteration number for examples 4-6 in Section 4. "Direct" signifies the direct (exact, as opposed to approximate BFGS) evaluation of Hessians.
where $f^i(U)$ is the reduced objective function corresponding to the $i$th model after $y^i$ have been evaluated and eliminated from $J^i$.
The algorithm for solving (16) is essentially a simpler version of that for the
constrained case when the penalty parameter $c_k = 0$, $\forall k$. A sequence $\{U_k\}$ is generated so that, starting from a given initial policy vector, each member of the sequence is defined by
(17)
$$H_k = \sum_{i=1}^{m} \alpha_k^i\, \nabla^2 f^i(U_k),$$
and $\nabla f^i(U_k)$ is the column vector denoting the gradient of $f^i$. The shadow prices, or the Kuhn-Tucker multipliers, of (18) give the values $\alpha_{k+1}^i$, $i = 1, \ldots, m$.
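The pooled Hessian $H_k$ above is simply the $\alpha$-weighted combination of the individual models' Hessians; a minimal sketch (dense row-major storage is assumed purely for illustration) is:

#include <cstddef>
#include <vector>

using Mat = std::vector<double>;   // an n x n matrix stored as n*n entries, row-major

// H_k = sum_i alpha_k^i * (Hessian of f^i at U_k)
Mat pooledHessian(const std::vector<Mat>& hessians,    // one Hessian per model
                  const std::vector<double>& alpha) {  // current weights alpha_k
    Mat H(hessians.front().size(), 0.0);
    for (std::size_t i = 0; i < hessians.size(); ++i)
        for (std::size_t j = 0; j < H.size(); ++j)
            H[j] += alpha[i] * hessians[i][j];
    return H;
}

int main() {
    // two rival models, n = 2 policy instruments
    std::vector<Mat> hess = {{2, 0, 0, 2}, {4, 1, 1, 4}};
    std::vector<double> alpha = {0.3, 0.7};
    Mat Hk = pooledHessian(hess, alpha);   // 0.3*H1 + 0.7*H2
    return Hk.size() == 4 ? 0 : 1;
}

The weights $\alpha_k^i$ themselves come from the multipliers of the quadratic subproblem, so no separate updating formula is needed for them here.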
In order to introduce the stepsize strategy $\tau$, we consider the function
and
The stepsize $\tau_k$ is the largest value of $\tau = \gamma^j$, $j = 0, 1, 2, \ldots$ such that $U_{k+1}$ given by (17) satisfies the inequality
Fig. 4. The variation of $\min_U \{\alpha f^{HMT}(U) + (1 - \alpha) f^{NIESR}(U)\}$ as $\alpha$ varies, $0 \le \alpha \le 1$, and the corresponding values of the individual components $f^{HMT}(U)$, $f^{NIESR}(U)$.
out of seven cases. As opposed to the examples in Section 4, any failure to achieve $\tau_k = 1$ could mostly be attributed to the errors in the numerical evaluation of the model and $dy^i/dU$. Such errors affect the accuracy of the Hessian approximation of $f^i$, which requires $dy^i/dU$, and $\tau_k = 1$ can be shown to depend on the accuracy of this approximation (see Rustem, 1992).
ACKNOWLEDGEMENTS
The valuable comments and suggestions of David Belsley are gratefully acknowl-
edged.
REFERENCES
Becker, R.G., B. Dwolatzky, E. Karakitsos and B. Rustem (1986). ''The Simultaneous Use of
Rival Models in Policy Optimization", The Economic Journal 96, 425-448.
Biggs, M.C.B. (1974). ''The Development of a Class of Constrained Minimization Algorithms
and their Application to the Problem of Power Scheduling", Ph.D. Thesis, University of
London.
Charalambous, C. and J.W. Bandler (1976). "Nonlinear Minimax Optimization as a Sequence
of Least pth Optimization with Finite Values of p", International Journal of System Science 7, 377-391.
Charalambous, C. and A.R. Conn (1978). "An Efficient Algorithm to Solve the Min-Max
Problem Directly", SIAM J. Nwn. Anal. 15, 162-187.
Chow, G.C. (1979). "Effective Use of Econometric Models in Macroeconomic Policy Formu-
lation" in "Optimal Control for Econometric Models" (S. Holly, B. Rustem, M. Zarrop,
eds.), Macmillan, London.
Cohen, G. (1981). "An Algorithm for Convex Constrained Minmax Optimization Based on
Duality", Appl. Math. Optim. 7, 347-372.
Coleman, T.F. (1978). "A Note on 'New Algorithms for Constrained Minimax Optimization'
", Math. Prog. 15,239-242.
Conn, A.R. (1979). "An Efficient Second Order Method to Solve the Constrained Minmax
Problem", Department ofCombinatorics and Optimization, University of Waterloo, Report
January.
Demyanov, V.F. and V.N. Malomezov (1974). "Introduction to Minmax", J. Wiley, New York.
Demyanov, V.F. and A.B. Pevnyi (1972). "Some Estimates in Minmax Problems", Kibernetika 1, 107-112.
Dutta, S.R.K. and M. Vidyasagar (1977). "New Algorithms for Constrained Minmax Opti-
mization", Math. Prog. 13, 140-155.
Granger, C. and P. Newbold (1977). "Forecasting Economic Time Series", Academic Press,
New York.
Han, S-P. (1978). "Superlinear Convergence of a Minimax Method", Dept. of Computer
Science, Cornell University, Technical Report 78-336.
Han, S-P. (1981). "Variable Metric Methods for Minimizing a Class of Nondifferentiable
Functions", Mathematical Programming 20,1-13.
Karakitsos, E. and B. Rustem (1991). "Min-Max Policy Design with Rival Models", PROPE
Discussion Paper 116, Presented at the SEDC Meeting, Minnesota.
Lawrence, M.J., R.H. Edmunson and M.J. O'Connor (1986). "The Accuracy of Combining
Judgemental and Statistical Forecasts", Management Science 32, 1521-1532.
Makridakis, S. and R. Winkler (1983). "Averages of Forecasts: Some Empirical Results",
Management Science 29, 987-996.
Medanic, J. and M. Andjelic (1971). "On a Class of Differential Games without Saddle-point
Solutions", JOTA 8, 413-430.
Medanic, J. and M. Andjelic (1972). "Minmax Solution of the Multiple Target Problem", IEEE Trans. AC-17, 597-604.
Murray, W. and M.L. Overton (1980). "A Projected Lagrangian Algorithm for Nonlinear
Minmax Optimization", SIAM J. Sci. Stat. Comput. 1,345-370.
Polak, E. and D.Q. Mayne (1981). "A Robust Secant Method for Optimization Problems with
Inequality Constraints", JOTA 33, 463-477.
134 B. Rustem
Polak, E., D.Q. Mayne and J.E. Higgins (1988). "A Superlinearly Convergent Min-Max
Algorithm for Min-Max Problems", Memorandum No: UCB/ERL M86/J03, Berkely
California.
Polak, E. and A.L. Tits (1980). "A Globally Convergent, Implementable Multiplier Method
with Automatic Penalty Limitation", Appl. Math. and Optimization 6,335-360.
Powell, M.J.D. (1978). "A Fast Algorithm for Nonlinearly Constrained Optimization Prob-
lems", in G.A. Watson (Ed.), Numerical Analysis. Lecture Notes in Mathematics, 630,
Springer-Verlag, Berlin-Heidelberg.
Rustem, B. (1986). "Convergent Stepsizes for Constrained Optimization Algorithms", JOTA
49, 135-160.
Rustem, B. (1987). "Methods for the Simultaneous Use of Multiple Models in Optimal Policy
Design", in "Developments in Control Theory for Economic Analysis", (C. Carraro and
D. Sartore, eds.), Martinus Nijhoff Kluwer Publishers, Dordrecht.
Rustem, B. (1989). "A Superlinearly Convergent Constrained Min-Max Algorithm for Rival
Models of the Same System", Compo Math. Applic. 17, 1305-1316.
Rustem, B. (1992). "A Constrained Min-Max Algorithm for Rival Models of the Same Eco-
nomic System", Mathematical Programming 53, 279-295.
Rustem, B. (1993). "Equality and Inequality Constrained Optimization Algorithms with Con-
vergent Stepsizes", forthcoming lOTA.
PART THREE
ABSTRACT. Wavelets are a new method of spectral analysis that have attracted considerable
attention in numerous fields. Unlike Fourier methods, wavelets are designed to analyze
data that is nonstationary and subject to abrupt changes. Since macroeconomic data frequently
contains these characteristics, wavelets appear to be a natural tool for studying macroeconomic
time series. This paper first describes wavelets in an intuitive manner, and then explores their
use on macroeconomic data. Initial results are encouraging and more research is in order.
1. INTRODUCTION
a. References on Wavelets
In this and the next section wavelets are described in a very intuitive fashion. More
rigorous descriptions can be found elsewhere. Press (1992) has a particularly good
description of the wavelet decomposition algorithm and contains the Fortran code
to perform it. Strang (1989) provides a more mathematical treatment. Rioul and
Vetterli (1991) describe wavelets from a signal processing perspective. The ultimate
authority is Daubechies (1988).
$$\phi(x) = \sum_{k=0}^{m} c_k\, \phi(2x - k). \qquad (1)$$
The $m + 1$ values of $c_k$ determine the mother function $\phi$. This equation is recursive: the sum of modified forms of $\phi$ equals $\phi$ itself. Viewed in graphical terms, it can be seen that the 2 in the right hand argument of (1) shrinks $\phi$ horizontally, the $k$ shifts $\phi$ horizontally, and the $c_k$ shrink or expand $\phi$ vertically.
As a simple initial example, consider the case with $m = 1$ and $c_0 = c_1 = 1$ so that (1) becomes
$$\phi(x) = \phi(2x) + \phi(2x - 1). \qquad (2)$$
The solution to (1) is the box function, which has value 1 over $[0, 1]$ and is zero elsewhere. The coefficient 2 in the argument of the first term on the right hand side of (2) shrinks $\phi$ horizontally by a factor of 2. Without the shift given in the second term, the box extends from 0 to $\frac{1}{2}$. The second term on the right hand side of (2), however, shifts the narrowed $\phi$ right by $\frac{1}{2}$, thus completing the solution. This is illustrated in Figures 1 and 2, which show the left and right hand sides of (2), respectively.
A very interesting and useful family of orthogonal mother functions was discov-
ered by Daubechies (1988). Members of this family are both orthogonal and have
compact support, and are categorized by their ability to form polynomials of different
orders. Each member of this family, when taken as a linear combination with itself,
generates a given order polynomial. The first member is the box function; when
taken in linear combination with itself, it forms a constant value (i.e. a polynomial
of degree zero). The second member, which has four Ck coefficients, is called D4 by
Strang (by this notation, D2 is the box function). D4, taken in linear combination with itself, forms a line of arbitrary constant slope (i.e., a polynomial of degree one). Its coefficients, in equation (3), are more complicated than those of the box function.
Fig. 1. Box Function.
Fig. 2. Box Function.
$$c_0 = \frac{1 + \sqrt{3}}{4}, \quad c_1 = \frac{3 + \sqrt{3}}{4}, \quad c_2 = \frac{3 - \sqrt{3}}{4}, \quad c_3 = \frac{1 - \sqrt{3}}{4}. \qquad (3)$$
Higher order members of this family generate higher order polynomials, and, as can
be seen in going from D2 to D 4 , with each increase in the order of the replicated
polynomial, the number of Ck 's increases by two.
Figure 3 shows the D4 mother function. Its asymmetry is striking, and while it
is everywhere continuous, it is not everywhere differentiable. Specifically, it is not
left differentiable at so-called dyadic points: $k/2^j$ for $k$ and $j$ integers (Pollen,
1992). This bizarre shape arises from the stringent requirements of local support and
orthogonality. A linear combination of D4 mother functions forms a polynomial of
degree one because the cusps cancel one another.
An important characteristic of mother functions and wavelets is that many are constructed recursively; they cannot be written out in closed form like most more familiar functions. For instance, Figure 3 was generated by using equation (1) recursively with $c_k$ as given in (3) and initial values for $\phi(1)$ and $\phi(2)$. Equation (1) first yields $\phi(\frac{1}{2})$, $\phi(1\frac{1}{2})$ and $\phi(2\frac{1}{2})$. Next, values at $\frac{1}{4}$ intervals can be constructed, and so on. The function is non-zero only on $[0, 3)$.
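The recursion just described is easy to carry out numerically. The following sketch fills in $\phi$ at successively finer dyadic points on $[0, 3]$ using (1) and the coefficients in (3); the starting values $\phi(1) = (1 + \sqrt{3})/2$ and $\phi(2) = (1 - \sqrt{3})/2$ are the standard D4 choices and are an assumption here, since the text does not state them.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double s3 = std::sqrt(3.0);
    // D4 coefficients of (3); phi(1) and phi(2) below are assumed starting values
    const double c[4] = {(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4};

    // phi sampled on [0,3] with spacing 1/pointsPerUnit
    std::vector<double> phi = {0.0, (1 + s3) / 2, (1 - s3) / 2, 0.0};
    int pointsPerUnit = 1;

    for (int level = 1; level <= 3; ++level) {          // refine down to spacing 1/8
        int newPerUnit = 2 * pointsPerUnit;
        std::vector<double> fine(3 * newPerUnit + 1, 0.0);
        for (int i = 0; i <= 3 * newPerUnit; ++i) {
            if (i % 2 == 0) { fine[i] = phi[i / 2]; continue; }   // already known
            // new dyadic point x = i/newPerUnit: phi(x) = sum_k c_k phi(2x - k),
            // and every argument 2x - k lies on the coarser grid
            double x = double(i) / newPerUnit;
            double v = 0.0;
            for (int k = 0; k < 4; ++k) {
                double y = 2.0 * x - k;
                if (y <= 0.0 || y >= 3.0) continue;       // phi vanishes outside (0,3)
                v += c[k] * phi[int(y * pointsPerUnit + 0.5)];
            }
            fine[i] = v;
        }
        phi.swap(fine);
        pointsPerUnit = newPerUnit;
    }

    for (std::size_t i = 0; i < phi.size(); ++i)
        std::printf("phi(%5.3f) = % .6f\n", double(i) / pointsPerUnit, phi[i]);
    return 0;
}

Each pass doubles the number of grid points, exactly as described above: the values at the half-integers come first, then the quarter-integers, and so on.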
Fig. 3. D4 Mother Function.
Fig. 4. Haar Wavelet.
c. Wavelets
A wavelet is described by
$$W(x) = \sum_{k=0}^{m} (-1)^k\, c_{1-k}\, \phi(2x - k). \qquad (4)$$
$W(x)$ clearly bears a very close connection to (1); a wavelet is basically a rearranged mother function. Figure 4 shows the Haar wavelet, whose mother function is the box function, and $W_4$ is likewise similar to its mother function, $D_4$.
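For instance, substituting the box-function coefficients $c_0 = c_1 = 1$ of (2) into (4) gives
$$W(x) = c_1\, \phi(2x) - c_0\, \phi(2x - 1) = \phi(2x) - \phi(2x - 1),$$
which is $+1$ on $[0, \tfrac{1}{2})$, $-1$ on $[\tfrac{1}{2}, 1)$, and zero elsewhere: the Haar wavelet of Figure 4.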
$$\begin{bmatrix} x(1)\\ x(2)\\ x(3)\\ x(4) \end{bmatrix}
= \frac{a_1}{\sqrt{2}}\begin{bmatrix} \tfrac{1}{2}\\ \tfrac{1}{2}\\ \tfrac{1}{2}\\ \tfrac{1}{2} \end{bmatrix}
+ \frac{a_2}{\sqrt{2}}\begin{bmatrix} \cos(\pi/8)\\ \cos(3\pi/8)\\ \cos(5\pi/8)\\ \cos(7\pi/8) \end{bmatrix}
+ \frac{a_3}{\sqrt{2}}\begin{bmatrix} \cos(2\pi/8)\\ \cos(6\pi/8)\\ \cos(10\pi/8)\\ \cos(14\pi/8) \end{bmatrix}
+ \frac{a_4}{\sqrt{2}}\begin{bmatrix} \cos(3\pi/8)\\ \cos(9\pi/8)\\ \cos(15\pi/8)\\ \cos(21\pi/8) \end{bmatrix} \qquad (5)$$
The basis functions for this decomposition consist of the four vectors on the right hand
side. The coefficients $a_1, \ldots, a_4$ are the discrete cosine transform of the vector $x$.
This equation clearly shows how a series can be decomposed into a sum of sinusoids
that vary in frequency. But it is important to note that these basis functions have
infinite support; that is, no elements of the basis vectors are zero. As a result, Fourier
type decompositions have difficulty picking out changes in x that are permanent or
fleeting; they are best for series that are stationary.
Equation (6) shows a wavelet decomposition of a time series of length 8 with the
Haar wavelet (as shown below, this longer length better illustrates the decomposition's
key features since it contains a variety of different sized wavelets).
$$\begin{bmatrix} x(1)\\ x(2)\\ x(3)\\ x(4)\\ x(5)\\ x(6)\\ x(7)\\ x(8) \end{bmatrix}
= \frac{b_{01}}{2}\begin{bmatrix} 1\\1\\1\\1\\0\\0\\0\\0 \end{bmatrix}
+ \frac{b_{02}}{2}\begin{bmatrix} 0\\0\\0\\0\\1\\1\\1\\1 \end{bmatrix}
+ \frac{b_{11}}{2}\begin{bmatrix} 1\\1\\-1\\-1\\0\\0\\0\\0 \end{bmatrix}
+ \frac{b_{12}}{2}\begin{bmatrix} 0\\0\\0\\0\\1\\1\\-1\\-1 \end{bmatrix}
+ \frac{b_{21}}{\sqrt{2}}\begin{bmatrix} 1\\-1\\0\\0\\0\\0\\0\\0 \end{bmatrix}
+ \frac{b_{22}}{\sqrt{2}}\begin{bmatrix} 0\\0\\1\\-1\\0\\0\\0\\0 \end{bmatrix}
+ \frac{b_{23}}{\sqrt{2}}\begin{bmatrix} 0\\0\\0\\0\\1\\-1\\0\\0 \end{bmatrix}
+ \frac{b_{24}}{\sqrt{2}}\begin{bmatrix} 0\\0\\0\\0\\0\\0\\1\\-1 \end{bmatrix} \qquad (6)$$
The first two vectors on the right hand side are the mother function for the Haar
wavelet, which of course is the box function (the terms in the coefficients' denomi-
nators are part of the wavelet or mother function and make them orthonormal). The
other six vectors are the actual Haar wavelet at two scales and locations. The first two
of these are on a larger scale than the second four. The discrete wavelet transform
consists of the bij coefficients. The two subscripts demonstrate the two-dimensional
nature of the decomposition, one of its key features. These coefficients are catego-
rized by i = 0, 1 and 2. For i = 0, the basis is the mother function. For i = 1, the
basis is the largest wavelet, and for i = 2, the smallest wavelet. In general, a wavelet
decomposition first contains two coefficients for the mother function. Then, defining
the length of a wavelet by its number of non-zero elements, coefficients for wavelets
cover half the series' length, then a quarter, an eighth, and so on, until the smallest,
which have two non-zero elements. As the coverage of each wavelet shrinks, the
number of wavelets of that size doubles. In (6), this can be seen when moving from
the third and fourth vectors to the fifth through eighth ones. Overall, this means that
the series to be decomposed must have a length that is 2 to a power (i.e. 2, 4, 8, 16,
32, ... ). Unfortunately, this imposes some unwanted restrictions on the length of the
series one can study. For example, it does not appear that the process can be tricked
by padding the series with additional data, since any additional data would influence
the decomposition.
Most importantly, each wavelet contains a considerable number of zero elements
(the so-called compact support). Thus, the discrete wavelet transform has the poten-
tial to "pick up" unique phenomena in the data.
Wavelets of different lengths refer to properties of the data at different frequencies,
though the preferred term in this literature is scale. The longest wavelets contain
elements of the data at low frequency and large scale, while the smallest wavelets
embody high frequencies and small scales.
To further illustrate wavelet decompositions and lay the groundwork for this
display in this paper, consider the following actual decomposition:
$$\begin{bmatrix} 1\\ 2\\ 1\\ 3\\ 10\\ 0\\ 2\\ 1 \end{bmatrix}
= \frac{3.5}{2}\begin{bmatrix} 1\\1\\1\\1\\0\\0\\0\\0 \end{bmatrix}
+ \frac{6.5}{2}\begin{bmatrix} 0\\0\\0\\0\\1\\1\\1\\1 \end{bmatrix}
- \frac{.5}{2}\begin{bmatrix} 1\\1\\-1\\-1\\0\\0\\0\\0 \end{bmatrix}
+ \frac{3.5}{2}\begin{bmatrix} 0\\0\\0\\0\\1\\1\\-1\\-1 \end{bmatrix}
- \frac{\sqrt{2}/2}{\sqrt{2}}\begin{bmatrix} 1\\-1\\0\\0\\0\\0\\0\\0 \end{bmatrix}
- \frac{\sqrt{2}}{\sqrt{2}}\begin{bmatrix} 0\\0\\1\\-1\\0\\0\\0\\0 \end{bmatrix}
+ \frac{5\sqrt{2}}{\sqrt{2}}\begin{bmatrix} 0\\0\\0\\0\\1\\-1\\0\\0 \end{bmatrix}
+ \frac{\sqrt{2}/2}{\sqrt{2}}\begin{bmatrix} 0\\0\\0\\0\\0\\0\\1\\-1 \end{bmatrix}. \qquad (7)$$
Note how the original series on the left hand side has a "spike" - the fifth term is
much larger than the others - and how the coefficient on the third of the smallest
wavelets (the seventh vector on the right hand side), located at the site of the spike,
reveals this phenomenon with its relatively large coefficient.
Figure 5 displays the discrete wavelet transform of (7). The bottom panel shows
the actual time series arranged horizontally with the smallest values on the bottom.
Fig. 5. Sample Wavelet Decomposition.
Note the spike in the data at the fifth observation. The top panel shows the wavelets'
coefficients. The convention followed in their display uses the relative size of the
coefficients within each category of wavelet. In the top panel, the $b_{0j}$ coefficients are on the bottom row, the $b_{1j}$ coefficients are in the middle row, and the $b_{2j}$ coefficients are on the top (the $b_0$, $b_1$ and $b_2$ terms on the left hand side of the figure denote the
rows). The coefficients in each of these three rows are arranged in terms of their
relative size, so the largest coefficient value "fills" its location in that row while the
smallest leaves it empty. Intermediate values fill the height of the row proportionally.
In (7), $b_{01}$ is associated with the first half of the time series, and $b_{02}$ is associated with the second half, given the location of the mother functions in the first two vectors. With $b_{01} = 3.5$ and $b_{02} = 6.5$, the first half of the bottom row, associated with the smaller value 3.5, is empty, while the space to the right is full. Likewise, the middle row, that for $b_{1j}$, has only two parts, so the left half, associated with the smaller $b_{1j}$ coefficient, is empty while the other half is full. Note that in both of these rows, the wavelet coefficients cover a length of four, thus accounting for the four lines. In the top row, that for $b_{2j}$, there are four parts. Again the largest coefficients fill the
panel height; the smallest are empty and the intermediate values are so noted. In
the displays that follow, the first two rows of the wavelet coefficients are frequently
dropped since they convey little information (one part being invariably the largest
and the other the smallest) and take up valuable space.
The algorithm that generates the discrete wavelet transform is relatively simple
and quite fast. An intuitive description of the algorithm is given in Press (1992).
Briefly, a matrix is first formed with the $c_k$'s oriented roughly along the diagonal. The data vector is multiplied by this matrix, and the resulting vector is sorted. Another
multiplication then takes place with half of the sorted vector and the matrix of the
Ck'S. This process of multiplication, sorting, and discarding half the vector continues
until the sorted vector contains two elements. The vector formed from the discarded
elements contains the discrete wavelet transform coefficients.
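A bare-bones version of this pyramid scheme, specialized to the Haar wavelet (a sketch of the idea only, not the Numerical Recipes routine), replaces the current smooth part of the series by scaled pairwise sums at each pass, stores the scaled pairwise differences as wavelet coefficients, and repeats on the sums until only two smooth values remain:

#include <cmath>
#include <cstdio>
#include <vector>

// Haar discrete wavelet transform of a series whose length is a power of two
// (n >= 4).  The coefficients are returned in the order used in (6): the two
// mother-function ("smooth") coefficients first, then the wavelet coefficients
// from the largest scale down to the smallest.
std::vector<double> haarDWT(std::vector<double> x) {
    const double r2 = std::sqrt(2.0);
    std::size_t n = x.size();
    std::vector<double> out(n);
    std::size_t m = n;
    while (m >= 4) {
        std::vector<double> smooth(m / 2);
        for (std::size_t i = 0; i < m / 2; ++i) {
            smooth[i]      = (x[2 * i] + x[2 * i + 1]) / r2;   // local average
            out[m / 2 + i] = (x[2 * i] - x[2 * i + 1]) / r2;   // local difference
        }
        x.assign(smooth.begin(), smooth.end());
        m /= 2;
    }
    out[0] = x[0];   // the two remaining smooth values are the
    out[1] = x[1];   // mother-function coefficients b01 and b02
    return out;
}

int main() {
    std::vector<double> series = {1, 2, 1, 3, 10, 0, 2, 1};   // the series of (7)
    for (double b : haarDWT(series)) std::printf("% .4f\n", b);
    return 0;
}

Applied to the series of (7), the routine returns 3.5, 6.5, -0.5 and 3.5, followed by the four smallest-scale coefficients $-\sqrt{2}/2$, $-\sqrt{2}$, $5\sqrt{2}$ and $\sqrt{2}/2$, matching the decomposition displayed there.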
The algorithm's operation count is proportional to n, the length of the original
data and is therefore faster than the fast Fourier transform, whose operation count
is proportional to n . log2 (n). Thus, there are no computational constraints to the
discrete wavelet transform for most economic data and certainly none for macro data.
Fig. 6. Wavelet Decomposition of a Step Function.
To better understand how the discrete wavelet transform decomposes data, we use
it here on several pathological time series. Figure 6 shows the graphics associated
with a discrete wavelet transform of a step series of length 64 (recall that the series
to be studied must have length of 2 to an integer power). It has a value of 1 prior to
observation 31 and jumps to 1.1 thereafter. The Haar wavelet seems to do a better
job than other wavelets on macro series (as described below), and so is employed
in this section. As can be seen, there is an abrupt change in all categories of the
wavelet coefficients at the point where the time series jumps. Note the bottom row
of the wavelet panel contains 4 wavelet coefficients (the first, third, and fourth have
the same size), each covering a length of 16, the next has 8 of length 8, the next 16 of
length 4, and the final 32 of length 2. The wavelet coefficients with only two entries
are not shown. In every row, all coefficients but one are the same size (this accounts
for the all or nothing height in the blocks). The change at the jump in the original
series is seen with all coefficients, even though they cover vastly different scales and
provide different resolutions for marking the break.
Figure 7 shows the decomposition of a sine wave. Once again, increased reso-
lution and detail are seen as one moves up the coefficient panel. This figure is most
instructive when compared with Figure 8, whose frequency is the same as that in Figure 7, and then increases by 20% at the midpoint. While the change in frequency
is barely noticeable when comparing the two time series, note the very different low
frequency (long scale) behavior with the two sets of wavelet coefficients in the two
figures. Further, a comparison of the higher frequency wavelet coefficients shows
differences as well. While not as striking as Figure 6, it nonetheless shows the change
in the series at its midpoint.
Figure 9 shows a series whose slope increases by 20% at its midpoint. While
the change is barely visible to the eye, it is quite obvious by looking at the wavelet
coefficients panel that a major change has taken place at the actual location of the change in the slope.
Fig. 7. Wavelet Decomposition of a Sine Wave.
Fig. 8. Sine Wave with a Change in Frequency.
Fig. 9. Line with a Change in Slope.
This section is quite speculative since the work on wavelets in macro is very prelimi-
nary. However, it does suggest some interesting uses for wavelets in economics. The
Haar wavelet is used for all decompositions in this section since it proved to generate
the most interpretable decompositions. The D4 wavelet yielded decompositions of
macro data that did not correspond to known features in the data, such as business
cycles and secular changes. Other wavelets were not investigated.
Figure 10 shows the discrete wavelet decomposition of the quarterly growth rate
of real GNP for the period from 1958II to 1990I (128 observations). Business cycle turning points are denoted by vertical lines in the bottom panel (dates determined by the NBER). In turn we see the end of the 1957-1958 recession, the 1960-1961 recession, the 1969-1970 recession, the 1973-1975 recession, the 1980 recession,
and finally, the 1981-1982 recession. This figure is not particularly revealing; the
major revelation is the low and mid-frequency change in 1974, which appears to
capture the well-known slowdown in the economy's growth rate. The high frequency
components cannot be identified with any particular phenomena. Rather surprisingly,
the wavelet decomposition reveals only long term phenomena on real GNP's growth
rate.
Figure 11 shows the results for the level of the log of real GNP, and it conveys
more information than the previous case. In every recession but the first, which is only
partly included in the data, the highest frequency wavelet coefficients are quite large.
With the single exception of a period in 1959 (a downturn, but not a recession), they
are almost never as large in recessions, and so these large high frequency coefficients
identify recessions. These results hold up even better in the wavelet coefficients
with the next lowest frequency, though naturally with less resolution. At the low
frequencies, one sees significant changes in 1966I to 1966II and 1974I to 1974II. The
latter seems to signify the change in the growth rate in the early 1970's. Overall, this
figure is rather impressive since it uses unfiltered data and it picks up both low and
high frequency items of interest.
Figure 12 examines the change in nominal nonfarm business sector compensation
per hour (Citibase series LBCPU) over the same dates as the real GNP data. The
first thing we see is how the spikes in the data are picked up by the high frequency
coefficients. We also note that the two lowest frequency components pick up breaks
after 1966I, 1974I and 1982I with some evidence for a break after 1970I in the next
to lowest frequency coefficient. Balke (1991) studied this series for evidence of level
shifts using a modified version of Tsay's method for detecting outliers (1988). He
identified level shifts in 1968I, 1972IV, and 1982II. Given the low resolution of low
frequency wavelets, the match between these two very different methodologies is
remarkable.
Figure 13 analyzes a synthetic series. The data are generated from an ARIMA
model that Blanchard and Fischer (1989) fit to GNP growth. Thus, it is designed
to replicate the time series in Figure 10. While the two time series appear to have
the same characteristics to the eye, notice the relatively greater variation in the high
frequency components in Figure 13 than in Figure 10 (this is somewhat hard to
see given the different sizes of the figures). This suggests that the discrete wavelet
transform may be useful as a diagnostic tool for fitting models or as a method for
detecting data characteristics ignored by models.
6. CONCLUSION
This paper illustrated intuitively the discrete wavelet transform and then went on
to explore its usefulness for macroeconomics. Although very preliminary, initial
Fig. 13. Blanchard-Fischer Data.
tool for model building was demonstrated with data generated from the Blanchard-
Fischer equation, which was shown in Figure 13 to have a different decomposition
than actual data. Finally, it might be used not only as a diagnostic tool for statistical
and econometric models, but also for data generated from macroeconomic models.
ACKNOWLEDGEMENTS
I would like to thank, without implicating, Nathan Balke, Tony Smith, David Belsley
and the participants of the session titled "Computational Elements in Econometrics
and Statistics II" at the 14th Annual Congress of the Society for Economic Dynamics
and Control in Montreal in June, 1992.
REFERENCES
Balke, Nathan S. "Detecting Level Shifts in Time Series: Misspecification and a Proposed Solu-
tion." Richard B. Johnson Center for Economic Studies Working Paper #9121. Department
of Economics, Southern Methodist University. 1991.
Blanchard, Olivier Jean and Stanley Fischer. Lectures on Macroeconomics. Cambridge MA:
MIT Press, 1989.
Carey, John. "'Wavelets' Are Causing Ripples Everywhere." Business Week. February 3, 1992:
74-5.
"Catch a Wave." The Economist. 323 no. 7754 (1992): 86.
Daubechies, Ingrid. "Orthonormal Bases of Compactly Supported Wavelets." Communications
in Pure and Applied Mathematics 41 (1988): 909-996.
Healy, Dennis M. Jr. and John B. Weaver. ''Two Applications of Wavelet Transforms in
Magnetic Resonance Imaging." IEEE Transactions on Information Theory. 38 (1992):
860-880.
Kolata, Gina. "New Technique Stores Images More Efficiently." New York Times. November
12, 1991: B5+.
Pollen, David. "Daubechies' Scaling Function on [0,3]." Wavelets: A Tutorial in Theory and
Applications. Ed. Charles K. Chui. San Diego, CA: Academic Press, 1992.
Press, William H., Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery. Numerical Recipes in Fortran, The Art of Scientific Computing, second edition. New York: Cambridge University Press, 1992.
Rioul, Olivier and Martin Vetterli. "Wavelets and Signal Processing." IEEE Signal Processing
Magazine. October 1991: 14-38.
Strang, Gilbert. "Wavelets and Dilation Equations: A Brief Introduction." SIAM Review 31
(1989): 616-27.
Tsay, R. S. "Outliers, Level Shifts, and Variance Change in Time Series." Journal of Forecast-
ing 7 (1988): 1-20.
Wallich, Paul. "Wavelet Theory: An Analysis Technique that is Creating Ripples." Scientific
American January 1991: 34-35.
C.R. BIRCHENHALL*
1. INTRODUCTION
* This document is partly derived from an earlier piece written jointly by myself and Jarlath Trainor. Many of the strengths of the current document reflect Jarlath's contributions, but
he has no responsibility for the errors and weaknesses that have undoubtedly arisen from my
rewrite of this document or MatClass. Many thanks for Jarlath's assistance on this project.
My thanks also to the Department of Econometrics & Social Statistics for funding Jarlath's
time in the department. Anyone who is working with Unix workstations will know the value
of having a Unix wizard at hand; in my own case I am greatly indebted to Owen LeBlanc from
the Manchester Computing Centre. Finally my thanks to Ericq Horler and David Belsley for
their careful reading of an earlier version of this document and the removal of many errors.
While their efforts have greatly improved this work, they cannot be held responsible for the remaining inadequacies.
D. A. Belsley (ed.), Computational Techniques for Econometrics and Economic Analysis, 151-172.
© 1994 Kluwer Academic Publishers.
reasons for this has been the perceived complexity of software development and the
scarcity of requisite expertise. The success of GAUSS suggests that with appropriate
tools a significant number of economists will turn their hand to programming rather
than rely solely on the menu provided by the packages. If there is to be any prospect
of developing a subculture of economic code, there needs to be an appropriate
set of tools and standards. Recently there has been a good deal of interest in using
object-oriented programming systems (OOPS) in reducing the difficulties of software
development; in particular there has been a flourish of interest in the C++ language.
The central aim of the MatClass project is to go some way in evaluating the potential
for an object-oriented approach to econometric computation using C++.
1 I cannot recall who it was who suggested that the old Algol-Fortran argument was now
settled in Algol's favour, but it seems to be a fair comment on Fortran's belated acceptance
that there is merit in structures and the "new" controls.
2 MS-Windows suffers from poor memory management and instability due to its dependence on DOS. Windows/NT or even OS/2 v2 hold more promise for sustained development of Intel-based systems.
structure is extremely powerful and sadly missing from GAUSS. Needless to say
GAUSS offers no form of inheritance between types.
One minor concern is the long term support of GAUSS. Unlike Fortran with its
multiple sources and open libraries, GAUSS comes from a single source, and the
underlying code is closed. As far as I know there is no secondary source for GAUSS
interpreters, and this could be a major problem if this single source should fail. This
is not to suggest GAUSS should have no long term role; indeed it will undoubtedly
remain a major force in code development but I would be concerned to become solely
dependent on the current system.
It can be noted that most of the above comments on GAUSS apply to MATLAB,
although it is already available on most systems. While few econometricians are
using MATLAB, it is a system offering a similar set of basic facilities, although
it lacks some of the statistical intrinsics and does not have the various add-ons.
Coming from one of the authors of LINPACK, it has found favour among numerical analysts [7]. It has been suggested that MATLAB not only facilitates the learning of
numerical methods, but it can also be used for prototyping new code before being
translated into Fortran. A similar role could be, and possibly is being, played by
GAUSS.
So where does C++ stand in relation to these existing standards? What makes
C++ interesting is its promise of combining object-oriented methods with runtime
efficiency. But C++ is currently more promise than fact and will not provide a
serious challenge to GAUSS or Fortran until there are appropriate class libraries on
which users can build. C++ uses the concept of a class to allow object-oriented
extensions to the C language, i.e. developers can use the class system to extend
the object types supported by the compiler. MatClass is one of several classes that
defines a matrix type and gives the C++ user a matrix syntax and other facilities
similar to those offered by GAUSS or MATLAB.3 But that is not all, for the set of
possible data types are limited only by the programmer's imagination. MatClass uses
higher level classes to embody higher level functionality. For example, it offers a
family of classes to embody ordinary least-squares models. These OLS classes give
the C++ programmer a "package" that hides the storage and computational details
underlying the least-squares procedures. At the other extreme, MatClass has low-
level classes for debugging and error management. For example there is a generic
class, matObject, with error management capabilities whose features are inherited by
all other objects. These few sentences cannot give the reader a complete picture of
object-oriented programming, but we hope we have conveyed the image of a flexible
and extensible system for which MatClass is but a simple beginning.
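The flavour of this layered design can be suggested with a small sketch. The class names below echo those mentioned in the text (a generic matObject base, a matrix type, an OLS-style class), but every member shown is hypothetical and is not the actual MatClass interface.

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

class matObject {                 // generic base: every object inherits
public:                           // its error-management behaviour
    virtual ~matObject() {}
    void error(const std::string& msg) const {
        throw std::runtime_error(name() + ": " + msg);
    }
    virtual std::string name() const { return "matObject"; }
};

class matrix : public matObject { // a matrix type with a matrix syntax
    int rows_, cols_;
    std::vector<double> elem_;    // internal state hidden from the user
public:
    matrix(int r, int c) : rows_(r), cols_(c), elem_(r * c, 0.0) {}
    std::string name() const override { return "matrix"; }
    double& operator()(int i, int j) {            // 1-based element access
        if (i < 1 || i > rows_ || j < 1 || j > cols_)
            error("index out of range");
        return elem_[(i - 1) * cols_ + (j - 1)];
    }
    int rows() const { return rows_; }
    int cols() const { return cols_; }
};

class olsModel : public matObject {   // a higher-level "package" built on matrix
public:
    std::string name() const override { return "olsModel"; }
    // fit(), coeff(), resid() ... would hide the decomposition details
};

int main() {
    matrix A(2, 2);
    A(1, 1) = 1.0; A(2, 2) = 4.0;
    std::cout << A(2, 2) << "\n";   // prints 4
    return 0;
}

The point of the sketch is the shape of the hierarchy: the error-management behaviour is written once in the base class and inherited everywhere else, while the user of the matrix type sees only a matrix syntax.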
In comparing the C++ and MatClass combination with GAUSS and Fortran,
note that MatClass acts as a complement to a C++ compiler which links the user
program with the MatClass library before execution. MatClass based programs, then,
are compiled to machine code, having the advantage of efficient low-level routines
but the disadvantage of compiler and linker lags. Unlike Fortran, C++ is a young
3 A commercial based class that offers similar, if not greater, facilities is M++ from Dyad
Software.
language, and there are few numerical libraries available. Fortunately, C++ is a
superset of C, and thus existing C libraries, in particular Numerical Recipes in C
[13], are readily adapted for use in C++. MatClass aims to make some contribution
in this direction. In its favour, when compared to Fortran 77, MatClass gives the C++
programmer a matrix syntax and an object-oriented environment. Apart from the
issue of compiler versus interpreter, the comparison of MatClass and GAUSS focuses
on C++'s support for, and MatClass's exploitation of, object-oriented programming.
A fuller discussion of OOPS will be offered below. Here we simply emphasize that
use of C++ classes gives the developer a highly structured language with strong
type checking, and note that most compilers come with source level debuggers that,
together with the debugging features of MatClass, make the debugging of larger
projects relatively straightforward.
Some might wonder whether a highly structured approach to numerical computing
with layers of types and classes might not suffer from runtime inefficiencies. It will
be demonstrated that the combination of MatClass with a good quality compiler
produces code that is efficient, particularly for low-level tasks. Furthermore the code
is immediately portable to RISC based Unix - for example, timings will show an HP 720 workstation running up to 10 times faster than a 33 MHz 486 PC.
All in all, then, the combination of (a) having the source code in the public domain, (b) the compiled nature of user-written routines, and (c) C++'s support for OOPS should make MatClass a good candidate to act as a foundation for an "open system" of
software. While C++, with appropriate classes, will not replace Fortran or GAUSS,
it promises to be a valuable third leg for developers.
An extensive evaluation of C++ cannot be given here, but a few words are in order.
C++ is a wide spectrum extension of C that allows the programmer to combine low-
level and object-oriented styles in the same program. In part C++ can be viewed as
a better and extended version of the C programming language. From this perspective
C++ is seen to inherit from C its efficiency, flexibility and portability. As a better C,
it offers better type checking and the opportunity to escape some of the danger points
of its parent language e.g. defined constants and macros. In the context of numerical
methods, the expectation is that C++ will allow a more highly structured approach
than its parent language while retaining its efficiency. A matrix extension to C++
promises to give even greater functionality in carrying out matrix calculations. In
particular it offers greater security than C and can incorporate the methods needed
to overcome the technical weaknesses in C's handling of multi-dimensional arrays.
Such extensions to a standard general purpose language offer greater openness and
portability than characterize proprietary packages such as GAUSS.
C++ is a general purpose language that does not have an intrinsic matrix type,
but its support for classes allows users to define their own types that have the same
privileges as intrinsic types. With a C++ matrix class, the user has a system that
combines the general power and flexibility of an increasingly popular general purpose
dreaded compilers, it is not clear that sufficient consideration has been given to the
deficiencies of GAUSS and to the attractiveness of C++ as a developer system.
See section 9 for a discussion of the relative runtime efficiencies of GAUSS and
MatClass.
One further aim of MatClass is to give the user ready access to state of the
art numerical methods. The current structure of MatClass reflects an interest in
implementing the key matrix decompositions and for solving equations and least-
squares problems. Thus, the two main families of classes, matDec and matOls, are
effectively "packages" built around the LU, Cholesky, Householder QR, and singular
value decompositions. Key references are Press et aI's Numerical Recipes in C [13]
and Golub and Loan's Matrix Computations [7] . Under the influence of the numerical
methods literature some emphasis is placed on condition numbers, and both families
give the user some form of estimating the condition number. In the same spirit,
MatClass aims to give full support to singular-value (SVD) methods, particularly
for least squares problems. Singular values not only have a ready interpretation as
measures of "near singularity", but the SVD algorithms are numerically superior
when the problem is effectively rank deficient. SVD methods have not been given the role in the econometrics literature they deserve, particularly when discussing rank
and multicollinearity. The SVD should be readily available to all econometricians
and even if it is felt that the computational costs of SVD are too high for routine
work, it should nevertheless be possible to estimate the conditioning of the problem
reliably, regardless of the underlying algorithm. If ill-conditioning is suggested then
the worker can switch to SVD methods - or reconsider the structure of the model!
It has to be noted that from this perspective GAUSS scores highly when compared
with standard statistical packages.
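For example (an illustrative fragment, not a MatClass routine), once the singular values of a regressor matrix are available from an SVD, the 2-norm condition number is just the ratio of the largest to the smallest:

#include <algorithm>
#include <cstdio>
#include <limits>
#include <vector>

// 2-norm condition number from an already-computed set of singular values,
// e.g. as produced by an SVD-based decomposition class.
double conditionNumber(const std::vector<double>& sv) {
    double mx = *std::max_element(sv.begin(), sv.end());
    double mn = *std::min_element(sv.begin(), sv.end());
    return mn > 0.0 ? mx / mn : std::numeric_limits<double>::infinity();
}

int main() {
    std::vector<double> sv = {12.5, 3.1, 0.0004};   // hypothetical singular values
    std::printf("condition number = %g\n", conditionNumber(sv));  // about 31250
    return 0;
}

A very large (or infinite) value would then be the signal to switch to SVD-based estimation or to reconsider the specification.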
In summary MatClass aims to provide services similar to those of GAUSS and
MATLAB by giving the user a matrix syntax with access to state of the art numerical
methods. What distinguishes MatClass from these others is that these services are
provided in the framework of a compiled object-oriented environment and the full
source code is to be placed in the public domain, a combination that allows third
parties to correct and improve the code without runtime penalties. The object-oriented
nature of the underlying compiler also allows a hierarchy of services to be developed
in an efficient and systematic manner. To add new features to the system, the user
does not have to rewrite the existing "package", his new offerings simply inherit and
build on the features of the existing system. It is this aspect of MatClass to which we
now turn.
by OOPS. Inheritance refers to the ability of a new object type to inherit the properties
of an existing type. When adding a new, or replacing an old, feature of an existing
class, one simply extends or modifies rather than rebuilds the existing classes. In this
way a layered structure of classes can be built on top of the fundamental management
classes and the basic numerical procedures, allowing users to choose their own
level of access. These features are important in the development of complex and
co-operative software, particularly when the development involves more than one
programmer. It will be argued that C++ with appropriate classes promises to be an
efficient, highly structured environment in which to develop facilities and packages
either individually or jointly.
The essential element of an object-oriented programming system (OOPS) is its support
for classes of objects that can encapsulate data types and inherit properties. To
understand these concepts better and appreciate the arguments favouring their use,
the reader is referred to Grady Booch's Object-Oriented Design with Applications [1].
Booch argues that the use of OOPS offers significant improvements over the more
familiar structured programming approaches, particularly for complex "industrial-
strength" software. Booch's discussion is directed to software houses which have
to manage a team of programmers on a large project, but it is equally valid for a
community of professionals developing a common system.
Booch (p.77) offers the following definition of an object:
An object has state, behaviour, and identity; the structure and behaviour of
similar objects are defined in their common class; the terms instance and object
are interchangeable.
An object state will include not only the external state as perceived by the user of the
object but also an internal state that reflects the details of the class implementation.
For example, the external state of a matrix includes its dimensions (number of
rows and columns) and the contents of its elements. In MatClass the internal state
includes a 'map' which is used to translate references to matrix elements into memory
addresses.
As part of the behaviour of a matrix we would certainly include the ability to be
used in arithmetic expressions, e.g. be added to or multiplied by other, conformable
matrices. From the perspective of the user of a class, it is important that objects
in the class behave in a valid and readily understood manner. The user should not
typically have to consider the details of the implementation in order to understand the
usage of the class - although occasionally efficiency may demand considered usage
of a class. Indeed, it is highly desirable that the details be hidden from the users,
restricting their access to a well defined interface. Such data and method hiding,
i.e. encapsulation, allows the developer of the class to modify the implementation
without upsetting other modules or user programs. For example, MatClass allocates
storage for matrices column by column. It is relatively straightforward to modify
a few key procedures to allocate the whole matrix in a single contiguous block of
memory without changing the way in which matrices are used. Users of the class
would continue to address individual elements of the class in the same manner, i.e.
A(i, j) would still refer to the (i, j)-th element of the matrix A.
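A hypothetical sketch of the idea (the real MatClass map is more elaborate than this): the element operator hides the internal translation from (i, j) to a storage address, so the storage scheme can change without touching user code.

// Sketch only; not MatClass source.
class Mat {
public:
    Mat(int nRows, int nCols)
        : rows_(nRows), cols_(nCols), store_(new double[nRows * nCols]) {}
    ~Mat() { delete[] store_; }
    Mat(const Mat&) = delete;             // copying elided in this sketch
    Mat& operator=(const Mat&) = delete;

    // External behaviour: A(i, j) addresses the (i, j)-th element.
    double& operator()(int i, int j) { return store_[offset(i, j)]; }

private:
    // Internal state: the mapping from (i, j) to a storage address.
    // Changing this one function (e.g. to row-major storage, or to a map
    // over separately allocated columns) leaves every user program untouched.
    int offset(int i, int j) const { return j * rows_ + i; }   // column-major

    int rows_, cols_;
    double* store_;
};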
reference matrix - an object that is used to reference the whole or part of some other
matrix. Thus, given a matrix A we can declare a refMatrix col to be used to refer to
columns of A. Having attached col to A, we can instruct col to reference a specific
column of A, say the ith, with col.refCol(i). Thereafter col will act as if it
were the ith column of A - at least until it is instructed to refer to some other part
of A. For example the statement col = 1 would assign the value 1 to the elements
of the ith column of A. A reference matrix differs from a standard matrix in having
to contain a reference to its underlying matrix, for example col holds an internal
reference to A, and so it can be instructed to refer to various parts of that underlying
matrix, for example col could be used to step through the columns of A. Otherwise
a refMatrix inherits all the properties of a standard matrix. The refMatrix class is
derived from the matrix class and any instance of refMatrix is a matrix and can be
used wherever a matrix can be used.
The reader can no doubt imagine objects that are not matrices but which, on
occasion, should behave as matrices. In statistical applications, for example, it can
be useful to have the concept of a data set which can act like a matrix (of variables or
cases), but can also handle variable and/or case names and other sample information
such as missing values and periodicity. Likewise, a table is a useful output device
that may take the form of a matrix of numbers with column and row labels - think
of the standard table of estimated coefficients, standard errors and t-values. Here we
would wish to manipulate the numerical fields like a matrix while having additional
features for handling labels and for displaying the labels and the matrix.
It has to be stressed that the process of making one class inherit the properties of
another is simple and relatively automatic. Virtual methods allow this process to go
a stage further. With virtual methods a child can choose to override those methods
of the parent that are unsuitable for its needs, and the child need not inherit all the
features of the parent. For example, MatClass matrices can be reset in the sense that
their dimensions can be modified dynamically. This would not be appropriate for a
refMatrix, so the reset method is made virtual so that the refMatrix version can treat a
call to reset as a fatal error. Similarly, the concrete svdDec class overrides its matDec
parent's condition number method.
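The pattern just described can be pictured with the following schematic, which uses simplified class and method names and is not the actual MatClass source.

#include <cstdio>
#include <cstdlib>

// Parent class: a resizable matrix (storage details elided).
class matrix {
public:
    virtual ~matrix() {}
    // Dynamically change the dimensions of the matrix.
    virtual void reset(int newRows, int newCols) {
        rows = newRows; cols = newCols;   // reallocation omitted in this sketch
    }
protected:
    int rows = 0, cols = 0;
};

// Derived class: a reference to part of another matrix.  It inherits the
// behaviour of matrix, but resizing a reference makes no sense, so the
// virtual reset is overridden to treat the call as a fatal error.
class refMatrix : public matrix {
public:
    explicit refMatrix(matrix& base) : base_(&base) {}
    void reset(int, int) override {
        std::fprintf(stderr, "refMatrix: reset is not allowed\n");
        std::exit(EXIT_FAILURE);
    }
private:
    matrix* base_;
};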
The foregoing discussion has shown that the key to OOPS programming is the
implementation and use of appropriate classes of objects. Table 1 lists the classes
currently offered by MatClass.
MatClass aims both to smooth the path for new programmers in writing pro-
ductive code using C++ and to offer support for longer term development. To
assist the newcomer, MatClass offers a straightforward syntax that does not require
an understanding of the power and subtlety of C++. Thus MatClass assists with
the management of objects and errors, offers a simple input-output system as well
as a rich set of matrix intrinsics, supports the popular matrix decompositions, and
offers a family of OLS classes on which future extensions can be built.

TABLE 1
Main MatClass classes (columns: Class Name, Role).

At the same time, the combination of a rich matrix class with an extensible object and error-
management system gives the more serious user a powerful development tool for
numerical research. It is likely that experienced programmers who have not used
object-oriented systems will suffer from some conversion pains - this author cer-
tainly did! Object-oriented programming styles are very different from traditional
procedural programming styles. Nevertheless the benefits are significant.
Alongside the central matrix class there is a number of general purpose classes
as well as higher level families of classes related to the concept of a matrix. One
such family is based on various matrix decompositions, including the LU, Cholesky,
QR by Householder reflections, and SVD decompositions. Furthermore, there is a
family of ordinary least squares (OLS) classes based on the Cholesky, QRH, and SVD
decompositions. Finally, there are classes to support random number generation and
the use of special matrix functions.

TABLE 2
MatClass scalar types.
Type Name    Role
INDEX        Unsigned integer, used for indices
REAL         Standard floating-point type
DOUBLE       Double-precision floating-point type
matError     Enumerated type for common errors
To support these central classes MatClass provides several underlying and sub-
sidiary classes of objects. Apart from a concept of a matrix, which closely emulates
the mathematical idea, MatClass gives support to scalar variables, including real
numbers and integers, character arrays, and arrays of indices. MatClass uses the
terms REAL and DOUBLE to define real variables. The INDEX type is an unsigned
integer and is used primarily for index variables that control the addressing of ma-
trix elements. The LONG is an unsigned long integer that is used where an INDEX
variable will overflow.
MatClass supports string constants and introduces a class of charArrays. A se-
quence of characters placed between double quotes such as "this is a string"
constitutes a string constant. Variables of type charArray can be declared and used
to store strings of characters. For example, the statement charArray name(40)
declares a variable name that can store a maximum of 40 characters, although the
contents of a charArray variable may be less than its maximum length. These
charArrays are particularly useful when reading strings from files.
The classes realArray and matMap are largely for internal use and offer the end
user few services.
The use of classes to implement random number generators and special functions
may seem surprising to those familiar with traditional programming methods. But
objects need not simply be data structures; rather they are structures which, through
their methods, offer a set of services to the user. A matRandom object not only has a
state (the current state of the random generator) but also offers a number of methods to
fill a matrix with random numbers, and having a class of these objects affords the user
access to several independent generators. Likewise, most of the special functions
are implemented as classes, for example, there is a class logGammaFunc that is the
basis of the logGamma function. These special function classes are derived from
the abstract class matSpecFunc that offers its children a number of services. Thus
all special matrix functions are driven by a procedure in the parent class. While the
logGammaFunc class provides the specific code to return the value of logGamma
'include "matmath.hpp"
'include "olssvd.hpp"
'include "olschol.hpp"
ols.assign( y, x ) ;
ols.coeff( beta) ;
ols.stdErr( stderr )
tvalues • beta.divij( stderr )
out . newLine 0 ;
out( "Coeffs", width )( "Std Errors", width+2 ) ;
out ( "t-values", width+2 ) .newLineO ;
out( "----------", width )( "----------", width+2
out( "----------", width+2 ) ;
matFormat ( STACKED ) ;
matField( width )
( beta I stderr I tvalues ).put()
out . newLine 0 ;
} II results
Fig. 1. Using OLS Classes Part A.
for some real argument, the matrix version of logGamma is essentially governed
by code in the parent class matSpecFunc. In this way special real-valued functions
are converted into matrix functions without undue duplication of requisite loops and
error checking.
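The division of labour between the abstract parent and its children can be sketched as follows; the class names other than the logGamma-style ones are invented for the illustration and the code is not the MatClass source.

#include <cmath>
#include <cstddef>
#include <vector>

// Abstract parent: owns the looping and error handling, delegates the
// scalar evaluation to the derived class.
class specFunc {
public:
    virtual ~specFunc() {}

    // The "matrix version": apply the scalar function elementwise.
    // Every derived special function gets this for free.
    std::vector<double> apply(const std::vector<double>& x) const {
        std::vector<double> y(x.size());
        for (std::size_t k = 0; k < x.size(); ++k)
            y[k] = value(x[k]);           // single place for the loop
        return y;
    }

protected:
    // Each concrete special function supplies only the scalar rule.
    virtual double value(double x) const = 0;
};

// Concrete child: log of the Gamma function for a scalar argument.
class logGammaFunc : public specFunc {
protected:
    double value(double x) const override { return std::lgamma(x); }
};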
6. AN OLS EXAMPLE
Figures 1 and 2 list the two parts of a program illustrating the use of two OLS
classes, olsSvd and olsChol, that are based on SVD and Cholesky decompositions,
main()
{
    INDEX M, N ;
    inFile data ;
    data.open( "jsex3n.dat" ) ;
    data( M )( N ) ;
    X.get( data ) ;
    Y.get( data ) ;
    x = ln( X ) ;
    y = ln( Y ) ;
    c = 1.0 ;
    y = y.smpl( 2, M ) ;
    x = c | x.smpl( 2, M ) | y.smpl( 1, M-1 ) ;
    olsSvd ols1 ;
    out( "\nResults from olsSvd \n" ) ;
    results( ols1, y, x ) ;
    olsChol ols2 ;
    out( "\nResults from olsChol \n" ) ;
    results( ols2, y, x ) ;
    return 0 ;
} // main
Fig. 2. Using OLS Classes Part B.
respectively. The program is made up of two functions, results and main. main
is always the entry point into a C++ program, so that execution starts with the lines
declaring the two INDEX variables M and N. The main function then reads in a data set,
applies log transforms, and builds a regressand and a matrix of regressors including
a lagged regressand. Having created an olsSvd object ols1, it calls results to
print out some of the standard OLS results. This is repeated for an olsChol object
ols2. The output from the program is given in Figure 3.
Results from olsSvd:   RSS 0.05518   TSS 2.414e+04   SE 0.04795   RSQ 1   RBarSq 1   DW 0.7388   Cond 7.355
Results from olsChol:  RSS 0.05518   TSS 2.414e+04   SE 0.04795   RSQ 1   RBarSq 1   DW 0.7388   Cond 10.66
Fig. 3. Summary statistics from the output of the OLS example program.

This example illustrates some of the features of MatClass, but more interestingly it
also illustrates one of the uses of class hierarchies. Looking closely at the results
function in Figure 1, you will see that its first argument is declared to be a matOls, not
an olsSvd or olsChol. Despite C++'s being a strongly typed language, the compiler
does not treat the calls to results with objects of type olsSvd and olsChol as errors,
since these objects are matOls objects. Instances of derived classes are instances of
the parent class. We have a family of matOls classes, and any object in those classes
can be used where a matOls object is expected. Thus we only need to write one
results function for all existing and future matOls classes. Thinking in terms
of families of classes allows us to introduce a higher level of abstraction into our
programs, giving greater freedom for mixing and matching objects to specific needs.
Although not part of our example, the user can choose the specific class, and thus
algorithm, used by results at runtime.
This example hints at what is possible when building a hierarchy of classes and
illustrates the fact that many of the benefits of inheritance can be exploited by stand-
alone functions such as results. Although MatClass does not currently offer
classes for nonlinear models, these features of C++ are expected to be particularly
valuable in this area.
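The mechanism can be reduced to a few lines. The sketch below uses invented names rather than the MatClass interface, but it shows how a function written once against a parent class accepts any present or future member of the family.

#include <cstdio>

// Parent of a family of estimators; the distinguishing behaviour is virtual.
class estimator {
public:
    virtual ~estimator() {}
    virtual const char* method() const = 0;
};

class estByQR : public estimator {
public:
    const char* method() const override { return "QR"; }
};

class estBySVD : public estimator {
public:
    const char* method() const override { return "SVD"; }
};

// Written once against the parent; works for every derived class,
// including ones added after this function was compiled.
void report(const estimator& e) {
    std::printf("fitted by %s\n", e.method());
}

int main() {
    estByQR  a;
    estBySVD b;
    report(a);    // prints: fitted by QR
    report(b);    // prints: fitted by SVD
    return 0;
}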
7. ERRORS IN MATCLASS
MatClass maintains a stack of function calls and displays a list of the active functions
when an error occurs. For a function to appear on this stack, it must declare a variable
of type matFunc. All major functions in MatClass exploit this facility and thus it
should be possible to identify the sequence of calls that led to the error.
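One way to realise such a call stack, sketched here with invented names (the real matFunc bookkeeping is richer), is to let a local object's constructor push the enclosing function's name and its destructor pop it, so that every exit path unwinds the stack correctly.

#include <cstdio>
#include <string>
#include <vector>

// Global stack of active function names.
static std::vector<std::string> callStack;

// Declaring one of these at the top of a function registers the function
// for the lifetime of the call; destruction (on any exit path) removes it.
class funcTrace {
public:
    explicit funcTrace(const char* name) { callStack.push_back(name); }
    ~funcTrace() { callStack.pop_back(); }
};

// On error, print the chain of calls that led here, innermost first.
void reportError(const char* msg) {
    std::fprintf(stderr, "error: %s\n  active calls:\n", msg);
    for (std::size_t k = callStack.size(); k-- > 0; )
        std::fprintf(stderr, "    %s\n", callStack[k].c_str());
}

void innerRoutine() {
    funcTrace tr("innerRoutine");
    reportError("demonstration only");   // would normally be a real failure
}

void outerRoutine() {
    funcTrace tr("outerRoutine");
    innerRoutine();
}

int main() {
    outerRoutine();
    return 0;
}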
Furthermore, an error message will normally induce MatClass to generate a list of
objects. MatClass objects arrange themselves in "levels" for such lists; the matrix class
itself is at level 3. The MatClass object lists can be restricted to "high" level objects
through the matListCtrl function.
Unfortunately, MatClass has no access to the C++ user defined identifiers, unless
the program explicitly names objects (using the name method). We eschew the details
of the object lists here, simply noting that each object attempts to provide summary
information on its state.
While most modern C++ compilers come with source level debuggers, and some
come with class browsers, I have found it useful to have MatClass-specific debugging
facilities. These are controlled by setting the depth of the trace. The default level
is zero which means no tracing takes place.
During a trace, each major MatClass function occurring at or below the trace depth
identifies itself by name. This is normally followed by information on the principal
object with which the function is working. Tracing can be switched on anywhere in
a program by setting a nonzero debug level and can be switched off by setting a zero
debug level. By default the debug output is written to the standard output file but can
be redirected to a disk file.
This section compares GAUSS with the combination of MatClass and a C++ com-
piler as development systems. Currently GAUSS has advantages for those whose
needs are met by the various "add-ons" such as GAUSSX. MatClass cannot yet
satisfy these needs, but there is a commitment toward providing "higher" level capa-
bilities in future versions.
While the structure of GAUSS has advantages for "interactive" work and can
form the basis for significant research, it is not as well structured as C++.
In particular, it suffers from not being strongly typed and not offering object-
oriented facilities. As we have illustrated, the extensibility characteristic of
MatClass makes it superior to traditional languages such as GAUSS.
In assessing MatClass, one is also assessing in part the available compilers. The
current version of MatClass compiles and runs with all compilers available to the
author: Glockenspiel version v2.0 d2, Borland's C++ version 3.1, Microsoft
C/C++ v7, Zortech v2.1, JPI v1, and Hewlett-Packard's CC compiler provided
with their series 700 workstations. The Glockenspiel and HP compilers use the
AT&T Unix-based cfront, and so it is expected that the class will readily be
ported to this and related systems.
Compiler systems are varied and their appeal will differ according to the back-
ground of the user. Experienced programmers will appreciate using Glocken-
spiel and other command-orientated systems with familiar development tools.
We shall see that the Glockenspiel system produces very efficient code - in
fact the machine code is generated by Microsoft's C compiler. By contrast,
"integrated" environments, such as those offered by Borland, will satisfy those
who want the compilation process to be straightforward and relatively auto-
matic. Anyone who is using systems such as GAUSS to produce programs of
any significant size will find the move to this environment relatively painless
and productive. Not only is the compilation process easy, but you also have a
first class editor and debugger.4
The fact that MatClass works with these compilers suggests that any MatClass
code can be readily ported across operating systems. Glockenspiel, Zortech and
JPI have versions of their compilers for OS/2 - a much maligned system that
is better than DOS - and Borland is flagged by IBM to be a major player in the
future. All DOS based compilers offer support for MS-Windows and several
come with optional DOS extenders.
The reader should not view MatClass as an attempt to give the user a total and
final solution for his computational work; rather it is an attempt to provide one
development path based on the potential of C++. There are already alternatives,
for example M++ from Dyad Software, and I am confident there will be others.
One price to be paid for using a C++ compiler rather than GAUSS's pseudo-
compiler5 is the relatively slow compilation. GAUSS has rightly gained a reputation
4 You also get command line versions of the compiler and make facility allowing you to
base your work on editors such as Brief or SPE.
5 GAUSS produces some form of pseudo-code rather than linkable machine object code.
TABLE 3
Timings in seconds for 90 x 90 multiplication

System            Machine   Compile   Initial   Multiply   Code Size (K)
GAUSSv2           486 PC    NA        3.02      0.77       NA
GAUSS386          486 PC    NA        2.34      0.62       NA
Glockenspiel v2   486 PC    12.9      0.05      1.87       85
Microsoft v7      486 PC    7.6       0.06      2.03       80
Borland v3.1      486 PC    2.8       0.05      2.08       84
for its speed of execution, and for some tasks this can be a critical factor. But
it has to be understood that this feature is only true of its intrinsic operations
such as matrix multiplication. When using low level code such as addressing
individual elements of matrices, however, the interpretive nature of GAUSS
becomes highly inefficient. In any event the combination of MatClass with
a compiler system such as Glockenspiel competes well with GAUSS. Some
evidence on these issues is given below.
As a crude measure of relative efficiency Table 3 reports timings for the simple task
of multiplying two 90 by 90 matrices. Apart from forming the actual product, this job
involves initialising the two matrices with a nested pair of loops that step through the
rows and columns. This initialisation allows comparison of the relative efficiencies
in low-level access to the elements of the matrices.
Two PC configurations were used, an Opus and a DAN. Both used a 80486 at
33Mhz with 8Mbs of RAM and a 64K cache. The DAN had a 2Mb hard disk
cache. The timings for the GAUSS runs were based on the Opus, while those
for the C++ compilers were based on the DAN. The presence of the hard disk
cache significantly improved the compilation times. Both machines used DOS
5. It is fair to say that a minimum requirement for making good use of C++,
and implicitly MatClass, under DOS is a machine based on the 80386DX or
80386SX with a reasonable hard disk.
The timings for the HP CC runs were based on a Hewlett-Packard 720 worksta-
tion running a PA-RISC processor at 50Mhz, with 128k of instruction cache, 256k
of data cache, 32 megabytes of RAM, and running HP-UX. The compilation
and execution of the programs was done from the shell of GNU Emacs. Without
placing too much weight on one set of timings, these so-called "Snakes" are
truly impressive and bear witness to the possibilities offered by RISC. Some
readers may question the value of raw speed without general productivity tools
or may not see Unix workstations as their preferred platform, but I have found
the system with CC, GNU Emacs, TEX and MatClass to be highly productive. As
the prices of Unix workstations and higher level PCs continue to fall, machines
of this calibre will soon be the standard for research. With such configurations the
significance of compiler and linker lags largely disappears. Note that the total time
for the HP is less than the total for GAUSS386. For more computationally demanding tasks
the sheer processing power of the HP is even more important. All in all, it is
suggested that the ability to port code from PCs to powerful RISC systems is a
significant advantage of MatClass.
All compilers were set for speed optimisation - in the case of Glockenspiel and
Microsoft the options -Ox -Op were used. Glockenspiel translates C++ into
C and then calls the Microsoft C v7 compiler to generate machine code. The
full optimisation option is the -Ox -Op option for the C compiler. In the case of
Borland, the command line compiler was used with the options -02.
The reported compilation times are averages of 5 runs clocked with a stop watch.
No attempt was made to time GAUSS' compilation, partly because there is no
clear indication when the compilation is complete and partly because it seems to
be negligible for this small job. The timings for initialisation and multiplication
were generated directly by the code using calls to the system clock.
The execution times for the multiplication suggest that the advantages of
GAUSS' assembler code are real but not overwhelming when compared with
modern day optimising compilers. Although the performance of the GAUSS
intrinsics on the 486 is particularly impressive, the times for the Glocken-
spiel/Microsoft combination are equally so for code written in an object-oriented
system. Given the advantages of developing code in a higher level language the
argument for dropping down to assembler is not strong. It has to be stressed
that the implementation of matrix multiplication in MatClass does not involve
any fancy tricks.
Comparing Glockenspiel and GAUSS confirms the trade-off between compi-
lation times and execution speeds for low-level code. Consideration of the
initialisation times suggests the running of low-level code in GAUSS is very
slow when compared to fully compiled code. The conclusion is that GAUSS
gives good execution times as long as the main computations can be completed
using its intrinsics, but if one needs any low-level code, then GAUSS will not
give efficient run times. In particular, it is not feasible to substitute your own
GAUSS routines for GAUSS intrinsic functions without severe loss of effi-
ciency. In MatClass there is no runtime penalty for replacing or adding new
low-level functions.
The Glockenspiel V1.2 and Microsoft C V5.1 combination previously set the
standard for quality compilation of C++ on PCs. The consistency of this
combination has been impressive and the computational performance first class.
While version 2 of Glockenspiel still generates fine code and has the advantage
of being a port of cfront, the native compilers such as Microsoft v7 and Borland
v3.1 have developed into fine products.
Turning to the performance of the Borland compiler, it can be seen that the tim-
ings are almost as good as Microsoft. In the past the floating-point performance
of the Borland compilers has not been impressive. With this compiler they have
caught up. Furthermore their development environment is easier to use than
Microsoft's workbench - although I have been able to configure the latter better
for my needs.
A comment may be offered on the sizes of compiled code. There is clearly a
danger that significant applications based on MatClass will soon run into memory
difficulties on PCs running DOS. There is a number of ways of overcoming this
limitation. Most of the compilers have a version that supports, even if they do
not provide, a DOS extender, and this is likely to be the best solution. And there
are smart linkers which do not link redundant code; for example the JPI DOS
compiler's version of the multiply program was approximately 48K, half the
size of the others. One interesting possibility is promised by the new Microsoft
C/C++ version 7 compiler that allows the use of p-code for modules that
are non time-critical. And of course there are overlay systems like Borland's
"VROOM"! The size of the HP code is not vastly greater than the PC code
despite it being based on RISC.
As a C++ class, the essential ingredient for the use of MatClass is a computer
running a C++ compiler. The rapid adoption of the language by the industry means
C++ is well supported on most popular systems, including PCs, Macintoshes, and
Unix workstations. It is likely that the minimal requirement for any significant use
will be a 386SX PC with a hard disk, or machines of comparable performance. The
author must confess to doing some of the original development of MatClass using
Glockenspiel on a humble domestic PC/XT clone, where compilation times were
something of an impediment and nearly impossible with later compilers. With the
current standard of a 486 machine with disk cache and graphics accelerator, however,
these compiler lags are rapidly becoming insignificant.
11. AVAILABILITY
The source code of MatClass is available from the author. The source itself is in the
public domain in the spirit of the Free Software Foundation; that is, the source code
for MatClass will be covered by a free licence which safeguards the free availability
of that code. My own experience with free software - Kermit, Emacs and TEX-
suggests that it works best when someone publishes, in a traditional book form, a
guide and manual. It is my intention to do this for MatClass and a draft of such a
text is available. This is also available in electronic form - in raw TEX, DVI, HP
or PostScript - but I retain the copyright to allow future publication. All of the
MatClass files are available from the UTS machine at the Manchester Computing
Centre using anonymous ftp to uts.mcc.ac.uk. Look in the pub subdirectory
for matclass. Alternatively the files can be supplied on disk directly from the author
at a cost to cover expenses.
As free software, there will be no commercial warranty. Nor can I offer free
support as a right. Clearly I wish to make MatClass useful and as correct as possible,
MatClass: A Matrix Class for C++ 171
and toward this end I intend to respond to problems and bug reports as well as I
can within the available resources. I will try to support the use of MatClass with
Microsoft C/C++ v7 and Borland C++ Version 3.1 for DOS and HP's CC on their
series 700 workstations. I will also be able to give limited support to Glockenspiel
C++ Version 2.0d2 for DOS, but beyond that I can only make available pointers on
loading MatClass on other systems.
12. BIBLIOGRAPHY
The full use of MatClass requires some understanding of C++. Probably the best
overall introduction is Stanley B. Lippman's C++ Primer [10]. Beyond that, consider
Programming in C++ by Stephen C. Dewhurst and Kathy T. Stark [3]. Look to
Lippman for further information on control structures such as for loops and if
statements and for the general structure of functions and methods in C++. Dewhurst
and Stark give an excellent introduction to the potential offered by C++'s object
orientation. To go further yet see the excellent Advanced C++ Programming Styles
and Idioms by Coplien [2].
The two "bibles" on C++ are authored by the originator of the langauge, Bjarne
Stroustrup. The C++ programming Language[14] was the effective definition of
the first version of the language. With coauthor Margaret Ellis, The Annotated C++
Reference Manual [6] is a draft of an ANSI standard definition of the second version
of the language. Neither of these texts is for the casual reader.
The author must acknowledge the influence of Bruce Eckel's Using C++ [5] and
Scott Ladd's C++ : Techniques and Applications [9].
See Booch [1] and Meyer [11] for a discussion of object-oriented software design.
REFERENCES
13. W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling. Numerical Recipes in C,
Cambridge University Press, Cambridge, 1988.
14. B. Stroustrup. The C++ Programming Language, First Edition, Addison-Wesley, Read-
ing Mass., 1986.
ISMAIL CHABINI, OMAR DRISSI-KAITOUNI AND MICHAEL FLORIAN
1. INTRODUCTION
The recent development and use of computing platforms based on parallel processing
architecture has had a major impact on many fields of science and economics that re-
quire intensive computations (Bertsekas and Tsitsiklis, 1989; Zenios, 1989; Pardalos
et al., 1990). The efficient use of various parallel computing architectures requires the
development of new code that takes advantage of the development tools and compil-
ers that are available. In this paper, we report on parallel implementations of primal
and dual algorithms for matrix balancing problems on a network of Transputers.
This is probably one of the least costly Multiple Instruction Multiple Data (MIMD)
(Flynn, 1972) parallel computing platforms available and is best suited for "coarse
grain" parallelization of sequential codes. Nevertheless, our experiments indicate
that significant gains in speed and efficiency accompany its use when compared to
the execution of the sequential code on a single Transputer.
This paper is organized as follows. In the following Section, the matrix balancing
problem is described. Sections 3 and 4 present primal and dual algorithms for this
problem. Then, the parallel versions of these algorithms are described in Section 5.
The computational results obtained and their evaluation are given in Section 6. Our
conclusions and views on further development of parallel computing implementations
of matrix balancing problems comprise the last Section.
The matrix balancing problem that we consider can be defined as follows: given an
n x m nonnegative matrix g^0, supply and demand vectors O and D of dimensions n
and m, respectively, an n x m matrix g* is sought that satisfies
$$\left(\sum_{i=1}^{n} O_i = \sum_{j=1}^{m} D_j = T\right).$$
where the constraint $\sum_{i=1}^{n} x_{im} = D_m$ has been removed, since (5) is a linear system of rank
$(n + m - 1)$ and is redundant. By using the lexicographic ordering of the subscripts
$$Ay = 0, \tag{7}$$
where A is the node-arc incidence matrix with destination m removed.
The K-K-T necessary and sufficient conditions for (6)-(7) are
$$y - x + A^{T}\lambda = 0, \qquad Ay = 0. \tag{8}$$
It is relatively easy to show that the vector of dual variables $\lambda$ associated with the
constraints of (7) is given by the formula
(9)
Drissi (1991) was the first to find an explicit analytical expression for $(AA^{T})^{-1}$ in
order to compute $\lambda$. In the following, we develop another method for solving (9),
which has the advantage of showing why (9) is easy to solve for bipartite networks.
A byproduct of this analysis is an explicit expression of y as a function of x which is
easy to manipulate. We present first some preliminary results.
Consider the following linear system:
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \tag{10}$$
where $X_1$, $X_2$, $B_1$ and $B_2$ are column vectors, and $A_{11}$, $A_{12}$, $A_{21}$ and $A_{22}$ denote
submatrices of appropriate dimensions.
The system (10) is equivalent to
(11)
(12)
The indices belonging to the set 1 correspond to origins and those belonging to set 2
to the (m - 1) destinations.
In the linear system (11), the matrix Q is given by
$$Q = n I_{m-1,m-1} - \frac{1}{m}\, U_{m-1,n} U_{n,m-1},$$
$$Q = n\left(I_{m-1,m-1} - \frac{1}{m}\, U_{m-1,m-1}\right),$$
$$Q = n\begin{bmatrix} 1-\frac{1}{m} & -\frac{1}{m} & \cdots & -\frac{1}{m}\\ -\frac{1}{m} & 1-\frac{1}{m} & \cdots & -\frac{1}{m}\\ \vdots & & \ddots & \vdots\\ -\frac{1}{m} & -\frac{1}{m} & \cdots & 1-\frac{1}{m} \end{bmatrix}.$$
We note that the columns of Q are permutations of the first one. This is due to
the topology of the network. Since it is a complete bipartite network, the different
destinations may be permuted without changing the "drawing" of the graph. Due to
its special structure, Q has an easy inverse.
Proposition 1
$$Q^{-1} = \frac{1}{n}\left(I_{m-1,m-1} + U_{m-1,m-1}\right).$$
Proof
$$QQ^{-1} = I_{m-1,m-1} - \frac{1}{m}U_{m-1,m-1} + U_{m-1,m-1} - \frac{1}{m}U_{m-1,m-1}U_{m-1,m-1};$$
since $U_{m-1,m-1}U_{m-1,m-1} = (m-1)U_{m-1,m-1}$, it follows that
$$QQ^{-1} = I_{m-1,m-1} + \frac{m-1}{m}U_{m-1,m-1} - \frac{m-1}{m}U_{m-1,m-1},$$
$$QQ^{-1} = I_{m-1,m-1}. \qquad \Box$$
$$\lambda_{1i} = \frac{1}{nm}\sum_{k=1}^{m}\sum_{l=1}^{n} x_{lk} - \frac{1}{m}\sum_{k=1}^{m} x_{ik} - \frac{1}{n}\sum_{l=1}^{n} x_{lm}, \qquad i = 1, \ldots, n, \tag{14}$$
$$\lambda_{2j} = \frac{1}{n}\left(\sum_{l=1}^{n} x_{lj} - \sum_{l=1}^{n} x_{lm}\right), \qquad j = 1, \ldots, m. \tag{15}$$
Proof
We evaluate R of (12) by using the above result:
$$\lambda_2 = Q^{-1}R,$$
$$\lambda_2 = \frac{1}{n}\left([Ax]_2 + U_{m-1,m-1}[Ax]_2 + U_{m-1,n}[Ax]_1\right).$$
For a destination node $j \in \{1, 2, \ldots, m-1\}$,
$$\lambda_{2j} = \frac{1}{n}\left(\sum_{l=1}^{n} x_{lj} + \sum_{k=1}^{m-1}\sum_{l=1}^{n} x_{lk} - \sum_{l=1}^{n}\sum_{k=1}^{m} x_{lk}\right),$$
$$\lambda_{2j} = \frac{1}{n}\left(\sum_{l=1}^{n} x_{lj} - \sum_{l=1}^{n} x_{lm}\right).$$
We recall that the node $j = m$ corresponds to the redundant constraint that was
dropped; its dual variable $\lambda_{2m} = 0$. The expression (15) holds for $j = m$ as well.
Hence
$$\lambda_{2j} = \frac{1}{n}\left(\sum_{l=1}^{n} x_{lj} - \sum_{l=1}^{n} x_{lm}\right), \qquad j = 1, \ldots, m.$$
In order to evaluate $\lambda_1$, the vector of dual variables (prices) associated with the source
nodes, one can apply (12) to obtain
$$\lambda_1 = \frac{1}{m}\, I_{n,n}\left([Ax]_1 - (-U_{n,m-1})\lambda_2\right),$$
$$\lambda_1 = \frac{1}{m}\left([Ax]_1 + U_{n,m-1}\lambda_2\right).$$
Let $i \in \{1, 2, \ldots, n\}$ represent an origin node. Since $\lambda_{2m} = 0$,
$$\lambda_{1i} = \frac{1}{m}\left([Ax]_{1i} + \sum_{k=1}^{m}\lambda_{2k}\right). \tag{16}$$
By using (15),
$$\sum_{k=1}^{m}\lambda_{2k} = \frac{1}{n}\sum_{k=1}^{m}\sum_{l=1}^{n} x_{lk} - \frac{m}{n}\sum_{l=1}^{n} x_{lm}.$$
By replacing $\sum_{k=1}^{m}\lambda_{2k}$ with this expression in (16), one obtains (14). $\Box$
Now, we can obtain an explicit expression for y as a function of x. The linear
system (8) implies that
$$y = x - A^{T}\lambda,$$
The primal algorithm that is implied by these results may be stated as follows:
Step 0 (Initialization):
$$g^{r}_{ij} = \frac{O_i D_j}{T} \quad \text{for all } (i, j); \qquad r = 1.$$
Step 1 (Computation of the gradient):
$$\alpha_r = \arg\min_{0 \le \alpha \le \alpha_{\max}} F\left(g^{r} - \alpha p^{r}\right)$$
The classical method for solving (1)-(3) is a dual method that dates back to 1937
(Kruithof, 1937) and was generalized for general linear constraints by Bregman
(1967a, 1967b). It is used widely in transportation planning applications (Wilson,
1967; Evans, 1967; Robillard and Stewart, 1974). It is also known as the RAS
algorithm (Bachem and Korte, 1979), since it alternates between scaling rows and
columns, which is equivalent to premultiplying by R and postmultiplying by S to
obtain the balanced matrix (R and S are diagonal matrices). The algorithm may
be viewed as a coordinate ascent method for the Lagrangean dual of the entropy
optimization problem [Cottle, Duval and Zikan (1986), Schneider and Zenios (1989)].
The algorithm may be stated as follows:
RAS Algorithm
Step 0 (Initialization):
We stated the RAS algorithm in a form chosen both for computational
efficiency and numerical stability. The standard statement of the algorithm may be
found, for instance, in Schneider and Zenios (1990).
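For concreteness, a compact serial sketch of one form of the row and column scaling iteration is given below, written in C++ with the paper's O, D and T notation. It is illustrative only, is not the authors' implementation, and uses a simplified convergence test.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One serial RAS (biproportional scaling) routine:
// given a nonnegative seed matrix g0 (n x m), supplies O and demands D
// with sum(O) == sum(D) == T, alternately rescale rows and columns until
// both sets of marginals are (approximately) satisfied.
std::vector<std::vector<double>> ras(std::vector<std::vector<double>> g,
                                     const std::vector<double>& supplyO,
                                     const std::vector<double>& demandD,
                                     double tol = 1e-8, int maxIter = 1000) {
    const std::size_t n = g.size();
    const std::size_t m = n ? g[0].size() : 0;
    for (int it = 0; it < maxIter; ++it) {
        // Scale rows to match the supplies O_i (the R factors).
        for (std::size_t i = 0; i < n; ++i) {
            double rowSum = 0.0;
            for (std::size_t j = 0; j < m; ++j) rowSum += g[i][j];
            const double Ri = (rowSum > 0.0) ? supplyO[i] / rowSum : 0.0;
            for (std::size_t j = 0; j < m; ++j) g[i][j] *= Ri;
        }
        // Scale columns to match the demands D_j (the S factors),
        // recording the worst column mismatch seen after the row step.
        double worst = 0.0;
        for (std::size_t j = 0; j < m; ++j) {
            double colSum = 0.0;
            for (std::size_t i = 0; i < n; ++i) colSum += g[i][j];
            worst = std::max(worst, std::fabs(colSum - demandD[j]));
            const double Sj = (colSum > 0.0) ? demandD[j] / colSum : 0.0;
            for (std::size_t i = 0; i < n; ++i) g[i][j] *= Sj;
        }
        if (worst < tol) break;   // both marginals now hold to within tol
    }
    return g;
}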
As in the previous steps, each processor p computes the part of the sum for indices
$j \in J_p$. The communication and summation is done as in step 2.
The coarse grain parallel implementation of the RAS algorithm consists in the
decomposition of the computations of steps 1 and 2 over the processors. In step 1,
processor p computes $S_j$ for $j \in J_p$, where the sets $J_p$ form a partition of the columns
j = 1,2, ... , m and have approximately the same cardinality. The communication
of the partial results is done as follows: processors exchange results by successively
sending and receiving information to and from their neighbors, in a ring network
topology (Bertsekas and Tsitsiklis, 1989).
The same approach is taken to partition the computation of the Ri in step 2 of the
algorithm.
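A small sketch of the bookkeeping just described is given below (illustrative only; the Transputer communication code itself is not reproduced, and all names are invented): the columns are split into P nearly equal contiguous blocks J_p, and a helper records which block a processor holds after k forwarding steps around the ring.

#include <cstddef>
#include <utility>
#include <vector>

// Split column indices 0..m-1 into P nearly equal contiguous blocks J_p.
// Block p is the half-open range [first, last); the first (m % P) blocks
// are one column larger than the rest.
std::vector<std::pair<std::size_t, std::size_t>> partitionColumns(std::size_t m,
                                                                  std::size_t P) {
    std::vector<std::pair<std::size_t, std::size_t>> J(P);
    const std::size_t base = m / P, extra = m % P;
    std::size_t first = 0;
    for (std::size_t p = 0; p < P; ++p) {
        const std::size_t size = base + (p < extra ? 1 : 0);
        J[p] = {first, first + size};
        first += size;
    }
    return J;
}

// In the ring exchange, after k forwarding steps processor p holds the block
// originally computed by processor (p - k) mod P; after P - 1 steps every
// processor has seen every block.
std::size_t blockHeldAfter(std::size_t p, std::size_t k, std::size_t P) {
    return (p + P - k % P) % P;
}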
This is illustrated by the following example. Let the number of processors be 4
and the number of columns be 8. Each processor is assigned the computation of the
$S_j$'s for two columns. Processor 1 sends the values of $S_1$ and $S_2$, processor 2 sends
the values of $S_3$ and $S_4$, and so on. Each processor then informs its neighbor of the
values received. For this example, processor 1 would send to processor 2 the values
of $S_7$ and $S_8$ that were received from processor 4, processor 2 sends the values of $S_1$
and $S_2$, and so on. This is illustrated in the figure below.
6. NUMERICAL RESULTS
The increases in speed obtained with the primal method are reported in Table 1.
The best increase is 11.55 and is obtained with 12 processors. The results obtained
with the dual algorithm are reported in Table 2. Here the best increase is 5.03 using
16 processors.
These results indicate that the primal method benefits more from coarse grain
parallelization. In addition, the convergence of the primal method is far superior to
the dual method on ill conditioned examples such as that taken from Robillard and
Stewart (1974). The data for the problem are
$$g^{0} = \begin{bmatrix} 0.001 & 1 \\ 1 & 1 \end{bmatrix}.$$
The RAS method produces the following solution after 150 iterations:
TABLE 1
Speedups obtained with the projected gradient algorithm.

Projected gradient method
NPROC                              2     3     4     6     8     10    12     14     16
Function + gradient evaluations    2     3     4     5.98  8     10    12     14     16
Gradient projection                1.99  2.85  3.90  5.56  6.91  8.12  8.84   9.50   9.95
Line search                        2     3     4     6     7.79  9.81  11.85  10.46  10.95
Total algorithm                    2     3     4     6     7.69  9.63  11.55  10.46  10.98
TABLE 2
Speedups obtained with the RAS method.

Balancing method
NPROC     2     3     4     6     8     10    12    14    16
Speedup   1.74  2.34  2.81  3.52  4.00  4.38  4.64  4.84  5.03
Most of the computational time for the primal method is needed in the line search,
8.71 sec. out of 15.96 sec. The time required for the projection is 1.70 sec. and for
the cost evaluation, 5.55 sec.
The ratio of the computational times, per iteration, of the parallelized algorithms
is 3.28 in favor of the RAS algorithm. Hence, if the primal algorithm is to be
competitive with the RAS method, it should converge to an acceptable solution in
less than an average of 3.28 iterations. The preliminary tests given in this paper
indicate that the primal algorithm converges faster than the RAS method for some
ill conditioned examples. We note that the stopping criteria for the primal and dual
methods are different. The pattern that we observe is that the RAS method obtains
an objective value very close to that of the primal algorithm, but the values of the
variables are relatively far from their optimal values.
7. CONCLUSION
We present in this paper, to the best of our knowledge, the first parallel computing imple-
mentations of a matrix balancing algorithm on an MIMD computing platform. The
results obtained indicate excellent gains in speed for the primal algorithm, since the
computing tasks are relatively "coarse grained" and well suited for this architecture.
The gains in speed for the RAS algorithm may possibly be improved by an asyn-
chronous implementation in which each processor does not wait to obtain the most
"current" scaling factors before starting another balancing iteration. We intend to
report on further work in this area in future articles.
REFERENCES
5. Bregman, L., "Proof of the convergence of Sheleikhovskii's method for a problem with
transportation constraints", USSR Computational Math. and Mathematical Phys., 7,
pp. 191-204, 1967.
6. Cottle, R.W., Duval, S.G. and Zikan, K., "A Lagrangean relaxation algorithm for the
constrained matrix problem", Naval Research Logistics Quarterly, 33, pp. 55-76, 1986.
7. Drissi-Kaitouni, O., "A projective method for bipartite networks and application to the
matrix estimation and transportation problems", Publication #766, Centre for Research
on Transportation, Universite de Montreal, 1991.
8. Evans, A.W., "Some properties of trip distribution methods", Transportation Research,
4, pp. 19-36, 1970.
9. Evans, S.P. and Kirby, H.R., "A three-dimensional Furness procedure for calibrating
gravity models", Transportation Research, 8, pp. 105-122, 1974.
10. Flynn, M.J., "Some computer organizations and their effectiveness", IEEE Transactions
on Computers, C-21(9), pp. 948-960, 1972.
11. Furness, K.P., "Trip forecasting", Unpublished paper cited by Evans and Kirby, 1974.
12. Kruithof, J., "Calculation of telephone traffic", De ingenieur, 52, pp. 15-25, 1937.
13. Lent, A, "A convergent algorithm for maximum entropy image restoration, with a
medical X-ray application", SPSE Conference Proceedings, Toronto, Canada, 1976.
14. Pardalos, P.M., Phillips, A.T. and Rosen, J.B., "Topics in parallel computing in math-
ematical programming", Department of Computer Science, The Pennsylvania State
University, 1990.
15. Robillard, P. and Stewart, N.F., "Iterative numerical methods for trip distribution prob-
lems", Transportation Research, 8, pp. 575-582, 1974.
16. Schneider, M.H. and Zenios, S.A., "A comparative study of algorithms for matrix bal-
ancing", Operations Research, 38, 1990.
17. Wilson, A.G., "Urban and Regional Models in Geography and Planning", Wiley, New
York,1974.
18. Zenios, S.A., "Parallel numerical optimization: current status and an annotated bibliog-
raphy", ORSA Journal on Computing, vol. 1, No.1, 1989.
19. Zenios, S.A. and Iu, S.L., "Vector and parallel computing for matrix balancing", Annals
of Operations Research, 22, pp. 161-180, 1990.
20. Zenios, S.A., "Matrix balancing on a massively parallel connection machine", ORSA
Journal on Computing, 2, pp. 112-125, 1990.
PART FOUR
ABSTRACT. In this paper we develop a financial model of competitive sectors in the presence
of policy interventions in the form of taxes and price ceilings. The model yields the equilibrium
asset, liability, and financial instrument price pattern. First, the variational inequality formula-
tion of the equilibrium conditions is derived and then utilized to obtain qualitative properties
of the equilibrium pattern. We then propose a computational procedure and establish conver-
gence results. The algorithm decomposes the large-scale problems into network subproblems
of special structure, each of which can then be solved exactly (and simultaneously) in closed
form. Numerical results are also presented to illustrate the algorithm's performance.
1. INTRODUCTION
In this paper we develop a framework for the formulation, analysis, and computation
of competitive financial models in the presence of policy interventions. The policy
interventions that are considered are taxes and price controls.
Financial theory since the seminal work of Markowitz (1959) and Sharpe (1970)
has been principally concerned with the portfolio optimization problem facing a
single sector. Here, by contrast, we focus on the competitive equilibrium problem in
which there are multiple sectors, each with its particular optimization problem. We
assume that each sector seeks to minimize its risk while maximizing its net return
in the presence of both taxes and price ceilings. Under the assumption of perfect
competition, each sector in the economy takes the instrument prices as given and
then determines its optimal composition of both assets and liabilities subject to an
accounting identity. In the model, the instrument prices serve as market signals which,
in turn, reflect the economic market conditions that state that if an instrument price at
equilibrium lies within the bounds, then the market for the financial instrument must
clear.
The theoretical developments in this paper are done using variational inequality
theory. We note that variational inequality theory has been used to study a plethora
of problems, including oligopolistic market equilibrium problems (cf. Gabay and
Moulin, 1980; Dafermos and Nagurney, 1987), spatial price equilibrium problems (cf.
Florian and Los, 1982; and Dafermos and Nagurney, 1984), and general economic
equilibrium problems (cf. Border, 1985; Dafermos, 1990; and Dafermos and Zhao,
1991). In addition, variational inequality theory has been used recently to investigate
the effects of policy interventions in the form of price controls in commodity markets
(cf. Nagurney and Zhao, 1991). These, however, have been partial equilibrium
models in which only a subset of agents/commodities has been treated. In this paper
we demonstrate that variational inequality theory can also be used to study policy
interventions in general equilibrium problems, in particular, in general financial
equilibrium problems.
The paper is organized as follows. In Section 2 we introduce the model, consisting
of multiple sectors and multiple financial instruments that can be held as assets and/or
as liabilities in the presence of taxes and price ceilings. The model postulates the
behavior of the sectors and, in equilibrium, yields the competitive asset and liability
holdings as well as the instrument prices. The variational inequality formulation of
the equilibrium conditions is derived and then used to study the qualitative properties.
We first show that a solution is guaranteed to exist and then establish uniqueness of
the equilibrium asset and liability pattern.
In Section 3 we propose an algorithm for the computation of the equilibrium
pattern. The algorithm is the modified projection method of Korpelevich (1977),
which is shown to converge for our model. The notable feature of this decomposition
algorithm is that it resolves the large-scale financial problem into simple network
subproblems of special structure, each of which can be solved simultaneously and
exactly in closed form using exact equilibration algorithms (cf. Eydeland and Nagur-
ney, 1989). In Section 4 we conduct numerical experiments with the algorithm
on a variety of examples. In Section 5 we summarize the results and present our
conclusions.
Minimize
$$\begin{pmatrix} x_i \\ y_i \end{pmatrix}^{T} Q^{i} \begin{pmatrix} x_i \\ y_i \end{pmatrix} \;-\; \sum_{j=1}^{n} (1 - \tau_{ij})\, r_j\, (x_{ij} - y_{ij})$$
subject to
$$\sum_{j=1}^{n} x_{ij} = S_i, \qquad \sum_{j=1}^{n} y_{ij} = S_i, \tag{1}$$
$$x_{ij} \ge 0, \quad y_{ij} \ge 0, \qquad j = 1, \ldots, n. \tag{2}$$
Constraints (1) represent the accounting identities that require that the accounts
for sector i must balance, where Si is the total financial volume held by sector i.
Constraint (2) represents the nonnegativity assumption. We let Pi denote the closed
convex set of $(x_i, y_i)$ satisfying constraints (1) and (2).
Since Qi is a variance-covariance matrix, we will assume it is positive-definite,
and therefore the objective function for each sector is strictly convex. Thus, the
necessary and sufficient conditions for an optimal portfolio are that $(x_i, y_i) \in P_i$
must satisfy the following system of inequalities and equalities:
For each instrument $j = 1, \ldots, n$,
$$2(Q^{i}_{(11)j})^{T} x_i + 2(Q^{i}_{(21)j})^{T} y_i - (1 - \tau_{ij}) r_j - \mu^{1}_{i} \ \ge\ 0,$$
$$2(Q^{i}_{(22)j})^{T} y_i + 2(Q^{i}_{(12)j})^{T} x_i + (1 - \tau_{ij}) r_j - \mu^{2}_{i} \ \ge\ 0,$$
$$x_{ij} \cdot \left(2(Q^{i}_{(11)j})^{T} x_i + 2(Q^{i}_{(21)j})^{T} y_i - (1 - \tau_{ij}) r_j - \mu^{1}_{i}\right) = 0, \tag{3}$$
$$y_{ij} \cdot \left(2(Q^{i}_{(22)j})^{T} y_i + 2(Q^{i}_{(12)j})^{T} x_i + (1 - \tau_{ij}) r_j - \mu^{2}_{i}\right) = 0,$$
where the symmetric matrix $Q^{i}$ has been partitioned as $Q^{i} = \begin{pmatrix} Q^{i}_{11} & Q^{i}_{12} \\ Q^{i}_{21} & Q^{i}_{22} \end{pmatrix}$, and
$Q^{i}_{(\alpha\beta)j}$ denotes the $j$-th column of $Q^{i}_{(\alpha\beta)}$, with $\alpha = 1, 2$; $\beta = 1, 2$. The terms $\mu^{1}_{i}$ and
$\mu^{2}_{i}$ are the Lagrange multipliers of constraints (1). A similar set of inequalities and
equalities will hold for each of the m sectors.
We now describe the inequalities governing the instrument prices in the economy.
Note that the prices provide feedback to the sectors through the objective function.
We assume that there is free disposal and, hence, the instrument prices will be
nonnegative. Mathematically, the economic system conditions are thus:
For each instrument $j = 1, \ldots, n$,
$$\sum_{i=1}^{m}(1-\tau_{ij})\left(x_{ij} - y_{ij}\right) \;\begin{cases} \le 0, & \text{if } r_j = \bar r_j \\ = 0, & \text{if } 0 < r_j < \bar r_j \\ \ge 0, & \text{if } r_j = 0. \end{cases} \tag{4}$$
Therefore, if there is an effective excess supply of any instrument in the economy,
its price must be zero; if an instrument's price is positive (but not at the ceiling), the
market for that instrument must clear; and, if there is an effective excess demand for
an instrument in the economy, its price must be at the ceiling.
Let $S \equiv \{r \mid 0 \le r \le \bar r\}$ and $K \equiv \prod_{i=1}^{m} P_i \times S$, and combine the above sector
and market inequalities and equalities to obtain
Definition 1.
A vector $(x, y, r) \in K$ is an equilibrium point of the competitive financial model
with policy interventions developed above if and only if it satisfies the system of
equalities and inequalities (3) and (4) for all sectors i = 1, ... , m and for all instru-
ments j = 1, ... , n simultaneously.
Theorem 1.
A vector $(x, y, r)$ of assets and liabilities of the sectors and instrument prices is a
competitive financial equilibrium with policy interventions if and only if it satisfies
the variational inequality problem:
Find $(x, y, r) \in K$ satisfying
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(11)j})^{T} x_i + (Q^{i}_{(21)j})^{T} y_i\right) - (1-\tau_{ij}) r_j\right] \times \left[x'_{ij} - x_{ij}\right]$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(22)j})^{T} y_i + (Q^{i}_{(12)j})^{T} x_i\right) + (1-\tau_{ij}) r_j\right] \times \left[y'_{ij} - y_{ij}\right]$$
$$+\ \sum_{j=1}^{n}\left[\sum_{i=1}^{m}(1-\tau_{ij})\left(x_{ij} - y_{ij}\right)\right] \times \left[r'_j - r_j\right] \ \ge\ 0, \qquad \forall (x', y', r') \in K. \tag{5}$$
Proof: Assume that $(x, y, r) \in K$ is an equilibrium point. Then inequalities (3) and
(4) hold for all $i$ and $j$. Hence, one has
$$\sum_{j=1}^{n}\left[2\left((Q^{i}_{(11)j})^{T} x_i + (Q^{i}_{(21)j})^{T} y_i\right) - (1-\tau_{ij}) r_j - \mu^{1}_{i}\right] \times \left[x'_{ij} - x_{ij}\right] \ \ge\ 0, \tag{6}$$
$$\sum_{j=1}^{n}\left[2\left((Q^{i}_{(22)j})^{T} y_i + (Q^{i}_{(12)j})^{T} x_i\right) + (1-\tau_{ij}) r_j - \mu^{2}_{i}\right] \times \left[y'_{ij} - y_{ij}\right] \ \ge\ 0. \tag{7}$$
Summing inequalities (6) and (7) over $i$ (the multiplier terms vanish because both $(x'_i, y'_i)$ and $(x_i, y_i)$ satisfy the accounting identities (1)), one concludes that for $(x', y') \in \prod_{i=1}^{m} P_i$,
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(11)j})^{T} x_i + (Q^{i}_{(21)j})^{T} y_i\right) - (1-\tau_{ij}) r_j\right] \times \left[x'_{ij} - x_{ij}\right]$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(22)j})^{T} y_i + (Q^{i}_{(12)j})^{T} x_i\right) + (1-\tau_{ij}) r_j\right] \times \left[y'_{ij} - y_{ij}\right] \ \ge\ 0. \tag{8}$$
" (1 - 7.--)
'L..J 'J
(x··
'J
- y''J.) x (r'J - rJ >0
.) - (9)
i=1
for all r' E S. Summing inequalities (8) and (10) produces the variational inequality
(5).
We now establish that a solution to variational inequality (5) will also satisfy
equilibrium conditions (3) and (4). If $(x, y, r) \in K$ is a solution of variational
inequality (5) and one lets $x'_i = x_i$, $y'_i = y_i$, for all $i$, one obtains
$$\sum_{j=1}^{n}\sum_{i=1}^{m}(1-\tau_{ij})\left(x_{ij} - y_{ij}\right) \times \left(r'_j - r_j\right) \ \ge\ 0; \tag{11}$$
letting instead $r' = r$ yields
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(11)j})^{T} x_i + (Q^{i}_{(21)j})^{T} y_i\right) - (1-\tau_{ij}) r_j\right] \times \left[x'_{ij} - x_{ij}\right]$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(22)j})^{T} y_i + (Q^{i}_{(12)j})^{T} x_i\right) + (1-\tau_{ij}) r_j\right] \times \left[y'_{ij} - y_{ij}\right] \ \ge\ 0. \tag{12}$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(22)j})^{T} y^{1}_i + (Q^{i}_{(12)j})^{T} x^{1}_i\right) + (1-\tau_{ij}) r^{1}_j\right] \times \left[y'_{ij} - y^{1}_{ij}\right]$$
(14)
But since each $Q^{i}$ ($i = 1, \ldots, m$) is positive-definite, the left-hand side of (15) must
be nonpositive and, hence, we can conclude that $x^1 = x^2$ and $y^1 = y^2$. The proof is
complete.
In the special case in which there are no taxes or price ceilings imposed, the above
model collapses to the model developed in Nagurney, Dong, and Hughes (1992). In
this case condition (4) simplifies as follows. Since $\tau_{ij} = 0$ for all $i$ and $j$, and $\bar r_j$ is
effectively set at infinity for all j, only the equality and the second inequality would
apply in conditions (4). In addition, in the model without policies, the set S would
no longer be bounded, and, hence, the feasible set K would no longer be compact.
Hence, another existence proof would be required.
3. THE ALGORITHM
The Algorithm
Step 0: Initialization:
Set $(x^0, y^0, r^0) \in K$. Let $k = 1$. Let $\alpha$ be a positive scalar.
Step 1: Compute $(\bar x^{k}, \bar y^{k}, \bar r^{k}) \in K$ as the solution of
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\left[\bar x^{k}_{ij} + \alpha\left(2\left((Q^{i}_{(11)j})^{T} x^{k-1}_i + (Q^{i}_{(21)j})^{T} y^{k-1}_i\right) - (1-\tau_{ij}) r^{k-1}_j\right) - x^{k-1}_{ij}\right] \times \left[x'_{ij} - \bar x^{k}_{ij}\right]$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[\bar y^{k}_{ij} + \alpha\left(2\left((Q^{i}_{(22)j})^{T} y^{k-1}_i + (Q^{i}_{(12)j})^{T} x^{k-1}_i\right) + (1-\tau_{ij}) r^{k-1}_j\right) - y^{k-1}_{ij}\right] \times \left[y'_{ij} - \bar y^{k}_{ij}\right]$$
$$+\ \sum_{j=1}^{n}\left[\bar r^{k}_{j} + \alpha\sum_{i=1}^{m}(1-\tau_{ij})\left(x^{k-1}_{ij} - y^{k-1}_{ij}\right) - r^{k-1}_{j}\right] \times \left[r'_j - \bar r^{k}_{j}\right] \ \ge\ 0, \qquad \forall (x', y', r') \in K. \tag{16}$$
Step 2: Compute $(x^{k}, y^{k}, r^{k}) \in K$ as the solution of
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\left[x^{k}_{ij} + \alpha\left(2\left((Q^{i}_{(11)j})^{T} \bar x^{k}_i + (Q^{i}_{(21)j})^{T} \bar y^{k}_i\right) - (1-\tau_{ij}) \bar r^{k}_j\right) - x^{k-1}_{ij}\right] \times \left[x'_{ij} - x^{k}_{ij}\right]$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[y^{k}_{ij} + \alpha\left(2\left((Q^{i}_{(22)j})^{T} \bar y^{k}_i + (Q^{i}_{(12)j})^{T} \bar x^{k}_i\right) + (1-\tau_{ij}) \bar r^{k}_j\right) - y^{k-1}_{ij}\right] \times \left[y'_{ij} - y^{k}_{ij}\right]$$
$$+\ \sum_{j=1}^{n}\left[r^{k}_{j} + \alpha\sum_{i=1}^{m}(1-\tau_{ij})\left(\bar x^{k}_{ij} - \bar y^{k}_{ij}\right) - r^{k-1}_{j}\right] \times \left[r'_j - r^{k}_{j}\right] \ \ge\ 0, \qquad \forall (x', y', r') \in K. \tag{17}$$
Convergence Verification:
If $\max_{ij}\left|x^{k}_{ij} - x^{k-1}_{ij}\right| \le \epsilon$, $\max_{ij}\left|y^{k}_{ij} - y^{k-1}_{ij}\right| \le \epsilon$, and $\max_{j}\left|r^{k}_{j} - r^{k-1}_{j}\right| \le \epsilon$, with $\epsilon$ some
positive preselected tolerance, then stop; else, set $k = k + 1$, and go to Step 1.
We now give an interpretation of the above algorithm as an adjustment process.
In (16), each sector $i$ at each time period $k$ receives the instrument price signals $r^{k-1}$
and determines its optimal asset and liability pattern $\bar x^{k}_i$, $\bar y^{k}_i$. At the same time, the
system determines the prices $\bar r^{k}$ in response to the difference of the total effective
volume of each instrument held as an asset minus the total effective volume held as
a liability at time period k - 1. The agents and the system then improve upon their
approximations through the solution of (17). The process continues until stability is
reached; that is, until the asset and liability volumes and the instrument prices change
negligibly between time periods.
Observe now that both (16) and (17) are equivalent to optimization problems, in
particular to quadratic programming problems of the form
$$\text{Minimize}_{X \in K}\ X^{T} X + h^{T} X, \tag{18}$$
where $X \equiv (x, y, r) \in R^{2mn+n}$ and $h \in R^{2mn+n}$ consists of the fixed linear terms
in the inequality subproblems (16) and (17). Moreover, problem (18) is separable
in x, Y and r, and, in view of the feasible set, has the network structure depicted in
Figure 1.
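For the price variables, for example, the subproblem separates into a componentwise projection onto the box $S = [0, \bar r]$. The sketch below is a hedged illustration of that closed-form step as it would appear in (16); it is not the authors' FORTRAN code, and all names are invented.

#include <algorithm>
#include <cstddef>
#include <vector>

// Closed-form solution of the price part of the quadratic subproblem:
// each rBar_j minimizes (rBar_j - z_j)^2 over [0, rCeil_j], where
// z_j = rPrev_j - alpha * sum_i (1 - tau_ij) * (xPrev_ij - yPrev_ij).
std::vector<double> priceStep(const std::vector<double>& rPrev,
                              const std::vector<double>& rCeil,
                              const std::vector<std::vector<double>>& tau,
                              const std::vector<std::vector<double>>& xPrev,
                              const std::vector<std::vector<double>>& yPrev,
                              double alpha) {
    const std::size_t m = tau.size();      // sectors
    const std::size_t n = rPrev.size();    // instruments
    std::vector<double> rBar(n);
    for (std::size_t j = 0; j < n; ++j) {
        double excess = 0.0;               // effective assets minus liabilities
        for (std::size_t i = 0; i < m; ++i)
            excess += (1.0 - tau[i][j]) * (xPrev[i][j] - yPrev[i][j]);
        const double z = rPrev[j] - alpha * excess;        // unconstrained minimizer
        rBar[j] = std::min(std::max(z, 0.0), rCeil[j]);    // project onto [0, rCeil_j]
    }
    return rBar;
}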
Convergence of the algorithm follows (cf. Korpelevich, 1977) under the as-
sumption that the function F that enters the variational inequality is monotone and
Lipschitz continuous, where $0 < \alpha < \frac{1}{L}$
and L is the Lipschitz constant. We now
prove that these conditions are always satisfied. We first establish monotonicity.
Let $(x^1, y^1, r^1) \in K$ and $(x^2, y^2, r^2) \in K$. In evaluating monotonicity, we must
show that
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(11)j})^{T}(x^{1}_i - x^{2}_i) + (Q^{i}_{(21)j})^{T}(y^{1}_i - y^{2}_i)\right) - (1-\tau_{ij})(r^{1}_j - r^{2}_j)\right] \times \left[x^{1}_{ij} - x^{2}_{ij}\right]$$
$$+\ \sum_{i=1}^{m}\sum_{j=1}^{n}\left[2\left((Q^{i}_{(22)j})^{T}(y^{1}_i - y^{2}_i) + (Q^{i}_{(12)j})^{T}(x^{1}_i - x^{2}_i)\right) + (1-\tau_{ij})(r^{1}_j - r^{2}_j)\right] \times \left[y^{1}_{ij} - y^{2}_{ij}\right]$$
$$+\ \sum_{j=1}^{n}\sum_{i=1}^{m}(1-\tau_{ij})\left[(x^{1}_{ij} - y^{1}_{ij}) - (x^{2}_{ij} - y^{2}_{ij})\right] \times \left[r^{1}_j - r^{2}_j\right] \ \ge\ 0, \tag{19}$$
for all $(x^1, y^1, r^1) \in K$, $(x^2, y^2, r^2) \in K$. After some algebra, the left-hand side of
(19) reduces to
$$2\sum_{i=1}^{m}\begin{pmatrix} x^{1}_i - x^{2}_i \\ y^{1}_i - y^{2}_i \end{pmatrix}^{T} Q^{i} \begin{pmatrix} x^{1}_i - x^{2}_i \\ y^{1}_i - y^{2}_i \end{pmatrix} \ \ge\ 0.$$
Fig. 1. Network structure of the subproblems: asset subproblems ($S_i$), liability subproblems ($S_i$), and price subproblems ($r_j$).
Lemma 1. The function that enters the variational inequality (5) is monotone.
Lemma 2. The function $F(x, y, r)$ that enters variational inequality (5) is Lipschitz
continuous; that is, for all $(x^1, y^1, r^1), (x^2, y^2, r^2) \in K$,
$$\left\|F(x^1, y^1, r^1) - F(x^2, y^2, r^2)\right\| \ \le\ L\left\|(x^1, y^1, r^1) - (x^2, y^2, r^2)\right\|. \tag{21}$$
$$\begin{pmatrix} Q & TB \\ -(TB)^{T} & 0 \end{pmatrix} \tag{22}$$
and
Q =     (23)
$$T = \begin{pmatrix} 1-\tau_{11} & & \\ & \ddots & \\ & & 1-\tau_{mn} \end{pmatrix}, \qquad B = \begin{pmatrix} -1 \\ \vdots \\ -1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}_{2mn \times n}, \tag{24}$$
(25)
and
(26)
If
$$L^{2} = \max_{1 \le l \le 2mn+n}\ \sum_{k=1}^{2mn+n} \left| c_{lk} \right|, \tag{27}$$
where $c_{lk}$ is the $(l, k)$-th element of $C^{T}C$, then $(L^{2}I - C^{T}C)$ is a symmetric positive-
definite matrix, and, therefore, $(C^{T}C - L^{2}I)$ is a negative-definite matrix. Hence,
(28)
Consequently,
4. NUMERICAL RESULTS
In this section we consider the numerical solution of the financial equilibrium model
with policy interventions introduced in Section 2. We emphasize that the model is
designed with empirical applications in mind. For example, the framework that has
been developed fits well with flow-of-funds accounts data that are collected quarterly
or annually to provide snapshots of the financial side of the economy. In the case
of the United States, the data sets are maintained by the Federal Reserve Board of
Governors. For an introduction to flow of funds accounts, see Cohen (1987), and
for a recent algorithmic approach to the balancing of these accounts, see Hughes and
Nagurney (1991).
We now present several examples with a variety of tax and price control scenarios.
We consider an economy with two sectors and three financial instruments. Here we
assume that the "size" of each sector $S_i$ is given by $S_1 = 1$ and $S_2 = 2$. Each
sector realizes that the future values of its portfolio are random variables that can be
described by their expected values and variances and believes that the mean of these
expected values is equal to the current value. The variance-covariance matrices of
the two sectors are:
Note that the terms in the blocks $Q^{1}_{12}$, $Q^{1}_{21}$, $Q^{2}_{12}$, $Q^{2}_{21}$ are not positive, since the
returns flowing in from an asset item must be negatively correlated with the interest
expenses flowing out into the portfolio's liabilities. (For details see Francis and
Archer (1979).)
We now use the above data to construct the examples. The algorithm was coded
in FORTRAN, compiled using the FORTVS compiler, optimization level 3, and the
numerical runs were done on an IBM 3090/600J. For each example, the variables
were initialized as follows: r_j = 1 for all j, with the asset and liability volumes x_{ij}
and y_{ij} spread equally across the instruments for each sector.
The \alpha parameter was set to 0.35, and the convergence tolerance \epsilon was set to 10^{-3}.
In the first example, we set the taxes \tau to 0 for all sectors and instruments and
the price control ceilings \bar{r} to 2 for all instruments. The numerical results for the first
example are as follows: the algorithm converged in 24 iterations and required 5.09
milliseconds for convergence.
In the fifth and final example, we kept the same tax rate as in Example 4, \tau = 0.3,
but raised the price ceilings to \bar{r} = 2, the same level as in Example 1. The numerical
results are as follows: the algorithm converged in 17 iterations and required 3.59
milliseconds of CPU time for convergence.
In line with (4), for each of the above examples, the algorithm yields asset
and liability patterns in which the total effective volume of an instrument held as an
asset is approximately equal to the total volume of the instrument held as a liability
whenever the instrument price is not at one of its bounds.
Hence, the market clears for each such instrument, and the price of each instrument
is positive in equilibrium.
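This market-clearing property is straightforward to verify mechanically for a computed solution. The sketch below assumes hypothetical asset holdings x, liability holdings y, instrument prices r, and a common price ceiling; none of these values are taken from the chapter's examples.

import numpy as np

def check_market_clearing(x, y, r, r_floor=0.0, r_ceiling=2.0, tol=1e-2):
    """x, y: m-by-n asset and liability holdings; r: n instrument prices.

    For each instrument j, if r[j] is strictly inside (r_floor, r_ceiling),
    the total volume held as an asset should (approximately) equal the total
    volume held as a liability.
    """
    assets = x.sum(axis=0)
    liabilities = y.sum(axis=0)
    for j, price in enumerate(r):
        interior = r_floor + tol < price < r_ceiling - tol
        if interior and abs(assets[j] - liabilities[j]) > tol:
            print(f"instrument {j}: market does not clear "
                  f"(assets {assets[j]:.4f}, liabilities {liabilities[j]:.4f})")
        else:
            print(f"instrument {j}: OK (price {price:.4f})")

# Hypothetical two-sector, three-instrument equilibrium pattern.
x = np.array([[0.4, 0.3, 0.3], [0.8, 0.6, 0.6]])
y = np.array([[0.5, 0.2, 0.3], [0.7, 0.7, 0.6]])
r = np.array([1.10, 0.95, 2.00])   # third price sits at its ceiling
check_market_clearing(x, y, r)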
ACKNOWLEDGEMENTS
The authors are grateful to D.A. Belsley for suggestions that improved the presenta-
tion of this work.
REFERENCES
Border, K. C. (1985) "Fixed point theorems with applications to economics and game theory",
Cambridge University Press, Cambridge, United Kingdom.
Cohen, J. (1987) "The flow of funds in theory and practice, Financial and monetary studies
15", Kluwer Academic Publishers, Dordrecht, the Netherlands.
Dafermos, S. (1990) "Exchange price equilibria and variational inequalities", Mathematical
Programming 46,391-402.
Dafermos, S. and Nagurney, A. (1984) "Sensitivity analysis for the general spatial economics
equilibrium problem", Operations Research 32, 1069-1086.
Dafermos, S. and Nagurney, A. (1987) "Oligopolistic and competitive behavior of spatially
separated markets", Regional Science and Urban Economics 17, 245-254.
Eydeland, A. and Nagurney, A. (1989) "Progressive equilibration algorithms: the case of
linear transaction costs", Computer Science in Economics and Management 2, 197-219.
Florian, M. and Los, M. (1982) "A new look at static spatial price equilibrium problems",
Regional Science and Urban Economics 12, 579-597.
Francis, J. C. and Archer, S. H. (1979) "Portfolio analysis", Prentice-Hall, Inc., Englewood
Cliffs, New Jersey.
Gabay, D. and Moulin, H. (1980) "On the uniqueness and stability of Nash equilibria in non-
cooperative games", in: A. Bensoussan, P. Kleindorfer, and C. S. Tapiero, eds., "Applied
stochastic control of econometrics and management science", North-Holland, Amsterdam,
The Netherlands.
Hughes, M. and Nagurney, A. (1992) "A network model and algorithm for the analysis and
estimation of financial flow of funds", Computer Science in Economics and Management
5, 23-39.
Kinderlehrer, D. and Stampacchia, G. (1980) "An introduction to variational inequalities and
their applications", Academic Press, New York.
Korpelevich, G. M. (1977) "The extragradient method for finding saddle points and other
problems", Ekonomika i Matematicheskie Metody (translated as Matekon) 13, 35-
49.
Markowitz, H. M. (1959) "Portfolio selection: efficient diversification of investments", John
Wiley and Sons, Inc., New York.
Nagurney, A., Dong, J., and Hughes, M. (1992) "The formulation and computation of general
financial equilibrium", Optimization 26, 339-354.
Nagurney, A. and Zhao, L. (1991) "A network equilibrium formulation of market disequilib-
rium and variational inequalities", Networks 21, 109-132.
Sharpe, W. (1970) "Portfolio theory and capital markets", McGraw-Hill Book Company, New
York.
Thore, S. (1986) "Spatial disequilibrium", Journal of Regional Science 26, 660-675.
Zhao, L. and Dafermos, S. (1991) "General economic equilibrium and variational inequalities",
Operations Research Letters 10, 369-376.
AGAPI SOMWARU, V. ELDON BALL AND UTPAL VASAVADA
ABSTRACT. The flexible accelerator specification of the demand for capital and labor is
estimated using data generated by United States agriculture. It is assumed that firms
maximize the discounted value of expected profits subject to a technology that implies capital
and labor stocks are costly to adjust. Given multiproduct behavior, an investment path for
the quasi-fixed stocks is developed assuming maximization of the discounted sum of future
profits over an infinite horizon. The consistency of the data with the adjustment-cost specification
requires that the firm's value function be convex in prices. We impose convexity based on
the Cholesky factorization of a matrix of constant parameters associated with price effects.
These matrices are fitted subject to the condition that they be non-negative definite. Using a
non-linear constrained optimization approach, the system of quasi-fixed input, variable input,
and output equations is estimated jointly by an inequality-constrained iterative least squares
method.
1. INTRODUCTION
Econometric studies of producer behavior that exploit the duality between production
and cost or profit functions are numerous in the empirical literature. The simultaneous
development of duality theory and accessible computational algorithms for nonlinear
systems estimation has contributed to the growth of this field of inquiry. Of particular
interest to the present study are multiproduct-multifactor models of a production
system. Early contributions to the literature include studies by Shumway (1983) and
Weaver (1983) which focus on the profit maximizing agricultural firm. An important
drawback to these studies is the adoption of a static framework that fails explicitly to
model quasi-fixed input adjustment.
In contrast, a recent study by Vasavada and Chambers (1986) proposed an em-
pirical framework to model optimal adjustment of quasi-fixed factors based on well
known results in the adjustment-cost literature. The adjustment cost model of a firm
is used to rationalize the flexible accelerator specification. Their study adopted the
simplifying assumption that the technology was separable in outputs, thereby per-
mitting construction of a single output aggregate. Vasavada and Ball (1988) extend
this work to include multiple outputs. Although this study relaxes the separability
assumption, the estimated investment demand and output supply equations fail to
satisfy the necessary integrability conditions.
This paper improves on previous efforts by incorporating restrictions from theory
as part of the maintained model. We illustrate the imposition of curvature and monotonicity restrictions in a multiple-input, multiple-output model of aggregate agricultural production.
The representative firm is assumed to choose an investment path to solve

\max_{I \ge 0} \int_0^{\infty} e^{-rt} \left[ P^T Y - W^T X - q^T K + F(Y, X, K, \dot{K}) \right] dt, \qquad (1)

subject to

\dot{K} = I - \delta K, \qquad K(0) = K_0 > 0,

where \delta is an n \times n diagonal matrix of positive depreciation rates, P \in R_+^m is the price
vector corresponding to Y, and W and q are the rental price vectors corresponding to
X and K, respectively. All prices are measured relative to the price of output Y_{m+1}.
Current relative prices are expected to persist indefinitely. A firm may form expectations
rationally in this manner when information is costly (Chambers and Lopez, 1984). r > 0
is the constant discount rate, and \underline{r} is an appropriately dimensioned scalar matrix
with r as the diagonal element. K_0 is the
initial endowment of the quasi-fixed factors. Given K_0, the producer chooses a time
path K(t), Y(t), X(t), and Y_{m+1}(t) to maximize the present value of rents over an
infinite horizon.
Let J(P, W, q, K) denote the optimal value of (1). The Hamilton-Jacobi equation
(Arrow and Kurz, 1970) then gives
rJ = \max_{I \ge 0} \left[ P^T Y - W^T X - q^T K + F(Y, X, K, \dot{K}) + J_K (I - \delta K) \right], \qquad (2)

where J_K(\cdot) is the vector of shadow prices associated with the quasi-fixed stocks.
Under the regularity conditions stated above, Epstein (1981) has shown that the
value function J is dual to F and obeys J_K > 0; J and J_K are twice continuously
differentiable; (\underline{r} + \delta) J_K + q - J_{KK} K^* > 0; J is convex in (P, W, q); and
rJ - J_K K^* is convex in (P, W, q). The result that (\underline{r} + \delta) J_K + q - J_{KK} K^* > 0
restates the equation of motion for J_K implied by the maximum principle and follows
by applying the envelope theorem to (2) using the assumption that F_K > 0. The
statement that J_K > 0 follows from the first-order conditions for (2), which imply
that F_{\dot{K}} = -J_K, and hence the result. Convexity of J in (P, W, q) is intuitively seen
by noting that the objective function (1) is the limit of the sum of linear functions in
(P, W, q). The requirement that rJ - J_K K^* be convex in (P, W, q) is an integrability
relationship between J and F. For later use, note that this condition simplifies to
convexity of J when J_K is linear in (P, W, q).
The advantage of representing the restrictions implied by dynamic theory in terms
of J is its analytical tractability, since the duality between r J and F implies that the
technology can be recaptured by solving
F^*(Y, X, K, \dot{K}) = \min_{P, W, q} \left[ rJ(P, W, q, K) - P^T Y + W^T X + q^T K - J_K \dot{K} \right]. \qquad (3)

When the model generating the data can be approximated by (1), a parametric
characterization of optimal policy rules is available. Optimal output supply and
investment demand equations are obtained by applying the envelope theorem to (2).
Differentiating with respect to (P, W, q) yields equations (4)–(7).
Equations (4) - (7) provide a system of optimal supply and demand equations.
Given a characterization of J that satisfies the regularity conditions, these equations
provide a straightforward procedure for modelling quasi-fixed input adjustment.
3. MODEL ESTIMATION
The value function J is specified as a quadratic form in prices and quasi-fixed stocks,

J(P, W, q, K) = a_0 + a^T \begin{pmatrix} P \\ W \\ q \\ K \end{pmatrix}
 + \tfrac{1}{2} \left( P^T \; W^T \; q^T \; K^T \right)
 \begin{pmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\
                 A_{21} & A_{22} & A_{23} & A_{24} \\
                 A_{31} & A_{32} & A_{33} & A_{34} \\
                 A_{41} & A_{42} & A_{43} & A_{44} \end{pmatrix}
 \begin{pmatrix} P \\ W \\ q \\ K \end{pmatrix}, \qquad (8)

where a_0 and a are constant parameters and the blocks A_{ij} are matrices of constant
parameters partitioned conformably with (P, W, q, K). Applying the envelope relations
(4)–(7) to this specification yields the system of output supply, variable input demand,
and net investment equations (9)–(12).
Before discussing the regularity conditions, notice that the net investment equation
(11) is a multivariate flexible accelerator (Nadiri and Rosen, 1969) with adjustment
matrix M = (\underline{r} + A_{34}^{-1}). This can be seen by rewriting (11) in the accelerator form

\dot{K} = M \left[ K - \bar{K}(P, W, q) \right], \qquad (13)

where \bar{K}(P, W, q) is the vector of long-run equilibrium stocks given in (14).
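The accelerator form (13) is easy to illustrate with a short discrete-time simulation. The sketch below uses a hypothetical diagonal adjustment matrix, with entries of the same order of magnitude as the diagonal of Table 1 reported later, and an arbitrary long-run stock vector; it is not the estimated system.

import numpy as np

# Hypothetical diagonal adjustment matrix and arbitrary long-run stocks K_bar.
M = np.diag([-0.16, -0.06, -0.55, -0.13])
K_bar = np.array([100.0, 100.0, 100.0, 100.0])
K = np.array([80.0, 120.0, 90.0, 110.0])       # initial stocks

path = [K.copy()]
for year in range(25):
    K = K + M @ (K - K_bar)                    # discrete-time accelerator step
    path.append(K.copy())

path = np.array(path)
print(path[[0, 5, 10, 25]])                    # stocks approach K_bar at rates |M_ii|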
The parameter vector \Theta is estimated by minimizing the generalized least squares
criterion (16), which weights the stacked residuals of the equation system by the inverse
of the disturbance variance-covariance matrix \Omega.
This procedure requires a first-stage estimate of \Theta using multivariate least squares,
followed by the estimation of \Omega based on the inner product of the residuals. An estimate
of the variance-covariance matrix is then substituted into (16), and \Theta is estimated
using the new weighting matrix. This procedure is repeated until the parameter
vector \Theta and the estimated variance-covariance matrix \Omega stabilize. For the Aitken
estimator, this iterative procedure yields an asymptotically efficient estimator, which
is otherwise not the case (Malinvaud, 1970).
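The iterative Aitken procedure can be outlined for the simpler case of a system that is linear in its parameters; the chapter's system is nonlinear, so the sketch below (with hypothetical per-equation design matrices Xs and dependent variables ys) is meant only to show the weighting and iteration logic.

import numpy as np

def iterated_sur(Xs, ys, tol=1e-8, max_iter=100):
    """Iterated feasible GLS (Aitken) for a system of linear equations.

    Xs, ys: lists with one regressor matrix / dependent vector per equation,
    all observed over the same T periods.  Returns the stacked coefficients
    and the estimated disturbance covariance Omega.
    """
    G, T = len(Xs), ys[0].shape[0]
    X = np.zeros((G * T, sum(x.shape[1] for x in Xs)))
    col = 0
    for g, Xg in enumerate(Xs):                      # block-diagonal design
        X[g * T:(g + 1) * T, col:col + Xg.shape[1]] = Xg
        col += Xg.shape[1]
    y = np.concatenate(ys)

    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # first stage: equation-by-equation LS
    for _ in range(max_iter):
        resid = (y - X @ beta).reshape(G, T)
        omega = resid @ resid.T / T                  # cross-equation covariance
        W = np.kron(np.linalg.inv(omega), np.eye(T)) # GLS weighting matrix
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        if np.max(np.abs(beta_new - beta)) < tol:    # parameters have stabilized
            return beta_new, omega
        beta = beta_new
    return beta, omega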
The estimation procedure used in this study must be capable of imposing the theoretical
curvature restrictions on the value function. This is accomplished by recasting the
estimation as a non-linear constrained optimization problem in which the Cholesky
factorization of the price-effect parameter blocks is constrained so that the implied
matrices are non-negative definite.
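A standard way to impose such curvature restrictions by construction, in the spirit of the Cholesky device described above, is to parameterize the constrained block as LL^T with L lower triangular, so that the block is non-negative definite for any value of the free parameters. The sketch below applies the idea to a toy fitting problem; the target matrix S and the least-squares objective are illustrative stand-ins, not the chapter's estimation criterion.

import numpy as np
from scipy.optimize import minimize

def lower_triangular(params, k):
    """Map the free parameters to a k-by-k lower-triangular matrix L."""
    L = np.zeros((k, k))
    L[np.tril_indices(k)] = params
    return L

def fit_psd_block(S, k=3):
    """Fit a positive semidefinite matrix A = L L' to a target S in the
    Frobenius norm -- a stand-in for the curvature-constrained estimation
    step: convexity in prices holds by construction."""
    def objective(params):
        A = lower_triangular(params, k) @ lower_triangular(params, k).T
        return np.sum((A - S) ** 2)
    x0 = np.eye(k)[np.tril_indices(k)]
    res = minimize(objective, x0, method="BFGS")
    L = lower_triangular(res.x, k)
    return L @ L.T

S = np.array([[1.0, 0.8, -0.5], [0.8, 0.2, 0.1], [-0.5, 0.1, -0.3]])  # indefinite target
A = fit_psd_block(S)
print(np.linalg.eigvalsh(A))   # all eigenvalues are (numerically) nonnegative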
The empirical model identifies two output categories including livestock and crops.
The stocks of durable equipment, service buildings, land, farm-produced durables,
and self-employed labor are assumed costly to adjust. Hired labor, energy, and other
purchased inputs are assumed to adjust freely to current prices and stocks of the
quasi-fixed inputs.
The output series are defined as the quantities marketed (including unredeemed
Commodity Credit Corporation loans) plus changes in farmer-owned inventories and
quantities consumed by farm households. The indexes of output are based on value
to the producer. For this reason, commodity prices are adjusted to reflect direct
payments to producers under government programs.
The labor data were developed by Gollop and Jorgenson (1980). They disag-
gregate labor input and labor cost into cells cross-classified by two sexes, eight age
groups, five education groups, two employment classes (hired and self-employed),
and ten occupational groups.
The value of labor services equals the value of labor payments plus the imputed
value of self-employed and unpaid family labor. The imputed wage rate is set
equal to the mean wage rate of hired farm workers with the same occupational and
demographic characteristics.
The capital input data are derived from information on investment and the outlay
on capital services. There are twelve investment series used to calculate capital
stocks. The perpetual inventory method (Jorgenson, 1974) is used, and the service
lives are those of Bulletin F published by the U. S. Treasury Department. Rental
prices for each asset are constructed taking account of variations in effective tax rates
and rates of return, depreciation, and capital gains. The value of capital services is
computed as the product of the rental price and the quantity of capital at the end of the
previous period. A more detailed discussion of the procedures used in constructing
the capital price and quantity series is found in Ball (1985).
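The perpetual inventory calculation referred to above can be sketched in a few lines. The depreciation rate and investment series below are hypothetical, and a single geometric rate is only a simplification of the Bulletin F service-life treatment used in the chapter.

import numpy as np

def perpetual_inventory(investment, delta, k0):
    """Capital stock series K_t = (1 - delta) * K_{t-1} + I_t."""
    K = np.empty(len(investment))
    prev = k0
    for t, inv in enumerate(investment):
        prev = (1.0 - delta) * prev + inv
        K[t] = prev
    return K

# Hypothetical annual investment series and 8% geometric depreciation.
investment = np.array([10.0, 12.0, 9.0, 11.0, 13.0, 8.0])
print(perpetual_inventory(investment, delta=0.08, k0=100.0))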
5. ESTIMATION RESULTS
The system of quasi-fixed input, variable input, and output equations is jointly esti-
mated by an inequality-constrained iterative least squares method, which is equiva-
lent to the maximum likelihood method. Parametric inequality constraints associated
with the convexity of J in (P, W, q) are imposed during estimation. When these
restrictions are true, the resulting estimates are asymptotically efficient. Further jus-
tification for imposing these constraints comes from noting that the derived structural
model is itself an implication of economic theory. It would, therefore, be inconsistent
selectively to utilize only the structural model implied from theory while rejecting
associated parametric restrictions.
This highly nonlinear model has 788 variables, 697 equations, and 8582 nonzero
elements. Each iteration on the variance-covariance matrix requires 4.71 Mbytes of
memory for execution.
Adjustment Matrix
Point estimates of adjustment parameters are reported in Table 1. Since the accepted
model did not preclude interdependent adjustments, off-diagonal elements of this im-
portant matrix are non-zero. A positive off-diagonal element, say for M_{13}, indicates
that when input 3 is below its long-run value, disinvestment in input 1 is induced.
Similarly, a negative value for the same elements of the adjustment matrix implies
that, under identical circumstances, investment in input 1 will be induced. In this
fashion, numerical values of off-diagonal elements reflect structure of interdependent
adjustments in aggregate U.S. agriculture.
Now turn to the diagonal elements of the adjustment matrix. Consider the element
M_{11}. A value of -0.160 for this coefficient suggests that, when actual stocks for
durable equipment diverge from long-run values, it takes a little over six years to
complete the needed adjustment, given that all other inputs are at long-run equilibrium
levels. Similarly, M_{22}, M_{33}, and M_{44} supply an interpretation of adjustment speeds
for other quasi-fixed inputs.
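One simple reading of the diagonal coefficients, under the accelerator form (13), is that a fraction |M_ii| of any remaining gap is closed each year when the other inputs are at their long-run levels. The snippet below turns the Table 1 diagonals into rough horizons; the chapter's quoted figures are of the same order, although they may have been computed somewhat differently.

import numpy as np

# Diagonal adjustment coefficients from Table 1.
m_diag = {"durable equipment": -0.160, "real estate": -0.060,
          "farm durables": -0.554, "self-employed labor": -0.134}

for name, m in m_diag.items():
    speed = abs(m)                       # share of the remaining gap closed per year
    naive_years = 1.0 / speed            # rough "full adjustment" horizon
    half_life = np.log(0.5) / np.log(1.0 - speed)   # years to close half the gap
    print(f"{name:22s}  1/|M_ii| = {naive_years:5.1f} years,  half-life = {half_life:4.1f}")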
Real estate takes approximately fifteen years to adjust, while farm-produced
durables take only two years. By contrast, self-employed and unpaid family labor
takes close to nine years to adjust. Values for the remaining in-
puts are, for the most part, numerically similar. One important conclusion emerging
from the multiple-input, multiple-output model is that disaggregation of labor into
two categories provides more reasonable results. Earlier investment studies derived
high adjustment lags for labor when complete supply response systems are esti-
mated (Vasavada and Chambers, 1986). Since self-employed labor and hired labor
have qualitatively different characteristics, such a disaggregation scheme improves
TABLE 1
Estimated adjustment matrix.

Parameter   Estimate      Parameter   Estimate
M(1,1)      -0.160        M(3,1)       0.824
M(1,2)      -0.127        M(3,2)      -0.050
M(1,3)       0.182        M(3,3)      -0.554
M(1,4)      -0.060        M(3,4)      -0.326
M(2,1)      -0.035        M(4,1)       0.045
M(2,2)      -0.060        M(4,2)      -0.161
M(2,3)       0.019        M(4,3)       0.073
M(2,4)       0.036        M(4,4)      -0.134

Note: 1 is durable equipment and service buildings, 2 is real estate,
3 is farm-produced durables, and 4 is self-employed and unpaid family labor.
Elasticities: Short-Run
Point estimates for the model reported in Table 1 can be used to evaluate short-run
elasticities. One important feature of the adjustment cost model is its emphasis on
maintaining a clear conceptual distinction between short-run and long-run responses
to changing opportunity costs. This was cited earlier as an important justification for
conducting the present study. Since agricultural producers are exposed to constantly
changing opportunity costs, it is useful to evaluate behavior in this volatile economic
environment.
Short-run elasticities are reported in Table 2. Diagonal elements of the matrix
are own-price elasticities; off-diagonal elements are the corresponding cross-price
elasticities. Own-price elasticities for the quasi-fixed inputs are observed to be nu-
merically small in magnitude. This suggests that, given the technological constraints
faced by agricultural producers, a change in the rental price does not evoke signif-
icantly different utilization patterns for these inputs in the short-run. A different
conclusion emerges when short-run elasticities of variable inputs are evaluated. A
change in the price of variable inputs, namely hired labor and purchased inputs,
induces shifts in utilization patterns. Now turn to an examination of own-price elas-
ticities for outputs. This value for livestock is extremely small although values for
all other inputs exceed one. Among outputs considered, the own-price elasticity for
grain was highest. Values for dairy and other crops are numerically similar.
TABLE 2
Short-run elasticities. Rows include real estate, farm-produced durables, self-employed
labor, hired labor, energy, purchased inputs, livestock, and crops; diagonal entries are
own-price elasticities and off-diagonal entries are the corresponding cross-price elasticities.
As an alternative to building ad hoc structural models, one can model dynamic relationships
between economic variables as vector autoregressions. These are essentially reduced-
form relationships, which utilize a priori restrictions only parsimoniously. The model
utilized in the present study is a structural one, with a structure derived explicitly from
relevant economic theory. Restrictions placed on the model are not ad hoc, but rather,
are shown to be implied by the value maximization hypotheses. In this sense, there is
significant justification for adopting this approach. However, there is always the risk
that restrictions imposed on the model, especially curvature, are not supported by the
data. It is argued here that structural models based on dynamic optimization should
be used in their entirety or not at all. Selective utilization of econometric equations
from value maximization without implied parametric restrictions is hard to justify.
This is equally true for static models within the dual framework, where curvature
conditions are frequently not imposed. An important exception is Ball (1988).
Accordingly, a methodology is developed to impose convexity restrictions on a
multiple-input, multiple-output model of aggregate agricultural production. Point
estimates of parameters are used to evaluate short-run elasticities. Generally, short-
run behavioral responses of quasi-fixed inputs to opportunity costs are small. Hence,
implementation of a favorable tax policy in U.S. agriculture would have a small,
albeit non-negligible, effect on investment. By contrast, changing relative prices are
observed to have a significant effect on quantities of variable inputs employed. In a
similar fashion, supply response is also observed to be elastic. A supply management
policy based on manipulating market incentives can prove to be effective, especially
in U.S. agriculture.
Several shortcomings of the present modeling effort may be mentioned. First, it
is necessary to address the issue of expectations formation in a more sophisticated
manner than was possible with the present model. Inclusion of non-static expectations
would serve to strengthen the foundations of the multiple-input, multiple-output
model. This will necessarily entail imposition of highly complex cross-equation
restrictions. For this reason, incorporating non-static expectations is relegated to
a future goal. A second issue that needs to be addressed is the inclusion of U.S.
agricultural policy variables in a more direct fashion than was possible within the
existing framework. While recognizing the importance of this objective, it should be
noted that the U.S. agricultural production sector has been exposed to a diverse set
of policy instruments, which have changed constantly over time. It is by no means
an easy task to integrate this matrix of policy interventions into the existing model.
Future investigations must, therefore, concentrate on improving model specification,
making models more useful to policy analysts, and performing hypothesis tests on the
structure of the production system. Finally,
alternative functional forms must be subject to experiment to evaluate robustness of
empirical results to alternative specifications (Baffes and Vasavada, 1989). These are
some meaningful directions to pursue in the study of resource response to changing
market incentives.
REFERENCES
1. Arrow, K. and M. Kurz (1970), Public Investment, the Rate of Return and Optimal Social
Policy, Baltimore; Johns Hopkins Press.
2. Bacharach, M. (1965), "Estimating Non-Negative Matrices From Marginal Data," Inter-
national Economic Review, 6: 294-310.
3. Baffes, J. and U. Vasavada (1989), "On the Choice of Functional Forms in Agricultural
Production Analysis," Applied Economics, 21: 1053-1061.
4. Ball, V. E. (1988), "Modelling Supply Response in a Multiproduct Framework," Ameri-
can Journal of Agricultural Economics, 70: 813-25.
5. Ball, V.E. (1985), "Output, Input, and Productivity Measurement in U.S. Agriculture,"
American Journal of Agricultural Economics, 67: 475-86.
6. Brooke, A., D. Kendrick, and A. Meeraus (1988), GAMS: A User's Guide, The Scientific
Press.
7. Epstein, L. G. (1981), "Duality Theory and Functional Forms for Dynamic Factor De-
mands," Review of Economic Studies, 48: 81-95.
8. Chambers, R. G. and R. Lopez (1984), "A General Dynamic Supply Response Model,"
Northeastern Journal of Agricultural and Resource Economics, 13: 142-154.
9. Gollop, F., and D. Jorgenson (1980), "U.S. Productivity Growth by Industry, 1947-
73." in New Developments in Productivity Analysis, ed. 1. Kendrick and B. Vaccara.
National Bureau of Economic Research, Studies in Income and Wealth, Vol. 44. Chicago;
University of Chicago Press.
10. Gunter, L. and U. Vasavada (1988), "Dynamic Labor Demand Schedules for U.S. Agri-
culture," Applied Economics.
11. Jorgenson, Dale W. (1974), "The Economic Theory of Replacement and Depreciation,"
Econometrics and Economic Theory: Essays in Honor of Jan Tinbergen. Ed. W. Sell-
akaerts; London, Macmillan Publishing Co., pp. 189-222.
12. Lau, L. J. (1978), "Testing and Imposing Monotonicity, Convexity, and Quasi-Convexity
Constraints," Production Economics: A Dual Approach to Theory and Applications. Ed.
D. McFadden and M. Fuss; Amsterdam, North Holland.
13. Malinvaud, E. (1970), Statistical Methods of Econometrics, Amsterdam, North Holland,
1970.
14. Mortenson, D.T. (1973), "Generalized Cost of Adjustment and Dynamic Factor Demand
Theory," Econometrica, 41: 657:65.
15. Murtagh, B. and M. Saunders (1983), MINOS 5.0 User's Guide. Systems Optimization
Laboratory Technical Report SOL 83-20, Stanford University, Stanford, CA.
16. Nadiri, M. I., and S. Rosen (1969), "Interrelated Factor Demands," American Economic
Review, 59: 457-71.
17. Rao, C.R. (1973), Linear Statistical Inference and Its Applications, John Wiley and Sons;
New York.
18. Ruble, W.L. (1968), "Improving the Computation of Simultaneous Stochastic Linear
Equation Estimates," Ph.D. Thesis, Department of Economics, Michigan State University;
East Lansing.
19. Shumway, C.R. (1983), "Supply, Demand, and Technology in a Multiproduct Industry:
Texas Field Crops," American Journal of Agricultural Economics, 65: 748-60.
20. U.S. Treasury Department (1942), Bureau of Internal Revenue, Income Tax Depre-
ciation and Obsolescence, Estimated Useful Life and Depreciation Rates, Bulletin F;
Washington, DC.
21. Vasavada, U., and V.E. Ball (1988), "Modeling Dynamic Adjustment in a Multi-Output
Framework," Agricultural Economics.
22. Vasavada, U., and R. G. Chambers (1986), "Investment in U.S. Agriculture," American
Journal of Agricultural Economics, 68: 950-60.
23. Weaver, R.D. (1983), "Multiple Input, Multiple Output Production Choices and Tech-
nology in the U.S. Wheat Regions," American Journal of Agricultural Economics, 65:
45-56.
ATREYA CHAKRABORTY AND CHRISTOPHER F. BAUM*
ABSTRACT. This paper focuses on the construction of an index of the intensity of firms'
antitakeover defenses. While many aspects of corporate behavior are qualitative in nature,
an evaluation of a firm's stance and the underlying motives for its behavior often depend on
the elements of a set of qualitative factors. The interactions between these factors are likely
to have important implications. In this context, only a composite measure will capture these
interactions and their implications for firms' actions. We focus on the creation of an ordinal
measure of anti-takeover defenses and utilize the ordered probit estimation technique to relate
the magnitude of this measure to the motives for instituting these defenses. Our estimates are
generally supportive of the managerial entrenchment hypothesis.
1. INTRODUCTION
The merger wave of the 1980s, coupled with the sophistication of investment banks'
financial engineers, caused many large corporations in the United States to include
anti-takeover amendments in their corporate charters. Rosenbaum (1986) details
anti-takeover measures for 424 Fortune 500 firms as of May 1986. Among these
companies, a surprisingly large number (403) had at least some amendments that
were designed to have anti-takeover consequences or that could be adopted to thwart
takeover attempts. Of these firms, Rosenbaum documents 143 as having poison
pills, 158 with fair price amendments, 223 with classified boards, 362 with blank
check provisions, 65 that require a supermajority to approve a merger, and 222 firms
as having some types of limited shareholder rights. We now define each of these
defensive measures.
Poison Pills: These are preferred stock rights plans adopted by the management,
generally without the shareholders' approval. These amendments are exclusively
tailored to thwart hostile bids by triggering actions that make the target financially
unattractive.
Fair Price Amendments: These are designed to prevent two-tier takeover offers.
They require that the bidders pay all the tendering shareholders the same price. Most
fair price provisions can be waived if the bidder's offer is approved by a supermajor-
ity of target shareholders. This supermajority requirement may be as low as 66% or
as high as 90%.
Classified Boards: Such amendments divide the board of directors into three classes.
Each year only one class of directors is due for election. This prevents a raider from
immediately replacing the full board and taking control of a company, even if the
raider controls a majority of the shares. More importantly, such amendments also
make proxy contests over control extremely difficult.
Blank Check: These give the managers (via the board of directors) a "very broad
discretion to establish voting, dividend conversion and other rights for preferred
stock that a company may use" (Rosenbaum, 1986, p. 7). Such discretionary powers
may easily be used to issue securities primarily intended to thwart takeovers (poison
pills). Finally, since the SEC requires companies seeking to issue preferred shares to
disclose to shareholders that unused preferred stock may have anti-takeover effects,
regardless of the company's professed intention, Rosenbaum contends that blank
checks should indeed be classified as anti-takeover measures.
Many researchers' analyses of takeover defenses have been hindered by the general
unavailability of data on the prevalence of such measures. Although a detailed study
of a firm's SEC filings and annual reports would provide much of this information,
there are still serious issues of heterogeneity and classification of the firm-specific
measures into generally accepted categories.
Our goal is an index that incorporates all categories of defensive strategies into
a single ordinal index value. The presence of a particular anti-takeover defense in a
corporate charter is a qualitative factor. Since there is no consensus on the severity of
various defensive mechanisms, it would appear that any cardinal index of the strength
of these defenses would be arbitrary. To deal with this critique, we have used the
qualitative information in Rosenbaum's Takeover Defenses dataset, combined with
other measures of firms' characteristics, to build an ordinal index of "intensity."
We combine data from Rosenbaum's dataset, which indicates whether firms had
various anti-takeover amendments in place as of 1986, with firms' characteristics from
Thies and Baum's Panel84 dataset.¹ The latter dataset contains annual data at the
firm level for 1977-1983 for a total of 134 large U.S. manufacturing corporations.
Panel84 reconstructs financial statements on a replacement-cost-accounting basis,
exploiting inflation-adjusted data obtained from firms' Forms 10-K and annual report
disclosures required during this period by the Securities and Exchange Commission
and the Financial Accounting Standards Board.² These data are particularly appropriate
for this study since we hypothesize that Tobin's q is a relevant explanatory variable,
and Panel84 contains consistent estimates of Tobin's q that are largely free from the
imputation bias created by the commonly-used methods of Brainard, Shoven and
Weiss (1980). This bias may be especially harmful, as it represents measurement
error correlated with common indicators of firm performance, as noted by Klock et
al. (1991).
Of the 134 firms in Panel84, there are 68 which are also to be found in Rosenbaum's
Takeover Defense dataset.³ We use these matching firms in the empirical
analysis of the next section. Since the Panel84 data provides us with annual detail
of firms' performances from 1977 through 1983, it can be viewed as exogenous to
the observation of firms' takeover defenses in 1986. Although the intervening years
could provide useful information, the use of firm characteristics from the 1977-1983
1 An earlier version of the dataset, containing 100 firms, is documented in Thies and
Sturrock (1987), and was further described in Klock, Thies and Baum (1991). Of these 100
firms, 98 appear in the present dataset. Data for the additional 36 firms were gathered by Glenn
Rudebusch and Steven Oliner of the Federal Reserve Board of Governors.
2 During the period in which firms were required to report current cost data in their
annual reports (1976-1983), they were given broad leeway in the methodology used for their
calculations. Thus, the accuracy of these figures could be debated. Nevertheless, we presume
that these estimates of replacement cost are likely to be more reliable than those which could
be constructed by outside researchers via adjustments to historical costs, using aggregate price
deflators for capital goods, without access to firm-specific vintage data.
3 A list of the 68 firms and their two-digit industry codes is available from the authors on
request.
TABLE 1
Descriptive statistics of the sample.
Variable Mean Std. Dev. Minimum Maximum
Tobin's Q 0.97 0.43 0.45 2.83
Financial Leverage 0.43 0.34 0.02 1.50
R&D per dollar of Sales 0.025 0.026 0.0 0.110
Advertising per dollar of Sales 0.015 0.027 0.0 0.166
Sigma from CAPM regression 7.58 2.74 4.45 19.94
The intensity index is related to firm characteristics through an ordered probit model,
in which a latent variable z = \beta' x + \epsilon, with \epsilon standard normal, determines the observed
intensity y via the thresholds \mu:

y = 0 if z \le 0,
y = 1 if 0 < z \le \mu_1,
\ldots
y = J if z > \mu_{J-1}.
Here y is observed, while z is not. We wish to estimate the parameters of the \beta vector
as well as the vector \mu. Since the model includes a constant term, one of the \mu's is not
identified. We accordingly normalize \mu_0 to zero and estimate \mu_1, \ldots, \mu_{J-1}. In our
setting, J is equal to 3, the maximum value of Intensity, so that estimates of \mu_1 and
\mu_2 may be recovered. As in the binomial probit model, we cannot separately identify
the variance of the error term, which is thus set to unity. The ordered probit estimator
is available as a component of LIMDEP (Greene, 1992) and is further described in
Greene (1990, pp. 703-706).
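The estimator can also be coded directly from the threshold model above. The sketch below builds the log-likelihood with μ_0 = 0 and unit error variance, keeps the remaining thresholds ordered by construction, and fits the model to simulated data; the regressors, coefficients, and sample are hypothetical, not the firm data used here.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, X, y):
    """Ordered probit negative log-likelihood with mu_0 = 0 and unit variance.

    params = [beta..., a_1, a_2]; the free thresholds are built as
    mu_1 = exp(a_1), mu_2 = mu_1 + exp(a_2) so that 0 < mu_1 < mu_2.
    """
    k = X.shape[1]
    beta, a = params[:k], params[k:]
    cuts = np.concatenate(([-np.inf, 0.0], np.cumsum(np.exp(a)), [np.inf]))
    z = X @ beta
    prob = norm.cdf(cuts[y + 1] - z) - norm.cdf(cuts[y] - z)
    return -np.sum(np.log(np.clip(prob, 1e-300, None)))

# Simulated data: two regressors standing in for Tobin's q and leverage.
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z_star = X @ np.array([1.0, -0.8, -0.5]) + rng.normal(size=n)
y = np.digitize(z_star, [0.0, 1.0, 2.0])          # observed categories 0,...,3

res = minimize(neg_loglik, x0=np.zeros(5), args=(X, y), method="BFGS")
beta_hat = res.x[:3]
mu_hat = np.cumsum(np.exp(res.x[3:]))
print("beta:", beta_hat, " thresholds:", mu_hat)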
In this section, we formulate an hypothesis under which the intensity of firms' anti-
takeover amendments should be related to observable measures of their behavior.
If we observe firms altering their corporate charters to include these amendments
- as we did in large numbers in the 1980s - we might conclude that firms' actions
are merely reflective of their shareholders' interests and that these actions are being
taken to maximize the value of the firm, as neoclassical theory would suggest.
An alternative explanation has been provided by finance researchers analyzing the
aspects of agency costs. In this formulation, protective measures such as ATAs may
well be indications of an "entrenched management" - essentially, managers who are
looking after their personal interests first and foremost and may indeed take actions
contrary to shareholders' best interests. In examining the intensity of ATAs, we
consider explanatory factors that would be indicative of the entrenched management
hypothesis. Under this hypothesis, we assume that the primary role of anti-takeover
amendments is to insulate underperforming managers from the discipline of the
market for corporate control.
The entrenched management hypothesis suggests that firms that are more likely to
be takeover targets should display a greater intensity of takeover defenses. What gives
rise to this vulnerability? One obvious cause is a poor performance - irrespective
of reason - that leads the firm's valuation in the financial markets to be low. Since
the financial markets' evaluations of a firm's worth are forward-looking measures,
a firm with poor performance and poor prospects will have a low valuation in the
stockmarket and may well be a takeover target - especially if a raider considers the
root of that poor performance to be inadequate management. The shareholders of
such a firm would not want to place obstacles in the raider's path, and if they share the
market's low opinion of managerial talent, they would encourage a takeover. Such
managers would, naturally, have every reason to protect themselves, most especially
in the case where they recognize their own shortcomings.
To quantify this rationale, we utilize Tobin's q as an objective and non-myopic
indicator of current management's performance. Past research on the relation between
q and takeovers (Servaes, 1991; Lang et al., 1989; Morck et al., 1988) has interpreted
q as a measure of managerial performance: e.g. "In general the shareholders of low
q targets benefit more from takeovers than shareholders of high q targets." (Lang et
al., 1989, p. 137).
If q were indeed a good proxy for managerial efficiency, and if defensive strate-
gies were primarily instituted for shareholders' interests, then one should not expect
Tobin's q to have any explanatory power in predicting the adoption of defensive
measures. However, if the manager were in charge of instituting defensive instru-
ments, one would expect an inverse relationship between anti-takeover amendments
and Tobin's q: the less efficient the manager (as signalled by a lower q), the greater
would be her need to insulate herself from the disciplining forces of the market.
The second explanatory variable which we include in the model of ATA intensity
is a measure of financial leverage. The rationale for including financial leverage as
an explanatory variable for the analysis of takeover defenses comes from the works
of Jensen (1986) and Ross (1977). The role of debt in motivating organizational
efficiency is well documented in the principal-agent literature. Jensen (1986) points
to the possibility that, more than any other action, debt creation can actually lend
credibility to the management's promise to payout future cash flows. Thus, higher
levels of debt lower the agency cost of cash flows by reducing the discretionary levels
of cash flow available to the management. Conversely, managers less willing to be
exposed to the disciplines of the financial markets would tend to have lower levels
of debt and greater need for defensive measures to insulate them from the market for
corporate control.
The explanatory power of leverage in predicting anti-takeover amendments should
be minimal if we assume that these amendments primarily serve the shareholders'
interests. We should not expect any systematic relationship between leverage and
defensive measures under this hypothesis if we assume a well functioning market for
corporate control. However, if defensive amendments serve primarily the interests of
an entrenched management, ceteris paribus, one would expect an inverse relationship
between leverage and the probability of adopting anti-takeover measures.
Under the hypothesis that the adoption of anti-takeover amendments reflects the
actions of an entrenched management, we would expect to find
Pr{ATA} = f[ Tobin's q (−), Leverage (−) ]

with respect to the likelihood of observing individual ATAs, or, by making use of the
index of ATA intensity, that

Intensity{ATA} = f[ Tobin's q (−), Leverage (−) ],

where the signs in parentheses denote the expected directions of the partial effects.
TABLE 2
Ordered probit estimates of the intensity of takeover defense strategies.
Dependent variables: Intensity and Intensity2.
Constant: 5.396 (2.7) and 4.997 (2.6), respectively.
To ensure that the validity of these results is not an artifact of the Intensity index,
we constructed an aggregation of the index, collapsing categories 2 and 3 into a single
category. The resulting Intensity2 index considers that all firms with one or more
"severe" anti-takeover amendments are given the same value. The results from this
version of the ordinal index are given in the second column of Table 2. They are
qualitatively very similar to those achieved with Intensity: Tobin's q and leverage
are highly significant with the expected negative sign.
In practice, the value of such a model of Intensity must involve its predictive
power. Table 3 reports the distribution of "actual" values of the original Intensity
index, and presents the distribution of predicted values for the model given in Table 2.
The discrete prediction of the model, for each firm, is taken as the alternative among
the set {0, 1, 2, 3} that has the greatest probability. Both the full and restricted models
correctly classify the two firms with Intensity = 0. Both models underpredict the
TABLE 3
Ordered probit predictions of intensity.

                  Predicted
Actual      0     1     2     3   Total
  0         2     0     0     0     2
  1         0     6     9     2    17
  2         0     7    14     7    28
  3         0     0     9    12    21
Total       2    13    32    21    68

Notes: Cells are the frequencies of actual and predicted outcomes. The
predicted outcome has maximum probability. The results correspond
to the model reported in the first column of Table 2.
TABLE 4
Predicted probabilities of intensity levels for variations in Tobin's q and leverage.

Intensity            0        1        2        3
30% below avg    0.0004   0.1072   0.4013   0.4910
20% below avg    0.0008   0.1498   0.4395   0.4099
10% below avg    0.0015   0.2021   0.4639   0.3325
Average q        0.0029   0.2636   0.4719   0.2616
10% above avg    0.0054   0.3326   0.4626   0.1994
20% above avg    0.0095   0.4063   0.4371   0.1471
30% above avg    0.0161   0.4809   0.3981   0.1048
Fig. 1. Predicted probabilities of intensity levels, shown as areas under the standard normal
density delimited by the points -b'x, μ(1)-b'x, and μ(2)-b'x (horizontal axis in standard
deviation units). (a) Evaluated at the point of means. (b) Evaluated at 20% below average q,
mean leverage.
value. The model predicts, in this case, a probability of 0.49 that such a firm will
have Intensity=3: at least one of the severe anti-takeover defenses, accompanied
by three or more of the less severe defenses, or at least two of the severe defenses.
The distribution shifts markedly toward a stronger defensive posture with lower than
average q, and vice versa. Figure 1 depicts how this shift takes place, illustrating
how the thresholds between strength categories are displaced leftward when we move
from average q levels to a q level 20 per cent below average. The results for variations
in leverage are similar, if less marked. The probability that a firm will have Intensity
= 3 rises from 0.26 (with average leverage) to 0.32 for a firm with leverage 30 per
cent lower than the average. These results predict a very strong response, in terms
of the intensity of anti-takeover amendments, to either a low value of q or lower
leverage. Since a low value for either of these financial characteristics is indicative
of a greater probability of a raid (Servaes, 1991; Palepu, 1986), the presence of these
amendments should not be expected to reflect shareholders' interests.
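The probabilities in Table 4 (and the areas depicted in Figure 1) are differences of standard normal distribution functions evaluated at the estimated cutpoints, shifted by the index b'x. The sketch below reproduces the style of that calculation with hypothetical coefficient and threshold values (the actual estimates are those reported in Table 2), combined with the sample means of Table 1.

import numpy as np
from scipy.stats import norm

def intensity_probs(xb, mu):
    """P(Intensity = 0, ..., J) given the index x'b and thresholds mu."""
    cuts = np.concatenate(([-np.inf, 0.0], mu, [np.inf]))
    return np.diff(norm.cdf(cuts - xb))

# Hypothetical values standing in for the Table 2 estimates; the negative
# slopes are consistent with the reported signs on q and leverage.
b_const, b_q, b_lev = 5.4, -2.0, -1.5
mu = np.array([1.8, 3.4])
q_mean, lev_mean = 0.97, 0.43            # sample means from Table 1

for shift in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3):
    q = q_mean * (1 + shift)
    xb = b_const + b_q * q + b_lev * lev_mean
    print(f"q {shift:+.0%}: ", np.round(intensity_probs(xb, mu), 4))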
5. CONCLUSIONS
REFERENCES
Brainard, W., John Shoven, and Laurence Weiss, 1980. "The Financial Valuation of the Return
to Capital". Brookings Papers on Economic Activity 2, 453-511.
DeAngelo, Harry, and Edward M. Rice, 1983. "Anti-takeover Charter Amendments and
Stockholder Wealth". Journal of Financial Economics 11, 329-359.
Greene, William, 1990. Econometric Analysis. New York: Macmillan Publishing Co.
Greene, William, 1992. LIMDEP Version 6.0 User's Manual. Bellport, NY: Econometric
Software, Inc.
Jarrell, Gregg and Annette Poulsen, 1987. "Shark Repellents and Stock Prices: The Effects of
Antitakeover Amendments Since 1980". Journal of Financial Economics 19,127-168.
Jensen, Michael C., 1986. "Agency Costs of Free Cash Flow, Corporate Finance, and
Takeovers". American Economic Review 76, 323-329.
Klock, M., C.F. Thies, and C.F. Baum, 1991. "Tobin's q and Measurement Error: Caveat
Investigator". Journal of Economics and Business 43, 241-252.
Lang, L., R. Stulz, and R.A. Walkling, 1989. "Managerial performance, Tobin's q and the
gains from successful tender offers". Journal of Financial Economics 24, 137-154.
Linn, Scott C., and John T. McConnell, 1983. "An Empirical Investigation of the Impact of
Antitakeover Amendments on Common Stock Prices". Journal of Financial Economics
11,361-399.
Malatesta, Paul H., and Ralph A. Walkling, 1988. "Poison Pill Securities". Journal of Financial
Economics 20, 347-376.
Morck, Randall, Andrei Shleifer, and Robert W. Vishny, 1988. "Characteristics of Targets of
Hostile and Friendly Takeovers", in Corporate Takeovers: Causes and Consequences,
Alan Auerbach, ed. Chicago: University of Chicago Press.
Palepu, Krishna G., 1986. "Predicting Takeover Targets: A Methodological and Empirical
Analysis". Journal ofAccounting and Economics 8,3-37.
Pugh, W.N., Page, D.E., and J. S. Jahera, Jr., 1992. "Antitakeover Charter Amendments:
Effects on Corporate Decisions". Journal of Financial Research 15:1, 57-67.
Rosenbaum, Virginia K., 1986. "Takeover Defenses: Profiles of the Fortune 500". Washington:
Investor Responsibility Research Center, Inc.
Ross, S.A., 1977. "The Determination of Financial Structure: The Incentive-Signalling Ap-
proach". Bell Journal of Economics 8, 23-40.
Ruback, Richard, 1988. "An overview of takeover defenses", in Mergers and Acquisitions,
Alan Auerbach, ed. Chicago:University of Chicago Press.
Servaes, Henri, 1991. "Tobin's Q and the Gains from Takeovers". Journal of Finance 46:1,
409-419.
Thies, Clifford and Thomas Sturrock, 1987. "What Did Inflation Cost Accounting Tell Us?"
Journal of Accounting, Auditing and Finance Fall 1987, pp. 375-391.
Zavoina, W., and R. McKelvey, 1975. "A Statistical Model for the Analysis of Ordinal Level
Dependent Variables". Journal of Mathematical Sociology 4, 103-120.
List of Contributors
V. Eldon Ball
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA
Ravi Bansal
Department of Economics, Duke University, Durham, NC, USA
Christopher F. Baum
Department of Economics, Boston College, Chestnut Hill, MA, USA
C.R. Birchenhall
Department of Econometrics, University of Manchester, Manchester, England
Ismail Chabini
Centre de Recherche sur les Transports, University of Montreal, Montreal, Canada
Atreya Chakraborty
Lemberg Program in International Finance, Brandeis University, USA
Gregory C. Chow
Department of Economics, Princeton University, Princeton, NJ, USA
June Dong
Department of General Business and Finance, School of Management, University of
Massachusetts, Amherst, MA, USA
Omar Drissi-Kaitouni
Centre de Recherche sur les Transports, University of Montreal, Canada
Michael Florian
Centre de Recherche sur les Transports, University of Montreal, Montreal, Canada
A. Ronald Gallant
North Carolina State University, Department of Statistics, Raleigh, NC, USA
234 List of Contributors
William L. Goffe
Department of Economics and Finance, Cameron School of Business, University of
North Carolina, Wilmington, NC, USA
Charles Hallahan
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA
Robert Hussey
Department of Economics, Loyola University of Chicago, Chicago, IL, USA
David Kendrick
Department of Economics, University of Texas, Austin, TX, USA
Yue Ma
Department of Economics, University of Strathclyde, Glasgow, United Kingdom
Anna Nagurney
Department of General Business and Finance, School of Management, University of
Massachusetts, Amherst, MA, USA
Alfred L. Norman
Department of Economics, University of Texas, Austin, TX, USA
Albert J. Reed
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA
Berç Rustem
Department of Computing, Imperial College, London
Agapi Somwaru
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA
George E. Tauchen
Department of Economics, Duke University, Durham, NC, USA
Utpal Vasavada
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA
Index