
COMPUTATIONAL TECHNIQUES FOR ECONOMETRICS

AND ECONOMIC ANALYSIS


Advances in Computational Economics

VOLUME 3

SERIES EDITORS

Hans Amman, University of Amsterdam, Amsterdam, The Netherlands


Anna Nagurney, University of Massachusetts at Amherst, USA

EDITORIAL BOARD

Anantha K. Duraiappah, European University Institute


John Geweke, University of Minnesota
Manfred Gilli, University of Geneva
Kenneth L. Judd, Stanford University
David Kendrick, University of Texas at Austin
Daniel McFadden, University of California at Berkeley
Ellen McGrattan, Duke University
Reinhard Neck, Universität Bielefeld
Adrian R. Pagan, Australian National University
John Rust, University of Wisconsin
Berc Rustem, University of London
Hal R. Varian, University of Michigan

The titles published in this series are listed at the end of this volume.
Computational Techniques
for Econometrics
and Economic Analysis

edited by

D. A. Belsley
Boston College, Chestnut Hill, U.S.A.

Springer-Science+Business Media, B.V.


Library of Congress Cataloging-in-Publication Data

Computational techniques for econometrics and economic analysis /

edited by David A. Belsley.
p. cm. -- (Advances in computational economics ; v. 3)
Includes index.

1. Econometric models--Data processing. 2. Economics,

Mathematical--Data processing. I. Belsley, David A. II. Series.
HB141.C625 1993
330'.01'5195--dc20    93-17956

ISBN 978-90-481-4290-3 ISBN 978-94-015-8372-5 (eBook)


DOI 10.1007/978-94-015-8372-5

All Rights Reserved


©1994 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1994.
Softcover reprint of the hardcover 1st edition 1994
No part of the material protected by this copyright may be reproduced
or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Table of Contents

Preface vii

Part One:
The Computer and Econometric Methods
Computational Aspects of Nonparametric Simulation Estimation
Ravi Bansal, A. Ronald Gallant, Robert Hussey, and George Tauchen 3
On the Accuracy and Efficiency of GMM Estimators: A Monte Carlo Study
A. J. Hughes Hallett and Yue Ma 23
A Bootstrap Estimator for Dynamic Optimization Models
Albert J. Reed and Charles Hallahan 45
Computation of Optimum Control Functions by Lagrange Multipliers
Gregory C. Chow 65

Part Two:
The Computer and Economic Analysis
Computational Approaches to Learning with Control Theory
David Kendrick 75
Computability, Complexity and Economics
Alfred Lorn Norman 89
Robust Min-Max Decisions with Rival Models
Berç Rustem 109

Part Three:
Computational Techniques for Econometrics
Wavelets in Macroeconomics: An Introduction
William L. Goffe 137
MatClass: A Matrix Class for C++
C. R. Birchenhall 151
Parallel Implementations of Primal and Dual Algorithms for Matrix
Balancing
Ismail Chabini, Omar Drissi-Kaïtouni and Michael Florian 173

Part Four:
The Computer and Econometric Studies
Variational Inequalities for the Computation of Financial Equilibria in the
Presence of Taxes and Price Controls
Anna Nagurney and June Dong 189
Modeling Dynamic Resource Adjustment Using Iterative Least Squares
Agapi Somwaru, Eldon Ball and Utpal Vasavada 207
Intensity of Takeover Defenses: The Empirical Evidence
Atreya Chakraborty and Christopher F. Baum 219

List of Contributors 233

Index 235
DAVID A. BELSLEY, EDITOR

Preface

It is unlikely that any frontier of economics/econometrics is being pushed faster,


further, and in more directions than that of computational techniques. The computer
has become both a tool for doing and an environment in which to do economics and
econometrics. Computational techniques can take over where theory bogs down,
allowing at least approximate answers to questions that defy closed mathematical or
analytical solutions. Computational techniques can make tasks possible that would
otherwise be beyond human potential. And computational techniques can provide
working environments that allow the investigator to marshal all these forces efficiently
toward achieving desired goals.
This volume provides a collection of recent studies that exemplify all the elements
mentioned above. And beyond the intrinsic interest each brings to its respective
subject, they demonstrate by their depth and breadth the amazing power that the
computer brings to the economic analyst. Here we see how modern economic
researchers incorporate the computer in their efforts from the very inception of a
problem straight through to its conclusion.

THE COMPUTER AND ECONOMETRIC METHODS

In "A Nonparametric Simulation Estimator for Nonlinear Structural Models,"


R. Bansal, A.R. Gallant, R. Hussey, and G. Tauchen combine numerical techniques,
the generalized method-of-moments, and non-parametrics to produce an estimator
for structural economic models that is defined by its ability to produce simulated
data that best match the moments of a scoring function based on a non-parametric
estimate of the conditional density of the actual data.
In "On the Accuracy and Efficiency of GMM Estimators: A Monte Carlo Study,"
A.J. Hughes Hallett and Yue Ma provide Monte Carlo evidence that helps to evaluate
the relative small-sample characteristics of several of the more popular generalized
method-of-moments estimators and surprise us by indicating that their own suggested
method seems to work best.
In "A Bootstrap Estimator for Dynamic Optimization Models," A.J. Reed and
C. Hallahan make use of a bootstrapping technique to provide estimates of stochastic,
dynamic programming models that can be made to conform to boundary restrictions
with relative ease.
In "Computation of Optimum Control Functions by Lagrange Multipliers,"
G. Chow explains and illustrates the gain in numerical accuracy that accompanies


his method of Lagrange multipliers for solving the standard optimal control problem
over the more usual method of solving the Bellman equations.

THE COMPUTER AND ECONOMIC ANALYSIS

D. Kendrick in "Computational Approaches to Learning with Control Theory" dis-


cusses the means by which the more realistic assumptions that different economic
agents have different knowledge and different ways of learning can be incorporated
in economic modeling.
In "Computability, Complexity and Economics," AL Norman works to find a
framework within which the mathematical theories of computability and complexity
can be used to analyze and compare the relative merits of various of the optimizing
procedures used in economics.
Berç Rustem, in "Robust Min-Max Decisions with Rival Models," provides an
algorithm for solving a constrained min-max problem that can be used to produce a
robust optimal policy when there are rival models to be accounted for.

COMPUTATIONAL TECHNIQUES FOR ECONOMETRICS

Continuing his tradition of seeing what the cutting edge has to offer economic and
econometric analysis, W.L. Goffe, in "Wavelets in Macroeconomics: An Introduc-
tion," examines the usefulness of wavelets for characterizing macroeconomic time
series.
C.R. Birchenhall, in "MatClass: A Matrix Class for C++," provides an introduc-
tion to object-oriented programming along with an actual C++ object class library
in a context of interest to econometricians: a set of numerical classes that allows the
user ready development of numerous econometric procedures.
In "Parallel Implementations of Primal and Dual Algorithms for Matrix Balanc-
ing," I. Chabini, O. Drissi-Kailouni, and M. Florian exploit the power of parallel
processing (within the accessible and inexpensive "286" MS-DOS world) to bring
the computational task of matrix balancing, both with primal and dual algorithms,
more nearly into line.

THE COMPUTER AND ECONOMETRIC STUDIES

In "Variational Inequalities for the Computation of Financial Equilibria in the Pres-


ence of Taxes and Price Controls," A. Nagurney and 1. Dong develop a computa-
tional procedure that decomposes large-scale problems into a network of specialized,
individually-solvable subproblems on their way toward analyzing a financial model
of competitive sectors beset with tax and pricing policy interventions.
In "Modeling Dynamic Resource Adjustment Using Iterative Least Squares,"
A. Somwaru, E. Ball, and U. Vasavada develop and illustrate a computational

procedure for estimating structural dynamic models subject to restrictions such as


the inequalities entailed on profit functions through convexity in prices.
Recognizing that corporate behavior can be greatly affected by qualitative as well
as quantitative elements, A. Chakraborty and C.F. Baum, in "Intensity of Takeover
Defenses: The Empirical Evidence," harness the power of the computer to allow
them to study the qualitative issues surrounding the adoption and success of various
anti-takeover devices.

David A. Belsley
PART ONE

The Computer and Econometric Methods


RAVI BANSAL, A. RONALD GALLANT, ROBERT HUSSEY AND GEORGE TAUCHEN

Computational Aspects of Nonparametric


Simulation Estimation

ABSTRACT. This paper develops a nonparametric estimator for structural equilibrium models
that combines numerical solution techniques for nonlinear rational expectations models with
nonparametric statistical techniques for characterizing the dynamic properties of time series
data. The estimator uses the score function from a nonparametric estimate of the law
of motion of the observed data to define a GMM criterion function. In effect, it forces the
economic model to generate simulated data so as to match a nonparametric estimate of the
conditional density of the observed data. It differs from other simulated method of moments
estimators in using the nonparametric density estimate, thereby allowing the data to dictate
what features of the data are important for the structural model to match. The components
of the scoring function characterize important kinds of nonlinearity in the data, including
properties such as nonnormality and stochastic volatility.
The nonparametric density estimate is obtained using the Gallant-Tauchen seminonpara-
metric (SNP) model. The simulated data that solve the economic model are obtained using
Marcet's method of parameterized expectations. The paper gives a detailed description of the
method of parameterized expectations applied to an equilibrium monetary model. It shows that
the choice of the specification of the Euler equations and the manner of testing convergence
have large effects on the rate of convergence of the solution procedure. It also reviews sev-
eral optimization algorithms for minimizing the GMM objective function. The Nelder-Mead
simplex method is found to be far more successful than others for our estimation problem.

1. INTRODUCTION

A structural equilibrium model is a complete description of a model economy in-


cluding the economic environment, the optimization problem facing each agent, the
market clearing conditions, and an assumption of rational expectations. A structural
equilibrium model is difficult to estimate, as doing so entails repeated solution of a
fixed-point problem in many variables. One approach is to employ a linearization,
typically linear-quadratic, in conjunction with Gaussian specification for the errors.
A linear specification is attractive because a closed form solution can be obtained
(Hansen and Sargent, 1980). However, recent advances in numerical techniques now
make it possible to obtain good approximate solutions for nonlinear models. (See
the 1990 symposium in the Journal of Business and Economic Statistics (JBES),
summarized in Tauchen, 1990 and Taylor and Uhlig, 1990.) At the same time as
these developments in structural modelling have occurred, purely statistical models,
such as ARCH (Engle, 1982), GARCH (Bollerslev, 1986), and seminonparametric


models (Gallant and Tauchen, 1989, 1992), have been used to discover and charac-
terize important forms of nonlinear behavior in economic time series, especially in
financial time series. Linear Gaussian models cannot explain such nonlinear behav-
ior in actual data. Thus, nonlinear structural models must be examined to see the
extent to which they can explain the nonlinear behavior found in actual economic
data. This paper shows how statistical techniques can be combined with numerical
solution techniques to estimate nonlinear structural equilibrium models.
The most common approach for estimation of nonlinear structural models is
probably generalized method of moments (GMM) applied to Euler equations, as de-
veloped in Hansen and Singleton (1982). This technique has been widely employed in
financial economics and macroeconomics, though it is a limited information method
and has shortcomings. For example, the estimation can encounter problems when
there are unobserved variables, as is the case for the model we consider in Section
2 where the decision interval is a week, but some of the data are observed monthly.
Also it does not provide an estimate of the law of motion of the economic variables.
Thus, if the model is rejected, little information is available regarding the properties
of the observed data that the model has failed to capture.
In this paper we describe an alternative strategy for estimating nonlinear structural
models that was first applied in Bansal, Gallant, Hussey, and Tauchen (1992). The
approach is similar to the simulated method of moments estimators of Duffie and
Singleton (1989) and Ingram and Lee (1991). However, unlike those estimators,
which match preselected moments of the data, our estimator minimizes a GMM
criterion based on the score function of a nonparametric estimator of the conditional
density of the observed data. In effect, the estimator uses as a standard of comparison
a nonparametric estimate of the law of motion of the observed data. By selecting the
GMM criterion in this way, we allow the observed data to determine the dynamic
properties the structural model must match.
The estimator works by combining the method of parameterized expectations for
numerically solving a nonlinear structural equilibrium model (Marcet, 1991; den
Haan and Marcet, 1990) with the seminonparametric (SNP) method for estimating
the conditional density of actual data (Gallant and Tauchen, 1989, 1992). For a par-
ticular setting of the parameters of the structural model, the method of parameterized
expectations generates simulated data that solve the model. The model parameters
are then estimated by searching for the parameter values that minimize a GMM crite-
rion function based on the scoring function of the SNP conditional density estimate.
The nonparametric structural estimator thus has three components: (1) using SNP to
estimate the conditional density of actual data, (2) using the method of parameter-
ized expectations to obtain simulated data that satisfy the structural model, and (3)
estimating the underlying structural parameters by using an optimization algorithm
that finds those parameter values that minimize the GMM criterion function.
Below we discuss in detail how the estimator works in the context of a two-country
equilibrium monetary model. The model is based on Lucas (1982), Svensson (1985),
and Bansal (1990), and is developed in full detail in Bansal, Gallant, Hussey, and
Tauchen (1992). It accommodates time non-separabilities in preferences (Dunn and
Singleton, 1986) and money via a transactions cost technology (Feenstra, 1986). In

effect, the model is a nonlinear filter that maps exogenous endowment and money
supply processes into endogenous nominal processes, including exchange rates, in-
terest rates, and forward rates. We show how this nonlinear dynamic model can be
solved and simulated for estimation and evaluation.
In applying our estimator to this model, we find that there are several choices
available to the researcher that greatly affect the estimator's success and rate of
convergence. For example, the form in which one specifies the Euler equations
on which the parameterized expectations algorithm operates can significantly affect
the speed of convergence. This is an important finding, since our estimator uses
this algorithm repeatedly at different model parameter values. Also, the means for
testing convergence can have important consequences; we find it best to test for
convergence of the projection used in parameterized expectations instead of testing
for convergence of the coefficients representing the projection. Finally, we find that
the complexity of our estimation procedure causes some optimization algorithms
to have greater success in minimizing the GMM objective function. Among the
optimization techniques we tried are gradient search methods, simulated annealing,
and simplex methods. In Section 3.1 below we discuss how these methods work and
their strengths and weaknesses for our type of optimization problem.
The rest of the paper is organized as follows: Section 2 specifies the illustrative
monetary model and describes the simulation estimator. Section 3 discusses practical
aspects of implementing the estimator, including solving the model with parameter-
ized expectations and optimizing the GMM objective function to estimate the model
parameters. Concluding remarks comprise the final section.

2. THE NONPARAMETRIC STRUCTURAL ESTIMATOR

2.1. The Structural Model

We apply our nonparametric structural estimator to the equilibrium monetary model


of Bansal, Gallant, Hussey, and Tauchen (1992). In that model, a representative
world consumer has preferences defined over services from two consumption goods.
The utility function is assumed to have the form

E_0 \sum_{t=0}^{\infty} \beta^t \left[ \left( c_{1t}^{*\,\delta}\, c_{2t}^{*\,1-\delta} \right)^{1-\gamma} - 1 \right] \big/ (1 - \gamma),

where 0 < β < 1, 0 < δ < 1, γ > 0, and where c*_{1t} and c*_{2t} are the consumption
services from goods produced in countries 1 and 2, respectively. Preferences are of the
constant relative risk aversion (CRRA) type in terms of the composite consumption
goods. The parameter γ is the coefficient of relative risk aversion, δ determines the
allocation of expenditure between the two services, and β is the subjective discount
factor. If γ = 1, then preferences collapse to log-utility

E_0 \sum_{t=0}^{\infty} \beta^t \left( \delta \ln c^*_{1t} + (1 - \delta) \ln c^*_{2t} \right).

The transformation of goods to services is the linear technology

c^*_{it} = c_{it} + \sum_{j=1}^{L_c} \kappa_{ij}\, c_{i,t-j}, \qquad i = 1, 2,

where c_{1t} and c_{2t} are the acquisitions of goods, the κ_{ij} determine the extent to which
past acquisitions of goods provide services (and hence utility) in the current period,
and L_c is the lag length. If L_c = 0, then the utility function collapses to the standard
time-separable case where c^*_{1t} = c_{1t} and c^*_{2t} = c_{2t}. If the nonseparability parameters
κ_{ij} are positive, then past acquisitions of goods provide services today. If they are
negative, then there is habit persistence. Other patterns are possible as well: recent
acquisitions of goods can provide services today, while acquisitions further in the
past contribute to habit persistence.
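As a concrete illustration of this service technology, here is a minimal Python sketch (not the authors' code; the lag length, the weights κ_ij, and the goods series below are hypothetical):

import numpy as np

def services_from_goods(c, kappa):
    """Map a goods-acquisition series c[t] into a service series
    c_star[t] = c[t] + sum_j kappa[j-1] * c[t-j], following the linear
    technology described in the text; terms before the sample start are
    simply omitted for the first few observations."""
    Lc = len(kappa)
    c_star = c.copy().astype(float)
    for j in range(1, Lc + 1):
        c_star[j:] += kappa[j - 1] * c[:-j]
    return c_star

# Example: positive kappas mean recent purchases still yield services today;
# negative kappas would generate habit persistence.
goods = np.ones(10)
print(services_from_goods(goods, kappa=[0.3, -0.1]))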
We introduce money into the model via a transaction-costs technology. The
underlying justification for transactions costs is that the acquisition of goods is costly
both in terms of resources and time. Money, by its presence, economizes on these
costs and hence is valued in equilibrium. Transaction costs, ψ(c, m), in our model are
an increasing function of the amount of goods consumed c and a decreasing function
of the magnitude of real balances m held by the consumer in the trading period. The
functional form we use for the transaction-costs technology is

\psi(c, m) = \psi_0\, c^{\alpha} m^{1-\alpha},

where ψ_0 > 0 and α > 1.


The consumer's problem is to maximize expected utility E_0 \sum_{t=0}^{\infty} β^t U(c^*_{1t}, c^*_{2t}) by
choosing c_{1t}, c_{2t}, M_{1,t+1}, M_{2,t+1}, b^k_{1,t+1}, and b^k_{2,t+1}, k = 1, \ldots, N_a, at time t subject
to a sequence of budget constraints

P_{1t}\left[ c_{1t} + \psi(c_{1t}, m_{1t}) \right] + e_t P_{2t}\left[ c_{2t} + \psi(c_{2t}, m_{2t}) \right]
  + \sum_{k=1}^{N_a} (1/R^k_t)\, b^k_{1,t+1} + \sum_{k=1}^{N_a} (f^k_t/R^k_t)\, b^k_{2,t+1} + M_{1,t+1} + e_t M_{2,t+1}
  \le \sum_{k=1}^{N_a} (1/R^{k-1}_t)\, b^k_{1t} + \sum_{k=1}^{N_a} (f^{k-1}_t/R^{k-1}_t)\, b^k_{2t} + M_{1t} + e_t M_{2t}
  + P_{1t} w_{1t} + e_t P_{2t} w_{2t} + q_{1t} + e_t q_{2t}.
Here, P_{1t} and P_{2t} are current prices of consumption goods c_{1t} and c_{2t} in the units
of the respective country's currency. M_{1,t+1} and M_{2,t+1} are the stocks of currency
in the two countries carried forward from period t to t + 1. Real money balances,
m_{1t} = M_{1t}/P_{1t} and m_{2t} = M_{2t}/P_{2t}, are defined in terms of beginning-of-period
money holdings. The b^k_{1,t+1} and b^k_{2,t+1} are the agent's holdings of risk-free claims to
the currencies of countries 1 and 2 in period t + k. Claims on country 1's currency are
made by trading pure discount bonds with gross k-period interest rates R^k_t. Claims
on country 2's currency are made by trading forward contracts in the currency market,
where e_t is the spot exchange rate and f^k_t is the k-period forward exchange rate, with
both rates defined in units of country 1's currency per unit of country 2's currency.
w_{1t} and w_{2t} are the stochastic endowments of goods within the two countries. Lump-sum
transfers of q_{1t} and q_{2t} units of currency are made by the government at time t.
These transfers are known to the agent at the beginning of period t but can be used
for carrying out transactions only in period t + 1.
The stationary decision problem facing the agent delivers the following Euler
equations for the asset holdings M_{1,t+1} and M_{2,t+1}:

E_t\left[ \mathrm{MU}_{c_{it}} - \beta\, \mathrm{MU}_{c_{i,t+1}} \left( \frac{P_{it}}{P_{i,t+1}} \right) \left( \frac{1 + \psi_{c_{it}}}{1 + \psi_{c_{i,t+1}}} \right) \left( 1 - \psi_{m_{i,t+1}} \right) \right] = 0, \qquad i = 1, 2,

where MU_{c_{it}} is the marginal utility of c_{it}, and ψ_{c_{it}} and ψ_{m_{it}} are the derivatives
of transaction costs, ψ(c_{it}, m_{it}), with respect to the first and second arguments,
respectively. Transaction costs modify the returns to the two monies, M_{1t} and M_{2t}.
We would expect P_{1t}/P_{1,t+1} to be the return at time t + 1 for carrying forward an extra
unit of country 1's currency today. However, because of transaction costs, every
extra unit of currency carried forward also lowers transaction costs in the next period
by a real amount, -ψ_{m_{i,t+1}}, so the total return is given by (1 - ψ_{m_{i,t+1}}) P_{it}/P_{i,t+1}.
The model also delivers an intratemporal restriction on the choice of goods c_{1t} and
c_{2t}:

e_t = E_t\left[ \left( \frac{\mathrm{MU}_{c_{2t}}}{\mathrm{MU}_{c_{1t}}} \right) \left( \frac{P_{1t}}{P_{2t}} \right) \left( \frac{1 + \psi_{c_{1t}}}{1 + \psi_{c_{2t}}} \right) \right].
In maximizing utility, the consumer faces an exogenous stochastic process that
governs the evolution of money growth and endowment growth in the two countries.
We define the operator d to produce the ratio of the value of a variable in one
period to its value in the previous period, as, for example, dM_{1t} = M_{1t}/M_{1,t-1}.
Using this operator, we specify a driving process for the exogenous state vector
s_t = (dM_{1t}, dM_{2t}, dw_{1t}, dw_{2t})' of the form

\log s_t = a_0 + A \log s_{t-1} + u_t,

where u_t is iid N(0, Ω), a_0 is a 4-vector, and A and Ω are 4 × 4 matrices. More
complex stochastic processes for the exogenous state variables could easily be
accommodated by our numerical solution method.
The final elements needed to complete the description of the model are the market
clearing conditions

c_{it} + \psi(c_{it}, m_{it}) = w_{it}, \qquad i = 1, 2.
The parameter vector of the structural economic model is

λ = (β, γ, δ, ψ_0, α, κ_{11}, \ldots, κ_{1L_c}, κ_{21}, \ldots, κ_{2L_c}, a_0', \mathrm{vec}(A)', \mathrm{vech}(Ω^{1/2})')'.

For each value of λ the model defines a nonlinear mapping from the strictly exogenous
process {s_t} to an output process {U_t}. The output process is

U_t = (dM_{1t}, dM_{2t}, dw_{1t}, dw_{2t}, dc_{1t}, dc_{2t}, dP_{1t}, dP_{2t}, R^4_{1t}, f^4_t/e_t, de_t)',

which is an 11-vector containing the elements of s_t along with the gross consumption
growth rates, the gross inflation rates, the four-period interest rate in country 1, the
ratio of the four-period forward exchange rate to the spot rate, and the gross growth
rate of the spot exchange rate. It proves convenient also to include the elements of s_t
in the output process, mapping them directly with an identity map. The particular set
of variables comprising the remaining elements of U_t are those endogenous variables
that turn out to be of interest for various aspects of the analysis of the model and the
empirical work.
The mapping from ({s_t}, λ) to the endogenous elements of U_t is defined by the
solution to the nonlinear rational expectations model. In practice, we use Marcet's
method of parameterized expectations (Marcet, 1991; den Haan and Marcet, 1990)
to approximate the map. Given a value of λ, the method "solves" the model in the
sense of determining simulated realizations of the variables that satisfy the Euler
equations. In what follows, {U^λ_τ} denotes a realization of the output process given
λ and a realization of {s_t}. A complete description of how we apply the method of
parameterized expectations to this problem is given in Section 3.1 below.

2.2. The Estimation Method

The nonlinearity of the economic model prevents estimation by traditional methods


since it is computationally intractable to compute the likelihood of a sample as a
function of the model's parameters. However, simulation methods can be used to
compute predicted probabilities and expectations under the model. Thus we propose
a new simulation estimator that estimates the model by searching for the value of
the parameter A for which the dynamic properties of data simulated from the model
match, as closely as possible, the properties of actual data.
Not all elements of U_t generated by the model are actually observed weekly, so our
empirical strategy is to use latent-variable methods with our simulation estimator.
High-quality observations on financial market prices, i.e., payoff data, are widely
available on a weekly basis, and so we concentrate on these series in the estimation.
We utilize weekly observations on three raw series: SPOT_t, the spot exchange rate
(in $ per DM); FORWARD_t, the 30-day forward rate (in $ per DM); and TBILL_t, the
one-month treasury bill interest rate, computed from the term structure, and quoted
on a bank discount basis. From the raw series we form a 3-element process y_t =
(y_{1t}, y_{2t}, y_{3t})' with

y_{1t} = 100 \log(\mathrm{SPOT}_t/\mathrm{SPOT}_{t-1}),

y_{2t} = 100 \log(\mathrm{FORWARD}_t/\mathrm{SPOT}_t),

y_{3t} = \mathrm{TBILL}_t.
Exploratory empirical work indicates that {y_t} is reasonably taken as a strictly
stationary process, while the levels of the exchange rate series are nonstationary.
The correspondence between the elements of y_t and those of the output vector U_t
is as follows: Country 1 is the U.S. and country 2 is Germany. Given a simulated
realization {U^λ_τ} from the model, the corresponding {y^λ_τ} is computed as

y^λ_{1τ} = 100 \log(de^λ_τ),

y^λ_{2τ} = 100 \log(f^{4λ}_τ/e^λ_τ),

y^λ_{3τ} = 100\,(360/30)\left[ 1 - (1/R^{4λ}_{1τ}) \right].

The expression for y^λ_{3τ} converts 1/R^{4λ}_{1τ}, which is the price at time τ of $1 in period
τ + 4, to an annualized interest rate using the bank discount formula customarily
applied to treasury bill prices (Stigum, 1990, p. 66).
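For concreteness, the transformation from the raw weekly series to y_t can be sketched as follows (a Python illustration, not the authors' code; the array names and sample numbers are hypothetical, while the formulas are those given above):

import numpy as np

def build_observables(spot, forward, tbill_price_4wk=None, tbill_rate=None):
    """Form y1, y2, y3 from raw weekly series.
    y1: 100 * log growth of the spot rate ($/DM)
    y2: 100 * log 30-day forward premium
    y3: one-month T-bill rate on a bank discount basis; if a 4-week
        discount price (price today of $1 in 4 weeks) is supplied instead,
        annualize it with the bank discount formula 100*(360/30)*(1 - price)."""
    y1 = 100.0 * np.diff(np.log(spot))
    y2 = 100.0 * np.log(forward[1:] / spot[1:])
    if tbill_rate is not None:
        y3 = tbill_rate[1:]
    else:
        y3 = 100.0 * (360.0 / 30.0) * (1.0 - tbill_price_4wk[1:])
    return np.column_stack([y1, y2, y3])

# Hypothetical numbers, only to show the shapes involved.
spot = np.array([0.62, 0.63, 0.61])
forward = np.array([0.625, 0.633, 0.615])
price = np.array([0.9950, 0.9948, 0.9951])   # price today of $1 four weeks ahead
print(build_observables(spot, forward, tbill_price_4wk=price))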
The observed process is {y_t} and the simulated process is {y^λ_τ} as defined above.
The {y_t} process is computed directly from the raw data, while {y^λ_τ} is computed
using the structural model of Section 2.1. We assume the model to be "true" in the
sense that there is a particular value, λ^0, of the structural parameter vector and a
realization, {s_{0t}}, of the exogenous vector such that the observed {y_t} is obtained
from ({s_{0t}}, λ^0) in exactly the same manner that the model generates {y^λ_τ} from
({s_t}, λ).
In broad terms, the estimation problem of this paper is analogous to the situation
described, among others, by Duffie and Singleton (1989) and Ingram and Lee (1991).
Common practice in such situations is to use a simulated method of moments estima-
tor of λ^0 based on certain a priori selected moments of the data. We likewise propose
such an estimator, but we take a different approach in determining what moments to
match and in assigning relative weights in matching those moments.
The estimation strategy of this paper starts from the point of view that the structural
model should be forced to confront all empirically relevant aspects of the observed
process. The observed process {y_t} is strictly stationary and possibly nonlinear,
so its dynamics are completely described by the one-step-ahead conditional density
f(y_t | {y_{t-j}}_{j=1}^∞). Let f̂(·|·) denote a consistent nonparametric estimate computed
from a realization {y_t}_{t=t_0}^n. The estimator f̂(·|·) defines what is empirically relevant
about the process and thereby provides a comprehensive standard of reference upon
which to match the economic model to the data.
The keystone to our structural estimator is the scoring function of the SNP estimator
of Gallant and Tauchen (1989, 1992), which provides a consistent nonparametric
estimator of the conditional density under mild regularity conditions. This use of
the nonparametric fit to define the criterion of estimation motivates our choice of
the term "nonparametric structural estimator". The Gallant-Tauchen estimator is a
truncation estimator based on a series expansion that defines a hierarchy of increasingly
complex models. The estimator f̂(·|·) = f_K(·|·, θ̂_{Kn}) is characterized by an
auxiliary parameter vector θ̂_{Kn} that contains the coefficients of the expansion; the

subscript K denotes the Kth model in the hierarchy. The length of θ̂_{Kn} depends on
the model. In practice, K is determined by a model selection criterion that slowly
expands the model with sample size n and thereby ensures consistency. For the Kth
model in the hierarchy, the corresponding θ̂_{Kn} solves the first-order condition

\frac{\partial}{\partial \theta}\, \mathcal{L}_{Kn}\left( \{y_t\}_{t=t_0}^{n},\, \hat{\theta}_{Kn} \right) = 0,

where \mathcal{L}_{Kn}(\cdot) is the sample log likelihood of the corresponding model.

The nonparametric structural estimator is defined by mimicking this condition.
Specifically, subject to identifiability conditions, a consistent estimator is available by
choosing λ to make the same condition hold (as closely as possible) in the simulation

\frac{\partial}{\partial \theta}\, \mathcal{L}_{Kn}\left( \{y^{\lambda}_{\tau}\}_{\tau=\tau_0}^{T},\, \hat{\theta}_{Kn} \right) \approx 0.

The left-hand side is the gradient of the log likelihood function evaluated at a simulated
realization {y^λ_τ}_{τ=τ_0}^T and at the θ̂_{Kn} determined by fitting the Kth SNP model
to the actual data {y_t}_{t=t_0}^n. If the length of λ, ℓ_λ, is less than the length of θ_K, ℓ_K,
then the model is overidentified (under the order condition) and a GMM criterion is
used to minimize the length of the left-hand side with respect to a suitable weighting
matrix.
Interestingly, this approach defines a consistent and asymptotically normal estimator
irrespective of the particular SNP model used, so long as ℓ_K ≥ ℓ_λ and an
identification condition is met. In practice, we implement the estimator using the
particular SNP model that emerges from the specification search in the nonparametric
estimation of f(·|·). The choice of K is thus data-determined. This selection rule
forces the scoring function to be appropriate for the particular sample at hand. The
scoring function of the fitted SNP model contains just those indicators important to
fit the data and no more. Also, because the fitted SNP model has the interpretation of
a nonparametric maximum-likelihood estimator, the information equality from max-
imum likelihood theory provides a convenient simplification that greatly facilitates
estimation of the weighting matrix for the GMM estimation.

3. IMPLEMENTING THE ESTIMATOR

In this section we discuss the practical aspects of implementing the nonparametric


structural estimator described above. The implementation entails an initial SNP
estimation of the conditional density of observed payoff data. The score function from
this density estimate defines what properties our nonparametric structural estimator
must mimic. Because estimating SNP models has been described extensively in
Gallant and Tauchen (1989, 1992), we do not review that procedure here.
Following the SNP estimation, there are three distinct components to the proce-
dure. The first involves using the method of parameterized expectations to solve the
structural model for a particular value of the parameter vector λ. The second entails
combining the initial SNP estimation with the parameterized expectations procedure

to form the GMM objective function for the nonparametric structural estimator. The
third is optimization of the objective function. Each of these components is described
in detail below.

3.1. Solving the Model Using Parameterized Expectations

We use the method of parameterized expectations (Marcet, 1991; den Haan and
Marcet, 1990) to obtain simulated data that satisfy the Euler equations of the structural
economic model. In essence, this method approximates conditional expectations of
certain terms with the projections of those terms on a polynomial in the state variables.
The method uses Euler equations to iterate between postulated values of time series
and projections based on those postulated values until those values and projections
each converge. This procedure will be explained more fully below.
We find that the specification of the Euler equations greatly affects the speed with
which the parameterized expectations algorithm converges. From Section 2, the first
two Euler equations are

E_t\left[ \mathrm{MU}_{c_{it}} - \beta\, \mathrm{MU}_{c_{i,t+1}} \left( \frac{P_{it}}{P_{i,t+1}} \right) \left( \frac{1 + \psi_{c_{it}}}{1 + \psi_{c_{i,t+1}}} \right) \left( 1 - \psi_{m_{i,t+1}} \right) \right] = 0, \qquad i = 1, 2.

Using the definition of the velocity of money, V_{it} = c_{it} P_{it}/M_{it}, i = 1, 2, one form
in which these equations can be rewritten is

V_{it} = \frac{E_t\left( \mathrm{MU}_{c_{it}} \right)}{E_t\left[ \beta\, \mathrm{MU}_{c_{i,t+1}} \left( \dfrac{dc_{i,t+1}}{dM_{i,t+1} V_{i,t+1}} \right) \left( \dfrac{1 + \psi_{c_{it}}}{1 + \psi_{c_{i,t+1}}} \right) \left( 1 - \psi_{m_{i,t+1}} \right) \right]}, \qquad i = 1, 2.

Because of the time nonseparabilities in our model, it is also possible to rearrange


these Euler equations into an alternative form that expresses velocity as a single
conditional expectation rather than the ratio of two conditional expectations. (We
omit the derivation here.) It would seem at first that expressing the Euler equations as
a single conditional expectation would be advantageous since the solution algorithm
would have to estimate only one conditional expectation per Euler equation rather
than two. However, we have found that convergence of the algorithm with this
specification is much slower. This occurs because the single conditional expectation
contains a difference of two terms that remains stable across iterations, while the
time series from which it is constructed moves around substantially. The conditional
expectation of this difference is less informative for updating guesses at the solution
time series than are the two conditional expectations specified in the ratio above.
The next step in setting up the Euler equations entails various mathematical
manipulations that allow them to be expressed in terms of conditional expectations
of functions of velocity, consumption growth, and money growth:

V_{it} = \frac{E_t\left[ f_{i1}\left( dc_{1,t-L_c+2}, dc_{2,t-L_c+2}, \ldots, dc_{1,t+L_c+1}, dc_{2,t+L_c+1}, V_{i,t+1}, dM_{i,t+1}; \lambda \right) \right]}{E_t\left[ f_{i2}\left( dc_{1,t-L_c+2}, dc_{2,t-L_c+2}, \ldots, dc_{1,t+L_c+1}, dc_{2,t+L_c+1}, V_{it}, V_{i,t+1}, dM_{i,t+1}; \lambda \right) \right]}, \qquad i = 1, 2,

where the f_{ij}(·) are particular functional forms too complex to be written out here.
The market clearing conditions of the model imply that

dc_{it} = g(V_{it}, V_{i,t-1}, dw_{it}; λ) = dw_{it}\,(1 + ψ_0 V_{i,t-1}^{α-1})/(1 + ψ_0 V_{it}^{α-1}), \qquad i = 1, 2.


Given a vector λ and a realization of the exogenous state variables s_t, which includes
money growth dM_{it} and endowment growth dw_{it}, consumption growth dc_{it} is an
exact function of velocity, so the Euler equations above are fixed-point equations in
the two velocity series. This means we can solve these first two Euler equations for
the two equilibrium velocity processes as a unit before considering the remaining
Euler equations. Using the solution velocity processes, we can then calculate directly
equilibrium consumption growth dc_{it} and inflation dP_{it} for the two countries, and
we can solve the remaining Euler equations to determine the equilibrium k-period
interest rates in country 1, R^k_{1t}, the premium of the k-period forward rate over the
spot rate, f^k_t/e_t, and exchange rate growth de_t.
Several methods have been used to solve nonlinear rational expectations models
with endogenous state variables (Taylor and Uhlig, 1990; Judd, 1991). Among these,
parameterized expectations is particularly suited to use with a simulation estimator
because it produces simulated data that satisfy the Euler equations without having to
solve for the full decision rule. We parameterize each of the conditional expectations
in the above Euler equations as a function of the exogenous and endogenous state
variables. The augmented vector of state variables is

\hat{S}_t = (1,\, s_t',\, dc_{1,t-1}, dc_{2,t-1}, \ldots, dc_{1,t-L_c+1}, dc_{2,t-L_c+1})',

where 1 is concatenated for use as a constant in the regressions described below. If
L_c ≤ 1, then there are no endogenous state variables, and Ŝ_t is just equal to s_t and
a constant. Any class of dense functions, such as polynomials or neural nets, can be
used to approximate the conditional expectations. The particular functional form we
use to parameterize expectations is

E_t\left[ F_{ij,t} \right] = \exp\left[ \mathrm{poly}(\hat{S}_t, \nu_{ij}) \right],

where poly(·) is a polynomial in Ŝ_t, and ν_{ij} is the vector of its coefficients. We
choose to use an exponential polynomial because economic theory implies that
E_t[F_{ij,t}] should be positive. In practice, the polynomial we use consists of linear
and squared terms of the elements of Ŝ_t.
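A minimal Python sketch of this parameterization (the state vector and the coefficient vector below are placeholders; only the construction of the regressors, a constant plus linear and squared terms inside an exponential, follows the text):

import numpy as np

def poly_basis(S):
    """Regressor vector for one observation of the augmented state S_hat:
    a constant plus linear and squared terms of each element."""
    S = np.asarray(S, dtype=float)
    return np.concatenate(([1.0], S, S**2))

def conditional_expectation(S, nu):
    """E_t[F_ij,t] parameterized as exp of the polynomial, which keeps the
    fitted expectation positive, as the text requires."""
    return np.exp(poly_basis(S) @ nu)

# Hypothetical 4-element exogenous state (log money and endowment growth);
# nu would be estimated by the regressions in the algorithm described below.
S_hat = np.array([0.01, -0.02, 0.005, 0.0])
nu = np.zeros(1 + 2 * len(S_hat))
print(conditional_expectation(S_hat, nu))   # exp(0) = 1 with zero coefficients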
Below is a description of the algorithm for solving for the equilibrium velocity
series given a vector λ. In every instance, the ranges of the indices are i = 1, 2 and
j = 1, 2; superscripts indicate iteration numbers.

Step 1.
Simulate a realization of {u_t}, where u_t is iid N(0, Ω).

Step 2.
From some initial s_0, generate a realization of {s_t} using

\log s_t = a_0 + A \log s_{t-1} + u_t.

In practice, we set s_0 to a vector of ones, but in performing the parameterized
expectations regressions we exclude the first five hundred observations from the
simulated data to eliminate any effect from choosing initial values.
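Steps 1 and 2 amount to simulating a Gaussian VAR(1) in logs; a Python sketch under hypothetical parameter values (a_0, A, and Ω below are placeholders, and the 500-observation burn-in mirrors the text):

import numpy as np

def simulate_states(a0, A, Omega, T, burn=500, seed=0):
    """Simulate log s_t = a0 + A log s_{t-1} + u_t, u_t ~ N(0, Omega),
    starting from s_0 = 1 (log s_0 = 0) and discarding a burn-in."""
    rng = np.random.default_rng(seed)
    k = len(a0)
    chol = np.linalg.cholesky(Omega)
    log_s = np.zeros(k)
    out = np.empty((T, k))
    for t in range(T + burn):
        u = chol @ rng.standard_normal(k)
        log_s = a0 + A @ log_s + u
        if t >= burn:
            out[t - burn] = np.exp(log_s)
    return out   # rows are s_t = (dM1, dM2, dw1, dw2)

# Placeholder parameter values, roughly "small growth, mild persistence".
a0 = np.full(4, 0.001)
A = 0.5 * np.eye(4)
Omega = 0.0001 * np.eye(4)
print(simulate_states(a0, A, Omega, T=5).round(4))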

Step 3.
Determine starting realizations of the velocity series {V^0_{it}}.
We consider two possible ways to do this. The first is to specify starting values
for ν^0_{ij}, perhaps values of ν_{ij} obtained from a previous solution at a nearby λ. Then,
given ν^0_{ij} and some initial observations on velocities V^0_{it}, t = 0, \ldots, L_c, the remaining
elements of the starting velocity series for t = L_c + 1, \ldots, T can be determined
recursively using the following relationships:

V^0_{it} = \exp[\mathrm{poly}(\hat{S}^0_t, \nu^0_{i1})] \big/ \exp[\mathrm{poly}(\hat{S}^0_t, \nu^0_{i2})], \qquad dc^0_{it} = g(V^0_{it}, V^0_{i,t-1}, dw_{it}; \lambda).

This structure is recursive because Ŝ^0_t contains dc^0_{i,t-1}. A drawback to this approach
is that the simulated time series produced by the solution procedure are dependent
upon the starting values, so any attempt to replicate the solution exactly would require
knowing those starting values.
A second approach for establishing starting realizations of the velocity series
would set V^0_{1t} and V^0_{2t} to be constants for all t. For these constants, one could
calculate steady-state values for the two velocities, or simply set the velocities equal
to 1. This latter approach still produces convergence in a relatively small number of
iterations.
Regardless of the approach used to determine starting values of velocity, if one
uses the procedure described below to improve the stability of the algorithm by
dampening iteration updates, starting values must also be specified for the polynomial
coefficients ν^0_{ij}. We recommend setting all of the coefficients to zero except the
constants. This means that E_t(F_{ij,t}) = \exp[\mathrm{poly}(\hat{S}_t, \nu_{ij})] reduces to E_t(F_{ij,t}) =
\exp[\mathrm{constant}_{ij}]. The constants can be set equal to the log of the unconditional means
of the F_{ij,t}'s. Setting the initial polynomial coefficients in this way gives a very
stable position from which to start the iterations.

Step 4.
Iteration k: Using the V^{k-1}_{it} series, calculate the F^{k-1}_{ij,t} and regress each of these four
on a linearized version of \exp[\mathrm{poly}(\hat{S}^{k-1}_t, \nu^k_{ij})] to estimate ν^k_{ij}. The linearization is
done around ν^{k-1}_{ij}.
A linearized version of the exponential function is used to allow one to perform
linear regressions rather than nonlinear regressions at each iteration. When the
coefficients converge (ν^k_{ij} = ν^{k-1}_{ij}), the value of the exponential function is equal to
the value of its linearized version at the point at which we want to evaluate it.
Den Haan and Marcet (1990) actually suggest a more gradual way of modifying
the guesses at the polynomial coefficients from iteration to iteration. Rather than
setting ν^k_{ij} equal to the coefficients obtained from the regressions, one can set ν^k_{ij}
equal to a convex combination of those coefficients, call them b^k_{ij}, and the guess at
the coefficients from the previous iteration as

\nu^k_{ij} = \rho\, b^k_{ij} + (1 - \rho)\, \nu^{k-1}_{ij},

where 0 < ρ ≤ 1. This procedure has the effect of dampening the speed with which
the guesses at the coefficients are updated. The smaller is ρ, the more gradually the
coefficients are modified from one iteration to the next. One might want to use this
gradual updating scheme to stabilize iterations that are not well behaved. For the
model in this paper, we were always able to set ρ = 1, which implies no dampening
in updating the coefficients.

Step 5.
Determine the two V^k_{it} series according to

V^k_{it} = \exp[\mathrm{poly}(\hat{S}^{k-1}_t, \nu^k_{i1})] \big/ \exp[\mathrm{poly}(\hat{S}^{k-1}_t, \nu^k_{i2})],

and the two dc^k_{it} series according to

dc^k_{it} = g(V^k_{it}, V^k_{i,t-1}, dw_{it}; \lambda).

Step 6.
Repeat steps 4 and 5 until the velocity series converge. Convergence is reached when

\max_i \max_t \left| (V^k_{it} - V^{k-1}_{it}) / (V^{k-1}_{it} + \epsilon) \right| \le \xi,

where ε and ξ are small positive numbers.


Note that we check convergence on the velocity series, that is, on the ratios of
the parameterized expectations projections, which is a different procedure than that
used in Marcet (1991). Marcet looks for convergence of the coefficients of the
projections, rather than of the projections themselves. We check convergence on
the projections because of complications that arise when there is a high degree of
multicollinearity between the variables of the parameterized polynomial, as is the
case in our model. Multicollinearity makes it possible for the coefficients of the
polynomial to continue to oscillate between successive iterations even though the
projection onto the polynomial has essentially converged. Since it is the values of
the projections that are important for solving the model, we look for convergence of
those values.
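To make the flow of Steps 3 through 6 concrete, the Python sketch below outlines the iteration for one country. Everything model-specific (the F_{ij} terms, the mapping g, the polynomial basis) is passed in as a placeholder, and for brevity the regression is run on log F rather than on the linearized exponential of Step 4; the damped coefficient update and the projection-based convergence test follow the text.

import numpy as np

def solve_velocities(F, g, basis, S_hat, V0, rho=1.0, tol=1e-6, eps=1e-8,
                     max_iter=500):
    """Skeleton of the parameterized-expectations iteration for one country.
    F(V) returns the two series whose conditional expectations appear in the
    velocity Euler equation, given a postulated velocity path V; g(V) returns
    the implied consumption-growth path; basis(s) returns the polynomial
    regressors for one row of S_hat.  All three are model-specific placeholders."""
    X = np.apply_along_axis(basis, 1, S_hat)          # T x p regressor matrix
    V = V0.copy()
    nu = [np.zeros(X.shape[1]), np.zeros(X.shape[1])]
    for _ in range(max_iter):
        F1, F2 = F(V)
        new_nu = []
        for j, Fj in enumerate((F1, F2)):
            # Simplified regression step: log-linear fit instead of the
            # linearized-exponential regression described in Step 4.
            b, *_ = np.linalg.lstsq(X, np.log(Fj), rcond=None)
            new_nu.append(rho * b + (1.0 - rho) * nu[j])   # damped update
        proj1, proj2 = np.exp(X @ new_nu[0]), np.exp(X @ new_nu[1])
        V_new = proj1 / proj2                              # Step 5
        # Step 6: convergence is checked on the projections, not the nu's.
        if np.max(np.abs((V_new - V) / (V + eps))) <= tol:
            return V_new, g(V_new), new_nu
        V, nu = V_new, new_nu
    raise RuntimeError("parameterized expectations did not converge")

# Toy usage with made-up F, g, and basis, only to exercise the loop.
T = 200
rng = np.random.default_rng(1)
S = rng.normal(size=(T, 2)) * 0.1
toyF = lambda V: (np.exp(0.3 * S[:, 0]), np.ones(T))
toyg = lambda V: np.ones(T)
toybasis = lambda s: np.concatenate(([1.0], s, s**2))
V, dc, nu = solve_velocities(toyF, toyg, toybasis, S, V0=np.ones(T))
print(V[:3])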

In summary, the parameterized expectations solution method works by alternating


between estimating values of conditional expectations based on some postulated
realization of the velocity processes (which amounts to estimating the ν_{ij}'s) and
realization of the velocity processes (which amounts to estimating the lIi/S) and
updating the postulated values of the velocity processes based on the estimated
conditional expectation values. The procedure continues until the velocity processes
converge.

Once the equilibrium velocity and consumption growth series have been determined
from the first two Euler equations, the four-period interest rate series in country 1,
the premium of the four-period forward exchange rate over the spot rate, and the
exchange rate growth can be determined from the remaining Euler equations without
additional iterations. The Euler equations can be written as

In these equations dP_{it} = (dM_{it} V_{it})/(dc_{it} V_{i,t-1}), the gross inflation rate in
each country. As before, f_{13} and f_{23} are particular functional forms. The conditional
expectations terms in the equations are each estimated by regressing the value of the
function inside the expectations operator on a polynomial in Ŝ_t. The polynomial we
use consists of the elements of Ŝ_t raised to the first, second, third, and fourth powers.
The resulting simulation values are used to form {y^λ_τ}.
The time required to solve the structural economic model at some value of λ is
an important consideration, since our nonparametric estimator requires solutions at
many different values of λ in finding the value that minimizes the GMM objective
function. When we use simulated time series of length 1000 to solve the model
(excluding an initial discarded 500 observations), convergence for most values of λ
is achieved in approximately one minute on a SUN SPARCstation 2.

3.2. Defining the GMM Objective Function

The Gallant-Tauchen (1992) SNP estimator underlies our nonparametric structural
estimator. Following their notation, given the observed process {y_t}, let x_{t-1} =
(y'_{t-1}, \ldots, y'_{t-L})' and let p(y_t | x_{t-1}, λ^0) denote the conditional density of y_t
conditional on L lags of itself and the true λ^0. By stationarity, we can suppress the t
subscript and simply write p(y|x, λ^0) when convenient. In addition, let p(y, x, λ^0)
denote the joint density of (y_t, x_{t-1}). Frequently, we suppress the dependence of

the conditional density on AO and write p(ylx), but we always make explicit the
dependence of the joint density p(y, x, AO) on AO, because that becomes important.
The SNP estimator is a sieve estimator that is based on the sequence of models
{!K(ylx,OK)}K=o, where OK E 0K ~ ~K, 0 K ~ 0K+I and where!(ylx,OK) is
a truncated Hermite series expansion. This hierarchy of models can, under regularity
conditions, approximate p(ylx) well in the sense

where II . II is a Sobelov norm. The approximation also holds along a sequence of


estimated models fitted to data sets {y-L+h ... , Yn}, n = 1,2, ... 00, with the the
appropriate model for each n determined by a model selection strategy.
The key component of our nonparametric structural estimator is the mean gradient
of the log-density of a Kth order SNP model,

g(\lambda, \theta_K) = \int\!\!\int \frac{\partial}{\partial \theta} \log\left[ f_K(y\,|\,x, \theta_K) \right] p(y, x, \lambda)\, dy\, dx.

In practice, the above expectation is approximated by simulating {u^λ_τ}_{τ=1}^T, forming
{y^λ_τ} as just described, taking lags to form {x^λ_{τ-1}}, and then averaging

\hat{g}(\lambda, \theta_K) = \frac{1}{T} \sum_{\tau=1}^{T} \frac{\partial}{\partial \theta} \log\left[ f_K(y^{\lambda}_{\tau}\,|\,x^{\lambda}_{\tau-1}, \theta_K) \right].

We take \hat{g}(\lambda, \theta_K) \approx g(\lambda, \theta_K).


The nonparametric structural estimator is defined as follows: Let {y_t}_{t=-L+1}^{n} be
a realization of the observed process and let

\hat{\theta}_{Kn} = \arg\max_{\theta_K \in \Theta_K} \frac{1}{n} \sum_{t=1}^{n} \log\left[ f_K(y_t\,|\,x_{t-1}, \theta_K) \right].

Thus, θ̂_{Kn} is the estimated parameter vector of a Kth order SNP model fitted to the
data by maximum likelihood. The estimator λ̂ is the solution of the GMM estimation
problem

\hat{\lambda} = \arg\min_{\lambda \in \Lambda}\, s_n(\lambda),

where

s_n(\lambda) = \hat{g}(\lambda, \hat{\theta}_{Kn})'\, W_n\, \hat{g}(\lambda, \hat{\theta}_{Kn}),

and where W_n is a symmetric positive definite weighting matrix such that W_n \to W
almost surely and W is positive definite. In the application, we use

W_n = \left\{ \frac{1}{n} \sum_{t=1}^{n} \frac{\partial}{\partial \theta} \log\left[ f_K(y_t\,|\,x_{t-1}, \hat{\theta}_{Kn}) \right] \frac{\partial}{\partial \theta'} \log\left[ f_K(y_t\,|\,x_{t-1}, \hat{\theta}_{Kn}) \right] \right\}^{-1},

which is the natural estimate of the inverse of the information matrix based on the
gradient-outer-productformula. This choice makes the minimized value of the GMM
objective function, sn(.X), approximately X2(£K - £A) for large K.
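A sketch of how these pieces fit together once an SNP score function is in hand (the score function, the simulator, and the toy numbers below are stand-ins, not the SNP model itself; the quadratic form and the gradient-outer-product weighting matrix follow the definitions above):

import numpy as np

def gmm_objective(lam, score_fn, theta_hat, simulate, y_obs, x_obs):
    """s_n(lambda) = g_hat' W_n g_hat, where g_hat averages the scores over
    data simulated at lambda and W_n is the inverse of the average outer
    product of the scores at the observed data."""
    # Scores at the observed data define the weighting matrix.
    obs_scores = np.array([score_fn(y, x, theta_hat)
                           for y, x in zip(y_obs, x_obs)])
    W = np.linalg.inv(obs_scores.T @ obs_scores / len(obs_scores))
    # Scores averaged over a simulated realization at this lambda.
    y_sim, x_sim = simulate(lam)
    g_hat = np.mean([score_fn(y, x, theta_hat)
                     for y, x in zip(y_sim, x_sim)], axis=0)
    return g_hat @ W @ g_hat

# Toy stand-ins, only to show the shapes: a 2-moment "score" and a trivial
# "simulator"; in practice W would be computed once and reused.
rng = np.random.default_rng(0)
score_fn = lambda y, x, th: np.array([y - th[0], (y - th[0])**2 - th[1]])
y_obs = rng.normal(size=200); x_obs = np.zeros(200)
simulate = lambda lam: (rng.normal(loc=lam[0], size=500), np.zeros(500))
print(gmm_objective(np.array([0.1]), score_fn, np.array([0.0, 1.0]),
                    simulate, y_obs, x_obs))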
Below we consider several different algorithms for minimizing s_n(λ). Regardless
of the algorithm, it is advantageous to control the interface between the optimizer
and the economic model by scaling the optimizer's guesses at the parameter values
to be within a range in accordance with the economic theory behind our model.
For example, in our model it only makes sense for δ to be between 0 and 1, so we
constrain the optimizer to attempt solutions only with such values. These constraints
are imposed by using various forms of logistic transformations.
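A minimal Python sketch of such a reparameterization for a parameter restricted to (0, 1), such as δ (the function names are ours):

import numpy as np

def to_unit_interval(x):
    """Map an unconstrained optimizer guess x to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def from_unit_interval(delta):
    """Inverse map, used to initialize the optimizer from an economic value."""
    return np.log(delta / (1.0 - delta))

# The optimizer searches over x freely; the model only ever sees delta in (0, 1).
x0 = from_unit_interval(0.5)
print(to_unit_interval(x0 + 10.0), to_unit_interval(x0 - 10.0))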

3.3. Optimizing the Objective Function

The basic computational task for the estimator is to evaluate λ̂ = \arg\min_{λ∈Λ} s_n(λ).
This minimization is not straightforward for our problem because of the large number
of parameters to be estimated (between 37 and 41 depending upon whether one, two,
or three lags of consumption services enter the utility function) and because analytical
derivatives of the objective function with respect to λ are not available. We tried
four different algorithms for minimizing the objective function and found significant
differences across algorithms for our problem.

3.3.1. Optimizing with NPSOL and DFP


We initially tried two classic gradient search methods: NPSOL (Gill, Murray, Saun-
ders, and Wright, 1986), and Davidon-Fletcher-Powell (DFP), as implemented in the
GQOPT package (Quandt and Goldfeld, 1991). Both algorithms work in a similar
manner. A search direction is determined, a one-dimensional optimization is per-
formed along that direction, and then the search direction is updated. The process is
repeated until a putative optimum is achieved.
These algorithms work quite well when analytic derivatives are available. For
example, we use NPSOL to perform the preliminary SNP parameter estimation to
compute θ̂_{Kn}, which is needed to form s_n(λ). Analytical derivatives are available
for the SNP objective function, and NPSOL works adequately even on fairly large
problems. In our application, the SNP estimation itself entails a specification search
over roughly thirty different models with some having as many as 150 parameters.
That whole effort takes only three or four days on a SUN SPARCstation 2. In a
variety of other SNP applications, NPSOL has been found to work reasonably well
(Gallant and Tauchen, 1992).
Analytical derivatives of s_n(λ), however, are computationally infeasible. The
process {y^λ_τ} is a solution to a fixed-point problem, as are its analytical derivatives.
Computing ∂s_n(λ)/∂λ would involve computing a solution to a fixed-point problem
for each component. Evaluating s_n(λ) and its derivatives for arbitrary λ is well
beyond the reach of current computing equipment. The large computational demands
for analytical derivatives appear to be intrinsic to all solution methods for nonlinear
structural models, including those described in the JBES Symposium (Tauchen, 1990;
Taylor and Uhlig, 1990) or Judd (1991), since they all entail solving nonlinear fixed-

point problems.
Gradient search methods use numerical derivatives in place of analytical deriva-
tives when the latter are unavailable. For our type of problem, this does not work well.
The computations turn out to be about as demanding as would be those for analytical
derivatives. Approximating the gradient of the objective function ∂s_n(λ)/∂λ at a
single point λ entails computing the simulated process {y^λ_τ} after small perturbations
in each of the elements of λ. With ℓ_λ on the order of 37 to 41, this entails, at a minimum,
recomputing the equilibrium of the model that many additional times just to approximate a
single one-sided gradient. The net effect is to generate about as many function calls
as would a naive grid search. In fact, our experience suggests that a naive grid search
might even work better. In the course of approximating ∂s_n(λ)/∂λ via perturbing λ
and forming difference quotients, values of λ that produce sharp improvement in the
objective function are uncovered quite by happenstance. Neither NPSOL nor DFP
retains and makes use subsequently of these particularly promising values of λ; the
effort that goes into computing the equilibrium for these λ is lost. A simple grid
search would retain these λ's.

3.3.2. Optimizing with Simulated Annealing


We also tried simulated annealing, a global method. An implementation of simulated
annealing by William Goffe is available in the GQOPT optimization package (Quandt
and Goldfeld, 1991). We used an updated version that William Goffe kindly made
available to us. See Goffe, Ferrier, and Rodgers (1992) for a discussion of the
algorithm and additional references. We give a brief summary of the essential ideas
here.
From a point λ, simulated annealing changes element i of λ using

\lambda'_i = \lambda_i + r\, v_i,

where r is a uniformly distributed random number over [-1, 1] and v_i is the ith element
of a vector of weights v. If s_n(λ') is smaller than s_n(λ), the point is accepted. If not,
the point is accepted if

p = e^{[s_n(\lambda) - s_n(\lambda')]/T}

exceeds a random draw from the uniform over [0, 1].

The elements of v and T are tuning parameters that must be selected in advance and
are adjusted throughout the course of the iterations. We used the defaults. There are
additional tuning parameters that determine when these adjustments occur. Again,
we accepted the defaults.
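A Python sketch of a single candidate move under these rules (the objective below is a stand-in for s_n(λ), and the internal details of the GQOPT routine may differ):

import numpy as np

def sa_step(lam, s_lam, objective, v, T, rng):
    """Perturb one randomly chosen element of lambda and apply the
    Metropolis-style acceptance rule for minimization."""
    i = rng.integers(len(lam))
    cand = lam.copy()
    cand[i] += rng.uniform(-1.0, 1.0) * v[i]
    s_cand = objective(cand)
    if s_cand < s_lam:                      # downhill moves always accepted
        return cand, s_cand
    p = np.exp((s_lam - s_cand) / T)        # uphill moves accepted with prob. p
    if p > rng.uniform():
        return cand, s_cand
    return lam, s_lam

# Toy objective just to exercise the step.
rng = np.random.default_rng(0)
obj = lambda x: float(np.sum(x**2))
lam = np.array([1.0, -2.0]); s = obj(lam)
for _ in range(5):
    lam, s = sa_step(lam, s, obj, v=np.array([0.5, 0.5]), T=1.0, rng=rng)
print(lam, s)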
The algorithm was defeated by the large number of function evaluations that it
requires. Most exasperating was its insistence on exploring unprofitable parameter
values. After making some promising initial progress the algorithm would plateau
far from an optimum and give no indication that further progress could be achieved
if the iterations were permitted to continue.

3.3.3. Optimizing with Simplex Methods


The optimization method that performs best for our problem is the simplex method
developed by Nelder and Mead (1964). Fortran code for implementing this method

is available in the GQOPT optimization package (Quandt and Goldfeld, 1991). The
method works as follows: We begin the minimization of a function of ℓ_λ variables
by constructing a simplex of (ℓ_λ + 1) points in ℓ_λ-dimensional space: λ_0, λ_1, \ldots,
λ_{ℓ_λ}. We denote the value of the function at point λ_i by s_i. The lowest, highest, and
second-highest values are

s_l = \min_i(s_i), \qquad s_h = \max_i(s_i), \qquad s_{hh} = \max_{i \ne h}(s_i),

corresponding to points λ_l, λ_h, and λ_{hh}. We also define the notation [λ_i λ_j] to indicate
the distance from λ_i to λ_j.
The algorithm works by continually replacing λ_h in the simplex by another
point with a lower function value. Three operations are used to search for such a new
point (reflection, contraction, and expansion), each of which is undertaken relative to
the centroid λ̄ of the simplex points excluding λ_h. The centroid is constructed as

\bar{\lambda} = \frac{1}{\ell_\lambda} \sum_{i \ne h} \lambda_i.

The reflection of λ_h through the centroid is λ_r, which is defined by

\lambda_r = (1 + \alpha_r)\bar{\lambda} - \alpha_r \lambda_h,

where α_r > 0 is the reflection coefficient. λ_r lies on the line between λ_h and λ̄, on
the far side of λ̄, and α_r is the ratio of the distance [λ_r λ̄] to [λ_h λ̄]. If s_l < s_r ≤ s_{hh},
we replace λ_h with λ_r and start the process again with this new simplex.
If reflection has produced a new minimum (s_r < s_l), we search for an even lower
function value by expanding the reflection. The expansion point is defined by

\lambda_e = \alpha_e \lambda_r + (1 - \alpha_e)\bar{\lambda},

where α_e > 1 is the expansion coefficient that defines the ratio of the distance [λ_e λ̄]
to [λ_r λ̄]. λ_e is farther out than λ_r on the line between λ_h and λ̄. If s_e < s_r, λ_h is
replaced in the simplex by λ_e. Otherwise, the expansion has failed and λ_r replaces
λ_h. The process is then restarted with the new simplex.
If reflection of λ_h has not even produced a function value less than s_{hh} (which
means that replacing λ_h with λ_r would leave s_r the maximum), we rename λ_h to be
either the old λ_h or λ_r, whichever has a lower function value. Then we attempt to
find an improved point by constructing the contraction

\lambda_c = \alpha_c \lambda_h + (1 - \alpha_c)\bar{\lambda},

where 0 < α_c < 1. The contraction coefficient α_c is the ratio of the distance [λ_c λ̄]
to [λ_h λ̄]. If s_c < s_h, then the contraction has succeeded, and we replace λ_h with
λ_c and restart the process. If this contraction has failed, we construct a new simplex
by contracting all the points toward the one with the lowest function value, which is
accomplished by replacing the λ_i's with (λ_i + λ_l)/2. Then the process of updating
the simplex restarts.

Nelder and Mead suggest stopping their procedure when the standard deviation
of the λ_i's is less than some critical value. In our empirical work, we strengthen this
stopping rule by restarting the algorithm several times from the value on which the
Nelder-Mead procedure settles. When this restarting leads to no further significant
improvement in the objective function value, we accept the best point as the minimum
of the function. In implementing the algorithm, we also found it advantageous to
modify the error handling procedures of the Nelder-Mead code provided in GQOPT
slightly to allow us to start the procedure with a wider-ranging simplex.
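The same strategy, Nelder-Mead with repeated restarts from the point on which the previous run settles, can be sketched in Python with an off-the-shelf routine (scipy's implementation rather than the GQOPT Fortran code used here, and the objective below is only a stand-in for s_n(λ)):

import numpy as np
from scipy.optimize import minimize

def nelder_mead_with_restarts(objective, lam0, restarts=3):
    """Run Nelder-Mead, then restart it from the point it settles on until
    the restarts stop producing a meaningful improvement."""
    best = minimize(objective, lam0, method="Nelder-Mead")
    for _ in range(restarts):
        trial = minimize(objective, best.x, method="Nelder-Mead")
        if trial.fun < best.fun - 1e-10:
            best = trial
        else:
            break
    return best

# Stand-in objective with a mild ridge, just to exercise the restarts.
obj = lambda x: (x[0] - 1.0)**2 + 100.0 * (x[1] - x[0]**2)**2
res = nelder_mead_with_restarts(obj, np.array([-1.2, 1.0]))
print(res.x, res.fun)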
The Nelder-Mead simplex method was far more successful than the other methods
we tried for minimizing our objective function. There are two aspects of this method
that we believe are responsible for its success. First, the method finds new lower
points on the objective surface without estimating derivatives. Second, by using
the operations of reflection, expansion, and contraction, the Nelder-Mead method
is designed to jump over ridges in the objective surface easily in searching for new
lower points. This property can be important in preventing an optimization algorithm
from shutting down too early. Despite these advantages, however, the performance
of the Nelder-Mead method is not completely satisfactory, because it requires a very
large number of function calls to find the minimum of the function. Given the number
of parameters in our model and the complexity of evaluating the objective function
at any one point, the method can occupy several weeks of computing time on a Sun
SPARCstation. Even though this computing demand is substantial and far greater
than we expected from the outset of this project, we still consider our nonparametric
structural estimator very successful in achieving our goal of estimating a nonlinear
rational expectations model and fully accounting for the complex nonlinear dynamics
of actual time series in that estimation.
Results from applying this estimator to the illustrative monetary model are avail-
able in Bansal, Gallant, Hussey, and Tauchen (1992).

4. CONCLUSION

In this paper we describe a new nonparametric estimator for structural equilibrium
models and show its application to an equilibrium monetary model. The discussion
of the implementation of the estimator indicates important considerations that might
arise in applying the estimator to other nonlinear rational expectations models.
There are several advantages to this estimator. By using the method of parameter-
ized expectations to solve the model numerically, structural equilibrium models can
be estimated without limiting oneself to linear approximations. By using a consistent
nonparametric estimate of the conditional density of the observed data to define the
criterion to be minimized in estimation, the estimator forces the model to confront the
law of motion of the observed data, which can include complex forms of nonlinearity.
Finally, the estimator provides simulated data from the model. If a model is rejected,
then it is possible to evaluate the dimensions in which it fails to match characteristics
of the observed data, thus providing valuable diagnostic information for building
better models.

ACKNOWLEDGEMENTS

This material is based upon work supported by the National Science Foundation
under Grants No. SES-8808015 and SES-90-23083. We thank Geert Bekaert,
Lars Hansen, David Hsieh, Ellen McGrattan, Tom Sargent, and many seminar and
conference participants for helpful comments at various stages of this research.

REFERENCES

Bansal, R., 1990, "Can non-separabilities explain exchange rate movements and risk premia?",
Carnegie Mellon University, Ph.D. dissertation.
Bansal, R., A. R. Gallant, R. Hussey and G. Tauchen, 1992, "Nonparametric estimation of
structural models for high-frequency currency market data", Duke University, manuscript.
Bollerslev, T., 1986, "Generalized autoregressive conditional heteroskedasticity", Journal of
Econometrics 31,307-327.
den Haan, W. J. and A. Marcet, 1990, "Solving the stochastic growth model by parameterizing
expectations", Journal of Business and Economic Statistics 8, 31-4.
Duffie, D. and K. J. Singleton, 1989, "Simulated moments estimation of markov models of
asset prices", Stanford University, Graduate School of Business, manuscript.
Dunn, Kenneth and K. J. Singleton, 1986, "Modeling the term structure of interest rates under
non-separable utility and durability of goods", Journal of Financial Economics 17, 27-55.
Engle, R. F., 1982, "Autoregressive conditional heteroscedasticity with estimates of the vari-
ance of United Kingdom inflation", Econometrica 50, 987-1007.
Feenstra, R. C., 1986, "Functional equivalence between liquidity costs and the utility of
money", Journal of Monetary Economics 17,271-291.
Gallant, A. R. and G. Tauchen, 1989, "Seminonparametric estimation of conditionally con-
strained heterogeneous processes: asset pricing applications", Econometrica 57, 1091-
1120.
Gallant, A. R. and G. Tauchen, 1992, "A nonparametric approach to nonlinear time series:
estimation and simulation", in David Brillinger, Peter Caines, John Geweke, Emanuel
Parzen, Murray Rosenblatt, and Murad S. Taqqu (eds.), New Directions in Time Series
Analysis, Part II, New York: Springer-Verlag, 71-92.
Gill, P. E., W. Murray, M. A. Saunders and M. H. Wright, 1986, "User's guide for NPSOL
(version 4.0): a Fortran package for nonlinear programming", Technical Report SOL 86-2,
Palo Alto: Systems Optimization Laboratory, Stanford University.
Goffe, W. L., G. D. Ferrier, and J. Rodgers, 1992, "Global Optimization of statistical functions:
Preliminary results" in Hans M. Amman, David A. Belsley, and Louis F. Pau (eds.), Com-
putational Economics and Econometrics, Advanced Studies in Theoretical and Applied
Econometrics, Vol. 22, 19-32, Boston: Kluwer Academic Publishers.
Hansen, L. P., 1982, "Large sample properties of generalized method of moments estimators",
Econometrica 50, 1029-1054.
Hansen, L. P. and T. J. Sargent, 1980, "Formulation and estimation of dynamic linear rational
expectations models", Journal of Economic Dynamics and Control 2, 7-46.
Hansen, L. P. and K. J. Singleton, 1982, "Generalized instrumental variables estimators of
nonlinear rational expectations models", Econometrica 50, 1269-1286.
Ingram, B. F. and B. S. Lee, 1991, "Simulation estimation of time-series models", Journal of
Econometrics 47, 197-205.
Judd, K. L., 1991, "Minimum weighted least residual methods for solving aggregate growth
models", Federal Reserve Bank of Minneapolis, Institute of Empirical Macroeconomics,
manuscript.

Lucas, R. E., Jr., 1982, "Interest rates and currency prices in a two-country world", Journal of
Monetary Economics 10, 335-360.
Marcet, A., 1991, "Solution of nonlinear models by parameterizing expectations: an applica-
tion to asset pricing with production", manuscript.
McCallum, B. T., 1983, "On non-uniqueness in rational expectations models: an attempt at
perspective", Journal of Monetary Economics 11, 139-168.
Nelder, J. A. and R. Mead, 1965, "A simplex method for function minimization", The Computer
Journal 7, 308-313.
Quandt, R. E. and S. M. Goldfeld, 1991, GQOPT/PC, Princeton, N.J.
Stigum, M., 1990, The money market, 3rd ed., Homewood, Ill.: Dow Jones-Irwin.
Svensson, L. E. 0., 1985, "Currency prices, terms of trade and interest rates: a general
equilibrium asset-pricing cash-in-advance approach", Journal of International Economics
18, 17-41.
Tauchen, G., 1990, "Associate editor's introduction", Journal of Business and Economic
Statistics 8, 1.
Taylor, J. B. and H. Uhlig, 1990, "Solving nonlinear stochastic growth models: a comparison
of alternative solution methods", Journal of Business and Economic Statistics 8, 1-17.
A.J. HUGHES HALLETT AND YUE MA

On the Accuracy and Efficiency of GMM Estimators:


A Monte Carlo Study

ABSTRACT. GMM estimators are now widely used in econometric and financial analysis.
Their asymptotic properties are well known, but we have little knowledge of their small sample
properties or their rate of convergence to their limiting distribution. This paper reports small
sample Monte Carlo evidence which helps discriminate between the many GMM estimators
proposed in the literature. We add a new GMM estimator which delivers better finite sample
properties. We also test whether biases in the parameter estimates are either significant or
significantly different between estimators. We conclude that they are, with both relative and
absolute biases depending on sample size, fitting criterion, non-normality of disturbances, and
parameter size.

1. INTRODUCTION

One of the most interesting developments in econometric theory over the past decade
has been the introduction of the General Method of Moments (GMM) estimators.
Not only is this a significant development because it offers a new and more flexible
approach to estimation, it also opens up an estimation methodology that is particularly
well suited to a range of problems - such as the econometrics of financial markets -
where the form of the probability distributions, as well as their parameters, plays an
important role.
The theoretical properties of GMM estimators - consistency, asymptotic effi-
ciency and sufficiency - were established rapidly after Hansen first introduced the
concept (Hansen, 1982). These properties are established in Duffie and Singleton
(1989), Smith and Spencer (1991) and Deaton and Laroque (1992). However, few
results have been presented on the GMM's small sample properties or rate of conver-
gence to consistency. This would provide important information on the general relia-
bility of GMM estimators. It seems that we lack such information because, although
the principle of GMM estimation is well defined, there is no obvious agreement on
the algorithms to be used for computing the estimates themselves. The theoretical
contributions have been vague on implementation and the choice of fitting criterion.
This paper examines 7 different suggestions from the recent literature.
The first purpose of this paper is to provide some empirical experience that helps
the user discriminate between different GMM estimation techniques. Second, we
introduce a new GMM estimator which, in our experiments at least, produces better
finite sample results than any of the other techniques reported in the literature. Third,


we draw a distinction between the case where we have to estimate a few model
parameters conditionally on an assumed distribution for the random components (the
traditional econometric approach) and the more general problem of fitting a whole
distribution or probability model.

2. THE GMM ESTIMATORS STUDIED

Most GMM estimators can be specified within the framework established by Hansen
(1982). That framework exploits the general orthogonality condition

    E[g(xt, β)] = 0 ,                                                  (1)

where β is a k-vector of parameters, xt is a T-vector of data, and g(x, β) is an
m-vector of functions of data and parameters. We would have (1) if the maintained
hypothesis is a conventional econometric model, say, yt − h(xt, β) = ut. That
gives the standard regression approach, where we try to minimise some function of
g(·) = yt − h(xt, β) as a sample of T observations taken right through the distribution
of ut. But we could also pick β to make the fitted ut distribution yield the same
characteristics as we actually observe in the data on yt, given xt. More generally, we
pick β to minimise a discrepancy of the form

    f(yt) − f̂(xt, β) ,                                                 (2)

where f(yt) is any function of the observed data, and f̂(xt, β) is its fitted counterpart
under the maintained hypothesis and chosen parameter values β.
In practice we have to define the best fit in some metric, i.e. we choose β̂ by
solving

    min_β ‖ f(yt) − f̂(xt, β) ‖_W^r ,                                   (3)

where the value of r defines the norm and W the weighting function. This is the
GMM strategy when the f(·) represent a series of moments from the probability
distribution of yt; that is, f(·) defines the sample moments, and f̂(·) represents the
fitted moments given xt and the choice of β. In many cases we do not have an
analytic maintained hypothesis, so the fitted moments f̂(·) have to be constructed
by numerical simulation with pseudo-data replicated many times through the model
to generate numerical evaluations of those moments. That variant is the method of
simulated moments.
simulated moments.
Now, if (1) is correct, the sample moment

    gT(β) = Σ(t=1..T) g(xt, β) / T

should be close to zero when evaluated at β = β̂. It is therefore reasonable to
estimate β by choosing β̂ to minimise

    JT(β) = gT(β)′ WT gT(β) ,

where WT is a positive definite weighting matrix. Setting WT = I or Ω⁻¹, where
E uu′ = Ω, gives the OLS or GLS regression based GMM estimators, and setting
WT = N(N′N)⁻¹N′ gives the instrumental variable version with instruments N.
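As an illustration of this criterion, the following Python sketch minimises JT(β) = gT(β)′ WT gT(β) for an assumed moment function; the choice of moment conditions (the first two moments of a normal model), the simulated data, and WT = I are all illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def g_T(beta, x):                      # sample moment vector g_T(beta)
    mu, sigma2 = beta
    return np.array([x.mean() - mu, x.var(ddof=1) - sigma2])

def J_T(beta, x, W):                   # J_T(beta) = g_T' W_T g_T
    g = g_T(beta, x)
    return float(g @ W @ g)

x = np.random.default_rng(0).normal(0.0, 1.0, size=200)   # pseudo data
W = np.eye(2)                                              # W_T = I
fit = minimize(J_T, np.array([0.0, 1.0]), args=(x, W), method="Nelder-Mead")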
Varying the specification of g(., .) gives different GMM estimators:

(1) The simple method of moments.

Define g(xt, β) = [xt − μ1, (T/(T−1))(xt − m1)² − μ2]′ and WT = I₂. Then

    gT(β) = [m1 − μ1, m2 − μ2]′ ,

where mi is the i-th sample moment and μi is the corresponding central moment
from the probability density function expressed in terms of the parameters of the
underlying theoretical model. The simplest GMM estimator minimises

    JT(β) = gT(β)′ gT(β) = Σ(j=1..2) (mj − μj)² .

The solution to this problem is to set μ1 = m1 and μ2 = m2.

(2) The method of simulated moments (Smith and Spencer, 1991)

Define

    g(xt, β) = [xt − μ1, (T/(T−1))(xt − m1)² − μ2,
                (T/(T−1))(xt − m1)³ − μ3, (T/(T−1))(xt − m1)⁴ − μ4]′ .

The Duffie and Singleton (1989) GMM estimator then minimises

    J4T(β) = gT(β)′ gT(β) = Σ(j=1..4) (mj − μj)² ,

while Smith and Spencer's (1991) version considers only the first three moments

    J3T(β) = gT(β)′ gT(β) = Σ(j=1..3) (mj − μj)² .

(3) Our new GMM method (Hughes Hallett, 1992):

Define

    g4T(β) = [Σ xt/T − μ1, √(Σ(xt − m1)²/(T−1)) − √μ2,
              ∛(Σ(xt − m1)³/(T−1)) − ∛μ3, ∜(Σ(xt − m1)⁴/(T−1)) − ∜μ4]′ ,

and then construct a GMM estimator by minimising

    JT(β) = Σ(j=1..4) (mj^(1/j) − μj^(1/j))² .

This estimator has the advantage that each element in the objective function is ex-
pressed in the same units of measurement. It therefore produces a result that is
independent of those units, and hence of the weighting of the moments implicit in
the previous two estimators.
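A sketch of this criterion for a normal maintained hypothesis (for which μ1 = μ, μ2 = σ², μ3 = 0 and μ4 = 3σ⁴) follows; the model, the simulated data, and the crude penalty that keeps σ² positive are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def sample_roots(x):                   # j-th roots of the first four sample moments
    m1 = x.mean()
    d = x - m1
    m2 = (d ** 2).sum() / (len(x) - 1)
    m3 = (d ** 3).sum() / (len(x) - 1)
    m4 = (d ** 4).sum() / (len(x) - 1)
    return np.array([m1, np.sqrt(m2), np.cbrt(m3), m4 ** 0.25])

def J_ahh(beta, x):                    # sum of squared differences of the roots
    mu, sigma2 = beta
    if sigma2 <= 0:                    # keep the search inside the parameter space
        return 1e10
    theory = np.array([mu, np.sqrt(sigma2), 0.0, (3.0 * sigma2 ** 2) ** 0.25])
    return float(np.sum((sample_roots(x) - theory) ** 2))

x = np.random.default_rng(2).normal(0.0, 1.0, size=200)
fit = minimize(J_ahh, np.array([0.0, 1.0]), args=(x,), method="Nelder-Mead")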

(4) The Newey and West (1987) version of the GMM estimator introduces the
weighting matrix WT = VT⁻¹, where

    VT = Σ(t=1..T) g(xt, β*) g′(xt, β*) / T ,

and where β* is an initial estimator of β obtained from an unweighted method of
simulated moments estimator¹.

¹ Other values are obviously possible for β*, and it would certainly be possible to iterate
on the β* values to give a fully converged GLS style estimator.
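The two-step weighting can be sketched as follows: estimate β* with WT = I, form VT at β*, and re-minimise with WT = VT⁻¹. The per-observation moment function (two moments of a normal model) and the data are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def g_t(beta, x):                      # (T, m) matrix of g(x_t, beta)
    mu, sigma2 = beta
    return np.column_stack([x - mu, (x - mu) ** 2 - sigma2])

def J(beta, x, W):
    gbar = g_t(beta, x).mean(axis=0)
    return float(gbar @ W @ gbar)

x = np.random.default_rng(3).normal(0.0, 1.0, size=200)
step1 = minimize(J, np.array([0.0, 1.0]), args=(x, np.eye(2)), method="Nelder-Mead")
G = g_t(step1.x, x)
V_T = G.T @ G / len(x)                 # V_T = sum_t g g' / T evaluated at beta*
step2 = minimize(J, step1.x, args=(x, np.linalg.inv(V_T)), method="Nelder-Mead")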

(5) The Deaton-Laroque method (Deaton and Laroque, 1992) is an instrumental
variable based GMM estimator, adapted here to the problem of fitting a complete
distribution. Define Zt = (1, xt−1, xt−2, xt−3)′ and Z = (Z1, Z2, . . . , ZT)′, where
the xt's are actual observations in ascending order t = 1 . . . T.
Now let

    at−1 = (xt + xt−1)/2 ,   a0 = 0 ,   aT = +∞ ,

and f(x, β) is the probability density function of xt. Then ut represents the differ-
ence between the actual frequency of observations in the interval [at−1, at] and the
theoretical probability of getting an observation in the same interval according to the
maintained hypothesis. We then pick β to minimise the "error" between the two
probabilities, that is, to maximise the fit between the observed relative frequencies
and the maintained probability. To do that, write u = (u1, u2, . . . , uT)′, together
with

    g(xt, β) = T · Zt′ ut ,   and   gT(β) = Σ(t=1..T) g(xt, β)/T = Z′u .

Let WT = (Z′Z)⁻¹. Then the Deaton-Laroque estimator minimises

    JT(β) = gT′ WT gT = u′Z(Z′Z)⁻¹Z′u .


We have followed Tauchen (1986) and Deaton and Laroque in using lagged obser-
vations from the data as instruments. But ordering those observations at discrete
intervals over the data's range is necessary to construct the "data" sets for our Monte
Carlo experiments.
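A hedged sketch of the criterion is given below. Here ut is read as the observed relative frequency in [at−1, at] (which is 1/T for the ordered data) minus the model probability of the same interval; that reading, the normal maintained hypothesis, and the −∞ lower endpoint (the text uses a0 = 0 for positive data) are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def J_dl(beta, x):                     # x: observations sorted in ascending order
    mu, sigma = beta
    T = len(x)
    # interval endpoints a_0, ..., a_T
    a = np.concatenate(([-np.inf], (x[1:] + x[:-1]) / 2.0, [np.inf]))
    prob = np.diff(norm.cdf(a, loc=mu, scale=sigma))     # model probabilities
    u = 1.0 / T - prob                                   # u_t, t = 1..T
    Z = np.column_stack([np.ones(T - 3), x[2:-1], x[1:-2], x[:-3]])  # (1, x_{t-1}, x_{t-2}, x_{t-3})
    u = u[3:]                                            # align with t = 4..T
    Zu = Z.T @ u
    return float(Zu @ np.linalg.solve(Z.T @ Z, Zu))      # u'Z (Z'Z)^{-1} Z'u

x = np.sort(np.random.default_rng(4).normal(0.0, 1.0, size=200))
value = J_dl((0.0, 1.0), x)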

(6) Finally, we also compare the GMM estimators with the maximum likelihood
(ML) estimator. Suppose the pdf is f(xt, β). The ML estimator maximises

    J(β) = Π(t=1..T) f(xt, β) .
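For comparison, maximising the likelihood is equivalent to minimising the negative log-likelihood, sketched here for the normal case; the optimiser and the penalty for σ² ≤ 0 are illustrative choices.

import numpy as np
from scipy.optimize import minimize

def neg_loglik(beta, x):               # -log of the product of normal densities
    mu, sigma2 = beta
    if sigma2 <= 0:
        return 1e10
    return float(0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2))

x = np.random.default_rng(5).normal(0.0, 1.0, size=200)
mle = minimize(neg_loglik, np.array([0.0, 1.0]), args=(x,), method="Nelder-Mead")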

(7) The existing literature sheds very little light on the small sample properties
of GMM estimators. There appear to be just two studies that attempt to do so.
Tauchen (1986) and Gregory and Smith (1990) both look at the performance of
GMM estimators in a very particular model of assets in macroeconomic performance.
Gregory and Smith find (i) that small sample bias increases as the size of the parameter
being estimated increases (as we do below), (ii) the confidence intervals on the GMM
estimates shrink significantly with increasing sample size (as we do), and (iii) the rate
of convergence to consistency slows noticeably when there are stronger dynamics in
the model. Tauchen concentrates on the instrumental variable version of the GMM
estimator in the same model and finds that the finite sample results are not sensitive
to the choice of instruments. But all of this is done with just 2 sample sizes, 2 GMM
estimators, 2 parameter settings, and 1 maintained hypothesis.
Deaton and Laroque (1992), by contrast, use a different model and report poor
estimates of the underlying distributions with small and medium sample sizes. No
details are given; but evidently rather large samples may be needed to achieve the
desired asymptotic properties - depending on the model, parameter values, estimation
criterion, and distributional context chosen. That calls for closer investigation, both
between different GMM estimators and relative to traditional estimation techniques.

3. THE MONTE CARLO EXPERIMENTS

To compare the performances of the various GMM estimators and the ML estimator
in finite samples, we ran a series of Monte Carlo experiments using sample sizes of
20 and 200 and a variety of parameter values. First, we took 5 cases of the normal
distribution with parameter values (μ, σ²) = {(0,1), (0,0.25), (0,2), (2,0.25),
(2,2)}; then 3 cases of the gamma distribution with parameters (r, λ) = {(1,3),

TABLE 1

Normal dist. SAMPLE SIZE =200


1st param. μ 2nd param. σ² average χ² p-value
METHOD bias variance bias variance (d.f. = 17)
TRUE PARAM. (μ, σ²) = (0,1)
AHH 0.00430 0.00484 -0.01135 0.01207 11.1 85%
HNW 0.00467 0.00617 -0.05450 0.01225 11.7
D-L 0.00471 0.00941 -0.07271 0.01385 12.2
DS 0.00480 0.00984 -0.07334 0.01388 12.6
SIMPLE/SS(3)/ML 0.00492 0.00985 -0.07675 0.01390 13.5 70%

TRUE PARAM. (μ, σ²) = (0,2)


AHH 0.00611 0.00970 -0.02309 0.04829 13.2 72%
HNW 0.00717 0.01233 -0.10901 0.04900 14.3
D-L 0.00772 0.01879 -0.20413 0.05466 15.1
DS 0.00780 0.01949 -0.22721 0.05532 15.6
SIMPLE/SS(3)/ML 0.00790 0.01970 -0.23351 0.05678 16.0 52%

TRUE PARAM. (μ, σ²) = (0,1/4)


AHH 0.00220 0.00119 -0.00283 0.00074 10.7 89%
HNW 0.00253 0.00152 -0.01360 0.00076 12.1
D-L 0.00322 0.00235 0.04903 0.00087 13.0
DS 0.00415 0.00241 0.09736 0.00090 13.4
SIMPLE/SS(3)/ML 0.00425 0.00251 0.09831 13.7 69%

TRUE PARAM. (μ, σ²) = (2,2)


AHH 0.00608 0.00968 -0.02301 0.04817 12.1 79%
HNW 0.00731 0.01033 -0.10812 0.04893 12.5
D-L 0.01150 0.01994 -0.61343 0.34311 16.0
DS 0.00802 0.01908 -0.11001 0.05521 13.2
SIMPLE/SS(3)/ML 0.00808 0.01968 -0.11351 0.05663 13.3 71%

TRUE PARAM. (μ, σ²) = (2,1/4)


AHH 0.00215 0.00121 -0.01254 0.00075 10.5 89%
HNW 0.00261 0.00156 -0.01382 0.00077 10.7
D-L 0.01282 0.00164 0.01715 0.00303 11.0
DS 0.00255 0.00171 -0.02164 0.00078 11.5
SIMPLE/SS(3)/ML 0.00256 0.00172 -0.02269 0.00069 11.7 80%

(3,1), (1,1)}; and finally 3 examples of the beta distribution with (p, q) = {(1,3),
(3,1), (1,1)}. In each experiment, 500 estimation replications were carried out. The
NAG Library Fortran subroutines were used to generate the random pseudo data,

Table 1 (continued)
SAMPLE SIZE = 20
METHOD bias variance bias variance (d.f. = 4)
TRUE PARAM. (μ, σ²) = (0,1)
AHH 0.01103 0.04674 -0.13735 0.09080 3.8 43%
HNW 0.01298 0.05960 -0.18649 0.09332 4.2
D-L -0.01365 0.46749 0.19824 2.65313 5.2
DS 0.01402 0.06273 -0.23965 0.94469 6.1
SIMPLE/SS(3)/ML 0.01403 0.06374 -0.29827 0.10405 6.2 20%

TRUE PARAM. (μ, σ²) = (0,2)


AHH 0.01885 0.09546 -0.25392 0.36118 4.5 34%
HNW 0.01938 0.12032 -0.57394 0.37404 5.1
D-L -0.02874 0.28734 0.79375 1.31727 5.8
DS 0.01985 0.19526 -0.88349 0.37663 6.0
SIMPLE/SS(3)/ML 0.01995 0.19550 -0.89653 0.42219 6.8 16%

TRUE PARAM. (μ, σ²) = (0,1/4)


AHH 0.00502 0.01201 -0.03184 0.00612 4.0 40%
HNW 0.00658 0.01501 -0.07042 0.00774 4.5
D-L -0.04946 0.50263 0.07546 1.63030 6.1
DS 0.00702 0.01586 -0.08041 0.00816 5.3
SIMPLE/SS(3)/ML 0.00712 0.01602 -0.09457 0.00888 5.5 24%

TRUE PARAM. (μ, σ²) = (2,2)


AHH 0.01787 0.09448 -0.25470 0.35120 3.5 48%
HNW 0.01838 0.11925 -0.27394 0.37414 4.1
D-L 0.01935 0.12802 0.27443 0.52655 5.1
DS 0.01979 0.19548 -0.28349 0.42263 5.5
SIMPLE/SS(3)/ML 0.01992 0.19633 -0.29653 0.47619 5.7 23%

TRUE PARAM. (μ, σ²) = (2,1/4)


AHH 0.00602 0.01194 -0.03194 0.00511 3.1 54%
HNW 0.00640 0.01485 -0.07131 0.00574 4.2
D-L 0.09401 0.05068 0.31639 2.84743 6.7
DS 0.00707 0.01567 -0.12041 0.00616 5.1
SIMPLE/SS(3)/ML 0.00713 0.01594 -0.13457 0.00688 5.8 22%

and each of the 7 estimators was applied to the resulting 500 "data" sets. In this,
the start-up seeds were randomised by the clock, and a quasi-Newton algorithm was
used to find a minimum of a non-linear function subject to fixed upper and lower
bounds for the range of possible parameter values. For the estimates themselves, the
selection criteria are (1) unbiasedness

TABLE 2

Gamma dist. SAMPLE SIZE =200


1st param. r 2nd param. λ average χ² p-value
METHOD bias variance bias variance (d.f. = 17)
TRUE PARAM. (r, λ) = (3,1)
AHH 0.04330 0.08775 0.01061 0.01175 14.4 64%
HNW 0.05774 0.13113 0.01534 0.01627 16.2
D-L -0.14282 0.19760 -0.05869 0.02415 17.1
DS 0.17742 0.57358 0.06290 0.04682 17.8
SS(3) 0.22081 0.56552 0.05947 0.04773 18.7
ML 0.18372 0.31388 0.05623 0.03518 18.0 40%
SIMPLE 0.33591 0.28049 0.11805 0.03183 19.5

TRUE PARAM. (r, λ) = (1,3)


AHH 0.01474 0.00781 0.03886 0.12880 13.4 70%
HNW 0.02901 0.01783 0.08096 0.23007 14.0
D-L 0.04951 0.03073 0.14412 0.34351 16.1
DS -0.04837 0.04014 -0.18473 0.42035 16.4
SS(3) 0.08781 0.04403 0.25613 0.45788 17.0
ML 0.13395 0.03310 0.48601 0.39696 20.1 37%
SIMPLE 0.09983 0.05107 0.28132 0.51546 18.5

TRUE PARAM. (r, λ) = (1,1)


AHH 0.01468 0.00778 0.01295 0.01431 10.5 89%
HNW 0.02821 0.01781 0.02699 0.02556 11.4
D-L -0.04841 0.04018 -0.06160 0.04678 13.0
DS 0.08486 0.05814 0.07379 0.05522 15.2
SS(3) 0.09801 0.05212 0.09377 0.05727 15.8
ML 0.14035 0.03619 0.17030 0.04789 16.4 49%
SIMPLE 0.13023 0.08550 0.11935 0.07498 17.3

    bias = (1/N) Σ(i=1..N) (β̂i − β) ,

where β represents the true parameter value and N is the number of replications
(N = 500); and (2) efficiency

    variance = (1/(N−1)) Σ(i=1..N) (β̂i − β̄)² ,   where β̄ = (1/N) Σ(i=1..N) β̂i .

The choice of Normal, Gamma, and Beta densities in these experiments covers the
wide variety of distribution shapes that are found in economic and financial data.
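The structure of these experiments can be sketched as follows; the estimator shown (the simple method of moments for a normal (μ, σ²)) is an illustrative stand-in for any of the seven techniques, and the fixed seed is not the clock-based scheme described above.

import numpy as np

def monte_carlo(true_beta, T, N=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma2 = true_beta
    estimates = np.empty((N, 2))
    for i in range(N):
        x = rng.normal(mu, np.sqrt(sigma2), size=T)         # one pseudo data set
        estimates[i] = [x.mean(), x.var(ddof=1)]            # beta_hat_i
    bias = estimates.mean(axis=0) - np.array(true_beta)     # (1/N) sum (beta_hat_i - beta)
    variance = estimates.var(axis=0, ddof=1)
    return bias, variance

print(monte_carlo((0.0, 1.0), T=20))
print(monte_carlo((0.0, 1.0), T=200))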

Table 2 (continued)
SAMPLE SIZE = 20
TRUE PARAM. (r, λ) = (3,1) (d.f. = 4)
AHH 0.55093 1.67360 0.18058 0.22084 5.2 27%
HNW 0.65978 2.00182 0.21690 0.25896 5.9
D-L 0.77432 8.77511 0.20008 1.42427 6.4
DS 1.52956 5.66632 0.48944 0.50706 6.7
SS(3) 1.84638 7.06289 0.55156 0.71606 7.0
ML 1.60091 6.13546 0.51914 0.69212 6.8 8%
SIMPLE 2.28789 6.09199 0.77963 0.68508 7.3

TRUE PARAM. (r, λ) = (1,3)


AHH 0.13228 0.11975 0.59298 2.12712 5.9 21%
HNW 0.27328 0.2228 1.04345 3.40449 6.2
D-L 0.33759 0.93460 1.02650 9.76649 6.4
DS 0.38921 0.26057 1.37600 3.69683 6.9
SS(3) 0.50374 0.27059 1.69972 3.68184 7.0
ML 0.69007 0.58224 2.28504 7.90659 7.3 7%
SIMPLE 0.89671 0.67855 3.06943 7.95057 7.5

TRUE PARAM. (r, λ) = (1,1)


AHH 0.13910 0.12316 0.20486 0.23759 3.8 43%
HNW 0.28118 0.22167 0.35581 0.37290 4.1
D-L 0.64287 0.47650 0.66815 0.63571 4.7
DS 0.70516 0.62484 0.77703 0.93212 6.1
SS(3) 0.82113 0.49703 0.85024 0.75024 6.7
ML 0.71317 6.03977 0.71959 5.88462 6.5 9%
SIMPLE 1.03394 0.92628 1.20932 1.27444 7.3

4. RESULTS: PARAMETER ESTIMATION

Tables 1 to 3 contain the results of our Monte Carlo parameter estimation experiments
for the Normal, Gamma, and Beta distribution cases, respectively. For each estimated
parameter we report the bias and variance achieved by our 7 GMM techniques across the
500 Monte Carlo replications. For reasons of space we present only the results for the
small sample size experiments (T = 20) and for the large sample sizes (T = 200).
Results for the intervening cases (T = 50, 100 etc.) are available from the authors.
In what follows, we take the bias in an estimated parameter to be an indicator of
the relative accuracy of the given estimator in the specified circumstances, and the
variance to be an indicator of the estimator's reliability (or sensitivity to "outliers" in
the data).

TABLE 3

Beta dist. SAMPLE SIZE = 200


1st param. p 2nd param. q average χ² p-value
METHOD bias variance bias variance (d.f. = 17)
TRUE PARAM. (p, q) = (1,3)
AHH 0.01305 0.00773 0.03151 0.09022 14.0 67%
HNW 0.01605 0.01079 0.03684 0.10727 14.2
D-L 0.01697 0.01130 0.03940 0.11056 14.3
DS 0.01730 0.01143 0.04027 0.11139 14.5
SS(3) 0.03027 0.01874 0.07352 0.15800 14.7
ML 0.04406 0.01424 0.12045 0.12527 15.0 59%
SIMPLE -0.04658 0.04865 -0.17615 0.43037 16.0

TRUE PARAM. (p, q) = (1,1)


AHH -0.00261 0.00900 0.00081 0.00923 10.4 89%
HNW 0.01294 0.01027 0.01057 0.01067 11.8
D-L 0.01331 0.01056 0.01480 0.01181 12.0
DS 0.01341 0.01816 0.01491 0.01221 12.1
SS(3) 0.01335 0.01886 0.01561 0.01243 12.2
ML 0.01368 0.02017 0.01519 0.01121 12.6 71%
SIMPLE 0.65246 0.33157 0.64924 0.36377 14.2

TRUE PARAM. (p, q) = (3,1)


AHH 0.05765 0.09784 0.01676 0.00908 13.0 74%
HNW 0.06025 0.12286 0.01872 0.01303 13.4
D-L 0.06180 0.12636 0.01931 0.01356 13.6
DS -0.06201 0.12651 -0.01947 0.07660 14.0
SS(3) 0.06253 0.12691 0.01958 0.01365 14.6
ML 0.09112 0.17508 0.03061 0.02094 14.8 61%
SIMPLE 0.15116 0.13977 0.04759 0.01699 15.0

(a) General results:

Both criteria, small sample bias and small sample efficiency, put our own GMM
estimator (denoted AHH here) in first place for performance and the Hansen-Newey-
West estimator (HNW) in second place. There are a total of 88 comparisons here 2, and
there is just one case where our GMM estimator does not perform best [the maximum
likelihood technique produces a marginally smaller variance for the second parameter
estimate in the N(2, 0.25) and T = 200 case]. Similarly there are just two cases

2 11 distributions (of 3 types) each with 2 parameters judged by 2 criteria in 2 sample size
experiments.

Table 3 (continued)
SAMPLE SIZE = 20
TRUE PARAM. (p, q) = (1,3) (d.f. = 4)
AHH 0.15040 0.14359 0.60575 1.95128 4.5 34%
HNW 0.15425 0.18224 0.60940 2.25979 4.7
D-L 0.40169 0.93666 1.23680 6.73138 5.3
DS 0.15920 0.18723 0.62354 2.30046 4.8
SS(3) 0.16143 0.18834 0.63016 2.30985 4.9
ML 0.47709 0.46201 1.63302 5.00642 5.7 14%
SIMPLE 0.42583 1.47930 1.22201 9.34737 6.0

TRUE PARAM. (p, q) = (1,1)


AHH 0.10146 0.14182 0.09315 0.17103 3.9 42%
HNW 0.13561 0.19563 0.12821 0.17347 4.2
D-L 0.13698 0.19575 0.12991 0.18961 4.5
DS 0.13584 0.19637 0.12863 0.18467 4.3
SS(3) 0.14291 0.18025 0.13008 0.18806 4.8
ML 0.24933 0.38327 0.24082 0.34987 5.2 17%
SIMPLE 1.28978 7.73841 1.15604 7.62929 7.2

TRUE PARAM. (p, q) = (3,1)


AHH 0.47206 2.22174 0.18087 0.15325 5.3 26%
HNW 0.48975 2.38286 0.19356 0.18169 5.5
D-L 1.38579 8.98914 1.16852 3.57749 6.0
DS 0.50587 2.41933 0.19982 0.18717 5.8
SS(3) 0.58747 2.99887 0.19781 0.18262 6.3
ML 0.50126 2.41273 0.19806 0.18607 6.1 11%
SIMPLE 1.52819 5.69004 0.54230 0.48973 7.1

where the Hansen-Newey-West estimator is not second best [the Deaton-Laroque


method produces a lower bias for the second parameter in the G(3,1) and G(1,3),
T = 20, cases]. After these two, the accuracy and reliability of the estimators
deteriorate rapidly, especially with small sample sizes and in the Beta and Gamma
distribution exercises.
A second general observation is that the estimators are differentiated more clearly
in terms of reliability than in terms of accuracy: the variances of the parameter
estimates show more variation over different techniques than do the small sample
biases. This suggests that unbiasedness is a property which, within a given tolerance,
is reached earlier than efficiency with expanding sample sizes. It further suggests
that the asymptotic properties are an unreliable guide to the true parameter values at
small sample sizes.

(b) The Normal Distribution Case (Table 1):

The most obvious feature of these results is that the methods are largely unbiased and
efficient. An exception is the poor performance of the Deaton-Laroque estimator in
small samples (T = 20). This poor performance is concentrated in the variances of
these parameter estimates, which are often (but not always) 50 to 500 times larger
than that of the other estimators - particularly for the second parameter. There are
fewer problems with the bias of the estimates, although 7 out of the 10 bias results
show signs opposite to the other estimators. Things look better in large samples.
The Deaton-Laroque estimator generally produces better results than the methods of
simulated moments or maximum likelihood and captures third place for large values
of T. Consequently it appears that the Deaton-Laroque method requires much larger
samples to achieve reasonable statistical properties. This sensitivity or unreliability
in small sample sizes is also shared by the maximum likelihood estimates and also
appears in the Beta and Gamma distribution results below, although the Deaton-
Laroque estimator will not be singled out there since all the estimators (beyond the
best two) do badly in those exercises.
A second feature is that both the bias and the variance of the estimates fall roughly
by a factor of 10 with a 10-fold increase in the sample size, suggesting that estimates
by any of the 7 methods converge on statistical consistency at the rate of O(T⁻¹).
This feature does not seem to vary much among the different techniques. Thus, the
performance ranking remains as described above: our GMM estimator dominates
the Hansen-Newey-West estimator in every case, and the latter in turn dominates all
others. Moreover the degree of dominance of our estimator over the Hansen-Newey-
West estimator is usually larger than the dominance of the latter over the next best.
Finally, both the bias and the variance of the estimates rise somewhat with larger
values of σ² (the distribution's second parameter) and rather less so with μ (its first
parameter), but these tendencies are weak compared to the results which follow in
Tables 2 and 3.
The most awkward result in Table 1, therefore, is the poor performance of the
maximum likelihood estimator. In this exercise, it produces independent estimates
of μ and σ² and should be efficient at any sample size; its results should be at least as
good as any of the others. One explanation why this is not so is that the differences
observed in Table 1 may not be statistically significant but are simply the result of
different numerical procedures. This possibility is examined in Section 5 below.

(c) The Gamma distribution case (Table 2):

The estimates in Table 2 show much greater bias and inefficiency - particularly in
small samples, where the estimates of at least one of the two parameters are really
very poor. It seems that consistency here requires a substantially larger sample size
than for problems involving normally distributed variables.
Having said that, our GMM estimator still dominates Hansen-Newey-West, but
by a smaller margin than the latter dominates the next best. In that sense, the best two
both pull ahead of the pack in small samples. This implies that there is an increasing

relative (but not absolute) reliability as the quality of the estimates starts to fall. In
any event, it now matters more which estimator one chooses. Moreover, the choice
between estimators is wider in that, even in larger samples, the biases and variances
of the parameter estimates may be 8-10 times larger if the "wrong" estimator is used.
Once again, both the Deaton-Laroque and the maximum likelihood estimators appear
to be more unreliable and inaccurate than the others in small samples. On the other
hand, the biases and variances from the two best techniques fall by factors of 10 or
more when the sample size is increased from 20 to 200, suggesting that the estimators
are still converging on consistency a little faster than O(T⁻¹).
Finally, both the bias and variances tend to increase with the size of the underlying
parameter but, interestingly, not with the size of the other parameter value.

(d)The Beta distribution case (Table 3):

As in the Gamma distribution results, the estimates in Table 3 show relatively large
biases and variances in small samples. Consistency therefore requires fairly large
samples, though not as large as the Gamma distribution estimates.
As before, our GMM estimator dominates all others, and the Hansen-Newey-West
estimator comes second, for accuracy, reliability, and mean square errors. The degree
of dominance is reasonably large again, which is consistent with the proposition that
these two estimators pull ahead of the pack as the quality of the estimates starts to
fall. The simple method of moments and the Deaton-Laroque estimators continue to
perform badly in small samples, and the two best methods show biases and variances
failing by factors of 10 or more as the sample increases from 20 to 200. So once
again, consistency appears to be achieved at a rate of more than O(T⁻¹).

5. TESTING THE SIGNIFICANCE OF THE OBSERVED SMALL SAMPLE BIASES

The most striking feature of these results is the relatively poor performance of the
maximum likelihood estimators. Numerically, the maximum likelihood estimators
produce the worst or second worst bias results in 19 out of the 20 Normal distribution
tests (Table 1), 6 out of the 12 Gamma distribution tests (Table 2), and 9 out of 12
Beta distribution tests (Table 3). Similarly they show the largest or second largest
variances in 18 out of 20 tests in Table 1, in 2 of the 12 tests in Table 2, and in 6 out
of 12 tests in Table 3. So, for accuracy and reliability, maximum likelihood performs
relatively badly compared to the leading GMM estimators.
These results are remarkable because we know that, in the case of normally
distributed variables at least, maximum likelihood estimates are independently dis-
tributed and efficient in the sense of actually reaching the Cramer-Rao lower bound
(Mood, Graybill and Boes, 1974, chapter 7). Theoretically, they cannot be beaten.
For the Gamma and Beta distribution, the theory is not so clear. First, maximum
likelihood estimates are now no longer independent of one another. Estimators that
pay little attention to the higher order moments of the distribution being fitted may
be able to secure lower biases (or variances) in their parameter estimates at the

implicit cost of higher biases in those higher order moments which are unpenalised.
Such trade-offs are not available for an estimator that tries to fit the entire likelihood
function.
Second, the unbiasedness and efficiency properties of maximum likelihood esti-
mation are now only asymptotic, and our sample sizes of 20 to 200 may be too small
to capture those properties. Thus maximum likelihood estimates may have produced
worse results than some of the GMM estimators because the maximum likelihood
estimators' sampling distributions converge more slowly (in both mean and variance)
to their limiting distribution.
These arguments can explain why the maximum likelihood estimates are worse
than some of the GMM estimates in the Gamma and Beta tests, but not why they
are worse in the Normal distribution tests. In this latter case, there are only two
possible explanations: the maximum likelihood biases (variances) in Table 1 are
not statistically significant, and/or they are the result of numerical instabilities in the
algorithm used to compute them. The implications of these two explanations are
quite different, however. If the biases (variances) are not significantly different from
zero (or each other), then there are not problems with the estimating techniques, and,
statistically it does not matter which is chosen. But if they are significant, it would
be worthwhile to examine the numerical properties of different maximum likelihood
algorithms to eliminate (as far as possible) any problems of numerical instability.

(a) The results from the Normal Distribution tests

The distribution of the maximum likelihood estimates of the parameters of a nor-
mal distribution that actually generate the observations is straightforward. For
a sample size of T, and for N replicated samples, the first parameter would be
x̄i = T⁻¹ Σj xij for the i-th sample, and the bias would be computed from the aver-
age of N replications, x̄ = N⁻¹ Σi x̄i = (NT)⁻¹ Σi Σj xij. If the underlying ob-
servations xij are drawn from a Normal(μ, σ²) distribution, then x̄ ~ N(μ, σ²/NT)
exactly, and our recorded bias is distributed as N(0, σ²/NT) in Table 1. Simi-
larly our recorded estimate of the second parameter (the distribution's variance) is
the average of N replications; i.e. s̄² = N⁻¹ Σi si², where each (T − 1)si²/σ² is
distributed χ²(T−1) and has variance 2(T − 1). Hence si² has variance 2σ⁴/(T − 1)
and s̄² ~ N(σ², 2σ⁴/N(T − 1)) asymptotically, so the bias in the latter is distributed
asymptotically as N(0, 2σ⁴/N(T − 1)).
For the results in Table 1, we have T = 20 or 200, N = 500, and five different
generating distributions. The bias in the first parameter estimate (Table 1, column 1)
will therefore be significantly different from zero at a 5% level if it lies outside the
interval ±1.96σx̄, where σx̄ = σ/√(NT). Similarly the biases in the second parameter
estimate will be significant at a 5% level if they lie outside the interval ±1.96σs,
where σs = σ²√(2/N(T − 1)). Table 4 summarises σx̄ and σs for the five different
distributions represented in Table 1.
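The entries of Table 4 are straightforward arithmetic and can be checked directly; the snippet below simply evaluates σx̄ = σ/√(NT) and σs = σ²√(2/N(T − 1)) for each sample size and variance.

import numpy as np

N = 500
for T in (20, 200):
    for sigma2 in (0.25, 1.0, 2.0):
        sigma_xbar = np.sqrt(sigma2 / (N * T))               # sigma / sqrt(NT)
        sigma_s = sigma2 * np.sqrt(2.0 / (N * (T - 1)))      # sigma^2 sqrt(2/N(T-1))
        print(T, sigma2, round(sigma_xbar, 4), round(sigma_s, 4))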
Evidently the maximum likelihood estimates of the mean (μ) show significant
biases at a 5% level only in the N(0,1/4) case with T = 200. The remaining 9

TABLE 4

σx̄ σs

σ² = 0.25 σ² = 1 σ² = 2 σ² = 0.25 σ² = 1 σ² = 2
T = 20 .005 .01 .014 .004 .015 .029
T = 200 .0016 .0032 .0045 .001 .005 .009

maximum likelihood estimators show no significant biases. These tests are exact.
The tests of bias in the estimates of the variance parameter are asymptotic with respect
to a "sample" size of 500, but show a greater number of significant biases. In fact,
significant biases appear in all 11 second parameter estimates.
Hence maximum likelihood estimation has produced some significant biases,
more when estimating of the second parameter than the first. Both the presence of
significant biases in half the cases and the fact that these biases tend to appear in both
the variance parameter and the larger samples for the estimated mean suggest that
numerical instability is at least part of the reason for the poor maximum likelihood
performance. Indeed the implied variances σx̄² and σs² from Table 4 are smaller than
the variances of the estimates actually recorded in columns 2 and 4 of Table 1.
The computed distributions of our estimates are therefore very much wider than the
Cramer-Rao lower bound would imply, which is symptomatic of numerical instability.
However, that is not the real issue. The crucial question is, are these biases actually
larger (in a statistical sense) than those coming from the GMM estimators? The
difficulty here is that the exact distributions of the parameter estimates obtained from
the GMM techniques are not analytically tractable, since their estimating equations
do not admit a closed form solution. Thus, we cannot obtain an exact variance of the
parameter estimates x̄i and si² to which we could apply the central limit theorem to
derive tests of the biases in x̄ or s̄². However we can use the maximum likelihood
values already obtained to estimate those variances. That gives the test results in
Table 7.
Thus, whereas it is possible to argue that maximum likelihood estimators do
not provide any significant biases in the first parameter, and that the actual biases
observed are the result of numerical instabilities in the algorithm used to maximise
the likelihood function, the same cannot be said for the GMM estimators. With our
tests, there is a much higher incidence of significant bias in both parameters and
both sample sizes. The chief offenders are the Deaton-Laroque (DL) and the Method
of Simulated Moments (DS) estimators. At the other end of the scale, our own
GMM estimator and the HNW estimator produced no significant biases in the first
parameter estimate and fewer in the second. Hence there are significant differences
in accuracy and reliability between the AHH and HNW estimators on the one hand,
and the remaining GMM estimators on the other.

(b) The Gamma Distribution Tests

Here formal testing for bias is difficult since the exact distribution of estimates, even
for maximum likelihood, of the two parameters is unknown. Thus, we are unable to
determine the variances of those estimators to which the Central Limit Theorem might
otherwise apply. Further we cannot substitute the estimated variances (maximum
likelihood or otherwise) obtained in Table 2, for biased parameter estimates entail
biased variance estimates (there being no independence property now). Any formal
justification for this approach has thus disappeared.
But even if conventional asymptotic tests are not possible, a conditional test can
be used that is a sufficient (but not necessary) condition for detecting significant
biases. The maximum likelihood estimates of a Gamma distribution are obtained by
solving

    r = x̄λ   and   λ = exp(ψ(r)) · (Π(i=1..T) xi)^(−1/T)               (4)

simultaneously for r and λ, where x̄ = T⁻¹ Σ(i=1..T) xi, and the xi are the random
drawings in a sample of size T. The function ψ(r) is the Digamma function:
ψ(r) = d log Γ(r)/dr, where Γ(r) = (r − 1)!. Note that ψ(r) is monotonically
increasing in r for r ≥ 1.
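Solving (4) numerically is straightforward: substituting λ = r/x̄ into the second equation leaves one equation in r alone. The sketch below does this with SciPy's digamma function and a bracketing root finder; the bracketing interval and the G(3,1) pseudo data are illustrative assumptions.

import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def gamma_mle(x):
    xbar, mean_log = x.mean(), np.log(x).mean()
    # substituting lambda = r/xbar into (4) gives
    # digamma(r) - log(r) = mean(log x) - log(xbar), one equation in r
    h = lambda r: digamma(r) - np.log(r) - (mean_log - np.log(xbar))
    r_hat = brentq(h, 1e-6, 1e6)
    return r_hat, r_hat / xbar                               # (r_hat, lambda_hat)

# G(3,1) pseudo data; numpy's 'scale' parameter is 1/lambda
x = np.random.default_rng(1).gamma(shape=3.0, scale=1.0, size=200)
r_hat, lam_hat = gamma_mle(x)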
However, for testing purposes, we can form conditional estimates of r and λ by
inserting the true values of λ and r from the underlying distribution on the right of
(4). Call these conditional estimates r* and λ*, and let the actual estimates obtained
by solving (4) be r̂ and λ̂. Then, with positively biased estimates, the probability of
any particular positive bias in r̂ or λ̂ is less than the probability of the same bias in r*
or λ* under the null of unbiasedness. Hence a sufficient condition for the biases in
Table 2 to be significant (at the 5% level) is that they should be significant for r* or
λ*. Indeed (4) implies that r* is distributed approximately as N(r, λ²σ²/NT). Using
the fact that σ² = r/λ² for each of the three Gamma distributions estimated, we find
the maximum likelihood estimates of r to be significantly biased³ for both the small
and large samples. The consistency and asymptotic efficiency of GMM estimators
allow us to extend these asymptotic tests to the other estimators of r in Table 2. Once
again, all estimates show significant biases.
Hence we conclude that, in the case of the Gamma distribution tests, all estimators
show significant biases that do not vanish with larger sample sizes. The sampling
distributions evidently converge slowly on their asymptotic distributions, in terms

³ Note: √(λ²σ²/NT) for G(3,1) G(1,3) G(1,1)

T = 20   .0173   .0100   .0100
T = 200  .0055   .0032   .0032

TABLE 5
Biases in the estimated means and variances of gamma distributed variables
from Table 2 (μ = r̂/λ̂; σ² = r̂/λ̂²).
T=20 T=200
True parameters
and Bias in Bias in Bias in Bias in
distribution Estimator mean variance mean variance
(3,1) AHH .008 -.452 .0113 -.02
μ = 3 HNW .007 -.529 .0115 -.034
σ² = 3 DL .145 -.379 -.2015 .225
DS -.212 -.958 -.0106 -.188
SS(3) .124 -.987 .040 -.131
ML .029 -1.006 .0142 -.147
Simple -.029 -1.330 -.016 -.331

(1,3) AHH -.018 -.0234 .0006 -.0012


μ = 1/3 HNW -.018 -.0332 .0007 -.0027
σ² = .11 DL -.001 -.0286 .0005 -.0049
DS -.016 -.0386 .0050 .0090
SS(3) -.013 -.0430 .0008 -.0085
ML -.013 -.0560 .0080 -.0179
Simple -.021 -.0596 .0018 -.0090

(1,1) AHH -.055 -.215 .0017 -.011


μ = 1 HNW -.055 -.303 .0012 -.025
σ² = 1 DL -.015 -.410 .0141 .081
DS -.040 -.460 .0103 -.059
SS(3) -.016 -.468 .0039 -.082
ML -.004 -.421 -.0260 -.167
Simple -.08 -.583 .0097 -.098

both of unbiasedness and of having larger variances than in the limit (compare values
for λ²σ²/NT with column 2 of Table 2). It is clear that both our preferred GMM
estimators (AHH first, and then HNW) are more accurate and more reliable (having
smaller biases and lower variances) than their rivals - including maximum likelihood.
But, this does not cause them to be unbiased or near-minimum variance. In fact these
results are purely relative: while our own GMM estimator is preferable to the others,
it is not necessarily good.
And this is as far as we can go. Conditional tests on λ itself are not possible
since the variance of the distribution of the inverse geometric mean, (Π xi)^(−1/T), is
not known and the central limit theorem cannot be applied. Beyond this, we can
only look at the biases in the estimated means (= r̂/λ̂) and variances (= r̂/λ̂²)

numerically. These figures are given in Table 5, but formal tests are not possible
since both are derived from ratios of nonindependently distributed random variables.
It is clear from Table 5 that the biases in the mean are systematically smaller than
those in the variance, and they vary less across estimators than do those for the
variance estimates. 4
These results illustrate an important point. General statements indicating that a
particular estimator is more accurate, or converges faster to its asymptotic distribution,
can be extremely misleading. In this exercise the means have been well estimated in
all cases. The variances are less well estimated - but their fit is still good compared
to many of the estimates of the r, λ parameters. And such results are easily obtained,
since even significant biases in r and λ of the same sign will offset each other to
produce means or variances with relatively little bias. That is, the quality of the results
obtained from estimating particular characteristics may be quite different from those
obtained from fitting the distribution as a whole. Hence it matters whether the real
objective is to fit particular parameters or the distribution as a whole.

(c) The Beta Distribution Tests

Here not even conditional tests are available to determine the significance of the
biases in the maximum likelihood estimates of Table 3. These estimates arise from
solving

    T[ψ(p + q) − ψ(p)] + Σ log xi = 0
    and   T[ψ(p + q) − ψ(q)] + Σ log(1 − xi) = 0                        (5)

simultaneously for p and q, a process that does not yield a tractable closed-form
solution. At best one can inspect the numerical biases in Table 3 or the equivalent
bias results in Table 6. But, just as with these, Table 6 shows how easily numerically
"significant" biases in the parameter estimates can offset one another to give appar-
ently unbiased mean and variance estimates. Both are estimated with much smaller
numerical biases than are p and q themselves. There is no clear tendency here for the
variance to be more biased than the mean, and both biases show a stronger tendency
to diminish with increasing T. Nor is there any apparent ranking of biases across
estimators. Yet the general message is the same: it matters for estimation whether
one focuses on particular characteristics of the distribution or its entirety.

6. RESULTS: FITTING THE ENTIRE DISTRIBUTION

To test the goodness of fit of the entire distribution implied by each replication
underlying the results in Tables 1 to 3, we have used the traditional χ² test: the
likelihood ratio goodness-of-fit tests (Kendall and Stewart, 1974). The mean χ²

4 Our own GMM estimator generally does better than the other estimators in Table 5. On
the other hand the bias in the variance estimates converges to zero with increasing T, but there
is little convergence of the biases in the means.

TABLE 6
Biases in the estimated means and variances of beta distributed variables from
Table 3.
(μ = p/(p + q) ,   σ² = pq/[(p + q)²(p + q + 1)])

T=20 T=200
True parameters
and Bias in Bias in Bias in Bias in
distribution Estimator mean variance mean variance
(1,3) AHH .0081 -.0056 .0005 .0003
μ = 1/4 HNW .0077 -.0056 .0007 -.0048
σ² = .0375 DL .0014 -.0094 .0007 -.0003
DS .0076 -.0057 .0007 -.0004
SS(3) .0076 -.0058 .0011 -.0006
ML .0083 -.0117 .0007 -.0011
Simple .0025 -.0091 .0024 .0002

(1,1) AHH -.0377 -.0109 -.0009 .0001


μ = 1/2 HNW -.0583 -.0156 .0006 -.0003
σ² = .083 DL .0016 -.0065 -.0004 -.0004
DS .0016 -.0064 -.0004 -.0004
SS(3) .0028 -.0066 -.0006 -.0005
ML .0017 -.0114 -.0004 -.0005
Simple .0150 -.0371 .0005 -.0249

(3,1) AHH .0038 -.004 .0005 -.0006


μ = 3/4 HNW .0049 -.004 .0003 -.0006
σ² = .0375 DL .0396 -.003 .0002 -.0006
DS .0050 -.004 -.0002 .0007
SS(3) .0003 -.005 .0002 -.0006
ML .0049 -.004 -.0001 -.0009
Simple .0041 -.0107 .0005 -.0015

statistics, for each estimation technique under review, are given in Tables 1 to 3.
The conventional goodness of fit test would accept the null hypothesis that the
observations fitted by the named technique conformed to a normal, gamma or beta
distribution, respectively, if the associated X2 test statistics were less than the critical
values of 27.6 (for a 5% significance level and T = 200). For T = 20, the critical
value is 9.5. Every estimator therefore passes this test easily, even in the smaller
samples, and the null hypothesis is correctly accepted.
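The critical values and p-values quoted here come from the χ² distribution and are easy to reproduce; the snippet below uses scipy.stats.chi2 for the 17 and 4 degree-of-freedom cases.

from scipy.stats import chi2

print(chi2.ppf(0.95, df=17))      # about 27.6: the 5% critical value for T = 200
print(chi2.ppf(0.95, df=4))       # about 9.5:  the 5% critical value for T = 20
print(chi2.sf(11.1, df=17))       # about 0.85: p-value of an average statistic of 11.1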
It is clear, however, that these tests are considerably more powerful in the larger

TABLE 7
Test the significance of the estimates' bias for Normal (μ, σ²)
N(0,1) N(0,2) N(0,1/4) N(2,2) N(2,1/4)
T = 200   μ̂ σ̂²   μ̂ σ̂²   μ̂ σ̂²   μ̂ σ̂²   μ̂ σ̂²
AHH
* * *
HNW
* * * * *
DL
* * * * * * * *
DS
* * * * * *
SIMPLE/SS(3)/ML
* * * * * *
T = 20
AHH
* * * * *
HNW
* * * * *
DL
* * * * * * * *
DS
* * * * *
SIMPLE/SS(3)/ML
* * * * *
Note: * indicates a significant bias at the 5% level.

samples. Indeed, although we have not specified a particular alternative hypothesis,


the estimator producing the lowest calculated X2 statistic minimises the probability
of making a type II error for any given alternative hypothesis, whatever it may be.
For larger samples the observed significance level, or p-value, corresponding to the
calculated X2 statistic ranges from 72% to 89% in the Normal distribution exercises
for the best of our estimators. This range is from 67% to 89% in the Beta and Gamma
distribution cases, comfortably exceeding the conventional 5% or 10% critical values. For the small
samples, the p-values are lower: 34% to 54% in Table 1, 21% to 43% in Table 2, and
34% to 54% in Table 3.
These results also confirm the performance ranking established in the previous
section. In all 11 experiments, and for both sample sizes, our own GMM estimator
produced a distribution that matched the true distribution better than any of the dis-
tributions fitted by the other estimators. The Hansen-Newey-West estimator came
in second place again, followed by Deaton-Laroque, the method of simulated mo-
ments, and the maximum likelihood estimator. Moreover, the difference in the X2 test
statistics between the best GMM estimator and the maximum likelihood estimator
indicates an improvement of between 8% and 40% in the p-value or confidence level
for accepting the null hypothesis that the estimated distribution successfully fits the
specified distribution in large samples, and an improvement of between 14% and 32%
for the smaller samples. This is a healthy finite sample improvement over traditional
estimation methods.

7. CONCLUSIONS

Basically, our concerns about the poor small sample properties of GMM estimators
have been borne out. While we have observed a fairly rapid rate of convergence
towards consistency and asymptotic efficiency, there is still evidence of statistically
significant biases and large variances, even in the larger samples.
Just how bad the small sample properties actually are depends on the particular
estimation technique chosen. It matters which GMM estimator is used and which
numerical implementation of the maximum likelihood estimator is applied. In these
exercises there is a clear ranking: our own GMM estimator performs best, followed
by the Hansen-Newey-West estimator, and then the Method of Simulated Moments.
The Deaton-Laroque estimator shows a great deal of variability in small samples, but
performs relatively well in larger samples.
Moreover, it appears that the differences between the performance of these es-
timators widen as we depart from the classical assumptions of large samples and
normally distributed variables. We find the results are sensitive to the sample size,
the form of fitting criterion, non-normality in the underlying distribution, and the
size of the parameter being estimated. We also find that most estimators are worse in
regard of efficiency than unbiasedness. Nevertheless, the GMM estimators all fairly
good for fitting probability distributions in their entirety, even in relatively small
samples.

APPENDIX

THEORETICAL MOMENTS UNDER DIFFERENT DISTRIBUTIONS

(1) Normal distribution

p.d.f.:   f(x, β) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) .

Then μ1 = μ, μ2 = σ², μ3 = 0, μ4 = 3σ⁴.


(2) Gamma distribution
p.d.f.:   f(x, β) = (λ^r / Γ(r)) x^(r−1) e^(−λx) ,   x, r, λ > 0,

where

    Γ(r) = ∫(0..∞) s^(r−1) e^(−s) ds .

Then μ1 = r/λ, μ2 = r/λ², μ3 = 2r/λ³, μ4 = 3r(r + 2)/λ⁴.

(3) Beta distribution

p.d.f.:   f(x, β) = x^(p−1) (1 − x)^(q−1) / B(p, q) ,   0 < x < 1,   p, q > 0,
          where B(p, q) = Γ(p)Γ(q)/Γ(p + q).

Then

    μ1 = p/(p + q) ,   μ2 = pq / [(p + q + 1)(p + q)²] ,

    μ3 = 2pq(q − p) / [(p + q + 2)(p + q + 1)(p + q)³] ,

    μ4 = 3pq(p²q + 2p² − 2pq + pq² + 2q²) / [(p + q)⁴(p + q + 1)(p + q + 2)(p + q + 3)] .

ACKNOWLEDGEMENTS

We are grateful to Dave Belsley, Gregor Smith, Jim Powell, Robin Lumsdaine and
participants of the Econometrics Seminar at Princeton for their comments.

REFERENCES

Deaton, A.S. and Laroque, G. (1992) On the behaviour of commodity prices, Review of
Economic Studies, 59, 1-24.
Duffie, D. and Singleton K.J. (1989) Simulated Moments Estimation of Markov Models of
Asset Prices, Stanford University Discussion Paper, Stanford, CA
Gregory, A and G. Smith (1990) "Calibration as Estimation" Econometric Reviews, 9, pp.57-
89.
Hansen, L.P. (1982) Large sample properties of generalised Method of Moments Estimators,
Econometrica, Vol. 50, pp 1029-1054.
Hughes Hallett, A.J. (1992) Stabilising earnings in a volatile market, paper presented in the
Royal Economics Society Conference, London (April).
Kendall, M.G. and Stewart, A. (1973) The Advanced Theory ofStatistics, Vol. 2, Third Edition,
Griffen & Co., London.
Mood, A., F. Graybill and D. Boes (1974) Introduction to the Theory of Statistics, McGraw-Hill,
New York.
Newey, W.K. and West, K.D. (1987) A Simple, positive semi-definite, heteroscedasticity and
autocorrelation consistent covariance matrix, Econometrica, 55, pp 703-708.
Smith, G. and Spencer M. (1991) Estimation and testing in models of exchange rate target
zones and process switching, in P. Krugman and M. Miller (eds), Exchange rate targets
and currency bands, Cambridge University Press, Cambridge and New York.
Tauchen, G. (1986) Statistical Properties of Generalised Method of Moments Estimators of
Structural Parameters Obtained from Financial Market Data, Journal of Business and
Economic Statistics, 4, pp.397-425.
ALBERT J. REED AND CHARLES HALLAHAN

A Bootstrap Estimator for Dynamic Optimization Models

ABSTRACT. We propose a technique for computing parameter estimates of dynamic and


stochastic programming problems for which boundary conditions must be imposed. We
demonstrate the feasibility of the technique by computing and interpreting the estimates of a
dynamic food price margin model using secondary economic time series data.

1. INTRODUCTION

Several solutions to infinite time-horizon, multivariate stochastic and dynamic pro-


gramming problems have recently been proposed (Baxter et al., 1990; Christiano,
1990; Coleman, 1990; den Haan and Marcet, 1990; Gagnon, 1990; Labadie, 1990;
McGrattan, 1990; Tauchen, 1990; Taylor and Uhlig, 1990). However, few studies
suggest ways to make correct inferences on parameter estimates in such problems.
An exception has been the recent work of Miranda and Glauber (1991). The complex
restrictions that the coefficients of such solutions must obey can inhibit statistical
inference. Simplifying the restrictions requires simplifying the model structure, and
inferences on a simplified model may only be of limited use. Alternatively, infer-
ences can be made from the first-order conditions of the problem. However, this
strategy forces the analyst to ensure that the parameter estimates satisfy the prob-
lem's boundary conditions. In Miranda and Glauber (1991), boundary conditions are
inherited through price band policies. Our study applies to the problem of estimating
the parameters of a dynamic problem in which no inherent boundary conditions exist,
but for which economic theory requires certain restrictions to be satisfied if the model
is to be useful in explaining behavior.
We illustrate our method with a stochastic regulator problem. This optimization
framework embodies linear-quadratic models (Sargent, 1987a) and provides the eco-
nomic arguments that underly some vector autoregression models. It also can be used
to approximate dynamic optimization problems without closed form solutions (McGratten, 1990). Our study suggests how one could make (approximately) correct statistical
inferences on a model whose parameters satisfy fixed point or boundary conditions.
Gallant and Golub (1984) illustrate how one could impose inequality restrictions
on a static optimization problem. Using their methodology, one could impose restric-
tions on the eigenvalues of the matrices of the stochastic regulator, thereby achieving
the required boundary condition. However, such a strategy places more restrictions
on the parameter estimates than the boundary condition.
Our procedure also can be used to estimate the parameters of static optimization

problems, but we apply it here to dynamic and stochastic problems. The stochastic
regulator encompasses a wide range of dynamic and stochastic models, and dynamic
and stochastic models provide a rich interpretation of economic data. These models
readily differentiate among the response of an endogenous variable to an actual
change, to a perfectly expected change, and to an unexpected change in an exogenous
variable. Furthermore the problem addresses the Lucas critique by recognizing that
such responses are not invariant to systematic changes in policy.
After discussing the stochastic regulator problem in Section 2, the bootstrap
estimator is presented in Section 3. Section 4 provides an example of interest to
agricultural economists, and Section 5 summarizes the paper.

2. THE STOCHASTIC OPTIMAL REGULATOR PROBLEM

Here we review the setup of the stochastic regulator, its solution, and the conditions
that deliver the solution. A more thorough treatment can be found in Sargent (1987b,
Chapter 1). An understanding of the stochastic regulator is crucial to understanding
the estimation procedure.
Consider a general dynamic and stochastic optimization problem defined by a vector of state variables $x = [x_1' : x_2']'$ and a vector of control variables $u$. The problem is to find the control sequence $\{u_t\}$ satisfying

$$V(x_0) = \max_{\{u_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi(x_t, u_t)$$

subject to $x_0$ given, and the equations of motion

$$x_{1,t+1} = g_1(x_t, u_t),$$
$$x_{2,t+1} = g_2(x_{2t}, \epsilon_{t+1}),$$

and the probability distribution

$$\text{Prob}(\epsilon_t \le e) = G(e).$$
Here the vector $x_1$ is termed the 'endogenous' state variable, $x_2$ the 'exogenous' state variable, and $\epsilon_t$ is a serially uncorrelated error term satisfying $E(\epsilon_t \mid x_t, x_{t-1}, \ldots; \epsilon_{t-1}, \epsilon_{t-2}, \ldots) = 0$. $V(x_0)$ is the value or objective function in period 0, $\pi(x_t, u_t)$ is the return function in period $t$, and $\beta$ is the discount factor. $E_t(Y)$ denotes the mathematical expectation of the random variable $Y$ conditioned on the state variable in time $t$, and taken with respect to $G$.
Two features characterize the above infinite time horizon problem. First, $x_{2,t+1}$ does not depend on $x_{1t}$ or $u_t$. Thus, $x_{1t}$ does not Granger cause $x_{2t}$. Second, the problem is recursive. The selection of $u$ in the current period affects current and future period returns and future period $x_1$ without affecting past-period returns and past $x_1$. This recursivity enables the analyst to re-cast the above infinite-time-horizon problem as a two-period problem that can be solved sequentially.
Specifically, the recursive problem can be written as

$$V(x_t) = \max_{u_t} \left\{ \pi(x_t, u_t) + \beta E[V(g(x_t, u_t, \epsilon_{t+1})) \mid x_t] \right\}$$

subject to

$$x_{1,t+1} = g_1(x_t, u_t),$$

where

$$E[V(g(x_t, u_t, \epsilon_{t+1})) \mid x_t] = \int V(g(x_t, u_t, \epsilon_{t+1}))\, dG(\epsilon),$$
and $g = [g_1' : g_2']'$. The necessary conditions for a solution are: for interior solutions, the value function satisfies the first-order condition

$$\frac{\partial \pi(x_t, u_t)}{\partial u_t} + \beta E\left\{ \left(\frac{\partial g}{\partial u_t}\right)' V'(x_{t+1}) \,\Big|\, x_t \right\} = 0.$$

If

$$\frac{\partial g_1}{\partial x_1} = 0$$

and $x_{1t}$ does not Granger cause $x_{2t}$ (i.e., $\partial g_2 / \partial x_1 = 0$), then because $\partial g_2 / \partial u_t = 0$, the necessary conditions reduce to

$$\frac{\partial \pi(x_t, u_t)}{\partial u_t} + \beta E\left\{ \left(\frac{\partial g_1}{\partial u_t}\right)' \frac{\partial \pi}{\partial x_{1,t+1}} \,\Big|\, x_t \right\} = 0.$$

The above conditions are termed Euler equations and have a convenient structure. The parameters of the Euler equations contain only the parameters of $\partial g_1/\partial u_t$, $\beta$, and the parameters of the return function. Unlike the first-order conditions for more general problems, the Euler equations do not contain $V'(x_{t+1})$, which complicates estimation efforts because it changes systematically over an iterative solution procedure and presumably over a data sample. For this reason the proposed estimation procedure applies to dynamic problems in which the state and control variables can be written so that $\partial g_1/\partial x_1 = 0$.
The Euler equations are unobservable because of the expectations operator. If $e_t$ is a forecast error and $x_{t-j}$ ($j = 0, 1, \ldots$) are elements of an information set, the Rational Expectations Hypothesis (REH) states $E(e_t \mid x_t, x_{t-1}, \ldots) = 0$. If the parameters of the Euler equations can be expressed in terms of the parameter vector $\theta$, the forecast error is

$$\frac{\partial \pi(x_t, u_t)}{\partial u_t} + \beta \left[ \left(\frac{\partial g(x_t, u_t, \epsilon_{t+1})}{\partial u_t}\right)' \frac{\partial \pi(x_{t+1}, u_{t+1})}{\partial x_{1,t+1}} \right] = e_t(\theta).$$
The above relationships are referred to as the sample Euler equations. Notice that $E(e_t \mid x_t, x_{t-1}, \ldots) = 0$ implies $E(e_t x_{t-j}) = 0$ ($j = 0, 1, \ldots$). Hence, if one defines a vector of instruments $z_t$ that consists of elements $x_{t-j}$ ($j = 0, 1, \ldots$), then

$$E\left\{ \left( \frac{\partial \pi(x_t, u_t)}{\partial u_t} + \beta \left[ \left(\frac{\partial g(x_t, u_t, \epsilon_{t+1})}{\partial u_t}\right)' \frac{\partial \pi(x_{t+1}, u_{t+1})}{\partial x_{1,t+1}} \right] \right) \otimes z_t \right\} = E(e_t \otimes z_t) = 0,$$

where $\otimes$ denotes the Kronecker product.


The above expression is the orthogonality condition exploited when computing Generalized Method of Moments (GMM) estimates of the parameters of the Euler equations. In particular, for $n$ observations, the GMM estimate is (Gallant, 1987)

$$d = \arg\min_{\theta} S(\theta, V),$$

where

$$S(\theta, V) = m_n(\theta, x)' V^{-1} m_n(\theta, x)$$

and

$$m_n(\theta, x) = n^{-1} \sum_{t=1}^{n} e_t(\theta) \otimes z_t.$$
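As a concrete illustration, the following sketch (in Python with NumPy) evaluates the sample moment vector $m_n(\theta, x)$ and the GMM criterion for a candidate $\theta$; the residual function, data arrays, and weighting matrix are hypothetical placeholders rather than objects from this study.

```python
import numpy as np

def gmm_criterion(theta, euler_residual, X, Z, V):
    """GMM criterion S(theta, V) = m_n' V^{-1} m_n, with
    m_n = (1/n) * sum_t e_t(theta) (Kronecker) z_t.

    euler_residual(theta, X, t) -> e_t, the sample Euler-equation residual
    at time t (a hypothetical user-supplied function).
    X : data array, Z : (n x k) instrument array, V : weighting matrix.
    """
    n = Z.shape[0]
    moments = []
    for t in range(n):
        e_t = np.atleast_1d(euler_residual(theta, X, t))
        moments.append(np.kron(e_t, Z[t]))      # e_t (x) z_t
    m_n = np.mean(moments, axis=0)              # sample orthogonality conditions
    return m_n @ np.linalg.solve(V, m_n)        # quadratic form in V^{-1}
```

Minimizing this criterion over $\theta$ with any standard optimizer yields the GMM point estimate.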
To obtain a closed-form solution to recursive dynamic and stochastic optimization problems, one must compromise on the functional form. The Stochastic Optimal Linear Regulator specifies a quadratic objective function and linear constraints. This class of models takes the form

$$V(x_0) = \max_{\{u_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \left\{ [x_t', u_t'] \begin{bmatrix} r & w \\ w' & q \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix} \right\}$$

subject to

$$x_{t+1} = a x_t + b u_t + \epsilon_{t+1},$$

where

$$x_t = \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}, \qquad r = \begin{bmatrix} r_{11} & r_{21}' \\ r_{21} & 0 \end{bmatrix}.$$

The infinite time horizon problem is re-cast as a two period problem comprising Bellman's equation

$$V(x_t) = \max_{u_t} \left\{ [x_t', u_t'] \begin{bmatrix} r & w \\ w' & q \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \beta E[V(x_{t+1}) \mid x_t] \right\}$$

and the constraints

$$x_{1,t+1} = a_{11} x_{1t} + a_{12} x_{2t} + b_1 u_t,$$
$$x_{2,t+1} = a_{21} x_{1t} + a_{22} x_{2t} + b_2 u_t + \epsilon_{t+1}.$$

Assuming $x_{1t}$ does not Granger cause $x_{2t}$, we have $a_{21} = 0$ and $b_2 = 0$, and the problem is recursive. If, in addition, $a_{11} = 0$, the Euler equations

$$w' x_t + q u_t + \beta b_1' E[r_{11} x_{1,t+1} + r_{21} x_{2,t+1} \mid x_t] = 0$$

serve as a set of necessary conditions. Notice the Euler equations only contain the parameters $b_1$ and the parameters of the objective function.
Now, define $b = [b_1' : b_2']'$, the matrix $a$ with elements $a_{ij}$, and make the transformations

$$v_t = q^{-1} w' x_t + u_t,$$
$$R = r - w q^{-1} w', \qquad A = a - b q^{-1} w',$$
$$Q = q, \qquad B = b.$$

This permits the problem to be re-stated as

$$V(x_t) = \max_{v_t} \left\{ x_t' R x_t + v_t' Q v_t + \beta E[V(x_{t+1}) \mid x_t] \right\}$$
subject to

$$x_{t+1} = A x_t + B v_t + \epsilon_{t+1},$$

with solution

$$v_t = -F x_t,$$

where

$$F = \beta (Q + \beta B' P B)^{-1} B' P A,$$

and where the $P$ matrix solves the Riccati equations

$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A.$$

Using the linear constraint, the reduced-form solution is

$$x_{t+1} = (A - BF) x_t + \epsilon_{t+1}.$$



The above discussion indicates that convergence of the Riccati equations induces an important function. This function maps the parameters of the stochastic regulator (i.e., $r_{11}$, $r_{21}$, $w$, $q$, $a_{12}$, $b_1$, and $a_{22}$) to the reduced-form coefficients, $A - BF$.
The above discussion also reveals that iterations on the Riccati equations amount to solving the dynamic problem 'backwards'. In the two-period reformulation of the problem, period $t$'s value function is defined as the maximum of the current period return and the next period's expected value function. Period $t-1$'s value function is defined as the maximum of period $t-1$'s return function and period $t$'s expected value function. Back substituting next period's value function into the current period's condition yields a sequence of optimal controls. In short, the solution procedure proceeds forward by computing past values of the optimal control.
By definition, finite time horizon problems are bounded, and their solution requires beginning in the terminal period and ending in the starting period. However, infinite time horizon problems require bounded value functions, which in turn require that distant period return functions and their control must approach zero. Notice that if $P_0 = 0$, $F_0 = 0$, $A - BF_0 = 0$, and $B \neq 0$, the control in period $T$ (i.e., $v_T$) is 0 as $T \to \infty$. Hence, setting $P_0 = 0$ and iterating on

$$P_{j+1} = R + \beta A' P_j A - \beta^2 A' P_j B (Q + \beta B' P_j B)^{-1} B' P_j A$$

until the matrix $P$ converges to a fixed point is equivalent to solving the infinite time horizon problem backwards.
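A minimal numerical sketch of this backward iteration (Python with NumPy; the matrices $R$, $Q$, $A$, $B$ and the discount factor are assumed to be given, and the iteration limit is a user-chosen bound) starts from $P_0 = 0$, iterates the Riccati equation until $P$ converges, and then forms $F$ and the reduced-form matrix $A - BF$.

```python
import numpy as np

def solve_regulator(R, Q, A, B, beta, max_iter=150, tol=1e-8):
    """Iterate P = R + beta*A'PA - beta^2*A'PB (Q + beta*B'PB)^{-1} B'PA
    starting from P_0 = 0.  Returns (P, F, A - BF) if the iterations
    converge, and None otherwise (the boundedness check used in the text)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for _ in range(max_iter):
        M = np.linalg.inv(Q + beta * B.T @ P @ B)
        P_new = R + beta * A.T @ P @ A \
                - beta**2 * A.T @ P @ B @ M @ B.T @ P @ A
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            M = np.linalg.inv(Q + beta * B.T @ P @ B)
            F = beta * M @ B.T @ P @ A          # F = beta (Q + beta B'PB)^{-1} B'PA
            return P, F, A - B @ F
        P = P_new
    return None   # boundary conditions not satisfied within max_iter iterations
```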
The reduced-form solution of the stochastic regulator describes the movement of economic data in four different, but interrelated, dimensions. First, the reduced-form is not invariant to systematic changes in policy. A systematic change in a policy variable within the $x_2$ vector is represented by a change in the $a_{22}$ coefficient. The solution procedure indicates a change in policy will not only alter the $A$ matrix, but also will alter $F$, and therefore alter decision rules of agents. Hence, the problem addresses the Lucas critique of econometric policy evaluation in which reduced forms are not invariant to changes in policy.
Second, like any regression model, the reduced form coefficients measure the response of next period's state vector to a one unit change in the current state vector. Third, the reduced-form describes the response of the economy to $\epsilon_{t+1}$, the vector of exogenous shocks. Specifically, such a change cannot be predicted either by agents in the model or by the econometrician, based on the current period state variables. The above setup implies a serially correlated response of the state variables to a single, uncorrelated shock. A persistently higher path of food prices following a drought describes a serially correlated response to a single, uncorrelated surprise. A bounded regulator problem implies a stable $A - BF$ matrix (one with eigenvalues less than unity in modulus). A stable $A - BF$ matrix implies the state variables can be expressed as a function of current and past shocks. In particular, let the matrix $H$ capture the instantaneous causality (covariance) between elements of the $\epsilon_t$ vector, and define $e_t$ as the vector of uncorrelated errors (Sargent, 1978). The inverted system is
$$x_{t+1} = \sum_{i=0}^{\infty} (A - BF)^i H e_{t-i}.$$

The coefficients of this impulse response function measure the contribution of past shocks to the current state vector. Equivalently, the coefficients measure the persistent movement of the state vector following a single shock.
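For example, the following short sketch (Python with NumPy; $A - BF$ and $H$ are assumed to have been computed already) generates the first few impulse-response coefficients $(A - BF)^i H$.

```python
import numpy as np

def impulse_responses(A_BF, H, horizon):
    """Coefficients of x_{t+1} = sum_i (A - BF)^i H e_{t-i}, for i = 0..horizon-1."""
    coeffs = []
    power = np.eye(A_BF.shape[0])
    for _ in range(horizon):
        coeffs.append(power @ H)    # response to a unit orthogonalized shock i periods back
        power = power @ A_BF        # next power of (A - BF)
    return coeffs
```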
Fourth, it can be shown that the linear (in variables) Euler equations can be factored into symmetric 'feedback' and 'feedforward' terms, and the endogenous state vector $x_{1t}$ can be expressed as a function of the future expected stream of the exogenous state variables $\{x_{2t}\}$ (Sargent, 1987a, Ch. 14). Since $\{x_{2,t+j}\}$ is assumed known, the prediction equations describing the stochastic path of $x_2$ are ignored. The computation of this 'perfect foresight' solution is detailed in the Appendix for the example given in a subsequent section.
The proposed estimation procedure enables the analyst to compute and make approximately correct inferences about the above responses. Successful computation permits a rigorous interpretation of the economic time series data.

3. A BOOTSTRAP ESTIMATE

The parameters of the model described in the previous section are estimated using
a bootstrapping procedure and Bayes' Theorem. The prior density is an indicator
function that is diffuse when the boundary conditions hold and 0 otherwise. The
Bayesian bootstrap procedure permits valid inference on all of the parameters and
response coefficients.
The most convenient way to explain how the bootstrap procedure is applied here
is to examine the four fundamental components of the model. These are

1. Unrestricted Reduced Form

2. GMM estimates of the Euler equation parameters

$$d = \arg\min_{\theta} S(\theta, V),$$

where

$$S(\theta, V) = m_n(\theta, x)' V^{-1} m_n(\theta, x)$$

and

$$m_n(\theta, x) = n^{-1} \sum_{t=1}^{n} e_t(\theta) \otimes z_t,$$

$$w' x_t + q u_t + \beta b_1' [r_{11} x_{1,t+1} + r_{21} x_{2,t+1}] = e_t(\theta).$$

3. Constraints

4. Restricted Reduced Form

$$\begin{bmatrix} x_{1,t+1} \\ x_{2,t+1} \end{bmatrix} = (A - BF) \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} 0 \\ \epsilon_{t+1} \end{bmatrix}$$

In the unrestricted reduced form, $\beta_{12}$ is a 'free' parameter. In the stochastic regulator, $\beta_{12}$ is a function of $\beta_{11}$ and $\beta_{21}$. This function or restriction may be impossible to impose on an econometric reduced-form representation. Conceptually, however, both reduced forms satisfy a similar regression structure because both residuals satisfy the condition $E(\epsilon_{t+1} x_t) = 0$. Conceptually, either regression structure could be estimated using a Seemingly Unrelated Regressions (SUR) estimator. The essence of the proposed procedure is to generate bootstrap samples using the unrestricted reduced form, restrict the bootstrap estimates to satisfy boundary conditions, and compute the restricted reduced form.
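One way to generate such bootstrap samples is to resample the fitted reduced-form residuals with replacement and regenerate the data recursively, in the spirit of Freedman (1981). The sketch below (Python with NumPy) uses a simplified first-order reduced form as a stand-in for the paper's specification.

```python
import numpy as np

def bootstrap_sample(residuals, beta_hat, x0, rng=None):
    """Generate one bootstrap sample from an estimated reduced form
    x_{t+1} = beta_hat @ x_t + e_{t+1} by resampling fitted residuals
    with replacement (residual bootstrap; a simplified stand-in)."""
    rng = np.random.default_rng() if rng is None else rng
    n = residuals.shape[0]
    draws = residuals[rng.integers(0, n, size=n)]   # resample residuals
    x = np.empty((n + 1, len(x0)))
    x[0] = x0
    for t in range(n):
        x[t + 1] = beta_hat @ x[t] + draws[t]       # regenerate the state path
    return x
```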
GMM estimates of $\theta$ are computed from the sample Euler equations using both the original data and the bootstrap samples. Bootstrap 'T' statistics are used to make draws on the parameters $\beta_{21}$ and $\theta$ from the approximate likelihood function.
The Riccati equations are evaluated at the parameter values of the problem. Convergence within $J$ iterations implies the boundary conditions hold, and the prior is given a value of one. $A - BF$ is computed for draws that converge. Nonconvergence implies the boundary conditions do not hold in $J$ iterations. In this case, the prior density is assigned a value of zero.
The key to implementing the above procedure lies in drawing the parameters
from the bootstrap T statistic. The problem is similar to that of Geweke (1986) who
had the convenience of exact inference in a linear regression model with normally
distributed error terms. There, the pivotal element is distributed as a multivariate
Student-t and can be drawn from a random number generator and added to the OLS
estimate to obtain parameter draws from the likelihood. Here, the bootstrap 'T'
statistic may not be pivotal, but we assume it is nearly so, so that the likelihood can
conveniently be factored.
Bickel and Freedman (1981) and Freedman (1981) provide the conditions under
which the distribution of a bootstrap estimate approximates the distribution of the
statistic - roughly, the conditional distribution of the bootstrap sample must eventually

approach the distribution of the sample. When this condition holds, the conditional
distribution of the bootstrap pivot approaches the distribution of the theoretical pivot.
This result is important for both frequentist and Bayesian inference. It enables
frequentists to construct accurate confidence intervals when the distribution of the
sample is unknown. For a Bayesian analysis, the moments of the posterior density of
the parameters must be computed. The posterior density is proportional to the product
of a prior density and the likelihood function. Boos and Monahan (1986) factor the
likelihood function into a function of the data and a function of the theoretical pivot.
This factorization is performed under the assumption that the statistic is sufficient.
Bickel and Freedman's (1981) result permits Boos and Monahan (1986) to replace
the unobserved pivot with the bootstrap pivot in order to approximate the posterior
density.
This result is central to our method. It permits us to make draws from the support of the approximate likelihood function using bootstrap pivots. SUR estimates of the unrestricted parameter vector $\beta = [\beta_{11}', \beta_{12}', \beta_{21}']'$ and GMM estimates of the parameters $\theta$ deliver the point estimates $b = [b_{11}', b_{12}', b_{21}']'$ and the point estimate $d$. In addition, each estimator provides the covariance matrices $C_b$ and $C_d$. The theoretical pivot for parameter $\beta$ is $T_1 = C_b^{-1/2}(b - \beta)$, and the bootstrap pivot is $T_1^* = C_{b^*}^{-1/2}(b^* - b)$. Since the distribution of $T_1$ is near that of $T_1^*$, set $T_1$ equal to $T_1^*$ and

$$\beta = b - C_b^{1/2} C_{b^*}^{-1/2} (b^* - b).$$

Repeating the same procedure for $\theta$ gives

$$\theta = d - C_d^{1/2} C_{d^*}^{-1/2} (d^* - d).$$

The subvector $(\beta_{21}' : \theta')'$ is used to construct the matrices of the stochastic regulator problem and the Riccati equations. For a draw in which the Riccati equations converge, the restricted response coefficient $A - BF$ is computed. Means and standard deviations are then computed for these 'successful' draws. We illustrate this method in the next section.
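The accept/reject logic can be sketched as follows (Python with NumPy). The routines `bootstrap_estimates` and `build_regulator` are hypothetical user-supplied functions standing in for the SUR/GMM and model-construction steps of the paper, and the regulator solver is passed in as a function (for instance, the Riccati iteration sketched earlier).

```python
import numpy as np

def draw_posterior(b, C_b, d, C_d, bootstrap_estimates, build_regulator,
                   solve_regulator, n_draws=1000):
    """Bayesian bootstrap draws with an indicator prior on Riccati convergence.

    bootstrap_estimates() -> (b_star, C_b_star, d_star, C_d_star), the SUR and
    GMM estimates from one bootstrap resample (hypothetical user routine).
    build_regulator(beta_draw, theta_draw) -> (R, Q, A, B, disc), the regulator
    matrices implied by a parameter draw (hypothetical user routine).
    solve_regulator(...) returns (P, F, A - BF) on convergence, else None.
    """
    kept = []
    L_b = np.linalg.cholesky(C_b)
    L_d = np.linalg.cholesky(C_d)
    for _ in range(n_draws):
        b_star, C_b_star, d_star, C_d_star = bootstrap_estimates()
        # set the theoretical pivot equal to the bootstrap pivot and solve for the draw
        beta_draw = b - L_b @ np.linalg.solve(np.linalg.cholesky(C_b_star), b_star - b)
        theta_draw = d - L_d @ np.linalg.solve(np.linalg.cholesky(C_d_star), d_star - d)
        R, Q, A, B, disc = build_regulator(beta_draw, theta_draw)
        solution = solve_regulator(R, Q, A, B, disc, max_iter=150)
        if solution is not None:          # prior = 1: Riccati iterations converged
            kept.append(solution[2])      # keep the restricted reduced form A - BF
    return kept                           # posterior sample of A - BF
```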

4. AN EXAMPLE

One statistic of interest to agricultural economists is the food price margin. The
food price margin is the difference between the value of a particular food item and
the price paid to farmers for the farm component of the good. Hence, the food
price margin defines the value added to the item by the processing sector. Empirical
research in this area attempts to predict how food price margins change in response
to a variety of exogenous shifters. Wohlgenant (1989) recognizes that nonfarm and
farm factors of production are substitutes in the manufacture of food and explores the
implications of input substitution for the movement of food price margins. Estimates
of the parameters are obtained from functions derived from static duality theory. An
earlier study, Wohlgenant (1985) explores the movement of food price margins over
time. Estimates of the parameters of a univariate dynamic and stochastic optimization
problem are computed. The problem illustrated in this section shows that multivariate
relationships among factors of production need not be sacrificed to obtain parameter
estimates of a dynamic and stochastic economic model.
Our example has the following specification:
The representative food processor's objective function:

$$V^{(1)} = \max_{\{lab_t, far_t, ene_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(1)},$$

where

$$\pi_t^{(1)} = p_t\,(\alpha_1, \alpha_2, \alpha_3)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} - (wag_t, r_t, enpr_t)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} - \tfrac{1}{2}\,(lab_t, far_t, ene_t)\,H\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} - \tfrac{1}{2}\,(\Delta lab_t, \Delta far_t, \Delta ene_t)\,D\begin{pmatrix} \Delta lab_t \\ \Delta far_t \\ \Delta ene_t \end{pmatrix}.$$

The representative farm firm's objective function:

$$V^{(2)} = \max_{\{far_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(2)},$$

where

$$\pi_t^{(2)} = r_t\, far_t - \tfrac{1}{2}\,c\,(far_t - far_{t-1})^2.$$

The demand function for food:

$$p_t = A_1\,(\alpha_1, \alpha_2, \alpha_3)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} + dem_t.$$

The stochastic equations of motion

The decision rule:

$$\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} = \begin{bmatrix} \rho_{11} & \rho_{12} & \rho_{13} \\ \rho_{21} & \rho_{22} & \rho_{23} \\ \rho_{31} & \rho_{32} & \rho_{33} \end{bmatrix}\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix} + \begin{bmatrix} \rho_{14} & \rho_{15} & \rho_{16} & \rho_{17} & \rho_{18} & \rho_{19} \\ \rho_{24} & \rho_{25} & \rho_{26} & \rho_{27} & \rho_{28} & \rho_{29} \\ \rho_{34} & \rho_{35} & \rho_{36} & \rho_{37} & \rho_{38} & \rho_{39} \end{bmatrix}\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}.$$
The price margin:

$$\begin{pmatrix} p_t \\ r_t \end{pmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix}\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix} + \begin{bmatrix} w_{14} & w_{15} & w_{16} & w_{17} & w_{18} & w_{19} \\ w_{24} & w_{25} & w_{26} & w_{27} & w_{28} & w_{29} \end{bmatrix}\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}.$$

The model describes a typical food processing firm. This firm employs labor ($lab$), farm ($far$), and energy ($ene$) in the production of food. The firm's production process is described by a linear production function. Each period the processing firm receives the price of food ($p$), and pays wages ($wag$), farm price ($r$), and energy price ($enpr$). The model also describes a typical farm supplier. This supplier receives the price $r$ for the farm inputs sold to the processor.
The processing firm incurs two types of internal capital costs associated with utilizing the three factors. First, it incurs a long-run returns-to-scale cost associated with combining capital and the three factors. Returns-to-scale cost parameters are embedded in the $H$ matrix (with elements $h_{ij}$). Wohlgenant (1989) could not reject constant returns to scale for most of the food processing industries. We impose this restriction with $h_{22} = 0$. Second, the processing firm incurs short-run capital costs of adjustment, whose parameters are embedded in the $D$ matrix (having elements $d_{ij}$). While the farm firm experiences long-run constant returns to scale, capital costs associated with output adjustments are captured in the parameter $c$.
The processing industry aggregate faces a consumer demand function for food output as well as the cost function of the farm sector. The variable $dem$ represents the stochastic shifts in consumer demand, and $A_1$ represents the slope of the inverted demand function.¹ At the beginning of each period, a shock occurs to wages, energy prices, and demand shifts. These shocks define a set of Markov processes described by three linear difference equations. A change in the parameters of these difference equations represents a change in economic policy. The problem is to find the sequences of labor, farm, and energy that maximize the expected social welfare

¹ The $A_1$ and $\alpha_i$ parameters are obtained or derived from previous empirical studies [Huang (1988), Putnam (1989)]. Using the sample means of the data, the demand shifter, $dem$, is evaluated as the residual of the consumer demand function.

function. In turn, this solution implies a sequence of equilibrium food and farm price
sequences.
We used quarterly, U.S. beef industry data from 1965.1 to 1988.4 to construct
the variable sequences of the model. Data sources and a description of the variable
construction are available from the authors upon request. Two observations are lost
to lags in the model. Four observations are lost to fourth differences. Hence, 90
observations are used in the estimation. Bootstrap samples of size 90 are drawn.
Aggregating the representative processor's and the representative farm supplier's objective functions gives the following dynamic programming problem:

$$V^{(3)} = \max_{\{lab_t, far_t, ene_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(3)},$$

where $\pi_t^{(3)} = \pi_t^{(1)} + \pi_t^{(2)}$, whose combined adjustment-cost component is

$$- \tfrac{1}{2}\,(\Delta lab_t, \Delta far_t, \Delta ene_t)\begin{bmatrix} d_{11} & d_{12} & 0 \\ d_{12} & d_{22} + c & 0 \\ 0 & 0 & d_{33} \end{bmatrix}\begin{pmatrix} \Delta lab_t \\ \Delta far_t \\ \Delta ene_t \end{pmatrix},$$

subject to the stochastic equations of motion described above. The parameters of


the Euler equations for this problem are estimated using GMM. Specifically, the instrumental-variable vector used to obtain the GMM estimates is $z_t = [lab_{t-1}, far_{t-1}, ene_{t-1}, wag_{t-1}, enpr_{t-1}, p_{t-1}, dem_{t-1}]'$. Cholesky decompositions of the SUR and GMM estimates of the covariance matrices are computed to form the 'T' statistics.
The equilibrium of the model is found by solving the following dynamic programming problem:

$$V^{(4)} = \max_{\{lab_t, far_t, ene_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(4)},$$

where $\pi_t^{(4)}$ denotes the per-period social welfare return,

subject to the equations of motion given above. We compute the posterior distribution of the objective function parameters and the linear stochastic difference equations. The prior is assigned a value 1 if the Riccati equations associated with $V^{(4)}$ converge within 150 iterations. Otherwise, the prior is assigned a value zero. Of 1000 draws from the bootstrap likelihood, 664 resulted in convergent Riccati equations.
We also compute the posterior for $A - BF$. $A - BF$ represents the response coefficients of the reduced-form input demand functions. Combining $A - BF$ with the consumer demand function gives the parameters of the food price equation. Combining $A - BF$ with the farm supplier's Euler equations gives the parameters of the farm price equation. The food price and the farm price functions constitute the food price margin function.
In Tables 1 to 3, we report the means and standard deviations (in parentheses) of the posterior distribution. We assume a quadratic loss function. Therefore, the mean represents our parameter estimate because it minimizes the loss function (Zellner, 1987). The standard error serves as the measure of dispersion of the posterior.
Table 1 reports the estimates of the parameters of the stochastic regulator, its reduced-form solution, and the price margin functions. The negative estimate of $h_{11}$ suggests that labor is a capital saving input in the long run in the beef industry. The results also suggest firms consume capital when they adjust labor ($d_{11} > 0$). However, they can offset capital adjustment costs by substituting farm inputs for labor ($d_{12} < 0$). Our estimate of the parameter $c$ (62.6) indicates the short-run supply of farm inputs facing the processing industry is upward sloping.
Table 1 also reports the parameter estimates of the equations of motion. The results indicate the demand shifter displays oscillating (complex roots) patterns. The average period from peak-to-peak is approximately one month (about 1/3 of a quarter). Also, the results indicate that changes in energy prices have been more permanent than have changes in wages.
Estimates of the coefficients of the equilibrium input demand functions are reported in Table 1. Coefficient estimates of the reduced form are composite functions of all or many of the parameters of the problem. Hence, the standard deviations associated with the composite coefficients embody the standard deviations of many parameters. The composite coefficients sometimes capture opposite effects. We estimate a negative steady-state cost of capital associated with labor. We also estimate a positive dynamic cost of capital associated with labor ($h_{11} < 0$, $d_{11} > 0$). The result of these offsetting effects is a positive response of labor to current period wages (0.173). Apparently, it is the negative steady-state costs that induce firms to hire less labor when consumer demand increases ($-0.111$). Our results are consistent with

TABLE 1
Parameter estimates, beef model.*

The representative food processor's objective function:

$$V^{(1)} = \max_{\{lab_t, far_t, ene_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(1)},$$

where

$$\pi_t^{(1)} = p_t\,(.642, .730, .508)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} - (wag_t, r_t, enpr_t)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix}$$
$$- \tfrac{1}{2}\,(lab_t, far_t, ene_t)\begin{bmatrix} -10.2\,(8.1) & .000 & .000 \\ .000 & .000 & .000 \\ .000 & .000 & 6.34\,(5.3) \end{bmatrix}\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix}$$
$$- \tfrac{1}{2}\,(\Delta lab_t, \Delta far_t, \Delta ene_t)\begin{bmatrix} 32.1\,(29.2) & -48.0\,(43.3) & .000 \\ -48.0\,(43.3) & 1.00 & .000 \\ .000 & .000 & 4.72\,(4.5) \end{bmatrix}\begin{pmatrix} \Delta lab_t \\ \Delta far_t \\ \Delta ene_t \end{pmatrix}.$$

The representative farm firm's objective function:

$$V^{(2)} = \max_{\{far_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \pi_t^{(2)},$$

where

$$\pi_t^{(2)} = r_t\, far_t - \tfrac{1}{2}\,62.6\,(43.3)\,(far_t - far_{t-1})^2.$$

The demand function for food:

$$p_t = -.563\,(.642, .730, .508)\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} + dem_t.$$

The stochastic equations of motion:

$$\begin{pmatrix} wag_{t+1} \\ enpr_{t+1} \\ dem_{t+1} \end{pmatrix} = \begin{bmatrix} 1.00\,(.10) & .000 & .000 \\ .000 & .953\,(.12) & .000 \\ .000 & .000 & .995\,(.12) \end{bmatrix}\begin{pmatrix} wag_t \\ enpr_t \\ dem_t \end{pmatrix} + \begin{bmatrix} -.21\,(.09) & .000 & .000 \\ .000 & -.037\,(.10) & .000 \\ .000 & .000 & -.286\,(.10) \end{bmatrix}\begin{pmatrix} wag_{t-1} \\ enpr_{t-1} \\ dem_{t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \\ \epsilon_{3,t+1} \end{pmatrix}.$$

Table 1 (continued)

The decision rule:

$$\begin{pmatrix} lab_t \\ far_t \\ ene_t \end{pmatrix} = \begin{bmatrix} .268\,(.30) & .042\,(.09) & .007\,(.02) \\ -.39\,(.28) & .915\,(.21) & .003\,(.02) \\ .002\,(.07) & -.038\,(.12) & .292\,(.19) \end{bmatrix}\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix}$$
$$+ \begin{bmatrix} .173\,(.49) & -.011\,(.05) & -.013\,(.05) & .000\,(.00) & -.111\,(.18) & .008\,(.03) \\ .014\,(.22) & .001\,(.02) & .012\,(.11) & -.000\,(.01) & .007\,(.12) & -.009\,(.02) \\ -.014\,(.06) & .001\,(.01) & -.197\,(.55) & -.002\,(.05) & .104\,(.22) & -.007\,(.05) \end{bmatrix}\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}.$$

The price margin:

$$\begin{pmatrix} p_t \\ r_t \end{pmatrix} = \begin{bmatrix} -.063\,(.18) & .380\,(.09) & .087\,(.06) \\ -3.86\,(12.5) & -.053\,(.51) & -.00\,(2.0) \end{bmatrix}\begin{pmatrix} lab_{t-1} \\ far_{t-1} \\ ene_{t-1} \end{pmatrix}$$
$$+ \begin{bmatrix} .064\,(.11) & -.003\,(.01) & -.056\,(.14) & -.001\,(.01) & .992\,(.10) & -.003\,(.02) \\ .587\,(1.0) & -.365\,(.67) & .009\,(.16) & .006\,(.12) & -.364\,(1.2) & .509\,(.88) \end{bmatrix}\begin{pmatrix} wag_t \\ wag_{t-1} \\ enpr_t \\ enpr_{t-1} \\ dem_t \\ dem_{t-1} \end{pmatrix}.$$

* Reported values are means of the posterior, and the numbers in parentheses are standard
errors of the posterior.

the notion that consumers have shifted toward products containing more nonfarm
inputs.
These results are also used to trace the impacts of exogenous changes on the
food price margin. Our results indicate that a wage increase induces an increase in
labor demand. However, positive adjustment costs associated with labor dampen
this increase. In response to a wage increase, firms substitute farm inputs for labor.
This raises the demand for farm inputs and increases farm prices. Our results suggest
the increase in wages raises the marginal costs of processing. However, the larger
increase in farm prices narrows the food price margin. The results also suggest a weak
relationship between the demand for farm inputs and a (positive) shift in consumer
demand. Our point estimate is slightly negative (-0.007). Hence, we estimate that
the price margin widens when consumers increase their demand for beef.

TABLE 2
Estimates of the perfect foresight solution, beef prices.*

              Food Price_t, p_t                            Farm Price_t, r_t
 j    E_t wag_{t+j}  E_t enpr_{t+j}  E_t dem_{t+j}   E_t wag_{t+j}  E_t enpr_{t+j}  E_t dem_{t+j}
 1      -0.0489        -0.0084         1.0136          3.5922         1.9117        -3.6690
        (.109)         (.056)          (.082)          (5.38)         (3.18)        (5.42)
 2      -0.0030         0.0071        -0.0073         -1.6438        -1.1511         1.8734
        (.040)         (.019)          (.033)          (4.13)         (2.49)        (4.23)
 3      -0.0049         0.0063        -0.0028          0.1807         0.0703        -0.1549
        (.018)         (.011)          (.017)          (2.13)         (1.23)        (2.16)
 4       0.0001         0.0069        -0.0054         -0.2107        -0.1368         0.2189
        (.011)         (.007)          (.012)          (1.36)         (.746)        (1.39)
 5      -0.008          0.0059        -0.0038          0.0691         0.0339        -0.0575
        (.007)         (.006)          (.008)          (.987)         (.535)        (1.06)
 6       0.0004         0.0054        -0.0042         -0.0590        -0.0333         0.0512
        (.005)         (.004)          (.007)          (.755)         (.409)        (.860)
 7       0.0000         0.0047        -0.0033          0.0276         0.0153        -0.0206
        (.003)         (.004)          (.006)          (.602)         (.329)        (.714)
 8       0.0004         0.0041        -0.0032         -0.0210        -0.0095         0.0136
        (.003)         (.003)          (.005)          (.487)         (.270)        (.592)

* Reported values are means of the posterior. The values in parentheses are standard errors of the posterior.

Eckstein (1985) demonstrates that a univariate stochastic regulator problem can be used to compute a useful variety of response elasticities. The results presented
in Tables 2 and 3 illustrate that the above procedure is well-suited to the statistical
estimation of such responses implied by a more general problem. Table 2 reports
the estimates of the perfect foresight solution. This solution gives the current period
response to a known or expected change that occurs j periods into the future. The
idea is that firms adjust production in the current period to reduce adjustment costs
later.
The responses reported in Table 2 display the small effect that future wage
increases exert on current period food prices. This small response is partly due
to the offsetting static and dynamic costs of adjusting labor. Table 2 also reports that
future wage changes exert a positive effect on farm price. Evidently, firms substitute

TABLE 3
Impulse response estimates, beef model.*

              Food Price_t, p_t                    Farm Price_t, r_t
 j    wag_{t-j}   enpr_{t-j}   dem_{t-j}     wag_{t-j}   enpr_{t-j}   dem_{t-j}
 0     0.0000      0.0000       0.0000        2.1067      -0.0265      -2.2515
       0.0000      0.0000       0.0000        (2.41)      (.736)       (2.60)
 1     0.0643     -0.0559       0.9935        0.5871       0.0082      -0.3858
       (.106)     (.136)        (.103)        (1.05)      (.178)       (1.20)
 2     0.0750     -0.0635       1.0034        0.2701       0.0121       0.1508
       (.118)     (.150)        (.148)        (.568)      (.147)       (.573)
 3     0.0722     -0.0609       0.7337       -0.0097       0.0269       0.4512
       (.118)     (.151)        (.168)        (.386)      (.132)       (.444)
 4     0.0623     -0.0549       0.4591       -0.0452       0.0205       0.3978
       (.109)     (.143)        (.183)        (.266)      (.120)       (.389)
 5     0.0520     -0.0485       0.2620       -0.0962       0.0273       0.3114
       (.095)     (.130)        (.191)        (.216)      (.105)       (.339)
 6     0.0419     -0.0425       0.1471       -0.0778       0.0189       0.1830
       (.081)     (.114)        (.179)        (.181)      (.093)       (.281)
 7     0.0334     -0.0372       0.0906       -0.0810       0.0220       0.1010
       (.068)     (.098)        (.152)        (.156)      (.079)       (.236)

* Reported values are means of the posterior. The values in parentheses are standard errors of the posterior.

farm inputs for labor before an expected wage increase. Our results also indicate the
price and quantity demanded of farm commodities change before a known increase
in energy price.
The impulse response coefficients are reported in Table 3. Our results measure the
change in food and farm prices following a shock to an exogenous variable. These
coefficients account for the contemporaneous relationships among the various shocks.
Hence, it is difficult to provide an intuitive explanation of the results presented in
Table 3.

5. CONCLUSIONS

This study uses the Bayesian bootstrap to compute econometric estimates of stochas-
tic, dynamic programming problems. Typically, statistical inferences on the reduced-
form coefficients of such a problem are difficult because of the complex cross-
equation restrictions that characterize such solutions. Likewise, direct estimation of
the problem's parameters requires the estimates to adhere to boundary conditions,
which when imposed, require classical techniques to be significantly modified or
discarded (since the boundary condition cannot be checked by evaluating, for ex-
ample, the eigenvalues of a matrix). By contrast, our procedure combines textbook
algorithms with the Bayesian bootstrap to form an estimator that is well suited to
impose such a restriction.
The estimator holds value for analysts facing difficulties imposing restrictions on any econometric model. It also should be useful for analysts pursuing a Bayesian analysis, but who are uncomfortable with the usual assumption of normally distributed error terms. All that is required is a statistical representation from which the analyst can draw bootstrap samples of the variables of the model. The estimator could be used, for example, to estimate static duality models when one is concerned with imposing the required curvature restrictions.

REFERENCES

Baxter, M., M.J. Crucini, and K.G. Rouwenhorst: 1990, 'Solving the stochastic growth model by a discrete state-space, Euler-equation approach', Journal of Business and Economic Statistics 8, 19-21.
Bickel, P.J., and D.A. Freedman: 1981, 'Some asymptotic theory for the bootstrap', The Annals of Statistics 9, 1196-1217.
Boos, D.D., and J.F. Monahan: 1986, 'Bootstrap methods using prior information', Biometrika 73, 77-83.
Christiano, L.J.: 1990, 'Solving the stochastic growth model by linear-quadratic approximation and by value-function iteration', Journal of Business and Economic Statistics 8, 23-26.
Coleman, W.J.: 1990, 'Solving the stochastic growth model by policy-function iteration', Journal of Business and Economic Statistics 8, 27-29.
den Haan, W.J., and A. Marcet: 1990, 'Solving the stochastic growth model by parameterizing expectations', Journal of Business and Economic Statistics 8, 31-34.
Eckstein, Z.: 1985, 'The dynamics of agricultural supply: a reconsideration', American Journal of Agricultural Economics 67, 204-214.
Freedman, D.A.: 1981, 'Bootstrapping regression models', The Annals of Statistics 9, 1218-1228.
Gagnon, J.E.: 1990, 'Solving the stochastic growth model by deterministic extended path', Journal of Business and Economic Statistics 8, 35-38.
Gallant, A.R.: 1987, Nonlinear Statistical Models, New York: John Wiley and Sons.
Gallant, A.R., and G.H. Golub: 1984, 'Imposing curvature restrictions on flexible functional forms', Journal of Econometrics 26, 295-321.
Geweke, J.: 1986, 'Exact inference in the inequality constrained normal linear regression model', Journal of Applied Econometrics 1, 127-141.
Huang, K.: 1988, 'An inverse demand system for U.S. composite goods', American Journal of Agricultural Economics 70, 902-909.
Labadie, P.: 1990, 'Solving the stochastic growth model by using a recursive mapping based on least squares projection', Journal of Business and Economic Statistics 8, 39-40.
Lucas, R.E.: 1976, 'Econometric policy evaluation: a critique', in K. Brunner and A. Meltzer (eds), The Phillips Curve and the Labor Market, Volume 1 of Carnegie-Rochester Conferences in Public Policy, a supplementary series to the Journal of Monetary Economics, Amsterdam: North Holland.
McGratten, E.R.: 1990, 'Solving the stochastic growth model by linear-quadratic approximation', Journal of Business and Economic Statistics 8, 41-44.
Miranda, M.J., and J.W. Glauber: 1991, 'Estimation of dynamic nonlinear rational expectations models of commodity markets with private and government stockholding', paper presented at the annual meetings of the American Agricultural Economics Association, Manhattan, Kansas, August 4-7, 1991.
Putnam, J.J.: 1989, Food Consumption, Prices, and Expenditures, USDA/ERS Statistical Bulletin No. 773.
Sargent, T.J.: 1987a, Macroeconomic Theory, Boston: Academic Press.
Sargent, T.J.: 1987b, Dynamic Macroeconomic Theory, Cambridge: Harvard University Press.
Sargent, T.J.: 1978, 'Estimation of dynamic demand schedules under rational expectations', Journal of Political Economy 86, 1009-1044.
Sims, C.: 1990, 'Solving the stochastic growth model by backsolving with a particular nonlinear form for the decision rule', Journal of Business and Economic Statistics 8, 45-48.
Tauchen, G.: 1990, 'Solving the stochastic growth model by using quadrature methods and value-function iterations', Journal of Business and Economic Statistics 8, 49-51.
Taylor, J.B. and H. Uhlig: 1990, 'Solving nonlinear stochastic growth models: a comparison of alternative solution methods', Journal of Business and Economic Statistics 8, 1-17.
Wohlgenant, M.K.: 1989, 'Demand for farm output in a complete system of demand functions', American Journal of Agricultural Economics 71, 241-252.
Wohlgenant, M.K.: 1985, 'Competitive storage, rational expectations, and short-run food price determination', American Journal of Agricultural Economics 67, 739-748.
Zellner, A.: 1987, An Introduction to Bayesian Inference in Econometrics, Malabar: Robert E. Krieger Publishing Company.
GREGORY C. CHOW

Computation of Optimum Control Functions by Lagrange Multipliers

ABSTRACT. An algorithm is proposed to compute the optimal control function without solving for the value function in the Bellman equation of dynamic programming. The method is to solve a pair of vector equations for the control variables and the Lagrange multipliers associated with a set of first-order conditions for an optimal stochastic control problem. It approximates the vector control function and the vector Lagrangean function locally for each value of the state variables by linear functions. An example illustrates that such a local approximation is better than global approximations of the value function.

Previously (Chow, 1992a, 1993) I have shown that the optimum control function of a standard optimum control problem can be derived more conveniently by using Lagrange multipliers than by solving the Bellman partial differential equation for the value function. This derivation also provides numerical methods for computing the value of the optimum control corresponding to a given value of the state variable that are more accurate than those based on solving the Bellman equation. This paper explains the gain in numerical accuracy and illustrates it by example.

1. DERIVATION OF THE OPTIMAL CONTROL FUNCTION

Consider the following standard optimum control problem in discrete time (an analogous problem in continuous time is considered in Chow (1993), and the results of this paper apply equally well to that problem). Let $x_t$ be a column vector of $p$ state variables and $u_t$ be a vector of $q$ control variables. Let $r$ be a concave and twice differentiable function and $\beta$ be a discount factor. $E_t$ denotes conditional expectation given information at time $t$, which includes $x_t$. The problem is

$$\max_{\{u_t\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t r(x_t, u_t) \qquad (1)$$

subject to

$$x_{t+1} = f(x_t, u_t) + \epsilon_{t+1}, \qquad (2)$$

where $\epsilon_{t+1}$ is an i.i.d. random vector with mean zero and covariance matrix $\Sigma$.
Chow (1992a) solves this problem by introducing the $p \times 1$ vector $\lambda_t$ of Lagrange multipliers and setting to zero the derivatives of the Lagrangean expression

$$\mathcal{L} = E_0 \sum_{t=0}^{\infty} \left\{ \beta^t r(x_t, u_t) - \beta^{t+1} \lambda_{t+1}' [x_{t+1} - f(x_t, u_t) - \epsilon_{t+1}] \right\} \qquad (3)$$

with respect to $u_t$ and $x_t$ ($t = 0, 1, 2, \ldots$). The first-order conditions are

$$\frac{\partial r(x_t, u_t)}{\partial u_t} + \beta \frac{\partial f'(x_t, u_t)}{\partial u_t} E_t \lambda_{t+1} = 0, \qquad (4)$$

$$\lambda_t = \frac{\partial r(x_t, u_t)}{\partial x_t} + \beta \frac{\partial f'(x_t, u_t)}{\partial x_t} E_t \lambda_{t+1}. \qquad (5)$$

The optimum control at time $t$ is obtained by solving equations (4) and (5) for $u_t$ and $\lambda_t$.
The difficult part in solving these equations is the evaluation of the conditional expectation $E_t\lambda_{t+1}$, a problem to be treated shortly. We first point out the main
differences between this approach and that of solving the Bellman partial differential
equation for the value function V(x). First, it is not necessary to know the value
function to derive the optimum control function since the latter is a functional, not of $V$, but of the vector $\lambda$ of derivatives of $V$ with respect to the state variables. Thus,
obtaining the value function V requires more than is needed to obtain the optimum
control function and hence solves a more difficult problem than necessary. For
example, in the problem of static demand theory derived from maximizing consumer
utility subject to a budget constraint, Bellman's method amounts to finding the indirect
utility function by solving a partial differential equation, whereas we would apply
the method of Lagrange multipliers in obtaining the demand function. Second, our
equation (5) could be obtained by differentiating the Bellman equation with respect
to the state variables. This is a very important first order condition for optimality,
but it is ignored when one tries to solve the Bellman equation for the value function
and thus makes the solution of the optimum control problem more difficult. Third,
for most realistic applied problems an analytical solution for the value function is not
available. A common practice when solving the Bellman equation is to use a global
approximation to the value function when deriving the optimum control function.
By contrast, in solving equations (4) and (5) for a given Xt we avoid using a global
approximation to the Lagrange function in the neighborhood of $x_t$ and use instead a linear function to approximate $\lambda$ locally for each $x_t$. This typically yields a more
accurate approximation to the Lagrange function and hence to the corresponding
value function in the Bellman approach.

2. NUMERICAL SOLUTION OF THE FIRST ORDER CONDITIONS

To provide a numerical method for solving the first-order conditions (4) and (5), we approximate the Lagrange function in the neighborhood of $x_t$ by a linear function,

$$\lambda(x) = H_t x + h_t, \qquad (6)$$

where the $t$ subscripts of the parameters $H_t$ and $h_t$ indicate that the linear function (6) applies to points not too far from $x_t$, in particular to $x_{t+1}$. Thus

$$E_t \lambda_{t+1} = H_t f(x_t, u_t) + h_t. \qquad (7)$$
Taking $x_t$ as given, we try to solve (4) and (5) for $u_t$ and $\lambda_t$ using (7) for $E_t\lambda_{t+1}$. Substituting (7) into (4) yields

$$\frac{\partial r}{\partial u_t} + \beta \frac{\partial f'}{\partial u_t} (H_t f + h_t) = 0. \qquad (8)$$

Assuming tentatively $H_t$ and $h_t$ to be known, we solve (8) for $u_t$ using linear approximations of $\partial r/\partial u_t$, $\partial r/\partial x_t$ and $f$:

$$\frac{\partial r}{\partial u_t} = K_{2t} u_t + K_{21t} x_t + k_{2t}, \qquad \frac{\partial r}{\partial x_t} = K_{1t} x_t + K_{12t} u_t + k_{1t}, \qquad (9)$$

$$f(x_t, u_t) = A_t x_t + C_t u_t + b_t, \qquad (10)$$

where the time subscripts for the parameters of the linear functions indicate that the functions are valid for values of $x$ and $u$ near $x_t$ and the optimal $u_t^*$. These parameters are obtained by evaluating the partial derivatives of $r$ and $f$ at $x_t$ and some initial value for $u_t^*$, the latter to be revised after each iteration. Substituting (9) and (10) into (8) gives

$$K_{2t} u_t + K_{21t} x_t + k_{2t} + \beta C_t' H_t (A_t x_t + C_t u_t + b_t) + \beta C_t' h_t = 0. \qquad (11)$$

Equation (11) can be solved for $u_t$, yielding

$$u_t = G_t x_t + g_t, \qquad (12)$$

where

$$G_t = -(K_{2t} + \beta C_t' H_t C_t)^{-1} (K_{21t} + \beta C_t' H_t A_t), \qquad (13)$$

$$g_t = -(K_{2t} + \beta C_t' H_t C_t)^{-1} [k_{2t} + \beta C_t' (H_t b_t + h_t)]. \qquad (14)$$

To find the parameters $H_t$ and $h_t$ for $\lambda_t$, we substitute (6), (7), (9), (10) and (12) into (5) to get

$$H_t x_t + h_t = K_{1t} x_t + K_{12t}(G_t x_t + g_t) + k_{1t} + \beta A_t' H_t (A_t x_t + C_t G_t x_t + C_t g_t + b_t) + \beta A_t' h_t. \qquad (15)$$

Equating coefficients of (15) yields

$$H_t = K_{1t} + K_{12t} G_t + \beta A_t' H_t (A_t + C_t G_t), \qquad (16)$$

$$h_t = K_{12t} g_t + k_{1t} + \beta A_t' [H_t (C_t g_t + b_t) + h_t]. \qquad (17)$$

To solve equations (4) and (5) numerically, we assume some initial value for the optimal $u_t^*$ and linearize $\partial r/\partial u_t$, $\partial r/\partial x_t$ and $f$ about $x_t$ and this value of $u_t^*$ as in (9) and (10). We then solve the pair of equations (13) and (16) iteratively for $G_t$ and $H_t$. Given $G_t$ and $H_t$, the pair of equations (14) and (17) can be solved iteratively for $g_t$ and $h_t$. The value of the optimal control $u_t^*$ is found as $G_t x_t + g_t$. This value will be used to relinearize $\partial r/\partial u$, $\partial r/\partial x$ and $f$ until convergence.
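The following sketch (Python with NumPy) implements these iterations. The routine that linearizes $\partial r/\partial u$, $\partial r/\partial x$ and $f$ at $(x_t, u_t^*)$ is left as a hypothetical user-supplied function, since in practice it would be produced by analytical or numerical differentiation of the particular model; it is a sketch under those assumptions, not code from the original paper.

```python
import numpy as np

def local_control(x_t, u_init, linearize, beta, n_outer=10, n_inner=25, tol=1e-6):
    """Solve equations (13)-(17) locally at x_t.

    linearize(x_t, u_star) -> (K1, K12, K2, K21, k1, k2, A, C, b), the
    coefficients of the linear approximations (9)-(10) taken at (x_t, u_star)
    (a hypothetical user-supplied routine).
    Returns the locally linear rule (G, g, H, h) and u_star = G x_t + g.
    """
    n = len(x_t)
    u_star = np.asarray(u_init, dtype=float)
    for _ in range(n_outer):                        # relinearization loop
        K1, K12, K2, K21, k1, k2, A, C, b = linearize(x_t, u_star)
        H = np.zeros((n, n))
        for _ in range(n_inner):                    # iterate (13) and (16) for G, H
            M = np.linalg.inv(K2 + beta * C.T @ H @ C)
            G = -M @ (K21 + beta * C.T @ H @ A)
            H_new = K1 + K12 @ G + beta * A.T @ H @ (A + C @ G)
            if np.max(np.abs(H_new - H)) < tol:
                H = H_new
                break
            H = H_new
        h = np.zeros(n)
        for _ in range(n_inner):                    # iterate (14) and (17) for g, h
            M = np.linalg.inv(K2 + beta * C.T @ H @ C)
            g = -M @ (k2 + beta * C.T @ (H @ b + h))
            h_new = K12 @ g + k1 + beta * A.T @ (H @ (C @ g + b) + h)
            if np.max(np.abs(h_new - h)) < tol:
                h = h_new
                break
            h = h_new
        u_new = G @ x_t + g                         # candidate optimal control
        if np.max(np.abs(u_new - u_star)) < tol:
            return G, g, H, h, u_new
        u_star = u_new                              # relinearize at the new u*
    return G, g, H, h, u_star
```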
The reader may recognize that the numerical method suggested in this paper amounts to solving the well-known matrix Riccati equations (13) and (16) for $G_t$ and $H_t$ in linear-quadratic control problems. However, there are two important differences from the standard treatment of stochastic control by dynamic programming. First, our derivation is different as it does not use the value function at all. Second, we emphasize the solution of two equations (4) and (5) for $u_t$ and $\lambda_t$ while treating $x_t$ as given. We have avoided global approximations to the functions $u(x)$ and $\lambda(x)$, which can lead to large errors. We employ linear approximations to $u(x)$ and $\lambda(x)$ only locally about a given $x_t$ and build up the nonlinear functions $u(x)$ and $\lambda(x)$ by these locally linear approximations for different $x_t$. To generalize our second point, we can choose other methods to solve equations (4) and (5) for a given $x_t$. We could, for example, use a quadratic approximation to $\lambda(x)$ as discussed in Chow (1992a). We leave other numerical methods for solving equations (4) and (5) for future research.

3. AN ILLUSTRATIVE EXAMPLE

To demonstrate how a nonlinear optimal control function is computed numerically by locally linear approximations, I use a baseline real business cycle model presented by King, Plosser and Rebelo (1988) and analyzed by Watson (1990). The model consists of two control variables $u_{1t}$ and $u_{2t}$, representing consumption and labor input, respectively, and two state variables $x_{1t}$ and $x_{2t}$, denoting, respectively, $\log A_t$ and capital stock at the beginning of period $t$, where $A_t$ represents technology in the production function $q_t = x_{2t}^{1-\alpha}(A_t u_{2t})^{\alpha}$. The dynamic process (2) is

$$x_{1t} = \gamma + x_{1,t-1} + \epsilon_t, \qquad (18)$$
$$x_{2t} = (1 - \delta)\, x_{2,t-1} + x_{2,t-1}^{1-\alpha} \exp(\alpha x_{1,t-1})\, u_{2,t-1}^{\alpha} - u_{1,t-1}.$$

The first equation assumes $x_{1t} = \log A_t$ to be a random walk with a drift $\gamma$, $\epsilon_t$ being a random shock to technology. The second equation gives the evolution of capital stock $x_{2t}$, with $\delta$ denoting the rate of depreciation and investment being the difference between output $q_{t-1}$ given by the production function and consumption $u_{1,t-1}$. The utility function $r$ in (1) is assumed to be

$$r = \log u_{1t} + \theta \log(1 - u_{2t}), \qquad (19)$$



TABLE 1
Optimal control variables corresponding to selected state variables.
        u_1      u_2      x_1      x_2

1. 4.865 0.227 3.466 13.098


2. 5.028 0.219 3.518 13.423
3. 5.297 0.217 3.544 13.754
4. 5.341 0.208 3.559 14.153
5. 5.535 0.206 3.606 14.404
6. 5.656 0.202 3.663 14.672
7. 5.979 0.202 3.738 15.045
8. 6.486 0.208 3.814 15.754
9. 6.868 0.211 3.851 16.765
10. 7.232 0.212 3.851 17.783
11. 7.568 0.212 3.872 18.641
12. 7.901 0.220 3.884 19.715
13. 8.100 0.217 3.862 20.737
14. 8.681 0.230 3.891 21.636
15. 8.844 0.236 3.880 22.893
16. 8.807 0.230 3.862 24.126
17. 9.315 0.234 3.877 25.119
18. 9.983 0.245 3.895 26.441
19. 10.427 0.254 3.913 27.750

where $1 - u_{2t}$ denotes leisure.
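For concreteness, the return and transition functions of this example might be coded as follows. This is a sketch only: the parameter values are the maximum-likelihood estimates reported in the next paragraph, and the function signatures are chosen to match the linearization routine sketched above rather than any code used in the original study.

```python
import numpy as np

# parameter values reported in the text
alpha, beta, gamma, delta, theta = .6368, .8453, .00304, 1.77e-8, 3.5198

def r(x, u):
    """Per-period utility (19): log consumption plus weighted log leisure."""
    return np.log(u[0]) + theta * np.log(1.0 - u[1])

def f(x, u):
    """Deterministic part of the transition (18) for x = (log A, capital)."""
    x1_next = gamma + x[0]                                    # random walk with drift
    output = x[1]**(1 - alpha) * np.exp(alpha * x[0]) * u[1]**alpha
    x2_next = (1 - delta) * x[1] + output - u[0]              # capital accumulation
    return np.array([x1_next, x2_next])
```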


There are five parameters in this model: $\alpha$, the labor exponent in a Cobb-Douglas production function; $\beta$, the discount factor; $\gamma$, the drift in the random walk process for $x_{1t}$, which is the log of the Solow residual in the production function; $\delta$, the rate of depreciation for capital stock; and $\theta$, the weight given to leisure in the log-linear utility function of consumption $u_{1t}$ and leisure $1 - u_{2t}$. In Chow (1992b) I have estimated these five parameters by maximum likelihood using quarterly data of the United States from 1951.1 to 1988.4, covering 38 years. The subject of statistical estimation by maximum likelihood does not concern us in this paper. Here we take the resulting set of values for the parameters as given and examine how the linear approximations to the optimal control function change with the state variables. Let $\alpha = .6368$, $\beta = .8453$, $\gamma = .00304$, $\delta = 1.77 \times 10^{-8}$, and $\theta = 3.5198$. I have computed the optimal values for the control variables corresponding to 19 sets of state variables, which are the historical values of these state variables in the first quarters of the 19 years 1951, 1953, ..., 1987. The values of the four variables are given in Table 1. The parameters $G_{11}$, $G_{12}$, $G_{21}$, $G_{22}$, $g_1$, $g_2$ of the linear function corresponding to each set of state variables are given in Table 2. Table 2 illustrates how poor a global linear approximation to the optimal control function would be, as

TABLE 2
Parameters of linear optimal control functions.
       G_11     G_12      G_21      G_22      g_1      g_2
1. 1.255 0.0546 -0.219 -0.0095 0.165 1.223
2. 1.207 0.0513 -0.213 -0.0090 0.558 1.112
3. 1.280 0.0531 -0.209 -0.0087 0.384 1.107
4. 1.169 0.0471 -0.199 -0.0080 1.059 1.055
5. 1.197 0.0474 -0.199 -0.0079 1.085 1.071
6. 1.170 0.0455 -0.198 -0.0077 1.382 1.086
7. 1.263 0.0479 -0.203 -0.0077 1.182 1.136
8. 1.470 0.0532 -0.210 -0.0076 0.517 1.195
9. 1.602 0.0545 -0.211 -0.0072 0.207 1.202
10. 1.714 0.0550 -0.206 -0.0066 -0.048 1.167
11. 1.789 0.0547 -0.202 -0.0062 -0.123 1.149
12. 2.009 0.0581 -0.207 -0.0060 -0.957 1.164
13. 1.974 0.0543 -0.196 -0.0054 -0.495 1.090
14. 2.420 0.0638 -0.205 -0.0054 -2.500 1.142
15. 2.542 0.0633 -0.204 -0.0051 -2.820 1.122
16. 2.316 0.0548 -0.193 -0.0046 -1.354 1.042
17. 2.558 0.0581 -0.192 -0.0044 -2.262 1.036
18. 3.021 0.0652 -0.195 -0.0042 -4.254 1.050
19. 3.372 0.0693 -0.198 -0.0041 -5.693 1.063

TABLE 3
Regressions of coefficients of linear control functions on state variables (t statistics in parentheses).

 Dependent                 Explanatory variables
 variable     Constant       x_1          x_2        R^2
 G_11          0.391       -0.346        0.148      0.960
              (0.337)     (-0.986)      (13.16)
 G_12          0.044       -0.0029       0.0012     0.683
              (1.40)      (-0.304)      (3.93)
 G_21         -0.204       -0.0052       0.0011     0.405
              (4.21)      (-0.350)      (2.35)
 G_22         -0.023        0.0031       0.00027    0.975
             (-9.93)       (4.37)       (11.84)

the parameters of the locally linear approximations change with the state variables. To describe the changes, Table 3 presents linear regressions of four parameters on the two state variables and the accompanying t statistics and $R^2$ for descriptive purposes only (as the regressions are not based on a stochastic model).
For this example, we set the maximum number of iterations for solving the pair of equations for $G_t$ and $H_t$ using (13) and (16) to 25, given each value for $u_t^*$ used in linearizing $\partial r/\partial u$, $\partial r/\partial x$ and $f$. For our criterion of convergence to three significant figures, the maximum number of 25 is found to be better than 50 and 20. Once the optimal linear control function is found for $x_1$ and $x_2$ as of 1951.1, the optimum $u_t^*$, $G_t$, $H_t$, $g_t$ and $h_t$ can be used as initial values to compute the optimal linear control function corresponding to $x_1$ and $x_2$ as of 1953.1, and so forth. It takes about eight hours on a 486 personal computer to maximize a likelihood function with respect to the five parameters using a simulated annealing maximization algorithm (see Goffe, Ferrier and Rogers, 1992) which evaluates the likelihood function about 14,000 times (or about two seconds per evaluation of the likelihood function). At each evaluation, one must find the linear optimal control function for the given parameters, compute the residuals of the observed values of the control variables from the computed optimal values for 152 quarters, compute the value of the likelihood function, and determine the new values of the five parameters for the next functional evaluation, which may be time consuming. Hence, merely computing the linear optimal control function for a given set of parameters in our example should take less than one second on a 486 computer using GAUSS.
In this paper, I have shown how locally linear optimal control functions can be
computed for a standard stochastic control problem in discrete time. The algorithm
is based on solving two equations for the vectors of control variables and Lagrange
multipliers, given the vector of state variables. It is easy to implement using a personal
computer. It can serve as an important component of an algorithm for the statistical
estimation of the parameters of a stochastic control problem in econometrics.

ACKNOWLEDGEMENTS

The author would like to thank Chunsheng Zhou for excellent programming assistance
in obtaining the numerical results reported in this paper and David Belsley for helpful
comments on an early draft.

REFERENCES

Chow, Gregory C., "Dynamic optimization without dynamic programming," Economic Modelling, 9 (1992a), 3-9.
Chow, Gregory C., "Statistical estimation and testing of a real business cycle model," Princeton University, Econometric Research Program, Research Memorandum No. 365 (1992b).
Chow, Gregory C., "Optimal control without solving the Bellman equation," Journal of Economic Dynamics and Control, 17 (1993).
Goffe, William L., Gary Ferrier and John Rogers, "Global optimization of statistical functions," in Computational Economics and Econometrics, Vol. 1, eds. Hans M. Amman, D. A. Belsley, and Louis F. Pau, Dordrecht: Kluwer, 1992.
King, Robert G., Charles I. Plosser and S. T. Rebelo, "Production, growth, and business cycles: II. New directions," Journal of Monetary Economics, 21 (1988), 309-342.
Watson, Mark W., "Measures of fit for calibrated models," Northwestern University and Federal Reserve Bank of Chicago, mimeo, 1990.
PART TWO

The Computer and Economic Analysis


DAVID KENDRICK

Computational Approaches to Learning with Control Theory

ABSTRACT. Macroeconomics has just passed through a period in which it was assumed that everyone knew everything. Now hopefully we are moving into a period where those assumptions will be replaced with the more realistic ones that different actors have different information and learn in different ways. One approach to implementing these kinds of assumptions is available from control theory.
This paper discusses the learning procedures that are used in a variety of control theory methods. These methods begin with deterministic control with and without state variable and parameter updating. They also include two kinds of stochastic control: passive and active. With passive learning stochastic control, variables are chosen while considering the uncertainty in parameter estimates, but no attention is paid to the potential impact of today's control variables on future learning. By contrast, active learning control seeks a balance between reaching today's goals and gaining information that makes it easier to reach tomorrow's goals.

INTRODUCTION

We have just passed through a period in which the key assumptions in macroeconomic theory were that everyone knew everything. Now hopefully we are moving to a new period in which it is assumed that the various actors have different information about the economy; moreover, they learn, but they do so in different ways.
Recently Abhay Pethe (1992) has suggested that we are now in a position to develop dynamic empirical macroeconomic models in which some actors learn in a sophisticated fashion by engaging in active learning with dual control techniques while other actors learn only incidentally as new observations arrive and are processed to form new estimates. One subset of this latter group considers the uncertainty in the economic system when they choose their actions for the next period. The other subset ignores the uncertainty in choosing a course of action for the next period. Finally, there is a fourth group that does not even bother to update their parameter estimates as additional observations are obtained.
While it is possible that one or more of these subgroups will be empty in any real economy, the starting assumption that different actors have different information, choose their actions in different ways, and learn in different ways seems a much more realistic and solid foundation for macroeconomics than the assumptions of the previous era.


However, in the new period the analysis of macroeconomic systems will require different tools than those used in the previous era. While many results from the previous period could be obtained with analytical mathematics, the tools of the new era are much more likely to be computational. In anticipation of this, the current paper reviews the state of the art with regard to one set of tools that could serve well in the new era. These are the methods of control theory, which date back to the work of Simon (1956) and Theil (1957) as well as Aoki (1967), Livesey (1971), MacRae
(1972), Prescott (1972), Pindyck (1973), Chow (1975) and Abel (1975). These
methods are now enjoying a resurgence as attention turns once again to learning in
economic systems.
Also the resurgence is being abetted by technical changes in computer hardware
and software that have continued at a rapid pace in the last two decades. Control
methods that were difficult to use twenty years ago on mainframe computers can now
be used on ubiquitous desktop computers. Also, supercomputers, some with parallel processing capabilities, are rapidly opening an era in which even active learning stochastic control methods can be used on economic models of substantial size.
It is in this context that this paper examines the current state of the art in numerical
methods for control theory beginning with deterministic systems and passing through
passive learning methods to end with active learning systems. The emphasis is not
on the scope of the activity, since no attempt is made to be comprehensive. Rather
the focus will be on areas where new developments in hardware and software offer
us new opportunities. Also, some major problems that stand in our pathway will
be highlighted. Deterministic problems will be discussed first, followed by passive
learning and active learning problems.

1. DETERMINISTIC CONTROL

Since all uncertainty is ignored in solving deterministic control problems one is free
to use either quadratic-linear or general nonlinear methods. Consider first quadratic-
linear methods and then progress to the general nonlinear problems.
The deterministic quadratic-linear tracking problem is written as: find $(u_k)_{k=0}^{N-1}$ to minimize the cost functional

$$J = \frac{1}{2}\,[x_N - \tilde{x}_N]' W_N [x_N - \tilde{x}_N] + \frac{1}{2}\sum_{k=0}^{N-1}\left\{[x_k - \tilde{x}_k]' W [x_k - \tilde{x}_k] + [u_k - \tilde{u}_k]' \Lambda [u_k - \tilde{u}_k]\right\}, \qquad (1)$$

where

$x_k$ = state vector - an $n$ vector,
$\tilde{x}_k$ = desired state vector - an $n$ vector,
$u_k$ = control vector - an $m$ vector,
$\tilde{u}_k$ = desired control vector - an $m$ vector,
$W_N$ = symmetric state variable penalty matrix at terminal period, $N$,
$W$ = symmetric state variable penalty matrix for periods 0 thru $N-1$,
$\Lambda$ = symmetric control variable penalty matrix for periods 0 thru $N-1$,

subject to

$$x_{k+1} = A x_k + B u_k + C z_k, \qquad k = 0, \ldots, N-1, \qquad (2)$$

with Xo given, where

$A$ = state vector coefficient matrix ($n \times n$),
$B$ = control vector coefficient matrix ($n \times m$),
$C$ = exogenous vector coefficient matrix ($n \times \ell$),
$z_k$ = exogenous vector ($\ell \times 1$) at time $k$.

In a macroeconomic setting the state variables are typically unemployment, inflation, and the balance of payments, and the control variables are taxes, government
spending and the money supply. Following Pindyck (1973) the problem is set up as
tracking desired paths for both state and control variables as closely as possible.
The codes for solving this QLP (quadratic-linear) problem usually use Riccati
equations and are very fast. They were originally coded in Fortran and later in Pascal
and C. More recently they have been coded in metalanguages such as RATS and
GAUSS. The RATS and GAUSS implementations have the advantage that the model
can be estimated, simulated, and solved as an optimal control problem within the
same framework.
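
To make the Riccati approach concrete, the sketch below implements the backward recursion and forward simulation for a small regulator version of problem (1)-(2), with the desired paths and the exogenous term set to zero for brevity. It is only an illustrative Python skeleton, not the DUAL, RATS, or GAUSS code discussed here, and the matrices, horizon, and initial state in the example are invented for the illustration.

```python
# Minimal sketch of the backward Riccati sweep behind fast QLP solvers.
# Desired paths and the exogenous term C z_k are set to zero, reducing the
# tracking problem to a standard regulator; all numbers are illustrative.
import numpy as np

def lq_feedback_gains(A, B, W, WN, Lam, N):
    """Backward Riccati sweep: returns feedback gains F_k so that u_k = -F_k x_k."""
    K = WN
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        F = np.linalg.solve(Lam + B.T @ K @ B, B.T @ K @ A)
        K = W + A.T @ K @ A - A.T @ K @ B @ F
        gains[k] = F
    return gains

def simulate(A, B, gains, x0):
    """Forward sweep: apply the feedback rule from the given initial state."""
    x, path = x0, [x0]
    for F in gains:
        u = -F @ x
        x = A @ x + B @ u
        path.append(x)
    return path

# Small numerical example: two states, one control, ten periods.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
W, WN, Lam = np.eye(2), np.eye(2), np.array([[1.0]])
gains = lq_feedback_gains(A, B, W, WN, Lam, N=10)
print(simulate(A, B, gains, x0=np.array([1.0, 1.0]))[-1])   # terminal state after ten periods
```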
However, one of the most effective methods of solving this class of models is
the use of the GAMS language. GAMS is a modeling language that is set driven
so that one can create a set of state variables and a set of control variables and then
define mathematical relations over these sets. Thus it is not necessary to handcraft
each equation but rather only types of equations that are defined over the sets. For an
example of the solution of a quadratic linear macroeconometric model in the GAMS
language see Parasuk (1989).
While the Riccati methods are confined to quadratic-linear models, GAMS employs general nonlinear programming solvers such as MINOS to compute the solution to the model, so the user can alter his or her problem from quadratic-linear to general nonlinear and continue using the same modeling system and solver. In addition the
GAMS system is designed to be used with various solvers so, as technical progress is
made in the solver software, the users gain the benefit of these changes without hav-
ing to move the model from one code to another. An example is the recent addition
of Drud's CONOPT (1992) software to the GAMS package. Thus the user can shift
from using MINOS to using CONOPT by altering a single line in the GAMS problem
representation. Since nonlinear programming codes have comparative advantages
for different types of models, this ability to move easily between solvers could prove
to be most beneficial.

Examples of the use of GAMS for general nonlinear macroeconometric control
models are Fair's (1984) theoretical models for the household and the firm. These
models have the advantage over analytic methods, viz Turnovsky (1977), that they
can be extended to models with more than a few equations and can be solved for
transitory as well as steady state solutions. The solutions contain the same kind of
derivative sign information that was available with the analytical methods, viz when
the money supply increases interest rates will fall. However, the numerical methods
have the disadvantage that they yield results that hold only for the particular numerical
parameters used. This loss though is mitigated by the fact that the analytical methods
frequently encountered tradeoffs in which it was impossible to sign the outcomes.
Thus even with its disadvantages, Fair's method provides a new and fresh approach
to this part of macroeconomic theory.
Fair's original software implementation of this kind of modeling is relatively
difficult to use. However, GAMS can be used to develop both household and firm
models in an intuitive fashion so that the models can be easily altered. For an example
of the Fair type of models in GAMS see Park (1992).
Deterministic models can be used with three types of learning - or the lack
thereof. In the first type, the decision maker solves the deterministic model for a
number of time periods and then uses this solution over the time horizon without
solving the model again. Let's call this method "deterministic without update" or
simply "deterministic".
In the second type, the decision maker solves the models for many time periods
but only uses the policy values for the first time period. Then after the policy is
applied and new values of the state variable emerge, he solves the problem again
with these new state values as initial conditions. Once again, he uses only the first
period policy and then repeats the process in each time period. This method can be
called "deterministic with state update".
The third type is the same as the second except as new state variable observations
come available in each period they are used to update parameter estimates in the
system equations. This method can be called "deterministic with state and parameter
update".
In summary "deterministic" decision makers are those who ignore the effects of
uncertainty on the policy choice per se but they may engage in updating behavior with
state variables and/or parameters. It seems likely that the largest group of decision
makers fall into the second case. These people ignore uncertainty when making their
decisions but they update the initial conditions for their dynamic problem each period
as they move forward in time.

2. STOCHASTIC: PASSIVE LEARNING

Decision makers who use stochastic methods fall into two groups. Individuals in the
first group use passive learning methods. Decision makers in this group consider the
uncertainty in the system equation parameters while determining policies; however,
no consideration is given to the effect of the decisions on future learning. In contrast,
Computational Approaches to Learning with Control Theory 79

individuals who use active learning methods consider the possibility of perturbing
the system in order to decrease parameter uncertainty in the future.
There are two sources of uncertainty in passive learning models: (1) additive
error terms and (2) unknown parameters. There is also the possibility of state variable
measurement error in passive learning models; however, we delay the discussion of
measurement error until the next section of this paper.
The most basic passive learning quadratic-linear tracking problem is written as: find $(u_k)_{k=0}^{N-1}$ to minimize the cost functional

$$J = E\left\{\frac{1}{2}\,[x_N - \tilde{x}_N]' W_N [x_N - \tilde{x}_N] + \frac{1}{2}\sum_{k=0}^{N-1}\left\{[x_k - \tilde{x}_k]' W [x_k - \tilde{x}_k] + [u_k - \tilde{u}_k]' \Lambda [u_k - \tilde{u}_k]\right\}\right\}, \qquad (3)$$
where

$E$ = expectations operator,
$x_k$ = state vector - an $n$ vector,
$\tilde{x}_k$ = desired state vector - an $n$ vector,
$u_k$ = control vector - an $m$ vector,
$\tilde{u}_k$ = desired control vector - an $m$ vector,
$W_N$ = symmetric state variable penalty matrix at terminal period, $N$,
$W$ = symmetric state variable penalty matrix for periods 0 thru $N-1$,
$\Lambda$ = symmetric control variable penalty matrix for periods 0 thru $N-1$,

subject to

$$x_{k+1} = A x_k + B u_k + C z_k + \xi_k, \qquad k = 0, \ldots, N-1, \qquad (4)$$

with $x_0$ given, where

$A$ = state vector coefficient matrix ($n \times n$),
$B$ = control vector coefficient matrix ($n \times m$),
$C$ = exogenous vector coefficient matrix ($n \times \ell$),
$z_k$ = exogenous vector ($\ell \times 1$) at time $k$,

and

$$\xi_k \sim N(0, Q), \qquad \theta_0 \sim N(\hat{\theta}_0, \Sigma^{\theta\theta}_{0|0}),$$

where

$\xi_k$ = normally distributed disturbance with zero mean and known covariance $Q$,
$\theta_0$ = $s$ vector of unknown coefficients in $A$, $B$, and $C$ with initial estimate $\hat{\theta}_0$ and covariance $\Sigma^{\theta\theta}_{0|0}$ - both known,
$\Sigma^{\theta\theta}_{0|0}$ = known covariance matrix ($s \times s$) for initial period parameter estimates,
$Q$ = known covariance matrix ($n \times n$) for system disturbances, $\xi_k$.

In this method, the covariance of the parameters of the systems equations, $\Sigma^{\theta\theta}$, plays a major role in the choice of controls. The policy makers avoid controls that add to the uncertainty in the system by choosing controls that are associated with parameters with low uncertainty or by choosing combinations of controls that are associated with parameters that have negative covariances. Thus there is a motivation to hold a "portfolio" of controls that have relatively low uncertainty.
As is shown in Kendrick (1981, Ch. 6), passive learning controls can be computed
with a variant of the Riccati method that is computationally very efficient. However,
the calculations that involve the covariance of the parameters make this method
somewhat less efficient than deterministic methods. Thus, the loss in moving from
deterministic to passive learning stochastic methods is not computational efficiency
so much as restriction on model specification. In deterministic methods one can easily
move from quadratic-linear to general nonlinear specifications. However, stochastic
control methods in the algorithms used in this paper are restricted to linear systems
equations and normal distributions.
The reason for this restriction is that one needs to be able to map the uncertainty in
one period into the next period with dynamic equations. It is a desirable property of
such systems that the form of the distributions remain unchanged from one period to
the next. For example, linear relationships can be used to map normal distributions in
one period into normal distributions in the next period. In contrast, a quadratic rela-
tionship would map a normal distribution in one period into a chi square distribution
in the next period and a Wishart distribution in the third period.
Restricting systems equations, and therefore econometric models, to linear equa-
tions is a high price to pay for being able to do stochastic control. Hopefully this
restriction will soon be broken by advances in numerical methods. One promising
approach to nonlinear models is Matulka and Neck (1992).
Passive learning models were formerly solved on mainframe computers. How-
ever, the personal computers in use today are fast enough to permit solution of these
models on the desktop. For example, the DUAL code of Amman and Kendrick
(1991) has recently been made available on IBM PC's and compatibles. This code
has both passive and active learning capabilities, but for the time being it is expected
that most usage on personal computers will be in the passive mode. This will change
shortly with the widespread use of faster CISC and RISC microprocessors as is
discussed below.
In summary, with passive learning stochastic control the choice of control vari-
ables in each period is affected by the covariance of the parameters of the system
equation. Also, as with deterministic control there is updating of the parameter esti-
mates and of the state variables in each period. We do not define separate names for
the different updating behaviors because it seems sensible that any decision maker
who is sophisticated enough to consider the covariance of the parameters in choosing
his controls will also be sophisticated enough to update both parameter and state
variable estimates in each time period.
There is also a good possibility that passive learning stochastic control methods
can be applied to some game theory situations. Hatheway (1992) has developed
a deterministic dynamic game model for the U.S. and Japanese economies using
GAUSS. In Appendix B of his dissertation he outlines a method for extending his
methodology to passive learning stochastic control models.

3. STOCHASTIC: ACTIVE LEARNING

Next we consider the actor who considers the effects of the choice of control variables
in the current period on the future covariance of the parameters. This actor is
sophisticated enough to realize that perturbations to the system today will yield
improved parameter estimates that enable him to control the economic system better
in the future. He is also sophisticated enough to know that if the elements in the
covariance matrix are small, there will be little payoff to active learning efforts.
Moreover he knows that even if the elements in the covariance matrix are small it
may be worthwhile to attempt to learn if the additive system noises are large.
The model for this actor may be written as a general quadratic-linear tracking problem, which is to choose the control path $(u_k)_{k=0}^{N-1}$ to minimize the cost functional

$$J = E\left\{\frac{1}{2}\,[x_N - \tilde{x}_N]' W_N [x_N - \tilde{x}_N] + \frac{1}{2}\sum_{k=0}^{N-1}\left\{[x_k - \tilde{x}_k]' W [x_k - \tilde{x}_k] + [u_k - \tilde{u}_k]' \Lambda [u_k - \tilde{u}_k]\right\}\right\}, \qquad (5)$$

where

$E$ = expectations operator,
$x_k$ = state vector - an $n$ vector,
$\tilde{x}_k$ = desired state vector - an $n$ vector,
$u_k$ = control vector - an $m$ vector,
$\tilde{u}_k$ = desired control vector - an $m$ vector,
$W_N$ = symmetric state variable penalty matrix at terminal period, $N$,
$W$ = symmetric state variable penalty matrix for periods 0 thru $N-1$,
$\Lambda$ = symmetric control variable penalty matrix for periods 0 thru $N-1$,
subject to

$$x_{k+1} = A(\theta_k)\, x_k + B(\theta_k)\, u_k + C(\theta_k)\, z_k + \xi_k, \qquad k = 0, \ldots, N-1, \qquad (6)$$


with $x_0$ given, where

$A$ = state vector coefficient matrix ($n \times n$),
$B$ = control vector coefficient matrix ($n \times m$),
$C$ = exogenous vector coefficient matrix ($n \times \ell$),
$z_k$ = exogenous vector ($\ell \times 1$) at time $k$,
$\xi_k$ = additive system error term.

The measurement relations are

$$y_k = H x_k + \zeta_k, \qquad (7)$$

and the first order Markov process is

$$\theta_{k+1} = D\, \theta_k + \eta_k, \qquad (8)$$

where

$y_k$ = measurement vector - an $r$ vector,
$H$ = measurement coefficient matrix ($r \times n$),
$\zeta_k$ = measurement error term - an $r$ vector for each period,
$D$ = known Markov process matrix ($s \times s$),
$\eta_k$ = time-varying parameter error term - an $s$ vector for each period,

where the vectors $\xi_k$, $\zeta_k$, $\eta_k$, $x_0$, $\theta_0$ are assumed to be mutually independent, normally distributed random vectors with known means and covariances (positive semi-definite):

initial period state: $x_0 \sim N(\hat{x}_0, \Sigma^{xx}_{0|0})$,
initial parameters: $\theta_0 \sim N(\hat{\theta}_0, \Sigma^{\theta\theta}_{0|0})$,
system noise: $\xi_k \sim N(0, Q)$,
measurement noise: $\zeta_k \sim N(0, R)$,
Markov process noise: $\eta_k \sim N(0, G)$,

and where

$\Sigma^{xx}_{0|0}$ = known covariance matrix ($n \times n$) for initial period state variables,
$\Sigma^{\theta\theta}_{0|0}$ = known covariance matrix ($s \times s$) for initial period parameter estimates,
$Q$ = known covariance matrix ($n \times n$) for system disturbances, $\xi_k$,
$R$ = known covariance matrix ($r \times r$) for measurement disturbances, $\zeta_k$,
$G$ = known covariance matrix ($s \times s$) for Markov disturbances, $\eta_k$.

Measurement error is also included in this model. Thus the state variables are not
observed directly but rather through a noisy process. Of course as the sizes of the
measurement errors decrease the gain to active learning efforts will increase.

The presence of measurement error in models with distributed lags also raises
the following issue: Normally some data are collected in each time period, and flash
estimates are issued before the full data set has been collected and processed. Thus
the most recent state estimate will be the noisiest while state variables from several
periods ago will have less noise associated with them. So there is a premium on using
data from several periods ago in the feedback rule. However, to control a system well
one wants to use the most recent state variables. This tradeoff between recent states
with noisy measurements and lagged states with less noisy measurement has not yet
been studied numerically. However, the computer code to facilitate such work is
already available.
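
The point about noisy recent states can be illustrated with a scalar Kalman measurement update. The sketch below is purely illustrative, with invented numbers, and is not the filter in the DUAL code: it simply shows that a state observed with a large measurement variance retains a larger filtered variance than one observed precisely.

```python
# Scalar Kalman measurement update: a "flash" estimate observed with a large
# measurement variance R keeps a larger posterior variance than revised data.
def measurement_update(x_pred, p_pred, y, h, R):
    """Update a scalar estimate x_pred (variance p_pred) with observation y = h*x + noise."""
    k = p_pred * h / (h * h * p_pred + R)     # Kalman gain
    x_post = x_pred + k * (y - h * x_pred)
    p_post = (1.0 - k * h) * p_pred
    return x_post, p_post

p_prior = 1.0
for label, R in (("flash estimate (noisy)", 2.0), ("revised data (precise)", 0.1)):
    _, p_post = measurement_update(x_pred=0.0, p_pred=p_prior, y=0.5, h=1.0, R=R)
    print(f"{label:25s}  posterior variance = {p_post:.3f}")
```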
The problem setup here is general enough to include not only measurement errors
but also to permit inclusion of time varying parameters. This level of sophistication
has not yet been programmed into our numerical codes, but the mathematical deriva-
tions and separate program development have been done by Tucci (1989). When
time varying parameters are present, the parameter covariance elements are likely
to be larger, so there will be more gain from active learning efforts. On the other
hand parameters learned today will be changing in the future; therefore there is less
potential gain from learning. This tradeoff has not yet been studied numerically.
Active learning stochastic control can be done with the DUAL code mentioned
above. This program has recently been modified and versions developed for super-
computers and workstations as well as mainframes. We have versions running on
Cray and IBM supercomputers, IBM mainframes and SUN and IBM workstations.
In addition a version for IBM PC's and compatibles has recently been developed. We
have discovered that we can solve small active learning problems even on IBM AT
computers with 80286 chips and substantially larger models on IBM PS/2 computers
with 80386 chips. Thus we are confident that the 486 chips and beyond will have
the capability to solve active learning stochastic control problems with a number of
states and controls.
Also we have found that it is possible to do large numbers of Monte Carlo runs on
very small models using SUN and IBM workstations. So far these experiments have
shown that actors who are sophisticated enough to employ active learning techniques
will not necessarily perform better on average than actors who use passive learning
stochastic control methods or even in some cases deterministic methods, cf. Amman
and Kendrick (1994). However, we are treating these results with some caution
because of the possibility that nonconvexities in the cost-to-go can affect them.
More than ten years ago Kendrick (1978) and Norman, Norman, and Palash
(1979) first encountered nonconvexities in active learning stochastic control prob-
lems. However, these results were obtained with computer codes of such complexity
that it was uncertain whether the nonconvexities were fundamental or not.
Also, the codes and computers of that time were not fast enough to permit detailed
studies of the problem. However, recently Mizrach (1991) has cast new light on this
problem by providing detailed derivations for the single-state, single-control problem
of MacRae (1972). He found that the nonconvexity was not a passing phenomenon
but rather was fundamental to active learning problems solved with the Tse and
Bar-Shalom (1973) algorithm.

Amman and Kendrick (1992) then followed Mizrach's work, confirming his results numerically and identifying the cause of the nonconvexities as the initial covariance of the unknown parameter. As an aid to understanding this result
consider the MacRae model that is stated below.
The MacRae model was chosen for this work because it is the simplest possible
adaptive control problem. If nonconvexities occur in this problem then one can
expect that they will also appear in more complex models. The MacRae model is

find $(u_0, u_1)$ to minimize

$$J = E\left\{\frac{1}{2} w_2 x_2^2 + \frac{1}{2}\sum_{k=0}^{1}\left(w_k x_k^2 + \lambda_k u_k^2\right)\right\} \qquad (9)$$

subject to

$$x_{k+1} = a x_k + b u_k + c + e_k, \qquad \text{for } k = 0, 1, \qquad (10)$$

$$x_0 = 0. \qquad (11)$$

The parameter values used by MacRae are

$$a = 0.7, \quad b = -0.5, \quad c = 3.5, \quad \sigma^2_e = q = 0.2,$$
$$w_k = 1\ \forall k, \quad \lambda_k = 1\ \forall k, \quad \sigma^2_b = 0.5, \quad \sigma^2_a = \sigma^2_c = 0.$$

Also the desired paths in (9) are implicitly set to zero, so

$$\tilde{x}_k = 0, \quad \tilde{u}_k = 0\ \forall k.$$

This problem has been solved using the dual control algorithm of Tse and Bar-Shalom as described in detail in Ch. 11 of Kendrick (1981). At period $k$ of an $N$ period model with $N-k$ periods to go, the total cost-to-go can be written as

$$J_{N-k} = J_{D,N-k} + J_{C,N-k} + J_{P,N-k}, \qquad (12)$$

where the $D$, $C$, and $P$ subscripts represent the deterministic, cautionary, and probing components, respectively. The deterministic term includes all of the nonstochastic elements. The cautionary term is a function of $\Sigma_{k+1|k}$, i.e. of the uncertainty in the next period before a new control can be applied. The probing term can be written as

$$J_{P,N-k} = \frac{1}{2}\,\mathrm{tr}\sum_{j=k+1}^{N-1}\left(R_j\, \Sigma^{\theta\theta}_{j|j}\right), \qquad (13)$$

where $R_j$ is a Riccati-like term and $\Sigma^{\theta\theta}_{j|j}$ is the covariance matrix of the unknown parameters in period $j$ after updating with data through period $j$. Notice that the probing term is a function of the parameter covariance matrix for all periods from the current to the terminal period.
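
The sketch below illustrates how such a cost surface can be explored numerically. It is not the Tse and Bar-Shalom algorithm; it is a brute-force Monte Carlo evaluation of the two-period MacRae problem in which, for each candidate $u_0$ on a grid, the unknown $b$ is drawn from its prior, $x_1$ is simulated, $b$ is updated by Bayes' rule, and the second-period control is then chosen given the posterior. The grid range and the number of draws are arbitrary choices made for the illustration.

```python
# Brute-force Monte Carlo evaluation of expected cost as a function of u0 in
# the two-period MacRae problem (not the Tse-Bar-Shalom approximation).
import numpy as np

a, c, q = 0.7, 3.5, 0.2                 # known coefficients and noise variance
b_hat0, sigma2_b0 = -0.5, 0.5           # prior mean and variance of the unknown b
w, lam = 1.0, 1.0                       # state and control penalties

rng = np.random.default_rng(0)

def expected_cost(u0, sigma2_b=sigma2_b0, draws=20000):
    """Monte Carlo estimate of E[J] for a given first-period control u0."""
    b = b_hat0 + np.sqrt(sigma2_b) * rng.standard_normal(draws)   # prior draws of b
    e0 = np.sqrt(q) * rng.standard_normal(draws)
    e1 = np.sqrt(q) * rng.standard_normal(draws)

    x1 = b * u0 + c + e0                      # x0 = 0, so the a*x0 term drops out
    # Bayesian update of b from the single observation x1 - c = b*u0 + e0
    sigma2_post = 1.0 / (1.0 / sigma2_b + u0**2 / q)
    b_post = sigma2_post * (b_hat0 / sigma2_b + u0 * (x1 - c) / q)

    # Period-1 control: minimize E[0.5*w*x2^2 + 0.5*lam*u1^2 | x1, posterior on b]
    u1 = -w * b_post * (a * x1 + c) / (w * (b_post**2 + sigma2_post) + lam)

    x2 = a * x1 + b * u1 + c + e1
    cost = 0.5 * w * x2**2 + 0.5 * (w * x1**2 + lam * u1**2) + 0.5 * lam * u0**2
    return cost.mean()

if __name__ == "__main__":
    for u0 in np.linspace(-15.0, 15.0, 7):
        print(f"u0 = {u0:7.2f}   approx E[J] = {expected_cost(u0):10.3f}")
    # Re-running with a larger prior variance, e.g. expected_cost(u0, sigma2_b=4.0),
    # mimics the kind of experiment behind Figure 1.
```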
[Figure 1. Effects of $\sigma^2$ on the total cost-to-go: four panels plot the cost-to-go $J_N$ against the initial period control $u_0$ for $\sigma^2$ = 0.5, 1.0, 2.0, and 4.0.]

It is this probing term that is the primary source of nonconvexities. In fact Amman
and Kendrick (1992) have shown that the nonconvexities can be switched off and on
by altering the initial variance of the uncertain b parameter. An example of this sort
is shown in Figure 1. With the setting of $\sigma^2 = 0.5$ at the top of the figure, the cost-to-go
function remains a convex function of the initial period control, $u_0$. However, as $\sigma^2$
increases the nonconvexity appears and causes two local optima for the problem.
In addition to the nonconvexities from the probing term, Amman and Kendrick
also found that there are combinations of parameter values which, in conjunction
with large values of $\sigma^2$, will result in nonconvexities also arising in the cautionary
term.
So the bad news is that the nonconvexities appear to be fundamental to active
learning stochastic control problems. However, the good news is that there may be
some regularities about these nonconvexities that can be exploited to design efficient
solution algorithms. Also, even if brute force grid search methods must be employed,
computer speeds are increasing so rapidly that models that exhibit nonconvexities
can be solved.
Finally, there is some prospect that the parameter values in empirical economic
models will be such that the nonconvexities occur only rarely. It will take some time
and effort to establish this fact, but one can be hopeful that this will occur.

4. CONCLUSIONS

Economists do not need to use the unrealistic assumption that all economic actors
know everything. Rather there are tools at hand that will allow us to portray different
actors as having different information and able to learn as time passes. Moreover,
there are available algorithms and computer codes for modeling different kinds of
learning behavior in different actors. Some actors may be so sophisticated as to use
active learning methods in which they probe the system in order to improve parameter
estimates over time. Other actors may be sophisticated enough to consider the
covariance of parameters in choosing their control variables but learn only passively
with the arrival of new information. Other actors may be so unsophisticated that they
do not even update parameter estimates when new observations arrive.
Since computer speeds have increased greatly in recent years we can now model
all these kinds of behaviors using code that operates on supercomputers, workstations
and even personal computers.
However, the most sophisticated methods that involve active learning can give
rise to nonconvexities in the cost-to-go, so caution must be exercised until we can
learn more about when these nonconvexities arise and how to solve active learning
problems when they do occur.

REFERENCES

Abel, Andrew (1975), "A Comparison of Three Control Algorithms to the Monetarist-Fiscalist
Debate," Annals ofEconomic and Social Measurement, Vol. 4, No.2, pp. 239-252, Spring.
Amman, Hans M. and David A. Kendrick (1991), "A User's Guide for DUAL, A Program
for Quadratic-Linear Stochastic Control Problems, Version 3.0", Technical Paper T90--94,
Center for Economic Research, The University of Texas, Austin, Texas 78712.
Amman, Hans M. and David A. Kendrick (1992), "Nonconvexities in Stochastic Control
Models", Paper 92-91, Center for Economic Research, The University of Texas, Austin,
Texas, 78712.
Amman, Hans M. and David A. Kendrick (1994), "Active Learning - Monte Carlo Results,"
forthcoming in 1994 in Vol. 18 of the Journal of Economic Dynamics and Control.
Aoki, Masanao (1967), Optimization of Stochastic Systems, Academic Press, New York.
Chow, Gregory (1975), Analysis and Control of Dynamic Systems, John Wiley and Sons, Inc.,
New York.
Drud, Arne (1992), "CONOPT - A Large Scale GRG Code," forthcoming in the ORSA Journal
on Computing.

Fair, Ray (1984), Specification, Estimation and Analysis of Macroeconometric Models, Har-
vard University Press, Cambridge, Mass. 02138.
Hatheway, Lawrence (1992), Modeling International Economic Interdependence: An Appli-
cation of Feedback Nash Dynamic Games, Ph.D. Dissertation, Department of Economics,
The University of Texas, Austin, Texas 78712.
Kendrick, David A (1978), "Non-convexities from Probing an Adaptive Control Problem,"
Economics Letters, Vol. 1, pp. 347-351.
Kendrick, David A. (1981), Stochastic Control for Economic Models, McGraw-Hill Book
Company, New York.
Livesey, David A (1971), "Optimizing Short-Term Economic Policy," Economic Journal,
Vol. 81, pp. 525-546.
MacRae, Elizabeth Chase (1972), "Linear Decision with Experimentation," Annals of Economic and Social Measurement, Vol. 1, No. 4, October, pp. 437-448.
Matulka, Josef and Reinhard Neck (1992), "A New Algorithm for Optimum Stochastic Control
on Nonlinear Economic Models," forthcoming in the European Journal of Operations
Research.
Mizrach, Bruce (1991), "Non-Convexities in a Stochastic Control Problem with Learning,"
Journal of Economic Dynamics and Control, Vol. 15, No.3, pp. 515-538.
Norman, A., M. Norman and C. Palash (1979), "Multiple Relative Maxima in Optimal Macroe-
conomic Policy: An Illustration", Southern Economic Journal, 46, 274-279.
Parasuk, Chartchai (1989), Application of Optimal Control Techniques in Calculating Equi-
librium Exchange Rates, Ph.D. Dissertation, Department of Economics, The University of
Texas, Austin, Texas 78712.
Park, Jin-Seok (1992), A Macroeconomic Model of Monopoly: A Theoretical Simulation
Approach and Optimal Control Applications, Ph.D. dissertation in progress, Department
of Economics, University of Texas, Austin, Texas 78712.
Pethe, Abhay (1992), "Using Stochastic Control in Economics: Some Issues", Working Paper
92-5, Center for Economic Research, The University of Texas, Austin, Texas, 78712.
Pindyck, Robert S. (1973), Optimal Planning for Economic Stabilization, North Holland
Publishing Co., Amsterdam.
Prescott, E. C. (1972), "The Multi-period Control Problem under Uncertainty," Econometrica,
Vol. 40, pp. 1043-1058.
Simon, H. A (1956), "Dynamic Programming under Uncertainty with a Quadratic Criterion
Function," Econometrica, Vol. 24, pp. 74-81, January.
Theil, H. (1957), "A Note on Certainty Equivalence in Dynamic Planning," Econometrica,
Vol. 25, pp. 346-349, April.
Tse, Edison and Yaakov Bar-Shalom (1973), "An Actively Adaptive Control for Linear Sys-
tems with Random Parameters," IEEE Transactions on Automatic Control, Vol. AC-17,
pp. 38-52, February.
Tucci, Marco (1989), Time Varying Parameters in Adaptive Control, Center for Economic
Research, The University of Texas, Austin, Texas 78712.
Turnovsky, Stephen J. (1973), "Optimal Stabilization Policies for Deterministic and Stochastic
Linear Systems", Review of Economic Studies, Vol. 40.
Turnovsky, Stephen J. (1977), Macroeconomic Analysis and Stabilization Policy, Cambridge
University Press, London.
ALFRED LORN NORMAN

Computability, Complexity and Economics

ABSTRACT. Herbert Simon advocates that economists should study procedural rationality
instead of substantive rationality. One approach for studying procedural rationality is to con-
sider algorithmic representations of procedures, which can then be studied using the concepts
of computability and complexity. For some time, game theorists have considered the issue of
computability and have employed automata to study bounded rationality. Outside game theory
very little research has been performed. Very simple examples of the traditional economic
optimization models can require transfinite computations. The impact of procedural rationality
on economics depends on the computational resources available to economic agents.

1. INTRODUCTION

H. Simon (1976) suggests that the proper study of rationality in economics is pro-
cedural rationality. Simon believes that procedural rationality should encompass
the cognitive process in searching for solutions to problems. This study should be
performed using computational mathematics, which he defines as the analysis of
the relative efficiencies of different computational processes for solving problems of
various kinds. "The search for computational efficiency is a search for procedural
rationality, ... " In this paper, problem-solving processes are formalized as algorithms
for solving economic problems. Placed in an algorithmic format, procedural ratio-
nality can be studied using the theory of computability and complexity developed by
mathematicians and computer scientists.
In Section 2 the concepts of computability and complexity are presented. The
traditional format of computability is for finite representations. One example is finite
sequences from a finite alphabet. Another is the study of functions $f : N^n \to N^k$, $n \ge 0$, $k > 0$, where $N$ is the natural numbers $0, 1, 2, \ldots$. While this model
is appropriate for studying finite state game theory, it is not applicable to most
traditional single agent optimization problems such as the theory of the firm or the
consumer defined as optimization problems over the reals. To study the complexity
of such problems, the information-based complexity concept of Traub, Wasilkowski
and Woźniakowski (1988) is recommended. This approach encompasses both finite
representable combinatorial complexity as well as optimization over the reals. An
important question in complexity theory is whether a problem is tractable, that is, whether it can
be computed with polynomial resources.
One application of complexity theory is determining the computational cost of
achieving accuracy in algorithms used in numerical analysis such as integration.
Economists should perform such analyses for algorithms used in optimization models
and econometrics. A start in this direction has been made by Norman and Jung
(1977), Norman (1981, 1994) and Rustem and Velupillai (1987) in the area of linear
quadratic control. In this paper, we focus on the relationship between computability
and complexity and economic theory with special emphasis on bounded rationality.
In this paper we focus on computational complexity and do not consider dynamic
complexity arising from chaotic behavior even though Spear (1989) demonstrates
that the two concepts are related.
In section 3 the literature concerning computability, complexity and bounded
rationality in finite action game theory is considered. This literature dates back at
least to Rabin's (1957) demonstration of the existence of a noncomputable strategy.
More recently Binmore (1990) and Canning (1992) have considered the impact
of restricting players to computable algorithms. Since Aumann's (1981) suggestion,
game theorists have modeled bounded rationality by replacing players with automata.
A brief survey of this literature is presented. For automata theory there are two
types of complexity: the computational complexity of computing the best-response
automaton and the strategic complexity of implementing the strategy. Overall, game
theory contains many problems currently considered intractable.
Outside of game theory very little research in economics has been done on com-
putability and complexity and their relationship to bounded rationality. The literature
concerning the theory of the firm and the theory of the consumer is considered in
Section 4. Norman (1994) demonstrates that very simple models of the firm can
require transfinite computations to determine profit maximization. Also, such trans-
finite problems cannot be ignored by appealing to concepts such as $\epsilon$-rationality,
because the computational complexity of $\epsilon$-optimization can be exponential, that is,
intractable. Beja (1989) and Rustem and Velupillai (1989) demonstrate a fatal flaw in
the traditional choice model. Norman (1992) proposes a new discrete-mathematics
consumer model for choice with technological change.
In Section 5 we briefly consider two miscellaneous, unrelated articles. The first
is Spear's (1989a) use of computability theory to characterize the identification of
a rational expectation equilibrium. The second is Norman's (1987) use of compu-
tational complexity to characterize alternative mechanisms to clear the Ostroy-Starr
(1974) household exchange problem.
Section 6 forecasts the impact of computability and complexity on economics.
If bounded rationality is interpreted as optimization with a computational resource
restriction, the impact on economic theory depends on whether the restriction is
computability, tractability or linearity.
Finally, the reader is warned that because symbol usage generally follows the
references, some symbols are used for several purposes in the paper.

2. COMPUTABILITY AND COMPLEXITY

There are several approaches to the theory of computability that include recursive
functions, Turing machines, algorithms, and rewrite systems. Because these alterna-
tives are equivalent up to a coding (transformation), the choice selected should be
that most accessible to the reader. While mathematicians, and hence economic theo-
rists, generally prefer the recursive function approach, the readers of Computational
Economics are likely to prefer an algorithmic approach that is intuitively obvious to
economists with some computer programming experience.
Let us consider the algorithmic approach to computability of Sommerhalder
and van Westrhenen (1988), which analyzes the properties of simple-algorithmic-
language, SAL(N), programs. A SAL(N) program is a mathematical entity defined
as a quadruple (n, k, p, P), where P is a sequence of SAL statements, and the
variables occurring in the sequence $P$ belong to $x_1, \ldots, x_p \in N^p$. Of these $p$
variables, $n \ge 0$ are input variables and $k > 0$ are output variables. There are two
types of SAL statements:

1. Assignment statements: (Note: we will use $\leftarrow$, not :=, for assignment)

$x_i \leftarrow 0$
$x_i \leftarrow x_j$
$x_i \leftarrow x_j + 1$ (Successor)
$x_i \leftarrow x_j \ominus 1$ (if $x_j = 0$ then $x_i \leftarrow 0$ else $x_i \leftarrow x_j - 1$) (Predecessor)

2. While statement:
while $x_i \ne 0$ do $S$ od, where $S$ is a sequence of SAL statements.
The set $F(\mathrm{SAL}(N))$ contains all functions $f : N^n \to N^k$, $n \ge 0$, $k > 0$, for which there exists a SAL(N) program $(n, k, p, P)$ which, given an input $(x_1, \ldots, x_n)$, computes $(x_1, \ldots, x_k) = f(x_1, \ldots, x_n)$ as output in a finite number of steps. If the program computes an output for at least one input, the function is a partial recursive function, and if the program computes an output for every input, the function is a total recursive function. The set $F(\mathrm{SAL}(N))$ is equivalent to the set of recursive functions.
(See Sommerhalder and van Westrhenen, 1988).
In SAL programs, arithmetic operations such as $+$, $-$, $\times$, and $\div$ must be constructed as macros. For example the addition macro $x_i \leftarrow \mathrm{ADD}(x_i, x_j)$ can be constructed as

while $x_j \ne 0$ do
    $x_i \leftarrow x_i + 1$
    $x_j \leftarrow x_j \ominus 1$
od
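
For readers more comfortable with a conventional programming language, the following is a minimal Python rendering of the same macro, using only the successor and predecessor operations permitted in SAL(N); the function name sal_add is ours, not part of SAL.

```python
# The SAL(N) addition macro, emulated with only successor and predecessor steps.
def sal_add(xi: int, xj: int) -> int:
    """Compute xi + xj in the style of the SAL(N) while loop above."""
    while xj != 0:
        xi = xi + 1                     # successor:   xi <- xi + 1
        xj = xj - 1 if xj > 0 else 0    # predecessor: xj <- xj (-) 1
    return xi

assert sal_add(3, 4) == 7
```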

Adding the standard arithmetic operations as statements in SAL would decrease the
number of statements required to compute a function; nevertheless, if a function was
not computable without arithmetic statements, it would not become computable with
arithmetic statements. Computability addresses the issue of what can be computed
in a finite number of statements, not the number of statements. For simplicity, it is
desirable to keep the instruction set of SAL to the minimum.

A focus of computability theory is decidability, that is, whether a predicate can be determined in a finite number of steps. One of these predicates is the halting
problem for SAL(N) programs: Does computation on a given input x, induced by a
given program P, terminate? This problem is not solvable by algorithmic methods;
that is, we cannot construct a universal program to answer this question. A famous
example is Hilbert's tenth problem: Can we construct an algorithm to determine
whether a given polynomial equation with integer coefficients has an integer solution?
Matijasevič (1971) determined that such an algorithm cannot be constructed. Also, computability is closely related to Gödel's work on the limitations of constructive mathematics. Indeed, a frequently used concept in computability proofs is the Gödel
index scheme.
An economic example that clarifies the concept of decidability is the use of order
and rank conditions to determine if a simultaneous structural econometric model is
identified. Since these conditions can be checked by a program running in polynomial
time, the issue is decidable. However, determining the precise values of the unknown,
identified parameters is not decidable, because (besides being real numbers) these
are asymptotic limits that cannot be determined with a finite number of observations
and calculations.
Complexity theory addresses the issue of categorizing the difficulties of alternative
problems that can be solved in a finite number of steps. For an overview of formal
complexity theory see Hartmanis (1989). With the exception of Blum's (1967)
axiomatic approach, theorists concerned with the properties of complexity classes,
such as whether P is a proper subset of NP, have traditionally used the Turing machine
as their model of computation since it provides a common frame of reference for
considering such relationships.
However, because the Turing model is very tedious to apply to specific finite-
representable combinatorial problems and this model is not applicable without modi-
fication for most real-number numerical analysis problems, complexity practitioners
have constructed many computer models appropriate for the study of particular prob-
lems and algorithms. For an overview of such models and complexity applications,
see Aho, Hopcroft and Ullman (1974). For example, a straight line program model
is generally used in matrix multiplication analysis, and a decision tree, for which each
node represents a comparison, is generally used in sorting analysis. Also, applied
complexity analysis frequently uses a cost function to reflect the cost of performing
the operation central to the algorithm in question. Nevertheless, because most com-
mon models of computation are polynomial related, and the asymptotic definitions of
complexity only count the most frequently occurring operation, the various models
are closely related.
Because most economic optimization problems are defined over the reals - not
the natural numbers - we need a notion of computational complexity general enough
to deal with both types of formulations. Two major branches of computational com-
plexity are information-based complexity and combinatorial complexity. The former
deals with the difficulty of approximating solutions to problems where information
is partial, noisy, and costly. The latter deals with problems that can be solved exactly
in a finite number of computations and for which information is complete, exact, and
costless. Most combinatorial problems can be represented using natural numbers or
finite sequences from a finite alphabet. In this paper we need a computational model
that can be used to study both information-based and combinatorial complexity. For
this purpose we employ a slightly modified version of the information-based compu-
tational model of Traub, Wasilkowski and Woźniakowski (1988). Here all arithmetic
and combinatorial operations are assumed to be performed with infinite precision.
Let the economic problem set be designated

$$F \subseteq I, \qquad (1)$$

where $I$ is the input set. The solution operation is $S : F \to G$, where $G$ is a normed linear space. In cases where $G$ is not a normed linear space, there is a generalized solution operator that need not be discussed in this paper. Associated with each problem element is a solution element $S(f)$. Let $U(f)$ be the computed approximation to $S(f)$, with absolute error measured by $|S(f) - U(f)|$. We shall say that $U(f)$ is an $\epsilon$-approximation iff $|S(f) - U(f)| \le \epsilon$.
To compute these $\epsilon$-approximations we may need information about $f$. We gather knowledge about $f$ through the use of information operations $\gamma : F \to H$. For each problem element $f \in F$, we compute a number of information operations, which can either be adaptive or nonadaptive. Associated with the set of information operations $\Gamma = \{\gamma_1, \ldots, \gamma_L\}$ is a cost vector $C^{\Gamma} = \{c_{\gamma_1}, \ldots, c_{\gamma_L}\}$. In numerical analysis, an example of an information operation would be the cost necessary to obtain the value of a function at a point in an integration procedure based on function evaluations. In economics, information operations could be used to represent the cost of acquiring data in the marketplace. The knowledge of $f$ obtained through information operators is represented as $N(f)$.
Given the information obtained from the information operators, the $\epsilon$-approximations are computed using a specified set of combinatory operations, $\Omega = \{\omega_1, \ldots, \omega_K\}$. Associated with these combinatory operations is a cost vector $C^{\Omega} = \{c_{\omega_1}, \ldots, c_{\omega_K}\}$. The operations to be included in $\Omega$ constitute an important component of the model of computation. For SAL(N), these operations are the "assignment" and "while" statements. For the study of the computational complexity of numerical analysis problems, $\Omega$ consists of arithmetic operations, comparison of real numbers, and the evaluation of certain elementary functions. For economic problems, we will introduce additional operators. Some of these information and combinatory operators will be considered oracles, that is black boxes that can perform an operation with a specified cost. We do not consider how the black box performs the operation.
For each $f \in F$, we desire to compute an $\epsilon$-approximation $U(f)$ of the true solution $S(f)$, where $\epsilon = 0$ corresponds to an exact solution. From knowing $N(f)$, the approximation $U(f)$ is computed by a mapping $\phi$ that corresponds to an algorithm, where $U(f) = \phi(N(f))$, with

$$\phi : N(f) \to G, \qquad (2)$$

and the goal is to compute $\phi(N(f))$ at minimal cost. If no information is required, $\phi(N(f))$ reduces to $\phi(f)$. This very generalized conception of an algorithm is called an idealized algorithm. Much complexity analysis is performed by restricting
idealized algorithms to realizable algorithms that are based on a particular computer
model, such as a Turing machine, or on computational considerations, such as the
class of algorithms that are linear functions of the input.
The cost of information gathering and computing $\phi(N(f))$, which will be denoted by $cp(U, f)$, is

$$cp(U, f) = C^{\Gamma} w(\Gamma) + C^{\Omega} w(\Omega), \qquad (3)$$

where $U$ stands for a pair consisting in information $N$ and algorithm $\phi$, and $w(\cdot)$ is a vector whose $i$th element is the number of operations performed on the $i$th element of $\Gamma$ or $\Omega$, as designated. This cost function is closely related to the time needed to perform the computation. To determine the total time, the cost vectors would be replaced with the time needed to perform the associated operations.
In this paper we concern ourselves only with the worst-case setting of complexity. Here the error and cost of approximation are defined over all problem elements as follows:

$$e(U) = \sup_{f \in F} |S(f) - U(f)|, \qquad (4)$$

$$cp(U) = \sup_{f \in F} cp(U, f). \qquad (5)$$

Another important complexity concept is the average complexity. Formulating


the average complexity requires knowing the distribution of the occurrence of the
elements of f in F. Since such knowledge is not available for most economic
problems, we instead consider the range of performance over F. The cost function as
defined is a measure of the transactions cost of decision making. One consideration
is to compare the absolute costs of alternatives. Another important consideration
is how these costs grow with increasing problem size, which we shall designate by
the generic parameter T. For SAL(N) problems, T = n, the number of inputs. In
considering the asymptotic cost function an important question is whether the growth
of these costs is no more than a polynomial in T. Such problems are considered
tractable. Since, as $T$ increases, the cost is progressively dominated by the highest
power of the polynomial, the definitions for asymptotic complexity assign problems
and algorithms to equivalence classes based on this highest power.
We wish to compare $cp(U(T))$ with a nonnegative $Z = Z(T)$, which in applications will frequently be $T$, $T^2$, $T^3$, and so on.
Definition 1. $cp(U(T))$ is of upper order [lower order] $Z$, written $O(Z)$ [$o(Z)$], if there exist $k, m > 0$ such that $cp(U(T)) \le [\ge]\; m Z(T)$ for all $T > k$.
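For instance, a cost function $cp(U(T)) = 3T^2 + 5T$ satisfies $cp(U(T)) \le 4T^2$ for all $T > 5$ and $cp(U(T)) \ge 3T^2$ for all $T > 0$, so it is of both upper and lower order $T^2$; its asymptotic behavior is determined by the highest power of $T$.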
Definition 1 requires a slight modification to handle the rate of growth measured in terms of achieving greater accuracy, that is $1/\epsilon \to \infty$. This definition can now be employed to characterize the computational complexity of the two optimization problems by applying the definition of upper and lower order to the cost functions of $\Phi$, which is the class of all algorithms that use information operator $N$.

Definition 2. $F$ has $\epsilon$-computational complexity $Z$ if there exists an $\epsilon$-approximate algorithm $U(f) \in \Phi$ such that $cp(T)$ is $O(Z)$ and, for all $\epsilon$-approximate algorithms $U \in \Phi$, $cp(T)$ is $o(Z)$.
Like definition 1, definition 2 requires a slight modification to handle the cost of achieving greater accuracy as measured by $1/\epsilon$. To say that $F$ has 0-computational complexity $T^0$ means that $F$ can be computed exactly ($\epsilon = 0$) in a fixed number of
computations independent of the length of the time horizon T. Definition 1 divides
algorithms into equivalence classes. For example, an algorithm which can compute
F in six operations is equivalent to one that can compute F in eight. For algorithms
whose cost functions are polynomial in T, the equivalence classes are defined by
the highest power of T. For asymptotic analysis, the cost of the operation that is
performed with the highest power of T can be assigned a value of 1 and all the
other information and combinatory operations can be assigned a value of 0. Thus
in analyzing sorting algorithms, only the number of comparisons is considered. If
the concern in analyzing problems is to determine which problems are tractable, the
problem formulation is reasonably robust to the selection of elements of $\Omega$, because
most standard computational models are polynomial related.
In this paper we consider only one complexity class, which contains problems
currently considered intractable, namely the nondeterministic polynomial NP. While
P and NP are usually defined relative to deterministic and nondeterministic Turing
machines, let us consider defining them relative to SAL(N) and ND-SAL(N) to avoid
introducing a new model.
To discuss the NP class we have to add a statement to SAL to create the nonde-
terministic simple algorithmic language, ND-SAL(N). The new statement is
3. Either statement: either sequence $S_i$ or sequence $S_j$ od
The intent of the either statement is that one of the two sequences $S_i$ or $S_j$ will be executed; however, which one is executed is left undetermined. In a SAL program the computational sequence is a straight line. In an ND-SAL(N) program, one path in a tree is executed.
To illustrate the operation of an ND-SAL(N) program consider the partition problem: Given a set $Q$ of natural numbers, does there exist a subset $J \subseteq Q$ such that

$$\sum_{x \in J} x = \sum_{x \in (Q - J)} x\,? \qquad (6)$$

For simplicity, consider the special case where $Q$ consists of just three numbers $x_1, x_2, x_3$. We introduce a SAL macro for addition called ADD. The critical steps in an ND-SAL(N) program would be three statements ($i = 1, 2, 3$):

either $x_4 \leftarrow \mathrm{ADD}(x_4, x_i)$ or $x_5 \leftarrow \mathrm{ADD}(x_5, x_i)$ od

The program would terminate only if $x_4$ equals $x_5$. After these three statements have
been executed there are eight possibilities:

Case   x4                 x5
1      x1 + x2 + x3       0
2      x1 + x2            x3
3      x1 + x3            x2
4      x2 + x3            x1
5      x1                 x2 + x3
6      x2                 x1 + x3
7      x3                 x1 + x2
8      0                  x1 + x2 + x3

If any of the eight possibilities discovers a partition, the ND-SAL(N) program


terminates successfully. In the equivalent SAL(N) program, at least four of the eight
possibilities must be considered.
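
A deterministic brute-force version of the same check, sketched below in Python, makes the contrast concrete: it enumerates all $2^n$ subsets, which is what a SAL(N) program effectively must do, whereas the ND-SAL(N) program guesses one branch per element. The function name and the example sets are ours.

```python
# Deterministic brute-force partition check: enumerate every subset of Q.
from itertools import product

def has_partition(q):
    total = sum(q)
    if total % 2:                      # an odd total can never be split evenly
        return False
    for choice in product((0, 1), repeat=len(q)):       # 2^n branches
        if sum(x for x, keep in zip(q, choice) if keep) * 2 == total:
            return True
    return False

print(has_partition([1, 2, 3]))   # True: {1, 2} and {3}
print(has_partition([1, 2, 4]))   # False
```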
Having briefly introduced ND-SAL(N), let us define P and NP. Defining unit
time and cost for executing a statement, polynomial cost and polynomial time are
equivalent. A problem $F$ is a member of the polynomial class P [nondeterministic polynomial class NP] if there exists a SAL(N) [ND-SAL(N)] program that can solve each member $f$ of $F$ in cost that is a polynomial function of the number of inputs, $n$.
In terms of computability, ND-SAL(N) is no more powerful than SAL(N) because
any function that can be computed by ND-SAL(N) can be computed by SAL(N).
Nevertheless, programs in ND-SAL(N) can be construed as countably parallel in
comparison to the equivalent program in SAL(N). Thus an ND-SAL(N) program that
solves F in polynomial time could have a separate polynomial path for each f.
The equivalent SAL(N) program could consider all these paths in exponential time.
One of the most famous open questions in computer science is whether there exist
problems in NP which are not members of P.
A well-known group of problems in NP, which are assumed not to be members of P, are known as NP-complete. To show that a new problem is NP-complete requires two steps. First, it must be shown that a proposed solution to the new problem can be verified in polynomial time. Second, one of the existing NP-complete problems must be polynomial transformable into the new problem. There are numerous NP-complete problems, including many operations research problems, such as the traveling salesman problem, and many graph problems, such as the Hamilton circuit problem. These problems currently require exponential time or cost in SAL(N). For an introduction to NP-complete problems see Papadimitriou and Steiglitz (1982).

3. BOUNDED RATIONALITY AND GAME THEORY

Game theory is the only field of economics that has generated a literature concerning
computability, complexity, and bounded rationality. This is not totally surprising
since finite action game theory is one of the few economic subjects fitting the tradi-
tional computational models for problems represented by either natural numbers or
finite sequences from a finite alphabet. The first topic we consider is the impact of
the concept of computability on game theory. Next we consider an example of an
NP-complete problem in game theory and finally, we consider finite automata as a
form of bounded rationality.
The knowledge that there exist games with noncomputable optimal strategies has
been known at least since Rabin's (1957) paper. We now present a simple number
theoretic example due to Jones (1982).
Example 1. An arithmetical Game of Length 5
There are two players 1 and 2 who take turns assigning nonnegative integer values
to the variables of a polynomial:

player 1 picks $x_1$
player 2 picks $x_2$
player 1 picks $x_3$
player 2 picks $x_4$
player 1 picks $x_5$

Player 1 wins if and only if $x_1^2 + x_2^2 + 2x_1x_2 - x_3x_4 - 2x_3 - 2x_5 - 3 = 0$.


Otherwise player 2 wins. In any arithmetical game, either player 1 has a winning
strategy or player 2 does. But Jones provides a specific example where neither player
1 nor player 2 has a computable winning strategy. This example is related to the
undecidability of Hilbert's 10th problem.
Recent work in computability in game theory has investigated the impact of
imposing a restriction of computability on player strategies. We assume each player is
replaced by an algorithm that generates a strategy choice given a complete description
of the games, including the other player's algorithm. Such an algorithm is complete
if it produces a strategy choice in every situation. It is also rational if it generates the
optimal response to the other player's choices. Binmore (1990) demonstrates that
computable, complete, and rational algorithms do not exist.
Canning (1992) investigates relaxing completeness to obtain algorithms that are
rational and complete on a limited domain. Let $H \subseteq G$, the set of games with finite strategies, and let $B \subseteq A$, the set of effectively computable game theories. $(H, B)$ is solvable if there exists a strategy in $A$ that is complete relative to $(H, B)$ and is the best choice whenever the opponent plays. Canning demonstrates that $(H, A)$ is solvable if and only if $H \subseteq D$, where $D$ is the set of games with dominant strategies
for each player. Also, (G, K) is solvable if K is the set of algorithms that always
stop. These results define the limits of rational, computable games. To explore these
limits Canning develops concepts such as a strict Nash strategy, a best reply to all best
replies to itself, and a rational algorithm that plays the best response if the opponent
reaches a decision. He encounters a basic problem that the set of rational algorithms
of A is too small to include any algorithm that acts rationally and is complete against
every algorithm in the set of rational algorithms of A.

In addition to the investigation of computability in game theory, researchers have
used complexity theory in game theory investigations. Prasad and Kelly (1990)
provides examples of NP-completeness in determining properties of weighted majority voting games. Such a game consists of $n$ individuals making up the set $N = \{1, 2, \ldots, n\}$ with an associated vector of weights $W = (w_1, w_2, \ldots, w_n)$. A weighted majority voting game is one in which, for some fixed $q$, coalition $S \subseteq N$ is winning just when $\sum_{j \in S} w_j \ge q$.
Given nonnegative integer weights and a positive integer $q$, the question of determining the existence of a subset $S \subseteq N$ such that $\sum_{j \in S} w_j = q$ is known to be NP-complete. Prasad and Kelly use this problem to examine the complexity of determining various power measures of $i \in N$. $i$ is pivotal in subset $S \subseteq N - \{i\}$ if $\sum_{j \in S} w_j < q$ and $w_i + \sum_{j \in S} w_j \ge q$. Most power measures are functions of the number of distinct subsets for which $i$ is pivotal. Prasad and Kelly show that determining whether the number of pivots is greater than $r$ is an NP-complete problem. They also show the standard political power indices, such as the Absolute Banzhaf, Banzhaf-Coleman and Shapley-Shubik, are all NP-complete problems.
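
The exponential enumeration underlying these results can be seen in a brute-force pivot count. The sketch below is a hypothetical illustration (an unnormalized Banzhaf-type count, with invented weights and quota), not Prasad and Kelly's construction: it examines every coalition of $N - \{i\}$, and the number of such coalitions doubles with each additional voter.

```python
# Brute-force count of the coalitions for which voter i is pivotal in a
# weighted majority game with weights `weights` and quota q.
from itertools import combinations

def pivot_count(weights, q, i):
    others = [w for j, w in enumerate(weights) if j != i]
    count = 0
    for r in range(len(others) + 1):
        for coalition in combinations(others, r):
            s = sum(coalition)
            if s < q <= s + weights[i]:   # i turns a losing coalition into a winning one
                count += 1
    return count

# Example with invented weights (4, 2, 1, 1) and quota q = 5
print([pivot_count([4, 2, 1, 1], 5, i) for i in range(4)])
```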
Imposing a computability constraint on a game is not likely to create controversy
among economists. First, the constraint appears obvious; and second, it is robust to
the choice of computational model. The best way to impose tractability on games
is more controversial. Since it was suggested by Aumann (1981), game theorists have considered automata as players in games in order to study bounded rationality.
Kalai (1990) provides an excellent survey of this literature. Here we need only a
short summary based on the bimatrix representation of the stage game for a repeated
prisoner's dilemma

                         Player 2's Actions
                           c          d
Player 1's Actions    c  (3,3)      (0,4)
                      d  (4,0)      (1,1)

The payoff matrix for this game is symmetric. Both players 1 and 2 have the actions c (cooperate) and d (defect). The first entry in each element of the payoff matrix
represents the payoff to player 1 and the second the payoff to player 2.
This game has one Nash equilibrium (d, d). While both players would be bet-
ter off cooperating (c, c), this action combination is not stable because both play-
ers could improve their position by switching actions. In the repeated prisoner's
dilemma, the problem is to determine the circumstances under which the two play-
ers would cooperate to achieve a higher payoff. Intuitively it would seem likely
that they would have incentives to cooperate. Let us consider this problem. Let
$a^t \in \{(c,c), (c,d), (d,c), (d,d)\}$ be the action combination selected by players 1 and 2 in period $t$. A history $h$ of length $l(h) = k$ is $\{a^j, a^{j+1}, \ldots, a^{j+k}\}$, and $H^T$ is the set of all histories of length strictly less than $T$. A strategy for player $i$ in period $t$ is $f_i^t : H^{t-1} \to (c, d)$; that is, a strategy provides a rule for action given all possible
histories. One method of calculating the payoffs in a repeated game is the average payoff. Let $f^1 = (f_1(h^0), f_2(h^0))$, which is the strategy combination for the first stage. Then, recursively for $t = 2, 3, \ldots, T$: $f^t = f^t(f^1, f^2, \ldots, f^{t-1})$. Let $P(f^t)$ be the payoff to the two players in period $t$ when they use strategy combination $f^t$. Then the average payoff to the two players is $\bar{P}(f) = (1/T)\sum_{t=1}^{T} P(f^t)$.
Let us now describe two very simple strategies for the repeated prisoner's
dilemma. Since the game is symmetric, we only need describe the strategies for
player 1. These two are
a. Constant defect: II (h) -+ d for all histories h.
b. Tit-for-tat: II(hO) -+ c and II(h) -+ h~. (That is, initially cooperate and
afterwards execute the last action taken by player 2.)
As the game is repeated, the set Ht of all possible histories, which is also the domain
of the strategy, increases exponentially. Nevertheless, for the average payoff, T-
period, repeated prisoner's dilemma game, the only Nash equilibrium is the action
combination (d, d) each period.
Kalai's approach for studying bounded rationality in repeated games is full au-
tomation, where both players are replaced by automata. An automaton is a triple
((M, mO), B, T), where M is the set of states of the automaton. The behaviorfunc-
tion B : M -+ (c, d) prescribes an action to player 1 at every state on the automaton.
The transition function T : M x A -+ M transits the automaton to a new state from
an old one as a function of the action combinations of both players. The automata
for the two strategies listed above are presented in the following table.

Strategies of player 1 States B (~) (~) (~) (~)


Constant defect mO=D d D D D D
Tit-for-Tat mO=C c C D C D
mO=D d C D C D

Neyman (1985) maintains that the two-person, repeated prisoner dilemma played
by automata can result in a cooperative Nash equilibrium. Let PJ:
1 ,m2 represent

T repetitions with the average payoff criterion, where each player i chooses an
automaton of size not exceeding mi. Neyman asserts that, if 2 ::; mJ, m2 ::;
T - 1, then there is a Nash equilibrium pair of automata of PJ:t,
m 2 that prescribes
cooperation throughout PT. The occurs because restricting the size of the automata
prevents the usual backward induction. Zemel (1985) introduces small talk into
the finitely repeated prisoner's dilemma as an alternative approach to explaining
cooperation.
Next let us consider Pen-Porath's (1986) result concerning the advantage of
having a bigger automaton in an infinitely repeated two-person zero-sum game,
Z~I' m2 • Since zero-sum games have a value in mixed strategy, every player can
guarantee his/her pure strategy Z-maxmin value with an automaton of size one.
Ben-Porath's result concerning the advantage of being bigger is that for every given
positive integer ml, there is a positive integer m2, and an automaton A2 of size m2,
100 AL. Norman

such that for every automaton Al of size mt. player 1's payoff is no more than the
pure strategy Z-maxmin value of player 1.
Rubinstein (1986) and Abreu and Rubinstein (1988) have investigated the choice
of automata when the number of states is costly. Also, games with finite actions
have the desirable property that all equilibrium payoffs can be well approximated by
equilibria of bounded complexity. This idea is pursued in the papers of Kalai and
Stanford (1988) and Ben-Porath and Peleg (1987).
In addition to characterizing the behavior of automata, game theorists have also
investigated the computational complexity of computing the best-response automaton
under various conditions. Gilboa (1988) considers the problem of computing the best-
response automaton, A I, for player 1 in a repeated game G with n players and n - 1
finite automata, (A 2 , ••• , An), for the remaining players in G. He demonstrates
that the computational complexity of both problem (1) - determining whether a
particular Al is a best-response automaton - and problem (2) - finding a best-response
automaton Al - is polynomial. If the number of players is unrestricted, problem (1)
is NP-complete and problem (2) is not polynomial. Ben-Porath (1990) demonstrates
that for a repeated two person game where player 2 plays a mixed automaton strategy
with a finite support, problem (1) is a NP-complete problem and problem (2) does
not have a polynomial solution. Papadimitriou (1992) considers the relationship
between the computational complexity of determining a best-response strategy and
the strategic complexity in a repeated prisoner's dilemma. If an upper bound is
placed on the number of states of the best-response automaton, the problem is a
NP-complete problem; whereas, if no bound is imposed, the problem is polynomial.
Finally, game theorists are in the process of developing a complexity measure
for implementing an automaton. Kalai-Stanford (1988) define the complexity of a
strategy to be its size (the number of states of the smallest automaton prescribing
it). In general the amount of information needed for playing a strategy equals the
complexity of the strategy; that is, the complexity of a strategy, f, equals the number
of equivalence classes of histories it induces. Banks and Sundaram (1990) propose
an alternative strategic complexity concept that includes a measure of the need to
monitor the opponent's action. Lipman and Srivastava (1990) propose a strategic
complexity measure based on the details of the history required by the strategy. They
are interested in the frequency with which perturbations in history change the induced
strategy. Papadimitriou's (1992) result indicates that achieving a specified Kalai-
Stanford strategic complexity increases the computational complexity of computing
the best response automaton.

4. THE FIRM AND THE CONSUMER

The original calculus-based models of profit and utility maximization are defined
over the reals - for example, the positive orthant of ~n. Consequently, the traditional
computability and complexity arguments based on either the natural numbers or finite
representations from a finite alphabet are not applicable.
In order to demonstrate just how simple a noncomputable optimization problem
Computability, Complexity and Economics 101

can be, we consider the problem presented in Norman (1994), which employs the
information-based complexity model. A monopolist has a linear production process,
faces a linear inverse demand function, and has a profit function for t = 1,2, ... ,T:

Pt = a - dqt,

(8)

where a and d are known, qt is the tth observation of net output, Xt is the tth
level of the production process, (3 is the unknown scalar parameter, and (t is the
tth unobserved disturbance term. The (t are iid normal with mean zero and known
variance one. Since the complexity results are invariant to defining the cost function
as a zero, linear, or quadratic function, the cost function is defined as c(qt) = 0 to
simplify the notation.
Given a normal prior on (3 at time t = 1, the prior information on (3 at time t
is a normal distribution N(mt, hd, where mt is the mean updated by h t = h t - I +
xLI and h t is the precision updated by mt = (mt-Iht- I + qt-Ixt-d/ht. For this
paper let us consider two cases:
1. The agent knows (3 precisely. He or she has either been given precise
knowledge of (3 or has observed (1) a countable number of times so that his or her
prior on (3 has asymptotically converged to N«(3, 00).
2. The agent's prior information on (3 is represented by N(ml' hJ), where hI has
a very small positive value.
The monopolist is interested in maximizing his expected discounted profit over a
finite time horizon:
T
Jr = supE[LTt-IPt(Xt)qt(Xt) Il-I, xt-I], (9)
a;T t=1

h
were TIS. the d'Iscount f actor, qt - I 'IS ( ql, q2, ... , qt-I ) an d x t - I 'IS ( XI, X2, ... , Xt-I ) .
qt-I and x t - I represent the fact that the decision maker anticipates complete infor-
mation that is observed exactly and without delay.
First consider the optimization problem where (3 is a known parameter. The
optimal Xt can be exactly determined as a function of the parameters of f E F
without recourse to the information operator as
* a (10)
x t = 2d(3'

The (f=O)-computational complexity of this problem is TO, polynomial zero, because


the control that can be computed in 3 operations needs to be computed only once for
the entire time horizon.
Now let us illustrate the computational difficulty with case 2, the simplest non-
trivial example having a time horizon of only two periods, T = 2. The value function
in the first period is
102 A.L. Norman

J \ (q\ ) -- a2((m\h\ + q\xJ)/hJ)2 d


- (11)
4d([(m\h\ + q\x\ )/(h\ + xi)J2 + (h\ + xi}-\) .
While the expectation of J\ (q\) has the form

E[Q\(qd] -d (12)
Q2(qd '

where Q\ (q\) and Q2 (q\) are quadratic forms in the normal variable q\. This expec-
tation cannot be carried out explicitly to give an analytic closed expression. This
implies the O-complexity of this problem with an unknown parameter is transfinite.
Norman (1993) uses these two cases to provide a Bayesian explanation of Knight's
concepts of risk and uncertainty. Risk is where the parameters and distributions of
the decision problem are known, and uncertainty is where at least one parameter
or distribution is not known. The conjecture is that, for nonlinear problems, the f-
computational complexity of an uncertainty problem always lies in a equal or higher
computational class than that for the equivalent risk problem.
The reader might have an illusion that transfinite problems are an oddity in eco-
nomics. The author asserts that the opposite is likely to be the case. Readers who
are not familiar with computational complexity, but who have some knowledge of
numerical analysis, should realize that all those problems for which the traditional
numerical analysis focused on asymptotic convergence of alternative algorithms are
transfinite computational problems. The author asserts that most of the standard cal-
culus optimization problems in the theory of the consumer and the firm are transfinite.
Only special cases, such as quadratic problems, are computable. Also, expressions
that are defined by infinite series are frequently not computable. Another example is
traditional asymptotic convergence theory of econometric estimates.
The reader might assume that the problem can be circumvented by appealing to
f-rational arguments; that is, by using f-approximations which can be computed in
a finite number of computations. If the constraint is that these approximations be
tractable in the sense of being polynomiaL costs with respect to the growth parameters
of the problem, using f-approximations is not always possible.
Consider the discrete-time, stationary, infinite horizon discounted stochastic con-
trol problem requiring computation of a fixed point J* of the nonlinear operator T
(acting on a space of function on the set S E ~n) defined by Bellman's equation

(T J)(x) = inf[g(x, u)
uEU
+a J
s
J(u)P(ylx, u)dy], "Ix E S. (13)

Here, U c lRm is the control space, g( x, u) is the cost incurred if the current state is
x and control u is applied, a E(O, 1) is a discount factor, and P(ylx, u) is a stochastic
kernel that specifies the probability distribution of the next state y when the current
state is x and control u is applied. Then J* is interpreted as the value of the expected
discounted cost, starting from state x, providing that the control actions are chosen
optimally. A variation of this model has been considered by economists [for example
Easley and Keifer (1989)] investigating parameter estimation in an estimation and
Computability, Complexity and Economics 103

control context. By treating unknown parameters as augmented states, the simple


monopoly model presented in this section could be generalized to n states and m
controls over an infinite horizon.
Chow and Tsitsiklis (1989) show that the computational complexity of this model
is o(l/[k(a)fj2n+m). This means that, for a given accuracy, the computation cost is
exponential in increasing model size (number of states and controls). Thus, to assume
f-rationality in general is to assume economic agents have exponential computing
power.
Another area of traditional economic theory for which ideas of computability and
complexity have been considered is consumer theory. A choice function is a system
of pairwise preferences that to be preference-compatible must select from every set
of feasible alternatives those maximal elements that are undominated. Beja (1989)
proves "that for the class of choice functions whose domain includes all finite sets and
some infinite set(s), a characterization (by axioms of rational choice) of compatibility
with preferences which are not necessarily transitive and complete must include some
infinite complexity axiom, i.e. an axiom that posits simultaneous consistency across
infinite collections of decisions." Such a condition is obviously not decidable in any
model of computation that must consider these collections sequentially.
Velupillai and Rustem (1990) consider the choice problem from the perspective of
a nondeterministic Turing machine. They present a nondeterministic Turing machine
with a GOdel-numbered sequence of finite sets of alternatives and inquire whether
the Turing machine for each pair (x, y) in each sequence can determine whether x is
at least as good as y. They demonstrate that there is no finite procedure to answer
this question; that is, the issue is not decidable. Velupillai and Rustem's results imply
the standard choice model has fundamental computational problems without even
considering an infinite complexity axiom needed for consistency of preferences.
Norman (1992) considers a simple model of a consumer choosing an item from a
finite set of close substitutes B t = {bit, b2t , ... , bnt} for t = 1, 2, ... , T, either once
or repeated at regular intervals. The consumer problem in period t, St, is
Find a bit E BIt such that for all bjt E BIt is bit t bjt , (14)

where Pit is the price of the ith item in the tth period, It is the income in the tth
period, and bit E BIt if bit E B t and Pit ~ It.
Because of the high rate of technological change in the marketplace and the usually
long time interval between purchases, the consumer of durable goods generally faces
a new set of alternatives possessing new technological attributes. We assume that the
consumer searches for his preferred item by ranking his alternatives. This ranking
operation is costly, because it requires real resources in the form of mental effort,
time, and travel expenses. Given the rapid rate of technological change, we assume
that the consumer's preferences are not given a priori but are determined, to the
extent they can be done so efficiently, in the consumer's search for the preferred item.
We model the ranking of two items as a binary operation, R(bit , bjt), which the
consumer must execute to determine his preferences between two items, bit and
bjt. This operation is modeled as a primitive operation with positive costs, and no
attempt is made to model the human neural network. We assume that the cost, c, of
104 AL. Norman

comparing items is invariant to the two items being compared. The reflexive binary
ranking operation R(bit • bjt) is assumed to have the following cost: Given any two
unranked bit and bjt E B, C(R(bit • bjt )), the cost of executing R(bit. bjt) is c. If
bit and bjt have been ranked, the cost of remembering R(bit. bjt) is O. Also the
consumer could rank alternatives if he or she choose. However, given the cost, this
might not be optimal. In addition, the consumer expends resources to determine
which items in his or her consumption set are budget feasible: For any bit E B, the
cost of performing F (bit) is k.
The consumer's search to find an optimal consumption bundle depends on market
organization. The type of organization considered is a consumer selecting a new
TV from a wall of TVs presented in a electronics discount store. Consequently,
the consumer's search can be conceptualized as one through an unordered sequence
to find a preferred item satisfying a budget constraint; the consumer's search can
be modeled as an algorithm. Organized in this fashion, characterizing an efficient
search is equivalent to determining the combinatorial computational complexity of
the choice problem.
The computational complexity of finding the preferred item in a one-time choice
problem is n. An efficient algorithm, then, is a variation of finding the largest
number in a sequence. Thus, in a one-time choice problem, it is never efficient to
develop a complete preference ordering, which is a variation of sorting a file and
has a computational complexity of n In n. Consequently, if ranking alternatives is
expensive, a procedural rational consumer facing technological change would never
determine a complete preference ordering, a fundamental assumption of a substantive
rational consumer.

5. TWOPAPERS

In this section we consider two separate, unrelated papers. First, Spear (1989) demon-
strates how the imposition of computability on a rational expectations equilibrium,
REE, with incomplete information implies that such equilibria are not identifiable.
Second, Norman (1987) demonstrates how complexity theory can be used to create
a theory of money. These papers are related only in that they provide some insights
into the range of topics in economics to which the concepts of computability and
complexity might be applied.
Spear considers a two-period overlapping-generations model. To use finite rep-
resentation computability theory, he assumes the economy has a countable number
of states, S. The set cl> consists of total recursion functions on S, where total means
that the associated SAL(N) programs stop for all states s E S. The economy maps
admissible forecasts ¢>o into temporary equilibria, T.E. price functions <PI E cl>. This
mapping, 9 : cl> -4 cl>, which, given the assumptions is 9 : N -4 N, is assumed total
recursive and has a fixed point.
Spear considers the problem of determining the circumstances under which agents
can identify the rational expectations eqUilibrium, REE. For the problem under
consideration, identification means the ability to construct an algorithm that can
Computability, Complexity and Economics 105

decide in a finite (not asymptotic) number of steps which function among a class of
recursive functions has generated an observed sequence of ordered pairs of numbers
of the form (j, J[jD. The two basic results for complete information are (1) if the
T.E. price function is primitive recursive, agents can identify it; however, if 4>g[ij is
not primitive recursive, identification may not be possible, and (2) if the function 9 is
primitive recursive, it can be identified in the limit. Primitive recursive functions are
those that can be computed by SAL(N) programs that do not employ while statements.
(Sequences of assignment statements can be executed a specified number of times
with times statements.)
With incomplete information the basic result is: There is no effective procedure
for determining when a given model-consistent updating scheme yields a REE, unless
Rg is empty.
In the second paper, Norman (1987) constructs a theory of money based on the
complexity of barter exchange.
The monetary model employed is the Ostray-Starr (1974) household exchange
problem: Let Wand Z be n x H matrices representing the initial endowments and
excess demands of the H households with columns representing households and
rows representing goods. The entries of W are non-negative. A positive entry for Z
indicates an excess demand, a negative entry excess supply. Given an n-vector price
p whose elements are all positive, the system (p, Z, W) satisfies for i = 1,2, ... , nand
j = 1,2, ... , H the following restrictions:

p'Z=O,

I:Zij = 0, (7)
j=!

These conditions state that the value of each household's excess demands equals
the value of its excess supplies, and the excess supply of any good cannot exceed
its respective endowment. In addition, aggregate excess demand equals aggregate
excess supply.
In this model the general equilibrium auctioneer has generated a set of equi-
librium prices, and the task remains to find a set of trades that clear the resulting
household excess demands. In a manner analogous to the creation of the auctioneer,
a broker is created to arrange a clearing sequence, a set of trades that will reduce all
household excess demands to zero. The difficulty of the broker's task depends on
the conditions imposed on each trade. For all exchange mechanisms considered, all
trades considered must satisfy the condition that the value of the goods received by a
household must equal the value of goods sent without credit. If no other conditions
are imposed on the exchange mechanism, the broker can simply exchange all excess
demands simultaneously. The computational complexity of the resulting "command
exchange" mechanism is nH.
106 A.L. Norman

Because bilateral barter will not clear the household exchange model in general,
multiparty barter in the form of chains is considered. In a chain, household jl
receives good i l and sends good i 2. Household Jz receives good i2 and sends good
i 3. Household jm receives good im and sends good i l . The value of the goods being
traded, y, is equal in all cases. The computational complexity of the multiparty barter
exchange mechanism is the minimum of (n 2H, nH2). Introducing money reduces
the complexity of the exchange mechanism to nH.

6. CONCLUDING REMARKS

To be consistent with Lipman (1991), we define bounded rationality as optimization


with restricted computational resources where the optimizing procedure is specified
as an algorithm. The impact of this definition of bounded rationality on economics
depends on the computational resources available to the optimizing economic agent.
Imposing a constraint of computability on economic agents is not likely to be con-
tested by many economists. This restriction will have some impact on economic
theory. As was pointed out, Rustem and Velupillai (1990) demonstrate that choice
theory will have to be reformulated.
Many economists would accept a definition of bounded rationality as tractabil-
ity, that is polynomial computational resources. Currently NP-complete problems
require exponential resources in deterministic models of computation. Assuming a
polynomial solution to these problems does not exist, such a definition would have a
major impact on economics because numerous NP-complete problems exist in CUf-
rent economic theory. For example, the use of automata in game theory would have
to be refined. Also, in many cases of optimization over the reals, the concept of
€-rationality could not be maintained.
Most humans do not sort large files or perform conventional matrix multiplication
of any size without machine assistance. This might suggest that an appropriate bound
on computational complexity might be a low order polynomial. The author asserts
that economic agents unaided by machines are restricted to algorithms which are at
most linear in the growth parameters. The impact of such a restriction on economic
theory would be massive. While the author believes such a bound is appropriate, his
opinion may not be shared by many economists.

REFERENCES

1. Abreu, D. and Rubinstein, A. 1988, "The structure of Nash equilibrium in repeated games
with finite automata", Econometrica, vol 56, No.6.
2. Aho, A. v., J. E. Hopcroft and J. D. Ullman, 1974, The design and analysis of computer
algorithms (Addison-Wesley: Reading).
3. Aumann, R. J., 1981, "Survey of repeated games", in Essays in Game theory and
Mathematical Economics in Honor of Oskar Morgenstern (Bibiographische Institut:
Mannheim).
4. Banks, J. S. and R. K. Sundaram, 1990, "Repeated games, finite automata, and complex-
ity", Games and Economic Behavior, vol 2, pp. 97-119.
Computability, Complexity and Economics 107

5. Beja, A. 1989, "Finite and infinite complexity in axioms of rational choice or Sen's
characterization of preference-compatibility cannot be improved", Journal of Economic
Theo~,voI49,pp. 339-346.
6. Ben-Porath, E. 1986, "Repeated games with finite automata.", IMSSS, Stanford Univer-
si ty (manuscri pt).
7. Ben-Porath, E. and Peleg, B. 1987, "On the Folk theorem and finite automata", The
Hebrew University (discussion paper).
8. Ben-Porath, E., 1990, "The complexity of computing a best response automaton in
repeated games with mixed strategies", Games and Economic Behavior, vol 2, pp. 1-12.
9. Binmore, K. 1990, Essays on the Foundations of Game Theory, (Basil Blackwell, Ox-
ford).
10. Blum M., 1967, "A machine independent theory of the complexity of recursive func-
tions",1. ACM, vol 14, pp. 3322-336.
11. Canning, D. 1992, "Rationality, Computability, and Nash Equilibrium", Econometrica,
Vol 60, No 4, pp. 877-888.
12. Chow, Chee-Seng and John N. Tsitsiklis, 1989, "The Complexity of Dynamic Program-
ming", Journal of Complexity, 5,466-488.
13. Easley, David and N. M. Keifer, 1988, "Controlling a stochastic process with unknown
parameters", Econometrica, Vol 56 No.5, 1045-1064.
14. Jones, 1. P., 1982, "Some Undecidable Determined Games", International Journal of
Game Theory, vol. II, Issue 2, pp. 63-70.
15. Gilboa, Itzhak, 1988, ''The complexity of computing best-response automata in repeated
games", Journal of Economic Theory, vol 45, pp. 342-352.
16. Hartmanis, Juris, 1989, "Overview of Computational Complexity Theory in Hartmanis",
J (ed) Computational Complexity Theory (American Mathematical Society: Providence).
17. Kalai, E., 1990, "Bounded Rationality and Strategic Complexity in Repeated Games", in
Ichiishi, T, A Neyman, and Y. Tuaman, (eds), Game Theo~ and Applications, (Academic
Publishers, San Diego).
18. Kalai, E. and W. Stanford, 1988, "Finite rationality and interpersonal complexity in
repeated games", Econometrica, vol 56, 2, pp. 397-410.
19. Lipman, B. L. and S. Srivastava, 1990, "Informational requirements and strategic com-
plexity in repeated games", Games and Economic Behavior, vol 2, pp. 273-290.
20. Lipman, B. L. 1991, "How to decide how to decide how to ... : Modeling limited ratio-
nality", Econometrica, vol 59, No.4, pp. 1105-1125.
21. MatijaseviS, J. V., 1971, "On recursive unsolvability of Hilbert's tenth problem", Pro-
ceedings of the Fourth International Congress on Logic, Methodology and Philosophy
of Science, Bucharest, Amsterdam 1973, pp. 89-110.
22. Neyman, A., 1985, "Bounded complexity justifies cooperation in the finitely repeated
prisoner's dilemma", Economics Letters, Vol 19, pp. 227-229.
23. Norman, A, 1981, "On the control of structural models", Journal of Econometrics, Vol
15, pp. 13.24.
24. Norman, Alfred L., 1987, "A Theory of Monetary Exchange", Review of Economic
Studies, 54, 499-517.
25. Norman, Alfred L., 1992, "On the complexity of consumer choice, Department of
Economics", The University of Texas at Austin, (manuscript) Presented at the 1992
Society of Economics and Control Summer Conference, Montreal.
26. Norman, Alfred L., 1994, "On the Complexity of Linear Quadratic Control", European
Journal of Operations Research, 73, 1-12.
27. Norman, Alfred L., 1994, "Risk, Uncertainty and Complexity", Journal of Economic
Dynamics and Control, 18,231-249.
28. Norman, Alfred L. and Woo S. Jung, 1977, "Linear Quadratic Control Theory For Models
With Long Lags", Econometrica, 45, no.4, 905-917.
29. Ostroy, 1. and R. Starr, 1974, "Money and the Decentralization of Exchange", Econo-
metrica, vol 42, pp. 1093-1113.
108 A.L. Norman

30. Papadmimitriou, C. H., 1992, "On players with a bounded number of states", Games and
Economic Behavior, Vol 4, pp. 122-131.
31. Papadminitriou, C. H. and K. Steiglitz, 1982, Combinatorial Optimization: Algorithms
and Complexity, (Prentice-Hall: Englewood Cliffs).
32. Prasad K. and J. S. Kelly, 1990, "NP-Completeness of some problems concerning voting
games", International Journal of Game Theory, Vol 19, pp. 1-9.
33. Rabin, M. 0., 1957, "Effective computability of winning strategies", M. Dresher et al.
(eds), Contributions to the Theory of Games, Annals of Mathematical Studies, Vol 39,
pp. 147-157.
34. Rubinstein, A. 1986, "Finite automata play the repeated prisoner's dilemma", Journal of
Economic Theory, vol 39, pp. 83-96.
35. Rustem, B. and K. Velupillai, 1987, "Objective Functions and the complexity of policy
design", Journal of Economic Dynamics and Control, vol 11, pp. 185-192.
36. Rustem, Band K. Velupillai, 1990, "Rationality, computability, and complexity", Journal
of Economic Dynamics and Control, vol 14, pp. 419-432.
37. Simon, H. A., 1976, "Form substantive to procedural rationality", S. Latsis (ed), Method
and Appraisal in Economics, (Cambridge University Press, Cambridge).
38. Sommerhalder, R. and S. van Westrhenen, 1988 The theory of Computability: Programs,
Machines, Effectiveness and Feasibility, (Addison Wesley: Workingham).
39. Spear, S. E., 1989a, "When are small frictions negligible?", in Barnett, w., 1. Geweke,
and K. Shell (eds), Economic complexity: Chaos, sunspots, bubbles, and nonlinearity,
(Cambridge University Press, Cambridge).
40. Spear, S. E., 1989, "Learning Rational Expectations under computability constraints",
Econometrica, Vol 57, No.4, pp. 889-910.
41. Traub, J.F., G. W. Wasilkowski and H. Wozniakowski, 1988, Information Based Com-
plexity, (Academic Press, Inc., Boston).
42. Zemel, E., 1985, "Small talk and cooperation: A note on bounded rationality", Journal
of Economic Theory, vol 49, No.1, pp. 1-9.
BER<;:RUSTEM

Robust Min-max Decisions with Rival Models

ABSTRACT. In the presence of rival models of the same system, an optimal policy can be
computed to take account of all the models. A min-max, worst-case design, problem is an
extreme case of the ordinary pooling of the models for policy optimization. It is shown that,
due to its noninferiority, the min-max strategy corresponds to the robust policy. If such a robust
policy happens to have too high a political cost to be implemented, an alternative pooling can
be formulated using the robust pooling as a guide.
An algorithm is described for solving the constrained min-max problem. This consists of
a sequential quadratic programming subproblem, a stepsize strategy based on a differentiable
penalty function and an adaptive rule for updating the penalty parameter.
The global convergence and local convergence rate of the algorithm are established in
Rustem (1992). In this paper, we discuss the numerical convergence properties of the algorithm
and related issues such as the convergence of the stepsize to unity and the properties of the
penalty parameter.

1. INTRODUCTION: THE POLICY OPTIMIZATION PROBLEM

Consider the policy optimization problem

min { J(Y, U) I F(Y, U) = o} , (1)

where Y and U are, respectively, the endogenous or output variables and policy
instruments or controls of the system. J is the policy objective function and F is the
model of the economy. In general, F is nonlinear with respect to Y and U. Problem
(1) is essentially a static transcription of a dynamic optimization problem in discrete
time, where
y,

U= Ut Y = Yt

YT
with Ut E IRn and Yt E IRm denoting the control and endogenous variable vectors at
time period t. The optimization covers the periods t = 1, ... , T. Thus, Y E IRm x T ,
U E IRnxT , F : IF C IRnx - t IRTxm and J : .If c IRnx - t 1R' (nx = T x (m + n)).
The vector valued function F is essentially an econometric model comprising a

D. A. Be/Sley (ed.), computational Techniquesfor Econometrics and Economic Analysis, 109-134.


© 1994 Kluwer Academic Publishers.
110 B. Rustem

system of nonlinear difference equations represented in static form for time periods
t = 1, ... ,T.

2. RIVAL MODELS OF THE SAME SYSTEM AND ROBUSTNESS OF MIN-MAX


POLICIES

The formulation of the policy optimization problem (1) is, in practice, an over-
simplification. Originating from rival economic theories, there exist rival models
purporting to represent the same system. The problem of forecasting under similar
circumstances has been approached through forecast pooling by Granger and New-
bold (1977) and, more recently, by Makridakis and Winkler (1983) and Lawrence et
al., (1986). In the presence of rival models, the policy maker may also wish to take
account of all existing rival models in the design of optimal policy. One strategy in
such a situation is to adopt the worst case design problem

min
yl,o •. Y1n mod,U
max {Ji(yi,u)
i
IFi(yi,U) = 0; i = 1, ... ,mmod} , (2)

where there are i = 1, ... , mmod rival models, with yi, F i , respectively, denoting
the dependent (or endogenous) variable vector and the equations of the ith model.
This strategy is an extension of a suboptimal approach originally discussed in Chow
(1979). Problem (2) seeks the optimal strategy corresponding to the most adverse
circumstance due to choice of model. All rival models are assumed known. The
solution of (2) clearly does not provide insurance against the eventuality that an
unknown (mmod + 1)st model happens to represent the economy; it is just a robust
strategy against known competing "scenarios". A similar, less extreme formulation
is also discussed below, utilizing the dual approach to (2).
The optimization procedure considered below does not distinguish between Y
and u. We can thus define a general vector x = [~] to rewrite the min-max
problem as follows:

mjn mtx { Ji(X) F(x) I = 0, i = 1, ... , mmod} , (3)

where F subsumes all the models. The formulation above is slightly more gen-
eral than the original min-max problem above. Other equivalent formulations are
discussed in Rustem (1987, 1989, 1992).
Algorithms for solving (3) have been considered by a number of authors, including
Charalambous and Conn (1978), Coleman (1978), Conn (1979), Demyanov and
Malomezov (1974), Demyanov and Pevnyi (1972), Dutta and Vidyasagar (1977),
Han (1978; 1981), Murray and Overton (1980). In the constrained case, discussed in
some of these studies, global and local convergence rates have not been established
(e.g. Coleman, 1978; Dutta and Vidyasagar, 1977). In this and the next section, a
dual approach to (3), adopted originally by Medanic and Andjelic (1971, 1972) and
Cohen (1981), is initially utilized. Subsequently, both dual and primal approaches
are used to formulate a superlinearly convergent algorithm.
Robust Min-max Decisions with Rival Models 111

To introduce the basic terminology, let x E IRnx and let F : IRnx --+ IRmxT be
twice continuously differentiable functions with

J = [J 1 , J 2 , ... , Jmmod]T .

Let 1 be the mmod-dimensional vector whose elements are all unity. We define the
inner product of two vectors, y and w, of the same dimension as

Using the inner product, we define the subspace

lR~mod = {a E IRmmod I(a, 1) = 1, a ~ o} .


It should be noted that (3) can be solved by the nonlinear programming problem

~,ivn {vi J(x) ~ Iv; F(x) = o} , (4)

where v E 1R1. The following two results are used to introduce the dual approach to
this problem.

Lemma (1). Problem (3) is equivalent to

mjn m~x {(a, J(x)) I F(x) = 0, a E lR~mod } . (5)

Proof This result, initially proved by Medanic and Andjelic (1971; 1972) and also
Cohen (1981), follows from the fact that the maximum of mmod numbers is equal to
the maximum of their convex combination. 0
In Medanic and Andjelic (1971; 1972), the model is assumed to be linear, and
the solution of (5) without the constraints F(x) = 0 is obtained using an iterative
algorithm that projects a onto lR~mod. In Cohen (1981), the iterative nature of the
projection is avoided by dispensing with the equality constraint in lR~mod but including
a normalization in a transformed objective function. Although the resulting objective
function is not necessarily concave in the maximization variables, the algorithm
proposed ensures convergence to the saddle point. The algorithm proposed in Cohen
(1981) for nonlinear systems utilizes a simple projection procedure but is essentially
first order.
Let a* be the value of a that solves (5). It can be shown, by examining the first
order conditions of (5) and (4), that a* is also the shadow price associated with the
inequality constraints in (4). An important feature of (5) that makes it preferable to
(3) is that a j can also be interpreted as the importance attached by the policy maker
to the model Fi(x) = O. There may be cases in which the min-max solution a* may
be too extreme to implement. The policy maker may then wish to assign a value to
a, in a neighbourhood of a*, and determine a more acceptable policy by minimizing
(a, J(x)), with respect to x, for the given a. Another interpretation of (5) is in
terms of the robust character of min-max policies. This is discussed in the following
112 B. Rustem

Lemma:

Lem!M (2). Let there exist a min-max solution to (5), denoted by (x*, a*), and
let J and F be once differentiable at (x*,a*). Further, let strict complementarity
hold for a 2 0 at this solution. Then, for i, j, e E {I, 2, ... , mmod}

Vi,j (i =I- j) iffa:,a~ E (0,1);

Vi,j,e(e =I- i,j) iffa; = 0 and


a; = a~ E (0, 1);
Vj, (j =I- i) iff a: = 1;

Vj, (j =I- i) iff a~ = O.

Proof. The necessary conditions for optimality of (5) are

\7 xJ(x*) a* + \7 xF(x*)A* = 0, (6a)

(6b)

(6c)

(7)

where A*, J.L*, 'TJ* are the multipliers of F( x) = 0, a 2 0 and (1, a) = 1, resl?ectively.
Necessity in case (i) can be shown by considering (7), which, for a:, a~ E (0,1)
yields a:J.L~ = a~J.L~ = 0, and then J.L: = J.L~ = O. Using (6) we have Ji(x*) =
Jj (x*). Sufficiency is established using Ji(x*) = Jj (x*) and noting that

'TJ* = -(J(x*), a*) .


PremuItiplying the equality in (6b) by 1 and using this equality yields

By (7), J.L* = O. Furthermore, strict complementarity implies that a~ E (0,1), Vi.


Case (ii) can be shown by considering (7) for at a~ E (0,1), a; = O. We have
a~J.L: = a~ J.L~ = a;J.L; = 0, thence J.L: = J.L~ = 0 and, by strict complementarity,
J.L; > O. From (6) we have

(8a)
Robust Min-max Decisions with Rival Models 113

and combining these yields

jl(x*) - jm(x*) = -p,; < 0; m = i,j .


For sufficiency, let ji(X*) = jj(x*) > jl(x*). Combining (8) and using (7),
we have

Since jl(X*) - jm(x*) < 0, we have 0:; = 0. Given that 0:; = 0, 'V£, jl(X*) <
jm(x*), we can use (2) for those i,j for which ji(x*) = jj(x*) to establish
p,~ = p,~ = 0. By strict complementarity this implies that o:~o:~ E (0,1).
Case (iii) can be established noting that for o:~ = 1, we have p,~ = 0, o:~ = 0,
'Vj I- i and, by strict complementarity, p,~ > 0. From (6) we thus obtain

P(x*) - ji(X*):::; p,~ - p,~ = -p,~ < 0.


Conversely, ji(x*) > ji(X*) implies

o:~(jj(x*) - ji(X*)) = o:~p,: ~ °


and thus o:~ = 0, 'Vj I- i. Case (iv) can be established as the converse of (iii). 0
The above result illustrates the way in which 0:* is related to l(x*). When some
of the elements of 0:* are such that 0:: E (0, 1) for some i E M c {I, 2, ... , mmod},
it is shown that the ji(X*) have the same value. In this case, the optimal policy x*
yields the same objective function value whichever model happens to represent the
economy. Thus, x* is a robust policy. In other circumstances, the policy maker is
ensured that implementing x* will yield an objective function value that is at least
as good as the min-max optimum. This noninferiority of x* may, on the other hand,
amount to a cautious approach with high political costs. The policy maker can, in
such circumstances, use 0:* as a guide and seek in its neighbourhood a slightly less
cautious scheme that is politically more acceptable. As mentioned above, this can be
done by minimizing (0:, lex)) for a given value of 0:.
In a numerical example of the min-max approach (5), two models of the UK
economy have been considered. One of these is the HM Treasury model (0: 1) and
the other is the NIESR model (0: 2 ). The min-max solution is found to be o:~ = .6
and 0:; = .4. This is discussed further in Section 5.
In the algorithm discussed in the next section, a stepsize strategy is described that
directly aims at measuring progress towards the min-max solution. The algorithm
defines the direction of progress as a quasi-Newton step obtained from a quadratic
subproblem. An augmented Lagrangian function is defined, and a procedure is for-
mulated for determining the penalty parameter. The growth in the penalty parameter
is required only to ensure a descent property. It is shown in Rustem (1992) that this
penalty parameter does not grow indefinitely.
114 B. Rustem

3. MIN-MAX ALGORITHM FOR RIVAL MODELS

Let the Lagrangian function associated with (2) be given by

L(x,a,>.,p,,1}) = (J(x),a) + (F(x),>.) + (a,p,) + (I,a) -1)1}, (9)

where>. E ]Rm x T , P, E ]R~mod = {p, E ]Rmmod I p, 2: O} and 1} E ]Rl are the multipliers
associated with F( x) = 0, a 2: 0 and (1, a) = 1, respectively. The characterization
of the min-max solution of (2) as a saddle point requires the relaxation of convexity
assumptions (see Demyanov and Malomezov, 1974; Cohen, 1981). In order to
achieve this characterization, we modify (9) by augmenting it with a penalty function.
Hence, we define the augmented Lagrangian by

La(x,a,>.,p,,1},c) = L(x,a,>.,p,,1}) i
+ < F(x),F(x)), (10)

where the scalar C 2: 0 is the penalty parameter.


In nonlinear programming algorithms, the penalty parameter C is either taken as
a constant, is increased by a prefixed rate, or is adapted as the algorithm progresses.
Specific examples of the adaptive strategy are Biggs (1974), Polak and Tits (1981),
Polak and Mayne (1981). In this section, we also adopt such a strategy. However,
we depart from the other works in adjusting C to ensure that the direction of search
is a descent direction for the penalty function that regulates the stepsize strategy (14)
below (Rustem, 1992; Lemmas 3.2 and 3.4). This approach is an extension of a
strategy for nonlinear programming discussed in Rustem (1986, 1993).
Let H (.) H(·) denote the Hessians of L and La, with respect to x, evaluted at (.),
respectively, and define the matrix

Sometimes, VF(x) evaluated at Xk will be denoted by VFk. and F(Xk) will be


denoted by Fk . Thus, a local linearization of F(x) at Xk can be written as

Assumption ( 1). The columns of V Fk are assumed to be linearly independent. 0


This assumption is used to simplify the quadratic subproblem used in the algorithm
below for solving (2) and to ensure that the system Fk + V F[[x - Xk] has a solution,
\lxk. This assumption can be relaxed, but only by increasing the complexity of the
quadratic subproblem.
Consider the objective function

F(x, a) = (a, J(x))

and its linear approximation, with respect to x, at a point Xk.

(Ila)
Robust Min-max Decisions with Rival Models 115

where

Y' J(x) = [Y' JI(X), ... , Y' Jm(x)] .


We shall sometimes denote J(x) and Y' J(x), evaluated at Xko by A and Y' J k,
respectively. Thus, for d = x - Xko (11 ,a} can be written as

(11b)

The quadratic objective function used to compute the direction of progress is given
by

or, alternatively, by

The matrix Hk is a symmetric positive semi-definite1 approximation to the Hessian


m e
Hk = 2: a1Y'2Ji(Xk) + 2: A{Y' 2Fj(Xk)+cY'FkY'F[ (12)
i=1 j=1

The second derivatives due to the penalty term in the augmented Lagrangian (i.e.
c E;=I Y'2 Fj(Xk) Fj (Xk)) are not inclu~ed in (12). The reason for this is discussed
in Rustem (1992). Furthermore, since FJ(x*) = 0 at the solution x*' ignoring this
term does not affect the asymptotic properties of the algorithm. The values ak and Ak
are given by the solution to the quadratic subproblem in the previous iteration. The
direction of progress at each iteration of the algorithm is determined by the quadratic
subproblem

(13a)

Since the min-max subproblem is more complex, we also consider the quadratic
programming subproblem

The two subproblems are equivalent, but (13,b) involves fewer variables. It is shown
below that the multipliers associated with the inequalities are the values a and that
the solution of either subproblem satisfies common convergence properties.
Let the value of (d,a,v) solving (13) be denoted by (dk,ak+l,vk+t). The
stepsize along dk is defined using the equivalent min-max formulation (3). Thus,
consider the function
1 i.e. (v, fhv) ~ 0, for all v -:/= o.
116 B. Rustem

'I/J(X) = iE{I,2,max
... ,mmod}
{Ji(x)}

and

'l/Jk{X) =, max
'E{I,2,oo.,mmod}
{Ji(Xk) + (V' Ji(Xk), x - Xk)} .

Let 'l/Jk(Xk + dk) be given by


'l/Jk(Xk+dk) = 'E{I,2,oo.,mmod}
, max {Ji(Xk)+(V'Ji(Xk),d k )}.

The stepsize strategy determines Tk as the largest value of T = (-y)j, , E (0, 1),
j = 0, 1,2, ... such that Xk+1 given by

satisfies the inequality


Ck+1 Ck+1
'I/J(Xk+I) + T(Fk+t.Fk+I) -'l/J(Xk) - T(Fk,Fk} :::; pTk~(dk,Ck+d,
(14a)

where p E (0,1) is a given scalar and

The stepsize Tk determined by (14) basically ensures that Xk+1 simultaneously re-
duces the main objective and maintains or improves the feasibility with respect to the
constraints. The penalty term used to measure this feasibility is quadratic and consis-
tent with the augmented Lagrangian (10). It is shown in Rustem (1992; Theorem 4.1)
that (14) can always be fulfilled by the algorithm.
The determination of the penalty parameter C is an important aspect of the algo-
rithm. This is discussed in the following description:

The Algorithm

Step 0: Given xo, Co E [0,00), and small positive numbers 8, p, c" such that
I
8 E (0,00), P E (0, I), c E (0, 2]" E (0, I), Ho, set k = 0.
A

Step 1: Compute V' Jk and V' Fk. Solve the quadratic subproblem (13) (choosing
(13,a) or (13,b) defines a particular algorithm) to obtain db Qk+I, and the
associated multiplier vector Ak+I' In (13,a), we also compute ILk+t.1/k+1
and in (13,b) we also compute Vk+I.

Step 2: Test for optimality: If optimality is achieved, stop. Else go to Step 3.

Step 3: If
Robust Min-max Decisions with Rival Models 117

then Ck+ 1 = Ck. Else set


_ {1/Jk(Xk + dk) -1/J(Xk) + (c + ~)(dk' iIkdk) }
Ck+l - max (H,Fk ) ,Ck +8
(15)

Step 4: Find the smallest nonnegative integer jk such that Tk = "I jk with Xk+l =
Xk + Tkdk such that the inequality (14) is satisfied.

Step 5: Update iIk to compute iIk+ 1, set k = k + 1 and go to Step 1.

In Step 3, the penalty parameter Ck+l is adjusted to ensure that progress towards
feasibility is maintained. In particular, Ck+l is chosen to make sure that the direction
dk computed by the quadratic subproblem is a descent direction for the penalty
. Ck+l (
functton1/J(xk) - -2- Fk,Fk}.
In Rustem (1992), it is shown that dk is a descent direction, that Ck determined
by (15) is not increased indefinitely, that the algorithm converges to a local solution
of the min-max problem, that the stepsize stepsize Tk converges to unity, and that the
local convergence rate near the solution is Q- or two-step Q-superlinear, depending
on the accuracy of the approximate Hessian, iIk.

4. NUMERICAL EXPERIMENTS

In this section, we illustrate the behaviour of the method with a few test examples.
The objective is to highlight the characteristics of the algorithm along with certain
properties of min-max problems. Specifically, we show the attainment of unit step-
sizes (Tk = 1), the way in which the penalty parameter Ck achieves the constant
value C*, and the numbers of iterations and function evaluations needed to reach the
solution in each case. The attainment of a constant penalty parameter is important
for numerical stability. The achievement of unit steps is important in ensuring rapid
superlinear convergence (Rustem, 1992).
We also show the progress of the algorithm towards the min-max solution, which
exhibits certain robustness characteristics predicted by theory. As discussed in
Lemma 2, if the min max over three functions Jl, J2 and J3 is being computed,
then, at the solution, Jl = J2 > J3 iff aI, a2 E (0, I] and a3 = 0 or Jl > J2 ~ J3
iff al = 1 and a2 = a3 = 0. 2 Lemma 2 states this in greater generality, and the
examples illustrate it. Since a is chosen to maximize the Lagrangian (9), the solution

2 Suppose that the state of the world is described bi' say, three rival theories one of which
is to tum out to be the actual state. With J1 = J > J3, at the min-max solution, the
decision maker need not care, as far as the ob~ective function values are concerned, if the
actual state turns out to be J 1 or J 2. If it is J , then the decision maker is better off. The
Lagrange multiplier vector a indicates this in the min-max formulation (4) and the associated
subproblem (13,b). The robustness aspect is underlined by Lemma 2.
118 B. Rustem

can be seen as a robust optimum in the sense of a worst-case design problem. The
figures describing the convergence of the algorithms also illustrate the process of
convergence of the objective functions to the min-max optima.
We consider six test examples. Three of these are unconstrained min-max prob-
lems in which we study the achievement of unit steplengths, and three are constrained
problems in which we study both the achievement of unit stepsizes and a constant
penalty parameter value c*. The approximate Hessian computation uses the BFGS
updating formula and, for constrained problems, its modification discussed in Powell
(1978). The Hessian approximation is done on the second derivative terms arising
from the Lagrangian (i.e. the first two terms on the right of (12)) whereas the exact

at <5 = 0.01, p = 0.1, ,=


value for the term CkNkN'{ is used. The other parameters of the algorithms are set
0.1.

Example 1. (Charalambous and Bandler, 1976)

{J (x) = xi + x~; P(x) = (2 -


1 xI)2 + (2 - X2)2; P(x) =2 exp(x2 - xI)}

Initial values: x'{; = [1, -0.1]; o'{; = [~,~,~]

Hessian evaluation scheme


Direct BFGS
x* 1.1390376520 1.1390376520
0.8995599640 0.8995599340

Q* 0.4304811740 0.4304811740
0.5695188260 0.5695188360
0.0 0.0

J1(x*) 1.9522244939 1.9522244939


J2(x* ) 1.9522244939 1.9522244939
J3(x* ) 1.5740776268 1.5740776268

I
kT Tk = 1; Vk 2: kT 3

No. of J evaluations 5 7
No. of iterations 5 7

The same example was also computed with the initial value of x changed to [0,0].
All the results are identical except that the algorithms took 6 iterations and function
evaluations for the direct Hessian case and 8 iterations and function evaluations for
the Hessian approximated by the BFGS formula.

3 kT denotes the iteration at which Tk = I was reached and unit stepsizes were maintained
Vk 2: kT •
Robust Min-max Decisions with Rival Models 119

Example 2. (Charalambous and Bandler, 1976)

{J 1(x) = xt + x~; J2(x) = (2 - xI)2 + (2 - x2)2; J3(x) = 2 exp(x2 - xI)}

Initial values: x6 = (1, -0.1]; 0:6 = [~,~,~]

Hessian evaluation scheme


Direct BFGS
1.0 1.0
1.0 1.0

Q* 0.33333333 0.33333333
0.5 0.5
0.16666667 0.16666667

J 1(x*) 2.0 2.0


J 2 (x*) 2.0 2.0
J 3 (x*) 2.0 2.0

krlTk = 1; Vk:::: kr

No. of J evaluations 5 5
No. of iterations 5 5

The same example was also computed from the initial value [0,0]. The same result
in reached in 7 iterations for both Hessian evaluation schemes and unit stepsizes are
again achieved at iteration 1.
120 B. Rustem

Example 3. (Polak, Mayne and Higgins, 1986)

{J1(X) = exp (1~~ + (X2 - 1)2) ; J2(x) = exp (1~~ + (X2 + 1)2) }
Initial values: xir = [50.0,0.05]; air = [0.5,0.5]

Hessian evaluation scheme


Direct BFGS
9
x. 8.1917 x 10 2.5711 x 10 9
1.09957 X 10- 15 6.7297 X 10- 21

a. 0.5 0.5
0.5 0.5

J1(x.) 2.718281828 2.718281828


J2(x.) 2.718281828 2.718281828

k.,.ITk = 1; Vk ~ k.,.

No. of J evaluations 8 11
No. of iterations 8 11

If the algorithm is started from the initial point [1.0, 1.1], the same result is obtained
in 7 (direct) and 11 (BFGS) iterations.
Robust Min-max Decisions with Rival Models 121

Example 4. (Conn, 1979)

{Jl(X) = xi + X~; J2(X) = (2 - xd 2 + (2 - X2)2; J3(X) = 2 exp(x2 - xd}


subject to the constraints

{Fl(X) = Xl + X2 - 2; F2(X) = -xi - x~ + 2.25} .


This example is solved for different initial values and for two different initial penalty
parameters Co =
1.0 and Co =
10.0. Other initial values for all cases are = 0!6
[t, t, t] and the Lagrange multiplier estimates = A6 [t, t]·
Initial value X6
= [2.1, 1.9]

Co =1.0 Co= 10.0


Hessian evaluation scheme Hessian evaluation scheme
Direct BFGS Direct BFGS
x. 1.353553391 1.353553391 1.353553390 1.353553391
0.646446609 0.646446609 0.646446609 0.646446609

ct. 0.0 0.0 0.0 0.0


1.0 1.0 1.0 1.0
0.0 0.0 0.0 0.0

A. 4.000000015 4.0 4.000000008 3.999999997


1.000000008 0.999999987 1.000000004 0.999999976

Jl(x.) 2.250000004 2.250000002 2.250000002 2.250000002


J2(X.) 2.250000004 2.250000002 2.250000002 2.250000002
J3(x.) 0.986137337 0.986137381 0.9861378 0.9861378

c. (final) 1780.693576 2.488613732 31.86545192 46.69260520

k.,.I'Tk = 1; Vk ~ k.,. 4 3 3 3

k.lck = c.; Vk ~ k.4 7 4 3 5

No. of fn. eval. 15 12 12 12

No. of iterations 10 9 9 9

4 Iteration number at which Ck reached a value c. such that Ck = c. Vk after this iteration.
122 B. Rustem

Initial value xif = [1.9,2.1) from which the algorithm converges to a local solution.

Co = 1.0 Co = 10.0

Hessian evaluation scheme Hessian evaluation scheme


Direct BFGS Direct BFGS
x. 0.646446609 0.646446609 0.646446609 0.646446609
1.353553390 1.353553390 1.353553390 1.353553390

Q. 0.0 0.0 0.0 0.0


0.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0

A. 11.47275085 11.47275085 11.47275085 11.47275085


5.73637543 5.73637542 5.73637543 5.73637542

Jl(x.) 2.250000002 2.250000002 2.250000002 2.250000002


J2(x. ) 2.250000002 2.250000002 2.250000002 2.250000002
J3(x.) 4.056229975 4.056229975 4.056229975 4.056229975

c. (final) 6601.845484 2.442484228 79.34209 44.1686

k.,+k = 1; Vk ~ k.,. 4 3 3 3

I
k. Ck = C.; Vk ~ k. 4 7 2 5 5

No. of fn. eva!. 28 12 12 12

No. of iterations 10 9 9 9
Robust Min-max Decisions with Rival Models 123

Initial value xl = [4,2].

C{) = 1.0 C{) = 10.0


Hessian evaluation scheme Hessian evaluation scheme
Direct BFGS Direct BFGS
x. 1.353553392 1.353553391 1.353553392 1.353553391
0.646446609 0.646446609 0.646446609 0.646446609

a. 0.0 0.0 0.0 0.0


1.0 1.0 1.0 1.0
0.0 0.0 0.0 0.0

>.. 4.000000764 3.999999990 4.0 4.0


1.000000386 0.999999994 1.0 1.0

Jl(x.) 2.250000002 2.250000002 2.25 2.250000002


J2(x. ) 2.250000002 2.250000002 2.25 2.250000002
J3(x.) 0.986137356 0.986137356 0.9861356 0.9861356

c. (final) 1.0 1.0 1.0 12.5924227

kTITk = 1; Vk 2: kT

k.1 Ck = C.; Vk 2: k. 4

No. of fn. eva!. 8 8 8 8

No. of iterations 8 8 8 8
124 B. Rustem

Initial value x6 = [2,4] from which the algorithm converges to a local solution.

co = 1.0 co = 10.0
Hessian evaluation scheme Hessian evaluation scheme
Direct BFGS Direct BFGS

x* 0.646446609 0.646446609 0.646446609 0.646446609


1.353553390 1.353553390 1.353553390 1.353553390

Q* 0.0 0.0 0.0 0.0


0.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0

A* 11.47275085 11.47275085 11.47275085 11.47275085


5.73637543 5.73637542 5.73637543 5.73637542

J 1 (x*) 2.25 2.25 2.250000002 2.250000002

J 2 (x*) 2.25 2.25 2.250000002 2.250000002

J 3 (x*) 4.056229963 4.056229964 4.056229963 4.056223007

c* (final) 1.21109174 2.094947393 24.8366 10.165631

k.,.ITk = 1; Vk ~ k.,. 3 2

k*lck = c*; Vk ~ k* 4 4 4 4 3

No. of fn. eva!. 12 10 8 8

No. of iterations 10 9 8 8
Robust Min-max Decisions with Rival Models 125

Example 5. The min-max formulation of the Rosen-Suzuki problem (Conn, 1979)

Jl(x) = xi + x~ + 2x~ + x~ - 5Xl - 5X2 - 21x3 + 7X4 ,


J2(x) = -9xi - 9x~ - 8x~ - 9x~ - 15xl + 5X2 - 31x3 + 17x4 + 80,
P(x) = llxi + llx~ + 12x~ + llx~ + 5Xl - 15x2 - llx3 - 3X4 - 80,

subject to the constraints

Fl(x) = -xi - 2x~ - x~ - 2x~ + Xl + X4 + 10,


F2(x) = -2xi - x~ - x~ - 2Xl + X2 + X4 + 5.
This example is solved for different initial penalty parameters Co = 1.0 and Co = 10.0.
Initial values X5 a5
= [0,1,1,0]; = [~,~, ~]; = [~, ~]. A5
Co =
1.0 co = 10.0
Hessian evaluation scheme Hessian evaluation scheme
Direct BFGS Direct BFGS
-2.3606 x 10 1.48215 x 10 14 -2.3606 x 10 1.02673 x 10 12
x. 9 7

1.00000003 1.0 1.00000003 1.0


2.00000016 2.0 2.00000016 2.0

Q. 0.0 0.0 0.0 0.0


0.45 0.45 0.45 0.45
0.55 0.55 0.55 0.55

-X. -8.0938 x 10- 9 1.04033 X 10- 9 -8.0911 X 10- 9 1.45367 X 10- 9


2.0000006 2.0 2.0000006 2.0

JI(x.) -44.0 -44.0 -44.0 -44.0


J2(x.) -44.0 -44.0 -44.0 -44.0
J3(x.) -44.0 -44.0 -44.0 -44.0

c. (final) 1.0 1.0 10.0 10.0

krlTk = 1; Vk 2: kr 15

k.1 Ck = C.; Vk 2: k. 4
No. of fn. eval. 20 9 20 26

No. of iterations 20 9 20 19
126 B. Rustem

Example 6. Two constraints imposed on Example 2.

{J 1 (x) = x1 + x~; J2(X) = (2 - xI)2 + (2 - X2?; J3(x) = 2 exp(x2 - xI)}


subject to the constraints

{FI(x) = x? - x~; p2(x) = -2xj - x~} .


This example is solved for different initial penalty parameters Co = 1.0 and Co = 10.0.
Initial values xZ'
= [0,1]; = aZ' [1,1,1]'

                                   c_0 = 1.0                            c_0 = 10.0
                          Hessian evaluation scheme            Hessian evaluation scheme
                          Direct          BFGS                 Direct          BFGS
x*                        1.000000069     1.0000003456         1.0000003456    1.0000003456
                          1.000000069     1.0000003456         1.0000003456    1.0000003456
α*                        1.0             1.0                  1.0             1.0
                          0.0             0.0                  0.0             0.0
                          0.0             0.0                  0.0             0.0
λ*                        2.500000147     2.5                  2.500000147     2.5
                          -1.500000112    -1.499995219         -1.500000112    -1.49999811429
J1(x*)                    2.0000008294    2.00000004147        2.00000004147   2.00000004147
J2(x*)                    1.9999994471    1.9999997235         1.9999997235    1.9999997235
J3(x*)                    2.0             2.0                  2.0             2.0
c* (final)                6.381975949     137.83894126         10.431886987    545.15388777
k_τ (τ_k = 1, ∀k ≥ k_τ)   5               5                    5               5
k* (c_k = c*, ∀k ≥ k*)    5               3                    5               3
No. of fn. eval.          23              23                   23              23
No. of iterations         20              20                   20              20

Figures 1-3 below describe the behaviour of the objective functions as the algorithm
proceeds to the optimum. The figures for the constrained examples 4-6 correspond
to the results for the initial penalty parameter c_0 = 10.

Fig. 1. The convergence of the objective functions vs iteration number for examples 1-2 in Section 4. "Direct" signifies the direct (exact, as opposed to approximate BFGS) evaluation of Hessians.
Fig. 2. The convergence of the objective functions vs iteration number for examples 3-4 in Section 4. "Direct" signifies the direct (exact, as opposed to approximate BFGS) evaluation of Hessians.
Fig. 3. The convergence of the objective functions vs iteration number for examples 4-6 in Section 4. "Direct" signifies the direct (exact, as opposed to approximate BFGS) evaluation of Hessians.

5. THE UNCONSTRAINED CASE FOR RIVAL ECONOMIC MODELS

The constrained min-max problem (2) may also be formulated as an unconstrained
problem if it is possible to use model i to eliminate y^i and express (2) in terms
of U only. However, this may not always be feasible, if for example there are con-
straints other than those of the model or if it is not possible simply to eliminate the
model due to complex perfect-foresight conditions. Given the solution software that
accompanies most models, it is possible to evaluate y^i for every value of U. We
thus have a numerical representation of the model in the form y^i = g^i(U). The
model derivative information dy^i/dU can also be obtained numerically using this
approach. The advantages of this approach are that the unconstrained problem is
solved, thereby avoiding potential complications due to constraints, and the dimen-
sionality of the problem is reduced to that of U. A disadvantage of the approach is
that the model solution is repeatedly computed to evaluate y^i for every U considered
by the algorithm. By contrast, the constrained algorithm simultaneously converges
to an optimal and feasible [y^i, U] pair. Another disadvantage is that the derivative
information dy^i/dU inevitably involves numerical inaccuracies and that may affect
convergence.
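A minimal sketch of how such numerical derivatives might be obtained is given below. It is an illustration only, not part of the algorithm's published implementation: the black-box solver interface g, the one-sided difference formula and the step size h are all assumptions made for the example. Each column of the Jacobian costs one extra model solution, which is precisely the repeated-solution burden noted above, and the finite-difference error is one source of the numerical inaccuracy also noted above.

    #include <functional>
    #include <vector>

    // Sketch: given a black-box model solver g mapping a policy vector U to the
    // model's endogenous variables y = g(U), approximate the Jacobian dy/dU by
    // one-sided finite differences.  Row j, column k of the result holds dy_j/dU_k.
    using Vector = std::vector<double>;
    using Model  = std::function<Vector(const Vector&)>;

    std::vector<Vector> numericalJacobian(const Model& g, Vector U, double h = 1e-6)
    {
        const Vector y0 = g(U);                        // solve the model once at U
        std::vector<Vector> J(y0.size(), Vector(U.size()));
        for (std::size_t k = 0; k < U.size(); ++k) {
            const double saved = U[k];
            U[k] = saved + h;                          // perturb one instrument
            const Vector y1 = g(U);                    // re-solve the model
            U[k] = saved;
            for (std::size_t j = 0; j < y0.size(); ++j)
                J[j][k] = (y1[j] - y0[j]) / h;         // dy_j / dU_k
        }
        return J;
    }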
Once y^i has been eliminated using the model, the min-max problem (5) can be
written as

    min_U max_α { Σ_{i=1}^m α^i f^i(U) | 0 ≤ α^i ≤ 1 ; Σ_{i=1}^m α^i = 1 ; i = 1, ..., m } ,        (16)

where f^i(U) is the reduced objective function corresponding to the ith model after
y^i has been evaluated and eliminated from J^i.
The algorithm for solving (16) is essentially a simpler version of that for the
constrained case when the penalty parameter c_k = 0, ∀k. A sequence {U_k} is
generated so that, starting from a given initial policy vector, each member of the
sequence is defined by

    U_{k+1} = U_k + τ_k d_k ,        (17)

where d_k is the direction of search computed by solving the quadratic subproblem
(18) below, and τ_k is the stepsize.
Consider the quadratic programming problem

    min_{d,ν} { ν + ½ dᵀ Ĥ_k d | ⟨∇f^i(U_k), d⟩ + f^i(U_k) ≤ ν ; i = 1, ..., m } .        (18)

Here, as in (13b), ν is a scalar, and d_k is the direction of search. Ĥ_k is a symmetric
positive definite quasi-Newton approximation to the Hessian

    H_k = Σ_{i=1}^m α_k^i ∇²f^i(U_k) ,

and ∇f^i(U_k) is the column vector denoting the gradient of f^i. The shadow prices,
or Kuhn-Tucker multipliers, of (18) give the values α_{k+1}^i, i = 1, ..., m.
In order to introduce the stepsize strategy τ, we consider the function

    φ(U) = max_{i ∈ {1,2,...,m}} { f^i(U) }

and let φ_k(U_k + d_k) be given by

    φ_k(U_k + d_k) = max_{i ∈ {1,2,...,m}} { f^i(U_k) + ⟨∇f^i(U_k), d_k⟩ } .

The stepsize τ_k is the largest value of τ = (γ)^j, j = 0, 1, 2, ..., such that U_{k+1} given
by (17) satisfies the inequality

    φ(U_k + τ d_k) − φ(U_k) ≤ ρ τ Φ(d_k) ,

where ρ ∈ (0, 1) is a given scalar and

    Φ(d_k) = φ_k(U_k + d_k) − φ(U_k) + ½ ⟨d_k, Ĥ_k d_k⟩ .

The unconstrained case has been considered for economic models in Becker et
al. (1986), with the UK economy models of HM Treasury and NIESR as rivals,
and in Karakitsos and Rustem (1991), where three small rival economic models are
considered. In the two model case discussed by Becker et al. (1986), the robust
characterization of min-max, summarized in Lemma 2, is illustrated by Figure 4. The
vertical axis gives the objective function values versus changes in α on the horizontal
axis. The min-max point is where the three curves meet. The concave curve is the
plot of

    min_U { α f^HMT(U) + (1 − α) f^NIESR(U) }        (19)

for given values of α ∈ [0, 1]. Its maximum corresponds to the value of α maximising
the above function and thence to the min-max solution. The functions f^HMT(U) and
f^NIESR(U) correspond to the reduced objective functions obtained after eliminating
y^HMT and y^NIESR from (2), using their respective models. The two convex curves
are the corresponding values of f^HMT(U) and f^NIESR(U) in (19). As α increases,
f^HMT(U) decreases, and as (1 − α) increases, f^NIESR(U) decreases. The curves
meet at α = 0.6, which corresponds to the min-max value. At this point f^HMT(U) =
f^NIESR(U), as discussed in Lemma 2, and the policy corresponding to this point is
model invariant.
The performance of the preceding unconstrained algorithm was studied in con-
nection with the computations involving the three rival models in Karakitsos and
Rustem (1991). Each model is a discrete-time, nonlinear dynamic system of 11
equations with perfect foresight and 2 policy instruments; the problem is solved over
5 periods. Thus, dim(U) = 10. This problem was solved numerous times, with
different parameter specifications and initial points. Using a convergence criterion of
10^-5 for the satisfaction of the optimality conditions, the algorithm converged within
11-38 iterations, and unit stepsize (τ_k = 1) was achieved at iteration 7-8 in two
out of seven cases. As opposed to the examples in Section 4, any failure to achieve
τ_k = 1 could mostly be attributed to the errors in the numerical evaluation of the
model and dy^i/dU. Such errors affect the accuracy of the Hessian approximation
of f^i, which requires dy^i/dU, and τ_k = 1 can be shown to depend on the accuracy
of this approximation (see Rustem, 1992).

Fig. 4. The variation of min_U { α f^HMT(U) + (1 − α) f^NIESR(U) } as α varies, 0 ≤ α ≤ 1, and the corresponding values of the individual components f^HMT(U), f^NIESR(U).

ACKNOWLEDGEMENTS

The valuable comments and suggestions of David Belsley are gratefully acknowl-
edged.

REFERENCES

Becker, R.G., B. Dwolatzky, E. Karakitsos and B. Rustem (1986). "The Simultaneous Use of Rival Models in Policy Optimization", The Economic Journal 96, 425-448.
Biggs, M.C.B. (1974). "The Development of a Class of Constrained Minimization Algorithms and their Application to the Problem of Power Scheduling", Ph.D. Thesis, University of London.
Charalambous, C. and J.W. Bandler (1976). "Nonlinear Minimax Optimization as a Sequence of Least pth Optimization with Finite Values of p", International Journal of System Science 7, 377-391.
Charalambous, C. and A.R. Conn (1978). "An Efficient Algorithm to Solve the Min-Max Problem Directly", SIAM J. Num. Anal. 15, 162-187.
Chow, G.C. (1979). "Effective Use of Econometric Models in Macroeconomic Policy Formulation", in "Optimal Control for Econometric Models" (S. Holly, B. Rustem, M. Zarrop, eds.), Macmillan, London.
Cohen, G. (1981). "An Algorithm for Convex Constrained Minmax Optimization Based on Duality", Appl. Math. Optim. 7, 347-372.
Coleman, T.F. (1978). "A Note on 'New Algorithms for Constrained Minimax Optimization'", Math. Prog. 15, 239-242.
Conn, A.R. (1979). "An Efficient Second Order Method to Solve the Constrained Minmax Problem", Department of Combinatorics and Optimization, University of Waterloo, Report, January.
Demyanov, V.F. and V.N. Malozemov (1974). "Introduction to Minmax", J. Wiley, New York.
Demyanov, V.F. and A.B. Pevnyi (1972). "Some Estimates in Minmax Problems", Kibernetika 1, 107-112.
Dutta, S.R.K. and M. Vidyasagar (1977). "New Algorithms for Constrained Minmax Optimization", Math. Prog. 13, 140-155.
Granger, C. and P. Newbold (1977). "Forecasting Economic Time Series", Academic Press, New York.
Han, S-P. (1978). "Superlinear Convergence of a Minimax Method", Dept. of Computer Science, Cornell University, Technical Report 78-336.
Han, S-P. (1981). "Variable Metric Methods for Minimizing a Class of Nondifferentiable Functions", Mathematical Programming 20, 1-13.
Karakitsos, E. and B. Rustem (1991). "Min-Max Policy Design with Rival Models", PROPE Discussion Paper 116, Presented at the SEDC Meeting, Minnesota.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor (1986). "The Accuracy of Combining Judgemental and Statistical Forecasts", Management Science 32, 1521-1532.
Makridakis, S. and R. Winkler (1983). "Averages of Forecasts: Some Empirical Results", Management Science 29, 987-996.
Medanic, J. and M. Andjelic (1971). "On a Class of Differential Games without Saddle-point Solutions", JOTA 8, 413-430.
Medanic, J. and M. Andjelic (1972). "Minmax Solution of the Multiple Target Problem", IEEE Trans. AC-17, 597-604.
Murray, W. and M.L. Overton (1980). "A Projected Lagrangian Algorithm for Nonlinear Minmax Optimization", SIAM J. Sci. Stat. Comput. 1, 345-370.
Polak, E. and D.Q. Mayne (1981). "A Robust Secant Method for Optimization Problems with Inequality Constraints", JOTA 33, 463-477.
Polak, E., D.Q. Mayne and J.E. Higgins (1988). "A Superlinearly Convergent Min-Max Algorithm for Min-Max Problems", Memorandum No. UCB/ERL M86/103, Berkeley, California.
Polak, E. and A.L. Tits (1980). "A Globally Convergent, Implementable Multiplier Method with Automatic Penalty Limitation", Appl. Math. and Optimization 6, 335-360.
Powell, M.J.D. (1978). "A Fast Algorithm for Nonlinearly Constrained Optimization Problems", in G.A. Watson (ed.), Numerical Analysis, Lecture Notes in Mathematics 630, Springer-Verlag, Berlin-Heidelberg.
Rustem, B. (1986). "Convergent Stepsizes for Constrained Optimization Algorithms", JOTA 49, 135-160.
Rustem, B. (1987). "Methods for the Simultaneous Use of Multiple Models in Optimal Policy Design", in "Developments in Control Theory for Economic Analysis" (C. Carraro and D. Sartore, eds.), Martinus Nijhoff/Kluwer Publishers, Dordrecht.
Rustem, B. (1989). "A Superlinearly Convergent Constrained Min-Max Algorithm for Rival Models of the Same System", Comp. Math. Applic. 17, 1305-1316.
Rustem, B. (1992). "A Constrained Min-Max Algorithm for Rival Models of the Same Economic System", Mathematical Programming 53, 279-295.
Rustem, B. (1993). "Equality and Inequality Constrained Optimization Algorithms with Convergent Stepsizes", forthcoming, JOTA.
PART THREE

Computational Techniques for Econometrics


WILLIAM L. GOFFE

Wavelets in Macroeconomics: An Introduction

ABSTRACT. Wavelets are a new method of spectral analysis that have attracted considerable
attention in numerous fields. Unlike Fourier methods, wavelets are designed to analyze
data that is nonstationary and subject to abrupt changes. Since macroeconomic data frequently
contains these characteristics, wavelets appear to be a natural tool for studying macroeconomic
time series. This paper first describes wavelets in an intuitive manner, and then explores their
use on macroeconomic data. Initial results are encouraging and more research is in order.

1. INTRODUCTION

Wavelets, a method of signal analysis, have attracted considerable recent attention.


The IEEE Transactions on Information Theory, for example, has devoted a special
issue (March 1992, Part II) with 31 papers to this topic and Academic Press has begun
a series of books devoted to wavelets. Daubechies, who perhaps has done more
fundamental work on wavelets than any other researcher, was awarded a MacArthur
prize this year. Considerable popular interest also exists ("Catch a Wave", 1992;
Kolata, 1991; Wallich, 1991 and Carey, 1992). Actual applications with wavelets are
either in research or soon to appear. The digital compact cassette (DCC), recently
introduced by Phillips (which is said to have CD sound quality) uses a wavelet-
like method called subband coding. Wavelets may also allow magnetic resonance
images to be generated instantaneously, thereby removing the lengthy exposure time
that limits this very useful imaging technique (Healy and Weaver, 1992). A startup
company devoted to wavelets is even in business.
Very briefly, the discrete wavelet transform is a two dimensional orthogonal
decomposition of a time series that is well suited, and is in fact designed, to detect
abrupt changes and fleeting phenomena. The orthogonality guarantees a unique
decomposition, and the two dimensionality allows the series to be studied at different
scales. One very important characteristic of the discrete wavelet transform used here
is that its basis functions have compact support, i.e., are non-zero for only a limited
range. Thus, they are able to pick up unique phenomena in the data. By contrast,
the basis functions for Fourier transforms, sines and cosines, have infinite support and
are less well suited for detecting such phenomena in the data. In another contrast to
Fourier methods, the discrete wavelet transform can be used to study nonstationary
data. The implications for macro data are obvious and are demonstrated below.
This paper first describes the discrete wavelet transform and then studies their use
in analyzing macroeconomic data. These initial results are quite favorable and more
work is in order.


2. DILATION EQUATIONS, MOTHER FUNCTIONS AND WAVELETS

a. References on Wavelets

In this and the next section wavelets are described in a very intuitive fashion. More
rigorous descriptions can be found elsewhere. Press (1992) has a particularly good
description of the wavelet decomposition algorithm and contains the Fortran code
to perform it. Strang (1989) provides a more mathematical treatment. Rioul and
Vetterli (1991) describe wavelets from a signal processing perspective. The ultimate
authority is Daubechies (1988).

b. Mother Functions and Dilation Equations

Wavelets begin with a solution to a dilation equation, which generates so-called


mother functions (also called scaling functions). These mother functions are inti-
mately related to wavelets and, in fact, are used in the discrete wavelet transform.
Equation (1) shows the specific dilation equation used in this literature:

    φ(x) = Σ_{k=0}^m c_k φ(2x − k) .        (1)

The m + 1 values of c_k determine the mother function φ. This equation is recursive:
the sum of modified forms of φ equals φ itself. Viewed in graphical terms, it can be
seen that the 2 in the right hand argument of (1) shrinks φ horizontally, the k shifts φ
horizontally, and the c_k shrink or expand φ vertically.
As a simple initial example, consider the case with m = 1 and c_0 = c_1 = 1, so
that (1) becomes

    φ(x) = φ(2x) + φ(2x − 1) .        (2)

The solution to (1) is the box function, which has value 1 over [0,1] and is zero
elsewhere. The coefficient 2 in the argument of the first term on the right hand side of
(2) shrinks φ horizontally by a factor of 2. Without the shift given in the second term,
the box extends from 0 to ½. The second term on the right hand side of (2), however,
shifts the narrowed φ right by ½, thus completing the solution. This is illustrated in
Figures 1 and 2, which show the left and right hand sides of (2), respectively.
A very interesting and useful family of orthogonal mother functions was discov-
ered by Daubechies (1988). Members of this family are both orthogonal and have
compact support, and are categorized by their ability to form polynomials of different
orders. Each member of this family, when taken as a linear combination with itself,
generates a given order polynomial. The first member is the box function; when
taken in linear combination with itself, it forms a constant value (i.e. a polynomial
of degree zero). The second member, which has four c_k coefficients, is called D4 by
Strang (by this notation, D2 is the box function). D4, taken in linear combination
with itself, forms a line of arbitrary constant slope (i.e., a polynomial of degree one).
Its coefficients, in equation (3), are more complicated than those of the box function.
    c_0 = (1 + √3)/4 ,  c_1 = (3 + √3)/4 ,  c_2 = (3 − √3)/4 ,  c_3 = (1 − √3)/4 .        (3)

Fig. 1. Box function.

Fig. 2. Box function.

Higher order members of this family generate higher order polynomials, and, as can
be seen in going from D2 to D4, with each increase in the order of the replicated
polynomial, the number of c_k's increases by two.
Figure 3 shows the D4 mother function. Its asymmetry is striking, and while it
is everywhere continuous, it is not everywhere differentiable. Specifically, it is not
left differentiable at so-called dyadic points k/2^j, for k and j integers (Pollen,
1992). This bizarre shape arises from the stringent requirements of local support and
orthogonality. A linear combination of D4 mother functions forms a polynomial of
degree one because the cusps cancel one another.
An important characteristic of mother functions and wavelets is that many are
constructed recursively; they cannot be written out in closed form like most more
familiar functions. For instance, Figure 3 was generated by using equation (1)
recursively with c_k as given in (3) and initial values for φ(1) and φ(2). Equation (1)
first yields φ(½), φ(1½) and φ(2½). Next, values at ¼ intervals can be constructed,
and so on. The function is non-zero only on [0,3).
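The recursion is easy to carry out numerically. The sketch below is illustrative only (it is not taken from the paper): it assumes the standard starting values φ(1) = (1 + √3)/2 and φ(2) = (1 − √3)/2, which the text does not give explicitly, and fills in φ at successively finer dyadic points using (1) with the coefficients in (3).

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const double s3 = std::sqrt(3.0);
        // The four D4 coefficients of equation (3).
        const double c[4] = {(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4};

        const int levels = 6;                    // refine down to spacing 2^-6
        const int grid = 1 << levels;            // points per unit interval
        const int N = 3 * grid;                  // phi lives on [0, 3)
        std::vector<double> phi(N + 1, 0.0);     // phi[m] approximates phi(m / grid)

        phi[grid]     = (1 + s3) / 2;            // phi(1)  (assumed standard value)
        phi[2 * grid] = (1 - s3) / 2;            // phi(2); phi(0) = phi(3) = 0

        // At each level, fill in the new (odd-index) grid points from the dilation
        // equation phi(x) = sum_k c_k phi(2x - k); the argument 2x - k already lies
        // on the coarser grid filled in at the previous level.
        for (int l = 1; l <= levels; ++l) {
            const int step = 1 << (levels - l);          // spacing of the new points
            for (int m = step; m < N; m += 2 * step) {
                const double x2 = 2.0 * m / grid;        // the argument 2x
                double v = 0.0;
                for (int k = 0; k < 4; ++k) {
                    const double xk = x2 - k;            // 2x - k
                    if (xk > 0.0 && xk < 3.0)
                        v += c[k] * phi[static_cast<int>(xk * grid + 0.5)];
                }
                phi[m] = v;
            }
        }
        for (int m = 0; m <= N; ++m)
            std::printf("%8.5f %10.6f\n", static_cast<double>(m) / grid, phi[m]);
        return 0;
    }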
Fig. 3. The D4 mother function.

Fig. 4. The Haar wavelet.

c. Wavelets

A wavelet is described by

    W(x) = Σ_{k=0}^m (−1)^k c_{1−k} φ(2x − k) .        (4)

W(x) clearly bears a very close connection to (1); a wavelet is basically a rearranged
mother function. Figure 4 shows the Haar wavelet, whose mother function is the box
function, and W4 is likewise similar to its mother function, D4.
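To see the connection concretely for the Haar case, take the box-function coefficients c_0 = c_1 = 1 (so m = 1) in (4):

    W(x) = c_1 φ(2x) − c_0 φ(2x − 1) = φ(2x) − φ(2x − 1) ,

which equals +1 on [0, ½), −1 on [½, 1) and is zero elsewhere, precisely the Haar wavelet plotted in Figure 4.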

3. THE DISCRETE WAVELET DECOMPOSITION

The discrete wavelet decomposition is best understood in comparison to the discrete


Fourier decomposition. Equation (5) shows the discrete orthonormal cosine Fourier
decomposition for a series of length 4 (this short length shows its essential elements):

    [x(1), x(2), x(3), x(4)]ᵀ = (a_1/√2) [1/√2, cos(π/8), cos(2π/8), cos(3π/8)]ᵀ
                              + (a_2/√2) [1/√2, cos(3π/8), cos(6π/8), cos(9π/8)]ᵀ
                              + (a_3/√2) [1/√2, cos(5π/8), cos(10π/8), cos(15π/8)]ᵀ
                              + (a_4/√2) [1/√2, cos(7π/8), cos(14π/8), cos(21π/8)]ᵀ .        (5)

The basis functions for this decomposition consist of the four vectors on the right hand
side. The coefficients a_1, ..., a_4 are the discrete cosine transform of the vector x.
This equation clearly shows how a series can be decomposed into a sum of sinusoids
that vary in frequency. But it is important to note that these basis functions have
infinite support; that is, no elements of the basis vectors are zero. As a result, Fourier
type decompositions have difficulty picking out changes in x that are permanent or
fleeting; they are best for series that are stationary.
Equation (6) shows a wavelet decomposition of a time series of length 8 with the
Haar wavelet (as shown below, this longer length better illustrates the decomposition's
key features since it contains a variety of different sized wavelets).
    [x(1), x(2), ..., x(8)]ᵀ = (b01/2) [1, 1, 1, 1, 0, 0, 0, 0]ᵀ + (b02/2) [0, 0, 0, 0, 1, 1, 1, 1]ᵀ
                             + (b11/2) [1, 1, −1, −1, 0, 0, 0, 0]ᵀ + (b12/2) [0, 0, 0, 0, 1, 1, −1, −1]ᵀ
                             + (b21/√2) [1, −1, 0, 0, 0, 0, 0, 0]ᵀ + (b22/√2) [0, 0, 1, −1, 0, 0, 0, 0]ᵀ
                             + (b23/√2) [0, 0, 0, 0, 1, −1, 0, 0]ᵀ + (b24/√2) [0, 0, 0, 0, 0, 0, 1, −1]ᵀ .        (6)
The first two vectors on the right hand side are the mother function for the Haar
wavelet, which of course is the box function (the terms in the coefficients' denomi-
nators are part of the wavelet or mother function and make them orthonormal). The
other six vectors are the actual Haar wavelet at two scales and locations. The first two
of these are on a larger scale than the second four. The discrete wavelet transform
consists of the bij coefficients. The two subscripts demonstrate the two-dimensional
nature of the decomposition, one of its key features. These coefficients are catego-
rized by i = 0, 1 and 2. For i = 0, the basis is the mother function. For i = 1, the

basis is the largest wavelet, and for i = 2, the smallest wavelet. In general, a wavelet
decomposition first contains two coefficients for the mother function. Then, defining
the length of a wavelet by its number of non-zero elements, coefficients for wavelets
cover half the series' length, then a quarter, an eighth, and so on, until the smallest,
which have two non-zero elements. As the coverage of each wavelet shrinks, the
number of wavelets of that size doubles. In (6), this can be seen when moving from
the third and fourth vectors to the fifth through eighth ones. Overall, this means that
the series to be decomposed must have a length that is 2 to a power (i.e. 2, 4, 8, 16,
32, ...). Unfortunately, this imposes some unwanted restrictions on the length of the
series one can study. For example, it does not appear that the process can be tricked
by padding the series with additional data, since any additional data would influence
the decomposition.
Most importantly, each wavelet contains a considerable number of zero elements
(the so-called compact support). Thus, the discrete wavelet transform has the poten-
tial to "pick up" unique phenomena in the data.
Wavelets of different lengths refer to properties of the data at different frequencies,
though the preferred term in this literature is scale. The longest wavelets contain
elements of the data at low frequency and large scale, while the smallest wavelets
embody high frequencies and small scales.
To further illustrate wavelet decompositions and lay the groundwork for their
display in this paper, consider the following actual decomposition:
    [1, 2, 1, 3, 10, 0, 2, 1]ᵀ = 3.5 (1/2) [1, 1, 1, 1, 0, 0, 0, 0]ᵀ + 6.5 (1/2) [0, 0, 0, 0, 1, 1, 1, 1]ᵀ
                               − 0.5 (1/2) [1, 1, −1, −1, 0, 0, 0, 0]ᵀ + 3.5 (1/2) [0, 0, 0, 0, 1, 1, −1, −1]ᵀ
                               − (√2/2) (1/√2) [1, −1, 0, 0, 0, 0, 0, 0]ᵀ − √2 (1/√2) [0, 0, 1, −1, 0, 0, 0, 0]ᵀ
                               + 5√2 (1/√2) [0, 0, 0, 0, 1, −1, 0, 0]ᵀ + (√2/2) (1/√2) [0, 0, 0, 0, 0, 0, 1, −1]ᵀ .        (7)
Note how the original series on the left hand side has a "spike" - the fifth term is
much larger than the others - and how the coefficient on the third of the smallest
wavelets (the seventh vector on the right hand side), located at the site of the spike,
reveals this phenomenon with its relatively large coefficient.
Figure 5 displays the discrete wavelet transform of (7). The bottom panel shows
the actual time series arranged horizontally with the smallest values on the bottom.

Fig. 5. Sample wavelet decomposition.

Note the spike in the data at the fifth observation. The top panel shows the wavelets'
coefficients. The convention followed in their display uses the relative size of the
coefficients within each category of wavelet. In the top panel, the b0j coefficients are
on the bottom row, the b1j coefficients are in the middle row, and the b2j coefficients
are on the top (the b0, b1 and b2 terms on the left hand side of the figure denote the
rows). The coefficients in each of these three rows are arranged in terms of their
relative size, so the largest coefficient value "fills" its location in that row while the
smallest leaves it empty. Intermediate values fill the height of the row proportionally.
In (7), b01 is associated with the first half of the time series, and b02 is associated with
the second half, given the location of the mother functions in the first two vectors.
With b01 = 3.5 and b02 = 6.5, the first half of the bottom row, associated with the
smaller value 3.5, is empty, while the space to the right is full. Likewise, the middle
row, that for b1j, has only two parts, so the left half, associated with the smaller b1j
coefficient, is empty while the other half is full. Note that in both of these rows,
the wavelet coefficients cover a length of four, thus accounting for the four lines. In
the top row, that for b2j, there are four parts. Again the largest coefficients fill the
panel height; the smallest are empty and the intermediate values are so noted. In
the displays that follow, the first two rows of the wavelet coefficients are frequently
dropped since they convey little information (one part being invariably the largest
and the other the smallest) and take up valuable space.
The algorithm that generates the discrete wavelet transform is relatively simple
and quite fast. An intuitive description of the algorithm is given in Press (1992).
Briefly, a matrix is first formed with the c_k's oriented roughly along the diagonal. The
data vector is multiplied by this matrix, and the resulting vector is sorted. Another
multiplication then takes place with half of the sorted vector and the matrix of the
c_k's. This process of multiplication, sorting, and discarding half the vector continues
until the sorted vector contains two elements. The vector formed from the discarded
elements contains the discrete wavelet transform coefficients.
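For the Haar wavelet the multiply-and-sort step reduces to pairwise sums and differences, and the whole scheme fits in a few lines. The sketch below is an illustration, not the routine from Press (1992): applied to the series in (7) it returns the coefficients in the order b01, b02, b11, b12, b21, ..., b24, i.e. 3.5, 6.5, −0.5, 3.5, −√2/2, −√2, 5√2, √2/2.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // One-dimensional Haar discrete wavelet transform (pyramid algorithm).
    // On return, data[0..1] hold the two mother-function (smooth) coefficients and
    // the remaining entries hold the wavelet (detail) coefficients, ordered from the
    // largest scale to the smallest.  The length must be a power of 2.
    void haarDWT(std::vector<double>& data)
    {
        const double r2 = std::sqrt(2.0);
        std::vector<double> work(data.size());
        for (std::size_t n = data.size(); n >= 4; n /= 2) {
            // Pairwise smooths (sums) and details (differences), both scaled by
            // 1/sqrt(2) so that the transform is orthonormal.
            for (std::size_t i = 0; i < n / 2; ++i) {
                work[i]         = (data[2 * i] + data[2 * i + 1]) / r2;
                work[n / 2 + i] = (data[2 * i] - data[2 * i + 1]) / r2;
            }
            for (std::size_t i = 0; i < n; ++i)
                data[i] = work[i];
            // Only the first n/2 (smooth) entries are transformed again; the details
            // just produced stay in place.
        }
    }

    int main()
    {
        std::vector<double> x = {1, 2, 1, 3, 10, 0, 2, 1};   // the series in (7)
        haarDWT(x);
        for (double b : x)
            std::printf("%10.4f\n", b);
        return 0;
    }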
The algorithm's operation count is proportional to n, the length of the original
data, and is therefore faster than the fast Fourier transform, whose operation count
is proportional to n·log_2(n). Thus, there are no computational constraints to the
discrete wavelet transform for most economic data and certainly none for macro data.

Fig. 6. Wavelet decomposition of a step function.

4. THE DISCRETE WAVELET TRANSFORM ON GENERATED TIME SERIES

To better understand how the discrete wavelet transform decomposes data, we use
it here on several pathological time series. Figure 6 shows the graphics associated
with a discrete wavelet transform of a step series of length 64 (recall that the series
to be studied must have length of 2 to an integer power). It has a value of 1 prior to
observation 31 and jumps to 1.1 thereafter. The Haar wavelet seems to do a better
job than other wavelets on macro series (as described below), and so is employed
in this section. As can be seen, there is an abrupt change in all categories of the
wavelet coefficients at the point where the time series jumps. Note the bottom row
of the wavelet panel contains 4 wavelet coefficients (the first, third, and fourth have
the same size), each covering a length of 16, the next has 8 of length 8, the next 16 of
length 4, and the final 32 of length 2. The wavelet coefficients with only two entries
are not shown. In every row, all coefficients but one are the same size (this accounts
for the all or nothing height in the blocks). The change at the jump in the original
series is seen with all coefficients, even though they cover vastly different scales and
provide different resolutions for marking the break.
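A short fragment shows how such a series can be pushed through the Haar sketch of Section 3; the exact placement of the jump on the dyadic grid is an assumption made here.

    #include <vector>

    void haarDWT(std::vector<double>& data);   // the sketch from Section 3

    int main()
    {
        // Step series of Figure 6: 1.0 for the first 31 observations, 1.1 thereafter
        // (assumed placement of the jump).
        std::vector<double> step(64, 1.0);
        for (std::size_t t = 31; t < 64; ++t)
            step[t] = 1.1;
        haarDWT(step);
        // With this placement, the only nonzero detail coefficient at each scale is
        // the one whose wavelet straddles the jump, which is what Figure 6 displays.
        return 0;
    }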
Figure 7 shows the decomposition of a sine wave. Once again, increased reso-
lution and detail are seen as one moves up the coefficient panel. This figure is most
instructive when compared with Figure 8, whose frequency is the same as that in
Figure 7 and then increases by 20% at the midpoint. While the change in frequency
is barely noticeable when comparing the two time series, note the very different low
frequency (long scale) behavior with the two sets of wavelet coefficients in the two
figures. Further, a comparison of the higher frequency wavelet coefficients shows
differences as well. While not as striking as Figure 6, it nonetheless shows the change
in the series at its midpoint.
Figure 9 shows a series whose slope increases by 20% at its midpoint. While
the change is barely visible to the eye, it is quite obvious by looking at the wavelet
coefficients panel that a major change has taken place in the actual location of the
change in the slope.
Fig. 7. Wavelet decomposition of a sine wave.

Fig. 8. Sine wave with a change in frequency.

Fig. 9. Line with a change in slope.
Fig. 10. Growth rate of real GNP.

5. THE DISCRETE WAVELET TRANSFORM ON THREE MACRO SERIES

This section is quite speculative since the work on wavelets in macro is very prelimi-
nary. However, it does suggest some interesting uses for wavelets in economics. The
Haar wavelet is used for all decompositions in this section since it proved to generate
the most interpretable decompositions. The D4 wavelet yielded decompositions of
macro data that did not correspond to known features in the data, such as business
cycles and secular changes. Other wavelets were not investigated.
Figure 10 shows the discrete wavelet decomposition of the quarterly growth rate
of real GNP for the period from 1958II to 1990I (128 observations). Business cycle
turning points are denoted by vertical lines in the bottom panel (dates determined
by the NBER). In turn we see the end of the 1957-1958 recession, the 1960-1961
recession, the 1969-1970 recession, the 1973-1975 recession, the 1980 recession,
and finally, the 1981-1982 recession. This figure is not particularly revealing; the
major revelation is the low and mid-frequency change in 1974, which appears to
capture the well-known slowdown in the economy's growth rate. The high frequency
components cannot be identified with any particular phenomena. Rather surprisingly,
the wavelet decomposition reveals only long term phenomena in real GNP's growth
rate.
Figure 11 shows the results for the level of the log of real GNP, and it conveys
more information than the previous case. In every recession but the first, which is only
partly included in the data, the highest frequency wavelet coefficients are quite large.
With the single exception of a period in 1959 (a downturn, but not a recession), they
are almost never as large outside recessions, and so these large high frequency coefficients
identify recessions. These results hold up even better in the wavelet coefficients
with the next lowest frequency, though naturally with less resolution. At the low
frequencies, one sees significant changes in 1966I to 1966II and 1974I to 1974II.
Fig. 11. Log of real GNP.

The latter seems to signify the change in the growth rate in the early 1970's. Overall, this
figure is rather impressive since it uses unfiltered data and it picks up both low and
high frequency items of interest.
Figure 12 examines the change in nominal nonfarm business sector compensation
per hour (Citibase series LBCPU) over the same dates as the real GNP data. The
first thing we see is how the spikes in the data are picked up by the high frequency
coefficients. We also note that the two lowest frequency components pick up breaks
after 1966I, 1974I and 1982I, with some evidence for a break after 1970I in the next
to lowest frequency coefficient. Balke (1991) studied this series for evidence of level
shifts using a modified version of Tsay's method for detecting outliers (1988). He
identified level shifts in 1968I, 1972IV, and 1982II. Given the low resolution of low
frequency wavelets, the match between these two very different methodologies is
remarkable.
Figure 13 analyzes a synthetic series. The data are generated from an ARIMA
model that Blanchard and Fischer (1989) fit to GNP growth. Thus, it is designed
to replicate the time series in Figure 10. While the two time series appear to have
the same characteristics to the eye, notice the relatively greater variation in the high
frequency components in Figure 13 than in Figure 10 (this is somewhat hard to
see given the different sizes of the figures). This suggests that the discrete wavelet
transform may be useful as a diagnostic tool for fitting models or as a method for
detecting data characteristics ignored by models.

Fig. 12. Growth rate of nominal compensation per hour.

Fig. 13. Blanchard-Fischer data.

6. CONCLUSION

This paper illustrated intuitively the discrete wavelet transform and then went on
to explore its usefulness for macroeconomics. Although very preliminary, initial

results are quite promising. In particular, wavelets show potential in analyzing


nonstationary data. In fact, as was shown in Figure 11, the best results were obtained
with nonstationary data. Since it appears useful for nonstationary data, the method
also shows potential for examining data simultaneously at low and high frequencies,
so there is no particular short-run/long-run distinction. This is of particular macro
interest given that real business cycle theorists reject the conventional macro short-
run/long-run distinction. Finally, unlike Fourier methods, wavelet analysis locates
phenomena of interest in a time series. For instance, Figure 11 showed that recessions
and secular changes could easily be identified.
Further possible uses of the discrete wavelet decomposition include smoothing
macro time series or isolating short-term phenomena. Either of these could be accom-
plished by reconstructing the data with only the large-scale or small-scale wavelet
coefficients, respectively. The use of the wavelet decomposition as a diagnostic
tool for model building was demonstrated with data generated from the Blanchard-
Fischer equation, which was shown in Figure 13 to have a different decomposition
than actual data. Finally, it might be used not only as a diagnostic tool for statistical
and econometric models, but also for data generated from macroeconomic models.
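The smoothing and isolation idea mentioned above can be made concrete by inverting the Haar sketch given earlier; the pairing below is again an illustration under the same assumptions, not an implementation used in the paper. Zeroing the second half of the coefficient vector (the smallest-scale details) before inverting yields a smoothed series, while zeroing everything else isolates the short-term component.

    #include <cmath>
    #include <vector>

    // Inverse of the haarDWT sketch: rebuild the series from its coefficients, which
    // are stored as b01, b02 followed by detail coefficients from large to small scale.
    void haarInverse(std::vector<double>& coef)
    {
        const double r2 = std::sqrt(2.0);
        std::vector<double> work(coef.size());
        for (std::size_t n = 4; n <= coef.size(); n *= 2) {
            for (std::size_t i = 0; i < n / 2; ++i) {
                // Undo one level: smooth s and detail d give back the pair
                // ((s + d)/sqrt(2), (s - d)/sqrt(2)).
                work[2 * i]     = (coef[i] + coef[n / 2 + i]) / r2;
                work[2 * i + 1] = (coef[i] - coef[n / 2 + i]) / r2;
            }
            for (std::size_t i = 0; i < n; ++i)
                coef[i] = work[i];
        }
    }

    // Usage for smoothing: haarDWT(x); set x[n/2 .. n-1] to zero; haarInverse(x).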

ACKNOWLEDGEMENTS

I would like to thank, without implicating, Nathan Balke, Tony Smith, David Belsley
and the participants of the session titled "Computational Elements in Econometrics
and Statistics II" at the 14th Annual Congress of the Society for Economic Dynamics
and Control in Montreal in June, 1992.

REFERENCES

Balke, Nathan S. "Detecting Level Shifts in Time Series: Misspecification and a Proposed Solution." Richard B. Johnson Center for Economic Studies Working Paper #9121. Department of Economics, Southern Methodist University, 1991.
Blanchard, Olivier Jean and Stanley Fischer. Lectures on Macroeconomics. Cambridge, MA: MIT Press, 1989.
Carey, John. "'Wavelets' Are Causing Ripples Everywhere." Business Week. February 3, 1992: 74-5.
"Catch a Wave." The Economist. 323 no. 7754 (1992): 86.
Daubechies, Ingrid. "Orthonormal Bases of Compactly Supported Wavelets." Communications on Pure and Applied Mathematics 41 (1988): 909-96.
Healy, Dennis M. Jr. and John B. Weaver. "Two Applications of Wavelet Transforms in Magnetic Resonance Imaging." IEEE Transactions on Information Theory 38 (1992): 860-880.
Kolata, Gina. "New Technique Stores Images More Efficiently." New York Times. November 12, 1991: B5+.
Pollen, David. "Daubechies' Scaling Function on [0,3]." Wavelets: A Tutorial in Theory and Applications. Ed. Charles K. Chui. San Diego, CA: Academic Press, 1992.
Press, William H., Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery. Numerical Recipes in Fortran: The Art of Scientific Computing, second edition. New York: Cambridge University Press, 1992.
Rioul, Olivier and Martin Vetterli. "Wavelets and Signal Processing." IEEE Signal Processing Magazine. October 1991: 14-38.
Strang, Gilbert. "Wavelets and Dilation Equations: A Brief Introduction." SIAM Review 31 (1989): 616-27.
Tsay, R. S. "Outliers, Level Shifts, and Variance Change in Time Series." Journal of Forecasting 7 (1988): 1-20.
Wallich, Paul. "Wavelet Theory: An Analysis Technique that is Creating Ripples." Scientific American. January 1991: 34-35.
C.R. BIRCHENHALL*

MatClass: A Matrix Class for C++

ABSTRACT. The MatClass project is an experiment in the use of object-oriented methods
in numerical methods using C++. MatClass is a family of numerical classes for C++ that
are freely available. MatClass combined with a C++ compiler gives the user a compiled
matrix language together with a set of numerical and statistical classes based on the key
matrix decompositions, e.g. LU, Cholesky, Householder QR, and SVD. It is argued that C++
with classes such as MatClass offers a valuable third line of development complementing the
current standards of Fortran and GAUSS. While object-oriented numerical programming
is not revolutionary, it is a significant new development. This paper aims to give a brief
and superficial overview of the current state of MatClass and to announce the availability of
Version 1.0d.

1. INTRODUCTION

Given that computing is an increasingly important aspect of our working environment,


the profession should be concerned to promote good computing practice amongst its
members, just as it promotes good practice in the formulation of models and the
statistical testing of hypotheses. This is not the place to pursue this theme in depth;
suffice it to say that my aim in providing an open and free programming tool is to
contribute to the development of an embryonic computational economics.
Essentially, software development has been seen as a commercial activity rather
than a fundamental contribution to the discipline. This is not to suggest the authors
of the standard packages are not appreciated by the profession; on the contrary every
applied economist will greatly value the services provided by the author of his chosen
system. The problem is that the reliance on commercial incentives invariably means
that source code is kept secret, and so software is removed from peer review and
essentially closed to third party improvement and extension. No doubt one of the

* This document is partly derived from an earlier piece written jointly by myself and Jarlath
Trainor. Many of the strengths of the current document reflect Jarlath's contributions, but
he has no responsibility for the errors and weaknesses that have undoubtedly arisen from my
rewrite of this document or MatClass. Many thanks for Jarlath's assistance on this project.
My thanks also to the Department of Econometrics & Social Statistics for funding Jarlath's
time in the department. Anyone who is working with Unix workstations will know the value
of having a Unix wizard at hand; in my own case I am greatly indebted to Owen LeBlanc from
the Manchester Computing Centre. Finally my thanks to Ericq Horler and David Belsley for
their careful reading of an earlier version of this document and the removal of many errors.
While their efforts have greatly improved this work they cannot be held responsible for the
remaining inadequacies.


reasons for this has been the perceived complexity of software development and the
scarcity of requisite expertise. The success of GAUSS suggests that with appropriate
tools a significant number of economists will turn their hand to programming rather
than rely solely on the menu provided by the packages. If there is to be any prospect
of developing a subculture of economic code, there needs to be an appropriate
set of tools and standards. Recently there has been a good deal of interest in using
object-oriented programming systems (OOPS) in reducing the difficulties of software
development; in particular there has been a flourish of interest in the C++ language.
The central aim of the MatClass project is to go some way in evaluating the potential
for an object-oriented approach to econometric computation using C++.

2. FORTRAN, GAUSS AND C++

Currently the choice of standards for programming in economics seems to be between


Fortran and GAUSS. MatClass aims at contributing to a third front, namely an object-
oriented approach based on C++. To clarify the potential role of C++, we first take
a quick look at Fortran and GAUSS.
Fortran's long history has led to a vast amount of code being available in var-
ious forms, including free and open libraries and numerous published algorithms.
Most numerical analysts still see Fortran as their lingua franca. From the perspec-
tive of "openness" Fortran ranks highly. Fine examples of freely available code
are LINPACK[4] and Numerical Recipes[12]. On the positive side Fortran offers
flexibility and portability. Fortran's flexibility arises from the compiled nature of
the runtime programs; that is to say the source code is translated into machine code
for the target hardware. The significance of this is that new algorithms can replace
or extend existing library routines without loss of efficiency. In contrast, GAUSS,
which from this perspective is an interpreter, makes 'third party' code suffer from
runtime inefficiencies. If code is truly to be open then there should be few impedi-
ments to correcting, improving, or extending the code. Fortran's portability is based
on the popularity of the language, at least in science and engineering, which in turn
means most university systems have a Fortran compiler. Equally important is the
development of standards, in particular the standardisation of the language itself.
On the negative side, Fortran, at least prior to Fortran 90, suffered from lack of
support for aggregate structures and various control structures. In contrast to GAUSS,
Fortran 77 had no concept of a matrix, though Fortran 90 remedies this matter. In
contrast to Pascal or C, Fortran 77 does not support control structures such as "while"
or "case" statements - though some see the latter as a strength ([13] p. 13). Fortran
90 remedies these matters.¹ In contrast to C++ and other object-oriented languages,
Fortran 77 and 90 lack the concept of inheritance, the significance of which will be
discussed in Section 4.

1 I cannot recall who it was who suggested that the old Algol-Fortran argument was now
settled in Algol's favour, but it seems to be a fair comment on Fortran's belated acceptance
that there is merit in structures and the "new" controls.

Turning to GAUSS, we can readily understand its popularity by considering its


matrix syntax, which is supplied in a relatively simple interpretive environment with
efficient intrinsics. A matrix syntax leads to significant increases in programmer
productivity, particularly in econometrics. As an interpreter with extensive and
efficient intrinsics, GAUSS offers the user a "super calculator", where many statistical
calculations can be executed with relative ease. Indeed I would suggest that GAUSS
is to the econometrician what a spreadsheet is to an accountant - without wishing
to suggest that GAUSS programs are as intractable as Lotus macros. Unlike Fortran
and other compiled languages, GAUSS does not suffer from compiler and linker
lags. This latter feature is particularly important when working on older PCs with
relatively low computational power and slow disks. In contrast to Fortran, one of
the main drawbacks of GAUSS is the impediment it places on code replacement and
extension. A Fortran programmer can mix and match a library of routines without
loss of runtime efficiency. Low-level GAUSS code, i.e. code that involves layered
loops addressing individual elements of a matrix, is relatively inefficient, even when
compared with GAUSS' own intrinsic and fully compiled code, a situation that arises
from being an interpreter.
A major difficulty with the original GAUSS was its lack of portability; it is not
possible to port its Intel based assembler to the RISC based Unix workstations. There
is currently an expectation, however, that GAUSS will become available for SUN
workstations, and because this rewrite is based on C, GAUSS may become available
to other workstations. But the underlying code is not in the public domain, and the
porting to other systems will depend on the ability and willingness of the supplier. A
clear advantage to public domain code is that any interested party can initiate ports
to their chosen system, and portability is important if economic researchers are to
exploit the workstation model of computing. While the new 486 PCs are powerful
they fail to match the standards set by systems such as SUN's SPARC and HP's Series
700 workstations. This is not simply a matter of raw computing power - though
that is significant, see Section 9 - it is a matter of the stability and capability of
operating systems. Indeed I have found that my own move to such a machine has led
to significant increases in productivity, and I now find it difficult to return to a 386 PC
with DOS. Furthermore, I am confident that the cost of workstations will continue
to decline and they will soon be feasible for most researchers - be it a 486, or 586
with Windows/NT², or RISC machines with Unix. What is also clear is that there
will be several contenders in this market, namely SUN, HP, IBM, DEC and various
Intel based kits, and thus portability of software will be an issue.
It has to be said that there are some aspects of the GAUSS language that are
less than desirable. In particular, the restricted range of data types (matrix and
string), together with the limited means for extending the type system (arrays), does
not encourage structured programming. While the matrix type is an improvement
on Fortran's restriction to scalar types and arrays, some general form of record or

2 MS-Windows suffers from poor memory management and instability due to its depen-
dence on DOS. Windows/NT or even OS/2 v2 hold more promise for sustained development
of Intel based systems.

structure is extremely powerful and sadly missing from GAUSS. Needless to say
GAUSS offers no form of inheritance between types.
One minor concern is the long term support of GAUSS. Unlike Fortran with its
multiple sources and open libraries, GAUSS comes from a single source, and the
underlying code is closed. As far as I know there is no secondary source for GAUSS
interpreters, and this could be a major problem if this single source should fail. This
is not to suggest GAUSS should have no long term role; indeed it will undoubtedly
remain a major force in code development but I would be concerned to become solely
dependent on the current system.
It can be noted that most of the above comments on GAUSS apply to MATLAB,
although it is already available on most systems. While few econometricians are
using MATLAB, it is a system offering a similar set of basic facilities, although
it lacks some of the statistical intrinsics and does not have the various add-ons.
Coming from one of the authors of LINPACK, it has found favour among numerical
analysts[7]. It has been suggested that MATLAB not only facilitates the learning of
numerical methods, but it can also be used for prototyping new code before being
translated into Fortran. A similar role could be, and possibly is being, played by
GAUSS.
So where does C++ stand in relation to these existing standards? What makes
C++ interesting is its promise of combining object-oriented methods with runtime
efficiency. But C++ is currently more promise than fact and will not provide a
serious challenge to GAUSS or Fortran until there are appropriate class libraries on
which users can build. C++ uses the concept of a class to allow object-oriented
extensions to the C language, i.e. developers can use the class system to extend
the object types supported by the compiler. MatClass is one of several classes that
defines a matrix type and gives the C++ user a matrix syntax and other facilities
similar to those offered by GAUSS or MATLAB.³ But that is not all, for the set of
possible data types are limited only by the programmer's imagination. MatClass uses
higher level classes to embody higher level functionality. For example, it offers a
family of classes to embody ordinary least-squares models. These OLS classes give
the C++ programmer a "package" that hides the storage and computational details
underlying the least-squares procedures. At the other extreme, MatClass has low-
level classes for debugging and error management. For example there is a generic
class, matObject, with error management capabilities whose features are inherited by
all other objects. These few sentences cannot give the reader a complete picture of
object-oriented programming, but we hope we have conveyed the image of a flexible
and extensible system for which MatClass is but a simple beginning.
In comparing the C++ and MatClass combination with GAUSS and Fortran,
note that MatClass acts as a complement to a C++ compiler which links the user
program with the MatClass library before execution. MatClass based programs, then,
are compiled to machine code, having the advantage of efficient low-level routines
but the disadvantage of compiler and linker lags. Unlike Fortran, C++ is a young

3 A commercial based class that offers similar, if not greater, facilities is M++ from Dyad
Software.

language, and there are few numerical libraries available. Fortunately, C++ is a
superset of C, and thus existing C libraries, in particular Numerical Recipes in C
[13], are readily adapted for use in C++. MatClass aims to make some contribution
in this direction. In its favour, when compared to Fortran 77, MatClass gives the C++
programmer a matrix syntax and an object-oriented environment. Apart from the
issue of compiler versus interpreter, the comparison of MatClass and GAUSS focuses
on C++'s support for, and MatClass's exploitation of, object-oriented programming.
A fuller discussion of OOPS will be offered below. Here we simply emphasize that
use of C++ classes gives the developer a highly structured language with strong
type checking, and note that most compilers come with source level debuggers that,
together with the debugging features of MatClass, make the debugging of larger
projects relatively straightforward.
Some might wonder whether a highly structured approach to numerical computing
with layers of types and classes might not suffer from runtime inefficiencies. It will
be demonstrated that the combination of MatClass with a good quality compiler
produces code that is efficient, particularly for low-level tasks. Furthermore the code
is immediately portable to RISC based Unix - for example, timings will show a HP
720 workstation running up to 10 times faster than a 33MHz 486 PC.
All in all, then, the combination of (a) having the source code in the public domain,
(b) the compiled nature of user written routines and (c) C++'s support for OOPS,
should make MatClass a good candidate to act as a foundation for an "open system" of
software. While C++, with appropriate classes, will not replace Fortran or GAUSS,
it promises to be a valuable third leg for developers.

3. A QUICK LOOK AT C++ AND MATCLASS

An extensive evaluation of C++ cannot be given here, but a few words are in order.
C++ is a wide spectrum extension of C that allows the programmer to combine low-
level and object-oriented styles in the same program. In part C++ can be viewed as
a better and extended version of the C programming language. From this perspective
C++ is seen to inherit from C its efficiency, flexibility and portability. As a better C,
it offers better type checking and the opportunity to escape some of the danger points
of its parent language e.g. defined constants and macros. In the context of numerical
methods, the expectation is that C++ will allow a more highly structured approach
than its parent language while retaining its efficiency. A matrix extension to C++
promises to give even greater functionality in carrying out matrix calculations. In
particular it offers greater security than C and can incorporate the methods needed
to overcome the technical weaknesses in C's handling of multi-dimensional arrays.
Such extensions to a standard general purpose language offer greater openness and
portability than characterize proprietary packages such as GAUSS.
C++ is a general purpose language that does not have an intrinsic matrix type,
but its support for classes allows users to define their own types that have the same
privileges as intrinsic types. With a C++ matrix class, the user has a system that
combines the general power and flexibility of an increasingly popular general purpose

language with the convenience offered by a matrix syntax. It is to be stressed that


MatClass is not just for the experienced C++ programmer. Indeed, the combination
of MatClass with friendly and sophisticated C++ development environments, such
as that supplied by Borland, can be used as an introduction to programming. Those
with some previous programming experience will be able to exploit the potential of
a fully compiled language that offers a highly structured matrix syntax.
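As a purely illustrative sketch, far simpler than MatClass itself and not its source code, the following shows how a user-defined class acquires a matrix syntax through operator overloading; the class name Mat and its members are invented for the example.

    #include <cstddef>
    #include <vector>

    class Mat {
        std::size_t r_, c_;
        std::vector<double> a_;
    public:
        Mat(std::size_t r, std::size_t c) : r_(r), c_(c), a_(r * c, 0.0) {}
        double& operator()(std::size_t i, std::size_t j)       { return a_[i * c_ + j]; }
        double  operator()(std::size_t i, std::size_t j) const { return a_[i * c_ + j]; }
        std::size_t rows() const { return r_; }
        std::size_t cols() const { return c_; }
    };

    // Matrix product written once, then usable as naturally as arithmetic on doubles.
    // Assumes conforming dimensions (A.cols() == B.rows()).
    Mat operator*(const Mat& A, const Mat& B)
    {
        Mat C(A.rows(), B.cols());
        for (std::size_t i = 0; i < A.rows(); ++i)
            for (std::size_t k = 0; k < A.cols(); ++k)
                for (std::size_t j = 0; j < B.cols(); ++j)
                    C(i, j) += A(i, k) * B(k, j);
        return C;
    }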
C++ uses classes to allow the user to extend the range of objects supported by the
language and to facilitate structured programming methods and the use of abstract
data types. Instances of classes are considered to be abstract objects whose internal
structure need not be considered in order to successfully exploit their behaviour.
For example, the typical user of MatClass need not be concerned with the detailed
structure of a matrix object as long as it behaves properly e.g. they can be multiplied
or added. From this perspective, it is usual to view a program as a sequence of
instructions to objects to undertake various acts, e.g. print themselves to a file.
MatClass is largely made up of such object methods, so its user must get used to the
syntax of object methods rather than the more traditional functional or procedural
languages. Thus consider the use of matrix decompositions in MatClass. While
one can solve equations with a "projection" operator, there are several classes of
objects that encapsulate the services of matrix decompositions. To form the LU
decomposition of a matrix, for example, one constructs a luDec object L from the
matrix A with the statement luDec L(A). Thereafter L is an LU decomposition of A
which can be used to solve the equations Ax = b with the statement L.solve(b) or
to estimate the condition number of A with L.cond(). Objects should be viewed,
then, as active, not passive data structures. Indeed objects are a mixture of data and
functions that offer the user various services.
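A sketch of this style of use follows. Only the names luDec, solve and cond are taken from the text; the header name, the matrix class name and the element-access syntax are assumptions made for the illustration and may differ from the released MatClass.

    #include "matclass.h"   // hypothetical header name

    int main()
    {
        matrix A(3, 3), b(3, 1);          // assumed matrix class and constructors
        // ... fill A and b, e.g. A(1, 1) = 2.0; b(1, 1) = 1.0; ...

        luDec L(A);                        // L encapsulates the LU decomposition of A
        matrix x = L.solve(b);             // solve A x = b using the stored factors
        double kappa = L.cond();           // estimate of the condition number of A

        // x and kappa can now be used; further solves with the same A reuse L.
        return 0;
    }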
While the ability to encapsulate an abstract data type, such as a matrix, is useful,
our main interest in C++ is its support for inheritance. The user defined classes in
C++ can inherit properties from previously defined classes allowing a hierarchy of
object types. This is the key to object-oriented programming, to which we turn in the
next section.
MatClass aims to lay the foundation for serious scientific work in C++ by being
open, extensible and portable. It is open in the sense that the source code will be
placed in the public domain. MatClass is extensible in the sense that the programmer
can define his own classes of objects that inherit properties of the intrinsic classes
and exploit the object and error management facilities of MatClass. Further user
written extensions will be as efficient as MatClass intrinsics, this being in contrast
to interpreted or semi-compiled systems such as GAUSS. MatClass is portable in
that it is written solely in C++, involves no machine code or dependencies, and cur-
rently compiles correctly with PC compilers from Microsoft, Borland, Glockenspiel,
Zortech and JPI, as well as Hewlett Packard's version of AT&T's cfront on their HP
Series 700 Unix workstations. This portability gives the user the ability to move
programs on to the system best suiting his needs. While MATLAB has been seen as
an ideal system for prototyping algorithms before translating them to FORTRAN, the
distinction between prototyping and development is blurred with MatClass. While
users of GAUSS tend to see it as a self contained system isolating them from the
dreaded compilers, it is not clear that sufficient consideration has been given to the
deficiencies of GAUSS and to the attractiveness of C++ as a developer system.
See section 9 for a discussion of the relative runtime efficiencies of GAUSS and
MatClass.
One further aim of MatClass is to give the user ready access to state of the
art numerical methods. The current structure of MatClass reflects an interest in
implementing the key matrix decompositions for solving equations and least-
squares problems. Thus, the two main families of classes, matDec and matOls, are
effectively "packages" built around the LU, Cholesky, Householder QR, and singular
value decompositions. Key references are Press et aI's Numerical Recipes in C [13]
and Golub and Loan's Matrix Computations [7] . Under the influence of the numerical
methods literature some emphasis is placed on condition numbers, and both families
give the user some form of estimating the condition number. In the same spirit,
MatClass aims to give full support to singular-value (SVD) methods, particularly
for least squares problems. Singular values not only have a ready interpretation as
measures of "near singularity", but the SVD algorithms are numerically superior
when the problem is effectively rank deficient. SVD methods have not been given
the role in the econometrics literature they deserve, particularly when discussing rank
and multicollinearity. The SVD should be readily available to all econometricians
and even if it is felt that the computational costs of SVD are too high for routine
work, it should nevertheless be possible to estimate the conditioning of the problem
reliably, regardless of the underlying algorithm. If ill-conditioning is suggested then
the worker can switch to SVD methods - or reconsider the structure of the model!
It has to be noted that from this perspective GAUSS scores highly when compared
with standard statistical packages.
In summary MatClass aims to provide services similar to those of GAUSS and
MATLAB by giving the user a matrix syntax with access to state of the art numerical
methods. What distinguishes MatClass from these others is that these services are
provided in the framework of a compiled object-oriented environment and the full
source code is to be placed in the public domain, a combination that allows third
parties to correct and improve the code without runtime penalties. The object-oriented
nature of the underlying compiler also allows a hierarchy of services to be developed
in an efficient and systematic manner. To add new features to the system, the user
does not have to rewrite the existing "package", his new offerings simply inherit and
build on the features of the existing system. It is this aspect of MatClass to which we
now turn.

4. OBJECT-ORIENTED NUMERICAL METHODS

Object-Oriented Programming Systems (OOPS) in general, and C++ in particular,


have received a good deal of attention in the computing world. Two important features
of OOPS are encapsulation and inheritance. Encapsulation refers to the ability to
hide much of the detail of a data structure or an operation. While this notion has long
been behind the structured approach to programming, its usage is greatly facilitated
by OOPS. Inheritance refers to the ability of a new object type to inherit the properties
of an existing type. When adding a new, or replacing an old, feature of an existing
class, one simply extends or modifies rather than rebuilds the existing classes. In this
way a layered structure of classes can be built on top of the fundamental management
classes and the basic numerical procedures, allowing users to choose their own
level of access. These features are important in the development of complex and
co-operative software, particularly when the development involves more than one
programmer. It will be argued that C++ with appropriate classes promises to be an
efficient, highly structured environment in which to develop facilities and packages
either individually or jointly.
The essential element of an object-oriented programming systems is its support
for classes of objects that can encapsulate data types and inherit properties. To
understand these concepts better and appreciate the arguments favouring their use,
the reader is referred to Grady Booch's Object-Oriented Design with Applications [1].
Booch argues that the use of OOPS offers significant improvements over the more
familiar structured programming approaches, particularly for complex "industrial-
strength" software. Booch's discussion is directed to software houses which have
to manage a team of programmers on a large project, but it is equally valid for a
community of professionals developing a common system.
Booch (p.77) offers the following definition of an object:
An object has state, behaviour, and identity; the structure and behaviour of
similar objects are defined in their common class; the terms instance and object
are interchangeable.
An object state will include not only the external state as perceived by the user of the
object but also an internal state that reflects the details of the class implementation.
For example, the external state of a matrix includes its dimensions (number of
rows and columns) and the contents of its elements. In MatClass the internal state
includes a 'map' which is used to translate references to matrix elements into memory
addresses.
As part of the behaviour of a matrix we would certainly include the ability to be
used in arithmetic expressions, e.g. be added to or multiplied by other, conformable
matrices. From the perspective of the user of a class, it is important that objects
in the class behave in a valid and readily understood manner. The user should not
typically have to consider the details of the implementation in order to understand the
usage of the class - although occasionally efficiency may demand considered usage
of a class. Indeed, it is highly desirable that the details be hidden from the users,
restricting their access to a well defined interface. Such data and method hiding,
i.e. encapsulation, allows the developer of the class to modify the implementation
without upsetting other modules or user programs. For example, MatClass allocates
storage for matrices column by column. It is relatively straightforward to modify
a few key procedures to allocate the whole matrix in a single contiguous block of
memory without changing the way in which matrices are used. Users of the class
would continue to address individual elements of a matrix in the same manner, i.e.
A(i, j) would still refer to the (i, j)th element of the matrix A.
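A minimal self-contained illustration of this point in plain C++ (invented names, not the MatClass implementation): all element access goes through one accessor, so the private storage order could be changed without touching user code.

   #include <iostream>
   #include <vector>

   // Illustrative: the public interface exposes A(i, j) only; whether the data
   // are kept column by column (as here) or in some other layout is a private
   // detail that could be changed without affecting user programs.
   class SimpleMatrix {
   public:
       SimpleMatrix(int rows, int cols)
           : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
       double& operator()(int i, int j)       { return data_[j * rows_ + i]; }
       double  operator()(int i, int j) const { return data_[j * rows_ + i]; }
       int rows() const { return rows_; }
       int cols() const { return cols_; }
   private:
       int rows_, cols_;
       std::vector<double> data_;   // column-major storage, hidden from users
   };

   int main() {
       SimpleMatrix A(2, 3);
       A(0, 2) = 7.0;                     // users address elements, not memory
       std::cout << A(0, 2) << "\n";      // prints 7
       return 0;
   }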
A more substantial example would be a change in the algorithm used to form
the LU decomposition of a matrix in the luDec class. MatClass comes with a luDec
class that encapsulates the idea of an LU decomposition. Essentially, a luDec's
internal state includes a matrix, a pivotal map and a status value. Normally a luDec
object is used without explicit consideration of its state; rather the interest is in its
behaviour - in the way it makes it easier to exploit the use of the decomposition
to solve equations or estimate condition numbers. Having assigned a matrix to a
luDec object, the user can request the object to estimate the condition number, test
for singularity, and solve equations, assured that the object is handling the various
housekeeping tasks and is not repeating significant calculations. In short, a luDec
object encapsulates the numerical procedures that surround an LU decomposition.
To declare and use a luDec object is to choose a set of numerical procedures with the
common element of an LU decomposition. Currently, the luDec class uses a version
of the Crout algorithm with partial pivoting to form the decomposition. It would not
be difficult to replace this procedure with another variant of Gaussian Elimination
without changing the external behaviour or usage of the class.
OOPS goes beyond encapsulation with the concept of inheritance, and it is worth
trying to illustrate how this can be exploited when building numerical objects. Mat-
Class has an abstract class of matrix decompositions, matDec. This class is abstract in
that it does not actually encapsulate any decomposition method; it has purely abstract
decomposition and solution methods. The four concrete decomposition classes -
luDec, cholDec, qrhDec and svdDec - are derived from matDec, and all have actual
decomposition and solution methods. As classes derived from matDec, these meth-
ods replace the abstract methods of matDec. More interestingly they inherit from
matDec the ability to estimate condition numbers, solve multiple sets of equations,
and form matrix inverses - not to mention various internal states and management
functions. For example, a matDec object uses Hager's algorithm to estimate condi-
tion numbers. This method assumes the object to which it is applied knows how to
solve equations. Abstract matDecs cannot solve equations, but concrete matDecs in
the form of luDecs, cholDecs or qrhDecs can solve equations, and instances of these
three classes use the same matDec method to estimate the condition number. It has to
be stressed that the three concrete classes luDec, qrhDec and cholDec use the same
piece of code to estimate condition numbers even though they have different methods
for solving equations, and this common piece of code does not have to be aware ofall
the possible classes that it will be servicing. The condition-number method is being
applied to an object, and each object knows what solution method to use on itself.
Thus when the condition-number method needs to solve an equation, it effectively
requests the object to apply its solution method to that equation. In C++ parlance,
the solving of equations is a virtual method; the actual method is determined dynami-
cally at runtime by the specific object to which the method is applied. In this way the
ability to calculate condition numbers is inherited from the abstract matDec class by
the three classes luDecs, cholDec and qrhDec. And further, this condition-number
method can be inherited, without modification, by future derived classes that can
solve equations, or if need be, it can be overridden as it is in the svdDec class.
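The mechanism can be illustrated with a deliberately tiny, self-contained C++ sketch (invented names, not the MatClass source): the abstract base class provides a routine that calls a pure virtual solve step, so every concrete class inherits that routine unchanged while supplying its own solver.

   #include <iostream>

   // Illustrative analogue of the matDec idea: the base class owns a routine
   // (here a crude sensitivity estimate) that calls a virtual solve(), so every
   // derived class inherits it without duplicating code.
   class AbstractSolver {
   public:
       virtual ~AbstractSolver() = default;
       // Each concrete "decomposition" supplies its own way of solving a x = b.
       virtual double solve(double b) const = 0;
       // Shared service built on solve(): relative sensitivity of x to b.
       double sensitivity(double b, double db) const {
           double x  = solve(b);
           double x2 = solve(b + db);
           return ((x2 - x) / x) / (db / b);   // relative change ratio
       }
   };

   // A concrete "decomposition" of the scalar equation a * x = b.
   class ScaleSolver : public AbstractSolver {
   public:
       explicit ScaleSolver(double a) : a_(a) {}
       double solve(double b) const override { return b / a_; }
   private:
       double a_;
   };

   int main() {
       ScaleSolver s(4.0);
       std::cout << s.solve(8.0) << "\n";              // prints 2
       std::cout << s.sensitivity(8.0, 0.8) << "\n";   // prints 1 (linear map)
       return 0;
   }

Any future class derived from AbstractSolver that can solve gets sensitivity() for free, which is the essence of the matDec arrangement described above.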
As a second example, consider refMatrix, a concept that embodies the idea of a
reference matrix - an object that is used to reference the whole or part of some other
matrix. Thus, given a matrix A we can declare a refMatrix col to be used to refer to
columns of A. Having attached col to A, we can instruct col to reference a specific
column of A, say the ith, with col.refCol(i). Thereafter col will act as if it
were the ith column of A - at least until it is instructed to refer to some other part
of A. For example, the statement col = 1 would assign the value 1 to the elements
of the ith column of A. A reference matrix differs from a standard matrix in having
to contain a reference to its underlying matrix, for example col holds an internal
reference to A, and so it can be instructed to refer to various parts of that underlying
matrix, for example col could be used to step through the columns of A. Otherwise
a refMatrix inherits all the properties of a standard matrix. The refMatrix class is
derived from the matrix class and any instance of refMatrix is a matrix and can be
used wherever a matrix can be used.
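The idea can be sketched in self-contained C++ as follows (the names Mat and ColRef are invented for illustration and do not reproduce MatClass internals): a derived "reference" class is-a matrix, but its element access is redirected to a column of an underlying matrix, so any routine written for the base class also works on the view.

   #include <iostream>
   #include <vector>

   // Illustrative only: a base matrix with virtual element access, and a
   // reference matrix that merely aliases one column of an underlying matrix.
   class Mat {
   public:
       Mat(int rows, int cols) : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
       virtual ~Mat() = default;
       virtual double& at(int i, int j) { return data_[j * rows_ + i]; }
       int rows() const { return rows_; }
   protected:
       Mat(int rows, int cols, bool /*noStorage*/) : rows_(rows), cols_(cols) {}
       int rows_, cols_;
       std::vector<double> data_;
   };

   class ColRef : public Mat {              // a ColRef is a (one-column) Mat
   public:
       ColRef(Mat& base, int col) : Mat(base.rows(), 1, true), base_(base), col_(col) {}
       void refCol(int col) { col_ = col; }                  // re-point the view
       double& at(int i, int /*j*/) override { return base_.at(i, col_); }
   private:
       Mat& base_;
       int col_;
   };

   void fillOnes(Mat& m) {                  // works on any Mat, views included
       for (int i = 0; i < m.rows(); ++i) m.at(i, 0) = 1.0;
   }

   int main() {
       Mat A(3, 2);
       ColRef col(A, 1);
       fillOnes(col);                       // writes into column 1 of A
       std::cout << A.at(0, 1) << "\n";     // prints 1
       return 0;
   }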
The reader can no doubt imagine objects that are not matrices but which, on
occasion, should behave as matrices. In statistical applications, for example, it can
be useful to have the concept of a data set which can act like a matrix (of variables or
cases), but can also handle variable and/or case names and other sample information
such as missing values and periodicity. Likewise, a table is a useful output device
that may take the form of a matrix of numbers with column and row labels - think
of the standard table of estimated coefficients, standard errors and t-values. Here we
would wish to manipulate the numerical fields like a matrix while having additional
features for handling labels and for displaying the labels and the matrix.
It has to be stressed that the process of making one class inherit the properties of
another is simple and relatively automatic. Virtual methods allow this process to go
a stage further. With virtual methods a child can choose to override those methods
of the parent that are unsuitable for its needs, and the child need not inherit all the
features of the parent. For example, MatClass matrices can be reset in the sense that
their dimensions can be modified dynamically. This would not be appropriate for a
refMatrix, so the reset method is made virtual so that the refMatrix version can treat a
call to reset as a fatal error. Similarly the concrete svdDec class overrides its matDec
parent's condition number method.

5. THE MATCLASS FAMILY

The foregoing discussion has shown that the key to OOPS is the
implementation and use of appropriate classes of objects. Table 1 lists the classes
currently offered by MatClass.
MatClass aims both to smooth the path for new programmers in writing pro-
ductive code using C++ and to offer support for longer term development. To
assist the newcomer, MatClass offers a straightforward syntax that does not require
an understanding of the power and subtlety of C++. Thus MatClass assists with
the management of objects and errors, offers a simple input-output system as well
as a rich set of matrix intrinsics, supports the popular matrix decompositions, and
offers a family of OLS classes on which future extensions can be built. At the same
TABLE 1
Main MatClass classes.
Class Name Role

inFile Input files


outFile Output files

matObject Container class used for errors and lists


matFunc Function class used for errors and tracing

charArray Arrays of characters


indexArray Arrays of INDEXs
realArray Simple arrays of REALs
matMap Mappings of matrices into realArrays
matPair Iterator for matrices

matrix Central matrix class


refMatrix Matrices that refer to some other matrix

matDec Abstract class of matrix decompositions


luDec LU decomposition class
cholDec Cholesky decomposition class
qrhDec Householder QR decompositions
svdDec Singular Value Decompositions

matOls Abstract OLS class


olsChol OLS class based on Cholesky decomposition
olsQrh QR based OLS class using Householder reflections
olsSvd SVD based OLS class

matRandom Random number generators


matSpecFunc Abstract Special Function
logGammaFunc etc. Concrete Special Functions

time, the combination of a rich matrix class with an extensible object and error-
management system gives the more serious user a powerful development tool for
numerical research. It is likely that experienced programmers who have not used
object-oriented systems will suffer from some conversion pains - this author cer-
tainly did! Object-oriented programming styles are very different from traditional
procedural programming styles. Nevertheless the benefits are significant.
Alongside the central matrix class there is a number of general purpose classes
as well as higher level families of classes related to the concept of a matrix. One
TABLE 2
MatClass scalar types.
Type Name Role
INDEX Unsigned integer, indices
REAL Standard floating point
DOUBLE Double precision floating point type
matError Enumerated type for common errors

such family is based on various matrix decompositions, including the LU, Cholesky,
QR by Householder reflections and SVD decompositions. Furthermore, there is a
family of ordinary least squares (OLS) classes based on the Cholesky, QRH and SVD
decompositions. Finally there are classes to support random number generation and
the use of special matrix functions.
To support these central classes MatClass provides several underlying and sub-
sidiary classes of objects. Apart from a concept of a matrix, which closely emulates
the mathematical idea, MatClass gives support to scalar variables, including real
numbers and integers, character arrays, and arrays of indices. MatClass uses the
terms REAL and DOUBLE to define real variables. The INDEX type is an unsigned
integer and is used primarily for index variables that control the addressing of ma-
trix elements. The LONG is an unsigned long integer that is used where an INDEX
variable will overflow.
MatClass supports string constants and introduces a class of charArrays. A se-
quence of characters placed between double quotes, such as "this is a string",
constitutes a string constant. Variables of type charArray can be declared and used
to store strings of characters. For example, the statement charArray name(40)
declares a variable name that can store a maximum of 40 characters, although the
contents of a charArray variable may be less than its maximum length. These
charArrays are particularly useful when reading strings from files.
The classes realArray and matMap are largely for internal use and offer the end
user few services.
The use of classes to implement random number generators and special functions
may seem surprising to those familiar with traditional programming methods. But
objects need not simply be data structures; rather they are structures which, through
their methods, offer a set of services to the user. A matRandom object not only has a
state (the current state of the random generator) but also offers a number of methods to
fill a matrix with random numbers; having a class of these objects affords the user
access to several independent generators. Likewise, most of the special functions
are implemented as classes, for example, there is a class logGammaFunc that is the
basis of the logGamma function. These special function classes are derived from
the abstract class matSpecFunc that offers its children a number of services. Thus
all special matrix functions are driven by a procedure in the parent class. While the
logGammaFunc class provides the specific code to return the value of logGamma
'include "matmath.hpp"
'include "olssvd.hpp"
'include "olschol.hpp"

void results( matOlst ols, matrixt y, matrix! x )


{
const INDEX width • 10 ;
matrix beta, stderr, tvalues, resid, fitted, newl

ols.assign( y, x ) ;
ols.coeff( beta) ;
ols.stdErr( stderr )
tvalues • beta.divij( stderr )

out . newLine 0 ;
out( "Coeffs", width )( "Std Errors", width+2 ) ;
out ( "t-values", width+2 ) .newLineO ;
out( "----------", width )( "----------", width+2
out( "----------", width+2 ) ;
matFormat ( STACKED ) ;
matField( width )
( beta I stderr I tvalues ).put()
out . newLine 0 ;

out( "RSS )( ols.rss() ).newLine() ;


out( "TSS )( ols.tss() ).newLine() ;
out( "SE )( ols.se() ).newLine() ;
out( "RSQ )( ols.rsq() ).newLine() ;
out( "RBarSq )( ols.rBarSq() ).newLine()
out( "DW )( ols.dw() ).newLine() ;
out( "Cond )( ols.cond() ).newLine() ;

} II results
Fig. 1. Using OLS Classes Part A.

for some real argument, the matrix version of logGamma is essentially governed
by code in the parent class matSpecFunc. In this way special real-valued functions
are converted into matrix functions without undue duplication of requisite loops and
error checking.
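The pattern just described is a form of what is often called the "template method": the parent owns the loop and housekeeping, the child owns the scalar formula. A self-contained sketch in plain C++ (invented names, not MatClass source) looks like this:

   #include <cmath>
   #include <iostream>
   #include <vector>

   // Illustrative analogue of matSpecFunc: the abstract parent owns the loop
   // for applying a scalar function to every element; each concrete child
   // supplies only the scalar value.
   class SpecialFunc {
   public:
       virtual ~SpecialFunc() = default;
       virtual double value(double x) const = 0;     // child-specific scalar code
       // Parent-level driver: turns the scalar function into a "matrix" function.
       std::vector<double> apply(const std::vector<double>& m) const {
           std::vector<double> out(m.size());
           for (std::size_t k = 0; k < m.size(); ++k) out[k] = value(m[k]);
           return out;
       }
   };

   class LogGammaFunc : public SpecialFunc {
   public:
       double value(double x) const override { return std::lgamma(x); }
   };

   int main() {
       LogGammaFunc f;
       std::vector<double> m = {1.0, 2.0, 5.0};
       for (double v : f.apply(m)) std::cout << v << " ";   // 0 0 3.17805
       std::cout << "\n";
       return 0;
   }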

6. AN OLS EXAMPLE

Figures 1 and 2 list the two parts of a program illustrating the use of two OLS
classes, oIsSvd and oIsChoI, that are based on SVD and Cholesky decompositions,
main()
{
   INDEX M, N ;
   inFile data ;

   out( "\n\nTest of ols\n\n" ) ;

   data.open( "jsex3n.dat" ) ;
   data(M)(N) ;

   matrix X(M,N), c(M-1), x, Y(M), y, sv ;

   X.get( data ) ;
   Y.get( data ) ;

   X = ln( X ) ;
   Y = ln( Y ) ;

   c = 1.0 ;
   y = Y.smpl( 2, M ) ;
   x = c | X.smpl( 2, M ) | Y.smpl( 1, M-1 ) ;

   olsSvd ols1 ;
   out( "\nResults from olsSvd \n" ) ;
   results( ols1, y, x ) ;

   olsChol ols2 ;
   out( "\nResults from olsChol \n" ) ;
   results( ols2, y, x ) ;

   return 0 ;
} // main
Fig. 2. Using OLS Classes Part B.

respectively. The program is made up of two functions, resul ts and main. main
is always the entry point into a C++ program, so that execution starts with the lines
declaring the two INDEX variables Mand N. The main function then reads in a data set,
applies log transforms, and builds a regressand and a matrix of regressors including
a lagged regressand. Having created an olsSvd object olsl, it calls results to
print out some of the standard OLS results. This is repeated for an olsChol object
01s2. The output from the program is given in Figure 3.
This example illustrates some of the features of MatClass, but more interestingly it
also illustrates one of the uses of class hierarchies. Looking closely at the results
Results from olsSvd :

Coeffs Std Errors t-values


---------- ---------- ----------
0.06734 0.01951 3.451
-0.0145 0.01995 -0.7269
-0.02009 0.01986 -1.011
1.028 0.01281 80.22

RSS 0.05518
TSS 2.414e+04
SE 0.04795
RSQ 1
RBarSq 1
DW 0.7388
Cond 7.355

Results from olsChol

Coeffs Std Errors t-values


---------- ---------- ----------
0.06734 0.01951 3.451
-0.0145 0.01995 -0.7269
-0.02009 0.01986 -1.011
1.028 0.01281 80.22

RSS 0.05518
TSS 2.414e+04
SE 0.04795
RSQ 1
RBarSq 1
DW 0.7388
Cond 10.66

Fig. 3. OLS Output.

function in Figure 1, you will see that its first argument is declared to be a matOls, not
an olsSvd or olsChol. Despite C++'s being a strongly typed language, the compiler
does not treat the calls to results with objects of type olsSvd and olsChol as errors,
since these objects are matOls objects. Instances of derived classes are instances of
the parent class. We have a family of matOls classes, and any object in those classes
can be used where a matOls object is expected. Thus we only need to write one
results function for all existing and future matOls classes. Thinking in terms
of families of classes allows us to introduce a higher level of abstraction into our
programs, giving greater freedom for mixing and matching objects to specific needs.
Although not part of our example, the user can choose the specific class, and thus
algorithm, used by resul ts at runtime.
This example hints at what is possible when building a hierarchy of classes and
illustrates the fact that many of the benefits of inheritance can be exploited by stand-
alone functions such as results. Although MatClass does not currently offer
classes for nonlinear models, these features of C++ are expected to be particularly
valuable in this area.

7. ERRORS IN MATCLASS

MatClass maintains a stack of function calls and displays a list of the active functions
when an error occurs. For a function to appear on this stack, it must declare a variable
of type matFunc. All major functions in MatClass exploit this facility and thus it
should be possible to identify the sequence of calls that led to the error.
Furthermore an error message will normally induce MatClass to generate a list of
objects. MatClass objects arrange themselves in "levels" for lists; the matrix class
itself is at level 3. The MatClass object lists can be restricted to "high" level objects
through the matListCtrl function.
Unfortunately, MatClass has no access to the C++ user defined identifiers, unless
the program explicitly names objects (using the name method). We eschew the details
of the object lists here, simply noting that each object attempts to provide summary
information on its state.

8. TRACING PROGRAMS WITH MATCLASS

While most modern C++ compilers come with source level debuggers, and some
come with class browsers, I have found it useful to have MatClass-specific debugging
facilities. Tracing is controlled by setting the depth of the trace. The default level
is zero, which means no tracing takes place.
During a trace, each major MatClass function occurring at or below the trace depth
identifies itself by name. This is normally followed by information on the principal
object with which the function is working. Tracing can be switched on anywhere in
a program by setting a nonzero debug level and can be switched off by setting a zero
debug level. By default the debug output is written to the standard output file but can
be redirected to a disk file.
9. EFFICIENCY OF MATCLASS IN ACTION

This section compares GAUSS with the combination of MatClass and a C++ com-
piler as development systems. Currently GAUSS has advantages for those whose
needs are met by the various "add-ons" such as GAUSSX. MatClass cannot yet
satisfy these needs, but there is a commitment toward providing "higher" level capa-
bilities in future versions.
While the structure of GAUSS has advantages for "interactive" work and can
form the basis for significant research, it is not as well structured as C++.
In particular, it suffers from not being strongly typed and not offering object-
oriented facilities. As we have illustrated, the extensibility characteristic of
MatClass makes it superior to traditional languages such as GAUSS.
In assessing MatClass, one is also assessing in part the available compilers. The
current version of MatClass compiles and runs with all compilers available to the
author: Glockenspiel version v2.0 d2, Borland's C++ version 3.1, Microsoft
C/C++ v7, Zortech v2.1, JPI v1, and Hewlett-Packard's CC compiler provided
with their series 700 workstations. The Glockenspiel and HP compilers use the
AT&T Unix-based cfront, and so it is expected that the class will readily be
ported to this and related systems.
Compiler systems are varied and their appeal will differ according to the back-
ground of the user. Experienced programmers will appreciate using Glocken-
spiel and other command-orientated systems with familiar development tools.
We shall see that the Glockenspiel system produces very efficient code - in
fact the machine code is generated by Microsoft's C compiler. By contrast,
'integrated" environments, such as those offered by Borland, will satisfy those
who want the compilation process to be straightforward and relatively auto-
matic. Anyone who is using systems such as GAUSS to produce programs of
any significant size will find the move to this environment relatively painless
and productive. Not only is the compilation process easy, but you also have a
first class editor and debugger.4
The fact that MatClass works with these compilers suggests that any MatClass
code can be readily ported across operating systems. Glockenspiel, Zortech and
JPI have versions of their compilers for OS/2 - a much maligned system that
is better than DOS - and Borland is flagged by IBM to be a major player in the
future. All DOS based compilers offer support for MS-Windows and several
come with optional DOS extenders.
The reader should not view MatClass as an attempt to give the user a total and
final solution for his computational work; rather it is an attempt to provide one
development path based on the potential of C++. There are already alternatives,
for example M ++ from Dyad Software, and I am confident there will be others.
One price to be paid for using a C++ compiler rather than GAUSS's pseudo-
compiler5 is the relatively slow compilation. GAUSS has rightly gained a reputation
4 You also get command line versions of the compiler and make facility allowing you to
base your work on editors such as Brief or SPE.
5 GAUSS produces some form of pseudo-code rather than linkable machine object code.
TABLE 3
Timings in seconds for 90 x 90 multiplication
System Machine Compile Initial Multiply Code Size K
GAUSSv2 486 PC NA 3.02 0.77 NA
GAUSS386 486 PC NA 2.34 0.62 NA
Glockenspiel v2 486 PC 12.9 0.05 1.87 85
Microsoft v7 486 PC 7.6 0.06 2.03 80
Borland v3.1 486 PC 2.8 0.05 2.08 84

HPCC HP720 2.3 0.01 0.11 106

for its speed of execution, and for some tasks this can be a critical factor. But
it has to be understood that this feature is only true of its intrinsic operations
such as matrix multiplication. When using low level code such as addressing
individual elements of matrices, however, the interpretive nature of GAUSS
becomes highly inefficient. In any event the combination of MatClass with
a compiler system such as Glockenspiel competes well with GAUSS. Some
evidence on these issues is given below.
As a crude measure of relative efficiency Table 3 reports timings for the simple task
of multiplying two 90 by 90 matrices. Apart from forming the actual product, this job
involves initialising the two matrices with a nested pair of loops that step through the
rows and columns. This initialisation allows comparison of the relative efficiencies
in low-level access to the elements of the matrices.
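For readers who wish to repeat the experiment on current hardware, the following self-contained C++ sketch (our reconstruction of the benchmark's shape, not the original test program) initialises two N x N matrices element by element and then times a straightforward triple-loop product:

   #include <chrono>
   #include <iostream>
   #include <vector>

   // Rough re-creation of the benchmark described above: initialise two N x N
   // matrices element by element, then form their product, timing each phase.
   int main() {
       const int N = 90;
       std::vector<double> a(N * N), b(N * N), c(N * N, 0.0);

       auto t0 = std::chrono::steady_clock::now();
       for (int i = 0; i < N; ++i)                    // low-level element access
           for (int j = 0; j < N; ++j) {
               a[i * N + j] = i + j;
               b[i * N + j] = i - j;
           }
       auto t1 = std::chrono::steady_clock::now();
       for (int i = 0; i < N; ++i)                    // straightforward product
           for (int j = 0; j < N; ++j) {
               double s = 0.0;
               for (int k = 0; k < N; ++k) s += a[i * N + k] * b[k * N + j];
               c[i * N + j] = s;
           }
       auto t2 = std::chrono::steady_clock::now();

       std::cout << "initialise: "
                 << std::chrono::duration<double>(t1 - t0).count() << " s, "
                 << "multiply: "
                 << std::chrono::duration<double>(t2 - t1).count() << " s\n";
       return 0;
   }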
Two PC configurations were used, an Opus and a DAN. Both used a 80486 at
33Mhz with 8Mbs of RAM and a 64K cache. The DAN had a 2Mb hard disk
cache. The timings for the GAUSS runs were based on the Opus, while those
for the C++ compilers were based on the DAN. The presence of the hard disk
cache significantly improved the compilation times. Both machines used DOS 5.
It is fair to say that a minimum requirement for making good use of C++,
and implicitly MatClass, under DOS is a machine based on the 80386DX or
80386SX with a reasonable hard disk.
The timings for the HP CC runs were based on a Hewlett-Packard 720 worksta-
tion running a PA-RISC processor at 50Mhz, with 128k of instruction cache, 256k
of data cache, 32 megabytes of RAM, and running HP-UX. The compilation
and execution of the programs was done from the shell of GNU Emacs. Without
placing too much weight on one set of timings, these so-called "Snakes" are
truly impressive and bear witness to the possibilities offered by RISC. Some
readers may question the value of raw speed without general productivity tools
or may not see Unix workstations as their preferred platform, but I have found
the system with CC, GNU Emacs, TEX and MatClass to be highly productive. As
the prices of Unix workstations and higher level PCs continue to fall, machines
of this calibre will soon be the standard for research. With such configurations
the significance of compiler and linker lags largely disappears. Note that the total time
for the HP is less than the total for GAUSS386. For more computationally demanding tasks
the sheer processing power of the HP is even more important. All in all, it is
suggested that the ability to port code from PCs to powerful RISC systems is a
significant advantage of MatClass.
All compilers were set for speed optimisation - in the case of Glockenspiel and
Microsoft the options -Ox -Op were used. Glockenspiel translates C++ into
C and then calls the Microsoft C v7 compiler to generate machine code. The
full optimisation option is the -Ox -Op option for the C compiler. In the case of
Borland, the command line compiler was used with the options -O2.
The reported compilation times are averages of 5 runs clocked with a stop watch.
No attempt was made to time GAUSS' compilation, partly because there is no
clear indication when the compilation is complete and partly because it seems to
be negligible for this small job. The timings for initialisation and multiplication
were generated directly by the code using calls to the system clock.
The execution times for the multiplication suggest that the advantages of
GAUSS' assembler code are real but not overwhelming when compared with
modern day optimising compilers. Although the performance of the GAUSS
intrinsics on the 486 is particularly impressive, the times for the Glocken-
spiel/Microsoft combination are equally so for code written in an object-oriented
system. Given the advantages of developing code in a higher level language the
argument for dropping down to assembler is not strong. It has to be stressed
that the implementation of matrix multiplication in MatClass does not involve
any fancy tricks.
Comparing Glockenspiel and GAUSS confirms the trade-off between compi-
lation times and execution speeds for low-level code. Consideration of the
initialisation times suggests the running of low-level code in GAUSS is very
slow when compared to fully compiled code. The conclusion is that GAUSS
gives good execution times as long as the main computations can be completed
using its intrinsics, but if one needs any low-level code, then GAUSS will not
give efficient run times. In particular, it is not feasible to substitute your own
GAUSS routines for GAUSS intrinsic functions without severe loss of effi-
ciency. In MatClass there is no runtime penalty for replacing or adding new
low-level functions.
The Glockenspiel V1.2 and Microsoft C V5.1 combination previously set the
standard for quality compilation of C++ on PCs. The consistency of this
combination has been impressive and the computational performance first class.
While version 2 of Glockenspiel still generates fine code and has the advantage
of being a port of cfront, the native compilers such as Microsoft v7 and Borland
v3.1 have developed into fine products.
Turning to the performance of the Borland compiler, it can be seen that the tim-
ings are almost as good as Microsoft's. In the past the floating-point performance
of the Borland compilers has not been impressive. With this compiler they have
caught up. Furthermore their development environment is easier to use than
Microsoft's workbench - although I have been able to configure the latter better
for my needs.
A comment may be offered on the sizes of compiled code. There is clearly a
danger that significant applications based on MatClass will soon run into memory
difficulties on PCs running DOS. There are a number of ways of overcoming this
limitation. Most of the compilers have a version that supports, even if it does
not provide, a DOS extender, and this is likely to be the best solution. And there
are smart linkers which do not link redundant code; for example the JPI DOS
compiler's version of the multiply program was approximately 48K, half the
size of the others. One interesting possibility is promised by the new Microsoft
C/C++ version 7 compiler that allows the use of p-code for modules that
are non time-critical. And of course there are overlay systems like Borland's
"VROOM"! The size of the HP code is not vastly greater than the PC code
despite it being based on RISC.

10. SYSTEM REQUIREMENTS

As a C++ class, the essential ingredient for the use of MatClass is a computer
running a C++ compiler. The rapid adoption of the language by the industry means
C++ is well supported on most popular systems, including PCs, Macintoshes, and
Unix workstations. It is likely that the minimal requirement for any significant use
will be a 386SX PC with a hard disk, or machines of comparable performance. The
author must confess to doing some of the original development of MatClass using
Glockenspiel on a humble domestic PC/XT clone, where compilation times were
something of an impediment; compilation was nearly impossible with later compilers. With the
current standard of a 486 machine with disk cache and graphics accelerator, however,
these compiler lags are rapidly becoming insignificant.

11. AVAILABILITY

The source code of MatClass is available from the author. The source itself is in the
public domain in the spirit of the Free Software Foundation; that is, the source code
for MatClass will be covered by a free licence which safeguards the free availability
of that code. My own experience with free software - Kermit, Emacs and TEX-
suggests that it works best when someone publishes, in a traditional book form, a
guide and manual. It is my intention to do this for MatClass and a draft of such a
text is available. This is also available in electronic form - in raw TEX, DVI, HP
or PostScript - but I retain the copyright to allow future publication. All of the
MatClass files are available from the UTS machine at the Manchester Computing
Centre using anonymous ftp to uts.mcc.ac.uk. Look in the pub subdirectory
for matclass. Alternatively the files can be supplied on disk directly from the author
at a cost to cover expenses.
As free software, there will be no commercial warranty. Nor can I offer free
support as a right. Clearly I wish to make MatClass useful and as correct as possible,
and toward this end I intend to respond to problems and bug reports as well as I
can within the available resources. I will try to support the use of MatClass with
Microsoft C/C++ v7 and Borland C++ Version 3.1 for DOS and HP's CC on their
series 700 workstations. I will also be able to give limited support to Glockenspiel
C++ Version 2.0d2 for DOS, but beyond that I can only make available pointers on
loading MatClass on other systems.

12. BIBLIOGRAPHY

The full use of MatClass requires some understanding of C++. Probably the best
overall introduction is Stanley B. Lippman's C++ Primer [10]. Beyond that, consider
Programming in C++ by Stephen C. Dewhurst and Kathy T. Stark [3]. Look to
Lippman for further information on control structures such as for loops and if
statements and for the general structure of functions and methods in C++. Dewhurst
and Stark give an excellent introduction to the potential offered by C++'s object
orientation. To go further yet see the excellent Advanced C++ Programming Styles
and Idioms by Coplien [2].
The two "bibles" on C++ are authored by the originator of the langauge, Bjarne
Stroustrup. The C++ programming Language[14] was the effective definition of
the first version of the language. With coauthor Margaret Ellis, The Annotated C++
Reference Manual [6] is a draft of an ANSI standard definition of the second version
of the language. Neither of these texts is for the casual reader.
The author must acknowledge the influence of Bruce Eckel's Using C++ [5] and
Scott Ladd's C++ : Techniques and Applications [9].
See Booch [1] and Meyer [11] for a discussion of object-oriented software design.

REFERENCES

1. G. Booch. Object-Oriented Design with Applications, Benjamin/Cummings, Redwood,


1991.
2. J.O. Coplien. Advanced C++ Programming Styles and Idioms, Addison-Wesley, Read-
ing Mass., 1992.
3. S.C. Dewhurst and K.T. Stark. Programming in C++, Prentice Hall, Englewood Cliffs,
1989.
4. J.J. Dongarra, C.B. Moler, J.R. Bunch and G.W. Stewart. LINPACK User's Guide, SIAM,
Philadelphia, 1979.
5. B. Eckel. Using C++, Osborne McGraw-Hill, Berkeley, 1989.
6. M.A. Ellis and B. Stroustrup. The Annotated C++ Reference Manual, Addison-Wesley,
Reading Mass., 1990.
7. G.H. Golub and C.F. van Loan. Matrix Computations, Johns Hopkins, Baltimore, 1989.
8. P. Griffiths and I.D. Hill (editors). Applied Statistics Algorithms, Ellis Horwood, 1985.
9. S. Ladd. C++ Techniques and Applications, Prentice-Hall, New York, 1990.
10. S.B. Lippman. C++ Primer, First Edition, Addison-Wesley, Reading Mass., 1989.
11. B. Meyer. Object-oriented Software Construction, Prentice Hall, UK, 1988.
12. W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling. Numerical Recipes,
Cambridge University Press, Cambridge, 1986.
13. W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling. Numerical Recipes in C,
Cambridge University Press, Cambridge, 1988.
14. B. Stroustrup. The C++ Programming Language, First Edition, Addison-Wesley, Read-
ing Mass., 1986.
ISMAIL CHABINI, OMAR DRISSI-KAITOUNI AND MICHAEL FLORIAN

Parallel Implementations of Primal and Dual Algorithms
for Matrix Balancing

ABSTRACT. We report the parallel computing implementations of a primal projected gradient
algorithm and the classical RAS dual algorithm for matrix balancing. The computing plat-
form used is a network of Transputers which is suitable for coarse grained parallelization of
sequential algorithms. We report computational results with dense matrices of dimension up
to 315 x 315 and 100,000 nonzero variables.

1. INTRODUCTION

The recent development and use of computing platforms based on parallel processing
architecture has had a major impact on many fields of science and economics that re-
quire intensive computations (Bertsekas and Tsitsiklis, 1989; Zenios, 1989; Pardalos
et al., 1990). The efficient use of various parallel computing architectures requires the
development of new code that takes advantage of the development tools and compil-
ers that are available. In this paper, we report on parallel implementations of primal
and dual algorithms for matrix balancing problems on a network of Transputers.
This is probably one of the least costly Multiple Instruction Multiple Data (MIMD)
(Flynn, 1972) parallel computing platforms available and is best suited for "coarse
grain" parallelization of sequential codes. Nevertheless, our experiments indicate
that significant gains in speed and efficiency accompany its use when compared to
the execution of the sequential code on a single Transputer.
This paper is organized as follows. In the following Section, the matrix balancing
problem is described. Sections 3 and 4 present primal and dual algorithms for this
problem. Then, the parallel versions of these algorithms are described in Section 5.
The computational results obtained and their evaluation are given in Section 6. Our
conclusions and views on further development of parallel computing implementations
of matrix balancing problems comprise the last Section.

2. THE MATRIX BALANCING PROBLEM

The matrix balancing problem that we consider can be defined as follows: given an
n x m nonnegative matrix g^0, supply and demand vectors O and D of dimensions n
and m, respectively, an n x m matrix g* is sought that satisfies

\sum_{j=1}^{m} g^*_{ij} = O_i,    i = 1, ..., n,                    (1)

\sum_{i=1}^{n} g^*_{ij} = D_j,    j = 1, ..., m,                    (2)

g^*_{ij} \ge 0,    i = 1, ..., n;  j = 1, ..., m,                    (3)

\left( \sum_{i=1}^{n} O_i = \sum_{j=1}^{m} D_j = T \right).

This problem occurs frequently in economics, transportation planning, statistics,


demographics and image reconstruction. A good survey of applications of matrix
balancing may be found in Schneider and Zenios (1990).
Many algorithms have been constructed for matrix balancing problems. These
may be viewed as primal or dual approaches for solving (1)-(3). The primal algorithm
that we present in this paper is new and is based on an analytical gradient projection.
It is well known that (1)-(3) are equivalent to the following entropy optimization
problem:
n m

Min F(g) = L L % (In g¥ - I) (4)


i=1 j=1 gtJ

subject to (1) and (2).


The solution g_{ij} is nonzero; hence there is no need to state nonnegativity
constraints on [g].
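A standard observation, implicit in the biproportional scaling used by the RAS method in Section 4, follows from the first-order conditions of (4). Attaching multipliers u_i and v_j to (1) and (2),

\frac{\partial}{\partial g_{ij}} \left[ F(g) - \sum_{i} u_i \Big( \sum_{j} g_{ij} - O_i \Big) - \sum_{j} v_j \Big( \sum_{i} g_{ij} - D_j \Big) \right] = \ln \frac{g_{ij}}{g^0_{ij}} - u_i - v_j = 0,

so that g_{ij} = g^0_{ij} e^{u_i} e^{v_j} = R_i g^0_{ij} S_j with R_i = e^{u_i} and S_j = e^{v_j}; this is exactly the scaling form recovered in Step 4 of the RAS algorithm below.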

3. THE PRIMAL ALGORITHM

The gradient projection method (Luenberger, 1984) is a primal nonlinear program-


ming algorithm that has not been adapted so far for the problem (4), (1)-(2). Here
we give its adaptation to the matrix balancing problem as well as some details of the
proofs.
Consider X = [Xij] to be an n x m matrix satisfying the conservation of flow
equations
m

-L Xij = 0, i = I, ... ,n,


j=1
n
(5)
L Xij = 0, j=I, ... ,m-l,
i=1

where the constraint \sum_{i=1}^{n} x_{im} = 0 has been removed, since the full system has rank
(n + m - 1), so that constraint is redundant. By using the lexicographic ordering of the subscripts

i and j, we can convert the n x m matrix X into an n . m vector x. Let y be the


orthogonal projection of x on the null space of the constraints (1), (2) defined by (5).
y is the solution to the problem

\min \tfrac{1}{2} \| y - x \|^2                    (6)


subject to (5), which may be restated as

Ay = 0,                    (7)
where A is the node-arc incidence matrix with destination m removed.
The K-K-T necessary and sufficient conditions for (6)-(7) are
y - x + A^T \lambda = 0,
                                                                    (8)
Ay = 0.
It is relatively easy to show that the vector of dual variables \lambda associated with the
constraints of (7) is given by the formula

\lambda = (AA^T)^{-1} A x.                    (9)
Drissi (1991) was the first to find an explicit analytical expression for (AA^T)^{-1} in
order to compute \lambda. In the following, we develop another method for solving (9),
which has the advantage of showing why (9) is easy to solve for bipartite networks.
A byproduct of this analysis is an explicit expression of y as a function of x which is
easy to manipulate. We present first some preliminary results.
Consider the following linear system:

\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} =
\begin{bmatrix} B_1 \\ B_2 \end{bmatrix},                    (10)

where X_1, X_2, B_1 and B_2 are column vectors, and A_{11}, A_{12}, A_{21} and A_{22} denote
submatrices of appropriate dimensions.
The system (10) is equivalent to

Q X_2 = R,                    (11)

X_1 = A_{11}^{-1} (B_1 - A_{12} X_2),                    (12)

where Q = A_{22} - A_{21} A_{11}^{-1} A_{12} and R = B_2 - A_{21} A_{11}^{-1} B_1. If A_{11} is easily invertible,
then the solution of (10) is equivalent to the solution of (11), which is a linear system
of lower dimension.
Consider now (AA^T) of (9). It may be expressed as

(AA^T) = \begin{bmatrix} m I_{n,n} & -U_{n,m-1} \\ -U_{m-1,n} & n I_{m-1,m-1} \end{bmatrix},                    (13)

where I is the identity matrix and U is the matrix whose entries are all equal to one.
Hence solving (9) is equivalent to finding a solution to the system

\begin{bmatrix} m I_{n,n} & -U_{n,m-1} \\ -U_{m-1,n} & n I_{m-1,m-1} \end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} =
\begin{bmatrix} [Ax]_1 \\ [Ax]_2 \end{bmatrix}.

The indices belonging to the set 1 correspond to the origins and those belonging to set 2
to the (m - 1) destinations.
In the linear system (11), the matrix Q is given by

Q = n I_{m-1,m-1} - (-U_{m-1,n}) \left( \tfrac{1}{m} I_{n,n} \right) (-U_{n,m-1}),

Q = n I_{m-1,m-1} - \tfrac{1}{m} U_{m-1,n} U_{n,m-1},

Q = n I_{m-1,m-1} - \tfrac{n}{m} U_{m-1,m-1},

Q = n \left( I_{m-1,m-1} - \tfrac{1}{m} U_{m-1,m-1} \right),

Q = n \begin{bmatrix} 1 - \tfrac{1}{m} & -\tfrac{1}{m} & \cdots \\ -\tfrac{1}{m} & 1 - \tfrac{1}{m} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}.

We note that the columns of Q are permutations of the first one. This is due to
the topology of the network. Since it is a complete bipartite network, the different
destinations may be permuted without changing the "drawing" of the graph. Due to
its special structure, Q has an easy inverse.

Proposition 1

Q^{-1} = \frac{1}{n} \left( I_{m-1,m-1} + U_{m-1,m-1} \right).

Proof

Q Q^{-1} = n \left( I_{m-1,m-1} - \tfrac{1}{m} U_{m-1,m-1} \right) \tfrac{1}{n} \left( I_{m-1,m-1} + U_{m-1,m-1} \right),

Q Q^{-1} = I_{m-1,m-1} - \tfrac{1}{m} U_{m-1,m-1} + U_{m-1,m-1} - \tfrac{1}{m} U_{m-1,m-1} U_{m-1,m-1};

since U_{m-1,m-1} U_{m-1,m-1} = (m-1) U_{m-1,m-1}, it follows that

Q Q^{-1} = I_{m-1,m-1} + \tfrac{m-1}{m} U_{m-1,m-1} - \tfrac{m-1}{m} U_{m-1,m-1},

Q Q^{-1} = I_{m-1,m-1}.                                        \square

Proposition 2: The solution of (9) is given by the following expressions:

\lambda_{1i} = \frac{1}{nm} \sum_{k=1}^{m} \sum_{l=1}^{n} x_{lk} - \frac{1}{m} \sum_{k=1}^{m} x_{ik} - \frac{1}{n} \sum_{l=1}^{n} x_{lm},    i = 1, ..., n,                    (14)

\lambda_{2j} = \frac{1}{n} \left( \sum_{l=1}^{n} x_{lj} - \sum_{l=1}^{n} x_{lm} \right),    j = 1, ..., m.                    (15)

Proof
We evaluate R of (12) by using the above result:

R = [Ax]_2 - (-U_{m-1,n}) \left( \tfrac{1}{m} I_{n,n} \right) [Ax]_1,

R = [Ax]_2 + \tfrac{1}{m} U_{m-1,n} [Ax]_1.

Using (11) and the result of Proposition 1, it follows that

\lambda_2 = Q^{-1} R,

\lambda_2 = \frac{1}{n} \left( [Ax]_2 + U_{m-1,m-1} [Ax]_2 + \tfrac{1}{m} U_{m-1,n} [Ax]_1 + \tfrac{m-1}{m} U_{m-1,n} [Ax]_1 \right),

\lambda_2 = \frac{1}{n} \left( [Ax]_2 + U_{m-1,m-1} [Ax]_2 + U_{m-1,n} [Ax]_1 \right).

For a destination node j \in \{1, 2, ..., m-1\},

\lambda_{2j} = \frac{1}{n} \left( \sum_{l=1}^{n} x_{lj} + \sum_{k=1}^{m-1} \sum_{l=1}^{n} x_{lk} + \sum_{l=1}^{n} \left( -\sum_{k=1}^{m} x_{lk} \right) \right),

\lambda_{2j} = \frac{1}{n} \left( \sum_{l=1}^{n} x_{lj} + \sum_{k=1}^{m-1} \sum_{l=1}^{n} x_{lk} - \sum_{k=1}^{m-1} \sum_{l=1}^{n} x_{lk} - \sum_{l=1}^{n} x_{lm} \right).

We recall that the node j = m corresponds to the redundant constraint that was
dropped; its dual variable \lambda_{2m} = 0. The expression (15) holds for j = m as well.
Hence

\lambda_{2j} = \frac{1}{n} \left( \sum_{l=1}^{n} x_{lj} - \sum_{l=1}^{n} x_{lm} \right),    j = 1, ..., m.

In order to evaluate \lambda_1, the vector of dual variables (prices) associated with the source
nodes, one can apply (12) to obtain

\lambda_1 = \frac{1}{m} I_{n,n} \left( [Ax]_1 - (-U_{n,m-1}) \lambda_2 \right),

\lambda_1 = \frac{1}{m} \left( [Ax]_1 + U_{n,m-1} \lambda_2 \right).

Let i \in \{1, 2, ..., n\} represent an origin node; then

\lambda_{1i} = \frac{1}{m} \left( -\sum_{k=1}^{m} x_{ik} + \sum_{k=1}^{m-1} \lambda_{2k} \right).

Since \lambda_{2m} = 0,

\lambda_{1i} = \frac{1}{m} \left( -\sum_{k=1}^{m} x_{ik} + \sum_{k=1}^{m} \lambda_{2k} \right).                    (16)

By using (15) and replacing \sum_{k=1}^{m} \lambda_{2k} with \frac{1}{n} \sum_{k=1}^{m} \sum_{l=1}^{n} x_{lk} - \frac{m}{n} \sum_{l=1}^{n} x_{lm} in (16), one
obtains (14).                                        \square
Now, we can obtain an explicit expression for y as a function of x. The linear
system (8) implies that

y = x - A^T \lambda,

y_{ij} = x_{ij} - (-\lambda_{1i} + \lambda_{2j}).

By replacing \lambda_{1i} and \lambda_{2j} by their expressions (14) and (15), we obtain

y_{ij} = x_{ij} + \frac{\sum_{l=1}^{n} \sum_{k=1}^{m} x_{lk}}{nm} - \frac{\sum_{k=1}^{m} x_{ik}}{m} - \frac{\sum_{l=1}^{n} x_{lj}}{n},                    (17)

which proves the proposition below.

Proposition 3: The expression of y as a function of x is given by (17).                    \square

The primal algorithm that is implied by these results may be stated as follows:

Projected Gradient Algorithm

Step 0 (Initialization):

g^r_{ij} = \frac{O_i D_j}{T} for all (i, j);  r = 1.

Step 1 (Computation of the gradient):

\nabla F^r_{ij} = \ln \left( \frac{g^r_{ij}}{g^0_{ij}} \right), for all (i, j).

Step 2 (Computation of the projected gradient):

Apply equation (17) to obtain the projected gradient p^r.

Step 3 (Optimality test):

If \| p^r \| < \epsilon, STOP; otherwise continue to step 4.

Step 4 (Line search):

\alpha^r = \arg \min_{0 \le \alpha \le \alpha_{max}} F(g^r - \alpha p^r)

(The line search is performed with a Newton method.)

Step 5 (Update variables):

g^{r+1} = g^r - \alpha^r p^r,
r = r + 1 and return to step 1.
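To see the projection step in executable form, the following self-contained C++ sketch (our illustration, not the authors' code) applies formula (17) to a small matrix stored row by row; the result has zero row and column sums, as required by (5).

   #include <iostream>
   #include <vector>

   // Compute y from x by equation (17):
   // y_ij = x_ij + (sum of all x)/(n*m) - (row sum i)/m - (column sum j)/n.
   std::vector<double> projectOntoNullSpace(const std::vector<double>& x, int n, int m) {
       std::vector<double> rowSum(n, 0.0), colSum(m, 0.0);
       double total = 0.0;
       for (int i = 0; i < n; ++i)
           for (int j = 0; j < m; ++j) {
               double v = x[i * m + j];
               rowSum[i] += v; colSum[j] += v; total += v;
           }
       std::vector<double> y(n * m);
       for (int i = 0; i < n; ++i)
           for (int j = 0; j < m; ++j)
               y[i * m + j] = x[i * m + j] + total / (n * m)
                            - rowSum[i] / m - colSum[j] / n;
       return y;
   }

   int main() {
       // 2 x 2 example; the projected matrix has zero row and column sums.
       int n = 2, m = 2;
       std::vector<double> x = {1.0, 2.0, 3.0, 5.0};
       std::vector<double> y = projectOntoNullSpace(x, n, m);
       for (int i = 0; i < n; ++i) {
           for (int j = 0; j < m; ++j) std::cout << y[i * m + j] << " ";
           std::cout << "\n";
       }
       return 0;
   }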

4. THE DUAL METHOD FOR MATRIX BALANCING

The classical method for solving (1)-(3) is a dual method that dates back to 1937
(Kruithof, 1937) and was generalized for general linear constraints by Bregman
(1967a, 1967b). It is used widely in transportation planning applications (Wilson,
1967; Evans, 1967; Robillard and Stewart, 1974). It is also known as the RAS
algorithm (Bachem and Korte, 1979), since it alternates between scaling rows and
columns, which is equivalent to premultiplying by R and postmultiplying by S to
obtain the balanced matrix (R and S are diagonal matrices). The algorithm may
be viewed as a coordinate ascent method for the Lagrangean dual of the entropy
optimization problem [Cottle, Duval and Zihan (1986), Schneider and Zenios (1989)].
The algorithm may be stated as follows:

RAS Algorithm

Step 0 (Initialization):

R^0_i = 1, i = 1, ..., n;  \tau = 1.

Step 1 (Balancing columns):

For each j = 1, 2, ..., m DO:

S^\tau_j = \frac{D_j}{\sum_{i=1}^{n} R^{\tau-1}_i g^0_{ij}}

Step 2 (Balancing rows):

For each i = 1, 2, ..., n DO:

R^\tau_i = \frac{O_i}{\sum_{j=1}^{m} S^\tau_j g^0_{ij}}

Step 3 (Stopping test):

If \max_{1 \le i \le n} \left| \frac{R^\tau_i - R^{\tau-1}_i}{R^\tau_i} \right| \le \epsilon and \max_{1 \le j \le m} \left| \frac{S^\tau_j - S^{\tau-1}_j}{S^\tau_j} \right| \le \epsilon, go to step 4;
otherwise set \tau = \tau + 1 and return to step 1.

Step 4 (Compute solution):

g^\tau_{ij} = R^\tau_i g^0_{ij} S^\tau_j, for all (i, j), and STOP.
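As a compact serial illustration of steps 0-4 (a C++ sketch of our own, not the authors' parallel implementation), the following program applies the RAS iteration to a small 2 x 2 problem:

   #include <cmath>
   #include <iostream>
   #include <vector>

   // Serial RAS sketch: scale columns then rows of g0 until the scaling
   // factors stabilise, then return g_ij = R_i * g0_ij * S_j.
   std::vector<double> ras(const std::vector<double>& g0,
                           const std::vector<double>& O,
                           const std::vector<double>& D,
                           double eps = 1e-10, int maxIter = 1000) {
       const int n = static_cast<int>(O.size()), m = static_cast<int>(D.size());
       std::vector<double> R(n, 1.0), S(m, 1.0);
       for (int it = 0; it < maxIter; ++it) {
           double change = 0.0;
           for (int j = 0; j < m; ++j) {                 // balance columns
               double s = 0.0;
               for (int i = 0; i < n; ++i) s += R[i] * g0[i * m + j];
               double Snew = D[j] / s;
               change = std::max(change, std::fabs(Snew - S[j]) / Snew);
               S[j] = Snew;
           }
           for (int i = 0; i < n; ++i) {                 // balance rows
               double s = 0.0;
               for (int j = 0; j < m; ++j) s += S[j] * g0[i * m + j];
               double Rnew = O[i] / s;
               change = std::max(change, std::fabs(Rnew - R[i]) / Rnew);
               R[i] = Rnew;
           }
           if (change <= eps) break;                     // stopping test
       }
       std::vector<double> g(n * m);
       for (int i = 0; i < n; ++i)
           for (int j = 0; j < m; ++j) g[i * m + j] = R[i] * g0[i * m + j] * S[j];
       return g;
   }

   int main() {
       // An ill-conditioned 2 x 2 example of the kind discussed later in the text.
       std::vector<double> g0 = {0.001, 1.0, 1.0, 1.0}, O = {2.0, 2.0}, D = {2.0, 2.0};
       for (double v : ras(g0, O, D)) std::cout << v << " ";  // approx. 0.0613 1.9387 1.9387 0.0613
       std::cout << "\n";
       return 0;
   }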

We have stated the RAS algorithm in a form chosen for both computational
efficiency and numerical stability. The standard statement of the algorithm may be
found, for instance, in Schneider and Zenios (1990).

5. THE PARALLEL IMPLEMENTATIONS OF THE PRIMAL AND DUAL ALGORITHMS

The parallel computing platform that we used is a network of 16 Transputers installed


in an Intel 286 personal computer operating under MS-DOS. This is probably one
of the least costly parallel computing platforms available. The Transputers that
we worked with are the T800 of Inmos/SGS Thompson, which have the following
characteristics:
• each has four communication links which may send and receive data at 10 or
20 Mbits/sec.
• each has a 32 bit CPU with 4 Kbyte of "on-chip" memory and 1 or
4 Mbyte RAM.
• each operates at 20 Mhz frequency and has a rating of 10 MIPS (Million Instruc-
tions/second).
• each has a floating point processor rated at 1.5 Mflop (Million floating point
operations per second).

A network of Transputers is configured by connecting their communication ports


two by two. The only constraint is that each Transputer may be connected to at
most four others. In order to configure the Transputers into a network we used a
Linkputer, which may be programmed to obtain all possible configurations of up to
32 Transputers, with the restriction that a Transputer may be connected to at most
four others. For the results presented in this paper we used 16 Transputers configured
as a binary tree for the primal method and as a ring for the RAS algorithm. We
used the EXPRESS library of Parasoft and the 3L Parallel Fortran 77 compiler as the
development environment.
In the parallel implementation of the projected gradient algorithm, the computation
of the gradient is subdivided over the processors. Each processor p
(1 <= p <= NPROC) computes a block (of columns) of \nabla F^r (Step 1) that cor-
responds to j \in J_p, where J_p is a subset of the indices j = 1, ..., m. Each
processor receives the data required for the computations at the beginning, and at
each step it computes its components of the gradient. Then each processor computes
\frac{1}{n} \sum_{i=1}^{n} \nabla F_{ij} for j \in J_p and \frac{1}{m} \sum_{k \in J_p} \nabla F_{ik} (Step 2). Each processor in the tree
receives the sums from its children and sends the partial sum to its father. The total
is obtained at the root, which broadcasts it to all the processors.
The line search (Step 4) requires the evaluation of the objective function for
various values of the step size \alpha. A typical evaluation is the partial sum

\sum_{i=1}^{n} \sum_{j \in J_p} \left( g^r_{ij} - \alpha p^r_{ij} \right) \left( \ln \frac{g^r_{ij} - \alpha p^r_{ij}}{g^0_{ij}} - 1 \right).

As in the previous steps, each processor p computes the part of the sum for indices
j \in J_p. The communication and summation is done as in step 2.
The coarse grain parallel implementation of the RAS algorithm consists in the
decomposition of the computations of steps 1 and 2 over the processors. In step 1,
processor p computes S_j for j \in J_p, where the sets J_p form a partition of the columns
j = 1, 2, ..., m and have approximately the same cardinality. The communication
of the partial results is done as follows: processors exchange results by successively
sending and receiving information to and from their neighbors, in a ring network
topology (Bertsekas and Tsitsiklis, 1989).

Substep 0: each processor p sends S_{J_p} to processor p + 1.

Substep 1: (NPROC - 1) times, each processor p receives S_{J_{p'}} from proces-
sor p - 1 and sends S_{J_{p'}} to processor p + 1.

The same approach is taken to partition the computation of the Ri in step 2 of the
algorithm.
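The exchange pattern can be made explicit with a short serial simulation (C++, our illustrative sketch; the real implementation uses message passing between Transputers): each "processor" forwards, in every round, the block it received in the previous round, so that after NPROC - 1 rounds every processor holds all the blocks. The 4-processor, 8-column setting matches the example that follows.

   #include <iostream>
   #include <vector>

   // Serial simulation of the ring exchange: each "processor" starts with its
   // own block of S values and, over NPROC - 1 rounds, forwards the block it
   // received in the previous round to its successor in the ring.
   int main() {
       const int NPROC = 4;
       // blocks[p] = the block of S values owned by processor p (two values each).
       std::vector<std::vector<double>> blocks = {{1, 2}, {3, 4}, {5, 6}, {7, 8}};
       // known[p] collects every block as it arrives at processor p.
       std::vector<std::vector<double>> known = blocks;
       std::vector<std::vector<double>> inTransit = blocks;   // substep 0: send own block

       for (int round = 0; round < NPROC - 1; ++round) {      // substep 1, repeated
           std::vector<std::vector<double>> received(NPROC);
           for (int p = 0; p < NPROC; ++p)
               received[p] = inTransit[(p - 1 + NPROC) % NPROC];  // receive from p - 1
           for (int p = 0; p < NPROC; ++p) {
               known[p].insert(known[p].end(), received[p].begin(), received[p].end());
               inTransit[p] = received[p];                     // forward it next round
           }
       }
       // After NPROC - 1 rounds every processor knows all 8 values.
       for (double v : known[0]) std::cout << v << " ";
       std::cout << "\n";   // 1 2 7 8 5 6 3 4
       return 0;
   }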
This is illustrated by the following example. Let the number of processors be 4
and the number of columns be 8. Each processor is assigned the computation of the
S_j's for two columns. Processor 1 sends the values of S_1 and S_2, processor 2 sends
the values of S_3 and S_4, and so on. Each processor then informs its neighbor of the
values received. For this example, processor 1 would send to processor 2 the values
of S_7 and S_8 that were received from processor 4, processor 2 sends the values of S_1
and S_2, and so on. This is illustrated in the figure below.

6. NUMERICAL RESULTS

We conducted our numerical experiments on problems with sizes not exceeding


315 x 315 in order to keep all the data in the RAM of each Transputer. The problems
are generated randomly by the following procedure:

g_{ij} are randomly generated from a uniform distribution on [1, 200].

O_i = \sum_{j=1}^{m} g_{ij}, \forall i;    D_j = \sum_{i=1}^{n} g_{ij}, \forall j.

g^0_{ij} = g_{ij} (1 \pm r_{ij}), where r_{ij} are randomly generated from a uniform
distribution on [0, 0.1].

The increases in speed obtained with the primal method are reported in Table 1.
The best increase is 11.55 and is obtained with 12 processors. The results obtained
with the dual algorithm are reported in Table 2. Here the best increase is 5.03 using
16 processors.
These results indicate that the primal method benefits more from coarse grain
parallelization. In addition, the convergence of the primal method is far superior to
the dual method on ill conditioned examples such as that taken from Robillard and
Stewart (1974). The data for the problem are

g^0 = \begin{bmatrix} 0.001 & 1 \\ 1 & 1 \end{bmatrix},    O = D = \begin{bmatrix} 2 \\ 2 \end{bmatrix}.

The exact solution is

x^* = \begin{bmatrix} \frac{2}{1+\sqrt{1000}} & \frac{2\sqrt{1000}}{1+\sqrt{1000}} \\ \frac{2\sqrt{1000}}{1+\sqrt{1000}} & \frac{2}{1+\sqrt{1000}} \end{bmatrix}
    = \begin{bmatrix} 0.06131... & 1.93869... \\ 1.93869... & 0.06131... \end{bmatrix}.

The RAS method produces the following solution after 150 iterations:

TABLE 1
Speedups obtained with the projected gradient algorithm.

                               Projected gradient method
NPROC                     2     3     4     6     8     10    12     14     16

Function + gradient
evaluations               2     3     4     5.98  8     10    12     14     16

Gradient projection       1.99  2.85  3.90  5.56  6.91  8.12  8.84   9.50   9.95

Line search               2     3     4     6     7.79  9.81  11.85  10.46  10.95

Total algorithm           2     3     4     6     7.69  9.63  11.55  10.46  10.98

TABLE 2
Speedups obtained with the RAS method.
Balancing method
NPROC 2 3 4 6 8 10 12 14 16

Speedup 1.74 2.34 2.81 3.52 4.00 4.38 4.64 4.84 5.03

(150) _ [ .0612 1.9387]


9 - 1.9387 .0613 '
while the projected gradient method produced the following solution after one (!)
iteration:
(I) = [ .06131 1.93869]
9 1.93869 .06131 .
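The behaviour reported for this example is easy to reproduce with a sequential sketch of the RAS (biproportional scaling) iteration; the following Python fragment is a simplified illustration, not the parallel Transputer implementation used in the experiments.

    import numpy as np

    def ras(g0, row_totals, col_totals, iters):
        """Plain RAS: alternately rescale rows and columns to match the prescribed totals."""
        g = np.array(g0, dtype=float)
        for _ in range(iters):
            g *= (row_totals / g.sum(axis=1))[:, None]   # row scaling factors
            g *= (col_totals / g.sum(axis=0))[None, :]   # column scaling factors
        return g

    # Robillard-Stewart example: all row and column totals equal to 2.
    g0 = np.array([[0.001, 1.0], [1.0, 1.0]])
    print(ras(g0, np.array([2.0, 2.0]), np.array([2.0, 2.0]), iters=150))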
A similar example is the following:

    g^0 = [ 0.001   0.001   1 ]
          [ 0.001   1       1 ]
          [ 1       1       1 ].

The corresponding primal and dual solutions are

    Projected gradient, g^(16) = [ 0.27053   0.02702   2.70245 ]
                                 [ 0.02702   2.70245   0.27053 ]
                                 [ 2.70245   0.27053   0.02702 ],
    RAS algorithm, g^(25) = [ 0.1680   0.0214   2.8105 ]
                            [ 0.0207   2.6333   0.3460 ]
                            [ 2.6224   0.3338   0.0439 ].
A comparison of the sequential and parallelized algorithms for one 315 x 315
problem is given in the table below. The times are the average seconds per iteration.

                      Projected Gradient            RAS method
                      Sequential    Parallel        Sequential    Parallel
Time per iteration    15.96         1.38            2.09          0.42

Most of the computational time for the primal method is needed in the line search,
8.71 sec. out of 15.96 sec. The time required for the projection is 1.70 sec. and for
the cost evaluation, 5.55 sec.
The ratio of the computational times, per iteration, of the parallelized algorithms
is 3.28 in favor of the RAS algorithm. Hence, if the primal algorithm is to be
competitive with the RAS method, it should converge to an acceptable solution in
less than an average of 3.28 iterations. The preliminary tests given in this paper
indicate that the primal algorithm converges faster than the RAS method for some
ill conditioned examples. We note that the stopping criteria for the primal and dual
methods are different. The pattern that we observe is that the RAS method obtains
an objective value very close to that of the primal algorithm, but the values of the
variables are relatively far from their optimal values.

7. CONCLUSION

We present in this paper, to the best of our knowledge, the first parallel computing imple-
mentations of matrix balancing algorithms on an MIMD computing platform. The
results obtained indicate excellent gains in speed for the primal algorithm, since the
computing tasks are relatively "coarse grained" and well suited for this architecture.
The gains in speed for the RAS algorithm may possibly be improved by an asyn-
chronous implementation in which each processor does not wait to obtain the most
"current" scaling factors before starting another balancing iteration. We intend to
report on further work in this area in future articles.

REFERENCES

1. Bacharach, M., "Biproportional matrices and input-output change", Cambridge Uni-


versity Press, 1970.
2. Bachem, A. and Korte, B., "On the RAS-Algorithm", Computing, 25, pp. 189-198,
1979.
3. Bertsekas, D.P. and Tsitsiklis, J.N., "Parallel and distributed computation, numerical
methods", Prentice Hall, Englewood Cliffs, New Jersey, 1989.
4. Bregman, L., "The relaxation method of finding the common point of convex sets and its
application to the solution of problems in convex programming", USSR Computational
Math. and Mathematical Phys., 7, pp. 200-217,1967.

5. Bregman, L., "Proof of the convergence of Sheleikhovskii's method for a problem with
transportation constraints", USSR Computational Math. and Mathematical Phys., 7,
pp. 191-204, 1967.
6. Cottle, R.W., Duval, S.G. and Zikan, K., "A Lagrangean relaxation algorithm for the
constrained matrix problem", Naval Research Logistics Quarterly, 33, pp. 55-76, 1986.
7. Drissi-Kartouni, O., "A projective method for bipartite networks and application to the
matrix estimation and transportation problems", Publication #766, Centre for Research
on Transportation, Universite de Montreal, 1991.
8. Evans, AW., "Some properties of trip distribution methods", Transportation Research,
4,pp. 19-36, 1970.
9. Evans, S.P. and Kirby, H.R., "A three-dimensional Furness procedure for calibrating
gravity models", Transportation Research, 8, pp. 105-122, 1974.
10. Flynn, M.J., "Some computer organizations and their effectiveness", IEEE Transactions
on Computers, C-21(9), pp. 948-960, 1972.
11. Furness, K.P., "Trip forecasting", Unpublished paper cited by Evans and Kirby, 1974.
12. Kruithof, J., "Calculation of telephone traffic", De ingenieur, 52, pp. 15-25, 1937.
13. Lent, A, "A convergent algorithm for maximum entropy image restoration, with a
medical X-ray application", SPSE Conference Proceedings, Toronto, Canada, 1976.
14. Pardalos, P.M., Phillips, A.T. and Rosen, J.B., "Topics in parallel computing in math-
ematical programming", Department of Computer Science, The Pennsylvania State
University, 1990.
15. Robillard, P. and Stewart, N.F., "Iterative numerical methods for trip distribution prob-
lems", Transportation Research, 8, pp. 575-582, 1974.
16. Schneider, M.H. and Zenios, S.A., "A comparative study of algorithms for matrix bal-
ancing", Operations Research, 38, 1990.
17. Wilson, A.G., "Urban and Regional Models in Geography and Planning", Wiley, New
York,1974.
18. Zenios, S.A., "Parallel numerical optimization: current status and an annotated bibliog-
raphy", ORSA Journal on Computing, vol. 1, No.1, 1989.
19. Zenios, S.A. and Iu, S.L., "Vector and parallel computing for matrix balancing", Annals
of Operations Research, 22, pp. 161-180, 1990.
20. Zenios, S.A., "Matrix balancing on a massively parallel connection machine", ORSA
Journal on Computing, 2, pp. 112-125, 1990.
PART FOUR

The Computer and Econometric Studies


ANNA NAGURNEY AND JUNE DONG

Variational Inequalities for the Computation of Financial Equilibria in the Presence of Taxes and Price Controls

ABSTRACT. In this paper we develop a financial model of competitive sectors in the presence
of policy interventions in the form of taxes and price ceilings. The model yields the equilibrium
asset, liability, and financial instrument price pattern. First, the variational inequality formula-
tion of the equilibrium conditions is derived and then utilized to obtain qualitative properties
of the equilibrium pattern. We then propose a computational procedure and establish conver-
gence results. The algorithm decomposes the large-scale problems into network subproblems
of special structure, each of which can then be solved exactly (and simultaneously) in closed
form. Numerical results are also presented to illustrate the algorithm's performance.

1. INTRODUCTION

In this paper we develop a framework for the formulation, analysis, and computation
of competitive financial models in the presence of policy interventions. The policy
interventions that are considered are taxes and price controls.
Financial theory since the seminal work of Markowitz (1959) and Sharpe (1970)
has been principally concerned with the portfolio optimization problem facing a
single sector. Here, by contrast, we focus on the competitive equilibrium problem in
which there are multiple sectors, each with its particular optimization problem. We
assume that each sector seeks to minimize its risk while maximizing its net return
in the presence of both taxes and price ceilings. Under the assumption of perfect
competition, each sector in the economy takes the instrument prices as given and
then determines its optimal composition of both assets and liabilities subject to an
accounting identity. In the model, the instrument prices serve as market signals which,
in turn, reflect the economic market conditions that state that if an instrument price at
equilibrium lies within the bounds, then the market for the financial instrument must
clear.
The theoretical developments in this paper are done using variational inequality
theory. We note that variational inequality theory has been used to study a plethora
of problems, including oligopolistic market equilibrium problems (cf. Gabay and
Moulin, 1980; Dafermos and Nagurney, 1987), spatial price equilibrium problems (cf.
Florian and Los, 1982; and Dafermos and Nagurney, 1984), and general economic
equilibrium problems (cf. Border, 1985; Dafermos, 1990; and Dafermos and Zhao,
1991). In addition, variational inequality theory has been used recently to investigate
the effects of policy interventions in the form of price controls in commodity markets


(cf. Nagurney and Zhao, 1991). These, however, have been partial equilibrium
models in which only a subset of agents/commodities has been treated. In this paper
we demonstrate that variational inequality theory can also be used to study policy
interventions in general equilibrium problems, in particular, in general financial
equilibrium problems.
The paper is organized as follows. In Section 2 we introduce the model, consisting
of multiple sectors and multiple financial instruments that can be held as assets and/or
as liabilities in the presence of taxes and price ceilings. The model postulates the
behavior of the sectors and, in equilibrium, yields the competitive asset and liability
holdings as well as the instrument prices. The variational inequality formulation of
the equilibrium conditions is derived and then used to study the qualitative properties.
We first show that a solution is guaranteed to exist and then establish uniqueness of
the equilibrium asset and liability pattern.
In Section 3 we propose an algorithm for the computation of the equilibrium
pattern. The algorithm is the modified projection method of Korpelevich (1977),
which is shown to converge for our model. The notable feature of this decomposition
algorithm is that it resolves the large-scale financial problem into simple network
subproblems of special structure, each of which can be solved simultaneously and
exactly in closed form using exact equilibration algorithms (cf. Eydeland and Nagur-
ney, 1989). In Section 4 we conduct numerical experiments with the algorithm
on a variety of examples. In Section 5 we summarize the results and present our
conclusions.

2. THE MODEL OF COMPETITIVE FINANCIAL EQUILIBRIUM WITH POLICY


INTERVENTIONS

In this section we develop a model of competitive financial equilibrium that permits


the incorporation of policy interventions in the form of taxes and price controls. The
behavior of the sectors is stated along with the market conditions for the financial
instruments. The equilibrium conditions are then formulated as a variational in-
equality problem. Finally, the qualitative properties of existence and uniqueness are
discussed.
We consider an economy consisting of m sectors, with typical sector i, and with
n instruments, with typical instrument j. The volume of instrument j held in sector
i's portfolio as an asset is denoted by Xij, and the volume of instrument j held in
sector i's portfolio as a liability by Yij. The assets in sector i's portfolio are grouped
into a column vector Xi E Rn, and the liabilities are grouped into the column vector
Yi E Rn. We further group the sector asset vectors into the vector X E Rmn and the
sector liability vectors into the vector Y E Rmn.
Assume that each sector's disutility can be defined through its assessment of
risk with respect to its portfolio composition minus its total expected net yield plus
its total tax payment. Each sector's risk is represented by a variance-covariance
matrix denoting the sector's assessment of the standard deviation of prices for each
instrument. The 2n x 2n variance-covariance matrix associated with sector i's assets

and liabilities is denoted by Qi.


In this model we also assume that the total volume of each balance sheet side is
exogenous. Moreover, under the assumption of perfect competition, each sector will
behave as if it has no influence on instrument prices or on the behavior of the other
sectors. Let r_j denote the price of instrument j, and group the instrument prices into the
vector r ∈ R^n.
We now describe the policy interventions. First, denote the tax rate levied on
sector i's net yield on financial instrument j as τ_ij and group the tax rates into the
vector τ ∈ R^{mn}. We assume that the tax rates lie in the interval 0 ≤ τ_ij < 1. Note
therefore that the government in this model has the flexibility of levying a distinct tax
rate across both sectors and instruments. Further, denote the price ceiling associated
with instrument j by r̄_j, and group the ceilings into a vector r̄ ∈ R^n. Note that
ceilings have been imposed on variables in other models, in particular on commodity
prices in spatial price equilibrium problems (cf. Thore (1986) and Nagurney and
Zhao (1991)).
Each sector i, then, seeks to determine its optimal composition of instruments
held as assets and as liabilities so as to minimize risk while, at the same time,
maximizing expected net yield subject to the tax rate structure. Each sector, under
the assumption of perfect competition, takes the instrument prices as given. The
portfolio optimization problem for sector i can be stated mathematically as

    Minimize   (x_i, y_i)^T Q^i (x_i, y_i) − Σ_{j=1}^n (1 − τ_ij) r_j (x_ij − y_ij)

subject to

    Σ_{j=1}^n x_ij = s_i,    Σ_{j=1}^n y_ij = s_i,                                  (1)

    x_ij ≥ 0,  y_ij ≥ 0;   j = 1, ..., n.                                           (2)

Constraints (1) represent the accounting identities that require that the accounts
for sector i must balance, where s_i is the total financial volume held by sector i.
Constraint (2) represents the nonnegativity assumption. We let P_i denote the closed
convex set of (x_i, y_i) satisfying constraints (1) and (2).
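For fixed instrument prices, each sector's problem is a small quadratic program over P_i. The following Python sketch solves it with a general-purpose solver; the use of scipy's SLSQP routine and all names are assumptions made for illustration only (the algorithm proposed later in the paper instead exploits the special network structure of these subproblems).

    import numpy as np
    from scipy.optimize import minimize

    def sector_portfolio(Q, tau_i, r, s_i):
        """Solve sector i's problem: min (x,y)'Q(x,y) - sum_j (1 - tau_ij) r_j (x_ij - y_ij)."""
        n = len(r)

        def objective(z):                      # z stacks (x_i, y_i)
            x, y = z[:n], z[n:]
            return z @ Q @ z - np.sum((1.0 - tau_i) * r * (x - y))

        constraints = [
            {"type": "eq", "fun": lambda z: z[:n].sum() - s_i},   # sum_j x_ij = s_i
            {"type": "eq", "fun": lambda z: z[n:].sum() - s_i},   # sum_j y_ij = s_i
        ]
        z0 = np.full(2 * n, s_i / n)                              # a feasible starting point
        res = minimize(objective, z0, method="SLSQP",
                       bounds=[(0.0, None)] * (2 * n), constraints=constraints)
        return res.x[:n], res.x[n:]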
Since Qi is a variance-covariance matrix, we will assume it is positive-definite,
and therefore the objective function for each sector is strictly convex. Thus, the
necessary and sufficient conditions for an optimal portfolio are that (Xi, Yi) E Pi
must satisfy the following system of inequalities and equalities:
For each instrument j = 1, ..., n,

    2 Q^i_(11)j^T · x_i + 2 Q^i_(21)j^T · y_i − (1 − τ_ij) r_j − μ_i^1 ≥ 0,

    2 Q^i_(22)j^T · y_i + 2 Q^i_(12)j^T · x_i + (1 − τ_ij) r_j − μ_i^2 ≥ 0,

    x_ij · (2 Q^i_(11)j^T · x_i + 2 Q^i_(21)j^T · y_i − (1 − τ_ij) r_j − μ_i^1) = 0,        (3)

    y_ij · (2 Q^i_(22)j^T · y_i + 2 Q^i_(12)j^T · x_i + (1 − τ_ij) r_j − μ_i^2) = 0,

where the symmetric matrix Q^i has been partitioned as

    Q^i = ( Q^i_11   Q^i_12 )
          ( Q^i_21   Q^i_22 ),

Q^i_(αβ)j denotes the j-th column of Q^i_(αβ), with α = 1, 2; β = 1, 2. The terms μ_i^1 and
μ_i^2 are the Lagrange multipliers of constraints (1). A similar set of inequalities and
equalities will hold for each of the m sectors.
We now describe the inequalities governing the instrument prices in the economy.
Note that the prices provide feedback to the sectors through the objective function.
We assume that there is free disposal and, hence, the instrument prices will be
nonnegative. Mathematically, the economic system conditions are thus:
For each instrument j = 1, ..., n,

    Σ_{i=1}^m (1 − τ_ij)(x_ij − y_ij)   ≤ 0   if r_j = r̄_j,
                                        = 0   if 0 < r_j < r̄_j,                             (4)
                                        ≥ 0   if r_j = 0.
Therefore, if there is an effective excess supply of any instrument in the economy,
its price must be zero; if an instrument's price is positive (but not at the ceiling), the
market for that instrument must clear; and, if there is an effective excess demand for
an instrument in the economy, its price must be at the ceiling.
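Conditions (4) are easy to verify numerically for a candidate (x, y, r). The following short Python routine is an illustrative check only (the function name and tolerance are assumptions), not part of the model itself.

    import numpy as np

    def check_market_conditions(x, y, r, tau, r_bar, tol=1e-6):
        """Verify conditions (4): the sign of the effective excess depends on where r_j lies."""
        excess = ((1.0 - tau) * (x - y)).sum(axis=0)   # sum_i (1 - tau_ij)(x_ij - y_ij)
        ok = np.empty(len(r), dtype=bool)
        for j in range(len(r)):
            if r[j] <= tol:                            # price at the lower bound
                ok[j] = excess[j] >= -tol
            elif r[j] >= r_bar[j] - tol:               # price at the ceiling
                ok[j] = excess[j] <= tol
            else:                                      # interior price: market must clear
                ok[j] = abs(excess[j]) <= tol
        return ok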
Let S ≡ {r | 0 ≤ r ≤ r̄}, and K ≡ ∏_{i=1}^m P_i × S, and combine the above sector
and market inequalities and equalities to obtain

Definition 1.
A vector (x, Y, r) E K is an equilibrium point of the competitive financial model
with policy interventions developed above if and only if it satisfies the system of
equalities and inequalities (3) and (4) for all sectors i = 1, ... , m and for all instru-
ments j = 1, ... , n simultaneously.

We now derive the variational inequality formulation of the equilibrium conditions


of the above model in the subsequent theorem.

Theorem 1.
A vector (x, Y, r) of assets and liabilities of the sectors and instrument prices is a
competitive financial equilibrium with policy interventions if and only if it satisfies
the variational inequality problem:
Find (x, y, r) ∈ K satisfying

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · x_i + Q^i_(21)j^T · y_i) − (1 − τ_ij) r_j] × [x'_ij − x_ij]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · y_i + Q^i_(12)j^T · x_i) + (1 − τ_ij) r_j] × [y'_ij − y_ij]

    + Σ_{j=1}^n [Σ_{i=1}^m (1 − τ_ij)(x_ij − y_ij)] × [r'_j − r_j] ≥ 0                       (5)

for all (x', y', r') ∈ K.

Proof: Assume that (x, y, r) ∈ K is an equilibrium point. Then inequalities (3) and
(4) hold for all i and j. Hence, one has

    Σ_{j=1}^n [2(Q^i_(11)j^T · x_i + Q^i_(21)j^T · y_i) − (1 − τ_ij) r_j − μ_i^1] × [x'_ij − x_ij] ≥ 0,

from which it follows, after applying constraint (1), that

    Σ_{j=1}^n [2(Q^i_(11)j^T · x_i + Q^i_(21)j^T · y_i) − (1 − τ_ij) r_j] × [x'_ij − x_ij] ≥ 0.        (6)

Similarly, one can obtain

    Σ_{j=1}^n [2(Q^i_(22)j^T · y_i + Q^i_(12)j^T · x_i) + (1 − τ_ij) r_j] × [y'_ij − y_ij] ≥ 0.        (7)

Summing inequalities (6) and (7) over i, one concludes that for (x, y) ∈ ∏_{i=1}^m P_i,

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · x_i + Q^i_(21)j^T · y_i) − (1 − τ_ij) r_j] × [x'_ij − x_ij]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · y_i + Q^i_(12)j^T · x_i) + (1 − τ_ij) r_j] × [y'_ij − y_ij] ≥ 0   (8)

for all (x', y') ∈ ∏_{i=1}^m P_i.


From inequality (4), one can conclude that 0 ≤ r_j ≤ r̄_j must satisfy

    Σ_{i=1}^m (1 − τ_ij)(x_ij − y_ij) × (r'_j − r_j) ≥ 0                                     (9)

for all 0 ≤ r'_j ≤ r̄_j, and, therefore, r ∈ S must satisfy

    Σ_{j=1}^n Σ_{i=1}^m (1 − τ_ij)(x_ij − y_ij) × (r'_j − r_j) ≥ 0                           (10)

for all r' ∈ S. Summing inequalities (8) and (10) produces the variational inequality
(5).
We now establish that a solution to variational inequality (5) will also satisfy
equilibrium conditions (3) and (4). If (x, y, r) ∈ K is a solution of variational
inequality (5) and if one lets x'_i = x_i, y'_i = y_i, for all i, one obtains

    Σ_{j=1}^n [Σ_{i=1}^m (1 − τ_ij)(x_ij − y_ij)] × [r'_j − r_j] ≥ 0                         (11)

for all r' ∈ S, which implies conditions (4).
Finally, let r'_j = r_j, for all j, in which case substitution into (5) yields

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · x_i + Q^i_(21)j^T · y_i) − (1 − τ_ij) r_j] × [x'_ij − x_ij]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · y_i + Q^i_(12)j^T · x_i) + (1 − τ_ij) r_j] × [y'_ij − y_ij] ≥ 0   (12)

for all (x', y') ∈ ∏_{i=1}^m P_i, which implies (3). The proof is complete.
We now address the qualitative properties of the equilibrium pattern through the
study of variational inequality (5). Observe that the feasible set K is compact and that
the function that enters the variational inequality (5) is continuous. It follows from
the standard theory of variational inequalities (cf. Kinderlehrer and Stampacchia
(1980)) that the solution (x, y, r) to (5) is guaranteed to exist. The uniqueness result
is given in the following theorem.

Theorem 2. The equilibrium asset and liability pattern (x, y) is unique.

Proof: Suppose that (x^1, y^1, r^1) ∈ K and (x^2, y^2, r^2) ∈ K both satisfy variational
inequality (5), so

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · x_i^1 + Q^i_(21)j^T · y_i^1) − (1 − τ_ij) r_j^1] × [x'_ij − x_ij^1]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · y_i^1 + Q^i_(12)j^T · x_i^1) + (1 − τ_ij) r_j^1] × [y'_ij − y_ij^1]

    + Σ_{j=1}^n [Σ_{i=1}^m (1 − τ_ij)(x_ij^1 − y_ij^1)] × [r'_j − r_j^1] ≥ 0                 (13)

for all (x', y', r') ∈ K, and

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · x_i^2 + Q^i_(21)j^T · y_i^2) − (1 − τ_ij) r_j^2] × [x'_ij − x_ij^2]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · y_i^2 + Q^i_(12)j^T · x_i^2) + (1 − τ_ij) r_j^2] × [y'_ij − y_ij^2]

    + Σ_{j=1}^n [Σ_{i=1}^m (1 − τ_ij)(x_ij^2 − y_ij^2)] × [r'_j − r_j^2] ≥ 0                 (14)

for all (x', y', r') ∈ K.
Setting (x', y', r') = (x^2, y^2, r^2) in (13) and (x', y', r') = (x^1, y^1, r^1) in (14) and
adding the results yields, after simplification,

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · (x_i^1 − x_i^2) + Q^i_(21)j^T · (y_i^1 − y_i^2))] × [x_ij^2 − x_ij^1]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · (y_i^1 − y_i^2) + Q^i_(12)j^T · (x_i^1 − x_i^2))] × [y_ij^2 − y_ij^1] ≥ 0.   (15)

But since each Q^i (i = 1, ..., m) is positive-definite, the left-hand side of (15) must
be nonpositive and, hence, we can conclude that x^1 = x^2 and y^1 = y^2. The proof is
complete.
In the special case in which there are no taxes or price ceilings imposed, the above
model collapses to the model developed in Nagurney, Dong, and Hughes (1992). In
this case condition (4) simplifies as follows. Since τ_ij = 0 for all i and j, and r̄_j is
effectively set at infinity for all j, only the equality and the second inequality would
apply in conditions (4). In addition, in the model without policies, the set S would
no longer be bounded, and, hence, the feasible set K would no longer be compact.
Hence, another existence proof would be required.

3. THE ALGORITHM

In this Section we describe a decomposition algorithm for the computation of the


solution to variational inequality (5) that governs the general competitive financial
equilibrium model with taxes and price controls developed in Section 2. The al-
gorithm resolves the large-scale problems into simpler network subproblems, each
of which can then be solved explicitly and in closed form using exact equilibration
algorithms (cf. Eydeland and Nagurney (1989), and the references therein). The
algorithm that we propose is the modified projection method of Korpelevich (1977).
We first state the algorithm in the context of the financial model and then prove
convergence.

The Algorithm
Step 0: Initialization:
Set (x^0, y^0, r^0) ∈ K. Let k = 1. Let α be a positive scalar.

Step 1: Compute (x̄^k, ȳ^k, r̄^k) by solving

    Σ_{i=1}^m Σ_{j=1}^n [x̄_ij^k + α(2(Q^i_(11)j^T · x_i^{k−1} + Q^i_(21)j^T · y_i^{k−1}) − (1 − τ_ij) r_j^{k−1}) − x_ij^{k−1}] × [x'_ij − x̄_ij^k]

    + Σ_{i=1}^m Σ_{j=1}^n [ȳ_ij^k + α(2(Q^i_(22)j^T · y_i^{k−1} + Q^i_(12)j^T · x_i^{k−1}) + (1 − τ_ij) r_j^{k−1}) − y_ij^{k−1}] × [y'_ij − ȳ_ij^k]

    + Σ_{j=1}^n [r̄_j^k + α(Σ_{i=1}^m (1 − τ_ij)(x_ij^{k−1} − y_ij^{k−1})) − r_j^{k−1}] × [r'_j − r̄_j^k] ≥ 0,   ∀(x', y', r') ∈ K.   (16)

Step 2: Compute (x^k, y^k, r^k) by solving

    Σ_{i=1}^m Σ_{j=1}^n [x_ij^k + α(2(Q^i_(11)j^T · x̄_i^k + Q^i_(21)j^T · ȳ_i^k) − (1 − τ_ij) r̄_j^k) − x_ij^{k−1}] × [x'_ij − x_ij^k]

    + Σ_{i=1}^m Σ_{j=1}^n [y_ij^k + α(2(Q^i_(22)j^T · ȳ_i^k + Q^i_(12)j^T · x̄_i^k) + (1 − τ_ij) r̄_j^k) − y_ij^{k−1}] × [y'_ij − y_ij^k]

    + Σ_{j=1}^n [r_j^k + α(Σ_{i=1}^m (1 − τ_ij)(x̄_ij^k − ȳ_ij^k)) − r_j^{k−1}] × [r'_j − r_j^k] ≥ 0,   ∀(x', y', r') ∈ K.   (17)

Convergence Verification:
If max_{ij} |x_ij^k − x_ij^{k−1}| ≤ ε, max_{ij} |y_ij^k − y_ij^{k−1}| ≤ ε, and max_j |r_j^k − r_j^{k−1}| ≤ ε, with ε some
positive preselected tolerance, then stop; else, set k = k + 1, and go to Step 1.
We now give an interpretation of the above algorithm as an adjustment process.
In (16) each sector i at each time period k receives instrument price signals r^{k−1}
and determines its optimal asset and liability pattern (x̄_i^k, ȳ_i^k). At the same time, the
system determines the prices r̄^k in response to the difference of the total effective
volume of each instrument held as an asset minus the total effective volume held as
a liability at time period k − 1. The agents and the system then improve upon their
approximations through the solution of (17). The process continues until stability is
reached; that is, until the asset and liability volumes and the instrument prices change
negligibly between time periods.
Observe now that both (16) and (17) are equivalent to optimization problems, in
particular to quadratic programming problems of the form

    Minimize_{X ∈ K}   X^T · X + h^T · X,                                                    (18)

where X ≡ (x, y, r) ∈ R^{2mn+n} and h ∈ R^{2mn+n} consists of the fixed linear terms
in the inequality subproblems (16) and (17). Moreover, problem (18) is separable
in x, y, and r, and, in view of the feasible set, has the network structure depicted in
Figure 1.
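The following Python sketch is one way to implement this predictor-corrector scheme for small instances. It is an illustrative reformulation, not the FORTRAN code used in Section 4: the subproblems are solved here by direct projection (a standard sort-based simplex projection for the asset and liability blocks, and clipping for the prices) rather than by the exact equilibration algorithms, and all function names, parameters, and the uniform initialization are assumptions.

    import numpy as np

    def project_simplex(v, s):
        """Euclidean projection of v onto {z >= 0, sum(z) = s} (standard sort-based routine)."""
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - s
        idx = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
        theta = css[idx] / (idx + 1.0)
        return np.maximum(v - theta, 0.0)

    def F(x, y, r, Q, tau):
        """The map entering variational inequality (5); Q[i] is sector i's 2n x 2n matrix."""
        m, n = x.shape
        Fx, Fy = np.empty_like(x), np.empty_like(y)
        for i in range(m):
            g = 2.0 * Q[i] @ np.concatenate([x[i], y[i]])
            Fx[i] = g[:n] - (1.0 - tau[i]) * r
            Fy[i] = g[n:] + (1.0 - tau[i]) * r
        Fr = ((1.0 - tau) * (x - y)).sum(axis=0)
        return Fx, Fy, Fr

    def modified_projection(Q, tau, s, r_bar, alpha=0.35, tol=1e-3, itmax=10000):
        """Korpelevich-type predictor-corrector iteration, cf. (16)-(17)."""
        m, n = tau.shape
        x = np.tile((s / n)[:, None], (1, n)); y = x.copy(); r = np.ones(n)
        for _ in range(itmax):
            Fx, Fy, Fr = F(x, y, r, Q, tau)
            xb = np.array([project_simplex(x[i] - alpha * Fx[i], s[i]) for i in range(m)])
            yb = np.array([project_simplex(y[i] - alpha * Fy[i], s[i]) for i in range(m)])
            rb = np.clip(r - alpha * Fr, 0.0, r_bar)                    # predictor step (16)
            Fxb, Fyb, Frb = F(xb, yb, rb, Q, tau)
            xn = np.array([project_simplex(x[i] - alpha * Fxb[i], s[i]) for i in range(m)])
            yn = np.array([project_simplex(y[i] - alpha * Fyb[i], s[i]) for i in range(m)])
            rn = np.clip(r - alpha * Frb, 0.0, r_bar)                   # corrector step (17)
            done = max(np.abs(xn - x).max(), np.abs(yn - y).max(), np.abs(rn - r).max()) <= tol
            x, y, r = xn, yn, rn
            if done:
                break
        return x, y, r

For instance, the data of Example 1 in Section 4 could be supplied as Q = [Q^1, Q^2] (the two 6 × 6 matrices given there), tau = np.zeros((2, 3)), s = np.array([1.0, 2.0]), and r_bar = np.full(3, 2.0).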
Convergence of the algorithm follows (cf. Korpelevich, 1977) under the as-
sumption that the function F that enters the variational inequality is monotone and
Lipschitz continuous, where 0 < α < 1/L and L is the Lipschitz constant. We now
prove that these conditions are always satisfied. We first establish monotonicity.
Let (x^1, y^1, r^1) ∈ K and (x^2, y^2, r^2) ∈ K. In evaluating monotonicity, we must
show that

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · (x_i^1 − x_i^2) + Q^i_(21)j^T · (y_i^1 − y_i^2)) − (1 − τ_ij)(r_j^1 − r_j^2)] × [x_ij^1 − x_ij^2]

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · (y_i^1 − y_i^2) + Q^i_(12)j^T · (x_i^1 − x_i^2)) + (1 − τ_ij)(r_j^1 − r_j^2)] × [y_ij^1 − y_ij^2]

    + Σ_{j=1}^n [Σ_{i=1}^m (1 − τ_ij)((x_ij^1 − y_ij^1) − (x_ij^2 − y_ij^2))] × [r_j^1 − r_j^2] ≥ 0,   (19)

for all (x^1, y^1, r^1), (x^2, y^2, r^2) ∈ K.

Fig. 1. Parallel network structure of the variational inequality subproblems (asset subproblems, liability subproblems, and price subproblems).

After some algebra, the left-hand side of (19) reduces to

    Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(11)j^T · (x_i^1 − x_i^2) + Q^i_(21)j^T · (y_i^1 − y_i^2))] × (x_ij^1 − x_ij^2)

    + Σ_{i=1}^m Σ_{j=1}^n [2(Q^i_(22)j^T · (y_i^1 − y_i^2) + Q^i_(12)j^T · (x_i^1 − x_i^2))] × (y_ij^1 − y_ij^2),

which is clearly nonnegative since Q^i has been assumed to be positive-definite. We
have thus established:

Lemma 1. The function that enters the variational inequality (5) is monotone.

We now investigate Lipschitz continuity in the following lemma.

Lemma 2. The function F(x, y, r) that enters variational inequality (5) is Lipschitz
continuous; that is, for all (x^1, y^1, r^1), (x^2, y^2, r^2) ∈ K,

    ||F(x^1, y^1, r^1) − F(x^2, y^2, r^2)|| ≤ L ||(x^1, y^1, r^1) − (x^2, y^2, r^2)||,       (20)

with Lipschitz constant L > 0.

Proof: F(x, y, r) can be represented as

    F(x, y, r) = C · (x, y, r)^T,                                                            (21)

where C is of the form

    C = (  Q         TB )
        ( −(TB)^T     0 )                                                                     (22)

and

    Q is the 2mn × 2mn block-diagonal matrix with blocks 2Q^1, ..., 2Q^m,                    (23)

    T is the 2mn × 2mn diagonal matrix whose diagonal entries are the terms (1 − τ_ij),
    one for each asset component and one for each liability component, and B is the
    2mn × n matrix with −1 in column j in the row of asset component x_ij, +1 in
    column j in the row of liability component y_ij, and zeros elsewhere.                    (24)

Then

    F(x^1, y^1, r^1) − F(x^2, y^2, r^2) = C · [(x^1, y^1, r^1) − (x^2, y^2, r^2)]^T          (25)

and

    ||F(x^1, y^1, r^1) − F(x^2, y^2, r^2)||^2
        = [(x^1, y^1, r^1) − (x^2, y^2, r^2)] C^T C [(x^1, y^1, r^1) − (x^2, y^2, r^2)]^T.   (26)

Since C^T C is a symmetric matrix, if we let

    L^2 = max_{1≤l≤2mn+n} Σ_{k=1}^{2mn+n} |c_lk|,                                            (27)

where c_lk is the (l, k)th element of C^T C, then (L^2 I − C^T C) is a symmetric positive-
definite matrix, and, therefore, (C^T C − L^2 I) is a negative-definite matrix. Hence,

    [(x^1, y^1, r^1) − (x^2, y^2, r^2)] (C^T C − L^2 I) [(x^1, y^1, r^1) − (x^2, y^2, r^2)]^T ≤ 0.   (28)

Consequently,

    ||F(x^1, y^1, r^1) − F(x^2, y^2, r^2)|| ≤ L ||(x^1, y^1, r^1) − (x^2, y^2, r^2)||,

with L > 0. The proof is complete.
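The bound (27) is simple to evaluate numerically once C has been assembled; the following Python fragment is an illustrative sketch of that computation, in which the ordering of the variables (sector by sector, assets before liabilities) and all names are assumptions of the sketch rather than part of the paper.

    import numpy as np

    def lipschitz_bound(Q_blocks, tau):
        """Assemble C as in (22)-(24) and return L from the row-sum bound (27)."""
        m, n = tau.shape
        dim = 2 * m * n
        Q = np.zeros((dim, dim))
        TB = np.zeros((dim, n))
        for i, Qi in enumerate(Q_blocks):
            rows = slice(2 * n * i, 2 * n * (i + 1))
            Q[rows, rows] = 2.0 * np.asarray(Qi)             # block-diagonal of 2 Q^i
            for j in range(n):
                TB[2 * n * i + j, j] = -(1.0 - tau[i, j])        # asset component: -(1 - tau_ij) r_j
                TB[2 * n * i + n + j, j] = (1.0 - tau[i, j])     # liability component: +(1 - tau_ij) r_j
        C = np.block([[Q, TB], [-TB.T, np.zeros((n, n))]])
        CtC = C.T @ C
        return np.sqrt(np.max(np.abs(CtC).sum(axis=1)))      # L from the row-sum bound (27)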


Combining Lemmas 1 and 2 we obtain

Theorem 3. The decomposition algorithm converges to the equilibrium asset, liabil-


ity, and price pattern (x, y, r) satisfying variational inequality (5).

4. NUMERICAL RESULTS

In this section we consider the numerical solution of the financial equilibrium model
with policy interventions introduced in Section 2. We emphasize that the model is
designed with empirical applications in mind. For example, the framework that has
been developed fits well with flow-of-funds accounts data that are collected quarterly
or annually to provide snapshots of the financial side of the economy. In the case
of the United States, the data sets are maintained by the Federal Reserve Board of
Governors. For an introduction to flow of funds accounts, see Cohen (1987), and
for a recent algorithmic approach to the balancing of these accounts, see Hughes and
Nagurney (1991).
We now present several examples with a variety of tax and price control scenarios.
We consider an economy with two sectors and three financial instruments. Here we
assume that the "size" of each sector 8i is given by 81 = 1, and 82 = 2. Each
sector realizes that the future values of its portfolio are random variables that can be
described by their expected values and variances and believes that the mean of these
expected values is equal to the current value. The variance-covariance matrices of
the two sectors are:

    Q^1 = [  1     .15    .3    −.2    −.1     0  ]
          [  .15   1      .1    −.1    −.2     0  ]
          [  .3    .1     1     −.3     0     −.1 ]
          [ −.2   −.1    −.3     1      0      .3 ]
          [ −.1   −.2     0      0      1      .2 ]
          [  0     0     −.1     .3     .2     1  ]

and

    Q^2 = [  1     .4     .3    −.1    −.1     0  ]
          [  .4    1      .5     0     −.05    0  ]
          [  .3    .5     1      0      0     −.1 ]
          [ −.1    0      0      1      .5     0  ]
          [ −.1   −.05    0      .5     1      .2 ]
          [  0     0     −.1     0      .2     1  ].

Note that the terms in the off-diagonal blocks Q^1_12, Q^1_21, Q^2_12, and Q^2_21 are not positive, since the
returns flowing in from an asset item must be negatively correlated with the interest
expenses flowing out into the portfolio's liabilities. (For details see Francis and
Archer (1979).)
We now use the above data to construct the examples. The algorithm was coded
in FORTRAN, compiled using the FORTVS compiler, optimization level 3, and the
numerical runs were done on an IBM 3090/600J. For each example, the variables
were initialized as follows: r_j^0 = 1 for all j, and x_ij = s_i/n and y_ij = s_i/n for all j.
The α parameter was set to .35. The convergence tolerance ε was set to 10^{-3}.
In the first example, we set the taxes τ = 0 for all sectors and instruments and
the price control ceilings r̄ to 2 for all instruments. The numerical results for the first
example are

Results for Example 1


Equilibrium Prices:

r_1 = .91404     r_2 = .94535     r_3 = 1.14058

Equilibrium Asset Holdings:

X11 = .28736 XI2 = .40063 X13 = .31200

X21 = .75644 X22 = .56740 X23 = .67616

Equilibrium Liability Holdings:

YI1 = .32035 YI2 = .51047 Y13 = .16917

Y21 = .72447 Y22 = .45723 Y23 = .81830.



The algorithm converged in 17 iterations and required 3.62 milliseconds of CPU
time for convergence, not including input/output time. Note that in this example, the
solution is one in which the policies, in essence, have no effect. Hence, this algorithm
may also be used to compute solutions to financial models in the absence of taxes
and price controls, provided that the taxes are set to zero and the price ceilings are
set at a high enough level. The resulting model is then a special case of our more
general one.
In the second example, we kept the taxes at zero but now tightened the price
ceilings to .5 for each instrument. The numerical results for the second example are

Results for Example 2


Equilibrium Prices:

r_1 = .27083     r_2 = .30192     r_3 = .49716

Equilibrium Asset Holdings:

XII = .28730 XI2 = .40043 X13 = .31227

X21 = .75653 X22 = .56752 X23 = .67595

Equilibrium Liability Holdings:

Yll = .32005 YI2 = .51074 Y13 = .16920

Y21 = .72464 Y22 = .45708 Y23 = .81828.

The algorithm converged in 18 iterations and required 3.82 milliseconds of CPU
time for convergence. Note that in this example, the equilibrium prices all lie within
the tighter bounds. In particular, the price of instrument 3 is approximately at its
upper bound of .5.
In the third example, we raised the tax rate from zero to .15 for all sectors
and instruments and kept the instrument price ceiling at .5, as in Example 2. The
numerical results for the third example are

Results for Example 3


Equilibrium Prices:

r_1 = .23256     r_2 = .26871     r_3 = .49995

Equilibrium Asset Holdings:

Xli = .28726 XI2 = .40035 X13 = .31239



X21 = .75663 X22 = .56777 X23 = .67560

Equilibrium Liability Holdings:

YII = .31965 Y12 = .51098 Y13 = .16938


Y21 = .72460 Y22 = .45680 Y23 = .81860.

The algorithm converged in 19 iterations and required 4.04 milliseconds of CPU


time for convergence.
In the fourth example, we kept the price ceilings at .5 but increased the tax rate
from .15 to .30. The numerical results for this example are

Results for Example 4


Equilibrium Prices:

r_1 = .17990     r_2 = .22313     r_3 = .5000.

Equilibrium Asset Holdings:

XII = .28782 X12 = .40104 X13 = .31114


X21 = .75776 X22 = .56804 X23 = .67420

Equilibrium Liability Holdings:

YII = .31846 Y12 = .51107 Y13 = .17046


Y21 = .72386 Y22 = .45497 Y23 = .82117.

The algorithm converged in 24 iterations and required 5.09 milliseconds for con-
vergence.
In the fifth and final example, we kept the same tax rate as in Example 4 at τ = .3
but raised the price ceilings to r̄ = 2, the same level as in Example 1. The numerical
results are

Results for Example 5


Equilibrium Prices:

r_1 = .87731     r_2 = .92179     r_3 = 1.20088



Equilibrium Asset Holdings:


XII = .28710 XI2 = .40066 X13 = .31224
X21 = .75613 X22 = .56744 X23 = .67643

Equilibrium Liability Holdings:


YII = .32066 YI2 = .51040 YI3 = .16894
Y21 = .72478 Y22 = .45746 Y23 = .81776.

The algorithm converged in 17 iterations for this example and required 3.59
milliseconds of CPU time for convergence.
In line with (4), for each of the above examples, the algorithm yields asset
and liability patterns in which the total effective volume of an instrument held as
an asset is approximately equal to the total effective volume of the instrument held
as a liability whenever the instrument price is not at one of the bounds.
Hence, the market clears for each such instrument, and the price of each instrument
is positive in equilibrium.

5. SUMMARY AND CONCLUSIONS

This paper introduces a competitive multi-sector, multi-instrument financial model


that explicitly allows for the incorporation of policy interventions in the form of taxes
and price controls. The equilibrium conditions for this model are derived using the
assumption that the behavior of each sector is one of portfolio optimization. It is
then shown that the governing equilibrium conditions are equivalent to a variational
inequality problem defined over a compact set. This variational inequality is used
to demonstrate existence of the equilibrium asset, liability, and price pattern and to
prove that the equilibrium asset and liability holdings are unique.
A decomposition procedure is next proposed for the computation of the equilib-
rium pattern and convergence established. A notable feature of the algorithm is that
it resolves large-scale problems into subproblems of special network structure, each
of which can then be solved exactly and in closed form. Finally, several examples
are presented illustrating the numerical performance of the algorithm under a variety
of policy scenarios. The algorithm required only milliseconds of CPU time for con-
vergence for the particular examples and no more than 25 iterations at a fairly precise
level of accuracy. These results suggest that the algorithm should perform well on
empirical models, which is a subject of ongoing research.

ACKNOWLEDGEMENTS

The authors are grateful to D.A. Belsley for suggestions that improved the presenta-
tion of this work.

This research was funded, in part, by cooperative agreement No. 58-3AEN-


0-8066 from the Economic Research Service of the United States Department of
Agriculture and by a Faculty Award for Women from the National Science Founda-
tion, NSF Grant No. DMS 9024071.
This research was conducted on the Cornell National Supercomputer Facility,
a resource of the Center for Theory and Simulation in Science and Engineering at
Cornell University, which is funded in part by the National Science Foundation, New
York State, and IBM Corporation.

REFERENCES

Border, K. C. (1985) "Fixed point theorems with applications to economics and game theory",
Cambridge University Press, Cambridge, United Kingdom.
Cohen, J. (1987) "The flow of funds in theory and practice, Financial and monetary studies
15", Kluwer Academic Publishers, Dordrecht, the Netherlands.
Dafermos, S. (1990) "Exchange price equilibria and variational inequalities", Mathematical
Programming 46,391-402.
Dafermos, S. and Nagurney, A. (1984) "Sensitivity analysis for the general spatial economics
equilibrium problem", Operations Research 32, 1069-1086.
Dafermos, S. and Nagurney, A. (1987) "Oligopolistic and competitive behavior of spatially
separated markets", Regional Science and Urban Economics 17,245-254.
Eydeland, A. and Nagurney, A. (1989) "Progressive equilibration algorithms: the case of
linear transaction costs", Computer Science in Economics and Management 2, 197-219.
Florian, M. and Los, M. (1982) "A new look at static spatial price equilibrium problems",
Regional Science and Urban Economics 12, 579-597.
Francis, J. C. and Archer, S. H. (1979) "Portfolio analysis", Prentice-Hall, Inc., Englewood
Cliffs, New Jersey.
Gabay, D. and Moulin, H. (1980) "On the uniqueness and stability of Nash equilibria in non-
cooperative games", in: A. Bensoussan, P. Kleindorfer, and C. S. Tapiero, eds., "Applied
stochastic control of econometrics and management science", North-Holland, Amsterdam,
The Netherlands.
Hughes, M. and Nagurney, A. (1992) "A network model and algorithm for the analysis and
estimation of financial flow of funds", Computer Science in Economics and Management
5, 23-39.
Kinderlehrer, D. and Stampacchia, G. (1980) "An introduction to variational inequalities and
their applications", Academic Press, New York.
Korpelevich, G. M. (1977) "The extragradient method for finding saddle point and other
problems", Ekonomicheskie i Mathematicheskie Metody (translated as Matekon) 13, 35-
49.
Markowitz, H. M. (1959) "Portfolio selection: efficient diversification of investments", John
Wiley and Sons, Inc., New York.
Nagurney, A., Dong, J., and Hughes, M. (1992) "The formulation and computation of general
financial equilibrium", Optimization 26, 339-354.
Nagurney, A. and Zhao, L. (1991) "A network equilibrium formulation of market disequilib-
rium and variational inequalities", Networks 21, 109-132.
Sharpe, W. (1970) "Portfolio theory and capital markets", McGraw-Hill Book Company, New
York.
Thore, S. (1986) "Spatial disequilibrium", Journal of Regional Science 26, 660-675.
Zhao, L. and Dafermos, S. (1991) "General economic equilibrium and variational inequalities",
Operations Research Letters 10, 369-376.
AGAPI SOMWARU, V. ELDON BALL AND UTPAL VASAVADA

Modeling Dynamic Resource Adjustment Using Iterative Least Squares

ABSTRACT. The flexible accelerator specification of the demand for capital and labor is
estimated using data generated by United States agriculture. It is assumed that firms
maximize the discounted value of expected profits subject to a technology that implies capital
and labor stocks are costly to adjust. Given multiproduct production behavior, an investment path for
the quasi-fixed stocks is developed assuming maximization of the discounted sum of future
profits over an infinite horizon. The consistency of the data with the adjustment-cost specification
requires that the firm's value function be convex in prices. We impose convexity based on
the Cholesky factorization of the matrix of constant parameters associated with price effects.
This matrix is fitted subject to the condition that it be non-negative definite. A
non-linear constrained optimization approach is used to estimate the system of quasi-
fixed input, variable input, and output equations jointly by an inequality-constrained iterative least
squares method.

1. INTRODUCTION

Econometric studies of producer behavior that exploit the duality between production
and cost or profit functions are numerous in the empirical literature. The simultaneous
development of duality theory and accessible computational algorithms for nonlinear
systems estimation have contributed to the growth of this field of inquiry. Of particular
interest to the present study are multiproduct-multifactor models of a production
system. Early contributions to the literature include studies by Shumway (1983) and
Weaver (1983) which focus on the profit maximizing agricultural firm. An important
drawback to these studies is the adoption of a static framework that fails explicitly to
model quasi-fixed input adjustment.
In contrast, a recent study by Vasavada and Chambers (1986) proposed an em-
pirical framework to model optimal adjustment of quasi-fixed factors based on well
known results in the adjustment-cost literature. The adjustment cost model of a firm
is used to rationalize the flexible accelerator specification. Their study adopted the
simplifying assumption that the technology was separable in outputs, thereby per-
mitting construction of a single output aggregate. Vasavada and Ball (1988) extends
this work to include multiple outputs. Although this study relaxes the separability
assumption, the estimated investment demand and output supply equations fail to
satisfy the necessary integrability conditions.
This paper improves on previous efforts by incorporating restrictions from theory
as part of the maintained model. We illustrate the imposition of curvature and mono-

tonicity restrictions on the parameter estimates, not easily handled in conventional
estimation approaches. A flexible accelerator specification of the demands for capital
and labor is estimated using data from United States agriculture. It is assumed that
firms maximize the discounted value of expected profits subject to a technology that
implies capital and labor stocks are costly to adjust. The consistency of data with the
adjustment-cost specification requires that the representative firm's value function be
convex in prices. We impose convexity based on the matrix of second-order price
effects. This matrix is fitted subject to the condition that it should be non-negative
definite. We develop and implement a computational procedure for estimating this
dynamic production system subject to inequality restrictions implied by convexity in
prices.
The paper is organized as follows: Section 2 reviews the relevant theory; the
common assumption of static price expectations is adopted. Section 3 describes
the functional form specifications. Empirical implementation and data are discussed
in Section 4, and the estimation results are discussed in Section 5. Concluding
comments are provided in the final section.

2. MODEL OF THE MULTIPRODUCT FIRM

A firm's technology is described by a multiple output production function Y_{m+1} =
F(Y, X, K, K̇) giving the maximum amount of output Y_{m+1} that can be produced
from perfectly variable inputs X and quasi-fixed stocks K ∈ R^n_+, given that
other outputs Y are produced. The inclusion of K̇ as an argument in F
indicates the presence of adjustment costs. The production function satisfies F > 0;
F is twice continuously differentiable; F_X, F_K > 0 and F_K̇ < 0; and F is strictly
concave in all its arguments. These assumptions are discussed in detail in Epstein
(1981).
The firm is assumed to choose an investment path for the quasi-fixed stocks that
maximizes the discounted sum of future profits over an infinite horizon:

    max_{I≥0}   ∫_0^∞ e^{−rt} [P^T Y − W^T X − q^T K + F(Y, X, K, K̇)] dt,                   (1)

subject to

    K̇ = I − δK,    K(0) = K_0 > 0,

where δ is an n × n diagonal matrix of positive depreciation rates. P
is the price vector corresponding to Y; W and q ∈ R^n_+ are the rental prices
corresponding to X and K, respectively. All prices are measured relative to the
price of output Y_{m+1}. Current relative prices are expected to persist indefinitely.
A firm may form expectations rationally in this manner when information is costly
(Chambers and Lopez, 1984). r > 0 is the constant discount rate and r̲ is an
appropriately dimensioned scalar matrix with r as the diagonal element. K_0 is the

initial endowment of the quasi-fixed factors. Given Ko, the producer chooses a time
path K(t), Y(t), X(t), and Ym + 1(t), to maximize the present value of rents over an
infinite horizon.
Let J(P, W, q, K) denote the optimal value of (1). The Hamilton-Jacobi equation
(Arrow and Kurz, 1970) then gives

    rJ = max_{I≥0} [P^T Y − W^T X − q^T K + F(Y, X, K, K̇) + J_K (I − δK)],                  (2)

where J_K(·) is the vector of shadow prices associated with the quasi-fixed stocks.
Under the regularity conditions stated above, Epstein (1981) has shown that the
value function J is dual to F and obeys J_K > 0; J and J_K are twice continuously
differentiable; (r̲ + δ) J_K + q − J_{KK} K̇* > 0; J is convex in (P, W, q); and
rJ − J_K K̇* is convex in (P, W, q). The result that (r̲ + δ) J_K + q − J_{KK} K̇* > 0
restates the equation of motion for J_K implied by the maximum principle and follows
by applying the envelope theorem to (2) using the assumption that F_K > 0. The
statement that J_K > 0 follows from the first-order conditions for (2), which imply
that F_K̇ = −J_K and hence the result. Convexity of J in (P, W, q) is intuitively seen
by noting that the objective function (1) is the limit of the sum of linear functions in
(P, W, q). The requirement that rJ − J_K K̇* be convex in (P, W, q) is an integrability
relationship between J and F. For later use, note that this condition simplifies to
convexity of J when J_K is linear in (P, W, q).
The advantage of representing the restrictions implied by dynamic theory in terms
of J is its analytical tractability, since the duality between r J and F implies that the
technology can be recaptured by solving

    F*(Y, X, K, K̇) = min_{P,W,q} [rJ(P, W, q, K) − P^T Y + W^T X + q^T K − J_K K̇].         (3)

When the model generating the data can be approximated by (1), a parametric
characterization of optimal policy rules is available. Optimal output supply and
investment demand equations are obtained by applying the envelope theorem to (2).
Differentiating with respect to (P, W, q) yields

(4)

    X* = r̲ J_W + J_{WK} K̇*,                                                                 (5)

    K̇* = J_{qK}^{-1} (r̲ J_q + K).                                                           (6)

The numeraire output obtained from (3) is

(7)

Equations (4) - (7) provide a system of optimal supply and demand equations.
Given a characterization of J that satisfies the regularity conditions, these equations
provide a straightforward procedure for modelling quasi-fixed input adjustment.

3. MODEL ESTIMATION

To implement the algorithm supplied by equations (4) - (7), a parametric value


function must be specified. Consider the following candidate for the value function:

    J(P, W, q, K) = a_0 + [a_1^T  a_2^T  a_3^T  a_4^T] (P, W, q, K)^T

                    + ½ (P, W, q, K) [ A_11  A_12  A_13  A_14 ]
                                     [ A_21  A_22  A_23  A_24 ]  (P, W, q, K)^T.             (8)
                                     [ A_31  A_32  A_33  A_34 ]
                                     [ A_41  A_42  A_43  A_44 ]

This is a second-order Taylor series expansion of J in (P, W, q, K). The correspond-


ing output supply and investment demand equations from applying (4) - (7) to (8)
are

(9)

(10)

(11)

(12)

Before discussing the regularity conditions notice that the net investment equation
(11) is a multivariate flexible accelerator (Nadiri and Rosen, 1969) with adjustment
matrix M = (r̲ + A_34^{-1}). This can be seen by rewriting (11) as

    K̇* = M [K − K̄(P, W, q)],                                                                (13)

where

(14)

is the vector of steady state stocks.


The regularity conditions are readily translated into restrictions on the parameters
of J. Since J_K is linear in normalized prices, this curvature condition is equivalent to
convexity of J in (P, W, q). This, in turn, is equivalent to non-negative definiteness
of the matrix of constant parameters associated with second-order price effects. To
impose convexity on the normalized quadratic value function the matrix of constant
parameters can be represented in terms of its Cholesky factorization:

    [ A_11  A_12  A_13  A_14 ]
    [ A_21  A_22  A_23  A_24 ]  =  L D L^T,
    [ A_31  A_32  A_33  A_34 ]
    [ A_41  A_42  A_43  A_44 ]

where L is a unit triangular matrix (L_ii = 1, L_ij = 0, j > i) and D is a diagonal
matrix. The matrix of parameters is non-negative definite if and only if the diagonal
elements D_ii of the matrix D are non-negative (Rao, 1973). Stability of the optimal
adjustment path requires that (r̲ + A_34^{-1}) be a stable matrix, i.e., that all its eigenvalues
have negative real parts. It can be shown (Mortensen, 1973) that the net investment
equations exhibit properties similar to static factor demands if, in addition, the matrix
A_34^{-1} A_33 is symmetric. Moreover, symmetry is sufficient to rule out optimality of
cyclical adjustment of the quasi-fixed stocks, which provides further motivation for
a test of this specification in the empirical section of the paper.
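One convenient way to operationalize this restriction is to parameterize the matrix directly through its Cholesky factors, so that non-negative definiteness holds by construction for any admissible parameter values. The short Python sketch below illustrates the idea for a 4 × 4 block; the parameter layout and the clipping of the D_ii are assumptions for illustration, not the paper's GAMS formulation.

    import numpy as np

    def build_psd_matrix(free_params, dim):
        """Return A = L D L' with unit lower-triangular L and D = diag(d) >= 0."""
        n_off = dim * (dim - 1) // 2
        off, d = free_params[:n_off], free_params[n_off:n_off + dim]
        L = np.eye(dim)
        L[np.tril_indices(dim, k=-1)] = off          # strictly lower-triangular entries of L
        D = np.diag(np.maximum(d, 0.0))              # the non-negativity restriction on D_ii
        return L @ D @ L.T

    # Any parameter vector now yields a non-negative definite matrix of price effects:
    A = build_psd_matrix(np.random.default_rng(0).normal(size=4 * 3 // 2 + 4), dim=4)
    assert np.all(np.linalg.eigvalsh(A) >= -1e-12)

In an actual estimation the D_ii would be treated as explicitly constrained (or squared) parameters rather than clipped as in this sketch.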
Two modifications are in order to render the system of equations (9) - (12)
estimable. First, a discrete approximation (K̇ ≈ K_t − K_{t−1}) to (11) is used.
Second, additive disturbances are appended to reflect a random optimization error.
The system of nonlinear equations can then be written:

    f_it(Z_it, Θ) = u_it,   i = 1, ..., m + k + n,   t = 1, ..., T,                          (15)

where Z_it is a matrix of observed data, Θ is a vector of coefficients to be esti-
mated, and u_it is an error of optimization. Assuming that the errors are temporally
independent, each with mean zero, identical distributions, and a positive definite
variance-covariance matrix Ω, the Aitken-type estimator Θ̂ is obtained by minimiz-
ing with respect to Θ

    Σ_i Σ_t f_it(Z_it, Θ)^T (Ω^{-1} ⊗ I) f_it(Z_it, Θ).                                      (16)

This procedure requires a first-stage estimate of Θ using multivariate least squares,
followed by the estimation of Ω based on the inner product of residuals. An estimate
of the variance-covariance matrix is then substituted into (16), and Θ is estimated
using the new weighting matrix. This procedure is repeated until the parameter
vector Θ̂ and the estimated variance-covariance matrix Ω̂ stabilize. For the Aitken
estimator, this iterative procedure yields an asymptotically efficient estimator, which
is otherwise not the case (Malinvaud, 1970).
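The two-step iteration can be sketched in a few lines of Python; the residual function, data shapes, and the use of scipy's least_squares routine are illustrative assumptions, and the inequality constraints of (17) below are omitted from this simplified version.

    import numpy as np
    from scipy.optimize import least_squares

    def iterated_feasible_gls(residuals, theta0, n_eq, max_rounds=20, tol=1e-6):
        """Iterate: (i) estimate theta by weighted least squares, (ii) update Omega from residuals.

        residuals(theta) must return a T x n_eq array of equation-by-equation errors u_it.
        """
        theta = np.asarray(theta0, dtype=float)
        Omega = np.eye(n_eq)                             # first stage: identity weighting
        for _ in range(max_rounds):
            Wc = np.linalg.cholesky(np.linalg.inv(Omega))   # whitening transform for the errors
            fit = least_squares(lambda th: (residuals(th) @ Wc).ravel(), theta)
            u = residuals(fit.x)
            new_Omega = (u.T @ u) / u.shape[0]           # inner product of residuals
            converged = np.max(np.abs(fit.x - theta)) < tol
            theta, Omega = fit.x, new_Omega
            if converged:
                break
        return theta, Omega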
The estimation procedure used in this study must be capable of imposing the the-
oretical curvature restrictions on the value function. This procedure is characterized
by the non-linear constrained optimization problem

    S(Θ) = min_Θ  (1/T) Σ_i Σ_t f_it(Z_it, Θ)^T (Ω^{-1} ⊗ I) f_it(Z_it, Θ)                   (17)

subject to

    h_i(Θ) ≥ 0,   i = 1, ..., m + k + n,

where the h_i(Θ) are non-negativity restrictions based on a Cholesky factorization of
a real symmetric matrix. The objective function (17) is minimized subject to the
constraint set. The model specification and estimation are accomplished using the
constraint set. The model specification and estimation are accomplished using the
General Algebraic Modeling System (GAMS version 2.25) (Brooke, Kendrick and
Meeraus, 1988) on the Theory Center's Supercomputer (CNSF) at Cornell University.
Numerical solutions to this problem are computer intensive and the CNSF was an
extremely valuable resource for this purpose.
Using GAMS, we modify the Aitken procedure by imposing inequality constraints
in the first stage (Ruble, 1968), replacing (Ω^{-1} ⊗ I) in (17) with the identity matrix
and then solving for Θ̂. Using Θ̂, a new estimate of Ω is obtained and the problem
(17) is solved with (Ω̂^{-1} ⊗ I) employed as the weighting matrix. Finally, we iterate
over steps one and two until the estimated parameter vector Θ̂ and error variance-
covariance matrix Ω̂ stabilize (Iterative Least Squares).

4. EMPIRICAL IMPLEMENTATION AND DATA

The empirical model identifies two output categories including livestock and crops.
The stocks of durable equipment, service buildings, land, farm-produced durables,
and self-employed labor are assumed costly to adjust. Hired labor, energy, and other
purchased inputs are assumed to adjust freely to current prices and stocks of the
quasi-fixed inputs.
The output series are defined as the quantities marketed (including unredeemed
Commodity Credit Corporation loans) plus changes in farmer-owned inventories and
quantities consumed by farm household. The indexes of output are based on value
to the producer. For this reason, commodity prices are adjusted to reflect direct
payments to producers under government programs.
The labor data were developed by Gollop and Jorgenson (1980). They disag-
gregate labor input and labor cost into cells cross-classified by two sexes, eight age
groups, five education groups, two employment classes (hired and self-employed),
and ten occupational groups.
The value of labor services equals the value of labor payments plus the imputed
value of self-employed and unpaid family labor. The imputed wage rate is set
equal to the mean wage rate of hired farm workers with the same occupational and
demographic characteristics.
The capital input data are derived from information on investment and the outlay
on capital services. There are twelve investment series used to calculate capital
stocks. The perpetual inventory method (Jorgenson, 1974) is used, and the service
lives are those of Bulletin F published by the U. S. Treasury Department. Rental
prices for each asset are constructed taking account of variations in effective tax rates

and rates of return, depreciation, and capital gains. The value of capital services is
computed as the product of the rental price and the quantity of capital at the end of the
previous period. A more detailed discussion of the procedures used in constructing
the capital price and quantity series is found in Ball (1985).

5. ESTIMATION RESULTS

The system of quasi-fixed input, variable input, and output equations is jointly esti-
mated by an inequality-constrained iterative least squares method, which is equiva-
lent to the maximum likelihood method. Parametric inequality constraints associated
with the convexity of J in (P, W, q) are imposed during estimation. When these
restrictions are true, the resulting estimates are asymptotically efficient. Further jus-
tification for imposing these constraints comes from noting that the derived structural
model is itself an implication of economic theory. It would, therefore, be inconsistent
selectively to utilize only the structural model implied from theory while rejecting
associated parametric restrictions.
This highly nonlinear model has 788 variables, 697 equations, and 8582 nonzero
elements. 4.71 Mbytes are required for execution of each iteration on the variance-
covariance matrix.

Adjustment Matrix
Point estimates of adjustment parameters are reported in Table 1. Since the accepted
model did not preclude interdependent adjustments, off-diagonal elements of this im-
portant matrix are non-zero. A positive off-diagonal element, say for M_13, indicates
that when input 3 is below its long-run value, disinvestment in input 1 is induced.
Similarly, a negative value for the same elements of the adjustment matrix implies
that, under identical circumstances, investment in input 1 will be induced. In this
fashion, numerical values of off-diagonal elements reflect structure of interdependent
adjustments in aggregate U.S. agriculture.
Now turn to the diagonal elements of the adjustment matrix. Consider the element
M_11. A value of −0.160 for this coefficient suggests that, when actual stocks of
durable equipment diverge from their long-run values, it takes a little over six years to
complete the needed adjustment, given that all other inputs are at long-run equilibrium
levels. Similarly, M_22, M_33, and M_44 supply an interpretation of adjustment speeds
for the other quasi-fixed inputs.
Real estate takes approximately fifteen years to adjust, while farm-produced
durables take only two years. By contrast, self-employed and unpaid family labor
takes close to nine years to adjust. Values for the remaining inputs are, for the most part,
numerically similar. One important conclusion emerging from the multiple-input,
multiple-output model is that disaggregation of labor into two categories provides more
reasonable results. Earlier investment studies derived high adjustment lags for labor
when complete supply response systems were estimated (Vasavada and Chambers, 1986).
Since self-employed labor and hired labor have qualitatively different characteristics,
such a disaggregation scheme improves specification in applied dynamic production
models (Gunter and Vasavada, 1988).

TABLE 1
Estimated Adjustment Matrix.

Parameter   Estimate      Parameter   Estimate
M(1,1)      -0.160        M(3,1)       0.824
M(1,2)      -0.127        M(3,2)      -0.050
M(1,3)       0.182        M(3,3)      -0.554
M(1,4)      -0.060        M(3,4)      -0.326
M(2,1)      -0.035        M(4,1)       0.045
M(2,2)      -0.060        M(4,2)      -0.161
M(2,3)       0.019        M(4,3)       0.073
M(2,4)       0.036        M(4,4)      -0.134

Note: 1 is durable equipment and service buildings, 2 is real estate,
3 is farm-produced durables, and 4 is self-employed and unpaid family labor.

Elasticities: Short-Run
Point estimates for the model reported in Table 1 can be used to evaluate short-run
elasticities. One important feature of the adjustment cost model is its emphasis on
maintaining a clear conceptual distinction between short-run and long-run responses
to changing opportunity costs. This was cited earlier as an important justification for
conducting the present study. Since agricultural producers are exposed to constantly
changing opportunity costs, it is useful to evaluate behavior in this volatile economic
environment.
Short-run elasticities are reported in Table 2. Diagonal elements of the matrix
are own-price elasticities; off-diagonal elements are the corresponding cross-price
elasticities. Own-price elasticities for the quasi-fixed inputs are observed to be nu-
merically small in magnitude. This suggests that, given the technological constraints
faced by agricultural producers, a change in the rental price does not evoke signif-
icantly different utilization patterns for these inputs in the short-run. A different
conclusion emerges when short-run elasticities of variable inputs are evaluated. A
change in the price of variable inputs, namely hired labor and purchased inputs,
induces shifts in utilization patterns. Now turn to an examination of own-price elas-
ticities for outputs. This value for livestock is extremely small although values for
all other inputs exceed one. Among outputs considered, the own-price elasticity for
grain was highest. Values for dairy and other crops are numerically similar.

6. CONCLUSIONS AND POLICY IMPLICATIONS

It is possible to distinguish several approaches to econometric model-building. First,


the economist can use ad hoc restrictions to specify a structural model. As an
TABLE 2
Short-run elasticities with respect to prices.

Stocks \ Prices        Equip. &     Real        Farm Prod.   Self-Empl.   Hired                     Purchased
                       Buildings    Estate      Durables     Labor        Labor       Energy        Inputs       Livestock    Crops
Equip. & Bldg.         -1.289E-6    -3.265E-7    4.486E-6    -1.039E-6    -2.957E-6   -7.063E-7     -1.119E-6    -3.456E-6     6.409E-6
Real Estate            -7.848E-8    -1.359E-8    2.184E-7     9.635E-8     1.827E-7   -2.799E-8     -1.859E-8    -4.208E-7     6.202E-8
Farm Durables           5.903E-8    -2.205E-7   -8.3221E-6   -3.5105E-6   -3.410E-6   -4.209E-8     -5.045E-7     3.927E-6     1.202E-5
Self Employed Labor    -9.0371E-7   -3.5204E-7   5.0414E-7   -2.319E-6    -3.957E-6   -5.293E-7     -1.0848E-6   -2.961E-7     8.938E-6
Hired Labor            -0.020       -0.005      -0.012       -0.062       -0.131      -0.016        -0.021       -0.039        0.277
Energy                 -0.022       -0.005      -0.013       -0.025       -0.039      -0.014        -0.017       -0.093        0.241
Purchased Inputs       -0.005       -0.002      -0.007       -0.011       -0.015      -0.003        -0.005       -0.008        0.130
Livestock               0.016        0.002       0.011        0.006        0.004       0.008         0.009        0.105       -0.146
Crops                   0.283        0.0198      0.626        0.277        0.397       0.175         0.134        0.727        0.001
alternative to building ad hoc structural models, one can model dynamic relationships
between economic variables as vector autoregressions. These are essentially reduced-
form relationships, which utilize a priori restrictions only parsimoniously. The model
utilized in the present study is a structural one, with a structure derived explicitly from
relevant economic theory. Restrictions placed on the model are not ad hoc, but rather,
are shown to be implied by the value maximization hypotheses. In this sense, there is
significant justification for adopting this approach. However, there is always the risk
that restrictions imposed on the model, especially curvature, are not supported by the
data. It is argued here that structural models based on dynamic optimization should
be used in their entirety or not at all. Selective utilization of econometric equations
from value maximization without implied parametric restrictions is hard to justify.
This is equally true for static models within the dual framework, where curvature
conditions are frequently not imposed. An important exception is Ball (1988).
Accordingly, a methodology is developed to impose convexity restrictions on a
multiple-input, multiple-output model of aggregate agricultural production. Point
estimates of parameters are used to evaluate short-run elasticities. Generally, short-
run behavioral responses of quasi-fixed inputs to opportunity costs are small. Hence,
implementation of a favorable tax policy in U.S. agriculture would have a small,
albeit non-negligible, effect on investment. By contrast, changing relative prices are
observed to have a significant effect on quantities of variable inputs employed. In a
similar fashion, supply response is also observed to be elastic. A supply management
policy based on manipulating market incentives can prove to be effective, especially
in U.S. agriculture.
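To make the curvature-imposition step concrete, a standard device (in the spirit of Lau, 1978) is to write the matrix whose definiteness is being restricted as the product of a lower-triangular factor and its transpose, so that it is positive semidefinite by construction for any values of the free parameters. The sketch below is illustrative only; it is not the authors' GAMS/MINOS implementation, and the function and argument names are hypothetical.

    import numpy as np

    def psd_from_cholesky(theta, n):
        # Map n*(n+1)/2 free parameters to a positive semidefinite n x n matrix A = L L',
        # with L lower triangular, so the curvature restriction holds for any theta.
        L = np.zeros((n, n))
        L[np.tril_indices(n)] = theta
        return L @ L.T

    # Example: a 4 x 4 curvature-restricted block built from 10 free parameters.
    A = psd_from_cholesky(np.arange(1.0, 11.0), 4)
    assert np.all(np.linalg.eigvalsh(A) >= -1e-12)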
Several shortcomings of the present modeling effort may be mentioned. First, it
is necessary to address the issue of expectations formation in a more sophisticated
manner than was possible with the present model. Inclusion of non-static expectations
would serve to strengthen the foundations of the multiple-input, multiple-output
model. This will necessarily entail imposition of highly complex cross-equation
restrictions. For this reason, incorporating non-static expectations is relegated to
a future goal. A second issue that needs to be addressed is the inclusion of U.S.
agricultural policy variables in a more direct fashion than was possible within the
existing framework. While recognizing the importance of this objective, it should be
noted that the U.S. agricultural production sector has been exposed to a diverse set
of policy instruments, which have changed constantly over time. It is by no means
an easy task to integrate this matrix of policy interventions into the existing model.
Future investigations must, therefore, concentrate on improving model specification,
making the models more useful to policy analysts, and performing hypothesis tests on
the structure of the production system. Finally, alternative functional forms must be
examined to evaluate the robustness of the empirical results to alternative specifications
(Baffes and Vasavada, 1989). These are
some meaningful directions to pursue in the study of resource response to changing
market incentives.

REFERENCES

1. Arrow, K. and M. Kurz (1970), Public Investment, the Rate of Return and Optimal Social
Policy, Baltimore; Johns Hopkins Press.
2. Bacharach, M. (1965), "Estimating Non-Negative Matrices From Marginal Data," Inter-
national Economic Review, 6: 294-310.
3. Baffes, J. and U. Vasavada (1989), "On the Choice of Functional Forms in Agricultural
Production Analysis," Applied Economics, 21: 1053-1061.
4. Ball, V. E. (1988), "Modelling Supply Response in a Multiproduct Framework," Ameri-
can Journal of Agriculture Economics, 70: 813-25.
5. Ball, V.E. (1985), "Output, Input, and Productivity Measurement in U.S. Agriculture,"
American Journal of Agricultural Economics, 67: 475-86.
6. Brooke, A., D. Kendrick, and A. Meeraus (1988), GAMS: A User's Guide, The Scientific
Press.
7. Epstein, L. G. (1981), "Duality Theory and Functional Forms for Dynamic Factor De-
mands," Review of Economic Studies, 48: 81-95.
8. Chambers, R. G. and R. Lopez (1984), "A General Dynamic Supply Response Model,"
Northeastern Journal of Agricultural and Resource Economics, 13: 142-154.
9. Gollop, F., and D. Jorgenson (1980), "U.S. Productivity Growth by Industry, 1947-
73." in New Developments in Productivity Analysis, ed. 1. Kendrick and B. Vaccara.
National Bureau of Economic Research, Studies in Income and Wealth, Vol. 44. Chicago;
University of Chicago Press.
10. Gunter, L. and U. Vasavada (1988), "Dynamic Labor Demand Schedules for U.S. Agri-
culture," Applied Economics.
11. Jorgenson, Dale W. (1974), "The Economic Theory of Replacement and Depreciation,"
Econometrics and Economic Theory: Essays in Honor of Jan Tinbergen. Ed. W. Sellekaerts;
London: Macmillan Publishing Co., pp. 189-222.
12. Lau, L. J. (1978), "Testing and Imposing Monotonicity, Convexity, and Quasi-Convexity
Constraints," Production Economics: A Dual Approach to Theory and Applications. Ed.
D. McFadden and M. Fuss; Amsterdam, North Holland.
13. Malinvaud, E. (1970), Statistical Methods of Econometrics, Amsterdam, North Holland,
1970.
14. Mortensen, D.T. (1973), "Generalized Cost of Adjustment and Dynamic Factor Demand
Theory," Econometrica, 41: 657-665.
15. Murtagh, B. and M. Saunders (1983), MINOS 5.0 User's Guide, Systems Optimization
Laboratory Technical Report SOL 83-20, Stanford University, Stanford, CA.
16. Nadiri, M. I., and S. Rosen (1969), "Interrelated Factor Demands," American Economic
Review, 59: 457-71.
17. Rao, C.R. (1973), Linear Statistical Inference and Its Applications, John Wiley and Sons;
New York.
18. Ruble, W.L. (1968), "Improving the Computation of Simultaneous Stochastic Linear
Equation Estimates," Ph.D. Thesis, Department of Economics, Michigan State University;
East Lansing.
19. Shumway, C.R. (1983), "Supply, Demand, and Technology in a Multiproduct Industry:
Texas Field Crops," American Journal of Agricultural Economics, 65: 748-60.
20. U.S. Treasury Department (1942), Bureau of Internal Revenue, Income Tax Depreciation
and Obsolescence, Estimated Useful Life and Depreciation Rates, Bulletin F; Washington, DC.
21. Vasavada, U., and V.E. Ball (1988), "Modeling Dynamic Adjustment in a Multi-Output
Framework," Agricultural Economics.
22. Vasavada, U., and R. G. Chambers (1986), "Investment in U.S. Agriculture," American
Journal of Agricultural Economics, 68: 950-60.
23. Weaver, R.D. (1983), "Multiple Input, Multiple Output Production Choices and Technology
in the U.S. Wheat Regions," American Journal of Agricultural Economics, 65: 45-56.

ATREYA CHAKRABORTY AND CHRISTOPHER F. BAUM*

Intensity of Takeover Defenses: The Empirical Evidence

ABSTRACT. This paper focuses on the construction of an index of the intensity of firms'
antitakeover defenses. While many aspects of corporate behavior are qualitative in nature,
an evaluation of a firm's stance and the underlying motives for its behavior often depend on
the elements of a set of qualitative factors. The interactions between these factors are likely
to have important implications. In this context, only a composite measure will capture these
interactions and their implications for firms' actions. We focus on the creation of an ordinal
measure of anti-takeover defenses and utilize the ordered probit estimation technique to relate
the magnitude of this measure to the motives for instituting these defenses. Our estimates are
generally supportive of the managerial entrenchment hypothesis.

1. INTRODUCTION

The evaluation of corporate behavior generally involves the assessment of a firm's
stance toward its competitors in the markets in which it operates. Whether we are
examining the firm's behavior in the product markets, the factor markets or the
market for corporate control, we would expect that measurement of the firm's stance
involves both quantitative and qualitative aspects. Much of financial research has
focused upon the readily quantifiable aspects of behavior. However, if we focus on
the market for corporate control, it is apparent that many aspects of a firm's stance
can only be reflected in qualitative factors: for instance, does the firm offer "golden
parachute" severance contracts to its executives? Since the presence or absence of
such a clause in an executive's contract is likely to have important incentive effects
on their behavior, it is the qualitative aspect that we must study.
While techniques such as binomial logit or probit may be readily applied to
individual qualitative measures and their causal factors, such models fall short of
capturing the essence of many legal aspects of a firm's stance. For instance, the recent
wave of "anti-takeover amendments" adopted by major American corporations has
left us with an entire array of qualitative factors at the firm level. Firm ABC may
have adopted takeover defenses 1, 4, and 6, whereas its rival Firm XYZ may have
chosen to adopt defenses 2, 6, and 9. Which firm has the stronger defenses against
an hostile takeover? As in the evaluation of any deterrent capability, such a question
is difficult to answer.
* We thank Robert Taggart, Jr., Richard Arnott, Stephen Polasky, seminar participants at
Université du Québec à Montréal, and participants in the 1992 meetings of the Society for
Economic Dynamics and Control and the Financial Management Association for their valuable
comments, criticisms and many insightful discussions. The usual disclaimer applies.


In this study, we attempt to resolve this quandary by constructing an ordinal index
of the strength of firms' takeover defenses, which should enable us to categorize
firms by "intensity." Although this technique is applied specifically to the study of
firms' takeover defenses, it should be evident that it could equally well be applied to
many sets of qualitative aspects of firms' behavior-for instance, in evaluating their
marketing strategy, the various dimensions of their research and development effort,
or the importance of various types of intangible assets on their balance sheet.
The plan of the paper is as follows: the next section gives a brief summary
of the types of anti-takeover amendments and discusses some of the evidence in
the literature regarding their effects on corporate performance. In Section 3, the
composite measure of intensity is developed in the context of a dataset of 68 U.S.
corporations, and the ordered probit technique is described. Section 4 specifies an
hypothesis under which the intensity of anti-takeover defenses should be related to
various causal factors and presents our empirical findings from the ordered probit
model. Section 5 concludes and presents suggestions for future research.

2. ANTI-TAKEOVER AMENDMENTS IN AMERICAN CORPORATIONS

The merger wave of the 1980s, coupled with the sophistication of investment banks'
financial engineers, caused many large corporations in the United States to include
anti-takeover amendments in their corporate charters. Rosenbaum (1986) details
anti-takeover measures for 424 Fortune 500 firms as of May 1986. Among these
companies, a surprisingly large number (403) had at least some amendments that
were designed to have anti-takeover consequences or that could be adopted to thwart
takeover attempts. Of these firms, Rosenbaum documents 143 as having poison
pills, 158 with fair price amendments, 223 with classified boards, 362 with blank
check provisions, 65 that require a supermajority to approve a merger, and 222 firms
as having some types of limited shareholder rights. We now define each of these
defensive measures.

Poison Pills: These are preferred stock rights plans adopted by the management,
generally without the shareholders' approval. These amendments are exclusively
tailored to thwart hostile bids by triggering actions that make the target financially
unattractive.

Fair Price Amendments: These are designed to prevent two-tier takeover offers.
They require that the bidders pay all the tendering shareholders the same price. Most
fair price provisions can be waived if the bidder's offer is approved by a supermajor-
ity of target shareholders. This supermajority requirement may be as low as 66% or
as high as 90%.

Classified Boards: Such amendments divide the board of directors into three classes.
Each year only one class of directors is due for election. This prevents a raider from
immediately replacing the full board and taking control of a company, even if the
raider controls a majority of the shares. More importantly, such amendments also
make proxy contests over control extremely difficult.

Blank Check: These give the managers (via the board of directors) a "very broad
discretion to establish voting, dividend conversion and other rights for preferred
stock that a company may use" (Rosenbaum, 1986, p. 7). Such discretionary powers
may easily be used to issue securities primarily intended to thwart takeovers (poison
pills). Finally, since the SEC requires companies seeking to issue preferred shares to
disclose to shareholders that unused preferred stock may have anti-takeover effects,
regardless of the company's professed intention, Rosenbaum contends that blank
checks should indeed be classified as anti-takeover measures.

Supermajority requirements: These provisions require approval thresholds for specified
actions that are far higher than those set by state laws. Actions which would normally require
majority approval, under these amendments, require approval levels as high as 90%.
Such provisions require a hostile bidder to obtain higher percentages of shares to
obtain control over a firm.

Dual Class Recapitalization: A new class of equity is distributed to shareholders
with superior voting rights but inferior dividends or marketability. This permits
incumbent managers to obtain a majority of votes without owning a majority of the
common shares.

Limiting Shareholders' Rights: These include provisions such as: no shareholder
action by written consent (without a meeting), procedural requirements for share-
holders to nominate directors, restricting shareholders from calling special meetings,
and supermajority requirements to repeal classified boards. Each of these measures
selectively, or in conjunction with other measures, can be used to deter the share-
holders from facilitating changes in corporate control.
Studies investigating the effects of anti-takeover amendments (ATAs) have been, on the
whole, quite inconclusive about the net effect of these financial innovations. DeAn-
gelo and Rice (1983), while examining a sample of NYSE listed firms adopting
anti-takeover amendments during 1971-79, found statistically insignificant (albeit
negative) abnormal stock returns around the announcement of such amendments.
Conversely, Linn and McConnell's (1983) investigation into abnormal returns at the
announcement date for 475 NYSE listed firms (between 1960-80) found significantly
positive abnormal returns. Malatesta and Walkling (1988) report statistically signifi-
cant reductions of shareholders' wealth for firms that adopt poison pill defenses. They
also note that firms adopting such defenses were significantly less profitable than the
average firm in their industries during the year prior to adoption. Jarrell and Poulsen
(1987), investigating similar reactions for 600 firms over the period 1979-85, de-
tected a significantly negative price reaction for certain kinds of amendments. Pugh
et al. (1992), investigating the impact of charter amendments' adoptions on firms'
capital expenditures and R&D outlays, conclude that managers take a longer-term
view following such actions. In light of these results, it is difficult to determine
unambiguously whether the adoption of ATAs is consistent with shareholders' interests.

3. A MEASURE OF INTENSITY OF ANTI-TAKEOVER AMENDMENTS

Many researchers' analyses of takeover defenses have been hindered by the general
unavailability of data on the prevalence of such measures. Although a detailed study
of a firm's SEC filings and annual reports would provide much of this information,
there are still serious issues of heterogeneity and classification of the firm-specific
measures into generally accepted categories.
Our goal is an index that incorporates all categories of defensive strategies into
a single ordinal index value. The presence of a particular anti-takeover defense in a
corporate charter is a qualitative factor. Since there is no consensus on the severity of
various defensive mechanisms, it would appear that any cardinal index of the strength
of these defenses would be arbitrary. To deal with this critique, we have used the
qualitative information in Rosenbaum's Takeover Defenses dataset, combined with
other measures of firms' characteristics, to build an ordinal index of "intensity."
We combine data from Rosenbaum's dataset, which indicates whether firms had
various anti-takeover amendments in place as of 1986, with firms' characteristics from
Thies and Baum's Panel84 dataset.1 The latter dataset contains annual data on the
firm level for 1977-1983 for a total of 134 large U.S. manufacturing corporations.
Panel84 reconstructs financial statements on a replacement-cost-accounting basis,
exploiting inflation-adjusted data obtained from firms' Forms 10-K and annual report
disclosures required during this period by the Securities and Exchange Commission
and the Financial Accounting Standards Board.2 These data are particularly appropriate
for this study since we hypothesize that Tobin's q is a relevant explanatory variable,
and Panel84 contains consistent estimates of Tobin's q that are largely free from the
imputation bias created by the commonly-used methods of Brainard, Shoven and
Weiss (1980). This bias may be especially harmful, as it represents measurement
error correlated with common indicators of firm performance, as noted by Klock et
al. (1991).
Of the 134 firms in Panel84, there are 68 which are also to be found in Rosen-
baum's Takeover Defense dataset. 3 We use these matching firms in the empirical
analysis of the next section. Since the Panel84 data provides us with annual detail
of firms' performances from 1977 through 1983, it can be viewed as exogenous to
the observation of firms' takeover defenses in 1986. Although the intervening years
could provide useful information, the use of firm characteristics from the 1977-1983

1 An earlier version of the dataset, containing 100 firms, is documented in Thies and
Sturrock (1987), and was further described in Klock, Thies and Baum (1991). Of these 100
firms, 98 appear in the present dataset. Data for the additional 36 firms were gathered by Glenn
Rudebusch and Steven Oliner of the Federal Reserve Board of Governors.
2 During the period in which firms were required to report current cost data in their
annual reports (1976-1983), they were given broad leeway in the methodology used for their
calculations. Thus, the accuracy of these figures could be debated. Nevertheless, we presume
that these estimates of replacement cost are likely to be more reliable than those which could
be constructed by outside researchers via adjustments to historical costs, using aggregate price
deflators for capital goods, without access to firm-specific vintage data.
3 A list of the 68 firms and their two-digit industry codes is available from the authors on
request.

TABLE 1
Descriptive statistics of the sample.
Variable Mean Std. Dev. Minimum Maximum
Tobin's Q 0.97 0.43 0.45 2.83
Financial Leverage 0.43 0.34 0.02 1.50
R&D per dollar of Sales 0.025 0.026 0.0 0.110
Advertising per dollar of Sales 0.015 0.027 0.0 0.166
Sigma from CAPM regression 7.58 2.74 4.45 19.94
CAPM Beta 1.04 0.46 2.49
Sales, $ Bil. 3.330 0.196 62.790
Cash Flow, $ Bil. 0.215 -0.003 4.398
Net Income, $ Bil. 0.086 -0.332 1.087
Total Assets, $ Bil. 3.221 0.185 46.456
Pr{Poison Pill} 0.29 0.46
Pr{Blank Check} 0.85 0.36
Pr{Classified Board} 0.62 0.49
Pr{Fair Price Amendment} 0.41 0.50
Pr{Supermajority for Merger} 0.19 0.40
Pr{Dual Class Recapitalization} 0.09 0.29
Pr{Limited Shareholder Rights} 0.60 0.49
Notes: Statistics (other than the ATA probabilities) are based on the firm average values
for 1977-1983. There are 68 firms in the analysis.

period allows us to avoid possible simultaneity between firms' adoption of takeover
defenses and the financial markets' reactions to those events. Of course, our knowl-
edge that a certain defense is in place in 1986 does not tie down the date of its
adoption, but the use of panel data from a seven-year window should mitigate the
"announcement effects" if a defense was first put in place during this seven-year pe-
riod. Descriptive statistics based on the time-series average values of the explanatory
variables used in our analysis are presented in Table 1. A number of other variables'
descriptive statistics from this seven-year period are given to illustrate the range of
firms considered in this sample. We also present the sample proportions of firms
who have adopted the various anti-takeover amendments described in the previous
section.
The takeover defenses are classed, via extensions to Ruback's (1988) scheme,
as either innately mild or severe. The "mild" defenses are those in the set Fair
Price, Supermajority, Blank Check, and Limited Rights. The "severe" defenses are,
according to Ruback, Poison Pill, Dual Class Recapitalization, and Classified Board.
The most disaggregate form of the ordinal index is defined in terms of the number of
"mild" defenses (NRMILD) and the number of "severe" defenses (NRSEV) as

Index value    Definition                                    Number of firms
0              NRMILD = 0 and NRSEV = 0                      2
1              NRMILD > 0 and NRSEV = 0                      17
2              NRMILD ≤ 3 and NRSEV = 1                      28
3              (NRMILD > 3 and NRSEV = 1) or NRSEV ≥ 2       21
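As a concrete illustration, the index can be computed mechanically from the seven defense indicators. The short sketch below simply encodes the classification rule in the table above; the data frame and column names are hypothetical, not those of the Rosenbaum or Panel84 files.

    import pandas as pd

    MILD = ['fair_price', 'supermajority', 'blank_check', 'limited_rights']   # hypothetical 0/1 columns
    SEVERE = ['poison_pill', 'dual_class', 'classified_board']

    def intensity(row):
        # NRMILD and NRSEV are the counts of "mild" and "severe" defenses in place.
        nrmild = sum(row[c] for c in MILD)
        nrsev = sum(row[c] for c in SEVERE)
        if nrsev == 0:
            return 0 if nrmild == 0 else 1
        if nrsev == 1:
            return 2 if nrmild <= 3 else 3
        return 3                                  # two or more severe defenses

    # firms = pd.read_csv('defenses.csv')         # hypothetical input: one row per firm
    # firms['intensity'] = firms.apply(intensity, axis=1)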

Although this method yields an ordinal measure of the intensity of anti-takeover
amendments, the problem still remains. How might this ordinal measure be related
to a set of causal factors so that an hypothesis on the intensity of ATAs might be
rigorously tested? The solution to this problem is to be found in the econometric
technique of ordered probit analysis (McKelvey and Zavoina, 1975). Standard bino-
mial probit models relate the probability of observing a particular attribute to a set of
causal factors via a nonlinear transformation: the cumulative distribution function of
the error process. Multinomial probit models allow for multiple, mutually exclusive
and exhaustive alternatives, but make no assumption about the ordinality of those
alternatives (e.g. choosing to commute via train is not ambiguously preferred to the
bus or the car-pool). Here, however there is more information from the values of
the ordinal index, and even though merely ordinal, we can relate these values to an
unobservable "latent variable" in an extension of the binomial probit technique.
In binomial probit, the explanatory variables (and a Gaussian error term) are
linearly related to an unobservable latent variable, or index, as I_i = Xβ + ε_i. If that
index equals or exceeds a threshold value, we observe the attribute (y_i = 1); if it
falls short of the threshold, we do not (y_i = 0). While the effect of an explanatory
variable on the index is linear, the effect on the predicted probability of observing
the attribute is not, since Pr[y_i = 1] = F(I_i), where F is the cumulative distribution
function of the normal distribution.
The ordered probit technique follows the same approach, but assumes that there
are multiple thresholds that the latent variable may cross. The basic model is:

    z = Xβ + ε,    ε ~ N[0, 1],

where

    y = 0  if  z < μ_0,
    y = 1  if  μ_0 < z < μ_1,
    y = 2  if  μ_1 < z < μ_2,
    y = J  if  z > μ_{J-1}.

Here y is observed, while z is not. We wish to estimate the parameters of the β vector
as well as the threshold vector μ. Since the model includes a constant term, one of the μ's is not
identified. We accordingly normalize μ_0 to zero, and estimate μ_1, ..., μ_{J-1}. In our
setting, J is equal to 3, the maximum value of intensity, so that estimates of μ_1 and
μ_2 may be recovered. As in the binomial probit model, we cannot separately identify
the variance of the error term, which is thus set to unity. The ordered probit estimator
is available as a component of LIMDEP (Greene, 1992) and is further described in
Greene (1990, pp. 703-706).
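For readers without access to LIMDEP, the estimator is also straightforward to code directly. The sketch below is a generic maximum-likelihood implementation of the model just described; it is not the authors' code, the data frame and column names are hypothetical, and the positive-increment parameterization of the thresholds is simply one convenient way of keeping them ordered during optimization.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def neg_loglik(params, X, y, J):
        # z = b0 + X b + e, e ~ N(0,1); mu_0 is normalized to zero, and the remaining
        # thresholds are cumulative sums of exp(.) terms so that mu_1 < ... < mu_{J-1}.
        k = X.shape[1]
        b = params[:k + 1]
        mu = np.concatenate(([-np.inf, 0.0], np.cumsum(np.exp(params[k + 1:])), [np.inf]))
        z = b[0] + X @ b[1:]
        p = norm.cdf(mu[y + 1] - z) - norm.cdf(mu[y] - z)      # P(y_i = j)
        return -np.sum(np.log(np.clip(p, 1e-300, None)))

    # Hypothetical inputs: df holds 'q_avg', 'lev_avg' and the ordinal 'intensity' (0..3).
    X = df[['q_avg', 'lev_avg']].to_numpy()
    y = df['intensity'].to_numpy().astype(int)
    J = 3
    res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + J), args=(X, y, J), method='BFGS')
    beta_hat = res.x[:X.shape[1] + 1]                          # constant and slopes
    mu_hat = np.cumsum(np.exp(res.x[X.shape[1] + 1:]))         # recovered mu_1, mu_2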

4. ATA INTENSITY AND FIRMS' CHARACTERISTICS

In this section, we formulate an hypothesis under which the intensity of firms' anti-
takeover amendments should be related to observable measures of their behavior.
If we observe firms altering their corporate charters to include these amendments
- as we observed in large numbers in the 1980s - we might conclude that firms' actions
are merely reflective of their shareholders' interests and that these actions are being
taken to maximize the value of the firm, as neoclassical theory would suggest.
An alternative explanation has been provided by finance researchers analyzing the
aspects of agency costs. In this formulation, protective measures such as ATAs may
well be indications of an "entrenched management" - essentially, managers who are
looking after their personal interests first and foremost and may indeed take actions
contrary to shareholders' best interests. In examining the intensity of ATAs, we
consider explanatory factors that would be indicative of the entrenched management
hypothesis. Under this hypothesis, we assume that the primary role of anti-takeover
amendments is to insulate underperforming managers from the discipline of the
market for corporate control.
The entrenched management hypothesis suggests that firms that are more likely to
be takeover targets should display a greater intensity of takeover defenses. What gives
rise to this vulnerability? One obvious cause is a poor performance - irrespective
of reason - that leads the firm's valuation in the financial markets to be low. Since
the financial markets' evaluations of a firm's worth are forward-looking measures,
a firm with poor performance and poor prospects will have a low valuation in the
stockmarket and may well be a takeover target - especially if a raider considers the
root of that poor performance to be inadequate management. The shareholders of
such a firm would not want to place obstacles in the raider's path, and if they share the
market's low opinion of managerial talent, they would encourage a takeover. Such
managers would, naturally, have every reason to protect themselves, most especially
in the case where they recognize their own shortcomings.
To quantify this rationale, we utilize Tobin's q as an objective and non-myopic
indicator of current management's performance. Past research on the relation between
q and takeovers (Servaes, 1991; Lang et aI., 1989; Morck et aI., 1988) has interpreted
q as a measure of managerial performance: e.g. "In general the shareholders of low
q targets benefit more from takeovers than shareholders of high q targets." (Lang et
al., 1989, p. 137).

If q were indeed a good proxy for managerial efficiency, and if defensive strate-
gies were primarily instituted for shareholders' interests, then one should not expect
Tobin's q to have any explanatory power in predicting the adoption of defensive
measures. However, if the manager were in charge of instituting defensive instru-
ments, one would expect an inverse relationship between anti-takeover amendments
and Tobin's q: the less efficient the manager (as signalled by a lower q), the greater
would be the need for her to insulate herself from the disciplining forces of the market.
The second explanatory variable which we include in the model of ATA intensity
is a measure of financial leverage. The rationale for including financial leverage as
an explanatory variable for the analysis of takeover defenses comes from the works
of Jensen (1986) and Ross (1977). The role of debt in motivating organizational
efficiency is well documented in the principal-agent literature. Jensen (1986) points
to the possibility that, more than any other action, debt creation can actually lend
credibility to the management's promise to payout future cash flows. Thus, higher
levels of debt lower the agency cost of cash flows by reducing the discretionary levels
of cash flow available to the management. Conversely, managers less willing to be
exposed to the disciplines of the financial markets would tend to have lower levels
of debt and greater need for defensive measures to insulate them from the market for
corporate control.
The explanatory power of leverage in predicting anti-takeover amendments should
be minimal if we assume that these amendments primarily serve the shareholders'
interests. We should not expect any systematic relationship between leverage and
defensive measures under this hypothesis if we assume a well functioning market for
corporate control. However, if defensive amendments serve primarily the interests of
an entrenched management, ceteris paribus, one would expect an inverse relationship
between leverage and the probability of adopting anti-takeover measures.
Under the hypothesis that the adoption of anti-takeover amendments reflects the
actions of an entrenched management, we would expect to find
Pr{ATA} = f[ Tobin's q (-), Leverage (-) ]
with respect to the likelihood of observing individual ATAs, or, by making use of the
index of ATA intensity, that
Intensity{ATA} = f[ Tobin's q (-), Leverage (-) ],
where the signs in parentheses denote the expected (negative) partial effects.

The explanatory variables are defined as a market-value-based Tobin's q measure,
Q-AVG, which is the simple average of Tobin's q from the Panel84 dataset for
the years 1977-1983, as well as LEV-AVG, the firm's average financial leverage
(debt/equity ratio)4 over the seven years.
The first column of Table 2 presents the ordered probit estimates of Intensity. The
model is quite successful, with both Tobin's q and financial leverage possessing sig-
nificantly negative coefficient estimates, as predicted by the managerial entrenchment
hypothesis.
4 Measured in market (or fair) value terms.

TABLE 2
Ordered probit estimates of the intensity of takeover defense strategies.
Dependent variable       Intensity               Intensity2
Constant                  5.396  (2.7)            4.997  (2.6)
Q-AVG                    -2.110 (-3.5)           -1.897 (-3.1)
LEV-AVG                  -1.351 (-2.6)           -1.201 (-2.2)
μ_1                       2.134  (1.3)            2.026  (1.4)
μ_2                       3.395  (2.0)
Log-likelihood          -67.666                 -37.939
Model χ²                 24.947 (0.5×10^-7)^a    17.474 (0.00001)^a
Notes:
Estimates are based on 68 observations. Asymptotic t-statistics are given in
parentheses beside the estimated coefficients. The model χ²-statistic tests the
hypothesis that all slopes are zero; its tail probability (p-value) is given in
parentheses. "a" denotes significance at the one per cent level.

To ensure that these results are not an artifact of the particular construction of the Intensity index,
we constructed an aggregation of the index, collapsing categories 2 and 3 into a single
category. The resulting Intensity2 index assigns the same value to all firms with one or
more "severe" anti-takeover amendments. The results from this
version of the ordinal index are given in the second column of Table 2. They are
qualitatively very similar to those achieved with Intensity: Tobin's q and leverage
are highly significant with the expected negative sign.
In practice, the value of such a model of Intensity rests on its predictive
power. Table 3 reports the distribution of "actual" values of the original Intensity
index, and presents the distribution of predicted values for the model given in Table 2.
The discrete prediction of the model, for each firm, is taken as the alternative among
the set {O, 1,2,3} that has the greatest probability. Both the full and restricted models
correctly classify the two firms with Intensity = O. Both models underpredict the

TABLE 3
Ordered probit predictions of intensity.
                         Predicted
Actual      0      1      2      3      Total
0           2      0      0      0      2
1           0      6      9      2      17
2           0      7     14      7      28
3           0      0      9     12      21
Total       2     13     32     21      68
Notes:
Cells are the frequencies of actual and predicted outcomes. The
predicted outcome has maximum probability. The results correspond
to the model reported in the first column of Table 2.

TABLE 4
Predicted probabilities of intensity levels for variations in Tobin's q and leverage.
Intensity                      0        1        2        3
30% below average q         0.0004   0.1072   0.4013   0.4910
20% below average q         0.0008   0.1498   0.4395   0.4099
10% below average q         0.0015   0.2021   0.4639   0.3325
Average q                   0.0029   0.2636   0.4719   0.2616
10% above average q         0.0054   0.3326   0.4626   0.1994
20% above average q         0.0095   0.4063   0.4371   0.1471
30% above average q         0.0161   0.4809   0.3981   0.1048

30% below average leverage  0.0017   0.2105   0.4661   0.3218
20% below average leverage  0.0020   0.2275   0.4694   0.3011
10% below average leverage  0.0024   0.2452   0.4714   0.2810
Average leverage            0.0029   0.2636   0.4719   0.2616
10% above average leverage  0.0035   0.2826   0.4710   0.2429
20% above average leverage  0.0041   0.3022   0.4687   0.2249
30% above average leverage  0.0049   0.3223   0.4650   0.2078

prevalence of Intensity = 1 in the data, while overpredicting Intensity = 2. The presence
of the strongest measures - denoted by Intensity = 3 - is underpredicted, with 12 of
the 21 firms being correctly classified.
To consider the implications of the model of Intensity further, in Table 4 we present
the predicted probabilities that a firm will fall in Intensity categories {0, 1, 2, 3} as a
function of variations in each of the explanatory variables. The row labeled "Average" in each

[Figure 1: two panels plotting the distribution of the latent index (horizontal axis in standard
deviation units), with the fitted index -b'X and the thresholds μ(1)-b'X and μ(2)-b'X marked.]
Fig. 1. Predicted probabilities of intensity levels. (a) Evaluated at point of means. (b)
Evaluated at 20% below average q, mean leverage.

block is the predicted probability distribution at the multivariate point of means: in
our sample, for average Tobin's q of 0.97 and average leverage of 0.43. The table
shows how the probability distribution shifts with a decrease or increase in one of
the explanatory variables, holding the other at its mean. For instance, the first row
of the table considers the probabilities for a Tobin's q 30 per cent below the average
value. The model predicts, in this case, a probability of 0.49 that such a firm will
have Intensity = 3: at least one of the severe anti-takeover defenses accompanied
by more than three of the less severe defenses, or at least two of the severe defenses.
The distribution shifts markedly toward a stronger defensive posture with lower than
average q, and vice versa. Figure 1 depicts how this shift takes place, illustrating
how the thresholds between strength categories are displaced leftward when we move
from average q levels to a q level 20 per cent below average. The results for variations
in leverage are similar, if less marked. The probability that a firm will have Intensity
= 3 rises from 0.26 (with average leverage) to 0.32 for a firm with leverage 30 per
cent lower than the average. These results predict a very strong response, in terms
of the intensity of anti-takeover amendments, to either a low value of q or lower
leverage. Since a low value for either of these financial characteristics is indicative
of a greater probability of a raid (Servaes, 1991; Palepu, 1986), the presence of these
amendments should not be expected to reflect shareholders' interests.
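The arithmetic behind Table 4 can be reproduced, up to rounding in the published coefficients, directly from the point estimates in the first column of Table 2: each category probability is the normal probability mass between adjacent thresholds, evaluated at the fitted index. A minimal sketch of that calculation (not the authors' code) follows.

    from scipy.stats import norm

    b0, b_q, b_lev = 5.396, -2.110, -1.351                     # Table 2, column 1
    cuts = [-float('inf'), 0.0, 2.134, 3.395, float('inf')]    # mu_0 = 0 plus estimated mu_1, mu_2

    def intensity_probs(q, lev):
        z = b0 + b_q * q + b_lev * lev
        return [norm.cdf(cuts[j + 1] - z) - norm.cdf(cuts[j] - z) for j in range(4)]

    print(intensity_probs(0.97, 0.43))           # approximately the "Average q" row of Table 4
    print(intensity_probs(0.70 * 0.97, 0.43))    # q 30% below average: Intensity = 3 is most likely
    # The discrete prediction used in Table 3 is the category with the largest of these probabilities.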

5. CONCLUSIONS

The presence of corporate takeover defenses - taken in combination - is shown to
be clearly related to firm-specific factors such as Tobin's q and financial leverage.
This result has been achieved by taking the qualitative information available on a
firm's stance toward takeovers and applying a fairly loose classification to generate
an ordinal index. The ordered probit technique is then used to relate the level of the
index to a set of explanatory factors.
The multi-dimensional nature of many aspects of corporate behavior suggests
that this technique might be useful in many aspects of empirical research. Since we
are often unable to quantify the magnitude of a firm's stance toward the markets in
which it participates, this method may have wide applicability. Indeed, although we
have focused on corporate decision-making, the method would clearly be applicable
to the analysis of countries, industries, or individuals.

REFERENCES

Brainard, W., John Shoven, and Laurence Weiss, 1980. "The Financial Valuation of the Return
to Capital". Brookings Papers on Economic Activity 2, 453-511.
DeAngelo, Harry, and Edward M. Rice, 1983. "Anti-takeover Charter Amendments and
Stockholder Wealth". Journal of Financial Economics 11, 329-359.
Greene, William, 1990. Econometric Analysis. New York: Macmillan Publishing Co.
Greene, William, 1992. LIMDEP Version 6.0 User's Manual. Bellport, NY: Econometric
Software, Inc.
Jarrell, Gregg and Annette Poulsen, 1987. "Shark Repellents and Stock Prices: The Effects of
Antitakeover Amendments Since 1980". Journal of Financial Economics 19, 127-168.
Jensen, Michael C., 1986. "Agency Costs of Free Cash Flow, Corporate Finance, and
Takeovers". American Economic Review 76, 323-329.
Klock, M., C.F. Thies, and C.F. Baum, 1991. "Tobin's q and Measurement Error: Caveat
Investigator". Journal of Economics and Business 43, 241-252.
Lang, L., R. Stulz, and R.A. Walkling, 1989. "Managerial performance, Tobin's q and the
gains from successful tender offers". Journal of Financial Economics 24, 137-154.
Linn, Scott C., and John T. McConnell, 1983. "An Empirical Investigation of the Impact of
Antitakeover Amendments on Common Stock Prices". Journal of Financial Economics
11, 361-399.
Malatesta, Paul H., and Ralph A. Walkling, 1988. "Poison Pill Securities". Journal of Financial
Economics 20, 347-376.
McKelvey, R.D., and W. Zavoina, 1975. "A Statistical Model for the Analysis of Ordinal Level
Dependent Variables". Journal of Mathematical Sociology 4, 103-120.
Morck, Randall, Andrei Shleifer, and Robert W. Vishny, 1988. "Characteristics of Targets of
Hostile and Friendly Takeovers", in Corporate Takeovers: Causes and Consequences,
Alan Auerbach, ed. Chicago: University of Chicago Press.
Palepu, Krishna G., 1986. "Predicting Takeover Targets: A Methodological and Empirical
Analysis". Journal of Accounting and Economics 8, 3-37.
Pugh, W.N., Page, D.E., and J.S. Jahera, Jr., 1992. "Antitakeover Charter Amendments:
Effects on Corporate Decisions". Journal of Financial Research 15:1, 57-67.
Rosenbaum, Virginia K., 1986. "Takeover Defenses: Profiles of the Fortune 500". Washington:
Investor Responsibility Research Center, Inc.
Ross, S.A., 1977. "The Determination of Financial Structure: The Incentive-Signalling Ap-
proach". Bell Journal of Economics 8, 23-40.
Ruback, Richard, 1988. "An overview of takeover defenses", in Mergers and Acquisitions,
Alan Auerbach, ed. Chicago: University of Chicago Press.
Servaes, Henri, 1991. "Tobin's Q and the Gains from Takeovers". Journal of Finance 46:1,
409-419.
Thies, Clifford and Thomas Sturrock, 1987. "What Did Inflation Cost Accounting Tell Us?"
Journal of Accounting, Auditing and Finance Fall 1987, pp. 375-391.
List of Contributors

V. Eldon Ball
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA

Ravi Bansal
Department of Economics, Duke University, Durham, NC, USA

Christopher F. Baum
Department of Economics, Boston College, Chestnut Hill, MA, USA

C.R. Birchenhall
Department of Econometrics, University of Manchester, Manchester, England

Ismail Chabini
Centre de Recherche sur les Transports, University of Montreal, Montreal, Canada

Atreya Chakraborty
Lemberg Program in International Finance, Brandeis University, USA

Gregory C. Chow
Department of Economics, Princeton University, Princeton, NJ, USA

June Dong
Department of General Business and Finance, School of Management, University of
Massachusetts, Amherst, MA, USA

Omar Drissi-Kaitouni
Centre de Recherche sur les Transports, University of Montreal, Canada

Michael Florian
Centre de Recherche sur les Transports, University of Montreal, Montreal, Canada

A. Ronald Gallant
North Carolina State University, Department of Statistics, Raleigh, NC, USA

William L. Goffe
Department of Economics and Finance, Cameron School of Business, University of
North Carolina, Wilmington, NC, USA

A.J. Hughes Hallett


Department of Economics, University of Strathclyde, Glasgow, United Kingdom

Charles Hallahan
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA

Robert Hussey
Department of Economics, Loyola University of Chicago, Chicago, IL, USA

David Kendrick
Department of Economics, University of Texas, Austin, TX, USA

Yue Ma
Department of Economics, University of Strathclyde, Glasgow, United Kingdom

Anna Nagurney
Department of General Business and Finance, School of Management, University of
Massachusetts, Amherst, MA, USA

Alfred L. Norman
Department of Economics, University of Texas, Austin, TX, USA

Albert J. Reed
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA

Berç Rustem
Department of Computing, Imperial College, London

Agapi Somwaru
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA

George E. Tauchen
Department of Economics, Duke University, Durham, NC, USA

Utpal Vasavada
Economic Research Service, U.S. Department of Agriculture, Washington D.C., USA
Index

Abel, A. 76 E-approximations 102


active learning 75,76,81 average 94
adjustment costs 60,207,208,214 classes, NP 95
agency costs 225 classes, NP-complete 96, 97, 98, 100
algorithm 71,90,93 classes, P 95
complete 97 combinatorial 92
idealized 94 computational 100
rational 97 consumer theory 103-104
Amman, H. 80, 83, 84 discrete-time, stationary, infinite
anti-takeover defenses 219 horizon control
Aoki, M. 76 infinite complexity axiom 103
ARCH 3 information-based 92
rational expectations equi-
Bar-Shalom, Y. 83, 84 libria 104-105
basis function 137 risk and uncertainty (Knight) 102
Bayes Theorem 51 strategic 100
Bayesian bootstrap 51, 62 theory of money 105-106
Bellman 65, 66 theory of the firm 100-103
Bellman's equation 49 worst-case 94
Belsley, D. 44 computation of equilibrium 190
beta densities 30 conditional test 38, 39
beta distribution 35, 40, 43 control theory 76
biases 23 control variables 65,68,69,71
bootstrap 46 convexity 207,208,213,216
bootstrap pivot 53 coordinate ascent method 179
boundary conditions 45,51,52 corporate behavior 219
costs of adjustment 55
C77 Cramer-Rao lower bound 35
C++ 154,167,171
central limit theorem 38, 39 Daubechies, I. 137
Cholesky factorization 207 Deaton, A. 23,26,27
Chow, G. 65,68,69,76 decidability 92
classified boards 220 decomposition algorithm 195
coarse grained parallelization 173 demand theory 66
implementation 181 development systems 167
competing scenarios 110 DFP 17
competitive multi-sector 204 dilation equation 138
complexity discrete Fourier decomposition 140


Drud's CONOPT 77 gradient search methods 5,17,18


dual approach III Gregory, A. 27
dual control 75
DUAL 80,83 Haar wavelet 140
Duffie 23,25 Hallett, A. 26
dynamic programming 45, 62, 68 Hansen, L. 23, 24
Hatheway, L. 80
econometric model 109
efficiency 36 impulse response 62
encapsulation 157 inheritance 159, 166
Euler equations 4,5,7, 11, 15,47,48,49, iterations 71
51,52,56,57 iterative least squares 207,212,213
exact equilibration 195
JBES Symposium 17
fair price amendments 220
Fair, R. 77,78 Kendall, M. 42
Ferrier 71 Kendrick,D.80,83,84
financial equilibrium 190 King, R. 68
financial leverage 226
finite sample properties 23 Lagrangian
flexible accelerator 207,208,210 augmentation 113, 114, 116
flow-of-funds accounts data 200 function 66, 114
forecast pooling 110 multipliers 65,71
Fortran 152 Laroque, G. 23,26,27
Fourier methods 137,143 learn 75
function, recursive 91,105 likelihood function 53,71
Livesey, D. 76
game theory 96-100 loss function 57
automation 96-100 Lucas critique 46
finite state 96 Lumsdaine, R. 44
majority voting 98
Nash equilibrium 98 MacRae, E. 76, 83, 84
payoff matrix 98 macroeconomic time series 137
prisoner's dilemma 98 managerial entrenchment 219
repeated 98 Markov processes 56
stage 98 MatClass 160, 167
strategy, constant defect 99 MATLAB 154
strategy, tit-for-tat 99 matrix balancing 173
zero-sum 99 Matulka, J. 80
gamma densities 30 maximum likelihood 27, 69
gamma distribution 34, 38, 43 measurement error 82
GAMS 77,78 method of simulated moments 25
GARCH 3 min-max problem 110,127
GAUSS 77,81, 153, 167 min-max strategy 109
generalized method of moments MINOS 77
(GMM) 4, 16, 17, 24, 34, 36,42, Mizrach, B. 83
48,52,53,56 modified projection method 195
Goffe, W. 71 Monte Carlo 23,27,31
goodness of fit 42 mother functions 138
GQOPT 19 multi-instrument financial model 204
gradient projection method 174 Multiple Instruction Multiple Data

(MIMD) 173 QLP 77


quadratic programming 127
Neck, R. 80 quadratic subproblem 115, 127
network subproblems 195 qualitative factors 219
Newey, W. 26
non-normality 23 random walk 68
nonconvexities 84, 85 RAS dual algorithm 173
nonlinear programming 111, 114 Rational Expectations Hypothesis
nonlinear rational expectations 8, 12 (REH) 48
nonlinear structural models 17 rationality
nonparametric structural estimator 5,9, bounded 90, 96
10,20 procedural 89
nonseparability 6 substantive 89
normal distribution 30, 34, 36, 43 RATS 77
Norman, A. 83 Rebelo, S. 68
Norman, M. 83 returns-to-scale 55
NPSOL 17 Riccati equations 50,52,53,57,68,80
rival models 109, 110
Object-Oriented Programming Systems robust policy 109, 111, 113
(OOPS) 157 Rogers, J. 71
optimal control 65,66,69,71
optimal policy 109, 110 scaling functions 138
optimization 207,211,216 Seemingly Unrelated Regressions
ordered probit 219 (SUR) 52,53
ordinal measure 219 seminonparametric models 3
sequential quadratic programming 109
Palash, C. 83 Simon, H. 76
parallel computing 173 simplex method 5,18,20
parameterized expectations 4,5,8, 10, 11, simulated annealing 5, 18,71
14,20 simulated method of moments 4, 9
Parasuk, C. 77 Singleton 23, 25
Park, I-S. 78 small sample properties 23
passive learning 76, 78 Smith, G. 23,25,27,44
penalty parameter 114,116,117 SNP 9, 10, 15, 16
perfect foresight 51, 61 Solow, R. 69
Pethe, A. 75 Spencer, M. 23, 25
Pindyck, R. 76, 77 State updates 28
pivot, theoretical 53 state variables 65,66,68,69,71
Plosser, C. 68 statistical estimation 69, 71
poison pills 220 stepsize strategy 116
portfolio optimization 204 Stuart, A. 42
posterior distribution 53, 57 stochastic regulator 45, 46, 53
Powell, J. 44 structural economic model 8, 15
Prescott, E. 76 structural equilibrium model 3
price controls 189
prior density 51,52,53 Tauchen, G. 27
production function 68 taxes 189
program Theil, H. 76
ND-SAL(N) 95 time separability 6
SAL(N) 91 Tobin's q 226
projected gradient algorithm 179 transactions costs 4, 6, 7

Tse, E. 84 variational inequalities 189


Tucci, M. 83 vector autoregression models 45
Turing machine 90 virtual method 159
Turnovsky, S. 77
type II error 42 Watson, M. 68
weighting matrix 25
unconstrained problem 127 West, K. 26
utility function 68 worst-case design 109,110

value function 65,66
