
Lecture Notes 1

Mathematical Economics

Guoqiang TIAN
Department of Economics
Texas A&M University
College Station, Texas 77843
(gtian@tamu.edu)

This version: August, 2022


1 These notes draw heavily upon Chiang's classic textbook Fundamental Methods of Mathematical Economics and Vinogradov's notes A Cook-Book of Mathematics; they are compiled for my teaching and for the convenience of my students in class. Please do not distribute them to others.
Contents

1 The Nature of Mathematical Economics
1.1 Economics and Mathematical Economics
1.2 Advantages of Mathematical Approach
1.3 Scientific Analytic Methods: Three Dimensions and Six Natures

2 Economic Models
2.1 Ingredients of a Mathematical Model
2.2 The Real-Number System
2.3 The Concept of Sets
2.4 Relations and Functions
2.5 Types of Function
2.6 Functions of Two or More Independent Variables
2.7 Levels of Generality

3 Equilibrium Analysis in Economics
3.1 The Meaning of Equilibrium
3.2 Partial Market Equilibrium - A Linear Model
3.3 Partial Market Equilibrium - A Nonlinear Model
3.4 General Market Equilibrium
3.5 Equilibrium in National-Income Analysis

4 Linear Models and Matrix Algebra
4.1 Matrices and Vectors
4.2 Matrix Operations
4.3 Linear Dependence of Vectors
4.4 Commutative, Associative, and Distributive Laws
4.5 Identity Matrices and Null Matrices
4.6 Transposes and Inverses

5 Linear Models and Matrix Algebra (Continued)
5.1 Conditions for Nonsingularity of a Matrix
5.2 Test of Nonsingularity by Use of Determinant
5.3 Basic Properties of Determinants
5.4 Finding the Inverse Matrix
5.5 Cramer's Rule
5.6 Application to Market and National-Income Models
5.7 Quadratic Forms
5.8 Eigenvalues and Eigenvectors
5.9 Vector Spaces

6 Comparative Statics and the Concept of Derivative
6.1 The Nature of Comparative Statics
6.2 Rate of Change and the Derivative
6.3 The Derivative and the Slope of a Curve
6.4 The Concept of Limit
6.5 Inequality and Absolute Values
6.6 Limit Theorems
6.7 Continuity and Differentiability of a Function

7 Rules of Differentiation and Their Use in Comparative Statics
7.1 Rules of Differentiation for a Function of One Variable
7.2 Rules of Differentiation Involving Two or More Functions of the Same Variable
7.3 Rules of Differentiation Involving Functions of Different Variables
7.4 Integration (The Case of One Variable)
7.5 Partial Differentiation
7.6 Applications to Comparative-Static Analysis
7.7 Note on Jacobian Determinants

8 Comparative-Static Analysis of General-Functions
8.1 Differentials
8.2 Total Differentials
8.3 Rule of Differentials
8.4 Total Derivatives
8.5 Implicit Function Theorem
8.6 Comparative Statics of General-Function Models
8.7 Matrix Derivatives

9 Optimization: Maxima and Minima of a Function of One Variable
9.1 Optimal Values and Extreme Values
9.2 Existence of Extremum for Continuous Function
9.3 First-Derivative Test for Relative Maximum and Minimum
9.4 Second and Higher Derivatives
9.5 Second-Derivative Test
9.6 Taylor Series
9.7 Nth-Derivative Test

10 Exponential and Logarithmic Functions
10.1 The Nature of Exponential Functions
10.2 Logarithmic Functions
10.3 Derivatives of Exponential and Logarithmic Functions

11 Optimization: Maxima and Minima of a Function of Two or More Variables
11.1 The Differential Version of Optimization Condition
11.2 Extreme Values of a Function of Two Variables
11.3 Objective Functions with More than Two Variables
11.4 Second-Order Conditions in Relation to Concavity and Convexity
11.4.1 Concavity and Convexity
11.4.2 Concavity/Convexity and Global Optimization
11.5 Economic Applications

12 Optimization with Equality Constraints
12.1 Effects of a Constraint
12.2 Finding the Stationary Values
12.3 Second-Order Conditions
12.4 General Setup of the Problem
12.5 Quasiconcavity and Quasiconvexity
12.6 Utility Maximization and Consumer Demand

13 Optimization with Inequality Constraints
13.1 Non-Linear Programming
13.2 Kuhn-Tucker Conditions
13.3 Economic Applications

14 Differential Equations
14.1 Existence and Uniqueness Theorem of Solutions for Ordinary Differential Equations
14.2 Some Common Ordinary Differential Equations with Explicit Solutions
14.3 Higher Order Linear Equations with Constant Coefficients
14.4 System of Ordinary Differential Equations
14.5 Simultaneous Differential Equations and Stability of Equilibrium
14.6 The Stability of Dynamical System

15 Difference Equations
15.1 First-order Difference Equations
15.2 Second-order Difference Equation
15.3 Difference Equations of Order n
15.4 The Stability of nth-Order Difference Equations
15.5 Difference Equations with Constant Coefficients

Chapter 1

The Nature of Mathematical Economics

The purpose of this course is to introduce the most fundamental aspects of mathematical knowledge and methods, mainly matrix algebra, mathematical analysis, and optimization theory, which constitute indispensable tools of modern economics.

1.1 Economics and Mathematical Economics

Economics is a social science that studies how to make decisions in the face of scarce resources. Specifically, it studies individuals' economic behavior and phenomena, as well as how individuals, such as consumers, households, firms, organizations, and government agencies, make trade-off choices that allocate limited resources among competing uses.
Mathematical economics is a mathematical approach to economic analysis, in which economists make use of mathematical symbols in the statement of the problem and also draw upon known mathematical theorems to aid in reasoning.


Since mathematical economics is merely an approach to economic analysis, it should not and does not differ from the nonmathematical approach to economic analysis in any fundamental way. The difference between the two approaches is that in the former, the assumptions and conclusions are stated in mathematical symbols rather than words, and in equations rather than sentences, so that the interdependent relationships among economic variables and the resulting conclusions are more rigorous and concise, by use of mathematical models and mathematical statistics/econometric methods.

1.2 Advantages of Mathematical Approach


The mathematical approach has the following advantages:

(1) It makes the language more precise and the statement of assumptions clearer, which can reduce unnecessary debates resulting from inaccurate verbal language.

(2) It makes the analytical logic more rigorous, and clearly states the boundary, applicable scope, and conditions for a conclusion to hold. Otherwise, the abuse of a theory may occur.

(3) Mathematics can help obtain results that cannot be easily attained through intuition.

(4) It helps us improve and extend existing economic theories.

It is, however, noteworthy that a good mastery of mathematics does not guarantee being a good economist. Becoming one also requires fully understanding the analytical framework and research methodologies of economics, and having good intuition about and insight into real economic environments and economic issues. The study of economics not only calls for understanding various economic terms, concepts, and results from the perspective of mathematics (including geometry); more importantly, even when those are given in mathematical language or geometric figures, we need to get to their economic meaning and the underlying profound economic thoughts and ideas. Thus we should avoid being confused by mathematical formulas or symbols in the study of economics.

1.3 Scientific Analytic Methods: Three Dimensions and Six Natures

Scientific economic analysis, especially analysis aimed at studying and solving major practical problems affecting the overall situation, is inseparable from the "three dimensions and six natures":

Three dimensions: theoretical logic, practical knowledge, and historical perspective;

Six natures: scientific, rigorous, realistic, pertinent, forward-looking, and thought-provoking.

Studying, and especially solving, major social economic problems needs not only theoretical logic analysis and empirical tests, but also the confirmation of historical experiences.
Theory and practice alone are not enough, as they cause shortsightedness. The short-term optimum is not necessarily the long-term optimum. We thus need historical comparisons, from a wide field of vision and angle of view, for drawing experience and lessons. Of course, if we merely rely on historical experience, our cognition will be deficient and stuck in the mud; it will be difficult to create new ideas and innovations, and economic and social development will be hindered.

Therefore, only through the three dimensions of "theoretical logic, practical knowledge, and historical vision" can we guarantee that a solution or reform measure satisfies the "six natures". Indeed, all knowledge is presented as history, all science is exhibited as logic, and all judgment is understood in the sense of statistics.
As such, it is not surprising that mathematics and mathematical statistics/econometrics are used as the basic and most important analytical tools in every field of economics. For those who study economics and conduct research, it is necessary to grasp enough knowledge of mathematics and mathematical statistics. Therefore, it is of great necessity to master sufficient mathematical knowledge if you want to learn economics well, conduct economic research, and become a good economist.
All in all, to become a good economist, you need to have an original, creative, and academically critical way of thinking.
Chapter 2

Economic Models

2.1 Ingredients of a Mathematical Model

An economic model is merely a theoretical framework, and there is no inherent reason why it must be mathematical. If the model is mathematical, however, it will usually consist of a set of equations designed to describe the structure of the model. By relating a number of variables to one another in certain ways, these equations give mathematical form to the set of analytical assumptions adopted. Then, through application of the relevant mathematical operations to these equations, we may seek to derive a set of conclusions which logically follow from those assumptions.

2.2 The Real-Number System

Whole numbers such as 1, 2, · · · are called positive integers; these are the numbers most frequently used in counting. Their negative counterparts −1, −2, −3, · · · are called negative integers. The number 0 (zero), on the other hand, is neither positive nor negative, and it is in that sense unique. Let us lump all the positive and negative integers and the number zero into a single category, referring to them collectively as the set of all integers.


Integers, of course, do not exhaust all the possible numbers, for we have fractions, such as 2/3, 5/4, and 3/7, which – if placed on a ruler – would fall between the integers. Also, we have negative fractions, such as −1/2 and −5/2. Together, these make up the set of all fractions.
The common property of all fractional numbers is that each is expressible as a ratio of two integers; thus fractions qualify for the designation rational numbers (in this usage, rational means ratio-nal). But integers are also rational, because any integer n can be considered as the ratio n/1. The set of all fractions together with the set of all integers form the set of all rational numbers.
Once the notion of rational numbers is used, however, there naturally arises the concept of irrational numbers – numbers that cannot be expressed as ratios of a pair of integers. One example is √2 = 1.4142 · · · . Another is π = 3.1415 · · · .
Each irrational number, if placed on a ruler, would fall between two rational numbers, so that, just as the fractions fill in the gaps between the integers on a ruler, the irrational numbers fill in the gaps between the rational numbers. The result of this filling-in process is a continuum of numbers, all of which are so-called "real numbers." This continuum constitutes the set of all real numbers, which is often denoted by the symbol R.

2.3 The Concept of Sets

A set is simply a collection of distinct objects. The objects may be a group of distinct numbers, or something else. Thus, all students enrolled in a particular economics course can be considered a set, just as the three integers 2, 3, and 4 can form a set. The objects in a set are called the elements of the set.

There are two alternative ways of writing a set: by enumeration and by description. If we let S represent the set of three numbers 2, 3 and 4, we write, by enumeration of the elements, S = {2, 3, 4}. But if we let I denote the set of all positive integers, enumeration becomes difficult, and we may instead describe the elements and write I = {x | x is a positive integer}, which is read as follows: "I is the set of all x such that x is a positive integer." Note that the braces are used to enclose the set in both cases. In the descriptive approach, a vertical bar or a colon is always inserted to separate the general symbol for the elements from the description of the elements.

A set with a finite number of elements is called a finite set. The set I above, with an infinite number of elements, is an example of an infinite set. Finite sets are always denumerable (or countable), i.e., their elements can be counted one by one in the sequence 1, 2, 3, · · · . Infinite sets may, however, be either denumerable (set I above) or nondenumerable (for example, J = {x | 2 < x < 5}).

Membership in a set is indicated by the symbol ∈ (a variant of the


Greek letter epsilon ϵ for “element"), which is read: “is an element of."

If two sets S1 and S2 happen to contain identical elements,

S1 = {1, 2, a, b} and S2 = {2, b, 1, a}

then S1 and S2 are said to be equal (S1 = S2 ). Note that the order of
appearance of the elements in a set is immaterial.

If we have two sets T = {1, 2, 5, 7, 9} and S = {2, 5, 9}, then S is a


subset of T , because each element of S is also an element of T . A more
formal statement of this is: S is a subset of T if and only if x ∈ S implies
x ∈ T . We write S ⊆ T or T ⊇ S.

It is possible that two sets happen to be subsets of each other. When this occurs, however, we can be sure that these two sets are equal.
If a set has n elements, a total of 2ⁿ subsets can be formed from those elements. For example, the subsets of {1, 2} are: ∅, {1}, {2} and {1, 2}.
If two sets have no elements in common at all, the two sets are said to
be disjoint.
The union of two sets A and B is a new set containing elements belonging to A, or to B, or to both A and B. The union set is symbolized by A ∪ B (read: "A union B").

Example 2.3.1 If A = {1, 2, 3}, B = {2, 3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}.

The intersection of two sets A and B, on the other hand, is a new


set which contains those elements (and only those elements) belonging
to both A and B. The intersection set is symbolized by A ∩ B (read: “A
intersection B").

Example 2.3.2 If A = {1, 2, 3} and B = {4, 5, 6}, then A ∩ B = ∅; that is, A and B are disjoint.

In a particular context of discussion, if the only numbers used are the


set of the first seven positive integers, we may refer to it as the universal
set U . Then, with a given set, say A = {3, 6, 7}, we can define another set
Ā (read: “the complement of A") as the set that contains all the numbers in
the universal set U which are not in the set A. That is: Ā = {1, 2, 4, 5}.

Example 2.3.3 If U = {5, 6, 7, 8, 9}, A = {6, 5}, then Ā = {7, 8, 9}.

Properties of unions and intersections:

A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
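
These identities are easy to verify on concrete sets. The sketch below (in Python; an illustration added to these notes, with arbitrarily chosen sets) checks the two distributive laws directly.

# Verify the distributive laws of unions and intersections
# on small example sets (any finite sets would do).
A = {1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7}

# A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
assert A | (B & C) == (A | B) & (A | C)

# A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
assert A & (B | C) == (A & B) | (A & C)

print("Both distributive laws hold for these sets.")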

2.4 Relations and Functions


An ordered pair (a, b) is a pair of mathematical objects in which the order of the objects is significant: the ordered pair (a, b) is different from the ordered pair (b, a) unless a = b. In contrast, a set of two elements is an unordered pair: the unordered pair {a, b} equals the unordered pair {b, a}. Similar concepts apply to sets with more than two elements; ordered triples, quadruples, quintuples, etc., are called ordered sets.

Example 2.4.1 To show the age and weight of each student in a class, we
can form ordered pairs (a, w), in which the first element indicates the age
(in years) and the second element indicates the weight (in pounds). Then
(19, 128) and (128, 19) would obviously mean different things.

Suppose, from two given sets, x = {1, 2} and y = {3, 4}, we wish to form all the possible ordered pairs with the first element taken from set x and the second element taken from set y. The result will be the set of four ordered pairs (1,3), (1,4), (2,3), and (2,4). This set is called the Cartesian product, or direct product, of the sets x and y and is denoted by x × y (read "x cross y").
Extending this idea, we may also define the Cartesian product of three
sets x, y, and z as follows:

x × y × z = {(a, b, c)|a ∈ x, b ∈ y, c ∈ z}

which is the set of ordered triples.

Example 2.4.2 If the sets x, y, and z each consist of all the real numbers, the Cartesian product will correspond to the set of all points in a three-dimensional space. This may be denoted by R × R × R, or more simply, R³.
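
As an aside (not part of the original text), the Cartesian product of small finite sets can be enumerated directly; the Python sketch below uses itertools.product.

from itertools import product

x = {1, 2}
y = {3, 4}
z = {5, 6}

# x × y: all ordered pairs (a, b) with a in x and b in y.
print(sorted(product(x, y)))      # [(1, 3), (1, 4), (2, 3), (2, 4)]

# x × y × z: all ordered triples (a, b, c).
print(sorted(product(x, y, z)))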

Example 2.4.3 The set {(x, y)|y = 2x} is a set of ordered pairs including, for example, (1,2), (0,0), and (−1,−2). It constitutes a relation, and its graphical counterpart is the set of points lying on the straight line y = 2x.

Example 2.4.4 The set {(x, y)|y ≤ x} is a set of ordered pairs including, for example, (1,0), (0,0), (1,1), and (1,−4). The set corresponds to the set of all points lying on or below the straight line y = x.

As a special case, however, a relation may be such that for each x value
there exists only one corresponding y value. The relation in example 2.4.3
is a case in point. In that case, y is said to be a function of x, and this
is denoted by y = f (x), which is read: “y equals f of x." A function is
therefore a set of ordered pairs with the property that any x value uniquely
determines a y value. It should be clear that a function must be a relation,
but a relation may not be a function.
A function is also called a mapping, or transformation; both words
denote the action of associating one thing with another. In the statement
y = f (x), the functional notation f may thus be interpreted to mean a rule
by which the set x is “mapped" (“transformed") into the set y. Thus we
may write
f :x→y

where the arrow indicates mapping, and the letter f symbolically specifies
a rule of mapping.
In the function y = f (x), x is referred to as the argument of the func-
tion, and y is called the value of the function. We shall also alternatively
refer to x as the independent variable and y as the dependent variable.
The set of all permissible values that x can take in a given context is known
as the domain of the function, which may be a subset of the set of all real
numbers. The y value into which an x value is mapped is called the image
of that x value. The set of all images is called the range of the function,

which is the set of all values that the y variable will take. Thus the domain
pertains to the independent variable x, and the range has to do with the
dependent variable y.

2.5 Types of Function


A function whose range consists of only one element is called a constant
function.

Example 2.5.1 The function y = f (x) = 7 is a constant function.

The constant function is actually a "degenerate" case of what are known as polynomial functions. A polynomial function of a single variable has the general form
the general form

y = a0 + a1 x + a2 x² + · · · + an xⁿ

in which each term contains a coefficient as well as a nonnegative-integer


power of the variable x.
Depending on the value of the integer n (which specifies the highest
power of x), we have several subclasses of polynomial function:

Case of n = 0 : y = a0 [constant function]
Case of n = 1 : y = a0 + a1 x [linear function]
Case of n = 2 : y = a0 + a1 x + a2 x² [quadratic function]
Case of n = 3 : y = a0 + a1 x + a2 x² + a3 x³ [cubic function]

A function such as

y = (x − 1)/(x² + 2x + 4),

in which y is expressed as a ratio of two polynomials in the variable x, is known as a rational function (again, meaning ratio-nal). According to the definition, any polynomial function must itself be a rational function, because it can always be expressed as a ratio to 1, which is a constant function.
Any function expressed in terms of polynomials and/or roots (such as square root) of polynomials is an algebraic function. Accordingly, the functions discussed thus far are all algebraic. A function such as y = √(x² + 1) is not rational, yet it is algebraic.
However, exponential functions such as y = b^x, in which the independent variable appears in the exponent, are nonalgebraic. The closely related logarithmic functions, such as y = log_b x, are also nonalgebraic.

Rules of Exponents:

Rule 1: x^m × x^n = x^(m+n)

Rule 2: x^m / x^n = x^(m−n) (x ̸= 0)

Rule 3: x^(−n) = 1/x^n

Rule 4: x^0 = 1 (x ̸= 0)

Rule 5: x^(1/n) = ⁿ√x

Rule 6: (x^m)^n = x^(mn)

Rule 7: x^m × y^m = (xy)^m

2.6 Functions of Two or More Independent Variables

Thus far, we have considered only functions of a single independent variable, y = f (x). But the concept of a function can be readily extended to the case of two or more independent variables. Given a function

z = g(x, y)

a given pair of x and y values will uniquely determine a value of the dependent variable z. Such a function is exemplified by

z = ax + by or z = a0 + a1 x + a2 x² + b1 y + b2 y²

Functions of more than one variable can be classified into various types, too. For instance, a function of the form

y = a1 x1 + a2 x2 + · · · + an xn

is a linear function, whose characteristic is that every variable is raised to


the first power only. A quadratic function, on the other hand, involves first
and second powers of one or more independent variables, but the sum of
exponents of the variables appearing in any single term must not exceed
two.

Example 2.6.1 y = ax² + bxy + cy² + dx + ey + f is a quadratic function.

2.7 Levels of Generality

In discussing the various types of function, we have, without explicit notice, introduced examples of functions that pertain to varying levels of generality. In certain instances, we have written functions in the form

y = 7, y = 6x + 4, y = x² − 3x + 1 (etc.)
Not only are these expressed in terms of numerical coefficients, but they also indicate specifically whether each function is constant, linear, or quadratic. In terms of graphs, each such function will give rise to a well-defined unique curve. In view of the numerical nature of these functions, the solutions of the model based on them will emerge as numerical values also. The drawback is that, if we wish to know how our analytical conclusion will change when a different set of numerical coefficients comes into effect, we must go through the reasoning process afresh each time. Thus, the results obtained from specific functions have very little generality.
On a more general level of discussion and analysis, there are functions
in the form
y = a, y = bx + a, y = cx² + bx + a (etc.)

Since parameters are used, each function represents not a single curve but
a whole family of curves. With parametric functions, the outcome of math-
ematical operations will also be in terms of parameters. These results are
more general.
In order to attain an even higher level of generality, we may resort to the general function statement y = f (x), or z = g(x, y). When expressed in this form, the function is not restricted to being linear, quadratic, exponential, or trigonometric – all of which are subsumed under the notation. The analytical result based on such a general formulation will therefore have the most general applicability.
Chapter 3

Equilibrium Analysis in Economics

3.1 The Meaning of Equilibrium

Like any economic term, equilibrium can be defined in various ways. One definition here is that an equilibrium for a specific model is a situation where there is no tendency to change. More generally, it means that from an available set of choices (options), one chooses the "best" one according to a certain criterion. It is for this reason that the analysis of equilibrium is referred to as statics. The fact that an equilibrium implies no tendency to change may tempt one to conclude that an equilibrium necessarily constitutes a desirable or ideal state of affairs; but an equilibrium is merely a state with no tendency to change, which may or may not be desirable.
This chapter provides two typical examples of equilibrium analysis. One, from microeconomics, is the equilibrium attained by a market under given demand and supply conditions. The other, from macroeconomics, is the equilibrium of the Keynesian national-income model under given conditions of consumption and investment patterns. We will use these two models as running examples throughout the course.


3.2 Partial Market Equilibrium - A Linear Model

In a static-equilibrium model, the standard problem is that of finding the set of values of the endogenous variables which will satisfy the equilibrium conditions of the model.

Partial-Equilibrium Market Model

The partial-equilibrium market model is a model of price determination in an isolated market for a commodity.
Three variables:

Qd = the quantity demanded of the commodity;

Qs = the quantity supplied of the commodity;

P = the price of the commodity.

The Equilibrium Condition: Qd = Qs .


The model consists of equilibrium condition, demand function, and
supply function:

Qd = Qs ,
Qd = a − bP (a, b > 0),
Qs = −c + dP (c, d > 0),

−b is the slope of Qd , a is the vertical intercept of Qd , d is the slope of Qs ,


and −c is the vertical intercept of Qs .
Note that, contrary to the usual practice, quantity rather than price has
been plotted vertically in the figure.
One way of finding the equilibrium is by successive elimination of vari-
ables and equations through substitution.

Figure 3.1: The linear model and its market equilibrium.

From Qs = Qd , we have

a − bP = −c + dP

and thus
(b + d)P = a + c.

Since b + d ̸= 0, the equilibrium price is

P̄ = (a + c)/(b + d).

The equilibrium quantity can be obtained by substituting P̄ into either Qs or Qd :

Q̄ = (ad − bc)/(b + d).

Since the denominator (b + d) is positive, the positivity of Q̄ requires


that the numerator (ad − bc) > 0. Thus, to be economically meaningful,
the model should contain the additional restriction that ad > bc.
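
The solution formulas can be coded directly. The sketch below is an illustration added to these notes; the parameter values are hypothetical, chosen to satisfy a, b, c, d > 0 and ad > bc.

def partial_equilibrium(a, b, c, d):
    """Equilibrium of Qd = a - b*P and Qs = -c + d*P."""
    P = (a + c) / (b + d)           # equilibrium price
    Q = (a * d - b * c) / (b + d)   # equilibrium quantity
    return P, Q

# Hypothetical parameters with a, b, c, d > 0 and ad > bc:
P_bar, Q_bar = partial_equilibrium(a=24, b=2, c=3, d=1)
print(P_bar, Q_bar)                 # 9.0 6.0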

3.3 Partial Market Equilibrium - A Nonlinear Model

The partial market model can be nonlinear. For example, suppose the
model is given by

Qd = Qs (equilibrium condition);
Qd = 4 − P²;
Qs = 4P − 1.

As previously stated, this system of three equations can be reduced to


a single equation by substitution.

4 − P² = 4P − 1,

or

P² + 4P − 5 = 0,

which is a quadratic equation. In general, given a quadratic equation in the form

ax² + bx + c = 0 (a ̸= 0),

its two roots can be obtained from the quadratic formula:

x̄₁, x̄₂ = (−b ± √(b² − 4ac)) / (2a),

where the "+" part of the "±" sign yields x̄₁ and the "−" part yields x̄₂. Thus, applying the quadratic formula to P² + 4P − 5 = 0, we have P̄₁ = 1 and P̄₂ = −5, but only the first is economically admissible, as negative prices are ruled out.
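
For a numerical check (an illustration, not part of the original notes), the two roots can also be computed with numpy.

import numpy as np

# Coefficients of P^2 + 4P - 5 = 0, highest power first.
roots = np.roots([1, 4, -5])
print(roots)                        # the two roots, -5 and 1

# Keep only the economically admissible (nonnegative) price.
P_bar = max(roots)
print(P_bar, 4 - P_bar**2)          # price 1.0, quantity 3.0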

The Graphical Solution

Figure 3.2: The nonlinear model and its market equilibrium.

In general, however, the market model can be highly nonlinear, and one may not be able to find an explicit solution. Then we would like to know if there exists an implicit solution. We will answer this question using the Implicit-Function Theorem in Chapter 8.

3.4 General Market Equilibrium


In the above, we have discussed the equilibrium of an isolated market, wherein the Qd and Qs of a commodity are functions of the price of that commodity alone. In practice, there would normally exist many substitutes and complementary goods. Thus a more realistic model for the demand and supply functions of a commodity should take into account the effects not only of the price of the commodity itself but also of the prices of other commodities. As a result, the price and quantity variables of multiple commodities must enter endogenously into the model. Thus, when several interdependent commodities are simultaneously considered, equilibrium would require the absence of excess demand, which is the difference between demand and supply, for each and every commodity included in the model. Consequently, the equilibrium condition of an n-commodity market model will involve n equations, one for each commodity, in the form

Ei = Qdi − Qsi = 0 (i = 1, 2, · · · , n),

where Qdi = Qdi (P1 , P2 , · · · , Pn ) and Qsi = Qsi (P1 , P2 , · · · , Pn ) are the demand and supply functions of commodity i, and (P1 , P2 , · · · , Pn ) are the prices of the commodities.
Thus, solving n equations for P = (P1 , P2 , · · · , Pn ):

Ei (P1 , P2 , · · · , Pn ) = 0,

we obtain the n equilibrium prices P̄i – if a solution does indeed exist. And
then the Q̄i may be derived from the demand or supply functions.

Two-Commodity Market Model

To illustrate the problem, let us consider a two-commodity market model


with linear demand and supply functions. In parametric terms, such a
model can be written as

Qd1 − Qs1 = 0 (equilibrium condition for commodity 1);

Qd1 = a0 + a1 P1 + a2 P2 (demand function for commodity 1);

Qs1 = b0 + b1 P1 + b2 P2 (supply function for commodity 1);

Qd2 − Qs2 = 0 (equilibrium condition for commodity 2);

Qd2 = α0 + α1 P1 + α2 P2 (demand function for commodity 2);

Qs2 = β0 + β1 P1 + β2 P2 (supply function for commodity 2).

By substituting the second and third equations into the first, and the fifth and sixth equations into the fourth, the model is reduced to two equations in two variables:

(a0 − b0 ) + (a1 − b1 )P1 + (a2 − b2 )P2 = 0,

(α0 − β0 ) + (α1 − β1 )P1 + (α2 − β2 )P2 = 0.

If we let
ci = ai − bi (i = 0, 1, 2),

γi = αi − βi (i = 0, 1, 2),

the above two linear equations can be written as

c1 P1 + c2 P2 = −c0 ;

γ1 P1 + γ2 P2 = −γ0 ,

which can be solved by further elimination of variables.

The solutions are

P̄1 = (c2 γ0 − c0 γ2 )/(c1 γ2 − c2 γ1 );

P̄2 = (c0 γ1 − c1 γ0 )/(c1 γ2 − c2 γ1 ).

For these two values to make sense, certain restrictions should be imposed on the model. First, we require the common denominator c1 γ2 − c2 γ1 ̸= 0. Second, to assure positivity, each numerator must have the same sign as the denominator.

Numerical Example

Suppose that the demand and supply functions are numerically as follows:

Qd1 = 10 − 2P1 + P2 ;

Qs1 = −2 + 3P1 ;

Qd2 = 15 + P1 − P2 ;

Qs2 = −1 + 2P2 .

By substitution, we have

5P1 − P2 = 12;

−P1 + 3P2 = 16,

which are two linear equations. The solutions for the equilibrium prices
and quantities are P̄1 = 52/14, P̄2 = 92/14, Q̄1 = 64/7, Q̄2 = 85/7.
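
These values can be verified by solving the reduced 2 × 2 system numerically; the sketch below (illustrative, using numpy) does so and recovers the quantities from the supply functions.

import numpy as np

# Reduced system:  5*P1 - P2 = 12,   -P1 + 3*P2 = 16.
A = np.array([[5.0, -1.0],
              [-1.0, 3.0]])
d = np.array([12.0, 16.0])

P1, P2 = np.linalg.solve(A, d)
print(P1, P2)            # 52/14 and 92/14

# Equilibrium quantities from the supply functions:
print(-2 + 3 * P1)       # Q1 = 64/7
print(-1 + 2 * P2)       # Q2 = 85/7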
Similarly, for the n-commodity market model, when the demand and supply functions are linear in prices, we obtain n linear equations. In the above, we assumed that a system with an equal number of equations and unknowns has a unique solution. However, some very simple examples should convince us that an equal number of equations and unknowns does not necessarily guarantee the existence of a unique solution.
For the two linear equations

x + y = 8,
x + y = 9,

we can easily see that there is no solution.


The second example shows a system with an infinite number of solutions:

2x + y = 12,
4x + 2y = 24.

These two equations are functionally dependent, which means that one can be derived from the other. Consequently, one equation is redundant and may be dropped from the system. Any pair (x̄, ȳ) is a solution as long as it satisfies y = 12 − 2x.
Now consider the case of more equations than unknowns. In general, there is no solution. But when the number of unknowns equals the number of functionally independent equations, the solution exists and is unique. The following example shows this fact.

2x + 3y = 58;

y = 18;

x + y = 20.

Thus, for a simultaneous-equation model, we need systematic methods of testing the existence of a unique (or determinate) solution. These are our tasks in the following chapters.

3.5 Equilibrium in National-Income Analysis

The equilibrium analysis can also be applied to other areas of economics. As a simple example, we may cite the familiar Keynesian national-income model,
Y = C + I0 + G0 (equilibrium condition);

C = a + bY (consumption function),

where Y and C stand for the endogenous variables national income and consumption expenditure, respectively, and I0 and G0 represent the exogenously determined investment and government expenditures, respectively.
Solving these two linear equations, we obtain the equilibrium national
income and consumption expenditure:

Ȳ = (a + I0 + G0 )/(1 − b),

C̄ = (a + b(I0 + G0 ))/(1 − b).
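
A small numerical sketch of these solution formulas (an illustration; the parameter values are hypothetical):

def national_income(a, b, I0, G0):
    """Equilibrium income and consumption for Y = C + I0 + G0, C = a + b*Y."""
    Y = (a + I0 + G0) / (1 - b)          # requires 0 < b < 1
    C = (a + b * (I0 + G0)) / (1 - b)
    return Y, C

Y_bar, C_bar = national_income(a=50, b=0.8, I0=30, G0=20)
print(Y_bar, C_bar)       # 500.0 450.0; note Y_bar = C_bar + I0 + G0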
Chapter 4

Linear Models and Matrix Algebra

From the last chapter we have seen that for the one-commodity partial market equilibrium model, the solutions for P̄ and Q̄ are relatively simple, even though a number of parameters are involved. As more and more commodities are incorporated into the model, such solution formulas quickly become cumbersome and unwieldy. We need new methods suitable for handling a large system of simultaneous equations. Such a method is provided by matrix algebra.

Matrix algebra can enable us to do many things, including: (1) It provides a compact way of writing an equation system, even an extremely large one. (2) It leads to a way of testing the existence of a solution by evaluation of a determinant – a concept closely related to that of a matrix – without actually solving the system. (3) It gives a method of finding that solution if it exists.

Throughout the lecture notes, we will use a bold letter such as a to


denote a vector or a bold capital letter such as A to denote a matrix.


4.1 Matrices and Vectors


In general, a system of m linear equations in n variables (x1 , x2 , · · · , xn ) can be arranged in the following form:

a11 x1 + a12 x2 + · · · + a1n xn = d1 ,
a21 x1 + a22 x2 + · · · + a2n xn = d2 ,
· · ·
am1 x1 + am2 x2 + · · · + amn xn = dm ,          (4.1.1)

where the double-subscripted symbol aij represents the coefficient appearing in the ith equation and attached to the jth variable xj , and di represents the constant term in the ith equation.

Example 4.1.1 The two-commodity linear market model can be written –


after eliminating the quantity variables – as a system of two linear equa-
tions.

c1 P1 + c2 P2 = −c0 ,
γ1 P1 + γ2 P2 = −γ0 .

Matrices as Arrays

There are essentially three types of ingredients in the equation system (4.1.1). The first is the set of coefficients aij ; the second is the set of variables x1 , x2 , · · · , xn ; and the last is the set of constant terms d1 , d2 , · · · , dm . If we arrange the three sets as three rectangular arrays and label them, respectively, by bold A, x, and d, then we have

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      · · ·
      am1  am2  · · ·  amn ],

x = [ x1
      x2
      · · ·
      xn ],

d = [ d1
      d2
      · · ·
      dm ].          (4.1.2)

Example 4.1.2 Given the linear-equation system:

6x1 + 3x2 + x3 = 22,
x1 + 4x2 − 2x3 = 12,
4x1 − x2 + 5x3 = 10,

we can write

A = [ 6   3   1
      1   4  −2
      4  −1   5 ],

x = [ x1
      x2
      x3 ],

d = [ 22
      12
      10 ].
Each of these three arrays given above constitutes a matrix.

A matrix is defined as a rectangular array of numbers, parameters, or variables. As a shorthand device, the array in matrix A can be written more simply as

A = [aij ]m×n (i = 1, 2, · · · , m; j = 1, 2, · · · , n).

Vectors as Special Matrices

The number of rows and number of columns in a matrix together define


the dimension of the matrix. For instance, A is said to be of dimension
m × n. In the special case where m = n, the matrix is called a square
matrix.

If a matrix contains only one column (row), it is called a column (row)


vector. For notation purposes, a row vector is often distinguished from a
column vector by the use of a primed symbol.

x′ = [x1 , x2 , · · · , xn ].

Remark 4.1.1 A vector is merely an ordered n-tuple and as such it may be


interpreted as a point in an n-dimensional space.

With the matrices defined in (4.1.2), we can express the equation system (4.1.1) simply as

Ax = d.

However, the equation Ax = d prompts at least two questions. How do we multiply two matrices A and x? What is meant by the equality of Ax and d? Since matrices involve whole blocks of numbers, the familiar algebraic operations defined for single numbers are not directly applicable, and there is a need for a new set of operation rules.

4.2 Matrix Operations


The Equality of Two Matrices

A = B if and only if aij = bij for all i = 1, 2, · · · , m, j = 1, 2, · · · , n.

Addition and Subtraction of Matrices

A + B = [aij ] + [bij ] = [aij + bij ],

i.e., the addition of A and B is defined as the addition of each pair of


corresponding elements.

Remark 4.2.1 Two matrices can be added (or equated) if and only if they have the same dimension.

Example 4.2.1
[ 4  9 ]   [ 2  0 ]   [ 6  9 ]
[ 2  1 ] + [ 0  7 ] = [ 2  8 ].

Example 4.2.2
[ a11  a12  a13 ]   [ b11  b12  b13 ]   [ a11 + b11  a12 + b12  a13 + b13 ]
[ a21  a22  a23 ] + [ b21  b22  b23 ] = [ a21 + b21  a22 + b22  a23 + b23 ].

The Subtraction of Matrices:

A − B is defined by
[aij ] − [bij ] = [aij − bij ].

Example 4.2.3
[ 19  3 ]   [ 6  8 ]   [ 13  −5 ]
[  2  0 ] − [ 1  3 ] = [  1  −3 ].

Scalar Multiplication:

λA = λ[aij ] = [λaij ],

i.e., to multiply a matrix by a number is to multiply every element of that


matrix by the given scalar.

Example 4.2.4
7 [ 3  −1 ]   [ 21  −7 ]
  [ 0   5 ] = [  0  35 ].

Example 4.2.5
−1 [ a11  a12  a13 ]   [ −a11  −a12  −a13 ]
   [ a21  a22  a23 ] = [ −a21  −a22  −a23 ].

Multiplication of Matrices:

Given two matrices Am×n and Bp×q , the conformability condition for mul-
tiplication AB is that the column dimension of A must be equal to the row
dimension of B, i.e., the matrix product AB will be defined if and only if
n = p. If defined, the product AB will have the dimension m × q.
The product AB is defined by

AB = C

with cij = ai1 b1j + ai2 b2j + · · · + ain bnj = ∑_{l=1}^{n} ail blj .

Example 4.2.6
[ a11  a12 ] [ b11  b12 ]   [ a11 b11 + a12 b21   a11 b12 + a12 b22 ]
[ a21  a22 ] [ b21  b22 ] = [ a21 b11 + a22 b21   a21 b12 + a22 b22 ].

Example 4.2.7
[ 3  5 ] [ −1  0 ]   [ −3 + 20  35 ]   [ 17  35 ]
[ 4  6 ] [  4  7 ] = [ −4 + 24  42 ] = [ 20  42 ].

Example 4.2.8 Given u′ = [u1 , u2 , · · · , un ] and v ′ = [v1 , v2 , · · · , vn ],

u′ v = u1 v1 + u2 v2 + · · · + un vn = ∑_{i=1}^{n} ui vi .

This can be described using the concept of the inner product of the two vectors u and v:

u · v = u1 v1 + u2 v2 + · · · + un vn = u′ v.

Example 4.2.9 For the linear-equation system (4.1.1), the coefficient matrix and the variable vector are

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      · · ·
      am1  am2  · · ·  amn ]

and x = (x1 , x2 , · · · , xn )′ , and we then have

Ax = [ a11 x1 + a12 x2 + · · · + a1n xn
       a21 x1 + a22 x2 + · · · + a2n xn
       · · ·
       am1 x1 + am2 x2 + · · · + amn xn ].

Thus, the linear-equation system (4.1.1) can indeed be simply written as

Ax = d.
 
3
Example 4.2.10 Given u =   and v ′ = [1, 4, 5], we have
2

   
3 × 1 3 × 4 3 × 5 3 12 15
uv ′ =  = .
2×1 2×4 2×5 2 8 10

It is important to distinguish the meaning of uv ′ (here a 2 × 3 matrix; in general an m × n matrix when u is m × 1 and v is n × 1) and u′ v (a 1 × 1 matrix, i.e., a scalar, defined when u and v have the same dimension).
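
The distinction is easy to see numerically. The sketch below (illustrative; the vector w is introduced here only so that an inner product with u is defined) computes both products with numpy.

import numpy as np

u = np.array([[3], [2]])         # a 2x1 column vector
v = np.array([[1], [4], [5]])    # a 3x1 column vector

print(u @ v.T)                   # uv': a 2x3 matrix (outer product)

w = np.array([[1], [4]])         # a 2x1 vector conformable with u
print(u.T @ w)                   # u'w: a 1x1 matrix, i.e., the scalar 11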

4.3 Linear Dependence of Vectors

Definition 4.3.1 A set of vectors v1 , · · · , vn is said to be linearly dependent if and only if one of them can be expressed as a linear combination of the remaining vectors; otherwise they are linearly independent.

If there are only two vectors, linear dependence means one is propor-
tional to the other.

Example 4.3.1 The three vectors

v1 = (2, 7)′ ,   v2 = (1, 8)′ ,   v3 = (4, 5)′

are linearly dependent since v3 is a linear combination of v1 and v2 :

3v1 − 2v2 = (6, 21)′ − (2, 16)′ = (4, 5)′ = v3 ,

or

3v1 − 2v2 − v3 = 0,

where 0 = (0, 0)′ represents a zero vector.

Example 4.3.2 The three vectors

v1 = (2, 3)′ ,   v2 = (3, 1)′ ,   v3 = (1, 5)′

are linearly dependent since v1 is a linear combination of v2 and v3 :

v1 = (1/2) v2 + (1/2) v3 .

An equivalent definition of linear dependence is: a set of vectors v1 , v2 , · · · , vn is linearly dependent if and only if there exists a set of scalars k1 , k2 , · · · , kn (not all zero) such that

∑_{i=1}^{n} ki vi = 0.          (4.3.2)

Written in matrix form, this is

V k = 0,          (4.3.3)

where V = [v1 , v2 , · · · , vn ] is the matrix whose columns are these vectors, k = (k1 , k2 , · · · , kn )′ , and 0 is the zero vector, i.e., all elements of 0 are zeros.


If this holds only when ki = 0 for all i, these vectors are linearly inde-
34 CHAPTER 4. LINEAR MODELS AND MATRIX ALGEBRA

pendent.
In general, it is hard to check if a set of m−vectors v1 , v2 , · · · , vn is lin-
early dependent by the definition when m is bigger than 2. However, we
can reduce the test by checking if the homogeneous linear-equation sys-
tem (4.3.3) has a zero solution. We will provide a relative simple way to
check this.
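
One such check, anticipating the rank concept of Chapter 5, compares the rank of the matrix whose columns are the given vectors with the number of vectors; the numpy sketch below (an illustration, not from the notes) applies it to Example 4.3.1.

import numpy as np

# Columns are the vectors v1, v2, v3 of Example 4.3.1.
V = np.array([[2, 1, 4],
              [7, 8, 5]])

# If rank < number of columns, V k = 0 has a nonzero solution,
# i.e., the vectors are linearly dependent.
print(np.linalg.matrix_rank(V), V.shape[1])   # 2 < 3: dependent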

4.4 Commutative, Associative, and Distributive Laws

The following basic laws of matrix operations can sometimes significantly simplify the computation of matrices.
The commutative and associative laws of matrix addition can be stated as follows:
Commutative Law:
A + B = B + A.

Proof: A + B = [aij ] +[bij ] = [aij +bij ] = [bij +aij ] = [bij ]+[aij ] = B +A.
Associative Law:

(A + B) + C = A + (B + C).

Proof: (A + B) + C = ([aij ] + [bij ]) + [cij ] = [aij + bij ] + [cij ] = [aij + bij +


cij ] = [aij +(bij +cij )] = [aij ]+([bij +cij ]) = [aij ]+([bij ]+[cij ]) = A+(B +C).

Matrix Multiplication

Matrix multiplication is not commutative, that is,

AB ̸= BA.

Even when AB is defined, BA may not be; but even if both products
are defined, AB = BA may not hold.

     
1 2 0 −1 12 13
Example 4.4.1 Let A =  , B =  . Then AB =  , but
3 4 6 7 24 25
 
−3 −4
BA =  .
27 40

Scalar multiplication of a matrix does obey the commutative law:

kA = Ak,

where k is a scalar.
Associative Law:
(AB)C = A(BC)

provided A is m × n, B is n × p, and C is p × q.
Distributive Law

A(B + C) = AB + AC [premultiplication by A];

(B + C)A = BA + CA [postmultiplication by A].

4.5 Identity Matrices and Null Matrices

Definition 4.5.1 The identity matrix, denoted by I or In (in which n indicates its dimension), is a square matrix with ones in its principal diagonal and zeros everywhere else. That is,

In = [ 1  0  · · ·  0
       0  1  · · ·  0
       · · ·
       0  0  · · ·  1 ].

Fact 1: Given an m × n matrix A, we have

Im A = AIn = A.

Fact 2:
Am×n In Bn×p = (AI)B = AB.

Fact 3:
(In )k = In .

Idempotent Matrices: A matrix A is said to be idempotent if AA = A.


Null Matrices: A null (or zero) matrix, denoted by 0, plays the role of the number 0. A null matrix is simply a matrix whose elements are all zero. Unlike I, the zero matrix is not restricted to being square. Null matrices obey the following rules of operation.

Am×n + 0m×n = Am×n ;

Am×n 0n×p = 0m×p ;

0q×m Am×n = 0q×n .

Remark 4.5.1 (a) CD = CE does not imply D = E. For instance, for


     
2 3 1 1 −2 1
C= , D =  , E =  ,
6 9 1 2 3 2
4.6. TRANSPOSES AND INVERSES 37

we have  
5 8
CD = CE =  ,
15 24

even though D ̸= E.
A question is then: Under what condition does CD = CE imply D = E? We will show that it does if C has an inverse, which we will discuss shortly.
(b) Even if A ̸= 0 and B ̸= 0, we can still have AB = 0. Again, we will see this cannot happen if A or B has an inverse.
   
2 4 −2 4 
Example 4.5.1 A =  , B =  .
1 2 1 −2
We have AB = 0.

4.6 Transposes and Inverses


The transpose of a matrix A is a matrix which is obtained by interchanging
the rows and columns of the matrix A. Formally, we have

Definition 4.6.1 A matrix B = [bij ]n×m is said to be the transpose of A =


[aij ]m×n if aji = bij for all i = 1, · · · , n and j = 1, · · · , m.

Usually transpose is denoted by A′ or AT .


Recipe - How to Find the Transpose of a Matrix:
The transpose A′ of A is obtained by making the columns of A into the
rows of A′ .
 
3 8 −9
Example 4.6.1 For A =  , its transpose is
1 0 4

 
 3 1
 
A′ = 
 8 0
.
 
−9 4

Thus, by definition, if the dimension of a matrix A is m × n, then the


dimension of its transpose A′ must be n × m.

Example 4.6.2 For

D = [ 1  0  4
      0  3  7
      4  7  2 ],

its transpose is

D′ = [ 1  0  4
       0  3  7
       4  7  2 ] = D.

Definition 4.6.2 A matrix A is said to be symmetric if A′ = A.


A matrix A is called anti-symmetric (or skew-symmetric) if A′ = −A.
A matrix A is called orthogonal if A′ A = I.

Properties of Transposes:

a) (A′ )′ = A;

b) (A + B)′ = A′ + B ′ ;

c) (αA)′ = αA′ where α is a real number;

d) (AB)′ = B ′ A′ .

The property d) states that the transpose of a product is the product of the
transposes in reverse order.

Inverses and Their Properties

For a given square matrix A, while its transpose A′ is always derivable, its
inverse matrix may or may not exist.

Definition 4.6.3 A matrix, denoted by A−1 , is the inverse of A if the fol-


lowing conditions are satisfied:

(1) A is a square matrix;

(2) AA−1 = A−1 A = I.

Remark 4.6.1 The following statements are true:

1. Not every square matrix has an inverse. Squareness is a


necessary but not sufficient condition for the existence of
an inverse. If a square matrix A has an inverse, A is said to
be nonsingular. If A possesses no inverse, it is said to be a
singular matrix.

2. If A is nonsingular, then A and A−1 are inverse of each other,


i.e., (A−1 )−1 = A.

3. If A is n × n, then A−1 is also n × n.

4. The inverse of A is unique.

Proof. Let B and C both be inverses of A. Then

B = BI = BAC = IC = C.

5. AA−1 = I implies A−1 A = I. Thus, to check if a matrix A


has an inverse, we only need to check if one of AA−1 = I
and A−1 A = I is satisfied.

Proof. We need to show that if AA−1 = I, and if there is a


matrix B such that BA = I, then B = A−1 . To see this,
postmultiplying both sides of BA = I by A−1 , we have
BAA−1 = A−1 and thus B = A−1 .
   
3 1 1 2 −1
Example 4.6.3 Let A =   and B = 6  . Then
0 2 0 3
      
3 1 2 −1 1 6 1 1 
AB =    =  = .
0 2 0 3 6 6 6 1
40 CHAPTER 4. LINEAR MODELS AND MATRIX ALGEBRA

So B is the inverse of A.

6. Suppose A and B are nonsingular matrices with dimension n × n.


(a) (AB)−1 = B −1 A−1 ;
(b) (A′ )−1 = (A−1 )′ .

Inverse Matrix and Solution of Linear-Equation System

The application of the concept of inverse matrix to the solution of a simul-


taneous linear-equation system is immediate and direct. Consider

Ax = d.

If A is a nonsingular matrix, then premultiplying both sides of Ax = d by A−1 , we have

A−1 Ax = A−1 d.

So, x = A−1 d is the solution of Ax = d, and furthermore, the solution is


unique since A−1 is unique. Methods of testing the existence of the inverse
and of its calculation will be discussed in the next chapter.
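
For the system of Example 4.1.2, this solution method can be sketched numerically (an illustration using numpy, not part of the original notes).

import numpy as np

# Coefficient matrix and constant vector of Example 4.1.2.
A = np.array([[6, 3, 1],
              [1, 4, -2],
              [4, -1, 5]], dtype=float)
d = np.array([22, 12, 10], dtype=float)

x = np.linalg.inv(A) @ d      # x = A^{-1} d
print(x)

# In numerical practice, np.linalg.solve(A, d) is preferred,
# since it avoids forming the inverse explicitly.
print(np.linalg.solve(A, d))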
Chapter 5

Linear Models and Matrix Algebra (Continued)

In Chapter 4, it was shown that a linear-equation system can be written in compact notation. Moreover, such an equation system can be solved by finding the inverse of the coefficient matrix, provided the inverse exists. This chapter studies how to test for the existence of the inverse and how to find that inverse, and consequently gives ways of solving linear-equation systems.

5.1 Conditions for Nonsingularity of a Matrix


As was pointed out earlier, the squareness condition is necessary but not sufficient for the existence of the inverse A−1 of a matrix A. What, then, are the conditions for the existence of the inverse?

Conditions for Nonsingularity

When the squareness condition is already met, a sufficient condition for the nonsingularity of a matrix is that its rows (or, equivalently, its columns) are linearly independent. In fact, squareness and linear independence together constitute the necessary and sufficient condition for nonsingularity.
To see this, write an n × n coefficient matrix A as an ordered set of row vectors:

A = [ a11  a12  · · ·  a1n        [ v1′
      a21  a22  · · ·  a2n    =     v2′
      · · ·                         · · ·
      an1  an2  · · ·  ann ]        vn′ ],
where vi′ = [ai1 , ai2 , · · · , ain ], i = 1, 2, · · · , n. As we discussed in the previous chapter, the rows are linearly independent if and only if, for any set of scalars ki , ∑_{i=1}^{n} ki vi = 0 implies ki = 0 for all i, which is equivalent to the homogeneous linear-equation system

Ak = 0

having only the zero solution k = 0, where k′ = (k1 , k2 , . . . , kn ). This is the case exactly when A has an inverse. Thus, A has an inverse if and only if it satisfies the squareness and linear independence conditions.

Example 5.1.1 The matrix

[ 3  4   5
  0  1   2
  6  8  10 ]

is singular, since v3′ = 2v1′ + 0v2′ .


 
1 2
Example 5.1.2 B =   is nonsingular since their two rows are not
3 4
proportional.
Example 5.1.3 C = [ −2   1
                     6  −3 ]
is singular since its two rows are proportional.

Rank of a Matrix

The above discussion of row independence is stated with regard to square matrices, but it is equally applicable to any m × n rectangular matrix.

Definition 5.1.1 A matrix Am×n is said to be of rank γ if the maximum


number of linearly independent rows that can be found in such a matrix
is γ.

By definition, an n×n nonsingular matrix A has n linearly independent


rows (or columns); consequently it must be of rank n. Conversely, an n × n
matrix having rank n must be nonsingular.

5.2 Test of Nonsingularity by Use of Determinant

To determine whether a square matrix is nonsingular by finding the inverse of the matrix is not an easy job. However, we can use the determinant of the matrix to easily determine whether a square matrix is nonsingular.

Determinant and Nonsingularity

The determinant of a square matrix A, denoted by |A|, is a uniquely defined scalar associated with that matrix. Determinants are defined only for square matrices. For a 2 × 2 matrix

A = [ a11  a12
      a21  a22 ],

its determinant is defined as follows:

|A| = | a11  a12 |  = a11 a22 − a12 a21 .
      | a21  a22 |

In view of the dimension of matrix A, |A| as defined in the above is


called a second-order determinant.

   
10 4 3 5 
Example 5.2.1 Given A =   and B =  , then
8 5 0 −1

10 4
|A| = = 50 − 32 = 18;
8 5

3 5
|B| = = −3 − 5 × 0 = −3.
0 −1

 
2
6
Example 5.2.2 A =  . Then its determinant is
8 24

2 6
|A| = = 2 × 24 − 6 × 8 = 48 − 48 = 0.
8 24

This example shows that the determinant is equal to zero if and only if its rows are linearly dependent. As will be seen, the value of a determinant |A| can serve not only as a criterion for testing the linear independence of the rows (hence the nonsingularity) of matrix A but also as an input in the calculation of the inverse A−1 , if it exists.
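
A numerical sketch of this determinant test (illustrative, using numpy) on the matrices of Examples 5.2.1 and 5.2.2:

import numpy as np

A = np.array([[10, 4], [8, 5]])     # matrix of Example 5.2.1
B = np.array([[2, 6], [8, 24]])     # matrix of Example 5.2.2

print(np.linalg.det(A))   # about 18: nonzero, so A is nonsingular
print(np.linalg.det(B))   # about 0:  its rows are linearly dependent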

Evaluating a Third-Order Determinant

For a 3 × 3 matrix A, its third-order determinant has the value

|A| = | a11  a12  a13 |
      | a21  a22  a23 |
      | a31  a32  a33 |

    = a11 | a22  a23 |  − a12 | a21  a23 |  + a13 | a21  a22 |
          | a32  a33 |        | a31  a33 |        | a31  a32 |

    = a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 .

We can use the following diagram to calculate the third-order determi-


nant.

Figure 5.1: The graphic illustration for calculating the third-order determinant.

Example 5.2.3
$$\begin{vmatrix} 2 & 1 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix} = 2\times5\times9 + 1\times6\times7 + 4\times8\times3 - 3\times5\times7 - 1\times4\times9 - 6\times8\times2 = 90 + 42 + 96 - 105 - 36 - 96 = -9.$$

Example 5.2.4
$$\begin{vmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{vmatrix} = 0\times4\times8 + 1\times5\times6 + 3\times7\times2 - 2\times4\times6 - 1\times3\times8 - 5\times7\times0 = 0 + 30 + 42 - 48 - 24 - 0 = 0.$$

Example 5.2.5
$$\begin{vmatrix} -1 & 2 & 1 \\ 0 & 3 & 2 \\ 1 & 0 & 2 \end{vmatrix} = -1\times3\times2 + 2\times2\times1 + 0\times0\times1 - 1\times3\times1 - 2\times0\times2 - 2\times0\times(-1) = -6 + 4 + 0 - 3 - 0 - 0 = -5.$$

The method of cross-diagonal multiplication provides a handy way of


evaluating a third-order determinant, but unfortunately it is not applicable
to determinants of orders higher than 3. For the latter, we must resort to
the so-called “Laplace expansion” of the determinant.
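For readers who want to experiment, here is a short numpy sketch (not from the notes themselves) of the cross-diagonal method, checked against the built-in determinant on Example 5.2.3.

```python
import numpy as np

def det3_cross_diagonal(A):
    """Third-order determinant via the six cross-diagonal products."""
    return (A[0, 0]*A[1, 1]*A[2, 2] + A[0, 1]*A[1, 2]*A[2, 0] + A[0, 2]*A[1, 0]*A[2, 1]
            - A[0, 2]*A[1, 1]*A[2, 0] - A[0, 1]*A[1, 0]*A[2, 2] - A[0, 0]*A[1, 2]*A[2, 1])

A = np.array([[2, 1, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
print(det3_cross_diagonal(A))   # -9.0, as in Example 5.2.3
print(np.linalg.det(A))         # same value up to rounding
```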

Evaluating an nth-Order Determinant by Laplace Expansion

The minor of the element aij of a determinant |A|, denoted by |Mij |,


can be obtained by deleting the ith row and jth column of the determinant
|A|.
For instance, for a third-order determinant, the minors of a11, a12, and a13 are
$$|M_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad |M_{12}| = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \quad |M_{13}| = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$

A concept closely related to the minor is that of the cofactor. A cofactor, denoted by |Cij|, is a minor with a prescribed algebraic sign attached to it. Formally, it is defined by
$$|C_{ij}| = (-1)^{i+j}|M_{ij}| = \begin{cases} -|M_{ij}| & \text{if } i + j \text{ is odd;} \\ \phantom{-}|M_{ij}| & \text{if } i + j \text{ is even.} \end{cases}$$
Thus, if the sum of the two subscripts i and j in Mij is even, then |Cij| = |Mij|. If it is odd, then |Cij| = −|Mij|.

Using these new concepts, we can express a third-order determinant as
$$|A| = a_{11}|M_{11}| - a_{12}|M_{12}| + a_{13}|M_{13}| = a_{11}|C_{11}| + a_{12}|C_{12}| + a_{13}|C_{13}|.$$

The Laplace expansion of a third-order determinant serves to reduce


the evaluation problem to one of evaluating only certain second-order de-
terminants. In general, the Laplace expansion of an nth-order determinant
will reduce the problem to one of evaluating n cofactors, each of which is
of the (n − 1)th order, and the repeated application of the process will
methodically lead to lower and lower orders of determinants, eventually
culminating in the basic second-order determinants. Then the value of the
original determinant can be easily calculated.

Formally, the value of a determinant |A| of order n can be found by the Laplace expansion of any row or any column as follows:
$$|A| = \sum_{j=1}^n a_{ij}|C_{ij}| \quad \text{[expansion by the } i\text{th row]}$$
$$= \sum_{i=1}^n a_{ij}|C_{ij}| \quad \text{[expansion by the } j\text{th column].}$$

Even though one can expand |A| by any row or any column, as far as numerical calculation is concerned, a row or column with the largest number of 0's or 1's is always preferable for this purpose, because a 0 times its cofactor is simply 0.

Example 5.2.6 For $|A| = \begin{vmatrix} 5 & 6 & 1 \\ 2 & 3 & 0 \\ 7 & -3 & 0 \end{vmatrix}$, the easiest way to expand the determinant is by the third column, which consists of the elements 1, 0, and 0. Thus,
$$|A| = 1 \times (-1)^{1+3}\begin{vmatrix} 2 & 3 \\ 7 & -3 \end{vmatrix} = -6 - 21 = -27.$$

Example 5.2.7
$$\begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 0 & 0 \\ 4 & 0 & 0 & 0 \end{vmatrix} = 1 \times (-1)^{1+4}\begin{vmatrix} 0 & 0 & 2 \\ 0 & 3 & 0 \\ 4 & 0 & 0 \end{vmatrix} = -1 \times (-24) = 24.$$
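The recursive structure of the Laplace expansion translates directly into code. The sketch below (not part of the notes) expands along the first row and recurses down to second-order determinants, exactly as described above.

```python
import numpy as np

def det_laplace(A):
    """Determinant via Laplace expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    if n == 2:
        return A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]
    total = 0.0
    for j in range(n):
        # Minor: delete the first row and the (j+1)th column.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # Cofactor sign (-1)^(1+(j+1)) equals (-1)^j.
        total += (-1)**j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[5, 6, 1], [2, 3, 0], [7, -3, 0]], dtype=float)
print(det_laplace(A))   # -27.0, as in Example 5.2.6
```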

A triangular matrix is a special type of square matrix. A square matrix is called lower triangular if all the entries above the main diagonal are zero. Similarly, a square matrix is called upper triangular if all the entries below the main diagonal are zero.

Example 5.2.8 (Upper Triangular Determinant) This example shows that the value of an upper triangular determinant is the product of all elements on the main diagonal:
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} = a_{11} \times (-1)^{1+1}\begin{vmatrix} a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & a_{33} & \cdots & a_{3n} \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix}$$
$$= a_{11} \times a_{22} \times (-1)^{1+1}\begin{vmatrix} a_{33} & a_{34} & \cdots & a_{3n} \\ 0 & a_{44} & \cdots & a_{4n} \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} = \cdots = a_{11} \times a_{22} \times \cdots \times a_{nn}.$$

5.3 Basic Properties of Determinants


Property I. The determinant of a matrix A has the same value as that of
its transpose A′ , i.e.,
|A| = |A′ |.

Example 5.3.1 For
$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc,$$
we have
$$|A'| = \begin{vmatrix} a & c \\ b & d \end{vmatrix} = ad - bc = |A|.$$

Property II. The interchange of any two rows (or any two columns)
will alter the sign, but not the numerical value of the determinant.

Example 5.3.2 $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$, but the interchange of the two rows yields
$$\begin{vmatrix} c & d \\ a & b \end{vmatrix} = bc - ad = -(ad - bc).$$

Property III. The multiplication of any one row (or one column) by a scalar k will change the value of the determinant k-fold, i.e., for |A|,
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots & \cdots \\ ka_{i1} & ka_{i2} & \cdots & ka_{in} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = k\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = k|A|.$$

In contrast, the factoring of a matrix requires the presence of a common divisor for all its elements, as in
$$k\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} ka_{11} & ka_{12} & \cdots & ka_{1n} \\ ka_{21} & ka_{22} & \cdots & ka_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ ka_{n1} & ka_{n2} & \cdots & ka_{nn} \end{bmatrix}.$$

Property IV. The addition (subtraction) of a multiple of any row (or


column) to (from) another row (or column) will leave the value of the de-
terminant unaltered.

This is an extremely useful property, which can be used to greatly sim-


plify the computation of a determinant.

Example 5.3.3
$$\begin{vmatrix} a & b \\ c + ka & d + kb \end{vmatrix} = a(d + kb) - b(c + ka) = ad - bc = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Example 5.3.4
$$\begin{vmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{vmatrix} = \begin{vmatrix} a+3b & b & b & b \\ a+3b & a & b & b \\ a+3b & b & a & b \\ a+3b & b & b & a \end{vmatrix} = \begin{vmatrix} a+3b & b & b & b \\ 0 & a-b & 0 & 0 \\ 0 & 0 & a-b & 0 \\ 0 & 0 & 0 & a-b \end{vmatrix} = (a+3b)(a-b)^3.$$

The second determinant in the above equation is obtained by adding the second column, the third column, and the fourth column to the first column, respectively. The third determinant is obtained by subtracting the first row from the second row, the third row, and the fourth row of the second determinant, respectively. Since the third determinant is upper triangular, its value is the product of all elements on the main diagonal.

Example 5.3.5 Similarly, we can compute the following example:
$$\begin{vmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 1 & 2 \\ 4 & 1 & 2 & 3 \end{vmatrix} = \begin{vmatrix} 10 & 2 & 3 & 4 \\ 10 & 3 & 4 & 1 \\ 10 & 4 & 1 & 2 \\ 10 & 1 & 2 & 3 \end{vmatrix} = \begin{vmatrix} 10 & 2 & 3 & 4 \\ 0 & 1 & 1 & -3 \\ 0 & 2 & -2 & -2 \\ 0 & -1 & -1 & -1 \end{vmatrix}$$
$$= (-1)^{1+1} \cdot 10\begin{vmatrix} 1 & 1 & -3 \\ 2 & -2 & -2 \\ -1 & -1 & -1 \end{vmatrix} = 10\begin{vmatrix} 1 & 1 & -3 \\ 0 & -4 & 4 \\ 0 & 0 & -4 \end{vmatrix} = 160.$$

Example 5.3.6
$$\begin{vmatrix} -2 & 5 & -1 & 3 \\ 1 & -9 & 13 & 7 \\ 3 & -1 & 5 & -5 \\ 2 & 8 & -7 & -10 \end{vmatrix} = \begin{vmatrix} 0 & -13 & 25 & 17 \\ 1 & -9 & 13 & 7 \\ 0 & 26 & -34 & -26 \\ 0 & 26 & -33 & -24 \end{vmatrix} = (-1)^{2+1}\begin{vmatrix} -13 & 25 & 17 \\ 26 & -34 & -26 \\ 26 & -33 & -24 \end{vmatrix}$$
$$= 13\begin{vmatrix} 1 & 25 & 17 \\ -2 & -34 & -26 \\ -2 & -33 & -24 \end{vmatrix} = 13\begin{vmatrix} 1 & 25 & 17 \\ 0 & 16 & 8 \\ 0 & 17 & 10 \end{vmatrix} = (-1)^{1+1} \cdot 13\begin{vmatrix} 16 & 8 \\ 17 & 10 \end{vmatrix} = 312.$$

The second determinant in the above equation is obtained by adding a multiple 2 of the second row to the first row, a multiple −3 of the second row to the third row, and a multiple −2 of the second row to the fourth row, respectively. The third determinant is obtained by expanding the first column. The fourth determinant is obtained by factoring −13 out of the first column. The fifth determinant is obtained by adding a multiple 2 of the first row to the second row and the third row, respectively. The final equality is obtained by expanding the first column.

Property V. If one row (or column) is a multiple of another row (or


column), the value of the determinant will be zero.

Example 5.3.7
$$\begin{vmatrix} ka & kb \\ a & b \end{vmatrix} = kab - kab = 0.$$

Remark 5.3.1 Property V is a logical consequence of Property IV.

Property VI. If A and B are both square matrices, then |AB| = |A||B|.

The above basic properties are useful in several ways. For one thing,
they can be of great help in simplifying the task of evaluating determi-
nants. By adding (or subtracting) multipliers of one row (or column) from
another, the elements of the determinant may be reduced to much simpler
numbers. If we can indeed apply these properties to transform some row
or column into a form containing mostly 0’s or 1’s, Laplace expansion of
the determinant will become a much more manageable task.
Property VII. $|A^{-1}| = \frac{1}{|A|}$. As a consequence, if $A^{-1}$ exists, we must have $|A| \neq 0$. The converse is also true.
Recipe - How to Calculate the Determinant:

1. The multiplication of any one row (or column) by a scalar k


will change the value of the determinant k-fold.

2. The interchange of any two rows (columns) will change the


sign but not the numerical value of the determinant.

3. If a multiple of any row is added to (or subtracted from)


any other row it will not change the value or the sign of
the determinant. The same holds true for columns (i.e. the
determinant is not affected by linear operations with rows
(or columns)).

4. If two rows (or columns) are proportional, i.e., they are lin-
early dependent, then the determinant will vanish.

5. The determinant of a triangular matrix is a product of its


principal diagonal elements.

Using these rules, we can simplify the matrix (e.g. obtain as many zero
elements as possible) and then apply Laplace expansion.
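These properties are also easy to verify numerically. The following sketch (not from the notes themselves) checks Properties I-IV and VI on randomly generated matrices, assuming numpy is available.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 4)).astype(float)
B = rng.integers(-5, 6, size=(4, 4)).astype(float)

print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))         # Property I
P = A[[1, 0, 2, 3], :]                                          # swap rows 1 and 2
print(np.isclose(np.linalg.det(P), -np.linalg.det(A)))          # Property II
C = A.copy(); C[0] = 3 * C[0]                                   # scale row 1 by 3
print(np.isclose(np.linalg.det(C), 3 * np.linalg.det(A)))       # Property III
D = A.copy(); D[1] += 5 * D[0]                                  # add 5x(row 1) to row 2
print(np.isclose(np.linalg.det(D), np.linalg.det(A)))           # Property IV
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))          # Property VI
```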

Determinantal Criterion for Nonsingularity

Our present concern is primarily to link the linear dependence of


rows with the vanishing of a determinant. By property I, we can easily see
that row independence is equivalent to column independence.

Given a linear-equation system Ax = d, where A is an n × n coefficient


matrix, we have

$$|A| \neq 0 \Leftrightarrow A \text{ is row (or column) independent}$$
$$\Leftrightarrow \text{rank}(A) = n$$
$$\Leftrightarrow A \text{ is nonsingular}$$
$$\Leftrightarrow A^{-1} \text{ exists}$$
$$\Leftrightarrow \text{a unique solution } \bar{x} = A^{-1}d \text{ exists.}$$

Thus the value of the determinant of A provides a convenient criterion


for testing the nonsingularity of matrix A and the existence of a unique
solution to the equation system Ax = d.

Rank of a Matrix Redefined

The rank of a matrix A was earlier defined to be the maximum number


of linearly independent rows in A. In view of the link between row in-
dependence and the nonvanishing of the determinant, we can redefine the
rank of an m × n matrix as the maximum order of a nonvanishing deter-
minant that can be constructed from the rows and columns of that matrix.
The rank of any matrix is a unique number.

Obviously, the rank can at most be m or n for an m × n matrix A,


whichever is smaller, because a determinant is defined only for a square

matrix. Symbolically, this fact can be expressed as follows:

γ(A) ≤ min{m, n}.

Example 5.3.8
$$\gamma\left(\begin{bmatrix} 1 & 3 & 2 \\ 2 & 6 & 4 \\ -5 & 7 & 1 \end{bmatrix}\right) = 2$$
since
$$\begin{vmatrix} 1 & 3 & 2 \\ 2 & 6 & 4 \\ -5 & 7 & 1 \end{vmatrix} = 0 \quad \text{and} \quad \begin{vmatrix} 1 & 3 \\ -5 & 7 \end{vmatrix} = 22 \neq 0.$$
One can also see this because the first two rows are linearly dependent, but the last two are independent; therefore, the maximum number of linearly independent rows is equal to 2.

Properties of the rank:

1) The column rank and the row rank of a matrix are equal.

2) rank(AB) ≤ min{rank(A); rank(B)}.

3) rank(A) = rank(AA′) = rank(A′A); see the numerical check below.
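As a quick numerical confirmation (not part of the notes), numpy's matrix_rank reproduces Example 5.3.8 and illustrates properties 1) and 3):

```python
import numpy as np

A = np.array([[1, 3, 2],
              [2, 6, 4],
              [-5, 7, 1]], dtype=float)

print(np.linalg.matrix_rank(A))        # 2: rows 1 and 2 are proportional
print(np.linalg.matrix_rank(A.T))      # 2: column rank equals row rank
print(np.linalg.matrix_rank(A @ A.T))  # 2: rank(AA') = rank(A)
```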

5.4 Finding the Inverse Matrix

If the matrix A in a linear-equation system Ax = d is nonsingular, then


A−1 exists, and the unique solution of the system will be x̄ = A−1 d. We
have learned to test the nonsingularity of A by the criterion |A| ̸= 0. The
next question is how we can find the inverse A−1 if A does pass that test.

Expansion of a Determinant by Alien Cofactors

We have known that the value of a determinant |A| of order n can be found by the Laplace expansion of any row or any column as follows:
$$|A| = \sum_{j=1}^n a_{ij}|C_{ij}| \quad \text{[expansion by the } i\text{th row]}$$
$$= \sum_{i=1}^n a_{ij}|C_{ij}| \quad \text{[expansion by the } j\text{th column].}$$

Now what happens if we replace one row (or column) by another row (or column), i.e., aij by ai′j for i ≠ i′, or by aij′ for j ≠ j′? Then we have the following important property of determinants.

Property VIII. The expansion of a determinant by alien cofactors (the cofactors of a “wrong” row or column) always yields a value of zero. That is, we have
$$\sum_{j=1}^n a_{i'j}|C_{ij}| = 0 \quad (i \neq i') \quad \text{[expansion by the } i'\text{th row and use of cofactors of the } i\text{th row]}$$
$$\sum_{i=1}^n a_{ij'}|C_{ij}| = 0 \quad (j \neq j') \quad \text{[expansion by the } j'\text{th column and use of cofactors of the } j\text{th column]}$$

The reason for this outcome lies in the fact that the above formula can
be considered as the result of the regular expansion of a matrix that has
two identical rows or columns.

Example 5.4.1 For the determinant
$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$
consider another determinant
$$|A^*| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$
which has two identical rows. If we expand |A*| by the second row, then we have
$$0 = |A^*| = a_{11}|C_{21}| + a_{12}|C_{22}| + a_{13}|C_{23}| = \sum_{j=1}^3 a_{1j}|C_{2j}|.$$

Matrix Inversion

Property VIII provides a way of finding the inverse of a matrix. For an n × n matrix A:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},$$
since each element of A has a cofactor |Cij|, we can form a matrix of cofactors by replacing each element aij with its cofactor |Cij|. Such a cofactor matrix C = [|Cij|] is also n × n. For our present purpose, however, the transpose of C is of more interest. This transpose C′ is commonly referred to as the adjoint of A and is denoted by adj A. That is,
$$C' \equiv \text{adj}\, A \equiv \begin{bmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \cdots & \cdots & \cdots & \cdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{bmatrix}.$$

By utilizing the formula for the Laplace expansion and Property VIII, we have
$$AC' = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \cdots & \cdots & \cdots & \cdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{bmatrix}$$
$$= \begin{bmatrix} \sum_{j=1}^n a_{1j}|C_{1j}| & \sum_{j=1}^n a_{1j}|C_{2j}| & \cdots & \sum_{j=1}^n a_{1j}|C_{nj}| \\ \sum_{j=1}^n a_{2j}|C_{1j}| & \sum_{j=1}^n a_{2j}|C_{2j}| & \cdots & \sum_{j=1}^n a_{2j}|C_{nj}| \\ \cdots & \cdots & \cdots & \cdots \\ \sum_{j=1}^n a_{nj}|C_{1j}| & \sum_{j=1}^n a_{nj}|C_{2j}| & \cdots & \sum_{j=1}^n a_{nj}|C_{nj}| \end{bmatrix} = \begin{bmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & |A| \end{bmatrix} = |A| I_n.$$

Therefore, by the uniqueness of the inverse $A^{-1}$ of A, we know
$$A^{-1} = \frac{C'}{|A|} = \frac{\text{adj}\, A}{|A|}.$$

Now we have found a way to invert the matrix A.

Remark 5.4.1 In summary, the general procedures for finding the inverse of a square matrix A are:

(1) find |A| (if |A| = 0, there is no inverse, stop);

(2) find the cofactors of all elements of A and form C = [|Cij|];

(3) get the transpose of C to have C′;

(4) determine A−1 by $A^{-1} = \frac{1}{|A|}C'$;

(5) verify AA−1 = I (a numerical sketch of this procedure follows below).
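The five-step procedure of Remark 5.4.1 can be implemented directly. The sketch below (not from the notes themselves) builds the cofactor matrix, transposes it to get adj A, divides by |A|, and verifies the result on the matrix B of Example 5.4.3.

```python
import numpy as np

def inverse_via_adjoint(A):
    """Invert A by the adjoint method: A^{-1} = adj(A) / |A|."""
    n = A.shape[0]
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("matrix is singular; no inverse exists")
    C = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor |Cij|
    return C.T / detA                                          # adj A / |A|

B = np.array([[4, 1, -1], [0, 3, 2], [3, 0, 7]], dtype=float)
Binv = inverse_via_adjoint(B)
print(np.allclose(B @ Binv, np.eye(3)))   # step (5): verify B @ Binv = I
```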


In particular, for a 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the cofactor matrix is
$$C = \begin{bmatrix} |C_{11}| & |C_{12}| \\ |C_{21}| & |C_{22}| \end{bmatrix} = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}.$$
Its transpose is
$$C' = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Therefore, the inverse is given by
$$A^{-1} = \frac{\text{adj}\, A}{|A|} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$
which is a very useful formula.

 
3 2
Example 5.4.2 A =  .
1 0

The inverse of A is given by


   
1  0 −2  0 1 
A−1 =  = .
−2 −1 3 1
− 3
2 2

 
4 1 −1
 
Example 5.4.3 Find the inverse of B = 
0 3 2
.
 
3 0 7
Since |B| = 99 ≠ 0, B−1 exists. The cofactor matrix is
$$C = \begin{bmatrix} |C_{11}| & |C_{12}| & |C_{13}| \\ |C_{21}| & |C_{22}| & |C_{23}| \\ |C_{31}| & |C_{32}| & |C_{33}| \end{bmatrix} = \begin{bmatrix} (-1)^{1+1}|M_{11}| & (-1)^{1+2}|M_{12}| & (-1)^{1+3}|M_{13}| \\ (-1)^{2+1}|M_{21}| & (-1)^{2+2}|M_{22}| & (-1)^{2+3}|M_{23}| \\ (-1)^{3+1}|M_{31}| & (-1)^{3+2}|M_{32}| & (-1)^{3+3}|M_{33}| \end{bmatrix}$$
$$= \begin{bmatrix} \begin{vmatrix} 3 & 2 \\ 0 & 7 \end{vmatrix} & -\begin{vmatrix} 0 & 2 \\ 3 & 7 \end{vmatrix} & \begin{vmatrix} 0 & 3 \\ 3 & 0 \end{vmatrix} \\ -\begin{vmatrix} 1 & -1 \\ 0 & 7 \end{vmatrix} & \begin{vmatrix} 4 & -1 \\ 3 & 7 \end{vmatrix} & -\begin{vmatrix} 4 & 1 \\ 3 & 0 \end{vmatrix} \\ \begin{vmatrix} 1 & -1 \\ 3 & 2 \end{vmatrix} & -\begin{vmatrix} 4 & -1 \\ 0 & 2 \end{vmatrix} & \begin{vmatrix} 4 & 1 \\ 0 & 3 \end{vmatrix} \end{bmatrix} = \begin{bmatrix} 21 & 6 & -9 \\ -7 & 31 & 3 \\ 5 & -8 & 12 \end{bmatrix}.$$
Then
$$\text{adj}\, B = C' = \begin{bmatrix} 21 & -7 & 5 \\ 6 & 31 & -8 \\ -9 & 3 & 12 \end{bmatrix}.$$
Therefore, we have
$$B^{-1} = \frac{1}{99}\begin{bmatrix} 21 & -7 & 5 \\ 6 & 31 & -8 \\ -9 & 3 & 12 \end{bmatrix}.$$
Example 5.4.4 $A = \begin{bmatrix} 2 & 4 & 5 \\ 0 & 3 & 0 \\ 1 & 0 & 1 \end{bmatrix}$. We have |A| = −9 and
$$A^{-1} = -\frac{1}{9}\begin{bmatrix} 3 & -4 & -15 \\ 0 & -3 & 0 \\ -3 & 4 & 6 \end{bmatrix}.$$

5.5 Cramer's Rule

The method of matrix inversion just discussed enables us to derive a convenient way of solving a linear-equation system, known as Cramer's rule.

Derivation of Cramer's Rule

Given a linear-equation system Ax = d, the solution can be written as
$$\bar{x} = A^{-1}d = \frac{1}{|A|}(\text{adj}\, A)d,$$
provided A is nonsingular. Thus,
$$\bar{x} = \frac{1}{|A|}\begin{bmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \cdots & \cdots & \cdots & \cdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{bmatrix}\begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \frac{1}{|A|}\begin{bmatrix} \sum_{i=1}^n d_i|C_{i1}| \\ \sum_{i=1}^n d_i|C_{i2}| \\ \vdots \\ \sum_{i=1}^n d_i|C_{in}| \end{bmatrix}.$$

That is, $\bar{x}_j$ is given by
$$\bar{x}_j = \frac{1}{|A|}\sum_{i=1}^n d_i|C_{ij}| = \frac{1}{|A|}\begin{vmatrix} a_{11} & a_{12} & \cdots & d_1 & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & d_2 & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & d_n & \cdots & a_{nn} \end{vmatrix} = \frac{|A_j|}{|A|},$$
where |Aj| is obtained by replacing the jth column of |A| with the constant terms d1, · · · , dn. This result is the statement of Cramer's rule.
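Cramer's rule is equally direct to implement: replace the jth column of A by d and take the ratio of determinants. The following sketch (not part of the notes) reproduces Example 5.5.1, which follows.

```python
import numpy as np

def cramer(A, d):
    """Solve Ax = d via Cramer's rule: x_j = |A_j| / |A|."""
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("|A| = 0: Cramer's rule is not applicable")
    x = np.empty(len(d))
    for j in range(len(d)):
        Aj = A.copy()
        Aj[:, j] = d           # replace jth column with the constant terms
        x[j] = np.linalg.det(Aj) / detA
    return x

A = np.array([[2.0, 3.0], [4.0, -1.0]])
d = np.array([12.0, 10.0])
print(cramer(A, d))            # [3. 2.], as in Example 5.5.1
```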

Example 5.5.1 Let us solve
$$\begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 12 \\ 10 \end{bmatrix}$$
for x1, x2 using Cramer's rule. Since
$$|A| = -14, \quad |A_1| = -42, \quad |A_2| = -28,$$
we have
$$x_1 = \frac{-42}{-14} = 3, \quad x_2 = \frac{-28}{-14} = 2.$$

Example 5.5.2
$$5x_1 + 3x_2 = 30;$$
$$6x_1 - 2x_2 = 8.$$
We then have
$$|A| = \begin{vmatrix} 5 & 3 \\ 6 & -2 \end{vmatrix} = -28; \quad |A_1| = \begin{vmatrix} 30 & 3 \\ 8 & -2 \end{vmatrix} = -84; \quad |A_2| = \begin{vmatrix} 5 & 30 \\ 6 & 8 \end{vmatrix} = -140.$$
Therefore, by Cramer's rule, we have
$$\bar{x}_1 = \frac{|A_1|}{|A|} = \frac{-84}{-28} = 3 \quad \text{and} \quad \bar{x}_2 = \frac{|A_2|}{|A|} = \frac{-140}{-28} = 5.$$

Example 5.5.3
$$x_1 + x_2 + x_3 = 0$$
$$12x_1 + 2x_2 - 3x_3 = 5$$
$$3x_1 + 4x_2 + x_3 = -4.$$
In matrix form,
$$\begin{bmatrix} 1 & 1 & 1 \\ 12 & 2 & -3 \\ 3 & 4 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 5 \\ -4 \end{bmatrix}.$$
We have
$$|A| = 35, \quad |A_3| = 35, \quad \text{and thus } x_3 = 1.$$
Example 5.5.4
$$7x_1 - x_2 - x_3 = 0$$
$$10x_1 - 2x_2 + x_3 = 8$$
$$6x_1 + 3x_2 - 2x_3 = 7.$$
We have
$$|A| = -61, \quad |A_1| = -61, \quad |A_2| = -183, \quad |A_3| = -244.$$
Thus
$$\bar{x}_1 = \frac{|A_1|}{|A|} = 1, \quad \bar{x}_2 = \frac{|A_2|}{|A|} = 3, \quad \bar{x}_3 = \frac{|A_3|}{|A|} = 4.$$

Note on Homogeneous Linear-Equation System

A linear-equation system Ax = d is said to be a homogeneous-equation


system if d = 0, i.e., if Ax = 0. If |A| ̸= 0, x̄ = 0 is a unique solution of
Ax = 0 since x̄ = A−1 0 = 0. This is a "trivial solution." Thus, the only
way to get a nontrivial solution from the homogeneous-equation system
is to have |A| = 0, i.e., A is singular. In this case, Cramer’s rule is not
directly applicable. Of course, this does not mean that we cannot obtain
solutions; it means only that the solution is not unique. In fact, it has an
infinite number of solutions.
If r(A) = k < n, we can delete n − k dependent equations from the homogeneous-equation system Ax = 0 and then apply Cramer's rule to any k variables, say (x1, . . . , xk), whose coefficient matrix has rank k; the constant term in equation i is then −(ai,k+1xk+1 + · · · + ainxn).

Example 5.5.5
$$a_{11}x_1 + a_{12}x_2 = 0,$$
$$a_{21}x_1 + a_{22}x_2 = 0.$$
If |A| = 0, then its rows are linearly dependent. As a result, one of the two equations is redundant. By deleting, say, the second equation, we end up with one equation in two variables. The solutions are
$$\bar{x}_1 = -\frac{a_{12}}{a_{11}}x_2 \quad \text{if } a_{11} \neq 0.$$

For a linear-equation system with n variables and m equations, we


have the following proposition.

Proposition 5.5.1 A necessary and sufficient condition for the existence of a solution for a linear-equation system Am×nx = d with n variables and m equations is that the rank of A and the rank of the augmented matrix [A; d] are the same, i.e.,

$$r(A) = r([A; d]).$$

Overview of Solution Outcomes for a Linear-Equation System with Any Number of Variables and Equations

For a general linear-equation system Ax = d, our discussion can be sum-


marized as in the following table.
Case 1: |A| ≠ 0. If d ≠ 0, the solution is unique and x̄ ≠ 0; if d = 0, the solution is unique and x̄ = 0.

Case 2: |A| = 0 and the equations are dependent. If d ≠ 0, there is an infinite number of solutions with x̄ ≠ 0; if d = 0, there is an infinite number of solutions.

Case 3: |A| = 0 and the equations are inconsistent. If d ≠ 0, no solution exists. (This case cannot arise when d = 0.)

Table 5.1: The summary of solutions for the linear-equation system Ax = d

5.6 Application to Market and National-Income Models

Market Model:

The two-commodity model described in Chapter 3 can be written as follows:
$$c_1P_1 + c_2P_2 = -c_0,$$
$$\gamma_1P_1 + \gamma_2P_2 = -\gamma_0.$$

Thus
$$|A| = \begin{vmatrix} c_1 & c_2 \\ \gamma_1 & \gamma_2 \end{vmatrix} = c_1\gamma_2 - c_2\gamma_1,$$
$$|A_1| = \begin{vmatrix} -c_0 & c_2 \\ -\gamma_0 & \gamma_2 \end{vmatrix} = c_2\gamma_0 - c_0\gamma_2,$$
$$|A_2| = \begin{vmatrix} c_1 & -c_0 \\ \gamma_1 & -\gamma_0 \end{vmatrix} = c_0\gamma_1 - c_1\gamma_0.$$
Thus the equilibrium is given by
$$\bar{P}_1 = \frac{|A_1|}{|A|} = \frac{c_2\gamma_0 - c_0\gamma_2}{c_1\gamma_2 - c_2\gamma_1} \quad \text{and} \quad \bar{P}_2 = \frac{|A_2|}{|A|} = \frac{c_0\gamma_1 - c_1\gamma_0}{c_1\gamma_2 - c_2\gamma_1}.$$

General Market Equilibrium Model:

Consider a market for three goods. The demand and supply for each good are given by:
$$D_1 = 5 - 2P_1 + P_2 + P_3, \quad S_1 = -4 + 3P_1 + 2P_2;$$
$$D_2 = 6 + 2P_1 - 3P_2 + P_3, \quad S_2 = 3 + 2P_2;$$
$$D_3 = 20 + P_1 + 2P_2 - 4P_3, \quad S_3 = 3 + P_2 + 3P_3,$$
where Pi is the price of good i, i = 1, 2, 3.

The equilibrium conditions are Di = Si, i = 1, 2, 3, that is,
$$5P_1 + P_2 - P_3 = 9,$$
$$-2P_1 + 5P_2 - P_3 = 3,$$
$$-P_1 - P_2 + 7P_3 = 17.$$
This system of linear equations can be solved via Cramer's rule:
$$\bar{P}_1 = \frac{|A_1|}{|A|} = \frac{356}{178} = 2, \quad \bar{P}_2 = \frac{|A_2|}{|A|} = \frac{356}{178} = 2, \quad \bar{P}_3 = \frac{|A_3|}{|A|} = \frac{534}{178} = 3.$$

National-Income Model

Consider the simple national-income model:
$$Y = C + I_0 + G_0,$$
$$C = a + bY \quad (a > 0, \ 0 < b < 1).$$
These can be rearranged into the form
$$Y - C = I_0 + G_0,$$
$$-bY + C = a.$$
While we can solve for Ȳ and C̄ by Cramer's rule, here we solve this model by inverting the coefficient matrix. Since
$$A = \begin{bmatrix} 1 & -1 \\ -b & 1 \end{bmatrix}, \quad \text{we have} \quad A^{-1} = \frac{1}{1-b}\begin{bmatrix} 1 & 1 \\ b & 1 \end{bmatrix}.$$
Hence
$$\begin{bmatrix} \bar{Y} \\ \bar{C} \end{bmatrix} = \frac{1}{1-b}\begin{bmatrix} 1 & 1 \\ b & 1 \end{bmatrix}\begin{bmatrix} I_0 + G_0 \\ a \end{bmatrix} = \frac{1}{1-b}\begin{bmatrix} I_0 + G_0 + a \\ b(I_0 + G_0) + a \end{bmatrix}.$$
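Since the model is solved symbolically above, a numerical check requires picking parameter values. The sketch below (not from the notes) assumes a = 50, b = 0.8, and I0 + G0 = 100 — purely illustrative numbers — and confirms that solving the system agrees with the closed-form expressions.

```python
import numpy as np

a, b = 50.0, 0.8          # assumed parameter values for illustration only
I0_plus_G0 = 100.0

A = np.array([[1.0, -1.0],
              [-b,   1.0]])
d = np.array([I0_plus_G0, a])

Y_bar, C_bar = np.linalg.solve(A, d)
print(Y_bar, C_bar)       # 750.0 and 650.0

# Closed-form check: Y = (I0 + G0 + a)/(1 - b), C = (b(I0 + G0) + a)/(1 - b)
print((I0_plus_G0 + a) / (1 - b), (b * I0_plus_G0 + a) / (1 - b))
```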

5.7 Quadratic Forms

Quadratic Forms

Definition 5.7.1 A function q of n variables is said to be a quadratic form if it can be written as
$$q(u_1, u_2, \cdots, u_n) = d_{11}u_1^2 + 2d_{12}u_1u_2 + \cdots + 2d_{1n}u_1u_n + d_{22}u_2^2 + 2d_{23}u_2u_3 + \cdots + 2d_{2n}u_2u_n + \cdots + d_{nn}u_n^2.$$
That is, it is a polynomial having only second-order terms (either the square of a variable or the product of two variables).

If we let dji = dij for i < j, then q(u1, u2, · · · , un) can be written as
$$q(u_1, u_2, \cdots, u_n) = d_{11}u_1^2 + d_{12}u_1u_2 + \cdots + d_{1n}u_1u_n + d_{12}u_2u_1 + d_{22}u_2^2 + \cdots + d_{2n}u_2u_n + \cdots + d_{nn}u_n^2$$
$$= \sum_{i=1}^n \sum_{j=1}^n d_{ij}u_iu_j = u'Du,$$
where
$$D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{bmatrix},$$
which is called the quadratic-form matrix.



Since dij = dji , D is a symmetric matrix.

Example 5.7.1 A quadratic form in two variables:
$$q = d_{11}u_1^2 + d_{12}u_1u_2 + d_{22}u_2^2.$$
The symmetric matrix is
$$D = \begin{bmatrix} d_{11} & d_{12}/2 \\ d_{12}/2 & d_{22} \end{bmatrix}.$$
Then we have q = u′Du.

Positive and Negative Definiteness:

Definition 5.7.2 A quadratic form q(u1 , u2 , · · · , un ) = u′ Du is said to be

(a) positive definite (PD) if q(u) > 0 for all u ̸= 0;

(b) positive semidefinite (PSD) if q(u) ≥ 0 for all u ̸= 0;

(c) negative definite (ND) if q(u) < 0 for all u ̸= 0;

(d) negative semidefinite (NSD) if q(u) ≤ 0 for all u ̸= 0.

Otherwise q is called indefinite (ID).

Sometimes, we say that a matrix D is, for instance, positive definite if


the corresponding quadratic form q(u) = u′ Du is positive definite.

Example 5.7.2 The quadratic form
$$q = u_1^2 + u_2^2$$
is positive definite (PD),
$$q = (u_1 + u_2)^2$$
is positive semidefinite (PSD), and
$$q = u_1^2 - u_2^2$$
is indefinite.

Determinantal Test for Sign Definiteness:

We state without proof that for the quadratic form q(u) = u′Du, the necessary and sufficient condition for positive definiteness is that the leading principal minors of |D| are all positive, namely,
$$|D_1| = d_{11} > 0, \quad |D_2| = \begin{vmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{vmatrix} > 0, \quad \cdots, \quad |D_n| = \begin{vmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{vmatrix} > 0.$$

The corresponding necessary and sufficient condition for negative definiteness is that the leading principal minors alternate in sign as follows:
$$|D_1| < 0, \quad |D_2| > 0, \quad |D_3| < 0, \quad \text{etc.}$$
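The determinantal test translates directly into code: compute the leading principal minors and inspect their signs. Below is a sketch (not part of the notes); the matrix used is the one from Example 5.7.4, which follows.

```python
import numpy as np

def definiteness(D):
    """Classify a symmetric matrix by its leading principal minors."""
    n = D.shape[0]
    minors = [np.linalg.det(D[:k, :k]) for k in range(1, n + 1)]
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((m < 0 if k % 2 == 1 else m > 0) for k, m in enumerate(minors, 1)):
        return "negative definite"
    return "test inconclusive (neither PD nor ND)"

D = np.array([[1, -1, 0], [-1, 6, -2], [0, -2, 3]], dtype=float)
print(definiteness(D))   # positive definite, as in Example 5.7.4
```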

Two-Variable Quadratic Form

Example 5.7.3 Is $q = 5u^2 + 3uv + 2v^2$ either positive or negative definite? The symmetric matrix is
$$D = \begin{bmatrix} 5 & 1.5 \\ 1.5 & 2 \end{bmatrix}.$$
Since the leading principal minors of |D| are |D1| = 5 > 0 and
$$|D_2| = \begin{vmatrix} 5 & 1.5 \\ 1.5 & 2 \end{vmatrix} = 10 - 2.25 = 7.75 > 0,$$
q is positive definite.

Three-Variable Quadratic Form

Example 5.7.4 Determine whether
$$q = u_1^2 + 6u_2^2 + 3u_3^2 - 2u_1u_2 - 4u_2u_3$$
is positive or negative definite. The matrix D corresponding to this quadratic form is
$$D = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 6 & -2 \\ 0 & -2 & 3 \end{bmatrix},$$
and the leading principal minors of |D| are
$$|D_1| = 1 > 0, \quad |D_2| = \begin{vmatrix} 1 & -1 \\ -1 & 6 \end{vmatrix} = 6 - 1 = 5 > 0, \quad |D_3| = \begin{vmatrix} 1 & -1 & 0 \\ -1 & 6 & -2 \\ 0 & -2 & 3 \end{vmatrix} = 11 > 0.$$
Thus, the quadratic form is positive definite.

Example 5.7.5 Determine whether
$$q = -3u_1^2 - 3u_2^2 - 5u_3^2 - 2u_1u_2$$
is positive or negative definite. The matrix D corresponding to this quadratic form is
$$D = \begin{bmatrix} -3 & -1 & 0 \\ -1 & -3 & 0 \\ 0 & 0 & -5 \end{bmatrix}.$$
The leading principal minors of D are
$$|D_1| = -3 < 0, \quad |D_2| = 8 > 0, \quad |D_3| = -40 < 0.$$
Therefore, the quadratic form is negative definite.

5.8 Eigenvalues and Eigenvectors

Consider the matrix equation:

Dx = λx.

Any number λ such that the equation Dx = λx has a non-zero vector-


solution x is called the eigenvalue (or called the characteristic root) of
the above equation. Any non-zero vector x satisfying the above equation
is called the eigenvector (or called the characteristic vector) of D for the
eigenvalue λ.
Recipe - How to Calculate Eigenvalues:

From Dx = λx, we have the following homogeneous-equation system:
$$(D - \lambda I)x = 0.$$
Since we require that x be non-zero, the determinant of (D − λI) should vanish. Therefore all eigenvalues can be calculated as roots of the equation (which is often called the characteristic equation or the characteristic polynomial of D)
$$|D - \lambda I| = 0.$$

Example 5.8.1 Let
$$D = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}.$$
Then
$$|D - \lambda I| = \begin{vmatrix} 3-\lambda & -1 & 0 \\ -1 & 3-\lambda & 0 \\ 0 & 0 & 5-\lambda \end{vmatrix} = (3-\lambda)(3-\lambda)(5-\lambda) - (5-\lambda) = (5-\lambda)(\lambda-2)(\lambda-4) = 0,$$
and therefore the eigenvalues are λ1 = 2, λ2 = 4, and λ3 = 5.

For λ1 = 2, we solve
$$\begin{bmatrix} 3-2 & -1 & 0 \\ -1 & 3-2 & 0 \\ 0 & 0 & 5-2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Thus, the eigenvector corresponding to λ1 = 2 is v1 = c1(1, 1, 0)′, where c1 is an arbitrary real constant. Similarly, for λ2 = 4 and λ3 = 5, we have v2 = c2(1, −1, 0)′ and v3 = c3(0, 0, 1)′, respectively.
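A numerical check of this example (not part of the notes) can be done with numpy's eigenvalue routine for symmetric matrices:

```python
import numpy as np

D = np.array([[3.0, -1.0, 0.0],
              [-1.0, 3.0, 0.0],
              [0.0,  0.0, 5.0]])

# eigh is the eigenvalue routine for symmetric matrices;
# it returns the eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(D)
print(eigvals)                       # [2. 4. 5.]

# Each column v of eigvecs satisfies D v = lambda v:
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(D @ v, lam * v))
```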

Properties of Eigenvalues:

Proposition 5.8.1 A quadratic form q(u1, u2, · · · , un) = u′Du is

positive definite if and only if all eigenvalues λi > 0, i = 1, 2, · · · , n;

negative definite if and only if all eigenvalues λi < 0, i = 1, 2, · · · , n;

positive semidefinite if and only if all eigenvalues λi ≥ 0, i = 1, 2, · · · , n;

negative semidefinite if and only if all eigenvalues λi ≤ 0, i = 1, 2, · · · , n;

indefinite if at least one positive and one negative eigenvalue exist.

Definition 5.8.1 Matrix A is said to be diagonalizable if there exists a non-


singular matrix P and a diagonal matrix D such that

P −1 AP = D.

Matrix U is an orthogonal matrix if U ′ = U −1 .

Theorem 5.8.1 (The Spectral Theorem for Symmetric Matrices) Suppose that A is a symmetric matrix of order n and λ1, · · · , λn are its eigenvalues. Then there exists an orthogonal matrix U such that
$$U^{-1}AU = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$

Usually, U is the normalized matrix formed by eigenvectors. It has the


property U ′ U = I. "Normalized" means that for any column u of the
matrix U , we have u′ u = 1.

Example 5.8.2 Diagonalize the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}.$$
First, we need to find the eigenvalues:
$$\begin{vmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{vmatrix} = \lambda(\lambda - 5) = 0,$$
i.e., λ1 = 0 and λ2 = 5.

For λ1 = 0, we solve
$$\begin{bmatrix} 1-0 & 2 \\ 2 & 4-0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
The eigenvector corresponding to λ1 = 0 is v1 = c1(2, −1)′, where c1 is an arbitrary real constant. Similarly, for λ2 = 5, we have v2 = c2(1, 2)′.

Let us normalize the eigenvectors, i.e., let us pick constants ci such that vi′vi = 1. We get
$$v_1 = \left(\frac{2}{\sqrt{5}}, \frac{-1}{\sqrt{5}}\right)', \quad v_2 = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)'.$$
Thus the diagonalization matrix U is
$$U = \begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \\ \frac{-1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}.$$
You can easily check that
$$U^{-1}AU = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix}.$$
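The claimed diagonalization can be checked numerically as well (a sketch, not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])
s = np.sqrt(5.0)
U = np.array([[2/s, 1/s],
              [-1/s, 2/s]])     # columns: the normalized eigenvectors

print(np.allclose(U.T @ U, np.eye(2)))          # U is orthogonal: U'U = I
print(np.round(np.linalg.inv(U) @ A @ U, 10))   # diag(0, 5)
```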

The trace of a square matrix of order n is the sum of the n elements on its principal diagonal, i.e., $\text{tr}(A) = \sum_{i=1}^n a_{ii}$.

Properties of the Trace:

1) tr(A) = λ1 + · · · + λn ;

2) if A and B are of the same order, tr(A + B) = tr(A) + tr(B);

3) if a is a scalar, tr(aA) = atr(A);

4) tr(AB) = tr(BA), whenever AB is square;

5) tr(A′ ) = tr(A);
6) $\text{tr}(A'A) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}^2$.

5.9 Vector Spaces


A (real) vector space is a nonempty set V of objects together with an addi-
tive operation + : V × V → V , +(u, v) = u + v and a scalar multiplicative
operation · : R × V → V , ·(a, u) = au which satisfies the following axioms
for any u, v, w ∈ V and any a, b ∈ R where R is the set of all real numbers:

1. (u + v) + w = u + (v + w);

2. u + v = v + u;

3. 0 + u = u;

4. u + (−u) = 0;

5. a(u + v) = au + av;

6. (a + b)u = au + bu;

7. a(bu) = (ab)u;

8. 1u = u.

The objects of a vector space V are called vectors; the operations + and · are called vector addition and scalar multiplication, respectively. The element 0 ∈ V is the zero vector, and −v is the additive inverse of v.

Example 5.9.1 (The n-Dimensional Vector Space Rn) For Rn, consider u, v ∈ Rn, u = (u1, u2, · · · , un)′, v = (v1, v2, · · · , vn)′, and a ∈ R. Define the additive operation and the scalar multiplication as follows:
$$u + v = (u_1 + v_1, \cdots, u_n + v_n)',$$
$$au = (au_1, \cdots, au_n)'.$$

It is not difficult to verify that Rn together with these operations is a vector


space.

Let V be a vector space. An inner product or scalar product in V is


a function s : V × V → R, s(u, v) = u · v which satisfies the following
properties:

1. u · v = v · u,

2. u · (v + w) = u · v + u · w,

3. a(u · v) = (au) · v = u · (av),

4. u · u ≥ 0 and u · u = 0 iff u = 0.

Example 5.9.2 Let u, v ∈ Rn , u = (u1 , u2 , · · · , un )′ , v = (v1 , v2 , · · · , vn )′ .


Then u · v = u1 v1 + · · · + un vn .

Let V be a vector space and v ∈ V. The norm, or magnitude, of v is the function $||\cdot|| : V \to \mathbb{R}$ defined as $||v|| = \sqrt{v \cdot v}$. For any u, v ∈ V and any a ∈ R, we have the following properties:

1. ||au|| = |a|||u||;

2. ||u + v|| ≤ ||u|| + ||v||;

3. |u · v| ≤ ||u|| × ||v||.

The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av.

The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if u · v = 0.

The angle between vectors u and v is $\arccos\left(\frac{u \cdot v}{||u||\,||v||}\right)$.

A nonempty subset S of a vector space V is a subspace of V if for any u, v ∈ S and a ∈ R,
$$u + v \in S \quad \text{and} \quad au \in S.$$

Example 5.9.3 V is a subspace of itself. {0} is also a subspace of V. These two subspaces are called the trivial subspaces of V.

Example 5.9.4 L = {(x, y)′ | y = mx}, where m ∈ R, is a subspace of R2.

Let u1, u2, · · · , uk be vectors in a vector space V. The set S of all linear combinations of these vectors,
$$S = \{a_1u_1 + a_2u_2 + \cdots + a_ku_k \mid a_1, \cdots, a_k \in \mathbb{R}\},$$
is called the subspace generated (or spanned) by the vectors u1, u2, · · · , uk and is denoted by sp(u1, u2, · · · , uk). One can prove that S is a subspace of V.

Example 5.9.5 Let u1 = (2, −1, 1)′ , u2 = (3, 4, 0)′ . Then the subspace of R3
generated by u1 and u2 is

sp(u1 , u2 ) = {(2a + 3b, −a + 4b, a)′ |a, b ∈ R}.

As we discussed in Chapter 4, a set of vectors {u1, u2, · · · , uk} in a vector space V is linearly dependent if there exist real numbers a1, a2, · · · , ak, not all zero, such that a1u1 + a2u2 + · · · + akuk = 0. In other words, a set of vectors in a vector space is linearly dependent if and only if one vector can be written as a linear combination of the others. A set of vectors {u1, u2, · · · , uk} in a vector space V is linearly independent if it is not linearly dependent.
Properties: Let {u1, u2, · · · , un} be n vectors in Rn. The following conditions are equivalent:

i) The vectors are independent.

ii) The matrix having these vectors as columns is nonsingular.

iii) The vectors generate Rn.

A set of vectors {u1 , u2 , · · · , uk } in V is a basis for V if it, first,


generates V , and, second, is linearly independent.

Example 5.9.6 Consider the following vectors in Rn . ei = (0, · · · , 0, 1, 0, · · · , 0)′ ,


where 1 is in the ith position, i = 1, · · · , n. The set En = {e1 , e2 , · · · , en }
forms a basis for Rn which is called the standard basis.

Let V be a vector space and B = {u1, u2, · · · , un} a basis for V. Since B generates V, for any u ∈ V there exist real numbers x1, x2, · · · , xn such that u = x1u1 + · · · + xnun. The column vector x = (x1, x2, · · · , xn)′ is called the vector of coordinates of u with respect to B.

Example 5.9.7 Consider the vector space Rn with the standard basis En .
For any u = (u1 , · · · , un )′ , we can represent u as u = u1 e1 + · · · + un en ;
therefore, (u1 , · · · , un )′ is the vector of coordinates of u with respect to En .

Example 5.9.8 Consider the vector space R2. Let us find the coordinate vector of (−1, 2)′ with respect to the basis B = {(1, 1)′, (2, −3)′} (i.e., find (−1, 2)′B). We have to solve for a, b such that (−1, 2)′ = a(1, 1)′ + b(2, −3)′. Solving the system a + 2b = −1 and a − 3b = 2, we find a = 1/5 and b = −3/5. Thus, (−1, 2)′B = (1/5, −3/5)′.
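Finding coordinates with respect to a basis amounts to solving a linear system: stacking the basis vectors as the columns of a matrix B gives u = Bx, so x = B⁻¹u. The sketch below (not from the notes) reproduces Example 5.9.8.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [1.0, -3.0]])    # columns: (1,1)' and (2,-3)'
u = np.array([-1.0, 2.0])

x = np.linalg.solve(B, u)      # coordinates of u relative to the basis
print(x)                       # [ 0.2 -0.6], i.e., (1/5, -3/5)'
```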

The dimension dim(V) of a vector space V is the number of elements in any basis for V.

Example 5.9.9 The dimension of the vector space Rn with the standard
basis En is n.

Let U and V be two vector spaces. A linear transformation of U into V


is a mapping T : U → V such that for any u, v ∈ U and any a, b ∈ R, we
have
T (au + bv) = aT (u) + bT (v).

Example 5.9.10 Let A be an m × n real matrix. The mapping T : Rn → Rm defined by T(u) = Au is a linear transformation.

Properties:

Let U and V be two vector spaces, B = (b1 , · · · , bn ) a basis for U and


C = (c1 , · · · , cm ) a basis for V .

1. Any linear transformation T can be represented by an m ×


n matrix AT whose ith column is the coordinate vector of
T (bi ) relative to C.

2. If x = (x1 , · · · , xn )′ is the coordinate vector of u ∈ U relative


to B and y = (y1 , · · · , ym )′ is the coordinate vector of T (u)
relative to C, then T defines the following transformation
of coordinates:

y = AT x for any u ∈ U.

The matrix AT is called the matrix representation of T relative to bases


B and C.

Remark 5.9.1 Any linear transformation is uniquely determined by a trans-


formation of coordinates.

Example 5.9.11 Consider the linear transformation T : R3 → R2, T((x, y, z)′) = (x − 2y, x + z)′, and bases B = {(1, 1, 1)′, (1, 1, 0)′, (1, 0, 0)′} for R3 and C = {(1, 1)′, (1, 0)′} for R2. How can we find the matrix representation of T relative to bases B and C?

We have
$$T((1,1,1)') = (-1, 2)', \quad T((1,1,0)') = (-1, 1)', \quad T((1,0,0)') = (1, 1)'.$$
The columns of AT are formed by the coordinate vectors of T((1, 1, 1)′), T((1, 1, 0)′), T((1, 0, 0)′) relative to C. Applying the procedure developed in Example 5.9.8, we find
$$A_T = \begin{bmatrix} 2 & 1 & 1 \\ -3 & -2 & 0 \end{bmatrix}.$$

Let V be a vector space of dimension n, B and C be two bases for V, and I : V → V be the identity transformation (I(v) = v for all v ∈ V). The change-of-basis matrix D relative to B, C is the matrix representation of I relative to B, C.

Example 5.9.12 For u ∈ V, let x = (x1, · · · , xn)′ be the coordinate vector of u relative to B, and let y = (y1, · · · , yn)′ be the coordinate vector of u relative to C. If D is the change-of-basis matrix relative to B, C, then y = Dx. The change-of-basis matrix relative to C, B is D−1.

Example 5.9.13 Given the following bases for R2: B = {(1, 1)′, (1, 0)′} and C = {(0, 1)′, (1, 1)′}, find the change-of-basis matrix D relative to B, C. The columns of D are the coordinate vectors of (1, 1)′ and (1, 0)′ relative to C. Following Example 5.9.8, we find
$$D = \begin{bmatrix} 0 & -1 \\ 1 & 1 \end{bmatrix}.$$
Chapter 6

Comparative Statics and the Concept of Derivative

6.1 The Nature of Comparative Statics

Comparative statics is concerned with the comparison of different equilib-


rium states that are associated with different sets of values of parameters
and exogenous variables. When the value of some parameter or exoge-
nous variable that is associated with an initial equilibrium changes, we
will have a new equilibrium. Then the question posed in the comparative-
static analysis is: How would the new equilibrium compare with the old?
It should be noted that in comparative-static analysis we are not concerned with the process of adjustment of the variables; we merely compare the initial (prechange) equilibrium state with the new (postchange) equilibrium state. We also preclude the possibility of instability of equilibrium, for we assume the equilibrium to be attainable.
It should be clear that the problem under consideration is essentially
one of finding a rate of change: the rate of change of the equilibrium value
of an endogenous variable with respect to the change in a particular pa-


rameter or exogenous variable. For this reason, the mathematical concept


of derivative takes on preponderant significance in comparative statics.

6.2 Rate of Change and the Derivative

We want to study the rate of change of any variable y in response to a


change in another variable x, where the two variables are related to each
other by the function
y = f (x).

Applied in the comparative static context, the variable y will represent


the equilibrium value of an endogenous variable, and x will be some pa-
rameter.

The Difference Quotient

We use the symbol ∆ to denote the change from one point, say x0 , to
another point, say x1 . Thus ∆x = x1 − x0 . When x changes from an initial
value x0 to a new value x0 +∆x, the value of the function y = f (x) changes
from f (x0 ) to f (x0 + ∆x). The change in y per unit of change in x can be
represented by the difference quotient.

$$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$

Example 6.2.1 y = f(x) = 3x² − 4. Then f(x0) = 3x0² − 4, f(x0 + Δx) = 3(x0 + Δx)² − 4, and thus,
$$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} = \frac{3(x_0 + \Delta x)^2 - 4 - (3x_0^2 - 4)}{\Delta x} = \frac{6x_0\Delta x + 3(\Delta x)^2}{\Delta x} = 6x_0 + 3\Delta x.$$

The Derivative

Frequently, we are interested in the rate of change of y when Δx is very small. In particular, we want to know the rate of Δy/Δx when Δx approaches zero. If, as Δx → 0, the limit of the difference quotient Δy/Δx exists, that limit is called the derivative of the function y = f(x), and the derivative is denoted by
$$\frac{dy}{dx} \equiv y' \equiv f'(x) \equiv \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x}.$$

Remark 6.2.1 Several points should be noted about the derivative: (1) a
derivative is a function. Whereas the difference quotient is a function of x0
and ∆x, the derivative is a function of x0 only; and (2) since the derivative
is merely a limit of the difference quotient, it must also be of necessity a
measure of some rate of change. Since ∆x → 0, the rate measured by the
derivative is in the nature of an instantaneous rate of change.

Example 6.2.2 Referring to the function y = 3x² − 4 again: since
$$\frac{\Delta y}{\Delta x} = 6x + 3\Delta x,$$
we have $\frac{dy}{dx} = 6x$.
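A numerical illustration (not part of the notes) makes the limit visible: as Δx shrinks, the difference quotient of f(x) = 3x² − 4 approaches 6x.

```python
def f(x):
    return 3 * x**2 - 4

x0 = 3.0                        # derivative should approach 6 * 3 = 18
for dx in (1.0, 0.1, 0.01, 0.001):
    quotient = (f(x0 + dx) - f(x0)) / dx
    print(dx, quotient)         # 21.0, 18.3, 18.03, 18.003 -> tends to 18
```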

6.3 The Derivative and the Slope of a Curve

Elementary economics tells us that, given a total-cost function C = f(Q), where C denotes total cost and Q the output, the marginal cost MC is defined as MC = ΔC/ΔQ. If Q is a continuous variable, ΔQ will refer to an infinitesimal change. It is well known that MC can be measured by the slope of the total-cost curve. But the slope of the total-cost curve is nothing but the limit of the ratio ΔC/ΔQ as ΔQ → 0. The concept of the slope of a curve is then merely the geometric counterpart of the concept of the derivative.

Figure 6.1: Graphical illustrations of the slope of the total cost curve and
the marginal cost.
6.4 The Concept of Limit

In the above, we have defined the derivative of a function y = f(x) as the limit of Δy/Δx as Δx → 0. We now study the concept of limit. For a given function q = g(v), the concept of limit is concerned with the question: What value does q approach as v approaches a specific value? That is, as v → N (where N can be any number, say N = 0, N = +∞, or N = −∞), what happens to limv→N g(v)?

When we say v → N , the variable v can approach the number N either


from values greater than N , or from values less than N . If, as v → N from
the left side (from values less than N ), q approaches a finite number L, we
call L the left-side limit of q. Similarly, we call L the right-side limit of q
if v → N from the right side. The left-side limit and right-side limit of q
are denoted by limv→N − q and limv→N + q, respectively. The limit of q at N
is said to exist if
$$\lim_{v \to N^-} q = \lim_{v \to N^+} q,$$

and is denoted by limv→N q. Note that L must be a finite number. If we


have the situation of limv→N q = ∞ (or −∞), we shall consider q to possess
no limit or an "infinite limit." It is important to realize that the symbol ∞
is not a number, and therefore it cannot be subjected to the usual algebraic
operations.

Graphical Illustrations

There are several possible situations regarding the limit of a function,


which are shown in the following diagrams.

Figure 6.2: Possible Situations regarding the limit of a function q = g(v).

Evaluation of a limit

Let us now illustrate the algebraic evaluation of a limit of a given function


q = g(v).

Example 6.4.1 Given q = 2 + v², find limv→0 q. It is clear that limv→0− q = 2 and limv→0+ q = 2, since v² → 0 as v → 0. Thus limv→0 q = 2.

Note that, in evaluating limv→N q, we only let v tend to N but, as a rule, do not let v = N. Indeed, sometimes N is not even in the domain of the function q = g(v).

Example 6.4.2 Consider
$$q = \frac{1 - v^2}{1 - v}.$$
For this function, N = 1 is not in the domain, and we cannot set v = 1 since that would involve division by zero. Moreover, even the limit-evaluation procedure of letting v → 1 will cause difficulty since (1 − v) → 0 as v → 1.

One way out of this difficulty is to try to transform the given ratio to a form in which v will not appear in the denominator. Since
$$q = \frac{1 - v^2}{1 - v} = \frac{(1 - v)(1 + v)}{1 - v} = 1 + v \quad (v \neq 1),$$
and v → 1 implies v ≠ 1 and (1 + v) → 2 as v → 1, we have limv→1 q = 2.

Example 6.4.3 Find $\lim_{v \to \infty} \frac{2v + 5}{v + 1}$. Since
$$\frac{2v + 5}{v + 1} = \frac{2(v + 1) + 3}{v + 1} = 2 + \frac{3}{v + 1}$$
and $\lim_{v \to \infty} \frac{3}{v + 1} = 0$, we have $\lim_{v \to \infty} \frac{2v + 5}{v + 1} = 2$.

Formal View of the Limit Concept

Definition 6.4.1 The number L is said to be the limit of q = g(v) as v approaches N if, for every neighborhood of L, there can be found a corresponding neighborhood of N (excluding the point v = N) in the domain of the function such that, for every value of v in that neighborhood, its image lies in the chosen L-neighborhood. Here a neighborhood of a point L is an open interval defined by
$$(L - a_1, L + a_2) = \{q \mid L - a_1 < q < L + a_2\} \quad \text{for } a_1, a_2 > 0.$$



Figure 6.3: The graphical representation of the limit defined in terms of neighborhoods.

6.5 Inequality and Absolute Values


Rules of Inequalities:

Transitivity:
a > b and b > c implies a > c;
a ≥ b and b ≥ c implies a ≥ c.

Addition and Subtraction:
a > b =⇒ a ± k > b ± k;
a ≥ b =⇒ a ± k ≥ b ± k.

Multiplication and Division:
a > b =⇒ ka > kb (k > 0);
a > b =⇒ ka < kb (k < 0).

Squaring:
a > b with b ≥ 0 =⇒ a² > b².

Absolute Values and Inequalities

For any real number n, the absolute value of n is defined and denoted by
$$|n| = \begin{cases} n & \text{if } n > 0, \\ -n & \text{if } n < 0, \\ 0 & \text{if } n = 0. \end{cases}$$
Thus we can write |x| < n as an equivalent way of writing −n < x < n (n > 0). Also |x| ≤ n if and only if −n ≤ x ≤ n (n > 0).

The following properties characterize absolute values:

1) |m| + |n| ≥ |m + n|;

2) |m| · |n| = |m · n|;

3) $\frac{|m|}{|n|} = \left|\frac{m}{n}\right|$ (n ≠ 0).

Solution of an Inequality

Example 6.5.1 Find the solution of the inequality 3x−3 > x+1. By adding
(3 − x) to both sides, we have

3x − 3 + 3 − x > x + 1 + 3 − x.

Thus, 2x > 4 so x > 2.

Example 6.5.2 Solve the inequality |1 − x| ≤ 3.


From |1 − x| ≤ 3, we have −3 ≤ 1 − x ≤ 3, or −4 ≤ −x ≤ 2. Thus,
4 ≥ x ≥ −2, i.e., −2 ≤ x ≤ 4.

6.6 Limit Theorems


Theorems Involving a Single Equation

Theorem I: If q = av + b, then limv→N q = aN + b.

Theorem II: If q = g(v) = b, then limv→N q = b.

Theorem III: limv→N v^k = N^k.

Example 6.6.1 Given q = 5v + 7, then limv→2 q = 5 · 2 + 7 = 17.

Example 6.6.2 q = v³. Find limv→2 q. By Theorem III, we have limv→2 q = 2³ = 8.

Theorems Involving Two Functions

For two functions q1 = g(v) and q2 = h(v), if limv→N q1 = L1 and limv→N q2 = L2, then we have the following theorems:

Theorem IV: limv→N (q1 + q2) = L1 + L2.

Theorem V: limv→N (q1 q2) = L1 L2.

Theorem VI: $\lim_{v \to N} \frac{q_1}{q_2} = \frac{L_1}{L_2}$ (L2 ≠ 0).

Example 6.6.3 Find $\lim_{v \to 0} \frac{1 + v}{2 + v}$. Since limv→0 (1 + v) = 1 and limv→0 (2 + v) = 2, we have $\lim_{v \to 0} \frac{1 + v}{2 + v} = \frac{1}{2}$.

Remark 6.6.1 Note that L1 and L2 represent finite numbers; otherwise the-
orems do not apply.

Limit of a Polynomial Function

$$\lim_{v \to N}\left(a_0 + a_1v + a_2v^2 + \cdots + a_nv^n\right) = a_0 + a_1N + a_2N^2 + \cdots + a_nN^n.$$

6.7 Continuity and Differentiability of a Function
Continuity of a Function

Definition 6.7.1 A function q = g(v) is said to be continuous at N if limv→N q


exists and limv→N g(v) = g(N ).

Thus the term continuous involves no less than three requirements: (1)
the point N must be in the domain of the function; (2) limv→N g(v) exists;
and (3) limv→N g(v) = g(N ).

Remark 6.7.1 It is important to note that while – in discussing the limit


of a function – the point (N, L) is excluded from consideration, we are no
longer excluding it in defining continuity at point N . Rather, as the third
requirement specifically states, the point (N, L) must be on the graph of
the function before the function can be considered as continuous at point
N.

Polynomial and Rational Functions

From the discussion of the limit of polynomial function, we know that


the limit exists and equals the value of the function at N . Since N is a
point in the domain of the function, we can conclude that any polynomial
function is continuous in its domain. By those theorems involving two
functions, we also know any rational function is continuous in its domain.

Example 6.7.1 $q = \frac{4v^2}{v^2 + 1}$. Then
$$\lim_{v \to N} \frac{4v^2}{v^2 + 1} = \frac{\lim_{v \to N} 4v^2}{\lim_{v \to N}(v^2 + 1)} = \frac{4N^2}{N^2 + 1}.$$

Example 6.7.2 The rational function
$$q = \frac{v^3 + v^2 - 4v - 4}{v^2 - 4}$$
is not defined at v = 2 and v = −2. Since v = 2, −2 are not in the domain, the function is discontinuous at v = 2 and v = −2, despite the fact that its limit exists as v → 2 or −2, by noting
$$q = \frac{v^3 + v^2 - 4v - 4}{v^2 - 4} = \frac{v(v^2 - 4) + v^2 - 4}{v^2 - 4} = \frac{(v^2 - 4)(v + 1)}{v^2 - 4} = v + 1 \quad (v \neq 2, -2).$$

Differentiability Implies Continuity

By the definition of the derivative of a function y = f(x), we know that f′(x0) exists at x0 if the limit of Δy/Δx exists at x = x0 as Δx → 0, i.e.,
$$f'(x_0) = \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x} \equiv \lim_{\Delta x \to 0}\frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \quad \text{(differentiability condition)}.$$
On the other hand, the function y = f(x) is continuous at x0 if and only if
$$\lim_{x \to x_0} f(x) = f(x_0) \quad \text{(continuity condition)}.$$
We want to know what the relationship is between the continuity and differentiability of a function. Now we show that the continuity of f is a necessary condition for its differentiability. But it is not sufficient.

Since the notation x → x0 implies x ≠ x0, so that x − x0 is a nonzero number, it is permissible to write the following identity:
$$f(x) - f(x_0) = \frac{f(x) - f(x_0)}{x - x_0}(x - x_0).$$

Taking the limit of each side of the above equation as x → x0 yields the following results:
$$\text{Left side} = \lim_{x \to x_0}(f(x) - f(x_0)) = \lim_{x \to x_0} f(x) - f(x_0);$$
$$\text{Right side} = \lim_{x \to x_0}\frac{f(x) - f(x_0)}{x - x_0}\lim_{x \to x_0}(x - x_0) = f'(x_0)\lim_{x \to x_0}(x - x_0) = 0.$$
Thus limx→x0 (f(x) − f(x0)) = 0, so limx→x0 f(x) = f(x0), which means f(x) is continuous at x = x0.

Although differentiability implies continuity, the converse may not be true. That is, continuity is a necessary, but not sufficient, condition for differentiability. The following example shows this.

Example 6.7.3 f(x) = |x|. This function is clearly continuous at x = 0. Now we show that it is not differentiable at x = 0. This involves the demonstration of a disparity between the left-side limit and the right-side limit. In considering the right-side limit, x > 0, so we have
$$\lim_{x \to 0^+}\frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0^+}\frac{x}{x} = \lim_{x \to 0^+} 1 = 1.$$
On the other hand, in considering the left-side limit, x < 0, so we have
$$\lim_{x \to 0^-}\frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0^-}\frac{|x|}{x} = \lim_{x \to 0^-}\frac{-x}{x} = \lim_{x \to 0^-}(-1) = -1.$$
Thus, limx→0 Δy/Δx does not exist, since the left-side limit and the right-side limit are not the same, which implies that the derivative of y = |x| does not exist at x = 0.
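Numerically (a sketch, not part of the notes), the two one-sided difference quotients of |x| at 0 settle at different values, which is exactly why the limit, and hence the derivative, fails to exist:

```python
f = abs

for dx in (0.1, 0.01, 0.001):
    right = (f(0 + dx) - f(0)) / dx        # -> +1 from the right
    left = (f(0 - dx) - f(0)) / (-dx)      # -> -1 from the left
    print(dx, right, left)
```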
Chapter 7

Rules of Differentiation and Their Use in Comparative Statics

The central problem of comparative-static analysis, that of finding a rate of change, can be identified with the problem of finding the derivative of a function y = f(x), provided only a small change in x is being considered. Before going into comparative-static models, we begin with some rules of differentiation.

7.1 Rules of Differentiation for a Function of One Variable

Constant-Function Rule

If y = f(x) = c, where c is a constant, then
$$\frac{dy}{dx} \equiv y' \equiv f' = 0.$$
Proof.
$$\frac{dy}{dx} = \lim_{x' \to x}\frac{f(x') - f(x)}{x' - x} = \lim_{x' \to x}\frac{c - c}{x' - x} = 0.$$
We can also write $\frac{dy}{dx} = \frac{df}{dx}$ as
$$\frac{d}{dx}y = \frac{d}{dx}f.$$
So we may consider d/dx as an operator symbol.

Power-Function Rule

If y = f(x) = x^a, where a is any real number (−∞ < a < ∞), then
$$\frac{d}{dx}f(x) = ax^{a-1}.$$

Remark 7.1.1 Note that:

(i) If a = 0, then $\frac{d}{dx}x^0 = \frac{d}{dx}1 = 0$.

(ii) If a = 1, then y = x; thus $\frac{dx}{dx} = 1$.

For simplicity, we prove this rule only for the case where a = n, where n is any positive integer. Since
$$x^n - x_0^n = (x - x_0)(x^{n-1} + x_0x^{n-2} + x_0^2x^{n-3} + \cdots + x_0^{n-1}),$$
then
$$\frac{x^n - x_0^n}{x - x_0} = x^{n-1} + x_0x^{n-2} + x_0^2x^{n-3} + \cdots + x_0^{n-1}.$$
Thus,
$$f'(x_0) = \lim_{x \to x_0}\frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0}\frac{x^n - x_0^n}{x - x_0} = \lim_{x \to x_0}\left(x^{n-1} + x_0x^{n-2} + \cdots + x_0^{n-1}\right) = x_0^{n-1} + x_0^{n-1} + \cdots + x_0^{n-1} = nx_0^{n-1}.$$

Example 7.1.1 Suppose y = f(x) = x^{−3}. Then y′ = −3x^{−4}.

Example 7.1.2 Suppose $y = f(x) = \sqrt{x}$. Then $y' = \frac{1}{2}x^{-\frac{1}{2}}$. In particular, we know that $f'(2) = \frac{1}{2} \cdot 2^{-\frac{1}{2}} = \frac{\sqrt{2}}{4}$.

Power-Function Rule Generalized

If the function is given by y = cx^a, then
$$\frac{dy}{dx} = \frac{df}{dx} = acx^{a-1}.$$

Example 7.1.3 Suppose y = 2x. Then $\frac{dy}{dx} = 2x^0 = 2$.

Example 7.1.4 Suppose y = 4x³. Then $\frac{dy}{dx} = 4 \cdot 3x^{3-1} = 12x^2$.

Example 7.1.5 Suppose y = 3x^{−2}. Then $\frac{dy}{dx} = -6x^{-2-1} = -6x^{-3}$.

Common Rules:

$$f(x) = \text{constant} \Rightarrow f'(x) = 0;$$
$$f(x) = x^a \ (a \text{ is constant}) \Rightarrow f'(x) = ax^{a-1};$$
$$f(x) = e^x \Rightarrow f'(x) = e^x;$$
$$f(x) = a^x \ (a > 0) \Rightarrow f'(x) = a^x \ln a;$$
$$f(x) = \ln x \Rightarrow f'(x) = \frac{1}{x};$$
$$f(x) = \log_a x \ (a > 0, a \neq 1) \Rightarrow f'(x) = \frac{1}{x}\log_a e = \frac{1}{x \ln a};$$
$$f(x) = \sin x \Rightarrow f'(x) = \cos x;$$
$$f(x) = \cos x \Rightarrow f'(x) = -\sin x;$$
$$f(x) = \tan x \Rightarrow f'(x) = \frac{1}{\cos^2 x};$$
$$f(x) = \cot x \Rightarrow f'(x) = -\frac{1}{\sin^2 x};$$
$$f(x) = \arcsin x \Rightarrow f'(x) = \frac{1}{\sqrt{1 - x^2}};$$
$$f(x) = \arccos x \Rightarrow f'(x) = -\frac{1}{\sqrt{1 - x^2}};$$
$$f(x) = \arctan x \Rightarrow f'(x) = \frac{1}{1 + x^2};$$
$$f(x) = \text{arccot}\, x \Rightarrow f'(x) = -\frac{1}{1 + x^2}.$$

We will come back to discuss the exponential function and log func-
tions and their derivatives in Chapter 10.

7.2 Rules of Differentiation Involving Two or More Functions of the Same Variable

Let f (x) and g(x) be two differentiable functions. We have the following
rules:

Sum-Difference Rule:

$$\frac{d}{dx}[f(x) \pm g(x)] = \frac{d}{dx}f(x) \pm \frac{d}{dx}g(x) = f'(x) \pm g'(x).$$
This rule can easily be extended to more functions:
$$\frac{d}{dx}\left[\sum_{i=1}^n f_i(x)\right] = \sum_{i=1}^n \frac{d}{dx}f_i(x) = \sum_{i=1}^n f_i'(x).$$

Example 7.2.1
$$\frac{d}{dx}(ax^2 + bx + c) = 2ax + b.$$

Example 7.2.2 Suppose a short-run total-cost function is given by C = Q³ − 4Q² + 10Q + 75. Then the marginal-cost function is the limit of the quotient ΔC/ΔQ, or the derivative of the C function:
$$\frac{dC}{dQ} = 3Q^2 - 8Q + 10.$$

In general, if a primitive function y = f (x) represents a total func-


tion, then the derivative function dy/dx is its marginal function. Since the
derivative of a function is the slope of its curve, the marginal function
should show the slope of the curve of the total function at each point x.
Sometimes, we say a function is smooth if its derivative is continuous.

L'Hopital Rule

We may use derivatives to find the limit of a function whose numerator and denominator both approach zero (or infinity); i.e., we have the following L'Hopital rule.

Theorem 7.2.1 (L'Hopital Rule) Suppose that f(x) and g(x) are differentiable on an open interval (a, b), except possibly at c. If limx→c f(x) = limx→c g(x) = 0 or limx→c f(x) = limx→c g(x) = ±∞, g′(x) ≠ 0 for all x in (a, b) with x ≠ c, and $\lim_{x \to c}\frac{f'(x)}{g'(x)}$ exists, then
$$\lim_{x \to c}\frac{f(x)}{g(x)} = \lim_{x \to c}\frac{f'(x)}{g'(x)}.$$

Example 7.2.3
$$q = \frac{v^3 + v^2 - 4v - 4}{v^2 - 4}.$$
Note that limv→2 (v³ + v² − 4v − 4) = 0 and limv→2 (v² − 4) = 0. Then, by L'Hopital Rule, we have
$$\lim_{v \to 2}\frac{v^3 + v^2 - 4v - 4}{v^2 - 4} = \lim_{v \to 2}\frac{\frac{d}{dv}(v^3 + v^2 - 4v - 4)}{\frac{d}{dv}(v^2 - 4)} = \lim_{v \to 2}\frac{3v^2 + 2v - 4}{2v} = 3.$$

Example 7.2.4
$$q = \frac{4v + 5}{v^2 + 2v - 3}.$$
Since limv→∞ (4v + 5) = ∞ and limv→∞ (v² + 2v − 3) = ∞, by L'Hopital Rule, we then have
$$\lim_{v \to \infty}\frac{4v + 5}{v^2 + 2v - 3} = \lim_{v \to \infty}\frac{\frac{d}{dv}(4v + 5)}{\frac{d}{dv}(v^2 + 2v - 3)} = \lim_{v \to \infty}\frac{4}{2v + 2} = 0.$$

Product Rule:

$$\frac{d}{dx}[f(x)g(x)] = f(x)\frac{d}{dx}g(x) + g(x)\frac{d}{dx}f(x) = f(x)g'(x) + g(x)f'(x).$$

Proof.
$$\frac{d}{dx}[f(x_0)g(x_0)] = \lim_{x \to x_0}\frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} = \lim_{x \to x_0}\frac{f(x)g(x) - f(x)g(x_0) + f(x)g(x_0) - f(x_0)g(x_0)}{x - x_0}$$
$$= \lim_{x \to x_0}\frac{f(x)[g(x) - g(x_0)] + g(x_0)[f(x) - f(x_0)]}{x - x_0} = \lim_{x \to x_0}f(x)\frac{g(x) - g(x_0)}{x - x_0} + \lim_{x \to x_0}g(x_0)\frac{f(x) - f(x_0)}{x - x_0}$$
$$= f(x_0)g'(x_0) + g(x_0)f'(x_0).$$

Since this is true for any x = x0, this proves the rule.

Example 7.2.5 Suppose y = (2x + 3)(3x²). Let f(x) = 2x + 3 and g(x) = 3x². Then f′(x) = 2 and g′(x) = 6x. Hence,
$$\frac{d}{dx}[(2x + 3)(3x^2)] = (2x + 3) \cdot 6x + 3x^2 \cdot 2 = 12x^2 + 18x + 6x^2 = 18x^2 + 18x.$$
As an extension of the rule to the case of three functions, we have
$$\frac{d}{dx}[f(x)g(x)h(x)] = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x).$$

Finding Marginal-Revenue Function from Average-Revenue Function

Suppose that the average-revenue (AR) function is specified by

AR = 15 − Q.

The the total-revenue (TR) function is

T R ≡ AR · Q = 15Q − Q2 .

Then, the marginal-revenue (MR) function is given by

d
MR ≡ T R = 15 − 2Q.
dQ

In general, if AR = f (Q), then

T R ≡ AR · Q = Qf (Q).

Thus
d
MR ≡ T R = f (Q) + Qf ′ (Q).
dQ
From this, we can tell relationship between M R and AR. Since

M R − AR = Qf ′ (Q),

they will always differ the amount of Qf ′ (Q). Also, since

TR PQ
AR ≡ = = P,
Q Q

we can view AR as the inverse demand function for the product of the
firm. If the market is perfectly competitive, i.e., the firm takes the price as
given, then P = f(Q) = constant. Hence f′(Q) = 0. Thus MR − AR = 0

or M R = AR. Under imperfect competition, on the other hand, the AR


curve is normally downward-sloping, so that f ′ (Q) < 0. Thus M R < AR.
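To illustrate the computation, here is a small symbolic sketch (assuming Python with sympy) that recovers MR from the linear AR schedule above and confirms that MR − AR = Qf′(Q):

    import sympy as sp

    Q = sp.symbols('Q', positive=True)
    AR = 15 - Q                      # average revenue (inverse demand), f(Q)
    TR = AR * Q                      # total revenue
    MR = sp.diff(TR, Q)              # marginal revenue

    print(MR)                        # 15 - 2*Q
    print(sp.simplify(MR - AR))      # -Q, i.e. Q*f'(Q) with f'(Q) = -1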

Quotient Rule

d/dx [f(x)/g(x)] = [f′(x)g(x) − f(x)g′(x)] / g²(x).
We will come back to prove this rule after learning the chain rule.

Example 7.2.6

d/dx [(2x − 3)/(x + 1)] = [2(x + 1) − (2x − 3)(1)]/(x + 1)² = 5/(x + 1)².

d/dx [5x/(x² + 1)] = [5(x² + 1) − 5x(2x)]/(x² + 1)² = 5(1 − x²)/(x² + 1)².

d/dx [(ax² + b)/(cx)] = [2ax(cx) − (ax² + b)c]/(cx)² = c(ax² − b)/(cx)² = (ax² − b)/(cx²).

Relationship Between Marginal-Cost and Average-Cost Functions

As an economic application of the quotient rule, let us consider the rate


of change of average cost when output varies.
Given a total cost function C = C(Q), the average cost (AC) function
and the marginal-cost (M C) function are given by

AC ≡ C(Q)/Q  (Q > 0),

and

MC ≡ C′(Q).

Figure 7.1: Graphical representation of the relationship between the marginal-cost and average-cost functions.

The rate of change of AC with respect to Q can be found by differenti-


ating AC:
d/dQ [C(Q)/Q] = [C′(Q)Q − C(Q)]/Q²
             = (1/Q)[C′(Q) − C(Q)/Q]
             = (1/Q)[MC(Q) − AC(Q)].

From this it follows that, for Q > 0:

d(AC)/dQ > 0 iff MC(Q) > AC(Q);
d(AC)/dQ = 0 iff MC(Q) = AC(Q);
d(AC)/dQ < 0 iff MC(Q) < AC(Q).
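The identity d(AC)/dQ = (1/Q)[MC(Q) − AC(Q)] can also be verified symbolically. The following is a minimal sketch (assuming sympy, and reusing the cost function of Example 7.2.2):

    import sympy as sp

    Q = sp.symbols('Q', positive=True)
    C = Q**3 - 4*Q**2 + 10*Q + 75    # total-cost function from Example 7.2.2
    AC = C / Q
    MC = sp.diff(C, Q)

    # d(AC)/dQ - (MC - AC)/Q should simplify to zero identically
    print(sp.simplify(sp.diff(AC, Q) - (MC - AC)/Q))   # 0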

7.3 Rules of Differentiation Involving Functions of Different Variables
Now we consider cases where there are two or more differentiable func-
tions, each of which has a distinct independent variable.

Chain Rule

If we have a function z = f (y), where y is in turn a function of another


variable x, say, y = g(x), then the derivative of z with respect to x is given
by
dz/dx = dz/dy · dy/dx = f′(y)g′(x).   (Chain Rule)

The chain rule appeals easily to intuition. Given a ∆x, there results a corresponding ∆y via the function y = g(x), and this ∆y will in turn bring about a ∆z via the function z = f(y).
Proof. Note that

dz/dx = lim_{∆x→0} ∆z/∆x = lim_{∆x→0} (∆z/∆y)(∆y/∆x).

Since ∆x → 0 implies ∆y → 0, which in turn implies ∆z → 0, we then have

dz/dx = dz/dy · dy/dx = f′(y)g′(x).  Q.E.D.

In view of the function y = g(x), we can express the function z = f(y) as z = f(g(x)), where the contiguous appearance of the two function symbols f and g indicates that this is a composite function (a function of a function). So sometimes the chain rule is also called the composite-function rule.
As an application of this rule, we use it to prove the quotient rule.
For z = 1/g(x), let z = 1/y = y⁻¹ and y = g(x). Then we have

dz/dx = dz/dy · dy/dx = −(1/y²)g′(x) = −g′(x)/g²(x).

Thus,

d/dx [f(x)/g(x)] = d/dx [f(x) · g⁻¹(x)]
                = f′(x)g⁻¹(x) + f(x) d/dx [g⁻¹(x)]
                = f′(x)g⁻¹(x) + f(x)[−g′(x)/g²(x)]
                = [f′(x)g(x) − f(x)g′(x)] / g²(x).  Q.E.D.

Example 7.3.1 If z = 3y² and y = 2x + 5, then

dz/dx = (dz/dy)(dy/dx) = 6y · 2 = 12y = 12(2x + 5).

Example 7.3.2 If z = y − 3 and y = x³, then

dz/dx = (dz/dy)(dy/dx) = 1 · 3x² = 3x².

The usefulness of this rule can best be appreciated when one must dif-
ferentiate a function such as those below.

Example 7.3.3 z = (x² + 3x − 2)¹⁷. Let z = y¹⁷ and y = x² + 3x − 2. Then

dz/dx = (dz/dy)(dy/dx) = 17y¹⁶ · (2x + 3) = 17(x² + 3x − 2)¹⁶(2x + 3).

Once one is familiar with the chain rule, it is unnecessary to introduce intermediate variables explicitly in order to find the derivative of such a function.
We can find the derivative of a more general function by applying the
chain rule repeatedly.

Example 7.3.4 z = [(x³ − 2x + 1)³ + 3x]⁻². Applying the chain rule repeatedly, we have

dz/dx = −2[(x³ − 2x + 1)³ + 3x]⁻³ [3(x³ − 2x + 1)²(3x² − 2) + 3].
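A symbolic differentiator applies the chain rule automatically, which gives a convenient check of Example 7.3.4. A minimal sketch (assuming sympy):

    import sympy as sp

    x = sp.symbols('x')
    z = ((x**3 - 2*x + 1)**3 + 3*x)**(-2)

    dz = sp.diff(z, x)               # sympy applies the chain rule repeatedly
    expected = -2*((x**3 - 2*x + 1)**3 + 3*x)**(-3) \
               * (3*(x**3 - 2*x + 1)**2*(3*x**2 - 2) + 3)
    print(sp.simplify(dz - expected))   # 0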

Example 7.3.5 Suppose TR = f(Q), where output Q is a function of labor input L, or Q = g(L). Then, by the chain rule, the marginal revenue product of labor (MRP_L) is

MRP_L = dTR/dL = (dTR/dQ)(dQ/dL) = f′(Q)g′(L) = MR · MP_L,

where MP_L = g′(L) is the marginal physical product of labor. Thus the result shown above constitutes the mathematical statement of the well-known result in economics that MRP_L = MR · MP_L.

Inverse-Function Rule

If a function y = f (x) represents a one-to-one mapping, i.e., if the func-


tion is such that a different value of x will always yield a different value of
y, then the function f will have an inverse function x = f −1 (y). Here, the
symbol f −1 is a function symbol which signifies a function related to the

function f ; it does not mean the reciprocal of the function f (x). When x
and y refer specifically to numbers, the property of one-to-one mapping is
seen to be unique to the class of functions known as monotonic functions.

Definition 7.3.1 A function f is said to be monotonically increasing (de-


creasing) if x1 > x2 implies f (x1 ) > f (x2 ) (f (x1 ) < f (x2 )).

In either of these cases, an inverse function f −1 exists.


A practical way of ascertaining the monotonicity of a given function y = f(x) is to check whether the derivative f′(x) always keeps the same algebraic sign for all values of x. Geometrically, this means that the curve slopes either always upward or always downward.

Example 7.3.6 Suppose y = 5x + 25. Since y ′ = 5 for all x, the function


is monotonic and thus the inverse function exists. In fact, it is given by x = y/5 − 5.

If an inverse function exists, the original and the inverse functions must
be both monotonic. Moreover, if f −1 is the inverse function of f , then f
must be the inverse function of f −1 .
In general, we may not have the explicit inverse function. However,
we can easily find the derivative of an inverse function by the following
inverse function rule:
dx/dy = 1 / (dy/dx).

Proof.

dx/dy = lim_{∆y→0} ∆x/∆y = lim_{∆x→0} 1/(∆y/∆x) = 1/y′,

by noting that ∆y → 0 implies ∆x → 0.

Example 7.3.7 Suppose y = x⁵ + x. Then

dx/dy = 1/(dy/dx) = 1/(5x⁴ + 1).

Example 7.3.8 Given y = ln x, its inverse is x = e^y. Therefore, by the inverse-function rule, we have

dx/dy = 1/(dy/dx) = 1/(1/x) = x = e^y.
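Both examples can be checked with a short symbolic sketch (assuming sympy):

    import sympy as sp

    x = sp.symbols('x', positive=True)

    # Example 7.3.7: y = x^5 + x
    print(1 / sp.diff(x**5 + x, x))      # 1/(5*x**4 + 1)

    # Example 7.3.8: y = ln x, so dx/dy = 1/(1/x) = x
    print(1 / sp.diff(sp.log(x), x))     # x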

7.4 Integration (The Case of One Variable)

Let f(x) be a continuous function. The indefinite integral of f (denoted by ∫ f(x)dx) is defined as

∫ f(x)dx = F(x) + C,

where F(x) is such that F′(x) = f(x), and C is an arbitrary constant.

Rules of Integration

• ∫ [af(x) + bg(x)]dx = a∫ f(x)dx + b∫ g(x)dx, where a and b are constants (linearity of the integral);

• ∫ f′(x)g(x)dx = f(x)g(x) − ∫ f(x)g′(x)dx (integration by parts);

• ∫ f(u(t)) (du/dt) dt = ∫ f(u)du (integration by substitution).

Some Special Rules of Integration:

∫ [f′(x)/f(x)] dx = ln |f(x)| + C;
∫ (1/x) dx = ln |x| + C;
∫ e^x dx = e^x + C;
∫ f′(x)e^{f(x)} dx = e^{f(x)} + C;
∫ x^a dx = x^{a+1}/(a + 1) + C, a ≠ −1;
∫ a^x dx = a^x/ln a + C, a > 0.

Example 7.4.1

∫ [(x² + 2x + 1)/x] dx = ∫ x dx + ∫ 2 dx + ∫ (1/x) dx = x²/2 + 2x + ln |x| + C.

Example 7.4.2 Letting z = x² (so dz = 2x dx),

∫ x e^{−x²} dx = −(1/2) ∫ (−2x) e^{−x²} dx = (1/2) ∫ e^{−z} dz = −e^{−x²}/2 + C.

Example 7.4.3

∫ x e^x dx = x e^x − ∫ e^x dx = x e^x − e^x + C.
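These antiderivatives can be reproduced with a symbolic integrator. A minimal sketch (assuming sympy; note that sympy omits the arbitrary constant C):

    import sympy as sp

    x = sp.symbols('x')

    print(sp.integrate((x**2 + 2*x + 1)/x, x))   # x**2/2 + 2*x + log(x)
    print(sp.integrate(x*sp.exp(-x**2), x))      # -exp(-x**2)/2
    print(sp.integrate(x*sp.exp(x), x))          # (x - 1)*exp(x)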

Definition 7.4.1 (The Newton-Leibniz Formula) The definite integral of a continuous function f is

∫_a^b f(x)dx = F(x)|_a^b = F(b) − F(a)

for F(x) such that F′(x) = f(x) for all x ∈ [a, b].

Remark 7.4.1 The indefinite integral is a function. The definite integral is


a number.

Properties of Definite Integrals:

∫_a^b [αf(x) + βg(x)]dx = α∫_a^b f(x)dx + β∫_a^b g(x)dx;

∫_a^b f(x)dx = −∫_b^a f(x)dx;

∫_a^a f(x)dx = 0;

∫_a^b f(x)dx = ∫_a^c f(x)dx + ∫_c^b f(x)dx;

|∫_a^b f(x)dx| ≤ ∫_a^b |f(x)|dx;

∫_a^b f(g(x))g′(x)dx = ∫_{g(a)}^{g(b)} f(u)du, u = g(x) (change of variable);

∫_a^b f′(x)g(x)dx = f(x)g(x)|_a^b − ∫_a^b f(x)g′(x)dx (integration by parts),

where a, b, c, α, β are real numbers.

Some More Useful Results:

d/dλ ∫_{a(λ)}^{b(λ)} f(x)dx = f(b(λ))b′(λ) − f(a(λ))a′(λ).

Example 7.4.4

d/dx ∫_a^x f(t)dt = f(x).

7.5 Partial Differentiation

So far, we have considered only the derivative of functions of a single indepen-


dent variable. In comparative-static analysis, however, we are likely to
encounter the situation in which several parameters appear in a model,
so that the equilibrium value of each endogenous variable may be a func-
tion of more than one parameter. Because of this, we now consider the
derivative of a function of more than one variable.

Partial Derivatives

Consider a function

y = f (x1 , x2 , · · · , xn ),

where the variables xi (i = 1, 2, · · · , n) are all independent of one another,


so that each can vary by itself without affecting the others. If the variable
xi changes ∆xi while the other variables remain fixed, there will be a cor-
responding change in y, namely, ∆y. The difference quotient in this case
can be expressed as

∆y/∆x_i = [f(x₁, x₂, ..., x_{i−1}, x_i + ∆x_i, x_{i+1}, ..., xₙ) − f(x₁, x₂, ..., xₙ)] / ∆x_i.

If we take the limit of ∆y/∆xi , that limit will constitute a derivative.


We call it the partial derivative of y with respect to x_i. The process of taking partial derivatives is called partial differentiation. Denote the partial derivative of y with respect to x_i by ∂y/∂x_i, i.e.,

∂y/∂x_i = lim_{∆x_i→0} ∆y/∆x_i.

Also we can use fi to denote ∂y/∂xi . If the function happens to be



written in terms of unsubscripted variables, such as y = f(u, v, w), one also uses f_u, f_v, f_w to denote the partial derivatives.

Techniques of Partial Differentiation

Partial differentiation differs from the previously discussed differen-


tiation primarily in that we must hold the other independent variables
constant while allowing one variable to vary.

Example 7.5.1 Suppose that y = f(x₁, x₂) = 3x₁² + x₁x₂ + 4x₂². Find ∂y/∂x₁ and ∂y/∂x₂. We have

∂y/∂x₁ ≡ ∂f/∂x₁ = 6x₁ + x₂;
∂y/∂x₂ ≡ ∂f/∂x₂ = x₁ + 8x₂.

Example 7.5.2 For y = f(u, v) = (u + 4)(3u + 2v), we have

∂y/∂u ≡ f_u = (3u + 2v) + (u + 4) · 3 = 6u + 2v + 12;
∂y/∂v ≡ f_v = 2(u + 4).

When u = 2 and v = 1, f_u(2, 1) = 26 and f_v(2, 1) = 12.

Example 7.5.3 Given y = (3u − 2v)/(u² + 3v),

∂y/∂u = [3(u² + 3v) − (3u − 2v)(2u)]/(u² + 3v)² = (−3u² + 4uv + 9v)/(u² + 3v)²;

and

∂y/∂v = [−2(u² + 3v) − (3u − 2v) · 3]/(u² + 3v)² = −u(2u + 9)/(u² + 3v)².
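Example 7.5.3 can be verified symbolically. A minimal sketch (assuming sympy):

    import sympy as sp

    u, v = sp.symbols('u v')
    y = (3*u - 2*v) / (u**2 + 3*v)

    print(sp.simplify(sp.diff(y, u)))   # (-3*u**2 + 4*u*v + 9*v)/(u**2 + 3*v)**2
    print(sp.simplify(sp.diff(y, v)))   # -u*(2*u + 9)/(u**2 + 3*v)**2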

7.6 Applications to Comparative-Static Analysis

Equipped with the knowledge of the various rules of differentiation, we


can at last tackle the problem posed in comparative-static analysis: name-
ly, how the equilibrium value of an endogenous variable will change when
there is a change in any of the exogenous variables or parameters.

Market Model

For the one-commodity market model:

Qd = a − bp (a, b > 0);


Qs = −c + dp (c, d > 0),

the equilibrium price and quantity are given by

p̄ = (a + c)/(b + d);
Q̄ = (ad − bc)/(b + d).

These solutions will be referred to as being in the reduced form: the


two endogenous variables have been reduced to explicit expressions of
the four independent variables, a, b, c, and d.
To find how an infinitesimal change in one of the parameters will af-
fect the value of p̄ or Q̄, one has only to find out its partial derivatives. If
the sign of a partial derivative can be determined, we will know the di-
rection in which p̄ will move when a parameter changes; this constitutes
a qualitative conclusion. If the magnitude of the partial derivative can be
ascertained, it will constitute a quantitative conclusion.
Also, to avoid misunderstanding, a clear distinction should be made
between the two derivatives, say, ∂ Q̄/∂a and ∂Qd /∂a. The latter derivative

is a concept appropriate to the demand function taken alone, without regard to the supply function. The derivative ∂Q̄/∂a, on the other hand, pertains to the equilibrium quantity, which takes into account the interaction of demand and supply together. To emphasize this distinction, we refer to the partial derivatives of p̄ and Q̄ with respect to the parameters as comparative-static derivatives.

Figure 7.2: The graphical illustration of comparative statics: (a) increase in


a; (b) increase in b; (c) increase in c, and (d) increase in d.

For instance, for p̄, we have

∂p̄/∂a = 1/(b + d);
∂p̄/∂b = −(a + c)/(b + d)²;
∂p̄/∂c = 1/(b + d);
∂p̄/∂d = −(a + c)/(b + d)².

Thus ∂p̄/∂a = ∂p̄/∂c > 0 and ∂p̄/∂b = ∂p̄/∂d < 0.
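These four comparative-static derivatives can be generated mechanically. A minimal sketch (assuming sympy):

    import sympy as sp

    a, b, c, d = sp.symbols('a b c d', positive=True)
    p_bar = (a + c) / (b + d)

    for param in (a, b, c, d):
        print(param, sp.diff(p_bar, param))
    # a: 1/(b + d); b: -(a + c)/(b + d)**2; c: 1/(b + d); d: -(a + c)/(b + d)**2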

National-Income Model

Y = C + I0 + G0 (equilibrium condition);
C = α + β(Y − T ) (α > 0; 0 < β < 1);
T = γ + δY (γ > 0; 0 < δ < 1),

where the endogenous variables are the national income Y , consumption


C, and taxes T . The equilibrium income (in reduced form) is

Ȳ = (α − βγ + I₀ + G₀)/(1 − β + βδ).

Thus,

∂Ȳ/∂G₀ = 1/(1 − β + βδ) > 0 (the government-expenditure multiplier);
∂Ȳ/∂γ = −β/(1 − β + βδ) < 0;
∂Ȳ/∂δ = −β(α − βγ + I₀ + G₀)/(1 − β + βδ)² = −βȲ/(1 − β + βδ) < 0.

7.7 Note on Jacobian Determinants

Partial derivatives can also provide a means of testing whether there exists
functional (linear or nonlinear) dependence among a set of n variables.
This is related to the notion of Jacobian determinants.
Consider n differentiable functions in n variables, not necessarily linear:

y1 = f 1 (x1 , x2 , · · · , xn );

y2 = f 2 (x1 , x2 , · · · , xn );

··· ;

yn = f n (x1 , x2 , · · · , xn ),

where the symbol f^i denotes the ith function. We can derive a total of n² partial derivatives

∂y_i/∂x_j  (i = 1, 2, ..., n; j = 1, 2, ..., n).

We can arrange them into a square matrix, called the Jacobian matrix and denoted by J, and then take its determinant; the result will be what is known as a Jacobian determinant (or a Jacobian, for short), denoted by |J|:

                                          | ∂y₁/∂x₁  ∂y₁/∂x₂  ···  ∂y₁/∂xₙ |
|J| = ∂(y₁, y₂, ..., yₙ)/∂(x₁, x₂, ..., xₙ) = | ∂y₂/∂x₁  ∂y₂/∂x₂  ···  ∂y₂/∂xₙ |.
                                          |   ···      ···    ···    ···   |
                                          | ∂yₙ/∂x₁  ∂yₙ/∂x₂  ···  ∂yₙ/∂xₙ |

Example 7.7.1 Consider two functions:

y₁ = 2x₁ + 3x₂;
y₂ = 4x₁² + 12x₁x₂ + 9x₂².

Then the Jacobian determinant is

|J| = | ∂y₁/∂x₁  ∂y₁/∂x₂ | = |      2             3       |
      | ∂y₂/∂x₁  ∂y₂/∂x₂ |   | 8x₁ + 12x₂    12x₁ + 18x₂ |.

A Jacobian test for the existence of functional dependence among a set


of n functions is provided by the following theorem:

Theorem 7.7.1 The n functions f¹, f², ..., fⁿ are functionally (linearly or nonlinearly) dependent if and only if the Jacobian determinant |J| defined above is identically zero for all values of x₁, x₂, ..., xₙ.

For the above example, since

|J| = ∂(y₁, y₂)/∂(x₁, x₂) = 2(12x₁ + 18x₂) − 3(8x₁ + 12x₂) ≡ 0

for all x₁ and x₂, y₁ and y₂ are functionally dependent. In fact, y₂ is simply y₁ squared.
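The vanishing of the Jacobian in Example 7.7.1 can be confirmed symbolically. A minimal sketch (assuming sympy):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    y1 = 2*x1 + 3*x2
    y2 = 4*x1**2 + 12*x1*x2 + 9*x2**2

    J = sp.Matrix([y1, y2]).jacobian([x1, x2])
    print(sp.simplify(J.det()))   # 0 identically, so y1 and y2 are dependent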
Let us now consider the special case of linear functions. Consider the linear-equation system Ax = d, i.e.,

a₁₁x₁ + a₁₂x₂ + ··· + a₁ₙxₙ = d₁;
a₂₁x₁ + a₂₂x₂ + ··· + a₂ₙxₙ = d₂;
···
aₙ₁x₁ + aₙ₂x₂ + ··· + aₙₙxₙ = dₙ.

We have earlier shown that the rows of the coefficient matrix A are linearly dependent if and only if |A| = 0. This result can now be interpreted as a special

application of the Jacobian criterion of functional dependence.


To see this, take the left side of each equation Ax = d as a separate
function of the n variables x1 , x2 , · · · , xn , and denote these functions by
y1 , y2 , · · · , yn . Then we have ∂yi /∂xj = aij . In view of this, the elements of
|J | will be precisely the elements of A, i.e., |J | = |A| and thus the Jacobian
criterion of functional dependence among y1 , y2 , · · · , yn is equivalent to the
criterion |A| = 0 in the present linear case.
Chapter 8

Comparative-Static Analysis of
General-Functions

The study of partial derivatives has enabled us, in the preceding chapter,
to handle the simple type of comparative-static problems, in which the e-
quilibrium solution of the model can be explicitly stated in the reduced
form. We note that the definition of the partial derivative requires the ab-
sence of any functional relationship among the independent variables. As
applied to comparative-static analysis, this means that parameters and/or
exogenous variables which appear in the reduced-form solution must be
mutually independent.

However, no such expediency should be expected when, owing to the


inclusion of general functions in a model, no explicit reduced-form solu-
tion can be obtained. In such a case, we will have to find the comparative-
static derivatives directly from the originally given equations in the model.
Take, for instance, a simple national-income model with two endogenous
variables Y and C:

Y = C + I₀ + G₀ (equilibrium condition);


C = C(Y, T0 ) (T0 : exogenous taxes),

which reduces to a single equation

Y = C(Y, T0 ) + I0 + G0

to be solved for Ȳ . We must, therefore, find the comparative-static deriva-


tives directly from this equation. How might we approach the problem?

Let us suppose that an equilibrium solution Ȳ does exist. We may


write the equation
Ȳ = Ȳ (I0 , G0 , T0 ),

even though we are unable to determine explicitly the form which this
function takes. Furthermore, in some neighborhood of Ȳ , the following
identical equality will hold:

Ȳ ≡ C(Ȳ , T0 ) + I0 + G0 .

Since Ȳ is a function of T0 , the two arguments of the C function are not


independent. T0 can in this case affect C not only directly, but also indi-
rectly via Ȳ . Consequently, partial differentiation is no longer appropriate
for our purposes. In this case, we must resort to total differentiation (as against partial differentiation). The process of total differentiation can lead us to the related concept of the total derivative. Once we become familiar with these concepts, we shall be able to deal with functions whose arguments are not all independent, so that we can study the comparative statics of a general-function model.

8.1 Differentials
The symbol dy/dx has so far been regarded as a single entity. We shall now reinterpret it as a ratio of two quantities, dy and dx.

Differentials and Derivatives

Given a function y = f(x), we can use the difference quotient ∆y/∆x to represent the rate of change of y with respect to x. Since

∆y ≡ (∆y/∆x) ∆x,                         (8.1.1)

the magnitude of ∆y can be found once ∆y/∆x and the variation ∆x are known. If we denote the infinitesimal changes in x and y, respectively, by dx and dy, the identity (8.1.1) becomes

dy ≡ (dy/dx) dx,                          (8.1.2)

or

dy = f′(x)dx.                             (8.1.3)

The symbols dy and dx are called the differentials of y and x, respec-


tively.
Dividing the identity (8.1.2) through by dx, we have

(dy)/(dx) ≡ dy/dx,

or

(dy)/(dx) ≡ f′(x).
This result shows that the derivative dy/dx ≡ f ′ (x) may be interpreted

as the quotient of two separate differentials dy and dx.


On the basis of (8.1.2), once we are given f′(x), dy can immediately be written as f′(x)dx. The derivative f′(x) may thus be viewed as a "converter" that serves to convert an infinitesimal change dx into a corresponding change dy.

Example 8.1.1 Given y = 3x² + 7x − 5, find dy. Since f′(x) = 6x + 7, the


desired differential is
dy = (6x + 7)dx.

The following diagram shows the relationship between ∆y and dy.

Figure 8.1: Graphical illustration of the relationship between ∆y and dy.
∆y ≡ (∆y/∆x)∆x = (CB/AC) · AC = CB;

dy ≡ (dy/dx)∆x = (CD/AC) · AC = CD,

which differs from ∆y by an error of DB.

Remark 8.1.1 The process of finding the differential dy is also called differentiation. Recall that we have used this term as a synonym for finding the derivative dy/dx. To avoid confusion, we add the phrase "with respect to x" when we mean taking the derivative dy/dx.

Differentials and Point Elasticity

As an illustration of the application of differentials in economics, let us


consider the elasticity of a function. For a demand function Q = f (P ), for
instance, the price elasticity of demand is defined as (∆Q/Q)/(∆P/P ), the
ratio of percentage change in quantity demanded and percentage change
in price. Now if ∆P → 0, then ∆P and ∆Q reduce to the differentials dP and dQ, and the elasticity becomes

ϵ_d ≡ (dQ/Q)/(dP/P) = (dQ/dP)/(Q/P) = (marginal demand function)/(average demand function).

In general, for a given function y = f(x), the point elasticity of y with respect to x is defined as

ϵ_yx = (dy/dx)/(y/x) = (marginal function)/(average function).

Example 8.1.2 Find ϵd if the demand function is Q = 100 − 2P . Since


dQ/dP = −2 and Q/P = (100 − 2P)/P, we have ϵ_d = −P/(50 − P). Thus the demand is inelastic (|ϵ_d| < 1) for 0 < P < 25, unit elastic (|ϵ_d| = 1) for P = 25, and elastic (|ϵ_d| > 1) for 25 < P < 50.

8.2 Total Differentials


The concept of differentials can easily be extended to a function of two or
more independent variables. Consider a savings function

S = S(Y, i),

where S is savings, Y is national income, and i is interest rate. If the func-


tion is continuous and possesses continuous partial derivatives, the total
differential is defined by

dS = (∂S/∂Y)dY + (∂S/∂i)di.

That is, the infinitesimal change in S is the sum of the changes in S resulting from the infinitesimal changes in Y and in i.

Remark 8.2.1 If i remains constant, the total differential reduces to the partial differential:

∂S/∂Y = (dS/dY)|_{i constant}.

Furthermore, in the general case of a function of n variables y = f(x₁, x₂, ..., xₙ), the total differential of this function is given by

df = (∂f/∂x₁)dx₁ + (∂f/∂x₂)dx₂ + ··· + (∂f/∂xₙ)dxₙ = ∑_{i=1}^n f_i dx_i,

in which each term on the right side indicates the amount of change in y
resulting from an infinitesimal change in one of n variables.
Similar to the case of one variable, the n partial elasticities can be written as

ϵ_{f x_i} = (∂f/∂x_i)(x_i/f)   (i = 1, 2, ..., n).

8.3 Rule of Differentials

Let c be constant and u and v be two functions of the variables x1 , x2 , · · · , xn .


Then the following rules are valid:

Rule I: dc = 0;

Rule II: d(cu^a) = ca·u^{a−1}du;

Rule III: d(u ± v) = du ± dv;

Rule IV: d(uv) = vdu + udv;

Rule V: d(u/v) = (v du − u dv)/v².

Example 8.3.1 Find dy of the function y = 5x₁² + 3x₂. There are two ways to find dy. One is the straightforward method of finding ∂f/∂x₁ and ∂f/∂x₂: ∂f/∂x₁ = 10x₁ and ∂f/∂x₂ = 3, which enable us to write

dy = f₁dx₁ + f₂dx₂ = 10x₁dx₁ + 3dx₂.

The other way is to use the rules given above by letting u = 5x₁² and v = 3x₂:

dy = d(5x₁²) + d(3x₂)   (by Rule III)
   = 10x₁dx₁ + 3dx₂     (by Rule II).

Example 8.3.2 Find dy of the function y = 3x₁² + x₁x₂². Since f₁ = 6x₁ + x₂² and f₂ = 2x₁x₂, the desired differential is

dy = (6x₁ + x₂²)dx₁ + 2x₁x₂dx₂.



By applying the given rules, the same result can be arrived at:

dy = d(3x₁²) + d(x₁x₂²)
   = 6x₁dx₁ + x₂²dx₁ + 2x₁x₂dx₂
   = (6x₁ + x₂²)dx₁ + 2x₁x₂dx₂.

Example 8.3.3 For the function

y = (x₁ + x₂)/(2x₁²),

we have

f₁ = −(x₁ + 2x₂)/(2x₁³)  and  f₂ = 1/(2x₁²),

so that

dy = −[(x₁ + 2x₂)/(2x₁³)]dx₁ + [1/(2x₁²)]dx₂.

The same result can also be obtained by applying the given rules:

dy = (1/4x₁⁴)[2x₁²d(x₁ + x₂) − (x₁ + x₂)d(2x₁²)]   [by Rule V]
   = (1/4x₁⁴)[2x₁²(dx₁ + dx₂) − (x₁ + x₂)4x₁dx₁]
   = (1/4x₁⁴)[−2x₁(x₁ + 2x₂)dx₁ + 2x₁²dx₂]
   = −[(x₁ + 2x₂)/(2x₁³)]dx₁ + [1/(2x₁²)]dx₂.

For the case of more than two functions, we have:

Rule VI: d(u ± v ± w) = du ± dv ± dw;

Rule VII: d(uvw) = vwdu + uwdv + uvdw.



8.4 Total Derivatives

Consider a function

y = f (x, w) with x = g(w).

Here, the variable w can affect y through two channels: (1) indirectly, via the function g and then f, and (2) directly, via f. Unlike a partial derivative, the total derivative does not require the argument x to remain constant as w varies, and can thus allow for the postulated relationship between the two variables. Whereas the partial derivative f_w is adequate for expressing the direct effect alone, the total derivative is needed to express both effects jointly.

To get the total derivative, we first get the total differential

dy = f_x dx + f_w dw.

Dividing both sides of this equation by dw, we have the total derivative:

dy/dw = f_x (dx/dw) + f_w (dw/dw)
      = (∂y/∂x)(dx/dw) + ∂y/∂w.

Example 8.4.1 Find dy/dw, given the function

y = f(x, w) = 3x − w²  with  x = g(w) = 2w² + w + 4.

We have

dy/dw = f_x (dx/dw) + f_w = 3(4w + 1) − 2w = 10w + 3.

As a check, we may substitute the function g into f, to get

y = 3(2w² + w + 4) − w² = 5w² + 3w + 12,



which is now a function of w alone. Then we also have

dy/dw = 10w + 3,

and thus we have the identical answer.
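The check performed above by hand can also be done symbolically. A minimal sketch (assuming sympy):

    import sympy as sp

    w = sp.symbols('w')
    x = 2*w**2 + w + 4        # x = g(w)
    y = 3*x - w**2            # substitute g into f directly

    print(sp.expand(y))       # 5*w**2 + 3*w + 12
    print(sp.diff(y, w))      # 10*w + 3, the total derivative dy/dw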

Example 8.4.2 Suppose that a utility function is given by

U = U (s, c),

where c is the amount of coffee consumed and s is the amount of sugar


consumed, and another function s = g(c) indicates the complementarity
between these two goods. Then we can find the marginal utility of coffee
given by
MU_c = dU/dc = (∂U/∂s)g′(c) + ∂U/∂c.

Through the inverse-function rule for c = g⁻¹(s), we can also find the marginal utility of sugar given by

MU_s = dU/ds = (∂U/∂c)(dc/ds) + ∂U/∂s = (∂U/∂c)(1/g′(c)) + ∂U/∂s.

The marginal rate of substitution of coffee for sugar, MRS_cs, is then given by

MRS_cs = MU_c/MU_s = [(∂U/∂s)g′(c) + ∂U/∂c] / [(∂U/∂c)(1/g′(c)) + ∂U/∂s]
       = g′(c) · [(∂U/∂s)g′(c) + ∂U/∂c] / [∂U/∂c + (∂U/∂s)g′(c)]
       = g′(c).

A Variation on the Theme

For a function

y = f(x₁, x₂, w)

with x₁ = g(w) and x₂ = h(w), the total derivative of y is given by

dy/dw = (∂f/∂x₁)(dx₁/dw) + (∂f/∂x₂)(dx₂/dw) + ∂f/∂w.

Example 8.4.3 Let a production function be

Q = Q(K, L, t),

where K is the capital input, L is the labor input, and t is time, which indicates that the production function can shift over time in reflection of technological change. Since capital and labor can also change over time, we may write

K = K(t) and L = L(t).

Thus the rate of change of output with respect to time can be denoted as

dQ/dt = (∂Q/∂K)(dK/dt) + (∂Q/∂L)(dL/dt) + ∂Q/∂t.

Another Variation on the Theme

Now suppose a function is given as

y = f(x₁, x₂, u, v)

with x₁ = g(u, v) and x₂ = h(u, v). We can find the total derivative of y with respect to u (while v is held constant). Since

dy = (∂f/∂x₁)dx₁ + (∂f/∂x₂)dx₂ + (∂f/∂u)du + (∂f/∂v)dv,

dividing both sides of the above equation by du, we have

dy/du = (∂y/∂x₁)(dx₁/du) + (∂y/∂x₂)(dx₂/du) + ∂y/∂u + (∂y/∂v)(dv/du)
      = (∂y/∂x₁)(dx₁/du) + (∂y/∂x₂)(dx₂/du) + ∂y/∂u   [dv/du = 0 since v is constant].

Since v is held constant, the above is a partial total derivative; we denote it by the following notation:

§y/§u = (∂y/∂x₁)(∂x₁/∂u) + (∂y/∂x₂)(∂x₂/∂u) + ∂y/∂u.

Example 8.4.4 Find the partial total derivatives §z/§u and §z/§v if

z = 3x² − 2y⁴ + 5uv²,

where

x = u − v² + 4

and

y = 8u³v + v² + 1.

By applying the above formula for the partial total derivative, we have

§z/§u = (∂z/∂x)(∂x/∂u) + (∂z/∂y)(∂y/∂u) + ∂z/∂u
      = 6x · 1 − 8y³ · 24u²v + 5v²
      = 6x − 192y³u²v + 5v²

and

§z/§v = (∂z/∂x)(∂x/∂v) + (∂z/∂y)(∂y/∂v) + ∂z/∂v
      = 6x · (−2v) − 8y³ · (8u³ + 2v) + 10uv
      = −12xv − 8y³(8u³ + 2v) + 10uv.

Remark 8.4.1 In the cases we have discussed, the total derivative formu-
las can be regarded as expressions of the chain rule, or the composite-
function rule. Also the chain of derivatives does not have to be limited to
only two “links"; the concept of the total derivative should be extendible
to cases where there are three or more links in the composite function.

8.5 Implicit Function Theorem

The concept of total differentials can also enable us to find the derivatives
of the so-called “implicit functions." As such, we can still do comparative-
static analysis for general functions.

Implicit Functions

A function given in the form y = f(x₁, x₂, ..., xₙ) is called an explicit function, because the variable y is explicitly expressed as a function of the x's. But in many cases y is not given as an explicit function of x₁, x₂, ..., xₙ; instead, the relationship between y and x₁, ..., xₙ is given in the form

F(y, x₁, x₂, ..., xₙ) = 0.

Such an equation may implicitly define a function y = f(x₁, x₂, ..., xₙ).


Note that an explicit function y = f (x1 , x2 , · · · , xn ) can always be trans-

formed into an equation

F (y, x1 , x2 , · · · , xn ) ≡ y − f (x1 , x2 , · · · , xn ) = 0.

The reverse transformation is not always possible.


In view of this uncertainty, we have to impose certain conditions under
which we ensure that a given equation F (y, x1 , x2 , · · · , xn ) = 0 does indeed
define an implicit function y = f (x1 , x2 , · · · , xn ). Such a result is given by
the so-called "implicit-function theorem."

Theorem 8.5.1 (Implicit-Function Theorem) Given F (y, x1 , x2 , · · · , xn ) =


0, suppose that the following conditions are satisfied:

(a) the function F has continuous partial derivatives Fy , Fx1 , Fx2 , · · · , Fxn ;

(b) at point (y0 , x10 , x20 , · · · , xn0 ) satisfying F (y0 , x10 , x20 , · · · , xn0 ) =
0, Fy is nonzero.

Then there exists an n−dimensional neighborhood of x0 = (x10 , x20 , · · · , xn0 ),


denoted by N (x0 ), such that y is an implicitly defined function of variables x1 , x2 , · · · , xn ,
in the form of y = f (x1 , x2 , · · · , xn ), and F (y, x1 , x2 , · · · , xn ) = 0 for all points
in N (x0 ). Moreover, the implicit function f is continuous, and has continuous
partial derivatives f1 , · · · , fn .

Derivatives of Implicit Functions

Differentiating F, we have dF = 0, or

F_y dy + F₁dx₁ + ··· + Fₙdxₙ = 0.

Suppose that only y and x₁ are allowed to vary. Then the above equation reduces to F_y dy + F₁dx₁ = 0. Thus

(dy/dx₁)|_{other variables constant} ≡ ∂y/∂x₁ = −F₁/F_y.

In the simple case where the given equation is F(y, x) = 0, the rule gives

dy/dx = −F_x/F_y.

Example 8.5.1 Suppose y − 3x⁴ = 0. Then dy/dx = −F_x/F_y = −(−12x³)/1 = 12x³. In this particular case, we can easily solve the given equation for y, to get y = 3x⁴, so that dy/dx = 12x³.

Example 8.5.2 F(x, y) = x² + y² − 9 = 0. Thus,

dy/dx = −F_x/F_y = −2x/2y = −x/y  (y ≠ 0).

Example 8.5.3 For F(y, x, w) = y³x² + w³ + yxw − 3 = 0, we have

∂y/∂x = −F_x/F_y = −(2y³x + yw)/(3y²x² + xw).

In particular, at the point (1, 1, 1), ∂y/∂x = −3/4.
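The implicit-function rule dy/dx = −F_x/F_y lends itself to direct symbolic computation. A sketch for Example 8.5.3 (assuming sympy):

    import sympy as sp

    x, y, w = sp.symbols('x y w')
    F = y**3*x**2 + w**3 + y*x*w - 3

    dydx = -sp.diff(F, x) / sp.diff(F, y)       # implicit-function rule
    print(dydx.subs({x: 1, y: 1, w: 1}))        # -3/4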

Example 8.5.4 Suppose that the equation F (Q, K, L) = 0 implicitly de-


fines a production function Q = f (K, L). Then we can find M PL (marginal
product of labor) and M PK (marginal product of capital) as follows:

MP_K ≡ ∂Q/∂K = −F_K/F_Q;

MP_L ≡ ∂Q/∂L = −F_L/F_Q.

In particular, we can also find MRTS_LK (the marginal rate of technical substitution), which is given by

MRTS_LK ≡ |∂K/∂L| = F_L/F_K.

Extension to the Simultaneous-Equation Case

Consider a set of simultaneous equations:

F¹(y₁, y₂, ..., yₙ; x₁, x₂, ..., xₘ) = 0;
F²(y₁, y₂, ..., yₙ; x₁, x₂, ..., xₘ) = 0;
···
Fⁿ(y₁, y₂, ..., yₙ; x₁, x₂, ..., xₘ) = 0.

Suppose that F¹, F², ..., Fⁿ are differentiable. Taking differentials on both sides of each equation in the system, we then have

(∂F¹/∂y₁)dy₁ + (∂F¹/∂y₂)dy₂ + ··· + (∂F¹/∂yₙ)dyₙ = −[(∂F¹/∂x₁)dx₁ + (∂F¹/∂x₂)dx₂ + ··· + (∂F¹/∂xₘ)dxₘ];
(∂F²/∂y₁)dy₁ + (∂F²/∂y₂)dy₂ + ··· + (∂F²/∂yₙ)dyₙ = −[(∂F²/∂x₁)dx₁ + (∂F²/∂x₂)dx₂ + ··· + (∂F²/∂xₘ)dxₘ];
···
(∂Fⁿ/∂y₁)dy₁ + (∂Fⁿ/∂y₂)dy₂ + ··· + (∂Fⁿ/∂yₙ)dyₙ = −[(∂Fⁿ/∂x₁)dx₁ + (∂Fⁿ/∂x₂)dx₂ + ··· + (∂Fⁿ/∂xₘ)dxₘ].
Or in matrix form,

[∂F^i/∂y_j] dy = −[∂F^i/∂x_k] dx,    (8.5.4)

where [∂F^i/∂y_j] is the n × n matrix of the partial derivatives of F¹, ..., Fⁿ with respect to y₁, ..., yₙ, [∂F^i/∂x_k] is the n × m matrix of their partial derivatives with respect to x₁, ..., xₘ, dy = (dy₁, dy₂, ..., dyₙ)′, and dx = (dx₁, dx₂, ..., dxₘ)′.

Now suppose that the following Jacobian determinant is nonzero:

|J| = ∂(F¹, F², ..., Fⁿ)/∂(y₁, y₂, ..., yₙ) = det[∂F^i/∂y_j] ≠ 0.

Then we can obtain the total differentials dy = (dy₁, dy₂, ..., dyₙ)′ by inverting J:

dy = −J⁻¹F_x dx,

where F_x = [∂F^i/∂x_k] is the n × m matrix defined in (8.5.4).

If we want to obtain the partial derivatives with respect to x_i (i = 1, 2, ..., m), we can do so by letting dx_k = 0 for k ≠ i and dividing both sides of (8.5.4) by dx_i. Then we have the following equation:

J (∂y₁/∂x_i, ∂y₂/∂x_i, ..., ∂yₙ/∂x_i)′ = −(∂F¹/∂x_i, ∂F²/∂x_i, ..., ∂Fⁿ/∂x_i)′.

Then, by Cramer's rule, we have

∂y_j/∂x_i = |J_j^i| / |J|   (j = 1, 2, ..., n; i = 1, 2, ..., m),

where |J_j^i| is obtained by replacing the jth column of |J| with the right-hand-side vector

−F_{x_i} = −[∂F¹/∂x_i, ∂F²/∂x_i, ..., ∂Fⁿ/∂x_i]′.

Of course, we can also find these derivatives by inverting the Jacobian matrix J:

(∂y₁/∂x_i, ∂y₂/∂x_i, ..., ∂yₙ/∂x_i)′ = −J⁻¹F_{x_i}.

In compact notation,

∂y/∂x_i = −J⁻¹F_{x_i}.

Example 8.5.5 Let the national-income model be rewritten in the form:

Y − C − I0 − G0 = 0;

C − α − β(Y − T ) = 0;

T − γ − δY = 0.

Then

      | 1  −1  0 |
|J| = | −β  1  β | = 1 − β + βδ.
      | −δ  0  1 |

Suppose that all exogenous variables and parameters are fixed except G₀. Then we have

[ 1  −1  0 ] [ ∂Ȳ/∂G₀ ]   [ 1 ]
[ −β  1  β ] [ ∂C̄/∂G₀ ] = [ 0 ].
[ −δ  0  1 ] [ ∂T̄/∂G₀ ]   [ 0 ]

We can solve the above equation for, say, ∂Ȳ/∂G₀, which comes out to be

           | 1  −1  0 |
∂Ȳ/∂G₀ =  | 0   1  β | / |J| = 1/(1 − β + βδ).
           | 0   0  1 |
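The same comparative-static derivative can be obtained by solving the linear system symbolically. A minimal sketch (assuming sympy):

    import sympy as sp

    beta, delta = sp.symbols('beta delta')
    J = sp.Matrix([[1, -1, 0],
                   [-beta, 1, beta],
                   [-delta, 0, 1]])
    rhs = sp.Matrix([1, 0, 0])       # right-hand side when only G0 varies

    sol = J.LUsolve(rhs)             # (dY/dG0, dC/dG0, dT/dG0)'
    print(sp.simplify(sol[0]))       # equals 1/(1 - beta + beta*delta)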

8.6 Comparative Statics of General-Function Models
Consider a single-commodity market model:

Qd = Qs , [equilibrium condition];
Qd = D(P, Y0 ), [∂D/∂P < 0; ∂D/∂Y0 > 0];
Qs = S(P ), [dS/dP > 0],

where Y0 is an exogenously determined income. From this model, we can


obtain a single equation:

D(P, Y0 ) − S(P ) = 0.

Even though this equation cannot be solved explicitly for the equilib-
rium price P̄ , by the implicit-function theorem, we know that there exists
the equilibrium price P̄ that is the function of Y0 :

P̄ = P̄ (Y0 ),

such that
F (P̄ , Y0 ) ≡ D(P̄ , Y0 ) − S(P̄ ) = 0.

It then requires only a straight application of the implicit-function rule


to produce the comparative-static derivative, dP̄ /dY0 :

dP̄/dY₀ = −(∂F/∂Y₀)/(∂F/∂P) = −(∂D/∂Y₀)/(∂D/∂P − dS/dP) > 0.

Since Q̄ = S(P̄), we thus have

dQ̄/dY₀ = (dS/dP)(dP̄/dY₀) > 0.

8.7 Matrix Derivatives

Matrix derivatives play an important role in economic analysis, especially


in econometrics. If A is an n × n nonsingular matrix, the derivative of its determinant with respect to A is given by

∂|A|/∂A = [C_ij],

where [C_ij] is the matrix of cofactors of A.



Some Useful Formulas

Let a, b be k × 1 vectors and M be a k × k matrix. Then we have:

d(a′b)/db = a;
d(b′a)/db = a;
d(Mb)/db = M′;
d(b′Mb)/db = (M + M′)b.

Example 8.7.1 (Find the Least Square Estimator for Multiple Regression Model)
Consider the multiple regression model:

y = Xβ + ϵ,

where n × 1 vector y is the dependent variable, X is a n × k matrix of k


explanatory variables with rank(X) = k, β is a k × 1 vector of coefficients
which are to be estimated and ϵ is a n×1 vector of disturbances. We assume
that the matrices of observations X and y are given. Our goal is to find
an estimator b for β using the least squares method. The least squares
estimator of β is a vector b, which minimizes the expression

E(b) = (y − Xb)′ (y − Xb) = y ′ y − y ′ Xb − b′ X ′ y + b′ X ′ Xb.

The first-order condition for extremum is:

dE(b)/db = 0 ⇒ −2X′y + 2X′Xb = 0 ⇒ b = (X′X)⁻¹X′y.

On the other hand, by the third rule above, we have:

d²E(b)/db² = (2X′X)′ = 2X′X.

It will be seen in Chapter 11 that, to check whether the solution b is indeed


a minimum, we need to prove the positive definiteness of the matrix X ′ X.
First, notice that X ′ X is a symmetric matrix. To prove positive definite-
ness, we take an arbitrary k × 1 vector z, z ̸= 0 and check the following
quadratic form:
z ′ (X ′ X)z = (Xz)′ (Xz)

The assumptions rank(X) = k and z ≠ 0 imply Xz ≠ 0, so that z′(X′X)z = (Xz)′(Xz) = ‖Xz‖² > 0. Thus X′X is positive definite.
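A quick numerical sanity check of the formula b = (X′X)⁻¹X′y, using synthetic data (a minimal sketch assuming numpy):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 3
    X = rng.normal(size=(n, k))
    beta_true = np.array([1.0, -2.0, 0.5])
    y = X @ beta_true + 0.1 * rng.normal(size=n)

    b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimator
    print(b)                                # close to beta_true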
Chapter 9

Optimization: Maxima and


Minima of a Function of One
Variable

The optimization problem is a core issue in economics. Rationality (i.e., that individuals pursue the maximization of their own interests in economic situations) is the most basic behavioral assumption about individual decision-makers in economics and also in practice. The basis of their analysis is solving optimization problems.

Our attention thus is turned to the study of goal equilibrium, in which


the equilibrium state is defined as the optimal position for a given econom-
ic unit and in which the said economic unit will be deliberately striving for
attainment of that equilibrium. Our primary focus will be on the classical
techniques for locating optimal positions - those using differential calcu-
lus.


9.1 Optimal Values and Extreme Values

Economics is by and large a science of choice. When an economic project is to be carried out, there are normally a number of alternative ways of accomplishing it. One (or more) of these alternatives will, however, be more desirable than the others from the standpoint of some criterion, and the essence of the optimization problem is to choose accordingly.
The most common criterion of choice among alternatives in economics is the goal of maximizing something (e.g., utility maximization, profit maximization) or minimizing something (e.g., cost minimization). Economically, we may categorize such maximization and minimization problems under the general heading of optimization. From a purely mathematical point of view, the collective term for maximum and minimum is the more matter-of-fact designation extremum, meaning an extreme value.
In formulating an optimization problem, the first order of business is
to delineate an objective function in which the dependent variable rep-
resents the objects whose magnitudes the economic unit in question can
pick and choose. We shall therefore refer to the independent variables as
choice variables.
Consider a general-form objective function

y = f (x).

Three specific cases of functions are depicted in Figure 9.1. The points E and F in (c) are relative (or local) extrema, in the sense that each of these points represents an extremum only in some neighborhood of the point. We shall continue our discussion mainly with reference to the search for relative extrema, since an absolute (or global) maximum must be either a relative maximum or occur at one of the ends of the function's domain. Thus, if we know

Figure 9.1: The extremum for various functions: (a) constant function; (b) monotonic function; (c) non-monotonic function.

all the relative maxima, it is necessary only to select the largest of these
and compare it with the end points in order to determine the absolute
maximum. Hereafter, the extreme values considered will be relative or
local ones, unless indicated otherwise.

9.2 Existence of Extremum for Continuous Function

Let X be a domain of a function f . First, we give the following concepts:

Definition 9.2.1 (Local Optimum) Let f (x) be continuous in a neighbor-


hood U of a point x0 . It is said to have a local or relative maximum (resp.
minimum) at x0 if for all x ∈ U, x ̸= x0 , f (x) ≤ f (x0 ) (resp. f (x) ≥ f (x0 )).

Definition 9.2.2 (Global Optimum) If f(x*) ≥ f(x) (resp. f(x*) > f(x) for all x ≠ x*) for all x in the domain X of the function, then the function is said to have a global (resp. unique global) maximum at x*; if f(x*) ≤ f(x) (resp. f(x*) < f(x) for all x ≠ x*) for all x in the domain of the function, then the function is said to have a global (resp. unique global) minimum at x*.

A classical conclusion about global optimization is the so-called Weier-


strass theorem.

Proposition 9.2.1 (Weierstrass’s Theorem) Suppose that f is continuous on


a closed and bounded subset X of R1 (or, in the general case, of Rn ). Then f
reaches its maximum and minimum in X, i.e. there exist points m, M ∈ X such
that f (m) ≤ f (x) ≤ f (M ), for all x ∈ X. Moreover, the set of maximal (resp.
minimal) points is compact.

In order to determine easily whether a function has an extreme point, the following sections give methods of finding extreme values by differential methods. Generally, there are two types of necessary conditions for an interior extreme point: the first-order and second-order necessary conditions.

9.3 First-Derivative Test for Relative Maximum and Minimum
Given a function y = f(x), the first derivative f′(x) plays a key role in our search for its extreme values. For smooth functions, an interior relative extreme value can only occur where

f′(x) = 0,

which is a necessary (but not sufficient) condition for a relative extremum


(either maximum or minimum). We summarize this in the following proposition on the necessary condition for an extremum.

Figure 9.2: The first-derivative test: (a) f′(x₀) does not exist; and (b) f′(x₀) = 0.

Proposition 9.3.1 (Fermat’s Theorem: Necessary Condition for Extremum)


Suppose that f (x) is differentiable on X and has a local extremum (minimum or
maximum) at an interior point x0 ∈ X. Then f ′ (x0 ) = 0.

Note that if the first derivative vanishes at some point, it does not imply
that at this point f possesses an extremum. Such an example is f = x³. As
such, we can only state that f has a stationary point.
We have some useful results about stationary points.

Proposition 9.3.2 (Rolle's Theorem) Suppose that f is continuous on [a, b], differentiable on (a, b), and f(a) = f(b). Then there exists a point c ∈ (a, b) such that f′(c) = 0.

From Rolle's Theorem, we can prove the well-known Mean-Value Theorem, also called Lagrange's Theorem.

Proposition 9.3.3 (Mean-Value Theorem, or Lagrange's Theorem) Suppose that f is continuous on [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) such that f′(c) = [f(b) − f(a)]/(b − a).

Figure 9.3: The Mean-Value Theorem implies that there exists some c in
the interval (a, b) such that the secant joining the endpoints of the interval
[a, b] is parallel to the tangent at c.

Proof. Let g(x) = f(x) − {[f(b) − f(a)]/(b − a)}x. Then g is continuous on [a, b], differentiable on (a, b), and g(a) = g(b). Thus, by Rolle's Theorem, there exists a point c ∈ (a, b) such that g′(c) = 0, and therefore f′(c) = [f(b) − f(a)]/(b − a).
The above Mean-Value Theorem is also true for multivariate x. If the function f: Rⁿ → R is differentiable, then for any x, y ∈ Rⁿ there is z = tx + (1 − t)y with 0 ≤ t ≤ 1, such that

f(y) = f(x) + Df(z)(y − x),

where

Df(x) = [∂f(x)/∂x₁, ∂f(x)/∂x₂, ..., ∂f(x)/∂xₙ].
A variation of the above mean-value theorem takes the form of integral calculus:

Theorem 9.3.1 (Mean-Value Theorem of Integral Calculus) Suppose that f: [a, b] → R is continuous on [a, b]. Then there exists a number c ∈ (a, b) such that

∫_a^b f(x)dx = f(c)(b − a).
Proof. Let F(x) = ∫_a^x f(t)dt. Since f: [a, b] → R is continuous on [a, b], F(x) is continuous on [a, b] and differentiable on (a, b). Then, by the Mean-Value Theorem, there is c ∈ (a, b) such that

[F(b) − F(a)]/(b − a) = F′(c) = f(c).

Therefore, we have

∫_a^b f(x)dx = f(c)(b − a).


The second variation of the mean-value theorem is the generalized
mean-value theorem:

Proposition 9.3.4 (Cauchy's Theorem, or the Generalized Mean-Value Theorem) Suppose that f and g are continuous on [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) such that (f(b) − f(a))g′(c) = (g(b) − g(a))f′(c).

Proof. The case g(a) = g(b) is easy. So, assume that g(a) ≠ g(b). Define

h(x) = f(x) − {[f(b) − f(a)]/[g(b) − g(a)]} g(x).

Then applying the Mean-Value Theorem yields the result.


To verify if it has a maximum or minimum, we can use the following
proposition on first-derivative test relative extremum.

Proposition 9.3.5 (First-Derivative Test Relative Extremum) Suppose that


f ′ (x0 ) = 0. Then the value of the function at x0 , f (x0 ), is

(a) a relative maximum if f ′ (x) changes its sign from positive to negative from
the immediate left of the point x0 to its immediate right;

(b) a relative minimum if f ′ (x) changes its sign from negative to positive from
the immediate left of the point x0 to its immediate right;

(c) an inflection (not extreme) point if f′(x) has the same sign on both sides of x₀ in some neighborhood of x₀.

Example 9.3.1 y = (x − 1)³. Here x = 1 is not an extreme point even though f′(1) = 0; it is an inflection point.

Example 9.3.2 y = f(x) = x³ − 12x² + 36x + 8.

Since f′(x) = 3x² − 24x + 36, to get the critical values, i.e., the values of x satisfying the condition f′(x) = 0, we set f′(x) = 0, and thus

3x² − 24x + 36 = 0.

Its roots are x̄₁ = 2 and x̄₂ = 6. It is easy to verify that f′(x) > 0 for x < 2 and f′(x) < 0 for 2 < x < 6. Thus x = 2 is a maximal point, and the corresponding maximum value of the function is f(2) = 40. Similarly, we can verify that x = 6 is a minimal point and f(6) = 8.

Example 9.3.3 Find the relative extremum of the average-cost function

AC = f(Q) = Q² − 5Q + 8.

Since f′(Q) = 2Q − 5, we have f′(2.5) = 0, f′(Q) < 0 for Q < 2.5, and f′(Q) > 0 for Q > 2.5, so Q̄ = 2.5 is a minimal point.

9.4 Second and Higher Derivatives


Since the first derivative f ′ (x) of a function y = f (x) is also a function of x,
we can consider the derivative of f ′ (x), which is called the second deriva-
tive. Similarly, we can find derivatives of even higher orders. These will
enable us to develop alternative criteria for locating the relative extremum
of a function.

The second derivative of the function f is denoted by f″(x) or d²y/dx².


If the second derivative f ′′ (x) exists for all x values, f (x) is said to be twice
differentiable; if, in addition, f ′′ (x) is continuous, f (x) is said to be twice
continuously differentiable.
The higher-order derivatives of f (x) can be similarly obtained and
symbolized along the same line as the second derivative:

f ′′′ (x), f (4) (x), · · · , f (n) (x),

or
d³y/dx³, d⁴y/dx⁴, ..., dⁿy/dxⁿ.

Remark 9.4.1 dⁿy/dxⁿ can also be written as (dⁿ/dxⁿ)y, where the dⁿ/dxⁿ part serves as an operator symbol instructing us to take the nth derivative with respect to x.

Example 9.4.1 y = f(x) = 4x⁴ − x³ + 17x² + 3x − 1. Then

f′(x) = 16x³ − 3x² + 34x + 3;
f″(x) = 48x² − 6x + 34;
f‴(x) = 96x − 6;
f⁽⁴⁾(x) = 96;
f⁽⁵⁾(x) = 0.

Example 9.4.2 Find the first four derivatives of the function

y = g(x) = x/(1 + x)   (x ≠ −1).

g′(x) = (1 + x)⁻²;
g″(x) = −2(1 + x)⁻³;
g‴(x) = 6(1 + x)⁻⁴;
g⁽⁴⁾(x) = −24(1 + x)⁻⁵.

Remark 9.4.2 A negative second derivative is consistently reflected in an inverse U-shaped curve; a positive second derivative is reflected in a U-shaped curve.

9.5 Second-Derivative Test


Recall the meaning of the first and the second derivatives of a function f. The sign of the first derivative tells us whether the value of the function increases (f′ > 0) or decreases (f′ < 0), whereas the sign of the second derivative tells us whether the slope of the function increases (f″ > 0) or decreases (f″ < 0). This gives us an insight into how to verify whether at a stationary point there is a maximum or a minimum. Thus, we have the following result on the second-derivative test for a relative extremum.

Proposition 9.5.1 (Second-Derivative Test for Relative Extremum) Suppose


that f ′ (x0 ) = 0. Then the value of the function at x0 , f (x0 ), will be

(a) a relative maximum if f ′′ (x0 ) < 0;

(b) a relative minimum if f ′′ (x0 ) > 0.

This test is in general more convenient to use than the first-derivative


test, since it does not require us to check the derivative sign to both the left
and right of x.

Example 9.5.1 y = f(x) = 4x² − x.



Since f ′ (x) = 8x − 1 and f ′′ (x) = 8, we know f (x) reaches its minimum


at x̄ = 1/8. Indeed, since the function plots as a U-shaped curve, the
relative minimum is also the absolute minimum.

Example 9.5.2 y = g(x) = x³ − 3x² + 2.

y′ = g′(x) = 3x² − 6x and y″ = 6x − 6. Setting g′(x) = 0, we obtain the critical values x̄₁ = 0 and x̄₂ = 2, which in turn yield the two stationary values g(0) = 2 (a maximum because g″(0) = −6 < 0) and g(2) = −2 (a minimum because g″(2) = 6 > 0).

Remark 9.5.1 Note that when f ′ (x0 ) = 0, f ′′ (x0 ) < 0 (f ′′ (x0 ) > 0) is a
sufficient condition for a relative maximum (resp. minimum) but not a
necessary condition. However, the condition f″(x₀) ≤ 0 (f″(x₀) ≥ 0) is necessary (even though not sufficient) for a relative maximum (resp. minimum).

Condition for Profit Maximization

Let R = R(Q) be the total-revenue function and let C = C(Q) be the


total-cost function, where Q is the level of output. The profit function is
then given by
π = π(Q) = R(Q) − C(Q).

To find the profit-maximizing output level, we need to find Q̄ such that

π′(Q̄) = R′(Q̄) − C′(Q̄) = 0,

or

R′(Q̄) = C′(Q̄), i.e., MR(Q̄) = MC(Q̄).

To be sure the first-order condition leads to a maximum, we require

d²π/dQ² ≡ π″(Q̄) = R″(Q̄) − C″(Q̄) < 0.

Economically, this would mean that, if the rate of change of MR is less than the rate of change of MC at Q̄, then the output Q̄ will maximize profit.

Example 9.5.3 Let R(Q) = 1200Q − 2Q² and C(Q) = Q³ − 61.25Q² + 1528.5Q + 2000. Then the profit function is

π(Q) = −Q³ + 59.25Q² − 328.5Q − 2000.

Setting π′(Q) = −3Q² + 118.5Q − 328.5 = 0, we have Q̄₁ = 3 and Q̄₂ = 36.5. Since π″(3) = −18 + 118.5 = 100.5 > 0 and π″(36.5) = −219 + 118.5 = −100.5 < 0, the profit-maximizing output is Q̄ = 36.5.
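The stationary points and second-order conditions of Example 9.5.3 can be checked symbolically. A minimal sketch (assuming sympy; rationals are used so 59.25 = 237/4 and 328.5 = 657/2 stay exact):

    import sympy as sp

    Q = sp.symbols('Q', positive=True)
    pi = -Q**3 + sp.Rational(237, 4)*Q**2 - sp.Rational(657, 2)*Q - 2000

    for q in sp.solve(sp.diff(pi, Q), Q):          # stationary points: 3 and 73/2
        print(q, sp.diff(pi, Q, 2).subs(Q, q))     # sign of second derivative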

9.6 Taylor Series

This section considers the so-called "expansion" of a function y = f (x) into


what is known as the Taylor series (expansion around any point x = x0 ).
To expand a function y = f (x) around a point x0 means to transform that
function into a polynomial form, in which the coefficients of the various
terms are expressed in terms of the derivative values f ′ (x0 ), f ′′ (x0 ), etc. -
all evaluated at the point of expansion x0 .

Proposition 9.6.1 (Taylor’s Theorem) Given an arbitrary function ϕ(x), if we


know the values of ϕ(x0 ), ϕ′ (x0 ), ϕ′′ (x0 ), etc., then this function can be expanded
around the point x0 as follows:

ϕ(x) = ϕ(x₀) + ϕ′(x₀)(x − x₀) + (1/2!)ϕ″(x₀)(x − x₀)² + ··· + (1/n!)ϕ^(n)(x₀)(x − x₀)ⁿ + Rₙ ≡ Pₙ + Rₙ,

where Pn represents the nth-degree polynomial and Rn denotes a remainder which



can be expressed in the so-called Lagrange form of the remainder:

Rₙ = [ϕ^(n+1)(P)/(n + 1)!] (x − x₀)^{n+1}

with P being a point between x and x₀. Here n! is "n factorial," defined as

n! ≡ n(n − 1)(n − 2) ··· (3)(2)(1).

Remark 9.6.1 When n = 0, Taylor's Theorem reduces to the mean-value theorem that we discussed in Section 9.3:

ϕ(x) = P₀ + R₀ = ϕ(x₀) + ϕ′(P)(x − x₀),

or

ϕ(x) − ϕ(x₀) = ϕ′(P)(x − x₀),

which states that the difference between the value of the function ϕ at x₀ and at any other x value can be expressed as the product of the difference (x − x₀) and ϕ′(P), with P being some point between x and x₀.

Remark 9.6.2 If x₀ = 0, the Taylor series reduces to the so-called Maclaurin series:

ϕ(x) = ϕ(0) + ϕ′(0)x + (1/2!)ϕ″(0)x² + ··· + (1/n!)ϕ^(n)(0)xⁿ + [1/(n + 1)!]ϕ^(n+1)(P)x^{n+1},

where P is a point between 0 and x.

Figure 9.4: Graphical representation of Taylor's Theorem reducing to the mean-value theorem when n = 0.

Example 9.6.1 Expand the function

ϕ(x) = 1/(1 + x)

around the point x₀ = 1, with n = 4. Since ϕ(1) = 1/2 and

ϕ′(x) = −(1 + x)⁻²,   ϕ′(1) = −1/4;
ϕ″(x) = 2(1 + x)⁻³,   ϕ″(1) = 1/4;
ϕ⁽³⁾(x) = −6(1 + x)⁻⁴,  ϕ⁽³⁾(1) = −3/8;
ϕ⁽⁴⁾(x) = 24(1 + x)⁻⁵,  ϕ⁽⁴⁾(1) = 3/4,

we obtain the following Taylor series:

ϕ(x) = 1/2 − (1/4)(x − 1) + (1/8)(x − 1)² − (1/16)(x − 1)³ + (1/32)(x − 1)⁴ + R₄.
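The coefficients above can be reproduced with a symbolic Taylor expansion. A minimal sketch (assuming sympy):

    import sympy as sp

    x = sp.symbols('x')
    phi = 1 / (1 + x)

    # degree-4 Taylor polynomial around x0 = 1 (O((x-1)^5) term removed)
    P4 = phi.series(x, 1, 5).removeO()
    print(P4)
    # 1/2 - (x - 1)/4 + (x - 1)**2/8 - (x - 1)**3/16 + (x - 1)**4/32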



9.7 Nth-Derivative Test


A relative extremum of the function f can be equivalently defined as fol-
lows:
A function f (x) attains a relative maximum (resp. minimum) value at
x0 if f (x) − f (x0 ) is nonpositive (resp. nonnegative) for values of x in some
neighborhood of x0 .
Assume that f(x) has finite, continuous derivatives up to the desired order at x = x₀. Then the function can be expanded around x = x₀ as a Taylor series:

f(x) − f(x₀) = f′(x₀)(x − x₀) + (1/2!)f″(x₀)(x − x₀)² + ··· + (1/n!)f^(n)(x₀)(x − x₀)ⁿ + [1/(n + 1)!]f^(n+1)(P)(x − x₀)^{n+1},

where P is a point between x and x₀.

From the above expansion, we have the following proposition.

Proposition 9.7.1 (Nth-Derivative Test) Suppose that f ′ (x0 ) = 0, and the


first nonzero derivative value at x0 encountered in successive derivation is that of
the Nth derivative, f (N ) (x0 ) ̸= 0. Then the stationary value f (x0 ) will be

(a) a relative maximum if N is an even number and f (N ) (x0 ) < 0;

(b) a relative minimum if N is an even number and f (N ) (x0 ) > 0;

(c) an inflection point if N is odd.

Example 9.7.1 y = (7 − x)⁴.

Since f′(x) = −4(7 − x)³, f″(x) = 12(7 − x)², and f‴(x) = −24(7 − x), we have f′(7) = f″(7) = f‴(7) = 0, while f⁽⁴⁾(7) = 24 > 0. Since N = 4 is even, x = 7 is a minimal point such that f(7) = 0.
Chapter 10

Exponential and Logarithmic


Functions

Exponential functions, as well as the closely related logarithmic functions,


have important applications in economics, especially in connection with
growth problems, and in economic dynamics in general. This chapter dis-
cusses some basic properties and derivatives of exponential and logarith-
mic functions.

10.1 The Nature of Exponential Functions


In its simple version, the exponential function may be represented in the form

y = f(t) = b^t  (b > 1),

where b denotes a fixed base of the exponent t. Its generalized version has the form

y = ab^{ct}.

Remark 10.1.1 y = ab^{ct} = a(b^c)^t. Thus we can consider b^c as the base of the exponent t. This changes the exponent from ct to t and the base from b to b^c.


If the base is the irrational number e = 2.71828..., the function

y = ae^{rt}

is referred to as the natural exponential function, which can alternatively be denoted as

y = a exp(rt).

Remark 10.1.2 It can be proved that e may be defined as the limit:

e ≡ lim_{n→∞} (1 + 1/n)ⁿ.

10.2 Logarithmic Functions

For the exponential function y = b^t and the natural exponential function y = e^t, taking the log of y to base b (denoted log_b y) and to base e (denoted log_e y), respectively, we obtain the logarithmic functions

t = log_b y

and

t = log_e y ≡ ln y.

For example, we know that 4² = 16, so we can write log₄ 16 = 2.

Since y = b^t ⟺ t = log_b y, we can write

b^{log_b y} = b^t = y.

The following rules are familiar to us:



Rules:

(a) ln(uv) = ln u + ln v (log of product);

(b) ln(u/v) = ln u − ln v (log of quotient);

(c) ln u^a = a ln u (log of power);

(d) log_b u = (log_b e)(log_e u) = (log_b e)(ln u) (conversion of log base);

(e) log_b e = 1/(log_e b) = 1/ ln b (inversion of log base).

Properties of Log:

(a) log y1 = log y2 iff y1 = y2 ;

(b) log y1 > log y2 iff y1 > y2 ;

(c) 0 < y < 1 iff log y < 0;

(d) y = 1 iff log y = 0;

(e) log y → ∞ as y → ∞;

(f) log y → −∞ as y → 0.

Remark 10.2.1 t = log_b y and t = ln y are the respective inverse functions of the exponential functions y = b^t and y = e^t.

10.3 Derivatives of Exponential and Logarithmic Functions
The Basic Rules:

(a) d(ln t)/dt = 1/t;

(b) d(e^t)/dt = e^t;

(c) d(e^{f(t)})/dt = f′(t)e^{f(t)};

(d) d(ln f(t))/dt = f′(t)/f(t).

Example 10.3.1 The following are examples of finding derivatives:

(a) Let y = e^{rt}. Then dy/dt = re^{rt};

(b) Let y = e^{−t}. Then dy/dt = −e^{−t};

(c) Let y = ln at. Then dy/dt = a/(at) = 1/t;

(d) Let y = ln t^c. Since y = ln t^c = c ln t, dy/dt = c/t;

(e) Let y = t^3 ln t^2. Then dy/dt = 3t^2 ln t^2 + 2t^3/t = 2t^2(1 + 3 ln t).
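These rules are easy to verify symbolically; a small sketch (added for illustration, assuming Python with sympy):

```python
import sympy as sp

t, a, r, c = sp.symbols('t a r c', positive=True)

print(sp.diff(sp.exp(r*t), t))                       # r*exp(r*t)
print(sp.diff(sp.exp(-t), t))                        # -exp(-t)
print(sp.diff(sp.log(a*t), t))                       # 1/t
print(sp.diff(sp.log(t**c), t))                      # c/t
print(sp.simplify(sp.diff(t**3 * sp.log(t**2), t)))  # equivalent to 2*t**2*(1 + 3*log(t))
```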

The Case of Base b:

(a) d(b^t)/dt = b^t ln b;

(b) d(log_b t)/dt = 1/(t ln b);

(c) d(b^{f(t)})/dt = f′(t)b^{f(t)} ln b;

(d) d(log_b f(t))/dt = [f′(t)/f(t)] · [1/ ln b].

t
Proof of (a). Since bt = eln b = et ln b , then (d/dt)bt = (d/dt)et ln b =
(ln b)(et ln b ) = bt ln b.
Proof of (b). Since

logb t = (logb e)(loge t) = (1/ ln b) ln t,

(d/dt)(logb t) = (d/dt)[(1/ ln b) ln t] = (1/ ln b)(1/t)

Example 10.3.2 Let y = 12^{1−t}. Then

dy/dt = [d(1 − t)/dt] 12^{1−t} ln 12 = −12^{1−t} ln 12.

An Application

Example 10.3.3 Find dy/dx for y = x^a e^{kx−c}. Taking the natural log of both sides, we have

ln y = a ln x + kx − c.

Differentiating both sides with respect to x, we get

(1/y)(dy/dx) = a/x + k.

Thus dy/dx = (a/x + k)y = (a/x + k)x^a e^{kx−c}.

Using this technique of logarithmic differentiation, we can similarly find the derivative of y = ϕ(x)^{ψ(x)}.
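For instance, with ϕ(x) = ψ(x) = x the method gives dy/dx = x^x(ln x + 1), which can be confirmed symbolically (an illustrative sketch, assuming Python with sympy):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = x**x                           # the special case phi(x) = psi(x) = x
print(sp.diff(y, x))               # x**x*(log(x) + 1)

# the same result via logs: ln y = x ln x, so (1/y) dy/dx = ln x + 1
print(sp.diff(x * sp.log(x), x))   # log(x) + 1
```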
Chapter 11

Optimization: Maxima and Minima of a Function of Two or More Variables

This chapter develops a way of finding the extreme values of an objective function that involves two or more choice variables. As before, our attention will be focused heavily on relative extrema, and for this reason we shall often drop the adjective "relative," with the understanding that, unless otherwise specified, the extrema referred to are relative.

11.1 The Differential Version of Optimization Condition
This section shows the possibility of equivalently expressing the deriva-
tive version of first and second order conditions in terms of differentials.
Consider the function z = f (x). Recall that the differential of z = f (x)
is
dz = f ′ (x)dx.


Since the first-order condition for an extremum is f′(x) = 0, it can equivalently be stated in differential form: dz = 0 for arbitrary nonzero dx. Indeed, with dx ̸= 0, dz = f′(x)dx = 0 if and only if f′(x) = 0.
What about the sufficient conditions in terms of second-order differ-
entials?
Differentiating dz = f ′ (x)dx, we have

d^2 z ≡ d(dz) = d[f′(x)dx] = d[f′(x)]dx = f′′(x)dx^2.

Note that the symbols d^2 z and dx^2 are fundamentally different: d^2 z means the second-order differential of z, while dx^2 means the squaring of the first-order differential dx.

Thus, from the above equation, we have d^2 z < 0 (resp. d^2 z > 0) if and only if f′′(x) < 0 (resp. f′′(x) > 0). Therefore, the second-order sufficient condition for a maximum (resp. minimum) of z = f(x) is d^2 z < 0 (resp. d^2 z > 0).

11.2 Extreme Values of a Function of Two Variables

For a function of one variable, an extreme value is represented graphically by the peak of a hill or the bottom of a valley in a two-dimensional graph. With two choice variables, the graph of the function z = f(x, y) becomes a surface in 3-space, where the extreme values are still associated with peaks and bottoms.

Figure 11.1: The graphical illustrations for extrema of a function with two
choice variables: (a) A is a maximum; and (b) B is a minimum.

First-Order Condition

For a function z = f (x, y), the first-order necessary condition for an


extremum again involves dz = 0 for arbitrary values of dx and dy: an
extremum must be a stationary point, at which z must be constant for
arbitrary infinitesimal changes of two variables x and y.

In the present two-variable case, the total differential is

dz = fx dx + fy dy.

Thus, the equivalent derivative version of the first-order condition dz =


0 is
fx = fy = 0 or ∂f /∂x = ∂f /∂y = 0.

As in the earlier discussion, the first-order condition is necessary, but


not sufficient. To develop a sufficient condition, we must look to the
second-order total, which is related to second-order partial derivatives.

Second-Order Partial Derivatives

From the function z = f(x, y), we can derive two first-order partial derivatives, f_x and f_y. Since f_x and f_y are themselves functions of x and y, we can find four second-order partial derivatives:

$$f_{xx} \equiv \frac{\partial f_x}{\partial x} \quad\text{or}\quad \frac{\partial^2 z}{\partial x^2} \equiv \frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right); \qquad f_{yy} \equiv \frac{\partial f_y}{\partial y} \quad\text{or}\quad \frac{\partial^2 z}{\partial y^2} \equiv \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial y}\right);$$

$$f_{xy} \equiv \frac{\partial^2 z}{\partial x\,\partial y} \equiv \frac{\partial}{\partial x}\left(\frac{\partial z}{\partial y}\right); \qquad f_{yx} \equiv \frac{\partial^2 z}{\partial y\,\partial x} \equiv \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right).$$

The last two are called cross (or mixed) partial derivatives.

Theorem 11.2.1 (Schwarz's Theorem or Young's Theorem) If at least one of the two cross partial derivatives is continuous, then

$$\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \qquad i, j = 1, 2, \cdots, n.$$

Remark 11.2.1 Even though f_xy and f_yx have been separately defined, they will, according to Young's theorem, be identical with each other as long as the two cross partial derivatives are both continuous. In fact, this theorem applies also to functions of three or more variables. Given z = g(u, v, w), for instance, the mixed partial derivatives will be characterized by g_uv = g_vu, g_uw = g_wu, etc., provided these partial derivatives are continuous.

Example 11.2.1 Find all second-order partial derivatives of z = x^3 + 5xy − y^2. The first partial derivatives of this function are

f_x = 3x^2 + 5y and f_y = 5x − 2y.

Thus, f_xx = 6x, f_xy = f_yx = 5, and f_yy = −2. As expected, f_yx = f_xy.

Example 11.2.2 For z = x^2 e^{−y}, the first partial derivatives are

f_x = 2xe^{−y} and f_y = −x^2 e^{−y},

so that f_xy = f_yx = −2xe^{−y}. Again, f_yx = f_xy.

Second-Order Total Differentials

From the first total differential

dz = fx dx + fy dy,

we can obtain the second-order total differential d^2 z:

$$d^2 z \equiv d(dz) = \frac{\partial(dz)}{\partial x}dx + \frac{\partial(dz)}{\partial y}dy = \frac{\partial}{\partial x}(f_x dx + f_y dy)dx + \frac{\partial}{\partial y}(f_x dx + f_y dy)dy$$
$$= [f_{xx}dx + f_{xy}dy]dx + [f_{yx}dx + f_{yy}dy]dy = f_{xx}dx^2 + f_{xy}dydx + f_{yx}dxdy + f_{yy}dy^2$$
$$= f_{xx}dx^2 + 2f_{xy}dxdy + f_{yy}dy^2 \quad [\text{if } f_{xy} = f_{yx}].$$

We know that if f(x, y) satisfies the conditions of Schwarz's theorem, then f_xy = f_yx.

Example 11.2.3 Given z = x^3 + 5xy − y^2, find dz and d^2 z.

dz = f_x dx + f_y dy = (3x^2 + 5y)dx + (5x − 2y)dy;

d^2 z = f_xx dx^2 + 2f_xy dxdy + f_yy dy^2 = 6x dx^2 + 10 dxdy − 2dy^2.

Note that the second-order total differential can be written in matrix form as

$$d^2 z = f_{xx}dx^2 + 2f_{xy}dxdy + f_{yy}dy^2 = \begin{bmatrix} dx & dy \end{bmatrix} \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \begin{bmatrix} dx \\ dy \end{bmatrix}$$

for the function z = f(x, y), where the matrix

$$H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}$$

is called the Hessian matrix (or simply a Hessian).


Then, by the discussion on quadratic forms in Chapter 5, we have

(a) d^2 z is positive definite iff f_xx > 0 and |H| = f_xx f_yy − (f_xy)^2 > 0;

(b) d^2 z is negative definite iff f_xx < 0 and |H| = f_xx f_yy − (f_xy)^2 > 0.

The inequality f_xx f_yy − (f_xy)^2 > 0 implies that f_xx and f_yy must take the same sign.

Example 11.2.4 Given f_xx = −2, f_xy = 1, and f_yy = −1 at a certain point on a function z = f(x, y), does d^2 z have a definite sign at that point regardless of the values of dx and dy? The Hessian determinant in this case is

$$|H| = \begin{vmatrix} -2 & 1 \\ 1 & -1 \end{vmatrix},$$

with leading principal minors |H_1| = −2 < 0 and

$$|H_2| = \begin{vmatrix} -2 & 1 \\ 1 & -1 \end{vmatrix} = 2 - 1 = 1 > 0.$$

Thus d^2 z is negative definite.


For operational convenience, second-order differential conditions can


be translated into equivalent conditions on second-order derivatives. The
actual translation would require a knowledge of quadratic forms, which
has already been discussed in Chapter 5.

Second-Order Sufficient Condition for Extremum

Using the concept of d2 z, we then have:

(a) For maximum of z = f (x, y): d2 z < 0 for any values of dx


and dy, not both zero, which is equivalent to:

fxx < 0, fyy < 0, and fxx fyy > (fxy )2 ;



(b) For minimum of z = f (x, y): d2 z > 0 for any values of dx


and dy, not both zero, which is equivalent to:

fxx > 0, fyy > 0, and fxx fyy > (fxy )2 .

Therefore, from the above first- and second-order conditions, we obtain


the following proposition for relative extremum.

Proposition 11.2.1 (Conditions for Extremum) Suppose that z = f(x, y) is twice continuously differentiable. Then, we have:
Conditions for Maximum:

(1) fx = fy = 0 (necessary condition);

(2) fxx < 0, fyy < 0, and fxx fyy > (fxy )2 .

Conditions for Minimum:

(1) fx = fy = 0 (necessary condition);

(2) fxx > 0, fyy > 0, and fxx fyy > (fxy )2 .

Example 11.2.6 Find the extreme values of z = 8x^3 + 2xy − 3x^2 + y^2 + 1.

f_x = 24x^2 + 2y − 6x, f_y = 2x + 2y;

f_xx = 48x − 6, f_yy = 2, f_xy = 2.

Setting f_x = 0 and f_y = 0, we have

24x^2 + 2y − 6x = 0;

2y + 2x = 0.

Then y = −x, and substituting this into 24x^2 + 2y − 6x = 0 gives 24x^2 − 8x = 0, which yields two solutions for x: x̄_1 = 0 and x̄_2 = 1/3.

Since f_xx(0, 0) = −6 and f_yy(0, 0) = 2, we have f_xx f_yy = −12 < (f_xy)^2 = 4, so the point (x̄_1, ȳ_1) = (0, 0) is not an extreme point (it is a saddle point). For the solution (x̄_2, ȳ_2) = (1/3, −1/3), we find that f_xx = 10 > 0, f_yy = 2 > 0, and f_xx f_yy − (f_xy)^2 = 20 − 4 > 0, so (x̄, ȳ, z̄) = (1/3, −1/3, 23/27) is a relative minimum point.
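The same classification can be done programmatically; a minimal sketch (added for illustration, assuming Python with sympy):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = 8*x**3 + 2*x*y - 3*x**2 + y**2 + 1

# solve the first-order conditions and classify each critical point
crit = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y], dict=True)
H = sp.hessian(z, (x, y))
for pt in crit:
    Hp = H.subs(pt)
    print(pt, 'fxx =', Hp[0, 0], 'det H =', Hp.det())
# (0, 0): det H = -16 < 0, a saddle point
# (1/3, -1/3): fxx = 10 > 0, det H = 16 > 0, a relative minimum
```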

Example 11.2.7 z = x + 2ey − e^x − e^{2y} (where the second term means 2e · y). Setting f_x = 1 − e^x = 0 and f_y = 2e − 2e^{2y} = 0, we have x̄ = 0 and ȳ = 1/2. Since f_xx = −e^x, f_yy = −4e^{2y}, and f_xy = 0, we have f_xx(0, 1/2) = −1 < 0, f_yy(0, 1/2) = −4e < 0, and f_xx f_yy − (f_xy)^2 > 0, so (x̄, ȳ, z̄) = (0, 1/2, −1) is a maximum point of the function.

11.3 Objective Functions with More than Two Variables

When there are n choice variables, the objective function may be expressed
as
z = f (x1 , x2 , · · · , xn ).

The total differential will then be

dz = f1 dx1 + f2 dx2 + · · · + fn dxn

so that the necessary condition for extremum is dz = 0 for arbitrary dxi ,


which in turn means that all the n first-order partial derivatives are re-
quired to be zero:
f1 = f2 = · · · = fn = 0.

It can be verified that the second-order differential d^2 z can be written as

$$d^2 z = \begin{bmatrix} dx_1 & dx_2 & \cdots & dx_n \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \\ \cdots \\ dx_n \end{bmatrix} \equiv (dx)' H dx.$$

Thus the Hessian determinant is

$$|H| = \begin{vmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{vmatrix}$$

and the second-order sufficient condition for extremum is, as before, that
all the n principal minors be positive for a minimum in z and that they
duly alternate in sign for a maximum in z, the first one being negative.
In summary, we have the following proposition.

Proposition 11.3.1 (Conditions for Extremum) Suppose that z = f(x_1, x_2, . . . , x_n) is twice continuously differentiable. Then, we have:
Conditions for Maximum:

(1) f1 = f2 = · · · = fn = 0 (necessary condition);

(2) |H1 | < 0, |H2 | > 0, |H3 | < 0, · · · , (−1)n |Hn | > 0. [d2 z is
negative definite].

Conditions for Minimum:

(1) f1 = f2 = · · · = fn = 0 (necessary condition);

(2) |H1 | > 0, |H2 | > 0, |H3 | > 0, · · · , |Hn | > 0. [d2 z is positive
definite].

Example 11.3.1 Find the extreme values of

z = 2x_1^2 + x_1 x_2 + 4x_2^2 + x_1 x_3 + x_3^2 + 2.

From the first-order conditions:

f_1 = 0: 4x_1 + x_2 + x_3 = 0;
f_2 = 0: x_1 + 8x_2 = 0;
f_3 = 0: x_1 + 2x_3 = 0,

we find a unique solution x̄_1 = x̄_2 = x̄_3 = 0. This means that there is only one stationary value, z̄ = 2. The Hessian determinant of this function is

$$|H| = \begin{vmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{vmatrix} = \begin{vmatrix} 4 & 1 & 1 \\ 1 & 8 & 0 \\ 1 & 0 & 2 \end{vmatrix}.$$

Since its leading principal minors are all positive, namely |H_1| = 4, |H_2| = 31, and |H_3| = 54, we can conclude that z̄ = 2 is a minimum.
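The minors can be checked numerically; a minimal sketch (for illustration, assuming Python with numpy):

```python
import numpy as np

# Hessian of z = 2x1^2 + x1x2 + 4x2^2 + x1x3 + x3^2 + 2
H = np.array([[4.0, 1.0, 1.0],
              [1.0, 8.0, 0.0],
              [1.0, 0.0, 2.0]])

# since z is quadratic, the FOC is the linear system Hx = 0
print(np.linalg.solve(H, np.zeros(3)))            # [0. 0. 0.]

# leading principal minors: all positive, so d^2z is positive definite
for k in (1, 2, 3):
    print(k, round(np.linalg.det(H[:k, :k]), 6))  # 4.0, 31.0, 54.0
```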

11.4 Second-Order Conditions in Relation to Concavity and Convexity

11.4.1 Concavity and Convexity

Second-order conditions, which are concerned with whether a stationary point is the peak of a hill or the bottom of a valley, are closely related to the so-called (strictly) concave or convex functions.

A function that gives rise to a hill (resp. valley) over the entire domain is said to be a concave (resp. convex) function. If the hill (resp. valley)

pertains only to a subset S of the domain, the function is said to be concave


(resp. convex) on S.
Mathematically, a function is said to be concave (resp. convex) if, for
any pair of distinct points u and v in the domain of f , and for any 0 < θ <
1,
θf (u) + (1 − θ)f (v) ≤ f (θu + (1 − θ)v)

(resp. θf (u) + (1 − θ)f (v) ≥ f (θu + (1 − θ)v))

Furthermore, if the weak inequality "≤" (resp. "≥") is replaced by the strict inequality "<" (resp. ">"), the function is said to be strictly concave (resp. strictly convex).

Remark 11.4.1 The point θu + (1 − θ)v traces out the line segment between the points u and v as θ ranges over 0 ≤ θ ≤ 1. Thus, geometrically, the function f is concave (resp. convex) if and only if the line segment connecting any two points M and N on its graph lies on or below (resp. on or above) the surface. The function is strictly concave (resp. strictly convex) if and only if the line segment lies entirely below (resp. above) the surface, except at the end points M and N themselves.

From the definition of concavity and convexity, we have the following


three theorems:
Theorem I (Linear functions). If f (x) is a linear function, then it is a
concave function as well as a convex function, but not strictly so.
Theorem II (Negative of a function). If f (x) is a (strictly) concave
function, then −f (x) is a (strictly) convex function, and vice versa.
Theorem III (Sum of functions). If f(x) and g(x) are both concave (resp. convex) functions, then f(x) + g(x) is a concave (resp. convex) function. If, in addition, either one or both of them are strictly concave (resp. strictly convex), then f(x) + g(x) is strictly concave (resp. strictly convex).
Figure 11.2: The graphical illustration of a concave function with two choice variables and the definition of concavity.

In view of the association of concavity (resp. convexity) with a global hill (resp. valley) configuration, an extremum of a concave (resp. convex) function must be a peak, i.e., a maximum (resp. a bottom, i.e., a minimum). Moreover, that maximum (resp. minimum) must be an absolute maximum (resp. minimum). Furthermore, the maximum (resp. minimum) is unique if the function is strictly concave (resp. strictly convex).
In the preceding paragraph, the properties of concavity and convexity are taken to be global in scope. If they are valid only for a portion of the surface (only on a subset S of the domain), the associated maximum and minimum are relative to that subset of the domain.

We know that when z = f(x_1, · · · , x_n) is twice continuously differentiable, it reaches a maximum (resp. minimum) at a stationary point if d^2 z is negative (resp. positive) definite there.
The following proposition shows the relationship between concavity (resp. convexity) and negative (resp. positive) semidefiniteness.

Proposition 11.4.1 A twice continuously differentiable function z = f(x_1, x_2, · · · , x_n) is concave (resp. convex) if and only if d^2 z is everywhere negative (resp. positive) semidefinite. The said function is strictly concave (resp. strictly convex) if (but not only if) d^2 z is everywhere negative (resp. positive) definite, i.e., its Hessian matrix H = D^2 f(x) is negative (resp. positive) definite on X.

Remark 11.4.2 As discussed above, the strict concavity of a function f(x) can be verified by testing whether the leading principal minors of the Hessian matrix alternate in sign, namely,

$$|H_1| = f_{11} < 0, \quad |H_2| = \begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix} > 0, \quad |H_3| = \begin{vmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{vmatrix} < 0, \quad \cdots, \quad (-1)^n |H_n| = (-1)^n |H| > 0,$$

where f_ij = ∂^2 f/∂x_i∂x_j. This algebraic condition is very useful for testing second-order conditions of optimality: one can easily verify whether a function is strictly concave (resp. strictly convex) by checking whether its Hessian matrix is negative (resp. positive) definite.

Example 11.4.1 Check z = −x^4 for concavity or convexity by the derivative condition.

Since d^2 z = −12x^2 dx^2 ≤ 0 for all x and dx, the function is concave. In fact, it is strictly concave, even though d^2 z = 0 at x = 0, which illustrates the "if but not only if" in Proposition 11.4.1.

Example 11.4.2 Check z = x_1^2 + x_2^2 for concavity or convexity.

Since

$$|H| = \begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix} = \begin{vmatrix} 2 & 0 \\ 0 & 2 \end{vmatrix},$$

we have |H_1| = 2 > 0 and |H_2| = 4 > 0. Thus, by the proposition, the function is strictly convex.

Example 11.4.3 Check whether the following production function is concave:

Q = f(L, K) = L^α K^β,

where L, K > 0, α, β > 0, and α + β < 1.

Since

f_L = αL^{α−1}K^β,   f_K = βL^α K^{β−1};
f_LL = α(α − 1)L^{α−2}K^β,   f_KK = β(β − 1)L^α K^{β−2},   f_LK = αβL^{α−1}K^{β−1},

we have

|H_1| = f_LL = α(α − 1)L^{α−2}K^β < 0;

$$|H_2| = \begin{vmatrix} f_{LL} & f_{LK} \\ f_{KL} & f_{KK} \end{vmatrix} = f_{LL}f_{KK} - (f_{LK})^2 = \alpha\beta(\alpha-1)(\beta-1)L^{2(\alpha-1)}K^{2(\beta-1)} - \alpha^2\beta^2 L^{2(\alpha-1)}K^{2(\beta-1)}$$
$$= \alpha\beta[(\alpha-1)(\beta-1) - \alpha\beta]L^{2(\alpha-1)}K^{2(\beta-1)} = \alpha\beta(1-\alpha-\beta)L^{2(\alpha-1)}K^{2(\beta-1)} > 0.$$

Therefore, it is strictly concave for L, K > 0, α, β > 0, and α + β < 1.
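One can also spot-check the sign pattern symbolically for particular parameter values; a sketch (for illustration, assuming Python with sympy):

```python
import sympy as sp

L, K = sp.symbols('L K', positive=True)
alpha, beta = sp.Rational(1, 3), sp.Rational(1, 2)   # alpha + beta < 1
Q = L**alpha * K**beta

H = sp.hessian(Q, (L, K))
print(sp.simplify(H[0, 0]))   # f_LL: negative for all L, K > 0
print(sp.simplify(H.det()))   # f_LL*f_KK - f_LK^2: positive for all L, K > 0
```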

If we only require a function to be differentiable, but not twice differentiable, the following proposition fully characterizes the concavity of the function.

Proposition 11.4.2 Suppose that f : R → R is differentiable. Then f is concave if and only if for any x, y ∈ R, we have

f(y) ≤ f(x) + f′(x)(y − x). (11.4.1)

Indeed, for a concave function, from Figure 11.3 we have

$$\frac{u(x) - u(x^*)}{x - x^*} \leq u'(x^*),$$

which implies (11.4.1).

Figure 11.3: The graphical illustration of why Proposition 11.4.2 holds for a concave function.

When there are two or more independent variables, the above proposition becomes:

Proposition 11.4.3 Suppose that f : R^n → R is differentiable. Then f is concave if and only if for any x, y ∈ R^n, we have

$$f(y) \leq f(x) + \sum_{j=1}^{n} \frac{\partial f(x)}{\partial x_j}(y_j - x_j). \tag{11.4.2}$$

11.4.2 Concavity/Convexity and Global Optimization

A local optimum is, in general, not equal to the global optimum, but under
certain conditions, these two are consistent with each other.

Theorem 11.4.1 (Global Optimum) Suppose that f is a concave and twice


continuously differentiable function on X ⊆ Rn , and x∗ is an interior point
of X. Then, the following three statements are equivalent:

(1) Df (x∗ ) = 0.

(2) f has a local maximum at x∗ .

(3) f has a global maximum at x∗ .

P ROOF. It is clear that (3) ⇒ (2), and it follows from the first-order condition that (2) ⇒ (1). Therefore, we just need to prove that (1) ⇒ (3).

Suppose Df(x∗) = 0. Since f is concave, for all x in the domain we have

f(x) ≤ f(x∗) + Df(x∗)(x − x∗).

Together with Df(x∗) = 0, this means that for all x, we must have

f(x) ≤ f(x∗).

Therefore, f reaches a global maximum at x∗. 2

Theorem 11.4.2 (Uniqueness of Global Optimum) Let X ⊆ Rn .



(1) If a strictly concave function f defined on X reaches a local max-


imum value at x∗ , then x∗ is the unique global maximum point.

(2) If a strictly convex function f reaches a local minimum value at


x̃, then x̃ is the unique global minimum point.

P ROOF . Proof by contradiction. If x∗ is a global maximum point of


function f , but not unique, then there is a point x′ ̸= x∗ , such that f (x′ ) =
f (x∗ ). Suppose that xt = tx′ + (1 − t)x∗ . Then, strict concavity requires
that for all t ∈ (0, 1),

f (xt ) > tf (x′ ) + (1 − t)f (x∗ ).

Since f (x′ ) = f (x∗ ),

f (xt ) > tf (x′ ) + (1 − t)f (x′ ) = f (x′ ).

This contradicts the assumption that x∗ is a global maximum point of f. Consequently, the global maximum point of a strictly concave function is unique. The proof of part (2) is similar, and thus omitted. 2

Theorem 11.4.3 (The sufficient condition for the uniqueness of global optimum)
Suppose that f (x) is twice continuously differentiable on X ⊆ Rn . We have:

(1) If f (x) is strictly concave and fi (x∗ ) = 0, i = 1, · · · , n, then x∗


is a unique global maximum point of f (x).

(2) If f (x) is strictly convex and fi (x̃) = 0, i = 1, · · · , n, then x̃ is a


unique global minimum point of f (x).

11.5 Economic Applications

Problem of a Multiproduct Firm

Example 11.5.1 Suppose that a competitive firm produces two products. Let Q_i represent the output level of the i-th product, and let the prices of the products be denoted by P_1 and P_2. Since the firm is competitive, it takes the prices as given. Accordingly, the firm's revenue function will be

TR = P_1 Q_1 + P_2 Q_2.

The firm's cost function is assumed to be

C = 2Q_1^2 + Q_1 Q_2 + 2Q_2^2.

Then, the profit function of this hypothetical firm is given by

π = TR − C = P_1 Q_1 + P_2 Q_2 − 2Q_1^2 − Q_1 Q_2 − 2Q_2^2.

The firm wants to maximize profit by choosing the levels of Q_1 and Q_2. For this purpose, setting

∂π/∂Q_1 = 0: P_1 − 4Q_1 − Q_2 = 0;
∂π/∂Q_2 = 0: P_2 − Q_1 − 4Q_2 = 0,

we have

4Q_1 + Q_2 = P_1;
Q_1 + 4Q_2 = P_2,

and thus

Q̄_1 = (4P_1 − P_2)/15 and Q̄_2 = (4P_2 − P_1)/15.

Also, the Hessian matrix is

$$H = \begin{bmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{bmatrix} = \begin{bmatrix} -4 & -1 \\ -1 & -4 \end{bmatrix}.$$

Since |H_1| = −4 < 0 and |H_2| = 16 − 1 > 0, the Hessian matrix (or d^2 z) is negative definite, and the solution does maximize profit. In fact, since H is everywhere negative definite, the maximum profit found above is actually a unique absolute maximum.
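The first-order conditions can be solved symbolically as a check; a minimal sketch (for illustration, assuming Python with sympy):

```python
import sympy as sp

Q1, Q2, P1, P2 = sp.symbols('Q1 Q2 P1 P2', positive=True)
profit = P1*Q1 + P2*Q2 - 2*Q1**2 - Q1*Q2 - 2*Q2**2

foc = [sp.diff(profit, Q1), sp.diff(profit, Q2)]
print(sp.solve(foc, [Q1, Q2], dict=True))
# [{Q1: (4*P1 - P2)/15, Q2: (4*P2 - P1)/15}], as derived above
```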

Example 11.5.2 Let us now transplant the problem in the above example
into the setting of a monopolistic market.
Suppose that the demands facing the monopolist firm are as follows:

Q1 = 40 − 2P1 + P2 ;
Q2 = 15 + P1 − P2 .

Again, the cost function is given by

C = Q_1^2 + Q_1 Q_2 + Q_2^2.

From the monopolist's demand functions, we can express the prices P_1 and P_2 as functions of Q_1 and Q_2, and hence so is the profit function. The reason we want to do so is that we need to express profit as a function of outputs only. Thus, solving

− 2P1 + P2 = Q1 − 40;
P1 − P2 = Q2 − 15,

we have

P1 = 55 − Q1 − Q2 ;
P2 = 70 − Q1 − 2Q2 .

Consequently, the firm's total revenue function TR can be written as

TR = P_1 Q_1 + P_2 Q_2 = (55 − Q_1 − Q_2)Q_1 + (70 − Q_1 − 2Q_2)Q_2 = 55Q_1 + 70Q_2 − 2Q_1 Q_2 − Q_1^2 − 2Q_2^2.

Thus the profit function is

π = TR − C = 55Q_1 + 70Q_2 − 3Q_1 Q_2 − 2Q_1^2 − 3Q_2^2,

which is an objective function with two choice variables Q_1 and Q_2. Setting

∂π/∂Q_1 = 0: 55 − 4Q_1 − 3Q_2 = 0;
∂π/∂Q_2 = 0: 70 − 3Q_1 − 6Q_2 = 0,

we can find that the solution output levels are

(Q̄_1, Q̄_2) = (8, 7 2/3).

The prices and profit are

P̄_1 = 39 1/3, P̄_2 = 46 2/3, and π̄ = 488 1/3.

Inasmuch as the Hessian matrix is

$$H = \begin{bmatrix} -4 & -3 \\ -3 & -6 \end{bmatrix},$$

we have |H_1| = −4 < 0 and |H_2| = 24 − 9 = 15 > 0, so that the value of π̄ does represent the maximum. Also, since the Hessian matrix is everywhere negative definite, it is a unique absolute maximum.
Chapter 12

Optimization with Equality Constraints

The last chapter presented a general method for finding the relative extrema of an objective function of two or more variables. One important feature of that discussion is that all the choice variables are independent of one another, in the sense that the decision made regarding one variable does not depend on the choices of the remaining variables. However, in many cases optimization problems are constrained optimization problems. For instance, every consumer maximizes her utility subject to her budget constraint, and a firm minimizes its cost of production subject to the constraint of its production technology.

In the present chapter, we shall consider the problem of optimization


with equality constraints. Our primary concern will be with relative con-
strained extrema.


12.1 Effects of a Constraint

In general, for a function, say z = f (x, y), the difference between a con-
strained extremum and a free extremum may be illustrated in Figure 12.1.

Figure 12.1: Difference between a constrained extremum and a free ex-


tremum

The free extremum in this particular graph is the peak point of the entire domain, while the constrained extremum is at the peak of the inverse U-shaped curve situated on top of the constraint line. In general, a constrained (less freedom) maximum can be expected to have a lower value than the free (more freedom) maximum, although by coincidence the two maxima may happen to have the same value. But the constrained maximum can never exceed the free maximum. To retain some degree of freedom of choice, the number of constraints should in general be less than the number of choice variables.

12.2 Finding the Stationary Values

For illustration, let us consider a consumer choice problem: maximize her


utility:
u(x1 , x2 ) = x1 x2 + 2x1

subject to the budget constraint:

4x1 + 2x2 = 60.

Even without any new technique of solution, the constrained maxi-


mum in this problem can easily be found. Since the budget line implies

x_2 = (60 − 4x_1)/2 = 30 − 2x_1,

we can combine the constraint with the objective function by substitution.


The result is an objective function in one variable only:

u = x_1(30 − 2x_1) + 2x_1 = 32x_1 − 2x_1^2,

which can be handled with the method already learned. By setting

∂u/∂x_1 = 32 − 4x_1 = 0,

we get the solution x̄_1 = 8 and thus, by the budget constraint, x̄_2 = 30 − 2x̄_1 = 30 − 16 = 14. Since d^2 u/dx_1^2 = −4 < 0, that stationary value constitutes a (constrained) maximum.

However, when the constraint is itself a complicated function, or when


the constraint cannot be solved to express one variable as an explicit func-
tion of the other variables, the technique of substitution and elimination
of variables could become a burdensome task or would in fact be of no

avail. In such cases, we may resort to a method known as the method of Lagrange multipliers.

Lagrange-Multiplier Method

The essence of the Lagrange-multiplier method is to convert a con-


strained extremum problem into a free-extremum problem so that the first-
order condition approach can still be applied.

In general, given an objective function

z = f (x, y)

subject to the constraint


g(x, y) = c,

where c is a constant, we can define the Lagrange function as

Z = f (x, y) + λ[c − g(x, y)].

The symbol λ, representing some as yet undetermined number, is called the Lagrange multiplier. If we can somehow be assured that g(x, y) = c, so that the constraint will be satisfied, then the last term of Z will vanish regardless of the value of λ. In that event, Z will be identical with f. Moreover, with the constraint out of the way, we only have to seek the free extremum. The question is: how can we make the bracketed expression in Z vanish?

The tactic that will accomplish this is simply to treat λ as an additional


variable, i.e., to consider Z = Z(λ, x, y). For stationary values of Z, then

the first-order condition for an interior free extremum is

Z_λ ≡ ∂Z/∂λ = c − g(x, y) = 0;
Z_x ≡ ∂Z/∂x = f_x − λg_x(x, y) = 0;
Z_y ≡ ∂Z/∂y = f_y − λg_y(x, y) = 0.

Example 12.2.1 Let us again consider the consumer’s choice problem above.
The Lagrange function is

Z = x_1 x_2 + 2x_1 + λ[60 − 4x_1 − 2x_2],

for which the necessary condition for a stationary value is

Zλ = 60 − 4x1 − 2x2 = 0;
Zx1 = x2 + 2 − 4λ = 0;
Zx2 = x1 − 2λ = 0.

Solving for the stationary point, we find that x̄_1 = 8, x̄_2 = 14, and λ̄ = 4. As expected, x̄_1 = 8 and x̄_2 = 14 are the same as those obtained by the substitution method.

Example 12.2.2 Find the extremum of z = xy subject to x + y = 6. The


Lagrange function is
Z = xy + λ(6 − x − y).

The first-order condition is

Zλ = 6 − x − y = 0;
Zx = y − λ = 0;
Zy = x − λ = 0.

Thus, we find λ̄ = 3, x̄ = 3, ȳ = 3.

Example 12.2.3 Find the extremum of z = x21 + x22 subject to x1 + 4x2 = 2.


The Lagrange function is

Z = x21 + x22 + λ(2 − x1 − 4x2 ).

The first-order condition (FOC) is

Zλ = 2 − x1 − 4x2 = 0;
Zx1 = 2x1 − λ = 0;
Zx2 = 2x2 − 4λ = 0.

The stationary value of Z, defined by the solution

λ̄ = 4/17, x̄_1 = 2/17, x̄_2 = 8/17,

is therefore Z̄ = z̄ = 4/17.
To tell whether z̄ is a maximum, we need to consider the second-order
condition.
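The stationary point itself can be found mechanically from the Lagrangian; a minimal sketch (for illustration, assuming Python with sympy):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
Z = x1**2 + x2**2 + lam*(2 - x1 - 4*x2)

foc = [sp.diff(Z, v) for v in (lam, x1, x2)]
print(sp.solve(foc, [x1, x2, lam], dict=True))
# [{x1: 2/17, x2: 8/17, lam: 4/17}], so z = 4/17 at the stationary point
```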

An Interpretation of the Lagrange Multiplier

The Lagrange multiplier λ̄ measures the sensitivity of Z to changes in the constraint constant c. If we can express the solution values λ̄, x̄, and ȳ all as implicit functions of the parameter c:
functions of the parameter c:

λ̄ = λ̄(c), x̄ = x̄(c), and ȳ = ȳ(c),



all of which have continuous derivatives, then we have the identities:

c − g(x̄, ȳ) ≡ 0;
fx (x̄, ȳ) − λ̄gx (x̄, ȳ) ≡ 0;
fy (x̄, ȳ) − λ̄gy (x̄, ȳ) ≡ 0.

Thus, we can consider Z as a function of c:

Z = f (x̄, ȳ) + λ̄[c − g(x̄, ȳ)].

Therefore, we have

$$\frac{d\bar{Z}}{dc} = f_x\frac{d\bar{x}}{dc} + f_y\frac{d\bar{y}}{dc} + \frac{d\bar{\lambda}}{dc}[c - g(\bar{x},\bar{y})] + \bar{\lambda}\left[1 - g_x\frac{d\bar{x}}{dc} - g_y\frac{d\bar{y}}{dc}\right]$$
$$= (f_x - \bar{\lambda}g_x)\frac{d\bar{x}}{dc} + (f_y - \bar{\lambda}g_y)\frac{d\bar{y}}{dc} + [c - g(\bar{x},\bar{y})]\frac{d\bar{\lambda}}{dc} + \bar{\lambda} = \bar{\lambda}.$$

n-Variable and Multiconstraint Cases

The generalization of the Lagrange-multiplier method to n variables can easily be carried out. The objective function is

z = f (x1 , x2 , · · · , xn )

subject to
g(x1 , x2 , · · · , xn ) = c.

It follows that the Lagrange function will be

Z = f (x1 , x2 , · · · , xn ) + λ[c − g(x1 , x2 , · · · , xn )],



for which the first-order condition will be given by

Zλ = c − g(x1 , x2 , · · · , xn ) = 0;
Zi = fi (x1 , x2 , · · · , xn ) − λgi (x1 , x2 , · · · , xn ) = 0 [i = 1, 2, · · · , n].

If the optimization problem involves more than one constraint, say the two constraints

g(x_1, x_2, · · · , x_n) = c and h(x_1, x_2, · · · , x_n) = d,

then the Lagrange function is defined by

Z = f (x1 , x2 , · · · , xn ) + λ[c − g(x1 , x2 , · · · , xn )] + µ[d − h(x1 , x2 , · · · , xn )],

for which the first-order condition consists of (n + 2) equations:

Zλ = c − g(x1 , x2 , · · · , xn ) = 0;
Zµ = d − h(x1 , x2 , · · · , xn ) = 0;
Zi = fi (x1 , x2 , · · · , xn ) − λgi (x1 , x2 , · · · , xn ) − µhi (x1 , x2 , · · · , xn ) = 0.

Summarizing the above discussions, we have the following conclusion


regarding the equality constrained optimization problems.

Proposition 12.2.1 (First-Order Necessary Condition for Interior Extremum) Suppose that f(x) and g^j(x), j = 1, · · · , m, are continuously differentiable functions defined on X ⊆ R^n, and that x∗ is an interior point of X and an extreme point (maximal or minimal point) of f subject to the constraints g^j(x∗) = 0, j = 1, · · · , m. If the gradients Dg^j(x∗), j = 1, · · · , m, are linearly independent, then there are unique multipliers λ∗_j, j = 1, · · · , m, such that

$$\frac{\partial L(x^*, \lambda^*)}{\partial x_i} = \frac{\partial f(x^*)}{\partial x_i} + \sum_{j=1}^{m} \lambda_j^* \frac{\partial g^j(x^*)}{\partial x_i} = 0, \qquad i = 1, \cdots, n.$$

12.3 Second-Order Conditions

From the last section, we know that finding a constrained extremum is equivalent to finding the free extremum of the Lagrange function Z, which gives the first-order condition. This section gives the second-order sufficient condition for the constrained extremum of f.

For a constrained extremum of z = f(x, y), subject to g(x, y) = c, the second-order necessary and sufficient conditions still revolve around the algebraic sign of the second-order total differential d^2 z, evaluated at a stationary point. However, there is one important change. In the present context, we are concerned with the sign definiteness or semidefiniteness of d^2 z, not for all possible values of dx and dy (not both zero), but only for those dx and dy values (not both zero) satisfying the linear constraint g_x dx + g_y dy = 0.
The second-order sufficient conditions are:

For a maximum of z: d^2 z negative definite, subject to dg = 0;

For a minimum of z: d^2 z positive definite, subject to dg = 0.

Inasmuch as the (dx, dy) pairs satisfying the constraint gx dx+gy dy = 0 con-
stitute merely a subset of the set of all possible dx and dy, the constrained
sign definiteness is less stringent. In other words, the second-order suffi-
cient condition for a constrained-extremum problem is a weaker condition
than that for a free-extremum problem.
In the following, we shall concentrate on the second-order sufficient
conditions.

The Bordered Hessian

We consider the case where the objective function takes the form

z = f (x1 , x2 , · · · , xn )

subject to
g(x1 , x2 , · · · , xn ) = c.

The Lagrange function is then

Z = f (x1 , x2 , · · · , xn ) + λ[c − g(x1 , x2 , · · · , xn )].

Define the bordered Hessian determinant |H̄| by

$$|\bar{H}| = \begin{vmatrix} 0 & g_1 & g_2 & \cdots & g_n \\ g_1 & Z_{11} & Z_{12} & \cdots & Z_{1n} \\ g_2 & Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ g_n & Z_{n1} & Z_{n2} & \cdots & Z_{nn} \end{vmatrix},$$

where, in the newly introduced symbol, the horizontal bar above H means bordered, and Z_ij = f_ij − λg_ij.

Note that by the first-order condition,

λ = f_1/g_1 = f_2/g_2 = · · · = f_n/g_n.

The bordered leading principal minors can be defined as

$$|\bar{H}_2| = \begin{vmatrix} 0 & g_1 & g_2 \\ g_1 & Z_{11} & Z_{12} \\ g_2 & Z_{21} & Z_{22} \end{vmatrix}, \qquad |\bar{H}_3| = \begin{vmatrix} 0 & g_1 & g_2 & g_3 \\ g_1 & Z_{11} & Z_{12} & Z_{13} \\ g_2 & Z_{21} & Z_{22} & Z_{23} \\ g_3 & Z_{31} & Z_{32} & Z_{33} \end{vmatrix} \quad \text{(etc.)}$$

with the last one being |H̄_n| = |H̄|, where the subscript indicates the order of the leading principal minor being bordered. For instance, |H̄_2| involves the second leading principal minor of the (plain) Hessian, bordered with 0, g_1, and g_2; and similarly for the others. The conditions for positive and negative definiteness of d^2 z are then:

d^2 z is negative definite subject to dg = 0 iff

|H̄_2| > 0, |H̄_3| < 0, |H̄_4| > 0, · · · , (−1)^n |H̄_n| > 0,

and d^2 z is positive definite subject to dg = 0 iff

|H̄_2| < 0, |H̄_3| < 0, |H̄_4| < 0, · · · , |H̄_n| < 0.

In the latter case, all the bordered leading principal minors, starting with |H̄_2|, must be negative; in the former, they must alternate in sign, starting with |H̄_2| > 0. As previously, a positive definite d^2 z is sufficient to establish a stationary value of z as a minimum, whereas a negative definite d^2 z is sufficient to establish it as a maximum.
Summarizing the above discussions, we have the following conclu-
sions.

Proposition 12.3.1 (Second-Order Sufficient Condition for Interior Extremum) Suppose that z = f(x_1, x_2, . . . , x_n) is twice continuously differentiable and g(x_1, x_2, . . . , x_n) is differentiable. Then we have:

The Conditions for Maximum:

(1) Z_λ = Z_1 = Z_2 = · · · = Z_n = 0 (necessary condition);

(2) |H̄_2| > 0, |H̄_3| < 0, |H̄_4| > 0, · · · , (−1)^n |H̄_n| > 0.

The Conditions for Minimum:

(1) Z_λ = Z_1 = Z_2 = · · · = Z_n = 0 (necessary condition);

(2) |H̄_2| < 0, |H̄_3| < 0, |H̄_4| < 0, · · · , |H̄_n| < 0.

Example 12.3.1 For the objective function z = xy subject to x + y = 6, we have shown that (x̄, ȳ, z̄) = (3, 3, 9) is a possible extremum. Since Z_x = y − λ and Z_y = x − λ, we have Z_xx = 0, Z_xy = 1, Z_yy = 0, and g_x = g_y = 1. Thus, we find that

$$|\bar{H}| = \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix} = 2 > 0,$$

which establishes the stationary value z̄ = 9 as a maximum.

Example 12.3.2 For the objective function z = x_1^2 + x_2^2 subject to x_1 + 4x_2 = 2, we have shown that (x̄_1, x̄_2, z̄) = (2/17, 8/17, 4/17) is a possible extremum. To tell whether it is a maximum or a minimum, we need to check the second-order sufficient condition. Since Z_1 = 2x_1 − λ and Z_2 = 2x_2 − 4λ as well as g_1 = 1 and g_2 = 4, we have Z_11 = 2, Z_22 = 2, and Z_12 = Z_21 = 0. It thus follows that the bordered Hessian is

$$|\bar{H}| = \begin{vmatrix} 0 & 1 & 4 \\ 1 & 2 & 0 \\ 4 & 0 & 2 \end{vmatrix} = -34 < 0,$$

and the stationary value z̄ = 4/17 is a minimum.
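Both bordered Hessians are easy to evaluate by machine; a minimal sketch (for illustration, assuming Python with sympy):

```python
import sympy as sp

# z = xy subject to x + y = 6
H1 = sp.Matrix([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])
print(H1.det())   # 2 > 0: d^2z negative definite subject to dg = 0, a maximum

# z = x1^2 + x2^2 subject to x1 + 4x2 = 2
H2 = sp.Matrix([[0, 1, 4],
                [1, 2, 0],
                [4, 0, 2]])
print(H2.det())   # -34 < 0: d^2z positive definite subject to dg = 0, a minimum
```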



12.4 General Setup of the Problem

Now consider the most general setting of the problem with n variables and
m equality constraints ("Extremize" means to find either the minimum or
the maximum of the objective function f ):

extremize f (x1 , . . . , xn ) (12.4.1)


s.t. g j (x1 , . . . , xn ) = bj , j = 1, 2, . . . , m < n.

Again, f is the objective function, g^1, g^2, . . . , g^m are the constraint functions, and b_1, b_2, . . . , b_m are the constraint constants. The difference n − m is the number of degrees of freedom of the problem.

Note that we must require n > m; otherwise there is no freedom of choice left.
If it is possible to explicitly express (from the constraint functions) m of the variables as functions of the other n − m variables, we can eliminate m variables in the objective function, and the initial problem reduces to an unconstrained optimization problem in n − m variables. However, in many cases it is not technically feasible to explicitly express one variable as a function of the others. Instead of the substitution and elimination method, we may resort to the easy-to-use and well-defined method of Lagrange multipliers.

Let f and g^1, . . . , g^m be continuously differentiable functions and let the Jacobian J = (∂g^j/∂x_i), i = 1, 2, . . . , n, j = 1, 2, . . . , m, have full rank, i.e., rank(J) = m.
Introduce the Lagrangian function as


m
L(x1 , . . . , xn , λ1 , . . . , λm ) = f (x1 , . . . , xn ) + λj (bj − g j (x1 , . . . , xn )),
j=1

where λ_1, . . . , λ_m are constants (the Lagrange multipliers).


What is the necessary condition for a solution of problem (12.4.1)? Equating all partial derivatives of L with respect to x_1, . . . , x_n, λ_1, . . . , λ_m to zero, we have

$$\frac{\partial L}{\partial x_i} = \frac{\partial f(x_1, \ldots, x_n)}{\partial x_i} - \sum_{j=1}^{m} \lambda_j \frac{\partial g^j(x_1, \ldots, x_n)}{\partial x_i} = 0, \qquad i = 1, 2, \ldots, n,$$

$$\frac{\partial L}{\partial \lambda_j} = b_j - g^j(x_1, \ldots, x_n) = 0, \qquad j = 1, 2, \ldots, m.$$
Solving these equations for x1 , . . . , xn and λ1 , . . . , λm , we will get a set
of stationary points of the Lagrangian. If x∗ = (x∗1 , . . . , x∗n ) is a solution of
the problem (12.4.1), it should be a stationary point of L.
It is important to assume that rank(J ) = m, and the functions are
continuously differentiable.
If we need to check whether a stationary point yields a maximum or a minimum of the objective function, the following local sufficient condition can be applied:

Proposition 12.4.1 (Sufficient Condition with Multiple Constraints) Let us


introduce the bordered Hessian |H̄_r| as

$$|\bar{H}_r| = \det\begin{pmatrix} 0 & \cdots & 0 & \frac{\partial g^1}{\partial x_1} & \cdots & \frac{\partial g^1}{\partial x_r} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{\partial g^m}{\partial x_1} & \cdots & \frac{\partial g^m}{\partial x_r} \\ \frac{\partial g^1}{\partial x_1} & \cdots & \frac{\partial g^m}{\partial x_1} & \frac{\partial^2 L}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_1 \partial x_r} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g^1}{\partial x_r} & \cdots & \frac{\partial g^m}{\partial x_r} & \frac{\partial^2 L}{\partial x_r \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_r \partial x_r} \end{pmatrix}, \qquad r = 1, 2, \ldots, n.$$

Let f and g^1, . . . , g^m be twice continuously differentiable functions and let x∗ satisfy the necessary condition for problem (12.4.1). Let |H̄_r(x∗)| be the bordered Hessian determinant evaluated at x∗. Then
Let |H̄r (x∗ )| be the bordered Hessian determinant evaluated at x∗ . Then

(1) if (−1)^{r−m+1} |H̄_r(x∗)| > 0 for r = m + 1, . . . , n, then x∗ is a local maximum point for the problem (12.4.1);

(2) if (−1)^m |H̄_r(x∗)| > 0 for r = m + 1, . . . , n, then x∗ is a local minimum point for the problem (12.4.1).

Note that when m = 1, the above second-order sufficient conditions


reduce to the second-order sufficient conditions in the previous section.

12.5 Quasiconcavity and Quasiconvexity

For a problem of free extremum, we know that the concavity (resp. convexity) of the objective function guarantees the existence of an absolute maximum (resp. absolute minimum). For a problem of constrained optimization, we will demonstrate that the quasiconcavity (resp. quasiconvexity) of the objective function guarantees the existence of an absolute maximum (resp. absolute minimum).

Algebraic Characterization

Quasiconcavity and quasiconvexity, like concavity and convexity, can


be either strict or nonstrict:

Definition 12.5.1 A function f is quasiconcave (resp. quasiconvex) if, for any pair of distinct points u and v in the convex domain of f, and for any 0 < θ < 1, we have

f(θu + (1 − θ)v) ≥ min{f(u), f(v)}

(resp. f(θu + (1 − θ)v) ≤ max{f(u), f(v)}).

Note that when f (v) ≥ f (u), the above inequalities imply respectively

f (θu + (1 − θ)v) ≥ f (u)


[resp. f (θu + (1 − θ)v) ≤ f (v)].

Furthermore, if the weak inequality "≥" (resp. "≤") is replaced by the


strict inequality ">" (resp. "<"), f is said to be strictly quasiconcave (resp.
strictly quasiconvex).

Figure 12.2: The graphic illustrations of quasiconcavity and quasiconvex-


ity: (a) The function is strictly quasiconcave; (b) the function is strictly
quasiconvex; and (c) the function is quasiconcave but not strictly quasi-
concave.

Remark 12.5.1 From the definition of quasiconcavity (resp. quasiconvexity), we know that quasiconcavity (resp. quasiconvexity) is a weaker condition than concavity (resp. convexity).

Theorem I (Negative of a function). If f(x) is quasiconcave (resp. strictly quasiconcave), then −f(x) is quasiconvex (resp. strictly quasiconvex).

Theorem II (Concavity versus quasiconcavity). Any (strictly) concave (resp. convex) function is (strictly) quasiconcave (resp. quasiconvex), but the converse may not be true.

Theorem III (Linear function). If f(x) is linear, then it is quasiconcave as well as quasiconvex.

Theorem IV (Monotone function of one variable). If f is a monotone function of one variable, then it is quasiconcave as well as quasiconvex.

Remark 12.5.2 Note that, unlike concave (resp. convex) functions, a sum of two quasiconcave (resp. quasiconvex) functions is not necessarily quasiconcave (resp. quasiconvex).

Figure 12.3: The graphic representation of the alternative definitions of


quasiconcavity and quasiconvexity.

Sometimes it may prove easier to check quasiconcavity and quasicon-


vexity by the following alternative (equivalent) definitions. We state it as
a proposition.

Proposition 12.5.1 A function f (x), where x is a vector of variables in the do-


main X, is quasiconcave (resp. quasiconvex) if and only if, for any constant k,
the set S ≥ (k) ≡ {x ∈ X : f (x) ≥ k} (resp. S ≤ (k) ≡ {x ∈ X : f (x) ≤ k}) is
convex.

P ROOF. Necessity: Let x_1 and x_2 be two points of S^≥(k). We need to show that all convex combinations x_θ ≡ θx_1 + (1 − θ)x_2, θ ∈ [0, 1], are in S^≥(k). Since x_1 ∈ S^≥(k) and x_2 ∈ S^≥(k), by the definition of the upper contour set we have f(x_1) ≥ k and f(x_2) ≥ k. Now, for any x_θ, since f is quasiconcave,

f(x_θ) ≥ min[f(x_1), f(x_2)] ≥ k.

Therefore f(x_θ) ≥ k, and then x_θ ∈ S^≥(k). Consequently, S^≥(k) must be a convex set.

Sufficiency: We need to show that if S^≥(k) is a convex set for all k ∈ R, then f(x) is quasiconcave. Let x_1 and x_2 be two arbitrary points in X. Without loss of generality, suppose f(x_1) ≥ f(x_2). Since S^≥(k) is convex for all k ∈ R, in particular S^≥(f(x_2)) is convex. It is clear that x_2 ∈ S^≥(f(x_2)), and since f(x_1) ≥ f(x_2), we also have x_1 ∈ S^≥(f(x_2)). As such, any convex combination x_θ of x_1 and x_2 satisfies x_θ ∈ S^≥(f(x_2)). It follows from the definition of S^≥(f(x_2)) that f(x_θ) ≥ f(x_2). As a consequence, we must have

f(x_θ) ≥ min[f(x_1), f(x_2)].

Therefore, f(x) is quasiconcave. 2

Example 12.5.1 (1) Z = x^2 is quasiconvex, since S^≤(k) is convex for every k.
(2) Z = f(x, y) = xy, for x, y ≥ 0, is quasiconcave, since S^≥(k) is convex.
(3) Z = f(x, y) = (x − a)^2 + (y − b)^2 is quasiconvex, since S^≤(k) is convex.

These facts can be seen by looking at the graphs of these functions.

Differentiable Functions

Similar to the case of concavity, when a function is differentiable, we have the following result.

Proposition 12.5.2 Suppose that f : R → R is differentiable. Then f is quasiconcave if and only if for any x, y ∈ R, we have

f(y) ≥ f(x) ⇒ f′(x)(y − x) ≥ 0. (12.5.2)



When there are two or more independent variables, the above propo-
sition becomes:

Proposition 12.5.3 Suppose that f : R^n → R is differentiable. Then f is quasiconcave if and only if for any x, y ∈ R^n, we have

$$f(y) \geq f(x) \;\Rightarrow\; \sum_{j=1}^{n} \frac{\partial f(x)}{\partial x_j}(y_j - x_j) \geq 0. \tag{12.5.3}$$

If a function z = f(x_1, x_2, · · · , x_n) is twice continuously differentiable, quasiconcavity and quasiconvexity can be checked by means of the first- and second-order partial derivatives of the function.

Define a bordered determinant as follows:

$$|B| = \begin{vmatrix} 0 & f_1 & f_2 & \cdots & f_n \\ f_1 & f_{11} & f_{12} & \cdots & f_{1n} \\ f_2 & f_{21} & f_{22} & \cdots & f_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ f_n & f_{n1} & f_{n2} & \cdots & f_{nn} \end{vmatrix}.$$

Remark 12.5.3 The determinant |B| is different from the bordered Hessian |H̄|. Unlike |H̄|, the border in |B| is composed of the first derivatives of the function f itself rather than of an extraneous constraint function g.

We can define the successive principal minors of B as follows:

$$|B_1| = \begin{vmatrix} 0 & f_1 \\ f_1 & f_{11} \end{vmatrix}, \qquad |B_2| = \begin{vmatrix} 0 & f_1 & f_2 \\ f_1 & f_{11} & f_{12} \\ f_2 & f_{21} & f_{22} \end{vmatrix}, \quad \cdots, \quad |B_n| = |B|.$$

A necessary condition for a function z = f(x_1, · · · , x_n) defined on the nonnegative orthant to be quasiconcave is that

|B_1| ≤ 0, |B_2| ≥ 0, |B_3| ≤ 0, · · · , (−1)^n |B_n| ≥ 0.

A sufficient condition for f to be strictly quasiconcave on the nonneg-


ative orthant is that

|B1 | < 0, |B2 | > 0, |B3 | < 0, · · · , (−1)n |Bn | > 0.

For strict quasiconvexity, the corresponding sufficient condition is that

|B1 | < 0, |B2 | < 0, · · · , |Bn | < 0.

Example 12.5.2 Let z = f(x_1, x_2) = x_1 x_2 for x_1 > 0 and x_2 > 0. Since f_1 = x_2, f_2 = x_1, f_11 = f_22 = 0, and f_12 = f_21 = 1, the relevant principal minors turn out to be

$$|B_1| = \begin{vmatrix} 0 & x_2 \\ x_2 & 0 \end{vmatrix} = -x_2^2 < 0, \qquad |B_2| = \begin{vmatrix} 0 & x_2 & x_1 \\ x_2 & 0 & 1 \\ x_1 & 1 & 0 \end{vmatrix} = 2x_1 x_2 > 0.$$

Thus z = x_1 x_2 is strictly quasiconcave on the positive orthant.

Example 12.5.3 Show that z = f(x, y) = x^a y^b (x, y > 0; a, b > 0) is quasiconcave.

Since

f_x = ax^{a−1}y^b, f_y = bx^a y^{b−1};
f_xx = a(a − 1)x^{a−2}y^b, f_xy = abx^{a−1}y^{b−1}, f_yy = b(b − 1)x^a y^{b−2},

we have

$$|B_1| = \begin{vmatrix} 0 & f_x \\ f_x & f_{xx} \end{vmatrix} = -(ax^{a-1}y^b)^2 < 0;$$

$$|B_2| = \begin{vmatrix} 0 & f_x & f_y \\ f_x & f_{xx} & f_{xy} \\ f_y & f_{yx} & f_{yy} \end{vmatrix} = [2a^2b^2 - a(a-1)b^2 - a^2b(b-1)]x^{3a-2}y^{3b-2} = ab(a+b)x^{3a-2}y^{3b-2} > 0.$$

Hence it is strictly quasiconcave.
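The two bordered determinants can be generated symbolically; a minimal sketch (for illustration, assuming Python with sympy):

```python
import sympy as sp

x, y, a, b = sp.symbols('x y a b', positive=True)
f = x**a * y**b
fx, fy = sp.diff(f, x), sp.diff(f, y)

B1 = sp.Matrix([[0, fx],
                [fx, sp.diff(f, x, 2)]])
B2 = sp.Matrix([[0, fx, fy],
                [fx, sp.diff(f, x, 2), sp.diff(f, x, y)],
                [fy, sp.diff(f, x, y), sp.diff(f, y, 2)]])
print(sp.simplify(B1.det()))             # expected: -a**2*x**(2a-2)*y**(2b) < 0
print(sp.factor(sp.simplify(B2.det())))  # expected: a*b*(a+b)*x**(3a-2)*y**(3b-2) > 0
```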

Remark 12.5.4 When the constraint g is linear, g(x) = a_1 x_1 + · · · + a_n x_n = c, the second-order partial derivatives of g disappear, and thus, using the first-order condition f_i = λg_i, the bordered determinant |B| and the bordered Hessian determinant have the following relationship:

|B| = λ^2 |H̄|.

Consequently, in the linear-constraint case, the two bordered determinants always have the same sign at the stationary point of z. The same is true for their principal minors. It then follows that if the bordered determinant |B| satisfies the sufficient condition for strict quasiconcavity, the bordered Hessian |H̄| must satisfy the second-order sufficient condition for constrained maximization. Thus, in the linear-constraint case, the first-order necessary condition is also a sufficient condition for the maximization problem when the objective function is quasiconcave.

Absolute versus Relative Extrema

If an objective function is (strictly) quasiconcave (resp. quasiconvex) and the constraint function is convex, then, by reasoning similar to that for concave (resp. convex) functions, its relative maximum (resp. relative minimum) is a (unique) absolute maximum (resp. absolute minimum).

Theorem 12.5.1 (Global Optimum) Suppose that f is concave and the constraint function is convex, that both are twice continuously differentiable functions on X ⊆ R^n, and that x∗ is an interior point of X. Then, the following three statements are equivalent:

(1) Zλ (x∗ ) = Z1 (x∗ ) = . . . = Zn (x∗ ) = 0.

(2) f has a local maximum subject to g(x) at x∗ .

(3) f has a global maximum subject to g(x) at x∗ .

12.6 Utility Maximization and Consumer Demand


Let us now examine the consumer choice problem, i.e., the utility maximization problem. For simplicity, we consider only the two-commodity case. The consumer wants to maximize her utility

u = u(x, y) (u_x > 0, u_y > 0)

subject to her budget constraint

P_x x + P_y y = I,

taking the prices P_x and P_y as well as her income I as given.

First-Order Condition

The Lagrange function is

Z = u(x, y) + λ(I − Px x − Py y).



The first-order conditions consist of the following equations:

Z_λ = I − P_x x − P_y y = 0;
Z_x = u_x − λP_x = 0;
Z_y = u_y − λP_y = 0.

Figure 12.4: The graphical illustration of the conditions for utility maximization.

From the last two equations, we have

u_x/P_x = u_y/P_y = λ,

or

u_x/u_y = P_x/P_y.

The term u_x/u_y ≡ MRS_xy is the so-called marginal rate of substitution of x for y. Thus, we obtain the well-known equality MRS_xy = P_x/P_y, which is the necessary condition for an interior solution.

Second-Order Condition

If the bordered Hessian in the present problem is positive, i.e., if

$$|\bar{H}| = \begin{vmatrix} 0 & P_x & P_y \\ P_x & u_{xx} & u_{xy} \\ P_y & u_{yx} & u_{yy} \end{vmatrix} = 2P_x P_y u_{xy} - P_y^2 u_{xx} - P_x^2 u_{yy} > 0$$

(with all the derivatives evaluated at the stationary point (x̄, ȳ)), then the stationary value of u is assuredly a maximum.

Since the budget constraint is linear, from the result in the last section we have

|B| = λ^2 |H̄|.

Thus, as long as |B| > 0, the second-order condition holds. Recall that |B| > 0 means that the utility function is strictly quasiconcave. Also, quasiconcavity of a utility function means that the indifference curves represented by the utility function are convex, i.e., the upper contour set {y : u(y) ≥ u(x)} is convex, in which case we say the preferences represented by the utility function are convex.

Remark 12.6.1 The convexity of preferences implies that consumers want to diversify their consumption, and thus convexity can be viewed as the formal expression of the basic economic tendency toward diversification. Also, strict quasiconcavity of the utility function u_i implies strict convexity of ≻_i, which in turn implies the conventional diminishing marginal rate of substitution (DMRS), while weak convexity of ≽_i is equivalent to quasiconcavity of the utility function u_i.

From MRS_xy = P_x/P_y, we may solve for x or y as a function of the other and then substitute it into the budget line to find the demand function for x or y.
y.

Example 12.6.1 Consider the Cobb-Douglas utility function:

u(x, y) = x^a y^{1−a}, 0 < a < 1.

This function is strictly increasing and concave on R^2_{++}.

Substituting MRS_xy = MU_x/MU_y = ay/[(1 − a)x] into MRS_xy = P_x/P_y, we have

ay/[(1 − a)x] = P_x/P_y,

and then

y = (1 − a)P_x x/(aP_y).

Substituting the above y into the budget line P_x x + P_y y = I and solving for x, we get the demand function for x:

x(P_x, P_y, I) = aI/P_x.

Substituting the above x(P_x, P_y, I) into the budget line, the demand function for y is obtained:

y(P_x, P_y, I) = (1 − a)I/P_y.
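The whole derivation can be reproduced symbolically; a minimal sketch (for illustration, assuming Python with sympy):

```python
import sympy as sp

x, y, Px, Py, I, a = sp.symbols('x y Px Py I a', positive=True)
u = x**a * y**(1 - a)

# MRS = MU_x / MU_y; solve MRS = Px/Py together with the budget line
MRS = sp.simplify(sp.diff(u, x) / sp.diff(u, y))
sol = sp.solve([sp.Eq(MRS, Px/Py), sp.Eq(Px*x + Py*y, I)], [x, y], dict=True)
print(sol)   # expected: [{x: I*a/Px, y: I*(1 - a)/Py}]
```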
Chapter 13

Optimization with Inequality Constraints

Classical methods of optimization (the method of Lagrange multipliers)


deal with optimization problems with equality constraints in the form
of g(x1 , . . . , xn ) = c. Nonclassical optimization, also known as mathe-
matical programming, tackles problems with inequality constraints like
g(x1 , . . . , xn ) ≤ c.

Mathematical programming includes linear programming and nonlinear programming. In linear programming, the objective function and all inequality constraints are linear. When either the objective function or an inequality constraint is nonlinear, we face a problem of nonlinear programming.

In the following, we restrict our attention to non-linear programming.


A problem of linear programming - also called a linear program - is dis-
cussed in the next chapter.


13.1 Non-Linear Programming

The nonlinear programming problem is that of choosing nonnegative val-


ues of certain variables so as to maximize or minimize a given (non-linear)
function subject to a given set of (non-linear) inequality constraints.
The nonlinear programming maximum problem is

max f (x1 , . . . , xn )
s.t. g i (x1 , . . . , xn ) ≤ bi , i = 1, 2, . . . , m;
x1 ≥ 0, . . . , xn ≥ 0.

Similarly, the minimization problem is

min f (x1 , . . . , xn )
s.t. g i (x1 , . . . , xn ) ≥ bi , i = 1, 2, . . . , m;
x1 ≥ 0, . . . , xn ≥ 0.

First, note that there are no restrictions on the relative sizes of m and n, unlike the case of equality constraints. Second, note that the direction of the inequalities (≤ or ≥) in the constraints is only a convention, because the inequality g^i ≤ b_i can easily be converted to a ≥ inequality by multiplying it by −1, yielding −g^i ≥ −b_i. Third, note that an equality constraint, say g^k = b_k, can be replaced by the two inequality constraints g^k ≤ b_k and −g^k ≤ −b_k.

Definition 13.1.1 (Binding Constraint) A constraint g^j ≤ b_j is called binding (or active) at x^0 = (x^0_1, . . . , x^0_n) if g^j(x^0) = b_j, i.e., if x^0 is a boundary point of that constraint.

13.2 Kuhn-Tucker Conditions

For the purpose of ruling out certain irregularities on the boundary of the feasible set, a restriction on the constraint functions is imposed. This restriction is the so-called constraint qualification. The following is a strong version of the constraint qualification, which is much easier to verify.

Definition 13.2.1 Let C be the constraint set. We say that the (linear independence) constraint qualification condition is satisfied at x∗ ∈ C if the constraints that hold at x∗ with equality are independent; that is, if the gradients (the vectors of partial derivatives) of the g^j-constraints that are binding at x∗ are linearly independent for j = 1, . . . , m.

Define the Lagrangian function for optimization with inequality constraints:

$$L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) + \sum_{j=1}^{m} \lambda_j\,(b_j - g^j(x_1, \ldots, x_n)).$$

The following result is the necessity theorem for the Kuhn-Tucker conditions at a local optimum when the constraint qualification is satisfied.

Proposition 13.2.1 (Necessity Theorem of Kuhn-Tucker Conditions) Suppose that the constraint qualification condition is satisfied and that the objective function and the constraint functions are differentiable. Then, we have

(1) the Kuhn-Tucker necessary condition for maximization is:

∂L/∂xi ≤ 0,  xi ≥ 0  and  xi (∂L/∂xi) = 0,  i = 1, . . . , n;
∂L/∂λj ≥ 0,  λj ≥ 0  and  λj (∂L/∂λj) = 0,  j = 1, . . . , m.

(2) the Kuhn-Tucker necessary condition for minimization is:

∂L/∂xi ≥ 0,  xi ≥ 0  and  xi (∂L/∂xi) = 0,  i = 1, . . . , n;
∂L/∂λj ≤ 0,  λj ≥ 0  and  λj (∂L/∂λj) = 0,  j = 1, . . . , m.

In general, the Kuhn-Tucker condition is neither necessary nor sufficient for a local optimum without further conditions such as the constraint qualification condition. However, if certain assumptions are satisfied, the Kuhn-Tucker condition becomes necessary and even sufficient.

Example 13.2.1 Consider the following nonlinear program:

max π = x1 (10 − x1 ) + x2 (20 − x2 )

s.t. 5x1 + 3x2 ≤ 40;


x1 ≤ 5;
x2 ≤ 10;
x1 ≥ 0, x2 ≥ 0.

The Lagrangian function of the nonlinear program in Example 13.2.1 is:

L = x1 (10 − x1 ) + x2 (20 − x2 ) − λ1 (5x1 + 3x2 − 40) − λ2 (x1 − 5) − λ3 (x2 − 10).

The Kuhn-Tucker conditions are:

∂L/∂x1 = 10 − 2x1 − 5λ1 − λ2 ≤ 0;
∂L/∂x2 = 20 − 2x2 − 3λ1 − λ3 ≤ 0;
∂L/∂λ1 = −(5x1 + 3x2 − 40) ≥ 0;
∂L/∂λ2 = −(x1 − 5) ≥ 0;
∂L/∂λ3 = −(x2 − 10) ≥ 0;
x1 ≥ 0, x2 ≥ 0;
λ1 ≥ 0, λ2 ≥ 0, λ3 ≥ 0;
x1 (∂L/∂x1) = 0,  x2 (∂L/∂x2) = 0;
λi (∂L/∂λi) = 0,  i = 1, 2, 3.
Notice that the failure of the constraint qualification signals certain irregularities at the boundary kinks of the feasible set. Only if the optimal solution occurs at such a kink may the Kuhn-Tucker condition fail to be satisfied.
If all constraints are linear and functionally independent, the constraint qualification is always satisfied.

Example 13.2.2 The constraint qualification for the nonlinear program in Example 13.2.1 is satisfied since all constraints are linear and functionally independent. Therefore, the optimal solution (95/34, 295/34) must satisfy the Kuhn-Tucker conditions in Example 13.2.1.
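As a numerical cross-check, the program in Example 13.2.1 can also be handed to an off-the-shelf solver. The sketch below uses SciPy's SLSQP routine; the solver choice, starting point, and tolerances are assumptions of this illustration rather than part of the notes.

    import numpy as np
    from scipy.optimize import minimize

    # Maximize x1*(10 - x1) + x2*(20 - x2) by minimizing its negative.
    def neg_profit(x):
        return -(x[0] * (10 - x[0]) + x[1] * (20 - x[1]))

    constraints = [
        {"type": "ineq", "fun": lambda x: 40 - 5 * x[0] - 3 * x[1]},  # 5x1 + 3x2 <= 40
        {"type": "ineq", "fun": lambda x: 5 - x[0]},                  # x1 <= 5
        {"type": "ineq", "fun": lambda x: 10 - x[1]},                 # x2 <= 10
    ]
    res = minimize(neg_profit, x0=[1.0, 1.0], method="SLSQP",
                   bounds=[(0, None), (0, None)], constraints=constraints)
    print(res.x)                     # approx [2.794, 8.676]
    print(np.array([95, 295]) / 34)  # the analytic solution (95/34, 295/34)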

Example 13.2.3 The following example illustrates a case where the Kuhn-Tucker condition is not satisfied at the solution of an optimization problem. Consider the problem:

max y
s.t. x + (y − 1)^3 ≤ 0;
x ≥ 0, y ≥ 0.

The solution to this problem is (0, 1). (If y > 1, then the restriction x + (y − 1)^3 ≤ 0 implies x < 0.) The Lagrangian function is:

L = y + λ[−x − (y − 1)^3].

The Kuhn-Tucker conditions require

∂L/∂y ≤ 0,

or

1 − 3λ(y − 1)^2 ≤ 0.

As can be observed, this condition is not satisfied at the point (0, 1).

Proposition 13.2.2 (Kuhn-Tucker Sufficiency Theorem- 1) Suppose that the


following conditions are satisfied:

(a) f is differentiable and concave in the nonnegative orthant;

(b) each constraint function is differentiable and convex in the non-


negative orthant;

(c) the point x∗ satisfies the Kuhn-Tucker necessary maximum condi-


tion.

Then x∗ gives a global maximum of f .

The concavity of the objective function and the convexity of the constraint functions may be weakened to quasiconcavity and quasiconvexity, respectively. To adapt this theorem for minimization problems, we need to interchange the two words "concave" and "convex" in (a) and (b) and use the Kuhn-Tucker necessary minimum condition in (c).

Proposition 13.2.3 (Kuhn-Tucker Sufficiency Theorem - 2) The Kuhn-Tucker condition is also a sufficient condition for x∗ to be a local optimum of a maximization (resp. minimization) program if the following assumptions are satisfied:

(a) the objective function f is differentiable and quasiconcave (resp.


quasiconvex);

(b) each constraint g i is differentiable and quasiconvex (resp. quasi-


concave);

(c) any one of the following is satisfied:

(c.i) there exists j such that ∂f(x∗)/∂xj < 0 (resp. > 0);
(c.ii) there exists j such that ∂f(x∗)/∂xj > 0 (resp. < 0) and xj can take a positive value without violating the constraints;
(c.iii) f is concave (resp. convex).

The problem of finding a nonnegative vector (x∗, λ∗), with x∗ = (x∗1, . . . , x∗n) and λ∗ = (λ∗1, . . . , λ∗m), which satisfies the Kuhn-Tucker necessary condition and for which

L(x, λ∗) ≤ L(x∗, λ∗) ≤ L(x∗, λ)   ∀ x = (x1, . . . , xn) ≥ 0, λ = (λ1, . . . , λm) ≥ 0,

is known as the saddle point problem.

Proposition 13.2.4 If (x∗ , λ∗ ) solves the saddle point problem then (x∗ , λ∗ )
solves the problem (13.1.1).

13.3 Economic Applications

Corner Solution for Linear Utility Maximization

Suppose the preference ordering is represented by the linear utility function:

u(x, y) = ax + by.

Since the marginal rate of substitution of x for y, a/b, and the economic rate of substitution of x for y, px/py, are both constant, they cannot in general be equal. So the first-order condition cannot hold with equality as long as a/b ≠ px/py. In this case the answer to the utility-maximization problem typically involves a boundary solution: only one of the two goods will be consumed. It is worthwhile presenting a more formal solution since it serves as a nice example of the Kuhn-Tucker theorem in action. The Kuhn-Tucker theorem is the appropriate tool to use here, since we will almost never have an interior solution.
The Lagrangian function is

L(x, y, λ) = ax + by + λ(m − px x − py y),

and thus

∂L/∂x = a − λpx;    (13.3.1)
∂L/∂y = b − λpy;    (13.3.2)
∂L/∂λ = m − px x − py y.    (13.3.3)

There are four cases to be considered:

Case 1. x > 0 and y > 0. Then we have ∂L/∂x = 0 and ∂L/∂y = 0. Thus, a/b = px/py. Since λ = a/px > 0, we have px x + py y = m, and thus all (x, y) satisfying px x + py y = m are optimal consumptions.

Case 2. x > 0 and y = 0. Then we have ∂L/∂x = 0 and ∂L/∂y ≤ 0. Thus, a/b ≥ px/py. Since λ = a/px > 0, we have px x + py y = m, and thus x = m/px is the optimal consumption.

Case 3. x = 0 and y > 0. Then we have ∂L/∂x ≤ 0 and ∂L/∂y = 0. Thus, a/b ≤ px/py. Since λ = b/py > 0, we have px x + py y = m, and thus y = m/py is the optimal consumption.

Case 4. x = 0 and y = 0. Then we have ∂L/∂x ≤ 0 and ∂L/∂y ≤ 0. Since λ = b/py > 0, we have px x + py y = m, and therefore m = 0 because x = 0 and y = 0. Hence, when m ≠ 0, this case is impossible.

In summary, the demand functions are given by

(x(px, py, m), y(px, py, m)) =
    (m/px, 0)                if a/b > px/py,
    (0, m/py)                if a/b < px/py,
    (x, m/py − (px/py)x)     if a/b = px/py, for any x ∈ [0, m/px].
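This demand rule is easy to express as a small function; the sketch below is illustrative, with the function name and the representative optimum returned in the knife-edge case a/b = px/py being assumptions of the illustration.

    def linear_utility_demand(a, b, px, py, m):
        """Demand for u(x, y) = a*x + b*y at prices (px, py) and income m."""
        if a / b > px / py:      # indifference curves steeper: spend everything on x
            return (m / px, 0.0)
        if a / b < px / py:      # flatter: spend everything on y
            return (0.0, m / py)
        # a/b == px/py: every point on the budget line is optimal; return one of them
        x = m / (2 * px)
        return (x, (m - px * x) / py)

    print(linear_utility_demand(a=2, b=1, px=1, py=1, m=10))  # (10.0, 0.0)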

x x x

x x x

Figure 13.1: Utility maximization for linear utility function

Remark 13.3.1 In fact, the optimal solutions are easily found by comparing the relative steepness of the indifference curves and the budget line. For instance, as shown in Figure 13.1, when a/b > px/py, the indifference curves are steeper than the budget line, and thus the optimal solution is the one where the consumer spends all his income on good x. When a/b < px/py, the indifference curves are flatter, and thus the optimal solution is the one where the consumer spends all his income on good y. When a/b = px/py, the indifference curves and the budget line are parallel and coincide at the optimum, and thus the optimal solutions are given by all the points on the budget line.

Economic Interpretation of Nonlinear Program and the Kuhn-Tucker


Condition

A maximization program in the general form, for example, is the pro-


duction problem facing a firm which has to produce n goods such that it
maximizes its revenue subject to m resource (factor) constraints.
The variables have the following economic interpretations:

• xj is the amount produced of the jth product;

• ri is the amount of the ith resource available;

• f is the profit (revenue) function;

• g i is a function which shows how the ith resource is used in produc-


ing the n goods.

The optimal solution to the maximization program indicates the opti-


mal quantities of each good the firm should produce.
In order to interpret the Kuhn-Tucker condition, we first have to note
the meanings of the following variables:

• fj = ∂f/∂xj is the marginal profit (revenue) of product j;

• λi is the shadow price of resource i;



• g^i_j = ∂g^i/∂xj is the amount of resource i used in producing a marginal unit of product j;

• λi gji is the imputed cost of resource i incurred in the production of a


marginal unit of product j.


The condition ∂L/∂xj ≤ 0 can be written as fj ≤ \sum_{i=1}^{m} λi g^i_j, and it says that the marginal profit of the jth product cannot exceed the aggregate marginal imputed cost of the jth product.
The Kuhn-Tucker condition xj (∂L/∂xj) = 0 implies that, in order to produce good j (xj > 0), the marginal profit of good j must be equal to the aggregate marginal imputed cost (∂L/∂xj = 0). The same condition shows that good j is not produced (xj = 0) if there is an excess imputation (∂L/∂xj < 0).
The Kuhn-Tucker condition ∂L/∂λi ≥ 0 is simply a restatement of constraint i: the total amount of resource i used in producing all n goods should not exceed the total amount available, ri.
The condition λi (∂L/∂λi) = 0 indicates that if a resource is not fully used in the optimal solution (∂L/∂λi > 0), then its shadow price will be 0 (λi = 0). On the other hand, a fully used resource (∂L/∂λi = 0) can have a strictly positive shadow price (λi > 0).

Example 13.3.1 Let us find an economic interpretation for the maximization program given in Example 13.2.1:

max R = x1 (10 − x1 ) + x2 (20 − x2 )


s.t. 5x1 + 3x2 ≤ 40;
x1 ≤ 5;
x2 ≤ 10;
x1 ≥ 0, x2 ≥ 0.

A firm has to produce two goods using three kinds of resources avail-
able in the amounts 40, 5, 10 respectively. The first resource is used in the
production of both goods: five units are necessary to produce one unit of
good 1, and three units to produce one unit of good 2. The second re-
source is used only in producing good 1 and the third resource is used
only in producing good 2.
The sale prices of the two goods are given by the linear inverse demand equations p1 = 10 − x1 and p2 = 20 − x2. The problem the firm faces is how much of each good to produce in order to maximize revenue R = x1 p1 + x2 p2. The solution (95/34, 295/34) gives the optimal amounts the firm should produce.
Chapter 14

Differential Equations

We first provide the general concept of ordinary differential equations de-


fined on Euclidean spaces.

Definition 14.0.1 An equation

F(x, y, y′, · · · , y^(n)) = 0,    (14.0.1)

which involves an independent variable x, an unknown function y = y(x) of the independent variable, and its derivatives from the first derivative y′ = y′(x) up to the nth-order derivative y^(n) = y^(n)(x), is called an ordinary differential equation. If the highest-order derivative in the equation is of order n, the equation is also called an nth-order ordinary differential equation.

If for all x ∈ I, the function y = ψ(x) satisfies

F (x, ψ(x), ψ ′ (x), · · · , ψ (n) (x)) = 0,

then y = ψ(x) is called a solution to the ordinary differential equation


(14.0.1).
Sometimes the solutions of ordinary differential equations are not unique, and there may even exist infinitely many solutions. For example, y = C/x + (1/5)x^4 is a solution of the ordinary differential equation dy/dx + y/x = x^3, where C is an arbitrary constant. Next we introduce the concepts of general solutions and particular solutions of ordinary differential equations.

Definition 14.0.2 The solution of the nth-order ordinary differential equa-


tion (14.0.1)
y = ψ(x, C1 , · · · , Cn ), (14.0.2)

which contains n independent arbitrary constants, C1 , · · · , Cn , is called the


general solution to the ordinary differential equation (14.0.1). Here, independence means that the Jacobi determinant

\frac{D[\psi, \psi^{(1)}, \cdots, \psi^{(n-1)}]}{D[C_1, \cdots, C_n]} \stackrel{\mathrm{def}}{=}
\begin{vmatrix}
\frac{\partial \psi}{\partial C_1} & \frac{\partial \psi}{\partial C_2} & \cdots & \frac{\partial \psi}{\partial C_n} \\
\frac{\partial \psi^{(1)}}{\partial C_1} & \frac{\partial \psi^{(1)}}{\partial C_2} & \cdots & \frac{\partial \psi^{(1)}}{\partial C_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial \psi^{(n-1)}}{\partial C_1} & \frac{\partial \psi^{(n-1)}}{\partial C_2} & \cdots & \frac{\partial \psi^{(n-1)}}{\partial C_n}
\end{vmatrix}

is not identically equal to 0.

If a solution of an ordinary differential equation, denoted y = ψ(x),


does not contain any arbitrary constant, it is called a particular solution. Obviously, a general solution becomes a particular solution once the arbitrary constants are determined. In general, initial conditions determine the values of these constants. For example, for the ordinary differential equation (14.0.1), suppose there are given initial conditions:

y(x0) = y0,  y^(1)(x0) = y0^(1),  · · · ,  y^(n−1)(x0) = y0^(n−1),    (14.0.3)

then the ordinary differential equation (14.0.1) and the initial value con-
ditions (14.0.3) are said to be the Cauchy problem or initial value prob-

lem for the nth-order ordinary differential equations. Then the question is
what conditions the function F should satisfy so that the above ordinary
differential equations are uniquely solvable. This problem is the existence
and uniqueness of solutions for ordinary differential equations.

14.1 Existence and Uniqueness Theorem of Solutions for Ordinary Differential Equations
We first consider an ordinary differential equation of first-order y ′ = f (x, y)
that satisfies initial condition (x0 , y0 ), that is, y(x0 ) = y0 . Let y(x) be a solu-
tion to the differential equation.

Definition 14.1.1 Let a function f(x, y) be defined on D ⊆ R². We say f satisfies the local Lipschitz condition with respect to y at the point (x0, y0) ∈ D if there exist a neighborhood U ⊆ D of (x0, y0) and a positive number L such that

|f(x, y) − f(x, z)| ≤ L|y − z|, ∀(x, y), (x, z) ∈ U.

If there is a positive number L such that

|f(x, y) − f(x, z)| ≤ L|y − z|, ∀(x, y), (x, z) ∈ D,

we say that f(x, y) satisfies the global Lipschitz condition on D ⊆ R².

The following lemma characterizes the properties of the function satisfy-


ing Lipschitz condition.

Lemma 14.1.1 Suppose that f (x, y) defined on D ⊆ R2 is continuously dif-


ferentiable. If there is an ϵ > 0 such that fy (x, y) is bounded on U = {(x, y) :
|x − x0 | < ϵ, |y − y0 | < ϵ}, then f (x, y) satisfies the local Lipschitz condition. If
fy (x, y) is bounded on D, then f (x, y) satisfies the global Lipschitz condition.

Theorem 14.1.1 If f is continuous on an open set D, then for any (x0 , y0 ) ∈ D,


there always exists a solution y(x) of the differential equation, and it satisfies
y ′ = f (x, y) and y(x0 ) = y0 .

The following is the theorem on the uniqueness of the solution for dif-
ferential equations.

Theorem 14.1.2 Suppose that f is continuous on an open set D, and satisfies


the global Lipschitz condition with respect to y. Then for any (x0 , y0 ) ∈ D, there
always exists a unique solution y(x) satisfying y ′ = f (x, y) and y(x0 ) = y0 .

For nth-order ordinary differential equations, y^(n) = f(x, y, y′, · · · , y^(n−1)), if the Lipschitz condition is imposed with respect to y, y′, · · · , y^(n−1) instead of y alone, we have similar conclusions about the existence and uniqueness of solutions. See Ahmad and Ambrosetti (2014) for the proofs of existence and uniqueness.

14.2 Some Common Ordinary Differential Equations with Explicit Solutions
Generally, we hope to obtain the concrete form of solutions, namely explic-
it solutions, for differential equations. However, in many cases, there is no
explicit solution. Here we give some common cases in which differential
equations can be solved explicitly.

Case of Separable Equations

Consider a separable differential equation y′ = f(x)g(y) with y(x0) = y0. It can be rewritten as:

dy/g(y) = f(x)dx.

Integrating both sides, we get the solution to the differential equation. For example, for (x² + 1)y′ + 2xy² = 0 with y(0) = 1, the above procedure gives the solution

y(x) = 1/(ln(x² + 1) + 1).
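This solution can be verified symbolically; a minimal SymPy sketch (the CAS check is an illustration, not part of the original derivation):

    import sympy as sp

    x = sp.symbols('x')
    y = 1 / (sp.log(x**2 + 1) + 1)
    # Residual of (x**2 + 1)*y' + 2*x*y**2 = 0 should simplify to zero.
    residual = (x**2 + 1) * sp.diff(y, x) + 2 * x * y**2
    print(sp.simplify(residual))  # 0
    print(y.subs(x, 0))           # 1, matching the initial condition y(0) = 1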

In addition, a differential equation of the form y′ = f(y) is called an autonomous equation (or autonomous system), since y′ is determined by y alone.

Homogeneous Type of Differential Equation

Some differential equations built from homogeneous functions also have explicit solutions.

Definition 14.2.1 We call the function f(x, y) a homogeneous function of degree n if, for any λ, f(λx, λy) = λⁿ f(x, y).

A differential equation of homogeneous type has the form M(x, y)dx + N(x, y)dy = 0, where M(x, y) and N(x, y) are homogeneous functions of the same degree.

By the variable transformation z = y/x, such a differential equation can be transformed into separable form. Suppose M(x, y) and N(x, y) are homogeneous functions of degree n. Then M(x, y)dx + N(x, y)dy = 0 is transformed into

z + x (dz/dx) = −M(1, z)/N(1, z),

and the final form is

dz/dx = −[z + M(1, z)/N(1, z)]/x,

where z + M(1, z)/N(1, z) is a function of z alone, so the equation is separable.

Exact Differential Equation

Given a simply connected and open subset D ⊆ R² and two functions M and N which are continuous and satisfy ∂M(x, y)/∂y ≡ ∂N(x, y)/∂x on D, the implicit first-order ordinary differential equation of the form

M(x, y)dx + N(x, y)dy = 0

is called the exact differential equation, or the total differential equation. The nomenclature "exact differential equation" refers to the exact derivative of a function. Indeed, when ∂M(x, y)/∂y ≡ ∂N(x, y)/∂x, the solution is F(x, y) = C, where the constant C is determined by the initial value, and F(x, y) satisfies ∂F/∂x = M(x, y) and ∂F/∂y = N(x, y).

It is clear that a separable differential equation is a special case of an exact differential equation: y′ = f(x)g(y) can be written as (1/g(y))dy − f(x)dx = 0, and then we have M(x, y) = −f(x), N(x, y) = 1/g(y), and ∂M(x, y)/∂y = ∂N(x, y)/∂x = 0.

For example, 2xy³dx + 3x²y²dy = 0 is an exact differential equation, whose general solution is x²y³ = C, where C is a constant.

When solving differential equations with explicit solutions, we usually convert differential equations into the form of exact differential equations.
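A short SymPy sketch checking both the exactness condition and the stated solution of the example above (illustrative only; the symbols below are assumptions of the sketch):

    import sympy as sp

    x, y = sp.symbols('x y')
    M, N = 2*x*y**3, 3*x**2*y**2
    print(sp.diff(M, y) - sp.diff(N, x))         # 0, so the equation is exact
    F = x**2 * y**3                              # candidate potential function
    print(sp.diff(F, x) - M, sp.diff(F, y) - N)  # 0 0, so F(x, y) = C solves it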

First-Order Linear Equation

Consider the first-order linear equation of the following form:

dy/dx + p(x)y = q(x).    (14.2.4)

When q(x) = 0, the differential equation (14.2.4) is a separable differential equation; denote its solution by y = ψ(x). If ψ1(x) is a particular solution of the differential equation (14.2.4), then y = ψ(x) + ψ1(x) is clearly a solution of equation (14.2.4).

It is easy to show that the solution to dy/dx + p(x)y = 0 is y = Ce^{−∫p(x)dx}.

Next we find the general solution to the differential equation (14.2.4).

Suppose that

y = c(x)e^{−∫p(x)dx}.

Differentiating this gives

y′ = c′(x)e^{−∫p(x)dx} − c(x)p(x)e^{−∫p(x)dx};

then substituting back into the original differential equation, we have

c′(x)e^{−∫p(x)dx} − c(x)p(x)e^{−∫p(x)dx} = −p(x)c(x)e^{−∫p(x)dx} + q(x),

and thus

c′(x) = q(x)e^{∫p(x)dx}.

We have

c(x) = ∫ q(x)e^{∫p(x)dx} dx + C.

Thus, the general solution is

y(x) = e^{−∫p(x)dx} (∫ q(x)e^{∫p(x)dx} dx + C).

Bernoulli Equation

The following differential equation is called the Bernoulli equation:

dy/dx + p(x)y = q(x)yⁿ,    (14.2.5)

where n (with n ≠ 0, 1) is a natural number.

Multiplying both sides by (1 − n)y^{−n} gives:

(1 − n)y^{−n} (dy/dx) + (1 − n)y^{1−n} p(x) = (1 − n)q(x).

Let z = y^{1−n}, and get:

dz/dx + (1 − n)p(x)z = (1 − n)q(x),

which is a first-order linear equation whose explicit solution can be obtained.
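For instance, the Bernoulli equation dy/dx + y = x y² (n = 2; an assumed test equation for this sketch) reduces under z = y^{−1} to the linear equation z′ − z = −x, which SymPy solves directly:

    import sympy as sp

    x = sp.symbols('x')
    z = sp.Function('z')
    # z = y**(-1) turns y' + y = x*y**2 into z' - z = -x.
    sol = sp.dsolve(sp.Eq(z(x).diff(x) - z(x), -x), z(x))
    print(sol)  # z(x) = C1*exp(x) + x + 1, hence y(x) = 1/(C1*exp(x) + x + 1)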

Differential equations with explicit solutions take other forms as well, such as some special forms of Riccati equations, and equations similar to M(x, y)dx + N(x, y)dy = 0 but not satisfying

∂M(x, y)/∂y ≡ ∂N(x, y)/∂x.

14.3 Higher Order Linear Equations with Constant Coefficients

Consider a differential equation of order n with constant coefficients:

y^(n) + a1 y^(n−1) + · · · + an−1 y′ + an y = f(x).    (14.3.6)

If f(x) ≡ 0, then the differential equation (14.3.6) is called the homogeneous differential equation of order n; otherwise it is called the nonhomogeneous differential equation.

There is a standard method for finding the general solution yg(x) of a homogeneous differential equation of order n. The general solution is a linear combination of n fundamental solutions y1, · · · , yn, that is, yg(x) = C1 y1(x) + · · · + Cn yn(x), where C1, · · · , Cn are arbitrary constants. These arbitrary constants are uniquely determined by initial-value conditions: find the function y(x) satisfying

y(x) = y0^0, y′(x) = y0^1, · · · , y^(n−1)(x) = y0^(n−1), when x = x0,

where x0, y0^0, y0^1, · · · , y0^(n−1) are given initial values.

The procedure for finding the fundamental solutions of the homogeneous differential equation is given below:

(1) Solve the characteristic equation with respect to λ:

λ^n + a1 λ^{n−1} + · · · + an−1 λ + an = 0.

Suppose that the roots of the characteristic equation are λ1, · · · , λn. Some roots may be complex and some may be multiple.

(2) If λi is a non-multiple real characteristic root, then the fundamental solution corresponding to this root is yi(x) = e^{λi x}.

(3) If λi is a real characteristic root of multiplicity k, then there are k fundamental solutions:

yi1(x) = e^{λi x},  yi2(x) = x e^{λi x},  · · · ,  yik(x) = x^{k−1} e^{λi x}.

(4) If λj is a non-multiple complex characteristic root, λj = αj + iβj with i = √−1, then its complex conjugate, denoted by λj+1 = αj − iβj, is also a characteristic root; thus there are two fundamental solutions generated by the complex conjugate roots λj, λj+1:

yj1 = e^{αj x} cos βj x,  yj2 = e^{αj x} sin βj x.

(5) If λj is the complex characteristic root of multiplicity l, λj = αj + iβj ,


its complex conjugate is also the complex characteristic root of multiplicity

l, thus these 2l complex roots generate 2l fundamental solutions:

yj1 = e^{αj x} cos βj x,  yj2 = x e^{αj x} cos βj x,  · · · ,  yjl = x^{l−1} e^{αj x} cos βj x;
yj(l+1) = e^{αj x} sin βj x,  yj(l+2) = x e^{αj x} sin βj x,  · · · ,  yj(2l) = x^{l−1} e^{αj x} sin βj x.

The following is a general method for solving nonhomogeneous differ-


ential equations.
The general form of the solution to a nonhomogeneous differential equation is ynh(x) = yg(x) + yp(x), where yg(x) is the general solution of the corresponding homogeneous equation, and yp(x) is a particular solution of the nonhomogeneous equation.

Next are some procedures for finding particular solutions of nonhomogeneous equations.
(1) If f(x) = Pk(x)e^{bx}, where Pk(x) is a polynomial of degree k, then the form of the particular solution is:

yp(x) = x^s Qk(x)e^{bx},

where Qk(x) is also a polynomial of degree k. If b is not a root of the characteristic equation, then s = 0; if b is a characteristic root of multiplicity m, then s = m.
(2) If f(x) = Pk(x)e^{px} cos qx + Qk(x)e^{px} sin qx, where Pk(x) and Qk(x) are polynomials of degree k, then the form of the particular solution is:

yp(x) = x^s Rk(x)e^{px} cos qx + x^s Tk(x)e^{px} sin qx,

where Rk(x), Tk(x) are also polynomials of degree k. If p + iq is not a root of the characteristic equation, then s = 0; if p + iq is a characteristic root of multiplicity m, then s = m.
(3) A general method for solving nonhomogeneous differential equations is called the variation of parameters (or the variation of constants).
Suppose that the general solution of the homogeneous equation is given as follows:

yg = C1 y1(x) + · · · + Cn yn(x),

where the yi(x) are the fundamental solutions. Regard the constants C1, · · · , Cn as functions of x, say u1(x), · · · , un(x), so the particular solution of the nonhomogeneous equation can be expressed as

yp(x) = u1(x)y1(x) + · · · + un(x)yn(x),

where u1(x), · · · , un(x) are the solutions of the following equations:

u′1(x)y1(x) + · · · + u′n(x)yn(x) = 0,
u′1(x)y′1(x) + · · · + u′n(x)y′n(x) = 0,
...
u′1(x)y1^(n−2)(x) + · · · + u′n(x)yn^(n−2)(x) = 0,
u′1(x)y1^(n−1)(x) + · · · + u′n(x)yn^(n−1)(x) = f(x).
(4) If f (x) = f1 (x) + f2 (x) + · · · + fr (x), and yp1 (x), · · · , ypr (x) are the
particular solutions corresponding to f1 (x), · · · , fr (x), then

yp (x) = yp1 (x) + · · · + ypr (x).

Here is an example to illustrate the application of this method.

Example 14.3.1 Solve y″ − 5y′ + 6y = t² + e^t − 5.

The characteristic roots are λ1 = 2 and λ2 = 3. Thus, the general solution of the homogeneous equation is:

y(t) = C1 e^{2t} + C2 e^{3t}.

Next, to find a particular solution of the nonhomogeneous equation, write its form as:

yp(t) = at² + bt + c + de^t.

We substitute this particular solution into the initial equation to determine the coefficients a, b, c, d:

2a + de^t − 5(2at + b + de^t) + 6(at² + bt + c + de^t) = t² − 5 + e^t.

The coefficients on both sides should match, thus we get:

6a = 1,  −10a + 6b = 0,  2a − 5b + 6c = −5,  d − 5d + 6d = 1.

Therefore, a = 1/6, b = 5/18, c = −71/108 and d = 1/2.

Finally, the general solution of the nonhomogeneous differential equation is:

y(t) = C1 e^{2t} + C2 e^{3t} + t²/6 + 5t/18 − 71/108 + e^t/2.
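The result can be cross-checked with a computer algebra system; a SymPy sketch (illustrative, not part of the original computation):

    import sympy as sp

    t = sp.symbols('t')
    y = sp.Function('y')
    ode = sp.Eq(y(t).diff(t, 2) - 5*y(t).diff(t) + 6*y(t), t**2 + sp.exp(t) - 5)
    print(sp.dsolve(ode, y(t)))
    # y(t) = C1*exp(2*t) + C2*exp(3*t) + t**2/6 + 5*t/18 - 71/108 + exp(t)/2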

14.4 System of Ordinary Differential Equations

The general form is:

ẋ(t) = A(t)x(t) + b(t),  x(0) = x0,

where t (time) is an independent variable, x(t) = (x1(t), · · · , xn(t))′ is a vector of dependent variables, A(t) = (aij(t)) is an n × n matrix of varying real coefficients, and b(t) = (b1(t), · · · , bn(t))′ is an n-dimensional varying vector.
Consider the case that A is a constant coefficient matrix and b is a con-
stant vector, also called the system of differential equations with con-
stant coefficients:

ẋ(t) = Ax(t) + b, x(0) = x0 , (14.4.7)

where A is assumed to be nonsingular.


The system of differential equations (14.4.7) can be solved by the fol-
lowing two steps.
Step 1: we consider the system of homogeneous equations (i.e. b = 0):

ẋ(t) = Ax(t), x(0) = x0 . (14.4.8)

And its solution is denoted by xc (t).


Step 2: find a particular solution xp of the nonhomogeneous equation (14.4.7). A constant vector xp is a particular solution provided Axp + b = 0, namely xp = −A⁻¹b.
Given the general solution of the homogeneous equation and the particular solution of the nonhomogeneous equation, the general solution of the system of differential equations (14.4.7) is:

x(t) = xc(t) + xp.

There are two methods for solving the system of homogeneous differential
equations (14.4.8).
The first method is to eliminate n − 1 dependent variables so that the system of differential equations becomes a single differential equation of order n, as in the following example.

Example 14.4.1 Consider the system of differential equations:

ẋ = 2x + y,
ẏ = 3x + 4y.

We differentiate the first equation to eliminate y and ẏ. Since ẏ = 3x + 4y = 3x + 4(ẋ − 2x), we obtain the corresponding second-order homogeneous differential equation:

ẍ − 6ẋ + 5x = 0,

thus the general solution is x(t) = C1 e^t + C2 e^{5t}. Since y(t) = ẋ − 2x, we have y(t) = −C1 e^t + 3C2 e^{5t}.

The second method is to write the solution of the homogeneous differential equation (14.4.8) as:

x(t) = e^{At} x0,

where

e^{At} = I + At + A²t²/2! + · · · .

Now we compute e^{At} in three different cases.
Now we solve eAt in three different cases.

Case 1: A has different real eigenvalues

Matrix A has distinct real eigenvalues, which means that its eigenvectors are linearly independent. Thus A can be diagonalized; that is,

A = P Λ P⁻¹,

where P = [v1, v2, · · · , vn] consists of the eigenvectors of A, and Λ is a diagonal matrix whose diagonal elements are the eigenvalues of A. Thus we have

e^A = P e^Λ P⁻¹.
Therefore, the solution to the system of differential equations (14.4.8) is:

x(t) = P e^{Λt} P⁻¹ x0 = P e^{Λt} c = c1 v1 e^{λ1 t} + · · · + cn vn e^{λn t},

where c = (c1, c2, · · · , cn)′ is a vector of arbitrary constants determined by the initial value, namely c = P⁻¹ x0.
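A short numerical sketch of Case 1 using the matrix of Example 14.4.1 (the initial value x0 and the evaluation time are assumptions of the illustration):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [3.0, 4.0]])
    lam, P = np.linalg.eig(A)         # eigenvalues 1 and 5, as in Example 14.4.1
    x0 = np.array([1.0, 1.0])         # assumed initial condition x(0)
    c = np.linalg.solve(P, x0)        # c = P^{-1} x0
    t = 0.3
    print(P @ (c * np.exp(lam * t)))  # x(t) = P e^{Λt} P^{-1} x0
    print(lam)                        # [1. 5.] up to ordering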

Case 2: A has multiple real eigenvalues, but no complex eigenvalues

First, consider the simple case in which A has only one eigenvalue λ of multiplicity m. In this case, there may be fewer than m linearly independent eigenvectors, which means that the matrix P cannot be constructed as a matrix consisting of linearly independent eigenvectors, so A cannot be diagonalized. Thus, the solution has the following form:

x(t) = \sum_{i=1}^{m} ci hi(t),

where the hi(t) are quasi-polynomials and the ci are determined by initial conditions. For example, when m = 3, we have:

h1(t) = e^{λt} v1,
h2(t) = e^{λt} (t v1 + v2),
h3(t) = e^{λt} ((t²/2) v1 + t v2 + v3),

where v1, v2, v3 are determined by the following conditions:

(A − λI)vi = vi−1,  v0 = 0.

If A has more than one multiple real eigenvalue, then the solution of the differential equation (14.4.8) can be obtained by summing the solutions corresponding to each eigenvalue.

Case 3: A has complex eigenvalues

Since A is a real matrix, complex eigenvalues will be generated in the


form of conjugate pairs.

If an eigenvalue of A is α + βi, then its conjugate complex α − βi is also


an eigenvalue.

Now consider a simple case: A has only one pair of complex eigenval-
ues, λ1 = α + βi and λ2 = α − βi.

Let v1 and v2 be the eigenvectors corresponding to λ1 and λ2 ; then we


have v2 = v̄1 , where v̄1 refers to the conjugation of v1 . The solution of the
differential equation (14.4.8) can be expressed as:

x(t) = e^{At} x0 = P e^{Λt} P⁻¹ x0 = P e^{Λt} c
     = c1 v1 e^{(α+βi)t} + c2 v2 e^{(α−βi)t}
     = c1 v1 e^{αt}(cos βt + i sin βt) + c2 v2 e^{αt}(cos βt − i sin βt)
     = (c1 v1 + c2 v2) e^{αt} cos βt + i(c1 v1 − c2 v2) e^{αt} sin βt
     = h1 e^{αt} cos βt + h2 e^{αt} sin βt,

where h1 = c1 v1 + c2 v2 and h2 = i(c1 v1 − c2 v2).

If A has several pairs of conjugate complex eigenvalues, then the solution of the differential equation (14.4.8) is obtained by summing the solutions corresponding to all eigenvalues.

14.5 Simultaneous Differential Equations and Stability of Equilibrium
Consider the following simultaneous differential equations system:

ẋ = f (t, x), (14.5.9)

where t (time) is an independent variable, x = (x1, · · · , xn) are dependent variables, and f(t, x) is continuously differentiable with respect to x ∈ Rⁿ and satisfies the initial condition x(0) = x0. Such a system of simultaneous differential equations is called a dynamical system. If f(t, x∗) = 0, the point x∗ is called an equilibrium of the dynamical system.

Definition 14.5.1 An equilibrium x∗ of the simultaneous differential equation system is locally stable if there is δ > 0 such that the unique path x = ϕ(t, x0) satisfies limt→∞ ϕ(t, x0) = x∗ whenever |x∗ − x0| < δ.

Consider the case of a simultaneous differential equation system with two variables x = x(t) and y = y(t):

dx/dt = f(x, y),
dy/dt = g(x, y).

Let J be the Jacobian

J = \begin{pmatrix} \partial f/\partial x & \partial f/\partial y \\ \partial g/\partial x & \partial g/\partial y \end{pmatrix}

evaluated at (x∗, y∗), and let λ1 and λ2 be the eigenvalues of this Jacobian. Then the stability of the equilibrium point is characterized as follows:

(1) It is a (locally) stable ( or unstable) node if λ1 and λ2 are different


real numbers and are negative (or positive);
(2) It is a (locally) saddle point if eigenvalues are real numbers but with
opposite signs, namely λ1 λ2 < 0;
(3) It is a (locally) stable (or unstable) focus if λ1 and λ2 are complex
numbers and Re(λ1 ) < 0(or Re(λ1 ) > 0);
(4) It is a center if λ1 and λ2 are complex and Re(λ1 ) = 0;
(5) It is a (locally) stable (or unstable) improper node if λ1 and λ2 are
real, λ1 = λ2 < 0 (or λ1 = λ2 > 0) and the Jacobian is not a diagonal matrix;
(6) It is a (locally) stable (or unstable) star node if λ1 and λ2 are real ,
λ1 = λ2 < 0 (or λ1 = λ2 > 0) and the Jacobian is a diagonal matrix.
Figure 14.1 below depicts the six types of equilibrium points.

Figure 14.1: The Types of Equilibrium Points
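Cases (1)-(4) above translate directly into a small classifier based on the Jacobian's eigenvalues; the sketch below is illustrative (the function name and tolerance are assumptions, and separating improper from star nodes would additionally require the eigenvectors):

    import numpy as np

    def classify_equilibrium(J, tol=1e-12):
        lam1, lam2 = np.linalg.eigvals(J)
        if abs(lam1.imag) > tol:                  # complex conjugate pair
            if abs(lam1.real) < tol:
                return "center"
            return "stable focus" if lam1.real < 0 else "unstable focus"
        r1, r2 = lam1.real, lam2.real
        if r1 * r2 < 0:
            return "saddle point"
        return "stable node" if max(r1, r2) < 0 else "unstable node"

    print(classify_equilibrium(np.array([[0.0, 1.0], [-1.0, 0.0]])))   # center
    print(classify_equilibrium(np.array([[-2.0, 0.0], [0.0, -3.0]])))  # stable node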



14.6 The Stability of Dynamical System


In a dynamic system, the Lyapunov method studies the global stability of equilibrium points.
Let x̄(t, x0) be the unique solution of the dynamic system (14.5.9), and let Br(x) = {x′ ∈ D : |x′ − x| < r} be an open ball of radius r centered at x. The following is the definition of stability of an equilibrium point.

Definition 14.6.1 The equilibrium point x∗ of the dynamic system (14.5.9)

(1) is globally stable if for any r > 0, there is a neighbourhood U of x∗ such that

x̄(t, x0) ∈ Br(x∗), ∀x0 ∈ U.

(2) is globally asymptotically stable if there is a neighbourhood U′ of x∗ such that

limt→∞ x̄(t, x0) = x∗, ∀x0 ∈ U′.

(3) is globally unstable if it is neither globally stable nor globally asymptotically stable.

Definition 14.6.2 Let x∗ be an equilibrium point of the dynamic system (14.5.9), let Q ⊆ Rⁿ be an open set containing x∗, and let V(x) : Q → R be a continuously differentiable function satisfying:

(1) V(x) > V(x∗), ∀x ∈ Q, x ≠ x∗;

(2) V̇(x), defined as

V̇(x) := ∇V(x) f(t, x) ≤ 0, ∀x ∈ Q,    (14.6.10)

where ∇V(x) is the gradient of V with respect to x.

Then V is called a Lyapunov function.

The following is the Lyapunov theorem about the equilibrium points


of dynamic systems.

Theorem 14.6.1 If there exists a Lyapunov function V for the dynamic system (14.5.9), then the equilibrium point x∗ is globally stable. If, moreover, the Lyapunov function (14.6.10) satisfies V̇(x) < 0, ∀x ∈ Q, x ≠ x∗, then the equilibrium point x∗ is globally asymptotically stable.
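A minimal illustration of the theorem: for the system ẋ1 = −x1, ẋ2 = −x2 (an assumed textbook example, not from the notes), V(x) = x1² + x2² is a Lyapunov function with V̇ < 0 away from the origin, so the origin is globally asymptotically stable.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f = sp.Matrix([-x1, -x2])     # dynamics x' = f(x); equilibrium at the origin
    V = x1**2 + x2**2             # candidate Lyapunov function
    Vdot = sp.Matrix([V]).jacobian([x1, x2]) * f   # V-dot = grad(V) . f
    print(sp.simplify(Vdot[0]))   # -2*x1**2 - 2*x2**2 < 0 for x != 0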
Chapter 15

Difference Equations

Difference equations can be regarded as discretized differential equations,


and many of their properties are similar to those of differential equations.
Let y be a real-valued function defined on the natural numbers; y(t) denotes the value of y at t, where t = 0, 1, 2, · · · , which can be regarded as time points.

Definition 15.0.3 The first-order difference of y at t is:

∆y(t) = y(t + 1) − y(t).

The second-order difference of y at t is:

∆2 y(t) = ∆(∆y(t)) = y(t + 2) − 2y(t + 1) + y(t).

Generally, the nth- order difference of y at t is:

∆n y(t) = ∆(∆n−1 y(t)), n > 1.

Definition 15.0.4 A difference equation is an equation involving y and its differences ∆y, ∆²y, · · · , ∆ⁿy:

F(y, ∆y, ∆²y, · · · , ∆ⁿy, t) = 0,  t = 0, 1, 2, · · · .    (15.0.1)

If n is the highest order of difference with nonzero coefficient in formula (15.0.1), the above equation is called an nth-order difference equation.

If F(ψ(t), ∆ψ(t), ∆²ψ(t), · · · , ∆ⁿψ(t), t) = 0 holds for all t, then we call y = ψ(t) a solution of the difference equation. Similar to differential equations, difference equations also have general solutions and particular solutions. The general solutions usually contain arbitrary constants that can be determined by initial conditions.

Difference equations can also be expressed in the following form by variable conversion:

F(y(t), y(t + 1), · · · , y(t + n), t) = 0,  t = 0, 1, 2, · · · .    (15.0.2)

If the coefficients of y(t) and y(t + n) are not zero, so that the highest corresponding order is n, then it is called an nth-order difference equation.

The following discussion focuses on difference equations with constant coefficients. A common expression is written as:

f0 y(t + n) + f1 y(t + n − 1) + · · · + fn−1 y(t + 1) + fn y(t) = g(t),  t = 0, 1, 2, · · · ,    (15.0.3)

where f0, f1, · · · , fn are real numbers, and f0 ≠ 0, fn ≠ 0.
Dividing both sides of the equation by f0, and setting ai = fi/f0 for i = 0, · · · , n and r(t) = g(t)/f0, the nth-order difference equation can be written in the simpler form:

y(t + n) + a1 y(t + n − 1) + · · · + an−1 y(t + 1) + an y(t) = r(t),  t = 0, 1, 2, · · · .    (15.0.4)
Here are three steps that are usually used to solve nth-order linear difference equations:
Step 1: find the general solution of the homogeneous difference equation

y(t + n) + a1 y(t + n − 1) + · · · + an−1 y(t + 1) + an y(t) = 0,

and let the general solution be Y.
Step 2: find a particular solution y∗ of the difference equation (15.0.4).
Step 3: the solution of the difference equation (15.0.4) is

y(t) = Y + y∗.

The followings are the solutions of the first-order, second-order and


nth-order difference equations, respectively.

15.1 First-order Difference Equations


The first-order difference equation is defined as:

y(t + 1) + ay(t) = r(t), t = 0, 1, 2, · · · . (15.1.5)

The corresponding homogeneous difference equation is:

y(t + 1) + ay(t) = 0,

and the general solution is y(t) = c(−a)t , where c is an arbitrary constant.



To get a particular solution of the nonhomogeneous difference equation, consider r(t) = r, that is, the case that does not change over time.
Obviously, a particular solution is as follows:

y∗ = r/(1 + a), if a ≠ −1;
y∗ = rt, if a = −1.

Hence, the solution of the nonhomogeneous difference equation (15.1.5) is:

y(t) = c(−a)^t + r/(1 + a), if a ≠ −1;
y(t) = c + rt, if a = −1.    (15.1.6)

If the initial condition y(0) = y0 is known, the solution of the difference equation (15.1.5) is:

y(t) = (y0 − r/(1 + a))(−a)^t + r/(1 + a), if a ≠ −1;
y(t) = y0 + rt, if a = −1.    (15.1.7)

If r depends on t, a particular solution is:

y∗ = \sum_{i=0}^{t−1} (−a)^{t−1−i} r(i),

thus the solution of the difference equation (15.1.5) is:

y(t) = (−a)^t y0 + \sum_{i=0}^{t−1} (−a)^{t−1−i} r(i),  t = 1, 2, · · · .
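Iterating the recursion and evaluating the closed form should agree; a quick sketch (the test values a, r, y0, and horizon T are assumptions of the illustration):

    a, r, y0, T = 0.5, 2.0, 1.0, 10

    y = y0
    for _ in range(T):
        y = -a * y + r            # y(t+1) = -a*y(t) + r, i.e. y(t+1) + a*y(t) = r

    closed = (y0 - r / (1 + a)) * (-a) ** T + r / (1 + a)
    print(y, closed)              # the two values agree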

For a general function r(t) = f(t), a trial solution of the form y∗ = f(A0, A1, · · · , Am; t) can be posited and the coefficients A0, · · · , Am determined by the method of undetermined coefficients. The following example solves for a particular solution in a case where r(t) is a polynomial.

Example 15.1.1 Solve the following difference equation:

y(t + 1) − 3y(t) = t² + t + 2.

The homogeneous equation is:

y(t + 1) − 3y(t) = 0,

whose general solution is:

Y = C3^t.

Using the method of undetermined coefficients to get a particular solution of the nonhomogeneous equation, suppose that the particular solution has the form:

y∗ = At² + Bt + D.

Substituting y∗ into the nonhomogeneous difference equation gives:

A(t + 1)² + B(t + 1) + D − 3At² − 3Bt − 3D = t² + t + 2,

or

−2At² + 2(A − B)t + A + B − 2D = t² + t + 2.

Since the equality holds for each t, we must have:

−2A = 1,
2(A − B) = 1,
A + B − 2D = 2,

which gives A = −1/2, B = −1 and D = −7/4; thus we have the particular solution y∗ = −t²/2 − t − 7/4. Therefore, the general solution of the nonhomogeneous equation is y(t) = Y + y∗ = C3^t − t²/2 − t − 7/4.

We can also solve the case with an exponential function by using the method of undetermined coefficients.

Example 15.1.2 Consider the first-order difference equation:

y(t + 1) − 3y(t) = 4e^t.

Suppose that the particular solution has the form y∗ = Ae^t; then substituting it into the nonhomogeneous difference equation gives A = 4/(e − 3). Therefore, the general solution of the first-order difference equation is:

y(t) = Y + y∗ = C3^t + 4e^t/(e − 3).

Here are some of the common forms for finding particular solutions:

(1) when r(t) = r, a usual form of particular solution is y∗ = A;

(2) when r(t) = r + ct, a usual form of particular solution is y∗ = A1 t + A2;

(3) when r(t) = tⁿ, a usual form of particular solution is y∗ = A0 + A1 t + · · · + An tⁿ;

(4) when r(t) = c^t, a usual form of particular solution is y∗ = Ac^t;

(5) when r(t) = α sin(ct) + β cos(ct), a usual form of particular solution is y∗ = A1 sin(ct) + A2 cos(ct).

15.2 Second-order Difference Equation

The second-order difference equation is defined as:

y(t + 2) + a1 y(t + 1) + a2 y(t) = r(t).



The corresponding homogeneous difference equation is:

y(t + 2) + a1 y(t + 1) + a2 y(t) = 0.

Its general solution depends on the roots of the following quadratic equation:

m² + a1 m + a2 = 0,

which is called the auxiliary equation or characteristic equation of the second-order difference equation. Let m1 and m2 be the roots of this equation. Since a2 ≠ 0, neither m1 nor m2 is 0.

Case 1: m1 and m2 are distinct real roots. The general solution of the homogeneous equation is Y = C1 m1^t + C2 m2^t, where C1 and C2 are arbitrary constants.

Case 2: m1 and m2 are equal real roots. The general solution of the homogeneous equation is Y = (C1 + C2 t) m1^t.

Case 3: m1 and m2 are two complex roots, namely r(cos θ ± i sin θ) with r > 0, θ ∈ (−π, π]. The general solution of the homogeneous equation is Y = C1 r^t cos(tθ + C2).

For a general function r(t), the equation can be solved by the method of undetermined coefficients.

15.3 Difference Equations of Order n


The general nth-order difference equation is defined as:

y(t+n)+a1 y(t+n−1)+· · ·+an−1 y(t+1)+an y(t) = r(t), t = 0, 1, 2, · · · . (15.3.8)



The corresponding homogeneous equation is:

y(t + n) + a1 y(t + n − 1) + · · · + an−1 y(t + 1) + an y(t) = 0,

and its characteristic equation is:

mn + a1 mn−1 + · · · + an−1 m + an = 0.

Let its n characteristic roots be m1 , · · · , mn .


The general solutions of the homogeneous equations are the sum of
the bases generated by these eigenvalues, and its concrete forms are as
follows:

Case 1: The formula generated by a single real root m is C1 m^t.

Case 2: The formula generated by the real root m of multiplicity p is:

(C1 + C2 t + C3 t2 + · · · + Cp tp−1 )mt .

Case 3: The formula generated by a pair of nonrepeated conjugate com-


plex roots r(cos θ ± i sin θ) is:

C1 rt cos(tθ + C2 ).

Case 4: The formula generated by a pair of conjugate complex roots


r(cos θ ± i sin θ) of multiplicity p is:

rt [C1,1 cos(tθ + C1,2 ) + C2,1 t cos(tθ + C2,2 ) + · · · + Cp,1 tp−1 cos(tθ + Cp,2 )].

The general solution of the homogeneous difference equation is ob-


tained by summing up all formulas generated by eigenvalues.

A particular solution y∗ of a nonhomogeneous difference equation can be generated by the method of undetermined coefficients. One particular solution is:

y∗ = \sum_{s=1}^{n} \sum_{i=0}^{∞} θs ms^i r(t − i),

where

θs = ms / \prod_{j≠s}(ms − mj).

15.4 The Stability of nth-Order Difference Equations

Consider an nth-order difference equation

y(t + n) + a1 y(t + n − 1) + · · · + an−1 y(t + 1) + an y(t) = r(t),  t = 0, 1, 2, · · · .    (15.4.9)

The corresponding homogeneous equation is:

y(t + n) + a1 y(t + n − 1) + · · · + an−1 y(t + 1) + an y(t) = 0,  t = 0, 1, 2, · · · .    (15.4.10)

Definition 15.4.1 The difference equation (15.4.9) is asymptotically stable if every solution Y(t) of the homogeneous equation (15.4.10) satisfies limt→∞ Y(t) = 0.

Let m1, · · · , mn be the roots of the characteristic equation:

m^n + a1 m^{n−1} + · · · + an−1 m + an = 0.    (15.4.11)



Theorem 15.4.1 Suppose that the moduli of all roots of the characteristic equation are less than 1. Then the difference equation (15.4.9) is asymptotically stable.

When the following inequality conditions are satisfied, the moduli of all roots of the characteristic equation are less than 1:

\begin{vmatrix} 1 & a_n \\ a_n & 1 \end{vmatrix} > 0, \quad
\begin{vmatrix} 1 & 0 & a_n & a_{n-1} \\ a_1 & 1 & 0 & a_n \\ a_n & 0 & 1 & a_1 \\ a_{n-1} & a_n & 0 & 1 \end{vmatrix} > 0, \quad \cdots, \quad
\begin{vmatrix}
1 & 0 & \cdots & 0 & a_n & a_{n-1} & \cdots & a_1 \\
a_1 & 1 & \cdots & 0 & 0 & a_n & \cdots & a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n-1} & a_{n-2} & \cdots & 1 & 0 & 0 & \cdots & a_n \\
a_n & 0 & \cdots & 0 & 1 & a_1 & \cdots & a_{n-1} \\
a_{n-1} & a_n & \cdots & 0 & 0 & 1 & \cdots & a_{n-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
a_1 & a_2 & \cdots & a_n & 0 & 0 & \cdots & 1
\end{vmatrix} > 0.
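In practice the modulus condition can also be checked by computing the characteristic roots numerically; a sketch (numpy.roots replaces the determinant test, and the example coefficients are assumptions of the illustration):

    import numpy as np

    def is_asymptotically_stable(a):
        """a = [a1, ..., an] from y(t+n) + a1*y(t+n-1) + ... + an*y(t) = r(t)."""
        roots = np.roots([1.0] + list(a))   # roots of m**n + a1*m**(n-1) + ... + an
        return bool(np.all(np.abs(roots) < 1.0))

    print(is_asymptotically_stable([-0.5, 0.06]))  # True: roots 0.2 and 0.3
    print(is_asymptotically_stable([-3.0, 2.0]))   # False: roots 1 and 2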

15.5 Difference Equations with Constant Coefficients

The difference equation with constant coefficients is defined as:

x(t) = Ax(t − 1) + b, (15.5.12)



where x = (x1, · · · , xn)′ and b = (b1, · · · , bn)′. Suppose the matrix A is diagonalizable, with eigenvalues λ1, · · · , λn, and let P be the matrix formed by linearly independent eigenvectors such that

A = P^{-1} \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix} P.

A necessary and sufficient condition for the difference equation (15.5.12) to be (asymptotically) stable is that the moduli of all eigenvalues λi are less than 1. When the moduli of all eigenvalues λi are less than 1, the equilibrium point is x∗ = limt→∞ x(t) = (I − A)^{−1} b.
