Elements of Concave Analysis and Applications
Prem K. Kythe
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Preface
Notations, Definitions, and Acronyms
1 Matrix Algebra
1.1 Definitions
1.2 Properties
1.3 Matrix Inversion
1.3.1 Cofactor Matrix
1.4 Systems of Linear Equations
1.4.1 Solution with the Inverse
1.4.2 Cramer's Rule
1.4.3 Gaussian Elimination Method
1.5 Definite and Semidefinite Matrices
1.6 Special Determinants
1.6.1 Jacobian
1.6.2 Hessian
1.6.3 Bordered Hessian: Two Functions
1.6.4 Bordered Hessian: Single Function
1.7 Exercises
2 Differential Calculus
2.1 Definitions
2.1.1 Limit of a Function at a Point
2.2 Theorems on Limits
2.2.1 Limit at Infinity
2.2.2 Infinite Limits
2.3 Global and Local Extrema of Functions
2.4 First and Second Derivative Tests
2.4.1 Definition of Concavity
2.5 Vector-Valued Functions
2.5.1 Geometric Meaning of the Inflection Point
2.6 Optimization
2.7 Multivariate Functions
2.7.1 Geometric Interpretation
Preface
This textbook on concave analysis aims at two goals. Firstly, it provides sim-
ple yet comprehensive subject matter to the readers who are undergraduate
seniors and beginning graduate students in mathematical economics and busi-
ness mathematics. For most readers the only prerequisites are courses in ma-
trix algebra and differential calculus including partial differentiation; however,
for the last chapter a thorough working knowledge of linear partial differen-
tial equations and the Laplace transforms is required. The readers can omit
this chapter if not required. The subject of the book centers mostly around
concave and convex optimization; other related topics are also included. The
details are provided below in the overview section.
Although there are many excellent books on the market, almost all of
them are difficult to follow: they are heavy on theoretical aspects and
generally fail to provide enough worked-out examples to give readers a
working command of the subject.
The second goal is elucidated below in the section ‘To Readers’.
Motivation
The subject of convexity and quasi-convexity has been a model for economic
theorists to make decisions about cost minimization and revenue maximiza-
tion. This has resulted in a lot of publications in convex optimization. So
why is there keen interest in concave and quasi-concave functions? Firstly,
economic theory dictates that all utility functions are quasi-concave and that
all cost functions are concave in input prices. Therefore, a cost function that
is not concave in input prices is not a cost function. Secondly, the standard
model in economic theory consists in a set of alternatives and an ordering of
these alternatives, according to different priorities and interests. The process
that a decision maker follows is to choose a favorite alternative with the prop-
erty that no other alternative exceeds the ordering. In such a situation the
decision maker often uses a function that ‘represents’ this ordering. Thus, for
example, suppose there are four alternatives, say, a, b, c and d, and suppose
that the decision maker prefers a to b and treats both c and d as equally
desirable. Any function, like f , with f (a) > f (b) > f (c) = f (d), may represent
this ordering.
Overview
A general description of the topics covered in the book is as follows: Chap-
ter 1 introduces a review of matrix algebra that includes definitions, matrix
inversion, solutions of systems of linear algebraic equations, definite and semi-
definite matrices, Jacobian, two types of Hessian matrices, and the Hessian
test. Chapter 2 is a review of calculus, with topics dealing with limits, derivatives, global and local extrema, first and second derivative tests, vector-valued
functions, optimization, multivariate functions, and basic concepts of mathe-
matical economics.
Concave and convex functions are introduced in Chapter 3, starting with
the notion of convex sets, Jensen’s inequalities for both concave and convex
functions, and unconstrained optimization. Chapter 4 deals with concave
programming; it is devoted to optimization problems on maximization mostly
with inequality constraints, and using the Lagrange method of multipliers and
the KKT necessary and sufficient conditions. Applications to mathematical
economics include the topic of peak load pricing, and comparative statics is
discussed. Optimization problems focusing on minimization are introduced
in Chapter 5 on convex programming, in order to compare it with concave
optimization. Nonlinear programming is discussed; the Fritz John and Slater
conditions are presented, and the topic of Lagrangian duality is discussed.
Chapters 6 and 7 deal with quasi-concave and quasi-convex functions.
Both topics are important in their own applications. The single-function
bordered Hessian test on quasi-concavity and quasi-convexity is presented,
and optimization problems with these types of functions and the minmax theorem
are provided. Chapter 8 deals with log-concave functions; general results
on log-concavity are presented, with application on mean residual life; and
the Asplund sum is introduced, with its algebra, derivatives, and area mea-
sure. Log-concavity of nonnegative sequences is also discussed.
To Readers
The second goal concerns specifically the abuse and misuse of a couple of
standard mathematical notations in this field of scientific study. They are
the gradient ∇f and the Laplacian ∇2 f of a function f (x) in Rn . Somehow,
and somewhere, a tradition started to replace the first-order partials of the
function f by its gradient ∇f . It seems that this tradition started without
any rigorous mathematical argument in its support. This book provides
a result (Theorem 2.18) which establishes that only under a specific necessary
condition can the column vector [∂f/∂x1 · · · ∂f/∂xn ]ᵀ replace the gradient
vector ∇f ; these two quantities, although isomorphic to each other, are
not equal. Moreover, it is shown that any indiscriminate replacement between
these two quantities leads to certain incorrect results (§3.5).
The other misuse deals with the Laplacian ∇2 f , which has been used to
represent the Hessian matrix (§1.6.2), without realizing that ∇2 f is the trace
(i.e., sum of the diagonal elements) of the Hessian matrix itself. This abuse
makes a part equal to the whole. Moreover, ∇2 is the well-known linear partial
differential operator of the elliptic type known as the Laplacian.
It appears that this misuse perhaps happened because of the term ‘vector’,
which is used (i) as a scalar quantity, having only magnitude, as in the row
or column vectors (in the sense of a matrix), and (ii) as a physical quantity,
such as force, velocity, acceleration, and momentum, having both magnitude
and direction. The other factor for the abuse in the case of the gradient is
the above-mentioned linear isomorphic mapping between the gradient vector
∇f and the (scalar) column vector [∂f/∂x1 · · · ∂f/∂xn ]ᵀ. This isomorphism
has then been taken literally as 'equality' between these two quantities. Once
the case for ∇f became the tradition, the next choice, ∇²f for the Hessian
matrix, became another obvious, but incorrect, tradition.
As readers, you will find an attention symbol, !!! , at different parts of the
book. It is used to point out the significance of the statements found there.
The other less important notations are the ≺, the ⊕ and the ⊙ symbols.
Although borrowed from physics and astronomy, these symbols are acceptable
with a different but almost similar meaning provided that they are properly
defined as given in the section on Notations. Moreover, the ⊕ and the ⊙
symbols have now become so common due to the advancement in cell phones
and related electronic technology that they are probably losing their rigorous
mathematical significance.
Acknowledgments
I take this opportunity to thank Mr. Sarfraz Khan, Executive Editor, Taylor
& Francis, for his support, and Mr. Callum Fraser for coordinating the book
project. I also thank the Project Editor Michele A. Dimont for doing a great
job of editing the text. Thanks are due to the reviewers and to some of my
colleagues who made some very valuable suggestions to improve the book.
Lastly, I thank my friend Michael R. Schäferkotter for help and advice freely
given whenever needed.
Prem K. Kythe
Notations, Definitions, and Acronyms
Cov, covariance
c.d.f., cumulative distribution function
c(w, y), cost function
Dt , derivative with respect to t
Df (x), derivative of f (x) in Rn
D, aggregated demand
D, domain, usually in the z-plane
dist(A, B), distance between points (or sets) A and B
dom(f ), domain of a function f
DRS, decreasing return to scale
e, expenditure function
E, amount allocated for expenditure
E[X], expected value of a random vector X
E(f ), entropy of f
Eq(s)., Equation(s) (when followed by an equation number)
ei , ith unit vector, i = 1, . . . , n
[e], set of the unit vectors ei in Rn
epi(f ), epigraph of f
e(p, u), expenditure function
F , field
f : X → Y , function f maps the set X into (onto) the set Y
f ◦ g, composite function of f and g: (f ◦ g)(·) = f (g(·))
f ′ , first derivative of f
f ′′ , second derivative of f
f (n) , nth derivative of f
∂f(x)/∂xi , first-order partials of f in Rⁿ, also written fi , for i = 1, . . . , n; also
written as fx , fy , fz for ∂f/∂x, ∂f/∂y, ∂f/∂z in R³
∂²f(x)/(∂xi ∂xj ), second-order partials of f in Rⁿ, also written as fij for i, j = 1, . . . , n;
also written as fxx , fyy , fzz for ∂²f/∂x², ∂²f/∂y², ∂²f/∂z² in R³
(f ◦ g)(x) = f(g(x)), composition of functions f and g
f ⋆ g, convolution of f(t) and g(t) (= ∫₀^t f(t − u) g(u) du = ∫₀^t f(u) g(t − u) du =
L⁻¹{G(s)F(s)})
FJ, Fritz John conditions
F(s), Laplace transform of f(t) (= ∫₀^∞ e^(−st) f(t) dt)
G, government expenditure; constrained set
Gmin , positive minimal accepted level of profit
G(·; ·), Green’s function
∇, 'del' operator, ∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z ((x, y, z) ∈ R³); an operator defined
in Rⁿ as ∇ = e1 ∂/∂x1 + · · · + en ∂/∂xn
∇f , gradient of a function f : a vector in R³ defined by ∇f = i ∂f/∂x + j ∂f/∂y +
k ∂f/∂z; a vector in Rⁿ defined by ∇f = e1 ∂f/∂x1 + · · · + en ∂f/∂xn for
x = (x1 , . . . , xn ) ∈ Rⁿ (written either as a 1 × n row vector or as an n × 1
column vector)
∇², Laplacian operator defined on Rⁿ as ∂²/∂x1² + · · · + ∂²/∂xn²; it is a linear elliptic
partial differential operator
∇²f(x) = ∂²f/∂x1² + · · · + ∂²f/∂xn², x = (x1 , . . . , xn ) ∈ Rⁿ, Laplacian of f(x); also
the trace of the Hessian matrix H
‖x‖₁, l1-norm of a vector x
‖x‖₂, l2-norm, or Euclidean norm, of a vector x
‖x‖∞, l∞-norm of a vector x
≻, ⪰, subordination (predecessor): A ⪰ B, matrix inequality between matrices A and B; A ≻ B, strict matrix inequality between matrices A and B
≺, ⪯, subordination (successor), e.g., f ≺ g is equivalent to f(0) = g(0) and
f(E) ⊂ g(E), where E is the open disk; but here x ≺ y is used for
componentwise strict inequality, and x ⪯ y for componentwise inequality
between vectors x and y
(f ⊕ g)(z) = sup over x + y = z of {f(x) g(y)}, where f and g are log-concave functions
(s ⊙ f)(x) = s f(x/s), where f is a log-concave function, and s > 0
(n choose k), binomial coefficient, = n!/(k! (n − k)!) = (n choose n − k)
≅, isomorphic to; for example, A ≅ B means A is isomorphic to B, and conversely
■, end of a proof, or an example
!!! attention symbol
1 Matrix Algebra
Some basic concepts and results from linear and matrix algebra, and from
finite-dimensional vector spaces are presented. Proofs for most of the results
can be found in many books, for example, Bellman [1970], Halmos [1958],
Hoffman and Kunze [1961], Lipschutz [1968], and Michel and Herget [2007].
1.1 Definitions
A matrix A is a rectangular array of elements (numbers, parameters, or vari-
ables), where the elements in a horizontal line are called rows, and those in
a vertical line columns. The dimension of a matrix is defined by the number
of rows m and the number of columns n, and we say that such a matrix has
dimension m × n, or simply that the matrix is m × n. If m = n, then we have
a square matrix. If the matrix is 1 × n, we call it a row vector, and if the
matrix is m × 1, then it is called a column vector. A matrix that converts the
rows of a matrix A to columns and the columns of A to rows is called the
transpose of A and is denoted by AT .
Let two 3 × 3 matrices A and B be defined as

A = [ a11  a12  a13 ]        [ b11  b12  b13 ]
    [ a21  a22  a23 ] ,  B = [ b21  b22  b23 ] .        (1.1.1)
    [ a31  a32  a33 ]        [ b31  b32  b33 ]
Addition (or subtraction) of B to (or from) A proceeds element by element: b11 is added to (or
subtracted from) a11 in A; b12 to (or from) a12 , and so on. Multiplication of
a matrix by a number or scalar involves multiplication of each element of the
matrix by the scalar, and it is called scalar multiplication, since it scales the
matrix up or down by the size of the scalar.
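The element-by-element addition and the scalar multiplication just described are easy to check numerically; a minimal sketch, assuming the third-party NumPy library (the sample matrices are our own illustrations):

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])

    # Addition and subtraction act element by element.
    print(A + B)     # [[ 6  8] [10 12]]
    print(A - B)     # [[-4 -4] [-4 -4]]

    # Scalar multiplication scales every element by the scalar.
    print(3 * A)     # [[ 3  6] [ 9 12]]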
A row vector A and a column vector B are written, respectively, as

                               [ b11 ]
A = [ a11  a12  a13 ]1×3 , B = [ b21 ]     .
                               [ b31 ] 3×1
With

A = [  3   6  11 ]      ,  B = [ 5  13 ]      ,  E = [ 1  4  7 ]      ,
    [ 12   8   5 ] 2×3         [ 7   8 ]             [ 2  4  9 ] 2×3
                               [ 9  10 ] 3×2

the matrices A and B, and B and E are conformable for multiplication, but
A and C are not conformable. Thus,

AB = [  3·5 + 6·7 + 11·9     3·13 + 6·8 + 11·10 ]   [ 156  197 ]
     [ 12·5 + 8·7 +  5·9    12·13 + 8·8 +  5·10 ] = [ 161  270 ] 2×2 ,

BE = [ 5·1 + 13·2   5·4 + 13·4   5·7 + 13·9 ]   [ 31  72  152 ]
     [ 7·1 +  8·2   7·4 +  8·4   7·7 +  8·9 ] = [ 23  60  121 ]     .
     [ 9·1 + 10·2   9·4 + 10·4   9·7 + 10·9 ]   [ 29  76  153 ] 3×3
Since the two matrices are conformable, their product AP will give the values
V (in dollars) of the stock at each outlet:

         [ 100·220 + 110·65 +  80·114 + 115·168 ]   [  57,590 ]
V = AP = [ 210·220 + 230·65 + 150·114 + 400·168 ] = [ 145,450 ] .
         [ 165·220 +  95·65 +  68·114 + 145·168 ]   [  74,587 ]
         [ 150·220 + 190·65 + 130·114 + 300·168 ]   [ 110,570 ]
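A quick numerical check of this product, with A and P as read off from the computation above (a minimal sketch, assuming NumPy):

    import numpy as np

    # Quantities of the four items held at each of the four outlets.
    A = np.array([[100, 110,  80, 115],
                  [210, 230, 150, 400],
                  [165,  95,  68, 145],
                  [150, 190, 130, 300]])
    # Unit prices of the four items, as a column vector.
    P = np.array([[220], [65], [114], [168]])

    V = A @ P                 # value of stock at each outlet
    print(V.ravel())          # [ 57590 145450  74587 110570]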
1.2 Properties
The following properties of matrices are useful.
1. Matrix addition is commutative and associative, i.e., A + B = B + A, and
(A+B)+C = A+(B+C). These properties also hold for matrix subtraction,
since A − B = A + (−B).
2. Matrix multiplication, with a few exceptions, is not commutative, i.e.,
AB ≠ BA. Scalar multiplication is commutative, i.e., cA = Ac. If three or
more matrices are conformable, i.e., if Aj×k , Bm×n , Cp×q , where k = m and
n = p, the associative law applies as long as matrices are multiplied in their
order of conformability. Thus, (AB)C = A(BC). Under the same conditions
matrix multiplication is also distributive, i.e., A(B + C) = AB + AC.
Example 1.3. Given

    [ 7  4 ]                               [ 7 ]
A = [ 1  5 ]      ,  B = [ 4  3  10 ]    , C = [ 8 ]     ,
    [ 8  9 ] 3×2         [ 3  5   6 ] 2×3     [ 9 ] 3×1

we get

     [ 7·4 + 4·3   7·3 + 4·5   7·10 + 4·6 ]   [ 40  41   94 ]
AB = [ 1·4 + 5·3   1·3 + 5·5   1·10 + 5·6 ] = [ 19  28   40 ]     ;
     [ 8·4 + 9·3   8·3 + 9·5   8·10 + 9·6 ]   [ 59  69  134 ] 3×3

        [ 40  41   94 ]     [ 7 ]       [ 40·7 + 41·8 +  94·9 ]   [ 1454 ]
(AB)C = [ 19  28   40 ]     [ 8 ]     = [ 19·7 + 28·8 +  40·9 ] = [  717 ]     ;
        [ 59  69  134 ] 3×3 [ 9 ] 3×1   [ 59·7 + 69·8 + 134·9 ]   [ 2171 ] 3×1

                          [ 7 ]
BC = [ 4  3  10 ]         [ 8 ]     = [ 4·7 + 3·8 + 10·9 ]   [ 142 ]
     [ 3  5   6 ] 2×3     [ 9 ] 3×1   [ 3·7 + 5·8 +  6·9 ] = [ 115 ] 2×1 ;

        [ 7  4 ]                    [ 7·142 + 4·115 ]   [ 1454 ]
A(BC) = [ 1  5 ]      [ 142 ]     = [ 1·142 + 5·115 ] = [  717 ]     .
        [ 8  9 ] 3×2  [ 115 ] 2×1   [ 8·142 + 9·115 ]   [ 2171 ] 3×1
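The associativity just verified by hand, (AB)C = A(BC), can also be confirmed numerically; a minimal sketch, assuming NumPy:

    import numpy as np

    A = np.array([[7, 4], [1, 5], [8, 9]])     # 3 x 2
    B = np.array([[4, 3, 10], [3, 5, 6]])      # 2 x 3
    C = np.array([[7], [8], [9]])              # 3 x 1

    left  = (A @ B) @ C
    right = A @ (B @ C)
    print(left.ravel())                  # [1454  717 2171]
    print(np.array_equal(left, right))   # True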
3. An identity matrix I is a square matrix whose diagonal elements are all
1 and all remaining elements are 0. An n × n identity matrix is sometimes
denoted by In . The identity matrix I is the unity in matrix algebra just as
the numeral 1 is the unity in algebra. Thus, the multiplication of a matrix
by an identity matrix leaves the original matrix unchanged; so also the multi-
plication of an identity matrix by itself leaves the identity matrix unchanged.
Hence, AI = IA = A, and I·I = I² = I.
4. A matrix A for which A = AT is called a symmetric matrix. A symmetric
matrix A for which A × A = A is an idempotent matrix. The identity matrix
I is both symmetric and idempotent.
5. A null matrix 0 is composed of all 0s and can have any dimension; it is not
necessarily square. Obviously, addition or subtraction of null matrices leaves
the original matrix unchanged; multiplication by a null matrix yields a null
matrix. A scalar zero 0 has dimension 1 × 1.
6. A matrix with zero elements everywhere below (or above) the principal
diagonal is called an upper (or lower) triangular matrix, also known as upper or
lower echelon form. Thus,

[ a11  a12  ···  a1,n−1  a1n ]        [ a11   0    ···    0       0  ]
[  0   a22  ···  a2,n−1  a2n ]   or   [ a21  a22   ···    0       0  ]
[ ···  ···  ···   ···    ··· ]        [ ···  ···   ···   ···     ··· ]
[  0    0   ···    0     ann ]        [ an1  an2   ···  an,n−1   ann ]
For a 2 × 2 matrix A, the determinant is

|A| = | a11  a12 | = a11 a22 − a12 a21 .        (1.3.1)
      | a21  a22 |
where

|M11| = | a22  a23 | ,  |M12| = | a21  a23 | ,  |M13| = | a21  a22 | ,
        | a32  a33 |            | a31  a33 |            | a31  a32 |

where |M11| is the minor of a11, |M12| the minor of a12, and |M13| the minor
of a13. A cofactor |Cij| is a minor with a prescribed sign, which follows the
rule

|Cij| = (−1)^(i+j) |Mij| .        (1.3.5)
Thus, depending on an even or odd power of (−1) we have
if i + j is an even number, then |Cij | = |Mij |,
if i + j is an odd number, then |Cij | = −|Mij |.
The inverse matrix A⁻¹ exists only if A is a square matrix with |A| ≠ 0. Multiplying
a matrix by its inverse yields the identity matrix; this property is
similar to the property of the reciprocal in algebra. The inverse matrix can
be obtained using the formula

A⁻¹ = (1/|A|) adj A ,   |A| ≠ 0.        (1.3.9)
Example 1.5. Consider the matrix

    [  3   2  −4 ]
A = [ −2   5   1 ] .
    [  3  −2   7 ]

Then |A| = 3(35 + 2) − 2(−14 − 3) − 4(4 − 15) = 111 + 34 + 44 = 189, and the cofactor matrix is

    [ 37  17  −11 ]
C = [ −6  33   12 ] .
    [ 22   5   19 ]

Then the transpose of the cofactor matrix C gives the adjoint matrix:

             [  37  −6  22 ]
adj A = Cᵀ = [  17  33   5 ] ,
             [ −11  12  19 ]

so that A⁻¹ = (1/189) adj A.
To check the answer, evaluate both AA−1 and A−1 A; both of these products
should be equal to I.
Example 1.6. This is a very useful time-saving result. Given

A = [ a  b ] ,
    [ c  d ]

find adj A and A⁻¹. Assuming |A| = ad − bc ≠ 0, the cofactor matrix of A is

C = [  d  −c ] ,
    [ −b   a ]

and thus,

adj A = Cᵀ = [  d  −b ] .
             [ −c   a ]

Hence,

A⁻¹ = (1/|A|) adj A = 1/(ad − bc) [  d  −b ] .        (1.3.10)
                                  [ −c   a ]
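Formula (1.3.10) translates directly into code; a minimal sketch (the function name is ours, not the book's), using exact fractions:

    from fractions import Fraction

    def inverse_2x2(a, b, c, d):
        """Return the inverse of [[a, b], [c, d]] via formula (1.3.10)."""
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is singular")
        # A^{-1} = (1/|A|) adj A, with adj A = [[d, -b], [-c, a]]
        return [[Fraction(d, det), Fraction(-b, det)],
                [Fraction(-c, det), Fraction(a, det)]]

    print(inverse_2x2(5, 8, 4, 9))   # [[9/13, -8/13], [-4/13, 5/13]]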
1.4 Systems of Linear Equations
A system of linear equations can be expressed in the form of a matrix equation.
For example, consider the following system of n linear equations:
Ax = b,        (1.4.2)

where

A = [ a11  a12  ···  a1n ]        [ x1 ]        [ b1 ]
    [ a21  a22  ···  a2n ] ,  x = [ x2 ] ,  b = [ b2 ] ,        (1.4.3)
    [ ···  ···  ···  ··· ]        [ ⋮  ]        [ ⋮  ]
    [ an1  an2  ···  ann ]        [ xn ]        [ bn ]
where A is called the coefficient matrix, x the solution vector, and b the
vector of constant terms. Note that x and b are always column vectors.
Example 1.6. Consider
5x1 + 8x2 = 42
4x1 + 9x2 = 39.
Thus,

Ax = [ 5  8 ] [ x1 ]   [ 5x1 + 8x2 ]
     [ 4  9 ] [ x2 ] = [ 4x1 + 9x2 ] 2×1 ,
Given Ax = b,
provided that A⁻¹ exists, we multiply both sides of this equation by A⁻¹,
following the laws of conformability:

A⁻¹ₙₓₙ Aₙₓₙ xₙₓ₁ = A⁻¹ₙₓₙ bₙₓ₁,

which gives

xₙₓ₁ = A⁻¹ₙₓₙ bₙₓ₁.        (1.4.4)
Then

adj A = Cᵀ = [  9  −8 ] ,  A⁻¹ = (1/13) [  9  −8 ] = [  9/13  −8/13 ] ,
             [ −4   5 ]                 [ −4   5 ]   [ −4/13   5/13 ]

so that

x = A⁻¹ b = (1/13) [  9·42 − 8·39 ] = [ 66/13 ] ,
                   [ −4·42 + 5·39 ]   [ 27/13 ]

which gives x1 = 66/13 ≈ 5.08 and x2 = 27/13 ≈ 2.08.
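The same computation, x = A⁻¹b, can be reproduced numerically; a minimal sketch, assuming NumPy:

    import numpy as np

    A = np.array([[5.0, 8.0], [4.0, 9.0]])
    b = np.array([42.0, 39.0])

    x = np.linalg.inv(A) @ b   # in practice np.linalg.solve(A, b) is preferred
    print(x)                   # [5.0769... 2.0769...], i.e., 66/13 and 27/13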
1.4.2 Cramer's Rule. To solve the system Ax = b, Cramer's rule is as
follows: Let D denote the coefficient matrix A, and Di the matrix obtained
from D by replacing its ith column by the column vector b. Then

xi = |Di| / |D| ,   i = 1, 2, . . . , n.        (1.4.5)
For example, consider the system

x − 2z = 3
−y + 3z = 1
2x + 5z = 0.

We have

|D| = | 1   0  −2 |
      | 0  −1   3 | = −9 ≠ 0;
      | 2   0   5 |

|Dx| = | 3   0  −2 |            | 1  3  −2 |            | 1   0  3 |
       | 1  −1   3 | = −15; |Dy| = | 0  1   3 | = 27; |Dz| = | 0  −1  1 | = 6.
       | 0   0   5 |            | 2  0   5 |            | 2   0  0 |

Thus, x = |Dx|/|D| = 5/3, y = |Dy|/|D| = −3, and z = |Dz|/|D| = −2/3.
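Cramer's rule (1.4.5) is easy to script; a minimal sketch, assuming NumPy (the helper name is ours):

    import numpy as np

    def cramer(A, b):
        """Solve Ax = b by Cramer's rule (1.4.5); assumes |A| != 0."""
        D = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Di = A.copy()
            Di[:, i] = b          # replace the ith column of A by b
            x[i] = np.linalg.det(Di) / D
        return x

    A = np.array([[1.0, 0.0, -2.0],
                  [0.0, -1.0, 3.0],
                  [2.0, 0.0, 5.0]])
    b = np.array([3.0, 1.0, 0.0])
    print(cramer(A, b))    # [ 1.6667 -3.     -0.6667], i.e., 5/3, -3, -2/3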
x − 2y + 3z = 4
2x + y − 4z = 3
−3x + 4y − z = −2.
In order to reduce the matrix to the upper triangular form, first, any nonzero
coefficient (usually the largest in absolute value) is located and brought to the
pivot position. Eliminating x from the second and third equations, and then y
from the third, reduces the system to

x − 2y + 3z = 4
y − 2z = −1
4z = 8.

Back-substitution then gives z = 2, y = 3, and x = 4.
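For comparison, a bare-bones elimination/back-substitution routine (a sketch; it pivots by largest absolute value, as the text recommends):

    import numpy as np

    def gauss_solve(A, b):
        """Solve Ax = b by Gaussian elimination with partial pivoting."""
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        for k in range(n - 1):
            p = k + np.argmax(np.abs(A[k:, k]))    # pivot row
            A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
            for i in range(k + 1, n):
                m = A[i, k] / A[k, k]
                A[i, k:] -= m * A[k, k:]
                b[i] -= m * b[k]
        x = np.empty(n)
        for i in range(n - 1, -1, -1):             # back-substitution
            x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

    A = np.array([[1, -2, 3], [2, 1, -4], [-3, 4, -1]])
    b = np.array([4, 3, -2])
    print(gauss_solve(A, b))    # [4. 3. 2.]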
x − 2z + 2w = 1
−2x + 3y + 4z = 1
y + z − w = 0
3x + y − 2z − w = 3.

Proceeding as before, this system reduces to the echelon form
x − 2z + 2w = 1
y + z − w = 0
−3z + 7w = 1
w = 1,

and back-substitution gives w = 1, z = 2, y = −1, and x = 3.
where tr(A) = a11 + a22 is the trace of A. In this case, Eq (1.5.3) is |A − ri I| = 0, or

| a11 − r     a12    |
|   a21    a22 − r   | = 0,

solving which we have r² − (a11 + a22) r + (a11 a22 − a12 a21) = 0, or in matrix
notation,

r² − tr(A) r + |A| = 0.
A symmetric matrix A is positive (negative) semidefinite if, for every vector z,

z · Az ≥ (≤) 0.
1.6.1 Jacobian. Consider a system of n functions in m variables:

f1 = f1(x1, x2, . . . , xm)
f2 = f2(x1, x2, . . . , xm)
⋮
fn = fn(x1, x2, . . . , xm),        (1.6.1)
For example, for the functions

y1 = 5x1 − 4x2
y2 = 25x1² − 40x1 x2 + 16x2²,

the Jacobian is

|J| = |      5              −4       |
      | 50x1 − 40x2   −40x1 + 32x2   |
    = 5(−40x1 + 32x2) + 4(50x1 − 40x2) = −200x1 + 160x2 + 200x1 − 160x2 = 0.

Since |J| = 0, the two functions are functionally dependent; indeed, y2 = y1².
1.6.2 Hessian. The Hessian H of a twice-differentiable function f(x1, . . . , xn) is
the n × n matrix of its second-order partial derivatives, defined componentwise by

Hi,j = ∂²f / (∂xi ∂xj).

The determinant det H is denoted by |H|. Generally, the function for which the Hessian is used is obvious from the
context.

Like any matrices, Hessians add commutatively, associatively, and
distributively, and multiplication of a Hessian by a scalar and the product of
two Hessians are defined in the usual way.
!!! Some authors denote the Hessian H of a twice-differentiable function
f = f(x1, . . . , xn) incorrectly as ∇²f(x). This notation is in conflict with
the established notation for the Laplacian of a twice-differentiable function
f = f(x1, . . . , xn), denoted by ∇²f and defined by ∇²f = ∂²f/∂x1² + · · · + ∂²f/∂xn²
(see the Notations section). Note also that for a twice continuously differentiable f
the mixed second-order partials are symmetric:

∂/∂xi (∂f/∂xj) = ∂/∂xj (∂f/∂xi).
This is the Hessian test for second-order derivatives, and for a symmetric
(2 × 2) positive definite matrix it is defined by

|H| = | fxx  fxy |
      | fyx  fyy | .        (1.6.4)
If the first element on the principal diagonal, known as the first principal
minor and denoted by |H1| = fxx, is positive, and the second principal minor
|H2| = |H| is also positive, then the second-order conditions for a minimum are met. When |H1| > 0
and |H2| > 0, the Hessian |H| is called positive definite. A positive definite
Hessian satisfies the second-order conditions for a minimum.
If the first principal minor |H1 | = fxx < 0 and |H2 | > 0, the Hessian |H| is
called negative definite. A negative definite Hessian satisfies the second-order
conditions for a maximum.
Example 1.14. Consider f = 3x² − xy + 2y² − 4x − 7y + 12. Then
fxx = 6, fxy = −1, fyy = 4. Thus,

|H| = |  6  −1 |
      | −1   4 | ,
which gives |H1 | = 6 > 0 and |H2 | = |H| = 24 − 1 = 23 > 0. Hence, the
Hessian is positive definite, and f is minimized at the critical values, which
are given by the solution of fx = 6x − y − 4 = 0, fy = −x + 4y − 7 = 0, i.e.,
at x∗ = 1, y ∗ = 2.
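For larger Hessians the same test can be automated by checking the leading principal minors; a minimal sketch, assuming NumPy (the helper name is ours):

    import numpy as np

    def leading_principal_minors(H):
        """Return |H1|, |H2|, ..., |Hn| of a square matrix H."""
        n = H.shape[0]
        return [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]

    H = np.array([[6.0, -1.0], [-1.0, 4.0]])    # Hessian of Example 1.14
    minors = leading_principal_minors(H)
    print(minors)                                # [6.0, 23.0]
    print(all(m > 0 for m in minors))            # True -> positive definite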
1.6.3 Bordered Hessian: Two Functions. For a function F(x, y) optimized subject
to a constraint g(x, y), the bordered Hessian is

|H̄| = | Fxx  Fxy  gx |           | 0   gx   gy  |
      | Fyx  Fyy  gy | ,  or     | gx  Fxx  Fxy | ,        (1.6.6)
      | gx   gy   0  |           | gy  Fyx  Fyy |

i.e., the plain Hessian

|H| = | Fxx  Fxy |
      | Fyx  Fyy | ,
bordered by the first derivatives of the constraint with zero on the principal
diagonal. The order of a bordered Hessian is determined by the order of
the principal minor being bordered. Thus, |H̄| in (1.6.6) represents a second
bordered principal minor |H̄2 |, because the principal minor being bordered is
2 × 2.
In general, for a function F (x1 , x2 , . . . , xn ) in n variables subject to the
constraint g(x1 , x2 , . . . , xn ), the bordered Hessian, defined by (1.6.5), can also
be expressed as
|H̄| = | F11  F12  ···  F1n  g1 |          | 0   g1   g2   ···  gn  |
      | F21  F22  ···  F2n  g2 |          | g1  F11  F12  ···  F1n |
      | ···  ···  ···  ···  ···| ,  or    | g2  F21  F22  ···  F2n | ,        (1.6.7)
      | Fn1  Fn2  ···  Fnn  gn |          | ··· ···  ···  ···  ··· |
      | g1   g2   ···  gn   0  |          | gn  Fn1  Fn2  ···  Fnn |
where the n × n principal minor is being bordered. If all the principal minors
are negative, i.e., if |H̄2 |, |H̄3 |, . . . , |H̄n | < 0, then the bordered Hessian is
positive definite, and a positive definite Hessian always satisfies the sufficient
condition for a relative (local) minimum. Similarly, if the principal minors
alternate in sign from positive to negative, i.e., if |H̄2| > 0, |H̄3| < 0, |H̄4| > 0,
and so on, then the bordered Hessian is negative definite, and a negative
definite Hessian always satisfies the sufficient condition for F to be concave
and have a relative (local) maximum.
If |H| = 0 and |H1 | = 0 = |H2 |, then H is not negative definite, but it
is negative semidefinite with |H1 | ≤ 0 and |H2 | = |H| ≥ 0. However, for
the semidefinite test, we must check the signs of these discriminants with
adj H. Then if |H1 | < 0 and |H2 | = |H| = 0, both discriminants are negative
semidefinite, and it satisfies the sufficient condition for F to be concave and
have a relative (local) maximum.
1.6.4 Bordered Hessian: Single Function. For a single function f(x1, . . . , xn),
the bordered Hessian is

|B̄| = | 0   f1   f2   ···  fn  |
      | f1  f11  f12  ···  f1n |
      | f2  f21  f22  ···  f2n | .        (1.6.13)
      | ··· ···  ···  ···  ··· |
      | fn  fn1  fn2  ···  fnn |
Note that the bordered Hessian |B̄| is composed of the first derivatives of
the function f rather than of an extraneous constraint g. The leading principal
minors are

|B̄1| = | 0   f1  |          | 0   f1   f2  |
       | f1  f11 | , |B̄2| = | f1  f11  f12 | , . . . , |B̄n| = |B̄|.        (1.6.14)
                            | f2  f21  f22 |
where the partial derivatives are evaluated in the nonnegative orthant. Note
that the first condition in (1.6.15) is automatically satisfied, since

|B̄1| = −f1² = −(∂f/∂x1)² ≤ 0.
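Building |B̄| from a gradient and a Hessian is mechanical; a minimal sketch, assuming NumPy (the helper and the sample values below are our own illustrations):

    import numpy as np

    def bordered_hessian(grad, H):
        """Build the bordered Hessian of (1.6.13): gradient as border, 0 corner."""
        n = len(grad)
        B = np.zeros((n + 1, n + 1))
        B[0, 1:] = grad
        B[1:, 0] = grad
        B[1:, 1:] = H
        return B

    grad = np.array([1.0, 2.0])                  # hypothetical f1, f2
    H = np.array([[-2.0, 0.0], [0.0, -3.0]])     # hypothetical fij
    B = bordered_hessian(grad, H)
    print(np.linalg.det(B[:2, :2]))    # |B1| = -f1^2 = -1.0, always <= 0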
1.7 Exercises
1.1. Use any method to solve each system of equations. If the system has
no solution, mark it as inconsistent.

(i)  x − 2y + 3z = 7          (ii) x + y − z = 6
     2x + y + z = 4                3x − 2y + z = −5
     −3x + 2y − 2z = −10.          x + 3y − 2z = 14.

(iii) 2x − 2y − 2z = 2        (iv) x + 2y − z = −3
      2x + 3y + z = 2              2x − 4y + z = −7
      3x + 2y = 0.                 −2x + 2y − 3z = 4.
= −[a11 (a22 a33 − a32 a23) − a12 (a21 a33 − a31 a23) + a13 (a21 a32 − a31 a22)]

    | a11  a12  a13 |
= − | a21  a22  a23 | .
    | a31  a32  a33 |
1.4. Given the column vector u and the row vector v, find the matrix uv:
(a) u = [3 2 5 4]ᵀ₄ₓ₁, and v = [2 7 4 8]₁ₓ₄.

          [  6  21  12  24 ]
Ans. uv = [  4  14   8  16 ]
          [ 10  35  20  40 ]
          [  8  28  16  32 ] 4×4 .

(b) u = [4 7 8 1]₁ₓ₄, and v = [2 9 7]ᵀ₃ₓ₁. Ans. The matrix uv is
not defined and the vectors cannot be multiplied.
1.5. Consider the system of equations

a1 x + b1 y = c1 ,
a2 x + b2 y = c2 .

If D = a1 b2 − a2 b1 ≠ 0, use the matrix method to show that the solution is

x = (c1 b2 − c2 b1)/D ,   y = (a1 c2 − a2 c1)/D .
Solution. If a1 ≠ 0, then

[ a1  b1 | c1 ]    [ 1  b1/a1 | c1/a1 ]
[ a2  b2 | c2 ] →  [ a2   b2  |  c2   ]

→  [ 1         b1/a1         |        c1/a1          ]
   [ 0  (−a2 b1 + b2 a1)/a1  |  (−a2 c1 + c2 a1)/a1  ]

→  [ 1  b1/a1 | c1/a1 ]
   [ 0    1   | (−a2 c1 + c2 a1)/(−a2 b1 + b2 a1) ]

→  [ 1  0 | (−b1 c2 + b2 c1)/(−a2 b1 + b2 a1) ]
   [ 0  1 | (−a2 c1 + c2 a1)/(−a2 b1 + b2 a1) ] ,

which yields

x = (c1 b2 − c2 b1)/(a1 b2 − a2 b1) = (1/D)(c1 b2 − c2 b1),
y = (a1 c2 − a2 c1)/(a1 b2 − a2 b1) = (1/D)(a1 c2 − a2 c1).

If a1 = 0, then

[ 0   b1 | c1 ]    [ a2  b2 | c2 ]    [ 1  b2/a2 | c2/a2 ]
[ a2  b2 | c2 ] →  [ 0   b1 | c1 ] →  [ 0   b1   |  c1   ]

→  [ 1  b2/a2 | c2/a2 ]    [ 1  0 | c2/a2 − (b2 c1)/(a2 b1) = (c1 b2 − c2 b1)/(−a2 b1) ]
   [ 0    1   | c1/b1 ] →  [ 0  1 | c1/b1 = −a2 c1/(−a2 b1)                            ] ,

which yields x = (c1 b2 − c2 b1)/D and y = −a2 c1/D, since here D = −a2 b1.
1.6. Given a and x as vectors, ei as the unit column vector with ith
element 1 and all other elements zero, [e] = [e1 , . . . , en ] as the vector
with ith element ei for all i = 1, . . . , n, and A and H as square matrices
in Rⁿ, verify the dimensions in the following table.

Expression   Dimension        Expression   Dimension
x            n × 1            a            n × 1
A            n × n            H            n × n

† depending on whether ∇f is written as e1 ∂f/∂x1 + · · · + en ∂f/∂xn , or as
(∂f/∂x1) e1 + · · · + (∂f/∂xn) en .
1.7. Determine the rank of the following matrices:

        [ −2   7  3 ]
(a) A = [  1   6  4 ] .  Ans. |A| = −153 ≠ 0. Thus, the matrix A is
        [  3  −8  5 ]

nonsingular and the three rows and three columns are linearly independent.
Hence, ρ(A) = 3.
        [  7   −2    5 ]
(b) B = [  2   10   −4 ] .  Ans. |B| = 0. Hence the matrix B is
        [ −3  −15    6 ]

singular, and the three rows and three columns are not linearly independent.
Hence, ρ(B) ≠ 3. However, the 2 × 2 submatrix in the upper left corner gives

| 7  −2 |
| 2  10 | = 74 ≠ 0.

Thus, ρ(B) = 2. Note that row 3 is −1.5 times row 2.
1.8. If two rows of a 3 × 3 determinant are equal, then the value
of the determinant is zero.

Solution.

| a11  a12  a13 |
| a21  a22  a23 | = a11 (a22 a13 − a12 a23) − a12 (a21 a13 − a11 a23) + a13 (a21 a12 − a11 a22)
| a11  a12  a13 |

= a11 a22 a13 − a11 a12 a23 − a12 a21 a13 + a11 a12 a23 + a12 a13 a21 − a11 a13 a22 = 0.
1.9. Show that the matrix A = [ 4  6 ]
                              [ 2  3 ]  has no inverse.

Solution 1.

[ A | I ] = [ 4  6 | 1  0 ]    [ 1  3/2 | 1/4  0 ]    [ 1  3/2 |  1/4  0 ]
            [ 2  3 | 0  1 ] →  [ 2   3  |  0   1 ] →  [ 0   0  | −1/2  1 ] .

Since this reduced form of the matrix [A | I] shows that the identity matrix I
cannot appear on the left of the vertical bar, the matrix A has no inverse.

Solution 2. Since |A| = 0, the matrix A has no inverse, in view of formula
(1.3.10).
1.10. Consider the system

[ a11  a12 ] [ xt ]   [ a13  a14 ] [ xt−1 ]   [ b1 ]
[ a21  a22 ] [ yt ] = [ a23  a24 ] [ yt−1 ] + [ b2 ] ,

or in matrix form

A xt = E xt−1 + b.        (1.6.11)

Substituting these into the homogeneous form of the given equation, we get

E ki Ci (ri)^(t−1) − A ki Ci (ri)^t = 0,
10x + 7y + 8z + 7w = 32
−5y − 2z − 7w = −14
−(29/3) y − 7z − (19/3) w = −23
−11y − 12z − 7w = −30.
Now eliminate y from the second and third equations, and also from the third
and fourth equations, and then z from the third and fourth equations. This
gives the echelon form, from which we first find w, then z, then y, and finally
x. Ans. x = y = z = w = 1.
1.13. Use the Hessian to determine whether the function F(x, y, z) =
2x² − 7x − xy + 5y² − 3y + 4yz + 6z² + 3z − 4xz is minimized or maximized
at the critical points. Solution. The first-order criterion gives

Fx = 4x − y − 4z − 7 = 0,  Fy = −x + 10y + 4z − 3 = 0,  Fz = −4x + 4y + 12z + 3 = 0.

Using Cramer's method, we get |A| = 4(120 − 16) + 1(−12 + 16) − 4(−4 + 40) = 276, and

|A1| = |  7  −1  −4 |          |  4   7  −4 |          |  4  −1   7 |
       |  3  10   4 | , |A2| = | −1   3   4 | , |A3| = | −1  10   3 | ,
       | −3   4  12 |          | −4  −3  12 |          | −4   4  −3 |

so that x* = |A1|/|A|, y* = |A2|/|A|, z* = |A3|/|A|. The second-order criterion
uses the Hessian

|H| = |  4  −1  −4 |
      | −1  10   4 | .
      | −4   4  12 |

Since |H1| = 4 > 0, |H2| = 4·10 − 1 = 39 > 0, and |H3| = |H| = 276 > 0,
we find that |H| is positive definite, which means that F(x, y, z) is minimized
at the critical points.
1.14. Find the characteristic roots of the matrix A = [ −8   4 ]
                                                      [  4  −8 ] . Ans.
Solving |A − r I| = 0, we get

|A − r I| = | −8 − r     4    |
            |    4    −8 − r  | = r² + 16r + 48 = 0,
which gives r = −4, −12. Also note that since both characteristic roots are
negative, the matrix A is negative definite. Also, check that the trace (sum
of the principal diagonal) of the matrix A must be equal to the sum of the
two characteristic roots.
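Both checks can be done numerically; a minimal sketch, assuming NumPy:

    import numpy as np

    A = np.array([[-8.0, 4.0], [4.0, -8.0]])
    roots = np.linalg.eigvals(A)
    print(np.sort(roots))             # [-12.  -4.]
    print(np.trace(A), roots.sum())   # both -16, as the trace test requires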
1.15. Find the inverse of the matrix A = [ 25  61  −12 ]
                                         [ 18  −2    4 ] .
                                         [  8  35   21 ]

Ans. (entries rounded to two decimals)

A⁻¹ = [  0.01   0.05  −0.01 ]
      [  0.01  −0.02   0.01 ] .
      [ −0.02   0.01   0.03 ]
1.17. The equilibrium conditions for two related goods are given by 6P1 −
9P2 = 42, −P1 + 7P2 = 66, where P1 and P2 are respective prices. Find the
equilibrium prices P1 and P2 .
Solution. The system of equations in matrix form is

Ax = [  6  −9 ] [ P1 ]   [ 42 ]
     [ −1   7 ] [ P2 ] = [ 66 ] = b.

Then |A| = 33 and

adj A = [ 7  9 ] ,  so  A⁻¹ = (1/|A|) adj A = (1/33) [ 7  9 ] .
        [ 1  6 ]                                     [ 1  6 ]

This yields

[ P1 ]                  [ 7·42 + 9·66 ]   [ 888/33 ]   [ 26.91 ]
[ P2 ] = A⁻¹ b = (1/33) [ 1·42 + 6·66 ] = [ 438/33 ] ≈ [ 13.27 ] .
1.18. The Bayou Steel Company produces stainless steel and aluminum
containers. On a typical day, they manufactured 750 steel containers with
10-gallon capacity, 500 with 5-gallon capacity, and 600 with 1-gallon capacity.
On the same day they manufactured 900 aluminum containers with 10-gallon
capacity, 700 with 5-gallon capacity, and 1100 with 1-gallon capacity. (a)
Represent the above data as two different matrices. (b) If the amount of the
material used in the 10-gallon capacity containers is 20 pounds, that used in
5-gallon containers is 12 pounds, and that in 1-gallon containers is 5 pounds,
find the matrix representing the amount of material. (c) If the stainless steel
costs $0.25 per pound and aluminum costs $0.10 per pound, find the matrix
representing cost. (d) Find the total cost of the day’s production.
Solution. (a) The data can be represented as

[ 750  500   600 ]          [ 750   900 ]
[ 900  700  1100 ] 2×3 , or [ 500   700 ]     ;
                            [ 600  1100 ] 3×2

(b) the amount of material:

[ 750  500   600 ]      [ 20 ]       [ 24000 ]
[ 900  700  1100 ] 2×3  [ 12 ]     = [ 31900 ] 2×1 ;
                        [  5 ] 3×1

(c) the cost matrix is [ 0.25  0.10 ]1×2 ; and (d) the total cost of the day's production is

[ 0.25  0.10 ]1×2 [ 24000 ]     = 0.25 · 24000 + 0.10 · 31900 = $9,190.
                  [ 31900 ] 2×1
1.19. Use the Hessian to determine whether the function F(x, y, z) =
−4x² + 9x + xz − 2y² + 3y + 2yz − 6z² is minimized or maximized at the
critical points. Solution. The first-order criterion gives

Fx = −8x + z + 9 = 0,  Fy = −4y + 2z + 3 = 0,  Fz = x + 2y − 12z = 0,

so that, by Cramer's method,

|A1| = | −9   0    1 |          | −8  −9    1 |          | −8   0  −9 |
       | −3  −4    2 | , |A2| = |  0  −3    2 | , |A3| = |  0  −4  −3 | ,
       |  0   2  −12 |          |  1   0  −12 |          |  1   2   0 |

with Hessian

|H| = | −8   0    1 |
      |  0  −4    2 | .
      |  1   2  −12 |

Since |H1| = −8 < 0, |H2| = (−8)(−4) − 0 = 32 > 0, and |H3| = |H| = −348 < 0,
|H| is negative definite, which means that F(x, y, z) is maximized at the
critical points.
1.20. Maximize the total profit function P for a manufacturing firm pro-
ducing two related goods, in quantities x and y, so that the demand functions
are defined by P1 = 60−4x−2y, and P2 = 40−x−4y, and the total cost func-
tion TC = 4x² + xy + 2y². Solution. Let P = TR − TC, where TR = P1 x + P2 y.
Thus, P = (60 − 4x − 2y)x + (40 − x − 4y)y − (4x² + xy + 2y²) = 60x − 8x² − 4xy + 40y − 6y²,
and the first-order conditions Px = 60 − 16x − 4y = 0, Py = 40 − 4x − 12y = 0
give the system 16x + 4y = 60, 4x + 12y = 40.

Using Cramer's rule, |A| = 176, |A1| = 560, |A2| = 400, thus giving x* = 560/176 ≈
3.18, y* = 400/176 ≈ 2.27. The second-order derivatives are Pxx = −16, Pyy = −12,
Pxy = −4 = Pyx , and the Hessian is

|H| = | −16   −4 |
      |  −4  −12 | = 176 > 0,

and |H1| = −16 < 0. Thus, |H| is negative definite, and P is maximized at
(x*, y*).
2 Differential Calculus
Some basic concepts and results from real analysis are presented. The topics
include limit theorems, differentiation, criterion for concavity and related the-
orems, and vector-valued functions. Proofs for almost all of the results can be
found in many textbooks on calculus, e.g., Boas [1996], Hardy [1967], Royden
[1968], and Rudin [1976].
2.1 Definitions
A function f is a rule which assigns to each value of a variable x, called the
argument of the function, one and only one value y = f (x) known as the value
of the function at x. The domain of a function f , denoted dom(f ), is the set
of all possible values of x; the range of f , denoted by R(f ), is the set of all
possible values for f (x). Examples of functions are:
Linear function: f(x) = mx + b.
Quadratic function: f(x) = ax² + bx + c, a ≠ 0.
Polynomial function of degree n: f(x) = an xⁿ + a(n−1) x^(n−1) + · · · + a1 x + a0 ,
an ≠ 0, n a nonnegative integer.
Rational function: f(x) = g(x)/h(x), where g(x) and h(x) are both polynomials
and h(x) ≠ 0.
Power function: f(x) = a xⁿ, where n is any real number.
2.1.1 Limit of a Function at a Point. Let a function f be defined through-
out an open interval containing a, except possibly at a itself. Then the limit
of f (x) as x approaches a is L, i.e.,
lim f (x) = L, (2.1.1)
x→a
if for every ε > 0 there corresponds a δ > 0 such that |f(x) − L| < ε whenever
0 < |x − a| < δ. In other words, lim f(x) = L as x → a means that for every ε > 0,
however small, f(x) stays within ε of L for all x ≠ a sufficiently close to a.
(f) lim (x→0) 2/x, x ≠ 0. Note that lim (x→0+) 2/x = ∞, and lim (x→0−) 2/x = −∞. Since the
value of the limit is not unique as x → 0 from either right or left, this limit
does not exist.
Theorem 2.2. If a > 0 and n is a positive integer, or if a < 0 and n is an odd
positive integer, then lim (x→a) ⁿ√x = ⁿ√a.
2.2.2 Infinite Limits lim (x→a) f(x) = ±∞. If f(x) exists on an open interval
containing a, except possibly at x = a, then f(x) becomes infinite (or increases
without bound), written as lim (x→a) f(x) = ∞, if, for every positive number N,
there corresponds a δ > 0 such that f(x) > N whenever 0 < |x − a| < δ.
Sometimes we say that f(x) becomes positively infinite as x approaches
a. A similar definition for lim (x→a) f(x) = −∞ is: If f(x) exists on an open
interval containing a, except possibly at x = a, then f(x) becomes negatively
infinite (or decreases without bound), written as lim (x→a) f(x) = −∞, if, for every
negative number M, there corresponds a δ > 0 such that f(x) < M whenever
0 < |x − a| < δ.
Example 2.4. (a) lim (x→a) 1/(x − a)ⁿ = ∞ if n is an even positive number;
(b) lim (x→a+) 1/(x − a)ⁿ = ∞, and lim (x→a−) 1/(x − a)ⁿ = −∞, if n is an odd positive
number.
(iv) lim (x→a) g(x)/f(x) = 0.
f′(c) = lim (x→c−) (f(x) − f(c))/(x − c) ≥ 0, since x − c < 0 as x → c from the left,
f′(c) = lim (x→c+) (f(x) − f(c))/(x − c) ≤ 0, since x − c > 0 as x → c from the right.

Since zero is the only number which is both non-negative and non-positive,
f′(c) = 0.
In the case of a relative minimum, assume in the above proof that f has
a relative minimum at c and note that f (x) − f (c) ≥ 0 for all x sufficiently
close to c, and reverse the sign in the two inequalities above.
Corollary 2.2. If a function f is continuous on a closed interval [a, b] and
if f (a) = f (b), then f has at least one critical number in the open interval
(a, b).
Theorem 2.12. (Mean-value theorem) If a function f is continuous on a
closed interval [a, b] and differentiable on the open interval (a, b), then there
exists a number c ∈ (a, b) such that f (b) − f (a) = f ′ (c)(b − a).
A function f is said to have an absolute maximum (or global maximum)
on the domain D at the point c ∈ D if f(c) ≥ f(x) for all x ∈ D, where f(c) is
called the maximum value of f on D. Similarly, if f(c) ≤ f(x) for all x ∈ D,
then we say that f has an absolute minimum (or global minimum) on the
domain D at the point c ∈ D, where f(c) is called the minimum value of f
on D. These extreme values are termed absolute or global because they are
the largest and the smallest values, respectively, of the function f on D.

A function f is said to have a relative maximum (or local maximum) at
the point c if f(c) ≥ f(x) for all x in an open interval containing c. Similarly,
if f(c) ≤ f(x) for all x in an open interval containing c, then we say that
f has a relative minimum (or local minimum) at c. This definition can
be extended to include the endpoints of the interval [a, b] by saying that f
has a relative extremum at an endpoint of [a, b] if f attains its maximum or
minimum value at that endpoint in the half-open interval containing it.
A function f is convex (i.e., CU) iff f″ > 0, and concave (i.e., CD) iff f″ < 0.
A point (c, f(c)) is a point of inflection if f changes concavity there, e.g., if f″(x) < 0
if a < x < c and f″(x) > 0 if c < x < b (see Figure 2.2).
The functions that are convex or concave at a point are presented graphically.
Example 2.5. Let f(x) = x⁵ − 5x³. Then f′(x) = 5x⁴ − 15x² = 5x²(x² − 3)
and f″(x) = 20x³ − 30x = 10x(2x² − 3). The critical numbers are 0 and ±√3, and
f(−√3) = 6√3, f″(−√3) = −30√3 < 0, so f has a relative maximum at −√3;
similarly, f″(√3) = 30√3 > 0, so f has a relative minimum at √3.
for all numbers t ∈ R, where i, j, k are the unit vectors along the coordinate
axes. Conversely, if f, g, h are functions from X to R, then a vector-valued
function r can be defined by r(t) = f(t) i + g(t) j + h(t) k. The derivative of r is

r′(t) = lim (t0→0) [r(t + t0) − r(t)]/t0 ,        (2.5.3)
r(t) = ⟨f(t), g(t), h(t)⟩ = f(t) i + g(t) j + h(t) k, the above equation becomes
[f(t)]² + [g(t)]² + [h(t)]² = c, which when differentiated implicitly gives

and we say that r(t) is integrable on [a, b]. Moreover, if R(t) is an antiderivative
of r(t) in the sense that R′(t) = r(t) for all t ∈ [a, b], then

∫ from a to b of r(t) dt = R(t) evaluated from a to b = R(b) − R(a).        (2.5.5)
Example 2.9. Given u(t) = ti+t2 j+t3 k, and v(t) = sin ti+cos tj+2 sin tk,
we have
2.6 Optimization
An application of the study of global and local extrema of a function leads to
certain optimization problems. Recall that an optimal solution corresponds
to the point or points where a given function attains an absolute maximum or
absolute minimum value. Certain useful guidelines to solve such optimization
problems are as follows:
(i) Relative to a given problem, define a function to be optimized, then plot
its graph and label the relevant quantities, if possible; (ii) label the quantity
that needs to be optimized, and signify the appropriate domain, also known
as the feasible domain, for the problem; and (iii) using the methods of §2.3,
solve the problem.
Example 2.11. To find the rectangle of largest possible area that can be
inscribed in a semi-circle of radius r, let the rectangle be of height h, length
w, and area A = hw = 2h√(r² − h²), where w = 2√(r² − h²), 0 < h < r (see
Figure 2.5).

Setting A′(h) = 0 gives r² − 2h² = 0,
which gives h = ±r/√2. Note that A′ is undetermined when h = r (obvious
from the geometry of the problem). Also, we can discard the negative solution
for h. Then at the two endpoints and one critical point in [0, r], we have
A(0) = 0 = A(r), and A(r/√2) = r². Hence, the maximum possible
area of the rectangle is r², and it occurs when h = r/√2 and w = r√2.
Also check that this maximum area A is smaller than the area πr²/2 of the
semi-circle.
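A quick numerical confirmation of this maximum (a minimal sketch, assuming NumPy; the choice r = 1 is ours):

    import numpy as np

    r = 1.0
    h = np.linspace(0.0, r, 100001)
    A = 2.0 * h * np.sqrt(r**2 - h**2)    # area of the inscribed rectangle
    i = np.argmax(A)
    print(h[i], A[i])    # h ~ r/sqrt(2) ~ 0.7071, A ~ r^2 = 1.0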
Example 2.12. A 15” × 24” piece of sheet metal is formed into an open
box by cutting out a square from each of the four corners and folding up
the remaining piece. How large should each square be to obtain a box of
maximum volume?
Let V denote the volume of the open box. From Figure 2.6, we find that
V(x) = x(15 − 2x)(24 − 2x), and the critical points are given by V′(x) = 0.
Now, V′(x) = 12x² − 156x + 360 = 12(x − 3)(x − 10), so the critical points are
x = 3 and x = 10; only x = 3 lies in the feasible interval 0 < x < 7.5. Hence
each cut-out square should be 3 inches on a side, giving the maximum volume
V(3) = 3 · 9 · 18 = 486 cubic inches.
Using a specific example, suppose that for a certain book printing company
the cost and revenue functions in a particular year are defined, in thousands
of dollars, by C(x) = 2x3 − 12x2 + 30x and R(x) = −x3 + 9x2 , where x
represents the units of 1000 books. It is assumed that this model is accurate
up to approximately x = 6. We are required to find out what the company’s
profit zone is and what level of production will maximize the company’s profit.
The profit function is P (x) = R(x) − C(x) = −3x3 + 21x2 − 30x = −3x(x −
2)(x−5). The solution set of the equation P (x) = 0 is x = {0, 2, 5}. Neglecting
x = 0, we know that the positive break-even points are x = 2 and x = 5, i.e.,
2000 and 5000 books. Again, solving P′(x) = 0, or equivalently C′(x) = R′(x),
we get 3x² − 14x + 10 = 0, which gives x = (7 ± √19)/3 ≈ 3.786 or
0.880. Using the first derivative test we find that P has a relative maximum
at 3.786 and a relative minimum at 0.880, i.e., the relative maximum at 3786
and relative minimum at 880 books, respectively. Thus, the maximum profit
is P (3.786) = 24.626, or $24, 626. This is presented in Figure 2.7(b), in which
the correspondence between the relative extrema of P and those of C and R
are marked by vertical segments between the graphs of C(x) and R(x) at the
points 0.88 and 3.786.
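The break-even points and the profit-maximizing output can be reproduced with NumPy's polynomial helpers (a minimal sketch):

    import numpy as np

    P = np.array([-3.0, 21.0, -30.0, 0.0])   # P(x) = -3x^3 + 21x^2 - 30x
    print(np.roots(P))                       # break-even points 0, 2, 5 (in some order)

    dP = np.polyder(P)                       # P'(x) = -9x^2 + 42x - 30
    crit = np.roots(dP)
    print(crit)                              # ~3.786 and ~0.880
    print(np.polyval(P, crit))               # P(3.786) ~ 24.626 (the maximum profit)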
(b) the function must be moving downward from the relative plateau (i.e., at the point x*) in relation to the coordinate axes
for a relative maximum, and moving upward in relation to the coordinate
axes for a minimum.
(c) In the case of a function of two variables, the product of the second-
order direct partial derivatives evaluated at the point x∗ must be greater than
the product of the cross partial derivatives also evaluated at the critical point
x∗ . This condition is needed to exclude the cases of an inflection point or a
saddle point at x∗ .
The above conditions are presented in Figure 2.8 for a function of two
variables z = f (x, y), where we have a relative maximum (Figure 2.8(a)) and
a relative minimum (Figure 2.8(b)).
The conditions satisfied in each case are as follows:
For a relative maximum: fx , fy = 0; fxx , fyy < 0; and fxx · fyy > (fxy )2 ;
For a relative minimum: fx , fy = 0; fxx , fyy > 0; and fxx · fyy > (fxy )2 .
The last condition can also be written as fxx · fyy − (fxy )2 > 0.
Since there are different signs in each of the second-order derivative values
in the cases (1) and (4), the function f cannot have a relative extremum at
the critical points (7, 3) and (−7, −3). In the case when fxx and fyy are of
different signs, the product fxx fyy cannot be greater than (fxy )2 , and the
function f is at a saddle point. Next we check the sign of fxx · fyy > (fxy )2
at the remaining two critical points (7, −3) and (−7, 3):
At the point (7, −3) we have (−84) · (−18) > (0)2 ; thus, we have a relative
maximum at (7, −3). Also, at the point (−7, 3) we have (84) · (18) > (0)2 ;
thus, we have a relative minimum at (−7, 3).
As an alternative method, we can use the Hessian (see §1.6.2), which for
this example is defined by

|H| = | fxx  fxy |
      | fyx  fyy | .

At the four critical points,

at (7, 3):   |H| = | −84   0 | ;   at (7, −3):  |H| = | −84    0 | ;
                   |   0  18 |                        |   0  −18 |

at (−7, 3):  |H| = |  84   0 | ;   at (−7, −3): |H| = |  84    0 | ;
                   |   0  18 |                        |   0  −18 |

thus |H| is negative definite at (7, −3) (relative maximum) and positive definite
at (−7, 3) (relative minimum), while at (7, 3) and (−7, −3) the Hessian is
indefinite and f has saddle points there.
2.7 Multivariate Functions

The gradient of a function f(x) in R³ is defined by

∇f(x) = (∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k,        (2.7.1)

where i, j, k are the unit vectors along the coordinate axes. In Rⁿ, we have

∇f(x) = (∂f/∂x1) e1 + (∂f/∂x2) e2 + · · · + (∂f/∂xn) en ,        (2.7.2)

where ei is the unit vector along the ith coordinate axis, i = 1, 2, . . . , n, and
all these vectors form a linearly independent set in Rⁿ.
Theorem 2.18. (Michel and Herget [2007: 88]) Let {e1 , e2 , . . . , en } be the
linearly independent set of unit vectors along the coordinate axes in a vector
space Rⁿ. If Σ (i=1 to n) (∂f/∂xi) ei = Σ (i=1 to n) (∂g/∂xi) ei , then ∂f/∂xi = ∂g/∂xi
for all i = 1, 2, . . . , n.

Proof. If Σ (i=1 to n) (∂f/∂xi) ei = Σ (i=1 to n) (∂g/∂xi) ei , then, by matrix multiplication, we have

[∂f/∂xi − ∂g/∂xi]1×n [ei]ᵀn×1 = [0]1×1 = [0]1×n [ei]ᵀn×1 ,        (2.7.4)

and the linear independence of the ei then forces ∂f/∂xi = ∂g/∂xi for each i.
In particular, the condition ∇f(x) = 0 reads

∇f(x) = (∂f/∂x1) e1 + (∂f/∂x2) e2 + · · · + (∂f/∂xn) en
       = [∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn] [e]ᵀ = 0,        (2.7.5)

which, by the theorem, holds exactly when

∂f/∂x1 = 0,  ∂f/∂x2 = 0,  . . . ,  ∂f/∂xn = 0.        (2.7.6)
This result not only establishes the isomorphism between the gradient of a linear mapping, ∇f , and the first partial derivatives
∂f/∂xi for i = 1, . . . , n, but also imposes the restriction that the equations (2.7.6)
will hold only when condition (2.7.5), ∇f(x) = 0, is satisfied.
This condition is used in the Lagrange multiplier method, the KKT con-
ditions, and the Fritz John condition in optimization problems. However,
there are a couple of cases involving the first- and higher-order Taylor series
approximations, where the above isomorphism is misused (see §3.5).
2.8 Mathematical Economics
2.8.1 Isocost Lines. An isocost line is defined by

PK K + PL L = E,        (2.8.1)
where K and L denote capital and labor, PK and PL their respective prices,
and E the amount allotted for expenditures. In isocost analysis the prices and
the expenditures for individual items are initially held constant; only different
inputs are allowed to vary. Solve the above formula for K, and show that a
change in PL and PK will affect the slope and the vertical intercept.
Solving Eq (2.8.1) for K, we get the isocost line

K = (E − PL L)/PK ,   or   K = E/PK − (PL /PK) L,

which is a straight line of the form y = mx + b, where the slope is m = −PL /PK
and the vertical intercept, also called the y-intercept, is b = E/PK (see Figure
2.9).
The effect of a change in any one of the parameters can be easily seen
from Figure 2.9. For example, an increase in expenditure from E to E ′ will
increase the vertical intercept and the isocost line (dashed line) will shift out to
the right parallel to the previous line; however the slope remains unaffected
because it depends on the ratio of the prices −PL /PK and prices are not
affected by change in expenditures. A change in PL will change the slope of
the line but not the vertical intercept, but a change in PK will change both
the slope and the vertical intercept.
2.8.2 Supply and Demand. Let Qs and Qd denote the supply and demand
functions, respectively. Equilibrium in supply and demand occurs when Qs =
Qd . For example, the equilibrium prices and quantity are determined in the
following situation: Given Qs = 4P − 7, Qd = 14 − 3P , in equilibrium we
have Qs = Qd , or 4P − 7 = 14 − 3P , or P = 3. Then substituting P = 3 in
either equation we get Qs = 4P − 7 = 12 − 7 = 5 = Qd .
The equilibrium equation is Y = C + I + G + (X − Z), where Y is income,
C consumption, I investment, G government expenditures, X exports, and Z
imports.
Solving Eqs (2.8.2) and (2.8.3), we find the condition of simultaneous equi-
librium in both markets. Thus, multiplying Eq (2.8.2) by 2 and adding to
Eq (2.8.3) gives 0.7Y = 455, or Y = 650. Then substituting this value
of Y into Eq (2.8.2) we get 130 + 70i = 150, i = 2/7 ≈ 0.29. For these
values of Y and i, the equilibrium values of C, Mt , and Mz are: C =
56 + 0.8Y = 56 + (0.8)(650) = 576, Mt = 0.3Y = (0.3)(650) = 195, Mz =
55 − 140i = 55 − 140(2/7) = 15. To check, C + I = 56 + 0.8Y + 94 − 70i =
56 + (0.8)(650) + 94 − 70(2/7) = 650, and Mt + Mz = 0.3Y + 55 − 140i =
(0.3)(650) + 55 − 140(2/7) = 195 + 55 − 40 = 210 = Ms .
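In matrix form, the two equilibrium conditions consistent with these numbers are 0.2Y + 70i = 150 (goods market) and 0.3Y − 140i = 155 (money market) — our reading of Eqs (2.8.2)–(2.8.3), which are not shown here. A minimal sketch solving them, assuming NumPy:

    import numpy as np

    # Reconstructed equilibrium conditions (an assumption; see lead-in).
    A = np.array([[0.2, 70.0], [0.3, -140.0]])
    b = np.array([150.0, 155.0])
    Y, i = np.linalg.solve(A, b)
    print(Y, i)    # 650.0 and 0.2857... (= 2/7)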
dπ/dQ = d(TR)/dQ − d(TC)/dQ = 0,

which implies that d(TR)/dQ = d(TC)/dQ, or MR = MC.
(1/6) K^(−5/6) L^(5/6) (dK/dL) + (5/6) K^(1/6) L^(−1/6) = 0,

which simplifies to give dK/dL = −5K/L.
2.9 Exercises
2.1. Prove that if f is a linear function, then f satisfies the hypotheses of
the mean value theorem on every closed interval [a, b], and that every number
c satisfies the conclusion of the theorem.
2.2. If f is a quadratic function and [a, b] is any closed interval, prove that
there is precisely one number c ∈ (a, b) which satisfies the conclusion of the
mean value theorem. Hint. Consider f (x) = ax2 + bx + c.
2.4. For the function f(x) = x⁴ − 8x² + 16, find the intervals where it is
increasing or decreasing. Hint. f′(x) = 4x³ − 16x = 4x(x − 2)(x + 2). Four
cases: (i) f′ < 0 if x < −2, (ii) f′ > 0 if −2 < x < 0, (iii) f′ < 0 if 0 < x < 2,
and (iv) f′ > 0 if x > 2.
Ans. f is increasing in the intervals (−2, 0) and (2, ∞) and decreasing in the
intervals (−∞, −2) and (0, 2).
2.7. Find the local extrema of f′, and describe the intervals in which f′ is
increasing or decreasing, given (i) f(x) = x⁴ − 6x²; (ii) f(x) = x^(4/3) + 4x^(1/3).
Ans. (i) maximum f′(−1) = 8; minimum f′(1) = −8; f′ increasing on
(−∞, −1] and [1, ∞), decreasing on [−1, 1]. (ii) minimum f(−1) = −3; increasing
on [−1, ∞), decreasing on (−∞, −1].
2.8. Use the second derivative test, whenever applicable, to find the local
extrema of f, and the intervals of concavity of f: (i) f(x) = 3x⁴ − 4x³ + 6;
(ii) f(x) = 2x⁶ − 6x⁴; and (iii) f(x) = x² − 27/x².
Ans. (i) minimum: f(1) = 5; CU (concave upward) on (−∞, 0) and
(2/3, ∞); CD (concave downward) on (0, 2/3); abscissas of points of inflection
are 0 and 2/3. (ii) maximum f(0) = 0 (by first derivative test); minimum
f(−√2) = f(√2) = −8; CU on (−∞, −√(6/5)) and (√(6/5), ∞); CD on
(−√(6/5), √(6/5)); abscissas of points of inflection are ±√(6/5). (iii) No maximum
or minimum; CU on (−∞, −3) and (3, ∞); CD on (−3, 0) and (0, 3);
abscissas of points of inflection are ±3.
2.9. Find the intervals where the functions (i) f(x) = x eˣ, and (ii) g(x) =
cos x are concave upward or downward, and the points of inflection, if any.
Ans. (i) f″(x) = (2 + x) eˣ; f(x) is concave upward if x > −2 and downward
if x < −2; −2 is an inflection point. (ii) g″(x) = −cos x; so g is concave
upward where cos x is negative and downward where it is positive, with points
of inflection wherever cos x = 0.
2.11. Find the parametric equation of the tangent line to the curve C
defined by C = {(eᵗ, teᵗ, t + 4) : t ∈ R} at the point P = (1, 0, 4). Ans.
x = 1 + t, y = t, z = 4 + t.
2.15. Let r(t) be the position vector of a particle, and s denote the arc
length along a curve C traced by the motion of the particle. Since the magnitude
of the velocity r′(t) is ds/dt, and the direction of r′(t) is the same
as that of the unit vector T(s) defined by T(s) = r′(t)/|r′(t)|, we may write
r′(t) = (ds/dt) T(s), which after differentiating with respect to t gives

r″(t) = (d²s/dt²) T(s) + (ds/dt) (d/dt) T(s) = (d²s/dt²) T(s) + (ds/dt)² T′(s).

Since T′(s) = N(s)/ρ, where N(s) is the unit normal and ρ the radius of
curvature, and writing v = ds/dt for the speed, this becomes

r″(t) = (dv/dt) T(s) + (v²/ρ) N(s).
2.16. An electrical power station (A) is being built on the bank of a river.
Cables to the substation need to be laid out underground and underwater
from another substation (B) 3 km upstream and on the opposite bank of the
river. The river follows a straight line course through this stretch and has a
fairly constant width of 1 km. Given that the cost of laying cable underground
is $30,000 per km and the cost of laying cables underwater is $50,000 per km,
in what proportion should the cable be laid in order to minimize cost?
Solution. A simple solution is to connect substations A and B by a
straight line, because that will be the shortest distance, √10 km, between them.
But it may not be cost effective, since the cable must then be laid completely underwater,
at a cost of 50000 × √10 ≈ $158,114, which may not be the minimum cost.
Another approach, to minimize the underwater installation cost, would be
to cross the river along the shortest possible path and then run the under-
ground cable along the bank of the river. Any combination of these two paths
may lead to the absolute minimum cost. Thus, to find the optimal solution,
we consider the path as shown in Figure 2.11.
If the cable runs underground along the bank from A for x km (0 ≤ x ≤ 3) to a
point C and then underwater from C to B, the underwater run has length
√(1 + (3 − x)²) km, so the total cost is C(x) = 30000x + 50000√(1 + (3 − x)²).
Since we want to find the absolute minimum of the function C(x) on the
interval [0, 3], we first find the critical points:

C′(x) = 30000 + 50000 · (1/2)[1 + (3 − x)²]^(−1/2) · 2(3 − x) · (−1)
      = 30000 + 50000(x − 3)/√(1 + (3 − x)²).

Although C′(x) exists for all x ∈ [0, 3], setting

30000 + 50000(x − 3)/√(1 + (3 − x)²) = 0

and simplifying gives (x − 3)² = 9/16, leading to two candidate solutions x = 3 ± 3/4.
Since the solution with the plus sign is outside the feasible interval [0, 3], we
have only one critical point, at x = 9/4. Thus, we have

C(0) = $158,114 (as above);  C(9/4) = $130,000;  and  C(3) = $140,000.

Hence, the minimum cost of running the cable from station A to B is achieved
by laying underground cable for a distance of 2.25 km (A to C) and
a diagonal underwater cable of length √(1 + (3/4)²) = 1.25 km (along CB), at
a cost of $130,000.
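The cost curve can also be scanned numerically to confirm the analytic minimum (a minimal sketch, assuming NumPy):

    import numpy as np

    x = np.linspace(0.0, 3.0, 300001)    # km of underground cable from A
    cost = 30000.0 * x + 50000.0 * np.sqrt(1.0 + (3.0 - x)**2)
    i = np.argmin(cost)
    print(x[i], cost[i])    # x = 2.25 km, cost = 130000.0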
2.19. Supply and demand problems generally involve more than one mar-
ket. Determine the equilibrium price P and quantity Q for the following three
goods:
Qd1 = −5P1 + P2 + P3 + 23, Qs1 = 6P1 − 8,
Qd2 = P1 − 3P2 + 2P3 − 15, Qs2 = 3P2 − 11,
Qd3 = P1 + 2P2 − 4P3 + 19, Qs3 = 3P3 − 5,
where s1, s2, s3 and d1, c2, d3 denote the three supply and demand indices.
Solution. For the equilibrium we have in each market the following
equations:
Market 1: Qd1 = Qs1 gives (a) 11P1 − P2 − P3 = 31,
Market 2: Qd2 = Qs2 gives (b) P1 − 6P2 + 2P3 = −26,
Market 3: Qd3 = Qs3 gives (c) P1 + 2P2 − 7P3 = −24.
Method 1. We will use Cramer's rule (A.15) to solve this system of three
equations AP = b. Thus,

|A| = |11 −1 −1; 1 −6 2; 1 2 −7| = 11[42 − 4] + [−7 − 2] − [2 + 6] = 401,

|A1| = |31 −1 −1; −26 −6 2; −24 2 −7| = 1604, ⟹ P1 = |A1|/|A| = 1604/401 = 4;

|A2| = |11 31 −1; 1 −26 2; 1 −24 −7| = 2807, ⟹ P2 = |A2|/|A| = 2807/401 = 7;

|A3| = |11 −1 31; 1 −6 −26; 1 2 −24| = 2406, ⟹ P3 = |A3|/|A| = 2406/401 = 6.
Method 2. This problem can also be solved by the Gauss elimination
method (§1.4.3), as follows. Keeping Eq (a) fixed, eliminate P1 between Eqs
(a) and (b), i.e., multiply Eq (b) by 11 and subtract from Eq (a), to get
(d) 65P2 − 23P3 = 317. Similarly, eliminate P1 between Eqs (a) and (c),
i.e., multiply Eq (c) by 11 and subtract from Eq (a), to get (e) −23P2 + 76P3 = 295.
Next, eliminate P2 between Eqs (d) and (e), i.e., multiply Eq (d) by 23 and
Eq (e) by 65, and add, to get 4411P3 = 26466, or 401P3 = 2406, i.e., P3 = 6.
Hence, the system reduces to the triangularized system
11P1 − P2 − P3 = 31,
65P2 − 23P3 = 317,
P3 = 6,
and back-substitution yields P2 = (317 + 23 · 6)/65 = 7 and P1 = (31 + 7 + 6)/11 = 4.
Alternatively, eliminating P2 first (multiply Eq (a) by 6 and subtract Eq (b);
multiply Eq (a) by 2 and add Eq (c)) gives the pair of equations
(d′) 65P1 − 8P3 = 212 and (e′) 23P1 − 9P3 = 38. Multiply Eq (d′) by 9 and
Eq (e′) by 8, and subtract; this gives 401P1 = 1604, or P1 = 4. Substitute this
value of P1 into Eq (e′) to get P3 = 6. Finally, substitute these values of P1
and P3 into any one of Eqs (a), (b), or (c), to get P2 = 7.
Method 3. The system can also be solved with the inverse of A. The
cofactor matrix C of A yields

adj(A) = Cᵀ = |38 −9 −8; 9 −76 −23; 8 −23 −65|,

where [A]⁻¹ = (1/|A|) adj(A) and b = [31 −26 −24]ᵀ. Hence, the vector
P = [P1 P2 P3]ᵀ is given by P = [A]⁻¹ b, or

[P1; P2; P3] = (1/401) |38 −9 −8; 9 −76 −23; 8 −23 −65| [31; −26; −24]
            = (1/401) [1604; 2807; 2406] = [4; 7; 6].
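All three methods can be cross-checked numerically. The short Python sketch below (an added illustration, not the book's; numpy assumed) solves AP = b directly and via Cramer's rule:

import numpy as np

A = np.array([[11, -1, -1],
              [ 1, -6,  2],
              [ 1,  2, -7]], dtype=float)
b = np.array([31, -26, -24], dtype=float)

print(np.linalg.solve(A, b))         # [4. 7. 6.]

detA = np.linalg.det(A)              # 401
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                     # replace i-th column by b
    print(np.linalg.det(Ai) / detA)  # Cramer's rule: 4, 7, 6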
Then
π′ = dπ/dQ = −12Q² − 108Q + 4800 = −12(Q² + 9Q − 400) = −12(Q − 16)(Q + 25),
so the only positive critical value is Q = 16.
2.23. Using graphs, show how the addition of a proportional tax (a tax
depending on income, also known as super tax) affects the parameters of the
income determination model. Plot two systems: (1) Y = C + I, C = 90 +
0.8Y, I0 = 40 (solid line marked (1) in Figure 2.13); and (2) Y = C + I, C =
90 + 0.8Yd , I0 = 40, Yd = Y − T, T = 25 + 0.25Y , where 25 is the lump-sum
tax (solid line marked (2) in Figure 2.13).
Solution. System (1): Aggregate demand function D = C + I = 90 +
0.8Y + 40 = 130 + 0.8Y; slope = 0.8; this yields the income of 650.
System (2): Aggregate demand function D = C + I = 90 + 0.8Yd + 40 =
130 + 0.8(Y − 25 − 0.25Y) = 110 + 0.6Y; slope = 0.6; this yields the income of
275. Thus, the proportional tax affects not only the slope of the line, i.e., the
MCP, which falls from m = 0.8 to m = 0.6, but also the vertical intercept,
which is lowered since the tax includes a lump-sum tax of 25. As a result of
this tax, income falls from 650 to 275.
2.24. (a) Given C = C0 + bY, we get MCP = dC/dY = b.
(b) Given C = 1100 + 0.75Yd, where Yd = Y − T, T = 80, we have
C = 1100 + 0.75(Y − 80) = 1040 + 0.75Y. Then MCP = dC/dY = 0.75.
2.25. The average cost function is given by AC = 1.6Q + 4 + 44/Q. Find
the marginal cost MC. Hint. MC is determined by first finding
TC = (AC)Q = 1.6Q² + 4Q + 44, and then using the formula MC = d(TC)/dQ.
Ans. MC = 3.2Q + 4.
2.27. Maximize the following total revenue function TR and total profit
function π by finding the critical value(s), by testing the second-order con-
ditions, and by calculating the maximum TR or π: (a) π = −Q³ − 75Q² + · · ·
2.28. Let an isoquant be defined by 24K^(1/4) L^(3/4) = 2414. Use implicit
differentiation with respect to L to find the slope of the isoquant dK/dL, or
the MRTS, for given values of K = 260 and L = 120, and interpret the result.
Solution. By implicit differentiation, we get dK/dL = −3K/L. Then
MRTS = −3(260)/120 = −6.5. This means that if L is increased by 1 unit,
K must be decreased by 6.5 units in order to remain on the same isoquant,
i.e., to keep the production level constant.
2.29. Let an isoquant be defined by 50K^(3/5) L^(2/5) = 5000. Use implicit
differentiation with respect to L to find the slope of the isoquant dK/dL, or
the MRTS, for given values of K = 360 and L = 160, and interpret the result.
Ans. dK/dL = −2K/(3L), MRTS = −1.5.
At (1, 4): |H| = |8 −1; −1 2|.
2.31. Consider the function f (x, y) = 36y − 4x2 − 8xy − 2y 2 + 72x. Find
the extrema at the critical points for this function.
Solution. We have fx = −8x − 8y + 72 = 0 and fy = 36 − 8x − 4y = 0,
solving which we get the critical point (0, 9). Next, fxx = −8, fyy = −4,
fxy = −8 = fyx, so fxx(0, 9) · fyy(0, 9) = 32 < [fxy(0, 9)]² = 64. Hence, the
function f has an inflection point at (0, 9).
2.32 Consider the function f (x, y) = 6x2 − 3y 2 − 24x + 6y + 6xy. Find the
extrema at the critical points for this function.
Solution. We have fx = 12x−24+6y = 0, fy = −6y +6+6x = 0, solving
which we get the critical point (1, 2). Next, fxx = 12, fyy = −6, fxy = 6 = fyx .
Since fxx and fyy are of different signs, we get fxx · fyy = −72 < (fxy )2 = 36.
Hence, the function f has a saddle point at (1, 2). The same results are
obtained by using the Hessian at the critical point:
At (1, 2): |H| = |12 6; 6 −6|.
2.33. Consider the function f(x, y) = 2x³ − 6x² + 3y³ + 18y² − 90x − 189y.
Find the extrema at the critical points for this function.
Solution. We have fx = 6x² − 12x − 90 = 0 and fy = 9y² + 36y − 189 = 0,
solving which we get the critical points (−3, 3), (−3, −7), (5, 3), (5, −7). Next,
fxx = 12x − 12, fyy = 18y + 36, fxy = 0 = fyx. Then
(1) fxx(−3, 3) = −48 < 0, fyy(−3, 3) = 90 > 0,
(2) fxx(−3, −7) = −48 < 0, fyy(−3, −7) = −90 < 0,
(3) fxx(5, 3) = 48 > 0, fyy(5, 3) = 90 > 0,
(4) fxx(5, −7) = 48 > 0, fyy(5, −7) = −90 < 0.
Since fxx and fyy are of different signs in cases (1) and (4), we have saddle
points at (−3, 3) and (5, −7). Next, in case (2) we have fxx · fyy =
(−48)(−90) > (0)², so we have a relative maximum at (−3, −7). Again, in
case (3) we have fxx · fyy = (48)(90) > (0)², so we have a relative minimum
at (5, 3). The same results are obtained by using the Hessian at each critical
point:
At (−3, 3): |H| = |−48 0; 0 90|;  At (−3, −7): |H| = |−48 0; 0 −90|;
At (5, 3): |H| = |48 0; 0 90|;  At (5, −7): |H| = |48 0; 0 −90|.
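The classification procedure used in Exercises 2.31-2.33 is mechanical enough to automate. The following sympy sketch (an added illustration, not the book's; the variable names are ours) solves the gradient equations and applies the second-derivative test:

import sympy as sp

x, y = sp.symbols('x y', real=True)
f = 2*x**3 - 6*x**2 + 3*y**3 + 18*y**2 - 90*x - 189*y

pts = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
H = sp.hessian(f, (x, y))
for p in pts:
    Hp = H.subs(p)
    D, fxx = Hp.det(), Hp[0, 0]
    kind = ('saddle point' if D < 0 else
            'relative minimum' if fxx > 0 else 'relative maximum')
    print(p, kind)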
The concept of a convex set is used to define concave and convex functions.
Although the names appear similar, a convex set should not be confused with
a convex function. However, a concave and a convex function are defined in
terms of a convex set.
The set xy is the line segment joining the points x and y in X. Then we
have the following definition: The set Y ⊂ X is said to be a convex set if Y
contains the line segment xy whenever x and y are two arbitrary points in Y
(see Figure 3.1). A convex set is called a convex body if it contains at least
one interior point, i.e., if it completely contains some sphere.
Example 3.1. The following sets are convex: (i) the empty set; (ii) a set
containing one point; (iii) a line segment and a plane in R³; (iv) any linear
subspace of X; and (v) a cube and a sphere in R³.
Example 3.2. Let Y and Z be convex sets in X, let α, β ∈ R, and let
αY = {x ∈ X : x = αy, y ∈ Y }, and βZ = {x ∈ X : x = βz, z ∈ Z}. Then
the set αY + βZ = {x ∈ X : x = αy + βz, y ∈ Y, z ∈ Z} is a convex set in
X.
Theorem 3.1. Let Y be a convex set in X, and let α, β ∈ R be positive
scalars. Then (α + β)Y = αY + βY .
Proof. If x ∈ (α + β)Y, then x = (α + β)y = αy + βy ∈ αY + βY. Hence,
(α + β)Y ⊂ αY + βY. Conversely, let Y be convex, and let x = αy + βz, where
y, z ∈ Y. Then since α/(α + β) + β/(α + β) = 1, we get

x/(α + β) = (α/(α + β)) y + (β/(α + β)) z ∈ Y,

so x ∈ (α + β)Y, i.e., αY + βY ⊂ (α + β)Y, and the equality follows.
The left-hand side of (3.2.2) is the value of the concave function at a convex
combination of points, which is at least the corresponding combination of
functional values on the right-hand side (see Figure 3.5).
The left side of this inequality, which equals tf(x) + (1 − t)f(x′), implies
(3.2.2). Hence, the inequality (3.2.1) implies (3.2.2). Conversely, assume that
tf(x) + (1 − t)f(x′) ≤ f(tx + (1 − t)x′). Choose y and y′ such that y ≤ f(x)
and y′ ≤ f(x′). Obviously, (x, y) and (x′, y′) are both in hyp(f).¹ Thus,
ty ≤ tf(x) and (1 − t)y′ ≤ (1 − t)f(x′) for any 0 ≤ t ≤ 1. But this implies
that ty + (1 − t)y′ ≤ tf(x) + (1 − t)f(x′). Since the right side of this inequality
is, by assumption, not greater than f(tx + (1 − t)x′), we have the inequality
(3.2.3):
ty + (1 − t)y′ ≤ f(tx + (1 − t)x′).
Hence, (tx + (1 − t)x′, ty + (1 − t)y′) ∈ hyp(f), i.e., hyp(f) is a convex set,
and the inequality (3.2.2) implies (3.2.1), which completes the proof.
¹ The prefix hypo- or hyp- (from Greek and Latin) means 'under' or 'beneath'.
If the inequality in (3.2.2) is strict, then the function f is called a strictly
concave function; i.e., for any x, x′ ∈ dom(f) with x ≠ x′,
f(tx + (1 − t)x′) > tf(x) + (1 − t)f(x′),
where t ∈ (0, 1). This is inequality (3.2.2) valid for concave functions defined
on R.
(ii) For more than two points xi ∈ R, i = 1, . . . , k, Jensen's inequality is

f(Σ_{i=1}^{k} ti xi) ≥ Σ_{i=1}^{k} ti f(xi), (3.3.3)

where Σ_{i} ti = 1, ti ≥ 0.

(iii) Let p(x) ≥ 0 be such that ∫ p(x) dx = 1. Then the continuous form of
Jensen's inequality is

f(∫ x p(x) dx) ≥ ∫ f(x) p(x) dx. (3.3.4)
This inequality is a basic result which has produced many other useful
inequalities. For example, we have the arithmetic-geometric mean inequality:
(a + b)/2 ≥ √(ab) for a, b ≥ 0, (3.3.6)

which can be easily proved by using the inequality log((a + b)/2) ≥ ½(log a +
log b) for the function f(x) = log x, which is concave for all x > 0.
2 The prefix epi- from Greek and Latin means ‘over’ or ‘upon.’
Note that the property (3.4.1) is, in many cases, weakened by requiring
that
f (tx + (1 − t)x′ ) ≤ max{f (x), f (x′ )} (3.4.5)
for all x, x′ ∈ X and t ∈ [0, 1]. The inequality (3.4.5) is known as the modified
Jensen’s inequality.
We will assume that all convex functions are extendable, and hence, use
the same notation f for a convex function as well as its extension f˜.
Let f : X → Y denote the mapping of a set X ⊂ Rⁿ into another set
Y ⊂ Rⁿ. Then a mapping f1 of a subset X1 ⊂ X into Y is called the
restriction of f to the set X1. This definition leads to the following result:
Theorem 3.6. Let f be a function defined on a convex subset of Rⁿ. Then
f is convex iff its restriction to every chord in the convex domain set is a
convex function.
(iv) f(x) = Σ_{i=1}^{m} log(bi − aiᵀx)⁻¹ is convex, where dom(f) = {x | aiᵀx < bi,
i = 1, . . . , m}.
For the composition of three functions defined by f(x) = h(g1(g2(x))),
where f, h, g1, and g2 are differentiable, we have the following result.
Theorem 3.7. Let f(x) = h(g1(g2(x))), where h : R → R and gi : R → R.
Then f is convex if h is a univariate convex and nondecreasing function, g1 is
convex and nondecreasing, and g2 is convex.
Proof. For the composition f(x) = h(g1(g2(x))),

f″(x) = h″(g1(g2)) [g1′(g2) g2′]² + h′(g1(g2)) [g1′(g2) g2″ + g1″(g2)(g2′)²], (3.4.7)

and each term on the right is nonnegative under the stated hypotheses.
f(x) = f(a) + (∂f(a)/∂x)(x − a), (3.5.1)
The matrix form (3.5.3) can be compared with the summation form of the
second-order Taylor’s series approximation given in §B.3.
!!! As a tradition, some authors define the first-order derivatives of f as
∇f (x), and the first-order Taylor’s series approximation of f (x) at a point a
as
f (x) = f (a) + ∇f (x)(x − a), (3.5.5)
which is misleading because it is based on the isomorphic equality (2.7.7).
But isomorphism and equality are two different operations. If we rewrite this
approximation as
f (x) − f (a) = ∇f (x)(x − a),
we see that the left-hand side is a scalar, while the right-hand side is a vector;
thus, this equality cannot be justified by any argument. Similarly, the second-
order Taylor’s series approximation is defined as
f(x) = f(a) + ∇f(x)ᵀ(x − a) + ½ (x − a)ᵀ ∇²f(x) (x − a), (3.5.6)
which is again abused on two counts: (i) the second term on the right side is
already shown to be misleading, and (ii) the last term on the right involves
∇2 f , which is the Laplacian of f , same as the trace of the Hessian matrix
∂2f
H, and hence, it does not represent all second-order derivatives for
∂xi ∂xj
i, j = 1, 2, . . . , n (for more details, refer to §1.6.2). In fact, the second-order
Taylor approximation (3.5.6) does not reduce to the second-order Taylor ap-
proximation (3.5.4) for a function of two variables.
(1) fxx (2, 3) = −6(2) = −12 < 0, fyy (2, 3) = 12(3) = 36 > 0,
(2) fxx (2, −3) = −6(2) = −12 < 0, fyy (2, −3) = 12(−3) = −36 < 0,
(3) fxx (−2, 3) = −6(−2) = 12 > 0, fyy (−2, 3) = 12(3) = 36 > 0,
(4) fxx (−2, −3) = −6(−2) = 12 > 0, fyy (−2, −3) = 12(−3) = −36 < 0.
Since there are different signs for each of the second direct partials in (1) and
(4), the function f cannot have a relative extremum at (2, 3) and (−2, −3).
However, since the signs of second partials are both negative in (2) and positive
in (3) above, the function f may have a relative maximum at (2, −3) and a
relative minimum at (−2, 3). Since fxx and fyy are of different signs, the
product of fxx and fyy cannot be greater than (fxy )2 .
Since fxy = 0 = fyx , we check fxx · fyy > (fxy )2 at the critical points
(2, −3) and (−2, 3):
fxx (2, −3)·fyy (2, −3) = (−12)(−36) > 0, fxx (−2, 3)·fyy (−2, 3) = (12)(36) > 0.
Thus, f has a relative maximum at (2, −3) and a relative minimum at (−2, 3).
Example 3.8. Consider f(x, y) = 3x² − xy + 2y² − 4x − 7y + 8. The
first-order partial derivatives, equated to zero, give fx = 6x − y − 4 = 0 and
fy = −x + 4y − 7 = 0, solving which we get x = 1, y = 2. Thus, the critical
point is (1, 2). The second-order partial derivatives are fxx = 6, fxy = fyx =
−1, fyy = 4. Checking the condition fxx · fyy > (fxy)², we have 6 · 4 > (−1)².
Since both fxx and fyy are positive, we have a global minimum at (1, 2).
Example 3.9. Consider f(x, y) = 52x + 36y − 4xy − 6x² − 3y² + 5. The
first-order partial derivatives, equated to zero, give fx = 52 − 4y − 12x = 0
and fy = 36 − 4x − 6y = 0, so the critical point is (3, 4). The second-order
partial derivatives are fxx = −12, fxy = −4, fyy = −6. Since fxx < 0 and
fyy < 0, and fxx · fyy = 72 > (fxy)² = 16 at the point (3, 4), the function f
has a global maximum at (3, 4).
Example 3.10. Consider f(x, y) = 48y − 3x² − 6xy − 2y² + 60x. The
first-order partial derivatives, equated to zero, give fx = −6x − 6y + 60 = 0
and fy = −6x − 4y + 48 = 0, so the critical point is (4, 6). The second-order
partial derivatives are fxx = −6 < 0, fxy = −6 = fyx, fyy = −4 < 0, and
fxx · fyy = (−6)(−4) = 24 < (fxy)² = 36. The function f has an inflection
point at (4, 6).
Example 3.11. Optimize the following total profit functions π:
(a) π = −Q² + 15Q − 36; and (b) π = Q³ − 25Q² − 1200Q − 316, where Q
is the total output.
Ans. (a) Critical point Q = 7.5; π″(Q) = −2, so π″(7.5) = −2 < 0,
concave, relative maximum at Q = 7.5.
(b) π′ = 3Q² − 50Q − 1200 = (3Q + 40)(Q − 30) = 0, so the critical points
are Q = −40/3 and Q = 30; π″ = 6Q − 50, so π″(−40/3) < 0, concave, relative
maximum; π″(30) > 0, convex, relative minimum.
3.7 Exercises
3.1. Some graphs of functions and domains are presented below in Figure
3.10. What can you say about each one of these graphs?
Ans. (a)-(e) concave; (f)-(i): convex; (j): indifference curve; (k)-(l): non-
convex sets.
An indifference curve is the set of all (x, y) where a utility function u(x, y)
has a constant value. In Figure 3.10(j) the values k1 , k2 , k3 represent the
indifference curves, each one corresponding to a different level of utility; for example,
k1 = 4, k2 = 12, k3 = 16.
3.2. Given the following plots of polygons (Figure 3.11), determine which
of them are convex sets.
3.3. Prove that if f1 , f2 are convex, then the pointwise supremum func-
tion max{f1 (x), f2 (x)} is convex. Proof. Note that epi max{f1 (x), f2 (x)}
corresponds to the intersection of the two epigraphs (Figure 3.12).
3.4. Let Y be a convex set in X. Prove that the closure Ȳ is a convex set.
Hint. Use definition (3.1.1) on a line segment from one boundary point to
another.
3.5. Prove that f is convex on the interval [a, b] iff for any a < x1 <
x2 < x3 < b, we have

(x3 − x1) f(x2) ≤ (x3 − x2) f(x1) + (x2 − x1) f(x3).

Proof. Let f be convex on [a, b], and choose a < x1 < x2 < x3 < b. Writing
x2 = t x1 + (1 − t) x3 with t = (x3 − x2)/(x3 − x1) ∈ (0, 1), convexity gives
f(x2) ≤ t f(x1) + (1 − t) f(x3). Multiplying by (x3 − x1) yields

(x3 − x1) f(x2) ≤ (x3 − x2) f(x1) + (x2 − x1) f(x3),

and the converse follows by reversing these steps for arbitrary points of (a, b).
3.7. Choose k points (x1, f(x1)), . . . , (xk, f(xk)) on the graph of the
function y = f(x), and assign these k points normalized masses (weights)
p1, . . . , pk ∈ [0, 1] such that p1 + · · · + pk = 1. Then the center of gravity
is defined as the point (xg, yg) with xg = Σ_{i=1}^{k} pi xi and yg = Σ_{i=1}^{k} pi f(xi).
Prove that if f is a convex function on the interval [a, b], then for any choice of
{x1, . . . , xk} ∈ [a, b] and associated weights p1, . . . , pk ∈ [0, 1] with Σ_{i} pi = 1,
there holds the inequality f(xg) ≤ yg. Hint. Use induction on k.
3.8. Let f and g be real-valued concave functions with the same domain
D. Define a function h so that h(x) = f (x)+g(x) for all x ∈ D. Is h a concave
function? If it is, prove it; otherwise provide a counterexample. Ans. Since
f and g are concave functions with domain D, then for x ∈ D and y ∈ D,
x ≠ y, and for all t ∈ [0, 1], we have

h(tx + (1 − t)y) = f(tx + (1 − t)y) + g(tx + (1 − t)y)
  ≥ [t f(x) + (1 − t) f(y)] + [t g(x) + (1 − t) g(y)] = t h(x) + (1 − t) h(y),

which means that h is a concave function. A similar result holds for convex
functions; just replace the word 'concave' by 'convex' and the operation '≥'
by '≤' in the above statement and proof.
3.9. Let f be a convex function on [0, 1]. Prove that for any two sets
p1, . . . , pk ∈ [0, 1] with Σ_{i=1}^{k} pi = 1, and q1, . . . , qk ∈ [0, 1] with Σ_{i=1}^{k} qi = 1,
there holds the inequality Σ_{i=1}^{k} pi log qi ≤ Σ_{i=1}^{k} pi log pi. Proof. Let xi = qi/pi.
Using Exercise 3.6, we find that for a convex function,

f(Σ_{i=1}^{k} pi xi) ≤ Σ_{i=1}^{k} pi f(xi).

Taking f(x) = −log x gives −log(Σ_{i} pi qi/pi) = −log 1 = 0 ≤ −Σ_{i} pi log(qi/pi),
i.e., Σ_{i} pi log qi ≤ Σ_{i} pi log pi.
3.10. Show that the function f(x) = log x is concave for x > 0.
Proof. Take t = ½, and use the definition (3.4.2); we should show that
f((x + y)/2) ≥ (f(x) + f(y))/2, i.e., that log((x + y)/2) ≥ (log x + log y)/2
for all x, y > 0. This leads to log((x + y)/2) ≥ log(xy)^(1/2), which after
exponentiating both sides gives (x + y)/2 ≥ √(xy) for all x, y > 0 (see
inequality (3.3.6)).
3.11. Show that the function f (x) = xα is concave for 0 ≤ α ≤ 1.
Hint. Concavity or convexity can be verified using the inequality (3.2.2)
and (3.4.1), respectively, or by checking that the second derivative is nonpos-
itive or nonnegative.
3.12. Show that the following functions are convex on the specified do-
main: (a) f(x) = x^α on R₊₊ for α ≥ 1 or α ≤ 0; (b) f(x) = e^{ax} on R for
any a ∈ R.
. . . which shows that the Hessian is nonnegative, and hence, f(x, y) is convex.
3.13. Let x⌊i⌋ denote the ith largest component of x, so that the components
x⌊1⌋ ≥ x⌊2⌋ ≥ · · · ≥ x⌊n⌋ are arranged in nonincreasing order. Then the
function f(x) = Σ_{i=1}^{m} x⌊i⌋, the sum of the m largest components of x, is a
convex function.
Hint. The result follows by writing f(x) = Σ_{i=1}^{m} x⌊i⌋ = max{ xi1 + xi2 +
· · · + xim | 1 ≤ i1 < i2 < · · · < im ≤ n }, which is the maximum of all possible
sums of m distinct components of x; it is a convex function since it is the
pointwise maximum of n!/(m!(n − m)!) linear functions.
3.14. Use the inequality (3.3.6) to derive the Hölder inequality: For p, q > 1,
1/p + 1/q = 1, and x, y ∈ Rⁿ,

Σ_{i=1}^{n} |xi yi| ≤ (Σ_{i=1}^{n} |xi|^p)^(1/p) (Σ_{i=1}^{n} |yi|^q)^(1/q). (3.7.2)

Hint. Apply the weighted arithmetic-geometric mean inequality to obtain

(|xi|^p / Σ_{j=1}^{n} |xj|^p)^(1/p) (|yi|^q / Σ_{j=1}^{n} |yj|^q)^(1/q)
  ≤ (1/p)(|xi|^p / Σ_{j=1}^{n} |xj|^p) + (1/q)(|yi|^q / Σ_{j=1}^{n} |yj|^q).

Then summing over i we get the Hölder inequality. The equality in (3.7.2)
holds iff |x1|^{p−1}/|y1| = |x2|^{p−1}/|y2| = · · · = |xn|^{p−1}/|yn|. Note that for
p = 2 this inequality reduces to the Cauchy-Schwarz inequality:

(Σ_{i=1}^{n} xi yi)² ≤ (Σ_{i=1}^{n} |xi|²)(Σ_{j=1}^{n} |yj|²). (3.7.4)
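A quick numerical experiment makes the inequality concrete. The following Python sketch (an added illustration, not from the text; numpy assumed) checks (3.7.2) for random vectors with p = 3:

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=50), rng.normal(size=50)
p = 3.0
q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1

lhs = np.sum(np.abs(x * y))
rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
assert lhs <= rhs + 1e-12            # Hölder inequality holds
print(lhs, "<=", rhs)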
Since fxx and fyy have different signs in (1) and (4), the function cannot
have a relative extremum there; these are saddle points. Since fxy = 0 = fyx,
we get from (2): (−24)(−42) > (0)², and from (3): (24)(42) > (0)². Hence,
the function f has a relative maximum at (−3, −5), a relative minimum at
(5, 2), and saddle points at (−3, 2) and (5, −5).
3.16. For the following function, find the critical points and determine if at
these points the function is a relative maximum, relative minimum, inflection
point, or saddle point.
(a) f (x, y) = 3x2 − xy + 2y 2 − 4x − 7y + 10.
(b) f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 72x.
(c) f (x, y) = 5x2 − 3y 2 − 30x + 7y + 4xy.
(d) f (x, y) = 3x3 − 5y 2 − 225x + 70y + 20.
Ans. (a) Critical point (1, 2); f has a global minimum at (1, 2).
(b) Critical point (0, 12); inflection point at (0, 12).
(c) Critical point (2, 5/2); saddle point.
(d) Critical points (5, 7), (−5, 7); relative maximum at (−5, 7), saddle point
at (5, 7).
3.17. Find (a) the critical points, and (b) test whether the following func-
tion is at a relative maximum or minimum: z = 3y 3 − x3 + 108x − 81y + 32.
Ans. (a) Equate the first-order partial derivatives to zero, and solve for
x and y: zx = −3x2 + 108 = 0, which gives x = ±6. Similarly, zy =
9y 2 − 81 = 0, which gives y = ±3. Thus, there are four distinct critical points
at (6, 3), (6, −3), (−6, 3), (−6, −3). The second partials are zxx = −6x,
zyy = 18y, and zxy = zyx = 0.
(b) Take the second-order direct partial derivatives, evaluate them at each
critical point, and check the signs:
(1) zxx(6, 3) = −36 < 0, zyy(6, 3) = 54 > 0,
(2) zxx(6, −3) = −36 < 0, zyy(6, −3) = −54 < 0,
(3) zxx(−6, 3) = 36 > 0, zyy(−6, 3) = 54 > 0,
(4) zxx(−6, −3) = 36 > 0, zyy(−6, −3) = −54 < 0.
Since zxx and zyy have different signs in (1) and (4), the function cannot have
a relative maximum or minimum at (6, 3) or (−6, −3); indeed, since
zxx zyy − (zxy)² is negative at both points, there is a saddle point at each of
these two points. Since zxx and zyy are both negative in (2) and both positive
in (3), and zxx zyy > (zxy)² in both cases, there is a relative maximum at
(6, −3) and a relative minimum at (−6, 3).
3.18. Test for relative maxima and minima of the function f (x, y) =
x3 + 3xy 2 − 3x2 − 3y 2 − 40.
Ans. fx = 3x2 + 3y 2 − 6x = 0, fy = 6xy − 6y = 0. Solving these equations
simultaneously, we get the critical points as (0, 0), (2, 0), (1, 1), (1, −1). Also,
fxx = 6x − 6, fyy = 6x − 6, fxy = fyx = 6y. Then
at (0, 0): fxx fyy − (fxy )2 > 0 and fxx < 0 ⇒ relative maximum;
at (2, 0): fxx fyy − (fxy )2 > 0 and fxx > 0 ⇒ relative minimum;
at (1, 1): fxx fyy − (fxy )2 < 0 ⇒ saddle point;
at (1, −1): fxx fyy − (fxy )2 < 0 ⇒ saddle point.
3.19. Find the minimum distance between the origin and the surface
z² = x²y + 4.
Ans. Let P(x, y, z) be any point on the surface. Then the square of the
distance from the origin is d² = x² + y² + z² = x² + y² + x²y + 4; minimizing
f(x, y) = x² + y² + x²y + 4 gives the feasible critical point (0, 0), so the
minimum distance is d = √4 = 2, attained at (0, 0, ±2).
3.22. Show that the density function f(x) = (λ/2) e^{−λ|x|}, λ > 0, of the
Laplace distribution is log-concave, i.e., that ln f(x) = ln(λ/2) − λ|x| is a
concave function. What can you say about f′(x)? Ans. f′(x) = −λ sgn(x) f(x),
so f′(x) < 0 for x > 0 and f′(x) > 0 for x < 0; f′(x) does not exist at x = 0.
Also see Appendix C.
3.23. Prove that a (concave, convex, or any) function f : Rn 7→ R is
differentiable at a point x ∈ dom(f ) iff the gradient ∇f exists. Proof. The
gradient of f is defined at x ∈ Rⁿ by

∇f(x) = (∂f/∂x1) e1 + (∂f/∂x2) e2 + · · · + (∂f/∂xn) en,

where ei is the unit vector in the direction of the ith coordinate axis,
i = 1, 2, . . . , n. If f is differentiable at x, then all ∂f/∂xi exist, which implies
that ∇f exists. Conversely, if ∇f exists, then all first-order partial derivatives
∂f/∂xi exist for each i = 1, 2, . . . , n. However, note that using ∇f is taking
the question too far; simply the existence of ∂f/∂xi, i = 1, 2, . . . , n, should
suffice.
3.24. Prove that the definitions (3.2.1) and (3.2.2) are equivalent. Hint.
Follow the proof of Theorem 3.4 by using proper inequality signs.
3.25. Let f : Rⁿ → R be defined by f(x) = p · x. Consider the problem of
minimizing p · x by choosing x subject to the condition that x belongs to a
constraint set G. Prove that the minimum value C(p) = min{p · x | x ∈ G}
is a linear homogeneous and concave function of p.
3.26. Optimize the following functions by (i) finding the critical values at
which the function is optimized, and (ii) testing the second-order condition
to determine if it is a relative maximum or minimum.
(a) f (x) = −x3 + 6x2 + 135x − 26; (b) f (x) = x4 − 4x3 − 80x2 + 108; and
(c) f (x) = (11 − 5x)4 .
Ans. (a) Critical points x = −5, 9; f ′′ (−5) > 0, convex, relative minimum
at x = −5; f ′′ (9) < 0, concave, relative maximum at x = 9.
(b) Critical points −5, 0, 8; f ′′ (−5) > 0, convex, relative minimum at x =
−5; f ′′ (0) < 0, concave, relative maximum at x = 0; f ′′ (8) > 0, convex,
relative minimum at x = 8.
(c) Critical point x = 11/5; f″(11/5) = 0, test fails; f‴(11/5) = 0, test
inconclusive; f⁽ⁱᵛ⁾(11/5) > 0, convex, relative minimum at x = 11/5.
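The higher-order derivative test in part (c) can be replayed symbolically. In the sympy sketch below (an added illustration, not the book's), the first nonvanishing derivative at x = 11/5 is of order four and positive, confirming a relative minimum:

import sympy as sp

x = sp.symbols('x', real=True)
f = (11 - 5*x)**4
x0 = sp.Rational(11, 5)

for n in range(1, 5):
    # derivatives of order 1, 2, 3 vanish; the 4th is 15000 > 0
    print(n, sp.diff(f, x, n).subs(x, x0))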
4
Concave Programming
4.1 Optimization
As we have seen, optimization problems deal with finding the maximum or
minimum of a function (i.e., optimizing the objective function) subject to
none or certain prescribed constraints.
We will consider the following four cases of necessary and sufficient condi-
tions for (local) optimality: (1) no constraints; (2) only equality constraints;
(3) equality and inequality constraints; and (4) only inequality constraints.
where the Hessian is defined in §1.6.2, and definite and semidefinite matrices
in §1.5.
For unconstrained minimization, the necessary and sufficient condi-
tions for a local minimum of f (x) at x∗ are:
(i) the first partial derivative of f with respect to each xi, i = 1, 2, . . . , n, is
zero, i.e.,
∂f/∂xi (x∗) = 0, i = 1, 2, . . . , n, (4.1.3)
where the critical point x∗ is obtained by solving equations (4.1.3) simulta-
neously; and
(ii) the Hessian H of f at x∗ is positive semidefinite (PSD), i.e.,
zᵀ H(x∗) z ≥ 0 for every z ∈ Rⁿ.
Many examples of minimization for the unconstrained case have been consid-
ered in the previous chapters, and some examples on minimization for this
case will be presented in the next chapter.
!!! The necessary and sufficient conditions (4.1.1) and (4.1.3) are sometimes
defined by stating that ∇f (x∗ ) = 0, which reduces to (4.1.1) and (4.1.3).
(1) fxx (2, 3) = −6(2) = −12 < 0, fyy (2, 3) = 12(3) = 36 > 0,
(2) fxx (2, −3) = −6(2) = −12 < 0, fyy (2, −3) = 12(−3) = −36 < 0,
(3) fxx (−2, 3) = −6(−2) = 12 > 0, fyy (−2, 3) = 12(3) = 36 > 0,
(4) fxx (−2, −3) = −6(−2) = 12 > 0, fyy (−2, −3) = 12(−3) = −36 < 0.
Since there are different signs for each of the second direct partials in (1) and
(4), the function f cannot have a relative extremum at (2, 3) and (−2, −3).
However, since the signs of second partials are both negative in (2) and both
positive in (3) above, the function f may have a relative maximum at (2, −3)
and a relative minimum at (−2, 3). Since fxx and fyy are of different signs,
the product of fxx and fyy cannot be greater than (fxy )2 .
Since fxy = 0 = fyx , we check fxx · fyy > (fxy )2 at the critical points
(2, −3) and (−2, 3):
fxx (2, −3)·fyy (2, −3) = (−12)(−36) > 0, fxx (−2, 3)·fyy (−2, 3) = (12)(36) > 0.
Hence, f has a relative maximum at (2, −3) and a relative minimum at
(−2, 3).
Example 4.2. Maximize the utility function u(x, y) = 3xy subject to the
constraint g(x) = 3x + 4y − 60. The Lagrangian is L(x, y, λ) = 3xy + λ(60 −
3x−4y), and the first-order partial derivatives equated to zero give the system
of equations Lx = 3y − 3λ = 0, Ly = 3x − 4λ = 0, Lλ = 60 − 3x − 4y = 0.
Writing this system in the matrix form Ax = b:
|0 3 −3; 3 0 −4; −3 −4 0| [x; y; λ] = [0; 0; −60].
Using Cramer's rule to solve this system, we have |A| = 72, |A1| = 720, |A2| =
540, |A3| = 540, which give the critical point x∗ = 10, y∗ = 7.5, λ∗ = 7.5. Next,
taking the second-order partial derivatives of L with respect to x, y (Lxx =
0, Lxy = 3 = Lyx, Lyy = 0) and the first-order partial derivatives of g
(gx = 3, gy = 4), and writing in the left-side form of (1.6.7), we get

|H̄| = |0 3 4; 3 0 3; 4 3 0|, or |H̄| = |0 3 3; 3 0 4; 3 4 0|.

Thus, we find the value |H̄2| = |H̄| = 72 from the bordered Hessian on the
left, and |H̄2| = |H̄| = 72 from the bordered Hessian on the right. Note
that there is no need to use both forms of the bordered Hessian; the left-hand
form works as well. Since |H̄| = |A| > 0, the bordered Hessian |H̄| is negative
definite. Although uxx · uyy = 0 < (uxy)² = 9, so the unconstrained test fails,
the bordered Hessian shows that u is maximized at the critical point
(x∗, y∗) = (10, 7.5).
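The critical point can be confirmed with a computer algebra system. The following sympy sketch (an added cross-check; the symbol lambda_ stands for λ) solves the three first-order equations:

import sympy as sp

x, y, lam = sp.symbols('x y lambda_', real=True)
L = 3*x*y + lam*(60 - 3*x - 4*y)

sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)   # [{x: 10, y: 15/2, lambda_: 15/2}]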
or
F (x, y, λ) = f (x, y) + λ(k − g(x, y)), (4.2.2)
The critical values at which the objective function is optimized are denoted
by x∗ , y ∗ , λ∗ , and are determined by solving equations (4.2.3) simultaneously.
The second-order conditions will obviously be different from those for the
unconstrained optimization considered in §4.1; they are discussed in the se-
quel.
|H̄| = |4 12 1; 12 −10 1; 1 1 0|,
and its second principal minor is |H̄2| = |H̄| = 4(−1) − 12(−1) + 1(12 + 10) =
30 > 0. Since |H̄2| > 0, |H̄| is negative definite, and so F(x, y) has a local
maximum at (22, 8).
Note that the Lagrange multiplier λ approximates the marginal impact
on the objective function caused by a small change in the constant of the
constraint. Thus, in the above example, with the value of λ∗ = 184, a 1-unit
increase in the constant of the constraint would increase the optimal value
of the objective function by approximately 184.
µ0 ∂f/∂x (x∗) + Σ_{i=1}^{m} µi ∂gi/∂x (x∗) + Σ_{j=1}^{k} λj ∂hj/∂x (x∗) = 0, (4.3.7)
then the KKT conditions belong to a wider class of first-order necessary con-
ditions (FONC) which allow non-smooth functions using subderivatives. Con-
dition (4.3.7) is known as the Fritz John condition, to be discussed later in
§5.3.
The general rule is: The necessary conditions are sufficient for optimal-
ity if the objective function f in an optimization (maximization) problem is a
concave function, the inequality constraints gi are continuously differentiable
convex functions, and the equality constraints hj are affine functions.
Example 4.5. (Nonlinear programming problem in R2 ) Consider the prob-
lem: maximize f (x, y) = xy subject to the constraints x + y 2 ≤ 2 for x, y ≥ 0.
Since the feasible region is bounded, a global maximum for this problem ex-
ists, because a continuous function on a closed and bounded (compact) set has
a maximum there. We write the given constraints as g1(x, y) = x + y² ≤ 2,
g2(x, y) = −x ≤ 0, g3(x, y) = −y ≤ 0. Then the KKT conditions can be
written as
y − λ1 + λ2 = 0, (4.3.10)
x − 2yλ1 + λ3 = 0, (4.3.11)
λ1(2 − x − y²) = 0, (4.3.12)
λ2 x = 0, (4.3.13)
λ3 y = 0, (4.3.14)
x + y² ≤ 2, (4.3.15)
x, y, λ1, λ2, λ3 ≥ 0. (4.3.16)
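The KKT solution of this problem can also be verified numerically. In the sketch below (an added illustration, not the book's; scipy assumed), the maximum of xy is found by minimizing −xy subject to x + y² ≤ 2 and x, y ≥ 0, landing at x ≈ 1.333, y ≈ 0.816:

import numpy as np
from scipy.optimize import minimize

obj = lambda v: -v[0] * v[1]                       # maximize xy via -xy
cons = ({'type': 'ineq', 'fun': lambda v: 2 - v[0] - v[1]**2},)  # g >= 0 form
res = minimize(obj, x0=[0.5, 0.5], bounds=[(0, None), (0, None)],
               constraints=cons, method='SLSQP')
print(res.x, -res.fun)   # ~[1.3333, 0.8165], maximum value ~1.0887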
This problem has two constraints. Using the Lagrange method we can set up
this problem with two constraints so that the Lagrange problem becomes
We find a solution for x∗ and y ∗ , and then check if the constraint that was
ignored (i.e., λ2 ) has been violated. If the answer is yes, then go to Step 2.
Step 2. (Use both constraints, assuming that they are binding) We take
λ1 > 0, λ2 > 0. Then the first-order KKT conditions (4.3.19) become
Lx = ux − λ1 Px − λ2 = 0,
Ly = uy − λ1 Py = 0,
(4.3.21)
Lλ1 = B − Px x − Py y = 0,
Lλ2 = x∗ − x = 0.
Then the solution will be the point where the two constraints intersect.
Step 3. (Use the second constraint, ignore the first one) We assume λ2 >
0, λ1 = 0, and the first-order KKT conditions (4.3.19) become
Lx = ux − λ1 Px − λ2 = 0,
Ly = uy − λ1 Py = 0, (4.3.22)
∗
Lλ2 = x − x = 0.
Lx = y − λ1 − λ2 = 0, x ≥ 0,
Ly = x − λ1 = 0, y ≥ 0,
(4.3.24)
Lλ1 = 90 − x − y ≥ 0, λ1 ≥ 0,
Lλ2 = 30 − x ≥ 0, λ2 ≥ 0.
The first-order KKT conditions, which are necessary and sufficient conditions
for maximization, give the following six relations:
1a. ∂F/∂xi = fxi(x1∗, x2∗) + λ∗ gxi(x1∗, x2∗) ≤ 0,
1b. xi∗ ≥ 0,
1c. xi∗ (∂F/∂xi) = 0, i = 1, 2,
2a. ∂F/∂λ = g(x1∗, x2∗) ≥ 0,
2b. λ∗ ≥ 0,
2c. λ∗ (∂F/∂λ) = 0, (4.4.3)
where x1∗, x2∗ are the critical points of f. Conditions (1c) and (2c) are called
the complementary-slackness conditions; they imply that xi∗ and ∂F/∂xi
cannot both be nonzero, and similarly that λ∗ and ∂F/∂λ cannot both be
nonzero.
∂xi ∂xi
For a linear objective function, which is concave or convex but not strictly
so, a concave program that satisfies the KKT conditions will always satisfy
the necessary and sufficient conditions for a maximum.
The significance of the KKT conditions (4.4.3) is as follows:
(i) Condition (1.a) requires that the Lagrangian function F be maximized
with respect to x1 and x2 , while condition (2.a) demands that the function F
be minimized with respect to λ. This means that the concave programming
is designed to seek out a saddle point in the function F in order to optimize
the objective function f subject to the inequality constraints g.
(ii) In optimization problems with equality constraints which are set equal
to zero, the quantity λg related to the constraint can be either subtracted
or added to the objective function f to define the Lagrangian function as in
Eqs (4.2.1) and (4.2.2). However, in concave programming with inequality
constraints, the order of subtraction is very important, since the constraint in
the KKT conditions is always expressed in the ≥ 0 form.
The KKT conditions with inequality constraints can be explained in the
single variable case as follows. Suppose we want to find a local maximum for
an objective function f (x) in the first quadrant x ≥ 0 (this is the inequality
constraint). There are three cases: (i) The critical point is an interior point
of the first quadrant (Figure 4.1(a)): f ′ (x) = 0 and x > 0; (ii) the critical
point is on the boundary (at G) (Figure 4.1(b)): f ′ (x) = 0 and x = 0; and
(iii) critical point is at H or J (Figure 4.1(c)): f ′ (x) < 0 and x = 0. Thus,
all these three cases can be stated concisely in one statement:
f ′ (x) ≤ 0, x ≥ 0 and xf ′ (x) = 0,
which is precisely contained in the KKT conditions. Note that these con-
ditions exclude a point like A in Figure 4.1(a), which is not a maximum
because f′(A) > 0.
At this point we are unable to determine whether the function h(z) looks
like h1(z) or h2(z). Thus, we need a set of first-order KKT conditions that
covers both possibilities:
either z∗ = 0 and h′(z∗) ≤ 0, or z∗ > 0 and h′(z∗) = 0;
equivalently, h′(z∗) ≤ 0 and z∗ h′(z∗) = 0.
Notice that the KKT conditions are similar to the complementary slackness
conditions which are used in the Lagrangian formulation. Recall that to
maximize f(x, y) subject to g(x, y) ≤ k, ignoring the non-negativity constraints
on x and y for now, the Lagrangian is
L = f(x, y) − λ(g(x, y) − k), (4.4.5)
with the first-order conditions
Lx = ∂f(x∗, y∗)/∂x − λ gx(x∗, y∗) = 0,
Ly = ∂f(x∗, y∗)/∂y − λ gy(x∗, y∗) = 0, (4.4.6)
g(x∗, y∗) − k ≤ 0,
λ∗ [g(x∗, y∗) − k] = 0.
The last condition in (4.4.6) implies that either the constraints bind, or else
the Lagrange multiplier λ is zero. This means that the solution could be in
region 7 or region 1 (see Figure 4.2a).
The KKT conditions are relevant in problems where we add the non-
negativity constraints x ≥ 0 and y ≥ 0 to the constrained maximization
problem. This adds the restriction implied by these constraints directly into
the first-order conditions, i.e., they capture the way the first-order conditions
change when the solution is in regions 2-6 in Figure 4.2(a).
Example 4.8. A company wants to maximize utility while spending no
more than a predetermined budget. Suppose the concave programming prob-
lem is posed as:
The Lagrangian is
Next, the Lagrangian L is minimized with respect to the variable λ, with
the related conditions:
2a. ∂L/∂λ = B − px x − py y,
2b. λ∗ ≥ 0,
2c. λ∗ (B − px x − py y) = 0.
ux − λ∗ px = 0, uy − λ∗ py = 0,
which gives
λ∗ = ux/px = uy/py. (4.4.7)
Since px, py > 0 and assuming that the customer is unsatiated (i.e., ux, uy >
0), we have λ∗ > 0. But then from (2c) we will have B − px x − py y = 0. Thus,
the budget constraint behaves exactly like an equality constraint (and not a
weak inequality). Hence, the optimal point (x∗, y∗) will lie somewhere on the
budget line and not below it. Further, from Eq (4.4.7) we also get
ux/uy = px/py. Since ux/uy is simply the slope of the indifference curve and
px/py is the slope of the budget line, whenever both x∗, y∗ > 0 (this case), the
indifference curve will be tangent to the budget line at the point of
optimization, and this provides an interior solution as in Figure 4.1(a) (see
Figure 4.3a). This case, in fact, reduces the problem to the constrained
optimization problem as discussed in the previous section.
Case 2. If x∗ = 0, y∗ > 0, then from (1c) we have
ux − λ∗ px ≤ 0, uy − λ∗ py = 0; thus λ∗ ≥ ux/px, λ∗ = uy/py. (4.4.8)
Assuming that ux , uy , px , py > 0, we get λ∗ > 0. Thus, from (2c) the budget
constraint behaves exactly like an equality constraint, and not a weak inequal-
ity, even though only one variable is greater than zero and the other equals
zero. Hence, as in case 1, the optimal point (x∗ , y ∗ ) will lie on the budget line
and not below it (Figure 4.3a).
Now, substituting λ∗ = uy/py into the first relation in (4.4.8), we get
ux/px ≤ uy/py, or ux/uy ≤ px/py.
This means that the indifference curves along the budget line are everywhere
flatter than the budget line, which leads to a corner solution in the upper
left (Figure 4.3b); at the corner solution, the slope of the indifference curve
that just touches the budget line may be flatter than or equal to the slope
of the budget line.
1a. Πx = 30 − 2x∗ − λ∗ ≤ 0, Πy = 64 − 4y ∗ − λ∗ ≤ 0,
1b. x∗ ≥ 0, y ∗ ≥ 0,
1c. x∗ (30 − 2x∗ − λ∗ ) = 0, y ∗ (64 − 4y ∗ − λ∗ ) = 0,
2a. Πλ = 36 − x∗ − y ∗ ,
2b. λ∗ ≥ 0,
2c. λ∗ (36 − x∗ − y ∗ ) = 0.
Assuming an interior solution with x∗, y∗, λ∗ > 0, conditions (1c) and (2c)
give the system
30 − 2x∗ − λ∗ = 0,
64 − 4y∗ − λ∗ = 0,
36 − x∗ − y∗ = 0,
which can be solved by Cramer's rule with A = |−2 0 −1; 0 −4 −1; −1 −1 0|
and b = [−30 −64 −36]ᵀ:
|A| = 6, |A1| = 110, |A2| = 106, |A3| = −40,
so that x∗ = 110/6 ≈ 18.33, y∗ = 106/6 ≈ 17.67, and λ∗ = −40/6 ≈ −6.67.
Since λ∗ < 0 violates condition (2b), the constraint cannot be binding; setting
λ∗ = 0 in (1c) gives the interior solution x∗ = 15, y∗ = 16, which satisfies the
constraint since x∗ + y∗ = 31 < 36.
F1(Y, C, Z; C0, I0, G0, X0, Z0, b, z) = Y − C + Z − I0 − G0 − X0 = 0,
F2(Y, C, Z; C0, I0, G0, X0, Z0, b, z) = C − C0 − bY = 0,
F3(Y, C, Z; C0, I0, G0, X0, Z0, b, z) = Z − Z0 − zY = 0.
(4.4.10)
We will discuss three cases:
Case 1. The partial derivatives of the unknown functions Y, C, Z with respect
to X0 are written in the matrix form as

|∂F1/∂Y ∂F1/∂C ∂F1/∂Z; ∂F2/∂Y ∂F2/∂C ∂F2/∂Z; ∂F3/∂Y ∂F3/∂C ∂F3/∂Z|
  [∂Y∗/∂X0; ∂C∗/∂X0; ∂Z∗/∂X0] = [−∂F1/∂X0; −∂F2/∂X0; −∂F3/∂X0], (4.4.11)

where Y∗, C∗, Z∗ denote the optimal values of the unknown functions. Sub-
stituting the values of the partial derivatives from (4.4.10) we obtain

|1 −1 1; −b 1 0; −z 0 1| [∂Y∗/∂X0; ∂C∗/∂X0; ∂Z∗/∂X0] = [1; 0; 0]. (4.4.12)

Denoting this system as Ax = b, and using Cramer's rule, we find that

|A| = 1 − b + z > 0, |A1| = |1 −1 1; 0 1 0; 0 0 1| = 1,
|A2| = |1 1 1; −b 0 0; −z 0 1| = b, |A3| = |1 −1 1; −b 1 0; −z 0 0| = z.

Thus,
∂Y∗/∂X0 = |A1|/|A| = 1/(1 − b + z) > 0,
∂C∗/∂X0 = |A2|/|A| = b/(1 − b + z) > 0,
∂Z∗/∂X0 = |A3|/|A| = z/(1 − b + z) > 0.
Case 2. The partial derivatives of the functions Y∗, C∗, Z∗ with respect to b
are obtained from (4.4.12) by replacing the unknown vector by
[∂Y∗/∂b ∂C∗/∂b ∂Z∗/∂b]ᵀ and the right-hand side by [0 Y∗ 0]ᵀ, thus giving

|1 −1 1; −b 1 0; −z 0 1| [∂Y∗/∂b; ∂C∗/∂b; ∂Z∗/∂b] = [0; Y∗; 0]. (4.4.13)

Then using Cramer's rule we get

∂Y∗/∂b = Y∗/(1 − b + z) > 0, ∂C∗/∂b = (1 + z)Y∗/(1 − b + z) > 0,
∂Z∗/∂b = zY∗/(1 − b + z) > 0.
Case 3. The partial derivatives of the functions Y∗, C∗, Z∗ with respect to z
are obtained from (4.4.12) by replacing the unknown vector by
[∂Y∗/∂z ∂C∗/∂z ∂Z∗/∂z]ᵀ and the right-hand side by [0 0 Y∗]ᵀ, thus giving

|1 −1 1; −b 1 0; −z 0 1| [∂Y∗/∂z; ∂C∗/∂z; ∂Z∗/∂z] = [0; 0; Y∗]. (4.4.14)

Then using Cramer's rule we get

∂Y∗/∂z = −Y∗/(1 − b + z) < 0, ∂C∗/∂z = −bY∗/(1 − b + z) < 0,
∂Z∗/∂z = (1 − b)Y∗/(1 − b + z) > 0.
dR/dQ = (µ/(1 + µ)) dC/dQ. (4.5.5)

Since, by assumption, both dR/dQ and dC/dQ are strictly positive, the
non-negativity condition on µ implies that µ > 0. Thus, this firm operates
at a level of output at which the marginal revenue dR/dQ is less than the
marginal cost dC/dQ. This conclusion is interesting because it contradicts
the behavior of a profit-maximizing firm, which operates at a level at which
the two are equal.
To operate, the firm must pay b per unit of output, whether it is peak (day)
or off-peak (night). Moreover, the firm must purchase capacity at a cost of
c per unit of output. Let K denote the total capacity measured in units of
Q. The firm must pay for capacity, regardless of whether it operates in the
off-peak period or not. Then the question is: Who should be charged for the
capacity costs, peak or off-peak customers? Thus, the firm’s maximization
problem becomes
Maximize P1Q1 + P2Q2 − b(Q1 + Q2) − cK
subject to K ≥ Q1, K ≥ Q2. (4.5.7)
If only the peak-period capacity constraint binds (λ1 > 0, λ2 = 0), the
first-order conditions give MR1 = b + c − λ2 = b + c and MR2 = b + λ2 = b,
with λ1 = c = 8 and λ2 = 0; i.e., equating marginal revenue (left) and
marginal cost (right) in each period,
22 − 2 × 10⁻⁴ Q1 = b + c = 14,
18 − 2 × 10⁻⁴ Q2 = b = 6.
Solving this system we get Q1 = 40,000 and Q2 = 60,000, which violates the
assumption Q2 ≤ Q1 = K (i.e., that the second constraint is non-binding).
Hence, both constraints must bind, and the capacity cost is shared between
the two periods.
a + bp = m − np + ky,
(b + n)p = m − a + ky,
which gives p ≡ p∗ = (m − a + ky)/(b + n). To determine how the equilibrium
price p∗ responds, we find the change in p∗ with respect to y (or any of the
other five parameters a, b, m, n, k): dp∗/dy = k/(b + n) > 0. This means
that an increase in consumers' income will result in an increase in the
equilibrium price of the commodity.
This analysis can also be carried out by defining Eqs (4.6.1) implicitly as
Qs − Qd = 0, i.e., via the implicit function F = a + bp − m + np − ky = 0.
Then, assuming Fp ≠ 0, we get
dp∗/dy = −Fy/Fp.
Since differentiating gives Fp = b + n and Fy = −k, the ratio is
dp∗/dy = k/(b + n) > 0.
Next, consider the case when there is more than one endogenous variable.
Thus, let
F1 (y1 , y2 ; x1 , x2 ) = 0,
(4.6.2)
F2 (y1 , y2 ; x1 , x2 ) = 0.
To find the partial derivatives of this system with respect to one of the
exogenous variables, say x1, the total derivative of both functions with
respect to x1 is given by

(∂F1/∂y1)(∂y1/∂x1) + (∂F1/∂y2)(∂y2/∂x1) = −∂F1/∂x1,
(∂F2/∂y1)(∂y1/∂x1) + (∂F2/∂y2)(∂y2/∂x1) = −∂F2/∂x1,

or
JX = B, (4.6.4)
where y1∗, y2∗ denote the incomes at the equilibrium point. Since |J| ≠ 0,
the optimal values of the endogenous variables y1∗, y2∗ are determined as
implicit functions of the exogenous variable x1:
∂yi∗/∂x1 = |Ji|/|J|. (4.6.5)
Using Cramer's rule, the first derivative ∂y1∗/∂x1 is obtained by replacing the
first column of J with the column vector B, and the second derivative
∂y2∗/∂x1 is obtained by replacing the second column of J with the column
vector B:

∂y1∗/∂x1 = |J1|/|J|
  = [−(∂F1/∂x1)(∂F2/∂y2) + (∂F2/∂x1)(∂F1/∂y2)]
    / [(∂F1/∂y1)(∂F2/∂y2) − (∂F2/∂y1)(∂F1/∂y2)],

and

∂y2∗/∂x1 = |J2|/|J|
  = [−(∂F1/∂y1)(∂F2/∂x1) + (∂F2/∂y1)(∂F1/∂x1)]
    / [(∂F1/∂y1)(∂F2/∂y2) − (∂F2/∂y1)(∂F1/∂y2)].
The partial derivatives ∂y1∗/∂x2 and ∂y2∗/∂x2 are determined by the same
method.
Example 4.13. Let the equilibrium in the goods and services market (IS
curve) and the money market (LM curve) be given, respectively, by
Y = C(Y, i) + C0, (4.6.6)
L(Y, i) = M0/P, (4.6.7)
where L(Y, i) denotes the demand for money, M0 the supply of money, C0 the
autonomous consumption, i the interest rate, and P the price level; thus M0/P
becomes the supply of real money. For the sake of simplicity, hold P
constant. Then the equilibrium levels of Y and i are affected by a change
in C0. Using the above method and Cramer's rule, we get from (4.6.6) and
(4.6.7):

(1 − CY) ∂Y/∂C0 − Ci ∂i/∂C0 = 1,
LY ∂Y/∂C0 + Li ∂i/∂C0 = 0,

which yield
∂Y∗/∂C0 = Li / [(1 − CY)Li + Ci LY] > 0,
∂i∗/∂C0 = −LY / [(1 − CY)Li + Ci LY] > 0.
Lx = y 2 − 2λ2 = 0,
Ly = 2xy − λ2 = 0, (4.6.13)
Lλ2 = 60 − 2x − y = 0.
Lx = y 2 − λ1 = 0,
Ly = 2xy − λ1 = 0, (4.6.14)
Lλ1 = 50 − x − y = 0.
Solving (4.6.14) for x and y, we get x∗ = 16.67, y∗ = 33.33. But when these
values are substituted into the coupon constraint (the last equation in
(4.6.13)), we find that 2x∗ + y∗ = 66.67 > 60, so the coupon constraint is
violated. Solving (4.6.13) instead gives x∗ = 10, y∗ = 40, and substituting
these values into the budget constraint gives x∗ + y∗ = 50. Thus, this solution
does not violate the budget constraint; in fact, it just meets it. However, this
result is unusual in the sense that although the budget constraint is met, it
is not binding due to the particular location of the coupon constraint.
4.7 Exercises
4.1. For the following function, find the critical points and determine if at
these points the function is a relative maximum, relative minimum, inflection
point, or saddle point.
(a) f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 72x.
(b) f (x, y) = 5x2 − 3y 2 − 30x + 7y + 4xy.
(c) f (x, y) = 3x3 − 5y 2 − 225x + 70y + 20.
(d) f (x, y) = x3 + 2y 3 − 3x2 + 9y 2 − 45x − 60y
Ans. (a) Critical point (0, 12); inflection point at (0, 12).
(b) Critical point (2, 5/2); saddle point.
(c) Critical points (5, 7), (−5, 7); relative maximum at (−5, 7), saddle point
at (5, 7).
(d) Critical points (−3, 2), (−3, −5), (5, 2), (5, −5); relative maximum at
(−3, −5) and a relative minimum at (5, 2); saddle points at (−3, 2)
and (5, −5).
4.2. Consider f (x, y) = 52x + 36y − 4xy − 6x2 − 3y 2 + 5. The first-
order partial derivatives, equated to zero, give fx = 52 − 4y − 12x = 0, fy =
36 − 4x − 6y = 0, so the critical point is (3, 4). The second-order partial
derivatives are fxx = −12, fxy = −4, fyy = −6. Since fxx < 0 and fyy < 0,
and fxx · fyy = 72 > (fxy)² = 16 at the point (3, 4), the function
f has a global maximum at (3, 4).
4.3. Consider f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 60x. The first-order
partial derivatives, equated to zero, give fx = −6x − 6y + 60 = 0, fy =
−6x − 4y + 48 = 0, so the critical point is (4, 6). The second-order partial
derivatives are fxx = −6 < 0, fxy = −6 = fyx, fyy = −4 < 0, and
fxx · fyy = (−6)(−4) = 24 < (fxy)² = 36. The function f has an inflection
point at (4, 6).
4.4. Use the method of Lagrange multipliers to solve the problem: Given a
budget constraint of $162 when PK = 3 and PL = 4, maximize the generalized
Cobb-Douglas production function q = K^0.4 L^0.5.
Hint. The Lagrangian is Q = K^0.4 L^0.5 + λ(162 − 3K − 4L). The critical
values are K∗ = 24, L∗ = 22.5. Next, |H̄2| > 0, so |H̄| is negative definite
and Q is maximized at the critical values.
4.5. Use the method of Lagrange multipliers to solve the problem: Max-
imize f(x, y) = ½x² + ½y² − 2xy − y subject to the equality constraint
g(x, y) = x + y − 2 = 0. Solution. The Lagrangian for the problem is

L(x, y, λ) = ½x² + ½y² − 2xy − y + λ(x + y − 2),

so that
|H| = |Lxx Lxy; Lyx Lyy| = |1 −2; −2 1| = −3 < 0.
4.6. Maximize the utility function u(x, y) = x^0.3 y^0.4 subject to the budget
constraint g(x, y) = 2x + 8y = 172.
Hint. Let L(x, y, λ) = x^0.3 y^0.4 + λ(172 − 2x − 8y). The critical values are
x∗ = 258/7 ≈ 36.86, y∗ = 86/7 ≈ 12.29, λ∗ ≈ 0.033, and the bordered Hessian
|H̄| is negative definite.
4.7. If the equilibrium in the goods and services market (IS curve) and the
money market (LM curve) are defined as in Example 4.13, what effect will a
change in M0 have on Y∗ and i∗?
Ans. Take P as a constant. Then

(1 − CY) ∂Y/∂M0 − Ci ∂i/∂M0 = 0,
LY ∂Y/∂M0 + Li ∂i/∂M0 − 1/P = 0,

or

|1 − CY −Ci; LY Li| [∂Y∗/∂M0; ∂i∗/∂M0] = [0; 1/P].

Then using Cramer's rule we get

∂Y∗/∂M0 = Ci / (P[(1 − CY)Li + Ci LY]) > 0,
∂i∗/∂M0 = (1 − CY) / (P[(1 − CY)Li + Ci LY]) < 0.
This means that an increase in the money supply M0 will increase the equi-
librium level of income, but decrease the equilibrium interest rate.
4.8. Maximize f(x, y, z) = xyz² subject to the constraint x + y + z = 20,
using the Lagrangian F(x, y, z, λ) = xyz² + λ(20 − x − y − z).
Then Fx = yz² − λ = 0, Fy = xz² − λ = 0, Fz = 2xyz − λ = 0, Fλ =
20 − x − y − z = 0. To solve these equations simultaneously, we equate λ from
the first two equations, and from the first and the third equation, giving

yz² = xz², yz² = 2xyz,

so that x = y and z = 2x; the constraint then yields the critical values
x∗ = y∗ = 5, z∗ = 10. Expanding the bordered Hessian

|H̄3| = |H̄| = |0 1 1 1; 1 0 z² 2yz; 1 z² 0 2xz; 1 2yz 2xz 2xy|

along the first row gives

|H̄3| = z⁴ − 4xz³ − 4yz³ − 4xyz² + 4x²z² + 4y²z².

Thus, |H̄3|(5, 5, 10) = −20000 < 0. Hence, |H̄2| > 0 and |H̄3| < 0 imply that |H̄|
is negative definite, and the function f is maximized at the critical values.
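The solution can be confirmed numerically. The sketch below (an added illustration; the objective f = xyz² and the constraint x + y + z = 20 are the ones inferred above from the first-order conditions; scipy assumed) uses SLSQP:

import numpy as np
from scipy.optimize import minimize

obj = lambda v: -(v[0] * v[1] * v[2]**2)            # maximize via minimize
cons = ({'type': 'eq', 'fun': lambda v: v[0] + v[1] + v[2] - 20},)
res = minimize(obj, x0=[6.0, 6.0, 8.0], constraints=cons, method='SLSQP')
print(res.x, -res.fun)   # ~[5, 5, 10], maximum value 2500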
4.9. Maximize the total utility defined by u(x, y) = 10x2 + 15xy − 3y 2
when the firm meets the quota g(x, y) equal to 2x + y = 23.
Ans. Critical values: x∗ = 9, y ∗ = 5, λ∗ = 105; |H| is negative definite
and u is maximized at (x∗ , y ∗ ) = (9, 5).
4.10. Maximize the utility function u = Q1Q2 when P1 = 1, P2 = 3, and
the firm's budget is B = 60, so that the budget constraint is Q1 + 3Q2 = 60.
Also estimate the effect of a 1-unit increase in the budget. We consider the
Lagrangian
L = Q1Q2 + λ(60 − Q1 − 3Q2).
The first-order conditions give Q2 − λ = 0, Q1 − 3λ = 0, and
60 − Q1 − 3Q2 = 0, so the critical values are Q1∗ = 30, Q2∗ = 10, λ∗ = 10, with
|H| = |0 1; 1 0| = −1 < 0.
With λ∗ = 10, a $1 increase in the budget will change the constant of the
constraint to 61, so that the new Lagrangian is
L = Q1Q2 + λ(61 − Q1 − 3Q2),
∂Y/∂M0 − CY ∂Y/∂M0 − Ci ∂i/∂M0 = 0,
LY ∂Y/∂M0 + Li ∂i/∂M0 − 1/P = 0,
and the money market is in equilibrium when the demand for money is equal
to the money supply, i.e., when
L(Y, i) = M0.
or (2K^(−1.5))/(3L^(−1.5)) = 2/5, which simplifies to L^1.5 = 0.4(1.5)K^1.5 =
0.6K^1.5, or L ≈ 0.7K. Then using the last equation 80 − 2K − 5L = 0, we get
the critical values as K∗ ≈ 14.45 and L∗ ≈ 10.18.
Next, the second-order partial derivatives of Q are:
QKK = −36K^(−2.5)(0.4K^(−0.5) + 0.6L^(−0.5))⁻³ + 14.4K^(−3)(0.4K^(−0.5) + 0.6L^(−0.5))⁻⁴,
QLL = −54L^(−2.5)(0.4K^(−0.5) + 0.6L^(−0.5))⁻³ + 32.4L^(−3)(0.4K^(−0.5) + 0.6L^(−0.5))⁻⁴,
QKL = 21.6K^(−1.5)L^(−1.5)(0.4K^(−0.5) + 0.6L^(−0.5))⁻⁴ = QLK.
Using the values of K∗ for K and L∗ for L, and carrying out some compu-
tations, we find that QKK ≈ −1.09, QLL ≈ −2.24, and QKL ≈ 1.5. Thus,
since QKK < 0, QLL < 0, and QKK · QLL = 2.44 > (QKL)² = 2.25, we
conclude that Q is maximized at the point (K∗, L∗). Alternatively, from the
Hessian |H| = |−1.09 1.5; 1.5 −2.24|, we find that |H1| = −1.09 < 0 and
|H2| = |H| = 0.19 > 0, so the Hessian is negative definite (ND), and Q is
maximized at (K∗, L∗).
5
Convex Programming
∂f/∂xi (x) = 0, (5.1.1)
where the Hessian is defined in §1.6.2, and definite and semidefinite matrices
are discussed in §1.5.
Also, it is easy to check that fxx ·fyy > (fxy )2 . Hence, f has a global minimum
at (1,2).
or
L(x, y, λ) = f (x, y) − λ(k − g(x, y)), (5.1.4)
where λ > 0 is known as the Lagrange multiplier, f (x, y) as the original func-
tion or the objective function, and g(x, y) as the constraint. Since the con-
straint is always set equal to zero, the product λ(g(x, y) − k), or λ(k − g(x, y))
is zero, and therefore, the addition of this term does not change the value
of the objective function f (x, y). The critical values at which the objective
function is optimized are denoted by x∗ , y ∗ , λ∗ , and are determined by taking
the first-order partial derivatives with respect to x, y, and λ, equating them
to zero, and solving them simultaneously:
The second-order conditions, which are obviously different from those of the
unconstrained optimization, are similar to those discussed in the previous
chapter.
The first-order conditions (5.1.5) are similar to the KKT conditions, which
are the necessary conditions, discussed in detail in §4.3; they are applicable in
equality and inequality constraints to the Lagrangian L(x, y, λ) = f (x, y) +
λ(g(x, y) − k).
so we have
∂L/∂x = x − y + λ = 0, ∂L/∂y = 2y − x − 5 + λ = 0, and ∂L/∂λ = x + y − 5 = 0.
Solving these equations simultaneously, we get the optimal values x∗ = 2,
y∗ = 3, and λ∗ = 1. Notice that the factor g(x, y) − k is zero at these optimal
values, as expected. To check the sufficient conditions: the Hessian is
|H| = |Lxx Lxy; Lyx Lyy| = |1 −1; −1 2| = 1 > 0,
with the first-order principal minor |H1| = 1 > 0 and the second-order
principal minor |H2| = |H| > 0. Thus, the Hessian is positive definite, and the
conditions for a minimum are satisfied.
5.1.3 Equality Constraints: General Case. Consider the problem:
Minimize f (x) such that gj (x) = 0, j = 1, 2, . . . , m, x ∈ Rn .
Case 1: m = 1 (single constraint). This case corresponds to the case in
§5.1.2, and the method of Lagrange multipliers is used. In this case, since the
(equality) constraint is g(x) = 0, the point x lies on the graph of the nonlinear
equation g(x) = 0 (see Figure 5.1). The necessary condition reduces to
∂f/∂x (x∗) + λ ∂g/∂x (x∗) = 0, (5.1.6)
and the point x∗ where the minimum occurs is called a minimizer for the
problem. Notice that condition (5.1.6) can also be expressed as ∇f(x∗) +
λ∇g(x∗) = 0.
Case 2 (general case). In Rn the necessary condition (5.1.6) holds for each
constraint gj (x) = 0. The Lagrangian for this problem is
L(x, λ) = f(x) + Σ_{j=1}^{m} λj gj(x). (5.1.7)
∂L/∂x = 2x − y + λ = 0, ∂L/∂y = 2y − x + 4 + λ = 0, ∂L/∂λ = x + y − 2 = 0,
and
|H| = |Lxx Lxy; Lyx Lyy| = |2 −1; −1 2| = 3 > 0.
Next, consider the Lagrangian
L(x, y, λ) = ½x² + ½y² − xy + 4y + λ(x + y − 2),
for which
|H| = |Lxx Lxy; Lyx Lyy| = |1 −1; −1 1| = 0.
Since the Hessian test fails, we use condition (5.1.9). Since ∂g/∂x = (1, 1)ᵀ,
the condition zᵀ(∂g/∂x) = z1 + z2 = 0 gives z2 = −z1. Next, consider
z = (z1, z2)ᵀ ≠ (0, 0)ᵀ. Then

zᵀ H z = [z1 z2] |1 −1; −1 1| [z1; z2] = (z1 − z2)² = (2z1)² > 0.
L(x, y, λ) = x² + ½y² − 12x − 4y − 60 + λ(30x + 20y + S² − 120).
Then
∂L/∂x = 2x − 12 + 30λ = 0,
∂L/∂y = y − 4 + 20λ = 0,
∂L/∂λ = 30x + 20y + S² − 120 = 0,
∂L/∂S = 2λS = 0.
This is a nonlinear system of equations, which defines only the necessary KKT
conditions. We now consider two cases:
Case 1. If λ = 0, then solving the first two equations, we get x = 6 and
y = 4. Then the third equation gives S² = −140, which is infeasible.
Case 2. If S = 0, then the equations give x = 60/17 ≈ 3.529, y = 12/17 ≈
0.706, and λ = 14/85 ≈ 0.165. Thus, the only feasible solution is the point
(60/17, 12/17), yielding f(x∗, y∗) = −92.47.
Next, to check the sufficient conditions, we have the Hessian |H| = |2 0; 0 1| =
2 > 0, and the first-order and second-order principal minors are |H1| = 2 > 0
and |H2| = |H| > 0. Thus, the Hessian is positive definite, and the conditions
for a minimum are satisfied.
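A numerical cross-check of this example (an added illustration; scipy assumed) recovers the same minimizer:

from scipy.optimize import minimize

obj = lambda v: v[0]**2 + 0.5*v[1]**2 - 12*v[0] - 4*v[1] - 60
cons = ({'type': 'ineq', 'fun': lambda v: 120 - 30*v[0] - 20*v[1]},)
res = minimize(obj, x0=[0.0, 0.0], constraints=cons, method='SLSQP')
print(res.x, res.fun)   # ~[3.529, 0.706] = (60/17, 12/17), f* ~ -92.47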
Example 5.7. Minimize 0.5x² + 0.5y² − 8x − 2y − 80 subject to the
constraint g(x, y) = 20x + 10y ≤ 130. Using the slack variable S ≥ 0, the
constraint becomes g(x, y) = 20x + 10y + S − 130 = 0. The Lagrangian is
L(x, y, λ, S) = 0.5x² + 0.5y² − 8x − 2y − 80 + λ(20x + 10y + S − 130),
so that
∂L/∂x = x − 8 + 20λ = 0, ∂L/∂y = y − 2 + 10λ = 0,
∂L/∂λ = 20x + 10y + S − 130 = 0, ∂L/∂S = λ.
5.1.5 General Linear Case. Consider the problem: Minimize f(x) subject
to the constraints gj(x) ≤ 0, j = 1, 2, . . . , m, x ∈ Rⁿ. The geometric
interpretation of this problem is as follows: As in the case of the equality
constraint, in this case we also have the necessary conditions defined by Eq
(5.1.6) for some value of λ which in the equality case is simply a positive or
negative real number. However, in the case of inequality constraints the sign
of λ is known in advance depending on the direction of −∇f , as shown in
Figure 5.3, where ∇g represents the direction of increase of g(x).
Thus, for minimization, the Lagrangian is L(x, λ) = f(x) + λ g(x), and the
KKT conditions are
(1) ∂L/∂xi (x∗, λ∗) = ∂f/∂xi (x∗) + λ∗ ∂g/∂xi (x∗) = 0, for all i = 1, . . . , n,
(2) g(x∗) ≤ 0, (5.1.10)
(3) λ∗ g(x∗) = 0, λ∗ ≥ 0.
Note that condition (2) yields the given inequality constraint, and condition
(3) is complementary slackness, i.e., either λ∗ = 0 or g(x∗) = 0.
Note that if the inequality constraints are of the form g(x) ≥ 0, then we must
have λj∗ ≤ 0; or, alternatively, use −g(x) and retain λ∗ ≥ 0.
!!! Note that condition (1) in (5.1.10) and (5.1.11) can also be expressed,
respectively, as ∇xL(x∗, λ∗) = ∇f(x∗) + λ∗∇g(x∗) = 0, and
∇xL(x∗, λ∗) = ∇f(x∗) + Σ_{j=1}^{m} λj∗∇gj(x∗) = 0 (n equations).
gj (x) ≤ 0, j = 1, 2, . . . , m,
gj (x) ≥ 0, j = m + 1, . . . p,
hj (x) = 0, j = p + 1, . . . , q, x ∈ Rn ,
where the hj are linear functions of the form hj(x) = Σ_{k=1}^{n} ajk xk − bj. Note
that the domain of each one of these functions is nonempty. Thus, a convex
minimization program has a convex objective, and the set of feasible solutions
is a convex set.
The Lagrangian is

L(x, λ) = f(x) + Σ_{j=1}^{m} λj gj(x) − Σ_{j=m+1}^{p} λj gj(x) + Σ_{j=p+1}^{q} µj hj(x). (5.2.1)
If x∗ minimizes f(x) while the above constraints are satisfied, then, provided
certain regularity conditions, or constraint qualifications to be discussed in
the sequel, are met, there exist vectors λ∗ and µ∗ such that the following
KKT necessary conditions are satisfied:
(1) ∂f/∂x (x∗) + Σ_{j=1}^{m} λj ∂gj/∂x (x∗) − Σ_{j=m+1}^{p} λj ∂gj/∂x (x∗)
  + Σ_{j=p+1}^{q} µj ∂hj/∂x (x∗) = 0,
(2) all constraints given in (5.2.1) are satisfied,
(3) λj∗ ≥ 0, j = 1, 2, . . . , m, (5.2.2)
(4) λj∗ ≥ 0, j = m + 1, . . . , p,
(5) µj∗ are unrestricted in sign for j = p + 1, . . . , q.
where λ and µ are the Lagrange multipliers. Then the KKT conditions are:
(a) ∂F/∂x = ∂F/∂y = ∂F/∂z = ∂F/∂w = 0; (b) x + y + z + w = 1; (c) w ≤ C;
(d) µ ≥ 0; and (e) µ(w − C) = 0.
Condition (a) yields 2x − λ = 0, 2y − λ = 0, 2z − λ = 0, 2w − λ + µ = 0,
which gives x = y = z = λ/2 and w = (λ − µ)/2.
Case 1. If C > 1/4, we have the interior case (see Figure 5.4). Since
1/4 − C ≤ 0, we find from (5.2.4) that condition (d) is satisfied. Thus, from
(5.2.3) we get

x = y = z ≥ 1/4;   w = 1 − (x + y + z) ≤ 1/4.

But by condition (e) we have μ = 0, and hence x = y = z = w = 1/4. This is
the optimal case, even if we require w < C and C > 1/4.
Case 2. If C = 1/4, this is similar to Case 1, and the unconstrained optimum
lies on the boundary.
Case 3. If C < 1/4, and if w < C, then condition (e) would require that
μ = 0. But then (5.2.3) would give x = 1/4, which would violate condition (c).
Hence, w = C and x = y = z = (1 − C)/3. Further,

f = 3 · ((1 − C)/3)² + C² = (1 − C)²/3 + C² = (1 − 2C + 4C²)/3,

and thus f ≥ 1/4, with f = 1/4 when C = 1/4. The graph of f is presented in
Figure 5.5.
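The case analysis above can be spot-checked numerically. The sketch below
assumes, consistent with conditions (a)–(e), that the underlying problem is
to minimize x² + y² + z² + w² subject to x + y + z + w = 1 and w ≤ C; this
reconstruction of the objective is an assumption, since the problem statement
itself is not reproduced here.

# Sketch: numerically verify the three cases, assuming the problem is
#   minimize x^2 + y^2 + z^2 + w^2  subject to  x + y + z + w = 1, w <= C.
import numpy as np
from scipy.optimize import minimize

def solve(C):
    cons = [{'type': 'eq',   'fun': lambda v: v.sum() - 1},
            {'type': 'ineq', 'fun': lambda v: C - v[3]}]    # w <= C
    res = minimize(lambda v: float((v**2).sum()), np.full(4, 0.25),
                   constraints=cons)
    return res.x, res.fun

for C in (0.5, 0.25, 0.1):          # Cases 1, 2, 3
    xopt, fopt = solve(C)
    print(C, np.round(xopt, 4), round(fopt, 4))
# C = 0.1 gives x = y = z = 0.3, w = 0.1, f = (1 - 2C + 4C^2)/3 = 0.28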
Consider the general problem: Minimize f(x) subject to the constraints

gj(x) ≤ 0, j = 1, 2, . . . , m,
gj(x) ≥ 0, j = m + 1, . . . , p,
gj(x) = 0, j = p + 1, . . . , q, x ∈ Rn.

The Lagrangian is

L(x, λ) = f(x) + ∑_{j=1}^{m} λj gj(x) − ∑_{j=m+1}^{p} λj gj(x) + ∑_{j=p+1}^{q} μj gj(x).   (5.2.5)
If x∗ minimizes f (x) while the above conditions (5.1.11) are satisfied, then
provided certain regularity conditions, or constraint qualifications to be dis-
cussed in the sequel, are met, there exist vectors λ ∗ and µ ∗ such that the
following KKT necessary conditions are satisfied:
(1) ∂f/∂x (x∗) + ∑_{j=1}^{m} λj ∂gj/∂x (x∗) − ∑_{j=m+1}^{p} λj ∂gj/∂x (x∗) + ∑_{j=p+1}^{q} μj ∂gj/∂x (x∗) = 0,
(2) all constraints given in (5.2.1) are satisfied,
(3) λ∗j ≥ 0, j = 1, 2, . . . , m,                                 (5.2.6)
(4) λ∗j ≥ 0, j = m + 1, . . . , p,
(5) μ∗j are unrestricted in sign for j = p + 1, . . . , q.
5.2.2 Two Inequality Constraints. The general case is: Minimize f (x)
subject to gj (x) ≤ 0 for j = 1, 2, . . . , m, and qi (x) = xi ≥ 0 for i = 1, 2, . . . , n.
Let the Lagrange multipliers be µ1 , µ2 , . . . , µn associated with each of the non-
negativity constraints. Then, using complementary slackness in the KKT
conditions, we will have μ∗i x∗i = 0 for i = 1, 2, . . . , n, and condition (1)
in (5.2.6) becomes

(1a) ∂f/∂x (x∗) + ∑_{j=1}^{m} λj ∂gj/∂x (x∗) − ∑_{i=1}^{n} μ∗i ∂qi/∂x (x∗) = 0,

∑_{i=1}^{n} x∗i μ∗i = [x∗]^T ( ∇f(x∗) + ∑_{j=1}^{m} λ∗j ∇gj(x∗) ) = 0.   (5.2.10)
Note that no explicit Lagrange multipliers are used for the non-negativity
constraints. Also, although the KKT conditions are generally used to check
optimality, they are not valid in all situations. There is another set of
necessary conditions, known as the Fritz John conditions, discussed in the
next section, which are valid at all times, but in many cases they do not
provide the same information as the KKT conditions.
!!! In the above discussion, in view of Theorem 2.18, the expression (1a) can
be expressed as

∇f(x∗) + ∑_{j=1}^{m} λj ∇gj(x∗) − ∑_{i=1}^{n} μ∗i ∇qi(x∗) = 0;

or

∇f(x∗) + ∑_{j=1}^{m} λj ∇gj(x∗) = μ∗;

that is,

∇f(x∗) + ∑_{j=1}^{m} λj ∇gj(x∗) ≥ 0.
g(x∗) < 0. Hence, the local minimum is identified by the same conditions as
in Case 2, Eq (5.2.2).
!!! Eqs (5.2.11) and (5.2.12) can be expressed, respectively, as
−∇x f(x) = λ∇x g(x), and −∇x f(x) = λ∇x g(x) with λ > 0.
L̃(x, λ) = λ∗0 f(x∗) + ∑_{j=1}^{m} λ∗j gj(x∗).   (5.3.1)
If x∗ is the minimizer, then there exists a λ∗ ∈ R^{m+1}, and the Fritz John
conditions are

(1) ∂L̃/∂x (x∗, λ∗) = λ∗0 ∂f/∂x (x∗) + ∑_{j=1}^{m} λ∗j ∂gj/∂x (x∗) = 0,
(2) gj(x∗) ≤ 0, j = 1, 2, . . . , m,                              (5.3.2)
(3) λ∗j gj(x∗) = 0, j = 1, 2, . . . , m,
(4) λ∗ ≥ 0 and λ∗ ≠ 0.
!!! The first Fritz John condition in (5.3.2) can also be written as

∇x L̃(x∗, λ∗) = λ∗0 ∇f(x∗) + ∑_{j=1}^{m} λ∗j ∇gj(x∗) = 0.
The Fritz John (FJ) conditions are always necessary for x∗ to be a solution.
However, the KKT conditions are necessary provided certain conditions
known as constraint qualifications (CQ) are satisfied. This can be represented
as

Local optimum =⇒ Fritz John =⇒ KKT,

where the second implication holds under the CQ. The multiplier λ∗0 plays a
crucial role in the Fritz John conditions, because if λ∗0 = 0, then these
conditions do not use the objective and they are of no practical use in
locating the optimal point x∗.
Remember that the CQs are essentially the conditions that ensure that
λ∗0 > 0. Thus, if we redefine λ∗j as λ∗j/λ∗0, j = 0, 1, . . . , m, then the
Fritz John conditions reduce to the KKT conditions, and the factor λ∗0 = 1
can be ignored.
Example 5.11. Minimize f(x, y) = −x subject to g1(x, y) = y − (1 − x)⁵ ≤ 0,
and g2(x, y) = −y ≤ 0, where x = (x, y). The graphs are presented in
Figure 5.7, where the feasible region and the optimal point x∗ are identified
for this problem.
In matrix notation, x = [x, y]^T. The optimal solution is x∗ = (1, 0), which in
matrix form is written as x∗ = [1, 0]^T. Now, in view of x∗, we have
g1(x∗, y∗) = y and g2(x∗, y∗) = −y, and so we get

∂f/∂x (x∗) = [−1, 0]^T,   ∂g1/∂x (x∗) = [0, 1]^T,   ∂g2/∂x (x∗) = [0, −1]^T.
Note that the CQs are not met, since the first partials of both constraints
that are satisfied as strict equalities at x∗ are not linearly independent. Next,
the FJ conditions are

λ0 ∂f/∂x (x∗) + λ1 ∂g1/∂x (x∗) + λ2 ∂g2/∂x (x∗) = 0,

i.e.,

λ0 [−1, 0]^T + λ1 [0, 1]^T + λ2 [0, −1]^T = [0, 0]^T,

which are satisfied if λ0 = 0 and λ1 = λ2. On the other hand, the KKT
conditions require −∂f/∂x (x∗) = λ1 ∂g1/∂x (x∗) + λ2 ∂g2/∂x (x∗), i.e.,
λ1(0) + λ2(0) = 1 and λ1(1) + λ2(−1) = 0, which are inconsistent, that is,
they cannot be solved for λ1 and λ2.
Note that the CQs are not met, so the KKT conditions need not hold at the
optimum. However, the FJ conditions identify the optimal point. The gradient
vectors of both constraints that are satisfied as strict equalities at x∗ are not
linearly independent. Next, the FJ conditions that provide an optimal solution
are

λ0 ∂f/∂x (x∗) + λ1 ∂g1/∂x (x∗) + λ2 ∂g2/∂x (x∗) = 0,

i.e.,

λ0 [0, −1]^T + λ1 [1, 0]^T + λ2 [−1, 0]^T = [0, 0]^T,

which are satisfied if λ0 = 0 and λ1 = λ2. On the other hand, the KKT
conditions are

−∂f/∂x (x∗) = [0, 1]^T = λ1 [1, 0]^T + λ2 [−1, 0]^T,
i.e., λ1 (1) + λ2 (−1) = 0 and λ1 (0) + λ2 (0) = 1, which are inconsistent, that
is, they cannot be solved for λ1 and λ2 .
5.3.1 Feasibility. The following four cases for the feasibility problem are
considered.
Case 1. A convex optimization problem with equality and inequality con-
straints is to find
Note that

f(x∗) = ∞ if the problem is infeasible, and f(x∗) = −∞ if the problem is
unbounded below,

where the infeasibility of the problem means that no x satisfies the constraints.
Then (i) x is feasible if x ∈ dom(f) and it satisfies the constraints; (ii) a
feasible x is optimal if f(x) = f(x∗).
Case 2. Find
This problem can be regarded as a special case of the above general problem
(5.3.3) with f (x) = 0. For this problem
f(x∗) = 0 if the constraints are feasible, and f(x∗) = ∞ if the constraints are
infeasible,

where ‖z − x‖₂ ≤ A.
D = ⋂_{i=0}^{m} dom(gi) ∩ ⋂_{j=1}^{k} dom(hj),   (5.3.7)

min f(x) = − ∑_{i=1}^{k} log(bi − ai^T x).   (5.3.8)
Note that the feasible set of a convex optimization problem is a convex set.
Example 5.15. Find

min { f(x) = x² + y² } subject to g1(x) = (x + y)² = 0; h1(x) = x/(1 + y²) ≤ 0.

Note that f is convex, and the feasible set {(x, y) | x = −y ≤ 0} is convex. But
h1 is not convex, and g1 is not affine. Hence, it is not a convex optimization
problem.
An equivalent, but not identical, problem to Example 5.15 is
then the CQ holds at x since the relative interior is nonempty. Moreover, for
a convex program where the CQ holds, the KKT necessary conditions are also
sufficient.
max L(x, λ, μ) subject to ∂L/∂x (x, λ, μ) = 0, λ, μ ≥ 0, x ∈ D,

where L̂ is the dual Lagrangian.
Theorem 5.1. (Weak duality theorem) The dual Lagrangian L̂(λ, μ) and the
primal Lagrangian L̄(x, λ, μ) are related by the inequality L̂(λ, μ) ≤ L̄(x).
The quantity L̄(x) − L̂(λ, μ) is called the duality gap.
Then, the dual problem is to find the slope λ of the tangential line (or plane)
for which the intercept with the z2 -axis is maximized.
that is,

L(x, λ) = 5x² + 2xy + y² + λ(k − 3x − y).

Verify that this objective function L(x, λ) is convex; thus, the minimum is
obtained by using the necessary and sufficient conditions, which are

∂L/∂x = 10x + 2y − 3λ = 0,   ∂L/∂y = 2x + 2y − λ = 0,

giving x = y = λ/4. Substituting into L gives the dual L̂(λ) = kλ − λ²/2, so

∂L̂/∂λ = k − λ = 0, which yields λ = k > 0.

Since λ > 0, we get the dual solution x∗ = y∗ = λ∗/4 = k/4, which is feasible.
The complementary slackness is satisfied since λ∗(k − 3x∗ − y∗) = 0. Hence,
(x∗, y∗) = (k/4, k/4) ∈ D is the optimal point for the primal problem.
Moreover, since f(x∗) = 5x∗² + 2x∗y∗ + y∗² = k²/2 = L̂(λ∗), there
is no duality gap.
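As a supplementary check (not in the original), the absence of a duality gap
can be confirmed symbolically, under the assumption that the primal problem
is to minimize 5x² + 2xy + y² subject to 3x + y ≥ k, the form consistent with
the Lagrangian above.

# Sketch: confirm the zero duality gap, assuming the primal is
#   minimize 5x^2 + 2xy + y^2  subject to  3x + y >= k.
import sympy as sp

x, y, lam, k = sp.symbols('x y lam k', positive=True)
L = 5*x**2 + 2*x*y + y**2 + lam*(k - 3*x - y)

sol = sp.solve([sp.diff(L, x), sp.diff(L, y)], [x, y])   # x = y = lam/4
Lhat = sp.simplify(L.subs(sol))                          # k*lam - lam**2/2
lam_star = sp.solve(sp.diff(Lhat, lam), lam)[0]          # lam* = k

xs, ys = sol[x].subs(lam, lam_star), sol[y].subs(lam, lam_star)
f_star = 5*xs**2 + 2*xs*ys + ys**2                       # = k**2/2
print(sp.simplify(f_star - Lhat.subs(lam, lam_star)))    # 0: no duality gap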
i.e.,

Minimize x² + xy + y² subject to x + y ≥ 6, x, y ≥ 0.

The associated dual problem is

max_{λ≥0} min_{x∈D} L(x, λ) = max_{λ≥0} L̂(λ).
Verify that this objective function L(x, λ) is convex; thus, the minimum is
obtained by using the necessary and sufficient conditions, which are

∂L/∂x = 2x + y − λ = 0,   ∂L/∂y = x + 2y − λ = 0,

giving x = y = λ/3. Verify that the objective function for this min problem is
concave. Hence,

∂L̂/∂λ = 6 − (2/3)λ = 0, which yields λ = 9 > 0.

Since λ > 0, we get the dual solution

x∗ = y∗ = λ∗/3 = 3 > 0, and x∗ + y∗ = 6,
The Lagrangian is L(x, λ) = −2x2 −3x3 +λ(x2 −1). For the primal problem
the optimality condition is
∂L
= 0, which yields −4x−9x2 +2λx = 0, λ(x2 −1) = 0, λ ≥ 0, x2 ≤ 1.
∂x
The dual problem is max_{λ≥0} L̂(λ), that is,

L̂(λ) = min_x L(x, λ) = min_x { −2x² − 3x³ + λ(x² − 1) } = −∞ for all values of λ.
Thus, the primal objective = −5, and the dual objective = −∞. Hence,
there is a duality gap. Geometrically, let z1 = g(x) = x2 − 1 and z2 =
f (x) = −2x2 − 3x3 . Then the supporting plane (line) runs through (−1, 0)
and intersects the z2 -axis at −∞ (see Figure 5.10).
L(x̄, λ̄) ≤ L(x, λ̄) for all x ∈ D, i.e., L(x̄, λ̄) = min_{x∈D} L(x, λ̄) = L̂(λ̄),   (5.4.2a)

L(x̄, λ̄) ≥ L(x̄, λ) for all λ ≥ 0, i.e., L(x̄, λ̄) = max_{λ≥0} L(x̄, λ) = L̄(x̄).   (5.4.2b)

L̄(x̄) = L(x̄, λ̄) = L̂(λ̄).   (5.4.3)
This shows that the primal objective is equal to the dual objective and the
duality gap is zero.
where f (x) and gj (x) are all convex functions, and D is a convex set.
The saddle point sufficiency condition is: If x̄ ∈ D and λ̄ ≥ 0, then (x̄, λ̄)
is a saddle point of L(x, λ) iff
(i) x̄ minimizes L(x, λ̄) = f(x) + λ̄^T g(x) over D;
(ii) gj(x̄) ≤ 0 for each j = 1, 2, . . . , m;
(iii) λ̄j gj(x̄) = 0, which implies that f(x̄) = L(x̄, λ̄).
If (x̄, λ̄) is a saddle point of L(x, λ), then x̄ solves the primal problem (5.4.4)
and λ̄ solves the dual problem, which is

max_{λ≥0} L̂(λ) = max_{λ≥0} min_{x∈D} L(x, λ).   (5.4.5)
where D ⊆ Rn is a nonempty convex set, f (x) and gj (x) are convex and hj (x)
are linear functions.
The dual problem is defined as follows: Find
Ψ = sup L̂(λ, μ) subject to λ ≥ 0,   (5.4.7)

where L̂(λ, μ) = inf_{x∈D} { f(x) + λ^T g(x) + μ^T h(x) }. In (5.4.6) and
(5.4.7), inf may be replaced by min, and sup by max.
Theorem 5.2. (Strong duality theorem) Assume that the following CQ
holds: There exists an x̂ ∈ D such that gj(x̂) < 0 for j = 1, 2, . . . , m1, and
hj(x̂) = 0 for j = 1, 2, . . . , m2, and 0 ∈ int{h(D)}, where h(D) = {h(x) : x ∈ D}.
Then

Φ = Ψ.   (5.4.8)
5.5 Exercises
5.1. Let the production function be a Cobb-Douglas function with decreas-
ing returns to scale, so that the firm’s profit function is defined by
π = P AK α Lβ − rK − wL. (5.5.1)
F1 (K, L; r, w, P, A, α, β) = αP AK α−1 Lβ − r = 0,
F2 (K, L; r, w, P, A, α, β) = βP AK α Lβ−1 − w = 0.
|A| = [α(α − 1)P AK^{α−2}L^β][β(β − 1)P AK^α L^{β−2}] − (αβP AK^{α−1}L^{β−1})²
    = αβ(1 − α − β)P²A²K^{2α−2}L^{2β−2} > 0 for α + β < 1.
Thus,
where tr(A) is the trace of the matrix A (i.e., sum of the diagonal elements
of A). This means that an increase in wages will decrease the demand for
capital. Similarly,
This shows that an increase in wages will reduce the optimal level of labor
used.
Case 2. To compute the change in the demand for capital and for labor due to
an increase in output price, i.e., ∂K∗/∂P and ∂L∗/∂P, the first-order
conditions can be expressed by partial derivatives with respect to P (the
output price) in the matrix form Ax = b as

[ α(α − 1)P AK^{α−2}L^β     αβP AK^{α−1}L^{β−1}   ] [ ∂K∗/∂P ]   [ −αAK^{α−1}L^β  ]
[ αβP AK^{α−1}L^{β−1}       β(β − 1)P AK^α L^{β−2}] [ ∂L∗/∂P ] = [ −βAK^α L^{β−1} ].
Note that the matrix A is the same as in Case 1, while in this case we have

|A1| = | −αAK^{α−1}L^β     αβP AK^{α−1}L^{β−1}   |
       | −βAK^α L^{β−1}    β(β − 1)P AK^α L^{β−2}| = αβP A²K^{2α−1}L^{2β−2} > 0,

|A2| = | α(α − 1)P AK^{α−2}L^β    −αAK^{α−1}L^β  |
       | αβP AK^{α−1}L^{β−1}      −βAK^α L^{β−1} | = αβP A²K^{2α−2}L^{2β−1} > 0.
This yields ∂K∗/∂P = |A1|/|A| > 0, which shows that an increase in the
output price will increase the demand for capital. Similarly,
∂L∗/∂P = |A2|/|A| > 0, which shows that an increase in the output price will
increase the optimal level of labor used.
5.2. Consider f (x, y) = 3x2 − xy + 2y 2 − 4x − 7y + 8. The first-order partial
derivatives, equated to zero, give: fx = 6x − y − 4 = 0, fy = −x + 4y − 7 = 0,
solving which we get x = 1, y = 2. Thus, the critical point is (1, 2). The
second-order partial derivatives are: fxx = 6, fxy = fyx = −1, fyy = 4.
Checking the condition fxx · fyy > (fxy)², we have 6 · 4 > (−1)². Since both
fxx and fyy are positive, we have a global minimum at (1, 2).
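These computations can be confirmed in a few lines (an added check, not
part of the original exercise):

# Sketch: confirm the critical point and second-order test of Exercise 5.2.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = 3*x**2 - x*y + 2*y**2 - 4*x - 7*y + 8
print(sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y]))   # {x: 1, y: 2}
H = sp.hessian(f, (x, y))
print(H[0, 0], H.det())   # 6 > 0 and 23 > 0: a global minimum at (1, 2)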
5.3. Optimize f (x, y) = 4x2 + 3xy + 6y 2 , subject to the constraint x + y =
28. The Lagrangian in the form (5.1.2) is
|H̄| = | 8  3  1 |
      | 3 12  1 |
      | 1  1  0 | ,

and its second principal minor is |H̄2| = |H̄| = 8(−1) − 3(−1) + 1(3 − 12) =
−14 < 0. Since |H̄2| < 0, |H̄| is positive definite, and F(x, y) is at a local
minimum.
5.4. Optimize the following Cobb-Douglas production functions subject
to the given constraints by (i) using the Lagrange function and finding the
critical points, and (ii) by using the Hessian.
(a) Q = K^{0.4}L^{0.5} subject to 6K + 2L = 270. We get

L/K = (3)(0.5)/0.4 = 3.75, which gives L = 3.75K.

Since |H̄2| > 0, |H̄| is negative definite, and Q is maximized at the point
(20, 75).
5.5. Maximize the utility function u = x^{0.6}y^{0.3} subject to the budget
constraint 8x + 5y = 300.
Ans. Since U(x, y) = x^{0.6}y^{0.3} + λ(300 − 8x − 5y), we have
Ux = 0.6x^{−0.4}y^{0.3} − 8λ, Uy = 0.3x^{0.6}y^{−0.7} − 5λ, Uλ = 300 − 8x − 5y.
Then from the first two equations we get y = (4/5)x, which after substituting
in the last equation gives the critical point (25, 20).
5.6. Minimize the total costs defined by c = 15x² + 30xy + 30y² when the
firm meets the quota g(x, y) given by 2x + 3y = 60. Define

|H̄| = | 30 30 2 |
      | 30 60 3 |
      |  2  3 0 | .

The second principal minor is |H̄2| = −150 < 0. Thus, |H̄| is positive definite
and c is minimized when x = y = 12.
5.7. Minimize the utility u = x^{1/2} y^{3/5} subject to the budget constraint
3x + 9y = 66. Define
since all terms are positive. Hence, |H̄| is positive definite, and U is minimized
at the critical values.
5.8. Minimize x2 + 2y 2 subject to x + y ≥ 3 and y − x2 ≥ 1. Solution.
The Lagrangian is
L(x, y, λ1 , λ2 ) = x2 + 2y 2 + λ1 (3 − x − y) + λ2 (1 − y + x2 ), λ1 , λ2 ≥ 0.
Then
Lx (x, y, λ1 , λ2 ) = 2x − λ1 + 2λ2 x = 0,
Ly (x, y, λ1 , λ2 ) = 4y − λ1 − λ2 = 0,
Lλ1 (x, y, λ1 , λ2 ) = 3 − x − y = 0,
Lλ2 (x, y, λ1 , λ2 ) = 1 − y + x2 = 0. (5.5.3)
Solving the last two equations in (5.5.3), we get (x, y) = (−2, 5) or (x, y) =
(1, 2). Using (x, y) = (−2, 5) in the first two equations in (5.5.3) we find that
λ1 = 28, λ2 = −8, which is not feasible. Next, using (x, y) = (1, 2) in the first
two equations in (5.5.3) we get λ1 = 6, λ2 = 2, which is feasible and the point
(1, 2) is the global minimizer.
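A short SymPy sketch (added here as an illustrative check, not part of the
original solution) confirms these multiplier values.

# Sketch: solve the first-order system of Exercise 5.8 and inspect the
# multipliers; only lam1, lam2 >= 0 is feasible.
import sympy as sp

x, y, l1, l2 = sp.symbols('x y l1 l2', real=True)
L = x**2 + 2*y**2 + l1*(3 - x - y) + l2*(1 - y + x**2)
eqs = [sp.diff(L, v) for v in (x, y, l1, l2)]
for sol in sp.solve(eqs, [x, y, l1, l2], dict=True):
    print(sol)
# (x, y) = (-2, 5): l1 = 28, l2 = -8 (infeasible);
# (x, y) = (1, 2):  l1 = 6,  l2 = 2  (feasible global minimizer).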
5.9. Minimize x2 + y 2 − 4x − 4y subject to x2 ≤ y, x + y ≤ 2. Solution.
The Lagrangian is

L(x, y, λ1, λ2) = x² + y² − 4x − 4y + λ1(x² − y) + λ2(x + y − 2), λ1, λ2 ≥ 0.

Then
Lx (x, y, λ1 , λ2 ) = 2x − 4 + 2λ1 x + λ2 = 0,
Ly (x, y, λ1 , λ2 ) = 2y − 4 − λ1 + λ2 = 0,
Lλ1 (x, y, λ1 , λ2 ) = x2 − y = 0,
Lλ2 (x, y, λ1 , λ2 ) = x + y − 2 = 0. (5.5.4)
Solving the last two equations in (5.5.4), we get (x, y) = (−2, 4) or (x, y) =
(1, 1). Using (x, y) = (−2, 4) in the first two equations in (5.5.4) we find that
λ1 = −4, λ2 = −8, which is not feasible. Next, using (x, y) = (1, 1) in the
first two equations in (5.5.4) we get λ1 = 0, λ2 = 2, which is feasible and the
point (1, 1) is the global minimizer.
5.10. The Constant Elasticity of Substitution (CES) production function
is defined by
−1/β
q = A αK −β + (1 − α)L−β , (5.5.1)
where A > 0 is the coefficient parameter, α (0 < α < 1) the distribution
parameter denoting relative factor shares, and β > −1 the substitution pa-
rameter that determines the value of elasticity of substitution (see Exercise
2.35). Consider q = 100(0.4K^{−0.5} + 0.6L^{−0.5})^{−2}, and determine the
relative minimum. Using the Lagrangian, the first-order partial derivatives of
Q ≡ q are:

QK = 40K^{−1.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} = 0,
QL = 60L^{−1.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} = 0.
Solving these two equations we get L^{1.5} = 1.5K^{1.5}, or L ≈ 1.3K. The
second-order partial derivatives of Q are:

QKK = −60K^{−2.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} + 24K^{−3}(0.4K^{−0.5} + 0.6L^{−0.5})^{−4},
QLL = −90L^{−2.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} + 54L^{−3}(0.4K^{−0.5} + 0.6L^{−0.5})^{−4},
QKL = 36K^{−1.5}L^{−1.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−4} = QLK.
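These derivatives are easy to mistype; the following SymPy sketch (an added
check, not part of the original exercise) verifies each of them symbolically.

# Sketch: verify the CES first- and second-order partials of Exercise 5.10.
import sympy as sp

K, L = sp.symbols('K L', positive=True)
u = sp.Rational(2, 5)/sp.sqrt(K) + sp.Rational(3, 5)/sp.sqrt(L)
Q = 100*u**(-2)

checks = [
    sp.diff(Q, K)    - 40*K**sp.Rational(-3, 2)*u**(-3),
    sp.diff(Q, K, 2) - (-60*K**sp.Rational(-5, 2)*u**(-3)
                        + 24*K**(-3)*u**(-4)),
    sp.diff(Q, K, L) - 36*K**sp.Rational(-3, 2)*L**sp.Rational(-3, 2)*u**(-4),
]
print([sp.simplify(c) for c in checks])   # [0, 0, 0]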
Upper-level sets, also known as upper contour sets, are convex sets for quasi-
concave functions, and they are used in problems involving consumer’s utility
maximization and a company’s cost minimization. For example, let an input
requirement in the case of a production function correspond to the upper-
level set (6.1.1), where α denotes an output level, x an input vector, and f a
single-output production function. Then, in the case of utility maximization,
where u denotes a utility function, the set of all consumption bundles {x}
that are preferable to a given consumption bundle {x∗} is also an upper-level
set Uα = {x | u(x) ≥ α} with α = u(x∗).
A function f is said to be quasi-concave iff the upper-level set Uα is a
convex set for every α ∈ R(f), the range of f.
A real-valued function f : Rn → R is strictly quasi-concave iff

f(tx + (1 − t)x′) > min{f(x), f(x′)}

for all distinct x, x′ ∈ Rn and for all t ∈ (0, 1). This definition differs from
the above definition of quasi-concavity in that only strict inequality is used.
Let x, x′ ∈ R be two distinct points on the x-axis (a convex set) such
that the interval [x, x′ ] supports an arc AB on the curve, and B is higher
than A (see Figure 6.1(a)). Since all the points between A and B on the
arc are strictly higher than A, it satisfies the condition of quasi-concavity.
The curves are strictly quasi-concave if all possible [x, x′ ] intervals have arcs
that satisfy this same condition. Notice that this function also satisfies the
condition of non-strict quasi-concavity, but does not satisfy the condition of
quasi-convexity, because some points on the arc AB are higher than A, and
this is not acceptable for a quasi-convex function. Figure 6.1(b) presents the
case where a horizontal line segment A′ B ′ exists on which all points have the
same height. This curve meets the condition of quasi-concavity, but does not
satisfy that of strict quasi-concavity.
Note that in general a quasi-concave function that is also concave has its
graph approximately shaped like a bell, or part thereof, and a quasi-convex
function has its graph shaped like an inverted bell, or a part of it. Thus,
quasi-concavity (or quasi-convexity) is a weaker condition than concavity (or
convexity).
The above geometrical characterization leads to the following algebraic
definition: A function f is quasi-concave (quasi-convex) iff, for any pair of
distinct points x and x′ in the (convex-set) domain of f, and for 0 < t < 1,

f(x′) ≥ f(x) =⇒ f(tx + (1 − t)x′) ≥ f(x)    (quasi-concave),
f(x′) ≥ f(x) =⇒ f(tx + (1 − t)x′) ≤ f(x′)   (quasi-convex).   (6.1.3)
the function −f, we will have −f(x) ≥ −f(x′) and −f(tx + (1 − t)x′) ≤ −f(x).
Thus, −f satisfies the condition of quasi-convexity.
Concavity implies quasi-concavity. To prove, let f be concave. Then
f (tx + (1 − t)x′ ) ≥ tf (x) + (1 − t)f (x′ ). Now, assume that f (x′ ) ≥ f (x).
Then any weighted average of f (x) and f (x′ ) cannot possibly be less than
f (x), i.e., tf (x) + (1 − t)f (x′ ) ≥ f (x). Combining these two results we find
that f (tx + (1 − t)x′ ) ≥ f (x) for f (x′ ) ≥ f (x), which satisfies the definition
of quasi-concavity.
The condition of quasi-concavity does not guarantee concavity.
In the case of concave (and convex) functions, there is a very useful result:
the sum of concave (convex) functions is also concave (convex). However, this
result cannot be generalized to quasi-concave and quasi-convex functions.
Sometimes quasi-concavity and quasi-convexity can be checked by using
the following definition:
A function f(x), where x = (x1, . . . , xn) ∈ Rn, is quasi-concave (quasi-convex)
iff, for any constant k, the set

S≥ ≡ {x | f(x) ≥ k}   (quasi-concave), respectively
S≤ ≡ {x | f(x) ≤ k}   (quasi-convex),

is a convex set.   (6.1.4)
The three functions in Figure 6.2 all contain concave as well as convex
segments, and therefore they are neither concave nor convex. However, the
function in Figure 6.2(a) is quasi-concave because, for any value of k
(although the figure shows only one value of k), the set S≥ is convex. The
function in Figure 6.2(b) is, however, quasi-convex, since the set S≤ is convex.
The function in Figure 6.2(c) is a monotone function, and it differs from the
other two functions in that both S≥ and S≤ are convex sets. Hence, the function is
both quasi-concave and quasi-convex. Note that formula (6.1.4) can be used
to check quasi-concavity and quasi-convexity, but it cannot verify whether
they are strict or nonstrict.
Example 6.1. Check f(x) = x², x ≥ 0, for quasi-concavity and quasi-
convexity. The graph of the function shows that it is a convex, indeed strictly
convex, function. It is also quasi-concave because its graph is the increasing
branch of a U-shaped curve, starting at the origin; it is similar to Figure 6.2(c),
generating a convex S≥ as well as a convex S≤ set. Alternatively, we use
formula (6.1.3). If x
and x′ are two distinct nonnegative values of x, then f (x) = x2 , f (x′ ) = x′2 ,
and f (tx + (1 − t)x′ ) = (tx + (1 − t)x′ )2 . Now, suppose f (x′ ) ≥ f (x), i.e.,
x′2 ≥ x2 ; then, x′ ≥ x, or specifically x′ > x, since x and x′ are distinct
points. Thus, the weighted average tx + (1 − t)x′ must lie between x and x′ ,
and we have for 0 < t < 1,
x′2 > (tx + (1 − t)x′ )2 > x2 or f (x′ ) > f (tx + (1 − t)x′ ) > f (x).
But in view of (6.1.3), this result implies that f is both strictly quasi-concave
and strictly quasi-convex.
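The strict chain of inequalities can also be spot-checked numerically; the
little script below (an added illustration, not part of the original example)
samples random pairs on the nonnegative half-line.

# Sketch: spot-check the strict inequalities of Example 6.1.
import random

f = lambda x: x*x
random.seed(0)
for _ in range(1000):
    a, b = sorted(random.uniform(0.0, 10.0) for _ in range(2))
    t = random.uniform(0.01, 0.99)
    m = f(t*a + (1 - t)*b)
    assert f(a) < m < f(b)    # f(x) < f(tx + (1-t)x') < f(x') for a < b
print("x**2 is strictly quasi-concave and strictly quasi-convex on x >= 0")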
Example 6.2. Show that f (x, y) = xy, x, y ≥ 0 is quasi-concave. Use
the criterion in (6.1.4) and show that the set S ≥ = {(x, y) | xy ≥ k} is a
convex set for any k. Note that the curve xy = k with k ≥ 0 is a different
curve for each k. If k > 0, this curve is a rectangular hyperbola in the first
quadrant of the xy-plane, and the set consisting of all points on and above
this hyperbola is a convex set. But if k = 0, the curve is defined by xy = 0,
which constitutes the nonnegative parts of the x and y axes, and it is again
a convex set. Hence, the function f(x, y) = xy, x, y ≥ 0, is quasi-concave. Be
careful not to confuse the given curve with z = xy, which is a surface in the
(x, y, z)-space. In this example we are examining the level curves of that
surface in the xy-plane.
Example 6.3. Show that the function f (x, y) = (x−a)2 +(y−b)2 is convex
and so it is quasi-convex. Use the criterion (6.1.4), and set (x−a)2 +(y −b)2 =
k, k ≥ 0. For each k, the curve is a circle in the xy-plane with center at (a, b)
and radius √k. Since the set {(x, y) | (x − a)² + (y − b)² ≤ k} is the set of all
points on and inside this circle, it is a convex set, even when k = 0, in which
case the circle degenerates into the single point (a, b), and a set with a single
point is a convex set. Hence, the given function is quasi-convex.
For strict quasi-concavity and quasi-convexity, the right side of (6.2.2) must
be changed to strict inequality > 0.
If a function f (x), x ∈ Rn , is twice continuously differentiable, we can
check quasi-concavity and quasi-convexity by using the bordered Hessian |B|
(single function, §1.6.4) defined by
|B| = | 0    f1   f2   · · ·  fn  |
      | f1   f11  f12  · · ·  f1n |
      | f2   f21  f22  · · ·  f2n |       (6.2.3)
      | · · ·                 · · ·|
      | fn   fn1  fn2  · · ·  fnn | ,

where fi = ∂f/∂xi and fij = ∂²f/(∂xi ∂xj), i, j = 1, . . . , n. Note that, unlike
the bordered Hessian |H̄| described in §1.6.3 and used for optimization
problems involving an extraneous constraint g, the border of |B| is composed
of the first derivatives of the function f itself, without any extraneous
constraint g. The leading principal minors of |B| are

|B1| = | 0   f1  | ,    |B2| = | 0   f1   f2  | ,    . . . ,    |Bn| = |B|.   (6.2.4)
       | f1  f11 |             | f1  f11  f12 |
                               | f2  f21  f22 |
We will state two conditions, one of which is necessary and the other is
sufficient, and both relate to quasi-concavity on a domain consisting only
of the nonnegative orthant (the n-dimensional analogue of the nonnegative
quadrant) which is defined by x1 , . . . , xn ≥ 0. These conditions are as follows:
The necessary condition for a function z = f(x) to be quasi-concave on the
nonnegative orthant is

|B1| ≤ 0, |B2| ≥ 0, . . . , |Bn| ≤ 0 if n is odd, |Bn| ≥ 0 if n is even,   (6.2.5)

where the partial derivatives are evaluated in the nonnegative orthant. Recall
that the first condition in (6.2.5) is automatically satisfied, since
|B1| = −f1² = −(∂f/∂x1)².
The sufficient condition for f to be strictly quasi-concave on the nonnegative
orthant is that

|B1| < 0, |B2| > 0, . . . , |Bn| < 0 if n is odd, |Bn| > 0 if n is even,   (6.2.6)

where the partial derivatives are evaluated in the nonnegative orthant. The
details of these conditions are available in Arrow and Enthoven [1961:797]
and Takayama [1993:65].
Example 6.4. The function f (x1 , x2 ) = x1 x2 , x1 , x2 ≥ 0 is quasi-concave
(compare Example 6.2). We will check it using (6.2.2). Let u = (u1 , u2 ) and
v = (v1 , v2 ) be two points in dom(f ). Then f (u) = u1 u2 and f (v) = v1 v2.
Assume that
f (v) ≥ f (u), or v1 v2 ≥ u1 u2 , (6.2.7)
where u1 , u2 , v1 , v2 ≥ 0. Since the partial derivatives of f are f1 = x2 and
f2 = x1 , condition (6.2.2) implies that f1 (u)(v1 − u1 ) + f2 (u)(v2 − u2 ) =
u2 (v1 − u1) + u1 (v2 − u2) ≥ 0, which after rearranging the terms is

u2 v1 + u1 v2 ≥ 2u1 u2.   (6.2.8)
Now there are four cases to consider depending on the values of u1 and u2 :
(1) If u1 = u2 = 0, then (6.2.8) is trivially satisfied.
(2) If u1 = 0 and u2 > 0, then (6.2.8) reduces to u2 v1 ≥ 0, which is again
satisfied since u2 and v1 are both nonnegative.
(3) If u1 > 0 and u2 = 0, then (6.2.8) reduces to 0 ≥ −u1 v2 , which is
satisfied.
(4) Suppose u1 , u2 > 0, so that v1 , v2 > 0 also. Subtracting v2 u1 from both
sides of (6.2.7), we obtain
|B1| = | 0   x2 |
       | x2  0  | = −x2² ≤ 0,     |B2| = | 0   x2  x1 |
                                         | x2  0   1  |
                                         | x1  1   0  | = 2x1 x2 ≥ 0.
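The two minors just displayed can be reproduced mechanically; the following
SymPy sketch (an illustrative addition) computes them from the definition
(6.2.4).

# Sketch: the bordered-Hessian minors (6.2.4) for f(x1, x2) = x1*x2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', nonnegative=True)
f = x1*x2
f1, f2 = sp.diff(f, x1), sp.diff(f, x2)
f11, f12, f22 = sp.diff(f, x1, 2), sp.diff(f, x1, x2), sp.diff(f, x2, 2)

B1 = sp.Matrix([[0, f1], [f1, f11]])
B2 = sp.Matrix([[0, f1, f2], [f1, f11, f12], [f2, f12, f22]])
print(B1.det(), B2.det())   # -x2**2 <= 0 and 2*x1*x2 >= 0 on the orthant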
|B1| = | 0   fx  |
       | fx  fxx | = −(ax^{a−1}y^b)² < 0,

|B2| = | 0   fx   fy  |
       | fx  fxx  fxy |
       | fy  fyx  fyy | = [2a²b² − a(a − 1)b² − a²b(b − 1)] x^{3a−2} y^{3b−2} > 0,
(Sufficiency ⇐) Let α = min{f (x), f (y)}. Then both x ∈ BαU and y ∈ BαU
hold. Since BαU is a convex set, we get tx + (1 − t)y ∈ BαU , which implies that
f (tx + (1 − t)y) ≥ α = min{f (x), f (y)}. This proves quasi-concavity of f .
This proposition states that a function f is quasi-concave iff for any α the
upper-level set of f is a convex set.
Example 6.7. Consider the function

f(x) = { x       if x ≤ 0,
         0       if 0 ≤ x ≤ 1,
         x − 1   if x ≥ 1.
see from the topological map of any actual mountain. In fact, the contours of
mountains do not generally enclose convex sets.
In reality, mountains generally come in different forms. For example, a
mountain may be a deformation of a cone that gets progressively steeper at
higher altitudes, becoming harder to climb. In this case a straight line from
the top of the mountain to any other point on its surface will not lie on or
under the surface, but rather pass through the air. The function defined by
the surface of such a mountain is not concave.
Let us consider the surface of a mountain as function f (x, y), where x
denotes longitude and y the latitude. Then a contour is a level curve of f . A
function with the property that, for every value of a, the set of points (x, y)
such that f (x, y) ≥ a is a convex set is said to be quasi-concave. The set of
points (x, y) in this definition lie inside every contour on a topographical map.
Example 6.10. Let f(x, y) = −x² − y². The upper-level set of f for α is
the set of points (x, y) such that −x² − y² ≥ α, or x² + y² ≤ −α. Thus, if
α > 0, the upper-level set Uα is empty, whereas if α < 0, it is a disk of radius
√(−α).
have

tf(x) + (1 − t)f(y) ≥ ta + (1 − t)a = a,   (6.5.2)

and hence f(tx + (1 − t)y) ≥ a,
Example 6.13. Consider the bell-shaped surfaces g(x, y) = e^{−x²−y²} and
h(x, y) = e^{−(x−1)²−(y−1)²}. Both functions are quasi-concave. Let their sum
be G(x, y) = g(x, y) + h(x, y). The function G(x, y) is not quasi-concave
as it consists of two disjoint bell-shaped surfaces, representing the functions
g(x, y) and h(x, y), respectively. The indifference curves and the upper-level
set (shaded region) of G(x, y) corresponding to α = 0.7 are shown in Figure
6.9. It is obvious that the upper-level set is not convex because it consists
of two parts. This example shows that, in general, the sum of quasi-concave
functions is not necessarily a quasi-concave function.
Figure 6.9 (a) Indifference curves, and (b) upper-level set of the function G(x, y).
f (tx + (1 − t)y) > tf (x) + (1 − t)f (y) ≥ min{f (x), f (y)}, (6.7.2)
Hence,
functions; but g(x) is not strictly quasi-concave; in fact, it is not even quasi-
concave, although it is strictly convex (see Figure 6.8).
which holds for t > 0 and t small enough. But this contradicts the assumption
that x̄ is a local maximum.
Note that the strict quasi-concavity of f also implies that optimal solutions
of the problem (6.8.1) are unique provided they exist. This fact has impor-
tant implications in economics: for example, if strictly quasi-concave utility
functions are used, the solution of consumer’s optimization problem will be
unique.
Theorem 6.9. Let X be a nonempty convex set and assume that f is
strictly quasi-concave. In case an optimal solution of the problem (6.8.1)
exists, it is unique.
Proof. Assume that x∗ ∈ X is an optimal solution of the problem (6.8.1),
and let y ∈ X, y ≠ x∗, be another maximum. Then we have f(x∗) = f(y), and
the strict quasi-concavity of f implies that

f(tx∗ + (1 − t)y) > min{f(x∗), f(y)} = f(x∗)

holds for any t ∈ (0, 1). But this contradicts the optimality of x∗ since, in
view of convexity of X, we get tx∗ + (1 − t)y ∈ X for t ∈ [0, 1].
These are necessary conditions of optimality under the usual regularity con-
ditions for general nonlinear programming problems. More details about the
KKT conditions are described in §4.3. Besides these conditions, we have the
following results especially for quasi-concave optimization problems.
Theorem 6.10. (Arrow and Enthoven [1961]) Assume that g1 , . . . , gm are
quasi-concave functions and that the following regularity conditions hold:
(a) there exists an x̄ ∈ Rn such that gi (x̄) > 0 for all i = 1, . . . , m (Slater
condition), and
(b) for each i = 1, . . . , m, either gi is concave or else ∂gi/∂x ≠ 0 at
each feasible (admissible) solution of the problem (6.8.2).
each feasible (admissible) solution of the problem (6.8.2).
since all terms are positive. Hence, |H| is positive definite, and L is minimized
at the critical values.
6.9 Summary
Recall that a concave function f over X satisfies the inequality (3.2.2), i.e.,
f(tx + (1 − t)y) ≥ tf(x) + (1 − t)f(y) for any x, y ∈ X and for t ∈ [0, 1].
From this inequality we obtain
or equivalently
6.10 Exercises
6.1. Is every quasi-concave function concave? If so, prove it; otherwise
provide a counterexample.
Ans. Not every quasi-concave function is concave. Counterexample:
Define the function f (x) = x2 with dom(f ) = R+ (the set of positive real
numbers). We must show that (i) this function f is quasi-concave, and (ii) f
is not concave. To see (i), note that f is a strictly increasing function on R+ .
Thus, if f (x′ ) ≥ f (x) for x, x′ ∈ dom(f ), then x′ ≥ x, and therefore, for any
t ∈ [0, 1], we have tx′ +(1−t)x ≥ x. Hence, f is quasi-concave. To see (ii), note
that f(0) = 0, f(2) = 4, but f((1/2)·0 + (1/2)·2) = f(1) = 1 ≤ (1/2)f(0) + (1/2)f(2) = 2,
which cannot be true if f is a concave function. Note that f (x) = x2 defined
on the entire real line would not be quasi-concave (why?).
6.2. Let f and g be real-valued concave functions with the same domain
D. Define a function h such that for all x ∈ D, h(x) = f (x) + g(x). Is h a
concave function?
Solution. Since f and g are both concave functions with domain D, we
have for x, x′ ∈ D and for all t ∈ [0, 1],
log g(x), where both log f (x) and log g(x) are concave functions, by virtue of
the property that a concave function of a concave function is concave. Thus,
log h(x), being the sum of two concave functions, is concave. Hence, h(x) is
a monotone transformation of a concave function, and therefore, it is quasi-
concave. On the other hand, if one of the functions f and g is negative valued,
then log h(x) is not well-defined, and so the above argument fails. In fact, h(x)
is, in general, not quasi-concave even if both f and g are concave. For example,
let D be the set of all real numbers R, and f(x) = −1 and g(x) = −x² for all
x ∈ R, whence h(x) = f(x)g(x) = x². Note that although both of the functions
f and g are concave (f″(x) = 0 and g″(x) < 0), the function h(x) is not
quasi-concave on R, because h(1) = 1 = h(−1), but h((1/2)·1 + (1/2)·(−1)) =
h(0) = 0 < h(1).
Now,
Then Lx = yz 2 − λ = 0, Ly = xz 2 − λ = 0, Lz = 2xyz − λ = 0, Lλ =
20 − x − y − z = 0. To solve these equations simultaneously, we equate λ from
the first two equations, and from the first and the third equation, giving:
yz 2 = xz 2 , yz 2 = 2xyz,
|H̄3| = |H̄| = − | 1  z²   2yz |   | 1  0    2yz |   | 1  0    z²  |
                | 1  0    2xz | + | 1  z²   2xz | − | 1  z²   0   |
                | 1  2xz  2xy |   | 1  2yz  2xy |   | 1  2yz  2xz |

  = −[1(0 − 2xz · 2xz) − z²(2xy − 2xz) + 2yz(2xz − 0)]
    + [1(z² · 2xy − 2xz · 2yz) − 0 + 2yz(2yz − z²)]
    − [1(z² · 2xz − 0) − 0 + z²(2yz − z²)]
  = z⁴ − 4xz³ − 4yz³ − 4xyz² + 4x²z² + 4y²z².
Thus, |H̄3| at (5, 5, 10) equals −20000 < 0. Hence, |H̄2| > 0 and |H̄3| < 0
imply that |H̄| is negative definite, and the function f is maximized at the
critical values.
6.6. Minimize the total costs defined by c = 15x² + 30xy + 30y² when the
firm meets the quota g(x, y) given by 2x + 3y = 60. Define

|H̄| = | 30 30 2 |
      | 30 60 3 |
      |  2  3 0 | .

The second principal minor is |H̄2| = −150 < 0. Thus, |H̄| is positive definite
and L is minimized when x = y = 12.
L = Q1 Q2 + λ(60 − Q1 − 3Q2).

|H| = | 0 1 |
      | 1 0 | = −1 < 0.

L = Q1 Q2 + λ(61 − Q1 − 4Q2),
for all x ∈ S;
(ii) if f is quasi-convex, then |B|k (x) ≤ 0 for all x ∈ S, k = 1, 2, . . . , n;
(iii) if for all x ∈ S,
|B|1(x) < 0, |B|2(x) > 0, . . . , |B|n(x) < 0 if n is odd,
|B|1(x) < 0, |B|2(x) > 0, . . . , |B|n(x) > 0 if n is even,

x, x′ ∈ S, and f(x) ≥ f(x′) =⇒ ∑_{j=1}^{n} f′j(x′)(xj − x′j) ≥ 0.
6.11. Prove that all extrema (critical points) of a concave function are
global maxima. Hint. The definition of concavity f(tx + (1 − t)y) ≥
tf(x) + (1 − t)f(y) can be written as f(y + t(x − y)) ≥ f(y) + t[f(x) − f(y)],
Uα(g(f)) = {x : g(f(x)) ≥ α} = {x : g(f(x)) ≥ g(α′)} = {x : f(x) ≥ α′} = Uα′(f),
so x^{2/3}y^{2/3} = g(f(x, y)), where f(x, y) = x^{1/3}y^{1/3} and g(z) = z²,
a monotonic transformation.
6.13. Any CES utility function u(x, y) = (ax^r + by^r)^{1/r}, 0 < r < 1, is
quasi-concave, since u(x, y) = g(h(x, y)), where h(x, y) = ax^r + by^r is a
concave function, because it is a positive linear combination of concave
functions, and g(z) = z^{1/r} is a monotonic transformation.
for all x, x′ ∈ dom(f ) and all t ∈ [0, 1]. The inequality (7.1.2) is the defining
property of a quasi-convex function, with an additional property that the neg-
ative of a (quasi-) convex function is a (quasi-) concave function. Since ‘quasi’
means ‘as if’, we expect quasi-convex functions to have some special proper-
ties similar to those for convex functions (and similarly in the case of quasi-
concave functions). Moreover, since every convex function is quasi-convex,
we expect the convex functions to be more highly structured. Although De
Finetti [1949] was the first person to recognize some of these characteristics of
functions having convex level sets, it was Fenchel [1983] who was the pioneer
in formalizing, naming, and developing the class of quasi-convex functions.
Later, Slater [1950] generalized the KKT saddle-point equivalence theorem,
and Arrow and Enthoven [1961] laid the foundation of quasi-convex program-
ming with applications to consumer demand.
A strictly quasi-convex function need not be quasi-convex. An example is:

f(x) = { 1 if x = 0,
         0 if x ≠ 0.

The lower-level set Lα = {x : f(x) ≤ 0} for α = 0 is not convex, but f is
strictly quasi-convex.
Note that it is not proper to define strict quasi-convexity by requiring that
the lower contour sets should be strictly convex, because a lower contour set
can be strictly convex even when f has some flat portions. Further, a function
is strictly quasi-convex iff −f is strictly quasi-concave; and a strictly quasi-
convex function is quasi-convex. The lower-level set Lα and quasi-convex
functions are presented in Figure 7.1. Also, f is said to be quasi-linear if it is
quasi-convex and quasi-concave.
The inequality (7.1.1) became the defining property of a convex and quasi-
convex function, with an additional property that the negative of a (quasi-)
convex function is a (quasi-) concave function. Thus, we will find that quasi-
convex functions have some special properties similar to those for convex
functions (and similarly in the case of quasi-concave functions).
convex function is quasi-convex. This is similar to the case that every concave
function is quasi-concave. The following result relates to both quasi-concave
and quasi-convex functions.
Theorem 7.1. Let F be a function defined on Rn, and g be a function
defined on R. If F is quasi-concave and g is decreasing, then the function
f(x) = g(F(x)) is quasi-convex for all x.
Theorem 7.2. A function f defined on a convex set S ⊆ Rn is quasi-convex
iff for all x, x′ ∈ S such that f(x) ≥ f(x′), we have for t ∈ [0, 1],

Figure 7.2 f(x, y) = √(x² + y²).
Recall that the sum of two convex functions is a convex function. However,
the sum of two quasi-convex functions is not necessarily a quasi-convex
function, i.e., if f and g are quasi-convex, then (f + g)(x) = f(x) + g(x) need
f(y) ≤ f(x) =⇒ ∂f(x)/∂x · (y − x) ≤ 0.   (7.2.2)
convex iff for all x, x′ ∈ S such that f(x) ≥ f(x′), we have for t ∈ [0, 1],
Then for any α, the set Lα is either empty, in which case it is convex, or
consists of a single point, in which case it is convex, or contains two or
more points, in which case we choose x, x′ ∈ Lα with f(x) ≤ f(x′). Then
f(tx + (1 − t)x′) ≤ f(x′) ≤ α for all t ∈ [0, 1], because x′ ∈ Lα. Hence,
tx + (1 − t)x′ ∈ Lα, so that Lα is convex, and f is quasi-convex.
|B| = | 0    f1   f2   · · ·  fn  |
      | f1   f11  f12  · · ·  f1n |
      | f2   f21  f22  · · ·  f2n |       (7.3.1)
      | · · ·                 · · ·|
      | fn   fn1  fn2  · · ·  fnn | ,

where fi are the first-order derivatives of f, and fij are the second-order
derivatives of f. The leading principal minors of |B| are

|B1| = | 0   f1  | ,    |B2| = | 0   f1   f2  | ,    . . . ,    |Bn| = |B|.   (7.3.2)
       | f1  f11 |             | f1  f11  f12 |
                               | f2  f21  f22 |
Since f(x) = max{f(x), f(y)}, the inequality (7.3.3) shows that f is quasi-
convex. Similarly, since f(x) = min{f(x), f(y)}, the inequality (7.3.4) shows
that f is quasi-concave.
fx = 4x − 5 − y − 2z = 0, fy = −x + 6y − 4 + 3z = 0, fz = 3y + 8z + 3 − 2x = 0,
which, by using Cramer’s rule gives |A| = 136, |A1 | = 176, |A2 | = 216, |A3 | =
−76, giving the critical point (x∗ , y ∗ , z ∗ ) approximately as (1.29, 1.59, −0.56).
Next, the second-order partial derivatives are:
the function f is minimized at the critical point (or, the critical point is the
minimizer of f ).
Example 7.12. Consider f(x, y) = 5x³ − 15xy + 5y³. Equating the first-
order derivatives to zero, we get fx = 15x² − 15y = 0, fy = −15x + 15y² = 0.
Solving these two equations, we get the critical points as (0, 0) and (1, 1).
Next, the second-order derivatives are fxx = 30x, fxy = −15 = fyx , fyy =
30y. At the point (0, 0), we find that fxx = 0 = fyy, fxy = fyx = −15. Notice
that fxx and fyy have the same sign, and fxx fyy = 0 < (fxy)² = (−15)²;
hence, the function has a point of inflection at (0, 0). Next, at the point
(1, 1), we have fxx = 30 > 0, fxy = −15 = fyx , fyy = 30 > 0, and fxx fyy =
900 > (fxy )2 = (−15)2 = 225; hence, the function has a relative minimum at
(1, 1).
|H̄| = | 50  50  2 |
      | 50 100  3 |
      |  2   3  0 | .
Then |H̄2 | = |H̄| = −250 < 0; thus, the bordered Hessian is positive definite
(PD), and the costs c(x, y) are minimized at the critical values.
5λK^{0.6}(0.2)L^{−0.8}, Cλ = 400 − 5K^{0.6}L^{0.2}. The first two equations
yield

20/10 = (3λK^{−0.4}L^{0.2})/(λK^{0.6}L^{−0.8}),

or K = 2L/3, which when substituted in the third equation gives K ≈ 379,
L ≈ 569. Next, the second-order partial derivatives of C are:
CKK = 1.2λK^{−1.4}L^{0.2}, CLL = 0.8λK^{0.6}L^{−1.8},
CKL = −0.6λK^{−0.4}L^{−0.8} = CLK. Thus, the bordered Hessian is
Then
since K, L, λ > 0. Hence, the bordered Hessian is positive definite (PD), and
the cost c(x, y) is minimized at the critical values.
7.4.3 Inequality Constraints. Minimize a quasi-convex function f(x)
subject to gi(x) ≤ 0, i = 1, . . . , m, and Ax = b, where the inequality
constraints gi are convex. If the objective function f is differentiable, the
first-order condition for quasi-convexity implies that x is optimal if
∇f(x)^T (y − x) > 0 for all y ≠ x. This condition is only sufficient for
optimality, and it requires that ∇f ≠ 0. Note that ∇f = 0 holds in the convex
case for x to be optimal. Figure 7.5 shows that the simple optimality condition
f′(x) = 0, valid for convex functions, does not hold for quasi-convex functions.
s ≥ t. Let x∗ denote the optimal point for the quasi-convex problem. If the
feasibility problem is:
Find x subject to φt(x) ≤ 0, gi(x) ≤ 0, i = 1, . . . , m, Ax = b,   (7.4.1)

is feasible, then we have x∗ ≤ t. Conversely, if this problem is infeasible, then
we can conclude that x∗ ≥ t. The problem (7.4.1) is known as the convex
feasibility problem, since the inequality constraints are all convex functions
and the equality constraints are linear.
This leads to a simple algorithm for solving quasi-convex optimization prob-
lem by using bisection that solves the convex feasibility problem at each step.
Assuming that the problem is feasible, start with the interval [l, r] that con-
tains the optimal value x∗ . Then solve the convex feasibility problem at the
mid-point t = (l + r)/2, by determining the half interval that contains the
optimal value, and continue halving the half-interval each time until the width
of the interval that contains the optimal value is small enough, say ε > 0. This
is known as the bisection method. Note that the length of the interval after k
iterations is 2^{−k}(r − l), which means that exactly ⌈log₂((r − l)/ε)⌉
iterations will be required before the algorithm terminates.
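The loop is simple enough to state in a few lines. The sketch below (an added
illustration) implements the generic scheme with the convex feasibility
subproblem abstracted as an oracle; the toy oracle shown (a one-dimensional
grid search) is only a stand-in for a proper convex feasibility solver.

# Sketch of the bisection scheme: each step asks whether the level set
# {x feasible : f(x) <= t} is nonempty, then halves the bracket [l, r].
import math

def bisect_quasiconvex(feasible, l, r, eps=1e-6):
    """feasible(t): True iff problem (7.4.1) at level t has a solution."""
    while r - l > eps:
        t = 0.5*(l + r)
        if feasible(t):
            r = t          # optimal value <= t
        else:
            l = t          # optimal value >= t
    return 0.5*(l + r)

# Toy stand-in oracle: f(x) = sqrt(|x - 2|) is quasi-convex on [0, 10].
f = lambda x: math.sqrt(abs(x - 2.0))
feasible = lambda t: any(f(0.001*i) <= t for i in range(10001))
print(bisect_quasiconvex(feasible, 0.0, f(10.0)))   # ~0, attained at x = 2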
Fx = −1/(1 + x) − λ1 − λ2 = 0, Fy = 2y + 2λ1 − λ3 = 0,
Fλ1 = 3 − x + 2y = 0, Fλ2 = −x = 0, Fλ3 = −y = 0.
where x∗ can take the values ±∞. The set of all optimal points is denoted by
Example 7.17. Martos [1969, 1971] has shown that there is a class of
quasi-convex functions which need not be convex, namely, the quadratic
function f(x) = (1/2)x^T Q x + c^T x, where Q is a symmetric matrix. If
X = Rn, then f(x) is convex iff f(x) is quasi-convex; in that case Q is
positive semidefinite.
However, if X = Rn+ , the function f (x) may be quasi-convex, yet not be con-
vex. For example, f (x1 , x2 ) = −x1 x2 is quasi-convex over R2+ , but it is not
convex there.
Example 7.18. One can combine convex, concave and linear functions to
form quasi-convex functions. For example, let f and g be defined on a convex
7.5 Summary
Let S be a nonempty convex set in Rn , and let f : S 7→ R. Then there are
the following types of convexity and quasi-convexity at a point.
1. Convexity at a point. The function f is convex at x ∈ S if f (tx + (1 −
t)x′ ) ≤ tf (x) + (1 − t)f (x′ ) for each t ∈ (0, 1) and each x′ ∈ S.
2. Strict convexity at a point. The function f is strictly convex at
x ∈ S if f(tx + (1 − t)x′) < tf(x) + (1 − t)f(x′) for each t ∈ (0, 1) and each
x′ ∈ S, x′ ≠ x.
3. Quasi-convexity at a point. The function f is quasi-convex at x ∈ S
if f (tx + (1 − t)x′ ) ≤ max{f (x), f (x′ )} for each t ∈ (0, 1) and each x′ ∈ S.
4. Strict quasi-convexity at a point. The function f is strictly quasi-
convex at x ∈ S if f(tx + (1 − t)x′) < max{f(x), f(x′)} for each t ∈ (0, 1)
and each x′ ∈ S, f(x′) ≠ f(x).
7.6 Exercises
7.1. Let f : R → R be convex, and a, b ∈ dom(f), a < b. (a) Show that

f(x) ≤ ((b − x)/(b − a)) f(a) + ((x − a)/(b − a)) f(b) for all x ∈ [a, b].

(c) Suppose f is differentiable. Show that

f′(a) ≤ (f(b) − f(a))/(b − a) ≤ f′(b).
(d) Suppose f is twice differentiable. Use the result in (c) to show that
f ′′ (a) ≥ 0 and f ′′ (b) ≥ 0.
Hint. The first three inequalities follow from the definition of a convex
function: Suppose f is differentiable. Then f is convex iff dom(f ) is a convex
set and f (y) ≥ f (x)+f ′ (x)(y−x) for all x, y ∈ dom(f ), which is the first-order
Taylor’s series approximation at x. Part (d) is obvious.
7.2. Suppose f : Rn 7→ R is convex with dom(f ) = Rn , and bounded
above on Rn . Show that f is constant.
7.3. Prove that a convex function is quasi-convex. Proof. Let the function
f have the domain S (a convex set). Let a be a real number and x, y be
points in the lower-level set La. First, we show that the set La is convex.
For this, we need to show that for every t ∈ [0, 1] we have tx + (1 − t)y ∈ La.
Since S, on which f is defined, is convex, we have tx + (1 − t)y ∈ S, and thus
f is defined at the point tx + (1 − t)y. Now, convexity of f implies that
f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y). Moreover, the fact that x ∈ La means
that f(x) ≤ a, and similarly, y ∈ La means that f(y) ≤ a. Hence,
tf(x) + (1 − t)f(y) ≤ ta + (1 − t)a = a. Combining these two inequalities we
get f(tx + (1 − t)y) ≤ a, so that tx + (1 − t)y ∈ La. Thus, every lower-level
set is convex and hence, f is quasi-convex.
Note that quasi-convexity is weaker than convexity, in the sense that every
convex function is quasi-convex.
7.4. Prove that g(t) = f (tx + (1 − t)y) is quasi-convex for t ∈ [0, 1] and for
any x, y ∈ Rn iff f is quasi-convex. Hint. Use definition (7.1.1).
7.5. Prove that
|H̄|1 (x) ≥ 0, |H̄|2 (x) ≤ 0, . . . , |H̄|n (x) ≥ 0 if n is odd,
|H̄|1 (x) ≥ 0, |H̄|2 (x) ≤ 0, . . . , |H̄|n (x) ≤ 0 if n is even,
|H̄|1 (x) > 0, |H̄|2 (x) < 0, . . . , |H̄|n (x) > 0 if n is odd,
|H̄|1 (x) > 0, |H̄|2 (x) < 0, . . . , |H̄|n (x) < 0 if n is even,
7.13. Plot the graphs of the following functions and check whether they
are quasi-concave, quasi-convex, both, or neither: (a) f (x) = x3 − 2x; (b)
f (x, y) = 6x − 9y; (c) f (x, y) = y − ln x.
7.14. Verify that the cubic function f (x) = ax3 + bx2 + cx + d is in general
neither quasi-concave nor quasi-convex.
7.15. Use definitions (6.1.3), (6.1.4), and (7.1.3) to check whether f (x) =
x2 , x ≥ 0 is quasi-concave or quasi-convex. (See Example 6.1).
sented in Figures 7.6(a), (b), and (c); the graphs do not exist for x < 0.

Figure 7.6 (a) y = √x, (b) y = −√x.
7.18. Use the bordered Hessian to check whether the following functions
are quasi-concave or quasi-convex: (a) f(x, y) = −x² − y², x, y > 0; (b)
f(x, y) = −(x + 1)² − (y + 2)², x, y > 0.
7.21. Let f and g be real-valued convex functions with the same domain
D. Define a function h so that h(x) = f (x)g(x) for all x ∈ D. Show that
8.1 Definitions
A nonnegative function f : Rn → R is said to be log-concave (or logarithmi-
cally concave) if its domain is a convex set, and f satisfies the inequality

f(tx + (1 − t)x′) ≥ f(x)^t f(x′)^{1−t}

for all x, x′ ∈ dom(f) and 0 < t < 1. If f is strictly positive, then the
logarithm of the function f, i.e., log f, is concave:

log f(tx + (1 − t)x′) ≥ t log f(x) + (1 − t) log f(x′)

for all x, x′ ∈ dom(f) and 0 < t < 1. Hence, f is log-convex if log f is convex.
Example 8.1. (i) The exponential function f(x) = e^{ax}; (ii) f(x) = x^a,
a ≥ 0; and (iii) f(x) = e^x/(1 + e^x) (known as the inverse logit function),¹
are log-concave functions.
¹ Note that the logit function is used in statistics to determine the log-odds,
i.e., the logarithm of the odds p/(1 − p), where p denotes a probability.
where [H] is the Hessian matrix of f (§1.6.2). In other words, condition (8.1.4)
states that the difference between the left-hand side and the right-hand side
expressions is negative semidefinite (NSD). In the case of functions f in R
condition (8.1.4) simplifies to
Since f(x) > 0, condition (8.1.4) can also be written using the Schur
complement (see Gill et al. [1990]). However, condition (8.1.4) is often
written incorrectly as

f(x)∇²f(x) ⪯ ∇f(x)∇f(x)^T,
is log-concave.
(iv) The Laplace transform of a nonnegative convex function is log-concave.
(v) If the random variable has a log-concave p.d.f., then the c.d.f. is a
log-concave function.
(vi) If two independent random variables have log-concave p.d.f.s, then
their sum has a log-concave p.d.f.
8.2 Theorems
Consider a real-valued function f which is log-concave on some interval I if
f (x) ≥ 0 and x 7→ log f (x) is an extended real-valued concave function, where
we have log 0 = −∞.
Theorem 8.1. (Marshall and Olkin [1979; 16B.3.a]) A sufficient condition
for a nonnegative function f to be log-concave on I is given by
In the particular case when F (x, y) = f (y − x), first note that F is (TP)2
on A × B iff f is nonnegative and
for all E ∈ Σ, and (ii) µ(∅) = 0. The Lebesgue measure on R is a complete translation-
invariant measure of a σ-algebra containing the intervals in R such that µ[0, 1] = 1. A
σ-finite measure is one for which the whole space is a countable union of
measurable sets of finite measure; the real numbers with standard Lebesgue
measure are σ-finite but not finite.
Suppose g″/g′ is monotone decreasing. Since x > ξ, we have
g″(ξ)/g′(ξ) > g″(x)/g′(x). Then it follows from (8.2.5) that
Since g is strictly monotone and g(a) = 0, then g(x) is of the same sign
as g ′ (x) for all x ∈ (a, b). Therefore multiplying both sides of (8.2.6) by
g(x)g ′ (x) preserves the direction of inequality, and yields g ′ (x)2 − g ′ (x)g ′ (a) >
g(x)g ′′ (x), and hence g(x)g ′′ (x) − g ′ (x)2 < −g ′ (x)g ′ (a) < 0. Thus,
Since g(x) is monotone and g(b) = 0, we must have g ′ (x)g(x) < 0 for x < b.
Multiplying both sides of (8.2.8) by g(x)g ′ (x), we get g ′ (x)2 − g ′ (x)f ′ (x) >
g(x)g ′′ (x). As before, this inequality implies inequality (8.2.7), which in view
of Remark 1 establishes the log-concavity of g.
According to Schoenberg [1951], any log-concave function f on R is either
monotone or, if it is non-monotone, then f (x) → 0 as x → ±∞ at least
exponentially. Thus, if f and g are non-monotone and log-concave functions
on R, then their convolution is well defined on R.
An important theorem is as follows:
Theorem 8.5. (Lekkerkerker [1953]) Convolution of log-concave functions
defined on the interval [0, ∞) under some additional requirements, mentioned
in the next theorem, is log-concave.
Lekkerkerker’s proof is very long. A simpler proof of this result as published
by Merkle [1998a,b] is as follows.
Corollary 8.2. If the density function f is log-concave on (a, b), then the
failure rate is monotone increasing on (a, b). If the failure rate is monotone
increasing on (a, b), then R′ (x)/R(x) is monotone increasing.
Corollary 8.3. If the density function f is monotone increasing, then the
reliability function F̄ is log-concave.
Proof. Since F̄ is a reliability function, then it must be monotone decreas-
ing. Thus, if f is monotone increasing, the failure rate f /F̄ must be monotone
increasing. But increasing failure rate is equivalent to a log-concave reliability
function.
8.2.4 Mean Residual Lifetime. The mean residual lifetime function mrl(x)
evaluated at x is the expected length of remaining life for a machine of age
x; it is defined as mrl(x) = ∫ₓᵇ t f(t) dt / F̄(x) − x. If this function is
monotone decreasing, then a machine will age with the passage of time, in the
sense that its expected remaining lifetime will diminish as it gets older.
Theorem 8.9. (Muth [1977]) Let the random variable X represent the
length of life. The sufficient condition for mean residual lifetime mrl(x) to be
a monotone decreasing function is either the p.d.f. f (x) is log-concave, or the
failure rate r(x) is a monotone increasing function.
Proof. Integrating mrl(x) by parts, we get

mrl(x) = ∫ₓᵇ F̄(t) dt / F̄(x).

Since R(x) = ∫ₓᵇ F̄(t) dt, we have mrl(x) = R(x)/R′(x), so mrl(x) is a de-
creasing function iff R(x) is log-convex. By Theorem 8.8(ii), R(x) will be
log-convex if r(x) is an increasing function, thereby proving the sufficiency
of condition (ii). By Theorem 8.8(i), log-concavity of f (x) implies that r(x)
is monotone increasing, which implies that mrl(x) is monotone decreasing,
which proves the sufficiency of condition (i).
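A numerical illustration (added here; not in the original) for one concrete
log-concave density, the standard normal: both the increasing failure rate
and the decreasing mean residual lifetime can be observed directly.

# Sketch: for the standard normal (log-concave) density, the failure rate
# increases and the mean residual lifetime decreases, as Theorem 8.9 asserts.
import numpy as np
from scipy import stats
from scipy.integrate import quad

pdf, sf = stats.norm.pdf, stats.norm.sf        # sf(x) = Fbar(x)

def failure_rate(x):
    return pdf(x) / sf(x)

def mrl(x, b=10.0):                            # mrl(x) = int_x^b Fbar / Fbar(x)
    return quad(lambda t: sf(t), x, b)[0] / sf(x)

xs = np.linspace(-2.0, 2.0, 9)
print(np.all(np.diff([failure_rate(x) for x in xs]) > 0))   # True
print(np.all(np.diff([mrl(x) for x in xs]) < 0))            # True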
Example 8.2. Let f(x) = e^{−|x|²/2}, u(x) = |x|²/2. Then

f(x) = χK(x) = { 1 if x ∈ K,
                 0 if x ∉ K,
                                                  (8.3.1)
u(x) = IK(x) = { 0 if x ∈ K,
                 ∞ if x ∉ K,
(s ⊙ χK ) ⊕ (t ⊙ χL ) = χsK+tL , (8.3.3)
where w = αu + βv. Here ⊕ and ⊙ are linear, in the usual sense, with respect
to the conjugates of the exponents (with reverse sign).
8.3.4 Integral Functional. For a log-concave function f that verifies the
decay condition at infinity, let
Z
I(f ) = f (x) dx ∈ [0, ∞). (8.3.6)
Rn
Note that if f = χK (K a convex set), then I(f ) = V (K), where V (K) is the
volume of K.
We will now study the limit

δI(f, g) = lim_{ε→0⁺} [I(f ⊕ ε ⊙ g) − I(f)] / ε,   (8.3.7)
Equality holds iff there exists x0 such that g(x) = f (x− x0 ) for every x ∈ Rn .
This means that the functional log(I) is concave in the class of log-concave
functions, equipped with the linear structure given by the operations ⊕ and
⊙ defined by (8.3.2). Note that for f = χK and g = χL (K and L convex
sets), we get the Brunn-Minkowski inequality in its multiplicative form:
Note that (i) the assumption I(f ) > 0 can be removed in dimension n = 1.
2
(ii) Choose f (x) = e−|x| /2 and g = e−|x| in dimension n = 1; then
δI(f, g) = +∞.
(iii) For a suitable choice of f and g, δI(f, g) < 0, in contrast with the
idea that δI(f, g) is a mixed integral of f and g (mixed volumes are always
nonnegative).
(iv) Choose g = f ; then we have the formula δI(f, f ) = nI(f ) − E(f ) (no
homogeneity here!), where
E(f) = − ∫_{Rn} f log(f) dx   (8.3.11)
is the entropy of f .
8.3.5 Area Measure of Log-Concave Functions. Comparing the formu-
las

lim_{ε→0⁺} [V(K + εL) − V(K)] / ε = ∫_{S^{n−1}} hL dσK,   (8.3.12)

and

lim_{ε→0⁺} [I(e^{−u} ⊕ ε ⊙ e^{−v}) − I(e^{−u})] / ε = ∫_{Rn} v∗ dμf,   (8.3.13)
Example 8.3. (a) Let Geom(p) denote the geometric distribution with
the probability mass function PX(i) = (1 − p)pⁱ for i ∈ Z₊. For each p,
these random variables represent the 'edge case' of the log-concavity property,
in that Eq (8.4.2) holds with equality for all i.
(b) The Poisson distribution P (λ) is log-concave for any λ ≥ 0.
(c) Any binomial distribution is log-concave.
Definition 8.2 can be generalized as follows: Given sequences PV(v) and
PV+W(x), there exists a two-dimensional array of coefficients PW|V(·|·) such
that

PV+W(x) = ∑_v PV(v) PW|V(x − v | v).   (8.4.3)

In fact, the sequence PW|V acts like a conditional probability without
requiring the sequence to sum to 1.
Example 8.4. For some p ∈ (0, 1) and α ∈ (0, 1), define the joint distri-
bution of V and W by
P(V = i, W = j) = C(i + j, i) (1 − p)p^{i+j} αⁱ(1 − α)ʲ, for i, j ≥ 0,   (8.4.4)

where C(n, k) denotes the binomial coefficient.
a^{(i)}_{r,s} = PW|V(i − r | r) PW|V(i − s | s) − PW|V(i − r − 1 | r) PW|V(i − s + 1 | s).   (8.4.6)
Then we have
Condition B. For the quantities a^{(i)}_{r,s} defined by (8.4.6), the following
two conditions must hold for all 0 < t ≤ m ≤ i:

(a) ∑_{k=−t}^{t} a^{(i)}_{m+k,m−k} ≥ 0, and (b) ∑_{k=−t−1}^{t} a^{(i)}_{m+k+1,m−k} ≥ 0.   (8.4.7)
(8.4.10)
where

S1 = ∑_{k=0}^{m} [ PV(m + k)PV(m − k) a^{(i)}_{m+k,m−k}
                   + PV(m + k + 1)PV(m − k) a^{(i)}_{m−k,m+k+1} ],   (8.4.12)

S2 = ∑_{k=0}^{m} PV(m)PV(m + 1) a^{(i)}_{m,m+1},   (8.4.13)

S3 = ∑_{i≥j≥k} [ PV(j)PV(k) a^{(i)}_{j,k} + PV(j + 1)PV(k − 1) a^{(i)}_{k−1,j+1} ].   (8.4.14)
where c0 = a^{(i)}_{m,m} and ck = a^{(i)}_{m+k,m−k} for 1 ≤ k ≤ m. Then
condition B(a) tells us that ∑_{k=0}^{t} ck ≥ 0 for all 0 ≤ t ≤ m, and so by
Lemma 8.1 with k = m, i = m, Eq (8.4.12) is positive. In the same way we can
show that the sum of the second and third terms in (8.4.11) equals

∑_{k=0}^{m} PV(m + k + 1)PV(m − k) dk,   (8.4.16)

where dk = a^{(i)}_{m+k+1,m−k} + a^{(i)}_{m−k,m+k+1} for 0 ≤ k ≤ m. Then
condition B(b) tells us that ∑_{k=0}^{t} dk ≥ 0 for all 0 ≤ t ≤ m, and so, by
Lemma 8.1 with k = m + 1, i = m, Eq (8.4.13) is positive. Hence,
PV+W(i)² − PV+W(i − 1)PV+W(i + 1) ≥ 0 for all i. Other cases are similarly
resolved.
Thus, we have established that the sum of any two independent and iden-
tically distributed geometric random variables (both on the edge case) is a
negative binomial distribution (still, log-concave, but no longer the edge case).
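This conclusion is easy to test numerically; the sketch below (an added
illustration) convolves a truncated Geom(p) mass function with itself and
checks the log-concavity inequality P(i)² ≥ P(i − 1)P(i + 1).

# Sketch: the sum of two i.i.d. Geom(p) variables is log-concave.
import numpy as np

p, n = 0.6, 60
P = (1 - p) * p**np.arange(n)        # P(V = i) = (1 - p) p^i, truncated
Psum = np.convolve(P, P)[:n]         # exact pmf of V + W for i < n
print(np.all(Psum[1:-1]**2 >= Psum[:-2] * Psum[2:]))   # True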
Next, the quantities a^{(i)}_{j,k} have the following properties for independent
random variables V and W: (i) a^{(i)}_{j,j+1} ≡ 0 for all j;
(ii) a^{(i)}_{k−1,j+1} = −a^{(i)}_{j,k}; and (iii) if W is log-concave, then
a^{(i)}_{j,k} ≥ 0 for j ≥ k.
We fix i, define cj as in Lemma 8.1, and define dj as a sequence such that
∑_{j=0}^{t} dj ≥ 0 for all 0 ≤ t ≤ m; if V and W are independent and
log-concave, then

a^{(i)}_{m,m} + ∑_{j=1}^{t} a^{(i)}_{m+j,m−j} − ∑_{j=1}^{t} a^{(i)}_{m+j−1,m−j+1}
  = a^{(i)}_{m,m} + ∑_{j=1}^{t} a^{(i)}_{m+j,m−j} − ∑_{j=0}^{t−1} a^{(i)}_{m+j,m−j} = a^{(i)}_{m+t,m−t} ≥ 0,   (8.4.17)

and

∑_{j=0}^{t} [ a^{(i)}_{m+j+1,m−j} − a^{(i)}_{m+j,m−j+1} ]
  = a^{(i)}_{m+t+1,m−t} − a^{(i)}_{m,m+1} = a^{(i)}_{m+t+1,m−t} ≥ 0,   (8.4.18)

since a^{(i)}_{m,m+1} ≡ 0 by property (i).
where each factor on the right side of (8.5.1) is a normal distribution truncated at 1 − x_{1,2}. Next, the maximum yield versus cost is evaluated as follows: if the manufacturing cost is c = x1 + 2x2, then the maximum yield for a given cost is
Y_max(c) = sup_{x1+2x2=c, x1,x2≥0} Y(x). (8.5.2)
The relation between the cost c and the maximum yield Y_max is presented in Figure 8.2, where the cost rises as the yield increases.
Let a density function f on R be defined as f(x) = e^{φ(x)}, where φ is concave (and so −φ is convex). Let us call the class of all such densities f on R the class of log-concave densities and denote this class by P0 ≡ P_{log-concave}. A function f ∈ P0 is log-concave iff
(i) log f(tx + (1−t)y) ≥ t log f(x) + (1−t) log f(y) for all t ∈ [0, 1] and for all x, y ∈ R;
(ii) f(tx + (1−t)y) ≥ f(x)^t · f(y)^{1−t};
(iii) f((x+y)/2) ≥ √(f(x)f(y)) (for t = 1/2, and assuming f is measurable);
(iv) f((x+y)/2)² ≥ f(x)f(y).
Example 8.8. 1. Standard normal distribution: f(x) = (1/√(2π)) e^{−x²/2}; then −log f(x) = x²/2 + log √(2π), and (−log f(x))″ = 1.
2. Laplace distribution: f(x) = ½ e^{−|x|}; then −log f(x) = |x| + log 2, and (−log f(x))″ = 0 for all x ≠ 0.
3. Logistic distribution: f(x) = e^x/(1 + e^x)²; then −log f(x) = −x + 2 log(1 + e^x), and (−log f(x))″ = e^x/(1 + e^x)² = f(x).
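These three computations can be cross-checked numerically. Below is a minimal sketch (an illustration, not part of the text): f is log-concave iff log f is concave, i.e., iff its second differences on a grid are nonpositive; the grid and tolerance are assumptions.

```python
import numpy as np

def is_log_concave(f, xs):
    logf = np.log(f(xs))
    second_diff = logf[:-2] - 2 * logf[1:-1] + logf[2:]
    return np.all(second_diff <= 1e-9)   # small tolerance for rounding

xs = np.linspace(-5, 5, 1001)
normal   = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
laplace  = lambda x: 0.5 * np.exp(-np.abs(x))
logistic = lambda x: np.exp(x) / (1 + np.exp(x))**2

for name, f in [("normal", normal), ("Laplace", laplace), ("logistic", logistic)]:
    print(name, is_log_concave(f, xs))   # all three print True
```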
8.6 Exercises
8.1. Prove that if P1 and P2 are log-concave probability measures, then
the product measure P1 × P2 is a log-concave probability measure.
Proof. If a probability measure P in R^n assigns zero mass to every hyperplane in R^n, then by (8.1.1), log-concavity of P holds if P(tx + (1−t)y) ≥ P(x)^t P(y)^{1−t}, 0 < t < 1, for all x, y ∈ R^n. Let A, B denote two rectangular boxes with sides parallel to the coordinate axes such that x ∈ A and y ∈ B. Then, by the above inequality for these boxes, we have P(tA + (1−t)B) ≥ P(A)^t P(B)^{1−t} for 0 < t < 1. A similar argument applies to the product P1 × P2.
8.2. Show that Condition B holds in Example 8.5.
Solution. For any i, a^{(i)}_{r,s} is proportional to \binom{i}{r}\binom{i}{s} − \binom{i−1}{r}\binom{i+1}{s}; i.e., for part (a) of Condition B, for any given i, the increment term a^{(i)}_{m+t,m−t} + a^{(i)}_{m−t,m+t} is proportional to
2\binom{i}{m+t}\binom{i}{m−t} − \binom{i−1}{m+t}\binom{i+1}{m−t} − \binom{i+1}{m+t}\binom{i−1}{m−t},
which is positive for t ≤ T and negative for t > T for some value T > 0. Hence the partial sums Σ_{k=−t}^{t} a^{(i)}_{m+k,m−k} form a sequence which increases for t ≤ T and decreases thereafter. Then, using the Vandermonde identity Σ_j \binom{a}{j}\binom{b}{r−j} = \binom{a+b}{r}, we find that Σ_{k=−m}^{m} a^{(i)}_{m+k,m−k} = 0, and thus the sequence of partial sums must be nonnegative for any t. A similar argument holds for part (b) of Condition B.
8.3. Determine whether the density function (p.d.f.) of each of the following probability distributions is log-concave, log-convex, or log-linear: (i) uniform distribution; (ii) standard normal distribution; (iii) logistic distribution; (iv) extreme-value distribution; (v) exponential distribution; (vi) Weibull distribution; (vii) power function distribution; (viii) Gamma distribution; (ix) chi-square distribution; (x) chi-distribution; (xi) beta distribution; and (xii) Student's t-distribution.
Answer. (i) The uniform distribution, defined on the interval [0, 1], has density f(x) = 1, which is (weakly) log-concave.
(ii) The standard normal distribution has density f(x) = (1/√(2π)) e^{−x²/2}, whence (log f(x))′ = −x and (log f(x))″ = −1 < 0. Thus, the density is log-concave.
(iii) The logistic distribution has density f(x) = e^{−x}/(1 + e^{−x})², whence (log f(x))′ = −1 + 2(1 − F(x)) and (log f(x))″ = −2f(x) < 0; hence f(x) is log-concave.
(iv) The extreme-value distribution has density function f(x) = exp{−e^{−x}}, giving (log f(x))″ = −e^{−x} < 0; hence f(x) is log-concave.
(v) The exponential distribution has density f(x) = λe^{−λx}, with (log f(x))″ = 0 and f′(x) < 0 for all x ∈ [0, ∞); hence f(x) is log-linear, and therefore (weakly) log-concave.
(vi) The Weibull distribution with parameter c has density function f(x) = cx^{c−1} e^{−x^c}, x ∈ (0, ∞). Here
(log f(x))″ = (1 − c)x^{−2}(1 + cx^c), which is negative for c > 1, zero for c = 1, and positive for c < 1.
Thus, f(x) is (strictly) log-concave if c > 1, log-linear if c = 1, and log-convex if 0 < c < 1.
(vii) The power function distribution has density function f(x) = cx^{c−1}, x ∈ (0, 1). Here (log f(x))″ = (1 − c)x^{−2}, which again is negative for c > 1, zero for c = 1, and positive for c < 1. Thus, the density function is (strictly) log-concave if c > 1, log-linear if c = 1, and log-convex if 0 < c < 1.
(viii) The Gamma distribution has density function f(x) = x^{m−1} θ^m e^{−xθ}/Γ(m), x ∈ (0, ∞), θ > 0, m > 0. Then (log f(x))″ = (1 − m)/x². Thus, the density function is strictly log-concave for m > 1, but strictly log-convex for m < 1.
(ix) The chi-square distribution with n degrees of freedom is a gamma distribution with θ = 1/2 and m = n/2. Since the sum of the squares of n independent standard normal random variables has a chi-square distribution with n degrees of freedom, and since the gamma distribution has a log-concave density function for m ≥ 1, the sum of the squares of two or more independent standard normal variables has a log-concave density function.
(x) The chi-distribution has density function f(x) = x^{(n/2)−1} e^{−(n/2)x²}/(2^{n/2} Γ(n/2)), x > 0, where n is a positive integer. Since (log f(x))″ = −(n − 1)/x² − n < 0, the density function is log-concave.
(xi) The beta-distribution has density function f(x) = x^{a−1}(1 − x)^{b−1}/B(a, b), x ∈ (0, 1), a, b > 0. Since (log f(x))″ = (1 − a)/x² + (1 − b)/(1 − x)², the density function is log-concave if a ≥ 1 and b ≥ 1, and log-convex if a < 1 and b < 1. If a < 1 and b > 1, or if a > 1 and b < 1, then the density function is neither log-convex nor log-concave on (0, 1).
(xii) Student's t-distribution is defined on the entire real line with density function
f(x) = (1 + x²/n)^{−(n+1)/2}/(√n B(1/2, n/2)),
where B(a, b) is the beta function and n is the number of degrees of freedom. Since
(log f(x))″ = −(n + 1)(n − x²)/(n + x²)²,
the density function is log-concave on the central interval [−√n, √n] but log-convex on each of the outer intervals (−∞, −√n] and [√n, ∞). Thus, although this distribution is itself not log-concave, a truncated one on the interval [−√n, √n] is log-concave.
9
Quadratic Programming
(P): Minimize ½ xᵀQx + cᵀx subject to Ax ≥ b. (9.1.1)
L(x, λ) = f(x) + Σ_{j=1}^{m} λj gj(x) = ½ xᵀQx + cᵀx + λᵀ(b − Ax). (9.1.2)
The dual problem is defined as max_{λ≥0} min_x L(x, λ), that is,
max_{λ≥0} L(x, λ) subject to ∂L(x, λ)/∂x = 0. (9.1.3)
subject to L_x(x, λ) = Qx + c − Aᵀλ = 0, λ ≥ 0, where L_x denotes the vector of first-order partial derivatives of L with respect to x. This dual constraint implies that
xᵀQx + xᵀc − xᵀAᵀλ = xᵀQx + xᵀc − λᵀAx = 0. (9.1.5)
Also,
Qx + c − Aᵀλ = 0 ⟹ x + Q⁻¹c − Q⁻¹Aᵀλ = 0, i.e., x = Q⁻¹[Aᵀλ − c]. (9.1.8)
Thus, we may eliminate x altogether from the dual problem.
Given any two matrices U and V, we will use the following four known results: (i) [UV]ᵀ = VᵀUᵀ; (ii) [Uᵀ]ᵀ = U; (iii) UᵀV = VᵀU when the product is a scalar (assuming compatibility); and (iv) Q and Q⁻¹ are symmetric, hence identical to their transposes. Then, substituting the value of x from (9.1.8) into the dual objective function and rearranging, we get
L̂(λ) = bᵀλ − ½ [Aᵀλ − c]ᵀ · Q⁻¹ · Q · Q⁻¹ · [Aᵀλ − c]    by (i) and (iv)
= bᵀλ − ½ [λᵀAQ⁻¹ − cᵀQ⁻¹] · (Aᵀλ − c)    by (i) and (ii)
= bᵀλ − ½ [λᵀAQ⁻¹Aᵀλ + cᵀQ⁻¹c − cᵀQ⁻¹Aᵀλ − λᵀAQ⁻¹c]
= bᵀλ − ½ [λᵀAQ⁻¹Aᵀλ + cᵀQ⁻¹c − cᵀQ⁻¹Aᵀλ − (AQ⁻¹c)ᵀλ]    by (iii)
= bᵀλ − ½ [λᵀAQ⁻¹Aᵀλ + cᵀQ⁻¹c − cᵀQ⁻¹Aᵀλ − cᵀ(Q⁻¹)ᵀAᵀλ]    by (i)
= bᵀλ − ½ [λᵀAQ⁻¹Aᵀλ + cᵀQ⁻¹c − 2cᵀQ⁻¹Aᵀλ]    by (iv)
= [bᵀ + cᵀQ⁻¹Aᵀ]λ − ½ λᵀAQ⁻¹Aᵀλ − ½ cᵀQ⁻¹c
= uᵀλ + ½ λᵀvλ − ½ cᵀQ⁻¹c,
where
u = b + AQ⁻¹c, and v = −AQ⁻¹Aᵀ.
The dual problem therefore reduces to
max_{λ≥0} { uᵀλ + ½ λᵀvλ − ½ cᵀQ⁻¹c }. (9.1.9)
0 ≤ x ≤ 1 ⟹ { −x ≥ −1, x ≥ 0 }; 0 ≤ y ≤ 1 ⟹ { −y ≥ −1, y ≥ 0 },
that is,
Minimize ½ xᵀQx + cᵀx subject to Ax ≥ b,
with (rows separated by semicolons)
x = [x; y], Q = [1 0; 0 1], c = [−4; −4], A = [−1 0; 0 −1; 1 0; 0 1], b = [−1; −1; 0; 0].
∂L̂(λ)/∂λ = u + vλ = 0, (9.2.1)
which yields
∂L̂/∂λ = [∂L̂/∂λ1; ∂L̂/∂λ2; ∂L̂/∂λ3; ∂L̂/∂λ4] = u + vλ
= [1; 1; −4; −4] + [−1 0 1 0; 0 −1 0 1; 1 0 −1 0; 0 1 0 −1][λ1; λ2; λ3; λ4]
= [1 − λ1 + λ3; 1 − λ2 + λ4; −4 + λ1 − λ3; −4 + λ2 − λ4] = [0; 0; 0; 0], (9.2.2)
which must be solved to obtain the optimal λ*. This becomes simpler using the Hildreth-D'Esopo method, which is as follows. We start with (9.2.1).
First iteration. Let λ = [0 0 0 0]ᵀ. Solve ∂L̂/∂λ1 = 0, keeping λ2 = λ3 = λ4 = 0; we get −λ1 + λ3 + 1 = 0, which gives λ1 = 1; thus λ = [1 0 0 0]ᵀ.
Next, solve ∂L̂/∂λ2 = 0, with λ1 = 1, λ3 = λ4 = 0; we get −λ2 + λ4 + 1 = 0, yielding λ2 = 1. Thus, λ = [1 1 0 0]ᵀ.
Next, solve ∂L̂/∂λ3 = 0, with λ1 = λ2 = 1, λ4 = 0; we get λ1 − λ3 − 4 = 0, yielding λ3 = −3. Since λ must be nonnegative, fix λ3 = 0. Thus, λ = [1 1 0 0]ᵀ.
Finally, solve ∂L̂/∂λ4 = 0, with λ1 = λ2 = 1, λ3 = 0; we get λ2 − λ4 − 4 = 0, yielding λ4 = −3. Fix λ4 = 0. Thus, λ = [1 1 0 0]ᵀ.
End of first iteration.
Notice that λ has changed from [0 0 0 0]ᵀ to [1 1 0 0]ᵀ, so we go to the second iteration.
Second iteration. It goes through the following four steps:
1. −λ1 + λ3 + 1 = 0 with λ2 = 1, λ3 = λ4 = 0, yielding λ1 = 1; thus, λ = [1 1 0 0]ᵀ.
Finally, note that using these values of x the primal objective (9.1.1) yields
½ xᵀQx + cᵀx = ½ [3 3][1 0; 0 1][3; 3] + [−4 −4][3; 3] = 9 − 24 = −15.
First, the vectors x and c and the matrices A and Q are partitioned into
their basic and nonbasic partitions, which are denoted by the index B and N
respectively. Thus, A = [AB | AN ], where AB has columns corresponding to
the basic variables xB and is nonsingular. Again, the constraint in (9.3.1) is
expressed as AB xB + AN xN = b. Thus, if xN = 0, then xB = [AB ]−1 b. In
general, xB = [AB ]−1 b − [AB ]−1 AN xN for x that satisfies Ax = b.
The partitions of c and Q are
c = [c_B | c_N]ᵀ, Q = [Q^B_B Q^N_B; Q^B_N Q^N_N], (9.3.2)
where Q^N_B = [Q^B_N]ᵀ.
which is of the matrix form (9.3.1). The algebraic function f(x, y, z) can be expressed in the matrix form as
⅓ [x y z][1 3 3; 3 1 1; 3 2 3][x; y; z] + [3 1 1][x; y; z], (9.3.4)
with the quadratic factor identified as xᵀQx and the linear factor as cᵀx, subject to
x + z = 9, y + z = 12 ⟹ [1 0 1; 0 1 1][x; y; z] = [9; 12], x ≥ 0, (9.3.5)
which identifies A and b.
Also,
[A_B]⁻¹b = [1 0; 0 1]⁻¹[9; 12] = [9; 12] = x_B,
[A_B]⁻¹A_N x_N = [1 0; 0 1]⁻¹[1; 1][0] = [0; 0].
The objective function (9.3.1) is
f(x) = ⅓ xᵀQx + cᵀx
= ⅓ [x_Bᵀ x_Nᵀ][Q^B_B Q^N_B; Q^B_N Q^N_N][x_B; x_N] + [c_Bᵀ c_Nᵀ][x_B; x_N]
= ⅓ (x_Bᵀ Q^B_B x_B + x_Bᵀ Q^N_B x_N + x_Nᵀ Q^B_N x_B + x_Nᵀ Q^N_N x_N) + c_Bᵀ x_B + c_Nᵀ x_N.
where
z = c_B A_B⁻¹ b + ⅓ [A_B⁻¹b]ᵀ Q^B_B A_B⁻¹ b,
p = [(A_B)⁻¹b]ᵀ Q^N_B − [(A_B)⁻¹b]ᵀ Q^B_B (A_B)⁻¹A_N + c_N − c_Bᵀ(A_B)⁻¹A_N,
R = Q^N_N − Q^B_N (A_B)⁻¹A_N + [(A_B)⁻¹A_N]ᵀ Q^B_B (A_B)⁻¹A_N.
∂f(x)/∂x_i = p_i + Σ_{k∈N} r_{ik} x_k (for i ∈ N) = p_i (at x_N = 0). (9.3.7)
At this point we must choose any negative p_i and increase the corresponding value of x_i until
(i) one of the basic variables becomes zero (as in the simplex method); or
(ii) ∂f(x)/∂x_i = p_i + r_{ii} x_i = 0, i.e., x_i = −p_i/r_{ii}. Note that this result is nonbasic but is a feasible solution.
Example 9.2, continued. We will use the iteration method.
First iteration. With A_B = I, A_B⁻¹b = [9; 12], and the partitions of Q and c above, the formula for p gives
p = [9 12] Q^N_B − [9 12] Q^B_B A_B⁻¹A_N + c_N − c_B A_B⁻¹A_N = [−51].
Thus, ∂f/∂z = −51.
Note that we do not need R, since x_N = 0. Next, increase z from 0 until one of the basic variables goes to zero, i.e., by 9 units, at which point x = 0, y = 3, z = 9. The new objective function value is then 84, as compared to 330 at the previous iteration.
L(x, s, t, λ, µ) = f(x) − Σ_{i=1}^{m} λ_i {Σ_{j=1}^{n} a_{ij}x_j − b_i + s_i²} − Σ_{j=1}^{n} µ_j {−x_j + t_j²},
1 The simplex method is an algorithm for starting at some extreme feasible point and,
2 The algorithm and examples can be found in Bertsimas and Tsitsiklis [1997], Bertsekas
[1999], and Cormen et al. [2001].
Table 1
Note that λ1 cannot be the entering variable, since s1² is the basic variable such that λ1 s1² = 0. Hence x is the entering variable, since µ1 is not basic; similarly, y can also be the entering variable, since µ2 is not basic. The min-ratio for v1 and s1², given by (x_B/x(0)), is (8/4, 6/3) = (2, 2), which is a tie. So we must take y as the entering variable, since it has the min-ratio (10/2, 6/2) = (5, 3). This leads to Table 2.
Table 2
Now λ1 can enter since s1² is not basic, and for v1 and v2 the min-ratio is (x_B/y(0)) = (8/3, 4/2). This leaves the variable v2, which leads to Table 3.
Table 3
Table 4
9.5 Exercises
9.1. For the following optimization problems, let f(x) be the cost function and g(x) the inequality constraint. Minimize f(x) subject to g(x) ≤ 0, given:
(i) f(x) = x, g(x) = |x| in a domain D ⊂ R;
(ii) f(x) = x³, g(x) = −x + 1 in a domain D ⊂ R;
(iii) f(x) = x³, g(x) = −x + 1 in a domain D ⊂ R₊; and
(iv) f(x) = x, g(x) = { −x − 2 for x ≤ −1; x for −1 ≤ x ≤ 1; −x + 2 for x ≥ 1 } in a domain D ⊂ R.
Also, in each problem plot the graph, state whether the problem is convex, and if so, whether the Slater condition (§5.3.2) is satisfied, i.e., whether there exists a feasible point with g_i(x) < 0 for all i.
9.2. (a) Minimize the distance between the origin and the convex region
bounded by the constraints x + y ≥ 6, 2x + y ≥ 8, and x, y ≥ 0, and (b) verify
that the KKT necessary conditions are satisfied at the point of minimum
distance.
Solution. Since minimizing the required distance is equivalent to mini-
mizing the distance from the origin to the tangent of the circle that touches
the given convex region, consider a circle x2 + y 2 = r2 , and minimize r2 , or
f (x, y) = x2 +y 2 subject to the constraints x+y ≥ 6, 2x+y ≥ 8, and x, y ≥ 0.
The feasible region lies in the first quadrant x ≥ 0, y ≥ 0. In Figure 9.1, the
lines x + y = 6 and 2x + y = 8 are plotted, and the feasible region is the
shaded region. We will determine a point (x, y) which gives a minimum value
of f (x, y) = x2 + y 2 subject to the given constraints.
The slope of the tangent to a circle x² + y² = c² is dy/dx = −x/y; the slope of the line x + y = 6 is −1, while that of the line 2x + y = 8 is −2. Then
Case 1. If the line x + y = 6 is tangent to the circle, then dy/dx = −x/y = −1, which gives x = y. Then solving x + y = 6 and x = y, we get x = 3 = y, i.e., this line touches the circle at (3, 3).
Case 2. If the line 2x + y = 8 is tangent to the circle, then dy/dx = −x/y = −2, which gives x = 2y.
Using the Lagrangian L(x, y, λ1, λ2) = x² + y² + λ1(6 − x − y) + λ2(8 − 2x − y), the KKT conditions are
∂L/∂x = 2x − λ1 − 2λ2 = 0, ∂L/∂y = 2y − λ1 − λ2 = 0,
∂L/∂λ1 = 6 − x − y = 0, ∂L/∂λ2 = 8 − 2x − y = 0,
and solving these equations simultaneously we find at the point (3, 3) that λ1 = 6, λ2 = 0. Also, since λ1(6 − x − y) = 0 and λ2(8 − 2x − y) = 0 at (3, 3), the KKT conditions are satisfied at the point (3, 3), and min f(x, y) = 18 at this point.
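This minimum can be cross-checked numerically. The following sketch (an illustration; the choice of the SLSQP solver and the starting point are ours, not from the text) solves the same constrained problem with scipy; SLSQP expects inequality constraints in the form g(x) ≥ 0.

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f(x, y) = x^2 + y^2 subject to x + y >= 6, 2x + y >= 8, x, y >= 0.
f = lambda v: v[0]**2 + v[1]**2
cons = [
    {"type": "ineq", "fun": lambda v: v[0] + v[1] - 6},
    {"type": "ineq", "fun": lambda v: 2 * v[0] + v[1] - 8},
]
res = minimize(f, x0=[5.0, 5.0], method="SLSQP",
               bounds=[(0, None), (0, None)], constraints=cons)
print(res.x, res.fun)   # approximately [3, 3] and 18
```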
9.3. Use the KKT conditions to minimize ½x² + ½y² − 2x − 2y subject to the constraints 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, i.e., −x ≥ −1, x ≥ 0, −y ≥ −1, y ≥ 0.
Ans. x* = [1 1]ᵀ.
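For a numerical cross-check, here is a minimal sketch of the Hildreth-D'Esopo coordinate ascent of §9.2 applied to this exercise; Q, c, A, b are read off the problem statement, while the sweep count is an assumption.

```python
import numpy as np

# Dual data for min (1/2)x'Qx + c'x s.t. Ax >= b (see (9.1.9)):
# u = b + A Q^{-1} c,  v = -A Q^{-1} A'.
Q = np.eye(2)
c = np.array([-2.0, -2.0])
A = np.array([[-1.0, 0], [0, -1.0], [1.0, 0], [0, 1.0]])
b = np.array([-1.0, -1.0, 0.0, 0.0])

Qinv = np.linalg.inv(Q)
u = b + A @ Qinv @ c
v = -A @ Qinv @ A.T

lam = np.zeros(4)
for _ in range(100):                         # Hildreth-D'Esopo sweeps
    for i in range(4):
        rest = u[i] + v[i] @ lam - v[i, i] * lam[i]
        lam[i] = max(0.0, -rest / v[i, i])   # solve dL/dlam_i = 0, clamp at 0

x = Qinv @ (A.T @ lam - c)                   # recover primal x via (9.1.8)
print(lam, x)                                # lam = [1 1 0 0], x = [1 1]
```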
9.4. Use the KKT conditions to minimize f(x, y) = x² + 4y² − 8x − 16y subject to the constraints x + y ≤ 5, x ≤ 3, x ≥ 0, y ≥ 0.
Hint. Q = [2 0; 0 8], cᵀ = [−8 −16], A = [1 1; 1 0], and b = [5; 3]. Ans. x* = [3 2]ᵀ.
9.5. Use the KKT conditions to minimize f(x, y, z) = 2x² + y² + 4z² subject to the constraints x + 2y − z = 6, 2x − 2y + 3z = 12, x, y, z ≥ 0.
Hint. Q = [4 0 0; 0 2 0; 0 0 8], A = [1 2 −1; 2 −2 3], cᵀ = [0 0 0], b = [6; 12].
Ans. x* = [5.045 1.194 1.433]ᵀ.
9.6. Use Beale's method to solve the problem: min ½x² + ½y² + z² + 2xy + 2xz + yz + 2x + y + z subject to the constraints x + z = 8, y + z = 10.
Hint. Q = [1 2 2; 2 1 1; 2 1 2], cᵀ = [2 1 1], A = [1 0 1; 0 1 1], and bᵀ = [8 10].
The dual function (9.2.1) is obtained by minimizing the above strictly convex function. Thus, we get
g(c) = −¼ (Gᵀc − 2Aᵀb)ᵀ(AᵀA)⁻¹(Gᵀc − 2Aᵀb) − cᵀh.
9.8. Consider the optimization problem in R²₊ with the objective function f(x) = [F1(x) F2(x)]ᵀ, where F1(x) = x² + y² and F2(x) = (2x + 5)². Then (a) evaluate all Pareto optimal values and points in explicit expressions using the scalarization method; and (b) solve the scalarization problem with either weight equal to zero; in both cases show that the solutions of the scalar problem are also Pareto optimal.
3 This theorem states that a continuous function on a nonempty closed bounded set attains its maximum and minimum on that set.
Solution. Since the problem is convex, all Pareto optimal points can be obtained using the scalarization method with some weight vector λ ≥ 0. Thus, fix some λ ≥ 0 and solve the problem: minimize {λ1(x² + y²) + λ2(2x + 5)²}. This problem is equivalent to minimizing {(λ1 + 4λ2)x² + λ1y² + 20λ2x + 25λ2}. Any solution of this problem will give a Pareto optimal point and value. Since the cost function is strictly convex, the corresponding Pareto optimal point is given by
x*(λ1, λ2) = [−10λ2/(λ1 + 4λ2); 0],
which, by setting µ = λ2/λ1, can be written as
x*(µ) = [−10µ/(1 + 4µ); 0],
thus yielding
F1*(µ) = (−10µ/(1 + 4µ))², and F2*(µ) = (−20µ/(1 + 4µ) + 5)².
The remaining Pareto optimal points and values are obtained in the limits µ → 0 and µ → ∞. As µ → 0, we get x* = 0, which gives f*(x) = [0 25]. Next, as µ → ∞, we get x* = [−5/2 0]ᵀ and f*(x) = [25/4 0], which corresponds to minimizing the error in the solution with the minimum norm, and this x* is not necessarily a Pareto optimal point.
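The Pareto front can be traced numerically from the closed-form point above. A small sketch (an illustration; the grid of µ values is an assumption):

```python
import numpy as np

# Sweep mu = lambda2/lambda1 and evaluate the closed-form optimal point.
for mu in np.logspace(-3, 3, 13):
    x = -10 * mu / (1 + 4 * mu)          # x*(mu), with y* = 0
    F1, F2 = x**2, (2 * x + 5)**2
    print(f"mu={mu:8.3f}  x*={x:7.4f}  F1={F1:7.4f}  F2={F2:8.4f}")
# As mu -> 0: (F1, F2) -> (0, 25); as mu -> infinity: (F1, F2) -> (25/4, 0).
```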
9.9. Prove that any local optimal point of a convex problem is (globally) optimal.
Proof. Assume x is locally optimal but not globally optimal; then there is a feasible y with f(y) < f(x). Local optimality of x means that there is an M > 0 such that if z is feasible and ‖z − x‖₂ ≤ M, then f(z) ≥ f(x). Clearly ‖y − x‖₂ > M. Let z = ty + (1 − t)x, where t = M/(2‖y − x‖₂) ∈ (0, ½). Since z is a convex combination of two feasible points, it is also feasible. Also, ‖z − x‖₂ = M/2 < M, and by convexity f(z) ≤ tf(y) + (1 − t)f(x) < f(x), which contradicts the local optimality of x. The result holds in Rⁿ for any n.
10
Optimal Control Theory
10.1 Hamiltonian
The Hamiltonian is the operator corresponding to the total energy of the
system in most cases. Its spectrum is the set of possible outcomes when one
measures the total energy of a system. Because of its close relation with
the time-evolution of a system, it is very important in most formulations of
quantum theory. On the other hand, the Hamiltonian in optimal control
theory is distinct from its quantum mechanical definition. Pontryagin proved
that a necessary condition for solving the optimal control problem is that the
control should be chosen so as to minimize the Hamiltonian. This is known
as Pontryagin’s minimum principle which states that a control u(t) is to be
chosen so as to minimize the objective function
J(u) = Ψ(x(T)) + ∫₀ᵀ L(x, u, t) dt, (10.1.1)
where x(t) is the system state, which evolves according to the state equations ẋ = f(x, u, t), x(0) = x₀, t ∈ [0, T], and the control must satisfy the constraints a ≤ u(t) ≤ b, t ∈ [0, T].
time according to the differential equation set equal to zero in the constraint; y(t) is the control variable, whose value is selected or controlled to optimize J; t denotes time; and ẋ denotes the time derivative of x, i.e., dx/dt. The solution of the optimal control problem (10.2.1) yields the optimal dynamic time path of the control variable y(t).
Dynamic optimal control problems involve the Hamiltonian function H, which plays a role similar to that of the Lagrangian in static optimization problems. The Hamiltonian
is defined as
H(x(t), y(t), λ(t), t) = f (x(t), y(t), λ(t), t) + λ(t)g(x(t), y(t), t), (10.2.2)
where, unlike the static problems, the multiplier λ(t), called the costate vari-
able, is a function of t and estimates the marginal value or shadow price of
the associate state variable x(t). The method of solving problems of the type
(10.2.1) is similar to that used for solving the static optimization problem
involving the Lagrangian. Thus, assuming that the Hamiltonian is differen-
tiable in y and strictly concave so that there is an interior solution, and not
an endpoint solution, the necessary conditions for maximization are
(a) ∂H/∂y = 0,
(b) λ̇ = ∂λ/∂t = −∂H/∂x,
(c) ẋ = ∂x/∂t = ∂H/∂λ, (10.2.3)
(d) x(0) = x₀,
(e) x(T) = x_T.
Conditions (a), (b) and (c) are known as the maximum principle, and con-
ditions (d) and (e) as the boundary conditions; the two equations of motion
in conditions (b) and (c) are called the Hamiltonian system or the control
system.
For minimization, the objective functional can simply be multiplied by −1, as in concave programming. If the solution is an endpoint solution, ∂H/∂y need not be equal to zero as in condition (a), but H must still be maximized with respect to y.
Example 10.1. Solve the following dynamic optimal control problem:
Maximize ∫₀⁴ (5x − 6y²) dt subject to ẋ = 6y, x(0) = 2, x(4) = 98.
The Hamiltonian is
H = 5x − 6y² + λ(6y).
λ = −5t + c1, (10.2.4d)
where the arbitrary constants c1 and c2 are determined using the boundary conditions x(0) = 2, x(4) = 98, giving c2 = 2, c1 = 18. Hence, λ(t) = −5t + 18 and x(t) = −7.5t² + 54t + 2.
The equation of motion ẋ = 6y then gives the control variable: −15t + 54 = 6y yields y = −2.5t + 9. Finally, at the endpoints we have y(0) = 9, y(4) = −1. Thus the optimal path of the control variable y(t) is linear, starting at the point (0, 9) and ending at the point (4, −1), with a slope of −10/4 = −2.5.
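The solution can be verified symbolically. A minimal sketch (an illustration; the variable names are ours) rebuilds the costate, control, and state from the maximum principle and solves for the constants:

```python
import sympy as sp

t, c1, c2 = sp.symbols('t c1 c2')

lam = -5*t + c1                    # costate: lam' = -dH/dx = -5
y = lam / 2                        # from dH/dy = -12*y + 6*lam = 0
x = sp.integrate(6*y, t) + c2      # state equation: x' = 6*y

sol = sp.solve([x.subs(t, 0) - 2, x.subs(t, 4) - 98], [c1, c2])
print(sp.expand(x.subs(sol)))      # -15*t**2/2 + 54*t + 2
print(sp.expand(y.subs(sol)))      # 9 - 5*t/2
```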
H = [f_xx f_xy; f_yx f_yy], (10.2.5)
with the endpoint values y(0) = 30 and y(3) = 0. Thus, the optimal path
of the control variable is linear, from point (0, 30) to (3, 0), with the slope of
−30/3 = −10.
Step 3. If x∗ (T ) < xmin , set the terminal endpoint equal to the value of
the constraint, x(T ) = xmin , and solve as a fixed endpoint problem.
Example 10.3. Solve
Maximize ∫₀³ (5x − y²) dt subject to ẋ = 4y, x(0) = 2, x(3) ≥ 180.
Since the free endpoint solution satisfies the terminal endpoint constraint x(T) ≥ 180, the constraint is not binding and we thus have a proper solution, where from Example 10.2 the control variable is y(t) = −10t + 30.
Example 10.4. Solve the same problem as in Example 10.3 but with the new boundary conditions x(0) = 5 and x(4) ≥ 190. First, we will use the complementary slackness condition to find the solution by assuming that x(4) − S² = 190, where S is called the slackness variable. There are two cases to consider:
Case 1. λ = 0: Then −2y = 0, or y = 0. Also, ẋ = 4y = 0 gives x(t) = a1, which using the initial condition x(0) = 5 gives a1 = 5. Then the terminal condition x(4) = S² + 190 gives S² + 190 = 5, or S² = −185, which is infeasible.
Case 2. S = 0: Then the first two steps are the same as in Example 10.3 solved as a free endpoint problem. The maximum principle gives λ(t) = −5t + c1 and x(t) = −20t² + 8c1t + c2. Now the new boundary conditions are x(0) = 5 and x(4) = 190, which yield c2 = 5 and c1 = 505/32 ≈ 15.78. Hence, λ(t) = −5t + 15.78, x(t) = −20t² + 126.25t + 5, and y(t) = −10t + 31.56.
If the solution is an endpoint solution, then ∂H_c/∂y need not vanish in condition (a), but H_c must still be maximized with respect to y. Since H_c = H e^{pt}, the value of y that maximizes H will also maximize H_c, since e^{pt}, being independent of y, is treated like a constant when maximizing with respect to y. The sufficiency conditions, depending on the sign of the Hessian |H|, remain the same for the discounted optimal control problem.
Example 10.5. Maximize ∫₀³ e^{−0.02t}(x − 2x² − 5y²) dt, subject to ẋ = y − 0.5x, x(0) = 90, and x(3) free.
Solution. First, we check the sufficient conditions to ensure that this problem has a global maximum. The Hessian
|H| = |f_xx f_xy; f_yx f_yy| = |−4 0; 0 −10| = 40 > 0,
and the first principal minor |H1| = −4 < 0, while the second principal minor |H2| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is concave in both x and y, and g = y − 0.5x is linear, so conditions for a global maximum are satisfied. The current-valued Hamiltonian is H_c = x − 2x² − 5y² + µ(y − 0.5x).
∂H_c/∂y = −10y + µ = 0, which gives y = 0.1µ,
µ̇ = pµ − ∂H_c/∂x = 0.02µ − (1 − 4x − 0.5µ) = 0.52µ + 4x − 1,
ẋ = ∂H_c/∂µ = y − 0.5x = 0.1µ − 0.5x.
To solve for µ and x, the last two equations above are written in the matrix form Ẏ = AY + B as
[µ̇; ẋ] = [0.52 4; 0.1 −0.5][µ; x] + [−1; 0].
The characteristic equation is
|A − rI| = |0.52 − r, 4; 0.1, −0.5 − r| = 0,
where, using formula (A.20) with λ replaced by r so it does not conflict with the Lagrange multiplier λ, the characteristic roots are
r1,2 = (0.02 ± √((0.02)² − 4(−0.66)))/2 = 0.82245, −0.80245.
For r1 = 0.82245, the eigenvector ỹ1 is determined by solving the equation
[0.52 − 0.82245, 4; 0.1, −0.5 − 0.82245][c1; c2] = [−0.30245, 4; 0.1, −1.32245][c1; c2] = 0.
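The roots and eigenvectors can be confirmed numerically; a quick sketch (an illustration, not part of the text):

```python
import numpy as np

# Cross-check the characteristic roots of the coefficient matrix above.
A = np.array([[0.52, 4.0], [0.1, -0.5]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)    # [ 0.82245... -0.80245...]
print(eigvecs)    # columns are the corresponding eigenvectors
```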
Now, we apply the boundary conditions. Since µ(T) e^{−pT} = 0 at the free endpoint, we get at T = 3,
µ(3) e^{−0.02(3)} = 0,
or
[3.306k1 e^{0.82245(3)} − 3.0247k2 e^{−0.80245(3)} + 0.758] e^{−0.06} = 0,
or
37.033k1 − 0.256k2 + 0.713 = 0. (10.5.4)
Also, at t = 0, x(0) = 90, so we have
10.6 Exercises
Fixed endpoint:
10.1. Maximize ∫₀² (3x − 2y²) dt subject to ẋ = 8y, x(0) = 5, and x(2) free.
Hint. Hamiltonian H = 3x − 2y² + λ(8y); ẋ = 16λ, x(t) = −24t² + 96t + 5 (state variable); y(t) = −6t + 12 (control variable). The optimal path of the control variable is linear, starting at (0, 12) and ending at (2, 0) with slope −6.
Free endpoints:
10.2. Solve the optimal control problem with a free endpoint:
Maximize ∫₀⁴ (5x − 2y²) dt subject to ẋ = 8y, x(0) = 2, x(4) free.
Solution. We have ∂H/∂y = −4y + 8λ = 0, which gives y = 2λ, and λ̇ = −∂H/∂x = −5, ẋ = ∂H/∂λ = 8y. Then λ(t) = −5t + c1, where c1 = 20 using the boundary condition λ(4) = 0. Also, since ẋ = 16λ = −80t + 16c1, we get on integrating, x(t) = −40t² + 320t + c2, where c2 = 2 using the condition x(0) = 2. Thus, x(t) = −40t² + 320t + 2, and then y(t) = 2λ = −10t + 40, with the endpoint values y(0) = 40 and y(4) = 0. Thus, the optimal path of the control variable is linear, from point (0, 40) to (4, 0), with slope −40/4 = −10.
10.3. Maximize ∫₀³ (9x − 12y²) dt, ẋ = 18y, x(0) = 5, x(3) free.
Solution. First we check if the sufficiency condition is met. Here f_xx = 0, f_xy = 0, and f_yy = −24, so the Hessian is negative semidefinite; thus f is concave in both x and y, and g = 18y is linear, so conditions for a global maximum are satisfied.
The Hamiltonian is
H = 9x − 12y² + λ(18y).
Then we have
∂H/∂y = −24y + 18λ = 0, which gives y = 0.75λ,
λ̇ = −∂H/∂x = −9,
ẋ = ∂H/∂λ = 18y = 18(0.75λ) = 13.5λ.
Integrating the first of the last two equations, we get λ(t) = −9t + c1. Then ẋ = 13.5(−9t + c1) = −121.5t + 13.5c1, which on integrating gives
x(t) = −60.75t² + 13.5c1t + c2.
Using the transversality condition λ(3) = 0, we find that c1 = 27, and thus the costate variable is λ(t) = −9t + 27. The condition x(0) = 5 gives c2 = 5, and the state variable is
x(t) = −60.75t² + 364.5t + 5,
and the control variable is given by y(t) = 0.75λ(t) = −6.75t + 20.25.
The Hessian is
|H| = |f_xx f_xy; f_yx f_yy| = |−4 0; 0 −2| = 8 > 0,
and the first principal minor |H1| = −4 < 0, while the second principal minor |H2| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is concave in both x and y, and g = x + 2y is linear, so conditions for a global maximum are satisfied.
The Hamiltonian is H = f(x, y) + λ(x + 2y). Then we have
∂H/∂y = 3 − 4y + 2λ = 0, which gives y = 0.5λ + 0.75,
λ̇ = −∂H/∂x = 1 + 4x − λ,
ẋ = ∂H/∂λ = x + 2y = x + λ + 1.5.
The last two equations in matrix form Ẏ = AY + B are
[λ̇; ẋ] = [−1 4; 1 1][λ; x] + [1; 1.5],
which gives
The transversality condition λ(2) = 0 and the initial condition x(0) = 6, when
applied to the above two equations, respectively, give
108k1 − 0.37k2 = 1,
87.532k1 + 0.115k2 = 6.5,
which when solved simultaneously, e.g., by Cramer’s rule, give k1 = 0.056 and
k2 = 13.73. Hence, from Eq (9.6.1) we get
10.5. Maximize ∫₀⁴ (5x − 2y²) dt subject to ẋ = 8y, x(0) = 2, x(4) ≥ 800.
Solution. From Exercise 10.2 we have
10.6. Solve the same problem as in Exercise 10.5 but with the new boundary conditions x(0) = 5 and x(4) ≥ 650. The first two steps are the same as in Exercise 10.2 solved as a free endpoint problem. The maximum principle gives λ(t) = −5t + c1 and x(t) = −40t² + 16c1t + c2. Now the new boundary conditions are x(0) = 5 and x(4) = 650, which yield c2 = 5 and c1 = 20.08. Hence, λ(t) = −5t + 20.08, x(t) = −40t² + 321.25t + 5, and y(t) = 2λ = −10t + 40.16.
10.7. Solve
Maximize ∫₀⁴ (5x − 2y²) dt subject to ẋ = 8y, x(0) = 2, x(4) ≥ 620.
Since the free endpoint solution satisfies the terminal endpoint constraint x(4) ≥ 620, the constraint is not binding and we thus have a proper solution, where from Exercise 10.2 the control variable is y(t) = −10t + 40.
10.8. Maximize ∫₀¹ (8x + 3y − 2y²) dt subject to ẋ = 8y, x(0) = 9, x(1) ≥ 90.
The Hamiltonian is
H = 8x + 3y − 2y² + λ(8y).
Then
∂H/∂y = 3 − 4y + 8λ = 0, which gives y = 2λ + 0.75,
λ̇ = −∂H/∂x = −8, (9.6.2)
ẋ = ∂H/∂λ = 8y = 16λ + 6.
First, we will use the complementary slackness condition to find the solution by assuming that x(1) − S² = 90. There are two cases to consider:
Case 1. λ = 0: Then 3 − 4y = 0, or y = 0.75. Also, ẋ = 6 gives x(t) = 6t + a1, which using the initial condition x(0) = 9 gives a1 = 9. Then the terminal condition x(1) = S² + 90 gives 6 + 9 = S² + 90, or S² = −75, which is infeasible.
Case 2. S = 0: Then integrating the last two equations in (9.6.2), we get
λ(t) = −8t + c1,
ẋ = 16(−8t + c1) + 6 = −128t + 16c1 + 6, (9.6.3)
so that x(t) = −64t² + (16c1 + 6)t + c2. Using the initial condition x(0) = 9, we get c2 = 9. Also, using the transversality condition λ(1) = 0, we get c1 = 8. Thus,
λ(t) = −8t + 8,
x(t) = −64t² + 134t + 9.
Now, to see if this solution is acceptable, we check x(1) = 79 < 90. So the terminal constraint is violated. Thus, in this situation we solve the problem with a fixed endpoint condition x(1) = 90. Then from Eqs (9.6.3), the condition x(0) = 9 gives c2 = 9, and the new constraint x(1) = 90 gives c1 = 8.6875. Hence,
10.9. Maximize ∫₀³ e^{−0.04t}(xy − x² − y²) dt subject to ẋ = x + 2y, x(0) = 130.2, and x(3) free.
Solution. First, we check the sufficient conditions to ensure that this problem has a global maximum. The Hessian
|H| = |f_xx f_xy; f_yx f_yy| = |−2 1; 1 −2| = 3 > 0,
and the first principal minor |H1| = −2 < 0, while the second principal minor |H2| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is concave in both x and y, and g = x + 2y is linear, so conditions for a global maximum are satisfied.
Now, the current-valued Hamiltonian is
H_c = xy − x² − y² + µ(x + 2y).
∂H_c/∂y = x − 2y + 2µ = 0, which gives y = 0.5x + µ,
µ̇ = pµ − ∂H_c/∂x = 0.04µ − (y − 2x + µ) = −0.96µ + 2x − y = −1.96µ + 1.5x (using y = 0.5x + µ),
ẋ = ∂H_c/∂µ = x + 2y = 2x + 2µ.
To solve for µ and x, the last two equations above are written in the matrix form Ẏ = AY + B as
[µ̇; ẋ] = [−1.96 1.5; 2 2][µ; x] + [0; 0].
The characteristic equation is
|A − rI| = |−1.96 − r, 1.5; 2, 2 − r| = 0.
Now, we apply the boundary conditions: at the free endpoint µ(3) e^{−0.04(3)} = µ(3) e^{−0.12} = 0, which gives
13.599k1 − 0.162k2 = 0.
10.10. Maximize ∫₀¹ e^{−0.07t}(8x + 3y + xy − 2x² − 0.8y²) dt subject to ẋ = x + 4y, x(0) = 91, and x(1) free.
Solution. First, we check the sufficient conditions to ensure that this problem has a global maximum. The Hessian
|H| = |f_xx f_xy; f_yx f_yy| = |−4 1; 1 −1.6| = 5.4 > 0,
and the first principal minor |H1| = −4 < 0, while the second principal minor |H2| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is concave in both x and y, and g = x + 4y is linear, so conditions for a global maximum are satisfied.
Now, the current-valued Hamiltonian is H_c = 8x + 3y + xy − 2x² − 0.8y² + µ(x + 4y). Then
∂H_c/∂y = 3 + x − 1.6y + 4µ = 0, which gives y = 0.625x + 2.5µ + 1.875,
µ̇ = pµ − ∂H_c/∂x = 0.07µ − (8 + y − 4x + µ) = −3.43µ + 3.375x − 9.875,
ẋ = ∂H_c/∂µ = x + 4y = 10µ + 3.5x + 7.5.
To solve for µ and x, the last two equations above are written in the matrix form Ẏ = AY + B as
[µ̇; ẋ] = [−3.43 3.375; 10 3.5][µ; x] + [−9.875; 7.5].
The characteristic equation is
|A − rI| = |−3.43 − r, 3.375; 10, 3.5 − r| = 0.
Now, we apply the boundary conditions: at the free endpoint µ(1) e^{−0.07(1)} = 0, which gives
505.946k1 − 0.0013k2 = 55.684.
Also, x(0) = 91 gives
2.3315k1 + k2 = 12.025.
Solving these two equations simultaneously by Cramer's rule, we get k1 = 0.11 and k2 = 11.77. Hence,
In microeconomics there are three well-known and heavily used demand concepts: the Marshallian, the Hicksian, and the Walrasian demands, which deal with what consumers will buy in different situations so as to maximize their utility. Some useful results like Shephard's lemma and the Slutsky equation are introduced, and so-called Giffen and Veblen goods are discussed. We will first introduce Shephard's lemma, and then analyze the above-mentioned three demands.
h_j(p, u) = ∂e(p, u)/∂p_j. (11.1.1)
In the theory of the firm, this lemma has a similar form for the conditional factor demand c(w, y) for each input factor (or good) j:
x_j(w, y) = ∂c(w, y)/∂w_j. (11.1.2)
Proof. We will consider (11.1.1) only for the two-good case; the general case and the proof of (11.1.2) are analogous. The expenditure function e(p1, p2, u) is the minimum value of the constrained optimization problem, and thus, using the Lagrange multiplier method, the Lagrangian is given by L = p1x1 + p2x2 + λ(u − u(x1, x2)). By the envelope theorem,
∂e/∂p_j = ∂L/∂p_j = x_j^h, j = 1, 2,
where x_j^h is the minimizer (i.e., the Hicksian demand function for good j, j = 1, 2).
where ⟨p, x⟩ is the inner product of the price and quantity vectors. The consumer has a utility function u : R^m₊ → R. Then the consumer's Marshallian demand correspondence is defined by
x*(p, i) = arg max_{x∈b(p,i)} u(x), (11.2.2)
which is a homogeneous function of degree zero, i.e., for every constant a > 0, x*(ap, ai) = x*(p, i).
Then
x*(p1, p2, i) = ( i p1^{ε−1}/(p1^ε + p2^ε), i p2^{ε−1}/(p1^ε + p2^ε) ), ε = δ/(δ − 1). (11.2.6)
In both cases, the preferences are strictly convex, the demand is unique, and the demand function is continuous.
3. The utility function has the linear form u(x1, x2) = x1 + x2, which is weakly convex, and in fact the demand is not unique: when p1 = p2, the consumer may divide his income in arbitrary ratios between goods 1 and 2 and get the same utility.
4. The utility function exhibits a non-diminishing marginal rate of substitution:
u(x1, x2) = x1^α + x2^α, α > 1. (11.2.7)
This utility function is convex, and the demand is not continuous: when p1 < p2, the consumer demands only good 1; when p1 > p2, the consumer demands only good 2; and when p1 = p2, the demand correspondence contains two distinct bundles, buy only good 1 or buy only good 2.
where good 1 is the numéraire with price p1 = 1, good 2 has price p2 , and
the consumer’s income is m. (i) Find the consumer’s Marshallian demands
for both goods as a function of p1 , p2 and m, and the corner solutions, if
any; (ii) use the solution in part (i) and the relevant homogeneity property
of Marshallian demands to determine the consumer’s demands for goods 1
and 2 for arbitrary non-negative prices p1 , p2 and income m; (iii) find the
consumer’s Hicksian demand (§11.2) functions hi (p1 , p2 , u), i = 1, 2; (iv) find
the consumer’s expenditure function e(p1 , p2 , u); and (v) verify if Shephard’s
lemma applies in this case.
Solution. (i) The marginal rate of substitution between good 2 and good 1 is 1 + y^{−1/2} > 1. At an interior solution, we must have p2 = 1 + y^{−1/2}, which is possible only if p2 > 1. If the consumer chooses positive amounts of both goods, the Marshallian demands are given by
y(1, p2, m) = (1/(p2 − 1))², and x(1, p2, m) = m − p2 (1/(p2 − 1))². (11.2.8)
The consumption of good 1 is positive iff p2 > 1 and m > p2 (1/(p2 − 1))². If the consumption of good 1 is zero, then y(1, p2, m) = m/p2 and x(1, p2, m) = 0.
(v) First, in the case of interior solutions, we have, using Shephard's lemma,
∂e(p1, p2, u)/∂p_i = h_i(p1, p2, u), i = 1, 2.
The Marshallian demand functions x(p, w), which describe demand at prices p and income w, are easier to observe directly. The two demands are related by
h(p, u) = x(p, e(p, u)),
where e(p, u) is the expenditure function (the function that gives the minimum wealth required to reach a utility level u), and by
x(p, w) = h(p, v(p, w)),
where v(p, w) is the indirect utility function (which gives the utility level attainable with a given wealth under a fixed price regime). Their derivatives are related by the Slutsky equation (see §11.4).
Whereas the Hicksian demand comes from the expenditure minimization problem, the Marshallian demand comes from the utility maximization problem. Thus, the two problems are mathematical duals, and hence the duality theorem provides a method of proving the above relationships.
The Hicksian demand function is intimately related to the expenditure function. If the consumer's utility function u(x) is locally nonsatiated and strictly convex, then h(p, u) = ∇_p e(p, u). If the consumer's utility function u(x) is continuous and represents a locally nonsatiated preference relation, then the Hicksian demand correspondence h(p, u) satisfies the following properties:
1. Homogeneity of degree zero in p: For all a > 0, h(ap, u) = h(p, u). This is because the same x that minimizes Σ_j p_j x_j also minimizes Σ_j ap_j x_j subject to the same constraint.
2. No excess demand: The constraint u(x) ≥ ū holds with equality, u(x) = ū. This follows from the continuity of the utility function. Informally, if u(x) > ū, the consumer could spend slightly less and still attain utility ū.
3. Hicksian demand finds the cheapest consumption bundle that achieves a given utility level.
utility u by maximizing the utility at the original price and income vectors,
formally given by v(p, w). The right-hand side of Eq (11.4.1) represents the
change in demand for good j holding utility fixed at u minus the quantity of
good k demanded, multiplied by the change in demand for good j when wealth
changes. Thus, the first term on the right-hand side of Eq (11.4.1) represents
the substitution effect, and the second term the income effect. Since utility
cannot be observed, the substitution effect is not directly observable, but it
can be calculated using the other two terms in Eq (11.4.1).
To derive Eq (11.4.1), use the identity h_j(p, u) = x_j(p, e(p, u)), where e(p, u) is the expenditure function and u is the utility obtained by maximizing utility for a given p and w. Then differentiating h_j(p, u) partially with respect to p_k, we get
∂h_j(p, u)/∂p_k = ∂x_j(p, e(p, u))/∂p_k + [∂x_j(p, e(p, u))/∂e(p, u)] · [∂e(p, u)/∂p_k]. (11.4.2)
Then, using the fact that ∂e(p, u)/∂p_k = h_k(p, u) by Shephard's lemma, and that h_k(p, u) = h_k(p, v(p, w)) = x_k(p, w) at the optimum, where v(p, w) is the indirect utility function, and substituting these results into Eq (11.4.2), we obtain Eq (11.4.1).
The Slutsky equation (11.4.1) shows that the change in the demand for a
good, caused by a price change, can be explained by the following two effects:
(i) a substitution effect that results in a change in the relative prices of two
goods, and (ii) an income effect that results in a change in the consumer’s
purchasing power. For more details, see Nicholson [1978].
The Slutsky equation (11.4.1) can also be written as
∂h_j(p, u)/∂p_k = ∂x_j(p, w)/∂p_k + [∂x_j(p, w)/∂w] x_k(p, w), (11.4.3)
or in matrix form as
D_p h(p, u) = D_p x(p, w) + D_w x(p, e) x(p, e)ᵀ, (11.4.4)
where the terms are of sizes n×n, n×n, and (n×1)(1×n), respectively.
Another Proof. Take any (p, w, u), and recall that h(p, u) = x(p, w) and e(p, u) = w. Now differentiate h_j(p, u) = x_j(p, e(p, u)) with respect to p_k:
∂h_j(p, u)/∂p_k = ∂x_j(p, e(p, u))/∂p_k + [∂x_j(p, e(p, u))/∂w][∂e(p, u)/∂p_k]
= ∂x_j(p, e(p, u))/∂p_k + [∂x_j(p, e(p, u))/∂w] h_k(p, u)
= ∂x_j(p, w)/∂p_k + [∂x_j(p, w)/∂w] x_k(p, w).
The formula (11.4.1) signifies both the substitution effect and the income effect as follows:
∂x_j(p, w)/∂p_k = ∂h_j(p, u)/∂p_k [substitution effect] − x_k(p, w) ∂x_j(p, w)/∂w [income effect]. (11.4.5)
decreases, the demand curve slopes downward (Figure 11.1). But in the case
of inferior good the income effect and the substitution effect are in the opposite
direction (Figure 11.2).
Example 11.2. (1) Let p = (p1, p2) be the original prices, and x = (x1, x2) the original demand. Let p1 be decreased to p1′. While the initial income was i = p1x1 + p2x2, now the consumer needs only i′ = p1′x1 + p2x2 to buy the original bundle; thus the consumer's income can be reduced by i − i′ = (p1 − p1′)x1. Thus, at the new prices, (i) if less income is needed than before to buy the original choice, then real income has increased; and (ii) if more income than before is needed to buy the original choice, then real income has decreased.
(2) To determine the changes in quantities demanded when the consumer's income is adjusted so that at the new prices he/she can just buy the original choice, let (i) the change be from x to x′ (known as the pure substitution effect), and (ii) the subsequent change from x′ to x″ (known as the pure income effect). For example, suppose that the demand function for milk is x1 = 10 + i/(10p1). If initially i = 120 and p1 = 3, then x1 = 10 + 120/(10 × 3) = 14 units of milk. Thus, 14 × 3 = 42 is spent on milk, and the remaining 120 − 42 = 78 is spent on other goods. Now, suppose p1 has decreased to p1′ = 2. Then how much income is needed to buy the initial bundle? Obviously, 78 + 2 × 14 = 106. The consumer will buy x1′ = 10 + 106/(10 × 2) = 15.3 units of milk with that money. Thus, 15.3 − 14 = 1.3 is the substitution effect. Next, what is the income effect? Obviously, x1″ = 10 + 120/(10 × 2) = 16 will be the eventual consumption, giving a total change of 16 − 14 = 2 units. Hence, 16 − 15.3 = 0.7 is the income effect.
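The same decomposition can be written out programmatically; the following sketch (an illustration, with the helper name demand being ours) reproduces the numbers above.

```python
# Slutsky decomposition for the milk example (numbers from the text).
def demand(i, p1):
    return 10 + i / (10 * p1)

i0, p1, p1_new = 120, 3, 2
x0 = demand(i0, p1)                 # 14 units at the original price

# Income that just buys the original bundle at the new price:
other_goods = i0 - p1 * x0          # 78 spent elsewhere
i_comp = other_goods + p1_new * x0  # 106

x_sub = demand(i_comp, p1_new)      # 15.3: after the pure substitution step
x_new = demand(i0, p1_new)          # 16.0: after full adjustment

print(x_sub - x0)                   # 1.3  (substitution effect)
print(x_new - x_sub)                # 0.7  (income effect)
```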
The substitution effect always moves opposite to the price effect. Thus,
we say: the substitution effect is negative when the change in demand due to
the substitution effect is opposite to the price change. However, the income
effect may be negative or positive depending on whether the good is inferior
or not.
11.4.1 Giffen Goods. If the income effect and the substitution effect are
in opposite directions and if the income effect is larger than the substitution
effect, then a price decrease lessens the demand. Such goods are called Giffen
goods. For a consumer who consumes multiple goods, the Giffen goods are
inferior goods (Figure 11.3).
In fact, the above definition of a Giffen good implies that
∂x_i(p, M)/∂p_i > 0. (11.4.6)
Using the Slutsky equation for the own-price effect of good i, the inequality (11.4.6) implies that
Finally, note that a Giffen good faces an upward sloping demand curve because the income effect dominates the substitution effect, which means that the quantity demanded increases as the price rises. However, a good cannot always have an upward sloping demand curve, because the consumer eventually runs out of money. At some point the rising price of the Giffen good surpasses the consumer's budget, and a further price increase will lower the amount of the good the consumer is able to buy. This means that at high enough prices, the demand curve will start sloping downward (Figure 11.4, where A marks the point at which the good surpasses the consumer's budget, and the 'Giffen good range' represents the range where only such a good is consumed).
11.4.2 Veblen Goods. (Veblen [1899]) These are types of luxury goods for which the quantity demanded increases as the price increases, in apparent contradiction to the law of demand: consumers prefer more of such a good as its price rises, resulting in an upward sloping demand curve. Examples are personal goods such as wines, jewelry, designer handbags, and luxury cars, which are in demand simply because of the high prices asked for them; they make a desirable status symbol for conspicuous consumption and leisure. This phenomenon is also known as the Veblen effect, where goods are desired even if they become over-priced. A corollary is that a decrease in their price decreases their demand. This effect is known as the snob effect, the bandwagon effect, the network effect, the hot-hand fallacy, and the common law of business balance. None of these effects, however, can predict what will happen to the actual quantity of goods demanded as prices change. The actual effect on quantity demanded will depend on the range of other available goods, their prices, and substitutions for consumed goods.
∂u(x)/∂x − λp = 0, p · x = w. (11.5.2)
Boundary solution: For some k, the equation ∂u(x)/∂x_k − λp_k = 0 should be replaced by
∂u(x)/∂x_k − λp_k ≤ 0 if x_k = 0, and = 0 if x_k > 0. (11.5.3)
Example 11.3. For a boundary solution, let v(p, w) := u(x′), and consider
u(x) = Σ_{j=1}^{n} α_j log x_j, α_j ≥ 0, Σ_{j=1}^{n} α_j = 1, (11.5.5)
where α_j is the fraction of the expense for good j. Then the Walrasian demand is x_j(p, w) = α_j w/p_j. Hence,
v(p, w) = log w + Σ_{j=1}^{n} α_j (log α_j − log p_j).
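The demand and indirect utility formulas are easy to check numerically. A small sketch (an illustration; the share and price values are assumptions):

```python
import numpy as np

# Check x_j = alpha_j * w / p_j and v(p, w) for the log utility above.
alpha = np.array([0.5, 0.3, 0.2])      # expense shares, summing to 1
p = np.array([2.0, 1.0, 4.0])
w = 100.0

x = alpha * w / p                       # Walrasian demand
print(p @ x)                            # 100.0: the budget binds exactly
print(alpha @ np.log(x))                # direct utility at the optimum
print(np.log(w) + alpha @ (np.log(alpha) - np.log(p)))   # v(p, w): same value
```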
where q is the amount of the consumed good, p is its price, T is the total time available, τ is the time spent on 'leisure,' which determines h = T − τ hours of work, w is the wage, so wh is labor income, and P is nonlabor income. Since the utility function is Cobb-Douglas, say with leisure share α, the Walrasian demand is
q(p, w, P) = (1 − α)(wT + P)/p, τ(p, w, P) = α(wT + P)/w for τ < T.
If this formula gives τ ≥ T, the consumer does not participate in the labor market (τ(p, w, P) = T) and spends all nonlabor income to purchase goods (q(p, w, P) = P/p). For τ < T, the indirect utility function is v(p, w, P) = K(wT + P)/(p^{1−α} w^α), where K is a constant.
In other words, this problem asks what the cheapest way is to attain utility at least ū. Let the Hicksian demand correspondence¹ h(p, ū) be the set of solutions to the problem (11.6.1). Assuming local nonsatiation, the constraint can be modified as follows: if u is continuous and the set U ≡ {x ∈ X : u(x) ≥ ū} is not empty, then choose any x′ such that u(x′) > ū, and choose any x̄ ∈ R₊ such that p′_j x̄ ≥ p′ · x′ for all j and all p′ in a neighborhood of p ≫ 0. This replaces the constraint set by a compact set, and the cost minimizing solution of problem (11.6.1) is the same locally with respect to (p, ū). Moreover, the set h(p, ū) is not empty in a neighborhood of (p, ū) ∈ Rⁿ₊₊ × U.
Theorem 11.6. (Hicksian demand) Let u be continuous and locally nonsatiated, and let X = Rⁿ₊. Then the Hicksian correspondence h : Rⁿ₊₊ × U → Rⁿ₊ (i) is homogeneous of degree 0 in p; (ii) achieves ū exactly (i.e., u(x′) = ū for any x′ ∈ h(p, ū) if ū ≥ u(0)); (iii) is convex-valued for any given (p, ū) if u is quasi-concave; and (iv) is upper semicontinuous.
The set h(p, ū) reduces to a point if u is strictly quasi-concave.
Proof. (i), (ii), and (iii) are easy. To prove (iv), we cannot apply the maximum theorem as we did in the proof of Theorem 11.1 on the Walrasian demand, because in the present case the constraint set is not locally bounded.
¹ This correspondence is also used in welfare analysis.
11.8 Applications
Labor Supply. We will apply a Slutsky equation to the labor/leisure decision problem. Let the income be given by i = wT + P. Then, since w affects the leisure demand τ both directly and through income, we get from (11.4.3),
dτ(p, w, i)/dw = ∂τ(p, w, i)/∂w + T ∂τ(p, w, i)/∂i. (11.8.1)
If we differentiate the identity τ(p, w, ū) = τ(p, w, e(p, w, ū)) with respect to w, we obtain
∂τ(p, w, ū)/∂w = ∂τ(p, w, i)/∂w + τ(p, w, i) ∂τ(p, w, i)/∂i. (11.8.2)
Hence,
dτ(p, w, i)/dw = ∂τ(p, w, ū)/∂w [substitution effect] − τ(p, w, i) ∂τ(p, w, i)/∂i [income effect I] + T ∂τ(p, w, i)/∂i [income effect II]
= ∂τ(p, w, ū)/∂w + [T − τ(p, w, i)] ∂τ(p, w, i)/∂i. (11.8.3)
For the labor supply h = T − τ, the relation (11.8.3) becomes
dh(p, w, i)/dw = ∂h(p, w, ū)/∂w + h(p, w, i) ∂h(p, w, i)/∂i. (11.8.4)
Roy's identity. This identity states that for all (p, w) ≫ 0,
x(p, w) = −(1/D_w v(p, w)) ∇_p v(p, w). (11.8.5)
Proof. For any (p, w) ≫ 0 and ū = v(p, w), we have v(p, e(p, ū)) = ū, which upon differentiation gives
∇_p v(p, e(p, ū)) + D_w v(p, e(p, ū)) ∇_p e(p, ū) = 0,
∇_p v(p, e(p, ū)) + D_w v(p, e(p, ū)) h(p, ū) = 0,
∇_p v(p, w) + D_w v(p, w) x(p, w) = 0, (11.8.6)
with respect to (x, λ) is a full rank matrix, i.e., we must show that
[ D²u(x)   −(1/λ) Du(x)ᵀ; −(1/λ) Du(x)   0 ] (11.8.7)
is of full rank. This is satisfied when u is differentiably strictly quasi-concave. (Use the following definition: u : X (⊂ Rⁿ) → R is differentiably strictly quasi-concave if Δxᵀ D²u(x) Δx < 0 for any Δx (≠ 0) ∈ Rⁿ such that Du(x)Δx = 0.)
11.9 Exercises
11.1. A consumer has preferences represented by the utility function u(x1, x2) = x1 + x2 + 2x2^{1/2}, where good 1 is the numéraire with price p1 = 1. The price of good 2 is p2, and the consumer's income is m. (a) Find this consumer's Marshallian demand for goods 1 and 2 as a function of p2 and m, accounting for corner solutions, if any; (b) using the result of part (a) and the relevant homogeneity property of Marshallian demand, find this consumer's demand for goods 1 and 2 for arbitrary nonnegative prices p1, p2 and income m; (c) find the consumer's Hicksian demand functions h1(p1, p2, u) and h2(p1, p2, u); (d) find this consumer's expenditure function e(p1, p2, u); and (e) verify that Shephard's lemma applies in this case (part (d)).
Ans. (a) The marginal rate of substitution between good 1 and good 2 is 1 + x2^{−1/2} > 1. At an interior solution, we must have p2 = 1 + x2^{−1/2}, which is possible only if p2 > 1. If the consumer chooses positive amounts of both goods, the Marshallian demands are given by
x2(1, p2, m) = (1/(p2 − 1))², x1(1, p2, m) = m − p2 (1/(p2 − 1))². (11.9.1)
There is positive consumption of good 1 iff p2 > 1 and m > p2 (1/(p2 − 1))². If the consumption of good 1 is zero, then x1(1, p2, m) = 0 and x2(1, p2, m) = m/p2. Thus the consumption of good 2 is always positive, since the marginal rate of substitution approaches infinity as x2 approaches zero.
(b) Since demand is homogeneous of degree zero in prices and income, we have
x_i(p1, p2, m) = x_i(1, p2/p1, m/p1). (11.9.2)
Then, using the result of part (a), at an interior solution we have
x2(p1, p2, m) = (1/((p2/p1) − 1))² = (p1/(p2 − p1))²,
x1(p1, p2, m) = m/p1 − (p2/p1)(p1/(p2 − p1))². (11.9.3)
If
u + 1 < (p2/(p2 − p1))², (11.9.7)
then h1(p1, p2, u) = 0, and u = h2(p1, p2, u) + 2√(h2(p1, p2, u)). The right-hand side of the second equation is a strictly increasing function of h2, i.e., for any specified (p1, p2, u), there is a unique solution for h2(p1, p2, u).
∂e(p1, p2, u)/∂p_i = h_i(p1, p2, u). (11.9.9)
11.4. Show that the Walrasian and Hicksian demands coincide at corresponding price, income, and utility levels. Hint. (i) In both demands the consumption bundles that maximize utility are the same as the consumption bundles which minimize expenditure, provided the constraints of the two problems 'match up.' (ii) Both demands must coincide when computed according to the same prices, income, and utility. (iii) The proposition implies that e(p, v(p, w)) = w and v(p, e(p, ū)) = ū, so for a fixed price vector p, the functions e(p, ·) and v(p, ·) are inverses of each other.
12
Black-Scholes Model
Although far from being perfect, the Black-Scholes model is still useful. It demands a prerequisite of partial differential equations and the Laplace transform.
∂V/∂t + ½σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0, (12.1.1)
where V denotes the value of the option, r the rate of interest, and S the asset price at time t. The derivation of this equation is as follows. The Black-Scholes model of the return has two components: (i) µ dt, which is predictable and deterministic, where µ is the drift, and (ii) σ dX, which is a random contribution to the return dS/S, where σ is the volatility of the asset S at time t. For each interval dt, the quantity dX is a sample drawn from the normal distribution N(0, dt), which when multiplied by σ produces the term σ dX. The values of the parameters σ and µ are estimated from historical data. Thus, we obtain the stochastic differential equation
dS/S = µ dt + σ dX. (12.1.2)
This strategy replicates the option if V = f(t, S). Combining these observations, we get the Black-Scholes equation
∂f/∂t + ½σ²S² ∂²f/∂S² + rS ∂f/∂S − rf = 0. (12.1.4)
Next, introduce the change of variables
x = ln(S/K), so that S = Ke^x,
τ = ½σ²(T − t), so that t = T − 2τ/σ², (12.2.1)
U(x, τ) = (1/K)V(S, t) = (1/K)V(Ke^x, T − 2τ/σ²).
Next, we apply the chain rule to the partial derivatives in the Black-Scholes equation:
∂V/∂t = K (∂U/∂τ)(∂τ/∂t) = −½Kσ² ∂U/∂τ,
∂V/∂S = K (∂U/∂x)(∂x/∂S) = (K/S) ∂U/∂x = e^{−x} ∂U/∂x,
∂²V/∂S² = −(K/S²) ∂U/∂x + (K/S) ∂/∂S(∂U/∂x) = −(K/S²) ∂U/∂x + (K/S²) ∂²U/∂x² = (e^{−2x}/K)(∂²U/∂x² − ∂U/∂x).
Substituting these into (12.1.1) gives
−½Kσ² ∂U/∂τ + rK ∂U/∂x + ½σ²K (∂²U/∂x² − ∂U/∂x) − rKU = 0,
which simplifies to
−∂U/∂τ + (k − 1) ∂U/∂x + ∂²U/∂x² − kU = 0, (12.2.2)
where k = 2r/σ². Notice that the coefficients of Eq (12.2.2) are independent of x and τ. The boundary condition for V is V(S_T, T) = (S_T − K)⁺. Now, from Eq (12.2.1), x = ln(S_T/K) ≡ x_T when t = T and S_t = S_T, and τ = 0. Then the boundary condition for U is
U0(x_T) = U(x_T, 0) = (1/K)(S_T − K)⁺ = (1/K)(Ke^{x_T} − K)⁺ = (e^{x_T} − 1)⁺.
Lastly, we set
W(x, τ) = e^{αx + β²τ} U(x, τ), (12.2.3)
where α = ½(k − 1) and β = ½(k + 1), with k = 2r/σ². The transformation (12.2.3) converts Eq (12.2.2) into the heat equation; the details are as follows:
∂U/∂τ = e^{−αx−β²τ}(∂W/∂τ − β²W(x, τ)),
∂U/∂x = e^{−αx−β²τ}(∂W/∂x − αW(x, τ)),
∂²U/∂x² = e^{−αx−β²τ}(α²W(x, τ) − 2α ∂W/∂x + ∂²W/∂x²).
Substituting these into (12.2.2), the coefficient of ∂W/∂x is (k − 1) − 2α = 0 and the coefficient of W is β² − (k − 1)α + α² − k = 0, so Eq (12.2.2) reduces to the heat equation
∂W/∂τ = ∂²W/∂x². (12.2.4)
In terms of W, the option value is
V(S, t) = K e^{−αx−β²τ} W(x, τ). (12.2.6)
12.2.2 Solution of the Heat Equation. The solution of the heat equation (12.2.4) subject to the boundary condition (12.2.5) is
W(x, τ) = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} W0(y) dy = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} (e^{βy} − e^{αy})⁺ dy, (12.2.7)
with the Green's function
G(x, τ; y) = (1/(2√(πτ))) e^{−(x−y)²/(4τ)}. (12.2.8)
For details, see Kythe [2011: §6.1.2]. The graphs of this Green's function G for some values of τ > 0 are shown in Figure 12.1; they are normal density curves.
12.2.3 Black-Scholes Call Price. The solution of the heat equation (12.2.4) subject to the boundary condition (12.2.5) is obtained from Eq (12.2.8) as
W(x, τ) = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} W0(y) dy = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} (e^{βy} − e^{αy})⁺ dy. (12.2.9)
Now, set z = (y − x)/√(2τ), so that y = √(2τ)z + x and dy = √(2τ) dz. Then
W(x, τ) = (1/√(2π)) ∫_{−∞}^{∞} exp{−½z²} (exp{β(√(2τ)z + x)} − exp{α(√(2τ)z + x)})⁺ dz. (12.2.10)
Note that the integrand in (12.2.10) is nonzero only when the exponential difference is positive, i.e., when β(√(2τ)z + x) > α(√(2τ)z + x), or z > −x/√(2τ). Let us write Eq (12.2.10) as
W(x, τ) = I1 − I2,
where
I1 = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} exp{−½z²} exp{β(√(2τ)z + x)} dz,
I2 = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} exp{−½z²} exp{α(√(2τ)z + x)} dz.
Completing the square in the integrand of I1, we get −½z² + β√(2τ)z + βx = −½(z − β√(2τ))² + βx + β²τ, and thus
I1 = e^{βx+β²τ} (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−½(z−β√(2τ))²} dz.
Now set u = z − β√(2τ). Then
I1 = e^{βx+β²τ} (1/√(2π)) ∫_{−x/√(2τ)−β√(2τ)}^{∞} e^{−½u²} du = e^{βx+β²τ} Φ(x/√(2τ) + β√(2τ)).
Recall that x = ln(S/K), k = 2r/σ², α = ½(k − 1) = (r − σ²/2)/σ², β = ½(k + 1) = (r + σ²/2)/σ², and τ = ½σ²(T − t). Thus,
x/√(2τ) + β√(2τ) = [ln(S/K) + (r + σ²/2)(T − t)]/(σ√(T − t)) ≡ d1,
x/√(2τ) + α√(2τ) = d1 − σ√(T − t) ≡ d2.
Finally, to obtain the solution for the call price V(S_t, t), we use Eq (12.2.6) in Eq (12.2.11) and obtain
V(S, t) = K e^{−αx−β²τ} W(x, τ) = K e^{−αx−β²τ} (I1 − I2). (12.2.12)
Example 12.1. Suppose the price of gold today is $2000 per ounce and the risk-free interest rate is 3%. Suppose you do not want gold today (because it is out of fashion), but you do want it in 6 months (when, of course, it will
df = (∂f/∂S) dS + (∂f/∂t) dt = (S_t µ ∂f/∂S + ∂f/∂t) dt + S_t σ (∂f/∂S) dX.
Since dX behaves like √dt, we may take (dX)² = dt. Then ½ (∂²f/∂S²)(dS)² = ½ (∂²f/∂S²) S_t²σ² dt up to first order. Hence, by Itô's lemma,
df = (S_t µ ∂f/∂S + ∂f/∂t + ½ σ²S_t² ∂²f/∂S²) dt + S_t σ (∂f/∂S) dX, (12.2.14)
where (dX)² = dt, as noted above. Notice that the only randomness in df is the dX term. Thus, we can construct a portfolio that eliminates the random part, and the rest we can easily control.
First, we will rely on the discrete version of (12.2.14). Since we want to price a contingent claim, or derivative, a simple method is to set up the portfolio

Π = −1 unit of the derivative + ∆ shares,

where ∆ ≡ ∂f/∂S. For a small change δt in time, the corresponding change in Π is δΠ = −δf + ∆ δS. The discrete version of (12.2.14) gives

δΠ = (−∂f/∂t − ½ σ²St² ∂²f/∂S²) δt,   (12.2.15)
which implies that the portfolio is risk-less (no uncertainty); then by an arbitrage argument we must have δΠ = rΠ δt, or

(−∂f/∂t − ½ σ²St² ∂²f/∂S²) δt = r(−f + ∆S) δt,

which yields

(∂f/∂t + ½ σ²St² ∂²f/∂S² + r∆S) δt = rf δt,

or

∂f/∂t + ½ σ²St² ∂²f/∂S² + r∆S = rf,

which yields the Black-Scholes-Merton partial differential equation

∂f/∂t + ½ σ²St² ∂²f/∂S² + rS ∂f/∂S − rf = 0,   (12.2.16)

with known Cauchy data f(ST, T), i.e., the terminal payoff prescribed at t = T. Thus, any function f that satisfies Eq (12.2.16) denotes the price of some theoretical contingent claim, and every contingent claim must satisfy Eq (12.2.16).

Solving Eq (12.2.16) with the boundary condition depicting a European call option with strike K, i.e., with f(S,T) = max{S − K, 0}, we obtain
the Black-Scholes price of the European call option. Let c denote the Black-
Scholes price of a European call option on a stock with no dividend, i.e.,
12.2.6 Implied Volatility. Next, we will discuss the implied volatility, and
where Black-Scholes goes wrong. Remember that prices are not set by the
Black-Scholes options price. It is the markets that set prices, and according to
some economists they set prices nearly perfectly. Therefore, go to the market
to see what a call option on a certain underlying is selling for at this moment,
i.e., at t = 0. Observe K, r, St , T , but remember that we cannot observe σ. So
we solve for σ using (12.2.17), which is easy since the Black-Scholes call option
price is monotonic in σ. The number we get is called the implied volatility.
If we check market data for different strike prices K with everything else being equal, we get different implied volatilities. In fact, what we get is called a volatility smile or a volatility skew, depending on its shape. This problem is due to our assumption that σ is an intrinsic property of the underlying: it should not vary with K.
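Since the call price is monotonic in σ, a simple bisection recovers the implied volatility from a market price; the sketch below (not from the text) reuses the bs_call helper from the previous sketch, and repeating it across strikes produces the smile or skew just described:

    def implied_vol(price, S, K, r, T, lo=1e-6, hi=5.0, tol=1e-8):
        """Solve bs_call(S, K, r, sigma, T) = price for sigma by bisection."""
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if bs_call(S, K, r, mid, T) < price:
                lo = mid            # model price too low, raise volatility
            else:
                hi = mid
            if hi - lo < tol:
                break
        return 0.5 * (lo + hi)

    # Recover sigma = 0.20 from the price computed earlier:
    print(round(implied_vol(10.4506, 100, 100, 0.05, 1.0), 4))   # ~0.2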
Example 12.2. The prices for European call and put options on the QQQ (a NASDAQ 100 composite) for October 11, 2016 and expiration dates in October 17 and November 28, 2016, are presented in Table 12.1.

Table 12.1 QQQ option prices.

              Calls            Puts
   Strike   Oct    Nov      Oct    Nov
     34     3.9    4.1      0.05   0.25
     35     2.8    3.2      0.05   0.35
     36     1.85   2.35     0.1    0.55
     37     1      1.65     0.25   0.85
     38     0.35   1.05     0.6    1.25
     39     0.1    0.6      1.4    1.9
     40     0.05   0.35     2.35   2.6
The above data is plotted in Figure 12.2 with the strike price on the x-axis
and implied volatility on the y-axis.
Volatility smiles also occur with commodities. It is found that σ not only varies with the strike price; it also depends on whether a call or a put is being priced. Moreover, implied volatility varies with the expiration of the option. Thus, market prices reveal that the Black-Scholes model lacks certain features, and the model could be enlarged. Some suggestions are: (i) assume volatility is stochastic, i.e., let dσ = µσ dt + σ̂ dX; (ii) assume volatility is local, i.e., σ = σ(S, t); (iii) assume the underlying follows a jump-diffusion process; and (iv) assume interest rates are, at the very least, nonconstant. However, no definitive improvement of the Black-Scholes model is available so far.
d₁ = (1/(σ√(T − t))) [ln(S/K) + (r + σ²/2)(T − t)],   (12.3.2)
d₂ = d₁ − σ√(T − t).   (12.3.3)
The price of the corresponding put option, based on put-call parity, is given by

P(S,t) = Ke^{−r(T−t)} − S + C(S,t) = N(−d₂)Ke^{−r(T−t)} − N(−d₁)S,   (12.3.4)

where

d± = (1/(σ√τ)) [ln(F/K) ± ½σ²τ],   d± = d∓ ± σ√τ,   (12.3.6)

where τ = T − t is the time to expiry (i.e., remaining time, backwards time), D = e^{−rτ} is the discount factor, F = e^{rτ}S = S/D is the forward price of the underlying asset, S = DF, and d₊ = d₁ and d₋ = d₂. The formula (12.3.5) is a special case of the so-called Black-76 formula. Thus, if put-call parity, defined by C − P = D(F − K) = S − DK, is given, then the price of a put option is

P(F,τ) = D[N(−d₋)K − N(−d₊)F].   (12.3.7)
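A short sketch (not from the text) of this forward-price form; with F = S/D it reproduces the spot formulas, and put-call parity C − P = D(F − K) holds by construction:

    from math import log, sqrt, exp
    from statistics import NormalDist

    def black76(F, K, sigma, tau, D, call=True):
        """Option price from forward F, strike K, time to expiry tau, discount D."""
        N = NormalDist().cdf
        d_plus = (log(F / K) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
        d_minus = d_plus - sigma * sqrt(tau)
        if call:
            return D * (N(d_plus) * F - N(d_minus) * K)
        return D * (N(-d_minus) * K - N(-d_plus) * F)

    S, K, r, sigma, T = 100, 100, 0.05, 0.20, 1.0
    D = exp(-r * T); F = S / D
    C = black76(F, K, sigma, T, D, call=True)
    P = black76(F, K, sigma, T, D, call=False)
    print(round(C - P - D * (F - K), 12))   # ~0, the put-call parity check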
The Black-Scholes formula (12.3.5) is a difference of two terms, and these two terms equal the values of two binary call options. According to Nielsen [1993], the formula can be interpreted in terms of the N(d±) (and a fortiori the d±) terms as follows: it decomposes a call option into the difference of two binary options, namely an asset-or-nothing call minus a cash-or-nothing call, where a call option exchanges cash for an asset at expiry, while an asset-or-nothing call just yields the asset (with no cash in exchange) and a cash-or-nothing call just yields cash (with no asset in exchange).
Next, we can rewrite formula (12.3.1) as

C = D[N(d₊)F − N(d₋)K].   (12.3.8)

Here the N(d₊)F term is the product of the probability of the option expiring in the money, N(d₊), and the value of the underlying at expiry, F, while the N(d₋)K term is the product of the probability of the option expiring in the money, N(d₋), and the value of the cash at expiry, K. Although either both binaries expire in the money or both expire out of the money, i.e., either cash is exchanged for the asset or it is not, the probabilities N(d₊) and N(d₋) are not equal. In fact, the quantities d± are measures of moneyness (in standard deviations), while N(d₋) is the probability of expiring ITM (percent moneyness). Thus, the interpretation of the cash option, N(d₋)K, is correct, since the value of the cash is independent of movements of the underlying, and therefore it can be interpreted simply as 'probability times value.'
On the other hand, the product N (d+ )F is more complicated, since, ac-
cording to Nielsen [1993], the probability of expiring in the money and the
value of the asset at expiry are not independent. In fact, the value of the
asset at expiry is variable in terms of cash, but is constant in terms of the
asset itself (i.e., it is a fixed quantity of the asset). Thus, these quantities are independent only if one changes the numéraire to the asset rather than cash.
In formula (12.3.1), if the spot S replaces the forward F in d±, then instead of the ½σ²τ term there is the term (r ± ½σ²)τ, which can be interpreted as a drift factor in the risk-neutral measure for the appropriate numéraire. The use of the factor ½σ² accounts for the difference between the median and the mean of the log-normal distribution when d₋ is used for moneyness rather than the standardized moneyness m = (1/(σ√τ)) ln(F/K). The same factor is found in Itô's lemma applied to geometric Brownian motion. Another sign that the naive interpretation of replacing N(d₊) by N(d₋) in formula (12.3.5) is incorrect is that it would yield a negative value for out-of-the-money call options.
Thus, the terms N(d₁) and N(d₂) represent, respectively, the probabilities of the option expiring in-the-money under the exponential martingale probability measure for the stock and the equivalent martingale probability measure for the risk-free asset. The risk-neutral probability for a finite stock price ST is defined by

p(S,T) = N′(d₂(ST)) / (ST σ√T),   ST ∈ (0, ∞).   (12.3.9)
1 Vega is not a letter in the Greek alphabet; the name arises from reading the Greek letter ν (nu) as 'v'.
and variance

Var[X] = (e^{σ²} − 1) e^{2µ+σ²}.   (12.5.2)

The p.d.f. for X is

dF_X(x) = (1/(σx√(2π))) exp{−½((ln x − µ)/σ)²}.   (12.5.3)
The exponent in (12.5.6), after completing the square and combining terms, becomes

−(1/(2σ²))(y² − 2yµ + µ² − 2σ²y) = −(1/(2σ²))[y − (µ + σ²)]² + µ + ½σ²,

and F_Y(y) = F_X((y − µ)/σ). Hence, the integral of Eq (12.5.7) involves the scale-location transformation of the standard normal c.d.f. Since Φ(−x) = 1 − Φ(x), we get (Hogg and Klugman [1984])

L_X(K) = exp{µ + ½σ²} Φ((−ln K + µ + σ²)/σ).   (12.5.8)
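A Monte Carlo sanity check of (12.5.8) can be run in a few lines; the sketch below (not from the text) reads L_X(K) as the lognormal partial expectation E[X 1_{X>K}], an interpretation we assume here:

    import numpy as np
    from statistics import NormalDist

    mu, sigma, K = 0.1, 0.3, 1.2
    rng = np.random.default_rng(0)
    X = np.exp(mu + sigma * rng.standard_normal(2_000_000))

    mc = np.mean(X * (X > K))                        # E[X 1_{X>K}] by simulation
    closed = np.exp(mu + 0.5 * sigma**2) * NormalDist().cdf(
        (-np.log(K) + mu + sigma**2) / sigma)        # right side of (12.5.8)
    print(round(mc, 4), round(closed, 4))            # the two agree closely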
where

d₁ = [ln(St/K) + (r + ½σ²)τ] / (σ√τ),
d₂ = d₁ − σ√τ,   (12.6.2)

and

Φ(y) = (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt

is the standard normal c.d.f.
The value of the bond at time zero is B₀ = 1, and that of the stock is S₀. This model is valid under certain market assumptions, for which see Hull [2008]. By Itô's lemma, the value Vt of a derivative written on the stock follows the diffusion equation

dVt = (∂V/∂t) dt + (∂V/∂S) dS + ½ (∂²V/∂S²)(dS)²
    = (∂V/∂t) dt + (∂V/∂S) dS + ½ σ²S² (∂²V/∂S²) dt   (12.6.5)
    = [∂V/∂t + µSt (∂V/∂S) + ½ σ²St² (∂²V/∂S²)] dt + σSt (∂V/∂S) dXt.
There are four different methods to derive Eq (12.6.1): (i) by straightforward integration; (ii) by applying the Feynman-Kac theorem; (iii) by transforming the Black-Scholes equation into the heat equation, for which a solution is known (this was the original method used by Black and Scholes [1973]; see §12.2.3); and (iv) by using the Capital Asset Pricing Model (CAPM). We will discuss these methods in the sequel.
With constant interest rate r, the time-t price of a European call option on a non-dividend-paying stock, when its spot price is St, with strike K and time to maturity τ = T − t, is given by an expectation which can be evaluated to give Eq (12.6.1), rewritten here for convenience as

C(St, K, T) = St Φ(d₁) − Ke^{−rτ} Φ(d₂),

where

d₁ = [ln(St/K) + (r + ½σ²)τ] / (σ√τ),
d₂ = d₁ − σ√τ = [ln(St/K) + (r − ½σ²)τ] / (σ√τ).   (12.6.7)
To find a measure Q such that under this measure the discounted stock price that uses Bt is a martingale, let

dSt = rt St dt + σSt dWt^Q,   where Wt^Q = Wt + ((µ − rt)/σ) t.   (12.6.8)
Recall that the bond Bt = e^{rt} serves as the numéraire, and since r is deterministic, we can take NT = e^{rT} out of the expectation; with V(ST, T) = (ST − K)⁺ we can write

V(St, t) = e^{−r(T−t)} E^Q[(ST − K)⁺ | Ft].
Now we will use the stock price St as the numéraire and recover the Black-Scholes call price. We start with the stock price process in Eq (12.6.8) under the measure Q and with a constant interest rate. The related bond price is defined by B̃ = B/S. Then by Itô's lemma we get the process

dB̃t = σ²B̃t dt − σB̃t dWt^Q.   (12.6.11)

The measure Q turns S̃ = S/B into a martingale, but not B̃. The measure P that turns B̃ into a martingale is
Thus, to solve for Zt, define Yt = ln Zt and apply Itô's lemma again, yielding

dYt = −(r + ½σ²) dt − σ dWt^P,   (12.6.15)

so that

YT − Yt = −(r + ½σ²)(T − t) − σ(WT^P − Wt^P),
EMM when the interest rates are constant. Thus, from Eq (12.6.6),
To evaluate these two integrals, we will use the results derived in §12.6.2: under Q and at time t, ln ST follows the normal distribution with mean ln St + (r − ½σ²)τ and variance σ²τ, where τ = T − t denotes the time to maturity.

The first integral in the last line of Eq (12.6.20) uses the expectation of ST restricted to the event ST > K; thus

∫_K^∞ ST dF(ST) = E^Q[ST 1_{ST>K}] = L_{ST}(K),   (12.6.21)

and thus the first integral in the last line of Eq (12.6.20) is St Φ(d₁). Next, using Eq (12.5.4), the second integral in the last line of Eq (12.6.20) can be written as

∫_K^∞ e^{−rτ} K dF(ST) = e^{−rτ} K [1 − F(K)]
  = e^{−rτ} K [1 − Φ((ln K − ln St − (r − ½σ²)τ)/(σ√τ))]
  = e^{−rτ} K [1 − Φ(−d₂)] = e^{−rτ} K Φ(d₂).

Hence, combining these two terms, we obtain Eq (12.6.1) for the European call price.
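The risk-neutral expectation behind these integrals is easy to check by simulation; the following sketch (not from the text) draws ST = St exp{(r − ½σ²)τ + σ√τ Z} under Q and prices the call as e^{−rτ} E^Q[(ST − K)⁺]:

    import numpy as np

    S, K, r, sigma, tau = 100.0, 100.0, 0.05, 0.20, 1.0
    rng = np.random.default_rng(1)
    Z = rng.standard_normal(2_000_000)
    ST = S * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * Z)
    mc_price = np.exp(-r * tau) * np.mean(np.maximum(ST - K, 0.0))
    print(round(mc_price, 3))   # close to the closed-form value ~10.45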
∂V/∂t + µ(xt,t) (∂V/∂x) + ½ σ(xt,t)² (∂²V/∂x²) − r(t,x) V(xt,t) = 0,   (12.6.24)

with boundary condition V(xT, T). Then this equation has the solution

V(xt, t) = E^Q[ exp{−∫_t^T r(x_u, u) du} V(xT, T) | Ft ].   (12.6.25)

In this equation the time-t expectation is defined with respect to the same measure Q under which the stochastic part of Eq (12.6.23) defines Brownian motion.
In order to apply the Feynman-Kac theorem to the Black-Scholes call price, note that the value Vt = V(St, t) of a European call option written at time t with strike price K and constant rate of interest r satisfies the Black-Scholes equation

∂V/∂t + rSt (∂V/∂S) + ½ σ²St² (∂²V/∂S²) − rVt = 0,   (12.6.26)

with boundary condition V(ST, T) = (ST − K)⁺.¹ Eq (12.6.26) is the same as Eq (12.6.24) for xt = St, µ(xt,t) = rSt, and σ(xt,t) = σSt. Thus, we can apply the Feynman-Kac theorem so that the value of the European call is given by

V(St, t) = E^Q[ exp{−∫_t^T r(X_u, u) du} V(ST, T) | Ft ] = e^{−rτ} E^Q[(ST − K)⁺ | Ft].   (12.6.27)
12.6.5 CAPM. The Capital Asset Pricing Model (CAPM) is based on the assumption that the expected return ri of a security i in excess of the risk-free rate r is

E[ri] − r = βi (E[rM] − r),

where rM denotes the return on the market, and the security's beta is given by

βi = Cov[ri, rM] / Var[rM].

¹ See www.frouah.com for the derivation of Eq (12.6.26).
12.6.6 CAPM for Assets. During the time increment dt, the expected stock price return E[rS dt] is E[dSt/St], where St satisfies the diffusion equation (12.6.3). Then the expected return is

E[dSt/St] = r dt + βS (E[rM] − r) dt.   (12.6.28)

Similarly, the expected return on the derivative, E[rV dt], where Vt satisfies the diffusion equation (12.6.5), is

E[dVt/Vt] = r dt + βV (E[rM] − r) dt.   (12.6.29)
E[dVt] = (∂V/∂t) dt + (∂V/∂S)[rSt dt + St βS (E[rM] − r) dt] + ½ σ²S² (∂²V/∂S²) dt.   (12.6.33)

On equating Eqs (12.6.32) and (12.6.33), dropping dt from both sides, and canceling terms in βS, we get the Black-Scholes equation (12.6.26). Hence, we have obtained the Black-Scholes call price by using the Feynman-Kac theorem exactly as in §12.6.4 and solving the integral as in §12.6.3.
12.7 Dividends
The Black-Scholes call price in Eq (12.6.1) is for a call written on a non-
dividend-paying stock. There are two ways to incorporate dividends into the
call price: (i) by assuming that the stock pays a continuous dividend yield
q, or (ii) by assuming that the stock pays dividends in lump sum or ‘lumpy’
dividends.
Following the same derivation method as in §12.6.2, Eq (12.7.1) has the solution

ST = St exp{(r − q − ½σ²)τ + σWτ^Q},   τ = T − t.   (12.7.2)

Thus, ST follows the log-normal distribution with mean St e^{(r−q)τ} and variance St² e^{2(r−q)τ}(e^{σ²τ} − 1). Then, proceeding as in Eq (12.6.20), the call price is
where

d₁ = [ln(St/K) + (r − q + ½σ²)τ] / (σ√τ).

Using Eq (12.5.4), the second term in Eq (12.6.20) becomes

e^{−rτ} K[1 − F(K)] = e^{−rτ} K[1 − Φ((ln K − ln St − (r − q − ½σ²)τ)/(σ√τ))]
  = e^{−rτ} K Φ(d₂),   (12.7.5)

where d₂ = d₁ − σ√τ, as before. Then, substituting Eqs (12.7.4) and (12.7.5) into Eq (12.7.2), we obtain the Black-Scholes price of a European call written on a stock that pays continuous dividends as
Notice that the only modification is that the current value of the stock price is decreased by the factor e^{−qτ}, and the return on the stock is decreased from r to r − q.
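A minimal sketch of this adjustment (not from the text; the function name is ours), which reduces to the non-dividend formula when q = 0:

    from math import log, sqrt, exp
    from statistics import NormalDist

    def bs_call_div(S, K, r, q, sigma, tau):
        """European call on a stock paying a continuous dividend yield q."""
        N = NormalDist().cdf
        d1 = (log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
        d2 = d1 - sigma * sqrt(tau)
        return S * exp(-q * tau) * N(d1) - K * exp(-r * tau) * N(d2)

    # q = 0 recovers the non-dividend price:
    print(round(bs_call_div(100, 100, 0.05, 0.0, 0.20, 1.0), 4))   # ~10.4506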
12.7.2 Lumpy Dividends. The concept is the same as above, except that
the current value of the stock price is decreased by the dividends, though not
continuously.
d ln St = (µ − ½σ²) dt + σ dWt,   (12.8.1)

so that

ln St − ln S₀ = (µ − ½σ²) t + σWt,   since W₀ = 0.

Hence, the solution of the SDE is

St = S₀ exp{(µ − ½σ²)t + σWt}.   (12.8.2)

Thus, ln St follows the normal distribution with mean ln S₀ + (µ − ½σ²)t and variance σ²t. Hence, in view of (12.5.1) and (12.5.2), St follows the log-normal distribution with mean S₀e^{µt} and variance S₀²e^{2µt}(e^{σ²t} − 1). We can also integrate Eq (12.8.1) from t to T and obtain

ST = St exp{(µ − ½σ²)τ + σ(WT − Wt)},
d ln Bt = rt dt.

Note that when interest rates are constant, rt = r and Bt = e^{rt}. Thus, integrating from t to T, we get the solution

B_{t,T} = exp{∫_t^T r_u du}   ⟹   B_{t,T} = e^{rτ} for constant interest rates.
dSt = rt St dt + σSt dWt^Q,   where Wt^Q = Wt + ((µ − rt)/σ) t.   (12.8.3)

We know that under Q, at time t = 0, the stock price St follows the log-normal distribution with mean S₀e^{rt} and variance S₀²e^{2rt}(e^{σ²t} − 1) when the interest rate is the constant value r; however, St is not a martingale. If we use Bt as the numéraire, the discounted stock price S̃t = St/Bt will be a martingale. Applying Itô's lemma to S̃t, we get the SDE

dS̃t = (∂S̃t/∂B) dBt + (∂S̃t/∂S) dSt,   (12.8.4)

that is,

dS̃t = −(St/Bt²) dBt + (1/Bt) dSt
    = −(St/Bt²) rt Bt dt + (1/Bt)(rt St dt + σSt dWt^Q)
    = σ S̃t dWt^Q.
Thus, ln S̃t follows the normal distribution with mean ln S̃₀ − ½σ²t and variance σ²t. For a proof that S̃ is a martingale, see Exercise 12.4.
12.8.4 Summary. Applying Itô's lemma to the stock price dSt = µSt dt + σSt dWt and the bond price dBt = rt Bt dt, we obtain the processes for ln St and ln Bt:

d ln St = (µ − ½σ²) dt + σ dWt,
d ln Bt = rt dt,

so that

St = S₀ exp{(µ − ½σ²)t + σWt},   and   Bt = exp{∫₀^t r_s ds}.

If we apply a change of measure to get the stock price under the risk-neutral measure Q, we have

dSt = rSt dt + σSt dWt^Q   ⟹   St = S₀ exp{(r − ½σ²)t + σWt^Q}.
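The explicit solution above lends itself to direct simulation; a sketch (not from the text, with illustrative parameter values) of one geometric Brownian motion path on a daily grid:

    import numpy as np

    S0, mu, sigma, T, n = 100.0, 0.15, 0.20, 1.0, 252
    dt = T / n
    rng = np.random.default_rng(2)
    dW = np.sqrt(dt) * rng.standard_normal(n)       # W increments ~ N(0, dt)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    t = np.linspace(0.0, T, n + 1)
    S = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)   # Eq (12.8.2)
    print(S[0], S[-1])                              # start and end of one path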
12.9 Exercises
12.1. Let the price S of a share today be $12.00. We will construct a time series for the share prices over four intervals if µ = 0.35, σ = 0.26, dt = 1/252, where 252 is the number of trade days in a year. We will determine dX at each step from the normal distribution with mean 0 and standard deviation 1/√252 ≈ 0.063, i.e., N(0, 1/√252) = N(0, 0.063).

Step 1. At time t = 0 we have S₀ = 12. We choose a value for dX from N(0, 0.063), so take dX = −0.05. Then Eq (12.1.2) gives

dS/12 = 0.35/252 + 0.26(−0.05) = −0.0116   ⟹   dS = 12(−0.0116) = −0.14;

thus, S₁ = 12 − 0.14 = 11.86.

Step 2. S₁ = 11.86; choose dX = 0.15. Then

dS/11.86 = 0.35/252 + 0.26(0.15) = 0.04   ⟹   dS = 11.86(0.04) = 0.48;

thus, S₂ = 11.86 + 0.48 = 12.34.

Step 3. S₂ = 12.34; choose dX = 0.09. Then

dS/12.34 = 0.35/252 + 0.26(0.09) = 0.025   ⟹   dS = 12.34(0.025) = 0.31;

thus, S₃ = 12.34 + 0.31 = 12.65.

Step 4. S₃ = 12.65; choose dX = 0.12. Then

dS/12.65 = 0.35/252 + 0.26(0.12) = 0.0325   ⟹   dS = 12.65(0.0325) = 0.41;

thus, S₄ = 12.65 + 0.41 = 13.06.
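The four steps can be reproduced with a short loop (an illustrative sketch; the dX values are those chosen above):

    mu, sigma, dt = 0.35, 0.26, 1 / 252
    S = 12.00
    for dX in (-0.05, 0.15, 0.09, 0.12):
        ret = mu * dt + sigma * dX      # dS/S from Eq (12.1.2)
        S = S + S * ret
        print(round(S, 2))              # 11.86, 12.34, 12.65, 13.06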
X^n_{t+s} − X^n_t = (1/√n) Σ_{1≤j≤n(t+s)} εj − (1/√n) Σ_{1≤j≤nt} εj = (1/√n) Σ_{nt+1≤j≤n(t+s)} εj.

Since X^n_{t+s} − X^n_t → N(0, s), we get X_{t+s} − X_t = √s Z. Thus, dXt behaves like √dt. Hence, we obtain the continuous time analog of Eq (12.9.1) as

dSt/St = µ dt + σ dX,   (12.9.4)

which is Eq (12.1.2).

1 This theorem states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined finite expected value and finite variance, will be approximately normally distributed, regardless of the underlying distribution.
12.3. Derive the Black-Scholes equation for a contingent claim f(St, t) with a risk-free self-financing portfolio Π.

Solution. A self-financing portfolio can be constructed using Itô's lemma. Thus, if dSt = St µ dt + St σ dX, and f : (St, t) ↦ R, we find that if dx = (dSt, dt)′, then

df = (∇f, dx) = (∂f/∂S) dS + (∂f/∂t) dt = (St µ ∂f/∂S + ∂f/∂t) dt + St σ (∂f/∂S) dX.

Since dX behaves like √dt, we may take (dX)² = dt. Then ½ (∂²f/∂S²)(dS)² = ½ σ²St² (∂²f/∂S²) dt up to first order. Hence, by Itô's lemma,

df = (St µ ∂f/∂S + ∂f/∂t + ½ σ²St² ∂²f/∂S²) dt + St σ (∂f/∂S) dX.   (12.9.5)

Set Π = −1 unit of the derivative + ∆ shares, where ∆ ≡ ∂f/∂S. For a small change δt in time, the corresponding change in Π is δΠ = −δf + ∆ δS. The discrete versions of (12.9.4) and (12.9.5) give

δΠ = (−∂f/∂t − ½ σ²St² ∂²f/∂S²) δt,   (12.9.6)

which implies that the portfolio is risk-less (no uncertainty); then by an arbitrage argument we must have δΠ = rΠ δt, or

(−∂f/∂t − ½ σ²St² ∂²f/∂S²) δt = r(−f + ∆S) δt,

which yields

(∂f/∂t + ½ σ²St² ∂²f/∂S² + r∆S) δt = rf δt,

or

∂f/∂t + ½ σ²St² ∂²f/∂S² + r∆S = rf,

which yields the Black-Scholes-Merton partial differential equation

∂f/∂t + ½ σ²St² ∂²f/∂S² + rS (∂f/∂S) − rf = 0,   (12.9.7)

with known Cauchy data f(ST, T). Thus, any function f that satisfies Eq (12.9.7) denotes the price of some theoretical contingent claim, and every contingent claim must satisfy Eq (12.9.7).
12.4. Prove that S̃t, defined in Eq (12.8.5), is a martingale under Q.

Solution. Consider the expectation under Q for s < t:

E^Q[S̃t | Fs] = S̃₀ exp{−½σ²t} E^Q[exp{σWt^Q} | Fs]
 = S̃₀ exp{−½σ²t + σWs^Q} E^Q[exp{σ(Wt^Q − Ws^Q)} | Fs].

Note that at time s the quantity Wt^Q − Ws^Q is distributed as N(0, t − s), which is the same as the distribution of W_{t−s}^Q at time zero. Hence, we have

E^Q[S̃t | Fs] = S̃₀ exp{−½σ²t + σWs^Q} E^Q[exp{σW_{t−s}^Q} | F₀].

Thus, we have

E^Q[S̃t | Fs] = S̃₀ exp{−½σ²t + σWs^Q} exp{½σ²(t − s)} = S̃₀ exp{−½σ²s + σWs^Q} = S̃s,
which proves the martingale property. Note that (i) under Q, ST follows the log-normal distribution with mean St e^{rτ} and variance St²e^{2rτ}(e^{σ²τ} − 1) when the interest rate rt is the constant value r; and (ii) under the original measure, the process for S̃t is dS̃t = (µ − r)S̃t dt + σS̃t dWt, which is clearly not a martingale.
12.5. Show that the process Y defined by Yt = t²Wt³, t ≥ 0, satisfies the SDE

dYt = (2Yt/t + 3(t⁴Yt)^{1/3}) dt + 3(tYt)^{2/3} dWt,   Y₀ = 0.

Solution. Obviously, Y₀ = 0. The function f(t,x) = t²x³ is in C², and so by Itô's lemma

dYt = (∂f/∂t)(t, Wt) dt + (∂f/∂x)(t, Wt) dWt + ½ (∂²f/∂x²)(t, Wt) d⟨W, W⟩t
 = 2tWt³ dt + 3t²Wt² dWt + ½ · 6t²Wt dt
 = (2tWt³ + 3t²Wt) dt + 3t²Wt² dWt.

The result is obtained since 2tWt³ = 2Yt/t, 3t²Wt = 3(t⁴Yt)^{1/3}, and 3t²Wt² = 3(tYt)^{2/3}.
12.6. Show that the process X given by Xt = e^{Wt+t/2} + e^{Wt−t/2}, t ≥ 0, satisfies the SDE

Solution. Since Xt = e^{Wt−t/2}(e^t + 1), t ≥ 0, set Zt = e^{Wt−t/2} and Yt = e^t + 1, t ≥ 0. Then Z = E(W) (the stochastic exponential of W) satisfies the SDE dZt = Zt dWt, and Y satisfies the SDE dYt = e^t dt. Since Z is a continuous semi-martingale and Y is a continuous process of bounded variation, we have ⟨Z, Y⟩ ≡ [Z, Y] ≡ 0 (P-a.s.). Hence, by the product rule we get

dXt = Zt dYt + Yt dZt = e^{Wt+t/2} dt + Xt dWt.

Also, X₀ = e⁰ + e⁰ = 2.
12.7. Derive the Black-Scholes Equation (12.1.1).

Solution. Given a continuous and continuously differentiable function f over an interval I, its Taylor's series at a point x = a ∈ I is

f(x) = Σ_{n=0}^{∞} (f^{(n)}(a)/n!) (x − a)^n,   (12.9.8)

or

f(x) − f(a) = Σ_{n=1}^{∞} (f^{(n)}(a)/n!) (x − a)^n.   (12.9.9)
df = (df/dS) dS + ½ (d²f/dS²)(dS)² + ··· .   (12.9.11)

By (12.1.2),

dS = S(µ dt + σ dX),
(dS)² = S²(µ²(dt)² + 2µσ dt dX + σ²(dX)²),   (12.9.12)

so that, retaining terms up to first order in dt,

df = (df/dS) dS + ½ (d²f/dS²)(S²σ² dt) = (df/dS)(Sµ dt + Sσ dX) + ½ (d²f/dS²)(S²σ² dt)
 = σS (df/dS) dX + (µS (df/dS) + ½ σ²S² (d²f/dS²)) dt.   (12.9.13)
Itô's lemma relates a small change in a function of a random variable to a small change in the variable itself, as it contains a deterministic component dt and a random component dX. We will, however, need the following multivariate version of Itô's lemma: if f is a function of two variables S, t, then

df = σS (∂f/∂S) dX + (µS (∂f/∂S) + ½ σ²S² (∂²f/∂S²) + ∂f/∂t) dt.   (12.9.14)
We will now derive the Black-Scholes equation (12.1.1). Let V(S,t), which is called C(S,t) for a call and P(S,t) for a put, denote the value of an option, and let r be the interest rate. Using Itô's lemma (12.9.14) we have

dV = σS (∂V/∂S) dX + (µS (∂V/∂S) + ½ σ²S² (∂²V/∂S²) + ∂V/∂t) dt.   (12.9.15)

For the portfolio Π = V − ∆S,

dΠ = σS (∂V/∂S) dX + (µS (∂V/∂S) + ½ σ²S² (∂²V/∂S²) + ∂V/∂t) dt − ∆Sµ dt − ∆Sσ dX
 = σS (∂V/∂S − ∆) dX + (µS (∂V/∂S) + ½ σ²S² (∂²V/∂S²) + ∂V/∂t − µ∆S) dt
 = (½ σ²S² (∂²V/∂S²) + ∂V/∂t) dt,   choosing ∆ = ∂V/∂S.   (12.9.16)

Now, if Π were invested in risk-less assets, it would have a growth of rΠ dt during the interval dt. Then, using (12.9.16), we should get for a fair price

rΠ dt = (½ σ²S² (∂²V/∂S²) + ∂V/∂t) dt   ⟹   r(V − S ∂V/∂S) = ½ σ²S² (∂²V/∂S²) + ∂V/∂t,
(v) P(Ai | B) = P(Ai)P(B | Ai) / Σ_{i=1}^{n} P(Ai)P(B | Ai)   (Bayes' formula);

(vi) P(B) = Σ_{i=1}^{n} P(Ai)P(B | Ai)   (total probability formula).
variable, we have:

                              DRV X                       CRV X
P(X ∈ A)                      Σ_{x∈A} f(x)                ∫_A f(x) dx
E[X] ≡ µ                      Σ_{x∈Ω} x f(x)              ∫_Ω x f(x) dx
F(x) = P(X ≤ x)               Σ_{t≤x} f(t)                ∫_{−∞}^{x} f(t) dt
Var[X] ≡ σ² = E[(X − µ)²]     Σ_{x∈Ω} (x − µ)² f(x)       ∫_Ω (x − µ)² f(x) dx
H(x) = E[ln(1/f(X))]          −Σ_{x∈Ω} f(x) ln f(x)       −∫_Ω f(x) ln f(x) dx
Note that if y = ln x, then y′ = 1/x, x > 0. In general, if y = log_a x, then y′ = 1/(x ln a), a > 1.
For a discrete random variable: E[g(X)] = Σ_{x∈Ω} g(x) f_X(x).
For a continuous random variable: E[g(X)] = ∫_Ω g(x) f_X(x) dx.
The joint distribution of discrete and continuous random variables (X, Y) is defined below and presented in Figure A3. In this table, f_X(x) denotes the marginal distribution, and E[g(X, Y)] the expectation.
Schwarz inequality: (E[XY])² ≤ E[X²]E[Y²].

For independent random variables X and Y:
f(x,y) = f_X(x) f_Y(y),
F(x,y) = F_X(x) F_Y(y),
E[XY] = E[X] E[Y],
Var[X + Y] = Var[X] + Var[Y],
and the density of the sum is the convolution
f_{X+Y}(x) = ∫_{−∞}^{∞} f_X(t) f_Y(x − t) dt.

Correlation coefficient ρ:
ρ = Cov[X, Y] / √(Var[X] Var[Y]),   −1 ≤ ρ ≤ 1;
X and Y independent ⟹ ρ = 0;
Var[X + Y] = Var[X] + Var[Y] + 2ρ√(Var[X] Var[Y]).
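These identities are easy to confirm by simulation; a small sketch (not from the text) using two independent samples:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(1.0, 2.0, 1_000_000)
    Y = rng.exponential(3.0, 1_000_000)      # independent of X

    print(np.mean(X * Y), np.mean(X) * np.mean(Y))     # E[XY] = E[X]E[Y]
    print(np.var(X + Y), np.var(X) + np.var(Y))        # Var[X+Y] = Var[X]+Var[Y]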
A.6 Moments
The kth central moment µk is defined as µk = E[(X − µ)k ].
The skewness γ1 and kurtosis γ2 are defined as
γ1 = µ3 /σ 3 , γ2 = (µ4 /σ 4 ) − 3.
A.7 Convergence
Convergence in probability:
plim_{n→∞} Xn = X ⟺ lim_{n→∞} P(|Xn − X| > ε) = 0 for each ε > 0.
Convergence almost surely:
lim_{n→∞} Xn = X (a.s.) ⟺ P(lim_{n→∞} Xn = X) = 1.
Convergence in distribution:
Xn ↦ X ⟺ lim_{n→∞} P(Xn ≤ x) = P(X ≤ x) for each x such that P(X ≤ x) is continuous in x.
Convergence in mean:
l.i.m._{n→∞} Xn = X ⟺ lim_{n→∞} E[|Xn − X|²] = 0.
For these kinds of convergence we have: convergence almost surely and convergence in mean each imply convergence in probability, which in turn implies convergence in distribution; here plim is the limit in probability, and l.i.m. is the limit in mean.
variables themselves are not normally distributed. In other words, under certain conditions the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a finite expected value and finite variance, will be approximately normally distributed, regardless of the distribution used. A simple example is that if one flips a coin many times, the probability of getting a given number of heads in a series of flips will approximately follow a normal curve, with mean equal to half of the total number of flips in each series. More details are available in Rice [1995].

Let {X₁, ..., Xn} denote a sequence of independent and identically distributed random variables drawn from a distribution with expected value µ and finite variance σ². Consider the sample average

Sn = (X₁ + ··· + Xn)/n

of such random variables. Then by the law of large numbers, this sample average converges in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. The theorem states that as n gets larger, the distribution of the difference √n(Sn − µ) approximates the normal distribution with mean zero and variance σ². For very large n the distribution of Sn becomes close to the normal distribution with mean µ and variance σ²/n.
Theorem A.1. (Lindeberg-Lévy CLT) Suppose {X₁, ..., Xn} is a sequence of independent and identically distributed random variables with E[Xi] = µ and Var[Xi] = σ² < ∞. Then, as n → ∞, the random variables √n(Sn − µ) converge in distribution to a normal N(0, σ²), i.e.,

√n [(1/n) Σ_{i=1}^{n} Xi − µ] → N(0, σ²).
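An empirical illustration of the theorem (not from the text), using exponential summands for which µ = 1 and σ² = 1:

    import numpy as np

    n, reps = 1000, 200_000
    rng = np.random.default_rng(4)
    samples = rng.exponential(1.0, size=(reps, n))     # mu = 1, sigma^2 = 1
    Z = np.sqrt(n) * (samples.mean(axis=1) - 1.0)      # sqrt(n)(S_n - mu)
    print(round(Z.mean(), 3), round(Z.var(), 3))       # ~0 and ~1, as predicted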
Let X and Y denote normed linear spaces over a field F, which may be R or C. The mappings defined by f : Rⁿ ↦ R may, in general, not be linear.
Let L(X, Y ) denote the class of all linear operators from X into Y , and let
B(X, Y ) denote the class of all bounded linear operators.
Proof. The proof follows from the definition (B.2). Suppose that F(x₀) = f′(x₀), let ε > 0, and let h ∈ X. Then there exists a δ > 0 such that

‖f(x₀ + th) − f(x₀) − F(x₀)(th)‖ < ε‖th‖,

provided that ‖th‖ < δ and th ≠ 0. But this implies that

‖(f(x₀ + th) − f(x₀))/t − F(x₀)h‖ < ε‖h‖,   (B.4)

provided that 0 < |t| < δ/‖h‖.
The fact still remains that the gradient is defined as ∇f(x) = Df(x)[e], i.e.,

∇f(x) = (∂f(x)/∂x₁) e₁ + ··· + (∂f(x)/∂xn) en = (∂f(x)/∂x₁ ··· ∂f(x)/∂xn) [e]ᵀ,   (B.6)

where [e] = (e₁, ..., en), and the vector [e]ᵀ cannot just be removed from this definition unless ∇f occurs in an equation that satisfies Condition A of Theorem 2.18.
According to Smith [1985:71], using the scalar product of ∇f(x) and the increment h, the gradient is defined implicitly by

⟨∇f(x), h⟩ = lim_{t→0} [f(x + th) − f(x)]/t.   (B.7)

If this limit exists for all directions h in an inner product space, then ∇f(x) is simply described as the gradient of f at x. In the case of Fréchet differentiation, the gradient, if it exists, can be expressed as

⟨∇f(x), h⟩ = (d/dt) f(x + th)|_{t=0},   (B.8)
with respect to the usual inner product. The second Gateaux variation is
d²f(x, h) = (d²/dt²) f(x₁ + th₁, ..., xn + thn)|_{t=0} = ⟨Hh, h⟩,   (B.10)

where h = (h₁, ..., hn), and H = [∂²f(x₁, ..., xn)/∂xi ∂xj] is the Hessian of f. Then ⟨Hh, h⟩ = hᵀHh.
where Rn = f^{(n)}(ξ)(x − a)^n / n! is the remainder after n terms, and a < ξ < x. If Rn → 0 as n → ∞, the infinite series is called the Taylor series about x = a.
In R², let x = (x, y) and a = (a, b). Then the second-order approximation of Taylor's series expansion has the form (B.11) below with n = 2, where fi = ∂f/∂xi and fij = (∂/∂xi)(∂f/∂xj) for i, j = 1, 2.

In Rⁿ, where x = (x₁, ..., xn) and a = (a₁, ..., an), the second-order approximation of Taylor's series expansion is

f(x) ≈ f(a) + Σ_{i=1}^{n} fi(a)(xi − ai) + (1/2!) Σ_{i,j=1}^{n} fij(a)(xi − ai)(xj − aj),   (B.11)

where fi = ∂f/∂xi and fij = (∂/∂xi)(∂f/∂xj) = (∂/∂xj)(∂f/∂xi) if f is continuous, for i, j = 1, 2, ..., n. Note that the first two terms in (B.11) give the first-order approximation of the Taylor's series expansion at x = a in Rⁿ.
C
Distributions
C.1 Definitions
We will follow the convention of denoting a random variable by an upper case
letter, e.g., X, and using the corresponding lower case letter, e.g., x, for a
particular value of that variable.
A real-valued function F(x) is called a (univariate) cumulative distribution function (c.d.f.), or simply a distribution function, or distribution, if (i) F(x) is nondecreasing, i.e., F(x₁) ≤ F(x₂) for x₁ ≤ x₂; (ii) F(x) is everywhere continuous from the right, i.e., F(x) = lim_{ε→0⁺} F(x + ε); and (iii) F(−∞) = 0, F(∞) = 1.

The function F(x) describes the probability of the event X ≤ x, i.e., the probability P{X ≤ x} = F(x). There are two principal types of distributions: discrete and continuous.
Discrete Distributions. They are characterized by the random variable X taking on an enumerable number of values ..., x₋₁, x₀, x₁, ... with point probabilities pn = P{X = xn} ≥ 0, subject only to the restriction Σ_n pn = 1. In this case the distribution is written as

F(x) = P{X ≤ x} = Σ_{xn ≤ x} pn,   (C.1)

where the summation is taken over all values of n for which xn ≤ x. The set {xn} of values for which pn > 0 is called the domain of the random variable X.
A discrete distribution of a random variable is called a lattice distribution
if there exist numbers a and b 6= 0 such that every possible value of X can be
represented in the form a + nb, where n takes only integer values.
Continuous Distributions. They are characterized by F(x) being absolutely continuous. Thus, F(x) possesses a derivative F′(x) = f(x), and the derivative f(x) is called the probability density function (p.d.f.) or frequency function; the values of x for which f(x) > 0 make up the domain of the random variable X.
is the set of all negative real numbers. Unlike the case of the mirror-image
Pareto distribution, we cannot calculate closed form expressions for the c.d.f.
or the d(x) function for this distribution. But it is known that both these
functions are non-monotone because the failure rate and the mean residual
lifetime function are non-monotone.
5. Logistic Distribution. c.d.f. F(x) = 1/(1 + e^{−x}); density f(x) = e^{−x}/(1 + e^{−x})². Also, we have (ln f(x))′ = −1 + 2(1 − F(x)), and (ln f(x))″ = −2f(x) < 0; hence, this distribution has a log-concave density.

6. Extreme-value Distribution. The density function is f(x) = e^{−x} exp{−e^{−x}}, giving (ln f(x))″ = −e^{−x} < 0; hence this distribution has a log-concave density. This distribution arises as the limit as n → ∞ of the greatest value among n independent random variables.
7. Chi-square Distribution with n Degrees of Freedom. It is a gamma distribution with θ = 2 and m = n/2. Since the sum of the squares of n independent standard normal random variables has a chi-square distribution with n degrees of freedom, and since the gamma distribution has a log-concave density function for m ≥ 1, the sum of the squares of two or more independent standard normal variables has a log-concave density function.

8. Chi Distribution. Its support is {x : x > 0}; the density function is

f(x) = x^{n−1} e^{−x²/2} / (2^{(n/2)−1} Γ(n/2)),
10. Laplace Distribution. It has density function f(x) = (λ/2)e^{−λ|x|}, where λ > 0; the c.d.f. is

F(x) = ½e^{λx} if x < 0;   1 − ½e^{−λx} if x ≥ 0.

The density function is sometimes known as the double exponential, since it is proportional to the exponential density for positive x and to the mirror-image of the exponential density for negative x. Also, ln f(x) = ln(λ/2) − λ|x|, which is clearly a concave function, although its derivative (ln f(x))′ does not exist at x = 0.
11. Weibull Distribution with Parameter c > 0. The density function is f(x) = cx^{c−1}e^{−x^c}, x ∈ (0, ∞). Also,

(ln f(x))″ = (1 − c)x^{−2}(1 + cx^c),   which is < 0 for c > 1, = 0 for c = 1, and > 0 for c < 1.

Thus, the density function is (strictly) log-concave if c > 1, log-linear if c = 1, and log-convex if 0 < c < 1. Further, the reliability function F̄(x) = 1 − F(x) = e^{−x^c}, giving (ln F̄(x))″ = −c(c − 1)x^{c−2}, which is positive for c < 1 and nonpositive for c ≥ 1. Thus, the reliability function is log-concave for c ≥ 1 and log-convex for c < 1. For this distribution with 0 < c < 1 the failure rate is a decreasing function of age.
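The sign analysis of (ln f(x))″ can be verified symbolically; a sketch (not from the text) using SymPy:

    import sympy as sp

    x, c = sp.symbols('x c', positive=True)
    f = c * x**(c - 1) * sp.exp(-x**c)            # Weibull density
    second = sp.simplify(sp.diff(sp.log(f), x, 2))
    print(second)   # equivalent to (1 - c)(1 + c*x**c)/x**2
    # Negative for c > 1 (log-concave), zero for c = 1, positive for c < 1.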
12. Power Function Distribution. Its c.d.f. is F(x) = x^β, with support (0, 1); the density function is f(x) = βx^{β−1}, giving (ln f(x))″ = (1 − β)x^{−2}, so that the density function is log-concave if β ≥ 1 and log-convex if 0 < β < 1. This distribution has a log-concave c.d.f. for all positive β, since (ln F(x))″ = −βx^{−2} < 0. The difference function is d(x) = ∫₀^x F(t)/F(x) dt = x/(β + 1); thus, d(x) is monotone increasing for all β ≥ 0, because log-concavity of F(x) implies that d(x) is monotone increasing. Moreover, the reliability function is F̄(x) = 1 − x^β, giving

(ln F̄(x))″ = βx^{β−2}(1 − β − x^β) / (1 − x^β)²,

which has the same sign as 1 − β − x^β; thus, this expression is positive for x near zero and negative for x near 1. Hence, the reliability function is neither log-concave nor log-convex on (0, 1). The right-side integral of the reliability function is R(x) = [β − (β + 1)x + x^{β+1}]/(1 + β), which is neither log-concave nor log-convex.
13. Gamma Distribution. Its density function is

f(x) = x^{m−1} θ^m e^{−xθ} / Γ(m),   x ∈ (0, ∞), θ > 0, m > 0.

Then (ln f(x))″ = (1 − m)/x². Thus, the density function is strictly log-concave for m > 1, but strictly log-convex for m < 1, and in this case f′(x) < 0 for all x > 0. Therefore, the c.d.f. is log-concave, and by Theorem 8.7, the left-side integral of the c.d.f. is log-concave. Barlow and Proschan [1981: 75] have shown that for m < 1, the failure rate is a monotone decreasing function.
where B(a, b) is the beta function and n is the number of degrees of freedom. Then, since (ln f(x))″ = −(n + 1)(n − x²)/(n + x²)², the density function is log-concave on the central interval [−√n, √n] but log-convex on each of the outer intervals (−∞, −√n] and [√n, ∞). Thus, although this distribution is itself not log-concave, one truncated to the interval [−√n, √n] is log-concave. There does not exist any proof for the log-concavity or log-convexity of the c.d.f., but numerical computations, using the program GAUSS, show that the c.d.f. is neither log-concave nor log-convex for the cases n = 1, 2, 3, 4 and 24. Since this distribution is symmetric, the log of the reliability function is the mirror-image of the log of the c.d.f., and hence, as the c.d.f. is neither log-concave nor log-convex, the same holds for the reliability function.
18. Cauchy Distribution. It is a Student's t distribution with one degree of freedom, and is equal to the ratio of two independent standard normal random variables. The density function is f(x) = 1/(π(1 + x²)), and the c.d.f. is F(x) = ½ + arctan(x)/π; then

(ln f(x))″ = 2(x² − 1)/(x² + 1)²,

which is negative for |x| < 1 and positive for |x| > 1. Thus, the density function is neither log-concave nor log-convex. Since the integral ∫_{−∞}^{x} F(t) dt does not converge, the function G is not well-defined.
19. F -Distribution. It has support as the set of positive real numbers. It
has two integer-valued parameters m1 and m2 , known as ‘degrees of freedom.’
The density function is
D.1 Notation
The Laplace transform is defined as

L{f(t)} ≡ F(s) = f̄(s) = ∫₀^∞ f(t)e^{−st} dt.   (D.1)

The second property is very useful; it is based on the Leibniz rule, which states that if g(x,t) is an integrable function of t for each value of x, and the partial derivative ∂g(x,t)/∂x exists and is continuous in the region under consideration, and if f(x) = ∫_a^b g(x,t) dt, then f′(x) = ∫_a^b (∂g(x,t)/∂x) dt.
A table of some useful Laplace transform pairs is given at the end of this
Appendix (Table D.1, p. 340); larger tables are available in reference books
on mathematical formulas, e.g., in Abramowitz and Stegun [1972].
We will now explain certain techniques to derive Laplace transforms from
known transform pairs.
Example D.1. Consider (see formula 19, Table D.1)

L{e^{at}} = 1/(s − a).   (D.3)

Differentiating both sides with respect to a, we get

L{te^{at}} = 1/(s − a)².   (D.4)

Similarly,

L{t²e^{ibt}} = 2!/(s − ib)³ = 2(s + ib)³/(s² + b²)³,

and equating the real and imaginary parts of this equality, we obtain

L{t² cos bt} = 2(s³ − 3sb²)/(s² + b²)³,   (D.6)

and

L{t² sin bt} = 2(3s²b − b³)/(s² + b²)³.   (D.7)

The Laplace transforms L{e^{at}t² cos bt} and L{e^{at}t² sin bt} can then be easily obtained.
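These pairs can be verified symbolically; an illustrative sketch (not part of the text) using SymPy's laplace_transform:

    import sympy as sp

    t, s = sp.symbols('t s', positive=True)
    a, b = sp.symbols('a b', positive=True)

    print(sp.laplace_transform(t * sp.exp(a * t), t, s, noconds=True))
    # (s - a)**(-2), matching (D.4)
    print(sp.simplify(sp.laplace_transform(t**2 * sp.cos(b * t), t, s,
                                           noconds=True)))   # matches (D.6)
    print(sp.simplify(sp.laplace_transform(t**2 * sp.sin(b * t), t, s,
                                           noconds=True)))   # matches (D.7)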
Example D.2. Consider

L^{−1}{e^{−a√s}/s} = erfc(a/(2√t)),   (D.8)

where

erf(x) = (2/√π) ∫₀^x e^{−u²} du,   and   erfc(x) = 1 − erf(x) = (2/√π) ∫_x^∞ e^{−u²} du.
Then, after changing the order of integration and the Laplace inversion and carrying out the integration on the left side, we get

∫₀^a L^{−1}{e^{−x√s}/s} dx = L^{−1}{s^{−3/2} − s^{−3/2}e^{−a√s}},   (D.10)

which, in view of the convolution theorem (Property (i)) with F(s) = 1/s and G(s) = e^{−a√(s+c)}, yields

L^{−1}{e^{−a√(s+c)}/s} = ∫₀^t (a/(2√(πu³))) e^{−cu−a²/(4u)} du.   (D.12)
3
Since

a/(2√u³) = [a/(4√u³) + ½√(c/u)] + [a/(4√u³) − ½√(c/u)],

and

cu + a²/(4u) = (√(cu) + a/(2√u))² − a√c = (√(cu) − a/(2√u))² + a√c,

we define x = a/(2√u) + √(cu) and y = a/(2√u) − √(cu), and use the notation

x₁ = a/(2√t) + √(ct),   and   y₁ = a/(2√t) − √(ct).

Hence,

L^{−1}{e^{−a√(s+c)}/s} = ½[e^{a√c} erfc(a/(2√t) + √(ct)) + e^{−a√c} erfc(a/(2√t) − √(ct))].   (D.13)
L^{−1}{G(s)} = g(t) = Σ_{k=1}^{∞} g_k(t),   (D.14)
B = T₀e^{ml}/(2s sinh ml),   and   A = T₀/s − B.

Then

T̄ = (T₀/s) e^{−ml} Σ_{n=0}^{∞} [e^{m(l−x)} − e^{−m(l−x)}] e^{−2nml} = (T₀/s) Σ_{n=0}^{∞} [e^{−m(2nl+x)} − e^{−m[(2n+2)l−x]}],

so that

T = T₀ Σ_{n=0}^{∞} [erf((2(n+1)l − x)/(2√(kt))) − erf((2nl + x)/(2√(kt)))].

Alternatively, we can use the Cauchy residue theorem and obtain the solution in terms of the Fourier series. Thus,
Proof. Consider the rectangle in Figure D.1. Choose β > |γ| such that z₀ lies inside this rectangle. By the Cauchy integral formula, we have

∫_Γ f(z)/(z − z₀) dz = 2πi f(z₀),   (D.20)

where Γ is the contour ABCDA. Let S denote the contour ABCD; then

∫_Γ f(z)/(z − z₀) dz = ∫_{DA} f(z)/(z − z₀) dz + ∫_S f(z)/(z − z₀) dz.

Since ∫_{DA} f(z)/(z − z₀) dz = −∫_{AD} f(z)/(z − z₀) dz, we get from (D.20)

−∫_{γ−iβ}^{γ+iβ} f(z)/(z − z₀) dz + ∫_S f(z)/(z − z₀) dz = 2πi f(z₀).   (D.21)

On S,

|f(z)/(z − z₀)| = (|f(z)|/|z|) · 1/|1 − z₀/z| ≤ (M/|z|^{k+1}) · 1/|1 − z₀/z| ≤ 2M/β^{k+1},

so that

|∫_S f(z)/(z − z₀) dz| < (2M/β^{k+1}) ∫_S |dz| = (2M/β^{k+1})(length of S) = (2M/β^{k+1})(4β − 2γ) = (2M/β^k)(4 − 2γ/β).

Thus, lim_{β→∞} ∫_S f(z)/(z − z₀) dz = 0. Hence, from (D.21),

−∫_{γ−i∞}^{γ+i∞} f(z)/(z − z₀) dz = 2πi f(z₀),

or

F(s) = (1/(2πi)) ∫_{γ−i∞}^{γ+i∞} F(z)/(s − z) dz.   (D.22)
The proof of Theorem 6.2 for the Laplace transform now becomes elemen-
tary. By taking the Laplace inverse of both sides of Eq ( D.22), we have
Proof. Consider the integral over the arc BB′. Let the angle BOC′ be denoted by α. On BB′ we have z = Re^{iθ}, where θ varies from α to π/2, α = cos^{−1}(γ/R), and γ = OC′. Then we get

|∫_{BB′} e^{zt} f(z) dz| < ∫_α^{π/2} CR^{−k} e^{Rt cos θ} R dθ = CR^{−k+1} ∫_α^{π/2} e^{Rt cos θ} dθ ≤ CR^{−k+1} ∫_α^{π/2} e^{γt} dθ
 = CR^{−k+1}(π/2 − α)e^{γt} = CR^{−k+1} e^{γt} sin^{−1}(γ/R) → 0   as R → ∞.

Similarly, ∫_{A′A} e^{zt} f(z) dz → 0 as R → ∞.

Let us now consider the integral over the arc B′CA′. By following the above procedure, we get

|∫_{B′CA′} e^{zt} f(z) dz| < CR^{−k+1} ∫_{π/2}^{3π/2} e^{Rt cos θ} dθ = CR^{−k+1} ∫₀^π e^{−Rt sin φ} dφ,   where θ = π/2 + φ,
 = 2CR^{−k+1} ∫₀^{π/2} e^{−Rt sin φ} dφ ≤ 2CR^{−k+1} ∫₀^{π/2} e^{−2Rtφ/π} dφ = (πCR^{−k}/t)(1 − e^{−Rt}) → 0   as R → ∞.

Hence, ∫_Γ e^{zt} f(z) dz → 0 as R → ∞, provided that t > 0.

The justification for using the inequality e^{−Rt sin φ} ≤ e^{−2Rtφ/π} in the penultimate step is as follows: the function g(φ) = sin φ − 2φ/π ≥ 0 for 0 ≤ φ ≤ π/2, with g(0) = 0 = g(π/2), has only one critical point, at φ = cos^{−1}(2/π), which gives a maximum.
This result enables us to convert the integral (1/(2πi)) ∫_{γ−iβ}^{γ+iβ} F(z) e^{zt} dz into an integral over the contour (−Γ).
Proof. Using the definition (D.24) for F(s) and G(s), we get

F(s)G(s) = [∫₀^∞ y e^{−s²y²} f(y) dy][∫₀^∞ x e^{−s²x²} g(x) dx]
 = ∫₀^∞ ∫₀^∞ xy e^{−s²(x²+y²)} f(y)g(x) dx dy
 = ∫₀^∞ t e^{−s²t²} {∫₀^t x g(x) f(√(t² − x²)) dx} dt
 = L₂{∫₀^t x g(x) f(√(t² − x²)) dx},
L₂{∫₀^∞ f(τ)Φ(t,τ) dτ} = F(q(s)) Φ(s),   (D.28)

since

L₂{∫₀^∞ f(τ)Φ(t,τ) dτ} = ∫₀^∞ t e^{−s²t²} {∫₀^∞ f(τ)Φ(t,τ) dτ} dt
 = ∫₀^∞ f(τ) {∫₀^∞ t e^{−s²t²} Φ(t,τ) dt} dτ = Φ(s) ∫₀^∞ f(τ) τ e^{−τ²q²(s)} dτ = Φ(s) F(q(s)).
More details about the L₂-transform can be found in Yurekli and Wilson [2002], [2003].
D.3 Exercises

D.1. The formula (D.9) can be obtained as follows (Churchill [1972]): Define y = e^{−a√s}/√s and z = e^{−a√s}. Then

y′ = dy/ds = −(1/(2s^{3/2}))e^{−a√s} − (a/(2s))e^{−a√s},

which yields 2sy′ + y + az = 0. Similarly, z′ = −(a/(2√s))e^{−a√s} yields 2z′ + ay = 0. Taking the inverse transform of these equations, we get two equations in F and G, where L^{−1}{y} = F(t) and L^{−1}{z} = G(t). From these we get F′ = (a²/(4t²) − 1/(2t))F, whose solution is F = (A/√t) e^{−a²/(4t)}, which gives G = (aA/(2√t³)) e^{−a²/(4t)}. Note that if a = 0, then y = 1/√s, and F(t) = 1/√(πt) implies that A = 1/√π. Hence,

F(t) = (1/√(πt)) e^{−a²/(4t)},   G(t) = (a/(2√(πt³))) e^{−a²/(4t)}.

Then we integrate L^{−1}{e^{−a√s}/√s} = (1/√(πt)) e^{−a²/(4t)} with respect to a from 0 to a and obtain L^{−1}{e^{−a√s}/s}, using L{1/√t} = √(π/s) (formula 12, Table D.1).
D.2. Find (a) L^{−1}{cosh(a√s)/(s cosh(b√s))}, and (b) L^{−1}{sinh(a√s)/sinh(b√s)}, b > a > 0.

Hint. Use cosh x = (e^x + e^{−x})/2, sinh x = (e^x − e^{−x})/2, and (1 + z)^{−1} = Σ_{n=0}^{∞} (−1)^n z^n.

Ans. (a) Σ_{n=0}^{∞} (−1)^n [erfc(((2n+1)b − a)/(2√t)) + erfc(((2n+1)b + a)/(2√t))];

(b) Σ_{n=0}^{∞} [ (((2n+1)b − a)/√(4πt³)) e^{−[(2n+1)b−a]²/(4t)} − (((2n+1)b + a)/√(4πt³)) e^{−[(2n+1)b+a]²/(4t)} ].
D.3. Show that L{t^n} = Γ(n+1)/s^{n+1}, where Γ(x) is the gamma function.

Ans. We have L{t^n} = ∫₀^∞ t^n e^{−st} dt = (1/s^{n+1}) ∫₀^∞ x^n e^{−x} dx = Γ(n+1)/s^{n+1}, where we have set st = x.
D.4. Solve the partial differential equation u_tt = u_xx, with the initial conditions u(x,0) = −(1 − x)²/2, u_t(x,0) = 0, and the boundary conditions u_x(0,t) = 1 and u_x(1,t) = 0.

Ans. u = −½t² − ½(1 − x)².
D.5. Solve the partial differential equation u_t = u_xx, with the initial condition u(x,0) = 0 and the boundary conditions u_x(0,t) = 0 and u(1,t) = 1.

Hint. The solution in the transform domain is ū = cosh(x√s)/(s cosh √s). Find two different inverses of this solution, by expanding the solution in a series of the type shown in Example D.7 and by the residue theorem.

Ans. u = Σ_{n=0}^{∞} (−1)^n [erfc((2n + 1 − x)/(2√t)) + erfc((2n + 1 + x)/(2√t))], or

u = 1 − Σ_{n=0}^{∞} (−1)^n (4 cos((2n+1)πx/2) / ((2n+1)π)) e^{−(2n+1)²π²t/4}.
D.8. Use contour integration to evaluate L^{−1}{e^{−a√s}/s}.

Solution. Using the Laplace inversion formula (D.2), we have

f(t) = (1/(2πi)) ∫_{c−i∞}^{c+i∞} e^{st} (e^{−a√s}/s) ds.   (D.29)
Consider the Bromwich contour MABC₁CDL (Figure D.3). Then by Cauchy's theorem we get

I = ∫_{c−i∞}^{c+i∞} (e^{−a√s}/s) e^{st} ds = ∫_{LD} F(s) ds + ∫_{DC} F(s) ds + ∫_{C₁} F(s) ds + ∫_{BA} F(s) ds + ∫_{AM} F(s) ds.

The integral over the circle C₁ is easily shown to be equal to 2πi. This is done by taking the radius to be ε and substituting s = εe^{iθ}. On BA, s = ue^{iπ}, and

I_{BA} = ∫_{ε→0}^{R→∞} (1/(ue^{iπ})) e^{−a√u e^{iπ/2} + ut e^{iπ}} e^{iπ} du = ∫₀^∞ (1/u) e^{−ia√u − ut} du
 = ∫₀^∞ (1/u) e^{−ut} (cos a√u − i sin a√u) du = 2∫₀^∞ (1/v) e^{−v²t} (cos av − i sin av) dv,

where u = v². Similarly, ∫_{CD} = −2∫₀^∞ (1/v) e^{−v²t} (cos av + i sin av) dv. Hence,

∫_{CD} + ∫_{BA} = −4i ∫₀^∞ (1/v) e^{−v²t} sin av dv.
In order to evaluate the integral ∫₀^∞ (1/v) e^{−v²t} sin av dv, we consider the integral ∫₀^∞ e^{−v²t} cos av dv. Then

∫₀^∞ e^{−v²t} cos av dv = ℜ ∫₀^∞ e^{−v²t + iav} dv = ℜ[ e^{−a²/(4t)} ∫₀^∞ e^{−(v√t − ia/(2√t))²} dv ]
 = ℜ[ (e^{−a²/(4t)}/√t) ∫_{−ia/(2√t)}^{∞} e^{−w²} dw ],   where w = v√t − ia/(2√t),
 = ℜ[ (e^{−a²/(4t)}/√t) ( ∫₀^∞ e^{−w²} dw + ∫_{−ia/(2√t)}^{0} e^{−w²} dw ) ].

Hence,

∫₀^∞ e^{−v²t} cos av dv = √π e^{−a²/(4t)} / (2√t).

Integrating both sides of this equation with respect to a from 0 to a, we get

∫₀^∞ (1/v) e^{−v²t} sin av dv = √(π/(4t)) ∫₀^a e^{−x²/(4t)} dx = (π/2) erf(a/(2√t)).

Thus

L^{−1}{e^{−a√s}/s} = (1/(2πi)) [2πi − 4i (π/2) erf(a/(2√t))] = erfc(a/(2√t)).   (D.30)
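Formula (D.30) can also be cross-checked numerically; the sketch below (not from the text) uses mpmath's invertlaplace routine as an independent numerical inverter:

    import mpmath as mp

    a = 1.0
    F = lambda s: mp.exp(-a * mp.sqrt(s)) / s
    for t in (0.5, 1.0, 2.0):
        num = mp.invertlaplace(F, t, method='talbot')   # numerical inverse
        exact = mp.erfc(a / (2 * mp.sqrt(t)))           # right side of (D.30)
        print(t, mp.nstr(num, 8), mp.nstr(exact, 8))    # the columns agree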
D.9. Determine f(x,t) ≡ L₂^{−1}{e^{−x√(s²+a²)} / (2s²(s² − b))}.

Solution. Let F(s) = e^{−x√(s²+a²)}/(2s²(s² − b)), which gives 2F(√s) = e^{−x√(s+a²)}/(s(s − b)). If we denote x√(s + a²) by z and use e^{−z} = (2/√π) ∫₀^∞ e^{−y² − z²/(4y²)} dy, we find that

f(x,t) = (1/(2πi)) ∫_{c−i∞}^{c+i∞} [e^{−x√(s+a²)}/(s(s − b))] e^{st²} ds
 = (1/(2πi)) ∫_{c−i∞}^{c+i∞} (1/(s(s − b))) {(2/√π) ∫₀^∞ e^{−y² − x²(s+a²)/(4y²)} dy} e^{st²} ds
 = ∫₀^∞ e^{−y² − a²x²/(4y²)} {(1/(2πi)) ∫_{c−i∞}^{c+i∞} e^{(t + t² − x²/(4y²))s}/(s(s − b)) ds} dy
 = (1/b) ∫₀^∞ e^{−y² − a²x²/(4y²)} [e^{b(t − x²/(4y²) + t²)} − 1] H(t − x²/(4y²) + t²) dy
 = (1/b) ∫_A^∞ e^{−y² − (a²+1)x²/(4y²) + b(t+t²)} dy − (1/b) ∫_A^∞ e^{−y² − a²x²/(4y²)} dy,

where A = x/(2√(t + t²)), H is the Heaviside function, and in the last two lines above we have used the formulas L^{−1}{e^{−as}/s} = H(t − a) and L^{−1}{e^{−as}/(s − b)} = e^{b(t−a)} H(t − a) (see Table D.1).
D.10. Show that L₂{sin(tτ)} = (√π τ/(4s³)) e^{−τ²/(4s²)}. Hint. Use the definition (D.24).
D.11. Use the L₂-transform to solve the singular integral equation

(2/π) ∫₀^∞ f(y) sin(ty) dy = erf(t/(2a)),   a ∈ R,

which, on applying the L₂-transform to both sides, simplifies to F(s) = ½√(π/(a² + s²)). On inverting the L₂-transform (using (D.26)) we obtain the required solution as f(t) = e^{−a²t²}/t. Note that erf(x) = (2/√π) ∫₀^x e^{−t²} dt.
D.12. Use the Efros theorem to find L₂{∫₀^∞ erfc(τ²/(2x)) dτ}.

Solution. Using (D.28) we get

L₂{∫₀^∞ erfc(τ²/(2x)) dτ} = L₂{∫₀^∞ τ erfc(τ²/(2x)) (1/τ) dτ} = [(1/s²) L₂{1/τ}]_{s→√s} = √π/(4s^{5/4}),

which, using (D.26) for F(√s) = √π/(4s^{5/4}), yields

L₂^{−1}{√π/(4s^{5/4})} = (√π/(2Γ(5/4))) x^{1/4}.
∂g/∂x_j(x) = −[∂f/∂y(x, g(x))]^{−1} (∂f/∂x_j)(x, g(x)).
This is not strictly convex because (1,0) ≽ (0,1) and (1,0) ≠ (0,1), but ½(1,0) + ½(0,1) = (½, ½) ≽ (0,1). Finally, if u represents ≽, then (i) ≽ is convex if u is quasi-concave, and (ii) ≽ is strongly convex if u is strictly quasi-concave.

For more details, see Richter [1971].
Bibliography
(Note: First author or single author is cited with last name first.)
78(6): 616-631.
Krugman, P. 1991. Increasing returns and economic geography. J. Polit.
Econ. 99: 483-499.
Kuhn, H. W. 1976. Nonlinear Programming. A historical review, in Nonlinear
Programming, (R. W. Cottle and C. E. Lemke, eds.), Vol 9, SIAM-AMS
Proceedings, pages 1-26. American Mathematical Society.
——— , and A. W. Tucker. 1951. Nonlinear programming. Proceedings of
2nd Berkeley Symposium. Berkeley, University of California Press. pp.
481-492.
Kythe, Prem K. 2011. Green’s Functions and Linear Differential Equations:
Theory, Applications, and Computation. Taylor & Francis Group/CRC
Press.
Lay, S. R. 1982. Convex Sets and Their Applications. New York, NY: John
Wiley.
Lekkerkerker, C. G. 1953. A property of logarithmic concave functions, I, II
Indag. Math. 15: 505-521.
Lipschutz, S. 1968. Linear Algebra. New York, NY: McGraw-Hill.
Luenberger, D. G. 1968. Quasi-convex programming. SIAM Journal on Ap-
plied Mathematics. 16(5).
——— . 1984. Linear and Nonlinear Programming. Addison-Wesley.
Mangasarian, O. L. 1969/1994. Nonlinear Programming. New York, NY:
McGraw-Hill. Reprinted as Classics in Applied Mathematics. SIAM, 1994.
Markowitz, H. 1952. Portfolio selection. The Journal of Finance. 7(1): 77-91.
Marsden, Jerrold E., and Anthony J. Tromba. 1976. Vector Calculus. San Francisco: W. H. Freeman.
Marshall, A., and I. Olkin. 1979. Inequalities: Theory of Majorization and Its Applications. New York: Academic Press.
Martin, D. H. 1985. The essence of invexity. J. Optim. Theory Appl. 47:
65-76. doi:10.1007/BF00941316.
Martos, B. 1969. Subdefinite matrices and quadratic forms. SIAM J. Appl.
Math. 17: 1215-1233.
——— . 1971. Quadratic programming with quasi-convex objective function.
Opns. Res. 19: 87-97.
——— . 1975. Nonlinear Programming Theory and Methods. North-Holland.
Mas-Colell, A., M. D. Whinston, and J. R. Green. 1995. Microeconomic
Theory. Oxford: Oxford University Press.
Merkle, Milan. 1998a. Convolutions of logarithmically concave functions.
Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Math. 9: 113-117.
——— . 1998b. Logarithmic concavity of distribution functions. Interna-
tional Memorial Conference “S. S. Mitrinovic” Nis., 1996 collection, in G.
V. Milovanovič (ed.) Recent Progress in Inequalities. Dordrecht: Kluwer
Academic Publishers. pp. 481-484.
Meyer, C.D. 2000. Matrix Analysis and Applied Linear Algebra. Society for
Industrial and Applied Mathematics.
Michel, Anthony N., and Charles J. Herget. 2007. Algebra and Analysis. Boston, MA: Birkhäuser.
Nocedal, J., and S. J. Wright. 1999. Numerical Optimization. Springer.
Moyer, Herman. 1969. Introduction to Modern Calculus. New York, NY:
McGraw-Hill.
Muth, E. 1977. Reliability models with positive memory derived from the
mean residual life function, in Theory and Applications of Reliability, vol.
II, C. Toskos and I. Shimi, eds. New York: Academic Press, pp. 401-436.
Nesterov, Yurii. 2004. Introductory Lectures on Convex Optimization: A
Basic Course. (Applied Optimization). New York: Springer Science +
Business Media.
Nicholson, Walter. 1978. Microeconomic Theory. 2nd ed. Hinsdale: Dryden
Press.
Niculescu, C. P. 2000. A new look at Newton’s inequalities. J. Inequal Pure
Appl. Math. 1, issue 2, article 17; also http://jipam.vu.edu.au/.
——— , and Lars-Erik Persson. 2006. Convex Functions and Their Applications. (CMS Books in Mathematics). New York: Springer.
Nielsen, Lars Tyge. 1993. Understanding N (d1 ) and N (d2 ): Risk-adjusted
probabilities in the Black-Scholes model. Revue Finance (Journal of the
French Finance Association). 14(1): 95-106.
Osserman, Robert. 1968. Two-Dimensional Calculus. New York: Harcourt, Brace & World.
Patel, J. K., C. H. Kapadia, and D. B. Owen. 1976. Handbook of Statistical
Distributions. New York: Marcel Dekker.
Peajcariaac, Josep E., and Y. L. Tong. 1992. Convex Functions, Partial
Orderings, and Statistical Applications. Mathematics in Science & Engi-
neering. Boston, MA: Academic Press.
Pečarić, Josep E., Frank Proschan, and Y. L. Tong. 1992. Convex Functions,
Partial Orderings, and Statistical Applications. Mathematics in Science
and Engineering 187. Boston, MA: Academic Press.
Phelps, Robert R. 1993. Convex Functions, Monotone Operators and Differ-
entiability. Lecture Notes in Mathematics.
Polyak, B. T. 1987. Introduction to Optimization. Optimization Software.
Translated from Russian.
Ponstein, J. 1967. Seven kinds of convexity. SIAM Review. 9(1): 115-119.
Prékopa, András. 1971. Logarithmic concave measures with applications to stochastic programming. Acta Scientiarum Mathematicarum. 32: 301-316.
Råde, Lennart, and Bertil Westergren. 1995. Mathematical Handbook for Science and Engineering. Boston, MA: Birkhäuser.
Rice, John. 1995. Mathematical Statistics and Data Analysis, 2nd ed.,
Duxbury Press.
Springer.
Todd, M. J. 2001. Semidefinite optimization. Acta Numerica. 10: 515-560.
Valentine, F. A. 1964. Convex Sets. New York, NY: McGraw-Hill.
Vandenberghe, L., and S. Boyd. 1995. Semidefinite programming. SIAM
Review, 49-95.
van Tiel, J. 1984. Convex Analysis. An Introductory text. New York, NY:
John Wiley.
Varian, Hal R. 1982. The nonparametric approach to demand analysis. Econo-
metrica. 50:945-973.
——— . 1992. Microeconomic Analysis. 3rd ed. New York: Norton.
Veblen, Thorstein B. 1899. The Theory of the Leisure Class: An Economic
Study of Institutions. London: Macmillan.
von Neumann, J. 1928. Zur Theorie der Gesellschaftsspiele. Math. Annalen.
100: 295-320.
——— . 1945-46. A model of general economic equilibrium. Review of Eco-
nomic Studies. 13: 1-9.
——— , and O. Morgenstern. 1953. Theory of Games and Economic Behavior. 3rd ed. Princeton University Press; 1st ed. 1944.
Wang, Y. Linear transformations preserving log-concavity. Linear Algebra
Appl. 359: 161-167.
——— , and Y.-N. Yeh. 2005. Log-concavity and LC-positivity. Available at
archiv:math.CO/0504164.
Webster, R. 1994. Convexity. Oxford University Press.
Whittle, P. 1971. Optimization under Constraints. New York, NY: John
Wiley.
Wilf, H. S. 1994. Generating Functionology. 2nd ed. Boston, MA: Academic
Press.
Wilmott, P. , S. Howison, and J. Dewynne. 1995. The Mathematics of Finan-
cial Derivatives: A Student Introduction. Cambridge, U.K.: Cambridge
University Press.
Wilson, C. 2012. Concave functions of a single variable. ECON-UA6, New York University. http://homepages.nyu.edu/caw1, Feb 21, 2012.
Wolfe, P. 1959. The simplex method for quadratic programming. Econometrica. 27: 382-398.
Yurekli, O., and A. Sadek. 1991. Parseval-Goldstein type theorem on the Widder potential transform and its applications. Intern. J. Math. Math. Sci. 14: 517-524.
——— , and S. Wilson. 2002. A new method of solving Bessel's differential equation using the ℓ2-transform. Appl. Math. Comput. 130: 587-591.
——— , and S. Wilson. 2003. A new method of solving Hermite's differential equation using the ℓ2-transform. Appl. Math. Comput. 145: 495-500.
Zalinescu, C. 2002. Convex Analysis in General Vector Spaces. World Scien-
tific.
Index

A
Abel's summation formula 209
acceleration 52ff
affine inequality constraints 113
  , transformation of domain 72
antiderivative 39
Asplund sum 204ff
arbitrage 277ff
area measure 207
asset-or-nothing call 283
autonomous expenditure multiplier 48ff

B
basins 45
Bayes' formula 306
Beale's method 222ff
bisection method 192
Black-Scholes call price 275, 277, 280, 287, 295
  , economy 287
  , formula 282ff
  , model 271ff, 282
Black-Scholes equation 271ff, 273ff, 280, 294, 300, 302, 304
Bonferroni's inequality 305
Boole's inequality 305
bordered Hessian: two functions, 16, 90ff, 116, 150ff, 157, 170, 188ff, 193ff
  , single function, 19ff, 157ff, 186ff, 196
bounded operators 313
  , variation 300
Bromwich contour 337
Brownian motion 284, 299, 301
budget constraint 171
  , indifference curve 102
Brunn-Minkowski inequality 206

C
call option 277
capital asset pricing model 292ff
cash-or-nothing call 283
central limit theorem 299, 311ff
characteristic polynomial 13
  , roots 25, 243
Chebyshev inequality 307
circular helix 38
cofactor 6
comparative statics 109ff
complementary slackness conditions 97, 127
concavity 34, 37
  , test for 34ff
Condition A 46, 313
Condition B 209ff, 211ff, 215
conditional expectation 294
cone 65, 162
concave programming 87ff
conditional probability 208
correlation coefficient 310
constraints, budget 113, 171
  , convex objective 217
  , convex linear 217
  , dual 218
  , equality 123ff, 188ff, 218
  , equality and inequality 92ff, 190
  , implicit 139
  , inequality 97, 105, 126, 128, 139, 189, 228
  , nonnegativity 99, 218
  , qualifications 93
constraint set, convex 217
  , convex quadratic 217
control variable 235
contour 330
  , lines 162