
Elements of
Concave Analysis
and Applications

Prem K. Kythe
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20180414

International Standard Book Number-13: 978-1-138-70528-9 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Kythe, Prem K., author.


Title: Elements of concave analysis and applications / Prem K. Kythe.
Description: Boca Raton, Florida : CRC Press, [2018] | Includes
bibliographical references and index.
Identifiers: LCCN 2018006905| ISBN 9781138705289 (hardback : alk. paper) |
ISBN 9781315202259 (ebook : alk. paper)
Subjects: LCSH: Concave functions--Textbooks. | Convex functions--Textbooks.
| Functions of real variables--Textbooks. | Matrices--Textbooks.
Classification: LCC QA353.C64 K96 2018 | DDC 515/.88--dc23
LC record available at https://lccn.loc.gov/2018006905

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To the memory of
Dr. James H. & Mrs. Mickey Abbott
with reverence and love
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Notations, Definitions, and Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
1 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Matrix Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Cofactor Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.1 Solution with the Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.2 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.3 Gaussian Elimination Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Definite and Semidefinite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Special Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6.1 Jacobian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6.2 Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6.3 Bordered Hessian: Two Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6.4 Bordered Hessian: Single Function . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Differential Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.1 Limit of a Function at a Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Theorems on Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.1 Limit at Infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.2 Infinite Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Global and Local Extrema of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 First and Second Derivative Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.1 Definition of Concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5 Vector-Valued Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5.1 Geometric Meaning of the Inflection Point . . . . . . . . . . . . . . . . . . . 40
2.6 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 Multivariate Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.7.1 Geometric Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.7.2 Gradient at a Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8 Mathematical Economics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.8.1 Isocost Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.8.2 Supply and Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.8.3 IS-LM Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.8.4 Marginal of an Economic Function . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.8.5 Marginal Rate of Technical Substitution . . . . . . . . . . . . . . . . . . . . . 50
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3 Concave and Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.1 Properties of Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.3 Jensen’s Inequality for Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.1 Properties of Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.2 Jensen’s Inequality for Convex Functions . . . . . . . . . . . . . . . . . . . . 73
3.5 Differentiable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.6 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4 Concave Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Method of Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2.1 Constrained Optimization with Equality Constraint . . . . . . . . . . 89
4.3 Karush-Kuhn-Tucker Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.1 Equality and Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.2 Necessary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3.3 Regularity Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3.4 Sufficient Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.4 Inequality Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5 Application to Mathematical Economics . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.5.1 Peak Load Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.6 Comparative Statics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5 Convex Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1 Minimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1.1 Unconstrained Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1.2 Equality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.1.3 Equality Constraints: General Case . . . . . . . . . . . . . . . . . . . . . . . . 123
5.1.4 Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.1.5 General Linear Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.2 Nonlinear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2.1 Two Inequality Constraints and One Equality Constraint . . . 131
5.2.2 Two Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.3 Fritz John Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.3.1 Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.3.2 Slater’s Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.4 Lagrangian Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.4.1 Geometrical Interpretation of Duality . . . . . . . . . . . . . . . . . . . . . . 141
5.4.2 Saddle Point Sufficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.4.3 Strong Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6 Quasi-Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.1 Quasi-Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.2 Differentiable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.3 Theorems on Quasi-Concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.4 Three-Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
6.5 Multivariate Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.6 Sums of Quasi-Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.7 Strictly Quasi-Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.7.1 Sums of Strictly Quasi-Concave Functions . . . . . . . . . . . . . . . . . . 167
6.8 Quasi-Concave Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7 Quasi-Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.1 Quasi-Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.2 Properties of Quasi-Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.3 Bordered Hessian Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.3.1 Properties of the Bordered Hessian . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.4 Quasi-Convex Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.4.1 No Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.4.2 Equality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.4.3 Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7.4.4 Convex Feasibility Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7.4.5 Equality and Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . 190
7.4.6 Minmax Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8 Log-Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.1.1 Log-Concavity Preserving Operations . . . . . . . . . . . . . . . . . . . . . . 198
8.2 Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.2.1 General Results on Log-Concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.2.2 Log-Concavity of Density and Left-Side Integral . . . . . . . . . . . . . 202
8.2.3 Reliability Theory and Right-Side Integral . . . . . . . . . . . . . . . . . . 203
8.2.4 Mean Residual Lifetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.3 Asplund Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.3.1 Derivatives of Integrals of Log-Concave Functions . . . . . . . . . . 204
8.3.2 Adding Log-Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.3.3 Asplund Sum and Conjugate Functions . . . . . . . . . . . . . . . . . . . . . 205
8.3.4 Integral Functional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.3.5 Area Measure of Log-Concave Functions . . . . . . . . . . . . . . . . . . . . 207
8.4 Log-Concavity of Nonnegative Sequences . . . . . . . . . . . . . . . . . . . . . . . . 207
8.5 Log-Concave Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9 Quadratic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.1 Quadratic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.2 Hildreth-D’Esopo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.3 Beale’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.4 Wolfe’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
10 Optimal Control Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
10.1 Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
10.2 Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
10.2.1 Sufficient Conditions for Optimization . . . . . . . . . . . . . . . . . . . . . 235
10.3 Free Endpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
10.4 Inequality Constraints at the Endpoints . . . . . . . . . . . . . . . . . . . . . . . 237
10.5 Discounted Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
11 Demands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.1 Shephard’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.2 Marshallian Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
11.3 Hicksian Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
11.4 Slutsky Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
11.4.1 Giffen Goods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
11.4.2 Veblen Goods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.5 Walrasian Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.6 Cost Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
11.7 Expenditure Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.8 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
12 Black-Scholes Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.1 Black-Scholes Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.1.1 Itô’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
12.1.2 Derivation of Black-Scholes Equation . . . . . . . . . . . . . . . . . . . . . . 272
12.2 Solution of Black-Scholes Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
12.2.1 Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
12.2.2 Solution of the Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
12.2.3 Black-Scholes Call Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
12.2.4 Some Finance Terms and Arbitrage . . . . . . . . . . . . . . . . . . . . . . . 277
12.2.5 Self-Financing Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
12.2.6 Implied Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
12.3 Black-Scholes Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
12.4 Use of Greek Letters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.5 Log-normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.5.1 Log-normal c.d.f. and p.d.f. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.5.2 Log-normal Conditional Expected Value . . . . . . . . . . . . . . . . . . . 286
12.6 Black-Scholes Call Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.6.1 Black-Scholes Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.6.2 Black-Scholes under a Different Numéraire . . . . . . . . . . . . . . . . 288
12.6.3 Black-Scholes by Direct Integration . . . . . . . . . . . . . . . . . . . . . . . 290
12.6.4 Feynman-Kac Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.6.5 CAPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.6.6 CAPM for Assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
12.7 Dividends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
12.7.1 Continuous Dividends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
12.7.2 Lumpy Dividends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.8 Solutions of SDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.8.1 Stock Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.8.2 Bond Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
12.8.3 Discounted Stock Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
12.8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
12.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
A Probability Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
B Differentiation of Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
C Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
D Laplace Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
E Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
F Locally Nonsatiated Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Preface

This textbook on concave analysis aims at two goals. Firstly, it provides sim-
ple yet comprehensive subject matter to the readers who are undergraduate
seniors and beginning graduate students in mathematical economics and busi-
ness mathematics. For most readers the only prerequisites are courses in ma-
trix algebra and differential calculus including partial differentiation; however,
for the last chapter a thorough working knowledge of linear partial differen-
tial equations and the Laplace transforms is required. The readers can omit
this chapter if not required. The subject of the book centers mostly around
concave and convex optimization; other related topics are also included. The
details are provided below in the overview section.
Although there are many excellent books on the market, almost all of
them are at times difficult to understand. They are very heavy on theoreti-
cal aspects, and generally fail to provide ample worked-out examples to give
readers easy understanding and workability.
The second goal is elucidated below in the section ‘To Readers’.

Motivation
The subject of convexity and quasi-convexity has been a model for economic
theorists to make decisions about cost minimization and revenue maximiza-
tion. This has resulted in a lot of publications in convex optimization. So
why is there keen interest in concave and quasi-concave functions? Firstly,
economic theory dictates that all utility functions are quasi-concave and that
all cost functions are concave in input prices. Therefore, a cost function that
is not concave in input prices is not a cost function. Secondly, the standard
model in economic theory consists in a set of alternatives and an ordering of
these alternatives, according to different priorities and interests. The process
that a decision maker follows is to choose a favorite alternative with the prop-
erty that no other alternative exceeds the ordering. In such a situation the
decision maker often uses a function that ‘represents’ this ordering. Thus, for
example, suppose there are four alternatives, say, a, b, c and d, and suppose
that the decision maker prefers a to b and treats both c and d as equally
desirable. Any function, like f, with f(a) > f(b) > f(c) = f(d) may represent the ordering, irrespective of whatever numerical values (level curves)


such ordering has. However, the situation changes when the decision maker is
a consumer who is choosing between different goods and prices. In this case
the consumer’s ordering, based on the level curves (or indifference curves) of
chosen alternatives, can look quite different from that of the businessperson.
It so happens that the consumer’s ordering becomes concave (less expense,
more goods). But this situation implies that any function that represents the
consumer’s interest is, at best, quasi-concave. This natural phenomenon is, of
course, based on a deep but simple result that concave and convex functions
are diametrically opposite to each other in behavior and intent.
However, the subject of concave analysis, with emphasis on concave, quasi-
concave and log-concave functions, has appeal to both the consumer and busi-
ness organizations.

Overview
A general description of the topics covered in the book is as follows: Chap-
ter 1 introduces a review of matrix algebra that includes definitions, matrix
inversion, solutions of systems of linear algebraic equations, definite and semi-
definite matrices, Jacobian, two types of Hessian matrices, and the Hessian
test. Chapter 2 is a review of calculus, with topics dealing with limits, deriva-
tive, global and local extrema, first and second derivative tests, vector-valued
functions, optimization, multivariate functions, and basic concepts of mathe-
matical economics.
Concave and convex functions are introduced in Chapter 3, starting with
the notion of convex sets, Jensen’s inequalities for both concave and convex
functions, and unconstrained optimization. Chapter 4 deals with concave
programming; it is devoted to optimization problems on maximization mostly
with inequality constraints, and using the Lagrange method of multipliers and
the KKT necessary and sufficient conditions. Applications to mathematical
economics include the topic of peak load pricing, and comparative statics is
discussed. Optimization problems focusing on minimization are introduced
in Chapter 5 on convex programming, in order to compare it with concave
optimization. Nonlinear programming is discussed; the Fritz John and Slater
conditions are presented, and the topic of Lagrangian duality is discussed.
Chapters 6 and 7 deal with quasi-concave and quasi-convex functions.
Both topics are important in their own applications. The single-function
bordered Hessian test on quasi-concavity and quasi-convexity is presented,
and optimization problems with types of functions and the minmax theorem
are provided. Chapter 8 deals with log-concave functions; general results
on log-concavity are presented, with application on mean residual life; and
the Asplund sum is introduced, with its algebra, derivatives, and area mea-
sure. Log-concavity of nonnegative sequences is discussed, and all log-concave
probability distributions with their density functions and cumulative distribution functions are presented in detail.
Chapter 9 deals with the quadratic programming to optimization prob-
lems, and presents the following numerical methods: (i) Hildreth-D’Esopo
method; (ii) Beale’s method; and (iii) Wolfe’s method. The optimal con-
trol theory is discussed in Chapter 10, using the Hamiltonian and different
types of optimization problems. Three types of demands, namely Marshal-
lian, Hicksian, and Walrasian, are introduced in Chapter 11, using the Shep-
hard’s lemma and the Slutsky equation with applications to cost minimiza-
tion, and the Giffen and Veblen goods. Chapter 12 is exclusively devoted to
the Black-Scholes differential equation, its solution, log-normal distribution,
Black-Scholes call price, Feynman-Kac theorem, capital asset pricing model,
dividends, stocks, bonds, and discounted stock prices.
There are six appendices: (A) Some useful topics on probability; (B) differ-
entiation of operators involving the Gateaux differential, Fréchet derivative,
the concept of the gradient, and Taylor’s series first- and second-order approx-
imations; (C) a list of probability distributions; (D) a self-contained detailed
discussion of the Laplace transforms; (E) implicit function theorem; and (F)
locally nonsatiated function. The bibliography toward the end of the book
contains the references cited in the book, followed by the Index.
The book contains over 330 examples and exercises; most exercises are
provided with solutions, simpler ones with hints and answers. Since the book
uses and discusses vectors and matrices, care is taken, in order to avoid confu-
sion, to set all (row/column) vectors and matrices in bold lowercase and bold
uppercase, respectively.
This is an introductory textbook that provides a good combination of
methodology, applications, and hands-on projects for students with diverse
interests from mathematical economics, business mathematics, engineering,
and other related applied mathematics courses.

To Readers
The second goal concerns specifically the abuse and misuse of a couple of
standard mathematical notations in this field of scientific study. They are
the gradient ∇f and the Laplacian ∇²f of a function f(x) in Rn. Somehow,
and somewhere, a tradition started to replace the first-order partials of the
function f by its gradient ∇f . It seems that this tradition started without
any rigorous mathematical argument in its support. This book has provided
a result (Theorem 2.18) that establishes that only under a specific necessary
condition the column vector [∂f/∂x1 · · · ∂f/∂xn]T can replace the gradient
vector ∇f , and these two quantities, although isomorphic to each other, are
not equal. Moreover, it is shown that any indiscriminate replacement between
these two quantities leads to certain incorrect results (§3.5).
The other misuse deals with the Laplacian ∇²f, which has been used to
represent the Hessian matrix (§1.6.2), without realizing that ∇²f is the trace
(i.e., sum of the diagonal elements) of the Hessian matrix itself. This abuse
makes a part equal to the whole. Moreover, ∇² is the well-known linear partial
differential operator of the elliptic type known as the Laplacian.
It appears that this misuse perhaps happened because of the term ‘vector’,
which is used (i) as a scalar quantity, having only magnitude, as in the row
or column vectors (in the sense of a matrix), and (ii) as a physical quantity,
such as force, velocity, acceleration, and momentum, having both magnitude
and direction. The other factor for the abuse in the case of the gradient is
the above-mentioned linear isomorphic mapping between the gradient vector
∇f and the (scalar) column vector [∂f /∂x1 · · · ∂f /∂xn ]T . This isomorphism
has been then literally used as ‘equality’ between these two quantities. Once
the case for ∇f became the tradition, the next choice ∇²f for the Hessian
matrix became another obvious, but incorrect, tradition.
As readers, you will find an attention symbol, !!! , at different parts of the
book. It is used to point out the significance of the statements found there.
The other less important notations are the ≺, the ⊕ and the ⊙ symbols.
Although borrowed from physics and astronomy, these symbols are acceptable
with a different but almost similar meaning provided that they are properly
defined as given in the section on Notations. Moreover, the ⊕ and the ⊙
symbols have now become so common due to the advancement in cell phones
and related electronic technology that they are probably losing their rigorous
mathematical significance.

Acknowledgments
I take this opportunity to thank Mr. Sarfraz Khan, Executive Editor, Taylor
& Francis, for his support, and Mr. Callum Fraser for coordinating the book
project. I also thank the Project Editor Michele A. Dimont for doing a great
job of editing the text. Thanks are due to the reviewers and to some of my
colleagues who made some very valuable suggestions to improve the book.
Lastly, I thank my friend Michael R. Schäferkotter for help and advice freely
given whenever needed.

Prem K. Kythe
Notations, Definitions, and Acronyms

A list of the notations, definitions, abbreviations, and acronyms used in this


book is given below.

a.s., almost surely


A, matrix
AT , transpose of the matrix A
|A|, determinant of a matrix A
adj A, adjoint of a matrix A
det(A) or |A|, determinant of a matrix A
A−1 , inverse of a matrix A
aij , element of a matrix A in the ith row and jth column
B(c, r), ball with center c and radius r
B(X, Y ), class of all bounded linear operators from X into Y
B, bordered Hessian matrix: one function
C 0 (D), class of functions continuous on a region D
C k (D), class of continuous functions with kth continuous derivative
on a region D, 0 ≤ k < ∞
C ∞ (D), class of continuous functions infinitely differentiable on a region D
C-function, same as a C 0 -function; continuous function
C, cofactor matrix
Cij , cofactor of the element in the ith row and jth column
C T , transpose of C = adj A
Ci , eigenvectors
C, cost; consumption
C(Q), production cost
CAPM, capital asset pricing model
CES, constant elasticity of substitution
CLPD, constant positive linear dependence constraint qualification
CLT, central limit theorem
CQ, constraint qualification
CRCQ, constant rank constraint qualification
CU, concave upward
CD, concave downward
Cov, covariance
c.d.f., cumulative distribution function
c(w, y), cost function
Dt , derivative with respect to t
Df (x), derivative of f (x) in Rn
D, aggregated demand
D, domain, usually in the z-plane
dist(A, B), distance between points (or sets) A and B
dom(f ), domain of a function f
DRS, decreasing return to scale
e, expenditure function
E, amount allocated for expenditure
E[X], expected value of a random vector X
E(f ), entropy of f
Eq(s)., Equation(s) (when followed by an equation number)
ei , ith unit vector, i = 1, . . . , n
[e], set of the unit vectors ei in Rn
epi(f ), epigraph of f
e(p, u), expenditure function
F , field
f : X 7→ Y , function f maps the set X into (onto) the set Y
f ◦ g, composite function of f and g: (f ◦ g)(·) = f (g(·))
f ′ , first derivative of f
f ′′ , second derivative of f
f (n) , nth derivative of f
∂f(x)/∂xi, first-order partials of f in Rn, also written fi, for i = 1, . . . , n; also written as fx, fy, fz for ∂f/∂x, ∂f/∂y, ∂f/∂z in R3
∂²f(x)/∂xi∂xj, second-order partials of f in Rn, also written as fij for i, j = 1, . . . , n; also written as fxx, fyy, fzz for ∂²f/∂x², ∂²f/∂y², ∂²f/∂z² in R3
(f ◦ g)(x),= f (g(x)), composition of functions f and g
f ⋆ g, convolution of f(t) and g(t) (= ∫₀ᵗ f(t − u)g(u) du = ∫₀ᵗ f(u)g(t − u) du = L−1{G(s)F(s)})
FJ, Fritz John conditions
F(s), Laplace transform of f(t) (= ∫₀^∞ e^{−st} f(t) dt)
G, government expenditure; constrained set
Gmin , positive minimal accepted level of profit
G(·; ·), Green’s function
Geom(p), geometric distribution with probability distribution


H, Hamiltonian
hyp, hypograph of f
hj (p, u), Hicksian demand for good j
h, hours of work, h = T − τ
H, Hessian matrix
H̄, bordered Hessian matrix: two functions, f and the constraint g
iff, if and only if
i, income
I, income
I(p), e-beam intensity at position p
IK(x) = 0 if x ∈ K; ∞ if x ∉ K
IS, commodity equilibrium
Int (D), interior of a domain D
IRS, increasing return to scale
I, identity matrix
j , goods number; input factor
|J|, or J, or simply J, Jacobian determinant
K, capital
KKT, Karush-Kuhn-Tucker conditions
L, Lagrangian function; also, labor
Lx, Ly = ∂L/∂x, ∂L/∂y, first-order partials of L
Lxx, Lyy, Lxy, Lyx = ∂²L/∂x², ∂²L/∂y², ∂²L/∂x∂y, ∂²L/∂y∂x, second-order partials of L
L{f (t)}, Laplace transform of f (t), also denoted by F (s)
L−1{F(s)}, inverse Laplace transform of F(s), also denoted by f(t)
L2 {f (t); s}, Laplace2 transform (also known as L2 -transform) of f (t; s)
LICQ, linear independence constraint qualification
LM, monetary equilibrium
LQC, linearity constrained qualifications
mrl(x), mean residual lifetime function
Mij , minor of the element aij of a matrix A
Md , demand for money
Ms , supply of money
Mt , transition-precautionary demand for money
Mz , speculative demand for money
MC, marginal cost
MFCQ, Mangasarian-Fromovitz constraint qualification
ME, marginal expenditure
MPC, marginal propensity to consume


MR, marginal revenue
MRTS, marginal rate of technical substitution
N, set of natural numbers (positive integers)
N(A), null space of matrix A
N , numéraire
ND, negative definite
NSD, negative semidefinite
p.d.f., probability density function, or density function
p, prices (a vector with component pj )
P , profit; supply; nonlabor income
PV , probability function for a random variable V
PK, price for K
PL , price for L
P, probability
PD, positive definite
PSD, positive semidefinite
q, Cobb-Douglas production function; also, amount of consumed good
Q, quantity of output produced; risk neutral measure
Qs , supply
Qd , demand
Q, level of output
QNCQ, quasi-normality constraint qualification
QP, quadratic programming
r, eigenvalue of a matrix A
r(t), vector-valued function (= f (t)i + g(t)j + h(t)k)
R, revenue
R(Q), sales revenue
R(f ), range of function f
R, the real line; real plane
R3 , three-dimensional real space
Rn , n-dimensional real space
R+ , nonnegative real numbers
R++ , positive real numbers
s, variable of Laplace transform
S, slackness variable; savings
S(p, w), Slutsky matrix
SC, Slater condition
SDE, stochastic differential equation
SOSC, second-order sufficient condition
tr(A), trace of a matrix A (sum of the diagonal elements)
T , total time available


TC, total cost
TE, total expenditure
TP, total product
TPk , totally positive of order k
TR, total revenue
u, utility
u̇, time-derivative of a function u
U (y | f ), upper contour set of f at y
Uα , upper-level set
v(p, m), indirect utility function
V , value function
w, wage (a vector with components wj );
wh, labor income (wage/hour)
x, quantity (vector)
xj (w, y), conditional factor demand for each input factor or good j
⌊x⌋, greatest integer function, floor of x
x, eigenvector of a matrix A; a point in R
x∗ , critical number, critical point (maximizer or minimizer)
x ⪯ y, componentwise inequality between vectors x and y
x ≻ y, strict componentwise inequality between vectors x and y
X, real normed space
X, exports
y, output;
Y , income
Yc , convex hull of Y ⊂ X
Z, imports
Z, set of integers
Z+ , set of nonnegative integers
δf (x, h), Gateaux differential (or G-differential) of f at x and h real
λ, Lagrange multiplier(s)
µ, Lagrange multiplier(s); also measure, Lebesgue measure
π, profit
ρ, rank of a matrix; ρ(A), rank of the matrix A; correlation coefficient
Σ, σ-algebra
τ , labor; also, time spent for leisure
χK(x) = 1 if x ∈ K; 0 if x ∉ K, characteristic function defined on a convex set K
k · k, norm
0, null (or zero) vector, = [0 0 . . . 0] in Rn
∇, ‘del’ operator: ∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z ((x, y, z) ∈ R3); an operator defined in Rn as ∇ = e1 ∂/∂x1 + · · · + en ∂/∂xn
∇f, gradient of a function f: a vector in R3 defined by ∇f = i ∂f/∂x + j ∂f/∂y + k ∂f/∂z; a vector in Rn defined by ∇f = e1 ∂f/∂x1 + · · · + en ∂f/∂xn for x = (x1, . . . , xn) ∈ Rn (dimension 1 × n or n × 1)
∇², Laplacian operator defined on Rn as ∇² = ∂²/∂x1² + · · · + ∂²/∂xn²; it is a linear elliptic partial differential operator
∇²f(x) = ∂²f/∂x1² + · · · + ∂²f/∂xn², x = (x1, . . . , xn) ∈ Rn, Laplacian of f(x); also the trace of the Hessian matrix H
kxk1 , l1 -norm of a vector x
kxk2 , l2 -norm, or Euclidean norm, of a vector x
kxk∞ , l∞ -norm of a vector x
≻, ⪰, subordination (predecessor): A ⪰ B, matrix inequality between matrices A and B; A ≻ B, strict matrix inequality between matrices A and B
≺, ⪯, subordination (successor), e.g., f ≺ g is equivalent to f(0) = g(0) and f(E) ⊂ g(E), where E is the open disk; but here x ≺ y is used for componentwise strict inequality, and x ⪯ y for componentwise inequality between vectors x and y
(f ⊕ g)(z) = sup{f(x)g(y) : x + y = z}, where f and g are log-concave functions
(s ⊙ f)(x) = s f(x/s), where f is a log-concave function, and s > 0
$\binom{n}{k}$, binomial coefficient, $= \frac{n!}{k!\,(n-k)!} = \binom{n}{n-k}$
$\overset{\text{iso}}{=}$, isomorphic to; for example, A $\overset{\text{iso}}{=}$ B means A is isomorphic to B, and conversely
□, end of a proof, or an example
!!!, attention symbol
1
Matrix Algebra

Some basic concepts and results from linear and matrix algebra, and from
finite-dimensional vector spaces are presented. Proofs for most of the results
can be found in many books, for example, Bellman [1970], Halmos [1958],
Hoffman and Kunze [1961], Lipschutz [1968], and Michel and Herget [2007].

1.1 Definitions
A matrix A is a rectangular array of elements (numbers, parameters, or vari-
ables), where the elements in a horizontal line are called rows, and those in
a vertical line columns. The dimension of a matrix is defined by the number
of rows m and the number of columns n, and we say that such a matrix has
dimension m × n, or simply that the matrix is m × n. If m = n, then we have
a square matrix. If the matrix is 1 × n, we call it a row vector, and if the
matrix is m × 1, then it is called a column vector. A matrix that converts the
rows of a matrix A to columns and the columns of A to rows is called the
transpose of A and is denoted by AT .
Let two 3 × 3 matrices A and B be defined as
   
a11 a12 a13 b11 b12 b13
A =  a21 a22 a23  , B =  b21 b22 b23  . (1.1.1)
a31 a32 a33 b31 b32 b33

Then, for example,


 
a11 a21 a31
AT =  a12 a22 a32  ,
a13 a23 a33

and similarly for BT . For addition (or subtraction) of two matrices (A + B,


or A − B) the two matrices A and B must be of equal dimension. Each
element of one matrix (B) is added to (or subtracted from) the corresponding
element of the other matrix (A). Thus, the element b11 in B is added to (or
subtracted from) a11 in A; b12 to (or from) a12 , and so on. Multiplication of
a matrix by a number or scalar involves multiplication of each element of the
matrix by the scalar, and it is called scalar multiplication, since it scales the
matrix up or down by the size of the scalar.
A row vector A and a column vector B are written, respectively, as
 
b11
A = [ a11 a12 a13 ]1×3 , B =  b21  .
b31 3×1

However, to save space, B is often written as its transpose, i.e.,


T
B = [ b11 b21 b31 ] .

Multiplication of a row vector A by a column vector B requires that each


vector has the same number of elements. This multiplication is then carried
out by multiplying each element of the row vector by its corresponding element
in the column vector and summing the product. Thus,
T
AB = [ a11 a12 a13 ]1×3 × [ b11 b21 b31 ]3×1
= [a11 b11 + a12 b21 + a13 b31 ]1×1 .

The above technique of multiplication of a row and a column vector is used


to obtain the multiplication of any two vectors with a precondition that the
number of rows and columns of one matrix must be the same as the number
of columns and rows of the other matrix. Then the two matrices are said to
be conformable for multiplication. Thus, an m × n matrix can be multiplied
by another n × m matrix, and the resulting matrix will be m × m.
Example 1.1. Given

$$A = \begin{bmatrix} 3 & 6 & 11 \\ 12 & 8 & 5 \end{bmatrix}_{2\times 3}, \quad B = \begin{bmatrix} 5 & 13 \\ 7 & 8 \\ 9 & 10 \end{bmatrix}_{3\times 2}, \quad E = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 4 & 9 \end{bmatrix}_{2\times 3},$$

the matrices A and B, and B and E, are conformable for multiplication, but A and E are not conformable. Thus,

$$AB = \begin{bmatrix} 3\cdot5 + 6\cdot7 + 11\cdot9 & 3\cdot13 + 6\cdot8 + 11\cdot10 \\ 12\cdot5 + 8\cdot7 + 5\cdot9 & 12\cdot13 + 8\cdot8 + 5\cdot10 \end{bmatrix} = \begin{bmatrix} 156 & 197 \\ 161 & 270 \end{bmatrix}_{2\times 2},$$

$$BE = \begin{bmatrix} 5\cdot1 + 13\cdot2 & 5\cdot4 + 13\cdot4 & 5\cdot7 + 13\cdot9 \\ 7\cdot1 + 8\cdot2 & 7\cdot4 + 8\cdot4 & 7\cdot7 + 8\cdot9 \\ 9\cdot1 + 10\cdot2 & 9\cdot4 + 10\cdot4 & 9\cdot7 + 10\cdot9 \end{bmatrix} = \begin{bmatrix} 31 & 72 & 152 \\ 23 & 60 & 121 \\ 29 & 76 & 153 \end{bmatrix}_{3\times 3}.$$

Example 1.2. In business, one method of keeping track of sales of different


types of products at different outlets is to keep the inventory in the form of a
matrix. Thus, suppose that a construction company has four different outlets
selling (a) bricks, (b) lumber, (c) cement, and (d) roof shingles. The inventory
and the price of each item are expressed as matrices A and P:

$$A = \begin{bmatrix} 100 & 110 & 80 & 115 \\ 210 & 230 & 150 & 400 \\ 165 & 95 & 68 & 145 \\ 150 & 190 & 130 & 300 \end{bmatrix}_{4\times 4}, \quad P = \begin{bmatrix} 220 \\ 65 \\ 114 \\ 168 \end{bmatrix}_{4\times 1},$$

where the columns of A correspond, in order, to (a) bricks, (b) lumber, (c) cement, and (d) roof shingles. Since the two matrices are conformable, their product AP will give the values V (in dollars) of the stock at each outlet:

$$V = AP = \begin{bmatrix} 100\cdot220 + 110\cdot65 + 80\cdot114 + 115\cdot168 \\ 210\cdot220 + 230\cdot65 + 150\cdot114 + 400\cdot168 \\ 165\cdot220 + 95\cdot65 + 68\cdot114 + 145\cdot168 \\ 150\cdot220 + 190\cdot65 + 130\cdot114 + 300\cdot168 \end{bmatrix} = \begin{bmatrix} 57{,}590 \\ 145{,}450 \\ 74{,}587 \\ 110{,}570 \end{bmatrix}. \;\square$$
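Such products are easy to verify numerically. The following Python/NumPy sketch is ours, not the book's; the names A, B, E, A2, and P are chosen to mirror Examples 1.1 and 1.2.

```python
# Verifying Examples 1.1 and 1.2 numerically (illustrative sketch).
import numpy as np

A = np.array([[3, 6, 11], [12, 8, 5]])    # 2 x 3
B = np.array([[5, 13], [7, 8], [9, 10]])  # 3 x 2
E = np.array([[1, 4, 7], [2, 4, 9]])      # 2 x 3

print(A @ B)   # [[156 197] [161 270]], the 2 x 2 product of Example 1.1
print(B @ E)   # the 3 x 3 product
# A @ E raises ValueError: two 2 x 3 matrices are not conformable.

# Example 1.2: value of the stock at each outlet as the product AP
A2 = np.array([[100, 110, 80, 115],
               [210, 230, 150, 400],
               [165, 95, 68, 145],
               [150, 190, 130, 300]])
P = np.array([220, 65, 114, 168])
print(A2 @ P)  # [57590 145450 74587 110570]
```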

1.2 Properties
The following properties of matrices are useful.
1. Matrix addition is commutative and associative i.e., A + B = B + A, and
(A+B)+C = A+(B+C). These properties also hold for matrix subtraction,
since A − B = A + (−B).
2. Matrix multiplication, with a few exceptions, is not commutative, i.e.,
AB 6= BA. Scalar multiplication is commutative, i.e., cA = Ac. If three or
more matrices are conformable, i.e., if Aj×k , Bm×n , Cp×q , where k = m and
n = p, the associative law applies as long as matrices are multiplied in their
order of conformability. Thus, (AB)C = A(BC). Under the same conditions
the matrix multiplication is also distributive, i.e., A(B + C) = AB + AC.
Example 1.3. Given

$$A = \begin{bmatrix} 7 & 4 \\ 1 & 5 \\ 8 & 9 \end{bmatrix}_{3\times 2}, \quad B = \begin{bmatrix} 4 & 3 & 10 \\ 3 & 5 & 6 \end{bmatrix}_{2\times 3}, \quad C = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}_{3\times 1},$$

we get

$$AB = \begin{bmatrix} 7\cdot4+4\cdot3 & 7\cdot3+4\cdot5 & 7\cdot10+4\cdot6 \\ 1\cdot4+5\cdot3 & 1\cdot3+5\cdot5 & 1\cdot10+5\cdot6 \\ 8\cdot4+9\cdot3 & 8\cdot3+9\cdot5 & 8\cdot10+9\cdot6 \end{bmatrix} = \begin{bmatrix} 40 & 41 & 94 \\ 19 & 28 & 40 \\ 59 & 69 & 134 \end{bmatrix}_{3\times 3};$$

$$(AB)C = \begin{bmatrix} 40 & 41 & 94 \\ 19 & 28 & 40 \\ 59 & 69 & 134 \end{bmatrix}_{3\times 3} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}_{3\times 1} = \begin{bmatrix} 40\cdot7+41\cdot8+94\cdot9 \\ 19\cdot7+28\cdot8+40\cdot9 \\ 59\cdot7+69\cdot8+134\cdot9 \end{bmatrix} = \begin{bmatrix} 1454 \\ 717 \\ 2171 \end{bmatrix}_{3\times 1};$$

$$BC = \begin{bmatrix} 4 & 3 & 10 \\ 3 & 5 & 6 \end{bmatrix}_{2\times 3} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}_{3\times 1} = \begin{bmatrix} 4\cdot7+3\cdot8+10\cdot9 \\ 3\cdot7+5\cdot8+6\cdot9 \end{bmatrix} = \begin{bmatrix} 142 \\ 115 \end{bmatrix}_{2\times 1};$$

$$A(BC) = \begin{bmatrix} 7 & 4 \\ 1 & 5 \\ 8 & 9 \end{bmatrix}_{3\times 2} \begin{bmatrix} 142 \\ 115 \end{bmatrix}_{2\times 1} = \begin{bmatrix} 7\cdot142+4\cdot115 \\ 1\cdot142+5\cdot115 \\ 8\cdot142+9\cdot115 \end{bmatrix} = \begin{bmatrix} 1454 \\ 717 \\ 2171 \end{bmatrix}_{3\times 1}.$$
3. An identity matrix I is a square matrix whose diagonal elements are all
1 and all remaining elements are 0. An n × n identity matrix is sometimes
denoted by In . The identity matrix I is the unity in matrix algebra just as
the numeral 1 is the unity in algebra. Thus, the multiplication of a matrix
by an identity matrix leaves the original matrix unchanged; so also the multi-
plication of an identity matrix by itself leaves the identity matrix unchanged.
Hence, AI = IA = A, and I × I = I² = I.
4. A matrix A for which A = AT is called a symmetric matrix. A symmetric
matrix A for which A × A = A is an idempotent matrix. The identity matrix
I is both symmetric and idempotent.
5. A null matrix 0 is composed of all 0s and can have any dimension; it is not
necessarily square. Obviously, addition or subtraction of null matrices leaves
the original matrix unchanged; multiplication by a null matrix yields a null
matrix. A scalar zero 0 has dimension 1 × 1.
6. A matrix with zero elements everywhere below (or above) the principal
diagonal is called upper (or lower) triangular matrix, also known as upper or
lower echelon form. Thus,

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n} \\ 0 & a_{22} & \cdots & a_{2,n-1} & a_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \quad \text{or} \quad \begin{bmatrix} a_{11} & 0 & \cdots & 0 & 0 \\ a_{21} & a_{22} & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & a_{nn} \end{bmatrix}$$

represent the upper and the lower triangular matrix, respectively.


7. The sum of all the elements on the principal diagonal of an n × n matrix
A is called the trace of the matrix A and is denoted by tr(A).
See Exercise 1.6 for the dimensions of some most used expressions in Rn .
Remember that dimensions are always listed row by column, and a vector a
is a column vector, and scalars are of dimension 1 × 1.
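A few lines of NumPy (our illustration, not the book's) spot-check these properties on small matrices:

```python
# Spot-checking the algebraic properties above (illustrative sketch).
import numpy as np

A = np.array([[7, 4], [1, 5]])
B = np.array([[4, 3], [3, 5]])
C = np.array([[2, 0], [1, 6]])
I = np.eye(2, dtype=int)

assert np.array_equal(A + B, B + A)                 # addition commutes
assert not np.array_equal(A @ B, B @ A)             # multiplication does not
assert np.array_equal((A @ B) @ C, A @ (B @ C))     # associativity
assert np.array_equal(A @ (B + C), A @ B + A @ C)   # distributivity
assert np.array_equal(A @ I, A)                     # AI = A
print(np.trace(A))                                  # trace: 7 + 5 = 12
```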

1.3 Matrix Inversion


The determinant of a 2 × 2 matrix A is called a second-order determinant,
and it is obtained by taking the product of the two elements on the principal diagonal and subtracting from it the product of the two elements off the principal diagonal. Thus, for the 2 × 2 matrix
 
a11 a12
A= ,
a21 a22

the determinant is

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}. \tag{1.3.1}$$

The determinant |A| is sometimes denoted by det(A). It is a number or a


scalar and is obtained only for square matrices. If |A| = 0, then the deter-
minant is said to vanish and the matrix A is said to be singular. A singular
matrix is one in which there exists a linear dependence between at least two
rows or columns. If |A| 6= 0, then the matrix A is nonsingular and all its rows
and columns are linearly independent.
Generally, in the case of systems of linear equations, if there is linear de-
pendence, the system may have no solution or an infinite number of possible
solutions, and a unique solution cannot be determined. Thus, the determi-
nant is an important test that points to specific problems before the system
should be solved. Given a system of equations with the coefficient matrix A,
(i) if |A| = 0, the matrix is singular, indicating that there is a linear
dependence among the equations, and so no unique solution exists.
(ii) if |A| 6= 0, the matrix is nonsingular and there is linear independence
among the equations, and so a unique solution exists and can be determined.
The rank ρ of a matrix is defined as the maximum number of linearly
independent rows or columns in the matrix. The rank of a matrix is used
for a simple test of linear dependence, as follows: Assume that A is a square
matrix of order n. Then
(i) if ρ(A) = n, then A is nonsingular and there is linear independence;
(ii) if ρ(A) < n, then A is singular and there is linear dependence.
Example 1.4. Consider $A = \begin{bmatrix} 7 & 4 \\ 8 & 9 \end{bmatrix}$ and $B = \begin{bmatrix} 6 & 9 \\ 8 & 12 \end{bmatrix}$. Then |A| = 7(9) − 4(8) = 63 − 32 = 31 ≠ 0, and so the matrix A is nonsingular, i.e., there
is linear independence between any rows or columns, and ρ(A) = 2. On the
other hand, |B| = 6(12) − 9(8) = 72 − 72 = 0, and so the matrix B is singular
and linear dependence exists between its rows and columns (casual inspection
reveals that row 2 is 4/3 times row 1, and column 2 is 3/2 times column 1),
and thus ρ(B) = 1. □
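The singularity test and the rank are available directly in NumPy; the sketch below (ours) repeats Example 1.4.

```python
# Singularity and rank tests for Example 1.4 (illustrative sketch).
import numpy as np

A = np.array([[7, 4], [8, 9]])
B = np.array([[6, 9], [8, 12]])

print(np.linalg.det(A))          # 31.0 -> nonsingular
print(np.linalg.matrix_rank(A))  # 2
print(np.linalg.det(B))          # ~0 (up to roundoff) -> singular
print(np.linalg.matrix_rank(B))  # 1: row 2 is 4/3 times row 1
```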

The determinant of a 3 × 3 matrix A


$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \tag{1.3.2}$$

is called a third-order determinant, and it is a sum of three products, which


are derived as follows:
1. Take the first element a11 of the first row and (mentally) delete the row
and the column in which it appears. Then multiply a11 by the determinant
of the remaining second-order matrix.
2. Take the second element a12 of the first row and (mentally) delete the
row and the column in which it appears. Then multiply a12 by (−1) and the
determinant of the remaining second-order matrix.
3. Take the third element a13 of the first row and (mentally) delete the row
and the column in which it appears. Then multiply a13 by the determinant
of the remaining second-order matrix.
This process yields

a22 a23 a a23 a a22


|A| = a11 + a12 (−1) 21 + a13 21
a32 a33 a31 a33 a31 a32
= a11 (a22 a33 − a23 a32 ) − a12 (a21 a32 − a23 a31 ) + a13 (a21 a32 − a22 a31 ),
(1.3.3)

which is a scalar quantity.


The determinant of a 4 × 4 matrix can be determined similarly as the sum
of four products, and so on.
The first line of Eq (1.3.3) can also be written as

|A| = a11 |M11 | + a12 (−1)|M12 | + a13 |M13 |, (1.3.4)

where
$$|M_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad |M_{12}| = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \quad |M_{13}| = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix},$$

where |M11 | is the minor of a11 , |M12 | the minor of a12 , and |M13 | the minor
of a13 . A cofactor |Cij | is a minor with a prescribed sign, which follows the
rule
$$|C_{ij}| = (-1)^{i+j}\,|M_{ij}|. \tag{1.3.5}$$
Thus, depending on an even or odd power of (−1) we have

if i + j is an even number, then |Cij | = |Mij |,
if i + j is an odd number, then |Cij | = −|Mij |.
1.3 MATRIX INVERSION 7

Then the expansion (1.3.4) can be expressed in terms of the cofactors as

|A| = a11 |C11 | + a12 |C12 | + a13 |C13 |, (1.3.6)

which is known as the third-order Laplace expansion of the determinant |A|.


This expansion can be extended to an nth-order determinant. The Laplace
expansion allows us to evaluate a determinant along any row or column. In
practice, however, selection of a row or column with more zeros than others
simplifies the evaluation by eliminating terms.
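The Laplace expansion translates directly into a short recursive routine; the following Python sketch is ours (expanding always along the first row) and implements (1.3.6) for a matrix of any order.

```python
# Determinant by Laplace (cofactor) expansion along the first row
# (illustrative sketch; exponential in n, so only for small matrices).
def det(M):
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # minor M_1,(j+1): delete row 1 and column j+1
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)  # sign (-1)^(1 + (j+1))
    return total

print(det([[7, 4], [8, 9]]))  # 31, as in Example 1.4
```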
Let A and B be (n × n) matrices. Some properties of determinants are
stated below; proofs can be found in Michel and Herget [2007].
1. |AT | = |A|; and |AB| = |A||B|.
2. If all elements of a column (or row) of A are zero, then |A| = 0.
3. If B is the matrix obtained by multiplying every element in a column (or
row) of A by a constant c, while all other columns of B are the same as those
in A, then |B| = c |A|.
4. If B is the same except that two columns (or rows) are interchanged, then
|B| = −|A|.
5. If two columns (or rows) of A are identical, or proportional, or linearly
dependent, then |A| = 0.
6. Addition or subtraction of any nonzero multiple of one row (or column)
to (or from) another row (or column) of A has no effect on the value of a
determinant.
7. Interchanging any two rows or columns of A reverses the sign of the
determinant, but this cannot change the singularity of the matrix.
8. The determinant of a triangular matrix is equal to the product of the
elements on the principal diagonal.
9. Let I be the (n × n) identity matrix, and let 0 be the (n × n) zero matrix.
Then |I| = 1 and |0| = 0.

1.3.1 Cofactor Matrix. A cofactor matrix C is a matrix in which every


element aij is replaced by its cofactor |Cij |. An adjoint matrix is the transpose
of a cofactor matrix. Thus, for a 3 × 3 matrix A defined in (1.1.1),
   
$$\mathbf{C} = \begin{pmatrix} |C_{11}| & |C_{12}| & |C_{13}|\\ |C_{21}| & |C_{22}| & |C_{23}|\\ |C_{31}| & |C_{32}| & |C_{33}| \end{pmatrix}, \qquad \text{adj}\,A = \mathbf{C}^T = \begin{pmatrix} |C_{11}| & |C_{21}| & |C_{31}|\\ |C_{12}| & |C_{22}| & |C_{32}|\\ |C_{13}| & |C_{23}| & |C_{33}| \end{pmatrix}. \qquad(1.3.7)$$
Let A be a square matrix. Then its inverse matrix A−1 satisfies the relation

AA−1 = I = A−1 A. (1.3.8)

The inverse matrix A−1 exists only if A is square and nonsingular (|A| ≠ 0). Multiplying
a matrix by its inverse thus reduces it to the identity matrix, a property
similar to that of the reciprocal in scalar algebra. The inverse matrix can
be obtained using the formula

$$A^{-1} = \frac{1}{|A|}\,\text{adj}\,A, \qquad |A| \neq 0. \qquad(1.3.9)$$
 
Example 1.5. Consider the matrix
$$A = \begin{pmatrix} 3 & 2 & -4\\ -2 & 5 & 1\\ 3 & -2 & 7 \end{pmatrix}.$$
Then
$$|A| = 3[5(7) - 1(-2)] - 2[(-2)(7) - (1)(3)] - 4[(-2)(-2) - 5(3)] = 111 + 34 + 44 = 189 \neq 0.$$
So the matrix A is nonsingular, and ρ(A) = 3. The cofactor matrix of A is
$$\mathbf{C} = \begin{pmatrix} \begin{vmatrix} 5 & 1\\ -2 & 7\end{vmatrix} & -\begin{vmatrix} -2 & 1\\ 3 & 7\end{vmatrix} & \begin{vmatrix} -2 & 5\\ 3 & -2\end{vmatrix}\\[1mm] -\begin{vmatrix} 2 & -4\\ -2 & 7\end{vmatrix} & \begin{vmatrix} 3 & -4\\ 3 & 7\end{vmatrix} & -\begin{vmatrix} 3 & 2\\ 3 & -2\end{vmatrix}\\[1mm] \begin{vmatrix} 2 & -4\\ 5 & 1\end{vmatrix} & -\begin{vmatrix} 3 & -4\\ -2 & 1\end{vmatrix} & \begin{vmatrix} 3 & 2\\ -2 & 5\end{vmatrix} \end{pmatrix} = \begin{pmatrix} 37 & 17 & -11\\ -6 & 33 & 12\\ 22 & 5 & 19 \end{pmatrix}.$$
Then the transpose of the cofactor matrix C gives the adjoint matrix:
$$\text{adj}\,A = \mathbf{C}^T = \begin{pmatrix} 37 & -6 & 22\\ 17 & 33 & 5\\ -11 & 12 & 19 \end{pmatrix}.$$
Thus, by formula (1.3.9),
$$A^{-1} = \frac{1}{189}\begin{pmatrix} 37 & -6 & 22\\ 17 & 33 & 5\\ -11 & 12 & 19 \end{pmatrix} \approx \begin{pmatrix} 0.20 & -0.03 & 0.12\\ 0.09 & 0.17 & 0.03\\ -0.06 & 0.06 & 0.10 \end{pmatrix}.$$

To check the answer, evaluate both AA−1 and A−1 A; both of these products
should be equal to I. 
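This check is quick to run numerically. A hedged sketch follows (assuming NumPy is available; the loop and names are illustrative only), which rebuilds the cofactor matrix of Example 1.5 and verifies the inverse:

```python
# Sketch: inverse via the adjoint-matrix formula (1.3.9), checked numerically.
import numpy as np

A = np.array([[3.0, 2.0, -4.0], [-2.0, 5.0, 1.0], [3.0, -2.0, 7.0]])
detA = np.linalg.det(A)                     # 189

# Cofactor matrix: C[i, j] = (-1)^(i+j) * minor(i, j), as in (1.3.5).
C = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

A_inv = C.T / detA                          # adj A divided by |A|
print(np.allclose(A @ A_inv, np.eye(3)))    # True: A A^{-1} = I
```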
Example 1.6. This is a very useful time-saving result. Given
$$A = \begin{pmatrix} a & b\\ c & d\end{pmatrix},$$
find adj A and A−1. Assuming |A| = ad − bc ≠ 0, the cofactor matrix of A is
$$\mathbf{C} = \begin{pmatrix} d & -c\\ -b & a\end{pmatrix}, \quad\text{and thus}\quad \text{adj}\,A = \mathbf{C}^T = \begin{pmatrix} d & -b\\ -c & a\end{pmatrix}.$$
Hence,
$$A^{-1} = \frac{1}{|A|}\,\text{adj}\,A = \frac{1}{ad - bc}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}. \qquad(1.3.10)$$
1.4 Systems of Linear Equations
A system of linear equations can be expressed in the form of a matrix equation.
For example, consider the following system of n linear equations:

a11 x1 + a12 x2 + · · · + a1n xn = b1,
a21 x1 + a22 x2 + · · · + a2n xn = b2,
. . . . . . . . . . . . . . .
an1 x1 + an2 x2 + · · · + ann xn = bn.   (1.4.1)

Then this system can be expressed in the matrix form as

Ax = b, (1.4.2)

where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \cdots & \cdots & \cdots & \cdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{pmatrix}, \qquad(1.4.3)$$

where A is called the coefficient matrix, x the solution vector, and b the
vector of constant terms. Note that x and b are always column vectors.
Example 1.7. Consider

5x1 + 8x2 = 42
4x1 + 9x2 = 39.

Expressing this system as Ax = b, we have
$$A = \begin{pmatrix} 5 & 8\\ 4 & 9\end{pmatrix}, \qquad x = \begin{pmatrix} x_1\\ x_2\end{pmatrix}, \qquad b = \begin{pmatrix} 42\\ 39\end{pmatrix}.$$
Thus,
$$Ax = \begin{pmatrix} 5 & 8\\ 4 & 9\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} 5x_1 + 8x_2\\ 4x_1 + 9x_2\end{pmatrix}_{2\times 1},$$
and so we get the matrix form
$$Ax = b: \quad \begin{pmatrix} 5x_1 + 8x_2\\ 4x_1 + 9x_2\end{pmatrix} = \begin{pmatrix} 42\\ 39\end{pmatrix}.$$

Systems of linear equations can be solved by the following methods.


1.4.1 Solution with the Inverse. Any system of linear equations can be
solved with the inverse. Thus, to solve the equation

$$A_{n\times n}\, x_{n\times 1} = b_{n\times 1},$$
provided that A−1 exists, we multiply both sides of this equation by A−1,
following the laws of conformability:
$$A^{-1}_{n\times n}\, A_{n\times n}\, x_{n\times 1} = A^{-1}_{n\times n}\, b_{n\times 1},$$
which, using A−1 A = I, gives
$$I_{n\times n}\, x_{n\times 1} = A^{-1}_{n\times n}\, b_{n\times 1}.$$
Since Ix = x, we have the formula
$$x_{n\times 1} = A^{-1}_{n\times n}\, b_{n\times 1}. \qquad(1.4.4)$$

Example 1.8. To solve the system of equations in Example 1.7, we have
|A| = 5(9) − 8(4) = 13 ≠ 0. The cofactor matrix of A is
$$\mathbf{C} = \begin{pmatrix} 9 & -4\\ -8 & 5\end{pmatrix}.$$
Then
$$\text{adj}\,A = \mathbf{C}^T = \begin{pmatrix} 9 & -8\\ -4 & 5\end{pmatrix}, \qquad A^{-1} = \frac{1}{13}\begin{pmatrix} 9 & -8\\ -4 & 5\end{pmatrix} = \begin{pmatrix} \tfrac{9}{13} & -\tfrac{8}{13}\\[1mm] -\tfrac{4}{13} & \tfrac{5}{13}\end{pmatrix}.$$
Hence, by formula (1.4.4), we get
$$x = \begin{pmatrix} \tfrac{9}{13} & -\tfrac{8}{13}\\[1mm] -\tfrac{4}{13} & \tfrac{5}{13}\end{pmatrix}\begin{pmatrix} 42\\ 39\end{pmatrix} = \begin{pmatrix} \tfrac{66}{13}\\[1mm] \tfrac{27}{13}\end{pmatrix},$$
which gives x1 = 66/13 ≈ 5.08 and x2 = 27/13 ≈ 2.08.
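A minimal numerical check of formula (1.4.4) (a sketch assuming NumPy; illustrative only):

```python
# Sketch: solve Ax = b by forming the inverse, as in Example 1.8.
import numpy as np

A = np.array([[5.0, 8.0], [4.0, 9.0]])
b = np.array([42.0, 39.0])
x = np.linalg.inv(A) @ b        # x = A^{-1} b, formula (1.4.4)
print(x)                        # [66/13, 27/13] ~ [5.08, 2.08]
```

In numerical practice one would call np.linalg.solve(A, b) rather than form the inverse explicitly, but the code above mirrors the formula in the text.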
1.4.2 Cramer's Rule. To solve the system Ax = b, Cramer's rule is as
follows: Let D denote the coefficient matrix A, and let Di denote the matrix
obtained by replacing the ith column of D (the coefficients of xi) by the
elements of the column vector b. Then, if |D| ≠ 0, the solution of the system
is given by
$$x_i = \frac{|D_i|}{|D|}, \qquad i = 1, 2, \ldots, n. \qquad(1.4.5)$$

Example 1.9. Solve
x − 2z = 3,
−y + 3z = 1,
2x + 5z = 0.
We have
$$|D| = \begin{vmatrix} 1 & 0 & -2\\ 0 & -1 & 3\\ 2 & 0 & 5\end{vmatrix} = -9 \neq 0; \qquad |D_x| = \begin{vmatrix} 3 & 0 & -2\\ 1 & -1 & 3\\ 0 & 0 & 5\end{vmatrix} = -15;$$
$$|D_y| = \begin{vmatrix} 1 & 3 & -2\\ 0 & 1 & 3\\ 2 & 0 & 5\end{vmatrix} = 27; \qquad |D_z| = \begin{vmatrix} 1 & 0 & 3\\ 0 & -1 & 1\\ 2 & 0 & 0\end{vmatrix} = 6.$$
Thus,
$$x = \frac{|D_x|}{|D|} = \frac{-15}{-9} = \frac{5}{3}; \quad y = \frac{|D_y|}{|D|} = \frac{27}{-9} = -3; \quad z = \frac{|D_z|}{|D|} = \frac{6}{-9} = -\frac{2}{3}.$$
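A small implementation sketch of (1.4.5) follows (NumPy assumed; the function name cramer is ours, not a library routine), checked against Example 1.9:

```python
# Sketch: Cramer's rule (1.4.5).
import numpy as np

def cramer(A, b):
    D = np.linalg.det(A)
    if np.isclose(D, 0.0):
        raise ValueError("Cramer's rule requires |D| != 0")
    x = np.empty(len(b))
    for i in range(len(b)):
        Di = A.copy()
        Di[:, i] = b                  # replace column i by the constants b
        x[i] = np.linalg.det(Di) / D
    return x

A = np.array([[1.0, 0.0, -2.0], [0.0, -1.0, 3.0], [2.0, 0.0, 5.0]])
b = np.array([3.0, 1.0, 0.0])
print(cramer(A, b))                   # [ 5/3, -3, -2/3 ]
```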

1.4.3 Gaussian Elimination Method. To solve the system (1.4.1), this
method involves replacing equations by combinations of equations in such a
way that a triangular system in the upper echelon form is obtained:
$$\begin{aligned} x_1 + a_{12}x_2 + \cdots + a_{1,n-1}x_{n-1} + a_{1n}x_n &= b'_1,\\ x_2 + \cdots + a_{2,n-1}x_{n-1} + a_{2n}x_n &= b'_2,\\ &\;\;\vdots\\ x_{n-1} + a_{n-1,n}x_n &= b'_{n-1},\\ x_n &= b'_n. \end{aligned}$$

The last equation determines xn . Then the process of backward substitution


is used to find xn−1 , . . . , x2 , x1 .
Example 1.10. Use the Gaussian elimination method to solve the system
x − 2y + 3z = 4,
2x + y − 4z = 3,
−3x + 4y − z = −2.
In order to reduce the system to the upper triangular form, first a nonzero
coefficient (usually the largest in absolute value) is located and brought to the
upper left corner by interchanges of equations and columns. In this example
it is already in place. Keeping the first equation intact, subtract twice the
first equation from the second (and divide the result by 5), and add three
times the first equation to the third; adding twice the new second equation
to the new third equation then gives
x − 2y + 3z = 4,
y − 2z = −1,
4z = 8.
Thus, by backward substitution, the solution is z = 2, y = 3, x = 4.


Example 1.11. Use the Gaussian elimination method to solve the system
x − 2z + 2w = 1,
−2x + 3y + 4z = −1,
y + z − w = 0,
3x + y − 2z − w = 3.
By following the same procedure, this system reduces to the triangular form
x − 2z + 2w = 1,
y + z − w = 0,
−3z + 7w = 1,
w = 1,
which gives the solution as x = 3, y = −1, z = 2, w = 1.
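The elimination-with-back-substitution procedure can be sketched in code as follows (NumPy assumed; partial pivoting included; an illustration, not a production solver):

```python
# Sketch: Gaussian elimination with partial pivoting and back substitution.
import numpy as np

def gauss_solve(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # Pivot: bring the largest |coefficient| in column k to row k.
        p = k + np.argmax(np.abs(A[k:, k]))
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Backward substitution on the triangular system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[1, -2, 3], [2, 1, -4], [-3, 4, -1]])
b = np.array([4, 3, -2])
print(gauss_solve(A, b))      # [4. 3. 2.], as in Example 1.10
```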

1.5 Definite and Semidefinite Matrices


Let A be an n × n matrix. If there exists a scalar r and a non-zero vector x
such that
Ax = rx, (1.5.1)
then r is called an eigenvalue 1 of A and x is called an eigenvector of A
corresponding to the eigenvalue r. An important result is: r is an eigenvalue
of A iff
|A − rI| = 0. (1.5.2)
The left side of this equation can be expanded in terms of the elements of A
as
$$|A - rI| = \begin{vmatrix} a_{11} - r & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - r & \cdots & a_{2n}\\ \vdots & \vdots & \cdots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - r \end{vmatrix}, \qquad(1.5.3)$$
1
Although λ is generally used to denote an eigenvalue of a matrix, we are using r instead,
so that it does not conflict with the Lagrange multiplier λ.

which yields a polynomial of degree n. In order for r to be an eigenvalue of


A it must satisfy Eq (1.5.2).
For example, the eigenvalues ri, i = 1, 2, of a 2 × 2 matrix A can be found
by solving the quadratic equation
$$r_i = \frac{\mathrm{tr}(A) \pm \sqrt{[\mathrm{tr}(A)]^2 - 4|A|}}{2}, \qquad(1.5.4)$$

where tr(A) = a11 + a22 is the trace of A. In this case, Eq (1.5.2) becomes
$$|A - r_i I| = \begin{vmatrix} a_{11} - r & a_{12}\\ a_{21} & a_{22} - r\end{vmatrix} = 0,$$

solving which we have r2 − (a11 + a22 )r + (a11 a22 − a12 a21 ) = 0, or in matrix
notation,
r2 − tr(A)r + |A| = 0,

which leads to the quadratic formula (1.5.4).


Eq (1.5.2) is called the characteristic equation or characteristic polynomial
for the matrix A. For a non-trivial solution, (A − rI) must be singular.
The eigenvalues of any real symmetric matrix A in a complex vector space
are all real. An (n × n) matrix A is said to be positive semidefinite (PSD) (or
negative semidefinite (NSD)) if the eigenvalues ri are all nonnegative (or all
nonpositive); or, alternatively, if for all z ∈ Rn,
$$z \cdot Az \ge (\le)\; 0.$$
If the inequality is strict for all z ≠ 0, then A is positive definite (PD) (or
negative definite (ND)).
 
Example 1.12. (a) The identity matrix I = \(\begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\) is positive (semi)definite, since for all z = (x, y)T,
$$z \cdot Iz = \begin{pmatrix} x\\ y\end{pmatrix} \cdot \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = x^2 + y^2 \ge 0.$$
Note that −I is negative (semi)definite.
(b) The matrix M = \(\begin{pmatrix} -1 & -1\\ -1 & -1\end{pmatrix}\) is negative semidefinite, since for all z = (x, y)T,
$$z \cdot Mz = \begin{pmatrix} x\\ y\end{pmatrix} \cdot \begin{pmatrix} -x - y\\ -x - y\end{pmatrix} = -(x + y)^2 \le 0,$$
but it is not negative definite, since z · Mz = 0 at x = −y. However, not all
matrices are either positive or negative definite; for example, the matrix
D = \(\begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}\) gives
$$z \cdot Dz = \begin{pmatrix} x\\ y\end{pmatrix} \cdot \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = x^2 - y^2,$$
which takes both signs.
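The eigenvalue test for definiteness is easy to automate. A hedged sketch follows (NumPy assumed; the tolerance tol is an ad hoc choice to absorb floating-point noise), applied to the matrices of this example:

```python
# Sketch: classify a symmetric matrix by the signs of its eigenvalues.
import numpy as np

def classify(M, tol=1e-10):
    r = np.linalg.eigvalsh(M)          # real eigenvalues of a symmetric matrix
    if np.all(r > tol):   return "positive definite"
    if np.all(r < -tol):  return "negative definite"
    if np.all(r >= -tol): return "positive semidefinite"
    if np.all(r <= tol):  return "negative semidefinite"
    return "indefinite"

print(classify(np.eye(2)))                           # positive definite
print(classify(np.array([[-1., -1.], [-1., -1.]])))  # negative semidefinite
print(classify(np.array([[1., 0.], [0., -1.]])))     # indefinite
```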

Some properties of positive (negative) definite and positive (negative)


semidefinite matrices are:
(i) M is PSD [PD] ⇔ −M is NSD [ND].
(ii) M is ND [PD] ⇔ M−1 is ND [PD].
(iii) M is ND [PD] ⇒ M is NSD [PSD], but M NSD [PSD] does not imply M ND [PD].
(iv) M is ND [PD] ⇒ M + M′ is ND [PD].
Proof. (i) z · (−M)z = −(z · Mz). (ii) Put y = M−1 z; then z · M−1 z =
(My) · y = y · M′y, which for symmetric M equals y · My and hence has the
required sign; since (M−1)−1 = M, the converse holds as well. (iii) If z · Mz > 0
for all z ≠ 0, and z · Mz = 0 for z = 0, then z · Mz ≥ 0 for all z; the converse
fails, e.g., when M has a zero eigenvalue. (iv) z · (M + M′)z = 2 z · Mz.

1.6 Special Determinants


We will discuss the following determinants in detail: the Jacobian, the Hes-
sian, and the two bordered Hessians, one involving two functions f and (the
constraint) g, and the other involving one function f only.

1.6.1 Jacobian. The Jacobian determinant |J| is used to test functional


dependence, both linear and nonlinear. It is composed of all the first-order
partial derivatives of a system of equations, arranged in an ordered sequence.
For a system of nonlinear equations

f1 = f1 (x1 , x2 , . . . , xm )
f2 = f2 (x1 , x2 , . . . , xm )
..
.
fn = fn (x1 , x2 , . . . , xm ), (1.6.1)

the Jacobian determinant is given by
$$|J| = \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_m)} = \begin{vmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_m}\\[2mm] \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_m}\\[2mm] \vdots & \vdots & \cdots & \vdots\\ \dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_m} \end{vmatrix}. \qquad(1.6.2)$$

If |J| = 0, the equations are functionally dependent; if |J| ≠ 0, they are
functionally independent. Note that the Jacobian |J| is also denoted by J or
J.
Example 1.13. Given the system
y1 = 5x1 − 4x2,
y2 = 25x1² − 40x1x2 + 16x2²,
the Jacobian is
$$|J| = \begin{vmatrix} 5 & -4\\ 50x_1 - 40x_2 & -40x_1 + 32x_2 \end{vmatrix} = -200x_1 + 160x_2 + 200x_1 - 160x_2 = 0.$$
So there is functional dependence, which is (5x1 − 4x2)² = 25x1² − 40x1x2 + 16x2².
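This test is convenient to run symbolically. A short check of the example (assuming the SymPy library; illustrative only):

```python
# Sketch: functional-dependence test via the Jacobian determinant (1.6.2).
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = sp.Matrix([5*x1 - 4*x2,
               25*x1**2 - 40*x1*x2 + 16*x2**2])
J = F.jacobian([x1, x2])
print(sp.simplify(J.det()))    # 0  => the equations are functionally dependent
```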

1.6.2 Hessian. The Hessian matrix H of a twice-differentiable function f (x)


at a point x ∈ Rn is defined as the square n × n matrix
$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n}\\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n}\\[2mm] \vdots & \vdots & \cdots & \vdots\\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}, \qquad(1.6.3)$$
or, componentwise, by H_{i,j} = ∂²f/∂xi∂xj. The determinant det H is denoted by
|H|. Generally, the function for which the Hessian is used is obvious from the
context.
Like matrices in general, Hessians add commutatively, associatively, and
distributively, and multiplication of a Hessian by a scalar and the product of
two Hessians are defined in the usual way.
Caution: Some authors denote the Hessian H of a twice-differentiable function
f = f(x1, . . . , xn) incorrectly as ∇²f(x). This notation is in conflict with
the established notation for the Laplacian of a twice-differentiable function
f = f(x1, . . . , xn), denoted by ∇²f and defined by
$$\nabla^2 f = \nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2},$$

which is obviously not a matrix, but a second-order elliptic partial differential


operator. Moreover, ∇2 f is the trace of the matrix (1.6.3).
As seen from the definition (1.6.3), the mixed partial derivatives occupy
the entries that are off-diagonal. By Clairaut’s theorem, so long as they are
continuous, the order of differentiation does not matter. Thus,

∂  ∂f  ∂  ∂f 
= .
∂xi ∂xj ∂xj ∂xi

Hence, if the second derivatives of f are all continuous in a neighborhood D,


then the Hessian of f is a symmetric matrix throughout D.
The column vector of the first-order partial derivatives of a function f is
denoted by
$$\Big(\frac{\partial f}{\partial x_1}\;\; \frac{\partial f}{\partial x_2}\;\; \cdots\;\; \frac{\partial f}{\partial x_n}\Big)^T.$$

If this column vector is zero at a point x, then f has a critical point at x,


which is denoted by x∗ .
The main application of the Hessian is found in large-scale optimization
problems that use Newton-type methods. The second derivative test (§2.4)
checks if the point x is a local extremum. For example, a function f attains a
local maximum, or a local minimum, at x according as the Hessian is positive
definite, or negative definite, at x. The function f has a saddle point at x, if
the Hessian has both positive and negative eigenvalues. In other cases the test
is inconclusive. However, if the Hessian is positive or negative semidefinite in
a neighborhood of x, we can conclude that f is locally convex or locally concave,
respectively.
The second derivative test for a single variable function f is explained in
§2.3. In the case of a bivariate function, the discriminant |H| can be used
since it is the product of the eigenvalues. If this product is positive, then the
eigenvalues are both positive, or both negative; if it is negative, then the two
eigenvalues have different signs; if it is zero, then the second derivative test
fails.
For functions of several variables, the second-order conditions that are suf-
ficient for a local maximum or minimum can be expressed in terms of the
sequence of principal (i.e., the upper-leftmost) minors (determinants of sub-
matrices) of the Hessian. These conditions are a special case of bordered
Hessian, defined below, which is used for the second-derivative test in certain
constrained optimization problems.
For example, in an optimization problem with a function f of two variables
(x, y), if the first-order conditions fx = fy = 0 are satisfied, a sufficient

condition for a function z = f(x, y) to be an optimum is:
(1) fxx, fyy > 0 for a minimum; fxx, fyy < 0 for a maximum;
(2) fxx fyy > (fxy)².
This is the Hessian test for second-order derivatives; for a symmetric (2 × 2)
matrix the Hessian is defined by
$$|H| = \begin{vmatrix} f_{xx} & f_{xy}\\ f_{yx} & f_{yy} \end{vmatrix}. \qquad(1.6.4)$$
If the first element on the principal diagonal, known as the first principal
minor and denoted by |H1| = fxx, is positive, and the second principal minor

|H2 | = |H| = fxx fyy − (fxy )2 > 0,

then the second-order conditions for a minimum are met. When |H1 | > 0
and |H2 | > 0, the Hessian |H| is called positive definite. A positive definite
Hessian satisfies the second-order conditions for a minimum.
If the first principal minor |H1 | = fxx < 0 and |H2 | > 0, the Hessian |H| is
called negative definite. A negative definite Hessian satisfies the second-order
conditions for a maximum.
Example 1.14. Consider f = 3x² − xy + 2y² − 4x − 7y + 12. Then
fxx = 6, fxy = −1, fyy = 4. Thus,
$$H = \begin{pmatrix} 6 & -1\\ -1 & 4\end{pmatrix},$$
which gives |H1| = 6 > 0 and |H2| = |H| = 24 − 1 = 23 > 0. Hence, the
Hessian is positive definite, and f is minimized at the critical values, which
are given by the solution of fx = 6x − y − 4 = 0, fy = −x + 4y − 7 = 0, i.e.,
at x* = 1, y* = 2.
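The entire computation can be reproduced symbolically (a sketch assuming SymPy; illustrative):

```python
# Sketch: the Hessian test applied to Example 1.14.
import sympy as sp

x, y = sp.symbols('x y')
f = 3*x**2 - x*y + 2*y**2 - 4*x - 7*y + 12

crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
H = sp.hessian(f, (x, y))
print(crit)                    # {x: 1, y: 2}
print(H[0, 0], H.det())        # 6 23: |H1| > 0 and |H2| > 0, so H is
                               # positive definite and f has a minimum there
```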

1.6.3 Bordered Hessian: Two Functions. Given a function f with a


constraint function g such that g(x) = c, the bordered Hessian is defined by
$$\bar H(f, g) = \begin{pmatrix} 0 & \dfrac{\partial g}{\partial x_1} & \dfrac{\partial g}{\partial x_2} & \cdots & \dfrac{\partial g}{\partial x_n}\\[2mm] \dfrac{\partial g}{\partial x_1} & \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n}\\[2mm] \dfrac{\partial g}{\partial x_2} & \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n}\\[2mm] \vdots & \vdots & \vdots & \cdots & \vdots\\ \dfrac{\partial g}{\partial x_n} & \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. \qquad(1.6.5)$$

In the case of m constraints, the zero in the top-left first element is to be


replaced by an m × m square block of zeros, and there will be m border rows
at the top and m border columns at the left.
Note that since a bordered Hessian can neither be positive definite nor
negative definite, the rules stated above about extrema being characterized
by a positive definite or negative definite Hessian do not apply. The second de-
rivative test in such a case consists of signs of restrictions of the determinants
of a certain set of (n − m) sub-matrices of the bordered Hessian.
In the case of a function f(x, y) subject to a constraint g(x, y) = k, the bordered
Hessian for constrained optimization is defined as follows: form a new function
F(x, y) = f(x, y) + λ[k − g(x, y)], where the first-order conditions are Fx =
Fy = Fλ = 0. Then the second-order conditions can be expressed in terms of
a bordered Hessian |H̄| in one of the following two ways:
$$|\bar H| = \begin{vmatrix} F_{xx} & F_{xy} & g_x\\ F_{yx} & F_{yy} & g_y\\ g_x & g_y & 0 \end{vmatrix}, \quad\text{or}\quad \begin{vmatrix} 0 & g_x & g_y\\ g_x & F_{xx} & F_{xy}\\ g_y & F_{yx} & F_{yy} \end{vmatrix}, \qquad(1.6.6)$$

which is simply the usual Hessian (1.6.4) (with f replaced by F), i.e.,
$$|H| = \begin{vmatrix} F_{xx} & F_{xy}\\ F_{yx} & F_{yy} \end{vmatrix},$$

bordered by the first derivatives of the constraint with zero on the principal
diagonal. The order of a bordered Hessian is determined by the order of
the principal minor being bordered. Thus, |H̄| in (1.6.6) represents a second
bordered principal minor |H̄2 |, because the principal minor being bordered is
2 × 2.
In general, for a function F (x1 , x2 , . . . , xn ) in n variables subject to the
constraint g(x1 , x2 , . . . , xn ), the bordered Hessian, defined by (1.6.5), can also
be expressed as
$$|\bar H| = \begin{vmatrix} F_{11} & F_{12} & \cdots & F_{1n} & g_1\\ F_{21} & F_{22} & \cdots & F_{2n} & g_2\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ F_{n1} & F_{n2} & \cdots & F_{nn} & g_n\\ g_1 & g_2 & \cdots & g_n & 0 \end{vmatrix}, \quad\text{or}\quad \begin{vmatrix} 0 & g_1 & g_2 & \cdots & g_n\\ g_1 & F_{11} & F_{12} & \cdots & F_{1n}\\ g_2 & F_{21} & F_{22} & \cdots & F_{2n}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ g_n & F_{n1} & F_{n2} & \cdots & F_{nn} \end{vmatrix}, \qquad(1.6.7)$$
where the n × n principal minor is being bordered. If all the principal minors
are negative, i.e., if |H̄2 |, |H̄3 |, . . . , |H̄n | < 0, then the bordered Hessian is
positive definite, and a positive definite Hessian always satisfies the sufficient
condition for a relative (local) minimum. Similarly, if the principal minors
alternate in sign from positive to negative, i.e., if |H̄2 | > 0, |H̄3 | < 0, |H̄4 | > 0,

and so on, then the bordered Hessian is negative definite, and a negative
definite Hessian always satisfies the sufficient condition for F to be concave
and have a relative (local) maximum.
If |H| = 0 and |H1| = 0 = |H2|, then H is not negative definite, but it
is negative semidefinite, with |H1| ≤ 0 and |H2| = |H| ≥ 0. For the
semidefinite test, however, we must also check the signs of these discriminants
for adj H. If |H1| < 0 and |H2| = |H| = 0, both discriminants satisfy the
negative semidefinite conditions, and the sufficient condition for F to be
concave and have a relative (local) maximum is met.
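A sketch of how (1.6.6) is assembled in practice (SymPy assumed; the problem, maximize f = xy subject to g = x + y = k, and all variable names are ours, chosen only for illustration):

```python
# Sketch: the 3 x 3 bordered Hessian of (1.6.6) for a hypothetical problem.
import sympy as sp

x, y, lam, k = sp.symbols('x y lambda k')
f = x*y
g = x + y
F = f + lam*(k - g)            # the function F of Section 1.6.3

Hbar = sp.Matrix([
    [0,             sp.diff(g, x),     sp.diff(g, y)],
    [sp.diff(g, x), sp.diff(F, x, 2),  sp.diff(F, x, y)],
    [sp.diff(g, y), sp.diff(F, y, x),  sp.diff(F, y, 2)],
])
print(Hbar)        # Matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(Hbar.det())  # 2 > 0, so |Hbar_2| > 0: by the sign rule above, the
                   # critical point (x = y = k/2) is a constrained maximum
```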

1.6.4 Bordered Hessian: Single Function. If a function f (x), x ∈ Rn , is


twice continuously differentiable, then the bordered Hessian |B̄| for a single
function is defined by

0 f1 f2 ··· fn
f1 f11 f12 ··· f1n
|B̄| = f2 f21 f22 ··· f2n . (1.6.13)
··· ··· ··· ··· ···
fn fn1 fn2 ··· fnn

Note that the bordered Hessian |B̄| is composed of the first derivatives of
the function f rather than of an extraneous constraint g. The leading principal
minors are
$$|\bar B_1| = \begin{vmatrix} 0 & f_1\\ f_1 & f_{11}\end{vmatrix}, \quad |\bar B_2| = \begin{vmatrix} 0 & f_1 & f_2\\ f_1 & f_{11} & f_{12}\\ f_2 & f_{21} & f_{22}\end{vmatrix}, \quad \ldots, \quad |\bar B_n| = |\bar B|. \qquad(1.6.14)$$

This Hessian is used to check quasi-concavity and quasi-convexity for a


function z = f (x), which are discussed in Chapters 6 and 7. We will state
two conditions, one of which is necessary and the other is sufficient, and
both relate to quasi-concavity on a domain consisting only of the nonnegative
orthant (the n-dimensional analogue of the nonnegative quadrant) which is
defined by x1 , . . . , xn ≥ 0. These conditions are as follows:
The necessary condition for a function z = f (x) to be quasi-concave on the
nonnegative orthant is (see §6.2; also Arrow and Enthoven [1961])
$$|\bar B_1| \le 0, \quad |\bar B_2| \ge 0, \quad \ldots, \quad |\bar B_n| \begin{cases} \le 0 & \text{if } n \text{ is odd},\\ \ge 0 & \text{if } n \text{ is even}, \end{cases} \qquad(1.6.15)$$
where the partial derivatives are evaluated in the nonnegative orthant. Note
that the first condition in (1.6.15) is automatically satisfied, since |B̄1| =
−f1² = −(∂f/∂x1)² ≤ 0.

The sufficient condition for f to be strictly quasi-concave on the nonnegative
orthant is that
$$|\bar B_1| < 0, \quad |\bar B_2| > 0, \quad \ldots, \quad |\bar B_n| \begin{cases} < 0 & \text{if } n \text{ is odd},\\ > 0 & \text{if } n \text{ is even}, \end{cases} \qquad(1.6.16)$$
where the partial derivatives are evaluated in the nonnegative orthant.


Notice that unlike the bordered Hessian for a function f that is to be
optimized under a constraint g, the bordered Hessian |B̄| is used to check
whether a given function is quasi-concave or quasi-convex. Examples where
this Hessian is used are provided in Chapters 6 and 7.
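A sketch of the minors (1.6.14) and the sign pattern (1.6.16) follows (SymPy assumed; the test function f = √(xy) is our hypothetical choice, not from the text):

```python
# Sketch: single-function bordered Hessian (1.6.13) and its leading minors.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = sp.sqrt(x*y)
fx, fy = sp.diff(f, x), sp.diff(f, y)

B = sp.Matrix([
    [0,  fx,               fy],
    [fx, sp.diff(f, x, 2), sp.diff(f, x, y)],
    [fy, sp.diff(f, y, x), sp.diff(f, y, 2)],
])
B1 = sp.simplify(B[:2, :2].det())   # -fx**2, i.e., -y/(4*x) < 0
B2 = sp.simplify(B.det())           # 1/(4*sqrt(x*y)) > 0 on x, y > 0
print(B1, B2)
# Signs alternate as in (1.6.16), so f is strictly quasi-concave there.
```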

1.7 Exercises
1.1. Use any method to solve each system of equations. If the system has
no solution, mark it as inconsistent.
 
(i) x − 2y + 3z = 7, 2x + y + z = 4, −3x + 2y − 2z = −10.
(ii) x + y − z = 6, 3x − 2y + z = −5, x + 3y − 2z = 14.
(iii) 2x − 2y − 2z = 2, 2x + 3y + z = 2, 3x + 2y = 0.
(iv) x + 2y − z = −3, 2x − 4y + z = −7, −2x + 2y − 3z = 4.
Ans. (i) x = 2, y = −1, z = 1; (ii) x = 1, y = 3, z = −2; (iii) inconsistent;
(iv) x = −3, y = 1/2, z = 1.
1.2. A store sells almonds for $6.00 per pound and peanuts for $1.50 per
pound. The manager decides to mix 40 pounds of peanuts with some almonds
and sell the mixture for $3.00 per pound. How many pounds of almonds should
be mixed with the peanuts so that the mixture generates the same revenue
as the two kinds of nuts sold separately? Ans. 20 pounds.
1.3. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix}$$
be a 3 × 3 matrix. Show that interchanging columns 1 and 3 changes the value
of the determinant by a factor of −1.
Solution.
$$\begin{vmatrix} a_{13} & a_{12} & a_{11}\\ a_{23} & a_{22} & a_{21}\\ a_{33} & a_{32} & a_{31}\end{vmatrix} = a_{13}(a_{22}a_{31} - a_{21}a_{32}) - a_{12}(a_{23}a_{31} - a_{21}a_{33}) + a_{11}(a_{23}a_{32} - a_{22}a_{33})$$
$$= -[a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{31}a_{22})] = -\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix}.$$

1.4. Given the column vector u and the row vector v, find the matrix uv:
(a) u = [3 2 5 4]T (4 × 1) and v = [2 7 4 8] (1 × 4).
Ans. The matrix
$$uv = \begin{pmatrix} 6 & 21 & 12 & 24\\ 4 & 14 & 8 & 16\\ 10 & 35 & 20 & 40\\ 8 & 28 & 16 & 32\end{pmatrix}_{4\times 4}.$$
(b) u = [4 7 8 1] (1 × 4) and v = [2 9 7]T (3 × 1). Ans. The matrix uv is
not defined, since the dimensions are not conformable.

1.5. Consider the system of equations
a1 x + b1 y = c1,
a2 x + b2 y = c2.
If D = a1 b2 − a2 b1 ≠ 0, use the matrix method to show that the solution is
$$x = \frac{c_1 b_2 - c_2 b_1}{D}, \qquad y = \frac{a_1 c_2 - a_2 c_1}{D}.$$
Solution. If a1 ≠ 0, reduce the augmented matrix:
$$\left(\begin{array}{cc|c} a_1 & b_1 & c_1\\ a_2 & b_2 & c_2\end{array}\right) \to \left(\begin{array}{cc|c} 1 & \frac{b_1}{a_1} & \frac{c_1}{a_1}\\ a_2 & b_2 & c_2\end{array}\right) \to \left(\begin{array}{cc|c} 1 & \frac{b_1}{a_1} & \frac{c_1}{a_1}\\[1mm] 0 & \frac{a_1 b_2 - a_2 b_1}{a_1} & \frac{a_1 c_2 - a_2 c_1}{a_1}\end{array}\right)$$
$$\to \left(\begin{array}{cc|c} 1 & \frac{b_1}{a_1} & \frac{c_1}{a_1}\\[1mm] 0 & 1 & \frac{a_1 c_2 - a_2 c_1}{D}\end{array}\right) \to \left(\begin{array}{cc|c} 1 & 0 & \frac{c_1 b_2 - c_2 b_1}{D}\\[1mm] 0 & 1 & \frac{a_1 c_2 - a_2 c_1}{D}\end{array}\right),$$
which yields x = (c1 b2 − c2 b1)/D and y = (a1 c2 − a2 c1)/D.
If a1 = 0, then D = −a2 b1 ≠ 0 forces a2 ≠ 0 and b1 ≠ 0. The first equation
gives y = c1/b1 = (a1 c2 − a2 c1)/D, and substituting into the second gives
x = (c2 − b2 c1/b1)/a2 = (c1 b2 − c2 b1)/D.
1.6. Given a and x as vectors, ei as the unit column vector with ith
element 1 and all other elements zero, [e] = [e1, . . . , en] as the column
vector with ith element ei for all i = 1, . . . , n, and A and H as square matrices
in Rn, verify the dimensions in the following table.

Expression      Dimension           Expression        Dimension
x               n × 1               a                 n × 1
f(x)            1 × 1               f(a)              1 × 1
e1              n × 1               [e]               n × 1
∂f(x)/∂x        n × 1               (∂f(x)/∂x)T       1 × n
∇f(x)           n × 1, or 1 × n †   ∇²f = tr H        1 × 1
A               n × n               H                 n × n

† depending on whether ∇f is written as the column vector with components
∂f/∂x1, . . . , ∂f/∂xn, or as the corresponding row vector.
1.7. Determine the rank of the following matrices:
(a) A = \(\begin{pmatrix} -2 & 7 & 3\\ 1 & 6 & 4\\ 3 & -8 & 5\end{pmatrix}\). Ans. |A| = −153 ≠ 0. Thus, the matrix A is
nonsingular and the three rows and three columns are linearly independent.
Hence, ρ(A) = 3.
(b) B = \(\begin{pmatrix} 7 & -2 & 5\\ 2 & 10 & -4\\ -3 & -15 & 6\end{pmatrix}\). Ans. |B| = 0. Hence the matrix B is
singular, and the three rows and three columns are not linearly independent,
so ρ(B) ≠ 3. However, the 2 × 2 submatrix in the upper left corner gives
$$\begin{vmatrix} 7 & -2\\ 2 & 10\end{vmatrix} = 74 \neq 0.$$
Thus, ρ(B) = 2. Note that row 3 is −1.5 times row 2.
1.8. If two rows (or columns) of a 3 × 3 determinant are equal, then the
value of the determinant is zero.
Solution. With rows 1 and 3 equal, expansion along the first row gives
$$\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{11} & a_{12} & a_{13}\end{vmatrix} = a_{11}(a_{22}a_{13} - a_{23}a_{12}) - a_{12}(a_{21}a_{13} - a_{23}a_{11}) + a_{13}(a_{21}a_{12} - a_{22}a_{11}),$$
in which all six terms cancel in pairs, so the determinant is zero. Since
|AT| = |A| (property 1 of §1.3), the same holds when two columns are equal.
 
1.9. Show that the matrix A = \(\begin{pmatrix} 4 & 6\\ 2 & 3\end{pmatrix}\) has no inverse.
Solution 1. Row-reduce the augmented matrix [A | I]:
$$\left(\begin{array}{cc|cc} 4 & 6 & 1 & 0\\ 2 & 3 & 0 & 1\end{array}\right) \to \left(\begin{array}{cc|cc} 1 & \frac{3}{2} & \frac{1}{4} & 0\\ 2 & 3 & 0 & 1\end{array}\right) \to \left(\begin{array}{cc|cc} 1 & \frac{3}{2} & \frac{1}{4} & 0\\ 0 & 0 & -\frac{1}{2} & 1\end{array}\right).$$
Since this reduced form shows that the identity matrix I cannot appear on
the left of the vertical bar, the matrix A has no inverse.
Solution 2. Since |A| = 12 − 12 = 0, the matrix A has no inverse, in view
of formula (1.3.10).
1.10. Consider the system
$$\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix} x_t\\ y_t\end{pmatrix} = \begin{pmatrix} a_{13} & a_{14}\\ a_{23} & a_{24}\end{pmatrix}\begin{pmatrix} x_{t-1}\\ y_{t-1}\end{pmatrix} + \begin{pmatrix} b_1\\ b_2\end{pmatrix},$$
or, in matrix form,
$$A\,x_t = E\,x_{t-1} + b, \qquad(1.6.11)$$
where x = (x, y). Prove that (E − ri A)Ci = 0.
Proof. Without loss of generality assume that b is a null matrix, and also
assume distinct real roots. Then we have
$$x_t = k_i C_i (r_i)^t, \qquad x_{t-1} = k_i C_i (r_i)^{t-1}. \qquad(1.6.12)$$
Substituting these into the homogeneous form of the given equation, we get
$$A k_i C_i (r_i)^t = E k_i C_i (r_i)^{t-1},$$
or
$$E k_i C_i (r_i)^{t-1} - A k_i C_i (r_i)^t = 0,$$
which, when evaluated at t = 1, gives (E − ri A)Ci = 0.


1.11. Use any method to solve the following systems:
(a) ln x = 4 ln y, log₃ x = 2 + 2 log₃ y;
(b) logₓ y = 3, logₓ(4y) = 5.
Ans. (a) x = 81, y = 3; (b) x = 2, y = 8.
1.12. Use the Gaussian elimination method to solve the system

10x + 7y + 8z + 7w = 32, 5x + 6y + 5z + 7w = 23,


6x + 10y + 9z + 8w = 33, 5x + 9y + 10z + 7w = 31.

Solution. In order to reduce the system to the upper triangular form,
first the largest coefficient in absolute value is located and brought to the
upper left corner by interchanges of equations and columns; this is called the
first pivot. In this example it is already in place. Keeping the first equation
intact, multiply the second and fourth equations by 2 and the third equation
by 5/3, and subtract each of them from the first equation, to give
10x + 7y + 8z + 7w = 32,
−5y − 2z − 7w = −14,
−(29/3)y − 7z − (19/3)w = −23,
−11y − 12z − 7w = −30.

Now eliminate y from the second and third equations, and also from the third
and fourth equations, and then z from the third and fourth equations. This
gives the echelon form, from which we first find w, then z, then y, and finally
x. Ans. x = y = z = w = 1.
1.13. Use the Hessian to determine whether the function F(x, y, z) =
2x² − 7x − xy + 5y² − 3y + 4yz + 6z² + 3z − 4xz is minimized or maximized
at the critical points. Solution. The first-order criterion gives
Fx = 4x − 7 − y − 4z = 0; Fy = −x + 10y − 3 + 4z = 0; Fz = 4y + 12z + 3 − 4x = 0,
which in the matrix form Ax = b is
$$\begin{pmatrix} 4 & -1 & -4\\ -1 & 10 & 4\\ -4 & 4 & 12\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} 7\\ 3\\ -3\end{pmatrix}.$$
Using Cramer's rule, we get |A| = 4(120 − 16) + 1(−12 + 16) − 4(−4 + 40) = 276, and
$$|A_1| = \begin{vmatrix} 7 & -1 & -4\\ 3 & 10 & 4\\ -3 & 4 & 12\end{vmatrix} = 608, \quad |A_2| = \begin{vmatrix} 4 & 7 & -4\\ -1 & 3 & 4\\ -4 & -3 & 12\end{vmatrix} = 104, \quad |A_3| = \begin{vmatrix} 4 & -1 & 7\\ -1 & 10 & 3\\ -4 & 4 & -3\end{vmatrix} = 99;$$
thus, the critical points are x* = 608/276 ≈ 2.20, y* = 104/276 ≈ 0.38, and
z* = 99/276 ≈ 0.36. Next, taking the second-order derivatives we have
Fxx = 4, Fyy = 10, Fzz = 12, Fxy = −1 = Fyx, Fyz = 4 = Fzy, Fzx = −4 = Fxz,
which yields the Hessian
$$|H| = \begin{vmatrix} 4 & -1 & -4\\ -1 & 10 & 4\\ -4 & 4 & 12\end{vmatrix}.$$
Since |H1| = 4 > 0, |H2| = 40 − 1 = 39 > 0, and |H3| = |H| = 276 > 0,
we find that |H| is positive definite, which means that F(x, y, z) is minimized
at the critical points.
 
1.14. Find the characteristic roots of the matrix A = \(\begin{pmatrix} -8 & 4\\ 4 & -8\end{pmatrix}\).
Ans. Solving |A − rI| = 0, we get
$$|A - rI| = \begin{vmatrix} -8 - r & 4\\ 4 & -8 - r\end{vmatrix} = r^2 + 16r + 48 = 0,$$
which gives r = −4, −12. Since both characteristic roots are negative, the
matrix A is negative definite. Also, check that the trace (the sum of the
principal diagonal) of A equals the sum of the two characteristic roots:
−16 = −4 + (−12).
 
1.15. Find the inverse of the matrix
$$A = \begin{pmatrix} 25 & 61 & -12\\ 18 & -2 & 4\\ 8 & 35 & 21\end{pmatrix}. \qquad \text{Ans. } A^{-1} \approx \begin{pmatrix} 0.01 & 0.05 & -0.01\\ 0.01 & -0.02 & 0.01\\ -0.02 & 0.01 & 0.03\end{pmatrix}.$$
−0.02 0.01 0.03

1.16. Use matrix inversion to solve the following systems of equations:
(a) 5x + 3y = 29, 2x + 7y = 58; and (b) 6x + 9y = 48, 2x + 5y = 36.
Solution. (a) Write the system in the matrix form Ax = b:
$$\begin{pmatrix} 5 & 3\\ 2 & 7\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 29\\ 58\end{pmatrix}.$$
Then |A| = 29 and adj A = \(\begin{pmatrix} 7 & -3\\ -2 & 5\end{pmatrix}\); thus A⁻¹ = (1/29)\(\begin{pmatrix} 7 & -3\\ -2 & 5\end{pmatrix}\),
giving x = A⁻¹b = (1, 8)T.
(b) Write the system in the matrix form Ax = b:
$$\begin{pmatrix} 6 & 9\\ 2 & 5\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 48\\ 36\end{pmatrix}.$$
Then |A| = 12 and adj A = \(\begin{pmatrix} 5 & -9\\ -2 & 6\end{pmatrix}\); thus A⁻¹ = (1/12)\(\begin{pmatrix} 5 & -9\\ -2 & 6\end{pmatrix}\),
giving x = A⁻¹b = (−7, 10)T.

1.17. The equilibrium conditions for two related goods are given by 6P1 −
9P2 = 42, −P1 + 7P2 = 66, where P1 and P2 are the respective prices. Find
the equilibrium prices P1 and P2.
Solution. The system of equations in the matrix form Ax = b is
$$\begin{pmatrix} 6 & -9\\ -1 & 7\end{pmatrix}\begin{pmatrix} P_1\\ P_2\end{pmatrix} = \begin{pmatrix} 42\\ 66\end{pmatrix}.$$
Then |A| = 33 and adj A = \(\begin{pmatrix} 7 & 9\\ 1 & 6\end{pmatrix}\), so A⁻¹ = (1/33)\(\begin{pmatrix} 7 & 9\\ 1 & 6\end{pmatrix}\). This yields
$$\begin{pmatrix} P_1\\ P_2\end{pmatrix} = A^{-1}b = \frac{1}{33}\begin{pmatrix} 888\\ 438\end{pmatrix} \approx \begin{pmatrix} 26.91\\ 13.27\end{pmatrix}.$$

1.18. The Bayou Steel Company produces stainless steel and aluminum
containers. On a typical day, they manufactured 750 steel containers with
10-gallon capacity, 500 with 5-gallon capacity, and 600 with 1-gallon capacity.
On the same day they manufactured 900 aluminum containers with 10-gallon
capacity, 700 with 5-gallon capacity, and 1100 with 1-gallon capacity. (a)
Represent the above data as two different matrices. (b) If the amount of the
material used in the 10-gallon capacity containers is 20 pounds, that used in
5-gallon containers is 12 pounds, and that in 1-gallon containers is 5 pounds,
find the matrix representing the amount of material. (c) If the stainless steel
costs $0.25 per pound and aluminum costs $0.10 per pound, find the matrix
representing cost. (d) Find the total cost of the day’s production.
Solution. (a) The data can be represented as
$$\begin{pmatrix} 750 & 500 & 600\\ 900 & 700 & 1100\end{pmatrix}_{2\times 3}, \quad\text{or}\quad \begin{pmatrix} 750 & 900\\ 500 & 700\\ 600 & 1100\end{pmatrix}_{3\times 2};$$
(b) the amount of material:
$$\begin{pmatrix} 750 & 500 & 600\\ 900 & 700 & 1100\end{pmatrix}\begin{pmatrix} 20\\ 12\\ 5\end{pmatrix} = \begin{pmatrix} 24000\\ 31900\end{pmatrix}_{2\times 1};$$
(c) the cost matrix is [0.25  0.10]₁ₓ₂; and (d) the total cost of the day's
production is
$$[\,0.25 \;\; 0.10\,]\begin{pmatrix} 24000\\ 31900\end{pmatrix} = \$9190.$$
1.19. Use the Hessian to determine whether the function F(x, y, z) =
−4x² + 9x + xz − 2y² + 3y + 2yz − 6z² is minimized or maximized at the
critical points. Solution. The first-order criterion gives
Fx = −8x + 9 + z = 0; Fy = −4y + 3 + 2z = 0; Fz = x + 2y − 12z = 0,
which in the matrix form Ax = b is
$$\begin{pmatrix} -8 & 0 & 1\\ 0 & -4 & 2\\ 1 & 2 & -12\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} -9\\ -3\\ 0\end{pmatrix}.$$
Using Cramer's rule, we get |A| = −8(44) + 1(4) = −348, and
$$|A_1| = \begin{vmatrix} -9 & 0 & 1\\ -3 & -4 & 2\\ 0 & 2 & -12\end{vmatrix} = -402, \quad |A_2| = \begin{vmatrix} -8 & -9 & 1\\ 0 & -3 & 2\\ 1 & 0 & -12\end{vmatrix} = -303, \quad |A_3| = \begin{vmatrix} -8 & 0 & -9\\ 0 & -4 & -3\\ 1 & 2 & 0\end{vmatrix} = -84;$$
thus, the critical points are x* = (−402)/(−348) ≈ 1.16, y* = (−303)/(−348) ≈ 0.87,
and z* = (−84)/(−348) ≈ 0.24. Next, taking the second-order derivatives we
have Fxx = −8, Fyy = −4, Fzz = −12, Fxy = 0 = Fyx, Fyz = 2 = Fzy,
Fzx = 1 = Fxz, which yields the Hessian
$$|H| = \begin{vmatrix} -8 & 0 & 1\\ 0 & -4 & 2\\ 1 & 2 & -12\end{vmatrix}.$$
Since |H1| = −8 < 0, |H2| = 32 > 0, and |H3| = |H| = −348 < 0, we find
that |H| is negative definite, which means that F(x, y, z) is maximized at
the critical points.
1.20. Maximize the total profit function P for a manufacturing firm producing
two related goods, in quantities x and y, where the demand functions
are defined by P1 = 60 − 4x − 2y and P2 = 40 − x − 4y, and the total cost
function is TC = 4x² + xy + 2y². Solution. Let P = TR − TC, where
TR = P1 x + P2 y. Thus,
P = TR − TC = (60 − 4x − 2y)x + (40 − x − 4y)y − (4x² + xy + 2y²)
= 60x − 8x² − 4xy + 40y − 6y².
Thus, using the first-order criterion, we have Px = 60 − 16x − 4y = 0,
Py = 40 − 4x − 12y = 0, which in matrix form Ax = b is
$$\begin{pmatrix} -16 & -4\\ -4 & -12\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} -60\\ -40\end{pmatrix}.$$
Using Cramer's rule, |A| = 176, |A1| = 560, |A2| = 400, giving x* = 560/176 ≈ 3.18
and y* = 400/176 ≈ 2.27. The second-order derivatives are Pxx = −16,
Pyy = −12, Pxy = −4 = Pyx, and the Hessian is
$$|H| = \begin{vmatrix} -16 & -4\\ -4 & -12\end{vmatrix} = |A| = 176 > 0,$$
with |H1| = −16 < 0. Thus, |H| is negative definite, and P is maximized at
(x*, y*).
2
Differential Calculus

Some basic concepts and results from real analysis are presented. The topics
include limit theorems, differentiation, criterion for concavity and related the-
orems, and vector-valued functions. Proofs for almost all of the results can be
found in many textbooks on calculus, e.g., Boas [1996], Hardy [1967], Royden
[1968], and Rudin [1976].

2.1 Definitions
A function f is a rule which assigns to each value of a variable x, called the
argument of the function, one and only one value y = f (x) known as the value
of the function at x. The domain of a function f , denoted dom(f ), is the set
of all possible values of x; the range of f , denoted by R(f ), is the set of all
possible values for f (x). Examples of functions are:
Linear function: f(x) = mx + b.
Quadratic function: f(x) = ax² + bx + c, a ≠ 0.
Polynomial function of degree n: f(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x +
a₀, aₙ ≠ 0, where n is a nonnegative integer.
g(x)
Rational function: f (x) = , where g(x) and h(x) are both polynomials
h(x)
and h(x) 6= 0.
Power function: f (x) = axn , where n is any real number.
2.1.1 Limit of a Function at a Point. Let a function f be defined through-
out an open interval containing a, except possibly at a itself. Then the limit
of f (x) as x approaches a is L, i.e.,
lim f (x) = L, (2.1.1)
x→a

if for every ε > 0 there corresponds a δ > 0 such that |f(x) − L| < ε whenever
0 < |x − a| < δ. In other words, lim_{x→a} f(x) = L means that for every ε > 0
there corresponds a δ > 0 such that f(x) is in the interval (L − ε, L + ε)
whenever x is in the interval (a − δ, a + δ) and x ≠ a. Moreover, if f(x) has
a limit L as x → a, then L is unique and finite.
Example 2.1. (a) lim_{x→2} (5x − 7) = 3; (b) lim_{x→3} c = c, where c is a constant;
(c) lim_{x→0} |x|/x does not exist, because in every interval (−δ, δ) there are numbers
such that |x|/x = 1 and other numbers such that |x|/x = −1, so the limit is
not unique; (d) if f(x) is the greatest integer function defined as f(x) = ⌊x⌋,
i.e., f denotes the largest integer z such that z ≤ x and is known as the
floor of x, and if n is any integer, then the limit from the right is
lim_{x→n⁺} ⌊x⌋ = n, while the limit from the left is lim_{x→n⁻} ⌊x⌋ = n − 1,
and hence the limit lim_{x→n} ⌊x⌋ does not exist; (e) lim_{x→0} (sin x)/x = 1.

Example 2.2. Let f be defined as follows: f(x) = 0 if x is rational, and
f(x) = 1 if x is irrational. Then for every real number a, lim_{x→a} f(x) does
not exist.

2.2. Theorems on Limits


Some useful theorems on limits are given below, without proof.
Theorem 2.1. If lim_{x→a} f(x) = L and lim_{x→a} g(x) = M, then
(i) lim_{x→a} [f(x) ± g(x)] = L ± M;
(ii) lim_{x→a} [f(x) · g(x)] = L · M;
(iii) lim_{x→a} [f(x)/g(x)] = L/M, provided M ≠ 0.

Example 2.3. (a) lim_{x→2} [x³(x + 4)] = lim_{x→2} x³ · lim_{x→2} (x + 4) = (2)³ · (2 + 4) = 48.
(b) lim_{x→4} (3x² − 5x)/(x + 6) = [3(4)² − 5(4)]/(4 + 6) = 2.8.
(c) lim_{x→2} √(6x³ + 1) = [lim_{x→2} (6x³ + 1)]^{1/2} = [6(2)³ + 1]^{1/2} = (49)^{1/2} = 7.
(d) lim_{x→8} (x^{2/3} + 3√x)/(4 − 16/x) = (lim_{x→8} x^{2/3} + lim_{x→8} 3√x)/(lim_{x→8} 4 − lim_{x→8} 16/x)
= (4 + 6√2)/(4 − 2) = 2 + 3√2.
(e) lim_{x→7} (x − 7)/(x² − 49) = lim_{x→7} (x − 7)/[(x − 7)(x + 7)] = lim_{x→7} 1/(x + 7) = 1/14.
(f) lim_{x→0} 2/x, x ≠ 0. Note that lim_{x→0⁺} 2/x = ∞ and lim_{x→0⁻} 2/x = −∞. Since the
value of the limit is not the same as x → 0 from the right and from the left,
this limit does not exist.
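Several of these limits can be confirmed symbolically (a sketch assuming the SymPy library; illustrative only):

```python
# Sketch: checking limits from Example 2.3 with a CAS.
import sympy as sp

x = sp.symbols('x')
print(sp.limit((x - 7)/(x**2 - 49), x, 7))    # 1/14
print(sp.limit(sp.sin(x)/x, x, 0))            # 1
print(sp.limit(2/x, x, 0, dir='+'))           # oo
print(sp.limit(2/x, x, 0, dir='-'))           # -oo, so the two-sided
                                              # limit does not exist
```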
Theorem 2.2. If a > 0 and n is a positive integer, or if a < 0 and n is an
odd positive integer, then lim_{x→a} \(\sqrt[n]{x}\) = \(\sqrt[n]{a}\).

Theorem 2.3. (Sandwich theorem) If f(x) ≤ h(x) ≤ g(x) for all x in an open
interval containing a, except possibly at a, and if lim_{x→a} f(x) = L = lim_{x→a} g(x),
then lim_{x→a} h(x) = L.

A function f is continuous at a point a if the following three conditions are


satisfied: (i) a is in the domain of f , (ii) lim f (x) exists, and (iii) lim f (x) =
x→a x→a
f (a).
Theorem 2.4. (Intermediate value theorem) If a function f is continuous
on a closed interval [a, b] and if f (a) 6= f (b), then f takes on every value
between f (a) and f (b).
2.2.1 Limit at Infinity. If f is defined on an interval (c, ∞), then
lim_{x→∞} f(x) = L means that for every ε > 0 there corresponds an N > 0
such that |f(x) − L| < ε whenever x > N.
Theorem 2.5. If k is a positive rational number and c is any nonzero real
number, then
$$\lim_{x\to\infty} \frac{c}{x^k} = 0, \quad\text{and}\quad \lim_{x\to-\infty} \frac{c}{x^k} = 0.$$

2.2.2 Infinite Limits lim_{x→a} f(x) = ±∞. If f(x) exists on an open interval
containing a, except possibly at x = a, then f(x) becomes infinite (or increases
without bound), written as lim_{x→a} f(x) = ∞, if, for every positive number N,
there corresponds a δ > 0 such that f(x) > N whenever 0 < |x − a| < δ.
Sometimes we say that f(x) becomes positively infinite as x approaches
a. A similar definition for lim_{x→a} f(x) = −∞ is: if f(x) exists on an open
interval containing a, except possibly at x = a, then f(x) becomes negatively
infinite (or decreases without bound), written as lim_{x→a} f(x) = −∞, if, for every
negative number M, there corresponds a δ > 0 such that f(x) < M whenever
0 < |x − a| < δ.
Example 2.4. (a) lim_{x→a} 1/(x − a)ⁿ = ∞ if n is an even positive integer;
(b) lim_{x→a⁺} 1/(x − a)ⁿ = ∞, and lim_{x→a⁻} 1/(x − a)ⁿ = −∞, if n is an odd
positive integer.

Theorem 2.6. If lim_{x→a} f(x) = ∞ and lim_{x→a} g(x) = c ≠ 0, then
(i) lim_{x→a} [g(x) + f(x)] = ∞;
(ii) if c > 0, then lim_{x→a} [g(x)f(x)] = ∞ and lim_{x→a} f(x)/g(x) = ∞;
(iii) if c < 0, then lim_{x→a} [g(x)f(x)] = −∞ and lim_{x→a} f(x)/g(x) = −∞;
(iv) lim_{x→a} g(x)/f(x) = 0.

2.3 Global and Local Extrema of Functions


A function f is increasing on an interval I if f (x1 ) < f (x2 ) whenever x1 < x2
for x1 , x2 ∈ I. A function f is decreasing on an interval I if f (x1 ) > f (x2 )
whenever x1 < x2 , x1 , x2 ∈ I. Let a function f be defined on an interval I,
and let u and v be numbers in I. If f (x) ≤ f (v) for all x ∈ I, then f (v) is
called the maximum value of f on I. Similarly, if f (x) ≥ f (u) for all x ∈ I,
then f (u) is called the minimum value of f on I.
Theorem 2.7. If a function f is continuous on a closed interval [a, b],
then f takes on a minimum value f (u) and a maximum value f (v) at some
numbers u and v in [a, b].
These extrema are called the absolute minimum and the absolute maximum
for f on an interval. However, the local extrema of f are defined as follows:
Let c be a number in the domain of a function f . If there exists an open
interval (a, b) containing c such that f (x) ≤ f (c) for all x ∈ (a, b), then f (c)
is a local maximum of f . If there exists an open interval (a, b) containing
c such that f (x) ≥ f (c) for all x ∈ (a, b), then f (c) is a local minimum of
f . Sometimes, the term extremum is used to mean either a maximum or a
minimum.
Theorem 2.8. If a function f has a local extremum at a number c, then
either f ′ (c) = 0 or f ′ (c) does not exist.
Theorem 2.9. If a function f is continuous on a closed interval [a, b]
and has its maximum (or minimum) value at a number c in the open interval
(a, b), then either f ′ (c) = 0 or f ′ (c) does not exist.
The number c in the domain of a function f is known as a critical number
of f if either f ′ (c) = 0 or f ′ (c) does not exist.
Theorem 2.10. (Rolle’s theorem) If a function f is continuous on a closed
interval [a, b], differentiable on the open interval (a, b), and if f (a) = f (b),
then f ′ (c) = 0 for at least one number c ∈ (a, b).
Corollary 2.1. If f has a derivative at c ∈ (a, b) and f′(c) ≠ 0, then f(c)
is not a local extremum of f.


This result is a particular case of Fermat’s theorem, which follows.
Theorem 2.11. (Fermat’s theorem) If f has a relative extremum at a
point c ∈ (a, b), and if f ′ (c) exists, then f ′ (c) = 0.
Proof. Assume that f has a relative maximum at c ∈ (a, b) and that f ′ (c)
f (x) − f (c)
exists. The existence of f ′ (c) implies that the limit lim exists.
x→c x−c
This being a two-sided limit, we will show that both one-sided limits exist
and are equal to f ′ (c):

$$f'(c) = \lim_{x\to c^-} \frac{f(x) - f(c)}{x - c} \ge 0, \quad \text{since } f(x) - f(c) \le 0 \text{ and } x - c < 0 \text{ as } x \to c \text{ from the left},$$
$$f'(c) = \lim_{x\to c^+} \frac{f(x) - f(c)}{x - c} \le 0, \quad \text{since } f(x) - f(c) \le 0 \text{ and } x - c > 0 \text{ as } x \to c \text{ from the right}.$$

Since zero is the only number which is both non-negative and non-positive,
so f ′ (c) = 0.
In the case of a relative minimum, assume in the above proof that f has
a relative minimum at c and note that f (x) − f (c) ≥ 0 for all x sufficiently
close to c, and reverse the sign in the two inequalities above. 
Corollary 2.2. If a function f is continuous on a closed interval [a, b] and
if f (a) = f (b), then f has at least one critical number in the open interval
(a, b).
Theorem 2.12. (Mean-value theorem) If a function f is continuous on a
closed interval [a, b] and differentiable on the open interval (a, b), then there
exists a number c ∈ (a, b) such that f (b) − f (a) = f ′ (c)(b − a).
A function f is said to have an absolute maximum or (a global maximum)
on the domain D at the point c ∈ D if f (c) ≥ f (x) for all x ∈ D, where f (c) is
called the maximum value of f on D. Similarly, if f (c) ≤ f (x) for all x ∈ D,
then we say that f has an absolute minimum or (a global minimum) on the
domain D at the point c ∈ D, where f (c) is called the minimum value of f
on D. These extreme values are termed absolute or global because they are
the largest and the smallest value, respectively, of the function f on D.
A function f is said to have a relative maximum or (a local maximum) at
the point c if f (c) ≥ f (x) for all x in an open interval containing c. Similarly,
if f(c) ≤ f(x) for all x in an open interval containing c, then we say that
f has a relative minimum (or a local minimum) at c. This definition can
be extended to include the endpoints of the interval [a, b] by saying that f
has a relative extremum at an endpoint of [a, b] if f attains its maximum or
minimum value at that endpoint in the half-open interval containing it.

2.4 First and Second Derivative Tests


The following theorem is useful in determining the intervals on which a func-
tion is increasing or decreasing.
Theorem 2.13. Let a function f be continuous on a closed interval [a, b]
and differentiable on the open interval (a, b). If f ′ (x) > 0 for all x ∈ (a, b),
then f is increasing on [a, b]. If f ′ (x) < 0 for all x ∈ (a, b), then f is decreasing
on [a, b].
Theorem 2.14. (First derivative test) Let c ∈ (a, b) be a critical number
of a function f, and let f be continuous on [a, b] and differentiable on (a, b),
except possibly at c. If f′(x) > 0 for a < x < c and f′(x) < 0 for c < x < b,
then f(c) is a local maximum of f; if f′(x) < 0 for a < x < c and f′(x) > 0
for c < x < b, then f(c) is a local minimum of f; and if f′(x) > 0 or if
f′(x) < 0 for all x ∈ (a, b) except x = c, then f(c) is not a local extremum of
the function f.
2.4.1 Definition of Concavity. Let a function f be differentiable at c. The
graph of f is concave upward, or concave up (CU), at the point P (c, f (c)) if
there exists an open interval (a, b) containing c such that the graph of f is
above the tangent line through the point P . The graph is concave downward,
or concave down (CD), at P : (c, f (c)) if there exists an open interval (a, b)
containing c such that on (a, b) the graph of f is below the tangent through
the point P .
Concavity refers to the shape of a curve rather than its direction. Although
we have used the conventional terms ‘concave up’ (CU) and ‘concave down’
(CD), we will henceforth use the terms ‘convex’ and ‘concave’, respectively,
when discussing concavity of a function f on an interval (a, b).
This definition is related to the above Theorems 2.13 and 2.14, which are
used to determine whether a function is decreasing or increasing depending
on whether f ′ is negative or positive (see Figure 2.1).
If the function f ′ has a derivative, then these theorems can also be applied
to f ′′ . In other words, if f has a second derivative on some interval I = (a, b),
then on that interval the following theorem holds:
Theorem 2.15. (Test for concavity) If f is twice differentiable on an open
interval I containing c, then at the point P(c, f(c)) the graph of f is (i)
convex (i.e., CU) if f″(c) > 0, and (ii) concave (i.e., CD) if f″(c) < 0.
Hence, we have:

A function f is convex (i.e., CU) where f″ > 0, and concave (i.e., CD) where f″ < 0.

A point P (c, f (c)) on the graph of a function f is a point of inflection (or


an inflection point) if there exists an open interval (a, b) containing c such
that (i) f ′′ (x) > 0 if a < x < c and f ′′ (x) < 0 if c < x < b; or (ii) f ′′ (x) < 0

if a < x < c and f ′′ (x) > 0 if c < x < b (see Figure 2.2.).

Figure 2.1 Increasing and decreasing functions at x = a.

Figure 2.2 Points of inflection at x = a.

Theorem 2.16. (Second derivative test) Let a function f be differentiable


on an open interval containing c and let f ′ (c) = 0. If f ′′ (c) < 0, then f has
a local maximum at c; and if f ′′ (c) > 0, then f has a local minimum at c.

The functions that are convex or concave at a point are presented graph-

ically in Figure 2.3.

Figure 2.3 Convex and concave functions at a point.

Figure 2.3(a): Slope at x = a is positive; the function f (x) is increasing at


x = a; f ′ (a) > 0, and f ′′ (a) > 0, so the function is convex (CU) at x = a.
Figure 2.3(b): Slope at x = a is negative; the function f (x) is decreasing
at x = a; f ′ (a) < 0, and f ′′ (a) > 0, so the function is convex (CU) at x = a.
Figure 2.3(c): Slope at x = a is positive; the function f (x) is increasing at
x = a; f ′ (a) > 0, and f ′′ (a) < 0, so the function is concave (CD) at x = a.
Figure 2.3(d): Slope at x = a is negative; the function f (x) is decreasing
at x = a; f ′ (a) < 0, and f ′′ (a) < 0, so the function is concave (CD) at x = a.

Example 2.5. Let f(x) = x⁵ − 5x³. Then f′(x) = 5x⁴ − 15x² = 5x²(x² − 3)
and f″(x) = 20x³ − 30x = 10x(2x² − 3). The critical numbers are 0 and ±√3,
with f(0) = 0, f″(√3) = 30√3 > 0, and f″(−√3) = −30√3 < 0. Thus,
by the second derivative test, f has a local minimum at x = √3 and a local
maximum at x = −√3, given by f(√3) = −6√3 and f(−√3) = 6√3. Since
f″(0) = 0, the second derivative test does not apply at this critical number.
Then using the first derivative test we find that f′(x) < 0 in −√3 < x < 0
and f′(x) < 0 in 0 < x < √3, i.e., f′(x) does not change sign, so there is no
local extremum at x = 0. Solving f″(x) = 0, i.e., 10x(2x² − 3) = 0, we get the
solution set {−√6/2, 0, √6/2}, which gives the points of inflection. The results
are summarized in the following table.

Interval          f″(x)    Concavity
(−∞, −√6/2)       −        concave (CD)
(−√6/2, 0)        +        convex (CU)
(0, √6/2)         −        concave (CD)
(√6/2, ∞)         +        convex (CU)
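This analysis can be reproduced symbolically (a sketch assuming SymPy; illustrative only):

```python
# Sketch: first and second derivative tests for f(x) = x^5 - 5x^3.
import sympy as sp

x = sp.symbols('x', real=True)
f = x**5 - 5*x**3
f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)

crit = sp.solve(f1, x)          # e.g. [-sqrt(3), 0, sqrt(3)]
for c in crit:
    # The sign of f'' at each critical number classifies it.
    print(c, f2.subs(x, c))     # -sqrt(3): -30*sqrt(3); 0: 0; sqrt(3): 30*sqrt(3)

print(sp.solve(f2, x))          # inflection candidates: 0, +-sqrt(6)/2
```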

Example 2.6. Let f(x) = 12 + 2x² − x⁴. Then f′(x) = 4x − 4x³ =
4x(1 − x²), and f″(x) = 4 − 12x² = 4(1 − 3x²). From f′(x) = 0 we get the
critical numbers 0, 1, and −1, giving f″(0) = 4 > 0, f″(1) = −8 < 0, and
f″(−1) = −8 < 0, with the corresponding functional values f(0) = 12, f(1) =
13 = f(−1). Hence, by the second derivative test there is a local minimum
at x = 0 and local maxima at x = ±1. The possible points of inflection
are obtained by solving f″(x) = 0, which has the solution {−1/√3, 1/√3}.
Hence, we find that
(i) f is concave on the interval (−∞, −1/√3), since f″(x) < 0 there;
(ii) f is convex on the interval (−1/√3, 1/√3), since f″(x) > 0 there; and
(iii) f is concave on the interval (1/√3, ∞), since f″(x) < 0 there.

2.5 Vector-Valued Functions


A function f from a set X ⊆ R to a set Y ⊆ R3 is a correspondence defined
by a vector-valued function, denoted by r(t), which defines a unique vector
r(t) = x i + y j + z k ≡ ⟨x, y, z⟩, where t ∈ R. Let the components x, y, z be
defined by x = f(t), y = g(t), z = h(t), respectively. Then
$$r(t) = \langle f(t), g(t), h(t)\rangle = f(t)\,i + g(t)\,j + h(t)\,k \qquad(2.5.1)$$

for all numbers t ∈ R, where i, j, k are the unit vectors along the coordinate
axes. Conversely, if f, g, h are functions from X to R, then a vector-valued

function r may be defined by Eq (2.5.1). Thus, r is a vector-valued function
iff r(t) is defined by (2.5.1). The domain of r is assumed to be the intersection
of the domains of f, g, and h. A geometrical interpretation of Eq (2.5.1) is
presented in Figure 2.4(a): if OP is the position vector corresponding to r(t),
then as t varies through X, the endpoint P(f(t), g(t), h(t)) traces the
curve with parametric equations x = f(t), y = g(t), z = h(t). For example,
if r(t) = a cos t i + a sin t j + bt k, then the endpoint of the position vector
corresponding to t traces the circular helix shown in Figure 2.4(b).

Figure 2.4 (a) Eq (2.5.1). (b) Circular helix.

If r(t) is defined by (2.5.1), then
$$\lim_{t\to a} r(t) = \big\langle \lim_{t\to a} f(t),\; \lim_{t\to a} g(t),\; \lim_{t\to a} h(t) \big\rangle, \qquad(2.5.2)$$
provided f, g, and h have limits as t → a. If we denote lim_{t→a} f(t) = a₁,
lim_{t→a} g(t) = a₂, lim_{t→a} h(t) = a₃, then lim_{t→a} r(t) = ⟨a₁, a₂, a₃⟩ =
a₁i + a₂j + a₃k = a.
A vector-valued function r(t) is continuous at a if lim_{t→a} r(t) = r(a). The
derivative r′(t) of a vector-valued function r(t) is defined by
$$r'(t) = \lim_{t_0\to 0} \frac{r(t + t_0) - r(t)}{t_0}, \qquad(2.5.3)$$
for all t such that the limit exists.


Example 2.7. Given r(t) = (ln t)i + e⁻³ᵗj + t²k, the domain of r is the set of
positive real numbers, where r is continuous, and r′(t) = (1/t)i − 3e⁻³ᵗj + 2tk,
and r″(t) = (−1/t²)i + 9e⁻³ᵗj + 2k.
Example 2.8. To prove that if |r(t)| is constant, then r′(t) is orthogonal
to r(t) for every t, note that r(t) · r(t) = |r(t)|² = c for some scalar c. Since
r(t) = ⟨f(t), g(t), h(t)⟩ = f(t)i + g(t)j + h(t)k, the above equation becomes
[f(t)]² + [g(t)]² + [h(t)]² = c, which when differentiated implicitly gives
2f(t)f′(t) + 2g(t)g′(t) + 2h(t)h′(t) = 0,
i.e., 2r(t) · r′(t) = 0.


Let Dt u(t) denote the derivative of u(t) with respect to t. Then
Theorem 2.17. If u and v are differentiable vector-valued functions and
c is a scalar, then the following relations hold:

Dₜ[u(t) ± v(t)] = u′(t) ± v′(t);
Dₜ[c u(t)] = c u′(t);
Dₜ[u(t) · v(t)] = u(t) · v′(t) + u′(t) · v(t);
Dₜ[u(t) × v(t)] = u(t) × v′(t) + u′(t) × v(t).

Definite integrals of vector-valued functions are defined as follows: if r(t) =
f(t)i + g(t)j + h(t)k, where the functions f, g, and h are integrable on an
interval [a, b], then by definition
$$\int_a^b r(t)\,dt = \Big(\int_a^b f(t)\,dt\Big)\,i + \Big(\int_a^b g(t)\,dt\Big)\,j + \Big(\int_a^b h(t)\,dt\Big)\,k, \qquad(2.5.4)$$

and we say that r(t) is integrable on [a, b]. Moreover, if R(t) is an antiderivative
of r(t) in the sense that R′(t) = r(t) for all t ∈ [a, b], then
$$\int_a^b r(t)\,dt = R(t)\Big|_a^b = R(b) - R(a). \qquad(2.5.5)$$

If R(t) is an antiderivative of r(t), then every antiderivative has the form


R(t) + c for some constant vector c, and we write
Z
r(t) dt = R(t) + c.

Example 2.9. Given u(t) = ti + t²j + t³k and v(t) = sin t i + cos t j + 2 sin t k,
we have
Dₜ[u(t) · v(t)] = (1 + 5t²) sin t + (3t + 2t³) cos t,
Dₜ[u(t) × v(t)] = [(t³ + 4t) sin t − t² cos t]i + [(3t² − 2) sin t + (t³ − 2t) cos t]j
+ [−3t sin t + (1 − t²) cos t]k.
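The two product rules of Theorem 2.17 can be verified symbolically for these functions (a sketch assuming SymPy; illustrative only):

```python
# Sketch: verify the dot- and cross-product rules for Example 2.9.
import sympy as sp

t = sp.symbols('t')
u = sp.Matrix([t, t**2, t**3])
v = sp.Matrix([sp.sin(t), sp.cos(t), 2*sp.sin(t)])

lhs_dot = sp.diff(u.dot(v), t)
rhs_dot = u.dot(sp.diff(v, t)) + sp.diff(u, t).dot(v)
print(sp.simplify(lhs_dot - rhs_dot))                 # 0

lhs_cross = sp.diff(u.cross(v), t)
rhs_cross = u.cross(sp.diff(v, t)) + sp.diff(u, t).cross(v)
print((lhs_cross - rhs_cross).applyfunc(sp.simplify))  # zero vector
```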

Example 2.10. If u′(t) = t²i + (6t + 1)j + 8t³k and u(0) = 2i − 3j + k, then
$$u(t) = \Big(\frac{t^3}{3} + 2\Big)\,i + (3t^2 + t - 3)\,j + (2t^4 + 1)\,k.$$

2.5.1 Geometric Meaning of the Inflection Point. An inflection point


(IP) on the graph of a function f is a point where the second derivative f ′′ = 0.
What is the meaning of this? The relation of the inflection point to intervals
where the curve is concave upward (CU) or concave downward (CD) is exactly
the same as the relation of critical points to the intervals where the function
is increasing or decreasing. Thus, the inflection points mark the boundaries
of the two different kinds of behavior. Also, only one sample value of f ′ is
needed between each pair of consecutive IPs in order to see whether the curve
is CU or CD along that interval.

2.6 Optimization
An application of the study of global and local extrema of a function leads to
certain optimization problems. Recall that an optimal solution corresponds
to the point or points where a given function attains an absolute maximum or
absolute minimum value. Certain useful guidelines to solve such optimization
problems are as follows:
(i) Relative to a given problem, define a function to be optimized, then plot
its graph and label the relevant quantities, if possible; (ii) label the quantity
that needs to be optimized, and signify the appropriate domain, also known
as the feasible domain, for the problem; and (iii) using the methods of §2.3,
solve the problem.

Example 2.11. To find the rectangle of largest possible area that can be
inscribed in a semicircle of radius r, let the rectangle be of height h and
length w, with area A = hw = 2h√(r² − h²), where w = 2√(r² − h²), 0 < h < r
(see Figure 2.5).

Figure 2.5 Example 2.11.


Since A is continuous on the closed, bounded interval [0, r], the absolute
maximum of A is guaranteed to exist, in view of Theorem 2.7. First, we find
the critical point(s):
$$A'(h) = 2\sqrt{r^2 - h^2} - \frac{2h^2}{\sqrt{r^2 - h^2}} = \frac{2r^2 - 4h^2}{\sqrt{r^2 - h^2}} = 0,$$


which gives h = ±r/√2. Note that A′ is undefined when h = r (obvious
from the geometry of the problem), and we can discard the negative solution
for h. Then at the two endpoints and the one critical point in [0, r], we have
A(0) = 0 = A(r) and A(r/√2) = r². Hence, the maximum possible
area of the rectangle is r², and it occurs when h = r/√2 and w = r√2.
Also check that this maximum area is smaller than the area πr²/2 of the
semicircle.
semi-circle. 
Example 2.12. A 15” × 24” piece of sheet metal is formed into an open
box by cutting out a square from each of the four corners and folding up
the remaining piece. How large should each square be to obtain a box of
maximum volume?
Let V denote the volume of the open box. From Figure 2.6, we find that
V (x) = x(15 − 2x)(24 − 2x), and the critical points are given by V ′ (x) = 0.

Figure 2.6 (a) Example 2.12. (b) Graph of V .

Now,
$$V'(x) = x(15 - 2x)(-2) + x(-2)(24 - 2x) + (15 - 2x)(24 - 2x) = 12(x - 3)(x - 10).$$
Thus, V′(x) = 0 at x = 3 or x = 10. But since the only critical point
in [0, 15/2] is at x = 3, and V(0) = 0 = V(15/2), the maximum possible
volume is V(3) = 486 in³ (see Figure 2.6(b)).
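A short symbolic check of this optimization (SymPy assumed; illustrative only):

```python
# Sketch: maximize V(x) = x(15 - 2x)(24 - 2x) on [0, 15/2], as in Example 2.12.
import sympy as sp

x = sp.symbols('x', real=True)
V = x*(15 - 2*x)*(24 - 2*x)

crit = sp.solve(sp.diff(V, x), x)       # [3, 10]
feasible = [c for c in crit if c > 0 and c < sp.Rational(15, 2)]
print(feasible, V.subs(x, feasible[0])) # [3] 486
```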
Example 2.13. The profit P (x) by selling x units of a given product
is related to the cost C(x) of its production and the revenue R(x) by the
formula P (x) = R(x) − C(x). In order to maximize profit, we will assume
that C and R are differentiable functions, so we can compute the critical
points of P from P ′ (x) = 0, which are also the solutions of C ′ (x) = R′ (x). A
typical cost-revenue graph is presented in Figure 2.7, which shows that cost
initially exceeds revenue and then falls below revenue as bulk manufacturing
and transportation costs are realized, and finally exceeds revenue after the
production capacity and market saturation are reached. The profit zone is
bounded by two positive break-even points, at which C(x) = R(x); and the
points of maximum profit and loss happen at the points where C ′ (x) = R′ (x).

Using a specific example, suppose that for a certain book printing company
the cost and revenue functions in a particular year are defined, in thousands
of dollars, by C(x) = 2x3 − 12x2 + 30x and R(x) = −x3 + 9x2 , where x
represents the units of 1000 books. It is assumed that this model is accurate
up to approximately x = 6. We are required to find out what the company’s
profit zone is and what level of production will maximize the company’s profit.

Figure 2.7 Example 2.13, graph of P (x).

The profit function is P(x) = R(x) − C(x) = −3x³ + 21x² − 30x = −3x(x −
2)(x − 5). The solution set of the equation P(x) = 0 is x = {0, 2, 5}. Neglecting
x = 0, we know that the positive break-even points are x = 2 and x = 5, i.e.,
2000 and 5000 books. Again, solving P′(x) = 0, or equivalently C′(x) =
R′(x), we get 3x² − 14x + 10 = 0, which gives x = (7 ± √19)/3 ≈ 3.786 or
0.880. Using the first derivative test we find that P has a relative maximum
at 3.786 and a relative minimum at 0.880, i.e., the relative maximum at 3786
and the relative minimum at 880 books, respectively. Thus, the maximum profit
is P(3.786) = 24.626, or $24,626. This is presented in Figure 2.7, in which
the correspondence between the relative extrema of P and those of C and R
is marked by vertical segments between the graphs of C(x) and R(x) at the
points 0.880 and 3.786.

2.7 Multivariate Functions. The following three conditions must be met
for a multivariate function to have a relative maximum or minimum at a
critical point:
(a) All first-order partial derivatives must be zero simultaneously. When
solved, they yield the critical point x∗ at which the function is neither
increasing nor decreasing with respect to the coordinate axes, i.e., the function
is at a relative plateau at this point.
(b) The second-order direct partial derivatives at the point x∗ must all be
negative for a relative maximum and positive for a relative minimum. Geo-
metrically, it means that the function is concave and moving downward from

the relative plateau (i.e., at the point x∗ ) in relation to the coordinate axes
to be a relative maximum and moving upward in relation to the coordinate
axes to be a minimum.
(c) In the case of a function of two variables, the product of the second-
order direct partial derivatives evaluated at the point x∗ must be greater than
the product of the cross partial derivatives also evaluated at the critical point
x∗ . This condition is needed to exclude the cases of an inflection point or a
saddle point at x∗ .

Figure 2.8 Relative extrema for a multivariate function.

The above conditions are presented in Figure 2.8 for a function of two
variables z = f (x, y), where we have a relative maximum (Figure 2.8(a)) and
a relative minimum (Figure 2.8(b)).
The conditions satisfied in each case are as follows:
For a relative maximum: fx , fy = 0; fxx , fyy < 0; and fxx · fyy > (fxy )2 ;
For a relative minimum: fx , fy = 0; fxx , fyy > 0; and fxx · fyy > (fxy )2 .
The last condition can also be written as fxx · fyy − (fxy )2 > 0.

If fxx · fyy < (fxy )2 , we get


(i) an inflection point if fxx and fyy have the same signs;
(ii) a saddle point if fxx and fyy have different signs where the function has
a maximum when viewed from one axis and a minimum when viewed from
the other axis (see Figure 2.8(c));
(iii) if fxx · fyy = (fxy )2 , the test fails; and
(iv) if the function is strictly concave up (CU) (convex down (CD)) in x
and y, as in Figures 2.8(a)-(b), there is only one extremum, called the global
maximum (minimum). If the function is simply concave (convex) in x and y
on an interval, there will be a relative maximum (minimum) at the critical
point.
Example 2.14. Consider f (x, y) = y 3 − 2x3 + 294x − 27y + 72. The
first-order partial derivatives are:
fx = −6x2 + 294 = 0, or x2 = 49, which gives x = ±7;
fy = 3y 2 − 27 = 0, or y 2 = 9, which gives y = ±3.
Hence, there are four critical points: (7, 3), (7, −3), (−7, 3), and (−7, −3).
Next, using the second-order partial derivatives at each critical point, we
check the signs: fxx = −12x, fyy = 6y, fxy = 0 = fyx :

(1) fxx (7, 3) = −84 < 0, fyy (7, 3) = 18 > 0,


(2) fxx (7, −3) = −84 < 0, fyy (7, −3) = −18 < 0,
(3) fxx (−7, 3) = 84 > 0, fyy (−7, 3) = 18 > 0,
(4) fxx (−7, −3) = 84 > 0, fyy (−7, −3) = −18 < 0.

Since there are different signs in each of the second-order derivative values
in the cases (1) and (4), the function f cannot have a relative extremum at
the critical points (7, 3) and (−7, −3). In the case when fxx and fyy are of
different signs, the product fxx fyy cannot be greater than (fxy )2 , and the
function f is at a saddle point. Next we check the sign of fxx · fyy > (fxy )2
at the remaining two critical points (7, −3) and (−7, 3):
At the point (7, −3) we have (−84) · (−18) > (0)2 ; thus, we have a relative
maximum at (7, −3). Also, at the point (−7, 3) we have (84) · (18) > (0)2 ;
thus, we have a relative minimum at (−7, 3).
As an alternative method, we can use the Hessian (see §1.6.2), which for
this example is defined by

|H| = |fxx fxy; fyx fyy|.

Then at each of the critical points:

at (7, 3):   |H| = |−84 0; 0 18| = −1512 < 0 (saddle point);
at (7, −3):  |H| = |−84 0; 0 −18| = 1512 > 0 with fxx < 0 (relative maximum);
at (−7, 3):  |H| = |84 0; 0 18| = 1512 > 0 with fxx > 0 (relative minimum);
at (−7, −3): |H| = |84 0; 0 −18| = −1512 < 0 (saddle point);

which leads to the same answer as above. 


Thus, we have three ways to present the optimization analysis:
(i) by tabular form, (ii) by using the Hessian, or (iii) by simple explanation.
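A fourth, computational presentation is also possible. The sympy sketch below (hypothetical code, not part of the text, assuming sympy is installed) automates Example 2.14 by applying the Hessian-determinant test; it assumes the nondegenerate case det H ≠ 0:

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = y**3 - 2*x**3 + 294*x - 27*y + 72
    crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
    H = sp.hessian(f, (x, y))
    for pt in crit:
        Hp = H.subs(pt)
        d, fxx = Hp.det(), Hp[0, 0]          # the test fails if d == 0
        kind = ('saddle point' if d < 0 else
                'relative maximum' if fxx < 0 else 'relative minimum')
        print(pt, kind)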

2.7.1 Geometrical Interpretation. The geometrical concepts of minima,


saddle points, and maxima can be visualized from the following illustration:
Consider a mountainous terrain M. If a function f : M 7→ R sends each point
to its elevation, then the inverse image of a point a ∈ R is simply a contour
line, such that each connected component of a contour line is either a point,
a simple closed curve, or a closed curve with a double point. Contour lines
may have points of higher order, like triple points, but they, being unstable,
may be removed by a slight deformation of M . The double points in contour
lines occur at saddle points, or passes, since at these points the surrounding
terrain curves up in one direction and down in the other.
Just imagine there is flooding in the terrain M . Then the region covered
by water when it reaches an elevation of the point a is given by f −1 (−∞, a],
i.e., water reaches the points with at most elevation a. When water passes
the height of the point a, a critical point where the gradient ∇f = 0, then
the water either (i) starts filling the terrain basin, (ii) covers a saddle point
(a mountain pass), or (iii) submerges a peak. In each of these three types of
critical points (basins, passes, and peaks), we have the case of minima, saddle
points, and maxima, respectively. The safest place to escape flooding is either
the front or back high elevation at a saddle point (known as the horn and the
cantle of a saddle), or the highest peak (maximum elevation), and the worst
places are the basins (minima) and the inflection points.

2.7.2 Gradient at a Point. The gradient of a function f at a point x ∈ Rn


is a multiplication of vectors by scalars. For example, in R3 , the gradient of
f at a point x = (x, y, z) is denoted by grad f , or ∇f , and defined by

∇f(x) = (∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k,    (2.7.1)

where i, j, k are the unit vectors along the coordinate axes. In Rn , we have

∇f(x) = (∂f/∂x1) e1 + (∂f/∂x2) e2 + · · · + (∂f/∂xn) en,    (2.7.2)

where ei is the unit vector along the ith coordinate axis, i = 1, 2, . . . , n, and
all these vectors form a linearly independent set in Rn .
Theorem 2.18. (Michel and Herget [2007: 88]) Let {e1, e2, . . . , en} be the
linearly independent set of unit vectors along the coordinate axes in a vector
space Rn. If Σ_{i=1}^n (∂f/∂xi) ei = Σ_{i=1}^n (∂g/∂xi) ei, then ∂f/∂xi = ∂g/∂xi
for all i = 1, 2, . . . , n.
Proof. If Σ_{i=1}^n (∂f/∂xi) ei = Σ_{i=1}^n (∂g/∂xi) ei, then, by matrix
multiplication, we have

[∂f/∂xi − ∂g/∂xi]_{1×n} [ei]^T_{n×1} = [0]_{1×1} = [0]_{1×n} [ei]^T_{n×1},    (2.7.4)

which is a relation in the sense that it is an equation, expressed in matrix
form as the matrix product of the scalar terms (∂f/∂xi − ∂g/∂xi) and the unit
vectors [ei]^T for i = 1, 2, . . . , n, equated to the matrix product of the zero
vector and the unit vectors [ei]^T. Since the set {e1, e2, . . . , en} is linearly
independent, we get ∂f/∂xi − ∂g/∂xi = 0 for all i = 1, 2, . . . , n, which proves
the theorem. 
This leads to the following result:
Condition A. Let f (x) ∈ Rn be a continuously differentiable function. If

∇f(x) = (∂f/∂x1) e1 + (∂f/∂x2) e2 + · · · + (∂f/∂xn) en
       = [∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn] [e]^T = 0,    (2.7.5)

where [e] = [e1 e2 . . . en], then

∂f/∂x1 = 0, ∂f/∂x2 = 0, . . . , ∂f/∂xn = 0.    (2.7.6)

This is the necessary condition that not only establishes an isomorphism

∇f(x) ≅ [∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn]    (2.7.7)

between the gradient of a linear mapping, ∇f, and the first partial derivatives
∂f/∂xi for i = 1, . . . , n, but also imposes the restriction that the equations
(2.7.6) will hold only when condition (2.7.5), ∇f(x) = 0, is satisfied.
This condition is used in the Lagrange multiplier method, the KKT con-
ditions, and the Fritz John condition in optimization problems. However,
there are a couple of cases involving the first- and higher-order Taylor series
approximations, where the above isomorphism is misused (see §3.5).
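As a small computational illustration of Condition A (a hypothetical sympy sketch with an arbitrary quadratic f, not taken from the text), the critical point is obtained by setting every gradient component in (2.7.6) to zero simultaneously:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f = x1**2 + 3*x1*x2 + 2*x2**2 - x1 - 2*x2
    grad = [sp.diff(f, v) for v in (x1, x2)]   # components of grad f
    print(grad)                                # [2*x1 + 3*x2 - 1, 3*x1 + 4*x2 - 2]
    print(sp.solve(grad, [x1, x2]))            # {x1: 2, x2: -1}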

2.8 Mathematical Economics


Some terms, notations, and definitions, useful in the study of elementary
mathematical economics, are introduced.

2.8.1 Isocost Lines. In economics, an isocost line represents the different


combinations of two inputs or factors of production that can be purchased
with a given sum of money. It is represented, in general, by the formula

PK K + PL L = E, (2.8.1)

where K and L denote capital and labor, PK and PL their respective prices,
and E the amount allotted for expenditures. In isocost analysis the prices and
the expenditures for individual items are initially held constant; only different
inputs are allowed to vary. Solve the above formula for K, and show that a
change in PL and PK will affect the slope and the vertical intercept.
Solving Eq (2.8.1) for an isocost line we get

K = (E − PL L)/PK ,  or  K = E/PK − (PL/PK) L,
which is a straight line of the form y = mx + b, where the slope m = −PL /PK
and the vertical intercept, also called the y-intercept, b = E/PK (see Figure
2.9).
The effect of a change in any one of the parameters can be easily seen
from Figure 2.9. For example, an increase in expenditure from E to E ′ will
increase the vertical intercept and the isocost line (dashed line) will shift out to
the right parallel to the previous line; however the slope remains unaffected
because it depends on the ratio of the prices −PL /PK and prices are not
affected by change in expenditures. A change in PL will change the slope of
the line but not the vertical intercept, but a change in PK will change both
the slope and the vertical intercept.

Figure 2.9 Isocost line.



2.8.2 Supply and Demand. Let Qs and Qd denote the supply and demand
functions, respectively. Equilibrium in supply and demand occurs when Qs =
Qd . For example, the equilibrium prices and quantity are determined in the
following situation: Given Qs = 4P − 7, Qd = 14 − 3P , in equilibrium we
have Qs = Qd , or 4P − 7 = 14 − 3P , or P = 3. Then substituting P = 3 in
either equation we get Qs = 4P − 7 = 12 − 7 = 5 = Qd .
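The same equilibrium can be computed symbolically; the short sympy sketch below (illustrative, not part of the text) solves Qs = Qd:

    import sympy as sp

    P = sp.symbols('P', positive=True)
    Peq = sp.solve(sp.Eq(4*P - 7, 14 - 3*P), P)[0]
    print(Peq, 4*Peq - 7)    # P = 3, Q = 5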
The equilibrium equation is Y = C + I + G + (X − Z), where Y is income,
C consumption, I investment, G government expenditures, X exports, and Z
imports.

Example 2.15. In the case of a two-sector economy, Y = C + I, C =


C0 + bY , and I = I0 . Given C0 = 95, b = 0.8, and I0 = 65, calculate
the equilibrium level of income in terms of (a) the general parameters, and
(b) the specific values assigned to these parameters. In the case (a), the
equilibrium equation is Y = C + I = C0 + bY + I0, which on solving for Y
gives Y = (C0 + I0)/(1 − b), known as the reduced form which expresses the
endogenous variable Y as an explicit function of the exogenous variables C0, I0
and the parameter b. In case (b), Y = (95 + 65)/(1 − 0.8) = 160/0.2 = 800.
Note that the term
1/(1 − b) is called the autonomous expenditure multiplier, which measures
the multiple effect each dollar of autonomous spending has on the equilibrium
level of income. 
The parameter b is called the marginal propensity to consume (MPC) in an
income determination model; it is the proportion of an aggregate raise in pay
that a consumer spends on the consumption of goods and services, as opposed
to saving it. In this sense, the above autonomous expenditure multiplier can
be expressed as 1/(1 − MPC).
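In code, the reduced form and the multiplier of Example 2.15 amount to a few lines (a hypothetical Python sketch, not part of the text):

    C0, b, I0 = 95.0, 0.8, 65.0
    multiplier = 1.0 / (1.0 - b)   # autonomous expenditure multiplier, 1/(1 - MPC)
    Y = multiplier * (C0 + I0)
    print(multiplier, Y)           # 5.0 800.0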

2.8.3 IS-LM Equation. The IS-schedule is the locus of points representing


all possible combinations of interest rates and income levels consistent with
equilibrium in the commodity market and the LM-schedule is the locus of
similar points in the money market. Thus, the IS-LM analysis is used to
find the level of income and the rate of interest at which both the commodity
(goods) market and the money market will be in equilibrium. The commodity
market for a single two-sector economy is in equilibrium when Y = C + I,
while the money market is in equilibrium when the supply of money Ms is
equal to the demand for money Md , with Md = Mt + Mz , where Mt is the
transition-precautionary demand for money, and Mz the speculative demand
for money.
Example 2.16. Assuming a two-sector economy where C = 56 + 0.8Y ,
I = 94 − 70i, Ms = 210, Mt = 0.3Y , and Mz = 55 − 140i, compute the IS
and LM. The IS (commodity equilibrium) exists when Y = C + I. With the

above data,

Y = 56 + 0.8Y + 94 − 70i, or 0.2Y + 70i − 150 = 0. (2.8.2)

The LM (monetary equilibrium) exists when Ms = Mt + Mz , which with the


above data is

210 = 0.3Y + 55 − 140i, or 0.3Y − 140i − 155 = 0. (2.8.3)

Solving Eqs (2.8.2) and (2.8.3), we find the condition of simultaneous equi-
librium in both markets. Thus, multiplying Eq (2.8.2) by 2 and adding to
Eq (2.8.3) gives 0.7Y = 455, or Y = 650. Then substituting this value
of Y into Eq (2.8.2) we get 130 + 70i = 150, i = 2/7 ≈ 0.29. For these
values of Y and i, the equilibrium values of C, Mt , and Mz are: C =
56 + 0.8Y = 56 + (0.8)(650) = 576, Mt = 0.3Y = (0.3)(650) = 195, Mz =
55 − 140i = 55 − 140(2/7) = 15. To check, C + I = 56 + 0.8Y + 94 − 70i =
56 + (0.8)(650) + 94 − 70(2/7) = 650, and Mt + Mz = 0.3Y + 55 − 140i =
(0.3)(650) + 55 − 140(2/7) = 195 + 55 − 40 = 210 = Ms. 
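Since Eqs (2.8.2) and (2.8.3) form a linear system in Y and i, they can also be handed to a linear-algebra routine; the numpy sketch below (added for illustration, not part of the text) reproduces Y = 650 and i = 2/7:

    import numpy as np

    # 0.2*Y + 70*i  = 150   (IS, Eq (2.8.2))
    # 0.3*Y - 140*i = 155   (LM, Eq (2.8.3))
    A = np.array([[0.2,   70.0],
                  [0.3, -140.0]])
    b = np.array([150.0, 155.0])
    Y, i = np.linalg.solve(A, b)
    print(Y, i)    # 650.0 0.2857... (= 2/7)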

2.8.4 Marginal of an Economic Function. Let MC = marginal cost;


MR= marginal revenue; TC = total cost; TR = total revenue; Q = level of
output; P = price; TR = PQ; R = revenue; C = cost; π = profit = R − C;
and TP = total product. Then the marginal cost (MC) in economics is defined
as the change in total cost incurred from the production of an additional unit.
The marginal revenue (MR) is defined as the change in total revenue brought
about by the sale of an extra good. Since total cost (TC) and total revenue
(TR) are both functions of the level of output (Q), both MC and MR are
expressed as derivatives of their total functions, respectively, i.e.,
if TC = TC(Q), then MC = d(TC)/dQ; if TR = TR(Q), then MR = d(TR)/dQ.
For example, (i) let TR = 62Q − 5Q². Then MR = 62 − 10Q; and (ii) let
TC = Q³ − 19Q + 34. Then MC = 3Q² − 19.
In general, the marginal of any economic function is the derivative of its
total function.
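This rule is straightforward to apply with a computer algebra system; the sympy sketch below (hypothetical code, using the two totals from the example above) simply differentiates the total functions:

    import sympy as sp

    Q = sp.symbols('Q', positive=True)
    TR = 62*Q - 5*Q**2
    TC = Q**3 - 19*Q + 34
    print(sp.diff(TR, Q))   # MR = 62 - 10*Q
    print(sp.diff(TC, Q))   # MC = 3*Q**2 - 19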
The marginal propensity to consume (MPC) is defined as MPC = dC/dY;
and the average cost function is denoted by AC.

Theorem 2.19. MC = MR at the profit maximizing state.


Proof. Since π = TR − TC, to maximize π we have

dπ/dQ = d(TR)/dQ − d(TC)/dQ = 0,

which implies that d(TR)/dQ = d(TC)/dQ, or MR = MC. 

2.8.5 Marginal Rate of Technical Substitution (MRTS). An isoquant


determines the different levels of inputs K and L that can be used to produce
a specific level of output Q. One such isoquant for the output level Q = k,
k constant, is defined by aK 1/4 L3/4 = k, and its slope dK/dL is known as
MRTS. The general form of an isoquant is aK p L1−p = k, where a is real,
0 < p < 1, and k is a constant.
Example 2.17. Let an isoquant at the output level 2016 be defined by
Q = 24K 1/6 L5/6 = 2016. (a) Determine the slope dK/dL, and (b) evaluate
the MRTS at K = 308 and L = 115. First, (a) since the given isoquant is
K^{1/6} L^{5/6} = 84, using implicit differentiation we get

(1/6) K^{−5/6} L^{5/6} (dK/dL) + (5/6) K^{1/6} L^{−1/6} = 0,

which simplifies to give

dK/dL = −(5 K^{1/6} L^{−1/6})/(K^{−5/6} L^{5/6}) = −5K/L.    (2.8.4)

Thus, MRTS = −5K/L. (b) Substituting K = 308 and L = 115 into (2.8.4),
we find that MRTS = −(5)(308)/115 ≈ −13.39. Thus, in the case of a constant
production level, if L is increased by a meager 1 unit, K must be decreased
by 13.39 units in order to remain on the production isoquant. 
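The implicit differentiation of Example 2.17 can be checked with sympy using dK/dL = −F_L/F_K along F(K, L) = 0 (an illustrative sketch, not part of the text):

    import sympy as sp

    K, L = sp.symbols('K L', positive=True)
    F = 24 * K**sp.Rational(1, 6) * L**sp.Rational(5, 6) - 2016
    dKdL = sp.simplify(-sp.diff(F, L) / sp.diff(F, K))   # slope along F = 0
    print(dKdL)                          # -5*K/L
    print(dKdL.subs({K: 308, L: 115}))   # -308/23, i.e., about -13.39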

2.9 Exercises
2.1. Prove that if f is a linear function, then f satisfies the hypotheses of
the mean value theorem on every closed interval [a, b], and that every number
c satisfies the conclusion of the theorem.

2.2. If f is a quadratic function and [a, b] is any closed interval, prove that
there is precisely one number c ∈ (a, b) which satisfies the conclusion of the
mean value theorem. Hint. Consider f (x) = ax2 + bx + c.

2.3. Find the intervals where the function f (x) = x2 − 2x + 1 is decreasing


or increasing. Hint. f ′ (x) = 2x − 2 = 2(x − 1); f ′ > 0 when x > 1 and
negative when x < 1. Ans. f (x) is decreasing in the interval (−∞, 1) and
increasing in the interval (1, ∞).

2.4. For the function f(x) = x⁴ − 8x² + 16, find the intervals where it is
increasing or decreasing. Hint. f′(x) = 4x³ − 16x = 4x(x − 2)(x + 2). Four
cases: f′ < 0 if x < −2, f′ > 0 if −2 < x < 0, f′ < 0 if 0 < x < 2, and f′ > 0
if x > 2. Ans. f is increasing in the intervals (−2, 0) and (2, ∞) and decreasing
in the intervals (−∞, −2) and (0, 2).

2.5. If f is a polynomial function of degree 3 and [a, b] is any closed interval,


prove that there are at most two numbers in (a, b) which satisfy the conclusion
of the mean value theorem. Generalize this result to a polynomial function of
degree n, where n is a positive integer. Hint. If f has degree 3, then f ′ (x)
is a polynomial of degree 2. Then f (b) − f (a) = f ′ (x)(b − a) has at most two
solutions x1 and x2 . If f has degree n, then there are at most n − 1 solutions.

2.6. For the function f (x) = x3 + x2 − 2x − 1, find the intervals where


it is concave downward or upward, and find the point of inflection. Hint.
f′(x) = 3x² + 2x − 2; f″(x) = 6x + 2 = 2(3x + 1). Ans. f is concave
downward if x < −1/3 and concave upward if x > −1/3. Since the
graph of f changes the direction of its concavity at x = −1/3, the point
(−1/3, f(−1/3)) = (−1/3, −7/27) is the inflection point.

2.7. Find the local extrema of f′, and describe the intervals in which f′ is
increasing or decreasing, given (i) f(x) = x⁴ − 6x²; (ii) f(x) = x^{4/3} + 4x^{1/3}.
Ans. (i) maximum f′(−1) = 8; minimum f′(1) = −8; f′ increasing on
(−∞, −1] and [1, ∞), decreasing on [−1, 1]. (ii) minimum f(−1) = −3; f is
increasing on [−1, ∞), decreasing on (−∞, −1].

2.8. Use the second derivative test, whenever applicable, to find the local
extrema of f, and the intervals of concavity of f: (i) f(x) = 3x⁴ − 4x³ + 6;
(ii) f(x) = 2x⁶ − 6x⁴; and (iii) f(x) = x² − 27/x².
Ans. (i) minimum: f(1) = 5; CU (concave upward) on (−∞, 0) and
(2/3, ∞); CD (concave downward) on (0, 2/3); abscissas of points of inflection
are 0 and 2/3. (ii) maximum f(0) = 0 (by first derivative test); minimum
f(−√2) = f(√2) = −8; CU on (−∞, −√(6/5)) and (√(6/5), ∞); CD on
(−√(6/5), √(6/5)); abscissas of points of inflection are ±√(6/5). (iii) No
maximum or minimum; CU on (−∞, −3) and (3, ∞); CD on (−3, 0) and (0, 3);
abscissas of points of inflection are ±3.

2.9. Find the intervals where the functions (i) f(x) = xeˣ, and (ii) g(x) =
cos x are concave upward or downward, and the points of inflection, if any.
Ans. (i) f″(x) = (2 + x)eˣ; f(x) is concave upward if x > −2 and downward
if x < −2; x = −2 is an inflection point. (ii) g″(x) = −cos x; so g is concave
upward where cos x is negative and downward where cos x is positive, with
points of inflection at the odd multiples of π/2.

2.10. For the function f(x) = 3x² − 9x + 6, determine the points of inflection,
if any, and the intervals of concavity. Ans. f″(x) > 0, so no points of inflection,
and the curve is entirely concave upward.

2.11. Find the parametric equation of the tangent line to the curve C
defined by C = {(eᵗ, teᵗ, t + 4) : t ∈ R} at the point P = (1, 0, 4). Ans.
x = 1 + t, y = t, z = 4 + t.

2.12. Find two different unit tangent vectors to the curve C = {x =
e^{2t}, y = e^{−t}, z = t²} at the point P = (1, 1, 0). Ans. ±(1/√5) ⟨2, −1, 0⟩.

2.13. If a function f and a vector-valued function u have limits as t → a,
then prove that lim_{t→a} [f(t)u(t)] = [lim_{t→a} f(t)] [lim_{t→a} u(t)].

2.14. If u and v are vector-valued functions which have limits as t → a,
then prove that lim_{t→a} [u(t) × v(t)] = [lim_{t→a} u(t)] × [lim_{t→a} v(t)].

2.15. Let r(t) be the position vector of a particle, and s denote the arc
length along a curve C traced by the motion of the particle. Since the
magnitude of the velocity is |r′(t)| = ds/dt, and the direction of r′(t) is the
same as that of the unit vector T(s) defined by T(s) = r′(t)/|r′(t)|, we may
write r′(t) = (ds/dt) T(s), which after differentiating with respect to t gives

r″(t) = (d²s/dt²) T(s) + (ds/dt)(d/dt)T(s) = (d²s/dt²) T(s) + (ds/dt)² T′(s).

Figure 2.10 Acceleration.


We know that T′(s) = KN(s), where K = |T′(s)| and N(s) = T′(s)/|T′(s)|
is a unit vector orthogonal to T(s) (called the principal unit normal vector to
C at the point P(s)). If we denote the speed ds/dt by v and write K = 1/ρ,
where ρ is the radius of curvature of C, then

r″(t) = (dv/dt) T(s) + (v²/ρ) N(s).

The result is known as the acceleration in terms of a tangential component
dv/dt (the rate of change of speed with respect to time) and a normal
component v²/ρ (see Figure 2.10).

2.16. An electrical power station (A) is being built on the bank of a river.
Cables to the substation need to be laid out underground and underwater
from another substation (B) 3 km upstream and on the opposite bank of the
river. The river follows a straight line course through this stretch and has a
fairly constant width of 1 km. Given that the cost of laying cable underground
is $30,000 per km and the cost of laying cables underwater is $50,000 per km,
in what proportion should the cable be laid in order to minimize cost?
Solution. A simple solution is to connect substations A and B by a
straight line because that will be the shortest distance, √10 km, between them.
But it may not be cost effective, since the cable must then be laid completely
underwater, at a cost of 50000 × √10 ≈ $158,114, which may not be the
minimum cost.
Another approach, to minimize the underwater installation cost, would be
to cross the river along the shortest possible path and then run the under-
ground cable along the bank of the river. Any combination of these two paths
may lead to the absolute minimum cost. Thus, to find the optimal solution,
we consider the path as shown in Figure 2.11.

Figure 2.11 Possible path of the cable.

Since the vertical distance between stations A and B is 3 km, we take a


point C at a distance x from A and join CB. Denoting CB by y, we note x
is the length of the underground cable and y the length of underwater cable.
Note that x = 0 corresponds to the case in which all the cable is underwater,
and x = 3 corresponds to the case in which the underwater cable will be the
shortest (along the line segment DB). From the right triangle CDB, we have
y² = 1 + (3 − x)², or y = √(1 + (3 − x)²). Then choosing the path as A to C
to B, the cost of laying the cable is


C(x) = 30000 x + 50000 y = 30000 x + 50000 √(1 + (3 − x)²).

Since we want to find the absolute minimum of the function C(x) on the
interval [0, 3], we first find the critical points:

C′(x) = 30000 + 50000 · (1/2)[1 + (3 − x)²]^{−1/2} · 2(3 − x) · (−1)
      = 30000 + 50000 (x − 3)/√(1 + (3 − x)²).

Although C′(x) exists for all x ∈ [0, 3], there may be values of x for which
C′(x) = 0. Setting

30000 + 50000 (x − 3)/√(1 + (3 − x)²) = 0,

which simplifies to (x − 3)² = 9/16, we obtain two solutions: x = 3 ± 3/4.
Since the solution with the plus sign is outside the feasible interval [0, 3], we
have only one critical point, at x = 9/4. Thus, we have

C(0) = $158,114 (as above); C(9/4) = $130,000; and C(3) = $140,000.

Hence, the minimum cost of running the cable from station A to B is achieved
by laying a vertical underground cable for a distance of 2.25 km (A to C) and
a diagonal underwater cable of length √(1 + (3/4)²) = 1.25 km (along CB) at
a cost of $130,000.
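A brute-force grid search in Python (a sketch added for illustration, not part of the text) confirms the optimum at x = 2.25 with cost $130,000:

    import numpy as np

    x = np.linspace(0.0, 3.0, 300_001)
    cost = 30000*x + 50000*np.sqrt(1.0 + (3.0 - x)**2)
    k = cost.argmin()
    print(round(x[k], 4), round(cost[k], 2))   # 2.25 130000.0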

2.17. An oil company wants to manufacture a half-liter can for a brand of


motor oil in the shape of a right circular cylinder. Determine the dimensions
for such a can of radius r and height h that will minimize the amount of metal
used to form it.
Hint. Using the fact that 1 liter corresponds to a volume of 1000 cm³,
we have V = πr²h = 500, and using the area of the base and the top plus
the area of the circular part, the total area of the metal used for a can is
A(r) = 2πr² + 2πrh = 2πr² + 2πr · 500/(πr²) = 2πr² + 1000/r. Ans. Relative
minimum at r = 10/(4π)^{1/3} ≈ 4.3 cm.

2.18. Let the investment be nonautonomous but a function of income.
Find (a) the equilibrium level of income Y, given that Y = C + I, C = C0 + bY,
I = I0 + aY, and (b) determine the change in the multiplier in this case.

Solution. (a) Y = C + I = C0 + bY + I0 + aY, or (1 − b − a)Y = C0 + I0,
which gives Y = (C0 + I0)/(1 − b − a). Assume that C0 = 75, I0 = 80, b = 0.6,
a = 0.2. Then Y = (75 + 80)/(1 − 0.6 − 0.2) = 155/0.2 = 775.
(b) 1/(1 − b) = 1/(1 − 0.6) = 1/0.4 = 2.5, whereas 1/(1 − b − a) =
1/(1 − 0.6 − 0.2) = 1/0.2 = 5; thus the multiplier increases while changing
from 1/(1 − b) to 1/(1 − b − a). With the above data, we get Y =
(C0 + I0)/(1 − b) = 155/0.4 = 387.5, which is half of the value of 775
determined above in the case of nonautonomous investment.

2.19. Supply and demand problems generally involve more than one
market. Determine the equilibrium price P and quantity Q for the following
three goods:
Qd1 = −5P1 + P2 + P3 + 23,    Qs1 = 6P1 − 8,
Qd2 = P1 − 3P2 + 2P3 + 15,    Qs2 = 3P2 − 11,
Qd3 = P1 + 2P2 − 4P3 + 19,    Qs3 = 3P3 − 5,
where s1, s2, s3 and d1, d2, d3 denote the three supply and demand indices.
Solution. For the equilibrium we have in each market the following
equations:
Market 1: Qd1 = Qs1 gives (a) 11P1 − P2 − P3 = 31,
Market 2: Qd2 = Qs2 gives (b) P1 − 6P2 + 2P3 = −26,
Market 3: Qd3 = Qs3 gives (c) P1 + 2P2 − 7P3 = −24.
Method 1. We will use Cramer's rule (A.15) to solve this system of three
equations AP = b. Thus,

|A| = |11 −1 −1; 1 −6 2; 1 2 −7| = 11[42 − 4] + [−7 − 2] − [2 + 6] = 401,
|A1| = |31 −1 −1; −26 −6 2; −24 2 −7| = 1604, =⇒ P1 = |A1|/|A| = 1604/401 = 4;
|A2| = |11 31 −1; 1 −26 2; 1 −24 −7| = 2807, =⇒ P2 = |A2|/|A| = 2807/401 = 7;
|A3| = |11 −1 31; 1 −6 −26; 1 2 −24| = 2406, =⇒ P3 = |A3|/|A| = 2406/401 = 6.
Method 2. This problem can also be solved by the Gauss elimination
method (§1.4.3), as follows: Keeping Eq (a) fixed, eliminate P1 between Eqs
(a) and (b), i.e., multiply Eq (b) by 11 and subtract from Eq (a), to get

(d) 65P2 − 23P3 = 317.

Again, eliminate P1 between Eqs (b) and (c), i.e., subtract Eq (c) from Eq
(b), to get

(e) −8P2 + 9P3 = −2.

Next, eliminate P2 between Eqs (d) and (e), i.e., multiply Eq (d) by 8 and
Eq (e) by 65, and add, to get 401P3 = 2406, or P3 = 6. Hence, the system
reduces to the triangularized system

11P1 − P2 − P3 = 31,
65P2 − 23P3 = 317,
P3 = 6.

The values of P1, P2 are now found by back substitution, i.e., substituting
P3 = 6 in the second equation we get P2 = 7, and then substituting these
values of P2 and P3 into the first equation gives P1 = 4.
Method 3. We can use the formula [P] = [A]⁻¹[b] (see §1.4), where [A]⁻¹
is obtained using formulas (A.8) and (A.10). Since |A| = 401 ≠ 0, we have
the cofactor matrix C as

C = | +|−6 2; 2 −7|   −|1 2; 1 −7|    +|1 −6; 1 2|  |   |  38    9    8 |
    | −|−1 −1; 2 −7|  +|11 −1; 1 −7|  −|11 −1; 1 2| | = |  −9  −76  −23 |
    | +|−1 −1; −6 2|  −|11 −1; 1 2|   +|11 −1; 1 −6||   |  −8  −23  −65 |,

which yields

adj(A) = Cᵀ = | 38   −9   −8 |
              |  9  −76  −23 |
              |  8  −23  −65 |,

where [A]⁻¹ = (1/|A|) adj(A), and [b] = [31 −26 −24]ᵀ. Hence, the vector
P = [P1 P2 P3]ᵀ is given by [P1 P2 P3]ᵀ = [A]⁻¹[b], or

| P1 |           |  38   −9   −8 | |  31 |           | 1604 |   | 4 |
| P2 | = (1/401) |   9  −76  −23 | | −26 | = (1/401) | 2807 | = | 7 |
| P3 |           |   8  −23  −65 | | −24 |           | 2406 |   | 6 |.

Method 4. Using the simple elimination method, multiply Eq (a) by 2 to
get 22P1 − 2P2 − 2P3 = 62, and add it to Eq (c) to get

(d′) 23P1 − 9P3 = 38.

Multiply Eq (c) by 3 and add to Eq (b), to get

(e′) 4P1 − 19P3 = −98.

Next, multiply Eq (d′) by 19 and Eq (e′) by 9, and subtract; this gives 401P1 =
1604, or P1 = 4. Substitute this value of P1 into Eq (e′) to get P3 = 6. Finally,
substitute these values of P1 and P3 into any one of Eqs (a), (b) or (c), to get
P2 = 7.
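All four methods can be cross-checked at once with numpy (a hypothetical snippet, not part of the text):

    import numpy as np

    A = np.array([[11.0, -1.0, -1.0],
                  [ 1.0, -6.0,  2.0],
                  [ 1.0,  2.0, -7.0]])
    b = np.array([31.0, -26.0, -24.0])
    print(np.linalg.det(A))        # 401 (up to rounding)
    print(np.linalg.solve(A, b))   # [4. 7. 6.]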

Figure 2.12 Lump-sum tax.


2.20. Using a graph, show how the addition of a lump-sum tax (a tax
independent of income) affects the parameters of the income determination
model. Plot two systems: (1) Y = C + I, C = 120 + 0.8Y, I0 = 40 (solid line
marked (1) in Figure 2.12); and (2) Y = C + I, C = 120 + 0.8Yd, I0 = 40,
Yd = Y − T, T = 60 (solid line marked (2) in Figure 2.12).
Solution. System (1): Aggregated demand function D = C + I = 120 +
0.8Y + 40 = 160 + 0.8Y; slope = 0.8; solving 160 + 0.8Y = Y yields the income
of 800.

System (2): Aggregated demand function D = C + I = 120 + 0.8Yd + 40 =
120 + 0.8(Y − 60) + 40 = 112 + 0.8Y; slope = 0.8; solving 112 + 0.8Y = Y
yields the income of 560. Thus, the lump-sum tax has a negative effect on the
vertical intercept of D equal to −MPC · T = −0.8(60) = −48. As a result of
this tax, income falls from 800 to 560.

2.21. (Optimization of economic functions) Maximize the profit π, given


R = 3500Q − 64Q2, and C = 4Q3 − 10Q2 − 1300Q + 6000. Thus,

π = R − C = −4Q3 − 54Q2 + 4800Q − 6000.

Then

π′ = dπ/dQ = −12Q² − 108Q + 4800 = −12(Q² + 9Q − 400) = −12(Q − 16)(Q + 25).

Equating π′ to zero, we get the critical points as Q = 16, −25. Next,

π″ = −24Q − 108 =⇒ π″(16) = −492 < 0,

so π is concave and has a relative maximum at Q = 16. Since π″(−25) =
492 > 0, this critical point is rejected. Also, π(16) = −4(16)³ − 54(16)² +
4800(16) − 6000 = 40,592.

2.22. The marginal expenditure function (ME) is associated with the
supply function P = 1 + 3Q + 4Q². Find ME when Q = 2 and Q = 5.
Solution. ME = d(TE)/dQ. Here TE = PQ = (1 + 3Q + 4Q²)Q =
Q + 3Q² + 4Q³. Then ME = 1 + 6Q + 12Q². At Q = 2, ME = 1 + 6(2) +
12(4) = 61; and at Q = 5, ME = 1 + 6(5) + 12(25) = 331.

2.23. Using graphs, show how the addition of a proportional tax (a tax
depending on income, also known as super tax) affects the parameters of the
income determination model. Plot two systems: (1) Y = C + I, C = 90 +
0.8Y, I0 = 40 (solid line marked (1) in Figure 2.13); and (2) Y = C + I, C =
90 + 0.8Yd, I0 = 40, Yd = Y − T, T = 25 + 0.25Y, where 25 is the lump-sum
tax (solid line marked (2) in Figure 2.13).
Solution. System (1): Aggregated demand function D = C + I = 90 +
0.8Y + 40 = 130 + 0.8Y; slope = 0.8; solving 130 + 0.8Y = Y yields the income
of 650.
System (2): Aggregated demand function D = C + I = 90 + 0.8Yd + 40 =
130 + 0.8(Y − 25 − 0.25Y) = 110 + 0.6Y; slope = 0.6; solving 110 + 0.6Y = Y
yields the income of 275. Thus, the proportional tax affects not only the slope
of the line, or the MPC, from m = 0.8 to m = 0.6, but also the vertical
intercept, which is lowered since the tax includes a lump-sum tax of 25. As a
result of this tax, income falls from 650 to 275.

Figure 2.13 Proportional tax.

2.24. (a) Given C = C0 + bY, we get MPC = dC/dY = b.
(b) Given C = 1100 + 0.75Yd, where Yd = Y − T, T = 80, we have
C = 1100 + 0.75(Y − 80) = 1040 + 0.75Y. Then MPC = dC/dY = 0.75.

2.25. The average cost function is given by AC = 1.6Q + 4 + 44/Q. Find
the marginal cost MC. Hint. MC is determined by first finding TC = (AC)Q,
and then using the formula MC = d(TC)/dQ. Ans. MC = 3.2Q + 4.

2.26. Optimize the following functions: (a) y = ½x⁴ − 10x³ − 100x² + 16;
(b) y = −3x⁴ − 28x³ + 108x² + 37; (c) y = −(x + 11)⁴; and (d) y = (7 − 5x)⁴.
Ans. (a) Critical values −5, 0, 20; convex, with a relative minimum, at x =
−5, 20; concave, with a relative maximum, at x = 0. (b) Critical values −9, 0, 2;
concave, with a relative maximum, at x = −9, 2; convex, with a relative
minimum, at x = 0. (c) Critical value −11; concave and relative maximum at
x = −11. (d) Critical value 7/5; convex, and relative minimum at x = 7/5.
Hint. In (c) the test fails at y′(−11), y″(−11), y‴(−11), and in (d) the test
fails at y′(7/5), y″(7/5), y‴(7/5).

2.27. Maximize the following total revenue function TR and total profit
function π by finding the critical value(s), by testing the second-order
conditions, and by calculating the maximum TR or π: (a) π = −Q³ − 75Q² +
1800Q − 310; (b) TC = ½Q³ − 18Q² + 420Q; (c) TR = 1200Q − 8Q², and
TC = 1400 + 80Q; and (d) TR = 6400Q + 15Q², and TC = Q³ + 75Q² +
100Q + 754.
Ans. (a) Critical values Q = −60, 10; π″(−60) > 0, π″(10) < 0, so convex
with a relative minimum at Q = −60, and concave with a relative maximum
at Q = 10. (b) AC = (½Q³ − 18Q² + 420Q)/Q = ½Q² − 18Q + 420. Then
AC′ = Q − 18, so the critical value is Q = 18. Since AC″ = 1 > 0, AC is
convex with a relative minimum at Q = 18. (c) π = TR − TC; the critical
value is at Q = 70; since π″ < 0, π is concave with a relative maximum at
Q = 70. (d) π = 6400Q + 15Q² − Q³ − 75Q² − 100Q − 754 = −Q³ − 60Q² +
6300Q − 754, giving π′ = −3Q² − 120Q + 6300 = −3(Q² + 40Q − 2100). The
critical values are Q = 30, −70, and π″(30) < 0, so the concave and relative
maximum is at Q = 30; π″(−70) > 0, so convex and relative minimum is at
Q = −70.

2.28. Let an isoquant be defined by 24K 1/4 L3/4 = 2414. Use implicit
differentiation with respect to L to find the slope of the isoquant dK/dL, or
the MRTS for given values of K = 260 and L = 120, and interpret the result.
Solution. By implicit differentiation, we get dK/dL = −3K/L. Then
MRTS = −3(260)/120 = −6.5. This means that if L is increased by 1 unit,
K must be decreased by 6.5 units in order to retain the production isoquant
when the production level is constant.

2.29. Let an isoquant be defined by 50K^{3/5}L^{2/5} = 5000. Use implicit
differentiation with respect to L to find the slope of the isoquant dK/dL, or
the MRTS for given values of K = 360 and L = 160, and interpret the result.
Ans. dK/dL = −2K/(3L); MRTS = −(2)(360)/((3)(160)) = −1.5.

2.30. Consider the function f (x, y) = 4x2 − xy + y 2 − 4x − 7y + 10. Find


the extrema at the critical points for this function.
Solution. We have fx = 8x − y − 4 = 0, fy = −x + 2y − 7 = 0, solving
which we get the critical point (1, 4). Next, fxx = 8, fyy = 2, fxy = −1 = fyx ,
so fxx (1, 4) · fyy (1, 4) = 16 > [fxy (1, 4)]2 = 1. Hence, the function f has a
global minimum at (1, 4). The same results are obtained by using the Hessian
at the critical point:

At (1, 4): |H| = |8 −1; −1 2| = 15 > 0, with fxx = 8 > 0.

2.31. Consider the function f (x, y) = 36y − 4x2 − 8xy − 2y 2 + 72x. Find
the extrema at the critical points for this function.
Solution. We have fx = −8x − 8y + 72 = 0, fy = 36 − 8x − 4y = 0,
solving which we get the critical point (0, 9). Next, fxx = −8, fyy = −4,
fxy = −8 = fyx, so fxx(0, 9) · fyy(0, 9) = 32 < [fxy(0, 9)]² = 64. Hence, the
function f has an inflection point at (0, 9).

2.32 Consider the function f (x, y) = 6x2 − 3y 2 − 24x + 6y + 6xy. Find the
extrema at the critical points for this function.
Solution. We have fx = 12x−24+6y = 0, fy = −6y +6+6x = 0, solving
which we get the critical point (1, 2). Next, fxx = 12, fyy = −6, fxy = 6 = fyx .
Since fxx and fyy are of different signs, we get fxx · fyy = −72 < (fxy )2 = 36.
Hence, the function f has a saddle point at (1, 2). The same results are
obtained by using the Hessian at the critical point:

At (1, 2): |H| = |12 6; 6 −6| = −108 < 0.

2.33. Consider the function f(x, y) = 2x³ − 6x² + 3y³ + 18y² − 90x − 189y.
Find the extrema at the critical points for this function.
Solution. We have fx = 6x² − 12x − 90 = 0, fy = 9y² + 36y − 189 = 0,
solving which we get the critical points (−3, 3), (−3, −7), (5, 3), (5, −7). Next,
fxx = 12x − 12, fyy = 18y + 36, fxy = 0 = fyx. Then

(1) fxx(−3, 3) = −48 < 0, fyy(−3, 3) = 90 > 0,
(2) fxx(−3, −7) = −48 < 0, fyy(−3, −7) = −90 < 0,
(3) fxx(5, 3) = 48 > 0, fyy(5, 3) = 90 > 0,
(4) fxx(5, −7) = 48 > 0, fyy(5, −7) = −90 < 0.

Since fxx and fyy are of different signs in cases (1) and (4), we have saddle
points at (−3, 3) and (5, −7). Next, in case (2) we have fxx · fyy =
(−48)(−90) > (0)², so we have a relative maximum at (−3, −7). Again, in
case (3) we have fxx · fyy = (48)(90) > (0)², so we have a relative minimum
at (5, 3). The same results are obtained by using the Hessian at each critical
point:

At (−3, 3): |H| = |−48 0; 0 90|;    At (−3, −7): |H| = |−48 0; 0 −90|;
At (5, 3):  |H| = |48 0; 0 90|;     At (5, −7):  |H| = |48 0; 0 −90|.

2.34. The equation of a production isoquant is given as 8K^{1/4}L^{3/4} = 1008.
(a) Use implicit differentiation to find the MRTS dK/dL; and (b) evaluate
the MRTS at K = 238 and L = 183.
Ans. (a) dK/dL = −3K/L; (b) MRTS = −(3)(238)/183 = −3.9. Hence,
an increase in L by 1 unit will result in a decrease of 3.9 units in K, in order
to remain on the production isoquant.

2.35. The elasticity of substitution σ, 0 ≤ σ ≤ ∞, measures the percentage
change in the least-cost (K/L) input ratio resulting from a small percentage
change in the input-price ratio (PL/PK). If σ = 0, there is no substitution. A
Cobb-Douglas function has a constant elasticity of substitution (CES), defined
by q = A[αK^{−β} + (1 − α)L^{−β}]^{−1/β}, where A > 0 is the efficiency
parameter, α (0 < α < 1) is the distribution parameter denoting relative
factor shares, and β > −1 is the substitution parameter that determines the
value of the elasticity of substitution. Prove that the CES production function
f(kK, kL) is a homogeneous function of degree 1.
Solution. Multiply inputs K and L in the expression for q by a constant
k > 0, to get

f(kK, kL) = A[α(kK)^{−β} + (1 − α)(kL)^{−β}]^{−1/β}
          = A[k^{−β}(αK^{−β} + (1 − α)L^{−β})]^{−1/β}
          = A(k^{−β})^{−1/β}[αK^{−β} + (1 − α)L^{−β}]^{−1/β}
          = kA[αK^{−β} + (1 − α)L^{−β}]^{−1/β} = kq.
3 Concave and Convex Functions

The concept of a convex set is used to define concave and convex functions.
Although the names appear similar, a convex set should not be confused with
a convex function. However, a concave and a convex function are defined in
terms of a convex set.

3.1 Convex Sets


Let X denote a vector space over a field F. Let ‖·‖ denote a norm on X,
which is a mapping from X into R such that for every x, y ∈ X and every
t ∈ F, the following conditions are satisfied: (i) ‖x‖ ≥ 0; (ii) ‖x‖ = 0 iff
x = 0; (iii) ‖tx‖ = |t| · ‖x‖; and (iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖. Then X is called
a real normed linear space.
Let x, y ∈ X, and let a set xy be defined by

xy = {z ∈ X : z = tx + (1 − t)y for all t ∈ R, 0 ≤ t ≤ 1}. (3.1.1)

The set xy is the line segment joining the points x and y in X. Then we
have the following definition: The set Y ⊂ X is said to be a convex set if Y
contains the line segment xy whenever x and y are two arbitrary points in Y
(see Figure 3.1). A convex set is called a convex body if it contains at least
one interior point, i.e., if it completely contains some sphere.

Figure 3.1 Convex and non-convex sets.



Example 3.1. The following sets are convex: (i) the empty set; (ii) a set
containing one point; (iii) a line segment and a plane in R³; (iv) any linear
subspace of X; and (v) a cube and a sphere in R³. 
Example 3.2. Let Y and Z be convex sets in X, let α, β ∈ R, and let
αY = {x ∈ X : x = αy, y ∈ Y}, and βZ = {x ∈ X : x = βz, z ∈ Z}. Then
the set αY + βZ = {x ∈ X : x = αy + βz, y ∈ Y, z ∈ Z} is a convex set in
X. 
Theorem 3.1. Let Y be a convex set in X, and let α, β ∈ R be positive
scalars. Then (α + β)Y = αY + βY .
Proof. If x ∈ (α + β)Y, then x = (α + β)y = αy + βy ∈ αY + βY. Hence,
(α + β)Y ⊂ αY + βY. Let Y be convex, and let x = αy + βz, where y, z ∈ Y.
Then since α/(α + β) + β/(α + β) = (α + β)/(α + β) = 1, we get

(1/(α + β)) x = (α/(α + β)) y + (β/(α + β)) z ∈ Y.

Hence, x ∈ (α + β)Y, which gives αY + βY ⊂ (α + β)Y. 


This theorem implies that the intersection of an arbitrary collection of
convex sets is also a convex set.
Let Y be a subset of X. Then the convex hull of Y, denoted by Yc , is the
intersection of all convex sets which contain Y . This convex hull is also called
the convex cover of Y , and therefore, it is always a convex set. Some convex
hulls are presented in Figure 3.2. The convex hull is also known as convex
envelope.

Figure 3.2 Convex hulls.

Theorem 3.2. Let Y be a subset of X. The convex hull of Y is the
set of points α1 y1 + α2 y2 + · · · + αk yk, where y1, . . . , yk ∈ Y, and αi > 0,
i = 1, . . . , k, such that Σ_{i=1}^k αi = 1, where k is not fixed.

Proof. If Z is the set of points as described in the theorem, then obviously


Z is convex, and also Y ⊂ Z. Hence, Yc ⊂ Z. Now, we will show that Z ⊂ Yc ,
i.e., we show that Z is contained in every convex set which contains Y . We
will use the method of induction on the number of elements of Y that appear
in the representation of an element of Z. Suppose U is a convex set such that
Y ⊂ U . If z = α1 z1 ∈ Z for n = 1, then α1 = 1 and z ∈ U . Now assume that
an element of Z is in U if it is represented in terms of (n − 1) elements of Y .
Then, let z = α1 z1 + · · · + αn zn be in Z, and let β = α1 + · · · + αn−1 , and take
βi = αi /β, i = 1, . . . , n − 1. Further, let u = β1 z1 + · · · + βn−1 zn−1 . Then
u ∈ U , by induction. However, zn ∈ U , αn = 1−β, and z = βu+(1−β)zn ∈ U ,
since U is convex. Thus, by induction, Z ⊂ U , which implies that Z ⊂ Yc . 
Corollary 3.1. (i) Let Y be a convex set in X. Then the closure Ȳ of Y
is also a convex set; and (ii) since the intersection of finitely many closed sets
is always closed, the intersection of finitely many closed convex sets is also a
closed convex set.
A set Y in X is called a cone with vertex at the origin if y ∈ Y implies
αy ∈ Y for all α ≥ 0. Hence, if Y is a cone with vertex at the origin, then the
set x0 + Y , where x0 ∈ X, is called a cone with vertex x0 . A convex cone is
a set which is both convex and a cone. Some examples of cones are shown in
Figure 3.3.

Figure 3.3 (a) Cone. (b) Convex cone.

Theorem 3.3. Any sphere in X is a convex set.


Proof. Without loss of generality, consider the unit sphere Y = {x ∈ X :
kxk < 1}. If x0 , y0 ∈ Y , then kx0 k < 1 and ky0 k < 1. Now if α ≥ 0 and β ≥ 0,
where α + β = 1, then kαx0 + βy0 k ≤ kαx0 k + kβy0 k = αkx0 k + βky0 k <
α + β = 1, and therefore, αx0 + βy0 ∈ Y . 

3.2 Concave Functions


Note that a function f is convex iff −f is concave. Concave and convex
functions can be defined in terms of convex sets known as hypographs and

epigraphs, respectively. The hypograph1 hyp(f ) of a real-valued function


f : Rn 7→ R is defined as the area below f . It is a set in Rn+1 given by

hyp(f) = {(x, y) : y ≤ f(x)}.    (3.2.1)

Thus, a function f is said to be concave if hyp(f ) is a convex set (Figure 3.4).


This definition implies that a function f is concave if for any x, x′ ∈ dom(f ),

f (tx + (1 − t)x′ ) ≥ tf (x) + (1 − t)f (x′ ) for 0 ≤ t ≤ 1. (3.2.2)

The left-hand side of (3.2.2) denotes the functional value of a convex function,
which exceeds the combination of functional values on the right-hand side (see
Figure 3.5).

Figure 3.4 Concave function. Figure 3.5 Definition (3.2.2).

Theorem 3.4. The definitions (3.2.1) and (3.2.2) of a concave function


are equivalent.
Proof. Let (x, y) and (x′, y′) be in hyp(f), where y = f(x) and y′ = f(x′).
If hyp(f) is a convex set, then for any 0 ≤ t ≤ 1 the convex combination
(tx + (1 − t)x′, ty + (1 − t)y′) is also in hyp(f). Thus,

tf(x) + (1 − t)f(x′) = ty + (1 − t)y′ ≤ f(tx + (1 − t)x′).    (3.2.3)

The left side of this inequality equals tf(x) + (1 − t)f(x′), so (3.2.2) holds;
hence, the definition (3.2.1) implies (3.2.2). Conversely, assume that
tf(x) + (1 − t)f(x′) ≤ f(tx + (1 − t)x′). Choose y and y′ such that y ≤ f(x)
and y′ ≤ f(x′). Obviously, (x, y) and (x′, y′) are both in hyp(f). Thus,
ty ≤ tf(x), and (1 − t)y′ ≤ (1 − t)f(x′) for any 0 ≤ t ≤ 1. But this implies
that ty + (1 − t)y′ ≤ tf(x) + (1 − t)f(x′). Since the right side of this inequality
is assumed to be not greater than f(tx + (1 − t)x′), we obtain

ty + (1 − t)y′ ≤ f(tx + (1 − t)x′).

Hence, (tx + (1 − t)x′, ty + (1 − t)y′) ∈ hyp(f), i.e., hyp(f) is a convex set,
and the inequality (3.2.2) implies (3.2.1), which completes the proof. 

1 The prefix hypo- or hyp- (from Greek and Latin) means ‘under’ or ‘beneath’.
If the inequality in (3.2.3) is strict, then the function f is called a strictly
concave function; i.e., for any x, x′ ∈ dom(f ),

f (tx + (1 − t)x′ ) > tf (x) + (1 − t)f (x′ ), (3.2.4)

where t ∈ (0, 1). Thus, a function is strictly concave if its hypograph is a


strictly convex set.
Theorem 3.5. (Combinations of concave functions) (i) Let f and g be
concave functions defined on a convex subset of Rn . Then their sum f + g is
a concave function; also if one of them is strictly concave, then the sum f + g
is strictly concave.
(ii) Let f be a (strictly) concave function on a convex subset of Rn , and α
be a positive scalar. Then, αf is a (strictly) concave function.
(iii) An affine combination of concave functions is a concave function, i.e.,
αf + βg is a concave function on a subset of Rn , where α, β ≥ 0.
(iv) Let f be a (strictly) concave function on a convex subset of Rn , and
let g be a strictly increasing concave function defined on R(f ) ∈ R. Then, the
composite function f ◦ g is a (strictly) concave function.
(v) Let f and g be concave functions on a convex subset of Rn , and bounded
from below. Then, the pointwise infimum function min{f (x), g(x)} is a con-
cave function.
(vi) Let f and g be concave functions on a convex subset of Rn , and let
S be a subset of the domain space dom(f ). Then, f is continuous on the
interior of S, except possibly a point of discontinuity (singularity) only on the
boundary ∂S.
(vii) Let f be a function defined on a convex subset of Rn . Then f is
concave iff its restriction to every chord in the convex domain is a concave
function.
Geometrically, part (vii) is about a convex slice that a vertical hyperplane
cuts out of the hypograph. Recall that a function is concave iff its hypograph
is a convex set, and a hypograph is convex if every hyperplane intersecting it
produces a slice that is a convex set.

3.2.1 Properties of Concave Functions


1. A differentiable function f is concave on an interval I if its derivative f′ is
monotone decreasing on I; i.e., a concave function has a non-increasing slope
(zero slope is allowed).
2. Points where concavity changes (between concave and convex) are inflection
points.
3. Near a local maximum in the interior of the domain of a function, the
function must be concave; as a partial converse, if the derivative of a strictly
concave function is zero at some point, then the point is a local maximum.
4. If f is twice-differentiable, then f is concave iff f ′′ is nonpositive (or, if the
acceleration is nonpositive). If f ′′ is negative, then it is strictly concave, but
the converse is not true as shown by f (x) = −x4 .
5. Any local maximum of a concave function is also a global maximum. A
strictly concave function will have at most one global maximum.
6. If f is concave and differentiable, then it is bounded above by its first-order
Taylor approximation

f (y) ≤ f (x) + f ′ (x)(y − x).

7. A continuous function f on C is concave iff for any x, y ∈ C,

f((x + y)/2) ≥ (f(x) + f(y))/2.

8. If a function f is concave, and f(0) = 0, then f is subadditive.
Proof. Since f is concave, take y = 0; then f(tx) = f(tx + (1 − t) · 0) ≥
tf(x) + (1 − t)f(0) = tf(x). Also,

f(a) + f(b) = f((a/(a + b))(a + b)) + f((b/(a + b))(a + b))
            ≥ (a/(a + b)) f(a + b) + (b/(a + b)) f(a + b) = f(a + b). 

Example 3.3. Some examples of concave and convex functions are



1. The functions f(x) = −x² and g(x) = x^{3/4} are concave on their domains,
as their second derivatives f″(x) = −2 and g″(x) = −(3/16) x^{−5/4} are
always negative there.
2. The logarithm function f(x) = log x is concave on its domain (0, ∞), as
its derivative f′(x) = 1/x is a strictly decreasing function.
3. Any affine function f (x) = ax + b is both concave and convex, but not
strictly concave nor strictly convex.
4. The sine function f (x) = sin x is concave on the interval [0, π].

3.3 Jensen’s Inequality for Concave Functions


Jensen's inequality is a general form of the definition (3.2.2) of a concave
function; it uses general convex combinations. It states that a real-valued
function f : Rn 7→ R is concave iff the value of the function at a convex
combination is at least as large as the corresponding convex combination of
the functional values, i.e., iff for any k vectors x1, . . . , xk ∈ dom(f) and
t1 ≥ 0, . . . , tk ≥ 0, t1 + · · · + tk = 1, we have

f(t1 x1 + · · · + tk xk) ≥ t1 f(x1) + · · · + tk f(xk).    (3.3.1)

This inequality can also be written as f(Σ_{i=1}^k ti xi) ≥ Σ_{i=1}^k ti f(xi) for
Σ_{i=1}^k ti = 1 and ti ≥ 0 for i = 1, . . . , k.
Besides the above general definition of Jensen’s inequality, there are other
forms for this inequality for concave functions. Let f : Rn 7→ R be a concave
function. Then
(i) For two points x, x′ ∈ R, Jensen’s inequality is

f (tx + (1 − t)x′ ) ≥ tf (x) + (1 − t)f (x′ ), (3.3.2)

where t ∈ [0, 1]. This is inequality (3.2.2) valid for concave functions defined
on R.
(ii) For more than two points xi ∈ R, i = 1, . . . , k, Jensen's inequality is

f(Σ_{i=1}^k ti xi) ≥ Σ_{i=1}^k ti f(xi),    (3.3.3)

where Σ_{i=1}^k ti = 1, ti ≥ 0.
(iii) Let p(x) ≥ 0 be such that ∫ p(x) dx = 1. Then the continuous form of
Jensen's inequality is

f(∫ x p(x) dx) ≥ ∫ f(x) p(x) dx.    (3.3.4)

(iv) For any probability distribution on x, Jensen’s inequality is

f (Ex) ≥ Ef (x). (3.3.5)



Jensen’s inequality (Figure 3.6) can be interpreted as follows: The (zero


mean) randomization decreases the average value of a concave function.

Figure 3.6 Jensen’s inequality.

This inequality is a basic result which has produced many other useful
inequalities. For example, we have the arithmetic-geometric mean inequality:

(a + b)/2 ≥ √(ab) for a, b ≥ 0,    (3.3.6)

which can be easily proved by using the inequality log((a + b)/2) ≥ ½(log a +
log b) for the function f(x) = log x, which is concave for all x > 0.
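Both (3.3.3) and the arithmetic-geometric mean inequality are easy to test numerically; the Python sketch below (an illustration with arbitrary random data, not part of the text) checks Jensen's inequality for the concave function log x:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.5, 4.0, size=10)       # arbitrary positive points
    t = rng.dirichlet(np.ones(10))           # weights t_i >= 0 with sum 1
    print(np.log(t @ x) >= t @ np.log(x))    # f(sum t_i x_i) >= sum t_i f(x_i): True
    a, b = 2.0, 8.0
    print((a + b)/2 >= np.sqrt(a*b))         # AM-GM for a pair: True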

3.4 Convex Functions


Let f : Rn 7→ R. Then the function f is said to be convex if for all x, y ∈
dom(f ),
f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y), (3.4.1)

where 0 ≤ t ≤ 1. The definition (3.4.1) simply says that a convex combination


of the functional values is greater than or equal to their convex combinations
(see Figure 3.7).
Convex functions can be defined in terms of the convex set called the epigraph.
The epigraph2 epi(f) of a real-valued function f : Rn 7→ R is defined as the
area above f (see Figure 3.8). It is a set in R^{n+1}, defined by

epi(f) = {(x, y) : y ≥ f(x)}.    (3.4.2)

2 The prefix epi- from Greek and Latin means ‘over’ or ‘upon.’

Thus, a function f is said to be convex if epi(f ) is a convex set. This definition


implies that a function f satisfies the inequality (3.4.1). Obviously,

A function f is convex if and only if −f is concave.

An extended-value extension of a convex function f is defined as

f̃(x) = { f(x)  if x ∈ dom(f),
       { +∞    if x ∉ dom(f).    (3.4.3)

Then the inequality

f˜(tx + (1 − t)y) ≤ tf˜(x) + (1 − t)f˜(y) (3.4.4)

holds for all x, y ∈ Rn , t ∈ [0, 1]. The inequality (3.4.4) is an extension of


(3.4.1) defined on R ∪ {+∞}.

Figure 3.7 Definition (3.4.1). Figure 3.8 Convex set epi(f ).

Note that the property (3.4.1) is, in many cases, weakened by requiring
that
f (tx + (1 − t)x′ ) ≤ max{f (x), f (x′ )} (3.4.5)
for all x, x′ ∈ X and t ∈ [0, 1]. The inequality (3.4.5) is known as the modified
Jensen’s inequality.
We will assume that all convex functions are extendable, and hence, use
the same notation f for a convex function as well as its extension f˜.
Let f : X 7→ Y denote the mapping of a set X ⊂ Rn into another set
Y ⊂ Rn. Then a mapping f1 of a subset X1 ⊂ X into Y is called the
restriction of f to the set X1. This definition leads to the following result:
Theorem 3.6. Let f be a convex function defined on a convex subset of
Rn . Then f is convex iff its restriction to every chord in the convex domain
set is a convex function.

Geometrically, a function is convex iff its epigraph is a convex set. An


epigraph is convex if every hyperplane intersecting it produces a convex shaped
slice. This theorem is about a convex slice which is cut out of the epigraph
by a vertical hyperplane.
A function f is strictly convex if the strict inequality holds in the definition
(3.4.1).

3.4.1 Properties of Convex Functions. Some useful properties of convex


functions are as follows.
(i) Given two convex functions f and g defined on a convex subset of Rn, their
sum f + g is a convex function. Moreover, if at least one of them is strictly
convex, the sum f + g is strictly convex. This property extends to infinite
sums and integrals, i.e., if f1, . . . , fn, . . . are convex functions, then Σ_{n=1}^∞ fn
is convex; and if g(x, y) is convex in x, then ∫ g(x, y) dy is convex in x.
(ii) Given f as a (strictly) convex function on a convex subset of Rn , and a
positive scalar α, the function αf is (strictly) convex.
(iii) An affine combination of convex functions is again a convex function,
i.e., if f and g are both convex functions on a convex subset of Rn , then (a)
αf + βg, where α, β ≥ 0, is a convex function; and (b) if f is convex, then
f (Ax + b) is convex. This is called affine transformation of the domain.
(iv) Let f be a (strictly) convex function on a convex subset of Rn , and g be a
strictly increasing convex function defined on R(f ) in R. Then the composite
function f ◦ g is a (strictly) convex function.
(v) Let f1 and f2 be convex functions on a convex subset of Rn , and bounded
from above. Then the pointwise supremum function max{f1 (x), f2 (x)} is a
convex function. This property corresponds to the intersection of epigraphs
(see Exercise 3.5).
(vi) A function f is convex iff it is convex on all lines, i.e., f is convex at x0
iff f (x0 + th) is convex in t for all x0 and h.
(vii) A positive multiple of a convex function is convex, i.e., f is convex iff αf
is convex for α ≥ 0.
(viii) If {fα}_{α∈A} is a family of convex functions, then sup_{α∈A} fα is convex.
This is known as the pointwise supremum; see property (v).
Example 3.4. (i) The piecewise-linear function f(x) = max_i {a_i^T x + b_i}
is convex, and its epi(f) is a polyhedron;
(ii) sup_{s∈S} ‖x − s‖, which is the maximum distance to any point of a set S,
is convex in x;
(iii) f(x) = x_[1] + x_[2] + x_[3] is convex on Rn, where x_[i] is the ith largest
component of x; and
(iv) f(x) = Σ_{i=1}^m log(b_i − a_i^T x)^{−1} is convex, where dom(f) = {x :
a_i^T x < b_i, i = 1, . . . , m}. 

3.4.2 Jensen’s Inequality for Convex Functions. Let f : Rn 7→ R be a


convex function. Then Jensen’s inequality for convex functions is defined in
the four cases discussed in §3.3, except that the ≥ sign in inequalities (3.3.1)
through (3.3.5) is replaced by the ≤ sign.
Jensen’s inequality for convex functions is presented in Figure 3.9. It can be
interpreted as follows: The (zero mean) randomization increases the average
value of a convex function.

Figure 3.9 Jensen’s inequality.

As in the case of concave functions, a general definition of convex functions,


based on Jensen’s inequality, uses general convex combinations. Thus, a real-
valued function f : Rn 7→ R is convex iff the function value of a convex
combination is not larger than the convex combination of the functional value,
i.e., iff for any k vectors x1 , . . . , xk ∈ dom(f ) and t1 ≥ 0, . . . , tk ≥ 0, t1 +
· · · + tk = 1,

f (t1 x1 + · · · + tk xk ) ≤ t1 f (x1 ) + · · · + tk f (xk ). (3.4.6)

Example 3.5. In R, (i) x^α is convex on R+ for α ≥ 1 and for α ≤ 0, and
concave for 0 ≤ α ≤ 1.
(ii) log x is concave on R+, and x log x is convex on R++; it extends to x = 0,
since lim_{x→0} x log x = 0.
(iii) e^{ax} is convex.
(iv) |x| and max(0, ±x) are convex, but log ∫_{−x}^{x} e^{−t²} dt is concave.

(v) f(x) = e^{g(x)} is convex if g is convex; f(x) = 1/g(x) is convex if g is
concave and positive; f(x) = g(x)^p, p ≥ 1, is convex if g(x) is convex and
positive; and f(x) = −Σ_i log(−f_i(x)) is convex on {x : f_i(x) < 0} if the f_i
are convex.
(vi) Since f is concave iff −f is convex, the function f(x) = x² is convex for
all x ∈ R; the function f(x) = 1/x is convex for x ∈ R+; and the function
f(x) = log x is concave for x ∈ R+.
(vii) If f(x) = (h ◦ g)(x) = h(g(x)), where g : R 7→ R and h : R 7→ R, is
a composition of two functions g and h in R, then (a) if g is convex and h is
convex and nondecreasing, then f is convex, and (b) if g is concave and h is
concave and nondecreasing, then f is concave.
Proof. For twice-differentiable functions f, g, h, if f(x) = h(g(x)), then
f″(x) = h′(g(x)) g″(x) + h″(g(x)) (g′(x))². 

For the composition of three functions defined by f(x) = h(g1(g2(x))),
where f, h, g1 and g2 are differentiable, we have the following result.
Theorem 3.7. Let f (x) = h(g1 (g2 ))(x), where h : R2 7→ R, and gi : R 7→
R. Then f is convex if h is a univariate convex and nondecreasing function
in each argument, and g1 , g2 are convex.
Proof. For the composition f (x) = h(g1 (g2 ))(x),
 
f ′′ (x) = h′′ (g1 (g2 ))g1′ (g2 )g2′ + h′ (g1 (g2 )) g1′ (g2 )g2′′ + g1′′ (g1′ )2 .  (3.4.7)

This result can be extended to the k-fold composition of the form f(x) =
h(g1(g2(· · · (gk(x)) · · ·))).
Example 3.6. If each gi(x) is convex, then (i) f(x) = max_i gi(x) is convex,
and (ii) f(x) = log Σ_i e^{gi(x)} is convex. □
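
As a numerical illustration of Example 3.6(ii), the sketch below (Python with numpy assumed; the convex functions g_i(x) = a_i x² + b_i x with a_i ≥ 0 are hypothetical choices) verifies the defining inequality of convexity for f along a segment:

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.uniform(0, 1, 5), rng.uniform(-1, 1, 5)   # a_i >= 0 keeps each g_i convex

def f(x):
    # f(x) = log(sum_i exp(g_i(x))) with g_i(x) = a_i x**2 + b_i x
    return np.log(np.sum(np.exp(a * x**2 + b * x)))

x1, x2 = -1.5, 2.0
for t in np.linspace(0, 1, 11):
    assert f(t*x1 + (1 - t)*x2) <= t*f(x1) + (1 - t)*f(x2) + 1e-12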

3.5 Differentiable Functions

Let f : R^n → R be a concave or convex function. We say that a function f is
differentiable at the point x = (x1, x2, . . . , xn) if the first-order partial derivatives
∂f/∂xi, i = 1, 2, . . . , n, exist. Then the second-order partial derivatives of
f are ∂²f/∂xi∂xj for i, j = 1, 2, . . . , n.
The first-order Taylor's series approximation of f at a ∈ R^n is defined as

f(x) = f(a) + (∂f(a)/∂x)(x − a), (3.5.1)

which can be written in matrix form as

[f(x)]_{1×1} = [f(a)]_{1×1} + [∂f(a)/∂x]^T_{1×n} [x − a]_{n×1}. (3.5.2)

The second-order Taylor's series approximation of f at a ∈ R^n in matrix form
is

[f(x)]_{1×1} = [f(a)]_{1×1} + [x − a]^T_{1×n} [∂f(a)/∂x]_{n×1}
+ (1/2) [x − a]^T_{1×n} [∂²f/∂xi∂xj]_{n×n} [x − a]_{n×1}, (3.5.3)

for i, j = 1, 2, . . . , n. The term [∂²f/∂xi∂xj] in (3.5.3) represents the (i, j)th
element of the Hessian matrix H for the function f (a), and the vector [x− a]T
is for index i while the second vector [x − a] is for index j. In particular, in R2
with x = (x, y), the second-order Taylor approximation at a point a = (a, b)
is given by

f(x, y) = f(a, b) + (∂f(a, b)/∂x)(x − a) + (∂f(a, b)/∂y)(y − b)
+ (1/2)[(∂²f(a, b)/∂x²)(x − a)² + 2(∂²f(a, b)/∂x∂y)(x − a)(y − b) + (∂²f(a, b)/∂y²)(y − b)²]. (3.5.4)

The matrix form (3.5.3) can be compared with the summation form of the
second-order Taylor’s series approximation given in §B.3.
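
To see (3.5.4) in action, the following Python sketch (numpy assumed; the test function e^x sin y and the expansion point are illustrative) compares a function with its second-order Taylor approximation near (a, b) = (0, 1):

import numpy as np

f = lambda x, y: np.exp(x) * np.sin(y)
a, b = 0.0, 1.0
fx, fy = np.sin(b), np.cos(b)                      # first partials at (a, b)
fxx, fxy, fyy = np.sin(b), np.cos(b), -np.sin(b)   # second partials at (a, b)

def taylor2(x, y):
    dx, dy = x - a, y - b
    return (f(a, b) + fx*dx + fy*dy
            + 0.5*(fxx*dx**2 + 2*fxy*dx*dy + fyy*dy**2))

print(f(0.1, 1.1), taylor2(0.1, 1.1))   # ~0.98494 vs ~0.98505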
!!! As a tradition, some authors define the first-order derivatives of f as
∇f (x), and the first-order Taylor’s series approximation of f (x) at a point a
as
f (x) = f (a) + ∇f (x)(x − a), (3.5.5)
which is misleading because it is based on the isomorphic equality (2.7.7).
But isomorphism and equality are two different operations. If we rewrite this
approximation as
f (x) − f (a) = ∇f (x)(x − a),
we see that the left-hand side is a scalar, while the right-hand side is a vector;
thus, this equality cannot be justified by any argument. Similarly, the second-
order Taylor’s series approximation is defined as

1
f (x) = f (a) + ∇f (x)T (x − a) + (x − a)T ∇2 f (x)(x − a), (3.5.6)
2

which is again abused on two counts: (i) the second term on the right side is
already shown to be misleading, and (ii) the last term on the right involves
∇2 f , which is the Laplacian of f , same as the trace of the Hessian matrix
∂2f
H, and hence, it does not represent all second-order derivatives for
∂xi ∂xj
i, j = 1, 2, . . . , n (for more details, refer to §1.6.2). In fact, the second-order
Taylor approximation (3.5.6) does not reduce to the second-order Taylor ap-
proximation (3.5.4) for a function of two variables.

3.6 Unconstrained Optimization

In view of §1.3-1.6, a multivariate function f attains a relative minimum or maximum
if the following three conditions are met:
(i) The first-order direct partial derivatives must be zero simultaneously,
which means that at a critical point the function is neither increasing nor
decreasing with respect to the principal axes.
(ii) The second-order partial derivatives, when calculated at the critical
point, must be negative for a relative maximum and positive for a relative
minimum. Thus, in a relative way with respect to the critical point the
function is concave and moving downward in relation to the principal axes for
a maximum, and convex and moving upward relative to the principal axes for
a minimum.
(iii) The product of the second-order direct partial derivatives evaluated
at the critical point must exceed the product of the cross partial derivatives
also evaluated at the critical point. This condition is used to check for an
inflection point or a saddle point.
Example 3.7. Consider f (x, y) = 2y 3 − x3 + 12x − 54y + 12. Equating
the first-order partial derivatives of f to zero, we get fx = −3x2 + 12 =
0, fy = 6y 2 − 54 = 0, i.e., x = ±2 and y = ±3. Thus, the critical numbers
are (2, 3), (2, −3), (−2, 3), (−2, −3). Next, take the second-order direct partial
derivatives and evaluate them at each of the four critical points to check for
their signs:
fxx = −6x, fyy = 12y.
Then

(1) fxx (2, 3) = −6(2) = −12 < 0, fyy (2, 3) = 12(3) = 36 > 0,
(2) fxx (2, −3) = −6(2) = −12 < 0, fyy (2, −3) = 12(−3) = −36 < 0,
(3) fxx (−2, 3) = −6(−2) = 12 > 0, fyy (−2, 3) = 12(3) = 36 > 0,
(4) fxx (−2, −3) = −6(−2) = 12 > 0, fyy (−2, −3) = 12(−3) = −36 < 0.

Since there are different signs for each of the second direct partials in (1) and
(4), the function f cannot be at a relative extremum at (2, 3) and (−2, −3).

However, since the signs of second partials are both negative in (2) and positive
in (3) above, the function f may have a relative maximum at (2, −3) and a
relative minimum at (−2, 3). Since fxx and fyy are of different signs, the
product of fxx and fyy cannot be greater than (fxy )2 .
Since fxy = 0 = fyx , we check fxx · fyy > (fxy )2 at the critical points
(2, −3) and (−2, 3):

fxx (2, −3)·fyy (2, −3) = (−12)(−36) > 0, fxx (−2, 3)·fyy (−2, 3) = (12)(36) > 0.

Thus, f has a relative maximum at (2, −3) and a relative minimum at (−2, 3). 
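
The computations of Example 3.7 can be reproduced symbolically; a short sketch with sympy (assumed available):

import sympy as sp

x, y = sp.symbols('x y', real=True)
f = 2*y**3 - x**3 + 12*x - 54*y + 12
crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
for pt in crit:
    fxx = sp.diff(f, x, 2).subs(pt)
    fyy = sp.diff(f, y, 2).subs(pt)
    fxy = sp.diff(f, x, y).subs(pt)
    D = fxx*fyy - fxy**2                 # discriminant fxx*fyy - (fxy)^2
    kind = ('saddle' if D < 0 else
            'relative maximum' if fxx < 0 else 'relative minimum')
    print(pt, kind)   # maximum at (2, -3), minimum at (-2, 3)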
Example 3.8 Consider f (x, y) = 3x2 − xy + 2y 2 − 4x − 7y + 8. The
first-order partial derivatives, equated to zero, give: fx = 6x − y − 4 = 0, fy =
−x + 4y − 7 = 0, solving which we get x = 1, y = 2. Thus, the critical number
is (1, 2). The second-order partial derivatives are: fxx = 6, fxy = fyx =
−1, fyy = 4. Checking the condition fxx · fyy > (fxy )2 , we have 6 · 4 > (−1)2 .
Since both fxx and fyy are positive, we have a global minimum at (1, 2). □
Example 3.9. Consider f (x, y) = 52x + 36y − 4xy − 6x2 − 3y 2 + 5. The
first-order partial derivatives, equated to zero, give fx = 52 − 4y − 12x =
0, fy = 36 − 4x − 6y = 0, so the critical point is (3, 4). The second-order
partial derivatives are: fxx = −12, fxy = −4, fyy = −6. Since fxx < 0, fyy < 0,
and fxx · fyy = 72 > (fxy)² = 16 at the point (3, 4),
the function f has a global maximum at (3, 4). □
Example 3.10. Consider f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 60x. The
first-order partial derivatives, equated to zero, give fx = −6x − 6y + 60 =
0, fy = −6x − 4y + 48 = 0, so the critical point is (4, 6). The second-order
partial derivatives are: fxx = −6 < 0, fxy = fyx = −6, fyy = −4 < 0, and
fxx · fyy = (−6)(−4) = 24 < (fxy)² = 36. The function f has an inflection
point at (4, 6). □
Example 3.11. Optimize the following total profit functions π:
(a) π = −Q² + 15Q − 36; and (b) π = Q³ − 25Q² − 1200Q − 316, where Q
is the total output.
Ans. (a) Critical point Q = 7.5; π″(Q) = −2, so π″(7.5) = −2 < 0,
concave, relative maximum at Q = 7.5.
(b) π′ = 3Q² − 50Q − 1200 = (3Q + 40)(Q − 30) = 0, so the critical points
are Q = −40/3 and Q = 30; π″ = 6Q − 50, so π″(−40/3) < 0, concave, relative
maximum; π″(30) > 0, convex, relative minimum.
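
A quick numerical check of part (b) in Python (numpy assumed):

import numpy as np

dpi  = lambda Q: 3*Q**2 - 50*Q - 1200    # pi'(Q) = (3Q + 40)(Q - 30)
d2pi = lambda Q: 6*Q - 50                # pi''(Q)
for Q in (-40/3, 30):
    print(Q, np.isclose(dpi(Q), 0.0), d2pi(Q))
# pi'' = -130 < 0 at Q = -40/3 (relative maximum); pi'' = 130 > 0 at Q = 30 (relative minimum)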

3.7 Exercises

3.1. Some graphs of functions and domains are presented below in Figure
3.10. What can you say about each one of these graphs?

Figure 3.10 Convex and concave functions.

Ans. (a)-(e) concave; (f)-(i): convex; (j): indifference curve; (k)-(l): non-
convex sets.

An indifference curve is the set of all (x, y) where a utility function u(x, y)
has a constant value. In Figure 3.10(j) the values k1 , k2 , k3 represent the
indifference curves, each one obeying a different level of utility; for example,
k1 = 4, k2 = 12, k3 = 16.
3.2. Given the following plots of polygons (Figure 3.11), determine which

polygons are convex. Hint. First row convex; justify.

Figure 3.11 Polygons.

3.3. Prove that if f1 , f2 are convex, then the pointwise supremum func-
tion max{f1 (x), f2 (x)} is convex. Proof. Note that epi max{f1 (x), f2 (x)}
corresponds to the intersection of the two epigraphs (Figure 3.12).

Figure 3.12 epi max{f1 (x), f2 (x)}.

3.4. Let Y be a convex set in X. Prove that the closure Ȳ is a convex set.
Hint. Apply definition (3.1.1) to a line segment from one boundary point to
another.
3.5. Prove that f is convex on the interval [a, b] iff for any a < x1 <
x2 < x3 < b, we have

[f(x2) − f(x1)]/(x2 − x1) ≤ [f(x3) − f(x2)]/(x3 − x2). (3.7.1)

Proof. Let f be convex on [a, b], and choose a < x1 < x2 < x3 < b. Since

x2 = [(x3 − x2)/(x3 − x1)] x1 + [(x2 − x1)/(x3 − x1)] x3,

the definition of a convex function gives

f(x2) ≤ [(x3 − x2)/(x3 − x1)] f(x1) + [(x2 − x1)/(x3 − x1)] f(x3).

Thus,

(x3 − x1) f(x2) ≤ (x3 − x2) f(x1) + (x2 − x1) f(x3).

Writing x3 − x1 = (x3 − x2) + (x2 − x1) on the left and rearranging,

(x3 − x2)[f(x2) − f(x1)] ≤ (x2 − x1)[f(x3) − f(x2)],

which yields (3.7.1). □

3.6. Prove that a function f which is twice-differentiable on the interval
[a, b] ⊂ R is convex iff f″(x) ≥ 0 for a ≤ x ≤ b. Proof. Let f be convex and
twice-differentiable on [a, b]. Choose a < x1 < x2 < x3 < x4 < b. Then, by
Exercise 3.5,

[f(x2) − f(x1)]/(x2 − x1) ≤ [f(x4) − f(x3)]/(x4 − x3).

Letting x2 → x1+ and x3 → x4−, we find that f′(x1) ≤ f′(x4). Since these
points are arbitrary, f′ is increasing on (a, b). Hence, f″(x) ≥ 0 for all x ∈
(a, b). □

3.7. Choose k points (x1, f(x1)), . . . , (xk, f(xk)) on the graph of the
function y = f(x), and assign these k points normalized masses (weights)
p1, . . . , pk ∈ [0, 1] such that p1 + · · · + pk = 1. Then the center of gravity
is defined at the point (xg, yg) with xg = Σ_{i=1}^{k} pi xi, yg = Σ_{i=1}^{k} pi f(xi). Prove
that if f is a convex function on the interval [a, b], then for any choice of
{x1, . . . , xk} ∈ [a, b] and associated weights p1, . . . , pk ∈ [0, 1] with Σ_i pi = 1,
there holds the inequality f(xg) ≤ yg. Hint. Use induction on k.

3.8. Let f and g be real-valued concave functions with the same domain
D. Define a function h so that h(x) = f (x)+g(x) for all x ∈ D. Is h a concave
function? If it is, prove it; otherwise provide a counterexample. Ans. Since
f and g are concave functions with domain D, then for x ∈ D and y ∈ D,
x ≠ y, and for all t ∈ [0, 1], we have

f (tx + (1 − t)y) ≥ tf (x) + (1 − t)f (y), g(tx + (1 − t)y) ≥ tg(x) + (1 − t)g(y).



Using these two inequalities we find that

h(tx + (1 − t)y) = f (tx + (1 − t)y) + g(tx + (1 − t)y)


≥ tf (x) + (1 − t)f (y) + tg(x) + (1 − t)g(y)
= t[f (x) + g(x)] + (1 − t)[f (y) + g(y)]
= t h(x) + (1 − t)h(y),

which means that h is a concave function. A similar result holds for convex
functions; just replace the word ‘concave’ by ‘convex’ and the operation ‘≥’
by ‘≤’ in the above statement and proof.
3.9. Prove that for any two sets p1, . . . , pk ∈ [0, 1] with Σ_{i=1}^{k} pi = 1,
and q1, . . . , qk ∈ [0, 1] with Σ_{i=1}^{k} qi = 1, there holds the inequality
Σ_{i=1}^{k} pi log qi ≤ Σ_{i=1}^{k} pi log pi. Proof. Let xi = qi/pi.
Using Exercise 3.7, we find that for a convex function f,

f(Σ_{i=1}^{k} pi xi) ≤ Σ_{i=1}^{k} pi f(xi).

Since f(x) = log(1/x) is a convex function, we have

0 = log 1 = log(1/Σ_i qi) = log(1/Σ_i pi xi) ≤ Σ_i pi log(1/xi) = Σ_i pi log(pi/qi) = Σ_i pi [log pi − log qi],

whence we get Σ_i pi log qi ≤ Σ_i pi log pi. □
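
A numerical spot-check of this inequality (known as Gibbs' inequality) with random probability vectors, in Python with numpy assumed:

import numpy as np

rng = np.random.default_rng(1)
p = rng.uniform(0.01, 1, 6); p /= p.sum()   # p_i in (0, 1], sum = 1
q = rng.uniform(0.01, 1, 6); q /= q.sum()   # q_i in (0, 1], sum = 1
assert np.sum(p * np.log(q)) <= np.sum(p * np.log(p))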

3.10. Show that the function f(x) = log x is concave for x > 0.
Proof. Take t = 1/2 and use the definition (3.4.2); we should show
that f((x + y)/2) ≥ [f(x) + f(y)]/2, i.e., that log((x + y)/2) ≥ (log x + log y)/2
= log(xy)/2 for all x, y > 0. This amounts to log((x + y)/2) ≥ log(xy)^{1/2},
which after exponentiating both sides gives (x + y)/2 ≥ √(xy) for all x, y > 0 (see
inequality (3.3.6)).
3.11. Show that the function f (x) = xα is concave for 0 ≤ α ≤ 1.
Hint. Concavity or convexity can be verified using the inequality (3.2.2)
and (3.4.1), respectively, or by checking that the second derivative is nonpos-
itive or nonnegative.
3.12. Show that the following functions are convex on the specified domain:
(a) f(x) = x^α on R++ for α ≥ 1 or α ≤ 0; (b) f(x) = e^{ax} on R for
any a ∈ R; (c) f(x) = |x|^p, p ≥ 1, on R; (d) the negative entropy function
f(x) = x log x, defined on R++, or on R+ if defined as 0 for x = 0; (e)
f(x, y) = x²/y, (x, y) ∈ R², y > 0; and (f) f(x) = max_i {xi}.
i
Ans. (a)-(c) are simple; (d) f″(x) = 1/x > 0 for x > 0; (e) the Hessian
for y > 0 is

|H| = det[2/y, −2x/y²; −2x/y², 2x²/y³] = 4x²/y⁴ − 4x²/y⁴ = 0,

so |H| = 0 and fxx = 2/y > 0, which shows that the Hessian is positive
semidefinite, and hence f(x, y) is convex.
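
Part (e) can also be verified symbolically; a sketch with sympy (assumed available):

import sympy as sp

x, y = sp.symbols('x y', positive=True)
H = sp.hessian(x**2 / y, (x, y))   # [[2/y, -2x/y**2], [-2x/y**2, 2x**2/y**3]]
print(sp.simplify(H.det()))        # 0: the determinant vanishes identically
print(H[0, 0])                     # 2/y > 0, so H is PSD for y > 0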
3.13. Let x⌊i⌋ denote the ith largest component of x, which means that the
terms x⌊1⌋ ≥ x⌊2⌋ ≥ · · · ≥ x⌊n⌋ are in nonincreasing order. Then the function
f(x) = Σ_{i=1}^{m} x⌊i⌋ is a convex function.
Hint. The result follows by writing f(x) = Σ_{i=1}^{m} x⌊i⌋ = max{x_{i1} + x_{i2} +
· · · + x_{im} | 1 ≤ i1 < i2 < · · · < im ≤ n}, which is the maximum of all possible sums of
m distinct components of x; this is a convex function since it is the pointwise
maximum of n!/(m!(n − m)!) linear functions.
3.14. Use the inequality (3.3.6) to derive the Hölder inequality: for p, q > 1,
1/p + 1/q = 1, and x, y ∈ R^n,

Σ_{i=1}^{n} |xi yi| ≤ (Σ_{i=1}^{n} |xi|^p)^{1/p} (Σ_{i=1}^{n} |yi|^q)^{1/q}. (3.7.2)

Solution. Using (3.2.2) or (3.4.1), the general form of inequality (3.3.6)
is

a^t b^{1−t} ≤ ta + (1 − t)b, a, b ≥ 0, t ∈ [0, 1]. (3.7.3)

Take a = |xi|^p / Σ_{j=1}^{n} |xj|^p, b = |yi|^q / Σ_{j=1}^{n} |yj|^q, and set t = 1/p and 1 − t = 1/q. Then the
inequality (3.7.3) becomes

(|xi|^p / Σ_j |xj|^p)^{1/p} (|yi|^q / Σ_j |yj|^q)^{1/q} ≤ (1/p)(|xi|^p / Σ_j |xj|^p) + (1/q)(|yi|^q / Σ_j |yj|^q).

Then summing over i we get the Hölder inequality. The equality in (3.7.2)
holds iff |x1|^{p−1}/|y1| = |x2|^{p−1}/|y2| = · · · = |xn|^{p−1}/|yn|. Note that for p = q = 2 this
inequality reduces to the Cauchy-Schwarz inequality:

(Σ_{i=1}^{n} |xi yi|)² ≤ Σ_{i=1}^{n} |xi|² Σ_{j=1}^{n} |yj|², (3.7.4)

and the triangle inequality is

|Σ_{i=1}^{n} xi| ≤ Σ_{i=1}^{n} |xi|. (3.7.5)
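
The Hölder inequality is easy to test numerically; a Python sketch (numpy assumed, with an arbitrary exponent pair satisfying 1/p + 1/q = 1):

import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=8), rng.normal(size=8)
p = 3.0; q = p / (p - 1)                      # conjugate exponent
lhs = np.sum(np.abs(x * y))
rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
assert lhs <= rhs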

3.15. Consider f(x, y) = x³ + 2y³ − 3x² + 9y² − 45x − 60y. The first-order
partial derivatives are: fx = 3x² − 6x − 45, fy = 6y² + 18y − 60, so
the solutions of fx = 0 are x = −3, 5 and of fy = 0 are y = 2, −5, giving
four critical points: (−3, 2), (−3, −5), (5, 2), (5, −5). The second-order partial
derivatives are: fxx = 6x − 6, fxy = 0 = fyx, fyy = 12y + 18. Then

(1) fxx(−3, 2) = −24 < 0, fyy(−3, 2) = 42 > 0,
(2) fxx(−3, −5) = −24 < 0, fyy(−3, −5) = −42 < 0,
(3) fxx(5, 2) = 24 > 0, fyy(5, 2) = 42 > 0,
(4) fxx(5, −5) = 24 > 0, fyy(5, −5) = −42 < 0.

Since the signs in (1) and (4) are different, these points can only be saddle
points. Since fxy = 0 = fyx, we get from (2): (−24)(−42) > 0², and
from (3): (24)(42) > 0². Hence, the function f has a relative maximum at
(−3, −5), a relative minimum at (5, 2), and saddle points at (−3, 2) and
(5, −5). □
3.16. For the following function, find the critical points and determine if at
these points the function is a relative maximum, relative minimum, inflection
point, or saddle point.
(a) f (x, y) = 3x2 − xy + 2y 2 − 4x − 7y + 10.
(b) f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 72x.
(c) f (x, y) = 5x2 − 3y 2 − 30x + 7y + 4xy.
(d) f (x, y) = 3x3 − 5y 2 − 225x + 70y + 20.
Ans. (a) Critical point (1, 2); f has a global minimum at (1, 2).
(b) Critical point (0, 12); inflection point at (0, 12).
(c) Critical point (2, 5/2); saddle point.

(d) Critical points (5, 7), (−5, 7); relative maximum at (−5, 7), saddle point
at (5, 7).

3.17. Find (a) the critical points, and (b) test whether the following func-
tion is at a relative maximum or minimum: z = 3y 3 − x3 + 108x − 81y + 32.
Ans. (a) Equate the first-order partial derivatives to zero, and solve for
x and y: zx = −3x2 + 108 = 0, which gives x = ±6. Similarly, zy =
9y 2 − 81 = 0, which gives y = ±3. Thus, there are four distinct critical points
at (6, 3), (6, −3), (−6, 3), (−6, −3). The second partials are zxy = zyx = 0.
(b) Take the second-order direct partial derivatives, evaluate them at each
critical point, and check the signs:

zxx = −6x, zyy = 18y,


(1) zxx (6, 3) = −6(6) = −36 < 0 zyy (6, 3) = 18(3) = 54 > 0,
(2) zxx (6, −3) = −6(−6) = −36 < 0 zyy (6, −3) = 18(−3) = −54 > 0,
(3) zxx (−6, 3) = −6(−6) = 36 > 0 zyy (−6, 3) = 18(3) = 54 < 0,
(4) zxx (−6, −3) = −6(−6) = 36 > 0 zyy (−6, −3) = 18(−3) = −54 < 0.

Since there are different signs for each second partial in (1) and (4), the
function cannot be a relative maximum or minimum at (6, 3) or (−6, −3).
Also, since zxx zyy − (zx y)2 is negative at (6, 3) and positive at (−6, −3), there
is a saddle point at each of these two points. Now, since zxx and zyy are of
different signs in (2) and (3), the function may have a relative maximum at
(6, −3) and a relative minimum at (−6, 3). But since zxx and zyy are of the
same sign in (2) and (3), while zxx zyy < (zx y)2 in (3), there is an inflection
point at (−6, 3); and thus, there is a relative maximum at (6, −3).

3.18. Test for relative maxima and minima of the function f (x, y) =
x3 + 3xy 2 − 3x2 − 3y 2 − 40.
Ans. fx = 3x2 + 3y 2 − 6x = 0, fy = 6xy − 6y = 0. Solving these equations
simultaneously, we get the critical points as (0, 0), (2, 0), (1, 1), (1, −1). Also,
fxx = 6x − 6, fyy = 6x − 6, fxy = fyx = 6y. Then
at (0, 0): fxx fyy − (fxy )2 > 0 and fxx < 0 ⇒ relative maximum;
at (2, 0): fxx fyy − (fxy )2 > 0 and fxx > 0 ⇒ relative minimum;
at (1, 1): fxx fyy − (fxy )2 < 0 ⇒ saddle point;
at (1, −1): fxx fyy − (fxy )2 < 0 ⇒ saddle point.

3.19. Find the minimum distance between the origin and the surface
z² = x²y + 4.
Ans. Let P(x, y, z) be any point on the surface. Then the square of
the distance OP is d² = x² + y² + z². Thus, we find the coordinates of P
such that d² is minimum. Let d² ≡ f(x, y) = x² + y² + x²y + 4. Then
fx = 2x + 2xy = 0, fy = 2y + x² = 0; solving these equations we get the
critical points (0, 0) and (±√2, −1). Also, fxx = 2 + 2y, fyy = 2, fxy = fyx = 2x.
Then at (0, 0) we have fxx fyy − (fxy)² = 4 > 0 and fxx = 2 > 0. Hence, f has a
relative minimum at (0, 0), with d² = 4, so that d = 2.

3.20. A manufacturer determines that his cost function is C = (1/3)Q² + 3Q +
300, where Q is the number of units produced. At what level of output will
the average cost per unit be a minimum, and what is the minimum?
Ans. Average cost C̃ = C/Q = (Q²/3 + 3Q + 300)/Q = Q/3 + 3 + 300/Q, so that
dC̃/dQ = 1/3 − 300/Q² = 0 gives the critical points Q = ±30. Since
d²C̃/dQ² = 600/Q³ > 0 at Q = 30, C̃ has a relative minimum at Q = 30,
which is also the absolute minimum for Q > 0, with minimum average cost
C̃(30) = 10 + 3 + 10 = 23.

3.21. A manufacturer produces and sells 30,000 units of product throughout
the year. He would like to determine the number of units to be manufactured
in each production run so as to minimize annual set-up costs
and carrying costs. The production cost of each unit is $20, carrying costs
(insurance, storage, etc.) are estimated to be 10% of the value of the average
inventory, and set-up costs per production run are $30. Find the economic order quantity
(or economic lot size), which is the size of each production run.
Ans. Let q be the number of units in a production run. Since the sales
are distributed at a uniform rate, assume that the inventory varies uniformly
from q to 0 between production runs. Thus, we take the average inventory
to be q/2 units. The production costs are $20 per unit, so the value of the
average inventory is 20(q/2) = 10q. The carrying costs are 10% of this value,
i.e., they are equal to (0.1)(10q). The number of production runs per year is
30000/q. Thus, the total set-up costs are 30(30000/q). Hence, the total
annual carrying costs plus the set-up costs are

C = (0.1)(10q) + 30(30000/q) = q + 900000/q,

whence dC/dq = 1 − 900000/q² = 0 gives the critical points q = ±300√10.
Since d²C/dq² = 1800000/q³ > 0 at q = 300√10 ≈ 949, the absolute minimum is at
q ≈ 949.
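
A grid search in Python (numpy assumed) confirms the economic lot size:

import numpy as np

C = lambda q: q + 900000.0 / q          # total annual carrying + set-up costs
qs = np.linspace(100, 3000, 290001)     # grid with step 0.01
q_star = qs[np.argmin(C(qs))]
print(q_star, 300*np.sqrt(10))          # both approximately 948.7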

3.22. Show that the density function f(x) = (λ/2) e^{−λ|x|}, λ > 0, of the Laplace
distribution is log-concave, i.e., that ln f(x) = ln(λ/2) − λ|x| is a concave
function. What can you say about f′(x)? Ans. ln f(x) is concave since |x| is
convex; f′(x) = −λ sign(x) f(x) for x ≠ 0, and f′(x) does not exist at x = 0. Also
see Appendix C.
3.23. Prove that a (concave, convex, or any) function f : R^n → R is
differentiable at a point x ∈ dom(f) iff the gradient ∇f exists. Proof. The
gradient of f is defined at x ∈ R^n by

∇f(x) = (∂f/∂x1) e1 + (∂f/∂x2) e2 + · · · + (∂f/∂xn) en = [e] [∂f/∂x1 · · · ∂f/∂xn]^T,

where [e] is the 1 × n matrix of unit vectors ei, each in the direction of the
respective axis for i = 1, 2, . . . , n. If f is differentiable at x, then all ∂f/∂xi
exist, which implies that ∇f exists. Conversely, if ∇f exists, then all first-order
partial derivatives ∂f/∂xi exist for each i = 1, 2, . . . , n. However, note
that using ∇f is taking the question too far; simply the existence of ∂f/∂xi,
i = 1, 2, . . . , n, should suffice.
3.24. Prove that the definitions (3.2.1) and (3.2.2) are equivalent. Hint.
Follow the proof of Theorem 3.4 by using proper inequality signs.
3.25. Let f : R^n → R be defined by f(x) = p · x. Consider the problem of
minimizing p · x by choosing x subject to the condition that x belongs to a
constraint set G. Prove that the minimum value C(p) = min{p · x | x ∈ G}
is a linear homogeneous and concave function of p.
3.26. Optimize the following functions by (i) finding the critical values at
which the function is optimized, and (ii) testing the second-order condition
to determine if it is a relative maximum or minimum.
(a) f (x) = −x3 + 6x2 + 135x − 26; (b) f (x) = x4 − 4x3 − 80x2 + 108; and
(c) f (x) = (11 − 5x)4 .
Ans. (a) Critical points x = −5, 9; f ′′ (−5) > 0, convex, relative minimum
at x = −5; f ′′ (9) < 0, concave, relative maximum at x = 9.
(b) Critical points −5, 0, 8; f ′′ (−5) > 0, convex, relative minimum at x =
−5; f ′′ (0) < 0, concave, relative maximum at x = 0; f ′′ (8) > 0, convex,
relative minimum at x = 8.
(c) Critical point x = 11/5; f″(11/5) = 0, test fails; f‴(11/5) = 0, test
inconclusive; f⁗(11/5) > 0, convex, relative minimum at x = 11/5.
4
Concave Programming

The subject of concave programming deals with constrained optimization
problems, in the sense of optimizing an objective function subject to
equality and inequality constraints. In this chapter we will introduce
the method of Lagrange multipliers to solve constrained optimization problems
with equality and inequality constraints, in areas of both mathematics
and mathematical finance.

4.1 Optimization
As we have seen, optimization problems deal with finding the maximum or
minimum of a function (i.e., optimizing the objective function) subject to
none or certain prescribed constraints.
We will consider the following four cases of necessary and sufficient condi-
tions for (local) optimality: (1) no constraints; (2) only equality constraints;
(3) equality and inequality constraints; and (4) only inequality constraints.

4.1.1 Unconstrained Optimization. Assume that f : D ⊂ R^n → R is a
continuously differentiable function.
For unconstrained maximization, the necessary and sufficient conditions
for a local maximum of f(x) at x* = (x1, x2, . . . , xn) are:
(i) the first partial derivatives of f with respect to each xi, i = 1, 2, . . . , n, are
zero, i.e.,

∂f/∂xi(x) = 0, i = 1, 2, . . . , n, (4.1.1)
where the critical point x∗ is obtained by solving equations (4.1.1) simulta-
neously; and
(ii) the Hessian |H| of f at x∗ is negative semidefinite (NSD), i.e.,

|H|(f ) ≤ 0 for all x∗ , (4.1.2)



where the Hessian is defined in §1.6.2, and definite and semidefinite matrices
in §1.5.
For unconstrained minimization, the necessary and sufficient condi-
tions for a local minimum of f (x) at x∗ are:
(i) the first partial derivative of f with respect to each xi , i = 1, 2, . . . , n, is
zero, i.e.,
∂f/∂xi(x) = 0, i = 1, 2, . . . , n, (4.1.3)
where the critical point x∗ is obtained by solving equations (4.1.3) simulta-
neously; and
(ii) the Hessian |H| of f at x∗ is positive semidefinite (PSD), i.e.,

|H|(f ) ≥ 0 for all x∗ . (4.1.4)

Many examples of minimization for the unconstrained case have been consid-
ered in the previous chapters, and some examples on minimization for this
case will be presented in the next chapter.
!!! The necessary and sufficient conditions (4.1.1) and (4.1.3) are sometimes
defined by stating that ∇f (x∗ ) = 0, which reduces to (4.1.1) and (4.1.3).

Example 4.1. Consider f (x, y) = 2y 3 − x3 + 12x − 54y + 12. Equating


the first-order partial derivatives of f to zero, we get fx = −3x2 + 12 =
0, fy = 6y 2 − 54 = 0, i.e., x = ±2 and y = ±3. Thus, the critical numbers
are (2, 3), (2, −3), (−2, 3), (−2, −3). Next, take the second-order direct partial
derivatives and evaluate them at each of the four critical points to check for
their signs:
fxx = −6x, fyy = 12y.
Then

(1) fxx (2, 3) = −6(2) = −12 < 0, fyy (2, 3) = 12(3) = 36 > 0,
(2) fxx (2, −3) = −6(2) = −12 < 0, fyy (2, −3) = 12(−3) = −36 < 0,
(3) fxx (−2, 3) = −6(−2) = 12 > 0, fyy (−2, 3) = 12(3) = 36 > 0,
(4) fxx (−2, −3) = −6(−2) = 12 > 0, fyy (−2, −3) = 12(−3) = −36 < 0.

Since there are different signs for each of the second direct partials in (1) and
(4), the function f cannot have a relative extremum at (2, 3) and (−2, −3).
However, since the signs of second partials are both negative in (2) and both
positive in (3) above, the function f may have a relative maximum at (2, −3)
and a relative minimum at (−2, 3). Since fxx and fyy are of different signs,
the product of fxx and fyy cannot be greater than (fxy )2 .

Since fxy = 0 = fyx , we check fxx · fyy > (fxy )2 at the critical points
(2, −3) and (−2, 3):
fxx (2, −3)·fyy (2, −3) = (−12)(−36) > 0, fxx (−2, 3)·fyy (−2, 3) = (12)(36) > 0.
Hence, f has a relative maximum at (2, −3) and a relative minimum at
(−2, 3). 
Example 4.2. Maximize the utility function u(x, y) = 3xy subject to the
constraint g(x) = 3x + 4y − 60. The Lagrangian is L(x, y, λ) = 3xy + λ(60 −
3x−4y), and the first-order partial derivatives equated to zero give the system
of equations Lx = 3y − 3λ = 0, Ly = 3x − 4λ = 0, Lλ = 60 − 3x − 4y = 0.
Writing this system in the matrix form Ax = b:

[0 3 −3; 3 0 −4; −3 −4 0] [x, y, λ]^T = [0, 0, −60]^T.

Using Cramer's rule to solve this system, we have |A| = 72, |A1| = 720, |A2| =
540, |A3| = 540, which give the critical values x* = 10, y* = 7.5, λ* = 7.5. Next,
taking the second-order partial derivatives of L with respect to x, y (Lxx =
0, Lxy = 3 = Lyx, Lyy = 0) and the first-order partial derivatives of g
(gx = 3, gy = 4), and writing in the left-side form of (1.6.7), we get

|H̄| = [0 3 4; 3 0 3; 4 3 0], or |H̄| = [0 3 3; 3 0 4; 3 4 0].

Thus, we find the value |H̄2| = |H̄| = 72 from the bordered Hessian on the
left, and |H̄2| = |H̄| = 72 from the one on the right. Note
that there is no need to use both forms of the bordered Hessian; the left-hand
form works as well. Since |H̄2| = 72 > 0, the bordered Hessian |H̄| is negative
definite, and the function u is maximized at the critical point (x*, y*) = (10, 7.5). □
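
Since the first-order system is linear, it can equally be solved with a numerical linear solver instead of Cramer's rule; a Python sketch (numpy assumed):

import numpy as np

A = np.array([[ 0.0,  3.0, -3.0],
              [ 3.0,  0.0, -4.0],
              [-3.0, -4.0,  0.0]])
b = np.array([0.0, 0.0, -60.0])
x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)    # 10.0, 7.5, 7.5, as above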

4.2 Method of Lagrange Multipliers

Given functions f, g1, . . . , gm and h1, . . . , hk defined on some domain D ⊂ R^n,
the maximization problem is stated as follows.
Determine max_{x∈D} f(x) subject to the constraints gi(x) ≤ 0 for all i = 1, . . . , m
and hj(x) = 0 for all j = 1, . . . , k.

4.2.1. Constrained Optimization with Equality Constraint. The


method of Lagrange multipliers is used in this and other cases. Thus, given a
function f (x, y) subject to a constraint g(x, y) = k (constant), a new function
can be defined by either of the following two equivalent forms:

F (x, y, λ) = f (x, y) − λ(g(x, y) − k), (4.2.1)



or
F (x, y, λ) = f (x, y) + λ(k − g(x, y)), (4.2.2)

where λ > 0 is known as the Lagrange multiplier, f (x, y) as the original


function or the objective function, g(x, y) as the constraint, and F (x, y, λ)
as the Lagrangian function or simply, the Lagrangian. Since the constraint is
always set equal to zero, the product λ(g(x, y) − k), or λ(k − g(x, y)) is also
equal to zero, and thus, the addition of this term does not change the value
of the objective function f (x, y). In view of Theorem 2.18, if ∇F (x, y, λ) = 0,
then
Fx (x, y, λ) = 0, Fy (x, y, λ) = 0, Fλ (x, y, λ) = 0. (4.2.3)

The critical values at which the objective function is optimized are denoted
by x∗ , y ∗ , λ∗ , and are determined by solving equations (4.2.3) simultaneously.
The second-order conditions will obviously be different from those for the
unconstrained optimization considered in §4.1; they are discussed in the se-
quel.

Example 4.3. Optimize f(x, y) = 2x² + 12xy − 5y², subject to the constraint
x + y = 30. The Lagrangian in the form (4.2.2) is

F (x, y, λ) = 2x2 + 12xy − 5y 2 + λ(30 − x − y).

The first-order partial derivatives are: Fx = 4x + 12y − λ = 0, Fy = 12x −
10y − λ = 0, Fλ = 30 − x − y = 0, which when solved simultaneously give
the critical values x* = 22, y* = 8, λ* = 184. Substituting these values
in F(x, y, λ) we get F(22, 8) = 2760 = f(22, 8). Notice that both functions
f(x, y) and F(x, y) are equal at the critical values, since the constraint is equal
to zero at these values.
The second-order derivatives are Fxx = 4, Fyy = −10, Fxy = Fyx = 12.
Also, from the constraint g(x, y) = x + y − 30, we have gx = 1, gy = 1. Then
the bordered Hessian (see Eq (1.6.7)) is

|H̄| = [4 12 1; 12 −10 1; 1 1 0],

and its second principal minor is |H̄2| = |H̄| = 4(−1) − 12(−1) + 1(12 + 10) =
30 > 0. Since |H̄2| > 0, |H̄| is negative definite, and so F(x, y) has a local
maximum at (22, 8). □
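
Since the constraint is a single linear equation, Example 4.3 can be cross-checked by eliminating y = 30 − x and maximizing in one variable; a Python sketch (numpy assumed):

import numpy as np

h = lambda x: 2*x**2 + 12*x*(30 - x) - 5*(30 - x)**2   # = -15x^2 + 660x - 4500
xs = np.linspace(0.0, 30.0, 3001)
x_star = xs[np.argmax(h(xs))]
print(x_star, 30 - x_star, h(x_star))    # 22.0, 8.0, 2760.0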
Note that the Lagrange multiplier λ approximates the marginal impact
on the objective function caused by a small change in the constant of the
constraint. Thus, in the above example, with the value of λ∗ = 184, a 1-unit

increase (or decrease) in the constant of constraint would result in an increase


(or decrease) in F by approximately 184 units, as the following example shows.

Example 4.4. Suppose there is a 1-unit change (decrease) in the constant


of the constraint. We want to determine what change it will make in F and
λ∗ in the above example. Then with the new constraint g(x, y) = x + y − 29,
we have
F (x, y, λ) = 2x2 + 12xy − 5y 2 + λ(29 − x − y),
which gives Fx = 4x + 12y − λ = 0, Fy = 12x − 10y − λ = 0, Fλ = 29 −
x − y = 0. Solving these equations simultaneously we get x∗ = 21.266, y ∗ =
7.733, λ∗ = 177.86; thus, F (21.266, 7.733) = 2578.887 = f (21.266, 7.733),
which is approximately 181.1 (i.e., about 6.6%) smaller than the previous
value of F = 2760, while the new value of λ∗ is about 3.3% smaller than the
previous value of λ∗ = 184. 
This is the reason why Lagrange multipliers are called shadow prices. Also,
in utility optimization subject to a budget constraint, the value of λ will
estimate the marginal utility of an extra dollar of income.
!!! The function F (x, y), defined by (4.2.1), can have the term λ(k − g(x, y))
= 0 either added to or subtracted from the objective function f (x, y) without
changing the value of x∗ and y ∗ , except that only the sign of λ∗ will be affected.

4.3 Karush-Kuhn-Tucker Conditions


We will consider maximization problems subject to equality and inequality
constraints. This case is also known as nonlinear programming, and the
Karush-Kuhn-Tucker (KKT) conditions (also known as Kuhn-Tucker con-
ditions) are used in such optimization problems. These conditions provide
the first-order necessary conditions for an optimal solution in nonlinear pro-
gramming, provided that certain regularity conditions are satisfied. Although
the KKT conditions allow inequality constraints, this is a generalization of
the method of Lagrange multipliers, which allows only equality constraints.
The system of equations under the KKT conditions is generally not solved
directly, except in a few special cases where a closed-form solution can be de-
rived analytically. In general, many nonlinear optimization algorithms can be
interpreted by numerical methods for solving the KKT system of equations.

4.3.1 Equality and Inequality Constraints. Consider the nonlinear pro-


gramming problem

Optimize f (x) subject to gi (x) ≤ 0, hj (x) = 0, (4.3.1)

where x is the optimization variable, f the objective (or utility) function,


gi , i = 1, . . . , m, are the inequality constraint functions, and hj , j = 1, . . . , k,
are the equality constraint functions.

4.3.2 Necessary Conditions. Suppose that the objective function f : R^n →
R, and the constraint functions gi : R^n → R and hj : R^n → R, are continuously
differentiable at a point x*. If this point x* is a local optimum that
satisfies the following four regularity conditions (known as the KKT conditions),
then there exist constants µi (i = 1, . . . , m) and λj (j = 1, . . . , k)
such that:
(i) Stationarity conditions:
For maximizing f(x):

∂f/∂x_l(x*) = Σ_{i=1}^{m} µi ∂gi/∂x_l(x*) + Σ_{j=1}^{k} λj ∂hj/∂x_l(x*), l = 1, . . . , n. (4.3.2)

For minimizing f(x):

−∂f/∂x_l(x*) = Σ_{i=1}^{m} µi ∂gi/∂x_l(x*) + Σ_{j=1}^{k} λj ∂hj/∂x_l(x*), l = 1, . . . , n. (4.3.3)

(ii) Primal feasibility conditions:

gi(x*) ≤ 0 for all i = 1, . . . , m,
hj(x*) = 0 for all j = 1, . . . , k. (4.3.4)

(iii) Dual feasibility condition:

µi ≥ 0 for all i = 1, . . . , m. (4.3.5)

(iv) Complementary slackness condition:

µi gi(x*) = 0 for all i = 1, . . . , m. (4.3.6)


Note that in the absence of inequality constraints, i.e., when m = 0, the
KKT conditions become the Lagrange conditions and the KKT multipliers
become the Lagrange multipliers.
Further, if we add an additional multiplier µ0, which may be zero, and
rewrite the above KKT stationarity conditions (i) as

µ0 ∂f/∂x_l(x*) + Σ_{i=1}^{m} µi ∂gi/∂x_l(x*) + Σ_{j=1}^{k} λj ∂hj/∂x_l(x*) = 0, (4.3.7)

then the KKT conditions belong to a wider class of first-order necessary conditions
(FONC) which allow non-smooth functions using subderivatives. Condition
(4.3.7) is known as the Fritz John condition, to be discussed later in
§5.3.

4.3.3 Regularity Conditions (or constraint qualifications). If a minimum
point x* is to satisfy the KKT conditions (4.3.2) through (4.3.6), the problem
(4.3.1) must satisfy certain regularity conditions. The most used of such
conditions are as follows:
1. Linearity constraint qualification (LCQ): If gi and hj are affine functions,
i.e., linear functions plus a constant, then no other condition
is needed.
2. Linear independence constraint qualification (LICQ): The gradients of the
active inequality constraints and those of the equality constraints are linearly
independent at x∗ .
3. Mangasarian-Fromovitz constraint qualification (MFCQ): The gradients
of the active inequality constraints and those of the equality constraints are
positive-linearly independent at x∗ .
4. Constant rank constraint qualification (CRCQ): For each subset of the gra-
dients of the active inequality constraints and those of the equality constraints,
the rank at a vicinity of x∗ is constant.
5. Constant positive linear dependence constraint qualification (CPLD): For
each subset of the gradients of the active inequality constraints and those of
the equality constraints, if it is positive-linearly dependent at x* then it is
positive-linearly dependent¹ in a vicinity of x*.
6. Quasi-normality constraint qualification (QNCQ): If the gradients of the
active inequality constraints and those of the equality constraints are positive-linearly
dependent at x* with associated multipliers µi for inequalities and λj
for equalities, then there exists no sequence {xn} → x* such that µi ≠ 0 ⇒
µi gi(xn) > 0 and λj ≠ 0 ⇒ λj hj(xn) > 0.
7. Slater conditions (SC): For a convex problem, there exists a point x such
that h(x) = 0 and gi (x) < 0.
This is the most used condition in practice. Note that

LICQ ⇒ MFCQ ⇒ CPLD ⇒ QNCQ,
LICQ ⇒ CRCQ ⇒ CPLD ⇒ QNCQ,

but the converses are not true, and MFCQ and CRCQ do not imply each other.
In practice one should prefer the weaker constraint qualifications, since they
hold for a wider class of problems and thus yield stronger optimality results.

4.3.4 Sufficient Conditions. In general, the necessary conditions do not


qualify as sufficient conditions for optimality. However, in some cases the
necessary conditions also become sufficient conditions, with some additional
information such as the second-order sufficient condition (SOSC). For exam-
ple, for smooth functions, the SOSC involves the second derivative, hence the

¹A set (v1, . . . , vn) is positive-linearly dependent if there exist a1 ≥ 0, . . . , an ≥ 0, not
all zero, such that a1 v1 + · · · + an vn = 0.

name. The general rule is: The necessary conditions are sufficient for optimal-
ity if the objective function f in an optimization (maximization) problem is a
concave function, where the inequality constraints gi are continuously differ-
entiable convex functions and the equality constraints hj are affine functions.
Example 4.5. (Nonlinear programming problem in R2 ) Consider the prob-
lem: maximize f (x, y) = xy subject to the constraints x + y 2 ≤ 2 for x, y ≥ 0.
Since the feasible region is bounded, a global maximum for this problem ex-
ists, because a continuous function on a closed and bounded (compact) set has
a maximum there. We write the given constraints as g1 (x, y) = x + y 2 ≤ 2,
g2 (x, y) = −x ≤ 2, g3 (x, y) = −y ≤ 0. Then the KKT conditions can be
written as

y − λ1 + λ2 = 0, (4.3.10)
x − 2yλ1 + λ3 = 0, (4.3.11)
λ1(2 − x − y²) = 0, (4.3.12)
λ2 x = 0, (4.3.13)
λ3 y = 0, (4.3.14)
x + y² ≤ 2, (4.3.15)
x, y, λ1, λ2, λ3 ≥ 0. (4.3.16)

Note that in equations of the form λi(bi − gi(x1, . . . , xn)) = 0
at least one of the two factors must be zero. If there are n such conditions
there are at most 2^n possible cases to consider. In this example there are only
2² = 4 cases, which are as follows:
Case 1. Suppose λ1 = 0. Then from (4.3.10) and (4.3.11) we get y + λ2 = 0
and x + λ3 = 0, respectively. Since each term is nonnegative, the only
solution of these two equations is x = y = λ2 = λ3 = 0. Check that
the KKT conditions are satisfied when x = y = λ1 = λ2 = λ3 = 0.
However, these values do not provide a local maximum since f(0, 0) = 0
and f(x, y) > 0 at points inside the feasible region.
Case 2. Suppose x + y² = 2. Then at least one of x = 2 − y² and y must be
positive.
Case 2a. Suppose x > 0. Then λ2 = 0, and in view of (4.3.10) we get λ1 = y.
Substituting x = 2 − y² into (4.3.11) gives x − 2yλ1 + λ3 = 2 − 3y² + λ3 = 0,
or 3y² = 2 + λ3 > 0, so y > 0 and hence (4.3.14) forces λ3 = 0. Hence,
y = √(2/3), and x = 2 − 2/3 = 4/3. Note that these
values satisfy all the KKT conditions.

Case 2b. Suppose x = 0, which gives y = √2. Since y > 0 we get λ3 = 0.
Then (4.3.11) gives λ1 = 0. But this is precisely Case 1.

Thus, the only candidate points are (0, 0) and (4/3, √(2/3)), and the
global maximum is at (4/3, √(2/3)). □
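
A numerical cross-check of Example 4.5, using scipy's constrained minimizer (scipy assumed available) applied to −xy:

import numpy as np
from scipy.optimize import minimize

res = minimize(lambda v: -v[0]*v[1], x0=[1.0, 0.5],
               bounds=[(0, None), (0, None)],
               constraints=[{'type': 'ineq',
                             'fun': lambda v: 2 - v[0] - v[1]**2}])
print(res.x)    # approximately [4/3, sqrt(2/3)] = [1.3333, 0.8165]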

Example 4.6. (Utility maximization with one simple constraint) Consider


the utility maximization problem with a budget constraint:

Maximize u(x, y) subject to B = Px x + Py y, and x∗ ≥ x. (4.3.17)

This problem has two constraints. Using the Lagrange method we can set up
this problem with two constraints so that the Lagrange problem becomes

max_{x,y} u(x, y) + λ1(B − Px x − Py y) + λ2(x* − x). (4.3.18)

This type of problem is found in cases of budget and ration constraints,
wartime rationing, and peak load pricing. We will provide examples
in each of these situations.
However, in this utility maximization problem we know that the ration
constraint may or may not be binding, since that depends on the size of x*.
Solution. We vary the first-order conditions slightly, using
the KKT conditions, which can be applied to cases where the constraints
may sometimes be non-binding. The KKT conditions for problems (4.3.17)-(4.3.18)
are

Lx = ux − λ1 Px − λ2 = 0, x ≥ 0,
Ly = uy − λ1 Py = 0, y ≥ 0,
Lλ1 = B − Px x − Py y ≥ 0, λ1 ≥ 0, (4.3.19)
Lλ2 = x* − x ≥ 0, λ2 ≥ 0.
Using the Lagrange problem (4.3.18) we must have λ1 (B − Px x − Py y) = 0,
which yields either λ1 = 0 or B − Px x − Py y = 0. Thus, if we regard λ1
as the marginal utility of the budget (i.e., income), then in case the budget
constraint is not satisfied, the marginal utility of the additional B is zero
(λ1 = 0).
Similarly for the ration constraint, either x∗ − x = 0 or λ2 = 0, where λ2
can be regarded as the marginal utility of relaxing the ration constraint.
Solutions of such problems need the trial-and-error method. This method
enumerates the points on the boundary. Since there is more than one possible
outcome, we must try them all, but with the understanding that we are making
an educated guess as to which constraint is more likely to be non-binding.
Since in this problem we know that the budget constraint will always be
binding, we will concentrate on the ration constraint, and go through the
following steps:
Step 1. (Simply ignore the second constraint) We assume that λ2 =
0, λ1 > 0. Then the first-order KKT conditions (4.3.19) become

Lx = ux − λ1 Px = 0,
Ly = uy − λ1 Py = 0, (4.3.20)
Lλ1 = B − Px x − Py y = 0.

We find a solution for x∗ and y ∗ , and then check if the constraint that was
ignored (i.e., λ2 ) has been violated. If the answer is yes, then go to Step 2.
Step 2. (Use both constraints, assuming that they are binding) We take
λ1 > 0, λ2 > 0. Then the first-order KKT conditions (4.3.19) become

Lx = ux − λ1 Px − λ2 = 0,
Ly = uy − λ1 Py = 0,
(4.3.21)
Lλ1 = B − Px x − Py y = 0,
Lλ2 = x∗ − x = 0.

Then the solution will be the point where the two constraints intersect.
Step 3. (Use the second constraint, ignore the first one) We assume λ2 >
0, λ1 = 0, and the first-order KKT conditions (4.3.19) become

Lx = ux − λ2 = 0,
Ly = uy = 0, (4.3.22)
Lλ2 = x* − x = 0.

We will explain these steps using a numerical example, which is as follows:

Maximize u(x, y) = xy subject to x + y ≤ 90, and x ≤ 30. (4.3.23)

From (4.3.18) the Lagrangian is

L(x, y) = xy + λ1 (90 − x − y) + λ2 (30 − x),

and the KKT conditions (4.3.19) are

Lx = y − λ1 − λ2 = 0, x ≥ 0,
Ly = x − λ1 = 0, y ≥ 0,
(4.3.24)
Lλ1 = 90 − x − y ≥ 0, λ1 ≥ 0,
Lλ2 = 30 − x ≥ 0, λ2 ≥ 0.

So we have four equations and four unknowns x, y, λ1 and λ2. To solve, we
use the above steps, i.e., we first ask if any λi (i = 1, 2) could be zero. First
try λ2 = 0 because, in view of the form of the utility function, λ1 = 0 does
not make sense. This gives x − λ1 = y − λ1, or x = y, and from the constraint
90 − x − y = 0 we get x* = y* = 45, which cannot be the case, as it violates the
constraint x ≤ 30. Hence, x* = 30 and y* = 60, and thus, λ1* = 30 and
λ2* = y* − λ1* = 30. □
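
Since the budget constraint binds at the optimum, the solution can be confirmed by a one-dimensional search along the budget line; a Python sketch (numpy assumed):

import numpy as np

xs = np.linspace(0.0, 30.0, 301)    # ration constraint: x <= 30
us = xs * (90.0 - xs)               # u = xy with y = 90 - x on the budget line
i = np.argmax(us)
print(xs[i], 90.0 - xs[i], us[i])   # 30.0, 60.0, 1800.0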

4.4 Inequality Constrained Optimization

So far we have studied optimization problems in which the constraints are
strict equalities. In certain economic problems there are weak constraints;
for example, problems in which a utility function is to be maximized while
spending 'not more than' a given budget, or costs are to be minimized using
'not more than' a given number of workweeks.
Optimization problems of this kind are called concave programming, because
the objective function and the constraint functions are assumed to be concave.
It is a form of nonlinear programming where the objective function is optimized
subject to inequality constraints.
As mentioned in Chapter 3, convex functions are also covered, since
convex functions are negatives of concave functions. This implies that
concave programming can also be used to minimize a convex objective (or
utility) function by maximizing the negative of that function.
Case 4. Only inequality constraints. Consider the following optimization
problem:

Maximize f(x1, x2) subject to g(x1, x2) ≥ 0, x1, x2 ≥ 0, (4.4.1)

and the Lagrangian is

F(x1, x2, λ) = f(x1, x2) + λ g(x1, x2). (4.4.2)

The first-order KKT conditions, which are necessary and sufficient conditions
for maximization, give the following six relations:

1a. ∂F/∂xi = ∂f(x1*, x2*)/∂xi + λ* ∂g(x1*, x2*)/∂xi ≤ 0,
1b. xi* ≥ 0,
1c. xi* ∂F/∂xi = 0, i = 1, 2,
2a. ∂F/∂λ = g(x1*, x2*) ≥ 0, (4.4.3)
2b. λ* ≥ 0,
2c. λ* ∂F/∂λ = 0,

where x1*, x2* are the critical points of f. Conditions (1c) and (2c) are called
the complementary-slackness conditions; they imply that xi* and ∂F/∂xi
cannot both be nonzero, and similarly λ* and g(x1*, x2*) cannot both be
nonzero.
For a linear objective function which is concave or convex, but not strictly
concave or strictly convex, the concave programming that satisfies the KKT

conditions will always satisfy the necessary and sufficient conditions for a
maximum.
The significance of the KKT conditions (4.4.3) is as follows:
(i) Condition (1.a) requires that the Lagrangian function F be maximized
with respect to x1 and x2 , while condition (2.a) demands that the function F
be minimized with respect to λ. This means that the concave programming
is designed to seek out a saddle point in the function F in order to optimize
the objective function f subject to the inequality constraints g.
(ii) In optimization problems with equality constraints which are set equal
to zero, the quantity λg related to the constraint can be either subtracted
or added to the objective function f to define the Lagrangian function as in
Eqs (4.2.1) and (4.2.2). However, in concave programming with inequality
constraints, the order of subtraction is very important, since the constraint in
the KKT conditions is always expressed in the ≥ 0 form.
The KKT conditions with inequality constraints can be explained in the
single variable case as follows. Suppose we want to find a local maximum for
an objective function f (x) in the first quadrant x ≥ 0 (this is the inequality
constraint). There are three cases: (i) The critical point is an interior point
of the first quadrant (Figure 4.1(a)): f ′ (x) = 0 and x > 0; (ii) the critical
point is on the boundary (at G) (Figure 4.1(b)): f ′ (x) = 0 and x = 0; and
(iii) critical point is at H or J (Figure 4.1(c)): f ′ (x) < 0 and x = 0. Thus,
all these three cases can be stated concisely in one statement:
f ′ (x) ≤ 0, x ≥ 0 and xf ′ (x) = 0,
which is precisely contained in the KKT conditions. Note that these conditions
exclude a point like K in Figure 4.1(a), which is not a maximum
because f′(K) > 0.

Figure 4.1 KKT Conditions in single variable.



Example 4.7. Consider the problem:

Maximize f(x, y) subject to g(x, y) ≤ k, x ≥ 0, y ≥ 0.

The possible configurations of a solution (x*, y*) are:
1. g(x*, y*) < k, x* > 0, y* > 0,
2. g(x*, y*) = k, x* > 0, y* = 0,
3. g(x*, y*) < k, x* > 0, y* = 0,
4. g(x*, y*) < k, x* = 0, y* = 0, (4.4.4)
5. g(x*, y*) < k, x* = 0, y* > 0,
6. g(x*, y*) = k, x* = 0, y* > 0,
7. g(x*, y*) = k, x* > 0, y* > 0.
See Figure 4.2 for the feasible region and the sets where these constraints are
applied.
Consider the non-negativity constraint z ≥ 0. There are two basic types of
solutions with this kind of constraint, illustrated by the functions h1(z) and h2(z).
We will consider h1(z) with a maximum at z* > 0. In this case the maximum is at a point on
a flat part of the function. Hence, h1′(z*) = 0.
Next consider h2 (z) with a maximum at z ∗∗ . In this case the function could
be flat at z ∗∗ = 0. But it could also be downward sloping, so h′2 (z ∗∗ ) ≤ 0 (see
Figure 4.2b).

Figure 4.2 (a) Feasible region. (b) Non-negativity constraints.

At this point we are unable to determine whether the function h(z) looks
like h1 (z) or h2 (z). Thus, we need a set of first-order KKT conditions that

allows for either case. These conditions can be written as

either z* = 0 and h′(z*) ≤ 0,
or z* > 0 and h′(z*) = 0.

These conditions can be written in short as

h′(z*) ≤ 0 and z* h′(z*) = 0.

Notice that the KKT conditions are similar to the complementary slackness
conditions which are used in the Lagrangian formulation. Recall that for
maximizing f(x, y) subject to g(x, y) ≤ k, without imposing the non-negativity
of x and y yet, the Lagrangian is

L = f(x, y) − λ(g(x, y) − k), (4.4.5)

and the first-order conditions are

Lx = ∂f(x*, y*)/∂x − λ* gx(x*, y*) = 0,
Ly = ∂f(x*, y*)/∂y − λ* gy(x*, y*) = 0, (4.4.6)
g(x*, y*) − k ≤ 0,
λ*(g(x*, y*) − k) = 0.

The last condition in (4.4.6) implies that either the constraints bind, or else
the Lagrange multiplier λ is zero. This means that the solution could be in
region 7 or region 1 (see Figure 4.2a).
The KKT conditions are relevant in problems where we add the non-
negativity constraints x ≥ 0 and y ≥ 0 to the constrained maximization
problem. This adds the restriction implied by these constraints directly into
the first-order conditions, i.e., they capture the way the first-order conditions
change when the solution is in regions 2-6 in Figure 4.2(a).
Example 4.8. A company wants to maximize utility while spending no
more than a predetermined budget. Suppose the concave programming prob-
lem is posed as:

Maximize u(x, y) subject to B − px x − py y ≥ 0, x, y ≥ 0.

The Lagrangian is

L(x, y) = u(x, y) + λ(B − px x − py y).



First, L is maximized with respect to the variables x, y, subject to the KKT
conditions:

1a. ∂L/∂x = ux − λ* px ≤ 0, ∂L/∂y = uy − λ* py ≤ 0,
1b. x* ≥ 0, y* ≥ 0,
1c. x*(ux − λ* px) = 0, y*(uy − λ* py) = 0.

Next, the Lagrangian L is minimized with respect to the variable λ, subject to the
related conditions:

2a. ∂L/∂λ = B − px x − py y ≥ 0,
2b. λ* ≥ 0,
2c. λ*(B − px x − py y) = 0.

Thus, we have three cases of nontrivial solutions, which are as follows.


Case 1. If x*, y* > 0, then from (1c) we have

ux − λ* px = 0, uy − λ* py = 0,

which gives
λ* = ux/px = uy/py. (4.4.7)

Since px, py > 0 and assuming the consumer is nonsatiated (i.e., ux, uy >
0), we have λ* > 0. But then from (2c) we will have B − px x − py y = 0. Thus,
the budget constraint behaves exactly like an equality constraint (and not a
weak inequality). Hence, the optimal point (x*, y*) will lie somewhere on the
budget line and not below it. Further, from Eq (4.4.7) we also get ux/uy = px/py.
Since ux/uy is simply the slope of the indifference curve, and px/py is the slope of the
budget line, whenever both x*, y* > 0 (this case), the indifference curve will
be tangent to the budget line at the point of optimization, and this provides
an interior solution (see Figure 4.3a). This case, in fact,
reduces the problem to the constrained optimization problem discussed in the
previous section.
Case 2. If x* = 0, y* > 0, then from (1c) we have

ux − λ* px ≤ 0, uy − λ* py = 0; thus λ* ≥ ux/px, λ* = uy/py. (4.4.8)

Assuming that ux, uy, px, py > 0, we get λ* > 0. Thus, from (2c) the budget
constraint again behaves exactly like an equality constraint, and not a weak inequality,
even though only one variable is greater than zero and the other equals
zero. Hence, as in Case 1, the optimal point (x*, y*) will lie on the budget line
and not below it (Figure 4.3a).
Now, substituting λ* = uy/py into the inequality in (4.4.8) we get

ux/px ≤ uy/py, or ux/uy ≤ px/py.

This means that the indifference curves along the budget line are everywhere flatter
than the budget line, which leads to a corner solution in the upper left
(Figure 4.3b); at the corner solution the slope of the indifference
curve that just touches the budget line may be flatter than or equal to the slope of
the budget line.

Figure 4.3 Two cases.

Case 3. If x* = 0 = y*, then L = u(x, y) + λB, and (1a) gives ux ≤
0, uy ≤ 0, while (1b) and (1c) are satisfied. Also, (2a) becomes Lλ = B > 0,
and then (2c) gives λ* = 0. Thus, the problem reduces to that of optimization
with no constraints, which was discussed in §3.6.
This analysis provides an insight into the necessary and sufficient KKT
conditions. However, their use in problems of practical applications is ex-
plained in the following examples, where x∗ , y ∗ , λ∗ still denote the critical (or
optimal) values.
Example 4.9. Maximize the profit π(x, y) = 30x − x² + 64y − 2y² − 11
subject to the production constraint x + y ≤ 28.
First, the constraint is 28 − x − y ≥ 0. Then the Lagrangian is

Π(x, y) = 30x − x² + 64y − 2y² − 11 + λ(28 − x − y). (4.4.9)

The KKT conditions are:

1a. Πx = 30 − 2x* − λ* ≤ 0, Πy = 64 − 4y* − λ* ≤ 0,
1b. x* ≥ 0, y* ≥ 0,
1c. x*(30 − 2x* − λ*) = 0, y*(64 − 4y* − λ*) = 0,
2a. Πλ = 28 − x* − y* ≥ 0,
2b. λ* ≥ 0,
2c. λ*(28 − x* − y*) = 0.

Next, we check these conditions as follows:

(i) Check if λ* = 0 or λ* > 0: If λ* = 0, then (1a) and (1c) give 30 − 2x* = 0 and
64 − 4y* = 0, i.e., x* = 15, y* = 16, so that x* + y* = 31 > 28, which violates the
constraint. Thus λ* ≠ 0, and we conclude that λ* > 0.
(ii) If λ* > 0, the constraint holds with equality, i.e.,

28 − x* − y* = 0.

(iii) If λ* > 0, then we check if x* or y* can be zero. If x* = 0, y* = 28,
then the second condition in (1c) is violated, because 28(64 − 4(28) − λ*) ≠ 0
(since λ* > 0). Similarly, if y* = 0, x* = 28, then the first condition in (1c)
is violated, because 28(30 − 2(28) − λ*) ≠ 0. Thus, neither x* nor y* can be
zero, and from (1b) we conclude that both x* > 0 and y* > 0.
(iv) Now, since x*, y*, λ* > 0, we have the following equations from (1a)
and (2a):

30 − 2x* − λ* = 0,
64 − 4y* − λ* = 0,
28 − x* − y* = 0,

to be solved simultaneously. This system is written in matrix form Ax = b
as

[−2 0 −1; 0 −4 −1; −1 −1 0] [x*, y*, λ*]^T = [−30, −64, −28]^T.

Using Cramer's rule, we get

|A| = 6, |A1| = 78, |A2| = 90, |A3| = 24.

Hence, the optimal values are x* = 78/6 = 13, y* = 90/6 = 15, λ* =
24/6 = 4. Also, note that with λ* = 4, a 1-unit increase in the
constant of the constraint will increase the profit by approximately 4. □
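
The linear system in step (iv) can also be solved directly, confirming the Cramer's-rule values; a Python sketch (numpy assumed):

import numpy as np

A = np.array([[-2.0,  0.0, -1.0],
              [ 0.0, -4.0, -1.0],
              [-1.0, -1.0,  0.0]])
b = np.array([-30.0, -64.0, -28.0])
x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)    # 13.0, 15.0, 4.0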
Example 4.10. In the foreign trade market, let the income be determined
by

Y = C + I0 + G0 + X0 − Z, C = C0 + bY, Z = Z0 + zY,

where X denotes the exports, Z the imports, the subscript zero indicates an
exogenous fixed variable, b the marginal propensity to consume, and z the
marginal propensity to import.
First, writing these three equilibrium conditions in implicit form,

F1(Y, C, Z; C0, I0, G0, X0, Z0, b, z) = Y − C − I0 − G0 − X0 + Z = 0,
F2(Y, C, Z; C0, I0, G0, X0, Z0, b, z) = C − C0 − bY = 0,
F3(Y, C, Z; C0, I0, G0, X0, Z0, b, z) = Z − Z0 − zY = 0. (4.4.10)
We will discuss three cases:
Case 1. The partial derivatives of the unknown functions Y, C, Z with respect
to X0 satisfy the matrix equation

[∂F1/∂Y ∂F1/∂C ∂F1/∂Z; ∂F2/∂Y ∂F2/∂C ∂F2/∂Z; ∂F3/∂Y ∂F3/∂C ∂F3/∂Z] [∂Y*/∂X0, ∂C*/∂X0, ∂Z*/∂X0]^T
= [−∂F1/∂X0, −∂F2/∂X0, −∂F3/∂X0]^T, (4.4.11)

where Y*, C*, Z* denote the optimal values of the unknown functions. Substituting
the values of the partial derivatives from (4.4.10) we obtain

[1 −1 1; −b 1 0; −z 0 1] [∂Y*/∂X0, ∂C*/∂X0, ∂Z*/∂X0]^T = [1, 0, 0]^T. (4.4.12)
Denoting this system as Ax = b, and using Cramer's rule, we find that

|A| = 1 − b + z > 0, |A1| = det[1 −1 1; 0 1 0; 0 0 1] = 1,
|A2| = det[1 1 1; −b 0 0; −z 0 1] = b, |A3| = det[1 −1 1; −b 1 0; −z 0 0] = z.

Thus,

∂Y*/∂X0 = |A1|/|A| = 1/(1 − b + z) > 0,
∂C*/∂X0 = |A2|/|A| = b/(1 − b + z) > 0,
∂Z*/∂X0 = |A3|/|A| = z/(1 − b + z) > 0.
Case 2. The partial derivatives of Y*, C*, Z* with respect to b are obtained
from (4.4.12) by replacing the unknown vector by [∂Y*/∂b, ∂C*/∂b, ∂Z*/∂b]^T
and the right-hand side by [0, Y*, 0]^T, thus giving

[1 −1 1; −b 1 0; −z 0 1] [∂Y*/∂b, ∂C*/∂b, ∂Z*/∂b]^T = [0, Y*, 0]^T. (4.4.13)

Then using Cramer's rule we get

∂Y*/∂b = Y*/(1 − b + z) > 0, ∂C*/∂b = (1 + z)Y*/(1 − b + z) > 0, ∂Z*/∂b = zY*/(1 − b + z) > 0.
Case 3. Similarly, the partial derivatives of Y*, C*, Z* with respect to z
are obtained from (4.4.12) by replacing the unknown vector by
[∂Y*/∂z, ∂C*/∂z, ∂Z*/∂z]^T and the right-hand side by [0, 0, Y*]^T, thus giving

[1 −1 1; −b 1 0; −z 0 1] [∂Y*/∂z, ∂C*/∂z, ∂Z*/∂z]^T = [0, 0, Y*]^T. (4.4.14)

Then using Cramer's rule we get

∂Y*/∂z = −Y*/(1 − b + z) < 0, ∂C*/∂z = −bY*/(1 − b + z) < 0, ∂Z*/∂z = (1 − b)Y*/(1 − b + z) > 0. □

4.5 Application to Mathematical Economics


The KKT method is often used in theoretical models in order to obtain qual-
itative results.
Example 4.11. (Value function) Consider the optimization problem with
constant inequality constraints:

Maximize f (x) subject to gi (x) ≤ ai , hj (x) = 0. (4.5.1)


106 4 CONCAVE PROGRAMMING

The value function is defined as

V (a1 , . . . , an ) = sup f (x) subject to gi (x) ≤ ai , hj (x) = 0,


x
i = 1, . . . , m, j = 1, . . . , k. (4.5.2)

Note that dom V = {ai ∈ Rm } for some x ∈ Rn , gi (x) ≤ ai , i = 1, . . . , m.


Solution. Each coefficient µi is the rate at which the value function V
increases as ai increases. Thus, if we interpret each ai as a resource constraint,
the coefficients µi determine the size of increase in the optimum value of
the function V that will result from an increase in a resource. This sort of
interpretation is important in economics and is used in utility maximization
problems. 
Example 4.12. Consider, for example, the case of a firm that wants
to maximize its sales revenue subject to a minimum profit constraint. Let Q
denote the quantity of output produced (to be chosen), R(Q) the sales revenue
with a positive first derivative and with zero value at zero output, C(Q) the
production costs with a positive first derivative and with a nonnegative value
at zero output, and Gmin the positive minimal acceptable level of profit. Then
the problem is of the type (4.3.1) provided that the revenue function levels
off so it eventually is less steep than the cost function. This problem can be
formulated as follows:

Maximize R(Q) subject to Gmin ≤ R(Q) − C(Q), Q > 0, (4.5.3)

where the KKT conditions are


dR dC
(1 + µ) −µ ≤ 0, Q ≥ 0
dQ dQ
dR dC
(1 + µ)Q −µ = 0,
dQ dQ (4.5.4)
R(Q) − C(Q) − Gmin ≥ 0,
µ ≥ 0,
 
µ R(Q) − C(Q) − Gmin = 0.

Solution. We cannot take Q = 0, because this choice violates the mini-


mum profit constraint. Hence, Q > 0, and the third KKT condition implies
that the first condition holds with equality. Solving this equality we get

dR µ dC
= . (4.5.5)
dQ 1 + µ dQ

Since, by assumption, both dR/dQ and dC/dQ are strictly positive, the in-
equality in the non-negativity condition on µ implies that µ > 0. Thus, this
4.5 APPLICATION TO MATHEMATICAL ECONOMICS 107

firm operates at a level of output at which the marginal revenue dR/dQ is less
than the marginal cost dC/dQ. This conclusion is interesting because it is in
contradiction with the behavior of a profit maximizing firm which operates at
a level at which they are equal.

4.5.1 Peak Load Pricing. (Supply-and-demand Pricing) Peak and off-peak


pricing requires planning as a maximization problem for firms that invest
in capacity in order to target a primary market. However, there is always
a secondary market in which the firm can often sell its product. Once the
capacity has been purchased to service the firm’s primary market, the capital
is freely available (up to the capacity) to be used in the secondary market.
Typical examples include: schools and universities who build to meet day-time
needs (peak) but may offer night-school classes (off-peak); theaters that offer
shows in the evening (peak) and matinees (off-peak); or trucking companies
who have dedicated routes but may choose to enter ‘back-haul’ markets. Since
the capacity price is a factor in the profit maximizing decision for the peak
market and is already paid, it is normally not a factor in calculating optimal
price and quality for the smaller (off-peak) market. However, if the secondary
market’s demand is close to the same size as the primary market, capacity
constraints may be an issue, especially given that it is common practice to
price discriminate and charge lower prices in off-peak periods. Even though
the secondary market is smaller than the primary, it is possible at the lower
(profit maximizing) price that off-peak demand exceeds capacity. In such
cases, choices must be made taking both markets into account, making the
problem a classic application of KKT conditions.
Consider a profit maximizing company which faces two demand curves

P1 = D1 (Q1 ) in the peak period,


P2 = D2 (Q2 ) in the off-peak period. (4.5.6)

To operate, the firm must pay b per unit of output, whether it is peak (day)
or off-peak (night). Moreover, the firm must purchase capacity at a cost of
c per unit of output. Let K denote the total capacity measured in units of
Q. The firm must pay for capacity, regardless of whether it operates in the
off-peak period or not. Then the question is: Who should be charged for the
capacity costs, peak or off-peak customers? Thus, the firm’s maximization
problem becomes
Maximize P1 Q1 + P2 Q2 − b(Q1 − Q2 ) − cK
subject to K ≥ Q1 , K ≥ Q2 , (4.5.7)

where P1 and P2 are defined in (4.5.6). The Lagrangian of this problem is

L = D1 (Q1 )Q1 + D2 (Q2 )Q2 − b(Q1 + Q2 ) − cK + λ1 (K − Q1 )λ2 (K − Q2 ),


(4.5.8)
108 4 CONCAVE PROGRAMMING

and the KKT conditions are


∂D1
L 1 = D 1 + Q1 − b − λ1 = 0, (M R1 − b − λ1 = 0),
∂Q1
∂D2
L 2 = D 2 + Q1 − b − λ2 = 0, (M R2 − b − λ2 = 0),
∂Q2 (4.5.9)
LK = −c + λ1 + λ2 = 0, (c = λ1 + λ2 ),
Lλ1 = K − Q1 ≥ 0, λ1 ≥ 0,
Lλ2 = K − Q2 ≥ 0, λ2 ≥ 0.

To find a solution, we follow the following steps:


Step 1. Since D2 (Q2 ) < D1 (Q1 ), we choose and try λ2 = 0. Then the
KKT conditions (4.5.9) give

M R1 = b + c − λ2 = b + c, M R2 = b + λ2 = b,

which implies that K + Q1 . Then we check to see if Q∗2 ≤ K. If this inequality


is true, then we have a valid solution. Otherwise, the second constraint is
violated and the assumption λ2 = 0 becomes false. Then we go to the next
step.
Step 2. If Q∗2 > K, then Q∗1 = Q∗2 = K, and M R1 = b+λ1 , M R2 = b+λ2 .
Since c = λ1 + λ2 , then λ1 and λ2 represent the share of c each group must
pay.
We will illustrate this optimization problem by a numerical example, as
follows.
Suppose the demand during the peak period is P1 = 22 − 10−5 Q1 , and
during the off-peak period is P2 = 18 − 10−5 Q2 .
To produce a unit of output per half-day requires a unit of capacity costing
8 cents per day. The cost of a unit capacity is the same whether it is used
during the peak period only or off-peak period also. In addition to the costs
of capacity, it costs 6 cents in operating cost (labor and fuel) to produce 1
unit per half day (both during peak and off-peak periods).
If we assume that the capacity constraint is binding (λ2 −0), then the KKT
conditions (4.5.9) become

λ1 c = 8,
MR
z }| { zMC}| {
−5
22 − 2 × 10 Q1 = b + c = 14,
18 − 2 × 10−5 Q2 = b = 6.

Solving this system we get Q1 = 40, 000, Q2 = 60, 000, which violates the
assumption Q2 > Q1 = K (i.e., the second constraint is non-binding). Hence,
4.6 COMPARATIVE STATICS 109

assuming that both constraints are binding, i.e., Q1 = Q2 = Q, the KKT


conditions become
λ1 + λ2 = 8,
22 − 2 × 10−5 Q = 6 + λ1 , (4.5.10)
−5
18 − 2 × 10 Q = 6 + λ2 ,
solving which we get Q = K = 50, 000, λ1 = 5, λ2 = 2, P1 = 17, P2 = 13.
Since the capacity constraint is binding in both markets, the peak market
pays λ1 = 6 and the off-peak market pays λ2 = 2 of the capacity cost. 

4.6 Comparative Statics


Supply-demand model. Consider the case when there is one endogenous
variable (like price p) and one exogenous variable (like consumers’ income y).
Let the supply QS and the demand QD of a commodity be defined by
QS = a + bp, a, b > 0,
(4.6.1)
QD = m − np + ky,

subject to the equilibrium condition Qs = Qd . We will solve this system for


the equilibrium price level p∗ by writing the above equations as

a + bp = m − np + ky,
(b + a)p = m − a + ky,

m − a + ky
which gives p ≡ p∗ = . To determine the equilibrium level of p∗ ,
b+n
we find the change in p∗ with respect to y, or any of the other five param-
dp∗ k
eters a, b, m, n, k, we have = > 0. This means that an increase in
dy bn
consumers’ income will result in an increase in the equilibrium price of the
commodity.
This analysis can also be carried out by defining Eqs (4.6.1) explicitly as
Qs − QD = 0, i.e., the implicit function F = a + bp − m + np − ky. Then,
assuming Fp 6= 0, we get
dp∗ Fy
=− .
dy Fp
Since by differentiating we get Fp = b + n and Fy = k, the ratio is given by
dp∗ k
= > 0.
dy b+n
Next, consider the case when there is more than one endogenous variable.
Thus, let
F1 (y1 , y2 ; x1 , x2 ) = 0,
(4.6.2)
F2 (y1 , y2 ; x1 , x2 ) = 0.
110 4 CONCAVE PROGRAMMING

To find the partial derivatives of this system with respect to one variable, say
y1 , the total derivative of both functions with respect to x1 is given by

∂F1 ∂y1 ∂F1 ∂y2 ∂F1


+ + = 0,
∂y1 ∂x1 ∂y2 ∂x1 ∂x1
(4.6.3)
∂F2 ∂y1 ∂F2 ∂y2 ∂F2
+ + = 0.
∂y1 ∂x1 ∂y2 ∂x1 ∂x1

These equations can be written in matrix form as


 ∂F
1 ∂F1   ∂y1∗   ∂F1 

 ∂y1 ∂y2   ∂x1  =  ∂x1  ,
 ∂F ∗ 
2 ∂F2   ∂y2  ∂F2 

∂y1 ∂y2 ∂x1 ∂x1

or
JX = B, (4.6.4)

where y1∗ , y2∗ denote the incomes at the equilibrium point. Since

∂F1 ∂F2 ∂F2 ∂F1


|J| = − 6= 0,
∂y1 ∂y2 ∂y1 ∂y2

the optimal values of the endogenous values y1∗ , y2∗ are determined as implicit
functions of exogenous variables x1 :

∂yi |Ji |
= . (4.6.5)
∂xi |J|

∂y1∗
Using Cramer’s rule, the first derivative is obtained by replacing the
∂x1
∂y2∗
first column of J with the column vector B, and the second derivative is
∂x1
obtained by replacing the second column of J with the column vector B :

∂F1 ∂F1

∂x1 ∂y2  ∂F ∂F
∂F2 ∂F2 1 2 ∂F2 ∂F1 
− − −
∂y1∗ |J1 | ∂x1 ∂y2 ∂x1 ∂y2 ∂x1 ∂y2
= = ∂F ∂F1 = ,
∂x1 |J| 1 ∂F1 ∂F2 ∂F2 ∂F1

∂y1 ∂y2 ∂y1 ∂y2 ∂y1 ∂y2
∂F2 ∂F2
∂y1 ∂y2
4.6 COMPARATIVE STATICS 111

and
∂F1 ∂F1
− −
∂y1 ∂x1  ∂F ∂F
∂F2 ∂F2 1 2 ∂F2 ∂F1 
− − − −
∂y2∗ |J2 | ∂y1 ∂x1 ∂y1 ∂x1 ∂y1 ∂x1
= = ∂F1 ∂F1 = .
∂x1 |J| ∂F1 ∂F2 ∂F2 ∂F1

∂y1 ∂y2 ∂y1 ∂y2 ∂y1 ∂y2
∂F2 ∂F2
∂y1 ∂y2

∂y1∗ ∂y2∗
The partial derivatives and are determined by the same method.
∂x2 ∂x2
Example 4.13. Let the equilibrium in the goods and service market (IS
curve) and the money market (LM curve) be given, respectively, by

F1 (Y, i; C0 , M0 , P0 ) = Y − C0 − C(Y, i) = 0, 0 < CY < 1, Ci < 0,


(4.6.6)
F2 (Y, i; C0 , M0 , P0 ) = L(Y, i) − M0 /P = 0, LY > 0, Li < 0,
(4.6.7)

where L(Y, i) denotes the demand for money, M0 the supply of money, C0 the
autonomous consumption, i the interest, and P the price level; thus M0 /P
becomes the supply of real money. For the sake of simplicity, hold P as
constant. Then the equilibrium level of P and i is affected by a change in C0 .
Using the above method and Cramer’s rule, we get from (4.6.6) and (4.6.7):

∂Y ∂Y ∂i
− 1 − CY − Ci = 0,
∂C0 ∂C0 ∂C0
∂Y ∂i
LY + Li = 0,
∂C0 ∂C0

which in matrix form JX = B is


 ∂Y ∗ 
   
1 − CY −Ci  ∂C0  1
 ∂i∗  = .
LY Li 0
∂C0

Then, using Cramer’s rule, we get

∂Y ∗ Li
= > 0,
∂C0 (1 − CY )Li + Ci LY
∂i∗ −Li
= > 0.
∂C0 (1 − CY )Li + Ci LY
112 4 CONCAVE PROGRAMMING

This means that an increase in C0 will produce an increase in the equilibrium


level of interest i. 
Example 4.14. (Wartime Rationing) The civilian population is subjected
to rationing of basic consumer goods during wartime. The rationing process
and control is effected through the use of coupons issued by the government,
which ensures that each consumer is allotted coupons every month. The
consumers on their part redeem a certain number of coupons at the time
of purchasing the food items they need. The consumer, however, pays two
prices at the time of purchase, one for the purchase of rationed goods and the
other for the price of the coupon. This practice requires that the consumer
must have sufficient money and sufficient coupons at the time of purchasing
a rationed item. As an example, we will analyze this rationing process by
considering two rationed goods, say x and y.
Let a consumer’s utility function be u(x, y), and assume that the consumer
has a fixed money budget of B dollars, and the prices of the two goods be Px
and Py , respectively. Also assume that the consumer is allotted coupons C to
be used to purchase both goods x and y at coupon prices of cx and cy . This
leads to the following consumer’s maximization problem:
Maximize u(x, y)
subject to B ≥ Px x + Py y and C ≥ cx x + cy y, x ≥ 0, y ≥ 0.
(4.6.8)
The Lagrangian for this problem is
L(x, y) = u(x, y) + λ1 (B − Px x − Py y) + λ2 (C − cx x − xy y), (4.6.9)
where λ1 , λ2 are the Lagrange multipliers on the budget and coupon con-
straints, respectively. The KKT conditions are
Lx = ux − λ1 Px − λ2 cy = 0,
Ly = uy − λ1 Py − λ2 cy = 0,
(4.6.10)
Lλ1 = B − Px x − Py y ≥ 0, λ1 ≥ 0,
Lλ2 = C − cx x − cy y ≥ 0, λ2 ≥ 0.
Example 4.15. We will solve this optimization problem using the fol-
lowing data: Let the utility function be u(x, y) = xy 2 , B = 50, Px = Py =
1, C = 60, and cx = 2, cy = 1. The Lagrangian (4.6.9) becomes
L = xy 2 + λ1 (50 − x − y) + λ2 (60 − 2x − y), (4.6.11)
and the KKT conditions (4.6.10) become
Lx = y 2 − λ1 − 2λ2 = 0, x ≥ 0,
Ly = 2xy − λ1 − λ2 = 0, y ≥ 0,
(4.6.12)
Lλ1 = 50 − x − y ≥ 0, λ1 ≥ 0,
Lλ2 = 60 − 2x − y ≥ 0, λ2 ≥ 0.
4.7 EXERCISES 113

Using the trial-and-error method we proceed as follows: we choose one of


the constraints to be non-binding and solve for x and y. After the solution,
use these values to test if the constraint that was chosen to be non-binding is
violated. If so, then re-do the process choosing the other constraint to be non-
binding. If we find that this chosen non-binding constraint is again violated,
then we can assume both constraints bind and the solution is determined by
the constraints. This leads to the following steps:
Step 1. (Ignoring the coupon constraint) Assume λ2 = 0, λ1 > 0. Then
the first-order KKT conditions (4.6.12) become

Lx = y 2 − 2λ2 = 0,
Ly = 2xy − λ2 = 0, (4.6.13)
Lλ2 = 60 − 2x − y = 0.

Solving for x and y we get x∗ = 10, y ∗ = 40.


Step 2. (Ignoring the budget constraint) Assume λ1 = 0, λ2 > 0. Then
the first-order KKT conditions (4.6.12) become

Lx = y 2 − λ1 = 0,
Ly = 2xy − λ1 = 0, (4.6.14)
Lλ1 = 50 − x − y = 0.

Solving for x and y we get x∗ = 16.67, y ∗ = 33.33. But these values when
substituted into the coupon constraint (last equation in (4.6.13)) we find that
2x∗ + y ∗ = 2(10) + 40 = 60. Thus, this solution does not violate the budget
constraint. In fact, it just meets this constraint. However, this result is
unusual in the sense that although the budget constraint is met, it is not
binding due to the particular location of the coupon constraint. 

4.7 Exercises
4.1. For the following function, find the critical points and determine if at
these points the function is a relative maximum, relative minimum, inflection
point, or saddle point.
(a) f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 72x.
(b) f (x, y) = 5x2 − 3y 2 − 30x + 7y + 4xy.
(c) f (x, y) = 3x3 − 5y 2 − 225x + 70y + 20.
(d) f (x, y) = x3 + 2y 3 − 3x2 + 9y 2 − 45x − 60y
Ans. (a) Critical point (0, 12); inflection point at (0, 12).
(b) Critical point (2, 5/2); saddle point.
(c) Critical points (5, 7), (−5, 7); relative maximum at (−5, 7), saddle point
at (5, 7).
(d) Critical points (−3, 2), (−3, −5), (5, 2), (5, −5); relative maximum at
114 4 CONCAVE PROGRAMMING

(−3, −5) and a relative minimum at (5, 2); saddle points may be at (−3, 2)
and 5, −5).
4.2. Consider f (x, y) = 52x + 36y − 4xy − 6x2 − 3y 2 + 5. The first-
order partial derivatives, equated to zero, give fx = 52 − 4y − 12x = 0, fy =
36 − 4x − 6y = 0, so the critical point is (3, 4). The second-order partial
derivatives are: fxx = −12, fxy = −4, fyy = −6. Since both fxx < 0, Fyy < 0,
and (fxy )2 = (−4)2 > 0, and fxx ·fyy > (fxy )2 at the point (3, 4), the function
f has a global maximum at (3, 4).
4.3. Consider f (x, y) = 48y − 3x2 − 6xy − 2y 2 + 60x. The first-order
partial derivatives, equated to zero, give fx = −6x − 6y + 60 = 0, fy =
−6x − 4y + 48 = 0, so the critical point is (4, 6), The second-order partial
derivatives are: fxx = −6 < 0, fxy = −6 = fyx < 0, fyy = −4 < 0, and
fxx · fxy = (−6)(−4) = 24 < (fxY )2 = 36. The function f has an inflection
point at (4, 6).
4.4. Use the method of Lagrange multipliers to solve the problem: Given a
budget constraint of $110 when PK = 3 and PL = 4, maximize the generalized
Cobb-Douglas production function q = K 0.4 L0.5 .
Hint. The Lagrangian is Q = K 0.4 L0.5 + λ(162 − 3K − 4L). The critical
values are K ∗ = 24, L∗ = 22.5. Next, |H2 | > 0, |H̄| is negative definite, so Q
is maximized at the critical values.
4.5. Use the method of Lagrange multipliers to solve the problem: Max-
imize f (x, y) = 12 x2 = 12 y 2 − 2xy − y subject to the equality constraint
g(x, y) = x + y − 2. Solution. The Lagrangian for the problem is

1 2 1 2
L(x, y, λ) = x + y − 2xy − y + λ(x + y − 2).
2 2

Using the KKT conditions we get Lx = x−2y +λ = 0, Ly = y −2x−1+λ = 0,


Lλ = x + y − 2, solving which we get x∗ = 12 , y ∗ = 1, so λ∗ = 0, and
f (x∗ , y ∗ ) = −1.375. The Hessian for the problem is

Lxx Lxy 1 −2
|H| = = = −3 < 0.
Lyx Lyy −2 1

Hence f (x, y) has a local maximum at ( 12 , 1).

4.6. Maximize the utility function u(x, y) = x0.3 y 0.4 subject to the budget
constraint g(x, y) = 2x + 8y = 172.
Hint. Let u(x, y, λ) = x0.3 y 0.4 + λ(172 − 2x − 8y). The critical values are
x = 6, y ∗ = 20, λ∗ = 0.14, and the bordered Hessian |H̄| is negative definite

and the utility is maximized at the critical values.


4.7 EXERCISES 115

4.7. If the equilibrium in the goods and service market (IS curve) and the
money market (LM curve) are defined as in Example 4.13, what effect will y ∗
and i∗ have on a change in M0 .
Ans. Take p as a constant. Then

∂y  ∂y   ∂i 
− Cy − Ci = 0,
∂M0 ∂M0 ∂M0
 ∂y  ∂i  1
Ly Li − = 0.
∂M0 ∂M0 P

or
∂y ∗
 
   
1 − Cy −Ci  ∂M0  0
 ∂i∗  = .
Ly Li 1/p
∂M0
Then using Cramer’s rule we get

∂y ∗ Ci
= > 0,
∂M0 P (1 − Cy )Li + Ci Ly
∂i∗ 1 − Ci
= < 0.
∂M0 P (1 − Cy )Li + Ci Ly

This means that an increase in the money supply M0 will increase the equi-
librium level of income, but decrease the equilibrium interest rate.

4.8. Use Lagrange multipliers to optimize f (x, y, z) = xyz 2 subject to the


constraint x + y + z = 20. The Lagrangian is

F (x, y, z) = xyz 2 + λ(20 − x − y − z).

Then Fx = yz 2 − λ = 0, Fy = xz 2 − λ = 0, Fz = 2xyz − λ = 0, Fλ =
20 − x − y − z = 0. To solve these equations simultaneously, we equate λ from
the first two equations, and from the first and the third equation, giving:

yz 2 = xz 2 , yz 2 = 2xyz,

or y = x and z = 2x. Substituting these in the fourth equation we get:


20 − x − x − 2x = 0, or x∗ = 5, which gives y ∗ = 5, z ∗ = 10, λ∗ = 500 as
critical values. Thus, F (5, 5, 10) = 2500.
The second-order derivatives are: Fxx = 0, Fyy = 0, Fzz = 2xy, Fxy =
z 2 , Fyz = 2xz, Fxz = 2yz. Also, from g(x, y, z) = x + y + z − 20, we get
116 4 CONCAVE PROGRAMMING

gx = 1 = gy = gz . Then the bordered Hessian from Eq (A.24), using the


second form, is

0 1 1 1
 1 0 z2 2yz
|H| =  .
1 z2 0 2xz
1 2yz 2xz 2xy
The second principal minor is |H̄2 | = 0 − 1(−z 2) + 1(z 2 ) = 2z 2 . Thus,
|H̄2 |10 > 0. The third principal minor is

1 z 2 2yz 1 0 2yz 1 0 z2
|H̄3 | = |H| = 0 − 1 1 0 2xz + 1 1 z 2 2xz − 1 1 z2 0
1 2xz 2xy 1 2yz 2xy 1 2yz 2xz
 2

= − 1(0 − 2xz · 2xz) − z (2xy − 2xz) + 2yz(2yz − 0)
 
+ 1(z 2 · 2xy − 2yz · 2xz) − 0 + 2yz(2yz − z 2 )
 
− 1(z 2 · 2xz − 0) − 0 + z 2 (2yz − z 2 )
= z 4 − 4xz 3 − 4yz 3 − 4xyz 2 + 4x2 z 2 + 4y 2 z 2 .

Thus, |H̄3 |5,5,10 = −20000 < 0. Hence, |H̄2 | > 0 and |H̄3 | < 0 imply that |H|
is negative definite, and the function f is maximized at the critical values.
4.9. Maximize the total utility defined by u(x, y) = 10x2 + 15xy − 3y 2
when the firm meets the quota g(x, y) equal to 2x + y = 23.
Ans. Critical values: x∗ = 9, y ∗ = 5, λ∗ = 105; |H| is negative definite
and u is maximized at (x∗ , y ∗ ) = (9, 5).
4.10. Maximize the utility function u = Q1 Q2 when P1 = 1, P2 = 3, and
the firm’s budget is B = 60. Also estimate the effect of a 1-unit increase in
the budget. The budget constraint is Q1 + Q2 = 60, and the constraint is
Q1 + 3Q2 = 60. We consider the Lagrangian

L = Q1 Q2 + λ(60 − Q1 − 3Q2 ).

The first-order partial derivatives equated to zero give: LQ1 = Q2 − λ =


0, LQ2 = Q1 − 3λ = 0, Lλ = 60 − Q1 − 4Q2 = 0. Solving these equations
simultaneously we obtain the critical values: Q∗1 = 30, Q∗2 = 10 = λ∗ . The
second-order partial derivatives are: LQ1 Q1 = 0 = LQ2 Q2 , LQ1 Q2 = 1 =
LQ2 Q1 , giving the Hessian

0 1
|H| = = −1 < 0.
1 0

Hence, L is maximized at the critical values.


4.7 EXERCISES 117

With λ∗ = 10, a $1 increase in the budget will change the constant of the
constraint to 61, so that the new Lagrangian is

L = Q1 Q2 + λ(61 − Q1 − 4Q2 ),

which yields: LQ1 = Q2 − λ = 0, LQ2 = Q1 − 3λ = 0, Lλ = 121 − Q1 −


3Q2 = 0. Solving these equations simultaneously we obtain the critical values:
Q∗1 = 30.5, Q∗2 = 10.167 = λ∗ . Thus, the utility function increases from
u = (30)(10) = 300 to u = (30.5)(10.167) = 310.083, i.e., there is an increase
in the utility function of about 10.
4.11. Consider the model of Example 4.13 and assume that

Y − C0 − C(Y, i) = 0, 0 < CY < 1, Ci < 0,


L(Y, i) = M0 /P 0, LY > 0, Li < 0,

where P is constant. As in Example 4.13, we will use the comparative statics


method to determine the effect on the equilibrium levels of Y and i by a
change in the money supply M0 . The first-order partial derivatives of Y and
i with respect to M0 are

∂Y ∂Y ∂i
− CY − Ci = 0,
∂M0 ∂M0 ∂M0
∂Y ∂i 1
LY + Li − = 0,
∂M0 ∂M0 P

which in the matrix form JX = B is


 ∂Y 
   
1 − CY −Ci  ∂M0  0
 ∂i  = ,
LY Li 1/P
∂M0

where J is the same as in Example 4.13. Using Cramer’s rule, as in Example


4.13, we find that

∂F1 ∂F2 ∂F2 ∂F1



∂y1 |J1 | ∂x ∂x2 ∂x2 ∂x2
= =− 2 ,
∂x2 |J| ∂F1 ∂F2 ∂F2 ∂F1

∂y1 ∂y2 ∂y1 ∂y2
∂F1 ∂F2 ∂F2 ∂F1

∂y1 |J2 | ∂y ∂x2 ∂y1 ∂x2
= =− 1 .
∂x2 |J| ∂F1 ∂F2 ∂F2 ∂F1

∂y1 ∂y2 ∂y1 ∂y2
118 4 CONCAVE PROGRAMMING

4.12. We will discuss a combination of foreign trade and our national


market by combining the (i) goods market, (ii) foreign trade market, and (ii)
money market. Let these three markets be defined by

(i) I = I(i), Ii < 0,


S = S(Y, i), 0 < SY < 1, Si > 0,
(ii) Z = Z(Y, i), 0 < ZY < 1, Zi < 0,
(4.8.4)
X = X0 ,
(iii) MD = L(Y, i), LY > 0, Li < 0,
MS = M0 ,

where Z denotes the imports, S the savings, X0 the autonomous exports,


MD the demand for money, MS the money supply, and the other symbols are
defined in the above example.
The combined goods market remains in equilibrium (i.e., stable) when the
injections are equal to the leakages, i.e., when

I(i) + X0 = S(Y, i) + Z(Y, i),

and the money market is in equilibrium when the demand for money is equal
tp the money supply, i.e., when

L(Y, i) = M0 .

Combining these two market situations, the Lagrangian is defined by

F1 (Y, i; M0 , X0 ) = I(i) + X0 − S(Y, i) − Z(Y, i),


(4.8.5)
F2 (Y, i; M0 , X0 ) = L(Y, i) − M0 .

We will consider the following two cases:


Case 1. The partial derivatives of the functions Y ∗ , i∗ with respect to M0 ,
expressed in the matrix form Ax = b, are given by
   ∂Y ∗   ∂F 
∂F1 ∂F1 1

 ∂Y ∂i   ∂M0   ∂M0 
 ∂F
2 ∂F2   ∂i∗  =  ∂F2  ,

∂Y ∂i ∂M0 ∂M0
or  ∂Y ∗ 
   
−SY − ZY Ii − Si − Zi  ∂M0  0
 ∂i∗  = .
LY Li 1
∂M0
4.7 EXERCISES 119

Using Cramer’s rule, as in Example 4.10, we obtain


∂Y ∗ Ii − Si − Zi
= > 0,
∂M0 Li (SY − Zy ) + LY (Ii − Si − Zi )
∂i∗ SY + Z Y
= < 0.
∂M0 Li (SY − Zy ) + LY (Ii − Si − Zi )
Case 2. The partial derivatives of the functions Y ∗ , i∗ with respect to X0 ,
expressed in the matrix form Ax = b, are given by
   ∂Y ∗   ∂F 
∂F1 ∂F1 1

 ∂Y ∂i   ∂X 0   ∂X 0 
 ∂F
2 ∂F2   ∂i∗  =  ∂F2  ,

∂Y ∂i ∂X0 ∂X0
or  ∗ 
  ∂Y  
−SY − ZY Ii − Si − Zi  ∂X0  −1
 ∂i∗  = .
LY Li 0
∂X0
Using Cramer’s rule, as in Example 4.10, we obtain
∂Y ∗ Li
= > 0,
∂X0 Li (SY − Zy ) + LY (Ii − Si − Zi )
∂i∗ LY
= < 0. 
∂X0 Li (SY − Zy ) + LY (Ii − Si − Zi )

4.13. A constant elasticity of substitution (CES) function is normally


 −1/β
defined as q(K, L) = A αK −β +(1−α)L−β , where A > 0 is the efficiency
parameter, α, 0 < α < 1 the distribution parameter denoting relative factor
shares, and β > −1 is the substitution parameter that determines the value
of the elasticity of substitution. A CES production function is defined by
 −1/0.5
q(K, L) = 60 0.4K −0.5 + (1 − 0.4)L−0.5
 
= 60 0.4K −0.5 + 0.6L−0.5 −2,
subject to the equality constraint 2K + 5L = 80. The Lagrangian is
 
Q(K, L, λ) = 60 0.4K −0.5 + 0.6L−0.5 −2 + λ(80 − 2K − 5L).
The first-order partial derivatives of Q equated to zero are
 −3
QK = −120 0.4K −0.5 + 0.6L−0.5 (−0.2K −1.5 ) − 2λ
 −3
= 24K −1.5 0.4K −0.5 + 0.6L−0.5 − 2λ = 0;
−0.5 −3
 −0.5

QL = −120 0.4K + 0.6L (−0.3L−1.5 ) − 5λ
 −3
= 36L−1.5 0.4K −0.5 + 0.6L−0.5 − 5λ;
Qλ = 80 − 2K − 5L = 0.
120 4 CONCAVE PROGRAMMING

The first two equations yield


 −3
24K −1.5 0.4K −0.5 + 0.6L−0.5 2λ
 −3 = ,
36L −1.5 0.4K −0.5 + 0.6L −0.5 5λ

2 K −1.5 2
or −1.5
= , which simplifies to L1.5 = 0.4(1.5)K 1.5, or L ≈ 0.7K.
3L 5
Then using the last equation 80 − 2K − 5L = 0, we get the critical values as
K ∗ ≈ 14.45 and L∗ ≈ 10.18.
Next, the second-order partial derivatives of Q are:
 −3  −4
QKK = −36K −2.5 0.4K −0.5 + 0.6L−0.5 −14.4K −3 0.4K −0.5 + 0.6L−0.5 ,
 −3   −4
QLL = −54L−2.5 0.4K −0.5 + 0.6L−0.5 + 32.4L−3 0.4K −0.5 + 0.6L−0.5 ,
  −4
QKL = 21.6K −1.5L−1.5 0.4K −0.5 + 0.6L−0.5 = QLK .

Using the values of K ∗ for K and L∗ for L, and carrying out some compu-
tations, we find that QKK ≈ −1.09, QLL ≈ −2.24, and QKL ≈ 1.5. Thus,
since QKK < 0, QLL < 0, and QKK QLL < (QKL )2 , we conclude that q is
maximized at the point (K ∗ , L∗ ). Alternatively, from the Hessian is |H| =
−1.09 1.5
, we find that |H1 | = −1.09 < 0, and |H2 | = |H| = −3.85 < 0,
1.5 −2.24
so the Hessian is negative definite (ND), and q is maximized at (K ∗ , L∗ ). 
5
Convex Programming

As we have seen, optimization problems deal with finding the maximum or


minimum of a function, called the objective function, subject to certain pre-
scribed constraints. As opposed to concave programming, we minimize a given
objective function with or without constraints in convex programming. Thus,
given functions f, g1 , . . . , gm and h1 , . . . , hk defined on some domain D ⊂ Rn ,
the minimization problem is stated as follows: Determine min f (x) subject
x∈D
to the constraints gi (x) ≤ 0 for all i = 1, . . . , m and all hj (x) = 0 for all
j = 1, . . . , k.

5.1 Minimization Problems


Some minimization problems without any constraints have already been pre-
sented in previous chapters. However, we will again discuss the unconstrained
case, and then consider the cases of necessary and sufficient conditions for (lo-
cal) optimality with only equality constraints, only inequality constraints, and
combined equality and inequality constraints.

5.1.1 Unconstrained Minimization. Assume that the function f : D 7→


Rn is a continuously differentiable function. Then for unconstrained min-
imization, the necessary and sufficient conditions for a local minimum x∗
of f (x) are: (i) the first-order partial derivatives with respect to each xi ,
i = 1, 2, . . . , n, must be zero, i.e.,

∂f
(x) = 0, (5.1.1)
∂xi

where x∗ is obtained by solving equations (5.1.1) simultaneously; and (ii) the


Hessian |H| of f at x∗ is positive semidefinite (PSD), i.e.,

|H|(f ) ≥ 0 for all x∗ , (5.1.2)

where the Hessian is defined in §1.6.2, and definite and semidefinite matrices
are discussed in §1.5.
122 5 CONVEX PROGRAMMING

Example 5.1. Given the function f (x, y) = 3x2 − xy + 2y 2 − 4x − 7y + 10,


we have fx = 6x − y − 4 = 0, fy = −x + 4y − 7 = 0, solving which we
get the critical point (x∗ , y ∗ ) = (1, 2). Next, we take the second-order direct
partial derivatives and evaluate them at the above critical point. Thus, fxx =
6, fyy = 4, fxy = −1 = fyx , and we get

fxx (1, 2) = 6 > 0, fyy (1, 2) = 4 > 0.

Also, it is easy to check that fxx ·fyy > (fxy )2 . Hence, f has a global minimum
at (1,2). 

5.1.2 Equality Constraints. The method of Lagrange multipliers is used in


such cases. Thus, given the problem to minimize a function f (x, y) subject to
a constraint g(x, y) = k (constant), the Lagrangian can be defined by either
of the following two equivalent forms:

L(x, y, λ) = f (x, y) + λ(g(x, y) − k), (5.1.3)

or
L(x, y, λ) = f (x, y) − λ(k − g(x, y)), (5.1.4)
where λ > 0 is known as the Lagrange multiplier, f (x, y) as the original func-
tion or the objective function, and g(x, y) as the constraint. Since the con-
straint is always set equal to zero, the product λ(g(x, y) − k), or λ(k − g(x, y))
is zero, and therefore, the addition of this term does not change the value
of the objective function f (x, y). The critical values at which the objective
function is optimized are denoted by x∗ , y ∗ , λ∗ , and are determined by taking
the first-order partial derivatives with respect to x, y, and λ, equating them
to zero, and solving them simultaneously:

Lx (x, y, λ) = 0, Ly (x, y, λ) = 0, Lλ (x, y, λ) = 0. (5.1.5)

The second-order conditions, which are obviously different from those of the
unconstrained optimization, are similar to those discussed in the previous
chapter.
The first-order conditions (5.1.5) are similar to the KKT conditions, which
are the necessary conditions, discussed in detail in §4.3; they are applicable in
equality and inequality constraints to the Lagrangian L(x, y, λ) = f (x, y) +
λ(g(x, y) − k).

Example 5.2. Minimize 0.5x2 + y 2 − xy − y such that x + y = 5. The


Lagrangian is

L(x, y, λ) = 0.5x2 + y 2 − xy − y + λ(x + y − 5),


5.1 MINIMIZATION PROBLEMS 123

so we have
∂L ∂L ∂L
= 2y − x − 5 + λ = 0, = x − y + λ = 0, and = x + y − 5 = 0.
∂y ∂x ∂λ
Solving these equations simultaneously, we get the optimal values: x∗ =
1.5, y ∗ = 3.5, and λ∗ = 2. Then f (x∗ , y ∗ .λ∗ ) = 4.625. Notice that the factor
g(x, y) − k with these optimal values is zero, as expected. To check the suf-
1 −1
ficient conditions: the Hessian |H| = = 2 > 0, and the first-order
−1 1
principal |H1 | = 1 > 0, while the second-order principal |H2 | = |H| > 0.
Thus, the Hessian positive definite, and the conditions for a minimum are
satisfied. 
5.1.3 Equality Constraints: General Case. Consider the problem:
Minimize f (x) such that gj (x) = 0, j = 1, 2, . . . , m, x ∈ Rn .
Case 1 m = 1 (single constraint). This case corresponds to the case in
§5.1.2, and the method of Lagrange multiplier is used. In this case since the
(equality) constraint is g(x) = 0, the point x lies on the graph of the nonlinear
equation g(x) = 0 (see Figure 5.1). This necessary condition reduces to
∂f ∗
(x ) + λg(x∗ ) = 0, (5.1.6)
∂x
and the point x∗ where the minimum occurs is called a minimizer for the
problem. Notice that condition (5.1.6) can also be expressed as ∇f (x∗ ) +
λg(x∗ ) = 0.

Figure 5.1 Equality constraint in R.

Case 2 (general case). In Rn the necessary condition (5.1.6) holds for each
constraint gj (x) = 0. The Lagrangian for this problem is
m
X
L(f, λ ) = f (x) + λj gj (x). (5.1.7)
j=1
124 5 CONVEX PROGRAMMING

The KKT (necessary) conditions are: If the point x∗ ∈ Rn is a minimizer for


the problem, then for some λ ∗ ∈ Rn :
m
∂L ∗ X ∂hj ∗
(x ) + λj (x ) = 0, and gj (x∗ ) = 0 for all i, j. (5.1.8)
∂xi j=1
∂xj

In addition to conditions (5.1.8), suppose that z 6= 0 for an arbitrary point


z ∈ Rn . Then we have
∂gj ∗
zT (x ) = 0, j = 1, . . . , m =⇒ zT |H| z > 0, (5.1.9)
∂xj

zT ∇gj (x∗ ) = 0, j = 1, . . . , m =⇒ zT ∇2x L(x∗ , λ∗ )z > 0, (5.1.9)


where |H| is the Hessian for the Lagrangian L(x∗ , λ∗ ), zT = [ z1 , z2 , . . . , zm ],
and f has a strict local minimum at x∗ such that gj (x∗ ) = 0 for j = 1, . . . , m.
Note that the sufficient condition (5.1.9) is useful when the Hessian |H| = 0.
!!! Note that condition (5.1.8) can also be expressed as
m
X
∇f (x∗ ) + λj ∇hj (x∗ ) = 0, and gj (x∗ ) = 0 for all i, j.
j=1

Example 5.3. Minimize f (x, y) = x2 +y 2 −xy +4y subject to the equality


constraint x + y = 2, i.e., g(x, y) = x + y − 2. The Lagrangian is

L(x, y, λ) = x2 + y 2 − xy + 4y + λ(x + y − 2).

The KKT conditions give

∂L ∂L ∂L
= 2x − y + λ = 0, = 2y − x + 4 + λ = 0, = x + y − 2 = 0,
∂x ∂y ∂λ

solving which we get x∗ = 2, y ∗ = 0, λ∗ = 4, giving f (x∗ , y ∗ ) = 4. Thus, the


point (x∗ , y ∗ ) satisfies the necessary conditions to be a minimum.
The Hessian for the problem is

Lxx Lxy 2 −1
|H| = = = 3 > 0.
Lyx Lyy −1 2

Hence f (x, y) has a local minimum at (2, 0). 


Example 5.4. Minimize f (x, y) = 12 x2 + 12 y 2 − xy − 4y subject to the
equality constraint g(x, y) = x + y − 3. The Lagrangian for the problem is

1 2 1 2
L(x, y, λ) = x + y − xy + 4y + λ(x + y − 2).
2 2
5.1 MINIMIZATION PROBLEMS 125

Using the KKT conditions we get Lx = x − y + λ = 0, Ly = y − x − 4 + λ = 0,


Lλ = x + y − 3, solving which we get x∗ = 25 , y ∗ = 21 , so λ∗ = 2, and
f (x∗ , y ∗ ) = 5. The Hessian for the problem is

Lxx Lxy 1 −1
|H| = = = 0.
Lyx Lyy −1 1
 
∂g 1
Since the Hessian test fails, we use condition (5.1.9). Since (x, y) = ,
∂xi  1
∂gj z
we have zT (x) = z1 + z2 = 0, i.e., z2 = −z1 . Next, consider z = 1 6=
  ∂xi z2
0
. Then
0
  
T 1 −1 z1
z |H| z = [ z1 z2 ] = (z1 − z2 )2 = (2z1 )2 > 0.
−1 1 z2

Hence, f (x, y) has a strict local minimum at ( 25 , 21 ). 

Figure 5.2 For Example 5.5.

Example 5.5. (Only one constraint) Minimize f = x2 + y 2 + z 2 + w2


subject to x + y + z + w = 1. The Lagrangian is F = x2 + y 2 + z 2 +
w2 + λ(1 − x − y − z − w). From Fx = 0, where x = {x, y, z, w}, we get
2x − λ = 0, 2y − λ = 0, 2z − λ = 0, and 2w − λ = 0. Thus, x = y = z =
w = λ/2, and so x + y + z + w = 2λ, and using the constraint we get λ = 21 .
Hence, x = y = z = w = 41 , and f = 14 . The function f is represented,
for convenience, in two dimensions (x, w) in Figure 5.2, where function f is
126 5 CONVEX PROGRAMMING

a circle, and the constraint is the straight line x + 1 + w = 1. The four


dimension case can be easily visualized from Figure 5.2, where the circles can
be replaced by spheres, and we are seeking the smallest sphere that intersects
with the equality constraint which will be a three-dimensional plane in a four
dimensional space. 

5.1.4 Inequality Constraints. Consider the problem: Minimize f (x) such


that gj (x) ≤ 0, x ∈ Rn , and j = 1, . . . , n. The inequality constraint is
replaced by g(x) + S 2 = 0, where S is called the slackness condition. Then
the necessary KKT conditions (5.1.5) is applied.
Example 5.6. Minimize f (x, y) = x2 + 12 y 2 − 12x − 4y − 60 subject to the
constraint 30x + 2y ≤ 120. Solution. The Lagrangian with the slackness
variable S is

1
L(x, y, λ) = x2 + y 2 − 12x − 4y − 60 + λ(30x + 20y + S 2 − 120).
2

Then

∂L
= 2x − 12 + 30λ = 0,
∂x
∂L
= y − 4 + 20λ = 0,
∂y
∂L
= 30x + 20y + S 2 − 120.
∂λ

This is a nonlinear system of equations, which defines only the necessary KKT
conditions. We now consider two cases:
Case 1. If λ = 0, then solving the first two equations, we get x = 6 and
y = 4. Then the third equation gives S 2 = −140, which is infeasible.
60
Case 2. If S = 0, then the first two equations give x = 17 = 3.529 and
12 28
y = 17 = 0.706. Then from the third equation we get λ = 17 = 1.647. Thus,
the only feasible solution is the point( 60 28 ∗ ∗
17 , 17 ), yielding f (x , y ) = −92.47.
2 0
Next, to check the sufficient conditions, we have the Hessian |H| = =
0 1
8 > 0, and the first-order and second-order principals are |H1 | = 2 > 0, and
|H2 | = |H| > 0. Thus, the Hessian positive definite, and the conditions for a
minimum are satisfied. 
Example 5.7. Minimize 0.5x2 + 0.5y 2 − 8x − 2y − 80 subject to the
constraint g(x, y) = 20x + 10y ≤ 130. Using the slackness variable S ≥ 0, the
constraint becomes g(x, y) = 20x + 10y + S − 130 = 0. The Laplacian is

L(x, y, S, λ) = 0.5x2 + 0.5y 2 − 8x − 2y − 80 + λ(20x + 10y + S − 130).


5.1 MINIMIZATION PROBLEMS 127

Then the KKT conditions give

∂L ∂L
= x − 8 + λ = 0, = y − 2 + 10λ,
∂x ∂y
∂L ∂L
= 20x + 10y + S − 130, = λ.
∂λ ∂S

We have a system of nonlinear equations. To solve it, we consider the following


two cases:
Case 1. Let λ = 0. Then we get x = 8 and y = 2, which give S = −50
(infeasible).
Case 2. Let S = 0. Then x − 8 + 20λ = 0, y − 2 + 10λ = 0. Solving these
two equations by eliminating λ we get x − 2y = 4, which when used with the
third equation gives x = 6, y = 1, and λ = 0.05. Hence, the only solution is
the point x = (6, 1), and if the given problem has a solution, then this must
be the required solution. 

5.1.5 General Linear Case. Consider the problem: Minimize f (x) subject
to the constraint g(x) = gj (x) ≤ 0, j = 1, 2, . . . , m, x ∈ Rn . The geometric
interpretation of this problem is as follows: As in the case of the equality
constraint, in this case we also have the necessary conditions defined by Eq
(5.1.6) for some value of λ which in the equality case is simply a positive or
negative real number. However, in the case of inequality constraints the sign
of λ is known in advance depending on the direction of −∇f , as shown in
Figure 5.3, where ∇g represents the direction of increase of g(x).
Thus, for minimization, we have:

g(x) ≤ 0 =⇒ λ ≥ 0, and g(x) ≥ 0 =⇒ λ ≤ 0.

Then the Lagrangian is L(x, λ) = f (x) + λ g(x), and the KKT conditions are

∂L ∗ ∗ ∂f ∗ ∂g ∗
(1) (x , λ ) = (x ) + λ∗ (x ) = 0, for all i = 1, . . . , n,
∂xi ∂xi ∂xi

(2) g(x ) ≤ 0, (5.1.10)

(3) λ ∗ g(x∗ ) = 0, λ ∗ ≥ 0.

Note that condition (2) yields the given inequality constraint, and condition
(3) is complementary slackness, i.e.,

g(x∗ ) < 0 =⇒ λ ∗ = 0; λ ∗ > 0 =⇒ g(x∗ ) = 0.


128 5 CONVEX PROGRAMMING

Figure 5.3 Inequality constraint in R and sign of λ.

In the general case of m inequality constraints, if x∗ is a minimizer, then


m
P
for all λ∗j ≥ 0 for j = 1, . . . , m, such that for L(x, λ ) = f (x) + λ∗j gj (x),
j=1
the KKT necessary conditions are
m
∂L ∗ ∗ ∂f ∗ X ∂gj ∗
(1) (x , λ ) = (x ) + λ∗j (x ) = 0 (n equations),
∂x ∂x j=1
∂x
(5.1.11)
(2) gj (x∗ ) ≤ 0 for j = 1, 2, . . . , m,
(3) {g(x∗ )} λ ∗j = 0, j = 1, 2, . . . , m.

Note that if the inequality constraint are of the form g(x) ≥ 0, then we must
have λ∗j ≤ 0; or, alternatively, use −g(x) and retain λ ∗ ≥ 0.
!!! Note that condition (1) in (5.1.10) and (5.1.11) can also be expressed
respectively as ∇x L(x∗ , λ ∗ ) = ∇f (x∗ ) + λ ∗ ∇g(x∗ ) = 0, and ∇Lx (x∗ , λ ∗ ) =
m
P
∇f (x∗ ) + λ∗j ∇gj (x) = 0 (n equations).
j=1

5.2 Nonlinear Programming


Consider the nonlinear program: Minimize f (x) subject to the nonlinearity
constraints

gj (x) ≤ 0, j = 1, 2, . . . , m,
gj (x) ≥ 0, j = m + 1, . . . p,
hj (x) = 0, j = p + 1, . . . , q, x ∈ Rn ,

where f is a convex function on Rn ; gj , j = 1, . . . , m, are convex functions on


Rn ; gj , j = m + 1, . . . , p, are concave functions on Rn ; and hj , j = 1, . . . , q,
5.2 NONLINEAR PROGRAMMING 129

n
P
are linear functions of the form hj (x) = ajk xk − bj . Note that the domain
k=1
of each one of these functions is nonempty. Thus, a convex minimization
program has a convex objective, and the set of feasible solutions is a convex
set.
The Lagrangian is

m
X p
X q
X
L(x, λ ) = f (x) + λj gj (x) − λj gj (x) + µj gj (x). (5.2.1)
j=1 j=m+1 j=p+1

If x∗ minimizes f (x) while the above conditions (5.1.11) are satisfied, then
provided certain regularity conditions, or constraint qualifications to be dis-
cussed in the sequel, are met, there exist vectors λ ∗ and µ ∗ such that the
following KKT necessary conditions are satisfied:

m p q
∂f ∗ X ∂gj ∗ X ∂gj ∗ X ∂gj ∗
(1) (x ) + λj (x ) − λj (x ) + µj (x ),
∂x j=1
∂x j=m+1
∂x j=p+1
∂x
(2) all constraints given in (5.2.1) are satisfied,
(3) λ∗j ≥ 0, j = 1, 2, . . . , p, (5.2.2)
(4) λ∗j ≥ 0, j = 1, 2, . . . , p,
(5) µ∗j are unrestricted in sign for j = p + 1, . . . , q.

!!! Condition (1) in (5.2.2) can also be expressed as


m
P Pp q
P
∇f (x∗ ) + λj ∇gj (x∗ ) − λj ∇gj (x∗ ) + µj ∇gj (x∗ ).
j=1 j=m+1 j=p+1

Example 5.8. (One equality constraint and one inequality constraint)


Minimize f = x2 + y 2 + z 2 + w2 , subject to x+ y + z + w = 1 and w ≤ C, where
C is a scalar which will also be determined. There are two possible scenarios
for this problem, which are presented in Figure 5.4, depending on the value
of C, whether small or large; the figures represent the two-dimensional case.
Note that the shaded region is outside the feasible region because w > C
there. The Lagrangian is defined as

F = x2 + y 2 + z 2 + w2 + λ(1 − x − y − z − w) + µ(w − C),

where λ and µ are the Lagrange multipliers. Then the KKT conditions are:
∂F
(a) = 0; (b) x+y +z +w = 1; (c) w ≤ C; (d) µ ≥ 0; and (e) µ(w −C) = 0.
∂x
Condition (a) yields 2x − λ = 0, 2y − λ = 0, 2z − λ = 0, 2w − λ = 0, which
gives x = y = z = λ2 , w = λ−µ
2 .
130 5 CONVEX PROGRAMMING

Using condition (b) we get x + y + z + w = 4 · 21 λ − 12 µ = 1, which gives


4λ − µ = 2, or λ = 21 (2 + µ). Hence,
2+µ 1 µ 2+µ µ 1 3µ
x=y=z= = + ,w= − = − . (5.2.3)
8 4 8 2 2 4 8
1 3µ
Next, from condition (c), we get 4 − 8 ≤ C, or
3µ 1
≥ − C. (5.2.4)
8 4

Figure 5.4 Example 5.8: (a) C small; (b) C large.

Case 1. If C > 41 , we have the interior case (see Figure 5.4). Since
1
4 − C ≤ 0, we find from (5.2.4) that condition (d) is satisfied. Thus, from
(5.2.3) we get
1 1
x=y=z≥ ; w = 1 − (x + y + z) ≤ .
4 4
But by condition (e), we have µ = 0, and hence, x = y + z = w = 41 . This is
the optimal case, even if we require w < C and C > 41 .
Case 2. If C = 14 , this is similar to Case 1, and the unconstrained optimum
lies on the boundary.
Case 3. If C < 14 , and if w < C, then condition (e) would require that
µ = 0. But then (5.2.3) would give x = 14 , which would violate condition (c).
Hence, w = C and x = y = z = 13 (1 − C). Further,
 
1 1 1 
f = 3 (1 − C)2 + c2 = (1 − C)2 + C 2 = 1 − 2C + 4C 2 ,
9 3 3
5.2 NONLINEAR PROGRAMMING 131

1 1 1
and thus, f ≥ 4; and f = 4 when C = 4. The graph of f is presented in
Figure 5.5. 

Figure 5.5 Graph of f .

5.2.1 Two Inequality Constraints and One Equality Constraint: Min-


imize f (x) subject to the constraints

gj (x) ≤ 0, j = 1, 2, . . . , m,
gj (x) ≥ 0, j = m + 1, . . . p,
gj (x) = 0, j = p + 1, . . . , q, x ∈ Rn .

The Lagrangian is
m
X p
X q
X
L(x, λ ) = f (x) + λj gj (x) − λj gj (x) + µj gj (x). (5.2.5)
j=1 j=m+1 j=p+1

If x∗ minimizes f (x) while the above conditions (5.1.11) are satisfied, then
provided certain regularity conditions, or constraint qualifications to be dis-
cussed in the sequel, are met, there exist vectors λ ∗ and µ ∗ such that the
following KKT necessary conditions are satisfied:
m p q
∂f ∗ X ∂gj ∗ X ∂gj ∗ X ∂gj
(1) (x ) + λj (x ) − λj (x ) + µj ((x∗ ),
∂x j=1
∂x j=m+1
∂x j=p+1
∂x
(2) all constraints given in (5.2.1) are satisfied,
(3) λ∗j ≥ 0, j = 1, 2, . . . , p, (5.2.6)
(4) λ∗j ≥ 0, j = 1, 2, . . . , p,
(5) µ∗j are unrestricted in sign for j = p + 1, . . . , q.

!!! Condition (1) in (5.2.6) can also be expressed as


m
P Pp q
P
∇f (x∗ ) + λj ∇gj (x∗ ) − λj ∇gj (x∗ ) + µj ∇gj (x∗ ).
j=1 j=m+1 j=p+1
132 5 CONVEX PROGRAMMING

5.2.2 Two Inequality Constraints. The general case is: Minimize f (x)
subject to gj (x) ≤ 0 for j = 1, 2, . . . , m, and qi (x) = xi ≥ 0 for i = 1, 2, . . . , n.
Let the Lagrange multipliers be µ1 , µ2 , . . . , µn associated with each of the non-
negativity constraints. Then, using the slackness in the KKT conditions we
will have µ∗i x∗i = 0 for i = 1, 2, . . . , n, and condition (1) in (5.2.6) becomes
m n
∂f ∗ X ∂gj ∗ X ∂qi j ∗
(1a) (x ) + λj (x ) − µ∗i (x ) = 0,
∂x j=1
∂x i=1
∂x

which implies that


       
1 0 0 0
0 1 0 0
m        
∂f ∗ X ∂gj ∗ 0
∗ 
0
∗  ∗
0 0
∗  
(x ) + λj (x ) − µ1  ..  − µ2  ..  − · · · − µn−1  ..  − µn  ..  = 0,
 
∂x j=1
∂x . . . .
0 0 1 0
0 0 0 1
(5.2.7)
or
m
∂f ∗ X ∂gj ∗ ∗
(x ) + λj (x µ ),
∂x j=1
∂x
T
where µ ∗ ∈ Rn = [ µ∗1 µ∗2 · · · µ∗n ] .
Note that we must have µ ∗ ≥ 0. Thus, the above KKT necessary conditions
(5.2.6) become
m
∂f ∗ X ∂gj ∗
(1) (x ) + λj (x ) ≥ 0,
∂x j=1
∂x
(2) all constraints given in (5.2.1) are satisfied,
(3) λ∗j {gj (x∗ )} = 0, j = 1, 2, . . . , m, (5.2.8)
(4) λ∗j ≥ 0, j = 1, 2, . . . , m,
(5) µ∗i x∗ = 0 for all i,

where condition (5) means that


 
n m
X ∂f X ∂g j
x∗ µ∗ = [x∗ ]T  (x∗ ) + λ∗j (x∗ ) = 0. (5.2.9)
i=1
∂x j=1
∂x

 
n
X m
X
x∗ µ∗ = [x∗ ]T ∇f (x∗ ) + λ∗j ∇gj (x∗ ) = 0. (5.2.10)
i=1 j=1
5.2 NONLINEAR PROGRAMMING 133

Note that no explicit Lagrange multipliers are used for non-negativity con-
straints. Also, although the KKT conditions are generally used to check
optimization, they are, however, not valid under all situations. There is an-
other set of necessary conditions, known as Fritz John conditions, discussed
in the next section, which are valid at all times, but in many cases they do
not provide the same information as the KKT conditions.
!!! In the above discussion, in view of Theorem 2.18, the expression (1a) can
be expressed as
m
X n
X
∗ ∗
∇f (x ) + λj ∇gj (x ) − µ∗i ∇qi (x∗ ) = 0;
j=1 i=1

the equation (5.2.7) can be expressed as


       
1 0 0 0
0 1 0 0
m        
X 0 0 0 0
∇f (x∗ ) + λj ∇gj (x∗ ) − µ∗1  .  − µ∗  .  − · · · − µ∗  .  − µ∗  .  = 0,
 ..  2 . n−1  .  n . 
j=1   . . .
0 0 1 0
0 0 0 1

or
m
X
∇f (x∗ ) + λj ∇gj (x∗ µ∗ );
j=1

and condition (1) in (5.2.8) can be expressed as

m
X
∇f (x∗ ) + λj ∇gj (x∗ ) ≥ 0.
j=1

Example 5.9. Consider the inequality constrained optimization problem

min f (x) subject to g(x) ≤ 0,


x∈R2

where f (x) = x2 + y 2 and g(x) = x2 + y 2 − 1.


How can we determine whether x∗ is at a local minimum? Since the un-
constrained minimum of f (x) lies within the feasible region, the necessary
and sufficient conditions for a constrained local minimum are the same as
∂f ∗
those for an unconstrained local minimum, i.e., (x ) = 0 (or equivalently,

∂x
∇x f (x ) = 0) , and the Hessian is positive definite. Note that for this op-
timization problem the constraint is not active at the local minimum since
134 5 CONVEX PROGRAMMING

g(x∗ ) < 0. Hence, the local minimum is identified by the same conditions as
in case 2, Eq(5.2.2). 

Figure 5.6 (a) Isoclines of f (x). (b) Domain h(x) = 0.

Example 5.10. Consider the constrained optimization problem of Exam-


ple 5.9 except that f (x) is now defined by

f (x, y) = (x − 1.2)2 + (y − 1.2)2 ,

with the same g(x, y) = x2 + y 2 − 1 (see Figure 5.6). First, we determine


whether x∗ is a local minimizer. Since the unconstrained local minimum of
f (x) lies outside the feasible region, this is definitely an optimization problem
with an equality constraint g(x) = 0. Hence, a local optimum occurs when
∂f ∂g
(x) and (x) are parallel, so that we have
∂x ∂x
∂f ∂g
− (x) = λ (x). (5.2.11)
∂x ∂x
Also, we determine if a constrained local minimum as −∇x points away from
∂f
the feasible region. Thus, the constrained local minimum occurs when (x)
∂x
∂g
and (x) are in the same direction), which gives
∂x
∂f ∂g
− (x) = λ (x), λ > 0.  (5.2.12)
∂x ∂x
5.3 FRITZ JOHN CONDITIONS 135

!!! Eqs (5.2.11) and (5.2.12) can be expressed, respectively, as −∇x f (x) =
λ∇x g(x), and − ∇x f (x) = λ∇x g(x), λ > 0.

5.3 Fritz John Conditions


For the problem: Minimize f (x) subject to the inequality constraints gj (x) ≤
0, j = 1, 2, . . . , m, the Fritz John conditions are weaker necessary conditions,
which are based on the weak Lagrangian

m
X
L(x, λ ) = λ∗0 f (x∗ ) + λ∗j gj (x∗ )). (5.3.1)
j=1

If x∗ is the minimizer, then there exists a λ∗ ∈ Rm+1 , and the Fritz John
conditions are
m
∂ L̃ ∗ ∗ ∂f ∗ X ∂gj ∗
(1) (x , λ ) = λ∗0 (x ) + λ∗j (x ) = 0,
∂x ∂x j=1
∂x
(2) gj (x∗ ) ≤ 0, j = 1, 2, . . . , m, (5.3.2)
(3) λ∗j {gj (x∗ )} = 0, j = 1, 2, . . . , m,
(4) λ∗ ≥ 0 and λ∗ =
6 0.

!!! The first Fritz John condition in (5.3.2) can also be written as

m
X
∇x L̃(x∗ , λ ∗ ) = λ∗0 ∇f (x∗ ) + λ∗j ∇gj (x∗ ) = 0.
j=1

The Fritz John (FJ) conditions are always necessary for x∗ to be a solu-
tion. However, the KKT conditions are necessary provided certain conditions
known as constraint qualifications (CQ) are satisfied. This can be represented
as
CQ
Local optimum =⇒ Fritz John =⇒ KKT .

One of the examples of a constraint qualification (CQ) is as follows: The set


n ∂g o
k
S= (x∗ ) | k ∈ K (or equivalently, S = {∇gk (x)∗ | k ∈ K}) is linearly
∂x
independent, where K = {k | gk (x∗ )} = 0. That is, the gradient vectors of
all the constraints that are satisfied as strict equalities at x∗ must be linearly
independent.
In Fritz John conditions (5.3.2), suppose that λ∗0 = 0. Then these condi-
tions are satisfied for any function f at the point x∗ , regardless of whether or
not the function f has a minimum at x∗ . This is the main weakness of the
136 5 CONVEX PROGRAMMING

Fritz John conditions, because if λ∗0 = 0, then these conditions do not use the
objective and they are of no practical use in locating the optimal point x∗ .
Remember that the CQs are essentially the constraints that ensure the
λ∗0 > 0. Thus, if we redefine λ∗j as λ∗j /λ∗0 , j = 0, 1, . . . , m, then the Fritz John
conditions reduce to the KKT conditions, so that λ∗0 = 1 can be ignored.
Example 5.11. Minimize f (x, y) = −x subject to g1 (x, y) = y −(1−x)5 ≤
0, and g2 (x, y) = −y ≤ 0, where x = (x, y). The graphs are presented in
Figure 5.7, where the feasible region and the optimal point x∗ are identified
for this problem.
 
x
In matrix notation, x = . The optimal solution is x∗ = (1, 0), which in
y 
1
matrix form is written as x∗ = . Now, in view of x∗ , we have g(x∗ , y ∗ ) = y,
0
and g2 (x∗ , y ∗ ) = −y, and so we get

     
∂f ∗ −1 ∂g1 ∗ 0 ∂g2 ∗ 0
(x ) = , (x ) = , (x ) = .
∂x 0 ∂x 1 ∂x −1

Note that the CQs are not met, since the first partials of both constraints
that are satisfied as strict equalities at x∗ are not linearly independent. Next,
the FJ conditions are

∂f ∗ ∂g1 ∗ ∂g1 ∗
λ0 (x ) + λ1 (x ) + λ2 (x ) = 0,
∂x ∂x ∂x

i.e.,
       
−1 0 0 0
λ0 + λ1 + λ2 = ,
0 1 −1 0

which are satisfied if λ0 = 0 and λ1 = λ2 . On the other hand, the KKT


conditions are
     
∂f ∗ 1 0 0
− (x ) = = λ1 + λ2 ,
∂x 0 1 −1

i.e., λ1 (0) + λ2 (0) = 1 and λ1 (1) + λ2 (−1) = 0, which are inconsistent, that
5.3 FRITZ JOHN CONDITIONS 137

is, these equations have no solution for λ1 and λ2 . 

Figure 5.7 Example 5.11.

Example 5.12. Minimize f (x, y) = −y such that g1 (x) = x− (1 − y)3 ≤ 0,


and g2 (x) = −x ≤ 0, where x = (x, y) . The graph is similar to Figure 5.7,
where the feasible region and the optimal point x∗ are identified for this
problem.
   
x 0
In matrix notation, x = . The optimal solution is at x∗ = . Also,
y 1
     
∂f ∗ 0 ∂g1 ∗ 1 ∂g2 ∗ −1
(x ) = , (x ) = , (x ) = .
∂x −1 ∂x 0 ∂x 0

Note that the CQs are not met as they are not the necessary conditions.
However, the FJ conditions identify an optimal problem. The gradient vectors
of both constraints that are satisfied as strict equalities at x∗ are not linearly
independent. Next, the FJ conditions that provide an optimal solution are

∂f ∗ ∂g1 ∗ ∂g1 ∗
λ0 (x ) + λ1 (x ) + λ2 (x ) = 0,
∂x ∂x ∂x
i.e.,        
0 1 −1 0
λ0 + λ1 + λ2 = ,
−1 0 0 0
which are satisfied if λ0 = 0 and λ1 = λ2 . On the other hand, the KKT
conditions are      
∂f ∗ 0 1 −1
− (x ) = = λ1 + λ2 ,
∂x 1 0 0
138 5 CONVEX PROGRAMMING

i.e., λ1 (1) + λ2 (−1) = 0 and λ1 (0) + λ2 (0) = 1, which are inconsistent, that
is, they cannot be solved for λ1 and λ2 . 
5.3.1 Feasibility. The following four cases for the feasibility problem are
considered.
Case 1. A convex optimization problem with equality and inequality con-
straints is to find

min f (x) subject to gi (x) = 0, i = 1, . . . , m; hj (x) ≤ 0, j = 1, . . . , k,


x
(5.3.3)
where f : Rn 7→ R is the objective or cost function, x ∈ Rn is the optimiza-
tion variable, gi (x) are the equality constraints, and hj (x) are the inequality
constraints. The optimal value of x, denoted by x∗ , is

f (x∗ ) = inf{f (x) | gi (x) = 0, hj (x) ≤ 0.} (5.3.4)

Note that 
∗ ∞ if problem is infeasible,
x =
−∞ if problem is unbounded below,
where the infeasibility of the problem means that no x satisfies the constraints.
Then (i) x is feasible if x ∈ dom(f ) and it satisfies the constraints.
(ii) A feasible x is optimal if f (x) = f (x∗ ).
Case 2. Find

min f (x) subject to gi (x) = 0, i = 1, . . . , m; hj (x) ≤ 0, j = 1, . . . , k. (5.3.5)


x

This problem can be regarded as a special case of the above general problem
(5.3.3) with f (x) = 0. For this problem

∗ 0 if constraints are feasible,
f (x ) =
∞ if constraints are infeasible,

where feasibility of constraints implies that any feasible x is optimal.


(iii) x∗ is locally optimal if there is an A > 0 such that x is optimal for the
following problem: Find

min f (x) subject to gi (x) = 0, i = 1, . . . , m; hj (x) ≤ 0, j = 1, . . . , k (5.3.6)


x

where kz − xk2 ≤ A.
5.3 FRITZ JOHN CONDITIONS 139

Example 5.13. For n = 1, k = m = 0, consider


(a) f (x) = 1/x : dom(f ) = R+ : it has no optimal point.
(b) f (x) = − log x : dom(f ) = R+ : we find that f (x∗ ) = −∞.
(c) f (x) = x log x : dom(f ) = R++ : we have f (x∗ ) = −1/e, and x = 1/e is
optimal.
(d) f (x) = x3 − 3x : f (x∗ ) = −∞: local optima are at x = ±1, local
maximum at x = −1, and local minimum at x = 1. 
Case 3. An optimization problem has an implicit constraint if

m
\ k
\
x= dom(gi ) ∩ dom(hj ), (5.3.7)
i=0 j=1

where gi (x) = 0 and hj (x) ≤ 0 are explicit constraints. A problem is uncon-


strained if it has no explicit constraints (m = k = 0).
Example 5.14. The problem

k
X
min f (x) = − log(bi − aTi x) (5.3.8)
i=1

is an unconstrained problem with implicit constraints aTi x < bi . 


Case 4. An optimization problem with affine inequality constraints is:

Minimize f (x) subject to gi (x) ≤ 0, i = 1, . . . , m; aTi x = bj , j = 1, . . . , k,


(5.3.9)
where f and gi (i = 1, . . . , m) are convex, and equality constraints are affine.
The problem (5.3.9) is quasi-convex if the function f is quasi-convex and the
functions gi (x) are convex. The problem (5.3.9) is often written as

Minimize f (x) subject to gi (x) ≤ 0, i = 1, . . . , m; aTi x = bi . (5.3.10)

Note that the feasible set of a convex optimization problem is a convex set.
Example 5.15. Find

x
min{f (x) = x2 + y 2 } subject to g1 (x) = (x + y)2 ; h1 (x) = ≤ 0.
x 1 + y2

Note that f is convex, and the feasible set {(x, y) | x = −y ≤ 0} is convex. But
h1 is not convex, and g1 is not affine. Hence, it is not a convex optimization
problem. 
An equivalent, but not identical, problem to Example 5.15 is
140 5 CONVEX PROGRAMMING

minimize {x2 + y 2 } subject to x + y = 0, x ≤ 0.

5.3.2 Slater’s Condition. For convex programs, if there exists an x′ ∈ Rn


such that

gj (x′ ) < 0 for j = 1, 2, . . . , m, and gj (x′ ) > 0 for j = m + 1, . . . , p,

then the CQ holds at x since the relative interior is nonempty. Moreover, for
a convex program where the CQ holds, the KKT necessary conditions are also
sufficient.

5.4 Lagrangian Duality


Like linear programming, the nonlinear programming has a duality theory.
For the problem
Minimize f (x) subject to the constraints

gj (x) ≤ 0, j = 1, 2, . . . , m, and hj (x) = 0, j = 1, 2, . . . , p,

the Lagrangian is given by


X X
L(x, λ , µ ) = f (x) = λj gj (x) + µj hj (x).
j j

Then the Lagrangian duals are as follows:


Primal Problem (minmax):
Minimize L̄(x), where

L̄(x) = max L(x, λ , µ )


µ
λ ,µ

f (x) if gj (x) ≤ 0 for all j and hj (x) = 0 for all j,
=
+∞ if gj (x) > 0 or hj (x) 6= 0 for some j,

where L̄ is the primal Lagrangian. Then, the original problem is


n o
min L̄(x) = min max L(x, λ , µ )
x∈D x∈D µ
λ ,µ
n o
= min f (x) | gj (x) ≤ 0, hj (x) = 0 for all j.
x∈D

Dual Problem (maxmin):


λ , µ ) subject to λ ≥ 0, where L̂(λ
Maximize L̂(λ λ , µ ) = min L(x, λ , µ ).
x∈D
In particular, under conditions of convexity and differentiability, this re-
duces to
5.4 LAGRANGIAN DUALITY 141

∂L
max L(x, λ, µ), subject to (x, λ , µ) = 0, λ , µ ≥ 0, x ∈ D,
∂x
where L̂ is the dual Lagrangian.

Figure 5.8 Duality.

Theorem 5.1. (Weak duality theorem) The dual Lagrangian L̂(λ λ, µ ) and the
primal Lagrangian L̄(x, λ , µ ) are related by the inequalities

λ, µ ) ≤ L(x, λ , µ ) ≤ L̄(x, λ , µ ) for all λ ≥ 0 and x ∈ D,


L̂(λ
  (5.4.1)
max min L(x, λ , µ ) ≤ min max L(x, λ , µ ) .
µ
λ ,µ x x µ
λ ,µ

 
The quantity L̄(x) − L̂(λ
λ , µ ) is called the duality gap.

5.4.1 Geometrical Interpretation of Duality. Consider the problem:


Minimize f (x) subject to g(x) ≤ 0, x ∈ D. Let z1 = g(x) and z2 = f (x).
Then, as in Figure 5.8, G is the image of the set D under the map z1 =
g(x), z2 = f (x). The original problem is equivalent to
Minimize z2 , subject to z1 ≤ 0, z = (z1 , z2 ) ∈ G,
which yields the optimum solution z̄ = (z1 , z2 ) shown in Figure 5.8.
The dual problem is: For λ ≥ 0, we have

λ ) = min L(x, λ) = min f (x) + λ g(x) = min z2 + λz1 = α.
L̂(λ
x∈D x∈D z∈G

Then, the dual problem is to find the slope λ of the tangential line (or plane)
for which the intercept with the z2 -axis is maximized.
142 5 CONVEX PROGRAMMING

Example 5.16. (Min-max problem) Minimize 5x^2 + 2xy + y^2 subject
to 3x + y ≥ k, x, y ≥ 0, where k > 0 is an integer. Let x = (x, y), and
D = {x | x ≥ 0, y ≥ 0}. The Lagrangian is

L(x, λ) = 5x^2 + 2xy + y^2 + λ(k − 3x − y).

Then the primal problem is:

min_{x∈D} { max_{λ≥0} L(x, λ) } = min_{x∈D} { L̄(x) },

that is,

Minimize 5x^2 + 2xy + y^2 subject to 3x + y ≥ k, x, y ≥ 0.

The associated dual problem is

max_{λ≥0} { min_{x∈D} L(x, λ) } = max_{λ≥0} { L̂(λ) }.

Consider the dual objective

L̂(λ) = min_{x∈D} L(x, λ) = min_{x∈D} { 5x^2 + 2xy + y^2 + λ(k − 3x − y) }.

Verify that this objective function L(x, λ) is convex in x; thus, the minimum is
obtained by using the necessary and sufficient conditions, which are

∂L/∂x = 10x + 2y − 3λ = 0,  ∂L/∂y = 2x + 2y − λ = 0,

which give the optimal point x∗ = λ/4 = y∗, which lies in D provided λ ≥ 0.
Hence, L̂(λ) = L(λ/4, λ/4) = kλ − (1/2)λ^2. Then the dual problem reduces to

max_{λ≥0} { kλ − (1/2)λ^2 }.

Verify that the objective function for this maximum problem is concave. Hence,

∂L̂/∂λ = k − λ = 0, which yields λ = k > 0.

Since λ > 0, we get the dual solution:

λ∗ = k, which gives L̂(λ∗) = k^2/2,

and the primal solution:

x∗ = y∗ = λ∗/4 = k/4 > 0, and 3x∗ + y∗ = k, for any k > 0,

which is feasible. The complementary slackness is satisfied since λ∗(k − 3x∗ −
y∗) = 0. Hence, (x∗, y∗) = (k/4, k/4) ∈ D is the optimal point for the primal
problem. Moreover, since f(x∗) = 5x∗^2 + 2x∗y∗ + y∗^2 = k^2/2 = L̂(λ∗), there
is no duality gap. □
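The closed-form primal and dual solutions above are easy to cross-check numerically. The sketch below (illustrative only; it assumes SciPy is available, and k = 4 is an arbitrary sample value) solves the primal with a generic solver and compares it with L̂(λ∗) = k^2/2:

    from scipy.optimize import minimize

    k = 4.0
    f = lambda v: 5*v[0]**2 + 2*v[0]*v[1] + v[1]**2
    cons = [{'type': 'ineq', 'fun': lambda v: 3*v[0] + v[1] - k}]  # 3x + y >= k
    bnds = [(0, None), (0, None)]                                  # x, y >= 0

    primal = minimize(f, x0=[1.0, 1.0], bounds=bnds, constraints=cons)
    dual_optimum = k*k - 0.5*k**2   # L_hat at lambda* = k equals k^2/2

    print(primal.x)                  # approximately [k/4, k/4] = [1, 1]
    print(primal.fun, dual_optimum)  # both approximately k^2/2 = 8: no duality gap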

Example 5.17. Minimize x^2 + xy + y^2 subject to x + y ≥ 6, x, y ≥ 0. Let
x = (x, y), and D = {x | x ≥ 0, y ≥ 0}. The Lagrangian is

L(x, λ) = x^2 + xy + y^2 + λ(6 − x − y).

Then the primal problem is:

min_{x∈D} { max_{λ≥0} L(x, λ) } = min_{x∈D} { L̄(x) },

i.e.,

Minimize x^2 + xy + y^2 subject to x + y ≥ 6, x, y ≥ 0.

The associated dual problem is

max_{λ≥0} { min_{x∈D} L(x, λ) } = max_{λ≥0} { L̂(λ) }.

Consider the dual objective

L̂(λ) = min_{x∈D} L(x, λ) = min_{x∈D} { x^2 + xy + y^2 + λ(6 − x − y) }.

Verify that this objective function L(x, λ) is convex in x; thus, the minimum is
obtained by using the necessary and sufficient conditions, which are

∂L/∂x = 2x + y − λ = 0,  ∂L/∂y = x + 2y − λ = 0,

which give the optimal point x∗ = λ/3 = y∗, which lies in D provided λ ≥ 0.
Hence, L̂(λ) = L(λ/3, λ/3) = 6λ − (1/3)λ^2. Then the dual problem reduces to

max_{λ≥0} { 6λ − (1/3)λ^2 }.

Verify that the objective function for this maximum problem is concave. Hence,

∂L̂/∂λ = 6 − (2/3)λ = 0, which yields λ = 9 > 0.

Since λ > 0, we get the dual solution:

λ∗ = 9, which gives L̂(λ∗) = 27 > 0,

and the primal solution:

x∗ = y∗ = λ∗/3 = 3 > 0, and x∗ + y∗ = 6,

which is feasible. The complementary slackness is satisfied since λ∗(6 − x∗ −
y∗) = 0. Hence, (x∗, y∗) = (3, 3) ∈ D is the optimal point for the primal
problem. Moreover, since f(x∗) = x∗^2 + x∗y∗ + y∗^2 = (3)^2 + (3)(3) + (3)^2 =
27 = L̂(λ∗), there is no duality gap. □

Example 5.18. Minimize (−2x^2 − 3x^3) subject to x^2 ≤ 1 (see Figure 5.9).

Figure 5.9 f(x) = −2x^2 − 3x^3.

The Lagrangian is L(x, λ) = −2x^2 − 3x^3 + λ(x^2 − 1). For the primal problem
the optimality conditions are

∂L/∂x = 0, which yields −4x − 9x^2 + 2λx = 0,  λ(x^2 − 1) = 0,  λ ≥ 0,  x^2 ≤ 1.

Case 1. λ = 0. Then x = −4/9, or x = 0.
Case 2. λ > 0. Then x^2 − 1 = 0, or x = ±1, where for x = 1 we get λ = 13/2,
and for x = −1 we get λ = −5/2, which is not feasible and is therefore rejected.
Thus, the solutions are:

(x, λ) = (−4/9, 0) or (0, 0) or (1, 13/2),
f(x) = −32/243 or 0 or −5.

Hence, the primal solution is x∗ = 1, f (x∗ ) = −5.

Figure 5.10 Duality gap.


The dual problem is max_{λ≥0} L̂(λ), that is,

L̂(λ) = min_x L(x, λ) = min_x { −2x^2 − 3x^3 + λ(x^2 − 1) } = −∞ for all values of λ.

Thus, the primal objective = −5, and the dual objective = −∞. Hence,
there is a duality gap. Geometrically, let z_1 = g(x) = x^2 − 1 and z_2 =
f(x) = −2x^2 − 3x^3. Then the supporting plane (line) runs through (−1, 0)
and intersects the z_2-axis at −∞ (see Figure 5.10). □
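The infinite duality gap can be made concrete by sampling L(x, λ); a short sketch (illustrative, with arbitrary sample grids):

    # L(x, lambda) = -2x^2 - 3x^3 + lambda*(x^2 - 1): the cubic term dominates,
    # so the inner minimum over x is -infinity for every lambda >= 0.
    L = lambda x, lam: -2*x**2 - 3*x**3 + lam*(x**2 - 1)

    for lam in (0.0, 1.0, 6.5):
        print(lam, [round(L(x, lam), 1) for x in (1.0, 10.0, 100.0)])
    # Each row decreases without bound, so L_hat(lambda) = -infinity,
    # while the primal optimum is f(1) = -5.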

5.4.2 Saddle Point Sufficiency. The weak duality is defined by (5.4.1).
The definition of a saddle point of L(x, λ) is as follows: The point (x̄, λ̄) is
called a saddle point of L(x, λ) if

L(x̄, λ̄) ≤ L(x, λ̄) for all x ∈ D, i.e., L(x̄, λ̄) = min_{x∈D} L(x, λ̄) = L̂(λ̄),   (5.4.2a)
L(x̄, λ̄) ≥ L(x̄, λ) for all λ ≥ 0, i.e., L(x̄, λ̄) = max_{λ≥0} L(x̄, λ) = L̄(x̄).   (5.4.2b)

It means that if (x̄, λ̄) is a saddle point of L(x, λ), then

L̄(x̄) = L(x̄, λ̄) = L̂(λ̄).   (5.4.3)

This shows that the primal objective is equal to the dual objective and the
duality gap is zero.

Consider the primal problem

min_{x∈D} {f(x)} subject to g_j(x) ≤ 0, j = 1, 2, . . . , m,   (5.4.4)

where f(x) and g_j(x) are all convex functions, and D is a convex set.
The saddle point sufficiency condition is: If x̄ ∈ D and λ̄ ≥ 0, then (x̄, λ̄)
is a saddle point of L(x, λ) iff
(i) x̄ minimizes L(x, λ̄) = f(x) + λ̄^T g(x) over D;
(ii) g_j(x̄) ≤ 0 for each j = 1, 2, . . . , m;
(iii) λ̄_j g_j(x̄) = 0 for each j, which implies that f(x̄) = L(x̄, λ̄).
If (x̄, λ̄) is a saddle point of L(x, λ), then x̄ solves the primal problem (5.4.4)
and λ̄ solves the dual problem, which is

max_{λ≥0} L̂(λ), where L̂(λ) = min_{x∈D} L(x, λ).   (5.4.5)

5.4.3 Strong Duality. Consider the primal problem: Find

Φ = inf f(x) subject to g_j(x) ≤ 0, j = 1, 2, . . . , m_1,
and h_j(x) = 0, j = 1, 2, . . . , m_2, x ∈ D,   (5.4.6)

where D ⊆ R^n is a nonempty convex set, f(x) and g_j(x) are convex, and h_j(x)
are linear functions.
The dual problem is defined as follows: Find

Ψ = sup L̂(λ, µ) subject to λ ≥ 0,   (5.4.7)

where L̂(λ, µ) = inf_{x∈D} { f(x) + λ^T g(x) + µ^T h(x) }. In (5.4.6) and (5.4.7), inf
may be replaced by min, and sup by max.
Theorem 5.2. (Strong duality theorem) Assuming that the following CQ
holds: There exists an x̂ ∈ D such that g_j(x̂) < 0 for j = 1, 2, . . . , m_1, and
h_j(x̂) = 0 for j = 1, 2, . . . , m_2, and 0 ∈ int {h(D)}, where h(D) = {h(x) : x ∈ D},
then
Φ = Ψ,   (5.4.8)

i.e., there is no duality gap. Moreover, if Φ > −∞, then
(a) Ψ = L̂(λ∗, µ∗) for some λ∗ ≥ 0; and
(b) if x∗ solves the primal, then it satisfies the complementary slackness:
λ∗_j g_j(x∗) = 0 for all j = 1, 2, . . . , m_1.

5.5 Exercises
5.1. Let the production function be a Cobb-Douglas function with decreas-
ing returns to scale, so that the firm’s profit function is defined by

π = P AK α Lβ − rK − wL. (5.5.1)

Using the first-order partial derivatives π_K and π_L, the first-order conditions
are

F_1(K, L; r, w, P, A, α, β) = αPAK^{α−1}L^β − r = 0,
F_2(K, L; r, w, P, A, α, β) = βPAK^{α}L^{β−1} − w = 0.

The first-order conditions can be expressed by partial derivatives with respect
to w (wages) in the matrix form Ax = b as

[ α(α−1)PAK^{α−2}L^β     αβPAK^{α−1}L^{β−1}  ] [ ∂K∗/∂w ]   [ 0 ]
[ αβPAK^{α−1}L^{β−1}     β(β−1)PAK^{α}L^{β−2} ] [ ∂L∗/∂w ] = [ 1 ],

solving which we get

|A| = α(α−1)PAK^{α−2}L^β · β(β−1)PAK^{α}L^{β−2} − (αβPAK^{α−1}L^{β−1})^2
    = αβ(1 − α − β)P^2A^2K^{2α−2}L^{2β−2} > 0 for α + β < 1.

Since this is an unconstrained optimization problem, we have |A| = |H|, and
|H| = |H_2| > 0, which implies that the profit is maximized, and this profit-
maximizing firm operates under decreasing returns to scale. We will consider
two cases:
Case 1. For computing the change in the demand for capital and for labor due
to an increase in wages, i.e., ∂K∗/∂w and ∂L∗/∂w, we have

|A_1| = | 0   αβPAK^{α−1}L^{β−1}   |
        | 1   β(β−1)PAK^{α}L^{β−2} | = −αβPAK^{α−1}L^{β−1} < 0,

|A_2| = | α(α−1)PAK^{α−2}L^{β}   0 |
        | αβPAK^{α−1}L^{β−1}     1 | = α(α−1)PAK^{α−2}L^{β} < 0.

Thus,

∂K∗/∂w = |A_1|/|A| = −αβPAK^{α−1}L^{β−1} / [αβ(1 − α − β)P^2A^2K^{2α−2}L^{2β−2}]
       = −K^{1−α}L^{1−β} / [(1 − α − β)PA] < 0.

This means that an increase in wages will decrease the demand for capital.
Similarly,

∂L∗/∂w = |A_2|/|A| = α(α − 1)PAK^{α−2}L^{β} / [αβ(1 − α − β)P^2A^2K^{2α−2}L^{2β−2}]
       = −(1 − α)K^{−α}L^{2−β} / [β(1 − α − β)PA] < 0.

This shows that an increase in wages will reduce the optimal level of labor
used.
Case 2. For computing the change in the demand for capital and for labor due to
an increase in the output price, i.e., ∂K∗/∂P and ∂L∗/∂P, the first-order
conditions can be expressed by partial derivatives with respect to P in the matrix
form Ax = b as

[ α(α−1)PAK^{α−2}L^β     αβPAK^{α−1}L^{β−1}  ] [ ∂K∗/∂P ]   [ −αAK^{α−1}L^{β}  ]
[ αβPAK^{α−1}L^{β−1}     β(β−1)PAK^{α}L^{β−2} ] [ ∂L∗/∂P ] = [ −βAK^{α}L^{β−1} ].

Note that the matrix A is the same as in Case 1, while in this case we have

|A_1| = | −αAK^{α−1}L^{β}    αβPAK^{α−1}L^{β−1}   |
        | −βAK^{α}L^{β−1}    β(β−1)PAK^{α}L^{β−2} | = αβPA^2K^{2α−1}L^{2β−2} > 0,

|A_2| = | α(α−1)PAK^{α−2}L^{β}   −αAK^{α−1}L^{β}  |
        | αβPAK^{α−1}L^{β−1}     −βAK^{α}L^{β−1}  | = αβPA^2K^{2α−2}L^{2β−1} > 0.

This yields

∂K∗/∂P = |A_1|/|A| = αβPA^2K^{2α−1}L^{2β−2} / [αβ(1 − α − β)P^2A^2K^{2α−2}L^{2β−2}]
       = K / [(1 − α − β)P] > 0,

which shows that an increase in the output price will increase the demand for
capital. Similarly,

∂L∗/∂P = |A_2|/|A| = αβPA^2K^{2α−2}L^{2β−1} / [αβ(1 − α − β)P^2A^2K^{2α−2}L^{2β−2}]
       = L / [(1 − α − β)P] > 0,

which shows that an increase in the output price will increase the optimal level
of labor used. □
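These comparative-statics signs can also be confirmed symbolically. The sketch below (illustrative; it assumes SymPy, and the numeric parameter values are arbitrary with α + β < 1) solves the linear system of Case 1 and checks the signs:

    import sympy as sp

    K, L, P, A, a, b = sp.symbols('K L P A a b', positive=True)
    M = sp.Matrix([[a*(a-1)*P*A*K**(a-2)*L**b,   a*b*P*A*K**(a-1)*L**(b-1)],
                   [a*b*P*A*K**(a-1)*L**(b-1),   b*(b-1)*P*A*K**a*L**(b-2)]])

    detM = sp.simplify(M.det())   # a*b*(1 - a - b)*P**2*A**2*K**(2*a-2)*L**(2*b-2)
    dK_dw, dL_dw = M.solve(sp.Matrix([0, 1]))   # Case 1: right-hand side (0, 1)

    vals = {K: 2, L: 3, P: 1, A: 1, a: sp.Rational(1, 3), b: sp.Rational(1, 2)}
    print(detM.subs(vals).evalf(), dK_dw.subs(vals).evalf(), dL_dw.subs(vals).evalf())
    # determinant > 0 while both derivatives are negative, as derived above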
5.2. Consider f (x, y) = 3x2 − xy + 2y 2 − 4x − 7y + 8. The first-order partial
derivatives, equated to zero, give: fx = 6x − y − 4 = 0, fy = −x + 4y − 7 = 0,
solving which we get x = 1, y = 2. Thus, the critical number is (1, 2). The
second-order partial derivatives are: fxx = 6, fxy = fyx = −1, fyy = 4.

Checking the condition fxx · fyy > (fxy )2 , we have 6 · 4 > (−1)2 . Since both
fxx and fyy are positive, we have a global minimum at (1, 2). 
5.3. Optimize f (x, y) = 4x2 + 3xy + 6y 2 , subject to the constraint x + y =
28. The Lagrangian in the form (5.1.2) is

F (x, y, λ) = 4x2 + 3xy + 6y 2 + λ(28 − x − y). (5.5.2)

The first-order partial derivatives are: F_x = 8x + 3y − λ = 0, F_y = 3x +
12y − λ = 0, F_λ = 28 − x − y = 0, which when solved simultaneously give the
critical values: x∗ = 18, y∗ = 10, λ∗ = 174. Substituting these values in (5.5.2) we
get F(18, 10) = 2436 = f(18, 10). Notice that both functions f(x, y) and F(x, y)
are equal at the critical values, since the constraint is equal to zero at those
values.
The second-order derivatives are Fxx = 8, Fyy = 12, Fxy = Fyx = 3. Also,
from the constraint g(x, y) = x + y − 28, we have gx = 1, gy = 1. Then the
bordered Hessian (§1.6.3) is

8 3 1
|H̄| = 3 12 1 ,
1 1 0

and its second principal minor is |H̄_2| = |H̄| = 8(−1) − 3(−1) + 1(3 − 12) =
−14 < 0. Since |H̄_2| < 0, |H̄| is positive definite, and F(x, y) is at a local
minimum. □
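A quick numerical confirmation of both the critical point and the sign of the bordered Hessian (illustrative, assuming NumPy is available):

    import numpy as np

    # First-order conditions of Exercise 5.3: 8x + 3y - lam = 0,
    # 3x + 12y - lam = 0, x + y = 28, as a linear system in (x, y, lam).
    A = np.array([[8.0,  3.0, -1.0],
                  [3.0, 12.0, -1.0],
                  [1.0,  1.0,  0.0]])
    x, y, lam = np.linalg.solve(A, [0.0, 0.0, 28.0])
    print(x, y, lam)                      # 18.0 10.0 174.0

    H_bar = np.array([[8.0,  3.0, 1.0],   # bordered Hessian with the border last
                      [3.0, 12.0, 1.0],
                      [1.0,  1.0, 0.0]])
    print(np.linalg.det(H_bar))           # approximately -14 < 0: a local minimum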
5.4. Optimize the following Cobb-Douglas production functions subject
to the given constraints by (a) using the Lagrange function and finding the
critical points, and (b) by using the Hessian.
(a) Q = K 0.4 L0.5 subject to 6K + 2L = 270. We get

QK = 0.4K −0.6 L0.5 − 6λ = 0,


QL = 0.5K 0.4 L−0.5 − 2λ = 0,
Qλ = 270 − 6K − 2L = 0.

From the first two equations we find that

0.4K^{−0.6}L^{0.5} / (0.5K^{0.4}L^{−0.5}) = 6λ/2λ,  or  L/K = (3)(0.5)/0.4 = 3.75,

which gives L = 3.75K.

Substituting L = 3.75K into the third equation above, we get

270 − 6K − 7.5K = 0, which gives the critical points: K = 20, L = 75.



(b) Using formula (1.6.6), the bordered Hessian is

       | −0.24K^{−1.6}L^{0.5}    0.2K^{−0.6}L^{−0.5}    6 |
|H̄| = |  0.2K^{−0.6}L^{−0.5}   −0.25K^{0.4}L^{−1.5}    2 |
       |  6                       2                      0 |

     = 4.8K^{−0.6}L^{−0.5} + 9K^{0.4}L^{−1.5} + 0.96K^{−1.6}L^{0.5} > 0.

Since |H̄_2| = |H̄| > 0, |H̄| is negative definite, and Q is maximized at the point
(20, 75).
5.5. Maximize the utility function u = x^{0.6}y^{0.3} subject to the budget
constraint 8x + 5y = 300.
Ans. Since U(x, y) = x^{0.6}y^{0.3} + λ(300 − 8x − 5y), we have U_x = 0.6x^{−0.4}y^{0.3} −
8λ, U_y = 0.3x^{0.6}y^{−0.7} − 5λ, U_λ = 300 − 8x − 5y. Then from the first two
equations we get y = (4/5)x, which after substituting in the last equation gives the
critical point (25, 20).
5.6. Minimize the total costs defined by c = 15x2 + 30xy + 30y 2 when the
firm meets the quota g(x, y) equal to 2x + 3y = 20. Define

C(x, y) = 15x2 + 30xy + 30y 2 + λ(20 − 2x − 3y).

Then C_x = 30x + 30y − 2λ = 0, C_y = 30x + 60y − 3λ = 0, C_λ = 20 − 2x − 3y = 0.
Solving these three equations simultaneously, we get the critical values: x∗ =
4, y∗ = 4, λ∗ = 120.
The second-order partial derivatives are: Cxx = 30, Cyy = 60, Cxy = 30 =
Cyx , and gx = 2, gy = 3. Thus, the bordered Hessian (§1.6.3) is

30 30 2
|H̄| = 30 60 3 .
2 3 0

The second principal minor is |H̄_2| = −150 < 0. Thus, |H̄| is positive definite
and C is minimized when x = y = 4. □
5.7. Maximize the utility u = x^{1/2}y^{3/5} subject to the budget constraint
3x + 9y = 66. Define

U(x, y) = x^{1/2}y^{3/5} + λ(66 − 3x − 9y).

Then U_x = (1/2)x^{−1/2}y^{3/5} − 3λ = 0, U_y = (3/5)x^{1/2}y^{−2/5} − 9λ = 0, U_λ = 66 −
3x − 9y = 0. Solving these equations simultaneously, we get the critical
values: x∗ = 10, y∗ = 4, λ∗ ≈ 0.12. The second-order partial derivatives are:
U_xx = −(1/4)x^{−3/2}y^{3/5}, U_xy = (3/10)x^{−1/2}y^{−2/5} = U_yx, U_yy = −(6/25)x^{1/2}y^{−7/5}.
Then the bordered Hessian is

       | −(1/4)x^{−3/2}y^{3/5}     (3/10)x^{−1/2}y^{−2/5}    3 |
|H̄| = |  (3/10)x^{−1/2}y^{−2/5}  −(6/25)x^{1/2}y^{−7/5}     9 |
       |  3                          9                         0 |.

The second principal minor is

|H̄_2| = |H̄| = (81/4)x^{−3/2}y^{3/5} + (81/5)x^{−1/2}y^{−2/5} + (54/25)x^{1/2}y^{−7/5} > 0,

since all terms are positive. Hence, |H̄| is negative definite, and U is maximized
at the critical values. □
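Since the critical values here are easy to get wrong by hand, the following sketch (illustrative; it assumes SymPy, and the initial guess is an arbitrary choice) solves the three first-order conditions numerically:

    import sympy as sp

    x, y, lam = sp.symbols('x y lam', positive=True)
    U = sp.sqrt(x)*y**sp.Rational(3, 5) + lam*(66 - 3*x - 9*y)
    eqs = [sp.diff(U, v) for v in (x, y, lam)]   # U_x = U_y = U_lam = 0

    sol = sp.nsolve(eqs, (x, y, lam), (8, 5, 0.2))
    print(sol)   # approximately (10, 4, 0.121)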
5.8. Minimize x2 + 2y 2 subject to x + y ≥ 3 and y − x2 ≥ 1. Solution.
The Lagrangian is

L(x, y, λ1 , λ2 ) = x2 + 2y 2 + λ1 (3 − x − y) + λ2 (1 − y + x2 ), λ1 , λ2 ≥ 0.

Then

L_x(x, y, λ_1, λ_2) = 2x − λ_1 + 2λ_2 x = 0,
Ly (x, y, λ1 , λ2 ) = 4y − λ1 − λ2 = 0,
Lλ1 (x, y, λ1 , λ2 ) = 3 − x − y = 0,
Lλ2 (x, y, λ1 , λ2 ) = 1 − y + x2 = 0. (5.5.3)

Solving the last two equations in (5.5.3), we get (x, y) = (−2, 5) or (x, y) =
(1, 2). Using (x, y) = (−2, 5) in the first two equations in (5.5.3) we find that
λ1 = 28, λ2 = −8, which is not feasible. Next, using (x, y) = (1, 2) in the first
two equations in (5.5.3) we get λ1 = 6, λ2 = 2, which is feasible and the point
(1, 2) is the global minimizer.
5.9. Minimize x2 + y 2 − 4x − 4y subject to x2 ≤ y, x + y ≤ 2. Solution.
The Lagrangian is

L(x, y, λ1 , λ2 ) = x2 + y 2 − 4x − 4y + λ1 (x2 − y) + λ2 (x + y − 2), λ1 , λ2 ≥ 0.

Then

L_x(x, y, λ_1, λ_2) = 2x − 4 + 2λ_1 x + λ_2 = 0,
Ly (x, y, λ1 , λ2 ) = 2y − 4 − λ1 + λ2 = 0,
Lλ1 (x, y, λ1 , λ2 ) = x2 − y = 0,
Lλ2 (x, y, λ1 , λ2 ) = x + y − 2 = 0. (5.5.4)

Solving the last two equations in (5.5.4), we get (x, y) = (−2, 4) or (x, y) =
(1, 1). Using (x, y) = (−2, 4) in the first two equations in (5.5.4) we find that
λ_1 = −4, λ_2 = −8, which is not feasible. Next, using (x, y) = (1, 1) in the
first two equations in (5.5.4) we get λ_1 = 0, λ_2 = 2, which is feasible and the
point (1, 1) is the global minimizer.
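A generic solver reproduces this answer. The sketch below (illustrative, assuming SciPy, with an arbitrary feasible starting point) confirms that (1, 1) is optimal:

    from scipy.optimize import minimize

    f = lambda v: v[0]**2 + v[1]**2 - 4*v[0] - 4*v[1]
    cons = [{'type': 'ineq', 'fun': lambda v: v[1] - v[0]**2},   # x^2 <= y
            {'type': 'ineq', 'fun': lambda v: 2 - v[0] - v[1]}]  # x + y <= 2

    res = minimize(f, x0=[0.0, 1.0], constraints=cons)
    print(res.x, res.fun)   # approximately [1, 1] and -6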
5.10. The Constant Elasticity of Substitution (CES) production function
is defined by
q = A [αK^{−β} + (1 − α)L^{−β}]^{−1/β},   (5.5.1)
where A > 0 is the coefficient parameter, α (0 < α < 1) the distribution
parameter denoting relative factor shares, and β > −1 the substitution pa-
rameter that determines the value of elasticity of substitution (see Exercise
2.35). Consider q = 100(0.4K^{−0.5} + 0.6L^{−0.5})^{−2}, and determine the relative
extremum. The first-order partial derivatives of Q ≡ q are:

Q_K = 40K^{−1.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} = 0,
Q_L = 60L^{−1.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} = 0.

Solving these two equations we get L1.5 = 1.5K 1.5 , or L ≈ 1.3K. The second-
order partial derivatives of Q are:
Q_KK = −60K^{−2.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} + 24K^{−3}(0.4K^{−0.5} + 0.6L^{−0.5})^{−4},
Q_LL = −90L^{−2.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} + 54L^{−3}(0.4K^{−0.5} + 0.6L^{−0.5})^{−4},
Q_KL = 36K^{−1.5}L^{−1.5}(0.4K^{−0.5} + 0.6L^{−0.5})^{−4} = Q_LK.

We will use some numerical computation. Take K = 1; then L = 1.3, and
(0.4K^{−0.5} + 0.6L^{−0.5})^{−3} ≈ 1.26 and (0.4K^{−0.5} + 0.6L^{−0.5})^{−4} ≈ 1.36,
so that Q_KK ≈ −42.96 < 0, Q_LL ≈ −25.55 < 0, and Q_KL ≈ 30.38 > 0. Thus,
Q_KK Q_LL ≈ 1098 > 0 and (Q_KL)^2 ≈ 923 > 0. Since Q_KK < 0 and
Q_KK Q_LL > (Q_KL)^2, we have a relative maximum at (1, 1.3).
6
Quasi-Concave Functions

6.1 Quasi-Concave Functions


Most of the objective functions used in optimization problems are generally
quasi-concave (or quasi-convex). In many problems, both quasi-concave and
quasi-convex functions characterize a constraint set, which is a convex set. As
mentioned before, there are quasi-concave (or quasi-convex) functions which
are not concave (or convex) functions, although the converse need not be
true. However, both quasi-concavity and quasi-convexity are defined in terms
of convex sets, and they hold a symmetric relationship:

f : R^n → R is (strictly) quasi-concave iff (−f) is (strictly) quasi-convex.

Let f : R^n → R be a real-valued function of x = (x_1, . . . , x_n) ∈ R^n. Then


the upper-level set of f , denoted by Uα for all α, is defined by

Uα = {x ∈ dom(f ) | f (x) ≥ α}. (6.1.1)

Upper-level sets, also known as upper contour sets, are convex sets for quasi-
concave functions, and they are used in problems involving consumer’s utility
maximization and a company’s cost minimization. For example, let an input
requirement in the case of a production function correspond to the upper-
level set (6.1.1), where α denotes an output level, x an input vector, and f a
single-output production function. Then, in the case of utility maximization,
where u denotes a utility function, the set of all consumption bundles {x}
that are preferable to a given consumption bundle {x∗ } is also an upper-level
set Uα = {x∗ | u(x∗ ) ≥ α}.
A function f is said to be quasi-concave iff the upper-level set U_α is a
convex set for every α ∈ R(f).
A real-valued function f : R^n → R is strictly quasi-concave iff

f(tx + (1 − t)x′) > min{f(x), f(x′)}   (6.1.2)



for all distinct x, x′ ∈ R^n and for all t ∈ (0, 1). This definition differs from the
above definition of quasi-concavity in that only strict inequality is used.
Let x, x′ ∈ R be two distinct points on the x-axis (a convex set) such
that the interval [x, x′ ] supports an arc AB on the curve, and B is higher
than A (see Figure 6.1(a)). Since all the points between A and B on the
arc are strictly higher than A, it satisfies the condition of quasi-concavity.
The curves are strictly quasi-concave if all possible [x, x′ ] intervals have arcs
that satisfy this same condition. Notice that this function also satisfies the
condition of non-strict quasi-concavity, but does not satisfy the condition of
quasi-convexity, because some points on the arc AB are higher than A, and
this is not acceptable for a quasi-convex function. Figure 6.1(b) presents the
case where a horizontal line segment A′ B ′ exists on which all points have the
same height. This curve meets the condition of quasi-concavity, but does not
satisfy that of strict quasi-concavity.

Figure 6.1 Quasi-concavity.

Note that in general a quasi-concave function that is also concave has its
graph approximately shaped like a bell, or part thereof, and a quasi-convex
function has its graph shaped like an inverted bell, or a part of it. Thus,
quasi-concavity (or quasi-convexity) is a weaker condition than concavity (or
convexity).
The above geometrical characterization leads to the following algebraic
definition: A function f is quasi-concave (quasi-convex) iff, for any pair of
distinct points x and x′ in the (convex-set) domain of f, and for 0 < t < 1,

f(x′) ≥ f(x) =⇒ f(tx + (1 − t)x′) ≥ f(x)    for quasi-concavity,
                 f(tx + (1 − t)x′) ≤ f(x′)    for quasi-convexity.   (6.1.3)

A linear function f(x) is both quasi-concave and quasi-convex. To prove,
note that multiplying an inequality by −1 reverses the sign of inequality. If f
is quasi-concave, with f(x′) ≥ f(x), then f(tx + (1 − t)x′) ≥ f(x). Now, for
the function −f, we will have −f(x) ≥ −f(x′) and −f(tx + (1 − t)x′) ≤ −f(x).
Thus, −f satisfies the condition of quasi-convexity.
Concavity implies quasi-concavity. To prove, let f be concave. Then
f (tx + (1 − t)x′ ) ≥ tf (x) + (1 − t)f (x′ ). Now, assume that f (x′ ) ≥ f (x).
Then any weighted average of f (x) and f (x′ ) cannot possibly be less than
f (x), i.e., tf (x) + (1 − t)f (x′ ) ≥ f (x). Combining these two results we find
that f (tx + (1 − t)x′ ) ≥ f (x) for f (x′ ) ≥ f (x), which satisfies the definition
of quasi-concavity.
The condition of quasi-concavity does not guarantee concavity.
In the case of concave (and convex) functions, there is a very useful result:
the sum of concave (convex) functions is also concave (convex). However, this
result cannot be generalized to quasi-concave and quasi-convex functions.
Sometimes quasi-concavity and quasi-convexity can be checked by using
the following definition: A function f(x), where x = (x_1, . . . , x_n) ∈ R^n, is
quasi-concave (quasi-convex) iff, for any constant k, the set

S^≥ ≡ {x | f(x) ≥ k}   (respectively, S^≤ ≡ {x | f(x) ≤ k})   is a convex set.   (6.1.4)

The sets S ≥ and S ≤ are presented in Figure 6.2.

Figure 6.2 Sets S ≥ and S ≤ .

The three functions in Figure 6.2 all contain concave as well as convex
segments and therefore they are neither concave nor convex. However, the
function in Figure 6.2(a) is quasi-concave because for any value of k the set
S^≥ is convex (the figure shows only one value of k). The function
in Figure 6.2(b) is, however, quasi-convex since the set S^≤ is convex. The
function in Figure 6.2(c) is a monotone function and it differs from the other
two functions in that both S^≥ and S^≤ are convex sets. Hence, the function is

both quasi-concave and quasi-convex. Note that formula (6.1.4) can be used
to check quasi-concavity and quasi-convexity, but it cannot verify whether
they are strict or nonstrict.
Example 6.1. Check f (x) = x2 , x ≥ 0, for quasi-concavity and quasi-
convexity. The graph of the function shows that it is a convex and a strictly
convex function. It is also quasi-concave because its graph is a U-shaped curve,
starting at the origin and increasing; it is similar to Figure 6.2(c) generating
a convex S ≥ as well as a convex S ≤ set. Instead we use formula (6.1.3). If x
and x′ are two distinct nonnegative values of x, then f (x) = x2 , f (x′ ) = x′2 ,
and f (tx + (1 − t)x′ ) = (tx + (1 − t)x′ )2 . Now, suppose f (x′ ) ≥ f (x), i.e.,
x′2 ≥ x2 ; then, x′ ≥ x, or specifically x′ > x, since x and x′ are distinct
points. Thus, the weighted average tx + (1 − t)x′ must lie between x and x′ ,
and we have for 0 < t < 1,

x′2 > (tx + (1 − t)x′ )2 > x2 or f (x′ ) > f (tx + (1 − t)x′ ) > f (x).

But in view of (6.1.3), this result implies that f is both strictly quasi-concave
and strictly quasi-convex. 
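Definition (6.1.3) also lends itself to a mechanical spot-check. The following sketch (a brute-force sampling test, which can refute quasi-concavity but never prove it) reproduces the behavior of f(x) = x^2 on x ≥ 0 and shows why the same function fails on all of R:

    import itertools
    import numpy as np

    def seems_quasi_concave(f, points, ts=np.linspace(0.01, 0.99, 25)):
        # Tests f(t*x + (1-t)*x') >= min(f(x), f(x')) on sampled pairs and t values.
        for x, xp in itertools.combinations(points, 2):
            for t in ts:
                if f(t*x + (1 - t)*xp) < min(f(x), f(xp)) - 1e-12:
                    return False
        return True

    f = lambda x: x**2
    print(seems_quasi_concave(f, np.linspace(0, 3, 31)))    # True: monotone on x >= 0
    print(seems_quasi_concave(f, np.linspace(-3, 3, 31)))   # False: f(0) < min(f(-3), f(3))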
Example 6.2. Show that f (x, y) = xy, x, y ≥ 0 is quasi-concave. Use
the criterion in (6.1.4) and show that the set S ≥ = {(x, y) | xy ≥ k} is a
convex set for any k. Note that the curve xy = k with k ≥ 0 is a different
curve for each k. If k > 0, this curve is a rectangular hyperbola in the first
quadrant of the xy-plane, and the set consisting of all points on and above
this hyperbola is a convex set. But if k = 0, the curve is defined by xy = 0,
which constitutes the nonnegative parts of the x and y axes, and it is again
a convex set. Hence, the function f (x, y) = xy, x, y ≥ 0, is quasi-concave. Be
careful not to confuse the given curve with z = xy which is a surface in the
(x, y, z)-space. In this example we are examining the characteristics of the
surface which is quasi-concave in R3 . 
Example 6.3. Show that the function f(x, y) = (x−a)^2 + (y−b)^2 is convex
and so it is quasi-convex. Use the criterion (6.1.4), and set (x−a)^2 + (y−b)^2 =
k, k ≥ 0. For each k, the curve is a circle in the xy-plane with center at (a, b)
and radius √k. Since the set {(x, y) | (x − a)^2 + (y − b)^2 ≤ k} is the set of all
points on and inside this circle, it is a convex set, even when k = 0 in which
case the circle degenerates into a single point (a, b), and a set with a single
point is a convex set. Hence, the given function is quasi-convex. □

6.2 Differentiable Functions

A differentiable function f(x) in R is quasi-concave (quasi-convex) iff, for any
pair of distinct points x and x′ in dom(f),

f(x′) ≥ f(x) =⇒ f′(x)(x′ − x) ≥ 0    for quasi-concavity,
                 f′(x′)(x′ − x) ≥ 0    for quasi-convexity.   (6.2.1)

Quasi-concavity and quasi-convexity are strict if the inequality on the right
side in (6.2.1) is changed to the strict inequality > 0.
A differentiable function f(x), where x = (x_1, . . . , x_n) ∈ R^n, is quasi-concave
(quasi-convex) iff, for any pair of distinct points x and x′ in dom(f),

f(x′) ≥ f(x) =⇒ Σ_{j=1}^{n} [∂f(x)/∂x_j] (x′_j − x_j) ≥ 0    for quasi-concavity,
                 Σ_{j=1}^{n} [∂f(x′)/∂x_j] (x′_j − x_j) ≥ 0    for quasi-convexity.   (6.2.2)

For strict quasi-concavity and quasi-convexity, the right side of (6.2.2) must
be changed to the strict inequality > 0.
If a function f (x), x ∈ Rn , is twice continuously differentiable, we can
check quasi-concavity and quasi-convexity by using the bordered Hessian |B|
(single function, §1.6.4) defined by

      | 0     f_1    f_2    · · ·   f_n  |
      | f_1   f_11   f_12   · · ·   f_1n |
|B| = | f_2   f_21   f_22   · · ·   f_2n |,   (6.2.3)
      | ···   ···    ···    · · ·   ···  |
      | f_n   f_n1   f_n2   · · ·   f_nn |

where f_i = ∂f/∂x_i and f_ij = ∂²f/(∂x_i ∂x_j), i, j = 1, . . . , n. Note that, unlike the
bordered Hessian |H̄| described in §1.6.3 and used for optimization problems
involving an extraneous constraint g, the above-defined bordered Hessian |B|
is composed of the first derivatives of the function f only, without any extra-
neous constraint g. The leading principal minors of |B| are

        | 0    f_1  |           | 0    f_1   f_2  |
|B_1| = |           |,  |B_2| = | f_1  f_11  f_12 |,   . . . ,   |B_n| = |B|.   (6.2.4)
        | f_1  f_11 |           | f_2  f_21  f_22 |

We will state two conditions, one of which is necessary and the other is
sufficient, and both relate to quasi-concavity on a domain consisting only
of the nonnegative orthant (the n-dimensional analogue of the nonnegative
quadrant), which is defined by x_1, . . . , x_n ≥ 0. These conditions are as follows:
The necessary condition for a function z = f(x) to be quasi-concave on the
nonnegative orthant is

|B_1| ≤ 0, |B_2| ≥ 0, . . . , and |B_n| ≤ 0 if n is odd, |B_n| ≥ 0 if n is even,   (6.2.5)

where the partial derivatives are evaluated in the nonnegative orthant. Recall
that the first condition in (6.2.5) is automatically satisfied since |B_1| = −f_1^2 =
−(∂f/∂x_1)^2.
The sufficient condition for f to be strictly quasi-concave on the nonnega-
tive orthant is that

|B_1| < 0, |B_2| > 0, . . . , and |B_n| < 0 if n is odd, |B_n| > 0 if n is even,   (6.2.6)

where the partial derivatives are evaluated in the nonnegative orthant. The
details of these conditions are available in Arrow and Enthoven [1961:797],
and Takayama [1993:65].
Example 6.4. The function f (x1 , x2 ) = x1 x2 , x1 , x2 ≥ 0 is quasi-concave
(compare Example 6.2). We will check it using (6.2.2). Let u = (u1 , u2 ) and
v = (v1 , v2 ) be two points in dom(f ). Then f (u) = u1 u2 and f (v) = v1 v2.
Assume that
f (v) ≥ f (u), or v1 v2 ≥ u1 u2 , (6.2.7)
where u1 , u2 , v1 , v2 ≥ 0. Since the partial derivatives of f are f1 = x2 and
f2 = x1 , condition (6.2.2) implies that f1 (u)(v1 − u1 ) + f2 (u)(v2 − u2 ) =
u2 (v1 − u1 ) + u1 (v2 − u2 ) ≥ 0, which after rearranging the terms is

u2 (v1 − u1 ) ≥ u1 (u2 − v2 ). (6.2.8)

Now there are four cases to consider depending on the values of u1 and u2 :
(1) If u1 = u2 = 0, then (6.2.8) is trivially satisfied.
(2) If u1 = 0 and u2 > 0, then (6.2.8) reduces to u2 v1 ≥ 0, which is again
satisfied since u2 and v1 are both nonnegative.
(3) If u1 > 0 and u2 = 0, then (6.2.8) reduces to 0 ≥ −u1 v2 , which is
satisfied.
(4) Suppose u1 , u2 > 0, so that v1 , v2 > 0 also. Subtracting v2 u1 from both
sides of (6.2.7), we obtain

v2 (v1 − u1 ) ≥ u1 (u2 − v2 ). (6.2.9)

This leads to the following three possibilities: (a) If u_2 = v_2, then v_1 ≥ u_1.
Since (u_1, u_2) and (v_1, v_2) are distinct points and u_2 = v_2, in fact v_1 > u_1, so
condition (6.2.8) is satisfied.
(b) If u_2 > v_2, then we have v_1 > u_1 by (6.2.9). Multiplying both sides of
(6.2.9) by u_2/v_2, we get

u_2(v_1 − u_1) ≥ (u_2/v_2) u_1(u_2 − v_2) > u_1(u_2 − v_2), since u_2 > v_2.   (6.2.10)

Hence, (6.2.8) is satisfied.


(c) If u2 < v2 , i.e., u2 /v2 is a positive fraction, the first inequality of (6.2.10)
still holds in this case. The second inequality also holds because a fraction
u2 /v2 of a negative number (u2 − v2 ) is greater than the number itself. Hence
the given function is quasi-concave.
As for the sufficient condition, note that the partial derivatives are f_1 =
x_2, f_2 = x_1, f_11 = f_22 = 0, f_12 = f_21 = 1, giving

        | 0    x_2 |                   | 0    x_2  x_1 |
|B_1| = |          | = −x_2^2 ≤ 0,  |B_2| = | x_2  0    1   | = 2x_1 x_2 ≥ 0.
        | x_2  0   |                   | x_1  1    0   |

Thus, (6.2.6) is satisfied in the positive orthant. □


Example 6.5. Show that the function f(x, y) = x^a y^b, (x, y > 0; a > 0, b >
1) is strictly quasi-concave. The partial derivatives are: f_x = ax^{a−1}y^b, f_y =
bx^a y^{b−1}, f_xx = a(a−1)x^{a−2}y^b, f_yy = b(b−1)x^a y^{b−2}, f_xy = f_yx = abx^{a−1}y^{b−1}.
Then the minors of the bordered Hessian |B| are

        | 0    f_x  |
|B_1| = |           | = −(ax^{a−1}y^b)^2 < 0,
        | f_x  f_xx |

        | 0    f_x   f_y  |
|B_2| = | f_x  f_xx  f_xy | = [2a^2b^2 − a(a−1)b^2 − a^2b(b−1)] x^{3a−2}y^{3b−2} > 0,
        | f_y  f_yx  f_yy |

thus satisfying the condition for strict quasi-concavity in (6.2.6). □
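The minors in Example 6.5 can be generated symbolically instead of by hand; a sketch (illustrative, assuming SymPy is available):

    import sympy as sp

    x, y, a, b = sp.symbols('x y a b', positive=True)
    f = x**a * y**b
    fx, fy = sp.diff(f, x), sp.diff(f, y)

    B2 = sp.Matrix([[0,  fx,                fy              ],
                    [fx, sp.diff(f, x, 2),  sp.diff(f, x, y)],
                    [fy, sp.diff(f, x, y),  sp.diff(f, y, 2)]])
    B1 = B2[:2, :2]

    print(sp.factor(B1.det()))   # -a**2 * x**(2*a-2) * y**(2*b), i.e., -(f_x)^2 < 0
    print(sp.factor(B2.det()))   # a*b*(a + b) * x**(3*a-2) * y**(3*b-2) > 0 for a, b > 0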


Note that it is improper to define strict quasi-concavity by saying that the
upper-level sets are strictly convex, since an upper-level set can be strictly
convex even when the function has flat parts. Moreover, a function f is
strictly quasi-concave iff −f is strictly quasi-convex.
Quasi-concavity is a weaker assumption than concavity in the sense that, al-
though every concave function is quasi-concave, the converse is not true. How-
ever, economists sometimes demand something more than quasi-concavity.
For example, a quasi-concave function does not imply risk aversion, while a
concave utility function does.

6.3 Theorems on Quasi-Concavity

Let S denote the convex set on which a concave function f is defined, and write
y = f(x), y′ = f(x′). Thus, (tx + (1 − t)x′, ty + (1 − t)y′) ∈ S. Then quasi-
concavity of f means that

tf(x) + (1 − t)f(x′) = ty + (1 − t)y′ ≤ f(tx + (1 − t)x′).   (6.3.1)

A real-valued function f on a metric space X is called upper semicon-


tinuous if for each real number α the set {x : f (x) < α} is open. If f is

continuous, both f and −f are upper semicontinuous. This definition leads


to the proposition: Let f be an upper semicontinuous and real-valued function
on a countably compact space, then f is bounded from above and assumes its
maximum (Royden [1968: 161]).
Theorem 6.1. Let f be an upper semicontinuous function on a convex set
S ∈ R2 . If f is strictly quasi-concave on S, then f is quasi-concave on S.
Proof. Let (x, x′ ) be in S, where y = f (x) and y ′ = f (x′ ). If S is a
convex set, then the convex combination (tx + (1 − t)x′ , ty + (1 − t)y ′ ) is also
in S for any t ∈ [0, 1]. Thus, the inequality (6.3.1) holds for all x, x′ ∈ S.
Conversely, assume that tf (x) + (1 − t)f (x′ ) ≤ f (tx + (1 − t)x′ ). Choose y and
y ′ such that y ≤ f (x) and y ′ ≤ f (x′ ). Obviously, (x, y) and (x′ , y ′ ) are both
in S. Thus, ty ≤ tf (x), and (1 − t)y ′ ≤ (1 − t)f (x′ ) for any 0 ≤ t ≤ 1. But
this implies that ty + (1 − t)y′ ≤ tf(x) + (1 − t)f(x′). Since the right-hand
side of this inequality is assumed to be not greater than f(tx + (1 − t)x′),
we get from inequality (6.3.1)

ty + (1 − t)y′ ≤ f(tx + (1 − t)x′).

Hence, (tx + (1 − t)x′, ty + (1 − t)y′) ∈ hyp(f), i.e., S is an upper-level set,
and the inequality (6.3.1) implies (6.1.2), which completes the proof. □
This means that strict quasi-concavity and upper semicontinuity imply
quasi-concavity, but the converse is not true. Another useful result is
Theorem 6.2. Let f be a strictly quasi-concave function on a convex set
S ∈ R2 . If x∗ is a local maximizer of f on S, then x∗ is a global maximizer
of f on S.
Note that a sum of two concave functions is a concave function. However,
the sum of two quasi-concave functions is not necessarily a quasi-concave
function. Also, the sum of a concave function and a quasi-concave function is
not necessarily either a concave function or a quasi-concave function.
Theorem 6.3. Let f : R^2 → R be a quasi-concave function, and let
g : R → R be a nondecreasing function whose domain contains R(f). Then
the composite function F(x) := g ◦ f : R^2 → R is a quasi-concave function.
Proof. This follows from the fact that strictly monotone functions have
strictly monotone inverse functions. Thus, for any α we get

{x|F (x) ≥ α} = {x | g(f (x)) ≥ α} = {x | f (x) ≥ g −1 (α)},

which is a convex set since f is quasi-concave. 


Theorem 6.4. Let BαU := X ∩ {x | f (x) ≥ α} = X ∩ Uα . Then f (x) is
quasi-concave iff BαU is a convex set for all α.
Proof. (Necessity ⇒) Let x, y ∈ BαU , and let t ∈ [0, 1]. Then by definition
of quasi-concavity we have x ∈ X and both f (x) ≥ α and f (y) ≥ α. Using

the quasi-concavity of f , we have f (tx + (1 − t)y) ≥ min{f (x), f (y)} ≥ α.


Thus, tx + (1 − t)y ∈ BαU , which implies that BαU is a convex set.

(Sufficiency ⇐) Let α = min{f (x), f (y)}. Then both x ∈ BαU and y ∈ BαU
hold. Since BαU is a convex set, we get tx + (1 − t)y ∈ BαU , which implies that
f (tx + (1 − t)y) ≥ α = min{f (x), f (y)}. This proves quasi-concavity of f . 

This proposition states that a function f is quasi-concave iff for any α the
upper-level set of f is a convex set.

To check whether a given function f is quasi-concave, just check the prop-


erty in Theorem 6.4, which states that a function f is quasi-concave iff all of
its upper-level sets are convex sets; i.e., for any α, the set of points, where
f (x) ≥ α is true, is a convex set.

We will consider functions of a single variable x ∈ R. Recall that monotone


functions in R are both quasi-concave and quasi-convex, since both upper- and
lower-level sets are convex sets; in fact, they are intervals. Hence, monotone
functions are quasi-linear. Some examples follow.

Example 6.6. Consider the simplest case: f (x) = x, which is a strictly


increasing linear function (see Figure 6.3). 

Figure 6.3 Strictly increasing function. Figure 6.4 Quasi-concave function.


Example 6.7. Consider the function

f(x) = { x        if x ≤ 0,
       { 0        if 0 ≤ x ≤ 1,
       { x − 1    if x ≥ 1.

This function is not linear, but monotone increasing; thus, it is quasi-concave
(see Figure 6.4, where the horizontal part of the graph corresponds to the
interval [0, 1]). □

Example 6.8. Consider the function f (x) = x3 , which is strictly increas-


ing (see Figure 6.5). 
 2
 −x if x ≤ 0,

Example 6.9. Consider the function f (x) = 0 if 0 ≤ x ≤ 1,


−(x − 2)2 + 1 if x ≥ 1.
This function is neither increasing nor decreasing and has a horizontal part on
its graph (see Figure 6.6). This function is obviously quasi-concave since all
its upper-level sets are convex sets (intervals); it is not quasi-convex because
the lower-level sets are not convex sets. The plot of this function shows the
general shape of quasi-concave functions of a single variable. 
Note that each continuous quasi-concave function of a single variable be-
longs to one of the following classes of functions (Martos [1975]): (i) Either f
is a monotone increasing function on X, or a monotone decreasing function on
X; and (ii) there exist a, b ∈ X, a ≤ b, such that f is monotonically increasing
for x < a, constant for x ∈ [a, b], and monotonically decreasing for x > b.

Figure 6.5 Strictly increasing function. Figure 6.6 Quasi-concave function.

6.4 Three-Dimensional Case


We will consider an analogy with a mountain, and try to explain the concept
of concavity in reference to a high mountain by asking if the surface of the
mountain is concave. The answer is ‘yes’ if every straight line connecting any
two points on the surface lies everywhere on or under the surface. The ideal
situation is if the mountain was a perfect dome (semi-sphere); in this case
the condition is satisfied and the function defined by its surface is concave.
The condition is also satisfied if the mountain is a perfect cone, since in this
case every straight line connecting two points on its surface lies exactly on
the surface, and the function defined by the cone is concave.
The function in the case of a perfect dome or a cone has a common prop-
erty: each level line is a circle. Similarly, on a topological map of the above-
mentioned mountain, the set of points inside each contour (i.e., the set of
points at which the height of the mountain exceeds a given number) is a con-
vex set. The spacing of the contour lines differs, but the set of points inside
every contour has the same shape for each mountain: it is a disk, and each
such set is convex. However, not every mountain has this property, as one can

see from the topological map of any actual mountain. In fact, the contours of
mountains do not generally enclose convex sets.
In reality, mountains generally come in different forms. For example, a
mountain may be a deformation of a cone that gets progressively steeper at
higher altitudes, becoming harder to climb. In this case a straight line from
the top of the mountain to any other point on its surface will not lie on or
under the surface, but rather pass through the air. The function defined by
the surface of such a mountain is not concave.
Let us consider the surface of a mountain as function f (x, y), where x
denotes longitude and y the latitude. Then a contour is a level curve of f . A
function with the property that, for every value of a, the set of points (x, y)
such that f (x, y) ≥ a is a convex set is said to be quasi-concave. The set of
points (x, y) in this definition lie inside every contour on a topographical map.
Example 6.10. Let f(x, y) = −x^2 − y^2. The upper-level set of f for α is
the set of points (x, y) such that −x^2 − y^2 ≥ α, or x^2 + y^2 ≤ −α. Thus, if
α > 0, the upper-level set U_α is empty, whereas if α < 0, it is a disk of radius
(−α)^{1/2}. □

6.5 Multivariate Case


We can now generalize to Rn . Let f be a multivariate function defined on the
set S. Then f is said to be quasi-concave if, for any number a, the set of the
points for which f (x) ≥ a is a convex set.
Let f be a multivariate function defined on a set S. For any real number
a, the set Uα , defined by (6.1.1), is the upper-level set of f ∈ S for all α. In
the case of a mountain, Uα is the set of all points at which the altitude is at
least α.
Note that f is quasi-convex iff −f is quasi-concave. The notion of quasi-
concavity is weaker than that of convexity, in the following sense.
Theorem 6.5. Every concave function is quasi-concave, and every convex
function is quasi-convex.
Proof. Let the function be f, and the convex set on which it is defined
be S. Let a be a real number and let x and y be two points in the upper-level
set U_a, where x ∈ U_a and y ∈ U_a. We must show that U_a is convex, i.e., we
must show that for every t ∈ [0, 1] we have tx + (1 − t)y ∈ U_a.
First, note that the set S on which f is defined is convex; thus, we have
tx + (1 − t)y ∈ S, and so f is defined at the point tx + (1 − t)y. Next, the
concavity of f means that

f (tx + (1 − t)y) ≥ tf (x) + (1 − t)f (y). (6.5.1)

Moreover, since x ∈ U_a means f(x) ≥ a, and y ∈ U_a means f(y) ≥ a, we



have
tf (x) + (1 − t)f (y) ≥ ta + (1 − t)a = a. (6.5.2)

Then combining the inequalities (6.5.1) and (6.5.2) we get

f (tx + (1 − t)y) ≥ a,

which means that tx + (1 − t)y ∈ U_a, thereby proving that every upper-level


set is convex and hence f is quasi-concave. The other part of the theorem
follows simply by using the fact that −f is convex. 
Example 6.11. Consider the function f (x, y) = −x2 − y 2 . This is a
concave function, and also quasi-concave. Take ψ(y) = ey . Then, using
Theorem 6.3, the function g(x, y) = ef (x,y) will be quasi-concave (see Figure
6.7). Notice that although the function g(x, y) is obtained from a concave
function using a strictly monotonically increasing transformation, g is not at
all a concave function. 

Figure 6.7 Function g(x, y).

6.6 Sums of Quasi-Concave Functions


The following property holds for the sum of concave functions: If f_1, . . . , f_m
are concave functions and λ_i ≥ 0, i = 1, . . . , m, then F(x) = λ_1 f_1(x) + · · · +
λ_m f_m(x) is a concave function. But this property does not hold for quasi-
concave functions.
Example 6.12. The function g_1(x) = e^x is a strictly monotonically
increasing function and thus it is quasi-concave. Further, the function g_2(x) =
e^{−x} is a strictly monotonically decreasing function, and thus it is
also quasi-concave. But their sum g(x) = g_1(x) + g_2(x) is strictly convex
and not quasi-concave (see Figure 6.8). □

Figure 6.8 Function g1 (x) and g2 (x).
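The failure in Example 6.12 can be pinned down at specific points; a minimal sketch:

    import math

    # g(x) = e**x + e**(-x) violates the quasi-concavity inequality
    # min{g(x), g(y)} <= g(t*x + (1-t)*y) at x = -1, y = 1, t = 1/2:
    g = lambda x: math.exp(x) + math.exp(-x)
    x, y, t = -1.0, 1.0, 0.5
    print(g(x), g(y), g(t*x + (1 - t)*y))   # 3.086..., 3.086..., 2.0
    # g(0) = 2 < min(g(-1), g(1)), so g is not quasi-concave.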

Example 6.13. Consider the bell-shaped surfaces g(x, y) = e^{−x^2−y^2} and
h(x, y) = e^{−(x−1)^2−(y−1)^2}. Both functions are quasi-concave. Let their sum
be G(x, y) = g(x, y) + h(x, y). The function G(x, y) is not quasi-concave
as it consists of two disjoint bell-shaped surfaces, representing the functions
g(x, y) and h(x, y), respectively. The indifference curves and the upper-level
set (shaded region) of G(x, y) corresponding to α = 0.7 are shown in Figure
6.9. It is obvious that the upper-level set is not convex because it consists
of two parts. This example shows that, in general, the sum of quasi-concave
functions is not necessarily a quasi-concave function.

Figure 6.9 (a) Indifference curves, and (b) upper-level set of the function G(x, y).

6.7 Strictly Quasi-Concave Functions


Using the definition (3.8.8) of strict concavity of a function f , which states
that f is a strictly concave function over X if for any y, y ∈ X, x 6= y, and
for any t, 0 < t < 1, we have the inequality

f (tx + (1 − t)y) > tf (x) + (1 − t)f (y). (6.7.1)

This implies that

f (tx + (1 − t)y) > tf (x) + (1 − t)f (y) ≥ min{f (x), f (y)}, (6.7.2)

which leads to the generalization of concavity: f (x) is strictly quasi-concave


on X, if for all x, y ∈ X, x 6= y, 0 < t < 1,

min{f (x), f (y)} < f (tx + (1 − t)y}. (6.7.3)

f (x) is strictly quasi-convex on X ⇐⇒ −f is strictly quasi-concave on X.

Hence,

f (x) is strictly concave on X =⇒ f is strictly quasi-concave on X.

f (x) is strictly quasi-concave on X =⇒ f is quasi-concave on X.

A set C ⊂ Rn is said to be strictly convex if for any x, y ∈ C, 0 < t < 1 we


have tx + (1 − t)y ∈ Int(C), where Int(C) is the set of all interior points of C.
Consider the upper-level sets of strictly quasi-concave functions. Generally
one would think that strict quasi-concavity is equivalent to strict convexity of
the upper-level sets. This is partially true; however, we have
Theorem 6.6. If f is a continuous strictly quasi-concave function, then
its upper-level sets are either empty or they are strictly convex sets.
Proof. As defined earlier, let L^U_α := {x | f(x) ≥ α}. Since f is continuous,
any point y with f(y) > α must belong to Int(L^U_α). Since f is quasi-concave,
L^U_α is a convex set. If this set is not strictly convex, then there exist x, y ∈ L^U_α
and 0 < t < 1 such that x_t ≡ tx + (1 − t)y ∉ Int(L^U_α). Moreover, because of
the continuity of f, we must have f(x_t) = α. Again, since f is quasi-concave, we
have min{f(x), f(y)} ≤ f(x_t) = α. Hence, min{f(x), f(y)} = f(x_t), which
contradicts the strict quasi-concavity of f. □
Note that the converse of this theorem is not true, because there exist
functions with all their nonempty upper-level sets which are strictly convex
but not strictly quasi-concave.

Example 6.14. The function

f̂(x) = { −x^2              if x ≤ 0,
        { −(x − 1)^2 + 1    if x ≥ 0

is strictly quasi-concave (see Figure 6.10). □

Figure 6.10 Function f̂(x).

Theorem 6.7. Let f be a strictly quasi-concave function and let Ψ be
a strictly increasing function defined on R(f). Then F(x) := Ψ(f(x)) is a
strictly quasi-concave function.
Proof. For any x_1, x_2 ∈ X, t ∈ (0, 1) and x_t = tx_1 + (1 − t)x_2, we have

F(x_t) = Ψ(f(x_t)) = Ψ(f(tx_1 + (1 − t)x_2))
       > Ψ(min{f(x_1), f(x_2)}) = min{Ψ(f(x_1)), Ψ(f(x_2))}
       = min{F(x_1), F(x_2)},

which proves the strict quasi-concavity of F. □

It seems from these situations that the graphs of strictly quasi-concave


functions must be ‘curved.’ But, in general this is not true, as shown, for
example, by the function f1 (Figure 6.3) which is a strictly quasi-concave linear
function. Thus, strictly concave functions are also strictly quasi-concave: the
function f1 shows that there are strictly quasi-concave functions which are
concave but not strictly concave.

6.7.1 Sums of Strictly Quasi-Concave Functions. If g_1, . . . , g_m are
strictly concave functions and k_i > 0, i = 1, . . . , m, then F(x) := k_1 g_1(x) + · · · +
k_m g_m(x) is a strictly concave function. However, this result does not hold for
strictly quasi-concave functions. In fact, in general, we have the following
example.
Example 6.15. Consider g(x) = g_1(x) + g_2(x) = e^x + e^{−x}, which is
the sum of two strictly monotone (and therefore two strictly quasi-concave)
functions; but g(x) is not strictly quasi-concave; in fact, it is not even quasi-
concave, although it is strictly convex (see Figure 6.8). □

6.8 Quasi-Concave Programming


We will consider the optimization problem:

max{f (x)}, x ∈ X. (6.8.1)

First, assume that f is a quasi-concave function. Then the problem (6.8.1)


may have local maxima, which are not global. For example, consider the
function f4 (Figure 6.6): for this function, any x ∈ (0, 1) is a local max-
imum whereas its global maximum is at x∗ = 2. The difficulty is due to
the horizontal parts on its graph, which can be eliminated by requiring strict
quasi-concavity of the function. The following result is well known.
Theorem 6.8. Let X be a nonempty convex set and assume that f is
strictly quasi-concave. Then any local maximum is a global solution of the
problem (6.8.1).
Proof. Assume that x̄ ∈ X is a local maximum, i.e., there exists an ε > 0
such that f (x) ≤ f (x̄) holds for any x ∈ X with kx − x̄k ≤ ε. Assume that x̄
is not a global maximum. Then there exists a y ∈ X such that f (y) > f (x̄).
Let xt := ty + (1 − t)x̄, 0 ≤ t ≤ 1. Then the convexity of X implies that
xt ∈ X for all t ∈ [0, 1]. On the other hand, if t is small enough, we clearly
have kxt − x̄k ≤ ε. Using strict quasi-concavity of f , we get

f (xt ) = f (ty + (1 − t)x̄) > min{f (x̄), f (y)} = f (x̄),

which holds for t > 0 and t small enough. But this contradicts the assumption
that x̄ is a local maximum. 
Note that the strict quasi-concavity of f also implies that optimal solutions
of the problem (6.8.1) are unique provided they exist. This fact has impor-
tant implications in economics: for example, if strictly quasi-concave utility
functions are used, the solution of consumer’s optimization problem will be
unique.
Theorem 6.9. Let X be a nonempty convex set and assume that f is
strictly quasi-concave. In case an optimal solution of the problem (6.8.1)
exists, it is unique.
Proof. Assume that x∗ ∈ X is an optimal solution of the problem (6.8.1),
and let y ∈ X, y ≠ x∗ be another maximum. Then we have f(x∗) = f(y) and
the strict quasi-concavity of f implies that

f (tx∗ + (1 − t)y) > min{f (x∗ ), f (y)} = f (x∗ )

holds for any t ∈ (0, 1). But this contradicts the optimality of x∗ since, in
view of convexity of X, we get tx∗ + (1 − t)y ∈ X for t ∈ [0, 1]. 

Next, we assume that

X = {x ∈ Rn | g1 (x) ≥ 0, . . . , gm (x) ≥ 0},

where each gi , i = 1, . . . , m, is quasi-concave. Then X is a convex set. In


fact, {x | gi (x) ≥ 0} is an upper-level set of gi that corresponds to α = 0, and
so it is convex for all i. We have
X = ∩_{i=1}^{m} {x | g_i(x) ≥ 0},

and thus, X is a convex set.


Now we formulate the general quasi-concave optimization problem:

max{f(x) | g_i(x) ≥ 0, i = 1, . . . , m},   (6.8.2)

where f, g_1, . . . , g_m are quasi-concave functions. In view of the above dis-
cussion, the set of feasible (admissible) solutions of the problem (6.8.2) is a
convex set. For the objective function we have assumed only quasi-concavity,
and hence, the problem (6.8.2) may have local solutions which are not global.
If we assume that f is strictly quasi-concave, then the problem (6.8.2) has a
unique optimal solution and no local maximum exists. Assuming continuous
differentiability, the Karush-Kuhn-Tucker (KKT) conditions for the problem
(6.8.2) are

∂f(x)/∂x_i + Σ_{j=1}^{m} t_j ∂g_j(x)/∂x_i = 0,   i = 1, . . . , n,
g_i(x) ≥ 0,   i = 1, . . . , m,   (6.8.3)
t_i g_i(x) = 0,   i = 1, . . . , m,
t_i ≥ 0,   i = 1, . . . , m.

These are necessary conditions of optimality under the usual regularity con-
ditions for general nonlinear programming problems. More details about the
KKT conditions are described in §4.3. Besides these conditions, we have the
following results especially for quasi-concave optimization problems.
Theorem 6.10. (Arrow and Enthoven [1961]) Assume that g1 , . . . , gm are
quasi-concave functions and that the following regularity conditions hold:
(a) there exists an x̄ ∈ Rn such that gi (x̄) > 0 for all i = 1, . . . , m (Slater
condition), and
(b) for each i, either g_i is concave or otherwise ∂g_i/∂x ≠ 0, i = 1, . . . , m, for
each feasible (admissible) solution of the problem (6.8.2).

Then, if x∗ is a locally optimal solution of the problem (6.8.2), there exists


a t∗ such that with (x∗ , t∗ ) the KKT conditions (6.8.3) hold.
The KKT conditions are also sufficient optimality conditions under appro-
priate assumptions (see §4.3). The following theorem is useful.
Theorem 6.11. (Arrow and Enthoven [1961]) Assume that f, g_1, . . . , g_m
are quasi-concave functions and that the Kuhn-Tucker conditions (6.8.3) hold
for (x∗, t∗). If f is twice continuously differentiable on the feasible (admissible)
set and if ∇f(x∗) ≠ 0, then x∗ is an optimal solution of the problem (6.8.2).
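In practice, conditions (6.8.3) translate into a residual check at a candidate pair (x∗, t∗). The sketch below (illustrative, assuming NumPy; the instance is Exercise 5.8 recast in the max f, g_i ≥ 0 form used here) verifies stationarity, feasibility, and complementary slackness:

    import numpy as np

    # Instance: maximize -(x^2 + 2y^2) subject to g1 = x + y - 3 >= 0,
    # g2 = y - x^2 - 1 >= 0; candidate x* = (1, 2) with multipliers t* = (6, 2).
    grad_f = lambda v: np.array([-2*v[0], -4*v[1]])
    g      = lambda v: np.array([v[0] + v[1] - 3.0, v[1] - v[0]**2 - 1.0])
    grad_g = lambda v: np.array([[1.0, 1.0], [-2*v[0], 1.0]])   # rows: grad g1, grad g2

    x, t = np.array([1.0, 2.0]), np.array([6.0, 2.0])
    print(grad_f(x) + grad_g(x).T @ t)   # stationarity residual: [0. 0.]
    print(g(x), t * g(x), t >= 0)        # feasibility, t_i*g_i = 0, and t >= 0 all hold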
Example 6.16. Suppose there is a 1-unit change (decrease) in the constant
of the constraint in Exercise 5.3. We want to determine what change it will make
in L and λ∗. With the new constraint g(x, y) = x + y − 27, we have

L(x, y, λ) = 4x^2 + 3xy + 6y^2 + λ(27 − x − y),

which gives L_x = 8x + 3y − λ = 0, L_y = 3x + 12y − λ = 0, L_λ = 27 −
x − y = 0. Solving these equations simultaneously we get x∗ = 17.36, y∗ =
9.64, λ∗ = 167.78; thus, L(17.36, 9.64) = 2265.10 = f(17.36, 9.64), which is
approximately 171 smaller than the previous value of L = 2436 and close to
the previous value of λ∗ = 174. □
This is the reason why Lagrange multipliers are called shadow prices. Also,
in utility optimization subject to a budget constraint, the value of λ will
estimate the marginal utility of an extra dollar of income.
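Re-solving the linear first-order system for both right-hand sides makes the shadow-price effect explicit; a sketch (illustrative, assuming NumPy is available):

    import numpy as np

    A = np.array([[8.0,  3.0, -1.0],   # L_x: 8x + 3y - lam = 0
                  [3.0, 12.0, -1.0],   # L_y: 3x + 12y - lam = 0
                  [1.0,  1.0,  0.0]])  # constraint: x + y = c
    F = lambda x, y: 4*x**2 + 3*x*y + 6*y**2

    for c in (28.0, 27.0):
        x, y, lam = np.linalg.solve(A, [0.0, 0.0, c])
        print(c, round(x, 2), round(y, 2), round(lam, 2), round(F(x, y), 2))
    # The optimal value falls by about 171, close to lam*, as the
    # shadow-price interpretation predicts.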
Example 6.17. Maximize the utility function u(x, y) = x^{1/2}y^{3/5} subject
to the budget constraint 3x + 9y = 66. Define

L(x, y) = x^{1/2}y^{3/5} + λ(66 − 3x − 9y).

Then L_x = (1/2)x^{−1/2}y^{3/5} − 3λ = 0, L_y = (3/5)x^{1/2}y^{−2/5} − 9λ = 0, L_λ = 66 −
3x − 9y = 0. Solving these equations simultaneously, we get the critical
values: x∗ = 10, y∗ = 4, λ∗ ≈ 0.12. The second-order partial derivatives are:
L_xx = −(1/4)x^{−3/2}y^{3/5}, L_xy = (3/10)x^{−1/2}y^{−2/5} = L_yx, L_yy = −(6/25)x^{1/2}y^{−7/5}.
Then the bordered Hessian is

       | −(1/4)x^{−3/2}y^{3/5}     (3/10)x^{−1/2}y^{−2/5}    3 |
|H̄| = |  (3/10)x^{−1/2}y^{−2/5}  −(6/25)x^{1/2}y^{−7/5}     9 |
       |  3                          9                         0 |.

The second principal minor is

|H̄_2| = |H̄| = (81/4)x^{−3/2}y^{3/5} + (81/5)x^{−1/2}y^{−2/5} + (54/25)x^{1/2}y^{−7/5} > 0,

since all terms are positive. Hence, |H̄| is negative definite, and L is maximized
at the critical values. □

6.9 Summary
Recall that a concave function f over X satisfies the inequality (3.2.2), i.e.,

f (tx + (1 − t)y) ≥ tf (x) + (1 − t)f (y)

for any x, y ∈ X and for t ∈ [0, 1]. From this inequality we obtain

f (tx + (1 − t)y) ≥ tf (x) + (1 − t)f (y) ≥ min{f (x), f (y)}.

This leads to the following generalization of concavity: A function f is called


quasi-concave on X, if for all x, y ∈ X and for all t ∈ [0, 1],

min{f (x), f (y)} ≤ f (tx + (1 − t)y),

or equivalently

f (x) ≤ f (y) implies that f (x) ≤ f (tx + (1 − t)y).

The definition of quasi-convexity is analogous: If f is a convex function over
X then for all x, y ∈ X and for all t ∈ [0, 1],

f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y) ≤ max{f(x), f(y)},

which leads to the following generalization: A function f is called quasi-convex


on X, for all x, y ∈ X and for all t ∈ [0, 1],

max{f (x), f (y)} ≥ f (tx + (1 − t)y),

or equivalently f (x) ≥ f (y) implies f (x) ≥ f (tx + (1 − t)y). This defini-


tion of quasi-convexity is a consequence of the fact that if f is concave then
−f is convex. Moreover, a function f is quasi-linear if it is both quasi-concave
and quasi-convex. The following relations hold:
f is concave over X =⇒ f is quasi-concave over X.
f is convex over X =⇒ f is quasi-convex over X.
f is linear over X =⇒ f is quasi-linear over X.
f is quasi-concave over X =⇒ −f is quasi-convex over X.
f is quasi-convex over X =⇒ −f is quasi-concave over X.
− min{f (x), f (y)} = max{−f (x), −f (y)}.
− max{f (x), f (y)} = min{−f (x), −f (y)}.
The empty set is considered as convex, by convention.

6.10 Exercises
6.1. Is every quasi-concave function concave? If so, prove it; otherwise
provide a counterexample.
Ans. Not every quasi-concave function is concave. Counterexample:
Define the function f(x) = x^2 with dom(f) = R_+ (the set of nonnegative real
numbers). We must show that (i) this function f is quasi-concave, and (ii) f
is not concave. To see (i), note that f is a strictly increasing function on R_+.
Thus, if f(x′) ≥ f(x) for x, x′ ∈ dom(f), then x′ ≥ x, and therefore, for any
t ∈ [0, 1], we have tx′ + (1 − t)x ≥ x. Hence, f is quasi-concave. To see (ii), note
that f(0) = 0, f(2) = 4, but f((1/2)·0 + (1/2)·2) = f(1) = 1 < (1/2)f(0) + (1/2)f(2) = 2,
which cannot be true if f is a concave function. Note that f(x) = x^2 defined
on the entire real line would not be quasi-concave (why?).

6.2. Let f and g be real-valued concave functions with the same domain
D. Define a function h such that for all x ∈ D, h(x) = f (x) + g(x). Is h a
concave function?
Solution. Since f and g are both concave functions with domain D, we
have for x, x′ ∈ D and for all t ∈ [0, 1],

f (tx+(1−t)x′ ) ≥ tf (x)+(1−t)f (x′ ), g(tx+(1−t)x′ ) ≥ tg(x)+(1−t)g(x′ ).

Thus, from these two inequalities we get

h(tx + (1 − t)x′) = f(tx + (1 − t)x′) + g(tx + (1 − t)x′)
                  ≥ tf(x) + (1 − t)f(x′) + tg(x) + (1 − t)g(x′)
                  = t(f(x) + g(x)) + (1 − t)(f(x′) + g(x′))
                  = th(x) + (1 − t)h(x′),

which holds for all t ∈ [0, 1], i.e., h is a concave function.

6.3. Let f and g be real-valued concave functions defined on the same


domain D. Define a function h so that h(x) = f (x)g(x) for all x ∈ D. (a) Is h a
concave function? If so, prove it; otherwise provide a counterexample. (b) Is h
a quasi-concave function? If so, prove it; otherwise provide a counterexample.
Solution. (a) h is not necessarily concave. For example, let D be the real
line, and let f(x) = x = g(x). Both f and g are concave functions. But since
h(x) = x^2, with h″(x) = 2 > 0, the function h is not a concave function. (For h
to be concave, h″(x) must be negative or zero everywhere on D.)
(b) h is not in general quasi-concave, except in the case when both f(x) > 0
and g(x) > 0 for all x ∈ D. If this is the case, then log h(x) = log f(x) +
log g(x), where both log f(x) and log g(x) are concave functions, by virtue of
the property that an increasing concave function of a concave function is
concave. Thus, log h(x), being the sum of two concave functions, is concave.
Hence, h(x) is a monotone transformation of a concave function, and therefore,
it is quasi-concave. On the other hand, if one of the functions f and g is
negative valued, then log h(x) is not well-defined, and so the above argument
fails. In fact, h(x) is, in general, not quasi-concave even if both f and g are
concave. For example, let D be the set of all real numbers R, and f(x) = −1
and g(x) = −x^2 for all x ∈ R, whence h(x) = −g(x) = x^2. Note that although
both of the functions f and g are concave, since f″(x) ≤ 0 and g″(x) < 0, the
function h(x) is not quasi-concave because h(1) = 1 = h(−1), but
h((1/2)·1 + (1/2)·(−1)) = h(0) = 0 < h(1).

6.4 A consumer utility function is defined as

u(x, y) = min{v1 (x, y), v2 (x, y)},

where v1 and v2 are both quasi-concave functions. Is u quasi-concave? If it


is, prove it; otherwise provide a counterexample.
Solution. By assumption v_i, i = 1, 2, are quasi-concave functions, which means
that for t ∈ [0, 1],

v_i(tx + (1 − t)y) ≥ min{v_i(x), v_i(y)}, i = 1, 2.

Now,

u(tx + (1 − t)y) = min{v_1(tx + (1 − t)y), v_2(tx + (1 − t)y)}


≥ min{min{v1 (x), v1 (y)}, min{v2 (x), v2 (y)}}
= min{min{v1 (x), v2 (x)}, min{v1 (y), v2 (y)}}
= min{u(x), u(y)},

which implies that u is quasi-concave.

6.5. Use Lagrange multipliers to optimize f (x, y, z) = xyz 2 subject to the


constraint x + y + z = 20. Define

L(x, y, z) = xyz 2 + λ(20 − x − y − z).

Then Lx = yz² − λ = 0, Ly = xz² − λ = 0, Lz = 2xyz − λ = 0, Lλ = 20 − x − y − z = 0. To solve these equations simultaneously, we equate λ from the first two equations, and from the first and the third equation, giving:

yz² = xz², yz² = 2xyz,

or y = x and z = 2x. Substituting these in the fourth equation we get:


20 − x − x − 2x = 0, or x∗ = 5, which gives y ∗ = 5, z ∗ = 10, λ∗ = 500 as
critical values. Thus, L(5, 5, 10) = 2500.
The second-order derivatives are: Lxx = 0, Lyy = 0, Lzz = 2xy, Lxy = z², Lyz = 2xz, Lxz = 2yz. Also, from g(x, y, z) = x + y + z − 20, we get gx = 1 = gy = gz. Then the bordered Hessian from Eq (1.6.7), using the second form, is

        | 0    1     1     1   |
|H̄| =   | 1    0     z²    2yz |
        | 1    z²    0     2xz |
        | 1    2yz   2xz   2xy | .
The second principal minor is |H̄2| = 0 − 1(−z²) + 1(z²) = 2z². Thus, |H̄2| evaluated at z = 10 is 200 > 0. The third principal minor is

|H̄3| = |H̄| = 0 − 1·D1 + 1·D2 − 1·D3, where

D1 = | 1 z² 2yz ; 1 0 2xz ; 1 2xz 2xy |, D2 = | 1 0 2yz ; 1 z² 2xz ; 1 2yz 2xy |, D3 = | 1 0 z² ; 1 z² 0 ; 1 2yz 2xz |.

Hence

|H̄3| = −[1(0 − 2xz · 2xz) − z²(2xy − 2xz) + 2yz(2xz − 0)]
      + [1(z² · 2xy − 2yz · 2xz) − 0 + 2yz(2yz − z²)]
      − [1(z² · 2xz − 0) − 0 + z²(2yz − z²)]
      = z⁴ − 4xz³ − 4yz³ − 4xyz² + 4x²z² + 4y²z².

Thus, |H̄3| evaluated at (5, 5, 10) is −20000 < 0. Hence, |H̄2| > 0 and |H̄3| < 0 imply that the bordered Hessian is negative definite, and the function f is maximized at the critical values. 

6.6. Minimize the total costs defined by c = 15x² + 30xy + 30y² when the firm meets the quota g(x, y) = 2x + 3y = 20. Define

L(x, y) = 15x² + 30xy + 30y² + λ(20 − 2x − 3y).

Then Lx = 30x + 30y − 2λ = 0, Ly = 30x + 60y − 3λ = 0, Lλ = 20 − 2x − 3y.


Solving these three equations simultaneously, we get the critical values: x∗ =
4, y ∗ = 4, λ∗ = 120.
The second-order partial derivatives are: Lxx = 30, Lyy = 60, Lxy = 30 =
Lyx , and gx = 2, gy = 3. Thus, the bordered Hessian (1.6.7) is

30 30 2
|H| = 30 60 3 .
2 3 0

The second principal minor is |H̄2| = |H̄| = −150 < 0. Thus, the bordered Hessian is positive definite and the cost is minimized at the critical values x∗ = y∗ = 4. 

6.7. Maximize the utility function u = Q1 Q2 when P1 = 1, P2 = 3, and


the firm’s budget is B = 60. Also estimate the effect of a 1-unit increase in
the budget. The budget constraint is Q1 + Q2 = 60, and the constraint is
Q1 + 3Q2 = 60. We consider the Lagrangian

L = Q1 Q2 + λ(60 − Q1 − 3Q2 ).

The first-order partial derivatives equated to zero give: LQ1 = Q2 − λ = 0, LQ2 = Q1 − 3λ = 0, Lλ = 60 − Q1 − 3Q2 = 0. Solving these equations
simultaneously we obtain the critical values: Q∗1 = 30, Q∗2 = 10 = λ∗ . The
second-order partial derivatives are: LQ1 Q1 = 0 = LQ2 Q2 , LQ1 Q2 = 1 =
LQ2Q1, and the constraint derivatives are gQ1 = 1, gQ2 = 3, giving the bordered Hessian

        | 0  1  3 |
|H̄| =   | 1  0  1 |  = 6 > 0.
        | 3  1  0 |

Hence, |H̄| > 0, and L is maximized at the critical values.


With λ∗ = 10, a $1 increase in the budget will change the constant of the constraint to 61, so that the new Lagrangian is

L = Q1Q2 + λ(61 − Q1 − 3Q2),

which yields LQ1 = Q2 − λ = 0, LQ2 = Q1 − 3λ = 0, Lλ = 61 − Q1 − 3Q2 = 0. Solving these equations simultaneously we obtain the critical values
Q∗1 = 30.5, Q∗2 = 10.167 = λ∗ . Thus, the utility function increases from
u = (30)(10) = 300 to u = (30.5)(10.167) = 310.083, i.e., there is an increase
in the utility function of about 10. 
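The effect of the budget increase is also easy to confirm numerically; the following plain Python sketch (illustrative, not part of the original text) eliminates Q1 via the budget constraint and maximizes the resulting parabola:

    def max_utility(B):
        # With Q1 = B - 3*Q2, u = (B - 3*Q2)*Q2 is a downward parabola in Q2,
        # maximized at Q2 = B/6.
        Q2 = B / 6.0
        Q1 = B - 3.0 * Q2
        return Q1, Q2, Q1 * Q2

    for B in (60, 61):
        print(B, max_utility(B))
    # 60 (30.0, 10.0, 300.0)
    # 61 (30.5, 10.167, 310.083)
    # The increase 310.083 - 300 = 10.083 is close to lambda* = 10.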

6.8. Let f be a function of n variables with continuous partial derivatives


of all orders in an open convex set S, and let |B|k be the determinant of its kth
order bordered Hessian B. Verify that the criteria to check quasi-concavity
and quasi-convexity, described in §6.2, can be summarized as follows.
(i) If f is quasi-concave, then

|B|1 (x) ≤ 0, |B|2 (x) ≥ 0, . . . , |B|n (x) ≤ 0 if n is odd,
|B|1 (x) ≤ 0, |B|2 (x) ≥ 0, . . . , |B|n (x) ≥ 0 if n is even,

for all x ∈ S;
(ii) if f is quasi-convex, then |B|k (x) ≤ 0 for all x ∈ S, k = 1, 2, . . . , n;
(iii) if for all x ∈ S,

|B|1 (x) < 0, |B|2 (x) > 0, . . . , |B|n (x) < 0 if n is odd,
|B|1 (x) < 0, |B|2 (x) > 0, . . . , |B|n (x) > 0 if n is even,

then f is quasi-concave; and


(iv) if |B|k (x) < 0 for all x ∈ S, k = 1, 2, . . . , n, then f is quasi-convex.
This theorem can also be stated as follows:

|B|1 (x) ≤ 0, |B|2 (x) ≥ 0, . . . , |B|n (x) ≤ 0 if n is odd,
|B|1 (x) ≤ 0, |B|2 (x) ≥ 0, . . . , |B|n (x) ≥ 0 if n is even,
for all x ∈ S, is a necessary condition for quasi-concavity, and

|B|1 (x) < 0, |B|2 (x) > 0, . . . , |B|n (x) < 0 if n is odd,
|B|1 (x) < 0, |B|2 (x) > 0, . . . , |B|n (x) > 0 if n is even,

for all x ∈ S, is a sufficient condition for quasi-concavity.


Note that unlike the analogous result for concave functions, the above conditions do not cover all cases of quasi-concave functions. For example, in the case when |B|k(x) ≤ 0 for all k ≤ n but |B|j(x) = 0 for some j and some x ∈ S, the above result admits the possibility that the function f is quasi-convex, but it cannot confirm whether it is.

6.9. Prove that a function f of a single variable defined on an interval I


is quasi-concave iff there exists a number x′ such that f is nondecreasing on
{x ∈ I : x < x′ } and nonincreasing on {x ∈ I : x > x′ }. Hint. Suppose
f satisfies the condition and suppose that for some number α the points x1 and x2 belong to the upper-level set Uα. Then f(x1) ≥ α and f(x2) ≥ α, so that f(x) ≥ α for every point x between x1 and x2. Thus, x ∈ Uα, so that Uα is convex and therefore, f is quasi-concave. If f does not satisfy the condition, we can find x1, x2 and x3 in I such that x1 < x2 < x3 and f(x2) < min{f(x1), f(x3)}. Then the upper-level set Uα for α = min{f(x1), f(x3)} includes x1 and x3, but not x2, and therefore, Uα is not convex, and f is not quasi-concave.

6.10. Prove that a differentiable function f of n variables defined on a


convex set S is quasi-concave on S iff

x, x′ ∈ S, and f(x) ≥ f(x′) =⇒ ∑_{j=1}^{n} fj′(x′)(xj − xj′) ≥ 0.

Hint. Use (6.2.1).

6.11. Prove that all extrema (critical points) of a concave function are global maxima. Hint. The definition of concavity f(tx + (1 − t)y) ≥ tf(x) + (1 − t)f(y) can be written as f(y + t(x − y)) ≥ f(y) + t[f(x) − f(y)], or

[f(y + t(x − y)) − f(y)] / t ≥ f(x) − f(y).

Taking the limit as t → 0, and noticing that the right-hand side does not depend on t, we get

lim_{t→0} [f(y + t(x − y)) − f(y)] / t = f′(y)(x − y) ≥ f(x) − f(y).   (6.10.2)

Suppose y is a critical point, so that f′(y) = 0. Then Eq (6.10.2) gives f(x) − f(y) ≤ 0, which implies that f(x) ≤ f(y) for all x. Hence, the value of f at any other point never exceeds its value at the critical point y, i.e., y is a global maximum of f.

6.12. Prove that a monotone composite transformation (g ◦ f)(x) = g(f(x)) of a quasi-concave function f is itself quasi-concave.
Proof. Let α ∈ R. Since g is monotonically increasing, there exists an α′ ∈ R such that α = g(α′). Then the upper-level set of g ◦ f,

Uα(g ◦ f) = {x : g(f(x)) ≥ α} = {x : g(f(x)) ≥ g(α′)} = {x : f(x) ≥ α′} = Uα′(f),

is a convex set. Note that a composite transformation of a concave function is not necessarily concave; for example, f(x) = x is concave, g(x) = x³ is monotonically increasing, but g(f(x)) = x³ is not concave on R.

6.13. Prove that every Cobb-Douglas utility function u(x, y) = Ax^a y^b, a, b > 0, is quasi-concave.
Hint. We know that the DRS (decreasing returns to scale) Cobb-Douglas function, such as f(x, y) = x^{1/3} y^{1/3}, is concave. Also, an IRS (increasing returns to scale) Cobb-Douglas function, such as g(x, y) = x^{2/3} y^{2/3}, is quasi-concave. This is because the IRS Cobb-Douglas function is a monotonic composite transformation of the DRS Cobb-Douglas function:

x^{2/3} y^{2/3} = (x^{1/3} y^{1/3})²,

so x^{2/3} y^{2/3} = g(f(x, y)), where f(x, y) = x^{1/3} y^{1/3} and g(z) = z², where g(z) is a monotonic transformation.

Similarly, any CES utility function u(x, y) = (ax^r + by^r)^{1/r}, 0 < r < 1, is quasi-concave, since u(x, y) = g(h(x, y)), where h(x, y) = ax^r + by^r is a concave function because it is a positive linear combination of concave functions, and g(z) = z^{1/r} is a monotonic transformation.

6.14. Show, by an example, that quasi-concave functions do not have the


same implications for continuity and differentiability as concave functions.

Ans. Let f : R₊ → R be defined by

f(x) = x³ if 0 ≤ x ≤ 1;  f(x) = 1 if 1 ≤ x ≤ 2;  f(x) = x³ if x > 2.

Since f is non-decreasing, it is both quasi-concave and quasi-convex on R₊. But f is discontinuous at x = 2, and not differentiable there. Moreover, f is constant on (1, 2), and thus, every point in this open interval is a relative maximum as well as a relative minimum. However, no point in the interval (1, 2) is either a global maximum or a global minimum. Finally, f′(0) = 0, but 0 is neither a relative maximum nor a relative minimum, and f is not differentiable at x = 1.

6.15. Show that the function f : R²₊ → R given by f(x, y) = 50 ln x ln y is strictly concave on [e, ∞) × [e, ∞) ⊂ R²₊, but only quasi-concave on the larger domain [1, ∞) × [1, ∞). Hint: Use a 3-D plot. Show that the second principal minor |H2| = fxx fyy − (fxy)² = 0 at the point (e, e).
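A quick symbolic check of the hint (a sympy sketch, added for illustration and not part of the original exercise):

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    f = 50 * sp.log(x) * sp.log(y)
    H2 = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2
    print(sp.simplify(H2))              # 2500*(log(x)*log(y) - 1)/(x**2*y**2)
    print(H2.subs({x: sp.E, y: sp.E}))  # 0: the minor vanishes at (e, e)

The minor is positive where ln x ln y > 1, e.g., on (e, ∞) × (e, ∞), and negative near (1, 1), which is why strict concavity fails on the larger domain.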
7 Quasi-Convex Functions

Since a function f is quasi-convex in a domain D ⊂ Rⁿ iff −f is quasi-concave in that domain, we can repeat most of the results of the previous

concave in that domain, we can repeat most of the results of the previous
chapter by replacing the upper-level set by the lower-level set, the ≥ sign
by ≤, and the operation ‘min’ by ‘max’. However, there are some useful
and interesting results in quasi-convex function theory, and we will risk some
repetition and provide all relevant information on the topic.

7.1 Quasi-Convex Functions


A real-valued function f : Rⁿ → R is quasi-convex if dom(f) is convex and the lower-level sets Lα = {x ∈ dom(f) | f(x) ≤ α} are convex for all α; equivalently, for all x, x′ ∈ dom(f) and all t ∈ [0, 1],

f(tx + (1 − t)x′) ≤ max{f(x), f(x′)}.   (7.1.1)

A real-valued function f : Rⁿ → R is strictly quasi-convex iff

f(tx + (1 − t)x′) < max{f(x), f(x′)},   (7.1.2)

for all x, x′ ∈ dom(f) with f(x) ≠ f(x′) and all t ∈ (0, 1). The inequality (7.1.1) is the defining property of a quasi-convex function; an accompanying property is that the negative of a (quasi-)convex function is a (quasi-)concave function. Since 'quasi' means 'as if', we expect quasi-convex functions to have some special properties similar to those for convex functions (and similarly in the case of quasi-concave functions). Moreover, since every convex function is quasi-convex,
we expect the convex functions to be more highly structured. Although De
Finetti [1949] was the first person to recognize some of these characteristics of
functions having convex level sets, it was Fenchel [1983] who was the pioneer
in formalizing, naming, and developing the class of quasi-convex functions.
Later, Slater [1950] generalized the KKT saddle-point equivalence theorem,
and Arrow and Enthoven [1961] laid the foundation of quasi-convex program-
ming with applications to consumer demand.

A strictly quasi-convex function need not be quasi-convex. An example is f(x) = 1 if x = 0, and f(x) = 0 if x ≠ 0. The lower-level set Lα = {x : f(x) ≤ 0} for α = 0 is not convex, but f is strictly quasi-convex (the strict inequality (7.1.2) is imposed only for pairs with f(x) ≠ f(x′)).
Note that it is not proper to define strict quasi-convexity by requiring that
the lower contour sets should be strictly convex, because a lower contour set
can be strictly convex even when f has some flat portions. Further, a function
is strictly quasi-convex iff −f is strictly quasi-concave; and a strictly quasi-
convex function is quasi-convex. The lower-level set Lα and quasi-convex
functions are presented in Figure 7.1. Also, f is said to be quasi-linear if it is
quasi-convex and quasi-concave.

Figure 7.1 Quasi-convex function.

Thus, we can prove that a convex function is quasi-convex. The proof is as follows: Let the function f have the domain S (a convex set). Let α be a real number and x and y be points in the lower-level set Lα. First, we show that the set Lα is convex. For this, we need to show that for every t ∈ [0, 1] we have tx + (1 − t)y ∈ Lα. Since S on which f is defined is convex, we have tx + (1 − t)y ∈ S, and thus f is defined at the point tx + (1 − t)y. Now, convexity of f implies that f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y). Moreover, the fact that x ∈ Lα means that f(x) ≤ α, and similarly, y ∈ Lα means that f(y) ≤ α. Hence, tf(x) + (1 − t)f(y) ≤ tα + (1 − t)α = α. Combining these two inequalities we get f(tx + (1 − t)y) ≤ α, so that tx + (1 − t)y ∈ Lα. Thus, every lower-level set is convex and hence, f is quasi-convex. 
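Definition (7.1.1) also lends itself to a direct numerical test: random sampling cannot prove quasi-convexity, but a single violation disproves it. A small, purely illustrative Python sketch:

    import random

    def seems_quasiconvex(f, lo, hi, trials=20000):
        # Look for a violation of f(t*x + (1-t)*y) <= max(f(x), f(y)).
        for _ in range(trials):
            x, y = random.uniform(lo, hi), random.uniform(lo, hi)
            t = random.random()
            if f(t*x + (1 - t)*y) > max(f(x), f(y)) + 1e-9:
                return False     # a definite counterexample was found
        return True              # only suggestive, not a proof

    print(seems_quasiconvex(lambda x: x*x, -5, 5))    # True: convex, hence quasi-convex
    print(seems_quasiconvex(lambda x: x**3, -5, 5))   # True: monotone
    print(seems_quasiconvex(lambda x: -x*x, -5, 5))   # False: -x^2 is not quasi-convex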
Note that quasi-convexity is weaker than convexity, in the sense that every convex function is quasi-convex, but not conversely. This is similar to the case that every concave function is quasi-concave. The following result relates to both quasi-concave
and quasi-convex functions.
Theorem 7.1. Let F be a function defined on Rn , and g be a function
defined on R. If F is quasi-concave and g is decreasing, then the function
f(x) = g(F(x)) is quasi-convex for all x.
Theorem 7.2. A function f defined on a convex set S ∈ Rn is quasi-convex
iff for all x, x′ ∈ S such that f(x) ≤ f(x′), we have for t ∈ [0, 1],

f (tx + (1 − t)x′ ) ≤ f (x′ ). (7.1.3)

Proof. First, suppose that f is quasi-convex. Let x, x′ ∈ S, and f (x) ≤


f (x′ ). Then for α = f (x′ ) we have x ∈ Lα and x′ ∈ Lα . Since f is quasi-
convex, Lα is convex, and thus, tx + (1 − t)x′ ∈ Lα for all t ∈ [0, 1]. Hence,
f(tx + (1 − t)x′) ≤ α = f(x′) for all t ∈ [0, 1].
Next, suppose that for all x, x′ ∈ S with f (x) ≤ f (x′ ) and for all t ∈ [0, 1],
we have f (tx + (1 − t)x′ ) ≤ f (x′ ). For any α, the set Lα is either empty,
in which case it is convex, or consists of a single point, in which case it is
convex, or contains two or more points, in which case we choose x, x′ ∈ Lα
with f(x) ≤ f(x′). Then f(tx + (1 − t)x′) ≤ f(x′) ≤ α for all t ∈ [0, 1],
because x′ ∈ Lα . Hence, tx + (1 − t)x′ ∈ Lα , so that Lα is convex, and f is
quasi-convex. 
Note that a similar theorem holds for quasi-concave functions (Exercise
6.2).
Example 7.1. Consider the function f(x, y) = √(x² + y²), which is convex and quasi-convex (see Figure 7.2).

Figure 7.2 f(x, y) = √(x² + y²).

Recall that the sum of two convex functions is a convex function. How-
ever, the sum of two quasi-convex functions is not necessarily a quasi-convex
function, i.e., if f and g are quasi-convex, then (f + g)(x) = f(x) + g(x) need

not be quasi-convex. Also, the sum of a convex function and a quasi-convex


function is not necessarily either a convex function or a quasi-convex function.
A concave function can be quasi-convex. For example, x ↦ log x is concave and quasi-convex.
Any monotone function is both quasi-convex and quasi-concave. More
generally, a function that decreases up to a point and increases thereafter is
quasi-convex (compare unimodality, i.e., there is a single highest value).
Let S be a convex subset of Rn , and let f : S 7→ R be a function. Then
the following statements are equivalent:
1. f is quasi-convex on S.
2. For all x, y ∈ S and all t ∈ (0, 1),

f (x) ≤ f (y) =⇒ f (tx + (1 − t)y) ≤ f (y). (7.1.4)

3. For all x, y ∈ S and all t ∈ (0, 1),

f (tx + (1 − t)y) ≤ max{f (x), f (y)}. (7.1.5)

Let f : Rn 7→ R be a quasi-convex function, and let g : R 7→ R be a


nondecreasing function whose domain contains R(f ). Then the composite
function g ◦ f : Rn 7→ R is a quasi-convex function.
Let f : Y 7→ R be a real-valued function defined on the open set Y ∈ Rn ,
and let X ⊂ Y be a convex set. We will assume that f is continuous or
differentiable on Y . The assumption that Y ⊂ Rn , but not Y = Rn , is
justified in view of the following two examples.
Example 7.2. Consider the Cobb-Douglas utility function u(x, y) = x^a y^b, a, b > 0, a + b = 1, with dom(u) = Rⁿ₊ = {x | xi ≥ 0, i = 1, . . . , n} with n = 2. Take a = b = 1/2; then u(−1, 1) = i · 1 = i, where i = √−1, so u is not real-valued on all of R², but is defined on Rⁿ₊, although the derivatives do not exist on the boundary. Hence, the function u(x, y) is differentiable on Y = Rⁿ₊₊ = {x | xi > 0, i = 1, . . . , n} with n = 2. 
Example 7.3. The additively separable logarithmic utility function u(x, y)
= log x + log y, with dom(u) = R2++ , is differentiable on Y = Rn++ = {x|xi >
0, i = 1, . . . , n} with n = 2. 
Quasi-convexity is a generalization of convexity. The set of all quasi-convex
(quasi-concave) functions contains the set of all convex (concave) functions.
Example 7.4. Let f : R 7→ R be an increasing function. Then f is both
quasi-concave and quasi-convex. To prove this, consider x, y ∈ R, and assume
without loss of generality that x > y. Then for any t ∈ (0, 1), we have

x > tx + (1 − t)y > y. (7.1.6)



Since f is increasing, we have

f(x) ≥ f(tx + (1 − t)y) ≥ f(y).   (7.1.7)

Since f(x) = max{f(x), f(y)}, the inequality (7.1.7) shows that f is quasi-convex; similarly, since f(y) = min{f(x), f(y)}, it shows that f is quasi-concave.
It is always possible to choose a nondecreasing function that is neither concave nor convex on R, like f(x) = x³. However, we have shown that not every quasi-convex function is convex.
Example 7.5. Consider the indirect utility function
v(p, m) ≡ max{u(x) | px ≤ m}, (7.1.8)
where v(p, m) denotes the decrease in the north-east direction; p is the com-
modity price and m the income (money) of a consumer; and u is a utility
function. Figure 7.3 shows that in the case when u is quasi-concave and
monotone in consumption bundles, then v is monotone increasing in m and
decreasing in p, and quasi-convex in (p, m). Its gradient is in the south-west
direction, and its lower contour set is the shaded region in Figure 7.3.

Figure 7.3 v(p, m).

Note that, contrary to quasi-concavity, the quasi-convexity provides a con-


dition that allows minimization of v(p, m) on the lower contour set {p : v(p, m) ≤
v̄}, such that the indirect utility function plays the duality role in the theory
of consumption and demand. 

7.2 Properties of Quasi-Convex Functions


1. f is quasi-convex iff it is quasi-convex on lines, i.e., f (x0 +th) is quasi-convex
in t for all x0 and h.
2. The modified Jensen’s inequality is defined as follows: f is quasi-convex iff
for all x, y ∈ dom(f ) and t ∈ [0, 1]
f (tx + (1 − t)y) ≤ max{f (x), f (y)}. (7.2.1)

This is presented in Figure 7.4.

Figure 7.4 Modified Jensen’s inequality.

3. Positive multiples: If f is quasi-convex and α ≥ 0, then αf is quasi-convex.


4. Pointwise maximum: If f1, f2 are quasi-convex, then max{f1, f2} is quasi-convex. This property extends to a supremum over an arbitrary set.
5. Affine transformation of the domain: If f is quasi-convex, then f (Ax + b)
is quasi-convex.
6. Composition with a monotone increasing function: If f is quasi-convex and
g a monotone increasing function, then (g ◦ f ) = g(f (x)) is quasi-convex.
7. If f(x, y) is quasi-convex jointly in (x, y), then g(x) = inf_y f(x, y) is quasi-convex in x.

8. In general, the sums of quasi-convex functions are not necessarily quasi-


convex.
9. For a convex function f with f(0) ≤ 0, the contraction property f(tx) ≤ tf(x) holds for t ∈ [0, 1]; it follows directly from f(tx) = f[tx + (1 − t)0] ≤ tf(x) + (1 − t)f(0) ≤ tf(x). For a quasi-convex function only the weaker upper bound f(tx) ≤ max{f(x), f(0)} = f(x) (when f(0) ≤ f(x)) is available; note that tf(x) < f(x) for t < 1 and f(x) > 0. Thus f(tx) ≤ f(x), but the contraction need not hold, in that f(tx) may be equal to f(x) for t ∈ (0, 1).
10. A differentiable function f on a convex domain is quasi-convex iff the
following first-order condition is satisfied:

f(y) ≤ f(x) =⇒ [∂f(x)/∂x] · (y − x) ≤ 0.   (7.2.2)

This is known as the first-order condition for quasi-convexity.


Theorem 7.3. A function f defined on a convex set S ⊂ Rⁿ is quasi-convex iff for all x, x′ ∈ S such that f(x) ≤ f(x′), we have for t ∈ [0, 1],

f(tx + (1 − t)x′) ≤ f(x′).   (7.2.3)

Proof. First, suppose that f is quasi-convex. Let x, x′ ∈ S, and f(x) ≤ f(x′). Then for α = f(x′) we have x ∈ Lα and x′ ∈ Lα. Since f is quasi-convex, Lα is convex, and thus, tx + (1 − t)x′ ∈ Lα for all t ∈ [0, 1]. Hence, f(tx + (1 − t)x′) ≤ α = f(x′) for all t ∈ [0, 1].
Next, suppose that for all x, x′ ∈ S with f(x) ≤ f(x′) we have for all t ∈ [0, 1],

f(tx + (1 − t)x′) ≤ f(x′).   (7.2.4)

Then for any α, the set Lα is either empty, in which case it is convex, or consists of a single point, in which case it is convex, or contains two or more points, in which case we choose x, x′ ∈ Lα with f(x) ≤ f(x′). Then f(tx + (1 − t)x′) ≤ f(x′) ≤ α for all t ∈ [0, 1], because x′ ∈ Lα. Hence, tx + (1 − t)x′ ∈ Lα, so that Lα is convex, and f is quasi-convex. 

Example 7.6. Consider the function f(x) = x², x > 0. We have

        | 0   f1  |     | 0   2x |
|B1| =  | f1  f11 |  =  | 2x  2  |  = −4x² < 0 for all x > 0.

Hence, this function is both quasi-convex and quasi-concave on the set {x : x > 0}. 

Example 7.7. Consider the function f(x) = x², x ≥ 0. Then |B1(0)| = 0,


and this function does not satisfy the sufficient condition for either quasi-
concavity or quasi-convexity. 
Let S be a convex subset of Rn , and let f : S 7→ R be a function. Then
the following statements are equivalent:
1. f is quasi-convex on S.
2. For all x, y ∈ S and all t ∈ (0, 1),

f (x) ≤ f (y) =⇒ f (tx + (1 − t)y) ≤ f (y). (7.2.5)

3. For all x, y ∈ S and all t ∈ (0, 1),

f (tx + (1 − t)y) ≤ max{f (x), f (y)}. (7.2.6)

Let S be a convex subset of R, and let f : S 7→ R be a function. Then f


is said to be strictly quasi-convex if for all x, y ∈ S with x 6= y, and for all
t ∈ (0, 1),
f (tx + (1 − t)y) < max{f (x), f (y)}. (7.2.7)

It is always possible to choose a nondecreasing function that is neither


concave nor convex on R, like f (x) = x3 . However, we have shown that not
every quasi-convex (quasi-concave) function is convex (concave).

7.3 Bordered Hessian Test


We can use the bordered Hessian B, presented in §1.6.4, to check if a given
function is quasi-convex. Unlike the bordered Hessian H̄ that is used in op-
timization problems subject to extraneous constraints, the bordered Hessian
B, defined for the function f alone, is useful to test if the function f is quasi-
concave or quasi-convex. Thus, for a function f with domain in Rn , the
determinant of the bordered Hessian B is defined by

0 f1 f2 ··· fn
f1 f11 f12 ··· f1n
|B| = f2 f21 f22 ··· f2n , (7.3.1)
··· ··· ··· ··· ···
fn fn1 fn2 ··· fnn

where fi are the first-order derivatives of f , and fij are the second-order
derivatives of f . The leading principal minors of B are

        | 0   f1  |          | 0   f1   f2  |
|B1| =  | f1  f11 | ,  |B2| = | f1  f11  f12 | ,  . . . ,  |Bn| = |B|.   (7.3.2)
                             | f2  f21  f22 |

7.3.1 Properties of the Bordered Hessian. There are two properties


which can be described as follows.
1. If f is quasi-concave (respectively, quasi-convex) on an open set D ⊂ Rⁿ, then (−1)^k |Bk|(x) ≥ 0 (respectively, |Bk|(x) ≤ 0) for k = 1, 2, . . . , n and all x ∈ D, where |Bk| are the leading principal minors (7.3.2).
2. If (−1)^k |Bk|(x) > 0 (respectively, |Bk|(x) < 0) on an open set D ⊂ Rⁿ for k = 1, 2, . . . , n, then f is quasi-concave (respectively, quasi-convex).
The quasi-concavity condition in property 2 requires that the signs of the leading principal minors alternate, starting with a negative sign for the 2 × 2 minor |B1|.
Example 7.8. Consider the function f(x) = x², x > 0. We have

        | 0   f1  |     | 0   2x |
|B1| =  | f1  f11 |  =  | 2x  2  |  = −4x² < 0 for all x > 0.

Hence, this function is both quasi-concave and quasi-convex on the set {x : x > 0}. 
Example 7.9. Consider the function f(x) = x², x ≥ 0. Then |B1(0)| = 0,
and this function does not satisfy the sufficient condition for either quasi-
concavity or quasi-convexity. 
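The leading principal minors of (7.3.2) can be generated mechanically; the following sympy sketch (illustrative only) reproduces Example 7.8 and checks a Cobb-Douglas function:

    import sympy as sp

    def bordered_minors(f, vars_):
        n = len(vars_)
        B = sp.zeros(n + 1, n + 1)
        for i, vi in enumerate(vars_):
            B[0, i + 1] = B[i + 1, 0] = sp.diff(f, vi)
            for j, vj in enumerate(vars_):
                B[i + 1, j + 1] = sp.diff(f, vi, vj)
        # |B_k| is the determinant of the leading (k+1) x (k+1) submatrix
        return [sp.simplify(B[:k + 2, :k + 2].det()) for k in range(n)]

    x, y = sp.symbols('x y', positive=True)
    print(bordered_minors(x**2, [x]))   # [-4*x**2], as in Example 7.8

    B1, B2 = bordered_minors(x**sp.Rational(1, 3) * y**sp.Rational(1, 3), [x, y])
    print(B1)   # negative for x, y > 0
    print(B2)   # positive for x, y > 0: the alternating signs of quasi-concavity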

Example 7.10. Let f : R 7→ R be an increasing function. Then f is both


quasi-concave and quasi-convex. To prove this, consider x, y ∈ R, and assume
without loss of generality that x > y. Then for any t ∈ (0, 1), we have

x > tx + (1 − t)y > y. (7.3.3)

Since f is increasing, we have

f (x) ≥ f (tx + (1 − t)y) ≥ f (y). (7.3.4)

Since f(x) = max{f(x), f(y)}, the inequality (7.3.4) shows that f is quasi-convex. Similarly, since f(y) = min{f(x), f(y)}, the inequality (7.3.4) shows that f is quasi-concave.

7.4 Quasi-Convex Optimization


7.4.1 No Constraints. Some examples are as follows.
Example 7.11. Minimize f(x, y, z) = 2x² − 5x − xy + 3y² − 4y + 3yz + 4z² + 3z − 2xz. Equating the first-order partial derivatives to zero, we get

fx = 4x − 5 − y − 2z = 0, fy = −x + 6y − 4 + 3z = 0, fz = 3y + 8z + 3 − 2x = 0,

which in the matrix form Ax = b is written as


    
4 −1 −2 x 5
 −1 6 3 y =  4 
−2 3 8 z −3

which, by using Cramer’s rule gives |A| = 136, |A1 | = 176, |A2 | = 216, |A3 | =
−76, giving the critical point (x∗ , y ∗ , z ∗ ) approximately as (1.29, 1.59, −0.56).
Next, the second-order partial derivatives are:

fxx = 4, fxy = −1, fxz = −2;


fyx = −1, fyy = 6, fyz = 3;
fzx = −2, fzy = 3, fzz = 8.

Then the Hessian is  


4 −1 −2
|H| =  −1 6 3 ,
−2 3 8
 
and the minors are: |H1| = 4 > 0, |H2| = | 4 −1 ; −1 6 | = 23 > 0, and |H3| = |H| = |A| = 136 > 0. Thus, the Hessian is positive definite (PD), and

the function f is minimized at the critical point (or, the critical point is the
minimizer of f ). 
Example 7.12. Consider f(x, y) = 5x³ − 15xy + 5y³. Equating the first-order derivatives to zero, we get fx = 15x² − 15y = 0, fy = −15x + 15y² = 0.
Solving these two equations, we get the critical points as (0, 0) and (1, 1).
Next, the second-order derivatives are fxx = 30x, fxy = −15 = fyx , fyy =
30y. At the point (0, 0), we find that fxx = 0 = fyy, fxy = fyx = −15. Here fxx fyy = 0 < (fxy)² = (−15)²; hence, the function has a saddle point at (0, 0). Next, at the point
(1, 1), we have fxx = 30 > 0, fxy = −15 = fyx , fyy = 30 > 0, and fxx fyy =
900 > (fxy )2 = (−15)2 = 225; hence, the function has a relative minimum at
(1, 1). 

7.4.2 Equality Constraints. Some examples follow.


Example 7.13. The total costs of a firm are given by c(x, y) = 25x² + 50xy + 50y² when the firm is required to meet the production quota g(x, y) = 2x + 3y = 40 (i.e., c(x, y) is subject to the equality constraint g). The Lagrangian is given by

C(x, y) = 25x² + 50xy + 50y² + λ(40 − 2x − 3y).

To find the critical values, we equate the first-order partial derivatives of C to


zero, thus yielding Cx = 50x + 50y − 2λ = 0, Cy = 50x + 100y − 3λ = 0, Cλ = 40 − 2x − 3y = 0. Solving these three equations, using Cramer's rule, we get
the critical values as x∗ = 8 = y ∗ , λ∗ = 400. Next, the second-order partial
derivatives of C are: Cxx = 50 = Cxy = Cyx , Cyy = 100, and the first-order
partial derivatives of g are: gx = 2, gy = 3. Thus, the bordered Hessian is

50 50 2
|H̄| = 50 100 3 .
2 3 0

Then |H̄2 | = |H̄| = −250 < 0; thus, the bordered Hessian is positive definite
(PD), and the costs c(x, y) are minimized at the critical values. 

Example 7.14. Minimize the cost of production of 400 units of a good


when Q = 5K^0.6 L^0.2, and PK = 20, PL = 10, i.e., minimize c(K, L) = 20K + 10L subject to the constraint 5K^0.6 L^0.2 = 400. The Lagrangian is

C(K, L) = 20K + 10L + λ(400 − 5K^0.6 L^0.2).

To find the critical values, we equate the first-order partial derivatives of


C to zero, thus yielding CK = 20 − 3λK^−0.4 L^0.2 = 0, CL = 10 − λK^0.6 L^−0.8 = 0, Cλ = 400 − 5K^0.6 L^0.2 = 0. The first two equations yield

20/10 = (3λK^−0.4 L^0.2) / (λK^0.6 L^−0.8) = 3L/K,

or L = 2K/3, which when substituted in the third equation gives K ≈ 265, L ≈ 176. Next, the second-order partial derivatives of C are: CKK = 1.2λK^−1.4 L^0.2, CLL = 0.8λK^0.6 L^−1.8, CKL = −0.6λK^−0.4 L^−0.8 = CLK. Thus, the bordered Hessian is

        | 1.2λK^−1.4 L^0.2      −0.6λK^−0.4 L^−0.8    3K^−0.4 L^0.2 |
|H̄| =   | −0.6λK^−0.4 L^−0.8    0.8λK^0.6 L^−1.8      K^0.6 L^−0.8  |
        | 3K^−0.4 L^0.2         K^0.6 L^−0.8           0             | .

Then

|H̄2| = 3K^−0.4 L^0.2 [−0.6λK^0.2 L^−1.6 − 2.4λK^0.2 L^−1.6] − K^0.6 L^−0.8 [1.2λK^−0.8 L^−0.6 + 1.8λK^−0.8 L^−0.6]
     = −9λK^−0.2 L^−1.4 − 3λK^−0.2 L^−1.4 = −12λK^−0.2 L^−1.4 < 0,

since K, L, λ > 0. Hence, the bordered Hessian is positive definite (PD), and the cost c(K, L) is minimized at the critical values. 
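The critical values can be confirmed numerically; the following sketch (illustrative, assuming scipy is available) minimizes the cost subject to the output constraint:

    from scipy.optimize import minimize

    cost = lambda v: 20*v[0] + 10*v[1]
    con = {'type': 'eq', 'fun': lambda v: 5 * v[0]**0.6 * v[1]**0.2 - 400}
    res = minimize(cost, x0=[100.0, 100.0], constraints=[con],
                   bounds=[(1e-6, None), (1e-6, None)], method='SLSQP')
    K, L = res.x
    print(K, L, L / K)   # K ~ 265, L ~ 176, with L/K = 2/3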
7.4.3 Inequality Constraints. Minimize a quasi-convex function f (x)
subject to gi (x) ≤ 0, i = 1, . . . , m, and Ax = b, where the inequality con-
straints gi are convex. If the objective function f is differentiable, the first-order condition for quasi-convexity implies that x is optimal if ∇f(x)ᵀ(y − x) > 0 for all feasible y ≠ x. This condition is only sufficient for optimality, and it requires that ∇f(x) ≠ 0. By contrast, in the convex case the condition ∇f(x) = 0 characterizes optimality. Figure 7.5 shows that the simple optimality condition f′(x) = 0, valid for convex functions, does not hold for quasi-convex functions.

Figure 7.5 Quasi-convex optimality.

7.4.4 Convex Feasibility Method. A general method to solve quasi-convex


optimization problem uses the representation of the sublevel sets of a quasi-
convex function via a family of convex inequalities. Let φt : Rn 7→ R, t ∈ R,
be a family of convex functions such that f(x) ≤ t ⟺ φt(x) ≤ 0, and, for each x, φt(x) is nonincreasing in t, i.e., φs(x) ≤ φt(x) whenever s ≥ t. Let x∗ denote the optimal value of the quasi-convex problem. If the feasibility problem

Find x subject to φt(x) ≤ 0, gi(x) ≤ 0, i = 1, . . . , m, Ax = b,   (7.4.1)

is feasible, then we have x∗ ≤ t. Conversely, if this problem is infeasible, then we can conclude that x∗ ≥ t. The problem (7.4.1) is known as the convex
feasibility problem, since the inequality constraints are all convex functions
and the equality constraints are linear.
This leads to a simple algorithm for solving quasi-convex optimization prob-
lem by using bisection that solves the convex feasibility problem at each step.
Assuming that the problem is feasible, start with the interval [l, r] that con-
tains the optimal value x∗ . Then solve the convex feasibility problem at the
mid-point t = (l + r)/2, determining which half-interval contains the optimal value, and continue halving the interval containing the optimal value until its width is small enough, say ε > 0. This is known as the bisection method. Note that the length of the interval after k iterations is 2^−k (r − l), which means that exactly ⌈log₂((r − l)/ε)⌉ iterations will be required before the algorithm terminates.
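A minimal sketch of the bisection method for a one-dimensional quasi-convex objective follows (illustrative; here the convex feasibility subproblem "is there a feasible x with f(x) ≤ t?" is approximated by a grid search):

    import numpy as np

    def quasiconvex_bisection(f, a, b, lo, hi, eps=1e-6):
        # [a, b] is the feasible set; [lo, hi] must bracket the optimal value.
        xs = np.linspace(a, b, 6001)
        fs = f(xs)
        while hi - lo > eps:
            t = 0.5 * (lo + hi)
            if np.any(fs <= t):   # feasibility problem solvable: optimal value <= t
                hi = t
            else:                 # infeasible: optimal value >= t
                lo = t
        return lo, hi

    f = lambda x: np.sqrt(np.abs(x - 1.0))   # quasi-convex but not convex
    print(quasiconvex_bisection(f, -3.0, 3.0, 0.0, 10.0))  # interval around 0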

Example 7.15. Minimize f(x, y) = y² − log(1 + x) subject to x − 2y ≤ 3, x ≥ 0, y ≥ 0. Take F(x, y) = y² − log(1 + x) + λ1(3 − x + 2y) + λ2(−x) + λ3(−y). Then the first-order KKT conditions give
Then the first-order KKT conditions give

Fx = −1/(1 + x) − λ1 − λ2 = 0, Fy = 2y + 2λ1 − λ3 = 0,
Fλ1 = 3 − x + 2y = 0, Fλ2 = −x = 0, Fλ3 = −y = 0.

Since the last equation gives y = 0, we get x = 3 from Fλ1 = 0. Thus, the critical point is (3, 0). The second-order conditions are: Fxx = 1/(1 + x)², Fxy = 0 = Fyx, Fyy = 2, which at the point (3, 0) give the Hessian

        | 1/16  0 |
|H| =   | 0     2 |  = 1/8 > 0.

Also, the first minor is |H1| = 1/16 > 0, and the second minor is |H2| = |H| = Fxx Fyy − (Fxy)² = 1/8 > 0. Hence, the function f(x, y) has a minimum at (3, 0). 

7.4.5 Equality and Inequality Constraints. Minimize f (x) subject


to equality constraint gi (x) = 0 and inequality constraint hj (x) ≤ 0. The
domain D of the optimization problem is
D = ⋂_{i=1}^{n} dom(gi) ∩ ⋂_{j=1}^{m} dom(hj).   (7.4.2)

A point x ∈ D is called a critical point x∗ if it satisfies the above equality and


inequality constraints, and the problem is said to be feasible if there exists at
least one critical point. The optimal value x∗ of the above problem is defined by

x∗ = inf{f(x) | gi(x) = 0, i = 1, . . . , n; hj(x) ≤ 0, j = 1, . . . , m},   (7.4.3)

where x∗ can take the values ±∞. The set of all optimal points is denoted by

X ∗ = {x | gi (x) = 0, i = 1, . . . , n; hj (x) ≤ 0, j = 1, . . . , m; f (x) = x∗ }.


(7.4.4)
If X ∗ = ∅, the optimization problem is not solvable. A point x∗ is locally
optimal if there exists an M > 0 such that

f(x) = inf{f(z) | gi(z) = 0, i = 1, . . . , n; hj(z) ≤ 0, j = 1, . . . , m; ‖z − x‖₂ ≤ M},   (7.4.5)

i.e., x∗ solves the optimization problem.


Example 7.16. Let dom(f ) = R++ . Then (a) f (x) = 1/x: x∗ = 0 but
the optimal value is not achieved; (b) f (x) = − log x: x∗ = −∞, but this
problem is unbounded from below; (c) f (x) = x log x: x∗ = −1/e which is
achieved. 

7.4.6 Minmax Theorem. In microeconomics, quasi-concave utility func-


tions imply that consumers have convex preferences. Quasi-convex functions
are important in game theory, industrial organization, and general equilib-
rium theory, particularly for applications of Sion’s minmax theorem, which is
a generalization of von Neumann’s minmax theorem (zero-sum games).
Theorem 7.4. (Minmax Theorem; von Neumann [1928]) Let X ⊂ Rn and
Y ⊂ Rm be compact sets. If f : X × Y 7→ R is a continuous function that is
convex-concave, i.e., f (·, y) : X 7→ R is convex for fixed y, and f (x, ·) : Y 7→ R
is concave for fixed x, then we have

min_{x∈X} max_{y∈Y} f(x, y) = max_{y∈Y} min_{x∈X} f(x, y).   (7.4.6)
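A discretized instance of (7.4.6) for a bilinear, hence convex-concave, function (an illustrative numpy sketch):

    import numpy as np

    grid = np.linspace(-1.0, 1.0, 201)
    F = np.outer(grid, grid)        # F[i, j] = x_i * y_j on X = Y = [-1, 1]
    print(F.max(axis=1).min())      # min over x of max over y: 0.0
    print(F.min(axis=0).max())      # max over y of min over x: 0.0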

Example 7.17. Martos [1969, 1971] has shown that there is a class of
quasi-convex functions which need not be convex, namely, the quadratic function f(x) = ½ xᵀQx + cᵀx, where Q is a symmetric matrix. If X = Rⁿ, then f(x) is convex iff f(x) is quasi-convex; thus, Q is positive semidefinite.
However, if X = Rn+ , the function f (x) may be quasi-convex, yet not be con-
vex. For example, f (x1 , x2 ) = −x1 x2 is quasi-convex over R2+ , but it is not
convex there.
Example 7.18. One can combine convex, concave and linear functions to
form quasi-convex functions. For example, let f and g be defined on a convex

set X such that f(x) ≠ 0 for all x ∈ X. Then g/f is quasi-convex on X if one condition from each of the following two groups holds (six combinations in all):
I. g is convex and f (x) > 0 for all x ∈ X, or g is concave and f (x) < 0 for
all x ∈ X; and
II. f is linear on X, or f is convex on X and g(x) ≤ 0 for all x ∈ X, or f is
concave on X and g(x) ≥ 0 for all x ∈ X.

7.5 Summary
Let S be a nonempty convex set in Rn , and let f : S 7→ R. Then there are
the following types of convexity and quasi-convexity at a point.
1. Convexity at a point. The function f is convex at x ∈ S if f (tx + (1 −
t)x′ ) ≤ tf (x) + (1 − t)f (x′ ) for each t ∈ (0, 1) and each x′ ∈ S.
2. Strict convexity at a point. The function f is strictly convex at
x ∈ S if f (tx + (1 − t)x′ ) < tf (x) + (1 − t)f (x′ ) for each t ∈ (0, 1) and each
x′ ∈ S, x′ ≠ x.
3. Quasi-convexity at a point. The function f is quasi-convex at x ∈ S
if f (tx + (1 − t)x′ ) ≤ max{f (x), f (x′ )} for each t ∈ (0, 1) and each x′ ∈ S.
4. Strict quasi-convexity at a point. The function f is strictly quasi-
convex at x ∈ S if f (tx + (1 − t)x′ ) < max{f (x), f (x′ )} for each t ∈ (0, 1)
and each x′ ∈ S, f(x′) ≠ f(x).

7.6 Exercises
7.1. Let f : R 7→ R be convex, and a, b ∈ dom(f ), a < b. (a) Show that

f(x) ≤ [(b − x)/(b − a)] f(a) + [(x − a)/(b − a)] f(b) for all x ∈ [a, b].

(b) Show that

[f(x) − f(a)]/(x − a) ≤ [f(b) − f(a)]/(b − a) ≤ [f(b) − f(x)]/(b − x) for all x ∈ (a, b).

(c) Suppose f is differentiable. Use the inequality in (b) to show that

f′(a) ≤ [f(b) − f(a)]/(b − a) ≤ f′(b).

(d) Suppose f is twice differentiable. Use the result in (c) to show that
f ′′ (a) ≥ 0 and f ′′ (b) ≥ 0.
Hint. The first three inequalities follow from the definition of a convex
function: Suppose f is differentiable. Then f is convex iff dom(f ) is a convex

set and f (y) ≥ f (x)+f ′ (x)(y−x) for all x, y ∈ dom(f ), which is the first-order
Taylor’s series approximation at x. Part (d) is obvious.
7.2. Suppose f : Rn 7→ R is convex with dom(f ) = Rn , and bounded
above on Rn . Show that f is constant.
7.3. Prove that a convex function is quasi-convex. Proof. Let the func-
tion f have the domain S (a convex set). Let a be a real number, and let x and y be points in the lower-level set La. First, we show that the
set La is convex. For this, we need to show that for every t ∈ [0, 1] we
have tx + (1 − t)y ∈ La . Since S on which f is defined is convex, we have
tx + (1 − t)y ∈ S, and thus f is defined at the point tx + (1 − t)y. Now,
convexity of f implies that f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y). Moreover,
the fact that x ∈ La means that f (x) ≤ a, and similarly, y ∈ La means that
f (y) ≤ a. Hence, tf (x) + (1 − t)f (y) ≤ ta + (1 − t)a = a. Combining these
two inequalities we get f(tx + (1 − t)y) ≤ a, so that tx + (1 − t)y ∈ La. Thus, every lower-level set is convex and hence, f is quasi-convex.

Note that quasi-convexity is weaker than convexity, in the sense that every
convex function is quasi-convex.
7.4. Prove that g(t) = f (tx + (1 − t)y) is quasi-convex for t ∈ [0, 1] and for
any x, y ∈ Rn iff f is quasi-convex. Hint. Use definition (7.1.1).
7.5. Prove that

sup_x {f(x) + g(x)} ≤ sup_x {f(x)} + sup_x {g(x)}.

Hint. This inequality follows from the defining inequality of quasi-convex


functions and the triangle inequality.
7.6. Prove that the floor function x 7→ ⌊x⌋ is a quasi-convex function that
is neither convex nor continuous.
7.7. Prove that if x 7→ f (x) and y 7→ g(y) are positive convex decreasing
functions, then (x, y) 7→ f (x)g(y) is quasi-convex.
7.8. Prove that if f and g are both convex, both nondecreasing (or nonin-
creasing), and positive functions on an interval, then f g is convex.
7.9. Let f be a function of n variables with continuous partial derivatives of all orders in an open convex set S, and let |H̄|k be the determinant of its kth-order bordered Hessian. Then
(i) if f is quasi-convex, then |H̄|k(x) ≤ 0 for all x ∈ S, k = 1, 2, . . . , n; and
(ii) if |H̄|k(x) < 0 for all x ∈ S, k = 1, 2, . . . , n, then f is quasi-convex.
This result can also be stated as follows: |H̄|k(x) ≤ 0 for k = 1, 2, . . . , n and all x ∈ S is a necessary condition for quasi-convexity, and |H̄|k(x) < 0 for k = 1, 2, . . . , n and all x ∈ S is a sufficient condition for quasi-convexity.

7.10. If f is convex, nondecreasing, and positive, and g is concave, nonin-


creasing and positive functions on an interval, then f /g is convex.

7.11. Minimize C(x, y) = (x − 3)² + (y − 4)² subject to x + y ≥ 4 and x, y ≥ 0.

7.12. Minimize C(x, y) = 2x + y subject to x² − 4x + y ≥ 0 and x, y ≥ 0.

7.13. Plot the graphs of the following functions and check whether they are quasi-concave, quasi-convex, both, or neither: (a) f(x) = x³ − 2x; (b) f(x, y) = 6x − 9y; (c) f(x, y) = y − ln x.

7.14. Verify that the cubic function f(x) = ax³ + bx² + cx + d is in general neither quasi-concave nor quasi-convex.

7.15. Use definitions (6.1.3), (6.1.4), and (7.1.3) to check whether f(x) = x², x ≥ 0 is quasi-concave or quasi-convex. (See Example 6.1).

7.16. Show that f (x, y) = xy, x, y ≥ 0 is not quasi-convex.

7.17. Consider the parabolic cylinder x = y 2 , x ≥ 0. Determine if this


function is convex, quasi-convex, concave, or quasi-concave. Hint. The graphs of y = ±√x, x ≥ 0, and the 3-D plot of the given function are presented in Figures 7.6 (a), (b), and (c); the graphs do not exist for x < 0.

Figure 7.6 (a) y = √x, (b) y = −√x.

Figure 7.6 (c) 3-D plot of x = y².

7.18. Use the bordered Hessian to check whether the following functions
are quasi-concave or quasi-convex: (a) f(x, y) = −x² − y², x, y > 0; (b) f(x, y) = −(x + 1)² − (y + 2)², x, y > 0.

7.19. Minimize C(x, y) = (x − 3)2 + (y − 4)2 subject to x + y ≥ 4 and


x, y ≥ 0.

7.20. Minimize C(x, y) = 2x + y subject to x2 − 4x + y ≥ 0 and x, y ≥ 0.

7.21. Let f and g be real-valued convex functions with the same domain D. Define a function h so that h(x) = f(x)g(x) for all x ∈ D. Is h a quasi-convex function? Ans. h is not necessarily quasi-convex; for example, f(x) = x and g(x) = −x are both convex on R (the real line), but h(x) = −x² is not quasi-convex, since its lower-level sets {x : −x² ≤ α} for α < 0 are not convex. On the other hand, if f(x) = x = g(x), then h(x) = x² is convex, hence also quasi-convex.
7.22. Show that x∗ = (−1/2, 1/3, 1) is the optimal point for the optimization problem: Minimize ½ xᵀAx + bᵀx + c, where

      | 10  12  2  |        |  12 |        | x |
A =   | 6   17  13 | ,  b = | −14 | ,  x = | y | ,  c = [7].
      | 8   13  6  |        | −21 |        | z |

Hint. The function f(x) = f(x, y, z) = ½(24x² + 42y² + 21z²) + 12x − 14y − 21z + 7. Equating the first-order partial derivatives of f with respect to x, y, z to zero yields the critical point x∗ = (−1/2, 1/3, 1). Use the Hessian H, as in Example 7.11, to show that the function f is minimized at x∗.
7.23. Show that the function f (x, y) = −xa y b , x, y > 0, 0 < a, b < 1, is
quasi-convex. Hint. Use the bordered Hessian |B| (§7.3.1) and show that
|B1 | < 0 and |B2 | < 0.
8 Log-Concave Functions

Log-concavity is an important part of information economics. When the logarithm of the cumulative distribution function (c.d.f.) of a random variable is a concave function, it turns out that the ratio of the probability density function (p.d.f.) to the c.d.f. is a monotone decreasing function. Some readers may prefer to read §8.5 on log-concave distributions prior to starting this chapter.

8.1 Definitions
A nonnegative function f : Rn 7→ R is said to be log-concave (or logarithmi-
cally concave) if its domain is a convex set, and f satisfies the inequality

f (tx + (1 − t)x′ ) ≥ f (x)t f (x′ )1−t (8.1.1)

for all x, x′ ∈ dom(f ) and 0 < t < 1. If f is strictly positive, then the
logarithm of the function f , i.e., log f , is concave:

log(f(tx + (1 − t)x′)) ≥ t log f(x) + (1 − t) log f(x′)   (8.1.2)

for all x, x′ ∈ dom(f ) and 0 < t < 1.


Compare the above definition with the following definition of a log-convex
function, which uses the property that f is convex if −f is concave. Thus, a
function f is log-convex if it satisfies the inequality

f (tx + (1 − t)x′ ) ≤ f (x)t f (x′ )1−t (8.1.3)

for all x, x′ ∈ dom(f ) and 0 < t < 1. Hence, f is log-convex if log f is convex.
Example 8.1. (i) The exponential function f(x) = e^{ax}; (ii) f(x) = x^a, a ≥ 0; and (iii) f(x) = eˣ/(1 + eˣ) (known as the inverse logit function)¹ are log-concave functions. 
1 Note that the logit function is used in statistics to determine the log-odds, i.e., the logarithm of the odds p/(1 − p), where p denotes a probability.

Some properties of log-concave functions are as follows:


(i) A positive log-concave function is also quasi-concave.
(ii) Every concave function that is nonnegative on its domain is log-concave.
However, the converse is not necessarily true. An example is the Gaussian
2
function f (x) = e−x /2 which is log-concave since log f (x) = −x2 /2 is a
concave function of x. But f is not concave since its second derivative f ′′ (x) =
2
e−x /2 (x2 − 1) > 0 for |x| > 1.
(iii) A nonnegative C²-function f defined on a convex domain is log-concave iff for all x ∈ Rⁿ for which f(x) > 0,

f(x)[H] ⪯ ∇f(x)∇f(x)ᵀ,   (8.1.4)

where [H] is the Hessian matrix of f (§1.6.2) and ∇f(x) = (∂f/∂x1, . . . , ∂f/∂xn)ᵀ. In other words, condition (8.1.4) states that the difference between the left-hand side and the right-hand side expressions is negative semidefinite (NSD). In the case of functions f in R, condition (8.1.4) simplifies to

f(x)f″(x) ≤ (f′(x))².   (8.1.5)

Since f(x) > 0, condition (8.1.4) can also be written using the Schur complement (see Gill et al. [1990]). Condition (8.1.4) is often written, without the restriction f(x) > 0, as

f(x)∇²f(x) ⪯ ∇f(x)∇f(x)ᵀ,

implying that f(x)∇²f(x) − ∇f(x)∇f(x)ᵀ is negative semidefinite (NSD).
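Condition (8.1.5) is easy to verify symbolically for the Gaussian example above (an illustrative sympy sketch, not part of the original text):

    import sympy as sp

    x = sp.symbols('x', real=True)
    f = sp.exp(-x**2 / 2)
    print(sp.simplify(f * sp.diff(f, x, 2) - sp.diff(f, x)**2))  # -exp(-x**2) <= 0
    print(sp.simplify(sp.diff(sp.log(f), x, 2)))                 # -1: log f is concave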


Other properties of log-concave functions are as follows: (i) (sum) sums of log-concave functions are not always log-concave; (ii) (product) if f and g are log-concave, then fg is log-concave; (iii) (integral) if f(x, y) is log-concave in (x, y), then ∫ f(x, y) dy is log-concave; and (iv) (convolution) if f and g are log-concave, then ∫ f(x − y)g(y) dy is log-concave.

8.1.1 Log-Concavity Preserving Operations


(i) Products. The product of log-concave functions is also log-concave. In
fact, if f and g are log-concave, then log f and log g are concave, by definition.
Hence,
log f (x) + log g(x) = log(f (x)g(x)) (8.1.6)

is concave, and so f g is also log-concave.



(ii) Marginals. If f (x, y) : Rn+m 7→ R is log-concave, then


g(x) = ∫ f(x, y) dy   (8.1.7)

is log-concave; see the Prékopa-Leindler inequality below (Theorem 8.10).


(iii) The above property implies that convolution preserves log-concavity,
since h(x, y) = f (x − y)g(y) is log-concave if f and g are log-concave, and
hence the convolution f ⋆ g of the functions f and g, defined by
(f ⋆ g)(x) = ∫ f(x − y)g(y) dy = ∫ h(x, y) dy,   (8.1.8)

is log-concave.
(iv) The Laplace transform of a nonnegative convex function is log-concave.
(v) If the random variable has a log-concave p.d.f., then the c.d.f. is a
log-concave function.
(vi) If two independent random variables have log-concave p.d.f.s, then
their sum has a log-concave p.d.f.

8.2 Theorems
Consider a real-valued function f which is log-concave on some interval I if
f (x) ≥ 0 and x 7→ log f (x) is an extended real-valued concave function, where
we have log 0 = −∞.
Theorem 8.1. (Marshall and Olkin [1979; 16B.3.a]) A sufficient condition
for a nonnegative function f to be log-concave on I is given by

f (x + h)f (x′ ) ≤ f (x)f (x′ + h), x < x′ ; x + h, x′ + h ∈ I, h > 0. (8.2.1)

Log-concavity is a more difficult concept than log-convexity. It is known


that the sum of two log-convex functions is again a log-convex function, but
it need not be true for log-concavity. Before we prove an important theorem
on convolution of log-concave functions, we will introduce a definition of a
totally positive function of order k: A function F defined on A × B, where A
and B are subsets of R, is said to be totally positive of order k, denoted by
(TP)k, if for all m, 1 ≤ m ≤ k, and all x1 < x2 < · · · < xm, y1 < y2 < · · · < ym, where xi ∈ A and yi ∈ B, we have

| F(x1, y1)   · · ·   F(x1, ym) |
|   ...       · · ·     ...     |  ≥ 0.   (8.2.2)
| F(xm, y1)   · · ·   F(xm, ym) |

This definition implies that if u and v are nonnegative functions and F is


(TP)k , then u(x)v(y)F (x, y) is also (TP)k .

In the particular case when F (x, y) = f (y − x), first note that F is (TP)2
on A × B iff f is nonnegative and

f (y1 − x1 )f (y2 − x2 ) − f (y2 − x1 )f (y1 − x2 ) ≥ 0 (8.2.3)

for all x1 < x2 in A and y1 < y2 in B.


Theorem 8.2. Condition (8.2.3) is equivalent to log-concavity of x 7→
f (x).
Proof. Assume that x1 < x2 < y1 < y2 . Let y1 − x2 = r, and y1 − x1 =
s, y2 − y1 = h. Then r < s and h > 0, and (8.2.3) becomes f (r + h)f (s) ≥
f (r)f (s + h), which is simply condition (8.2.1). 
Schoenberg [1951] has shown that F (x, y) = f (y − x) is (TP)2 on R × R iff
f is log-concave on R.
Theorem 8.3. (Marshall and Olkin [1979]) If K is (TP)m and L is (TP)n, and σ is a sigma-finite measure,² then the convolution

M(x, y) = ∫ K(x, z)L(z, y) dσ(z)   (8.2.4)

is totally positive of order min{m, n}.
For the proof, see Marshall and Olkin [1979].

8.2.1 General Results on Log-Concavity. Let g be a twice-differentiable


real-valued function defined on an interval I of the extended real line. A
function g is said to be log-concave on the interval (a, b) if the function log g
is a concave function on (a, b).
Remark 1. Log-concavity of g on (a, b) is equivalent to each of the follow-
ing two conditions: (i) g ′ (x)/g(x) is monotone decreasing on (a, b); and (ii)
(log g(x))″ ≤ 0.
Theorem 8.4. Let g be strictly monotone (increasing or decreasing) de-
fined on the interval (a, b). Suppose that either g(a) = 0 or g(b) = 0. If g ′ is
a log-concave function on (a, b), then g(x) is also a log-concave function on
(a, b).
Proof. Following Bagnoli and Bergstrom [1989], suppose g(a) = 0. Using
the generalized mean value theorem (or Cauchy mean value theorem) there
exists ξ ∈ (a, x) such that

[g′(x) − g′(a)] / [g(x) − g(a)] = g″(ξ)/g′(ξ).   (8.2.5)
2 Let Σ be a σ-algebra over X. A function µ : Σ → [0, ∞] is called a measure if (i) µ(E) ≥ 0 for all E ∈ Σ, and (ii) µ(∅) = 0. The Lebesgue measure on R is a complete translation-invariant measure on a σ-algebra containing the intervals in R such that µ[0, 1] = 1. A measure is σ-finite if the space is a countable union of measurable sets of finite measure; the real line with the standard Lebesgue measure is σ-finite but not finite.

Suppose g″/g′ is monotone decreasing. Since x > ξ, we have g″(ξ)/g′(ξ) > g″(x)/g′(x). Then it follows from (8.2.5) that

[g′(x) − g′(a)] / g(x) > g″(x)/g′(x).   (8.2.6)

Since g is strictly monotone and g(a) = 0, then g(x) is of the same sign
as g ′ (x) for all x ∈ (a, b). Therefore multiplying both sides of (8.2.6) by
g(x)g ′ (x) preserves the direction of inequality, and yields g ′ (x)2 − g ′ (x)g ′ (a) >
g(x)g ′′ (x), and hence g(x)g ′′ (x) − g ′ (x)2 < −g ′ (x)g ′ (a) < 0. Thus,

[g(x)g″(x) − g′(x)²] / g′(x)² < 0,  or  (g′(x)/g(x))′ < 0.   (8.2.7)

Then using Remark 1 we conclude that g is log-concave.


Next, suppose that g(b) = 0. Using the generalized mean value theorem, there exists ξ ∈ (x, b) such that

[g′(x) − g′(b)] / [g(x) − g(b)] = g″(ξ)/g′(ξ).   (8.2.8)

If g″(x)/g′(x) is monotone decreasing, then since g(b) = 0 and x < ξ, it follows from (8.2.8) that

[g′(x) − g′(b)] / g(x) < g″(x)/g′(x).   (8.2.9)

Since g(x) is monotone and g(b) = 0, we must have g(x)g′(x) < 0 for x < b. Multiplying both sides of (8.2.9) by g(x)g′(x) reverses the direction of the inequality and gives g′(x)² − g′(x)g′(b) > g(x)g″(x). As before, this inequality implies inequality (8.2.7), which in view of Remark 1 establishes the log-concavity of g. 
According to Schoenberg [1951], any log-concave function f on R is either
monotone or, if it is non-monotone, then f (x) → 0 as x → ±∞ at least
exponentially. Thus, if f and g are non-monotone and log-concave functions
on R, then their convolution is well defined on R.
An important theorem is as follows:
Theorem 8.5. (Lekkerkerker [1953]) Convolution of log-concave functions
defined on the interval [0, ∞) under some additional requirements, mentioned
in the next theorem, is log-concave.
Lekkerkerker’s proof is very long. A simpler proof of this result as published
by Merkle [1998a,b] is as follows.

Theorem 8.6. (Merkle [1998a]) Let f and g be log-concave functions on


R such that the convolution
h(x) = ∫_{−∞}^{+∞} f(x − z)g(z) dz   (8.2.10)

is defined for all x ∈ R. Then the function h is log-concave on R.


Proof. Let F(x, z) = f(x − z) and L(z, y) = g(z − y). Since f and g are log-concave functions, we find, by Theorem 8.2, that F and L are (TP)₂ on R × R, and then, by Theorem 8.3, M(x, y) = ∫_{−∞}^{+∞} f(x − z)g(z − y) dz is (TP)₂ on R × R. Now, setting z − y = t we get M(x, y) = ∫_{−∞}^{+∞} f(x − y − t)g(t) dt = h(x − y). Since M is (TP)₂, we find that h is log-concave on R. 
Note that since we have set t = z − y, the differential dz in this theorem
cannot be replaced by an arbitrary sigma-finite measure dσ. Also, if the
function f is log-concave on an interval I, then the function f∗ defined by f∗(x) = f(x) if x ∈ I, and f∗(x) = 0 otherwise,

is log-concave on R. Hence, Theorem 8.6 can be applied to convolutions of


functions defined on intervals of R. The statement of Theorem 8.6 holds if
the ‘log-concave’ is replaced by ‘log-convex’, and the proof is direct since the
function f 7→ f (x − z)g(z) is log-convex in z for each x.

8.2.2 Log-Concavity of Density and Left-Side Integral. Let X be a


real-valued random variable whose support is an interval (a, b) on the extended
real line. Let X have a cumulative distribution function (c.d.f.) F and a density function f, where f(x) = F′(x). For all x ∈ (a, b), denote the left-side integral of the c.d.f. by G(x) = ∫ₐˣ F(t) dt. We will show that log-concavity of
f implies log-concavity of F , which in turn implies log-concavity of G. Note
that we speak of ‘left-side integral’ here because F (x) and G(x) measure areas
lying on the left of x on a histogram.
Theorem 8.7. (i) If the density function f is log-concave on (a, b), then
the c.d.f. F is also log-concave on (a, b); and (ii) if the c.d.f. F is log-concave
on (a, b), then the left-side integral G of the c.d.f. is also a log-concave function
on (a, b).
Proof. (i) Applying Theorem 8.4 to the function F , we find that F (a) = 0
and F is strictly increasing on [a, b], since F is a c.d.f. with support [a, b],
and that if F ′′ /F ′ is monotone decreasing, then so is F ′ /F . But F ′ = f
and F ′′ = f ′ . Thus, if f ′ /f is monotone decreasing, then F ′ /F is monotone
decreasing. Then part (i) follows from this fact and Remark 1 (§8.2.1). To
prove part (ii), apply Theorem 8.4 to the function G(x). Clearly, G(a) = 0
and G is a strictly increasing function on [a, b]. Thus, if G′′ /G′ is monotone

decreasing, then so is G′ /G. But G′′ = f and G′ = F . Thus, if f /F is


monotone decreasing, then so is G′ /G. Part (ii) then follows from this fact
and Remark 1. 
Corollary 8.1. If the density function f is monotone decreasing, then its
c.d.f. F and its left-side integral G are both log-concave.
Proof. Since F is a c.d.f., then F must be monotone increasing. Then
if f is monotone decreasing, f (x)/F (x) must be monotone decreasing. But
(f (x)/F (x))′ = (log F (x))′′ . Thus, if f is monotone decreasing, F must be
log-concave. Then G must also be log-concave since, by Theorem 8.7, log-
concavity of F implies log-concavity of G. 
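Theorem 8.7 and Corollary 8.1 can be illustrated numerically for the standard normal distribution, whose density is log-concave (a sketch assuming scipy is available; added for illustration):

    import numpy as np
    from scipy.stats import norm

    xs = np.linspace(-3, 3, 13)
    ratio = norm.pdf(xs) / norm.cdf(xs)   # the ratio f(x)/F(x) = (log F(x))'
    print(np.all(np.diff(ratio) < 0))     # True: the ratio is monotone decreasing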

8.2.3 Reliability Theory and Right-Side Integral. Reliability theory


deals with the time patterns of survival probability of an existing machine or
organism. The length of remaining life for an object is modeled as a random
variable X with c.d.f. F (x), and with support (a, b). This theory takes
into account the properties of the ‘right tail’ of the distribution of X. The
reliability function F̄ on X is defined by F̄ (x) = 1 − F (x), and the failure
rate (also known as the hazard function) r(x) is defined as r(x) = f (x)/F̄ (x).
Also, the right-side integral of the reliability function is defined by R(x) = ∫ₓᵇ F̄(t) dt.
Theorem 8.8. (i) If the density function f is log-concave on (a, b), then
the reliability function F̄ (x) is also log-concave on (a, b), and (ii) if the relia-
bility function F̄ is log-concave on (a, b), then the right-side integral R(x) is
a log-concave function on (a, b).
Proof. Note that this theorem is a dual of Theorem 8.7. To prove part
(i), apply Theorem 8.4 to the function F̄ (x) = 1 − F (x). Since F is a c.d.f.,
we must have F̄ (b) = 0 and F̄ is a monotone decreasing function. Then, if
F̄ ′′ (x)/F̄ ′ (x) is a decreasing function, so is F̄ ′ (x)/F̄ (x). But F̄ ′ (x) = −f (x)
and F̄ ′′ (x) = −f ′ (x) for all x ∈ [a, b]. Therefore, F̄ ′′ (x)/F̄ ′ (x) = f ′ (x)/f (x)
which is monotone decreasing, and so is F̄′(x)/F̄(x). Hence, if f is log-concave, then F̄ must be log-concave.
To prove part (ii), apply Theorem 8.4 to the function R(x). Clearly, R(b) =
0 and R(x) is monotone decreasing in x. Using Theorem 8.4, we find that if
R′′ (x)/R′ (x) is monotone decreasing, then so is R′ (x)/R(x). In this case then
R′(x) = −F̄(x) and R″(x) = f(x). Thus, if R″(x)/R′(x) = −f(x)/F̄(x) is
monotone decreasing, then R′ (x)/R(x) = −F̄ (x)/R(x) must also be monotone
decreasing. Then, if F̄ is log-concave, so is R log-concave. 
Since the failure rate is defined by r(x) = f (x)/F̄ (x) = −F̄ ′ (x)/F̄ (x),
we find that the reliability function F̄ is log-concave iff the failure rate is
monotone increasing in x. This fact and Theorem 8.8 lead to the following
results.

Corollary 8.2. If the density function f is log-concave on (a, b), then the
failure rate is monotone increasing on (a, b). If the failure rate is monotone
increasing on (a, b), then R′(x)/R(x) is monotone decreasing.
Corollary 8.3. If the density function f is monotone increasing, then the
reliability function F̄ is log-concave.
Proof. Since F̄ is a reliability function, then it must be monotone decreas-
ing. Thus, if f is monotone increasing, the failure rate f /F̄ must be monotone
increasing. But increasing failure rate is equivalent to a log-concave reliability
function. 

8.2.4 Mean Residual Lifetime. The mean residual lifetime function mrl(x) evaluated at x is the expected length of remaining life for a machine of age x; it is defined as mrl(x) = [∫ₓᵇ tf(t) dt]/F̄(x) − x. If this function is monotone decreasing, then a machine will age with the passage of time, in the sense that its expected remaining lifetime will diminish as it gets older.
Theorem 8.9. (Muth [1977]) Let the random variable X represent the
length of life. The sufficient condition for mean residual lifetime mrl(x) to be
a monotone decreasing function is either the p.d.f. f (x) is log-concave, or the
failure rate r(x) is a monotone increasing function.
Proof. Integrating mrl(x) by parts we get

Z b
mrl(x) = F̄ (t) dt/F̄ (x).
x

Rh
Since R(x) = x F̄ (t) dt, we have mrl(x) = R(x)/R′ (x), so mrl(x) is a de-
creasing function iff R(x) is log-convex. By Theorem 8.8(ii), R(x) will be
log-convex if r(x) is an increasing function, thereby proving the sufficiency
of condition (ii). By Theorem 8.8(i), log-concavity of f (x) implies that r(x)
is monotone increasing, which implies that mrl(x) is monotone decreasing,
which proves the sufficiency of condition (i). 

8.3 Asplund Sum


The Asplund sum is used in adding log-concave functions, and in conjugate
functions.
8.3.1 Derivatives of Integrals of Log-Concave Functions. Consider
the function f : Rn 7→ R+ of the form f = e−u , where u : Rn 7→ (−∞, ∞] is
convex, with u 6= ∞. We assume that

lim u(x) = ∞ ⇐⇒ lim f (x) = 0.


|x|→∞ |x|→∞
8.3 ASPLUND SUM 205

2 |x|2
Example 8.2. Let f (x) = e−|x| /2
, u(x) = . Then
2

1 if x ∈ K,
f (x) = χK (x) =
0 if x 6∈ K,
 (8.3.1)
0 if x ∈ K,
u(x) = IK (x) =
∞ if x 6∈ K,

where K ⊂ Rn is a convex set. 


8.3.2 Adding Log-Concave Functions (Asplund sum). Let f and g be
log-concave and s > 0. Set

(f ⊕ g)(z) = sup {f (x)g(y)},


z=x+y
x (8.3.2)
(s ⊙ f )(x) = sf .
s
These operations preserve log-concavity since both f ⊕ g and s ⊙ f are log-
concave. If K and L are convex sets and s, t > 0, then

(s ⊙ χK ) ⊕ (t ⊙ χL ) = χsK+tL , (8.3.3)

where sK + tL = {sx + ty : x ∈ K, y ∈ L}.


x 1
The result (8.3.3) follows from (s ⊙ χK )(x) = sχK = s⊙ = 1 if
s s
x x
∈ K, and similarly (t ⊙ χL )(x) = 1 if ∈ L. This may also say that
s t
the direct sum of s ⊙ χK = s if x ∈ K and t ⊙ χL = t if x ∈ L is equal to
χsK+tL = s + t if x is both K and L (i.e., x ∈ K ∩ L).
!!! The ⊕ and ⊙ notations are borrowed from physics, where, e.g., ⊕
means ‘direct sum of two vector spaces.’ For their definition, see
https://www.physicsforums.com/threads/o-plus-symbol.362404/; also /o-dot-
symbol. The physical definitions may not be adapted here rigorously, but since
they are properly defined, they seem to be useful.
8.3.3 Asplund Sum and Conjugate Functions. Let u : Rn 7→ (−∞, ∞]
be a convex function. The conjugate function u∗ of u is defined by

u∗ (y) = sup (x, y) = u(x). (8.3.4)


x∈Rn

The conjugate function u∗ : Rn 7→ (−∞, ∞] is also a convex function.


Note that the ⊕ and ⊙ operations can be defined equivalently as follows:
For log-concave functions f = e−u , g = e−v , and α, β > 0,

(α ⊙ f ) ⊕ (β ⊙ g) = (α ⊙ e−u ) ⊕ (β ⊙ e−v ) = e−w , (8.3.5)


206 8 LOG-CONCAVE FUNCTIONS

where w = αu + βv. Here ⊕ and ⊙ are linear, in the usual sense, with respect
to the conjugates of the exponents (with reverse sign).
8.3.4 Integral Functional. For a log-concave function f that verifies the
decay condition at infinity, let
Z
I(f ) = f (x) dx ∈ [0, ∞). (8.3.6)
Rn

Note that if f = χK (K a convex set), then I(f ) = V (K), where V (K) is the
volume of K.
We will now study the limit

I((f ⊕ ε) ⊙ g) − I(f )
δI(f, g) = lim , (8.3.7)
ε→0+ ε

in particular with respect to existence, and representation formulas in terms


of f .
It is known that when f is the Gaussian function, i.e., f (x) = e−|x|/2 , the
limit (8.3.7) is regarded as the mean width of g, and denoted by M (g). The
question is why the study of the limit (8.3.7) is important. If f = χK and
g = χL , this limit can be written as
Z
V (K + εL) − ε(L)
lim+ = hL dσK , (8.3.8)
ε→0 ε S n−1

where hL is the support function of L, and σK is the area measure of K. In


particular, this limit identifies the area measure of K.
Theorem 8.10. (Prékopa-Leindler inequality) For every f and g log-
concave and t ∈ (0, 1),

I((t ⊙ f ) ⊕ ((1 − t) ⊙ g)) ≥ I(f )t I(g)1−t . (8.3.9)

Equality holds iff there exists x0 such that g(x) = f (x− x0 ) for every x ∈ Rn .
This means that the functional log(I) is concave in the class of log-concave
functions, equipped with the linear structure given by the operations ⊕ and
⊙ defined by (8.3.2). Note that for f = χK and g = χL (K and L convex
sets), we get the Brunn-Minkowski inequality in its multiplicative form:

V (tK + (1 − t)L) ≥ V (K)t V (L)1−t , (8.3.10)

which is equivalent to the classical inequality.


Theorem 8.11. Let f and g be log-concave, and assume that I(f ) > 0.
Then the limit (8.3.7) denoted by δI(f, g) exists and belongs to (−∞, ∞].
8.4 LOG-CONCAVITY OF NONNEGATIVE SEQUENCES 207

Proof is based on the Prékopka-Leindler inequality (Theorem 8.10):

ε → log(I(f ⊕ ε) ⊙ g)) is concave. 

Note that (i) the assumption I(f ) > 0 can be removed in dimension n = 1.
2
(ii) Choose f (x) = e−|x| /2 and g = e−|x| in dimension n = 1; then
δI(f, g) = +∞.
(iii) For a suitable choice of f and g, δI(f, g) < 0, in contrast with the
idea that δI(f, g) is a mixed integral of f and g (mixed volumes are always
nonnegative).
(iv) Choose g = f ; then we have the formula δI(f, f ) = nI(f ) − E(f ) (no
homogeneity here!), where
Z
E(f ) = − f log(f ) dx (8.3.11)
Rn

is the entropy of f .
8.3.5 Area Measure of Log-Concave Functions. Comparing the formu-
las Z
V (K + εL) − V (L)
lim = hL dσK , (8.3.12)
ε→0+ ε Sn−1

and Z
I((e−u ⊕ ε) ⊙ e−v ) − I(e−v )
lim+ = v ∗ dµf , (8.3.13)
ε→0 ε Rn−1

seems to suggest that dµf is a sort of area measure of f . Note that µf


determines f uniquely, i.e.,

µf = µg =⇒ there exists an x0 ∈ Rn such that g(x) = f (x−x0 ) for all x ∈ Rn .

The proof uses the characterization of equality in the Prékopa-Leindler in-


equality.

8.4 Log-Concavity of Nonnegative Sequences


Hoggar’s theorem states that the sum of two independent discrete-valued log-
concave random variables is itself log-concave. We will determine conditions
under which this result still holds for dependent variables. Log-concavity
of the Stirling numbers of the second kind and of the Eulerian numbers is
established.
The property of log-concavity for nonnegative sequences is defined as fol-
lows: A sequence {u(i), i ≥ 0} is log-concave if, for all i ≥ 1,

u(i)2 ≥ u(i − 1)u(i + 1). (8.4.1)


208 8 LOG-CONCAVE FUNCTIONS

Eq (8.4.1) is also known as the quadratic Newton inequality. Log-concavity is


used in combinatorics, algebra, geometry, computer science, and econometrics.
In probability and statistics it is related to the notion of negative association
of random variables.
Definition 8.2. A random variable V taking values in Z+ is log-concave if
the probability mass function PV (i) = P(V = i) forms a log-concave sequence,
that is, V is log-concave iff for all i

PV (i)2 ≥ PV (i − 1)PV (i + 1). (8.4.2)

Example 8.3. (a) Let Geom(p) denote the geometric distribution with
the probability mass function PX (i) = (1 − p)pi for i ∈ Z+ . For each p,
these random variables represent the ‘edge case’ of the log-concavity property,
whenever Eq (8.4.2) is an identity for all i.
(b) The Poisson distribution P (λ) is log-concave for any λ ≥ 0.
(c) Any binomial distribution is log-concave. 
Definition 8.2 can be generalized as follows: Given sequences PV (v) and
PV +W (x), there exists a two-dimensional array of coefficients PV |W (x) such
that X
PV +W (x) = PV (v) PV +W (x − v|v). (8.4.3)
In fact, the sequence PV |W acts like conditional probability without requiring
the sequence to sum to 1.
Example 8.4. For some p ∈ (0, 1) and α ∈ (0, 1), define the joint distri-
bution of V and W by
 
i+j
P(V = i, W = j) = (1 − p)pi+j αi (1 − α)j , for i, j ≥ 0. (8.4.4)
i

Using the identities


k   ∞  
X k i X i+j j
α (1 − α)k−i = 1, t = (1 − t)−i − 1, for 0 ≤ t < 1,
i=0
i j=0
i

we find that V + W is Geom(p), V is Geom(αp/(αp + (1 − p)), and W is


Geom(p/((1 − α)p + (1 − p))). The conditional probabilities are negative
binomials with
 
i+j
PW |V (j | i) = P(W = j | V = i) = (1−p+αp)i+1 (p(1−α))j .  (8.4.5)
i

Definition 8.3. Given coefficients PW |V and fixed i, define

a(i)
r,s = PW |V (i−r|r)PW |V (i−s|s)−PW |V (i−r−1|r)PW |V (i−s+1|s). (8.4.6)
8.4 LOG-CONCAVITY OF NONNEGATIVE SEQUENCES 209

Then we have
(i)
Condition B. For the quantities ar,s defined by (8.4.6), the following two
conditions must hold for all 0 < t ≤ m ≤ i:
t
X t
X
(i) (i)
(a) am+k,m−k ≥ 0, and (b) am+k+1,m−k ≥ 0. (8.4.7)
k=−t k=−t−1

Example 8.5. (Continued) Note that Condition B holds in Example 8.4,


(i)    i+1
since for any given i, the quantity ar,s is proportional to ri si − i−1
r s .
(i)
Thus, in (8.4.7a), as t increases, the term am+k,m−k is proportional to
        
i i i−1 i+1 i+1 i−1 > 0 for t ≤ T ,
2 − −
m+t m−t m+t m−t m+t m−t < 0 for t > T ,
t
P (i)
for some value of T . Thus, the partial sums am+k,m−k form a sequence
k=−t
which increases for t ≤ T and decreases thereafter (t < T ). Using the identity
P a b   Pm
a+b (i)
j j r−j = r , we find that the sum am+k,m−k = 0. Hence the
k=−m
sequence of partial sums is nonnegative for any t. Using this method, a similar
result follows for the sums in (8.4.7b). 
Lemma 8.1. Fix k ≥ m and suppose that {cj } is a sequence such that
n
P
Cn := cj ≥ 0 for all 0 ≤ n ≤ m. Then for any log-concave sequence p,
j=0
and for any 0 ≤ i ≤ m,
i
X
p(k + j)p(m − j)cj ≥ 0. (8.4.8)
j=0

Proof. Apply Abel’s summation formula (summation by parts) which


yields
i
X i
X
p(k + j)p(m − j)cj = p(k + j)p(m − j)(Cj − Cj−1 )
j=0 j=0
i
X (8.4.9)
= [p(k + j)p(m − j) − p(k + j + 1)p(m − j − 1)]Cj
j=0
+ Ci p(k + i + 1)p(m − i − 1),

where C−1 = 0. The log-concavity of p implies that p(k + j)p(m − j) ≥


p(k + j + 1)p(m − j − 1) for j ≥ 0. Thus, since each Cj ≥ 0, the result
follows. 
210 8 LOG-CONCAVE FUNCTIONS

Theorem 8.12 (Hoggar [1974]) If V and W are independent log-concave


random variables, then their sum V + W is also log-concave. Equivalently,
the convolution of any two log-concave sequences is log-concave.
The proof of Hoggar’s theorem is a special case of the following theorem.
Theorem 8.13. (Johnson and Goldschmidt [2005]) If V is a log-concave
random variable, and random variables V and W satisfy Condition B, then
the sum V + W is also log-concave.
Proof. For any i, the sequence PV +W defined by (8.4.3) satisfies the
log-concavity definition, i.e.,
PV +W (i)2 − PV +W (i − 1)PV +W (i + 1)
i i+1
P P 
= PV (j)PV (k) PW |V (i − j|j)PW |V (i − k|k)
j=0 k=0

−PW |V (i − j − 1|j)PW |V (i − k + 1|k)
i i+1
P P (i)
= PV (j)PV (k) aj,k
j=0 k=0
i
P
≡ (S1 + S2 + S3 ),
j=0

(8.4.10)

where the sums S1 , S2 , S3 correspond to the following three regions: S1 to the


region {j ≥ k}; S2 to the region {j = k − 1}, and S3 to the {j ≤ k − 2}. Also,
in region S3 we use new coordinates (k − 1, j + 1) for j ≤ k − 2. Then the
right-hand side of Eq (8.4.10) can be rewritten as
i X
X i+1 i
X
(i) (i)
PV (j)PV (k) aj,k = PV (j)PV (j + 1)aj,j+1
j=0 k=0 j=0
X  (i) (i)

+ PV (j)PV (k)aj,k + PV (j + 1)PV (k − 1)ak−1,j+1
i≥j≥k (8.4.11)

where
m
X  (i)
S1 = PV (m + k)PV (m − k)am+k,m−k
k=0
(i) 
+ PV (m + k + 1)PV (m − k)am−k,m+k+1 , (8.4.12)
m
X (i)
S2 = PV (m)PV (m + 1)am,m+1 , (8.4.13)
k=0
X  (i) (i)

S3 = PV (j)PV (k)aj,k + PV (j + 1)PV (k − 1)ak−1,j+1 .
i≥j≥k (8.4.14)
8.4 LOG-CONCAVITY OF NONNEGATIVE SEQUENCES 211

The sum S1 in (8.4.12) is further split according to whether r = j + k is even


or odd: If r is even, then m = r/2; if r is odd, then m = (r − 1)/2. We will
consider the case when r is even. In this case
m
X  (i)
S1 = PV (m + k)PV (m − k)am+k,m−k
k=0
(i) 
+ PV (m + k + 1PV (m − k − 1)am−k−1,m+k+1
m
X  (i)
= PV (m + k)PV (m − k)am+k,m−k
k=0
m−1
X (i)
+ PV (m + k + 1PV (m − k − 1)am−k−1,m+k+1
k=0
m
X  (i) (i)
= PV (m)2 a(i)
m,m + PV (m + k)PV (m − k) am+k,m−k + am−k,m+k
k=1
m
X
= PV (m + k)PV (m − k)ck , (8.4.15)
k=0

(i) (i)
where c0 = am,m and ck = am+k,m−k for 1 ≤ k ≤ m. Then condition B(a)
Pt
tells us that k=0 ck ≥ 0 for all 0 ≤ t ≤ m, and so by Lemma 8.1 with
k = m, i = m, Eq (8.4.12) is positive. In the same way we can show that the
sum of the second and third terms in (8.4.11) equals

m
X
PV (m + k + 1)PV (m − k) dk , (8.4.16)
k=0

(i) (i)
where dk = am+k+1,m−k + am−k,m+k+1 for 0 ≤ k ≤ m. Then condition B(b)
Pt
tells us that k=0 dk ≥ 0 for all 0 ≤ t ≤ m, and so, by Lemma 8.1 with
k = m + 1, i = m, Eq (8.4.13) is positive. Hence, PV +W (i)2 − PV +W (i −
1)PV +W (i + 1) ≥ 0 for all i. Other cases are similarly resolved. 
Thus, we have established that the sum of any two independent and iden-
tically distributed geometric random variables (both on the edge case) is a
negative binomial distribution (still, log-concave, but no longer the edge case).
(i)
Next, the quantities aj,k have the following properties for independent ran-
(i) (i) (i)
dom variables: V and W , (i) aj,j+1 ≡ 0 for all j; (ii) ak−1,j+1 = −aj,k ; and
(i)
(iii) if W is log-concave, then aj,k ≥ 0 for j ≥ k.
We fix i, define cj as in Lemma 8.1, and define dj as a sequence such
Pt
that j=0 dj ≥ 0 for all 0 ≤ t ≤ m, and if V and W are independent and
212 8 LOG-CONCAVE FUNCTIONS

log-concave, then

t
X t
X
(i) (i)
a(i)
m,m + am+j,m−j − am+j−1,m−j+1
j=1 j=1
t
X t−1
X
(i) (i) (i)
= a(i)
m,m + am+j,m−j − am+j,m−j = am+t,m−t ≥ 0,
j=1 j=0 (8.4.17)

and
t
X
(i) (i)
am+j+1,m−j − am+j,m−j+1
j=0
t
X t−1
X
(i) (i) (i) (i)
= a(i)
m,m + am+j,m−j − am+j,m−j = am+t+1,m−t − am,m−1
j=1 j=1
(i)
= am+t+1,m−t ≥ 0. (8.4.18)

Hence, Condition B holds for independent and log-concave V and W .

8.5 Log-Concave Distributions


The following probability distributions are log-concave: (i) Normal distri-
bution and multivariate normal distributions; (ii) exponential distribution;
(iii) uniform distribution over any convex set; (iv) logistic distribution; (v)
extreme value distribution; (vi) Laplace distribution; (vii) chi-distribution;
(viii) Wishart distribution, where n ≥ p + 1 (Prékopa [1971]); (ix) Dirichlet
distribution, where all parameters are ≥ 1 (Prékopa [1971]); (x) Gamma dis-
tribution if the shape parameter is ≥ 1; (xi) chi-distribution if the number of
degrees of freedom is ≥ 2; (xii) beta distribution if both shape parameters are
≥ 1; and (xiii) Weibull distribution if the shape parameter is ≥ 1.
All the above parameter restrictions are based on the fact that the exponent
of a nonnegative quantity must be nonnegative in order that the function
remains log-concave.
The following distributions are not log-concave for all parameters: (i) Stu-
dent’s t-distribution; (ii) Cauchy distribution; (iii) Pareto distribution; (iv)
log-normal distribution; and (v) F -distribution.
Although the cumulative distribution function (c.d.f.) of all log-concave
distributions is log-concave, the following non-log-concave distributions also
have log-concave c.d.f.s: (i) log-normal distribution; (ii) Patero distribution;
(iii) Weibull distribution when the shape parameter is < 1; and (iv) gamma
distribution when the shape parameter is < 1.
Some useful properties of log-concave distributions are as follows:
8.5 LOG-CONCAVE DISTRIBUTIONS 213

(i) If the density (p.d.f.) is log-concave, so is its c.d.f.


(ii) If a multivariate density is log-concave, so is the marginal density over
any subset of variables.
(iii) The sum of two independent log-concave random variables is log-
concave, since the convolution of two log-concave functions is log-concave.
(iv) The product of two log-concave functions is log-concave, which means
that the joint densities formed by multiplying two probability densities (e.g.,
normal-gamma distribution, which always has a shape parameter ≥ 1) are log-
concave. This property is extremely useful in general-purpose Gibbs sampling
programs.
Example 8.6. In manufacturing projects, let xm = x + v, where xm de-
notes the manufacturing yield, x ∈ Rn the nominal value of design parameters,
v ∈ Rn the manufacturing errors with zero random value, and S ⊆ Rn , the
specs, i.e., acceptable values of xm . Then the yield Y (x) is the probability
that x + v is in S, i.e., Y (x) = p(x + v) is log-concave, if S is a convex set,
and if the probability density of v is log-concave. 
Example 8.7. Let S = {y ∈ R2 | y1 ≥ 1, y2 ≥ 1}, and let v1 , v2 be
independent, normal with σ = 1. Then, with x + v ∈ S, the yield is
Z ∞  Z ∞ 
1 2 2
Y (x) = p(x + v) = e−t /2
dt e−t /2
dt , (8.5.1)
2π 1−x1 1−x2

where each factor on the right side of (8.5.1) is a normal distribution truncated
at 1 − x1,2 . Next, the maximum yield vs. cost is evaluated as follows: If the
manufacturing cost c = x1 + 2x2 , then the maximum yield for a given cost is
given by
Ymax (c) = sup x1 +2x2 =c Y (x), (8.5.2)
x1 ,x2 ≥0

where Y (x) is log-concave, and

− log Ymax (c) = inf x1 +2x2 =c − log Y (x1 , x2 ). (8.5.3)


x1 ,x2 ≥0

The relation between the cost c and the maximum yield Ymax is presented in
Figure 8.2, where the cost rises as the yield increases. 
Let a density function f be defined as

f (x) ≡ fφ (x) = exp{φ(x)} = exp{−(−φ(x))}, (8.5.4)

where φ is concave (and so −φ is convex). Let us call the class of all such
densities f on R as the class of log-concave densities and denote this class by
P0 ≡ Plog-concave . The function f ∈ P0 is log-concave iff
214 8 LOG-CONCAVE FUNCTIONS

(i) log f (tx + (1 − t)y) ≥ t log f (x) + (1 − t) log f (y) for all t ∈ [0, 1] and for
all x, y ∈ R;
(ii) f (tx + (1 − t)y) ≥ f (x)t · f (y)1−t ;
p
(iii) f ((x + y)/2) ≥ f (x)f (y) (for t = 12 , and assuming f is measurable);
(iv) f ((x + y))2 ≥ f (x)f (y).

Figure 8.1 Cost vs. maximum yield.

1 2
Example 8.8. 1. Standard normal distribution: f (x) = √ e−x /2 ; then
√ 2π
− log f (x) = 12 x2 + log 2π, and (− log f (x))′′ = 1.
1 −|x|
2. Laplace distribution: f (x) = 2e ; then − log f (x) = |x| + log 2,
(− log f (x))′′ = 0 for all x =
6 0.
ex
3. Logistic distribution: f (x) = ; then −logf (x) = −x + 2 log(1 +
(1 + ex )2
ex
ex ), (− log f (x))′′ = = f (x). 
(1 + ex )2

8.6 Exercises
8.1. Prove that if P1 and P2 are log-concave probability measures, then
the product measure P1 × P2 is a log-concave probability measure.
Proof. If a probability measure P in Rn assigns zero mass to every
hyperplane in Rn , then by (8.1.1), log-concavity of P holds if P (tx+(1−t)y) ≥
P (x)t P (y)1−t , 0 < t < 1, for all x, y ∈ Rn . Let A, B denote two rectangular
hyperplanes with sides parallel to the coordinate axes such that all x ∈ A
and all y ∈ B. Then, by the above inequality for these hyperplanes, we have
P (tA + (1 − t)B) ≥ P (A)t P (B)1−t for 0 < t < 1. A similar argument applies
to the product P1 × P2 .
8.6 EXERCISES 215

8.2. Show that Condition B holds in Example 8.5. Solution. For any
(i)    i+1
i, ar,s is proportional to ri si − i−1r s , i.e., for part (a) of Condition B,
(i) (i)
for any given i, the increment term am+t,m−t + am−t,m+t is proportional to
i
 i
 i−1
 i+1
 i+1
 i−1

2 m+t m−t − m+t m−t − m+t m−t , which is positive for t ≤ T and nega-
Pt
(i)
tive for t > T for some value of T > 0. Hence the partial sums am+k,m−k
k=−t
form a sequence which increases for t ≤ T and decreases thereafter. Then,
P  b   m
P (i)
using the identity j aj r−j = a+br , we find that am+k,m−k = 0, and
k=−m
thus the sequence of partial sums must be nonnegative for any t. A similar
argument holds for part (b) of Condition B.

8.3. Show that the density function (p.d.f.) of which of the following
probability distributions is log-concave, log-convex, or log-linear: (i) uniform
distribution; (ii) standard normal distribution; (iii) logistic distribution; (iv)
extreme-value distribution; (v) exponential distribution; (vi) Weibull distri-
bution; (vii) power function distribution; (viii) Gamma distribution; (ix) chi-
square distribution; (x) chi-distribution; (xi) beta distribution; and (xii) Stu-
dent’s t-distribution.
Answer. (i) Uniform distribution, defined on the interval [0, 1], has density
f (x) = 1, which is (weakly) log-concave.
(ii) Standard normal probability distribution has probability density f (x) =
1 −x2 /2
√ e , whence (log f (x))′ = −x and (log f (x))′′ = −1 < 0. Thus, the

density is log-concave.
e−x
(iii) Logistic distribution has density f (x) = , whence (log f (x))′
(1 + e−x )2
= −1+2(1−F (x)), and (log f (x))′′ = −2f (x) < 0; hence, f (x) is log-concave.
(iv) Extreme-value distribution has density function f (x) = exp{−e−x},
giving (log f (x))′′ = −e−x < 0; hence f (x) is log-concave; (v) exponential
distribution has density f (x) = λe−λx , with (log f (x))′′ = 0, and f ′ (x) < 0
for all x ∈ [0, ∞], and hence f (x) is log-concave.
(vi) Weibull distribution with parameter c has density function
 f (x) =

 < 0 for c > 1,
c−1 −xc ′′ −2 c
cx e , x ∈ (0, ∞). Also, (log f (x)) = (1−c)x (1+cx ) = 0 for c = 1,


> 0 for c < 1.
Thus, f (x) is (strictly) log-concave if 0 < c < 1, log-linear if c = 1, and it is
log-convex if c > 1.
c
(vii) Power function distribution has density function f (x) = cxc−1 e−x ,
216 8 LOG-CONCAVE FUNCTIONS

 < 0 for c > 1,

′′ −2 c
x ∈ (0, ∞). Also, (log f (x)) = (1 − c)x (1 + cx ) = 0 for c = 1, Thus,


> 0 for c < 1.
the density function is (strictly) log-concave if 0 < c < 1, log-linear if c = 1,
and log-convex if c > 1.
xm−1 θm e−xθ
(viii) Gamma distribution has density function f (x) = ,x∈
Γ(m)
1−m
(0, ∞), θ > 0 and m > 0. Then (log f (x))′′ = . Thus, the density
x2
function is strictly log-concave for m > 1, but strictly log-convex for m < 1.
(viii) Chi-square distribution with n degrees of freedom is a gamma distri-
bution with θ = 2 and m = n/2. Since the sum of the squares of n indepen-
dent standard normal random variables has a chi-square distribution with n
degrees of freedom, and since the gamma distribution has a log-concave den-
sity function for m ≥ 1, so the sum of the squares of two or more independent
standard normal variables has a log-concave density function.
x(n/2)−1 e−n/2 x2
(ix) chi-distribution has density function f (x) = , x > 0,
2n/2 Γ(n/2)
n−1
where n is a positive integer. Since (log f (x))′′ = − 2 − n < 0, the density
x
function is log-concave.
xa−1 (1 − x)b−1
(x) beta-distribution has density function f (x) = ,x ∈
B(a, b)
1−a 1−b
(0, 1), a, b > 0. Since (log f (x))′′ = + , the density function is
x x
log-concave if a ≥ 1 and b ≥ 1, and log-convex if a < 1 and b < 1. If a < 1
and b > 1, or if a > 1 and b < 1, then the density function is neither log-convex
nor log-concave on (0, 1).
(xi) Student’s t-distribution is defined on the entire real line with density
function
(1 + x2 /n)−n+1/2
f (x) = √ ,
n B(1/2, n/2)
where B(a, b) is the incomplete beta function and n is the number of degrees
n − x2
of freedom. Since (log f (x))′′ = −(n + 1) 2 2
, the density function is
√ √(n + x )
log-concave on the central interval [− n, n], and therefore, it is log-concave

√ this interval but log-convex on each of the outer intervals [∞, − n] and
on
[ n, ∞]. Thus, although√ this
√ distribution is itself not log-concave, a truncate
one on the interval [− n, n] is log-concave.
9
Quadratic Programming

Quadratic programming is used to optimize functions f (x), x ∈ Rn , of the


form n1 xT Qx + cT x subject to equality and inequality constraints, where
n ≥ 2 is an integer, usually very large. We will discuss the iteration methods
to solve such optimization problems.

9.1 Quadratic Programming


Quadratic programs (QP) are minimization programs that have a convex (qua-
dratic) objective and a convex constraint set formed by linear constraints. The
QP primal problem is defined as

1 T
(P) : Minimize 2 x Qx + cT x subject to Ax ≥ b, (9.1.1)

which is equivalent to b−Ax ≤ 0, with any nonnegativity restrictions included


in Ax ≥ b, where x ∈ Rn ; Q is an n × n real, symmetric and positive definite
matrix; A is an m × n real matrix; and b, c ∈ Rm are each a column vector.
Thus, we have a convex objective and linear constraints.
The Lagrangian is

m
X
L(x, λ ) = f (x) + λj gj (x) = 21 xT Qx + cT x + λT (b − Ax). (9.1.2)
j=1


The dual problem is defined as max min L(x, λ ) , that is,
λ ≥0 x

∂L(x, λ )
max L(x, λ ) subject to = 0. (9.1.3)
λ ≥0 ∂x

This yields the dual problem

(D) Maximize L(x, λ ) = 12 xT Qx + cT x + λ T (b − Ax), (9.1.4)


218 9 QUADRATIC PROGRAMMING

T
subject to
T
 Lx (x, λ ) = Qx
T
+ c − TA λ = 0, λ ≥ 0. This dual constraint implies
that x Qx + c − A λ = x 0, where Lx denotes the first-order partial
derivatives L with respect to x, i.e.,

xT Qx + xT c − xT AT λ = xT Qx + xT c − λ T Ax = 0. (9.1.5)

Using (9.1.4), the dual objective function can be rearranged as



λ T b − 12 xT Qx + xT Qx + xT c − λ T Ax = λ T b − 12 xT Qx. (9.1.6)

Thus, the dual problem reduces to

Minimize λT b − 21 xT Qx subject to Qx + c − AT λ = 0, λ ≥ 0. (9.1.7)

There are three particular cases:


Case 1. If Q = 0, then

(P) is min cT x subject to Ax ≥ b,

(D) is max λT b subject to AT λ = c,

which is the regular linear programming pair.


Case 2. If Q 6= 0, then
(P) has n variables, and m linear inequality constraints, and
(D) has (m + n) variables, n equality constraints, and m nonnegativity
constraints.
Case 3. If Q is nonsingular (i.e., Q−1 exists), then from the dual con-
straints we have

Qx + c − AT λ = 0 =⇒ x + Q−1 c − Q−1 AT λ = 0,

i.e.,
x = Q−1 [AT λ − c]. (9.1.8)
Thus, we may eliminate x altogether from the dual problem.
Given any two matrices U and V, we will use the following four known
results: (i) [UV]T = VT UT ; (ii) [UT ]T = U; (iii) UT V = VT U (assuming
compatibility); and (iv) Q and Q−1 be symmetric and identical to their trans-
poses. Then, substituting the value of x from (9.1.6) into the dual objective
function and rearranging, we get

λT b − 12 xT Qx = λT b − 21 [Q−1 AT λ − c]T · Q · [Q−1 AT λ − c)]


9.1 QUADRATIC PROGRAMMING 219

T 
= bT λ − 1
2 AT λ − c · Q−1 · Q · Q−1 · AT λ − c by (i) and (iv)
T  
= bT λ − 12 [ AT λ − cT ] · Q−1 · AT λ − c
λT AQ−1 − cT Q−1 ] · (AT λ − c) by (i) and (ii)
= bT λ − 21 [λ
λT AQ−1 AT λ + cT Q−1 c − cT Q−1 AT λ − λ T AQ−1 c]
= bT λ − 21 [λ
λT AQ−1 AT λ + cT Q−1 c − cT Q−1 AT λ − (AQ−1 c)T λ ] by (iii)
= bT λ − 12 [λ
λT AQ−1 AT λ + cT Q−1 c − cT Q−1 AT λ − cT (Q−1 )T AT λ ] by(i)
= bT λ − 21 [λ
λT AQ−1 AT λ + cT Q−1 c − 2cT Q−1 AT λ ] by (iv)
= bT λ − 21 [λ
= [bT + cT Q−1 AT ] λ − 1
λT
2 [λ AQ
−1 T
A λ ] − 12 [cT Q−1 c]
= uT λ − 1
2
λT v λ ] − 12 [c Q
[λ T −1
c],

where
u = b + AQ−1 c, and v = −AQ−1 AT .

Then for nonsingular Q, the dual is


max uT λ − 21 λ T v λ − 1
2 cT Q−1 c . (9.1.9)
λ ≥0

Example 9.1. Minimize 21 x2 + 21 y 2 − 4x − 4y subject to the constraints

 
−x ≥ −1 −y ≥ −1
0 ≤ x ≤ 1 =⇒ 0 ≤ y ≤ 1 =⇒
x ≥ 0; y ≥ 0,

that is,
Minimize 12 xT Qx + cT x, subject to Ax ≥ b,

where we will take Q as the identity matrix I. Then

  
     −1 0 −1
x 1 0 −4  0 −1   −1 
x= , Q= , c= , A= , b =  .
y 0 1 −4 1 0 0
0 1 0
220 9 QUADRATIC PROGRAMMING

Since Q−1 = Q = I, we get


     
−1 −1 0    3
 −1   0 −1  1 0 −4  3 
u = b + AQ−1 c =  +  = ,
0 1 0 0 1 −4 −4
0 0 1 −4
 
−1 0   
−1 T  0 −1  1 0 −1 0 1 0
v = −AQ A = −  
1 0 0 1 0 −1 0 1
0 1
 
−1 0 1 0
 0 −1 0 1 
= .
1 0 −1 0
0 1 0 −1
Since the dual is given by (9.1.9), the optimal solution from (9.1.8) with the
above computed values of u and v is
 ∗
   λ1∗  
∗ −1 T ∗ 1 0 −1 0 1 0  λ2  −4
x = Q [A λ − c] =  ∗−
0 1 0 −1 0 1 λ3 −4
λ∗4
 
−λ∗1 + λ∗3 + 4
= .  (9.1.10)
−λ2∗ + λ∗4 + 4
The values of λ could be determined from (9.1.10). However, these values are
easily determined using the Hildreth-D’Epso method described in the next
section. The values of λ ∗ so obtained are then used to determine the optimal
value of x∗ . This example will continue after this method is explained.

9.2 Hildreth-D’Esopo Method


This method, which is an iterative method developed by Hildreth [1957] and
D’Esopo [1959], will be applied to the dual (9.1.9). It follows the following
three steps:
Step 1. Start with an initial λ , for example, λ = 0.
Step 2. For j = 1, 2, . . . , m, do the following: Search for the maximum in
∂ L̂
the direction parallel to the λj -axis by fixing λk , k 6= j, and solving =0
∂λj
for λj . If this λj < 0, then fix λj = 0.
Step 3. If λ from this iteration is the same as the λ from the previous
iteration, stop; else go to step 2.
The KKT conditions for the dual (9.1.9) are

∂ L̂
λ ) = u + v λ = 0.
(λ (9.2.1)
∂λλ
9.2 HILDRETH-D’ESOPO METHOD 221

!!! Some authors write this expression as ∇λ L̂(λ


λ ) = u + v λ , which is incor-
rect.
Example 9.1, continued. We start with the KKT conditions (9.2.1),
with
   
3 −1 0 1 0
 3   0 −1 0 1 
u= , v =  ,
−4 1 0 −1 0
−4 0 1 0 −1

which yields

 
∂ L̂/∂λ1 
3
 
−1 0 1 0
 
λ1
∂ L̂  ∂ L̂/∂λ2
  3   0 −1 0 1   λ2 
=
 ∂ L̂/∂λ3  = u + v λ =  −4  +  1  
∂λλ 0 −1 0 λ3
∂ L̂/∂λ4 −4 0 1 0 −1 λ4
   
1 − λ1 + λ3 0
 1 − λ2 + λ4   0 
=  =  , (9.2.2)
−4 + λ1 − λ3 0
−4 + λ2 − λ4 0

which must be solved to obtain the optimal λ ∗ . This becomes simpler using
the Hildreth-D’Espo method, which is as follows. We start with (9.2.1). Then
∂ L̂
First iteration. Let λ = [0 0 0 0]T . Solve = 0, keeping λ2 = 0 =
∂λ1
λ3 = λ4 ; we get −λ1 + λ3 + 1 = 0, which gives λ1 = 1, thus λ = [1 0 0 0]T .
∂ L̂
Next, solve = 0, with λ1 = 1, λ3 = 0 = λ4 ; we get −λ2 + λ4 + 1 = 0,
∂λ2
yielding λ2 = 1. Thus, λ = [1 1 0 0]T .
∂ L̂
Next, solve = 0, with λ1 = 1 = λ2 , λ4 = 0; we get λ1 − λ3 − 4 = 0,
∂λ3
yielding λ3 = −3. Fix λ3 = 0. Thus, λ = [1 1 0 0]T .
∂ L̂
Finally, solve = 0, with λ1 = 1 = λ2 , λ3 = 0; we get λ2 − λ4 − 4 = 0,
∂λ4
yielding λ4 = −3. Fix λ4 = 0. Thus, λ = [1 1 0 0]T .
End of First iteration.
Notice that λ has changed from [0 0 0 0]T to [1 1 0 0]T . So we go to the
second iteration.
Second iteration. It goes through the following four steps:
1. −λ1 + λ3 + 1 = 0 with λ2 = 1, λ3 = 0 = λ4 , yielding λ1 = 1; thus,
λ = [1 1 0 0]T .
222 9 QUADRATIC PROGRAMMING

2. −λ2 + λ4 + 1 = 0 with λ1 = 1, λ3 = 0 = λ4 , yielding λ2 = 1; thus,


λ = [1 1 0 0]T .
3. λ1 − λ3 − 2 = 0 with λ1 = 1 = λ2 , λ4 = 0, yielding λ3 = −3, so fix λ3 = 0;
thus, λ = [1 1 0 0]T .
4. λ2 − λ4 − 2 = 0 with λ1 = 1 = λ2 , λ3 = 0, yielding λ4 = −3, so fix λ4 = 0;
thus, λ = [1 1 0 0]T .
End of second iteration.
The iteration has converged since the value of λ remains the same for these
two iterations. Thus, λ∗1 = 1 = λ∗2 , λ∗3 = 0 = λ∗4 . Then, using (9.1.8), the
optimal primal vector x∗ is given by


  λ1    
1 0 −1 0 1 0  λ2  −4 3
x∗ = Q−1 [AT λ ∗ −c] =   − = .
0 1 0 −1 0 1 λ3 −4 3
λ4

Finally, note that using these values of x the primal objective (9.1.1) yields
    
1 T 1 0 3 3
2 x Qx + cT x = 1
2 [3 3] + [ −4 −4 ] = −13. 
0 1 3 3

9.3 Beale’s Method


This method (Beale [1959]) is used for the quadratic programming problem
1 T
Minimize f (x) = n x Qx + cT x , subject to Ax = b, x ≥ 0. (9.3.1)

First, the vectors x and c and the matrices A and Q are partitioned into
their basic and nonbasic partitions, which are denoted by the index B and N
respectively. Thus, A = [AB | AN ], where AB has columns corresponding to
the basic variables xB and is nonsingular. Again, the constraint in (9.3.1) is
expressed as AB xB + AN xN = b. Thus, if xN = 0, then xB = [AB ]−1 b. In
general, xB = [AB ]−1 b − [AB ]−1 AN xN for x that satisfies Ax = b.
The partitions of c and Q are
 
QB
B QN
B
c = [cB | cN ]T , Q= , (9.3.2)
QB
N QN
N

where QN B T
B = [QN ] .

Example 9.2. Minimize

f (x, y, z) = 13 x2 + 13 y 2 + z 2 + 2xy + 2xz + yz + 3x + y + z, (9.3.3)


9.3 BEALE’S METHOD 223

which is of the matrix form (9.3.1). The algebraic function f (x, y, z) can be
expressed in the matrix form as
    
1 3 3 x x
1
[
3 | x y z ]  3 1 1   y  + [ 3 1 1 ]  y , (9.3.4)
{z } | {z }
xT
3 2 3 z cT
z
| {z } | {z } | {z }
Q x x

subject to
 
  x 
 
x+z =9 1 0 1   9
=⇒ y = , x ≥ 0. (9.3.5)
y + z = 12 0 1 1 12
| {z } z | {z }
A b

An initial choice x = 9, y = 12, z = 0 makes the objective (9.3.3) equal to


330. Also, with this choice we have
 
x0 = x0B | x0N = [x0 y 0 | z 0 ] = [9 12 | 0]T ,
 
 B N
 1 0 1
A = A |A = ,
0 1 1
  T
c = cB | cN = [ 3 1 1] ,
 
1 3 3
 B   
QB ∗ QN B
 3 1 1 
Q= = .
QB
N Q N
N  −− −− − −− 

3 2 3

Also
 −1    
B −1
 1 0 9 9
A b= = = xB ,
0 1 12 12
 −1    
−1 1 0 1 0
AB AN xN = [0] = .
0 1 1 0
The objective function (9.3.1) is
f (x) = 13 xT Qx + cT x
   
QB
B QN
B
  xB
xB
= 13 [ xTB xTN ]   + [ cT cT ]  
B N xN
QN QN xN
 B N

QB xB + QB xN
= 13 [ xTB xTN ]   + cB xB + cN xN
B N
QN xB + QN xN
 T B 
= 3 xB QB xB + xB QN
1 T T B T N
B xN + xN QN xB + xN QN xN + cB xB + cN xN .
224 9 QUADRATIC PROGRAMMING

On substituting xB = (AB )−1 b and (AB )−1 AN xN = 0, we get


n −1 −1 T B B −1 o
f (x) = cB AB b + 31 [ AB b] QB A b

+ [(AB )−1 b]T QNB − [(A )
B −1 T B
b] QB (AB )−1 AN +cN − cTB(AB )−1 AN
n o
B −1 N B −1 N T B B −1 N
+ 31 xTN QN
N − Q B
N (A ) A + [(A ) A ] Q B (A ) A xN .

The objective function can also be written as

f (x) = z + pT + 13 xTN R xN , (9.3.6)

where
−1 −1 −1 o
z = cB AB b + 31 [ AB b]T QB
B A
B
b ,
p = [(AB )−1 b]T QN B −1 T B
B − [(A ) b] QB (AB )−1 AN + cN − cTB (AB )−1 AN ,
B −1 N
R = QN B
N − QN (A ) A + [(AB )−1 AN ]T QB B −1 N
B (A ) A .

Thus, using KKT conditions we have

∂f (x) X
= pi + rik xk (for i ∈ N )
∂xi
k∈N
= pi (at xN = 0). (9.3.7)

At this point we must choose any negative pi and increase the corresponding
value of xi until
(i) one of the basic variables becomes zero (as in the simplex method); or
∂f (x) −pi
(ii) = pi + rii xi = 0, i.e., xi = . Note that this result is nonbasic
∂xi rii
but is a feasible solution.
Example 9.2, continued. We will use iteration method.
(First iteration:) We have
        
3 1 3 1 0 1 1 1
p = [ 9 12 ] −[ 9 12 ] +[1]−[ 3 1] = [−51].
1 3 1 0 1 1 1 0

∂f
Thus, = −51.
∂z
Note that we do not need R, since xN = 0. Next, increase z from 0 until
one of the basic variables goes to zero, i.e., by 9 units when x = 0, y = 3, z = 9.
Then the new objective function is 84, as compared to 330 at the previous
iteration.
9.4 WOLFE’S METHOD 225

(Second iteration) gives xB = [ y z | x ] = [ 3 9 | 0 ]. This iteration


continues to the next one until the two successive iterations give the same
result. At that point the iterative process stops. 

9.4 Wolfe’s Method


This is an iterative procedure developed by Wolfe [1959] to solve maximiza-
tion QP problem numerically with inequality constraints. If the QP problem
is a minimization problem, it must be converted into a maximization problem
by multiplying the constraints by −1. The inequality constraints are then
converted into equality constraints by introducing slack variables. The QP
problem is finally converted into a linear programming (LP) problem by intro-
ducing new variables v = {v1 , v2 , . . . , vn } in Rn , which is solved numerically
by the simplex method.1
We will consider the following maximization QP problem in R2 :
Maximize f (x) = 12 xT Qx + cT x subject to the constraints Ax − b ≤
0, x ≥ 0,
which we rewrite as:
Pn n
P
Maximize 21 cjk xj xk + cj xj , where cjk = ckj , subject to the con-
j,k=1 j=1
n
P
straints aij xj ≤ bj , xj ≥ 0 for i = 1, 2, . . . , m; j = 1, 2, . . . , n; and for all
j=1
j, k, bi ≥ 0 for all i = 1, 2, . . . , m.
n
P
Also assume that the quadratic form cjk xj xk is negative semidefinite
j,k=1
(NSD).
Wolfe’s method is then developed using the following five steps:
Step 1. Convert the inequality constraints to equations by introducing
slack variables s = s2i in the ith constraint (i = 1, 2, . . . , m) and the slack
variable t = t2j in the jth non-negativity constraint (j = 1, 2, . . . , n).
Step 2. Construct the Lagrangian

m
X n
nX o Xn
L(x, s, t, λ , µ ) = f (x) − λi aij xj − bi + s2i − µj {−xj + t2j },
i=1 j=1 j=1

where x = (x1 , x2 , . . . , xn ), s = (s21 , s22 , . . . , s2m ), t = (t21 , t22 , . . . , t2n ), λ =


(λ1 , λ2 , . . . , λm ), and µ = (µ1 , µ2 , . . . , µn ). Next, equate to zero the first

1 The simplex method is an algorithm for starting at some extreme feasible point and,

by a sequence of exchanges, proceeding systematically to the other such points until a


solution point is found. This is done in a way which steadily reduces the value of the
linear function Z. The exchange process involved is essentially the same as used in matrix
inversion. Details of this method can be found in Hildebrand [1974].
226 9 QUADRATIC PROGRAMMING

partial derivatives of L with respect to x, s, t, λ , µ , and obtain the KKT con-


ditions.
Step 3. Introduce the nonnegative vector v = [v1 , v2 , . . . , vn ] in the KKT
n
P m
P
conditions cj + cjk xk − aij λi + µj = 0 for j = 1, 2, . . . , n, and construct
k=1 i=1
the objective function Z = v1 + v2 + · · · + vn .
Step 4. Obtain the initial basic feasible solution for the following linear
programming (LP) problem: min{Zv } = v1 + v2 + · · · + vn subject to the
constraints
n
X m
X
cjk xk − aij λi + µj = −cj for j = 1, 2, . . . , n,
k=1 i=1
Xm
aij xj + s2i = bi ,
i=1

where vj , λj , µj , xj ≥ 0 for i = 1, 2, . . . , m; j = 1, 2, . . . , n, and satisfying the


complementary slackness condition
n
X m
X
µj xj + λi s2i = 0; i.e., λi s2i = 0, and µj xj = 0.
j=1 i=1

Step 5. Apply the two-phase simplex method (Tableau Format)2 to deter-


mine for the LP problem of Step 4 an optimum solution, which must satisfy
the complementary slackness conditions.
The optimum solution obtained in Step 5 is the optimum solution of the
required QP problem.
Example 9.3. (Singh [2012]) Maximize Z = 8x + 10y − 2x2 − x2 subject
to the constraints 3x + 2y ≤ 6 and x, y ≥ 0.
Solution. Introduce all the constraints in the nonpositive form: 3x+2y ≤
6 , and −x ≤ 0, −y ≤ 0, and then introduce the slack variables so that
3x + 2y + s21 = 6, −x + t21 = 0, and −y + t22 = 0. Then the problem reduces to
the LP problem:
Maximize Z = 8x + 10y − 2x2 − y 2 subject to the constraints 3x + 2y + s21 =
6, −x + t21 = 0, −y + t22 = 0.
The Lagrangian is given by

L(x, y, λ1 , µ1 , µ2 , s1 , t1 , t2 ) = 8x + 10y − 2x2 − y 2 + λ1 (6 − 3x − 2y − s21 )


+ µ1 (x − t21 ) + µ2 (y − t2 ),

2 The algorithm and examples can be found in Bertsimas and Tsitsiklis [1997], Bertsekas
[1999], and Cormen et al. [2001].
9.4 WOLFE’S METHOD 227

whence we obtain the KKT conditions


∂L ∂L
= 8 − 4x − 3λ1 + µ1 = 0, = 10 − 2y − 2λ1 + µ2 = 0,
∂x ∂y
∂L ∂L ∂L
= 6 − 3x − 2y − s21 = 0, = x − t21 = 0, = y − t22 = 0.
∂λ1 ∂µ1 ∂µ2

Next, we introduce the linearly independent variables v1 and v2 , and we have


the LP problem:
Maximize Z = −v1 − v2 subject to the constraints 4x + 3λ1 − µ1 + v1 =
8, 3y + 2λ1 − µ2 + v2 = 10, 3x + 2y + s21 = 6.
The details of the solution are given in the following four tables.

Table 1

BV cB xB x(0) y(0) λ1 (0) µ1 (0) µ2 (0) v1 (−1) v2 (−1) s21 (0)


v1 −1 8 4 0 3 −1 0 1 0 0
v2 −1 10 0 2 2 0 −1 0 1 0
s21 0 6 3 2 0 0 0 0 0 1
Z = −18 −4 −2 −5 1 1 0 0 0

Note that λ1 cannot be the entering variable since s21 is the basic variable such
that λ1 s21 = 0. Hence, x is the entering variable since µ1 is not basic; similarly,
y can also be the entering variable since µ2 is not basic. The min-ratio for v1
and s21 , given by (xB /x(0), is (8/4, 6/3) = (2, 2), which is a tie. So we must
take y as the entering variable, since it has the min-ratio (10/2, 6/2) = (5, 3).
This leads to Table 2.

Table 2

BV cB xB x(0) y(0) λ1 (0) µ1 (0) µ2 (0) v1 (−1) v2 (−1) s21 (0)


v1 −1 8 4 0 3 −1 0 1 0 0
v2 −1 4 −3 0 2 0 −1 0 1 0
y 0 3 3/2 1 0 0 0 0 0 1/2
Z = −12 −1 0 −5 1 1 0 0 0

Now λ1 can enter since s21 is not basic, and for v1 and v2 the min-ratio is
(xB /y(0)) = (8/3, 4/2). This leaves the variable v2 , which leads to Table 3.
228 9 QUADRATIC PROGRAMMING

Table 3

BV cB xB x(0) y(0) λ1 (0) µ1 (0) µ2 (0) v1 (−1) v2 (−1) s21 (0)


v1 −1 2 17/2 0 0 −1 3 1 −3 0
λ1 0 2 −3/2 0 1 0 −1 0 1 0
y 0 3 3/2 1 0 0 0 0 0 1/2
Z = −2 −17/2 0 0 1 −3 0 4 0

The min-ratio is (4/17, 2). The final solution follows.

Table 4

BV cB xB x(0) y(0) λ1 (0) µ1 (0) µ2 (0) v1 (−1) v2 (−1) s21 (0)


x 0 4/17 1 0 0 −2/17 6/17 2/17 −6/17 0
λ1 0 40/17 0 0 1 −3/17 8/17 20/17 −6/17 0
y 0 45/17 0 1 0 3/17 −9/17 −3/17 9/17 35/34
Z=0 0 0 0 0 0 1 1 0

Hence, the optimum solution is (x, y, λ1 ) = (4/17, 45/17, 40/17), since v2 =


0 = v1 = µ1 = µ2 = s21 . This solution satisfies the conditions λ1 s21 = 0, µ1 x =
0, µ2 t = 0. The maximum value of Z is max{Z} = 6137/289. 

9.5 Exercises
9.1. For the following optimization problems, let f (x) be the cost function
and g(x) the inequality constraint. Minimize f (x) subject to g(x) ≤ 0, given:
(i) f (x) = x, g(x) = |x| in a domain D ⊂ R;
(ii) f (x) = x3 , g(x) = −x + 1 in a domain D ⊂ R;
(iii) f (x) = x3 , g(x) = −x + 1 in a domain D ⊂ R+ ; and

 −x − 2 for x ≤ −1,

(iv) f (x) = x, g(x) = x for −1 ≤ x ≤ 1 in a domain D ⊂ R,


−x + 2 for x ≥ 1.
Also, in each problem plot the graph, state whether it is convex and if so,
whether the Slater condition (§5.3.2) is satisfied, i.e., gi (x) ≥ 0 for all i.
9.5 EXERCISES 229

Ans. (i) The problem is convex, x∗ = 0 and λ∗ = 1, and the Slater


condition is not satisfied; (ii) the problem is not convex, x∗ = 1, λ3 = 3, and
the Slater condition is not satisfied; (iii) the problem is convex, x∗ = 1, λ∗ = 3,
and the Slater condition is satisfied; and (iv) the problem is not convex,
x∗ = 2, λ∗ = 1, and the Slater condition is not satisfied.

9.2. (a) Minimize the distance between the origin and the convex region
bounded by the constraints x + y ≥ 6, 2x + y ≥ 8, and x, y ≥ 0, and (b) verify
that the KKT necessary conditions are satisfied at the point of minimum
distance.
Solution. Since minimizing the required distance is equivalent to mini-
mizing the distance from the origin to the tangent of the circle that touches
the given convex region, consider a circle x2 + y 2 = r2 , and minimize r2 , or
f (x, y) = x2 +y 2 subject to the constraints x+y ≥ 6, 2x+y ≥ 8, and x, y ≥ 0.
The feasible region lies in the first quadrant x ≥ 0, y ≥ 0. In Figure 9.1, the
lines x + y = 6 and 2x + y = 8 are plotted, and the feasible region is the
shaded region. We will determine a point (x, y) which gives a minimum value
of f (x, y) = x2 + y 2 subject to the given constraints.

Figure 9.1 Feasible region.

dy x
The slope of the tangent to a circle x2 + y 2 = c2 is = − ; the slope of
dx y
the line x + y = 6 is −1, while that of the line 2x + y = 8 is −2. Then
dy x
Case 1. If the line x + y = 6 is tangent to the circle, then = − = −1,
dx y
which gives x = y. Then solving x + y = 6 and x = y, we get x = 3 = y, i.e.,
this line touches the circle at (3, 3).
dy x
Case 2. If the line 2x + y = 8 is tangent to the circle, then =− =
dx y
230 9 QUADRATIC PROGRAMMING

−2, which gives x = 2y. Then solving 2x + y = 8 and 2x = y, we get


x = 16/5, y = 8/5, i.e., this line touches the circle at (16/5, 8/5).
Of these two cases, since the point (16/5, 8/5) lies outside and the point
(3, 3) lies within the feasible region, we have min(f, y) = x2 + y 2 = 18, x =
3, y = 3.
(b) To verify KKT necessary conditions, let
   
x λ1
x= , and λ = .
y λ2

Then f (x) = x2 + y 2 , g(x) = x + y − 6, h(x) = 2x + y − 8, and the KKT


conditions are

L(x, λ ) = f (x) + λ1 (6 − x − y) + λ2 (8 − 2x − y).

subject to the constraints g(x) ≥ 0, h(x) ≥ 0, x ≥ 0, and y ≥ 0. Equating


partial derivatives of L(x, λ ) to zero we get

∂L ∂L
= 2x − λ1 − 2λ2 = 0, = 2y − λ1 − λ2 = 0,
∂x ∂y
∂L ∂L
= 6 − x − y = 0, = 8 − 2x − y = 0,
∂λ1 ∂λ2

and solving these equations simultaneously we find at the point (3, 3) that
λ1 = 6, λ2 = 0. Also, since λ1 (6 − x − y) = 0, λ2 (8 − 2x − y) = 0 at (3, 3), the
KKT conditions are satisfied at the point (3, 3), and min f (x, y) = 18 at this
point.
9.3. Use the KKT conditions to minimize 12 x2 + 21 y 2 − 2x − 2y subject to
the constraint
 
−x ≥ −1 −y ≥ −1
0 ≤ x ≤ 1 =⇒ 0 ≤ y ≤ 1 =⇒
x ≥ 0; y ≥ 0.

Ans. x∗ = [ 1 1 ]T .
9.4. Use KKT conditions to minimize f (x, y) = x2 + 4y 2 − 8x− 16y subject
to the constraints x + y ≤ 5, x ≤ 3, x ≥ 0, y ≥ 0.
       
2 0 −8 1 1 5
Hint. Q = , cT = ,A= , and b = . Ans.
0 8 −16 1 0 3
x∗ = [3 2]T .
9.5. Use KKT conditions to minimize f (x, y, z) = 2x2 + y 2 + 4z 2 subject
to the constraints x + 2y − z = 6, 2x − 2y + 3z = 12, x, y, z ≥ 0.
9.5 EXERCISES 231
   
4 0 0   0  
1 2 −1 6
Hint. Q =  0 2 2  , A = , cT = 0, b = .
2 −2 3 12
0 0 8 0
Ans. x∗ = [5.045 1.194 1.433]T .

9.6. Use Beale’s method to solve the problem: min 21 x2 + 12 y 2 +z 2 +2xy +
2xz +yz + 2x +y + z subject to the constraint x + z = 8, y + z = 10. Hint.
1 2 2  
T 1 0 1
Q = 2 1 1 , c = [ 2 1 1 ], A =
  , and bT = [ 8 10 ].
0 1 1
2 1 2

9.7. For the optimization problem: Minimize kAx − bk22 subject to Gx =


h, (a) find the dual problem; (b) write the KKT conditions, and see if you can
find a solution; (c) find the optimal multiplier vector and the optimal solution
as a function of the optimal multiplier.
Solution. (a) The Lagrangian is

L(x, c) = kAx − bk22 + cT (Gx − h) = xAT Ax + (GT c − 2AT b)T x − cT h.

The dual function (9.2.1) is obtained by minimizing the above strictly convex
function. Thus, we get

1
g(c) = − (GT c − 2AT b)T (AT A)−1 (GT c − 2AT b) − cT h.
4

(b) The KKT conditions are: 2AT (Ax∗ − b) + GT c∗ = 0, and Gx∗ = h.


Since the problem is convex and satisfies the Slater condition, it is feasible
and an optimal point x∗ exists iff the KKT conditions have a solution. But
since by the Weierstrass theorem3 an optimal solution exists, so the KKT
conditions must have a solution. Thus, from the first KKT condition, we get
x∗ = (AT A)−1 (AT b − 12 GT c∗ ), where from the second KKT condition we
have c∗ = −2(G(AT A)GT )−1 (h − G(AT A)−1 AT b).

9.8. Consider the optimization problem in R2+ with the objective function
f (x) = [F1 (x) F2 (x)]T , where F1 (x) = x2 + y 2 and F (x) = (2x + 5)2 .
Then (a) evaluate all Pareto optimal values and points in explicit expressions
for both values and points using the scalarization method; and (b) solve the
scalarization problem with either weight equal to zero; in both cases show
that the solutions of the scalar problem are also Pareto optimal.

3 This theorem states that a continuous function on a nonempty closed bounded set

achieves a maximum and a minimum in this set. In fact, let f : Rn 7→ R be a continuous


real-valued function, and let M = sup {f (x)} and m = inf {f (x)}. Then there is a point
x∈Rn x∈Rn
xM and a point xm such that f (xm ) = m and f (xM ) = M.
232 9 QUADRATIC PROGRAMMING

Solution. Since the problem is convex, all Pareto optimal points can be
obtained using the scalarization method with some weight vector λ ≥ 0. Thus,
fix some λ ≥ 0 and solve the problem: minimize {λ1 (x2 + y 2 ) + λ2 (2x + 5)}.
This problem is equivalent to minimizing {(λ1 +4λ2 )x2 +λ1 y 2 +20λ2 x+25λ2 }.
Any solution of this problem will give a Pareto optimal point and value. Since
the cost function f is strictly convex, the corresponding Pareto optimal point
is given by " #
−10λ2
x∗ (λ1 , λ2 ) = λ1 + 4λ2 ,
0
which, by setting µ = λ2 /λ1 , can be written as
" −10µ #

x (µ) = 1 + 4µ ,
0

thus yielding
 −10µ 2  −20µ 2
F1∗ (µ) = , and F2∗ (µ) = +5 .
1 + 4µ 1 + 4µ

The remaining Pareto optimal points and values are calculated as follows: let
µ → 0 and µ → ∞. As µ → 0, we get x∗ = 0, which gives f ∗ (x) = [0 25].
Next, as µ → ∞, we get x∗ = [− 52 0]T and f ∗ (x) = [ 25
4 0], which corresponds
to minimizing the error in the solution with the minimum norm, and this x∗
is not necessarily a Pareto optimal point.
9.9. Prove that any local optimal point of a convex problem is (globally)
optimal.
Proof. Assume x is locally optimal and y is optimal with f (x) < f (y).
Since x is locally optimal, it means that there is an M such that if z is
feasible and kz − xk2 ≤ M , then f (z) ≥ f (x). Let z = ty + (1 − t)x, where
M
t = . Then ky − xk2 > M for 0 < t < 12 . Since z is a convex
2ky − xk2
combination of two feasible points, it is also feasible. Also, kz − xk2 = M/2,
and thus, f (z) ≤ tf (x) + (1 − t)f (x) + (1 − t)f (y) < f (x) + (1 − t)f (x), which
contradicts the assumption that x is locally optimal. The result also holds in
Rn . 
10
Optimal Control Theory

A Hamiltonian function is involved in dynamic programming of an objective


function on the state variables in optimal control theory. A Hamiltonian
is similar to a Lagrangian in concave programming and requires first-order
conditions.

10.1 Hamiltonian
The Hamiltonian is the operator corresponding to the total energy of the
system in most cases. Its spectrum is the set of possible outcomes when one
measures the total energy of a system. Because of its close relation with
the time-evolution of a system, it is very important in most formulations of
quantum theory. On the other hand, the Hamiltonian in optimal control
theory is distinct from its quantum mechanical definition. Pontryagin proved
that a necessary condition for solving the optimal control problem is that the
control should be chosen so as to minimize the Hamiltonian. This is known
as Pontryagin’s minimum principle which states that a control u(t) is to be
chosen so as to minimize the objective function
Z T
J(u) = Ψ(x(T )) + L(x, u, t) dt, (10.1.1)
0

where x(t) is the system state which evolves according to the state equa-
tions ẋ = f (x, u, t), x(0) = x0 , t ∈ [0, T ], and the control must satisfy the
constraints a ≤ u(t) ≤ b, t ∈ [0, T ].

10.2 Optimal Control


Problems in optimal control theory involve continuous time, a finite time
horizon, and fixed endpoints. They are generally written as
Z T
Maximize J = f (x(t), y(t), t) dt, (10.2.1)
0

subject to ẋ = g(x(t), y(t), t), x(0) = x0 , x(T ) = xT , where J is the value of


the functional to be optimized; x(t) is the state variable which changes over
234 10 OPTIMAL CONTROL THEORY

time according to the differential equation set equal to zero in the constraint;
y(t) is the control variable whose value is selected or controlled to optimize
J; t denotes time; and ẋ denotes the time derivative of x, i.e., dx/dt. The
solution of the optimal control problem (10.2.1) is obtained to set the limits
of the optimal dynamic time path for the control variable y(t).
The dynamic optimal control problems involve the Hamiltonian function H
similar to the Lagrangian in static optimal control problems. The Hamiltonian
is defined as

H(x(t), y(t), λ(t), t) = f (x(t), y(t), λ(t), t) + λ(t)g(x(t), y(t), t), (10.2.2)

where, unlike the static problems, the multiplier λ(t), called the costate vari-
able, is a function of t and estimates the marginal value or shadow price of
the associate state variable x(t). The method of solving problems of the type
(10.2.1) is similar to that used for solving the static optimization problem
involving the Lagrangian. Thus, assuming that the Hamiltonian is differen-
tiable in y and strictly concave so that there is an interior solution, and not
an endpoint solution, the necessary conditions for maximization are

∂H
(a) = 0,
∂y
∂λ ∂H
(b) = λ̇ = − ,
∂t ∂x
∂x ∂H (10.2.3)
(c) = ẋ = ,
∂t ∂λ
(d) x(0) = x0 ,
(e) x(T ) = xT .

Conditions (a), (b) and (c) are known as the maximum principle, and con-
ditions (d) and (e) as the boundary conditions; the two equations of motion
in conditions (b) and (c) are called the Hamiltonian system or the control
system.
For minimization, the objective functional can simply be multiplied by −1,
as in concave programming. If the solution does not involve an end point,
∂H
need not be equal to zero as in condition (a).
∂y
Example 10.1. Solve the following dynamic optimal control problem:
Z 4
Maximize (5x − 6y 2 ) dt subject to ẋ = 6y, x(0) = 2, x(4) = 98.
0

The Hamiltonian is
H = 5x − 6y 2 + λ(6y).
10.2 OPTIMAL CONTROL 235

Using conditions (10.2.1) we get from conditions (a)–(c):


∂H
= −12y + 6λ = 0, which gives y = 0.5λ,
∂y
(10.2.4a)
∂H
λ̇ = − = −5, (10.2.4b)
∂x
∂H
ẋ = = 6y, (10.2.4c)
∂λ
Next, we integrate (10.2.4b) to get:

λ = −5t + c1 . (10.2.4d)

Also from (10.2.4a), (10.2.4c) and (10.2.4d) we find that ẋ = 6(0.5λ) = 3λ =


3(−5t + c1 ) = −15t + 3c1 , which upon integration gives

x(t) = −7.5t2 + 3c1 t + c2 ,

where the arbitrary constants c1 and c2 are determined using the boundary
conditions x(0) = 2, x(4) = 98, giving c2 = 2, c1 = 18. Hence,

x(t) = −7.5t2 + 54t + 2 (state variable),


λ(t) = −5t + 18 (costate variable),
y(t) = 0.5λ = 0.5(−5t + 18) = −2.5t + 9 (control variable).

The equation of motion ẋ = 6y also gives the above control variable, since
−15t + 54 = 6y yields y = −2.5t + 9. Finally, at the endpoints we have
y(0) = 9, y(4) = −1. Thus the optimal path of the control variable y(t) is
linear, starting at the point (0, 9) and ending at the point (4, −1), with a slope
of −10/4 = −2.5. 

10.2.1 Sufficient Conditions for Optimization. Assuming that the nec-


essary conditions (10.2.3) for maximization in control theory are satisfied, the
sufficiency conditions are satisfied if
(i) The objective functional f (x(t), y(t), t) and the constraint g(x(t), y(t), t)
are both differentiable and both concave in x and y; and
(ii) λ(t) = 0 if the constraint is nonlinear in x and y. However, if the constraint
is linear, λ may assume any sign.
Recall that linear functions are either concave or convex, but neither strictly
concave nor strictly convex. In the case of nonlinear functions, the easiest
test for joint concavity is the following discriminant test: Given the Hessian
(discriminant of the second-order derivatives) of the function f ,

fxx fxy
H= , (10.2.5)
fyx fyy
236 10 OPTIMAL CONTROL THEORY

a function is strictly concave if the discriminant is negative definite, i.e., if


|H1 | = fxx < 0 and |H2 | = |H| > 0;
and simply concave if the discriminant is negative semidefinite, i.e., if
|H1 | = fxx ≤ 0 and |H2 | ≥ |H| ≥ 0.
A negative definite discriminant implies a global maximum and is always
sufficient for a maximum. A negative semidefinite discriminant indicates a
local maximum and is sufficient for a maximum if the test is conducted for
every possible ordering of the variables with similar results (see §A.5).

10.3 Free Endpoint


In the case of a free endpoint, say the upper endpoint, the optimal control
problem becomes
Z T
Maximize J = f (x(t), y(t), t) dt
0
subject to ẋ = g(x(t), y(t), t), x(0) = x0 , x(T ) free.
(10.3.1)
Then, assuming that the Hamiltonian is differentiable in y and strictly concave
so that there is an interior solution, the necessary conditions for maximization
remain the same as in (10.2.1) except for condition (e) which becomes
(e) λ(T ) = 0. (10.3.2)
Such a boundary condition is called the transversality condition for a free
endpoint. The justification for condition (10.3.2) is as follows: If the value of
x(T ) is free to vary, the constraint must also be free (i.e., nonbinding), and
so the shadow price λ evaluated at x = T must be zero.
Example 10.2. Solve the following optimal control problem with a free
endpoint:
Z 3
Maximize (5x − y 2 ) dt subject to ẋ = 4y, x(0) = 2, x(3) free.
0

The Hamiltonian is H = 5x − y 2 + λ(4y). Using conditions (10.2.1) we get


from conditions (a)–(c):
∂H
= −2y + 4λ = 0, which gives y = 2λ,
∂y
(10.3.3a)
∂H
λ̇ = − = −5, (10.3.3b)
∂x
∂H
ẋ = = 4y, (10.3.3c)
∂λ
10.4 INEQUALITY CONSTRAINTS AT THE ENDPOINTS 237

Then (10.3.3b) gives


λ(t) = −5t + c1 . (10.3.3d)
The constant c1 is determined using the boundary condition λ(3) = 0 in
(10.3.3d), which gives c1 = 15. From (10.3.3a), (10.3.3c) and (10.3.3d), we
get ẋ = −40t + 120, which upon integration gives

x(t) = −20t2 + 120t + c2 ,

where c2 is determined using condition (d) x(0) = 2, giving c2 = 2. Thus,

x(t) = −20t2 + 120t + 2.

From (10.3.3a) we get the control variable

y(t) = 2λ = −10t + 30,

with the endpoint values y(0) = 30 and y(3) = 0. Thus, the optimal path
of the control variable is linear, from point (0, 30) to (3, 0), with the slope of
−30/3 = −10. 

10.4 Inequality Constraints at the Endpoints


In the case when the terminal value of the state variable x(t) is subject to an
inequality constraint of the type x(T ) ≥ xmin , the optimal value x∗ (T ) may
be chosen freely as long as it does not violate the value set by the constraint
xmin . Thus, if x∗ (T ) > xmin , the constraint becomes nonbinding like the case
of a free endpoint, and we can take λ(T ) = 0 when x∗ (T ) > xmin . In such a
case, conditions (a) through (d) of (10.2.3) remain the same, but the condition
(e) is replaced by λ(T ) = 0, as in the case of a free endpoint. However, if
x∗ (T ) < xmin , the constraint is binding and the optimal solution will require
setting x(T ) = xmin , i.e., a fixed-end problem with

λ(T ) ≥ 0 when x∗ (T ) = xmin . (10.4.1)

The endpoint problems are sometimes reduced to a single statement

λ(T ) ≥ 0, x(T ) ≥ xmin , [x(T ) − xmin ] λ(T ) = 0, (10.4.2)

which is similar to the KKT condition. In practice, however, the problems


with inequality constraints are easy to solve if we follow the following three
steps:
Step 1. Solve the problem as if it were a free endpoint problem.
Step 2. If the optimal value of the state variable x(t) is greater than the
minimum required by the endpoint condition, i.e., if x( T ) ≥ xmin , then the
correct solution has been found.
238 10 OPTIMAL CONTROL THEORY

Step 3. If x∗ (T ) < xmin , set the terminal endpoint equal to the value of
the constraint, x(T ) = xmin , and solve as a fixed endpoint problem.
Example 10.3. Solve
Z 3
Maximize (5x − y 2 ) dt subject to ẋ = 4y, x(0) = 2, x(3) ≥ 180.
0

First, solve it as an unconstrained problem with a free endpoint. From Ex-


ample 10.2, we have:

x(t) = −20t² + 120t + 2, which gives x(3) = 182 > 180.

Since the free endpoint solution satisfies the terminal endpoint constraint
x(T) ≥ 180, the constraint is not binding and we thus have a proper solution,
where from Example 10.2, the control variable is y(t) = −10t + 30. 
Example 10.4. Solve the same problem as in Example 10.3 but with
the new boundary conditions: x(0) = 5 and x(4) ≥ 190. First, we will use
the complementary slackness condition to find the solution by assuming that
x(4) − S² = 190, where S is called the slackness variable. There are two cases
to consider:
Case 1. λ = 0: Then −2y = 0, or y = 0. Also, ẋ = 4y = 0 gives
x(t) = a₁, which using the initial condition x(0) = 5, gives a₁ = 5. Then the
terminal condition x(4) = S² + 190 gives S² + 190 = 5, or S² = −185, which
is infeasible.
Case 2. S = 0: Then the first two steps are the same as in Example 10.3
solved as a free endpoint problem. The maximum principle gives

y = 2λ, λ̇ = −5, ẋ = −40t + 8c₁,

thus giving λ(t) = −5t + c₁, x(t) = −20t² + 8c₁t + c₂. Now the new boundary
conditions are x(0) = 5 and x(4) = 190, which yield c₂ = 5 and c₁ = 505/32 ≈ 15.78.
Hence, λ(t) = −5t + 15.78, x(t) = −20t² + 126.25t + 5, and y(t) = −10t + 31.56. 

10.5 Discounted Optimal Control


Optimal control problems involving discounting are expressed as follows:
Maximize J = ∫₀ᵀ e^{−pt} f(x(t), y(t), t) dt
subject to ẋ = g(x(t), y(t), t), x(0) = x₀, x(T) free.      (10.5.1)

The Hamiltonian for this problem is

H = e^{−pt} f(x(t), y(t), t) + λ(t) g(x(t), y(t), t).

If we set µ(t) = e^{pt} λ(t), then the Hamiltonian is modified to

Hc = e^{pt} H = f(x(t), y(t), t) + µ(t) g(x(t), y(t), t),      (10.5.2)

where Hc is called the current value Hamiltonian. The above optimization


problem can be solved by the above method for a free endpoint condition,
where H is replaced by Hc and λ by µ. Then λ̇ = −∂H/∂x = −(∂Hc/∂x) e^{−pt}. But
since λ(t) = µ(t) e^{−pt}, we get by differentiating, λ̇ = µ̇ e^{−pt} − pµ e^{−pt}. By
equating these two expressions for λ̇ and canceling the term e^{−pt}, condition
(b) in (10.2.3) is replaced by µ̇ = pµ − ∂Hc/∂x. The boundary conditions (d)
and (e) are similarly adjusted. Thus, assuming an interior solution exists, the
necessary conditions for the current-value Hamiltonian Hc are
(a) ∂Hc/∂y = 0,
(b) µ̇ = pµ − ∂Hc/∂x,
(c) ẋ = ∂Hc/∂µ,      (10.5.3)
(d) x(0) = x₀,
(e) µ(T) e^{−pT} = 0.

If the solution is not an interior one, then ∂Hc/∂y ≠ 0 in condition
(a), but Hc must still be maximized with respect to y. Since Hc = H e^{pt},
the value of y that maximizes Hc will also maximize H, since e^{pt}, being
independent of y, is treated like a constant when maximizing with respect
to y. The sufficiency conditions, depending on the sign of the Hessian |H|,
remain the same for the discounted optimal control problem.
Example 10.5. Maximize ∫₀³ e^{−0.02t} (x − 2x² − 5y²) dt, subject to ẋ =
y − 0.5x, x(0) = 90, and x(3) free. Solution. First, we check the sufficient
conditions to ensure that this problem has a global maximum. The Hessian

|H| = det [fxx  fxy; fyx  fyy] = det [−4  0; 0  −10] = 40 > 0,

and the first principal minor |H₁| = −4 < 0, while the second principal minor
|H₂| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is
concave in both x and y, and g = y − 0.5x is linear, so conditions for a global


maximum are satisfied.
Now, the current-valued Hamiltonian is

Hc = x − 2x2 − 5y 2 + µ(y − 0.5x).

Applying the maximum principle (10.5.3) and using p = 0.02, we get

∂Hc/∂y = −10y + µ = 0, which gives y = 0.1µ,
µ̇ = pµ − ∂Hc/∂x = 0.02µ − (1 − 4x − 0.5µ) = 0.52µ + 4x − 1,
ẋ = ∂Hc/∂µ = y − 0.5x = 0.1µ − 0.5x.

To solve for µ and x, the last two equations above are written in the matrix
form Ẏ = AY + B as

[µ̇; ẋ] = [0.52  4; 0.1  −0.5] [µ; x] + [−1; 0].

The characteristic equation for this system of equations is

|A − rI| = det [0.52 − r  4; 0.1  −0.5 − r] = 0,

where, using formula (A.20) with λ replaced by r so it does not conflict with
the Lagrange multiplier λ, the characteristic roots are

r₁,₂ = (0.02 ± √((0.02)² − 4(−0.66)))/2 = 0.82245, −0.80245.
For r₁ = 0.82245, the eigenvector ỹ₁ is determined by solving the equation

[0.52 − 0.82245  4; 0.1  −0.5 − 0.82245] [c₁; c₂] = [−0.30245  4; 0.1  −1.32245] [c₁; c₂] = 0,

which gives −0.30245c₁ + 4c₂ = 0, or c₁ = 13.2253c₂, so that

ỹ₁ = k₁ [13.2253; 1] e^{0.82245t}.

For r₂ = −0.80245, the eigenvector ỹ₂ is obtained by solving

[0.52 + 0.80245  4; 0.1  −0.5 + 0.80245] [c₁; c₂] = [1.32245  4; 0.1  0.30245] [c₁; c₂] = 0,

which gives 1.32245c₁ + 4c₂ = 0, or c₁ = −3.0247c₂, so that

ỹ₂ = k₂ [−3.0247; 1] e^{−0.80245t}.

The particular solution is given by Y* = −A⁻¹B, or using (A.10), by

Y* = [µ*; x*] = −(1/(−0.66)) [−0.5  −4; −0.1  0.52] [−1; 0] = [0.758; 0.152].

Adding the complementary and particular solutions we get

µ(t) = 13.2253k₁ e^{0.82245t} − 3.0247k₂ e^{−0.80245t} + 0.758,
x(t) = k₁ e^{0.82245t} + k₂ e^{−0.80245t} + 0.152.

Now, we apply the boundary conditions: Since µ(T) e^{−pT} = 0 at the free
endpoint, we get at T = 3,

µ(3) e^{−0.02(3)} = 0,
or
[13.2253k₁ e^{0.82245(3)} − 3.0247k₂ e^{−0.80245(3)} + 0.758] e^{−0.06} = 0,
or
155.95k₁ − 0.2724k₂ + 0.758 = 0.      (10.5.4)

Also, at t = 0, x(0) = 90, so we have

k₁ + k₂ + 0.152 = 90.      (10.5.5)

Solving (10.5.4) and (10.5.5) simultaneously by Cramer's rule, we get k₁ =
0.1518 and k₂ = 89.696. Hence,

µ(t) = 2.008 e^{0.82245t} − 271.30 e^{−0.80245t} + 0.758, costate variable,
x(t) = 0.1518 e^{0.82245t} + 89.696 e^{−0.80245t} + 0.152, state variable,

and

y(t) = 0.1µ(t) = 0.2008 e^{0.82245t} − 27.130 e^{−0.80245t} + 0.0758, control variable. 
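
The eigenvalue computations above are tedious by hand. A numerical sketch
(assuming NumPy; note that numpy returns unit-norm eigenvectors, so the
constants k₁, k₂ below differ from those in the text by scale factors, and the
order of the roots is not guaranteed) is:

import numpy as np

A = np.array([[0.52, 4.0],
              [0.10, -0.5]])
B = np.array([-1.0, 0.0])

r, V = np.linalg.eig(A)          # characteristic roots and eigenvector columns
Yp = -np.linalg.solve(A, B)      # particular solution, approx (0.758, 0.152)

T = 3.0
# boundary conditions: mu(3) = 0 (first row) and x(0) = 90 (second row)
M = np.array([[V[0, 0]*np.exp(r[0]*T), V[0, 1]*np.exp(r[1]*T)],
              [V[1, 0],                V[1, 1]]])
k = np.linalg.solve(M, np.array([-Yp[0], 90.0 - Yp[1]]))

print(r)   # approx [ 0.82245 -0.80245 ]
print(k)   # weights on the eigenvector columns in mu(t), x(t)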



10.6 Exercises
Free endpoints:
10.1. Maximize ∫₀² (3x − 2y²) dt subject to ẋ = 8y, x(0) = 5, and x(2) free.
Hint. Hamiltonian H = 3x − 2y² + λ(8y); ẋ = 16λ, x(t) = −24t² + 96t + 5
(state variable); y(t) = −6t + 12 (control variable). The optimal path of the
control variable is linear, starting at (0, 12) and ending at (2, 0) with slope −6.
10.2. Solve the optimal control problem with a free endpoint:

Maximize ∫₀⁴ (5x − 2y²) dt subject to ẋ = 8y, x(0) = 2, x(4) free.

Solution. We have ∂H/∂y = −4y + 8λ = 0, which gives y = 2λ, and
λ̇ = −∂H/∂x = −5, ẋ = ∂H/∂λ = 8y. Then λ(t) = −5t + c₁, where c₁ = 20 using
the boundary condition λ(4) = 0. Also, since ẋ = −80t + 16c₁, we get on
integrating, x(t) = −40t² + 320t + c₂, where c₂ = 2 using condition x(0) = 2.
Thus, x(t) = −40t² + 320t + 2, and then y(t) = 2λ = −10t + 40, with the
endpoint values y(0) = 40 and y(4) = 0. Thus, the optimal path of the control
variable is linear, from point (0, 40) to (4, 0), with the slope of −40/4 = −10.
10.3. Maximize ∫₀³ (9x − 12y²) dt, ẋ = 18y, x(0) = 5, x(3) free.
Solution. First we check if the sufficiency condition is met. Since fxx = 0,
fxy = 0, and fyy = −24, the Hessian [0  0; 0  −24] is negative semidefinite, so
f is concave in x and y, and g = 18y is linear; thus the conditions for a global
maximum are satisfied.
The Hamiltonian is

H = 9x − 12y² + λ(18y).

Then we have

∂H/∂y = −24y + 18λ = 0, which gives y = 0.75λ,
λ̇ = −∂H/∂x = −9,
ẋ = ∂H/∂λ = 18y = 18(0.75λ) = 13.5λ.

Integrating the first of the last two equations, we get λ(t) = −9t + c₁. Then
ẋ = 13.5(−9t + c₁) = −121.5t + 13.5c₁, which on integrating gives

x(t) = −60.75t² + 13.5c₁t + c₂.

Using the transversality condition λ(3) = 0, we find that c₁ = 27, and thus,
the costate variable is λ(t) = −9t + 27. Condition x(0) = 5 gives c₂ = 5, and
the state variable is

x(t) = −60.75t² + 364.5t + 5,

and the control variable is given by

y(t) = 0.75λ(t) = −6.75t + 20.25.


10.4. Maximize ∫₀² (3y − y² − x − 2x²) dt, ẋ = x + 2y, x(0) = 6, x(2) free.
Solution. First we check if the sufficiency condition is met. The Hessian

|H| = det [fxx  fxy; fyx  fyy] = det [−4  0; 0  −2] = 8 > 0,

and the first principal minor |H₁| = −4 < 0, while the second principal minor
|H₂| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is
concave in both x and y, and g = x + 2y is linear, so conditions for a global
maximum are satisfied.
The Hamiltonian is

H = 3y − y² − x − 2x² + λ(x + 2y).

Then we have

∂H/∂y = 3 − 2y + 2λ = 0, which gives y = λ + 1.5,
λ̇ = −∂H/∂x = 1 + 4x − λ,
ẋ = ∂H/∂λ = x + 2y = x + 2λ + 3.

The last two equations in matrix form Ẏ = AY + B are

[λ̇; ẋ] = [−1  4; 2  1] [λ; x] + [1; 3].

The characteristic roots are given by (|A| = −9, tr A = 0)

r₁,₂ = (0 ± √((0)² − 4(−9)))/2 = ±3.

The eigenvector ỹ₁ for the root r₁ = 3 is obtained from

[−1 − 3  4; 2  1 − 3] [c₁; c₂] = [−4  4; 2  −2] [c₁; c₂] = 0,

or −4c₁ + 4c₂ = 0, which gives c₁ = c₂. Thus,

ỹ₁ = k₁ [1; 1] e^{3t}.

The eigenvector ỹ₂ for the root r₂ = −3 is obtained from

[−1 + 3  4; 2  1 + 3] [c₁; c₂] = [2  4; 2  4] [c₁; c₂] = 0,

or 2c₁ + 4c₂ = 0, which gives c₁ = −2c₂. Thus,

ỹ₂ = k₂ [−2; 1] e^{−3t}.

For the particular solution we get

[λ*; x*] = −A⁻¹B = −(1/(−9)) [1  −4; −2  −1] [1; 3] = [−11/9; −5/9] ≈ [−1.222; −0.556],

which gives

λ(t) = k₁ e^{3t} − 2k₂ e^{−3t} − 1.222,
x(t) = k₁ e^{3t} + k₂ e^{−3t} − 0.556.      (10.6.1)

The transversality condition λ(2) = 0 and the initial condition x(0) = 6, when
applied to the above two equations, respectively, give

403.43k₁ − 0.00496k₂ = 1.222,
k₁ + k₂ = 6.556,

which when solved simultaneously, e.g., by Cramer's rule, give k₁ = 0.0031 and
k₂ = 6.5525. Hence, from Eq (10.6.1) we get

λ(t) = 0.0031 e^{3t} − 13.105 e^{−3t} − 1.222, costate variable
x(t) = 0.0031 e^{3t} + 6.5525 e^{−3t} − 0.556, state variable
y(t) = λ(t) + 1.5 = 0.0031 e^{3t} − 13.105 e^{−3t} + 0.278, control variable.
10.5. Maximize ∫₀⁴ (5x − 2y²) dt subject to ẋ = 8y, x(0) = 2, x(4) ≥ 800.
Solution. From Exercise 10.2 we have

λ(t) = −5t + 20,
x(t) = −40t² + 320t + 2,
y(t) = −10t + 40.

Now we evaluate x(t) at t = 4: x(4) = 642 < 800, so the constraint is
violated. Thus, we redo this problem with the new fixed endpoint conditions
x(0) = 2 and x(4) = 800. Then from Exercise 10.2, since λ(t) = −5t + c₁
and x(t) = −40t² + 16c₁t + c₂, we apply the new endpoint conditions and get
x(0) = c₂ = 2, and x(4) = −640 + 64c₁ + 2 = 800, giving c₁ = 22.47. Hence,

λ(t) = −5t + 22.47, costate variable
x(t) = −40t² + 359.5t + 2, state variable
y(t) = 2λ(t) = −10t + 44.94, control variable.

10.6. Solve the same problem as in Exercise 10.2 but with the new boundary
conditions: x(0) = 5 and x(4) ≥ 650. The first two steps are the same as in
Example 10.4 solved as a free endpoint problem. The maximum principle gives

y = 2λ, λ̇ = −5, ẋ = −80t + 16c₁,

thus giving λ(t) = −5t + c₁, x(t) = −40t² + 16c₁t + c₂. Now the new boundary
conditions are x(0) = 5 and x(4) = 650, which yield c₂ = 5 and c₁ = 1285/64 ≈ 20.08.
Hence, λ(t) = −5t + 20.08, x(t) = −40t² + 321.25t + 5, and y(t) = 2λ(t) = −10t + 40.16.
10.7. Solve

Maximize ∫₀⁴ (5x − 2y²) dt subject to ẋ = 8y, x(0) = 2, x(4) ≥ 620.

First, solve it as an unconstrained problem with a free endpoint. From Exer-
cise 10.2, we have:

x(t) = −40t² + 320t + 2, which gives x(4) = 642 > 620.

Since the free endpoint solution satisfies the terminal endpoint constraint
x(4) ≥ 620, the constraint is not binding and we thus have a proper solution,
where from Exercise 10.2, the control variable is y(t) = −10t + 40.
10.8. Maximize ∫₀¹ (8x + 3y − 2y²) dt subject to ẋ = 8y, x(0) = 9, x(1) ≥ 90.
Solution. The Hamiltonian is

H = 8x + 3y − 2y² + λ(8y).

Then, using conditions (10.2.3) we have

∂H/∂y = 3 − 4y + 8λ = 0, which gives y = 2λ + 0.75,
λ̇ = −∂H/∂x = −8,      (10.6.2)
ẋ = ∂H/∂λ = 8y = 16λ + 6.

First, we will use the complementary slackness condition to find the solution
by assuming that x(1) − S² = 90. There are two cases to consider:
Case 1. λ = 0: Then 3 − 4y = 0, or y = 0.75. Also, ẋ = 6 gives
x(t) = 6t + a₁, which using the initial condition x(0) = 9, gives a₁ = 9. Then
the terminal condition x(1) = S² + 90 gives 6 + 9 − S² = 15 − S² ... wait, i.e.,
15 = S² + 90, or S² = −75, which is infeasible.
Case 2. S = 0: Then integrating the last two equations in (10.6.2), we get

λ(t) = −8t + c₁,
ẋ = 16(−8t + c₁) + 6 = −128t + 16c₁ + 6.      (10.6.3)

The last equation on integration gives

x(t) = −64t² + (16c₁ + 6)t + c₂.

Using the initial condition x(0) = 9, we get c₂ = 9. Also, using the transver-
sality condition λ(1) = 0, we get c₁ = 8. Thus,

λ(t) = −8t + 8,
x(t) = −64t² + 134t + 9.

Now, to see if this solution is acceptable, we check x(1) = 79 < 90. So the
terminal constraint is violated. Thus, in this situation we solve the problem
with a fixed endpoint condition x(1) = 90. Then from Eqs (10.6.3), condition
x(0) = 9 gives c₂ = 9, and the new constraint x(1) = 90 gives c₁ = 8.6875.
Hence,

λ(t) = −8t + 8.6875, costate variable
x(t) = −64t² + 145t + 9, state variable
y(t) = 2λ(t) + 0.75 = −16t + 18.125, control variable.
10.9. Maximize ∫₀³ e^{−0.04t} (xy − x² − y²) dt subject to ẋ = x + 2y,
x(0) = 130.2, and x(3) free. Solution. First, we check the sufficient conditions
to ensure that this problem has a global maximum. The Hessian

|H| = det [fxx  fxy; fyx  fyy] = det [−2  1; 1  −2] = 3 > 0,

and the first principal minor |H₁| = −2 < 0, while the second principal minor
|H₂| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is
concave in both x and y, and g = x + 2y is linear, so conditions for a global
maximum are satisfied.
Now, the current-valued Hamiltonian is

Hc = xy − x² − y² + µ(x + 2y).

Applying the maximum principle (10.5.3) and using p = 0.04, we get

∂Hc/∂y = x − 2y + 2µ = 0, which gives y = 0.5(x + 2µ),
µ̇ = pµ − ∂Hc/∂x = 0.04µ − (y − 2x + µ) = −1.96µ + 1.5x,
ẋ = ∂Hc/∂µ = x + 2y = 2x + 2µ.

To solve for µ and x, the last two equations above are written in the matrix
form Ẏ = AY + B as

[µ̇; ẋ] = [−1.96  1.5; 2  2] [µ; x] + [0; 0].

The characteristic equation for this system of equations is

|A − rI| = det [−1.96 − r  1.5; 2  2 − r] = 0,

where, using formula (A.20), the characteristic roots are

r₁,₂ = (0.04 ± √((0.04)² − 4(−6.92)))/2 = 2.65, −2.61.

For r₁ = 2.65, the eigenvector ỹ₁ is determined by solving the equation

[−1.96 − 2.65  1.5; 2  2 − 2.65] [c₁; c₂] = [−4.61  1.5; 2  −0.65] [c₁; c₂] = 0,

which gives −4.61c₁ + 1.5c₂ = 0, or c₂ = 3.073c₁, so that

ỹ₁ = k₁ [1; 3.073] e^{2.65t}.

For r₂ = −2.61, the eigenvector ỹ₂ is obtained by solving

[−1.96 + 2.61  1.5; 2  2 + 2.61] [c₁; c₂] = [0.65  1.5; 2  4.61] [c₁; c₂] = 0,

which gives 0.65c₁ + 1.5c₂ = 0, or c₁ = −2.308c₂, so that

ỹ₂ = k₂ [−2.308; 1] e^{−2.61t}.

The particular solution is given by Y* = −A⁻¹B; since B = 0, we get Y* = [0; 0].
Thus, adding the complementary and particular solutions, we get

µ(t) = k₁ e^{2.65t} − 2.308k₂ e^{−2.61t},
x(t) = 3.073k₁ e^{2.65t} + k₂ e^{−2.61t}.

Now, we apply the boundary conditions: at the free endpoint µ(3) e^{−0.04(3)} =
µ(3) e^{−0.12} = 0, which gives

2835.6k₁ − 0.00092k₂ = 0.

Also, x(0) = 130.2 gives

3.073k₁ + k₂ = 130.2.

Solving these two equations simultaneously by Cramer's rule, we get k₁ ≈
0.000042 and k₂ ≈ 130.2. Hence,

µ(t) = 0.000042 e^{2.65t} − 300.5 e^{−2.61t}, costate variable,
x(t) = 0.00013 e^{2.65t} + 130.2 e^{−2.61t}, state variable,

and the control variable is

y(t) = 0.5(x + 2µ(t)) = 0.000107 e^{2.65t} − 235.4 e^{−2.61t}.


10.10. Maximize ∫₀¹ e^{−0.07t} (8x + 3y + xy − 2x² − 0.8y²) dt subject to
ẋ = x + 4y, x(0) = 91, and x(1) free. Solution. First, we check the
sufficient conditions to ensure that this problem has a global maximum. The
Hessian

|H| = det [fxx  fxy; fyx  fyy] = det [−4  1; 1  −1.6] = 5.4 > 0,

and the first principal minor |H₁| = −4 < 0, while the second principal minor
|H₂| = |H| > 0, which imply that the Hessian is negative definite. Thus, f is
concave in both x and y, and g = x + 4y is linear, so conditions for a global
maximum are satisfied.
Now, the current-valued Hamiltonian is

Hc = 8x + 3y + xy − 2x² − 0.8y² + µ(x + 4y).

Applying the maximum principle (10.5.3) and using p = 0.07, we get

∂Hc/∂y = 3 + x − 1.6y + 4µ = 0, which gives y = 0.625x + 2.5µ + 1.875,
µ̇ = pµ − ∂Hc/∂x = 0.07µ − (8 + y − 4x + µ) = −3.43µ + 3.375x − 9.875,
ẋ = ∂Hc/∂µ = x + 4y = 10µ + 3.5x + 7.5.

To solve for µ and x, the last two equations above are written in the matrix
form Ẏ = AY + B as

[µ̇; ẋ] = [−3.43  3.375; 10  3.5] [µ; x] + [−9.875; 7.5].

The characteristic equation for this system of equations is

|A − rI| = det [−3.43 − r  3.375; 10  3.5 − r] = 0,

where, using formula (A.20) with |A| = −45.755 and tr A = 0.07, the charac-
teristic roots are

r₁,₂ = (0.07 ± √((0.07)² − 4(−45.755)))/2 = 6.799, −6.729.

For r₁ = 6.799, the eigenvector ỹ₁ is determined by solving the equation

[−3.43 − 6.799  3.375; 10  3.5 − 6.799] [c₁; c₂] = [−10.229  3.375; 10  −3.299] [c₁; c₂] = 0,

which gives −10.229c₁ + 3.375c₂ = 0, or c₂ = 3.031c₁, so that

ỹ₁ = k₁ [1; 3.031] e^{6.799t}.

For r₂ = −6.729, the eigenvector ỹ₂ is obtained by solving

[−3.43 + 6.729  3.375; 10  3.5 + 6.729] [c₁; c₂] = [3.299  3.375; 10  10.229] [c₁; c₂] = 0,

which gives 3.299c₁ + 3.375c₂ = 0, or c₁ = −1.023c₂, so that

ỹ₂ = k₂ [−1.023; 1] e^{−6.729t}.

The particular solution is given by Y* = −A⁻¹B:

Y* = [µ*; x*] = −(1/(−45.755)) [3.5  −3.375; −10  −3.43] [−9.875; 7.5] = [−1.309; 1.596].

Adding the complementary and particular solutions, we get

µ(t) = k₁ e^{6.799t} − 1.023k₂ e^{−6.729t} − 1.309,
x(t) = 3.031k₁ e^{6.799t} + k₂ e^{−6.729t} + 1.596.

Now, we apply the boundary conditions: at the free endpoint µ(1) e^{−0.07(1)} = 0,
which gives

896.9k₁ − 0.00122k₂ = 1.309.

Also, x(0) = 91 gives

3.031k₁ + k₂ = 89.404.

Solving these two equations simultaneously by Cramer's rule, we get k₁ =
0.00158 and k₂ = 89.4. Hence,

µ(t) = 0.00158 e^{6.799t} − 91.46 e^{−6.729t} − 1.309, costate variable,
x(t) = 0.00479 e^{6.799t} + 89.4 e^{−6.729t} + 1.596, state variable,

and the control variable is

y(t) = 0.625x + 2.5µ + 1.875 = 0.00695 e^{6.799t} − 172.77 e^{−6.729t} − 0.4.


11
Demands

In microeconomics there are three well-known and highly used demands. They
are the Marshallian, the Hicksian, and the Walrasian demands, which deal
with what consumers will buy in different situations so as to maximize their
utility. Some useful results like Shephard's lemma and the Slutsky equation
are introduced, and so-called Giffen and Veblen goods are discussed. We
will first introduce Shephard’s lemma, and then analyze the above-mentioned
three demands.

11.1 Shephard’s Lemma


This lemma is used in microeconomics in applications to the theory of the
firm and consumer choices. For more details, see Varian [1992].
Shephard's lemma states that the Hicksian demand hj for a particular good j,
at a given level of utility u and given price vector p, is equal to the derivative
of the expenditure function e with respect to the price of the relevant good, i.e.,

hj(p, u) = ∂e(p, u)/∂pj.      (11.1.1)

In the theory of the firm, this lemma has a similar form: the conditional
factor demand xj(w, y) for each input factor (or good) j is the derivative of
the cost function c(w, y) with respect to the factor price wj:

xj(w, y) = ∂c(w, y)/∂wj.      (11.1.2)

Proof. We will consider (11.1.1) only for the two-good case. The general
case and proof of (11.1.2) are analogous. The expenditure function e(p1 , p2 , u)
is the minimand of the constrained optimization problem, and thus, using the
Lagrange multiplier method, the Lagrangian is given by

L(p1 , p2 , u) = p1 x1 + p2 x2 + λ(u − U (x1 , x2 )), (11.1.3)

where U(x₁, x₂) is the prescribed utility constraint. By the envelope theorem,
the derivatives of the minimand e(p₁, p₂, u) with respect to the parameters pj,
j = 1, 2, are given by

∂e/∂pj = ∂L/∂pj = x_j^h, j = 1, 2,

where x_j^h is the minimizer (i.e., the Hicksian demand function for good j,
j = 1, 2). 
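
As a quick sanity check of (11.1.1), the following SymPy sketch differentiates
the expenditure function of the Cobb-Douglas utility u(x₁, x₂) = √(x₁x₂),
namely e(p, u) = 2u√(p₁p₂) (a standard closed form, assumed here), and
compares the result with the known Hicksian demands:

import sympy as sp

p1, p2, u = sp.symbols('p1 p2 u', positive=True)
e = 2*u*sp.sqrt(p1*p2)                       # expenditure function

h1 = sp.diff(e, p1)                          # Shephard: h1 = de/dp1
h2 = sp.diff(e, p2)
print(sp.simplify(h1 - u*sp.sqrt(p2/p1)))    # 0: matches the Hicksian demand
print(sp.simplify(h2 - u*sp.sqrt(p1/p2)))    # 0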

11.2 Marshallian Demand


Named after Alfred Marshall (British, 1842-1924, author of Principles of Eco-
nomics 1890), the Marshallian demand in microeconomics specifies what the
consumer would buy in each price and income or wealth situation, assum-
ing it perfectly solves the utility maximization problem. It is compared with
Walrasian demand (named after Léon Walras), also known as uncompensated
demand, since the original Marshallian analysis ignored wealth effects. Using
the utility maximization problem, there are m commodities with price vector
p and choice vector x. The consumer has income i, and a set of affordable
packages

b(p, i) = {x : ⟨p, x⟩ ≤ i},      (11.2.1)

where ⟨p, x⟩ is the inner product of the price and quantity vectors. The consumer
has a utility function u : ℝ₊^m → ℝ. Then the consumer's Marshallian demand
correspondence is defined by

x*(p, i) = arg max_{x ∈ b(p,i)} u(x),      (11.2.2)

which is a homogeneous function of degree zero, i.e., for every constant a > 0,

x∗ (a p, a i) = x∗ (p, i). (11.2.3)

Suppose p and i are measured in dollars. When a = 100, ap and a i are


exactly the same quantities measured in cents. Obviously, changing the units
of measurement does not affect the demand.
The following examples deal with two goods, 1 and 2.
1. The utility function has the Cobb-Douglas form u(x₁, x₂) = x₁^α x₂^β; then
the constrained optimization problem leads to the Marshallian demand func-
tion

x*(p₁, p₂, i) = (αi/((α + β)p₁), βi/((α + β)p₂)).      (11.2.4)
2. The utility function is a CES utility function:

u(x₁, x₂) = (x₁^δ/δ + x₂^δ/δ)^{1/δ}.      (11.2.5)

Then

x*(p₁, p₂, i) = (i p₁^{ε−1}/(p₁^ε + p₂^ε), i p₂^{ε−1}/(p₁^ε + p₂^ε)), ε = δ/(δ − 1).      (11.2.6)

In both cases, the preferences are strictly convex, the demand is unique, and
the demand function is continuous.
3. The utility function has the linear form u(x1 , x2 ) = x1 +x2 , which is weakly
convex, and in fact the demand is not unique: when p1 = p2 , the consumer
may divide his income in arbitrary ratios between goods 1 and 2 and get the
same utility.
4. The utility function exhibits a non-diminishing marginal rate of substitu-
tion:

u(x₁, x₂) = x₁^α + x₂^α, α > 1.      (11.2.7)

This utility function is convex, and the demand is not continuous: when
p₁ < p₂, the consumer demands only good 1; when p₁ > p₂, the consumer
demands only good 2; and when p₁ = p₂ the demand correspondence contains
two distinct bundles, either buy only good 1 or buy only good 2.
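
For computation, the demand formulas (11.2.4) and (11.2.6) translate into
one-line helpers. The sketch below (the function names are ours) also checks
that the CES demand exhausts the budget:

def marshallian_cobb_douglas(alpha, beta, p1, p2, i):
    # formula (11.2.4)
    return (alpha*i/((alpha + beta)*p1), beta*i/((alpha + beta)*p2))

def marshallian_ces(delta, p1, p2, i):
    # formula (11.2.6)
    eps = delta/(delta - 1.0)
    denom = p1**eps + p2**eps
    return (i*p1**(eps - 1)/denom, i*p2**(eps - 1)/denom)

x1, x2 = marshallian_ces(0.5, p1=2.0, p2=1.0, i=100.0)
print(x1, x2, 2.0*x1 + 1.0*x2)   # the last value equals the income, 100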

Example 11.1. A consumer’s preferences are represented by the utility


function
U(x, y) = x + y + 2y^{1/2},

where good 1 is the numéraire with price p1 = 1, good 2 has price p2 , and
the consumer’s income is m. (i) Find the consumer’s Marshallian demands
for both goods as a function of p1 , p2 and m, and the corner solutions, if
any; (ii) use the solution in part (i) and the relevant homogeneity property
of Marshallian demands to determine the consumer’s demands for goods 1
and 2 for arbitrary non-negative prices p1 , p2 and income m; (iii) find the
consumer’s Hicksian demand (§11.2) functions hi (p1 , p2 , u), i = 1, 2; (iv) find
the consumer’s expenditure function e(p1 , p2 , u); and (v) verify if Shephard’s
lemma applies in this case.
Solution. (i) The marginal rate of substitution between good 2 and good
1 is 1 + y^{−1/2} > 1. At an interior solution, we must have p₂ = 1 + y^{−1/2},
which is possible only if p₂ > 1. If the consumer chooses positive amounts of
both goods, the Marshallian demands are given by
y(1, p₂, m) = (1/(p₂ − 1))², and x(1, p₂, m) = m − p₂ (1/(p₂ − 1))².      (11.2.8)

The consumption of good 1 is positive iff p₂ > 1 and m > p₂ (1/(p₂ − 1))². If the
consumption of good 1 is zero, then y(1, p₂, m) = m/p₂ and x(1, p₂, m) = 0.

The consumption of good 2 is always positive, because the marginal rate of


substitution tends to infinity as y → 0.
(ii) Since the demand is homogeneous of degree zero in prices and income,
we have

xᵢ(p₁, p₂, m) = xᵢ(1, p₂/p₁, m/p₁), i = 1, 2.      (11.2.9)

Then from part (i) we find that at an interior solution

y(p₁, p₂, m) = (1/((p₂/p₁) − 1))² = (p₁/(p₂ − p₁))²,
x(p₁, p₂, m) = m/p₁ − (p₂/p₁)(1/((p₂/p₁) − 1))² = m/p₁ − p₁p₂/(p₂ − p₁)².      (11.2.10)

(iii) At an interior solution,

h₂(p₁, p₂, u) = (p₁/(p₂ − p₁))²,
h₁(p₁, p₂, u) = u − h₂(p₁, p₂, u) − 2 √(h₂(p₁, p₂, u))
             = u − (p₁/(p₂ − p₁))² − 2 (p₁/(p₂ − p₁))      (11.2.11)
             = u − [(p₁/(p₂ − p₁)) + 1]² + 1 = u + 1 − (p₂/(p₂ − p₁))².
Thus, there is an interior solution iff

u + 1 > (p₂/(p₂ − p₁))².

If

u + 1 < (p₂/(p₂ − p₁))²,

then h₁(p₁, p₂, u) = 0, and u = h₂(p₁, p₂, u) + 2 √(h₂(p₁, p₂, u)). The
right-hand side of the last equation is a strictly increasing function of h₂,
which means that for any prescribed (p₁, p₂, u) there is a unique solution for
h₂(p₁, p₂, u). However, such a solution does not have a simple closed form
representation.
(iv) We have

e(p₁, p₂, u) = p₁ h₁(p₁, p₂, u) + p₂ h₂(p₁, p₂, u)
            = p₁ [u + 1 − (p₂/(p₂ − p₁))²] + p₂ (p₁/(p₂ − p₁))²   using part (iii),
            = p₁(u + 1) + (p₂p₁² − p₁p₂²)/(p₂ − p₁)² = p₁(u + 1) − p₁p₂/(p₂ − p₁).      (11.2.12)
(v) First, in the case of interior solutions, we have, using Shephard's lemma,

∂e(p₁, p₂, u)/∂pᵢ = hᵢ(p₁, p₂, u), i = 1, 2.

Next, differentiating (11.2.12) with respect to p₂, we get

∂e(p₁, p₂, u)/∂p₂ = [−(p₂ − p₁)p₁ + p₁p₂]/(p₂ − p₁)² = p₁²/(p₂ − p₁)²
                  = h₂(p₁, p₂, u) from part (iii),

which verifies that Shephard's lemma applies to good 2. Again, differentiating
(11.2.12) with respect to p₁, we get

∂e(p₁, p₂, u)/∂p₁ = u + 1 − [(p₂ − p₁)p₂ + p₁p₂]/(p₂ − p₁)²
                  = u + 1 − (p₂/(p₂ − p₁))²
                  = h₁(p₁, p₂, u) from part (iii),

which verifies that Shephard's lemma holds for good 1. 
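
Part (v) can also be verified symbolically. A minimal SymPy sketch of the
differentiation of (11.2.12):

import sympy as sp

p1, p2, u = sp.symbols('p1 p2 u', positive=True)
e = p1*(u + 1) - p1*p2/(p2 - p1)             # expenditure function (11.2.12)

h2 = (p1/(p2 - p1))**2                       # Hicksian demands (11.2.11)
h1 = u + 1 - (p2/(p2 - p1))**2

print(sp.simplify(sp.diff(e, p2) - h2))      # 0: Shephard's lemma for good 2
print(sp.simplify(sp.diff(e, p1) - h1))      # 0: Shephard's lemma for good 1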

11.3 Hicksian Demand


Named after John Richard Hicks (British neo-Keynesian economist, 1904-
1989, NL 1972), in microeconomics, a consumer’s Hicksian demand correspon-
dence is the demand of a consumer over a bundle of goods that minimizes their
expenditure while delivering a fixed level of utility. It is defined as follows:
Given a utility function u : ℝ₊ⁿ → ℝ, the Hicksian demand correspondence
h* : ℝ₊₊ⁿ × u(ℝ₊ⁿ) → ℝ₊ⁿ is defined by

h*(p, ū) = arg min_{x ∈ ℝ₊ⁿ} p · x subject to u(x) ≥ ū.

If this correspondence is a function, then it is called the Hicksian demand


function or compensated demand function. Mathematically,
h(p, ū) = arg min_x { Σⱼ pⱼxⱼ } subject to u(x) ≥ ū.      (11.3.1)

Hicksian demand functions are often convenient for mathematical manipu-


lation because they do not require income or wealth to be represented. Also,
the function to be minimized is linear in xj , which makes the optimization
problem simpler. However, the Marshallian demand functions of the form

x(p, w) that describe demand given prices p and income w are easier to
observe directly. The two are related by

h(p, u) = x(p, e(p, u)),      (11.3.2)

where e(p, u) is the expenditure function (the function that gives the minimum
wealth required to reach a utility level), and by

h(p, v(p, w)) = x(p, w),      (11.3.3)

where v(p, w) is the indirect utility function (which gives the utility level
of having a given wealth under a fixed price regime). Their derivatives are
related by the Slutsky equation (see §11.4).
Whereas the Hicksian demand comes from the expenditure minimization
problem, the Marshallian demand comes from the utility maximization prob-
lem. Thus, the two problems are mathematical duals, and hence the duality
theorem provides a method of proving the above relationships.
The Hicksian demand function is intimately related to the expenditure
function. If the consumer's utility function u(x) is locally nonsatiated and
represents strictly convex preferences, then h(p, u) = ∇ₚe(p, u). If the consumer's utility function
u(x) is continuous and represents a locally nonsatiated preference relation,
then the Hicksian demand correspondence h(p, u) satisfies the following prop-
erties:
1. Homogeneity of degree zero in p: For all a > 0, h(ap, u) = h(p, u). This is
because the same x that minimizes Σⱼ pⱼxⱼ also minimizes Σⱼ apⱼxⱼ subject
to the same constraint.
2. No excess utility: The constraint u(x) ≥ ū holds with strict equality,
u(x) = ū. This follows from the continuity of the utility function: if u(x) > ū,
the consumer could simply spend less until utility falls to exactly ū.
3. Hicksian demand finds the cheapest consumption bundle that achieves a
given utility level.
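
Because Hicksian demand solves an expenditure-minimization program, it can
be computed numerically. A sketch assuming SciPy, for the Cobb-Douglas
utility u = √(x₁x₂), whose closed-form Hicksian demands h₁ = u√(p₂/p₁) and
h₂ = u√(p₁/p₂) serve as the benchmark:

import numpy as np
from scipy.optimize import minimize

p = np.array([2.0, 1.0])
u_bar = 10.0
u = lambda x: np.sqrt(x[0]*x[1])

res = minimize(lambda x: p @ x, x0=np.array([5.0, 5.0]),
               constraints=[{'type': 'ineq', 'fun': lambda x: u(x) - u_bar}],
               bounds=[(1e-9, None), (1e-9, None)])
print(res.x)        # approx (7.07, 14.14) = (u*sqrt(p2/p1), u*sqrt(p1/p2))
print(p @ res.x)    # e(p, u_bar) = 2*u*sqrt(p1*p2), approx 28.28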

11.4 Slutsky Equation


The Slutsky equation (or Slutsky identity) relates changes in the Marshallian
(uncompensated) demand to changes in Hicksian (compensated) demand, just
to maintain a fixed level of utility. Thus, this equation decomposes the change
in demand for good j in response to the change in price of good k, and is
defined by

∂xj(p, w)/∂pk = ∂hj(p, u)/∂pk − xk(p, w) ∂xj(p, w)/∂w,      (11.4.1)
where h(p, u) is the Hicksian demand and x(p, w) is the Marshallian demand,
both at the price levels p (vector), wealth level (or income level) w, and fixed

utility u by maximizing the utility at the original price and income vectors,
formally given by v(p, w). The right-hand side of Eq (11.4.1) represents the
change in demand for good j holding utility fixed at u minus the quantity of
good k demanded, multiplied by the change in demand for good j when wealth
changes. Thus, the first term on the right-hand side of Eq (11.4.1) represents
the substitution effect, and the second term the income effect. Since utility
cannot be observed, the substitution effect is not directly observable, but it
can be calculated using the other two terms in Eq (11.4.1).
To derive Eq (11.4.1), use the identity hj (p, u) = xj (p, e(p, u)), where
e(p, u) is the expenditure function, and u is the utility obtained by maxi-
mizing utility for a given p and w. Then differentiating hj (p, u) partially with
respect to pk, we get

∂hj(p, u)/∂pk = ∂xj(p, e(p, u))/∂pk + (∂xj(p, e(p, u))/∂e(p, u)) · (∂e(p, u)/∂pk).      (11.4.2)

Then, using the fact that ∂e(p, u)/∂pk = hk(p, u) by Shephard's lemma, and that
hk(p, u) = hk(p, v(p, w)) = xk(p, w) at the optimum, where v(p, w) is the
indirect utility function, and substituting these results into Eq (11.4.2), we
obtain Eq (11.4.1).
The Slutsky equation (11.4.1) shows that the change in the demand for a
good, caused by a price change, can be explained by the following two effects:
(i) a substitution effect that results in a change in the relative prices of two
goods, and (ii) an income effect that results in a change in the consumer’s
purchasing power. For more details, see Nicholson [1978].
The Slutsky equation (11.4.1) can also be written as

∂hj(p, u)/∂pk = ∂xj(p, w)/∂pk + xk(p, w) ∂xj(p, w)/∂w,      (11.4.3)
or in matrix form as

Dₚh(p, u) = Dₚx(p, w) + D_w x(p, w) x(p, w)ᵀ,      (11.4.4)

where Dₚh and Dₚx are n×n matrices, D_w x is n×1, and x(p, w)ᵀ is 1×n.

Another Proof. Take any (p, w, u), and recall that h(p, u) = x(p, w)
and e(p, u) = w. Now, differentiate hj(p, u) = xj(p, e(p, u)) with respect to
pk:

∂hj(p, u)/∂pk = ∂xj(p, e(p, u))/∂pk + (∂xj(p, e(p, u))/∂w)(∂e(p, u)/∂pk)
             = ∂xj(p, e(p, u))/∂pk + (∂xj(p, e(p, u))/∂w) hk(p, u)
             = ∂xj(p, w)/∂pk + (∂xj(p, w)/∂w) xk(p, w). 

The formula (11.4.1) exhibits both the substitution effect and the income
effect as follows:

∂xj(p, w)/∂pk = ∂hj(p, u)/∂pk − xk(p, w) ∂xj(p, w)/∂w,      (11.4.5)
                [substitution effect]    [income effect]

Figure 11.1 Normal goods.

Figure 11.2 Inferior goods.

According to Slutsky, the substitution effect, which is the change from x
to x′, always moves opposite to the price change, while the income effect is
the change from x′ to x″. Thus, the substitution effect is negative since the
change in demand due to the substitution effect is opposite from the price
change. On the other hand, the income effect may be negative or positive
depending on whether the good is inferior or not. Hence, in the case of normal
goods, since both the substitution and income effects increase demand when
their own price

decreases, the demand curve slopes downward (Figure 11.1). But in the case
of inferior good the income effect and the substitution effect are in the opposite
direction (Figure 11.2).
Example 11.2. (1). Let p = (p1 , p2 ) be original prices, and x = (x1 , x2 )
be the original demand. Let p1 be decreased to p′1 . While the initial income
was i = p₁x₁ + p₂x₂, the original bundle now costs only p′₁x₁ + p₂x₂ = i′;
thus, the income freed up is i − i′ = (p₁ − p′₁)x₁. Thus, at the new
price, (i) if less income is needed than before to buy the original choice, then
real income has increased; and (ii) if more income than before is needed to
buy the original choice, then the real income has decreased.
(2). To determine the changes in quantities demanded when the consumer’s
income is adjusted so that at new prices he/she can just buy the original
choice, let (i) the change be from x to x′ (known as the pure substitution
effect), and (ii) the subsequent change from x′ to x′′ (known as the pure
income effect). For example, suppose that the demand function for milk is
i 120
x1 = 10 + . If initially, i = 120 and p1 = 3, then x1 = 10 + =
10p1 10 × 3
14 units of milk. Thus, 14 × 3 = 42 is spent on milk, and the remaining
120 − 42 = 78 is spent on other goods. Now, suppose p1 has decreased
to p′1 = 2. Then how much income is needed to buy the initial bundle?
106
Obviously, 78+2×14 = 106. Then the consumer will buy x′1 = 10+ = 15.3
10
units of milk with that money. Thus, 15.3 − 14 = 1.3 is the substitution effect.
120
Next, what is the income effect? Obviously, x′′1 = 10 + = 16 will be
10 × 2
the eventual consumption, giving a change of 16 − 14 = 2 units. Hence, the
2 − 13 = 0.7 is the income effect.
The substitution effect always moves opposite to the price effect. Thus,
we say: the substitution effect is negative when the change in demand due to
the substitution effect is opposite to the price change. However, the income
effect may be negative or positive depending on whether the good is inferior
or not. 
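
The milk example can be reproduced in a few lines (an illustrative sketch;
the function name is ours):

def x1(p1, i):
    # demand function for milk from Example 11.2
    return 10 + i/(10*p1)

i0, p_old, p_new = 120.0, 3.0, 2.0
x_old = x1(p_old, i0)                         # 14.0
i_comp = (i0 - p_old*x_old) + p_new*x_old     # income that just buys the old bundle: 106
x_comp = x1(p_new, i_comp)                    # 15.3
x_new = x1(p_new, i0)                         # 16.0

print(x_comp - x_old)   # substitution effect: 1.3
print(x_new - x_comp)   # income effect: 0.7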
11.4.1 Giffen Goods. If the income effect and the substitution effect are
in opposite directions and if the income effect is larger than the substitution
effect, then a price decrease lessens the demand. Such goods are called Giffen
goods. For a consumer who consumes multiple goods, the Giffen goods are
inferior goods (Figure 11.3).
In fact, the above definition of a Giffen good implies that

∂xᵢ(p, M)/∂pᵢ > 0.      (11.4.6)

Figure 11.3 Giffen goods.

Using the Slutsky equation for the own-price effect of good i, the inequality
(11.4.6) implies that

∂xᵢ^c(p, ū)/∂pᵢ − xᵢ ∂xᵢ(p, M)/∂M > 0,
or
∂xᵢ^c(p, ū)/∂pᵢ > xᵢ ∂xᵢ(p, M)/∂M.      (11.4.7)

Also,

∂xᵢ^c(p, ū)/∂pᵢ < 0,      (11.4.8)

so from the above section we know that for inequality (11.4.7) to hold we must
at least have

∂xᵢ(p, M)/∂M < 0.      (11.4.9)

But this is the definition of an inferior good, which means that every Giffen
good is also inferior. On the other hand, not every inferior good is a Giffen
good, because the right-hand side of the inequality (11.4.7) could be negative
but larger than the left-hand side in absolute value. In such a case the
own-price effect would be negative. This means that for a good to be a
Giffen good, the income effect times demand, xᵢ ∂xᵢ(p, M)/∂M, must be smaller
(i.e., larger in absolute value) than the substitution effect ∂xᵢ^c(p, ū)/∂pᵢ, which is
inequality (11.4.7).

Finally, note that a Giffen good faces an upward sloping demand curve
because the income effect dominates the substitution effect, which means that
quantity demanded increases as the price rises. However, a good cannot always
have an upward sloping demand curve, because the consumer eventually runs
out of money. At some point the rising price of the Giffen good surpasses the
consumer's budget, and a price increase will lower the amount of the good
the consumer is able to buy. This means that at high enough prices, the
demand curve will start sloping downward (Figure 11.4, where A marks the
point at which the good surpasses consumer’s budget; and the ‘Giffen good
range’ represents the range where only such good is consumed).

Figure 11.4 Giffen goods and consumer’s budget.

11.4.2 Veblen Goods. (Veblen [1899]) These goods are types of luxury
goods for which the quantity demanded increases as the price increases, which
appears to be in contradiction to the Law of Demand. Consumers prefer
more of a good as its price rises, resulting in an upward sloping demand
curve. These goods are, for example, personal goods such as wines, jewelry,
designer handbags and luxury cars, and are in demand simply because of the
high prices asked for them. They make a desirable status symbol for conspicuous
consumption and leisure. This phenomenon is also known as the Veblen effect,
where goods are desired even if they become over-priced. A corollary is that
a decrease in their price decreases their demand. This effect is known as the
snob effect, the bandwagon effect, the network effect, the hot-hand fallacy,
and the common law of business balance. None of these effects, however, can
predict what will happen to the actual quantity of goods demanded as prices
change. The actual effect on quantity demanded will depend on the range of
other available goods, their prices and substitutions for consumed goods.

11.5 Walrasian Demand


Consider the utility maximization problem

max_{x ∈ X} u(x) subject to p · x ≤ w, x ∈ B(p, w).      (11.5.1)

The Walrasian demand correspondence is defined as the set of solutions
x(p, w) ⊂ X of the maximization problem (11.5.1), given p ≫ 0 and w ≥ 0.
The set x(p, w) is not empty for any such (p, w) if u is continuous. An impor-
tant property of x(p, w) is the so-called Walras's law, stated next.
For the definition of a locally nonsatiated function, see Appendix F.
Theorem 11.1. Suppose u is a continuous, locally nonsatiated function,
and let X = ℝ₊ⁿ. Then the Walrasian demand correspondence x : ℝ₊₊ⁿ × ℝ₊ →
ℝ₊ⁿ has the following four properties:
(i) Homogeneity of degree 0: x(αp, αw) = x(p, w) for any α > 0 and (p, w);
(ii) Walras’s law: p · x′ = w for any x′ ∈ x(p, w) and (p, w);
(iii) Convexity: x(p, w) is convex for any (p, w) if u is quasi-concave; and
(iv) Continuity: x is upper semicontinuous.
Proof. Property (i) follows from the definition and (11.5.1); for (ii) use
local nonsatiation; (iii) is obvious; and (iv) follows from the maximum theorem. 
Notes. The function x(p, w) is a single point if u is strictly quasi-concave,
and is continuous if it is single valued.
The function x(p, w) can be obtained for each (p, w) by applying the KKT
conditions if u is differentiable; these conditions are necessary in the case of
constrained optimization, and are sufficient if u is quasi-concave.
Suppose u is locally nonsatiated and the optimal solution for problem
(11.5.1) is an interior solution. Then the KKT conditions are

∂u(x)/∂x − λp = 0, p · x = w.      (11.5.2)

Boundary solution: For some k, ∂u(x)/∂xₖ − λpₖ = 0 should be replaced by

∂u(x)/∂xₖ − λpₖ { ≤ 0 if xₖ = 0,
                  = 0 if xₖ > 0.      (11.5.3)
Example 11.3. For a boundary solution, let

D_{x₁}u(x₁, x₂) − λp₁ ≤ 0, D_{x₂}u(x₁, x₂) − λp₂ = 0.

This condition can be written as

∇u(x) − λp + µ [1; 0] = 0, for some µ ≥ 0.

Specifically, take u(x) = √x₁ + x₂. Assume an interior solution for x₁. Then
the KKT conditions become

1/(2√x₁) − λp₁ = 0; 1 − λp₂ { ≤ 0 if x₂ = 0, = 0 if x₂ > 0 }; p · x = w.

Then the solution is

(i) x₁(p, w) = p₂²/(4p₁²), x₂(p, w) = w/p₂ − p₂/(4p₁), λ(p, w) = 1/p₂ when 4p₁w > p₂²;
(ii) x₁(p, w) = w/p₁, x₂(p, w) = 0, λ(p, w) = 1/(2√(p₁w)) when 4p₁w ≤ p₂².

Note that there is no income effect on x₁, i.e., x₁ is independent of w as
long as 4p₁w > p₂². 
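
The two regimes in (i) and (ii) can be packaged into a single function; a
sketch using only the formulas above:

import math

def walrasian_demand(p1, p2, w):
    # demand for u(x) = sqrt(x1) + x2, from Example 11.3
    if 4*p1*w > p2**2:                        # interior solution: x2 > 0
        return p2**2/(4*p1**2), w/p2 - p2/(4*p1), 1/p2
    # boundary solution: x2 = 0
    return w/p1, 0.0, 1/(2*math.sqrt(p1*w))

x1, x2, lam = walrasian_demand(1.0, 2.0, 10.0)
print(x1, x2, 1.0*x1 + 2.0*x2)   # the budget p.x equals w = 10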
For any (p, w) ∈ ℝ₊₊ⁿ × ℝ₊, the indirect utility function is defined as

v(p, w) := u(x′), where x′ ∈ x(p, w).


Theorem 11.2. The indirect utility function v(p, w) satisfies the following
properties: (i) it is homogeneous of degree 0; (ii) it is nonincreasing in pk for
any k and strictly increasing in w; (iii) it is quasi-convex; and (iv) it is
continuous.
Proof. Proofs of (i) and (ii) are obvious; (iv) follows from the maximum
theorem. For the proof of (iii), suppose max{v(p′, w′), v(p″, w″)} ≤ v̄ for any
(p′, w′), (p″, w″) ∈ ℝ₊₊ⁿ × ℝ₊ and v̄ ∈ ℝ. Then for any t ∈ [0, 1] and any
x ∈ B(tp′ + (1 − t)p″, tw′ + (1 − t)w″), either x ∈ B(p′, w′) or x ∈ B(p″, w″)
must hold. Thus,

v(tp′ + (1 − t)p″, tw′ + (1 − t)w″) ≤ max{v(p′, w′), v(p″, w″)} ≤ v̄.       (11.5.4)

Example 11.4. Let the Cobb-Douglas utility function be

u(x) = Σⱼ₌₁ⁿ αⱼ log xⱼ, αⱼ ≥ 0, Σⱼ₌₁ⁿ αⱼ = 1,      (11.5.5)

where αⱼ is the fraction of the expense for good j. Then the Walrasian demand
is xⱼ(p, w) = αⱼw/pⱼ. Hence,

v(p, w) = log w + Σⱼ₌₁ⁿ αⱼ(log αⱼ − log pⱼ). 

Example 11.5. Let the utility be the quasi-linear function u(x) = √x₁ + x₂
of Example 11.3. Assuming an interior solution,

v(p, w) = √(x₁(p, w)) + x₂(p, w) = p₂/(4p₁) + w/p₂.

Thus the indirect utility function is of the form v(p, w) = a(p) + w/p₂; with
good 2 as the numéraire (p₂ = 1), this is a(p) + w. 
Example 11.6. (Labor supply) Consider the following simple labor deci-
sion problem:

max_{q,τ ≥ 0} {(1 − t) log q + t log τ} subject to pq + wτ ≤ wT + P, τ ≤ T,

where q is the amount of consumed good, p is its price, T is the total time
available, τ is the time spent on ‘leisure,’ which determines h = T − τ hours
of work, w is the wage, so wh is labor income, and P is nonlabor income. Since
the utility function is Cobb-Douglas, the Walrasian demand is

q(p, w, P) = (1 − t)(wT + P)/p, τ(p, w, P) = t(wT + P)/w for τ < T.

In the corner case τ = T, the consumer does not participate in the labor
market (τ(p, w, P) = T) and spends all nonlabor income to purchase goods
(q(p, w, P) = P/p). For τ < T, the indirect utility function is

v(p, w, wT + P) = K + log(wT + P) − (1 − t) log p − t log w,

where K is a constant. 

11.6 Cost Minimization


For each p ≫ 0 and u ∈ ℝ, consider the problem

min_{x ∈ X} p · x subject to u(x) ≥ ū.      (11.6.1)

In other words, this problem asks what the cheapest way is to attain a
utility level at least as high as ū. Let the Hicksian demand correspondence¹ h(p, ū) be
the set of solutions to the problem (11.6.1). Assuming local nonsatiation,
the constraint can be modified as follows: if u is continuous and the set
U ≡ {x ∈ X : u(x) ≥ ū} is not empty, then choose any x′ such that u(x′ ) > ū,
and choose any x̄ ∈ R+ such that p′j x̄ ≥ p′ · x′ for all j and all p′ in a
neighborhood of p ≫ 0. This replaces the constraint by a compact set. Then
the cost minimizing solution of problem (11.6.1) is the same locally with
respect to (p, ū). Moreover, the set h(p, ū) is not empty in the neighborhood
of (p, ū) ∈ Rn++ × U .
Theorem 11.6. (Hicksian demand) Let u be continuous and locally non-
satiated, and let X = Rn+ . Then the set h, known as the Hicksian correspon-
dence and defined by h : Rn++ × U 7→ Rn+ (i) is homogeneous of degree 0 in
1 This correspondence is also used in welfare analysis.

p; (ii) achieves ū exactly (i.e., u(x′ ) = ū for any x′ ∈ h(p, ū) if ū ≥ u(0));
(iii) is convex for any given (p, ū) if u is quasi-concave; and (iv) is upper
semicontinuous.
The set h(p, u) reduces to a point if u is strictly quasi-concave.
Proof. (i), (ii), and (iii) are easy. To prove (iv), we cannot apply the
maximum theorem as we did in the proof of Theorem 11.1 on the Walrasian
demand, because in the present case the set is not locally bounded. 

11.7 Expenditure Function


The expenditure function e(p, ū) is defined as e(p, ū) := p · x′ for any x′ ∈
h(p, ū). The following result holds:
Theorem 11.7. Let u be continuous and locally nonsatiated, and let X =
Rn+ . Then the expenditure function e : Rn++ × U 7→ R is (i) homogeneous of
degree 1 in p; (ii) nondecreasing in pj for any j and strictly increasing in ū
for ū > u(0); (iii) concave in p; and (iv) continuous.
Hence, we find that utility maximization and cost minimization go hand-
in-hand together; one implies the other. The following result holds.
Theorem 11.8. (Mas-Colell et al. [1995: 156]; Nicholson [1978: 90-
93]) Let u be continuous and locally nonsatiated, and let X = Rn+ . Then
(i) if x∗ ∈ x(p, w) for given p ≫ 0 and w ≥ 0, then x∗ ∈ h(p, v(p, w)) and
e(p, v(p, w)) = w; and (ii) if x∗ ∈ h(p, ū) for given p ≫ 0 and ū ≥ u(0), then
x∗ ∈ x(p, e(p, ū)) and v(p, e(p, ū)) = ū.
Proof. Suppose utility maximization does not imply cost minimization,
i.e., there exists an x′ ∈ ℝ₊ⁿ that satisfies u(x′) ≥ u(x*) and p · x′ < p · x*
(which, by Walras's law, equals w). Then by local nonsatiation, there also
exists an x″ ∈ ℝ₊ⁿ that satisfies u(x″) > u(x*) and p · x″ < w. But this
contradicts utility maximization. Hence, x* ∈ h(p, v(p, w)) and e(p, v(p, w)) =
p · x* = w.
Next, suppose cost minimization does not imply utility maximization, i.e.,
there exists an x′ ∈ ℝ₊ⁿ that satisfies u(x′) > u(x*) ≥ ū and p · x′ ≤ p · x*.
Note that p · x′ > 0 because ū ≥ u(0). Let x_t := tx′ ∈ X for t ∈ [0, 1]. Then
for t sufficiently close to 1, continuity gives u(x_t) > u(x*), while p · x_t <
p · x′ ≤ p · x* because p · x′ > 0. But this contradicts cost minimization.
Hence, x* ∈ x(p, e(p, ū)) and v(p, e(p, ū)) = u(x*) = ū since ū > u(0). 

Example 11.7. The Walrasian demand curve of good j is downward
sloping (i.e., ∂xⱼ(p, w)/∂pⱼ < 0) if it is a normal good (i.e., ∂xⱼ(p, w)/∂w ≥ 0).
If good j is an inferior good (i.e., ∂xⱼ(p, w)/∂w < 0), then xⱼ can be a Giffen
good (i.e., ∂xⱼ(p, w)/∂pⱼ > 0). 

The right-hand side of formula (11.4.3) is known as the Slutsky matrix or
substitution matrix, denoted by S(p, w), which for all (p, w) ≫ 0 and ū =
v(p, w) has the following properties: (i) S(p, w) = D²ₚe(p, ū); (ii) S(p, w) is
negative semidefinite; (iii) S(p, w) is a symmetric matrix; and (iv) S(p, w)p =
0.
To prove these properties, note that (i) follows from the previous theorem;
(ii) and (iii) hold, since e(p, ū) is concave and twice continuously differentiable;
and (iv) follows since h(p, ū) is homogeneous of degree 0 in p, or x(p, w) is
homogeneous of degree 0 in (p, w), and Walras’s law applies.

11.8 Applications
Labor Supply. We will apply the Slutsky equation to the labor/leisure deci-
sion problem of Example 11.6. Let the income be given by i = wT + P. Then,
since w affects leisure τ both directly and through income, we get from (11.4.3),

dτ(p, w, i)/dw = ∂τ(p, w, i)/∂w + (∂τ(p, w, i)/∂i) T.      (11.8.1)

If we differentiate the identity τ(p, w, ū) = τ(p, w, e(p, w, ū)) with respect to
w (where τ(p, w, ū) denotes the compensated leisure demand), we obtain

∂τ(p, w, ū)/∂w = ∂τ(p, w, i)/∂w + (∂τ(p, w, i)/∂i) τ(p, w, i).      (11.8.2)

Hence,

dτ(p, w, i)/dw = ∂τ(p, w, ū)/∂w − (∂τ(p, w, i)/∂i) τ(p, w, i) + (∂τ(p, w, i)/∂i) T
             = ∂τ(p, w, ū)/∂w + (∂τ(p, w, i)/∂i)(T − τ(p, w, i)),      (11.8.3)

where the first term is the substitution effect and the remaining terms are the
income effects.
For the labor supply h = T − τ, the relation (11.8.3) becomes

dh(p, w, i)/dw = ∂h(p, w, ū)/∂w + h(p, w, i) ∂h(p, w, i)/∂i.      (11.8.4)

A related result is Roy's identity, which can be written as follows: For all
(p, w) ≫ 0,

x(p, w) = −(1/(D_w v(p, w))) ∇ₚv(p, w).      (11.8.5)
Proof. For any (p, w) ≫ 0 and ū = v(p, w), we have v(p, e(p, ū)) = ū,
which upon differentiation with respect to p gives

∇ₚv(p, e(p, ū)) + D_w v(p, e(p, ū)) ∇ₚe(p, ū) = 0,
∇ₚv(p, e(p, ū)) + D_w v(p, e(p, ū)) h(p, ū) = 0,
∇ₚv(p, w) + D_w v(p, w) x(p, w) = 0,      (11.8.6)

which on rearranging yields (11.8.5). 
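
Roy's identity is easy to verify symbolically for the Cobb-Douglas indirect
utility of Example 11.4. A SymPy sketch for two goods, with α₁ = 1/3 chosen
purely for illustration:

import sympy as sp

p1, p2, w = sp.symbols('p1 p2 w', positive=True)
a = sp.Rational(1, 3)                         # alpha_1; alpha_2 = 1 - alpha_1
v = (sp.log(w) + a*(sp.log(a) - sp.log(p1))
     + (1 - a)*(sp.log(1 - a) - sp.log(p2)))  # indirect utility

x1 = -sp.diff(v, p1)/sp.diff(v, w)            # Roy's identity (11.8.5)
print(sp.simplify(x1 - a*w/p1))               # 0: matches x1 = alpha_1*w/p1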


Notes: (1) If u is differentiable and locally nonsatiated, and if X = Rn+ ,
then all previous theorems are applicable.
(2) If ū > u(0) and w > 0, and u is quasi-concave, and prices p and demands
hj are strictly positive, then Walrasian demand and Hicksian demand are
characterized by the following KKT conditions:
For Walrasian demand: ∇u(x) − λp = 0, and w − p · x = 0.
For Hicksian demand: p − λ∇u(x) = 0 and u(x) − ū = 0.
(3) The implicit function theorem (see Appendix E) implies that x(p, w) is
a C¹ function if the matrix of derivatives with respect to (x, λ),

[D²u(x)  −p; −pᵀ  0],

has full rank, i.e., we must show that

[D²u(x)  −(1/λ)Du(x)ᵀ; −(1/λ)Du(x)  0]      (11.8.7)

is of full rank. This is satisfied when u is differentiable strictly quasi-concave.
(We use the following definition:
u : X (⊂ ℝⁿ) → ℝ is differentiable strictly quasi-concave if
Δxᵀ D²u(x) Δx < 0 for any Δx (≠ 0) ∈ ℝⁿ such that Du(x)Δx = 0.)

11.9 Exercises
11.1. A consumer has preferences represented by the utility function
u(x₁, x₂) = x₁ + x₂ + 2x₂^{1/2}, where good 1 is the numéraire and price p₁ = 1.
The price of good 2 is p₂, and the consumer's income is m. (a) Find this
consumer's Marshallian demand for goods 1 and 2 as a function of p₂ and
m, accounting for corner solutions, if any; (b) using the result of part (a)
and the relevant homogeneity property of Marshallian demand, find this con-
sumer's demand for goods 1 and 2 for arbitrary nonnegative prices p₁, p₂ and
income m; (c) find the consumer's Hicksian demand functions h₁(p₁, p₂, u) and
h₂(p₁, p₂, u); (d) find this consumer's expenditure function e(p₁, p₂, u); and (e)
verify that Shephard's lemma applies in this case (part (d)).
Ans. (a) The marginal rate of substitution between good 2 and good 1 is
1 + x₂^{−1/2} > 1. At an interior solution, we must have p₂ = 1 + x₂^{−1/2}, which
is possible only if p₂ > 1. If the consumer chooses positive amounts of both
goods, the Marshallian demands are given by

x₂(1, p₂, m) = (1/(p₂ − 1))², x₁(1, p₂, m) = m − p₂ (1/(p₂ − 1))².      (11.9.1)
There is positive consumption of good 1 iff p₂ > 1 and m > p₂ (1/(p₂ − 1))². If
the consumption of good 1 is zero, then x₁(1, p₂, m) = 0 and x₂(1, p₂, m) =
m/p₂. Thus the consumption of good 2 is always positive, since the marginal
rate of substitution approaches infinity as x₂ approaches zero.
(b) Since demand is homogeneous of degree zero in prices and income, we
have

xᵢ(p₁, p₂, m) = xᵢ(1, p₂/p₁, m/p₁).      (11.9.2)

Then, using the result of part (a), at an interior solution we have

x₂(p₁, p₂, m) = (1/((p₂/p₁) − 1))² = (p₁/(p₂ − p₁))²,
x₁(p₁, p₂, m) = m/p₁ − (p₂/p₁)(1/((p₂/p₁) − 1))² = m/p₁ − p₁p₂/(p₂ − p₁)².      (11.9.3)

(c) At an interior solution,

h₂(p₁, p₂, u) = (p₁/(p₂ − p₁))²,      (11.9.4)

and

h₁(p₁, p₂, u) = u − h₂(p₁, p₂, u) − 2h₂(p₁, p₂, u)^{1/2}
             = u − (p₁/(p₂ − p₁))² − 2 (p₁/(p₂ − p₁))
             = u − [(p₁/(p₂ − p₁)) + 1]² + 1
             = u + 1 − (p₂/(p₂ − p₁))².      (11.9.5)

Thus, there will be an interior solution iff

u + 1 > (p₂/(p₂ − p₁))².      (11.9.6)

If

u + 1 < (p₂/(p₂ − p₁))²,      (11.9.7)

then h₁(p₁, p₂, u) = 0, and u = h₂(p₁, p₂, u) + 2 √(h₂(p₁, p₂, u)). The right-
hand side of the second equation is a strictly increasing function of h₂, i.e.,
for any specified (p₁, p₂, u) there is a unique solution for h₂(p₁, p₂, u), but
this solution cannot be expressed in a simple closed form.
(d) e(p₁, p₂, u) = p₁h₁(p₁, p₂, u) + p₂h₂(p₁, p₂, u). Using the result of part (c),
this expression can be written for an interior solution as

e(p₁, p₂, u) = p₁ [u + 1 − (p₂/(p₂ − p₁))²] + p₂ (p₁/(p₂ − p₁))²
            = p₁(u + 1) + (p₂p₁² − p₁p₂²)/(p₂ − p₁)²
            = p₁(u + 1) − p₁p₂/(p₂ − p₁).      (11.9.8)

(e) In the case of interior solutions, by Shephard's lemma we have

∂e(p₁, p₂, u)/∂pᵢ = hᵢ(p₁, p₂, u).      (11.9.9)

Differentiating (11.9.8) with respect to p₂, we get

∂e(p₁, p₂, u)/∂p₂ = [−(p₂ − p₁)p₁ + p₁p₂]/(p₂ − p₁)²
                  = p₁²/(p₂ − p₁)² = h₂(p₁, p₂, u) from (11.9.4).      (11.9.10)

Thus, Shephard's lemma applies to good 2. Again, differentiating (11.9.8)
with respect to p₁, we get

∂e(p₁, p₂, u)/∂p₁ = u + 1 − [(p₂ − p₁)p₂ + p₁p₂]/(p₂ − p₁)²
                  = u + 1 − (p₂/(p₂ − p₁))² = h₁(p₁, p₂, u) from (11.9.5).      (11.9.11)

Thus, Shephard's lemma applies to good 1.


11.2. Use the Cobb-Douglas utility function u(x₁, x₂) = x₁^{0.35} x₂^{0.65}; let
initially p₁ = 2, p₂ = 1, i = 100, and finally p′₁ = 1, p₂ = 1, i = 100. Then,
initially: x₁ = 0.35 i/p₁ = 0.35 × 100/2 = 17.5, x₂ = 0.65 × 100/1 = 65;
finally: x″₁ = 0.35 i/p′₁ = 0.35 × 100/1 = 35, x″₂ = 0.65 × 100/1 = 65.
(a) How much money is needed to buy the initial bundle with new prices?
Ans. i′ = p′₁x₁ + p₂x₂ = 1 × 17.5 + 1 × 65 = 82.5.
(b) What is the demand with i′ and new prices?
Ans. x′₁ = 0.35 i′/p′₁ = 0.35 × 82.5/1 = 28.875, x′₂ = 0.65 × 82.5/1 = 53.625.
Hence, the substitution effect is x′₁ − x₁ = 28.875 − 17.5 = 11.375, and the
income effect is x″₁ − x′₁ = 35 − 28.875 = 6.125. Note that these two effects
add up to 11.375 + 6.125 = 17.5, the total change in x₁.

11.3. An electric company is setting up a power plant in a foreign country,


and it has to plan its capacity. The peak-period demand for power is given
by p1 = 400 − q1 , and the off-peak demand is given by p2 = 380 − q2 . The
variable cost is 20 per unit (paid in both markets), and capacity costs 10 per
unit, which is paid only once and is used in both periods.
(i) Write out the Lagrangian and KKT conditions for this problem.
(ii) Find the optimal outputs and capacity for this problem.
(iii) How much of the capacity is paid for by each market (i.e., what are the
values of λ1 and λ2 )?
(iv) Now suppose the capacity cost is 30 cents per unit (paid only once). Find
the quantities, capacity, and how much of the capacity is paid for by each
market (i.e., λ1 and λ2 ).

11.4. Show that the Walrasian and Hicksian demands are equal. Hint.
(i) In both demands the consumption bundles that maximize utility are the
same as the consumption bundles that minimize expenditure, provided the
constraints of the two ‘match up.’ (ii) Both demands must coincide when com-
puted according to the same prices, income, and utility. (iii) The proposition
implies that e(p, v(p, w)) = w and v(p, e(p, ū)) = ū,
so for a fixed price vector p, the quantities e(p, ·) and v(p, ·) are inverses of
each other.
12
Black-Scholes Model

Although far from perfect, the Black-Scholes model is still useful. It presup-
poses some familiarity with partial differential equations and the Laplace
transform.

12.1 Black-Scholes Equation


The Black-Scholes equation is a one-dimensional diffusion equation, which
describes the price of the option over time, and is defined as

∂V/∂t + (1/2)σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0,      (12.1.1)

where V denotes the value of the option, r the rate of interest, and S the asset
at time t. The derivation of this equation is as follows: The Black-Scholes
model based on return has two components: (i) µ dt, which is predictable
and deterministic, where µ is the drift, and (ii) σ dX, which is a random
contribution to the return dS/S, where σ is the volatility of the asset S at
time t. For each interval √dt, the quantity dX is a sample drawn from the
normal distribution N (0, ( dt)2 ), which when multiplied by σ produces the
term σ dX. The value of the parameters σ and µ are estimated from the
historical data. Thus, we obtain the following stochastic differential equation:

dS
= µ dt + σ dX. (12.1.2)
S

Three particular cases are as follows:


Case 1. If σ = 0, the behavior of the asset price is given by the equation
dS/S = µ dt, which, with the initial condition S(0) = S0 , has the solution
S = S0 eµt . Thus, the asset price is totally deterministic.
Case 2. Eq (12.1.2) represents a random walk, and it cannot be solved to
give a deterministic solution for the share price S, but it gives probabilistic
information about the behavior of S.

Case 3. Eq (12.1.2) can be regarded as a scheme for constructing the time


series that may be realized by the share prices (see Exercise 12.1 for a discrete
model).
The discrete model presented in Exercise 12.1 (Figure 12.9.8) with a finite
time interval in each step is not practicable because the large amount of data
involved would be unmanageable. Hence, we will develop a continuous model
by taking the limit as dt → 0. This requires Itô's lemma, which is a version of
Taylor's series for functions of a random variable.
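
Eq (12.1.2) can be simulated directly. A minimal Euler-Maruyama sketch
(assuming NumPy; the parameter values are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, S0 = 0.05, 0.2, 100.0
T, n = 1.0, 252
dt = T/n

S = np.empty(n + 1)
S[0] = S0
for j in range(n):
    dX = rng.normal(0.0, np.sqrt(dt))         # sample from N(0, dt)
    S[j + 1] = S[j]*(1 + mu*dt + sigma*dX)    # dS = S*(mu dt + sigma dX)

print(S[-1])    # one simulated terminal price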

12.1.1 Itô's Lemma. Assume that Xt is a drift-diffusion process that satis-
fies the SDE

dXₜ = µₜ dt + σₜ dBₜ,

where Bₜ is a Wiener process. If f(t, x) is a C²-function, its Taylor series
expansion is

df = (∂f/∂t) dt + (∂f/∂x) dx + (1/2)(∂²f/∂x²) dx² + ⋯.

Substituting Xₜ for x, and therefore µₜ dt + σₜ dBₜ for dx, gives

df = (∂f/∂t) dt + (∂f/∂x)(µₜ dt + σₜ dBₜ)
     + (1/2)(∂²f/∂x²)(µₜ² dt² + 2µₜσₜ dt dBₜ + σₜ² dBₜ²) + ⋯.

In the limit as dt → 0, the terms dt² and dt dBₜ tend to zero faster than dBₜ²,
which is O(dt). Setting the dt² and dt dBₜ terms to zero, substituting dt for
dBₜ², and collecting the dt and dBₜ terms, we obtain

df = (∂f/∂t + µₜ ∂f/∂x + (1/2)σₜ² ∂²f/∂x²) dt + σₜ (∂f/∂x) dBₜ.      (12.1.3)

This is Itô's lemma.

12.1.2 Derivation of the Black-Scholes Equation. To derive the Black-
Scholes equation (12.1.1), suppose that a stock price follows a geometric Brown-
ian motion given by the stochastic differential equation dS = S(µ dt + σ dB).
Then, if the value of an option at time t is f(t, Sₜ), by Itô's lemma (12.1.3)
we get

df(t, Sₜ) = (∂f/∂t + (1/2)(Sₜσ)² ∂²f/∂S²) dt + (∂f/∂S) dSₜ.

The term (∂f/∂S) dSₜ represents the change in value in time dt of the trading
strategy consisting of holding an amount ∂f/∂S of the stock. If this trading
strategy is followed, and any cash held is assumed to grow at the risk-free
rate r, then the total value V of this portfolio satisfies the SDE

dVₜ = r (Vₜ − (∂f/∂S) Sₜ) dt + (∂f/∂S) dSₜ.
12.2 SOLUTION OF BLACK-SCHOLES EQUATION 273

This strategy replicates the option if V = f (t, S). Combining these options
we get the Black-Scholes equation

∂f 1 ∂2f ∂f
+ σ 2 S 2 2 + rS − rf = 0, (12.1.4)
∂t 2 ∂S ∂S

which is Eq (12.1.1) with f = V .

12.2 Solution of Black-Scholes Equation


The solution is determined by first transforming the Black-Scholes equation
(12.1.1) into the one-dimensional diffusion equation (heat equation), and then
solving this equation by different methods.

12.2.1 Transformation. We will use the following transformations to convert
the Black-Scholes partial differential equation (12.1.1) into the heat equation.
First, we convert the spot price to log-moneyness and the time to one-half
of the total variance, which removes the S and S² terms from the Black-Scholes
equation; thus, we set

x = ln(S/K), so that S = K e^x,
τ = (1/2)σ²(T − t), so that t = T − 2τ/σ²,   (12.2.1)
U(x, τ) = (1/K) V(S, t) = (1/K) V(K e^x, T − 2τ/σ²).

Next, we apply the chain rule to the partial derivatives in the Black-Scholes
equation:

∂V/∂t = K (∂U/∂τ)(∂τ/∂t) = −(1/2)Kσ² ∂U/∂τ,
∂V/∂S = K (∂U/∂x)(∂x/∂S) = (K/S) ∂U/∂x = e^{−x} ∂U/∂x,
∂²V/∂S² = −(K/S²) ∂U/∂x + (K/S) ∂/∂S(∂U/∂x)
        = −(K/S²) ∂U/∂x + (K/S) ∂/∂x(∂U/∂x) ∂x/∂S
        = −(K/S²) ∂U/∂x + (K/S²) ∂²U/∂x² = (e^{−2x}/K)(∂²U/∂x² − ∂U/∂x).

Substitution for the partials in the Black-Scholes equation (12.1.1) yields

−(1/2)Kσ² ∂U/∂τ + rK ∂U/∂x + (1/2)σ²K (∂²U/∂x² − ∂U/∂x) − rKU = 0,

which, after dividing by K, simplifies to

−∂U/∂τ + (k − 1) ∂U/∂x + ∂²U/∂x² − kU = 0,   (12.2.2)

where k = 2r/σ². Notice that the coefficients of Eq (12.2.2) are independent of
x and τ. The boundary condition for V is V(ST, T) = (ST − K)⁺. Now, from
Eq (12.2.1), x = ln(ST/K) ≡ xT when t = T and St = ST, and τ = 0. Then
the boundary condition for U is

U0(xT) = U(xT, 0) = (1/K) V(ST, T) = (1/K)(K e^{xT} − K)⁺ = (e^{xT} − 1)⁺.

Lastly, we set

W(x, τ) = e^{αx + β²τ} U(x, τ),   (12.2.3)

where α = (1/2)(k − 1) and β = (1/2)(k + 1), with k = 2r/σ². The transformation
(12.2.3) converts Eq (12.2.2) into the heat equation; the details are as follows:

∂U/∂τ = e^{−αx−β²τ} (∂W/∂τ − β² W(x, τ)),
∂U/∂x = e^{−αx−β²τ} (∂W/∂x − α W(x, τ)),
∂²U/∂x² = e^{−αx−β²τ} (α² W(x, τ) − 2α ∂W/∂x + ∂²W/∂x²).

Substituting these derivatives into Eq (12.2.2) we obtain

β² W(x, τ) − ∂W/∂τ + (k − 1)(∂W/∂x − α W(x, τ))
     + α² W(x, τ) − 2α ∂W/∂x + ∂²W/∂x² − k W(x, τ) = 0.

Here the coefficients of W and ∂W/∂x vanish, since (k − 1) − 2α = 0 and
β² − (k − 1)α + α² − k = β² − α² − k = 0, so the equation simplifies to the
heat equation

∂W/∂τ = ∂²W/∂x².   (12.2.4)

The boundary condition for W(x, τ) is obtained from (12.2.3) as

W0(xT) = W(xT, 0) = e^{αxT} U(xT, 0)
       = (e^{(α+1)xT} − e^{αxT})⁺ = (e^{βxT} − e^{αxT})⁺,   (12.2.5)

since β = α + 1. Notice that the transformation from W back to V is given by

V(S, t) = K e^{−αx−β²τ} W(x, τ).   (12.2.6)
12.2.2 Solution of the Heat Equation. The solution of the heat equation
(12.2.4) subject to the boundary condition (12.2.5) is

W(x, τ) = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} W0(y) dy
        = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} (e^{βy} − e^{αy})⁺ dy,   (12.2.7)

where the Green's function G(x, τ, y) is given by

G(x, τ, y) = (1/(2√(πτ))) e^{−(x−y)²/(4τ)}.   (12.2.8)

For details, see Kythe [2011: §6.1.2]. The graphs of this Green's function G
for some values of τ > 0 are shown in Figure 12.1; they are normal density
curves.

Figure 12.1 Green’s function for 1-D diffusion operator.
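As a quick numerical sanity check (ours, not the book's), the kernel (12.2.8)
is the N(x, 2τ) density in y, so it should integrate to 1 for every τ > 0. A
sketch, assuming scipy is available:

    import math
    from scipy.integrate import quad

    def heat_kernel(x, tau, y):
        # Green's function (12.2.8) of the one-dimensional heat equation.
        return math.exp(-(x - y) ** 2 / (4 * tau)) / (2 * math.sqrt(math.pi * tau))

    for tau in (0.1, 0.5, 2.0):
        total, _ = quad(lambda y: heat_kernel(0.0, tau, y), -math.inf, math.inf)
        print(tau, round(total, 10))   # each total should be 1.0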

12.2.3 Black-Scholes Call Price. The solution of the heat equation (12.2.4)
subject to the boundary condition (12.2.5) is obtained from Eq (12.2.7) as

W(x, τ) = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} W0(y) dy
        = (1/(2√(πτ))) ∫_{−∞}^{∞} e^{−(x−y)²/(4τ)} (e^{βy} − e^{αy})⁺ dy.   (12.2.9)

Now, set z = (y − x)/√(2τ), so that y = √(2τ) z + x, giving dy = √(2τ) dz. Then

W(x, τ) = (1/√(2π)) ∫_{−∞}^{∞} exp{−z²/2} (exp{β(√(2τ) z + x)} − exp{α(√(2τ) z + x)})⁺ dz.
   (12.2.10)
Note that the integrand in (12.2.10) is nonzero only when the difference of the
two exponential terms is positive, i.e., when β(√(2τ) z + x) > α(√(2τ) z + x),
or z > −x/√(2τ). Let us write Eq (12.2.10) as

W(x, τ) = I1 − I2,

where

I1 = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} exp{−z²/2} exp{β(√(2τ) z + x)} dz,
I2 = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} exp{−z²/2} exp{α(√(2τ) z + x)} dz.

Completing the square in the exponent of I1, we get −z²/2 + β√(2τ) z + βx =
−(z − β√(2τ))²/2 + βx + β²τ, and thus,

I1 = e^{βx+β²τ} (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−(z−β√(2τ))²/2} dz.

Now set u = z − β√(2τ). Then

I1 = e^{βx+β²τ} (1/√(2π)) ∫_{−x/√(2τ)−β√(2τ)}^{∞} e^{−u²/2} du
   = e^{βx+β²τ} Φ(x/√(2τ) + β√(2τ)),

where Φ is the c.d.f. of a standard normal random variable. The integral I2 is
similar, so replacing β by α, we get

I2 = e^{αx+α²τ} Φ(x/√(2τ) + α√(2τ)).

Recall that x = ln(S/K), k = 2r/σ², α = (1/2)(k − 1) = (r − σ²/2)/σ²,
β = (1/2)(k + 1) = (r + σ²/2)/σ², and τ = (1/2)σ²(T − t). Thus,

x/√(2τ) + β√(2τ) = [ln(S/K) + (r + σ²/2)(T − t)] / (σ√(T − t)) ≡ d1,
x/√(2τ) + α√(2τ) = d1 − σ√(T − t) ≡ d2,


and then the integrals I1 and I2 are given by

I1 = e^{βx+β²τ} Φ(d1), and I2 = e^{αx+α²τ} Φ(d2).

Hence, the solution is

W(x, τ) = I1 − I2 = e^{βx+β²τ} Φ(d1) − e^{αx+α²τ} Φ(d2).   (12.2.11)

Finally, to obtain the solution for the call price V(St, t), we use Eq (12.2.6)
in Eq (12.2.11) and obtain

V(S, t) = K e^{−αx−β²τ} W(x, τ) = K e^{−αx−β²τ} (I1 − I2).   (12.2.12)

The first and the second terms in Eq (12.2.12) are

K e^{−αx−β²τ} e^{βx+β²τ} Φ(d1) = K e^{(β−α)x} Φ(d1) = SΦ(d1), since β − α = 1,
K e^{−αx−β²τ} e^{αx+α²τ} Φ(d2) = K e^{(α²−β²)τ} Φ(d2) = K e^{−r(T−t)} Φ(d2),
   (12.2.13)

since β² − α² = 2r/σ², so that (α² − β²)τ = −r(T − t). Then combining both
terms in (12.2.13) we get the Black-Scholes call price C(St, K, T) of Eq (12.6.1).
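The closed form just derived is easy to evaluate numerically. The following is
a minimal sketch (ours, not the book's) using only the Python standard
library; Φ is computed from the error function via Φ(y) = (1 + erf(y/√2))/2.

    from math import erf, exp, log, sqrt

    def norm_cdf(y):
        # Standard normal c.d.f. via the error function.
        return 0.5 * (1.0 + erf(y / sqrt(2.0)))

    def bs_call(S, K, r, sigma, tau):
        # Black-Scholes call price S*Phi(d1) - K*exp(-r*tau)*Phi(d2), tau = T - t.
        d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
        d2 = d1 - sigma * sqrt(tau)
        return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

    # Example: spot 100, strike 100, r = 5%, sigma = 20%, one year to expiry.
    print(round(bs_call(100, 100, 0.05, 0.2, 1.0), 4))   # about 10.4506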

12.2.4 Some Finance Terms and Arbitrage


1. The price of a contingent claim can be determined using risk-neutral prob-
ability tools, such as martingales and changes of measure.
2. Examples of contingent claims are call options and put options.
3. A call option gives the holder the right (but not the obligation) to buy a
specified item for an agreed upon price at an agreed upon time.
4. Arbitrage is simply (risk-free) free money. An arbitrage argument says that
there should be no (risk-free) free money. Arbitrage can determine prices.
5. How does one use arbitrage to price a claim? Try to replicate the claim
with stocks and bonds.
6. Stocks and bonds are called securities.
7. A contingent claim, f , is replicable if we can construct a portfolio Π such
that
a. the values of Π and f are the same under all circumstances, and
b. Π is self-financing, i.e., as time passes, we only shift money around
within the portfolio, but we do not put any more in (or take any out).
8. Π is called the replicating portfolio (of f ).

Example 12.1. Suppose the price of gold today is $2000 per ounce and
the risk-free interest is 3%. Suppose you do not want gold today (because it
is out of fashion), but you do want it in 6 months (when, of course, it will

be suddenly very popular). Therefore, you buy a forward contract F0 , which


says that you will receive gold in 6 months. You have locked in a price today
for something you will buy in 6 months. Then how much should you pay for
this wonderful opportunity?
Suppose the forward contract costs $2500. Then you should go to the bank,
borrow $2000 at 3% interest, and use this money to buy some gold right
now. Then short (sell) the forward (to a sucker). In 6 months, the following
things will happen:
(i) You sell your gold for $2500.
(ii) You pay back your bank loan with your newly received funds.
(iii) You are left with $2500 − $2000 · e^{0.5(0.03)} = $469.77, which is a lot of
free money.
What if the forward contract F0 is selling for less than $2000 · e^{0.5(0.03)} ≈
$2030.23? Assume that you have gold lying around; then you sell your gold
today and get $2000, because you know the trick. Next, you put this $2000
in the bank. Finally, you go long (buy) the forward contract. Then what
happens after 6 months?
You take your money out of the bank, which is now $2000 · e^{0.5(0.03)}.
You use it to buy your gold back for F0.
Now you have your gold back, and $2000 · e^{0.5(0.03)} − F0. Since this number
is positive, you are happy.
Hence, arbitrage sets the price of the forward contract to be $2000 · e^{0.5(0.03)}
≈ $2030.23. If the price is anything else, there is risk-free free money to be
made. This is true of all forward contracts on an asset that has no storage
costs and pays no dividends, assuming that interest rates are constant.
Generally, any replicable claim has the same price as its self-financing
replicating portfolio.
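A short check of the arithmetic in this example (our own sketch):

    from math import exp

    spot, r, half_year = 2000.0, 0.03, 0.5
    fair_forward = spot * exp(r * half_year)       # cost-of-carry forward price
    print(round(fair_forward, 2))                  # 2030.23
    print(round(2500.0 - fair_forward, 2))         # 469.77, the arbitrage profit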

12.2.5 Self-Financing Portfolio. We have seen that forward contracts are
simple to price, mostly because of the linearity of payoffs at maturity. On
the other hand, options are difficult to price, because the payoff at maturity
has a kink (a point where the payoff fails to be differentiable). However, a
self-financing portfolio can be constructed using Itô's lemma. Thus, if dSt =
St µ dt + St σ dX, and f : (St, t) ↦ R, we find that if dx = (dSt, dt)′, then

df = (∇f, dx) = (∂f/∂S) dS + (∂f/∂t) dt
   = (St µ ∂f/∂S + ∂f/∂t) dt + St σ (∂f/∂S) dX.

Since dX behaves like √dt, we may take (dX)² = dt. Then (1/2)(∂²f/∂S²)(dS)² =
(1/2)(∂²f/∂S²) St² σ² dt up to first order. Hence, by Itô's lemma,

df = (St µ ∂f/∂S + ∂f/∂t + (1/2)σ² St² ∂²f/∂S²) dt + St σ (∂f/∂S) dX,   (12.2.14)

where (dX)² = dt, as noted above. Notice that the only randomness in df is
the dX term. Thus, we can construct a portfolio that eliminates the random
part, and the rest we can easily control.
First, we will rely on the discrete version of (12.2.14). Since we want to
price a contingent claim, or derivative, a simple method is to set

Π = { −1 unit of the derivative, ∆ shares of the stock },

where ∆ ≡ ∂f/∂S. For a small change δt in time, the corresponding change in
Π is δΠ = −δf + ∆ δS. The discrete version of (12.2.14) gives

δΠ = (−∂f/∂t − (1/2)(∂²f/∂S²) σ² St²) δt,   (12.2.15)

which implies that the portfolio is risk-less (no uncertainty), and then by the
arbitrage argument we must have δΠ = rΠ δt, or

(−∂f/∂t − (1/2)(∂²f/∂S²) σ² St²) δt = r(−f + ∆S) δt,

which yields

(∂f/∂t + (1/2)(∂²f/∂S²) σ² St² + r∆S) δt = rf δt,

or

∂f/∂t + (1/2)σ² St² ∂²f/∂S² + r∆S = rf,

which yields the Black-Scholes-Merton partial differential equation

∂f/∂t + (1/2)σ² St² ∂²f/∂S² + rS ∂f/∂S − rf = 0,   (12.2.16)

with known Cauchy data f(St, T), i.e., terminal conditions (at t = T) on f.
Thus, any function f that satisfies Eq (12.2.16) denotes the price of some
theoretical contingent claim, and every contingent claim must satisfy
Eq (12.2.16).
Solving Eq (12.2.16) with the boundary condition depicting a European
call option with strike K, i.e., with f(S, T) = max{S − K, 0}, we obtain
the Black-Scholes price of the European call option. Let c denote the Black-
Scholes price of a European call option on a stock with no dividend, i.e.,

c ≡ c(K, r, St, t, T, σ) = St N(d1) − K e^{−r(T−t)} N(d2),   (12.2.17)

where N is the cumulative distribution function of the standard normal
variable, N = N(0, 1), and

d1 = [ln(St/K) + (r + (1/2)σ²)(T − t)] / (σ√(T − t)),
d2 = d1 − σ√(T − t).   (12.2.18)

Some properties of the Black-Scholes price c are as follows:
1. If St is very large, c should be priced like a forward contract; indeed,
c ≈ St − K e^{−r(T−t)} when St is large, which is the price of a forward contract.
2. If σ is very small, the payoff is expected to be

c ≈ max{St e^{r(T−t)} − K, 0}.   (12.2.19)

These two properties constitute a benchmark test.
3. c is an increasing function of σ.
4. ∂c/∂S = N(d1). This is used to estimate ∆ = ∂f/∂S, to be used in the
replicating portfolio of c.
Notice that the price determined by risk-neutral expectation is the same
as the price determined by solving the Black-Scholes equation.

12.2.6 Implied Volatility. Next, we will discuss the implied volatility, and
where Black-Scholes goes wrong. Remember that prices are not set by the
Black-Scholes option price. It is the markets that set prices, and according to
some economists they set prices nearly perfectly. Therefore, go to the market
to see what a call option on a certain underlying is selling for at this moment,
i.e., at t = 0. We observe K, r, St, and T, but we cannot observe σ. So we
solve for σ using (12.2.17), which is easy since the Black-Scholes call option
price is monotonic in σ. The number we get is called the implied volatility;
a bisection sketch is given below.
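Since the call price is monotone increasing in σ, the implied volatility can be
recovered by bisection. A minimal sketch (ours; bs_call simply repeats the
pricing formula (12.2.17)):

    from math import erf, exp, log, sqrt

    def bs_call(S, K, r, sigma, tau):
        # Eq (12.2.17) with tau = T - t.
        d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
        d2 = d1 - sigma * sqrt(tau)
        cdf = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))
        return S * cdf(d1) - K * exp(-r * tau) * cdf(d2)

    def implied_vol(price, S, K, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
        # Bisection on sigma; valid because the call price is increasing in sigma.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if bs_call(S, K, r, mid, tau) < price:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Round-trip check: price at sigma = 0.25, then recover sigma from the price.
    p = bs_call(100, 95, 0.05, 0.25, 0.5)
    print(round(implied_vol(p, 100, 95, 0.05, 0.5), 6))   # 0.25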

If we check market data for different strike prices K, with everything else
being equal, we get different implied volatilities. In fact, what we get is called
a volatility smile or a volatility skew, depending on its shape. This undercuts
our assumption that σ is an intrinsic property of the underlying: if it were,
it would not vary with K.

Example 12.2. The prices for European call and put options on the
QQQ (a NASDAQ 100 composite) for October 11, 2016 and expiration dates
in October 17 and November 28, 2016, are presented in Table 12.1.

Table 12.1 Prices for European Call and Put Options

Calls Puts
Strike Oct Nov Oct Nov
34 3.9 4.1 0.05 0.25
35 2.8 3.2 0.05 0.35
36 1.85 2.35 0.1 0.55
37 1 1.65 0.25 0.85
38 0.35 1.05 0.6 1.25
39 0.1 0.6 1.4 1.9
40 0.05 0.35 2.35 2.6

From this table, we notice that

S0 = 37.73 (price at the closing of October 17, 2016),
T − t = 42/365 = 0.1151,
r = 0.83,
q = 0.18.

These data give the implied volatilities listed in Table 12.2.

Table 12.2 Implied Volatility

Strike October Call November Call


34 0.3230 0.29
35 0.2592 0.2493
36 0.2455 0.2369
37 0.2455 0.2369
38 0.2279 0.2198
39 0.2156 0.2279
40 0.2181 0.2206

The above data are plotted in Figure 12.2, with the strike price on the x-axis
and implied volatility on the y-axis.

Volatility smiles also occur with commodities. It is found that σ not only
varies with the strike price, it also depends on whether a call or a put is being
priced. Moreover, implied volatility varies with the expiration of the option.
Thus, market data show that the Black-Scholes model lacks certain features.
The model could be enlarged. Some suggestions are: (i) assume volatility is
stochastic, i.e., let dσ = µσ dt + σ̂ dX; (ii) assume volatility is local, i.e.,
σ = σ(S, t); (iii) assume the underlying follows a jump-diffusion process; and
(iv) assume interest rates are, at the very least, nonconstant. However, no
definitive improvement of the Black-Scholes model is available so far.

Figure 12.2 Implied volatility, October call.

12.3 Black-Scholes Formula


The Black-Scholes formula is used to calculate the price of the European
put and call options. This price remains consistent with the Black-Scholes
equation (12.1.1), since the formula is obtained by solving this equation using
the terminal and boundary conditions. Thus, for example, the value of a
call option for a non-dividend-paying underlying stock in terms of the Black-
Scholes parameters is

C(S, t) = N (d1 )S − N (d2 )Ke−r(T −t) , (12.3.1)

where N (·) is the cumulative distribution function of the standard normal


distribution, S is the spot price of the underlying asset, K is the strike price,
T − t is the time to maturity, r is the risk-free rate (annual, expressed as
continuously compounding), σ is the volatility of returns of the underlying
asset, and

d1 = (1/(σ√(T − t))) [ln(S/K) + (r + σ²/2)(T − t)],   (12.3.2)
d2 = d1 − σ√(T − t).   (12.3.3)

The price of the corresponding put option based on put-call parity is given by

P(S, t) = K e^{−r(T−t)} − S + C(S, t) = N(−d2) K e^{−r(T−t)} − N(−d1) S.   (12.3.4)
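A quick numerical check of the parity in (12.3.4) (our own sketch; the inputs
are arbitrary):

    from math import erf, exp, log, sqrt

    cdf = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))

    S, K, r, sigma, tau = 100.0, 95.0, 0.05, 0.2, 0.5
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)

    call = S * cdf(d1) - K * exp(-r * tau) * cdf(d2)            # Eq (12.3.1)
    put_parity = K * exp(-r * tau) - S + call                   # middle of (12.3.4)
    put_direct = K * exp(-r * tau) * cdf(-d2) - S * cdf(-d1)    # right side of (12.3.4)
    print(abs(put_parity - put_direct) < 1e-12)                 # True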

An alternative formulation of the Black-Scholes formula, obtained by
introducing some new variables, is

C(F, τ) = D [N(d+) F − N(d−) K],   (12.3.5)

where

d± = (1/(σ√τ)) [ln(F/K) ± (1/2)σ²τ], or equivalently d± = d∓ ± σ√τ,   (12.3.6)

and τ = T − t is the time to expiry (i.e., remaining, backwards time),
D = e^{−rτ} is the discount factor, F = e^{rτ} S = S/D is the forward price of the
underlying asset, S = DF, d+ = d1, and d− = d2. The formula (12.3.5)
is a special case of the so-called Black-76 formula. Thus, given the put-call
parity C − P = D(F − K) = S − DK, the price of a put option is

P(F, τ) = D [N(−d−) K − N(−d+) F].   (12.3.7)
The Black-Scholes formula (12.3.5) is a difference of two terms, each of
which equals the value of a binary call option. According to Nielsen [1993],
the formula can be interpreted in terms of the N(d±) (and a fortiori d±) terms
as follows: it decomposes a call option into the difference of two binary options,
an asset-or-nothing call minus a cash-or-nothing call. A call option exchanges
cash for an asset at expiry, while an asset-or-nothing call just yields the asset
(with no cash in exchange) and a cash-or-nothing call just yields cash (with
no asset in exchange).
Next, we can rewrite formula (12.3.1) as

C = D [N(d+) F − N(d−) K].   (12.3.8)

This formula is made up of the difference of two parts: DN (d+ )F , and


DN (d− )K, where the first part is the present value of an asset-or-nothing
call while the second is the present value of a cash-or-nothing call; the factor
D in each is for discounting, because the expiry date is in the future, and re-
moving it changes the present value to the future value (i.e., value at expiry).
In simple terms, N (d+ )F is the future value of an asset-or-nothing call and
N (d− )K is the future value of a cash-or-nothing call. However, in risk-neutral
terms, they are the expected value of the asset and the expected value of the
cash, respectively, in the risk-neutral measure.
An obviously incorrect interpretation ensues if the N(d+)F term is regarded
simply as the product of the probability of the option expiring in the money,
N(d+), and the value of the underlying at expiry, F, while the N(d−)K term
is regarded as the product of the probability of the option expiring in the
money, N(d−), and the value of the cash at expiry, K. This interpretation
fails because, even though either both binaries expire in the money or both
expire out of the money (i.e., either cash is exchanged for the asset or it is
not), the probabilities N(d+) and N(d−) are not equal. In fact, the quantities
d± are measures of moneyness (in standard deviations), and N(d−) is the
probability of expiring ITM (percent moneyness). Thus, the interpretation of
the cash option, N(d−)K, is correct, since the value of the cash is independent
of movements of the underlying, and therefore it can be interpreted simply
as 'probability times value.'
On the other hand, the product N(d+)F is more complicated since, according
to Nielsen [1993], the probability of expiring in the money and the value of
the asset at expiry are not independent. In fact, the value of the asset at
expiry is variable in terms of cash, but constant in terms of the asset itself
(it is a fixed quantity of the asset). Thus, these quantities are independent
only if one changes the numéraire from cash to the asset.
In formula (12.3.5), if the spot price S replaces the forward F in d±, then in
place of the ±(1/2)σ²τ term we get (r ± (1/2)σ²)τ, which can be interpreted
as a drift factor in the risk-neutral measure for the appropriate numéraire.
The factor (1/2)σ² accounts for the difference between the median and the
mean of the log-normal distribution, since d− is used for moneyness rather
than the standardized moneyness m = (1/(σ√τ)) ln(F/K). The same factor
appears in Itô's lemma applied to geometric Brownian motion. Another sign
that the naive interpretation of replacing N(d+) by N(d−) in formula (12.3.5)
is incorrect is that it yields a negative value for out-of-the-money call options.
Thus, the terms N (d1 ) and N (d2 ) represent, respectively, the probabilities
of the option expiring in-the-money under the exponential martingale prob-
ability measure for stock and the equivalent martingale probability measure
for risk-free asset. The risk-neutral probability for a finite stock price ST is
defined by
N ′ (d2 (ST ))
p(S, T ) = √ , ST ∈ (0, ∞), (12.3.9)
ST σ T

where N ′ is the standard normal probability density function, and d2 = d2 (K)


is defined in (12.3.3). Note that the term N (d2 ) represents the probability that
the call will be exercised under the assumption that the asset drift is risk-free.
On the other hand, the term N (d1 ) has no simple probability interpretation,
but SN (d1 ) represents the present value, under risk-free interest rate, of the
expected asset price at expiration provided the asset price at expiry is above
the exercise price.

The solution of Eq (12.1.1), when discounted appropriately, is a martingale.
This means that the option price is the expected value of the discounted
payoff of the option. Any computation of the option price using this
expectation is the risk-neutrality approach, and it can be performed without
any knowledge of the theory of partial differential equations, since the
expectation of the option payoff is taken not under the real-world probability
measure, but under an artificial risk-neutral measure, which is different from
the real-world measure. For details, see Hull [2008; 307-309].

12.4 Use of Greek Letters


The letters of the Greek alphabet, commonly known as the Greeks, are im-
portant not only in the mathematical theory of finance but also in active
trading. They measure the sensitivity of the value of a derivative or a portfo-
lio to changes in parameter value(s) while the other measures are held fixed.
Mathematically, they are partial derivatives of the price with respect to the
parameter values. For example, the Greek letter gamma (γ) is a partial de-
rivative of another Greek letter delta (δ). Financial institutions typically set
(risk) limit values for each of the Greeks that their traders must not exceed.
The Greek delta is the most important, since it usually carries the largest risk.
For example, many traders who speculate and follow a delta-neutral hedging
approach will zero out their delta at the end of the day.
The Greeks for Black-Scholes can be obtained by differentiation of the
Black-Scholes formula (12.3.1) or (12.3.5) (see Chen et al. [2010]). Note that
it is clear from the formulae that the gamma and vega1 are both the same
value for call and put options. This can be seen directly from put-call parity,
since the difference of a put and a call is a forward, which is linear in S and
independent of σ; so a forward has zero gamma and zero vega. Recall that
N ′ is the standard normal probability density function.
In practice, some sensitivities are usually quoted in scaled-down terms, to
match the scale of likely changes in the parameters. For example, rho (ρ) is
often reported divided by 10,000 (1 basis point rate change), vega by 100 (1
vol point change), and theta (θ) by 365 or 252 (1 day decay based on either
calendar days or trading days per year).
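As an illustration (ours, not the book's), the standard closed-form Greeks of
a European call under (12.3.1) are ∆ = N(d1), Γ = N′(d1)/(Sσ√(T − t)), and
vega = S N′(d1)√(T − t), where N′ is the standard normal density:

    from math import erf, exp, log, pi, sqrt

    def call_greeks(S, K, r, sigma, tau):
        # Delta, gamma, and vega of a European call; tau = T - t.
        d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
        pdf = exp(-0.5 * d1 ** 2) / sqrt(2.0 * pi)   # N'(d1)
        delta = 0.5 * (1.0 + erf(d1 / sqrt(2.0)))    # N(d1)
        gamma = pdf / (S * sigma * sqrt(tau))
        vega = S * pdf * sqrt(tau)                   # per unit of sigma
        return delta, gamma, vega

    print([round(g, 4) for g in call_greeks(100, 100, 0.05, 0.2, 1.0)])

In keeping with the scaling conventions above, dividing vega by 100 reports
it per one-point change in volatility.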

12.5 Log-normal Distribution


12.5.1 Log-normal p.d.f. and c.d.f. First, note that if a random variable
Y ∈ R follows the normal distribution with mean µ and variance σ², then
X = e^Y follows the log-normal distribution with mean

E[X] = e^{µ + σ²/2},   (12.5.1)
1 vega is not a letter in the Greek alphabet; it arises from reading the Greek letter ν as
‘v’.

and variance

Var[X] = (e^{σ²} − 1) e^{2µ + σ²}.   (12.5.2)

The p.d.f. of X is

dF_X(x) = (1/(σx√(2π))) exp{−(1/2)((ln x − µ)/σ)²},   (12.5.3)

and the c.d.f. is

F_X(x) = Φ((ln x − µ)/σ),   (12.5.4)

where

Φ(y) = (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt

is the standard normal c.d.f.
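The moments (12.5.1) and (12.5.2) can be checked by simulation. A sketch
(ours, not the book's):

    import random
    from math import exp

    mu, sigma, n = 0.1, 0.3, 1_000_000
    random.seed(1)
    xs = [exp(random.gauss(mu, sigma)) for _ in range(n)]

    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    print(round(mean, 4), round(exp(mu + 0.5 * sigma ** 2), 4))        # (12.5.1)
    print(round(var, 4),
          round((exp(sigma ** 2) - 1) * exp(2 * mu + sigma ** 2), 4))  # (12.5.2)

The Monte Carlo estimates should match the closed forms to a few decimal
places at this sample size.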

12.5.2 Log-normal Conditional Expected Value. The expected value of X
over the event X > K (the partial expectation, L_X(K) = E[X · 1_{X>K}])
is found from Eq (12.5.3) as follows: for the log-normal distribution, the
factor x in the integrand cancels the 1/x in the p.d.f., so

L_X(K) = ∫_K^∞ (1/(σ√(2π))) exp{−(1/2)((ln x − µ)/σ)²} dx.   (12.5.5)

With the change of variables y = ln x, so that x = e^y, dx = e^y dy, and the
Jacobian e^y, we get from (12.5.5)

L_X(K) = ∫_{ln K}^∞ (e^y/(σ√(2π))) exp{−(1/2)((y − µ)/σ)²} dy.   (12.5.6)

The exponent in (12.5.6), after completing the square and combining terms,
becomes

−(1/(2σ²))(y² − 2yµ + µ² − 2σ²y) = −(1/(2σ²))(y − (µ + σ²))² + µ + (1/2)σ²,

and Eq (12.5.6) reduces to

L_X(K) = exp{µ + (1/2)σ²} ∫_{ln K}^∞ (1/(σ√(2π))) exp{−(1/2)((y − (µ + σ²))/σ)²} dy.
   (12.5.7)
Now, for a random variable X with p.d.f. f_X(x) and c.d.f. F_X(x), and the
scale-location transformation Y = σX + µ, it is easy to show that the Jacobian
is 1/σ, the p.d.f. of Y is f_Y(y) = (1/σ) f_X((y − µ)/σ), and the c.d.f. is
F_Y(y) = F_X((y − µ)/σ). Hence, the integral in Eq (12.5.7) involves a
scale-location transformation of the standard normal c.d.f. Since
Φ(−x) = 1 − Φ(x), we get (Hogg and Klugman [1984])

L_X(K) = exp{µ + (1/2)σ²} Φ((−ln K + µ + σ²)/σ).   (12.5.8)
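Formula (12.5.8) is easy to verify by Monte Carlo. A sketch (ours); recall that
L_X(K) here is the partial expectation E[X · 1_{X>K}]:

    import random
    from math import erf, exp, log, sqrt

    def partial_expectation(mu, sigma, K):
        # Closed form (12.5.8) for log-normal X = exp(N(mu, sigma^2)).
        Phi = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))
        return exp(mu + 0.5 * sigma ** 2) * Phi((-log(K) + mu + sigma ** 2) / sigma)

    mu, sigma, K, n = 0.0, 0.5, 1.2, 1_000_000
    random.seed(7)
    mc = sum(x for x in (exp(random.gauss(mu, sigma)) for _ in range(n)) if x > K) / n
    print(round(mc, 4), round(partial_expectation(mu, sigma, K), 4))   # close agreement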

12.6 Black-Scholes Call Price


Let C(St, K, T) be the time-t price of a European call option. Then we have

C(St, K, T) = St Φ(d1) − e^{−rτ} K Φ(d2),   (12.6.1)

where

d1 = [ln(St/K) + (r + (1/2)σ²)τ] / (σ√τ),
d2 = d1 − σ√τ,   (12.6.2)
Φ(y) = (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt,

and Φ(y) is the standard normal c.d.f.

12.6.1 Black-Scholes Economy. There are two assets: a risky stock S and
a risk-less bond B. These assets are driven by the stochastic differential
equations

dSt = µSt dt + σSt dXt,   (12.6.3)
dBt = rt Bt dt.   (12.6.4)

The value of the bond at time zero is B0 = 1, and that of the stock is S0. This
model is valid under certain market assumptions, for which see Hull [2008].
By Itô's lemma, the value Vt of a derivative written on the stock follows the
diffusion equation

dVt = (∂V/∂t) dt + (∂V/∂S) dS + (1/2)(∂²V/∂S²)(dS)²
    = (∂V/∂t) dt + (∂V/∂S) dS + (1/2)σ²S² (∂²V/∂S²) dt   (12.6.5)
    = (∂V/∂t + µSt ∂V/∂S + (1/2)σ²St² ∂²V/∂S²) dt + σSt (∂V/∂S) dXt.
There are four different methods to derive Eq (12.6.1): (i) by straightforward
integration; (ii) by applying the Feynman-Kac theorem; (iii) by transforming
the Black-Scholes equation into the heat equation, for which a solution is
known (this was the original method used by Black and Scholes [1973]; see
§12.2.3); and (iv) by using the Capital Asset Pricing Model (CAPM). We
will discuss these methods in the sequel.

With constant interest rate r, the time-t price of a European call option on
a non-dividend-paying stock, when its spot price is St, with strike K and
time to maturity τ = T − t, is given by

C(St, K, T) = e^{−rτ} E^Q[(ST − K)⁺ | F_t],   (12.6.6)

which can be evaluated to give Eq (12.6.1), rewritten here for convenience as

C(St, K, T) = St Φ(d1) − K e^{−rτ} Φ(d2),
d1 = [ln(St/K) + (r + (1/2)σ²)τ] / (σ√τ),   (12.6.7)
d2 = d1 − σ√τ = [ln(St/K) + (r − (1/2)σ²)τ] / (σ√τ).

To find a measure Q such that under this measure the discounted stock price
(discounted by Bt) is a martingale, let

dSt = rt St dt + σSt dW_t^Q, where W_t^Q = Wt + ((µ − rt)/σ) t.   (12.6.8)

12.6.2 Black-Scholes under a Different Numéraire. The principle behind
'pricing by arbitrage' is that if the market is complete, we can find a
portfolio that replicates the derivative at all times, and we can find an
equivalent martingale measure (EMM) N such that the discounted stock price
is a martingale. Moreover, N determines the unique numéraire Nt that
discounts the stock price. The time-t value V(St, t) of the derivative with
payoff V(ST, T) at time T discounted by the numéraire Nt is

V(St, t) = Nt E^N[ V(ST, T)/N_T | F_t ].   (12.6.9)

Recall that the bond serves as the numéraire, and since r is deterministic,
we can take N_T = e^{rT} out of the expectation; with V(ST, T) = (ST − K)⁺
we can write

V(St, t) = e^{−r(T−t)} E^N[(ST − K)⁺ | F_t],

which is Eq (12.6.6) for the call price.



Now we will use the stock price St as the numéraire and recover the Black-
Scholes call price. We start with the stock price process in Eq (12.6.8) under
the measure Q and with a constant interest rate:

dSt = rSt dt + σSt dW_t^Q.   (12.6.10)

The related bond price is defined by B̃ = B/S. Then by Itô's lemma we get
the process

dB̃t = σ²B̃t dt − σB̃t dW_t^Q.   (12.6.11)

The measure Q turns S̃ = S/B into a martingale, but not B̃. The measure P
that turns B̃ into a martingale is given by

W_t^P = W_t^Q − σt,   (12.6.12)

so that dB̃t = −σB̃t dW_t^P, which is a martingale under P.

The value of the European call is determined by using Nt = St as the
numéraire together with the payoff function V(ST, T) = (ST − K)⁺ in
evaluating Eq (12.6.9) as

V(St, t) = St E^P[ (ST − K)⁺/ST | F_t ] = St E^P[(1 − K Z_T)⁺ | F_t],   (12.6.13)

where Zt = 1/St. To evaluate V(St, t) we need the distribution of Z_T. The
process for Z = 1/S is obtained by using Itô's lemma in Eq (12.6.10) and the
change of measure in Eq (12.6.12), i.e.,

dZt = (−r + σ²) Zt dt − σZt dW_t^Q = −rZt dt − σZt dW_t^P.   (12.6.14)

Thus, to solve for Zt, define Yt = ln Zt and apply Itô's lemma again, yielding

dYt = −(r + (1/2)σ²) dt − σ dW_t^P.   (12.6.15)

This equation after integration yields

Y_T − Y_t = −(r + (1/2)σ²)(T − t) − σ(W_T^P − W_t^P),

so that the solution for Z_T is

Z_T = exp{ ln Zt − (r + (1/2)σ²)(T − t) − σ(W_T^P − W_t^P) }.   (12.6.16)

Note that W_T^P − W_t^P is identical in distribution to W_τ^P, where τ = T − t
is the time to maturity, so σ(W_T^P − W_t^P) follows the normal distribution
with zero mean and variance σ²τ. Hence the exponent in Eq (12.6.16) follows
the normal distribution with mean ln Zt − (r + (1/2)σ²)τ = −ln St − (r + (1/2)σ²)τ ≡ u
and variance σ²τ ≡ v. This implies that Z_T follows the log-normal distribution
with mean e^{u+v/2} and variance (e^v − 1) e^{2u+v}. Note that the factor
(1 − K Z_T)⁺ in the expectation of Eq (12.6.13) is nonzero for Z_T < 1/K.
Hence, we can rewrite this expectation as two integrals

E^P[(1 − K Z_T)⁺ | F_t] = ∫_{−∞}^{1/K} dF_{Z_T} − K ∫_{−∞}^{1/K} Z_T dF_{Z_T} ≡ I1 − I2,
   (12.6.17)

where F_{Z_T} is the c.d.f. of Z_T defined by Eq (12.6.16). For I1 we find

I1 = F_{Z_T}(1/K) = Φ((ln(1/K) − u)/√v)
   = Φ((−ln K + ln St + (r + (1/2)σ²)τ)/(σ√τ)) = Φ(d1),

and using the partial expectation L_{Z_T}(·) from Eq (12.5.8) for I2,

I2 = K (∫_{−∞}^∞ Z_T dF_{Z_T} − ∫_{1/K}^∞ Z_T dF_{Z_T})
   = K (E^P[Z_T] − L_{Z_T}(1/K))
   = K (e^{u+v/2} − e^{u+v/2} Φ((−ln(1/K) + u + v)/√v))   (12.6.18)
   = K e^{u+v/2} (1 − Φ((−ln(St/K) − (r − (1/2)σ²)τ)/(σ√τ)))
   = (K/St) e^{−rτ} Φ(d2), since 1 − Φ(−d2) = Φ(d2) and e^{u+v/2} = e^{−rτ}/St.

Substituting these values of I1 and I2 into Eq (12.6.13) we get

V(St, t) = St E^P[(1 − K Z_T)⁺ | F_t] = St [I1 − I2]
        = St Φ(d1) − K e^{−rτ} Φ(d2),   (12.6.19)

which is the Black-Scholes call price of Eq (12.6.1).

12.6.3 Black-Scholes by Direct Integration. The European call price
C(St, K, T) is the discounted time-t expected value of (ST − K)⁺ under the
EMM Q when the interest rate is constant. Thus, from Eq (12.6.6),

C(St, K, T) = e^{−rτ} E^Q[(ST − K)⁺ | F_t]
            = e^{−rτ} ∫_K^∞ (ST − K) dF(ST)   (12.6.20)
            = e^{−rτ} ∫_K^∞ ST dF(ST) − e^{−rτ} K ∫_K^∞ dF(ST).

To evaluate these two integrals, we will use the results derived in §12.6.2:
under Q and at time t, the terminal stock price ST follows the log-normal
distribution, with ln ST having mean ln St + (r − (1/2)σ²)τ and variance σ²τ,
where τ = T − t denotes the time to maturity.

The first integral in the last line of Eq (12.6.20) is the partial expectation of
ST over the event ST > K; thus

∫_K^∞ ST dF(ST) = E^Q[ST · 1_{ST > K}] = L_{ST}(K),   (12.6.21)

where the partial expectation from Eq (12.5.8) is given by

L_{ST}(K) = exp{ ln St + (r − (1/2)σ²)τ + (1/2)σ²τ }
            × Φ((−ln K + ln St + (r − (1/2)σ²)τ + σ²τ)/(σ√τ))
          = St e^{rτ} Φ(d1),   (12.6.22)

and thus, after discounting by e^{−rτ}, the first term in the last line of
Eq (12.6.20) is St Φ(d1). Next, using Eq (12.5.4), the second integral in the
last line of Eq (12.6.20) can be written as

e^{−rτ} K ∫_K^∞ dF(ST) = e^{−rτ} K [1 − F(K)]
  = e^{−rτ} K [1 − Φ((ln K − ln St − (r − (1/2)σ²)τ)/(σ√τ))]
  = e^{−rτ} K [1 − Φ(−d2)] = e^{−rτ} K Φ(d2).

Hence, combining these two terms we obtain Eq (12.6.1) for the European
call price.
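The last line of (12.6.20) can also be evaluated by numerical quadrature as a
check against the closed form. A sketch (ours), assuming scipy is available:

    from math import erf, exp, log, pi, sqrt
    from scipy.integrate import quad

    S, K, r, sigma, tau = 100.0, 95.0, 0.05, 0.2, 0.5

    def rn_density(sT):
        # Risk-neutral density of S_T: ln S_T ~ N(ln S + (r - sigma^2/2) tau, sigma^2 tau).
        m = log(S) + (r - 0.5 * sigma ** 2) * tau
        v = sigma ** 2 * tau
        return exp(-(log(sT) - m) ** 2 / (2 * v)) / (sT * sqrt(2 * pi * v))

    numeric, _ = quad(lambda sT: (sT - K) * rn_density(sT), K, 20 * S)

    cdf = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    closed = S * cdf(d1) - K * exp(-r * tau) * cdf(d1 - sigma * sqrt(tau))

    print(round(exp(-r * tau) * numeric, 6), round(closed, 6))   # the two should agree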

12.6.4 Feynman-Kac Theorem. First, we will discuss this theorem.

Theorem 12.1. (Feynman-Kac theorem) Suppose that xt follows the process

dxt = µ(xt, t) dt + σ(xt, t) dW_t^Q,   (12.6.23)

and suppose that the differentiable function V = V(xt, t) satisfies the partial
differential equation

∂V/∂t + µ(xt, t) ∂V/∂x + (1/2)σ(xt, t)² ∂²V/∂x² − r(t, x) V(xt, t) = 0,   (12.6.24)

with boundary condition V(x_T, T). Then this equation has the solution

V(xt, t) = E^Q[ exp{−∫_t^T r(x_u, u) du} V(x_T, T) | F_t ].   (12.6.25)

In this equation the time-t expectation is defined with respect to the same
measure Q under which the stochastic part of Eq (12.6.23) defines Brownian
motion.

In order to apply the Feynman-Kac theorem to the Black-Scholes call price,
note that the value Vt = V(St, t) of a European call option written at time t
with strike price K and constant rate of interest r satisfies the Black-Scholes
equation

∂V/∂t + rSt ∂V/∂S + (1/2)σ²St² ∂²V/∂S² − rVt = 0,   (12.6.26)

with boundary condition V(ST, T) = (ST − K)⁺.¹ Eq (12.6.26) is the same
as Eq (12.6.24) with xt = St, µ(xt, t) = rSt, and σ(xt, t) = σSt. Thus, we can
apply the Feynman-Kac theorem, so that the value of the European call is
given by

V(St, t) = E^Q[ exp{−∫_t^T r(x_u, u) du} V(ST, T) | F_t ]
         = e^{−rτ} E^Q[(ST − K)⁺ | F_t].   (12.6.27)

This equation is the same as Eq (12.6.6). Hence, the expectation in Eq
(12.6.27) can be evaluated exactly as in §12.6.3, and thus we obtain the call
price of Eq (12.6.1).

12.6.5 CAPM. The Capital Asset Pricing Model (CAPM) is based on the
assumption that the expected return ri of a security i in excess of the risk-free
rate r is
E[ri ] − r = βi (E[rM ] − r),
1 See www.frouah.com for the derivation of Eq (12.6.26).

where rM denotes the return on the market, and the security's beta is given by

βi = Cov[ri, rM] / Var[rM].

12.6.6 CAPM for Assets. During the time increment dt, the expected stock
price return E[rS dt] is E[dSt/St], where St satisfies the diffusion equation
(12.6.3). Then the expected return is

E[dSt/St] = r dt + βS (E[rM] − r) dt.   (12.6.28)

Similarly, the expected return on the derivative, E[rV dt], where Vt satisfies
the diffusion equation (12.6.5), is

E[dVt/Vt] = r dt + βV (E[rM] − r) dt.   (12.6.29)

If we divide both sides of Eq (12.6.5) by Vt, we get

dVt/Vt = (1/Vt)(∂V/∂t + (1/2)σ²St² ∂²V/∂S²) dt + (∂V/∂S)(dSt/St)(St/Vt),

or

rV dt = (1/Vt)(∂V/∂t + (1/2)σ²St² ∂²V/∂S²) dt + (∂V/∂S)(St/Vt) rS dt.   (12.6.30)

If we drop dt from both sides and take the covariance of rV and rM, noting
that only the second term on the right-hand side of Eq (12.6.30) is stochastic,
we get

Cov[rV, rM] = (∂V/∂S)(St/Vt) Cov[rS, rM],   (12.6.31)

which yields the relation between the beta of the derivative, βV, and the beta
of the stock, βS, as

βV = (∂V/∂S)(St/Vt) βS,

which is the same relation as in Black-Scholes [1973, Eq (15)]. Next,
multiplying Eq (12.6.29) by Vt we obtain

E[dVt] = rVt dt + Vt βV (E[rM] − r) dt
       = rVt dt + (∂V/∂S) St βS (E[rM] − r) dt,   (12.6.32)

which is same relation as in Black-Scholes [1973, Eq(18)]. Now, taking expec-


tations of the second line in Eq (12.6.5), and substituting for E[dSt ] from Eq
(12.6.29), we obtain

∂V ∂V 1 ∂2V 2 2
E[dVt ] = dt+ [rSt dt+ St βS (E[rM − r]) dt]+ σ S dt. (12.6.33)
∂t ∂S 2 ∂S 2
On equating Eqs (12.6.32) and (12.6.33), dropping dt from both sides, and
canceling terms in βS , we get the Black-Scholes equation (12.6.26). Hence, we
have obtained the Black-Scholes call price by using the Feynman-Kac theorem
exactly the same way as in §12.6.4 and solving the integral as in §12.6.3.

12.7 Dividends
The Black-Scholes call price in Eq (12.6.1) is for a call written on a non-
dividend-paying stock. There are two ways to incorporate dividends into the
call price: (i) by assuming that the stock pays a continuous dividend yield
q, or (ii) by assuming that the stock pays dividends in lump sum or ‘lumpy’
dividends.

12.7.1 Continuous Dividends. Assume that the dividend yield q is constant,
so that the stockholder receives an amount qSt dt of dividend in the time
increment dt. After the dividend is paid out, the value of the stock drops by
the dividend amount. In other words, without the dividend yield the value
of the stock increases by rSt dt, but with the dividend yield it increases by
rSt dt − qSt dt = (r − q)St dt. Thus, the expected return becomes r − q
instead of r, which means that the risk-neutral process for St satisfies Eq
(12.6.8) but with drift r − q instead of r:

dSt = (r − q)St dt + σSt dW_t^Q.   (12.7.1)

Following the same derivation method as in §12.6.2, Eq (12.7.1) has the
solution

ST = St exp{(r − q − (1/2)σ²)τ + σW_τ^Q}, τ = T − t.   (12.7.2)

Thus, ST follows the log-normal distribution with mean St e^{(r−q)τ} and
variance St² e^{2(r−q)τ}(e^{σ²τ} − 1). Then, proceeding as in Eq (12.6.20), the
call price is

C(St, K, T) = e^{−rτ} L_{ST}(K) − e^{−rτ} K (1 − F(K)),   (12.7.3)

where the partial expectation L_{ST}(K) from Eq (12.5.8) becomes

L_{ST}(K) = exp{ ln St + (r − q − (1/2)σ²)τ + (1/2)σ²τ }
            × Φ((−ln K + ln St + (r − q − (1/2)σ²)τ + σ²τ)/(σ√τ))
          = St e^{(r−q)τ} Φ(d1),   (12.7.4)

where  
St 1
ln + r − q + σ2 τ
K 2
d1 = √ .
σ τ
Using Eq (12.5.4), the second term in Eq (12.6.20) becomes

h  ln K − ln S − r − q − 1 σ 2  τ i
−rτ −rτ t 2
e K[1 − F (K)] = e 1−Φ √
σ τ
= e−rτ KΦ(d2 ), (12.7.5)

where d2 = d1 σ τ , as before. Then substituting Eqs (12.7.4) and (12.7.5)
into Eq (12.7.2), we obtain the Black-Scholes price of a European call written
on a stock that pays continuous dividends as

C(St , K, T ) = St e−qτ Φ(d1 ) − e−rτ KΦ(d2 ). (12.7.6)

Notice that the only modification is that the current value of the stock price
is decreased by e−qτ , and the return on the stock is decreased from r to r − q.
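A sketch of (12.7.6) in code (ours, not the book's):

    from math import erf, exp, log, sqrt

    def bs_call_div(S, K, r, q, sigma, tau):
        # European call on a stock paying a continuous dividend yield q, Eq (12.7.6).
        cdf = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))
        d1 = (log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
        d2 = d1 - sigma * sqrt(tau)
        return S * exp(-q * tau) * cdf(d1) - K * exp(-r * tau) * cdf(d2)

    # With q = 0 this reduces to the non-dividend price of Eq (12.6.1).
    print(round(bs_call_div(100, 100, 0.05, 0.02, 0.2, 1.0), 4))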

12.7.2 Lumpy Dividends. The concept is the same as above, except that
the current value of the stock price is decreased by the dividends, though not
continuously.

12.8 Solutions of SDEs


12.8.1 Stock Price. Recall that St is driven by the diffusion in Eq (12.6.3).
Now, applying Itô's lemma to ln St, we get

d ln St = (µ − (1/2)σ²) dt + σ dWt,   (12.8.1)

which, upon integration from 0 to t, gives

∫_0^t d ln S_u = ∫_0^t (µ − (1/2)σ²) du + ∫_0^t σ dW_u,

so that

ln St − ln S0 = (µ − (1/2)σ²) t + σWt, since W0 = 0.

Hence, the solution of the SDE is

St = S0 exp{(µ − (1/2)σ²)t + σWt}.   (12.8.2)

Since Wt is distributed N(0, t), with zero mean and variance t, we find that
ln St follows the normal distribution with mean ln S0 + (µ − (1/2)σ²)t and
variance σ²t. Hence, in view of (12.5.1) and (12.5.2), St follows the log-normal
distribution with mean S0 e^{µt} and variance S0² e^{2µt}(e^{σ²t} − 1). We can
also integrate Eq (12.8.1) from t to T and obtain

ST = St exp{(µ − (1/2)σ²)τ + σ(WT − Wt)},

which is similar to Eq (12.8.2); thus, ST follows the log-normal distribution
with mean St e^{µτ} and variance St² e^{2µτ}(e^{σ²τ} − 1).
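The log-normal mean and variance of St can be confirmed by simulating the
exact solution (12.8.2). A sketch (ours, not the book's):

    import random
    from math import exp, sqrt

    S0, mu, sigma, t, n = 100.0, 0.08, 0.25, 1.0, 500_000
    random.seed(3)
    samples = [S0 * exp((mu - 0.5 * sigma ** 2) * t
                        + sigma * sqrt(t) * random.gauss(0, 1))
               for _ in range(n)]

    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    print(round(mean, 2), round(S0 * exp(mu * t), 2))   # sample vs. exact mean
    print(round(var, 1),
          round(S0 ** 2 * exp(2 * mu * t) * (exp(sigma ** 2 * t) - 1), 1))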

12.8.2 Bond Price. Applying Itô's lemma to the function ln Bt, we find
that this function follows the SDE

d ln Bt = rt dt.

Integrating this equation from 0 to t we get

ln Bt − ln B0 = ∫_0^t r_u du,

which yields the solution of this SDE as

Bt = exp{∫_0^t r_u du}, since B0 = 1.

Note that when interest rates are constant, rt = r and Bt = e^{rt}.
Integrating instead from t to T we get the solution

B_{t,T} = exp{∫_t^T r_u du} ⟹ B_{t,T} = e^{rτ} for constant interest rates.

12.8.3 Discounted Stock Price as Martingale. To find a measure Q such
that under this measure the discounted stock price (discounted by Bt) is a
martingale, let

dSt = rt St dt + σSt dW_t^Q, where W_t^Q = Wt + ((µ − rt)/σ) t.   (12.8.3)

We know that under Q, at time t = 0 and with constant rate r, the stock
price St follows the log-normal distribution with mean S0 e^{rt} and variance
S0² e^{2rt}(e^{σ²t} − 1); however, St is not a martingale. If we use Bt as the
numéraire, the discounted stock price S̃t = St/Bt will be a martingale.
Applying Itô's lemma to S̃t, we get the SDE

dS̃t = (∂S̃t/∂B) dBt + (∂S̃t/∂S) dSt,   (12.8.4)

since all terms involving second-order derivatives are zero. Expanding Eq
(12.8.4) we get the SDE

dS̃t = −(St/Bt²) dBt + (1/Bt) dSt
     = −(St/Bt²) rt Bt dt + (1/Bt)(rt St dt + σSt dW_t^Q)
     = σS̃t dW_t^Q.

The solution of this SDE is

S̃t = S̃0 exp{−(1/2)σ²t + σW_t^Q}.   (12.8.5)

Thus, ln S̃t follows the normal distribution with mean ln S̃0 − (1/2)σ²t and
variance σ²t. For a proof that S̃ is a martingale, see Exercise 12.4.
12.8.4 Summary. Applying Itô's lemma to the stock price dSt = µSt dt +
σSt dWt and the bond price dBt = rt Bt dt, we obtain the processes for ln St
and ln Bt:

d ln St = (µ − (1/2)σ²) dt + σ dWt,
d ln Bt = rt dt.

Solving these equations, we get

St = S0 exp{(µ − (1/2)σ²)t + σWt}, and Bt = exp{∫_0^t r_s ds}.

If we apply a change of measure to get the stock price under the risk-neutral
measure Q, we have

dSt = rSt dt + σSt dW_t^Q ⟹ St = S0 exp{(r − (1/2)σ²)t + σW_t^Q}.

Since St is not a martingale under Q, we discount St by Bt to obtain S̃t =
St/Bt and

dS̃t = σS̃t dW_t^Q ⟹ S̃t = S̃0 exp{−(1/2)σ²t + σW_t^Q}.

Thus, S̃t is a martingale under Q.



The distributions of the above processes are summarized in Table 12.3,
where S̃ = S/B. Also note that the logarithm of the stock price is normally
distributed.

Table 12.3 Distributions†

Stochastic process | Mean | Variance | Martingale?
dS = µS dt + σS dW | St e^{µτ} | St² e^{2µτ}(e^{σ²τ} − 1) | no
dS = rS dt + σS dW^Q | St e^{rτ} | St² e^{2rτ}(e^{σ²τ} − 1) | no
dS̃ = σS̃ dW^Q | S̃t | S̃t²(e^{σ²τ} − 1) | yes
dS̃ = (µ − r)S̃ dt + σS̃ dW | S̃t e^{(µ−r)τ} | S̃t² e^{2(µ−r)τ}(e^{σ²τ} − 1) | no

† Each entry is the log-normal distribution of ST | Ft (or S̃T | Ft).

12.9 Exercises
12.1. Let the price S of a share today be $12.00. We will construct a time
series for the share prices over four intervals if µ = 0.35, σ = 0.26, dt = 1/252,
where 252 is the number of trading days in a year. We will determine dX at
each step from the normal distribution with mean 0 and standard deviation
1/√252 ≈ 0.063, i.e., N(0, 1/√252) = N(0, 0.063).

Step 1. At time t = 0 we have S0 = 12. We choose a value for dX from
N(0, 0.063), so take dX = −0.05. Then Eq (12.1.2) gives

dS/12 = 0.35/252 + 0.26(−0.05) = −0.0116 ⟹ dS = 12(−0.0116) = −0.14;

thus, S1 = 12 − 0.14 = 11.86.

Step 2. S1 = 11.86; choose dX = 0.15. Then

dS/11.86 = 0.35/252 + 0.26(0.15) = 0.04 ⟹ dS = 11.86(0.04) = 0.48;

thus, S2 = 11.86 + 0.48 = 12.34.

Step 3. S2 = 12.34; choose dX = 0.09. Then

dS/12.34 = 0.35/252 + 0.26(0.09) = 0.025 ⟹ dS = 12.34(0.025) = 0.31;

thus, S3 = 12.34 + 0.31 = 12.65.

Step 4. S3 = 12.65; choose dX = 0.12. Then

dS/12.65 = 0.35/252 + 0.26(0.12) = 0.0325 ⟹ dS = 12.65(0.0325) = 0.41;

thus, S4 = 12.65 + 0.41 = 13.06.
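The four steps can be reproduced in a few lines (our own sketch):

    mu, sigma, dt = 0.35, 0.26, 1 / 252
    S = 12.0
    for dX in (-0.05, 0.15, 0.09, 0.12):       # the draws chosen above
        S += S * (mu * dt + sigma * dX)        # discrete form of Eq (12.1.2)
        print(round(S, 2))                     # 11.86, 12.34, 12.65, 13.06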


12.2. Derive Eq (12.1.2).
Solution. Let the return from a stock over a small interval δt be defined
discretely as

δSt/St = µ δt + σ ε √δt, ε ∼ N(0, 1).   (12.9.1)

To find the continuous version of Eq (12.9.1) we will use Brownian motion,
which is a stochastic process, i.e., a set of random variables {Xt}, t ≥ 0,
such that (i) the mapping t ↦ Xt is a.s. continuous, (ii) the process has
stationary, independent increments, and (iii) the increment X_{t+s} − X_t is
normally distributed with variance s. Let us consider the stochastic process
{X_t^n}, t ≥ 0, n ≥ 1, defined by

X_t^n = (1/√n) Σ_{1≤j≤⌊nt⌋} ε_j,   (12.9.2)

where ε1, ε2, . . . denote a sequence of independent standard normal random
variables, ε_j ∼ N(0, 1). Thus, X^n is a random walk that takes a new step
every 1/n units of time. Note that by the central limit theorem¹

(1/√⌊nt⌋) Σ_{1≤j≤⌊nt⌋} ε_j   (12.9.3)

converges (in distribution) to a standard normal random variable Z. Then,
using (12.9.3), we have

X_t^n = (√⌊nt⌋/√n) · (1/√⌊nt⌋) Σ_{1≤j≤⌊nt⌋} ε_j.

Now, since lim_{n→∞} √⌊nt⌋/√n = √t, we get in the limit X_t = √t Z.
Moreover, for s, t ∈ {0, 1/n, 2/n, . . .} we have

X_{t+s}^n − X_t^n = (1/√n) Σ_{1≤j≤n(t+s)} ε_j − (1/√n) Σ_{1≤j≤nt} ε_j
                  = (1/√n) Σ_{nt+1≤j≤n(t+s)} ε_j.

Since X_{t+s}^n − X_t^n → N(0, s), we get X_{t+s} − X_t = √s Z. Thus, dXt
behaves like √dt. Hence, we obtain the continuous-time analog of Eq (12.9.1)
as

dSt/St = µ dt + σ dX,   (12.9.4)
1 This theorem states that, given certain conditions, the arithmetic mean of a sufficiently
large number of iterates of independent random variables, each with a well-defined finite
expected value and finite variance, will be approximately normally distributed, regardless
of underlying distribution.

which is Eq (12.1.2).
12.3. Derive the Black-Scholes equation for a contingent claim f (St , t)
with a risk-free self-financing portfolio Π.
Solution. A self-financing portfolio can be constructed using Itô's lemma.
Thus, if dSt = St µ dt + St σ dX, and f : (St, t) ↦ R, we find that if dx =
(dSt, dt)′, then

df = (∇f, dx) = (∂f/∂S) dS + (∂f/∂t) dt
   = (St µ ∂f/∂S + ∂f/∂t) dt + St σ (∂f/∂S) dX.

Since dX behaves like √dt, we may take (dX)² = dt. Then (1/2)(∂²f/∂S²)(dS)² =
(1/2)(∂²f/∂S²) St² σ² dt up to first order. Hence, by Itô's lemma,

df = (St µ ∂f/∂S + ∂f/∂t + (1/2)σ² St² ∂²f/∂S²) dt + St σ (∂f/∂S) dX,   (12.9.5)

where dX is as defined in (12.9.4). Notice that the only randomness in df is
the dX term. Thus, we can construct a portfolio that eliminates the random
part, and the rest we can easily control.

First, we will rely on the discrete versions of (12.9.4) and (12.9.5). Since we
want to price a contingent claim, or derivative, a simple method is to set

Π = { −1 unit of the derivative, ∆ shares of the stock },

where ∆ ≡ ∂f/∂S. For a small change δt in time, the corresponding change in
Π is δΠ = −δf + ∆ δS. The discrete versions of (12.9.4) and (12.9.5) give

δΠ = (−∂f/∂t − (1/2)(∂²f/∂S²) σ² St²) δt,   (12.9.6)

which implies that the portfolio is risk-less (no uncertainty), and then by the
arbitrage argument we must have δΠ = rΠ δt, or

(−∂f/∂t − (1/2)(∂²f/∂S²) σ² St²) δt = r(−f + ∆S) δt,

which yields

(∂f/∂t + (1/2)(∂²f/∂S²) σ² St² + r∆S) δt = rf δt,

or

∂f/∂t + (1/2)σ² St² ∂²f/∂S² + r∆S = rf,

which yields the Black-Scholes-Merton partial differential equation

∂f/∂t + (1/2)σ² St² ∂²f/∂S² + rS ∂f/∂S − rf = 0,   (12.9.7)

with known Cauchy data f(St, T). Thus, any function f that satisfies Eq
(12.9.7) denotes the price of some theoretical contingent claim, and every
contingent claim must satisfy Eq (12.9.7).
12.4. Prove that S̃t, defined in Eq (12.8.5), is a martingale under Q.
Solution. Consider the expectation under Q for s < t:

E^Q[S̃t | F_s] = S̃0 exp{−(1/2)σ²t} E^Q[exp{σW_t^Q} | F_s]
             = S̃0 exp{−(1/2)σ²t + σW_s^Q} E^Q[exp{σ(W_t^Q − W_s^Q)} | F_s].

Note that at time s the quantity W_t^Q − W_s^Q is distributed as N(0, t − s),
which is the same as the distribution of W_{t−s}^Q at time zero. Hence, we have

E^Q[S̃t | F_s] = S̃0 exp{−(1/2)σ²t + σW_s^Q} E^Q[exp{σW_{t−s}^Q} | F_0].

Now, the moment generating function (m.g.f.) of a random variable X with
normal distribution N(µ, σ²) is

E[e^{φX}] = exp{µφ + (1/2)σ²φ²}.

Since W_{t−s}^Q under Q is Q-Brownian motion distributed as N(0, t − s),
the m.g.f. of W_{t−s}^Q gives E^Q[exp{σW_{t−s}^Q}] = exp{(1/2)σ²(t − s)},
with σ playing the role of φ. Thus, we have

E^Q[S̃t | F_s] = S̃0 exp{−(1/2)σ²t + σW_s^Q} exp{(1/2)σ²(t − s)}
             = S̃0 exp{−(1/2)σ²s + σW_s^Q} = S̃_s,

which shows that S̃t is a Q-martingale.

Note that (i) pricing a European call option under Black-Scholes uses the
fact that under Q, at time t, the terminal stock price at expiry, ST, follows
the log-normal distribution with mean St e^{rτ} and variance St² e^{2rτ}(e^{σ²τ} − 1)
when the interest rate rt is the constant value r; and (ii) under the original
measure, the process for S̃t is dS̃t = (µ − r)S̃t dt + σS̃t dWt, which is clearly
not a martingale.
12.5. Show that the process Y defined by Yt = t²Wt³, t ≥ 0, satisfies the SDE

dYt = (2Yt/t + 3(t⁴Yt)^{1/3}) dt + 3(tYt)^{2/3} dWt, Y0 = 0.

Solution. Obviously, Y0 = 0. The function f(t, x) = t²x³ is in C², and so
by Itô's lemma

dYt = (∂f/∂t)(t, Wt) dt + (∂f/∂x)(t, Wt) dWt + (1/2)(∂²f/∂x²)(t, Wt) d⟨W, W⟩t
    = 2tWt³ dt + 3t²Wt² dWt + (1/2)·6t²Wt dt
    = (2tWt³ + 3t²Wt) dt + 3t²Wt² dWt.

The result follows since 2tWt³ = 2Yt/t, 3t²Wt = 3(t⁴Yt)^{1/3}, and 3t²Wt² =
3(tYt)^{2/3}.
12.6. Show that the process X given by Xt = e^{Wt + t/2} + e^{Wt − t/2}, t ≥ 0,
satisfies the SDE

dXt = Xt dWt + e^{Wt + t/2} dt, X0 = 2.

Solution. Since Xt = e^{Wt − t/2}(e^t + 1), t ≥ 0, set Zt = e^{Wt − t/2} and
Yt = e^t + 1, t ≥ 0. Then Z = E(W), the stochastic exponential of W, satisfies
the SDE dZt = Zt dWt, and Y satisfies dYt = e^t dt. Since Z is a continuous
semi-martingale and Y is a continuous process of bounded variation, we have
⟨Z, Y⟩ ≡ [Z, Y] ≡ 0 (P-a.s.). Hence, by the product rule we get

dXt = Yt dZt + Zt dYt + d⟨Z, Y⟩t
    = Yt Zt dWt + Zt e^t dt
    = Xt dWt + e^{Wt + t/2} dt.

Also, X0 = e⁰ + e⁰ = 2.
12.7. Derive the Black-Scholes equation (12.1.1).
Solution. Given a sufficiently differentiable function f on an interval I, its
Taylor's series at a point x = a ∈ I is

f(x) = Σ_{n=0}^∞ (f^{(n)}(a)/n!) (x − a)^n.   (12.9.8)

If this series converges to f on I, then

f(x) − f(a) = Σ_{n=1}^∞ (f^{(n)}(a)/n!) (x − a)^n.   (12.9.9)

If we replace x by x + ∆x and a by x, then (12.9.9) gives

∆f = f(x + ∆x) − f(x) = Σ_{n=1}^∞ (f^{(n)}(x)/n!) (∆x)^n
   = f′(x) ∆x + (f″(x)/2!) (∆x)² + · · · .   (12.9.10)
Now let dt → 0 in Eq (12.1.2); then, with probability 1, (dX)² → dt as
dt → 0. Suppose that f(S) is a function of the asset price S. If we change
S by a small amount dS, then by (12.9.10) we have

df = (df/dS) dS + (1/2)(d²f/dS²)(dS)² + · · · .   (12.9.11)

By (12.1.2),

dS = S(µ dt + σ dX),
(dS)² = S²(µ²(dt)² + 2µσ dt dX + σ²(dX)²).   (12.9.12)

Since, with probability 1, (dX)² → dt as dt → 0, the term S²σ²(dX)²
dominates the expression for (dS)² in (12.9.12) as dt becomes small. Hence,
we retain only this term and use S²σ² dt as an approximation in (12.9.11)
for (dS)² as dt → 0, thus giving Itô's lemma:

df = (df/dS) dS + (1/2)(d²f/dS²)(S²σ² dt)
   = (df/dS)(Sµ dt + Sσ dX) + (1/2)(d²f/dS²) S²σ² dt
   = σS (df/dS) dX + (µS (df/dS) + (1/2)σ²S² (d²f/dS²)) dt.   (12.9.13)
Itô's lemma relates a small change in a function of a random variable to a
small change in the variable itself; it contains a deterministic component dt
and a random component dX. We will, however, need the following
multivariate version of Itô's lemma: if f is a function of the two variables
S, t, then

df = σS (∂f/∂S) dX + (µS ∂f/∂S + (1/2)σ²S² ∂²f/∂S² + ∂f/∂t) dt.   (12.9.14)

We will now derive the Black-Scholes equation (12.1.1). Let V(S, t), which
is called C(S, t) for a call and P(S, t) for a put, denote the value of an option,
and let r be the interest rate. Using Itô's lemma (12.9.14) we have

dV = σS (∂V/∂S) dX + (µS ∂V/∂S + (1/2)σ²S² ∂²V/∂S² + ∂V/∂t) dt.   (12.9.15)

Consider a portfolio containing one option and −∆ units of the underlying
stock. Its value is Π = V − ∆S, which gives dΠ = dV − ∆ dS, and

dΠ = σS (∂V/∂S) dX + (µS ∂V/∂S + (1/2)σ²S² ∂²V/∂S² + ∂V/∂t) dt
     − ∆Sµ dt − ∆Sσ dX
   = σS (∂V/∂S − ∆) dX + (µS ∂V/∂S + (1/2)σ²S² ∂²V/∂S² + ∂V/∂t − µ∆S) dt
   = ((1/2)σ²S² ∂²V/∂S² + ∂V/∂t) dt, choosing ∆ = ∂V/∂S.   (12.9.16)

Now, if Π were invested in risk-less assets, it would grow by rΠ dt during the
interval dt. Then, using (12.9.16), a fair price requires

rΠ dt = ((1/2)σ²S² ∂²V/∂S² + ∂V/∂t) dt
   ⟹ r(V − S ∂V/∂S) = (1/2)σ²S² ∂²V/∂S² + ∂V/∂t,

which yields the Black-Scholes equation (12.1.1).


A
Probability Topics

A.1 Definition of Probability Measure


(i) 0 ≤ P(A) ≤ 1 for each event A;
(ii) P(Ω) = 1;
(iii) A ∩ B = ∅ ⟹ P(A ∪ B) = P(A) + P(B); and
(iv) Ai ∩ Aj = ∅ for i ≠ j ⟹ P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).

Property (iv) is equivalent to: Ai ⊃ Ai+1, ∩_{i=1}^∞ Ai = ∅ ⟹ lim_{n→∞} P(An) = 0.

A.2 Probability Laws (Note: ¬A is used for not-A)


(i) P (¬A) = 1 − P (A);
(ii) P (A ∪ B) = P (A) + P (B) − P (A ∩ B);
(iii) P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C)
−P (B ∩ C) + P (A ∩ B ∩ C);
(iv) P (B\A) = P (B) − P (A ∩ B);
(v) P (A△B) = P (A) + P (B) − 2P (A ∩ B);
(vi) P(∪_{i=1}^n Ai) ≤ Σ_{i=1}^n P(Ai) (Boole's inequality);
(vii) P(∩_{i=1}^n Ai) ≥ 1 − Σ_{i=1}^n (1 − P(Ai)) (Bonferroni's inequality).

A.3 Conditional Probability and Independent Events


(i) P(A | B) = P(A ∩ B)/P(B), P(B | A) = P(A ∩ B)/P(A);
(ii) P(A ∩ B) = P(A) · P(B | A);
(iii) P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B);
(iv) P(A1 ∩ · · · ∩ An) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) · · ·
     P(An | A1 ∩ · · · ∩ An−1) (see Figure A.1 for n = 4);

(v) P(Ai | B) = P(Ai) P(B | Ai) / Σ_{j=1}^n P(Aj) P(B | Aj) (Bayes' formula);
(vi) P(B) = Σ_{i=1}^n P(Ai) P(B | Ai) (total probability formula).
i=1

Figure A.1 Conditional probability and independent events.


For independent events the following hold:
(i) P (A ∩ B) = P (A)P (B).
(ii) P (B | A) = P (B), P (A | B) = P (A).
(iii) P (A ∩ B ∩ C) = P (A)P (B)P (C).
(iv) P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 ) · · · P (An ).
(v) If A and B are independent, then (¬A, B), (A, ¬B) and (¬A, ¬B) are
pairs of independent events.

A.4 Random (Stochastic) Variables


Discrete random variable X and continuous random variable X are pre-
sented in Figure A.2(a) and (b). Let f (X) denote their probability function.

Figure A.2 (a) Discrete, and (b) continuous random variable X .

Then, using the following notation: P (X ∈ A) for probability; E[x] or µ for


expectation; F (x) for distribution function; Var[X] or σ 2 for variance; H(x)
for entropy; DRV for discrete random variable; CRV for continuous random

variable, we have the following:

For a DRV X:
P(X ∈ A) = Σ_{x∈A} f(x);
E[X] ≡ µ = Σ_{x∈Ω} x f(x);
F(x) = P(X ≤ x) = Σ_{t≤x} f(t);
Var[X] ≡ σ² = E[(X − µ)²] = Σ_{x∈Ω} (x − µ)² f(x);
H(X) = E[ln(1/f(X))] = −Σ_{x∈Ω} f(x) ln f(x).

For a CRV X:
P(X ∈ A) = ∫_A f(x) dx;
E[X] ≡ µ = ∫_Ω x f(x) dx;
F(x) = P(X ≤ x) = ∫_{−∞}^x f(t) dt;
Var[X] ≡ σ² = E[(X − µ)²] = ∫_Ω (x − µ)² f(x) dx;
H(X) = E[ln(1/f(X))] = −∫_Ω f(x) ln f(x) dx.

Note that if y = ln x, then y′ = 1/x, x > 0. In general, if y = log_a x, then
y′ = 1/(x ln a), a > 1.

Expectations: E[aX] = aE[X], E[X + Y] = E[X] + E[Y].
For a discrete random variable: E[g(X)] = Σ_{x∈Ω} g(x) f_X(x).
For a continuous random variable: E[g(X)] = ∫_Ω g(x) f_X(x) dx.

Variances : (i) Var[aX] = a2 Var[X]; (ii) Var[X] = E[X 2 ] − (E[X])2 (Steiner’s


theorem); and (iii) Var[X + Y ] = Var[X] + Var[Y ] + 2Cov[X, Y ].

Chebyshev inequality : (i) P (|X| ≥ a) ≤ E[X 2 ]/a2 ; (ii) P (|X − µ| ≥ a) ≤


Var[X]/a2 ; and (iii) P (|X − µ| ≥ kσ) ≤ 1/k 2 .

A.5 Bivariate (Two-dimensional) Random Variable

The discrete and continuous variable (X, Y) is defined below and presented
in Figure A.3.

Figure A.3 (a) Discrete, and (b) continuous random variables X, Y .

For a discrete variable (X, Y):
P((X, Y) ∈ A) = ΣΣ_{(x,y)∈A} f(x, y);
F(x, y) = P(X ≤ x, Y ≤ y) = Σ_{u≤x} Σ_{v≤y} f(u, v);
f_X(x) = Σ_y f(x, y); f_Y(y) = Σ_x f(x, y);
E[g(X, Y)] = Σ_x Σ_y g(x, y) f(x, y).

For a continuous variable (X, Y):
P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy;
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) du dv;
f_X(x) = ∫_{−∞}^∞ f(x, y) dy; f_Y(y) = ∫_{−∞}^∞ f(x, y) dx;
E[g(X, Y)] = ∫∫ g(x, y) f(x, y) dx dy.

Here f_X(x) and f_Y(y) denote the marginal distributions, and E[g(X, Y)]
the expectation.
Schwarz inequality: (E[XY ])2 ≤ E[X 2 ]E[Y 2 ].
For independent random variables X and Y ,
f (x, y) = fX (x)fY (y).
F (x, y) = FX (x)FY (y).
E[XY ] = E[X] + E[Y ].
Var[X + Y ] = Var[X] + Var[Y ].
For independent
R ∞random variables X and Y ,
fX+Y (x) = −∞ fX (t)fY (x − t) dt.

For independent random variables X and Y with X ≥ 0, Y ≥ 0,
f_{X+Y}(x) = ∫_0^x f_X(t) f_Y(x − t) dt.
Covariance :
Cov[X, Y ] = E[(X − µ1 )(Y − µ2 )], E[X] = µ1 , E[Y ] = µ2 .
Cov[X, X] = Var[X].
Cov[X, Y ] = E[XY ] − E[X]E[Y ] = Cov[Y, X].
X and Y independent =⇒ Cov[X, Y ] = 0.

Var[X + Y ] = Var[X] + Var[Y ] + 2Cov[X, Y ].

Correlation coefficient ρ:
ρ = Cov[X, Y] / √(Var[X] Var[Y]), −1 ≤ ρ ≤ 1;
X and Y independent ⟹ ρ = 0;
Var[X + Y] = Var[X] + Var[Y] + 2ρ √(Var[X] Var[Y]).

A.6 Moments
The kth central moment µ_k is defined as µ_k = E[(X − µ)^k].
The skewness γ1 and kurtosis γ2 are defined as

γ1 = µ3/σ³, γ2 = (µ4/σ⁴) − 3.

For the N(µ, σ) normal distribution,

µ_{2k+1} = 0, µ_{2k} = σ^{2k}(2k − 1)!!, γ1 = γ2 = 0.

A.7 Convergence
Convergence in probability:
plim_{n→∞} Xn = X ⟺ lim_{n→∞} P(|Xn − X| > ε) = 0 for each ε > 0.
Convergence almost surely:
lim_{n→∞} Xn = X a.s. ⟺ P(lim_{n→∞} Xn = X) = 1.
Convergence in distribution:
Xn ↦ X ⟺ lim_{n→∞} P(Xn ≤ x) = P(X ≤ x) for each x
such that P(X ≤ x) is continuous in x.
Convergence in mean:
l.i.m._{n→∞} Xn = X ⟺ lim_{n→∞} E[|Xn − X|²] = 0.

For these kinds of convergence we have: convergence almost surely implies
convergence in probability, which in turn implies convergence in distribution;
likewise, l.i.m. convergence implies convergence in probability. Here plim is
the limit in probability, and l.i.m. is the limit in mean.

A.8 Central Limit Theorem


The central limit theorem (CLT) used in probability theory establishes that
in most cases when the independent variables are added, their sum tends
toward a normal distribution (known as the bell curve) even if the original
310 A PROBABILITY TOPICS

variables themselves are not normally distributed. In other words, under cer-
tain conditions the arithmetic mean of a sufficiently large number of iterates
of independent random variables, each with a finite expected value and finite
variance, will be approximately normally distributed, regardless of the dis-
tribution used. A simple example is that if one flips a coin many times, the
probability of getting a given number of heads in a series of flips must follow
a normal curve, with mean equal to half of the total number of flips in each
series. More details are available in Rice [1995].
Let {X1 , . . . , Xn } denote a sequence of independent and identically dis-
tributed random variables drawn from distributions of expected values by µ
and finite variance by σ 2 . Consider the sample average

X1 + · · · + Xn
Sn =
n
of such random variables. Then by the law of large numbers, this sample
average converges in probability and almost surely to the expected value µ
as n → ∞. The classical central limit theorem determines the size and the
distributional form of the deterministic number µ during the convergence.
This
√ theorem states that as n gets larger, the distribution of the difference
n (Sn −µ) approximates the normal distribution with mean zero and variance
σ 2 . For very large n the distribution of Sn becomes close to the normal
distribution with mean µ and variance σ 2 /n.
Theorem A.1. (Lindeberg-Lévy CLT) Suppose {X1 , . . . , Xn } is a se-
quence of independent and identically distributed random variables √
with E[Xi ] =
µ and Var[Xi ] = σ 2 < ∞. Then as n → ∞, the random variables n (Sn− µ)
converge in distribution to a normal N (0, σ 2 ), i,e.,
n
√ h 1 X  i
n Xi − µ −→ N (0, σ 2 ).
n i=1

For proof, see Billingsley [1995:357].


B
Differentiation of Operators

Let X and Y denote normed linear spaces over a field F which may be R,
or Rn . The mappings defined by f : Rn 7→ R may, in general, not be linear.
Let L(X, Y ) denote the class of all linear operators from X into Y , and let
B(X, Y ) denote the class of all bounded linear operators.

B.1 Gateaux Differential


Let x0 ∈ X be a fixed point and let f : X 7→ Y . If there exists a function
δf (x0 , ·) : X 7→ Y such that

f (x0 + th) − f (x0 )


lim − δf (x0 , h) = 0, (B.1)
t→0 t
where t ∈ F for all h ∈ X, then f is said to be Gateaux differentiable at
x0 , and δf (f0 , h) is called the Gateaux differential of f at x0 with increment
h. The Gateaux differential is also called the weak differential of f or the
G-differential of f .

B.2 Fréchet Derivative


Let x0 ∈ X be a fixed point, and let f : X 7→ Y . If there exists a bounded
linear operator F (x0 ) ∈ B(X, Y ) such that

lim f (x0 + h) − f (x0 ) − F (x0 )h = 0, (B.2)


khk→0

where h ∈ X, then f is said to be Fréchet differentiable at x0 , and F (x0 ) is


called the Fréchet derivative of f at x0 .
The following theorem establishes that Fréchet differentiability implies Gateaux
differentiability.
Theorem B.1. Let f : X 7→ Y , and let x0 ∈ X be a fixed point. If f is
Fréchet differentiable at x0 , then f is Gateaux differentiable, and the Gateaux
differential is given by

δf (x0 , h) = f ′ (x0 )h for all h ∈ X. (B.3)


312 B DIFFERENTIATION OF OPERATORS

Proof. The proof follows from the definition (B.2). Suppose that F (x0 ) =
f ′ (x−)), let ε > 0, and let h ∈ X. Then there exists a δ > 0 such that
1
f (x0 + h) − f (x0 ) − F (x0 )h < ε khk,
kthk
provided that kthk < δ if th 6= 0. But this implies that
f (x0 + th) − f (x0 )
− F (x0 )h < ε, (B.4)
t
provided that |t| < δkhk. 

Example B.1. Let X = Rn and let f : Rn 7→ R. Let x = (x1 , . . . , xn ) ∈


R , and h = (h1 , . . . , hn ) ∈ Rn . If f has continuous partial derivatives with
n

respect to xi , i = 1, . . . , n, then the Fréchet differential of f is given by


∂f (x) ∂f (x)
δf (x, h) = h1 + · · · + hn .
∂x1 ∂xn
For a fixed x0 ∈ Rn , define the bounded linear functional f (x0 ) on Rn by
n
X ∂f (x)
f (x0 ) h = hi for h ∈ Rn .
i=1
∂xi x=x0

This means that f (x0 ) is the Fréchet derivative of f at x0 , and we denote it


by Df (x0 ):
h ∂f (x ) ∂f (x0 ) i
0
Df (x0 ) = ... . (B.5)
∂x1 ∂xn
In R the derivative Df (x) becomes f ′ (x).

Example B.2. Consider the line 3x + 2y = 6 in R with slope −3/2. In


R2 , the function f (, y) = 3x + 2y − 6 has the gradient ∇f = 3i + 2j. Then
∂f /∂x
f ′ (x) = − = −3/2 is the slope of the plane of f . The gradient ∇f is
∂f /∂y
the vector starting at the origin, passing through the pane, and ending at the
point (3, 2) (see Figure B.1). 

Figure B.1 Slope and gradient.


B.2 FRÉCHET DERIVATIVE 313

The fact still remains that the gradient is defined as ∇f (x) = Df (x) [e],
i.e.,

∂f (x) ∂f (x)
∇f (x) = e1 + · · · + en
∂x1 ∂xn
 
  e1  
∂f (x) ∂f (x)  .  ∂f (x) ∂f (x)
= ··· .
. = ··· [e]T ,
∂x1 ∂xn ∂x1 ∂xn (B.6)
en

and the vector [e]T cannot just be removed from this definition unless ∇f
occurs in an equation that satisfies Condition A of Theorem 2.18.
According to Smith [1985:71], using the scalar product of ∇f (x) and the
increment h, the gradient is defined implicitly by

f (x + t h) − f (x)
∇f (x), h = lim as h → 0. (B.7)
t

If this limit exists for all directions h in an inner product space, then ∇f (x)
is simply described as the gradient of f at x. In the case of Fréchet differen-
tiation, the gradient, if it exists, can be expressed as

f (x + t h) = f (x) + ∇f, h + o(khk), (B.8)

provided that khk → 0 in a subspace containing f (x) and f (x)+h. Practically,


the gradient is usually defined by

d
∇f (x), h) = f (x + h) h=0
. (B.8)
dx

Example B.3. Smith [1985:78]) Let f : Rn 7→ R have continuous second-


order partial derivatives. The gradient of f is given by
 
∂f ∂f
∇f = ... (B.9)
∂x1 ∂xn

with respect to the usual inner product. The second Gateaux variation is
Z 1
d2
d2 f (x, h) = f (x1 + th1 + · · · + xn + thn ) dt = H, h , (B.10)
dt2 0
t=0

∂ 2 f (x1 , . . . , xn )
where h = (h1 , . . . , hn ), and H = is the Hessian of f . Then
∂xi ∂xj
Hh, h = hT Hh. 
314 B DIFFERENTIATION OF OPERATORS

It is obvious from (B.8) and (B.9) that gradient ∇f is expressed as either


equal or equivalent to the first partial derivatives of f , using words like ‘prac-
tically’ or ‘usually.’ But the question still remains about the validity of such
a conclusion. The second variation (B.10) is, however, correct.

B.3 Taylor’s Series Approximations


Taylor’s series approximations are presented in summation form. In R, the
Taylor’s series approximation about x = a is

f ′′ (a) f (n−1) (a)


f (x) = f (a) + f ′ (a)(x − a) + (x − a)2 + · · · + (x − a)(n−1) + Rn ,
2! (n − 1)!

f n (ξ) (x − a)n
where Rn = is the remainder after n terms, and a < ξ < x. If
n!
Rn = 0, the infinite series is called the Taylor series about x = a.
In R2 , let x = (x, y) and a = (a, b). Then the second-order approximation
of Taylor’s series expansion is

f (x, y) ≈ f (a, b) + fx (a, b)(x − a) + fy (a, b)(y − b)


1h i
+ fxx (a, b)(x − a)2 + 2fxy (a, b)(x − a)(y − b) + fyy (a, b)(y − b)2 ,
2!
or
2 2
X 1 hX i
f (x) ≈ f (a) + fi (a)(xi − ai ) + fij (a)(xi − ai )(xj − aj ) ,
i=1
2! i=1

∂f ∂  ∂f 
where fi = and fij = for i, j = 1, 2.
∂xi ∂xi ∂xj
In Rn , where x = (x1 , . . . , xn ) and a = (a1 , . . . , an ), the second-order
approximation of Taylor’s series expansion is
n n
X 1h X i
f (x) ≈ f (a) + fi (a)(xi − ai ) + fij (a)(xi − ai )(xj − aj ) , (B.11)
i=1
2! i,j=1

∂f ∂  ∂f  h ∂  ∂f  i
where fi = and fij = = , if f is continuous for
∂xi ∂xi ∂xj ∂xj ∂xi
i, j = 1, 2, . . . , n. Note that the first two terms in (B.11) give the first-order
approximation of the Taylor’s series expansion at x = a in Rn .
C
Distributions

C.1 Definitions
We will follow the convention of denoting a random variable by an upper case
letter, e.g., X, and using the corresponding lower case letter, e.g., x, for a
particular value of that variable.
A real-valued function F (x) is called a (univariate) cumulative distribution
function (c.d.f.), or simply a distribution function, or distribution, if (i) F (x)
is nondecreasing, i.e., F (x1 ) ≤ F (x2 ) for x1 ≤ x2 ; (ii) F (x) is everywhere
continuous from the right, i.e., F (x) = lim F (x + ε); and (iii) F (−∞) =
ε→0+
0, F ′ (∞) = 1.
The function F (x) describes probability of the event: X ≤ x, i.e., the
probability p{X ≤ x} = f (x), which describes the c.d.f. of X. There are two
principal types of distributions: discrete and continuous.
Discrete Distributions. They are characterized by the random variable
X taking on an enumerable number of values . . . , x−1 , x0 , x1 , . . . with point
probabilities pn = P {X = xn } ≥ 0 which is subject only to the restriction
P
pn = 1. In this case the distribution is written as
n

X
F (x) = P {X ≤ x} = pn , (C.1)
xn ≤x

where the summation is taken over all values of x for which xn ≤ x. The set
{xn } of values for which pn > 0 is called the domain of the random variable
X.
A discrete distribution of a random variable is called a lattice distribution
if there exist numbers a and b 6= 0 such that every possible value of X can be
represented in the form a + nb, where n takes only integer values.
Continuous Distributions They are characterized by F (x) being abso-
lutely continuous. Thus, F (x) possesses a derivative F ′ (x) = f (x), and the
316 C DISTRIBUTIONS

c.d.f. can be written as


Z x
F (x) = P {X ≤ x} = f (t) dt. (C.2)
−∞

The derivative f (x) is called the probability density function (p.d.f.) or fre-
quency function, and the values of x for which f (x) > 0 make up the domain
of the random variable X.

C.2 Some Common Distributions


1. Uniform Distribution: It is defined on the interval [0, 1]; density (p.d.f.)
R x (t)
f (x) = 1; c.d.f. F (x) = x; difference d(x) = 0 FF̄(x) d(t) = x/2; failure rate
R 1 (t)
r(x) = f (x)/F̄ (x) = 1/(1 − x); mean residual lifetime mrl(x) = x F̄F̄ (x) dt =
(1−x)/2. Thus, the density function is (weakly) log-concave; d(x) is monotone
increasing: r(x) is monotone increasing; and mrl(x) is monotone decreasing.
These properties follow from the log-concavity of f .
1 1 2
2. Normal Distribution. The p.d.f. is f (x) = √ e− 2 ((x−a)/n) , −∞ <
σ 2π
x < ∞, where −∞ < m < ∞, 0 < σ < m, and m the mean and σ 2 the
variance. Since the normal c.d.f. does not have a closed form, it is very difficult
to determine directly whether the c.d.f. is log-concave, or to determine where
the functions d(x), r(x), and mrl(x) are monotone. The standard normal
1 2
probability distribution has probability density f (x) = √ e−x /2 . Then

(ln f (x))′ = −x and (ln f (x))′′ = −1 < 0. Thus, the normal distribution
has log-concave density. The normed normal distribution has f (x) = φ(x) =
1 2
√ e−x /2 , x ∈ (0, 1), with expectation µ = 0, and variance σ 2 = 1.

3. Log-normal Distribution. If Y is a log normal distribution, then
Y = eX , where X is a normal distribution. Since the normal distribution
has a log-concave c.d.f., by Theorem 8.2.8 the log normal distribution has a
concave c.d.f. By Theorem 8.2.7, the difference function d(x) is increasing
for a log normally distributed variable. The log normal density function is
1 2
f (x) = √ e−(ln x) /2 , which, unlike the normal distribution, is not log-
x 2π
ln x
concave; however, since (ln f (x))′′ = 2 , so f (x) is log-concave on the in-
x
terval (0, 1) and log-convex on the interval (1, ∞). The failure rate is neither
monotone increasing nor monotone decreasing (Patel, et al. [1976]). Further,
the function mrl(x) is not monotone, but increasing for small values and de-
creasing for large values of x (Muth [1975]). These last two statements have
been verified only by numerical computation and computer graphics.
4. Mirror-image of Log-normal Distribution. It has a monotone de-
creasing d(x) for some values of x and increasing for others. The support
C.2 SOME COMMON DISTRIBUTIONS 317

is the set of all negative real numbers. Unlike the case of the mirror-image
Pareto distribution, we cannot calculate closed form expressions for the c.d.f.
or the d(x) function for this distribution. But it is known that both these
functions are non-monotone because the failure rate and the mean residual
lifetime function are non-monotone.
1 e−x
5. Logistic Distribution. c.d.f. F (x) = −x
; density f (x) = .
1+e (1 + e−x )2
Also, we have (ln f (x))′ = −1 + 2(1 − F (x)), and (ln f (x))′′ = −2f (x) < 0;
hence, this distribution has log-concave density.
6. Extreme-value Distribution. Density function is f (x) = exp{−e−x},
giving (ln f (x))′′ = −e−x < 0; hence this distribution had log-concave density.
This distribution arises as the limit as n → ∞ of the greatest value among n
independent random variables.
7. chi-Square Distribution with n Degrees of Freedom. It is a gamma
distribution with θ = 2 and m = n/2. Since the sum of the squares of n
independent standard normal random variables has a chi-square distribution
with n degrees of freedom, and since the gamma distribution has a log-concave
density function for m ≥ 1, the sum of the squares of two or more independent
standard normal variables has a log-concave density function.
8. chi Distribution. Its support is {x : x > 0}; density function

x(n/2)−1 e−n/2 x2
f (x) = ,
2n/2 Γ(n/2)

where n is a positive integer. Ifp a random variable X has a chi-distribution


with n degrees of freedom, then X/n has a chi-distribution with parameter
n. The sample standard deviation from the sum of n independent standard
n−1
normal variables has a chi-distribution. Since (ln f (x))′′ = − 2 − n < 0,
x
the chi-distribution has a log-concave density function. The chi-distribution
with n = 2 is known as the Rayleigh distribution, and with n = 3, as the
Maxwell distribution.
9. Exponential Distribution. Its support is [0, ∞]; density function f (x) =
λe−λx ; c.d.f. F (x) = 1 − e−λx . The density function is log-linear, and hence
log-concave, with (ln f (x))′′ = 0, and f ′ (x) < 0, and F ′ (x) > 0 for all x.
Thus, f (x)/F (x) is monotone decreasing, and hence f is strictly log-concave.
From Theorem 8.2.6, we conclude that d(x) is a monotone increasing function.
Barlow and Proschan [1981] have noted that this is the only distribution for
which the failure rate r(x) and the mean residual lifetime mrl(x) are constant:
Rh
r(x) = f (x)/F̄ (x) = λ and mrl(x) = x F̄ (t) dt = 1/λ. If the lifetime of an
object has an exponential distribution, then it does not ‘wear out’, i.e., the
probability of failure and the expected remaining lifetime remain constant so
long as the object ‘survives.’
318 C DISTRIBUTIONS

λ
10. Laplace Distribution. It has density function f (x) = e−λ|x| , where
2
λ > 0; c.d.f. is
 1 λx
e if x < 0,
f (x) = 2 1 −λx
1− 2e if x ≥ 0.
The density function is sometimes known as the double exponential, since it is
proportional to the exponential density for positive x and to the mirror-image
of the exponential distribution for negative x. Also, ln f (x) = λ|x|, which is
clearly a concave function, although its derivative (ln f (x))′ does not exist at
x = 0.
11. Weibull Distribution with Parameter c > 0. The density func-
c
tion is f (x) = cxc−1 e−x , x ∈ (0, ∞). Also, (ln f (x))′′ = (1 − c)x−2 (1 +
 < 0 for c > 1,

cxc ) = 0 for c = 1, Thus, the density function is (strictly) log-concave if


> 0 for c < 1.
0 < c < 1, log-linear if c = 1, and log-convex if c > 1. Further, the reliability
c
function F̄ (x) = 1 − F (x) = e−x , giving (ln F̄ (x))′′ = −c(c − 1)xc−2 which
is positive for c < 1 and nonpositive for c ≥ 1. Thus, the reliability function
is log-concave for c ≥ 1 and log-convex for c < 1. For this distribution with
0 < c < 1 the failure rate is a decreasing function of age.
12. Power Function Distribution. Its c.d.f. is F (x) = xβ , with support
(0, 1); density function f (x) = βxβ−1 , giving (ln f (x))′′ = (1 − β)x−2 , so that
the density function is log-concave if β ≥ 1 and log-convex if 0 < β < 1.
This distribution has a log-concave c.d.f. for all positive β, since (ln F (x))′′ =
R x F (t) x
−βx−2 < 0. The difference function is d(x) = l dt = ; thus, d(x)
F (x) β+1
is monotone increasing for all β ≥ 0, because log-concavity of F (x) implies
that d(x) is monotone increasing. Moreover, the reliability function F̄ (x) =
βxβ−2 (1 − β − xβ )
1 − xβ , giving (ln F̄ (x))′′ = , which has the same sign as
(1 − xβ )2
1 − β − xβ ; thus, this expression is positive for x near zero and negative for x
near 1. Hence, the reliability function is neither log-concave nor log-convex on
β − xβ + 1
(0, 1). The right-side integral of the reliability function R(x) = ,
1+β
which is neither log-concave nor log-convex.
xm−1 θm e−xθ
13. Gamma Distribution. Its density function is f (x) = ,x∈
Γ(m)
1−m
(0, ∞), θ > 0 and m > 0. Then (ln f (x))′′ = . Thus, the density func-
x2
tion is strictly log-concave for m > 1, but strictly log-convex for m < 1, and
in this case f ′ (x) < 0 for all x > 0. Therefore, the c.d.f. is log-concave, and
by Theorem 8.7, the left-side integral of the c.d.f. is log-concave. Barlow and
Proschan [1981: 75] have shown that for m < 1, the failure rate is a monotone
C.2 SOME COMMON DISTRIBUTIONS 319

decreasing function of age, implying that the reliability function is log-convex.


xa−1 (1 − x)b−1
14. beta Distribution. Its density function is f (x) = ,x ∈
B(a, b)
1−a 1−b
(0, 1), a, b > 0. Since (ln f (x))′′ = + , the density function is log-
x x
concave if a ≥ 1 and b ≥ 1, and log-convex if a < 1 and b < 1. If a < 1 and
b > 1, or if a > 1 and b < 1, then the density function is neither log-convex nor
log-concave on (0, 1). The log-concavity of the c.d.f. and reliability function
for a > 1 and b < 1, or for a < 1 and b > 1 are not known in general, but only
in the following two special cases: (i) a = b = 12 , in which case the distribution
is known as the arcsin distribution. Closed form expressions for the c.d.f. and
the reliability functions are known but they are not simple. Their plots show
that the density function and the reliability function are neither log-concave
nor log-convex. (ii) a = 2, b = 12 , in which case the closed-form expressions
when plotted show that the c.d.f. and its integrals are neither log-concave nor
log-convex, but the reliability function and its integrals are log-convex.
15. Pareto Distribution. It is defined over the nonnegative real numbers,
has c.d.f. F (x) = 1 − xβ , β > 0; density function f (x) = βx−β−1 ; then
β+1
(ln f (x))′′ = > 0, implying that the density function is log-convex
x2
rather than log-concave. But since f ′ (x) < 0 for all x, the c.d.f. F (x) is log-
concave. This distribution is a simple example for which both failure rate and
mean residual lifetime behave ‘unusually.’ The reliability function is F̄ (x) =
x−β , thus (ln F̄ (x))′′ = β/x2 > 0, implying that the reliability function is log-
convex rather than log-concave, and R ∞therefore, the failure rate is a decreasing
function of x. Also, since R(x) = x F (t) dt, the function R(x) converges iff
1 β−1
β > 1, in which case R(x) = x1−β . Then (ln R(x))′′ = > 0, and
β−1 x2
therefore, R(x) is log-concave and mrl(x) is a decreasing function of x.
16. Mirror-image of Pareto Distributions. Its support is (−∞, −1);
−β
R x F (x) = (−x) , β > 0. For β > 1, the
c.d.f.
−1
left integral of the c.d.f. G(x) =
−∞
F (t) dt converges and G(x) = (β −1) (−x) 1−β
; then d(x) = G(x)/G′ (x)
x 1
= G(x)/F (x) = , and d′ (x) = < 0. Thus, the function d(x) is
1−β 1−β
decreasing, and the mean residual lifetime function mrl(x) is increasing.
17. Student’s t-Distribution. It is defined on the entire real line with
density function
(1 + x2 /n)−n+1/2
f (x) = √ ,
n B(1/2, n/2)

where B(a, b) is the incomplete beta function and n is the number of degrees
n − x2
of freedom. Then since (ln f (x))′′ = −(n+1) , the density function is
(n + x2 )2
320 C DISTRIBUTIONS

√ √
log-concave on the central interval [− n, n], and therefore, it is log-concave

√ this interval but log-convex on each of the outer intervals [∞, − n] and
on
[ n, ∞]. Thus, although this √ distribution
√ is itself not log-concave, a trun-
cate one on the interval [− n, n] is log-concave. There does not exist any
proof for the log-concavity or log-convexity of the c.d.f. function, but numer-
ical computations, using the program gauss, show that the c.d.f. is neither
log-concave nor log-convex for the cases n = 1, 2, 3, 4 and 24. Since this distri-
bution is symmetric, the log reliability function is the mirror-image of the log
of the log c.d.f., and hence, the c.d.f. is neither log-concave nor log-convex,
and so is the reliability function.
18. Cauchy Distribution. It is a Student’s t distribution with one degree of
freedom, and is equal to the ratio of two independent standard normal random
1
variables. The density function is f (x) = , and c.d.f. F (x) =
π(1 + x2 )
1 arctan(x) x2 − 1
2 + ; then (ln f (x))′′ = −2 which is negative for |x| < 1
π (x2 + 1))2
and positive for |x| > 1. Thus, Rthe density function is neither log-concave nor
x
log-convex. Since the integral −∞ F (t) dt does not converge, the function G
is not well-defined.
19. F -Distribution. It has support as the set of positive real numbers. It
has two integer-valued parameters m1 and m2 , known as ‘degrees of freedom.’
The density function is

f (x) = cx(m1 /2)−1 (1 + (m1 /m2 )x)−(m1 +m2 )/2 ,

where c is a constant that depends only on m1 and m2 . This distribution


arises in statistical applications as the distribution of the ratio of two inde-
pendent chi-square distributions with m1 and m2 degrees of freedom. Since
(ln f (x))′′ = −(m1 /2 − 1)/x2 + (m1 /m2 )2 (m1 + m2 )/2(1 + m1 /m2 x)−2 , the
density function is log-convex if m1 ≤ 2; but since (ln f (x))′′ is positive or
negative depending on whether x is greater than or less than
r
m1 − 2
m2
m + 1 + m2
r ,
m1 − 2
1−
m + 1 + m2

this function is neither log-concave nor log-convex.


D
Laplace Transforms

The technique of integral transforms, and in particular of Laplace transforms,


is a powerful tool for the solution of linear ordinary or partial differential
equations. A function f (x) may be transformed by the formula
Z b
F (s) = f (x)K(s, x) dx,
a

where F (s), provided it exists, is the integral transform of f (x), s is the


variable of the transform, and K(s, x) is known as the kernel of the transform.
An integral transform is a linear transformation, which, when applied to a
linear initial or boundary value problem, reduces the number of independent
variables by one for each application of the integral transform. Thus, a partial
differential equation can be reduced to an algebraic equation by repeated
application of integral transforms. The algebraic problem is generally easy to
solve for the function F (s), and the solution of the problem is obtained if we
can determine the function f (x) from F (s) by some inversion formula.

D.1 Notation
The Laplace transform is defined as
Z ∞
L{f (t)} ≡ F (s) = f¯(s) = f (t)e−st dt, (D.1)
0

and its inverse is


Z c+i∞
−1 1
L {F (s)} ≡ f (t) = F (s)est ds, (D.2)
2πi c−i∞

where s is the variable of the transform, which in general is a complex variable.


Note that the Laplace transform F (s) exists for s > α, if the function f (t)
is piecewise continuous in every finite closed interval 0 ≤ t ≤ b (b > 0), and
f (t) is of exponential order α, i.e., there exist α, M , and t0 > 0 such that
e−αt |f (t)| < M for t > t0 .
322 D LAPLACE TRANSFORMS

Two basic properties of the Laplace transforms are:


(i) Convolution Theorem:
Z t Z t
−1
L {G(s)F (s)} = f (t − u)g(u)du = f (u)g(t − u)du.
0 0

(ii) If L {f (x, t)} = F (x, s), then


   
∂f (x, t) ∂F (x, s) ∂F (x, s) ∂f (x, t)
L = , and L−1 = .
∂x ∂x ∂x ∂x

The second property is very useful; it is based on the Leibniz rule, which
states that if g(x, t) is an integrable function of t for each value of x, and
∂g(x, t)
the partial derivative exists and is continuous in the region under
∂x Z b
Rb ∂g(x, t)
consideration, and if f (x) = a g(x, t)dt, then f ′ (x) = dt.
a ∂x
A table of some useful Laplace transform pairs is given at the end of this
Appendix (Table D.1, p. 340); larger tables are available in reference books
on mathematical formulas, e.g., in Abramowitz and Stegun [1972].
We will now explain certain techniques to derive Laplace transforms from
known transform pairs.
Example D.1. Consider (see formula 19, Table D.1)
 1
L eat = . (D.3)
s−a
Differentiating both sides with respect to a, we get
 1
L teat = , (D.4)
(s − a)2

and repeating this differentiation n times, we find that


 n!
L tn eat = . (D.5)
(s − a)n+1

Next, replacing a by ib, choosing an appropriate n, and comparing the real


and imaginary parts on both sides, we get the Laplace transforms of functions
tn cos bt and tn sin bt, and then combining with the Laplace transform of eat ,
we obtain the Laplace transforms of functions tn eat cos bt and tn eat sin bt. For
example, if we choose n = 2, then we have
 2!
L t2 eat = .
(s − a)3
D.1 NOTATION 323

Now letting a = ib, we get

 2!
L t2 eibt = ,
(s − ib)3

which yields

 2(s + ib)3 2(s3 + 3is2 b − 3sb2 − ib3 )


L t2 (cos bt + i sin bt) = 2 2 3
= ,
(s + b ) (s2 + b2 )3

and equating the real and imaginary parts of this equality, we obtain

 2(s3 − 3sb2 )
L t2 cos bt = , (D.6)
(s2 + b2 )3

and
 2(3s2 b − b3 )
L t2 sin bt = . (D.7)
(s2 + b2 )3
 
The Laplace transforms of L eat t2 cos bt and Leat t2 sin bt can then be
easily obtained. 
Example D.2. Consider
( √ )
−1 e−a s
a
L = erfc √ , (D.8)
s 2 t

where
Z x Z ∞
2 −u2 2 2
erf(x) = √ e du, and erfc(x) = 1 − erf(x) = √ e−u du.
π 0 π x

We can derive certain Laplace inverses by differentiating and integrating (D.8).


Thus, ( √ )
−1 e−a s 1 2
L √ = √ e−a /4t (D.9)
s πt
is obtained after differentiating (D.8) with respect to a and canceling out the
negative sign on both sides.
Although the usual method of deriving (D.9) is by contour √
integration
−a s
e
(see Exercise D.10), or by using the Laplace inverse of , an interesting
s
method is given in Exercise D.1.
n √ o a 2
Next, we obtain L−1 e−a s = √ e−a /4t which is obtained by dif-
2 πt3
ferentiating (D.9) with respect to a and canceling out the negative sign. 
324 D LAPLACE TRANSFORMS
( √ )
−1 e−as
a
Example D.3. If we integrate the formula L = erfc √ with
s 2 t
respect to a from 0 to a, we get
Z a ( √ ) Z a
−1 e−x s x
L dx = erfc √ dx.
0 s 0 2 t

Then, after changing the order of integration and the Laplace inversion and
carrying out the integration on the left side, we get
Z a ( √ )
−1 e−x s √
L dx = L−1 (s−3/2 − s−3/2 e−a s ), (D.10)
0 s

while the right side yields


Z a  a Z a
x x 1 2
erfc √ dx = x erfc √ +√ x e−x /4t dx
0 2 t 2 t 0 πt 0
r r
a t −a2 /4t t
= a erfc √ − 2 e +2 .
2 t π π
n o r
−1 −3/2 t
Since L s =2 , we get
π
r
n √ o t −a2 /4t a
L−1 s−3/2 e−a s = 2 e − a erfc √ .  (D.11)
π 2 t
( √ )
e−a s+c
Example D.4. To evaluate L−1 , we know from (D.9) that
s
n √ o a 2
L−1 e−a s = √ e−a /4t .
2 πt 3

Hence, using formula 2, Table D.1,


n √ o a 2
L−1 e−a s+c = √ e−ct−a /4t ,
2 πt 3

1
which, in view of the convolution theorem (Property (i) ) with F (s) = and
√ s
G(s) = e−a s+c
, yields
( √ ) Z
t
−1 e−a s+c a 2
L = √ e−cu−a /4u du. (D.12)
s 0 2 πu
3
D.1 NOTATION 325

Since r r
a a 1 c a 1 c
√ = √ + + √ − ,
2 u 3 4 u 3 2 u 4 u3 2 u
and
a2 √ a 2 √ √ a 2 √
cu + = cu + √ −a c= cu − √ + a c,
4u 2 u 2 u
a √ a √
we define x = √ + cu and y = √ − cu , and use the notation
2 u 2 u
a √ a √
√ + ct = x1 , and √ − ct = y1 .
2 t 2 t

Then the integral on the right side of (D.12) becomes


 √ Z ∞ √
Z ∞ 
1 2 2
√ ea c e−x dx + e−a c e−y dy
π x1 y1
1 h a√c  a √  √  a √ i
= e erfc √ + ct + e−a c erfc √ − ct .
2 2 t 2 t

Hence,
( √ )
−1 e−a s+c
1 h a√c  a √ 
L = e erfc √ + ct
s 2 2 t (D.13)
√  a √ i
+ e−a c erfc √ − ct . 
2 t

We state a useful theorem without proof.


P∞
Theorem D.1. If G(s) = Gk (s) is uniformly convergent series, then
k=1


X
L−1 {G(s)} = g(t) = gk (t), (D.14)
k=1

where L−1 {Gk (s)} = gk (t).


Example D.5. As an application of Theorem D.1, since
n o
L−1 s−3/2 e−1/s
  
−1 1 1 1 1 n 1
=L 1− + − + · · · + (−1) + ···
s3/2 s 2!s2 3!s3 n!sn (D.15)
∞ ∞ √
X (−1)n 1 X (−1)n (2 t)2n+1 1 √
= L−1 n+3/2
= √ = √ sin(2 t),
0
n! s π 0 (2n + 1)! π
326 D LAPLACE TRANSFORMS

we find that this result and formula 7, Table D.1 give


n o 1 √
L−1 s−1/2 e−1/s = √ cos(2 t).  (D.16)
πt

Example D.6. Consider a semi-infinite medium bounded by 0 ≦ x ≦ ∞,


−∞ < y, z < ∞, which has an initial zero temperature, while its face x = 0
is maintained at a time-dependent temperature f (t). The problem is to find
the temperature for t > 0. By applying the Laplace transform to the heat
s
conduction equation kTxx = Tt , we get T xx = T , where T = L{T }. The
k
solution of this equation is

T = Aemx + Be−mx , (D.17)


p
where m = s/k. Since T remains bounded as x → ∞, we find that A =
0. The boundary condition at x = 0 in the transform domain yields B =
f¯(s), where f¯(s) is the Laplace transform of f (t). Thus, the solution in the
transform domain is
T = f¯(s) e−mx .
To carry out the inversion, we use the convolution property and Example D.2
and get
Z t 2
x e−x /4kτ
T = √ f (t − τ ) dτ.
0 2τ πkτ
If f¯(s) = 1, then the solution for T reduces to
2
x e−x /4kt
T = √ .
2t πkt
This solution is the fundamental solution for the heat conduction equation for
the half-space. In the special case when f (t) = T0 , the solution is given by
 
x
T = T0 erfc √ .
2 kt

Example D.7. Consider an infinite slab bounded by 0 ≤ x ≤ l, −∞ <


y, z < ∞, with initial zero temperature. The face x = 0 is maintained at
a constant temperature T0 , and the face x = l is maintained at zero tem-
perature. The problem is to find the temperature inside the slab for t > 0.
Proceeding as in the above example, the solution in the transform domain
is given by Eq (D.17). Applying the boundary conditions in the transform
domain we get
T0
A+B = , and A eml + B e−ml = 0.
s
D.1 NOTATION 327

These two equations yield

T0 eml T0
B= , and A= − B.
2s sinh ml s

Substituting these values into T , given by (D.17), and simplifying, we find


that
T0 sinh m(l − x)
T = .
s sinh ml
Rewriting this solution as

T0 −ml  m(l−x)  −1


T = e e − e−m(l−x) 1 − e−2ml ,
s

and expanding the last factor by the binomial theorem, we get


T0 −ml  m(l−x) X
T = e e − e−m(l−x) e−2nml
s 0

T0 X  −m(2nl+x) 
= e − e−m[(2n+2)l−x] ,
s 0

which, on inversion, yields

∞ 
X  2(n + 1)l − x   2nl + x 
T = T0 erf √ − erf √ .
0 2 kt 2 kt

Alternatively, we can use the Cauchy residue theorem and obtain the solution
in terms of the Fourier series. Thus,

X T0 est sinh m(l − x)


T = residues of
s sinh ml
h ∞ i
x X 2 −n2 π2 kt/l2
= T0 1 − − e sin (nπx/l) . 
l 1

Theorem D.2. (Inversion Theorem) If F (s) is the Laplace transform of


f (t), then
Z c+i∞
1
f (t) = F (s) est ds, (D.18)
2πi c−i∞

where F (s) is of order O(s−k ), where k > 0.


To prove this theorem, we first state and prove a lemma.
328 D LAPLACE TRANSFORMS

Lemma D.1. If f (z) is analytic and of order O(z −k ) in the half-plane


ℜ {z} > γ, where γ and k > 0 are real constants, then
Z γ+iβ
1 f (z) 
f (z0 ) = lim dz, ℜ z0 > γ. (D.19)
2πi β→∞ γ−iβ z0 − z

Proof. Consider the rectangle in Figure D.1. Choose β > |γ| and such
that z0 lies inside this rectangle. By the Cauchy integral formula, we have
Z
f (z)
dz = 2πif (z0 ), (D.20)
Γ z − z0

where Γ is the contour ABCDA. Let S denote the contour ABCD, then
Z Z Z
f (z) f (z) f (z)
dz = dz + dz.
Γ z − z0 DA z − z0 S z − z0

Since Z Z
f (z) f (z)
dz = − dz,
DA z − z0 AD z − z0
we get from ( D.20)
Z γ+iβ Z
f (z) f (z)
− dz + dz = 2πif (z0 ). (D.21)
γ−iβ z − z0 S z − z0

Figure D.1 Rectangular contour.


D.1 NOTATION 329
Z
f (z)
Now, consider dz as β → ∞. Obviously, β → ∞ implies that
S z − z0
|z| → ∞ on S. Thus |z| ≧ β for points on S. If we take β large enough so
1 1 z0 1 z0
that β > 2|z0 |, then |z0 | < β ≦ |z|, or < implies that 1 − ≧
2 2 z 2 z
z0 1
1− > . Noting that |f (z)| < M |z|−k for large z, we get
z 2

f (z) f (z) 1 M 2M
z − z0
=
z
 z0  ≦ k+1  z0  ≦ β k+1 .
1− z 1−
z z

It now follows that

Z Z
f (z) 2M 2M
dz < k+1 |dz| = k+1 (length of S)
S z − z0 β S β
   
2M 4β − 2γ 2M 2γ
= k = k 4− .
β β β β

Z
f (z)
Thus, lim dz = 0. Hence, from ( D.21),
β→∞ S z − z0

Z γ+i∞
f (z)
− dz = 2πif (z0 ),
γ−i∞ z − z0

or
Z γ+i∞
1 F (z)
F (s) = dz.  (D.22)
2πi γ−i∞ s−z

The proof of Theorem 6.2 for the Laplace transform now becomes elemen-
tary. By taking the Laplace inverse of both sides of Eq ( D.22), we have

Z γ+i∞ n F (z) o Z γ+i∞


 1 1
f (t) = L−1 F (s) = L−1 dz = F (z)ezt dz. 
2πi γ−i∞ s−z 2πi γ−i∞
(D.23)
−k iθ
Lemma D.2. If |f (z)| < CR Z, z = R e , −π ≦ θ ≦ π, R > R0 , where
R0 , C, and k are constants, then ezt f (z) dz → 0 as R → ∞, provided
Γ
t > 0, where Γ is the arc BB ′ CA′ A, and R is the radius of the circular arc
330 D LAPLACE TRANSFORMS

with chord AB (Figure D.2).

Figure D.2 Contour Γ.

Proof. Consider the integral over the arc BB ′ . Let the angle BOC ′ be
denoted by α. On BB ′ we have z = Reiθ , where θ varies from α to π/2,
α = cos−1 (γ/R), and γ = OC ′ . Then we get
Z Z π/2

ezt f (z) dz < CR−k eRte Rieiθ dθ
BB ′ α
Z π/2 Z π/2
−k+1 Rt cos θ −k+1
= CR e dθ ≦ CR eγt dθ
α α
= CR−k+1 (π/2 − α)eγt
γ
= CR−k+1 eγt sin−1 →0 as R → ∞.
R
Z
Similarly, ezt f (z) dz → 0 as R → ∞.
A′ A
Let us now consider the integral over the arc B ′ CA′ . By following the
above procedure, we get
Z Z 3π/2
ezt f (z) dz < CR−k+1 eRt cos θ dθ
B ′ CA′ π/2
Z π
−k+1 −Rt sin φ
= CR e dφ where θ = π/2 + φ
0
Z π/2 Z π/2
−k+1 −Rt sin φ −k+1
= 2CR e dφ ≤ 2CR e−2Rtφ/π dφ
0 0
πCR−k
= (1 − e−Rt ) → 0 as R → ∞.
t
D.2 LAPLACE2 -TRANSFORM 331
Z
Hence, ezt f (z) dz → 0 as R → ∞, provided that t > 0. 
Γ
The justification to use the inequality e−RT sin φ ≤ e−2RT φ/π in the above
penultimate step is as follows: The function g(φ) = sin φ − 2φ/π ≥ 0 for
0 ≤ φ ≤ π/2, and with g(0) = 0 = g(π/2) has only one critical point at
φ = cos−1 (2/π), which gives a maximum.
Z γ+iβ
1
This result enables us to convert the integral F (z) ezt dz into an
2πi γ−iβ
integral over the contour (−Γ).

D.2 Laplace2 -Transform


The Laplace2 -Transform (also denoted by L2 -transform), which is a Laplace-
type integral transform, was introduced by Yurekli and Sadek [1991]. It is
defined as Z ∞
2 2
L2 {f (t); s} = t e−s t f (t) dt. (D.24)
0
A useful property of the L2 -transform is as follows:
If f is a class C n function, i.e., if f, f ′ , . . . , f (n−1) are all continuous func-
tions with a piecewise continuous derivative f (n) on the interval t ≧ 0 and
2 2
if all functions are of exponential order ec t as t → ∞ for some constant c,
then for n = 1, 2, . . .

L2 {δtn f (t); s} = 2n s2n L2 {f (t); s} − 2n−1 s2(s−1) f (0+ )


− 2n−2 s2(n−2) (δt f )(0+ ) − · · · − (δtn−1 f )(0+ ). (D.25)

For proof, see Yurekli and Sadek [1991]. 



Lemma D.3. (Inversion formula) Let F ( s) be an analytic function of s
(assuming that s = 0 is not a branch point) except at a finite number√of poles
each of which lies to the left of the vertical line ℜ{s} = c, and if F ( s) → 0
as s → ∞ through the left-plane ℜ{s} ≦ c, and if L2 {f (t); s} ≡ F (s) (see
Figure D.3), then
Z c+i∞
1 √ 2
L−1
2 {F (s)} = f (t) = 2F ( s ) est ds
2πi c−i∞
m
X  √ 2 
= Res {2F ( s) est }, s = sk . (D.26)
k=1

Proof. Aghili, Ansari and Sedgi [2007].


Lemma D.4. (Convolution theorem) If F (s), G(s) is the L2 -transforms of
the functions f (t) and g(t), respectively, then
nZ t p o
F (s)G(s) = L2 {f ⋆ g} = L2 xg(x)f ( t2 − x2 ) dx . (D.27)
0
332 D LAPLACE TRANSFORMS

Proof. Using the definition (D.24) for F (s) and G(s), we get
Z ∞ Z ∞
−s2 y 2 2
x2
F (S)G(s) = ye f (y) dy x e−s g(x) dx
Z0 ∞ Z ∞ 0
2 2 2
= yx e−s (x +y ) f (y)g(x) dx dy
0 0
Z ∞ Z t p
2 2
= t e−s t dt xg(x)f ( t2 − y 2 ) dx
0 0
Z ∞ p nZ t o
2 2
= t e−s t xg(x)f ( t2 − y 2 ) dx dt
0 0
nZ t p o
= L2 xg(x)f ( t2 − x2 ) dx ,
0

where wephave set x2 + y 2 = t2 , so that y dy = t dt, holding x as constant,


and y = (t2 − x2 ). 
A generalization of this theorem is the Efros theorem, also known as the
generalized product theorem; for another version, see §13.1.
Theorem D.3. (Efros Theorem) Assuming that Φ(s) and q(s) are ana-
2 2
lytic, let L2 {Φ(t, τ )} = Φ(s)τ e−τ q (s) . Then

nZ ∞ o
L2 f (τ )Φ(t, τ ) dτ = F (q(s))Φ(s), (D.28)
0

where L2 {f (t)} = F (s).


Proof. Using the definition (D.24) of the L2 -transform, and changing the
order of integration, we get

nZ ∞ o Z ∞ ∞ nZ o
−s2 t2
L2 f (τ )Φ(t, τ ) dτ = te f (τ )Φ(t, τ ) dτ dt
0 0 0
Z ∞ nZ ∞ o Z ∞
2 2 2 2
−s t
= f (τ ) dt te Φ(t, τ ) dτ = Φ(s) f (τ )τ e−τ q (s) dτ
0 0 0
= Φ(s)F (q(s)). 

More details about L2 -transform can also be found in Yurekli and Wilson
[2002], [2003].

D.3 Exercises
D.1. The√
formula (D.9) can be obtained as follows (Churchill [1972]):
−a s
e √ dy 1 √ a √
Define √ = x and e−a s = z. Then y ′ = = − 3/2 e−a s − e−a s ,
s ds 2s 2s
D.3 EXERCISES 333

a √
which yields 2sy ′ +y +az = 0. Similarly, z ′ = − √ e−a s yields 2z ′ +ay = 0.
2 s
Taking the inverse transform of these equations, we get

aG − F − 2tF ′ = 0, and aF − 2tG = 0,

where L−1 {y} = F (t) and L−1 {z} = G(t). From these two equations in F
1  a2 F  A 2
and G, we get F ′ = − F , whose solution is F = √ e−a /4t , which
2t 2t t
aA −a2 /4t 1 1
gives G = √ e . Note that if a = 0, then y = √ , and F (t) = √
2 t 3 s πt
1
implies that A = √ . Hence,
π

1 2 a 2
F (t) = √ e−a /4t , G= √ e−a /4t .
πt πt 3

( √ )
−1 e−a s 1 2
Then we integrate L √ = √ e−a /4t with respect to a from 0 to
s πt
( √ )
e−a s n 1 o rπ
−1
a and obtain L , using L √ = (formula 12, Table D.1).
s t s
n cosh a√s o n √ o
−1 −1 sinh a s
D.2. Find (a) L √ , and (b) L √ , b > a > 0.
s cosh b  s sinhb s
Hint. Use cosh x = ex + e−x /2, sinh x = ex − e−x /2, and (1 + z)−1 =

X
(−1)n z n .
n=0

X   (2n + 1)b − a   (2n + 1)b + a 
Ans. (a) (−1)n erfc √ + erfc √ ,
n=0
2 t 2 t
∞  
X (2n + 1)b − a −[(2n+1)b−a]2 /(4t) (2n + 1)b + a −[(2n+1)b+a]2 /(4t)
(b) √ e − √ e .
n=0 4πt3 4πt3
Γ(n+!)
D.3. Show that L{tn } = n+1 , where Γ(x) is the gamma function.
Z ∞ s Z ∞
1 Γ(n+!)
n
Ans. We have L{t } = tn e−st dt = n+1 xp e−x dx = n+1 , where
0 s 0 s
we have set st = x. 
D.4. Solve the partial differential equation utt = uxx , with the initial
(1 − x)2
conditions u(x, 0) = − , ut (x, 0) = 0, and the boundary conditions
2
ux (0, t) = 1 and ux (1, t) = 0.
1 (1 − x)2
Ans. u = − t2 − .
2 2
334 D LAPLACE TRANSFORMS

D.5. Solve the partial differential equation ut = uxx , with the initial
condition u(x, 0) = 0 and the boundary conditions ux (0, t) = 0 and u(1, t) =
1. √
cosh x s
Hint. The solution in the transform domain is ū = √ .
s cosh s
Find two different inverses of this solution, by expanding the solution in a
series of the type shown in Example D.7 and by the residue theorem.
∞  
X
n 2n + 1 − x 2n + 1 + x
Ans. u = (−1) erfc √ + erfc √ , or
0
2 t 2 t

X 4 cos(2n + 1)πx/2 −(2n+1)2 π2 t/4
u=1− (−1)n e .
0
(2n + 1)π

D.6. Solve in the transform domain the partial differential equation ut =


uzz + kutzz , given that u(z, 0) = 0, and u(0, t) = u0 , lim u(z, t) = 0 for t > 0.
z→∞
u0 −z√s
Expand the solution in the transform domain in the form ū = e 1+
 s
powers of k . Invert the first two terms of this expansion.
Z r
u(z, t) 1 λ 1 −xt λx 1
Ans. =1− e sin dx, where = k.
u0 π 0 x λ−x λ
D.7. Using the Laplace transform method, solve the partial differential
equation ut = uxx , with the initial condition u(x, 0) = 0 and the boundary
conditions ux (0, t) = 0, and ux (1, t) = 1. √
cosh x s
Hint. The solution in the transform domain is ū = 3/2 √ . Find two
s sinh s
different inverses of this solution, by expanding the solution in a series of the
type shown in Example D.7 and by the residue theorem.
X∞  p  
2 2
Ans. u = 2 t/π e−(2n+1−x) /4t + e−(2n+1+x) /4t
n=0 
2n + 1 − x 2n + 1 + x
−(2n + 1 − x) erfc √ − (2n + 1 + x) erfc √ ,
2 t 2 t
and

x2 1 X 2(−1)n −n2 π2 t
u= +t− − e cos nπx.
2 6 n=1 n2 π 2

( √ )
−1 e−a s
D.8. Use contour integration to evaluate L .
s
Solution. Using the Laplace inversion formula (D.2), we have
Z √
c+i∞
1 e−a s
f (t) = est ds. (D.29)
2πi c−i∞ s
D.3 EXERCISES 335

Consider the Bromwich contour M ABC1 CDL (Figure D.3). Then by Cauchy’s
theorem we get
Z c+i∞ −a√s Z Z
e st
I= e ds = F (s) ds + F (s) ds
c−i∞ s LD DC
Z Z Z
+ F (s) ds + F (s) ds + F (s)ds.
C1 BA AM

As established in Lemma D.2, we have


Z Z
F (s) ds + F (s) ds = 0,
LD AM

e−a s
where F (s) = est .
s

Figure D.3 Bromwich contour.

The integral over the circle C1 is easily shown to be equal to 2πi. This is
done by taking the radius to be ε and substituting s = εeiθ . On BA, s = u eiπ ,
and
Z R→∞
1 −a√ueiπ/2+uteiπ iπ
IBA = e e du
u eiπ
Zε→0∞ Z ∞
1 −ia√u−ut 1 −ut √ √
= e du = e (cos a u − i sin a u) du
0 u 0 u
Z ∞
1 −v2 t
=2 e (cos av − i sin av) dv,
0 v
R R ∞ 1 −v2 t
where u = v 2 . Similarly, CD = −2 0 e (cos av + i sin av) dv. Hence,
v
Z Z Z ∞
1 −v2 t
+ = −4i e sin av dv.
CD BA 0 v
336 D LAPLACE TRANSFORMS
Z ∞
1 −v2 t
In order to evaluate the integral e sin av dv, we consider the integral
Z ∞ 0 v
2
e−v t cos av dv. Then
0
Z ∞ Z ∞ 
−v 2 t −v 2 t+iav
e cos av dv = ℜ e dv
0 0
 Z ∞ √ √

−a2 /4t −(v t−ia/2 t)2
=ℜ e e dv
0
 2 Z ∞ 
e−a /4t 2 √ √
=ℜ √ √ e−w dw , where w = v t − ia/2 t
t −ia/2 t
 2 Z ∞ Z 0 
e−a /4t −w 2 −w 2
=ℜ √ e dw + √ e dw .
t 0 −ia/2 t

Hence,
Z ∞ √ −a2 /4t
2 πe
e−v t cos av dv = √ .
0 2 t
Integrating both sides of this equation with respect to a from 0 to a, we get
Z ∞ r Z a
1 −v2 t π 2 π a
e sin av dv = e−x /4t
dx = erf √ .
0 v 4t 0 2 2 t

Thus
( √ )  
−1 e−a s
1 π a a
L = 2πi − 4i erf √ = erfc √ .  (D.30)
s 2πi 2 2 t 2 t

n −x s2 +a2 o
−1 e
D.9. Determine f (x, t) ≡ L2 .
2s2 (s2 − b)
√ √
−x s2 +a2
e−x s+a
2
e √
Solution. Let F (s) = 2 2 , which gives 2F ( s) = . If
2s (s − b) s(s − b)
√ 2 ∞R 2 2 2
we denote x s + a2 by z and use e−z = √ 0 e−y −z /(4y ) dy, we find that
π
Z √
c+i∞
e−x s+a st2
2
1
f (x, t) = e ds
2πi c−i∞ s(s − b)
Z c+i∞ Z ∞  2
1 1 2 2 2 2
= e−y −(x (s+a ))/(4y ) dy est ds
2πi c−i∞ s(s − b) 0
Z ∞ −y2 −a2 x2 /(4y2 )  Z c+i∞ −x2 /(4y2 )−t2 )s 
e 1 e
= dy ds
0 s 2πi c−i∞ s−b
D.3 EXERCISES 337
Z
1 ∞ −y2 −a2 x2 /(4y2 )  b(t−x2 /(4y2 )+t2 ) 
= e e − 1 H(t − x2 /(4y 2 ) + t2 ) dy
b 0
Z Z
1 ∞ −y2 −(a2 +1)x2 /(4y2 )+b(t+t2 ) 1 ∞ −y2 −a2 x2 /(4y2 )
= e dy − e dy,
b A b A

where A = 12 √t+t
x
2
, and in the last two lines above we have used the formulas
n e−as o n e−as o
L−1 = H(t − a) and L−1 = eb(t−a) H(t − a) (see Table D.1).
s s−b
π 2 2
D.10. To show that L2 {sin(tτ )} = 3 t e−t /(4s ) . Hint. Use the defini-
4s
tion (D.24).
D.11. Use the L2 -transform to solve the singular integral equation
Z ∞  t 
2
f (y) sin(ty) dy = erf , a ∈ R.
π 0 2a

Solution. Using Example D.8, we find that



2 1 π 1
F 3
= √ ,
π 2s 4s 2s 1 + a2 s2
2

r
π
which simplifies to F (s) = 12 . On inverting the L2 -transform (using
a2 + s2
2 2
e−a t
(D.26)) we obtain the required solution as f (t) = . Note that erf(x) =
t
2 xR 2
√ 0 e−t dt.
π
nZ ∞  τ2  o
D.12. Use the Efros theorem to find L2 erfc dτ .
0 2x
Solution. Using (D.28) we get
nZ ∞  τ2  o nZ ∞ o τ2  1
L2 erfc dτ = L2 τ erfc

0 2x 0 2x τ
h h i √
1 1 π
= 2 L2 } = ,
s τ s→√s 4s5/4

√ π
which, using (D.26) for F ( s) = yields
4s5/4
n √π o √
π 1/4
L−1
2 = x .
4s 5/4 2Γ( 54 )
338 D LAPLACE TRANSFORMS

Table D.1 Some Laplace Transform Pairs


R∞
f (t) F (s) = 0 e−st f (t) dt
1 s

1. f (at), a > 0 aF a
2. e±at f (t) F (s ∓ a)
3. H(t − a)f (t − a) e−as F (s)
e−as
4. eb(t−a) H(t − a) s−b
5. tn f (t) (−1)n F (n) (s)
6. f ′ (t) sF (s) − f (0+)
7. f ′′ (t) s2 F (s) − sf (0+) − f ′ (0+)
Pn
8. f (n) (t) sn F (s) − sn−k f (k−1) (0+)
k=1
Rt F (s)
9. 0+ f (y) dy s
R t (t−y)n−1 F (s)
10. 0+ (n−1)! f (y) dy n
1
Rs∞
11. t f (t) s
F (y) dy
RT
12. f (t + T ), of period T (1 − e−T s )−1 0 e−st f (t) dt

13. √1t s
14. δ(t) 1
15. δ(t − T ) e−T s , T ≧ 0
n
16. δ (t) sn
1
17. 1, H(t) s
e−T s
18. H(t − T ) s , T ≧ 0
n!
19. tn s n+1 , n = 0, 1, 2, . . .
1
20. e±at s∓a
n!
21. tn e±at (s∓a)n+1 , n = 0, 1, 2, . . .
s
22. (1 − at)e±at (s∓a)2
a
23. sin at s2 +a2
s
24. cos at s2 +a2
a
25. sinh at s2 −a2
s
26. cosh at s2 −a2
E
Implicit Function Theorem

Let f : Rn + m 7→ Rm be a continuously differentiable function, where


Rn+m ≡ Rn × Rm . Fix a point (a, b) = (a1 , . . . , an , b1 , . . . , bm ) such that
f (a, b) = 0. Let U ⊂ Rn and V ⊂ RM , and let g : U 7→ V such that the
graphs of g satisfy the relation f = 0 on U × V . Let the Jacobian matrix
J = (Df )(a, b) of f , defined by
  
∂f1 ∂f1 ∂f1 ∂f1
··· (a, b) ··· (a, b)
 ∂x1 (a, b) (a, b)  
∂xn   ∂y1 ∂yn 
 
(Df )(a, b) =  .. .. ..  .. .. .. 
 ∂fn . . . . . .
  
∂fn   ∂fn ∂fn 
(a, b) ··· (a, b) (a, b) ··· (a, b)
∂x1 ∂xn ∂y1 ∂yn
= [x][y],

where x is the matrix of partial derivatives in the variables xi and y is the


matrix of partial derivatives in the variables yj . The implicit function theorem
states that if Y is an invertible matrix, then there are U, V and g such that

{(x, g(x) | x ∈ U } = {(x, y) ∈ U × V | f (x, y) = c}.

Thus, if f is k-times continuously differentiable in U × V , then the same holds


for the function g inside U , and

∂g  ∂f −1 ∂f
(x) = − (x, g(x)) (x, g(x)).
∂xj ∂y ∂xj

Example E.1. The implicit derivative of y with respect x, and that of x


with respect to y, can be found by totally differentiating the implicit function
f (x, y) = x2 + y2 − 1 and equating to zero, i.e., 2x dx + 2y dy = 0, giving
dy x dx y
= − and =− .
dx y dy x
F
Locally Nonsatiated Function

The utility function u : X 7→ R represents a binary relation  if x  y ⇐⇒


u(x) ≥ u(y). A preference relation  is locally nonsatiated if for all x ∈ X
and ε > 0, there exists a y such that ky − xk < ε and y ≻ x. A utility
function u : X 7→ R is locally nonsatiated if it represents a locally nonsatiated
preference relation ; that is, for every x ∈ X and ε > 0, there exists a y
such that ky − xk < ε and u(y) ≥ u(x). This definition leads to the following
result:
Theorem F.1. Suppose X ∈ Rn . A binary relation  is complete,
transitive, and continuous iff it admits a continuous utility representation
u : x 7→ R.
Theorem F.2. If  is strictly monotone, then it is locally nonsatiated.
Proof. Let x be given, and let y = x + (ε/2) e, where e = {1, 1, . . . , 1}.
Then (i) yip> x for each i; (ii) strict monotonicity implies y ≻ x; and (iii)
Pn 2

ky − xk = i=1 (ε/n) = ε/ n < ε. Hence,  is locally nonsatiated.
Theorem F.3. Prove that αx + (1 − α)y  y.
Proof. Suppose x  y, i.e., x1 + x2 ≥ y1 + y2 . Fix α ∈ (0, 1). Then
αx+(1−α)y = (αx1 +(1−α)y1, αx2 +(1−α)y2) = α(x1 +x2 )+(1−α)(y1 +y2 ) ≥
α(y1 + y2 ) + (1 − α)(y1 + y2 ) = y1 + y2 , since x1 + x2 ≥ y1 + y2 . 
If h(u, p) is continuous and locally nonsatiated, and h(u, p) is a function,
then for all p1 and p2 , we have

(p2 − p1 )[h(u, p2 ) − h(u, p1 )] ≤ 0.

This is not strictly convex because (1.0)  (0, 1) and (1, 0) 6= (0, 1), but
1 1 1 1
2 (1, 0) + 2 (0, 1) = ( 2 , 2 )  (0, 1).
Finally, if u represents , then (i)  is convex if u is quasi-concave, and
(ii)  is strongly convex if u is strictly quasi-concave.
For more details, see Richter [1971].
Bibliography
(Note: First author or single author is cited with last name first.)

Abramowitz, M., and I. A. Stegun (Eds.) 1972. Handbook of Mathematical


Functions. New York: Dover.
Afriat, S. N. 1967. The construction of utility functions from expenditure
data. International Economic Review. 8: 67-77.
Aghili A. , A. Ansari, and A. Sedgi. 2007. An inversion technique for the L2 -
transform with applications. Int. J. Contemp. Math. Scs. 2: 1387-1394.
An, M.Y. 1998. Logconcavity versus logconvexity: A complete characteriza-
tion. J. Econ. Theory. 80: 350-369.
Arrow, K. J., and A. C. Enthoven. 1961. Quasi-concave programming.
Econometrica. 29: 779-800.
———, L. Hurwicz, and H. Uzawa. 1961. Constraint qualifications in maxi-
mization problems. Naval Res. Log. Quart. 8: 175-191.
Avriel, M., W. E. Diewert, and I. Zang. 1988. Generalized Concavity. New
York: Plenum Press.
Bagnoli, Mark, and Ted Bergstrom. 1989. Log-concave Probability and its
Applications. Univerisity of Michigan. Econ. Theory. 26: 445-469.
———, and T. Bergstrom. 2005. Log-concave probability and its applications.
Econ. Theory. 26: 445-469.
Barlow, R. E., and F. Proschan. 1981. Statistical Theory of Reliability and
Life Testing: Probability Models. Silver Springs, MD: McArdale Press.
Barro, R.J., and X. Sala-i-Martin. 2004. Economic Growth. Boston: MIT
Press.
Barvinok, A. 2002. A Course in Convexity. Vol, 54, Graduate Studies in
Mathematics. American Mathematical Society.
Beale, E. M. L. 1959. On quadratic programming. Naval Research Logistic
Quart. 6: 227-243.
Bellman, R. E. 1970. Introduction to Matrix Algebra. New York, NY:
McGraw-Hill.
Berger, M. 1990. Convexity. The American Mathematical Monthly. 97(8):
650-678.
Bergstrom, T., and M. Bagnoli. 2005. Log-concave probability and its appli-
cations. Econom. Theory. 26: 445-469.
342 BIBLIOGRAPHY

Bertsekas, D. P. 1999. Nonlinear Programming. 2nd ed. Athena Scientific,


Belmont, MA.
Bertsimas, D.,and J. N. Tsitsiklis. 1997. Introduction to Linear Optimization.
Athena Scientific, Belmont, MA
Billingsley, Patrick. 1995. Probability and Measure, 3rd ed. New York, NY:
John Wiley.
Black, F., and M. Scholes. 1973. The pricing of options and corporate liabil-
ities. Journal of Political Economy. 81 (3): 637-654.
Boas, Ralph P. 1996. A Primer of Real Functions. 4th ed. Washington, DC:
Mathematical Association of America.
Borwein, Jonathan M., and Jon D. Vanderwerff. 2010. Convex Functions:
Construction, Characterizations and Counterexamples. Cambridge Uni-
versity Press.
Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cam-
bridge University Press.
Bazaraa, M. S., H. D. Sherali, and C. M. Shetty. 1993. Nonlinear Program-
ming: Theory and Algorithms. 2nd ed. New York, NY: John Wiley.
Chen, Hong-Yi, Cheng-Few Lee, and Weikang Shih. 2010. Derivation and
applications of Greek letters: Review and integration, in Handbook of
Quantitative Finance and Risk Management (Cheng-Few Lee, Alice C.
Lee and John Lee, eds.). III: 491-503. New York, Springer-Verlag.
Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford
Stein. 2001. Introduction to Algorithms, 2nd ed. Section 29.3: The
simplex algorithm, 790-804. MIT Press and McGraw-Hill.
Crouzeix, J.-P. 2008. Quasi-concavity, in Steven N. Durlauf and Lawrence E.
Blume The New Dictionary of Economics, 2nd ed. Palgrave: Macmillan.
Davenport, H., and G. Pólya. 1949. On the product of two power series.
Canadian J. Math. 1: 1-5.
De Finetti, B. 1949. Sulla stratificazioni convesse. Annali di Mathematica.
30: 173-183.
D’Esopo, D. A. 1959. A convex programming procedure. Naval Research
Logistic Quarterly, 6: 33-42.
Dharmadhikari, Sudhakar, and Kumar Joag-Dev. 1988. Unimodality, Con-
vexity, and Applications. Boston, MA: Academic Press.
Eggleston, H. G. 1958. Convexity. Cambridge University Press.
Fenchel, W. 1953. Convex Cones, Sets, and Functions. Lecture Notes, Prince-
ton University.
——— . 1983. Convexity through the ages. In Convexity and Its Applica-
tions, (P. M. Gruber and J. M. Wills, eds.) pages 120-130. Birkhäuser
Verlag.
Gill, P. E., W. Murray, M. A. Saunders, and M. H. Wright. 1990. A Schur-
complement method for sparse quadratic programming. In M. G. Cox
and S. J. Hammarling (eds.) Reliable Numerical Computation. Oxford
University Press. 113-138.
BIBLIOGRAPHY 343

——— , W. Murray, and M. H. Wright. 1991. Numerical Linear Algebra and


Optimization. Vol. 1. Addison-Wesley Publishing Company, Redwood
City.
——— , and E. Wong. 2012. Sequential quadratic programming methods. In
J. Lee and S. Leyffer (eds.) Mixed Integer Nonlinear Programming. Vol.
154 of The IMA Volumes in Mathematics and its Applications. Springer
New York. 147-224.
Ginsberg, W. 1973. Concavity and quasiconcavity in economics. Journal of
Economic Theory. 6: 596-605
Golub, G., and C. F. Van Loan. 1989. Matrix Computations. 2nd ed. Johns
Hopkins University Press.
Halmos, P. R. 1958. Finite Dimensional Vector Spaces. Princeton, NJ: Van
Nostrand.
Hardy, G. H. 1967. A Course in Pure Mathematics. 10th ed. Cambridge
University Press.
Hewett, Edwin, and Karl Stromberg. 1965. Real and Abstract Analysis.
Springer-Verlag, New York.
Hildebrand, F. B. 1974. Introduction to Numerical Analysis. 2nd ed. New
York, NY: McGraw-Hill.
Hildreth, C. 1957. A quadratic programming procedure. Naval Research
Logistic Quarterly, 14: 79-85.
Hoffman, K., and R. Kunze. 1961. Linear Algebra. Englewood Cliffs, NJ:
Prentice-Hall.
Hogg, R., and S. Klugman. 1984. Loss Distributions. New York, NY: John
Wiley.
Hoggar, S. G. 1974. Chromatic polynomials and logarithmic concavity. J.
Combin. Theory, Ser. B. 16: 248-254.
Horn, R. A., and C. A. Johnson. 1985. Matrix Analysis. Cambridge Univer-
sity Press.
Horst, R., and P. Pardalos. 1994. Handbook of Global Optimization. Kluwer.
Hull, John C. 2008. Options, Futures and Other Derivatives. 7th ed. Engle-
wood Cliffs, NJ: Prentice-Hall.
Hiriart-Urruty, J.-B., and C. Lemaréchal. 2001. Fundamental of Convex
Analysis. Springer-Verlag.
Joag-Dev, K., and F. Proschan. 1983. Negative association of random vari-
ables with applications. Ann. Statis. 11: 286-295.
Johnson, Oliver, and Christina Goldschmidt. 2008. Preservation of log-
concavity on summation. Available at arXiv:math/0502548v2 [math.
PR], 12 Oct 2005. 1-15.
Kaplan, Wilfred. 1959. Advanced Calculus. Addison-Wesley, Reading, MA.
Karush, W. 1939. Minima of functions of several variables with inequalities
as side constraints. M. Sc. Dissertation. Dept. of Mathematics, Univ. of
Chicago, Chicago, IL.
Klee, V. 1971. What is a convex set? The American Mathematical Monthly.
344 BIBLIOGRAPHY

78(6): 616-631.
Krugman, P. 1991. Increasing returns and economic geography. J. Polit.
Econ. 99: 483-499.
Kuhn, H. W. 1976. Nonlinear Programming. A historical review, in Nonlinear
Programming, (R. W. Cottle and C. E. Lemke, eds.), Vol 9, SIAM-AMS
Proceedings, pages 1-26. American Mathematical Society.
——— , and A. W. Tucker. 1951. Nonlinear programming. Proceedings of
2nd Berkeley Symposium. Berkeley, University of California Press. pp.
481-492.
Kythe, Prem K. 2011. Green’s Functions and Linear Differential Equations:
Theory, Applications, and Computation. Taylor & Francis Group/CRC
Press.
Lay, S. R. 1982. Convex Sets and Their Applications. New York, NY: John
Wiley.
Lekkerkerker, C. G. 1953. A property of logarithmic concave functions, I, II
Indag. Math. 15: 505-521.
Lipschutz, S. 1968. Linear Algebra. New York, NY: McGraw-Hill.
Luenberger, D. G. 1968. Quasi-convex programming. SIAM Journal on Ap-
plied Mathematics. 16(5).
——— . 1984. Linear and Nonlinear Programming. Addison-Wesley.
Mangasarian, O. L. 1969/1994. Nonlinear Programming. New York, NY:
McGraw-Hill. Reprinted as Classics in Applied Mathematics. SIAM, 1994.
Markowitz, H. 1952. Porfolio selection. The Journal of Finance. 7(1): 77-91.
Marsden, Jerrold, E., and Anthony J. Tromba. 1976. Vector Calculus. W. H.
Freemann, San Francisco.
Marshall, A., and I. Olken. 1979. Inequalities: Theory of Majorization and
Its Applications. New York: Academic Press.
Martin, D. H. 1985. The essence of invexity. J. Optim. Theory Appl. 47:
65-76. doi:10.1007/BF00941316.
Martos, B. 1969. Subdefinite matrices and quadratic forms. SIAM J. Appl.
Math. 17: 1215-1233.
——— . 1971. Quadratic programming with quasi-convex objective function.
Opns. Res. 19: 87-97.
——— . 1975. Nonlinear Programming Theory and Methods. North-Holland.
Mas-Colell, A., M. D. Whinston, and J. R. Green. 1995. Microeconomic
Theory. Oxford: Oxford University Press.
Merkle, Milan. 1998a. Convolutions of logarithmically concave functions.
Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Math. 9: 113-117.
——— . 1998b. Logarithmic concavity of distribution functions. Interna-
tional Memorial Conference “S. S. Mitrinovic” Nis., 1996 collection, in G.
V. Milovanovič (ed.) Recent Progress in Inequalities. Dordrecht: Kluwer
Academic Publishers. pp. 481-484.
Meyer, C.D. 2000. Matrix Analysis and Applied Linear Algebra. Society for
Industrial and Applied Mathematics.
BIBLIOGRAPHY 345

Michel, Anthony N., and Charles J. Herget. 2007. Algebra and Analysis.
Boston, MA: Birkhaüser.
Mocedal, J., and S. J. Wright. 1999. Numerical Optimization. Springer.
Moyer, Herman. 1969. Introduction to Modern Calculus. New York, NY:
McGraw-Hill.
Muth, E. 1977. Reliability models with positive memory derived from the
mean residual life function, in Theory and Applications of Reliability, vol.
II, C. Toskos and I. Shimi, eds. New York: Academic Press, pp. 401-436.
Nesterov, Yurii. 2004. Introductory Lectures on Convex Optimization: A
Basic Course. (Applied Optimization). New York: Springer Science +
Business Media.
——— , and Lars-Erik Persson. 2006. Convex Functions and Their Applica-
tions. (CMS Books in Mathematics). Cambridge University Press.
Nicholson, Walter. 1978. Microeconomic Theory. 2nd ed. Hinsdale: Dryden
Press.
Niculescu, C. P. 2000. A new look at Newton’s inequalities. J. Inequal.
Pure Appl. Math. 1, issue 2, article 17; also http://jipam.vu.edu.au/.
——— , and Lars-Erik Persson. 2006. Convex Functions and Their Applications.
(CMS Books in Mathematics). New York: Springer.
Nielsen, Lars Tyge. 1993. Understanding N (d1 ) and N (d2 ): Risk-adjusted
probabilities in the Black-Scholes model. Revue Finance (Journal of the
French Finance Association). 14(1): 95-106.
Osserman, Robert. 1968. Two-Dimensional Calculus. New York: Harcourt,
Brace & World.
Patel, J. K., C. H. Kapadia, and D. B. Owen. 1976. Handbook of Statistical
Distributions. New York: Marcel Dekker.
Pečarić, Josep E., Frank Proschan, and Y. L. Tong. 1992. Convex Functions,
Partial Orderings, and Statistical Applications. Mathematics in Science
and Engineering 187. Boston, MA: Academic Press.
Phelps, Robert R. 1993. Convex Functions, Monotone Operators and Differ-
entiability. Lecture Notes in Mathematics.
Polyak, B. T. 1987. Introduction to Optimization. Optimization Software.
Translated from Russian.
Ponstein, J. 1967. Seven kinds of convexity. SIAM Review. 9(1): 115-119.
Prékopa, András. 1971. Logarithmic concave measures with applications to
stochastic programming. Acta Scientiarum Mathematicarum. 32: 301-
316.
Råde, Lennart, and Bertil Westergren. 1995. Mathematical Handbook for Sci-
ence and Engineering. Boston, MA: Birkhäuser.
Rice, John. 1995. Mathematical Statistics and Data Analysis, 2nd ed.,
Duxbury Press.
Richter, M. K. 1971. Rational Choice, in J. S. Chipman, L. Hurwicz, M. K.
Richter, and H. F. Sonnenschein (eds.), Preferences, Utility, and Demand:
A Minnesota Symposium. Chapter 2: 29-58. New York: Harcourt Brace
Jovanovich.
Roberts, A. Wayne, and Dale E. Varberg. 1973. Convex Functions. New
York: Academic Press.
Rockafellar, Ralph Tyrell. 1970/1997. Convex Analysis. Princeton Land-
marks in Mathematics and Physics. Princeton University Press.
Ross, S. 1996. Stochastic Processes. 2nd ed. New York, NY: Wiley.
Royden, H. L. 1968. Real Analysis. 2nd ed. London: Macmillan.
Rudin, W. 1976. Principles of Mathematical Analysis. New York: McGraw-
Hill.
Ruszczyński, Andrzej. 2006. Nonlinear Optimization. Princeton, NJ: Prince-
ton University Press.
Sagan, B. E. 1988. Inductive and injective proofs of log concavity results.
Discrete Math. 68: 281-292.
——— . 1992. Inductive proof of q-log concavity. Discrete Math. 99: 289-
306.
Schrijver, A. 1986. Theory of Linear and Integer Programming. New York,
NY: John Wiley.
Schoenberg, I. J. 1951. On Pólya frequency functions, I. The totally positive
functions and their Laplace transforms. J. Analyse Math. 1: 331-374.
Simon, C., and L. Blume. 1994. Mathematics for Economists. New York: W.
W. Norton.
Singh, Richa. 2012. Optimization Methods and Quadratic Programming.
Master’s thesis, National Institute of Technology, Rourkela, India.
Skiba, A. K. 1978. Optimal growth with a convex-concave production func-
tion. Econometrica. 46(3): 527-539.
Slater, M. 1950. Lagrange multipliers revisited: A contribution to non-linear
programming. Cowles Commission Discussion Papers, Math. 403, No-
vember 1950.
Smith, Peter. 1985. Convexity Methods in Variational Calculus. Research
Studies Press, Letchworth, UK, and John Wiley, New York.
Solow, R. 1956. A contribution to the theory of economic growth. Q. J. Econ.
70: 65-94.
Stanley, R. P. 1989. Log-concave and unimodal sequences in algebra, combi-
natorics, and geometry. In Graph Theory and its Applications: East and
West (Jinan, 1986); Ann. New York Acad. Sci. 576: 500-535.
Strang, G. 1980. Linear Algebra and its Applications. New York: Academic
Press.
Takayama, Akira. 1993. Analytical Methods in Economics. University of
Michigan Press.
Tikhomirov, V. M. 1990. Convex analysis. In Analysis II: Convex Analysis
and Approximation Theory. (R. V. Gamkrelidze, ed.) Vol 14, pages 1-82.
Springer.
Todd, M. J. 2001. Semidefinite optimization. Acta Numerica. 10: 515-560.
Valentine, F. A. 1964. Convex Sets. New York, NY: McGraw-Hill.
Vandenberghe, L., and S. Boyd. 1996. Semidefinite programming. SIAM
Review. 38(1): 49-95.
van Tiel, J. 1984. Convex Analysis: An Introductory Text. New York, NY:
John Wiley.
Varian, Hal R. 1982. The nonparametric approach to demand analysis. Econo-
metrica. 50:945-973.
——— . 1992. Microeconomic Analysis. 3rd ed. New York: Norton.
Veblen, Thorstein B. 1899. The Theory of the Leisure Class: An Economic
Study of Institutions. London: Macmillan.
von Neumann, J. 1928. Zur Theorie der Gesellschaftsspiele. Math. Annalen.
100: 295-320.
——— . 1945-46. A model of general economic equilibrium. Review of Eco-
nomic Studies. 13: 1-9.
——— , and O. Morgenstern. 1953. Theory of Games and Economic Behav-
ior. 3rd ed. Princeton University Press; 1st ed. 1944.
Wang, Y. 2003. Linear transformations preserving log-concavity. Linear Algebra
Appl. 359: 161-167.
——— , and Y.-N. Yeh. 2005. Log-concavity and LC-positivity. Available at
arXiv:math.CO/0504164.
Webster, R. 1994. Convexity. Oxford University Press.
Whittle, P. 1971. Optimization under Constraints. New York, NY: John
Wiley.
Wilf, H. S. 1994. Generating Functionology. 2nd ed. Boston, MA: Academic
Press.
Wilmott, P., S. Howison, and J. Dewynne. 1995. The Mathematics of Finan-
cial Derivatives: A Student Introduction. Cambridge, U.K.: Cambridge
University Press.
Wilson, C. 2012. Concave functions of a single variable. ECON-UA6, New
York University. http://homepages.nyu.edu/caw1, Feb 21, 2012.
Wolfe, P. 1959. The simplex method for quadratic programming. Economet-
rica. 27: 382-398.
Yurekli, O., and A. Sadek. 1991. Parseval-Goldstein type theorem on the
Widder potential transform and its applications. Intern. J. Math. Math.
Scs. 14: 517-524.
——— , and S. Wilson. 2002. A new method of solving Bessel’s differential
equation using the ℓ2-transform. Appl. Math. Comput. 130: 587-591.
——— , and S. Wilson. 2003. A new method of solving Hermite’s differential
equation using the ℓ2-transform. Appl. Math. Comput. 145: 495-500.
Zalinescu, C. 2002. Convex Analysis in General Vector Spaces. World Scien-
tific.
Index

A
Abel’s summation formula 209
acceleration 52ff
affine inequality constraints 113
, transformation of domain 72
antiderivative 39
Asplund sum 204ff
arbitrage 277ff
area measure 207
asset-or-nothing call 283
autonomous expenditure multiplier 48ff

B
basins 45
Bayes’ formula 306
Beale’s method 222ff
bisection method 192
Black-Scholes call price 275, 277, 280, 287, 295
, economy 287
, formula 282ff
, model 271ff, 282
Black-Scholes equation 271ff, 273ff, 280, 294, 300, 302, 304
Bonferroni’s inequality 305
Boole’s inequality 305
bordered Hessian: two functions, 16, 90ff, 116, 150ff, 157, 170, 188ff, 193ff
, single function, 19ff, 157ff, 186ff, 196
bounded operators 313
, variation 300
Bromwich contour 337
Brownian motion 284, 299, 301
budget constraint 171
, indifference curve 102
Brunn-Minkowski inequality 206

C
call option 277
capital asset pricing model 292ff
cash-or-nothing call 283
center limit theorem 299, 311ff
characteristic polynomial 13
, roots 25, 243
Chebyshev inequality 307
circular helix 38
cofactor 6
comparative statics 109ff
complementary slackness conditions 97, 127
concavity 34, 37
, test for 34ff
Condition A 46, 313
Condition B 209ff, 211ff, 215
conditional expectation 294
cone 65, 162
concave programming 87ff
conditional probability 208
correlation coefficient 310
constraints, budget 113, 171
, convex objective 217
, convex linear 217
, dual 218
, equality 123ff, 188ff, 218
, equality and inequality 92ff, 190
, implicit 139
, inequality 97, 105, 126, 128, 139, 189, 228
, nonnegativity 99, 218
, qualifications 93
constraint set, convex 217
, convex quadratic 217
control variable 235
contour 330
covariance 310
convergence, almost surely 309
, in distribution 312
, in mean 309
, in probability 309
convex cone 65
, cover 63
, hull 63
, set 63ff, 65ff, 153ff, 156ff, 161ff, 163, 168, 175, 182, 184, 205
convex feasibility 189
convex optimization 138ff
, programming 121ff
convexity at a point 192
convolution of log-concave functions 201ff
cost minimization 264ff
Cramer’s rule/method 10ff, 27ff, 55, 89, 103ff, 111ff, 115, 118ff
critical points 16, 27, 32, 41, 44ff, 60, 83, 85ff, 88ff, 103ff, 105, 110ff, 115, 117, 119, 176, 180, 191, 196, 241, 244
cumulative distribution function (c.d.f.) 203, 206ff, 216, 286ff, 315ff, 318ff

D
demand(s) 174, 251ff
, Hicksian 251, 255ff, 264ff, 267ff, 270
, Marshallian 251ff, 253, 256, 267ff
, Walrasian 251, 261ff, 263ff, 267
, off-peak 270
, peak period 270
, uncompensated 252
determinant 4
derivative process 272
drift-diffusion process 272
discriminant 16ff, 304
, test 235
distribution, beta 212, 215ff, 315
, binomial 208, 211
, Cauchy 212, 320
, chi 212, 215ff, 220, 317
, chi-square 215ff, 317, 322
, continuous 315
, cumulative 315
, Dirichlet 212
, discrete 315
, double 318
, F- 212, 318
, exponential 215, 317
, extreme value 212, 215ff, 317
, Gamma 212, 215ff, 318ff
, geometric 208
, Laplace 85, 212, 214, 318
, log-concave 212ff
, log-normal 212, 284ff, 286, 290, 292, 316
, logistic 212, 214ff, 317
, marginal 308
, Maxwell 317
, mirror image of log-normal 316
, mirror image of Pareto 319
, normal 215, 285, 295ff, 302, 309, 316
, multivariate normal 211
, Pareto 212, 319
, Poisson 208
, power function 215, 318
, probability 214
, Rayleigh 317
, Student’s t- 212, 215ff, 319ff
, uniform 215, 316
, Wishart 212
, Weibull 212, 215, 318
dividends 294ff
, continuous 294
dual problem 140ff, 144, 146, 217ff
, constraint 218
duality gap 141ff, 145
, strong 146

E
eigenvector 240ff, 243ff, 247ff, 249ff
elasticity of substitution 62
electron-beam lithography 198
empty set 173
entropy 207
epigraph 70ff, 79
equation, Black-Scholes 271ff, 273ff, 280, 294, 300, 302, 304
, Black-Scholes-Merton partial differential 279, 301
, characteristic 13, 240ff, 247, 249
, diffusion 273, 287
, heat 274ff
, IS-LM 48ff
, linear 9
, parametric 52
, Slutsky 256ff, 260
, stochastic differential 271ff, 287, 297
equilibrium level 109
equivalent martingale measure 288
European call 279ff, 281, 288ff, 291, 295, 302
expectations 307ff

F
failure rate 204, 317
feasibility 138ff
feasible domain 40
, region 99, 134, 229
first derivative test 34ff
formula, Abel’s summation 209
, Bayes’ 306
, binomial 303
, Black-Scholes 282ff
, inversion 323, 333, 336
, total probability 306
Fréchet derivative 311ff, 315
, differential 311ff
free endpoints 236, 242
Fritz John conditions 46, 135ff, 137
function, aggregated demand 57ff
, algebraic 223
, bell-shaped 168
, CES utility 177, 252
, Cobb-Douglas utility 62, 114, 149, 170, 173, 184, 263ff, 270
, concave 36, 63ff, 65ff, 69ff, 78, 81ff, 85, 94, 172, 176, 181, 197ff, 240, 252
, conjugate 204ff
, constant elasticity of substitution 119, 152
, convex 36, 63ff, 65ff, 78, 81, 94, 182, 205, 213
, cost 85, 106, 228
, cubic 199
, cumulative distribution (c.d.f.) 203, 206ff, 216, 286ff, 315ff, 318ff
, cumulative distribution 280, 315ff
, decay 206, 317
, dual objective 218
, differentiable 292, 302
, density 202ff, 204, 213, 216, 317ff
, difference 316, 318
, domain 27
, economic 58
, expenditure 256, 264
, floor 93, 270
, Fréchet differentiable 313ff
, frequency 316
, gamma 333
, Gateaux differentiable 311ff
, Gaussian 206
, Green’s 275
, Hamiltonian 234ff
, Hicksian demand 255
, implicit utility 185
, incomplete beta 216, 322
, indirect utility 256ff, 263ff
, integrable 39
, inverse logit 197
, Lagrangian 90ff, 149
, linear 29
, limit of 30ff
, locally nonsatiated 262ff, 265ff, 267
, logarithmically concave 197ff
, logarithmic utility 184
, log-concave 197ff, 199, 201ff, 203ff, 205ff, 207ff, 211ff, 216, 320
, log-convex 197ff, 202, 216, 320
, log-linear 316
, marginal expenditure 58
, Marshallian demand 256
, mean residual lifetime 208, 316ff
, moment generating 301
, multivariate 163
, negative entropy 82
, objective 87ff, 97ff, 121, 143ff, 170, 227ff
, of exponential order 321
, parabolic cylinder 194, 232
, polynomial 50
, probability density 199, 202ff, 212, 214ff, 216, 287, 315ff, 318ff
, probability mass 208
, quadratic 29, 50
, quasi-concave 19ff, 153ff, 155ff, 160ff, 162ff, 164ff, 166, 169, 171, 174, 177ff, 185ff, 193, 197, 262, 267
, quasi-convex 19ff, 155ff, 159, 163, 171ff, 181ff, 185ff, 192ff, 194ff
, rational 29
, reliability 203ff, 318
, residual lifetime 317
, Slutsky 256, 260
, strictly concave 67
, strictly convex 72, 166, 230
, strictly monotone 160
, strictly monotone inverse 161
, strictly quasi-concave 153ff, 160ff, 166ff, 267
, strictly quasi-convex 187ff
, support 206
, total cost 27
, totally positive of order k 199ff
, upper semicontinuous 159, 194
, utility 112, 114, 116, 150, 173, 176, 256, 267
, value 105
, vector-valued 37ff

G
Gateaux differential 311ff
geometric distribution 208
Giffen goods 251, 259ff
gradient 45ff, 313ff
Greek letters 285

H
Hamiltonian 233ff, 236ff, 239ff, 242ff, 246ff
, current value 239ff, 247ff, 249
Hessian 15ff, 24ff, 28, 60ff, 76, 82, 87ff, 114, 116, 120ff, 125ff, 150, 175, 187, 190, 197, 201, 235ff, 239ff, 242ff, 247ff, 249
bordered Hessian: two functions, 16, 90ff, 116, 150ff, 157, 170, 188ff, 193ff
, single function, 19ff, 157ff, 186ff, 196
, principal minors of 18ff
Hicksian demand correspondence 255ff, 264ff, 267ff, 270
hypograph 65ff

I
income effect 258, 266
implicit differentiation 59
implied volatility 278ff
inequality, Bonferroni’s 305
, Boole’s 305
, Chebyshev 307
, Hölder 82ff
, Jensen’s for concave functions 69ff, 71ff
, Jensen’s modified 186
, Prékopa-Leindler 199, 207
, quadratic Newton 208
, Schwarz 308
, triangle 83
indifference curve 78
independent events 306
infinite slab 326
inflection point 35, 40ff, 45ff, 51ff, 61, 68
integrals of log-concave functions 204
IS curve 114
, schedule 46
IS-LM equation 48
isocost lines 46ff
isoquant 60ff
isomorphism 46, 75
Itô’s lemma 272ff, 279, 284, 287, 289, 296, 300, 303

J
Jacobian 14ff
, determinant 14
, matrix 315, 339
Jensen’s inequality 69ff, 71ff

K
KKT conditions 46, 91ff, 94ff, 96ff, 98ff, 102, 106ff, 108, 120, 122ff, 124ff, 128ff, 132, 143, 169ff, 200, 202ff, 223ff, 226ff, 229ff, 231, 267ff, 270
, dual feasibility 92
, primal feasibility 92
, regularity 93
, slackness 92, 126ff, 238, 246
, stationary 92ff

L
labor/leisure decision problem 265
Lagrange multiplier(s) method 46, 89ff, 92, 95, 100, 114, 116, 123, 133, 170ff, 177, 222, 251
Lagrangian 96ff, 100ff, 104, 107, 112, 114ff, 118, 122ff, 126ff, 129, 135, 144, 151, 225ff, 257
, duality 140ff
, dual 140ff
Laplace transform(s) 199, 321ff, 327, 329, 334
, inverse 323ff, 329
, inversion 324
L2 transform 331ff, 337
, inversion formulas 331, 334
, convolution theorem 331
, table 338
Laplacian 76
Lebesgue measure 200
Leibniz’s rule 322
limit of a function 30ff
, in mean 309
, in probability 309
linear programming 218, 225ff
LM curve 111, 114
, schedule 48
log-concave densities 213
, random variable 210
, sequence 208ff, 210
log-concavity 200ff, 203, 205, 208ff, 210
log-convexity 199
log-normal conditional expected value 286
lower level set 181ff

M
marginal cost 49
, density 213
marginal expenditure function 58
marginal propensity to consume 49ff
marginal rate of technical substitution 50, 268
marginal revenue 49
Marshallian demand 251ff, 253, 256, 266ff, 267ff
martingale 288ff, 296ff, 298, 301
mathematical economics 47ff, 105ff
matrix, addition/subtraction 1
, adjoint 7ff
, coefficient 9
, cofactor 6, 56
, definite 12ff, 88
, dimensions 1, 22
, determinant 4
, Hessian 15ff, 24ff, 28, 60ff, 76, 82, 87ff, 114, 116, 120ff, 125ff, 150, 175, 187, 190, 197, 201, 235ff, 239ff, 242ff, 247ff, 249
, bordered Hessian: two functions, 16, 90ff, 116, 150ff, 157, 170, 188ff, 193ff
, single function, 19ff, 157ff, 186ff, 196
, idempotent 4
, identity 4, 23, 219
, inverse 7ff, 23, 25
, inversion 26
, Jacobian 315, 341
, minor 6
, negative-definite (ND) 13, 16ff, 25, 28ff, 178, 236, 239
, negative-semidefinite (NSD) 13, 16, 87, 197, 236
, nonsingular 5, 219
, null 4
, positive-definite (PD) 13, 16ff, 25, 28, 134, 149ff, 151, 178
, positive-semidefinite (PSD) 13, 16ff, 88, 121
, properties 3
, rank 5, 22
, semidefinite 12ff, 88
, singular 5, 23
, Slutsky 256, 260
, substitution 266
, symmetric 4
, trace 4, 12, 148
, transpose of 1
, triangular 4
, conformable 2
maximizer 160
maximum principle 234, 240
, value 32ff, 42ff
, absolute 32ff
, local 32ff, 115
, global 33ff, 43ff, 74, 114, 168, 178
, relative 43, 53, 60ff, 84, 86ff, 114, 178
, yield 213ff
mean width 206
mean residual lifetime 204
minmax 140
, theorem 191
method, Beale’s 222ff, 231
, bisection 190
, convex feasibility 189
, Cramer’s (rule/method) 10ff, 25ff, 89, 103ff, 105, 110ff, 115, 117ff, 119
, Fritz John 46, 135ff, 137
, Gauss elimination 11ff, 24
, Hildreth-D’Esopo 220ff
, iteration 217, 222
, KKT 46, 91ff, 94ff, 96ff, 98ff, 102, 106ff, 108, 120, 122ff, 124ff, 128ff, 132, 143, 169ff, 200, 202ff, 223ff, 226ff, 229ff, 231, 267ff, 270
, Lagrange multiplier(s) 46, 89ff, 92, 95, 100, 114, 116, 123, 133, 170ff, 177, 222, 251
, Newton-type 16
, simplex 225ff
, trial-and-error 95, 112
, Wolfe’s 225ff
minimization, unconstrained 121ff
minimizer 127, 136, 150, 252
minimum value 32ff, 43ff
, absolute 32ff
, global 68, 149, 178, 193
, local 32ff, 68, 125, 134
, relative 42, 53, 60ff, 84, 86ff, 114, 178
moments 309
mountainous terrain 162
multiplication principle 311
multivariate density 212

N
nonconvex set 63
nonlinear programming 92, 94, 128ff, 169
norm 63
normed linear space 311
numéraire 288ff

O
one-unit charge 91
optimal control theory 233ff
optimal control problem 234
, discounted 238ff
optimization 16ff, 40ff
, constrained 18, 89ff
, convex 138, 230
, problems 87, 68, 196
, quasi-convex 187ff
, sufficient conditions 235, 239
orthant 19ff, 157

P
parabolic cylinder 194, 232
Pareto optimal points 231ff
partition 226
p.d.f. 199, 203, 213, 215, 287
peak load pricing 106
Pontryagin minimum principle 233
portfolio 304
, replicating 277
, risk-less 279
, self-financing 278
principal minor 17ff
, first 17ff, 115
, second 17, 116
primal problem 140ff, 142
principal minors 17ff, 19
probability, conditional 306
, density function 213, 315ff
, laws 305
, measure 214, 305
, risk-neutral 284
proportional tax 58ff
put options 275, 279

Q
quadratic programming 217ff, 222ff, 225
, programs 217
qualifications, constant rank 93
, linearity constrained 93
, linear independence constrained 93
, Mangasarian-Fromovitz constraints 93
, quasi-normality constraint 94
, regularity 93, 128, 131
quasi-concavity 154ff, 159ff, 168, 179
quasi-concave optimization 169ff
quasi-concave programming 168ff, 181ff
quasi-convexity 154, 171, 184ff, 192ff
, optimality 184

R
radius of curvature 52
random stochastic variable 210, 308ff
random variable, bivariate 308
, continuous 306ff
, discrete 306ff
radio-wave attenuation 68
real normed space 63
regularity conditions 93ff, 169
replicable claim 275
replicating portfolio 277
right side integral 203
risk-neutral measure 297
Roy’s identity 266

S
saddle point 44ff, 84, 113, 145
, sufficiency 145ff
second-derivative test 16, 34ff, 50
second-order derivatives 42
semimartingale 300
semi-sphere 162
semi-infinite medium 328
shadow prices 91
Shephard’s lemma 251ff, 253, 257ff, 269ff
sigma-algebra 200
sigma-finite measure 200, 202
slackness variable 126ff, 238, 246
, complementary 127
Slater conditions 93, 140, 170, 225ff, 228, 232
Slutsky matrix 265
, conditions 93, 140, 170, 226
, matrix 256, 260
solution by inverse 10
solution, optimal 219
strict quasi-concavity 159ff
stochastic process 296
strong duality 146
Student’s t-distribution 211, 214, 216, 319ff
substitution effect 258, 266
sufficient conditions 93
summation by parts 209
supply and demand 55
, pricing 106
systems of linear equations 9, 21ff

T
Taylor’s approximation, first-order 68, 74, 193, 314
, approximation, second-order 75, 314
, series 46
theorem, Cauchy mean value 200
, Cauchy 335
, Cauchy residue 327
, center limit 297, 309ff
, convolution 324
, Efros 333
, Fermat’s 33
, Feynman-Kac 288, 292
, generalized mean-value 204ff
, generalized product 332
, Hoggar’s 210
, implicit function 339
, intermediate value 31
, inversion 327
, L2 convolution 331
, minmax 191, 264
, sandwich 31
, mean-value 33, 50
, Rolle’s 32
, Steiner’s 307
, Weierstrass 231
transversality condition 243ff
time horizon 233
total energy of the system 233

U
uniformly convergent series 327
upper contour set 153
upper level set 153ff, 169
utility maximization 195, 252

V
variable, control 235ff, 244ff, 246ff, 248, 250
, costate 234ff, 244ff, 246ff, 248, 250
, slackness 162ff
, state 233, 235, 243ff, 246ff, 248, 250
variance 298, 305
Veblen goods 261
vector, column 1, 21
, linearly independent 46ff
, multiplication 2
, position 52
, row 1, 21
, unit 45, 52
volatility of returns 282
, skew 280

W
Walras’s law 262
Walrasian demand 251, 261ff, 263ff, 265
wartime rationing 111ff
Wiener process 272