
MATH 2400 LECTURE NOTES: DIFFERENTIATION

PETE L. CLARK

Contents
1. Differentiability Versus Continuity
2. Differentiation Rules
3. Optimization
3.1. Intervals and interior points
3.2. Functions increasing or decreasing at a point
3.3. Extreme Values
3.4. Local Extrema and a Procedure for Optimization
3.5. Remarks on finding roots of f′
4. The Mean Value Theorem
4.1. Statement of the Mean Value Theorem
4.2. Proof of the Mean Value Theorem
4.3. The Cauchy Mean Value Theorem
5. Monotone Functions
5.1. The Monotone Function Theorems
5.2. The First Derivative Test
5.3. The Second Derivative Test
5.4. Sign analysis and graphing
5.5. A theorem of Spivak
6. Inverse Functions I: Theory
6.1. Review of inverse functions
6.2. The Interval Image Theorem
6.3. Monotone Functions and Invertibility
6.4. Inverses of Continuous Functions
6.5. Inverses of Differentiable Functions
7. Inverse Functions II: Examples and Applications
7.1. x^n
7.2. L(x) and E(x)
7.3. Some inverse trigonometric functions
References


© Pete L. Clark, 2012.
Thanks to Bryan Oakley for some help with the proof of Proposition 44.

1. Differentiability Versus Continuity


Recall that a function f : D ⊂ R → R is differentiable at a ∈ D if

    lim_{h→0} (f(a + h) − f(a))/h

exists, and when this limit exists it is called the derivative f′(a) of f at a. Moreover, the tangent line to y = f(x) at x = a exists if f is differentiable at a and is the unique line passing through the point (a, f(a)) with slope f′(a).

Note that an equivalent definition of the derivative at a is

    lim_{x→a} (f(x) − f(a))/(x − a).

One can see this by going to the ϵ-δ definition of a limit and making the “substitution” h = x − a: then 0 < |h| < δ ⇐⇒ 0 < |x − a| < δ.
Theorem 1. Let f : D ⊂ R → R be a function, and let a ∈ D. If f is differentiable
at a, then f is continuous at a.
Proof. We have

    lim_{x→a} (f(x) − f(a)) = lim_{x→a} ((f(x) − f(a))/(x − a)) · (x − a)
        = (lim_{x→a} (f(x) − f(a))/(x − a)) · (lim_{x→a} (x − a)) = f′(a) · 0 = 0.

Thus

    0 = lim_{x→a} (f(x) − f(a)) = (lim_{x→a} f(x)) − f(a),

so

    lim_{x→a} f(x) = f(a). □
Remark about linear continuity...

The converse of Theorem 1 is far from being true: a function f which is con-
tinuous at a need not be differentiable at a. An easy example of this is f (x) = |x|
at a = 0.

But in fact the situation is even worse: a function f : R → R can be continuous everywhere yet still fail to be differentiable at many points. One way of introducing points of non-differentiability while preserving continuity is to take the absolute value of a differentiable function.
Theorem 2. Let f : D ⊂ R → R be continuous at a ∈ D.
a) Then |f | is continuous at a.
b) The following are equivalent:
(i) f is differentiable at a, and either f (a) ̸= 0 or f (a) = f ′ (a) = 0.
(ii) |f | is differentiable at a.
Proof. a) We have already proved this; it is restated for comparison with part b).
b) (i) =⇒ (ii): Suppose first that f is differentiable at a and also that f (a) ̸= 0.
By Theorem 1 f is continuous at a and therefore there exists some δ > 0 such that
for all x ∈ I = (a − δ, a + δ), f has the same sign at x as it does at a: in other
words, if f (a) > 0 then f (x) is positive for all x ∈ I and if f (a) < 0 then f (x)

is negative for all x ∈ I. In the first case, upon restriction to I, |f | = f , so it is


differentiable at a since f is. In the second case, upon restriction to I, |f | = −f ,
which is also differentiable at a since f is and hence also −f is.
Now suppose that f (a) = f ′ (a) = 0 . . .
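The dichotomy in part b) can be probed numerically. The following sketch (in Python; the notes themselves contain no code, and the helper name one_sided_slopes is ours) compares the one-sided difference quotients of |f| at a = 0 for f(x) = x, where f(0) = 0 but f′(0) ≠ 0, and for f(x) = x², where f(0) = f′(0) = 0.

```python
# One-sided difference quotients of |f| at a = 0 (helper name is ours).

def one_sided_slopes(f, a=0.0, h=1e-6):
    """Return (left, right) one-sided difference quotients of |f| at a."""
    left = (abs(f(a - h)) - abs(f(a))) / (-h)
    right = (abs(f(a + h)) - abs(f(a))) / h
    return left, right

# f(x) = x: f(0) = 0 but f'(0) = 1 != 0, so |f| = |x| is not differentiable at 0.
l1, r1 = one_sided_slopes(lambda x: x)

# f(x) = x^2: f(0) = f'(0) = 0, so |f| = x^2 is differentiable at 0.
l2, r2 = one_sided_slopes(lambda x: x * x)

print(l1, r1)  # -1.0 1.0: the one-sided slopes disagree
print(l2, r2)  # both near 0: the one-sided slopes agree
```

The disagreeing slopes ±1 for |x| exhibit the corner; for x² both slopes shrink to 0 with h, consistent with differentiability.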


2. Differentiation Rules
Theorem 3. (Constant Rule) Let f be differentiable at a ∈ R and C ∈ R. Then
the function Cf is also differentiable at a and
(Cf )′ (a) = Cf ′ (a).
Proof. There is nothing to it:

    (Cf)′(a) = lim_{h→0} ((Cf)(a + h) − (Cf)(a))/h = C lim_{h→0} (f(a + h) − f(a))/h = Cf′(a). □


Theorem 4. (Sum Rule) Let f and g be functions which are both differentiable at
a ∈ R. Then the sum f + g is also differentiable at a and
(f + g)′ (a) = f ′ (a) + g ′ (a).
Proof. Again, no biggie:

    (f + g)′(a) = lim_{h→0} ((f + g)(a + h) − (f + g)(a))/h
        = lim_{h→0} ((f(a + h) − f(a))/h + (g(a + h) − g(a))/h)
        = lim_{h→0} (f(a + h) − f(a))/h + lim_{h→0} (g(a + h) − g(a))/h = f′(a) + g′(a). □


These results, simple as they are, have the following important consequence.
Corollary 5. (Linearity of the Derivative) For any differentiable functions f and
g and any constants C1 , C2 , we have
(C1 f + C2 g)′ = C1 f ′ + C2 g ′ .
The proof is an immediate application of the Sum Rule followed by the Constant Rule. The point here is that functions L : V → W with the property that L(v1 + v2) = L(v1) + L(v2) and L(Cv) = CL(v) are called linear mappings, and are extremely important across mathematics.1 The study of linear mappings is the subject of linear algebra. That differentiation is a linear mapping (on the infinite-dimensional vector space of real functions) provides an important link between calculus and algebra.
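As a quick numerical sanity check of Corollary 5 (a Python sketch, not part of the notes), the symmetric difference quotient of C1 f + C2 g matches C1 f′ + C2 g′ up to floating-point noise:

```python
import math

def num_deriv(f, a, h=1e-6):
    """Symmetric difference quotient approximating f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)

f, g = math.sin, math.exp
C1, C2 = 3.0, -2.0
a = 0.7

lhs = num_deriv(lambda x: C1 * f(x) + C2 * g(x), a)
rhs = C1 * num_deriv(f, a) + C2 * num_deriv(g, a)
print(abs(lhs - rhs))  # agreement up to floating-point noise
```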
Theorem 6. (Product Rule) Let f and g be functions which are both differentiable
at a ∈ R. Then the product f g is also differentiable at a and
(f g)′ (a) = f ′ (a)g(a) + f (a)g ′ (a).

1We are being purposefully vague here as to what sort of things V and W are...

Proof.

    (fg)′(a) = lim_{h→0} (f(a + h)g(a + h) − f(a)g(a))/h
        = lim_{h→0} (f(a + h)g(a + h) − f(a)g(a + h) + f(a)g(a + h) − f(a)g(a))/h
        = (lim_{h→0} (f(a + h) − f(a))/h)(lim_{h→0} g(a + h)) + f(a)(lim_{h→0} (g(a + h) − g(a))/h).

Since g is differentiable at a, g is continuous at a and thus lim_{h→0} g(a + h) = lim_{x→a} g(x) = g(a). The last expression above is therefore equal to

    f′(a)g(a) + f(a)g′(a). □

Dimensional analysis and the product rule.

The generalized product rule: suppose we want to find the derivative of a function which is a product of not two but three functions whose derivatives we already know, e.g. f(x) = x sin x e^x. We can – of course? – still use the product rule, in two steps:

    f′(x) = (x sin x e^x)′ = ((x sin x)e^x)′ = (x sin x)′ e^x + (x sin x)(e^x)′
        = (x′ sin x + x(sin x)′)e^x + x sin x e^x = sin x e^x + x cos x e^x + x sin x e^x.

Note that we didn't use the fact that our three differentiable functions were x, sin x and e^x until the last step, so the same method shows that for any three functions f1, f2, f3 which are all differentiable at a, the product f = f1 f2 f3 is also differentiable at a and

    f′(a) = f1′(a)f2(a)f3(a) + f1(a)f2′(a)f3(a) + f1(a)f2(a)f3′(a).
Riding this train of thought a bit farther, here is a rule for the product of any finite number n ≥ 2 of differentiable functions.
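The three-factor formula above can be sanity-checked numerically at a sample point (a Python sketch, not part of the notes):

```python
import math

def num_deriv(f, a, h=1e-6):
    """Symmetric difference quotient approximating f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)

def f(x):
    return x * math.sin(x) * math.exp(x)

a = 1.3
# Right-hand side of the three-factor rule with f1 = x, f2 = sin x, f3 = e^x:
rhs = (1.0 * math.sin(a) * math.exp(a)
       + a * math.cos(a) * math.exp(a)
       + a * math.sin(a) * math.exp(a))
err = abs(num_deriv(f, a) - rhs)
print(err)  # tiny
```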
Theorem 7. (Generalized Product Rule) Let n ≥ 2 be an integer, and let f1 , . . . , fn
be n functions which are all differentiable at a. Then f = f1 · · · fn is also differen-
tiable at a, and
(1) (f1 · · · fn )′ (a) = f1′ (a)f2 (a) · · · fn (a) + . . . + f1 (a) · · · fn−1 (a)fn′ (a).
Proof. By induction on n.
Base Case (n = 2): This is precisely the “ordinary” Product Rule (Theorem 6).
Induction Step: Let n ≥ 2 be an integer, and suppose that the product of any n functions which are each differentiable at a ∈ R is differentiable at a and that the derivative is given by (1). Now let f1, . . . , fn, fn+1 be functions, each differentiable at a. Then by the usual product rule

    (f1 · · · fn fn+1)′(a) = ((f1 · · · fn)fn+1)′(a) = (f1 · · · fn)′(a)fn+1(a) + f1(a) · · · fn(a)fn+1′(a).

Using the induction hypothesis this last expression becomes

    (f1′(a)f2(a) · · · fn(a) + . . . + f1(a) · · · fn−1(a)fn′(a)) fn+1(a) + f1(a) · · · fn(a)fn+1′(a)
        = f1′(a)f2(a) · · · fn(a)fn+1(a) + . . . + f1(a) · · · fn(a)fn+1′(a). □


Example: We may use the Generalized Product Rule to give a less computationally intensive derivation of the power rule

    (x^n)′ = nx^{n−1}

for n a positive integer. Indeed, taking f1 = · · · = fn = x, we have f(x) = x^n = f1 · · · fn, so applying the Generalized Product Rule we get

    (x^n)′ = (x)′ x · · · x + . . . + x · · · x (x)′.

Here in each term we have x′ = 1 multiplied by n − 1 factors of x, so each term evaluates to x^{n−1}. Moreover we have n terms in all, so

    (x^n)′ = nx^{n−1}.

No need to mess around with binomial coefficients!
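The power rule is easy to spot-check numerically at a sample point (a Python sketch, not part of the notes):

```python
# Checking (x^n)' = n * x^(n-1) at a = 1.5 for several n.

def num_deriv(f, a, h=1e-6):
    """Symmetric difference quotient approximating f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)

a = 1.5
errors = [abs(num_deriv(lambda x: x ** n, a) - n * a ** (n - 1))
          for n in range(2, 8)]
print(max(errors))  # tiny for every n tested
```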
Example: More generally, for any differentiable function f and n ∈ Z+, the Generalized Product Rule shows that the function f(x)^n is differentiable and (f(x)^n)′ = nf(x)^{n−1} f′(x). (This sort of computation is more traditionally done using the Chain Rule...coming up soon!)
Theorem 8. (Quotient Rule) Let f and g be functions which are both differentiable at a ∈ R, with g(a) ≠ 0. Then f/g is differentiable at a and

    (f/g)′(a) = (g(a)f′(a) − f(a)g′(a))/g(a)^2.
Proof. Step 0: First observe that since g is continuous and g(a) ≠ 0, there is some interval I = (a − δ, a + δ) about a on which g is nonzero, and on this interval f/g is defined. Thus it makes sense to consider the difference quotient

    (f(a + h)/g(a + h) − f(a)/g(a))/h

for h sufficiently close to zero.
Step 1: We first establish the Reciprocal Rule, i.e., the special case of the Quotient Rule in which f(x) = 1 (constant function). Then

    (1/g)′(a) = lim_{h→0} (1/g(a + h) − 1/g(a))/h
        = lim_{h→0} (g(a) − g(a + h))/(h g(a) g(a + h))
        = −(lim_{h→0} (g(a + h) − g(a))/h)(lim_{h→0} 1/(g(a)g(a + h))) = −g′(a)/g(a)^2.

Above we have once again used the fact that g is differentiable at a implies g is continuous at a.
Step 2: We now derive the full Quotient Rule by combining the Product Rule and the Reciprocal Rule. Indeed, we have

    (f/g)′(a) = (f · (1/g))′(a) = f′(a)(1/g(a)) + f(a)(1/g)′(a)
        = f′(a)/g(a) − f(a)g′(a)/g(a)^2 = (g(a)f′(a) − g′(a)f(a))/g(a)^2. □
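The Quotient Rule, too, admits a quick numerical spot check (a Python sketch, not part of the notes), here with f = sin and g = exp, whose derivatives we know:

```python
import math

def num_deriv(f, a, h=1e-6):
    """Symmetric difference quotient approximating f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)

f, fp = math.sin, math.cos   # f and its known derivative
g, gp = math.exp, math.exp   # g and its known derivative
a = 0.4

lhs = num_deriv(lambda x: f(x) / g(x), a)
rhs = (g(a) * fp(a) - f(a) * gp(a)) / g(a) ** 2
err = abs(lhs - rhs)
print(err)  # tiny
```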
Lemma 9. Let f : D ⊂ R → R. Suppose:
(i) limx→a f (x) exists, and
(ii) There exists a number L ∈ R such that for all δ > 0, there exists at least one x
with 0 < |x − a| < δ such that f (x) = L.
Then limx→a f (x) = L.

Proof. We leave this as an (assigned, this time!) exercise, with the following sugges-
tion to the reader: suppose that limx→a f (x) = M ̸= L, and derive a contradiction
by taking ϵ to be small enough compared to |M − L|. 
Example: Consider, again, for α ∈ R, the function fα : R → R defined by fα(x) = x^α sin(1/x) for x ≠ 0 and fα(0) = 0. Then fα satisfies hypothesis (ii) of Lemma 9 with L = 0, since on any deleted interval around zero, the function sin(1/x) takes the value 0 infinitely many times. According to Lemma 9 then, if lim_{x→0} fα(x) exists at all, then it must be 0. As we have seen, the limit exists iff α > 0 and is indeed equal to zero in that case.
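Both features of fα can be seen numerically (a Python sketch, not part of the notes; for convenience we take α = 1 and use |x|^α so the power is defined for negative x as well):

```python
import math

# A numerical look at f_alpha(x) = x^alpha * sin(1/x) near 0 (alpha = 1 here).
def f_alpha(x, alpha=1.0):
    return abs(x) ** alpha * math.sin(1.0 / x) if x != 0 else 0.0

# sin(1/x) vanishes at x = 1/(n*pi): points arbitrarily close to 0 where
# f takes the value 0 -- hypothesis (ii) of Lemma 9 with L = 0.
zeros = [1.0 / (n * math.pi) for n in range(1, 6)]
max_at_zeros = max(abs(f_alpha(x)) for x in zeros)

# For alpha = 1 we have |f(x)| <= |x|, squeezing the limit at 0 down to 0.
samples = [10.0 ** (-k) for k in range(1, 8)]
squeeze_ok = all(abs(f_alpha(x)) <= abs(x) for x in samples)
print(max_at_zeros, squeeze_ok)
```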
Theorem 10. (Chain Rule) Let f and g be functions, and let a ∈ R be such that
f is differentiable at a and g is differentiable at f (a). Then the composite function
g ◦ f is differentiable at a and
(g ◦ f )′ (a) = g ′ (f (a))f ′ (a).
Proof. Motivated by Leibniz notation, it is tempting to argue as follows:

    (g ◦ f)′(a) = lim_{x→a} (g(f(x)) − g(f(a)))/(x − a)
        = lim_{x→a} ((g(f(x)) − g(f(a)))/(f(x) − f(a))) · ((f(x) − f(a))/(x − a))
        = (lim_{x→a} (g(f(x)) − g(f(a)))/(f(x) − f(a))) (lim_{x→a} (f(x) − f(a))/(x − a))
        = (lim_{f(x)→f(a)} (g(f(x)) − g(f(a)))/(f(x) − f(a))) (lim_{x→a} (f(x) − f(a))/(x − a)) = g′(f(a))f′(a).

The replacement of “lim_{x→a} . . .” by “lim_{f(x)→f(a)} . . .” in the first factor above is justified by the fact that f is continuous at a.
However, the above argument has a gap in it: when we multiply and divide by f(x) − f(a), how do we know that we are not dividing by zero?? The answer is that we cannot rule this out: it is possible for f(x) to take the value f(a) on arbitrarily small deleted intervals around a: again, this is exactly what happens for the function fα(x) of the above example near a = 0.2 This gap is often held to invalidate the proof, and thus the most common proof of the Chain Rule in honors calculus / basic analysis texts proceeds along (superficially, at least) different lines.
But in fact I maintain that the above gap may be rather easily filled to give a
complete proof. The above argument is valid unless the following holds: for all
δ > 0, there exists x with 0 < |x − a| < δ such that f (x) − f (a) = 0. So it remains
to give a different proof of the Chain Rule in that case. First, observe that with the above hypothesis, the difference quotient (f(x) − f(a))/(x − a) is equal to 0 at points arbitrarily close to x = a. It follows from Lemma 9 that if

    lim_{x→a} (f(x) − f(a))/(x − a)

exists at all, then it must be equal to 0. But we are assuming that the above limit exists, since we are assuming that f is differentiable at a. Therefore what we have
2One should note that in order for a function to have this property it must be “highly oscillatory
near a” as with the functions fα above: indeed, fα is essentially the simplest example of a
function having this kind of behavior. In particular, most of the elementary functions considered
in freshman calculus do not exhibit this highly oscillatory behavior near any point and therefore
the above argument is already a complete proof of the Chain Rule for such functions. Of course
our business here is to prove the Chain Rule for all functions satisfying the hypotheses of the
theorem, even those which are highly oscillatory!

seen is that in the remaining case we have f′(a) = 0, and therefore, since we are trying to show that (g ◦ f)′(a) = g′(f(a))f′(a), we are trying in this case to show that (g ◦ f)′(a) = 0. So consider our situation: for x ∈ R we have two possibilities: the first is f(x) − f(a) = 0, in which case also g(f(x)) − g(f(a)) = g(f(a)) − g(f(a)) = 0, so the difference quotient is zero at these points. The second is f(x) − f(a) ≠ 0, in which case the algebra

    (g(f(x)) − g(f(a)))/(x − a) = ((g(f(x)) − g(f(a)))/(f(x) − f(a))) · ((f(x) − f(a))/(x − a))

is justified, and the above argument shows that this expression tends to g′(f(a))f′(a) = 0 as x → a. So whichever holds, the difference quotient (g(f(x)) − g(f(a)))/(x − a) is close to (or equal to!) zero.3 Thus the limit tends to zero no matter which alternative obtains. Somewhat more formally, if we fix ϵ > 0, then the first step of the argument shows that there is δ > 0 such that for all x with 0 < |x − a| < δ such that f(x) − f(a) ≠ 0, |(g(f(x)) − g(f(a)))/(x − a)| < ϵ. On the other hand, when f(x) − f(a) = 0, then |(g(f(x)) − g(f(a)))/(x − a)| = 0, so it is certainly less than ϵ! Therefore, all in all we have

    0 < |x − a| < δ =⇒ |(g(f(x)) − g(f(a)))/(x − a)| < ϵ,

so that

    lim_{x→a} (g(f(x)) − g(f(a)))/(x − a) = 0 = g′(f(a))f′(a). □
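Both the generic case and the delicate oscillatory case of the proof can be illustrated numerically (a Python sketch, not part of the notes):

```python
import math

def num_deriv(f, a, h=1e-6):
    """Symmetric difference quotient approximating f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)

# Generic case: (g o f)'(a) = g'(f(a)) * f'(a) with g = exp, f = sin.
g, gp = math.exp, math.exp
f, fp = math.sin, math.cos
a = 0.9
lhs = num_deriv(lambda x: g(f(x)), a)
rhs = gp(f(a)) * fp(a)

# Delicate case: f oscillatory with f'(0) = 0. osc(x) = x^2 sin(1/x)
# hits 0 arbitrarily close to 0, and (g o osc)'(0) should be 0.
def osc(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

dq0 = num_deriv(lambda x: g(osc(x)), 0.0)
print(abs(lhs - rhs), dq0)  # both tiny
```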


3. Optimization
3.1. Intervals and interior points.

At this point I wish to digress to formally define the notion of an interval on the real line and an interior point of the interval. . . .
3.2. Functions increasing or decreasing at a point.

Let f : D → R be a function, and let a be an interior point of D. We say that f is


increasing at a if for all x sufficiently close to a and to the left of a, f (x) < f (a)
and for all x sufficiently close to a and to the right of a, f (x) > f (a). More formally
phrased, we require the existence of a δ > 0 such that:
• for all x with a − δ < x < a, f (x) < f (a), and
• for all x with a < x < a + δ, f (x) > f (a).

We say f is decreasing at a if there exists δ > 0 such that:


• for all x with a − δ < x < a, f (x) > f (a), and
• for all x with a < x < a + δ, f (x) < f (a).

We say f is weakly increasing at a if there exists δ > 0 such that:


• for all x with a − δ < x < a, f (x) ≤ f (a), and
• for all x with a < x < a + δ, f (x) ≥ f (a).

3This is the same idea as in the proof of the Switching Theorem, although – to my mild
disappointment – we are not able to simply apply the Switching Theorem directly, since one of
our functions is not defined in a deleted interval around zero.

Exercise: Give the definition of “f is weakly decreasing at a”.

Exercise: Let f : I → R, and let a be an interior point of I.


a) Show that f is increasing at a iff −f is decreasing at a.
b) Show that f is weakly increasing at a iff −f is weakly decreasing at a.

Example: Let f (x) = mx + b be the general linear function. Then for any a ∈ R:
f is increasing at a iff m > 0, f is weakly increasing at a iff m ≥ 0, f is decreasing
at a iff m < 0, and f is weakly decreasing at a iff m ≤ 0.

Example: Let n be a positive integer, let f(x) = x^n. Then:
If n is odd, then for all a ∈ R, f is increasing at a.
If n is even, then if a < 0, f is decreasing at a, and if a > 0 then f is increasing at a. Note that when n is even f is neither increasing at 0 nor decreasing at 0 because for every nonzero x, f(x) > 0 = f(0).4

If one looks back at the previous examples and keeps in mind that we are sup-
posed to be studying derivatives (!), one is swiftly led to the following fact.
Theorem 11. Let f : I → R, and let a be an interior point of I. Suppose f is differentiable at a.
a) If f ′ (a) > 0, then f is increasing at a.
b) If f ′ (a) < 0, then f is decreasing at a.
c) If f ′ (a) = 0, then no conclusion can be drawn: f may be increasing through a,
decreasing at a, or neither.
Proof. a) The differentiability of f at a has an ϵ-δ interpretation, and the idea is to use this interpretation to our advantage. Namely, take ϵ = f′(a): there exists δ > 0 such that for all x with 0 < |x − a| < δ,

    |(f(x) − f(a))/(x − a) − f′(a)| < f′(a),

or equivalently

    0 < (f(x) − f(a))/(x − a) < 2f′(a).

In particular, for all x with 0 < |x − a| < δ, (f(x) − f(a))/(x − a) > 0, so: if x > a, f(x) − f(a) > 0, i.e., f(x) > f(a); and if x < a, f(x) − f(a) < 0, i.e., f(x) < f(a).
b) This is similar enough to part a) to be best left to the reader as an exercise.5
c) If f (x) = x3 , then f ′ (0) = 0 but f is increasing at 0. If f (x) = −x3 , then
f ′ (0) = 0 but f is decreasing at 0. If f (x) = x2 , then f ′ (0) = 0 but f is neither
increasing nor decreasing at 0. 
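The three examples in part c) can be probed numerically (a Python sketch, not part of the notes; the crude two-sample classifier below is our own device, not a rigorous test):

```python
# Probing Theorem 11c: all three functions have derivative 0 at a = 0,
# yet their local behavior differs.

def behavior_at_zero(f, h=1e-3):
    """Crude classification of f's behavior at 0 from two nearby samples."""
    left, mid, right = f(-h), f(0.0), f(h)
    if left < mid < right:
        return "increasing"
    if left > mid > right:
        return "decreasing"
    return "neither"

b_cube = behavior_at_zero(lambda x: x ** 3)        # increasing
b_negcube = behavior_at_zero(lambda x: -(x ** 3))  # decreasing
b_square = behavior_at_zero(lambda x: x ** 2)      # neither
print(b_cube, b_negcube, b_square)
```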
3.3. Extreme Values.

Let f : D → R. We say M ∈ R is the maximum value of f on D if


(MV1) There exists x ∈ D such that f (x) = M , and
(MV2) For all x ∈ D, f (x) ≤ M .
4We do not stop to prove these assertions as it would be inefficient to do so: soon enough we
will develop the right tools to prove stronger assertions. But when given a new definition, it is
always good to find one’s feet by considering some examples and nonexamples of that definition.
5Suggestion: either go through the above proof flipping inequalities as appropriate, or use the
fact that f is decreasing at a iff −f is increasing at a and f ′ (a) < 0 ⇐⇒ (−f )′ (a) > 0 to apply
the result of part a).

It is clear that a function can have at most one maximum value: if it had more than one, one of the two would be larger than the other! However a function need not have any maximum value: for instance f : (0, ∞) → R by f(x) = 1/x has no maximum value: lim_{x→0+} f(x) = ∞.

Similarly, we say m ∈ R is the minimum value of f on D if


(mV1) There exists x ∈ D such that f (x) = m, and
(mV2) For all x ∈ D, f (x) ≥ m.

Again a function clearly can have at most one minimum value but need not have any at all: the function f : R \ {0} → R by f(x) = 1/x has no minimum value: lim_{x→0−} f(x) = −∞.

Exercise: For a function f : D → R, the following are equivalent:


(i) f assumes a maximum value M , a minimum value m, and M = m.
(ii) f is a constant function.

Recall that a function f : D → R is bounded above if there exists a number


B such that for all x ∈ D, f (x) ≤ B. A function is bounded below if there exists
a number b such that for all x ∈ D, f (x) ≥ b. A function is bounded if it is both
bounded above and bounded below: equivalently, there exists B ≥ 0 such that for
all x ∈ D, |f (x)| ≤ B: i.e., the graph of f is “trapped between” the horizontal lines
y = B and y = −B.

Exercise: Let f : D → R be a function.


a) Show: if f has a maximum value, it is bounded above.
b) Show: if f has a minimum value, it is bounded below.

Exercise: a) If a function has both a maximum and minimum value on D, then it is bounded on D: indeed, if M is the maximum value of f and m is the minimum value, then for all x ∈ D, |f(x)| ≤ max(|m|, |M|).
b) Give an example of a bounded function f : R → R which has neither a maximum
nor a minimum value.

We say f assumes its maximum value at a if f (a) is the maximum value


of f on D, or in other words, for all x ∈ D, f(x) ≤ f(a). Similarly, we say f assumes its minimum value at a if f(a) is the minimum value of f on D, or in other words, for all x ∈ D, f(x) ≥ f(a).

Example: The function f(x) = sin x assumes its maximum value at x = π/2, because sin(π/2) = 1, and 1 is the maximum value of the sine function. Note however that π/2 is not the only x-value at which f assumes its maximum value: indeed, the sine function is periodic and takes value 1 precisely at x = π/2 + 2πn for n ∈ Z. Thus there may be more than one x-value at which a function attains its maximum value. Similarly f attains its minimum value at x = 3π/2 – f(3π/2) = −1 and f takes no smaller values – and also at x = 3π/2 + 2πn for n ∈ Z.




Example: Let f : R → R by f (x) = x3 + 5. Then f does not assume a maxi-


mum or minimum value. Indeed, limx→∞ f (x) = ∞ and limx→−∞ f (x) = −∞.

Example: Let f : [0, 2] → R be defined as follows: f(x) = x + 1 for 0 ≤ x < 1; f(1) = 1; f(x) = x − 1 for 1 < x ≤ 2.
Then f is defined on a closed, bounded interval and is bounded above (by 2) and
bounded below (by 0) but does not have a maximum or minimum value. Of course
this example of a function defined on a closed bounded interval without a maximum
or minimum value feels rather contrived: in particular it is not continuous at x = 1.

This brings us to the statement (but not yet the proof; sorry!) of one of the
most important theorems in this or any course.
Theorem 12. (Extreme Value Theorem) Let f : [a, b] → R be a continuous func-
tion. Then f has a maximum and minimum value, and in particular is bounded
above and below.
Again this result is of paramount importance: ubiquitously in (pure and applied) mathematics we wish to optimize functions: that is, find their maximum and/or minimum values on a certain domain. Unfortunately, as we have seen above, a general function f : D → R need not have a maximum or minimum value! But the Extreme Value Theorem gives rather mild hypotheses under which these values are guaranteed to exist, and in fact is a useful tool for establishing the existence of maxima / minima in other situations as well.

Example: Let f : R → R be defined by f(x) = x^2(x − 1)(x − 2). Note that f does not have a maximum value: indeed lim_{x→∞} f(x) = lim_{x→−∞} f(x) = ∞. However, we claim that f does have a minimum value. We argue for this as follows: given that f tends to ∞ with |x|, there must exist ∆ > 0 such that for all x with |x| > ∆, f(x) ≥ 1. On the other hand, if we restrict f to [−∆, ∆] we have a continuous function on a closed bounded interval, so by the Extreme Value Theorem it must have a minimum value, say m. In fact since f(0) = 0, we see that m ≤ 0, so in particular m < 1. This means that the minimum value m for f on [−∆, ∆] must in fact be the minimum value for f on all of R, since at the other values – namely, on (−∞, −∆) and (∆, ∞) – we have f(x) ≥ 1 > 0 ≥ m.
We can be at least a little more explicit: a sign analysis of f shows that f is nonnegative on (−∞, 1] and [2, ∞) and negative on (1, 2), so the minimum value of f will be its minimum value on [1, 2], which will be strictly negative. But exactly what is this minimum value m, and for which x value(s) does it occur? Stay tuned: we are about to develop tools to answer this question!

3.4. Local Extrema and a Procedure for Optimization.

We now describe a type of “local behavior near a” of a very different sort from
being increasing or decreasing at a.

Let f : D → R be a function, and let a ∈ D. We say that f has a local maximum at a if the value of f at a is greater than or equal to its values at all sufficiently close points x. More formally: there exists δ > 0 such that for all x ∈ D, |x − a| < δ =⇒ f(x) ≤ f(a). Similarly, we say that f has a local minimum at a if the value of f at a is less than or equal to its values at all sufficiently close points x. More formally: there exists δ > 0 such that for all x ∈ D, |x − a| < δ =⇒ f(x) ≥ f(a).
Theorem 13. Let f : D ⊂ R → R, and let a be an interior point of D. If f is differentiable at a and has a local extremum – i.e., either a local minimum or a local maximum – at x = a, then f′(a) = 0.
Proof. Indeed, if f′(a) ≠ 0 then either f′(a) > 0 or f′(a) < 0.
If f′(a) > 0, then by Theorem 11 f is increasing at a. Thus for x slightly smaller than a, f(x) < f(a), and for x slightly larger than a, f(x) > f(a). So f does not have a local extremum at a.
Similarly, if f′(a) < 0, then by Theorem 11 f is decreasing at a. Thus for x slightly smaller than a, f(x) > f(a), and for x slightly larger than a, f(x) < f(a). So f does not have a local extremum at a. □
Theorem 14. (Optimization Procedure) Let f : [a, b] → R be continuous. Then the minimum and maximum values must each be attained at a point x ∈ [a, b] which is either:
(i) an endpoint: x = a or x = b,
(ii) a stationary point: f′(x) = 0, or
(iii) a point of nondifferentiability.
Often one lumps cases (ii) and (iii) of Theorem 14 together under the term critical point (but there is nothing very deep going on here: it's just terminology). Clearly there are always exactly two endpoints. In favorable circumstances there will be only finitely many critical points, and in very favorable circumstances they can be found exactly: suppose they are c1, . . . , cn. (There may in fact not be any critical points, but that would only make our discussion easier...) Suppose further that we can explicitly compute all the values f(a), f(b), f(c1), . . . , f(cn). Then we win: the largest of these values is the maximum value, and the smallest of these values is the minimum value.

Example: Let f(x) = x^2(x − 1)(x − 2) = x^4 − 3x^3 + 2x^2. Above we argued that there is a ∆ such that |x| > ∆ =⇒ f(x) ≥ 1: let's find such a ∆ explicitly. We intend nothing fancy here: since 2x^2 ≥ 0,

    f(x) = x^4 − 3x^3 + 2x^2 ≥ x^4 − 3x^3 = x^3(x − 3).

So if x ≥ 4, then

    x^3(x − 3) ≥ 4^3 · 1 = 64 ≥ 1.

On the other hand, if x < −1, then x < 0, so −3x^3 > 0 and thus

    f(x) ≥ x^4 + 2x^2 = x^2(x^2 + 2) ≥ 1 · 3 = 3.

Thus we may take ∆ = 4. Now let us try the procedure of Theorem 14 out by finding the maximum and minimum values of f(x) = x^4 − 3x^3 + 2x^2 on [−4, 4].

Since f is differentiable everywhere on (−4, 4), the only critical points will be the stationary points, where f′(x) = 0. So we compute the derivative:

    f′(x) = 4x^3 − 9x^2 + 4x = x(4x^2 − 9x + 4).


The roots of the quadratic factor 4x^2 − 9x + 4 are x = (9 ± √17)/8, or, approximately,

    x1 ≈ 0.6096, x2 ≈ 1.6404;

the remaining stationary point is x = 0, with f(0) = 0. We compute

    f(x1) ≈ 0.2017, f(x2) ≈ −0.6197.

Also we always test the endpoints:

    f(−4) = 480, f(4) = 96.

So the maximum value is 480 and the minimum value is ≈ −0.6197, occurring at x = (9 + √17)/8.
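The whole procedure can be carried out mechanically (a Python sketch, not part of the notes): list the endpoints and stationary points, evaluate f at each, and take the extremes.

```python
import math

def f(x):
    return x ** 4 - 3 * x ** 3 + 2 * x ** 2

# Step 1: stationary points. f'(x) = x(4x^2 - 9x + 4), so x = 0 or
# x = (9 +/- sqrt(17))/8 by the quadratic formula.
r = math.sqrt(17.0)
stationary = [0.0, (9 - r) / 8, (9 + r) / 8]

# Step 2: evaluate f at the endpoints and at every stationary point.
candidates = [-4.0, 4.0] + stationary
values = [f(x) for x in candidates]

max_val, min_val = max(values), min(values)
print(max_val, min_val)  # 480.0 and about -0.6197
```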
3.5. Remarks on finding roots of f ′ .

4. The Mean Value Theorem


4.1. Statement of the Mean Value Theorem.

Our goal in this section is to prove the following important result.


Theorem 15. (Mean Value Theorem) Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then there exists at least one c such that a < c < b and

    f′(c) = (f(b) − f(a))/(b − a).
Remark: If you will excuse a (vaguely) personal anecdote: I still remember the
calculus test I took in high school in which I was asked to state the Mean Value
Theorem. It was a multiple choice question, and I didn’t see the choice I wanted,
which was as above except with the subtly stronger assumption that f′_R(a) and f′_L(b) exist: i.e., f is one-sided differentiable at both endpoints. So I went up to the
teacher’s desk to ask about this. He thought for a moment and said, “Okay, you
can add that as an answer if you want”, and so as not to give special treatment to
any one student, he announced to the class that he was adding a possible answer to
the Mean Value Theorem question. So I marked my added answer, did the rest of
the exam, and then had time to come back to this question. Upon further reflection
it became clear that one-sided differentiability at the endpoints was not in fact re-
quired, i.e., one of the pre-existing choices was the correct answer and not the one I
had added. So I changed my answer and submitted my exam. As you can see from
the statement above, my final answer was correct. But many students in the class
figured that if I had successfully lobbied for an additional answer then this answer
was probably the correct one, with the effect that they changed their answer from
the correct answer to my incorrect added answer! They were not so thrilled with
either me or the teacher, but in my opinion he at least behaved admirably: this
was a real “teachable moment”!

One should certainly draw a picture to go with the Mean Value Theorem, as it
has a very simple geometric interpretation: under the hypotheses of the theorem,
there exists at least one interior point c of the interval such that the tangent line
at c is parallel to the secant line joining the endpoints of the interval.
And one should also interpret it physically: if y = f(x) gives the position of a particle at a given time x, then the expression (f(b) − f(a))/(b − a) is nothing less than the average velocity between time a and time b, whereas the derivative f′(c) is the instantaneous velocity at time c, so that the Mean Value Theorem says that there is at

least one instant at which the instantaneous velocity is equal to the average velocity.

Example: Suppose that cameras are set up at certain checkpoints along an in-
terstate highway in Georgia. One day you receive in the mail photos of yourself
at two checkpoints. The two checkpoints are 90 miles apart and the second photo
is taken 73 minutes after the first photo. You are issued a ticket for violating the
speeed limit of 70 miles per hour. The enclosed letter explains: your average veloc-
ity was (90 miles) / (73 minutes) · (60 minutes) / (hour) ≈ 73.94 miles per hour.
Thus, although no one saw you violating the speed limit, they may mathematically
deduce that at some point your instantaneous velocity was over 70 mph. Guilt by
the Mean Value Theorem!
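The arithmetic behind the ticket is easy to check by machine. Here is a short sketch in Python (the numbers are the ones from the example above; the variable names are ours):

```python
# Numerical companion to the speeding-ticket example (illustrative sketch).
# Two checkpoint photos 90 miles and 73 minutes apart determine the average
# velocity; the Mean Value Theorem then guarantees that at some instant the
# instantaneous velocity equaled it.
distance_miles = 90.0
time_hours = 73.0 / 60.0  # 73 minutes expressed in hours

average_velocity = distance_miles / time_hours
print(round(average_velocity, 2))  # ≈ 73.97 mph, above the 70 mph limit
```

Since 73.97 > 70, the driver's instantaneous velocity exceeded the limit at some instant.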

4.2. Proof of the Mean Value Theorem.

We will deduce the Mean Value Theorem from the Extreme Value Theorem (which
we have not yet proven, but all in good time...). However, it is convenient to first
establish a special case.
Theorem 16. (Rolle’s Theorem) Let f : [a, b] → R. We suppose:
(i) f is continuous on [a, b].
(ii) f is differentiable on (a, b).
(iii) f (a) = f (b).
Then there exists c with a < c < b and f ′ (c) = 0.
Proof. By the Extreme Value Theorem, f has a maximum M and a minimum m.
Case 1: Suppose M > f (a) = f (b). Then the maximum value does not occur
at either endpoint. Since f is differentiable on (a, b), it must therefore occur at a
stationary point: i.e., there exists c ∈ (a, b) with f ′ (c) = 0.
Case 2: Suppose m < f (a) = f (b). Then the minimum value does not occur at
either endpoint. Since f is differentiable on (a, b), it must therefore occur at a
stationary point: there exists c ∈ (a, b) with f ′ (c) = 0.
Case 3: The remaining case is f (a) ≤ m ≤ M ≤ f (a), which implies m = M =
f (a) = f (b), so f is constant. In this case f ′ (c) = 0 at every point c ∈ (a, b)! 
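As a sanity check, one can hunt for the point c promised by Rolle's Theorem numerically. The sketch below (illustrative only; the grid size and difference step are arbitrary choices) does this for f(x) = x(1 − x) on [0, 1], where f(0) = f(1) = 0:

```python
# Illustrative sketch of Rolle's Theorem: f(x) = x(1 - x) satisfies f(0) = f(1) = 0,
# so some interior point c must have f'(c) = 0. We locate it by scanning a fine
# grid for a sign change of a central-difference approximation to f'.
def f(x):
    return x * (1 - x)

def fprime(x, h=1e-6):
    # central-difference approximation to the derivative
    return (f(x + h) - f(x - h)) / (2 * h)

n = 10000
c = None
for i in range(1, n):
    x0, x1 = i / n, (i + 1) / n
    if fprime(x0) >= 0 >= fprime(x1):  # f' changes sign from + to -
        c = (x0 + x1) / 2
        break

print(c)  # close to 0.5, where f'(c) = 1 - 2c = 0
```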

To deduce the Mean Value Theorem from Rolle’s Theorem, it is tempting to tilt
our head until the secant line from (a, f (a)) to (b, f (b)) becomes horizontal and
then apply Rolle’s Theorem. The possible flaw here is that if we start with a subset
of the plane which is the graph of a function and rotate it too much, it may no longer
be the graph of a function, so Rolle’s Theorem does not apply.
The above objection is just a technicality. In fact, it suggests that more is true:
there should be some version of the Mean Value Theorem which applies to curves
in the plane which are not necessarily graphs of functions. Indeed we will meet
such a generalization later – the Cauchy Mean Value Theorem – and use it
to prove L’Hôpital’s Rule – but at the moment it is, alas, easier to use a simple trick.

Proof of the Mean Value Theorem: Let f : [a, b] → R be continuous on [a, b] and
differentiable on (a, b). There is a unique linear function L(x) such that L(a) = f (a)
and L(b) = f (b): indeed, L is nothing else than the secant line to f between (a, f (a))
and (b, f (b)). Here’s the trick: by subtracting L(x) from f (x) we reduce ourselves
to a situation where we may apply Rolle’s Theorem, and then the conclusion that

we get is easily seen to be the one we want about f .


Here goes: define
g(x) = f (x) − L(x).
Then g is defined and continuous on [a, b], differentiable on (a, b), and g(a) =
f (a) − L(a) = f (a) − f (a) = 0 = f (b) − f (b) = f (b) − L(b) = g(b). Applying Rolle’s
Theorem to g, there exists c ∈ (a, b) such that g ′ (c) = 0. On the other hand, since
L is a linear function with slope (f (b) − f (a))/(b − a), we compute
0 = g ′ (c) = f ′ (c) − L′ (c) = f ′ (c) − (f (b) − f (a))/(b − a),
and thus
f ′ (c) = (f (b) − f (a))/(b − a).
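The proof's trick of subtracting the secant line also gives a practical recipe for locating a point c: find a zero of g′ = f′ − (secant slope). A numerical sketch (illustrative only; f and the interval are arbitrary choices) for f(x) = x³ on [0, 2]:

```python
# Sketch: numerically finding a point c guaranteed by the Mean Value Theorem.
# For f(x) = x**3 on [0, 2] the secant slope is (f(2) - f(0))/2 = 4, and
# f'(c) = 3c**2 = 4 at c = 2/sqrt(3). Following the proof, we look for a zero
# of g'(x) = f'(x) - (secant slope) by bisection.
import math

a, b = 0.0, 2.0
f = lambda x: x ** 3
slope = (f(b) - f(a)) / (b - a)

gprime = lambda x: 3 * x ** 2 - slope  # g' = f' - secant slope

# bisection: gprime(a) = -4 < 0 and gprime(b) = 8 > 0
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if gprime(mid) < 0:
        lo = mid
    else:
        hi = mid

c = (lo + hi) / 2
print(round(c, 4), round(2 / math.sqrt(3), 4))  # both ≈ 1.1547
```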
4.3. The Cauchy Mean Value Theorem.

We present here a modest generalization of the Mean Value Theorem due to A.L.
Cauchy. Although perhaps not as fundamental and physically appealing as the
Mean Value Theorem, it certainly has its place: for instance it may be used to
prove L’Hôpital’s Rule.
Theorem 17. (Cauchy Mean Value Theorem) Let f, g : [a, b] → R be continuous
and differentiable on (a, b). Then there exists c ∈ (a, b) such that
(2) (f (b) − f (a))g ′ (c) = (g(b) − g(a))f ′ (c).
Proof. Case 1: Suppose g(a) = g(b). By Rolle’s Theorem, there is c ∈ (a, b) such
that g ′ (c) = 0. With this value of c, both sides of (2) are zero, hence they are equal.
Case 2: Suppose g(a) ̸= g(b), and define
h(x) = f (x) − ((f (b) − f (a))/(g(b) − g(a))) · g(x).
Then h is continuous on [a, b], differentiable on (a, b), and
h(a) = (f (a)(g(b) − g(a)) − g(a)(f (b) − f (a)))/(g(b) − g(a)) = (f (a)g(b) − g(a)f (b))/(g(b) − g(a)),
h(b) = (f (b)(g(b) − g(a)) − g(b)(f (b) − f (a)))/(g(b) − g(a)) = (f (a)g(b) − g(a)f (b))/(g(b) − g(a)),
so h(a) = h(b).6 By Rolle’s Theorem there exists c ∈ (a, b) with
0 = h′ (c) = f ′ (c) − ((f (b) − f (a))/(g(b) − g(a))) · g ′ (c),
or equivalently,
(f (b) − f (a))g ′ (c) = (g(b) − g(a))f ′ (c).


Exercise: Which choice of g recovers the “ordinary” Mean Value Theorem?
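One can also verify equation (2) numerically for specific f and g. The following sketch (an illustration only, with f, g, and the interval chosen arbitrarily) finds the point c for f(x) = x² and g(x) = x³ on [1, 2]:

```python
# Sketch verifying the Cauchy Mean Value Theorem for f(x) = x**2, g(x) = x**3
# on [1, 2]. Equation (2) reads (f(b)-f(a)) g'(c) = (g(b)-g(a)) f'(c), i.e.
# 3 * 3c**2 = 7 * 2c, which solves to c = 14/9, inside (1, 2).
a, b = 1.0, 2.0
f, fp = (lambda x: x ** 2), (lambda x: 2 * x)
g, gp = (lambda x: x ** 3), (lambda x: 3 * x ** 2)

# the difference of the two sides of (2), as a function of the candidate point
h = lambda x: (f(b) - f(a)) * gp(x) - (g(b) - g(a)) * fp(x)

# h(1) = -5 < 0 < 8 = h(2); bisect to find the sign change
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if h(lo) * h(mid) <= 0:
        hi = mid
    else:
        lo = mid

c = (lo + hi) / 2
print(round(c, 4))  # ≈ 1.5556 = 14/9
```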

6. Don’t be so impressed: we wanted a constant C such that if h(x) = f (x) − Cg(x), then
h(a) = h(b), so we set f (a) − Cg(a) = f (b) − Cg(b) and solved for C.

5. Monotone Functions
5.1. The Monotone Function Theorems.

The Mean Value Theorem has several important consequences. Foremost of all
it will be used in the proof of the Fundamental Theorem of Calculus, but that’s for
later. At the moment we can use it to establish a criterion for a function f to be
increasing / weakly increasing / decreasing / weakly decreasing on an interval in
terms of a sign condition on f ′ .
Theorem 18. (First Monotone Function Theorem) Let I be an open interval, and
let f : I → R be a function which is differentiable on I.
a) Suppose f ′ (x) > 0 for all x ∈ I. Then f is increasing on I: for all x1 , x2 ∈ I
with x1 < x2 , f (x1 ) < f (x2 ).
b) Suppose f ′ (x) ≥ 0 for all x ∈ I. Then f is weakly increasing on I: for all
x1 , x2 ∈ I with x1 < x2 , f (x1 ) ≤ f (x2 ).
c) Suppose f ′ (x) < 0 for all x ∈ I. Then f is decreasing on I: for all x1 , x2 ∈ I
with x1 < x2 , f (x1 ) > f (x2 ).
d) Suppose f ′ (x) ≤ 0 for all x ∈ I. Then f is weakly decreasing on I: for all
x1 , x2 ∈ I with x1 < x2 , f (x1 ) ≥ f (x2 ).
Proof. a) We go by contraposition: suppose that f is not increasing: then there
exist x1 , x2 ∈ I with x1 < x2 such that f (x1 ) ≥ f (x2 ). Apply the Mean Value
Theorem to f on [x1 , x2 ]: there exists x1 < c < x2 such that f ′ (c) = (f (x2 ) − f (x1 ))/(x2 − x1 ) ≤ 0.
b) Again, we argue by contraposition: suppose that f is not weakly increasing: then
there exist x1 , x2 ∈ I with x1 < x2 such that f (x1 ) > f (x2 ). Apply the Mean Value
Theorem to f on [x1 , x2 ]: there exists x1 < c < x2 such that f ′ (c) = (f (x2 ) − f (x1 ))/(x2 − x1 ) < 0.
c),d) We leave these proofs to the reader. One may either proceed exactly as in
parts a) and b), or reduce to them by multiplying f by −1. 
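For a concrete check of part b), take f(x) = x + sin x: its derivative 1 + cos x is ≥ 0 everywhere, and is zero only at the odd multiples of π, so f is weakly increasing. A numerical spot-check (an illustrative sketch only; the grid is an arbitrary choice) in fact finds strict increase on a sample grid:

```python
# Sketch: the First Monotone Function Theorem in action. f(x) = x + sin(x) has
# f'(x) = 1 + cos(x) >= 0 everywhere (zero only at odd multiples of pi), so f
# is weakly increasing. We spot-check monotonicity on a sample grid.
import math

f = lambda x: x + math.sin(x)
xs = [i / 10 for i in range(-100, 101)]  # grid on [-10, 10]

increasing = all(f(x1) < f(x2) for x1, x2 in zip(xs, xs[1:]))
print(increasing)  # True
```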
Corollary 19. (Zero Velocity Theorem) Let f : I → R be a differentiable function
with identically zero derivative. Then f is constant.
Proof. Since f ′ (x) ≥ 0 for all x ∈ I, f is weakly increasing on I: x1 < x2 =⇒
f (x1 ) ≤ f (x2 ). Since f ′ (x) ≤ 0 for all x ∈ I, f is weakly decreasing on I: x1 <
x2 =⇒ f (x1 ) ≥ f (x2 ). But a function which is weakly increasing and weakly
decreasing satisfies: for all x1 < x2 , f (x1 ) ≤ f (x2 ) and f (x1 ) ≥ f (x2 ) and thus
f (x1 ) = f (x2 ): f is constant. 
Remark: The strategy of the above proof is to deduce Corollary 19 from the In-
creasing Function Theorem. In fact if we argued directly from the Mean Value
Theorem the proof would be significantly shorter: try it!
Corollary 20. Suppose f, g : I → R are both differentiable and such that f ′ = g ′
(equality as functions, i.e., f ′ (x) = g ′ (x) for all x ∈ I). Then there exists a constant
C ∈ R such that f = g + C, i.e., for all x ∈ I, f (x) = g(x) + C.
Proof. Let h = f − g. Then h′ = (f − g)′ = f ′ − g ′ ≡ 0, so by Corollary 19, h ≡ C
and thus f = g + h = g + C. 
Remark: Corollary 20 can be viewed as the first “uniqueness theorem” for differen-
tial equations. Namely, suppose that f : I → R is some function, and consider the
set of all functions F : I → R such that F ′ = f . Then Corollary 20 asserts that if

there is a function F such that F ′ = f , then there is a one-parameter family of


such functions, and more specifically that the general such function is of the form
F + C. In perhaps more familiar terms, this asserts that antiderivatives are unique
up to an additive constant, when they exist.

On the other hand, the existence question lies deeper: namely, given f : I → R,
must there exist F : I → R such that F ′ = f ? In general the answer is no.

Exercise: Let f : R → R by f (x) = 0 for x ≤ 0 and f (x) = 1 for x > 0. Show that
there is no function F : R → R such that F ′ = f .

In other words, not every function f : R → R has an antiderivative, i.e., is


the derivative of some other function. It turns out that every continuous function
has an antiderivative: this will be proved in the second half of the course. (On the
third hand, there are some discontinuous functions which have antiderivatives, but
it is too soon to get into this...)
Corollary 21. Let n ∈ Z+ , and let f : I → R be a function whose nth derivative
f (n) is identically zero. Then f is a polynomial function of degree at most n − 1.
Proof. Exercise. (Hint: use induction.) 
The setting of the Increasing Function Theorem is that of a differentiable function
defined on an open interval I. This is just a technical convenience: for continu-
ous functions, the increasing / decreasing / weakly increasing / weakly decreasing
behavior on the interior of I implies the same behavior at an endpoint of I.
Theorem 22. Let f : [a, b] → R be a function. We suppose:
(i) f is continuous at x = a and x = b.
(ii) f is weakly increasing (resp. increasing, weakly decreasing, decreasing) on
(a, b).
Then f is weakly increasing (resp. increasing, weakly decreasing, decreasing) on
[a, b].
Remark: The “resp.” in the statement above is an abbreviation for “respectively”.
Use of respectively in this way is a shorthand way for writing out several cognate
statements. In other words, we really should have four different statements, each
one of the form “if f has property X on (a, b), then it also has property X on
[a, b]”, where X runs through weakly increasing, increasing, weakly decreasing, and
decreasing. Use of “resp.” in this way is not great mathematical writing, but it is
sometimes seen as preferable to the tedium of writing out a large number of very
similar statements. It certainly occurs often enough for you to get used to seeing
and understanding it.
Proof. There are many similar statements here; let’s prove some of them.
Step 1: Suppose that f is continuous at a and weakly increasing on (a, b). We
will show that f is weakly increasing on [a, b). Indeed, assume not: then there
exists x0 ∈ (a, b) such that f (a) > f (x0 ). Now take ϵ = f (a) − f (x0 ); since f
is (right-)continuous at a, there exists δ > 0 such that for all a ≤ x < a + δ,
|f (x) − f (a)| < f (a) − f (x0 ), which implies f (x) > f (x0 ). By taking a < x < x0 ,
this contradicts the assumption that f is weakly increasing on (a, b).
Step 2: Suppose that f is continuous at a and increasing on (a, b). We will show

that f is increasing on [a, b). Note first that Step 1 applies to show that f (a) ≤ f (x)
for all x ∈ (a, b), but we want slightly more than this, namely strict inequality. So,
seeking a contradiction, we suppose that f (a) = f (x0 ) for some x0 ∈ (a, b). But
now take x1 ∈ (a, x0 ): since f is increasing on (a, b) we have f (x1 ) < f (x0 ) = f (a),
contradicting the fact that f is weakly increasing on [a, b).
Step 3: In a similar way one can handle the right endpoint b. Now suppose that
f is increasing on [a, b) and also increasing on (a, b]. It remains to show that f is
increasing on [a, b]. The only thing that could go wrong is f (a) ≥ f (b). To see that
this cannot happen, choose any c ∈ (a, b): then f (a) < f (c) < f (b). 
Let us say that a function f : I → R is monotone if it is either increasing on
I or decreasing on I, and also that f is weakly monotone if it is either weakly
increasing on I or weakly decreasing on I.
Theorem 23. (Second Monotone Function Theorem) Let f : I → R be a function
which is continuous on I and differentiable on the interior I ◦ of I (i.e., at every
point of I except possibly at any endpoints I may have).
a) The following are equivalent:
(i) f is weakly monotone.
(ii) Either we have f ′ (x) ≥ 0 for all x ∈ I ◦ or f ′ (x) ≤ 0 for all x ∈ I ◦ .
b) Suppose f is weakly monotone. The following are equivalent:
(i) f is not monotone.
(ii) There exist a, b ∈ I ◦ with a < b such that the restriction of f to [a, b] is constant.
(iii) There exist a, b ∈ I ◦ with a < b such that f ′ (x) = 0 for all x ∈ [a, b].
Proof. Throughout the proof we restrict our attention to increasing / weakly in-
creasing functions, leaving the other case to the reader as a routine exercise.
a) (i) =⇒ (ii): Suppose f is weakly increasing on I. We claim f ′ (x) ≥ 0 for all
x ∈ I ◦ . If not, there is a ∈ I ◦ with f ′ (a) < 0. Then f is decreasing at a, so there
exists b > a with f (b) < f (a), contradicting the fact that f is weakly increasing.
(ii) =⇒ (i): Immediate from the Increasing Function Theorem and Theorem 22.
b) (i) =⇒ (ii): Suppose f is weakly increasing on I but not increasing on I. By
Theorem 22 f is still not increasing on I ◦ , so there exist a, b ∈ I ◦ with a < b such
that f (a) = f (b). Then, since f is weakly increasing, for all c ∈ [a, b] we have
f (a) ≤ f (c) ≤ f (b) = f (a), so f is constant on [a, b].
(ii) =⇒ (iii): If f is constant on [a, b], f ′ is identically zero on [a, b].
(iii) =⇒ (i): If f ′ is identically zero on some subinterval [a, b], then by the Zero
Velocity Theorem f is constant on [a, b], hence is not increasing. 
The next result follows immediately.
Corollary 24. Let f : I → R be differentiable. Suppose that f ′ (x) ≥ 0 for all
x ∈ I, and that f ′ (x) > 0 except at a finite set of points x1 , . . . , xn . Then f is
increasing on I.
Example: A typical application of Corollary 24 is to show that the function f :
R → R by f (x) = x3 is increasing on all of R. Indeed, f ′ (x) = 3x2 which is strictly
positive at all x ̸= 0 and 0 at x = 0.
5.2. The First Derivative Test.

We can use Theorem 22 to quickly derive another staple of freshman calculus.



Theorem 25. (First Derivative Test) Let I be an interval, a an interior point of I,


and f : I → R a function. We suppose that f is continuous on I and differentiable
on I \ {a} – i.e., differentiable at every point of I except possibly at x = a. Then:
a) If there exists δ > 0 such that f ′ (x) is negative on (a − δ, a) and positive on
(a, a + δ), then f has a strict local minimum at a.
b) If there exists δ > 0 such that f ′ (x) is positive on (a − δ, a) and negative on
(a, a + δ), then f has a strict local maximum at a.
Proof. a) By the First Monotone Function Theorem, since f ′ is negative on the
open interval (a − δ, a) and positive on the open interval (a, a + δ), f is decreasing
on (a − δ, a) and increasing on (a, a + δ). Moreover, since f is differentiable on its
entire domain, it is continuous at a − δ, a and a + δ, and thus Theorem 22 applies
to show that f is decreasing on [a − δ, a] and increasing on [a, a + δ]. This gives
the desired result, since it implies that f (a) is strictly smaller than f (x) for any
x ∈ [a − δ, a) or in (a, a + δ].
b) As usual this may be proved either by revisiting the above argument or deduced
directly from the result of part a) by multiplying f by −1. 

Remark: This version of the First Derivative Test is a little stronger than the
familiar one from freshman calculus in that we have not assumed that f ′ (a) = 0
nor even that f is differentiable at a. Thus for instance our version of the test
applies to f (x) = |x| to show that it has a strict local minimum at x = 0.
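A quick numerical spot-check of the |x| example (an illustrative sketch; the grid is an arbitrary choice):

```python
# Sketch: our First Derivative Test applies to f(x) = |x| even though f is not
# differentiable at 0: f' = -1 on (-d, 0) and f' = +1 on (0, d), so 0 should be
# a strict local minimum. We confirm f(0) < f(x) at nearby sample points.
f = abs
pts = [i / 100 for i in range(-100, 101) if i != 0]
strict_min = all(f(0) < f(x) for x in pts)
print(strict_min)  # True
```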

5.3. The Second Derivative Test.


Theorem 26. (Second Derivative Test) Let a be an interior point of an interval
I, and let f : I → R. We suppose:
(i) f is twice differentiable at a, and
(ii) f ′ (a) = 0.
Then if f ′′ (a) > 0, f has a strict local minimum at a, whereas if f ′′ (a) < 0, f has
a strict local maximum at a.
Proof. As usual it suffices to handle the case f ′′ (a) > 0.
Notice that the hypothesis that f is twice differentiable at a implies that f is
differentiable on some interval (a − δ, a + δ) (otherwise it would not be meaningful to
talk about the derivative of f ′ at a). Our strategy will be to show that for sufficiently
small δ > 0, f ′ (x) is negative for x ∈ (a − δ, a) and positive for x ∈ (a, a + δ) and
then apply the First Derivative Test. To see this, consider
f ′′ (a) = lim_{x→a} (f ′ (x) − f ′ (a))/(x − a) = lim_{x→a} f ′ (x)/(x − a),
the second equality because f ′ (a) = 0.

We are assuming that this limit exists and is positive, so that there exists δ > 0
such that for all x ∈ (a − δ, a) ∪ (a, a + δ), f ′ (x)/(x − a) is positive. And this gives us exactly
what we want: suppose x ∈ (a − δ, a). Then f ′ (x)/(x − a) > 0 and x − a < 0, so f ′ (x) < 0.
On the other hand, suppose x ∈ (a, a + δ). Then f ′ (x)/(x − a) > 0 and x − a > 0, so
f ′ (x) > 0. So f has a strict local minimum at a by the First Derivative Test. 

Remark: When f ′ (a) = f ′′ (a) = 0, no conclusion can be drawn about the local
behavior of f at a: it may have a local minimum at a, a local maximum at a, be
increasing at a, decreasing at a, or none of the above.
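The three standard examples witnessing this remark are x⁴, −x⁴ and x³, all of which have first and second derivative zero at 0. A numerical spot-check (an illustrative sketch; the sample grid is arbitrary):

```python
# Sketch illustrating the remark: when f'(a) = f''(a) = 0 the test is silent.
# x**4 has a strict local min at 0, -x**4 a strict local max, and x**3 neither,
# yet all three have f'(0) = f''(0) = 0.
pts = [i / 100 for i in range(-50, 51) if i != 0]

is_min = lambda f: all(f(0) < f(x) for x in pts)
is_max = lambda f: all(f(0) > f(x) for x in pts)

quartic = lambda x: x ** 4
neg_quartic = lambda x: -x ** 4
cubic = lambda x: x ** 3

print(is_min(quartic), is_max(neg_quartic), is_min(cubic) or is_max(cubic))
# True True False
```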

5.4. Sign analysis and graphing.

When one is graphing a function f , the features of interest include number and
approximate locations of the roots of f , regions on which f is positive or negative,
regions on which f is increasing or decreasing, and local extrema, if any. For these
considerations one wishes to do a sign analysis on both f and its derivative f ′ .

Let us agree that a sign analysis of a function g : I → R is the determina-


tion of regions on which g is positive, negative and zero.

The basic strategy is to determine first the set of roots of g. As discussed before,
finding exact values of roots may be difficult or impossible even for polynomial
functions, but often it is feasible to determine at least the number of roots and
their approximate location (certainly this is possible for all polynomial functions,
although this requires justification that we do not give here). The next step is to
test a point in each region between consecutive roots to determine the sign.
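As a concrete sketch of this procedure (illustrative only; here the roots are known exactly), consider f(x) = x³ − x:

```python
# Sketch of the sign-analysis procedure for f(x) = x**3 - x, whose roots are
# -1, 0, 1. Between consecutive roots (and beyond the extreme ones) the sign
# is constant, so testing one point per region determines it everywhere.
f = lambda x: x ** 3 - x
roots = [-1.0, 0.0, 1.0]

# one test point in each of the four regions determined by the roots
test_points = [-2.0, -0.5, 0.5, 2.0]
signs = ['+' if f(x) > 0 else '-' for x in test_points]
print(signs)  # ['-', '+', '-', '+']
```

So f is negative on (−∞, −1), positive on (−1, 0), negative on (0, 1) and positive on (1, ∞).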

This procedure comes with two implicit assumptions. Let us make them explicit.

The first is that the roots of f are sparse enough to separate the domain I into
“regions”. One precise formulation of this is that f has only finitely many roots
on any bounded subset of its domain. This holds for all the elementary functions we
know and love, but certainly not for all functions, even all differentiable functions:
we have seen that things like x² sin(1/x) are not so well-behaved. But this is a convenient
assumption and in a given situation it is usually easy to see whether it holds.

The second assumption is more subtle: it is that if a function f takes a posi-


tive value at some point a and a negative value at some other point b then it must
take the value zero somewhere in between. Of course this does not hold for all
functions: it fails very badly, for instance, for the function f which takes the value
1 at every rational number and −1 at every irrational number.

Let us formalize the desired property and then say which functions satisfy it.

A function f : I → R has the intermediate value property if for all a, b ∈ I


with a < b and all L in between f (a) and f (b) – i.e., with f (a) < L < f (b) or
f (b) < L < f (a) – there exists some c ∈ (a, b) with f (c) = L.

Thus a function has the intermediate value property when it does not “skip” values.

Here are two important theorems, each asserting that a broad class of functions
has the intermediate value property.
Theorem 27. (Intermediate Value Theorem) Let f : [a, b] → R be a continuous
function defined on a closed, bounded interval. Then f has the intermediate value
property.
Example of a continuous function f : [0, 2] ∩ Q → Q failing the intermediate value
property: let f (x) = −1 for 0 ≤ x < √2 and f (x) = 1 for √2 < x ≤ 2.

The point of this example is to drive home the point that the Intermediate Value
Theorem is the second of our three “hard theorems” in the sense that we have no
chance to prove it without using special properties of the real numbers beyond the
ordered field axioms. And indeed we will not prove IVT right now, but we will use
it, just as we used but did not yet prove the Extreme Value Theorem. (However we
are now not so far away from the point at which we will “switch back”, talk about
completeness of the real numbers, and prove the three hard theorems.)

The Intermediate Value Theorem (or IVT) is ubiquitously useful. As alluded to


earlier, even such innocuous properties as every non-negative real number having a
square root contain an implicit appeal to IVT. From the present point of view, it
justifies the following observation.

Let f : I → R be a continuous function, and suppose that there are only finitely
many roots, i.e., there are x1 , . . . , xn ∈ I such that f (xi ) = 0 for all i and f (x) ̸= 0
for all other x ∈ I. Then I \ {x1 , . . . , xn } is a finite union of intervals, and on each
of them f has constant sign: it is either always positive or always negative.
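This observation also underlies the bisection method: a sign change of a continuous function traps a root, by IVT. A minimal sketch, producing √2 as a root of x² − 2:

```python
# Sketch: IVT is what makes bisection work. f(x) = x**2 - 2 is continuous with
# f(0) = -2 < 0 < 2 = f(2), so IVT guarantees a root in (0, 2). Repeatedly
# halving the interval while preserving the sign change homes in on it.
f = lambda x: x ** 2 - 2
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid   # root still lies to the right of mid
    else:
        hi = mid   # root lies in [lo, mid]
root = (lo + hi) / 2
print(round(root, 6))  # 1.414214
```

Incidentally, this is the implicit appeal to IVT behind the claim that every non-negative real number has a square root.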

So this is how sign analysis works for a function f when f is continuous – a very
mild assumption. But as above we also want to do a sign analysis of the derivative
f ′ : how may we justify this?

Well, here is one very reasonable justification: if the derivative f ′ of f is itself


continuous, then by IVT it too has the intermediate value property and thus, at
least if f ′ has only finitely many roots on any bounded interval, sign analysis is
justified. This brings up the following basic question.
Question 1. Let f : I → R be a differentiable function. Must its derivative
f ′ : I → R be continuous?
Let us first pause to appreciate the subtlety of the question: we are not asking
whether f differentiable implies f continuous: we proved long ago and have used
many times that this is the case. Rather we are asking whether the new function
f ′ can exist at every point of I but fail to itself be a continuous function. In fact
the answer is yes.

Example: Let f (x) = x² sin(1/x) for x ̸= 0, and f (0) = 0. I claim that f is
differentiable on all of R but that the derivative is discontinuous at x = 0, and in
fact that limx→0 f ′ (x) does not exist.
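Granting the claim (the computation gives f′(x) = 2x sin(1/x) − cos(1/x) for x ≠ 0, and f′(0) = 0 by a squeeze argument), a numerical sketch makes the oscillation visible; the sampling points below are arbitrary choices:

```python
# Sketch backing the claim: for f(x) = x**2 * sin(1/x) (with f(0) = 0), one has
# f'(x) = 2x*sin(1/x) - cos(1/x) for x != 0. Sampling f' at points shrinking
# toward 0 shows it oscillating between values near -1 and +1, so
# lim_{x -> 0} f'(x) does not exist.
import math

def fprime(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# at x = 1/(pi*n) the cosine term is (-1)**n, so f' alternates in sign
samples = [fprime(1 / (math.pi * n)) for n in range(100, 108)]
print([round(s, 3) for s in samples])  # alternates near +1 and -1
```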
Theorem 28. (Darboux) Let f : I → R be a differentiable function. Suppose
that we have a, b ∈ I with a < b and f ′ (a) < f ′ (b). Then for every L ∈ R with
f ′ (a) < L < f ′ (b), there exists c ∈ (a, b) such that f ′ (c) = L.
Proof. Step 1: First we handle the special case L = 0, which implies f ′ (a) < 0 and
f ′ (b) > 0. Now f is a differentiable – hence continuous – function defined on the
closed interval [a, b] so assumes its minimum value at some point c ∈ [a, b]. If c is
an interior point, then as we have seen, it must be a stationary point: f ′ (c) = 0.
But the hypotheses guarantee this: since f ′ (a) < 0, f is decreasing at a, thus
takes smaller values slightly to the right of a, so the minimum cannot occur at a.
Similarly, since f ′ (b) > 0, f is increasing at b, thus takes smaller values slightly to

the left of b, so the minimum cannot occur at b.


Step 2: We now reduce the general case to the special case of Step 1 by defining
g(x) = f (x) − Lx. Then g is still differentiable, g ′ (a) = f ′ (a) − L < 0 and g ′ (b) =
f ′ (b) − L > 0, so by Step 1, there exists c ∈ (a, b) such that 0 = g ′ (c) = f ′ (c) − L.
In other words, there exists c ∈ (a, b) such that f ′ (c) = L. 
Remark: Of course there is a corresponding version of the theorem when f ′ (b) <
L < f ′ (a). Darboux’s Theorem is also often called the Intermediate Value The-
orem For Derivatives, terminology we will understand better when we discuss
the Intermediate Value Theorem (for arbitrary continuous functions).

Exercise: Let a be an interior point of an interval I, and suppose f : I → R is


a differentiable function. Show that the function f ′ cannot have a simple dis-
continuity at x = a. (Recall that a function g has a simple discontinuity at a
if limx→a− g(x) and limx→a+ g(x) both exist but either they are unequal to each
other or they are unequal to g(a).)
5.5. A theorem of Spivak.

The following theorem is taken directly from Spivak’s book (Theorem 7 of Chapter
11): it does not seem to be nearly as well known as Darboux’s Theorem (and in
fact I think I encountered it for the first time in Spivak’s book).
Theorem 29. Let a be an interior point of I, and let f : I → R. Suppose:
(i) f is continuous on I,
(ii) f is differentiable on I \ {a}, i.e., at every point of I except possibly at a, and
(iii) limx→a f ′ (x) = L exists.
Then f is differentiable at a and f ′ (a) = L.
Proof. Choose δ > 0 such that (a − δ, a + δ) ⊂ I. Let x ∈ (a, a + δ). Then f is
differentiable at x, and we may apply the Mean Value Theorem to f on [a, x]: there
exists cx ∈ (a, x) such that
(f (x) − f (a))/(x − a) = f ′ (cx ).
Now, as x → a+ , every point of the interval [a, x] gets arbitrarily close to a, so
lim_{x→a+} cx = a and thus
fR′ (a) = lim_{x→a+} (f (x) − f (a))/(x − a) = lim_{x→a+} f ′ (cx ) = lim_{x→a+} f ′ (x) = L.
By a similar argument involving x ∈ (a − δ, a) we get
fL′ (a) = lim_{x→a−} f ′ (x) = L,
so f is differentiable at a and f ′ (a) = L. 

6. Inverse Functions I: Theory


6.1. Review of inverse functions.

Let X and Y be sets, and let f : X → Y be a function between them. Recall


that an inverse function is a function g : Y → X such that
g ◦ f = 1X : X → X, f ◦ g = 1Y : Y → Y.

Let’s unpack this notation: it means the following: first, that for all x ∈ X, (g ◦
f )(x) = g(f (x)) = x; and second, that for all y ∈ Y , (f ◦ g)(y) = f (g(y)) = y.
Proposition 30. (Uniqueness of Inverse Functions) Let f : X → Y be a function.
Suppose that g1 , g2 : Y → X are both inverses of f . Then g1 = g2 .
Proof. For all y ∈ Y , we have
g1 (y) = (g2 ◦ f )(g1 (y)) = g2 (f (g1 (y))) = g2 (y).

Since the inverse function to f is always unique provided it exists, we denote it by
f −1 . (Caution: In general this has nothing to do with 1/f . Thus sin−1 (x) ̸= csc(x) =
1/ sin x. Because this is legitimately confusing, many calculus texts write the inverse
sine function as arcsin x. But in general one needs to get used to f −1 being used
for the inverse function.)

We now turn to giving conditions for the existence of the inverse function. Re-
call that f : X → Y is injective if for all x1 , x2 ∈ X, x1 ̸= x2 =⇒ f (x1 ) ̸= f (x2 ).
In other words, distinct x-values get mapped to distinct y-values. (And in yet other
words, the graph of f satisfies the horizontal line test.) Also f : X → Y is surjec-
tive if for all y ∈ Y , there exists at least one x ∈ X such that y = f (x).

Putting these two concepts together we get the important notion of a bijective
function f : X → Y , i.e., a function which is both injective and surjective. Other-
wise put, for all y ∈ Y there exists exactly one x ∈ X such that y = f (x). It may
well be intuitively clear that bijectivity is exactly the condition needed to guarantee
existence of the inverse function: if f is bijective, we define f −1 (y) = xy , the unique
element of X such that f (xy ) = y. And if f is not bijective, this definition breaks
down and thus we are unable to define f −1 . Nevertheless we ask the reader to bear
with us as we give a slightly tedious formal proof of this.
Theorem 31. (Existence of Inverse Functions) For f : X → Y , TFAE:
(i) f is bijective.
(ii) f admits an inverse function.
Proof. (i) =⇒ (ii): If f is bijective, then as above, for each y ∈ Y there exists
exactly one element of X – say xy – such that f (xy ) = y. We may therefore define a
function g : Y → X by g(y) = xy . Let us verify that g is in fact the inverse function
of f . For any x ∈ X, consider g(f (x)). Because f is injective, the only element
x′ ∈ X such that f (x′ ) = f (x) is x′ = x, and thus g(f (x)) = x. For any y ∈ Y , let
xy be the unique element of X such that f (xy ) = y. Then f (g(y)) = f (xy ) = y.
(ii) =⇒ (i): Suppose that f −1 exists. To see that f is injective, let x1 , x2 ∈ X
be such that f (x1 ) = f (x2 ). Applying f −1 on the left gives x1 = f −1 (f (x1 )) =
f −1 (f (x2 )) = x2 . So f is injective. To see that f is surjective, let y ∈ Y . Then
f (f −1 (y)) = y, so there is x ∈ X with f (x) = y, namely x = f −1 (y). 
For any function f : X → Y , we define the image of f to be {y ∈ Y | ∃x ∈ X, y =
f (x)}. The image of f is often denoted f (X).7

7. This is sometimes called the range of f , but sometimes not. It is safer to call it the image!

We now introduce the dirty trick of codomain restriction. Let f : X → Y


be any function. Then if we replace the codomain Y by the image f (X), we still
get a well-defined function f : X → f (X), and this new function is tautologically
surjective. (Imagine that you manage the up-and-coming band Yellow Pigs. You
get them a gig one night in an enormous room filled with folding chairs. After
everyone sits down you remove all the empty chairs, and the next morning you
write a press release saying that Yellow Pigs played to a “packed house”. This is
essentially the same dirty trick as codomain restriction.)

Example: Let f : R → R by f (x) = x2 . Then f (R) = [0, ∞), and although


x2 : R → R is not surjective, x2 : R → [0, ∞) certainly is.
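A sketch of the same dirty trick in code (illustrative only; here we also restrict the domain to [0, ∞) to get injectivity, anticipating Corollary 32 below):

```python
# Sketch: codomain restriction in practice. x**2 : R -> R is neither injective
# nor surjective, but restricting the domain to [0, oo) makes it injective and
# restricting the codomain to the image [0, oo) makes it surjective; the
# resulting bijection has the square root as its inverse.
import math

f = lambda x: x ** 2

xs = [i / 10 for i in range(0, 50)]  # sample of the restricted domain [0, 5)
round_trip_ok = all(abs(math.sqrt(f(x)) - x) < 1e-12 for x in xs)
print(round_trip_ok)  # True: sqrt inverts the restricted squaring function
```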

Since a codomain-restricted function is always surjective, it has an inverse iff it


is injective iff the original function is injective. Thus:
Corollary 32. For a function f : X → Y , the following are equivalent:
(i) The codomain-restricted function f : X → f (X) has an inverse function.
(ii) The original function f is injective.
6.2. The Interval Image Theorem.

Next we want to return to earth by considering functions f : I → R and their


inverses, concentrating on the case in which f is continuous.
Theorem 33. (Interval Image Theorem) Let I ⊂ R be an interval, and let f : I →
R be a continuous function. Then the image f (I) of f is also an interval.
Proof. At the moment we will give the proof only when I = [a, b], i.e., is closed
and bounded. The general case will be discussed later when we switch back to talk
about least upper bounds. Now suppose f : [a, b] → R is continuous. Then f has a
minimum value m, say at xm and a maximum value M , say at xM . It follows that
the image f ([a, b]) of f is a subset of the interval [m, M ]. Moreover, if L ∈ (m, M ),
then by the Intermediate Value Theorem there exists c in between xm and xM such
that f (c) = L. So f ([a, b]) = [m, M ]. 

Remark: Although we proved only a special case of the Interval Image Theorem,
in this case we proved a stronger result: if f is a continuous function defined on
a closed, bounded interval I, then f (I) is again a closed, bounded interval. One
might hope for analogues for other types of intervals, but in fact this is not true.

Exercise: Let I be a nonempty interval which is not of the form [a, b]. Let J
be any nonempty interval in R. Show that there is a continuous function f : I → R
with f (I) = J.

6.3. Monotone Functions and Invertibility.

Recall f : I → R is monotone if it is either increasing or decreasing. Every


monotone function is injective. (In fact, a weakly monotone function is monotone
if and only if it is injective.) Therefore our dirty trick of codomain restriction works
to show that if f : I → R is monotone, f : I → f (I) is bijective, hence invertible.
Thus in this sense we may speak of the inverse of any monotone function.

Proposition 34. Let f : I → f (I) be a monotone function.


a) If f is increasing, then f −1 : f (I) → I is increasing.
b) If f is decreasing, then f −1 : f (I) → I is decreasing.
Proof. As usual, we will content ourselves with the increasing case, the decreasing
case being so similar as to make a good exercise for the reader.
Seeking a contradiction we suppose that f −1 is not increasing: that is, there
exist y1 < y2 ∈ f (I) such that f −1 (y1 ) is not less than f −1 (y2 ). Since f −1 is an
inverse function, it is necessarily injective (if it weren’t, f itself would not be a
function!), so we cannot have f −1 (y1 ) = f −1 (y2 ), and thus the possibility we need
to rule out is f −1 (y2 ) < f −1 (y1 ). But if this holds we apply the increasing function
f to get y2 = f (f −1 (y2 )) < f (f −1 (y1 )) = y1 , a contradiction. 
Lemma 35. (Λ-V Lemma) Let f : I → R. The following are equivalent:
(i) f is not monotone: i.e., f is neither increasing nor decreasing.
(ii) At least one of the following holds:
(a) f is not injective.
(b) f admits a Λ-configuration: there exist a < b < c ∈ I with f (a) < f (b) > f (c).
(c) f admits a V -configuration: there exist a < b < c ∈ I with f (a) > f (b) < f (c).
Proof. Exercise! 
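As a brute-force numerical illustration of the lemma (an aside of mine, not a proof), one can scan sampled values of a function for Λ- or V-configurations: a monotone sample admits neither, while a non-monotone one, like x², does.

```python
def has_lambda_or_v(samples):
    """Given (x, f(x)) pairs sorted by x, look for a Lambda-configuration
    f(a) < f(b) > f(c) or a V-configuration f(a) > f(b) < f(c)."""
    ys = [y for _, y in samples]
    n = len(ys)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if ys[i] < ys[j] > ys[k] or ys[i] > ys[j] < ys[k]:
                    return True
    return False

xs = [i / 10 for i in range(-20, 21)]
print(has_lambda_or_v([(x, x**3) for x in xs]))   # False: x^3 is monotone
print(has_lambda_or_v([(x, x**2) for x in xs]))   # True: x^2 has a V at 0
```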
Theorem 36. Let f : I → R be continuous and injective. Then f is monotone.
Proof. We will suppose that f is injective and not monotone and show that it
cannot be continuous, which suffices. We may apply Lemma 35 to conclude that f
has either a Λ configuration or a V configuration.
Suppose first f has a Λ-configuration: there exist a < b < c ∈ I with f(a) <
f(b) > f(c). Then there exists L ∈ R such that f(a) < L < f(b) and f(c) < L < f(b).
If f were continuous then by the Intermediate Value Theorem there would be d ∈ (a, b)
and e ∈ (b, c) such that f(d) = f(e) = L, contradicting the injectivity of f.
Next suppose f has a V-configuration: there exist a < b < c ∈ I such that
f(a) > f(b) < f(c). Then there exists L ∈ R such that f(b) < L < f(a) and f(b) < L < f(c).
If f were continuous then by the Intermediate Value Theorem there would be
d ∈ (a, b) and e ∈ (b, c) such that f(d) = f(e) = L, contradicting injectivity. 
6.4. Inverses of Continuous Functions.
Theorem 37. (Continuous Inverse Function Theorem) Let f : I → R be injective
and continuous. Let J = f (I) be the image of f .
a) f : I → J is a bijection, and thus there is an inverse function f −1 : J → I.
b) J is an interval in R.
c) If I = [a, b], then either f is increasing and J = [f (a), f (b)] or f is decreasing
and J = [f (b), f (a)].
d) The function f −1 : J → I is also continuous.
Proof. [S, Thm. 12.3] Parts a) through c) simply recap previous results. The new
result is part d), that f −1 : J → I is continuous. By part c) and Proposition 34,
either f and f −1 are both increasing, or f and f −1 are both decreasing. As usual,
we restrict ourselves to the first case.
Let b ∈ J. We must show that lim_{y→b} f^{-1}(y) = f^{-1}(b). We may write b = f(a)
for a unique a ∈ I. Fix ϵ > 0. We want to find δ > 0 such that if f(a) − δ < y <
f(a) + δ, then a − ϵ < f^{-1}(y) < a + ϵ.
Take δ = min(f(a + ϵ) − f(a), f(a) − f(a − ϵ)). Then:
f(a − ϵ) ≤ f(a) − δ, f(a) + δ ≤ f(a + ϵ),
and thus if f(a) − δ < y < f(a) + δ we have
f(a − ϵ) ≤ f(a) − δ < y < f(a) + δ ≤ f(a + ϵ).
Since f^{-1} is increasing, we get
f^{-1}(f(a − ϵ)) < f^{-1}(y) < f^{-1}(f(a + ϵ)),
or
f^{-1}(b) − ϵ < f^{-1}(y) < f^{-1}(b) + ϵ.
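The proof's choice of δ can be exercised numerically. A sketch under my own assumptions: f(x) = x³ + x, which is increasing on all of R, with its inverse computed by bisection.

```python
def f(x):
    return x**3 + x          # increasing, hence invertible on R

def finv(y, lo=-10.0, hi=10.0):
    """Invert f by bisection; valid since f is increasing on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def check_delta(a, eps, trials=1000):
    """Verify that delta = min(f(a+eps) - f(a), f(a) - f(a-eps)) forces
    a - eps < finv(y) < a + eps whenever f(a) - delta < y < f(a) + delta."""
    b = f(a)
    delta = min(f(a + eps) - b, b - f(a - eps))
    for i in range(trials):
        y = b - delta + 2 * delta * (i + 0.5) / trials   # y strictly inside
        if not (a - eps < finv(y) < a + eps):
            return False
    return True

print(check_delta(a=1.0, eps=0.1))    # True
```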

Remark: To be honest with you, I don’t find the above proof to be very enlightening.
After reflecting a little bit on my dissatisfaction with this argument, I came up with
an alternate proof, which in my opinion is conceptually simpler, but depends on
the Monotone Jump Theorem, a characterization of the possible discontinuities
of a weakly monotone function. The proof of this theorem uses the Dedekind
completeness of the real numbers, so is postponed to the next part of the notes in
which we discuss completeness head-on.[8]
6.5. Inverses of Differentiable Functions.

In this section our goal is to determine conditions under which the inverse f^{-1}
of a differentiable function is differentiable, and if so to find a formula for (f^{-1})′.

Let’s first think about the problem geometrically. The graph of the inverse func-
tion y = f −1 (x) is obtained from the graph of y = f (x) by interchanging x and
y, or, put more geometrically, by reflecting the graph of y = f (x) across the line
y = x. Geometrically speaking y = f (x) is differentiable at x iff its graph has
a well-defined, nonvertical tangent line at the point (x, f (x)), and if a curve has
a well-defined tangent line, then reflecting it across a line should not change this.
Thus it should be the case that if f is differentiable, so is f −1 . Well, almost. Notice
the occurrence of “nonvertical” above: if a curve has a vertical tangent line, then
since a vertical line has “infinite slope” it does not have a finite-valued derivative.
So we need to worry about the possibility that reflection through y = x carries a
nonvertical tangent line to a vertical tangent line. When does this happen? Well,
the inverse function of the straight line y = mx + b is the straight line y = (1/m)(x − b)
– i.e., reflecting across y = x takes a line of slope m to a line of slope 1/m. Moreover,
it takes a horizontal line y = c to a vertical line x = c, so that is our answer: at
any point (a, b) = (a, f(a)) such that f′(a) = 0, the inverse function will fail
to be differentiable at the point (b, a) = (b, f^{-1}(b)) because it will have a vertical
tangent. Otherwise, the slope of the tangent line of the inverse function at (b, a) is
precisely the reciprocal of the slope of the tangent line to y = f(x) at (a, b).

Well, so the geometry tells us. It turns out to be quite straightforward to adapt
[8] Nevertheless in my lectures I did state the Monotone Jump Theorem at this point and use it
to give a second proof of the Continuous Inverse Function Theorem.

this geometric argument to derive the desired formula for (f −1 )′ (b), under the as-
sumption that f is differentiable. We will do this first. Then we need to come back
and verify that indeed f −1 is differentiable at b if f ′ (f −1 (b)) exists and is nonzero:
this turns out to be a bit stickier, but we are ready for it and we will do it.
Proposition 38. Let f : I → J be a bijective differentiable function. Suppose
that the inverse function f^{-1} : J → I is differentiable at b ∈ J. Then
(f^{-1})′(b) = 1/f′(f^{-1}(b)).
In particular, if f^{-1} is differentiable at b then f′(f^{-1}(b)) ̸= 0.
Proof. We need only implicitly differentiate the equation
f −1 (f (x)) = x,
getting
(3) (f −1 )′ (f (x))f ′ (x) = 1,
or
(f^{-1})′(f(x)) = 1/f′(x).
To apply this to get the derivative at b ∈ J, we just need to think a little about our
variables. Let a = f^{-1}(b), so f(a) = b. Evaluating the last equation at x = a gives
(f^{-1})′(b) = 1/f′(a) = 1/f′(f^{-1}(b)).
Moreover, since by (3) we have (f −1 )′ (b)f ′ (f −1 (b)) = 1, f ′ (f −1 (b)) ̸= 0. 
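Proposition 38's formula is easy to test numerically; a sketch with f(x) = x³ + x (my example; any increasing differentiable f with nonvanishing derivative would do), inverting by bisection and differentiating by a symmetric difference quotient.

```python
def f(x):
    return x**3 + x                 # increasing, so invertible on R

def fprime(x):
    return 3 * x**2 + 1

def finv(y, lo=-10.0, hi=10.0):
    """Invert f by bisection (valid since f is increasing)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def num_deriv(g, x, h=1e-6):
    """Symmetric difference quotient approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

b = 10.0
a = finv(b)                          # a = 2, since f(2) = 10
print(abs(num_deriv(finv, b) - 1 / fprime(a)) < 1e-4)   # (f^-1)'(b) = 1/f'(f^-1(b))
```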
As mentioned above, unfortunately we need to work a little harder to show the
differentiability of f −1 , and for this we cannot directly use Proposition 38 but end
up deriving it again. Well, enough complaining: here goes.
Theorem 39. (Differentiable Inverse Function Theorem) Let f : I → J be con-
tinuous and bijective. Let b be an interior point of J and put a = f −1 (b). Suppose
that f is differentiable at a and f ′ (a) ̸= 0. Then f −1 is differentiable at b, with the
familiar formula
(f^{-1})′(b) = 1/f′(a) = 1/f′(f^{-1}(b)).
Proof. [S, Thm. 12.5] We have
(f^{-1})′(b) = lim_{h→0} (f^{-1}(b + h) − f^{-1}(b))/h = lim_{h→0} (f^{-1}(b + h) − a)/h.
Since J = f (I), every b + h ∈ J is of the form
b + h = f(a + k_h)
for a unique k_h ∈ I. Since b + h = f(a + k_h), f^{-1}(b + h) = a + k_h; let's make this
substitution, as well as h = f(a + k_h) − f(a), in the limit we are trying to evaluate:[9]


(f^{-1})′(b) = lim_{h→0} (a + k_h − a)/(f(a + k_h) − b) = lim_{h→0} k_h/(f(a + k_h) − f(a)) = lim_{h→0} 1/((f(a + k_h) − f(a))/k_h).
We are getting close: the limit now looks like the reciprocal of the derivative of f
at a. The only issue is the pesky k_h, but if we can show that lim_{h→0} k_h = 0, then
[9] Unlike Spivak, we will include the subscript in k_h to remind ourselves that this k is defined
in terms of h: to my taste this reminder is worth a little notational complication.

we may simply replace the "lim_{h→0}" with "lim_{k_h→0}" and we'll be done.
But k_h = f^{-1}(b + h) − a, so – since f^{-1} is continuous by Theorem 37 – we have
lim_{h→0} k_h = lim_{h→0} f^{-1}(b + h) − a = f^{-1}(b + 0) − a = f^{-1}(b) − a = a − a = 0.
So as h → 0, k_h → 0 and thus
(f^{-1})′(b) = 1/lim_{k_h→0}((f(a + k_h) − f(a))/k_h) = 1/f′(a) = 1/f′(f^{-1}(b)).


7. Inverse Functions II: Examples and Applications


7.1. x^{1/n}.

In this section we illustrate the preceding concepts by defining and differentiating
the nth root function x^{1/n}. The reader should not now be surprised to hear that
we give separate consideration to the cases of odd n and even n.

Either way, let n > 1 be an integer, and consider
f : R → R, x ↦ x^n.
Case 1: n = 2k + 1 is odd. Then f′(x) = (2k + 1)x^{2k} = (2k + 1)(x^k)² is non-negative
for all x ∈ R and not identically zero on any subinterval [a, b] with a < b, so by
Theorem 23 f : R → R is increasing. Moreover, we have lim_{x→±∞} f(x) = ±∞.
Since f is continuous, by the Intermediate Value Theorem the image of f is all of
R. Moreover, f is everywhere differentiable and has a horizontal tangent only at
x = 0. Therefore there is an inverse function
f^{-1} : R → R
which is everywhere continuous and differentiable at every x ∈ R except x = 0 (at
which point there is a well-defined, but vertical, tangent line). It is typical to call
this function x^{1/n}.[10]
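Numerically, the vertical tangent at 0 shows up as secant slopes through the origin growing without bound. A small sketch of mine (the cbrt helper is my own, since Python's ** returns complex values for negative bases with fractional exponents):

```python
def cbrt(x):
    """Real cube root, defined for all real x."""
    return x ** (1 / 3) if x >= 0 else -((-x) ** (1 / 3))

# secant slope of y = x^(1/3) through (0, 0) and (h, h^(1/3))
for h in [1e-2, 1e-4, 1e-6]:
    print(cbrt(h) / h)    # roughly 21.5, 464.2, 10000: the slopes blow up
```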

Case 2: n = 2k is even. Then f′(x) = (2k)x^{2k−1} is positive when x > 0 and
negative when x < 0. Thus f is decreasing on (−∞, 0] and increasing on [0, ∞).
In particular it is not injective on its domain. If we want to get an inverse func-
tion, we need to engage in the practice of domain restriction. Unlike codomain
restriction, which can be done in exactly one way so as to result in a surjective
function, domain restriction brings with it many choices. Luckily for us, this is a
relatively simple case: if D ⊂ R, then the restriction of f to D will be injective if
and only if for each x ∈ R, at most one of x, −x lies in D. If we want the restricted
domain to be as large as possible, we should choose the domain to include 0 and
exactly one of x, −x for all x > 0. There are still lots of ways to do this, so let’s
try to impose another desirable property of the domain of a function: namely, if
possible we would like it to be an interval. A little thought shows that there are
two restricted domains which meet all these requirements: we may take D = [0, ∞)
or D = (−∞, 0].
[10] I'll let you think about why this is good notation: it has to do with the rules for
exponentiation.

7.2. L(x) and E(x).

Consider the function l : (0, ∞) → R given by l(x) = 1/x. As advertised, we will
soon be able to prove that every continuous function has an antiderivative, so
borrowing on this result we define L : (0, ∞) → R to be such that L′(x) = l(x). More
precisely, recall that when they exist antiderivatives are unique up to the addition
of a constant, so we may uniquely specify L(x) by requiring L(1) = 0.
Proposition 40. For all x, y ∈ (0, ∞), we have
(4) L(xy) = L(x) + L(y).
Proof. Let y ∈ (0, ∞) be regarded as fixed, and consider the function
f (x) = L(xy) − L(x) − L(y).
We have
f′(x) = L′(xy)(xy)′ − L′(x) = (1/(xy)) · y − 1/x = y/(xy) − 1/x = 0.
By the Zero Velocity Theorem, the function f(x) is a constant (depending, a priori,
on y), say C_y. Thus for all x ∈ (0, ∞),
L(xy) = L(x) + L(y) + C_y.
If we plug in x = 1 we get
L(y) = 0 + L(y) + C_y,
and thus C_y = 0, so L(xy) = L(x) + L(y). 
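Proposition 40 can be tested without knowing a closed form for L: approximate the antiderivative of 1/t by a midpoint Riemann sum. This is a numerical sketch of mine; the later material on integration justifies the construction itself.

```python
def L(x, n=100000):
    """Midpoint-rule approximation to the integral of 1/t from 1 to x,
    i.e. the antiderivative of 1/x normalized by L(1) = 0.  Assumes x > 0."""
    a, b = (1.0, x) if x >= 1 else (x, 1.0)
    h = (b - a) / n
    s = h * sum(1.0 / (a + (i + 0.5) * h) for i in range(n))
    return s if x >= 1 else -s

x, y = 2.0, 3.5
print(abs(L(x * y) - (L(x) + L(y))) < 1e-6)    # L(xy) = L(x) + L(y)
```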
Corollary 41. a) For all x ∈ (0, ∞) and n ∈ Z+, we have L(x^n) = nL(x).
b) For x ∈ (0, ∞), we have L(1/x) = −L(x).
c) We have limx→∞ L(x) = ∞, limx→0+ L(x) = −∞.
d) We have L((0, ∞)) = R.
Proof. a) An easy induction argument using L(x²) = L(x) + L(x) = 2L(x).
b) For any x ∈ (0, ∞) we have 0 = L(1) = L(x · 1/x) = L(x) + L(1/x).
c) Since L′(x) = 1/x > 0 for all x ∈ (0, ∞), L is increasing on (0, ∞). Since
L(1) = 0, for any x > 1, L(x) > 0. To be specific, take C = L(2), so C > 0.
Then by part a), L(2^n) = nL(2) = nC. By the Archimedean property of R, this
shows that L takes arbitrarily large values, and since it is increasing, this implies
lim_{x→∞} L(x) = ∞. To evaluate lim_{x→0+} L(x) we may proceed similarly: by part
b), L(1/2) = −L(2) = −C < 0, so L(1/2^n) = −nL(2) = −nC, so L takes arbitrarily
small values. Again, combined with the fact that L is increasing, this implies
lim_{x→0+} L(x) = −∞. (Alternately, we may evaluate lim_{x→0+} L(x) by making the
change of variable y = 1/x and noting that as x → 0+, y → ∞. This is perhaps
more intuitive but is slightly tedious to make completely rigorous.)
d) Since L is differentiable, it is continuous, and the result follows immediately from
part c) and the Intermediate Value Theorem. 
Definition: We define e to be the unique positive real number such that L(e) = 1.
(Such a number exists because L : (0, ∞) → R is increasing – hence injective – and
has image (−∞, ∞). Thus in fact for any real number α there is a unique positive
real number β such that L(β) = α.)

Since L(x) is everywhere differentiable with nonzero derivative 1/x, the differentiable
inverse function theorem applies: L has a differentiable inverse function
E : R → (0, ∞), E(0) = 1.
Let's compute E′: differentiating L(E(x)) = x gives
1 = L′(E(x))E′(x) = E′(x)/E(x).
In other words, we get
E ′ (x) = E(x).
Corollary 42. For all x, y ∈ R we have E(x + y) = E(x)E(y).
Proof. To showcase the range of techniques available, we give three different proofs.
First proof: For y ∈ R, let E_y(x) = E(x + y). Put f(x) = E_y(x)/E(x). Then
f′(x) = (E_y′(x)E(x) − E_y(x)E′(x))/E(x)² = (E′(x + y)(x + y)′E(x) − E(x + y)E(x))/E(x)²
= (E(x + y) · 1 · E(x) − E(x + y)E(x))/E(x)² = 0.
By the Zero Velocity Theorem, there is C_y ∈ R such that for all x ∈ R, f(x) =
E(x + y)/E(x) = C_y, or E(x + y) = E(x)C_y. Plugging in x = 0 gives
E(y) = E(0)C_y = 1 · C_y = C_y,
so
E(x + y) = E(x)E(y).

Second proof: We have
L(E(x + y)/(E(x)E(y))) = L(E(x + y)) − L(E(x)) − L(E(y)) = x + y − x − y = 0.
The unique x ∈ (0, ∞) such that L(x) = 0 is x = 1, so we must have
E(x + y)/(E(x)E(y)) = 1,
or
E(x + y) = E(x)E(y).
Third proof: For any y1 , y2 > 0, we have
L(y1 y2 ) = L(y1 ) + L(y2 ).
Put y1 = E(x1 ) and y2 = E(x2 ), so that x1 = L(y1 ), x2 = L(y2 ) and thus
E(x1 )E(x2 ) = y1 y2 = E(L(y1 y2 )) = E(L(y1 ) + L(y2 )) = E(x1 + x2 ).

Note also that since E and L are inverse functions and L(e) = 1, we have E(1) = e.
Now the previous discussion must suggest to any graduate of freshman calculus
that E(x) = e^x: both functions are defined and positive for all real numbers, are equal
to their own derivatives, convert addition into multiplication, and take the value 1
at x = 0. How many such functions could there be?

Proposition 43. Let f : R → R be a differentiable function such that f ′ (x) = f (x)


for all x ∈ R. Then there is a constant C such that f (x) = CE(x) for all x ∈ R.
f (x)
Proof. Consider the function g : R → R defined by g(x) = E(x) . Then for all x ∈ R,
E(x)f ′ (x) − E ′ (x)f (x) E(x)f (x) − E(x)f (x)
g ′ (x) = = = 0.
E(x)2 E(x)2
By the Zero Velocity Theorem g = f
E is constant: f (x) = CE(x) for all x. 
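As a numerical aside of mine (not part of the notes' development), Python's math.exp can stand in for E, and the two properties that drive the argument above, E′ = E and E(x + y) = E(x)E(y), are easy to spot-check:

```python
import math

E = math.exp  # stand-in for E(x); the identification E(x) = e^x is argued below

# E'(x) = E(x): compare a symmetric difference quotient with the function itself
h = 1e-6
for x in [-1.0, 0.0, 2.0]:
    dq = (E(x + h) - E(x - h)) / (2 * h)
    print(abs(dq - E(x)) < 1e-4)

# E(x + y) = E(x) E(y)
print(abs(E(1.2 + 0.7) - E(1.2) * E(0.7)) < 1e-12)
```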
In other words, if there really is a function f(x) = e^x out there with f′(x) = e^x and
f(0) = 1, then we must have e^x = E(x) for all x. The point of this logical maneuver
is that although in precalculus mathematics one learns to manipulate and graph
exponential functions, the actual definition of a^x for irrational x is not given, and
indeed I don't see how it can be given without using key concepts and theorems of
calculus. But, with the functions E(x) and L(x) in hand, let us develop the theory
of exponentials and logarithms to arbitrary bases.

Let a > 0 be a real number. How should we define a^x? In the following slightly
strange way: for any x ∈ R,
a^x := E(L(a)x).
Let us make two comments: first, if a = e this agrees with our previous definition:
e^x = E(xL(e)) = E(x). Second, the definition is motivated by the following
desirable law of exponents: (a^b)^c = a^{bc}. Indeed, assuming this holds unrestrictedly
for b, c ∈ R and a > 1, we would have
a^x = E(x log a) = e^{x log a} = (e^{log a})^x = a^x.
But here is the point: we do not wish to assume that the laws of exponents work
for all real numbers as they do for positive integers...we want to prove them!
Proposition 44. Fix a ∈ (0, ∞). For x ∈ R, we define
a^x := E(L(a)x).
If a ̸= 1, we define
log_a(x) = L(x)/L(a).
a) The function a^x is differentiable and (a^x)′ = L(a)a^x.
b) The function log_a x is differentiable and (log_a x)′ = 1/(L(a)x).
c) Suppose a > 1. Then a^x is increasing with image (0, ∞), log_a x is increasing
with image (−∞, ∞), and a^x and log_a x are inverse functions.
d) For all x, y ∈ R, a^{x+y} = a^x a^y.
e) For all x > 0 and y ∈ R, (a^x)^y = a^{xy}.
f) For all x, y > 0, log_a(xy) = log_a x + log_a y.
g) For all x > 0 and y ∈ R, log_a(x^y) = y log_a x.
Proof. a) We have
(a^x)′ = E(L(a)x)′ = E′(L(a)x)(L(a)x)′ = E(L(a)x) · L(a) = L(a)a^x.
b) We have
(log_a(x))′ = (L(x)/L(a))′ = 1/(L(a)x).

c) Since their derivatives are always positive, a^x and log_a x are both increasing
functions. Moreover, since a > 1, L(a) > 0 and thus
lim_{x→∞} a^x = lim_{x→∞} E(L(a)x) = ∞,
lim_{x→∞} log_a(x) = lim_{x→∞} L(x)/L(a) = ∞.
Thus a^x : (−∞, ∞) → (0, ∞) and log_a x : (0, ∞) → (−∞, ∞) are bijective and thus
have inverse functions. To check that they are inverses of each other, it suffices
to show that either one of the two compositions is the identity function. Now
log_a(a^x) = L(a^x)/L(a) = L(E(L(a)x))/L(a) = L(a)x/L(a) = x.
d) We have
a^{x+y} = E(L(a)(x + y)) = E(L(a)x + L(a)y) = E(L(a)x)E(L(a)y) = a^x a^y.
e) We have
(a^x)^y = E(L(a^x)y) = E(L(E(L(a)x))y) = E(L(a)xy) = a^{xy}.
f) We have
log_a(xy) = L(xy)/L(a) = (L(x) + L(y))/L(a) = L(x)/L(a) + L(y)/L(a) = log_a x + log_a y.
g) We have
log_a(x^y) = L(x^y)/L(a) = L(E(L(x)y))/L(a) = L(x)y/L(a) = y log_a x.

Having established all this, we now feel free to write e^x for E(x) and log x for L(x).
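The definition a^x := E(L(a)x) and the laws of Proposition 44 can be spot-checked numerically; in this sketch of mine, math.exp and math.log stand in for E and L.

```python
import math

def power(a, x):
    """a^x defined as E(L(a) x), with math.exp, math.log standing in for E, L."""
    return math.exp(math.log(a) * x)

def log_base(a, x):
    """log_a(x) := L(x)/L(a)."""
    return math.log(x) / math.log(a)

a = 2.5
# a^(x+y) = a^x a^y
print(abs(power(a, 1.3 + 0.4) - power(a, 1.3) * power(a, 0.4)) < 1e-9)
# log_a(xy) = log_a x + log_a y
print(abs(log_base(a, 6.0 * 7.0) - (log_base(a, 6.0) + log_base(a, 7.0))) < 1e-12)
# log_a and a^x are inverse functions
print(abs(log_base(a, power(a, 0.77)) - 0.77) < 1e-12)
```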

Exercise: Suppose 0 < a < 1. Show that a^x is decreasing with image (0, ∞),
log_a x is decreasing with image (−∞, ∞), and a^x and log_a x are inverse functions.

Exercise: Prove the change of base formula: for all a, b, c > 0 with a, c ̸= 1,
log_a b = log_c b / log_c a.
7.3. Some inverse trigonometric functions.

We now wish to consider inverses of the trigonometric functions: sine, cosine,
tangent, and so forth. Right away we encounter a problem similar to the case of x^n
for even n: the trigonometric functions are periodic, hence certainly not injective on
their entire domain. Once again we are forced into the art of domain restriction
(as opposed to the science of codomain restriction).

Consider first f (x) = sin x. To get an inverse function, we need to restrict the
domain to some subset S on which f is injective. As usual we like intervals, and a
little thought shows that the maximal possible length of an interval on which the
sine function is injective is π, attained by any interval at which the function either
increases from −1 to 1 or decreases from 1 to −1. This still gives us choices to make.
The most standard choice – but to be sure, one that is not the only possible one
nor mathematically consecrated in any particular way – is to take I = [−π/2, π/2].

We claim that f is increasing on I. To check this, note that f′(x) = cos x is indeed
positive on (−π/2, π/2). We have f([−π/2, π/2]) = [−1, 1]. The inverse function here is
often called arcsin x ("arcsine of x") in an attempt to distinguish it from (sin x)^{-1} = csc x.
This is as good a name as any: let's go with it. We have
arcsin : [−1, 1] → [−π/2, π/2].
Being the inverse of an increasing function, arcsin x is increasing. Moreover since the
sine function has a nonzero derivative on (−π/2, π/2), arcsin x is differentiable there.
As usual, to find the derivative we prefer to redo the implicit differentiation by
hand: differentiating
sin(arcsin x) = x,
we get
cos(arcsin x) arcsin′(x) = 1,
or
(d/dx) arcsin x = 1/cos(arcsin x).
This looks like a mess, but a little trigonometry will clean it up. The key is to
realize that cos(arcsin x) means "the cosine of the angle whose sine is x" and that
there must be a simpler description of this. If we draw a right triangle with angle
θ = arcsin x, then to get the ratio of the opposite side to the hypotenuse to be x we
may take the length of the opposite side to be x and the length of the hypotenuse to
be 1, in which case the length of the adjacent side is, by the Pythagorean Theorem,
√(1 − x²). Thus cos θ = √(1 − x²), so finally
(d/dx) arcsin x = 1/√(1 − x²).
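A numerical check of this formula (my own aside), using math.asin for the arcsine and a symmetric difference quotient for the derivative:

```python
import math

def num_deriv(g, x, h=1e-6):
    """Symmetric difference quotient approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in [-0.9, 0.0, 0.5]:
    lhs = num_deriv(math.asin, x)          # arcsin'(x), numerically
    rhs = 1 / math.sqrt(1 - x * x)         # 1 / sqrt(1 - x^2)
    print(abs(lhs - rhs) < 1e-5)
```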

Now consider f(x) = cos x. Since f is even, it is not injective on any interval
containing 0 in its interior. Reflecting a bit on the graph of f(x) = cos x one sees
that a reasonable choice for the restricted domain is [0, π]: since f′(x) = − sin x is
negative on (0, π) and zero at 0 and π, f(x) is decreasing on [0, π] and hence injective
there. Its image is f([0, π]) = [−1, 1]. Therefore we have an inverse function
arccos : [−1, 1] → [0, π].
Since cos x is continuous, so is arccos x. Since cos x is differentiable and has zero
derivative only at 0 and π, arccos x is differentiable on (−1, 1) and has vertical tangent
lines at x = −1 and x = 1. Moreover, since cos x is decreasing, so is arccos x.

We find a formula for the derivative of the arccos function just as we did for arcsin
above: differentiating the identity
cos(arccos x) = x
gives
− sin(arccos x) arccos′ x = 1,
or
arccos′ x = −1/sin(arccos x).

Again, this may be simplified. If φ = arccos x, then x = cos φ, so if we are on the
unit circle then the y-coordinate is sin φ = √(1 − x²), and thus
arccos′ x = −1/√(1 − x²).
Remark: It is hard not to notice that the derivatives of the arcsine and the arccosine
are simply negatives of each other, so for all x ∈ (−1, 1),
arccos′ x + arcsin′ x = 0.
By the Zero Velocity Theorem, we conclude
arccos x + arcsin x = C
for some constant C. To determine C, simply evaluate at x = 0:
C = arccos 0 + arcsin 0 = π/2 + 0 = π/2,
and thus for all x ∈ [−1, 1] (using continuity at the endpoints) we have
arccos x + arcsin x = π/2.
Thus the angle θ whose sine is x is complementary to the angle φ whose cosine is
x. A little thought should convince you that this is a familiar fact.
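This identity is easy to confirm numerically (a quick aside of mine; math.acos and math.asin play the roles of arccos and arcsin):

```python
import math

# arccos x + arcsin x = pi/2 across the whole domain [-1, 1]
for x in [-1.0, -0.3, 0.0, 0.8, 1.0]:
    print(abs(math.acos(x) + math.asin(x) - math.pi / 2) < 1e-12)
```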

Finally, consider f(x) = tan x = sin x / cos x. The domain is all real numbers for which
cos x ̸= 0, so all real numbers except ±π/2, ±3π/2, .... The tangent function is periodic
with period π and also odd, which suggests that, as with the sine function, we
should restrict this domain to the largest interval about 0 on which f is defined and
injective. Since f′(x) = sec² x > 0, f is increasing on (−π/2, π/2) and thus is injective
there. Moreover, lim_{x→±π/2} tan x = ±∞, so by the Intermediate Value Theorem
f((−π/2, π/2)) = R. Therefore we have an inverse function
arctan : R → (−π/2, π/2).
Since the tangent function is differentiable with everywhere positive derivative, the
same is true for arctan x. In particular it is increasing, but not without bound:
we have lim_{x→±∞} arctan x = ±π/2. In other words the arctangent has horizontal
asymptotes at y = ±π/2.
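As a final numerical aside (mine, not the notes'), the monotonicity and the horizontal asymptotes of the arctangent can be checked with math.atan:

```python
import math

# arctan is increasing...
xs = [-1e6, -10.0, 0.0, 10.0, 1e6]
vals = [math.atan(x) for x in xs]
print(all(u < v for u, v in zip(vals, vals[1:])))        # increasing

# ...but bounded: arctan x -> +/- pi/2 as x -> +/- infinity
print(abs(math.atan(1e8) - math.pi / 2) < 1e-7)
print(abs(math.atan(-1e8) + math.pi / 2) < 1e-7)
```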

References
[S] M. Spivak, Calculus. Fourth edition.
