Math 2400 Lecture Notes: Differentiation: Pete L. Clark
PETE L. CLARK
Contents
1. Differentiability Versus Continuity 2
2. Differentiation Rules 3
3. Optimization 7
3.1. Intervals and interior points 7
3.2. Functions increasing or decreasing at a point 7
3.3. Extreme Values 8
3.4. Local Extrema and a Procedure for Optimization 10
3.5. Remarks on finding roots of f ′ 12
4. The Mean Value Theorem 12
4.1. Statement of the Mean Value Theorem 12
4.2. Proof of the Mean Value Theorem 13
4.3. The Cauchy Mean Value Theorem 14
5. Monotone Functions 15
5.1. The Monotone Function Theorems 15
5.2. The First Derivative Test 17
5.3. The Second Derivative Test 18
5.4. Sign analysis and graphing 19
5.5. A theorem of Spivak 21
6. Inverse Functions I: Theory 21
6.1. Review of inverse functions 21
6.2. The Interval Image Theorem 23
6.3. Monotone Functions and Invertibility 23
6.4. Inverses of Continuous Functions 24
6.5. Inverses of Differentiable Functions 25
7. Inverse Functions II: Examples and Applications 27
7.1. x^n 27
7.2. L(x) and E(x) 28
7.3. Some inverse trigonometric functions 31
References 33
© Pete L. Clark, 2012.
Thanks to Bryan Oakley for some help with the proof of Proposition 44.
2 PETE L. CLARK
Thus
0 = lim_{x→a} (f(x) − f(a)) = (lim_{x→a} f(x)) − f(a),
so
lim_{x→a} f(x) = f(a).
Remark about linear continuity...
The converse of Theorem 1 is far from being true: a function f which is continuous at a need not be differentiable at a. An easy example of this is f(x) = |x| at a = 0.
2. Differentiation Rules
Theorem 3. (Constant Rule) Let f be differentiable at a ∈ R and C ∈ R. Then
the function Cf is also differentiable at a and
(Cf )′ (a) = Cf ′ (a).
Proof. There is nothing to it:
(Cf)′(a) = lim_{h→0} ((Cf)(a + h) − (Cf)(a))/h = C lim_{h→0} (f(a + h) − f(a))/h = Cf′(a).
Theorem 4. (Sum Rule) Let f and g be functions which are both differentiable at
a ∈ R. Then the sum f + g is also differentiable at a and
(f + g)′ (a) = f ′ (a) + g ′ (a).
Proof. Again, no biggie:
(f + g)′(a) = lim_{h→0} ((f + g)(a + h) − (f + g)(a))/h = lim_{h→0} [(f(a + h) − f(a))/h + (g(a + h) − g(a))/h]
= lim_{h→0} (f(a + h) − f(a))/h + lim_{h→0} (g(a + h) − g(a))/h = f′(a) + g′(a).
These results, simple as they are, have the following important consequence.
Corollary 5. (Linearity of the Derivative) For any differentiable functions f and
g and any constants C1 , C2 , we have
(C1 f + C2 g)′ = C1 f ′ + C2 g ′ .
The proof is an immediate application of the Sum Rule followed by the Constant Rule. The point here is that functions L : V → W with the property that L(v1 + v2) = L(v1) + L(v2) and L(Cv) = CL(v) are called linear mappings, and are
extremely important across mathematics.1 The study of linear mappings is the
subject of linear algebra. That differentiation is a linear mapping (on the infinite-
dimensional vector space of real functions) provides an important link between
calculus and algebra.
Theorem 6. (Product Rule) Let f and g be functions which are both differentiable
at a ∈ R. Then the product f g is also differentiable at a and
(f g)′ (a) = f ′ (a)g(a) + f (a)g ′ (a).
1We are being purposefully vague here as to what sort of things V and W are...
Proof.
(fg)′(a) = lim_{h→0} (f(a + h)g(a + h) − f(a)g(a))/h
= lim_{h→0} (f(a + h)g(a + h) − f(a)g(a + h) + (f(a)g(a + h) − f(a)g(a)))/h
= (lim_{h→0} (f(a + h) − f(a))/h)(lim_{h→0} g(a + h)) + f(a)(lim_{h→0} (g(a + h) − g(a))/h).
Since g is differentiable at a, g is continuous at a and thus lim_{h→0} g(a + h) = lim_{x→a} g(x) = g(a). The last expression above is therefore equal to
f ′ (a)g(a) + f (a)g ′ (a).
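The Product Rule lends itself to a quick numerical sanity check. The following sketch (an addition of mine, not part of the notes) compares the formula (fg)′(a) = f′(a)g(a) + f(a)g′(a) against a symmetric difference quotient; the functions f, g and the point a are arbitrary sample choices.

```python
import math

# Sample choices (arbitrary): f(x) = sin x, g(x) = x^2 + 1, at the point a = 0.7.
f, fprime = math.sin, math.cos      # f and its known derivative
g = lambda x: x * x + 1
gprime = lambda x: 2 * x
a, h = 0.7, 1e-6

# Symmetric difference quotient for the product fg at a.
product = lambda x: f(x) * g(x)
numeric = (product(a + h) - product(a - h)) / (2 * h)

# Product Rule prediction.
predicted = fprime(a) * g(a) + f(a) * gprime(a)
```

The two numbers agree to many decimal places; of course this checks a single instance, not the theorem.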
Dimensional analysis and the product rule.
The generalized product rule: suppose we want to find the derivative of a function which is a product of not two but three functions whose derivatives we already know, e.g. f(x) = x sin x e^x. We can – of course? – still use the product rule, in two steps:
f′(x) = (x sin x e^x)′ = ((x sin x)e^x)′ = (x sin x)′e^x + (x sin x)(e^x)′
= (x′ sin x + x(sin x)′)e^x + x sin x e^x = (sin x + x cos x)e^x + x sin x e^x.
Note that we didn’t use the fact that our three differentiable functions were x, sin x and e^x until the last step, so the same method shows that for any three functions f1, f2, f3 which are all differentiable at a, the product f = f1 f2 f3 is also differentiable at a and
f′(a) = f1′(a)f2(a)f3(a) + f1(a)f2′(a)f3(a) + f1(a)f2(a)f3′(a).
Riding this train of thought a bit farther, here is a rule for the product of any finite number n ≥ 2 of differentiable functions.
Theorem 7. (Generalized Product Rule) Let n ≥ 2 be an integer, and let f1, . . . , fn be n functions which are all differentiable at a. Then f = f1 · · · fn is also differentiable at a, and
(1) (f1 · · · fn )′ (a) = f1′ (a)f2 (a) · · · fn (a) + . . . + f1 (a) · · · fn−1 (a)fn′ (a).
Proof. By induction on n.
Base Case (n = 2): This is precisely the “ordinary” Product Rule (Theorem 6).
Induction Step: Let n ≥ 2 be an integer, and suppose that the product of any n
functions which are each differentiable at a ∈ R is differentiable at a and that the
derivative is given by (1). Now let f1 , . . . , fn , fn+1 be functions, each differentiable
at a. Then by the usual product rule
(f1 · · · fn fn+1)′(a) = ((f1 · · · fn)fn+1)′(a) = (f1 · · · fn)′(a)fn+1(a) + f1(a) · · · fn(a)fn+1′(a).
Using the induction hypothesis this last expression becomes
(f1′(a)f2(a) · · · fn(a) + . . . + f1(a) · · · fn−1(a)fn′(a)) fn+1(a) + f1(a) · · · fn(a)fn+1′(a)
= f1′(a)f2(a) · · · fn(a)fn+1(a) + . . . + f1(a) · · · fn(a)fn+1′(a).
Example: We may use the Generalized Product Rule to give a less computationally intensive derivation of the power rule
(x^n)′ = n x^{n−1}
for n a positive integer. Indeed, taking f1 = · · · = fn = x, we have f(x) = x^n = f1 · · · fn, so applying the Generalized Product Rule we get
(x^n)′ = (x)′ x · · · x + . . . + x · · · x (x)′.
Here in each term we have x′ = 1 multiplied by n − 1 factors of x, so each term evaluates to x^{n−1}. Moreover we have n terms in all, so
(x^n)′ = n x^{n−1}.
No need to mess around with binomial coefficients!
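As a spot check of the power rule (my addition, not the author’s), one can compare n·a^{n−1} with a difference quotient at a sample point:

```python
# Compare (x^n)' = n * x^(n-1) with a symmetric difference quotient
# at the (arbitrary) point a = 1.3 for a few positive integers n.
a, h = 1.3, 1e-6
errors = []
for n in (2, 3, 5):
    numeric = ((a + h) ** n - (a - h) ** n) / (2 * h)  # difference quotient
    exact = n * a ** (n - 1)                           # power rule
    errors.append(abs(numeric - exact))
max_error = max(errors)   # tiny: the symmetric-difference error is O(h^2)
```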
Example: More generally, for any differentiable function f and n ∈ Z+, the Generalized Product Rule shows that the function f(x)^n is differentiable and (f(x)^n)′ = n f(x)^{n−1} f′(x). (This sort of computation is more traditionally done using the Chain Rule...coming up soon!)
Theorem 8. (Quotient Rule) Let f and g be functions which are both differentiable at a ∈ R, with g(a) ≠ 0. Then f/g is differentiable at a and
(f/g)′(a) = (g(a)f′(a) − f(a)g′(a))/g(a)^2.
Proof. Step 0: First observe that since g is continuous and g(a) ≠ 0, there is some interval I = (a − δ, a + δ) about a on which g is nonzero, and on this interval f/g is defined. Thus it makes sense to consider the difference quotient
(f(a + h)/g(a + h) − f(a)/g(a))/h
for h sufficiently close to zero.
Step 1: We first establish the Reciprocal Rule, i.e., the special case of the Quo-
tient Rule in which f (x) = 1 (constant function). Then
(1/g)′(a) = lim_{h→0} (1/g(a + h) − 1/g(a))/h = lim_{h→0} (g(a) − g(a + h))/(h g(a) g(a + h))
= (−lim_{h→0} (g(a + h) − g(a))/h)(lim_{h→0} 1/(g(a)g(a + h))) = −g′(a)/g(a)^2.
Above we have once again used the fact that g is differentiable at a implies g is
continuous at a.
Step 2: We now derive the full Quotient Rule by combining the Product Rule and
the Reciprocal Rule. Indeed, we have
(f/g)′(a) = (f · (1/g))′(a) = f′(a) · (1/g(a)) + f(a) · (1/g)′(a)
= f′(a)/g(a) − f(a)g′(a)/g(a)^2 = (g(a)f′(a) − g′(a)f(a))/g(a)^2.
Lemma 9. Let f : D ⊂ R → R. Suppose:
(i) limx→a f (x) exists, and
(ii) There exists a number L ∈ R such that for all δ > 0, there exists at least one x
with 0 < |x − a| < δ such that f (x) = L.
Then limx→a f (x) = L.
Proof. We leave this as an (assigned, this time!) exercise, with the following suggestion to the reader: suppose that lim_{x→a} f(x) = M ≠ L, and derive a contradiction by taking ϵ to be small enough compared to |M − L|.
Example: Consider, again, for α ∈ R, the function fα : R → R defined by fα(x) = x^α sin(1/x) for x ≠ 0 and fα(0) = 0. Then fα satisfies hypothesis (ii) of Lemma 9 with L = 0, since on any deleted interval around zero, the function sin(1/x) takes the value 0 infinitely many times. According to Lemma 9 then, if lim_{x→0} fα(x) exists at all, then it must be 0. As we have seen, the limit exists iff α > 0 and is indeed equal to zero in that case.
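To make Lemma 9’s hypothesis (ii) concrete (a numerical sketch of my own), take α = 2: fα vanishes at every point x = 1/(nπ), and these points accumulate at 0, while the bound |fα(x)| ≤ x^2 forces the limit to be 0.

```python
import math

alpha = 2
f = lambda x: x ** alpha * math.sin(1.0 / x) if x != 0 else 0.0

# Hypothesis (ii) of Lemma 9 with L = 0: f vanishes at x = 1/(n*pi),
# and these points get arbitrarily close to 0 as n grows.
zeros = [1.0 / (n * math.pi) for n in (1, 10, 100, 1000)]
values_at_zeros = [abs(f(x)) for x in zeros]   # all (numerically) zero

# And |f(x)| <= x^2 for alpha = 2, which is why the limit at 0 exists and is 0.
bound_holds = all(abs(f(x)) <= x ** 2 for x in (0.1, 0.01, 0.001))
```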
Theorem 10. (Chain Rule) Let f and g be functions, and let a ∈ R be such that
f is differentiable at a and g is differentiable at f (a). Then the composite function
g ◦ f is differentiable at a and
(g ◦ f )′ (a) = g ′ (f (a))f ′ (a).
Proof. Motivated by Leibniz notation, it is tempting to argue as follows:
(g ◦ f)′(a) = lim_{x→a} (g(f(x)) − g(f(a)))/(x − a) = lim_{x→a} [(g(f(x)) − g(f(a)))/(f(x) − f(a))] · [(f(x) − f(a))/(x − a)]
= (lim_{x→a} (g(f(x)) − g(f(a)))/(f(x) − f(a))) (lim_{x→a} (f(x) − f(a))/(x − a))
= (lim_{f(x)→f(a)} (g(f(x)) − g(f(a)))/(f(x) − f(a))) (lim_{x→a} (f(x) − f(a))/(x − a)) = g′(f(a))f′(a).
The replacement of “lim_{x→a} . . .” by “lim_{f(x)→f(a)} . . .” in the first factor above is justified by the fact that f is continuous at a.
However, the above argument has a gap in it: when we multiply and divide
by f(x) − f(a), how do we know that we are not dividing by zero?? The answer is that we cannot rule this out: it is possible for f(x) to take the value f(a) on arbitrarily small deleted intervals around a: again, this is exactly what happens for the function fα(x) of the above example near a = 0.2 This gap is often held to
invalidate the proof, and thus the most common proof of the Chain Rule in honors
calculus / basic analysis texts proceeds along (superficially, at least) different lines.
But in fact I maintain that the above gap may be rather easily filled to give a
complete proof. The above argument is valid unless the following holds: for all
δ > 0, there exists x with 0 < |x − a| < δ such that f (x) − f (a) = 0. So it remains
to give a different proof of the Chain Rule in that case. First, observe that with the
above hypothesis, the difference quotient (f(x) − f(a))/(x − a) is equal to 0 at points arbitrarily close to x = a. It follows from Lemma 9 that if
lim_{x→a} (f(x) − f(a))/(x − a)
exists at all, then it must be equal to 0. But we are assuming that the above limit
exists, since we are assuming that f is differentiable at a. Therefore what we have
2One should note that in order for a function to have this property it must be “highly oscillatory
near a” as with the functions fα above: indeed, fα is essentially the simplest example of a
function having this kind of behavior. In particular, most of the elementary functions considered
in freshman calculus do not exhibit this highly oscillatory behavior near any point and therefore
the above argument is already a complete proof of the Chain Rule for such functions. Of course
our business here is to prove the Chain Rule for all functions satisfying the hypotheses of the
theorem, even those which are highly oscillatory!
seen is that in the remaining case we have f ′ (a) = 0, and therefore, since we are
trying to show that (g◦f )′ (a) = g ′ (f (a))f ′ (a), we are trying in this case to show that
(g ◦ f )′ (a) = 0. So consider our situation: for x ∈ R we have two possibilities: the
first is f (x)−f (a) = 0, in which case also g(f (x))−g(f (a)) = g(f (a))−g(f (a)) = 0,
so the difference quotient is zero at these points. The second is f (x) − f (a) ̸= 0, in
which case the algebra
(g(f(x)) − g(f(a)))/(x − a) = [(g(f(x)) − g(f(a)))/(f(x) − f(a))] · [(f(x) − f(a))/(x − a)]
is justified, and the above argument shows that this expression tends to g′(f(a))f′(a) = 0 as x → a. So whichever holds, the difference quotient (g(f(x)) − g(f(a)))/(x − a) is close to
(or equal to!) zero.3 Thus the limit tends to zero no matter which alternative
obtains. Somewhat more formally, if we fix ϵ > 0, then the first step of the argument shows that there is δ > 0 such that for all x with 0 < |x − a| < δ such that f(x) − f(a) ≠ 0, |(g(f(x)) − g(f(a)))/(x − a)| < ϵ. On the other hand, when f(x) − f(a) = 0, then |(g(f(x)) − g(f(a)))/(x − a)| = 0, so it is certainly less than ϵ! Therefore, all in all we have
0 < |x − a| < δ =⇒ |(g(f(x)) − g(f(a)))/(x − a)| < ϵ,
so that
lim_{x→a} (g(f(x)) − g(f(a)))/(x − a) = 0 = g′(f(a))f′(a).
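The delicate case in the proof can be seen numerically (an illustration of my own, with arbitrary sample choices). Take f(x) = x^2 sin(1/x) (with f(0) = 0), the oscillatory function discussed above, and any g differentiable at f(0) = 0, say g = exp. The Chain Rule predicts (g ◦ f)′(0) = g′(0) · f′(0) = 0, and the difference quotients indeed shrink with h:

```python
import math

# f is the highly oscillatory function: f(0) = 0, and f(x) = f(0) at points
# arbitrarily close to 0, so the "divide by f(x) - f(a)" step can fail.
f = lambda x: x * x * math.sin(1.0 / x) if x != 0 else 0.0
g = math.exp   # sample outer function (arbitrary), differentiable at f(0) = 0

# Difference quotients of g o f at a = 0; the proof says these tend to
# g'(f(0)) * f'(0) = 1 * 0 = 0.
hs = (1e-2, 1e-4, 1e-6)
quotients = [(g(f(h)) - g(f(0))) / h for h in hs]
```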
3. Optimization
3.1. Intervals and interior points.
3This is the same idea as in the proof of the Switching Theorem, although – to my mild
disappointment – we are not able to simply apply the Switching Theorem directly, since one of
our functions is not defined in a deleted interval around zero.
Example: Let f (x) = mx + b be the general linear function. Then for any a ∈ R:
f is increasing at a iff m > 0, f is weakly increasing at a iff m ≥ 0, f is decreasing
at a iff m < 0, and f is weakly decreasing at a iff m ≤ 0.
If one looks back at the previous examples and keeps in mind that we are supposed to be studying derivatives (!), one is swiftly led to the following fact.
Theorem 11. Let f : I → R, and let a be an interior point of I. Suppose f is differentiable at a.
a) If f ′ (a) > 0, then f is increasing at a.
b) If f ′ (a) < 0, then f is decreasing at a.
c) If f′(a) = 0, then no conclusion can be drawn: f may be increasing at a, decreasing at a, or neither.
Proof. a) The differentiability of f at a has an ϵ-δ interpretation, and the idea is to
use this interpretation to our advantage. Namely, take ϵ = f′(a): there exists δ > 0 such that for all x with 0 < |x − a| < δ, |(f(x) − f(a))/(x − a) − f′(a)| < f′(a), or equivalently
0 < (f(x) − f(a))/(x − a) < 2f′(a).
In particular, for all x with 0 < |x − a| < δ, (f(x) − f(a))/(x − a) > 0, so: if x > a, f(x) − f(a) > 0, i.e., f(x) > f(a); and if x < a, f(x) − f(a) < 0, i.e., f(x) < f(a).
b) This is similar enough to part a) to be best left to the reader as an exercise.5
c) If f (x) = x3 , then f ′ (0) = 0 but f is increasing at 0. If f (x) = −x3 , then
f ′ (0) = 0 but f is decreasing at 0. If f (x) = x2 , then f ′ (0) = 0 but f is neither
increasing nor decreasing at 0.
3.3. Extreme Values.
It is clear that a function can have at most one maximum value: if it had more than one, one of the two would be larger than the other! However a function need not have any maximum value: for instance f : (0, ∞) → R by f(x) = 1/x has no maximum value: lim_{x→0+} f(x) = ∞.
Again a function clearly can have at most one minimum value but need not have any at all: the function f : R \ {0} → R by f(x) = 1/x has no minimum value: lim_{x→0−} f(x) = −∞.
Example: The function f(x) = sin x assumes its maximum value at x = π/2, because sin(π/2) = 1, and 1 is the maximum value of the sine function. Note however that π/2 is not the only x-value at which f assumes its maximum value: indeed, the sine function is periodic and takes value 1 precisely at x = π/2 + 2πn for n ∈ Z. Thus there may be more than one x-value at which a function attains its maximum value. Similarly f attains its minimum value −1 precisely at x = 3π/2 + 2πn for n ∈ Z.
This brings us to the statement (but not yet the proof; sorry!) of one of the
most important theorems in this or any course.
Theorem 12. (Extreme Value Theorem) Let f : [a, b] → R be a continuous func-
tion. Then f has a maximum and minimum value, and in particular is bounded
above and below.
Again this result is of paramount importance: ubiquitously in (pure and applied) mathematics we wish to optimize functions: that is, find their maximum and/or minimum values on a certain domain. Unfortunately, as we have seen above, a general function f : D → R need not have a maximum or minimum value! But the Extreme Value Theorem gives rather mild hypotheses under which these values are guaranteed to exist, and in fact is a useful tool for establishing the existence of maxima / minima in other situations as well.
We now describe a type of “local behavior near a” of a very different sort from
being increasing or decreasing at a.
Since f is differentiable everywhere on (−4, 4), the only critical points will be the
stationary points, where f ′ (x) = 0. So we compute the derivative:
f′(x) = 4x^3 − 9x^2 + 4x = x(4x^2 − 9x + 4).
The roots of the quadratic factor are x = (9 ± √17)/8, or, approximately,
x1 ≈ 0.6096 . . . , x2 ≈ 1.6404 . . . ,
and of course x = 0 is also a root of f′. The corresponding values are
f(0) = 0, f(x1) = 0.2017 . . . , f(x2) = −0.619 . . . .
Also we always test the endpoints:
f(−4) = 480, f(4) = 96.
So the maximum value is 480, occurring at x = −4, and the minimum value is −0.619 . . ., occurring at x = (9 + √17)/8.
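The whole computation fits in a few lines (my addition). The definition of the function being optimized is not shown in this excerpt; f(x) = x^4 − 3x^3 + 2x^2 is the antiderivative of the quoted f′ that matches f(−4) = 480 and f(4) = 96, so it is used below as an inferred assumption:

```python
import math

# f is inferred from the quoted derivative f'(x) = 4x^3 - 9x^2 + 4x and the
# endpoint values f(-4) = 480, f(4) = 96 (assumption: the definition of f
# itself is not in the excerpt).
f = lambda x: x ** 4 - 3 * x ** 3 + 2 * x ** 2
fprime = lambda x: 4 * x ** 3 - 9 * x ** 2 + 4 * x

# Stationary points: f'(x) = x(4x^2 - 9x + 4) vanishes at 0 and (9 +- sqrt 17)/8.
x1 = (9 - math.sqrt(17)) / 8
x2 = (9 + math.sqrt(17)) / 8
candidates = [-4.0, 0.0, x1, x2, 4.0]     # endpoints plus all critical points
values = [f(x) for x in candidates]
maximum, minimum = max(values), min(values)
```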
3.5. Remarks on finding roots of f ′ .
One should certainly draw a picture to go with the Mean Value Theorem, as it
has a very simple geometric interpretation: under the hypotheses of the theorem,
there exists at least one interior point c of the interval such that the tangent line
at c is parallel to the secant line joining the endpoints of the interval.
And one should also interpret it physically: if y = f(x) gives the position of a particle at a given time x, then the expression (f(b) − f(a))/(b − a) is nothing less than the average velocity between time a and time b, whereas the derivative f′(c) is the instantaneous velocity at time c, so that the Mean Value Theorem says that there is at least one instant at which the instantaneous velocity is equal to the average velocity.
Example: Suppose that cameras are set up at certain checkpoints along an interstate highway in Georgia. One day you receive in the mail photos of yourself at two checkpoints. The two checkpoints are 90 miles apart and the second photo is taken 73 minutes after the first photo. You are issued a ticket for violating the speed limit of 70 miles per hour. The enclosed letter explains: your average velocity was (90 miles) / (73 minutes) · (60 minutes) / (hour) ≈ 73.97 miles per hour. Thus, although no one saw you violating the speed limit, they may mathematically deduce that at some point your instantaneous velocity was over 70 mph. Guilt by the Mean Value Theorem!
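The letter’s arithmetic can be checked in one line (my addition):

```python
# Average velocity for 90 miles covered in 73 minutes, in miles per hour.
miles, minutes = 90, 73
avg_mph = miles / minutes * 60   # about 73.97 mph
# The Mean Value Theorem then yields an instant at which the instantaneous
# velocity equals avg_mph, which exceeds the 70 mph limit.
```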
We will deduce the Mean Value Theorem from the Extreme Value Theorem (which
we have not yet proven, but all in good time...). However, it is convenient to first
establish a special case.
Theorem 16. (Rolle’s Theorem) Let f : [a, b] → R. We suppose:
(i) f is continuous on [a, b].
(ii) f is differentiable on (a, b).
(iii) f (a) = f (b).
Then there exists c with a < c < b and f ′ (c) = 0.
Proof. By the Extreme Value Theorem, f has a maximum M and a minimum m.
Case 1: Suppose M > f (a) = f (b). Then the maximum value does not occur
at either endpoint. Since f is differentiable on (a, b), it must therefore occur at a
stationary point: i.e., there exists c ∈ (a, b) with f ′ (c) = 0.
Case 2: Suppose m < f (a) = f (b). Then the minimum value does not occur at
either endpoint. Since f is differentiable on (a, b), it must therefore occur at a
stationary point: there exists c ∈ (a, b) with f ′ (c) = 0.
Case 3: The remaining case is f (a) ≤ m ≤ M ≤ f (a), which implies m = M =
f (a) = f (b), so f is constant. In this case f ′ (c) = 0 at every point c ∈ (a, b)!
To deduce the Mean Value Theorem from Rolle’s Theorem, it is tempting to tilt
our head until the secant line from (a, f (a)) to (b, f (b)) becomes horizontal and
then apply Rolle’s Theorem. The possible flaw here is that if we start with a subset of the plane which is the graph of a function and rotate it too much, it may no longer
be the graph of a function, so Rolle’s Theorem does not apply.
The above objection is just a technicality. In fact, it suggests that more is true:
there should be some version of the Mean Value Theorem which applies to curves
in the plane which are not necessarily graphs of functions. Indeed we will meet
such a generalization later – the Cauchy Mean Value Theorem – and use it
to prove L’Hôpital’s Rule – but at the moment it is, alas, easier to use a simple trick.
Proof of the Mean Value Theorem: Let f : [a, b] → R be continuous on [a, b] and
differentiable on (a, b). There is a unique linear function L(x) such that L(a) = f (a)
and L(b) = f (b): indeed, L is nothing else than the secant line to f between (a, f (a))
and (b, f(b)). Here’s the trick: by subtracting L(x) from f(x) we reduce ourselves to a situation where we may apply Rolle’s Theorem; the conclusion of Rolle’s Theorem applied to f − L then translates directly into the conclusion of the Mean Value Theorem for f.
We present here a modest generalization of the Mean Value Theorem due to A.L.
Cauchy. Although perhaps not as fundamental and physically appealing as the
Mean Value Theorem, it certainly has its place: for instance it may be used to
prove L’Hôpital’s Rule.
Theorem 17. (Cauchy Mean Value Theorem) Let f, g : [a, b] → R be continuous
and differentiable on (a, b). Then there exists c ∈ (a, b) such that
(2) (f (b) − f (a))g ′ (c) = (g(b) − g(a))f ′ (c).
Proof. Case 1: Suppose g(a) = g(b). By Rolle’s Theorem, there is c ∈ (a, b) such
that g ′ (c) = 0. With this value of c, both sides of (2) are zero, hence they are equal.
Case 2: Suppose g(a) ≠ g(b), and define
h(x) = f(x) − ((f(b) − f(a))/(g(b) − g(a))) g(x).
Then h is continuous on [a, b], differentiable on (a, b), and
h(a) = (f(a)(g(b) − g(a)) − g(a)(f(b) − f(a)))/(g(b) − g(a)) = (f(a)g(b) − g(a)f(b))/(g(b) − g(a)),
h(b) = (f(b)(g(b) − g(a)) − g(b)(f(b) − f(a)))/(g(b) − g(a)) = (f(a)g(b) − g(a)f(b))/(g(b) − g(a)),
so h(a) = h(b).6 By Rolle’s Theorem there exists c ∈ (a, b) with
0 = h′(c) = f′(c) − ((f(b) − f(a))/(g(b) − g(a))) g′(c),
or equivalently,
(f (b) − f (a))g ′ (c) = (g(b) − g(a))f ′ (c).
6Don’t be so impressed: we wanted a constant C such that if h(x) = f (x) − Cg(x), then
h(a) = h(b), so we set f (a) − Cg(a) = f (b) − Cg(b) and solved for C.
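For a concrete instance of Theorem 17 (my own example, with arbitrary sample functions), take f(x) = x^2, g(x) = x^3 on [1, 2]; equation (2) becomes 3 · 3c^2 = 7 · 2c, so c = 14/9 works, and the auxiliary function h from the proof really does satisfy h(a) = h(b):

```python
# Sample instance (arbitrary choice): f(x) = x^2, g(x) = x^3 on [a, b] = [1, 2].
f, fprime = lambda x: x ** 2, lambda x: 2 * x
g, gprime = lambda x: x ** 3, lambda x: 3 * x ** 2
a, b = 1.0, 2.0

# Equation (2): (f(b) - f(a)) g'(c) = (g(b) - g(a)) f'(c), i.e. 9c^2 = 14c.
c = 14 / 9
lhs = (f(b) - f(a)) * gprime(c)
rhs = (g(b) - g(a)) * fprime(c)

# The auxiliary function from the proof: h(x) = f(x) - C g(x) with
# C = (f(b) - f(a)) / (g(b) - g(a)); by construction h(a) = h(b).
C = (f(b) - f(a)) / (g(b) - g(a))
h = lambda x: f(x) - C * g(x)
```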
5. Monotone Functions
5.1. The Monotone Function Theorems.
The Mean Value Theorem has several important consequences. Foremost of all, it will be used in the proof of the Fundamental Theorem of Calculus, but that’s for later. At the moment we can use it to establish a criterion for a function f to be increasing / weakly increasing / decreasing / weakly decreasing on an interval in terms of a sign condition on f′.
Theorem 18. (First Monotone Function Theorem) Let I be an open interval, and
let f : I → R be a function which is differentiable on I.
a) Suppose f ′ (x) > 0 for all x ∈ I. Then f is increasing on I: for all x1 , x2 ∈ I
with x1 < x2 , f (x1 ) < f (x2 ).
b) Suppose f ′ (x) ≥ 0 for all x ∈ I. Then f is weakly increasing on I: for all
x1 , x2 ∈ I with x1 < x2 , f (x1 ) ≤ f (x2 ).
c) Suppose f′(x) < 0 for all x ∈ I. Then f is decreasing on I: for all x1, x2 ∈ I with x1 < x2, f(x1) > f(x2).
d) Suppose f ′ (x) ≤ 0 for all x ∈ I. Then f is weakly decreasing on I: for all
x1 , x2 ∈ I with x1 < x2 , f (x1 ) ≥ f (x2 ).
Proof. a) We go by contraposition: suppose that f is not increasing: then there exist x1, x2 ∈ I with x1 < x2 such that f(x1) ≥ f(x2). Apply the Mean Value Theorem to f on [x1, x2]: there exists x1 < c < x2 such that f′(c) = (f(x2) − f(x1))/(x2 − x1) ≤ 0.
b) Again, we argue by contraposition: suppose that f is not weakly increasing: then there exist x1, x2 ∈ I with x1 < x2 such that f(x1) > f(x2). Apply the Mean Value Theorem to f on [x1, x2]: there exists x1 < c < x2 such that f′(c) = (f(x2) − f(x1))/(x2 − x1) < 0.
c),d) We leave these proofs to the reader. One may either proceed exactly as in
parts a) and b), or reduce to them by multiplying f by −1.
Corollary 19. (Zero Velocity Theorem) Let f : I → R be a differentiable function
with identically zero derivative. Then f is constant.
Proof. Since f ′ (x) ≥ 0 for all x ∈ I, f is weakly increasing on I: x1 < x2 =⇒
f (x1 ) ≤ f (x2 ). Since f ′ (x) ≤ 0 for all x ∈ I, f is weakly decreasing on I: x1 <
x2 =⇒ f(x1) ≥ f(x2). But a function which is weakly increasing and weakly decreasing satisfies: for all x1 < x2, f(x1) ≤ f(x2) and f(x1) ≥ f(x2) and thus f(x1) = f(x2): f is constant.
Remark: The strategy of the above proof is to deduce Corollary 19 from the Increasing Function Theorem. In fact if we argued directly from the Mean Value Theorem the proof would be significantly shorter: try it!
Corollary 20. Suppose f, g : I → R are both differentiable and such that f ′ = g ′
(equality as functions, i.e., f ′ (x) = g ′ (x) for all x ∈ I). Then there exists a constant
C ∈ R such that f = g + C, i.e., for all x ∈ I, f (x) = g(x) + C.
Proof. Let h = f − g. Then h′ = (f − g)′ = f ′ − g ′ ≡ 0, so by Corollary 19, h ≡ C
and thus f = g + h = g + C.
Remark: Corollary 20 can be viewed as the first “uniqueness theorem” for differential equations. Namely, suppose that f : I → R is some function, and consider the set of all functions F : I → R such that F′ = f. Then Corollary 20 asserts that if
On the other hand, the existence question lies deeper: namely, given f : I → R,
must there exist F : I → R such that F ′ = f ? In general the answer is no.
Exercise: Let f : R → R by f (x) = 0 for x ≤ 0 and f (x) = 1 for x > 0. Show that
there is no function F : R → R such that F ′ = f .
that f is increasing on [a, b). Note first that Step 1 applies to show that f (a) ≤ f (x)
for all x ∈ (a, b), but we want slightly more than this, namely strict inequality. So,
seeking a contradiction, we suppose that f (a) = f (x0 ) for some x0 ∈ (a, b). But
now take x1 ∈ (a, x0 ): since f is increasing on (a, b) we have f (x1 ) < f (x0 ) = f (a),
contradicting the fact that f is weakly increasing on [a, b).
Step 3: In a similar way one can handle the right endpoint b. Now suppose that
f is increasing on [a, b) and also increasing on (a, b]. It remains to show that f is
increasing on [a, b]. The only thing that could go wrong is f (a) ≥ f (b). To see that
this cannot happen, choose any c ∈ (a, b): then f (a) < f (c) < f (b).
Let us say that a function f : I → R is monotone if it is either increasing on
I or decreasing on I, and also that f is weakly monotone if it is either weakly
increasing on I or weakly decreasing on I.
Theorem 23. (Second Monotone Function Theorem) Let f : I → R be a function
which is continuous on I and differentiable on the interior I ◦ of I (i.e., at every
point of I except possibly at any endpoints I may have).
a) The following are equivalent:
(i) f is weakly monotone.
(ii) Either we have f ′ (x) ≥ 0 for all x ∈ I ◦ or f ′ (x) ≤ 0 for all x ∈ I ◦ .
b) Suppose f is weakly monotone. The following are equivalent:
(i) f is not monotone.
(ii) There exist a, b ∈ I ◦ with a < b such that the restriction of f to [a, b] is constant.
(iii) There exist a, b ∈ I ◦ with a < b such that f ′ (x) = 0 for all x ∈ [a, b].
Proof. Throughout the proof we restrict our attention to increasing / weakly increasing functions, leaving the other case to the reader as a routine exercise.
a) (i) =⇒ (ii): Suppose f is weakly increasing on I. We claim f′(x) ≥ 0 for all x ∈ I ◦. If not, there is a ∈ I ◦ with f′(a) < 0. Then f is decreasing at a, so there exists b > a with f(b) < f(a), contradicting the fact that f is weakly increasing.
(ii) =⇒ (i): Immediate from the Increasing Function Theorem and Theorem 22.
b) (i) =⇒ (ii): Suppose f is weakly increasing on I but not increasing on I. By
Theorem 22 f is still not increasing on I ◦ , so there exist a, b ∈ I ◦ with a < b such
that f (a) = f (b). Then, since f is weakly increasing, for all c ∈ [a, b] we have
f (a) ≤ f (c) ≤ f (b) = f (a), so f is constant on [a, b].
(ii) =⇒ (iii): If f is constant on [a, b], f ′ is identically zero on [a, b].
(iii) =⇒ (i): If f ′ is identically zero on some subinterval [a, b], then by the Zero
Velocity Theorem f is constant on [a, b], hence is not increasing.
The next result follows immediately.
Corollary 24. Let f : I → R be differentiable. Suppose that f ′ (x) ≥ 0 for all
x ∈ I, and that f ′ (x) > 0 except at a finite set of points x1 , . . . , xn . Then f is
increasing on I.
Example: A typical application of Corollary 24 is to show that the function f : R → R by f(x) = x^3 is increasing on all of R. Indeed, f′(x) = 3x^2, which is strictly positive at all x ≠ 0 and 0 at x = 0.
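A grid-based spot check of this example (mine, and of course not a proof):

```python
# Sample f(x) = x^3 on a grid through 0: the values strictly increase even
# though f'(0) = 0, consistent with Corollary 24.
xs = [i / 10 for i in range(-30, 31)]     # -3.0, -2.9, ..., 0, ..., 3.0
ys = [x ** 3 for x in xs]
strictly_increasing = all(u < v for u, v in zip(ys, ys[1:]))
```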
5.2. The First Derivative Test.
Remark: This version of the First Derivative Test is a little stronger than the
familiar one from freshman calculus in that we have not assumed that f ′ (a) = 0
nor even that f is differentiable at a. Thus for instance our version of the test
applies to f (x) = |x| to show that it has a strict local minimum at x = 0.
We are assuming that this limit exists and is positive, so that there exists δ > 0 such that for all x ∈ (a − δ, a) ∪ (a, a + δ), f′(x)/(x − a) is positive. And this gives us exactly what we want: suppose x ∈ (a − δ, a). Then f′(x)/(x − a) > 0 and x − a < 0, so f′(x) < 0. On the other hand, suppose x ∈ (a, a + δ). Then f′(x)/(x − a) > 0 and x − a > 0, so f′(x) > 0. So f has a strict local minimum at a by the First Derivative Test.
Remark: When f ′ (a) = f ′′ (a) = 0, no conclusion can be drawn about the local
behavior of f at a: it may have a local minimum at a, a local maximum at a, be
increasing at a, decreasing at a, or none of the above.
When one is graphing a function f , the features of interest include number and
approximate locations of the roots of f , regions on which f is positive or negative,
regions on which f is increasing or decreasing, and local extrema, if any. For these
considerations one wishes to do a sign analysis on both f and its derivative f ′ .
The basic strategy is to determine first the set of roots of f. As discussed before, finding exact values of roots may be difficult or impossible even for polynomial
functions, but often it is feasible to determine at least the number of roots and
their approximate location (certainly this is possible for all polynomial functions,
although this requires justification that we do not give here). The next step is to
test a point in each region between consecutive roots to determine the sign.
This procedure comes with two implicit assumptions. Let us make them explicit.
The first is that the roots of f are sparse enough to separate the domain I into “regions”. One precise formulation of this is that f has only finitely many roots on any bounded subset of its domain. This holds for all the elementary functions we know and love, but certainly not for all functions, even all differentiable functions: we have seen that things like x^2 sin(1/x) are not so well-behaved. But this is a convenient assumption and in a given situation it is usually easy to see whether it holds.
Let us formalize the desired property and then say which functions satisfy it.
Thus a function has the intermediate value property when it does not “skip” values.
Here are two important theorems, each asserting that a broad class of functions
has the intermediate value property.
Theorem 27. (Intermediate Value Theorem) Let f : [a, b] → R be a continuous
function defined on a closed, bounded interval. Then f has the intermediate value
property.
Example of a continuous function f : [0, 2] ∩ Q → Q failing the intermediate value property: let f(x) be −1 for 0 ≤ x < √2 and f(x) = 1 for √2 < x ≤ 2.
The point of this example is to drive home the fact that the Intermediate Value Theorem is the second of our three “hard theorems” in the sense that we have no
chance to prove it without using special properties of the real numbers beyond the
ordered field axioms. And indeed we will not prove IVT right now, but we will use
it, just as we used but did not yet prove the Extreme Value Theorem. (However we
are now not so far away from the point at which we will “switch back”, talk about
completeness of the real numbers, and prove the three hard theorems.)
Let f : I → R be a continuous function, and suppose that there are only finitely
many roots, i.e., there are x1 , . . . , xn ∈ I such that f (xi ) = 0 for all i and f (x) ̸= 0
for all other x ∈ I. Then I \ {x1 , . . . , xn } is a finite union of intervals, and on each
of them f has constant sign: it is either always positive or always negative.
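The sign-analysis recipe is easy to mechanize once the roots are known. Here is a minimal sketch (illustrative only; the helper name sign_analysis and the sample polynomial are my own choices, not part of the notes):

```python
def sign_analysis(f, roots, a, b):
    """Given the finitely many roots of a continuous f on [a, b], report the
    constant sign of f on each of the complementary regions."""
    pts = [a] + sorted(r for r in roots if a < r < b) + [b]
    regions = []
    for left, right in zip(pts, pts[1:]):
        mid = (left + right) / 2          # any sample point in the region works
        regions.append((left, right, "+" if f(mid) > 0 else "-"))
    return regions

# f(x) = x^3 - x has roots -1, 0, 1; on [-2, 2] the sign pattern is -, +, -, +.
regions = sign_analysis(lambda x: x**3 - x, [-1.0, 0.0, 1.0], -2.0, 2.0)
assert [s for (_, _, s) in regions] == ["-", "+", "-", "+"]
```

The key point, exactly as in the text, is that one sample point per region suffices: since f is continuous and nonzero on the region, its sign there is constant.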
So this is how sign analysis works for a function f when f is continuous – a very
mild assumption. But as above we also want to do a sign analysis of the derivative
f ′ : how may we justify this?
The following theorem is taken directly from Spivak’s book (Theorem 7 of Chapter
11): it does not seem to be nearly as well known as Darboux’s Theorem (and in
fact I think I encountered it for the first time in Spivak’s book).
Theorem 29. Let a be an interior point of I, and let f : I → R. Suppose:
(i) f is continuous on I,
(ii) f is differentiable on I \ {a}, i.e., at every point of I except possibly at a, and
(iii) limx→a f ′ (x) = L exists.
Then f is differentiable at a and f ′ (a) = L.
Proof. Choose δ > 0 such that (a − δ, a + δ) ⊂ I. Let x ∈ (a, a + δ). Then f is
differentiable at x, and we may apply the Mean Value Theorem to f on [a, x]: there
exists c_x ∈ (a, x) such that
(f (x) − f (a))/(x − a) = f ′(c_x).
Now, as x → a⁺ every point of the interval (a, x) gets arbitrarily close to a, so
lim_{x→a⁺} c_x = a and thus
f ′_R(a) = lim_{x→a⁺} (f (x) − f (a))/(x − a) = lim_{x→a⁺} f ′(c_x) = lim_{x→a⁺} f ′(x) = L.
By a similar argument involving x ∈ (a − δ, a) we get
f ′_L(a) = lim_{x→a⁻} f ′(x) = L,
so f is differentiable at a and f ′(a) = L.
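Theorem 29 can be checked numerically on a concrete function. In this sketch (not from the notes) we take f(x) = x·|x|, which is differentiable away from 0 with f′(x) = 2|x|, so lim_{x→0} f′(x) = 0 and the theorem predicts f′(0) = 0:

```python
def f(x):
    return x * abs(x)   # f(x) = x|x|; for x != 0, f'(x) = 2|x|, so lim_{x->0} f'(x) = 0

# Theorem 29 predicts f'(0) = 0: the difference quotients at a = 0 tend to 0.
hs = [10.0 ** -k for k in range(1, 8)]
quotients = [(f(h) - f(0.0)) / h for h in hs]   # equals |h| here, up to rounding
assert all(abs(q) <= 2 * h for q, h in zip(quotients, hs))
```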
Let’s unpack this notation: it means, first, that for all x ∈ X, (g ◦
f )(x) = g(f (x)) = x; and second, that for all y ∈ Y , (f ◦ g)(y) = f (g(y)) = y.
Proposition 30. (Uniqueness of Inverse Functions) Let f : X → Y be a function.
Suppose that g1 , g2 : Y → X are both inverses of f . Then g1 = g2 .
Proof. For all y ∈ Y , we have
g1 (y) = (g2 ◦ f )(g1 (y)) = g2 (f (g1 (y))) = g2 (y):
the first equality holds since g2 is an inverse of f , and the last holds since
f (g1 (y)) = y. Thus g1 = g2 .
Since the inverse function to f is always unique provided it exists, we denote it by
f ⁻¹. (Caution: In general this has nothing to do with 1/f . Thus sin⁻¹(x) ≠ csc(x) =
1/sin x. Because this is legitimately confusing, many calculus texts write the inverse
sine function as arcsin x. But in general one needs to get used to f ⁻¹ being used
for the inverse function.)
We now turn to giving conditions for the existence of the inverse function. Re-
call that f : X → Y is injective if for all x1 , x2 ∈ X, x1 ̸= x2 =⇒ f (x1 ) ̸= f (x2 ).
In other words, distinct x-values get mapped to distinct y-values. (And in yet other
words, the graph of f satisfies the horizontal line test.) Also f : X → Y is surjec-
tive if for all y ∈ Y , there exists at least one x ∈ X such that y = f (x).
Putting these two concepts together we get the important notion of a bijective
function f : X → Y , i.e., a function which is both injective and surjective. Other-
wise put, for all y ∈ Y there exists exactly one x ∈ X such that y = f (x). It may
well be intuitively clear that bijectivity is exactly the condition needed to guarantee
existence of the inverse function: if f is bijective, we define f −1 (y) = xy , the unique
element of X such that f (xy ) = y. And if f is not bijective, this definition breaks
down and thus we are unable to define f −1 . Nevertheless we ask the reader to bear
with us as we give a slightly tedious formal proof of this.
Theorem 31. (Existence of Inverse Functions) For f : X → Y , TFAE:
(i) f is bijective.
(ii) f admits an inverse function.
Proof. (i) =⇒ (ii): If f is bijective, then as above, for each y ∈ Y there exists
exactly one element of X – say xy – such that f (xy ) = y. We may therefore define a
function g : Y → X by g(y) = xy . Let us verify that g is in fact the inverse function
of f . For any x ∈ X, consider g(f (x)). Because f is injective, the only element
x′ ∈ X such that f (x′ ) = f (x) is x′ = x, and thus g(f (x)) = x. For any y ∈ Y , let
xy be the unique element of X such that f (xy ) = y. Then f (g(y)) = f (xy ) = y.
(ii) =⇒ (i): Suppose that f −1 exists. To see that f is injective, let x1 , x2 ∈ X
be such that f (x1 ) = f (x2 ). Applying f −1 on the left gives x1 = f −1 (f (x1 )) =
f −1 (f (x2 )) = x2 . So f is injective. To see that f is surjective, let y ∈ Y . Then
f (f −1 (y)) = y, so there is x ∈ X with f (x) = y, namely x = f −1 (y).
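For finite sets the content of Theorem 31 can be made concrete: f has an inverse exactly when building the “reverse lookup” table succeeds. A small illustrative sketch (the helper inverse is hypothetical, not from the notes):

```python
def inverse(f, X, Y):
    """Return the inverse of f : X -> Y as a dict, or None if f is not bijective."""
    g = {}
    for x in X:
        y = f(x)
        if y in g:             # two x-values hit the same y: f is not injective
            return None
        g[y] = x
    if set(g) != set(Y):       # some y in Y is never hit: f is not surjective
        return None
    return g

g = inverse(lambda x: x**2, [0, 1, 2, 3], [0, 1, 4, 9])   # bijective onto this Y
assert g is not None and all(g[x**2] == x for x in [0, 1, 2, 3])
assert inverse(lambda x: x**2, [-1, 0, 1], [0, 1]) is None  # fails injectivity
```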
For any function f : X → Y , we define the image of f to be {y ∈ Y : there exists
x ∈ X with y = f (x)}. The image of f is often denoted f (X).7
7 This is sometimes called the range of f , but sometimes not. It is safer to call it the image!
Remark: Although we proved only a special case of the Interval Image Theorem,
in this case we proved a stronger result: if f is a continuous function defined on
a closed, bounded interval I, then f (I) is again a closed, bounded interval. One
might hope for analogues for other types of intervals, but in fact this is not true.
Exercise: Let I be a nonempty interval which is not of the form [a, b]. Let J
be any nonempty interval in R. Show that there is a continuous function f : I → R
with f (I) = J.
In this section our goal is to determine conditions under which the inverse f ⁻¹
of a differentiable function is differentiable, and if so to find a formula for (f ⁻¹)′.
Let’s first think about the problem geometrically. The graph of the inverse func-
tion y = f −1 (x) is obtained from the graph of y = f (x) by interchanging x and
y, or, put more geometrically, by reflecting the graph of y = f (x) across the line
y = x. Geometrically speaking y = f (x) is differentiable at x iff its graph has
a well-defined, nonvertical tangent line at the point (x, f (x)), and if a curve has
a well-defined tangent line, then reflecting it across a line should not change this.
Thus it should be the case that if f is differentiable, so is f −1 . Well, almost. Notice
the occurrence of “nonvertical” above: if a curve has a vertical tangent line, then
since a vertical line has “infinite slope” it does not have a finite-valued derivative.
So we need to worry about the possibility that reflection through y = x carries a
nonvertical tangent line to a vertical tangent line. When does this happen? Well,
the inverse function of the straight line y = mx + b is the straight line
y = (1/m)(x − b)
– i.e., reflecting across y = x takes a line of slope m to a line of slope 1/m. Moreover,
it takes a horizontal line y = c to a vertical line x = c, so that is our answer: at
any point (a, b) = (a, f (a)) such that f ′(a) = 0, the inverse function will fail
to be differentiable at the point (b, a) = (b, f ⁻¹(b)) because it will have a vertical
tangent. Otherwise, the slope of the tangent line of the inverse function at (b, a) is
precisely the reciprocal of the slope of the tangent line to y = f (x) at (a, b).
Well, so the geometry tells us. It turns out to be quite straightforward to adapt
this geometric argument to derive the desired formula for (f ⁻¹)′(b), under the
assumption that f ⁻¹ is differentiable at b. We will do this first. Then we need to come back
and verify that indeed f ⁻¹ is differentiable at b if f ′(f ⁻¹(b)) exists and is nonzero:
this turns out to be a bit stickier, but we are ready for it and we will do it.
8 Nevertheless in my lectures I did state the Monotone Jump Theorem at this point and use it
to give a second proof of the Continuous Inverse Function Theorem.
Proposition 38. Let f : I → J be a bijective differentiable function. Suppose
that the inverse function f ⁻¹ : J → I is differentiable at b ∈ J. Then
(f ⁻¹)′(b) = 1/f ′(f ⁻¹(b)).
In particular, if f ⁻¹ is differentiable at b then f ′(f ⁻¹(b)) ≠ 0.
Proof. We need only implicitly differentiate the equation
f ⁻¹(f (x)) = x,
getting
(3) (f ⁻¹)′(f (x)) f ′(x) = 1,
or
(f ⁻¹)′(f (x)) = 1/f ′(x).
To apply this to get the derivative at b ∈ J, we just need to think a little about our
variables. Let a = f ⁻¹(b), so f (a) = b. Evaluating the last equation at x = a gives
(f ⁻¹)′(b) = 1/f ′(a) = 1/f ′(f ⁻¹(b)).
Moreover, since by (3) we have (f ⁻¹)′(b) f ′(f ⁻¹(b)) = 1, f ′(f ⁻¹(b)) ≠ 0.
As mentioned above, unfortunately we need to work a little harder to show the
differentiability of f −1 , and for this we cannot directly use Proposition 38 but end
up deriving it again. Well, enough complaining: here goes.
Theorem 39. (Differentiable Inverse Function Theorem) Let f : I → J be con-
tinuous and bijective. Let b be an interior point of J and put a = f −1 (b). Suppose
that f is differentiable at a and f ′ (a) ̸= 0. Then f −1 is differentiable at b, with the
familiar formula
(f ⁻¹)′(b) = 1/f ′(a) = 1/f ′(f ⁻¹(b)).
Proof. [S, Thm. 12.5] We have
(f ⁻¹)′(b) = lim_{h→0} (f ⁻¹(b + h) − f ⁻¹(b))/h = lim_{h→0} (f ⁻¹(b + h) − a)/h.
Since J = f (I), every b + h ∈ J is of the form
b + h = f (a + k_h)
for a unique k_h with a + k_h ∈ I. Since b + h = f (a + k_h), we have
f ⁻¹(b + h) = a + k_h and h = f (a + k_h) − f (a); let's make this substitution in the
limit above. If we can show that k_h → 0 as h → 0, we may simply replace
"lim_{h→0}" with "lim_{k_h→0}" and we'll be done.
But k_h = f ⁻¹(b + h) − a, so – since f ⁻¹ is continuous by Theorem 37 – we have
lim_{h→0} k_h = lim_{h→0} f ⁻¹(b + h) − a = f ⁻¹(b + 0) − a = f ⁻¹(b) − a = a − a = 0.
So as h → 0, k_h → 0 and thus
(f ⁻¹)′(b) = 1 / lim_{k_h→0} (f (a + k_h) − f (a))/k_h = 1/f ′(a) = 1/f ′(f ⁻¹(b)).
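The formula of Theorem 39 is easy to sanity-check numerically. Here is a sketch (not in the notes), taking f = exp and f⁻¹ = log and comparing symmetric difference quotients:

```python
import math

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # symmetric difference quotient

f, finv = math.exp, math.log
b = 5.0
lhs = num_deriv(finv, b)              # (f^{-1})'(b), computed numerically
rhs = 1.0 / num_deriv(f, finv(b))     # 1 / f'(f^{-1}(b)), per Theorem 39
assert abs(lhs - rhs) < 1e-6
assert abs(lhs - 1.0 / b) < 1e-6      # and indeed log'(5) = 1/5
```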
Let a > 0 be a real number. How should we define a^x? In the following slightly
strange way: for any x ∈ R,
a^x := E(L(a)x).
Let us make two comments: first, if a = e this agrees with our previous definition:
e^x = E(xL(e)) = E(x). Second, the definition is motivated by the following
desirable law of exponents: (a^b)^c = a^{bc}. Indeed, assuming this holds unrestrictedly
for b, c ∈ R and a > 1, we would have
a^x = E(x log a) = e^{x log a} = (e^{log a})^x = a^x.
But here is the point: we do not wish to assume that the laws of exponents work
for all real numbers as they do for positive integers...we want to prove them!
Proposition 44. Fix a ∈ (0, ∞). For x ∈ R, we define
a^x := E(L(a)x).
If a ≠ 1, we define
log_a(x) = L(x)/L(a).
a) The function a^x is differentiable and (a^x)′ = L(a)a^x.
b) The function log_a x is differentiable and (log_a x)′ = 1/(L(a)x).
c) Suppose a > 1. Then a^x is increasing with image (0, ∞), log_a x is increasing
with image (−∞, ∞), and a^x and log_a x are inverse functions.
d) For all x, y ∈ R, a^{x+y} = a^x a^y.
e) For all x > 0 and y ∈ R, (a^x)^y = a^{xy}.
f ) For all x, y > 0, log_a(xy) = log_a x + log_a y.
g) For all x > 0 and y ∈ R, log_a(x^y) = y log_a x.
Proof. a) We have
(a^x)′ = E(L(a)x)′ = E′(L(a)x) · (L(a)x)′ = E(L(a)x) · L(a) = L(a)a^x.
b) We have
(log_a(x))′ = (L(x)/L(a))′ = 1/(L(a)x).
MATH 2400 LECTURE NOTES: DIFFERENTIATION 31
c) Since their derivatives are always positive, a^x and log_a x are both increasing
functions. Moreover, since a > 1, L(a) > 0 and thus
lim_{x→∞} a^x = lim_{x→∞} E(L(a)x) = E(∞) = ∞,
lim_{x→∞} log_a(x) = lim_{x→∞} L(x)/L(a) = ∞/L(a) = ∞.
Thus a^x : (−∞, ∞) → (0, ∞) and log_a x : (0, ∞) → (−∞, ∞) are bijective and thus
have inverse functions. To check that they are inverses of each other, it suffices
to show that either one of the two compositions is the identity function. Now
log_a(a^x) = L(a^x)/L(a) = L(E(L(a)x))/L(a) = L(a)x/L(a) = x.
d) We have
a^{x+y} = E(L(a)(x + y)) = E(L(a)x + L(a)y) = E(L(a)x)E(L(a)y) = a^x a^y.
e) We have
(a^x)^y = E(L(a^x)y) = E(L(E(L(a)x))y) = E(L(a)xy) = a^{xy}.
f) We have
log_a(xy) = L(xy)/L(a) = (L(x) + L(y))/L(a) = L(x)/L(a) + L(y)/L(a) = log_a x + log_a y.
g) We have
log_a(x^y) = L(x^y)/L(a) = L(E(L(x)y))/L(a) = L(x)y/L(a) = y log_a x.
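Parts c) through g) of Proposition 44 can be spot-checked in floating point by using math.exp and math.log as stand-ins for E and L. This is a numerical illustration, not a proof; the test values and tolerance are arbitrary choices:

```python
from math import exp, log

E, L = exp, log                     # stand-ins for the E and L of the notes

def power(a, x):
    return E(L(a) * x)              # a^x := E(L(a) x)

def log_a(a, x):
    return L(x) / L(a)              # log_a(x) := L(x) / L(a)

a, x, y = 3.0, 1.7, -0.4
tol = 1e-9
assert abs(power(a, x + y) - power(a, x) * power(a, y)) < tol     # d)
assert abs(power(power(a, x), y) - power(a, x * y)) < tol         # e)
u, v = 2.5, 7.1
assert abs(log_a(a, u * v) - (log_a(a, u) + log_a(a, v))) < tol   # f)
assert abs(log_a(a, power(u, y)) - y * log_a(a, u)) < tol         # g)
assert abs(log_a(a, power(a, x)) - x) < tol                       # c): inverses
```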
Having established all this, we now feel free to write e^x for E(x) and log x for L(x).
Exercise: Suppose 0 < a < 1. Show that a^x is decreasing with image (0, ∞),
log_a x is decreasing with image (−∞, ∞), and a^x and log_a x are inverse functions.
Exercise: Prove the change of base formula: for all a, b, c > 0 with a, c ≠ 1,
log_a b = log_c b / log_c a.
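A numerical spot-check of the change of base formula (an illustration, not a solution to the exercise; the base and argument values are arbitrary):

```python
from math import log

def log_base(base, x):
    return log(x) / log(base)        # log_base(x) via natural logs

a, b, c = 2.0, 10.0, 7.0
# change of base: log_a b = log_c b / log_c a
assert abs(log_base(a, b) - log_base(c, b) / log_base(c, a)) < 1e-12
```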
7.3. Some inverse trigonometric functions.
We now wish to consider inverses of the trigonometric functions: sine, cosine, tan-
gent, and so forth. Right away we encounter a problem similar to the case of xn for
even n: the trigonometric functions are periodic, hence certainly not injective on
their entire domain. Once again we are forced into the art of domain restriction
(as opposed to the science of codomain restriction).
Consider first f (x) = sin x. To get an inverse function, we need to restrict the
domain to some subset S on which f is injective. As usual we like intervals, and a
little thought shows that the maximal possible length of an interval on which the
sine function is injective is π, attained by any interval on which the function either
increases from −1 to 1 or decreases from 1 to −1. This still gives us choices to make.
The most standard choice – but to be sure, not the only possible one – is [−π/2, π/2],
giving the inverse function arcsin : [−1, 1] → [−π/2, π/2].
Now consider f (x) = cos x. Since f is even, it is not injective on any interval
containing 0 in its interior. Reflecting a bit on the graph of f (x) = cos x one sees
that a reasonable choice for the restricted domain is [0, π]: since f ′(x) = − sin x is
negative on (0, π) and zero at 0 and π, f (x) is decreasing on [0, π] and hence injective
there. Its image is f ([0, π]) = [−1, 1]. Therefore we have an inverse function
arccos : [−1, 1] → [0, π].
Since cos x is continuous, so is arccos x. Since cos x is differentiable and has zero
derivative only at 0 and π, arccos x is differentiable on (−1, 1) and has vertical tan-
gent lines at x = −1 and x = 1. Moreover, since cos x is decreasing, so is arccos x.
We find a formula for the derivative of the arccos function just as we did for arcsin
above: differentiating the identity
cos arccos x = x
gives
− sin(arccos x) · arccos′ x = 1,
or
arccos′ x = −1/sin(arccos x).
Again, this may be simplified. If φ = arccos x, then x = cos φ, so if we are on the
unit circle then the y-coordinate is sin φ = √(1 − x²), and thus
arccos′ x = −1/√(1 − x²).
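The formula arccos′ x = −1/√(1 − x²) can be confirmed against difference quotients of math.acos (a numerical sketch; the sample points are arbitrary interior points of (−1, 1)):

```python
import math

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # symmetric difference quotient

# compare the numerical derivative of acos with -1/sqrt(1 - x^2)
for x in [-0.9, -0.3, 0.0, 0.5, 0.8]:
    predicted = -1.0 / math.sqrt(1.0 - x * x)
    assert abs(num_deriv(math.acos, x) - predicted) < 1e-5
```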
Remark: It is hard not to notice that the derivatives of the arcsine and the arccosine
are simply negatives of each other, so for all x ∈ (−1, 1),
arccos′ x + arcsin′ x = 0.
By the Zero Velocity Theorem, we conclude
arccos x + arcsin x = C
for some constant C. To determine C, simply evaluate at x = 0:
C = arccos 0 + arcsin 0 = π/2 + 0 = π/2,
and thus – using continuity at the endpoints – for all x ∈ [−1, 1] we have
arccos x + arcsin x = π/2.
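The identity arccos x + arcsin x = π/2 is easy to confirm in floating point across the interval [−1, 1] (illustration only; the sample points are arbitrary):

```python
import math

# arccos x + arcsin x = pi/2 should hold for every x in [-1, 1]
for x in [-1.0, -0.42, 0.0, 0.37, 1.0]:
    assert abs(math.acos(x) + math.asin(x) - math.pi / 2) < 1e-12
```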
Thus the angle θ whose sine is x is complementary to the angle φ whose cosine is
x. A little thought should convince you that this is a familiar fact.
Finally, consider f (x) = tan x = sin x / cos x. The domain is all real numbers for which
cos x ≠ 0, so all real numbers except ±π/2, ±3π/2, . . .. The tangent function is
periodic with period π and also odd, which suggests that, as with the sine function, we
should restrict this domain to the largest interval about 0 on which f is defined and
injective. Since f ′(x) = sec² x > 0, f is increasing on (−π/2, π/2) and thus is injective
there. Moreover, lim_{x→±π/2} tan x = ±∞, so by the Intermediate Value Theorem
f ((−π/2, π/2)) = R. Therefore we have an inverse function
arctan : R → (−π/2, π/2).
Since the tangent function is differentiable with everywhere positive derivative, the
same is true for arctan x. In particular it is increasing, but not without bound:
we have lim_{x→±∞} arctan x = ±π/2. In other words the arctangent has horizontal
asymptotes at y = ±π/2.
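The qualitative facts about the arctangent (increasing, bounded by ±π/2, with horizontal asymptotes) can likewise be observed numerically; an illustrative sketch with arbitrarily chosen sample points:

```python
import math

xs = [-1e6, -10.0, 0.0, 10.0, 1e6]
vals = [math.atan(x) for x in xs]
assert all(a < b for a, b in zip(vals, vals[1:]))          # strictly increasing
assert all(-math.pi / 2 < v < math.pi / 2 for v in vals)   # values stay in (-pi/2, pi/2)
assert abs(math.atan(1e10) - math.pi / 2) < 1e-9           # horizontal asymptote at pi/2
assert abs(math.atan(-1e10) + math.pi / 2) < 1e-9          # and at -pi/2
```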
References
[S] M. Spivak, Calculus. Fourth edition.