Engineering Mathematics Final Ok

The course 'Engineering Mathematics I' aims to introduce mathematical concepts relevant to automotive and power engineering, teach problem identification using mathematical techniques, and develop decision-making skills based on mathematical solutions. The content includes set theory, cardinality, and basic operations on sets, along with historical context and axiomatic foundations of set theory. Key concepts such as union, intersection, and Cartesian products are also covered, emphasizing their importance in mathematical logic and applications.


COURSE NAME: ENGINEERING MATHEMATICS I

OBJECTIVES

The objectives of this course are designed to:


- Introduce students to the mathematical concepts applicable in automotive and power engineering applications
- Teach students how to identify problems that can be solved using mathematical techniques
- Enable students to develop a rationale for decision-making based on mathematical solutions
- Train students in using graphical representations of mathematical concepts
DESCRIPTION OF THE CONTENT

PART ONE
CHAPTER 1: ELEMENTS OF MATHEMATICS
1.1. Set theory

Sets

Writing A = {1, 2, 3, 4} means that the elements of the set A are the
numbers 1, 2, 3 and 4. Sets of elements of A, for example {1, 2}, are
subsets of A.

Sets can themselves be elements. For example, consider the set
B = {1, 2, {3, 4}}. The elements of B are not 1, 2, 3, and 4. Rather, there
are only three elements of B, namely the numbers 1 and 2, and the set
{3, 4}.

The elements of a set can be anything. For example,


C = { red, green, blue }, is the set whose elements are the colors red,
green and blue.

Notation and terminology

First usage of the symbol ϵ in the work Arithmetices principia nova
methodo exposita by Giuseppe Peano.

The relation "is an element of", also called set membership, is denoted
by the symbol "∈". Writing x ∈ A means that "x is an element of A".
Equivalent expressions are "x is a member of A", "x belongs to A", "x is
in A" and "x lies in A". The expressions "A includes x" and "A contains
x" are also used to mean set membership, however some authors use
them to mean instead "x is a subset of A".[1] Logician George Boolos
strongly urged that "contains" be used for membership only and
"includes" for the subset relation only.[2]

Another possible notation for the same relation is A ∋ x, meaning
"A contains x", though it is used less often.

The negation of set membership is denoted by the symbol "∉". Writing
x ∉ A means that "x is not an element of A".

The symbol ϵ was first used by Giuseppe Peano in 1889 in his work
Arithmetices principia nova methodo exposita. Here he wrote on page
X:

"Signum ϵ significat est. Ita a ϵ b legitur a est quoddam b; ..."

which means

"The symbol ϵ means is. So a ϵ b is read as a is a b; ..."

The symbol itself is a stylized lowercase Greek letter epsilon ("ε"), the
first letter of the word ἐστί, which means "is".

The Unicode characters for these symbols are U+2208 ('element of'),
U+220B ('contains as member') and U+2209 ('not an element of'). The
equivalent LaTeX commands are "\in", "\ni" and "\notin". Mathematica
has commands "\[Element]" and "\[NotElement]".
Cardinality of sets

The number of elements in a particular set is a property known as
cardinality; informally, this is the size of a set. In the above examples the
cardinality of the set A is 4, while the cardinality of either of the sets B
and C is 3. An infinite set is a set with an infinite number of elements,
while a finite set is a set with a finite number of elements. The above
examples are examples of finite sets. An example of an infinite set is the
set of positive integers = { 1, 2, 3, 4, ... }.

Examples

Using the sets defined above, namely A = {1, 2, 3, 4},
B = {1, 2, {3, 4}} and C = {red, green, blue}:

• 2∈A
• {3,4} ∈ B
• {3,4} is a member of B
• yellow ∉ C
• The cardinality of D = { 2, 4, 8, 10, 12 } is finite and equal to 5.
• The cardinality of P = { 2, 3, 5, 7, 11, 13, ...} (the prime numbers)
is infinite (this was proven by Euclid).
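The finite examples above can be checked concretely. The following Python sketch (Python is used purely for illustration; frozenset is needed because Python's mutable sets cannot be elements of other sets) mirrors the membership and cardinality claims:

```python
# Membership and cardinality for the example sets A, B, C and D.
A = {1, 2, 3, 4}
B = {1, 2, frozenset({3, 4})}       # the set {3, 4} is itself an element
C = {"red", "green", "blue"}
D = {2, 4, 8, 10, 12}

assert 2 in A                       # 2 ∈ A
assert frozenset({3, 4}) in B       # {3, 4} is a member of B
assert "yellow" not in C            # yellow ∉ C
assert len(A) == 4 and len(B) == 3  # cardinalities of A and B
assert len(D) == 5                  # cardinality of D
```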

Set theory
A Venn diagram illustrating the intersection of two sets.

Set theory is the branch of mathematical logic that studies sets, which
informally are collections of objects. Although any type of object can be
collected into a set, set theory is applied most often to objects that are
relevant to mathematics. The language of set theory can be used in the
definitions of nearly all mathematical objects.

The modern study of set theory was initiated by Georg Cantor and
Richard Dedekind in the 1870s. After the discovery of paradoxes in
naive set theory, numerous axiom systems were proposed in the early
twentieth century, of which the Zermelo–Fraenkel axioms, with the
axiom of choice, are the best-known.

Set theory is commonly employed as a foundational system for
mathematics, particularly in the form of Zermelo–Fraenkel set theory
with the axiom of choice. Beyond its foundational role, set theory is a
branch of mathematics in its own right, with an active research
community. Contemporary research into set theory includes a diverse
collection of topics, ranging from the structure of the real number line to
the study of the consistency of large cardinals.

History

Georg Cantor.

Mathematical topics typically emerge and evolve through interactions
among many researchers. Set theory, however, was founded by a single
paper in 1874 by Georg Cantor: "On a Property of the Collection of All
Real Algebraic Numbers".

Since the 5th century BC, beginning with Greek mathematician Zeno of
Elea in the West and early Indian mathematicians in the East,
mathematicians had struggled with the concept of infinity. Especially
notable is the work of Bernard Bolzano in the first half of the 19th
century.[3] Modern understanding of infinity began in 1867–71, with
Cantor's work on number theory. An 1872 meeting between Cantor and
Richard Dedekind influenced Cantor's thinking and culminated in
Cantor's 1874 paper.

Cantor's work initially polarized the mathematicians of his day. While
Karl Weierstrass and Dedekind supported Cantor, Leopold Kronecker,
now seen as a founder of mathematical constructivism, did not.
Cantorian set theory eventually became widespread, due to the utility of
Cantorian concepts, such as one-to-one correspondence among sets, his
proof that there are more real numbers than integers, and the "infinity of
infinities" ("Cantor's paradise") resulting from the power set operation.
This utility of set theory led to the article "Mengenlehre" contributed in
1898 by Arthur Schoenflies to Klein's encyclopedia.

The next wave of excitement in set theory came around 1900, when it
was discovered that Cantorian set theory gave rise to several
contradictions, called antinomies or paradoxes. Bertrand Russell and
Ernst Zermelo independently found the simplest and best known
paradox, now called Russell's paradox: consider "the set of all sets that
are not members of themselves", which leads to a contradiction since it
must be a member of itself, and not a member of itself. In 1899 Cantor
had himself posed the question "What is the cardinal number of the set
of all sets?", and obtained a related paradox. Russell used his paradox as
a theme in his 1903 review of continental mathematics in his The
Principles of Mathematics.

In 1906 English readers were treated to Theory of Sets of Points[4] by
William Henry Young and his wife Grace Chisholm Young, published
by Cambridge University Press.

The momentum of set theory was such that debate on the paradoxes did
not lead to its abandonment. The work of Zermelo in 1908 and Abraham
Fraenkel in 1922 resulted in the set of axioms ZFC, which became the
most commonly used set of axioms for set theory. The work of analysts
such as Henri Lebesgue demonstrated the great mathematical utility of
set theory, which has since become woven into the fabric of modern
mathematics. Set theory is commonly used as a foundational system,
although in some areas category theory is thought to be a preferred
foundation.

Basic concepts and notation

Set theory begins with a fundamental binary relation between an object o
and a set A. If o is a member (or element) of A, write o ∈ A. Since
sets are objects, the membership relation can relate sets as well.

A derived binary relation between two sets is the subset relation, also
called set inclusion. If all the members of set A are also members of set
B, then A is a subset of B, denoted A ⊆ B. For example, {1, 2} is a
subset of {1, 2, 3}, and so is {2}, but {1, 4} is not. From this definition,
it is clear that a set is a subset of itself; for cases where one wishes to
rule this out, the term proper subset is defined. A is called a proper
subset of B if and only if A is a subset of B, but B is not a subset of A.
Note also that 1 and 2 and 3 are members (elements) of set {1, 2, 3} , but
are not subsets, and the subsets in turn are not as such members of the
set.
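The subset and proper-subset distinctions above can be illustrated with Python's set comparison operators (a sketch for illustration only; <= denotes subset and < denotes proper subset):

```python
A = {1, 2}
B = {1, 2, 3}

assert A <= B            # A ⊆ B: every member of A is a member of B
assert {2} <= B          # {2} is also a subset of B
assert not {1, 4} <= B   # {1, 4} is not
assert B <= B            # every set is a subset of itself
assert A < B             # proper subset: A ⊆ B but B is not a subset of A
assert not B < B         # no set is a proper subset of itself

# 1 is a member of B but not a subset of it; {1} is a subset, not a member.
assert 1 in B and {1} <= B
```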

Just as arithmetic features binary operations on numbers, set theory
features binary operations on sets. The:

• Union of the sets A and B, denoted A ∪ B, is the set of all objects
that are a member of A, or B, or both. The union of {1, 2, 3} and
{2, 3, 4} is the set {1, 2, 3, 4}.
• Intersection of the sets A and B, denoted A ∩ B, is the set of all
objects that are members of both A and B. The intersection of {1,
2, 3} and {2, 3, 4} is the set {2, 3} .

• Set difference of U and A, denoted U \ A, is the set of all members
of U that are not members of A. The set difference {1, 2, 3} \ {2, 3,
4} is {1} , while, conversely, the set difference {2, 3, 4} \ {1, 2, 3}
is {4} . When A is a subset of U, the set difference U \ A is also
called the complement of A in U. In this case, if the choice of U is
clear from the context, the notation Ac is sometimes used instead
of U \ A, particularly if U is a universal set as in the study of Venn
diagrams.
• Symmetric difference of sets A and B, denoted A △ B or A ⊖ B,
is the set of all objects that are a member of exactly one of A and B
(elements which are in one of the sets, but not in both). For
instance, for the sets {1, 2, 3} and {2, 3, 4} , the symmetric
difference set is {1, 4} . It is the set difference of the union and the
intersection, (A ∪ B) \ (A ∩ B) or (A \ B) ∪ (B \ A).
• Cartesian product of A and B, denoted A × B, is the set whose
members are all possible ordered pairs (a, b) where a is a member
of A and b is a member of B. The cartesian product of {1, 2} and
{red, white} is {(1, red), (1, white), (2, red), (2, white)}.
• Power set of a set A is the set whose members are all possible
subsets of A. For example, the power set of {1, 2} is { {}, {1},
{2}, {1, 2} } .
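Each of these binary operations has a direct counterpart on Python's built-in set type, so the worked examples above can be verified mechanically. This is an illustrative sketch (the power_set helper is not a standard library function):

```python
from itertools import combinations, product

A = {1, 2, 3}
B = {2, 3, 4}

assert A | B == {1, 2, 3, 4}       # union
assert A & B == {2, 3}             # intersection
assert A - B == {1}                # set difference A \ B
assert B - A == {4}
assert A ^ B == {1, 4}             # symmetric difference
assert A ^ B == (A | B) - (A & B)  # difference of the union and intersection

# Cartesian product as a set of ordered pairs
assert set(product({1, 2}, {"red", "white"})) == {
    (1, "red"), (1, "white"), (2, "red"), (2, "white")}

def power_set(s):
    """All subsets of s, returned as frozensets."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

assert power_set({1, 2}) == {frozenset(), frozenset({1}),
                             frozenset({2}), frozenset({1, 2})}
```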

Some basic sets of central importance are the empty set (the unique set
containing no elements), the set of natural numbers, and the set of real
numbers.

Some ontology

An initial segment of the von Neumann hierarchy.

A set is pure if all of its members are sets, all members of its members
are sets, and so on. For example, the set {{}} containing only the empty
set is a nonempty pure set. In modern set theory, it is common to restrict
attention to the von Neumann universe of pure sets, and many systems
of axiomatic set theory are designed to axiomatize the pure sets only.
There are many technical advantages to this restriction, and little
generality is lost, because essentially all mathematical concepts can be
modeled by pure sets. Sets in the von Neumann universe are organized
into a cumulative hierarchy, based on how deeply their members,
members of members, etc. are nested. Each set in this hierarchy is
assigned (by transfinite recursion) an ordinal number α, known as its
rank. The rank of a pure set X is defined to be the least upper bound of
all successors of ranks of members of X. For example, the empty set is
assigned rank 0, while the set {{}} containing only the empty set is
assigned rank 1. For each ordinal α, the set Vα is defined to consist of all
pure sets with rank less than α. The entire von Neumann universe is
denoted V.

Axiomatic set theory

Elementary set theory can be studied informally and intuitively, and so
can be taught in primary schools using Venn diagrams. The intuitive
approach tacitly assumes that a set may be formed from the class of all
objects satisfying any particular defining condition. This assumption
gives rise to paradoxes, the simplest and best known of which are
Russell's paradox and the Burali-Forti paradox. Axiomatic set theory
was originally devised to rid set theory of such paradoxes.[5]

The most widely studied systems of axiomatic set theory imply that all
sets form a cumulative hierarchy. Such systems come in two flavors,
those whose ontology consists of:

• Sets alone. This includes the most common axiomatic set theory,
Zermelo–Fraenkel set theory (ZFC), which includes the axiom
of choice. Fragments of ZFC include:
o Zermelo set theory, which replaces the axiom schema of
replacement with that of separation;
o General set theory, a small fragment of Zermelo set theory
sufficient for the Peano axioms and finite sets;
o Kripke–Platek set theory, which omits the axioms of infinity,
powerset, and choice, and weakens the axiom schemata of
separation and replacement.

• Sets and proper classes. These include Von Neumann–Bernays–
Gödel set theory, which has the same strength as ZFC for theorems
about sets alone, and Morse-Kelley set theory and Tarski–
Grothendieck set theory, both of which are stronger than ZFC.

The above systems can be modified to allow urelements, objects that
can be members of sets but that are not themselves sets and do not have
any members.

The systems of New Foundations NFU (allowing urelements) and NF
(lacking them) are not based on a cumulative hierarchy. NF and NFU
include a "set of everything, " relative to which every set has a
complement. In these systems urelements matter, because NF, but not
NFU, produces sets for which the axiom of choice does not hold.

Systems of constructive set theory, such as CST, CZF, and IZF, embed
their set axioms in intuitionistic instead of classical logic. Yet other
systems accept classical logic but feature a nonstandard membership
relation. These include rough set theory and fuzzy set theory, in which
the value of an atomic formula embodying the membership relation is
not simply True or False. The Boolean-valued models of ZFC are a
related subject.

An enrichment of ZFC called Internal Set Theory was proposed by
Edward Nelson in 1977.

Applications

Many mathematical concepts can be defined precisely using only set-
theoretic concepts. For example, mathematical structures as diverse as
graphs, manifolds, rings, and vector spaces can all be defined as sets
satisfying various (axiomatic) properties. Equivalence and order
relations are ubiquitous in mathematics, and the theory of mathematical
relations can be described in set theory.

Set theory is also a promising foundational system for much of
mathematics. Since the publication of the first volume of Principia
Mathematica, it has been claimed that most or even all mathematical
theorems can be derived using an aptly designed set of axioms for set
theory, augmented with many definitions, using first or second order
logic. For example, properties of the natural and real numbers can be
derived within set theory, as each number system can be identified with
a set of equivalence classes under a suitable equivalence relation whose
field is some infinite set.

Set theory as a foundation for mathematical analysis, topology, abstract
algebra, and discrete mathematics is likewise uncontroversial;
mathematicians accept that (in principle) theorems in these areas can be
derived from the relevant definitions and the axioms of set theory. Few
full derivations of complex mathematical theorems from set theory have
been formally verified, however, because such formal derivations are
often much longer than the natural language proofs mathematicians
commonly present. One verification project, Metamath, includes
human-written, computer-verified derivations of more than 12,000
theorems starting from ZFC set theory, first-order logic and
propositional logic.

Areas of study

Set theory is a major area of research in mathematics, with many
interrelated subfields.

Combinatorial set theory

Combinatorial set theory concerns extensions of finite combinatorics
to infinite sets. This includes the study of cardinal arithmetic and the
study of extensions of Ramsey's theorem such as the Erdős–Rado
theorem.

Descriptive set theory

Descriptive set theory is the study of subsets of the real line and, more
generally, subsets of Polish spaces. It begins with the study of
pointclasses in the Borel hierarchy and extends to the study of more
complex hierarchies such as the projective hierarchy and the Wadge
hierarchy. Many properties of Borel sets can be established in ZFC, but
proving these properties hold for more complicated sets requires
additional axioms related to determinacy and large cardinals.

The field of effective descriptive set theory is between set theory and
recursion theory. It includes the study of lightface pointclasses, and is
closely related to hyperarithmetical theory. In many cases, results of
classical descriptive set theory have effective versions; in some cases,
new results are obtained by proving the effective version first and then
extending ("relativizing") it to make it more broadly applicable.

A recent area of research concerns Borel equivalence relations and more
complicated definable equivalence relations. This has important
applications to the study of invariants in many fields of mathematics.

Fuzzy set theory

In set theory as Cantor defined it and Zermelo and Fraenkel
axiomatized it, an object is either a member of a set or not. In fuzzy set
theory this
condition was relaxed by Lotfi A. Zadeh so an object has a degree of
membership in a set, a number between 0 and 1. For example, the degree
of membership of a person in the set of "tall people" is more flexible
than a simple yes or no answer and can be a real number such as 0.75.
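A fuzzy set can be modelled minimally as a mapping from objects to membership degrees. The sketch below uses a Python dict with hypothetical example values (the names and the 0.75/0.30/1.0 degrees are illustrative, not from the text):

```python
# Fuzzy membership: a degree in [0, 1] instead of a yes/no answer.
tall_people = {"Ann": 0.75, "Bob": 0.30, "Cleo": 1.0}  # illustrative values

def membership(fuzzy_set, x):
    """Degree to which x belongs to the fuzzy set; 0.0 if absent."""
    return fuzzy_set.get(x, 0.0)

assert membership(tall_people, "Ann") == 0.75
assert membership(tall_people, "Dave") == 0.0
assert all(0.0 <= d <= 1.0 for d in tall_people.values())
```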

Inner model theory

An inner model of Zermelo–Fraenkel set theory (ZF) is a transitive
class that includes all the ordinals and satisfies all the axioms of ZF. The
canonical example is the constructible universe L developed by Gödel.
One reason that the study of inner models is of interest is that it can be
used to prove consistency results. For example, it can be shown that
regardless of whether a model V of ZF satisfies the continuum
hypothesis or the axiom of choice, the inner model L constructed inside
the original model will satisfy both the generalized continuum
hypothesis and the axiom of choice. Thus the assumption that ZF is
consistent (has at least one model) implies that ZF together with these
two principles is consistent.

The study of inner models is common in the study of determinacy and
large cardinals, especially when considering axioms such as the axiom
of determinacy that contradict the axiom of choice. Even if a fixed
model of set theory satisfies the axiom of choice, it is possible for an
inner model to fail to satisfy the axiom of choice. For example, the
existence of sufficiently large cardinals implies that there is an inner
model satisfying the axiom of determinacy (and thus not satisfying the
axiom of choice).

Large cardinals

A large cardinal is a cardinal number with an extra property. Many
such properties are studied, including inaccessible cardinals, measurable
cardinals, and many more. These properties typically imply the cardinal
number must be very large, with the existence of a cardinal with the
specified property unprovable in Zermelo-Fraenkel set theory.

Determinacy

Determinacy refers to the fact that, under appropriate assumptions,
certain two-player games of perfect information are determined from the
start in the sense that one player must have a winning strategy. The
existence of these strategies has important consequences in descriptive
set theory, as the assumption that a broader class of games is determined
often implies that a broader class of sets will have a topological
property. The axiom of determinacy (AD) is an important object of
study; although incompatible with the axiom of choice, AD implies that
all subsets of the real line are well behaved (in particular, measurable
and with the perfect set property). AD can be used to prove that the
Wadge degrees have an elegant structure.

Forcing

Paul Cohen invented the method of forcing while searching for a model
of ZFC in which the continuum hypothesis fails, or a model of ZF in
which the axiom of choice fails. Forcing adjoins to some given model of
set theory additional sets in order to create a larger model with
properties determined (i.e. "forced") by the construction and the original
model. For example, Cohen's construction adjoins additional subsets of
the natural numbers without changing any of the cardinal numbers of the
original model. Forcing is also one of two methods for proving relative
consistency by finitistic methods, the other method being Boolean-valued
models.

Cardinal invariants

A cardinal invariant is a property of the real line measured by a
cardinal number. For example, a well-studied invariant is the smallest
cardinality of a collection of meagre sets of reals whose union is the
entire real line. These are invariants in the sense that any two isomorphic
models of set theory must give the same cardinal for each invariant.
Many cardinal invariants have been studied, and the relationships
between them are often complex and related to axioms of set theory.

Set-theoretic topology

Set-theoretic topology studies questions of general topology that are
set-theoretic in nature or that require advanced methods of set theory for
their solution. Many of these theorems are independent of ZFC,
requiring stronger axioms for their proof. A famous problem is the
normal Moore space question, a question in general topology that was
the subject of intense research. The answer to the normal Moore space
question was eventually proved to be independent of ZFC.

Objections to set theory as a foundation for mathematics

From set theory's inception, some mathematicians have objected to it as
a foundation for mathematics. The most common objection to set theory,
one Kronecker voiced in set theory's earliest years, starts from the
constructivist view that mathematics is loosely related to computation. If
this view is granted, then the treatment of infinite sets, both in naive and
in axiomatic set theory, introduces into mathematics methods and
objects that are not computable even in principle. The feasibility of
constructivism as a substitute foundation for mathematics was greatly
increased by Errett Bishop's influential book Foundations of
Constructive Analysis.

A different objection put forth by Henri Poincaré is that defining sets
using the axiom schemas of specification and replacement, as well as the
axiom of power set, introduces impredicativity, a type of circularity, into
the definitions of mathematical objects. The scope of predicatively
founded mathematics, while less than that of the commonly accepted
Zermelo-Fraenkel theory, is much greater than that of constructive
mathematics, to the point that Solomon Feferman has said that "all of
scientifically applicable analysis can be developed [using predicative
methods]".

Ludwig Wittgenstein condemned set theory. He wrote that "set theory is
wrong", since it builds on the "nonsense" of fictitious symbolism, has
"pernicious idioms", and that it is nonsensical to talk about "all
numbers".[9] Wittgenstein's views about the foundations of mathematics
were later criticised by Georg Kreisel and Paul Bernays, and
investigated by Crispin Wright, among others.

Category theorists have proposed topos theory as an alternative to
traditional axiomatic set theory. Topos theory can interpret various
alternatives to that theory, such as constructivism, finite set theory, and
computable set theory.[10] Topoi also give a natural setting for forcing
and discussions of the independence of choice from ZF, as well as
providing the framework for pointless topology and Stone spaces.[11]

An active area of research is the univalent foundations arising from
homotopy type theory. Here, sets may be defined as certain kinds of
types, with universal properties of sets arising from higher inductive
types. Principles such as the axiom of choice and the law of the excluded
middle appear in a spectrum of different forms, some of which can be
proven, others which correspond to the classical notions; this allows for
a detailed discussion of the effect of these axioms on mathematics.

1.2. THEORY OF RELATIONS AND FUNCTIONS

Set Theory/Relations

Ordered pairs

To define relations on sets we must have a concept of an ordered pair, as
opposed to the unordered pairs the axiom of pair gives. To have a
rigorous definition of ordered pair, we aim to satisfy one important
property, namely, for sets a, b, c and d,

(a, b) = (c, d) if and only if a = c and b = d.

As it stands, there are many ways to define an ordered pair to satisfy this
property. A simple definition, then, is (a, b) = {{a}, {a, b}}. (This is true
simply by definition. It is a convention that we can usefully build upon,
and has no deeper significance.)

Theorem

(a, b) = (c, d) if and only if a = c and b = d.

Proof

If a = c and b = d, then (a, b) = {{a}, {a, b}} = {{c}, {c, d}} = (c, d).

Conversely, suppose {{a}, {a, b}} = {{c}, {c, d}}.

If a = b, then (a, b) = {{a}}, so {c} = {c, d} = {a}. Thus a = c, and
d = c = a = b.

If a ≠ b, then {a, b} has two elements, so {a, b} ≠ {c} and hence
{a, b} = {c, d}. Also {a} ∈ {{c}, {c, d}}; if {a} = {c, d}, then
c = d = a, which forces {a, b} = {c, d} = {a} and a = b, a
contradiction. So {a} = {c}, giving a = c. Finally, since
{a, b} = {c, d} = {a, d} and b ≠ a, we get b = d. ∎
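The Kuratowski encoding of ordered pairs and its characteristic property can be sketched in Python using nested frozensets (illustrative only; pair is a hypothetical helper, not a library function):

```python
def pair(a, b):
    """Kuratowski ordered pair: (a, b) encoded as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# The characteristic property: (a, b) = (c, d) iff a = c and b = d.
assert pair(1, 2) == pair(1, 2)
assert pair(1, 2) != pair(2, 1)                   # order matters
assert pair(1, 1) == frozenset({frozenset({1})})  # collapses to {{a}} when a = b
```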

Relations

Using the definition of ordered pairs, we now introduce the notion of a
binary relation.

The simplest definition of a binary relation is a set of ordered pairs.
More formally, a set R is a relation if every element of R is an ordered
pair (x, y) for some x, y. Instead of (x, y) ∈ R we can simplify the
notation and write R(x, y) or simply xRy.

We give a few useful definitions of sets used when speaking of relations.

• The domain of a relation R is defined as
dom R = {x : there exists y such that (x, y) ∈ R}, or all sets that are
the initial member of an ordered pair contained in R.

• The range of a relation R is defined as
ran R = {y : there exists x such that (x, y) ∈ R}, or all sets that are
the final member of an ordered pair contained in R.

• The union of the domain and range, dom R ∪ ran R, is called the
field of R.

• A relation R is a relation on a set X if R ⊆ X × X (equivalently, if
its field is a subset of X).

• The inverse of R is the set R⁻¹ = {(y, x) : (x, y) ∈ R}.

• The image of a set A under a relation R is defined as
R[A] = {y : there exists x ∈ A such that (x, y) ∈ R}.

• The preimage of a set B under a relation R is the image of B over
R⁻¹, or R⁻¹[B].
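Representing a relation as a Python set of tuples, the definitions above become one-line comprehensions. A sketch with an arbitrary example relation (the helper image is hypothetical):

```python
R = {(1, "a"), (1, "b"), (2, "a"), (3, "c")}

dom = {x for (x, y) in R}        # domain: initial members of the pairs
ran = {y for (x, y) in R}        # range: final members of the pairs
assert dom == {1, 2, 3}
assert ran == {"a", "b", "c"}
assert dom | ran == {1, 2, 3, "a", "b", "c"}   # the field of R

R_inv = {(y, x) for (x, y) in R}               # inverse relation
assert ("a", 1) in R_inv and ("c", 3) in R_inv

def image(rel, A):
    """Image of A under rel: all y with (x, y) in rel for some x in A."""
    return {y for (x, y) in rel if x in A}

assert image(R, {1}) == {"a", "b"}
assert image(R_inv, {"a"}) == {1, 2}   # the preimage of {"a"} under R
```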

It is intuitive, when considering a relation, to seek to construct more
relations from it, or to combine it with others.

We can compose two relations R and S to form one relation
S ∘ R = {(x, z) : there exists y such that (x, y) ∈ R and (y, z) ∈ S}.
So (x, z) ∈ S ∘ R means that there is some y such that (x, y) ∈ R and
(y, z) ∈ S.
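Composition of relations-as-pair-sets can be written directly from this definition (an illustrative sketch; compose is a hypothetical helper):

```python
def compose(S, R):
    """S ∘ R: pairs (x, z) with (x, y) in R and (y, z) in S for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

R = {(1, "a"), (2, "b"), (3, "c")}
S = {("a", "X"), ("b", "Y")}
assert compose(S, R) == {(1, "X"), (2, "Y")}   # 3 has no continuation via S
```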

We can define a few useful binary relations as examples:

1. The Cartesian product of two sets A and B is
A × B = {(a, b) : a ∈ A and b ∈ B}, or the set where all elements
of A are related to all elements of B. As an exercise, show that all
relations from A to B are subsets of A × B.
2. The membership relation on a set A, ∈_A = {(x, y) ∈ A × A : x ∈ y}.
3. The identity relation on A, Id_A = {(x, x) : x ∈ A}.

The following properties may or may not hold for a relation R on a set
X:

• R is reflexive if xRx holds for all x in X.
• R is symmetric if xRy implies yRx for all x and y in X.
• R is antisymmetric if xRy and yRx together imply that x = y for all
x and y in X.
• R is transitive if xRy and yRz together imply that xRz holds for all
x, y, and z in X.
• R is total if xRy, yRx, or both hold for all x and y in X.
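Each property above is a straightforward quantified check over the pairs of a finite relation. A sketch (the is_* helpers are hypothetical names), tested here on the relation ≤ over {1, 2, 3}:

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R
               for (x, y) in R for (y2, z) in R if y == y2)

X = {1, 2, 3}
leq = {(x, y) for x in X for y in X if x <= y}   # the relation ≤ on X

assert is_reflexive(leq, X)
assert is_antisymmetric(leq)
assert is_transitive(leq)
assert not is_symmetric(leq)     # (1, 2) is in ≤ but (2, 1) is not
```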

Binary relation

In mathematics, a binary relation on a set A is a collection of ordered
pairs of elements of A. In other words, it is a subset of the Cartesian
product A² = A × A. More generally, a binary relation between two sets
A and B is a subset of A × B. The terms correspondence, dyadic
relation and 2-place relation are synonyms for binary relation.

An example is the "divides" relation between the set of prime numbers P
and the set of integers Z, in which every prime p is associated with every
integer z that is a multiple of p (but with no integer that is not a multiple
of p). In this relation, for instance, the prime 2 is associated with
numbers that include −4, 0, 6, 10, but not 1 or 9; and the prime 3 is
associated with numbers that include 0, 6, and 9, but not 4 or 13.
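Restricting Z to a finite window, the "divides" relation can be built and checked as a set of pairs (a sketch; the window of integers is chosen arbitrarily for illustration):

```python
primes = {2, 3, 5, 7}
integers = range(-20, 21)        # a finite window of Z for illustration
divides = {(p, z) for p in primes for z in integers if z % p == 0}

assert (2, -4) in divides and (2, 0) in divides and (2, 6) in divides
assert (2, 1) not in divides and (2, 9) not in divides
assert (3, 0) in divides and (3, 6) in divides and (3, 9) in divides
assert (3, 4) not in divides and (3, 13) not in divides
```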

Binary relations are used in many branches of mathematics to model
concepts like "is greater than", "is equal to", and "divides" in arithmetic,
"is congruent to" in geometry, "is adjacent to" in graph theory, "is
orthogonal to" in linear algebra and many more. The concept of function
is defined as a special kind of binary relation. Binary relations are also
heavily used in computer science.

A binary relation is the special case n = 2 of an n-ary relation
R ⊆ A1 × … × An, that is, a set of n-tuples where the jth component of
each n-tuple is taken from the jth domain Aj of the relation. An example
for a ternary relation on Z×Z×Z is "lies between ... and ...", containing
e.g. the triples (5,2,8), (5,8,2), and (−4,9,−7).

In some systems of axiomatic set theory, relations are extended to
classes, which are generalizations of sets. This extension is needed for,
among other things, modeling the concepts of "is an element of" or "is a
subset of" in set theory, without running into logical inconsistencies
such as Russell's paradox.

Formal definition

A binary relation R between arbitrary sets (or classes) X (the set of
departure) and Y (the set of destination or codomain) is specified by
its graph G, which is a subset of the Cartesian product X × Y. The
binary relation R itself is usually identified with its graph G, but some
authors define it as an ordered triple (X, Y, G), which is otherwise
referred to as a correspondence.

The statement (x, y) ∈ G is read "x is R-related to y", and is denoted by
xRy or R(x, y). The latter notation corresponds to viewing R as the
characteristic function of the subset G of X × Y, i.e. R(x, y) equals 1
(true) if (x, y) ∈ G, and 0 (false) otherwise.

The order of the elements in each pair of G is important: if a ≠ b, then
aRb and bRa can be true or false, independently of each other. Returning
to the above example, the prime 3 divides the integer 9, but 9 doesn't
divide 3.

The domain of R is the set of all x such that xRy for at least one y. The
range of R is the set of all y such that xRy for at least one x. The field of
R is the union of its domain and its range.

According to the definition above, two relations with identical graphs
but different domains or different codomains are considered different.
For example, if G ⊆ Z × Z, then (Z, Z, G), (Z, R, G), and (R, R, G)
are three distinct relations, where Z is the set of integers and R is the
set of real numbers.

Especially in set theory, binary relations are often defined as sets of
ordered pairs, identifying binary relations with their graphs. The domain
of a binary relation R is then defined as the set of all x such that there
exists at least one y such that (x, y) ∈ R, the range of R is defined as the
set of all y such that there exists at least one x such that (x, y) ∈ R, and
the field of R is the union of its domain and its range.

A special case of this difference in points of view applies to the notion
of function. Many authors insist on distinguishing between a function's
codomain and its range. Thus, a single "rule," like mapping every real
number x to x², can lead to distinct functions f: ℝ → ℝ and
f: ℝ → [0, ∞), depending on whether the images under that rule are
understood to be reals or, more restrictively, non-negative reals. But
others view functions as simply sets of ordered pairs with unique first
components. This difference in perspectives does raise some nontrivial
issues. As an example, the former camp considers surjectivity (being
onto) as a property of functions, while the latter sees it as a relationship
that functions may bear to sets.

Either approach is adequate for most uses, provided that one attends to
the necessary changes in language, notation, and the definitions of
concepts like restriction, composition, inverse relation, and so on. The
choice between the two definitions usually matters only in very formal
contexts, like category theory.

Example

1st example relation

         ball   car   doll   gun
John      +      −     −      −
Mary      −      −     +      −
Ian       −      −     −      −
Venus     −      +     −      −

2nd example relation

         ball   car   doll   gun
John      +      −     −      −
Mary      −      −     +      −
Venus     −      +     −      −

Example: Suppose there are four objects {ball, car, doll, gun} and four
persons {John, Mary, Ian, Venus}. Suppose that John owns the ball,
Mary owns the doll, and Venus owns the car. Nobody owns the gun and
Ian owns nothing. Then the binary relation "is owned by" is given as

R = ({ball, car, doll, gun}, {John, Mary, Ian, Venus}, {(ball, John),
(doll, Mary), (car, Venus)}).

Thus the first element of R is the set of objects, the second is the set of
persons, and the last element is a set of ordered pairs of the form (object,
owner).

The pair (ball, John), denoted by ballRJohn, means that the ball is owned
by John.
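The triple form of the relation can be modeled directly. A small Python sketch of the ownership example, treating R as an ordered triple (set of departure, set of destination, graph); the function name is my own, chosen for illustration:

```python
objects = {"ball", "car", "doll", "gun"}
persons = {"John", "Mary", "Ian", "Venus"}
graph   = {("ball", "John"), ("doll", "Mary"), ("car", "Venus")}
R = (objects, persons, graph)   # the relation as an ordered triple (X, Y, G)

def related(R, x, y):
    """True iff x R y, i.e. (x, y) lies in the graph of R."""
    X, Y, G = R
    return x in X and y in Y and (x, y) in G

print(related(R, "ball", "John"))  # True: the ball is owned by John
print(related(R, "gun", "Ian"))    # False: nobody owns the gun
```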

Two different relations could have the same graph. For example: the
relation

({ball, car, doll, gun}, {John, Mary, Venus}, {(ball, John), (doll,
Mary), (car, Venus)})

is different from the previous one as everyone is an owner. But the


graphs of the two relations are the same.

Nevertheless, R is usually identified with, or even defined as, G(R), and
"an ordered pair (x, y) ∈ G(R)" is usually denoted as "(x, y) ∈ R".

Special types of binary relations

Example relations between real numbers. Red: y = x². Green: y = 2x + 20.

Some important types of binary relations R between two sets X and Y


are listed below. To emphasize that X and Y can be different sets, some
authors call such binary relations heterogeneous.

Uniqueness properties:
• injective (also called left-unique[7]): for all x and z in X and y in Y
it holds that if xRy and zRy then x = z. For example, the green
relation in the diagram is injective, but the red relation is not, as it
relates e.g. both x = −5 and z = +5 to y = 25.
• functional (also called univalent[8] or right-unique[7] or right-
definite[9]): for all x in X, and y and z in Y it holds that if xRy and
xRz then y = z; such a binary relation is called a partial function.
Both relations in the picture are functional. An example for a non-
functional relation can be obtained by rotating the red graph
clockwise by 90 degrees, i.e. by considering the relation x = y²,
which relates e.g. x = 25 to both y = −5 and z = +5.
• one-to-one (also written 1-to-1): injective and functional. The
green relation is one-to-one, but the red is not.
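For finite relations these uniqueness properties can be checked by brute force. A Python sketch (the predicate names below are my own), sampling the green and red relations from the diagram at a few integer points:

```python
def is_injective(R):
    """Left-unique: xRy and zRy imply x = z."""
    return all(x == z for (x, y) in R for (z, w) in R if y == w)

def is_functional(R):
    """Right-unique: xRy and xRz imply y = z."""
    return all(y == w for (x, y) in R for (z, w) in R if x == z)

def is_one_to_one(R):
    return is_injective(R) and is_functional(R)

green = {(x, 2 * x + 20) for x in range(-10, 11)}  # y = 2x + 20, sampled
red   = {(x, x * x) for x in range(-10, 11)}       # y = x^2, sampled

print(is_one_to_one(green))  # True
print(is_injective(red))     # False: both -5 and +5 relate to 25
print(is_functional(red))    # True
```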

Totality properties (only definable if the sets of departure X resp.


destination Y are specified; not to be confused with a total relation):

• left-total: for all x in X there exists a y in Y such that xRy. For


example, R is left-total when it is a function or a multivalued
function. Note that this property, although sometimes also referred
to as total, is different from the definition of total in the next
section. Both relations in the picture are left-total. The relation
x=y2, obtained from the above rotation, is not left-total, as it
doesn't relate, e.g., x = −14 to any real number y.

• surjective (also called right-total[7] or onto): for all y in Y there
exists an x in X such that xRy. The green relation is surjective, but
the red relation is not, as it doesn't relate any real number x to e.g.
y = −14.

Uniqueness and totality properties:

• A function: a relation that is functional and left-total. Both the


green and the red relation are functions.
• An injective function: a relation that is injective, functional, and
left-total.
• A surjective function or surjection: a relation that is functional,
left-total, and right-total.
• A bijection: a surjective one-to-one or surjective injective function
is said to be bijective, also known as one-to-one
correspondence.[10] The green relation is bijective, but the red is
not.
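These combinations can be tested the same way on finite data. A hedged Python sketch classifying a relation R ⊆ X × Y (the function names are assumptions, not a standard library):

```python
def is_left_total(R, X):
    return all(any(a == x for (a, b) in R) for x in X)

def is_surjective(R, Y):
    return all(any(b == y for (a, b) in R) for y in Y)

def is_functional(R):
    return all(b == d for (a, b) in R for (c, d) in R if a == c)

def is_injective(R):
    return all(a == c for (a, b) in R for (c, d) in R if b == d)

def is_function(R, X):
    return is_functional(R) and is_left_total(R, X)

def is_bijection(R, X, Y):
    return is_function(R, X) and is_injective(R) and is_surjective(R, Y)

X = [-2, -1, 0, 1, 2]
green = {(x, 2 * x + 20) for x in X}   # y = 2x + 20, restricted to X
print(is_function(green, X))           # True
print(is_bijection(green, X, sorted(b for (a, b) in green)))  # True, onto its image
```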

Difunctional

Less commonly encountered is the notion of a difunctional (or regular)
relation, defined as a relation R such that R = R R⁻¹ R.

To understand this notion better, it helps to consider a relation as
mapping every element x ∈ X to a set xR = { y ∈ Y | xRy }. This set is
sometimes called the successor neighborhood of x in R; one can define
the predecessor neighborhood analogously.[12] Synonymous terms for
these notions are afterset and foreset, respectively.

A difunctional relation can then be equivalently characterized as a


relation R such that whenever x1R and x2R have a non-empty
intersection, these two sets coincide; formally, x1R ∩ x2R ≠ ∅
implies x1R = x2R.[11]
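The defining equation R = R R⁻¹ R can be checked directly on finite relations. A Python sketch (compose applies the right-hand relation first; the helper names are my own):

```python
def compose(S, R):
    """S o R = {(x, z) | there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}

def inverse(R):
    return {(y, x) for (x, y) in R}

def is_difunctional(R):
    # R followed by R^{-1} followed by R must give back exactly R.
    return R == compose(R, compose(inverse(R), R))

F = {(1, "a"), (2, "a"), (3, "b")}            # functional, hence difunctional
N = {(1, "a"), (1, "b"), (2, "b"), (2, "c")}  # 1R and 2R overlap but differ
print(is_difunctional(F))  # True
print(is_difunctional(N))  # False
```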

As examples, any function or any functional (right-unique) relation is


difunctional; the converse doesn't hold. If one considers a relation R
from set to itself (X = Y), then if R is both transitive and symmetric (i.e.
a partial equivalence relation), then it is also difunctional.[13] The
converse of this latter statement also doesn't hold.

A characterization of difunctional relations, which also explains their


name, is to consider two functions f: A → C and g: B → C and then
define the following set which generalizes the kernel of a single function
as joint kernel: ker(f, g) = { (a, b) ∈ A × B | f(a) = g(b) }. Every
difunctional relation R ⊆ A × B arises as the joint kernel of two
functions f: A → C and g: B → C for some set C.[14]

In automata theory, the term rectangular relation has also been used to
denote a difunctional relation. This terminology is justified by the fact
that when represented as a boolean matrix, the columns and rows of a
difunctional relation can be arranged in such a way as to present

rectangular blocks of true on the (asymmetric) main diagonal.[15] Other
authors however use the term "rectangular" to denote any heterogeneous
relation whatsoever.[6]

Relations over a set

If X = Y then we simply say that the binary relation is over X, or that it


is an endorelation over X.[16] In computer science, such a relation is
also called a homogeneous (binary) relation.[6][16][17] Some types of
endorelations are widely studied in graph theory, where they are known
as simple directed graphs permitting loops.

The set of all binary relations Rel(X) on a set X is the set 2^(X × X),
which is a Boolean algebra augmented with the involution mapping a
relation to its inverse relation. For the theoretical explanation see
Relation algebra.

Some important properties of a binary relation R over a set X are:

• reflexive: for all x in X it holds that xRx. For example, "greater


than or equal to" (≥) is a reflexive relation but "greater than" (>) is
not.
• irreflexive (or strict): for all x in X it holds that not xRx. For
example, > is an irreflexive relation, but ≥ is not.
• coreflexive: for all x and y in X it holds that if xRy then x = y. An
example of a coreflexive relation is the relation on integers in
which each odd number is related to itself and there are no other
relations. The equality relation is the only example of a both
reflexive and coreflexive relation.

The previous three alternatives are far from being exhaustive; e.g., the
red relation y = x² from the picture above is neither irreflexive (it
contains the pair (0, 0)), nor coreflexive (it contains (2, 4)), nor
reflexive (it does not contain (2, 2)).

• symmetric: for all x and y in X it holds that if xRy then yRx. "Is a
blood relative of" is a symmetric relation, because x is a blood
relative of y if and only if y is a blood relative of x.
• antisymmetric: for all x and y in X, if xRy and yRx then x = y.
For example, ≥ is anti-symmetric (so is >, but only because the
condition in the definition is always false).[18]
• asymmetric: for all x and y in X, if xRy then not yRx. A relation
is asymmetric if and only if it is both anti-symmetric and
irreflexive.[19] For example, > is asymmetric, but ≥ is not.
• transitive: for all x, y and z in X it holds that if xRy and yRz then
xRz. For example, "is ancestor of" is transitive, while "is parent of"
is not. A transitive relation is irreflexive if and only if it is
asymmetric.[20]

• total: for all x and y in X it holds that xRy or yRx (or both). This
definition for total is different from left total in the previous
section. For example, ≥ is a total relation.
• trichotomous: for all x and y in X exactly one of xRy, yRx or x =
y holds. For example, > is a trichotomous relation, while the
relation "divides" on natural numbers is not.[21]
• Right Euclidean: for all x, y and z in X it holds that if xRy and
xRz, then yRz.
• Left Euclidean: for all x, y and z in X it holds that if yRx and zRx,
then yRz.
• Euclidean: A Euclidean relation is both left and right Euclidean.
Equality is a Euclidean relation because if x=y and x=z, then y=z.
• serial: for all x in X, there exists y in X such that xRy. "Is greater
than" is a serial relation on the integers. But it is not a serial
relation on the positive integers, because there is no y in the
positive integers such that 1>y.[22] However, "is less than" is a
serial relation on the positive integers, the rational numbers and the
real numbers. Every reflexive relation is serial: for a given x,
choose y=x. A serial relation can be equivalently characterized as
every element having a non-empty successor neighborhood (see
the previous section for the definition of this notion). Similarly an
inverse serial relation is a relation in which every element has
non-empty predecessor neighborhood.[12]

• set-like (or local): for every x in X, the class of all y such that yRx
is a set. (This makes sense only if relations on proper classes are
allowed.) The usual ordering < on the class of ordinal numbers is
set-like, while its inverse > is not.
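Each of these properties is decidable by exhaustive search over a finite set. A minimal Python sketch testing a few of them for ≥ and > on {0, ..., 4} (the predicate names are my own):

```python
def reflexive(R, X):
    return all((x, x) in R for x in X)

def irreflexive(R, X):
    return all((x, x) not in R for x in X)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

def antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def total(R, X):
    return all((x, y) in R or (y, x) in R for x in X for y in X)

X = range(5)
ge = {(x, y) for x in X for y in X if x >= y}  # the relation >=
gt = {(x, y) for x in X for y in X if x > y}   # the relation >

print(reflexive(ge, X), antisymmetric(ge), transitive(ge), total(ge, X))
print(irreflexive(gt, X), symmetric(gt))
```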

A relation that is reflexive, symmetric, and transitive is called an


equivalence relation. A relation that is symmetric, transitive, and serial is
also reflexive. A relation that is only symmetric and transitive (without
necessarily being reflexive) is called a partial equivalence relation.

A relation that is reflexive, antisymmetric, and transitive is called a


partial order. A partial order that is total is called a total order, simple
order, linear order, or a chain.[23] A linear order where every nonempty
subset has a least element is called a well-order.

Binary endorelations by property

                       reflexivity   symmetry        transitive   symbol       example
directed graph
undirected graph       irreflexive   symmetric
tournament             irreflexive   antisymmetric                             pecking order
dependency             reflexive     symmetric
strict weak order      irreflexive   antisymmetric   Yes          <
total preorder         reflexive                     Yes          ≤
preorder               reflexive                     Yes          ≤            preference
partial order          reflexive     antisymmetric   Yes          ≤            subset
partial equivalence                  symmetric       Yes
equivalence relation   reflexive     symmetric       Yes          ∼, ≅, ≈, ≡   equality
strict partial order   irreflexive   antisymmetric   Yes          <            proper subset

Operations on binary relations

If R, S are binary relations over X and Y, then each of the following is a


binary relation over X and Y:

• Union: R ∪ S ⊆ X × Y, defined as R ∪ S = { (x, y) | (x, y) ∈ R or


(x, y) ∈ S }. For example, ≥ is the union of > and =.
• Intersection: R ∩ S ⊆ X × Y, defined as R ∩ S = { (x, y) | (x, y) ∈
R and (x, y) ∈ S }.

If R is a binary relation over X and Y, and S is a binary relation over Y


and Z, then the following is a binary relation over X and Z: (see main
article composition of relations)

• Composition: S ∘ R, also denoted R ; S (or, more ambiguously,
R ∘ S), defined as S ∘ R = { (x, z) | there exists y ∈ Y such that
(x, y) ∈ R and (y, z) ∈ S }. The order of R and S in the notation
S ∘ R used here agrees with the standard notational order for
composition of functions. For example, the composition "is
mother of" ∘ "is parent of" yields "is maternal grandparent of",
while the composition "is parent of" ∘ "is mother of" yields "is
grandmother of".
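The composition above is straightforward to compute for finite relations. A Python sketch with hypothetical family data (the names are invented for illustration), following the convention that S ∘ R applies R first:

```python
def compose(S, R):
    """S o R = {(x, z) | there exists y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}

# Hypothetical family: Ann -> Bea -> Cid -> Dee, with Bea the one mother.
parent_of = {("Ann", "Bea"), ("Bea", "Cid"), ("Cid", "Dee")}
mother_of = {("Bea", "Cid")}

maternal_grandparent_of = compose(mother_of, parent_of)
grandmother_of          = compose(parent_of, mother_of)
print(maternal_grandparent_of)  # {('Ann', 'Cid')}
print(grandmother_of)           # {('Bea', 'Dee')}
```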

A relation R on sets X and Y is said to be contained in a relation S on X
and Y if R is a subset of S, that is, if x R y always implies x S y. In this
case, if R and S disagree, R is also said to be smaller than S. For
example, > is contained in ≥.

If R is a binary relation over X and Y, then the following is a binary


relation over Y and X:

• Inverse or converse: R −1, defined as R −1 = { (y, x) | (x, y) ∈ R }.


A binary relation over a set is equal to its inverse if and only if it is
symmetric. See also duality (order theory). For example, "is less
than" (<) is the inverse of "is greater than" (>).

If R is a binary relation over X, then each of the following is a binary


relation over X:

• Reflexive closure: R =, defined as R = = { (x, x) | x ∈ X } ∪ R or


the smallest reflexive relation over X containing R. This can be
proven to be equal to the intersection of all reflexive relations
containing R.
• Reflexive reduction: R ≠, defined as R ≠ = R \ { (x, x) | x ∈ X } or
the largest irreflexive relation over X contained in R.
• Transitive closure: R +, defined as the smallest transitive relation
over X containing R. This can be seen to be equal to the
intersection of all transitive relations containing R.

• Transitive reduction: R −, defined as a minimal relation having
the same transitive closure as R.
• Reflexive transitive closure: R *, defined as R * = (R +) =, the
smallest preorder containing R.
• Reflexive transitive symmetric closure: R ≡, defined as the
smallest equivalence relation over X containing R.
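The reflexive and transitive closures can be computed by fixed-point iteration on a finite relation. A Python sketch (the helper names are my own):

```python
def reflexive_closure(R, X):
    """Smallest reflexive relation over X containing R."""
    return R | {(x, x) for x in X}

def transitive_closure(R):
    """Smallest transitive relation containing R, by fixed-point iteration."""
    closure = set(R)
    while True:
        new = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new <= closure:
            return closure
        closure |= new

R = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(R)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print((1, 1) in reflexive_closure(R, {1, 2, 3, 4}))  # True
```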

Complement

If R is a binary relation over X and Y, then the following is also a binary
relation over X and Y:

• The complement S is defined as x S y if not x R y. For example,


on real numbers, ≤ is the complement of >.

The complement of the inverse is the inverse of the complement.

If X = Y, the complement has the following properties:

• If a relation is symmetric, the complement is too.


• The complement of a reflexive relation is irreflexive and vice
versa.
• The complement of a strict weak order is a total preorder and vice
versa.

The complement of the inverse has these same properties.

Restriction

The restriction of a binary relation on a set X to a subset S is the set of


all pairs (x, y) in the relation for which x and y are in S.

If a relation is reflexive, irreflexive, symmetric, antisymmetric,


asymmetric, transitive, total, trichotomous, a partial order, total order,
strict weak order, total preorder (weak order), or an equivalence relation,
its restrictions are too.

However, the transitive closure of a restriction is a subset of the


restriction of the transitive closure, i.e., in general not equal. For
example, restricting the relation "x is parent of y" to females yields the
relation "x is mother of the woman y"; its transitive closure doesn't relate
a woman with her paternal grandmother. On the other hand, the
transitive closure of "is parent of" is "is ancestor of"; its restriction to
females does relate a woman with her paternal grandmother.
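The parent/mother example can be replayed on toy data to see the two orders of operations disagree. A Python sketch with invented names (gran and kim female, dad male):

```python
def restrict(R, S):
    """Pairs of R both of whose components lie in S."""
    return {(x, y) for (x, y) in R if x in S and y in S}

def transitive_closure(R):
    closure = set(R)
    while True:
        new = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new <= closure:
            return closure
        closure |= new

parent_of = {("gran", "dad"), ("dad", "kim")}  # hypothetical family
females   = {"gran", "kim"}

# Restricting first severs the path through "dad"; closing first keeps it:
print(transitive_closure(restrict(parent_of, females)))  # set()
print(restrict(transitive_closure(parent_of), females))  # {('gran', 'kim')}
```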

Also, the various concepts of completeness (not to be confused with


being "total") do not carry over to restrictions. For example, on the set of
real numbers a property of the relation "≤" is that every non-empty
subset S of R with an upper bound in R has a least upper bound (also
called supremum) in R. However, for a set of rational numbers this
supremum is not necessarily rational, so the same property does not hold
on the restriction of the relation "≤" to the set of rational numbers.

The left-restriction (right-restriction, respectively) of a binary relation
between X and Y to a subset S of its domain (codomain) is the set of all
pairs (x, y) in the relation for which x (y) is an element of S.

Algebras, categories, and rewriting systems

Various operations on binary endorelations can be treated as giving rise


to an algebraic structure, known as relation algebra. It should not be
confused with relational algebra which deals in finitary relations (and in
practice also finite and many-sorted).

For heterogeneous binary relations, a category of relations arises.[6]

Despite their simplicity, binary relations are at the core of an abstract


computation model known as an abstract rewriting system.

Sets versus classes

Certain mathematical "relations", such as "equal to", "member of", and


"subset of", cannot be understood to be binary relations as defined
above, because their domains and codomains cannot be taken to be sets
in the usual systems of axiomatic set theory. For example, if we try to
model the general concept of "equality" as a binary relation =, we must
take the domain and codomain to be the "class of all sets", which is not a
set in the usual set theory.

In most mathematical contexts, references to the relations of equality,
membership and subset are harmless because they can be understood
implicitly to be restricted to some set in the context. The usual work-
around to this problem is to select a "large enough" set A, that contains
all the objects of interest, and work with the restriction =A instead of =.
Similarly, the "subset of" relation ⊆ needs to be restricted to have
domain and codomain P(A) (the power set of a specific set A): the
resulting set relation can be denoted ⊆A. Also, the "member of" relation
needs to be restricted to have domain A and codomain P(A) to obtain a
binary relation ∈A that is a set. Bertrand Russell has shown that
assuming ∈ to be defined on all sets leads to a contradiction in naive set
theory.

Another solution to this problem is to use a set theory with proper


classes, such as NBG or Morse–Kelley set theory, and allow the domain
and codomain (and so the graph) to be proper classes: in such a theory,
equality, membership, and subset are binary relations without special
comment. (A minor modification needs to be made to the concept of the
ordered triple (X, Y, G), as normally a proper class cannot be a member
of an ordered tuple; or of course one can identify the function with its
graph in this context.) With this definition one can for instance define a
function relation between every set and its power set.

Examples of common binary relations

• order relations, including strict orders:
o greater than
o greater than or equal to
o less than
o less than or equal to
o divides (evenly)
o is a subset of
• equivalence relations:
o equality
o is parallel to (for affine spaces)
o is in bijection with
o isomorphy
• dependency relation, a finite, symmetric, reflexive relation.
• independency relation, a symmetric, irreflexive relation which is
the complement of some dependency relation.

Functions

Definitions

A function may be defined as a particular type of relation. We define a
partial function f from a set X to another set Y as a mapping that assigns
to each x ∈ X no more than one y ∈ Y. Alternatively, f is a partial
function if and only if (x, y) ∈ f and (x, z) ∈ f together imply y = z.

If f assigns to each x ∈ X exactly one y ∈ Y, then f is called a total
function, or just a function. The following definitions are commonly used
when discussing functions.

• If f ⊆ X × Y and f is a function, then we can denote this by
writing f: X → Y. The set X is known as the domain and the set Y
is known as the codomain.
• For a function f: X → Y, the image of an element x ∈ X is the
y ∈ Y such that y = f(x). Alternatively, we can say that y is the
value of f evaluated at x.
• For a function f: X → Y, the image of a subset A of X is the set
{ f(x) | x ∈ A }. This set is denoted by f(A). Be careful not to
confuse this with f(x) for x ∈ X, which is an element of Y.
• The range of a function f: X → Y is f(X), or all of the values
y ∈ Y for which we can find an x ∈ X such that f(x) = y.
• For a function f: X → Y, the preimage of a subset B of Y is the
set { x ∈ X | f(x) ∈ B }. This is denoted by f⁻¹(B).
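These definitions translate directly into membership tests on finite sets of pairs. A hedged Python sketch (the function names are my own):

```python
def is_partial_function(f, X, Y):
    """Each x in X is assigned no more than one y in Y."""
    return (all(x in X and y in Y for (x, y) in f)
            and all(b == d for (a, b) in f for (c, d) in f if a == c))

def is_total_function(f, X, Y):
    """Each x in X is assigned exactly one y in Y."""
    return (is_partial_function(f, X, Y)
            and all(any(a == x for (a, b) in f) for x in X))

X, Y = {1, 2, 3}, {"a", "b"}
f = {(1, "a"), (2, "b")}
print(is_partial_function(f, X, Y))             # True: 3 is simply unassigned
print(is_total_function(f, X, Y))               # False: 3 has no value
print(is_total_function(f | {(3, "a")}, X, Y))  # True
```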

Properties of functions

A function f: X → Y is onto, or surjective, if for each y ∈ Y there exists
an x ∈ X such that f(x) = y. It is easy to show that a function is surjective
if and only if its codomain is equal to its range. It is one-to-one, or
injective, if different elements of X are mapped to different elements of
Y, that is, f(x1) = f(x2) implies x1 = x2. A function that is both injective
and surjective is termed bijective.

Composition of functions

Given two functions f: X → Y and g: Y → Z, we may be interested in
first evaluating f at some x ∈ X and then evaluating g at f(x). To this
end, we define the composition of these functions, written g ∘ f, as
(g ∘ f)(x) = g(f(x)).

Note that the composition of these functions maps an element in X to an
element in Z, so we would write g ∘ f: X → Z.
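In code, composition is the function that feeds the output of f into g. A minimal Python sketch (compose is my own helper, not a builtin):

```python
def compose(g, f):
    """Return the function (g o f)(x) = g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: x + 1   # f: add one
g = lambda y: y * y   # g: square

h = compose(g, f)     # h(x) = (x + 1)^2
print(h(3))           # 16
```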

Inverses of functions

If there exists a function g: Y → X such that g(f(x)) = x for every x ∈ X,
we call g a left inverse of f. If a left inverse for f exists, we say that f is
left invertible. Similarly, if there exists a function h: Y → X such that
f(h(y)) = y for every y ∈ Y, then we call h a right inverse of f. If such an
h exists, we say that f is right invertible. If there exists a function which
is both a left and a right inverse of f, we say that it is the inverse of f and
denote it by f⁻¹. Be careful not to confuse this with the preimage of f;
the preimage of a set under f always exists, while the inverse may not.
Proof of the following theorems is left as an exercise to the reader.

Theorem: If a function f has both a left inverse g and a right inverse h,
then g = h.

Theorem: A function is invertible if and only if it is bijective.

In mathematics, a function[1] is a relation between a set of inputs and a
set of permissible outputs with the property that each input is related to
exactly one output. An example is the function that relates each real
number x to its square x². The output of a function f corresponding to an
input x is denoted by f(x) (read "f of x"). In this example, if the input is
−3, then the output is 9, and we may write f(−3) = 9. Likewise, if the
input is 3, then the output is also 9, and we may write f(3) = 9. (The
same output may be produced by more than one input, but each input
gives only one output.) The input variable(s) are sometimes referred to
as the argument(s) of the function.

Functions of various kinds are "the central objects of investigation"[2] in
most fields of modern mathematics. There are many ways to describe or
represent a function. Some functions may be defined by a formula or
algorithm that tells how to compute the output for a given input. Others
are given by a picture, called the graph of the function. In science,
functions are sometimes defined by a table that gives the outputs for
selected inputs. A function could be described implicitly, for example as
the inverse to another function or as a solution of a differential equation.

The input and output of a function can be expressed as an ordered pair,
ordered so that the first element is the input (or tuple of inputs, if the
function takes more than one input), and the second is the output. In the
example above, f(x) = x², we have the ordered pair (−3, 9). If both input
and output are real numbers, this ordered pair can be viewed as the
Cartesian coordinates of a point on the graph of the function.

In modern mathematics,[3] a function is defined by its set of inputs,


called the domain; a set containing the set of outputs, and possibly
additional elements, as members, called its codomain; and the set of all
input-output pairs, called its graph. Sometimes the codomain is called
the function's "range", but more commonly the word "range" is used to
mean, instead, specifically the set of outputs (this is also called the
image of the function). For example, we could define a function using
the rule f(x) = x² by saying that the domain and codomain are the real
numbers, and that the graph consists of all pairs of real numbers (x, x²).
The image of this function is the set of non-negative real numbers.
Collections of functions with the same domain and the same codomain
are called function spaces, the properties of which are studied in such
mathematical disciplines as real analysis, complex analysis, and
functional analysis.

In analogy with arithmetic, it is possible to define addition, subtraction,


multiplication, and division of functions, in those cases where the output
is a number. Another important operation defined on functions is
function composition, where the output from one function becomes the
input to another function.

Introduction and examples

A function that associates to any of the four colored shapes its color.

For an example of a function, let X be the set consisting of four shapes:


a red triangle, a yellow rectangle, a green hexagon, and a red square; and
let Y be the set consisting of five colors: red, blue, green, pink, and
yellow. Linking each shape to its color is a function from X to Y: each
shape is linked to a color (i.e., an element in Y), and each shape is
"linked", or "mapped", to exactly one color. There is no shape that lacks
a color and no shape that has two or more colors. This function will be
referred to as the "color-of-the-shape function".

The input to a function is called the argument and the output is called the
value. The set of all permitted inputs to a given function is called the
domain of the function, while the set of permissible outputs is called the
codomain. Thus, the domain of the "color-of-the-shape function" is the
set of the four shapes, and the codomain consists of the five colors. The
concept of a function does not require that every possible output is the
value of some argument, e.g. the color blue is not the color of any of the
four shapes in X.

A second example of a function is the following: the domain is chosen to


be the set of natural numbers (1, 2, 3, 4, ...), and the codomain is the set
of integers (..., −3, −2, −1, 0, 1, 2, 3, ...). The function associates to any
natural number n the number 4−n. For example, to 1 it associates 3 and
to 10 it associates −6.

A third example of a function has the set of polygons as domain and the
set of natural numbers as codomain. The function associates a polygon
with its number of vertices. For example, a triangle is associated with
the number 3, a square with the number 4, and so on.

The term range is sometimes used either for the codomain or for the set
of all the actual values a function has.

Definition

The above diagram represents a function with domain {1, 2, 3},
codomain {A, B, C, D} and set of ordered pairs {(1,D), (2,C), (3,C)}.
The image is {C,D}.

However, this second diagram does not represent a function. One reason
is that 2 is the first element in more than one ordered pair. In particular,
(2, B) and (2, C) are both elements of the set of ordered pairs. Another
reason, sufficient by itself, is that 3 is not the first element (input) for

any ordered pair. A third reason, likewise, is that 4 is not the first
element of any ordered pair.

In order to avoid the use of the informally defined concepts of "rules"


and "associates", the above intuitive explanation of functions is
completed with a formal definition. This definition relies on the notion
of the Cartesian product. The Cartesian product of two sets X and Y is
the set of all ordered pairs, written (x, y), where x is an element of X and
y is an element of Y. The x and the y are called the components of the
ordered pair. The Cartesian product of X and Y is denoted by X × Y.

A function f from X to Y is a subset of the Cartesian product X × Y


subject to the following condition: every element of X is the first
component of one and only one ordered pair in the subset.[4] In other
words, for every x in X there is exactly one element y such that the
ordered pair (x, y) is contained in the subset defining the function f. This
formal definition is a precise rendition of the idea that to each x is
associated an element y of Y, namely the uniquely specified element y
with the property just mentioned.

Considering the "color-of-the-shape" function above, the set X is the


domain consisting of the four shapes, while Y is the codomain
consisting of five colors. There are twenty possible ordered pairs (four
shapes times five colors), one of which is

("yellow rectangle", "red").

The "color-of-the-shape" function described above consists of the set of
those ordered pairs,

(shape, color)

where the color is the actual color of the given shape. Thus, the pair
("red triangle", "red") is in the function, but the pair ("yellow rectangle",
"red") is not.

Functional notation

A function f with domain X and codomain Y is commonly denoted by

f: X → Y.

In this context, the elements of X are called arguments of f. For each
argument x, the corresponding unique y in the codomain is called the
function value at x or the image of x under f. It is written as f(x). One
says that f associates y with x or maps x to y. This is abbreviated by

y = f(x).

A general function is often denoted by f. Special functions have names;
for example, the signum function is denoted by sgn. Given a real
number x, its image under the signum function is then written as sgn(x).
Here, the argument is denoted by the symbol x, but different symbols
may be used in other contexts. For example, in physics, the velocity of
some body, depending on the time, is denoted v(t). The parentheses
around the argument may be omitted when there is little chance of
confusion, thus: sin x; this is known as prefix notation.

In order to denote a specific function, the notation ↦ (an arrow with a
bar at its tail) is used. For example, the above function reads

f: ℕ → ℤ, x ↦ 4 − x.

The first part can be read as:

• "f is a function from ℕ (the set of natural numbers) to ℤ (the set of
integers)" or
• "f is a ℤ-valued function of an ℕ-valued variable".

The second part is read:

• "x maps to 4 − x."

In other words, this function has the natural numbers as domain, the
integers as codomain. Strictly speaking, a function is properly defined

only when the domain and codomain are specified. For example, the
formula f(x) = 4 − x alone (without specifying the codomain and
domain) is not a properly defined function. Moreover, the function g
given by the same formula but with a different domain is not considered
the same function, even though the formulas defining f and g agree, and
similarly with a different codomain. Despite that, many authors drop the
specification of the domain and codomain, especially if these are clear
from the context. So in this example many just write f(x) = 4 − x.
Sometimes, the maximal possible domain is also understood implicitly: a
formula such as f(x) = √((x − 2)(x − 3)) may mean that the domain of f
is the set of real numbers x where the square root is defined (in this case
x ≤ 2 or x ≥ 3).

To define a function, sometimes a dot notation is used in order to
emphasize the functional nature of an expression without assigning a
special symbol to the variable. For instance, a(·)² stands for the function
x ↦ ax², ∫_a^(·) f(u) du stands for the integral function x ↦ ∫_a^x f(u) du,
and so on.

Specifying a function

A function can be defined by any mathematical condition relating each
argument (input value) to the corresponding output value. If the domain
is finite, a function f may be defined by simply tabulating all the
arguments x and their corresponding function values f(x). More
commonly, a function is defined by a formula, or (more generally) an
algorithm — a recipe that tells how to compute the value of f(x) given
any x in the domain.
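For a finite domain the two approaches can be sketched side by side; the following Python fragment (illustrative, using the running example f(x) = 4 − x on a small domain) defines the same function once by a table and once by a formula:

```python
# A function on a finite domain, defined by tabulating every argument.
table = {1: 3, 2: 2, 3: 1, 4: 0}

# The same function defined by a formula: a rule for computing f(x).
def f(x):
    return 4 - x

# Both definitions agree on every element of the domain.
assert all(table[x] == f(x) for x in table)
```

The table only works because the domain is finite; the formula also extends to infinite domains such as ℕ.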

There are many other ways of defining functions. Examples include


piecewise definitions, induction or recursion, algebraic or analytic
closure, limits, analytic continuation, infinite series, and as solutions to
integral and differential equations. The lambda calculus provides a
powerful and flexible syntax for defining and combining functions of
several variables. In advanced mathematics, some functions exist
because of an axiom, such as the Axiom of Choice.

The graph of a function is its set of ordered pairs F. This is an
abstraction of the idea of a graph as a picture showing the function
plotted on a pair of coordinate axes; for example, (3, 9), the point above
3 on the horizontal axis and to the right of 9 on the vertical axis, lies on
the graph of y = x².

Computable functions

Functions that send integers to integers, or finite strings to finite strings,
can sometimes be defined by an algorithm, which gives a precise
description of a set of steps for computing the output of the function
from its input. Functions definable by an algorithm are called
computable functions. For example, the Euclidean algorithm gives a
precise process to compute the greatest common divisor of two positive
integers. Many of the functions studied in the context of number theory
are computable.
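The Euclidean algorithm mentioned above is short enough to state as runnable code; a standard Python rendering:

```python
def gcd(a, b):
    """Greatest common divisor of two positive integers (Euclidean algorithm)."""
    while b != 0:
        # Replace (a, b) by (b, a mod b); the gcd is unchanged by this step.
        a, b = b, a % b
    return a

print(gcd(48, 18))  # 6
```

Each iteration strictly decreases b, so the loop terminates, and the last nonzero remainder is the gcd.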

Fundamental results of computability theory show that there are
functions that can be precisely defined but are not computable.
Moreover, in the sense of cardinality, almost all functions from the
integers to integers are not computable. The number of computable
functions from integers to integers is countable, because the number of
possible algorithms is. The number of all functions from integers to
integers is higher: the same as the cardinality of the real numbers. Thus
most functions from integers to integers are not computable. Specific
examples of uncomputable functions are known, including the busy
beaver function and functions related to the halting problem and other
undecidable problems.

Basic properties

There are a number of general basic properties and notions. In this
section, f is a function with domain X and codomain Y.

Image and preimage
The graph of the function f(x) = x³ − 9x² + 23x − 15. The interval A =
[3.5, 4.25] is a subset of the domain, thus it is shown as part of the x-axis
(green). The image of A is (approximately) the interval [−3.08, −1.88]. It
is obtained by projecting to the y-axis (along the blue arrows) the
intersection of the graph with the light green area consisting of all points
whose x-coordinate is between 3.5 and 4.25; it is the part of the (vertical)
y-axis shown in blue. The preimage of B = [1, 2.5] consists of three
intervals. They are obtained by projecting the intersection of the light
red area with the graph to the x-axis.

If A is any subset of the domain X, then f(A) is the subset of the
codomain Y consisting of all images of elements of A. We say that f(A)
is the image of A under f. The image of f is given by f(X). On the other
hand, the inverse image (or preimage, complete inverse image) of a
subset B of the codomain Y under a function f is the subset of the
domain X defined by

f⁻¹(B) = {x ∈ X : f(x) ∈ B}.

So, for example, the preimage of {4, 9} under the squaring function is
the set {−3, −2, 2, 3}. The term range usually refers to the image,[7] but
sometimes it refers to the codomain.
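For finite sets, the image and preimage can be computed directly; a small Python sketch using the squaring example from the text (the helper names are my own):

```python
def image(f, A):
    """f(A): the set of images of elements of A under f."""
    return {f(x) for x in A}

def preimage(f, domain, B):
    """The elements of the domain that f sends into B."""
    return {x for x in domain if f(x) in B}

square = lambda x: x * x

print(image(square, {-3, -2, 2, 3}))              # {9, 4}
print(preimage(square, range(-5, 6), {4, 9}))     # {-3, -2, 2, 3}
```

Note that the preimage computation needs the domain to be supplied explicitly, exactly as the definition f⁻¹(B) ⊆ X suggests.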

By definition of a function, the image of an element x of the domain is
always a single element y of the codomain. Conversely, though, the
preimage of a singleton set (a set with exactly one element) may in
general contain any number of elements. For example, if f(x) = 7 (the
constant function taking value 7), then the preimage of {5} is the empty
set but the preimage of {7} is the entire domain. It is customary to write
f⁻¹(b) instead of f⁻¹({b}), i.e.

f⁻¹(b) = {x ∈ X : f(x) = b}.

This set is sometimes called the fiber of b under f.

Use of f(A) to denote the image of a subset A ⊆ X is consistent so long
as no subset of the domain is also an element of the domain. In some
fields (e.g., in set theory, where ordinals are also sets of ordinals) it is
convenient or even necessary to distinguish the two concepts; the
customary notation is f[A] for the set { f(x) : x ∈ A }. Likewise, some
authors use square brackets to avoid confusion between the inverse
image and the inverse function. Thus they would write f⁻¹[B] and f⁻¹[b]
for the preimage of a set and a singleton.
Injective and surjective functions

A function is called injective (or one-to-one, or an injection) if f(a) ≠
f(b) for any two different elements a and b of the domain. It is called
surjective (or onto) if f(X) = Y. That is, it is surjective if for every
element y in the codomain there is an x in the domain such that f(x) = y.
Finally f is called bijective if it is both injective and surjective. This
nomenclature was introduced by the Bourbaki group.

The above "color-of-the-shape" function is not injective, since two
distinct shapes (the red triangle and the red rectangle) are assigned the
same value. Moreover, it is not surjective, since the image of the
function contains only three, but not all five colors in the codomain.
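For finite domains, injectivity and surjectivity can be checked by brute force; a Python sketch with hypothetical helper names:

```python
def is_injective(f, X):
    """No two distinct elements of X share an image."""
    images = [f(x) for x in X]
    return len(images) == len(set(images))

def is_surjective(f, X, Y):
    """Every element of Y is the image of some element of X."""
    return {f(x) for x in X} == set(Y)

# The squaring function on {-2, ..., 2} with codomain {0, ..., 4}:
X = {-2, -1, 0, 1, 2}
Y = {0, 1, 2, 3, 4}
sq = lambda x: x * x

print(is_injective(sq, X))       # False: sq(-1) == sq(1)
print(is_surjective(sq, X, Y))   # False: 2 and 3 are not squares
```

Both checks enumerate the whole domain, so they only make sense for finite sets.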

Function composition

The function composition of two functions takes the output of one
function as the input of a second one. More specifically, the composition
of f with a function g: Y → Z is the function g ∘ f: X → Z defined by

(g ∘ f)(x) = g(f(x)).

That is, the value of x is obtained by first applying f to x to obtain y =
f(x) and then applying g to y to obtain z = g(y). In the notation g ∘ f, the
function on the right, f, acts first and the function on the left, g, acts
second, reversing English reading order. The notation can be memorized
by reading it as "g of f" or "g after f". The composition is only defined
when the codomain of f is the domain of g. Assuming that, the
composition f ∘ g in the opposite order need not be defined. Even if it is,
i.e., if the codomain of f is the codomain of g, it is not in general true
that

g ∘ f = f ∘ g.

That is, the order of the composition is important. For example, suppose
f(x) = x² and g(x) = x + 1. Then g(f(x)) = x² + 1, while f(g(x)) = (x + 1)²,
which is x² + 2x + 1, a different function.

A composite function g(f(x)) can be visualized as the combination of
two "machines". The first takes input x and outputs f(x). The second
takes f(x) and outputs g(f(x)).

A concrete example of a function composition.

Another composition: for example, we have here (g ∘ f)(c) = #.

Identity function

The unique function over a set X that maps each element to itself is
called the identity function for X, and typically denoted by idX. Each set
has its own identity function, so the subscript cannot be omitted unless
the set can be inferred from context. Under composition, an identity
function is "neutral": if f is any function from X to Y, then

idY ∘ f = f = f ∘ idX.
Restrictions and extensions

Informally, a restriction of a function f is the result of trimming its
domain. More precisely, if S is any subset of X, the restriction of f to S
is the function f|S from S to Y such that f|S(s) = f(s) for all s in S. If g is a
restriction of f, then it is said that f is an extension of g.

The overriding of f: X → Y by g: W → Y (also called overriding union)
is an extension of g denoted as (f ⊕ g): (X ∪ W) → Y. Its graph is the
set-theoretical union of the graphs of g and f|X \ W. Thus, it relates any
element of the domain of g to its image under g, and any other element
of the domain of f to its image under f. Overriding is an associative
operation; it has the empty function as an identity element. If f|X ∩ W and
g|X ∩ W are pointwise equal (e.g., the domains of f and g are disjoint),
then the union of f and g is defined and is equal to their overriding
union. This definition agrees with the definition of union for binary
relations.
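For functions with finite graphs, restriction and the overriding union can be sketched with Python dicts (an illustrative encoding of graphs as input → output mappings, not a standard library API):

```python
# Finite functions represented by their graphs.
f = {1: 'a', 2: 'b', 3: 'c'}   # f: {1, 2, 3} -> Y
g = {3: 'z', 4: 'w'}           # g: {3, 4}   -> Y

# Restriction of f to S: keep only pairs whose first element is in S.
S = {1, 2}
f_restricted = {x: y for x, y in f.items() if x in S}

# Overriding union f ⊕ g: g wins wherever the domains overlap.
overriding = {**f, **g}

print(f_restricted)  # {1: 'a', 2: 'b'}
print(overriding)    # {1: 'a', 2: 'b', 3: 'z', 4: 'w'}
```

The dict-merge `{**f, **g}` mirrors the definition: elements of g's domain map under g, and the rest of f's domain maps under f.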

Inverse function

An inverse function for f, denoted by f⁻¹, is a function in the opposite
direction, from Y to X, satisfying

f⁻¹ ∘ f = idX and f ∘ f⁻¹ = idY.

That is, the two possible compositions of f and f⁻¹ need to be the
respective identity maps of X and Y.

As a simple example, if f converts a temperature in degrees Celsius C to
degrees Fahrenheit F, the function converting degrees Fahrenheit to
degrees Celsius would be a suitable f⁻¹.

Such an inverse function exists if and only if f is bijective. In this case, f
is called invertible. The notation g ∘ f (or, in some texts, just gf) and f⁻¹
are akin to multiplication and reciprocal notation. With this analogy,
identity functions are like the multiplicative identity, 1, and inverse
functions are like reciprocals (hence the notation).
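The Celsius/Fahrenheit example as a runnable sketch (standard conversion formulas):

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def f_to_c(f):
    """The inverse function: degrees Fahrenheit back to Celsius."""
    return (f - 32) * 5 / 9

# Composing a function with its inverse gives the identity, both ways.
print(f_to_c(c_to_f(100.0)))  # 100.0
print(c_to_f(f_to_c(32.0)))   # 32.0
```

Both round trips return the starting value, illustrating f⁻¹ ∘ f = id and f ∘ f⁻¹ = id.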

Types of functions

Real-valued functions

A real-valued function f is one whose codomain is the set of real
numbers or a subset thereof. If, in addition, the domain is also a subset
of the reals, f is a real-valued function of a real variable. The study of
such functions is called real analysis.

Real-valued functions enjoy so-called pointwise operations. That is,
given two functions

f, g: X → Y

where Y is a subset of the reals (and X is an arbitrary set), their
(pointwise) sum f + g and product f ⋅ g are functions with the same
domain and codomain. They are defined by the formulas:

(f + g)(x) = f(x) + g(x), (f ⋅ g)(x) = f(x) ⋅ g(x).
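A minimal Python sketch of the pointwise sum and product (with illustrative choices of f and g):

```python
f = lambda x: x * x       # f(x) = x²
g = lambda x: 2 * x + 1   # g(x) = 2x + 1

# Pointwise operations: combine the outputs of f and g at each argument.
f_plus_g = lambda x: f(x) + g(x)    # (f + g)(x) = f(x) + g(x)
f_times_g = lambda x: f(x) * g(x)   # (f · g)(x) = f(x) · g(x)

print(f_plus_g(3))   # 9 + 7 = 16
print(f_times_g(3))  # 9 * 7 = 63
```

The new functions have the same domain as f and g; only their output values are combined.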

In a similar vein, complex analysis studies functions whose domain and
codomain are both the set of complex numbers. In most situations, the
domain and codomain are understood from context, and only the
relationship between the input and output is given, but if f(x) = √x,
then in real variables the domain is limited to non-negative numbers.

The following table contains a few particularly important types of real-
valued functions:

• Linear function: f(x) = ax + b.
• Quadratic function: f(x) = ax² + bx + c.
• Discontinuous function: for example, the signum function, which is
not continuous since it "jumps" at 0. Roughly speaking, a continuous
function is one whose graph can be drawn without lifting the pen.
• Trigonometric functions: for example, the sine and cosine functions,
f(x) = sin(x) (red) and f(x) = cos(x) (blue).

Further types of functions

There are many other special classes of functions that are important to
particular branches of mathematics, or particular applications.

Function spaces

The set of all functions from a set X to a set Y is denoted by X → Y, by


[X → Y], or by YX. The latter notation is motivated by the fact that,
when X and Y are finite and of size |X| and |Y|, then the number of
functions X → Y is |YX| = |Y||X|. This is an example of the convention
from enumerative combinatorics that provides notations for sets based
on their cardinalities. If X is infinite and there is more than one element
in Y then there are uncountably many functions from X to Y, though
only countably many of them can be expressed with a formula or
algorithm.
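The counting formula |Y|^|X| can be checked by enumerating all functions between two small finite sets; a Python sketch:

```python
from itertools import product

X = ['a', 'b', 'c']   # |X| = 3
Y = [0, 1]            # |Y| = 2

# Every function X -> Y is a choice of one element of Y for each element
# of X, so the function space has |Y| ** |X| elements.
functions = [dict(zip(X, values)) for values in product(Y, repeat=len(X))]

print(len(functions))     # 8
print(len(Y) ** len(X))   # 8
```

Each dict is the graph of one function; `product(Y, repeat=3)` runs through every possible assignment of outputs.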

Currying

An alternative approach to handling functions with multiple arguments
is to transform them into a chain of functions that each takes a single
argument. For instance, one can interpret Add(3,5) to mean "first
produce a function that adds 3 to its argument, and then apply the 'Add
3' function to 5". This transformation is called currying: Add 3 is
curry(Add) applied to 3. There is a bijection between the function spaces
C^(A×B) and (C^B)^A.
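The Add(3,5) example can be sketched in Python, where a hypothetical `curry` helper turns a function on pairs into a chain of one-argument functions:

```python
def add(pair):
    """A function of one argument that happens to be a pair: C^(A×B)."""
    x, y = pair
    return x + y

def curry(f):
    """Transform a function on pairs into a chain of one-argument functions."""
    return lambda x: lambda y: f((x, y))

add3 = curry(add)(3)   # "the function that adds 3 to its argument"
print(add3(5))         # 8
print(curry(add)(3)(5) == add((3, 5)))  # True
```

The equality on the last line is one instance of the bijection between C^(A×B) and (C^B)^A.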
When working with curried functions it is customary to use prefix
notation with function application considered left-associative, since
juxtaposition of multiple arguments—as in (f x y)—naturally maps to
evaluation of a curried function. Conversely, the → and ⟼ symbols are
considered to be right-associative, so that curried functions may be
defined by a notation such as f: ℤ → ℤ → ℤ = x ⟼ y ⟼ x·y.

Variants and generalizations

Alternative definition of a function

The above definition of "a function from X to Y" is generally agreed
on; however, there are two different ways a "function" is normally
defined where the domain X and codomain Y are not explicitly or
implicitly specified. Usually this is not a problem, as the domain and
codomain normally will be known. Under one definition, saying "the
function defined by f(x) = x² on the reals" does not completely specify a
function, as the codomain is not specified; under the other, it is a valid
definition.

In the other definition a function is defined as a set of ordered pairs
where each first element only occurs once. The domain is the set of all
the first elements of a pair and there is no explicit codomain separate
from the image.[8][9] Concepts like surjective have to be refined for such
functions, more specifically by saying that a (given) function is
surjective on a (given) set if its image equals that set. For example, we
might say a function f is surjective on the set of real numbers.

If a function is defined as a set of ordered pairs with no specific
codomain, then f: X → Y indicates that f is a function whose domain is
X and whose image is a subset of Y. This is the case in the ISO
standard.[7] Y may be referred to as the codomain, but then any set
including the image of f is a valid codomain of f. This is also referred to
by saying that "f maps X into Y".[7] In some usages X and Y may
restrict the set of ordered pairs: e.g., the function f on the real numbers
such that y = x², when used as in f: [0,4] → [0,4], means the function
defined only on the interval [0,2].[10] With the definition of a function as
an ordered triple this would always be considered a partial function.

An alternative definition of the composite function g(f(x)) defines it for
the set of all x in the domain of f such that f(x) is in the domain of g.[11]
Thus the real square root of −x² is a function only defined at 0, where it
has the value 0.

Functions are commonly defined as a type of relation. A relation from X
to Y is a set of ordered pairs (x, y) with x ∈ X and y ∈ Y. A function
from X to Y can be described as a relation from X to Y that is left-total
and right-unique. However, when X and Y are not specified there is a
disagreement about the definition of a relation that parallels that for
functions. Normally a relation is just defined as a set of ordered pairs
and a correspondence is defined as a triple (X, Y, F); however, the
distinction between the two is often blurred, or a relation is never
referred to without specifying the two sets. The definition of a function
as a triple defines a function as a type of correspondence, whereas the
definition of a function as a set of ordered pairs defines a function as a
type of relation.

Many operations in set theory, such as the power set, have the class of
all sets as their domain, and therefore, although they are informally
described as functions, they do not fit the set-theoretical definition
outlined above, because a class is not necessarily a set. However, some
definitions of relations and functions define them as classes of pairs
rather than sets of pairs and therefore do include the power set as a
function.

Partial and multi-valued functions

The square root of x is not a function in the proper sense, but a multi-
valued function: it assigns to each positive real number x two values,
the (positive) square root √x of x, and −√x.

In some parts of mathematics, including recursion theory and functional
analysis, it is convenient to study partial functions in which some values
of the domain have no association in the graph; i.e., single-valued
relations. For example, the function f such that f(x) = 1/x does not define
a value for x = 0, since division by zero is not defined. Hence f is only a
partial function from the real line to the real line. The term total function
can be used to stress the fact that every element of the domain does
appear as the first element of an ordered pair in the graph.
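A partial function can be modeled in Python by returning None where no value is defined (one possible convention, not the only one):

```python
def f(x):
    """A partial function from the reals to the reals: f(x) = 1/x.

    x = 0 has no associated value, so None signals "undefined here".
    """
    if x == 0:
        return None   # no pair with first element 0 in the graph
    return 1 / x

print(f(4))   # 0.25
print(f(0))   # None
```

A total function, by contrast, would return an actual value for every element of its domain.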

In other parts of mathematics, non-single-valued relations are similarly
conflated with functions: these are called multivalued functions, with the
corresponding term single-valued function for ordinary functions.

Functions with multiple inputs and outputs

The concept of function can be extended to an object that takes a
combination of two (or more) argument values to a single result. This
intuitive concept is formalized by a function whose domain is the
Cartesian product of two or more sets.

For example, consider the function that associates two integers to their
product: f(x, y) = x·y. This function can be defined formally as having
domain ℤ×ℤ, the set of all integer pairs; codomain ℤ; and, for graph, the
set of all pairs ((x, y), x·y). Note that the first component of any such

pair is itself a pair (of integers), while the second component is a single
integer.

The function value of the pair (x, y) is f((x, y)). However, it is customary
to drop one set of parentheses and consider f(x, y) a function of two
variables, x and y. Functions of two variables may be plotted on the
three-dimensional Cartesian coordinate system as ordered triples of the
form (x, y, f(x, y)).

The concept can still further be extended by considering a function that
also produces output that is expressed as several variables. For example,
consider the integer divide function, with domain ℤ×ℕ and codomain
ℤ×ℕ. The resultant (quotient, remainder) pair is a single value in the
codomain seen as a Cartesian product.
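Python's built-in divmod realizes exactly this kind of pair-valued function; a sketch of the integer divide example:

```python
def integer_divide(a, b):
    """Integer division as a function into a Cartesian product:
    input (a, b), output the pair (quotient, remainder)."""
    quotient, remainder = divmod(a, b)
    return quotient, remainder

print(integer_divide(17, 5))   # (3, 2)
```

The single output value is itself a pair, so the codomain is a Cartesian product, just as described above.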

Binary operations

The familiar binary operations of arithmetic, addition and multiplication,
can be viewed as functions from ℝ×ℝ to ℝ. This view is generalized in
abstract algebra, where n-ary functions are used to model the operations
of arbitrary algebraic structures. For example, an abstract group is
defined as a set X and a function f from X×X to X that satisfies certain
properties.

Traditionally, addition and multiplication are written in the infix
notation: x+y and x×y instead of +(x, y) and ×(x, y).

Functors

The idea of structure-preserving functions, or homomorphisms, led to
the abstract notion of morphism, the key concept of category theory. In
fact, functions f: X → Y are the morphisms in the category of sets,
including the empty set: if the domain X is the empty set, then the subset
of X × Y describing the function is necessarily empty, too. However,
this is still a well-defined function. Such a function is called an empty
function. In particular, the identity function of the empty set is defined, a
requirement for sets to form a category.

The concept of categorification is an attempt to replace set-theoretic
notions by category-theoretic ones. In particular, according to this idea,
sets are replaced by categories, while functions between sets are
replaced by functors.

1.3. BOOLEAN ALGEBRA

In mathematics and mathematical logic, Boolean algebra is the branch
of algebra in which the values of the variables are the truth values true
and false, usually denoted 1 and 0 respectively. Instead of elementary
algebra where the values of the variables are numbers, and the main
operations are addition and multiplication, the main operations of
Boolean algebra are the conjunction and, denoted ∧, the disjunction or,

denoted ∨, and the negation not, denoted ¬. It is thus a formalism for
describing logical relations in the same way that ordinary algebra
describes numeric relations.

Boolean algebra was introduced by George Boole in his first book The
Mathematical Analysis of Logic (1847), and set forth more fully in his
An Investigation of the Laws of Thought (1854). According to
Huntington, the term "Boolean algebra" was first suggested by Sheffer
in 1913.

Boolean algebra has been fundamental in the development of digital
electronics, and is provided for in all modern programming languages. It
is also used in set theory and statistics.

History

Boole's algebra predated the modern developments in abstract algebra
and mathematical logic; it is however seen as connected to the origins of
both fields. In an abstract setting, Boolean algebra was perfected in the
late 19th century by Jevons, Schröder, Huntington, and others until it
reached the modern conception of an (abstract) mathematical structure.[4]
For example, the empirical observation that one can manipulate
expressions in the algebra of sets by translating them into expressions in
Boole's algebra is explained in modern terms by saying that the algebra
of sets is a Boolean algebra (note the indefinite article). In fact, M. H.

Stone proved in 1936 that every Boolean algebra is isomorphic to a field
of sets.

In the 1930s, while studying switching circuits, Claude Shannon
observed that one could also apply the rules of Boole's algebra in this
setting, and he introduced switching algebra as a way to analyze and
design circuits by algebraic means in terms of logic gates. Shannon
already had at his disposal the abstract mathematical apparatus, thus he
cast his switching algebra as the two-element Boolean algebra. In circuit
engineering settings today, there is little need to consider other Boolean
algebras, thus "switching algebra" and "Boolean algebra" are often used
interchangeably.[5][6][7] Efficient implementation of Boolean functions is
a fundamental problem in the design of combinational logic circuits.
Modern electronic design automation tools for VLSI circuits often rely
on an efficient representation of Boolean functions known as (reduced
ordered) binary decision diagrams (BDD) for logic synthesis and formal
verification.[8]

Logic sentences that can be expressed in classical propositional calculus
have an equivalent expression in Boolean algebra. Thus, Boolean logic
is sometimes used to denote propositional calculus performed in this
way. Boolean algebra is not sufficient to capture logic formulas using
quantifiers, like those from first order logic. Although the development
of mathematical logic did not follow Boole's program, the connection

between his algebra and logic was later put on firm ground in the setting
of algebraic logic, which also studies the algebraic systems of many
other logics.[4] The problem of determining whether the variables of a
given Boolean (propositional) formula can be assigned in such a way as
to make the formula evaluate to true is called the Boolean satisfiability
problem (SAT), and is of importance to theoretical computer science,
being the first problem shown to be NP-complete. The closely related
model of computation known as a Boolean circuit relates time
complexity (of an algorithm) to circuit complexity.

Values

Whereas in elementary algebra expressions denote mainly numbers, in
Boolean algebra they denote the truth values false and true. These values
are represented with the bits (or binary digits), namely 0 and 1. They do
not behave like the integers 0 and 1, for which 1 + 1 = 2, but may be
identified with the elements of the two-element field GF(2), that is,
integer arithmetic modulo 2, for which 1 + 1 = 0. Addition and
multiplication then play the Boolean roles of XOR (exclusive-or) and
AND (conjunction) respectively, with disjunction x∨y (inclusive-or)
definable as x + y + xy.
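The GF(2) reading of the truth values can be checked directly; a Python sketch of XOR, AND, and the derived inclusive-or x + y + xy:

```python
# Truth values as the two-element field GF(2): arithmetic modulo 2.
def xor(x, y):  return (x + y) % 2           # addition mod 2
def conj(x, y): return (x * y) % 2           # multiplication mod 2 = AND
def disj(x, y): return (x + y + x * y) % 2   # inclusive-or: x + y + xy

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor(x, y), conj(x, y), disj(x, y))
```

Note that 1 + 1 = 0 under xor, matching arithmetic modulo 2 rather than ordinary integer addition.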

Boolean algebra also deals with functions which have their values in the
set {0, 1}. A sequence of bits is a commonly used such function.
Another common example is the subsets of a set E: to a subset F of E is
associated the indicator function that takes the value 1 on F and 0
outside F. The most general example is the elements of a Boolean
algebra, with all of the foregoing being instances thereof.

As with elementary algebra, the purely equational part of the theory may
be developed without considering explicit values for the variables.

Operations

Basic operations

The basic operations of Boolean calculus are as follows.

• AND (conjunction), denoted x∧y (sometimes x AND y or Kxy),
satisfies x∧y = 1 if x = y = 1, and x∧y = 0 otherwise.
• OR (disjunction), denoted x∨y (sometimes x OR y or Axy),
satisfies x∨y = 0 if x = y = 0, and x∨y = 1 otherwise.
• NOT (negation), denoted ¬x (sometimes NOT x, Nx or !x),
satisfies ¬x = 0 if x = 1 and ¬x = 1 if x = 0.

If the truth values 0 and 1 are interpreted as integers, these operations
may be expressed with the ordinary operations of arithmetic, or by the
minimum/maximum functions:

x ∧ y = x × y = min(x, y)
x ∨ y = x + y − (x × y) = max(x, y)
¬x = 1 − x

One may consider that only the negation and one of the two other
operations are basic, because of the following identities that allow one to
define the conjunction in terms of the negation and the disjunction, and
vice versa:

x ∧ y = ¬(¬x ∨ ¬y)
x ∨ y = ¬(¬x ∧ ¬y)
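These arithmetic and min/max expressions, and the identities defining one operation from the other, can be verified exhaustively over {0, 1}; a Python sketch:

```python
# Basic Boolean operations expressed with ordinary arithmetic / min / max.
AND = lambda x, y: min(x, y)   # also x * y
OR  = lambda x, y: max(x, y)   # also x + y - x * y
NOT = lambda x: 1 - x

# Conjunction defined from negation and disjunction, and vice versa.
for x in (0, 1):
    for y in (0, 1):
        assert AND(x, y) == NOT(OR(NOT(x), NOT(y)))
        assert OR(x, y) == NOT(AND(NOT(x), NOT(y)))
print("identities hold")
```

With only four input combinations, exhaustive checking is a complete proof of each identity.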

Derived operations

The three Boolean operations described above are referred to as basic,
meaning that they can be taken as a basis for other Boolean operations
that can be built up from them by composition, the manner in which
operations are combined or compounded. Operations composed from the
basic operations include the following examples:

x → y = ¬x ∨ y (material implication)
x ⊕ y = (x ∨ y) ∧ ¬(x ∧ y) (exclusive or)
x ≡ y = ¬(x ⊕ y) (equivalence)

The first operation, x → y, or Cxy, is called material implication. If x
is true then the value of x → y is taken to be that of y. But if x is false
then the value of y can be ignored; however, the operation must return
some truth value and there are only two choices, so the return value is
the one that entails less, namely true. (Relevance logic addresses this by
viewing an implication with a false premise as something other than
either true or false.)

The second operation, x ⊕ y, or Jxy, is called exclusive or (often
abbreviated as XOR) to distinguish it from disjunction as the inclusive
kind. It excludes the possibility of both x and y. Defined in terms of
arithmetic it is addition mod 2 where 1 + 1 = 0.

The third operation, the complement of exclusive or, is equivalence or
Boolean equality: x ≡ y, or Exy, is true just when x and y have the same
value. Hence x ⊕ y as its complement can be understood as x ≠ y, being
true just when x and y are different. Its counterpart in arithmetic mod 2
is x + y + 1.

Given two operands, each with two possible values, there are 2² = 4
possible combinations of inputs. Because each output can have two
possible values, there are a total of 2⁴ = 16 possible binary Boolean
operations.
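The count of 16 binary operations can be confirmed by enumeration; a Python sketch:

```python
from itertools import product

# Each binary Boolean operation is determined by its four output values,
# one per input pair (0,0), (0,1), (1,0), (1,1).
inputs = list(product((0, 1), repeat=2))
operations = set(product((0, 1), repeat=len(inputs)))

print(len(inputs))       # 4 possible input combinations
print(len(operations))   # 16 possible binary Boolean operations
```

Each 4-tuple in `operations` is one truth table; AND, OR, XOR, implication and equivalence all appear among the sixteen.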

Laws

A law of Boolean algebra is an identity such as x∨(y∨z) = (x∨y)∨z
between two Boolean terms, where a Boolean term is defined as an
expression built up from variables and the constants 0 and 1 using the
operations ∧, ∨, and ¬. The concept can be extended to terms involving
other Boolean operations such as ⊕, →, and ≡, but such extensions are
unnecessary for the purposes to which the laws are put. Such purposes
include the definition of a Boolean algebra as any model of the Boolean
laws, and as a means for deriving new laws from old as in the derivation
of x∨(y∧z) = x∨(z∧y) from y∧z = z∧y as treated in the section on
axiomatization.

Monotone laws

Boolean algebra satisfies many of the same laws as ordinary algebra
when one matches up ∨ with addition and ∧ with multiplication. In
particular the following laws are common to both kinds of algebra:

Associativity of ∨: x ∨ (y ∨ z) = (x ∨ y) ∨ z
Associativity of ∧: x ∧ (y ∧ z) = (x ∧ y) ∧ z
Commutativity of ∨: x ∨ y = y ∨ x
Commutativity of ∧: x ∧ y = y ∧ x
Distributivity of ∧ over ∨: x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
Identity for ∨: x ∨ 0 = x
Identity for ∧: x ∧ 1 = x
Annihilator for ∧: x ∧ 0 = 0

Boolean algebra, however, also satisfies laws that ordinary algebra does
not:

Idempotence of ∨: x ∨ x = x
Idempotence of ∧: x ∧ x = x
Absorption 1: x ∧ (x ∨ y) = x
Absorption 2: x ∨ (x ∧ y) = x
Distributivity of ∨ over ∧: x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)
Annihilator for ∨: x ∨ 1 = 1

A consequence of the first of these laws is 1∨1 = 1, which is false in
ordinary algebra, where 1+1 = 2. Taking x = 2 in the second law shows
that it is not an ordinary algebra law either, since 2×2 = 4. The
remaining four laws can be falsified in ordinary algebra by taking all
variables to be 1; for example, in Absorption Law 1 the left hand side is
1(1+1) = 2 while the right hand side is 1, and so on.

All of the laws treated so far have been for conjunction and disjunction.
These operations have the property that changing either argument either
leaves the output unchanged or the output changes in the same way as
the input. Equivalently, changing any variable from 0 to 1 never results
in the output changing from 1 to 0. Operations with this property are
said to be monotone. Thus the axioms so far have all been for
monotonic Boolean logic. Nonmonotonicity enters via complement ¬ as
follows.[3]

Nonmonotone laws

The complement operation is defined by the following two laws:

Complementation 1: x ∧ ¬x = 0
Complementation 2: x ∨ ¬x = 1

All properties of negation including the laws below follow from the
above two laws alone.[3]

In both ordinary and Boolean algebra, negation works by exchanging
pairs of elements, whence in both algebras it satisfies the double
negation law (also called involution law):

¬(¬x) = x

But whereas ordinary algebra satisfies the two laws

(−x)(−y) = xy
(−x) + (−y) = −(x + y),

Boolean algebra satisfies De Morgan's laws:

¬x ∧ ¬y = ¬(x ∨ y)
¬x ∨ ¬y = ¬(x ∧ y)
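Since there are only two truth values, the double negation law and De Morgan's laws can be verified by exhaustive checking; a Python sketch:

```python
NOT = lambda x: 1 - x
AND = lambda x, y: x & y
OR  = lambda x, y: x | y

for x in (0, 1):
    assert NOT(NOT(x)) == x                         # double negation
    for y in (0, 1):
        assert AND(NOT(x), NOT(y)) == NOT(OR(x, y))   # first De Morgan law
        assert OR(NOT(x), NOT(y)) == NOT(AND(x, y))   # second De Morgan law
print("all laws verified")
```

Checking all four input pairs is a complete proof because Boolean variables range only over {0, 1}.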

Completeness

The laws listed above define Boolean algebra, in the sense that they
entail the rest of the subject. The laws Complementation 1 and 2,
together with the monotone laws, suffice for this purpose and can
therefore be taken as one possible complete set of laws or axiomatization
of Boolean algebra. Every law of Boolean algebra follows logically from
these axioms. Furthermore, Boolean algebras can then be defined as the
models of these axioms as treated in the section thereon.

To clarify, writing down further laws of Boolean algebra cannot give
rise to any new consequences of these axioms, nor can it rule out any
model of them. In contrast, in a list of some but not all of the same laws,
there could have been Boolean laws that did not follow from those on
the list, and moreover there would have been models of the listed laws
that were not Boolean algebras.

This axiomatization is by no means the only one, or even necessarily the
most natural, given that we did not pay attention to whether some of the
axioms followed from others but simply chose to stop when we noticed
we had enough laws, treated further in the section on axiomatizations.
Or the intermediate notion of axiom can be sidestepped altogether by
defining a Boolean law directly as any tautology, understood as an
equation that holds for all values of its variables over 0 and 1. All these
definitions of Boolean algebra can be shown to be equivalent.

Boolean algebra has the interesting property that x = y can be proved
from any non-tautology. This is because the substitution instance of any
non-tautology obtained by instantiating its variables with constants 0 or
1 so as to witness its non-tautologyhood reduces by equational reasoning
to 0 = 1. For example, the non-tautologyhood of x∧y = x is witnessed by
x = 1 and y = 0 and so taking this as an axiom would allow us to infer
1∧0 = 1 as a substitution instance of the axiom and hence 0 = 1. We can
then show x = y by the reasoning x = x∧1 = x∧0 = 0 = 1 = y∨1 = y∨0 =
y.

Duality principle

Principle: If {X,R} is a poset, then {X,R(inverse)} is also a poset.

There is nothing magical about the choice of symbols for the values of
Boolean algebra. We could rename 0 and 1 to say α and β, and as long as
we did so consistently throughout it would still be Boolean algebra,
albeit with some obvious cosmetic differences.

But suppose we rename 0 and 1 to 1 and 0 respectively. Then it would
still be Boolean algebra, and moreover operating on the same values.
However it would not be identical to our original Boolean algebra
because now we find ∨ behaving the way ∧ used to do and vice versa.
So there are still some cosmetic differences to show that we've been
fiddling with the notation, despite the fact that we're still using 0s and
1s.

But if in addition to interchanging the names of the values we also
interchange the names of the two binary operations, now there is no
trace of what we have done. The end product is completely
indistinguishable from what we started with. We might notice that the
columns for x∧y and x∨y in the truth tables had changed places, but that
switch is immaterial.

When values and operations can be paired up in a way that leaves


everything important unchanged when all pairs are switched
simultaneously, we call the members of each pair dual to each other.
Thus 0 and 1 are dual, and ∧ and ∨ are dual. The Duality Principle, also
called De Morgan duality, asserts that Boolean algebra is unchanged
when all dual pairs are interchanged.

One change we did not need to make as part of this interchange was to
complement. We say that complement is a self-dual operation. The
identity or do-nothing operation x (copy the input to the output) is also
self-dual. A more complicated example of a self-dual operation is (x∧y)
∨ (y∧z) ∨ (z∧x). There is no self-dual binary operation that depends on
both its arguments. A composition of self-dual operations is a self-dual
operation. For example, if f(x,y,z) = (x∧y) ∨ (y∧z) ∨ (z∧x), then
f(f(x,y,z),x,t) is a self-dual operation of four arguments x,y,z,t.
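Self-duality is easy to test mechanically. The Python sketch below (our own; dual and self_dual are illustrative helper names) verifies that the ternary majority operation above is self-dual while AND is not, and that the four-argument composition is self-dual as well.

```python
from itertools import product

def dual(f):
    """De Morgan dual of f: complement every input and the output."""
    return lambda *args: 1 - f(*(1 - a for a in args))

def self_dual(f, arity):
    d = dual(f)
    return all(f(*v) == d(*v) for v in product((0, 1), repeat=arity))

median = lambda x, y, z: (x & y) | (y & z) | (z & x)
assert self_dual(median, 3)                    # the majority operation is self-dual
assert not self_dual(lambda x, y: x & y, 2)    # AND is not: its dual is OR

# A composition of self-dual operations is self-dual:
g = lambda x, y, z, t: median(median(x, y, z), x, t)
assert self_dual(g, 4)
```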

The principle of duality can be explained from a group theory


perspective by the fact that there are exactly four functions that are
one-to-one mappings (automorphisms) of the set of Boolean polynomials back
to itself: the identity function, the complement function, the dual
function and the contradual function (complemented dual). These four
functions form a group under function composition, isomorphic to the
Klein four-group, acting on the set of Boolean polynomials. Walter
Gottschalk remarked that consequently a more appropriate name for the
phenomenon would be the principle (or square) of quaternality.[14]

Diagrammatic representations
Venn diagrams

A Venn diagram[15] is a representation of a Boolean operation using


shaded overlapping regions. There is one region for each variable, all
circular in the examples here. The interior and exterior of region x
correspond respectively to the values 1 (true) and 0 (false) for variable
x. The shading indicates the value of the operation for each combination
of regions, with dark denoting 1 and light 0 (some authors use the
opposite convention).

The three Venn diagrams in the figure below represent respectively


conjunction x∧y, disjunction x∨y, and complement ¬x.

Figure 2. Venn diagrams for conjunction, disjunction, and complement

For conjunction, the region inside both circles is shaded to indicate that
x∧y is 1 when both variables are 1. The other regions are left unshaded
to indicate that x∧y is 0 for the other three combinations.

The second diagram represents disjunction x∨y by shading those regions
that lie inside either or both circles. The third diagram represents
complement ¬x by shading the region not inside the circle.

While we have not shown the Venn diagrams for the constants 0 and 1,
they are trivial, being respectively a white box and a dark box, neither
one containing a circle. However we could put a circle for x in those
boxes, in which case each would denote a function of one argument, x,
which returns the same value independently of x, called a constant
function. As far as their outputs are concerned, constants and constant
functions are indistinguishable; the difference is that a constant takes no
arguments, called a zeroary or nullary operation, while a constant
function takes one argument, which it ignores, and is a unary operation.

Venn diagrams are helpful in visualizing laws. The commutativity laws


for ∧ and ∨ can be seen from the symmetry of the diagrams: a binary
operation that was not commutative would not have a symmetric
diagram because interchanging x and y would have the effect of
reflecting the diagram horizontally and any failure of commutativity
would then appear as a failure of symmetry.

Idempotence of ∧ and ∨ can be visualized by sliding the two circles


together and noting that the shaded area then becomes the whole circle,
for both ∧ and ∨.

To see the first absorption law, x∧(x∨y) = x, start with the diagram in
the middle for x∨y and note that the portion of the shaded area in
common with the x circle is the whole of the x circle. For the second
absorption law, x∨(x∧y) = x, start with the left diagram for x∧y and note
that shading the whole of the x circle results in just the x circle being
shaded, since the previous shading was inside the x circle.

The double negation law can be seen by complementing the shading in


the third diagram for ¬x, which shades the x circle.

To visualize the first De Morgan's law, (¬x)∧(¬y) = ¬(x∨y), start with


the middle diagram for x∨y and complement its shading so that only the
region outside both circles is shaded, which is what the right hand side
of the law describes. The result is the same as if we shaded that region
which is both outside the x circle and outside the y circle, i.e. the
conjunction of their exteriors, which is what the left hand side of the law
describes.

The second De Morgan's law, (¬x)∨(¬y) = ¬(x∧y), works the same way
with the two diagrams interchanged.
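All of the laws read off the diagrams above can be confirmed by exhaustive checking over the two truth values; the following Python sketch (ours, not part of the source) does so.

```python
from itertools import product

def NOT(x):
    return 1 - x

for x, y in product((0, 1), repeat=2):
    assert (x & (x | y)) == x                   # first absorption law
    assert (x | (x & y)) == x                   # second absorption law
    assert NOT(NOT(x)) == x                     # double negation law
    assert (NOT(x) & NOT(y)) == NOT(x | y)      # first De Morgan's law
    assert (NOT(x) | NOT(y)) == NOT(x & y)      # second De Morgan's law
    assert (x & NOT(x)) == 0                    # first complement law
    assert (x | NOT(x)) == 1                    # second complement law
```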

The first complement law, x∧¬x = 0, says that the interior and exterior
of the x circle have no overlap. The second complement law, x∨¬x = 1,
says that everything is either inside or outside the x circle.

Digital logic gates

Digital logic is the application of the Boolean algebra of 0 and 1 to


electronic hardware consisting of logic gates connected to form a circuit
diagram. Each gate implements a Boolean operation, and is depicted
schematically by a shape indicating the operation. The shapes associated
with the gates for conjunction (AND-gates), disjunction (OR-gates), and
complement (inverters) are as follows.[16]

The lines on the left of each gate represent input wires or ports. The
value of the input is represented by a voltage on the lead. For so-called
"active-high" logic, 0 is represented by a voltage close to zero or
"ground", while 1 is represented by a voltage close to the supply voltage;
active-low reverses this. The line on the right of each gate represents the
output port, which normally follows the same voltage conventions as the
input ports.

Complement is implemented with an inverter gate. The triangle denotes


the operation that simply copies the input to the output; the small circle
on the output denotes the actual inversion complementing the input. The
convention of putting such a circle on any port means that the signal

passing through this port is complemented on the way through, whether
it is an input or output port.

The Duality Principle, or De Morgan's laws, can be understood as


asserting that complementing all three ports of an AND gate converts it
to an OR gate and vice versa, as shown in Figure 4 below.
Complementing both ports of an inverter however leaves the operation
unchanged.

More generally one may complement any of the eight subsets of the
three ports of either an AND or OR gate. The resulting sixteen
possibilities give rise to only eight Boolean operations, namely those
with an odd number of 1's in their truth table. There are eight such
because the "odd-bit-out" can be either 0 or 1 and can go in any of four
positions in the truth table. There being sixteen binary Boolean
operations, this must leave eight operations with an even number of 1's
in their truth tables. Two of these are the constants 0 and 1 (as binary
operations that ignore both their inputs); four are the operations that
depend nontrivially on exactly one of their two inputs, namely x, y, ¬x,
and ¬y; and the remaining two are x⊕y (XOR) and its complement x≡y.
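The counting argument can be verified directly; the brief Python sketch below (an illustration of ours) identifies each binary Boolean operation with its 4-bit truth table and counts the odd and even parity classes.

```python
from itertools import product

# Identify each binary Boolean operation with its 4-bit truth table.
tables = list(product((0, 1), repeat=4))

odd  = [t for t in tables if sum(t) % 2 == 1]  # obtainable from AND/OR by port complementation
even = [t for t in tables if sum(t) % 2 == 0]

assert len(tables) == 16
assert len(odd) == 8 and len(even) == 8

# XOR's truth table (0, 1, 1, 0) has an even number of 1s,
# so it is among the eight operations not obtainable this way:
assert (0, 1, 1, 0) in even
```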

Boolean algebras
Boolean algebra (structure)

The term "algebra" denotes both a subject, namely the subject of


algebra, and an object, namely an algebraic structure. Whereas the
foregoing has addressed the subject of Boolean algebra, this section
deals with mathematical objects called Boolean algebras, defined in full
generality as any model of the Boolean laws. We begin with a special
case of the notion definable without reference to the laws, namely
concrete Boolean algebras, and then give the formal definition of the
general notion.

Concrete Boolean algebras

A concrete Boolean algebra or field of sets is any nonempty set of


subsets of a given set X closed under the set operations of union,
intersection, and complement relative to X.[3]

(As an aside, historically X itself was required to be nonempty as well to


exclude the degenerate or one-element Boolean algebra, which is the one
exception to the rule that all Boolean algebras satisfy the same equations
since the degenerate algebra satisfies every equation. However this
exclusion conflicts with the preferred purely equational definition of
"Boolean algebra," there being no way to rule out the one-element
algebra using only equations (0 ≠ 1 does not count, being a negated
equation). Hence modern authors allow the degenerate Boolean algebra
and let X be empty.)

Example 1. The power set 2^X of X, consisting of all subsets of X. Here


X may be any set: empty, finite, infinite, or even uncountable.

Example 2. The empty set and X. This two-element algebra shows that
a concrete Boolean algebra can be finite even when it consists of subsets
of an infinite set. It can be seen that every field of subsets of X must
contain the empty set and X. Hence no smaller example is possible,
other than the degenerate algebra obtained by taking X to be empty so as
to make the empty set and X coincide.

Example 3. The set of finite and cofinite sets of integers, where a


cofinite set is one omitting only finitely many integers. This is clearly
closed under complement, and is closed under union because the union
of a cofinite set with any set is cofinite, while the union of two finite sets
is finite. Intersection behaves like union with "finite" and "cofinite"
interchanged.
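The closure argument of Example 3 can be made concrete. In the Python sketch below (the encoding is our own choice, not from the source) a finite or cofinite set of integers is represented by a tag together with the finite set that describes it.

```python
# ('fin', s): the finite set s.  ('cofin', s): every integer except those in s.

def complement(a):
    tag, s = a
    return ('cofin' if tag == 'fin' else 'fin', s)

def union(a, b):
    (ta, sa), (tb, sb) = a, b
    if ta == 'fin' and tb == 'fin':
        return ('fin', sa | sb)         # finite OR finite stays finite
    if ta == 'cofin' and tb == 'cofin':
        return ('cofin', sa & sb)       # misses only what both miss
    cof = sa if ta == 'cofin' else sb   # a cofinite set absorbs the finite one
    fin = sb if ta == 'cofin' else sa
    return ('cofin', cof - fin)

def intersection(a, b):
    # Intersection behaves like union with "finite" and "cofinite"
    # interchanged; equivalently, via De Morgan:
    return complement(union(complement(a), complement(b)))

all_but_two = ('cofin', {0, 2})     # all integers except 0 and 2
small = ('fin', {1, 2, 3})
assert union(small, all_but_two) == ('cofin', {0})
assert intersection(small, small) == ('fin', {1, 2, 3})
```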

Example 4. For a less trivial example of the point made by Example 2,


consider a Venn diagram formed by n closed curves partitioning the
diagram into 2^n regions, and let X be the (infinite) set of all points in the
plane not on any curve but somewhere within the diagram. The interior
of each region is thus an infinite subset of X, and every point in X is in

exactly one region. Then the set of all 2^(2^n) possible unions of regions
(including the empty set obtained as the union of the empty set of
regions and X obtained as the union of all 2^n regions) is closed under
union, intersection, and complement relative to X and therefore forms a
concrete Boolean algebra. Again we have finitely many subsets of an
infinite set forming a concrete Boolean algebra, with Example 2 arising
as the case n = 0 of no curves.

A Boolean algebra can be formally defined as a set B of elements a, b, ...
with the following properties:

1. B has two binary operations, ∧ (logical AND, or "wedge") and ∨
(logical OR, or "vee"), which satisfy the idempotent laws

a∧a = a∨a = a, (1)

the commutative laws

a∧b = b∧a, (2)

a∨b = b∨a, (3)

and the associative laws

a∧(b∧c) = (a∧b)∧c, (4)

a∨(b∨c) = (a∨b)∨c. (5)

2. The operations satisfy the absorption law

a∧(a∨b) = a∨(a∧b) = a. (6)

3. The operations are mutually distributive

a∧(b∨c) = (a∧b)∨(a∧c), (7)

a∨(b∧c) = (a∨b)∧(a∨c). (8)

4. B contains universal bounds 0 (the empty set) and I (the universal set)
which satisfy

0∧a = 0, (9)

0∨a = a, (10)

I∧a = a, (11)

I∨a = I. (12)

5. B has a unary operation a → a′ of complementation, which obeys the
laws

a∧a′ = 0, (13)

a∨a′ = I. (14)

(Birkhoff and Mac Lane 1996).

In the slightly archaic terminology of (Bell 1986, p. 444), a Boolean
algebra can be defined as a set B of elements a, b, ... with binary
operators + (or ∨; logical OR) and · (or ∧; logical AND) such that

1a. If a and b are in the set B, then a + b is in the set B.

1b. If a and b are in the set B, then a·b is in the set B.

2a. There is an element Z (zero) such that a + Z = a for every element a.

2b. There is an element U (unity) such that a·U = a for every element a.

3a. a + b = b + a.

3b. a·b = b·a.

4a. a + b·c = (a + b)·(a + c).

4b. a·(b + c) = a·b + a·c.

5. For every element a there is an element a′ such that a + a′ = U and
a·a′ = Z.

6. There are at least two distinct elements in the set B.

Huntington (1933ab) presented the following basis for Boolean algebra:

1. Commutativity: x + y = y + x.

2. Associativity: (x + y) + z = x + (y + z).

3. Huntington axiom: ¬(¬x + y) + ¬(¬x + ¬y) = x.

H. Robbins then conjectured that the Huntington axiom could be


replaced with the simpler Robbins axiom,

¬(¬(x + y) + ¬(x + ¬y)) = x. (15)

The algebra defined by commutativity, associativity, and the Robbins


axiom is called Robbins algebra. Computer theorem proving
demonstrated that every Robbins algebra satisfies the second Winkler
condition, from which it follows immediately that all Robbins algebras
are Boolean (McCune, Kolata 1996).
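As a quick sanity check (necessary but of course not sufficient), both the Huntington and the Robbins axiom can be verified over the two-element algebra; the Python sketch below (ours) writes + for OR and ¬ for complement.

```python
def NOT(x):
    return 1 - x

def OR(x, y):
    return x | y

for x in (0, 1):
    for y in (0, 1):
        # Huntington axiom: not(not x + y) + not(not x + not y) = x
        assert OR(NOT(OR(NOT(x), y)), NOT(OR(NOT(x), NOT(y)))) == x
        # Robbins axiom: not(not(x + y) + not(x + not y)) = x
        assert NOT(OR(NOT(OR(x, y)), NOT(OR(x, NOT(y))))) == x
```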

CHAPTER 2: LAWS OF BOOLEAN ALGEBRA

Boolean Algebra is used to analyze and simplify digital (logic)
circuits. It uses only the binary numbers, i.e. 0 and 1. It is also called
Binary Algebra or logical Algebra. Boolean algebra was introduced by
George Boole in 1854.

Rules in Boolean Algebra

Following are the important rules used in Boolean algebra.

• Variables used can have only two values: binary 1 for HIGH and
binary 0 for LOW.

• Complement of a variable is represented by an overbar. Thus, the
complement of variable B is represented as B̄. Thus if B = 0 then
B̄ = 1, and if B = 1 then B̄ = 0.
• ORing of the variables is represented by a plus (+) sign between
them. For example, ORing of A, B, C is represented as A + B + C.
• Logical ANDing of two or more variables is represented by writing
a dot between them, such as A.B.C. Sometimes the dot may be
omitted, as in ABC.

Boolean Laws

There are six types of Boolean Laws.

1.Commutative law

Any binary operation which satisfies the following expressions is
referred to as a commutative operation:

A + B = B + A and A.B = B.A

Commutative law states that changing the sequence of the variables does
not have any effect on the output of a logic circuit.

2.Associative law

This law states that the order in which the logic operations are
performed is irrelevant as their effect is the same:

(A + B) + C = A + (B + C) and (A.B).C = A.(B.C)

3.Distributive law

Distributive law states the following condition:

A.(B + C) = A.B + A.C

4.AND law

These laws use the AND operation. Therefore they are called AND laws:

A.0 = 0, A.1 = A, A.A = A, A.Ā = 0

5.OR law

These laws use the OR operation. Therefore they are called OR laws:

A + 0 = A, A + 1 = 1, A + A = A, A + Ā = 1

6.INVERSION law

This law uses the NOT operation. The inversion law states that double
inversion of a variable results in the original variable itself: the
complement of Ā is A.
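All six law families can be spot-checked over the two values; the Python sketch below (ours, using & for AND, | for OR, and 1 - A for the complement) does so exhaustively.

```python
def check_boolean_laws():
    """Spot-check the six law families over the two values 0 and 1."""
    for A in (0, 1):
        for B in (0, 1):
            for C in (0, 1):
                assert A | B == B | A and A & B == B & A       # commutative law
                assert (A | B) | C == A | (B | C)              # associative law (OR)
                assert (A & B) & C == A & (B & C)              # associative law (AND)
                assert A & (B | C) == (A & B) | (A & C)        # distributive law
            assert A & 0 == 0 and A & 1 == A and A & A == A    # AND laws
            assert A | 0 == A and A | 1 == 1 and A | A == A    # OR laws
        assert 1 - (1 - A) == A                                # inversion law
    return True

assert check_boolean_laws()
```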

2.1.Logic statements

Statement (logic)

In logic, a statement is either (a) a meaningful declarative sentence that


is either true or false, or (b) that which a true or false declarative
sentence asserts. In the latter case, a statement is distinct from a sentence
in that a sentence is only one formulation of a statement, whereas there
may be many other formulations expressing the same statement.

Overview

The philosopher of language Peter Strawson advocated the use of the term


"statement" in sense (b) in preference to proposition. Strawson used the
term "Statement" to make the point that two declarative sentences can
make the same statement if they say the same thing in different ways.
Thus in the usage advocated by Strawson, "All men are mortal." and
"Every man is mortal." are two different sentences that make the same
statement.

In either case a statement is viewed as a truth bearer.

Examples of sentences that are (or make) statements:

• "Socrates is a man."
• "A triangle has three sides."
• "Madrid is the capital of Spain."

Examples of sentences that are not (or do not make) statements:

• "Who are you?"


• "Run!"
• "Greenness perambulates."
• "I had one grunch but the eggplant over there."
• "The King of France is wise."
• "Broccoli tastes good."
• "Pegasus exists."

The first two examples are not declarative sentences and therefore are
not (or do not make) statements. The third and fourth are declarative
sentences but, lacking meaning, are neither true nor false and therefore
are not (or do not make) statements. The fifth and sixth examples are
meaningful declarative sentences, but are not statements but rather
matters of opinion or taste. Whether or not the sentence "Pegasus
exists." is a statement is a subject of debate among philosophers.

Bertrand Russell held that it is a (false) statement. Strawson held it is not
a statement at all.

Statement as an abstract entity

In some treatments "statement" is introduced in order to distinguish a


sentence from its informational content. A statement is regarded as the
information content of an information-bearing sentence. Thus, a
sentence is related to the statement it bears like a numeral to the number
it refers to. Statements are abstract logical entities, while sentences are
grammatical entities.

Statements, truth values and truth tables

A statement is an assertion that can be determined to be true or false.


The truth value of a statement is T if it is true and F if it is false. For
example, the statement ``2 + 3 = 5'' has truth value T. Statements that
involve one or more of the connectives ``and'', ``or'', ``not'', ``if then''
and `` if and only if '' are compound statements (otherwise they are
simple statements). For example, ``It is not the case that 2 + 3 = 5'' is the
negation of the statement above. Of course, it is stated more simply as
``2 + 3 ≠ 5''. Other examples of compound statements are:

If you finish your homework then you can watch T.V.
This is a question if and only if this is an answer.
I have read this and I understand the concept.

In symbolic logic, we often use letters, such as p, q and r, to represent
statements, and the following symbols to represent the connectives:
∧ for ``and'', ∨ for ``or'', ¬ for ``not'', → for ``if then'' and ↔ for
``if and only if''.

Note that the connective ``or'' in logic is used in the inclusive sense (not
the exclusive sense as in English). Thus, the logical statement ``It is
raining or the sun is shining '' means it is raining, or the sun is shining or
it is raining and the sun is shining.

If p is the statement ``The wall is red'' and q is the statement ``The lamp
is on'', then p ∨ q is the statement ``The wall is red or the lamp is on (or
both)'' whereas q → p is the statement ``If the lamp is on then the wall is
red''. The statement ¬p ∧ q translates to ``The wall isn't red and the lamp
is on''.

Statements given symbolically have easy translations into English but it


should be noted that there are several ways to write a statement in
English. For example, with the examples above, the statement p → q
directly translates as ``If the wall is red then the lamp is on''. It can also
be stated as ``The wall is red only if the lamp is on'' or ``The lamp is on
if the wall is red''. Similarly, p ∧ ¬q directly translates as ``The wall is red
and the lamp is not on'' but it would be preferable to say ``The wall is
red but the lamp is off''. The truth value of a compound statement is
determined from the truth values of its simple components under certain
rules. For example, if p is a true statement then the truth value of ¬p is F.
Similarly, if p has truth value F, then the statement ¬p has truth value T.
These rules are summarized in the following truth table:

p ¬p
T F
F T

If p and q are statements, then the truth value of the statement p ∨ q is T
except when both p and q have truth value F. The truth value of p ∧ q is
F except if both p and q are true. These and the truth values for the other
connectives appear in the truth tables below.

From these elementary truth tables, we can determine the truth value of
more complicated statements. For example, what is the truth value of
p ∧ ¬q given that p and q are true? In this case, ¬q has truth value F and
from the second line of the tables above, we see the truth value of the
compound statement is F. Had it been the case that p was false and q
true, then again ¬q would be false and from the fourth row of the above
table we see that p ∧ ¬q is a false statement. To consider all the possible
truth values, we construct a truth table.

The lower case t and f were used to record truth values in intermediate
steps. Note that while a truth table involving statements p and q has 4
rows to cover the possibility of each statement being true or false, if we
have additional information about either statement this will reduce the
number of rows in the truth table. If, for example, the statement p is
known to be true, then in constructing the truth table of p ∧ ¬q we will
only have 2 rows. Truth tables involving n statements will have 2^n rows
unless additional information about the truth values of some of these
statements is known.

A statement that is always true is called logically true or a tautology. A


statement that is always false is called logically false or a contradiction.
Symbolically, we denote a tautology by 1 and a contradiction by 0.
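A truth-table builder makes these definitions executable; the Python sketch below (the helper names are our own) classifies a compound statement as a tautology, a contradiction, or neither.

```python
from itertools import product

def truth_table(f, n):
    """All rows (inputs, output) of the truth table of an n-variable statement f."""
    return [(row, f(*row)) for row in product((True, False), repeat=n)]

def is_tautology(f, n):
    return all(value for _, value in truth_table(f, n))

def is_contradiction(f, n):
    return not any(value for _, value in truth_table(f, n))

assert is_tautology(lambda p: p or not p, 1)       # always true: a tautology (1)
assert is_contradiction(lambda p: p and not p, 1)  # always false: a contradiction (0)
assert not is_tautology(lambda p, q: p or q, 2)    # neither

assert len(truth_table(lambda p, q: p and q, 2)) == 4  # 2^n rows for n statements
```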

Worked Examples

Negation

Sometimes in mathematics it's important to determine what the opposite


of a given mathematical statement is. This is usually referred to as
"negating" a statement. One thing to keep in mind is that if a statement is
true, then its negation is false (and if a statement is false, then its
negation is true).

Let's take a look at some of the most common negations.

Negation of "A or B".


Before giving the answer, let's try to do this for an example.

Consider the statement "You are either rich or happy." For this statement
to be false, you can't be rich and you can't be happy. In other words,
the opposite is to be not rich and not happy. Or if we rewrite it in terms
of the original statement we get "You are not rich and not happy."

If we let A be the statement "You are rich" and B be the statement "You
are happy", then the negation of "A or B" becomes "Not A and Not B."

In general, we have the same statement: The negation of "A or B" is the
statement "Not A and Not B."

Negation of "A and B".
Again, let's analyze an example first.

Consider the statement "I am both rich and happy." For this statement to
be false I could be either not rich or not happy. If we let A be the
statement "I am rich" and B be the statement "I am happy", then the
negation of "A and B" becomes "I am not rich or I am not happy" or
"Not A or Not B".

Negation of "If A, then B".


To negate a statement of the form "If A, then B" we should replace it
with the statement "A and Not B". This might seem confusing at first,
so let's take a look at a simple example to help understand why this is
the right thing to do.

Consider the statement "If I am rich, then I am happy." For this


statement to be false, I would need to be rich and not happy. If A is the
statement "I am rich" and B is the statement "I am happy," then the
negation of "If A, then B" is "I am rich" = A, and "I am not happy" = not B.

So the negation of "if A, then B" becomes "A and not B".

Example.
Now let's consider a statement involving some mathematics. Take the
statement "If n is even, then n/2 is an integer." For this statement to be
false, we would need to find an even integer n for which n/2 was not an
integer. So the opposite of this statement is the statement that "n is even
and n/2 is not an integer."

Negation of "For every ...", "For all ...", "There exists ..."
Sometimes we encounter phrases such as "for every," "for any," "for all"
and "there exists" in mathematical statements.

Example.
Consider the statement "For all integers n, either n is even or n is odd".
Although the phrasing is a bit different, this is a statement of the form
"If A, then B." We can reword this sentence as follows: "If n is any
integer, then either n is even or n is odd."

How would we negate this statement? For this statement to be false, all
we would need is to find a single integer which is not even and not odd.
In other words, the negation is the statement "There exists an integer n,
so that n is not even and n is not odd."

In general, when negating a statement involving "for all," "for every",


the phrase "for all" gets replaced with "there exists." Similarly, when

negating a statement involving "there exists", the phrase "there exists"
gets replaced with "for every" or "for all."

Example. Negate the statement "If all rich people are happy, then
all poor people are sad."
First, this statement has the form "If A, then B", where A is the
statement "All rich people are happy" and B is the statement "All poor
people are sad." So the negation has the form "A and not B." So we will
need to negate B. The negation of the statement B is "There exists a poor
person who is not sad."

Putting this together gives: "All rich people are happy, but there exists a
poor person who is not sad" as the negation of "If all rich people are
happy, then all poor people are sad."

Summary.
Statement Negation

"A or B" "not A and not B"

"A and B" "not A or not B"

"if A, then B" "A and not B"

"For all x, A(x)" "There exists x such that not A(x)"

"There exists x such that A(x)" "For every x, not A(x)"
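These negation rules can be sanity-checked over a finite domain; in the Python sketch below (the predicate and domain are arbitrary illustrations of ours) the quantifier rules and the two De Morgan rules all hold.

```python
domain = range(10)
A = lambda x: x % 2 == 0          # an illustrative predicate: "x is even"

# not (for all x, A(x))  ==  there exists x such that not A(x)
assert (not all(A(x) for x in domain)) == any(not A(x) for x in domain)

# not (there exists x, A(x))  ==  for every x, not A(x)
assert (not any(A(x) for x in domain)) == all(not A(x) for x in domain)

# De Morgan rules for the connectives:
for a in (True, False):
    for b in (True, False):
        assert (not (a or b)) == ((not a) and (not b))
        assert (not (a and b)) == ((not a) or (not b))
```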

A B A AND B

True True True

True False False

False True False

False False False

A simple truth table showing all the possible values of "A AND B".

Every time you use a computer you are relying on Boolean logic: a
system of logic established long before computers were around, named
after the English mathematician George Boole (1815 - 1864). In Boolean
logic statements can either be true or false (e.g. at the moment "I want a

cup of tea" is false, but "I want a piece of cake" is always true), and you
can string these together using the words AND, OR and NOT. To
establish if these compound statements are true of false, you might
create what's called a truth table, listing all the possible values the basic
statements can take, and then all the corresponding values the compound
statement can take. (You can read more in George Boole and the
wonderful world of 0s and 1s.)

Truth tables are useful for simple logic statements, but quickly become
tiresome and error prone for more complicated statements. Boole came
to the rescue by ingeniously recognising that binary logical operations
behaved in a way that's strikingly similar to our normal arithmetic
operations, with a few twists.

In this new kind of arithmetic (called Boolean algebra) the variables are
logical statements (loosely speaking, sentences that are either true or
false). As these can only take two values we can write 0 for a statement
we know is false and 1 for a statement we know is true. Then we can
rewrite OR as a kind of addition using only 0s and 1s:

0 + 0 = 0 (since "false OR false" is false)


1 + 0 = 0 + 1 = 1 (since "true OR false" and "false OR true" are both
true)
1 + 1 = 1 (since "true OR true" is true).

We can rewrite AND as a kind of multiplication:

0 x 1 = 1 x 0 = 0 (since "false AND true" and "true AND false" are both
false)
0 x 0 = 0 (since "false AND false" is false)
1 x 1 = 1 (since "true AND true" is true).

As the variables can only have the values of 0 and 1, we can define the
NOT operation as the complement, taking a number to the opposite of its
value:

If A = 1, then NOT A = 0
If A = 0, then NOT A = 1
A + NOT A = 1 (since "true OR false" is true)
A x NOT A = 0 (since "true AND false" is false).

Our new version of these operations is similar in many ways to our more
familiar notions of addition and multiplication but there are a few key
differences. Parts of equations can conveniently disappear in Boolean
algebra, which can be very handy. For example, the variable B in

A + A x B

is irrelevant, no matter what value B has or what logical statement it


represents. This is because if A is true (or equivalently A=1) then A OR
(A AND B) is true no matter whether the statement B is true or false.

And if A is false (that is, A=0) then (A AND B) is false no matter the
value of B, and so A OR (A AND B) is false. So Boolean algebra
provides us with a disappearing act: the expression A + A x B is equal to
a simple little A:

A + A x B = A.
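The disappearing act can be verified over 0 and 1; in the Python sketch below (ours) + is modelled as OR and x as AND, exactly as in the arithmetic above.

```python
def OR(a, b):            # the Boolean "+" of the text
    return max(a, b)

def AND(a, b):           # the Boolean "x" of the text
    return a * b

for A in (0, 1):
    for B in (0, 1):
        # A + A x B = A: the value of B is irrelevant
        assert OR(A, AND(A, B)) == A
```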

Also, in Boolean algebra there is a kind of reverse duality between


addition and multiplication:

(A + B)' = A' x B' and (A x B)' = A' + B'.

These two equalities are known as De Morgan's Laws, after the British
mathematician Augustus de Morgan (1806 - 1871). (You can convince
yourself that they are true using the equivalent truth tables.)

These are just two of the tricks Boolean algebra has up its sleeves for
simplifying complicated logical expressions.
Logical Operators
We will now define the logical operators which we mentioned earlier,
using truth tables. But let us proceed with caution: most of the operators
have names which we may be accustomed to using in ways that are
fuzzy or even contradictory to their proper definitions. In all cases, use
the truth table for an operator as its exact and only definition; try not to

bring to logic the baggage of your colloquial use of the English
language.

The first logical operator which we will discuss is the "AND", or


conjunction operator. For the computer scientist, it is perhaps the most
useful logical operator we will discuss. It is a "binary" operator (a
binary operator is defined as an operator that takes two operands; not
binary in the sense of the binary number system):

p AND q

It is traditionally represented using the symbol ∧,

but we will represent it using the ampersand ("&") since that is the
symbol most commonly used on computers to represent a logical AND.
It has the following truth table:

p q p&q

T T T

T F F

F T F

F F F

Notice that p & q is only T if both p and q are T. Thus the rigorous
definition of AND is consistent with its colloquial definition. This will
be very useful for us when we get to Boolean Algebra: there, we will use
1 in place of T and 0 in place of F, and the AND operator will be used to
"mask" bits.

Perhaps the quintessential example of masking which you will encounter


in your further studies is the use of the "network mask" in networking.
An IP ("Internet Protocol") address is 32 bits long, and the first n bits are
usually used to denote the "network address", while the remaining 32 - n
bits denote the "host address":

n bits 32 - n bits

Network Address Host Address

Suppose that on your network, the three most significant bits in the first
byte of an IP address denote the network address, while the remaining
29 bits of the address are used for the host. To find the network address,
we can AND the first byte with

1 1 1 0 0 0 0 0 (base 2)

since

  xxxyyyyy
& 11100000
----------
  xxx00000
(x & 1 = x, but x & 0 = 0). Thus masking allows the system to separate
the network address from the host address in order to identify the
network to which information is to be sent. Note that most network
prefixes are longer than 3 bits. You will spend a lot of time working
with network masks in your courses on networking.
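The masking step can be reproduced in a few lines; the byte value in the Python sketch below is purely illustrative.

```python
# A hypothetical first byte of an IP address; the x-bits are 110, the y-bits 01011.
first_byte = 0b11001011
mask       = 0b11100000            # keep only the 3 most significant bits

network_part = first_byte & mask            # AND masks out the host bits
host_part    = first_byte & (~mask & 0xFF)  # the complementary mask keeps them

assert network_part == 0b11000000
assert host_part    == 0b00001011
```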

The OR (or disjunction) operator is also a binary operator, and is


traditionally represented using the symbol ∨.

We will represent OR using the stroke ("|"), again due to common usage
on computers. It has the following truth table:

p q p|q

T T T

T F T

F T T

F F F

p | q is true whenever either p is true, q is true or both p and q are true


(so it too agrees with its colloquial counterpart).

The NOT (or negation or inversion) operator is a "unary" operator: it


takes just one operand, like the unary minus in arithmetic (for instance,
-x). NOT is traditionally represented using either the tilde ("~") or the
symbol ¬.

In a programming environment, NOT is frequently represented using the


exclamation point ("!"). Since the exclamation point is too easy to
mistake for the stroke, we will use the tilde instead. Not has the
following truth table:

p ~p

T F

F T

~ p is the negation of p, so it again agrees with its colloquial counterpart;


it is essentially the 1's complement operation.

The XOR (eXclusive OR) operator is a binary operator, and is not


independent of the operators we have presented thus far (many texts do
not introduce it as a separate logical operator). It has no traditional

notation, and is not often used in programming (where our usual logical
operator symbols originate), so we will simply adopt the "X" as the
symbol for the XOR:

p q pXq

T T F

T F T

F T T

F F F

p X q is T if either p is T or q is T, but not both. We will see later how


to write it in terms of ANDs, ORs and NOTs.

XOR has a number of specific and essentially unrelated uses: for


instance, in random numbers, in cryptography and in computer graphics.
One common usage is as the quickest way to zero a value: p X p is always zero (false).
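Python's ^ operator lets us check these XOR facts, including the zeroing trick and the encrypt/decrypt symmetry behind simple ciphers (the 4-bit message and key below are illustrative):

```python
# p X p is always 0 (false), which is why XORing a register with itself
# is the quickest way to zero it.
for p in (0, 1):
    assert p ^ p == 0

# XOR twice with the same key restores the original -- the basis of a
# simple XOR cipher (message and key chosen for illustration).
message, key = 0b1011, 0b0110
cipher = message ^ key        # "encrypt"
assert cipher ^ key == message  # "decrypt": XOR with the key again
```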

The implication operator (IMPLIES) is a binary operator, and is defined


in a somewhat counterintuitive manner (until you appreciate it, that is!).
It is traditionally notated by one of the symbols ⇒ or ⊃, but we will
denote it with an arrow ("→"):

p  q  p → q
T  T    T
T  F    F
F  T    T
F  F    T

So p → q follows the following reasoning:

1. a True premise implies a True conclusion, therefore T → T is T;


2. a True premise cannot imply a False conclusion, therefore T → F
is F; and
3. you can conclude anything from a false assumption, so F →
anything is T.

IMPLIES (implication) is definitely one to watch; while its definition


makes sense (after a bit of thought), it is probably not what you are used
to thinking of as implication.
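One way to make the table concrete: p → q has the same truth table as (~p) | q, which a few lines of Python can verify:

```python
def implies(p, q):
    # p -> q is equivalent to (not p) or q
    return (not p) or q

# Check all four rows of the truth table:
assert implies(True, True) is True    # T -> T is T
assert implies(True, False) is False  # T -> F is F
assert implies(False, True) is True   # F -> anything is T
assert implies(False, False) is True
```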

EQUIVALENCE is our final logical operator; it is a binary operator,


and is traditionally notated by either an equal sign, a three-lined equal
sign or a double arrow ("↔"):

p  q  p ↔ q
T  T    T
T  F    F
F  T    F
F  F    T

p ↔ q is T if p and q are the same (are equal), so it too follows our


colloquial notion.

Just as with arithmetic operators, the logical operators follow "operator


precedence" (an implicit ordering). In an arithmetic expression with
sums and products and no parentheses, the multiplications are performed
before the additions. In a similar fashion, if parentheses are not used, the
operator precedence for logical operators is:

1. First do the NOTs;


2. then do the ANDs;
3. then the ORs and XORs, and finally
4. do the IMPLIES and EQUIVALENCEs.
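Python's not, and and or happen to follow the same precedence order, so the rule can be checked directly (the parenthesized and unparenthesized forms must agree):

```python
# Precedence: NOT binds tightest, then AND, then OR.
# So "not True or True and False" parses as "(not True) or (True and False)".
expr = not True or True and False
assert expr == ((not True) or (True and False))
assert expr is False  # False or False
```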

Gate    Boolean Equation
NOT     Q = A'
AND     Q = A.B
OR      Q = A + B
NAND    Q = (A.B)'
NOR     Q = (A + B)'
EXOR    Q = A ⊕ B or Q = A.B' + A'.B
EXNOR   Q = (A ⊕ B)' or Q = A.B + A'.B'

(here the prime ' denotes NOT; the circuit symbols are not reproduced)

Now let us put this into practice. There are two ways in which Boolean
expressions for a logic system can be formed, either from a truth table or
from a logic circuit diagram. We will now consider each of these in turn,
starting with the easiest, which is to complete a Boolean expression from
a truth table.
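As a preview, the usual recipe for building a Boolean expression from a truth table is to OR together one AND (product) term per row whose output is 1. A small Python sketch of that sum-of-products idea, using the XOR truth table from earlier:

```python
# Sum of products: OR together one AND term per row whose output is 1.
# Target truth table (here: XOR), as rows of ((A, B), Q):
rows = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def q(a, b):
    # The rows with Q = 1 are (0,1) and (1,0), giving Q = A'.B + A.B'
    return ((1 - a) & b) | (a & (1 - b))

for (a, b), expected in rows:
    assert q(a, b) == expected
```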

CHAPTER 3: POLYNOMIALS

A polynomial looks like this:

example of a polynomial
this one has 3 terms

Polynomial comes from poly- (meaning "many") and -nomial (in this
case meaning "term") ... so it says "many terms"

A polynomial can have:

constants (like 3, -20, or ½)

variables (like x and y)

exponents (like the 2 in y2), but only 0, 1, 2, 3, ... etc are allowed

that can be combined using addition, subtraction, multiplication and


division ...

... except ...

... not division by a variable (so something like 2/x is right out)

So:

A polynomial can have constants, variables and exponents,


but never division by a variable.

Polynomial or Not?

These are polynomials:

• 3x
• x-2
• -6y2 - (7/9)x
• 3xyz + 3xy2z - 0.1xz - 200y + 0.5
• 512v5+ 99w5
• 5

(Yes, even "5" is a polynomial, one term is allowed, and it can even be
just a constant!)

And these are not polynomials

• 3xy-2 is not, because the exponent is "-2" (exponents can only be


0,1,2,...)
• 2/(x+2) is not, because dividing by a variable is not allowed
• 1/x is not either
• √x is not, because the exponent is "½" (see fractional exponents)

But these are allowed:

• x/2 is allowed, because you can divide by a constant


• also 3x/8 for the same reason
• √2 is allowed, because it is a constant (= 1.4142...etc)

Monomial, Binomial, Trinomial

There are special names for polynomials with 1, 2 or 3 terms:


How do you remember the names? Think cycles!

There is also quadrinomial (4 terms) and quintinomial (5 terms),


but those names are not often used.

Polynomials can have as many terms as needed, but not an infinite


number of terms.

Variables

Polynomials can have no variable at all

Example: 21 is a polynomial. It has just one term, which is a constant.

Or one variable

Example: x4-2x2+x has three terms, but only one variable (x)

Or two or more variables

Example: xy4-5x2z has two terms, and three variables (x, y and z)

What is Special About Polynomials?

Because of the strict definition, polynomials are easy to work with.

For example we know that:

• If you add polynomials you get a polynomial


• If you multiply polynomials you get a polynomial

So you can do lots of additions and multiplications, and still have a


polynomial as the result.

Also, polynomials of one variable are easy to graph, as they have


smooth and continuous lines.

Example: x4-2x2+x

See how nice and smooth the curve is?

You can also divide polynomials (but the result may not be a
polynomial).

Degree

The degree of a polynomial with only one variable is the largest


exponent of that variable.

Example: The Degree is 3 (the largest exponent of x)
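As a sketch, if a one-variable polynomial is stored as a map from exponent to coefficient (a representation chosen here purely for illustration), the degree is simply the largest exponent with a nonzero coefficient:

```python
def degree(poly):
    # poly maps exponent -> coefficient; e.g. 4x^3 - x + 7 is {3: 4, 1: -1, 0: 7}
    return max(e for e, c in poly.items() if c != 0)

assert degree({3: 4, 1: -1, 0: 7}) == 3
```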

Standard Form

The Standard Form for writing a polynomial is to put the terms with the
highest degree first.

Example: Put this in Standard Form: 3x2 - 7 + 4x3 + x6

The highest degree is 6, so that goes first, then 3, then 2, and then the
constant last:

x6 + 4x3 + 3x2 – 7

Polynomials: Combining "Like Terms"

Probably the most common thing you will be doing with polynomials is
"combining like terms". This is the process of adding together whatever
terms you can, but not overdoing it by trying to add together terms that
can't actually be combined. Terms can be combined ONLY IF they have
the exact same variable part. Here is a rundown of what's what:

4x and 3     NOT like terms   The second term has no variable

4x and 3y    NOT like terms   The second term now has a variable, but it
                              doesn't match the variable of the first term

4x and 3x2   NOT like terms   The second term now has the same variable,
                              but the degree is different

4x and 3x    LIKE TERMS       Now the variables match and the degrees match

Once you have determined that two terms are indeed "like" terms and
can indeed therefore be combined, you can then deal with them in a
manner similar to what you did in grammar school. When you were first
learning to add, you would do "five apples and six apples is eleven
apples". You have since learned that, as they say, "you can't add apples
and oranges". That is, "five apples and six oranges" is just a big pile of
fruit; it isn't something like "eleven applanges". Combining like terms
works much the same way.

• Simplify 3x + 4x

These are like terms since they have the same variable part, so I
can combine the terms: three x's and four x's makes seven x's:

3x + 4x = 7x

• Simplify 2x2 + 3x – 4 – x2 + x + 9

It is often best to group like terms together first, and then simplify:

2x2 + 3x – 4 – x2 + x + 9
= (2x2 – x2) + (3x + x) + (–4 + 9)
= x2 + 4x + 5

In the second line, many students find it helpful to write in the


understood coefficient of 1 in front of variable expressions with no
written coefficient, as is shown in red below:

(2x2 – x2) + (3x + x) + (–4 + 9)


= (2x2 – 1x2) + (3x + 1x) + (–4 + 9)
= 1x2 + 4x + 5
= x2 + 4x + 5

It is not required that the understood 1 be written in when simplifying


expressions like this, but many students find this technique to be very
helpful. Whatever method helps you consistently complete the
simplification is the method you should use.
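The whole procedure — group terms with the exact same variable part, then add their coefficients — can be sketched with a dictionary (the string keys for variable parts are an illustrative choice):

```python
from collections import defaultdict

def combine(terms):
    # terms is a list of (coefficient, variable_part) pairs;
    # like terms share the exact same variable part, e.g. "x^2".
    totals = defaultdict(int)
    for coeff, var in terms:
        totals[var] += coeff
    return dict(totals)

# 2x^2 + 3x - 4 - x^2 + x + 9  ->  x^2 + 4x + 5
result = combine([(2, "x^2"), (3, "x"), (-4, ""), (-1, "x^2"), (1, "x"), (9, "")])
assert result == {"x^2": 1, "x": 4, "": 5}
```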

• Simplify 10x3 – 14x2 + 3x – 4x3 + 4x – 6

10x3 – 14x2 + 3x – 4x3 + 4x – 6


= (10x3 – 4x3) + (–14x2) + (3x + 4x) – 6
= 6x3 – 14x2 + 7x – 6
Warning: When moving the terms around, remember that the terms'
signs move with them. Don't mess yourself up by leaving orphaned
"plus" and "minus" signs behind.

• Simplify 25 – (x + 3 – x2)

The first thing I need to do is take the negative through the


parentheses:

25 – (x + 3 – x2)
= 25 – x – 3 + x2
= x2 – x + 25 – 3
= x2 – x + 22

If it helps you to keep track of the negative sign, put the understood 1 in
front of the parentheses:

25 – (x + 3 – x2)
= 25 – 1(x + 3 – x2)
= 25 – 1x – 3 + 1x2
= 1x2 – 1x + 25 – 3
= 1x2 – 1x + 22
= x2 – 1x + 22

While the first format (without the 1's being written in) is the more
"standard" format, either format should be acceptable (but check with
your instructor). You should use the format that works most successfully
for you.

• Simplify x + 2(x – [3x – 8] + 3)

Warning: This is the kind of problem that we math teachers love to put
on tests (yes, we're cruel people), so you should expect to need to be
able to do this.

This is just an order of operations problem with a variable in it. If I


work carefully from the inside out, paying careful attention to my
"minus" signs, then I should be fine:

x + 2(x – [3x – 8] + 3)
= x + 2(x – 1[3x – 8] + 3)
= x + 2(x – 3x + 8 + 3)
= x + 2(–2x + 11)
= x – 4x + 22
= –3x + 22

• Simplify [(6x – 8) – 2x] – [(12x – 7) – (4x – 5)]

I'll work from the inside out:

[(6x – 8) – 2x] – [(12x – 7) – (4x – 5)]
= [6x – 8 – 2x] – [12x – 7 – 4x + 5]
= [4x – 8] – [8x – 2]
= 4x – 8 – 8x + 2
= –4x – 6

• Simplify –4y – [3x + (3y – 2x + {2y – 7} ) – 4x + 5]

–4y – [3x + (3y – 2x + {2y – 7} ) - 4x + 5]


= –4y – [3x + (3y – 2x + 2y – 7) - 4x + 5]
= –4y – [3x + (–2x + 5y – 7) – 4x + 5]
= –4y – [3x – 2x + 5y – 7 – 4x + 5]
= –4y – [3x – 2x – 4x + 5y – 7 + 5]
= –4y – [–3x + 5y – 2]
= –4y + 3x – 5y + 2
= 3x – 4y – 5y + 2
= 3x – 9y + 2

If you think you need more practice with this last type of problem (with
all the brackets and the negatives and the parentheses, then review the
"Simplifying with Parentheses" lesson.)

Warning: Don't get careless and confuse multiplication and addition.


This may sound like a silly thing to say, but it is the most commonly-
made mistake (after messing up the order of operations):

(x)(x) = x2 (multiplication)

x + x = 2x (addition)

" x2 " DOES NOT EQUAL " 2x "

So if you have something like x3 + x2, DO NOT try to say that this
somehow equals something like x5 or 5x. If you have something like 2x
+ x, DO NOT say that this somehow equals something like 2x2.

A polynomial is a mathematical expression involving a sum of powers in


one or more variables multiplied by coefficients. A polynomial in one
variable (i.e., a univariate polynomial) with constant coefficients is
given by

a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0    (1)

The individual summands with the coefficients (usually) included are


called monomials (Becker and Weispfenning 1993, p. 191), whereas the
products of the form in the multivariate case, i.e., with the
coefficients omitted, are called terms (Becker and Weispfenning 1993,
p. 188). However, the term "monomial" is sometimes also used to mean
polynomial summands without their coefficients, and in some older

works, the definitions of monomial and term are reversed. Care is
therefore needed in attempting to distinguish these conflicting usages.

The highest power in a univariate polynomial is called its order, or


sometimes its degree.

Any polynomial with a_n ≠ 0 can be expressed as

a_n (x - r_1)(x - r_2) ... (x - r_n)    (2)

where the product runs over the roots r_k of the polynomial, and it is
understood that multiple roots are counted with multiplicity.

A polynomial in two variables (i.e., a bivariate polynomial) with


constant coefficients is given by

Σ_i Σ_j a_ij x^i y^j    (3)

The sum of two polynomials is obtained by adding together the


coefficients sharing the same powers of variables (i.e., the same terms)
so, for example,

(4)

and has order less than (in the case of cancellation of leading terms) or
equal to the maximum order of the original two polynomials. Similarly,

the product of two polynomials is obtained by multiplying term by term
and combining the results, for example

(5
)

(6
)

and has order equal to the sum of the orders of the two original
polynomials.

A polynomial quotient

R(x) = P(x) / Q(x)    (7)

of two polynomials P(x) and Q(x) is known as a rational function. The


process of performing such a division is called long division, with
synthetic division being a simplified method of recording the division.

For any polynomial P(x), x − y divides P(x) − P(y), meaning that the
polynomial quotient is a rational polynomial or, in the case of an integer
polynomial, another integer polynomial (N. Sato, pers. comm., Nov. 23,
2004).

Exchanging the coefficients of a univariate polynomial end-to-end


produces a polynomial

a_0 x^n + a_1 x^(n-1) + ... + a_n    (8)

whose roots are reciprocals of the original roots r_k.

Horner's rule provides a computationally efficient method of forming a


polynomial from a list of its coefficients, and can be implemented in the
Wolfram Language as follows.

Polynomial[l_List, x_] := Fold[x #1 + #2&, 0, l]
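The same left fold can be sketched in Python; coefficients are listed from the highest power down:

```python
from functools import reduce

def horner(coeffs, x):
    # Horner's rule: fold acc -> acc * x + c over the coefficient list,
    # e.g. x^2 + 2x + 3 is [1, 2, 3].
    return reduce(lambda acc, c: acc * x + c, coeffs, 0)

assert horner([1, 2, 3], 2) == 2**2 + 2*2 + 3  # 11
```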

The following table gives special names given to polynomials of low


orders.

polynomial order   polynomial name
2                  quadratic
3                  cubic
4                  quartic
5                  quintic
6                  sextic

Polynomials of fourth degree may be computed using three


multiplications and five additions if a few quantities are calculated first
(Press et al. 1989):

(9)

where

(10)

(11)

(12)

(13)

(14)

Similarly, a polynomial of fifth degree may be computed with four


multiplications and five additions, and a polynomial of sixth degree may
be computed with four multiplications and seven additions.

Polynomials of orders one to four are solvable using only rational


operations and finite root extractions. A first-order equation is trivially
solvable. A second-order equation is soluble using the quadratic
equation. A third-order equation is solvable using the cubic equation. A
fourth-order equation is solvable using the quartic equation. It was
proved by Abel and Galois using group theory that general equations of
fifth and higher order cannot be solved rationally with finite root
extractions (Abel's impossibility theorem).

However, solutions of the general quintic equation may be given in
terms of Jacobi theta functions or hypergeometric functions in one
variable. Hermite and Kronecker proved that higher order polynomials
are not soluble in the same manner. Klein showed that the work of
Hermite was implicit in the group properties of the icosahedron. Klein's
method of solving the quintic in terms of hypergeometric functions in
one variable can be extended to the sextic, but for higher order
polynomials, either hypergeometric functions in several variables or
"Siegel functions" must be used (Belardinelli 1960, King 1996, Chow
1999). In the 1880s, Poincaré created functions which give the solution
to the nth order polynomial equation in finite form. These functions
turned out to be "natural" generalizations of the elliptic functions.

3.1.LINEAR FUNCTIONS

Graphing linear systems

A system of linear equation comprises two or more linear equations. The


solution of a linear system is the ordered pair that is a solution to all
equations in the system.

One way of solving a linear system is by graphing. The solution to the


system will then be in the point in which the two equations intersect.

Example

Solve the following system of linear equations

y = 2x + 4
y = 3x + 2

The two lines appear to intersect at (2, 8)

It's a good idea to always check your graphical solution algebraically by


substituting x and y in your equations with the ordered pair

y = 2x + 4:  8 = 2·2 + 4, so 8 = 8

y = 3x + 2:  8 = 3·2 + 2, so 8 = 8
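The graphical answer can also be confirmed algebraically: equating 2x + 4 and 3x + 2 gives x = 2 and then y = 8. A minimal sketch:

```python
# Solve y = 2x + 4 and y = 3x + 2 by equating the right-hand sides:
# 2x + 4 = 3x + 2  =>  x = 4 - 2 = 2, then y = 2*2 + 4 = 8.
x = 4 - 2
y = 2 * x + 4
assert (x, y) == (2, 8)
assert y == 3 * x + 2   # the point satisfies the second equation too
```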

A linear system that has exactly one solution is called a consistent


independent system. Consistent means that the lines intersect and
independent means that the lines are distinct.

Linear systems composed of parallel lines that have the same slope but
different y-intercepts do not have a solution since the lines won't
intersect. Linear systems without a solution are called inconsistent
systems.

Linear systems composed of lines that have the same slope and the same
y-intercept are said to be consistent dependent systems. Consistent
dependent systems have infinitely many solutions since the lines
coincide.

Linear Equations

A linear equation is an equation for a straight line

These are all linear equations:


y = 2x+1 5x = 6+3y y/2 = 3 - x

Let us look more closely at one example:

Example: y = 2x+1 is a linear equation:

The graph of y = 2x+1 is a straight line

• When x increases, y increases twice as fast, hence 2x


• When x is 0, y is already 1. Hence +1 is also needed
• So: y = 2x + 1

Here are some example values:

x    y = 2x + 1
-1   y = 2 × (-1) + 1 = -1
0    y = 2 × 0 + 1 = 1
1    y = 2 × 1 + 1 = 3
2    y = 2 × 2 + 1 = 5

Check for yourself that those points are part of the line above!

Different Forms

There are many ways of writing linear equations, but they usually have
constants (like "2" or "c") and must have simple variables (like "x" or
"y").

Examples: These are linear equations:


y = 3x – 6

y - 2 = 3(x + 1)

y + 2x - 2 = 0

5x = 6

y/2 = 3

But the variables (like "x" or "y") in Linear Equations do NOT have:

• Exponents (like the 2 in x2)


• Square roots, cube roots, etc

Examples: These are NOT linear equations:
y2 - 2 = 0

3√x - y = 6

x3/2 = 16

Slope-Intercept Form

The most common form is the slope-intercept equation of a straight line:

y = mx + b, where m is the Slope (or Gradient) and b is the Y Intercept

Example: y = 2x + 1

(Our example from the top, which is in Slope-Intercept form)

• Slope: m = 2
• Intercept: b = 1


Point-Slope Form

Another common one is the Point-Slope Form of the equation of a


straight line:

y - y1 = m(x - x1)

Example: y - 3 = ¼(x - 2)

• x1 = 2
• y1 = 3
• m=¼

General Form

And there is also the General Form of the equation of a straight line:

Ax + By + C = 0
(A and B cannot both be 0)

Example: 3x + 2y - 4 = 0

• A=3
• B=2
• C = -4

There are other, less common forms as well.

As a Function

Sometimes a linear equation is written as a function, with f(x) instead of


y:

y = 2x - 3 f(x) = 2x - 3

These are the same!

And functions are not always written using f(x):

y = 2x - 3 w(u) = 2u – 3 h(z) = 2z - 3

These are also the same!

The Identity Function

There is a special linear function called the "Identity Function":

f(x) = x

And here is its graph:

It makes a 45° angle with the x-axis (its slope is 1)

It is called "Identity" because what comes out is identical to what goes


in:

In     Out
0      0
5      5
-2     -2
...etc ...etc

Constant Functions

Another special type of linear function is the Constant Function ... it is a


horizontal line:

f(x) = C

No matter what value of "x", f(x) is always equal to some constant


value.


CHAPTER 4: VECTOR AND PLANES

4.1.Introduction to Vectors and Scalars

DEFINITION: Vectors were developed to provide a compact way of


dealing with multidimensional situations without writing every bit of
information. Vectors are quantities that have magnitude and direction;
they can be denoted in three ways: in bold (r), underlined (r) or with an
arrow above (r).

The position point of a vector is defined using Cartesian co-ordinates: it
uses the coordinates of the OX, OY and OZ axes where O is the origin.
We will be looking at vectors in 3 dimensional space in Cartesian
coordinates. Similar ideas hold for vectors in n dimensional space (n-
vectors).

In the above diagram r is the position vector of a point A relative to the


origin O. So r = OA, the vector from O to A.

4.2.Addition and Subtraction of vectors

DEFINITION: We will look at two laws involving addition and


subtraction; The commutative law and the Associative law.

The commutative law:

Addition and subtraction of vectors obeys the commutative law,

This means that:

In terms of the following components

For addition:

For subtraction:

The Associative law:

This uses different routes to get to the same final destination

In terms of the following components:

Top

The length and unit vector of a vector

The length of a vector :

DEFINITION: For any vector like this one

The length of a (which is denoted by |a|) is given by:

The unit vector of a vector

DEFINITION: A unit vector is a vector with length 1. By standard


convention we let i, j and k be unit vectors along the positive x, y and z
axes, so in terms of components:

Find the length of your own vector and the related unit vector.
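A short Python sketch of the length and unit-vector formulas, with component tuples standing in for the i, j, k notation:

```python
import math

def length(a):
    # |a| = sqrt(a1^2 + a2^2 + a3^2)
    return math.sqrt(sum(c * c for c in a))

def unit(a):
    # a / |a|: same direction as a, but length 1
    l = length(a)
    return tuple(c / l for c in a)

a = (3, 4, 0)
assert length(a) == 5.0
assert math.isclose(length(unit(a)), 1.0)
```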

4.3.Scalar multiplication of vectors

DEFINITION: This involves the multiplying of a vector by a scalar, i.e.


a number.

In terms of the following components:

Scalar multiplication will change the length of the vector, and if the
factor λ is negative the vector will point in the opposite direction.
Scalar multiplication satisfies the following properties, where a and b
are vectors and λ and μ are scalars.

Now try multiplying your own vectors and scalars.

4.4.The vector equation of a line

DEFINITION: As taught in A-level and before the equation of a


straight line is given by y = mx + c where m is the gradient and c is the
y-intercept of the line. This is for two dimensions. For three dimensions
similar concept can be used to represent a line in vector form.

In the diagram above, the vector (1, m) is parallel to the line AB and
point A with coordinates (0, c) lies on the line AB. Let B be a typical
point on the line with position vector r. As d = (0, c) is a point on
the line and n = (1, m) is a vector parallel to the line, the vector
equation of the line AB is given by

r = d + λn,  λ ∈ ℝ.

Find the vector equation of your own line by entering two points.

Intersections of vectors with the X-Y plane

Continuing from above we will now look a case where a given line
intersects the X-Y plane. The vector equation of the line is

If we take:

we get that

We know that in the x-y plane z = 0 so;

Substituting this into the equations gives λ = -2, which when substituted
into the second and first equations gives y = -2 and x = -2. The
intersection of the above line with the x-y plane therefore occurs at the
point (-2, -2, 0).

Scalar product

Let us take two vectors.

Then the scalar product of a and b, denoted by a.b, is

a.b = a1b1 + a2b2 + a3b3

and also a.b = |a| |b| cos θ, where θ is the angle between the vectors.
Setting these equal to each other gives a formula for the angle.

The angle between two vectors

Using the two formulae above and setting them equal to each other, we can
calculate the angle between two vectors:

cos θ = a.b / (|a| |b|).
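The scalar (dot) product and the angle formula cos θ = a.b / (|a||b|) can be sketched as:

```python
import math

def dot(a, b):
    # a.b = a1*b1 + a2*b2 + a3*b3
    return sum(x * y for x, y in zip(a, b))

def angle(a, b):
    # cos(theta) = a.b / (|a| |b|)
    return math.acos(dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))))

# Perpendicular vectors: zero dot product, 90-degree angle.
assert dot((1, 0, 0), (0, 1, 0)) == 0
assert math.isclose(angle((1, 0, 0), (0, 1, 0)), math.pi / 2)
```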

4.6.Vector equations of planes

DEFINITION: Let a vector normal (vector perpendicular to the plane)


be denoted as n, a position vector of some point in plane denoted as a
and a typical point on the plane denoted as r.

Looking at the above figure, r − a is perpendicular to n. So the vector
equation of the plane is given by:

(r − a).n = 0
4.7.Angles of intersection of two planes

DEFINITION: The angle between two planes is the angle between the
two normals. The planes must first be written in vector form.

For example let our plane be

In vector form this is

where

this can also be written as

If we had two planes then we would have two normal vectors, say n1 and
n2. We find the angle between these two normals using the same formula
as when we found the angle between two vectors (above).

Find the angle between two of your own planes.

4.8.Vector Product

DEFINITION: The vector product is fundamentally different from the


scalar product. The vector product of two vectors is a vector but the
scalar product is a scalar. The vector product is given by:

a × b = |a| |b| sin θ n

where
|a| is the length of a and |b| is the length of b
θ is the angle between the two vectors
n is the unit vector perpendicular to both a and b, whose direction is
determined by the right-hand screw rule.

For vectors a × b is found by using the following:

For simplicity this can be written in terms of determinants.
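The determinant expansion of a × b can be sketched directly in Python:

```python
def cross(a, b):
    # a x b from the determinant expansion along the unit vectors i, j, k
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# i x j = k, and a x b is perpendicular to both a and b:
assert cross((1, 0, 0), (0, 1, 0)) == (0, 0, 1)
a, b = (1, 2, 3), (4, 5, 6)
n = cross(a, b)
assert sum(x * y for x, y in zip(n, a)) == 0
assert sum(x * y for x, y in zip(n, b)) == 0
```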

4.9.The Area of a Vector Triangle

DEFINITION: In terms of vectors, the area of the triangle below is
½ |a × b|, where a and b are vectors along two sides of the triangle.

Finding the Equation of a Plane given Three Points

Recall that the vector equation of a plane is (r - a). n = 0 where a is a


point on the plane and n is a vector normal to the plane.

Suppose we have the points

which are in Cartesian form. We first need to find two vectors parallel
to the plane.

To get a vector n, which is normal to the plane, we take the vector


product of the above vectors.

This gives a vector denoted by n. So the equation of the plane is found
using the same method as above.

By substituting in, we obtain the Cartesian equation of the plane.
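Putting the steps together — two vectors parallel to the plane, their vector product as the normal n, then (r − a).n = 0 — as a Python sketch with made-up points:

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(p1, p2, p3):
    # Two vectors parallel to the plane, then n = (p2 - p1) x (p3 - p1).
    u = tuple(b - a for a, b in zip(p1, p2))
    v = tuple(b - a for a, b in zip(p1, p3))
    n = cross(u, v)
    d = -sum(ni * pi for ni, pi in zip(n, p1))
    return n, d          # plane: n1*x + n2*y + n3*z + d = 0

# Hypothetical points; all three must satisfy the resulting equation:
n, d = plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1))
for p in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    assert sum(ni * pi for ni, pi in zip(n, p)) + d == 0
```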

4.10.Minimum distance between two skew lines

DEFINITION: Skew lines are lines or vectors which are not parallel
and do not meet. We now seek the minimum distance between these
lines. We draw a line between the two lines, called a transversal, which
is perpendicular to both lines.

The transversal connects A and B, n3 is the unit vector in the direction
AB and p is the required distance. As mentioned above n3 is
perpendicular to both n1 and n2.
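The distance p is then |(a2 − a1) · n3|, where n3 is the unit vector along the vector product of the two direction vectors. A sketch with hypothetical line data:

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def skew_distance(a1, d1, a2, d2):
    # Lines r = a1 + s*d1 and r = a2 + t*d2.
    # p = |(a2 - a1) . n3| with n3 the unit vector along d1 x d2.
    n = cross(d1, d2)
    nlen = math.sqrt(sum(c * c for c in n))
    w = tuple(q - p for p, q in zip(a1, a2))
    return abs(sum(wi * ni for wi, ni in zip(w, n))) / nlen

# Two illustrative skew lines: the x-axis, and a line along y at height 5.
assert math.isclose(skew_distance((0, 0, 0), (1, 0, 0), (0, 0, 5), (0, 1, 0)), 5.0)
```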

CHAPTER 5: ANALYTICAL GEOMETRY

What Is Analytic Geometry?

Analytic Geometry is a branch of algebra that is used to model


geometric objects - points, (straight) lines, and circles being the most
basic of these. Analytic geometry is a great invention of Descartes and
Fermat.

In plane analytic geometry, points are defined as ordered pairs of


numbers, say, (x, y), while the straight lines are in turn defined as the
sets of points that satisfy linear equations, see the excellent expositions
by D. Pedoe or D. Brannan et al. From the view of analytic geometry,
geometric axioms are derivable theorems. For example, for any two
distinct points (x1, y1) and (x2, y2), there is a single line ax + by + c = 0
that passes through these points. Its coefficients a, b, c can be found (up
to a constant factor) from the linear system of two equations

ax1 + by1 + c = 0
ax2 + by2 + c = 0,

or directly from the determinant equation

| x   y   1 |
| x1  y1  1 |  =  0.
| x2  y2  1 |
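Expanding that determinant gives a = y1 − y2, b = x2 − x1, c = x1y2 − x2y1; a quick Python check:

```python
def line_through(p1, p2):
    # Coefficients (a, b, c) of ax + by + c = 0 through two points,
    # from expanding the 3x3 determinant along its first row.
    x1, y1 = p1
    x2, y2 = p2
    a = y1 - y2
    b = x2 - x1
    c = x1 * y2 - x2 * y1
    return a, b, c

# Both points must satisfy the resulting equation:
a, b, c = line_through((0, 1), (2, 5))
for x, y in [(0, 1), (2, 5)]:
    assert a * x + b * y + c == 0
```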

However, no axiomatic theory may escape using undefined elements. In


Set Theory that underlies much of mathematics and, in particular,
analytic geometry, the most fundamental notion of set remains
undefined.

Geometry of the three-dimensional space is modeled with triples of
numbers (x, y, z) and a 3D linear equation ax + by + cz + d = 0 defines a
plane. In general, analytic geometry provides a convenient tool for
working in higher dimensions.

Within the framework of analytic geometry one may (and does) model
non-Euclidean geometries as well. For example, in plane projective
geometry a point is a triple of homogenous coordinates (x, y, z), not all
0, such that

(tx, ty, tz) = (x, y, z),

for all t ≠ 0, while a line is described by a homogeneous equation

ax + by + cz = 0.

In analytic geometry, conic sections are defined by second degree


equations:

ax² + bxy + cy² + dx + ey + f = 0.

That part of analytic geometry that deals mostly with linear equations is
called Linear Algebra.

Cartesian analytic geometry is geometry in which the axes x = 0 and y =


0 are perpendicular.
The components of the n-tuple x = (x1, ..., xn) are known as its
coordinates. When n = 2 or n = 3, the first coordinate is called the
abscissa and the second the ordinate.

5.1.PLANES

Plane

A plane is a flat surface with no thickness.


Our world has three dimensions,
but there are only two
dimensions on a plane:

• length and width make a


plane
• x and y also make a plane

And a plane goes on forever.

Examples

It is actually hard to give a real example!

When we draw something on a flat piece of


paper we are drawing on a plane ...

... except that the paper itself is not a plane,

because it has thickness! And it should extend
forever, too.
So the very top of a perfect piece of paper
that goes on forever is the right idea!

Also, the top of a table, the floor and a whiteboard are all like a plane.


A plane has 2 Dimensions (and is often called 2D):

Point, Line, Plane and Solid

A Point has no dimensions, only position


A Line is one-dimensional
A Plane is two dimensional (2D)
A Solid is three-dimensional (3D)

Plane vs Plain

In geometry a "plane" is a flat surface with no thickness.

But a "plain" is a treeless mostly flat expanse of land ... it is also flat,
but not in the pure sense we use in geometry.

Both words have other meanings too: Plane can mean an airplane, a
level, or a tool for cutting things flat; Plain can mean without special
things, or well understood.

Imagine

Imagine you lived in a two-dimensional world. You could travel around,
visit friends, but nothing in your world has height.

You could measure distances and angles.

You could travel fast or slow. You could go forward, backwards or
sideways. You could move in straight lines, circles, or anything so long
as you never go up or down.

What is a Plane?

In geometry, a plane is a flat surface that extends forever in two
dimensions, but has no thickness. It is a bit difficult to visualize a plane
because in real life, there is nothing that we can use as a true example of
a geometric plane. However, we can use the surface of a wall, the floor,
or even a piece of paper to represent a part of a geometric plane. You
just have to remember that unlike the real-world parts of planes,
geometric planes have no edge to them.
In algebra, we graph points in the coordinate plane, which is an
example of a geometric plane. The coordinate plane has a number line
that extends left to right indefinitely and another one that extends up and
down indefinitely. You can never see the entire coordinate plane. The
fact that it extends forever along the x- and y-axis is just indicated by
arrows on the ends of the number lines. Those are the two dimensions
over which a plane extends forever. When you graph points, you never
graph one point deeper into the paper than another point. That shows
that the coordinate plane does not have thickness to it.

A coordinate plane - an example of a geometric plane

Drawing and Naming Planes

In order for us to discuss planes, we need to be able to see them and
label them. Therefore, even though geometric planes do not have
edges to them, when they are drawn, they have an outline. Usually, they
are represented by a parallelogram that is shaded in, like this:

If we want to talk about two or more different planes, then we need to be
able to name each plane. There are two ways to label planes. Most
frequently, you use three or four of the points that are in the plane as the
name. Remember that points are indicated with a dot and are labeled
with a capital letter. The second way to name a plane is with just one
capital letter that is written in the corner of the image of the plane. This
letter does not have a dot next to it and is sometimes written in a script
font that is different from the font used for points.

Below, we will go through some example questions to help you
understand which points are in a plane and how to use them to name the
plane.

5.2. SPHERE

A sphere (from Greek σφαίρα — sphaira, "globe, ball"[1]) is a perfectly
round geometrical object in three-dimensional space, such as the shape
of a round ball. Like a circle in two dimensions, a perfect sphere is
completely symmetrical around its center, with all points on the surface
lying the same distance r from the center point. This distance r is known
as the radius of the sphere.

In higher mathematics, a careful distinction is made between the surface
of a sphere (referred to as a “sphere”), and the inside of a sphere
(referred to as a “ball”). Thus, a sphere in three dimensions is considered
to be a two-dimensional spherical surface embedded in three-
dimensional Euclidean space, while a ball is a solid figure bounded by a
sphere.

This section deals with the mathematical concept of a sphere. In physics,
a sphere is an object (usually idealized for the sake of simplicity)
capable of colliding or stacking with other objects which occupy space.

Volume and surface area

The volume inside a sphere is given by the formula

V = (4/3)πr³,

where r is the radius of the sphere. This formula was first derived by
Archimedes, who showed that the volume of a sphere is 2/3 that of a
circumscribed cylinder. (This assertion follows from Cavalieri's
principle.) In modern mathematics, this formula is most easily derived
using integral calculus, e.g. using disk integration.

The surface area of a sphere is given by the formula

A = 4πr²   (equivalently, πD², where D = 2r is the diameter).

This is the derivative of the formula for the volume with respect to r.
This formula was also first derived by Archimedes, based upon the fact
that the projection to the lateral surface of a circumscribing cylinder (i.e.
the Gall-Peters map projection) is area-preserving.
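As a quick numerical check of the relation between the two formulas, here is a short Python sketch (the function names are illustrative):

```python
import math

def sphere_volume(r):
    # V = (4/3) * pi * r^3
    return (4.0 / 3.0) * math.pi * r ** 3

def sphere_area(r):
    # A = 4 * pi * r^2
    return 4.0 * math.pi * r ** 2

# The surface area is the derivative of the volume with respect to r,
# so a central-difference quotient of V should match A closely.
r, h = 2.0, 1e-6
dV_dr = (sphere_volume(r + h) - sphere_volume(r - h)) / (2 * h)
print(abs(dV_dr - sphere_area(r)) < 1e-5)  # True
```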

Equations in R3

In analytic geometry, a sphere with center (x0, y0, z0) and radius r is the
locus of all points (x, y, z) such that

(x − x0)² + (y − y0)² + (z − z0)² = r².

The points on the sphere with radius r can be parametrized via

x = x0 + r sin θ cos φ
y = y0 + r sin θ sin φ   (0 ≤ θ ≤ π, 0 ≤ φ < 2π)
z = z0 + r cos θ

(see also trigonometric functions and spherical coordinates).

A sphere of any radius centered at zero is an integral surface of the
following differential form:

x dx + y dy + z dz = 0.

This equation reflects the fact that the position and velocity vectors of a
point travelling on the sphere are always orthogonal to each other.

The surface area of a sphere with diameter D is

A = πD².

More generally, the area element on the sphere is given in spherical
coordinates by

dA = r² sin θ dθ dφ.

In particular, the total area can be obtained by integration:

A = ∫∫ r² sin θ dθ dφ = 4πr²   (θ from 0 to π, φ from 0 to 2π).

The volume of a sphere with radius r and diameter d = 2r is

V = (4/3)πr³ = πd³/6.

The sphere has the smallest surface area among all surfaces enclosing a
given volume and it encloses the largest volume among all closed
surfaces with a given surface area. For this reason, the sphere appears in
nature: for instance bubbles and small water drops are roughly spherical,
because the surface tension locally minimizes surface area. The surface

area in relation to the mass of a sphere is called the specific surface area.
From the above stated equations it can be expressed as follows:

SSA = A/(Vρ) = 3/(rρ),

where ρ is the density of the sphere.

An image of one of the most accurate man-made spheres, as it refracts
the image of Einstein in the background. This sphere was a fused quartz
gyroscope for the Gravity Probe B experiment which differs in shape
from a perfect sphere by no more than 40 atoms of thickness. It is
thought that only neutron stars are smoother. It was announced on 1
July, 2008 that Australian scientists had created even more perfect
spheres, accurate to 0.3 nanometers, as part of an international hunt to
find a new global standard kilogram.[2]

A sphere can also be defined as the surface formed by rotating a circle
about any diameter. If the circle is replaced by an ellipse, and rotated
about the major axis, the shape becomes a prolate spheroid, rotated
about the minor axis, an oblate spheroid.

Terminology

Pairs of points on a sphere that lie on a straight line through its center are
called antipodal points. A great circle is a circle on the sphere that has
the same center and radius as the sphere, and consequently divides it into
two equal parts. The shortest path between two distinct non-antipodal
points on the surface, measured along the surface, lies on the unique
great circle passing through the two points. Equipped with the
great-circle distance, a great circle becomes the Riemannian circle.
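The great-circle distance can be computed from the central angle between the two points; a Python sketch using the spherical law of cosines (the argument names and the default radius are illustrative choices):

```python
import math

def great_circle_distance(lat1, lon1, lat2, lon2, r=1.0):
    # Spherical law of cosines for the central angle between two points
    # given by latitude/longitude in radians, scaled by the radius r.
    central = math.acos(
        math.sin(lat1) * math.sin(lat2)
        + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
    )
    return r * central

# A quarter of a great circle: from the north pole down to the equator.
d = great_circle_distance(math.pi / 2, 0.0, 0.0, 0.0)
print(abs(d - math.pi / 2) < 1e-9)  # True
```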

If a particular point on a sphere is (arbitrarily) designated as its north
pole, then the corresponding antipodal point is called the south pole and
the equator is the great circle that is equidistant to them. Great circles
through the two poles are called lines (or meridians) of longitude, and
the line connecting the two poles is called the axis of rotation. Circles on
the sphere that are parallel to the equator are lines of latitude. This
terminology is also used for astronomical bodies such as the planet
Earth, even though it is neither spherical nor even spheroidal (see geoid).

Hemisphere

A sphere is divided into two equal hemispheres by any plane that passes
through its center. If two intersecting planes pass through its center, then
they will subdivide the sphere into four lunes or biangles, the vertices of
which all coincide with the antipodal points lying on the line of
intersection of the planes.

The antipodal quotient of the sphere is the surface called the real
projective plane, which can also be thought of as the northern
hemisphere with antipodal points of the equator identified.

The volume of a hemisphere is

V = (2/3)πr³.

The surface area of a hemisphere (including its circular base) is

A = 2πr² + πr² = 3πr².

Generalization to other dimensions

Spheres can be generalized to spaces of any dimension. For any natural
number n, an n-sphere, often written as Sn, is the set of points in (n + 1)-
dimensional Euclidean space which are at a fixed distance r from a
central point of that space, where r is, as before, a positive real number.
In particular:

• a 0-sphere is a pair of endpoints of an interval (−r, r) of the real line
• a 1-sphere is a circle of radius r
• a 2-sphere is an ordinary sphere
• a 3-sphere is a sphere in 4-dimensional Euclidean space.

Spheres for n > 2 are sometimes called hyperspheres.

The n-sphere of unit radius centred at the origin is denoted Sn and is
often referred to as "the" n-sphere. Note that the ordinary sphere is a 2-
sphere, because it is a 2-dimensional surface (which is embedded in 3-
dimensional space).

The surface area of the (n − 1)-sphere of radius 1 is

2π^(n/2) / Γ(n/2),

where Γ(z) is Euler's Gamma function.

For radius r the surface area is this quantity times r^(n−1), and the
volume within is the surface area times r/n, or

V = π^(n/2) r^n / Γ(n/2 + 1).
Generalization to metric spaces

More generally, in a metric space (E,d), the sphere of center x and radius
r > 0 is the set of points y such that d(x,y) = r.

If the center is a distinguished point considered as origin of E, as in a
normed space, it is not mentioned in the definition and notation. The
same applies for the radius if it is taken equal to one, as in the case of a
unit sphere.
In contrast to a ball, a sphere may be an empty set, even for a large
radius. For example, in Zn with Euclidean metric, a sphere of radius r is
nonempty only if r² can be written as a sum of n squares of integers.
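This nonemptiness condition can be tested by brute force; a Python sketch for small inputs (the function name is illustrative):

```python
from itertools import product

def lattice_sphere_nonempty(n, r_squared):
    # In Z^n with the Euclidean metric, the sphere of radius r about the
    # origin is nonempty iff r^2 is a sum of n integer squares.
    # Brute-force search over a bounding box (small inputs only).
    bound = int(r_squared ** 0.5)
    return any(
        sum(c * c for c in point) == r_squared
        for point in product(range(-bound, bound + 1), repeat=n)
    )

print(lattice_sphere_nonempty(2, 3))  # False: 3 is not a sum of two squares
print(lattice_sphere_nonempty(3, 3))  # True: 3 = 1 + 1 + 1
```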

Topology

In topology, an n-sphere is defined as a space homeomorphic to the
boundary of an (n+1)-ball; thus, it is homeomorphic to the Euclidean
n-sphere, but perhaps lacking its metric.

• a 0-sphere is a pair of points with the discrete topology
• a 1-sphere is a circle (up to homeomorphism); thus, for example,
(the image of) any knot is a 1-sphere
• a 2-sphere is an ordinary sphere (up to homeomorphism); thus, for
example, any spheroid is a 2-sphere

The n-sphere is denoted Sn. It is an example of a compact topological
manifold without boundary. A sphere need not be smooth; if it is
smooth, it need not be diffeomorphic to the Euclidean sphere.

The Heine-Borel theorem implies that a Euclidean n-sphere is compact.
The sphere is the inverse image of a one-point set under the continuous
function ||x||. Therefore the sphere is closed. Sn is also bounded,
therefore it is compact.

Spherical geometry

Great circle on a sphere

The basic elements of plane geometry are points and lines. On the
sphere, points are defined in the usual sense, but the analogue of "line"
may not be immediately apparent. If one measures by arc length one
finds that the shortest path connecting two points lying entirely in the
sphere is a segment of the great circle containing the points; see
geodesic. Many theorems from classical geometry hold true for this
spherical geometry as well, but many do not (see parallel postulate). In
spherical trigonometry, angles are defined between great circles. Thus
spherical trigonometry is different from ordinary trigonometry in many
respects. For example, the sum of the interior angles of a spherical
triangle exceeds 180 degrees. Also, any two similar spherical triangles
are congruent.

PROPERTIES OF THE SPHERE

In their book Geometry and the Imagination[3] David Hilbert and
Stephan Cohn-Vossen describe 11 properties of the sphere and discuss
whether these properties uniquely determine the sphere. Several
properties hold for the plane which can be thought of as a sphere with
infinite radius. These properties are:

1. The points on the sphere are all the same distance from a fixed
point. Also, the ratio of the distance of its points from two fixed
points is constant.

The first part is the usual definition of the sphere and determines it
uniquely. The second part can be easily deduced and follows a
similar result of Apollonius of Perga for the circle. This second
part also holds for the plane.

2. The contours and plane sections of the sphere are circles.

This property defines the sphere uniquely.

3. The sphere has constant width and constant girth.

The width of a surface is the distance between pairs of parallel
tangent planes. There are numerous other closed convex surfaces
which have constant width, for example Meissner's tetrahedron.
The girth of a surface is the circumference of the boundary of its
orthogonal projection on to a plane. It can be proved that each of
these properties implies the other.

4. All points of a sphere are umbilics.

At any point on a surface we can find a normal direction which is
at right angles to the surface; for the sphere these lie on the lines
radiating out from the center of the sphere. The intersection of a
plane containing the normal with the surface will form a curve
called a normal section, and the curvature of this curve is the
sectional curvature. For most points on a surface different sections
will have different curvatures; the maximum and minimum values
of these are called the principal curvatures. It can be proved that
any closed surface will have at least four points called umbilical
points. At an umbilic all the sectional curvatures are equal, in
particular the principal curvatures are equal. Umbilical points can
be thought of as the points where the surface is closely
approximated by a sphere.
For the sphere the curvatures of all normal sections are equal, so
every point is an umbilic. The sphere and plane are the only
surfaces with this property.

5. The sphere does not have a surface of centers.

For a given normal section there is a circle whose curvature is the
same as the sectional curvature, is tangent to the surface, and whose
center lies on the normal line. Take the two centers corresponding
to the maximum and minimum sectional curvatures; these are
called the focal points, and the set of all such centers forms the
focal surface.
For most surfaces the focal surface forms two sheets, each of which
is a surface, which come together at umbilical points. There are
a number of special cases. For channel surfaces one sheet forms a
curve and the other sheet is a surface; for cones, cylinders, tori
and cyclides both sheets form curves. For the sphere the center of
every osculating circle is at the center of the sphere and the focal
surface forms a single point. This is a unique property of the
sphere.

6. All geodesics of the sphere are closed curves.

Geodesics are curves on a surface which give the shortest distance
between two points. They are a generalisation of the concept of a
straight line in the plane. For the sphere the geodesics are great
circles. There are many other surfaces with this property.

7. Of all the solids having a given volume, the sphere is the one with
the smallest surface area; of all solids having a given surface area,
the sphere is the one having the greatest volume.

These properties define the sphere uniquely. These properties can
be seen by observing soap bubbles. A soap bubble will enclose a
fixed volume and due to surface tension it will try to minimize its
surface area. This is why a free floating soap bubble approximates
a sphere (though external forces such as gravity will distort the
bubble's shape slightly).

8. The sphere has the smallest total mean curvature among all convex
solids with a given surface area.

The mean curvature is the average of the two principal curvatures
and as these are constant at all points of the sphere then so is the
mean curvature.

9. The sphere has constant positive mean curvature.

The sphere is the only surface without boundary or singularities
with constant positive mean curvature. There are other surfaces
with constant mean curvature, the minimal surfaces have zero
mean curvature.

10. The sphere has constant positive Gaussian curvature.

Gaussian curvature is the product of the two principal curvatures.
It is an intrinsic property which can be determined by measuring
length and angles and does not depend on the way the surface is
embedded in space. Hence, bending a surface will not alter the
Gaussian curvature and other surfaces with constant positive
Gaussian curvature can be obtained by cutting a small slit in the
sphere and bending it. All these other surfaces would have
boundaries and the sphere is the only surface without boundary
with constant positive Gaussian curvature. The pseudosphere is an
example of a surface with constant negative Gaussian curvature.

11. The sphere is transformed into itself by a three-parameter
family of rigid motions.

Consider a unit sphere placed at the origin; a rotation around the x, y
or z axis will map the sphere onto itself. Indeed any rotation about
a line through the origin can be expressed as a combination of
rotations around the three coordinate axes (see Euler angles). Thus
there is a three-parameter family of rotations which transform the
sphere onto itself; this is the rotation group, SO(3). The plane is
the only other surface with a three-parameter family of
transformations (translations along the x and y axes and rotations
around the origin). Circular cylinders are the only surfaces with
two-parameter families of rigid motions, and the surfaces of
revolution and helicoids are the only surfaces with a one-parameter
family.

PART: TWO

CHAPTER 6: LINEAR ALGEBRA

Linear Algebra

Linear algebra is the study of linear sets of equations and their
transformation properties. Linear algebra allows the analysis of rotations
in space, least squares fitting, solution of coupled differential equations,
determination of a circle passing through three given points, as well as
many other problems in mathematics, physics, and engineering.
Confusingly, linear algebra is not actually an algebra in the technical
sense of the word "algebra" (i.e., a vector space over a field F together
with an inner multiplication, and so on).

The matrix and determinant are extremely useful tools of linear algebra.
One central problem of linear algebra is the solution of the matrix
equation

Ax = b

for x. While this can, in theory, be solved using a matrix inverse,

x = A⁻¹b,

other techniques such as Gaussian elimination are numerically more
robust.
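A minimal sketch of Gaussian elimination with partial pivoting in Python (the function name and signature are illustrative, and the code assumes a nonsingular matrix; it is not a production solver):

```python
def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    # A is a list of n rows of n floats; b is a list of n floats.
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Pivot on the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # [0.8, 1.4]
```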

In addition to being used to describe the study of linear sets of equations,
the term "linear algebra" is also used to describe a particular type of
algebra. In particular, a linear algebra over a field F has the structure of
a ring with all the usual axioms for an inner addition and an inner
multiplication, together with distributive laws, therefore giving it more
structure than a ring. A linear algebra also admits an outer operation of
multiplication by scalars (that are elements of the underlying field F).
For example, the set of all linear transformations from a vector space V to
itself over a field F forms a linear algebra over F. Another example of a
linear algebra is the set of all real square matrices over the field R of the
real numbers.

Linear algebra is the branch of mathematics concerning vector spaces
and linear mappings between such spaces. It includes the study of lines,
planes, and subspaces, but is also concerned with properties common to
all vector spaces.

The set of points with coordinates that satisfy a linear equation forms a
hyperplane in an n-dimensional space. The conditions under which a set
of n hyperplanes intersect in a single point is an important focus of study
in linear algebra. Such an investigation is initially motivated by a system
of linear equations containing several unknowns. Such equations are
naturally represented using the formalism of matrices and vectors.[1][2][3]

Linear algebra is central to both pure and applied mathematics. For
instance, abstract algebra arises by relaxing the axioms of a vector space,
leading to a number of generalizations. Functional analysis studies the

infinite-dimensional version of the theory of vector spaces. Combined
with calculus, linear algebra facilitates the solution of linear systems of
differential equations.

Techniques from linear algebra are also used in analytic geometry,
engineering, physics, natural sciences, computer science, computer
animation, and the social sciences (particularly in economics). Because
linear algebra is such a well-developed theory, nonlinear mathematical
models are sometimes approximated by linear models.

History

The study of linear algebra first emerged from the study of determinants,
which were used to solve systems of linear equations. Determinants
were used by Leibniz in 1693, and subsequently, Gabriel Cramer
devised Cramer's Rule for solving linear systems in 1750. Later, Gauss
further developed the theory of solving linear systems by using Gaussian
elimination, which was initially listed as an advancement in geodesy.[4]

The study of matrix algebra first emerged in England in the mid-1800s.
In 1844 Hermann Grassmann published his “Theory of Extension”
which included foundational new topics of what is today called linear
algebra. In 1848, James Joseph Sylvester introduced the term matrix,
which is Latin for "womb". While studying compositions of linear
transformations, Arthur Cayley was led to define matrix multiplication

and inverses. Crucially, Cayley used a single letter to denote a matrix,
thus treating a matrix as an aggregate object. He also realized the
connection between matrices and determinants, and wrote "There would
be many things to say about this theory of matrices which should, it
seems to me, precede the theory of determinants".[4]

In 1882, Hüseyin Tevfik Pasha wrote the book titled "Linear
Algebra".[5][6] The first modern and more precise definition of a vector
space was introduced by Peano in 1888;[4] by 1900, a theory of linear
transformations of finite-dimensional vector spaces had emerged. Linear
algebra took its modern form in the first half of the twentieth century,
when many ideas and methods of previous centuries were generalized as
abstract algebra. The use of matrices in quantum mechanics, special
relativity, and statistics helped spread the subject of linear algebra
beyond pure mathematics. The development of computers led to
increased research in efficient algorithms for Gaussian elimination and
matrix decompositions, and linear algebra became an essential tool for
modelling and simulations.

The origin of many of these ideas is discussed in the articles on
determinants and Gaussian elimination.

Educational history

Linear algebra first appeared in graduate textbooks in the 1940s and in
undergraduate textbooks in the 1950s. Following work by the School
Mathematics Study Group, U.S. high schools asked 12th grade students
to do "matrix algebra, formerly reserved for college" in the 1960s.[8] In
France during the 1960s, educators attempted to teach linear algebra
through affine dimensional vector spaces in the first year of secondary
school. This was met with a backlash in the 1980s that removed linear
algebra from the curriculum.[9] In 1993, the U.S.-based Linear Algebra
Curriculum Study Group recommended that undergraduate linear
algebra courses be given an application-based "matrix orientation" as
opposed to a theoretical orientation.[10]

Scope of study

Vector spaces

The main structures of linear algebra are vector spaces. A vector space
over a field F is a set V together with two binary operations. Elements of
V are called vectors and elements of F are called scalars. The first
operation, vector addition, takes any two vectors v and w and outputs a
third vector v + w. The second operation, scalar multiplication, takes any
scalar a and any vector v and outputs a new vector av. The operations of
addition and multiplication in a vector space must satisfy the following

axioms.[11] In the list below, let u, v and w be arbitrary vectors in V, and
a and b scalars in F.

Axiom: Signification

Associativity of addition: u + (v + w) = (u + v) + w

Commutativity of addition: u + v = v + u

Identity element of addition: There exists an element 0 ∈ V, called the
zero vector, such that v + 0 = v for all v ∈ V.

Inverse elements of addition: For every v ∈ V, there exists an element
−v ∈ V, called the additive inverse of v, such that v + (−v) = 0.

Distributivity of scalar multiplication with respect to vector addition:
a(u + v) = au + av

Distributivity of scalar multiplication with respect to field addition:
(a + b)v = av + bv

Compatibility of scalar multiplication with field multiplication:
a(bv) = (ab)v

Identity element of scalar multiplication: 1v = v, where 1 denotes the
multiplicative identity in F.

The first four axioms are those of V being an abelian group under vector
addition. Vector spaces may be diverse in nature, for example,
containing functions, polynomials or matrices. Linear algebra is
concerned with properties common to all vector spaces.
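The axioms above can be spot-checked numerically for R² modelled as tuples of floats; a Python sketch (checking finitely many instances illustrates, but of course does not prove, the axioms):

```python
# Vector addition and scalar multiplication on R^2, modelled as tuples.
def add(v, w):
    return tuple(vi + wi for vi, wi in zip(v, w))

def scale(a, v):
    return tuple(a * vi for vi in v)

u, v, w = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0)
a, b = 2.0, -3.0

print(add(u, add(v, w)) == add(add(u, v), w))                # associativity
print(add(u, v) == add(v, u))                                # commutativity
print(scale(a, add(u, v)) == add(scale(a, u), scale(a, v)))  # distributivity
print(scale(a, scale(b, v)) == scale(a * b, v))              # compatibility
```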

Linear transformations

Similarly as in the theory of other algebraic structures, linear algebra
studies mappings between vector spaces that preserve the vector-space
structure. Given two vector spaces V and W over a field F, a linear
transformation (also called linear map, linear mapping or linear
operator) is a map

T: V → W

that is compatible with addition and scalar multiplication:

T(u + v) = T(u) + T(v) and T(av) = aT(v)

for any vectors u, v ∈ V and a scalar a ∈ F.

Additionally, for any vectors u, v ∈ V and scalars a, b ∈ F:

T(au + bv) = aT(u) + bT(v).
When a bijective linear mapping exists between two vector spaces (that
is, every vector from the second space is associated with exactly one in
the first), we say that the two spaces are isomorphic. Because an
isomorphism preserves linear structure, two isomorphic vector spaces
are "essentially the same" from the linear algebra point of view. One
essential question in linear algebra is whether a mapping is an
isomorphism or not, and this question can be answered by checking if
the determinant is nonzero. If a mapping is not an isomorphism, linear
algebra is interested in finding its range (or image) and the set of
elements that get mapped to zero, called the kernel of the mapping.

Linear transformations have geometric significance. For example, 2 × 2
real matrices denote standard planar mappings that preserve the origin.
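A planar rotation is a standard example of such a mapping; a Python sketch that applies the rotation matrix and checks linearity numerically (the helper name `apply` is illustrative):

```python
import math

def apply(M, v):
    # Apply a 2x2 matrix (list of rows) to a vector (x, y).
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Rotation by angle t about the origin is linear, with matrix
# [[cos t, -sin t], [sin t, cos t]].
t = math.pi / 2
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

# Rotating (1, 0) by 90 degrees gives (0, 1), up to rounding.
x, y = apply(R, (1.0, 0.0))
print(abs(x) < 1e-12 and abs(y - 1.0) < 1e-12)  # True

# Linearity: T(u + v) = T(u) + T(v), checked numerically.
u, v = (1.0, 2.0), (3.0, 4.0)
lhs = apply(R, (u[0] + v[0], u[1] + v[1]))
rhs = tuple(a + b for a, b in zip(apply(R, u), apply(R, v)))
print(all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs)))  # True
```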

Subspaces, span, and basis

Again, in analogue with theories of other algebraic objects, linear
algebra is interested in subsets of vector spaces that are themselves
vector spaces; these subsets are called linear subspaces. For example,
both the range and kernel of a linear mapping are subspaces, and are
thus often called the range space and the nullspace; these are important
examples of subspaces. Another important way of forming a subspace is
to take a linear combination of a set of vectors v1, v2, ..., vk:

a1v1 + a2v2 + ... + akvk,

where a1, a2, ..., ak are scalars. The set of all linear combinations of
vectors v1, v2, ..., vk is called their span, which forms a subspace.

A linear combination of any system of vectors with all zero coefficients
is the zero vector of V. If this is the only way to express the zero vector
as a linear combination of v1, v2, ..., vk then these vectors are linearly
independent. Given a set of vectors that span a space, if any vector w is a
linear combination of other vectors (and so the set is not linearly
independent), then the span would remain the same if we remove w from
the set. Thus, a set of linearly dependent vectors is redundant in the
sense that there will be a linearly independent subset which will span the
same subspace. Therefore, we are mostly interested in a linearly
independent set of vectors that spans a vector space V, which we call a
basis of V. Any set of vectors that spans V contains a basis, and any
linearly independent set of vectors in V can be extended to a basis.[12] It
turns out that if we accept the axiom of choice, every vector space has a
basis;[13] nevertheless, this basis may be unnatural, and indeed, may not
even be constructible. For instance, there exists a basis for the real
numbers, considered as a vector space over the rationals, but no explicit
basis has been constructed.

Any two bases of a vector space V have the same cardinality, which is
called the dimension of V. The dimension of a vector space is well-
defined by the dimension theorem for vector spaces. If a basis of V has a
finite number of elements, V is called a finite-dimensional vector space.
If V is finite-dimensional and U is a subspace of V, then dim U ≤ dim V.
If U1 and U2 are subspaces of V, then

dim(U1 + U2) + dim(U1 ∩ U2) = dim U1 + dim U2.

One often restricts consideration to finite-dimensional vector spaces. A
fundamental theorem of linear algebra states that all vector spaces of the
same dimension are isomorphic,[15] giving an easy way of characterizing
isomorphism.

Matrix theory

A particular basis {v1, v2, ..., vn} of V allows one to construct a
coordinate system in V: the vector with coordinates (a1, a2, ..., an) is the
linear combination

a1v1 + a2v2 + ... + anvn.

The condition that v1, v2, ..., vn span V guarantees that each vector v can
be assigned coordinates, whereas the linear independence of v1, v2, ..., vn
assures that these coordinates are unique (i.e. there is only one linear
combination of the basis vectors that is equal to v). In this way, once a
basis of a vector space V over F has been chosen, V may be identified
with the coordinate n-space Fn. Under this identification, addition and
scalar multiplication of vectors in V correspond to addition and scalar
multiplication of their coordinate vectors in Fn. Furthermore, if V and W
are an n-dimensional and m-dimensional vector space over F, and a
basis of V and a basis of W have been fixed, then any linear
transformation T: V → W may be encoded by an m × n matrix A with
entries in the field F, called the matrix of T with respect to these bases.
Two matrices that encode the same linear transformation in different
bases are called similar. Matrix theory replaces the study of linear
transformations, which were defined axiomatically, by the study of
matrices, which are concrete objects. This major technique distinguishes
linear algebra from theories of other algebraic structures, which usually
cannot be parameterized so concretely.
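The encoding of a linear map by a matrix can be sketched in Python for V = W = R² with the standard basis (the map T here is a made-up example): the j-th column of the matrix is the coordinate vector of T(vj).

```python
# A hypothetical linear map T(x, y) = (2x + y, x - 3y) on R^2.
def T(v):
    x, y = v
    return (2 * x + y, x - 3 * y)

basis = [(1.0, 0.0), (0.0, 1.0)]       # the standard basis of R^2
columns = [T(e) for e in basis]        # images of the basis vectors
# Transpose the columns into rows to get the matrix of T.
A = [[columns[j][i] for j in range(2)] for i in range(2)]
print(A)  # [[2.0, 1.0], [1.0, -3.0]]

# Applying A to coordinates agrees with applying T directly.
v = (4.0, 5.0)
Av = tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))
print(Av == T(v))  # True
```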

There is an important distinction between the coordinate n-space Rn and
a general finite-dimensional vector space V. While Rn has a standard
basis {e1, e2, ..., en}, a vector space V typically does not come equipped
with such a basis and many different bases exist (although they all
consist of the same number of elements equal to the dimension of V).

One major application of the matrix theory is calculation of
determinants, a central concept in linear algebra. While determinants
could be defined in a basis-free manner, they are usually introduced via
a specific representation of the mapping; the value of the determinant
does not depend on the specific basis. It turns out that a mapping has an
inverse if and only if the determinant has an inverse (every non-zero real
or complex number has an inverse[16]). If the determinant is zero, then
the nullspace is nontrivial. Determinants have other applications,
including a systematic way of seeing if a set of vectors is linearly
independent (we write the vectors as the columns of a matrix, and if the
determinant of that matrix is zero, the vectors are linearly dependent).
Determinants could also be used to solve systems of linear equations
(see Cramer's rule), but in real applications, Gaussian elimination is a
faster method.
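The linear-independence test just described can be sketched in Python with a determinant computed by cofactor expansion (fine for small matrices; not an efficient method, and the function name is illustrative):

```python
def det(M):
    # Determinant by cofactor expansion along the first row.
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

# Columns linearly dependent <=> determinant is zero.
dependent = [[1.0, 2.0], [2.0, 4.0]]      # second column = 2 * first column
independent = [[1.0, 2.0], [3.0, 4.0]]
print(det(dependent))    # 0.0
print(det(independent))  # -2.0
```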

Eigenvalues and eigenvectors

In general, the action of a linear transformation may be quite complex.
Attention to low-dimensional examples gives an indication of the variety
of their types. One strategy for a general n-dimensional transformation T
is to find "characteristic lines" that are invariant sets under T. If v is a
non-zero vector such that Tv is a scalar multiple of v, then the line
through 0 and v is an invariant set under T and v is called a characteristic
vector or eigenvector. The scalar λ such that Tv = λv is called a
characteristic value or eigenvalue of T.

To find an eigenvector or an eigenvalue, we note that Tv = λv is
equivalent to

(T − λI)v = 0,

where I is the identity matrix. For there to be nontrivial solutions to
that equation, det(T − λI) = 0. The determinant is a polynomial, and so
the eigenvalues are not guaranteed to exist if the field is R. Thus, we
often work with an algebraically closed field such as the complex
numbers when dealing with eigenvectors and eigenvalues, so that an
eigenvalue will always exist. It would be particularly nice if, given a
transformation T taking a vector space V into itself, we could find a
basis for V consisting of eigenvectors. If such a basis exists, we can
easily compute the action of the transformation on any vector: if v1,
v2, ..., vn are linearly independent eigenvectors of a mapping of
n-dimensional spaces T with (not necessarily distinct) eigenvalues λ1,
λ2, ..., λn, and if v = a1v1 + ... + anvn, then

Tv = a1λ1v1 + ... + anλnvn.

Such a transformation is called diagonalizable, since in the eigenbasis
the transformation is represented by a diagonal matrix. Because
operations like matrix multiplication, matrix inversion, and determinant
calculation are simple on diagonal matrices, computations involving
matrices are much simpler if we can bring the matrix to a diagonal form.
Not all matrices are diagonalizable (even over an algebraically closed
field).
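The diagonalization described above can be sketched with numpy; the symmetric matrix A below is an arbitrary choice (real symmetric matrices are always diagonalizable).

```python
import numpy as np

# A hypothetical symmetric matrix (always diagonalizable).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w, V = np.linalg.eig(A)   # eigenvalues w, eigenvectors as columns of V

# Each pair satisfies A v = lambda v.
for lam, vec in zip(w, V.T):
    assert np.allclose(A @ vec, lam * vec)

# In the eigenbasis, the transformation is the diagonal matrix diag(w).
D = np.linalg.inv(V) @ A @ V
print(np.allclose(D, np.diag(w)))   # True
```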

Inner-product spaces

Besides these basic concepts, linear algebra also studies vector spaces
with additional structure, such as an inner product. The inner product
is an example of a bilinear form, and it gives the vector space a
geometric structure by allowing for the definition of length and angles.
Formally, an inner product is a map

⟨·,·⟩ : V × V → F

that satisfies the following three axioms for all vectors u, v, w in V
and all scalars a in F:[17][18]

• Conjugate symmetry:

⟨u, v⟩ = ⟨v, u⟩* (the complex conjugate). Note that in R, it is
symmetric.

• Linearity in the first argument:

⟨au + v, w⟩ = a⟨u, w⟩ + ⟨v, w⟩.

• Positive-definiteness:

⟨v, v⟩ ≥ 0, with equality only for v = 0.

We can define the length of a vector v in V by ||v|| = √⟨v, v⟩, and we
can prove the Cauchy–Schwarz inequality:

|⟨u, v⟩| ≤ ||u|| ||v||.

In particular, the quantity ⟨u, v⟩ / (||u|| ||v||) lies between −1 and
1, and so we can call this quantity the cosine of the angle between the
two vectors.

Two vectors are orthogonal if ⟨u, v⟩ = 0. An orthonormal basis is a
basis where all basis vectors have length 1 and are orthogonal to each
other. Given any finite-dimensional vector space, an orthonormal basis
can be found by the Gram–Schmidt procedure. Orthonormal bases are
particularly nice to deal with, since if v = a1v1 + ... + anvn, then
ai = ⟨v, vi⟩.
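The Gram–Schmidt procedure itself is short enough to sketch directly; the starting vectors below are arbitrary choices, and the classical variant shown here is the textbook form (modified Gram–Schmidt is numerically more stable).

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize a linearly independent
    list of vectors (a sketch of the procedure named above)."""
    basis = []
    for v in vectors:
        # Subtract the components along the vectors already accepted.
        for b in basis:
            v = v - np.dot(v, b) * b
        basis.append(v / np.linalg.norm(v))
    return np.array(basis)

# Hypothetical starting vectors in R^3.
Q = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])

print(np.allclose(Q @ Q.T, np.eye(3)))   # True: rows are orthonormal
```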

The inner product facilitates the construction of many useful concepts.
For instance, given a transform T, we can define its Hermitian conjugate
T* as the linear transform satisfying

⟨Tu, v⟩ = ⟨u, T*v⟩.

If T satisfies TT* = T*T, we call T normal. It turns out that normal
matrices are precisely the matrices that have an orthonormal system of
eigenvectors that span V.

Some main useful theorems

• A matrix is invertible, or non-singular, if and only if the linear
map represented by the matrix is an isomorphism.
• Any vector space over a field F of dimension n is isomorphic to Fn
as a vector space over F.
• Corollary: Any two vector spaces over F of the same finite
dimension are isomorphic to each other.
• A linear map is an isomorphism if and only if the determinant is
nonzero.

Applications

Because of the ubiquity of vector spaces, linear algebra is used in many
fields of mathematics, natural sciences, computer science, and social
science. Below are just some examples of applications of linear algebra.

Solution of linear systems

Linear algebra provides the formal setting for the linear combination of
equations used in the Gaussian method. Suppose the goal is to find and
describe the solution(s), if any, of the following system of linear
equations:

L1: 2x + y − z = 8
L2: −3x − y + 2z = −11
L3: −2x + y + 2z = −3

The Gaussian-elimination algorithm is as follows: eliminate x from all
equations below L1, and then eliminate y from all equations below L2.
This will put the system into triangular form. Then, using
back-substitution, each unknown can be solved for.

In the example, x is eliminated from L2 by adding (3/2)L1 to L2. x is
then eliminated from L3 by adding L1 to L3. The result is:

L1: 2x + y − z = 8
L2: (1/2)y + (1/2)z = 1
L3: 2y + z = 5

Now y is eliminated from L3 by adding −4L2 to L3. The result is:

L1: 2x + y − z = 8
L2: (1/2)y + (1/2)z = 1
L3: −z = 1

This result is a system of linear equations in triangular form, and so
the first part of the algorithm is complete.

The last part, back-substitution, consists of solving for the unknowns
in reverse order. It can thus be seen that z = −1. Then, z can be
substituted into L2, which can then be solved to obtain y = 3. Next, z
and y can be substituted into L1, which can be solved to obtain x = 2.
The system is solved.
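The two phases above can be sketched in a few lines of numpy. The specific coefficients below are an assumption, chosen to be consistent with the row operations quoted in the text ((3/2)L1, L1, −4L2); the routine itself uses no pivoting, exactly like the hand calculation.

```python
import numpy as np

def gaussian_solve(A, b):
    """Gaussian elimination with back-substitution (no pivoting),
    following the two-phase algorithm described above."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: reduce the system to triangular form.
    for i in range(n):
        for j in range(i + 1, n):
            m = A[j, i] / A[i, i]
            A[j, i:] -= m * A[i, i:]
            b[j] -= m * b[i]
    # Back-substitution: solve for the unknowns in reverse order.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# A system consistent with the quoted row operations (an assumption).
A = np.array([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]])
b = np.array([8, -11, -3])
print(gaussian_solve(A, b).tolist())   # [2.0, 3.0, -1.0]
```

In practice, library routines such as numpy's solver add partial pivoting for numerical stability; the sketch above mirrors the textbook procedure instead.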

We can, in general, write any system of linear equations as a matrix
equation:

Ax = b

The solution of this system is characterized as follows: first, we find
a particular solution x0 of this equation using Gaussian elimination.
Then, we compute the solutions of Ax = 0; that is, we find the null
space N of A. The solution set of this equation is given by
{x0 + n : n in N}. If the number of variables is equal to the number of
equations, then we can characterize when the system has a unique
solution: since N is trivial if and only if det A ≠ 0, the equation has
a unique solution if and only if det A ≠ 0.[19]

Least-squares best fit line

The least squares method is used to determine the best fit line for a
set of data.[20] This line will minimize the sum of the squares of the
residuals.
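A minimal numpy sketch of a least-squares fit; the data points below are a hypothetical example, chosen to lie exactly on a known line so that the recovered slope and intercept are easy to check.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix [x, 1]; lstsq minimizes the sum of squared residuals
# for the model y = m*x + c.
A = np.vstack([x, np.ones_like(x)]).T
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(m, 6), round(c, 6))   # 2.0 1.0
```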

Fourier series expansion

Fourier series are a representation of a function f: [−π, π] → R as a
trigonometric series:

f(x) = Σ (n ≥ 0) an cos(nx) + Σ (n > 0) bn sin(nx)

This series expansion is extremely useful in solving partial
differential equations. In this article, we will not be concerned with
convergence issues; it is nice to note that all Lipschitz-continuous
functions have a converging Fourier series expansion, and nice enough
discontinuous functions have a Fourier series that converges to the
function value at most points.

The space of all functions that can be represented by a Fourier series
forms a vector space (technically speaking, we call functions that have
the same Fourier series expansion the "same" function, since two
different discontinuous functions might have the same Fourier series).
Moreover, this space is also an inner product space with the inner
product

⟨f, g⟩ = (1/π) ∫ from −π to π of f(x)g(x) dx.

The functions gn(x) = sin(nx) for n > 0 and hn(x) = cos(nx) for n ≥ 0
are an orthonormal basis for the space of Fourier-expandable functions.
We can thus use the tools of linear algebra to find the expansion of any
function in this space in terms of these basis functions. For instance,
to find the coefficient ak, we take the inner product with hk:

⟨f, hk⟩ = Σn an⟨hn, hk⟩ + Σn bn⟨gn, hk⟩

and by orthonormality, ⟨hn, hk⟩ equals 1 when n = k and 0 otherwise;
that is, ak = ⟨f, hk⟩.
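This coefficient extraction can be checked numerically. A sketch assuming the inner product (1/π)∫fg over [−π, π]; the test function f and the grid size N are arbitrary choices, and the midpoint rule is used because it is essentially exact for trigonometric polynomials over a full period.

```python
import numpy as np

# Midpoint grid on [-pi, pi]; N is an arbitrary choice.
N = 4096
dx = 2 * np.pi / N
x = -np.pi + (np.arange(N) + 0.5) * dx

# Hypothetical test function with known coefficients a2 = 3, b5 = 0.5.
f = 3 * np.cos(2 * x) + 0.5 * np.sin(5 * x)

def inner(g, h):
    # (1/pi) * integral of g*h over [-pi, pi], midpoint rule.
    return np.sum(g * h) * dx / np.pi

# Taking the inner product with a basis function recovers its coefficient.
a2 = inner(f, np.cos(2 * x))
b5 = inner(f, np.sin(5 * x))
print(round(a2, 6), round(b5, 6))   # 3.0 0.5
```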

Quantum mechanics

Quantum mechanics is highly inspired by notions in linear algebra. In
quantum mechanics, the physical state of a particle is represented by a
vector, and observables (such as momentum, energy, and angular momentum)
are represented by linear operators on the underlying vector space. More
concretely, the wave function of a particle describes its physical state
and lies in the vector space L2 (the functions φ: R3 → C such that the
integral of |φ|2 over R3 is finite), and it evolves according to the
Schrödinger equation. Energy is represented as the operator

H = −(ħ2/2m)∇2 + V,

where V is the potential energy. H is also known as the Hamiltonian
operator. The eigenvalues of H represent the possible energies that can
be observed. Given a particle in some state φ, we can expand φ into a
linear combination of eigenstates of H. The component of φ in each
eigenstate determines the probability of measuring the corresponding
eigenvalue, and the measurement forces the particle to assume that
eigenstate (wave function collapse).

Many of the principles and techniques of linear algebra can be seen in
the geometry of lines in a real two-dimensional plane E. When formulated
using vectors and matrices, the geometry of points and lines in the
plane can be extended to the geometry of points and hyperplanes in
high-dimensional spaces.

Point coordinates in the plane E are ordered pairs of real numbers,
(x, y), and a line is defined as the set of points (x, y) that satisfy
the linear equation[21]

λ: ax + by + c = 0,

where a, b and c are not all zero. Then,

λ: (a, b, c)(x, y, 1)T = 0,

or

λ: Ax = 0,

where x = (x, y, 1) is the 3 × 1 set of homogeneous coordinates
associated with the point (x, y).[22]

Homogeneous coordinates identify the plane E with the z = 1 plane in
three-dimensional space. The x−y coordinates in E are obtained from
homogeneous coordinates y = (y1, y2, y3) by dividing by the third
component (if it is nonzero) to obtain y = (y1/y3, y2/y3, 1).

The linear equation, λ, has the important property that if x1 and x2 are
homogeneous coordinates of points on the line, then the point αx1 + βx2
is also on the line, for any real α and β.

Now consider the equations of the two lines λ1 and λ2,

λ1: a1x + b1y + c1 = 0,
λ2: a2x + b2y + c2 = 0,

which form a system of linear equations. The intersection of these two
lines is defined by x = (x, y, 1) that satisfy the matrix equation,

Bx = 0,

or using homogeneous coordinates,

(a1 b1 c1; a2 b2 c2)(x, y, 1)T = 0.

The point of intersection of these two lines is the unique non-zero
solution of these equations. In homogeneous coordinates, the solutions
are multiples of the following solution:[22]

x1 = b1c2 − b2c1, x2 = a2c1 − a1c2, x3 = a1b2 − a2b1,

if the rows of B are linearly independent (i.e., λ1 and λ2 represent
distinct lines). Divide through by x3 to get Cramer's rule for the
solution of a set of two linear equations in two unknowns.[23] Notice
that this yields a point in the z = 1 plane only when the 2 × 2
submatrix associated with x3 has a non-zero determinant.
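In three-dimensional terms, the common non-zero solution of the two line equations is the cross product of the two coefficient vectors. A numpy sketch; the two lines below are hypothetical choices.

```python
import numpy as np

# Two hypothetical lines a*x + b*y + c = 0, as homogeneous 3-vectors.
l1 = np.array([1.0, -1.0, 0.0])   # x - y = 0
l2 = np.array([1.0, 1.0, -2.0])   # x + y - 2 = 0

# The intersection point is (a multiple of) the cross product of the
# two line vectors; scale so the third component is 1 (the z = 1 plane).
p = np.cross(l1, l2)
p = p / p[2]
print(p.tolist())   # [1.0, 1.0, 1.0], i.e. the point (1, 1)
```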

It is interesting to consider the case of three lines, λ1, λ2 and λ3,
which yield the matrix equation,

Cx = 0,

which in homogeneous form yields,

(a1 b1 c1; a2 b2 c2; a3 b3 c3)(x, y, 1)T = 0.

Clearly, this equation has the solution x = (0, 0, 0), which is not a
point on the z = 1 plane E. For a solution to exist in the plane E, the
coefficient matrix C must have rank 2, which means its determinant must
be zero. Another way to say this is that the columns of the matrix must
be linearly dependent.

Introduction to linear transformations

Another way to approach linear algebra is to consider linear functions
on the two-dimensional real plane E = R2. Here R denotes the set of real
numbers. Let x = (x, y) be an arbitrary vector in E and consider the
linear function λ: E → R, given by

λ: (x, y) → ax + by,

or, in matrix form,

λ: x → Ax = c, where A = (a, b) is a 1 × 2 matrix.

This transformation has the important property that if Ay = d, then

A(x + y) = Ax + Ay = c + d.

This shows that the sum of vectors in E maps to the sum of their images
in R. This is the defining characteristic of a linear map, or linear
transformation.[21] For this case, where the image space is a real
number, the map is called a linear functional.[23]

Consider the linear functional a little more carefully. Let i = (1, 0)
and j = (0, 1) be the natural basis vectors on E, so that x = xi + yj.
It is now possible to see that

Ax = A(xi + yj) = xAi + yAj = xa + yb.

Thus, the columns of the matrix A are the image of the basis vectors of
E in R.

This is true for any pair of vectors used to define coordinates in E.

Suppose we select a non-orthogonal, non-unit vector basis v and w to
define coordinates of vectors in E. This means a vector x has
coordinates (α, β), such that x = αv + βw. Then, we have the linear
functional

λ: x → Ax = αAv + βAw = αd + βe,

where Av = d and Aw = e are the images of the basis vectors v and w.
This is written in matrix form as

λ: (α, β) → (d, e)(α, β)T.

Coordinates relative to a basis

This leads to the question of how to determine the coordinates of a
vector x relative to a general basis v and w in E. Assume that we know
the coordinates of the vectors x, v and w in the natural basis i = (1,
0) and j = (0, 1). Our goal is to find the real numbers α, β, so that
x = αv + βw.

To solve this equation for α, β, we compute the linear coordinate
functionals σ and τ for the basis v, w, which are given by[22]

σ: x → (w2x1 − w1x2)/(v1w2 − v2w1),
τ: x → (v1x2 − v2x1)/(v1w2 − v2w1).

The functionals σ and τ compute the components of x along the basis
vectors v and w, respectively; that is,

α = σ(x), β = τ(x).

These coordinate functionals have the properties

σ(v) = 1, σ(w) = 0, τ(v) = 0, τ(w) = 1.

These equations can be assembled into the single matrix equation

(σ; τ)(v w) = I,

where (σ; τ) is the 2 × 2 matrix with rows σ and τ, and (v w) is the
2 × 2 matrix with columns v and w. Thus, the matrix formed by the
coordinate linear functionals is the inverse of the matrix formed by the
basis vectors.[21][23]
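That last observation translates directly into code: the coordinates of x in the basis v, w are read off by applying the inverse of the basis matrix. The basis and the vector below are hypothetical choices.

```python
import numpy as np

# Hypothetical non-orthogonal, non-unit basis v, w of E = R^2.
v = np.array([2.0, 1.0])
w = np.array([1.0, 3.0])
x = np.array([4.0, 7.0])

# The rows of B^-1 are exactly the coordinate functionals sigma, tau.
B = np.column_stack([v, w])          # basis vectors as columns
alpha, beta = np.linalg.inv(B) @ x

print(round(alpha, 6), round(beta, 6))       # 1.0 2.0
print(np.allclose(alpha * v + beta * w, x))  # True: x = alpha*v + beta*w
```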

Inverse image

The set of points in the plane E that map to the same image in R under
the linear functional λ defines a line in E. This line is the image of
the inverse map, λ−1: R → E. This inverse image is the set of the points
x = (x, y) that solve the equation,

ax + by = c.

Notice that a linear functional operates on known values for x = (x, y)
to compute a value c in R, while the inverse image seeks the values for
x = (x, y) that yield a specific value c.

In order to solve the equation, we first recognize that only one of the
two unknowns (x, y) can be determined, so we select y to be determined,
and rearrange the equation as

by = c − ax.

Solve for y and obtain the inverse image as the set of points

(x, y) = (0, c/b) + t(1, −a/b) = p + t h.

For convenience the free parameter x has been relabeled t.

The vector p defines the intersection of the line with the y-axis, known
as the y-intercept. The vector h satisfies the homogeneous equation,

ah1 + bh2 = 0.

Notice that if h is a solution to this homogeneous equation, then t h is
also a solution.

The set of points of a linear functional that map to zero define the
kernel of the linear functional. The line can be considered to be the
set of points h in the kernel translated by the vector p.[21][23]

Sources of subspaces: kernels and ranges of linear transformations

Let T be a linear transformation from a vector space V to a vector space
W. Then the kernel of T is the set of all vectors A in V such that
T(A)=0, that is

ker(T)={A in V | T(A)=0}

The range of T is the set of all vectors in W which are images of some
vectors in V, that is

range(T)={A in W | there exists B in V such that T(B)=A}.

Notice that the kernel of a transformation from V to W is a subset of V
and the range is a subset of W. For example, if T is a transformation
from the space of functions to the space of real numbers then the kernel
must consist of functions and the range must consist of numbers.

Examples.

1. Let P be the projection of R2 onto a line L in R2. Then the kernel of
P is the set of all vectors in R2 which are perpendicular to L and the
range of P is the set of all vectors parallel to L.

Indeed, the vectors which are perpendicular to L and only these vectors
are annihilated by the projection. This proves the statement about the
kernel. The projection on L of every vector is parallel to L (by the
definition of projection) and conversely, every vector which is parallel to
L is the projection of some vector from R2, for example, it is the
projection of itself. This proves the statement about the range.
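Example 1 can be verified numerically: for the projection matrix onto the span of a vector u, perpendicular vectors land in the kernel and every image is parallel to u. The vectors below are arbitrary choices.

```python
import numpy as np

# Projection of R^2 onto the line L spanned by u (u is an assumption).
u = np.array([1.0, 2.0])
P = np.outer(u, u) / np.dot(u, u)   # projection matrix onto span{u}

# Kernel: vectors perpendicular to L are sent to 0.
perp = np.array([2.0, -1.0])        # perpendicular to u
print(np.allclose(P @ perp, 0))     # True

# Range: vectors on L are fixed, and every image is parallel to L.
print(np.allclose(P @ u, u))        # True
x = np.array([3.0, 5.0])
print(np.isclose(np.linalg.det(np.column_stack([P @ x, u])), 0))  # True
```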

2. Let R be the rotation of R2 through angle Pi/3 counterclockwise.
Then the kernel of R is 0 (no non-zero vectors are annihilated by the
rotation). The range of the rotation is the whole R2. Indeed, for every
vector W in R2 let V be the vector obtained by rotating W through angle
Pi/3 clockwise. Then W is the result of rotating V through Pi/3
counterclockwise, thus W is in the range of our transformation. Since W
was an arbitrary vector in R2, the range of the rotation is the whole
R2.

3. Let R be the reflection of R2 about a line L. Then the kernel of R is
0. Indeed, no non-zero vectors are annihilated by the reflection. The
range is the whole R2 (prove it!).

4. Let Z be the zero transformation from V to W which takes every vector
in V to the zero of W. Then the kernel of Z is V and the range is {0}
(prove it!).

5. Let T be the linear operator on the vector space P of polynomials
which takes every polynomial to its derivative. Then the range of T is
the whole P (every polynomial is a derivative of another polynomial) and
the kernel of T is the set of all constants (prove it!).

6. Let T be the linear operator on the space P of polynomials which
takes every polynomial g(x) to the polynomial ∫g(t) dt for t = 0..x (the
integral of g from 0 to x). Then the range of T is the set of all
polynomials h(x) such that h(0)=0 (every such polynomial is the image
under T of its derivative) and the kernel of T is 0 (the only polynomial
whose integral from 0 to x is identically 0 is 0).

7. Let T be the linear transformation from the space M of all n by n
matrices to R which takes every matrix to its trace. Then the range of T
is the whole R (every number is the trace of some matrix) and the kernel
consists of all n by n matrices with zero trace.

6.2.BASIS AND DIMENSION

BASIS

In our previous discussion, we introduced the concepts of span and
linear independence. In a way, a set of vectors S = {v1, ... , vk} spans
a vector space V if there are enough of the right vectors in S, while
they are linearly independent if there are no redundancies. We now
combine the two concepts.

Definition of Basis

Let V be a vector space and S = {v1, v2, ... , vk} be a subset of V. Then
S is a basis for V if the following two statements are true.

1. S spans V.

2. S is a linearly independent set of vectors in V.

Example

Let V = Rn and let S = {e1, e2, ... ,en} where ei has ith component equal
to 1 and the rest 0. For example

e2 = (0,1,0,0,...,0)

Then S is a basis for V called the standard basis.

Example

Let V = P3 and let S = {1, t, t2, t3}. Show that S is a basis for V.

Solution

We must show both linear independence and span.

Linear Independence:

Let

c1(1) + c2(t) + c3(t2) + c4(t3) = 0

Then since a polynomial is zero if and only if its coefficients are all zero
we have

c1 = c2 = c3 = c4 = 0

Hence S is a linearly independent set of vectors in V.

Span

A general vector in P3 is given by

a + bt + ct2 + dt3

We need to find constants c1, c2, c3, c4 such that

c1(1) + c2(t) + c3(t2) + c4(t3) = a + bt + ct2 + dt3

We just let

c1 = a, c2 = b, c3 = c, c4 = d

Hence S spans V.

We can conclude that S is a basis for V.

In general the basis {1, t, t2, ... , tn} is called the standard basis for Pn.

Example

Show that S = {v1, v2, v3, v4} where

is a basis for V = M2x2.

Solution

We need to show that S spans V and is linearly independent.

Linear Independence

We write

this gives the four equations

c1 + c2 + c3 + 2c4 = 0
2c1 + c4 = 0
2c2 = 0
2c3 = 0

Which has the corresponding homogeneous matrix equation

Ac = 0

with

We have

det A = -12

Since the determinant is nonzero, we can conclude that only the trivial
solution exists. That is

c1 = c2 = c3 = c4 = 0

Span

We set
which gives the equations

c1 + c2 + c3 + 2c4 = x1
2c1 + c4 = x2
2c2 = x3
2c3 = x4

The corresponding matrix equation is

Ac = x

Since A is nonsingular, this has a unique solution, namely

c = A-1x

Hence S spans V.

We conclude that S is a basis for V.

If S spans V then we know that every vector in V can be written as a
linear combination of vectors in S. If S is a basis, even more is true.

Theorem

Let S = {v1, v2, ... , vk} be a basis for V. Then every vector in V can be
written uniquely as a linear combination of vectors in S.

Remark: What is new here is the word uniquely.

Proof

Suppose that

v = a1v1 + ... + anvn = b1v1 + ... + bnvn

then

(a1 - b1)v1 + ... + (an - bn)vn = 0

Since S is a basis for V, it is linearly independent and the above
equation implies that all the coefficients are zero. That is

a1 - b1 = ... = an - bn = 0

We can conclude that

a1 = b1, ... , an = bn

Since linear independence is all about having no redundant vectors, it
should be no surprise that if S = {v1, v2, ... , vk} spans V but is not
linearly independent, then we can remove the redundant vectors until we
arrive at a basis. That is, if S is not linearly independent, then one
of the

vectors is a linear combination of the rest. Without loss of generality,
we can assume that this is the vector vk. We have that

vk = a1v1 + ... + ak-1vk-1

If v is any vector in S we can write

v = c1v1 + ... + ckvk = c1v1 + ... + ck-1vk-1 + ck(a1v1 + ... + ak-1vk-1)

which is a linear combination of the smaller set

S' = {v1, v2, ... , vk-1}

If S' is not a basis, as above we can get rid of another vector. We can
continue this process until the vectors are finally linear independent.

We have proved the following theorem.

Theorem

Let S span a vector space V, then there is a subset of S that is a basis for
V.

Dimension

We have seen that any vector space that contains at least two vectors
contains infinitely many. It is uninteresting to ask how many vectors

there are in a vector space. However, there is still a way to measure
the size of a vector space. For example, R3 should be larger than R2.
We call this size the dimension of the vector space and define it as the
number of vectors that are needed to form a basis. To show that the
dimension is well defined, we need the following theorem.

Theorem

If S = {v1, v2, ... , vn} is a basis for a vector space V and T = {w1,
w2, ... , wk} is a linearly independent set of vectors in V, then k ≤ n.

Remark: If S and T are both bases for V, then k = n. This says that
every basis has the same number of vectors. Hence the dimension is well
defined.

The dimension of a vector space V is the number of vectors in a basis.
If there is no finite basis, we call V an infinite-dimensional vector
space. Otherwise, we call V a finite-dimensional vector space.

Proof

If k > n, then we consider the set

R1 = {w1,v1, v2, ... , vn}

Since S spans V, w1 can be written as a linear combination of the vi's.


w1 = c1v1 + ... + cnvn

Since T is linearly independent, w1 is nonzero and at least one of the
coefficients ci is nonzero. Without loss of generality assume it is c1.
We can solve for v1 and write v1 as a linear combination of w1, v2, ...
vn. Hence

T1 = {w1, v2, ... , vn}

is a basis for V. Now let

R2 = {w1, w2, v2, ... , vn}

Similarly, w2 can be written as a linear combination of the rest and one
of the coefficients is nonzero. Note that since w1 and w2 are linearly
independent, at least one of the vi coefficients must be nonzero. We can
assume that this nonzero coefficient is that of v2, and as before we see
that

T2 = {w1, w2,v3, ... , vn}

is a basis for V. Continuing this process we see that

Tn = {w1, w2, ... , wn}

is a basis for V. But then Tn spans V, and hence wn+1 is a linear
combination of vectors in Tn. This is a contradiction since the w's are
linearly independent. Hence k ≤ n.

Example

Since

E = {e1, e2, ... , en}

is a basis for Rn then dim Rn = n.

Example

dim Pn = n + 1

since

E = {1, t, t2, ... , tn}

is a basis for Pn.

Example

dim Mmxn = mn

We will leave it to you to find a basis containing mn vectors.

If we have a set of linearly independent vectors

S = {v1, v2, ... , vk}

with k < n, then S is not a basis. From the definition of basis, S does not
span V, hence there is a vk+1 such that vk+1 is not in the span of S. Let

S1 = {v1, v2, ... ,vk ,vk+1}

S1 is linearly independent. We can continue this until we get a basis.

Theorem

Let

S = {v1, v2, ... , vk}

be a linearly independent set of vectors in a vector space V. Then there
are vectors

vk+1, ... , vn

such that

{v1, v2, ... , vk, vk+1, ... , vn}

is a basis for V.

We finish this discussion with some very good news. We have seen that
to find out if a set is a basis for a vector space, we need to check for both
linear independence and span. We know that if there are not the right

number of vectors in a set, then the set cannot form a basis. If the
number of vectors is right, we have the following theorem.

Theorem

Let V be an n-dimensional vector space and let S be a set with n
vectors. Then the following are equivalent.

1. S is a basis for V.

2. S is linearly independent.

3. S spans V.

6.3.LINEAR DEPENDENCE

Linearly Dependent Vectors

The vectors v1, v2, ..., vn are linearly dependent iff there exist
scalars c1, c2, ..., cn, not all zero, such that

c1v1 + c2v2 + ... + cnvn = 0.   (1)

If no such scalars exist, then the vectors are said to be linearly
independent. In order to satisfy the criterion for linear dependence,
write the vectors as the columns of a matrix A and the scalars as a
column vector c, so that (1) becomes the matrix equation

Ac = 0.   (2)

In order for this matrix equation to have a nontrivial solution, the
determinant must be 0, so the vectors are linearly dependent if

det A = 0   (3)

and linearly independent otherwise.

Let x and y be n-dimensional vectors. Then the following three
conditions are equivalent (Gray 1997).

1. x and y are linearly dependent.

2. (x · x)(y · y) − (x · y)2 = 0.

3. The 2 × n matrix with rows x and y has rank less than two.

Testing for Linear Dependence of Vectors

There are many situations when we might wish to know whether a set of
vectors is linearly dependent, that is if one of the vectors is some
combination of the others.

Two vectors u and v are linearly independent if the only numbers x and
y satisfying xu+yv=0 are x=y=0. If we let

then xu+yv=0 is equivalent to

If u and v are linearly independent, then the only solution to this system
of equations is the trivial solution, x=y=0. For homogeneous systems
this happens precisely when the determinant is non-zero. We have now
found a test for determining whether a given set of vectors is linearly
independent: A set of n vectors of length n is linearly independent if the
matrix with these vectors as columns has a non-zero determinant. The
set is of course dependent if the determinant is zero.

Example

The vectors <1,2> and <-5,3> are linearly independent since the matrix
with these vectors as columns has determinant 1·3 − (−5)·2 = 13, which
is non-zero.

Example

The vectors u=<2,-1,1>, v=<3,-4,-2>, and w=<5,-10,-8> are dependent
since the determinant of the matrix with these vectors as columns is
zero. To find the relation between u, v, and w we look for constants x,
y, and z such that

xu + yv + zw = 0.

This is a homogeneous system of equations. Using Gaussian elimination,
we row-reduce the matrix of the system.
Thus, y = -3z and 2x = -3y-5z = -3(-3z)-5z = 4z, which implies
0 = xu+yv+zw = 2zu-3zv+zw, or equivalently w = -2u+3v. A quick
arithmetic check verifies that the vector w is indeed equal to -2u+3v.
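Both the determinant test and the relation found above are easy to confirm with numpy:

```python
import numpy as np

# The three vectors from the example above.
u = np.array([2.0, -1.0, 1.0])
v = np.array([3.0, -4.0, -2.0])
w = np.array([5.0, -10.0, -8.0])

# Determinant test: the columns are dependent iff the determinant is 0.
M = np.column_stack([u, v, w])
print(np.isclose(np.linalg.det(M), 0))  # True

# Verify the relation found by elimination: w = -2u + 3v.
print(np.allclose(-2 * u + 3 * v, w))   # True
```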

CHAPTER 7: MATRICES

Matrices

A Matrix is an array of numbers:

A Matrix
(This one has 2 Rows and 3 Columns)

We talk about one matrix, or several matrices.

There are many things we can do with them ...

Adding

To add two matrices: add the numbers in the matching positions:

These are the calculations:

3+4=7 8+0=8

4+1=5 6-9=-3

The two matrices must be the same size, i.e. the rows must match in
size, and the columns must match in size.

Example: a matrix with 3 rows and 5 columns can be added to another
matrix of 3 rows and 5 columns.

But it could not be added to a matrix with 3 rows and 4 columns (the
columns don't match in size)

Negative

The negative of a matrix is also simple:

These are the calculations:

-(2)=-2 -(-4)=+4

-(7)=-7 -(10)=-10

Subtracting

To subtract two matrices: subtract the numbers in the matching
positions:

These are the calculations:

3-4=-1 8-0=8

4-1=3 6-(-9)=15

Note: subtracting is actually defined as the addition of a negative
matrix: A + (-B)

Multiply by a Constant

We can multiply a matrix by some value:

These are the calculations:

2×4=8 2×0=0

2×1=2 2×-9=-18

We call the constant a scalar, so officially this is called "scalar
multiplication".
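These element-wise rules are easy to check with numpy. The matrices below are reconstructed from the worked calculations shown above (an assumption, since the original figures were lost).

```python
import numpy as np

# Matrices reconstructed from the worked calculations above.
A = np.array([[3, 8],
              [4, 6]])
B = np.array([[4, 0],
              [1, -9]])

print((A + B).tolist())   # [[7, 8], [5, -3]]   addition
print((A - B).tolist())   # [[-1, 8], [3, 15]]  subtraction
print((-B).tolist())      # [[-4, 0], [-1, 9]]  negative
print((2 * B).tolist())   # [[8, 0], [2, -18]]  scalar multiplication
```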

Multiplying by Another Matrix

To multiply two matrices together is a bit more difficult ... read
Multiplying Matrices to learn how.

Dividing

And what about division? Well we don't actually divide matrices, we do
it this way:

A/B = A × (1/B) = A × B-1

where B-1 means the "inverse" of B.

So we don't divide, instead we multiply by an inverse.

And there are special ways to find the Inverse ...

... learn more about the Inverse of a Matrix.

Transposing

To "transpose" a matrix, swap the rows and columns. We put a "T" in the
top right-hand corner to mean transpose:

Notation

A matrix is usually shown by a capital letter (such as A, or B)

Each entry (or "element") is shown by a lower case letter with a
"subscript" of row,column:

Rows and Columns

So which is the row and which is the column?

• Rows go left-right
• Columns go up-down

To remember that rows come before columns use the word "arc":

ar,c

Example:

B=

Here are some sample entries:

b1,1 = 6 (the entry at row 1, column 1 is 6)

b1,3 = 24 (the entry at row 1, column 3 is 24)

b2,3 = 8 (the entry at row 2, column 3 is 8)

The Determinant of a Matrix

DEFINITION: Determinants play an important role in finding the inverse
of a matrix and also in solving systems of linear equations. In the
following we assume we have a square matrix (m = n). The determinant of
a matrix A will be denoted by det(A) or |A|. Firstly the determinant of
a 2×2 and a 3×3 matrix will be introduced, then the n×n case will be
shown.
Determinant of a 2×2 matrix

Assuming A is an arbitrary 2×2 matrix with elements

A = (a, b; c, d),

then the determinant of this matrix is as follows:

det(A) = ad − bc.

Determinant of a 3×3 matrix

The determinant of a 3×3 matrix is a little more tricky and is found as


follows (for this case assume A is an arbitrary 3×3 matrix A, where the
elements are given below).

then the determinant of a this matrix is as follows:

Now try an example of finding the determinant of a 3×3 matrix yourself.

Determinant of a n×n matrix

For the general case, where A is an n×n matrix the determinant is given
by:

241
Where the coefficients αij are given by the relation:

where βij is the determinant of the (n-1) × (n-1) matrix that is obtained
by deleting row i and column j. This coefficient αij is also called the
cofactor of aij.
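The cofactor expansion can be transcribed directly into code. This recursive sketch is exponentially slow (row reduction is the practical method), but it mirrors the formula above exactly:

```python
def det(A):
    """Determinant by cofactor expansion along the first row -- a
    direct, if slow, transcription of the n x n formula above."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # beta_1j: delete row 1 and column j, then take the determinant.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```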

The Inverse of a Matrix

DEFINITION: Assuming we have a square matrix A, which is non-singular
(i.e. det(A) does not equal zero), then there exists an n×n matrix A-1
which is called the inverse of A, such that this property holds:

AA-1 = A-1A = I, where I is the identity matrix.

The inverse of a 2×2 matrix

Take for example an arbitrary 2×2 matrix A = (a, b; c, d) whose
determinant (ad − bc) is not equal to zero, where a, b, c, d are
numbers. The inverse is:

A-1 = (1/(ad − bc)) × (d, −b; −c, a).

The inverse of an n×n matrix

The inverse of a general n×n matrix A can be found by using the
following equation:

A-1 = adj(A) / det(A)

where adj(A) denotes the adjoint (or adjugate) of A. It can be
calculated by the following method:

• Given the n×n matrix A, define B = (bij) to be the matrix whose
coefficients are found by taking the determinant of the (n-1) × (n-1)
matrix obtained by deleting the ith row and jth column of A. The
terms of B (i.e. the bij) are known as the minors of A.
• Define the matrix C, where
cij = (−1)i+j bij.
• The transpose of C (i.e. CT) is called the adjoint of matrix A.

Lastly, to find the inverse of A, divide the matrix CT by the
determinant of A to give its inverse.
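The cofactor/adjugate recipe above can be sketched with numpy. The function name is ours, and in practice np.linalg.inv is the sensible route; this version simply follows the steps as listed.

```python
import numpy as np

def inverse_via_adjugate(A):
    """A^-1 = adj(A) / det(A), following the cofactor recipe above.
    (Illustrative only; np.linalg.inv is the practical choice.)"""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Minor: delete row i and column j, then apply the sign.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)   # adj(A) is the transpose of C

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
print(np.allclose(inverse_via_adjugate(A) @ A, np.eye(2)))  # True
```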

Solving Systems of Equations using Matrices

DEFINITION: A system of linear equations is a set of n equations in n
unknowns, of the form

a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
...
an1x1 + an2x2 + ... + annxn = bn

The unknowns are denoted by x1, x2, ..., xn and the coefficients (the
aij and bi above) are assumed to be given. A simplified way of writing
the system in matrix form is: Ax = b

Now, try putting your own equations into matrix form.

Inverse Matrix Method

DEFINITION: The inverse matrix method uses the inverse of a matrix to
help solve a system of equations, such as Ax = b above. Pre-multiplying
both sides of this equation by A-1 gives:

A-1Ax = A-1b

or alternatively

x = A-1b.

So by calculating the inverse of the matrix and multiplying this by the
vector b we can find the solution to the system of equations directly.
And from earlier we found that the inverse is given by
A-1 = adj(A) / det(A).

From the above it is clear that the existence of a solution depends on the
value of the determinant of A. There are three cases:

1. If det(A) does not equal zero, then a unique solution exists, given by x = A-1b.
2. If det(A) is zero and b ≠ 0, then the solution is not unique or does not exist.
3. If det(A) is zero and b = 0, then x = 0 is a solution, but as in case 2 it is not unique.

Looking at two equations we might have that

Written in matrix form would look like

and by rearranging we would get that the solution would look like

Similarly for three simultaneous equations we would have:

Written in matrix form would look like

and by rearranging we would get that the solution would look like
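For two equations the whole inverse-matrix method fits in a few lines; the system used here is a made-up illustration:

```python
# Solve A x = b for a 2x2 system via x = A^-1 b.
def solve_2x2(A, b):
    (p, q), (r, s) = A
    det = p * s - q * r
    if det == 0:
        raise ValueError("det(A) = 0: no unique solution")
    Ainv = [[s / det, -q / det], [-r / det, p / det]]
    return [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
            Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]

# 2x + y = 5 and x + 3y = 10 have the solution x = 1, y = 3
x, y = solve_2x2([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0])
```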

Augmented matrices

Matrices are incredibly useful things that crop up in many different


applied areas. For now, you'll probably only do some elementary
manipulations with matrices, and then you'll move on to the next topic.
But you should not be surprised to encounter matrices again in, say,
physics or engineering. (The plural "matrices" is pronounced as "MAY-
truh-seez".)

Matrices were initially based on systems of linear equations.

• Given the following system of equations, write the associated


augmented matrix.

2x + 3y – z = 6
–x – y – z = 9
x + y + 6z = 0

Write down the coefficients and the answer values, including all
"minus" signs. If there is "no" coefficient, then the coefficient is
"1".

That is, given a system of (linear) equations, you can relate to it the
matrix (the grid of numbers inside the brackets) which contains only the
coefficients of the linear system. This is called "an augmented matrix":
the grid containing the coefficients from the left-hand side of each
equation has been "augmented" with the answers from the right-hand
side of each equation.
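Forming an augmented matrix is mechanical enough to automate; this snippet builds the matrix for the system above:

```python
# [A | b]: append each right-hand side value to its row of coefficients.
coefficients = [[2, 3, -1],    #  2x + 3y -  z = 6
                [-1, -1, -1],  #  -x -  y -  z = 9
                [1, 1, 6]]     #   x +  y + 6z = 0
answers = [6, 9, 0]
augmented = [row + [rhs] for row, rhs in zip(coefficients, answers)]
```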

The entries of (that is, the values in) the matrix correspond to the x-, y-
and z-values in the original system, as long as the original system is
arranged properly in the first place. Sometimes, you'll need to rearrange
terms or insert zeroes as place-holders in your matrix.

• Given the following system of equations, write the associated


augmented matrix.

x+y=0
y+z=3
z–x=2

I first need to rearrange the system as:

x + y     = 0
    y + z = 3
–x     + z = 2

Then I can write the associated matrix as:

When forming the augmented matrix, use a zero for any entry where the
corresponding spot in the system of linear equations is blank.

Coefficient matrices

If you form the matrix only from the coefficient values, the matrix
would look like this:

This is called "the coefficient matrix

Above, we went from a linear system to an augmented matrix. You can


go the other way, too.

• Given the following augmented matrix, write the associated


linear system.

Remember that matrices require that the variables be all lined up


nice and neat. And it is customary, when you have three variables,
to use x, y, and z, in that order. So the associated linear system
must be:

x + 3y = 4
2y – z = 5
3x + z = –2

The Size of a matrix

Matrices are often referred to by their sizes. The size of a matrix is given
in the form of a dimension, much as a room might be referred to as "a
ten-by-twelve room". The dimensions for a matrix are the rows and
columns, rather than the width and length. For instance, consider the
following matrix A:

Since A has three rows and four columns, the size of A is 3 × 4


(pronounced as "three-by-four").

The rows go side to side; the columns go up and down. "Row" and
"column" are technical terms, and are not interchangeable. Matrix
dimensions are always given with the number of rows first, followed by
the number of columns. Following this convention, the following matrix
B:

...is 2 × 3. If the matrix has the same number of rows as columns, the
matrix is said to be a "square" matrix. For instance, the coefficient
matrix from above:

...is a 3 × 3 square matrix.
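With matrices stored as lists of rows, the size convention (rows first, then columns) looks like this:

```python
# size(M) returns (rows, columns); a matrix is square when they match.
def size(M):
    return (len(M), len(M[0]))

def is_square(M):
    rows, cols = size(M)
    return rows == cols

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # 3 x 4
B = [[1, 2, 3], [4, 5, 6]]                          # 2 x 3
```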

Multiplication of Matrices

Before we give the formal definition of how to multiply two matrices,


we will discuss an example from a real life situation. Consider a city
with two kinds of population: the inner city population and the suburb
population. We assume that every year 40% of the inner city population
moves to the suburbs, while 30% of the suburb population moves to the
inner part of the city. Let I (resp. S) be the initial population of the inner
city (resp. the suburban area). So after one year, the population of the
inner part is

0.6 I + 0.3 S

while the population of the suburbs is

0.4 I + 0.7 S

After two years, the population of the inner city is

0.6 (0.6 I + 0.3 S) + 0.3 (0.4 I + 0.7 S)

and the suburban population is given by

0.4 (0.6 I + 0.3 S) + 0.7(0.4 I + 0.7 S)

Is there a nice way of representing the two populations after a certain


number of years? Let us show how matrices may be helpful to answer
this question. Let us represent the two populations in one table (meaning
a column object with two entries):

So after one year the table which gives the two populations is

If we consider the following rule (the product of two matrices)

then the populations after one year are given by the formula

After two years the populations are

Combining this formula with the above result, we get

In other words, we have
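The population example can be sketched in code: multiplying the migration matrix by the population column gives next year's populations (the starting numbers are made up):

```python
# One matrix-vector product per year of migration.
def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v)))
            for i in range(len(M))]

T = [[0.6, 0.3],   # 60% of the inner city stays; 30% of the suburbs move in
     [0.4, 0.7]]   # 40% of the inner city moves out; 70% of the suburbs stay
population = [100.0, 200.0]          # I = 100, S = 200
after_one_year = matvec(T, population)
after_two_years = matvec(T, after_one_year)
```

Note that the total population is preserved, since each column of T sums to 1.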

In fact, we do not need to have two matrices of the same size to multiply
them. Above, we did multiply a (2x2) matrix with a (2x1) matrix (which
gave a (2x1) matrix). In fact, the general rule says that in order to
perform the multiplication AB, where A is a (mxn) matrix and B a (kxl)
matrix, then we must have n=k. The result will be a (mxl) matrix. For
example, we have

Remember that though we were able to perform the above
multiplication, it is not possible to perform the multiplication

So we have to be very careful about multiplying matrices. Sentences like


"multiply the two matrices A and B" do not make sense. You must know
which of the two matrices will be to the right (of your multiplication)
and which one will be to the left; in other words, we have to know

whether we are asked to perform AB or BA. Even if both


multiplications do make sense (as in the case of square matrices with the
same size), we still have to be very careful. Indeed, consider the two
matrices

We have

and

So what is the conclusion behind this example? Matrix multiplication is not commutative: the order in which matrices are multiplied is important. In fact, this little setback is a major problem when working with matrices, and it is something that you must always be careful with. Let us show you another setback. We have

the product of two non-zero matrices may be equal to the zero-matrix.
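Both pitfalls are easy to reproduce with concrete 2x2 matrices (these are standard examples, not the ones from the text):

```python
# Matrix product for lists of rows.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
AB = matmul(A, B)        # [[1, 0], [0, 0]]
BA = matmul(B, A)        # [[0, 0], [0, 1]] -- not equal to AB
# A is nonzero, yet A*A is the zero matrix: a "zero divisor".
AA = matmul(A, A)
```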

Algebraic Properties of Matrix Operations

In this page, we give some general results about the three operations:
addition, multiplication, and multiplication with numbers, called scalar
multiplication.

From now on, we will not write (mxn) but mxn.

Properties involving Addition. Let A, B, and C be mxn matrices. We


have

1.

A+B = B+A

2.

(A+B)+C = A + (B+C)

3.

A + 0 = A, where 0 is the mxn zero-matrix (all its entries are equal to 0);

4.

A + B = 0 if and only if B = -A.

Properties involving Multiplication.

1.

Let A, B, and C be three matrices. If you can perform the products


AB, (AB)C, BC, and A(BC), then we have

(AB)C = A (BC)

Note, for example, that if A is 2x3, B is 3x3, and C is 3x1, then the
above products are possible (in this case, (AB)C is 2x1 matrix).

2.

If α and β are numbers, and A is a matrix, then we have α(βA) = (αβ)A.

3.

If α is a number, and A and B are two matrices such that the product AB is possible, then we have α(AB) = (αA)B = A(αB).

4.

If A is an nxm matrix and 0 is the mxk zero-matrix, then A0 = 0.

Note that A0 here is the nxk zero-matrix. So if n is different from m, the two zero-matrices are different.

Properties involving Addition and Multiplication.

1.

Let A, B, and C be three matrices. If you can perform the


appropriate products, then we have

(A+B)C = AC + BC

and

A(B+C) = AB + AC

2.

If α and β are numbers, and A and B are matrices of the same size, then we have

α(A + B) = αA + αB

and

(α + β)A = αA + βA

Example. Consider the matrices

Evaluate (AB)C and A(BC). Check that you get the same matrix.

Answer. We have

so

On the other hand, we have

so

Example. Consider the matrices

It is easy to check that

and

These two formulas are called linear combinations. More on linear


combinations will be discussed on a different page.

We have seen that matrix multiplication is different from normal


multiplication (between numbers). Are there some similarities? For
example, is there a matrix which plays a similar role as the number 1?
The answer is yes. Indeed, consider the nxn matrix

In particular, we have

The matrix In behaves like the number 1. Indeed, for any nxn
matrix A, we have

A In = In A = A

The matrix In is called the Identity Matrix of order n.
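A short check that In really behaves like the number 1:

```python
# identity(n) builds I_n; multiplying by it leaves a matrix unchanged.
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 5], [7, 1]]
I2 = identity(2)
```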

Example. Consider the matrices

Then it is easy to check that

The identity matrix behaves like the number 1 not only among the square nxn matrices. Indeed, for any nxm matrix A, we have In A = A and A Im = A.

In particular, we have

Invertible Matrices

Invertible matrices are very important in many areas of science. For


example, decrypting a coded message uses invertible matrices (see the
coding page). The problem of finding the inverse of a matrix will be
discussed in a different page (click here).

Definition. An nxn matrix A is called nonsingular or invertible if and only if there exists an nxn matrix B such that

AB = BA = In

where In is the identity matrix. The matrix B is called the inverse matrix
of A.

Example. Let

One may easily check that

Hence A is invertible and B is its inverse.
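The definition can be checked numerically: B is the inverse of A exactly when both products give the identity. The pair below is a stand-in with determinant 1, not the example's matrices:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 1], [5, 3]]      # det(A) = 2*3 - 1*5 = 1
B = [[3, -1], [-5, 2]]    # candidate inverse
I2 = [[1, 0], [0, 1]]
is_inverse = matmul(A, B) == I2 and matmul(B, A) == I2
```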

Notation. A common notation for the inverse of a matrix A is A-1. So

Example. Find the inverse of

Write

Since

we get

Easy algebraic manipulations give

or

The inverse matrix is unique when it exists. So if A is invertible, then A-1 is also invertible and (A-1)-1 = A.

The following basic property is very important:

If A and B are invertible matrices, then AB is also invertible and (AB)-1 = B-1A-1.

Remark. In the definition of an invertible matrix A, we required both AB and BA to be equal to the identity matrix. In fact, we need only one of the two. In other words, for a square matrix A, if there exists a matrix B such that AB = In, then A is invertible and B = A-1.

Special Matrices: Triangular, Symmetric, Diagonal

We have seen that a matrix is a block of entries or two dimensional data.


The size of the matrix is given by the number of rows and the number of
columns. If the two numbers are the same, we called such matrix a
square matrix.

To square matrices we associate what we call the main diagonal (in


short the diagonal). Indeed, consider the matrix

Its diagonal is given by the numbers a and d. For the matrix

its diagonal consists of a, e, and k. In general, if A is a square matrix of
order n and if aij is the number in the ith-row and jth-colum, then the
diagonal is given by the numbers aii, for i=1,..,n.

The diagonal of a square matrix helps define two types of matrices:


upper-triangular and lower-triangular. Indeed, the diagonal
subdivides the matrix into two blocks: one above the diagonal and the
other one below it. If the lower-block consists of zeros, we call such a
matrix upper-triangular. If the upper-block consists of zeros, we call
such a matrix lower-triangular. For example, the matrices

are upper-triangular, while the matrices

are lower-triangular. Now consider the two matrices

The matrices A and B are triangular. But there is something special


about these two matrices. Indeed, as you can see if you reflect the matrix
A about the diagonal, you get the matrix B. This operation is called the
transpose operation. Indeed, let A be a nxm matrix defined by the
numbers aij, then the transpose of A, denoted AT is the mxn matrix
defined by the numbers bij where bij = aji. For example, for the matrix

we have

Properties of the Transpose operation. If X and Y are mxn matrices
and Z is an nxk matrix, then

1.

(X+Y)T = XT + YT

2.

(XZ)T = ZT XT

3.

(XT)T = X
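The three transpose rules can be verified directly; rule 2 (the order reversal in (XZ)T = ZT XT) is the one that usually surprises:

```python
def transpose(M):
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

X = [[1, 2, 3], [4, 5, 6]]      # 2x3
Y = [[0, 1, 0], [1, 1, 1]]      # 2x3
Z = [[1, 0], [0, 1], [2, 2]]    # 3x2
X_plus_Y = [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
```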

A symmetric matrix is a matrix equal to its transpose. So a symmetric


matrix must be a square matrix. For example, the matrices

are symmetric matrices. In particular, a symmetric matrix of order n contains at most n(n+1)/2 different numbers.

A diagonal matrix is a symmetric matrix with all of its entries equal to zero except possibly the ones on the diagonal. So a diagonal matrix has at most n different numbers other than 0. For example, the matrices

are diagonal matrices. Identity matrices are examples of diagonal
matrices. Diagonal matrices play a crucial role in matrix theory. We will
see this later on.

Example. Consider the diagonal matrix

Define the power-matrices of A by

Find the power matrices of A and then evaluate the matrices

for n=1,2,....

Answer. We have

and
By induction, one may easily show that

for every natural number n. Then we have

for n=1,2,..
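The diagonal matrix of this example is not reproduced here, so the sketch below uses a stand-in; the point is that the power A^n of a diagonal matrix just raises each diagonal entry to the n-th power:

```python
# n-th power of a diagonal matrix, entry by entry.
def diag_power(D, n):
    m = len(D)
    return [[D[i][i] ** n if i == j else 0 for j in range(m)]
            for i in range(m)]

D = [[2, 0], [0, 3]]
```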

Scalar Product. Consider the 3x1 matrices

The scalar product of X and Y is defined by

In particular, we have

XTX = (a² + b² + c²), which is a 1x1 matrix.

Elementary Operations for Matrices

Elementary operations for matrices play a crucial role in finding the


inverse or solving linear systems. They may also be used for other
calculations. On this page, we will discuss these types of operations.
Before we define an elementary operation, recall that to an nxm matrix
A, we can associate n rows and m columns. For example, consider the
matrix

Its rows are

Its columns are

Let us consider the matrix transpose of A

Its rows are

As we can see, the transpose of the columns of A are the rows of AT. So
the transpose operation interchanges the rows and the columns of a
matrix. Therefore many techniques which are developed for rows may
be easily translated to columns via the transpose operation. Thus, we
will only discuss elementary row operations, but the reader may easily
adapt these to columns.

The elementary row operations are: (1) interchange two rows; (2) multiply a row by a nonzero constant; (3) add a multiple of one row to another row.

Definition. Two matrices are row equivalent if and only if one may be obtained from the other one via elementary row operations.

Example. Show that the two matrices

are row equivalent.

Answer. We start with A. If we keep the second row and add the first to
the second, we get

We keep the first row. Then we subtract the first row from the second
one multiplied by 3. We get

We keep the first row and subtract the first row from the second one. We
get

which is the matrix B. Therefore A and B are row equivalent.

One powerful use of elementary operations consists in finding solutions
to linear systems and the inverse of a matrix. This happens via Echelon
Form and Gauss-Jordan Elimination. In order to appreciate these two
techniques, we need to discuss when a matrix is row elementary
equivalent to a triangular matrix. Let us illustrate this with an example.

Example. Consider the matrix

First we will transform the first column via elementary row operations
into one with the top number equal to 1 and the bottom ones equal 0.
Indeed, if we interchange the first row with the last one, we get

Next, we keep the first and last rows. And we subtract the first one
multiplied by 2 from the second one. We get

We are almost there. Looking at this matrix, we see that we can still take
care of the 1 (from the last row) under the -2. Indeed, if we keep the first
two rows and add the second one to the last one multiplied by 2, we get

We can't do more. Indeed, we stop the process whenever we have a


matrix which satisfies the following conditions

1.

any row consisting of zeros is below any row that contains at least
one nonzero number;

2.

the first (from left to right) nonzero entry of any row is to the left
of the first nonzero entry of any lower row.

Now if we make sure that the first nonzero entry of every row is 1, we
get a matrix in row echelon form. For example, the matrix above is not
in echelon form. But if we divide the second row by -2, we get

This matrix is in echelon form.
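The whole reduction process can be automated. This sketch applies the three elementary row operations (swap, scale, add a multiple of a row) until the two conditions above hold and each leading entry is 1:

```python
def row_echelon(M):
    M = [row[:] for row in M]          # work on a copy
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]        # swap rows
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]        # scale lead to 1
        for r in range(pivot_row + 1, rows):                   # clear below
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M
```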

Matrix Exponential

The matrix exponential plays an important role in solving system of


linear differential equations. On this page, we will define such an object
and show its most important properties. The natural way of defining the
exponential of a matrix is to go back to the exponential function ex and
find a definition which is easy to extend to matrices. Indeed, we know
that the Taylor polynomials

converges pointwise to ex and uniformly whenever x is bounded. These


algebraic polynomials may help us in defining the exponential of a
matrix. Indeed, consider a square matrix A and define the sequence of
matrices

When n gets large, this sequence of matrices get closer and closer to a
certain matrix. This is not easy to show; it relies on the conclusion on ex

above. We write this limit matrix as eA. This notation is natural due to
the properties of this matrix. Thus we have the formula

One may also write this in series notation as

At this point, the reader may feel a little lost about the definition above.
To make this stuff clearer, let us discuss an easy case: diagonal matrices.

Example. Consider the diagonal matrix

It is easy to check that

for every natural number n. Hence we have

Using the above properties of the exponential function, we deduce that

Indeed, for a diagonal matrix A, eA can always be obtained by replacing


the entries of A (on the diagonal) by their exponentials. Now let B be a
matrix similar to A. As explained before, then there exists an invertible
matrix P such that

B = P-1AP.

Moreover, we have

Bn = P-1AnP

for every natural number n, which implies

This clearly implies that

In fact, we have a more general conclusion. Indeed, let A and B be two square matrices, and assume that A and B are similar. Then eA and eB are also similar. Moreover, if B = P-1AP, then

eB = P-1eAP.

Example. Consider the matrix

This matrix is upper-triangular. Note that all the entries on the diagonal
are 0. These types of matrices have a nice property. Let us discuss this
for this example. First, note that

In this case, we have

In general, let A be a square upper-triangular matrix of order n. Assume


that all its entries on the diagonal are equal to 0. Then we have

Such a matrix is called a nilpotent matrix. In this case, we have

As we said before, the reasons for using the exponential notation for
matrices reside in the following properties:

Theorem. The following properties hold:

1.

e0 = In, where 0 denotes the nxn zero-matrix;

2.

if A and B commute, meaning AB = BA, then we have

eA+B = eAeB;

3.

for any matrix A, eA is invertible and (eA)-1 = e-A.
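The truncated series is enough to approximate eA numerically. The check below uses the diagonal rule from earlier (eA of a diagonal matrix has e raised to each diagonal entry); the sample matrix is an illustration:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(A, terms=20):
    # e^A ~ I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]     # running power of A, starting at I
    factorial = 1.0
    for k in range(1, terms):
        power = matmul(power, A)
        factorial *= k
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / factorial
    return result

E = mat_exp([[1.0, 0.0], [0.0, 2.0]])
```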

Application of Invertible Matrices: Coding

There are many ways to encrypt a message. And the use of coding has
become particularly significant in recent years (due to the explosion of
the internet for example). One way to encrypt or code a message uses
matrices and their inverse. Indeed, consider a fixed invertible matrix A.
Convert the message into a matrix B such that AB is possible to
perform. Send the message generated by AB. At the other end, they will
need to know A-1 in order to decrypt or decode the message sent. Indeed,
we have

which is the original message. Keep in mind that whenever an undesired


intruder finds A, we must be able to change it. So we should have a
mechanical way of generating simple matrices A which are invertible
and have simple inverse matrices. Note that, in general, the inverse of a

matrix involves fractions which are not easy to send in an electronic
form. The best is to have both A and its inverse with integers as their
entries. In fact, we can use our previous knowledge to generate such

class of matrices. Indeed, if A is a matrix such that its determinant is ±1 and all its entries are integers, then A-1 will have entries which are integers.
So how do we generate such a class of matrices? One practical way is to start with an upper triangular matrix with ±1 on the diagonal and integer entries. Then we use the elementary row operations to
change the matrix while keeping the determinant unchanged. Do not
multiply rows with non-integers while doing elementary row operations.
Let us illustrate this on an example.

Example. Consider the matrix

First we keep the first row and add it to the second as well as to the third
rows. We obtain

Next we keep the first row again, we add the second to the third, and
finally add the last one to the first multiplied by -2. We obtain

This is our matrix A. Easy calculations will give det(A) = -1, which we
knew since the above elementary operations did not change the
determinant from the original triangular matrix which obviously has -1
as its determinant. We leave the details of the calculations to the reader.
The inverse of A is

To every letter we will associate a number. The easiest way to do that is


to associate 0 to a blank or space, 1 to A, 2 to B, etc... Another way is to
associate 0 to a blank or space, 1 to A, -1 to B, 2 to C, -2 to D, etc... Let
us use the second choice. So our message is given by the string

Now we rearrange these numbers into a matrix B. For example, we have

Then we perform the product AB, where A is the matrix found above.
We get

The encrypted message to be sent is
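The round trip encode-then-decode can be demonstrated with a small stand-in key (determinant 1, so both the key and its inverse have integer entries); the letter numbers use the first scheme above (A = 1, B = 2, ...):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

key     = [[2, 1], [1, 1]]     # det = 1, so the inverse is integer-valued
key_inv = [[1, -1], [-1, 2]]
message = [[8, 5], [12, 12]]   # the letters H, E, L, L as numbers
encrypted = matmul(key, message)
decrypted = matmul(key_inv, encrypted)   # recovers the original message
```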

Complex numbers as Matrices

In this section, we use matrices to give a representation of complex


numbers. Indeed, consider the set

We will write

Clearly, the set is not empty. For example, we have

In particular, we have

for any real numbers a, b, c, and d.

Algebraic Properties of

1.

Addition: For any real numbers a, b, c, and d, we have

Ma,b + Mc,d = Ma+c,b+d.

In other words, if we add two elements of the set , we still get a


matrix in . In particular, we have

-Ma,b = M-a,-b.

2.

Multiplication by a number: We have

So a multiplication of an element of and a number gives a matrix


in .

2.

Multiplication: For any real numbers a, b, c, and d, we have

In other words, we have

Ma,b Mc,d = Mac-bd, ad+bc.

This is an extraordinary formula. It is quite conceivable given the


difficult form of the matrix multiplication that, a priori, the product
of two elements of may not be in again. But, in this case, it
turns out to be true.
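Since the entries of Ma,b are not shown above, the sketch below assumes the standard representation Ma,b = [[a, −b], [b, a]], which is consistent with the product rule just stated:

```python
# Complex numbers a + bi as 2x2 matrices (assumed layout [[a, -b], [b, a]]).
def M(a, b):
    return [[a, -b], [b, a]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

left = matmul(M(1, 2), M(3, 4))
right = M(1 * 3 - 2 * 4, 1 * 4 + 2 * 3)   # M_{ac-bd, ad+bc}
i_squared = matmul(M(0, 1), M(0, 1))      # should equal M(-1, 0) = -I
```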

The above properties give the set a very nice structure. The next natural question to ask, in this case, is whether a nonzero element is invertible. Indeed, for any real numbers a and b, we have det(Ma,b) = a² + b².

So, if a² + b² ≠ 0, the matrix Ma,b is invertible and

In other words, any nonzero element Ma,b of is invertible and its


inverse is still in since

In order to define the division in , we will use the inverse. Indeed,


recall that

So for the set , we have

Ma,b ÷ Mc,d = Ma,b × (Mc,d)-1 = Ma,b × M c/(c²+d²), −d/(c²+d²)

which implies

Ma,b ÷ Mc,d = M (ac+bd)/(c²+d²), (bc−ad)/(c²+d²)

The matrix Ma,-b is called the conjugate of Ma,b. Note that the conjugate
of the conjugate of Ma,b is Ma,b itself.

Fundamental Equation. For any Ma,b in , we have

Ma,b = a M1,0 + b M0,1 = a I2 + b M0,1.

Note that

M0,1 M0,1 = M-1,0 = - I2.

Remark. If we introduce an imaginary number i such that i2 = -1, then


the matrix Ma,b may be rewritten by

a + bi

A lot can be said about , but we will advise you to visit the page on
complex numbers.

DEFINITION: A matrix is defined as an ordered rectangular array of


numbers. They can be used to represent systems of linear equations, as
will be explained below.

Here are a couple of examples of different types of matrices:

Symmetric, Diagonal, Upper Triangular, Lower Triangular, Zero, Identity

And a fully expanded m×n matrix A, would look like this:

... or in a more compact form:

Matrix Addition and Subtraction

DEFINITION: Two matrices A and B can be added or subtracted if and


only if their dimensions are the same (i.e. both matrices have the same number of rows and columns). Take:

Addition

If A and B above are matrices of the same type then the sum is found by
adding the corresponding elements aij + bij .

Here is an example of adding A and B together.

Subtraction

If A and B are matrices of the same type then the subtraction is found by
subtracting the corresponding elements aij − bij.

Here is an example of subtracting matrices.

Now, try adding and subtracting your own matrices.
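Elementwise addition and subtraction, as defined above (the sample matrices are illustrations):

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
```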

Matrix Multiplication

DEFINITION: When the number of columns of the first matrix is the


same as the number of rows in the second matrix then matrix
multiplication can be performed.

Here is an example of matrix multiplication for two 2×2 matrices.

Here is an example of matrix multiplication for two 3×3 matrices.

Now let's look at the general case, where A has dimensions m×n and B has dimensions n×p. Then the product of A and B is the matrix C, which
has dimensions m×p. The ijth element of matrix C is found by
multiplying the entries of the ith row of A with the corresponding entries
in the jth column of B and summing the n terms. The elements of C are:

Note that A×B is not, in general, the same as B×A.

Transpose of Matrices

DEFINITION: The transpose of a matrix is found by exchanging rows


for columns i.e. Matrix A = (aij) and the transpose of A is:

AT = (aji) where j is the column number and i is the row number of


matrix A.

For example, the transpose of a matrix would be:

In the case of a square matrix (m = n), the transpose can be used to
check if a matrix is symmetric. For a symmetric matrix A = AT.

Now try an example.
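Transposing and the symmetry test A = AT in code:

```python
def transpose(M):
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

def is_symmetric(M):
    return M == transpose(M)

A = [[1, 2, 3], [4, 5, 6]]   # 2x3: its transpose is 3x2
S = [[1, 7], [7, 2]]         # symmetric
```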

Solving Systems of Equations using Matrices

DEFINITION: A system of linear equations is a set of equations with n


equations and n unknowns, is of the form of

The unknowns are denoted by x1, x2, ..., xn and the coefficients (a and b
above) are assumed to be given. In matrix form the system of equations
above can be written as:

A simplified way of writing above is like this: Ax = b

Now, try putting your own equations into matrix form.

Inverse Matrix Method

DEFINITION: The inverse matrix method uses the inverse of a matrix to help solve a system of equations, such as the above Ax = b. Pre-multiplying both sides of this equation by A-1 gives:

A-1Ax = A-1b

or alternatively x = A-1b

So by calculating the inverse of the matrix and multiplying this by the


vector b we can find the solution to the system of equations directly.
And from earlier we found that the inverse is given by

From the above it is clear that the existence of a solution depends on the
value of the determinant of A. There are three cases:

1. If det(A) does not equal zero, then a unique solution exists, given by x = A-1b.
2. If det(A) is zero and b ≠ 0, then the solution is not unique or does not exist.
3. If det(A) is zero and b = 0, then x = 0 is a solution, but as in case 2 it is not unique.

Looking at two equations we might have that

Written in matrix form would look like

and by rearranging we would get that the solution would look like

Now try solving your own two equations with two unknowns.

Similarly for three simultaneous equations we would have:

Written in matrix form would look like

and by rearranging we would get that the solution would look like

Now try solving your own three equations with three unknowns.

Cramer's Rule

DEFINITION: Cramer's rule uses a method of determinants to solve


systems of equations. Starting with the equation below,

The first term x1 above can be found by replacing the first column of A by the vector b and taking the ratio of the corresponding determinants. Doing this we obtain:

Similarly, for the general case, to solve for xr we replace the rth column of A by b and expand the determinant. This method of using determinants


can be applied to solve systems of linear equations. We will illustrate
this for solving two simultaneous equations in x and y and three
equations with 3 unknowns x, y and z.

Two simultaneous equations in x and y

To solve use the following:

or simplified:

Three simultaneous equations in x, y and z

ax + by + cz = p
dx + ey + fz = q
gx + hy + iz = r

To solve use the following:
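Cramer's rule for the two-equation case a1x + b1y = c1, a2x + b2y = c2 can be written out directly (these coefficient names, and the sample system, are illustrations):

```python
# x and y are ratios of determinants: replace the x-column (or y-column)
# of the coefficient matrix by the right-hand side, divide by det.
def cramer_2x2(a1, b1, c1, a2, b2, c2):
    d = a1 * b2 - b1 * a2
    if d == 0:
        raise ValueError("zero determinant: Cramer's rule does not apply")
    x = (c1 * b2 - b1 * c2) / d
    y = (a1 * c2 - c1 * a2) / d
    return x, y

# 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
solution = cramer_2x2(2.0, 1.0, 5.0, 1.0, 3.0, 10.0)
```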


More Problems on Linear Systems and Matrices

Let us answer (1). Consider the augmented matrix of the system

Let us do some elementary row operations. First take the second row minus the first row, and the third row minus 3 times the first row, that is

to get

Next take to get

So the original system is consistent if and only if

Next we answer (2). If then we have

so the condition is satisfied. Therefore the system is consistent.

If , then we have

so the system in this case is not consistent.

Problem. Find the range of the matrix

Recall that a vector b is in the range of A if and only if there exists a vector x such that Ax = b. This clearly implies that b is in the range of A if and only if the linear system Ax = b is consistent. To discuss this let us consider the augmented matrix

Let us do some row elementary operations. First take

and we will get

Next take to get

So the system is consistent if and only if

So the vector is in the range of if and only if .

Problem. Let N be a three-digit number such that

1. N is equal to fifteen times the sum of its digits.
2. If the digits are reversed, then the new number is equal to N plus 396.
3. The one's digit is one larger than the sum of the other digits.

Use linear systems to find N.

Write . By definition we have . The first
condition translates into

Since the new number with reversed digits is , the


second condition translates into

And finally the last condition gives

Putting all the above equations together we get

or

Consider the augmented matrix of this system

Let us do some elementary row operations to solve it. First we


interchange the third row with the first one and multiply the new first

row with to get

Next take the and to get

Next take to get


Divide the third row with 781 to get

Next take and to get

Divide the second row by 9 to get

Finally take to get

So we have

or . Check that the original conditions are satisfied.

Problem. Use linear systems to find out if there exists a parabola that

passes through the points , , and . If the answer is yes,


find its equation.

The parabola will have the form y = ax² + bx + c. The three points will be on the parabola if and only if

The existence of such a parabola translates into the consistency of the
above system. Consider the augmented matrix

Take and to get

Take and to get

Now take to get

From here, we can conclude that the parabola does exist. To find it, let

us continue with the row elementary operations. Take and

to get

Finally take

Hence

or .

Problem. Consider the matrix

Use systems to find all the 2x2 matrices such that

Use this conclusion to find a matrix such that

Answer.

Set

Then we have

and

So the equation implies

This clearly implies

So

where and are two arbitrary numbers. Consider the matrix

Since is not of the form found before, we must have

INTRODUCTION TO DETERMINANTS

For any square matrix of order 2, we have found a necessary and


sufficient condition for invertibility. Indeed, consider the matrix

The matrix A is invertible if and only if . We called this


number the determinant of A. It is clear from this, that we would like to
have a similar result for bigger matrices (meaning higher orders). So is
there a similar notion of determinant for any square matrix, which
determines whether a square matrix is invertible or not?

In order to generalize such notion to higher orders, we will need to study


the determinant and see what kind of properties it satisfies. First let us
use the following notation for the determinant

PROPERTIES OF THE DETERMINANT

1.

Any matrix A and its transpose have the same determinant,


meaning

This is interesting since it implies that whenever we use rows, a
similar behavior will result if we use columns. In particular we will
see how row elementary operations are helpful in finding the
determinant. Therefore, we have similar conclusions for
elementary column operations.

2.

The determinant of a triangular matrix is the product of the entries


on the diagonal, that is

3.

If we interchange two rows, the determinant of the new matrix is


the opposite of the old one, that is

4.

If we multiply one row with a constant, the determinant of the new


matrix is the determinant of the old one multiplied by the constant,
that is

In particular, if all the entries in one row are zero, then the
determinant is zero.

5.

If we add one row to another one multiplied by a constant, the


determinant of the new matrix is the same as the old one, that is

Note that whenever you want to replace a row by something


(through elementary operations), do not multiply the row itself by
a constant. Otherwise, you will easily make errors (due to Property
4).

6.

We have

In particular, if A is invertible (which happens if and only if

), then

If A and B are similar, then .
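These properties are easy to check numerically on a random matrix. The sketch below is our own illustration (numpy), verifying properties 1 through 6 in order:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
det = np.linalg.det

assert np.isclose(det(A), det(A.T))                  # property 1
T = np.triu(A)                                       # property 2:
assert np.isclose(det(T), np.prod(np.diag(T)))       # triangular case
B = A.copy(); B[[0, 1]] = B[[1, 0]]                  # property 3:
assert np.isclose(det(B), -det(A))                   # row interchange
C = A.copy(); C[0] *= 5.0                            # property 4:
assert np.isclose(det(C), 5.0 * det(A))              # scaling a row
D = A.copy(); D[1] += 2.0 * D[0]                     # property 5:
assert np.isclose(det(D), det(A))                    # row addition
M = rng.standard_normal((3, 3))                      # property 6:
assert np.isclose(det(A @ M), det(A) * det(M))       # product rule
```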

Let us look at an example, to see how these properties work.

Example. Evaluate

Let us transform this matrix into a triangular one through elementary
operations. We will keep the first row and add to the second one the first

multiplied by . We get

Using the Property 2, we get

Therefore, we have

DETERMINANTS OF MATRICES OF HIGHER ORDER

As we said before, the idea is to assume that previous properties satisfied


by the determinant of matrices of order 2 are still valid in general. In
other words, we assume:

1.

Any matrix A and its transpose have the same determinant,


meaning

2.

The determinant of a triangular matrix is the product of the entries


on the diagonal.

3.

If we interchange two rows, the determinant of the new matrix is


the opposite of the old one.

4.

If we multiply one row with a constant, the determinant of the new


matrix is the determinant of the old one multiplied by the constant.

5.

If we add one row to another one multiplied by a constant, the


determinant of the new matrix is the same as the old one.

6. We have

In particular, if A is invertible (which happens if and only if

), then

So let us see how this works in the case of a matrix of order 4.

Example. Evaluate

We have

If we subtract the first row multiplied by the appropriate number from
every other row, we get

We do not touch the first row and work with the other rows. We
interchange the second with the third to get

If we subtract the second row multiplied by the appropriate number
from each row below it, we get

Using previous properties, we have

If we multiply the third row by 13 and add it to the fourth, we get

which is equal to 3. Putting all the numbers together, we get

These calculations seem to be rather lengthy. We will see later on that a
general formula for the determinant does exist.

Example. Evaluate

In this example, we will not give the details of the elementary


operations. We have

Example. Evaluate

We have

General Formula for the Determinant Let A be a square matrix of


order n. Write A = (aij), where aij is the entry on the row number i and

the column number j, for and . For any i and j,
set Aij (called the cofactor) to be (-1)^(i+j) times the determinant of
the square matrix of order (n-1) obtained from A by removing the row
number i and the column number j. We have

for any fixed i, and

for any fixed j. In other words, we have two types of formulas: along a
row (number i) or along a column (number j). Any row or any column
will do. The trick is to use a row or a column which has a lot of zeros.
In particular, we have along the rows

or

or

As an exercise write the formulas along the columns.
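The general formula translates directly into a (naive, exponential-time) procedure. The sketch below, our own illustration, expands along the first row and is meant only to mirror the definition, not to be efficient:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # minor: delete row 0 and column j; the sign is (-1)^(0+j)
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

print(det_cofactor([[1.0, 2.0], [3.0, 4.0]]))  # -> -2.0
```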

Example. Evaluate

We will use the general formula along the third row. We have

Which technique to evaluate a determinant is easier? The answer


depends on the person who is evaluating the determinant. Some like the
elementary row operations and some like the general formula. All that
matters is to get the correct answer.

DETERMINANT AND INVERSE OF MATRICES

Finding the inverse of a matrix is very important in many areas of


science. For example, decrypting a coded message uses the inverse of a
matrix. Determinants may be used to answer this problem. Indeed, let A
be a square matrix. We know that A is invertible if and only if

. Also if A has order n, then the cofactor Ai,j is defined as
(-1)^(i+j) times the determinant of the square matrix of order (n-1)
obtained from A by removing the row number i and the column
number j. Recall

for any fixed i, and

for any fixed j. Define the adjoint of A, denoted adj(A), to be the


transpose of the matrix whose ijth entry is Aij.

Example. Let

We have

Let us evaluate . We have

Note that . Therefore, we have

Is this formula only true for this matrix, or does a similar formula exist
for any square matrix? In fact, we do have a similar formula.

Theorem. For any square matrix A of order n, we have

In particular, if , then

For a square matrix of order 2, we have

which gives

This is a formula which we used on a previous page.
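The adjoint construction and the identity A adj(A) = det(A) I can be sketched in numpy; this is our own illustration on a hypothetical 2x2 matrix:

```python
import numpy as np

def adjugate(A):
    """adj(A): transpose of the matrix of cofactors Aij."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical matrix
assert np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(2))
# Since det(A) != 0, the inverse is adj(A) / det(A):
assert np.allclose(adjugate(A) / np.linalg.det(A), np.linalg.inv(A))
```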

On the next page, we will discuss the application of the above formulas
to linear systems.

APPLICATION OF DETERMINANT TO SYSTEMS: CRAMER'S
RULE

We have seen that determinants may be useful in finding the inverse of a


nonsingular matrix. We can use these findings in solving linear systems
for which the matrix coefficient is nonsingular (or invertible).

Consider the linear system (in matrix form)

AX=B

where A is the matrix coefficient, B the nonhomogeneous term, and X


the unknown column-matrix. We have:

Theorem. The linear system AX = B has a unique solution if and only if


A is invertible. In this case, the solution is given by the so-called
Cramer's formulas:

where xi are the unknowns of the system or the entries of X, and the
matrix Ai is obtained from A by replacing the ith column by the column
B. In other words, we have

where the bi are the entries of B.

In particular, if the linear system AX = B is homogeneous, meaning


, then if A is invertible, the only solution is the trivial one, that is
. So if we are looking for a nonzero solution to the system, the
matrix coefficient A must be singular or noninvertible. We also know

that this will happen if and only if . This is an important


result.
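Cramer's formulas translate directly into code. This minimal sketch is our own (the matrix from the example below is not reproduced here, so a hypothetical system is used):

```python
import numpy as np

def cramer_solve(A, B):
    """Solve A X = B by Cramer's rule; assumes det(A) != 0."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(A.shape[0])
    for i in range(A.shape[0]):
        Ai = A.copy()
        Ai[:, i] = B          # replace the i-th column of A by B
        x[i] = np.linalg.det(Ai) / d
    return x

A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical system
B = [3.0, 5.0]
assert np.allclose(cramer_solve(A, B), np.linalg.solve(A, B))
```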

Example. Solve the linear system

Answer. First note that

which implies that the matrix coefficient is invertible. So we may use the
Cramer's formulas. We have

We leave the details to the reader to find

Eigenvalues and Eigenvectors: An Introduction

The eigenvalue problem is a problem of considerable theoretical interest


and wide-ranging application. For example, this problem is crucial in
solving systems of differential equations, analyzing population growth

models, and calculating powers of matrices (in order to define the
exponential matrix). Other areas such as physics, sociology, biology,
economics and statistics have focused considerable attention on
"eigenvalues" and "eigenvectors"-their applications and their
computations. Before we give the formal definition, let us introduce
these concepts on an example.

Example. Consider the matrix

Consider the three column matrices

We have

In other words, we have

Next consider the matrix P for which the columns are C1, C2, and C3,
i.e.,

We have det(P) = 84. So this matrix is invertible. Easy calculations give

Next we evaluate the matrix P-1AP. We leave the details to the reader to
check that we have
In other words, we have

Using the matrix multiplication, we obtain

which implies that A is similar to a diagonal matrix. In particular, we


have

for . Note that it is almost impossible to find A75 directly
from the original form of A.

This example is so rich in conclusions that many questions impose


themselves in a natural way. For example, given a square matrix A, how
do we find column matrices which have similar behaviors as the above
ones? In other words, how do we find these column matrices which will
help find the invertible matrix P such that P-1AP is a diagonal matrix?

From now on, we will call column matrices vectors. So the above
column matrices C1, C2, and C3 are now vectors. We have the following
definition.

Definition. Let A be a square matrix. A non-zero vector C is called an


eigenvector of A if and only if there exists a number (real or complex)
such that

If such a number exists, it is called an eigenvalue of A. The vector C


is called eigenvector associated to the eigenvalue .

Remark. The eigenvector C must be non-zero since we have

for any number .

Example. Consider the matrix

We have seen that

where

So C1 is an eigenvector of A associated to the eigenvalue 0. C2 is an


eigenvector of A associated to the eigenvalue -4 while C3 is an
eigenvector of A associated to the eigenvalue 3.

It may be interesting to know whether we found all the eigenvalues of A


in the above example. On the next page, we will discuss this question as
well as how to find the eigenvalues of a square matrix.

Computation of Eigenvalues

For a square matrix A of order n, the number is an eigenvalue if and


only if there exists a non-zero vector C such that

Using the matrix multiplication properties, we obtain

This is a linear system for which the matrix coefficient is . We


also know that this system has a unique solution if and only if the
matrix coefficient is invertible, i.e. . Since the zero-vector is a
solution and C is not the zero vector, we must have

Example. Consider the matrix

The equation translates into

which is equivalent to the quadratic equation

Solving this equation leads to

In other words, the matrix A has only two eigenvalues.

In general, for a square matrix A of order n, the equation

will give the eigenvalues of A. This equation is called the characteristic


equation or characteristic polynomial of A. It is a polynomial function
in of degree n. So we know that this equation will not have more than

n roots or solutions. So a square matrix A of order n will not have more
than n eigenvalues.
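This computation is easy to mirror in numpy on a hypothetical matrix (our own example): `np.poly` returns the coefficients of the characteristic polynomial, and its roots are the eigenvalues.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # hypothetical 2x2 matrix
coeffs = np.poly(A)                      # characteristic polynomial
# For this A the polynomial is t^2 - 4t + 3, i.e. coeffs is [1, -4, 3].
roots = np.sort(np.roots(coeffs).real)
assert np.allclose(roots, [1.0, 3.0])
# Same answer from the dedicated eigenvalue routine:
assert np.allclose(np.sort(np.linalg.eigvals(A).real), [1.0, 3.0])
```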

Example. Consider the diagonal matrix

Its characteristic polynomial is

So the eigenvalues of D are a, b, c, and d, i.e. the entries on the diagonal.

This result is valid for any diagonal matrix of any size. So depending on
the values you have on the diagonal, you may have one eigenvalue, two
eigenvalues, or more. Anything is possible.

Remark. It is quite amazing to see that any square matrix A has the
same eigenvalues as its transpose AT because

For any square matrix of order 2, A, where

the characteristic polynomial is given by the equation

t^2 - (a + d) t + (ad - bc) = 0
The number (a+d) is called the trace of A (denoted tr(A)), and clearly
the number (ad-bc) is the determinant of A. So the characteristic
polynomial of A can be rewritten as

t^2 - tr(A) t + det(A) = 0
Let us evaluate the matrix

B = A2 - tr(A) A + det(A) I2.

We have

We leave the details to the reader to check that

In other words, we have

This equation is known as the Cayley-Hamilton theorem. It is true for
any square matrix A of any order, i.e.

p(A) = 0

where p is the characteristic polynomial of A.
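The order-2 computation above can be checked numerically; a sketch on a hypothetical matrix (our own example):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # hypothetical matrix
tr = np.trace(A)
det = np.linalg.det(A)
# Cayley-Hamilton for order 2: A^2 - tr(A) A + det(A) I2 = 0
B = A @ A - tr * A + det * np.eye(2)
assert np.allclose(B, 0.0)
```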

We have some properties of the eigenvalues of a matrix.

Theorem. Let A be a square matrix of order n. If is an eigenvalue of


A, then:

1.

is an eigenvalue of Am, for

2.

If A is invertible, then is an eigenvalue of A-1.

3.

A is not invertible if and only if is an eigenvalue of A.

4.

If is any number, then is an eigenvalue of .

5.

If A and B are similar, then they have the same characteristic


polynomial (which implies they also have the same eigenvalues).

Computation of Eigenvectors

Let A be a square matrix of order n and one of its eigenvalues. Let X


be an eigenvector of A associated to . We must have

This is a linear system for which the matrix coefficient is . Since


the zero-vector is a solution, the system is consistent. In fact, we will
see on a different page that the structure of the solution set of this
system is very rich. On this page, we basically discuss how to find the
solutions.

Remark. It is quite easy to notice that if X is a vector which satisfies


, then the vector Y = c X (for any arbitrary number c) satisfies
the same equation, i.e. . In other words, if we know that X is
an eigenvector, then cX is also an eigenvector associated to the same
eigenvalue.

Let us start with an example.

Example. Consider the matrix

First we look for the eigenvalues of A. These are given by the

characteristic equation , i.e.

If we develop this determinant using the third column, we obtain

Using easy algebraic manipulations, we get

which implies that the eigenvalues of A are 0, -4, and 3.
Next we look for the eigenvectors.

1.

Case : The associated eigenvectors are given by the linear


system

which may be rewritten by

Many ways may be used to solve this system. The third equation is
identical to the first. Since, from the second equation, we have y =
6x, the first equation reduces to 13x + z = 0. So this system is
equivalent to

So the unknown vector X is given by

Therefore, any eigenvector X of A associated to the eigenvalue 0 is
given by

where c is an arbitrary number.

2.

Case : The associated eigenvectors are given by the linear


system

which may be rewritten by

In this case, we will use elementary operations to solve it. First we

consider the augmented matrix , i.e.

Then we use elementary row operations to reduce it to an
upper-triangular form. First we interchange the first two rows
to get

Next, we use the first row to eliminate the 5 and 6 on the first
column. We obtain

If we cancel the 8 and 9 from the second and third row, we obtain

Finally, we subtract the second row from the third to get

Next, we set z = c. From the second row, we get y = 2z = 2c. The
first row will imply x = -2y+3z = -c. Hence

Therefore, any eigenvector X of A associated to the eigenvalue -4


is given by

where c is an arbitrary number.

3.

Case : The details for this case will be left to the reader.
Using similar ideas as the one described above, one may easily
show that any eigenvector X of A associated to the eigenvalue 3 is
given by

where c is an arbitrary number.

Remark. In general, the eigenvalues of a matrix are not all distinct from
each other (see the page on the eigenvalues for more details). In the next
two examples, we discuss this problem.

Example. Consider the matrix

The characteristic equation of A is given by

Hence the eigenvalues of A are -1 and 8. For the eigenvalue 8, it is easy


to show that any eigenvector X is given by

where c is an arbitrary number. Let us focus on the eigenvalue -1. The


associated eigenvectors are given by the linear system

which may be rewritten by

Clearly, the third equation is identical to the first one which is also a
multiple of the second equation. In other words, this system is equivalent
to the system reduced to one equation

2x+y + 2z= 0.

To solve it, we need to fix two of the unknowns and deduce the third

one. For example, if we set and , we obtain .


Therefore, any eigenvector X of A associated to the eigenvalue -1 is
given by

In other words, any eigenvector X of A associated to the eigenvalue -1 is


a linear combination of the two eigenvectors

Example. Consider the matrix

The characteristic equation is given by

Hence the matrix A has one eigenvalue, i.e. -3. Let us find the associated
eigenvectors. These are given by the linear system

which may be rewritten by

This system is equivalent to the one equation-system

x - y = 0.

So if we set x = c, then any eigenvector X of A associated to the


eigenvalue -3 is given by

Let us summarize what we did in the above examples.

Summary: Let A be a square matrix. Assume is an eigenvalue of A.


In order to find the associated eigenvectors, we do the following steps:

1.

Write down the associated linear system

2.

Solve the system.

3.

Rewrite the unknown vector X as a linear combination of known


vectors.
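Step 2 amounts to computing the null space of A - λI. A numerical sketch (our own, via the SVD; the basis vectors numpy returns are normalized, so they differ from hand-computed ones only by scaling):

```python
import numpy as np

def eigenspace(A, lam, tol=1e-10):
    """Columns form a basis of the solution set of (A - lam I) X = 0."""
    M = np.asarray(A, dtype=float) - lam * np.eye(len(A))
    _, s, Vt = np.linalg.svd(M)
    rank = int((s > tol).sum())
    return Vt[rank:].T        # right singular vectors of the null space

A = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical matrix, eigenvalues 1 and 3
V = eigenspace(A, 3.0)
assert V.shape == (2, 1)                      # geometric multiplicity 1
assert np.allclose(np.array(A) @ V, 3.0 * V)  # A X = 3 X
```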

The above examples assume that the eigenvalue is a real number. So
one may wonder whether every eigenvalue is always real. In general,
this is not the case except for symmetric matrices. The proof of this is
very technical. For square matrices of order 2, however, the proof is
quite easy. Let us give it here for the sake of completeness.
Consider the symmetric square matrix

Its characteristic equation is given by

This is a quadratic equation. The nature of its roots (which are the
eigenvalues of A) depends on the sign of the discriminant

Using algebraic manipulations, we get

Therefore, is a positive number which implies that the eigenvalues of


A are real numbers.

Remark. Note that the matrix A will have one eigenvalue, i.e. one
double root, if and only if . But this is possible only if a=c and
b=0. In other words, we have

A = a I2.

The Case of Complex Eigenvalues

First let us convince ourselves that there exist matrices with complex
eigenvalues.

Example. Consider the matrix

The characteristic equation is given by

This quadratic equation has complex roots given by

Therefore the matrix A has only complex eigenvalues.

The trick is to treat the complex eigenvalue as a real one, meaning we
deal with it as a number and do the normal calculations for the
eigenvectors. Let us see how it works on the above example.

We will do the calculations for . The associated eigenvectors


are given by the linear system

A X = (1+2i) X

which may be rewritten as

In fact the two equations are identical since (2+2i)(2-2i) = 8. So the
system reduces to one equation

(1-i)x - y = 0.

Set x=c, then y = (1-i)c. Therefore, we have

where c is an arbitrary number.

Remark. It is clear that one should expect to have complex entries in the
eigenvectors.

We have seen that (1-2i) is also an eigenvalue of the above matrix. Since
the entries of the matrix A are real, then one may easily show that if is
a complex eigenvalue, then its conjugate is also an eigenvalue.
Moreover, if X is an eigenvector of A associated to , then the vector
, obtained from X by taking the complex-conjugate of the entries of
X, is an eigenvector associated to . So the eigenvectors of the above
matrix A associated to the eigenvalue (1-2i) are given by

where c is an arbitrary number.

Let us summarize what we did in the above example.

Summary: Let A be a square matrix. Assume is a complex eigenvalue


of A. In order to find the associated eigenvectors, we do the following
steps:

1.

Write down the associated linear system

2.

Solve the system. The entries of X will be complex numbers.

3.

Rewrite the unknown vector X as a linear combination of known


vectors with complex entries.

4.

If A has real entries, then the conjugate is also an eigenvalue.


The associated eigenvectors are given by the same equation found
in 3, except that we should take the conjugate of the entries of the
vectors involved in the linear combination.

In general, it is normal to expect that a square matrix with real entries
may still have complex eigenvalues. One may wonder if there exists a
class of matrices with only real eigenvalues. This is the case for
symmetric matrices: the proof for square matrices of order 2 was given
above, and the general proof, which is very technical, will be discussed
in another page.

Diagonalization

When we introduced eigenvalues and eigenvectors, we wondered when
a square matrix is similar to a diagonal matrix. In other words, given a
square matrix A, does a diagonal matrix D exist such that ? (i.e.,
there exists an invertible matrix P such that A = P-1DP)

In general, some matrices are not similar to diagonal matrices. For


example, consider the matrix

Assume there exists a diagonal matrix D such that A = P-1DP. Then we


have

i.e. is similar to . So they have the same characteristic
equation. Hence A and D have the same eigenvalues. Since the
eigenvalues of D are the numbers on the diagonal, and the only
eigenvalue of A is 2, we must have

D = 2 I2.
In this case, we must have A = P-1DP = 2 I2, which is not the case.
Therefore, A is not similar to a diagonal matrix.

Definition. A matrix is diagonalizable if it is similar to a diagonal


matrix.

Remark. In a previous page, we have seen that the matrix

has three different eigenvalues. We also showed that A is


diagonalizable. In fact, there is a general result along these lines.

Theorem. Let A be a square matrix of order n. Assume that A has n


distinct eigenvalues. Then A is diagonalizable. Moreover, if P is the
matrix whose columns C1, C2, ..., and Cn are the n eigenvectors of A,
then the matrix P-1AP is a diagonal matrix.
Theorem. Let A be a square matrix of order n. In order to find out
whether A is diagonalizable, we do the following steps:

1.

Write down the characteristic polynomial

2.

Factorize . In this step, we should be able to get

where the , , may be real or complex. For every i, the
power ni is called the (algebraic) multiplicity of the eigenvalue

3.

For every eigenvalue, find the associated eigenvectors. For

example, for the eigenvalue , the eigenvectors are given by the


linear system

Then solve it. We should find the unknown vector X as a linear
combination of vectors, i.e.

where , are arbitrary numbers. The integer mi is

called the geometric multiplicity of .

4.

If for every eigenvalue the algebraic multiplicity is equal to the


geometric multiplicity, then we have

which implies that if we collect the eigenvectors Cj obtained in 3
for all the eigenvalues, we get exactly n vectors. Set P to be the
square matrix of order n for which the column vectors are the
eigenvectors Cj. Then P is invertible and

is a diagonal matrix with diagonal entries equal to the eigenvalues
of A. The position of the vectors Cj in P is identical to the position
of the associated eigenvalue on the diagonal of D. This identity
implies that A is similar to D. Therefore, A is diagonalizable.

Remark. If the algebraic multiplicity ni of the eigenvalue is


equal to 1, then obviously we have mi = 1. In other words, ni = mi.

5.

If for some eigenvalue the algebraic multiplicity is not equal to the


geometric multiplicity, then A is not diagonalizable.
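The whole procedure is what `np.linalg.eig` packages up: its eigenvector columns form P, and when A is diagonalizable, P-1AP is diagonal. A sketch on a hypothetical matrix with distinct eigenvalues:

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])  # hypothetical; eigenvalues 2, 3
w, P = np.linalg.eig(A)                 # columns of P are eigenvectors
D = np.diag(w)
assert np.allclose(np.linalg.inv(P) @ A @ P, D)
# Diagonalization makes powers cheap: A^n = P D^n P^-1
A5 = P @ np.diag(w ** 5) @ np.linalg.inv(P)
assert np.allclose(A5, np.linalg.matrix_power(A, 5))
```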

Example. Consider the matrix

In order to find out whether A is diagonalizable, let us follow the steps


described above.

1.

The characteristic polynomial of A is

So -1 is an eigenvalue with multiplicity 2 and -2 with multiplicity
1.

2.

In order to find out whether A is diagonalizable, we need only
concentrate our attention on the eigenvalue -1. Indeed, the
eigenvectors associated to -1, are given by the system

This system reduces to the equation -y + z = 0. Set and ,


then we have

So the geometric multiplicity of -1 is 2, the same as its algebraic


multiplicity. Therefore, the matrix A is diagonalizable. In order to
find the matrix P we need to find an eigenvector associated to -2.
The associated system is

which reduces to the system

Set , then we have

Set

Then

But if we set

then

We have seen that if A and B are similar, then An can be expressed


easily in terms of Bn. Indeed, if we have A = P-1BP, then we have An
= P-1BnP. In particular, if D is a diagonal matrix, Dn is easy to
evaluate. This is one application of the diagonalization. In fact, the
above procedure may be used to find the square root and cubic
root of a matrix. Indeed, consider the matrix above

Set

then

Hence A = P D P-1. Set

Then we have

B3 = A.

In other words, B is a cubic root of A.
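The cube-root construction can be sketched numerically on a hypothetical symmetric A with positive eigenvalues (so real cube roots of the diagonal entries exist):

```python
import numpy as np

A = np.array([[6.0, -2.0], [-2.0, 6.0]])  # hypothetical; eigenvalues 4, 8
w, P = np.linalg.eig(A)
# B = P D^(1/3) P^-1 is a cube root of A = P D P^-1
B = P @ np.diag(np.cbrt(w)) @ np.linalg.inv(P)
assert np.allclose(B @ B @ B, A)
```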

CHAPTER 8: SYSTEMS OF LINEAR EQUATIONS

System of Equations: An Introduction

Many books on linear algebra will introduce matrices via systems of


linear equations. We tried a different approach. We hope this way you
will appreciate matrices as a powerful tool, useful not only for solving
linear systems of equations. Basically, the problem of finding unknowns
linked to each other via equations is called a system of equations. For
example,

and

are systems of two equations with two unknowns (x and y), while

is a system of two equations with three unknowns (x, y, and z).

These systems of equations occur naturally in many real life problems.


For example, consider a nutritious drink which consists of whole egg,
milk, and orange juice. The food energy and protein for each of the
ingredients are given by the table:

A natural question to ask is how much of each ingredient do we need to


produce a drink of 540 calories and 25 grams of protein. In order to

answer that, let x be the number of eggs, y the amount of milk (in cups),
and z the amount of orange juice (in cups). Then we need to have

The task of solving a system consists of finding the unknowns, here x,
y, and z. A solution is a set of numbers that, once substituted for the
unknowns, satisfies the equations of the system. For example, (2,1,2)
and (0.325, 2.25, 1.4) are solutions to the system above.

The fundamental problem associated to any system is to find all the


solutions. One way is to study the structure of its set of solutions which,
in some cases, may help finding the solutions. Indeed, for example, in
order to find the solutions to a linear system, it is enough to find just a
few of them. This is possible because of the rich structure of the set of
solutions.

Systems of Linear Equations: Gaussian Elimination

It is quite hard to solve non-linear systems of equations, while linear


systems are quite easy to study. There are numerical techniques which
help to approximate nonlinear systems with linear ones in the hope that

the solutions of the linear systems are close enough to the solutions of
the nonlinear systems. We will not discuss this here. Instead, we will
focus our attention on linear systems.

For the sake of simplicity, we will restrict ourselves to three, at most


four, unknowns. The reader interested in the case of more unknowns
may easily extend the following ideas.

Definition. The equation

ax+by+cz+dw=h

where a, b, c, d, and h are known numbers, while x, y, z, and w are


unknown numbers, is called a linear equation. If h = 0, the linear
equation is said to be homogeneous. A linear system is a set of linear
equations and a homogeneous linear system is a set of homogeneous
linear equations.

For example,

and

are linear systems, while

is a nonlinear system (because of y2). The system

is a homogeneous linear system.

Matrix Representation of a Linear System

Matrices are helpful in rewriting a linear system in a very simple form.


The algebraic properties of matrices may then be used to solve systems.
First, consider the linear system

Set the matrices

Using matrix multiplications, we can rewrite the linear system above as


the matrix equation

As you can see, this is far nicer than the equations. But sometimes it is
worth solving the system directly without going through the matrix
form. The matrix A is called the matrix coefficient of the linear system.
The matrix C is called the nonhomogeneous term. When , the
linear system is homogeneous. The matrix X is the unknown matrix. Its
entries are the unknowns of the linear system. The augmented matrix
associated with the system is the matrix [A|C], where

In general if the linear system has n equations with m unknowns, then
the matrix coefficient will be an nxm matrix and the augmented matrix an
nx(m+1) matrix. Now we turn our attention to the solutions of a system.

Definition. Two linear systems with n unknowns are said to be


equivalent if and only if they have the same set of solutions.

This definition is important since the idea behind solving a system is to


find an equivalent system which is easy to solve. You may wonder how
we will come up with such a system? Easy, we do that through
elementary operations. Indeed, it is clear that if we interchange two
equations, the new system is still equivalent to the old one. If we
multiply an equation by a nonzero number, we obtain a new system
still equivalent to the old one. And finally, replacing one equation with the
sum of two equations, we again obtain an equivalent system. These
operations are called elementary operations on systems. Let us see how
it works in a particular case.

Example. Consider the linear system

The idea is to keep the first equation and work on the last two. In doing
that, we will try to kill one of the unknowns and solve for the other two.
For example, if we keep the first and second equation, and subtract the
first one from the last one, we get the equivalent system

Next we keep the first and the last equation, and we subtract the first
from the second. We get the equivalent system

Now we focus on the second and the third equation. We repeat the same
procedure. Try to kill one of the two unknowns (y or z). Indeed, we keep
the first and second equation, and we add the second to the third after
multiplying it by 3. We get

This obviously implies z = -2. From the second equation, we get y = -2,
and finally from the first equation we get x = 4. Therefore the linear
system has one solution

Going from the last equation to the first while solving for the unknowns
is called backsolving.

Keep in mind that linear systems for which the matrix coefficient is
upper-triangular are easy to solve. This is particularly true, if the matrix
is in echelon form. So the trick is to perform elementary operations to
transform the initial linear system into another one for which the
coefficient matrix is in echelon form.
Using our knowledge about matrices, is there any way we can rewrite
what we did above in matrix form which will make our notation (or
representation) easier? Indeed, consider the augmented matrix

Let us perform some elementary row operations on this matrix. Indeed,


if we keep the first and second row, and subtract the first one from the
last one we get

Next we keep the first and the last rows, and we subtract the first from
the second. We get

Then we keep the first and second row, and we add the second to the
third after multiplying it by 3 to get

This is a triangular matrix which is not in echelon form. The linear


system for which this matrix is an augmented one is

As you can see we obtained the same system as before. In fact, we


followed the same elementary operations performed above. In every step
the new matrix was exactly the augmented matrix associated to the new

system. This shows that instead of writing the systems over and over
again, it is easy to play around with the elementary row operations and
once we obtain a triangular matrix, write the associated linear system
and then solve it. This is known as Gaussian Elimination. Let us
summarize the procedure:

Gaussian Elimination. Consider a linear system.

1.

Construct the augmented matrix for the system;

2.

Use elementary row operations to transform the augmented matrix


into a triangular one;

3.

Write down the new linear system for which the triangular matrix
is the associated augmented matrix;

4.

Solve the new system. You may need to assign some parametric
values to some unknowns, and then apply the method of back
substitution to solve the new system.
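The four steps above can be sketched in code. This is a minimal illustration (the helper name and the sample system are chosen for the sketch, not taken from the worked examples); it performs forward elimination with partial pivoting and then backsolves.

```python
# A sketch of Gaussian elimination with back substitution for a
# square system Ax = b. The sample system below is illustrative only.

def gaussian_elimination(A, b):
    n = len(A)
    # Step 1: build the augmented matrix [A | b].
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    # Step 2: elementary row operations to reach triangular form.
    for k in range(n):
        # Partial pivoting: bring the largest available pivot into place.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    # Steps 3-4: backsolve, last equation first.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[2.0, 1.0, -1.0],
     [-3.0, -1.0, 2.0],
     [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
print(gaussian_elimination(A, b))   # approximately [2.0, 3.0, -1.0]
```

The sketch assumes the system has a unique solution; a pivot of zero even after row swapping would mean the coefficient matrix is singular.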

Example. Solve the following system via Gaussian elimination

The augmented matrix is

We use elementary row operations to transform this matrix into a


triangular one. We keep the first row and use it to produce all zeros
elsewhere in the first column. We have

Next we keep the first and second row and try to have zeros in the
second column. We get

Next we keep the first three rows. We add the last one to the third to get

This is a triangular matrix. Its associated system is

Clearly we have v = 1. Set z=s and w=t, then we have

The first equation implies

x=2+ y+ z-w- v.

Using algebraic manipulations, we get

x=- - s - t.

Putting everything together, we have

Example. Use Gaussian elimination to solve the linear system

The associated augmented matrix is

We keep the first row and subtract the first row multiplied by 2 from the
second row. We get

This is a triangular matrix. The associated system is

Clearly the second equation implies that this system has no solution.
Therefore this linear system has no solution.
Definition. A linear system is called inconsistent if it does not have a
solution; in other words, its set of solutions is empty. Otherwise the
linear system is called consistent.

Following the example above, we see that if we perform elementary row
operations on the augmented matrix of the system and obtain a matrix with
one of its rows equal to (0  0  ...  0 | b), where b ≠ 0, then the system is
inconsistent.
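The test just described, a reduced row whose coefficients are all zero but whose right-hand side is not, can be sketched as follows (the helper name and the sample matrix are illustrative):

```python
# Detect an inconsistent row [0, 0, ..., 0 | b] with b != 0 in a
# row-reduced augmented matrix.

def is_inconsistent_row(row, tol=1e-12):
    *coeffs, rhs = row
    return all(abs(c) < tol for c in coeffs) and abs(rhs) > tol

reduced = [[1.0, -2.0, 3.0],
           [0.0, 0.0, 5.0]]   # second row says 0*x + 0*y = 5
print(any(is_inconsistent_row(r) for r in reduced))   # True
```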

SYSTEMS OF EQUATIONS in TWO VARIABLES

A system of equations is a collection of two or more equations with the


same set of unknowns. In solving a system of equations, we try to find
values for each of the unknowns that will satisfy every equation in the
system.
The equations in the system can be linear or non-linear. This tutorial
reviews systems of linear equations.

Two simultaneous equations in x and y

ax + by = p
cx + dy = q

To solve, use the determinant formulas:

x = (pd - bq)/(ad - bc),  y = (aq - pc)/(ad - bc),  provided ad - bc ≠ 0.
Three simultaneous equations in x, y and z

ax + by + cz = p
dx + ey + fz = q
gx + hy + iz = r

To solve, use Cramer's rule:

x = Dx/D,  y = Dy/D,  z = Dz/D,

where D is the determinant of the coefficient matrix and Dx, Dy, Dz are the
determinants obtained by replacing the x, y, or z column of coefficients
with the constants p, q, r. This requires D ≠ 0.
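The determinant formulas for the 3×3 system above can be sketched as follows; the helper names are illustrative, and the formulas are valid only when the coefficient determinant is nonzero (Cramer's rule):

```python
# Cramer's rule for ax + by + cz = p, dx + ey + fz = q, gx + hy + iz = r.

def det3(m):
    # 3x3 determinant, expanded along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b, c, p, d, e, f, q, g, h, i, r):
    D = det3([[a, b, c], [d, e, f], [g, h, i]])
    Dx = det3([[p, b, c], [q, e, f], [r, h, i]])   # x-column replaced by p, q, r
    Dy = det3([[a, p, c], [d, q, f], [g, r, i]])   # y-column replaced
    Dz = det3([[a, b, p], [d, e, q], [g, h, r]])   # z-column replaced
    return Dx / D, Dy / D, Dz / D

# Illustrative system: x + y + z = 6, 2y + 5z = -4, 2x + 5y - z = 27.
print(solve3(1, 1, 1, 6, 0, 2, 5, -4, 2, 5, -1, 27))   # (5.0, 3.0, -2.0)
```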
A problem can be expressed in narrative form or the problem can be


expressed in algebraic form.

Let's start with an example stated in narrative form. We'll convert it to an


equivalent equation in algebraic form, and then we will solve it.
Example 1:

A total of $12,000 is invested in two funds paying 9% and 11% simple


interest. If the yearly interest is $1,180, how much of the $12,000 is
invested at each rate?

Before you work this problem, you must know the definition of simple
interest. Simple interest is calculated by multiplying the amount
invested by the interest rate.

Solution:

We have two unknowns: the amount of money invested at 9% and the


amount of money invested at 11%. Our objective is to find these two
numbers.
Sentence (1) ''A total of $12,000 is invested in two funds paying 9% and
11% simple interest.'' can be restated as (The amount of money invested
at 9%) + (The amount of money invested at 11%) = $12,000.

Sentence (2) ''If the yearly interest is $1,180, how much of the $12,000
is invested at each rate?'' can be restated as (The amount of money
invested at 9%) × 9% + (The amount of money invested at 11%) × 11% =
total interest of $1,180.
It is going to get tiresome writing the two phrases (The amount of
money invested at 9%) and (The amount of money invested at 11%)
over and over again. So let's write them in shortcut form. Call the phrase
(The amount of money invested at 9%) by the symbol x and call the
phrase (The amount of money invested at 11%) by the symbol y.
Let's rewrite sentences (1) and (2) in shortcut form:

(1)  x + y = 12,000
(2)  0.09x + 0.11y = 1,180

We have converted a narrative statement of the problem to an
equivalent algebraic statement of the problem. Let's solve this
system of equations. A system of linear equations can be solved
several different ways:

THE METHOD OF SUBSTITUTION:


The method of substitution involves five steps:

Step 1: Solve for y in equation (1).

Step 2: Substitute this value for y in equation (2). This will change
equation (2) to an equation with just one variable, x.

Step 3: Solve for x in the translated equation (2).

Step 4: Substitute this value of x in the y equation you obtained in Step


1.

Step 5: Check your answers by substituting the values of x and y in each


of the original equations. If, after the substitution, the left side of the
equation equals the right side of the equation, you know that your
answers are correct.
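The five steps can be carried out numerically for the stated system, x + y = 12,000 and 0.09x + 0.11y = 1,180; the intermediate algebra appears as comments (a sketch, with the arithmetic subject to tiny floating-point residues):

```python
# Substitution method for x + y = 12000, 0.09x + 0.11y = 1180.

# Step 1: solve equation (1) for y:  y = 12000 - x.
# Step 2: substitute into (2):  0.09x + 0.11(12000 - x) = 1180.
# Step 3: solve for x:  (0.09 - 0.11)x = 1180 - 0.11 * 12000.
x = (1180 - 0.11 * 12000) / (0.09 - 0.11)
# Step 4: substitute this x back into the Step 1 expression for y.
y = 12000 - x
print(x, y)   # approximately 7000 and 5000
# Step 5: check both original equations.
assert abs(x + y - 12000) < 1e-6
assert abs(0.09 * x + 0.11 * y - 1180) < 1e-6
```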

The Method of Elimination:
The process of elimination involves five steps:
In a two-variable problem rewrite the equations so that when the
equations are added, one of the variables is eliminated, and then solve
for the remaining variable.

Step 1: Change equation (1) by multiplying equation (1) by –0.09 to
obtain a new and equivalent equation (1): –0.09x – 0.09y = –1,080.

Step 2: Add new equation (1) to equation (2) to obtain equation (3):
0.02y = 100, so y = 5,000.

Step 3: Substitute y = 5,000 in equation (1) and solve for x: x = 7,000.

Step 4: Check your answers in equation (2). Does
0.09(7,000) + 0.11(5,000) = 1,180? Yes: 630 + 550 = 1,180.

The Method of Matrices:

This method is essentially a shortcut for the method of elimination.

Rewrite equations (1) and (2) without the variables and operators. The
left column contains the coefficients of the x's, the middle column
contains the coefficients of the y's, and the right column contains the
constants.

The objective is to reorganize the original matrix into one that looks like

1  0  a
0  1  b

where a and b are the solutions to the system.


Step 1. Manipulate the matrix so that the number in cell 11 (row 1-col 1)
is 1. In this case, we don't have to do anything. The number 1 is already
in the cell.
Step 2: Manipulate the matrix so that the number in cell 21 is 0. To do
this we rewrite the matrix by keeping row 1 and creating a new row 2 by
adding -0.09 x row 1 to row 2.

Step 3: Manipulate the matrix so that the cell 22 is 1. Do this by
multiplying row 2 by 50.

Step 4: Manipulate the matrix so that cell 12 is 0. Do this by adding
–1 × row 2 to row 1.
You can read the answers off the matrix as x = $7,000 and y = $5,000.
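The row operations of Steps 1 through 4 can be replayed numerically on the augmented matrix; apart from tiny floating-point residues, the last column ends up holding the two answers (a sketch):

```python
# Matrix method on the augmented matrix [[1, 1, 12000], [0.09, 0.11, 1180]].

row1 = [1.0, 1.0, 12000.0]
row2 = [0.09, 0.11, 1180.0]

# Step 2: row2 <- row2 + (-0.09) * row1, zeroing cell 21.
row2 = [r2 - 0.09 * r1 for r1, r2 in zip(row1, row2)]
# Step 3: row2 <- 50 * row2, making cell 22 (which is now 0.02) equal to 1.
row2 = [50 * r for r in row2]
# Step 4: row1 <- row1 - row2, zeroing cell 12.
row1 = [r1 - r2 for r1, r2 in zip(row1, row2)]

print(row1, row2)   # last column holds x (about 7000) and y (about 5000)
```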

SYSTEMS OF EQUATIONS IN THREE VARIABLE

It is often desirable or even necessary to use more than one variable to


model a situation in a field such as business, science, psychology,
engineering, education, and sociology, to name a few. When this is the
case, we write and solve a system of equations in order to answer
questions about the situation.

If a system of linear equations has at least one solution, it is consistent.
If the system has no solutions, it is inconsistent. If the system has an
infinite number of solutions, it is dependent; otherwise it is
independent.

A linear equation in three variables is an equation equivalent to the
equation

ax + by + cz = d,

where a, b, c, and d are real numbers and a, b, and c are not all 0.

Example 1:
John inherited $25,000 and invested part of it in a money market
account, part in municipal bonds, and part in a mutual fund. After one
year, he received a total of $1,620 in simple interest from the three
investments. The money market paid 6% annually, the bonds paid 7%
annually, and the mutual fund paid 8% annually. There was $6,000
more invested in the bonds than the mutual funds. Find the amount John
invested in each category.
There are three unknowns:
1 : The amount of money invested in the money market account.
2 : The amount of money invested in municipal bonds.
3 : The amount of money invested in a mutual fund.

Let's rewrite the paragraph that asks the question we are to answer.

[The amount of money invested in the money market account] + [The
amount of money invested in municipal bonds] + [The amount of money
invested in a mutual fund] = $25,000.

The 6% interest on [the amount of money invested in the money market
account] + the 7% interest on [the amount of money invested in
municipal bonds] + the 8% interest on [the amount of money invested in
a mutual fund] = $1,620.

[The amount of money invested in municipal bonds] – [the amount of
money invested in a mutual fund] = $6,000.

It is going to get boring if we keep repeating the phrases


1 : The amount of money invested in the money market account.
2 : The amount of money invested in municipal bonds.
3 : The amount of money invested in a mutual fund.

Let's create a shortcut by letting symbols represent these phrases. Let

x = The amount of money invested in the money market account.


y = The amount of money invested in municipal bonds.

z = The amount of money invested in a mutual fund.

in the three sentences, and then rewrite them.


The sentence [The amount of money invested in the money market
account] + [The amount of money invested in municipal bonds] + [The
amount of money invested in a mutual fund] = $25,000 can now be
written as

x + y + z = 25,000.

The sentence The 6% interest on [the amount of money invested in the
money market account] + the 7% interest on [the amount of money
invested in municipal bonds] + the 8% interest on [the amount of
money invested in a mutual fund] = $1,620 can now be written as

0.06x + 0.07y + 0.08z = 1,620.

The sentence [The amount of money invested in municipal bonds] – [the
amount of money invested in a mutual fund] = $6,000 can now be
written as

y – z = 6,000.

We have converted the problem from one described by words to one that
is described by three equations.

We are going to show you how to solve this system of equations three
different ways:
1) Substitution,
2) Elimination,
3) Matrices.
SUBSTITUTION:
The process of substitution involves several steps:
Step 1: Solve for one of the variables in one of the equations. It
makes no difference which equation and which variable you choose.

Let's solve for y in equation (3) because that equation only has two
variables: y = z + 6,000.

Step 2: Substitute this value for y in equations (1) and (2). This will
change equations (1) and (2) to equations in the two variables x and z.
Call the changed equations (4) and (5).

(4)  x + 2z = 19,000
(5)  0.06x + 0.15z = 1,200

Step 3: Solve for x in equation (4): x = 19,000 – 2z.

Step 4: Substitute this value of x in equation (5). This will give you
an equation in one variable, z.

Step 5: Solve for z: z = 2,000.

Step 6: Substitute this value of z in equation (4) and solve for x:
x = 15,000.

Step 7: Substitute the values of x and z in equation (1) and solve
for y: y = 8,000.

The solutions: $15,000 is invested in the money market account, $8,000 is
invested in the municipal bonds, and $2,000 is invested in the mutual fund.


Step 8: Check the solutions:

15,000 + 8,000 + 2,000 = 25,000?  Yes.

0.06(15,000) + 0.07(8,000) + 0.08(2,000) = 900 + 560 + 160 = 1,620?  Yes.

8,000 – 2,000 = 6,000?  Yes.

ELIMINATION:
The process of elimination involves several steps: First you reduce three
equations to two equations with two variables, and then to one equation
with one variable.
Step 1: Decide which variable you will eliminate. It makes no
difference which one you choose. Let us eliminate x first because x is
missing from equation (3).

Step 2: Multiply both sides of equation (1) by –0.06 and then add the
transformed equation (1) to equation (2) to form equation (4).

(1) :  –0.06x – 0.06y – 0.06z = –1,500

(2) :  0.06x + 0.07y + 0.08z = 1,620

(4) :  0.01y + 0.02z = 120

Step 3: We now have two equations with two variables.

(3) :  y – z = 6,000

(4) :  0.01y + 0.02z = 120

Step 4: Multiply both sides of equation (3) by –0.01 and add to
equation (4) to create equation (5) with just one variable.

(3) :  –0.01y + 0.01z = –60

(4) :  0.01y + 0.02z = 120

(5) :  0.03z = 60

Step 5: Solve for z in equation (5): z = 2,000.

Step 6: Substitute 2,000 for z in equation (3) and solve for y: y = 8,000.

Step 7: Substitute 8,000 for y and 2,000 for z in equation (1) and
solve for x: x = 15,000.
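Assuming the system is the one stated in the problem, x + y + z = 25,000, 0.06x + 0.07y + 0.08z = 1,620, and y – z = 6,000, the elimination steps can be checked numerically (a sketch; the multipliers are assumptions consistent with that system):

```python
# Elimination for x + y + z = 25000, 0.06x + 0.07y + 0.08z = 1620,
# y - z = 6000 (the system stated in the problem).

# eq(4) = eq(2) - 0.06 * eq(1)  ->  0.01y + 0.02z = 120
# eq(5) = eq(4) - 0.01 * eq(3)  ->  0.03z = 60
z = ((1620 - 0.06 * 25000) - 0.01 * 6000) / 0.03
y = 6000 + z            # from eq(3)
x = 25000 - y - z       # from eq(1)
print(x, y, z)          # approximately 15000, 8000, 2000
# Check the interest equation.
assert abs(0.06 * x + 0.07 * y + 0.08 * z - 1620) < 1e-6
```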

Example. Solve the linear system

Answer. First note that

which implies that the matrix coefficient is invertible. So we may use
Cramer's formulas. We have

We leave the details to the reader to find

Note that it is easy to see that z=0. Indeed, the determinant which gives z
has two identical rows (the first and the last). We do encourage you to
check that the values found for x, y, and z are indeed the solution to the
given system.

Remark. Remember that Cramer's formulas are only valid for linear
systems with an invertible matrix coefficient.


CHAPTER 9: COMPLEX NUMBERS

9.1. DEFINITION

Up until now, you've been told that you can't take the square root of a
negative number. That's because you had no numbers which were
negative after you'd squared them (so you couldn't "go backwards" by
taking the square root). Every number was positive after you squared it.
So you couldn't very well square-root a negative and expect to come up
with anything sensible.

Now, however, you can take the square root of a negative number, but it
involves using a new number to do it. This new number was invented

(discovered?) around the time of the Reformation. At that time, nobody
believed that any "real world" use would be found for this new number,
other than easing the computations involved in solving certain equations,
so the new number was viewed as being a pretend number invented for
convenience's sake.

(But then, when you think about it, aren't all numbers inventions? It's not
like numbers grow on trees! They live in our heads. We made them all
up! Why not invent a new one, as long as it works okay with what we
already have?)

Anyway, this new number was called "i", standing for "imaginary",
because "everybody knew" that i wasn't "real". (That's why you couldn't
take the square root of a negative number before: you only had "real"
numbers; that is, numbers without the "i" in them.) The imaginary unit is
defined to be:

i = sqrt(–1)

Then:

i^2 = –1
Now, you may think you can do this:

sqrt(–1) × sqrt(–1) = sqrt((–1)(–1)) = sqrt(1) = 1

But this doesn't make any sense! You already have two numbers that
square to 1; namely –1 and +1. And i already squares to –1. So it's not
reasonable that i would also square to 1. This points out an important
detail: When dealing with imaginaries, you gain something (the ability
to deal with negatives inside square roots), but you also lose something
(some of the flexibility and convenient rules you used to have when
dealing with square roots). In particular, YOU MUST ALWAYS DO
THE i-PART FIRST!

• Simplify sqrt(–9).

sqrt(–9) = sqrt(9 · –1) = sqrt(9) · sqrt(–1) = 3i

(Warning: the step through the third "equals" sign is sqrt(9) · sqrt(–1),
so the i ends up outside the radical.)

• Simplify sqrt(–25).

sqrt(–25) = sqrt(25) · sqrt(–1) = 5i

• Simplify sqrt(–18).

sqrt(–18) = sqrt(9 · 2) · sqrt(–1) = 3i sqrt(2)

• Simplify –sqrt(–6).

–sqrt(–6) = –i sqrt(6)

In your computations, you will deal with i just as you would with x,
except for the fact that x^2 is just x^2, but i^2 is –1:

• Simplify 2i + 3i.

2i + 3i = (2 + 3)i = 5i

• Simplify 16i – 5i.

16i – 5i = (16 – 5)i = 11i

• Multiply and simplify (3i)(4i).

(3i)(4i) = (3·4)(i·i) = (12)(i^2) = (12)(–1) = –12

• Multiply and simplify (i)(2i)(–3i).

(i)(2i)(–3i) = (2 · –3)(i · i · i) = (–6)(i^2 · i)

= (–6)(–1 · i) = (–6)(–i) = 6i

Note this last problem. Within it, you can see that i^3 = –i, because
i^2 = –1. Continuing, we get:

i^3 = i^2 · i = (–1)(i) = –i
i^4 = i^2 · i^2 = (–1)(–1) = 1

This pattern of powers, signs, 1's, and i's is a cycle:

i^1 = i,  i^2 = –1,  i^3 = –i,  i^4 = 1,  i^5 = i,  i^6 = –1,  ...

In other words, to calculate any high power of i, you can convert it to a
lower power by taking the closest multiple of 4 that's no bigger than the
exponent and subtracting this multiple from the exponent. For example,
a common trick question on tests is something along the lines of
"Simplify i^99", the idea being that you'll try to multiply i ninety-nine
times and you'll run out of time, and the teachers will get a good giggle
at your expense in the faculty lounge. Here's how the shortcut works:

i^99 = i^(96+3) = i^((4×24)+3) = i^3 = –i

That is, i^99 = i^3, because you can just lop off the i^96. (Ninety-six is a
multiple of four, so i^96 is just 1, which you can ignore.) In other words,
you can divide the exponent by 4 (using long division), discard the
answer, and use only the remainder. This will give you the part of the
exponent that you care about. Here are a few more examples:

• Simplify i^17.

i^17 = i^(16+1) = i^((4·4)+1) = i^1 = i

• Simplify i^120.

i^120 = i^(4·30) = i^((4·30)+0) = i^0 = 1

• Simplify i^64,002.

i^64,002 = i^(64,000+2) = i^((4·16,000)+2) = i^2 = –1
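The divide-by-4, keep-the-remainder shortcut can be sketched directly (the helper name is illustrative):

```python
# Powers of i cycle with period 4: i^0 = 1, i^1 = i, i^2 = -1, i^3 = -i.

CYCLE = {0: 1, 1: 1j, 2: -1, 3: -1j}

def i_power(n):
    # Only the remainder of n divided by 4 matters.
    return CYCLE[n % 4]

print(i_power(99), i_power(17), i_power(120), i_power(64002))
# i^99 = i^3 = -i,  i^17 = i^1 = i,  i^120 = i^0 = 1,  i^64002 = i^2 = -1
```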

Now you've seen how imaginaries work; it's time to move on to complex
numbers. "Complex" numbers have two parts, a "real" part (being any
"real" number that you're used to dealing with) and an "imaginary" part
(being any number with an "i" in it). The "standard" format for complex
numbers is "a + bi"; that is, real-part first and i-part last.

The complex numbers are the field of numbers of the form x + iy, where
x and y are real numbers and i is the imaginary unit equal to the square root
of –1, i = sqrt(–1). When a single letter z = x + iy is used to denote a complex
number, it is sometimes called an "affix." In component notation, z = x + iy
can be written (x, y). The field of complex numbers includes the
field of real numbers as a subfield.

The set of complex numbers is implemented in the Wolfram Language


as Complexes. A number can then be tested to see if it is complex using
the command Element[x, Complexes], and expressions that are complex
numbers have the Head of Complex.

Complex numbers are useful abstract quantities that can be used in
calculations and result in physically meaningful solutions. However,
recognition of this fact is one that took a long time for mathematicians to
accept. For example, John Wallis wrote, "These Imaginary Quantities
(as they are commonly called) arising from the Supposed Root of a
Negative Square (when they happen) are reputed to imply that the Case
proposed is Impossible" (Wells 1986, p. 22).

Through the Euler formula, a complex number

z = x + iy    (1)

may be written in "phasor" form

z = |z| (cos θ + i sin θ) = |z| e^(iθ).    (2)

Here, |z| is known as the complex modulus (or sometimes the complex
norm) and θ is known as the complex argument or phase. In an Argand
diagram, the point z is plotted in the plane, with a circle of radius |z|
representing its complex modulus and the angle θ representing its
complex argument. Historically, the geometric representation of a
complex number as simply a point in the plane was important because
it made the whole idea of a complex number more acceptable. In
particular, "imaginary" numbers became accepted partly through their
visualization.

Unlike real numbers, complex numbers do not have a natural ordering,


so there is no analog of complex-valued inequalities. This property is not
so surprising however when they are viewed as being elements in the
complex plane, since points in a plane also lack a natural ordering.

The absolute square of z is defined by |z|^2 = z z̄, with z̄ the complex
conjugate of z, and the argument may be computed from

arg(z) = θ = tan^(–1)(y/x).    (3)

The real and imaginary parts are given by

Re(z) = x = (z + z̄)/2    (4)

Im(z) = y = (z – z̄)/(2i).    (5)

de Moivre's identity relates powers of complex numbers for real n by

z^n = |z|^n [cos(nθ) + i sin(nθ)].    (8)

A power of a complex number z to a positive integer exponent n can be
written in closed form by expanding the binomial

(x + iy)^n = Σ_(k=0..n) C(n, k) x^(n–k) (iy)^k.    (9)

The first few are explicitly

(x + iy)^2 = (x^2 – y^2) + i(2xy)    (10)

(x + iy)^3 = (x^3 – 3xy^2) + i(3x^2 y – y^3)    (11)

(x + iy)^4 = (x^4 – 6x^2 y^2 + y^4) + i(4x^3 y – 4x y^3)    (12)

(x + iy)^5 = (x^5 – 10x^3 y^2 + 5x y^4) + i(5x^4 y – 10x^2 y^3 + y^5)    (13)

(Abramowitz and Stegun 1972).

Complex addition

(x1 + iy1) + (x2 + iy2) = (x1 + x2) + i(y1 + y2),    (14)

complex subtraction

(x1 + iy1) – (x2 + iy2) = (x1 – x2) + i(y1 – y2),    (15)

complex multiplication

(x1 + iy1)(x2 + iy2) = (x1 x2 – y1 y2) + i(x1 y2 + x2 y1),    (16)

and complex division

(x1 + iy1)/(x2 + iy2) = [(x1 x2 + y1 y2) + i(x2 y1 – x1 y2)] / (x2^2 + y2^2)    (17)

can also be defined for complex numbers. Complex numbers may also
be taken to complex powers. For example, complex exponentiation
obeys

z1^(z2) = e^(z2 log z1),    (18)

where log z = ln |z| + iθ and θ is the complex argument.

Complex numbers are "binomials" of a sort, and are added, subtracted,


and multiplied in a similar way. (Division, which is further down the
page, is a bit different.) First, though, you'll probably be asked to
demonstrate that you understand the definition of complex numbers.

• Solve 3 – 4i = x + yi

Finding the answer to this involves nothing more than knowing


that two complex numbers can be equal only if their real and
imaginary parts are equal. In other words, 3 = x and –4 = y.

To simplify complex-valued expressions, you combine "like" terms and


apply the various other methods you learned for working with
polynomials.

• Simplify (2 + 3i) + (1 – 6i).

(2 + 3i) + (1 – 6i) = (2 + 1) + (3i – 6i) = 3 + (–3i) = 3 – 3i

• Simplify (5 – 2i) – (–4 – i).

(5 – 2i) – (–4 – i)

= (5 – 2i) – 1(–4 – i) = 5 – 2i – 1(–4) – 1(–i)

= 5 – 2i + 4 + i= (5 + 4) + (–2i + i)

= (9) + (–1i) = 9 – i

You may find it helpful to insert the "1" in front of the second set of
parentheses so you can better keep track of the "minus" being
multiplied through the parentheses.

• Simplify (2 – i)(3 + 4i).


(2 – i)(3 + 4i) = (2)(3) + (2)(4i) + (–i)(3) + (–i)(4i)

= 6 + 8i – 3i – 4i^2 = 6 + 5i – 4(–1)

= 6 + 5i + 4 = 10 + 5i

For the last example above, FOILing works for this kind of
multiplication, if you learned that method. But whatever method you
use, remember that multiplying and adding with complexes works just
like multiplying and adding polynomials, except that, while x^2 is just x^2,
i^2 is –1. You can use the exact same techniques for simplifying complex-
number expressions as you do for polynomial expressions, but you can
simplify even further with complexes because i^2 reduces to the number
–1.
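Python's built-in complex type (which writes j where the text writes i) can be used to check the worked sums, differences, and products above:

```python
# The worked examples, repeated with built-in complex arithmetic.

print((2 + 3j) + (1 - 6j))    # (3-3j)
print((5 - 2j) - (-4 - 1j))   # (9-1j)
print((2 - 1j) * (3 + 4j))    # (10+5j)
```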

Adding and multiplying complexes isn't too bad. It's when you work
with fractions (that is, with division) that things turn ugly. Most of the
reason for this ugliness is actually arbitrary. Remember back in
elementary school, when you first learned fractions? Your teacher would
get her panties in a wad if you used "improper" fractions. For instance,
you couldn't say " 3/2 "; you had to convert it to "1 1/2". But now that
you're in algebra, nobody cares, and you've probably noticed that
"improper" fractions are often more useful than "mixed" numbers. The
issue with complex numbers is that your professor will get his boxers in

406
a bunch if you leave imaginaries in the denominator. So how do you
handle this?

Suppose you have the following exercise:

• Simplify

This is pretty "simple", but they want me to get rid of that i


underneath, in the denominator. The 2 in the denominator is fine,
but the i has got to go. To do this, I will use the fact that i^2 = –1. If
I multiply the fraction, top and bottom, by i, then the i underneath
will vanish in a puff of negativity:

So the answer is

This was simple enough, but what if they give you something more
complicated?

• Simplify

If I multiply this fraction, top and bottom, by i, I'll get:

Since I still have an i underneath, this didn't help much. So how do
I handle this simplification? I use something called "conjugates".
The conjugate of a complex number a + bi is the same number, but
with the opposite sign in the middle: a – bi. When you multiply
conjugates, you are, in effect, multiplying to create something in
the pattern of a difference of squares:

Note that the i's disappeared, and the final result was a sum of
squares. This is what the conjugate is for, and here's how it is used:

So the answer is

In the last step, note how the fraction was split into two pieces. This is
because, technically speaking, a complex number is in two parts, the real
part and the i part. They aren't supposed to "share" the denominator. To
be sure your answer is completely correct, split the complex-valued
fraction into its two separate terms.
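The conjugate trick can be sketched as a small helper (the name divide is illustrative) and compared against the language's own complex division:

```python
# Division via the conjugate: multiply top and bottom by a - bi, so the
# denominator becomes the real number a^2 + b^2.

def divide(num, den):
    conj = den.real - den.imag * 1j        # conjugate of the denominator
    d = den.real ** 2 + den.imag ** 2      # (a + bi)(a - bi) = a^2 + b^2
    return (num * conj) / d

print(divide(3 + 2j, 4 - 1j))   # 10/17 + (11/17)i, about 0.588 + 0.647i
```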
You'll probably only use complexes in the context of solving quadratics
for their zeroes. (There are many other practical uses for complexes, but
you'll have to wait for more interesting classes like "Engineering 201" to
get to the "good stuff".)

Remember that the Quadratic Formula solves "ax^2 + bx + c = 0" for the
values of x. Also remember that this means that you are trying to find
the x-intercepts of the graph. When the Formula gives you a negative
inside the square root, you can now simplify that zero by using complex
numbers. The answer you come up with is a valid "zero" or "root" or
"solution" for "ax^2 + bx + c = 0", because, if you plug it back into the
quadratic, you'll get zero after you simplify. But you cannot graph a
complex number on the x,y-plane. So this "solution to the equation" is
not an x-intercept. You can make this connection between the Quadratic
Formula, complex numbers, and graphing:

x^2 – 2x – 3 : a positive number inside the square root; two real
solutions; two distinct x-intercepts

x^2 – 6x + 9 : zero inside the square root; one (repeated) real solution;
one (repeated) x-intercept

x^2 + 3x + 3 : a negative number inside the square root; two complex
solutions; no x-intercepts
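The three cases can be reproduced with the quadratic formula; cmath.sqrt accepts a negative discriminant, so the third case yields the two complex solutions directly (a sketch):

```python
import cmath

def quadratic_roots(a, b, c):
    # x = (-b +/- sqrt(b^2 - 4ac)) / (2a), allowing a complex square root.
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

print(quadratic_roots(1, -2, -3))   # two real solutions: 3 and -1
print(quadratic_roots(1, -6, 9))    # one repeated solution: 3
print(quadratic_roots(1, 3, 3))     # two complex solutions
```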

As an aside, you can graph complexes, but not in the x,y-plane. You
need the "complex" plane. For the complex plane, the x-axis is where
you plot the real part, and the y-axis is where you graph the imaginary
part. For instance, you would plot the complex number 3 – 2i like this:

This leads to an interesting fact: When you learned about regular ("real")
numbers, you also learned about their order (this is what you show on
the number line). But x,y-points don't come in any particular order. You
can't say that one point "comes after" another point in the same way that
you can say that one number comes after another number. For instance,
you can't say that (4, 5) "comes after" (4, 3) in the way that you can say
that 5 comes after 3. Pretty much all you can do is compare "size", and,
for complex numbers, "size" means "how far from the origin". To do
this, you use the Distance Formula, and compare which complexes are
closer to or further from the origin. This "size" concept is called "the
modulus". For instance, looking at our complex number plotted above,
its modulus is computed by using the Distance Formula:

|3 – 2i| = sqrt(3^2 + (–2)^2) = sqrt(13)

Note that all points at this distance from the origin have the same
modulus. All the points on the circle with radius sqrt(13) are viewed as
being complex numbers having the same "size" as 3 – 2i.
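The modulus-as-distance computation for 3 – 2i can be checked directly; abs() on a Python complex number computes exactly this distance:

```python
import math

z = 3 - 2j
modulus = math.hypot(z.real, z.imag)   # sqrt(3^2 + (-2)^2) = sqrt(13)
print(modulus, abs(z))                 # both are sqrt(13), about 3.6056
```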
9.2. ALGEBRA OF COMPLEX NUMBERS

We have seen that complex numbers came to be
viewed as ordered pairs of real numbers. That is, a complex number is
defined to be

z = (x, y),

where x and y are both real numbers.

where x and y are both real numbers.

The reason we say ordered pair is because we are thinking of a point
in the plane. The point (2, 3), for example, is not the same as (3, 2). The
order in which we write x and y in the equation z = (x, y) makes a
difference. Clearly, then, two complex numbers are equal if and only if
their x coordinates are equal and their y coordinates are equal. In other words,

(x1, y1) = (x2, y2)  iff  x1 = x2 and y1 = y2.

(Throughout this text, iff means if and only if.)

A meaningful number system requires a method for combining
ordered pairs. The definition of algebraic operations must be consistent
so that the sum, difference, product, and quotient of any two ordered
pairs will again be an ordered pair. The key to defining how these
numbers should be manipulated is to follow Gauss's lead and equate
(x, y) with x + iy. Then, if z1 = (x1, y1) and z2 = (x2, y2) are arbitrary complex
numbers, we have

z1 + z2 = (x1 + iy1) + (x2 + iy2) = (x1 + x2) + i(y1 + y2) = (x1 + x2, y1 + y2).

Thus, the following definitions should make sense.

Definition 1.1, (Addition)

Formula (1-8):  (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2).

Definition 1.2, (Subtraction)

Formula (1-9):  (x1, y1) – (x2, y2) = (x1 – x2, y1 – y2).

Example 1.1. Given z1 = (3, 7) and z2 = (5, –6). (a) Find z1 + z2
and (b) find z1 – z2.

z1 + z2 = (3 + 5, 7 + (–6)) = (8, 1)  and  z1 – z2 = (3 – 5, 7 – (–6)) = (–2, 13).

We can also use the notation z1 = 3 + 7i and z2 = 5 – 6i:

z1 + z2 = 8 + i  and  z1 – z2 = –2 + 13i.

Given the rationale we devised for addition and subtraction, it is
tempting to define the product z1 z2 as (x1 x2, y1 y2). It turns out,
however, that this is not a good definition, and we ask you in the
exercises for this section to explain why. How, then, should products be
defined? Again, if we equate (x, y) with x + iy and assume, for the
moment, that i^2 = –1 makes sense (so that i · i = –1), we have

z1 z2 = (x1 + iy1)(x2 + iy2)
      = x1 x2 + i x1 y2 + i x2 y1 + i^2 y1 y2
      = (x1 x2 – y1 y2) + i(x1 y2 + x2 y1)
      = (x1 x2 – y1 y2, x1 y2 + x2 y1).

Thus, it appears we are forced into the following definition.

Definition 1.3, (Complex Multiplication)

Formula (1-10):  z1 z2 = (x1 x2 – y1 y2, x1 y2 + x2 y1).

Example 1.2. Given z1 = (3, 7) and z2 = (5, –6). Find z1 z2.

z1 z2 = (3·5 – 7·(–6), 3·(–6) + 5·7) = (15 + 42, –18 + 35) = (57, 17).

We get the same answer by using the notation z1 = 3 + 7i and z2 = 5 – 6i:

z1 z2 = (3 + 7i)(5 – 6i) = 15 – 18i + 35i – 42i^2 = 15 + 42 + (–18 + 35)i = 57 + 17i.

Of course, it makes sense that the answer came out as we expected
because we used the notation x + iy as motivation for our definition in the
first place.

To motivate our definition for division, we will proceed along the
same lines as we did for multiplication, assuming z2 ≠ 0:

z1/z2 = (x1 + iy1)/(x2 + iy2).

We need to figure out a way to write the preceding quantity in the
form x + iy. To do this, we use a standard trick and multiply the
numerator and denominator by x2 – iy2, which gives

z1/z2 = (x1 + iy1)(x2 – iy2) / ((x2 + iy2)(x2 – iy2))
      = [(x1 x2 + y1 y2) + i(–x1 y2 + x2 y1)] / (x2^2 + y2^2).

Definition 1.4, (Complex Division)

Formula (1-11):  z1/z2 = ((x1 x2 + y1 y2)/(x2^2 + y2^2), (–x1 y2 + x2 y1)/(x2^2 + y2^2)),  for z2 ≠ 0.

Example 1.3. Given z1 = (3, 7) and z2 = (5, –6). Find z1/z2.

z1/z2 = ((3·5 + 7·(–6))/(5^2 + (–6)^2), (–3·(–6) + 5·7)/(5^2 + (–6)^2)) = (–27/61, 53/61).

As with the example for multiplication, we also get this answer if we use
the notation x + iy:

z1/z2 = (3 + 7i)/(5 – 6i) = (3 + 7i)(5 + 6i)/((5 – 6i)(5 + 6i))
      = (15 + 18i + 35i + 42i^2)/(25 + 36) = (–27 + 53i)/61 = –27/61 + (53/61)i.

To perform operations on complex numbers, most mathematicians
would use the notation x + iy and engage in algebraic manipulations, as
we did here, rather than apply the complicated-looking definitions we
gave for those operations on ordered pairs. This procedure is valid
because we used the notation x + iy as a guide for defining the
operations in the first place. Remember, though, that the notation x + iy
is nothing more than a convenient bookkeeping device for keeping track
of how to manipulate ordered pairs. It is the ordered pair algebraic
definitions that form the real foundation on which the complex number
system is based. In fact, if you were to program a computer to do
arithmetic on complex numbers, your program would perform
calculations on ordered pairs, using exactly the definitions that we gave.
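As the paragraph suggests, a program can do complex arithmetic directly on ordered pairs. Here is a minimal sketch of Formulas (1-8), (1-10), and (1-11) on tuples (x, y); the helper names are illustrative:

```python
# Ordered-pair arithmetic: z = (x, y) plays the role of x + iy.

def add(z1, z2):
    return (z1[0] + z2[0], z1[1] + z2[1])                        # Formula (1-8)

def mul(z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)                # Formula (1-10)

def div(z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    d = x2 * x2 + y2 * y2                                        # must be nonzero
    return ((x1 * x2 + y1 * y2) / d, (-x1 * y2 + x2 * y1) / d)   # Formula (1-11)

# (0, 1) plays the role of i, and its square is (-1, 0), i.e. -1.
print(mul((0, 1), (0, 1)))   # (-1, 0)
```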

It turns out that our algebraic definitions give complex numbers all the
properties we normally ascribe to the real number system. Taken
together, they describe what algebraists call a field. In formal terms, a
field is a set (in this case, the complex numbers) together with two
binary operations (in this case, addition and multiplication) having the
following properties.

(P1) Commutative Law for Addition.  z1 + z2 = z2 + z1.

(P2) Associative Law for Addition.  (z1 + z2) + z3 = z1 + (z2 + z3).

(P3) Additive Identity. There is a complex number 0 = (0, 0) such that
z + 0 = z for all complex numbers z.

(P4) Additive Inverses. Given any complex number z, there is a
complex number –z (depending on z) with the property that

z + (–z) = (0, 0).

Obviously, if z = (x, y), the number –z will be (–x, –y).

(P5) Commutative Law for Multiplication.  z1 z2 = z2 z1.

(P6) Associative Law for Multiplication.  (z1 z2) z3 = z1 (z2 z3).

(P7) Multiplicative Identity. There is a complex number 1 = (1, 0) such that
z · 1 = z for all complex numbers z.

As one might expect, it turns out that (1, 0) is the unique complex
number with this property. We ask you to verify this identity in the
exercises for this section.

(P8) Multiplicative Inverses. Given any number z other than the
number (0, 0), there is a complex number (depending on z) which we
shall denote by z^(–1) with the property that

z z^(–1) = 1 = (1, 0).

Based on our definition for division, it seems reasonable that the number
z^(–1) would be

z^(–1) = 1/z = (x/(x^2 + y^2), –y/(x^2 + y^2)).

We ask you to confirm this result in the exercises for this section.

(P9) The Distributive Law.  z1 (z2 + z3) = z1 z2 + z1 z3.

None of these properties is difficult to prove. Most of the proofs


make use of corresponding facts in the real number system. To illustrate,
we give a proof of property (P1).

Proof of the commutative law for addition: Let z1 = (x1, y1) and
z2 = (x2, y2) be arbitrary complex numbers. Then,

z1 + z2 = (x1 + x2, y1 + y2) = (x2 + x1, y2 + y1) = z2 + z1,

where the middle equality follows from the commutative law for
addition of real numbers.
Actually, you can think of the real number system as a subset of the
complex number system. To see why, let's agree that, as any complex
number of the form is on the axis, we can identify it with the real
number . With this correspondence, we can easily verify that our
definitions for addition, subtraction, multiplication, and division of
complex numbers are consistent with the corresponding operations on
real numbers. For example, if and are real numbers, then

It is now time to show specifically how the symbol i relates to the quantity √-1. Note that

(0, 1)(0, 1) = (0·0 - 1·1, 0·1 + 1·0) = (-1, 0).

If we use the symbol i for the point (0, 1), the preceding identity gives

i^2 = (0, 1)(0, 1) = (-1, 0),

which means i = (-1, 0)^(1/2) = √-1. So, the next time you are having a discussion with your friends and they scoff when you claim that √-1 is not imaginary, calmly put your pencil on the point (0, 1) of the coordinate plane and ask them if there is anything imaginary about it. When they agree there isn't, you can tell them that this point, in fact, represents the mysterious √-1 in the same way that (1, 0) represents 1.

We can also see more clearly now how the notation x + iy equates to the ordered pair (x, y). Using the preceding conventions (i.e., x = (x, 0), i = (0, 1), etc.), we have

x + iy = (x, 0) + (0, 1)(y, 0) = (x, 0) + (0, y) = (x, y).

Thus we may move freely between the notations (x, y) and x + iy, depending on which is more convenient for the context in which we are working. Students sometimes wonder whether it matters where the i is located in writing a complex number. It does not. Generally, most texts place terms containing an i at the end of an expression, and place the i before a variable but after a constant; thus, we write expressions such as x + iy and 5 + 7i, and so forth. Because letters lower in the alphabet generally denote constants, you will usually (but not always) see the expression a + bi instead of a + ib. Many authors write quantities like 1 + i√3 instead of 1 + √3i to make sure the i is not mistakenly thought to be inside the square root symbol. Additionally, if there is concern that the i might be missed, it is sometimes placed before a lengthy expression.

Definition 1.5, (Real Part of z). The real part of z = x + iy, denoted by Re(z), is the real number x.

Definition 1.6, (Imaginary Part of z). The imaginary part of z = x + iy, denoted by Im(z), is the real number y.

Definition 1.7, (Conjugate of z). The conjugate of z = x + iy, denoted by z̄, is the complex number x - iy.
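Python's built-in complex type implements these three definitions directly (`.real`, `.imag`, and `.conjugate()` are standard attributes/methods of `complex`), so a quick sketch can confirm them on a sample number of our choosing:

```python
z = 3 + 4j          # the complex number x + iy with x = 3, y = 4

assert z.real == 3.0            # Re(z), Definition 1.5
assert z.imag == 4.0            # Im(z), Definition 1.6
assert z.conjugate() == 3 - 4j  # conjugate of z, Definition 1.7

# The conjugate simply flips the sign of the imaginary part,
# so conjugating twice returns the original number.
assert z.conjugate().conjugate() == z
```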

Example 1.4. Given .

1.4 (a) We have and .

1.4 (b) We have and .

1.4 (c) We have and .

The following theorem gives some important facts relating to these operations. You will be asked for a proof in the exercises.

Theorem 1.1. Suppose that z1, z2, and z3 are arbitrary complex numbers.

Because of what it erroneously connotes, it is a shame that the term imaginary is used in Definition (1.6). It was coined by the brilliant mathematician and philosopher René Descartes (1596-1650) during an era when quantities such as √-1 were thought to be just that. Gauss, who was successful in getting mathematicians to adopt the phrase complex number rather than imaginary number, also suggested that they use lateral part of z in place of imaginary part of z. Unfortunately, that suggestion never caught on, and it appears we are stuck with what history has handed down.

Geometry of Complex Numbers

1.3 The Geometry of Complex Numbers

Complex numbers are ordered pairs of real numbers, so they can be


represented by points in the plane. In this section we show the effect that
algebraic operations on complex numbers have on their geometric
representations.

We can represent the number z = x + iy by a position vector in the xy plane whose tail is at the origin and whose head is at the point (x, y). When the xy plane is used for displaying complex numbers, it is called the complex plane, or more simply, the z plane. Recall that Re(z) = x and Im(z) = y. Geometrically, Re(z) is the projection of z onto the x axis, and Im(z) is the projection of z onto the y axis. It makes sense, then, to call the x axis the real axis and the y axis the imaginary axis, as Figure 1.3 illustrates.

Figure 1.3 The complex plane.

Addition of complex numbers is analogous to addition of vectors in the plane. As we saw in Section 1.2, the sum of z1 = x1 + iy1 and z2 = x2 + iy2 is (x1 + x2) + i(y1 + y2). Hence, z1 + z2 can be obtained vectorially by using the "parallelogram law," where the vector sum is the vector represented by the diagonal of the parallelogram formed by the two original vectors. Figure 1.4 illustrates this notion.

Figure 1.4 The sum z1 + z2.

The difference z1 - z2 can be represented by the displacement vector from the point z2 to the point z1, as Figure 1.5 shows.

Figure 1.5 The difference z1 - z2.

Definition 1.8, (Modulus or Absolute Value). The modulus, or absolute value, of the complex number z = x + iy is a nonnegative real number denoted by |z| and is given by the equation

(1-20) |z| = √(x^2 + y^2).

The number |z| is the distance between the origin and the point (x, y). The only complex number with modulus zero is the number 0. The number z = x + iy, together with its real and imaginary parts, is depicted in Figure 1.6.

Figure 1.6 The real and imaginary parts of a complex number.

The numbers |z|, |x|, and |y| are the lengths of the sides of the right triangle OPQ shown in Figure 1.7.

Figure 1.7 The moduli of z and its components.

The inequality |z1| < |z2| means that the point z1 is closer to the origin than the point z2. Although obvious from Figure 1.7, it is still profitable to work out algebraically the standard results that

(1-21) |x| ≤ |z| and |y| ≤ |z|,

which we leave as an exercise for the reader.

The difference z1 - z2 represents the displacement vector from z2 to z1, so the distance between z1 and z2 is given by |z1 - z2|. We can obtain this distance by using Definition (1.2) and Definition (1.8) to obtain the familiar formula

|z1 - z2| = √((x1 - x2)^2 + (y1 - y2)^2).
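The modulus and the distance formula match Python's `abs()` on complex numbers, as a quick check shows (the sample points are ours):

```python
import math

z1, z2 = 4 + 3j, 1 - 1j

# |z| = sqrt(x^2 + y^2): the modulus of z1 = 4 + 3i is 5.
assert abs(z1) == math.sqrt(4 ** 2 + 3 ** 2) == 5.0

# The distance between z1 and z2 is |z1 - z2|,
# i.e. sqrt((x1 - x2)^2 + (y1 - y2)^2).
assert abs(z1 - z2) == math.sqrt((4 - 1) ** 2 + (3 - (-1)) ** 2) == 5.0
```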

If z = (x, y) = x + iy, then -z = (-x, -y) is the reflection of z through the origin, and z̄ = (x, -y) is the reflection of z through the x axis, as illustrated in Figure 1.8.

Figure 1.8 The geometry of negation and conjugation.

We can use an important algebraic relationship to establish properties of the absolute value that have geometric applications. Its proof is rather straightforward, and we ask you to give it in the exercises for this section.

(1-22) |z|^2 = z z̄.

A beautiful and important application of the above identity is its use


in establishing the triangle inequality, which states that the sum of the
lengths of two sides of a triangle is greater than or equal to the length of
the third side. Figure 1.9 illustrates this inequality.

Figure 1.9 The triangle inequality.

Theorem 1.2, (Triangle Inequality). If z1 and z2 are arbitrary complex numbers, then

(1-23) |z1 + z2| ≤ |z1| + |z2|.

Proof. We appeal to basic results:

|z1 + z2|^2 = (z1 + z2)(z̄1 + z̄2) = z1 z̄1 + (z1 z̄2 + z2 z̄1) + z2 z̄2
            = |z1|^2 + 2 Re(z1 z̄2) + |z2|^2 ≤ |z1|^2 + 2 |z1||z2| + |z2|^2 = (|z1| + |z2|)^2.

Taking square roots yields the desired inequality.
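A numeric spot-check of (1-23), using a few arbitrary sample pairs of our own choosing; the same loop also checks the reverse inequality (1-24) derived later in this section:

```python
# Sample pairs (z1, z2); any complex pairs would do.
samples = [(7 + 1j, 3 + 5j), (-2 + 4j, -2 - 4j), (1j, -1j), (0j, 5 - 12j)]

for z1, z2 in samples:
    # Triangle inequality (1-23): |z1 + z2| <= |z1| + |z2|.
    assert abs(z1 + z2) <= abs(z1) + abs(z2)
    # Reverse form (1-24): |z1 + z2| >= |z1| - |z2|.
    assert abs(z1 + z2) >= abs(z1) - abs(z2)
```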


Example 1.5. To produce an example of which Figure 1.9 is a
reasonable illustration, we let
. Then and . Clearly,
; hence . In this case, we can verify the triangle
inequality without recourse to computation of square roots because

,
thus
.


We can also establish other important identities by means of the triangle inequality. Note that

|z1| = |(z1 + z2) + (-z2)| ≤ |z1 + z2| + |-z2| = |z1 + z2| + |z2|.

Subtracting |z2| from the left and right sides of this string of inequalities gives an important relationship that will be used in determining lower bounds of sums of complex numbers:

(1-24) |z1 + z2| ≥ |z1| - |z2|.

Using the identity (1-22) and the commutative and associative laws, it follows that

|z1 z2|^2 = (z1 z2)(z̄1 z̄2) = (z1 z̄1)(z2 z̄2) = |z1|^2 |z2|^2,

where we used the fact that the conjugate of a product is the product of the conjugates. Taking square roots of the terms on the left and right establishes another important identity

(1-25) |z1 z2| = |z1| |z2|.

As an exercise, we ask you to show

(1-26) |z1/z2| = |z1|/|z2|, provided z2 ≠ 0.
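Identities (1-25) and (1-26) are easy to confirm numerically, allowing for floating-point rounding (the sample values are ours):

```python
import math

z1, z2 = 1 + 2j, 3 - 1j

# (1-25): |z1 z2| = |z1| |z2|.
assert math.isclose(abs(z1 * z2), abs(z1) * abs(z2))

# (1-26): |z1 / z2| = |z1| / |z2|, provided z2 != 0.
assert math.isclose(abs(z1 / z2), abs(z1) / abs(z2))
```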

Example 1.6. Use the values


, then and
. Also ; hence

,
thus
.

Figure 1.10 illustrates the multiplication shown in Example 1.6. The length of the vector z1 z2 apparently equals the product of the lengths of z1 and z2, confirming identity (1-25), but why is it located in the second quadrant when both z1 and z2 are in the first quadrant? The answer to this question will become apparent to you in Section 1.4.

Figure 1.10 The geometry of multiplication.

Argand Diagram

An Argand diagram is a plot of complex numbers as points z = x + iy in the complex plane, using the x-axis as the real axis and the y-axis as the imaginary axis. In such a plot, a dashed circle through z centered at the origin represents the complex modulus |z| of z, and the angle from the positive real axis to the vector for z represents its complex argument.

While Argand (1806) is generally credited with the discovery, the


Argand diagram (also known as the Argand plane) was actually
described by C. Wessel prior to Argand. Historically, the geometric
representation of a complex number as a point in the plane was
important because it made the whole idea of a complex number more
acceptable. In particular, this visualization helped "imaginary" and
"complex" numbers become accepted in mainstream mathematics as a
natural extension to negative numbers along the real line.

9.4.POWERS AND ROOTS

Powers and roots


Powers of complex numbers are just special cases of products when the
power is a positive whole number. We have already studied the powers
of the imaginary unit i and found they cycle in a period of length 4:

i^1 = i, i^2 = -1, i^3 = -i, i^4 = 1, i^5 = i,

and so forth. The reasons were that (1) the absolute value |i| of i was one,
so all its powers also have absolute value 1 and, therefore, lie on the unit

circle, and (2) the argument arg(i) of i was 90°, so its nth power will have argument n·90°, and those angles will repeat in a period of length 4 since 4·90° = 360°, a full circle.

More generally, you can find z^n as the complex number (1) whose absolute value is |z|^n, the nth power of the absolute value of z, and (2) whose argument is n times the argument of z.

In the figure you see a complex number z whose absolute value is about the sixth root of 1/2, that is, |z| = 0.89, and whose argument is 30°. Here, the unit circle is shaded black while outside the unit circle is gray, so z is in the black region. Since |z| is less than one, its square is at 60° and closer to 0. Each higher power is 30° further along and even closer to 0. The first six powers are displayed, as you can see, as points on a spiral. This spiral is called a geometric or exponential spiral.
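The spiral of powers described above can be reproduced with Python's `cmath` module; the sample modulus and argument mirror the figure (|z| = (1/2)^(1/6) ≈ 0.89, arg z = 30°):

```python
import cmath
import math

# z with modulus (1/2)^(1/6) ≈ 0.89 and argument 30 degrees.
r, theta = 0.5 ** (1 / 6), math.radians(30)
z = cmath.rect(r, theta)

for n in range(1, 7):
    w = z ** n
    # |z^n| = |z|^n shrinks toward 0; arg(z^n) = n * 30 degrees.
    assert math.isclose(abs(w), r ** n)
    assert math.isclose(math.degrees(cmath.phase(w)), 30 * n if n < 6 else 180, abs_tol=1e-9)

# z^6 lands on the negative real axis at about -1/2.
assert math.isclose((z ** 6).real, -0.5)
assert math.isclose((z ** 6).imag, 0, abs_tol=1e-9)
```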

Roots.
Note that in the last example, z6 is on the negative real axis at about -1/2.
That means that z is just about equal to one of the sixth roots of -1/2.

There are, in fact, six sixth roots of any complex number. Let w be a complex number, and z any of its sixth roots. Since z^6 = w, it follows that

1. the absolute value of w, |w|, is |z|^6, so |z| = |w|^(1/6), and

2. 6 arg(z) is arg(w), so arg(z) = arg(w)/6.

Actually, the second statement isn't quite right, since 6 arg(z) could be any multiple of 360° more than arg(w), so you can add multiples of 60° to arg(w)/6 to get the arguments of the other five roots.

For example, take w to be -1/2, the green dot in the figure to the right. Then |w| is 1/2, and arg(w) is 180°. Let z be a sixth root of w. Then (1) |z| is |w|^(1/6), which is about 0.89. Also, (2) the argument of w is arg(w) = 180°. But the same angle could be named by any of

180°, 540°, 900°, 1260°, 1620°, or 1980°.

If we take 1/6 of each of these angles, then we'll have the possible arguments for z:

30°, 90°, 150°, 210°, 270°, or 330°.

Since each of the angles for w differs by 360°, each of the possible angles for z will differ by 60°. These six sixth roots of -1/2 are displayed in the figure as blue dots.
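The recipe above (modulus |w|^(1/6), arguments arg(w)/6 + k·60°) can be checked directly; raising each candidate root back to the sixth power recovers w:

```python
import cmath
import math

w = -0.5
r, theta = abs(w), math.pi          # |w| = 1/2, arg(w) = 180 degrees

# The six candidate roots: |w|^(1/6) at angles (180 + 360k)/6 degrees.
roots = [cmath.rect(r ** (1 / 6), (theta + 2 * math.pi * k) / 6) for k in range(6)]

for z in roots:
    assert math.isclose(abs(z), 0.5 ** (1 / 6))        # about 0.89
    assert cmath.isclose(z ** 6, w, abs_tol=1e-9)      # each one really is a sixth root
```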

More roots of unity.


Recall that an "nth root of unity" is just another name for an nth root of one. The fourth roots are ±1 and ±i, as noted earlier in the section on absolute value. We also saw, when we looked at multiplication, that the eight 8th roots of unity were ±1, ±i, and ±√2/2 ± i√2/2.

Let’s consider now the sixth roots of unity.


They will be placed around the circle at 60° intervals. Two of them, of course, are ±1. Let w be the one with argument 60°. The triangle with vertices at 0, 1, and w is an equilateral triangle, so it is easy to determine the coordinates of w. The x-coordinate is 1/2, and the y-coordinate is √3/2. Therefore, w is (1 + i√3)/2. The remaining sixth roots are reflections of w in the real and imaginary axes. In summary, the six sixth roots of unity are ±1 and (±1 ± i√3)/2 (where + and - can be taken in any order).

Now some of these sixth roots are lower roots of unity as well. The number -1 is a square root of unity, (-1 ± i√3)/2 are cube roots of unity, and 1 itself counts as a cube root, a square root, and a "first" root (anything is a first root of itself). But the remaining two sixth roots, namely (1 ± i√3)/2, are sixth roots but not any lower roots of unity. Such roots are called primitive, so (1 ± i√3)/2 are the two primitive sixth roots of unity.
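A sketch confirming the claim numerically: (1 + i√3)/2 is a primitive sixth root of unity (its sixth power is 1, but no lower positive power is), while (-1 + i√3)/2 is already a cube root:

```python
import cmath
import math

w = (1 + 1j * math.sqrt(3)) / 2     # candidate primitive sixth root of unity

powers = [w ** n for n in range(1, 7)]

# w^6 = 1, and no lower power equals 1 -- so w is primitive.
assert cmath.isclose(powers[5], 1)
assert all(not cmath.isclose(p, 1) for p in powers[:5])

# By contrast, (-1 + i*sqrt(3))/2 is a (non-primitive-sixth) cube root of unity.
v = (-1 + 1j * math.sqrt(3)) / 2
assert cmath.isclose(v ** 3, 1)
```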

It’s fun to find roots of unity, but we’ve found most of the easy ones
already.

9.5.DEMOIVRE’S THEOREM

De Moivre's Theorem

In the last section, we looked at the polar form of complex numbers and proved a beautiful theorem regarding them. In this section, we prove another beautiful result, known as De Moivre's Theorem, which allows us to easily compute powers and roots of complex numbers given in polar form. We will also apply this theorem to many examples.

Theorem 6.6.1 (De Moivre's Theorem): For every real number θ and every positive integer n, we have

• (6.6.2) (cos θ + i sin θ)^n = cos nθ + i sin nθ.

Proof: We prove this theorem by induction, i.e. first we prove it for n = 1 and then we prove that if (6.6.2) holds for a particular value of n, then it holds for n + 1 as well. This suffices to prove the theorem for every positive integer n. (Induction is a commonly-used method for proving mathematical results.) The case n = 1 is trivial. Assume (6.6.2) holds for n. Then we have

(cos θ + i sin θ)^(n+1) = (cos θ + i sin θ)^n (cos θ + i sin θ) = (cos nθ + i sin nθ)(cos θ + i sin θ),

where in the last step we used the induction hypothesis, i.e. the assumption that (6.6.2) holds for n. Computing the product on the right yields

(cos θ + i sin θ)^(n+1) = (cos nθ cos θ - sin nθ sin θ) + i (sin nθ cos θ + cos nθ sin θ) = cos (n+1)θ + i sin (n+1)θ,

where we used the addition formulas for sine and cosine in the last step. (Alternatively, we could have applied Theorem 6.5.4 to the two factors on the right side of the previous equation.) Thus, (6.6.2) holds for n + 1 as well, whence it holds for every positive integer n.

QED

One reason De Moivre's Theorem is useful is that it allows us to


compute large powers of complex numbers expressed in polar form
without having to carry out every multiplication explicitly. The
following examples show this.

Example 1: Use De Moivre's Theorem to compute (1 + i)^12.

Solution: As we have seen in the previous section, the polar form of 1 + i is √2 (cos π/4 + i sin π/4). Thus, by De Moivre's Theorem, we have

(1 + i)^12 = [√2 (cos π/4 + i sin π/4)]^12

= (√2)^12 (cos π/4 + i sin π/4)^12

= 2^6 (cos 3π + i sin 3π)

= 64 (cos π + i sin π) = 64(-1) = -64.
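The computation in Example 1 can be cross-checked by direct exponentiation, which is exactly the brute force that De Moivre's Theorem lets us avoid; a generic numeric spot-check of (6.6.2) is included as well (the sample values t = 0.7, n = 9 are arbitrary):

```python
import cmath
import math

# De Moivre (6.6.2): (cos t + i sin t)^n = cos nt + i sin nt, spot-checked numerically.
t, n = 0.7, 9
lhs = (math.cos(t) + 1j * math.sin(t)) ** n
rhs = math.cos(n * t) + 1j * math.sin(n * t)
assert cmath.isclose(lhs, rhs)

# Example 1: (1 + i)^12 = -64.
assert cmath.isclose((1 + 1j) ** 12, -64)
```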

Example 2: Use De Moivre's Theorem to compute (√3 + i)^5.

Solution: It is straightforward to show that the polar form of √3 + i is 2 (cos π/6 + i sin π/6). Thus we have

(√3 + i)^5 = [2 (cos π/6 + i sin π/6)]^5

= 2^5 (cos π/6 + i sin π/6)^5

= 32 (cos 5π/6 + i sin 5π/6)

= 32 (-√3/2 + 1/2 i)

= -16√3 + 16 i.

A very important application of De Moivre's Theorem is computing nth roots of complex numbers, where n is a positive integer. First we look at nth roots of 1, also known as nth roots of unity. We note that 1 may be written in polar form as 1 = cos 2πm + i sin 2πm for every integer m, since cosine and sine are periodic with period 2π. Now consider the complex number z = cos (2πm/n) + i sin (2πm/n), where n is an arbitrary positive integer. By De Moivre's Theorem we have

z^n = [cos (2πm/n) + i sin (2πm/n)]^n = cos 2πm + i sin 2πm = 1.

Thus we have shown that cos (2πm/n) + i sin (2πm/n) is an nth root of unity. In fact, all the nth roots of unity are obtained this way by plugging in all integer values of m from 0 to n-1. (Every other integer m yields a root of unity identical to one of these.) Thus we now know how to find all n nth roots of unity for every positive integer n.
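The formula cos (2πm/n) + i sin (2πm/n) generates all n roots; a short sketch (the helper name `roots_of_unity` is ours) builds them and verifies each one:

```python
import cmath
import math

def roots_of_unity(n):
    """All n nth roots of unity: cos(2*pi*m/n) + i*sin(2*pi*m/n) for m = 0..n-1."""
    return [complex(math.cos(2 * math.pi * m / n), math.sin(2 * math.pi * m / n))
            for m in range(n)]

for n in (2, 3, 4, 6):
    roots = roots_of_unity(n)
    assert len(roots) == n
    for z in roots:
        assert cmath.isclose(z ** n, 1)     # z really is an nth root of 1
        assert math.isclose(abs(z), 1)      # all lie on the unit circle
```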

Example 3: Compute the three cube roots of unity.

Solution: From our above discussion, we see that the three cube roots of unity have the form cos (2πm/3) + i sin (2πm/3) for m = 0, 1, or 2. Plugging in m = 0 yields the root cos 0 + i sin 0 = 1. (It is easy to see that 1 is an nth root of unity for every integer n, since 1^n = 1. Plugging in m = 0 will always yield this root.) Plugging in m = 1 yields the root cos (2π/3) + i sin (2π/3) = -1/2 + √3/2 i, and plugging in m = 2 yields the root cos (4π/3) + i sin (4π/3) = -1/2 - √3/2 i.

Example 4: Compute the four fourth roots of unity.

Solution: The four fourth roots of unity have the form cos (2πm/4) + i
sin (2πm/4) for m=0, 1, 2, or 3. As usual, plugging in m=0 yields the
root 1. Plugging in m=1 yields the root cos π/2 + i sin π/2 = i. Plugging
in m=2 yields cos π + i sin π = -1. Finally, plugging in m=3 yields cos
3π/2 + i sin 3π/2 = -i.

The nth roots of unity have a nice geometric interpretation in terms of


where they lie in the complex plane. They form a regular n-gon on the
unit circle with one vertex at 1. The case n=6 is shown in the following
figure.

Figure 6.6.3: The Six Sixth Roots of Unity

So much for nth roots of unity - how about nth roots of general complex
numbers? The following lemma, which is an immediate consequence of
De Moivre's Theorem, tells us how to compute the n nth roots of an
arbitrary complex number, given in polar form.

Lemma 6.6.4: Let z = r (cos θ + i sin θ) be an arbitrary complex


number. Then the n nth roots w of z are each of the form

• (6.6.5) w = r^(1/n) {cos [(θ + 2πm)/n] + i sin [(θ + 2πm)/n]},

where m is an integer ranging from 0 to n-1.

Proof: If we apply De Moivre's Theorem to each of the alleged nth roots of z, we find

w^n = (r^(1/n) {cos [(θ + 2πm)/n] + i sin [(θ + 2πm)/n]})^n

= r {cos [(θ + 2πm)/n] + i sin [(θ + 2πm)/n]}^n

= r [cos (θ + 2πm) + i sin (θ + 2πm)]

= r (cos θ + i sin θ) = z.

Thus, each of these is indeed an nth root of z. Since we have found n distinct roots, this must be all of them.

It should be noted that in Lemma 6.6.4, as well as our derivation of the formula for the nth roots of unity, we have implicitly used a very powerful result, known as the Fundamental Theorem of Algebra. This theorem says that every polynomial of degree n factors completely into a product of n linear factors over the complex numbers. Among other things, this theorem implies that every polynomial of degree n has at most n complex roots. Thus, there can be at most n nth roots of a fixed complex number a, since these roots satisfy the degree-n polynomial equation x^n - a = 0. We will not prove the Fundamental Theorem of Algebra in this module.

Example 5: Compute the two square roots of i.

Solution: It is easy to see that i has the polar form cos π/2 + i sin π/2. Thus, by Lemma 6.6.4, its square roots are cos π/4 + i sin π/4 = √2/2 + √2/2 i and cos 5π/4 + i sin 5π/4 = -√2/2 - √2/2 i.

Example 6: Compute the three cube roots of -8.

Solution: Since -8 has the polar form 8 (cos π + i sin π), its three cube roots have the form 8^(1/3) {cos [(π + 2πm)/3] + i sin [(π + 2πm)/3]} for m = 0, 1, and 2. Thus the roots are 2 (cos π/3 + i sin π/3) = 1 + √3 i, 2 (cos π + i sin π) = -2, and 2 (cos 5π/3 + i sin 5π/3) = 1 - √3 i.
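Lemma 6.6.4's formula, applied to Example 6, can be sketched as follows (the helper name `nth_roots` is ours):

```python
import cmath
import math

def nth_roots(z, n):
    """The n nth roots of z via (6.6.5): r^(1/n) * cis((theta + 2*pi*m)/n)."""
    r, theta = abs(z), cmath.phase(z)
    return [cmath.rect(r ** (1 / n), (theta + 2 * math.pi * m) / n) for m in range(n)]

roots = nth_roots(-8, 3)

# Example 6: the cube roots of -8 are 1 + sqrt(3) i, -2, and 1 - sqrt(3) i.
expected = [1 + math.sqrt(3) * 1j, -2 + 0j, 1 - math.sqrt(3) * 1j]
for got, want in zip(roots, expected):
    assert cmath.isclose(got, want, abs_tol=1e-9)
    assert cmath.isclose(got ** 3, -8, abs_tol=1e-9)
```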

De Moivre's Theorem

The variable z = x + iy can be represented by its length and angle, as can any two dimensional vector, and the relation as usual is

x = r cos θ, y = r sin θ, so that z = r (cos θ + i sin θ).

De Moivre's Theorem is the statement that (cos θ + i sin θ)^n = cos nθ + i sin nθ. We can therefore write

z^n = r^n (cos nθ + i sin nθ).
The Logarithm and the Problem of the Multivalued Nature of
Angles

The logarithm of z is, from the last equation, describable as

ln z = ln r + iθ.

This formula has a problem, and that problem is that the angle θ is not a well defined function in the complex plane, and so neither is ln z.

The difficulty is that as we wander around the origin in a counterclockwise direction, the angle keeps increasing and comes back after each revolution 2π greater than it was.

Thus the value of the logarithm at a given value of z depends on how you got there, unless you artificially restrict its angle, say to range from -π to π. If you do that, the function ln z is discontinuous on the negative real axis.

Similar problems exist for inverse powers such as x^(1/2) and x^(1/3) as well.

There are several ways to get around this problem.

The prosaic way is to define such inverse functions precisely by


introducing a line of discontinuity for them, called a cut.

452
Thus for the logarithm you can say that its imaginary part, θ, has values that run from 0 to 2π. If so, it is discontinuous on the positive real axis, being 0 on one side of it and 2π on the other.

Alternatively, you can have its values lie from -π to +π, so that its line of discontinuity is the negative real axis; and you can choose any other half line of discontinuity starting at the origin.

Another way to handle this problem is to replace the complex plane by a


geometric structure called a Riemann surface, on which the function in
question is single valued without a discontinuity.

In the case of the logarithm this surface winds around and around the
origin. For the square root if you go around the origin twice you come
back to where you started from.
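Python's `cmath.log` makes the standard cut choice described above: its imaginary part is the angle restricted to (-π, π], so the discontinuity falls on the negative real axis. A small sketch shows the jump across the cut:

```python
import cmath
import math

# ln z = ln|z| + i*arg(z); cmath.log restricts arg(z) to (-pi, pi].
assert cmath.isclose(cmath.log(-1 + 0j), 1j * math.pi)   # ln(-1) = i*pi on this branch

# Just above and just below the negative real axis, the imaginary part
# jumps by nearly 2*pi -- the discontinuity along the cut.
above = cmath.log(-1 + 1e-12j).imag
below = cmath.log(-1 - 1e-12j).imag
assert math.isclose(above - below, 2 * math.pi, rel_tol=1e-6)
```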

3.2.PARTIAL FRACTION

Basic Integral Formulas

1)
2) Where is any constant.

3)

4)

5)
6)

7)

8)
9)
10)
11)
12)

13)
14)
15)
16)
17) or

18)

19) or

20) or
21)
22)
23)
24)
25)
26)
27)
28)
29)
30)
31)
32)
33)
34)

35) or

36) or

37) or

38) or

39) or

40)

Partial Fractions

In this section we are going to take a look at integrals of rational expressions of polynomials. Once again, let's start this section out with an integral that we can already do, so we can contrast it with the integrals that we'll be doing in this section.

So, if the numerator is the derivative of the denominator (or a constant


multiple of the derivative of the denominator) doing this kind of integral
is fairly simple. However, often the numerator isn’t the derivative of the
denominator (or a constant multiple). For example, consider the
following integral.

In this case the numerator is definitely not the derivative of the
denominator nor is it a constant multiple of the derivative of the
denominator. Therefore, the simple substitution that we used above
won’t work. However, if we notice that the integrand can be broken up
as follows,

then the integral is actually quite simple.

This process of taking a rational expression and decomposing it into


simpler rational expressions that we can add or subtract to get the
original rational expression is called partial fraction decomposition.
Many integrals involving rational expressions can be done if we first do
partial fractions on the integrand.

So, let’s do a quick review of partial fractions. We’ll start with a
rational expression in the form,

where both P(x) and Q(x) are polynomials and the degree of P(x) is
smaller than the degree of Q(x). Recall that the degree of a polynomial
is the largest exponent in the polynomial. Partial fractions can only be
done if the degree of the numerator is strictly less than the degree of the
denominator. That is important to remember.

So, once we’ve determined that partial fractions can be done we factor
the denominator as completely as possible. Then for each factor in the
denominator we can use the following table to determine the term(s) we
pick up in the partial fraction decomposition.

Factor in denominator | Term in partial fraction decomposition

Notice that the first and third cases are really special cases of the second
and fourth cases respectively.

There are several methods for determining the coefficients for each term
and we will go over each of those in the following examples.

Let’s start the examples by doing the integral above.

Example 1 Evaluate the following integral.

Solution

The first step is to factor the denominator as much as possible and get
the form of the partial fraction decomposition. Doing this gives,

The next step is to actually add the right side back up.

Now, we need to choose A and B so that the numerators of these two are
equal for every x. To do this we’ll need to set the numerators equal.

Note that in most problems we will go straight from the general form of
the decomposition to this step and not bother with actually adding the
terms back up. The only point to adding the terms is to get the
numerator and we can get that without actually writing down the results
of the addition.

At this point we have one of two ways to proceed. One way will always
work, but is often more work. The other, while it won’t always work, is
often quicker when it does work. In this case both will work and so
we’ll use the quicker way for this example. We’ll take a look at the
other method in a later example.

What we’re going to do here is to notice that the numerators must be


equal for any x that we would choose to use. In particular the
numerators must be equal for and . So,
let’s plug these in and see what we get.

So, by carefully picking the x’s we got the unknown constants to quickly
drop out. Note that these are the values we claimed they would be
above.

At this point there really isn’t a whole lot to do other than the integral.

Recall that to do this integral we first split it up into two integrals and
then used the substitutions,

on the integrals to get the final answer.

Before moving onto the next example a couple of quick notes are in
order here. First, many of the integrals in partial fractions problems
come down to the type of integral seen above. Make sure that you can
do those integrals.

There is also another integral that often shows up in these kinds of


problems so we may as well give the formula for it here since we are
already on the subject.

It will be an example or two before we use this so don’t forget about it.

Example 2 Evaluate the following integral.

Solution

We won’t be putting as much detail into this solution as we did in the


previous example. The first thing is to factor the denominator and get
the form of the partial fraction decomposition.

The next step is to set numerators equal. If you need to actually add the
right side together to get the numerator for that side then you should do
so, however, it will definitely make the problem quicker if you can do

the addition in your head to get,

As with the previous example it looks like we can just pick a few values
of x and find the constants so let’s do that.

Note that unlike the first example most of the coefficients here are
fractions. That is not unusual so don’t get excited about it when it
happens.

Now, let’s do the integral.

Again, as noted above, integrals that generate natural logarithms are


very common in these problems so make sure you can do them.

Example 3 Evaluate the following integral.

Solution

This time the denominator is already factored so let’s just jump right to
the partial fraction decomposition.

Setting numerators equal gives,

In this case we aren’t going to be able to just pick values of x that will
give us all the constants. Therefore, we will need to work this the
second (and often longer) way. The first step is to multiply out the right
side and collect all the like terms together. Doing this gives,

Now we need to choose A, B, C, and D so that these two are equal. In


other words we will need to set the coefficients of like powers of x
equal. This will give a system of equations that can be solved.

Note that we used x^0 to represent the constants. Also note that these
systems can often be quite large and have a fair amount of work
involved in solving them. The best way to deal with these is to use some
form of computer aided solving techniques.

Now, let’s take a look at the integral.

In order to take care of the third term we needed to split it up into two
separate terms. Once we’ve done this we can do all the integrals in the
problem. The first two use the substitution
, the third uses the substitution
and the fourth term uses the formula given above for inverse tangents.

Example 4 Evaluate the following integral.

Solution

Let’s first get the general form of the partial fraction decomposition.

Now, set numerators equal, expand the right side and collect like terms.

Setting coefficients equal gives the following system.

Don’t get excited if some of the coefficients end up being zero. It
happens on occasion.

Here’s the integral.

To this point we've only looked at rational expressions where the degree of the numerator was strictly less than the degree of the denominator. Of course not all rational expressions will fit into this form and so we need to take a look at a couple of examples where this isn't the case.

Example 5 Evaluate the following integral.

Solution

So, in this case the degree of the numerator is 4 and the degree of the
denominator is 3. Therefore, partial fractions can’t be done on this

rational expression.

To fix this up we’ll need to do long division on this to get it into a form
that we can deal with. Here is the work for that.

So, from the long division we see that,

and the integral becomes,

The first integral we can do easily enough and the second integral is now
in a form that allows us to do partial fractions. So, let’s get the general
form of the partial fractions for the second integrand.

Setting numerators equal gives us,

Now, there is a variation of the method we used in the first couple of


examples that will work here. There are a couple of values of x that will
allow us to quickly get two of the three constants, but there is no value
of x that will just hand us the third.

What we’ll do in this example is pick x’s to get the two constants that
we can easily get and then we’ll just pick another value of x that will be
easy to work with (i.e. it won’t give large/messy numbers anywhere) and
then we’ll use the fact that we also know the other two constants to find
the third.

The integral is then,

In the previous example there were actually two different ways of dealing with the x^2 in the denominator. One is to treat it as a quadratic, which would give the following term in the decomposition

and the other is to treat it as a linear term in the following way,

which gives the following two terms in the decomposition,

We used the second way of thinking about it in our example. Notice however that the two will give identical partial fraction decompositions. So, why talk about this? Simple. This will work for x^2, but what about x^3 or x^4? In these cases we really will need to use the second way of thinking about these kinds of terms.

Let’s take a look at one more example.

Example 6 Evaluate the following integral.

Solution

In this case the numerator and denominator have the same degree. As
with the last example we’ll need to do long division to get this into the
correct form. I’ll leave the details of that to you to check.

So, we’ll need to partial fraction the second integral. Here’s the
decomposition.

Setting numerators equal gives,

Picking values of x gives us the following coefficients.

The integral is then,

Partial-Fraction Decomposition:

Previously, you have added and simplified rational expressions, such as:

Partial-fraction decomposition is the process of starting with the
simplified answer and taking it back apart, of "decomposing" the final
expression into its initial polynomial fractions.

To decompose a fraction, you first factor the denominator. Let's work backwards from the example above. The denominator is x^2 + x, which factors as x(x + 1).

Then you write the fractions with one of the factors for each of the
denominators. Of course, you don't know what the numerators are yet,
so you assign variables (usually capital letters) for these unknown
values:

Then you set this sum equal to the simplified result:

Multiplying through by the common denominator of x(x + 1) gets rid of all of the denominators:

3x + 2 = A(x + 1) + B(x)

(Copyright © Elizabeth Stapel 2006-2011. All Rights Reserved.)

Multiply things out, and group the x-terms and the constant terms:

3x + 2 = Ax + A(1) + Bx
3x + 2 = (A + B)x + (A)(1)
(3)x + (2)(1) = (A + B)x + (A)(1)

For the two sides to be equal, the coefficients of the two polynomials
must be equal. So you "equate the coefficients" to get:

3 = A + B
2 = A

This creates a system of equations that you can solve:

A = 2
B = 1

Then the original fractions were (as we already know) the following:

There is another method for solving for the values of A and B. Since the
equation "3x + 2 = A(x + 1) + B(x)" is supposed to be true for any value
of x, we can pick useful values of x, plug-n-chug, and find the values for
A and B. Looking at the equation "3x + 2 = A(x + 1) + B(x)", you can
see that, if x = 0, then we quickly find that 2 = A:

3x + 2 = A(x + 1) + B(x)
3(0) + 2 = A(0 + 1) + B(0)
0 + 2 = A(1) + 0
2 = A

And if x = –1, then we easily get –3 + 2 = –B, so B = 1.

I've never seen this second method in textbooks, but it can often save
you a whole lot of time over the "equate the coefficients and solve the
system of equations" method that they usually teach.
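That quicker method is easy to automate. Here is a short Python sketch (my own check, not part of the text; the helper name `num` is mine) that recovers A and B for 3x + 2 = A(x + 1) + B(x) by substituting the zeroing values, then verifies the decomposition with exact rational arithmetic:

```python
from fractions import Fraction

# Decompose (3x + 2)/(x(x + 1)) as A/x + B/(x + 1).
# Multiplying through gives: 3x + 2 = A(x + 1) + B(x).
num = lambda x: 3 * x + 2

A = num(Fraction(0))    # at x = 0:  2 = A(1) + B(0), so A = 2
B = -num(Fraction(-1))  # at x = -1: -1 = B(-1),      so B = 1

# Verify the decomposition at a few sample points:
for x in [Fraction(1), Fraction(2), Fraction(5)]:
    assert num(x) / (x * (x + 1)) == A / x + B / (x + 1)
```

Using `Fraction` avoids floating-point round-off, so the equality test is exact.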

If the denominator of your fraction factors into unique linear factors,


then the decomposition process is fairly straightforward, as shown in the
previous example. But what if the factors aren't unique or aren't linear?

Sometimes a factor in the denominator occurs more than once. For instance, in the fraction 13/24, the denominator 24 factors as 2×2×2×3. The factor 2 occurs three times. To get the 13/24, there may have been a 1/2 or a 1/4 or a 1/8 that was included in the original addition. You can't tell by looking at the final result.

In the same way, if a rational expression has a repeated factor in the


denominator, you can't tell, just by looking, which denominators might
have been included in the original addition. You have to account for
every possibility.

• Find the partial-fraction decomposition of the following
expression:

The factor x – 1 occurs three times in the denominator. I will


account for that by forming fractions containing increasing powers
of this factor in the denominator, like this:

Now I multiply through by the common denominator to get:

x^2 + 1 = Ax(x – 1)^2 + Bx(x – 1) + Cx + D(x – 1)^3

I could use a system of equations to solve for A, B, C, and D, but


the other method seemed easier. The two zeroing numbers are x =
1 and x = 0: so

x = 1: 1 + 1 = 0 + 0 + C + 0, so C = 2
x = 0: 1 = 0 + 0 + 0 – D, so D = –1

But what do I do now? I have two other variables, namely A and


B, for which I need values. But since I've got values for C and D, I
can pick any two other x-values, plug them in, and get a system of

equations that I can solve for A and B. The particular x-values I
choose aren't important, so I'll pick smallish ones:

x = 2:

(2)^2 + 1 = A(2)(2 – 1)^2 + B(2)(2 – 1) + (2)(2) + (–1)(2 – 1)^3


4 + 1 = 2A + 2B + 4 – 1
5 = 2A + 2B + 3
1 = A + B

x = –1:

(–1)^2 + 1 = A(–1)(–1 – 1)^2 + B(–1)(–1 – 1) + (2)(–1) + (–1)(–1 – 1)^3
1 + 1 = –4A + 2B – 2 + 8
2 = –4A + 2B + 6
2A – B = 2

I'm still stuck solving a system of equations, but by using the easier
method to solve for C and D, I now have a simpler system to solve.
Adding the two equations, I get 3A = 3, so A = 1. Then B = 0 (so
that term in the expansion "vanishes"), and the complete
decomposition is:
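With A = 1, B = 0, C = 2 and D = −1, the result reads (x^2 + 1)/(x(x − 1)^3) = −1/x + 1/(x − 1) + 2/(x − 1)^3. The Python sketch below (my own verification, not part of the text) pairs each coefficient with its denominator exactly as in the multiplied-through equation and checks the identity at several points:

```python
from fractions import Fraction

A, B, C, D = 1, 0, 2, -1  # coefficients found above

def lhs(x):
    return (x**2 + 1) / (x * (x - 1)**3)

def rhs(x):
    # D pairs with 1/x; A, B, C pair with increasing powers of (x - 1),
    # matching x^2 + 1 = Ax(x-1)^2 + Bx(x-1) + Cx + D(x-1)^3.
    return D / x + A / (x - 1) + B / (x - 1)**2 + C / (x - 1)**3

for x in [Fraction(2), Fraction(3), Fraction(-1), Fraction(1, 2)]:
    assert lhs(x) == rhs(x)
```

Note that the B term "vanishes" exactly as the text observed.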

In the above example, one of the coefficients turned out to be zero. This
doesn't happen often (in algebra classes, anyway), but don't be surprised
if you get zero, or even fractions, for some of your coefficients. The
textbooks usually stick pretty closely to nice neat whole numbers, but
not always. Don't just assume that a fraction or a zero is a wrong answer.
For instance:

...decomposes as:

Note: You can also handle the fractions like this:

If the denominator of your rational expression has an unfactorable


quadratic, then you have to account for the possible "size" of the
numerator. If the denominator contains a degree-two factor, then the
numerator might not be just a number; it might be of degree one. So you
would deal with a quadratic factor in the denominator by including a
linear expression in the numerator.

• Find the partial-fraction decomposition of the following:


Factoring the denominator, I get x(x^2 + 3). I can't factor the quadratic bit, so my expanded form will look like this:

Note that the numerator for the "x^2 + 3" fraction is a linear polynomial, not just a constant term.

Multiplying through by the common denominator, I get:

x – 3 = A(x^2 + 3) + (Bx + C)(x)
x – 3 = Ax^2 + 3A + Bx^2 + Cx
x – 3 = (A + B)x^2 + (C)x + (3A)

The only zero in the original denominator is x = 0, so:

(0) – 3 = (A + B)(0)^2 + C(0) + 3A
–3 = 3A

Then A = –1. Since I have no other helpful x-values to work with,


I think I'll take the one value I've solved for, equate the remaining
coefficients, and see what that gives me:

x – 3 = (–1 + B)x^2 + (C)x – 3
–1 + B = 0 and C = 1
B = 1 and C = 1

(There is no one "right" way to solve for the values of the


coefficients. Use whichever method "feels" right to you on a given
exercise.)

Then the decomposition is:
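Written out, that decomposition is (x − 3)/(x(x^2 + 3)) = −1/x + (x + 1)/(x^2 + 3). A quick Python check (mine, for illustration only) confirms it with exact rational arithmetic:

```python
from fractions import Fraction

A, B, C = -1, 1, 1  # coefficients from the worked example

def lhs(x):
    return (x - 3) / (x * (x**2 + 3))

def rhs(x):
    return A / x + (B * x + C) / (x**2 + 3)

for x in [Fraction(1), Fraction(2), Fraction(3), Fraction(-2)]:
    assert lhs(x) == rhs(x)
```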

If the denominator of your rational expression has repeated unfactorable


quadratics, then you use linear-factor numerators and follow the pattern
that we used for repeated linear factors in the denominator; that is, you'll
use fractions with increasing powers of the repeated factors in the
denominator.

• Set up, but do not solve, the decomposition equality for the
following:

Since x^2 + 1 is not factorable, I'll have to use numerators with
linear factors. Then the decomposition set-up looks like this:

Thankfully, I don't have to try to solve this one.

One additional note: Partial-fraction decomposition only works for


"proper" fractions. That is, if the denominator's degree is not larger than
the numerator's degree (so you have, in effect, an "improper" polynomial
fraction), then you first have to use long division to get the "mixed
number" form of the rational expression. Then decompose the remaining
fractional part.

• Decompose the following:

The numerator is of degree 5; the denominator is of degree 3. So


first I have to do the long division:

The long division rearranges the rational expression to give me:

Now I can decompose the fractional part. The denominator factors as (x^2 + 1)(x – 2).

The x^2 + 1 is irreducible, so the decomposition will be of the form:

Multiplying out and solving, I get:

2x^2 + x + 5 = A(x^2 + 1) + (Bx + C)(x – 2)


x = 2: 8 + 2 + 5 = A(5) + (2B + C)(0), 15 = 5A, and A = 3
x = 0: 0 + 0 + 5 = 3(1) + (0 + C)(0 – 2),
5 = 3 – 2C, 2 = –2C, and C = –1
x = 1: 2 + 1 + 5 = 3(1 + 1) + (1B – 1)(1 – 2),
8 = 6 + (B – 1)(–1) = 6 – B + 1,
8 = 7 – B, 1 = – B, and B = –1

Then the complete expansion is:

The preferred placement of the "minus" signs, either "inside" the
fraction or "in front", may vary from text to text. Just don't leave a
"minus" sign hanging loose underneath.
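The fractional part of that last expansion, (2x^2 + x + 5)/((x^2 + 1)(x − 2)) = 3/(x − 2) + (−x − 1)/(x^2 + 1), can be verified the same way. This Python sketch is my own check, not part of the original:

```python
from fractions import Fraction

A, B, C = 3, -1, -1  # coefficients found above

def lhs(x):
    return (2 * x**2 + x + 5) / ((x**2 + 1) * (x - 2))

def rhs(x):
    return A / (x - 2) + (B * x + C) / (x**2 + 1)

for x in [Fraction(0), Fraction(1), Fraction(3), Fraction(-1)]:
    assert lhs(x) == rhs(x)
```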

Integration of Rational functions: Example1

Find

Solution. Since the degree of the numerator is higher than the


denominator, we should perform the long-division. We get

which implies

We concentrate on the fraction (x + 2)/((x + 1)(x – 1)). The partial decomposition technique gives

This gives x + 2 = A(x-1) + B(x+1). If we substitute x=1, we get B = 3/2
and we substitute x=-1, we get A = -1/2. Therefore, we have

Now we are in position to perform the desired integration. Indeed, we


have

Since we have ,

we get
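Both steps above — the decomposition (x + 2)/((x + 1)(x − 1)) = (−1/2)/(x + 1) + (3/2)/(x − 1) and the logarithmic antiderivative it leads to — can be spot-checked in Python. The sketch below is my own illustration, not part of the source:

```python
import math
from fractions import Fraction

A, B = Fraction(-1, 2), Fraction(3, 2)

# Decomposition check with exact arithmetic:
for x in [Fraction(2), Fraction(3), Fraction(-3)]:
    assert (x + 2) / ((x + 1) * (x - 1)) == A / (x + 1) + B / (x - 1)

# The corresponding antiderivative piece is
#   F(x) = -(1/2) ln|x + 1| + (3/2) ln|x - 1|,
# and a central finite difference of F should recover the integrand:
F = lambda x: -0.5 * math.log(abs(x + 1)) + 1.5 * math.log(abs(x - 1))
x, h = 2.0, 1e-6
integrand = (x + 2) / ((x + 1) * (x - 1))
assert abs((F(x + h) - F(x - h)) / (2 * h) - integrand) < 1e-6
```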

Integration of Rational functions: Example2

Evaluate

Solution: First let us find the antiderivative

The main idea is first try to come up with the derivative of in

the top (or the numerator). Since , we have

Hence

Since we have (see the table of Basic Formulas for Integrating Rational
Functions)

and

we get

Therefore, we have

Easy calculations give

CHAPTER 10: HYPERBOLIC FUNCTIONS

The hyperbolic functions sinh x, cosh x, tanh x, csch x, sech x, coth x (hyperbolic sine, hyperbolic cosine, hyperbolic tangent, hyperbolic cosecant, hyperbolic secant, and hyperbolic cotangent) are analogs of the circular functions, defined by removing the i's appearing in the complex exponentials. For example,

(1)   cos x = (e^(ix) + e^(−ix))/2

so

(2)   cosh x = (e^x + e^(−x))/2

Note that alternate notations are sometimes used, as summarized in the


following table.

alternate notations

sinh x: sh x (Gradshteyn and Ryzhik 2000, p. xxvii)
cosh x: ch x (Gradshteyn and Ryzhik 2000, p. xxvii)
tanh x: th x (Gradshteyn and Ryzhik 2000, p. xxvii)
coth x: cth x (Gradshteyn and Ryzhik 2000, p. xxvii)

The hyperbolic functions share many properties with the corresponding


circular functions. In fact, just as the circle can be represented
parametrically by

(3)   x = cos t

(4)   y = sin t

a rectangular hyperbola (or, more specifically, its right branch) can be
analogously represented by

(5)   x = cosh t

(6)   y = sinh t

where cosh t is the hyperbolic cosine and sinh t is the hyperbolic sine.

The hyperbolic functions arise in many problems of mathematics and mathematical physics in which integrals involving √(1 + x^2) arise (whereas the circular functions involve √(1 − x^2)). For instance, the hyperbolic sine


arises in the gravitational potential of a cylinder and the calculation of
the Roche limit. The hyperbolic cosine function is the shape of a
hanging cable (the so-called catenary). The hyperbolic tangent arises in
the calculation of and rapidity of special relativity. All three appear in
the Schwarzschild metric using external isotropic Kruskal coordinates in
general relativity. The hyperbolic secant arises in the profile of a laminar
jet. The hyperbolic cotangent arises in the Langevin function for
magnetic polarization.

The hyperbolic functions are defined by

(7)   sinh x = (e^x − e^(−x))/2

(8)   cosh x = (e^x + e^(−x))/2

(9)   tanh x = sinh x / cosh x

(10)  tanh x = (e^x − e^(−x))/(e^x + e^(−x))

(11)  csch x = 1/sinh x

(12)  csch x = 2/(e^x − e^(−x))

(13)  sech x = 1/cosh x

(14)  sech x = 2/(e^x + e^(−x))

(15)  coth x = cosh x / sinh x

(16)  coth x = (e^x + e^(−x))/(e^x − e^(−x))
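A quick numeric sanity check of the exponential definitions sinh x = (e^x − e^(−x))/2, cosh x = (e^x + e^(−x))/2, tanh x = sinh x / cosh x — a Python sketch of my own, not part of the source — compares them with the standard library and confirms the "Pythagorean" identity cosh^2 x − sinh^2 x = 1:

```python
import math

# Exponential definitions of the basic hyperbolic functions:
sinh = lambda x: (math.exp(x) - math.exp(-x)) / 2
cosh = lambda x: (math.exp(x) + math.exp(-x)) / 2
tanh = lambda x: sinh(x) / cosh(x)

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert math.isclose(sinh(x), math.sinh(x), abs_tol=1e-12)
    assert math.isclose(cosh(x), math.cosh(x))
    assert math.isclose(tanh(x), math.tanh(x), abs_tol=1e-12)
    assert math.isclose(cosh(x)**2 - sinh(x)**2, 1.0)
```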

For arguments multiplied by i,

(17)  sinh(ix) = i sin x

(18)  cosh(ix) = cos x

The hyperbolic functions satisfy many identities analogous to the


trigonometric identities (which can be inferred using Osborn's rule) such
as

(19)  cosh^2 x − sinh^2 x = 1

(20)  cosh x + sinh x = e^x

(21)  cosh x − sinh x = e^(−x)

See also Beyer (1987, p. 168).

Some half-angle formulas are

(22)

(23)

where .

Some double-angle formulas are

(24)  sinh(2x) = 2 sinh x cosh x

(25)  cosh(2x) = cosh^2 x + sinh^2 x

(26)  cosh(2x) = 2 cosh^2 x − 1 = 1 + 2 sinh^2 x

Identities for complex arguments include

(27)  sinh(x + iy) = sinh x cos y + i cosh x sin y

(28)  cosh(x + iy) = cosh x cos y + i sinh x sin y

The absolute squares for complex arguments are

(29)  |sinh(x + iy)|^2 = sinh^2 x + sin^2 y

The hyperbolic functions enjoy properties similar to the trigonometric
functions; their definitions, though, are much more straightforward:

Here are their graphs: the function cosh (pronounce: "kosh") is pictured in red, the function sinh (rhymes with the "Grinch") is depicted in blue.

As with their trigonometric counterparts, the function cosh is even, while the function sinh is odd.

Their most important property is their version of the Pythagorean Theorem: cosh^2 x − sinh^2 x = 1.


The verification is straightforward:

While x = cos t, y = sin t parametrizes the unit circle, the hyperbolic functions x = cosh t, y = sinh t parametrize the standard hyperbola x^2 − y^2 = 1 (its right branch, x ≥ 1).

In the picture below, the standard hyperbola is depicted in red, while the point (cosh t, sinh t) for various values of the parameter t is pictured in blue.

The other hyperbolic functions are defined in the same way as the rest of the trigonometric functions:

tanh x coth x

sech x csch x

For every formula for the trigonometric functions, there is a similar (not necessarily identical) formula for the hyperbolic functions:

Let's consider for example the addition formula for the hyperbolic cosine
function:

Start with the right side and multiply out:

Prove the addition formula for the hyperbolic sine function:

Show that .

You want to show that .

Start with the right side and multiply out:

The hyperbolic sine function is a one-to-one function, and thus has an inverse. As usual, we obtain the graph of the inverse hyperbolic sine function sinh^(−1) x (also denoted by arcsinh x) by reflecting the graph of sinh x about the line y = x:

Since sinh x is defined in terms of the exponential function, you should not be surprised that its inverse function can be expressed in terms of the logarithmic function:

Let's set y = sinh x, and try to solve for x:

This is a quadratic equation with e^x instead of x as the variable; y will be considered a constant.

So using the quadratic formula, we obtain

Since e^x > 0 for all x, and since y − √(y^2 + 1) < 0 for all y, we have to discard the solution with the minus sign, so

and consequently

Read that last sentence again slowly!


We have found out that sinh^(−1)(y) = ln(y + √(y^2 + 1)).
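The formula derived here, sinh^(−1)(y) = ln(y + √(y^2 + 1)), is easy to spot-check; the following Python sketch (mine, for illustration) compares it with the library's asinh and confirms that it really inverts sinh:

```python
import math

arcsinh = lambda y: math.log(y + math.sqrt(y * y + 1))

for y in [-5.0, -1.0, 0.0, 0.5, 10.0]:
    assert math.isclose(arcsinh(y), math.asinh(y), abs_tol=1e-12)
    assert math.isclose(math.sinh(arcsinh(y)), y, abs_tol=1e-12)
```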

You know what's coming up, don't you? Here's the graph. Note that the hyperbolic cosine function is not one-to-one, so let's restrict the domain to x ≥ 0.

Here it is: Express the inverse hyperbolic cosine functions in terms of


the logarithmic function!

10.1. EXPONENTIAL AND HYPERBOLIC DISCOUNTING

Hyperbolic and exponential discounting

I'd never heard of hyperbolic discounting and it looked mathematically


interesting, so I followed the link. The article is a case of someone using
a mathematical term but not being quite sure what it means.

The author says early in the article:

Since the phrase hyperbolic discounting is despicable jargon, let me


explain it in terms that even I can understand.

Well, I don't know that it's "despicable" jargon (this is the kind of
statement that discourages people from being comfortable with
mathematics), and it worried me that a mathematical term was being
maligned, so I read on.

Immediate rewards

Question: If I offered you $100 today, or $105 in one month from now,
which would you choose?

Most people choose the immediate reward, because waiting for a month
for a bit more doesn't seem to make sense.

In plain English, that's all hyperbolic discounting really means. Most


people would choose to claim a reward right now, rather than a larger

reward some time in the future. It's what marketers use all the time to
encourage you to buy now.

The author correctly identifies an exponential curve and uses the


following graph as an example (without saying what its relevance is, or
indicating what the axes represent):

[Fig 1: Image source]

Then he immediately states the following, suggesting some relationship


between the exponential curve above, and hyperbolic curves. He simply
says:

This is a hyperbolic curve:

[Fig 2: Image source]

That's when it started to get interesting for me. The subject matter of the
graph suggested a physics application, rather than some behavioural
finance model. I began to suspect the author didn't really know what was
going on in this topic (or certainly didn't understand the example graph).

Exponential decay (increasing form) graphs

At first glance, this graph looked like an exponential decay (increasing


form) to me. An exponential decay (increasing form) curve describes
situations where a quantity grows quickly at first, then levels out to some
limiting value. It is the mirror image of an exponentially decaying
quantity. An example of such a curve arises in terminal velocity
problems. A skydiver's velocity starts at zero, builds rapidly to around
half of their final velocity, and as air resistance builds up, the velocity
more gradually builds to some terminal value (usually around 200
km/h.)

[Figure 3: Velocity of a skydiver, V = 225(1 − e^(−t))]

Another case is the charge in a circuit containing a capacitor (see Example 1 on that page). Here's the graph of the charge from that example, given by q = 2(1 − e^(−10t)).

[Fig 4: Charge in a capacitor]

The curve rises fairly quickly then flattens out to some limiting value
(which in this case is 2 coulombs).

It's called an exponential "decay" curve because the power of e is negative. We've just flipped the exponential decay curve (that is, reflected it in a horizontal axis) from its usual shape (steeply descending at first, then levelling off).

The Vmax term in the hyperbolic curve above (Figure 2) was what made
me think of these exponential curves.

You can see the hyperbolic curve is similar to our "flipped" exponential
decay ones, but not identical. The hyperbolic curve appears to have
infinite slope at S = 0, whereas the exponential ones do not.

Hyperbolic curves

I didn't accept that Figure 2 was a hyperbolic curve at first because such
curves usually have 2 arms, something like this:

[Fig 5: Rectangular hyperbola, xy = 1]

Enzyme Studies

The curve given in Figure 2 is not about behavioral economics at all (nor
is it about physics, as I suspected). It actually represents the Michaelis-
Menten equation, which arises in the study of enzymes. Here's what the
original page (where the Figure 2 graph came from) said:

The Michaelis-Menten equation is a quantitative description of the


relationship among the rate of an enzyme-catalyzed reaction [v1], the
concentration of substrate [S] and two constants, Vmax (maximum
reaction rate) and Km (a constant).

Here is the formula for V:

The "velocity" refers to the speed of the reaction. For your convenience,
here is the curve for Vmax (Figure 2) we saw earlier:

They go on to say:

The Michaelis-Menten equation has the same form as the equation for a
rectangular hyperbola; graphical analysis of reaction rate (V) versus
substrate concentration [S] produces a hyperbolic rate plot.

Let's try it out with some simple constant values. I chose Vmax = 6, and
Km = 1.7.

The formula becomes:

Here is the graph of the hyperbola, showing the two arms I talked about
before.

But if we restrict our values of S such that S > 0, we obtain a curve that
looks a lot like Figure 2. The slope at S = 0 is not infinite, but it is quite
steep.
So the graph referred to in the marketing article is indeed hyperbolic.
The actual choice for their example was unfortunate, because the
variables seemed to have nothing to do with economics (they don't), and
I was no wiser about what was going on.

Comparison with exponential decay curves

Let's now see how close the hyperbolic curve is to a similar exponential one. I graphed V = 6(1 − e^(−0.2S)) (chosen so it has the same maximum value), and here's the result:

Note the upper limit is the same as my hyperbolic curve, (Vmax = 6). The
shape of the two curves is similar, but not exactly the same. Here are the
2 graphs on the same set of axes:
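That comparison can be reproduced numerically. In the Python sketch below (my own; Vmax = 6 and Km = 1.7 are the constants chosen earlier, and 0.2 is the rate used for the matched exponential), both curves approach the same limit, but the hyperbolic one rises much more steeply near S = 0:

```python
import math

Vmax, Km = 6.0, 1.7
hyp = lambda S: Vmax * S / (Km + S)                    # Michaelis-Menten (hyperbolic)
exp_curve = lambda S: Vmax * (1 - math.exp(-0.2 * S))  # matched exponential

# Both approach the same limiting value Vmax = 6:
assert abs(hyp(1e6) - Vmax) < 1e-3
assert abs(exp_curve(100.0) - Vmax) < 1e-3

# ...but the initial slopes differ: Vmax/Km ~ 3.53 versus 0.2*Vmax = 1.2.
h = 1e-6
slope_hyp = (hyp(h) - hyp(0.0)) / h
slope_exp = (exp_curve(h) - exp_curve(0.0)) / h
assert slope_hyp > slope_exp
```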

Explanation

At the beginning of this post, I asked if you wanted $100 now or $105 in
a month. In most countries currently, the interest and inflation rates
mean it would be much better for you to wait for the month, as you
would make 5% on your money. At that monthly rate, you could make
around 80% per year!

Most people intuitively "discount" future money. That is, they know that
if they wait, they should get some reward for waiting. Now
mathematically, that reward grows exponentially (not as steep at the
beginning), because interest on money grows exponentially. But in our
minds, we want it now. That is, it's more like the hyperbolic case, where

we are impatient and want a higher reward for waiting even a short
while. Our impatience can be represented by the hyperbolic curve.

Studies on peoples' (and animals') investing and savings habits have


shown this to be the case. In one experiment, a group of subjects was
offered $15 now, or they could wait and get more money later. The
average responses were: One month later $20, one year later $50, and
ten years later $100. These perceived "wait for reward" data follow the
hyperbolic curve more closely than the exponential one. [Study by
Thaler, R. (2005), Advances in Behavioral Economics, Russel Sage
Foundation]

Applications

Hyperbolic discounting has many implications in the areas of savings


rates (small pain now for future gain), climate change (impact of energy
policy now on future environmental conditions), attitudes to health
screening (some discomfort now for future health), lifestyle choices
(amount of exercise now for reducing obesity) and behavior due to
weather predictions (how many crops to plant for future benefit).

Conclusion

Next time some salesperson is pressuring you to buy something right


now, ask him about the difference between exponential and hyperbolic

discounting. It will give you some breathing time to work out whether it
really is better to buy now, or wait.

Derivatives of Hyperbolic Functions

The last set of functions that we're going to be looking at in this chapter are the hyperbolic functions. In many physical situations combinations of e^x and e^(−x) arise fairly often. Because of this these combinations are given names. There are six hyperbolic functions and they are defined as follows.

Here are the graphs of the three main hyperbolic functions.

We also have the following facts about the hyperbolic functions.

You’ll note that these are similar, but not quite the same, to some of the
more common trig identities so be careful to not confuse the identities
here with those of the regular trig functions.

Because the hyperbolic functions are defined in terms of exponential functions, finding their derivatives is fairly simple provided you've already read through the next section. We haven't yet, however, so we'll need the following formula, which can be easily proved after we've covered the next section.

With this formula we’ll do the derivative for hyperbolic sine and leave
the rest to you as an exercise.

For the rest we can either use the definition of the hyperbolic function
and/or the quotient rule. Here are all six derivatives.

Here are a couple of quick derivatives using hyperbolic functions.

Example 1 Differentiate each of the following functions.

(a)

(b)

Solution

(a)

(b)

10.2. PHASORS IN ELECTRICAL CIRCUITS

How to Use Phasors for Circuit Analysis

A phasor is a complex number in polar form that you can apply to circuit
analysis. When you plot the amplitude and phase shift of a sinusoid in a
complex plane, you form a phase vector, or phasor.

As you might remember from algebra class, a complex number consists


of a real part and an imaginary part. For circuit analysis, think of the real
part as tying in with resistors that get rid of energy as heat and the
imaginary part as relating to stored energy, like the kind found in
inductors and capacitors.

You can also think of a phasor as a rotating vector. Unlike a vector


having magnitude and direction, a phasor has magnitude VA and angular
displacement ϕ. You measure angular displacement in the
counterclockwise direction from the positive x-axis.

Here is a diagram of a voltage phasor as a rotating vector at some


frequency, with its tail at the origin. If you need to add or subtract
phasors, you can convert the vector into its x-component (VA cos ϕ) and
its y-component (VA sin ϕ) with some trigonometry.

The following sections explain how to find the different forms of
phasors and introduce you to the properties of phasors.

Find phasor forms

Phasors, which you describe with complex numbers, embody the


amplitude and phase of a sinusoidal voltage or current. The phase is the
angular shift of the sinusoid, which corresponds to a time shift t0. So if
you have cos[ω(t – t0)], then ωt0 = ϕ0, where ϕ0 is the angular phase
shift.

To establish a connection between complex numbers and sine and cosine


waves, you need the complex exponential e^(jθ) and Euler's formula:

e^(jθ) = cos θ + j sin θ

where

j = √-1

The left side of Euler’s formula is the polar phasor form, and the right
side is the rectangular phasor form. You can write the cosine and sine as
follows:

cos θ = Re[e^(jθ)]

sin θ = Im[e^(jθ)]

In the equations shown here, Re[ ] denotes the real part of a complex
number, and Im[ ] denotes the imaginary part of a complex number.

Here is a cosine function and a shifted cosine function with a phase shift
of π/2.

In general, for the sinusoids shown here, you have an amplitude VA, a
radian frequency ω, and a phase shift of ϕ given by the following
expression:

Because the radian frequency ω remains the same in a linear circuit, a


phasor just needs the amplitude VA and the phase ϕ to get into polar
form:

V = VA e^(jϕ)

To describe a phasor, you need only the amplitude and phase shift (not
the radian frequency). Using Euler’s formula, the rectangular form of the
phasor is

V = VA cos ϕ + j VA sin ϕ
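With complex arithmetic, converting between the polar and rectangular phasor forms is a one-line computation. Here is a Python sketch of my own (the sample amplitude and phase are arbitrary choices, not from the text):

```python
import cmath
import math

VA, phi = 10.0, math.pi / 3   # example amplitude and phase shift

# Polar form V = VA * e^(j*phi):
V = VA * cmath.exp(1j * phi)

# ...equals the rectangular form VA*cos(phi) + j*VA*sin(phi):
assert math.isclose(V.real, VA * math.cos(phi))
assert math.isclose(V.imag, VA * math.sin(phi))

# Magnitude and angle recover the original amplitude and phase:
assert math.isclose(abs(V), VA)
assert math.isclose(cmath.phase(V), phi)
```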

Examine the properties of phasors

One key phasor property is the additive property. If you add sinusoids
that have the same frequency, then the resulting phasor is simply the
vector sum of the phasors — just like adding vectors:

V = V1 + V2 + … + VN

For this equation to work, phasors V1, V2, …,VN must have the same
frequency. You find this property useful when using Kirchhoff’s laws.

Another vital phasor property is the time derivative. The time derivative
of a sine wave is another scaled sine wave with the same frequency.
Taking the derivative of phasors is an algebraic multiplication of jω in
the phasor domain. First, you relate the phasor of the original sine wave
to the phasor of the derivative:

But the derivative of a complex exponential is another exponential


multiplied by jω:

Based on the phasor definition, the quantity (jωV) is the phasor of the
time derivative of a sine wave phasor V. Rewrite the phasor jωV as

When taking the derivative, you multiply the amplitude VA by ω and shift the phase angle by 90°, or equivalently, you multiply the original sine wave by jω. See how the imaginary number j rotates a phasor by 90°?

Working with capacitors and inductors involves derivatives because


things change over time. For capacitors, how quickly a capacitor voltage
changes directs the capacitor current. For inductors, how quickly an
inductor current changes controls the inductor voltage.
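The jω property can be checked numerically: build the phasor of v(t) = VA cos(ωt + ϕ), form jωV, and compare the resulting sinusoid against a finite-difference derivative of v(t). This Python sketch is my own illustration (the example values are arbitrary):

```python
import cmath

VA, omega, phi = 2.0, 50.0, 0.4
V = VA * cmath.exp(1j * phi)  # phasor of v(t) = VA*cos(omega*t + phi)

v = lambda t: (V * cmath.exp(1j * omega * t)).real
dv = lambda t: (1j * omega * V * cmath.exp(1j * omega * t)).real  # phasor j*omega*V

h = 1e-8
for t in [0.0, 0.01, 0.037]:
    numeric = (v(t + h) - v(t - h)) / (2 * h)
    assert abs(numeric - dv(t)) < 1e-3
```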

CHAPTER 11: DIFFERENTIAL CALCULUS

Differentiation Formulas

In the first section of this chapter we saw the definition of the derivative
and we computed a couple of derivatives using the definition. As we
saw in those examples there was a fair amount of work involved in
computing the limits and the functions that we worked with were not
terribly complicated.

For more complex functions using the definition of the derivative would
be an almost impossible task. Luckily for us we won’t have to use the
definition terribly often. We will have to use it on occasion, however we
have a large collection of formulas and properties that we can use to
simplify our life considerably and will allow us to avoid using the
definition whenever possible.

We will introduce most of these formulas over the course of the next
several sections. We will start in this section with some of the basic
properties and formulas. We will give the properties and formulas in
this section in both “prime” notation and “fraction” notation.

Properties

1)

OR

In other words, to differentiate a sum or difference all we need to


do is differentiate the individual terms and then put them back
together with the appropriate signs. Note as well that this property
is not limited to two functions.

See the Proof of Various Derivative Formulas section of the Extras


chapter to see the proof of this property. It’s a very simple proof

using the definition of the derivative.

2)

OR

, c is any number

In other words, we can “factor” a multiplicative constant out of a


derivative if we need to. See the Proof of Various Derivative
Formulas section of the Extras chapter to see the proof of this
property.

Note that we have not included formulas for the derivative of products
or quotients of two functions here. The derivative of a product or
quotient of two functions is not the product or quotient of the derivatives
of the individual pieces. We will take a look at these in the next
section.

Next, let’s take a quick look at a couple of basic “computation” formulas


that will allow us to actually compute some derivatives.

Formulas

1) If then

OR

The derivative of a constant is zero. See the Proof of Various


Derivative Formulas section of the Extras chapter to see the proof
of this formula.

2) If then

OR

, n is any number.

This formula is sometimes called the power rule. All we are


doing here is bringing the original exponent down in front and
multiplying and then subtracting one from the original exponent.

Note as well that in order to use this formula n must be a number, it can't be a variable. Also note that the base, the x, must be a variable, it can't be a number. It will be tempting in some later sections to misuse the Power Rule when we run into some functions where the exponent isn't a number and/or the base isn't a variable.

See the Proof of Various Derivative Formulas section of the Extras chapter to see the proof of this formula. There are actually three different proofs in this section. The first two restrict the formula to n being an integer because at this point that is all that we can do. The third proof is for the general rule but does suppose that you've read most of this chapter.

These are the only properties and formulas that we’ll give in this
section. Let’s compute some derivatives using these properties.

Example 1 Differentiate each of the following functions.

(a)


(b)

(c)


(d)


(e)

Solution

(a)

In this case we have the sum and difference of four terms and so we will
differentiate each of the terms using the first property from above and
then put them back together with the proper sign. Also, for each term
with a multiplicative constant remember that all we need to do is
“factor” the constant out (using the second property) and then do the
derivative.

Notice that in the third term the exponent was a one and so upon subtracting 1 from the original exponent we get a new exponent of zero. Now recall that x^0 = 1. Don't forget to do any basic arithmetic that needs to be done such as any multiplication and/or division in the coefficients.


(b)

The point of this problem is to make sure that you deal with negative
exponents correctly. Here is the derivative.

Make sure that you correctly deal with the exponents in these cases,

especially the negative exponents. It is an easy mistake to “go the other
way” when subtracting one off from a negative exponent and get
instead of the correct .

(c)

Now in this function the second term is not correctly set up for us to use
the power rule. The power rule requires that the term be a variable to a
power only and the term must be in the numerator. So, prior to
differentiating we first need to rewrite the second term into a form that
we can deal with.

Note that we left the 3 in the denominator and only moved the variable
up to the numerator. Remember that the only thing that gets an
exponent is the term that is immediately to the left of the exponent. If
we’d wanted the three to come up as well we’d have written,

so be careful with this! It’s a very common mistake to bring the 3 up
into the numerator as well at this stage.

Now that we’ve gotten the function rewritten into a proper form that
allows us to use the Power Rule we can differentiate the function. Here
is the derivative for this part.

(d)

All of the terms in this function have roots in them. In order to use the
power rule we need to first convert all the roots to fractional exponents.
Again, remember that the Power Rule requires us to have a variable to a
number and that it must be in the numerator of the term. Here is the

function written in “proper” form.

In the last two terms we combined the exponents. You should always do
this with this kind of term. In a later section we will learn of a technique
that would allow us to differentiate this term without combining
exponents, however it will take significantly more work to do. Also
don’t forget to move the term in the denominator of the third term up to
the numerator. We can now differentiate the function.

Make sure that you can deal with fractional exponents. You will see a
lot of them in this class.

(e)

In all of the previous examples the exponents have been nice integers or
fractions. That is usually what we’ll see in this class. However, the
exponent only needs to be a number so don’t get excited about problems
like this one. They work exactly the same.

The answer is a little messy and we won’t reduce the exponents down to
decimals. However, this problem is not terribly difficult it just looks
that way initially.

There is a general rule about derivatives in this class that you will need
to get into the habit of using. When you see radicals you should always
first convert the radical to a fractional exponent and then simplify
exponents as much as possible. Following this rule will save you a lot of
grief in the future.
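Since converting radicals to fractional exponents puts everything under the power rule, one numeric check covers all of these cases. This Python sketch (mine, not from the text) compares a central finite difference of x^n with n·x^(n−1) for integer, negative, fractional, and even irrational exponents:

```python
import math

h = 1e-6
for n in [3, -2, 0.5, 1.5, math.pi]:
    for x in [0.5, 1.0, 4.0]:
        numeric = ((x + h)**n - (x - h)**n) / (2 * h)
        assert abs(numeric - n * x**(n - 1)) < 1e-4
```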

Back when we first put down the properties we noted that we hadn’t
included a property for products and quotients. That doesn’t mean that
we can’t differentiate any product or quotient at this point. There are
some that we can do.

Example 2 Differentiate each of the following functions.

(a)

(b)

Solution

(a)

In this function we can’t just differentiate the first term, differentiate the
second term and then multiply the two back together. That just won’t
work. We will discuss this in detail in the next section so if you’re not
sure you believe that hold on for a bit and we’ll be looking at that soon
as well as showing you an example of why it won’t work.

It is still possible to do this derivative however. All that we need
to do is convert the radical to fractional exponents (as we should
anyway) and
then multiply this through the parenthesis.

Now we can differentiate the function.

(b)

As with the first part we can't just differentiate the numerator and
the denominator and then put it back together as a fraction. Again, if
you're
not sure you believe this hold on until the next section and we’ll take a
more detailed look at this.

We can simplify this rational expression however as follows.

This is a function that we can differentiate.


So, as we saw in this example there are a few products and quotients that
we can differentiate. If we can first do some simplification the functions
will sometimes simplify into a form that can be differentiated using the
properties and formulas in this section.

Before moving on to the next section let’s work a couple of examples to


remind us once again of some of the interpretations of the derivative.

Example 3 Is increasing,
decreasing or not changing at ?

Solution

We know that the rate of change of a function is given by the functions


derivative so all we need to do is it rewrite the function (to deal with the
second term) and then take the derivative.

542
Note that we rewrote the last term in the derivative back as a fraction.
This is not something we’ve done to this point and is only being done
here to help with the evaluation in the next step. It’s often easier to do
the evaluation with positive exponents.

So, upon evaluating the derivative we get

So, at the derivative is negative and so the


function is decreasing at .

Example 4 Find the equation of the tangent line to

at .

Solution

We know that the equation of a tangent line is given by,

543
So, we will need the derivative of the function (don’t forget to get rid of
the radical).

Again, notice that we eliminated the negative exponent in the derivative


solely for the sake of the evaluation. All we need to do then is evaluate
the function and the derivative at the point in question,
.

The tangent line is then,

544
Example 5 The position of an object at any time t (in hours) is given
by,

Determine when the object is moving to the right and when the object is
moving to the left

Solution

The only way that we’ll know for sure which direction the object is
moving is to have the velocity in hand. Recall that if the velocity is
positive the object is moving off to the right and if the velocity is
negative then the object is moving to the left.

So, we need the derivative since the derivative is the velocity of the

545
object. The derivative is,

The reason for factoring the derivative will be apparent shortly.

Now, we need to determine where the derivative is positive and where


the derivative is negative. There are several ways to do this. The
method that I tend to prefer is the following.

Since polynomials are continuous we know from the Intermediate Value


Theorem that if the polynomial ever changes sign then it must have first
gone through zero. So, if we knew where the derivative was zero we
would know the only points where the derivative might change sign.

We can see from the factored form of the derivative that the derivative
will be zero at and . Let’s graph these

546
points on a number line.

Now, we can see that these two points divide the number line into three
distinct regions. In each of these regions we know that the derivative
will be the same sign. Recall the derivative can only change sign at the
two points that are used to divide the number line up into the regions.

Therefore, all that we need to do is to check the derivative at a test point


in each region and the derivative in that region will have the same sign
as the test point. Here is the number line with the test points and results
shown.

547
Here are the intervals in which the derivative is positive and negative.

We included negative t’s here because we could even though they may
not make much sense for this problem. Once we know this we also can
answer the question. The object is moving to the right and left in the
following intervals.

548
ƒ'(x) =
and if this limit exists

ƒ'(c) =

If ƒ is differentiable at x = c, then ƒ is continuous at x = c.

Differentiation Rules

General and Logarithmic Differentiation Rules

1. [cu] = cu' 2. [u v] = u' v' sum rule

product quotient
3. [uv] = uv' + vu' rule 4. [ ]= rule
power
5. [c] = 0 6. [un] = nun-1u' rule

7. [x] = 1 8. [ln u] =

10. [ƒ(g(x))] = ƒ' (g(x)) chain


9. [eu] = euu' rule
g' (x)

549
Derivatives of the Trigonometric Functions

1. [sin u] = (cos u)u' 2. [csc u] = -(csc u cot u)u'

3. [cos u] = -(sin u)u' 4. [sec u] = (sec u tan u)u'

5. [tan u] = (sec2 u)u' 6. [cot u] = -(csc2 u)u'

Derivatives of the Inverse Trigonometric Functions

1. [arcsin u] = 2. [arccsc u] =

3. [arccos u] = 4. [arcsec u] =

5. [arctan u] = 6. [arccot u] =

Implicit Differentiation

Implicit differentiation is useful in cases in which you cannot easily


solve for y as a function of x.

Exercise: Find for y3 + xy - 2y - x2 = -2

[y3 + xy - 2y - x2] = [-2]

3y2 + (x + y) - 2 - 2x = 0

(3y2 + x - 2) = 2x - y

550
=

Higher Order Derivatives

These are successive derivatives of ƒ(x). Using prime notation, the


second derivative of ƒ(x), ƒ''(x), is the derivative of ƒ'(x). The numerical
notation for higher order derivatives is represented by:

ƒ(n)(x) = y(n)

The second derivative is also indicated by .

Exercise: Find the third derivative of y = x5.

y' = 5x4
y'' = 20x3
y''' = 60x2

Derivatives of Inverse Functions

If y = ƒ(x) and x = ƒ-1(y) are differentiable inverse functions, then their


derivatives are reciprocals:

Logarithmic Differentiation

551
It is often advantageous to use logarithms to differentiate certain
functions.

1. Take ln of both sides

2. Differentiate

3. Solve for y'

4. Substitute for y

5. Simplify

Exercise:
Find for y =

ln y = [ln(x2 + 1) - ln(x2 - 1)]

y' =

y' =

Mean Value Theorem

If ƒ is continuous on [a, b] and differentiable on (a, b), then there exists a


number c in (a, b) such that

552
ƒ'(c) =

L'Hôpital's Rule

If lim ƒ(x)/g(x) is an indeterminate of the form 0/0 or , and if lim


ƒ'(x)/g'(x) exists, then

lim = lim

The indeterminate form 0 can be reduced to 0/0 or so that


L'Hôpital's Rule can be applied.

Note: L'Hôpital's Rule can be applied to the four different indeterminate


forms of :

, , , and

Exercise: What is ?
(A) 2
(B) 1
(C) 0
(D)
(E) The limit does not exist.

553
The answer is
B. =1

Tangent and Normal Lines

The derivative of a function at a point is the slope of the tangent line.


The normal line is the line that is perpendicular to the tangent line at the
point of tangency.

The slope of the normal line to the curve y = 2x2 + 1 at


Exercise:
(1, 3) is
(A) -1/12
(B) -1/4
(C) 1/12
(D) ¼
(E) 4

The answer is
y' = 4x
B.
y = 4(1) = 4
slope of normal = -1/4

Extreme Value Theorem

554
If a function ƒ(x) is continuous on a closed interval, then ƒ(x) has both a
maximum and minimum value in the interval.

Curve Sketching

Situation Indicates
ƒ'(c) > 0 ƒ increasing at c
ƒ'(c) < 0 ƒ decreasing at c
ƒ'(c) = 0 horizontal tangent at c
ƒ'(c) = 0, ƒ'(c-) < 0,
relative minimum at c
ƒ'(c+) > 0
ƒ'(c) = 0, ƒ'(c-) > 0,
relative maximum at c
ƒ'(c+) < 0
ƒ'(c) = 0, ƒ''(c) > 0 relative minimum at c
ƒ'(c) = 0, ƒ''(c) < 0 relative maximum at c
ƒ'(c) = 0, ƒ''(c) = 0 further investigation required
ƒ''(c) > 0 concave upward
ƒ''(c) < 0 concave downward
ƒ''(c) = 0 further investigation required
ƒ''(c) = 0, ƒ''(c-) < 0,
point of inflection
ƒ''(c+) > 0
ƒ''(c) = 0, ƒ''(c-) > 0, point of inflection

555
ƒ''(c+) < 0
ƒ(c) exists, ƒ'(c) does not possibly a vertical tangent; possibly an
exist absolute max. or min.

Newton's Method for Approximating Zeros of a Function

xn + 1 = xn -

To use Newton's Method, let x1 be a guess for one of the roots. Reiterate
the function with the result until the required accuracy is obtained.

Optimization Problems

Calculus can be used to solve practical problems requiring maximum or


minimum values.

A rectangular box with a square base and no top has a


Exercise:
volume of 500 cubic
inches. Find the dimensions for the box that require the
least amount of
material.

Let V = volume, S = surface area, x = length of base, and


h = height of box

V = x2h = 500

556
S = x2 + 4xh = x2 + 4x(500/x2) = x2 + (2000/x)
S' = 2x - (2000/x2) = 0
2x3 = 2000
x = 10, h = 5
Dimensions: 10 x 10 x 5 inches

Rates-of-Change Problems

Distance, Velocity, and Acceleration

y = s(t) position of a particle along a line at time t

v = s'(t) instantaneous velocity (rate of change) at time t

a = v'(t) = s''(t) instantaneous acceleration at time t

Related Rates of Change

Calculus can be used to find the rate of change of two or more variable
that are functions of time t by differentiating with respect to t.

A boy 5 feet tall walks at a rate of 3 feet/sec toward a


Exercise:
streetlamp that is
12 feet above the ground.
a) What is the rate of change of the tip of his shadow?
b) What is the rate of change of the length of his shadow?

557
b) = ft/sec a) = ft/sec

Note: the answers are independent of the distance from the light.

A conical tank 20 feet in diameter and 30 feet tall (with


Exercise:
vertex down)
leaks water at a rate of 5 cubic feet per hour. At what rate
is the water
level dropping when the water is 15 feet deep?

V= r2h h2

5= h2

V= h3 ft/hr

11.1. Limits

558
et’s first start off with the following “definition” of a limit.

Definition

We say that the limit of f(x) is L as x approaches a and write this as

provided we can make f(x) as close to L as we want for all x sufficiently


close to a, from both sides, without actually letting x be a.

This is not the exact, precise definition of a limit. If you would like to
see the more precise and mathematical definition of a limit you should
check out the The Definition of a Limit section at the end of this
chapter. The definition given above is more of a “working” definition.
This definition helps us to get an idea of just what limits are and what
they can tell us about functions.

559
So just what does this definition mean? Well let’s suppose that we know
that the limit does in fact exist. According to our “working” definition
we can then decide how close to L that we’d like to make f(x). For sake
of argument let’s suppose that we want to make f(x) no more that 0.001
away from L. This means that we want one of the following

Now according to the “working” definition this means that if we get x


sufficiently close to a we can make one of the above true. However, it
actually says a little more. It actually says that somewhere out there in
the world is a value of x, say X, so that for all x’s that are closer to a
than X then one of the above statements will be true.

This is actually a fairly important idea. There are many functions out
there in the world that we can make as close to L for specific values of x
that are close to a, but there will be other values of x closer to a that give

560
functions values that are nowhere near close to L. In order for a limit to
exist once we get f(x) as close to L as we want for some x then it will
need to stay in that close to L (or get closer) for all values of x that are
closer to a. We’ll see an example of this later in this section.

In somewhat simpler terms the definition says that as x gets closer and
closer to x=a (from both sides of course…) then f(x) must be getting
closer and closer to L. Or, as we move in towards x=a then f(x) must be
moving in towards L.

It is important to note once again that we must look at values of x that


are on both sides of x=a. We should also note that we are not allowed to
use x=a in the definition. We will often use the information that limits
give us to get some information about what is going on right at x=a, but
the limit itself is not concerned with what is actually going on at x=a.
The limit is only concerned with what is going on around the point x=a.
This is an important concept about limits that we need to keep in mind.

An alternative notation that we will occasionally use in denoting limits is

561
How do we use this definition to help us estimate limits? We do exactly
what we did in the previous section. We take x’s on both sides of x=a
that move in closer and closer to a and we plug these into our function.
We then look to see if we can determine what number the function
values are moving in towards and use this as our estimate.

Let’s work an example.

Example 1 Estimate the value of the following limit.

Solution

Notice that I did say estimate the value of the limit. Again, we are not
going to directly compute limits in this section. The point of this section
is to give us a better idea of how limits work and what they can tell us

562
about the function.

So, with that in mind we are going to work this in pretty much the same
way that we did in the last section. We will choose values of x that get
closer and closer to x=2 and plug these values into the function. Doing
this gives the following table of values.

x f(x) x f(x)
2.5 3.4 1.5 5.0
2.1 3.857142857 1.9 4.157894737
2.01 3.985074627 1.99 4.015075377
2.001 3.998500750 1.999 4.001500750
2.0001 3.999850007 1.9999 4.000150008
2.00001 3.999985000 1.99999 4.000015000

Note that we made sure and picked values of x that were on both sides of
and that we moved in very close to to
make sure that any trends that we might be seeing are in fact correct.

Also notice that we can’t actually plug in into the

563
function as this would give us a division by zero error. This is not a
problem since the limit doesn’t care what is happening at the point in
question.

From this table it appears that the function is going to 4 as x approaches


2, so

Let’s think a little bit more about what’s going on here. Let’s graph the
function from the last example. The graph of the function in the range
of x’s that were interested in is shown below.

First, notice that there is a rather large open dot at .


This is there to remind us that the function (and hence the graph) doesn’t
exist at .

564
As we were plugging in values of x into the function we are in effect
moving along the graph in towards the point as . This
is shown in the graph by the two arrows on the graph that are moving in
towards the point.

When we are computing limits the question that we are really asking is
what y value is our graph approaching as we move in towards
on our graph. We are NOT asking what y value the graph takes
at the point in question. In other words, we are asking what the graph is
doing around the point . In our case we can see that as
x moves in towards 2 (from both sides) the function is approaching
even though the function itself doesn’t even exist at
. Therefore we can say that the limit is in fact 4.

565
So what have we learned about limits? Limits are asking what the
function is doing around and are not concerned with
what the function is actually doing at . This is a good
thing as many of the functions that we’ll be looking at won’t even exist
at as we saw in our last example.

Let’s work another example to drive this point home.

Example 2 Estimate the value of the following limit.

566
Solution

The first thing to note here is that this is exactly the same function as the
first example with the exception that we’ve now given it a value for
. So, let’s first note that

As far as estimating the value of this limit goes, nothing has changed in
comparison to the first example. We could build up a table of values as
we did in the first example or we could take a quick look at the graph of
the function. Either method will give us the value of the limit.

Let’s first take a look at a table of values and see what that tells us.
Notice that the presence of the value for the function at
will not change our choices for x. We only choose values of x that are
getting closer to but we never take .
In other words the table of values that we used in the first example will
be exactly the same table that we’ll use here. So, since we’ve already
got it down once there is no reason to redo it here.

567
From this table it is again clear that the limit is,

The limit is NOT 6! Remember from the discussion after the first
example that limits do not care what the function is actually doing at the
point in question. Limits are only concerned with what is going on
around the point. Since the only thing about the function that we
actually changed was its behavior at this will not
change the limit.

Let’s also take a quick look at this function's graph to see if this says the
same thing.

568
Again, we can see that as we move in towards on our
graph the function is still approaching a y value of 4. Remember that we
are only asking what the function is doing around and
we don’t care what the function is actually doing at .
The graph then also supports the conclusion that the limit is,

Let’s make the point one more time just to make sure we’ve got it.
Limits are not concerned with what is going on at .
Limits are only concerned with what is going on around
. We keep saying this, but it is a very important concept about limits

569
that we must always keep in mind. So, we will take every opportunity to
remind ourselves of this idea.

Since limits aren’t concerned with what is actually happening at


we will, on occasion, see situations like the previous
example where the limit at a point and the function value at a point are
different. This won’t always happen of course. There are times where
the function value and the limit at a point are the same and we will
eventually see some examples of those. It is important however, to not
get excited about things when the function and the limit do not take the
same value at a point. It happens sometimes and so we will need to be
able to deal with those cases when they arise.

Let’s take a look another example to try and beat this idea into the
ground.

Example 3 Estimate the value of the following limit.

570
Solution

First don’t get excited about the θ in function. It’s just a letter, just like
x is a letter! It’s a Greek letter, but it’s a letter and you will be asked to
deal with Greek letters on occasion so it’s a good idea to start getting
used to them at this point.

Now, also notice that if we plug in θ=0 that we will get division by
zero and so the function doesn’t exist at this point. Actually, we get 0/0
at this point, but because of the division by zero this function does not
exist at θ=0.

So, as we did in the first example let’s get a table of values and see what
if we can guess what value the function is heading in towards.

1 0.45969769 -1 -0.45969769
0.1 0.04995835 -0.1 -0.04995835
0.01 0.00499996 -0.01 -0.00499996
0.001 0.00049999 -0.001 -0.00049999

Okay, it looks like the function is moving in towards a value of zero as

571
θ moves in towards 0, from both sides of course.

Therefore, the we will guess that the limit has the value,

So, once again, the limit had a value even though the function didn’t
exist at the point we were interested in.

It’s now time to work a couple of more examples that will lead us into
the next idea about limits that we’re going to want to discuss.

Example 4 Estimate the value of the following limit.

572
Solution

Let’s build up a table of values and see what’s going on with our
function in this case.

t f(t) t f(t)
1 -1 -1 -1
0.1 1 -0.1 1
0.01 1 -0.01 1
0.001 1 -0.001 1

Now, if we were to guess the limit from this table we would guess that
the limit is 1. However, if we did make this guess we would be wrong.
Consider any of the following function evaluations.

573
In all three of these function evaluations we evaluated the function at a
number that is less that 0.001 and got three totally different numbers.
Recall that the definition of the limit that we’re working with requires
that the function be approaching a single value (our guess) as t gets
closer and closer to the point in question. It doesn’t say that only some
of the function values must be getting closer to the guess. It says that all
the function values must be getting closer and closer to our guess.

To see what’s happening here a graph of the function would be


convenient.

From this graph we can see that as we move in towards


the function starts oscillating wildly and in fact the oscillations
increases in speed the closer to that we get. Recall from
our definition of the limit that in order for a limit to exist the function
must be settling down in towards a single value as we get closer to the

574
point in question.

This function clearly does not settle in towards a single number and so
this limit does not exist!

This last example points out the drawback of just picking values of x
using a table of function values to estimate the value of a limit. The
values of x that we chose in the previous example were valid and in fact
were probably values that many would have picked. In fact they were
exactly the same values we used in the problem before this one and they
worked in that problem!

When using a table of values there will always be the possibility that we
aren’t choosing the correct values and that we will guess incorrectly for
our limit. This is something that we should always keep in mind when
doing this to guess the value of limits. In fact, this is such a problem
that after this section we will never use a table of values to guess the
value of a limit again.

575
This last example also has shown us that limits do not have to exist. To
this point we’ve only seen limits that have existed, but that just doesn’t
always have to be the case.

Let’s take a look at one more example in this section.

Example 5 Estimate the value of the following limit.

Solution

This function is often called either the Heaviside or step function. We


could use a table of values to estimate the limit, but it’s probably just as
quick in this case to use the graph so let’s do that. Below is the graph of
this function.

576
We can see from the graph that if we approach from the
right side the function is moving in towards a y value of 1. Well
actually it’s just staying at 1, but in the terminology that we’ve been
using in this section it’s moving in towards 1…

Also, if we move in towards from the left the function is


moving in towards a y value of 0.

According to our definition of the limit the function needs to move in


towards a single value as we move in towards (from
both sides). This isn’t happening in this case and so in this example we
will also say that the limit doesn’t exist.

577
Note that the limit in this example is a little different from the previous
example. In the previous example the function did not settle down to a
single number as we moved in towards . In this example
however, the function does settle down to a single number as
on either side. The problem is that the number is different on
each side of . This is an idea that we’ll look at in a little
more detail in the next section.

LIMITS AND INFINITY

One of the mysteries of Mathematics seems to be the concept of


"infinity", usually denoted by the symbol . So what is ? It is
simply a symbol that represents large numbers. Indeed, numbers are of
three kinds: large, normal size, and small. The normal size numbers are
the ones that we have a clear feeling for. For example, what does a
trillion mean? That is a very large number. Also numbers involved in
macro-physics are very large numbers. Small numbers are usually used
in micro-physics. Numbers like 10-75 are very small. Being positive or
negative has special meaning depending on the problem at hand. The

common mistake is to say that is smaller than 0. While this may be

578
true according to the natural order on the real line in term of sizes,
is big, very big!

So when do we have to deal with and ? Easy: whenever you take


the inverse of small numbers, you generate large numbers and vice-
versa. Mathematically we can write this as:

Note that the inverse of a small number is a large number. So size-wise


there is no problem. But we have to be careful about the positive or
negative sign. We have to make sure we know whether a small number
is positive or negative. 0+ represents small positive numbers while 0-
represents small negative numbers. (Similarly, we will use e.g. 3+ to
denote numbers slightly bigger than 3, and 3- to denote numbers slightly
smaller than 3.) In other words, being more precise we have

Remark. Do not treat as ordinary numbers. These symbols do not

obey the usual rules of arithmetic, for instance, ,

, , etc.

579
Example. Consider the function

When , then . So

Note that when x gets closer to 3, then the points on the graph get closer
to the (dashed) vertical line x=3. Such a line is called a vertical
asymptote. For a given function f(x), there are four cases, in which
vertical asymptotes can present themselves:

(i)

580
; ;

(ii)

; ;

(iii)

; ;

(iv)

; ;

581
Next we investigate the behavior of functions when . We have

seen that . So for example, we have

In the next example, we show how this result is very useful.

Example. Consider the function

We have

which implies

582
Note that when x gets closer to (x gets large), then the points on the
graph get closer to the horizontal line y=2. Such a line is called a
horizontal asymptote.

In particular, we have

for any number a, and any positive number r, provided xr is defined. We


also have

583
For , we have to be careful about the definition of the power of
negative numbers. In particular, we have

for any natural number n.

Example. Consider the function

We have

So we have

Example. Consider the function

584
We have

and then

When x goes to , then x > 0, which implies that |x| = x. Hence

When x goes to , then x < 0, which implies that |x| = -x. Hence

585
Remark. Be careful! A common mistake is to assume that .

This is true if and false if x < 0.

Exercise 1. Find

Answer.

Hence

586
Exercise 2. Find

Answer.

We have

So

Since , then x < 0 which implies

587
Exercise 3. Find

Answer to Exercise 3

We have

Since

588
we get

Exercise 4. Find

Answer to Exercise 4

We have

Since

which implies

589
On the other hand, we have

where we used |x| = x since . Finally we have

which implies

Remark. You may think: Why make things so complicated? Isn't there
an easier way?

Unfortunately, the answer is no! Indeed, we have

590
But the difference of two very large numbers may not be small. In fact,

is left undefined. In our last example it turned out that

, but you can easily come up with other examples where

turns out to be some other number.

Exercise 5. Find the vertical and horizontal asymptotes for the graph of

Answer to Exercise 5

The vertical asymptotes are found at the "bad" points, i.e. the points,
which are not the domain. In this case, we have one "bad" point, where
the denominator equals zero: x=2. We have

and

591
So x=2 is a vertical asymptote. On the other hand, we have

and

So y=1 and y= -1 are horizontal asymptotes.

592
11.2.THE DERIVATIVE

The Definition of the Derivative

In the first section of the last chapter we saw that the computation of the
slope of a tangent line, the instantaneous rate of change of a function,
and the instantaneous velocity of an object at all
required us to compute the following limit.

We also saw that with a small change of notation this limit could also be
written as,

(1)

This is such an important limit and it arises in so many places that we


give it a name. We call it a derivative. Here is the official definition of
the derivative.

593
The derivative of with respect to x is the function

and is defined as,

(2)

Note that we replaced all the a’s in (1) with x’s to acknowledge the fact

that the derivative is really a function as well. We often “read”

as “f prime of x”.

Let’s compute a couple of derivatives using the definition.

Example 1 Find the derivative of the following function using the


definition of the derivative.

594
Solution

So, all we really need to do is to plug this function into the definition of
the derivative, (1), and do some algebra. While, admittedly, the algebra
will get somewhat unpleasant at times, but it’s just algebra so don’t get
excited about the fact that we’re now computing derivatives.

First plug the function into the definition of the derivative.

Be careful and make sure that you properly deal with parenthesis when

595
doing the subtracting.

Now, we know from the previous chapter that we can’t just plug in
since this will give us a division by zero error. So we
are going to have to do some work. In this case that means multiplying
everything out and distributing the minus sign through on the second
term. Doing this gives,

Notice that every term in the numerator that didn’t have an h in it


canceled out and we can now factor an h out of the numerator which will
cancel against the h in the denominator. After that we can compute the
limit.

596
So, the derivative is,

Example 2 Find the derivative of the following function using the


definition of the derivative.

Solution

This one is going to be a little messier as far as the algebra goes.


However, outside of that it will work in exactly the same manner as the

597
previous examples. First, we plug the function into the definition of the
derivative,

Note that we changed all the letters in the definition to match up with the
given function. Also note that we wrote the fraction a much more
compact manner to help us with the work.

As with the first problem we can’t just plug in . So we


will need to simplify things a little. In this case we will need to combine
the two terms in the numerator into a single rational expression as
follows.

598
Before finishing this let’s note a couple of things. First, we didn’t
multiply out the denominator. Multiplying out the denominator will just
overly complicate things so let’s keep it simple. Next, as with the first
example, after the simplification we only have terms with h’s in them

599
left in the numerator and so we can now cancel an h out.

So, upon canceling the h we can evaluate the limit and get the
derivative.

The derivative is then,

600
Example 3 Find the derivative of the following function using the
definition of the derivative.

Solution

First plug into the definition of the derivative as we’ve done with the
previous two examples.

In this problem we’re going to have to rationalize the numerator. You


do remember rationalization from an Algebra class right? In an Algebra
class you probably only rationalized the denominator, but you can also
rationalize numerators. Remember that in rationalizing the numerator

601
(in this case) we multiply both the numerator and denominator by the
numerator except we change the sign between the two terms. Here’s the
rationalizing work for this problem,

602
Again, after the simplification we have only h’s left in the numerator.
So, cancel the h and evaluate the limit.

603
And so we get a derivative of,

Let’s work one more example. This one will be a little different, but it’s
got a point that needs to be made.

Example 4 Determine for

Solution

Since this problem is asking for the derivative at a specific point we’ll
go ahead and use that in our work. It will make our life easier and that’s
always a good thing.

So, plug into the definition and simplify.

604
We saw a situation like this back when we were looking at limits at
infinity. As in that section we can’t just cancel the h’s. We will have to
look at the two one sided limits and recall that

The two one-sided limits are different and so

doesn’t exist. However, this is the limit that gives us the derivative that
we’re after.

If the limit doesn’t exist then the derivative doesn’t exist either.

In this example we have finally seen a function for which the derivative
doesn’t exist at a point. This is a fact of life that we’ve got to be aware
of. Derivatives will not always exist. Note as well that this doesn’t say
anything about whether or not the derivative exists anywhere else. In
fact, the derivative of the absolute value function exists at every point
except the one we just looked at, x=0.

The preceding discussion leads to the following definition.

Definition

A function is called differentiable at

if exists and is called


differentiable on an interval if the derivative exists for each point in that
interval.

The next theorem shows us a very nice relationship between functions


that are continuous and those that are differentiable.

Theorem

If is differentiable at then

is continuous at .

See the Proof of Various Derivative Formulas section of the Extras


chapter to see the proof of this theorem.

Note that this theorem does not work in reverse. Consider

and take a look at,

So, is continuous at

but we’ve just shown above in Example 4 that

is not differentiable at .

Alternate Notation

Next we need to discuss some alternate notation for the derivative. The
typical derivative notation is the “prime” notation. However, there is
another notation that is used on occasion so let’s cover that.

Given a function all of the following are

equivalent and represent the derivative of with


respect to x.

Because we also need to evaluate derivatives on occasion we also need a
notation for evaluating derivatives when using the fractional notation.
So if we want to evaluate the derivative at x=a all of the following are
equivalent.

Note as well that on occasion we will drop the (x) part on the function to
simplify the notation somewhat. In these cases the following are
equivalent.

As a final note in this section we’ll acknowledge that computing most


derivatives directly from the definition is a fairly complex (and
sometimes painful) process filled with opportunities to make mistakes.
In a couple of sections we’ll start developing formulas and/or properties
that will help us to take the derivative of many of the common functions
so we won’t need to resort to the definition of the derivative too often.

Rules of Differentiation of Functions in Calculus

The basic rules of differentiation of functions in calculus are presented
along with several examples.

1 - Derivative of a constant function.

The derivative of f(x) = c where c is a constant is given by

f '(x) = 0
Example

f(x) = -10, then f '(x) = 0

2 - Derivative of a power function (power rule).

The derivative of f(x) = x^r where r is a constant real number is given
by

f '(x) = r x^(r-1)

Example

f(x) = x^(-2), then f '(x) = -2 x^(-3) = -2 / x^3
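The power rule is easy to confirm with a computer algebra system. This quick sketch (Python with SymPy, my own check rather than part of the text) differentiates the example above:

```python
# Sketch: verifying the power-rule example f(x) = x^(-2).
import sympy as sp

x = sp.symbols('x')
fprime = sp.diff(x**-2, x)
print(fprime)  # -2/x**3
```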

3 - Derivative of a function multiplied by a constant.


The derivative of f(x) = c g(x) is given by

f '(x) = c g '(x)

Example

f(x) = 3x^3,

let c = 3 and g(x) = x^3, then f '(x) = c g '(x)

= 3 (3x^2) = 9x^2

4 - Derivative of the sum of functions (sum rule).


The derivative of f(x) = g(x) + h(x) is given by

f '(x) = g '(x) + h '(x)


Example

f(x) = x^2 + 4

let g(x) = x^2 and h(x) = 4, then f '(x) = g '(x) + h '(x) = 2x + 0 = 2x

5 - Derivative of the difference of functions.

The derivative of f(x) = g(x) - h(x) is given by

f '(x) = g '(x) - h '(x)


Example

f(x) = x^3 - x^(-2)

let g(x) = x^3 and h(x) = x^(-2), then

f '(x) = g '(x) - h '(x) = 3x^2 - (-2x^(-3)) = 3x^2 + 2x^(-3)

6 - Derivative of the product of two functions (product rule).


The derivative of f(x) = g(x) h(x) is given by

f '(x) = g(x) h '(x) + h(x) g '(x)


Example

f(x) = (x^2 - 2x)(x - 2)

let g(x) = (x^2 - 2x) and h(x) = (x - 2), then
f '(x) = g(x) h '(x) + h(x) g '(x) = (x^2 - 2x)(1) + (x - 2)(2x - 2)
= x^2 - 2x + 2x^2 - 6x + 4 = 3x^2 - 8x + 4

7 - Derivative of the quotient of two functions (quotient rule).


The derivative of f(x) = g(x) / h(x) is given by

f '(x) = ( h(x) g '(x) - g(x) h '(x) ) / h(x)^2

Example

f(x) = (x - 2) / (x + 1)

let g(x) = (x - 2) and h(x) = (x + 1), then

f '(x) = ( h(x) g '(x) - g(x) h '(x) ) / h(x)^2

= ( (x + 1)(1) - (x - 2)(1) ) / (x + 1)^2

= 3 / (x + 1)^2
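Both worked examples above can be verified symbolically. The following sketch (Python with SymPy, my own check, not part of the original text) applies the product-rule and quotient-rule formulas exactly as written:

```python
# Sketch: checking the product-rule and quotient-rule examples with SymPy.
import sympy as sp

x = sp.symbols('x')

# Product rule example from the text: f(x) = (x^2 - 2x)(x - 2)
g, h = x**2 - 2*x, x - 2
product_deriv = sp.expand(g*sp.diff(h, x) + h*sp.diff(g, x))
print(product_deriv)  # 3*x**2 - 8*x + 4

# Quotient rule example from the text: f(x) = (x - 2)/(x + 1)
g2, h2 = x - 2, x + 1
quotient_deriv = sp.simplify((h2*sp.diff(g2, x) - g2*sp.diff(h2, x)) / h2**2)
print(quotient_deriv)  # 3/(x + 1)**2
```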

So far we have looked at derivatives outside of the notion of


differentiability. The problem with this approach, though, is that some
functions have one or many points or intervals where their derivatives
are undefined. A function f is differentiable at a point c if

exists.
Similarly, f is differentiable on an open interval (a, b) if

exists for every c in (a, b).

Basically, f is differentiable at c if f'(c) is defined, by the above


definition. Another point of note is that if f is differentiable at c, then f is
continuous at c.

Let's go through a few examples and discuss their differentiability. First,


consider the following function.

plot(1/x^2, x, -5, 5).show(ymin=0, ymax=10)



To find the limit of the function's slope when the change in x is 0, we
can either use the true definition of the derivative and do

def f(x):
return 1/x^2

var('h')
((f(x+h)-f(x))/h).rational_simplify().subs(h=0)

or we can simply use the rules of differentiation by calling


'derivative(1/x^2, x)'. In any case, we find that

Since f'(x) is undefined when x = 0 (-2/0^2 is undefined), we say that f is not


differentiable at x = 0. Since f'(x) is defined for every other x, we can
say that f' is continuous on (-∞, 0) U (0, ∞), where "U" denotes the union
of two intervals.

How about a function that is everywhere continuous but is not


everywhere differentiable? This occurs quite often with piecewise
functions, since even though two intervals might be connected, the slope
can change radically at their junction. Take a look at the function g(x) =
|x|.

plot(abs(x), x, -5, 5)

Using our knowledge of what "absolute value" means, we can rewrite


g(x) in the expanded form

This should be easy to differentiate now; we get

What about at x = 0? The "logical" response would be to see that g(0) =
0 and say that g'(0) must therefore equal 0. Careful, though...looking
back at the limit definition of the derivative, the derivative of f at a point
c is the limit of the slope of f as the change in its independent variable
approaches 0. Really, the only relevant piece of information is the
behavior of the function's slope close to c. Referring back to the example,
since the limit of g'(x) as x approaches 0 from the left ≠ the limit of g'(x)
as x approaches 0 from the right, g'(0) does not exist. We can use the
limit definition of the derivative to prove this:

so , which is undefined.

In this form, it makes far more sense why g'(0) is undefined. By simply
looking at the graph of g, too, one can see that the sudden "twist" at x =
0 is responsible for our inability to evaluate g' there. We can now justly
pronounce that g is differentiable on (-∞, 0) U (0, ∞), so g' is continuous
on that same interval.
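The two one-sided limits for g(x) = |x| can also be checked symbolically. A sketch in Python with SymPy (my tooling choice; the text's own snippets use Sage):

```python
# Sketch: the one-sided limits of (g(0+h) - g(0))/h for g(x) = |x|.
import sympy as sp

h = sp.symbols('h', real=True)
dq = sp.Abs(h) / h  # the difference quotient at 0, since g(0) = 0

left = sp.limit(dq, h, 0, dir='-')
right = sp.limit(dq, h, 0, dir='+')
print(left, right)  # -1 1
```

Since the two one-sided limits disagree, the two-sided limit, and hence g'(0), does not exist, exactly as argued above.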

The third function of discussion has a couple of quirks--take a look.

p = plot(sqrt(x-2), x, 2, 5)
pt1 = point((3, 1), rgbcolor="white", pointsize=30, faceted=True)
pt2 = point((3, 2), rgbcolor="black", pointsize=30)
l = line([(3, 1), (3, 2)], linestyle="--")
(p+pt1+pt2+l).show(xmin=0)

Not only is v(t) defined solely on [2, ∞), it has a jump discontinuity at t
= 3. The jump discontinuity causes v'(t) to be undefined at t = 3; do you
see why? Using a slightly modified limit definition of the derivative,
think of what

would be for c = 3 and some x very close to 3. The resulting slope would
be astronomically large either negatively or positively, right? In fact, the
dashed line connecting v(t) for t ≠ 3 and v(3) is what the tangent line
will look like at that point. Since a function's der
derivative
ivative cannot be
infinitely large and still be considered to "exist" at that point, v is not
differentiable at t=3

11.5.MAXIMA AND MINIMA

A local maximum point on a function is a point (x,y) on the


graph of the function whose y coordinate is larger than all other y
coordinates on the graph at points "close to'' (x,y). More precisely,
(x,f(x)) is a local maximum if there is an interval (a,b)
with a<x<b and f(x)≥f(z) for every z in (a,b).
Similarly, (x,y) is a local minimum point if it has locally the
smallest y coordinate. Again being more precise: (x,f(x)) is a
local minimum if there is an interval (a,b) with a<x<b and
f(x)≤f(z) for every z in (a,b). A local extremum is
either a local minimum or a local maximum.

Local maximum and minimum points are quite distinctive on the graph
of a function, and are therefore useful in understanding the shape of the
graph. In many applied problems we want to find the largest or smallest
value that a function achieves (for example, we might want to find the
minimum cost at which some task can be performed) and so identifying
maximum and minimum points will be useful for applied problems as
well. Some examples of local maximum and minimum points are shown
in figure 5.1.1.

Figure 5.1.1. Some local maximum points (A) and minimum points (B).

If (x,f(x)) is a point where f(x) reaches a local maximum


or minimum, and if the derivative of f exists at x, then the graph has
a tangent line and the tangent line must be horizontal. This is important
enough to state as a theorem, though we will not prove it.

Theorem 5.1.1 (Fermat's Theorem) If f(x) has a local extremum at


x=a and f is differentiable at a, then f′(a)=0.

Thus, the only points at which a function can have a local maximum or
minimum are points at which the derivative is zero, as in the left hand
graph in figure 5.1.1, or the derivative is undefined, as in the right hand
graph. Any value of x for which f′(x) is zero or undefined is
called a critical value for f. When looking for local maximum and
minimum points, you are likely to make two sorts of mistakes: You may
forget that a maximum or minimum can occur where the derivative does
not exist, and so forget to check whether the derivative exists
everywhere. You might also assume that any place that the derivative is
zero is a local maximum or minimum point, but this is not true. A
portion of the graph of f(x)=x^3 is shown in figure 5.1.2. The
derivative of f is f′(x)=3x^2, and f′(0)=0, but
there is neither a maximum nor minimum at (0,0).

Figure 5.1.2. No maximum or minimum even though the derivative is zero.

Since the derivative is zero or undefined at both local maximum and


local minimum points, we need a way to determine which, if either,
actually occurs. The most elementary approach, but one that is often
tedious or difficult, is to test directly whether th
the y coordinates "near''
the potential maximum or minimum are above or below the y
coordinate at the point of interest. Of course, there are too many points
"near'' the point to test, but a little thought shows we need only test two
provided we know thatt f is continuous (recall that this means that the
graph of f has no jumps or gaps).

Suppose, for example, that we have identified three points at which
f′ is zero or nonexistent: (x1,y1), (x2,y2), (x3,y3), and
x1<x2<x3 (see figure 5.1.3). Suppose that
we compute the value of f(a) for x1<a<x2, and that
f(a)<f(x2). What can we say about the graph between a and
x2? Could there be a point (b,f(b)), a<b<x2, with
f(b)>f(x2)? No: if there were, the graph would go up from
(a,f(a)) to (b,f(b)) then down to (x2,f(x2))
and somewhere in between would have a local maximum point. (This is
not obvious; it is a result of the Extreme Value Theorem, theorem 6.1.2.)
But at that local maximum point the derivative of f would be zero or
nonexistent, yet we already know that the derivative is zero or
nonexistent only at x1, x2, and x3. The upshot is that one
computation tells us that (x2,f(x2)) has the largest y
coordinate of any point on the graph near x2 and to the left of
x2. We can perform the same test on the right. If we find that on both
sides of x2 the values are smaller, then there must be a local
maximum at (x2,f(x2)); if we find that on both sides of
x2 the values are larger, then there must be a local minimum at
(x2,f(x2)); if we find one of each, then there is neither a local
maximum nor minimum at x2.

Figure 5.1.3. Testing for a maximum or minimum.

It is not always easy to compute the value of a function at a particular


point. The task is made easier by the availability of calculators and
computers, but they have their own drawbacks: they do not always
allow us to distinguish between values that are very close together.
Nevertheless, because this method is conceptually simple and sometimes
easy to perform, you should always consider it.

Example 5.1.2 Find all local maximum and minimum points for the
function f(x)=x^3−x. The derivative is
f′(x)=3x^2−1. This is defined everywhere and is zero at
x=±√3/3. Looking first at x=√3/3, we see that
f(√3/3)=−2√3/9. Now we test two points on either side of
x=√3/3, making sure that neither is farther away than the nearest critical
value; since √3<3, √3/3<1 and we can use x=0 and
x=1. Since f(0)=0>−2√3/9 and
f(1)=0>−2√3/9, there must be a local minimum at x=√3/3. For
x=−√3/3, we see that f(−√3/3)=2√3/9.
This time we can use x=0 and x=−1, and we find that
f(−1)=f(0)=0<2√3/9, so there must be a local maximum
at x=−√3/3.

Of course this example is made very simple by our choice of points to
test, namely −1, 0, 1. We could have used other values, say
−5/4, 1/3, and 3/4, but this would have made the
calculations considerably more tedious.
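The critical values and the test computations in Example 5.1.2 can be reproduced symbolically. This sketch (Python with SymPy, my own verification, not part of the text) solves f′(x)=0 and compares f at nearby points:

```python
# Sketch: critical values and neighbour tests for f(x) = x^3 - x.
import sympy as sp

x = sp.symbols('x')
f = x**3 - x
critical = sp.solve(sp.diff(f, x), x)  # zeros of f'(x) = 3x^2 - 1
print(critical)  # [-sqrt(3)/3, sqrt(3)/3]

c = sp.sqrt(3)/3
assert sp.simplify(f.subs(x, c) + 2*sp.sqrt(3)/9) == 0  # f(sqrt(3)/3) = -2*sqrt(3)/9
assert f.subs(x, 0) > f.subs(x, c)  # both neighbours are larger,
assert f.subs(x, 1) > f.subs(x, c)  # so x = sqrt(3)/3 is a local minimum
```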

Example 5.1.3 Find all local maximum and minimum points for
f(x)=sin x+cos x. The derivative is
f′(x)=cos x−sin x. This is always defined and is zero whenever
cos x=sin x. Recalling that cos x and sin x are
the x and y coordinates of points on a unit circle, we see that
cos x=sin x when x is π/4, π/4±π, π/4±2π,
π/4±3π, etc. Since both sine and cosine have a period of 2π,
we need only determine the status of x=π/4 and x=5π/4.
We can use 0 and π/2 to test the critical value x=π/4. We
find that f(π/4)=√2, f(0)=1<√2 and
f(π/2)=1, so there is a local maximum when x=π/4 and also
when x=π/4±2π, π/4±4π, etc. We can summarize
this more neatly by saying that there are local maxima at
π/4±2kπ for every integer k.

We use π and 2π to test the critical value x=5π/4. The
relevant values are f(5π/4)=−√2,
f(π)=−1>−√2, f(2π)=1>−√2, so there is a local minimum
at x=5π/4, 5π/4±2π, 5π/4±4π, etc. More
succinctly, there are local minima at 5π/4±2kπ for every
integer k.
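The values used in Example 5.1.3 are easy to confirm symbolically. A sketch in Python with SymPy (my own check, not the text's):

```python
# Sketch: checking the critical values of f(x) = sin(x) + cos(x).
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x) + sp.cos(x)
fp = sp.diff(f, x)  # cos(x) - sin(x)

# The derivative vanishes at the critical values found in the text:
assert fp.subs(x, sp.pi/4) == 0
assert fp.subs(x, 5*sp.pi/4) == 0

print(f.subs(x, sp.pi/4))    # sqrt(2): the local maximum value
print(f.subs(x, 5*sp.pi/4))  # -sqrt(2): the local minimum value
```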

Many important applied problems involve finding the best way to


accomplish some task. Often this involves finding the maximum or
minimum value of some function: the minimum time to make a certain
journey, the minimum cost for doing a task, the maximum power that
can be generated by a device, and so on. Many of these problems can be
solved by finding the appropriate function and then using techniques of
calculus to find the maximum or the minimum value required.

Generally such a problem will have the following mathematical form:
Find the largest (or smallest) value of f(x) when a≤x≤b.
Sometimes a or b are infinite, but frequently the real world imposes
some constraint on the values that x may have.

Such a problem differs in two ways from the local maximum and
minimum problems we encountered when graphing functions: We are
interested only in the function between a and b , and we want to know
the largest or smallest value that f(x) takes on, not merely values
that are the largest or smallest in a small interval. That is, we seek not a
local maximum or minimum but a global maximum or minimum,
sometimes also called an absolute maximum or minimum.

Any global maximum or minimum must of course be a local maximum
or minimum. If we find all possible local extrema, then the global
maximum, if it exists, must be the largest of the local maxima and the
global minimum, if it exists, must be the smallest of the local minima.
We already know where local extrema can occur: only at those points at
which f′(x) is zero or undefined. Actually, there are two additional
points at which a maximum or minimum can occur if the endpoints a
and b are not infinite, namely, at a and b. We have not previously
considered such points because we have not been interested in limiting a
function to a small interval. An example should make this clear.

Figure 6.1.1. The function f(x)=x^2 restricted to [−2,1].

Example 6.1.1 Find the maximum and minimum values of f(x)=x^2
on the interval [−2,1], shown in figure 6.1.1. We compute
f′(x)=2x, which is zero at x=0 and is always defined.

Since f′(1)=2 we would not normally flag x=1 as a point of


interest, but it is clear from the graph that when f(x) is restricted to
[−2,1] there is a local maximum at x=1 . Likewise we would
not normally pay attention to x=−2 , but since we have truncated f
at −2 we have introduced a new local maximum there as well. In a
technical sense nothing new is going on here: When we truncate f we
actually create a new function, let's call it g , that is defined only on the
interval [−2,1] . If we try to compute the derivative of this new
function we actually find that it does not have a derivative at −2 or 1 .
Why? Because to compute the derivative at 1 we must compute the limit
lim_{Δx→0} (g(1+Δx) − g(1))/Δx.

This limit does not exist because when Δx>0, g(1+Δx) is
not defined. It is simpler, however, simply to remember that we must
always check the endpoints.

So the function g , that is, f restricted to [−2,1] , has one critical


value and two finite endpoints, any of which might be the global
maximum or minimum. We could first determine which of these are
local maximum or minimum points (or neither); then the largest local
maximum must be the global maximum and the smallest local minimum
must be the global minimum. It is usually easier, however, to compute
the value of f at every point at which the global maximum or minimum
might occur; the largest of these is the global maximum, the smallest is
the global minimum.

So we compute f(−2)=4, f(0)=0, f(1)=1. The
global maximum is 4 at x=−2 and the global minimum is 0 at x=0.

It is possible that there is no global maximum or minimum. It is difficult,


and not particularly useful, to express a complete procedure for
determining whether this is the case. Generally, the best approach is to
gain enough understanding of the shape of the graph to decide.
Fortunately, only a rough idea of the shape is usually needed.

There are some particularly nice cases that are easy. A continuous
function on a closed interval [a,b] always has both a global
maximum and a global minimum, so examining the critical values and
the endpoints is enough:

Theorem 6.1.2 (Extreme value theorem) If f is continuous on a closed


interval [a,b] , then it has both a minimum and a maximum point.

That is, there are real numbers c and d in [a,b] so that for every x
in [a,b], f(x)≤f(c) and f(x)≥f(d).

Another easy case: If a function is continuous and has a single critical


value, then if there is a local maximum at the critical value it is a global
maximum, and if it is a local minimum it is a global minimum. There
may also be a global minimum in the first case, or a global maximum in
the second case, but that will generally require more effort to determine.

Example 6.1.3 Let f(x)=−x^2+4x−3. Find the maximum
value of f(x) on the interval [0,4]. First note that f′(x)=−2x+4=0
when x=2, and f(2)=1. Next observe that
f′(x) is defined for all x, so there are no other critical values.
Finally, f(0)=−3 and f(4)=−3. The largest value of f(x)
on the interval [0,4] is f(2)=1.

Example 6.1.4 Let f(x)=−x^2+4x−3. Find the maximum
value of f(x) on the interval [−1,1].

First note that f′(x)=−2x+4=0 when x=2. But x=2
is not in the interval, so we don't use it. Thus the only two points to
be checked are the endpoints; f(−1)=−8 and f(1)=0. So
the largest value of f(x) on [−1,1] is f(1)=0.

Example 6.1.5 Find the maximum and minimum values of the function
f(x)=7+|x−2| for x between 1 and 4 inclusive. The
derivative f′(x) is never zero, but f′(x) is undefined at x=2,
so we compute f(2)=7. Checking the end points we get
f(1)=8 and f(4)=9. The smallest of these numbers is
f(2)=7, which is, therefore, the minimum value of f(x) on
the interval 1≤x≤4, and the maximum is f(4)=9.
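The procedure used in Examples 6.1.1 through 6.1.4, comparing f at the critical values and at the endpoints, can be sketched as a small helper. This is my own sketch (the name extrema_on_interval is invented, and it assumes f′ is defined everywhere on [a,b], so it would not handle Example 6.1.5, where f′ is undefined at x=2):

```python
# Sketch: the closed-interval method for a differentiable f on [a, b].
import sympy as sp

def extrema_on_interval(f, x, a, b):
    """Compare f at the critical values inside [a, b] and at the endpoints."""
    fp = sp.diff(f, x)
    candidates = [a, b] + [c for c in sp.solve(fp, x)
                           if c.is_real and a <= c <= b]
    values = [f.subs(x, c) for c in candidates]
    return min(values), max(values)

x = sp.symbols('x')
lo, hi = extrema_on_interval(x**2, x, -2, 1)  # Example 6.1.1
print(lo, hi)  # 0 4
```

Running it on Example 6.1.3, extrema_on_interval(-x**2 + 4*x - 3, x, 0, 4), likewise reproduces the minimum −3 and the maximum 1.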

Figure 6.1.2. f(x)=x^3−x.

Example 6.1.6 Find all local maxima and minima for f(x)=x^3−x,
and determine whether there is a global maximum or minimum on
the open interval (−2,2). In example 5.1.2 we found a local
maximum at (−√3/3, 2√3/9) and a local minimum at
(√3/3, −2√3/9). Since the endpoints are not in the
interval (−2,2) they cannot be considered. Is the lone local
maximum a global maximum? Here we must look more closely at the
graph. We know that on the closed interval [−√3/3, √3/3]
there is a global maximum at x=−√3/3 and a global
minimum at x=√3/3. So the question becomes: what happens
between −2 and −√3/3, and between √3/3 and 2?
Since there is a local minimum at x=√3/3, the graph must
continue up to the right, since there are no more critical values. This
means no value of f will be less than −2√3/9 between √3/3
and 2, but it says nothing about whether we might find a value
larger than the local maximum 2√3/9. How can we tell? Since
the function increases to the right of √3/3, we need to know what
the function values do "close to'' 2. Here the easiest test is to pick a
number and do a computation to get some idea of what's going on. Since
f(1.9)=4.959>2√3/9, there is no global
maximum at −√3/3, and hence no global maximum at all. (How
can we tell that 4.959>2√3/9? We can use a calculator to
approximate the right hand side; if it is not even close to 4.959 we
can take this as decisive. Since 2√3/9≈0.3849, there's
really no question. Funny things can happen in the rounding done by
computers and calculators, however, so we might be a little more
careful, especially if the values come out quite close. In this case we can
convert the relation 4.959>2√3/9 into (9/2)(4.959)>√3
and ask whether this is true. Since the left side is clearly
larger than 4⋅4, which is clearly larger than √3, this settles the
question.)

A similar analysis shows that there is also no global minimum. The
graph of f(x) on (−2,2) is shown in figure 6.1.2.

Example 6.1.7 Of all rectangles of area 100, which has the smallest
perimeter?

First we must translate this into a purely mathematical problem in which
we want to find the minimum value of a function. If x denotes one of
the sides of the rectangle, then the adjacent side must be 100/x (in
order that the area be 100). So the function we want to minimize is

f(x) = 2x + 2(100/x)

since the perimeter is twice the length plus twice the width of the
rectangle. Not all values of x make sense in this problem: lengths of
sides of rectangles must be positive, so x>0. If x>0 then so is
100/x, so we need no second condition on x.

We next find f′(x) and set it equal to zero: 0=f′(x)=2−200/x^2.
Solving f′(x)=0 for x gives us x=±10.
We are interested only in x>0, so only the value x=10 is of
interest. Since f′(x) is defined everywhere on the interval (0,∞),
there are no more critical values, and there are no endpoints. Is there
a local maximum, minimum, or neither at x=10? The second
derivative is f″(x)=400/x^3, and f″(10)>0, so
there is a local minimum. Since there is only one critical value, this is
also the global minimum, so the rectangle with smallest perimeter is the
10×10 square.
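The computation in this example can be replayed symbolically. This sketch (Python with SymPy, my own check rather than part of the text) restricts x to positive values, as the geometry requires:

```python
# Sketch: minimizing the perimeter f(x) = 2x + 2(100/x) for x > 0.
import sympy as sp

x = sp.symbols('x', positive=True)
perimeter = 2*x + 2*(100/x)

crit = sp.solve(sp.diff(perimeter, x), x)  # solves 2 - 200/x^2 = 0
print(crit)  # [10]

# Second derivative test: f''(10) > 0, so x = 10 is a minimum.
assert sp.diff(perimeter, x, 2).subs(x, 10) > 0
print(perimeter.subs(x, 10))  # 40
```

Declaring the symbol positive makes SymPy discard the extraneous root x = −10 automatically.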

Example 6.1.8 You want to sell a certain number n of items in order to
maximize your profit. Market research tells you that if you set the price
at $1.50, you will be able to sell 5000 items, and for every 10 cents you
lower the price below $1.50 you will be able to sell another 1000 items.
Suppose that your fixed costs ("start-up costs'') total $2000, and the per
item cost of production ("marginal cost'') is $0.50. Find the price to set
per item and the number of items sold in order to maximize profit, and
also determine the maximum profit you can get.

The first step is to convert the problem into a function maximization
problem. Since we want to maximize profit by setting the price per item,
we should look for a function P(x) representing the profit when the
price per item is x. Profit is revenue minus costs, and revenue is
number of items sold times the price per item, so we get
P=nx−2000−0.50n. The number of items sold is
itself a function of x, n=5000+1000(1.5−x)/0.10,
because (1.5−x)/0.10 is the number of multiples of 10
cents that the price is below $1.50. Now we substitute for n in the
profit function:

P(x) = (5000+1000(1.5−x)/0.10)x − 2000 − 0.5(5000+1000(1.5−x)/0.10)
     = −10000x^2 + 25000x − 12000

We want to know the maximum value of this function when x is
between 0 and 1.5. The derivative is P′(x)=−20000x+25000,
which is zero when x=1.25. Since
P″(x)=−20000<0, there must be a local maximum at
x=1.25, and since this is the only critical value it must be a global
maximum as well. (Alternately, we could compute P(0)=−12000,
P(1.25)=3625, and P(1.5)=3000 and
note that P(1.25) is the maximum of these.)
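The algebra in this example is a good candidate for a symbolic check. This sketch (Python with SymPy, my own verification, not part of the text) builds n and P exactly as derived above:

```python
# Sketch: the profit-maximization computation of Example 6.1.8.
import sympy as sp

x = sp.symbols('x')
# Items sold as a function of price x, as derived in the text:
n = 5000 + 1000*(sp.Rational(3, 2) - x)/sp.Rational(1, 10)
# Profit = revenue - fixed costs - per-item production costs:
P = sp.expand(n*x - 2000 - sp.Rational(1, 2)*n)
print(P)  # -10000*x**2 + 25000*x - 12000

crit = sp.solve(sp.diff(P, x), x)
print(crit)  # [5/4]
print(P.subs(x, sp.Rational(5, 4)))  # 3625
```

Using Rational instead of floating-point constants keeps the arithmetic exact, so the maximum profit comes out as the integer 3625 rather than an approximation.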

11.7.PARTIAL DIFFERENTIATION

Partial Derivatives

Now that we have the brief discussion on limits out of the way we can
proceed into taking derivatives of functions of more than one variable.
Before we actually start taking derivatives of functions of more than one
variable let’s recall an important interpretation of derivatives of
functions of one variable.

Recall that given a function of one variable, , the

derivative, , represents the rate of change of the

function as x changes. This is an important interpretation of derivatives
and we are not going to want to lose it with functions of more than one
variable. The problem with functions of more than one variable is that
there is more than one variable. In other words, what do we do if we
only want one of the variables to change, or if we want more than one of
them to change? In fact, if we’re going to allow more than one of the
variables to change there are then going to be an infinite number of ways
for them to change. For instance, one variable could be changing faster
than the other variable(s) in the function. Notice as well that it will be
completely possible for the function to be changing differently
depending on how we allow one or more of the variables to change.

We will need to develop ways, and notations, for dealing with all of
these cases. In this section we are going to concentrate exclusively on
only changing one of the variables at a time, while the remaining
variable(s) are held fixed. We will deal with allowing multiple variables
to change in a later section.

Because we are going to only allow one of the variables to change taking
the derivative will now become a fairly simple process. Let’s start off
this discussion with a fairly simple function.

Let’s start with the function

and let’s determine the rate at which the function is

changing at a point, , if we hold y fixed and allow x to


vary and if we hold x fixed and allow y to vary.

We’ll start by looking at the case of holding y fixed and allowing x to


vary. Since we are interested in the rate of change of the function at

and are holding y fixed this means that we are going


to always have (if we didn’t have this then eventually
y would have to change in order to get to the point…). Doing this will
give us a function involving only x’s and we can define a new function
as follows,

Now, this is a function of a single variable and at this point all that we

are asking is to determine the rate of change of at

. In other words, we want to compute

and since this is a function of a single variable we already know

how to do that. Here is the rate of change of the function at

if we hold y fixed and allow x to vary.

We will call the partial derivative of

with respect to x at and we will denote it


in the following way,

Now, let’s do it the other way. We will now hold x fixed and allow y to
vary. We can do this in a similar way. Since we are holding x fixed it
must be fixed at and so we can define a new function
of y and then differentiate this as we’ve always done with functions of
one variable.

Here is the work for this,

In this case we call the partial derivative of

with respect to y at and we denote it


as follows,

Note that these two partial derivatives are sometimes called the first
order partial derivatives. Just as with functions of one variable we can
have derivatives of all orders. We will be looking at higher order
derivatives in a later section.

Note that the notation for partial derivatives is different than that for
derivatives of functions of a single variable. With functions of a single
variable we could denote the derivative with a single prime. However,
with partial derivatives we will always need to remember the variable
that we are differentiating with respect to and so we will subscript the

variable that we differentiated with respect to. We will shortly be seeing
some alternate notation for partial derivatives as well.

Note as well that we usually don’t use the notation for


partial derivatives. The more standard notation is to just continue to use

. So, the partial derivatives from above will more


commonly be written as,

Now, as this quick example has shown taking derivatives of functions of


more than one variable is done in pretty much the same manner as taking

derivatives of a single variable. To compute


all we need to do is treat all the y’s as constants (or numbers) and then
differentiate the x’s as we’ve always done. Likewise, to compute

we will treat all the x’s as constants and then


differentiate the y’s as we are used to doing.

Before we work any examples let’s get the formal definition of the
partial derivative out of the way as well as some alternate notation.

Since we can think of the two partial derivatives above as derivatives of


single variable functions it shouldn’t be too surprising that the definition
of each is very similar to the definition of the derivative for single
variable functions. Here are the formal definitions of the two partial
derivatives we looked at above

Now let’s take a quick look at some of the possible alternate notations

for partial derivatives. Given the function

the following are all equivalent notations,

For the fractional notation for the partial derivative notice the difference
between the partial derivative and the ordinary derivative from single
variable calculus.

Okay, now let’s work some examples. When working these examples
always keep in mind that we need to pay very close attention to which
variable we are differentiating with respect to. This is important because
we are going to treat all other variables as constants and then proceed
with the derivative as if it was a function of a single variable. If you can
remember this you’ll find that doing partial derivatives are not much
more difficult that doing derivatives of functions of a single variable as
we did in Calculus I.
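The recipe, treat the other variable as a constant and differentiate as usual, is exactly what a computer algebra system does. A sketch in Python with SymPy (my own illustration; the function below is not one of the text's examples):

```python
# Sketch: first order partial derivatives of an illustrative f(x, y).
import sympy as sp

x, y = sp.symbols('x y')
f = x**2*y + sp.sin(y)

fx = sp.diff(f, x)  # differentiate in x, treating y as a constant
fy = sp.diff(f, y)  # differentiate in y, treating x as a constant
print(fx)  # 2*x*y
print(fy)  # x**2 + cos(y)
```

Note how sin(y) vanishes from the x-derivative, since a term involving only y is a constant with respect to x, just as the discussion above describes.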

Example 1 Find all of the first order partial derivatives for the
following functions.

(a)

[Solution]

(b)

[Solution]

(c)

[Solution]

(d)

[Solution]

Solution

(a)

Let’s first take the derivative with respect to x and remember that as we
do so all the y’s will be treated as constants. The partial derivative with
respect to x is,

Notice that the second and the third term differentiate to zero in this
case. It should be clear why the third term differentiated to zero. It’s a
constant and we know that constants always differentiate to zero. This is
also the reason that the second term differentiated to zero. Remember
that since we are differentiating with respect to x here we are going to treat all y's as constants. That means that terms that only involve y's will be treated as constants and hence will differentiate to zero.

Now, let’s take the derivative with respect to y. In this case we treat all
x’s as constants and so the first term involves only x’s and so will
differentiate to zero, just as the third term will. Here is the partial
derivative with respect to y.


(b)

With this function we’ve got three first order derivatives to compute.
Let’s do the partial derivative with respect to x first. Since we are
differentiating with respect to x we will treat all y’s and all z’s as
constants. This means that the second and fourth terms will differentiate
to zero since they only involve y’s and z’s.

This first term contains both x’s and y’s and so when we differentiate
with respect to x the y will be thought of as a multiplicative constant and
so the first term will be differentiated just as the third term will be
differentiated.

Here is the partial derivative with respect to x.

Let’s now differentiate with respect to y. In this case all x’s and z’s will
be treated as constants. This means the third term will differentiate to
zero since it contains only x’s while the x’s in the first term and the z’s
in the second term will be treated as multiplicative constants. Here is
the derivative with respect to y.

Finally, let’s get the derivative with respect to z. Since only one of the
terms involve z’s this will be the only non-zero term in the derivative.
Also, the y’s in that term will be treated as multiplicative constants.
Here is the derivative with respect to z.

(c)

With this one we’ll not put in the detail of the first two. Before taking
the derivative let’s rewrite the function a little to help us with the
differentiation process.

Now, the fact that we’re using s and t here instead of the “standard” x
and y shouldn’t be a problem. It will work the same way. Here are the
two derivatives for this function.

Remember how to differentiate natural logarithms.

(d)

Now, we can’t forget the product rule with derivatives. The product rule
will work the same way here as it does with functions of one variable.
We will just need to be careful to remember which variable we are
differentiating with respect to.

Let’s start out by differentiating with respect to x. In this case both the
cosine and the exponential contain x’s and so we’ve really got a product
of two functions involving x’s and so we’ll need to product rule this up.
Here is the derivative with respect to x.

Do not forget the chain rule for functions of one variable. We will be
looking at the chain rule for some more complicated expressions for
multivariable functions in a later section. However, at this point we’re
treating all the y’s as constants and so the chain rule will continue to
work as it did back in Calculus I.

Also, don’t forget how to differentiate exponential functions,

Now, let’s differentiate with respect to y. In this case we don’t have a
product rule to worry about since the only place that the y shows up is in
the exponential. Therefore, since x’s are considered to be constants for
this derivative, the cosine in the front will also be thought of as a
multiplicative constant. Here is the derivative with respect to y.

Example 2 Find all of the first order partial derivatives for the
following functions.

(a)

(b)

(c)

Solution

(a)

We also can’t forget about the quotient rule. Since there isn’t too much
to this one, we will simply give the derivatives.

In the case of the derivative with respect to v recall that u's are constants and so when we differentiate the numerator we will get zero.

(b)

Now, we do need to be careful not to use the quotient rule when it doesn't need to be used. In this case we do have a quotient; however, since the x's and y's only appear in the numerator and the z's only appear in the denominator, this really isn't a quotient rule problem.

Let's do the derivatives with respect to x and y first. In both these cases the z's are constants, so the denominator is a constant and we don't really need to worry too much about it. Here are the derivatives for these two cases.

Now, in the case of differentiation with respect to z we can avoid the


quotient rule with a quick rewrite of the function. Here is the rewrite as
well as the derivative with respect to z.

We went ahead and put the derivative back into the “original” form just
so we could say that we did. In practice you probably don’t really need
to do that.
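The rewrite trick above can be verified symbolically. A small sympy sketch with an illustrative quotient of the same shape (x's and y's only in the numerator, z's only in the denominator; the text's actual function is not reproduced here):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# Illustrative quotient: numerator depends on x and y, denominator only on z
f = (x**2 + y) / z**4

# With respect to x (or y) the denominator is just a constant:
fx = sp.diff(f, x)
# With respect to z, rewriting as (x**2 + y)*z**(-4) avoids the quotient rule
# and the power rule gives -4*(x**2 + y)*z**(-5):
fz = sp.diff(f, z)
print(fx)
print(fz)
```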

(c)

In this last part we are just going to do a somewhat messy chain rule
problem. However, if you had a good background in Calculus I chain
rule this shouldn’t be all that difficult of a problem. Here are the two
derivatives,

So, there are some examples of partial derivatives. Hopefully you will agree that as long as we can remember to treat the other variables as constants these work in exactly the same manner as derivatives of functions of one variable. So, if you can do Calculus I derivatives you shouldn't have too much difficulty in doing basic partial derivatives.

There is one final topic that we need to take a quick look at in this
section, implicit differentiation. Before getting into implicit
differentiation for multiple variable functions let’s first remember how
implicit differentiation works for functions of one variable.

Example 3 Find dy/dx for the following equation.

Solution

Remember that the key to this is to always think of y as a function of x, or y = y(x), so whenever we differentiate a term involving y's with respect to x we will really need to use the chain rule, which means we will add on a dy/dx to that term.

The first step is to differentiate both sides with respect to x.

The final step is to solve for dy/dx.

Now, we did this problem because implicit differentiation works in exactly the same manner with functions of multiple variables. If we have a function in terms of three variables x, y, and z we will assume that z is in fact a function of x and y. In other words, z = z(x, y). Then whenever we differentiate z's with respect to x we will use the chain rule and add on a ∂z/∂x. Likewise, whenever we differentiate z's with respect to y we will add on a ∂z/∂y.
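This bookkeeping can be mirrored in sympy by declaring z as an unspecified function of x and y, so that differentiation inserts ∂z/∂x automatically. A sketch with an illustrative equation (the equations of Example 4 are not reproduced in this text):

```python
import sympy as sp

x, y = sp.symbols('x y')
z = sp.Function('z')(x, y)          # assume z = z(x, y), as in the text

# Illustrative equation F = 0:
F = x**2 * z + sp.sin(y) * z**3 - 4

# Differentiating with respect to x, sympy applies the chain rule and
# inserts Derivative(z, x) for us; then we solve for that derivative.
zx = sp.solve(sp.diff(F, x), sp.Derivative(z, x))[0]
print(sp.simplify(zx))              # -2*x*z / (x**2 + 3*z**2*sin(y))
```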

Let’s take a quick look at a couple of implicit differentiation problems.

Example 4 Find ∂z/∂x and ∂z/∂y for each of the following functions.

(a)

(b)

Solution

(a)

Let's start with finding ∂z/∂x. We first differentiate both sides with respect to x and remember to add on a ∂z/∂x whenever we differentiate a z.

Remember that since we are assuming z = z(x, y), any product of x's and z's is a product of two functions of x and so will need the product rule!

Now, solve for ∂z/∂x.

Now we'll do the same thing for ∂z/∂y, except this time we'll need to remember to add on a ∂z/∂y whenever we differentiate a z.

(b)

We’ll do the same thing for this function as we did in the previous part.

First let’s find .

Don't forget to do the chain rule on each of the trig functions, and when we are differentiating the inside function on the cosine we will need to also use the product rule. Now let's solve for ∂z/∂x.

Now let's take care of ∂z/∂y. This one will be slightly easier than the first one.

11.8. Higher Order Derivatives

Let’s start this section with the following function.

By this point we should be able to differentiate this function without any


problems. Doing this we get,

Now, this is a function and so it can be differentiated. Here is the notation that we'll use for that, as well as the derivative.

f″(x) is called the second derivative and f′(x) is now called the first derivative.

Again, this is a function so we can differentiate it again. This will be


called the third derivative. Here is that derivative as well as the
notation for the third derivative.

Continuing, we can differentiate again. This is called, oddly enough, the


fourth derivative. We're also going to be changing notation at this point. We could keep adding on primes, but that gets cumbersome after a while.

This process can continue but notice that we will get zero for all
derivatives after this point. This set of derivatives leads us to the
following fact about the differentiation of polynomials.

Fact

If p(x) is a polynomial of degree n (i.e. n is the largest exponent in the polynomial) then,

p^(k)(x) = 0    for k ≥ n + 1
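This fact is easy to confirm symbolically; a quick sympy sketch with an illustrative degree-4 polynomial:

```python
import sympy as sp

x = sp.symbols('x')
p = 3*x**4 - 2*x**2 + 7    # illustrative polynomial of degree n = 4

d4 = sp.diff(p, x, 4)      # 4th derivative: the constant 3 * 4! = 72
d5 = sp.diff(p, x, 5)      # every derivative past degree n is zero
print(d4, d5)              # 72 0
```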

We will need to be careful with the "non-prime" notation for derivatives. Consider, for example, f^(2)(x) versus f²(x). The presence of parentheses in the exponent denotes differentiation, f^(2)(x) = f″(x), while the absence of parentheses denotes exponentiation, f²(x) = (f(x))².

Collectively the second, third, fourth, etc. derivatives are called higher
order derivatives.

Let’s take a look at some examples of higher order derivatives.

Example 1 Find the first four derivatives for each of the following.

(a)

(b)

(c)

Solution

(a)

There really isn’t a lot to do here other than do the derivatives.

Notice that differentiating an exponential function is very simple. It


doesn’t change with each differentiation.

(b)

Again, let’s just do some derivatives.

Note that cosine (and sine) will repeat every four derivatives. The other
four trig functions will not exhibit this behavior. You might want to
take a few derivatives to convince yourself of this.


(c)

In the previous two examples we saw some patterns in the differentiation
of exponential functions, cosines and sines. We need to be careful
however since they only work if there is just a t or an x in the argument.
This is the point of this example. In this example we will need to use the
chain rule on each derivative.

So, we can see with slightly more complicated arguments the patterns
that we saw for exponential functions, sines and cosines no longer
completely hold.
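For instance, with an illustrative argument such as cos(3t), each derivative picks up a factor of 3 from the chain rule, so the four-derivative cycle no longer returns the original function. A quick sympy check:

```python
import sympy as sp

t = sp.symbols('t')
f = sp.cos(3*t)    # illustrative argument (the text's function is not reproduced)

# The chain rule contributes a factor of 3 at every step, so four derivatives
# give 3**4 * cos(3*t) = 81*cos(3*t), not cos(3*t) back.
d4 = sp.diff(f, t, 4)
print(d4)          # 81*cos(3*t)
```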


Let’s do a couple more examples to make a couple of points.

Example 2 Find the second derivative for each of the following


functions.

(a)

(b)

(c)

Solution

(a)

Here’s the first derivative.

Notice that the second derivative will now require the product rule.

Notice that each successive derivative will require a product and/or chain rule and that, as noted above, this will not end up returning back to just a secant after four (or any other number, for that matter) derivatives, as sine and cosine will.

(b)

Again, let’s start with the first derivative.

As with the first example we will need the product rule for the second
derivative.

(c)

Same thing here.

The second derivative this time will require the quotient rule.

As we saw in this last set of examples we will often need to use the
product or quotient rule for the higher order derivatives, even when the
first derivative didn’t require these rules.

Let’s work one more example that will illustrate how to use implicit
differentiation to find higher order derivatives.

Example 3 Find y″ for the following equation.

Solution

Okay, we know that in order to get the second derivative we need the
first derivative and in order to get that we’ll need to do implicit
differentiation. Here is the work for that.

Now, this is the first derivative. We get the second derivative by


differentiating this, which will require implicit differentiation again.

This is fine as far as it goes. However, we would like there to be no
derivatives in the answer. We don’t, generally, mind having x’s and/or
y’s in the answer when doing implicit differentiation, but we really don’t
like derivatives in the answer. We can get rid of the derivative however
by acknowledging that we know what the first derivative is and
substituting this into the second derivative equation. Doing this gives,

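The same differentiate-then-substitute procedure can be scripted. A sympy sketch using an illustrative equation x² + y² = 9 (the text's equation is not reproduced here):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)             # think of y as a function of x

# Illustrative implicit equation: x**2 + y**2 = 9
F = x**2 + y**2 - 9

# First derivative by implicit differentiation:
d1 = sp.solve(sp.diff(F, x), sp.Derivative(y, x))[0]    # y' = -x/y
# Differentiate again, then substitute the known y' so that no derivative
# remains in the answer, as described above:
d2 = sp.diff(d1, x).subs(sp.Derivative(y, x), d1)
print(sp.simplify(d2))              # -(x**2 + y**2)/y**3
```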
Now that we’ve found some higher order derivatives we should
probably talk about an interpretation of the second derivative.

If the position of an object is given by s(t) we know that the velocity is


the first derivative of the position.

The acceleration of the object is the first derivative of the velocity, but
since this is the first derivative of the position function we can also think
of the acceleration as the second derivative of the position function.

Alternate Notation

There is some alternate notation for higher order derivatives as well. Recall the fractional notation for the first derivative, dy/dx. We can extend this to higher order derivatives: the second derivative is d²y/dx² and, in general, the nth derivative is dⁿy/dxⁿ.

How do we find the slope of a straight line and its derivative? What is the relation between the slope of a curve or a parabola and its derivative? How do we find the derivative of the composite of two functions f(g(x)), an exponential or trigonometric function, a logarithmic function, ... ?

1. Calculate the slope of a straight line and its derivative:


Solution 1

Given the graph of the linear function y = (-1/3)x + 1, you are asked to:

a) find the slope of the straight line graphically with the help of the formula slope = ΔY / ΔX,

b) figure out the derivative y′ of this straight line equation with the help of the derivative rules,

c) reach a conclusion on the results obtained in a) and b).

a) Slope of the straight line: slope = ΔY / ΔX = -1/3.

b) Derivative y′ of the straight line: y′ = ((-1/3)x + 1)′ = -1/3.

c) Conclusion: the slope of the line, computed in a), is equal to the derivative of the straight line equation, computed in b). Indeed, the derivative of a linear function (general equation y = ax + b) equals the slope of the line described by that function. In other words, the derivative of a linear function is the angular coefficient a of the function.

Calculating a derivative is the same as calculating a slope, and conversely.
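The same comparison can be done numerically: the slope ΔY/ΔX between any two points of the line equals the derivative. A tiny Python check:

```python
# The slope of y = (-1/3)x + 1 from two points (ΔY/ΔX) equals the
# derivative y' = -1/3 found by the rules.
def line(x):
    return (-1/3) * x + 1

slope = (line(4) - line(1)) / (4 - 1)   # ΔY / ΔX between x = 1 and x = 4
print(slope)                            # approximately -0.3333
```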

2. Compute the slope of a curve and its derivative: Solution 2

Given the graph and the quadratic equation of the parabola y = x² + 1 and its tangent line at x = -2, you are asked to:

a) find the derivative y′ of the parabola equation with the help of the derivative formulas,

b) find graphically the slope of the tangent line to the parabola at x = -2, with the help of the formula slope = ΔY / ΔX,

c) find the equation of that tangent line,

d) figure out the derivative of the tangent line equation with the help of the derivative formulas,

e) reach a conclusion on the results obtained in b) and d).

a) Derivative y′ of the quadratic equation (parabola equation): y′ = (x² + 1)′ = 2x.

b) Slope of the tangent line to the parabola at x = -2, read graphically: slope = ΔY / ΔX = -4.

c) Equation of that tangent line. The general equation of a straight line is y = ax + b, where the coefficient a is the slope of the line. Given that the slope of our tangent line is -4, the equation has the form y = -4x + b. What is the value of the b term? Knowing that the line is tangent to the parabola at x = -2, thus at y = (-2)² + 1 = 5, we can figure out the b term (y-intercept) of the line that has slope a = -4 and passes through the point (-2 ; 5):

Tangent line equation: y = -4x + b, and we know that (x ; y) = (-2 ; 5) satisfies this equation. Thus 5 = -4(-2) + b, so b = 5 - 8 = -3.

Conclusion: the tangent line equation is y = -4x - 3.

d) Derivative of the tangent line equation: y′ = (-4x - 3)′ = -4.

e) The derivative of the tangent line equation, computed in d), is equal to the slope of that tangent line, computed in b). Calculating a derivative is the same as calculating a slope, and conversely.

It is also interesting to notice that the derivative of the parabola is f′(x) = (x² + 1)′ = 2x. f′(x) = 2x is the equation of the derivative of the parabola, but it is also the equation of the slopes of the tangent lines to the parabola at various points on the curve. In other words, f′(x) = 2x expresses the slope of the curve for various values of x.

For instance, what is the value of the slope of the parabola at x = -2? f′(-2) = 2(-2) = -4. Or, for instance, what is the slope of the curve at x = 1? f′(1) = 2(1) = 2.
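The tangent-line computation for the parabola y = x² + 1 at x = -2 can be double-checked in a few lines of Python:

```python
# Parabola y = x**2 + 1, derivative f'(x) = 2*x; the tangent at x = -2
# should come out as y = -4*x - 3.
def f(x):
    return x**2 + 1

def fprime(x):
    return 2 * x

a = fprime(-2)           # slope of the tangent line: -4
b = f(-2) - a * (-2)     # y-intercept: 5 - 8 = -3
print(a, b)              # -4 -3
```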
3. Compute the derivative of the following functions
(use the derivative rules)

Solution 3.1 Find the derivative y′ of y = 2x³ + 4x² - 5x + 7 with the help of the derivative rules.

y′ = 6x² + 8x - 5

Solution 3.2 Find the derivative y′ of y = -7 + 5x - (5/2)x² - 3x⁴ with the help of the derivative formulas.

y′ = 5 - 5x - 12x³

Solution 3.3 Find the derivative y′ of y = (-x + 7)⁴ with the help of the derivative rules (chain rule).

y′ = 4(-x + 7)³ · (-1) = -4(-x + 7)³
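The answers to Solutions 3.1 to 3.3 can be verified with sympy:

```python
import sympy as sp

x = sp.symbols('x')

# Solution 3.1: y = 2x^3 + 4x^2 - 5x + 7
assert sp.diff(2*x**3 + 4*x**2 - 5*x + 7, x) == 6*x**2 + 8*x - 5
# Solution 3.2: y = -7 + 5x - (5/2)x^2 - 3x^4
assert sp.diff(-7 + 5*x - sp.Rational(5, 2)*x**2 - 3*x**4, x) == 5 - 5*x - 12*x**3
# Solution 3.3: y = (-x + 7)^4, chain rule
assert sp.expand(sp.diff((-x + 7)**4, x)) == sp.expand(-4*(-x + 7)**3)
print("all three answers check out")
```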

Solution 3.4

Solution 3.5

Solution 3.6

Solution 3.7

Solution 3.8

4. Differentiate the following functions (derivative formulas): composite of two functions f(g(x)), exponential function, logarithmic function, trigonometric function, ...

Solution 4.1

Solution 4.2

Solution 4.3

Solution 4.4

Solution 4.5

Solution 4.6

Solution 4.7

Solution 4.8

Solution 4.9

Solution 4.10

Solution 4.11

Solution 4.12

Solution 4.13

Solution 4.14

Solution 4.15

Solution 4.16

Solution 4.17

Solution 4.18

Solution 4.19

Solution 4.20

Solution 4.21

Solution 4.22 (interesting!)

MATHEMATICS I: ASSIGNMENT (40 Marks)

Q1. Compute the derivative of the following functions (use the derivative rules). (10 Marks)

Q2. (20 Marks)

a) Evaluate the following integral.

b) Evaluate the following integral.

c) Evaluate the following integral.

d) Evaluate the following integral.

e) Evaluate the following integral.

Q3. Find the partial-fraction decomposition of the following expressions: (10 Marks)

i.

ii.

