Rational Numbers To Linear Equations: Hung-Hsi Wu
2010 Mathematics Subject Classification. Primary 97-01, 97-00, 97D99, 97-02,
00-01, 00-02.
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.
© 2020 by the author. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
Dedicated to the memory of
David M. Collins
Contents
Preface xi
Prerequisites xlv
Chapter 1. Fractions 1
Overview of Chapters 1 and 2 1
1.1. Definition of a fraction 4
1.2. Equivalent fractions 19
1.3. Adding and subtracting fractions 32
1.4. Multiplying fractions 43
1.5. Dividing fractions 54
1.6. Complex fractions 68
1.7. Percent, ratio, and rate problems 72
1.8. Appendix: The basic laws 86
Contents of the companion volumes and structure of the chapters
Structure of the chapters in this volume and its two companion volumes
(RLE= Rational Numbers to Linear Equations,
A&G = Algebra and Geometry,
PCC = Pre-Calculus, Calculus, and Beyond)
Preface
both correct and learnable. These six volumes will also shore up the critical math-
ematical backgrounds of supervisors of mathematics and mathematics professional
developers.
There has been no lack of books on all or parts of school mathematics—the
mathematics of K–12—in the education literature. We have chosen to add another
2,500 pages (the approximate total length of these six volumes) to the already
voluminous literature because we believe these volumes provide a first attempt
at solving two of the central problems in mathematics education: whether school
mathematics can be made to respect the integrity of mathematics and how much
mathematics a mathematics teacher or a mathematics educator needs to know.
We will address the former problem first. These six volumes give a detailed
confirmation of the fact that school mathematics—while maintaining its fidelity to
the progression of the standard school mathematics curriculum from kindergarten to
grade 12—can be made to respect the integrity of mathematics. Such a confirmation
has been a long time coming.
In the following pages, we will explain what mathematical integrity is and
why it is important to have an exposition of school mathematics that respects
mathematical integrity.3
At first glance, it seems absurd that there would be any need to discuss whether
school mathematics respects mathematical integrity. Is not school mathematics, by
its very name, part of mathematics and, as such, does it not follow that school math-
ematics carries the integrity inherent in the subject? This is a misconception about
school mathematics that we must confront without delay. School mathematics is
in fact not part of mathematics if mathematics is understood to be what working
mathematicians do or what is taught to math majors in college mathematics depart-
ments. Rather, school mathematics is an engineered version of mathematics—in the
sense of mathematical engineering introduced in [Wu2006]—in the same way
that civil engineering is an engineered version of Newtonian mechanics. Mathe-
matical engineering customizes the abstractions of mathematics for consumption
by K–12 students. For example, a fraction in mathematics is a straightforward
concept: it is an element of the quotient field of the integral domain of integers.
Fortunately, no one suggests that we tell this to ten-year-olds. Mathematical engi-
neering intervenes at this point to recast the concept of fractions so that fractions
can be understood by elementary students (see [Wu1998]). There are many such
examples all through the K–12 curriculum, e.g., negative numbers, slope of a line,
geometric measurements (length, area, and volume), congruence, similarity, expo-
nential functions, logarithms, axioms of plane geometry, etc. The engineering that
is needed to make these abstract concepts learnable by school students is therefore
substantial at times. Now there is good engineering, but there is also bad engineer-
ing, and the question is whether good mathematical engineering has been put in the
service of school mathematics. Unhappily, the answer is not always. In fact, school
mathematics and mathematical integrity parted ways at least five decades ago, and
our schools have been plagued by products of very bad mathematical engineering
ever since.
3 It would be legitimate to also inquire why it has taken so long for someone to try to meet
Before proceeding further, we first explain what mathematical integrity is, because
this concept is now coming into focus. We say a mathematical exposition has
mathematical integrity if it embodies the following five qualities:
(a) Definitions: Every concept is clearly and precisely defined
so that there is no ambiguity about what is being discussed. (See
the quote from Gibson at the beginning of this preface.)
(b) Precision: All statements are precise, especially the hy-
potheses that guarantee the validity of a mathematical assertion,
the reasoning in a proof, and the conclusions that follow from a
set of hypotheses.
(c) Reasoning: All statements4 other than the unavoidable ba-
sic assumptions are supported by reasoning.5
(d) Coherence: The basic concepts and skills are logically in-
terwoven to form a single fabric, and the interconnections among
them are consistently revealed.
(e) Purposefulness: The mathematical purpose behind every
concept and skill is clearly brought out so as to leave no doubt
about why it is where it is.
These we call the Fundamental Principles of Mathematics. A fuller discus-
sion of these principles will be found on pp. xxviii–xxxiii in the To the Instructor
section on pp. xxvii ff. below, but two things need to be said right away. First, the
role of definitions in school mathematics has been misunderstood, and misrepre-
sented, in the education literature thus far, so that—to educators—the emphasis on
definitions may seem to be misplaced. One will find a more balanced presentation
about definitions on pp. xxix–xxx. Next, there is no difference between reasoning
and proof in a mathematical context, and what is generally called problem solving
in the education literature is part of what is known as theorem proving in math-
ematics.6 Overall, it should not be difficult to see—and these three volumes will
bear witness to this fact—that these five fundamental principles are what make
mathematics transparent, in the sense that everything is on the table and no guess-
work or privileged knowledge is needed for its decoding. They are also the qualities
that make mathematics accessible to all students and learnable by all students. If
we want mathematics learning to take place in schools, it is incumbent on us to
teach school mathematics that is consistent with these fundamental principles.
But to return to the discussion of school mathematics of the past five decades,
we have to begin by asking what is school mathematics? This is in fact the question
that these six volumes ([Wu2011a], [Wu2016a], . . . , [Wu2020c]) attempt to answer,
but short of that, we will have to say school mathematics is the common content
of most of the mathematics textbooks in K–12 and most of the college textbooks
aimed at the professional development of mathematics teachers and mathematics
educators (compare the review of school textbooks in Appendix B of Chapter 3
4 With the exception of a few standard ones such as the fundamental theorem of algebra.
5 Intuitively, reasoning supports even those assumptions because there are reasons why we
want to assume them.
6 Compare the discussion on pp. xxxvi–xxxvii.
in [NMAP2]). If this strikes readers as too vague, they will be relieved to know
that there is in fact an amazing consistency among these textbooks.7 For example,
a fraction is thought of as a piece of pizza or a part-of-a-whole, although neither
conveys the message to students that a fraction is a number that they have to use
for extensive computations. Consequently, with such a "definition" of a fraction,
the arithmetic operations on fractions cannot be defined and their computational
algorithms cannot be proved.8 For another example, the concept of the slope of a
line in the coordinate plane is defined in most of these textbooks by taking two pre-
assigned points on the line to form the rise-over-run. But why is this rise-over-run
equal to the rise-over-run with respect to another pair of points on the same line?
Almost all textbooks insinuate that this equality is obviously true and not worth
fussing about. And so on. In general, school mathematics, as defined collectively
by these textbooks, is antithetical to mathematical integrity in that it lacks clarity
(due to a general absence of definitions and a pervasive lack of precision in its artic-
ulation), mostly asks for rote memorization as its default mode of learning (due to
the pervasive absence of reasoning), is incoherent (due to its neglect of the inherent
logical structure of mathematics), and traverses the curriculum in a listless and
pro forma manner (due to its failure to recognize the mathematical purpose behind
each topic). We call the content of these standard school mathematics textbooks
TSM (Textbook School Mathematics).9 TSM is recognized, consciously or
subconsciously, by teachers and educators to be unlearnable, and it is this unlearn-
ability that emboldens countless sensible adults to proclaim, often with pride, "I
am not good in math!"
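The slope question raised above (why the rise-over-run does not depend on the pair of points chosen) can be settled by a short argument with similar triangles. The following is only a sketch under stated assumptions: the point names P, Q, A, B, R, C and the left-to-right labeling are introduced here for illustration and do not come from the text.

```latex
% Sketch: the slope of a line is independent of the two points chosen.
% Assumed notation (not from the text): L is a nonvertical, nonhorizontal line;
% P, Q and A, B are two pairs of distinct points on L, each labeled left to right.
Let $P=(p_1,p_2)$, $Q=(q_1,q_2)$, $A=(a_1,a_2)$, $B=(b_1,b_2)$ lie on $L$
with $p_1<q_1$ and $a_1<b_1$. Put $R=(q_1,p_2)$ and $C=(b_1,a_2)$, so that
$\triangle PQR$ and $\triangle ABC$ have right angles at $R$ and $C$.
The horizontal lines $PR$ and $AC$ are parallel (or coincide) and $L$ is a
transversal, so $\angle QPR = \angle BAC$. By the AA criterion,
$\triangle PQR \sim \triangle ABC$, hence
\[
  \frac{|QR|}{|PR|} = \frac{|BC|}{|AC|},
  \qquad\text{i.e.}\qquad
  \frac{|q_2-p_2|}{q_1-p_1} = \frac{|b_2-a_2|}{b_1-a_1}.
\]
% Removing the absolute values: L either rises or falls from left to right,
% so the two numerators have the same sign.
Since $q_2-p_2$ and $b_2-a_2$ have the same sign, it follows that
\[
  \frac{q_2-p_2}{q_1-p_1} = \frac{b_2-a_2}{b_1-a_1}.
\]
```

The argument depends on having the concept of similar triangles available, which is one reason the equality cannot simply be declared obvious.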
There is a far more pernicious fallout from TSM, however, and it is the effect
TSM has on mathematics teachers and educators. These teachers and educators
have learned only TSM in K–12, but as of 2020, institutions of higher learning do
not provide courses to help future teachers and educators to replace their knowledge
of TSM with school mathematics with mathematical integrity. Consequently, all
that most teachers can do when they go back to teach in K–12 is trot out the TSM
they are familiar with, and all that most educators can do when they begin their
research is to fall back on the TSM they were taught. So the next generation also
learns TSM, and this is the vicious cycle that has rendered school mathematics
synonymous with TSM for at least the past five decades. Most educators may have
suspected that there must be more to school mathematics than TSM, but without
access to an exposition of school mathematics with mathematical integrity, their
suspicion remains just that, a suspicion.
Back in 1985, Lee Shulman lamented in his well-known address to the AERA
about "the absence of focus on subject matter among the various research paradigms
for the study of teaching" ([Shulman, page 6]). Shulman was talking about all dis-
ciplines, but from the standpoint of these six volumes ([Wu2011a], [Wu2016a], . . . ,
[Wu2020c]), we gain a clear perspective on how this neglect of the subject matter
may have come about in mathematics. We speculate that mathematics educators
7 When I first had the opportunity to sample a wide range of the available K–12 textbooks
around the year 2000, I was convinced that the publishers were in collusion and
had simply agreed to copy each other.
8 Remember: "Ours is not to reason why. Just invert and multiply."
9 For a more extended discussion of TSM, see To the Instructor on pp. xxvii ff. as well as
[Wu2014] and [Wu2018a]. Note that TSM provides a new window into the phenomenon known as
math phobia.
may have chosen not to pay any attention to mathematical content because, since
school mathematics was apparently nothing more than TSM, they saw nothing in
the subject matter of school mathematics worthy of their serious attention. To
change mathematics educators’ perception of the subject matter in mathematics,
we have to give them access to a fully detailed exposition of school mathematics
with mathematical integrity.
The omnipresence of TSM in the last half-century created the unmistakable im-
pression that perhaps at least some of the travesties in TSM are necessarily endemic
to school mathematics. Under the circumstances, it was not easy to imagine that
school mathematics might have anything to do with mathematical integrity. But
two things happened around 1990. In 1989, NCTM (National Council of Teachers
of Mathematics) launched its school mathematics education reform by proclaiming
that school mathematics could be made to respect mathematical integrity. Without
a detailed exposition of school mathematics that respects mathematical integrity
to back up its claim, NCTM was of course going out on a limb. Then in 1994,
Alan Schoenfeld made a scholarly statement with the clear implication that, while
a school mathematics curriculum with mathematical integrity was certainly possi-
ble, we did not have it yet. What he wrote was, "Proof is not a thing separable
from mathematics, as it appears to be in our curricula . . . . And I believe it can be
imbedded in our curricula, at all levels" ([Schoenfeld1994, page 76]). Schoenfeld’s
statement was prompted in part by the debates surrounding the NCTM reform.
Note that beyond affirming his belief in the fundamental article of faith underly-
ing the NCTM reform, he stated openly that, indeed, this article of faith had not
yet been confirmed. We will return to Schoenfeld’s statement below, but before
proceeding further, we will make a few comments about the NCTM reform.
The foundational documents of the NCTM reform are the two sets of standards:
the 1989 [NCTM1989] and the 2000 [PSSM]. Although NCTM did not have an ex-
plicit recognition of the concept of TSM, the 1989 reform was undoubtedly a revolt
against the stranglehold of TSM on school mathematics education. NCTM declared
in essence that mathematical integrity must be part of school mathematics. For ex-
ample, [NCTM1989] states that one of the reform’s goals is that students "become
mathematically literate" (page 6). [PSSM] states that "a mathematics curriculum
should be coherent" (page 15), "should focus on important mathematics" (page
15), and "reasoning and proof should be a consistent part of students’ mathemati-
cal experience in prekindergarten through grade 12" (page 56). With the hindsight
of thirty years, we can see all too clearly the obstacles that confronted the NCTM
reform. With students, teachers, and educators completely immersed in TSM, the
clarion call for coherence, reasoning, and proof might as well have been stated in a
foreign language. Most of them had no conception of what those words meant.
We have to remember that, for example, what little "proof" TSM has to offer
resides only in the course in high school geometry, and even there, proofs are mainly
taught by rote (see [Schoenfeld1988]). Back in 1989, there was no detailed point-by-
point exposition of school mathematics that could provide a roadmap to show how
mathematical integrity can be introduced into school mathematics. There were no
school mathematics textbooks to replace the TSM-infested ones.10 Most fatally,
NCTM made no commitment to a massive and long-term professional development
10 In the year 2020, most of us can calmly look back and see that the reform curricula that
11 The absence of this commitment was no accident. To carry out this kind of professional
development on a large scale, the need for something like these six volumes ([Wu2011a], . . . ,
[Wu2020c]) to serve as a guide would be absolute.
in [Wu2016b], similar to Chapter 6 of this volume), and middle and high school
geometry (along the lines of Chapters 4 and 5 of [Wu2016a], similar to Chapters 4
and 5 in this volume).
However, in the apparent absence of a detailed account of what a CCSSM cur-
riculum would look like,12 the specificity of the curricular deviations in CCSSM
turns out to be more of a political liability than an asset. Many people immedi-
ately put CCSSM and the NCTM reform on the same footing. Their perception was
that these two movements represented what happens when a bunch of wannabes
pontificate about school mathematics education without knowing what they are
talking about. A conspicuous example they cited is CCSSM’s approach to the ge-
ometry curriculum in middle school and high school using reflections, rotations,
and translations as the basic building blocks. Such a change is necessitated by the
inherently flawed TSM geometry curriculum based on an uninformed interpreta-
tion of the work of Euclid some twenty-three centuries ago (see pp. 157–164 below
for a more detailed explanation, and see Chapter 8 of [Wu2020b] for a compre-
hensive one). Instead, CCSSM calls for a nuanced two-step process to introduce
reflections, rotations, and translations as the foundational building blocks of the
school geometry curriculum. Standards 8.G on page 55 of [CCSSM] describe how,
in grade 8, these transformations can be used informally (but correctly) in heuris-
tic arguments to develop students’ intuition about transformations (as detailed in
Chapters 4 and 5 of [Wu2016a]). Then in high school, these transformations are
precisely defined to be used for formal proofs (as in Chapters 4 and 5 of this vol-
ume and Chapters 6 and 7 of [Wu2020b]). But not having such details available
back in 2010, many critics, educators, and teachers immediately predicted the im-
pending doom of this effort by CCSSM by citing the failures of putative similar
experiments in other nations. Moreover, they also predicted (not entirely incor-
rectly) the almost certain confusion among teachers who would try to implement
this new curriculum, and they regarded as inevitable the disappearance of proofs
from CCSSM high school geometry. In the absence of a detailed exposition that
shows how to navigate and implement the Common Core geometry standards with
mathematical integrity,13 such misunderstanding led inevitably to harsh criticism
(see, e.g., [Milgram-Wurman, pp. 4–5] and [Phelps-Milgram, page 10 and footnote
15 on page 41]). So CCSSM ends up facing the same wide credibility gap that
plagued the NCTM reform twenty years ago.
It did not help that the CCSSM agenda also left out the critical component of
professional development for teachers, thereby creating the same sense of bewilder-
ment in classrooms across the land (see [Education Week], [Loewus1], [Loewus2],
and [Sawchuk]). It would seem that CCSSM is repeating the same mistake as
the NCTM reform by not taking seriously the need to offer sustained, large-scale
professional development for teachers to help with its implementation. With the
the part of the CCSSM curriculum mentioned above—fractions, finite decimals, rational numbers,
parts of algebra, and middle school geometry—in the form of [Wu2010a] and [Wu2010b] in fact
predated CCSSM (they were drafts for [Wu2016a] and [Wu2016b], respectively). However, the
existence of these documents was not made widely known.
13 Note, however, that many curricular details on geometry were soon provided in [Wu2012],
[Wu2013], [Wu2016a], and [Wu2016b], but they were not made widely known. A detailed CCSSM-
aligned high school geometry curriculum, in existence since 2007, will appear in the second volume
of this three-volume set, [Wu2020b].
publication of these three volumes, at least one complete exposition of school math-
ematics with mathematical integrity—an exposition that is also consistent in the
main with CCSSM—will be available to provide the needed guidance for this kind
of professional development, but will these volumes be too little too late? Only
time will tell.
It should be abundantly clear from the foregoing discussion that any real im-
provement in school mathematics education requires us to rethink the mathematical
education of teachers and educators. In particular, the destructive presence of TSM
can no longer be ignored. These six volumes have been written with the express
intent of encouraging and supporting such rethinking.
In the last few years, several books have made a concerted effort to promote
the introduction of mathematical integrity into school mathematics, e.g., [MET2],
[NCTM2009], [MUST], and the sixteen volumes in the NCTM series Developing
Essential Understanding (e.g., [Ellis-Bieda-Knuth]). In a book entitled We Reason
& We Prove for All Mathematics, Arbaugh et al. respond directly to Schoenfeld’s
belief in the possibility of imbedding proofs in all levels of K–12 (quoted on page
xv) by flatly stating that, in their volume, they "will provide guidance about how
to make reasoning-and-proving a reality in your classroom" ([Arbaugh et al., page
x]). These developments are welcome because their willingness to directly address
the content of K–12 mathematics represents a giant step forward in school mathe-
matics education at a time when many are still clinging to the idea that integrating
fun, engaging activities into the classroom—while leaving TSM intact—is the way
to improve school mathematics education. Nevertheless, we must also add a word
of caution at this juncture of school mathematics education concerning the effec-
tiveness of "providing guidance" in small doses, quite apart from the quality of the
guidance itself.
As of 2020, we have to face the unpleasant truth that, because of the longstand-
ing malfeasance of the education establishment, most people in school mathematics
education have been immersed in TSM, and only TSM, for their entire lives. Conse-
quently, most end up being deficient in a detailed knowledge of the inner workings
of mathematics on the one hand and in a coherent view of mathematics as a whole
on the other. An example of the former is the chronic failure to recognize that,
without precise definitions, correct reasoning (= proof) is unattainable. Another
example of the same is the fact that a proof must not be confused with a heuristic
argument, no matter how attractive that heuristic argument may be.
Examples of the lack of a global, coherent view of mathematics abound in
TSM, but we will limit our discussion to only three of them. The first is the lack
of awareness of the overall hierarchical structure of mathematics; e.g., in order to
move forward mathematically in a mathematical development, one may only use
results already proved earlier. There is no better illustration of this lack than the
"proof" in TSM of equivalent fractions using fraction multiplication14 —one that
is universally taught in TSM. Such a "proof" should be recognized for what it is:
totally anti-mathematical. Here, the details are, step by step, impeccable, but the
flagrant mathematical error lies in using a fact—about fraction multiplication that
14 This is the reasoning that $\frac{2}{3} = \frac{2\times 4}{3\times 4}$ because $\frac{2}{3} = \frac{2}{3}\times 1 = \frac{2}{3}\times\frac{4}{4} = \frac{2\times 4}{3\times 4}$.
15 Teachers need to know where their students come from and where they are headed,
curriculumwise.
We have just given a partial explanation of why these six volumes (this vol-
ume, together with [Wu2011a], [Wu2016a], [Wu2016b], [Wu2020b], and [Wu2020c])
require 2,500 pages of detailed mathematical discussions to confirm the fact that
school mathematics can be made to respect mathematical integrity. Because of the
corrosive effects of TSM that have pervaded and degraded school mathematics for
so long, we are obliged to rebuild school mathematics from the ground up. In these
six volumes, we take nothing for granted. For example, we pay special attention
to the need for correct definitions as the basis for reasoning and proofs; we want
to drive home the point that once a definition of a concept (such as a fraction) is
given, then every subsequent assertion about this concept has to be based on the
definition, and on the definition alone. Every statement in these volumes, from
whole numbers to calculus, is carefully proved.16 The intended goal of this effort
is to clarify, cumulatively, the mathematical meaning of the declarative statement,
"A implies B", as a purely deductive process that begins with the hypothesis A and
arrives at the conclusion B. This is in contrast with the common practice in TSM
of "explaining" something by telling a story, by drawing an analogy, or by offering
an attractive pattern or heuristic argument. These six volumes take an entirely
different tack: they show, consistently, how to verify "A implies B" in mathematics
by moving from A to B on the basis of definitions, explicit assumptions, or theo-
rems with the help of logic. These volumes do so—we emphasize—from the first
page to the last because we believe that the way to teach is not to pontificate but
to lead by example. This process of acculturating teachers (and ultimately their
students) to reasoning and proof does not have to be rigid or formal, especially in
the early grades (see, e.g., Section 4.2 or Section 6.2 of [Wu2011a]), but the essential
elements of logical deduction must be put in place and maintained ab initio to
preserve the integrity of mathematics. We also go into extensive detail about such
seemingly pedestrian topics as the proper use of symbols (Sections 6.1 and 6.2 on
pp. 298ff.), the meaning of an equation, and what it means to solve an equation
(see pp. 322–324), with the hope that the long years of obfuscation in TSM with
such jargon as "variables" and "symbolic manipulations for solving an equation"
will be brought to a merciful end.
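As a small illustration of reading "A implies B" as a deduction, and of what it means to solve an equation, consider the following worked example. The particular equation is a hypothetical one chosen here for illustration only; it is not taken from the text.

```latex
% Illustration only: the equation 2x + 3 = 7 is a hypothetical example.
% Step 1 (A implies B): if a number x satisfies 2x + 3 = 7, then x = 2.
\begin{align*}
  2x + 3 &= 7 && \text{(hypothesis about the number $x$)}\\
  2x &= 4 && \text{(subtract 3 from both sides)}\\
  x &= 2 && \text{(divide both sides by 2)}
\end{align*}
% Step 2 (B implies A): conversely, the number 2 does satisfy the equation.
\[
  2\cdot 2 + 3 = 7 .
\]
% Together: a number x satisfies 2x + 3 = 7 if and only if x = 2.
```

Each step is a statement about numbers justified by a stated operation, so that no appeal to "symbolic manipulation" is needed.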
We hope that the foregoing discussion has made the case for the critical need
for a thorough-going exposition of school mathematics with mathematical integrity.
Incidentally, the only reason we have made repeated references in this whole discus-
sion to the same six volumes by the present author is that there is no comparable
exposition at the moment. It is in fact our hope that the publication of these six
volumes will encourage others to come up with their own ways of replacing TSM
across K–12 with a development of school mathematics that respects mathematical
integrity.
Knowing what school mathematics with mathematical integrity looks like en-
ables us to face up to the second problem in mathematics education that was
mentioned on page xii: how much mathematics a mathematics teacher or a mathe-
matics educator needs to know. For teachers, this problem has a long history; see,
16 With the usual disclaimer that there are a very few theorems that we must intentionally
Now consider the teaching and learning of fractions and (finite) decimals. While
education researchers of the past five decades were no doubt aware of the simple
treatment of fractions in abstract algebra, their uncritical acceptance of TSM misled
them into believing that, for elementary school students, one can do no better than
teaching fractions as pieces of pizzas or some variation thereof. Consequently, in
their research on the teaching and learning of fractions, they focussed, for the most
part, on tweaking the TSM model of fractions-as-pizzas, with no thought given to
helping students learn about fractions as numbers or learn to reason their way through
the arithmetic of fractions.19 As a result, education research on fractions has fo-
cussed on increasing children’s experiential and informal familiarity with fractions
17 Regardless of the fact that the term TSM was coined only in 2011.
18 We strongly believe that the mathematics of elementary school should be taught by math-
ematics teachers. See [Wu2009].
19 Unhappily, TSM also claims some professional mathematicians among its victims: these
mathematicians have come to believe that teaching fractions in schools can lead to nothing more
than "confusion and memorization". See, for example, [DeTurck].
based on the pizza model rather than on increasing children’s mathematical knowl-
edge of fractions based on a correct definition of a fraction. If it had tried to do
the latter, it would have rejected the absurd pizza model from the outset (see, e.g.,
pp. 33–35 of [Wu2008] for a brief discussion of the relevant literature). The same
body of education research has also tried to make sense—unsuccessfully of course—
of other anti-mathematical practices, such as treating decimals as a different kind
of number, adding and subtracting fractions using the least common denominator,
or teaching the multiplication and division of fractions without precise definitions.
Only recently have researchers become aware of a more reasonable foundation for
fractions (initiated in [Wu1998] and expanded in [Wu2011a]; abbreviated versions
are given in Chapter 1 of [Wu2016a] and Chapter 1 of this volume)20 that puts
the study of fractions on the number line, emphasizes the concept of a fraction as
a number for arithmetic computations, and makes sense of (finite) decimals as a
special collection of fractions. There is still some distance to go in this direction,
such as honoring the definition of a fraction by using it in every situation, e.g., for
multiplication, for division, for understanding ratios, etc. We eagerly look forward
to a change along these lines in the education research on fractions and decimals in
the years to come (cf. [Siegler et al.]).
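To make concrete what it can mean for finite decimals to be a special collection of fractions, here is a minimal worked example. It assumes, for this sketch, that a finite decimal is defined as a fraction whose denominator is a power of 10; the specific numbers are chosen only for illustration.

```latex
% Assumed definition for this sketch: a finite decimal is a fraction
% whose denominator is a power of 10.
\[
  0.25 = \frac{25}{10^{2}}, \qquad 3.7 = \frac{37}{10} = \frac{370}{10^{2}},
\]
% Adding the decimals is then just adding fractions with equal denominators.
\[
  0.25 + 3.7 = \frac{25}{10^{2}} + \frac{370}{10^{2}} = \frac{395}{10^{2}} = 3.95 .
\]
```

On this reading, the rule about lining up decimal points is a consequence of fraction addition, not a separate fact about a different kind of number.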
School textbooks
Better school mathematics education requires not only more effective teach-
ers but also textbooks that contain only school mathematics with mathematical
integrity. Our discussion thus far has been all about getting more effective teach-
ers but nothing about getting better textbooks. This is not because we believe
textbooks are less important, but since most school textbooks are published by
the major publishers, there is little that people in academia can do to convince
publishers to abandon their bottom-line mentality and write better textbooks (cf.
[Keeghan]). However, there are now several online curricula written more or less in
accordance with CCSSM and, according to some reports, a few seem to be showing
promise.21
As we said at the beginning, we hope these six volumes under discussion can
serve as a blueprint for better school textbooks. But let us add a few caveats in
this regard. First of all, these six volumes are certainly not student textbooks: they
are written specifically for adults (teachers and educators, maybe some curious par-
ents). Nevertheless, their mathematical content has been carefully customized (i.e.,
engineered) for use in the appropriate grades, at least as far as the mathematical
level of sophistication is concerned, so that after some straightforward pedagogical
modifications and embellishments, they can be expanded into student textbooks.
An example of how such an expansion may be realized will appear before long,
we hope, in the form of a student textbook for grade 8 that will be posted on the
author’s homepage, https://math.berkeley.edu/~wu/. At the very least, we believe
these six volumes taken together can serve as a detailed guide for textbook
publishers on how to write school mathematics textbooks across K–12 that respect
both the standard curricular sequence and mathematical integrity. For this purpose,
textbook writers should take note that there are several major departures
from the standard school curriculum in this volume and [Wu2020b] and [Wu2020c].
Briefly, they are the following:
20 This approach to fractions and decimals, as presented in [Wu2016a], served as a blueprint
for the fractions and decimals standards of [CCSSM]. Because this volume is for consumption by
high school teachers and mathematics educators, what is in Chapter 1 is more brief and slightly
more sophisticated than its counterparts in [Wu2011a] and [Wu2016a].
21 It is uncertain whether any of the textbook evaluation agencies is aware of the importance
The presentation of the curricular shift described in (3) will be given in Chap-
ters 4 and 5 of this volume, but with a mild twist. Because the informal geometry
(proposed for grade 8) has already been treated in detail in [Wu2016a], the ge-
ometry in Chapters 4 and 5 of the present volume will be the formal high school
counterpart of the informal geometry in [Wu2016a]. The exposition of the main
body of plane geometry (geometry of the triangle and the circle along with con-
structions with ruler and compass) then resumes in Chapters 6 and 7 of the second
volume, [Wu2020b], after we have finished discussing the standard topics of second-
year algebra.
Final thoughts
We call special attention to the fact that the third and last of these three
volumes, [Wu2020c], is essentially an introduction to mathematical analysis, cus-
tomized specifically for consumption by prospective mathematics teachers and ed-
ucators.22 It is likely that this material will also benefit beginning math majors in
college.
We should also address an obvious question that probably has been on readers’
minds all along; namely, why does this volume on high school mathematics begin
with the middle school topics of fractions and rational numbers? Nothing need be
said about the obvious relevance of these topics to mathematics educators, but we
owe high school teachers an explanation of why we consider these topics to be an
integral part of their content knowledge. It is a fact—though hidden in TSM—that
rational numbers, rather than real numbers, are the backbone of the mathematics
in grades 5–12. Unfortunately, because of TSM, students in all grades seem to
have trouble with fractions and, consequently, with rational numbers. Given the
hierarchical structure of mathematics, it is not surprising that students’ inability
to learn algebra can often be traced back to their weakness in the foundational
subjects of fractions and rational numbers. This was pointed out in the National
Mathematics Advisory Panel Report (see page 18 of [NMAP1]). Indeed, the story
has been told many times that even students in honors sections of Algebra 2 plead
with their teachers to give them instructions on fractions. So, to be effective in
teaching the standard topics of high school mathematics, high school teachers must
have a TSM-free working knowledge of fractions and rational numbers as well.
A final reflection: Earlier, we quoted Lee Shulman’s lament about "the absence
of focus on subject matter among the various research paradigms for the study of
teaching" (see page xiv). These six volumes have now redefined the meaning of
this subject matter for school mathematics. We hope mathematics educators will
discover through these volumes that the mathematics underlying school mathemat-
ics, when presented correctly, is no longer meaningless like TSM and is worthy of
their best efforts to learn it. Moreover, the subject matter, thus redefined, will have
repercussions on "the study of teaching". As school mathematics becomes more
learnable by all students, and therefore more teachable by all teachers, pedagogy
will have to focus—not on how to render the incomprehensible23 palatable—but on
how to facilitate the normal process of learning so that all students can learn how
to reason critically and correctly.
22 Here as well as elsewhere in these three volumes, we are engaging in serious mathematical
But for all that, it will be necessary to first make school mathematics that re-
spects mathematical integrity an integral part of mathematics education research.
This then harks back to Lee Shulman’s lament. It is our belief and our hope that
school mathematics education will improve when mathematics education research
begins to address, not TSM, but school mathematics with mathematical integrity.
Acknowledgements
The drafts of this volume and its companion volumes, [Wu2020b] and
[Wu2020c], have been used since 2006 in the mathematics department at the Univer-
sity of California at Berkeley as textbooks for a three-semester sequence of courses,
Math 151–153, that was created for pre-service high school teachers.24 The two
people most responsible for making these courses a reality were the two chairs of
the mathematics department in those early years: Calvin Moore and Ted Slaman.
I am immensely indebted to them for their support. I should not fail to mention
that, at one point, Ted volunteered to teach an extra course for me in order to
free me up for the writing of an early draft of these volumes. Would that all of
us had chairs like him! Mark Richards, then Dean of Physical Sciences, was also
behind these courses from the beginning. His support not only meant a lot to me
personally, but I suspect that it also had something to do with the survival of these
courses in a research-oriented department.
It is manifestly impossible to write three volumes of teaching materials without
generous help from students and friends in the form of corrections and suggestions
for improvement. I have been fortunate in this regard, and I want to thank them
all for their critical contributions: Richard Askey,25 David Ebin, Emiliano Gómez,
Larry Francis, Ole Hald, Sunil Koswatta, Bob LeBoeuf, Gowri Meda, Clinton Rem-
pel, Ken Ribet, Shari Lind Scott, Angelo Segalla, and Kelli Talaska. Dick Askey’s
name will be mentioned in several places in these volumes, but I have benefitted
from his judgment much more than what those explicit citations would seem to
indicate. I especially appreciate the fact that he shared my belief early on in the
corrosive effect of TSM on school mathematics education. David Ebin and Angelo
Segalla taught from these volumes at SUNY Stony Brook and CSU Long Beach,
respectively, and I am grateful to them for their invaluable input from the trenches.
I must also thank Emiliano Gómez, who has taught these courses more times than
anybody else with the exception of Ole Hald. Some of his deceptively simple com-
ments have led to much soul-searching and extensive corrections. Bob LeBoeuf put
up with my last-minute requests for help, and he showed what real dedication to a
cause is all about.
Section 1.9 of the third volume ([Wu2020c]) on the importance of sine and
cosine could not have been written without special help from Professors Thomas
Kailath and Julius O. Smith III of Stanford University, as well as from my longtime
collaborator Robert Greene of UCLA. I am grateful to them for their uncommon
courtesy.
24 Since the fall of 2018, this three-semester sequence has been pared down to a two-semester
sequence. A partial study of the effects of these courses on pre-service teachers can be found in
[Newton-Poon1].
25 Sadly, Dick passed away on October 9, 2019.
Last but not least, I have to single out two names for my special expression
of gratitude. Larry Francis has been my editor for many years, and he has pored
over every single draft of these manuscripts with the same meticulous care from
the first word to the last. I want to take this opportunity to thank him for the
invaluable help he has consistently provided me. Ole Hald took it upon himself to
teach the whole Math 151–153 sequence—without a break—several times to help
me improve these volumes. That he did, in more ways than I can count. His
numerous corrections and suggestions, big and small, all throughout the last nine
years have led to many dramatic improvements. My indebtedness to him is too
great to be expressed in words.
Hung-Hsi Wu
Berkeley, California
February 2, 2020
To the Instructor
These three volumes (the other two being [Wu2020b] and [Wu2020c]) have
been written expressly for high school mathematics teachers and mathematics ed-
ucators.1 Their goal is to revisit the high school mathematics curriculum, together
with relevant topics from middle school, to help teachers better understand the
mathematics they are or will be teaching and to help educators establish a sound
mathematical platform on which to base their research. In terms of mathematical
sophistication, these three volumes are designed for use in upper division courses
for math majors in college. Since their content consists of topics in the upper
end of school mathematics (including one-variable calculus), these volumes are in
the unenviable position of straddling two disciplines: mathematics and education.
Such being the case, these volumes will inevitably inspire misconceptions on both
sides. We must therefore address their possible misuse in the hands of both math-
ematicians and educators. To this end, let us briefly review the state of school
mathematics education as of 2020.
For roughly the last five decades, the nation has had a de facto national school
mathematics curriculum, one that has been defined by the standard school math-
ematics textbooks. The mathematics encoded in these textbooks is extremely
flawed.2 We call the body of knowledge encoded in these textbooks TSM
(Textbook School Mathematics; see page xiv). We will presently give a su-
perficial survey of some of these flaws,3 but what matters to us here is the fact that
institutions of higher learning appear to be oblivious to the rampant mathematical
mis-education of students in K–12 and have done very little to address the insid-
ious presence of TSM in the mathematics taught to K–12 students over the last
50 years. As a result, mathematics teachers are forced to carry out their teaching
duties with all the misconceptions they acquired from TSM intact, and educators
likewise continue to base their research on what they learned from TSM. So TSM
lives on unchallenged.
These three volumes are the conclusion of a six-volume series4 whose goal is
to correct the universities’ curricular oversight in the mathematical education of
1 We use the term "mathematics educators" to refer to university faculty in schools of education.
2 These statements about curriculum and textbooks do not take into account how much the quality
of school textbooks and teachers’ content knowledge may have evolved recently with the advent of
CCSSM (Common Core State Standards for Mathematics) ([CCSSM]) in 2010.
3 Detailed criticisms and explicit corrections of these flaws are scattered throughout these
volumes.
4 The earlier volumes in the series are [Wu2011a], [Wu2016a], and [Wu2016b].
6 Proponents of this approach to definitions often seem to forget that, after the emergence
of a precise definition, students are still owed a systematic exposition of mathematics using the
definition so that they can learn about how the definition fits into the overall logical structure of
mathematics.
(2) Reasoning. Reasoning is the lifeblood of mathematics, and the main rea-
son for learning mathematics is to learn how to reason. In the context of school
mathematics, reasoning is important to students because it is the tool that empow-
ers them to explore on their own and verify for themselves what is true and what
is false without having to take other people’s words on faith. Reasoning gives them
confidence and independence. But when students have to accustom themselves to
performing one unexplained rote skill after another, year after year, their ability
to reason will naturally atrophy. Many students find it more expedient to stop
asking why and simply take any order that comes their way sight unseen just to
get by.7 One can only speculate on the cumulative effect this kind of mathematics
"learning" has on those students who go on to become teachers and mathematics
educators.
(3) Precision. The purpose of precision is to eliminate errors and minimize
misconceptions, but in TSM students learn at every turn that they should not
believe exactly what they are told but must learn to be creative in interpreting it.
For example, TSM preaches the virtue of using the theorem on equivalent fractions
to simplify fractions and does not hesitate to simplify a rational expression in x as
follows:
\[ \frac{(x-1)(x^2+3)}{x(x-1)} = \frac{x^2+3}{x}. \]
This looks familiar because "canceling the same number from top and bottom" is
exactly what the theorem on equivalent fractions is supposed to do. Unfortunately,
this theorem only guarantees
\[ \frac{ca}{bc} = \frac{a}{b} \]
when a, b, and c are whole numbers (b and c understood to be nonzero). In the
previous rational expression, however, none of $(x-1)$, $(x^2+3)$, and $x$ is necessarily a
whole number because x could be, for example, $\sqrt{5}$. Therefore, according to TSM,
students in algebra should look back at equivalent fractions and realize that the
theorem on equivalent fractions—in spite of what it says—can actually be applied
to "fractions" whose "numerators" and "denominators" are not whole numbers.
Thus TSM encourages students to believe that "nothing needs to be taken precisely
and one must be flexible in interpreting what one learns". This extrapolation-happy
mindset is the opposite of what it takes to learn a precise subject like mathematics
or any of the exact sciences. For example, we cannot allow students to believe that
the domain of definition of log x is [0, ∞) since [0, ∞) is more or less the same as
(0, ∞). Indeed, the presence or absence of the single point "0" is the difference
between true and false.
Another example of how a lack of precision leads to misconceptions is the
statement that "$\beta^0 = 1$", where $\beta$ is a nonzero number. Because TSM does not
use precise language, it does not—or cannot—draw a sharp distinction between a
heuristic argument, a definition, and a proof. Consequently, it has misled numerous
students and teachers into believing that the heuristic argument for defining $\beta^0$ to
be 1 is in fact a "proof" that $\beta^0 = 1$. The same misconception persists for negative
exponents (e.g., $\beta^{-n} = 1/\beta^n$). The lack of precision is so pervasive in TSM that
there is no end to such examples.
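To make the distinction concrete, here is a sketch of the usual heuristic argument, annotated so that its logical status is visible. The symbols $m$ and $n$ (positive integers) and the comments are ours, added for illustration; the point is only that the computation motivates a definition and does not prove a theorem.

```latex
% Heuristic (motivation for a definition, not a proof).
% Already known for positive integers m, n and nonzero \beta:
\[ \beta^{m}\,\beta^{n} = \beta^{m+n}. \]
% IF we want this law to remain valid when n = 0, we would need
\[ \beta^{m}\,\beta^{0} = \beta^{m+0} = \beta^{m}, \]
% and dividing by \beta^{m} \neq 0 leaves only one possible choice:
\[ \beta^{0} = 1 \quad \text{(adopted as a definition so that the law persists).} \]
% Likewise, wanting \beta^{n}\,\beta^{-n} = \beta^{0} = 1 suggests the definition
\[ \beta^{-n} = \frac{1}{\beta^{n}}. \]
```

Every step after the first line begins with a wish ("if we want the law to persist"), which is why the outcome is a reasonable definition and not a deduced fact.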
7 There is consistent anecdotal evidence from teachers in the trenches that such is the case.
(4) Coherence. Another reason why TSM is less than learnable is its inco-
herence. Skills in TSM are framed as part of a long laundry list, and the lack of
definitions for concepts ensures that skills and their underlying concepts remain
forever disconnected. Mathematics, on the other hand, unfolds from a few cen-
tral ideas, and concepts and skills are developed along the way to meet the needs
that emerge in the process of unfolding. An acceptable exposition of mathematics
therefore tells a coherent story that makes mathematics memorable. For example,
consider the fact that TSM makes the four standard algorithms for whole numbers
four separate rote-learning skills. Thus TSM hides from students the overriding
theme that the Hindu-Arabic numeral system is universally adopted because it
makes possible a simple, algorithmic procedure for computations; namely, if we
can carry out an operation (+, −, ×, or ÷) for single-digit numbers, then we can
carry out this operation for all whole numbers no matter how many digits they
have (see Chapter 3 of [Wu2011a]). The standard algorithms are the vehicles that
bridge operations with single-digit numbers and operations on all whole numbers.
Moreover, the standard algorithms can be simply explained by a straightforward
application of the associative, commutative, and distributive laws. From this per-
spective, a teacher can explain to students, convincingly, why the multiplication
table is very much worth learning; this would ease one of the main pedagogical
bottlenecks in elementary school. Moreover, a teacher can also make sense of the
associative, commutative, and distributive laws to elementary students and help
them see that these are vital tools for doing mathematics rather than dinosaurs in
an outdated school curriculum. If these facts had been widely known during the
1990s, the senseless debate on whether the standard algorithms should be taught
might not have arisen and the Math Wars might not have taken place at all.
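The reduction of whole-number multiplication to single-digit facts can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the function names `digits` and `multiply_by_digits` are hypothetical, the code handles whole numbers only, and it lets Python perform the final additions instead of modeling the carrying steps of the standard algorithm.

```python
def digits(n):
    """Return the base-10 digits of a whole number, least significant first."""
    if n == 0:
        return [0]
    out = []
    while n > 0:
        out.append(n % 10)
        n //= 10
    return out


def multiply_by_digits(a, b):
    """Multiply two whole numbers using only single-digit products.

    Writing a = sum(a_i * 10**i) and b = sum(b_j * 10**j), the distributive
    law (together with the associative and commutative laws) gives
    a * b = sum over i, j of (a_i * b_j) * 10**(i + j).
    """
    total = 0
    for i, digit_a in enumerate(digits(a)):
        for j, digit_b in enumerate(digits(b)):
            table_fact = digit_a * digit_b        # a multiplication-table entry
            total += table_fact * 10 ** (i + j)   # put it in the right place value
    return total
```

For instance, `multiply_by_digits(47, 86)` assembles 4042 from the four table facts 7×6, 7×8, 4×6, and 4×8, each shifted to its place value.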
TSM also treats whole numbers, fractions, (finite) decimals, and rational num-
bers as four different kinds of numbers. The reality is that, first of all, decimals are
a special class of fractions (see pp. 14ff.), whole numbers are part of fractions, and
fractions are part of rational numbers. Moreover, the four arithmetic operations
(+, −, ×, and ÷) in each of these number systems do not essentially change from
system to system. There is a smooth conceptual transition at each step of the
passage from whole numbers to fractions and from fractions to rational numbers;
see Parts 2 and 3 of [Wu2011a] or Sections 2.2, 2.4, and 2.5 in this volume. This
coherence facilitates learning: instead of having to learn about four different kinds
of numbers, students basically only need to learn about one number system (the
rational numbers). Yet another example is the conceptual unity between linear
functions and quadratic functions: in each case, the leading term ($ax$ for linear
functions and $ax^2$ for quadratic functions) determines the shape of the graph of
the function completely, and the studies of the two kinds of functions become sim-
ilar as each revolves around the shape of the graph (see Section 2.1 of [Wu2020b]).
Mathematical coherence gives us many such storylines, and a few more will be
detailed below.
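The claim about the leading term can be made concrete by completing the square. The following is a sketch of one standard computation, with $a \neq 0$ and with constants $b$ and $c$ introduced here for illustration; it is not necessarily the route taken in Section 2.1 of [Wu2020b].

```latex
% Linear case: the graph of ax + b is the graph of ax translated by the vector (0, b).
% Quadratic case: complete the square.
\[ ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^{2} + \left(c - \frac{b^{2}}{4a}\right). \]
% Hence the graph of ax^2 + bx + c is the graph of ax^2 translated by the vector
\[ \left(-\frac{b}{2a},\; c - \frac{b^{2}}{4a}\right). \]
```

Read this way, the lower-order coefficients only move the graph; its shape is that of the graph of the leading term.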
(5) Purposefulness. In addition to the preceding four shortcomings—a lack
of clear definitions, faulty or nonexistent reasoning, pervasive imprecision, and gen-
eral incoherence—TSM has a fifth fatal flaw: it lacks purposefulness. Purposefulness
is what gives mathematics its vitality and focus: the fact is that a mathematical
investigation, at any level, is always carried out with a specific goal in mind. When
a mathematics textbook reflects this goal-oriented character of mathematics, it
propels the mathematical narrative forward and facilitates its learning by making
students aware of where the discussion is headed, and why. Too often, TSM lurches
from topic to topic with no apparent purpose, leading students to wonder why they
should bother to tag along. One example is the introduction of the absolute value
of a number. Many teachers and students are mystified by being saddled with such
a "frivolous" skill: "just kill the negative sign", as one teacher put it. Yet TSM
never tries to demystify this concept. (For an explanation of the need to introduce
absolute value, see, e.g., the discussion on pp. 130ff.) Another is the seemingly
inexplicable replacement of the square root and cube root symbols of a positive
number b, i.e., $\sqrt{b}$ and $\sqrt[3]{b}$, by rational exponents, $b^{1/2}$ and $b^{1/3}$, respectively (see,
e.g., Section 4.2 of [Wu2020b]). Because TSM teaches the laws of exponents as
merely "number facts", it is inevitable that it would fail to point out the purpose of
this change of notation, which is to shift focus from the operation of taking roots to
the properties of the exponential function $b^x$ for a fixed positive b. A final example
is the way TSM teaches estimation completely by rote, without ever telling students
why and when estimation is important and therefore worth learning. Indeed, we
often have to make estimates, either because precision is unattainable or unneces-
sary, or because we purposely use estimation as a tool to help achieve precision (see
[Wu2011a, Section 10.3]).
To summarize, if we want students to be taught mathematics that is learn-
able, then we must discard TSM and replace it with the kind of mathematics that
possesses these five qualities: clear definitions, reasoning, precision, coherence,
and purposefulness.
We have come across them before on page xiii: these are the Fundamental Principles
of Mathematics (also see Section 2.1 in [Wu2018a]).
TSM consistently violates all five fundamental principles. Because of the dom-
inance of TSM for at least the past half-century, most students come out of K–12
knowing only TSM but not mathematics that respects these fundamental principles.
To them, learning mathematics is not about learning how to reason or distinguish
true from false but about memorizing facts and tricks to get correct answers. Faced
with this crisis, what should be the responsibility of institutions of higher learn-
ing? Should it be to create courses for future teachers and educators to help them
systematically replace their knowledge of TSM with mathematics that is consistent
with the five fundamental principles? Or should it be, rather, to leave TSM alone
but make it more palatable by helping teachers infuse their classrooms with activ-
ities that suggest visions of reasoning, problem solving, and sense making? As of
this writing, an overwhelming majority of the institutions of higher learning are
choosing the latter alternative.
At this point, we return to the earlier question about some of the ways both
university mathematicians and educators might misunderstand and misuse these
three volumes.
First, consider the case of mathematicians. They are likely to scoff at what
they perceive to be the triviality of the content in these volumes: no groups, no
homomorphisms, no compact sets, no holomorphic functions, and no Gaussian cur-
vature. They may therefore be tempted to elevate the level of the presentation, for
example, by introducing the concept of a field and showing that, when two fraction
symbols $m/n$ and $k/\ell$ (with whole numbers $m$, $n$, $k$, $\ell$, and $n \neq 0$, $\ell \neq 0$) satisfying
$m\ell = nk$ are identified, and when + and × are defined by the usual formulas, the
fraction symbols form a field. In this elegant manner, they can efficiently cover all
the standard facts in the arithmetic of fractions in the school curriculum.8 This
is certainly a better way than defining fractions as points on the number line to
teach teachers and educators about fractions, is it not? Likewise, mathematicians
may find finite geometry to be a more exciting introduction to axiomatic systems
than any proposed improvements on the high school geometry course in TSM. The
list goes on. Consequently, pre-service teachers and educators may end up learn-
ing from mathematicians some interesting mathematics, but not mathematics that
would help them overcome the handicap of knowing only TSM.
Mathematicians may also engage in another popular approach to the profes-
sional development of teachers and educators: teaching the solution of hard prob-
lems. Because mathematicians tend to take their own mastery of fundamental skills
and concepts for granted, many do not realize that it is nearly impossible for teach-
ers who have been immersed in thirteen years or more of TSM to acquire, on their
own, a mastery of a mathematically correct version of the basic skills and concepts.
Mathematicians are therefore likely to consider their major goal in the professional
development of teachers and educators to be teaching them how to solve hard prob-
lems. Surely, so the belief goes, if teachers can handle the "hard stuff", they will
be able to handle the "easy stuff" in K–12. Since this belief is entirely in line
with one of the current slogans in school mathematics education about the critical
importance of problem solving, many teachers may be all too eager to teach their
students the extracurricular skills of solving challenging problems in addition to
teaching them TSM day in and day out. In any case, the relatively unglamorous
content of these three volumes (this volume, [Wu2020b], and [Wu2020c])—designed
to replace TSM—will get shunted aside into supplementary reading assignments.
At the risk of belaboring the point, the focus of these three volumes is on
showing how to replace teachers’ and educators’ knowledge of TSM in grades 9–12
with mathematics that respects the fundamental principles of mathematics. There-
fore, reformulating the mathematics of grades 9–12 from an advanced mathemati-
cal standpoint to obtain a more elegant presentation is not the point. Introducing
novel elementary topics (such as Pick’s theorem or the 4-point affine plane) into
the mathematics education of teachers and educators is also not the point. Rather,
the point in year 2020 is to do the essential spadework of revisiting the standard
9–12 curriculum—topic by topic, along the lines laid out in these three volumes—
showing teachers and educators how the TSM in each case can be supplanted by
mathematics that makes sense to them and to their students. For example, since
most pre-service teachers and educators have not been exposed to the use of precise
definitions in mathematics, they are unlikely to know that definitions are supposed
to be used, exactly as written, no more and no less, in logical arguments. One of
the most formidable tasks confronting mathematicians is, in fact, how to change
educators’ and teachers’ perception of the role of definitions in reasoning.
As an illustration, consider how TSM handles slope. There are two ways, but we
will mention only one of them.9 TSM pretends that, by defining the slope of a
line L using the difference quotient with respect to two pre-chosen points P and
Q on L,10 such a difference quotient is a property of the line itself (rather than
a property of the two points P and Q). In addition, TSM pretends that it can
use "reasoning" based on this defective definition to derive the equation of a line
when (for example) its slope and a given point on it are prescribed. Here is the
inherent danger of thirteen years of continuous exposure to this kind of pseudo-
reasoning: teachers cease to recognize that (a) such a definition of slope is defective
and (b) such a defective definition of slope cannot possibly support the purported
derivation (= proof) of the equation of a line. It therefore comes to pass that—
as a result of the flaws in our education system—many teachers and educators
end up being confused about even the meaning of the simplest kind of reasoning:
"A implies B". They need—and deserve—all the help we can give so that they
can finally experience genuine mathematics, i.e., mathematics that is based on the
fundamental principles of mathematics.
Of course, the ultimate goal is for teachers to use this new knowledge to teach
their own students so that those students can achieve a true understanding of what
"A implies B" means and what real reasoning is all about. With this in mind, we
introduce in Section 6.4 (pp. 337ff.) the concept of slope by discussing what slope is
supposed to measure (an example of purposefulness) and how to measure it, which
then leads to the formulation of a precise definition. With the availability of the
AA-criterion for triangle similarity (Theorem G22 on page 288), we then show how
this definition leads to the formula for the slope of a line as the difference quotient
of the coordinates of any two points on the line (the "rise-over-run"). Having
this critical flexibility to compute the slope—plus an earlier elucidation of what an
equation is (pp. 322–324)—we easily obtain the equation of a line passing through
a given point with a given slope, with correct reasoning this time around (see pp.
357ff.). Of course the same kind of reasoning can be applied to similar problems
when other reasonable geometric data are prescribed for the line.
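The "critical flexibility" mentioned above, namely that the difference quotient is the same for any two points on the line, can be spot-checked numerically. The sketch below is an illustration under our own assumptions: the line $y = \tfrac{2}{3}x + 1$, the sample points, and the names `difference_quotient` and `point_on_line` are all hypothetical. It is not a substitute for the proof, which rests on the AA-criterion for triangle similarity, and it presupposes an equation for the line, which the text derives only after slope has been defined.

```python
from fractions import Fraction
from itertools import combinations


def difference_quotient(p, q):
    """Rise over run between two distinct points of a non-vertical line."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


def point_on_line(x):
    """A point on the hypothetical line y = (2/3)x + 1, in exact arithmetic."""
    x = Fraction(x)
    return (x, Fraction(2, 3) * x + 1)


# Five arbitrarily chosen points on the line.
points = [point_on_line(x) for x in (-6, -1, 0, 4, 9)]

# Collect the difference quotients of all ten pairs of points.
slopes = {difference_quotient(p, q) for p, q in combinations(points, 2)}
```

Every pair yields the same value, so `slopes` contains the single element `Fraction(2, 3)`. Exact rational arithmetic is used so that the comparison is not blurred by floating-point rounding.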
By guiding teachers and educators systematically through the correction of
TSM errors on a case-by-case basis, we believe they will gain a new and deeper
understanding of school mathematics. Ultimately, we hope that if institutions of
higher learning and the education establishment can persevere in committing them-
selves to this painstaking work, the students of these teachers and educators will
be spared the ravages of TSM. If there is an easier way to undo thirteen years and
more of mis-education in mathematics, we are not aware of it.
A main emphasis in using these three volumes should therefore be on providing
patient guidance to teachers and educators to help them overcome the many handicaps
inflicted on them by TSM. In this light, we can say with confidence that, for
now, the best way for mathematicians to help educate teachers and educators is to
firm up their mathematical foundations. Let us repair the damage TSM has done
to their mathematics content knowledge by helping them to acquire a knowledge
of school mathematics that is consistent with the fundamental principles of mathematics.
9 A second way is to define a line to be the graph of a linear equation y = mx + b and then
define the slope of this line to be m. This is the definition of a line in advanced mathematics, but
it is so profoundly inappropriate for use in K–12 that we will just ignore it.
10 This is the "rise-over-run".
Next, we address the issue of how educators may misuse these three volumes.
Educators may very well frown on the volumes’ insistence on precise definitions
and precise reasoning and their unremitting emphasis on proofs while, apparently,
neglecting problem solving, conceptual understanding, and sense making. To them,
good professional development concentrates on all of these issues plus contextual
learning, student thinking, and communication with students. Because these three
volumes never explicitly mention problem solving, conceptual understanding, or
sense making per se (or, for that matter, contextual learning or student thinking),
their content may be dismissed by educators as merely skills-oriented or technical
knowledge for its own sake and, as such, get relegated to reading assignments outside
of class. They may believe that precious class time can be put to better use by
calling on students to share their solutions to difficult problems or by holding small
group discussions about problem-solving strategies.
We believe this attitude is also misguided because the critical missing piece in
the contemporary mathematical education of teachers and educators is an exposure
to a systematic exposition of the standard topics of the school curriculum that
respects the fundamental principles of mathematics. Teachers’ lack of access to
such a mathematical exposition is what lies at the heart of much of the current
education crisis. Let us explain.
Consider problem solving. At the moment, the goal of getting all students
to be proficient in solving problems is being pursued with missionary zeal, but
what seems to be missing in this single-minded pursuit is the recognition that the
body of knowledge we call mathematics consists of nothing more than a sequence
of problems posed, and then solved, by making logical deductions on the basis of
precise definitions, clearly stated hypotheses, and known results.11 This is after
all the whole point of the classic two-volume work [Pólya-Szegö], which introduces
students to mathematical research through the solutions to a long list of problems.
For example, the Pythagorean theorem and its many proofs are nothing more than
solutions to the problem posed by people from diverse cultures long ago: "Is there
any relationship among the three sides of a right triangle?" There is no essential
difference between problem solving and theorem proving in mathematics. Each time
we solve a problem, we in effect prove a theorem (trivial as that theorem may
sometimes be).
The main point of this observation is that if we want students to be profi-
cient in problem solving, then we must give them plenty of examples of grade-
appropriate proofs all through (at least) grades 4–12 and engage them regularly
11 It is in this light that the previous remark about the purposefulness of mathematics can
be better understood: before solving a problem, one should know why the problem was posed in
the first place. Note that, for beginners (i.e., school students), the overwhelming emphasis has to
be on solving problems rather than the more elusive issue of posing problems.
TO THE INSTRUCTOR xxxvii
12 And, of course, to also get school textbooks that are unsullied by TSM. However, it seems
likely as of 2020 that major publishers will hold onto TSM until there are sufficiently large numbers
of knowledgeable teachers who demand better textbooks. See the end of [Wu2015].
13 These three volumes, together with [Wu2011a], [Wu2016a], and [Wu2016b].
14 As well as from the other three volumes, [Wu2011a], [Wu2016a], and [Wu2016b].
In one sense, these three volumes are just textbooks, and you may feel you have
gone through too many textbooks in your life to need any fresh advice. Nevertheless,
we are going to suggest that you approach these volumes with a different mindset
than what you may have used with other textbooks, because you will soon be using
the knowledge you gain from these volumes to teach your students. Reading other
textbooks, you would likely congratulate yourself if you could achieve mastery over
90% of the material. That would normally guarantee an A. More is at stake with
these volumes, however, because they directly address what you will need to know
in order to write your lessons. Ask yourself whether a mathematics teacher whose
lessons are correct only 90% of the time should be considered a good teacher. To
be blunt, such a teacher would be a near disaster. So your mission in reading these
volumes should be to achieve nothing short of total mastery. You are expected to
know this material 100%. To the extent that the content of these three volumes
is just K–12 mathematics, this is an achievable goal. This is the standard you have
to set for yourself. Having said that, we also note explicitly that many Mathematical
Asides are sprinkled all through the text, sometimes in the form of footnotes. These
are comments—usually from an advanced mathematical perspective—that try to
shed light on the mathematics under discussion. The above reference to "total
mastery" does not include these comments.
You should approach these volumes differently in yet another respect. Students’
typical attitude towards a math course is that if they can do all the homework
problems, then most of their work is done. Think back on your calculus courses
or any of the math courses when you were in school, and you will understand how
true this is. But since these volumes are designed specifically for teachers, your
emphasis cannot be limited to merely doing the homework assignments because
your job will be more than just helping students to do homework problems. When
you stand in front of a class, what you will be talking about, most of the time, will
not be the exercises at the end of each section but the concepts and skills in the
exposition proper.1 For example, very likely you will soon have to convince a class
on geometry why the Pythagorean theorem is correct. There are two proofs of this
theorem in these volumes, one in Chapter 5 of this volume and the other in Chapter
4 of [Wu2020c]. Yet on neither occasion is it possible to assign a problem that asks
for a proof of this theorem. There are problems that can assess whether you know
1 I will be realistic and acknowledge that there are teachers who use class time only to drill
students on how to get the right answers to exercises, often without reasoning. But one of the
missions of these three volumes is to steer you away from that kind of teaching. See To the
Instructor on pp. xxvii ff.
xlii TO THE PRE-SERVICE TEACHER
enough about the Pythagorean theorem to apply it, but how do you assess whether
you know how to prove the theorem when the proofs have already been given in
the text? It is therefore entirely up to you to achieve mastery of everything in the
text itself. One way to check is to pick a theorem at random and ask yourself:
Can I prove it without looking at the book? Can I explain its significance? Can I
convince someone else why it is worth knowing? Can I give an intuitive summary
of the proof? These are questions that you will have to answer as a teacher. To
the extent possible, these volumes try to provide information that will help you
answer questions of this kind. I may add that the most taxing part of writing these
volumes was in fact to do it in a way that would allow you, as much as possible,
to adapt them for use in a school classroom with minimal changes. (Compare, for
example, To the Instructor on pp. xxvii ff.)
There is another special feature of these volumes that I would like to bring to
your attention: these volumes are essentially school textbooks written for teachers,
and as such, you should read them with the eyes of a school student. When you read
Chapter 1 of this volume on fractions, for instance, picture yourself in a sixth-grade
classroom and therefore, no matter how much abstract algebra you may know or
how well you can explain the construction of the quotient field of an integral domain,
you have to be able to give explanations in the language of sixth-grade mathematics
(i.e., to sixth graders). Similarly, when you come to Chapter 6, you are developing
algebra from the beginning, so even the use of symbols will be an issue (it is in
fact the key issue; see Section 6.1 on pp. 298ff.). Therefore, be very deliberate and
explicit when you introduce a symbol, at least for a while.
The major conclusions in these volumes, as in all mathematics books, are sum-
marized into theorems. Depending on the author’s (and other mathematicians’)
whims, theorems are sometimes called propositions, lemmas, or corollaries as a
way of indicating which theorems are deemed more important than others. Roughly
speaking, a proposition is not regarded to be as important as a theorem, a lemma is
conceptually less important than a proposition, and a corollary is supposed to follow
immediately from the theorem or proposition to which it is attached. (Incidentally,
a formula or an algorithm is just a theorem.) This idiosyncratic classification of the-
orems started with Euclid around 300 BC, and it is too late to do anything about it
now. The main concepts of mathematics are codified into definitions. Definitions
are set in boldface in these volumes when they appear for the first time; a few
truly basic ones are even individually displayed in a separate paragraph, but most
of the definitions are embedded in the text itself, so you should watch out for them.
The statements of the theorems, and especially their proofs, depend on the
definitions, and proofs are the guts of mathematics.
Please note that when I said above that I expect you to know everything in
these volumes, I was using the word "know" in the way mathematicians normally
use the word. They do not use it to mean simply "know the statement by heart".
Rather, to know a theorem, for instance, means to know the statement by heart, know
its proof, know why it is worth knowing, know what its potential implications are,
and finally, know how to apply it in new situations. If you know anything short
of this, how can you expect to be able to answer your students’ questions? At the
very least, you should know by heart all the theorems and definitions as well as the
main ideas of each proof because, if you do not, it will be futile to talk about the
Because every assertion in these three volumes (this volume, together with
[Wu2020b] and [Wu2020c]) will be proved, students should be comfortable with
mathematical reasoning. It is hoped that as they progress through the volumes, all
students will become increasingly at ease with proofs. In terms of the undergraduate
curriculum, readers of this volume—as a rule of thumb—should have already taken
the usual two years of college calculus or their equivalents.
1 Unfortunately, a correct exposition of this topic is difficult to come by. Try Chapter 7 of
[Wu2011a].
Some Conventions
• Each chapter is divided into sections. Titles of the sections are given at
the beginning of each section as well as in the table of contents. Each
section (with few exceptions) is divided into subsections; a list of the
subsections in each section—together with a summary of the section in
italics—is given at the beginning of each section.
• When a new concept is first defined, it appears in boldface but is not
often accorded a separate paragraph of its own. For example:
A subset R in a plane is called convex if given any two points
A, B in R, . . . (p. 172).
You will have to look for many definitions in the text proper. (However,
not all boldfaced words or phrases signify new concepts to be defined,
because boldface fonts are sometimes used for emphasis.)
• When a new notation is first introduced, it also appears in boldface. For
example:
The congruence notation △ABC ≅ △A′B′C′ will be understood
to mean . . . (p. 245).
• Equations are labeled with numbers inside parentheses, and the first digit
of the label indicates the chapter in which the equation can be found.
For example, the "(1.17)" in the sentence "Thus (1.17) implies that . . . "
means the 17th labeled equation in Chapter 1.
• Exercises are located at the end of each section.
• Bibliographic citations are labeled with the name of the author(s) inside
square brackets, e.g., [Ginsburg]. The bibliography begins on page 387.
• In the index, if a term is defined on a certain page, that page will be in
italics. For example, the item
division-with-remainder, 15, 137, 139
means that the term "division-with-remainder" appears in a significant
way on all three pages, but the definition of the term is on page 139.
CHAPTER 1
Fractions
consisting of fractions and negative fractions. Unfortunately, this term has been incorrectly used
in the education literature to mean fractions only, but not negative fractions.
2 See page xiv for a definition of TSM.
3 The importance of fractions to the learning of algebra is beginning to be recognized. See,
for example, Recommendation 4 on page xvii of the report of the National Mathematics Advisory
Panel ([NMAP1]). See also the article [Wu2018b].
but TSM does not explain its validity when a, b, c, d are fractions, much less when they are
rational numbers. However, one can find such an explanation on page 118 of Section 2.5.
provide a framework for you to communicate with high school students who need
help. You can be certain that, as of 2020, you will often be called upon to explain
fractions and rational numbers to your students. We hope that you will take the
time and trouble to become thoroughly familiar with these two chapters so that
you will be able to make sense of fractions and rational numbers when explaining
them to your students. This will be an important first step toward improving school
mathematics education.
Reasoning in mathematics requires precise definitions for each and every con-
cept used.7 We need a definition of a fraction, not only because this is what
mathematics demands, but also because students need a precise mental image for
fractions to update the mental image of their fingers for whole numbers. Because
there is no natural image for fractions such as $\frac{7}{11}$ or $\frac{3}{13}$, it is incumbent on us to
create one for students. They cannot go through life knowing a fraction only as a
piece of pizza, as TSM basically forces them to do. They will have to use fractions
to compute percents for sales tax and volumes of solids for gardening chores even
when no pizza is in sight.
Beyond pizzas, the most common definition TSM has to offer for fractions is
"parts of a whole".8 For students, the difficulty with the conception of a fraction
as "parts of a whole" is multifaceted:
(1) The concept of a "whole" is elusive. TSM never defines
what a "whole" is. It is many things, and thus a moving target.
A concept this nebulous cannot serve as a solid foundation for
learning fractions.
7 Mathematical Aside: Except undefined concepts that are part of the foundational axioms.
8 We should point out the linguistic coincidence that the word "whole" appears in both
phrases, "whole numbers" and "parts of a whole". Be careful not to confuse the two. Fortunately,
we will basically have no occasion to refer to "parts of a whole" in this volume beyond this informal
discussion.
1.1. DEFINITION OF A FRACTION 5
We will use the number line to formulate a definition of fractions (the exposition
of this chapter goes back to [Wu1998] and [Wu2002]; compare 3.NF on page 24 of
[CCSSM]). The fractions will be a particular collection of points on the number line.
The definition will be unambiguous, and the geometric nature of this collection of
points will make it a very accessible mental image of fractions for students.
The number line is the name that school mathematics gives to what is called
in mathematics the real line, i.e., the x-axis. Take a line which is (usually chosen
to be) horizontal and pick a point to designate as 0. The line being horizontal, one
can distinguish between the left direction and the right direction on the line. We
choose another point on the line to the right of 0 and designate it as 1.9 Once a
choice of the two points 0 and 1 has been fixed on this line, the line is called the
number line.
[Figure: a horizontal line with the point 0 marked and the point 1 marked to its right.]
If a and b are two points on the number line so that a is to the left of b, then the
segment from a to b consists of all the points between a and b, together with the
points a and b themselves; the notation for this segment is [a, b]. (Mathematical
Aside: In calculus, a segment is called a closed bounded interval.)
[Figure: the segment [a, b] on the number line, with a to the left of b.]
9 By convention, 1 is to the right of 0. It could have been to the left of 0, of course, if the
The points a and b are called the endpoints of [a, b]; a is the left endpoint and
b the right endpoint. The segment [0, 1] will be called the unit segment, and
its right endpoint 1 will be called the unit on the number line.
We will have to make precise the common notion of "equal parts" on the number
line. To this end, we have to be able to decide if two segments have the same length.
Two segments [a, b] and [c, d] are said to be of the same length if, when one
segment is slid along the number line until the left endpoints a and c coincide,
the right endpoints b and d also coincide. For convenience, we will also express
the length of [a, b] as the distance between a and b.
Mathematical Aside: We intentionally use the suggestive term of "sliding" for
the comparison of two segments because we are setting the stage for teaching frac-
tions in the upper elementary grades. The mathematical terminology for "slide" is
"translate". Therefore we have implicitly introduced the concept of a translation
into the study of fractions (see page 234 for the definition of translation). Formally,
we are translating [a, b] along the number line from a to c (or, along the vector $\overrightarrow{ac}$),
and we say [a, b] and [c, d] have the same length if this translation also maps b to d.
This approach to fractions therefore assumes a knowledge of Euclidean geometry.
There is no logical difficulty here as Euclidean geometry can be developed without
reference to numbers if we so wish (see [Hilbert]). In terms of pedagogy, this expo-
sition is set at the right level too because the amount of geometric knowledge that
is implicitly assumed of school students is nothing more than what any ten-year-old
would naturally take for granted.
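The sliding test for "same length" can be sketched in a few lines of code. This is only an illustration under an assumption the text has not yet made: points on the number line are represented here by exact rational coordinates (Python's `fractions.Fraction`), and the function name `same_length` is hypothetical.

```python
from fractions import Fraction

def same_length(seg1, seg2):
    """Slide seg1 so that its left endpoint lands on seg2's left endpoint,
    then check whether the right endpoints also coincide."""
    a, b = seg1
    c, d = seg2
    shift = c - a            # the slide (translation) that carries a to c
    return b + shift == d    # does the same slide carry b to d?

# [0, 1] and [2, 3] have the same length; [0, 1] and [2, 4] do not.
unit = (Fraction(0), Fraction(1))
print(same_length(unit, (Fraction(2), Fraction(3))))
print(same_length(unit, (Fraction(2), Fraction(4))))
```

The point of the sketch is that the comparison uses only a slide and a check for coincidence of endpoints; no measurement of either segment is needed.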
Back to the number line on which the points 0 and 1 have been chosen. We
choose another point to the right of 1 so that the distance between that point and 1
is the same as the distance between 0 and 1. We designate that point as 2. Then we
choose another point to the right of 2 so that the distance between that point and
2 is the same as the distance between 0 and 1. We designate that point as 3, and so
on. In this way, we get an infinite sequence of equidistant points to the right of
0 (i.e., points on the line so that the distance between any two consecutive points
is the same) to which we have attached the whole numbers, N, {0, 1, 2, 3, . . .}.
Think of the number line as an infinite ruler, as shown:
[Figure: the number line with the equidistant points 0, 1, 2, 3, 4 marked.]
We have seen that the choice of a 0 and a 1 on a horizontal line naturally leads
to a sequence of equidistant points to the right of 0. For this reason, we will also
refer to a horizontal line with an infinite sequence of equidistant points identified
with N on its right side as the number line. By definition, a number, or more
precisely, a real number is just a point on the number line.10
Mathematical Aside: It is worth pointing out that implicit in
the above definition of the sequence of whole numbers, N, on
the number line is the assumption that there is a way to tell the
"distance" between any two points on the number line. Thus 2
is the point to the right of 1 so that the distance between 0 and
1 is the same as the distance between 1 and 2, 3 is the point to
10 Mathematical Aside: We are in effect introducing coordinates in the line, in the same
way that we will later introduce coordinates in the plane in Section 6.3. By defining a point on
the number line as a number, we are adopting the usual practice of identifying a point with its
coordinate(s) once the coordinate system has been fixed.
We pause to note that we have just defined explicitly what a number is, namely,
a point on the number line. This may not seem like much until you recall that,
in TSM, the word number is bandied about repeatedly and yet nobody can say
precisely what a number is.
A segment [a, b] is said to have length c, where c is a point on the number
line, if the segments [a, b] and [0, c] have the same length. Thus by definition, the
segment [0, c] has length c. In particular, the unit segment [0, 1] has length 1 and,
for this reason, we sometimes refer to 1 as the unit length or unit distance.
Let it be observed that, insofar as c is just a symbol representing a point on the
number line, to say [a, b] has length c means very little at the moment. However
if c happens to be a whole number, say 7, then to say [a, b] has length 7 gives us a
good idea of how long [a, b] is (because we know what 7 times longer than [0, 1] is).
The next order of business is to expand our pool of "standard numbers" by naming
more numbers on the number line, namely, the fractions.
At this juncture, we can make contact with the common notion of a "whole"
again. The advantage of using the number line to define fractions is that we are in
effect fixing the "whole" once and for all: it will always be the length of the unit
segment [0, 1]. No ambiguity, no guessing. At the same time, the number line has
the flexibility to accommodate any kind of a "whole": if we want to talk about
dividing a pizza, let the unit 1 on the number line be the area of a circular pizza
(always assuming the pizza has uniform thickness so that only the area of the pizza
matters in the division), but if we want to consider the distance of a car from a
starting point, let the unit 1 be one mile. In the former case, the number 3 then
stands for the total area of 3 pizzas, and in the latter case the number 3 will stand
for 3 miles.
We emphasize: contrary to what TSM says, the "whole" is not the unit segment
[0, 1], but the length of the unit segment [0, 1]. The precision here is everything.
TSM has misled students into believing that the "whole" in "parts of a whole" is an
object, e.g., a segment, a square, a pizza, a glass of water, etc., whereas the correct
statement is that the whole is, respectively, the length of a segment, the area of a
square, the area of a pizza, the volume of a glass of water, etc. The sloppiness of the
language in TSM is a main reason for much of students’ nonlearning, as we shall see.
An informal discussion
With the length of [0, 1] as the "whole", let us see informally where the fractions
with denominator equal to 3 (i.e., $\frac{1}{3}$, $\frac{2}{3}$, $\frac{3}{3}$, $\frac{4}{3}$, etc.) should be placed on the
number line. We will assume that we can divide a given segment into any number
of segments of equal length (this is something intuitive and believable and, in any
case, we will show how to do it in the Pedagogical Comments on page 16). The
fraction $\frac{1}{3}$ is one-third of the whole (= the length of [0, 1]) and, therefore, $\frac{1}{3}$ is the
length of one part when we divide the length of [0, 1] into 3 equal parts, i.e., 3 segments
of equal length. If we also divide each of [0, 1], [1, 2], [2, 3], . . . into 3 segments
of equal length, then these division points, together with the whole numbers, form
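Any of the following thickened short segments is "one part when the whole is divided
into 3 equal parts" and is therefore a legitimate representation of $\frac{1}{3}$:
[Figure: the number line from 0 to 3, divided into thirds, with thickened short segments of length $\frac{1}{3}$; the point $\frac{1}{3}$ is labeled.]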
With respect to the standard representation, we observe that the length of this
segment determines its right endpoint, and the right endpoint determines the length
of this segment. Therefore, we may as well identify the standard representation of
$\frac{1}{3}$ with its right endpoint. Then the fraction $\frac{1}{3}$ becomes identified with a point on
the number line.
In like manner, the fraction $\frac{5}{3}$, being the length of 5 of these short segments,
has the following standard representation (see the thickened segment below).
For a similar reason, we identify the standard representation of $\frac{5}{3}$ with its right
endpoint and proceed to denote the latter by $\frac{5}{3}$, as shown:
[Figure: the number line from 0 to 3 with the segment from 0 to $\frac{5}{3}$ thickened and its right endpoint labeled $\frac{5}{3}$.]
In general then, a fraction $\frac{m}{3}$ for a nonzero whole number m has the standard
representation consisting of m adjoining short segments abutting 0, and
we identify this standard representation of $\frac{m}{3}$ with its right endpoint. When m = 0,
we agree to identify $\frac{0}{3}$ with the point 0. Now each fraction with denominator 3 is
identified with one and only one of the points in the sequence of thirds, as shown:
[Figure: the number line with the whole numbers 0, 1, 2, 3 and the points $\frac{0}{3}, \frac{1}{3}, \frac{2}{3}, \ldots, \frac{11}{3}$ marked.]
[Figure: a number line with the points $\frac{0}{5}, \frac{1}{5}, \frac{2}{5}, \ldots, \frac{11}{5}$ marked.]
We will now begin the formal presentation of fractions. We say a segment [a, b]
is divided into n equal parts if [a, b] is expressed as the union of n adjoining,
nonoverlapping segments of the same length. (The union of a collection of sets
is the totality of all the points each of which belongs to at least one of the given
sets. For example, the union of the segments [0, 1] and [1, 3] is the segment [0, 3].)
We implicitly assume that it is possible to divide a segment into a given number of
equal parts. (Those who feel uneasy about the possibility of dividing a given segment
into n equal parts for any whole number n may wish to skip to the Pedagogical
Comments on page 16 to get some assurance on how to get it done.)
Divide each of the line segments [0, 1], [1, 2], [2, 3], [3, 4], . . . into 3 equal parts.
The totality of division points, including the whole numbers, forms a sequence of
equidistant points (see p. 6 for the definition), to be called the sequence of thirds.
We now attach symbols to points in the sequence of thirds, as follows. Starting
from left to right, the fraction $\frac{0}{3}$ is, by definition, the first point in the sequence,
which is 0. For a nonzero whole number m, $\frac{m}{3}$ is the m-th point in the sequence to
the right of 0. Thus $\frac{1}{3}$ is the first point to the right of 0, $\frac{2}{3}$ is the second point, $\frac{3}{3}$ is
the third point, etc. Note that $\frac{3}{3}$ coincides with 1, $\frac{6}{3}$ coincides with 2, $\frac{9}{3}$ coincides
with 3, and in general, $\frac{3m}{3}$ coincides with m for any whole number m. Here is the
picture:
[Figure: the number line with the whole numbers 0, 1, 2, 3 and the sequence of thirds $\frac{0}{3}, \frac{1}{3}, \ldots, \frac{10}{3}$ marked.]
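As a rough illustration of how the sequence of thirds is laid out, the following sketch lists its first few points and flags those that coincide with whole numbers. It assumes, for convenience only, that points are modeled by Python's `fractions.Fraction`; the function name `multiples_of_unit_fraction` is hypothetical and not part of the text.

```python
from fractions import Fraction

def multiples_of_unit_fraction(n, count):
    """Return the first `count` points of the sequence obtained by
    stepping to the right from 0 in steps of length 1/n."""
    step = Fraction(1, n)
    return [m * step for m in range(count)]

thirds = multiples_of_unit_fraction(3, 11)

# Label the m-th point as m/3 and note when it lands on a whole number.
for m, point in enumerate(thirds):
    note = f" (coincides with {point})" if point.denominator == 1 else ""
    print(f"{m}/3{note}")
```

Note that `Fraction` silently rewrites each point in lowest terms, which is why the labels are built from the index m rather than from the stored value.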
The fraction $\frac{m}{3}$ is called the m-th multiple of $\frac{1}{3}$. Note that the way we have
just introduced the multiples of $\frac{1}{3}$ on the number line is exactly the same way that
the multiples of 1 (i.e., the whole numbers) were introduced on the number line.
By doing to $\frac{1}{3}$ exactly what we did to the number 1 in putting the whole numbers
on the number line, we obtain all the whole-number multiples of $\frac{1}{3}$, i.e., all the
$\frac{m}{3}$, where m ∈ N. (The symbol "∈" means belonging to or belongs to.)
whole numbers) form an infinite sequence of equidistant points on the number line,
to be called the sequence of n-ths. Starting from left to right, 0 is denoted by
$\frac{0}{n}$. The first point in the sequence to the right of 0 is denoted by $\frac{1}{n}$, the second
point by $\frac{2}{n}$, the third by $\frac{3}{n}$, etc., and the m-th point in the sequence to the right
of 0 is denoted by $\frac{m}{n}$, to be called the m-th multiple of $\frac{1}{n}$. The sequence of
n-ths therefore consists of all the whole-number multiples of $\frac{1}{n}$, i.e., those
points denoted by $\frac{m}{n}$ for some whole number m.
Definition. The collection of all the sequences of n-ths, as n runs through the
nonzero whole numbers 1, 2, 3, . . . , is called the fractions. For a nonzero whole
number m, the m-th point to the right of 0 in the sequence of n-ths is denoted by $\frac{m}{n}$.
The number m is called the numerator and n is called the denominator of the
symbol $\frac{m}{n}$. By the traditional abuse of language, it is common to say that m and n
are the numerator and denominator, respectively, of the fraction $\frac{m}{n}$.11 By
convention, 0 is denoted by $\frac{0}{n}$ for any n.
(A) It is self-evident that $\frac{n}{n} = 1$, $\frac{2n}{n} = 2$, $\frac{3n}{n} = 3$, $\frac{4n}{n} = 4$, and in general,
(1.2)    $\frac{kn}{n} = k$, for all whole numbers k, n, where n ≠ 0.
Note that we have followed the convention of denoting the product of the whole
numbers k and n by kn rather than k×n. Now, letting n = 1 and k = 1, respectively,
we get
$\frac{k}{1} = k$ and $\frac{n}{n} = 1$ for all whole numbers k, n (n ≠ 0).
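The identity can be checked by literally walking along the number line: start at 0 and take kn steps, each of length 1/n. The sketch below does this for a small range of values. It is a numerical sanity check, not a proof, and it assumes points are modeled with Python's `fractions.Fraction`; the function name `walk` is hypothetical.

```python
from fractions import Fraction

def walk(steps, n):
    """Start at 0 and take `steps` steps to the right, each of length 1/n.
    The endpoint is the point denoted steps/n in the sequence of n-ths."""
    position = Fraction(0)
    for _ in range(steps):
        position += Fraction(1, n)
    return position

# The kn-th point in the sequence of n-ths should be the whole number k.
all_agree = all(walk(k * n, n) == k
                for k in range(0, 6)
                for n in range(1, 6))
print(all_agree)
```

Every n consecutive steps of length 1/n span exactly one unit segment, which is the geometric content of the identity.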
11 The 100% correct statement is of course that "m is the numerator of the symbol which
denotes the fraction that is the m-th point of the sequence of n-ths, and n is the denominator of
this symbol." Needless to say, almost no one in real life—inside or outside math class—ever talks
like this.
12 Of course, $\frac{13}{4}$ miles would be called "3 and a quarter miles" in everyday conversation, for
a reason to be explained on page 36.
(B) For the study of fractions, the need for precision about what the unit is
cannot be overstated. On one level, it is impossible to say which point is what
fraction until the unit segment is fixed, i.e., the two points 0 and 1. Thus the
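following fixed point P on the given horizontal line is either $\frac{2}{4}$ or 2, depending on
which point is chosen to be the unit 1:
[Figure: two copies of the same line with the same points 0 and P; in the first, the unit 1 is placed to the right of P, and in the second, the unit 1 is placed between 0 and P.]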
On a second level, without a precise statement about what the unit stands for,
it would be impossible to say what "equal parts" means, and without that, the
ambiguity would likely lead to nonlearning (see Exercise 4 on page 18). For example,
if the unit 1 "stands for a cup of water" (as is commonly done in TSM), does $\frac{1}{3}$
mean a third of the volume of the liquid in the cup or a third of the liquid by height
(imagine that the cup is the usual curved shape and not a right circular cylinder)?
Or, if the unit 1 "stands for a ham", does $\frac{1}{3}$ mean a third of the meat or a third of
the ham by weight including the bone? We can also give a slightly different example
to expose the error resulting from such ambiguity: suppose the unit ("the whole")
is a pizza and we ask what fraction is represented by putting one of the four pieces
on the left below together with one of the eight pieces on the right below:
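[Figure: two circular pizzas of the same size, the left one cut into four equal pieces and the right one cut into eight equal pieces.]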
When the answer of $\frac{3}{8}$ is not forthcoming, a common conclusion is that students
"do not have the necessary conceptual understanding of a fraction". However, if
students are taught that the "whole" is a pizza, they may very well think of 1 as
the shape of the pizza, so each fraction becomes a shape and they naturally would
not know how to put two shapes together to get a fraction. Students would be more
likely to "get it" if, instead of saying that "the pizza represents 1", we tell them
that the area of the pizza (ignoring its depth) represents 1.13 Then (assuming they
know what area is) they would have a better chance of seeing that the area of a
piece on the left above is equal to twice the area of a piece on the right, so that the
answer to the question should be $\frac{3}{8}$.
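Once the unit is taken to be the area of one pizza, the pizza question becomes a short computation. The sketch below is only an illustration: it models areas with Python's `fractions.Fraction` and uses fraction addition, which the text does not define until Section 1.3. The variable names are hypothetical.

```python
from fractions import Fraction

pizza_area = Fraction(1)          # the unit: the area of one whole pizza
left_piece = pizza_area / 4       # one of the four equal-area pieces
right_piece = pizza_area / 8      # one of the eight equal-area pieces

# A piece on the left has twice the area of a piece on the right ...
assert left_piece == 2 * right_piece

# ... so the two pieces together have the area of three small pieces.
combined = left_piece + right_piece
print(combined)   # 3/8
```

Nothing in the computation refers to the shape of a piece, only to its area, which is exactly the distinction the paragraph above is drawing.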
(C) We have been talking about the number line, but in a literal sense this way
of speaking is incorrect. A different choice of the line or even a different choice of
the positions of the numbers 0 and 1 would lead to a different number line. What is
true, however, is that anything done on one number line can be done on any other
in exactly the same way,14 and therefore we may—and do—identify all of them.
Now it makes sense to speak of the number line.
13 The need for precision about the unit exposes the common fallacy that introducing students
to fractions via pizzas is good pedagogy. The area of a curvilinear figure like the disk is too
sophisticated for children. The length of a segment or the area of a rectangle is a better alternative.
14 Mathematical Aside: All number lines are similar to each other in the sense of the definition
on page 284 so that they can be identified via similarity. Algebraically, the real numbers form a
complete ordered field, and since all complete ordered fields are isomorphic, we can also identify
them via isomorphism.
(D) Although a fraction is formally a point on the number line, the informal
discussion above makes it clear that on an intuitive level, a fraction $\frac{m}{n}$ is just the
segment [0, $\frac{m}{n}$]. So in the back of our minds, the segment image should never go
away completely, and this fact is reflected in the language we now introduce. First,
we give a definition.
(E) In school mathematics, the meaning of the equal sign is a subject that
is much discussed (see, e.g., Chapter 2 of [Carpenter et al.]), mainly because the
meaning of equality is never made clear. In addition, you will find later on the
traditional use of the word equivalent for fractions when equal is meant. Such a
cavalier attitude towards "equality" only adds to the confusion. For this reason,
we make explicit the fact that, by definition, two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ are equal
(or, equivalent), in symbols, $\frac{k}{\ell} = \frac{m}{n}$, if they are the same point on the number
line. We have already seen in equation (1.1), for example, that $\frac{kn}{n} = \frac{k}{1} = k$ for any
n, k ∈ N. Incidentally, because there is no definition for fractions in TSM, TSM
has no precise meaning for the equality of two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$.
(F) The definition of a fraction as a point on the number line allows us to make
precise the concept of order among fractions, i.e., the concept that one fraction is
smaller than (or bigger than) another. First, consider the case of whole numbers.
The way we put the whole numbers on the number line, a whole number m is
smaller than (or less than) another whole number n (in symbols, m < n) if m
is to the left of n (thus [0, m] is shorter than [0, n]). We expand on this fact by
defining, for two points A and B on the number line, A is smaller than B if A
is to the left of B. In symbols, A < B. We call this an inequality. In particular,
if A and B are fractions, A < B means A is to the left of B.
[Figure: the number line with the point A to the left of the point B.]
Thus, $\frac{4}{3} < \frac{5}{3}$ because, in the sequence of thirds, the 4th member in the sequence is
to the left of the 5th member of the sequence (see page 9).
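For two fractions in the same sequence of n-ths, "to the left of" reduces to comparing positions in that sequence. The following sketch makes this explicit. It assumes points are modeled with Python's `fractions.Fraction`, and the helper names `position_in_nths` and `is_left_of` are hypothetical.

```python
from fractions import Fraction

def position_in_nths(point, n):
    """Return m if `point` is the m-th point to the right of 0
    in the sequence of n-ths; raise if it is not in that sequence."""
    m = point * n
    if m.denominator != 1:
        raise ValueError("point does not belong to this sequence of n-ths")
    return m.numerator

def is_left_of(a, b, n):
    """Decide A < B for two points in the same sequence of n-ths
    by comparing their positions in the sequence."""
    return position_in_nths(a, n) < position_in_nths(b, n)

print(is_left_of(Fraction(4, 3), Fraction(5, 3), 3))   # True
```

Both points must be placed in one sequence on one number line before they can be compared, which mirrors the remark that fractions can be compared only when they refer to the same unit.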
There are two things about the definition of A < B that are worth noting here.
One is that, according to this definition, one cannot compare two given fractions
until both fractions are placed on the same number line. More to the point, what
this says is that we can compare fractions only when they refer to the same unit.
The other is that in TSM, the concept of A < B between fractions is never defined
beyond inscrutable statements such as "A < B when B names a greater amount
than A". The reason for this omission is obvious: since there is no definition for the
concept of a fraction, a fraction is an unknown object and, therefore, it is impossible
to say how one unknown object A can be smaller than another unknown object B.
Sometimes the inequality B > A is used in place of A < B. Then we say B
is bigger (or greater) than A. This is the place to mention two related symbols.
A ≤ B means either A < B or A = B. Sometimes we refer to an inequality such
as A ≤ B as a weak inequality. For example, the weak inequality $\frac12 \le \frac12$ may
seem odd at first glance, but when it is realized that all it says is either $\frac12 < \frac12$ or
$\frac12 = \frac12$, we have to agree that $\frac12 \le \frac12$ is a correct statement since it is correct that
$\frac12 = \frac12$.
The need for 5 and 14 is obvious; the role of the "fraction bar" is to separate the
5 from the 14 so that, for example, one does not confuse $\frac{14}{5}$ with 145. It is with
the same need for separation in mind that when a fraction such as $\frac{14}{5}$ is sometimes
shown horizontally as 14/5, there is a slant bar between the 14 and the 5.
This piece of information about what makes up the fraction symbol should be
clearly conveyed to students in grades 5–7.
15 Regardless of what TSM has to say, this is the correct definition of a finite decimal, one
Activity. Can you locate the fraction $\frac{20}{15}$? How is it related to $\frac{4}{3}$?
The key idea will turn out to be the use of division-with-remainder for whole
numbers.16 First, look at the multiples of 17: 0, 17, 34, 51, 68, 85, . . . . Thus the
68-th multiple of $\frac{1}{17}$ is 4 (because 68 = 4 × 17), and the 85-th multiple of $\frac{1}{17}$ is 5
(because 85 = 5 × 17). Therefore, $\frac{84}{17}$ lies between $\frac{68}{17}$ (= 4) and $\frac{85}{17}$ (= 5) and is just
$\frac{1}{17}$ shy of 5; i.e., $\frac{84}{17}$ is the point on the number line which is $\frac{1}{17}$ to the left of 5. In
terms of division-with-remainder, since 84 = (4 × 17) + 16, we have
$$\frac{84}{17} = \frac{(4 \times 17) + 16}{17}.$$
So if each step we take is of length $\frac{1}{17}$, going another 16 steps to the right of 4 will
get us to $\frac{84}{17}$. If we go 17 steps instead, we will get to 5. Therefore $\frac{84}{17}$ should be
quite near 5, as shown:
[Figure: number line marked 0 through 8, with $\frac{84}{17}$ indicated just to the left of 5.]
In general, if $\frac{m}{n}$ is a fraction and division-with-remainder gives $m = qn + k$,
where $q$ and $k$ are whole numbers and $0 \le k < n$, then
$$\frac{m}{n} = \frac{qn + k}{n},$$
and the position of $\frac{m}{n}$ on the number line will be between $q$ (= $\frac{qn}{n}$) and $q + 1$
(= $\frac{(q+1)n}{n}$, which is $\frac{qn+n}{n}$).
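The location procedure just described can be tried out with a few lines of Python. This is only an illustrative sketch, not part of the original text; the function name `locate_between_whole_numbers` is our own, and it assumes m is a whole number and n a nonzero whole number.

```python
from fractions import Fraction

def locate_between_whole_numbers(m: int, n: int):
    """Return (q, k) with m = q*n + k and 0 <= k < n.

    The fraction m/n then lies in the segment from q to q + 1,
    exactly k steps of length 1/n to the right of q.
    """
    if m < 0 or n <= 0:
        raise ValueError("expected a whole number m and a nonzero whole number n")
    q, k = divmod(m, n)  # division-with-remainder
    return q, k

q, k = locate_between_whole_numbers(84, 17)
print(q, k)  # 4 16
# Check against exact rational arithmetic: q <= m/n < q + 1.
print(q <= Fraction(84, 17) < q + 1)  # True
```

Here 84/17 is 16 steps of length 1/17 to the right of 4, which is one step short of 5.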
algorithm" in abstract algebra texts. There is a good reason why the latter terminology is not
used in school mathematics: it would be too easily confused with the "long division algorithm".
a mixed number or the symbol $4\frac{16}{17}$ at this point. If you do, what would you tell
your students about the meaning of the symbol $4\frac{16}{17}$? Do not say, as TSM does,
that it is 4 and $\frac{16}{17}$, because how would your students interpret the word "and" in
this context?17 It actually means the addition of the fraction 4 and the fraction $\frac{16}{17}$,
as in $4 + \frac{16}{17}$. But fraction addition has not yet been defined, so you would confuse
students by abusing the word "and" here. TSM has a habit of using this word
"and" inappropriately. See page 35 for another flagrant example.
Of course the same trick works for equal division into n equal parts for any nonzero
whole number n.
A second answer is to confront the issue directly: given a segment, we will
show how to use plastic triangles and compass to divide it into any number of
equal parts. Suppose we have to divide a given segment AB into 3 equal parts.
Referring to the picture below, we draw an arbitrary ray L (see page 174 for the
precise definition) issuing from A and, using a compass, mark off three points C, D,
and E in succession on L so that AC, CD, and DE have equal length, the precise
length of AC being irrelevant. For example, if you make each of AC, CD, and DE
half an inch long, then you do not even need a compass. Join BE, and through C
and D draw lines parallel to BE 18 that intersect AB at C′ and D′, respectively.
The points C′ and D′ are then the desired division points on AB to achieve the
equidivision, i.e., AC′, C′D′, and D′B are of equal length, as shown:
17 To understand why this confuses students, if "4 and $\frac{16}{17}$" is all they know about $4\frac{16}{17}$, we
can look ahead and ask how they can compute with something like $4\frac{16}{17} \times 12\frac{3}{17}$?
18 The use of plastic triangles for this purpose is probably well known, but if not, see pp.
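[Figure: segment AB and a ray L issuing from A, with points C, D, E marked off on L at equal distances; the lines through C and D parallel to BE meet AB at the two division points.]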
Exercises 1.1. In doing these and subsequent exercises, please observe the
following basic rules:
(a) Show your work; your explanation is as important as
your answer.
(b) Be clear. Get used to the idea that, as a teacher, every-
thing you say has to be understood.
(1) Indicate the approximate position of each of the following on the number
line, and briefly explain why: (a) 1.24, (b) $\frac{186}{11}$, (c) $\frac{1257}{132}$, (d) $\frac{77}{355}$, (e) $\frac{132}{1257}$.
(2) Suppose the unit 1 on the number line is the area of the following region
enclosed by the thickened segments, where the given square is divided into
eight congruent rectangles (and therefore eight parts of equal area):19
Observe that a region whose area is $\frac18$ of the area of the given square is the
fraction $\frac15$ relative to this unit. In terms of this unit, what is the fraction
represented by the area of each of the following shaded regions of the same
square? Give a brief explanation of your answer. (In the picture in the
middle, the square is divided into eight congruent rectangles, and in the
picture on the right, two copies of the same square share a common side
and the square on the right is divided into four parts of equal area.)
19 We will give a precise definition of congruence in Chapter 4 and will formally discuss
area in Chapter 4 of [Wu2020c]. In this chapter, we only make use of both concepts in the
context of triangles and rectangles, and then only in the most superficial way. For the purpose
of understanding this chapter, you may therefore take both concepts in the intuitive sense. If
anything more than intuitive knowledge is needed, it will be supplied on the spot, e.g., on page
47 later on in this chapter.
(3) With the unit as in Exercise 2 above, write down the fraction representing
the area of the following shaded region (assume that the top and bottom
sides of the square are each divided into three segments of equal length):
(Hint: If you divide the given square horizontally into 8 congruent rectangles,
then you can figure out the area of the shaded region in terms of
the $\frac{1}{24}$’s of the area of the given square.)
(4) It was emphasized in the text (page 11) that the concept of "equal parts"
should be made precise in the teaching of fractions. This exercise gives
one example to illustrate this need. There are numerous others (see, e.g.,
the case of Two Green Triangles on page 86 of the case book [Barnett-
Goldstein-Jackson]).
A text on professional development claims that students’ conception
of "equal parts" is fragile and is prone to errors. As illustration, it says
that when a circle is presented as in the left picture below to students,
[Figure: two circles, each divided into three pieces (left picture and right picture).]
they have no trouble shading $\frac23$. However, it goes on to say that when
these same students are asked to construct their own picture of $\frac23$, we often
see them create pictures with unequal pieces as in the right picture above.
(a) Referring to the latter presentation of "thirds" by students, in what
sense are the three pieces "equal"? In what sense are they "unequal"? (b)
What would you do as a teacher to prevent students from acquiring such
a misconception about "equal parts"?
(5) Ellen ate $\frac15$ of a large pizza with a 10-inch diameter and Kate ate $\frac14$ of a
smaller pizza with an 8-inch diameter. (Assume that all pizzas have the
same thickness and that the fractions of a pizza are measured in terms
of area.) Ellen told Kate that since she had eaten more pizza than Kate,
$\frac15 > \frac14$. (i) Did Ellen eat more pizza than Kate? (ii) Is Ellen’s assertion
that $\frac15 > \frac14$ correct? Explain why or why not. (You are allowed to use the
usual area formula for a circle.)
(6) (Review remark (B) on page 11 on the importance of the unit before do-
ing this exercise. Also make sure that you do it by a careful use of the
definition of a fraction rather than by some intuition you possess that you
cannot explain to your students.)
1.2. EQUIVALENT FRACTIONS 19
(a) After driving 218 miles, we have gone only two-thirds of the dis-
tance we planned to drive for the day. How many miles did we plan to
drive for the day? Explain.
(b) After reading 236 pages of a book, I am exactly four-fifths of the
way through. How many pages are in the book? Explain.
(c) Alexandra was three-quarters of the way to school after having
walked 0.72 miles from home. How far is her home from school? Explain.
(7) Take a pair of opposite sides of a unit square and divide each side into
7 equal parts. Join the corresponding points of division to obtain 7 thin
rectangles (we will assume that these are rectangles). For the remaining
pair of opposite sides, divide each into 5 equal parts and also join the
corresponding points of division; these lines are perpendicular to the other
7 lines. The intersections of these 7 and 5 lines create 7×5 small rectangles
which are congruent to each other (we will assume that too). What is the
area of each such small rectangle, and why? (Compare page 48 below.)
(8) Three segments (thickened) are on the number line, as shown:
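[Figure: number line marked 3, 4, 5, 6, 7, showing three thickened segments and the points A, B, $\frac{137}{25}$, and C.]
It is known that the length of the left segment is $\frac{11}{16}$, that of the middle
segment is $\frac{8}{17}$, and that of the right segment is $\frac{23}{25}$. What are the fractions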
A, B, and C? (Caution: Remember that you have to explain your answers,
and that you know nothing about "mixed numbers" until we come to this
concept on page 36 below.)
(9) The following is found in a certain third-grade workbook:
Each of the following figures represents a fraction:
Does this problem make sense as it stands? If so, explain your answer
clearly. If not, how would you rephrase it so that it makes sense?
the mathematics education literature. Unlike TSM, we also explicitly define the
concept of "$\frac{k}{\ell}$ of $\frac{m}{n}$" and use it to prove the division interpretation of a fraction,
which is erroneously considered in TSM to be part of the definition of a fraction.
The fundamental theorem (p. 20)
The cross-multiplication algorithm (p. 21)
The concept of "$\frac{k}{\ell}$ of $\frac{m}{n}$" (p. 24)
The division interpretation of a fraction (p. 28)
Recall that two fractions are said to be equal (or equivalent) if they are the
same point on the number line. For example, we observed in equation (1.2) on
page 10 that $\frac{nk}{n} = \frac{k}{1}$, as both are equal to $k$. The following is a generalization of
this simple fact.
Then $\frac{m}{n} = \frac{k}{\ell}$.
Now suppose we further divide each of the segments between consecutive points in
the sequence of thirds into 5 equal parts. Then each of the segments [0, 1], [1, 2],
[2, 3], . . . is now divided into 5 × 3 = 15 equal parts and, in an obvious way, we
have obtained the sequence of fifteenths on the number line:
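[Figure: number line marked 0, 1, and $\frac43$, with each third subdivided into 5 equal parts.]
The point $\frac43$, being the 4-th point in the sequence of thirds, is now the 20-th point
in the sequence of fifteenths because 20 = 5 × 4. The latter is by definition the
fraction $\frac{20}{15}$, i.e., $\frac{5 \times 4}{5 \times 3}$. Thus $\frac43 = \frac{5 \times 4}{5 \times 3}$.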
The preceding reasoning is enough to prove the general case. Thus let $k = cm$
and $\ell = cn$ for whole numbers $c$, $k$, $\ell$, $m$, and $n$. We will prove that $\frac{m}{n} = \frac{k}{\ell}$. In
other words, we will prove equation (1.4) above.
The fraction $\frac{m}{n}$ is the m-th point in the sequence of n-ths. Now divide each
of the segments between consecutive points in the sequence of n-ths into c equal
parts, so that each of [0, 1], [1, 2], [2, 3], . . . is now divided into cn equal parts. Thus
the sequence of n-ths, together with the new division points, becomes the sequence
of cn-ths. Simple reasoning shows that the m-th point in the sequence of n-ths
must be the cm-th point in the sequence of cn-ths. This is another way of saying
$\frac{m}{n} = \frac{cm}{cn}$. The proof is complete.
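The counting argument in the proof can be checked numerically. The sketch below is our own illustration (the helper name `index_in_finer_sequence` is hypothetical); it uses Python's exact rational type to find the position of m/n in the sequence of cn-ths and confirms that the position is cm.

```python
from fractions import Fraction

def index_in_finer_sequence(m: int, n: int, c: int) -> int:
    """Position of the point m/n within the sequence of (c*n)-ths."""
    point = Fraction(m, n)
    step = Fraction(1, c * n)   # distance between consecutive (c*n)-ths
    index = point / step        # how many steps from 0 to the point
    if index.denominator != 1:
        raise ArithmeticError("m/n is not a point of the finer sequence")
    return index.numerator

# The 4th point in the sequence of thirds is the 20th point of the fifteenths.
print(index_in_finer_sequence(4, 3, 5))  # 20
```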
As mentioned earlier (page 12), it is a tradition in school mathematics to say
that two fractions (symbols) $\frac{k}{\ell}$ and $\frac{m}{n}$ are equivalent if they are equal, i.e., if $\frac{k}{\ell}$ and
$\frac{m}{n}$ are the same point (see page 12). In this terminology, Theorem 1.1 gives a
sufficient condition for two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ to be equivalent, and this accounts
for the name of the theorem.
There seems to be little awareness of the power of Theorem 1.1 on equivalent
fractions in TSM. Consequently, the role played by Theorem 1.1 in the TSM cur-
riculum is a minimal one: it is mostly used to reduce fractions. This is wrong,
because Theorem 1.1 is the fundamental fact about fractions. As a first demon-
stration of this claim, we now use Theorem 1.1 to bring closure to the discussion
on page 14 of the last section about the decimal 5.8900. Recall that we had, by
definition,
$$\frac{58900}{10^4} = 5.8900.$$
We will show that 5.8900 = 5.89 and, more generally, one can append zeros to
or delete zeros from the right end of a finite decimal—to the right of the decimal
digits—without changing the number. Indeed,
$$5.8900 = \frac{58900}{10^4} = \frac{589 \times 10^2}{10^2 \times 10^2} = \frac{589}{10^2} = 5.89,$$
where the third equality makes use of the cancellation law (1.4). The reasoning is
of course valid in general; e.g.,
$$12.7 = \frac{127}{10} = \frac{127 \times 10^4}{10 \times 10^4} = \frac{1270000}{10^5} = 12.70000.$$
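As a quick sanity check of the two computations above, the following Python sketch (ours, not the author's; the helper `finite_decimal` is a hypothetical name) models a finite decimal as a whole number over a power of 10 and compares the resulting fractions exactly.

```python
from fractions import Fraction

def finite_decimal(numerator: int, decimal_digits: int) -> Fraction:
    """The finite decimal defined as numerator / 10**decimal_digits."""
    return Fraction(numerator, 10 ** decimal_digits)

# 5.8900 and 5.89 are the same point on the number line.
print(finite_decimal(58900, 4) == finite_decimal(589, 2))    # True
# 12.7 and 12.70000 are the same point on the number line.
print(finite_decimal(127, 1) == finite_decimal(1270000, 5))  # True
```

The equality test succeeds because `Fraction` compares the underlying points, which is the content of the theorem on equivalent fractions.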
The rest of this chapter may be said to be nothing more than an extended demon-
stration of the importance of Theorem 1.1.
We have just seen that Theorem 1.1 gives a sufficient condition for two fractions
$\frac{m}{n}$ and $\frac{k}{\ell}$ to be equal: there is a nonzero whole number $c$ so that $k = cm$ and
$\ell = cn$. There is an obvious interest in such a sufficient condition because each
symbol represents a point on the number line and one would like to be able to
decide whether the two symbols represent the same point or not. On the other
hand, the condition in Theorem 1.1 is not a necessary condition, in the sense that
the equality $\frac{m}{n} = \frac{k}{\ell}$ does not imply that $k = cm$ and $\ell = cn$ for some whole number $c$.
In the context of Theorem 1.2, it becomes clear why we chose to state Theorem
1.1 in that clumsy way: it exhibits the close relationship between the theorem on
equivalent fractions and the cross-multiplication algorithm.
For later needs, we pause to note that there are several different but equally
valid ways to state Theorem 1.2. One way is to say that
$$\frac{k}{\ell} = \frac{m}{n} \quad \text{if and only if} \quad kn = \ell m.$$
Another says
$$\frac{k}{\ell} = \frac{m}{n} \quad \text{is equivalent to} \quad kn = \ell m.$$
A more symbolic way is
$$\frac{k}{\ell} = \frac{m}{n} \iff kn = \ell m.$$
No matter how the theorem is stated, all it says is that both of the following
statements are valid:
First, $\frac{k}{\ell} = \frac{m}{n}$ implies $kn = \ell m$.
Second, $kn = \ell m$ implies $\frac{k}{\ell} = \frac{m}{n}$.
As is well known, each is said to be the converse of the other.
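The cross-multiplication algorithm translates directly into a test that uses only whole-number multiplication. The following is a minimal sketch under our own naming (`equal_by_cross_multiplication` is not from the text), cross-checked against Python's exact rational type.

```python
from fractions import Fraction

def equal_by_cross_multiplication(k: int, l: int, m: int, n: int) -> bool:
    """Decide whether k/l = m/n by testing k*n == l*m."""
    if l == 0 or n == 0:
        raise ValueError("denominators must be nonzero")
    return k * n == l * m

print(equal_by_cross_multiplication(21, 14, 51, 34))  # True: 21*34 = 14*51 = 714
print(equal_by_cross_multiplication(2, 3, 5, 7))      # False: 14 != 15
# Agreement with the "same point on the number line" definition:
print(Fraction(21, 14) == Fraction(51, 34))           # True
```

Note that 21/14 and 51/34 are equal even though neither numerator is a whole-number multiple of the other, so the sufficient condition of Theorem 1.1 does not apply to this pair directly.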
Mathematical Aside: From the vantage point of abstract algebra, the impor-
tance of Theorem 1.2 (and hence of Theorem 1.1) is manifest because the cross-
multiplication algorithm is exactly the equivalence relation between ordered pairs
of integers when fractions are defined as equivalence classes of such ordered pairs:
(a, b) ∼ (a′, b′) if and only if ab′ = a′b.
From the proof of Theorem 1.2, we can extract a very useful statement about
pairs of fractions, which we call the Fundamental Fact of Fraction-Pairs
(FFFP):
Any two fractions may be symbolically represented as two frac-
tions with the same denominator.
Such a denominator is called a common denominator of the two fractions. In
other words, there will always be some whole number q, so that these two fractions
belong to the same sequence of q-ths. The reason is simple: we can simply take
q = the product of the denominators because, if the fractions are $\frac{m}{n}$ and $\frac{k}{\ell}$, then
by Theorem 1.1, we have
$$\frac{m}{n} = \frac{\ell m}{\ell n} \quad \text{and} \quad \frac{k}{\ell} = \frac{nk}{n\ell}. \tag{1.6}$$
The two fractions are now seen to be the $\ell m$-th and $nk$-th members in the sequence
of $\ell n$-ths. That said, we should also call attention to the fact that, in some special
cases, some fractions can be put on equal footing without having to multiply their
denominators. For example, we can use 8 as a common denominator for $\frac32$ and
$\frac98$ because $\frac32 = \frac{12}{8}$. However, knowing that the product of the denominators in
question always works creates a "comfort zone" in such considerations.
We can paraphrase FFFP this way: any two fractions can be put on an equal
footing, in the sense that they can always be put in the same sequence of q-ths
for some q so that they become directly comparable. In the notation of (1.6), if
$\ell m < nk$, then in the sequence of $\ell n$-ths, $\frac{m}{n}$ (being the $\ell m$-th member) is to the
left of $\frac{k}{\ell}$ (being the $nk$-th member) and is therefore the smaller of the two. An
analogy is to compare 155 inches with 4 meters: one cannot get a sense of which is
longer until both measurements are put on an equal footing in the sense that they
are expressed in terms of the same unit, e.g., an inch. Then since 1 inch is 2.54
cm, 155 inches is 155 × 2.54 = 393.7 cm = 3.937 meters, which is shorter than 4
meters.20 This is how we can tell that 4 meters is longer. In the same way, given $\frac23$
and $\frac57$, we may replace them with $\frac{14}{21}$ and $\frac{15}{21}$, respectively, and conclude that $\frac23 < \frac57$.
There will be numerous applications of FFFP in subsequent discussions.
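FFFP is easy to mechanize. The sketch below is our own illustration (the function name `on_common_denominator` is hypothetical): it rewrites two fractions over the product of their denominators, as in (1.6), so that comparing them reduces to comparing two whole-number numerators.

```python
def on_common_denominator(m: int, n: int, k: int, l: int):
    """Rewrite m/n and k/l as (l*m)/(l*n) and (n*k)/(n*l), as in (1.6).

    Each result is a (numerator, denominator) pair; both denominators equal l*n.
    """
    if n == 0 or l == 0:
        raise ValueError("denominators must be nonzero")
    return (l * m, l * n), (n * k, n * l)

first, second = on_common_denominator(2, 3, 5, 7)
print(first, second)          # (14, 21) (15, 21)
# Both fractions now sit in the sequence of 21sts, so compare numerators.
print(first[0] < second[0])   # True, i.e. 2/3 < 5/7
```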
The concept of $\frac{k}{\ell}$ of $\frac{m}{n}$
For a reason that will become clear in the following subsection, we have in-
tentionally used the word "partition" in place of the usual word "divide" in the
preceding definition.
20 We freely make use of the multiplication of finite decimals here because we are only giving
that extra time be spent on learning it because it is critical for understanding frac-
tion multiplication (and therefore also for understanding fraction division). End
of Pedagogical Comments.
$\frac87$ is 8 copies of $\frac17$. Now subdivide each of these 8 segments of length $\frac17$ into 5 equal
parts, as shown:
[Figure: number line marked 0, 1, and $\frac87$, with each seventh subdivided into 5 equal parts.]
The unit segment is now partitioned into 5 × 7 = 35 equal parts, so that the new
division points furnish the initial points in the sequence of 35-ths. The segment
$[0, \frac87]$ is now partitioned into 40 equal parts by this sequence of 35-ths. If we take
every 8th division point in this sequence of 35-ths, starting with 0, then we get a
partition of $[0, \frac87]$ into 5 equal parts. So the length of a part in the latter partition
is $\frac{8}{35}$. (Of course, what we have done is merely to reprove the theorem on equivalent
fractions in the special case of $\frac87 = \frac{5 \times 8}{5 \times 7}$.)
This way of exploiting equivalent fractions will be seen to clarify many aspects
of fractions, such as the interpretation of a fraction as division (next subsection) or
the concept of multiplication (Section 1.4). It also allows us to solve word problems
of the following type.
Example. Prema walked $\frac25$ of the distance from home to school, and there
was still $\frac49$ of a mile to go. How far is her home from school?
Solution. We can draw the distance from home to school on the number line,
with 0 being home, the unit 1 being a mile, and S being the distance of the school
from home.21 Then it is given that, when the segment from 0 to S is partitioned
into 5 equal parts, Prema was at the second division point after 0:
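[Figure: the segment from 0 to S partitioned into 5 equal parts, with Prema at the second division point after 0; the remaining distance from Prema to S is $\frac49$ mi.]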
For convenience, call any one of these five segments a short segment. Then the
total distance from home to school is 5 times the length of a short segment, and
we are given that the distance from where Prema stands to S comprises 3 short
segments. We are also given that the distance from where Prema stands to S is $\frac49$
of a mile. If we can find out how long a third of $\frac49$ of a mile is, then we will know
the length of a short segment and the problem will be solved. By the theorem
on equivalent fractions, we can easily change $\frac49$ to an equivalent fraction whose
numerator is divisible by 3; e.g.,
$$\frac49 = \frac{3 \times 4}{3 \times 9} = \frac{3 \times 4}{27} = \frac{4 + 4 + 4}{27},$$
and this exhibits $\frac49$ as 3 copies of $\frac{4}{27}$. Therefore a third of $\frac49$ is $\frac{4}{27}$. The total distance
from 0 to S is thus 5 copies of $\frac{4}{27}$, which is
$$\frac{4 + 4 + 4 + 4 + 4}{27} = \frac{20}{27}.$$
The distance from Prema’s home to school is therefore $\frac{20}{27}$ miles.
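The solution method can be sketched in Python so that it mirrors the reasoning rather than a memorized template. This is our own illustration, not the author's; the function name `total_from_remaining` and its parameters are hypothetical. It uses equivalent fractions to split the remaining distance into equal groups and then counts copies.

```python
from fractions import Fraction

def total_from_remaining(num: int, den: int, parts_left: int, parts_total: int) -> Fraction:
    """Total length when `parts_left` of `parts_total` equal parts measure num/den."""
    # Equivalent fractions: num/den = (parts_left*num)/(parts_left*den),
    # which is parts_left groups, each consisting of `num` copies of 1/fine_den.
    fine_den = parts_left * den
    one_part_num = num  # one short segment is `num` copies of 1/fine_den
    # The whole distance is parts_total short segments.
    return Fraction(parts_total * one_part_num, fine_den)

# 3 of the 5 short segments measure 4/9 of a mile.
print(total_from_remaining(4, 9, 3, 5))  # 20/27
```

The only fraction arithmetic involved is counting copies of 1/27; no multiplication or division of fractions is needed.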
We would like to point out that the preceding problem is one of the standard
problems on fractions which is usually given after the multiplication of fractions has
been introduced, and the solution method is given out as a rote algorithm ("flip
over $(1 - \frac25)$ to multiply by $\frac49$"). However, we now see that there is no need to use
multiplication of fractions for the solution, and, in addition, the reasoning behind
the present method of solution is so simple that there is no need to memorize any
solution template.
As a final remark on the concept of "$\frac{k}{\ell}$ of $\frac{m}{n}$", we prove the following theorem
that will be useful for our consideration of fraction multiplication in Section 1.4.
First, we obtain a general formula for $\frac{k}{\ell}$ of $\frac{m}{n}$, which will be part (i) of the theorem.
As motivation for part (ii) of the theorem, recall the fact proven above that $\frac{21}{14} = \frac{51}{34}$.
If $\frac{m}{n}$ is a fraction, then we expect the following to hold:
$$\frac{21}{14} \text{ of } \frac{m}{n} = \frac{51}{34} \text{ of } \frac{m}{n}.$$
But according to the definition of "of " on page 24, this equality asserts that if $[0, \frac{m}{n}]$
is divided into 14 equal parts, then the length of 21 concatenated parts would be
equal to the length of 51 concatenated parts when the same segment $[0, \frac{m}{n}]$ is divided
into 34 equal parts. This is not obvious. Moreover, if $\frac{m}{n} = \frac{M}{N}$, then we also expect
that
$$\frac{21}{14} \text{ of } \frac{m}{n} = \frac{51}{34} \text{ of } \frac{M}{N}.$$
21 You may notice that the unit 1 is not shown in the picture. This is because we do not
know ahead of time whether S > 1 or S < 1, so we cannot place 1 on this number line until the
problem has been solved.
This equality clearly needs a proof, and part (ii) of the following theorem will take
care of that. Finally, the notation in the theorem is worthy of a comment. We will
be dealing with four fractions which will be assumed to be equal in pairs. So we
use lower case $\frac{k}{\ell}$ and upper case $\frac{K}{L}$ to denote the same fraction in part (ii) to ease
the memory load somewhat. The same is true for $\frac{m}{n}$ and $\frac{M}{N}$.
(ii) If $\frac{k}{\ell} = \frac{K}{L}$ and $\frac{m}{n} = \frac{M}{N}$, then
$$\frac{k}{\ell} \text{ of } \frac{m}{n} = \frac{K}{L} \text{ of } \frac{M}{N}.$$
However, (1.7) tells us that, because multiplication between whole numbers is com-
mutative, the equality (1.8) is actually correct.
Using the idea of "$\frac{k}{\ell}$ of $\frac{m}{n}$", we now give a completely different interpretation
of a fraction, the so-called division interpretation. We will prove ($n \neq 0$):
$$\frac{m}{n} = \text{the length of one part when } [0, m] \text{ is partitioned into } n \text{ equal parts.} \tag{1.9}$$
Proof of (1.9). We observe that, by the definition of "of " on page 24, the
right side of (1.9) is equal to $\frac1n$ of $m$, i.e., $\frac1n$ of $\frac{m}{1}$, which, by Theorem 1.3(i) on
page 27, is equal to $\frac{1 \times m}{n \times 1} = \frac{m}{n}$, the left side of (1.9). The proof of (1.9) is complete.
Due to the importance of (1.9), we now give a direct proof without appealing
to Theorem 1.3(i). To partition [0, m] into n equal parts, we use (1.2) on page 10
to express $m$ as $\frac{nm}{n}$. By the definition of a fraction, $m$ is $nm$ copies of $\frac1n$. If we
first group $m$ copies of $\frac1n$ together and call it B, then we are saying that [0, m] is n
copies of B. Therefore the right side of (1.9) is equal to the length of B. Since B
is $m$ copies of $\frac1n$, its length is equal to $\frac{m}{n}$ by definition. This again proves (1.9).
It may surprise the reader to learn that the right side of (1.9) is actually
something familiar to us, at least in certain settings. Consider the special case that
m is a whole-number multiple of n; let us say m = kn for some whole number k.
In the right side of the equality (1.9), we want the length of one part when [0, m] is
partitioned into n equal parts. Since m = kn, [0, m] is just the concatenation of kn
copies of the unit segment [0, 1]. Therefore we may partition kn copies of [0, 1] (i.e.,
[0, m]) into n equal groups ("equal" in terms of length) where each group consists
of exactly k copies of [0, 1]. Thus, the right side of (1.9) is just the total length
of these k copies of [0, 1], which is k. Now recall the definition of the partitive
interpretation of the division m÷n, which is the number of objects in a group
if m (= kn) objects are divided into n equal groups. If we interpret "object" in this
case to be "[0, 1]", then the right side of (1.9) is precisely m ÷ n in the partitive
sense.
We repeat, if m = kn for some whole number k, then the right side of (1.9) is
equal to the partitive division m ÷ n, and (1.9) becomes the statement that
$$\frac{m}{n} = m \div n. \tag{1.10}$$
Behind the symbols in (1.10) lies this statement: if m is a whole number multiple
of n, then the fraction $\frac{m}{n}$ is the same point on the number line as the partitive
division m ÷ n. Moreover, at this moment, (1.10) has no meaning when m is not a
This then suggests the idea that if we are willing to "expand" the meaning of m ÷ n
even when m is not a whole-number multiple of n, then the right side of (1.9)
would be a good starting point; i.e., we can make sense of m ÷ n for any two whole
numbers ($n \neq 0$) by defining it to mean the right side of (1.9). This leads to the general
concept of the division m÷n of any two whole numbers m and n ($n \neq 0$):
$$\frac{m}{n} = m \div n.$$
There are three observations to be made about this theorem. First, the fact
that a fraction $\frac{m}{n}$ is equal to a division m ÷ n is called "the division interpretation
of a fraction". However, there are two serious conceptual errors concerning this
"interpretation" in TSM: (a) this "interpretation" is a fact that we can prove (see
Theorem 1.4) rather than an ad hoc meaning one sees fit to confer on a fraction
(as in TSM), and (b) the two meanings of m ÷ n when m is—or is not—a whole-
number multiple of n require a careful discussion and differentiation (which is never
done in TSM). The failure described in (b) leads to students’ confusion about the
very meaning of whole-number division. A second observation is that what we
called the "expansion" of the meaning of m ÷ n is called, in technical language,
an extension of the usual concept of whole-number division. This means that
although the whole numbers m and n in the preceding concept of m ÷ n have been
freed from the restriction of m being a whole-number multiple of n, yet when m
is a whole-number multiple of n, this definition of m ÷ n is the same as the old
one on account of the discussion preceding (1.10). The general idea of extension
(4) School textbooks usually present the cancellation law for fractions as fol-
lows:
Given a fraction $\frac{m}{n}$, suppose a nonzero whole number $k$ divides
both $m$ and $n$. Then $\frac{m}{n} = \frac{m \div k}{n \div k}$.
[Figure: number line with labeled points A, B, C, D, E.]
(ii) Prove that the following three statements are equivalent for any four
whole numbers a, b, c, and d, with $b \neq 0$ and $d \neq 0$:
(a) $\frac{a}{b} = \frac{c}{d}$. (b) $\frac{a}{a+b} = \frac{c}{c+d}$. (c) $\frac{a+b}{b} = \frac{c+d}{d}$.
(One way is to prove that (a) implies (b) and (b) implies (a). Then
prove (a) implies (c) and (c) implies (a).)
(10) Place the three fractions $\frac{13}{6}$, $\frac{11}{5}$, and $\frac{9}{4}$ on the number line and explain
how they get to be where they are.
(11) For which fractions $\frac{m}{n}$ is it true that $\frac{m}{n} = \frac{m+b}{n+b}$, where $b$ is a nonzero
whole number?
(12) $\frac{m}{n}$ of a fraction is equal to $\frac{k}{\ell}$. What is this fraction?
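[Figure: number line showing a segment of length 4 starting at 0, concatenated with a segment of length 7, ending at 11.]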
Similarly, if we have two whole numbers m and n, then m + n is simply the length
of the concatenation of the two segments of length m and n:
[Figure: two concatenated segments of lengths m and n, with total length m + n.]
With the expectation that addition should mean the same thing for whole num-
bers and fractions, we are led to the following definition of the sum of two fractions:
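Definition. Given fractions $\frac{k}{\ell}$ and $\frac{m}{n}$, we define their sum $\frac{k}{\ell} + \frac{m}{n}$ by
$\frac{k}{\ell} + \frac{m}{n}$ = the length of two concatenated segments, one
of length $\frac{k}{\ell}$, followed by one of length $\frac{m}{n}$:
[Figure: two concatenated segments of lengths $\frac{k}{\ell}$ and $\frac{m}{n}$, with total length $\frac{k}{\ell} + \frac{m}{n}$.]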
It follows directly from this definition that the addition of fractions satisfies the
associative and commutative laws; see Exercise 1 on page 41 (cf. the appendix on
page 86 for a summary of these laws).
It is also an immediate consequence of the definition that
$$\frac{k}{\ell} + \frac{m}{\ell} = \frac{k + m}{\ell} \tag{1.11}$$
because both sides are equal to the length of $k + m$ copies of $\frac{1}{\ell}$ (see page 12 for
the meaning of "copy"). More explicitly, the left side is the length of $k$ copies of $\frac{1}{\ell}$
combined with $m$ copies of $\frac{1}{\ell}$ and is therefore the length of $k + m$ copies of $\frac{1}{\ell}$, which
is exactly the right side. This tells us that, to compute the sum of two fractions
with the same denominator $\ell$, one adds them as one would with whole numbers,
with the only difference being that, instead of adding so many copies of the unit 1,
we now add so many copies of the unit fraction $\frac{1}{\ell}$, as above.
Because of FFFP (see page 23), the general case of adding two fractions with
unequal denominators is immediately reduced to the case of equal denominators;
i.e., in order to add
$$\frac{k}{\ell} + \frac{m}{n}$$
Up to this point, when two fractions are given, we have used the product of
their denominators as a common denominator for both, i.e., a whole number so
that both fractions are equivalent to a fraction with that as their denominator.
Sometimes it can happen that a different, smaller common denominator is "handed
to you for free". For example, when the denominator of one of the fractions is
already a multiple of the other denominator, then the bigger denominator already
serves as a common denominator; e.g.,
$$\frac34 + \frac78 = \frac68 + \frac78 = \frac{13}{8}$$
or
$$\frac{27}{100} + \frac{13}{10} = \frac{27}{100} + \frac{130}{100} = \frac{157}{100}. \tag{1.13}$$
(Incidentally, the second example can be equally well expressed as 0.27+1.3 = 1.57.
See page 14 for the decimal notation.) A slightly more sophisticated example is
$\frac{7}{12} + \frac{5}{8}$.25 It is relatively easy to notice that 24 is the least common multiple of 12
and 8 and, as such, 24 is the least common denominator of the two fractions. Since
24 = 2 × 12 = 3 × 8, the addition can be done more simply as follows:
$$\frac{7}{12} + \frac{5}{8} = \frac{2 \times 7}{2 \times 12} + \frac{3 \times 5}{3 \times 8} = \frac{14 + 15}{24} = \frac{29}{24}.$$
By comparison, if we use (1.12), then we would get
$$\frac{7}{12} + \frac{5}{8} = \frac{(7 \times 8) + (12 \times 5)}{12 \times 8} = \frac{116}{96} = \frac{29}{24}$$
where the last equality is due to the cancellation law (page 20).
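In general, suppose $\frac{m}{n}$ and $\frac{k}{\ell}$ are given and there is a whole number $D$ that is
a common multiple of both $n$ and $\ell$, say $D = \ell n' = \ell' n$; then the computation of
the sum $\frac{k}{\ell} + \frac{m}{n}$ can make use of $D$ as the common denominator instead of $\ell n$, as
follows:
$$\frac{k}{\ell} + \frac{m}{n} = \frac{k n'}{\ell n'} + \frac{\ell' m}{\ell' n} = \frac{k n' + \ell' m}{D}.$$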
A more interesting example to illustrate the advantage of using a simpler common
denominator can be found in Exercise 13 on page 42.
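The two ways of adding can be compared side by side in Python. This is a sketch of ours, not the author's: the function names are hypothetical, the first function assumes that (1.12) is the product-of-denominators formula used in the computation above, and the second takes D to be the least common multiple of the denominators. Results are returned as unreduced (numerator, denominator) pairs so that the intermediate forms stay visible.

```python
from math import gcd

def add_with_product(k: int, l: int, m: int, n: int):
    """k/l + m/n over the common denominator l*n."""
    return (k * n + l * m, l * n)

def add_with_lcd(k: int, l: int, m: int, n: int):
    """k/l + m/n over D = lcm(l, n), written as D = l*n1 = l1*n."""
    D = l * n // gcd(l, n)
    n1, l1 = D // l, D // n   # the cofactors n' and l' of the general formula
    return (k * n1 + l1 * m, D)

print(add_with_product(7, 12, 5, 8))  # (116, 96)
print(add_with_lcd(7, 12, 5, 8))      # (29, 24)
```

Both pairs name the same point, since 116 × 24 = 96 × 29 = 2784.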
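Then
$$\begin{aligned}
4.0451 + 7.28 &= \frac{40451 + 72800}{10^4} && \text{(corresponds to } (\alpha)\text{)} \\
&= \frac{113251}{10^4} && \text{(corresponds to } (\beta)\text{)} \\
&= 11.3251 && \text{(corresponds to } (\gamma)\text{).}
\end{aligned}$$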
The reasoning is of course completely general and serves to explain the algorithm
(α)–(γ) for any pair of decimals.
A second application is to get the so-called complete expanded form of a (finite)
decimal. For example, given 4.1297, we know it is the fraction
$$\frac{41297}{10^4}.$$
But we have the expanded form of the whole number 41297:
$$41297 = (4 \times 10^4) + (1 \times 10^3) + (2 \times 10^2) + (9 \times 10^1) + (7 \times 10^0).$$
(Recall that $10^0$ is by definition equal to 1.) We also know that, by equivalent
fractions, $\frac{4 \times 10^4}{10^4} = 4$, $\frac{1 \times 10^3}{10^4} = \frac{1}{10}$, etc. Thus by (1.12) on page 33,
$$4.1297 = 4 + \frac{1}{10} + \frac{2}{10^2} + \frac{9}{10^3} + \frac{7}{10^4}. \tag{1.14}$$
This expression of 4.1297 in (1.14) as a sum of descending powers27 of 10, where
the coefficients of these powers are the digits of the number itself (i.e., 4, 1, 2, 9,
and 7), is called the complete expanded form of 4.1297. Equation (1.14) is the
reason that we can say 4.1297 is "the sum of 4 and 1 tenth and 2 hundredths and
9 thousandths and 7 ten-thousandths". Please observe that this conclusion is a
precise, logical consequence of the definition of a decimal (in equation (1.3) on page
14) and the addition formula (1.11) on page 33. Please do not make the standard
TSM mistake of telling students—without first proving (1.14)—that 4.1297 "means
4 and 1 tenth and 2 hundredths and 9 thousandths and 7 ten-thousandths". You
would confuse them. In the same way, a decimal $0.d_1 d_2 \cdots d_n$,28 where each $d_j$ is a
single-digit number, has the following complete expanded form:
$$0.d_1 d_2 \cdots d_n = \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}. \tag{1.15}$$
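The complete expanded form can be generated and verified mechanically. The sketch below is our own illustration (the function name `complete_expanded_form` is hypothetical, and it assumes the input is a string with exactly one decimal point): it lists each digit together with the power of 10 in its denominator and checks that the terms add up to the fraction that defines the decimal.

```python
from fractions import Fraction

def complete_expanded_form(decimal: str):
    """Return (digit, j) pairs meaning digit / 10**j; the whole part has j = 0."""
    whole, _, digits = decimal.partition(".")
    terms = [(int(whole), 0)]
    terms += [(int(d), j) for j, d in enumerate(digits, start=1)]
    return terms

terms = complete_expanded_form("4.1297")
print(terms)  # [(4, 0), (1, 1), (2, 2), (9, 3), (7, 4)]

# The sum of the terms is the fraction 41297 / 10**4 that defines 4.1297.
total = sum(Fraction(d, 10 ** j) for d, j in terms)
print(total == Fraction(41297, 10 ** 4))  # True
```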
than another now becomes our next concern. To this end, we will need the basic
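inequality in Theorem 1.5 below. Recall from page 12 that $\frac{k}{\ell} < \frac{m}{n}$ means the point
$\frac{k}{\ell}$ is to the left of the point $\frac{m}{n}$ on the number line:
[Figure: number line with the point $\frac{k}{\ell}$ to the left of the point $\frac{m}{n}$.]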
Therefore the latter is to the left of the former. Consequently, $\frac49$ is the bigger of the
two. This reasoning is perfectly valid in general, so we have the following theorem.
have $\frac{kn}{\ell n} < \frac{\ell m}{\ell n}$. This means the $kn$-th multiple of $\frac{1}{\ell n}$ is to the left of the $\ell m$-th
multiple of the same, so that $kn$ must be smaller than $\ell m$; i.e., $kn < \ell m$. Conversely,
suppose $kn < \ell m$. Then the $kn$-th multiple of $\frac{1}{\ell n}$ is to the left of the $\ell m$-th multiple
of $\frac{1}{\ell n}$, so that
$$\frac{kn}{\ell n} < \frac{\ell m}{\ell n}.$$
By the theorem on equivalent fractions, this becomes $\frac{k}{\ell} < \frac{m}{n}$. The proof of the
theorem is complete.
Proof. (i) This is because the preceding theorem implies that the inequality $\frac{a}{c} < \frac{b}{c}$
is equivalent to the inequality $ac < bc$. Now, since a, b, and c are whole numbers,
ac = the sum of a copies of c and, similarly, bc = the sum of b copies of c. Therefore
$ac < bc$ is equivalent to $a < b$, and part (i) is proved.
(ii) The inequality $\ell > n$ is equivalent to $n < \ell$, which is of course identical
to $1 \cdot n < 1 \cdot \ell$ which, by the preceding theorem, is equivalent to $\frac{1}{\ell} < \frac{1}{n}$. The
corollary is proved. Note that we have just made use of the dot in $1 \cdot \ell$ to denote $1 \times \ell$.
While this kind of intuitive argument is invaluable for pointing students toward the
correct conclusion, it cannot be confused with valid reasoning. After all, how would
such an intuitive argument bring conviction to the claim that
\[ \frac{1}{8590007} > \frac{1}{8590008}\,? \]
This is why the correct reasoning using the cross-multiplication inequality must be
taught in addition to the intuitive argument using small values of $\ell$ and $n$. End of
Pedagogical Comments.
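As a small illustration of how the cross-multiplication inequality settles such comparisons with whole-number arithmetic alone, here is a sketch. It assumes Theorem 1.5 in the form used in the proof above ($\frac{k}{\ell} < \frac{m}{n}$ exactly when $kn < \ell m$); the function name `is_less` is hypothetical.

```python
def is_less(k, l, m, n):
    """Decide whether k/l < m/n for whole numbers k, l, m, n with l, n nonzero.

    Assumed form of the cross-multiplication inequality:
    k/l < m/n exactly when k*n < l*m.
    """
    return k * n < l * m

# 1/8590008 < 1/8590007 because 1 * 8590007 < 8590008 * 1.
assert is_less(1, 8590008, 1, 8590007)
# The reverse inequality fails.
assert not is_less(1, 8590007, 1, 8590008)
```

No decimal approximation is involved, so the size of the denominators does not affect the reliability of the comparison.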
The following observations about the comparison of fractions are useful and
also easy to prove. Let A, B, C, D be fractions. Then we have:
(1) A < B ⇐⇒ there is a fraction C so that A + C = B.
(2) A < B implies A + C < B + C for every fraction C.
(3) A < B and C < D implies A + C < B + D.
The proofs require nothing more than making mathematical interpretations of cor-
rect drawings on the number line. We will leave them as an exercise (see Exercise
14 on page 42).
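We can now define the subtraction of fractions: suppose $\frac{m}{n} > \frac{k}{\ell}$; then a segment
of length $\frac{m}{n}$ is longer than a segment of length $\frac{k}{\ell}$. The difference
$\frac{m}{n} - \frac{k}{\ell}$ is defined to be
the length of the remaining segment when a segment of length $\frac{k}{\ell}$ is taken from one
end of a segment of length $\frac{m}{n}$.
[Figure: a segment of length $\frac{m}{n}$ with a segment of length $\frac{k}{\ell}$ marked off at one end.]
We also define
\[ \frac{m}{n} - \frac{m}{n} = 0 \quad \text{and} \quad \frac{m}{n} - 0 = \frac{m}{n}. \]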
This definition of the subtraction of fractions is clearly modeled on the subtraction
of whole numbers. For example, 9 − 4 is the length of the remaining segment when
a segment of length 4 is taken from a segment of length 9.
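The definition of subtraction for fractions has the following pleasant (but expected)
consequence: let $\frac{k}{\ell}$ and $\frac{m}{n}$ be distinct fractions; then
\[ \text{the length of the segment } \left[ \tfrac{k}{\ell}, \tfrac{m}{n} \right] \text{ is } \frac{m}{n} - \frac{k}{\ell}. \tag{1.16} \]
Indeed, by the definition of length on page 6, the length of $[0, \frac{m}{n}]$ (respectively, $[0, \frac{k}{\ell}]$)
is $\frac{m}{n}$ (respectively, $\frac{k}{\ell}$). Therefore (1.16) is clear from the definition of subtraction:
[Figure: a number line marking 0, $\frac{k}{\ell}$, and $\frac{m}{n}$.]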
The reasoning used in the proof of (1.11) on p. 33, together with FFFP, gives
\[ \frac{m}{n} - \frac{k}{\ell} = \frac{\ell m - nk}{\ell n}. \tag{1.17} \]
Note that this formula makes implicit use of the preceding cross-multiplication inequality
(Theorem 1.5), because the subtraction of whole numbers in the numerator
of the right side of (1.17), $\ell m - nk$, does not make sense unless we know $\ell m > nk$;
but since $\frac{k}{\ell} < \frac{m}{n}$, Theorem 1.5 guarantees that $\ell m > nk$.
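Formula (1.17) can be tried out directly. The sketch below is illustrative only: the function name `subtract` is hypothetical, the formula is taken in the form $\frac{\ell m - nk}{\ell n}$, and the explicit check that the numerator is not negative plays the role that Theorem 1.5 plays in the text.

```python
from fractions import Fraction

def subtract(m, n, k, l):
    """Return m/n - k/l as an unreduced pair (numerator, denominator).

    Follows the assumed form of (1.17): (l*m - n*k) / (l*n).
    The whole-number subtraction only makes sense when l*m >= n*k.
    """
    numerator = l * m - n * k
    if numerator < 0:
        raise ValueError("k/l is bigger than m/n; the difference is not defined here")
    return numerator, l * n

numerator, denominator = subtract(7, 5, 3, 4)  # 7/5 - 3/4
assert (numerator, denominator) == (13, 20)
# Cross-check against Python's exact rational arithmetic.
assert Fraction(numerator, denominator) == Fraction(7, 5) - Fraction(3, 4)
```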
We wish to bring out the fact that subtraction is an alternate way of expressing
addition. Indeed, the definition of $\frac{m}{n} - \frac{k}{\ell}$, together with the definition of adding
juncture, because the mathematical development of Chapter 2 does not depend on this section.
the subtraction algorithm for whole numbers, we can get around this difficulty by
computing as follows:
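\[
\begin{aligned}
17\tfrac{2}{5} - 7\tfrac{3}{4} &= \left( 16 + 1\tfrac{2}{5} \right) - \left( 7 + \tfrac{3}{4} \right) \\
&= (16 - 7) + \left( 1\tfrac{2}{5} - \tfrac{3}{4} \right) \\
&= 9 + \left( \tfrac{7}{5} - \tfrac{3}{4} \right) = 9 + \tfrac{13}{20} = 9\tfrac{13}{20}.
\end{aligned}
\tag{1.19}
\]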
32 Least common multiple. For a precise definition, see Exercise 4 on page 156.
Mathematical Aside: The use of the least common denominator for the defini-
tion of the addition of fractions is more than just a pedagogical disaster. From a
mathematical perspective, it is conceptually incorrect. If it were necessary to find
the LCM of the two denominators before the addition of two fractions could be de-
fined, it would imply that addition cannot be performed in the field of quotients of
an integral domain unless the latter has the special property that any two elements
in it have an LCM. This is almost the statement that addition cannot be defined
in the field of quotients of an integral domain unless the domain has the unique
factorization property. However, we know that this is false because addition can be
defined in the field of quotients of any domain.
Exercises 1.3.
(1) Use the definition of the sum of two fractions on page 33 to show that the
addition of fractions satisfies the associative and commutative laws. (See
(1.56) and (1.57) on page 87 for the precise statement of these two laws
and the subsequent discussion for their significance.)
(2) Compute (you may use a four-function calculator for the arithmetic com-
putations; no need to simplify your answers but you have to show all your
steps):
5
(i) 18 8
+ 27 5
, (ii) 34 − 51
7 5
, (iii) 24 12 + 32 11 3 13
15 , (iv) 315 8 − 312 20 .
(10) State the subtraction algorithm for finite decimals, and explain why it is
true. (See the discussion of the addition algorithm for finite decimals on
page 34.)
(11) (a) $\frac{2}{5} + \frac{7}{12} = ?$ (b) Laura ran for 35 minutes, stopped to take a rest, and
then ran for another 24 minutes. How long did she run altogether, and
what does this have to do with part (a)?
(12) An alcohol solution mixes 5 fl. oz. of water with 24 fl. oz. of alcohol. Then
4 fl. oz. of water and 19 fl. oz. of alcohol are added to the solution. Which
has a higher concentration of alcohol (which is defined to be the number
in the sense of (1.10) on page 28): the old solution or the new?
(13) If n is a whole number, we define n! (read: n factorial) to be the product
of all the whole numbers from 1 through n. Thus 5! = 1 × 2 × 3 × 4 ×
5.
We also define 0! to be 1. Define the so-called binomial coefficients $\binom{n}{k}$
for any whole number k satisfying 0 ≤ k ≤ n as
\[ \binom{n}{k} = \frac{n!}{(n-k)!\,k!}. \]
Then prove
\[ \binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}. \]
(The proper context for this exercise is Pascal’s triangle, which is discussed
in Section 5.4 of [Wu2020b].)
(14) Prove each of the statements (1)–(3) on page 38 for fractions A, B, C,
and D.
(15) Suppose a, b are whole numbers so that 1 < a < b. Which is bigger: $\frac{a-1}{a}$
or $\frac{b-1}{b}$? Can you tell by inspection? What about $\frac{a+1}{a}$ and $\frac{b+1}{b}$?
(16) (a) Suppose $\frac{a}{b}$ and $\frac{c}{d}$ are fractions and $\frac{a}{b} < \frac{c}{d}$. Prove that
$\frac{a}{b} < \frac{a+c}{b+d} < \frac{c}{d}$. (b) Prove that between any two distinct fractions, there is another
fraction.
(17) Let $\frac{a}{b}$ be a nonzero fraction, with $a \neq b$. Order the following (infinite
number of) fractions: $\frac{a}{b}, \frac{a+1}{b+1}, \frac{a+2}{b+2}, \frac{a+3}{b+3}, \ldots$. (Caution: It makes a
difference whether a < b or a > b.)
(18) In the notation of Exercise 13, observe that each fraction $\frac{n!}{j}$, where n, j
are whole numbers and 1 ≤ j ≤ n, is actually a whole number. Find the
following sum and simplify your answer as much as possible:
\[ \frac{1}{\frac{49!}{1}} + \frac{1}{\frac{49!}{2}} + \frac{1}{\frac{49!}{3}} + \cdots + \frac{1}{\frac{49!}{48}} + \frac{1}{\frac{49!}{49}}. \]
for defining $5^{-3}$ as $1/5^3$ becomes indistinguishable from a proof that $5^{-3} = 1/5^3$.
of fractions, and there is no pretense that any of what follows is logically correct.
It so happens that some of the following claims or guesses will eventually be proved
on the basis of the definition of multiplication on page 45, but until that happens,
nothing in this paragraph should be construed as proven and therefore usable for
reasoning. That said, let k and m be whole numbers; then the concept of k × m
is no mystery: it is the length of k copies of m (see page 12 for the meaning of
"copy"):
\[ k \times m = \underbrace{m + m + \cdots + m}_{k}. \]
Now by analogy (and some wishful thinking but not by logic), we would like to
believe that for a whole number k and for a fraction $\frac{m}{n}$, the multiplication $k \times \frac{m}{n}$
"should also be" the length of k copies of $\frac{m}{n}$; i.e.,
\[ k \times \frac{m}{n} = \underbrace{\frac{m}{n} + \cdots + \frac{m}{n}}_{k}. \tag{1.20} \]
36 A reminder: equation (1.20) is not a proven fact, only part of our wishful thinking of what
We now push the reset button. We will leave our heuristic discussion behind
and formalize our provisional definition in (1.24) in the following definition. From
this formal definition we will draw logical conclusions about fraction multiplication.
Note that, according to the definition on page 24, we may rephrase the definition
as
\[ \frac{k}{\ell} \times \frac{m}{n} = \frac{k}{\ell} \text{ of } \frac{m}{n}. \tag{1.25} \]
The preceding definition of multiplication poses a potential problem, and we
should deal with it right away. We first illustrate the problem with a simple example.
Consider $\frac{1}{2} \times \frac{3}{4}$. We know that $\frac{1}{2} = \frac{2}{4}$ and $\frac{3}{4} = \frac{15}{20}$, and therefore we expect
that, no matter how fraction multiplication is defined, we should have the equality
\[ \frac{1}{2} \times \frac{3}{4} = \frac{2}{4} \times \frac{15}{20}. \]
By (1.25), this is equivalent to asking whether
\[ \frac{1}{2} \text{ of } \frac{3}{4} = \frac{2}{4} \text{ of } \frac{15}{20}. \]
Now if this equality turns out to be incorrect, then it would mean the definition
makes no sense, or in the usual language of mathematics, is not well-defined.
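For the particular example just given, the question can be tested numerically. The sketch below is not a proof of well-definedness, only a check of one instance. It assumes that "$\frac{k}{\ell}$ of $\frac{m}{n}$" means the length of $k$ parts when $[0, \frac{m}{n}]$ is partitioned into $\ell$ equal parts; the function name `fraction_of` is hypothetical.

```python
from fractions import Fraction

def fraction_of(k, l, m, n):
    """Length of k concatenated parts when [0, m/n] is partitioned into l equal parts.

    Since m/n = (l*m)/(l*n), one part has length m/(l*n),
    so k parts together have length (k*m)/(l*n).
    """
    return Fraction(k * m, l * n)

# The same two fractions, written with different numerators and denominators.
first = fraction_of(1, 2, 3, 4)     # 1/2 of 3/4
second = fraction_of(2, 4, 15, 20)  # 2/4 of 15/20
assert first == second == Fraction(3, 8)
```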
More generally, suppose
\[ \frac{k}{\ell} = \frac{K}{L} \quad \text{and} \quad \frac{m}{n} = \frac{M}{N}. \]
(See the appendix on page 86 for a summary of these laws.) We will leave the
detailed proof of the corollary to an exercise (Exercise 4 on page 53).
There are two immediate consequences of our definition of fraction multiplication.
The first is the interpretation of division by a (nonzero) whole number
as multiplication. Recall that we defined on page 29 the concept of $k \div \ell$ for two
arbitrary whole numbers $k$ and $\ell$ ($\ell \neq 0$) to mean the length of one part when $[0, k]$
is partitioned into $\ell$ equal parts. But from the definition of fraction multiplication
(page 45), $\frac{1}{\ell} \times k$ has exactly the same meaning: the length of one part when $[0, k]$
is partitioned into $\ell$ equal parts. Therefore,
\[ k \div \ell = \frac{1}{\ell} \times k \quad \text{for whole numbers } k \text{ and } \ell,\ \ell \neq 0. \tag{1.27} \]
We can put equation (1.27) into a more general context. (For a reason that will be
obvious, we are going to change the notation in the definition on page 29 from $m$
and $n$ to $k$ and $\ell$, respectively.) While the definition of $k \div \ell$ on page 29 for any two
whole numbers $k$ and $\ell$ ($\ell \neq 0$) is a generalization of the partitive division of $k \div \ell$
when $k$ is a whole-number multiple of $\ell$, this definition can itself be generalized by
replacing $k$ with a fraction $A$. Indeed, we now define the division of a fraction
$A$ by a nonzero whole number $\ell$ to be the length of one part when $[0, A]$
is partitioned into $\ell$ equal parts. Then it follows from the definition of fraction
multiplication that $\frac{1}{\ell} \times A$ is equal to $A$ divided by $\ell$ in the sense just defined. For
this reason:
Division (of a whole number or a fraction) by a nonzero whole
number $\ell$ will henceforth be replaced by multiplication by $\frac{1}{\ell}$.
Incidentally, in view of the product formula, equation (1.27) implies that
$k \div \ell = \frac{k}{\ell}$, which is exactly Theorem 1.4 on page 29. This then provides a different
perspective on Theorem 1.4.
A second immediate consequence of the definition of fraction multiplication is
that if k is a whole number, then according to the definition of multiplication on
page 45, $k \times \frac{m}{n} = \frac{k}{1} \times \frac{m}{n}$ = the length of the concatenation of k segments of length
$\frac{m}{n}$. By the definition of addition (page 33), the latter is equal to adding k copies
of the fraction $\frac{m}{n}$:
\[ k \times \frac{m}{n} = \underbrace{\frac{m}{n} + \cdots + \frac{m}{n}}_{k}. \tag{1.28} \]
In other words, the multiplication $k \times \frac{m}{n}$ retains the meaning of repeated addition.
As is well known, the product formula has numerous applications. One of the
simplest may be the explanation of the usual cancellation rule for fractions. For
example, we have
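\[ \frac{135}{28} \times \frac{49}{9} = \frac{105}{4} \]
because we can "cancel" the 9’s and 7’s in the numerators and denominators. The
precise reasoning is the following. By the product formula,
\[ \frac{135}{28} \times \frac{49}{9} = \frac{135 \times 49}{28 \times 9}. \]
Therefore,
\[ \frac{135}{28} \times \frac{49}{9} = \frac{15 \times 9 \times 7 \times 7}{4 \times 7 \times 9} = \frac{15 \times 7}{4} = \frac{15}{4} \times \frac{7}{1}, \]
where we have made use of equivalent fractions. The same reasoning of course
proves that for any fractions $\frac{ma}{n}$ and $\frac{k}{a}$ we have the general cancellation rule
for fractions:
\[ \frac{ma}{n} \times \frac{k}{a} = \frac{mk}{n}. \tag{1.29} \]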
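The cancellation in the example above can be mimicked step by step. The following is a sketch only: the function name `multiply_with_cancellation` is hypothetical, and using `math.gcd` to find the common factors is a choice made for this illustration, not the method of the text.

```python
from math import gcd

def multiply_with_cancellation(a, b, c, d):
    """Compute a/b x c/d, cancelling common factors across the two fractions first.

    Returns the pair (numerator, denominator) of the product.
    """
    # Cancel the factor shared by the first numerator and the second denominator.
    shared = gcd(a, d)
    a, d = a // shared, d // shared
    # Cancel the factor shared by the second numerator and the first denominator.
    shared = gcd(c, b)
    c, b = c // shared, b // shared
    return a * c, b * d

# 135/28 x 49/9: the 9's cancel (135 = 15 x 9), then the 7's cancel (49 = 7 x 7, 28 = 4 x 7).
assert multiply_with_cancellation(135, 28, 49, 9) == (105, 4)
```

Each cancellation step is an instance of equivalent fractions applied to the product $\frac{a \times c}{b \times d}$, which is why the result agrees with the product formula.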
Areas of rectangles
both in the naive sense until Chapter 4 of this volume and Chapter 4 in [Wu2020c], respectively.
In school mathematics, this theorem is the basis of the statement that "area is
length times width". It is sobering to realize that in TSM, there is no explanation of
Theorem 1.7 except (perhaps) the case of $\frac{k}{\ell}$ and $\frac{m}{n}$ being whole numbers.38 When
the lengths of the sides of a rectangle are numbers that may not be fractions,39 the
theorem continues to hold, but the reasoning becomes more sophisticated. This
issue will be considered in Section 4.4 of [Wu2020c].
Before giving the proof of Theorem 1.7, let us work out a special case: why is
the area of a rectangle with side lengths $\frac{2}{3}$ and $\frac{5}{2}$ equal to $\frac{2}{3} \times \frac{5}{2}$? First consider
a simpler case: why does a rectangle with side lengths $\frac{1}{3}$ and $\frac{1}{2}$ have area equal
to $\frac{1}{3} \times \frac{1}{2}$? Such a rectangle is related to the unit square in the following way. Let
one pair of opposite sides of the unit square be divided into 3 parts of equal length
and the other pair into 2 halves. By joining corresponding points of the division on
opposite sides, we obtain a paving of the unit square by 6 small rectangles. Each of
these 6 small rectangles has side lengths $\frac{1}{3}$ and $\frac{1}{2}$, and they are congruent to each
other (geometrically obvious, but see Exercises 4 and 7 on page 237).
38 Note that CCSSM ([CCSSM]) shows awareness of the significance of Theorem 1.7.
39 In other words, irrational numbers.
[Figure: the unit square paved by 3 × 2 congruent small rectangles, each with side lengths $\frac{1}{3}$ and $\frac{1}{2}$.]
These 3 × 2 small rectangles are congruent and therefore have equal areas. The
areas of these rectangles thus form an equal partition of the unit 1 (= area of unit
square) into 3 × 2 equal parts. Consequently, the area of each small rectangle is the
fraction $\frac{1}{3 \times 2}$. By the product formula, we have
\[ \frac{1}{3 \times 2} = \frac{1}{3} \times \frac{1}{2}. \]
We have just shown that a rectangle with side lengths $\frac{1}{3}$ and $\frac{1}{2}$ has area $\frac{1}{3} \times \frac{1}{2}$.
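Now we can tackle the case of sides with lengths $\frac{2}{3}$ and $\frac{5}{2}$. Let a rectangle R
with side lengths $\frac{2}{3}$ and $\frac{5}{2}$ be given. We want to show that its area is $\frac{2}{3} \times \frac{5}{2}$. The
key observation here is that R is paved by rectangles with side lengths $\frac{1}{3}$ and $\frac{1}{2}$.
Precisely, it is paved by 2 rows of 5 such rectangles (the 2 being from the numerator
of $\frac{2}{3}$ and the 5 being from the numerator of $\frac{5}{2}$), as shown below.
[Figure: the rectangle R paved by 2 rows of 5 small rectangles, each with side lengths $\frac{1}{3}$ and $\frac{1}{2}$.]
Since each of these 2 × 5 small rectangles has area equal to $\frac{1}{3 \times 2}$, the area of R is
the sum of 2 × 5 of these areas; i.e.,
\[ \text{area of } R = \underbrace{\frac{1}{3 \times 2} + \cdots + \frac{1}{3 \times 2}}_{2 \times 5}. \]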
Proof of Theorem 1.7. The proof is broken up into two parts: Part I: the case
where the lengths of the sides are unit fractions, i.e., fractions with numerators equal
to 1 (see page 10) and Part II: the general case. The proof will take for granted
certain geometrically obvious facts about rectangles that can be easily proved once
the needed tools are available (see Exercises 4 and 7 on page 237).
Part I. We are going to prove that if a rectangle has two sides of lengths $\frac{1}{\ell}$ and $\frac{1}{n}$,
then its area is $\frac{1}{\ell} \times \frac{1}{n}$.
On the number line, let the unit 1 be the area of the unit square. We will show
how to divide this particular unit segment into $\ell n$ equal parts, as follows. Partition
one pair of opposite sides of the unit square into $\ell$ parts of equal length and the
other pair into $n$ parts of equal length. Join the corresponding points of the division
to obtain a paving of the unit square by $\ell n$ congruent rectangles, each having side
lengths $\frac{1}{\ell}$ and $\frac{1}{n}$, as shown:
[Figure: the unit square paved by congruent rectangles, $\ell$ copies along one side and $n$ copies along the other, each with side lengths $\frac{1}{\ell}$ and $\frac{1}{n}$; one rectangle is shaded.]
These $\ell n$ rectangles have equal areas because they are congruent. The unit square
therefore has been partitioned into $\ell n$ equal parts (in terms of area) by these
rectangles. Consider the shaded rectangle in the picture. By the definition of a
fraction (in the case that the unit is the area of the unit square), its area is the
fraction $\frac{1}{\ell \times n}$ on this number line because it is one part when the unit (area of the
unit square) is partitioned into $\ell n$ equal parts. By the product formula, its area is
equal to
\[ \frac{1}{\ell \times n} = \frac{1}{\ell} \times \frac{1}{n}. \]
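Part II. We are going to prove that the area of a rectangle with sides of length $\frac{k}{\ell}$
and $\frac{m}{n}$ is $\frac{k}{\ell} \times \frac{m}{n}$.
[Figure: the rectangle R paved by small rectangles with side lengths $\frac{1}{\ell}$ and $\frac{1}{n}$, $k$ copies along one side and $m$ copies along the other.]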
We have just seen that each of these small rectangles has area equal to $\frac{1}{\ell n}$. Since
R is paved by $km$ of these congruent small rectangles, the area of R is equal to
\[ \underbrace{\frac{1}{\ell n} + \cdots + \frac{1}{\ell n}}_{km} = km \times \frac{1}{\ell n} = \frac{km}{\ell n} = \frac{k}{\ell} \times \frac{m}{n}, \]
where we have used (1.28) on page 47 and the product formula. The proof of Theorem 1.7
is now complete.
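The counting argument in Part II can be replayed numerically. This is a sketch under the assumptions of the proof (a rectangle with side lengths $\frac{k}{\ell}$ and $\frac{m}{n}$ is paved by $km$ small rectangles of area $\frac{1}{\ell n}$); the function name `area_by_paving` is hypothetical.

```python
from fractions import Fraction

def area_by_paving(k, l, m, n):
    """Area of a rectangle with side lengths k/l and m/n, found by adding tile areas.

    The rectangle is paved by k*m congruent small rectangles,
    each of area 1/(l*n), and their areas are added one at a time.
    """
    small_area = Fraction(1, l * n)
    total = Fraction(0)
    for _ in range(k * m):
        total += small_area
    return total

# The special case worked out earlier: side lengths 2/3 and 5/2.
assert area_by_paving(2, 3, 5, 2) == Fraction(2, 3) * Fraction(5, 2)
```

The loop adds the small areas one by one on purpose, so that the agreement with the product of the side lengths is an outcome of the paving rather than something built into the computation.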
Three remarks
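\[
\begin{aligned}
1.25 \times 0.0067 &= \frac{8375}{10^2 \times 10^4} && \text{(corresponding to } (\alpha)\text{)} \\
&= \frac{8375}{10^{2+4}} && \text{(corresponding to } (\beta)\text{)}
\end{aligned}
\]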
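The two displayed steps suggest a simple computational sketch of the multiplication of finite decimals: multiply the two decimals as if they were whole numbers, then add the numbers of decimal digits to get the power of 10 in the denominator. The function name `multiply_decimals` and the use of strings as input are assumptions made for illustration.

```python
from fractions import Fraction

def multiply_decimals(a, b):
    """Multiply two finite decimals given as strings, e.g. "1.25" and "0.0067".

    Returns the product as an exact fraction N / 10^(p + q), where p and q
    are the numbers of decimal digits of a and b.
    """
    a_whole, _, a_digits = a.partition(".")
    b_whole, _, b_digits = b.partition(".")
    # Multiply as whole numbers, ignoring the decimal points: 125 x 67 = 8375.
    whole_product = int(a_whole + a_digits) * int(b_whole + b_digits)
    # The denominators 10^p and 10^q multiply to 10^(p + q).
    total_digits = len(a_digits) + len(b_digits)
    return Fraction(whole_product, 10 ** total_digits)

assert multiply_decimals("1.25", "0.0067") == Fraction(8375, 10 ** 6)
```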
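[Figure: a number line with the points A, B, and P and the markings 0, 1, and 3.]
By the definitions of the addition of fractions and by (1.25) on page 45, we have
\[ |AP| = (3 \times |AB|) + \left( \frac{4}{7} \times |AB| \right) = \left( 3 + \frac{4}{7} \right) \times |AB| = \frac{25}{7} \times |AB|. \]
This proves (1.30) when $\frac{m}{n}$ is equal to $\frac{25}{7}$.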
In TSM, the argument would go something like this: if we compare the two
number lines, then $\frac{25}{7}$ is to 1 (length of [0, 1]) as $|AP|$ is to $|AB|$ by proportional
reasoning. Thus
\[ \frac{25/7}{1} = \frac{|AP|}{|AB|}. \]
By the cross-multiplication algorithm, we get $|AP| = \frac{25}{7} \times |AB|$ again. However,
since we do not know what proportional reasoning is, this argument has no validity.
A third and final remark is that there are two standard inequalities concerning
multiplication that are worth knowing: if A, B, C, and D are fractions, then:
(i) If A > 0, then AB < AC is equivalent to B < C.
(ii) A < B and C < D imply AC < BD.
Both are obvious when we interpret fraction multiplication as the area of a rectan-
gle. See Exercise 3 on page 53.
Exercises 1.4.
(1) Do each of the following without calculators.
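(a) $(12\frac{2}{3} \times 12\frac{2}{3} \times 12\frac{2}{3}) \times (2\frac{1}{19} \times 2\frac{1}{19} \times 2\frac{1}{19}) \times \frac{1}{26}$.
(b) $(\frac{7}{18} \times 4\frac{2}{3}) + (2\frac{1}{6} \times \frac{7}{18}) + (\frac{7}{18} \times 3\frac{1}{6})$.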
2
(c) 8 50 × 1250 12 .
(2) (i) Prove that $123.45 \times \frac{1}{10^4} = 0.012345$.
(ii) Prove that $67.8901 \times 10^3 = 67890.1$.
(iii) Formulate and prove generalizations of parts (i) and (ii).
(3) Prove the following for fractions A, B, C, and D (see page 52):
(i) If A > 0, then AB < AC is equivalent to B < C.
(ii) A < B and C < D imply AC < BD.
(4) Give a detailed proof of the corollary to Theorem 1.6 on page 46.
(5) Consider the following two numbers A and B, where
A is the length of the concatenation (see page 12) of 4 parts
when $[0, \frac{18}{29}]$ is divided into 17 equal parts. B is the length of
the concatenation of 18 parts when $[0, \frac{4}{17}]$ is divided into 29
equal parts.
Is A equal to B? Why or why not?
(6) (a) Find a fraction q so that $28\frac{1}{2} = q \times 5\frac{1}{4}$. Do the same for $218\frac{1}{7} = q \times 19\frac{1}{2}$.
(b) Make up a (realistic) word problem for each situation, and make sure
that the problems are not the same for both.
(7) The perimeter of a rectangle is by definition the sum of the lengths of
its four sides. Show that given a fraction A and a fraction L, (a) there is
a rectangle with area equal to A but with a perimeter that is bigger than
L and (b) there is a rectangle with perimeter equal to L but with an area
that is less than A.
(8) (a) $16\frac{1}{2}$ cups of liquid would fill a punch bowl. If the capacity of the
cup is $9\frac{1}{3}$ fluid ounces, what is the capacity of the punch bowl? Explain
carefully. (b) The length of a rod is $18\frac{5}{8}$ times the length of a short piece
that is $3\frac{1}{4}$ inches long. How long is the rod? Explain.
(9) How many buckets of water would fill a container if the capacity of the
bucket is $3\frac{1}{3}$ gallons and that of the container is $7\frac{1}{2}$ gallons? (Caution:
Getting an answer for this exercise is easy, but explaining it logically is
not.)
(10) Give a proof of the distributive law for the division of whole numbers;
namely, let k, m, n be whole numbers, and let n > 0. Then
(m ÷ n) + (k ÷ n) = (m + k) ÷ n.
(11) (This is Exercise 8 on page 31. Now do it again using the concept of
fraction multiplication.) James gave a riddle to his friends: "I was on a
hiking trail, and after walking $\frac{7}{9}$ of a mile, I was $\frac{4}{9}$ of the way to the end.
How long is the trail?"
(12) The difference of two given fractions is equal to $\frac{4}{5}$ of the smaller one, while
their sum is equal to $\frac{28}{15}$. What are the fractions? (Hint: Use the number
line.)
41 This precise definition of division provides a simple explanation that division of a nonzero
number by 0 has no meaning, because if it had meaning, then for a nonzero whole number m, $\frac{m}{0}$
is the whole number q so that q × 0 = m. But the last equation cannot hold because the product
on the left side is equal to 0 whereas the right side m is nonzero.
m fixed and m being a multiple of a nonzero whole number n, it is always the case
that m × n ÷ n = m and m ÷ n × n = m. But as we have indicated all along, TSM
is simply incapable of achieving such precision.
The preceding definition of division among whole numbers is important for
the understanding of division among fractions because the latter is patterned after
the former, with one important caveat. The definition of whole number division
m ÷ n makes sense only when m is a multiple of n, but, with n fixed and n > 1,
there are relatively few whole numbers m that are multiples of n. Our first task in
approaching the division of fractions is to remove this restrictive condition in our
prospective definition of fraction division by showing that, given a nonzero fraction
B, every fraction is a (fractional) multiple of B, as the following theorem shows.
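The proof of Theorem 1.8 shows explicitly how to get the fraction C so that
CB = A: if $A = \frac{k}{\ell}$ and $B = \frac{m}{n}$, then the proof gives C as
\[ C = \frac{kn}{\ell m} = \frac{k}{\ell} \times \frac{n}{m}. \]
In other words,
\[ \text{if } C \times \frac{m}{n} = \frac{k}{\ell}, \text{ then } C = \frac{k}{\ell} \times \frac{n}{m}. \tag{1.31} \]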
This fact will be useful below.
Despite the simplicity of the statement, Theorem 1.8 is conceptually subtle and
may take some getting used to. As mentioned above, it says that if a fraction B
is nonzero, then every fraction A is a fractional multiple of B, in the sense that
A = CB for some fraction C. (Note that, since we are no longer dealing exclusively
with whole numbers, the meaning of multiple has to be suitably modified. In the
future, if we want to indicate that there is a whole number C so that A = CB,
we will say explicitly that A is a whole number multiple of B.) Taking A = 1,
then Theorem 1.8 implies that there is exactly one fraction, which we will denote
by $B^{-1}$, so that $B^{-1} B = 1$. We call this $B^{-1}$ the inverse (or multiplicative
inverse, to be precise) of B. In fact, (1.31) shows that
\[ \text{if } B = \frac{m}{n}, \text{ then } B^{-1} = \frac{n}{m}. \tag{1.32} \]
For this reason, $B^{-1}$ is also called the reciprocal of B in the context of fractions.
Using this notation, the expression of C in equation (1.31) above can be rewritten
as
\[ C = AB^{-1}. \]
Thus, if A = CB, then $C = AB^{-1}$. For example, if $A = \frac{11}{5}$ and $B = \frac{23}{8}$, then the
C that satisfies $C(\frac{23}{8}) = \frac{11}{5}$ is, according to (1.31),
\[ C = \frac{11}{5} \times \frac{8}{23} = \frac{88}{115}. \]
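The recipe in (1.31) is easy to test on the numbers of this example. The sketch below is illustrative only; the function name `solve_multiple` is hypothetical, and (1.31) is taken in the form $C = \frac{kn}{\ell m}$.

```python
from fractions import Fraction

def solve_multiple(k, l, m, n):
    """Return the fraction C with C x (m/n) = k/l, using C = (k*n)/(l*m).

    The fraction B = m/n must be nonzero, i.e. m must not be 0.
    """
    if m == 0:
        raise ValueError("B = m/n must be nonzero")
    return Fraction(k * n, l * m)

c = solve_multiple(11, 5, 23, 8)  # A = 11/5, B = 23/8
assert c == Fraction(88, 115)
# Check the defining property: C x B gives back A.
assert c * Fraction(23, 8) == Fraction(11, 5)
```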
We can now give the definition of fraction division. It is, word for word, the
same as the definition of whole number division given on page 55, with the excep-
tion that, thanks to Theorem 1.8, there is no need to require that A be a fractional
multiple of B.
\[ C = \frac{k}{\ell} \times \frac{n}{m}. \]
So (1.33) is proved. (We see that there is nothing mysterious about this rule: it is
a simple consequence of the correct definition of division.)
(ii) Division is well-defined. Given fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ (the latter being nonzero),
we have defined the division of $\frac{k}{\ell}$ by $\frac{m}{n}$. Now suppose $\frac{k}{\ell} = \frac{K}{L}$ and $\frac{m}{n} = \frac{M}{N}$ for
Example. A rod $43\frac{3}{8}$ meters long is cut into pieces which are $\frac{5}{3}$ meters long.
How many such pieces can we get out of the rod?
If we change the numbers in this example to "if a rod 48 meters long is cut into
pieces which are 2 meters long, how many such pieces can we get out of the rod?",
then there would be no question that we should do the problem by dividing 48 by
2. So we will begin the discussion by following this analogy and simply divide $43\frac{3}{8}$
by $\frac{5}{3}$ and see what we get:
\[ \frac{43\frac{3}{8}}{\frac{5}{3}} = \frac{1041}{40} = 26\frac{1}{40}. \]
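The arithmetic of this division, and what the mixed number says about the rod, can be checked as follows. This is only a sketch; the variable names are chosen for illustration.

```python
from fractions import Fraction

rod = Fraction(43 * 8 + 3, 8)   # 43 3/8 meters, written as 347/8
piece = Fraction(5, 3)          # each piece is 5/3 meters long

quotient = rod / piece
whole_pieces = quotient.numerator // quotient.denominator
leftover = rod - whole_pieces * piece

assert quotient == Fraction(1041, 40)   # 26 1/40
assert whole_pieces == 26
# The leftover segment is 1/40 of one piece, i.e. 1/40 x 5/3 meters.
assert leftover == Fraction(1, 40) * piece
```

Note that the fractional part $\frac{1}{40}$ of the quotient measures the leftover in units of one piece, not in meters.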
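\[ r = \frac{m}{n} \times \frac{5}{3}. \]
We notice that $\frac{m}{n}$ must be a proper fraction in the sense that $m < n$
because $r < \frac{5}{3}$, so that, in view of the preceding equation,
\[ m = \frac{3}{5} \times r \times n < \frac{3}{5} \times \frac{5}{3} \times n = n \]
(see item (B) on page 52). Therefore, substituting this value of r into
the equation (1.35) gives
\[
\begin{aligned}
43\tfrac{3}{8} &= \left( K \times \tfrac{5}{3} \right) + \left( \tfrac{m}{n} \times \tfrac{5}{3} \right) \\
&= \left( K + \tfrac{m}{n} \right) \times \tfrac{5}{3},
\end{aligned}
\]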
where $K + \frac{m}{n}$ is a mixed number because $\frac{m}{n}$ is a proper fraction. So we
may rewrite the preceding equation in the notation of mixed numbers:
\[ 43\tfrac{3}{8} = K\tfrac{m}{n} \times \tfrac{5}{3}. \]
By the definition of division, we see that
\[ K\tfrac{m}{n} = \frac{43\frac{3}{8}}{\frac{5}{3}}. \]
We now come to the last part of the arithmetic of finite decimals: division.
In principle, this is very simple because we are going to show that the division
of decimals is reduced to the division of whole numbers. The following example is
sufficient to illustrate the general case: the division
\[ \frac{0.311}{0.64} \]
becomes, upon using the definition of a decimal and invert-and-multiply,
\[ \frac{0.311}{0.64} = \frac{\;\frac{311}{10^3}\;}{\frac{64}{10^2}} = \frac{311}{640}. \]
This reasoning is naturally valid for the division of any two finite decimals. There-
fore, the division of any two finite decimals is equal to a fraction.
Now a fraction is not quite the right answer to a division of two finite decimals
because we want the answer to be a finite decimal. The next step is therefore to
convert a given fraction to a finite decimal; i.e., given a fraction $\frac{m}{n}$ for some whole
numbers m and n, to convert $\frac{m}{n}$ to a finite decimal is to find whole numbers
N and k so that
\[ \frac{m}{n} = \frac{N}{10^k}, \]
where the right side is, by definition, a finite decimal (see page 14). It turns out
that this is not always possible, as we now explain.
The overriding fact is that a fraction in lowest terms is equal to a finite decimal
if and only if its denominator is a product of 2’s and 5’s42 (Theorem 3.8 on page
152). Keeping this in mind, what we will do here is show how to convert a fraction,
in three different ways, to a finite decimal if its denominator is a product of 2’s
and 5’s. At the end, we will also make a passing comment on the general case of
converting an arbitrary fraction to an "infinite decimal".
In one sense, the conversion is easy when the denominator is a product of 2’s
and 5’s. What makes it easy is the simple observation that a power of 10 is a product
of 2’s and 5’s; i.e., $10^k = 2^k \times 5^k$ for any whole number k (because 10 = 2 × 5).
To see how this leads to the conversion, take the fraction $\frac{7}{125}$, for example. Since
$125 = 5^3$ and $10^3 = 2^3 \times 5^3$, we see that we can directly change the denominator
125 to $10^3$ by multiplying it by $2^3$. Therefore, by invoking equivalent fractions, we
get
\[ \frac{7}{125} = \frac{7}{5^3} = \frac{2^3 \times 7}{2^3 \times 5^3} = \frac{56}{10^3} = 0.056. \]
Here is another example: consider the fraction $\frac{28}{625}$. Since $625 = 5^4$ and $10^4 =
2^4 \times 5^4$, we have
\[ \frac{28}{625} = \frac{28}{5^4} = \frac{2^4 \times 28}{2^4 \times 5^4} = \frac{448}{10^4} = 0.0448. \]
Finally, let us take up the fraction above, $\frac{311}{640}$. We have
\[ \frac{311}{640} = \frac{311}{2^7 \times 5}. \]
Since $10^7 = 2^7 \times 5^7 = (2^7 \times 5) \times 5^6 = 640 \times 5^6$, we see that multiplying 640 by $5^6$
changes 640 to $10^7$. Therefore, using equivalent fractions again, we get
\[ \frac{311}{640} = \frac{311 \times 5^6}{(2^7 \times 5) \times 5^6} = \frac{311 \times 5^6}{10^7} = \frac{4859375}{10^7} = 0.4859375. \]
There should be no difficulty at this point in converting any fraction with a denom-
inator equal to a product of 2’s and 5’s to a finite decimal.
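The three conversions above follow one pattern: count the 2’s and 5’s in the denominator, then multiply numerator and denominator by whatever is missing to reach a power of 10. A minimal sketch of that pattern follows; the function name `to_finite_decimal` and the returned pair $(N, k)$ with $\frac{m}{n} = \frac{N}{10^k}$ are choices made for this illustration.

```python
def to_finite_decimal(m, n):
    """Return whole numbers (N, k) with m/n = N / 10^k.

    Assumes n is a positive whole number that is a product of 2's and 5's;
    otherwise a ValueError is raised.
    """
    if n <= 0:
        raise ValueError("the denominator must be a positive whole number")
    twos = fives = 0
    rest = n
    while rest % 2 == 0:
        rest //= 2
        twos += 1
    while rest % 5 == 0:
        rest //= 5
        fives += 1
    if rest != 1:
        raise ValueError("the denominator is not a product of 2's and 5's")
    k = max(twos, fives)
    # Multiply numerator and denominator by the missing 2's and 5's,
    # which turns the denominator into 2^k x 5^k = 10^k.
    multiplier = 2 ** (k - twos) * 5 ** (k - fives)
    return m * multiplier, k

assert to_finite_decimal(7, 125) == (56, 3)         # 0.056
assert to_finite_decimal(28, 625) == (448, 4)       # 0.0448
assert to_finite_decimal(311, 640) == (4859375, 7)  # 0.4859375
```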
This is not the end of the story, however. We need two more observations
to round out the picture. First, there is another way to make use of the given
hypothesis that the denominator of a fraction is a product of 2’s and 5’s. Thus far,
we have focussed our attention on the denominator of the fraction, but we will
now shift the focus to the numerator instead. We can use equivalent fractions to
introduce a large power of 10 into the numerator so that we can cancel the 2’s and 5’s
in the denominator to get a whole number. This statement needs to be amplified, so
let us look at an example: consider $\frac{7}{125}$ again. Knowing in advance that $125 = 5^3$,
we know that if we have $10^3$ ($= 2^3 \times 5^3 = 2^3 \times 125$) in the numerator, we would
be able to cancel the 125 in the denominator. Precisely, using the cancellation rule
(1.29) on page 47, we have
\[ \frac{7}{125} = \underbrace{\frac{7 \times 10^3}{125}}_{N_3} \times \frac{1}{10^3}. \tag{1.36} \]
The subscript 3 in the fraction we call $N_3$ on the right refers to the fact that $10^3$ is
in its numerator. $N_3$ is in fact a whole number because its numerator is a multiple
of 125 (since $10^3 = 2^3 \times 125$) so that it cancels the 125 in the denominator. Thus
$N_3 = 7 \times 2^3 = 56$ so that
\[ \frac{7}{125} = \frac{N_3}{10^3} = \frac{56}{10^3} = 0.056. \tag{1.37} \]
Again, the subscript 7 in the fraction we call $N_7$ on the right refers to the $10^7$ in
its numerator. $N_7$ is also a whole number because the number $10^7$ in its numerator
is a multiple of 640 (since $10^7 = (2^7 \times 5) \times 5^6 = 640 \times 5^6$) and this factor of 640
in the numerator of $N_7$ cancels its denominator. Thus $N_7 = 311 \times 5^6 = 4859375$ so
that
\[ \frac{311}{640} = \frac{N_7}{10^7} = \frac{4859375}{10^7} = 0.4859375. \tag{1.39} \]
Let us do one more example. Consider the fraction $\frac{15}{32}$. Because $32 = 2^5$, we rewrite
the fraction as
\[ \frac{15}{32} = \underbrace{\frac{15 \times 10^5}{32}}_{N_5} \times \frac{1}{10^5}. \tag{1.40} \]
Once again, the fraction $N_5$ is a whole number because the $10^5$ in its numerator
is equal to $10^5 = 2^5 \times 5^5 = 32 \times 5^5$ and the denominator 32 of $N_5$ therefore gets
canceled. Thus $N_5 = 15 \times 5^5 = 46875$. Consequently,
\[ \frac{15}{32} = \frac{N_5}{10^5} = \frac{46875}{10^5} = 0.46875. \tag{1.41} \]
The second observation builds on the first and establishes a connection between
the way of converting a fraction to a finite decimal in the first observation and the
traditional way that uses long division. To explain this statement in greater detail,
let us look at $\frac{7}{125}$ again. As we explained after (1.36), the number $N_3$ in (1.36) is a
whole number (in fact $N_3 = 56$). Therefore $N_3$ is the quotient of the long division
of $7 \times 10^3$ by 125. By tradition, one refers to this long division as the long division
of the numerator 7 by the denominator 125, it being understood that it is
actually the long division of $7 \times 10^3$ (and not 7) by the denominator 125. In TSM,
the conversion of a fraction to a decimal by long division is a rote skill, and part
of this rote skill is to "put the decimal point back in the quotient $N_3$ (= 56)" in a
particular way. In our setting, such a placement of the decimal point (third digit
from the right) is precisely explained (and dictated) by the denominator $10^3$ in
(1.37).
As a second example, consider the fraction $\frac{311}{640}$ in (1.38). Again, since we
already know that the $N_7$ in (1.38) is a whole number (in fact, $N_7 = 4859375$), $N_7$
is now seen to be the quotient of the "long division of the numerator 311 by the
denominator 640" (again it is understood that the dividend is actually $311 \times 10^7$).
Equation (1.39) then shows why $\frac{311}{640}$ is equal to the decimal obtained by placing
the decimal point 7 digits from the right in the quotient $N_7$. Similarly, the fraction
$\frac{15}{32}$ in (1.41) is equal to the decimal obtained by placing the decimal point 5 digits
from the right in the numerator $N_5$ (= 46875), where $N_5$ is the quotient of the long
division of 15 (actually $15 \times 10^5$) by 32 as in (1.40).
We must tie up one last loose end concerning (1.36). To express the fraction
$\frac{7}{125}$ as a finite decimal, do we have to use $10^3$ as in (1.36) or can we use $10^k$ for any
Theorem 1.9. Let $\frac{m}{n}$ be a fraction so that n is a product of 2’s and 5’s. Then
for any sufficiently large whole number k, the long division of $m \times 10^k$ by n has
quotient q (a whole number) and remainder 0, and $\frac{m}{n}$ is equal to the finite decimal
$\frac{q}{10^k}$.
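Theorem 1.9 can be tried out with a few lines of code. The sketch below is illustrative: the function name `decimal_by_long_division` is hypothetical, Python's built-in `divmod` stands in for the long division, and $k \geq 1$ is assumed so that there is at least one decimal digit.

```python
def decimal_by_long_division(m, n, k):
    """Divide m x 10^k by n and place the decimal point k digits from the right.

    Returns the finite decimal as a string. Raises ValueError if the
    remainder is not 0, i.e. if k is too small or m/n is not a finite decimal.
    Assumes k >= 1 and n > 0.
    """
    quotient, remainder = divmod(m * 10 ** k, n)
    if remainder != 0:
        raise ValueError("nonzero remainder: choose a larger k")
    # Pad with zeros on the left so that at least one digit precedes the point.
    digits = str(quotient).rjust(k + 1, "0")
    return digits[:-k] + "." + digits[-k:]

assert decimal_by_long_division(7, 125, 3) == "0.056"
# A larger k also works; it only appends zeros to the decimal.
assert decimal_by_long_division(7, 125, 5) == "0.05600"
assert decimal_by_long_division(311, 640, 7) == "0.4859375"
```

Trying k = 2 for $\frac{7}{125}$ raises the error, because 700 divided by 125 leaves the remainder 75; this is the sense in which k has to be sufficiently large.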
It is clear from the reasoning used to arrive at Theorem 1.9 that the k in
Theorem 1.9 can be taken to be any whole number larger than or equal to both
of the exponents of 2 and 5 when n is expressed as the product of a power of 2
times a power of 5. This latitude in the choice of k is what makes the algorithm
of the conversion of a fraction to decimal so easy to use, because it says that if in
doubt, one can add any number of zeros to the right of the numerator to do the
long division. Such a practical consideration also brings up another observation
about Theorem 1.9, namely, that there is in fact no need to consider any fraction
whose denominator contains factors of both 2 and 5. This is because, if factors of
both 2 and 5 are present, the denominator will be a multiple of 10 (= 2 × 5) and
the factors of 10 can be "split off" from the beginning to simplify the computations.
Let us illustrate with the above fraction $\frac{311}{640}$: because $640 = 2^6 \times (2 \times 5) = 64 \times 10$,
we have
$$\frac{311}{640} = \frac{311}{64} \times \frac{1}{10}.$$
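The procedure described in Theorem 1.9 can be tried out mechanically. The following Python sketch is only an illustration: the function name `finite_decimal` and its interface are assumptions made here, not notation from the text. It takes k to be at least as large as both exponents of 2 and 5 in n, performs the long division of $m \times 10^k$ by n, and places the decimal point k digits from the right in the quotient.

```python
from typing import Optional


def finite_decimal(m: int, n: int, k: Optional[int] = None) -> str:
    """Write m/n as a finite decimal, assuming n is a product of 2's and 5's.

    Hypothetical helper for illustration. It follows Theorem 1.9: divide
    m * 10**k by n, then place the decimal point k digits from the right.
    """
    if m < 0 or n <= 0:
        raise ValueError("expected a fraction m/n with m >= 0 and n > 0")

    # Find the exponents of 2 and 5 in n.
    rest, twos, fives = n, 0, 0
    while rest % 2 == 0:
        rest //= 2
        twos += 1
    while rest % 5 == 0:
        rest //= 5
        fives += 1
    if rest != 1:
        raise ValueError("denominator is not a product of 2's and 5's")

    # Any k at least as large as both exponents will do.
    smallest_k = max(twos, fives)
    if k is None:
        k = smallest_k
    if k < smallest_k:
        raise ValueError("k is too small for the long division to terminate")

    q, r = divmod(m * 10**k, n)  # long division with remainder
    assert r == 0                # remainder 0, as Theorem 1.9 asserts
    if k == 0:
        return str(q)
    digits = str(q).rjust(k + 1, "0")  # pad so a digit precedes the point
    return digits[:-k] + "." + digits[-k:]


print(finite_decimal(311, 640))   # 0.4859375
print(finite_decimal(15, 32))     # 0.46875
print(finite_decimal(7, 125, 5))  # 0.05600
```

The last call uses a k larger than necessary. The extra zeros at the end are harmless, which reflects the latitude in the choice of k.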
64 1. FRACTIONS
Now the general case. Most fractions will not be equal to a finite decimal
because Theorem 3.8 on page 152 says that if a fraction in lowest terms is equal to
a finite decimal, then its denominator is a product of 2’s or 5’s or both. Thus in
general, a fraction will be equal to an infinite decimal, whose definition can only
be given by using the concept of limit; see Chapter 3 of [Wu2020c]. The fact that
every fraction is equal to an infinite repeating decimal is the content of Theorem
3.8 in Section 3.4 of [Wu2020c].43
TSM usually teaches the procedure of "converting a fraction to an infinite
decimal by the long division of the numerator by the denominator" entirely by
rote without the slightest explanation of what an "infinite decimal" is or why the
procedure involving long division is valid. Worse, the simple explanation (implicit
in (1.42) on page 63) that one can attach as many zeros to the right of the numerator
to continue the long division and then put back the decimal point in the quotient
could have been given but never was. While it is true that the definition of an
infinite decimal is beyond the scope of school mathematics, one can nevertheless
approach the conversion of fractions to infinite decimals in school mathematics in
a more civilized manner, as we now explain. Consider, for definiteness, the decimal
conversion of $\frac27$. Instead of saying $\frac27$ is equal to the infinite repeating decimal
$0.\overline{285714}$, we can say instead that we can approximate $\frac27$ as closely as we like by a
finite decimal. For example, suppose we want a finite decimal that is within $1/10^6$
of $\frac27$. Following up our success with equation (1.36) on page 61, we should at least
try something like the following:
$$\frac{2}{7} = \left(\frac{2 \times 10^8}{7}\right) \times \frac{1}{10^8}.$$
The hope is that the fraction in parentheses on the right will yield something that
serves our purpose. We use $10^8$ instead of $10^6$ because we are unsure of our footing
here and want to play it safe (we could even use $10^{10}$ if $10^8$ turns out to be not good
enough). By the long division of $2 \times 10^8$ by 7, we get the division-with-remainder
(1.43) $2 \times 10^8 = (28571428 \times 7) + 4.$
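Thus,
$$\frac{2}{7} = \frac{(28571428 \times 7) + 4}{7} \times \frac{1}{10^8} = \left(28571428 + \frac{4}{7}\right) \times \frac{1}{10^8} = \frac{28571428}{10^8} + \left(\frac{4}{7} \times \frac{1}{10^8}\right).$$
Equivalently,
$$\frac{2}{7} - 0.28571428 = \frac{4}{7} \times \frac{1}{10^8},$$
and since $\frac47 < 1$, the right side is positive and smaller than $\frac{1}{10^8}$; i.e.,
(1.44) $0 < \frac{2}{7} - 0.28571428 < \frac{1}{10^8}.$
In any case, the decimal 0.28571428 is a finite decimal approximation of $\frac27$ with an
error no bigger than $1/10^8$. Since $1/10^8 < 1/10^6$, this decimal approximation is
certainly within $1/10^6$ of $\frac27$.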
There is a more intuitive way to write (1.44). First, it follows from (1.44) that
$$0.28571428 < \frac{2}{7} < 0.28571428 + \frac{1}{10^8}.$$
Since $\frac{1}{10^8} = 0.00000001$, the addition algorithm for decimals therefore implies that
(1.45) $0.28571428 < \frac{2}{7} < 0.28571429.$
This is a more reasonable way of saying that 0.28571428 is a finite decimal
approximation of $\frac27$ up to an error of $1/10^8$.
In a similar manner, if we want a finite decimal that approximates $\frac27$ to within
$1/10^{15}$, then we would begin instead with
$$\frac{2}{7} = \left(\frac{2 \times 10^{15}}{7}\right) \times \frac{1}{10^{15}}.$$
(We now use $10^{15}$ because our experience above with $10^8$ has emboldened us.) An
analogous calculation now gives
$$0 < \frac{2}{7} - 0.285714285714285 < \frac{1}{10^{15}}.$$
It follows that the decimal 0.285714285714285 is within $1/10^{15}$ of the fraction $\frac27$.
As in (1.45), we also have
(1.46) $0.285714285714285 < \frac{2}{7} < 0.285714285714286.$
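The two approximations of $\frac27$ above follow one pattern, which can be checked with exact arithmetic. The Python sketch below is an illustration under our own naming (the function `decimal_approximation` is hypothetical): it carries out the division-with-remainder of $m \times 10^k$ by n and confirms that the resulting finite decimal differs from the fraction by less than $1/10^k$.

```python
from fractions import Fraction


def decimal_approximation(m: int, n: int, k: int):
    """Return (q, r) from the division-with-remainder of m * 10**k by n.

    Then m/n - q/10**k = (r/n) * (1/10**k), and because r < n the error
    is at least 0 and smaller than 1/10**k.
    """
    return divmod(m * 10**k, n)


for k in (8, 15):
    q, r = decimal_approximation(2, 7, k)
    approximation = Fraction(q, 10**k)          # the finite decimal q / 10^k
    error = Fraction(2, 7) - approximation
    assert error == Fraction(r, 7) * Fraction(1, 10**k)
    assert 0 <= error < Fraction(1, 10**k)
    print(k, q, r)
# 8 28571428 4
# 15 285714285714285 5
```

Exact fractions are used instead of floating-point numbers so that the inequalities are verified exactly and not merely up to rounding.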
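(9) The following is one of several approaches to the division of fractions that
one finds in TSM:$^{44}$
We try to find out what $\frac{k/\ell}{m/n}$ could mean. Using equivalent fractions, we get
$$\frac{\frac{k}{\ell}}{\frac{m}{n}} = \frac{\frac{k}{\ell} \times \ell n}{\frac{m}{n} \times \ell n} = \frac{\frac{k\ell n}{\ell}}{\frac{m\ell n}{n}} = \frac{kn}{\ell m},$$
and therefore
$$\frac{\frac{k}{\ell}}{\frac{m}{n}} = \frac{kn}{\ell m}.$$
Point out all the flaws in this explanation.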
(10) Here is another approach to the division of fractions according to
[Davis-Pearn, p. 10]. It begins by explaining that $1\frac34 \div \frac12 = 3\frac12$ can be modeled by
"How many $\frac12$-minutes are there in $1\frac34$ minutes?" Now 3 of the $\frac12$-minutes
is $1\frac12$ minutes, while 4 of the $\frac12$-minutes is 2 minutes, and already $2 > 1\frac34$.
So $1\frac34 \div \frac12$ is between 3 and 4. Since $1\frac34 - 1\frac12 = \frac14$ and $\frac14$ minutes is exactly
$\frac12$ of a $\frac12$-minute unit, there are exactly $3\frac12$ of the $\frac12$-minutes units in $1\frac34$.
That said, the explanation of $1\frac34 \div \frac12 = 3\frac12$ continues:
$\frac{m}{n}$ of a small pizza. During this special sale, it sells $\frac12$ of a pizza for the
usual price of $\frac13$ of a pizza.46 At the sales price, how much would $8\frac23$ small
pizzas cost?
(15) Use the number line to solve the following: if $\frac{5}{13}$ of a number N is 8 more
than a third of N, what is N?
(16) Show that there is a rectangle with area < 1 sq. cm and perimeter equal
to 1,000 cm.
$$\frac{1.2}{31.5} + \frac{3.7}{0.008} = \frac{\frac{12}{10}}{\frac{315}{10}} + \frac{\frac{37}{10}}{\frac{8}{10^3}} = \frac{12}{315} + \frac{3700}{8} = \frac{1165596}{2520}.$$
46 I got this idea from my late friend David Collins. We believe that if all pizza parlors would
buy into this idea, the national fractions achievement would improve.
47 This is a confusing piece of terminology because it suggests that complex numbers are
involved, but they are not. Since this is the terminology already in use in school mathematics and
the confusion is tolerable, we will go along with it. Such compromises are unavoidable.
1.6. COMPLEX FRACTIONS 69
However, if $x = \frac34$, for example, then this equality becomes an equality that requires
the product formula for complex fractions for its justification:
$$\frac{\frac34}{\frac34 - 1} \times \frac{7}{(\frac34)^3 + 2} = \frac{\frac34 \times 7}{(\frac34 - 1)\,\bigl((\frac34)^3 + 2\bigr)}.$$
Have students been properly prepared to accept that this computation is correct?
The answer is an emphatic no. Many other computations with rational expressions
related to addition or subtraction also assume a knowledge of such computations
with complex fractions, implicit as that may be. See, for example, page 316. Yet
TSM utters nary a word about this requisite background information. This does
real damage to student learning.
A main goal of school mathematics education is to help students acquire the
ability to move from assumptions to conclusions by the use of reasoning. This
ability is what we hope will mold students into independent, critical thinkers. The
inculcation of the ability to reason is a delicate process under the best of circum-
stances, and its chance of success is irrevocably diminished if there are gaping holes
imbedded in the very reasoning we try to teach. What is the point, for example, of
paying close attention to the reasoning (if it is given) that the formula for fraction
addition, (1.48), is correct for whole numbers k, $\ell$, m, and n when (1.48) is soon
applied to situations where k, $\ell$, m, and n are fractions? Students cannot help but
ask: where is the reasoning now? Taking such examples as a cue, their survival
instincts will lead them to the conviction that, in mathematics, it is not reason-
ing but improvisation and expediency that matter. It then comes to pass that,
once students are taught how multiplication can be "distributed" over addition
(the distributive law), many will feel no inhibition in "distributing" squaring over
addition (i.e., $(x + y)^2 = x^2 + y^2$) or even "distributing" the taking of square-root
over addition (i.e., $\sqrt{x + y} = \sqrt{x} + \sqrt{y}$). While it will take serious education
research to establish a causal relationship between these two kinds of phenomena,
few would argue that abusing (1.48) and similar formulas about fractions will serve
to reinforce the importance of reasoning in students’ minds.
This volume asks you, as a teacher, to avoid the kind of TSM-inspired rote
teaching that we have just described and replace it with the correct reasoning that
underlies the operations with complex fractions.
Here is a brief summary of the basic facts of complex fractions that figure
prominently in school mathematics: let A, . . . , F be fractions, and we assume fur-
ther that they are nonzero where appropriate in the following. Then:
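(a) Cancellation law: If $C \neq 0$, then $\frac{AC}{BC} = \frac{A}{B}$.
Example: $\dfrac{\frac{2.9}{7} \times 17}{\frac{2}{3} \times 17} = \dfrac{\frac{2.9}{7}}{\frac{2}{3}}$.
(b) $\frac{A}{B} = \frac{C}{D}$ if and only if $AD = BC$.
$\frac{A}{B} < \frac{C}{D}$ if and only if $AD < BC$.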
Example: If $\frac45 \times \frac{16}{3} < \frac{m}{n} \times \frac{13}{2}$, then $\dfrac{\frac45}{\frac{m}{n}} < \dfrac{\frac{13}{2}}{\frac{16}{3}}$.
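(c) $\frac{A}{B} \pm \frac{C}{D} = \frac{(AD) \pm (BC)}{BD}$.
(d) Product formula: $\frac{A}{B} \times \frac{C}{D} = \frac{AC}{BD}$.
Example: $\dfrac{0.21}{0.037} \times \dfrac{\frac13}{2.6} = \dfrac{0.21 \times \frac13}{0.037 \times 2.6}$.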
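Individual instances of the rules for complex fractions can be checked with exact rational arithmetic. The Python sketch below is only a sanity check of the examples in this section and not a proof of the rules. The sample values standing in for $\frac{m}{n}$ in (b) are our own choices, and we take the product formula for (d).

```python
from fractions import Fraction

# (a) Cancellation law, with A = 2.9/7, B = 2/3, C = 17 as in the example.
A, B, C = Fraction("2.9") / 7, Fraction(2, 3), Fraction(17)
cancel_lhs = (A * C) / (B * C)
cancel_rhs = A / B
assert cancel_lhs == cancel_rhs

# (b) Cross-multiplication inequality, with A = 4/5, C = 13/2, D = 16/3
# and a few sample values (our choice) standing in for B = m/n.
A, C, D = Fraction(4, 5), Fraction(13, 2), Fraction(16, 3)
for B in (Fraction(1, 2), Fraction(2, 3), Fraction(1), Fraction(7, 4)):
    assert (A / B < C / D) == (A * D < B * C)

# (c) Addition formula applied to 1.2/31.5 + 3.7/0.008.
A, B, C, D = (Fraction(s) for s in ("1.2", "31.5", "3.7", "0.008"))
sum_direct = A / B + C / D
sum_formula = (A * D + B * C) / (B * D)
assert sum_direct == sum_formula == Fraction(1165596, 2520)

# (d) Product formula applied to 0.21/0.037 x (1/3)/2.6.
A, B, C, D = Fraction("0.21"), Fraction("0.037"), Fraction(1, 3), Fraction("2.6")
product_direct = (A / B) * (C / D)
product_formula = (A * C) / (B * D)
assert product_direct == product_formula
```

Because `Fraction` stores every decimal as an exact quotient of whole numbers, each division above is literally a complex fraction converted to an ordinary one.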
Remark. If you are curious about whatever happened to the division of com-
plex fractions, it is a consequence of (a) and (d); see Exercise 4 on p. 72.
Assertions (a), (b), (c), and (d) are the generalized versions of the cancellation
law (page 20), the cross-multiplication algorithm (page 22) and cross-multiplication
inequality (page 36), the addition and subtraction formulas (pages 33 and 39), and
the product formula (page 46), respectively, for ordinary fractions. We call explicit
attention to the fact that (c) and (d) justify the computations with $\frac{1.2}{31.5} + \frac{3.7}{0.008}$ and
$\frac{0.21}{0.037} \times \frac{84.3}{2.6}$ on pages 68 and 69. Note also that it follows immediately from (a) that
the cancellation rule for fractions (equation (1.29) on page 47) continues to hold
for complex fractions: $\frac{CE}{D} \times \frac{A}{BE} = \frac{AC}{BD}$ if $E \neq 0$. For example,
(2) (a) Divide 98 into two parts A and B (i.e., A and B are fractions and
A + B = 98) so that $\frac{A}{B} = \frac67$. (b) Divide 27 into two parts A and B so that
$\frac{A}{B} = \frac45$.
(3) Two fractions x and y satisfy $\frac{x}{y} = \frac{3}{10}$ and $xy = \frac{15}{8}$. What are x and y?
(4) (i) Let A and B be nonzero fractions. Prove that $\left(\frac{A}{B}\right)^{-1} = \frac{B}{A}$. (ii) Prove
the following invert-and-multiply rule for complex fractions: let A, B, C,
D be nonzero fractions; then
$$\frac{\frac{A}{B}}{\frac{C}{D}} = \frac{AD}{BC}.$$
(5) (i) Prove (b) on page 70 by the mechanical procedure of converting both
sides to ordinary fractions. (ii) Now prove (b) again, but this time by
employing the reasoning used in the text to prove (a). (iii) Repeat (i) and
(ii) on (c) and (d) on page 71.
(6) Explain, in as simple a manner as possible, approximately where the frac-
tion 821 is on the number line. (This is a mathematical problem, which
26 2
means that you have to be precise even when you make approximations.)
(7) (a) Find the fraction K so that $K - 2.5 = 4\frac13 - K$. (b) Find the fraction
A so that $\frac{A - 2.5}{4.1 - A} = \frac23$. (c) Can you interpret the statements in (a) and (b)
geometrically?
(8) If x, y are nonzero fractions, what is $\dfrac{1}{\frac12\left(\frac1x + \frac1y\right)}$? (This expression for x
of "rate" can be considered: average rate and constant rate. We also emphasize
the underlying similarity among the typical school "rate" problems: speed, house-
painting, lawn-mowing, and water flow from a faucet.
Percent (p. 73)
Ratio (p. 75)
Constant speed (p. 76)
Other kinds of constant rate (p. 81)
Percent
Solutions.
(i) If x is 5% of 24, then $x = 5\% \times 24 = \frac{5}{100} \times 24 = \frac{120}{100} = \frac65$.
(ii) If 5% of a certain number y is 16, then again strictly from the definition
of "percent", this translates into $(5\%)y = 16$; i.e., $\frac{5}{100} \times y = 16$. Multiplying both
sides by $\frac{100}{5}$, we get
$$y = 16 \times \frac{100}{5} = 320.$$
(iii) Suppose 9 is N% of 24 for some fraction N. This translates into
$9 = N\% \times 24$, or $9 = \frac{N}{100} \times 24$. Multiplying both sides by $\frac{100}{24}$, we have $9 \times \frac{100}{24} = N$, so
that
$$N = \frac{900}{24} = \frac{75}{2} = 37\frac12.$$
So the answer to (iii) is $37\frac12\%$.
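The three solutions can be replayed with exact arithmetic. In the Python sketch below, the helper `percent` is a hypothetical name of ours that encodes the reading of N% as the (possibly complex) fraction $\frac{N}{100}$. The three computations then mirror (i), (ii), and (iii) step by step.

```python
from fractions import Fraction


def percent(n):
    """N% read as the fraction N/100; N itself may be a fraction."""
    return Fraction(n) / 100


# (i) x is 5% of 24.
x = percent(5) * 24

# (ii) 5% of y is 16, so y = 16 x (100/5).
y = 16 * Fraction(100, 5)
assert percent(5) * y == 16

# (iii) 9 is N% of 24, so N = 9 x (100/24).
N = 9 * Fraction(100, 24)
assert percent(N) * 24 == 9   # here N% is a complex fraction

print(x, y, N)   # 6/5 320 75/2
```

The check in (iii) is the one place where `percent` receives a fraction that is not a whole number, which is exactly where complex fractions enter.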
We should point out the critical role played by complex fractions in the solution
of (iii). Since N is a priori a fraction (and actually turns out to be a fraction), N %
is a complex fraction. Therefore the calculations in (iii) had to make use of (a) and
(d) on pp. 70 and 71. This is the first and only time we will take the trouble to point
out the numerous instances in which complex fractions are used in an essential way
in the rest of this section.
What we can conclude from this short discussion is that if students have an
adequate background in fractions and have been carefully instructed in the use of
symbols, the concept of percent is straightforward and involves no subtlety. Once
49 Always remind your students that if they do not know definitions, they are not in a position
to do mathematics, in the same way that anyone who has no vocabulary is not in a position to
write novels.
1.7. PERCENT, RATIO, AND RATE PROBLEMS 75
this kind of instruction has been implemented in the school classroom, then ed-
ucation research would be in a position to shed light on what the real learning
difficulties might be. Until then, we should concentrate on meeting the minimum
requirements of the fundamental principles of mathematics (see page xiii in the
preface), which includes providing clear and precise definitions of all the concepts
and the requisite reasoning for all the claims. Note however that such a definition
of percent could not be given if the concept of a complex fraction were not available.
Ratio
Next we take up the concept of ratio. In TSM, this concept is encrusted with
excessive verbiage. It would be expedient, therefore, to begin with a short definition.
In connection with ratio, there are some common expressions that need to be
made explicit. To say that the ratio of boys to girls in a classroom is 3 to 2 is
to say that if B (respectively, G) is the number of boys (respectively, girls) in the
classroom, then the ratio of B to G is $\frac32$. Similarly, in making a fruit punch, the
statement that the ratio of fruit juice to rum is 7 to 2 means that we are
comparing the volumes of the two fluids (because the use of volume for measurement
is understood in this situation), and if the amount of fruit juice is A fluid ounces
and the amount of rum is B fluid ounces, then the ratio of A to B is $\frac72$, and so on.
We will now work through some standard problems on ratios by using rea-
soning strictly based on this definition, without any guesswork. This is in fact
how mathematics should be done, but TSM makes it necessary for students to ig-
nore whatever "definitions" they are given and use rote solution templates to solve
problems—especially problems about ratios and rate (our next topic). The clarity
of the ensuing discussion, as well as the ease with which we dispatch the problems,
may serve as a persuasive argument for the need for precise definitions.
Solution. Let the number of boys be B and the number of girls be G, then
we are given that $\frac{B}{G} = \frac{11}{13}$. Thus by the cross-multiplication algorithm, $13B = 11G$.
Let k be this common number, i.e., $13B = 11G = k$, so $B = \frac{k}{13}$ and $G = \frac{k}{11}$. Now
we are also given $B + G = 696$, so $\frac{k}{13} + \frac{k}{11} = 696$. This gives $\frac{24k}{143} = 696$, and therefore
$24k = 143 \times 696$; i.e., $k = 29 \times 143$. Since $B = \frac{k}{13}$, we get $B = 319$. The value
of G can be obtained from either $B + G = 696$ or from $G = \frac{k}{11}$. In any case, $G = 377$.
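The method of the preceding solution does not depend on the particular numbers. The Python sketch below (the function name `split_by_ratio` is an assumption of ours) repeats the same steps for a general total and a general ratio a to b: it introduces the common number k, solves for k from the sum, and then recovers the two parts.

```python
from fractions import Fraction


def split_by_ratio(total, a, b):
    """Split `total` into two parts whose ratio is a to b (a, b nonzero).

    Follows the solution in the text: part1/part2 = a/b gives
    b * part1 = a * part2 = k, so part1 = k/b and part2 = k/a,
    and part1 + part2 = total determines k.
    """
    total, a, b = Fraction(total), Fraction(a), Fraction(b)
    k = total / (1 / b + 1 / a)
    return k / b, k / a


boys, girls = split_by_ratio(696, 11, 13)
assert boys + girls == 696
assert boys / girls == Fraction(11, 13)
print(boys, girls)   # 319 377
```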
Example 2. Divide 88 into two parts so that their ratio is $\frac23$ to $\frac45$.
In this short discussion, we have intentionally used only one method to do both
examples to emphasize the simplicity of such problems. We leave to an exercise the
exploration of other methods of solution (Exercise 5 on p. 85).
Constant speed
traveled after a time interval of length t (hours, minutes, etc.) is vt feet, miles, etc.
We will denote this distance by d(t) (and not just d) to indicate that it depends on
the length of the time interval t. Thus, we will make the case, heuristically, that if
a motion is of constant speed, then
(1.50) d(t) = vt for any t ≥ 0.
We write vt and not tv because, v being a constant (i.e., a fixed number), we respect
the notational convention of putting a constant before the symbol t that can take
on infinitely many values. See page 302 below. (You may notice at this point that
this discussion resembles in spirit the earlier one about the definition of fraction
multiplication on pp. 44ff.)
As we have just said, this number d(t) depends on what t is; d(t) changes with
t. You will recognize from your exposure to calculus that d(t) is a function of t, and
our notation duly reflects this fact. However, in a middle school classroom, it is not
necessary to emphasize this aspect of the concept of constant speed at this point.
For school students, the sophistication of this definition lies in the fact that, for the
purpose of checking whether a motion has constant speed, it requires the checking
of an infinite number of multiplications, namely, the product of t by v for every
t ≥ 0. This is a departure from the mathematics of elementary school where one
is called upon to do only a small number of computations for each problem. The
constant v is what is usually called the speed of the object and its unit will depend
on the units we use for distance and time. If these are ft and sec, respectively, then
the unit of v will be feet per second; if they are yd and min, respectively, then
the unit of v will be yards per minute, etc. For definiteness, let us say the unit
for distance is miles and the unit for time is hours; then the speed is miles per
hour, and we abbreviate it to mph as usual. For example, driving at a constant
speed of 60 mph means the total distance d(1) traveled after 1 hour is 60 × 1 = 60
mi (according to (1.50)), and the total distance d(3.2) traveled after 3.2 hours is
60 × 3.2 = 192 mi, etc.
Here is the heuristic argument that will lead us to (1.50). Let v be a fixed
number. We can easily agree that the following two statements capture the naive
idea of "motion of constant speed":
(a) The object travels v miles in every one-hour interval.
(b) In any two time intervals of the same length, the object
travels the same distance.52
On the strength of these two plausible statements (a) and (b), we now show that
(1.50) must hold.
Case 1: t is a whole number. This is the litmus test: if (1.50) cannot be shown
to be true on the basis of (a) and (b) when t is a whole number, then clearly there
would be no point in going further. But if t is a whole number, then the time
interval of length t is t copies of a 1-hour interval. In each of these 1-hour intervals,
the object travels v miles, by (a). Therefore in a time interval of t hours, the object
travels v + v + · · · + v (t times) miles, which is tv = vt miles. Thus d(t) = vt as in
(1.50) when t is a whole number.
52 Note that statement (a) by itself does not guarantee "constant speed" even in the naive
Case 2: t is a unit fraction; i.e., $t = \frac1n$ for some nonzero whole number n. Let
the distance that the object travels in $\frac1n$ hours be D miles; i.e., $d(\frac1n) = D$ (observe
that, by (b), it does not matter which $(\frac1n)$-hour interval we use). Therefore, since a
1-hour interval is n copies of a $(\frac1n)$-hour interval, the distance traveled in 1 hour is
$$\underbrace{D + D + \cdots + D}_{n} = nD = Dn \text{ miles}.$$
But in 1 hour, the object is known to travel v miles, by (a). Therefore, $v = Dn$, and
multiplying both sides by $\frac1n$ yields $D = v\,\frac1n$; i.e., $d(\frac1n) = v\,\frac1n$. So again, $d(t) = vt$ is
correct when $t = \frac1n$.
Thus $d(\frac{m}{n}) = v\,\frac{m}{n}$ and we see that (1.50) is also true when $t = \frac{m}{n}$ for any fraction $\frac{m}{n}$.
What the preceding heuristic argument shows is that for a motion that is of con-
stant speed v in the naive sense of statements (a) and (b) above, we have d(t) = vt
for all fractions t. In other words, equation (1.50) holds when t is a fraction. Con-
versely, it is easy to see that if d(t) = vt for all t ≥ 0, then statements (a) and (b)
are true. Therefore the naive notion of what constitutes constant speed in the form
of (a) and (b) is equivalent to the single equation (1.50) to the effect that d(t) = vt
for any t ≥ 0. Since the latter is clearly simpler and easier to use, we adopt it as
the formal definition of constant speed:
When the motion is known to have constant speed v in the sense of this defi-
nition, we can abbreviate "constant speed" to "speed" and call v the speed of the
motion. We repeat, the concept of "speed" in school mathematics makes sense only
in the case of "constant speed". This is therefore a paradoxical situation: because
our mathematical repertoire is limited, we have to first define "constant speed"
before we can talk about "speed". If calculus is available, then the natural order
of things will be restored, as "speed" can then be defined before "constant speed".
As in the case of the definition of fraction multiplication on pp. 44ff, every piece
of reasoning about constant speed from now on must be based on this definition
and on this definition alone. In particular, the heuristic argument leading up to the
definition will play no role in the logical development from this point forward.
In the definition of constant speed, clearly we may restrict our attention only to
those t so that t > 0. Then using the definition of division, especially the statement
(**) on page 57, we may equivalently reformulate the concept of constant speed
using division instead of multiplication: a motion has constant speed v (v is a fixed
number) if for any number t > 0, the distance d(t) (feet, miles, etc.) traveled by
the object in any time interval of length t (seconds, minutes, etc.) satisfies
(1.51) $\dfrac{d(t)}{t} = v$ for all $t > 0$.
This is the precise formulation of the common notion that "speed is distance di-
vided by time", but, we repeat, this phrase has no meaning whatsoever in school
mathematics except in the special case of constant speed.
We now describe yet another way of defining constant speed that is important
for solving certain types of word problems. For an object in motion, we introduce
the concept of its average speed over the time interval from $t_1$ to $t_2$, $t_1 < t_2$,
as
(1.52) $\dfrac{\text{total distance traveled from time } t_1 \text{ to } t_2}{t_2 - t_1}.$
Pedagogical Comments. In the school classroom, two aspects of the defi-
nition of average speed in (1.52) merit special emphasis. First, the term "average
speed" by itself carries no information until we know the time interval—from a
specific point in time t1 to another specific point in time t2 —in which to compute
the average speed in question. Thus an average speed has to be measured in a
specific time interval. In addition, because the terminology "average" stimulates
the conditioned reflex of "add two numbers and divide by 2", students need to put
this conditioned reflex in check. "Average speed" is not the average of two speeds
any more than a "Venus flytrap" is a flytrap from Venus or a "sea lion" is a lion
from the sea. These added cognitive complexities associated with "average speed"
are thus things you must impress on your students because their sensitivity to the
difference between a technical term and a phrase in everyday language is something
they need to acquire in order to go further in mathematics or science. End of
Pedagogical Comments.
Now suppose we have an object moving at constant speed. Then the total
distance traveled from time $t_1$ to time $t_2$ is equal to
(total distance traveled from time 0 to $t_2$)
− (total distance traveled from 0 to $t_1$).
This is equal to $vt_2 - vt_1$. Therefore the average speed of a motion of constant
speed v over the time interval from $t_1$ to $t_2$ is
$$\frac{vt_2 - vt_1}{t_2 - t_1} = \frac{v(t_2 - t_1)}{t_2 - t_1} = v.$$
Therefore, a motion of constant speed v has the same average speed over any time
interval. This then leads naturally to the next theorem.
Proof. The preceding discussion proves that if the object moves at constant speed
v, then its average speed over any time interval is also v. Conversely, suppose the
average speed of the object over any time interval is always equal to a constant v.
We have to prove that the motion is one of constant speed; i.e., we must prove that
given any number t > 0, the total distance d(t) traveled by the object during a time
interval of length t satisfies d(t) = vt. Let the motion of the object start at time
0. With t given, let us measure the average speed of this motion during the time
interval from $t_0$ to $t_0 + t$, where $t_0$ is an arbitrary nonnegative number. Then the
total distance traveled in this time interval of length t is equal to the difference,
(1.53) (total dist. traveled from 0 to t0 + t) − (total dist. traveled from 0 to t0 ).
Now by the hypothesis that the average speed over any time interval is v, we have
Theorem 1.10 tells us that, in the case of constant speed, we can talk about the
"average speed" of a motion without referencing the time interval over which the
average speed is measured.
Mathematical Aside: One can gain a better understanding of one half of The-
orem 1.10 from the point of view of calculus: we will show why constant speed
implies all average speeds are equal to a constant. Let d(t) be the total distance
traveled by an object from the starting point at time 0 to its position at time t.
Suppose the motion is one of constant speed v for a fixed number v; i.e., the
derivative $d'(t)$ is equal to v. Now a function whose derivative is equal to a constant v
is a linear polynomial of the form $vt + c$ for some constant c. Thus $d(t) = vt + c$.
Consider now the total distance traveled from time $t_1$ to time $t_2$. Clearly,
total distance traveled from time $t_1$ to $t_2$ $= d(t_2) - d(t_1)$.
Since $d(t) = vt + c$, we have
$$d(t_2) - d(t_1) = (vt_2 + c) - (vt_1 + c) = v(t_2 - t_1).$$
Therefore the average speed from time $t_1$ to $t_2$ is (according to (1.52))
$$\frac{v(t_2 - t_1)}{t_2 - t_1} = v.$$
This then explains why constant speed means that the average speed over any time
interval is equal to a fixed constant.
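The relation between (1.50) and (1.52) can also be observed numerically. The Python sketch below is an illustration only. The function names, the sample time intervals, and the second motion with $d(t) = t^2$ are assumptions of ours, the last one introduced purely for contrast. It computes the average speed over several time intervals for a motion satisfying $d(t) = vt$ with v = 60 and for the invented motion.

```python
from fractions import Fraction


def average_speed(d, t1, t2):
    """Average speed over the time interval from t1 to t2, as in (1.52).

    `d` is a function giving the total distance traveled from time 0 to t.
    """
    return (d(t2) - d(t1)) / (t2 - t1)


V = Fraction(60)  # assumed constant speed in mph


def constant_speed_distance(t):
    return V * t  # d(t) = vt, equation (1.50)


def hypothetical_distance(t):
    return t * t  # an invented motion that is NOT of constant speed


intervals = [
    (Fraction(0), Fraction(1)),
    (Fraction(1, 2), Fraction(16, 5)),
    (Fraction(2), Fraction(7, 3)),
]

# Constant speed: the average speed is v over every interval tried.
for t1, t2 in intervals:
    assert average_speed(constant_speed_distance, t1, t2) == V

# The invented motion: the average speed changes with the interval.
varying = [average_speed(hypothetical_distance, t1, t2) for t1, t2 in intervals]
print(varying)
```

For the second motion the printed values differ from one interval to the next, which is why "average speed" carries no information until the time interval is specified.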
In TSM,54 the speed of an object is the "rate" of motion in moving from one
place to another. The trouble with such a statement is that it makes some sense
intuitively but, in the setting of school mathematics, it is ultimately confusing
because TSM has promoted "rate" to be a fundamental concept that purports to
test students’ so-called "conceptual understanding" and, yet, TSM fails to produce
a definition of "rate".55 So the first thing we should do in approaching "rate
problems" is to be forthright with students and tell them that, in K–12, "rate"
should be understood as a figure of speech that refers vaguely to problems related to
"work" in a naive sense. All we can do in K–12 is to define precisely the concepts of
"average rate" and "constant rate" in specific instances such as speed, lawn-mowing,
house-painting, water-flow from a faucet, etc. All so-called "rate problems" in K–12
are about either constant rate or average rate, no more and no less. For example,
in problems related to the painting of (the exterior walls of) a house, the rate there
would be in terms of the number of square feet painted per day or per hour. Or,
in problems related to lawn-mowing, the rate in question would be in terms of the
number of square feet mowed per hour or per minute. A third example is water
flowing out of a faucet and filling a container, and the rate in that case would be
in terms of the number of gallons (or liters) of water coming out per minute or per
second. Imitating the discussion of constant speed and average speed (over a time
interval) in the preceding subsection, the concepts of constant rate and average rate
can be analogously defined. For example, a constant rate of lawn-mowing can
be defined in one of two equivalent ways. One is to say that there is a constant r
(whose unit is square feet per hour), so that if $A_T$ is the total area that is mowed
53 Those about freely falling objects only show up in high school in the context of quadratic
functions.
54 See page xiv for the definition of TSM.
55 Without calculus, it is impossible to explain what "rate" means.
in T hours, then $A_T = rT$ no matter what T may be. The other is to define the
average rate of lawn-mowing from time $T_1$ to time $T_2$ as the division $\frac{A_{12}}{T_2 - T_1}$,
where $A_{12}$ is the total area mowed from time $T_1$ to time $T_2$. Then the lawn is said
to be mowed at a constant rate if the average rate of the lawn being mowed
over any time interval is equal to a fixed constant. The equivalence of these two
definitions is by virtue of Theorem 1.10 on page 79 or, more precisely, by the exact
counterpart of this theorem for lawn-mowing.
For further comments on "rate" problems as they appear in TSM, go to the
Pedagogical Comments on page 84.
We now show how to give four different formulations of the same rate problem:
(P1) Regina drives from Town A to Town B in 10 hours, and Eric in 12.
Assuming that each drives at a fixed constant speed, Regina from Town A to Town
B, and Eric from Town B to Town A, and that they get started at the same time
and drive on the same highway, after how many hours will they meet in between?
(You may assume that each car has the size of a point.)
(P2) Regina mows a lawn in 10 hours, and Eric in 12. Assuming that each
mows at a fixed constant rate, how long would it take them to mow the same lawn
if they start mowing at the same time and mow together without interfering with
each other?
(P3) Regina paints a (small) house in 10 hours, and Eric in 12. Assuming
that they start painting at the same time and each paints at a fixed constant rate,
how long would it take them to paint the same house if they paint together without
interfering with each other?
(P4) A faucet can fill a tub in 10 minutes, and a second faucet in 12. Assum-
ing that the rate of the water flow remains constant in each faucet, how long would
it take to fill the same tub if both faucets are turned on at the same time?
By the distributive law, we have $dT\left(\frac{1}{10} + \frac{1}{12}\right) = d$. Since d is just a number,
multiplying both sides by the complex fraction $\frac1d$ (and using rules (a) and (d) on
page 71) gives $\left(\frac{1}{10} + \frac{1}{12}\right)T = 1$. By the definition of division, we get
$$T = \frac{1}{\frac{1}{10} + \frac{1}{12}} = 5\frac{5}{11} \text{ (hours)}.$$
It may be instructive if we also solve problem (P2) for comparison.
Let the area of the lawn be A sq. ft. Because in 10 hours Regina can mow the
whole lawn, i.e., A sq. ft., her (constant) rate of lawn-mowing $r_R$ is, by definition,
$\frac{A}{10}$ sq. ft. per hour. Similarly, Eric’s rate of lawn-mowing $r_E$ is $\frac{A}{12}$ sq. ft. per hour.
Now suppose the two together can finish mowing the lawn in T hours. So in T
hours, Regina mows $r_R T$ sq. ft., because she mows at a constant rate; thus she
mows $\frac{AT}{10}$ sq. ft. in T hours. Similarly, in T hours, Eric mows $\frac{AT}{12}$ sq. ft. Because
they mow with no interference from each other, the sum total of the areas they
mow in T hours adds up exactly to A; i.e.,
(1.55) $\frac{AT}{10} + \frac{AT}{12} = A.$
By the distributive law, $AT\left(\frac{1}{10} + \frac{1}{12}\right) = A$. Multiplying both sides by $\frac1A$, we get
$T\left(\frac{1}{10} + \frac{1}{12}\right) = 1$, so that
$$T = \frac{1}{\frac{1}{10} + \frac{1}{12}} = 5\frac{5}{11} \text{ (hours)},$$
exactly as before.
Observe that equations (1.54) and (1.55) are mathematically identical equa-
tions.
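Since problems (P1) through (P4) all lead to the same equation, one short computation serves for each of them. The Python sketch below uses a hypothetical function name, `time_together`, and an arbitrary lawn area of our own choosing. It evaluates the complex fraction for T and then confirms equation (1.55) for that area.

```python
from fractions import Fraction


def time_together(time_first: int, time_second: int) -> Fraction:
    """Solve (1/a + 1/b) * T = 1, where a and b are the individual times."""
    return 1 / (Fraction(1, time_first) + Fraction(1, time_second))


T = time_together(10, 12)
whole, part = divmod(T, 1)          # split 60/11 into 5 and 5/11
print(T, "=", whole, "+", part)     # 60/11 = 5 + 5/11

# Equation (1.55) holds for any area A; 3000 sq. ft. is an arbitrary choice.
A = Fraction(3000)
assert A * T / 10 + A * T / 12 == A
```

The area A cancels out of the computation, as it does in the text when both sides are multiplied by $\frac1A$.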
Example 3. Tom and May drive on the same highway at constant speed.
Tom starts 15 minutes before May, and his speed is 48 mph. May starts from the
same spot as Tom and her speed is 60 mph. How many hours after Tom leaves will
May catch up with him? (It is understood that Tom’s car and May’s car could be
idealized to be two points on the number line in doing this problem.)
time duration, but at t = 0, Tom is already 12 miles away from the starting point.
Therefore after t hours, he is (12+48t) miles from the starting point. Since they are
the same distance away from the starting point t hours after May starts driving, we
have $60t = 12 + 48t$. Therefore $12t = 12$, and $t = 1$ hour. Since May starts $\frac14$ hour
after Tom, it takes $1 + \frac14 = 1\frac14$ hours after Tom leaves for May to catch up with him.
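The solution of Example 3 uses only the definition $d(t) = vt$ for each driver. As a check, the Python sketch below (variable names are ours) recomputes Tom's head start, solves $60t = 12 + 48t$ for t, and adds back the quarter hour.

```python
from fractions import Fraction

tom_speed = Fraction(48)          # mph
may_speed = Fraction(60)          # mph
head_start = Fraction(15, 60)     # 15 minutes, expressed in hours

lead = tom_speed * head_start                 # Tom's distance when May starts
t = lead / (may_speed - tom_speed)            # solves 60t = 12 + 48t
assert may_speed * t == lead + tom_speed * t  # same distance from the start

hours_after_tom_leaves = t + head_start
print(lead, t, hours_after_tom_leaves)        # 12 1 5/4
```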
Pedagogical Comments. Over the years, TSM has developed what may be
charitably called a "generic work problem", which typically reads as follows:
It takes Regina 10 hours to do a job, and Eric 12 hours to do
the same job. If they work together, how long will it take them
to get the job done?
The mathematical defects of such a problem are overwhelming. First, this problem
cannot be solved if Regina and Eric do not each work at a constant rate, yet the
assumption of constant rate is typically not mentioned. A second assumption is
that, somehow, Regina and Eric manage to do different parts of the job, so that at
the end the two parts fit together perfectly without any interference, getting the
job done faster. If the nature of the work is not made explicit, however, such an
assumption will surely strain students’ credulity. A third serious defect is that the
concept of constant rate becomes difficult to formulate precisely when the job in
question is not clearly specified. Indeed, the average rate of work from time t1 to
time t2 is by definition
$$\frac{\text{the amount of work done from } t_1 \text{ to } t_2}{t_2 - t_1}.$$
But the numerator has to be a number (referring to some unit, to be sure), and
a student would have a hard time associating the vague description of "amount of
work" with a number. Such vagueness interferes with the learning of mathematics.
Too often this kind of "work problem" ends up being learned entirely by rote.
Make sure that you will not damage your students’ learning with those kinds of
"work problems". Convince the textbook publishers that if such problems are ever
given, there should be an explicit understanding that the work refers to something
specific, such as mowing a lawn. End of Pedagogical Comments.
Exercises 1.7.
(1) Helena drives from Town A to Town B at x mph and drives back at y
mph. What is her average speed for the round trip? If the round trip
takes t hours, how far apart are the towns?
(2) (a) Define precisely what it means to say that water flows out of the faucet
at a constant rate. (b) A faucet with a constant rate of water flow fills
a tub in 9 minutes. If the rate of water flow increases by 10%, how long
would it take to fill the tub?
(3) Kate and Laura start walking at the same time and walk straight toward
each other at constant speed. Kate walks $1\frac{2}{3}$ times as fast as Laura. If
they are 2,000 feet apart initially and if they meet after $8\frac{1}{2}$ minutes, how
fast does each walk?
(4) Let A and B be two fractions so that 0 < A < B. (a) Find the midpoint
C of the segment [A, B]; i.e., find C so that B − C = C − A. (b) Find
the point D so that the ratio of the length of [A, D] to that of [D, B] is
2 : 5. (c) Find the point E so that the ratio of the length of [A, E] to that
of [E, B] is m : n, where m and n are nonzero whole numbers.
(5) In Examples 1 and 2 of this section (see page 75), an algebraic method is
used to arrive at the desired conclusion. Now solve these problems again,
pictorially, by making use of the number line.
(6) In a bag of blue, red, and green marbles, the ratio of blue to green marbles
is $\frac{1}{2}$, and the ratio of red to green marbles is $\frac{1}{3}$. There are 143 marbles
altogether. How many blue, red, and green marbles are there in the bag?
(7) Most people hold the belief that the concept of constant speed is not at
all complicated. For example, a constant speed of 30 mph means—to
them—that in each one-hour interval, the distance traveled is 30 miles.
The following example shows that they are wrong. Consider an object
that moves according to the following rule:
In the first half-hour, it moves at a constant speed of 60 mph,
and it stops completely during the second half-hour. Then in the
next half-hour, again it moves at a constant speed of 60 mph,
and it stops completely during the next half-hour. Then repeat.
(a) Prove that this motion has the property that in each one-hour interval
(regardless of what the starting point is), the total distance traveled is 30
miles. (b) Is this motion one of constant speed? Explain.
(8) (a) Suppose an object is observed to move at an average speed of s1 mph
from 0 hour to t hours and to move at an average speed of s2 mph from
t hours to 2t hours. Show that its average speed from 0 hour to 2t hours
is $\frac{1}{2}(s_1 + s_2)$ mph. (b) Can you generalize part (a) to the average speed
of an object observed over k time intervals so that each lasts exactly t
hours?
(9) A music store sells a CD player for $225. The owner decides to increase
sales by not charging customers the 8% sales tax. Then he changes his
mind and charges customers $x so that, after they pay the sales tax, the
total amount they pay is still $225. What is x?
(10) A high-tech stock dropped 45% of its value in June to its present value of
N dollars. A stockbroker tells his clients that if the stock goes up by 60%
of its present value, then it would be back to where it was in June. Is he
correct? If so, why? If not, by what percent must the stock at its present
value of N dollars rise in order to regain its former value?
(11) (Sixth-grade Japanese exam question) A train 132 meters long travels at
87 kilometers per hour and another train 118 meters long travels at 93
kilometers per hour. Both trains are traveling in the same direction on
parallel tracks. How many seconds does it take from the time the front of
the locomotive of the faster train reaches the end of the slower train to the
time that the end of the faster train reaches the front of the locomotive
of the slower one?
(12) Highway 505 north of Vacaville is 33 miles long. The speed limit is 55
mph for trucks and 70 mph for cars. Suppose you are driving a truck and
$\frac{1}{2}$ minute after you enter Highway 505 from Vacaville, the first car passes
your truck. How many more cars will pass your truck before you exit the
highway if (a) everybody drives at the speed limit and (b) the distance
between cars is always $\frac{1}{4}$ mile?56
(13) Driving at her usual constant speed of v mph, Stefanie can get from A to
B in 5 hours. Today, after driving 1 hour, she decides to speed up to a
constant speed of w mph so that she can finish the whole trip in $4\frac{1}{2}$ hours
instead of 5. By what percent is w bigger than v (compared to v)?
(14) (a) Define precisely what it means for someone to paint a house at a
constant rate. (b) Max and Nancy working together can finish painting
a house in 56 hours. If Max paints the same house alone, it would take
him 90 hours to get it done. How long would it take Nancy to paint the
house if she works alone? (Assume each paints at a constant rate and that
when they paint together there is no interference.)
(15) Each of four people A, B, C, and D can paint a house in 9, 10, 15, and
18 hours, respectively. To paint two such houses they split up in teams
A&D and B&C. If each person paints at a constant rate, which team will
finish first?57
(16) Alfred, Bruce, and Chuck mow lawns at fixed constant rates. It takes
them 2 hours, 1.5 hours, and 2.5 hours, respectively, to mow a certain
lawn if each is mowing alone. If they mow the same lawn together and if
there is no interference in their work, how long will it take them to get it
done?
(17) (a) How much money would be in an account at the end of three years if
the initial deposit is $93 and the bank makes a 6% interest payment at
the end of each year? Assume that no money is ever withdrawn from the
account so that, for instance, at the beginning of the second year there is
$93 plus the interest of the preceding year in the account. (You may use
a calculator, but write down the steps clearly.) (b) How much at the end
of n years?
theorems will be used in such generality without comment in all three volumes.
With this understood, the associative law for addition states that for any x, y,
z, we always have
(1.56) (x + y) + z = x + (y + z)
and the commutative law for addition states that for any x and y,
(1.57) x + y = y + x.
A fairly tedious argument, one that is independent of the specific numbers x, y, z
involved but is dependent formally only on these two laws, then leads to the follow-
ing general theorem. For everyday applications, this theorem is all that matters as
far as addition is concerned:
Theorem 1. For any finite collection of numbers, the sums obtained by adding
them up in any order are all equal.58
The reason Theorem 1 is of interest to us stems from the fact that addition
is by definition an operation on two numbers only, such as x + y. If we are given
three numbers, x, y, and z, what are we to make of x + y + z? Do we do the
addition (x + y) first and then add the sum to z, i.e., (x + y) + z, or do we add x
to the sum (y + z)? The associative law (1.56) says it does not matter because the
two are equal. In fact, if one decides instead to add y to the sum (x + z), it still
yields the same number because
y + (x + z) = (x + z) + y (by (1.57))
= x + (z + y) (by (1.56))
= x + (y + z) (by (1.57)).
This simple argument serves to illustrate why Theorem 1 is correct. Henceforth,
we will write
x+y+z
without fear of ambiguity. Similarly, we will likewise write
x+y+w+z
without any fear of ambiguity because Theorem 1 guarantees that
(x + z) + (w + y) = ((z + y) + w) + x = x + (y + (w + z)) = · · · , etc.
The same comment applies of course to the sum
x1 + x2 + · · · + xn
for any numbers x1 , x2 , . . . , xn .
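Theorem 1 can be illustrated (though of course not proved) by brute force on one small collection. The sketch below assumes Python's `Fraction` type as a stand-in for the numbers under discussion; the sample values are arbitrary. It adds four numbers two at a time in every possible order and collects the distinct results.

```python
from fractions import Fraction
from functools import reduce
from itertools import permutations

numbers = [Fraction(1, 2), Fraction(2, 3), Fraction(5, 4), Fraction(3)]

def add_left_to_right(seq):
    # ((a + b) + c) + d: addition is applied to two numbers at a time.
    return reduce(lambda x, y: x + y, seq)

# One sum for each of the 24 orderings; a set keeps only the distinct values.
sums = {add_left_to_right(p) for p in permutations(numbers)}
print(sums)
```

If Theorem 1 holds for this collection, the set contains exactly one value.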
A similar discussion holds for multiplication. Thus the associative and com-
mutative laws for multiplication state that for any x, y, z, we always have
x(yz) = (xy)z and xy = yx, respectively.
Finally, the distributive law is the link between addition and multiplication.
It states that, for any x, y, z,
x(y + z) = xy + xz.
Here it is understood that the multiplications xy and xz on the right side are
performed before the products are added (you may look ahead to page 313 for a
general discussion of the so-called order of operations). A simple argument then
extends this law to allow for any number of additions other than two. For example,
the distributive law for five additions states that for any x, a, b, c, d, e, we have
(1.58) x(a + b + c + d + e) = xa + xb + xc + xd + xe,
where, once again, it is understood that the multiplications xa, . . . , xe on the
right are performed before the products are added. (Observe that equation (1.58)
makes implicit use of Theorem 1.) Moreover, because of the commutativity of
multiplication, we also have
(a + b + c + d + e)x = ax + bx + cx + dx + ex,
among many possibilities.
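Equation (1.58) and its mirror image can be spot-checked numerically. This is a minimal sketch with arbitrarily chosen fractions, not a proof; all names are illustrative.

```python
from fractions import Fraction

x = Fraction(3, 7)
terms = [Fraction(1, 2), Fraction(2, 5), Fraction(4), Fraction(5, 6), Fraction(1, 9)]

left = x * sum(terms)                          # x(a + b + c + d + e)
right = sum(x * term for term in terms)        # xa + xb + xc + xd + xe
right_mirror = sum(term * x for term in terms) # ax + bx + cx + dx + ex

print(left == right, left == right_mirror)
```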
CHAPTER 2
Rational Numbers
1 We repeat: in mathematics, "rational numbers" means the collection of fractions and neg-
absolute values. Henceforth, inequalities will play a more prominent role, not just
in a purely numerical setting, but in geometry as well.
Recall that a number, or a real number, is a point on the number line (page
5). We now look at all the numbers rather than just 0 and those to the right of
0. The notation for segments [a, b] will now be extended to all points a, b on the
number line, but it will always be understood that a < b. Take any point p on the
number line which is not equal to 0; such a p could be on either side of 0 and, in
particular, it does not have to be a fraction. Denote the mirror reflection of p
on the opposite side of 0 by p∗ ; i.e., p and p∗ are on opposite sides of 0 and are
equidistant from 0 in the sense that:
if p is to the right of 0, then the segments [p∗ , 0] and [0, p] have
the same length,
[Number line: p∗, 0, p, from left to right]
and
if p is to the left of 0, then the segments [p, 0] and [0, p∗ ] have
the same length.
[Number line: p, 0, p∗, from left to right]
If p = 0, we define
0∗ = 0.
For any point p, we denote (p∗ )∗ by p∗∗ ; thus p∗∗ is the mirror reflection of p∗ .
The following is a succinct way of expressing the fact that reflecting a nonzero point
across 0 twice in succession brings it back to itself (if p = 0, then of course 0∗∗ = 0):
(2.1) p∗∗ = p.
Because the fractions are to the right of 0, the numbers such as 1∗, 2∗, or
$(\frac{9}{5})^*$ are to the left of 0. Here are some examples of mirror reflections of fractions
(remember that fractions include the whole numbers):
[Number line showing, from left to right: 3∗, $(2\frac{3}{4})^*$, 2∗, 1∗, $(\frac{2}{3})^*$, 0, $\frac{2}{3}$, 1, 2, $2\frac{3}{4}$, 3]
The set of all the fractions and their mirror reflections, i.e., the numbers $\frac{m}{n}$
and $(\frac{k}{\ell})^*$ for all whole numbers k, ℓ, m, n (ℓ ≠ 0, n ≠ 0), is called the rational
numbers and is denoted by Q (the "Q" stands for quotient; see Theorem 2.11
on page 116). Recall that the whole numbers, denoted by N, are a subset of the
fractions. The set of whole numbers and their mirror reflections,
. . . , 3∗ , 2∗ , 1∗ , 0, 1, 2, 3, . . . ,
is called the integers and is denoted by Z. Note that the integers are the sequence
of equidistant points, extending infinitely both to the left and right of 0, that
includes the sequence of whole numbers. Note also that (recall, the symbol "⊂"
denotes "is a subset of", or "is contained in")
N ⊂ Z ⊂ Q.
We now recall the concept of order among numbers (page 12): for any x, y on the
number line, x < y means that x is to the left of y. An equivalent notation is y > x.
We say x is smaller than y or y is greater than x.
[Number line: x to the left of y]
Numbers which are to the right of 0 (thus those x satisfying x > 0) are called
positive, and those which are to the left of 0 (thus those that satisfy x < 0) are
negative. So 2∗ and $(\frac{1}{3})^*$ are negative, while all nonzero fractions are positive. The
mirror reflection of a positive number is therefore negative, by definition, and the
mirror reflection of a negative number is positive. The number 0 is, by definition,
neither positive nor negative.
You are undoubtedly accustomed to writing, for example, 2∗ as −2 and $(\frac{1}{3})^*$ as
$-\frac{1}{3}$. You also know that the "−" sign in front of −2 is called the negative sign.
So you may wonder why we employ this ∗ notation and have avoided mentioning
the negative sign up to this point. The reason is that the negative sign, having to
do with the operation of subtraction, simply will not figure in our considerations
until we begin to subtract rational numbers. Moreover, the terminology of "negative
sign" carries certain psychological baggage that may interfere with learning rational
numbers the proper way. For example, if a = −3, then there is nothing "negative"
about −a, which is 3. We therefore think it best to hold off introducing the negative
sign until its natural arrival in the context of subtraction in the next section.
Exercises 2.1.
(1) Show that between any two rational numbers, there is another rational
number.
(2) Which is greater? (i) (1.23)∗ or (1.24)∗? (ii) (1.7)∗ or $(\frac{12}{7})^*$?
(iii) $(587\frac{1}{5})^*$ or $(587\frac{2}{11})^*$? (iv) $(\frac{9}{16})^*$ or $(\frac{4}{7})^*$?
(3) Which of the following numbers is closest to 0 (on the number line):
$(\frac{15}{7})^*$, $(\frac{11}{5})^*$, $\frac{13}{6}$, $\frac{9}{4}$?
which then allows us for the first time to do the subtraction s − t for any fractions
s and t. In turn, this definition of subtraction justifies the identification of x∗ with
"negative x", −x.
Fundamental assumptions on addition (p. 92)
Addition and mirror reflection (p. 93)
The concept of subtraction (p. 96)
fractions (see the definition of extension on page 29) and not a radical departure
from it, just as the addition of fractions is an extension of the addition of whole
numbers and not a radical departure.
Incidentally, the fact that we assume the addition of rational numbers to be
associative and commutative means that Theorem 1 in the appendix of Chapter
1 (page 86) applies to rational numbers. In particular, we will be free to add a
collection of rational numbers in any order we like.
The last assumption (A3) makes it official that, for example, 2 + 2∗ = 0. As
to (A2), it is not as vacuous as it appears: if x is a negative rational number, then
x+0 is an unknown quantity at the moment because our experience with 0 has been
limited to our encounters with positive quantities. When x is negative, it takes an
explicit assumption to get x + 0 = x.
Because we are assuming that addition among rational numbers is commuta-
tive, (A2) and (A3) then also imply that:
(A2′) 0 + x = x for any rational number x.
(A3′) If x is any rational number, x∗ + x = 0.
Now that we have two new operations on the rational numbers—the mirror
reflection ∗ and addition—the first thing we should ask is how they interact with
each other. For example, is the order of applying them interchangeable; i.e., given
two rational numbers x and y, if we add them and then take the mirror reflection,
do we end up with the same number as when we take their mirror reflections first
before adding them? In symbols, this becomes whether (x + y)∗ = x∗ + y ∗ . We
will prove that such is the case, but we need some preparation for this proof in the
form of a lemma.
The lemma in question is the converse of (A3): if x + y = 0, then y = x∗ . The
motivation for the lemma comes from the fact that there are times, even critical
times, when we want to claim that a number y is the mirror reflection of a given
number x (see, for example, the discussion leading up to (2.21) on page 107). What
the lemma tells us is that we can get it done by a straightforward computation;
namely, just compute that x + y = 0. This is a very attractive scenario because it
is always satisfying to be able to give a proof by computation.
We are now in a position to prove what we are after concerning ∗ and addition.
Theorem 2.2 immediately tells us how to add two negative fractions, for example,
$(\frac{3}{4})^*$ and 5∗, as follows:
$$\left(\tfrac{3}{4}\right)^* + 5^* = \left(\tfrac{3}{4} + 5\right)^* = \left(5\tfrac{3}{4}\right)^*.$$
Perhaps it is not obvious, but Theorem 2.2 is also a statement about how to
"remove parentheses", as we shall see on page 97.
With these basic facts out of the way, we are in a position to explicitly compute
the sum of any two rational numbers. Since we already know how to add 0 to any
number, by (A2) and (A2′), it suffices to consider the sum of two nonzero rational
numbers. Now a nonzero rational number is either a fraction or a negative fraction,
so we proceed to look at all the possibilities. Therefore let s and t be any two
nonzero fractions; i.e., s and t are both positive. Then the following four cases
exhaust all the possibilities in adding two rational numbers:
s + t, s∗ + t∗ , s + t∗ , s∗ + t.
By (A1), we already know how to compute the sum s + t as these are fractions
(see equation (1.12) on page 33). Therefore, we only need to examine the three
remaining cases. We emphasize that s and t here stand for any two fractions. Since
s + t∗ = t∗ + s, we see that the fourth case above follows from the third case.3
Therefore we need examine only the following two cases:
s∗ + t∗ and s + t∗ .
The first case is easily disposed of because by Theorem 2.2, we have
s∗ + t∗ = (s + t)∗ .
3 Such an assertion is sometimes confusing to a beginner who becomes fixated on the symbols
s and t themselves. Here is a more detailed explanation of this assertion. Suppose we can prove a
formula for the third case s + t∗ for all fractions s and t. Then the commutative law of addition
implies that we have a formula for t∗ + s, again for all positive fractions s and t. Since s and t are
arbitrary, we may switch the symbols and write s for t as well as t for s, all the while remembering
that they stand for any fractions. Assuming that the switch has been done, then we have a formula
for s∗ + t for all positive fractions s and t. But then this is exactly the fourth case.
(2.5) $s^* + t^* = (s + t)^*$,
(2.6) $s + t^* = t^* + s = \begin{cases} s - t & \text{if } s \ge t, \\ (t - s)^* & \text{if } s < t. \end{cases}$
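Formulas (2.5) and (2.6) can be read as a recipe for adding any two rational numbers using only operations on fractions: the addition of fractions and the subtraction of a smaller fraction from a larger one. The sketch below is one possible encoding, not the book's: a rational number is stored as a fraction together with a flag saying whether it is a mirror reflection. The names `Rational`, `make`, and `add` are hypothetical.

```python
from fractions import Fraction
from typing import NamedTuple

class Rational(NamedTuple):
    """A fraction (starred=False) or the mirror reflection of one (starred=True)."""
    magnitude: Fraction  # always >= 0
    starred: bool

def make(magnitude, starred=False):
    magnitude = Fraction(magnitude)
    # 0* = 0, so zero is never stored as starred.
    return Rational(magnitude, starred and magnitude != 0)

def add(x, y):
    s, t = x.magnitude, y.magnitude
    if not x.starred and not y.starred:
        return make(s + t)                 # ordinary sum of fractions
    if x.starred and y.starred:
        return make(s + t, starred=True)   # (2.5): s* + t* = (s + t)*
    if y.starred:                          # x = s, y = t*
        # (2.6): s + t* = s - t if s >= t, and (t - s)* if s < t
        return make(s - t) if s >= t else make(t - s, starred=True)
    return add(y, x)                       # s* + t = t + s* by commutativity

# The example (3/4)* + 5* from the text.
print(add(make(Fraction(3, 4), True), make(5, True)))
```

Note that `add` never subtracts a larger fraction from a smaller one, in keeping with the fact that subtraction of fractions has so far been defined only when the first fraction is at least as large as the second.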
This definition reveals that subtraction is just a different way of writing addition
among rational numbers. This enlarges on equation (1.27) on page 46 and equation
(1.33) on page 57, which show that division is merely a different way of writing
multiplication.
In the rest of this section, we will explore some ramifications of this concept of
subtraction. The overriding fact is that, without this general definition, we do not
have a good grasp of what subtraction is about. Beyond the oddity of not being
able to subtract a larger fraction from a smaller one, there is also the unpleasant
observation that "subtraction is not associative"; i.e., in general, (x − y) − z ≠
x − (y − z) for fractions x, y, z. For example, letting x = 4, y = 2, z = 1, the left
side is 1 while the right side is 3. We need clarity in the situation.
We start from the beginning. Letting x = 0 in the definition of x − y, we have
0 − y = y∗
(2.10) $-s - t = -(s + t)$,
(2.11) $s - t = -t + s = \begin{cases} s - t & \text{if } s \ge t, \\ -(t - s) & \text{if } s < t. \end{cases}$
We pursue the theme that subtraction is another way of writing addition among
rational numbers and bring closure to a remark we made about equation (1.19) on
page 40 on the subtraction of fractions. We now show that for any rational numbers
a, b, x, y,
(2.14) (a + b) − (x + y) = (a − x) + (b − y).
This is because
(a + b) − (x + y) = a + b + (x + y)∗ = a + b + x∗ + y ∗ ,
where the first equality is by the definition of subtraction and the second equality
is on account of Theorem 2.2. Thus (a + b) − (x + y) = (a + x∗ ) + (b + y ∗ ), by
Theorem 1 of the appendix in Chapter 1 (page 86). By the definition of subtraction
again, we get (a + b) − (x + y) = (a − x) + (b − y).
It is clear from this reasoning that there is a similar assertion if a + b is replaced
by a sum of k rational numbers for any positive integer k and the same is done to
x + y. The details are left as an exercise (Exercise 7 on page 99).
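A numerical spot check of (2.14) and of its analogue for sums of k numbers may be helpful before writing out the proof. The sketch assumes Python's `Fraction` type as a stand-in for rational numbers; the sample values are arbitrary and the check is not a substitute for the argument above.

```python
from fractions import Fraction

# Two lists of k = 3 rational numbers: a_1, ..., a_k and x_1, ..., x_k.
a = [Fraction(1, 2), Fraction(-3), Fraction(5, 4)]
x = [Fraction(2, 3), Fraction(7, 6), Fraction(-1, 8)]

left = sum(a) - sum(x)                           # (a1 + ... + ak) - (x1 + ... + xk)
right = sum(ai - xi for ai, xi in zip(a, x))     # (a1 - x1) + ... + (ak - xk)

print(left == right)
```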
There is a certain "commutativity of subtraction" that follows naturally from
the definition of subtraction in terms of addition. For any x, y ∈ Q, we claim that
(2.15) −x + y = y − x.
For example, $-\frac{2}{3} + 4 = 4 - \frac{2}{3}$, both being equal to $\frac{10}{3}$, as a simple application of
Theorem 2.3* shows. In general, we have
−x + y = x∗ + y (2.8)
= y + x∗ (commutativity)
= y−x (definition of subtraction).
By contrast, we should caution that there is no "associative law for subtraction";
i.e., in general, (x − y) − z ≠ x − (y − z). This is because the left side is equal to
(x − y) − z = (x + y ∗ ) + z ∗ (definition of subtraction)
= x + (y ∗ + z ∗ ) (associativity)
= x + (y + z)∗ (Theorem 2.2)
= x − (y + z) (definition of subtraction).
In other words, (x − y) − z = x − (y + z). Therefore (x − y) − z = x − (y − z) if and
only if x − (y + z) = x − (y − z), and a straightforward argument shows that the
latter happens for all x and y if and only if z = 0. We leave the details to Exercise
6 on page 99.
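Both claims in this paragraph can be tested on a finite sample of rational numbers. The following sketch (an illustration with hypothetical names, not a proof) looks for triples that violate either (x − y) − z = x − (y + z) or the assertion that (x − y) − z = x − (y − z) holds exactly when z = 0.

```python
from fractions import Fraction
from itertools import product

samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

def subtraction_checks(values):
    """Return the triples violating either claim (none are expected)."""
    bad = []
    for x, y, z in product(values, repeat=3):
        always = (x - y) - z == x - (y + z)
        associative_here = (x - y) - z == x - (y - z)
        if not always or associative_here != (z == 0):
            bad.append((x, y, z))
    return bad

print(subtraction_checks(samples))
```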
Before we bring this section to a close, we will prove a generalization of (1.16)
on page 39. We will prove that for any rational numbers x and y,
(2.16) the length of the segment [x, y] is y − x.
To prove this, first let x, y ≥ 0.
[Number line: 0, x, y, from left to right]
Then by definition, the lengths of [0, y] and [0, x] are y and x, respectively, so that
if the length of [x, y] is ℓ, then x + ℓ = y (because [0, y] is the concatenation of [0, x]
and [x, y]).4
It follows that the length of [x, y] is ℓ = y − x. Next, suppose x, y ≤ 0. Then
we know the length of [−y, −x] is (−x) − (−y) by what we have just proved.
[Number line: x, y, 0, −y, −x, from left to right]
4 We are appealing to geometric intuition right now, but this reasoning about length will be
put on a formal basis in Section 4.1 of [Wu2020c] as the Additivity Property of length.
Since [x, y] and [−y, −x] have the same length, we see that the length of [x, y] in
this case is (−x) − (−y) = (−x) + y = y − x on account of (2.15). Finally, there
remains the case that x < 0 and y > 0.
[Number line: x, 0, y, from left to right]
Now the segment [x, y] is the concatenation of [x, 0] and [0, y]. By the preceding
results, we have
length of [x, y] = (0 − x) + (y − 0) = y − x.
The proof of (2.16) is complete.
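The three cases of the proof can be mirrored in a short computation that uses only distances from 0, and the result compared with y − x. This is a sketch with hypothetical function names; it assumes x ≤ y and checks a finite sample only.

```python
from fractions import Fraction

def distance_from_zero(p):
    """Length of the segment between 0 and p."""
    return p if p >= 0 else -p

def length_by_cases(x, y):
    """Length of [x, y] (with x <= y), following the three cases of the proof."""
    if x >= 0:   # 0 <= x <= y: remove [0, x] from [0, y]
        return distance_from_zero(y) - distance_from_zero(x)
    if y <= 0:   # x <= y <= 0: same length as the reflected segment [-y, -x]
        return distance_from_zero(x) - distance_from_zero(y)
    # x < 0 < y: concatenate [x, 0] and [0, y]
    return distance_from_zero(x) + distance_from_zero(y)

points = [Fraction(n, 4) for n in range(-12, 13)]
mismatches = [(x, y) for x in points for y in points
              if x <= y and length_by_cases(x, y) != y - x]
print(mismatches)
```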
Exercises 2.2.
(1) Prove that for all x, y ∈ Q, if x + y = x, then y = 0.
(2) Compute (a) −202 + 189, (b) −93 − 728, (c) $-3\frac{2}{5} + 9$,
(d) $-4\frac{6}{7} + 2\frac{2}{3}$, (e) $-7.1 - 22\frac{1}{3}$, (f) $7 - (2.5 - 3\frac{2}{3})$,
(g) $(-703.2 + 689.4) - (\frac{1}{5} - 3\frac{2}{3})$, (h) $(\frac{5}{6} - (1\frac{7}{18})^*) + \frac{5}{24}$.
(3) Without using Theorem 2.3 or Theorem 2.3* and using only (A1)–(A3),
Lemma 2.1, and Theorem 2.2, explain as if to an eighth grader why
$\frac{4}{3} - 2\frac{1}{5} = -\frac{13}{15}$.
(4) Prove (2.12) and (2.13) on page 97: i.e., for all x, y ∈ Q, we have
−(x − y) = −x + y and −(−x + y) = x − y. Give a reason at each
step.
(5) Explain carefully why each of the following is true for all x, y, z ∈ Q:
(a) (x − y) − z = (x − z) − y. (b) x − (y − z) = (x − y) + z. (c) (x + y) − z =
x − (z − y).
(6) Prove that for rational numbers x, y, and z with x, y ≠ 0, x − (y + z) =
x − (y − z) if and only if z = 0.
(7) (a) Let a, b, . . . , z, w be rational numbers. Give a detailed proof of the
following and justify every step:
(a + b + c + d) − (x + y + z + w) = (a − x) + (b − y) + (c − z) + (d − w).
(b) Can you extend (a) from a pair of 4 rational numbers to a pair of n
rational numbers for any positive integer n? For notation, try
(a1 + a2 + · · · + an ) − (x1 + x2 + · · · + xn )
= (a1 − x1 ) + (a2 − x2 ) + · · · + (an − xn ).
(Although officially we do not take up mathematical induction until Sec-
tion 1.7 in [Wu2020b], you may use that technique here if you want.)
are fractions, the vector addition so described agrees with the earlier definition of
fraction addition in terms of the concatenation of segments.
Vectors and their addition (p. 100)
Adding rational numbers via vectors (p. 102)
We learned in the preceding section how to add rational numbers on the as-
sumption that addition satisfies the three reasonable properties (A1)–(A3) (see page
92), and we also observed that this way of adding rational numbers coincides with
the concatenation of segments if the rational numbers are positive (see page 12 for
the meaning of "concatenation"). For a sixth- or seventh-grade classroom, however,
it is better to have a more concrete approach to addition, either as an alternative
or, at least, as a supplement. We now outline such an approach by returning to
the number line and introducing new objects called "vectors". Then we define the
addition of vectors, and on the basis of that we give a new definition of the addi-
tion of rational numbers. At the end of the section, we will indicate why the two
definitions, the one introduced in the last section and the present one using vectors,
coincide.
Thus we start from the beginning all over again, pretending never to
have heard of the addition of rational numbers. Let us introduce a definition: a
vector is a segment on the number line, together with a designation of one of its
two endpoints as a starting point and the other as an endpoint. In pictures,
we put an arrowhead at the endpoint of a vector to indicate its direction from the
starting point to the endpoint, as shown:
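[Figure: a right-pointing arrow on the number line, with its starting point on the left and its endpoint, marked by the arrowhead, on the right]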
We will continue to refer to the length of the segment as the length of the vector.5
We call the vector left-pointing if the endpoint is to the left of the starting point,
and right-pointing if the endpoint is to the right of the starting point. The above
vector is right-pointing, for example. The direction of a vector refers to whether
it is left-pointing or right-pointing.
We denote vectors by placing an arrow above the letter, e.g., $\vec{A}$, $\vec{x}$, etc. For
example, the vector $\vec{K}$ below is left-pointing and has length 1, with a starting point
at 1∗ and an endpoint at 2∗, while the vector $\vec{L}$ is right-pointing and has length 2,
with a starting point at 0 and an endpoint at 2.
[Number line marked 3∗, 2∗, 1∗, 0, 1, 2, 3, with $\vec{K}$ drawn from 1∗ to 2∗ and $\vec{L}$ drawn from 0 to 2]
We will regard a vector with zero length as the zero vector, to be denoted by
$\vec{0}$. Unless stated to the contrary, $\vec{0}$ will always be understood to be the vector
with starting point and endpoint at 0.
For the purpose of discussing the addition of rational numbers, we can further
simplify matters by restricting our attention to a special class of vectors. Let x
5 Remember that length is always ≥ 0.
be a rational number; then we define the vector $\vec{x}$ to be the vector with starting
point at 0 and endpoint at x. It follows from the definition that if x is a nonzero
fraction, then the segment of the vector $\vec{x}$ is exactly [0, x]. Here are two examples
of vectors arising from rational numbers:
[Number line marked 4∗, 3∗, 2∗, 1∗, 0, 1, 1.5, 2, showing the left-pointing vector $\vec{3^*}$ from 0 to 3∗ and the right-pointing vector $\vec{1.5}$ from 0 to 1.5]
In the following, we will only consider vectors $\vec{x}$ where x ∈ Q, so that all
vectors under discussion will be understood to have their starting point at 0.
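[Number line marked 3∗, 2∗, 1∗, 0, 1, 2, with vectors drawn above it]
The vector $\vec{2} + \vec{1^*}$ is therefore the vector that starts at 0 and ends at 1, as shown:
[Number line marked 3∗, 2∗, 1∗, 0, 1, 2, showing the vector $\vec{2} + \vec{1^*}$ from 0 to 1]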
The vectorial definition of the addition of rational numbers may be best illus-
trated by (interactive) animations. Here are links to four such animations (due to
Sunil Koswatta) for the four cases of x > 0 and y > 0, x > 0 and y < 0, x < 0 and
y > 0, and x < 0 and y < 0, respectively:
https://www.geogebra.org/m/jsvempak,
https://www.geogebra.org/m/ynqmvgsz,
https://www.geogebra.org/m/cqsz7n7w,
https://www.geogebra.org/m/svvzmgyd.
In general, a vector is completely determined by its length and its direction.
Therefore the following is a complete description of the sum of any two vectors:
If both vectors $\vec{x}$ and $\vec{y}$ have the same direction, then the sum
vector $\vec{x} + \vec{y}$ has the same direction as $\vec{x}$ and $\vec{y}$, and its length
is the length of the concatenation of the segments of $\vec{x}$ and $\vec{y}$
(see page 12 for the meaning of "concatenation") and is therefore
the sum of the lengths of $\vec{x}$ and $\vec{y}$.
If the vectors $\vec{x}$ and $\vec{y}$ have different directions, then the
direction of the sum vector $\vec{x} + \vec{y}$ is the same as the direction
of the vector with the greater length, and the length of the sum
vector $\vec{x} + \vec{y}$ is the difference of the lengths of the vectors $\vec{x}$
and $\vec{y}$.
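The description of the sum of two vectors in terms of direction and length translates directly into a computation. The sketch below makes two assumptions that go beyond the text at this point: a mirror reflection x∗ is modeled by the signed number -x, and the zero vector is treated as right-pointing for bookkeeping purposes. The function name `vector_sum` is hypothetical.

```python
from fractions import Fraction

def vector_sum(x, y):
    """Endpoint of vec(x) + vec(y), computed only from directions and lengths."""
    len_x, len_y = abs(x), abs(y)
    right_x, right_y = x >= 0, y >= 0   # True means right-pointing (or zero)
    if right_x == right_y:
        # Same direction: keep the direction, add the lengths.
        length, points_right = len_x + len_y, right_x
    else:
        # Different directions: direction of the longer vector,
        # length equal to the difference of the lengths.
        length = abs(len_x - len_y)
        points_right = right_x if len_x > len_y else right_y
    return length if points_right else -length

# The example vec(2) + vec(1*), with 1* modeled as -1.
print(vector_sum(Fraction(2), Fraction(-1)))
```

Because the rule never looks at which vector comes first, `vector_sum(x, y)` and `vector_sum(y, x)` always agree.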
As an application, observe that the above description of the sum $\vec{x} + \vec{y}$, where
x and y are two rational numbers, does not make any distinction between whether
x comes first or y comes first. In other words, according to this description of vector
addition, $\vec{x} + \vec{y}$ is the same vector as $\vec{y} + \vec{x}$. We have the following lemma.
Lemma 2.4. The addition of rational numbers using the vector method is commutative.
To get some intuitive feeling for the sum of two rational numbers x and y, we
will now systematically go through the various possibilities for $\vec{x} + \vec{y}$. These are:
(i) Both x and y are positive.
(ii) Both x and y are negative.
(iii) x is positive but y is negative, and the length of $\vec{x}$ is less
than the length of $\vec{y}$.
(iv) x is positive but y is negative, and the length of $\vec{y}$ is less
than the length of $\vec{x}$.
Because we know from Lemma 2.4 that $\vec{x} + \vec{y} = \vec{y} + \vec{x}$, the preceding four cases
exhaust all the possibilities. For example, there is no need to consider the case in
which x is negative but y is positive and the length of $\vec{y}$ is less than the length of
$\vec{x}$, because looking at $\vec{y} + \vec{x}$ (which is equal to $\vec{x} + \vec{y}$), this would be case (iii)
above provided we interchange the symbols x and y.
Now case (i) is straightforward: we are given the following picture:
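[Number line: the right-pointing vectors $\vec{x}$ and $\vec{y}$, both starting at 0 and ending at x and y, respectively]
The vector $\vec{x} + \vec{y}$ is, according to the definition of the vector sum, right-pointing
with length equal to the sum of the lengths of $\vec{x}$ and $\vec{y}$. The number x + y is
therefore the point indicated by the down-arrow:
[Number line: $\vec{x}$ from 0 to x, followed by $\vec{y}$ in its new position starting at x; a down-arrow marks its endpoint x + y]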
This confirms the fact that for two fractions x and y, their sum as rational numbers
by the use of vectors is exactly the length of the concatenation of the segments
[0, x] and [0, y], i.e., the same as their sum as fractions.
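[Number line: the left-pointing vector $\vec{y}$ from 0 to y and the shorter right-pointing vector $\vec{x}$ from 0 to x]
Then, by definition, x + y is the point as indicated:
[Number line: $\vec{y}$ in its new position, starting at x and pointing left; a down-arrow marks its endpoint x + y, to the left of 0]
In this case x + y is negative, and the picture shows that the length of the segment
[x + y, 0] is the length of $\vec{y}$ minus the length of $\vec{x}$. This is consistent with the
description of $\vec{x} + \vec{y}$ above.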
We leave cases (ii) and (iv) to an exercise (Exercise 1 on page 104).
The preceding discussion shows that if x and y are fractions, then x + y, being
the length of the concatenation of [0, x] and [0, y], has exactly the same meaning
as that on page 33. Therefore the addition of fractions x + y as defined by vector
addition in (2.17) coincides with the addition of fractions in the sense of page 33. In
fact, we are going to show that the addition of rational numbers according to (2.17)
using vectors is equal to the addition of rational numbers according to (A1)–(A3)
on page 92.
Let s, t be two fractions. Then the sum of two rational numbers has to be one
of the following four types of sums:
s + t, s + t∗ , s∗ + t, s∗ + t∗ .
In order to show that the sum of two rational numbers is the same whether they
are added according to the method of Section 2.2 or according to the method of
vectors in equation (2.17) (page 102), it suffices to show that each of the above
four sums is the same regardless of which of these two methods is used. We shall
address each of these four sums in the order listed above.
For this particular discussion, we are going to employ an ad hoc notation: for
two rational numbers x and y,
\[ x \oplus y \overset{\text{def}}{=} \text{the sum of } x \text{ and } y \text{ as defined by (2.17)}, \]
while x + y will continue to denote the addition in the sense of Section 2.2. We
have just seen that for fractions s and t
s ⊕ t = s + t = the ordinary sum of s and t.
Next, we prove s ⊕ t∗ = s + t∗. According to equation (2.6) of Section 2.2
on page 96,
\[ s + t^* = \begin{cases} s - t & \text{if } s \ge t, \\ (t - s)^* & \text{if } s < t. \end{cases} \]
Now we use the description of the sum of two vectors on page 101: if s ≥ t, then
the direction of $\vec{s} + \vec{t^*}$ is the direction of $\vec{s}$ (i.e., right-pointing) as it is longer, and
its length is s − t. This shows s ⊕ t∗ = s + t∗ in case s ≥ t. If, however, s < t,
then the same observation about the sum of two vectors says that the direction
of $\vec{s} + \vec{t^*}$ is the direction of $\vec{t^*}$ (i.e., left-pointing) as it is longer, and its length
is t − s. In other words, we also have s ⊕ t∗ = s + t∗ in case s < t. Thus
s ⊕ t∗ = s + t∗ for any fractions s and t.
Next, we prove
(2.18) s∗ ⊕ t = s∗ + t.
Since the addition of vectors is commutative,
s∗ ⊕ t = t ⊕ s∗.
By what we have just proved,
t ⊕ s∗ = t + s∗.
Finally, by the commutative law of addition in Section 2.2,
t + s∗ = s∗ + t.
The combination of the last three equalities clearly proves (2.18). We have thus
disposed of three of the four sums.
It remains to deal with the sum s∗ + t∗. By Theorem 2.2 on page 94, s∗ + t∗ =
(s + t)∗. But since both $\vec{s^*}$ and $\vec{t^*}$ are left-pointing, so is $\vec{s^*} + \vec{t^*}$, and the length
of $\vec{s^*} + \vec{t^*}$ is just the length of the concatenation of the segments of $\vec{s^*}$ and $\vec{t^*}$, i.e.,
equal to s + t. Hence s∗ ⊕ t∗ = s∗ + t∗ as well.
We have therefore proved that x ⊕ y = x + y for all rational numbers x and
y. Thus the addition of rational numbers can be defined purely algebraically as in
Section 2.2 or by the use of vectors in (2.17).
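The agreement of the two definitions can also be spot-checked numerically. The following is a minimal sketch, not part of the proof: the function name `add_vectorial` is hypothetical, it encodes only the verbal description of vector addition (same direction: lengths add; opposite directions: the longer vector decides the direction and the lengths subtract), and Python's built-in `Fraction` addition stands in for the addition of Section 2.2.

```python
from fractions import Fraction

def add_vectorial(x, y):
    """Sum of x and y computed from directions and lengths of the two vectors."""
    if x == 0:
        return y
    if y == 0:
        return x
    len_x, len_y = abs(x), abs(y)
    if (x > 0) == (y > 0):
        # Same direction: the lengths add and the direction is shared.
        length, points_right = len_x + len_y, x > 0
    elif len_x == len_y:
        # Opposite directions and equal lengths: the sum is 0.
        return Fraction(0)
    else:
        # Opposite directions: the longer vector decides the direction.
        length = max(len_x, len_y) - min(len_x, len_y)
        points_right = (x > 0) if len_x > len_y else (y > 0)
    return length if points_right else -length

# Arbitrary sample values chosen for illustration only.
samples = [Fraction(3), Fraction(-6), Fraction(7), Fraction(5, 4),
           Fraction(-5, 2), Fraction(2, 3), Fraction(0)]
all_agree = all(add_vectorial(x, y) == x + y for x in samples for y in samples)
print(all_agree)
```

A check on finitely many samples is of course no substitute for the case-by-case argument above; it only illustrates what that argument establishes.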
Since the addition of rational numbers according to the method of Section 2.2
is associative (see (A1)), it follows that the addition of rational numbers according
to (2.17) is also associative. The proof of this conclusion using vectors is extremely
tedious and therefore will not be given, but see Exercise 4 in the following exercises
to get an idea.
Exercises 2.3.
(1) Referring to the discussion after Lemma 2.5 on page 102, find the sums
in case (ii) and case (iv) for $\vec{x} + \vec{y}$, where x and y are rational numbers.
(2) For each of the following sums of rational numbers, explain as if to a sixth
grader using the definition of addition in (2.17) whether it is positive or
negative:
31 + 29∗, $(68\frac{1}{2})^* + 68\frac{2}{5}$, $(1\frac{7}{8})^* + 2\frac{1}{10}$, $\frac{7}{16} + (2\frac{1}{4})^*$, $(1\frac{3}{10})^* + \frac{9}{7}$.
(3) Compute each of the following in two different ways: by the method of
Section 2.2 but without making use of Theorem 2.3 on page 96 and then
by using the vector definition in (2.17): (a) $\frac{5}{4} + 3^*$ and (b) $(\frac{5}{2})^* + \frac{2}{3}$.
(4) Give a direct proof of the associative law of addition for rational numbers
using the definition (2.17) in the following two special cases: (3 + 6∗) + 7 =
3 + (6∗ + 7) and (6 + 3.5∗) + 2 = 6 + (3.5∗ + 2). (You will have a better
understanding of why it is so tedious to prove associativity using vectors
after doing this exercise.)
2.4. Multiplying rational numbers
This surprising fact, the bane of many school students, can be given a very
short proof. We will present this proof at the end of the section (see page 109),
but not here, because for beginning algebra students, such a sophisticated proof
would be far from appropriate or enlightening. Instead, we will take a leisurely
tour through the basic multiplication facts of fractions and negative fractions and
wind up with equation (2.20) as our final destination.
Our first order of business is to find out explicitly how to multiply rational
numbers. (Before proceeding further, you may wish to review the discussion about
the nature of this kind of computation in the proof of Theorem 2.3 on page 96.)
Thus let x, y ∈ Q. What is xy? If x = 0 or y = 0, then xy = 0 according to
equation (2.19). We may therefore assume that both x and y are nonzero, so that
each is either a fraction s or the negative of a fraction, −s. Therefore, letting s and
t be nonzero fractions, we consider the following four cases separately:
st, (−s)t, s(−t), and (−s)(−t).
Since Section 1.4 already dealt with the case st, it suffices to deal with the remaining
three cases. We will prove that, for all positive fractions s and t,
(−s)t = −(st) (e.g., (−5)7 = −35),
According to Lemma 2.1* on page 97, all we have to do is to prove that
(−5) × 7 + 35 = 0. This is a straightforward computation using the distributive
law:
(−5) × 7 + 35 = (−5) × 7 + (5 × 7) = ((−5) + 5) × 7 = 0 × 7 = 0,
where the last equality makes use of equation (2.19) on page 105. The general case
is no different: to prove (−s)t = −(st), it suffices to prove
(2.21) (−s)t + st = 0.
We simply carry out the same computation:
(−s)t + st = ((−s) + s)t = 0 · t = 0,
where we make use of equation (2.19) again in the last step. So the proof for
(−s)t = −(st) is complete.
It remains to deal with the third equality above; i.e., for all nonzero fractions
s and t,
(−s)(−t) = st.
We can again invoke Lemma 2.1* on page 97 if we think of st as minus −(st);
in other words, st = −(−(st)). Then this lemma says (−s)(−t) would be equal
to −(−(st)) if we could prove (−s)(−t) + (−(st)) = 0. This we can do because
we have just finished proving that −(st) = (−s)t, and another application of the
distributive law now gives
(−s)(−t) + (−(st)) = (−s)(−t) + (−s)t
= (−s)((−t) + t) (distributive law)
= (−s) · 0 (by (A3*) on page 97)
= 0 (by (2.19) on page 105)
and the proof of (−s)(−t) = st is also complete.
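The pattern of both proofs (to show that a number equals −v, show that adding v to it gives 0) can be traced on concrete numbers. This is a small sketch under that reading of Lemma 2.1*; the helper name `is_negative_of` is hypothetical, and the values s = 5, t = 7 are taken from the example in the text.

```python
from fractions import Fraction

def is_negative_of(candidate, value):
    """True when candidate + value == 0, i.e. when candidate equals -value."""
    return candidate + value == 0

s, t = Fraction(5), Fraction(7)

# (-s)t = -(st): it suffices that (-s)t + st = 0.
first_equality = is_negative_of((-s) * t, s * t)

# (-s)(-t) = st: regard st as -(-(st)); it suffices that (-s)(-t) + (-(st)) = 0.
second_equality = is_negative_of((-s) * (-t), -(s * t))

print(first_equality, second_equality)
```

Here `Fraction` arithmetic already knows the sign rules, so the code does not prove anything; it only shows which sums the two proofs reduce to 0.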
Theorem 2.6 gives us a basic idea about the multiplication of rational numbers,
but it needs to be complemented by a broader view of the situation. We will attend
to that presently, but let us first take note of some of its immediate consequences.
For example, the following simple rules are implied by Theorem 2.6:
positive × positive is positive,
positive × negative is negative,
negative × negative is positive.
In particular, we know that
$x^2 \ge 0$ for any x ∈ Q
regardless of whether x is 0 or positive or negative. Anticipating FASM (page 133;
more precisely, using (D) and (E) on p. 123 and p. 124), we have
(2.22) $x^2 \ge 0$ for any number x.
The equality (−s)(−t) = st for all fractions s, t is among the Most Frequently
Asked Questions in school mathematics. Not all students will understand the pre-
ceding proof, but very likely they will all want a reasonable explanation of this fact.
We now address this classroom issue by presenting a much simpler proof of some-
thing less, namely (−1)(−1) = 1. There are two reasons why we take the trouble
to give an independent proof of a result already implied by Theorem 2.6. One is
that this proposed proof may be more widely accessible to students; in mathematics
education, every bit of reasoning helps. The other reason is the surprising fact that
this simple result actually leads to the proof of a generalization of Theorem 2.6.
We begin with a basic, but believable, observation and will prove it by using
the hands-on vectorial approach to the addition of rational numbers.
Theorem 2.7. For any rational number x, the number (−1)x is the mirror
reflection of x. In symbols, (−1)x = −x.
Remark. This theorem may be better understood in the language of ∗ on
page 90: it says (−1)x = x∗ . In this form, one may get a better idea of what the
theorem tries to say: the left side (−1)x is about multiplication (the product of
−1 and x), the right side x∗ is about the operation of mirror reflection, and yet
the two sides are equal. If one realizes that multiplication and mirror reflection are
seemingly two independent operations (notice that (M1) and (M2) on page 105 do
not mention ∗ explicitly), then one is less likely to take Theorem 2.7 for granted.
Proof. The number −x is the point on the opposite side of 0 from x so that x and
−x are equidistant from 0. Therefore this is the picture we want to be true when
x is positive:
[Figure: number line with (−1)x to the left of 0 and x to the right of 0, equidistant from 0.]
[Figure: the corresponding picture when x is negative, with x to the left of 0 and (−1)x to the right of 0.]
Now think of the sum x + (−1)x in terms of vectors (see p. 101). If
we can show that
x + (−1)x = 0,
then the vectors $\vec{x}$ and $\overrightarrow{(-1)x}$ must have opposite direction and equal length (see
the description of the sum of two vectors on page 101). Consequently, (−1)x will
have to be equal to −x, as desired. Let us therefore prove that x + (−1)x is equal
to 0. We use the distributive law:
x + (−1)x = 1 · x + (−1)x = (1 + (−1))x,
where the first equality is by (M2). But 1 + (−1) = 0 because the vectors $\vec{1}$ and $\overrightarrow{(-1)}$ have the same length and have opposite
directions (recall: we are only using facts from the vectorial approach to the addition
of rational numbers). Therefore
x + (−1)x = (1 + (−1))x = 0 · x = 0,
where the last step is by (2.19) on page 105. The proof of Theorem 2.7 is complete.
Corollary. (−1)(−1) = 1.
We now draw on our experience of having proved Theorem 2.6 and Theorem
2.7 to prove a general theorem: instead of multiplying fractions with other fractions
or negative fractions, we show directly how to multiply arbitrary rational numbers.
Proof. We first prove (2.23); i.e., (−x)y = x(−y) = −(xy). If we read the equality
of Theorem 2.7 backwards, we get −x = (−1)x. Therefore (−x)y = ((−1)x)y =
(−1)(xy), by the associative law. Now we apply Theorem 2.7 again, but this time
to the rational number (xy). Then we get (−1)(xy) = −(xy). Hence
(−x)y = (−1)(xy) = −(xy).
We can prove x(−y) = −(xy) in a similar manner or we can apply the commutative
law twice to what we have just proved to get x(−y) = (−y)x = −(yx) = −(xy).
Next, we prove (2.24); i.e., (−x)(−y) = xy. Theorem 2.7 gives (−x)(−y) =
((−1)x)((−1)y). Now Theorem 2 in the appendix of Chapter 1 (page 88) implies
that we can multiply four given numbers in any order and the result will be the
same. Thus,
((−1)x)((−1)y) = ((−1)(−1))(xy).
So the corollary to Theorem 2.7 says ((−1)(−1))(xy) = 1 · (xy) = xy. Theorem 2.8
is proved.
If we reflect on Theorems 2.6–2.8 a little, we will come to the realization that
the key ingredient in all these proofs is the distributive law. This law was explic-
itly mentioned in the proofs of Theorems 2.6 and 2.7 and is critical to the proof
of Theorem 2.8 as well because Theorem 2.7 lies behind Theorem 2.8. There is a
fundamental reason why this has to be the case, namely, what connects addition
to multiplication is the distributive law, and since we need to import information
about addition (such as (A3*) on page 97 and Lemma 2.1* on page 97) to multi-
plication, the distributive law is our only recourse.
we have
(−x)(−y) + (−(xy)) = (−x)(−y) + ((−x)y) (because (−x)y = −(xy))
= (−x)((−y) + y) (distributive law)
= (−x) · 0 = 0.
The proof of Theorem 2.8 is complete.
We conclude this section with three remarks. First, Theorem 2.8 yields an explicit
algorithm for the multiplication of rational numbers: if $\frac{m}{n}$ and $\frac{k}{\ell}$ are fractions,
then
\[ \frac{m}{n} \times \left(-\frac{k}{\ell}\right) = -\frac{mk}{n\ell}, \]
\[ \left(-\frac{m}{n}\right) \times \left(-\frac{k}{\ell}\right) = \frac{mk}{n\ell}. \]
In the next section, we will see that these formulas remain valid even when m, n,
k, ℓ are rational numbers (rather than just whole numbers).
Second, Theorem 2.7 gives us another way to think about how to remove paren-
theses, to the effect that −(x + y) = −x − y for all x, y ∈ Q (see Theorem 2.2* on
page 97). This is because
−(x + y) = (−1)(x + y) (Theorem 2.7)
= (−1)x + (−1)y (distributive law)
= −x + (−y) (Theorem 2.7 again)
= −x − y (by (2.9) on p. 97).
Third, we use Theorem 2.8 to prove the following form of the distributive law,
which is commonly taken for granted:
(2.25) x(y − z) = xy − xz for all x, y, z ∈ Q.
Indeed, by using the ordinary distributive law, we have x(y − z) = x(y + z ∗ ) =
xy + xz ∗ = xy + x(−z). But xy + x(−z) = xy + (−xz) by equation (2.23), so
x(y − z) = xy + (−xz). By (2.9) on p. 97, xy + (−xz) = xy − xz, and we have the
desired conclusion.
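The identities of Theorem 2.8 and of the second and third remarks lend themselves to a quick numerical spot-check. The sketch below is illustrative only: the sample values and the function names `check_theorem_2_8` and `check_remarks` are hypothetical, and a finite check does not replace the proofs.

```python
from fractions import Fraction
from itertools import product

# Arbitrary rational samples: negative, zero, and positive.
samples = [Fraction(-7, 5), Fraction(-2, 3), Fraction(0), Fraction(1, 3), Fraction(9, 4)]

def check_theorem_2_8(x, y):
    """(2.23): (-x)y = x(-y) = -(xy); (2.24): (-x)(-y) = xy."""
    return (-x) * y == x * (-y) == -(x * y) and (-x) * (-y) == x * y

def check_remarks(x, y, z):
    """Removing parentheses, -(x + y) = -x - y, and (2.25): x(y - z) = xy - xz."""
    return -(x + y) == -x - y and x * (y - z) == x * y - x * z

all_pairs_ok = all(check_theorem_2_8(x, y) for x, y in product(samples, repeat=2))
all_triples_ok = all(check_remarks(x, y, z) for x, y, z in product(samples, repeat=3))
print(all_pairs_ok, all_triples_ok)
```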
Exercises 2.4.
(1) Use (M1) and (M2) to give a direct, simple explanation of 0 · x = 0 for
any x ∈ Q.
(2) Compute and justify each step:
(a) $(-4)(-1\frac{1}{2} + \frac{1}{4})$,
(b) 165 − 560( 34 − 87 ),
(d) (20 29 × (− 17
5
)) +(3 29 × 17
5
).
(3) (a) Find a simple proof of (−1)n = −n for a whole number n without making
use of Theorem 2.7. (Hint: (−1)n = (−1) + (−1) + · · · + (−1), a sum of n terms.)
bring awareness to the fact that there are two ways to remove parentheses—
the other makes use of Theorem 2.2* on page 97—and both are based on
genuine mathematics.)
(8) Consider each of the following two statements about any rational number
x:
(a) 3x < x.
(b) $\frac{1}{10}x > x$.
If it is always true or always false, prove the statement. If it is sometimes
true and sometimes false, give examples to explain why.
(9) (a) I have a rational number x so that 5 − (2x − 1) = (1 − 83 x). What is
this x? (b) Same question for (2 − 3x) − (x + 1) = 53 x + 12 .
(10) (For this exercise, let us extend the definition on page 24 of Chapter 1 by
m
defining, for any rational number x and any fraction m n , the meaning of n
3 ∗
of x to be m n · x.) (a) A number x has the property that 4 of x exceeds
x itself by 49. What is this x? (b) A number t = 0 has the property that
twice t exceeds t2 by 47 of t. Find t.
2.5. Dividing rational numbers
As before, we begin such a discussion with the proof of a theorem that is the
counterpart of Theorem 1.8 on page 56.
For example, making use of Theorem 2.8 on page 109 in addition to Theorem
1.8, we have that if $x = -\frac{1}{3}$ and $y = \frac{2}{5}$, then $z = -(\frac{1}{3} \times \frac{5}{2})$. Similarly, if $x = \frac{7}{5}$ and
$y = -\frac{2}{3}$, then $z = -(\frac{7}{5} \times \frac{3}{2})$, or if $x = -\frac{7}{5}$ and $y = -\frac{2}{3}$, then $z = \frac{7}{5} \times \frac{3}{2}$. Note that,
except for the negative sign, the z in all cases is obtained by invert-and-multiply.
Proof. We will first prove the existence of such a z and worry about its uniqueness
later. If x = 0, we can just take z to be 0. Thus we may assume that x ≠ 0.
Thus with a nonzero x given, suppose y > 0. If also x > 0, then both x and
y are fractions and the existence of z is already known (see Theorem 1.8 on page
56). If however x < 0, then (−x) is a fraction and again there exists a fraction z′
such that (−x) = z′y. Thus x = −(−x) = −(z′y) = (−z′)y, by equation (2.23) in
Theorem 2.8 on page 109. So letting z = −z′, we have proved that x = zy when
x < 0. Together, we have proved that for any x ∈ Q, there is a z ∈ Q so that
x = zy in case y > 0.
Still with x ≠ 0 given, suppose y < 0. We have to prove that there is a z ∈ Q
so that x = zy. Since (−y) > 0, we know from the last paragraph that there
is a z′ ∈ Q so that x = z′(−y). Again, by equation (2.23), this is the same as
x = (−z′)y. Setting z = −z′, we have once more proved that there is a z ∈ Q so
that x = zy in case y < 0.
We have now proved that such a z exists in all cases.
Now we prove uniqueness, i.e., "one and only one z" if y ≠ 0. The proof is
a standard piece of mathematical reasoning which may take some getting used to,
but you cannot avoid encountering it in higher mathematics. We first prove that
if there are two numbers Y1 and Y2 so that yY1 = 1 and yY2 = 1, then necessarily
Y1 = Y2 . We begin by multiplying both sides of yY1 = 1 by Y2 to get
(2.26) Y2 (yY1 ) = Y2 .
By Theorem 2 in the appendix of Chapter 1 (page 88), the left side of (2.26) is
equal to (yY2 )Y1 . But yY2 = 1, so the left side of (2.26) is equal to Y1 , and we
have Y1 = Y2 , by (2.26). Therefore Y1 and Y2 coincide. Hence there is only one
number Y which satisfies yY = 1. This Y is called the multiplicative inverse of
the nonzero y.
Now we look at the general case. With arbitrary x, y given, suppose x = zy
for some number z. We will show that z = xY and is therefore unique (Y is the
multiplicative inverse of y). To this end, multiply both sides of x = zy by Y to get
xY = (zy)Y . But the right side is equal to
z(yY ) = z · 1 = z.
y ≠ 0,
(2.27) $x = zy \iff z = xy^{-1}$.
[Wu2020b]). In TSM, this corollary (for real numbers x and y) is known as the
zero product property or zero product rule, but when it is stated without
any proof in a course on algebra, it often leads to misapplications. For example,
students begin to believe that if (x − 5)(x + 1) = 7, then x − 5 = 7 or x + 1 = 7 (see
p. 223 of [MUST]). This is to be expected because, when students are not shown
any reasoning for such an assertion, they do not see the importance of having a 0
on the right side of the equality xy = 0 and are therefore inclined to extrapolate the
corollary to xy = 7 or xy = b for any number b. Another lesson to learn from this
student misconception is that we cannot afford to wait until algebra to teach this
corollary, because it leads to a misunderstanding that this is an abstract algebraic
fact. Thus, the corollary is sometimes proved only by appealing to the concept
of an integral domain in abstract algebra (see p. 250 of [MUST]), whereas it is
nothing more than a simple consequence of the existence of a multiplicative inverse
for every nonzero number. No additional abstractions are needed. Of course, this
kind of misunderstanding is also caused by the misinformation about "variables"
in TSM, and we will deal with this issue in Section 6.1 (pp. 298ff.) and Section
6.2 (pp. 322ff.). We therefore strongly suggest that you single out Corollary 1 and
its proof for emphasis when you teach rational numbers—and not wait for a course
in algebra—so that students are already mentally prepared before they take up
algebra. The more general form of the corollary where x and y are real numbers
will be found in Exercise 6 after Section 2.1 in [Wu2020c]. End of Pedagogical
Comments.
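The misapplication described in these comments is easy to examine on the equation (x − 5)(x + 1) = 7 itself. The sketch below is only an illustration; the function name `product_form` is hypothetical. Setting each factor equal to 7 gives the candidates x = 12 and x = 6. Of these, 12 fails, 6 happens to work by coincidence, and the genuine solution x = −2 is missed altogether.

```python
from fractions import Fraction

def product_form(x):
    """Left side of the equation (x - 5)(x + 1) = 7."""
    return (x - 5) * (x + 1)

# Candidates from the misapplied rule: x - 5 = 7 and x + 1 = 7.
wrong_rule_candidates = [Fraction(12), Fraction(6)]
values_at_candidates = [product_form(x) for x in wrong_rule_candidates]  # 7*13 and 1*7

# The equation is x^2 - 4x - 12 = 0, i.e. (x - 6)(x + 2) = 0, so the
# zero product property (with 0 on the right side) gives x = 6 or x = -2.
true_solutions = [x for x in (Fraction(6), Fraction(-2)) if product_form(x) == 7]

print(values_at_candidates, true_solutions)
```

The zero product property applies only after the equation has been rewritten with 0 on one side, which is exactly the hypothesis the comments above insist on.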
Proof. This can be verified separately for positive and negative y’s (see Exercise
3 on page 120), but it is also valuable to learn an abstract proof. Indeed, from
$1 = y^{-1}y$, we get $1 = (-(y^{-1}))(-y)$ (by equation (2.24) on page 109). Comparing
the latter with $1 = ((-y)^{-1})(-y)$ and using the uniqueness of the multiplicative
inverse of −y, we get $(-y)^{-1} = -(y^{-1})$, as claimed.
We emphasize that this $\frac{x}{y}$ here is a new notation, though it is one that extends
the old notation. In greater detail, the symbol $\frac{x}{y}$ makes sense thus far only when
x and y are fractions (page 57), but the x and y in the preceding definition are
possibly negative numbers and, for these, $\frac{x}{y}$ does not yet have a meaning. On the
other hand, if x and y are fractions, this meaning of $\frac{x}{y}$ of course coincides with the
old one on page 57.
The division of x by y is also called the quotient of x divided by y. It
follows from the definition that
(2.28) $x = \frac{x}{y}\, y$.
We note that equation (2.28) has the virtue of suggesting the "cancellation" of the
y’s to get x.
By equation (2.27) on page 113, the quotient z of x divided by y satisfies
$z = xy^{-1}$. Therefore we have the following equality since both sides are equal to
the quotient z of x divided by y:
(2.29) $\frac{x}{y} = xy^{-1}$.
Thus equation (2.29) says explicitly that dividing by y is the same as multiplying by
the (multiplicative) inverse of y. One sees that this is a continuation of the theme,
started on page 47, to the effect that division is nothing more than a different way
of writing multiplication.
We note that the definition of the quotient $\frac{x}{y}$ of x divided by y as the number
z so that x = zy can be symbolically represented as
(2.30) $\frac{x}{y} = z \iff x = zy$.
Assertion (2.30) is unfortunately the source that gives rise to the slippery phrase in
TSM, to the effect that "division and multiplication are inverse operations". Make
sure you understand that this phrase sounds good but actually makes no sense.
Since we have already explained this point at length in the Mathematical Aside on
page 55, we will not belabor the point any further.
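Equations (2.28) and (2.29) can be illustrated with exact rational arithmetic. The following sketch uses hypothetical helper names `inverse` and `quotient`; it defines the quotient as multiplication by the multiplicative inverse, as in (2.29), and then checks the defining property x = zy as well as the identity that the inverse of −y is the negative of the inverse of y.

```python
from fractions import Fraction

def inverse(y):
    """Multiplicative inverse of a nonzero rational number y."""
    if y == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return 1 / y

def quotient(x, y):
    """Quotient of x divided by y, computed as x times the inverse of y."""
    return x * inverse(y)

x, y = Fraction(-1, 3), Fraction(2, 5)
z = quotient(x, y)

defining_property_holds = (x == z * y)                 # equation (2.28)
inverse_of_negative = (inverse(-y) == -inverse(y))     # inverse of -y is -(inverse of y)
print(z, defining_property_holds, inverse_of_negative)
```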
that $-\frac{3}{7}$ is also the quotient of 3 divided by −7, then by the uniqueness part of
Theorem 2.9, $\frac{3}{-7}$ and $-\frac{3}{7}$ must be equal. To show that $-\frac{3}{7}$ is also the quotient of
3 divided by −7, it suffices to show (by definition of the quotient) that
\[ 3 = \left(-\frac{3}{7}\right) \times (-7). \]
But this is so because, by equation (2.24) on page 109, the right side is equal to
$\frac{3}{7} \times 7$, which is indeed equal to 3, by virtue of the cancellation rule (page 47). In a
This theorem will be seen to be a special case of a basic fact about so-called
rational quotients, to be introduced presently. But in terms of everyday computa-
tions in school mathematics, Theorem 2.10 is well-nigh indispensable and deserves
to be singled out. We can put Theorem 2.10 in a broader perspective. When x
and y are whole numbers, this theorem implies that every negative rational number
can be written as a division of two integers where the denominator is positive. For
example, $-\frac{3}{7} = \frac{-3}{7}$, as we have seen in (2.31). Since every positive rational number
(i.e., nonzero fraction) is already known to be a division of two positive integers
(see (iii) on page 58), we have proved the following.
Theorem 2.11. Every rational number can be written as a division of two
integers. In addition, the integers can be chosen so that the denominator is positive.
Theorems 2.11 and 2.10 are conceptually important because they give an alter-
nate conception of a rational number. In advanced mathematics, rational numbers
are sometimes defined as quotients of integers.
Theorems 2.11 and 2.10 imply that, for instance, the rational number $-\frac{9}{7}$ is
equal to $\frac{-9}{7}$ or $\frac{9}{-7}$, and the former is the preferred choice. Before explaining why
this is so, we first rewrite the algorithm for multiplying rational numbers in its final
form: if a, b, c, d are integers, then
\[ \frac{a}{b} \times \frac{c}{d} = \frac{ac}{bd}. \]
The proof is nothing more than a routine case-by-case verification. For example,
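\begin{align*}
\frac{-3}{7} \times \frac{-14}{-5} &= \left(-\frac{3}{7}\right) \times \frac{14}{5} && \text{(Theorem 2.10)} \\
&= -\left(\frac{3}{7} \times \frac{14}{5}\right) && \text{(equation (2.23))} \\
&= -\frac{3 \times 14}{7 \times 5}.
\end{align*}
By Theorem 2.10 again,
\[ -\frac{3 \times 14}{7 \times 5} = \frac{3 \times 14}{-(7 \times 5)}. \]
Hence,
\begin{align*}
\frac{-3}{7} \times \frac{-14}{-5} &= \frac{3 \times 14}{-(7 \times 5)} \\
&= \frac{(-3) \times (-14)}{7 \times (-5)} && \text{(equations (2.23), (2.24))}.
\end{align*}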
The reasoning for the general case is the same.
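The worked example can be replayed with Python's `Fraction` type, which stores every rational number as a quotient of two integers with a positive denominator, in the spirit of Theorem 2.11. This is only a sketch for checking the arithmetic; the variable names are chosen for illustration.

```python
from fractions import Fraction

# Left side: multiply the two quotients of integers.
lhs = Fraction(-3, 7) * Fraction(-14, -5)

# Right side: multiply numerators and denominators separately.
rhs = Fraction((-3) * (-14), 7 * (-5))

# Fraction normalizes the sign so that the denominator is positive.
normalized = Fraction(9, -7)
numerator_and_denominator = (normalized.numerator, normalized.denominator)

print(lhs == rhs, lhs, numerator_and_denominator)
```

Both sides reduce to −6/5, and the pair printed at the end is (−9, 7), that is, 9/(−7) is stored in the form (−9)/7.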
As a special case, we have that for integers a, b,
\[ \frac{a}{b} = \frac{1}{b} \times a. \]
We can now give an indication of why $\frac{-9}{7}$ is preferred, most of the time, over $\frac{9}{-7}$
Rational quotients
Just as the division of fractions led to the concept of complex fractions, the
division of rational numbers leads to a similar concept which, for lack of a name,
will be referred to as rational quotients. Let x, y, z, w be rational numbers so
that they are nonzero where appropriate in the following. Then $\frac{x}{y}$ is an example
of a rational quotient; x will be called its numerator, and y its denominator.
We now list the rational-quotient analogs of the basic properties (a)–(d) of complex
fractions on p. 70.
(a) $\frac{x}{y} = \frac{zx}{zy}$ for any nonzero z.
(b) $\frac{x}{y} = \frac{z}{w}$ if and only if xw = yz.
(c) $\frac{x}{y} \pm \frac{z}{w} = \frac{xw \pm yz}{yw}$.
(d) $\frac{x}{y} \times \frac{z}{w} = \frac{xz}{yw}$.
As in Section 1.6 (page 68), we will avoid proving (a)–(d) by the mechanical
procedure of writing out each rational number as a quotient of two integers for the
routine computations and relying on (2.23) on page 109 and the cancellation rule
(equation (1.29) on p. 47) to get the answer but will instead make repeated use of
the uniqueness part of Theorem 2.9. To prove (a), for example, let $A = \frac{x}{y}$, $B = \frac{zx}{zy}$,
and we will prove that A = B. By the definition of division in Q (page 115), we
have x = Ay and zx = B(zy). But the first equality implies zx = z(Ay), which is of
course the same as zx = A(zy). Now compare the latter with zx = B(zy). Theorem
2.9 says there is only one way to express zx as a rational multiple of zy, so we must
have A = B.
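The rules for rational quotients can be spot-checked on numerators and denominators that are themselves rational numbers rather than integers. In the sketch below the sample values are arbitrary and the names `rule_a` through `rule_d` are hypothetical labels for the cancellation rule proved above and for properties (b)–(d).

```python
from fractions import Fraction

# Rational (not integer) numerators and denominators, all nonzero.
x, y, z, w = Fraction(-2, 3), Fraction(5, 4), Fraction(7, -6), Fraction(3, 8)

# (a) cancellation: x/y = (zx)/(zy)
rule_a = x / y == (z * x) / (z * y)

# (b) cross-multiplication: x/y = z/w exactly when xw = yz
rule_b = (x / y == z / w) == (x * w == y * z)

# (c) addition and subtraction
rule_c = (x / y + z / w == (x * w + y * z) / (y * w)
          and x / y - z / w == (x * w - y * z) / (y * w))

# (d) multiplication
rule_d = (x / y) * (z / w) == (x * z) / (y * w)

print(rule_a, rule_b, rule_c, rule_d)
```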
$\left(\frac{z}{w}\right)^{-1} = \frac{w}{z}$.
and denominators. The cumulative effect of many such compulsory blind leaps
of faith in TSM is the corrosion of mathematics learning. It forces students to
ignore the fundamental fact that a mathematical conclusion is valid only when
certain hypotheses are satisfied (compare the discussion on pp. 69ff.). For survival,
students learn instead to apply indiscriminately whatever they know—regardless
of the circumstances—in order to get an answer. At the end, they come to believe
that every skill is unconditionally valid regardless of hypothesis.
Be sure to point out to your students the mathematical reasoning behind what
seem to be rote skills in Theorem 2.10, Theorem 2.11, and the rules (a) and (d) for
rational quotients. These skills lie behind the general form of invert-and-multiply
that justifies the preceding computation. End of Pedagogical Comments.
Exercises 2.5.
2.6. Comparing rational numbers
Recall the definition of x < y between any two numbers x and y (see page 91):
it means x is to the left of y on the number line.
[Figure: number line with the point x to the left of the point y.]
We also write y > x for x < y. A related symbol is x ≤ y (or y ≥ x), which
means x < y or x = y.
In this section, we begin to take a serious look at the comparison of rational
numbers8 ; i.e., if x and y are rational numbers, which of the following is true: x ≤ y
or x > y? We will prove several basic facts about inequalities that are useful in
school mathematics. In general, we use the symbol "<" exclusively, but you should
be aware that every one of these inequalities has an analogous statement about
"≤".
We take note of three simple properties of the inequality between numbers;
they are obvious consequences of the fact that numbers are points on the number
line, and it does not matter if they are rational numbers or just real numbers. The
first two are the following:
Reflexive property of "≤". If x ≤ y and y ≤ x, then x = y.
The third property deserves to be singled out because it plays a critical role in
many proofs. Given any two numbers x and y, then either they are the same point
or if they are distinct, one is to the left of the other; i.e., x is to the left of y or
y is to the left of x. These three possibilities are obviously mutually exclusive. In
symbols, this becomes:
Trichotomy law. Given two numbers x and y, then one and
only one of the three possibilities holds: x = y or x < y or x > y.
The way this law comes up in proofs is typically the following. Suppose we try to
prove that two numbers x and y are equal. Sometimes it is impossible or difficult
real numbers. For this reason, we will use on occasion "number" instead of "rational number" in
the discussion of this section.
For example, given 2 < 3, we can verify directly that 2 − 15 < 3 − 15 and
$2 + \frac{7}{3} < 3 + \frac{7}{3}$.
We first prove that x < y implies x + z < y + z for any z. So suppose x < y.
Because of the commutativity of addition, it suffices to prove z + x < z + y, or
equivalently, the endpoint of the vector z + x is to the left of the endpoint of the
vector z + y . By the definition of vector addition, both vectors z + x and z + y are
obtained by placing the starting points of x and y , respectively, at the endpoint
of z , and the endpoints of the displaced x and y , respectively, will be z + x and
z + y. Since by hypothesis, the endpoint of x is to the left of the endpoint of y , the
conclusion is immediate.
The following picture shows the case where x > 0 and y > 0:
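[Figure: number line showing the points z, z + x, z + y, 0, x, and y, with the vectors $\vec{x}$ and $\vec{y}$ displaced so that they start at z.]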
For example, (−5) < (−3) =⇒ (−3) − (−5) > 0 (because (−3) − (−5) = 2),
and conversely, (−3) − (−5) > 0 =⇒ (−5) < (−3).
First, we prove that x < y =⇒ y − x > 0. By (B), x < y implies x + (−x) <
y+(−x), which is equivalent to 0 < y−x. Conversely, we prove y−x > 0 =⇒ x < y.
Again we use (B), y − x > 0 implies that (y − x) + x > 0 + x, which is equivalent
to y > x, as desired.
Thus, 4 < 5 =⇒ $(\frac{23}{6})4 < (\frac{23}{6})5$ (because the left side is $\frac{92}{6}$ and the right side
is $\frac{115}{6}$), and (−11) < (−9) =⇒ 7(−11) < 7(−9) (because the left side is −77 while
the right side is −63).
We first prove that, with x, y, z as given, x < y =⇒ xz < yz. We give two
proofs.
First proof: By (C), x < y =⇒ y − x > 0. Since z > 0 by hypothesis and the
product of two positive numbers is positive (see page 107), we have (y − x)z > 0,
so that yz − xz > 0. By (C) again, xz < yz, as desired.
A second proof uses the theorem on fraction multiplication, which equates a
product with the area of a rectangle (Theorem 1.7 on page 48). Given z > 0 and
x < y, if x < 0 < y, then xz < 0 and yz > 0 and there would be nothing to
prove. Therefore we need only consider the cases where x and y have the same
sign (i.e., they are both ≥ 0 or both ≤ 0; see p. 122). So let 0 ≤ x < y. If
x = 0, obviously xz < yz. Thus we may assume 0 < x < y. Then the inequality
xz < yz is exactly inequality (i) on page 52 in Section 1.4. (Briefly, here is the
reason: for fractions x, y, and z, Theorem 1.7 on page 48 says xz and yz are areas
of rectangles with sides of length x, z and y, z, respectively. Since x < y, clearly
the rectangle corresponding to yz contains the rectangle corresponding to xz and
therefore has a greater area. Hence yz > xz.) Now suppose x < y ≤ 0. Again, if
y = 0, there is nothing to prove, so we may assume x < y < 0; then (−x), (−y) > 0.
Moreover x < y implies (−y) < (−x), by (A). Thus we know from the preceding
argument that (−y)z < (−x)z, which is equivalent to −yz < −xz (equation (2.23)
on page 109), and therefore yz > xz, by (A) again.
Finally, we prove the converse: if for some z > 0, xz < yz, then x < y. We
claim that $\frac{1}{z} > 0$. Indeed, since $z(\frac{1}{z}) = 1$ and 1 > 0, we see from page 107 that,
in order for the product of z with $\frac{1}{z}$ to be positive, z and $\frac{1}{z}$ have to be either both
positive or both negative. Since z > 0, $\frac{1}{z}$ has to be positive. Such being the case,
then by what we have just proved, $\frac{1}{z} > 0$ and xz < yz imply that $\frac{1}{z}(xz) < \frac{1}{z}(yz)$,
which is the same as x < y. (D) is proved.
9 It should be remarked that, in advanced mathematics, (C) is taken as the definition of x < y.
To students, the fact that, when z < 0, the inequality x < y would turn into
xz > yz is the most fascinating aspect about inequalities. This goes against every-
thing they have learned up to this point, which suggests that whatever arithmetic
operation they apply to an inequality would preserve that inequality. Here, how-
ever, is a situation where an inequality gets reversed. We first illustrate with some
examples whose validity can be easily verified (in each case, the initial inequality is
multiplied by −4 to get the second one):
1 < 2 but −4 > −8,
$\frac{3}{2} < \frac{15}{4}$ but −6 > −15,
$-2 < \frac{1}{2}$ but 8 > −2,
[Figure: number line with 0, x, and y from left to right.]
Then the relative positions of 2x and 2y are the same as those of x and y although
each is pushed further to the right of 0.
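[Figure: number line with 0, 2x, and 2y from left to right.]
[Figure: number line with −2y, −2x, 0, 2x, and 2y from left to right.]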
10 At the end of the preceding section, we warned against the tendency to assume that every
skill is universally applicable. There is no better illustration of the danger of this tendency than
the contrast between (D) and (E). One must begin to be sensitive to the fact that some facts are
true only under restrictive hypotheses.
We see that −2y is now to the left of −2x, so that −2y < −2x, as claimed. (Of
course, if z were $-\frac{1}{2}$, then x and y would both be pushed closer to 0 instead, but
the relative positions of $-\frac{x}{2}$ and $-\frac{y}{2}$ would still be the same as those of −2x and
−2y.)
It remains to prove that if z < 0, then xz > yz implies x < y. We claim that
$\frac{1}{z} < 0$. This is because $z(\frac{1}{z}) = 1$ and 1 is positive. Since z is negative, $\frac{1}{z}$ has to be
negative too as negative × positive = negative (see p. 107). Thus by the first part
of the proof, multiplying both sides of xz > yz by $\frac{1}{z}$ reverses the inequality; i.e.,
$\frac{1}{z}(xz) < \frac{1}{z}(yz)$. This is the same as x < y. The proof of (E) is complete.
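The contrast between multiplying an inequality by a positive number and by a negative number can be checked on a handful of rational numbers. The sketch below assumes, as the surrounding proofs indicate, that (D) concerns z > 0 (the inequality is preserved) and (E) concerns z < 0 (the inequality is reversed); the function name and sample values are hypothetical.

```python
from fractions import Fraction
from itertools import product

samples = [Fraction(-2), Fraction(-1, 2), Fraction(0), Fraction(3, 2), Fraction(15, 4)]
multipliers = [Fraction(-4), Fraction(-1, 2), Fraction(2), Fraction(23, 6)]

def multiplication_behaves(x, y, z):
    """For z > 0: x < y exactly when xz < yz. For z < 0: x < y exactly when xz > yz."""
    if z > 0:
        return (x < y) == (x * z < y * z)
    return (x < y) == (x * z > y * z)

all_ok = all(multiplication_behaves(x, y, z)
             for x, y in product(samples, repeat=2)
             for z in multipliers)

# The three examples from the text, each multiplied by -4.
reversed_examples = [(a * -4 > b * -4) for a, b in
                     [(Fraction(1), Fraction(2)),
                      (Fraction(3, 2), Fraction(15, 4)),
                      (Fraction(-2), Fraction(1, 2))]]
print(all_ok, reversed_examples)
```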
In the course of proving (E), we proved the following useful fact: let x ∈ Q be
nonzero; then
(2.34) $x > 0 \iff \frac{1}{x} > 0$.
This is because $x(\frac{1}{x}) = 1 > 0$, and x and $\frac{1}{x}$ cannot have opposite signs (i.e.,
one is negative and the other positive). Therefore x and $\frac{1}{x}$ are either both positive
or both negative, proving (2.34). In turn, (2.34) leads to another useful fact: for
x, y, z ∈ Q,
(2.35) $x < y \implies \frac{x}{z} < \frac{y}{z}$ if z > 0 and $\frac{x}{z} > \frac{y}{z}$ if z < 0.
Activity. True or false: x < y and 0 < z < w imply xz < yw. (Be careful.)
Absolute value
This can be proved by a case-by-case examination of the four cases where x and y
take turns being positive and negative. Since the reasoning is routine, the details
can be left to Exercise 3 on page 131. On the other hand, inequalities involving
absolute value tend to present difficulties to students, so let us discuss this topic
with some care.
If b is a positive number, then the set of all the numbers x so that |x| < b
consists of all the points x of distance less than b from 0, indicated by the thickened
segment below (excluding the endpoints):
[Number line: −b, 0, x, b marked from left to right; the segment between −b and b is thickened]
It follows that the inequality |x| < b for a point x is equivalent to the fact
that x satisfies both −b < x and x < b. It is standard practice in mathematics to
combine these two inequalities into a composite statement in the form of a double
inequality:
|x| < b is equivalent to −b < x < b.
In the usual notation for intervals on the number line, this becomes:
|x| < b is equivalent to x ∈ (−b, b).
(The set of all the points x on the number line satisfying c < x < d, for two
fixed points c and d, with c < d, is denoted by (c, d). This is called an open
interval with endpoints c and d. We apologize for the likely confusion of this
notation with the point in the coordinate plane whose coordinates are c and d—to
be introduced in Section 6.3 on pp. 331—but that is the way it is. In this context,
the "segments" in our usual discussion are denoted by [c, d], which consists of the
open interval (c, d) together with the two endpoints c and d; more explicitly, [c, d]
is the collection of all the points on the number line x so that c ≤ x ≤ d. We call
[c, d] the closed interval with endpoints c and d.) Henceforth, we will refer to
[c, d] interchangeably as either a segment or a closed interval.
The fact that a single inequality |x| < b involving absolute value is equivalent to
a double inequality −b < x < b is a very useful fact to keep in mind in considerations
involving absolute values. In the following, we sometimes refer to −b < x < b as
the associated double inequality of |x| < b. The following example illustrates
the way the conversion of an absolute value inequality into its associated double
inequality can be put to use.
Example. Determine all the numbers x so that |6x + 1| + 2¼ < 5, and show
them on the number line.
The inequality |6x + 1| + 2¼ < 5 is equivalent to |6x + 1| < 5 − 2¼ (by (B)
above), which is just |6x + 1| < 2¾, which, in turn, is equivalent to the double
inequality −2¾ < 6x + 1 < 2¾. The left inequality is equivalent to −2¾ − 1 < 6x
(by (B) again); i.e., −15/4 < 6x. Now we multiply both sides of this inequality by
1/6 and use (D) to conclude that it is equivalent to −15/24 < x. By exactly the same
reasoning, the right inequality 6x + 1 < 2¾ is equivalent to x < 7/24. Putting all this
together, we have that the inequality |6x + 1| + 2¼ < 5 is equivalent to the double
inequality −15/24 < x < 7/24. The collection of all the points x satisfying this double
inequality is the open interval (−15/24, 7/24) and is indicated by the thickened segment
in the picture (not including the endpoints).
[Number line: the open interval between −15/24 and 7/24, which contains 0, is thickened; caption: |6x + 1| + 2¼ < 5]
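The conclusion of the example can be spot-checked with exact rational arithmetic. The sketch below compares the original inequality with the double inequality −15/24 < x < 7/24 on a grid of test points; the grid, its step of 1/48, and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def satisfies(x):
    """The original inequality |6x + 1| + 2 1/4 < 5."""
    return abs(6 * x + 1) + F(9, 4) < 5

def in_interval(x):
    """The double inequality -15/24 < x < 7/24 obtained in the example."""
    return F(-15, 24) < x < F(7, 24)

# Test points from -2 to 2 in steps of 1/48; both endpoints of the interval are included.
grid = [F(k, 48) for k in range(-96, 97)]
agree = all(satisfies(x) == in_interval(x) for x in grid)
print(agree)  # True
```

Both endpoints fail the strict inequality, which is consistent with the interval being open.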
A basic property of absolute value is the following: for two numbers x and x₀,
(2.37) |x₀ − x| = the distance between x and x₀.
In particular, the length of an interval, (a, b) or [a, b], is just |a − b|.
There are three cases to consider: both x₀ and x are positive, one is positive
and the other is negative, and finally both are negative. First we look at the case
where both are positive. Since |x₀ − x| = |x − x₀|, we may assume x < x₀, so that
|x₀ − x| = x₀ − x.
[Number line: 0, x, x₀ marked from left to right]
Proving that x₀ − x is the distance between x and x₀ is, by definition (p. 125),
equivalent to proving that x₀ − x is the length of the segment [x, x₀], but this
follows from the definition of subtraction on page 38: x₀ − x is the length of the
remaining segment when [0, x] is taken away from [0, x₀]. (Recall: x₀ and x are
understood to be rational numbers but by FASM (Section 2.7 on pp. 133) may be
more liberally interpreted to be any numbers. See the footnote on page 121.) Now
consider the second case, where one is positive and the other is negative. Again,
because |x₀ − x| = |x − x₀|, we may assume that x < 0 and x₀ > 0, as shown:
[Number line: x, 0, x₀ marked from left to right]
Since x < 0, we have x = −|x|. Then |x₀ − x| = |x₀ − (−|x|)| = |x₀ + |x|| = x₀ + |x|,
and the claim is again obvious. Finally, the case of x₀ and x being both negative
is reduced to the first case because x₀ = −|x₀| and x = −|x|, so that
|x₀ − x| = |−(x₀ − x)| = |−x₀ − (−x)| = ||x₀| − |x||
= distance between |x₀| and |x|, by the first case
= distance between x₀ and x, by symmetry.
Here is the picture for the case in which both x₀ and x are negative:
[Number line: x₀, x, 0, |x|, |x₀| marked from left to right]
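The three cases in the proof of (2.37) can be illustrated numerically. In the sketch below, distance is taken to be the length of the segment between the two points, computed as right endpoint minus left endpoint; the function name and the sample points are our own assumptions.

```python
from fractions import Fraction as F

def distance(p, q):
    """Length of the segment between p and q: right endpoint minus left endpoint."""
    return max(p, q) - min(p, q)

# One sample pair (x, x0) for each of the three cases in the proof.
cases = {
    "both positive": (F(3, 4), F(5, 2)),
    "opposite signs": (F(-3, 2), F(2)),
    "both negative": (F(-7, 3), F(-1, 3)),
}
results = {name: abs(x0 - x) == distance(x, x0) for name, (x, x0) in cases.items()}
print(results)
```

Each entry of the dictionary is expected to be True, in agreement with (2.37).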
The importance of the concept of absolute value is hidden in TSM.11 See the
Pedagogical Comments on page 130 for an informal explanation of the importance.
Here we are content to illustrate how absolute value is used in a nontrivial way by
proving two basic inequalities involving absolute values. Here is the first one.
(see (2.22) on page 107). So we may as well assume both x and y to be nonzero.
Such being the case, we make use of (2.35) on p. 125 to rewrite the theorem as
2|xy| / (x² + y²) ≤ 1
for all x and y. Since x² + y² > 0, we have 1/(x² + y²) > 0 on account of (2.34) on
and that equality holds if and only if |x| = |y|. We know from a previous remark
on page 126 that this inequality is equivalent to the double inequality
−1 ≤ 2xy / (x² + y²) ≤ 1.
In this form, part (i) of the theorem is equivalent to asserting that the number
2xy / (x² + y²) is trapped inside the segment [−1, 1] between −1 and 1 for all x and y.
Without the absolute value sign, part (i) of the theorem merely says that
2xy / (x² + y²) ≤ 1.
−1 on the number line and, in particular, cannot be equal to −100. We see that the
presence of absolute value in the inequality of the theorem makes a big difference.
It remains to give the simple proof of Theorem 2.12. We first prove part (i)
in its original formulation:
2|xy| ≤ x² + y².
Let u = |x| and v = |y|; then 2|xy| = 2|x| · |y| by (2.36) on page 125. Thus
2|xy| = 2uv. Now we make the simple observation that for all numbers t, t² = |t|²;
this is clear when we first consider the case t ≥ 0 and then the case t < 0. Therefore,
we have x² = |x|² = |x| · |x| = uu = u². Similarly, y² = v². Thus part (i) becomes
the statement that
2uv ≤ u² + v²,
which is equivalent to 0 ≤ u² − 2uv + v², by (B) on page 122. In other words, part
(i) is equivalent to
u² − 2uv + v² ≥ 0
It may be thought that part (ii) of Theorem 2.12, which gives a necessary and
sufficient condition for the inequality to become an equality, is not interesting. The
opposite is true; see Exercise 18 on page 132 below.
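Both parts of Theorem 2.12, in the form 2|xy| ≤ x² + y² with equality exactly when |x| = |y|, can be tested on sample values. The following sketch does so with exact rational arithmetic; the list of values and the helper name gap are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Sample values, including 0 and pairs with equal absolute value.
values = [F(-5, 2), F(-1), F(-1, 3), F(0), F(1, 3), F(1), F(5, 2)]

def gap(x, y):
    """x^2 + y^2 - 2|xy|, which part (i) asserts is never negative."""
    return x * x + y * y - 2 * abs(x * y)

pairs = list(product(values, repeat=2))
part_i = all(gap(x, y) >= 0 for x, y in pairs)
part_ii = all((gap(x, y) == 0) == (abs(x) == abs(y)) for x, y in pairs)
print(part_i, part_ii)  # True True
```

The pairs with opposite signs, such as x = −1 and y = 1, are the ones where the absolute value in the statement matters.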
We conclude with what is probably the most basic inequality involving absolute
value in elementary mathematics.
Theorem 2.13 (Triangle inequality). For any numbers x and y, (i) |x+y| ≤
|x| + |y| and (ii) this (weak) inequality is an equality if and only if x and y are of
the same sign, i.e., both ≥ 0 or both ≤ 0.
Proof. We first prove part (i). If one of x and y is 0, then there is nothing to
prove. We assume therefore that both x and y are nonzero. The most elementary
proof is one using case-by-case examination of the inequality. There are four cases
to consider: (i) both x, y > 0, (ii) both x, y < 0, (iii) x < 0 but y > 0, and (iv)
x > 0 but y < 0. Such a proof would be boring, but it is quite instructive if you
want to get some down-to-earth feelings about absolute values.
We give a different proof, one that makes use of the fact that the inequality
|x| ≤ b is equivalent to the double inequality −b ≤ x ≤ b. This is a standard
proof but is also one from which one can learn something about absolute values.
Therefore, instead of proving |x + y| ≤ |x| + |y|, we prove the double inequality
−|x| − |y| ≤ x + y ≤ |x| + |y|. There is no question that −|x| ≤ x ≤ |x| and
12 The fact that for any two numbers u and v, (2.38) is true can be verified by a straightforward
application of the distributive law; this fact is discussed in a broader context on pp. 304ff.
−|y| ≤ y ≤ |y|. From −|x| ≤ x and −|y| ≤ y, we use the corollary of (B) on page
122 to conclude that −|x| − |y| ≤ x + y. Similarly, we use x ≤ |x| and y ≤ |y| and
the corollary of (B) to conclude that x + y ≤ |x| + |y|. Thus, we have proved both
inequalities in the double inequality.
Next we prove part (ii). There is no question that if x and y are of the same
sign, then equality holds in the inequality; i.e., |x + y| = |x| + |y|. We now prove
the converse; namely, |x + y| = |x| + |y| implies that x and y are of the same sign.
If one of x and y is 0, then x and y are already of the same sign and there is
nothing to prove. Assume therefore that both x and y are nonzero, and we will
use a contradiction argument. Suppose |x + y| = |x| + |y| and x, y are not of the
same sign. This means one of x and y is negative and the other is positive. For
definiteness, say x < 0 and y > 0. Then |x| + |y| = −x + y, but |x + y| = x + y
or |x + y| = −(x + y). In the case of the former, we have −x + y = x + y, so
that 2x = 0 and therefore x = 0. This contradicts x being negative. If the latter,
then −x + y = −(x + y), so that −x + y = −x − y, and therefore 2y = 0. Hence
y = 0 and this contradicts the positivity of y. Thus it is impossible that x and y are
not of the same sign when |x+y| = |x|+|y|. The proof of Theorem 2.13 is complete.
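The two parts of Theorem 2.13 lend themselves to a finite check as well. The sketch below verifies the inequality and the equality condition on a small set of rational numbers; the sample values and the helper same_sign are our own choices, with "same sign" read as in the statement of the theorem (both ≥ 0 or both ≤ 0).

```python
from fractions import Fraction as F
from itertools import product

values = [F(-4), F(-3, 2), F(-1, 5), F(0), F(1, 5), F(3, 2), F(4)]

def same_sign(x, y):
    """Both >= 0 or both <= 0, as in the statement of Theorem 2.13."""
    return (x >= 0 and y >= 0) or (x <= 0 and y <= 0)

pairs = list(product(values, repeat=2))
part_i = all(abs(x + y) <= abs(x) + abs(y) for x, y in pairs)
part_ii = all((abs(x + y) == abs(x) + abs(y)) == same_sign(x, y) for x, y in pairs)
print(part_i, part_ii)  # True True
```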
of the accuracy of these estimates, it can be said that they both miss the mark by
54 and it does not matter whether they are over or under by this amount. Thus
it is the absolute value of this difference, rather than the difference itself, that is
of primary interest. The absolute value in this instance provides exactly the right
tool to express the absolute error of such estimations.
A similar phenomenon surfaces when one tries to say that two numbers x and
y are "close" to each other, for example, in the consideration of the limit b of a
sequence (xn ) in Chapter 2 of [Wu2020c]. When n is large, one wants to say that
the distance between xn and b is small. In this situation, it is irrelevant whether
xn is to the left or to the right of b, because all one cares about is that the distance
between xn and b gets smaller and smaller:
[Two number lines: in one, xn lies to the left of b; in the other, xn lies to the right of b]
It is in this context that equation (2.37) on page 126 comes to the fore because
it allows us to express this fact simply as "|b − xn | is small". There is a more
important consideration, however. In order to verify that indeed |b − xn | is small
in a particular situation, often long computations involving absolute values will be
required. Therefore the concept of absolute value is needed, not only as a conve-
nient tool to express a finding at the end of a long process, but as an integral part
of the reasoning within the process itself. (Such computations are faintly suggested
in the proof of Theorem 2.12 on page 127 and are slightly better represented in the
proofs of parts (c) and (d) of Theorem 2.10 in [Wu2020c].) Because the concept
of limit shows up almost everywhere in advanced mathematics, one can begin to
get a glimpse of the importance of absolute values from this discussion. End of
Pedagogical Comments.
Exercises 2.6.
(c) (−2/3)/(4/7) or (14/3)(−2/8.5)?
(2) Determine all the numbers x which satisfy the inequality in each of the
following: (a) |x − 1| − 5 < 2/3, (b) 11 − |3 + 2x| > 2.5, (c) |2x − 3/5| ≥ 1/5,
(d) 3 − |2x − 5| ≥ 4.2.
(3) Prove (2.36) on p. 125, i.e., for any x, y ∈ Q, |xy| = |x| · |y|.
(4) If x and y are rational numbers and y ≠ 0, then prove that |x/y| = |x|/|y|.
(5) Prove that the trichotomy law (page 121) implies the reflexive property of
"≤" (see page 121).
(6) (a) If x, y, z, w are rational numbers and x ≤ y and w ≤ z, then show that
x + w ≤ y + z. (b) If, in addition, all four numbers ≥ 0, then show that
xw ≤ yz.
(7) (a) Let x, y, z, w ∈ Q, and let y, w > 0. Then prove that x/y < z/w ⇐⇒
xw < yz. (b) Give examples to show that both implications "x/y < z/w =⇒
xw < yz" and "xw < yz =⇒ x/y < z/w" are false without the assumption
[Figure: a picture in which one length is labeled |x|]
Assuming the usual area formula for a rectangle, use this picture to give a
proof of Theorem 2.12 on page 127. (Hint: The inequality of the theorem
is equivalent to |xy| ≤ ½(x² + y²).)
(17) Let a and b be numbers so that a < b, and let θ be a number. Prove that
a < (1 − θ)a + θb < b ⇐⇒ 0 < θ < 1. 13
13 Due to Ole Hald.
(18) (a) Prove the following inequality of arithmetic and geometric means
for two numbers: if s and t are nonnegative numbers, then √(st) ≤ ½(s + t),
and equality holds if and only if s = t. (The left side is called the geometric
mean of s and t while the right side is called their arithmetic
mean.) (b) Prove that among all rectangles with a fixed perimeter, the
square has the biggest area.
(19) If x and y are positive, then prove that (a) x² = y² if and only if x = y
and (b) x² < y² if and only if x < y. (Hint: Use the trichotomy law.)
(20) Expand the following outline of an argument into a valid proof of the
triangle inequality (Theorem 2.13 on page 129): to prove |x+y| ≤ |x|+|y|,
it suffices to prove |x + y|² ≤ (|x| + |y|)². This is so because
|x + y|² = (x + y)² = x² + 2xy + y²
≤ |x|² + 2|x| · |y| + |y|² = (|x| + |y|)².
2.7. FASM
We will give a precise statement of FASM in this section and put the last two
chapters on fractions and rational numbers in perspective.
Using the concept of rational quotients, we can state the Fundamental As-
sumption of School Mathematics (FASM):
The laws of operations for both addition and multiplication (asso-
ciative, commutative, and distributive), the formulas (a)–(d) on
page 118 for rational quotients, and the basic facts about inequal-
ities (A)–(E) on pp. 122–124 for rational numbers continue to
be valid when the rational numbers are replaced by real numbers.
FASM will be proved in Section 2.1 of [Wu2020c].
Next we turn to the treatment of the rational numbers in this volume. Our
starting point is the whole numbers; then we expand it to fractions, and finally to the
rational numbers by introducing negative fractions. In an upper division course on
abstract algebra, however, the field of rational numbers, Q, is usually introduced
in the following way. One starts with the whole numbers, N, and enlarges that to the
integers, Z, by adjoining to N the negative integers (i.e., the additive inverses of the
whole numbers). For example, there is no whole number x so that x + 3 = 0 but the
number −3 in Z has exactly this property: (−3) + 3 = 0. Z is now a commutative
ring in the sense that addition and multiplication are defined between any two
elements of Z, so that both satisfy the associative and commutative laws and so
that the distributive law holds. Moreover, every element in Z has an additive
inverse. Now Z has an additional property; namely, it has no divisors of zero, in
the sense that if m ≠ 0 and n ≠ 0, then the product mn ≠ 0. Such a ring is called
an integral domain. An integral domain can be further expanded to a field,
called the quotient field of the integral domain; in the case of Z, this quotient
field is exactly Q. The difference between an integral domain and its quotient field
is that every nonzero element of the former now has a multiplicative inverse in the
latter; e.g., while there is no integer z so that z · 2 = 1, the element 1/2 in Q does
satisfy (1/2) · 2 = 1. In fact, it turns out that all nonzero elements in Q, not only
those in Z, have multiplicative inverses in Q itself.
In summary, the main point of the two-step process of expansion, from N to
Z to Q—in the context of algebra—is that by passing from N to Q, we acquire an
additive inverse for every element of Q (in particular, for every element of N) and
also a multiplicative inverse for every nonzero element of Q (in particular, for every
nonzero element of N). This is the reason from the perspective of mathematical
structure that we need the field of rational numbers. It may be added that the
passage from an integral domain to its quotient field is completely standard and
merits at most a week and a half in a course on abstract algebra. But in school
mathematics, teaching fractions is of course spread over four or five years.
From the point of view of teaching school mathematics, something else should
also be brought out, namely, the fact that the passage from an integral domain to
its quotient field, while standard, is nevertheless abstract. To simplify matters a
little bit, let us illustrate with Z and Q. Let S be the subset of ordered pairs
of integers Z × Z consisting of all the elements (x, y) so that y ≠ 0. Introduce
an equivalence relation ∼ in S by defining (x, y) ∼ (z, w) if and only if xw = yz
("ordered" means, by definition, (x, y) = (z, w) if and only if x = z and y = w).
Denote the equivalence class of (x, y) in S by x/y. The set of all such x/y is what we
call Q. Identify Z with the set of all elements of the form x/1, and we have Z ⊂ Q.
It is in this sense that Q is an extension of Z. Finally, we convert Q into a ring by
defining addition and multiplication in Q as
(2.39) x/y + z/w = (xw + yz)/(yw) and x/y · z/w = xz/(yw).
Of course we routinely check the compatibility of these definitions with the equiv-
alence relation and the fact that every nonzero element of Q has a multiplicative
inverse. This is what we normally teach our math majors in three to four lectures.
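To make the abstract construction concrete, here is a minimal Python sketch that represents an element of S as a pair (x, y) with y ≠ 0, and implements the equivalence relation and the two operations in (2.39). The function names are our own, and the check that (y, x) serves as a multiplicative inverse of a nonzero (x, y) is the standard fact, stated here as an assumption rather than taken from the text.

```python
def equivalent(p, q):
    """(x, y) ~ (z, w) if and only if xw = yz."""
    (x, y), (z, w) = p, q
    return x * w == y * z

def add(p, q):
    """Addition as in (2.39): x/y + z/w = (xw + yz)/(yw)."""
    (x, y), (z, w) = p, q
    return (x * w + y * z, y * w)

def mul(p, q):
    """Multiplication as in (2.39): x/y . z/w = xz/(yw)."""
    (x, y), (z, w) = p, q
    return (x * z, y * w)

# Compatibility: replacing each pair by an equivalent pair gives an equivalent result.
p, p_alt = (1, 2), (-3, -6)    # both represent 1/2
q, q_alt = (2, 3), (10, 15)    # both represent 2/3
compatible = (equivalent(add(p, q), add(p_alt, q_alt))
              and equivalent(mul(p, q), mul(p_alt, q_alt)))

# For a nonzero (x, y), the swapped pair (y, x) multiplies with it to the class of 1.
inverse_ok = equivalent(mul((3, -4), (-4, 3)), (1, 1))
print(compatible, inverse_ok)  # True True
```

Note that neither add nor mul ever reduces a fraction or looks for a common denominator; the equivalence relation takes care of the different representatives.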
An obvious but relevant comment is that the above equivalence relation, to the
effect that (x, y) ∼ (z, w) if and only if xw = yz, when written in the ordinary
fractional notation, becomes
x/y = z/w if and only if xw = yz.
One recognizes that this is precisely the statement that the cross-multiplication
algorithm (Theorem 1.2 on page 22) holds in Q. Furthermore, one sees in the
definition of addition in (2.39) that there is no mention of the least common de-
nominator at all. Therefore a knowledge of abstract algebra is beneficial to school
teachers at least in two respects: they gain an appreciation of the significance of the
cross-multiplication algorithm (and are therefore less likely to ban their students
from using it), and they also realize how misguided it is to insist on the use of the
least common denominator in fraction addition. Having said that, let us return to
the point about the abstraction inherent in this way of defining Q: this is definitely
not something you want to bring back to elementary or middle school. It is this
disconnect between abstract algebra and what can be taught in the school class-
room that we must keep in mind when we approach Chapter 1. We are forced to
introduce Q by using something less abstract than equivalence classes of ordered
pairs of integers, and the number line seems to be an acceptable compromise. For
example, the product formula (Theorem 1.6 on page 46) is among the most sub-
stantial theorems to be proved in these two chapters. We choose to give a proof on
the basis of a definition of fraction multiplication in terms of the number line (page
45) because the alternative is to define multiplication as in (2.39); i.e., x/y · z/w = xz/(yw).
But fifth and sixth graders do not take kindly to having multiplication of fractions
defined for them by a formula. Indeed, TSM has done this very thing to them for
decades, and we know only too well the result: massive nonlearning.
One can also gain some perspective on the introduction of negative numbers on
the number line in Section 2.1 (page 90). In an abstract algebra course, the negative
integers are elements of an abstract ring, Z, and are introduced before Q is defined.
Of course, negative fractions then appear as equivalence classes of ordered pairs of
integers. It is a case of abstractions piled on top of abstractions. In Chapters 1 and
2, we go from N to fractions—the nonnegative numbers in Q—before introducing
negative numbers all at once as points to the left of 0 on the number line. As in the
case of fractions, the goal is to make negative numbers as "concrete" as possible.
The fear of negative numbers in middle school is well known, and one can only
speculate that this fear is the result of presenting negative numbers as abstract
quantities in TSM and teaching their arithmetic properties such as (negative) ×
(negative) = (positive) by rote. This explains the emphasis on reasoning and proofs
for all the arithmetic computations in Z in Chapter 2. It may be mentioned that
the proof of Theorem 2.8 given on page 109 of (negative) × (negative) = (positive)
is in essence what would be given in an abstract algebra course. However, because
this proof is given only after an elaborate discussion of special cases (see Section
2.4), it is hoped that the proof will finally make sense and that some version of it
can be used in the school classroom.
CHAPTER 3
Book VII of Euclid’s Elements ([Euclid2]). See Propositions 1 and 2 therein. Euclid, whose name
has become part of the English language (and probably every language), was a Greek mathemati-
cian who lived in Alexandria (in Egypt, then a Greek colony) around 300 BC. Essentially nothing
is known about him other than the fact that he authored the Elements, a comprehensive account
of the mathematical knowledge known at his time, which probably contains his reorganization
together with his own original contributions. To the general public, "Euclid" is synonymous with
plane geometry, but it is a fact that only the first six of the thirteen books of this work are devoted
to plane geometry. Beyond the Euclidean algorithm, the fundamental theorem of arithmetic and
his (Euclid’s) famous proof of the infinitude of primes both appear in Book IX. The Elements is
the work that laid the foundation of not only mathematics but of modern science as well.
138 3. THE EUCLIDEAN ALGORITHM
We will begin with some basic concepts about whole numbers that will allow
us to define precisely the meaning of the reduced form of a given fraction. We start
at the beginning. A nonzero integer d is a divisor or a factor of an integer a if
a = cd for some integer c. (Thus a divisor is nonzero, by definition.) Sometimes we
also say d divides a. Another way to say d divides a is to say that the rational
number a/d is an integer. We write d|a when this happens, and we also say a is an
(integral) multiple of d. If d does not divide a, we write d ∤ a. We also call an
expression of a as a product, a = cd, a factorization of a.
Observe that 1 divides every integer, as does −1. Also observe that (i) if k|ℓ
and ℓ|m, then k|m and (ii) every nonzero integer divides 0. The simple proofs are
left to Exercise 1 on page 146.
In the following discussion, most of the time all the integers involved will be
whole numbers, i.e., integers which are positive or 0. However, there are one or two
places where it would become very awkward if we restrict ourselves only to whole
numbers (cf. part of the proof of the Euclidean algorithm on pp. 144ff.). For this
reason, we bring in integers from the beginning. When we need to focus on whole
numbers exclusively, we will be explicit about it, e.g., the concept of a prime in the
next section.
Consider now two whole numbers a and d, where at least one of a and d is not
equal to 0. An integer is said to be a common divisor of a and d if it divides
both a and d. Note that any two such whole numbers a and d have at least two
common divisors, namely, ±1. A whole number c is said to be the GCD (greatest
common divisor) of whole numbers a and d if, among all the common divisors of
a and d, c is the greatest. Notation: GCD(a, d). Observe that GCD(a, d) is
well-defined. Indeed, let us say a > 0. Then the set of common divisors of a and d
contains at least 1, and no common divisor exceeds a, so GCD(a, d) is just the largest
common divisor among the finite set of integers 1, 2, . . . , a. Two whole numbers a and d are said to be
relatively prime if GCD(a, d) = 1. For example, GCD(125, 64) = 1.
For a later need, we will give another approach to GCD. Given a whole number
n, let D(n) denote the set of all the divisors of n. For example, D(0) is the set of
all nonzero integers and D(1) contains only the two integers ±1. Clearly,
the set of all common divisors of a and d = D(a) ∩ D(d).
Then still assuming that at least one of the whole numbers a and d is nonzero so
that D(a) ∩ D(d) is a finite set, we have
(3.1) GCD(a, d) = max{D(a) ∩ D(d)},
where max indicates the largest number in the finite set. If a > 0, then the fact
that D(0) is the set of all nonzero integers implies that
(3.2) GCD(a, 0) = a.
In this notation, we also see that a and d being relatively prime is equivalent to
D(a) ∩ D(d) = {±1}.
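For small numbers, (3.1) can be carried out literally: list the divisors and take the largest common one. The sketch below does this for nonzero integers only, since D(0) is infinite; the function names are our own, and this brute-force search is an illustration of the definition, not the algorithm developed later in the chapter.

```python
def divisors(n):
    """D(n) for a nonzero integer n: all nonzero integers that divide n."""
    if n == 0:
        raise ValueError("D(0) is the infinite set of all nonzero integers")
    positive = [d for d in range(1, abs(n) + 1) if n % d == 0]
    return set(positive) | {-d for d in positive}

def gcd_by_definition(a, d):
    """GCD(a, d) = max of D(a) ∩ D(d), as in (3.1), for positive a and d."""
    return max(divisors(a) & divisors(d))

print(sorted(divisors(1)))          # [-1, 1]
print(gcd_by_definition(125, 64))   # 1
print(gcd_by_definition(897, 221))  # 13
```

The search tries every candidate up to the size of the numbers involved, which is exactly the kind of work a good algorithm should avoid.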
A fraction m/n is said to be a reduced form of a given fraction k/ℓ if m/n = k/ℓ and m and
n are relatively prime. In general, a fraction with the property that its numerator
and denominator are relatively prime is said to be in lowest terms, or reduced.
A fact taken for granted in elementary school is that any fraction has a reduced
form and that there is only one. When classroom instruction focuses entirely on
fractions with a single-digit numerator and denominator—a common practice in
TSM—the reduced form of a fraction can be obtained by visual inspection. For
fractions with larger numerators and denominators, however, deciding whether a
fraction is in reduced form is often not obvious. For example, is the fraction 1147/899
reduced? (It is not. See Exercise 7 on page 146.) The purpose of this section is to
clarify this situation once and for all by proving the following theorem. The state-
ment requires that we define the term algorithm: it is an explicit finite procedure
that leads to a desired outcome.2
Theorem 3.1. Every nonzero fraction has a unique reduced form. Further-
more, this reduced form can be obtained by an algorithm.
Proof (beginning).3 We will first prove the fact that every nonzero fraction has
a reduced form, as follows. Let a fraction k/ℓ be given, k > 0. Let c, c > 0, be the
GCD of k and ℓ, and let k = ck′ and ℓ = cℓ′ for some whole numbers k′ and ℓ′. We
claim
k′/ℓ′ is the reduced form of k/ℓ.
The equality k/ℓ = k′/ℓ′ is because of equivalent fractions (Theorem 1.1 on page 20).
We will prove the fact that k′/ℓ′ is reduced by contradiction. Suppose it is not. Then
k′ and ℓ′ have a common divisor c′ > 1; let us say k′ = c′k₀ and ℓ′ = c′ℓ₀ for some
whole numbers k₀ and ℓ₀. Then
k = ck′ = cc′k₀ and ℓ = cℓ′ = cc′ℓ₀.
It follows that cc′ is a common divisor of k and ℓ. Since c′ > 1, cc′ > c, and
this contradicts the fact that c is the greatest of the common divisors of k and ℓ.
Therefore k′/ℓ′ is reduced. This then proves that every fraction has a reduced form.
Showing that a reduced form of a fraction always exists is the easy part of
proving Theorem 3.1, however. To complete the proof of the theorem, we also need
to show:
(a) The reduced form of a fraction is unique.
(b) There is an explicit finite procedure (i.e., an algorithm) that unfailingly
yields the reduced form of a fraction.
Neither is trivial, and the proof of both needs the Euclidean algorithm, which is
the subject of the next subsection.
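The existence argument translates directly into a short computation: divide the numerator and the denominator by their GCD. In the sketch below, Python's library function math.gcd stands in for the GCD (how to compute it by hand is the business of the next subsection), and the function name reduced_form and the sample fraction 897/221 are our own choices.

```python
from math import gcd

def reduced_form(k, l):
    """Divide numerator k and denominator l (both positive) by c = GCD(k, l)."""
    c = gcd(k, l)
    return k // c, l // c

k_red, l_red = reduced_form(897, 221)
print(k_red, l_red)        # 69 17
print(gcd(k_red, l_red))   # 1
```

The second line of output reflects the contradiction argument in the proof: once the GCD has been divided out, no common divisor greater than 1 remains.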
For the statement of the Euclidean algorithm, we have to first recall the well-
known procedure of division-with-remainder: given positive integers a and d, the
division-with-remainder of a by d is given by the equation
(3.3) a = qd + r where q, r ∈ N and 0 ≤ r < d.
2 Note that an algorithm can also refer to an infinite process so that the numbers produced
by the successive steps form a sequence that converges to the desired outcome.
3 The conclusion of the proof is on page 145.
The remainder of this subsection is devoted to the proof of the Euclidean algo-
rithm (the proof itself is given on page 143).
The overriding idea that runs through the proof of the Euclidean algorithm is
captured by the following lemma.
We should point out right away that while the equality a = qd + r in the lemma
is obviously suggested by the division-with-remainder of a by d in (3.3), there is
one small difference: the requirement of "0 ≤ r < d" in (3.3) is irrelevant for the
validity of Lemma 3.3 itself.
Because we have been emphasizing the purposefulness of mathematics (cf. page
xiii), we should ask—before we even bother with the proof of the lemma—what
purpose this lemma serves in our search for an algorithm that yields the GCD of
two whole numbers. A simple example will provide a satisfactory answer to this
question. Let us try to get the GCD of 897 and 221. The numbers 897 and 221 are
relatively big and—without resorting to any computer software—it is not so easy
to see what their common divisors are. However, Lemma 3.3 suggests that we look
at the division-with-remainder of 897 by 221:
(3.5) 897 = (4 × 221) + 13.
The (as yet unproven) lemma now guarantees that
GCD(897, 221) = GCD(221, 13).
Compared with (897, 221), the pair (221, 13) is a much smaller pair of whole num-
bers and the level of difficulty of our search has been reduced accordingly. If we
are lucky or clever, we may notice that 13 divides 221 and therefore GCD(221, 13)
= 13. The GCD of 897 and 221 is thus 13 and we are done. However, since we
are searching for an algorithm, we must produce a procedure that yields the GCD
without the intervention of luck or cleverness for its implementation.5 Once more,
4 Mathematical Aside: In abstract algebra, this is of course the division algorithm for integers,
but in school mathematics, one cannot afford to use this terminology because it causes confusion
with the long division algorithm.
5 An algorithm is an explicit finite procedure that leads to a desired outcome, period. There
Lemma 3.3 comes to the rescue: if it was helpful before, it would likely be helpful
again. Let us therefore use it on 221 and 13. The division-with-remainder of 221
by 13 now states
221 = (17 × 13) + 0.
Therefore Lemma 3.3 implies that GCD(221, 13) = GCD(13, 0), and the latter is
just 13, by (3.2) on page 138. We may therefore summarize our findings as follows:
by iterating the division-with-remainder (3.3) in the manner of Lemma 3.3, we
obtained
GCD(897, 221) = GCD(221, 13) = GCD(13, 0) = 13.
This is Lemma 3.3 at work.
Incidentally, the equality in (3.5) already expresses the GCD 13 in terms of 897
and 221, because it says 13 = 897 − (4 × 221). So by (2.23) on page 109, we get
13 = (1 × 897) + ((−4) × 221).
We have therefore verified the Euclidean algorithm completely in this special case.
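The iteration just carried out by hand for 897 and 221 can be sketched in a few lines of Python. The function name gcd_steps is our own; Python's built-in divmod supplies the quotient and remainder of the division-with-remainder (3.3) for positive whole numbers.

```python
def gcd_steps(a, d):
    """Iterate division-with-remainder; return the successive pairs and the GCD."""
    pairs = [(a, d)]
    while d != 0:
        q, r = divmod(a, d)   # a = q*d + r with 0 <= r < d
        a, d = d, r           # replace the pair (a, d) by the pair (d, r)
        pairs.append((a, d))
    return pairs, a           # the last pair is (GCD, 0)

pairs, g = gcd_steps(897, 221)
print(pairs)                  # [(897, 221), (221, 13), (13, 0)]
print(g)                      # 13
print(1 * 897 + (-4) * 221)   # 13
```

The last line confirms the expression of 13 in terms of 897 and 221 read off from (3.5).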
We can now approach the proof of Lemma 3.3 with the prior assurance that it
will indeed be useful for our purpose.
Proof of Lemma 3.3. We will in fact prove something slightly more general. To
this end, we need to define what it means for two sets A and B to be equal:
A = B means A ⊂ B and B ⊂ A (i.e., two sets are equal, by definition, if they have
the same collection of elements in the sense that any element that belongs to one
set also belongs to the other set). With this understood, we will prove that the
equality among whole numbers a = qd + r implies the following equality of sets:
(3.6) D(a) ∩ D(d) = D(d) ∩ D(r).
By (3.1) on page 138, (3.6) implies (3.4) and hence also Lemma 3.3.
The proof of (3.6) hinges on the following simple
Observation: Suppose A, B, C are integers and A = B + C,
and suppose an integer n divides any two of A, B, C. Then n
divides all three.
The proof is straightforward (see Exercise 2 on page 146).
To prove (3.6), let us first prove one of the requisite inclusion relationships:
D(a) ∩ D(d) ⊂ D(d) ∩ D(r).
Suppose an integer n belongs to the left side; i.e., n divides both a and d. Then
we will show that it also divides both d and r. In other words, we have to show
that if n divides both a and d, then it also divides r in a = qd + r. Indeed, n
dividing both a and d implies n divides a and qd in a = qd + r, and therefore n has
to divide the third number r, by the preceding observation. We have proved the
desired inclusion.
The proof of the reverse inclusion is entirely similar. This proves equation (3.6)
and the proof of Lemma 3.3 is complete.
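The set equality (3.6) can be tested by brute force for small whole numbers. The sketch below builds a = qd + r from arbitrary whole numbers q, d, r with d ≥ 1, deliberately without requiring 0 ≤ r < d, and compares the two sets of common divisors. The function name and the ranges are assumptions made for illustration.

```python
def common_divisors(m, n):
    """D(m) ∩ D(n) for whole numbers m and n that are not both 0."""
    bound = max(m, n)
    # A nonzero k divides m exactly when m % k == 0; this also covers m == 0.
    positive = {k for k in range(1, bound + 1) if m % k == 0 and n % k == 0}
    return positive | {-k for k in positive}

def check_3_6(q, d, r):
    """With a = q*d + r, test whether D(a) ∩ D(d) equals D(d) ∩ D(r)."""
    a = q * d + r
    return common_divisors(a, d) == common_divisors(d, r)

# r ranges beyond d on purpose: the condition 0 <= r < d is not needed for (3.6).
all_equal = all(
    check_3_6(q, d, r)
    for q in range(0, 6)
    for d in range(1, 21)
    for r in range(0, 41)
)
print(all_equal)  # True
```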
Now that we know Lemma 3.3 is correct, let us concentrate on the business
at hand, which is to use the lemma to prove the Euclidean algorithm in general.
To this end, we will work through a more elaborate example to get a better feel
for the process: let us determine the GCD of 10049 and 1190. Thus we carry out
142 3. THE EUCLIDEAN ALGORITHM
Proof of the Euclidean algorithm. Given the whole number pair a and d,
with a > d ≥ 1, division-with-remainder yields
a = qd + r, where 0 ≤ r < d.
By Lemma 3.3, we have the equality GCD(a, d) = GCD(d, r). Therefore the deter-
mination of the GCD of a and d is replaced by the determination of the GCD of d
and r. Observe that a > d and d > r. Lemma 3.3 now suggests that we again apply
division-with-remainder to the pair d and r; then the determination of GCD(d, r)
will be replaced by the determination of the GCD of a yet-smaller pair, and so on.
After a finite number of steps, this process must terminate with a pair of whole
numbers whose second member is 0. Let us say we get the following in succession:
$a = qd + r$,
$d = q_1 r + r_1$,
$r = q_2 r_1 + r_2$,
$r_1 = q_3 r_2 + r_3$,
$r_2 = q_4 r_3 + 0$.
The relationship between the remainder and the divisor in (3.3) on p. 139 implies
$d > r > r_1 > r_2 > r_3 > 0$.
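The succession of divisions-with-remainder just displayed can be generated by a few lines of Python. This is a minimal sketch under our own naming (the function euclid_steps is hypothetical); it is applied to the pair 10049 and 1190 mentioned earlier, with the arithmetic left to the machine.

```python
def euclid_steps(a, d):
    """Return the divisions-with-remainder (a, q, d, r), one per step, until r = 0."""
    steps = []
    while d != 0:
        q, r = divmod(a, d)        # a = q*d + r with 0 <= r < d
        steps.append((a, q, d, r))
        a, d = d, r                # by Lemma 3.3, GCD(a, d) = GCD(d, r)
    return steps


for a, q, d, r in euclid_steps(10049, 1190):
    print(f"{a} = {q} x {d} + {r}")
# 10049 = 8 x 1190 + 529
# 1190 = 2 x 529 + 132
# 529 = 4 x 132 + 1
# 132 = 132 x 1 + 0
```

The divisor in the last line, where the remainder is 0, is the GCD; here it is 1, so 10049 and 1190 are relatively prime.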
Note that the division with remainder can, in principle, continue for d − 1 steps
before it terminates with remainder 0, but for simplicity of writing, we have allowed
Lemma 3.4 (Key lemma). Suppose ℓ, m, n are nonzero whole numbers and
ℓ|(mn). If ℓ and m are relatively prime, then ℓ|n.
The reason we call this the "key lemma" is that it is the linchpin in the proof
of the uniqueness of the reduced form of a fraction as well as in the proof of the
all-important uniqueness of the prime decomposition of a whole number in the
fundamental theorem of arithmetic (see page 149). This proof shows why it is
important to be able to express the GCD of two numbers as an integral linear
combination of them.
Before proving the key lemma, let us make sure that we know what it says. It
is easy to see that a whole number can divide a product without dividing either
factor. Thus, 6|(3 × 4), but 6 ∤ 3 and 6 ∤ 4, or more elaborately, 63|(72 × 245), but
63 ∤ 72 and 63 ∤ 245. However, what the key lemma says is that ℓ must divide one
of the factors in the product if ℓ is relatively prime to the other factor. It goes
6 This process actually terminates with a 0 remainder in far fewer than half of (d − 1) steps.
Proof of key lemma. The following brilliant proof is (so far as we can determine)
due to Euclid. We are given whole numbers ℓ, m, and n, so that ℓ|mn and ℓ and
m are relatively prime. We must prove ℓ|n. Since ℓ and m are relatively prime,
GCD(ℓ, m) = 1. By the Euclidean algorithm, 1 = αℓ + βm for some integers α and
β. Multiply this equation through by n, and we get n = αℓn + βmn. Since ℓ divides
mn by hypothesis, ℓ|(βmn); obviously, ℓ|(αℓn). Therefore ℓ divides αℓn + βmn,
which is n. In other words, ℓ divides n. The proof is complete.
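To see the proof at work on actual numbers, one can compute the integers α and β and look at the two terms of n = αℓn + βmn separately. The Python sketch below is ours and purely illustrative; the function bezout and the sample values 9, 20, 63 are assumptions made for the sake of the example, and ell stands for ℓ.

```python
def bezout(ell, m):
    """Return integers (alpha, beta) with alpha*ell + beta*m == GCD(ell, m)."""
    old_r, r = ell, m
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_x, old_y


ell, m, n = 9, 20, 63            # 9 divides 20*63, and 9 and 20 are relatively prime
alpha, beta = bezout(ell, m)     # 1 = alpha*ell + beta*m
first_term = alpha * ell * n     # visibly a multiple of ell
second_term = beta * m * n       # a multiple of ell because ell divides m*n
print(first_term + second_term == n)                    # True
print(first_term % ell == 0, second_term % ell == 0)    # True True
```

Since both terms are multiples of ℓ, so is their sum n, which is exactly the conclusion of the key lemma.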
(Theorem 1.2 on page 22), we have kℓ′ = k′ℓ. Obviously k divides the left side of
the equation, which is kℓ′; therefore k also divides the right side of the equation,
which is k′ℓ. Since k/ℓ is reduced, k and ℓ are relatively prime. Therefore the
key lemma implies that k|k′, so that k ≤ k′. We now look at kℓ′ = k′ℓ from
a different angle. Since k′|(k′ℓ), we have k′|(kℓ′). Since k′/ℓ′ is also reduced, k′
and ℓ′ are relatively prime. By the key lemma again, we must have k′|k and thus
k′ ≤ k. Together with k ≤ k′, we get k = k′. Using kℓ′ = k′ℓ, we conclude
that also ℓ = ℓ′, as desired.
Exercises 3.1.
(1) (i) If k, ℓ, m are integers and if k|ℓ and ℓ|m, then prove that k|m. (ii) Show
that every nonzero integer divides 0. (Caution: Use the precise definition
of divisibility.)
(2) Suppose A, B, C are whole numbers and A = B + C. Show that if a
whole number n divides any two of A, B, C, then n divides all three.
(3) Prove that the number 3 is a divisor of a whole number n if and only if 3
is a divisor of the number obtained by adding up all the digits of n. (Hint:
3 dividing a power of 10 always has remainder 1.)
(4) Repeat Exercise 3 with the number 3 replaced by the number 9.
(5) Prove that a whole number with 3 or more digits is divisible by 4 precisely
when the number formed by its last two digits (i.e., its tens digit and ones
digit) is divisible by 4. (Thus 93748 is divisible by 4 because 48 is divisible
by 4.)
(6) Prove that a whole number is divisible by 5 if and only if its last digit is
0 or 5.
(7) In each of the following, find the reduced form of the fraction: (a) 160/256,
(b) 273/156, (c) 144/336, (d) 1147/899.
(8) (i) Find the GCD of each of the following pairs of numbers by listing all
the divisors of each number and compare: 35 and 84, 54 and 117, 104 and
195. (ii) Find the GCD of each of the same pairs of numbers by using the
Euclidean algorithm.
(9) Find the GCD of each of the following pairs of numbers, and express it as
an integral linear combination of the numbers in question: 322 and 159,
357 and 272, 671 and 2196.
(10) Let the GCD of two positive integers a and d be k, and let k = ma−nd for
some whole numbers m and n. Show that m and n are relatively prime.
(11) Let m/n be the reduced form of a fraction a/b; then prove that a = γm and
b = γn, where γ is the GCD of a and b.
(12) The effectiveness of the Euclidean algorithm depends on how fast the
remainders in the sequence of iterated divisions-with-remainder get to 0.
This exercise gives an indication. Given a division-with-remainder of a by
d,
a = qd + r.
We may assume that the division-with-remainder is nontrivial in the
sense that a > d and r > 0. Then first prove (i) $r < \frac{1}{2}a$.
Next, we iterate the divisions-with-remainder as in the Euclidean
algorithm:
$a = qd + r$,
$d = q_1 r + r_1$,
$r = q_2 r_1 + r_2$.
Then, assuming that the divisions-with-remainder are also nontrivial,
prove (ii) $r_1 < \frac{1}{2}d$, (iii) $r_2 < \frac{1}{2}r$.
Of course, the preceding equations being divisions-with-remainder,
we already know $r_2 < r_1 < r$ (see (3.3) on p. 139). In theory,
however, we could have $r_1 = r - 1$ and $r_2 = r_1 - 1$ so that
3.2. THE FUNDAMENTAL THEOREM OF ARITHMETIC 147
Primes
Activity. Prove the last claim; i.e., if c and d are whole numbers, a = cd, and
both c and d are > 1, then each of c and d is a proper divisor of a.
Lemma 3.5. Given a whole number n > 1, if no prime number $p \le \sqrt{n}$ is a
divisor of n, then n is a prime.
For a positive number x, its positive square root $\sqrt{x}$ is the
positive number so that its square is equal to x; i.e., $(\sqrt{x})^2 = x$.
In Section 2.5 of [Wu2020c], we will prove that any positive real
number has a unique positive square root. Here we anticipate
this fact, but rest assured that there is no danger of circular
reasoning. Observe that if a, b are positive numbers, then a < b
is equivalent to $\sqrt{a} < \sqrt{b}$ (Exercise 19 on page 133).
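Lemma 3.5 translates directly into a test for primality. The Python sketch below is our own illustration (the name is_prime is hypothetical). For simplicity it tries every whole number k with 2 ≤ k ≤ √n rather than only the primes in that range; this does no harm, because a composite k that divides n has a smaller prime divisor that would have been tried first. The standard library function math.isqrt returns the largest whole number whose square does not exceed n.

```python
from math import isqrt


def is_prime(n):
    """Test whether the whole number n is a prime, in the manner of Lemma 3.5."""
    if n < 2:
        return False
    # Only candidates k with k*k <= n need to be tried.
    for k in range(2, isqrt(n) + 1):
        if n % k == 0:
            return False          # k is a proper divisor of n
    return True


print([p for p in range(2, 30) if is_prime(p)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For a number below 10000, the loop runs through fewer than 100 candidates, which is the practical content of the lemma.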
Activity. Check whether 4493 is a prime by using Lemma 3.5 and the list of
primes < 100. (You are allowed to use a four-function calculator.)
We will presently prove Lemma 3.5a, but before doing that, we want to give
an intuitive explanation of why there is at least a proper divisor m of n so that
$m \le \sqrt{n}$. First consider the additive analog: we claim that if a positive integer n
is the sum of two positive integers ℓ and m, n = ℓ + m, then one of ℓ and m has to
8 If 1 were defined to be a prime, it would mess up the statement of the fundamental theorem
The following theorem shows that the primes are the fundamental multiplicative
building blocks of the whole numbers.
The uniqueness statement in this theorem, which is important for many reasons,
can be made more explicit, as follows: suppose $n = p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_\ell$, where
each of the $p_i$'s and $q_j$'s is a prime. Then to say that the collection of primes
is unique means that k = ℓ and, after renumbering the subscripts of the q's if
necessary, we have $p_i = q_i$ for all i = 1, 2, . . . , k.
The expression of n as a product of primes, $n = p_1 p_2 \cdots p_k$, is called its prime
decomposition.9 Let it be noted explicitly that in the above expression, some or
all of the $p_i$'s could be the same; e.g., 24 = 2 × 2 × 2 × 3. The fundamental theorem
of arithmetic says that, except for the order of the primes, the prime decomposition
of each n is unique.
It should not be assumed that getting the explicit prime decomposition of a
whole number is easy. Try 9167, for instance. Even with the help of Lemma 3.5
on page 148, we still have to check all the primes ≤ 96 to see if any of them di-
vides 9167. (It turns out that 9167 has the prime decomposition 9167 = 89 × 103.)
The whole field of cryptography, which makes possible the secure transmission of
confidential information on the internet, depends on the fact that it is practically
impossible to get the prime decomposition of a well-chosen whole number of, say,
2,000 digits (as of 2020). However, it is not difficult to establish, in theory, that
every whole number ≥ 2 has a prime decomposition, as we now show.10
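For whole numbers of modest size, the search for a prime decomposition by repeated trial division is easy to automate. The following Python sketch is ours and is meant only as an illustration; the function name prime_decomposition is hypothetical. It is hopeless for the 2,000-digit numbers mentioned above, which is precisely the point of the preceding remark about cryptography.

```python
def prime_decomposition(n):
    """Return the primes in the prime decomposition of n >= 2, smallest first."""
    primes = []
    p = 2
    while p * p <= n:
        while n % p == 0:        # divide out p as many times as it occurs
            primes.append(p)
            n //= p
        p += 1
    if n > 1:
        # No number p with p*p <= n divides what is left, so it is a prime.
        primes.append(n)
    return primes


print(prime_decomposition(24))     # [2, 2, 2, 3]
print(prime_decomposition(9167))   # [89, 103]
```

Every divisor appended to the list is a prime, because all smaller primes have already been divided out by the time it is reached.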
one can infer from this question that the proof of the uniqueness in Theorem 3.6
will not be trivial.
Remarks. (i) Knowing that each positive integer has a unique prime decomposition,
we can now get a more intuitive understanding of the case when two positive
integers ℓ and m are relatively prime: ℓ and m are relatively prime if and only if the
prime decompositions of ℓ and m have no primes in common (Exercise 2 on page
155). (ii) We can also take a second look at the key lemma (page 144) from the
vantage point of the fundamental theorem of arithmetic. Thus suppose a positive
integer ℓ divides a product mn of positive integers m and n, and suppose ℓ and m
are relatively prime. We can now achieve a more intuitive understanding of why
ℓ must divide n, as follows. By the preceding remark (i), the fact that ℓ and m
are relatively prime means that the prime factors of ℓ are distinct from the prime
factors of m. But if ℓ divides mn, then all the primes in the prime decomposition
of ℓ have to be a subset of the primes in the prime decomposition of n. This is why
ℓ|n. We must caution, however, that, unlike remark (i), the intuitive argument
in this case does not constitute anything remotely like a proof for the key lemma.
This intuitive argument must remain an intuitive argument only, because it is actu-
ally circular, in the sense that the proof of the fundamental theorem of arithmetic
(which gives every positive integer its unique prime decomposition) depends on the
key lemma, and it will not do to use the unique prime decomposition of a positive
integer to prove that the key lemma is valid.
Activity. If k/ℓ is a reduced fraction, prove that for any positive integer n, the
fraction $k^n/\ell^n$ is also reduced.
Note that the second part of the theorem is clearly false if m/n is not reduced.
For example, 3/6 = 0.5, but the prime decomposition of 6 contains a 3. It is also good
to recall that a finite decimal is just a fraction whose denominator is a power of
10. It will be apparent from the proof how important it is to have such a clear-cut
definition of a decimal.
Proof. We first prove that if the prime decomposition of the denominator n contains
no primes other than 2 and 5, then m/n is equal to a finite decimal. The idea
of the proof is already contained in the reasoning on pp. 61ff. in the discussion of
decimal division, but we can quickly summarize it by way of an example. Consider
the fraction 27/160. Since $160 = 2^5 \cdot 5$, 27/160 is equal to the decimal 0.16875 because,
by equivalent fractions (Theorem 1.1 on page 20),
\[ \frac{27}{160} = \frac{27}{2^5 \cdot 5} = \frac{27 \cdot 5^4}{2^5 \cdot 5 \cdot 5^4} = \frac{16875}{10^5}, \]
which by definition is 0.16875. In general, if $n = 2^a 5^b$, where a, b are whole numbers,
we may assume without loss of generality that a ≤ b. Then
\[ \frac{m}{n} = \frac{m}{2^a 5^b} = \frac{2^{b-a} m}{2^{b-a}\, 2^a 5^b} = \frac{2^{b-a} m}{10^b} \]
and the last is a finite decimal, by definition.
Conversely, suppose m/n is a reduced fraction which is equal to a finite decimal:
\[ \frac{m}{n} = \frac{k}{10^b}, \]
where k, b are whole numbers. We have to show that no prime other than 2 and 5
divides n. By the cross-multiplication algorithm, $nk = m \cdot 10^b$. Since m/n is reduced, n
is relatively prime to m. Since n divides nk, it divides $m \cdot 10^b$ as well. The key lemma
(page 144) shows that n must divide $10^b$, which is $2^b 5^b$. Therefore, $2^b 5^b = n\ell$ for
some whole number ℓ. By the uniqueness of the prime decomposition, the collection
of primes on the right is the same collection as those on the left, which consists of
only 2's and 5's. Therefore the primes on the right consist of only 2's and 5's. Thus
$n = 2^a 5^c$, where a and c are whole numbers ≤ b. The theorem is proved.
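Both halves of the theorem can be tried out on examples. The Python sketch below is an illustration of ours, with hypothetical function names; it relies on the fact that Fraction in the standard library always stores a fraction in reduced form, so the denominator it reports is the denominator of the reduced form.

```python
from fractions import Fraction


def is_finite_decimal(m, n):
    """True if the only primes dividing the reduced denominator of m/n are 2 and 5."""
    den = Fraction(m, n).denominator      # denominator of the reduced form
    for p in (2, 5):
        while den % p == 0:
            den //= p
    return den == 1


def as_decimal_fraction(m, n):
    """Return (k, b), with b as small as possible, so that m/n = k / 10**b."""
    if not is_finite_decimal(m, n):
        raise ValueError("not equal to a finite decimal")
    f = Fraction(m, n)
    b = 0
    while (f * 10**b).denominator != 1:   # multiply by 10 until a whole number appears
        b += 1
    return (f * 10**b).numerator, b


print(as_decimal_fraction(27, 160))   # (16875, 5), i.e., 27/160 = 0.16875
print(is_finite_decimal(3, 6))        # True, since the reduced form is 1/2
print(is_finite_decimal(1, 3))        # False
```

The case of 3/6 shows why the test must be applied to the reduced form and not to the denominator as given.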
At this point, we can take a look at the question of whether the rational numbers
are sufficient for doing mathematics. The following theorem implies that they are
not. For its statement, recall that a number is a point on the number line (page
5); for emphasis, a point on the number line is also called a real number. A real
number is said to be irrational if it does not lie in Q.11 In addition, a perfect
square is a whole number which is equal to the square of another whole number.
Thus the first few perfect squares are 0, 1, 4, 9, 16, 25, 36, 49, 64, . . . .
Theorem 3.9. Let n be a whole number which is not a perfect square. If there
is a positive number r so that $r^2 = n$, then r is irrational.
page 20 to cancel all common factors of 2 in m and n. That said, we square both
sides of $\sqrt{2} = m/n$ to get $2 = m^2/n^2$, so that $m^2 = 2n^2$. Thus $m^2$ is even and therefore
m is even. This implies n is odd. Now since m is even, then m = 2k for some
positive integer k, and $m^2 = 4k^2$. Hence the equation $m^2 = 2n^2$ may be written
as $4k^2 = 2n^2$, which is the same as $2k^2 = n^2$. Now $n^2$ is even (it is equal to $2k^2$),
and therefore n must be even, contradicting the fact that n is odd, so $\sqrt{2}$ cannot
be rational after all.
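The conclusion can be illustrated, though certainly not proved, by a finite search. If $\sqrt{2}$ were equal to m/n, then $2n^2$ would be the perfect square $m^2$. The Python sketch below, which is ours and uses the hypothetical helper is_perfect_square, looks for such an n among the first ten thousand positive integers.

```python
from math import isqrt


def is_perfect_square(k):
    """True if the whole number k is the square of a whole number."""
    r = isqrt(k)               # largest whole number r with r*r <= k
    return r * r == k


# Positive integers n <= 10000 for which 2*n*n is a perfect square.
exceptions = [n for n in range(1, 10001) if is_perfect_square(2 * n * n)]
print(exceptions)   # []
```

The empty list is what the argument above predicts; the argument itself is what guarantees that no larger n will ever turn up.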
The following proof is in essence a generalization of this argument, with the
fundamental theorem of arithmetic replacing the argument using "even and odd".
Exercises 3.2.
(1) Without using the fundamental theorem of arithmetic, give a direct, self-
contained proof of why the prime decomposition of 455 (= 5 × 7 × 13) is
unique.
(2) (i) Prove that two positive integers k and m are relatively prime if and
only if the prime decompositions of k and m have no primes in common.
(ii) Given two positive integers a and b, prove that if their GCD is k, then
the two positive integers a/k and b/k are relatively prime.
using "any of the measures typically employed in classroom research." Since the Schoenfeld article
has shown that course to be a total mathematical disaster, the fact that the classroom research
in mathematics education around 1983 considered it to be "well-taught" serves to reinforce the
point we are trying to make: TSM has done serious damage to mathematics education research
as well.
158 4. BASIC ISOMETRIES AND CONGRUENCE
about nurturing their ability to formulate and present mathematically correct ar-
guments. When students did geometric constructions with ruler and compass, the
main concern was whether their description of the steps of the construction was
correct and whether their pictures were visually accurate, but not whether they
could provide the needed reasoning to support the mathematical correctness of the
constructions. Such excesses naturally called for corrections. Unhappily, as is often
the case, the resulting corrections turned out to be not better but simply defective
in a different way. Indeed, a second kind of high school geometry course sprang
up around 1995; it gave up completely on proofs and relied solely on computer ge-
ometry software and hands-on activities to establish conviction. See, for example,
[Serra] and the discussion in [Wu2004a, pp. 533–534].
The disconnect between the TSM high school geometry course and the rest of
the TSM curriculum also shows up in the teaching of the two concepts that are
the twin pillars of the course: congruence and similarity. In TSM, these concepts
are first introduced in middle school, where students are taught that congruence
means "same size and same shape" and similarity means "same shape but not
necessarily the same size". Mathematics has to be precise in order to preclude
misinterpretations, so inherently imprecise phrases such as "same shape" and "same
size" can have no place in a mathematical definition. Often what is the "same
shape" to one person may not be the "same shape" to another. For example,
consider the following two curves: are they of the "same shape"? To those who
claim that these curves are not of the "same shape", what can you say to convince
them that they are? (The left curve is the graph of $\frac{1}{490}x^2$, while the right curve is
the graph of $x^2$, and both are drawn to the same scale.)
[Figure: the two curves, each drawn in its own X-Y coordinate system.]
The fact is that they are similar in a precise mathematical sense and therefore must
be of the "same shape".2 However, until we can formulate a precise definition of
similarity and prove that these two curves are similar, it would be difficult, if not
impossible, to argue from an artistic or an emotional point of view that they are
the "same shape". For the past fifty years, unfortunately, middle school students
have had to put up with these kinds of ambiguous statements about congruence
and similarity as mathematical definitions.
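A short computation suggests why the two curves above are similar. It is only a sketch, and it rests on an assumption not stated in this passage: that the dilation with center at the origin and scale factor 490 sends each point (x, y) to the point (490x, 490y). Under that assumption, a point $(x, x^2)$ on the graph of $x^2$ is sent to a point (X, Y) with

\[
(X, Y) = (490x,\ 490x^2)
\quad\Longrightarrow\quad
Y = 490\left(\frac{X}{490}\right)^{2} = \frac{1}{490}X^{2}.
\]

Since X = 490x takes every value as x does, the image of the graph of $x^2$ under this dilation is exactly the graph of $\frac{1}{490}x^2$. A precise statement has to wait until dilation and similarity have been defined.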
The problem with TSM’s treatment of congruence and similarity does not stop
here, however. There is an abrupt—and inexplicable—change in the definitions of
congruence and similarity in the transition from middle school to high school. In
the TSM high school geometry course, the definition of congruence (respectively,
2 They are both graphs of quadratic functions and are therefore similar, by Theorem 2.11 in
• The high school geometry course cannot be the only place in the K–12
curriculum where definitions, theorems, and proofs are taken seriously. To
the extent that we are trying to teach students mathematics with mathe-
matical integrity (see page xiii for the definition) rather than some latter-
day concoction that purports to be "mathematics", we have to make the
rest of the school mathematics curriculum take definitions, theorems, and
proofs seriously too. The use of an axiomatic system in the high school
geometry course is a more complex issue and will have to be handled with
some care; see the discussion on pp. 160ff.
• Congruence and similarity are not only the bedrock of the high school
geometry course and a foundation of high school algebra, they are also a
mainstay of the whole school geometry curriculum. We must find defini-
tions for these two concepts so that they are usable in both middle school
and high school.
• The curricular decision by TSM to make students work with slope in intro-
ductory algebra before teaching them about similar triangles has caused
great harm in students’ learning of slope (see [PG], for example). We have
to provide students with the necessary mathematical knowledge about sim-
ilar triangles before the introduction of slope. This will improve students’
ability to learn about slope and strengthen the relevance of the high school
geometry course to the school mathematics curriculum.
Any attempt at improving the TSM geometry curriculum must therefore begin
by directly confronting these issues. A mathematics curriculum that puts def-
initions and proofs in the high school geometry course but nowhere else is ev-
idently not a viable curriculum as the present crisis in school mathematics ed-
ucation so eloquently testifies (see, e.g., [RAGS]). The main purpose of these
six volumes—[Wu2011a], [Wu2016a], [Wu2016b], [Wu2020b], [Wu2020c], and the
present volume—is to demonstrate that teaching school mathematics with mathe-
matical integrity (see page xiii) throughout K–12 is not only possible and desirable
but also makes school mathematics more learnable (compare the discussion of the
fundamental principles of mathematics on pp. xiii ff.). Such a large-scale effort at
revamping the K–12 mathematics curriculum would seem to be the only way to
combat the kind of disconnect described in the first bullet above.
It remains to say a few words about the use of an axiomatic system to begin
the study of plane geometry.3 Such an approach to plane geometry began with
Euclid around 300 BC (see [Euclid1]). That was an epoch-making achievement
in mathematics and science, but, unexpectedly, the axiomatic approach also be-
came the gold standard for introducing every beginner to mathematics through the
centuries. The fact that, for over two millennia, people did not realize the inappro-
priateness of the axiomatic method as a teaching tool for beginners simply boggles
the pedagogical mind. Euclid’s axioms, or variants thereof, are nothing but a pre-
cise summary of what we assume to be true when we embark on a serious study
of geometric figures in the plane (triangles, quadrilaterals, circles, etc.). While it
is essential that we have a clear-cut starting point for geometry, the mathematical
requirements that these axioms be as "simple" and "plausible" as possible and that
there be as few axioms as possible make the axiomatic method unsuitable for be-
ginners, in the following sense. These dual requirements put the starting point of a
beginner’s geometric journey at the "lowest level" possible. Consequently, it takes
a great deal of effort to prove many boring technical theorems just to bring the
discussion up to a level that is sufficiently interesting for an average student. We
already mentioned this affliction that besets the typical TSM high school geometry
course earlier on page 157. This kind of foundation-building work, while important
for mathematics per se, is perhaps best left to professional mathematicians because
not many beginners (or even mathematicians) can overcome the tedium of this kind
of highly nonintuitive, technical work.
Implicit in the TSM curriculum is the decision that the high school geometry
course must make up for TSM’s failure to prove anything elsewhere in the curricu-
lum by proving everything in this one course. Thus the main curricular justification
for an axiomatic development of geometry is to let students experience genuine
rigor by proving everything ab initio. We have already commented on the cognitive
disruption in students’ learning trajectory brought about by this sort of geometry
course, but there is much more to be said.
With the hindsight of twenty-three centuries (between us and Euclid), we now
know that such rigor is pedagogically untenable in a high school course (see [Hilbert]
and, e.g., Chapters 2 and 3 of [Greenberg] or Chapter 3 of [Hartshorne]). A main
3 If possible, you may wish to skim through Chapter 8 of [Wu2020b] right now, particularly
its preamble, to get an overall picture concerning the use of axioms in school geometry education.
OVERVIEW OF CHAPTERS 4 AND 5 161
reason is that the geometry of the plane is a more complex system than is usually
realized. Even the heroic efforts of some school textbooks such as [Moise-Downs]
to update the axiomatic system of Euclid fall far short of the goal of presenting a
"rigorous" treatment of plane geometry for high school. (Some obvious critical gaps
in the reasoning of [Moise-Downs] will be pointed out in the preamble to Chapter
8 of [Wu2020b].) Ultimately, what is important for beginners is not so much to
learn how to prove every theorem in a very narrow area of school mathematics but
to learn how to navigate in general from point A to point B via logical reasoning,
i.e., from hypothesis to conclusion. Again, see Chapter 8 of [Wu2020b].
We will modify the axiomatic approach by making the starting point of our
geometric studies a collection of eight assumptions, (L1)–(L8) (see pp. 165–176,
184–188, 237, and 250). We hope these assumptions are sufficiently "plausible",
but it is not part of our design to make them as "simple" as possible. In fact,
we intentionally assume quite a bit more than what the usual geometry axioms
do (compare, e.g., [Greenberg] and [Hartshorne]) in order to minimize the need to
prove intuitively obvious but technically tricky elementary theorems. In addition,
these eight assumptions overlap in subtle ways so that some redundancy among
them is built in. This redundancy helps to eliminate some (but, unfortunately, not
all) arguments that are otherwise too technical or too sophisticated for high school
students. (At certain points of the exposition, we will make explicit recommen-
dations about skipping the proofs of certain pictorially obvious facts in a school
classroom.) These expository decisions will then allow us to get to interesting theo-
rems reasonably early; e.g., Theorem G4 on page 226, which is the fourth principal
theorem in this exposition of geometry, says that opposite sides of a parallelogram
are equal (congruent). But perhaps the most unusual feature of these assumptions
is the fact that they provide a platform for defining the concept of congruence from
the beginning. This then brings us to a solution of the issue mentioned in the
second bullet on page 159 about the definitions of congruence and similarity in the
school mathematics curriculum.
We will approach congruence and similarity in a way that is different from
Euclid’s approach twenty-three centuries ago (see [Euclid1]) but more in tune with
the current understanding of these concepts. In mathematics, a congruence in the
plane is by definition a composition of the basic isometries: rotations, translations,
and reflections. This definition turns out to be suitable for use in middle school
because the essence of the basic isometries can be captured by hands-on activities.
In addition, because the concept of dilation can also be approached by way of hands-
on activities (see page 268), similarity can now be defined as the composition of a
finite number of congruences and dilations.
We envision a middle school geometry curriculum devoted to an exploration
of the basic isometries and dilations through hands-on activities together with a
judicious use of informal proofs. This new curriculum will define congruence as a
finite composition of basic isometries and will define similarity as the composition
of a finite number of dilations and congruences. The fact that the congruence or
similarity of two geometric figures (curved or not) can now be checked by hands-on
activities serves to demystify both concepts. Moreover, the hands-on activities will
foster the development of geometric intuition, an important goal of this curricular
design. The emphasis on intuition means that, in middle school, the basic isometries
will be defined informally via hands-on activities but not precisely in the sense of
mathematical definitions. Likewise, other more complex concepts such as the half-
planes of a line will not be defined precisely in middle school either. Proofs will
be presented without attending to all the subtle technical details so that they
will be as intuitive as possible. For example, students can learn to prove ASA
and SAS by hands-on activities without being held responsible for transcribing the
activities into a precise written proof (see, e.g., pp. 290–299 in [Wu2016a]). In the
same way, they will also learn an informal proof of the AA criterion for triangle
similarity (see pp. 324–326 of [Wu2016a]). In addition, the availability of a correct
definition of congruence allows the concepts of length, area, and volume to be
correctly introduced in the middle school curriculum (see Chapter 5 of [Wu2016a]).
The high school geometry course will begin by revisiting the middle school
curriculum. The basic isometries and dilation will remain the cornerstones but,
this time around, each of these concepts—rotation, reflection, translation, and
dilation—will be precisely defined, as will all the standard concepts. In other
words, the definitions of congruence and similarity will remain unchanged from
middle school, but their intuitive content will be upgraded to precise descriptions.
There will no longer be any disconnect between the middle school concepts of con-
gruence and similarity and their high school counterparts. Moreover, the high
school course will clearly enunciate the geometric assumptions (L1)–(L8) that are
needed for the logical development of plane geometry. Given these changes, it will
be seen that this second tour of the middle school geometry curriculum will be far
from routine. For example, the concept of the half-planes of a line in the plane will
have to be defined abstractly by way of an assumption (L4) on plane separation (see
page 176). For another example, the definition of translation will not be given from
the beginning, but only after some theorems have been proved (see Section 4.4 on
pp. 216). The high school course will also revisit all the theorems that have been
informally proved in middle school but will do so this time from the standpoint of
the clearly stated geometric assumptions and precise definitions.
It remains to point out that implicit in the preceding description of our ap-
proach to congruence and similarity is a solution to the issue involving the teaching
of slope in the third bullet on page 159. The main geometric ingredient needed for a
correct definition of slope is the AA criterion for triangle similarity (Theorem G22
on page 288), and we have just described how an informal proof of this criterion
is built into this curriculum. Students will therefore be in possession of all the
needed tools to work with a correct definition of slope and acquire an intuitive un-
derstanding of what slope is all about (see Section 6.4 on pp. 337ff. in this volume).
This then makes it possible for students to learn—correctly, perhaps for the first
time—about the slopes of the graphs of linear equations in two variables in middle
school.
It is time to return to the expository decision concerning Chapters 4 and 5 men-
tioned on page 157. If these three volumes were to give a systematic and grade-level
appropriate exposition of the mathematics in grades 6–12, then these two chapters
would be devoted to an exposition of middle school geometry curriculum based on
the basic isometries and dilation. Later chapters in these three volumes would then
give a complete exposition of high school geometry. However, given space and time
limitations, such an exposition is not practical here. Fortunately, an account of
how middle school geometry might be taught has been given in two chapters in
one of the author’s middle school volumes: Chapters 4 and 5 of [Wu2016a]. Tak-
ing advantage of that fact, this volume will skip the presentation of middle school
geometry and go directly to high school geometry. Thus, instead of presenting
an informal introduction to geometry via hands-on activities and informal proofs,
Chapters 4 and 5 in this volume will, instead, present the initial segment of the
projected high school geometry course (which will encompass Chapters 6–8 of the
companion volume [Wu2020b]). For example, Chapter 4 will enumerate the precise
geometric assumptions from the beginning and give precise definitions for all the
standard terms. We should add that, to soften the formal character of such an
introduction to geometry, we have added as many side remarks and examples as
we can manage to make the exposition more user-friendly.
We are duty-bound to point out that, unlike the preceding three chapters which
give a presentation of school mathematics that is essentially consistent with the
usual sequencing of topics in the TSM school curriculum, the above outline of how
geometry should be taught in grades 8–11 represents a substantial departure from
the geometry and algebra curricula typical in TSM. Fortunately, the Common Core
State Standards for Mathematics ([CCSSM]), first released in 2010, have adopted
this very same departure. In particular, the presentation of geometry in Chapters
4 and 5 is now entirely compatible with the algebra and geometry standards of
[CCSSM] in high school.
We will round out the preceding discussion with two additional comments.
The first is the role of precise definitions in geometry. To accurately capture the
visual (geometric) information, we need precise language. For this reason, we ask
you to pay attention to the precision in the definitions of many familiar concepts
such as "half-planes", "angles", "convex sets", "rectangles", "polygons", etc. In an
overwhelming majority of these cases, the new definitions will take you by surprise.
To give but one example, TSM defines a rectangle as "a quadrilateral with four right
angles and two pairs of opposite sides with the same length", but in Chapter 4, a
rectangle is merely "a quadrilateral with four right angles" without any mention
about the length of the opposite sides (see page 193). An additional surprise may
be that it will take some effort to prove that rectangles do exist (Corollary 2 of
Theorem G3 on page 225) and that, indeed, they have opposite sides of the same
length (page 226). You may find it refreshing to see that the equality of the lengths
of opposite sides is now proved rather than assumed.
While precise definitions are important, we must emphasize that the goal of
geometry is not to study precise definitions per se, but to understand—in the
mathematical sense of its intuitive content and how to use it to prove theorems—
the visual information encoded in the definitions. The precision of the language is
merely a means to an end, but not the end itself. In geometry, the all-encompassing
concern is with the development of geometric intuition and the ability to reason in
geometric terms.
A second comment is that although basing high school geometry on the basic
isometries represents a serious departure from common practice, it actually comes
closer than any other approach to exposing a key feature of the Euclidean plane,
namely, its maximal homogeneity, in the sense that the plane has enough rotations,
reflections, and translations to carry any line segment to any other line segment
164 4. BASIC ISOMETRIES AND CONGRUENCE
having the same length.4 This homogeneity—together with the parallel postulate—
is what makes the Euclidean plane the Euclidean plane. This characteristic property
is largely invisible in the other presentations of plane geometry, especially in the
usual axiomatic treatments. We hope that, by bringing the homogeneity to the
fore, this new approach will bring a renewed appreciation of school geometry.
4 Mathematical Aside: In technical language, this says that the Euclidean plane is a two-point
homogeneous space.
5 Other than point, line, and plane.
6 Euclid did not assume any knowledge of the number line in [Euclid1].
4.1. THE BASIC VOCABULARY, PART 1 165
single one of the assumptions below, (L1)–(L8), is intuitively obvious.7 The only
reason we write them down here is to make absolutely clear which facts we will use
for the proofs of theorems.
The emphasis of (L1) is on the uniqueness of the line. Let us illustrate this fact
with a simple application. We say two lines are distinct if they are not the same,
i.e., not equal as subsets of the plane in the sense of page 141. Thus, by definition,
if two lines are distinct, there is at least one point that belongs to one but not the
other. A priori, this leaves open the possibility that, given two distinct lines, one
is contained in the other but is not equal to it. We will now prove this does not
happen.
Lemma 4.1. If two lines ℓ1 and ℓ2 are distinct, then there is a point P1 on ℓ1
that does not belong to ℓ2 and there is a point P2 on ℓ2 that does not belong to ℓ1.
Proof. The proof is by contradiction. Suppose there is no such point P1 on ℓ1.
Take two distinct points Q and Q′ on ℓ1; then both points must belong to ℓ2. It
follows that the two lines ℓ1 and ℓ2 both pass through Q and Q′ and therefore, by
(L1), are necessarily the same line. This contradicts the fact that ℓ1 and ℓ2 are
distinct. So there is a point P1 on ℓ1 that does not belong to ℓ2 after all. In a
similar way, we can prove that there is a point P2 on ℓ2 that does not belong to ℓ1.
The proof of Lemma 4.1 is complete.
Two lines L1 and L2 that have no point in common are said to be parallel. In
symbols, L1 ∥ L2. The following lemma is a simple consequence of (L1):
Lemma 4.2. Two distinct lines are either parallel or have exactly one point in
common.
Recall the standard terminology: the intersection of two sets S1 and S2 is by
definition the collection of all the points in common to both S1 and S2 . In symbols,
S1 ∩ S2 . If there is no point in the intersection of S1 and S2 , we say they do not
intersect. Thus, an equivalent way of stating the lemma is that two distinct lines
either do not intersect or they intersect at exactly one point. Naturally one needs
to know when two lines intersect and when they do not. It turns out that this issue
cannot be settled except by an explicit assumption.
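For readers who like to experiment, Lemma 4.2 can be illustrated in the coordinate plane. The sketch below is only an illustration under an assumption that is not part of (L1)–(L8): a line is modeled as the solution set of an equation ax + by = c with a and b not both zero. The names Line, same_line, and meet are hypothetical and are introduced only for this example.

```python
from fractions import Fraction
from typing import NamedTuple, Optional, Tuple


class Line(NamedTuple):
    """Hypothetical coordinate model: the line a*x + b*y = c, (a, b) != (0, 0)."""
    a: int
    b: int
    c: int


def same_line(l1: Line, l2: Line) -> bool:
    # Two equations describe the same line exactly when their
    # coefficient triples are proportional (all 2x2 minors vanish).
    return (l1.a * l2.b == l2.a * l1.b
            and l1.a * l2.c == l2.a * l1.c
            and l1.b * l2.c == l2.b * l1.c)


def meet(l1: Line, l2: Line) -> Optional[Tuple[Fraction, Fraction]]:
    """For two distinct lines, return None if they are parallel,
    otherwise return their unique common point."""
    if same_line(l1, l2):
        raise ValueError("the two lines must be distinct")
    det = l1.a * l2.b - l2.a * l1.b
    if det == 0:
        return None  # no common point: the lines are parallel
    # Cramer's rule gives the one and only solution of the two equations.
    x = Fraction(l1.c * l2.b - l2.c * l1.b, det)
    y = Fraction(l1.a * l2.c - l2.a * l1.c, det)
    return (x, y)
```

In this model, meet(Line(1, 1, 2), Line(1, -1, 0)) produces a single point, while meet(Line(1, 1, 2), Line(2, 2, 5)) produces None. The two outcomes of the function mirror the two alternatives in the lemma.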
It will be seen that the parallel postulate dominates plane geometry. Without
it, many things we take for granted would be false, e.g., the Pythagorean theorem,
7 Mathematical Aside: We want to be explicit about the fact that (L1)–(L8) are—by design—
not a minimal set of axioms. For example, given that we assume every line can be made into
a number line (see (L3)) and therefore the concept of "betweenness" and the property of "line
separation" are built in, (L8) is known to follow from (L4); see [Greenberg, pp. 113 and 116].
the equality of opposite sides of a rectangle, the angle sum of a triangle being 180
degrees, the concept of similarity, etc. It is unfortunately the case that TSM8 does
not bring out the overriding importance of the parallel postulate, so we strongly
suggest that teachers emphasize it in their teaching and educators never lose sight
of this fact in their research on proofs in geometry. A deeper discussion of this
assumption can be found in Chapter 8 of [Wu2020b].
According to the parallel postulate, for a point P not on a line L, every line that
contains P intersects L except possibly for one line. This postulate does not assume
explicitly that there exists a line passing through P and parallel to L. However, we
shall see in the corollary to Theorem G1 on page 222 that the existence of such a
parallel line can in fact be proved once we know there are enough rotations in the
plane. So contrary to what is normally done in school textbooks, our formulation
of the parallel postulate merely asserts that there is no more than one parallel line.
We know intuitively that if three distinct lines L1 , L2 , and L3 are given so that
L1 ∥ L2 and L2 ∥ L3 , then L1 ∥ L3 . It is less known that this intuitive fact has to
be proved by invoking the parallel postulate. More formally, we state:
[Figure: the lines L3 and L1 passing through a common point P, together with the line L2.]
The point P does not lie on L2 because P lies on L1 and L1 has no point in common
with L2 (L1 ∥ L2 by hypothesis). Thus through a point P not lying on L2 now
pass two distinct lines L3 and L1 , both parallel to L2 , contradicting the parallel
postulate. The lemma is proved.
If A and B are two distinct points, then by (L1), there is a unique line containing
A and B. We denote this line by LAB and call it the line joining A and B.
Naturally, we will need the concept of the line segment AB, or more simply the
segment AB, joining the two points A and B. If LAB were a number line (see page
5), we would simply define the segment AB to be the collection of all the points
"between A and B", together with the two points A and B. This then motivates
the next assumption.
(L3) Every line can be made into a number line so that any two given points
on the line are the 0 and 1, respectively, of the number line.
Assuming (L3), we can now define the concept of a point between two distinct
points A and B (recall remark (G) on pp. 13ff. in this connection). By (L3), we
can make LAB into a number line by choosing two fixed points P and Q on the
line LAB and designate them to be 0 and 1, respectively. Then a point C is, by
definition, between A and B if C lies on LAB and, with respect to this number
line, either A < C < B or B < C < A (again, see remark (G) on pp. 13ff.). In
symbols, we write A ∗ C ∗ B.
This definition of betweenness calls for some clarification. The definition de-
pends on the choice of two random points P and Q on LAB as 0 and 1, respec-
tively. Could it happen that, with respect to one choice of 0 and 1 on LAB , we have
A ∗ C ∗ B, but with respect to another choice of 0 and 1, it would no longer be the case
that A ∗ C ∗ B? We now explain why the answer is no. So consider one choice of
P, Q ∈ LAB as 0 and 1, respectively. Then LAB can be represented as a horizontal
line as usual. Here are the two possibilities for A ∗ C ∗ B:
Case 1: P (= 0), Q (= 1), A, C, B, in this order from left to right.
Case 2: P (= 0), Q (= 1), B, C, A, in this order from left to right.
The advantage of having a number line is that, at each point X, there is a definite
positive direction—which consists of all the numbers greater than X—and also a
negative direction—which consists of all the numbers less than X. In terms of
the picture of the number line as a horizontal line, the positive direction is right-
pointing and the negative direction is left-pointing. Hence we can equivalently define
A ∗ C ∗ B as follows: if one of the two points A and B is to the left of C, then the
other point is to the right of C. For example, the following situation shows that C
is not between A and B because A and B are now both to the left of C and neither
of the two points A and B is to the right of C:
[Figure: P (= 0), Q (= 1), B, A, C, in this order from left to right.]
Now consider what happens when two other points P′ and Q′ are chosen to be
the new 0 and 1, respectively. One possibility is that Q′ is to the right of P′ so
that the new 1 is again to the right of the new 0. Then the positive (respectively,
negative) direction of the number line with respect to P′ and Q′ as 0 and 1 does not
change and, consequently, neither does the left or right direction on the horizontal
number line. It follows that the concept of order on the two number lines (see page
12) stays the same, and we have A < C < B in Case 1 and B < C < A in Case 2:
Case 1: P′ (= 0), Q′ (= 1), A, C, B, in this order from left to right.
Case 2: P′ (= 0), Q′ (= 1), B, C, A, in this order from left to right.
The other possibility is that Q′ (the new 1) is to the left of P′ (the new 0); then the
positive (respectively, negative) directions of this number line become the negative
(respectively, positive) directions of the previous number line with P and Q as
0 and 1. The switching of the positive and negative directions on LAB has the
effect that the left and right directions are switched, and so are the "<" and ">"
relationships (see page 12). Hence the previous inequalities A < C < B in Case 1
and B < C < A in Case 2 now become B < C < A in Case 1 and A < C < B in
Case 2. According to the definition of ∗ on page 167, however, it is still the case
that A ∗ C ∗ B.
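Case 1: Q′ (= 1), P′ (= 0), A, C, B, in this order from left to right.
Case 2: Q′ (= 1), P′ (= 0), B, C, A, in this order from left to right.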
If we redraw these number lines in the usual way, so that 0 is to the left of 1, then
the pictures become more transparent:
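Case 1: B, C, A, P′ (= 0), Q′ (= 1), in this order from left to right.
Case 2: A, C, B, P′ (= 0), Q′ (= 1), in this order from left to right.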
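The independence of betweenness from the choice of 0 and 1 can also be checked numerically. In the following sketch, which is only an illustration, the points of the line are recorded by their positions on one fixed reference number line, and the hypothetical function coordinate computes the position of a point after two other points p and q have been declared to be the new 0 and 1. The function names are assumptions made for this example.

```python
from fractions import Fraction


def coordinate(x, p, q):
    """Position of the point x on the number line in which the point p
    is declared to be 0 and the point q is declared to be 1 (p != q)."""
    return Fraction(x - p, q - p)


def between(a, c, b, p=0, q=1):
    """Decide whether A * C * B holds, using the number line in which
    p is 0 and q is 1, exactly as in the definition in the text."""
    a, c, b = (coordinate(t, p, q) for t in (a, c, b))
    return a < c < b or b < c < a
```

If q is to the left of p, every inequality among the new coordinates is reversed, so the two alternatives in the definition trade places and the answer is unchanged. For instance, between(2, 3, 5) and between(2, 3, 5, p=7, q=4) agree.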
explicitly that the definition of ∗ is well-defined and make yourself available for a
discussion of these subtleties with those students who are curious about such things.
Let it be known that it is perfectly legitimate to let slide some subtleties that
are not essential to the school curriculum in a school classroom. For example, we
have been using the real numbers without once discussing the complexities of R
that took humans more than two thousand years to unravel. Another example is
the fact that we will be using the geometric assumptions (L1)–(L8) as the foun-
dation for our geometric discussions without mentioning Gödel’s incompleteness
theorem ([Henriksen]), regardless of how basic Gödel’s theorem may be. There are
already very substantive issues in school mathematics to worry about without get-
ting bogged down in these subtleties. End of Pedagogical Comments.
Implicit in the preceding discussion that ∗ is well-defined is the fact that there
is a "symmetry" in the role of A and B in A ∗ C ∗ B; i.e., A ∗ C ∗ B if and only
if B ∗ C ∗ A. This is because A ∗ C ∗ B means either A < C < B or B < C < A
relative to a number-line structure on LAB . By this definition, B ∗ C ∗ A means
either B < C < A or A < C < B. Obviously,
A < C < B or B < C < A ⇐⇒ B < C < A or A < C < B.
Therefore, A ∗ C ∗ B ⇐⇒ B ∗ C ∗ A. In words, C is between A and B if and only
if C is between B and A. This is as it should be. The following lemma gives the
most basic property of the betweenness symbol ∗ beyond this "symmetry". For its
statement, we say a collection of points are collinear if they lie on a line.
Lemma 4.4. Given three distinct collinear points A, B, C, then exactly one of
the following three possibilities holds: A ∗ B ∗ C, B ∗ C ∗ A, or C ∗ A ∗ B. (In words,
one and only one of any three collinear points is between the other two.)
Proof. [In terms of the number line, nothing could be more obvious than this
lemma. Therefore consider skipping this proof in a school classroom.]
We will convert the line LAB into a number line by declaring A to be 0 and B
to be 1. Since C is distinct from A and B, C ≠ 0 and C ≠ 1 on this number line.
Thus there are exactly three possibilities: C < 0, 0 < C < 1, and 1 < C. If C < 0,
then C < 0 < 1, which means C ∗ A ∗ B, by the definition of the symbol ∗ on page
167. Similarly, if 0 < C < 1, then A ∗ C ∗ B, which is equivalent to B ∗ C ∗ A by
the remarks immediately preceding Lemma 4.4. Finally, if 1 < C, then 0 < 1 < C
so that A < B < C, which means A ∗ B ∗ C. In summary, we see that there are
exactly three possibilities: C ∗ A ∗ B, B ∗ C ∗ A, A ∗ B ∗ C. The fact that they
are also mutually exclusive can be seen from the fact that, on this number line,
A < B < C, B < C < A, and C < A < B are mutually exclusive on account of the
trichotomy law (page 121). The proof is complete.
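As a sanity check on Lemma 4.4, and not as a substitute for the proof, one can count how many of the three relations hold for points given by their coordinates on a number line. The function names below are hypothetical.

```python
def between(a, c, b):
    """A * C * B for points given by their coordinates on a number line."""
    return a < c < b or b < c < a


def relations_holding(a, b, c):
    """How many of A * B * C, B * C * A, C * A * B hold."""
    return sum([between(a, b, c),   # B is between A and C
                between(b, c, a),   # C is between B and A
                between(c, a, b)])  # A is between C and B
```

For any three distinct numbers, relations_holding should return exactly 1, in agreement with the lemma.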
Finally, we can now give the definition we are after. The line segment, or
more simply the segment joining A and B, is by definition all the points between A
and B, together with A and B themselves.9 The notation for this segment is AB.
It is now clear that AB = BA. The points A and B are called the endpoints of AB.
9 Mathematical Aside: A segment on the number line is therefore what is normally called a
Note that there is no universal agreement on the notation used to denote lines,
segments, and later on, "rays". For example, some books use AB with a two-headed
arrow above it to denote the line passing through A and B, AB with a bar above it
to denote the segment between A and B, and AB with a right-pointing arrow above it
to denote the ray from A to B. One must proceed with caution in each new
situation.
Once we have segments, a natural concept to introduce is that of a polygon.
Intuitively, we do not want the following figure on the left to be called a "polygon"
because it "crosses itself", and we do not want the following figure on the right to
be a "polygon" either because it "doesn’t close up".
[Figure: on the left, a closed figure whose sides cross each other; on the right, a chain of segments that does not close up.]
It is clear that the definition of a polygon requires some care. We first define a
special case of a polygon, the hexagon, which is by definition a geometric figure
(i.e., a subset of the plane) consisting of six points A, B, C, D, E, F in the plane,
together with the six segments
AB, BC, CD, DE, EF , and F A,
so that none of them intersects each other except at the endpoints as indicated;
i.e., AB intersects BC at B, BC intersects CD at C, etc., but there are no other
intersections otherwise, and so that no consecutive segments lie in a line; i.e., we
want to see a "corner" at each A, B, . . . , F , e.g.,
[Figure: a hexagon with vertices A, B, C, D, E, F.]
The six points A, B, . . . , F are called the vertices of the hexagon and the six seg-
ments AB, BC, . . . , F A are its sides or edges. Notice that by its very definition,
a hexagon labels its vertices cyclically in the sense that its sides connect all of
them in alphabetical order until the very end, when the last vertex F is connected
to the first vertex A.
Now that we have defined a hexagon, we want to define a polygon of any
number of sides (or vertices, for that matter), and we are up against a problem
with notation: for six vertices, we can employ A, . . . , F , but if we have a polygon
with 234 vertices, what symbols should we employ to denote these vertices? We can
use numbers instead of letters to denote the vertices, in which case, we can go from
1, 2, . . . all the way to 234. But because integers come up in so many contexts,
sooner or later this would lead to hopeless notational confusion. We are therefore
forced into using subscripts: we can efficiently denote the 234 vertices by the 234
symbols A1 , A2 , A3 , . . . , A233 , A234 . Of course we could have used any letter, say
V , instead of A for this purpose, e.g., V1 , V2 , V3 , . . . , V233 , V234 .
With this digression into notation out of the way, we can now give the general
definitions of a polygon and related concepts.
Let n be any positive integer ≥ 3. An n-sided polygon (or more simply an
n-gon) is by definition a geometric figure (i.e., subset of the plane) consisting of n
distinct points A1 , A2 , . . . , An in the plane, together with the n segments A1 A2 ,
A2 A3 , . . . , An−1 An , An A1 , so that
(i) none of these segments intersects any other except at the
endpoints as indicated; i.e., A1 A2 intersects A2 A3 at A2 , A2 A3
intersects A3 A4 at A3 , etc., but otherwise no other intersection
is allowed, and
(ii) any three consecutive vertices are never collinear:
The second condition excludes, for example, the possibility of calling the following
geometric figure containing four distinct points a 4-gon:
[Figure: four points A1, A2, A3, A4 joined by segments, with A3 lying on the segment A2A4, so that A2, A3, A4 are collinear.]
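Conditions (i) and (ii) can be tested mechanically once the vertices are given by coordinates. The sketch below is an illustration only: it assumes that points are pairs of integers (or fractions) in the coordinate plane, which is not part of the present axiomatic setup, and the names orient, on_segment, segments_meet, and is_polygon are hypothetical.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p); 0 means that
    p, q, r are collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def on_segment(p, q, r):
    """True if r lies on the segment pq, endpoints included."""
    return (orient(p, q, r) == 0
            and min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_meet(p1, q1, p2, q2):
    """True if the segments p1q1 and p2q2 have a point in common."""
    d1, d2 = orient(p2, q2, p1), orient(p2, q2, q1)
    d3, d4 = orient(p1, q1, p2), orient(p1, q1, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # the segments cross at an interior point of both
    # Otherwise they can meet only if an endpoint of one lies on the other.
    return (on_segment(p2, q2, p1) or on_segment(p2, q2, q1)
            or on_segment(p1, q1, p2) or on_segment(p1, q1, q2))


def is_polygon(vertices):
    """Check conditions (i) and (ii) for the points A1, ..., An in order."""
    n = len(vertices)
    if n < 3 or len(set(vertices)) != n:
        return False
    # (ii) no three consecutive vertices are collinear (indices are cyclic).
    for i in range(n):
        if orient(vertices[i - 1], vertices[i], vertices[(i + 1) % n]) == 0:
            return False
    # (i) sides with no common endpoint must not meet. Given (ii), two
    # sides with a common endpoint lie on distinct lines, so they meet
    # only at that endpoint and need no further test.
    for i in range(n):
        for j in range(i + 1, n):
            if (j - i) % n in (1, n - 1):
                continue  # the two sides share a vertex
            if segments_meet(vertices[i], vertices[(i + 1) % n],
                             vertices[j], vertices[(j + 1) % n]):
                return False
    return True
```

With this sketch, a square such as [(0, 0), (1, 0), (1, 1), (0, 1)] passes both tests, a self-crossing figure such as [(0, 0), (1, 1), (1, 0), (0, 1)] fails, and a figure with three consecutive collinear vertices such as [(2, 2), (0, 0), (1, 0), (2, 0)] fails condition (ii).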
When the number of sides is not relevant, we will simply say polygon. In symbols,
a polygon will be denoted by A1 A2 · · · An . If n = 3, the polygon is called a triangle;
if n = 4, a quadrilateral; if n = 5, a pentagon; and if n = 6, a hexagon, as
we have seen. These technical terms have somehow made their way into everyday
conversation. For example, if you want to talk about politics, you had better know
what The Pentagon roughly looks like and that it houses the Department of Defense.
In principle there is a name for every n-gon at least for n ≤ 10. Thus if n = 7, the
polygon is called a heptagon; if n = 8, it is an octagon; if n = 9, it is a nonagon,
and finally if n = 10, it is a decagon. But such extra erudition is hardly necessary
since 7-gon, 8-gon, etc., would do just fine. In these volumes, we normally use
the special names only for n = 3, 4, 5, 6.
Given polygon A1 A2 · · · An , as in the earlier case of the hexagon, the Ai ’s are
called the vertices and the segments A1 A2 , A2 A3 , etc., are called the edges or
sometimes the sides. For each Ai , both Ai−1 and Ai+1 are called its adjacent
vertices (except that in the case of A1 , its adjacent vertices are An and A2 , and in
the case of An , its adjacent vertices are A1 and An−1 ). Thus the sides of a polygon
are exactly the segments joining adjacent vertices. Two sides of a polygon with a
common vertex are called adjacent sides. A line segment joining two nonadjacent
vertices is called a diagonal.
The use of subscripts makes clear (assuming n ≥ 5) that the adjacent vertices
of A2 are A1 and A3 , the adjacent vertices of A3 are A2 and A4 , etc., and that the
adjacent sides of A2 A3 are A1 A2 and A3 A4 , the adjacent sides of A3 A4 are A2 A3
and A4 A5 , etc. However, a beginner may have more trouble visualizing that the
adjacent vertices of An are A1 and An−1 or that the adjacent sides of An−1 An are
A1 An and An−2 An−1 . One way to overcome this notational quirk is to think of the
[Figure: the vertices of the n-gon, among them An−1 , A3 , and A4 , placed in order around a circle.]
Then it is quite clear from this arrangement whether or not two vertices or two
sides are adjacent around the vertex An of an n-gon.
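The cyclic labeling can also be made explicit in a few lines of code. In this sketch the vertex Ai is represented by its subscript i, and the function names adjacent_vertices and diagonals are hypothetical, chosen only for illustration.

```python
def adjacent_vertices(i, n):
    """Subscripts of the two vertices adjacent to A_i in an n-gon,
    where 1 <= i <= n and the labeling wraps around from n back to 1."""
    previous = n if i == 1 else i - 1
    following = 1 if i == n else i + 1
    return previous, following


def diagonals(n):
    """All pairs (i, j) with i < j such that A_i and A_j are nonadjacent,
    i.e., such that the segment A_i A_j is a diagonal of the n-gon."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if j not in adjacent_vertices(i, n)]
```

For example, adjacent_vertices(1, 234) returns the subscripts 234 and 2, which is the special case for A1 singled out in the text, and diagonals(3) is empty because every two vertices of a triangle are adjacent.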
The following are examples of polygons (with the labeling of the vertices omit-
ted):
[Figure: several examples of polygons.]
Line separation
Assumption (L3) enables us to say quite a bit more about lines in the plane.
To this end, we first introduce a definition. A subset R in a plane is called convex
if given any two points A, B in R, the segment AB lies completely in R. This
definition has the obvious advantage of being simple to use, but does it capture
the intuitive feeling of "convexity"? By doing lots of drawings, you will see that it
does. For example, the shaded figures below are not convex, as the segment AB in
each case fails to lie within the figure.
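[Figure: two shaded nonconvex figures, each containing points A and B whose segment AB does not lie within the figure.]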
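The drawings suggested above can be imitated by a small search. The following sketch assumes, purely for illustration, that a region is described by a membership test on points with coordinates, and it looks for two points A and B of the region together with a point of the segment AB outside the region. Finding such a witness shows the region is not convex; finding none proves nothing, since only finitely many points are tried. The name find_nonconvexity_witness and its parameters are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def find_nonconvexity_witness(contains, sample, steps=8):
    """Search the sample for points A, B in the region and a point X
    of the segment AB outside it. Return (A, B, X), or None if the
    search finds nothing (which does not prove convexity)."""
    inside = [p for p in sample if contains(p)]
    for a, b in combinations(inside, 2):
        for k in range(1, steps):
            t = Fraction(k, steps)  # 0 < t < 1, so X is between A and B
            x = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if not contains(x):
                return a, b, x
    return None


# Two regions to try on a grid of sample points.
sample = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]


def disk(p):
    return p[0] ** 2 + p[1] ** 2 <= 9


def ring(p):
    return 1 <= p[0] ** 2 + p[1] ** 2 <= 9
```

The search finds nothing for the disk, whereas for the ring it returns two points of the ring whose segment passes through the hole in the middle.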
Every line and the plane itself are of course convex. The convexity of the plane
is immediate from the definition of convexity, but because we are beginning to
prove geometric theorems, we should prove the convexity of a line. So given a line
L, suppose A and B are distinct points of L. We have to prove that the segment
AB lies in L. By assumption (L1) on page 165, there is a line LAB joining A and
B, and by the definition of a segment on page 169, the segment AB consists of all
the points between A and B on LAB , together with A and B. Thus AB ⊂ LAB .
However, another part of (L1) asserts that there is only one line joining A and B;
therefore necessarily L = LAB . Thus AB ⊂ L after all.
10 Note that, here, we are using the concept of a "circle" in an informal way. The formal
Many common figures, such as the inside of a triangle or a rectangle or a circle,
once "inside" has been properly defined (see p. 196 and p. 186), will also be seen to
be convex. It is also a simple exercise to show that the intersection of two convex
sets is convex (Exercise 7 on p. 180). Taking intersections of convex sets will turn
out to be a very productive way of generating new convex sets (see, e.g., pp. 181ff.).
If we have a number line L0 , then "the positive half-line" L0⁺ consisting of all
the positive numbers is convex: indeed if a and b are in L0⁺ and a < b, then the
segment joining a to b is the interval [a, b] consisting of all the numbers x satisfying
a ≤ x ≤ b. Since a is positive, x has to be positive and therefore every point in
this segment also lies in L0⁺ . For analogous reasons, "the negative half-line" L0⁻
consisting of all the negative numbers is convex. Observe also that the number
line L0 is now broken up into three parts: L0⁺ , L0⁻ , and the set {0} consisting of
the point 0 alone. These three parts have the properties that (1) no two of the
parts have any point in common and (2) the union of L0⁺ , L0⁻ , and {0} is the whole
number line L0 . Furthermore, the line segment joining a negative number A to a
positive number B must contain 0; i.e., 0 ∈ [A, B] if A < 0 and B > 0, as shown:
[Figure: the number line L0 with A to the left of 0 and B to the right of 0.]
By virtue of assumption (L3), we can transfer what we have just said about the
number line to any line in the plane. Let us first set up some common terminology.
A set is said to be nonempty if it contains at least one point. A collection of subsets
in the plane is said to be disjoint if no two of them have a point in common. For
example, the three sets L0⁻ , L0⁺ , and {0} on the number line L0 above are disjoint.
Since the three sets L0⁺ , L0⁻ , and {0} are disjoint and their union is the whole line,
it is common to say in this situation that the line is the disjoint union of L0⁺ , L0⁻ ,
and {0}.
[Figure: a line with the point O lying between the points A and B.]
Proof. By assumption (L3) (page 167), we can make L into a number line
with O being the 0 of the number line. Then letting L+ , L− , and {O} be the sets
L0⁺ , L0⁻ , and {0} with respect to this number-line structure on L, we see by the
reasoning preceding the lemma that everything claimed in this lemma is true. The
proof is complete.
When A and B belong to the same half-line, we sometimes say that A and B
are on the same side of O. If A and B belong to different half-lines, then we say
they are on opposite sides of O or on different sides of O.
Lemma 4.5 enables us to determine, given a point O on a line L, whether two
points A and B are on the same side or on opposite sides of O. Precisely:
(a) Two points A and B belong to the same side of O ⇐⇒ the
segment AB does not contain O.
(b) Two points A and B belong to opposite sides of O ⇐⇒ the
segment AB contains O.
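Criteria (a) and (b) are easy to test once L is made into a number line. In the sketch below, points of L are represented by their coordinates, and the names segment_contains and same_side are hypothetical, introduced only for this illustration.

```python
def segment_contains(a, b, x):
    """True if the point x lies on the segment joining a and b
    (the points between a and b, together with a and b themselves)."""
    return min(a, b) <= x <= max(a, b)


def same_side(a, b, o):
    """True if a and b, both different from o, lie in the same half-line
    of o: a - o and b - o are both positive or both negative."""
    return (a - o) * (b - o) > 0
```

According to (a), same_side(a, b, o) should hold exactly when segment_contains(a, b, o) fails, and (b) is the same statement with both sides negated.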
The proof is simplicity itself. To prove (a), for example, first assume A and B
belong to the same side of O; let us say A, B ∈ L+ .
[Figure: the line L with the half-lines L− and L+ of O, and the points A and B on L+ .]
You may wonder why we bother with Lemma 4.5 since all it does is to restate the
obvious fact that the number line is the disjoint union of the positive and negative
half-lines and {0}. There are two reasons. One is that Lemma 4.5 provides a direct
geometric description of the separation of any line in the plane by a point lying in it
independent of any identification with a number line. Since we will be concentrating
on doing geometry, there will be many occasions when we want to be free from the
distractions of numbers. A second reason is that Lemma 4.5 sets up a model for
the next assumption, (L4) on page 176.
With notation as in Lemma 4.5, the set consisting of the point O and one of the
half-lines determined by O is called a ray. Thus O determines two rays. We also
say these are rays issuing from O. If we want to specifically refer to the ray con-
taining A, we use the symbol ROA . We will also refer to ROA as the ray from O
to A. The point O is the vertex of the ray ROA . If O is between A and B, then
the two rays ROA and ROB have only the vertex O in common and are sometimes
referred to as opposite rays. Each ray is, intuitively, infinite in only one direction.
The following lemma will be needed for the definition of a translation (pp.
231ff.). It also nicely illustrates why the concept of betweenness (page 167) is
useful.
[Figure: three points O, A, B on a line.]
Plane separation
From lines we next turn to the whole plane. Two rays are defined to be distinct
if there is a point in one that does not lie in the other. Given two distinct rays ROA
and ROB with a common vertex O, the angle ∠AOB is intuitively the shaded region
Γ of the plane "between" these two rays, together with the two rays themselves, as
shown:
[Figure: the rays ROA and ROB issuing from O, with the region Γ between them shaded.]
How to describe this shaded region Γ precisely is the next order of business and
we will get to that in the next section. What we do in this subsection is lay the
groundwork for such a description. To this end, we will need the planar analog
of Lemma 4.5 for a line (page 173). However, since we no longer have the planar
analog of (L3) (page 167)—which was used to justify Lemma 4.5—we will have to
take the drastic step of asserting the truth of the planar analog of Lemma 4.5 in a
new assumption. To this end, we define in general that the plane is the disjoint
union of three sets U , V , and W if these three sets are disjoint and if the union of
these three sets is the whole plane (compare the definition of disjoint union for a
line on page 173).
(L4) (Plane separation) A line L separates the plane into two nonempty
subsets, H+ and H− , called the half-planes of L. The half-planes H+ and H−
satisfy the following two properties:
(i) The plane is the disjoint union of H+ , H− , and L, and the
half-planes H+ and H− are convex.
[Figure: the line L with the half-planes H− and H+ on either side of it.]
(ii) If two points A and B in the plane belong to different half-
planes, then the line segment AB must intersect the line L.
[Figure: the points A and B in different half-planes of L, with the segment AB crossing L.]
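Although (L4) is an assumption about the plane and not a computation, it may help to see what it says in the coordinate plane. The sketch below is an illustration under an assumption that goes beyond the present axioms: the line L is taken to be the solution set of ax + by = c, and the two half-planes are taken to be the sets where ax + by − c is positive and where it is negative. The names side and crossing_point are hypothetical.

```python
from fractions import Fraction


def side(line, p):
    """+1 or -1 according to the half-plane of a*x + b*y = c that
    contains p, and 0 if p lies on the line itself."""
    a, b, c = line
    v = a * p[0] + b * p[1] - c
    return (v > 0) - (v < 0)


def crossing_point(line, p, q):
    """For p and q in different half-planes, return the point at which
    the segment pq meets the line, as promised by (L4)(ii)."""
    a, b, c = line
    fp = a * p[0] + b * p[1] - c
    fq = a * q[0] + b * q[1] - c
    if fp * fq >= 0:
        raise ValueError("p and q must lie in different half-planes")
    # a*x + b*y - c changes linearly along the segment from fp to fq,
    # so it vanishes at the parameter t below, and 0 < t < 1.
    t = Fraction(fp, fp - fq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
```

For the line y = 0, written as (0, 1, 0), the points (0, 2) and (3, -1) lie in different half-planes, and crossing_point returns a point X of the segment with side((0, 1, 0), X) equal to 0.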
Lemma 4.7. Let L be a line in the plane, and let B be a point in the half-plane
H+ of L. Suppose a line ℓ containing B intersects L at a point A. Then the half-line
of A on ℓ containing B is the intersection H+ ∩ ℓ, and the ray RAB is the
intersection of ℓ with the closed half-plane of L containing H+ .
[Figure: the line ℓ meeting L at the point A, with the point B on ℓ in the half-plane H+ .]
Proof. [Since the lemma is pictorially obvious, consider skipping this proof in a
school classroom.]
Let the half-line of A on ℓ containing B be denoted by ℓ+ . To show that
ℓ+ = H+ ∩ ℓ, we first prove H+ ∩ ℓ ⊂ ℓ+ . Let P be a point on ℓ that lies in H+ , and
we will prove P ∈ ℓ+ . If P = B, then of course B lies in ℓ+ . So we may assume
P ≠ B. Since both P and B are in H+ , the segment PB lies in H+ because the
latter is convex. Thus PB contains no point of L (because L and H+ are disjoint)
and, in particular, does not contain A. Therefore P , as a point of ℓ, lies on the
same side of A as B (see (a) on page 174), i.e., lies in ℓ+ .
Conversely, we will prove the reverse inclusion: ℓ+ ⊂ H+ ∩ ℓ. Thus, suppose
P lies on ℓ+ and we will show P ∈ H+ ∩ ℓ; i.e., we will show P lies in H+ , the
half-plane of L containing B. Now P being in ℓ+ means the segment PB does not
contain A. This implies that PB contains no point of the line L because, if PB
contains a point C of L, then LPB ∩ L is the point C. But LPB = ℓ, so ℓ ∩ L is C.
But we are given that ℓ ∩ L is A, so A = C (Lemma 4.2 on page 165). Thus PB
contains A, a contradiction. So PB contains no point of the line L after all and P
and B must lie on the same side of L; i.e., P ∈ H+ (see (L4)(ii)). The proof that
ℓ+ = H+ ∩ ℓ is complete. Since the second assertion in the lemma about the ray
RAB is now trivial, we have proved the lemma.
Activity. Let L be a line in the plane and let H+ and H− be its half-planes.
If P ∈ L, let ℓ be a line distinct from L and passing through P . Prove that the two
sets ℓ ∩ H+ and ℓ ∩ H− are the half-lines of ℓ with respect to P .
We wish to make a further comment about assumption (L4). Clearly, one would
prefer a more explicit description of the half-planes of a line. After all, if a line is
drawn on a piece of paper or on a black board, one can point to the two "halves"
of the plane separated by the line. In a middle school classroom, this is what one
should do without a doubt: just point to the half-planes and not burden students
with abstract statements like (L4). But in high school, it is time for students to
learn to appreciate the difficulty of transcribing obvious visual information into
precise and (in this case) abstract language. Instead of waving their hands about
what is "on the left" or what is "on the right", they learn instead to use properties
(i) and (ii) in (L4) to pin down precisely what these half-planes are.11 Although
(i) and (ii) are nonintuitive, they nevertheless leave no doubt about each half-plane
being exactly what our intuition says it is: any two points in the same half-plane
can be joined by a segment disjoint from L so that we can walk along this segment
from one point to the other without crossing L (see (a) on page 176). Moreover,
if we are given two points A and B in the plane, "separated by L", then it means
that, intuitively, one cannot get from A to B without crossing L (see (b) on page
176). Without more information about the line, this is all we can say about its
11 The same abstract idea will be used once more at the end of this section for the statement
of an analogous theorem about polygons (Theorem 4.13 on page 195).
Lemma 4.8. Let L and L′ be two distinct lines and let P1 , P2 , and P3 be three
distinct points on L so that P1 ∗ P2 ∗ P3 . Let three mutually parallel lines (cf. Lemma
4.3 on page 166) passing through P1 , P2 , and P3 intersect L′ at Q1 , Q2 , and Q3 ,
respectively. Then Q1 ∗ Q2 ∗ Q3 .
Remark. The lemma does not preclude the possibility that Pi = Qi for i = 1,
2, or 3, as the right picture above suggests. This lemma may seem frivolous at first
sight, but it has nontrivial applications. It will be used in the proof of Theorem
G15* on page 266, and it underlies the fact that a linear function defined on a line
in the plane is monotone; i.e., it is either constant or increasing or decreasing, as
the proof of (‡) on page 344 shows.
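[Figure: two pictures of the lines L and L′, with the points P1 , P2 , P3 on L and Q1 , Q2 , Q3 on L′, and with the line L3 passing through P3 and Q3 ; in the right picture, P1 = Q1 .]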
For simplicity, let us denote the line LP3Q3 by L3 . If P1 ≠ Q1 , then the line
containing P1 and Q1 being parallel to L3 (by hypothesis) does not intersect L3 .
In particular, the segment P1 Q1 does not intersect L3 and, by assumption (L4)(ii)
(on page 176), P1 and Q1 lie on the same side of L3 . Needless to say, the same
is true if P1 = Q1 . Next, we note that the segment P1 P2 does not intersect L3
either because, if it does, it has to intersect L3 at P3 (two distinct nonparallel lines
intersect at only one point, by Lemma 4.2 on page 165), and therefore P1 ∗ P3 ∗ P2
contradicts our assumption that P1 ∗P2 ∗P3 (see Lemma 4.4 on page 169). Therefore
P1 and P2 also lie on the same side of L3 . Altogether, we see that the points P1 ,
P2 , and Q1 are on the same side of L3 .
Since we are assuming that Q1 Q2 intersects L3 at Q3 , by (L4)(i) on page 176,
Q1 and Q2 lie on opposite sides of L3 . Since P2 and Q1 have been shown in the
preceding paragraph to lie on the same side of L3 , it follows that P2 and Q2 also
lie on opposite sides of L3 . By (L4)(ii) (on page 176), the segment P2 Q2 intersects
L3 . A fortiori, the line LP2 Q2 intersects L3 , and this contradicts the hypothesis of
the lemma that these are parallel lines. The proof of Lemma 4.8 is complete.
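The conclusion of Lemma 4.8 can be checked numerically in one concrete case. The following Python sketch is only an illustration and not part of the proof: it assumes the usual coordinate model of the plane, and the helper names `between` and `intersect` are hypothetical, introduced here for the example.

```python
from fractions import Fraction

def between(a, b, c):
    """Return True if a * b * c, i.e., b lies strictly between a and c on one line."""
    # a, b, c are collinear exactly when this cross product vanishes.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if cross != 0:
        return False
    # b is strictly between a and c when the vectors b->a and b->c point in opposite directions.
    dot = (a[0] - b[0]) * (c[0] - b[0]) + (a[1] - b[1]) * (c[1] - b[1])
    return dot < 0

def intersect(p, d, q0, e):
    """Intersection of the line through p with direction d and the line through q0 with direction e."""
    # Solve p + t*d = q0 + s*e for s by Cramer's rule; det == 0 means the lines are parallel.
    rx, ry = q0[0] - p[0], q0[1] - p[1]
    det = e[0] * d[1] - d[0] * e[1]
    if det == 0:
        raise ValueError("the two lines are parallel")
    s = Fraction(d[0] * ry - d[1] * rx, det)
    return (q0[0] + s * e[0], q0[1] + s * e[1])

# Assumed example: L is the x-axis, and L' passes through (0, 2) with direction (1, 1).
P1, P2, P3 = (0, 0), (1, 0), (3, 0)
d = (1, 2)  # common direction of the three mutually parallel lines
Q1, Q2, Q3 = (intersect(P, d, (0, 2), (1, 1)) for P in (P1, P2, P3))
print(between(P1, P2, P3), between(Q1, Q2, Q3))
```

Exact rational arithmetic is used so that the collinearity test is not affected by rounding; the inputs are assumed to be integers or fractions.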
Exercises 4.1.
(1) Let L1 ∥ L2 and let a third line ℓ be distinct from L1. Prove that if ℓ
intersects L1, then it must intersect L2.
(2) Give all the reasons why the following figure with five vertices cannot be
made into a polygon no matter how the vertices are labeled:
[Figure: a configuration of five vertices.]
(3) Let A and P be two distinct points on a line. Suppose we fix one of the
two rays issuing from A and call it R1 . Then one and only one of the
two rays issuing from P has the property that it either contains R1 or is
contained in R1 .
(4) Let P , Q, and S be three distinct points on a line L so that P ∗ Q ∗ S.
(i) Show that the two rays RPQ and RPS are equal. (ii) Let L′ be a line
passing through P and distinct from L. Prove that Q and S belong to
the same half-plane of L′.
12 A corresponding uniqueness result holds for half-planes of a line and will be proved in
Definition of an angle
[Figure: the convex region Γ determined by the rays ROA and ROB.]
Note that Γ contains both ROA and ROB, by definition. The other subset determined
by ROA and ROB is the union of the complement14 of Γ, together with the
two rays ROA and ROB. This is the shaded region below (reminder: the shading
is supposed to extend infinitely above, below, and to the left). Call this region Γ′.
We note explicitly that Γ and Γ′ are not disjoint because they have the two rays
ROA and ROB in common.
[Figure: the shaded nonconvex region Γ′ determined by the rays ROA and ROB.]
There will be the rare occasion when we have to scrutinize Γ′, so it will be useful to
have a characterization of Γ′, and it is this: Γ′ is the union of the closed half-plane
of LOA not containing B and the closed half-plane of LOB not containing A. We
leave the straightforward proof of this fact to an exercise (Exercise 2 on page 198).
We claim that Γ′ is not convex. To see this, join A to B and let P be a point
on the segment AB so that A ∗ P ∗ B. We will prove that P does not lie in Γ′, which
will show that Γ′ is not convex because A and B themselves lie in Γ′. By the
convexity of Γ, P is in Γ and is therefore not in the complement of Γ. Since Γ′
consists of the complement of Γ and the two rays ROA and ROB, to show P is not
in Γ′, we must show that P lies in neither ROA nor ROB. If it were to lie in ROA
or ROB, let us say P ∈ ROA. Now P ≠ A, so LOA and LAB have two distinct
points A and P in common. By Lemma 4.2 on page 165, we have LAB = LOA, and
therefore B ∈ LOA (since B ∈ LAB). This contradicts the hypothesis that A, B,
and O are noncollinear points. Thus Γ′ is not convex, as claimed.
13 Recall an earlier remark on page 173 about generating new convex sets by intersecting old
ones.
14 The complement of a set S in the plane is by definition the set of all the points in the
Either the convex set Γ or the nonconvex set Γ′ determined by these two rays
ROA and ROB is called the angle determined by these rays. Unless stated to
the contrary, we follow the standard practice of taking the convex subset to be the
angle and denote it by ∠AOB. These rays are called the sides of the angle, and
the point O the vertex of the angle. We emphasize that, in these three volumes, an
angle is always one of the (convex and nonconvex) regions of the plane determined
by two rays with a common vertex—and each angle always includes both rays.
(Most books follow a different convention by defining an angle to be the union of
the two rays themselves. The present definition suits our purpose better.) If we
want to consider the angle determined by the nonconvex subset, we will have to
say so explicitly or use an arc to so indicate, e.g.,
[Figure: the nonconvex angle with vertex O and sides ROA and ROB, indicated by an arc.]
A better notation, one that will be used often, is to use an arc and a letter in the
region to indicate the angle. Thus ∠b denotes the convex region in the left figure
below, while ∠c denotes the nonconvex region in the right figure below.
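[Figure: left, the convex angle ∠b with vertex O and sides through A and B; right, the
nonconvex angle ∠c with vertex O and sides through A and B.]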
The angles of a triangle (see p. 171 for the definition) are usually denoted by yet
another notation. For a triangle, such as ABC in the left figure below, we will
usually let ∠A stand for ∠BAC, ∠B for ∠ABC, and ∠C for ∠ACB.
[Figure: left, a triangle ABC; right, a figure with vertices A, B, C, and D.]
angle, which is intuitively the correct one to look at. We will lightly touch on this
delicate issue on page 197.
So far, we have dealt with the situation where A, O, B are not collinear. The
remaining case that they are collinear is of special interest. If O, A, B are distinct
collinear points, then A and B are in either the same half-line with respect to O or
in opposite half-lines. First, we look at the former case. Following the definition of
an angle, we get two sets: the "region" between the rays ROA and ROB together
with the rays themselves (which is ROA = ROB ) and the complement of the rays
ROA and ROB together with the rays themselves (which is the whole plane), as
shown below.
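[Figure: left, the zero angle determined by the rays ROA and ROB; right, the full angle
determined by the same rays.]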
We define these special angles to be, respectively, the zero angle and the full
angle determined by the coincident rays ROA and ROB.
Now suppose A and B are collinear with O but lie in opposite half-lines with
respect to O, i.e., A ∗ O ∗ B (Lemma 4.6(i) on page 174). In this case, either
closed half-plane determined by the line LAB is by definition the straight angle
determined by the opposite rays ROA and ROB . Thus ∠AOB will refer to either
closed half-plane.
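[Figure: left, three parallel lines, with A and B on the bottom line, C, E′, and B′ on the
middle line, and D and E on the top line; right, three lines through O containing the
points M, N, and P, respectively.]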
Suppose we have chosen the segment AB in the bottom line and the segment DE
in the top line as a unit segment in its respective line. Intuitively, an appropriate
translation15 in the upward direction will bring the bottom line to the middle line
so that the point B goes to the point B′ and A to C. Since we expect translations
to be length-preserving, the segment CB′ will have length 1. Similarly, a suitable
translation in the downward direction will bring the top line to the middle line
and bring D to C and E to E′. But DE is a unit segment and the translation is
length-preserving, so the segment CE′ also has to be of length 1. Obviously, not
both CE′ and CB′ can have length 1, so we have a contradiction. This shows that
the choice of a unit segment on a given line cannot be random.
We can also look at the same phenomenon from a different perspective. Con-
sider the three lines LOM , LON , and LOP in the right picture above. Again, suppose
we have chosen the segments OM , ON , and OP as unit segments in their respective
lines. Then M , N , and P all lie on the unit circle around O. Still speaking intu-
itively, it is clearly not a comforting thought that this unit circle does not appear
to be "round" as a result of such random choices of unit segments on these lines!
This naive discussion points to the fact that the concept of the length of a seg-
ment in the plane is far from simple; it can only be fully understood in the context
of rotations, translations, and reflections of the plane (which will be addressed in
assumption (L7) on page 237). Right now, we will make an assumption so that we
can assign "lengths" to segments in the plane in a "consistent" way. Precisely:
(L5) To each pair of points A and B of the plane, we can assign a number
dist(A, B), called the distance between A and B, so that
(i) dist(A, B) = dist(B, A) and dist(A, B) ≥ 0. Furthermore,
dist(A, B) > 0 ⇐⇒ A ≠ B.
(ii) Given a ray with vertex O and a positive number r, there is
a unique point B on the ray so that dist(O, B) = r.
(iii) Let O and A be two points on a line L so that dist(O, A) = 1,
and let O and A be the 0 and 1 of a number line on L (as in (L3)).
Then for any two points P and Q on L, dist(P, Q) coincides with
the length of the segment P Q on this number line.
(iv) If A, B, C are collinear points and C is between A and B,
then
dist(A, B) = dist(A, C) + dist(C, B).
Let us amplify on (ii) and (iii). (ii) guarantees that, given a positive number r
and a point O in the plane, there are many points A of distance r from O (one for
each ray issuing from O). Note also that, without (ii), (iii) would not make sense
because we would not know whether there is a point A on L so that dist(O, A) = 1.
15 The precise meanings of rotations, reflections, and translations will be given in Sections
4.4 and 4.5, and the assumption about them is (L7) on page 237.
4.2. THE BASIC VOCABULARY, PART 2 185
Now the main thrust of (iii) is that, with O and A as 0 and 1, respectively, on L,
the concept of the length of a segment [P, Q] on the number line L is already well-
defined (see page 6), and it would potentially be confusing if this length were to
differ from the distance between P and Q. Fortunately, (iii) averts such confusion.16
It may be worthwhile to further point out that, according to equation (2.37)
on page 126, the length of a segment [P, Q] on L is |P − Q|. In this light, what (iii)
shows is that the distance between P and Q can be computed by |P − Q| where P
and Q are now regarded as two numbers on this number line. This fact is of critical
importance when we get to setting up coordinates in the plane (see Section 6.3 on
pp. 331ff.).
We will refer to the assignment of a number dist(A, B) to each pair of points
A and B in the plane as the distance function. In view of (iii), the length of a
segment AB for any two points A and B, to be denoted by |AB|, will henceforth
be defined to be dist(A, B) without any reference to the line containing A and B.
Thus "length of a segment" retains the intuitive meaning of "the distance between
the endpoints".
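To make the four parts of (L5) concrete, here is a minimal Python sketch in which the usual Euclidean distance of the coordinate plane plays the role of dist. This is an assumption made for illustration: (L5) itself does not say how dist is computed, and the function names `dist` and `point_on_ray` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between points a and b, one standard model of the dist in (L5)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def point_on_ray(o, a, r):
    """The point B on the ray from o through a with dist(o, B) = r, as in (L5)(ii)."""
    length = dist(o, a)
    if length == 0 or r <= 0:
        raise ValueError("need a != o and r > 0")
    scale = r / length
    return (o[0] + scale * (a[0] - o[0]), o[1] + scale * (a[1] - o[1]))

O, A = (0.0, 0.0), (3.0, 4.0)
B = point_on_ray(O, A, 10.0)           # A is between O and B on the same ray
print(dist(O, A) == dist(A, O))        # symmetry, (L5)(i)
print(math.isclose(dist(O, B), 10.0))  # the point promised by (L5)(ii)
print(math.isclose(dist(O, B), dist(O, A) + dist(A, B)))  # additivity, (L5)(iv)
```

The comparisons use `math.isclose` because floating-point arithmetic only approximates real numbers.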
Anticipating assumption (L7) on page 237, we hasten to show how (L5), to-
gether with (L7), will rule out the absurd situations depicted in the pictures on
page 184. First, we show that it is impossible that both CE′ and AB have length
1 in the picture below. Suppose they do, and we will deduce a contradiction.
[Figure: A and B on the bottom line; C, E′, and B′ on the top line, with E′ between C and B′.]
The "upward" vertical translation that moves the bottom line to
the top line will move A to C and B to B′, and since translation preserves length
of segments, the lengths of CB′ and AB will be the same. By (iv) of assumption
(L5), we get
|CE′| + |E′B′| = |CB′| = |AB| = 1.
Since also |CE′| = 1 by hypothesis, we have
1 = |CE′| = |CB′| − |E′B′| = |AB| − |E′B′| < |AB| = 1,
where the inequality is because |E′B′| > 0, by (i) of (L5). Therefore 1 < 1, a
contradiction. So if we assume (L5) and (L7), the fact that AB has length 1 in the
bottom line will preclude CE′ from having length 1 in the top line.
Similar considerations will show why it is impossible that all three segments
OM, ON, and OP in the following picture could have length 1. Again, suppose
they do, and we will deduce a contradiction.
[Figure: three rays issuing from O through M, N, and P, with the point M′ on the ray RON.]
16 With hindsight, the requirement in (iii), that dist(O, A) = 1, is necessary because the
Indeed, if we rotate counterclockwise around O until the ray ROM is on top of the
ray RON, then the point M will be rotated to the point M′ on RON. Since rotation
preserves length, we have |OM′| = |OM| = 1. By (L5)(iv),
|ON| + |NM′| = |OM′| = 1.
A circle whose radius is of length 1 is called a unit circle, and a disk of radius
1 is called a unit disk. In general, given a circle C of radius r around a point O,
we say a point P is inside C if P is in the closed disk of radius r around O; i.e.,
dist(P, O) ≤ r. We say P is in the exterior of C if dist(P, O) > r.
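The two cases in this definition can be written out as a short Python sketch. It assumes the coordinate model of the plane, and the function name `position` is hypothetical. Note that, in the sense just defined, a point on the circle itself counts as inside.

```python
def position(p, o, r):
    """Classify p relative to the circle of radius r (r >= 0) around o, following the text."""
    # Compare squared quantities so that integer or Fraction inputs give exact answers.
    d2 = (p[0] - o[0]) ** 2 + (p[1] - o[1]) ** 2
    if d2 > r * r:
        return "exterior"  # dist(P, O) > r
    return "inside"        # dist(P, O) <= r: the closed disk, which includes the circle

print(position((3, 4), (0, 0), 5))  # a point on the circle
print(position((1, 1), (0, 0), 5))
print(position((6, 0), (0, 0), 5))
```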
Degrees of angles
We need one more definition before we can introduce the concept of the degree
of an angle. We say two angles ∠AOC and ∠COB, with a common side ROC , are
adjacent angles with respect to ∠AOB if C belongs to ∠AOB (as a region in
the plane). Let it be stated explicitly that, in this case, although ∠AOB can denote
either the convex subset or the nonconvex subset, once the choice is made, then
∠AOC and ∠COB are understood to be subsets of ∠AOB. For example, suppose
∠AOC and ∠COB are adjacent angles with respect to a convex angle ∠AOB as
in the left picture below.
17 This is the standard way to use the words "open" and "closed" in advanced mathematics.
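[Figure: left, adjacent angles ∠AOC and ∠COB with respect to the convex angle ∠AOB, with
∠AOC shaded; right, the nonconvex angle ∠AOC indicated by an arc.]
Then ∠AOC in this context will have to be the convex (shaded) subset in the left
picture above rather than the nonconvex subset indicated by the arc in the right
picture above. Similarly, ∠COB in this context will have to be the convex angle.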
Next, consider the situation where ∠AOC and ∠COB are adjacent angles with
respect to a nonconvex angle ∠AOB as in the left picture below; then in this case
∠AOC (for example) has to be the shaded subset on the left and not the nonconvex
subset indicated by the arc on the right.
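[Figure: left, adjacent angles ∠AOC and ∠COB with respect to the nonconvex angle ∠AOB, with
∠AOC shaded; right, the nonconvex angle ∠AOC indicated by an arc.]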
An interesting example of adjacent angles is the case of a straight angle ∠AOB
(see p. 183 for the definition of straight angle): let the ad hoc notation of Π+ denote
the upper closed half-plane of LAB and let Π− denote the lower closed half-plane
of LAB :
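[Figure: the line LAB through A, O, and B, with the upper closed half-plane Π+ and the
lower closed half-plane Π−.]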
Then Π+ and Π− are adjacent angles with respect to the full angle at O (see page
183 for the definition of full angle).
Adjacent angles ∠AOC and ∠COB (with respect to ∠AOB) are the analogs,
among angles, of adjacent segments AC, CB so that A, B, C are collinear and C
is between A and B.
[Figure: a line with the points O, A, C, and B marked in that order.]
The concept of adjacent angles will allow us to formulate the analog of part (iv) in
assumption (L5) above.
Now we can introduce the concept of the degree of an angle by way of an
assumption. Intuitively, every angle has a degree, a straight angle should be 180
degrees, and the "full" angle should be 360 degrees. Our assumption now takes the
following form:
(L6) To each angle ∠AOB, we can assign a number |∠AOB|, called its
degree, so that:
(i) 0 ≤ |∠AOB| ≤ 360◦ , where the small circle ◦ is the abbrevi-
ation of "degree".
(ii) Given a ray ROB and a number x so that 0 < x < 360 and
x ≠ 180, let one of the two closed half-planes of the line LOB be
specified. Then there is a unique ray ROA lying in the specified
closed half-plane of LOB so that |∠AOB| = x◦ , where ∠AOB
denotes the convex angle if x < 180, and the nonconvex angle if
x > 180.
(iii) |∠AOB| = 0◦ ⇐⇒ ∠AOB is the zero angle; |∠AOB| = 180◦
⇐⇒ ∠AOB is a straight angle; |∠AOB| = 360◦ ⇐⇒ ∠AOB is
the full angle at O.
(iv) If ∠AOC and ∠COB are adjacent angles with respect to
∠AOB, then
|∠AOC| + |∠COB| = |∠AOB|.
Observe that parts (i), (ii), and (iv) of (L6) are the exact analogs of (i), (ii),
and (iv) of (L5). Also observe that, with respect to the previous situation of two
straight angles Π+ and Π− being adjacent angles with respect to the full angle
at O, part (iv) of (L6) now provides a consistency check on part (iii) of the same
assumption (L6).
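[Figure: the straight angles Π+ and Π− on the two sides of the line LAB through O.]
Indeed, (iv) says that the following is valid:
|Π+| + |Π−| = |full angle at O|.
But (iii) implies that the left side is equal to 180◦ + 180◦, which is also 360◦. So
indeed the equality is valid.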
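The additivity in part (iv) of (L6), and the consistency check just made, can also be tried out numerically. The Python sketch below assumes the coordinate model of the plane and computes degrees with the arctangent function, which is not how (L6) introduces them; the names `convex_degree` and `nonconvex_degree` are hypothetical.

```python
import math

def convex_degree(a, o, b):
    """Degree in [0, 180] of the convex (or zero, or straight) angle AOB in coordinates."""
    ux, uy = a[0] - o[0], a[1] - o[1]
    vx, vy = b[0] - o[0], b[1] - o[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    return math.degrees(math.atan2(abs(cross), dot))

def nonconvex_degree(a, o, b):
    """Degree of the other angle determined by the same two rays."""
    return 360.0 - convex_degree(a, o, b)

O, A, B, C = (0, 0), (1, 0), (0, 1), (1, 1)  # C lies in the convex angle AOB
print(math.isclose(convex_degree(A, O, C) + convex_degree(C, O, B), convex_degree(A, O, B)))

A_opposite = (-1, 0)  # A * O * A_opposite, so the two rays form a straight angle
straight = convex_degree(A, O, A_opposite)
print(math.isclose(straight + straight, nonconvex_degree(A, O, (2, 0))))  # 180 + 180 vs. the full angle
```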
We can also use property (iv) of the degree of an angle to prove something that
confirms our intuition. Given two rays ROA and ROB with a common vertex O so
that O, A, and B are not collinear, then they determine two angles, one convex
and the other nonconvex (page 181), and both angles are denoted by ∠AOB. The
next lemma is intuitively clear.
Proof. We first prove that if ∠AOB is convex, then |∠AOB| ≤ 180◦ . By the
definition on p. 182, the convexity of ∠AOB implies that it is the intersection of
the closed half-plane ΠA of LOB containing A and the closed half-plane ΠB of LOA
containing B.
[Figure: the convex angle ∠AOB and the closed half-plane ΠA, with the points E, O, and B
on the line LOB.]
In view of Lemma 4.9, our convention of taking every angle that is not a
straight angle to be convex therefore amounts to assuming that, unless stated to
the contrary, such an angle is < 180◦ .
For a later need, we point out the following direct consequence of the uniqueness
assertion in part (ii) of assumption (L6).
Lemma 4.10. Let two angles ∠M AB and ∠N AB be both convex or both non-
convex (see the picture below). Suppose they have one side RAB in common and
M and N are on the same side of the line LAB . Then the other sides RAM and
RAN coincide if and only if the angles have the same degree.
[Figure: the angles ∠MAB and ∠NAB with the common side RAB, where M and N lie on the same
side of the line LAB.]
As in the case of assumption (L5) on distance, (L6) by itself does not have
much substance and its full significance is revealed only in the context of the next
assumption that the basic isometries (to be defined in the next section) are distance-
preserving and degree-preserving (see assumption (L7), page 237).
We now give a more intuitive discussion of the degree of an angle. Let ∠AOB
be given. Here, ∠AOB will denote either the convex angle or the nonconvex angle,
depending on the situation. Let C be the unit circle centered at O, and we may
assume that both A and B lie on C. Let arc AB denote the intersection of C with
∠AOB. We will call arc AB an arc on C or, more precisely, the arc intercepted by
∠AOB on C. Arc AB is called a minor arc on C if it is the intersection of C with a
convex angle. It is called a major arc if it is the intersection of C with a nonconvex
angle. It is possible, using the distance function in the plane, to define the length
of an arc.18 An arc whose length is 1/360 of the length of C is called an arc of one
degree. Then we can subdivide a degree into n equal parts in the sense of equal
arc-length (where n is any nonzero whole number), thereby obtaining 1/n of a degree, etc. It
is exactly the same as the division of the chosen unit on a number line into unit
fractions in Section 1.1 except that in this case we have a "circular number line"
so that, once a point has been chosen to be 0, the number 360 coincides with 0
again. In any case, every arc on the unit circle will also have a length measured in
terms of degrees so that we can speak of an arc of 36.7 degrees. (This discussion
will be made completely precise in Section 1.5 of [Wu2020c]; see especially Lemma
1.12 therein.)
Still on an intuitive level, with ∠AOB and arc AB as above, the degree |∠AOB|
of ∠AOB in the sense of assumption (L6) on p. 188 is just the degree of the arc
AB that ∠AOB intercepts on the unit circle. Thus in the following picture, if the
length of this arc AB is x degrees, then |∠AOB| = x◦. (On the left, ∠AOB denotes
the convex angle and, on the right, the nonconvex angle.)
18 There is a subtle point in this definition which will be addressed in Chapter 4 of [Wu2020c]
when we discuss length and area. It has to do with the fact that we have yet to precisely define
what the "length of an arc" is. There is no fear of circular reasoning, however, because the
length of an arc can be defined independently and can be given right now if we do not mind the
interruption. Therefore, the concept of "length of an arc" in this discussion may be taken in a
naive sense without any fear of logical difficulties.
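[Figure: the unit circle around O with A and B on it; left, the arc of x◦ intercepted by
the convex angle ∠AOB; right, the arc of x◦ intercepted by the nonconvex angle ∠AOB.]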
Notice that the method of angle measurement we have just described is exactly
the principle used in the construction of the protractor.
Mathematical Aside: (1) With notation as in the preceding discussion, the de-
gree of an arbitrary angle, ∠AOB, is defined in advanced mathematics as follows.
The length (i.e., circumference) of the unit circle C being 2π, let δ = 2π/360. If
we think of the full angle at the center of the circle as 360◦ , then δ is the length
of the arc intercepted on the unit circle by a 1◦ angle. Now let the length of the
arc AB intercepted by ∠AOB on the unit circle be denoted by ||AB||. Then the
degree of ∠AOB is, by definition, the number ||AB||/δ. (2) As is well known,
there is another unit for measuring angles, called radian, that is more commonly
used in advanced mathematics. The reason for preferring radians to degrees will be
explained in Section 1.5 of the companion volume [Wu2020c], but for now, degree
will serve perfectly well as a unit for measuring angles.
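The definition in part (1) of the aside amounts to a single division, as the following Python sketch shows. The names `DELTA` and `degree_from_arc_length` are hypothetical, and the arc length is taken as a given number, since the precise definition of arc length is deferred to [Wu2020c].

```python
import math

DELTA = 2 * math.pi / 360  # length of the arc intercepted on the unit circle by a 1-degree angle

def degree_from_arc_length(arc_length):
    """Degree of an angle that intercepts an arc of the given length on the unit circle."""
    if not 0 <= arc_length <= 2 * math.pi:
        raise ValueError("an arc on the unit circle has length between 0 and 2*pi")
    return arc_length / DELTA

print(degree_from_arc_length(math.pi / 2))  # a quarter of the unit circle
print(degree_from_arc_length(math.pi))      # a semicircle, i.e., a straight angle
```

Since the arc length on the unit circle is also the radian measure of the angle, the same division converts radians to degrees.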
of these names for triangles, namely, a triangle is called a right triangle if one of
its angles is a right angle, an acute triangle if all of its angles are acute, and an
obtuse triangle if (at least) one of its angles is obtuse. (We will see in Section
6.5 of [Wu2020b] that a triangle cannot have more than one obtuse angle or more
than one right angle.)
Let two lines meet at O, and suppose one of the four angles, say ∠AOB as
shown, is a right angle.
[Figure: two lines meeting at O, one containing A and A′ and the other containing B and B′,
with the right angle ∠AOB marked.]
Then we claim that all the remaining angles are also right angles; i.e., |∠BOA′| =
|∠A′OB′| = |∠B′OA| = 90◦. This is because, by (iii) and (iv) of (L6), |∠BOA′| =
|∠AOA′| − |∠AOB| = 180◦ − 90◦ = 90◦. Similarly, the remaining two angles
are also 90◦. It follows that when two lines meet, one of the four angles so produced
is a right angle if and only if all four angles so produced are right angles.
By definition, two lines are perpendicular if one of the four angles at the point
[Figure: a line with the points A, O, and B marked on it.]
We thus have
Lemma 4.11. Let L be a line and O a point on L. Then there is a unique line
passing through O and perpendicular to L.
With the availability of measurements for both angles and line segments, we
can complete the list of standard definitions. First, if AB is a segment, then by
(L5)(iii), we may assume that LAB can be made into a number line so that the
segment AB is just [A, B], and dist(A, B) = B − A, where the A and B in "B − A"
are understood to be numbers. If C = (1/2)(A + B), then C ∈ LAB and it is easy to
check that
A < C < B and C − A = B − C = (1/2)(B − A).
[Figure: the number line LAB with the points O, A, C, and B marked in that order.]
From A < C < B, we get C ∈ [A, B], and from C − A = B − C, we get dist(A, C) =
dist(C, B), or in the language of length as defined on page 185, |AC| = |CB|.
This point C is called the midpoint of the segment AB; i.e., C ∈ AB and C
is equidistant from the endpoints A and B of AB. Then, analogous to the angle
bisector of an angle, the perpendicular bisector of a segment AB is the line
perpendicular to LAB and passing through the midpoint of AB. It follows from the
uniqueness of the line perpendicular to a line passing through a given point that
there is one and only one perpendicular bisector of a segment.
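In coordinates, the midpoint and the perpendicular bisector are easy to compute, and one can check that points of the perpendicular bisector are equidistant from A and B. The Python sketch below assumes the coordinate model of the plane with integer or fractional coordinates; the function names are hypothetical.

```python
from fractions import Fraction

def midpoint(a, b):
    """The midpoint C of segment ab: each coordinate of C is the average (1/2)(A + B)."""
    return (Fraction(a[0] + b[0], 2), Fraction(a[1] + b[1], 2))

def perpendicular_bisector(a, b):
    """Return (point, direction) describing the perpendicular bisector of segment ab."""
    if a == b:
        raise ValueError("a segment needs two distinct endpoints")
    dx, dy = b[0] - a[0], b[1] - a[1]
    # (-dy, dx) is perpendicular to (dx, dy) because their dot product is 0.
    return midpoint(a, b), (-dy, dx)

def dist2(p, q):
    """Squared distance, used so that all comparisons are exact."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

A, B = (0, 0), (4, 2)
C, direction = perpendicular_bisector(A, B)
X = (C[0] + direction[0], C[1] + direction[1])  # another point on the bisector
print(C, dist2(C, A) == dist2(C, B), dist2(X, A) == dist2(X, B))
```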
19 Recall that an angle is a region, so it makes sense to say a ray lies in ∠AOB.
We now introduce some common names for certain triangles and quadrilaterals
(see page 171 for the definitions).
An equilateral triangle is a triangle with three sides of the same length, and
an isosceles triangle is one with at least two sides of the same length. (Thus
by our definition, an equilateral triangle is isosceles.) A quadrilateral all of whose
angles are right angles is called a rectangle.20 A rectangle all of whose sides are of
the same length is called a square. Be aware that, at this point, we do not know
whether there is a square or not, or worse, whether there is a rectangle or not. (If it is
the case that the sum of (the degrees) of the four angles of the quadrilateral is 361◦ ,
then clearly no rectangle can exist, much less a square.) Now for a quadrilateral
ABCD, the sides AB and CD are formally defined to be opposite sides of the
quadrilateral, as are the sides BC and AD.
[Figure: a quadrilateral ABCD.]
A quadrilateral with at least one pair of parallel opposite sides21 is called a trape-
zoid. A quadrilateral with two pairs of parallel opposite sides is called a parallel-
ogram. A quadrilateral with four sides of equal length is called a rhombus. We
shall prove in Section 6.2 of [Wu2020b] that rhombi are parallelograms.
A debate of long standing in TSM is about whether one should define an isosce-
les triangle to have exactly two sides of equal length, a rectangle to be a quadrilateral
with four right angles but with at least two unequal adjacent sides, or a trapezoid
to be a quadrilateral with exactly one pair of parallel sides. This is not a productive
debate because there are valid mathematical reasons for the convention adopted in
the preceding paragraph.22 This is another reason why we should get rid of TSM
so that equilateral triangles are allowed to be special cases of isosceles triangles,
squares are allowed to be special cases of rectangles, etc.
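The inclusive convention is also the simpler one to state in code. As a small illustration, the hypothetical Python function below names a triangle from its three side lengths (assumed to come from an actual triangle); under the convention adopted here, an equilateral triangle receives both names.

```python
def triangle_names(a, b, c):
    """Names, in the inclusive convention of the text, for a triangle with side lengths a, b, c."""
    names = set()
    if a == b or b == c or a == c:
        names.add("isosceles")    # at least two sides of the same length
    if a == b == c:
        names.add("equilateral")  # so every equilateral triangle is also isosceles
    return names

print(triangle_names(3, 3, 3))
print(triangle_names(3, 3, 2))
print(triangle_names(3, 4, 5))
```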
Polygonal regions
In the above catalog of names for polygons, we all know that equilateral trian-
gles and squares are special; they are examples of regular polygons. A polygon is
by definition a regular polygon if all its sides have the same length, all its angles
(at the vertices) have the same degree, and it is inscribed in a circle; i.e., all its
vertices lie on a circle. There is an equivalent way of expressing the last condition
about being inscribed in a circle that turns out to be important for other reasons,
and it involves convexity. This is our next issue.
We already mentioned the fact that school mathematics conflates a closed disk
of radius r and center O (see p. 186 for the definition) with the circle of the same
20 Mathematical Aside: This is the correct definition of a rectangle because in the non-
Euclidean geometries, there can be no rectangles in this sense. See [Greenberg, page 250].
21 Strictly speaking, the correct statement is that "the lines containing the opposite sides are
sides, the present convention immediately implies that the area of a square is the square of one
side, but if we do not allow a square to be a rectangle, then we must prove anew the area formula
for a square.
radius and center. Thus when school mathematics talks about "the area of the
circle of radius r around O", what it means is actually "the area of the closed disk
of radius r around O". There is little hope of forcing a change of terminology in
school mathematics at this late date, so just grin and bear it. Nevertheless, we
want to create a mathematical framework in which the difference between a circle
and a disk of the same radius and center can be carefully analyzed when the need
arises (as indeed it will). In everyday language, we normally refer to a circle as the
"boundary" of the closed disk with the same radius and center. This way of talking
about a circle and its associated disk is both unambiguous and convenient, so there
is no reason not to adopt it in mathematics provided we can make precise sense of
"boundary". This we now do.
Let S be a subset of the plane Π. A point B is a boundary point of S in
Π if every disk23 centered at B of positive radius—no matter how small—contains
a point in S and a point not in S. Intuitively, this means that a boundary point
of S is one that can be approached arbitrarily closely by points in S and also by
points outside S. In particular, a boundary point of S is never a point "completely
inside" S or "completely outside" S. So this conforms to our naive conception
of a "boundary" point of S. By definition, the boundary of S is the set of all
the boundary points of S. We leave as an exercise (Exercise 4 on page 198) the
verification that the circle of radius r (r > 0) about O is the boundary of the closed
disk of radius r about O as well as the boundary of the open disk of radius r about
O (see p. 186 for the definitions of open disk and closed disk).
Note that the concept of the boundary of a set S in Π is dependent on the
fact that S is being considered as a subset of the plane Π. Thus if S is a segment
AB in the plane, then the boundary of AB, as a subset of the plane, is the whole
segment AB itself. This is in contrast with the fact that, when we talk about "the
boundary of the segment AB in the line LAB ", we mean the two endpoints A and B.
Activity. Verify the last claim that the boundary of a segment AB in the
plane is AB itself.
Denote the open disk of radius r with center O by D and the circle with the
same center and the same radius by C. As noted, the boundary of D is C. There is
another set with C as its boundary: if E denotes the set of all the points P so that
|P O| > r, then E too has C as its boundary (see Exercise 4 on page 198 again).
We will call E the exterior of C. To distinguish between the two sets D and E,
we introduce the following concepts: a set S in the plane is said to be bounded
if it is contained in some closed disk of radius R; otherwise it is unbounded. For
example, a circle of radius r is bounded, and so is an open disk or a closed disk of
radius r, but any line, any ray, or any half-plane of a line is unbounded. We can
also understand boundedness from a slightly different point of view, as follows.
Lemma 4.12. A set S in the plane is bounded if and only if there is a point O
in the plane and a positive number R so that the distance of every point in S from
O is ≤ R.
Proof. First let S be bounded. So it is contained in a closed disk D of radius R. If
the center of D is O , then the distance of each point of S from O is ≤ R because
23 It does not matter whether the disk is closed or open.
each point of S is a point of D. Conversely, if a set S has the property that for
some fixed positive number R, every point of S is of distance ≤ R from a fixed
point O in the plane, then S is bounded because S is contained in the closed disk
of radius R centered at O . The proof is complete.
With the same notation, we see that the open disk D is bounded but the exterior
E of the circle C is not. It is clear that the plane is the disjoint union of D, C, and
E (see page 175 for the definition of disjoint union). There is also a property about
D and E that is intuitively obvious:
Any segment joining a point in the open disk D to a point in the
exterior E of the circle must intersect the circle C.
Like many "obvious" statements in geometry, a proof of this assertion involves
subtle concepts about real numbers. (Compare the intermediate value theorem in
Section 6.2 of [Wu2020c].) For this reason, we will assume this fact here without
proof.
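Although the assertion is assumed here without proof, it can be illustrated in the coordinate model of the plane, where the point of intersection is found by solving a quadratic equation. The Python sketch below is only such an illustration under that assumption: it relies on the existence of square roots of real numbers, which is exactly the kind of fact about real numbers alluded to above, and the function name `crossing_point` is hypothetical.

```python
import math

def crossing_point(p, q, o, r):
    """Point where the segment from p (in the open disk) to q (in the exterior) meets the circle."""
    px, py = p[0] - o[0], p[1] - o[1]
    dx, dy = q[0] - p[0], q[1] - p[1]
    # The squared distance from o to p + t*(q - p), minus r^2, is f(t) = a*t^2 + b*t + c.
    a = dx * dx + dy * dy
    b = 2 * (px * dx + py * dy)
    c = px * px + py * py - r * r
    # f(0) = c < 0 says p is in the open disk; f(1) = a + b + c > 0 says q is in the exterior.
    if not (c < 0 and a + b + c > 0):
        raise ValueError("p must be in the open disk and q in the exterior")
    # Since c < 0 < a, f has exactly one positive root, and it lies strictly between 0 and 1.
    t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return (p[0] + t * dx, p[1] + t * dy)

x, y = crossing_point((0.5, 0.0), (0.5, 3.0), (0.0, 0.0), 1.0)
print(math.isclose(math.hypot(x, y), 1.0))  # the crossing point is on the unit circle
```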
A set in the plane that contains all of its boundary is called a closed set. In
terms of the preceding notation, the closed disk with center O and radius r, to be
denoted by D̄, is a closed set. Observe that D̄ = D ∪ C, where we recall that the
symbol "∪" stands for union. In this volume, we will refer to the closed disk D̄
with radius r and center O as the closed set inside the circle C. Sometimes we
say more simply that D̄ is the inside of C.
We have discussed the situation of the circle in such detail because it sets up a
model for the discussion of polygons (see page 171 for the definition). The linguistic
abuse of the word "circle" described on page 194 in fact spills over to "triangle",
"quadrilateral", and in general, "polygon". For example, a triangle is by definition
the union of the three segments consisting of the three sides, but when we speak
of, e.g., "the area of the triangle", we certainly do not mean "the area of the three
segments" but, rather, "the area inside those three segments", where the meaning
of inside is usually understood in an intuitive and imprecise sense.
We now try to shed some light on the word "inside". First, let us draw a
parallel with the situation of a circle by invoking a theorem that we will not prove
in this volume. Two new definitions will be needed for the statement of the theorem.
A polygonal segment is a finite collection of segments A1 A2 , A2 A3 , A3 A4 , . . . ,
An−2 An−1 , An−1 An , with the understanding that these segments could be collinear
and that there may be intersections among them. Then a region R in the plane
is said to be connected if any two points in R can be joined by a polygonal
segment that lies completely in R. It is easy to see that the open disk D with
center O and radius r and the exterior E of the circle with center O and radius r
are disjoint connected sets (see Exercise 5 on page 198). Also, recall the definition
of the complement of a subset S in the plane as all the points in the plane not lying
in S (see the footnote on page 181).
This theorem should remind you of the plane separation assumption (L4) on
page 176.
Not to belabor the point, but it should be obvious that if we replace P, B, and
E in the theorem by the sets C, D, and E (that arose in the preceding discussion
of the circle C), respectively, then the theorem (except for the last part) is just a
summary of what we found out about the circle C. From now on, we will denote the
bounded set in the theorem exclusively by B. Then we call the union of B and P
the inside of P or the region enclosed by P; the region E in Theorem 4.13 will
be called the exterior of P.24 It follows from Theorem 4.13 that both the inside
of a polygon and the union of a polygon with its exterior are closed sets, and both
have P as boundary.
We will not give a proof of Theorem 4.13, to avoid getting sidetracked because
it is long, and it involves technical arguments that cannot be said to be basic to the
K–12 curriculum. It may be mentioned that the standard statement of Theorem
4.13 does not include the last part about when "B = B and E = E". It is
included here because of our later needs and because it is an easy consequence of
the first part of the theorem. In any case, an essentially complete and readable
proof of Theorem 4.13 is given on pp. 267–269 of the classic What Is Mathematics?
([Courant-Robbins]).25
One can easily believe this theorem by looking at a few pictures; the shaded
set in each of the following is the inside of the polygon in question.
From Theorem 4.13, a polygon is the boundary of the inside of the polygon,
which is a closed bounded set. This motivates the following definition.
open set. This is the reason why we use the term "inside" rather than "interior".
25 This theorem is the special case of the famous Jordan Curve Theorem when the curve in
question is a polygon. One can find an elementary proof of this theorem in [Henle]. Inciden-
tally, the Courant-Robbins volume is highly recommended as a general introduction to advanced
mathematics.
exception to define boundary of a set and closed set because they are absolutely
essential for the considerations of area in Chapter 4 of [Wu2020c]. However, we
suggest that you do not make heavy weather of these two definitions in the school
classroom because school students have far more pressing concerns than learning
these subtle definitions. The everyday meaning of "boundary" is good enough most
of the time in the school setting. Therefore, this definition is for your own concep-
tual clarification as a teacher: to the extent possible, these three volumes will try
to convince you at every step of the way that there is no room for ambiguity in
mathematics. End of Pedagogical Comments.
Exercises 4.2.
(1) Imagine the hands of a clock to be idealized rays emanating from the
center of the clock. (a) What is the angle between the hour and minute
hands precisely at 8:20 am?27 (b) At what time between 8 am and 9 am
will the hour and minute hands coincide? (c) Is there any time—other
than 12 am and 12 pm—that the hour, minute, and second hands all
coincide?28
(2) Prove the characterization, stated on page 181, of Γ , the nonconvex angle
determined by two distinct rays ROA and ROB with a common vertex O.
(3) Prove on the basis of (L6) that every angle has one and only one angle
bisector.
(4) To do this exercise, use any theorem you know from high school geometry,
but be sure to state precisely what you are using. (i) Prove that the
boundary of the open (or closed) disk of radius r around a point O is the
circle of radius r around O. (ii) Prove that the exterior of a circle C (see
page 194) has C as its boundary. (iii) Prove that the exterior of a circle is
never convex.29
(5) Prove that if C is a given circle with center O and radius r, then its open
disk, closed disk, and exterior are all connected. (For the connectedness
of the exterior, use any theorem you know from high school geometry, but
be sure to state precisely what you are using.)
(6) A triangular region in the plane is by definition the intersection of the
three angles of a triangle. (a) Show that a triangular region is always
convex. (b) Let T be a triangular region in the plane. Show—without
invoking Theorem 4.13 on page 195—that if P ∈ T and Q is in the exterior
of T , then the segment P Q intersects the boundary of T .
(7) Given a circle C and a point P on C, a line LP is said to be a tangent to
C at P if LP intersects C exactly at P ; i.e., LP ∩ C = {P }. Assume that
every point of a circle has a tangent (a fact we will prove in Section 6.8 of
[Wu2020b]) and that the circle always lies entirely in a closed half-plane
of each tangent. Then prove that any disk, open or closed, is convex.
(Caution: This exercise is not as easy as it seems; try to do everything
according to the definitions.)
6.8 of [Wu2020b].
4.3. TRANSFORMATIONS OF THE PLANE 199
(8) Assume that for any given subset S of the plane, if a segment joins a point
in S and a point not in S, then the segment contains a boundary point of
S.30 Then prove that any closed bounded set in the plane with a circle C
as its boundary is the closed disk with the same radius and center.
Why transformations
Given two segments AB and CD, how can we compare which one is longer
without first getting their individual lengths? For example, suppose we have a
rectangle ABCD. Do the opposite sides AB and CD have the same length?
[Figure: the rectangle ABCD, with A and D on top and B and C on the bottom.]
Similarly, given two angles, how can we compare which one is bigger without
first getting their individual degrees? For example, if two lines L and L′ are parallel
and they are intersected by another line, how can we tell if the angles ∠a and ∠b
as shown have the same degree?
[Figure: two parallel lines L and L′ intersected by a third line, with the angles a and b marked.]
30 Mathematical Aside: This simple fact requires the least upper bound axiom for its proof;
These questions, while seemingly silly when the figures are drawn on a piece of
paper, take on a new meaning if the sides of the rectangle ABCD are a few miles
apart or if the lines L and L′ in the case of angles are also very far apart. We are
therefore confronted with a real-world situation of having to find out whether two
geometric figures (two segments, two angles, or two triangles) in different parts of
the plane are "the same" in some sense (e.g., same length, same degree, etc.).
The traditional way of dealing with this problem in Euclidean geometry is to
write down a set of axioms which abstractly guarantee that two triangles are "the
same" (i.e., congruent). This is how it is usually done in TSM, and the drawback
of such an approach is that, in a mathematical environment where proofs and
reasoning are scarce or nonexistent, to introduce students to proofs by the opaque
formalism of axioms is to invite discontent and also to ensure nonlearning. As of the
last decade, the teaching of geometry in many high schools still vacillated between
teaching proofs by rote via axioms from the beginning and teaching no proofs at
all.31
We propose a third approach, one that is more direct and more tangible and
that makes use of three standard "moves"32 to bring one figure on top of another
in order to check whether two geometric figures are congruent. Even more impor-
tantly, we base proofs of theorems directly on these "moves". In this way, congru-
ence ceases to be mysterious and abstract; it becomes a tactile concept which can
be realized concretely via these standard "moves". The key issue then is what it
means to "move" things around in a plane, with the understanding that the lengths
of segments and degrees of angles remain unchanged in the process. Since "moving
things around in a plane" is exactly where the concept of a "transformation" comes
in, we first define transformations.
31 One can get a glimpse of the general situation from the book review [Wu2004a].
32 To be called basic isometries (page 217).
of the distance function (see page 185), an equivalent definition is therefore that
an isometry F is a transformation that preserves length; i.e., the length of any
segment PQ, where P and Q are points in the plane, is equal to the length of the
segment P′Q′, where P′ = F(P) and Q′ = F(Q). Thus, if a picture of P, Q,
P′, and Q′ looks like the following, then the transformation F is not an isometry
because the length of PQ is visibly longer than the length of P′Q′.
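[Figure: the points P and Q and their images P′ and Q′, with the segments PQ and P′Q′ of visibly different lengths.]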
On the other hand, the identity transformation is an isometry. The next subsection
will introduce a class of isometries called rotations.
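As a small numerical illustration of the definition of an isometry, the following Python sketch tests whether a transformation preserves the distance between a few sample pairs of points. It assumes that coordinates have been introduced in the plane. The names dist, preserves_distance, identity, and stretch are hypothetical and do not come from the text; the stretch (x, y) ↦ (2x, y) is borrowed from Exercise 3(b) of Exercises 4.3 only as an example of a transformation that is not an isometry.

```python
import math

def dist(p, q):
    """Euclidean distance between points p and q, given as (x, y) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def preserves_distance(F, pairs, tol=1e-12):
    """Return True if |F(P)F(Q)| equals |PQ| for every sampled pair (P, Q).

    Passing this check on finitely many pairs is only evidence, not a proof,
    that F is an isometry; failing it on a single pair shows F is not one.
    """
    return all(abs(dist(F(p), F(q)) - dist(p, q)) <= tol for p, q in pairs)

identity = lambda p: p                # the identity transformation
stretch = lambda p: (2 * p[0], p[1])  # (x, y) -> (2x, y), an assumed example

pairs = [((0, 0), (1, 0)), ((0, 0), (0, 1)), ((1, 2), (4, 6))]
print(preserves_distance(identity, pairs))  # True
print(preserves_distance(stretch, pairs))   # False
```

The stretch already fails on the first pair, since it sends a segment of length 1 to a segment of length 2.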
Rotations
If θ ≥ 0, ρ_θ(P) is the point Q on C so that Q is obtained from P by turning
θ degrees in the counterclockwise direction along C (in other words,
|∠QOP| = θ°).
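[Figure: the figure S, the point Q, and the angle ∠AQB, together with their images S′, Q′, and ∠A′Q′B′ after a 32-degree counterclockwise rotation around O.]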
Now use a pointed object (e.g., the needle of a compass) to pin the transparency to
the paper at the point O. Holding the paper fixed, rotate the transparency around
O, counterclockwise by 32 degrees33 and stop. For the moment, ignore the angles
∠AQB and ∠A′Q′B′ in the above picture and concentrate on S and Q. We will
denote the red geometric figure in this new position by S′. S′ is exactly where ρ has
moved S. Similarly, we denote the red point Q in this new position by Q′; this point
Q′ is where the rotation ρ has moved Q. Notice that ρ does not move O, the center
of rotation. Needless to say, there is nothing special about the number 32; one
should do this Activity with angles of any degree, clockwise or counterclockwise.
This Activity suggests that the rotation ρ is an isometry. Indeed, if A and B
are two points in the plane Π and ρ moves them to A′ and B′, respectively (see the
preceding picture), then the Activity tells us how to locate A′ and B′; namely, copy
O, A, and B in red on a piece of transparency, and then rotate the transparency 32
degrees around O. The positions of the red A and red B on the transparency are
the locations of A′ and B′, respectively. Since the distance from the red A to the
red B on the transparency is exactly the distance from A to B in Π, ρ is
distance-preserving; i.e., ρ is an isometry. The Activity also suggests that ρ preserves the
degrees of angles in the sense that, in the notation above, the angle ∠AQB,
for example, is moved by ρ to ∠A′Q′B′, and since the red angle ∠A′Q′B′ is a copy
of ∠AQB on the transparency, we have |∠A′Q′B′| = |∠AQB|.
Everything we have said thus far has nothing in particular to do with "32 de-
grees", so what the Activity suggests is that any rotation is an isometry that also
preserves degrees of angles. However, from the point of view of our mathematical
development, the fact that a rotation is an isometry is something that cannot be
proved but must be assumed. See page 217 and page 237.
somewhere on the paper (not the transparency) (with the help of a protractor) so that the ray
ROA is in the counterclockwise direction of the ray ROB . Copy ∠AOB on the transparency. Then
rotate the transparency counterclockwise around O until the ray ROB on the transparency is on
top of the ray ROA on the paper.
e.g., "a rotation of (−36) degrees" or "a (−36)-degree rotation" signifies that we
have to rotate 36 degrees clockwise.
(2) It is a curious fact—but sometimes useful nevertheless—that since we allow
ourselves to use both clockwise and counterclockwise rotations, it suffices to use
angles of θ degrees so that |θ| ≤ 180 in any discussion of rotations. Indeed, a rota-
tion of 235 degrees is equal to a rotation of −125 degrees because 360 − 125 = 235,
and a rotation of −235 degrees is equal to a rotation of 125 degrees. In general, if
180 < θ ≤ 360, then a rotation of θ degrees is equal to a rotation of −(360 − θ)
degrees, and if −360 ≤ −θ < −180, then a rotation of −θ degrees is equal to a rota-
tion of (360 − θ) degrees. Of course, if 180 < θ ≤ 360, we have | ± (360 − θ)| < 180.
We will put this to use later.
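The reduction to |θ| ≤ 180 described in remark (2) can be sketched in Python as follows. The function name normalize_degrees is hypothetical; it only restates the two rules above and assumes −360 ≤ θ ≤ 360, the range in which rotations have been defined so far.

```python
def normalize_degrees(theta):
    """Return the degree t with |t| <= 180 whose rotation equals that of theta.

    Positive degrees are counterclockwise and negative degrees are clockwise,
    following the convention of the text.
    """
    if not -360 <= theta <= 360:
        raise ValueError("theta must satisfy -360 <= theta <= 360")
    if theta > 180:
        return -(360 - theta)  # e.g. 235 -> -125
    if theta < -180:
        return 360 + theta     # e.g. -235 -> 125
    return theta

print(normalize_degrees(235))   # -125
print(normalize_degrees(-235))  # 125
print(normalize_degrees(90))    # 90
```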
To make sense of these new concepts, we have to look at more than the constant
transformation and the identity transformation. Let us consider the rotations de-
fined above. We claim that a rotation is always a bijection. While this is intuitively
obvious, we will go through the argument carefully. Let us fix a rotation ρ around
some O of θ degrees. Since a rotation of 0 degrees is the identity transformation,
we may assume 0 < |θ| ≤ 360. By a remark on page 204, we may in fact assume
0 < |θ| ≤ 180. Because the case of θ = 180 is obvious, we will henceforth assume
0 < |θ| < 180.
ρ is injective. Let P1 and P2 be two distinct points, and we must show ρ(P1) ≠
ρ(P2). If one of them is equal to O, say P1 = O, then ρ(P1) = O by the definition
of a rotation, and, because P2 ≠ O, ρ(P2) ≠ O, also by the definition of a rotation.
Therefore ρ(P1) ≠ ρ(P2) in this case. We may therefore assume that both P1 and
P2 are different from O. If |OP1| ≠ |OP2|, then there is nothing to prove because, if
P1′ and P2′ denote ρ(P1) and ρ(P2), respectively, then by the definition of a rotation,
|OP1′| = |OP1| ≠ |OP2| = |OP2′|. Therefore |OP1′| ≠ |OP2′| so that P1′ ≠ P2′; i.e.,
ρ(P1) ≠ ρ(P2). So let us suppose |OP1| = |OP2|. Then both P1 and P2 lie on some
circle C around O and ρ maps them to P1′ and P2′, respectively, as shown:
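[Figure: the circle C around O, with the points P1 and P2 on C separated by the angle φ, and their images P1′ and P2′ under the counterclockwise rotation of θ degrees.]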
For definiteness, we may assume θ > 0 because the argument for the case of
a negative θ is similar. Thus, we are looking at a counterclockwise rotation of
θ degrees, 0 < θ < 180. Let ∠P1OP2 denote the convex angle as usual, and let
|∠P1OP2| = φ°, where 0 < φ ≤ 180 (see Lemma 4.9 on page 188). By switching the
points P1 and P2 if necessary, we may assume that the counterclockwise rotation of
φ° maps P1 to P2, as shown in the above picture. Therefore we may characterize P2′
(which is ρ(P2)) as the point obtained from P1, first by a counterclockwise rotation
of φ degrees (which moves P1 to P2), followed by a counterclockwise rotation of θ
degrees (which moves P2 to P2′). By (a) on page 201, P2′ is the point obtained from
P1 by a counterclockwise rotation of (φ + θ) degrees. But P1′ is by definition the
point obtained from P1 by a counterclockwise rotation of θ degrees. Since φ > 0,
(θ + φ) ≠ θ and therefore P1′ ≠ P2′, by (b) on page 201. The proof of the injectivity
of ρ is complete.
ρ is surjective. Let a point Q′ be given. We must find a point Q so that
ρ(Q) = Q′. If Q′ = O, just let Q = O. So let Q′ ≠ O, and let C be the circle with
center O and radius |OQ′|. Now rotate Q′ by θ degrees along C in the clockwise
direction to get to a point Q, as shown below. By definition of ρ, we have ρ(Q) = Q′.
Hence ρ is surjective. This proves that ρ is bijective.
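[Figure: the circle C around O with the points Q′ and Q on it; Q is obtained from Q′ by a clockwise rotation of θ degrees.]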
Now again assume that coordinates have been introduced in Π. We claim that
the following transformation G of Π, defined by G(x, y) = (arctan(x), y), is injective
but not surjective.
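Before going through the argument, one can get numerical evidence for the claim with a short Python sketch. It uses math.atan for arctan; the sample values of x are arbitrary choices, and the variable names are hypothetical. A finite check of this kind is of course not a proof.

```python
import math

def G(x, y):
    """The transformation G(x, y) = (arctan(x), y) from the text."""
    return (math.atan(x), y)

# Sample first coordinates; the specific values are arbitrary choices.
xs = [-1000.0, -1.0, 0.0, 1.0, 1000.0]
firsts = [G(x, 0.0)[0] for x in xs]

# arctan is strictly increasing, so distinct x give distinct first coordinates.
strictly_increasing = all(a < b for a, b in zip(firsts, firsts[1:]))

# Every first coordinate stays strictly between -pi/2 and pi/2, so a point
# such as (2, 0) is not of the form G(x, y).
inside_strip = all(-math.pi / 2 < u < math.pi / 2 for u in firsts)

print(strictly_increasing, inside_strip)  # True True
```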
To show G is injective, we must show that if (x1, y1) ≠ (x2, y2), then G(x1, y1)
≠ G(x2, y2). To this end, observe that (x1, y1) ≠ (x2, y2) means x1 ≠ x2 or
y1 ≠ y2. First suppose x1 ≠ x2; then either x1 < x2 or x1 > x2. If x1 < x2, then
Inverse transformations
and
(ρA ◦ ρB )(B), (ρB ◦ ρA )(B).
We want to show that the two points are different in each case. To this end, we
have to look at the following picture, where lines perpendicular to LAB through A
and B have been drawn. Let points C, N , P , and D be chosen on these lines, as
shown, so that LCN and LP D are perpendicular to LAB at A and B, respectively,
and |AB| = |AC| = |AN | = |BD| = |BP |. In addition, let E be a point on LAB so
that A ∗ B ∗ E and |AB| = |BE|.
37 Equation (4.1) should be counterbalanced by the fact that the composition of two rotations
with distinct centers is in general not a rotation. See Exercise 11(iii) and (iv) on page 372.
38 The following discussion of this example will assume some geometric facts that we have
not proved. There is no harm in doing this because this example is a side remark rather than an
integral part of our logical development.
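[Figure: the points A, B, E on the line LAB; C and N lie on the perpendicular to LAB through A, and P and D lie on the perpendicular through B, with C and P on one side of LAB and N and D on the other.]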
We want to find out what (ρA ◦ ρB)(A) is. By definition, this is the point ρA(ρB(A)).
So we have to first find out what the point ρB(A) is. This is the point obtained by
rotating A 90 degrees counterclockwise around B. First of all, ρB rotates the
ray RBA to the ray RBD. So ρB(A) must lie on the ray RBD. But we are assuming
that |AB| = |DB|, so by (L5)(ii) on page 184, ρB moves A to D, and therefore
ρB(A) = D. Hence in order to find out what (ρA ◦ ρB)(A) is, we now must find out
what ρA(D) is.
Notice that we are looking strictly at the effect of the transformation ρA on the
point D, and we ignore what ρB is or the fact that D = ρB(A). In other words, in
finding out about the effect of a composition of two transformations, say ϕ ◦ ψ, we
first observe what the first transformation ψ does to a point P, say ψ(P) = Q; once
that is done, we forget about ψ and concentrate entirely on Q to find out what ϕ
does to the point Q. Please keep this in mind.
To return to our task at hand, we have to find out what ρA(D) is. So what does
ρA do? It turns every point 90 degrees counterclockwise around A. For example,
ρA(B) lies on the ray RAC, but since |AB| = |AC| by construction and ρA(AB) has
the same length as AB, ρA(B) = C by (L6)(iv) on page 188. Similarly, ρA(N) = B.
By the same reasoning, ρA(D) lies on the ray perpendicular to LAD and lying
in the same closed half-plane of LAD as P. Now, by elementary geometry (see,
e.g., Section 6.2 of [Wu2020b]), we know that |∠PAB| = |∠BAD| = 45°, so that
|∠PAD| = |∠PAB| + |∠BAD| = 90° (see (L6)(iv) on page 188), and that |AD| =
|AP| because AD and AP are the diagonals of the squares ANDB and ABPC,
respectively. (For our need here, it suffices to verify both facts experimentally.)
Therefore, ρA(D) = P for reasons similar to the above. Consequently,
(ρA ◦ ρB)(A) = ρA(D) = P.
On the other hand, ρA does not move A and ρB(A) = D, so that
(ρB ◦ ρA)(A) = ρB(A) = D.
Since P ≠ D, we have (ρA ◦ ρB)(A) ≠ (ρB ◦ ρA)(A). The verifications of
(ρA ◦ ρB)(B) = C,
(ρB ◦ ρA)(B) = N
are similar and will be left as an exercise (Exercise 8 on page 216). In any case, we
also have
(4.4) (ρA ◦ ρB)(B) ≠ (ρB ◦ ρA)(B).
Conclusion: Given two transformations F and G of the plane, it is in general
false that F ◦ G = G ◦ F .
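The example of the two 90-degree rotations ρA and ρB can be checked with exact integer arithmetic in a short Python sketch. The coordinates are an assumption made only for this illustration: A is placed at (0, 0) and B at (1, 0), so that C = (0, 1), N = (0, −1), P = (1, 1), and D = (1, −1). The helper names rotate90 and compose are hypothetical.

```python
def rotate90(center):
    """Return the 90-degree counterclockwise rotation around `center`."""
    cx, cy = center

    def rho(p):
        px, py = p
        # (dx, dy) -> (-dy, dx) is a quarter turn counterclockwise.
        return (cx - (py - cy), cy + (px - cx))

    return rho

def compose(F, G):
    """Return F o G, i.e. the transformation that applies G first, then F."""
    return lambda p: F(G(p))

# Assumed coordinates for the points in the example.
A, B = (0, 0), (1, 0)
C, N, P, D = (0, 1), (0, -1), (1, 1), (1, -1)
rho_A, rho_B = rotate90(A), rotate90(B)

print(compose(rho_A, rho_B)(A))  # (1, 1), which is P
print(compose(rho_B, rho_A)(A))  # (1, -1), which is D
print(compose(rho_A, rho_B)(B))  # (0, 1), which is C
print(compose(rho_B, rho_A)(B))  # (0, -1), which is N
```

In both cases the two compositions send the same point to different points, in agreement with the conclusion above.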
With these preliminaries out of the way, we now come to the main point.
Given a transformation F , suppose there is a transformation G so that both F ◦ G
and G ◦ F are equal to the identity transformation I on the plane. Then we
say that G is an inverse transformation of F (and of course also that F is an
inverse transformation of G). Often, we simply say F is an inverse of G.
Again, referring to rotations, let ρ be the rotation of degree θ around O, where
−360 ≤ θ ≤ 360, and let ρ′ be the rotation of degree −θ around the same point O;
then it can be immediately verified by using the definition of a rotation that
(4.5) ρ ◦ ρ′ = ρ′ ◦ ρ = I
so that ρ′ is an inverse transformation of ρ.
The following theorem characterizes transformations which have an inverse
transformation.
Proof. In an exercise (Exercise 5 on page 216), you will prove (ii). We can prove
(i) very simply by use of a standard argument, one that deserves to be learned.
Let G be an inverse of a given transformation F. Then F is injective because
if F(P1) = F(P2) for two points P1 and P2, then also G(F(P1)) = G(F(P2)) and
therefore P1 = P2 because G ◦ F = I. Thus if P1 ≠ P2, then F(P1) ≠ F(P2). Also, F
is surjective because given Q ∈ Π, if we let P = G(Q), then F(P) = F(G(Q)) = Q,
because F ◦ G = I. The proof of the theorem is complete.
subgroups, as we will point out in due course, e.g., pp. 235, 240, and 286. (Note
that while it is clear that we are here talking about bijections of the plane, we have
purposely omitted any reference to the plane because this discussion is valid for the
bijections of any set.)
Appendix
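[Figure: the line LOP and its two half-planes H⁺ and H⁻, with the point Q in H⁺ and the point Q′ in H⁻, each making an angle of x° with the ray ROP.]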
Let us denote the half-plane of LOP in which Q lies by H⁺ and denote the
half-plane of LOP in which Q′ lies by H⁻. According to Lemma 4.10 on page 190,
the point Q in H⁺ is unique, and the point Q′ in H⁻ is also unique. If it is
counterclockwise rotation that we want, then intuitively, we would choose Q in H⁺
and define ρ(P) = Q, but if we want instead the x° clockwise rotation of P, then
we would take Q′ in H⁻. Thus for the purpose of defining ρ(P), we simply choose
the half-plane H⁺ of LOP and define ρ(P) to be the unique point Q in H⁺ so that
Q ∈ H⁺, |OP| = |OQ|, and |∠QOP| = x°. The definition of counterclockwise
rotation therefore boils down to the "consistent" choice of a half-plane of a given
line passing through O.
Perhaps we should point out that, just as we have used "left" and "right"
regarding the number line without any formal definition, we will also use "up" and
"down" in the rest of this appendix without a formal definition. The fact is that
we could define all these concepts if we must, but given that this volume is already
overloaded with (uninteresting) technicalities, we have chosen not to give these
formal definitions.
We can now give the formal definition. Fix a point O in the plane. Let a point
P be given, P distinct from O. First, we are going to single out a specific half-plane
HP of the line LOP , in the following way. Let ∠A1 OA2 be a right angle with vertex
at O so that the side ROA1 is horizontal and right-pointing and the side ROA2 is
vertical and upward-pointing. By (L5)(ii), we may assume without loss of generality
that |A1 O| = |A2 O|. Similarly, on the line LOA1 , let a point A3 be chosen so that
A1 ∗ O ∗ A3 and |OA1 | = |OA3 |, and on the line LOA2 let a point A4 be chosen so
that A2 ∗ O ∗ A4 and |OA2 | = |OA4 |, as shown:39
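[Figure: the points A1, A2, A3, A4 around O, with A1 to the right of O, A2 above, A3 to the left, and A4 below; also shown are a point P and the point on the line LOP opposite to P across O.]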
For the definition of HP , we first dispose of four special cases:
(1) If P lies on the ray ROA1 , then HP is the half-plane of LOP
that contains A2 .
(2) If P lies on the ray ROA2 , then HP is the half-plane of LOP
that contains A3 .
(3) If P lies on the ray ROA3 , then HP is the half-plane of LOP
that contains A4 .
(4) If P lies on the ray ROA4 , then HP is the half-plane of LOP
that contains A1 .
The HP in each of these four cases is represented as the shaded region in the
following:
Next, assume that P lies on neither the horizontal line LA1 A3 nor the vertical
line LA2 A4 . If P lies in ∠A3 OA4 (let us say), then the two points A3 and A4 will
lie on different sides of the line LOP by the crossbar axiom on page 250.40 The
following definition of HP makes use of this fact:
(i) If P lies in ∠A1 OA2 , then HP is the half-plane of LOP that
contains A2 .
(ii) If P lies in ∠A2 OA3 , then HP is the half-plane of LOP that
contains A3 .
(iii) If P lies in ∠A3 OA4 , then HP is the half-plane of LOP that
contains A4 .
(iv) If P lies in ∠A4 OA1 , then HP is the half-plane of LOP that
contains A1 .
The HP in each of these four cases is represented as the shaded region in the
following:
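[Figure: four panels, one for each of the four cases, showing the point P and the shaded half-plane HP relative to A1, A2, A3, A4, and O.]
[Figure: a point P, the shaded half-plane HP, and the point Q in HP at an angle of x° from the ray ROP.]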
It is now easy to define the x-degree counterclockwise rotation around O
when 0 ≤ x ≤ 360. First, we do it intuitively. If we have to rotate a point P ≠ O
counterclockwise through (let us say) 235 degrees, we can stop the rotation after
180 degrees and then resume the counterclockwise rotation for another 55 degrees
(235 = 180 + 55). The advantage of doing this is that after 180 degrees of rotation,
we know exactly where P is, namely, the point P* which lies on the line LOP so
that O is the midpoint of the segment PP* (see the picture on p. 213). Therefore
to define the 235-degree counterclockwise rotation of P, all we need to do is carry
out the 55-degree counterclockwise rotation of P*. But the latter is something we
already know how to do.
Formally, suppose a number x is given so that 180 < x ≤ 360. Write x as
x = 180 + x′, where 0 < x′ ≤ 180. Then the x-degree counterclockwise rotation
It is now clear how one should go about defining the x-degree clockwise
rotation around O for x satisfying 0 ≤ x ≤ 180: for each point P ≠ O, we
specify the preferred half-plane of LOP to be the opposite half-plane of HP. Let
us denote this half-plane by HP⁻. Then for each x satisfying 0 ≤ x ≤ 180 and for
each P ≠ O, the x-degree clockwise rotation of P is by definition the point P′
so that |OP| = |OP′|, |∠POP′| = x°, and P′ lies in the specified half-plane HP⁻.
The x-degree clockwise rotation around O, where 180 < x ≤ 360, is then
defined as the x′-degree clockwise rotation of P* (the 180-degree rotated image of P
as above), where x = 180 + x′ and 0 < x′ ≤ 180. Altogether, we have defined the
x-degree clockwise rotation of any point P in the plane and for any x satisfying
0 ≤ x ≤ 360.
It remains to define the x-degree counterclockwise rotations around an
arbitrary point O′ of the plane, where 0 ≤ x ≤ 360. Let T be the translation
along the vector from O to O′ (page 234). For each point P′, let P be the point in the
plane so that T(P) = P′, and let Q be the x-degree counterclockwise rotation of P
around O. Then, by definition, the x-degree counterclockwise rotation of P′ around
O′ is the point T(Q). The x-degree clockwise rotation is defined similarly.
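The last step, rotating around an arbitrary center by translating back to O, rotating, and translating forward again, can be sketched in Python. This is only an illustration under stated assumptions: coordinates are introduced with O at the origin, the rotation around O is computed with the usual cosine and sine formula rather than by the half-plane construction of this appendix, and the names rotate_about_origin, rotate_about, and close are hypothetical.

```python
import math

def rotate_about_origin(p, x_deg):
    """x-degree counterclockwise rotation of p around the origin O."""
    t = math.radians(x_deg)
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def rotate_about(center, p, x_deg):
    """x-degree counterclockwise rotation of p around `center`.

    T is the translation that moves O to `center`. We first find the point P
    with T(P) = p, rotate P around O to get Q, and then return T(Q).
    """
    P = (p[0] - center[0], p[1] - center[1])     # T(P) = p
    Q = rotate_about_origin(P, x_deg)            # rotation around O
    return (Q[0] + center[0], Q[1] + center[1])  # T(Q)

def close(p, q, tol=1e-9):
    """Compare two points up to floating-point rounding."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= tol

# Rotating (0, 0) by 90 degrees around (1, 0) gives approximately (1, -1).
print(close(rotate_about((1, 0), (0, 0), 90), (1, -1)))  # True

# 235 = 180 + 55: rotating P = (1, 0) by 235 degrees around O agrees with
# rotating its 180-degree image (-1, 0) by 55 degrees.
print(close(rotate_about_origin((1, 0), 235),
            rotate_about_origin((-1, 0), 55)))           # True
```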
Exercises 4.3.
(1) (a) Prove that the composition of two isometries is an isometry. (b) Prove
that the composition of two surjections is a surjection and the composition
of two injections is an injection. (Hence the composition of bijections is a
bijection.) (c) If F, G are bijections, then prove that the inverse of F ◦ G
is G⁻¹ ◦ F⁻¹.
(2) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) (a) Let F and G be transformations of the plane defined by F (x, y) =
(x, y + 1) and G(x, y) = (xy, y). Are the transformations F ◦ G and G ◦ F
equal? (b) Is F ◦G injective? Surjective? (c) Is G◦G injective? Surjective?
(3) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) (a) Let F be the transformation of the plane defined by F (x, y) =
(x + 1, y + 1), and let C denote the unit circle, i.e., the circle of radius
1 around the origin (0, 0). Give a rough description of F (C), but be as
precise as you can. (b) Let G be the transformation of the plane defined
by G(x, y) = (2x, y), and let C be the unit circle as before. Give a rough
description of G(C), but be as precise as you can. (c) Let H be the
transformation of the plane defined by H = G ◦ F , so that H(x, y) =
(2x + 1, y + 1), and let C be the unit circle as before. Give a rough
description of H(C), but be as precise as you can.
(4) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) (a) Consider the transformation G of the plane defined by G(x, y) =
(x², y). Is it injective? Is it surjective? (b) Consider the transformation F
toward proving theorems in geometry in the process. These theorems are needed for
the definitions of reflections and translations in the next section, and among them,
the most important is Theorem G1 on page 220. Theorem G1 will have many
applications.
Assumptions about rotations and first consequences (p. 217)
Theorem G1 and its proof (p. 220)
Theorems G2–G4 (p. 223)
At this point, we will assume that the first six assumptions, (L1)–(L6), have
been committed to memory and we will freely make use of them to prove some
simple geometric theorems. Recall that (L2) is the parallel postulate.
Our attitude toward geometric proofs at this point is strictly utilitarian: we
prove the minimum number of theorems that are needed for the discussion of linear
equations in beginning algebra. A more systematic presentation of the proofs of
the basic theorems in plane geometry will be given in Chapter 6 of [Wu2020b].
Moreover, Chapter 8 in [Wu2020b] will discuss the nature of proofs in geometry
from a broader perspective.
We have already defined rotations on page 202. It remains to make explicit our
assumptions about rotations:
(1) Any rotation maps a line to a line, a segment to a segment,
a ray to a ray, and an angle to an angle.
(2) Any rotation preserves lengths of segments and degrees of
angles.
Thus every rotation is by assumption an isometry (see (2)). Note that, by
(2), a rotation preserves not only lengths but also degrees of angles (see page 203
for the definition of degree-preserving). Rotation is the first of three isometries to
be studied in detail that will be referred to as the basic isometries of the plane,42
the other two being reflection (page 229) and translation (page 234). These three
are the basic building blocks of the concept of congruence (page 240).
Note that the rotation of zero degrees around a point is just the identity trans-
formation I of the plane. Rotations of 180 degrees play a major role in the logical
development of plane geometry; see, for example, Theorem G1 on page 220 and
Theorem G12 on page 259.
Let θ and σ be numbers so that −360 ≤ θ, σ ≤ 360 and so that −360 ≤ θ + σ ≤
360. Let ρ_θ and ρ_σ be rotations of θ and σ degrees, respectively, around the same
center. Then according to equation (4.1) on page 209, the composition of ρ_θ and
ρ_σ satisfies
(4.6) ρ_θ ◦ ρ_σ = ρ_{θ+σ}.
Recall that the restriction −360 ≤ θ + σ ≤ 360 has to be imposed on equation (4.6)
because otherwise the right side of (4.6) will not make sense (rotations are thus far
defined only for angles in the range [−360, 360]).
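Equation (4.6) can be spot-checked numerically. The Python sketch below assumes coordinates with the common center at the origin and uses the standard cosine and sine formula for a rotation; the function names rho and close, the sample point, and the sample degrees are hypothetical choices made for illustration. Negative degrees stand for clockwise rotations, as in the text.

```python
import math

def rho(theta_deg):
    """Return the rotation of theta_deg degrees around the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def close(p, q, tol=1e-9):
    """Compare two points up to floating-point rounding."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= tol

p = (3.0, -2.0)
theta, sigma = 100, -240          # theta + sigma = -140 lies in [-360, 360]

lhs = rho(theta)(rho(sigma)(p))   # apply the sigma-degree rotation first
rhs = rho(theta + sigma)(p)
print(close(lhs, rhs))                       # True

# The special case sigma = -theta gives the identity transformation.
print(close(rho(theta)(rho(-theta)(p)), p))  # True
```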
Equation (4.6) has to be supplemented by three remarks. First, the composition
of two rotations with distinct centers may not be a rotation in general; see Exercise
42 But see page 237 for further comments on the terminology of "basic isometries".
10(b) on page 238 and Exercise 11 on page 372. Second, if σ = −θ, then (4.6)
reduces to equation (4.5) on page 211 (because a rotation of 0 degrees is the identity
transformation I):
ρ_θ ◦ ρ_{−θ} = I and ρ_{−θ} ◦ ρ_θ = I.
As noted on page 211, this implies that, by virtue of Theorem 4.15 on page 211,
each rotation ρ_θ is a bijection. A third remark is that when rotations of any degree
have been defined, equation (4.6) will be seen to hold for any two numbers θ and σ
with no restrictions (see equation (1.94) in Section 1.6 of [Wu2020c]).
We now point out that there are "plenty of" rotations as a result of our as-
sumption about the existence of angles with a prescribed degree in (L6)(ii) on page
188.
Lemma 4.16. Given a point O and a number t so that −360 ≤ t ≤ 360, there
exists a t-degree rotation around O.
Proof. We have to show that, given a number t so that 0 ≤ t ≤ 360, we can define
a t-degree rotation around a given point O using only the given assumptions. We
do so as follows. Since such a rotation maps O to O, we only have to define the
rotated image of P for a point P distinct from O. According to (L6)(ii), there
are two angles ∠P OQ1 and ∠P OQ2 (where Q1 and Q2 lie in different half-planes
of LOP ) sharing the side ROP so that |∠P OQ1 | = |∠P OQ2 | = t◦ . The following
picture shows the case 0 < t < 180:
[Figure: the ray ROP with the points Q1 and Q2 in opposite half-planes of LOP, each making an angle of t° with ROP.]
Without loss of generality, we may assume that Q1 and Q2 are the points so that
|OQ1| = |OQ2| = |OP|. Then by the definition of rotation (page 202), the t-degree
counterclockwise rotation of P has to be one of Q1 and Q2; according to the picture
above, it is Q1. Denoting the rotation being defined by ρ, we thus have ρ(P) = Q1.
We have now defined how ρ moves an arbitrary
point P ≠ O, so ρ is well-defined. If t satisfies, instead, −360 ≤ t ≤ 0, then the
t-degree rotation of P will be defined in a similar way, except that we now look for
the clockwise rotation of P of degree |t|. The proof of Lemma 4.16 is complete.
It may not be entirely obvious that the assumptions (1) and (2) about ro-
tations on page 217, when coupled with (L1)–(L6), already allow us to prove very
interesting geometric theorems. See Theorems G1–G4 in the remainder of this sec-
tion. In addition to their intrinsic interest, these four theorems are needed for a
meaningful discussion of the definitions of reflection and translation in the next
section. To get a preview of some of the mathematical issues involved in these
definitions, we give an intuitive discussion of reflections and translations.
[Figure: a point A and its image Λ(A) under the reflection Λ.]
For simplicity let us denote by P′ the point Λ(P). Implicit in this definition is the
fact that (a) there is such a point P′ so that L is the perpendicular bisector of the
segment PP′ and (b) there is only one such point P′. Neither is obvious at the
moment. The need for (a) is clear, but the need for (b) may be less so. The fact is
that if there is another point Q distinct from P′ so that L is also the perpendicular
bisector of PQ, then the definition of a reflection implies that we can also define
Λ(P) = Q. This raises the question: which point does Λ assign to P, P′ or Q?
[Figure: the line L, the point P, and the two candidate image points P′ and Q.]
If we cannot verify that both (a) and (b) are valid, then the concept of a reflec-
tion would not be well-defined (see page 45) on two levels. Given a line L and
a point P in the plane, either the putative reflection Λ across L cannot assign a
point to P (this would be the case if (a) fails) or there is more than one candidate
for such a P′ so that the assignment by Λ of a point to P becomes ambiguous (this would be
the case if (b) fails). To go forward, what we need is a confirmation of the following.
Wish List #1. Given a line L and a point P , there is one and only one line
passing through P and perpendicular to L.
We must also keep in mind that any confirmation of this claim must be based
only on the properties of rotations such as (ϱ1) and (ϱ2) on page 217, together with
assumptions (L1)–(L6).
B, then the resulting vector is denoted by $\overrightarrow{BA}$. Therefore, while AB and BA are
the same segment, $\overrightarrow{AB}$ and $\overrightarrow{BA}$ are different vectors because they have different
starting points and different endpoints. The length of a vector $\overrightarrow{AB}$ is by definition
the length of the segment AB.
Intuitively, the translation T along a given vector $\overrightarrow{AB}$ is the transformation
that "moves a point P in the plane the same distance and in the same direction as
$\overrightarrow{AB}$". For example, if P does not lie on line LAB, we can describe T(P) as follows.
Draw the line ℓ passing through P and parallel to line LAB; then Q = T(P) is the
intersection of ℓ and the line passing through B and parallel to the line LAP.
[Figure: the parallelogram ABQP, showing the vector $\overrightarrow{AB}$ and the point Q = T(P).]
Pictorially, this description of Q = T(P) is believable as far as "$\overrightarrow{PQ}$ pointing in
the same direction as $\overrightarrow{AB}$" is concerned. But now how do we know that, with $\overrightarrow{PQ}$
so defined, $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ indeed have the same length? In other words, noting that
ABQP is by definition a parallelogram (page 193), we need the following to be true.
Wish List #2. Opposite sides of a parallelogram have the same length.
We now set out to prove both wish list items. The key ingredient in both proofs
is the following basic theorem about rotations of 180 degrees:
For the sequence of geometric theorems to follow, we will adopt
a special convention for their enumeration. Henceforth, all the
theorems in plane geometry will be numbered consecutively by G1,
G2, G3, etc. This is because, in Chapter 6 of [Wu2020b], we will
bring all these theorems together to give a coherent account of plane
geometry.
Theorem G1. Let O be a point not contained in a line L, and let ϱ be the
rotation of 180° around O. Then the image of L by ϱ is a line parallel to L; i.e.,
ϱ(L) ∥ L.
We note first of all that Lemma 4.16 on page 218 guarantees that there is
such a rotation of 180° around O. That said, let us begin by describing explicitly
this rotation ϱ. Given a point P distinct from O, let P′ denote the image point
ϱ(P) of P by ϱ. Then, by (L6)(iii) on page 188, P′ is the point on the ray RPO
so that |P′O| = |PO|. This is because ∠POP′ is a straight angle (ϱ is a 180-
degree rotation) and a rotation is distance-preserving (assumption (ϱ2) on page
217). Similarly, for any other point Q, the image Q′ of Q by ϱ is the point on RQO
so that |Q′O| = |QO|.
[Figure: the points P and Q and their images P′ and Q′ under ϱ, with O lying on both segments PP′ and QQ′.]
One should also get an intuitive feeling for why Theorem G1 is true by some
hands-on activities.
Proof of Theorem G1. We give two proofs of this simple but important the-
orem. The first proof argues by contradiction. The second proof does not use a
contradiction argument and is one that can be used directly in a school classroom.
[Figure: the point O, the line L containing P and Q, and the image line ϱ(L).]
By the assumptions about rotations (page 217), we know ϱ(L) is a line. Suppose
ϱ(L) is not parallel to L. Then they intersect at a point Q. The fact that
Q ∈ ϱ(L) means that there is a point P ∈ L so that ϱ(P) = Q. Since ϱ is a rotation
of 180° around O, the three points P, O, and ϱ(P) are collinear; i.e., P, O, and
Q are collinear (see (L6)(iii) on page 188). Moreover, P and Q are distinct: since O
is not on L, P ≠ O, so P and ϱ(P) lie on opposite half-lines with respect to O.
As usual, call the line containing P, O, and Q the line LPQ. Now, not
only is P on L, but Q is also on L because Q lies in L ∩ ϱ(L). Thus L and LPQ have
two points P and Q in common and therefore they coincide: L = LPQ (see (L1)
on page 165). But O also lies on LPQ, so O lies on L, and this directly contradicts
the hypothesis that O is not contained in L. Therefore ϱ(L) has to be parallel to
L. Theorem G1 is proved.
[Figure: the line L containing P, the point O, and the point Q = ϱ(P).]
Since ϱ is a 180-degree rotation around O, the three points P, O, Q are collinear and
therefore Q lies on LOP . Observe that since P is on L and O does not lie on L by
hypothesis, O and P are distinct so that P and Q are also distinct points (because P
and Q lie on opposite half-lines with respect to O by the definition of a 180◦ angle).
Now O lies on LOP and O is not on L, so that LOP and L are distinct lines. By
Lemma 4.2 on page 165, LOP and L can have at most one point in common. Since
P is already known to be on both L and LOP , no other point on LOP can be on L.
In particular, since Q is distinct from P , Q does not lie on L. The proof is complete.
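Theorem G1 can also be checked numerically in a coordinate model of the plane. The sketch below is only an illustration under that assumption (the text itself does not use coordinates here); the helper names half_turn, cross, and direction are hypothetical. It uses the fact that a 180-degree rotation around O sends P to the point P′ for which O is the midpoint of PP′.

```python
def half_turn(point, center):
    """180-degree rotation around `center`: `center` is the midpoint of point and image."""
    return (2 * center[0] - point[0], 2 * center[1] - point[1])

def direction(p, q):
    """Direction vector from p to q."""
    return (q[0] - p[0], q[1] - p[1])

def cross(u, v):
    """Zero exactly when the vectors u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

O = (1.0, 3.0)                    # center of rotation, chosen off the line L
P, Q = (0.0, 0.0), (4.0, 1.0)     # two points that determine L

P_img, Q_img = half_turn(P, O), half_turn(Q, O)

# The image line has the same direction as L ...
same_direction = cross(direction(P, Q), direction(P_img, Q_img)) == 0
# ... and the image of P is not on L, so the two lines are distinct, hence parallel.
image_off_L = cross(direction(P, Q), direction(P, P_img)) != 0
print(same_direction, image_off_L)
```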
As mentioned on page 166, we can now prove an existence result that comple-
ments the parallel postulate (page 165).
Corollary to Theorem G1. If a point P and a line L are given so that P does
not lie on L, then there exists a line passing through P and parallel to L.
Proof of corollary. Let Q be any point on L and let O be the midpoint of segment
PQ. If ϱ is the 180-degree rotation around O, then of course ϱ(Q) = P so that
ϱ(L) passes through P. Moreover, Theorem G1 says ϱ(L) ∥ L, and the corollary is
proved.
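The construction in this proof is concrete enough to carry out by computation. The following sketch assumes a coordinate model of the plane and represents a line by two of its points; the function parallel_through and the sample points are our own illustrative choices, not notation from the text.

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def half_turn(point, center):
    """180-degree rotation around `center` (coordinate form, assumed for this sketch)."""
    return (2 * center[0] - point[0], 2 * center[1] - point[1])

def parallel_through(P, Q, R):
    """Two points on the line through P parallel to the line through Q and R.

    Follows the proof: rotate the line 180 degrees around the midpoint O of PQ.
    """
    O = midpoint(P, Q)
    return half_turn(Q, O), half_turn(R, O)

P = (1.0, 4.0)                   # a point not on L
Q, R = (0.0, 0.0), (3.0, 1.0)    # two points on L

X, Y = parallel_through(P, Q, R)
print(X, Y)   # X coincides with P, and the line XY is parallel to L
```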
We can now fulfill the promise made on page 166 by giving a direct proof of
Lemma 4.3. We are given three lines L1, L2, and L3, and they satisfy L1 ∥ L2 and
L2 ∥ L3. We have to prove that if L1 and L3 are distinct, then L1 ∥ L3.
Suppose L1 and L3 are distinct. By Lemma 4.1 on page 165, there is a point
P on L3 so that P is not on L1. If we can prove that any line passing through
P distinct from L3 must intersect L1, then this would imply that any line passing
through P that is not L3 is not parallel to L1. But since the preceding corollary
implies that there is a line passing through P that is parallel to L1, we conclude
that L3 must be that line. This then shows L3 ∥ L1.
Thus let ℓ be a line passing through P and distinct from L3. We are going to
prove that ℓ intersects L1.
[Figure: the lines L3, L2, and L1, and the line ℓ passing through the point P on L3.]
We cannot prove in one step that ℓ intersects L1. Instead, we first prove that
ℓ intersects L2. Indeed, since P ∈ L3 and L3 ∥ L2 by hypothesis, P does not lie
on L2. By the parallel postulate, through P passes at most one line parallel to
L2. Since, by hypothesis, L3 is that line, then L3 is the only line passing through
P that is parallel to L2. Therefore ℓ is not parallel to L2 and intersects L2 at
some point P′. As usual, P′ ∈ L2 and L2 ∥ L1 (by hypothesis) imply that P′ does
not lie on L1. By the parallel postulate again, through P′ passes only one line
Theorems G2–G4
The next two theorems are both intuitively obvious, and both are simple con-
sequences of Theorem G1. Our immediate concern is whether given a line L and
a point P not on L, there can be two distinct lines passing through P and both
perpendicular to L (see Wish List #1 on page 219). Your instinct tells you "of
course not", because if this happens, the lines would create a triangle whose angles
(have degrees that) add up to > 180 degrees.
[Figure: two distinct lines passing through the point P, both meeting the line L.]
However, that theorem about the sum of angles of a triangle is not known at this
point and therefore cannot be invoked. We must find another way to show that
this is impossible. This is the content of the following theorem.
Theorem G2. Two lines perpendicular to the same line are either identical or
parallel to each other.
Returning now to the main line of our discussion, we see from Theorem G2
that given a point outside a line ℓ, there can be at most one line passing through
the point and ⊥ ℓ. Now comes the question of existence: is there such a line? We
show that such must be the case by a clever argument: so far we only have one
nontrivial existence theorem, namely, the Corollary to Theorem G1 on page 222.
We will use that to produce a line perpendicular to a given line. First, we give a
definition. Given two lines L1 and L2 , a transversal of L1 and L2 is a line that
is distinct from L1 and L2 and intersects both. We will prove
[Figure: the lines L1 and L2, a transversal meeting them at A1 and A2, and a point M.]
Corollary 1 of Theorem G3. Given a point P not lying on a line ℓ, there exists
one and only one line L passing through P and perpendicular to ℓ.
[Figure: the line ℓ with the point A on it, the line Λ through A, and the line L through P.]
Proof. Let A be a point on ℓ and let Λ be the line passing through A and perpen-
dicular to ℓ. If Λ passes through P, we have proved the existence part. If not, then
by the Corollary to Theorem G1 on page 222, there exists a line L passing through
P and parallel to Λ. We shall prove presently that L intersects ℓ. Thus ℓ intersects
both Λ and L and is therefore a transversal of Λ and L. By Theorem G3, we have
L ⊥ ℓ. This then completes the proof of the existence of such an L provided we
can show that L intersects the line ℓ.
Suppose L does not intersect ℓ; then L ∥ ℓ. But we already know that L ∥ Λ.
Therefore, by Lemma 4.3 on page 166, Λ ∥ ℓ, and this contradicts the fact that Λ
meets ℓ at A. Thus L intersects ℓ after all.
To prove the uniqueness of L, suppose another line L′ passes through P and is
also perpendicular to ℓ. By Theorem G2, since L and L′ are not parallel (they have
P in common), they have to be identical. Thus L = L′. Corollary 1 to Theorem
G3 is proved.
Proof. Indeed, let two lines L1 and L2 be perpendicular to a third line L at A and
D, respectively. Let a line L′ be perpendicular to L1 at a point B on L1, and let
L′ meet L2 at a point C.
[Figure: the lines L1 and L2 meeting L at A and D, and the line L′ meeting L1 at B and L2 at C.]
Theorem G4, together with the fact that a rectangle is a parallelogram (Corol-
lary to Theorem G2 on page 224) implies that
the opposite sides of a rectangle have the same length.
This reconciles the usual definition in school mathematics of a rectangle (a quadri-
lateral with four right angles and opposite sides of the same length) with our defi-
nition of a rectangle (a quadrilateral with four right angles).
The proof of Theorem G4 requires the following lemma.
Lemma 4.17. Let F be a bijection of the plane that maps lines to lines, and let
L1 and L2 be two distinct lines. Then the image lines F (L1 ) and F (L2 ) are also
distinct. Furthermore, if L1 and L2 intersect at a point P , then F (L1 ) and F (L2 )
also intersect, and their point of intersection is F (P ).
[Figure: the lines L1 and L2 meeting at P, and the image lines F(L1) and F(L2) meeting at F(P).]
Proof of Theorem G4. Given parallelogram ABCD, we must show |DA| = |BC|
and |AB| = |CD|. It suffices to prove the former.
[Figure: the parallelogram ABCD with the diagonal AC and its midpoint M.]
43 To further firm up this discussion, note that "slantangles" exist in the hyperbolic plane
We have so few tools at our disposal that our first thoughts have to be: how
can we make use of Theorem G1? If we look at the picture of a parallelogram,
sooner or later the idea will surface that we should do a 180-degree rotation around
the midpoint of a diagonal, e.g., around the midpoint M of the diagonal AC.
(The diagonal AC is not in the original picture of ABCD, but putting it there
helps us see the situation better.) Let ϱ be the rotation of 180 degrees around
M. Then ϱ(C) = A so that ϱ(LBC) is a line passing through A and (by Theorem
G1) parallel to LBC. Since the line LAD has exactly the same two properties
by assumption, the parallel postulate (page 165) implies that ϱ(LBC) = LAD.
Similarly, ϱ(LAB) = LCD. Thus,
ϱ(LBC) ∩ ϱ(LAB) = LAD ∩ LCD = {D}.
On the other hand, the intersection of LBC and LAB is B. By Lemma 4.17, we
have
(4.7) ϱ(B) = D.
Recall we also have ϱ(C) = A. Since ϱ maps segments to segments (by assumption
(ϱ1) on page 217), we have ϱ(BC) = DA. Since ϱ is an isometry (by assumption
(ϱ2) on page 217), we have |BC| = |DA|, and Theorem G4 is proved.
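The key step of this proof, that the 180-degree rotation around the midpoint M of a diagonal exchanges B with D and C with A, is easy to test on an example. The sketch below assumes a coordinate model of the plane; the parallelogram and the helper half_turn are illustrative choices of ours, not part of the text.

```python
import math

def half_turn(point, center):
    """180-degree rotation around `center` (coordinate form, assumed for this sketch)."""
    return (2 * center[0] - point[0], 2 * center[1] - point[1])

# A sample parallelogram ABCD: AB is parallel to DC and AD is parallel to BC.
A, B, C, D = (0, 0), (4, 0), (5, 3), (1, 3)

M = ((A[0] + C[0]) / 2, (A[1] + C[1]) / 2)   # midpoint of the diagonal AC

image_of_B = half_turn(B, M)   # expected to coincide with D, as in (4.7)
image_of_C = half_turn(C, M)   # expected to coincide with A

# Because the rotation preserves lengths, |BC| = |DA|.
print(image_of_B, image_of_C, math.dist(B, C), math.dist(D, A))
```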
The proof is already implicit in the proof of Theorem G4 and will therefore be
left as an exercise (Exercise 3 on p. 228).
Theorem G4 allows us to introduce a useful concept. Given two parallel lines,
we can now define the distance between them. First, let P be a point not lying
on a line ℓ. The distance of P to the line ℓ is by definition the length |PQ|,
where Q is the point of intersection of the line ℓ and the line passing through P
and perpendicular to ℓ. See the following picture on the left:
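[Figure: on the left, a point P and the point Q on ℓ with LPQ ⊥ ℓ; on the right, parallel lines ℓ and ℓ′ with points P and P′ on ℓ′ and the corresponding points Q and Q′ on ℓ.]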
Now suppose we have parallel lines ℓ and ℓ′ and P ∈ ℓ′ (see picture on the
right). If P′ is another point on ℓ′, then we claim that the distance of P′ to ℓ is
the same as the distance of P to ℓ.⁴⁴ Indeed, let the line passing through P′ and
perpendicular to ℓ intersect ℓ at Q′. By Theorem G2 (page 223), LPQ ∥ LP′Q′.
Therefore PQQ′P′ is a parallelogram. Consequently, |PQ| = |P′Q′|, by Theorem
G4. This proves the claim.
The common distance from points on one of two parallel lines to the other is
called the distance between the parallel lines.
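As an illustration of this definition, the following sketch computes the distance from several points of one line to a parallel line and shows that the values agree. It assumes a coordinate model of the plane, with a line given by a point and a direction vector; the helper names and the sample lines are hypothetical.

```python
import math

def foot_of_perpendicular(p, a, d):
    """The point Q on the line through `a` with direction `d` such that PQ is perpendicular to it."""
    s = ((p[0] - a[0]) * d[0] + (p[1] - a[1]) * d[1]) / (d[0] ** 2 + d[1] ** 2)
    return (a[0] + s * d[0], a[1] + s * d[1])

def distance_to_line(p, a, d):
    """Distance of p to the line, i.e. the length |PQ| with Q the foot of the perpendicular."""
    return math.dist(p, foot_of_perpendicular(p, a, d))

d = (2.0, 1.0)    # common direction of the two parallel lines
a = (0.0, 0.0)    # a point on the first line
points_on_second_line = [(0.0, 5.0), (2.0, 6.0), (-4.0, 3.0)]

distances = [distance_to_line(p, a, d) for p in points_on_second_line]
print(distances)  # the three values agree up to rounding
```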
Exercises 4.4.
(1) Prove that a parallelogram with one right angle is a rectangle.
(2) Let F be a bijection of the plane that maps lines to lines. Prove that F
maps distinct lines to distinct lines.
(3) Prove the Corollary to Theorem G4 on page 227.
(4) (This exercise is a further refinement of the proof of Theorem G4.) Recall
from the proof of Theorem G4 that if M is the midpoint of the diagonal
AC and ϱ is the 180° rotation around M, then ϱ(B) = D. Prove that (i)
B, M , and D are collinear and that (ii) the diagonal BD and the diagonal
AC bisect each other, in the sense that the point of intersection of AC
and BD is the midpoint of both AC and BD.
(5) Fix two points P and Q in the plane. Let ϱ1 be the counterclockwise
rotation of 45° around P, and let ϱ2 be the clockwise rotation of 90°
around Q. Also write L for LPQ in the interest of notational simplicity.
Now describe as precisely as you can the two lines ϱ1(ϱ2(L)) and ϱ2(ϱ1(L)).
In particular, does ϱ1(ϱ2(L)) equal ϱ2(ϱ1(L))?
(6) Prove the following slight generalization of Lemma 4.17 on page 226: let
F be a bijection of the plane and let U and V be two subsets of the plane.
Then
F (U ∩ V) = F (U) ∩ F (V).
(7) Given a line L, prove that all the points of a fixed distance k from L form
two lines each parallel to L.
44 This explains why the sleepers (cross ties) across rail tracks can afford to be all of the same
length.
4.5. THE BASIC ISOMETRIES: REFLECTIONS AND TRANSLATIONS 229
(8) Given positive numbers a and b, prove that there exists a rectangle whose
sides have lengths a and b. (Do not skip any steps!)
(9) Let L1 and L2 be parallel lines and let O be a point equidistant from L1
and L2 . Let two lines passing through O intersect L1 and L2 at A, B and
C, D, respectively, as shown. Prove that |AB| = |CD|.
[Figure: the parallel lines L1 and L2, with A and B on L1 and D and C on L2, and the two lines through O.]
(10) (This exercise makes use of a coordinate system; see the warning about
the use of coordinates on page 207.) Let A = (1, 0) and B = (0, 1) and let
ϱ_A be the 90° counterclockwise rotation around A and let ϱ_B be the 90°
clockwise rotation around B. Let ϱ = ϱ_B ◦ ϱ_A. What is ϱ(A) and what is
ϱ(B)?
Reflections
own inverse. It follows from Theorem 4.15 on page 211 that every reflection is a
bijection.
Lemma 4.19. Given a line in the plane, there is a reflection across that line.
The lemma follows immediately from the definition of reflection and Corollary
1 of Theorem G3 on page 224.
Translations
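If $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ are pointing in the same direction, then typically they look
something like the following, where the "right" closed half-plane of L0 contains
RAB and RPQ:
[Figure: the line L0 with the vectors $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ both in its "right" closed half-plane.]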
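If LAB = LPQ, then $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ pointing in the same direction would look like
this:
[Figure: the line L0 and, on the common line LAB = LPQ, the vectors $\overrightarrow{PQ}$ and $\overrightarrow{AB}$.]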
Still assuming that LAB = LPQ, if we make LAB into a number line (as (L3) on
page 167 says we could), then the preceding picture suggests that $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point
in the same direction if and only if either P < Q and A < B, or P > Q and A > B.
The simple proof may be left to Exercise 5 on page 237. An alternate formulation
is this: suppose LAB = LPQ. Then $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the same direction if and
only if RAB ⊂ RPQ or RPQ ⊂ RAB. One can also prove this easily by making
LAB into a number line or making use of Lemma 4.6 on page 174 (see Exercise 12
on page 239).
We usually abuse the language and say that "$\overrightarrow{AB}$ and the line L are parallel" to
mean that LAB and L are parallel. With this understood, it is essential that, in the
preceding definition, the line L0 be not parallel to either $\overrightarrow{AB}$ or $\overrightarrow{PQ}$. Otherwise, we
could have the following situation where L0 ∥ LAB ∥ LPQ and the "lower" closed
half-plane of L0 does contain both rays, RAB and RPQ, and yet the vectors $\overrightarrow{AB}$
and $\overrightarrow{PQ}$ are in no way "pointing in the same direction".
[Figure: the line L0 with, below it, the vector $\overrightarrow{PQ}$ pointing to the right and the vector $\overrightarrow{AB}$ pointing to the left.]
Proof. [Since the lemma is intuitively obvious (try drawing lots of pictures) and the
proof is tedious, it is suggested that this proof not be given in a school classroom.]
First assume P ∈ LAB. If P = A, then we simply let Q = B. From now on,
we may assume P ≠ A. Either P lies on the ray RAB or P does not.
[Figure: the vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ on the line LAB, with P on the ray RAB.]
We make LAB into a number line with A = 0 and B > 0. Then the ray RAB
consists of nonnegative numbers. Since P ≠ A and P ∈ RAB, P > 0. Define Q so
that Q > P and so that |PQ| = |AB|. Then RPQ consists of numbers ≥ P and
therefore RPQ consists of positive numbers. Hence, if L0 is the line perpendicular
to LAB at A, then the closed half-plane of L0 that contains B contains all the
nonnegative numbers in LAB (by Lemma 4.7 on page 176) and therefore contains
both RAB and RPQ. This shows $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the same direction.
Next, suppose P does not lie in the ray RAB . Then we make LAB into a number
line so that P = 0 and A > 0. If B < A, then the ray RAB would consist of all
the numbers ≤ A and, since P = 0 < A, we would have P ∈ RAB . Contradiction.
Thus A < B, and RAB consists of all the positive numbers ≥ A. We now choose
Q to be a positive number on LAB so that |P Q| = |AB|. Then RP Q consists of all
nonnegative numbers on LAB . Let L0 be the line perpendicular to LP Q at P .
[Figure: the line L0 through P, and the vectors $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ on the line LAB.]
Then the same reasoning shows that the closed half-plane of L0 that contains A
contains all the nonnegative numbers on LAB and, therefore, contains both RAB
and RPQ. This again shows $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the same direction and the
existence part of the lemma is proved if P lies in LAB.
This vector $\overrightarrow{PQ}$ is unique because, by the definition of "pointing in the same
direction", we know that the point Q must lie in LAB and therefore Q must lie
in one of the two rays issuing from P on LAB determined by P. The requirement
that |PQ| = |AB| implies that once a ray issuing from P is chosen, there can be
only one Q in that ray so that |PQ| = |AB| (by (L5)(ii) on page 184). Since the
preceding existence proof specifies the ray issuing from P in which Q resides, it is
clear that this Q is unique and therefore $\overrightarrow{PQ}$ is unique.
Next, suppose P does not lie in LAB. Let L0 be the line LAP. Let L1 be the
line passing through P and parallel to LAB. Let L2 be the line passing through
the endpoint B of $\overrightarrow{AB}$ and parallel to the line L0. The point Q is the intersection
of L1 and L2, as shown:
[Figure: the line L0 through A and P, the line L1 through P parallel to LAB, the line L2 through B parallel to L0, and the point Q where L1 and L2 meet.]
Remark. Why must the lines L1 and L2 in the preceding picture intersect?
This is probably not a question one wants to address in a proof during a lesson
in a school classroom, but it is something a teacher should be ready to explain if
the question is raised. So suppose not; then L1 ∥ L2. Since we also have L0 ∥ L2
by construction, we have L0 ∥ L1 or L0 = L1 by Lemma 4.3 on page 166. But
L0 ≠ L1 because A ∈ L0 and A does not lie in L1; therefore, L0 ∥ L1. This is
absurd because L0 intersects L1 at P. Hence, L1 must intersect L2 after all.
It follows immediately from the definition of the translation TAB that if a point
P lies in LAB , then TAB (P ) lies in LAB (see the definition of pointing in the same
direction on page 231). In particular, TAB (A) = B. We would also like to make
explicit the following property of TAB that is basically contained in the proof of
Lemma 4.20.
Lemma 4.21. Suppose a point P does not lie on LAB , and suppose TAB (P ) =
Q. Then Q is the unique point so that ABQP is a parallelogram.
[Figure: the parallelogram ABQP, with the vector $\overrightarrow{AB}$ and the point Q = TAB(P).]
Proof. The only thing that is not already in the proof of Lemma 4.20 is the
uniqueness of such a Q. With A, B, P given, ABQP being a parallelogram means
Q has to be the intersection of the line passing through P and parallel to LAB , and
the line passing through B and parallel to LAP . Since these two lines are uniquely
determined once A, B, and P are given (parallel postulate), Q is also uniquely
determined. This completes the proof.
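The description of Q as an intersection of two lines can be turned into a short computation. The sketch below assumes a coordinate model of the plane (not used in the text's own development); the function translate_by_parallelogram and its helpers are hypothetical names. It intersects the line through P parallel to LAB with the line through B parallel to LAP.

```python
def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def cross(u, v):
    """Zero exactly when the vectors u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def translate_by_parallelogram(A, B, P):
    """The point Q for which ABQP is a parallelogram, for P not on the line AB."""
    d1 = sub(B, A)   # direction of the line through P parallel to L_AB
    d2 = sub(P, A)   # direction of the line through B parallel to L_AP
    denom = cross(d1, d2)
    if denom == 0:
        raise ValueError("P lies on the line AB; this construction does not apply")
    # Solve P + s*d1 = B + u*d2 for s by taking the cross product with d2.
    s = cross(sub(B, P), d2) / denom
    return (P[0] + s * d1[0], P[1] + s * d1[1])

A, B, P = (0, 0), (3, 1), (1, 2)
Q = translate_by_parallelogram(A, B, P)
print(Q)   # agrees with P + (B - A)
```

In coordinates the result always agrees with P + (B − A), which is the familiar formula for a translation.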
We observe that every translation has an inverse transformation (see page 211)
that is also a translation. To see this, let us keep the same notation as Lemma 4.21
so that TAB is the translation along $\overrightarrow{AB}$. We will prove that the translation TBA
is inverse to TAB. If Q lies on LBA, then one can easily prove that, since $\overrightarrow{PQ}$
and $\overrightarrow{AB}$ point in the same direction, $\overrightarrow{QP}$ and $\overrightarrow{BA}$ also point in the same direction.
Therefore, since |QP| = |BA|, we have TBA(Q) = P in this case (Exercise 2 on
page 237). If Q does not lie on LBA, then since ABQP is a parallelogram, BAPQ is also
a parallelogram. The uniqueness part of Lemma 4.21 now shows that TBA (Q) = P .
Therefore for any point P , we have TBA (TAB (P )) = P , or,
TBA ◦ TAB = I.
By switching the points A and B, we obtain
TAB ◦ TBA = I.
This means that for any vector $\overrightarrow{AB}$, the translation TAB has an inverse transformation
TBA. By Theorem 4.15 on page 211, every translation is a bijection of the plane.
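The two identities TBA ◦ TAB = I and TAB ◦ TBA = I can be spot-checked on sample points. This is a sketch only: it assumes a coordinate model in which the translation along the vector from A to B is the map P ↦ P + (B − A), and the function name translation is our own.

```python
def translation(A, B):
    """Return the map P -> P + (B - A), a coordinate form of the translation T_AB."""
    def T(P):
        return (P[0] + B[0] - A[0], P[1] + B[1] - A[1])
    return T

A, B = (0, 0), (3, 1)
T_AB = translation(A, B)
T_BA = translation(B, A)

samples = [(1, 2), (-4, 7), (6, 2)]   # the last sample lies on the line AB
round_trips = [T_BA(T_AB(P)) == P and T_AB(T_BA(P)) == P for P in samples]
print(round_trips)
```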
Activity 3. We use a piece of paper as a model for the plane. On the paper,
draw a vector $\overrightarrow{AB}$, and also extend the segment AB to a line, denoted as usual
by LAB. Draw some figures on the paper in black. Then use a sheet of overhead-
projector transparency to copy (i.e., trace over) everything on the paper, using (let
us say) a red pen. In particular, make sure that both the vector $\overrightarrow{AB}$ and the line
LAB are on the transparency. Holding the paper in place, slide the transparency
along the black line LAB on the paper until the red point A on the transparency is
on top of the (black) point B on the paper. The new positions of all the red figures
on the transparency will then display how the translation from A to B moves the
figures on the paper. Here is an example (the starting point of the red vector, a
red A, is not shown in the picture).
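[Figure: figures on the paper and their translated copies on the transparency, with the points A and B marked.]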
Theorem G5. Given a line LAB that joins two distinct points A and B, let ℓ
be a line in the plane, and let T be the translation TAB along $\overrightarrow{AB}$. (i) If ℓ is equal
to LAB or parallel to LAB, then T(ℓ) = ℓ. (ii) If ℓ is neither equal to LAB nor
parallel to LAB, then the translated image T(ℓ) is a line parallel to ℓ.
Proof. Part (i) follows immediately from the definition of a translation and the
parallel postulate, so we may leave its proof as an exercise (Exercise 1 on page 237).
We will give the proof of part (ii). So let ℓ be a line neither parallel to LAB nor
equal to LAB. Suppose T(ℓ) is not parallel to ℓ. We will deduce a contradiction.
Since T(ℓ) is not parallel to ℓ, either T(ℓ) = ℓ or T(ℓ) intersects ℓ at a point
Q. In either case, we have a point Q on T(ℓ) ∩ ℓ. Since Q ∈ T(ℓ), there is a point
P ∈ ℓ so that Q = T(P).
[Figure: the lines ℓ and T(ℓ), with the point P on ℓ and the point Q = T(P).]
Either P does lie in LAB or it does not. Suppose P ∈ LAB; then T(P) ∈ LAB (as
was pointed out right below the definition of a translation on page 234), so that
Q ∈ LAB. Therefore both P and Q lie on LAB. But P ≠ Q because |PQ| = |AB|
by the definition of a translation and A and B are distinct, so |PQ| = |AB| > 0.
Thus ℓ and LAB are two lines passing through the distinct points P and Q; by
(L1), ℓ = LAB. This contradicts the hypothesis of (ii) that ℓ is not equal to LAB.
Next, suppose P does not lie on LAB. By Lemma 4.21 on page 234, LPQ ∥ LAB;
i.e., ℓ ∥ LAB. This again contradicts the working hypothesis that ℓ is not parallel
to LAB. Hence T(ℓ) ∥ ℓ and the theorem is proved.
We have just finished the definitions of the basic isometries (i.e., rotations,
reflections, and translations), and it remains to make some concluding remarks. We
have made assumptions about rotations (see (ϱ1) and (ϱ2) on page 217), reflections
(see (Λ1) and (Λ2) on page 231), and translations (see (T1) and (T2) on page 236).
We can now summarize these assumptions in one all-embracing statement, as
follows.
(L7) The basic isometries (rotations, reflections, and translations) have the
following properties:
(i) A basic isometry maps a line to a line, a segment to a seg-
ment, a ray to a ray, and an angle to an angle.
(ii) A basic isometry preserves lengths of segments and degrees
of angles.
Once again, we note that rotations, reflections, and translations are, by as-
sumption, isometries and that there are "plenty of" basic isometries in the sense of
Lemmas 4.16, 4.19, and 4.22 (page 218, page 231, and page 236, respectively).
Exercises 4.5.
(1) Prove part (i) of Theorem G5 on page 236.
(2) Let a translation TAB be given, and let TAB (P ) = Q. If Q lies on LBA ,
prove that TBA (Q) = P .
(3) Prove that if the diagonals of a parallelogram are perpendicular to each
other, then the parallelogram is a rhombus; i.e., all four sides of the par-
allelogram are equal. (See Exercise 4 on page 228.)
(4) (i) Let ABCD be a parallelogram. Suppose F is a point on AD and
E is a point on BC such that |AF | = |BE|. Prove that ABEF is a
parallelogram. (ii) Suppose in part (i) the parallelogram ABCD is a
rectangle. Prove that ABEF is also a rectangle.
(5) Suppose the four points A, B, P, and Q lie on a number line. Prove that
$\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the same direction if and only if A < B and P < Q,
or A > B and P > Q.
(6) Suppose the two vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the same direction and the
two vectors $\overrightarrow{PQ}$ and $\overrightarrow{UV}$ also point in the same direction. Does it follow
that $\overrightarrow{AB}$ and $\overrightarrow{UV}$ point in the same direction?
(7) (This exercise makes use of a coordinate system; see the warning about the
use of coordinates on page 207.) In the picture below, L is the horizontal
line {y = 1}, and $\overrightarrow{CD}$ is the vector that starts at C = (3, 0) and ends at
D = (2, 1). The points A and B are as shown. Let R = the reflection
across L, T = the translation along $\overrightarrow{CD}$, and ϱ = the 90° rotation around
B (according to the definition of rotations on page 202, this is a counter-
clockwise rotation). What is (ϱ ◦ T ◦ R)(A) and what is (ϱ ◦ R ◦ T)(A)?
Are the two points the same?
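[Figure: coordinate axes x and y with origin O; the points A = (1, 2), B = (1, 0), C = (3, 0), and D = (2, 1); the line L; and the vector $\overrightarrow{CD}$, with an angle of 45° marked at C.]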
[Figure: the lines L1, L2, and L3, with the points A, D, P on L1, the points B, E, Q on L2, and the points C, F on L3.]
(Hint. There are many ways to do this, and here is one that uses trans-
lations. Let the line parallel to ℓ and passing through E intersect L1 at
P, and let the line parallel to ℓ and passing through F intersect L2 at Q.
Let T be the translation along the vector $\overrightarrow{DE}$. Then of course T(D) = E.
Prove T(E) = F, and then prove T(P) = Q.)
(12) Prove that if LAB = LPQ, then $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the same direction
if and only if RAB ⊂ RPQ or RPQ ⊂ RAB. (Hint: Make use of Lemma
4.6 on page 174 and imitate its proof by making LAB into a number line
and argue with < rather than with ∗. Suppose $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ point in the
same direction. Let L0 be a line whose closed half-plane contains both
RAB and RP Q , and let LAB intersect L0 at O. If O ∗ P ∗ A, then prove
RAB ⊂ RP Q , but if O ∗ A ∗ P , then prove RP Q ⊂ RAB . Conversely, if
RAB ⊂ RP Q , let L0 be the line ⊥ LP Q at P . Then the closed half-plane
of L0 that contains Q contains both RP Q and RAB .)
(13) (a) Prove that every translation is equal to the composition of two reflec-
tions. (b) Prove that every rotation is also equal to the composition of
two reflections.
Comments: The net effect of this exercise seems to be that we
can forget rotations and translations because we only need re-
flections. This is an algebraic afterthought on the geometry of
basic isometries, and it must be said that, in advanced mathe-
matics, this algebraic point of view has paid immense dividends.
On the other hand, this algebraic fact is only something to keep
in mind from the point of view of algebra, but no more than
that. Geometers continue to think in terms of translations and
rotations directly.
Proof. It has been pointed out that every one of the basic isometries has the
following three properties: it is a bijection (see pp. 205, 231, and 235), it is an
isometry, and it maps lines to lines as well as preserves the degrees of angles (see
(L7) on page 237 for each of these claims). Because these properties persist under
composition, the proof of part (a) of the theorem is straightforward. To prove part
(b), i.e., the inverse of a congruence is a congruence, let a congruence ϕ be the
composition of three basic isometries F ◦ G ◦ H; then it is simple to directly verify
that if ψ = H⁻¹ ◦ G⁻¹ ◦ F⁻¹, then ψ ◦ ϕ = I = ϕ ◦ ψ. So ψ is the inverse of ϕ.
But the inverse of a basic isometry is a basic isometry, because the inverse of a
rotation is a rotation (see page 211), the inverse of a reflection is itself (see page
231), and the inverse of a translation is a translation (see page 235), so ψ is also
a congruence. The proof is similar if ϕ is the composition of any number of basic
isometries. Part (c) follows immediately from the definition of a congruence as a
composition of basic isometries. The proof of Theorem G6 is complete.
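The verification of ψ ◦ ϕ = I in part (b), which the proof describes as simple, can be written out as follows. It uses only the associativity of composition and the defining property of an inverse (F⁻¹ ◦ F = I, and likewise for G and H).

```latex
\begin{align*}
\psi \circ \varphi
  &= (H^{-1} \circ G^{-1} \circ F^{-1}) \circ (F \circ G \circ H) \\
  &= H^{-1} \circ G^{-1} \circ (F^{-1} \circ F) \circ G \circ H \\
  &= H^{-1} \circ (G^{-1} \circ G) \circ H \\
  &= H^{-1} \circ H \\
  &= I.
\end{align*}
```

The computation of ϕ ◦ ψ = I is the same with the roles of each isometry and its inverse exchanged.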
With the availability of the concept of congruence, we can now give new mean-
ing to segments with the same length and angles with the same degree by proving
the following lemma. The proof of the lemma is very instructive.
Lemma 4.23. (i) Two segments have the same length if and only if they are
congruent to each other. (ii) Two angles have the same degree if and only if they
are congruent to each other.
Proof. We first prove (i). Let AB and A′B′ be congruent segments. Since ev-
ery congruence is an isometry (Theorem G6), the segments have the same length.
Conversely, suppose |AB| = |A′B′|; then we will prove that there is a congruence
φ so that φ(AB) = A′B′.
[Figure: the segments AB and A′B′.]
Let T be the translation along the vector AA′. Then T maps A to A′ and, by
(L7)(i) on page 237, maps AB to a segment whose one endpoint is at A′ and
the other endpoint we call B0; i.e., B0 = T(B). (In the following picture, since
we are trying to show the position of A′B0 (= T(AB)), we use a dashed line to
represent the segment AB in its original position. Of course T also moves A′B′
away from its original position, but since we are not concerned with T(A′B′), we
omit T(A′B′) from the picture altogether. However, we choose to retain A′B′ in
its original position in the picture for the benefit of this discussion.)
[Figure: the segment A′B0, with A′ = T(A) and B0 = T(B), together with A′B′; the original position of AB is dashed.]
Let ∠B′A′B0 denote the convex angle with vertex at A′, and let |∠B′A′B0| = t◦.
Let ϱ denote the t-degree rotation around A′. (In this picture, ϱ is clearly the
As before, we use the translation T along the vector OO′ to map O to O′.
T then moves ∠AOB to a new angle ∠A0O′B0 (see (L7)(i) on page 237) with
vertex at O′, where A0 = T(A) and B0 = T(B). See the left picture below. Let
the degree of the convex angle ∠A0O′A′ be t. Let ϱ be the t-degree (clockwise or
counterclockwise) rotation around O′ (in the picture, it is counterclockwise) so that
ϱ maps the ray RO′A0 to the ray RO′A′. Let ϱ map B0 to a point B1. Thus ϱ maps
∠A0O′B0 to ∠A′O′B1, as shown in the right picture below.
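[Figure: left, the translated angle ∠A0O′B0 (A0 = T(A), B0 = T(B)) and the angle ∠A′O′B′, with the rotation angle t marked at O′; right, the rotated angle ∠A′O′B1 and the angle ∠A′O′B′.]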
Notice that the congruence ϱ ◦ T has moved ∠AOB to ∠A′O′B1 so that the
latter now has one side in common with ∠A′O′B′. In the right picture above, B1
and B′ are in opposite half-planes of the line LO′A′. Now we use the reflection Λ
across LO′A′ to map ∠A′O′B1 to an angle in the closed half-plane of LO′A′ that
contains ∠A′O′B′. We claim that Λ maps the ray RO′B1 to the ray RO′B′. To
prove the claim, let B′1 = Λ(B1). Then the claim becomes: the two rays RO′B′ and
RO′B′1 coincide.
[Figure: the angle at O′ with one side RO′A′ and the points B′1 and B′ on its other side.]
To this end, we will prove that |∠A′O′B′| = |∠A′O′B′1| and appeal to Lemma
4.10 on page 190 to draw the desired conclusion. To prove |∠A′O′B′| = |∠A′O′B′1|,
observe that ∠A′O′B′1 = Λ(∠A′O′B1), and we also have
∠A′O′B1 = ϱ(∠A0O′B0) = ϱ(T(∠AOB)).
Therefore,
∠A′O′B′1 = Λ(∠A′O′B1) = Λ(ϱ(T(∠AOB))) = (Λ ◦ ϱ ◦ T)(∠AOB).
Let φ denote the congruence Λ ◦ ϱ ◦ T. Then we have
(4.9) φ(∠AOB) = ∠A′O′B′1.
By assumption (L7) on page 237, φ preserves degrees of angles. Thus ∠AOB and
∠A′O′B′1 have the same degree. By hypothesis, ∠AOB and ∠A′O′B′ also have the
same degree. Therefore
|∠A′O′B′| = |∠A′O′B′1|,
as desired. Since these two angles lie in the same closed half-plane of their common
side, which is the ray RO′A′, Lemma 4.10 implies that their other sides must coincide;
i.e., the two rays RO′B′ and RO′B′1 coincide, and the claim is proved.
Equation (4.9) now reads:
φ(∠AOB) = ∠A′O′B′.
The lemma shows the equivalence of "two segments have the same length" with
"two segments are congruent". Following the tradition started by Euclid ([Euclid1]),
it is customary to say that two segments are equal when what is meant is that
they are congruent. The same goes for equal angles when what is meant is that
the angles are congruent. When there is no fear of confusion, we will also abuse the
language in this manner for the rest of these volumes, but it is important to note
that this terminology is, strictly speaking, incorrect, because, for example, "equal
segment" means literally that the segments are equal geometric figures in the sense
of equal subsets of the plane (see page 141).
The rest of the geometric discussion in this volume will focus on triangles and,
at times, polygons. It is however worth pointing out that the concept of congruence
applies not only to polygons, but to any geometric figures, including "curved" ones
such as parabolas and ellipses. See Chapter 2 of [Wu2020b] and Chapters 1 and 4
of [Wu2020c]. At this point, a little reflection will reveal that the crude definition
of congruence in TSM as "same size and same shape"—regardless of its intuitive
appeal—can in no way be used as a definition of congruence. In mathematics,
one cannot replace a precise concept with a vague intuitive one, no matter how
appealing. For example, we can say that the following two strange looking figures
are congruent, not because they "look the same", but because the left figure can
be mapped to the right figure by a translation (such as from P to Q) followed by
a (counterclockwise) rotation of 90◦ .
46 The emphasis is on "suitably chosen". See, for example, Exercise 5 on page 251.
The proofs of SAS and ASA depend on the following simple lemmas. We note
that for the proof of Theorem G9 (ASA), we only need Lemma 4.24 (whose proof
is actually implicit in the proof of Lemma 4.23 on page 241). For the proof of
Theorem G8 (SAS), however, Lemma 4.25 will be needed.
Lemma 4.24. Assume two convex angles ∠MAB and ∠NAB so that |∠MAB|
= |∠NAB|. Suppose they have one side AB in common and M and N are on
opposite sides of the line LAB. Then the reflection across the line LAB maps ∠NAB
to ∠MAB (and also maps ∠MAB to ∠NAB).
[Figure: the angles ∠MAB and ∠NAB on opposite sides of their common side AB.]
Lemma 4.25. Suppose two convex angles ∠MAB and ∠NAB have the same
degree and they have one side AB in common. Assume further that the segments
AM and AN have the same length. Then either M = N (if M and N are on the
same side of LAB) or the reflection across LAB maps N to M (if M and N are on
opposite sides of LAB).
[Figure: the angles ∠MAB and ∠NAB on opposite sides of their common side AB, with the points M and N marked at equal distances from A.]
For the proof of Lemma 4.24, observe that the reflection R across LAB maps
∠NAB to ∠N0AB, where N0 = R(N), so that ∠N0AB and ∠MAB are now convex
angles with the same degree, in the same half-plane of LAB, with one side RAB
in common. So ∠N0AB = ∠MAB, by Lemma 4.10 on page 190. This proves
Lemma 4.24. As to Lemma 4.25, suppose M and N are on the same side of LAB.
By Lemma 4.10 on page 190 again, we know that the rays RAM and RAN coincide.
But since |AM| = |AN|, necessarily M = N. Now if M and N are on opposite
sides of LAB, then Lemma 4.24 shows that the reflection across LAB maps the ray
RAN to the ray RAM. Since a reflection preserves length (see (L7)(ii) on page 237),
the reflection maps the segment AN to a segment on RAM of length equal to |AM|, and
therefore it maps N to M by virtue of (L5)(ii) on page 184. This proves Lemma 4.25.
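The second case of Lemma 4.25 can be illustrated numerically. The sketch below assumes the plane is identified with the complex numbers; the helper names reflect and angle_degrees and the sample data A, B, t, r are hypothetical and serve only as an illustration, not as a proof.

```python
import cmath
import math

def reflect(z, a, b):
    """Reflect the point z across the line through the distinct points a and b."""
    u = (b - a) / abs(b - a)
    return a + u * u * (z - a).conjugate()

def angle_degrees(vertex, p, q):
    """Degree of the convex angle at `vertex` whose sides pass through p and q."""
    return abs(math.degrees(cmath.phase((p - vertex) / (q - vertex))))

# Hypothetical data: the common side AB, a common degree t, a common length r.
A, B = 1 + 1j, 5 + 3j
t, r = 35.0, 2.5
u = (B - A) / abs(B - A)
M = A + r * u * cmath.exp(1j * math.radians(t))   # on one side of L_AB
N = A + r * u * cmath.exp(-1j * math.radians(t))  # on the opposite side

same_degree = math.isclose(angle_degrees(A, M, B), angle_degrees(A, N, B))
same_length = math.isclose(abs(M - A), abs(N - A))
reflection_gap = abs(reflect(N, A, B) - M)  # distance from the image of N to M
print(same_degree, same_length, reflection_gap)
```

Under the two hypotheses of the lemma (same degree, same length), the reflected image of N lands on M up to rounding error.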
Proof of Theorem G9. We will prove that if the triangles ABC and A′B′C′
satisfy |AB| = |A′B′|, |∠A| = |∠A′|, and |∠B| = |∠B′|, then there is a congruence
ϕ so that
ϕ(ABC) = A′B′C′.
To this end, we break up the proof into three steps, going from a special case to
the most general case:
Case I. We first assume that A = A′ and B = B′. In this case, either C, C′ are
already in the same half-plane of LAB or they are in opposite half-planes of LAB.
If the former, then we claim that C = C′,
so that in this situation, we need only let ϕ be I, the identity transformation. To
prove the claim, observe that because |∠CAB| = |∠C′AB| by hypothesis, the fact
that C and C′ are in the same half-plane of LAB implies that we have the equality
of rays RAC = RAC′ (Lemma 4.10 on page 190). Thus the two rays RAC and RAC′
in the following picture in fact coincide:
[Figure: the segment AB with the points C and C′ on a ray issuing from A.]
In like manner, because |∠CBA| = |∠C′BA|, we have RBC = RBC′. Therefore
the following intersections are equal:
Case II. We now let the triangles satisfy the less restrictive condition that
A = A′, but B and B′ may now be distinct. Let the degree of the convex angle
between the rays RAB and RAB′ be θ, as shown in the left picture below.
[Figure: left, the triangles ABC and A′B′C′ with A = A′ and the angle θ between the rays RAB and RAB′; right, the rotated triangle with B′ = B∗ and the point C∗.]
Then there is a θ-degree (clockwise or counterclockwise) rotation ϱ_θ around the
point A so that ϱ_θ(RAB) = RAB′ (compare the proof of Lemma 4.23 on page 241).
In the left picture above, it is a clockwise rotation, but a different configuration
may require a counterclockwise rotation from RAB to RAB′.
Let B∗ = ϱ_θ(B). Now B′ and B∗ are two points on the ray RA′B′ so that
|A′B′| = |AB| (by hypothesis) and |A′B∗| = |ϱ_θ(AB)| = |AB| (because of (L7)(ii)
on page 237 and the fact that ϱ_θ is a basic isometry). Therefore, B′ and B∗ are
two points on the same ray RA′B′ equidistant from A′; we conclude that B′ = B∗
(by (L5)(ii) on page 184); i.e., B′ = ϱ_θ(B).
Letting C∗ = ϱ_θ(C), we get
(4.12) ϱ_θ(ABC) = A′B′C∗.
Case III. Finally, we deal with the general case where no restriction is placed
on the triangles ABC and A′B′C′. Assuming that the vertices A and A′ are distinct,
let T be the translation along the vector AA′; note that T(A) = A′. If we define
B∗ = T(B) and C∗ = T(C), then
(4.14) T(ABC) = A′B∗C∗.
[Figure: the triangle ABC, its translated image A′B∗C∗, and the triangle A′B′C′.]
Now the triangles A′B′C′ and A′B∗C∗ have the vertex A′ in common. Furthermore,
The preceding proof furnishes a classic example of a proof that "progresses from
the simple to the complex", in the sense that it starts by proving a relatively simple
case (Case I), then proceeds to a slightly more complex case (Case II), and then
finally arrives at the most general case (Case III). It may remind you of the proof
of Theorem 1.7 on page 49, in which we first proved that the area of a rectangle
with sides of length 1/ℓ and 1/n is 1/ℓ × 1/n and, on the basis of this fact, we proceeded
to prove the general case, namely, that the area of a rectangle with sides of length
k/ℓ and m/n is k/ℓ × m/n. One should not hold onto the simplistic belief that all proofs
yield to such a direct attack, but this direct approach is something we must keep
in mind anytime we want to prove a theorem.
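The three cases of the proof, read in reverse order, amount to a recipe: translate, then rotate, then reflect if necessary. The following is a minimal sketch of that recipe, assuming the plane is identified with the complex numbers. The function name congruence_from_asa, the tolerance tol, and the sample triangles are hypothetical; the code assumes the ASA hypotheses hold and does not check them.

```python
import cmath
import math

def congruence_from_asa(A, B, C, A1, B1, C1, tol=1e-9):
    """Return a map phi with phi(A), phi(B), phi(C) = A1, B1, C1.

    Points are complex numbers. The two triangles are assumed to satisfy
    the ASA hypotheses; nothing is verified here.
    """
    # Case III: translate along the vector from A to A1.
    def T(z):
        return z + (A1 - A)

    # Case II: rotate around A1 so that T(B) lands on B1.
    # Because |AB| = |A1B1|, the multiplier w has absolute value 1.
    w = (B1 - A1) / (T(B) - A1)

    def rot(z):
        return A1 + w * (z - A1)

    # Case I: if the image of C is already C1, we are done.
    if abs(rot(T(C)) - C1) < tol:
        return lambda z: rot(T(z))

    # Otherwise the two points lie on opposite sides of the line A1B1,
    # and a reflection across that line finishes the job.
    u = (B1 - A1) / abs(B1 - A1)

    def refl(z):
        return A1 + u * u * (z - A1).conjugate()

    return lambda z: refl(rot(T(z)))

# Hypothetical test data: the second triangle is the image of the first
# under a map g that includes a reflection, so all three steps are needed.
A, B, C = 0j, 4 + 0j, 1 + 3j

def g(z):
    return cmath.exp(1j * math.radians(70)) * z.conjugate() + (5 - 2j)

A1, B1, C1 = g(A), g(B), g(C)
phi = congruence_from_asa(A, B, C, A1, B1, C1)
max_error = max(abs(phi(P) - Q) for P, Q in [(A, A1), (B, B1), (C, C1)])
print(max_error)  # expected to be at the level of rounding error
```

The order of the steps mirrors the proof: each step is arranged so that the remaining problem falls under the previous, simpler case.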
We will prove in Section 6.6 of [Wu2020b] that every isometry of the plane is
a congruence. In other words, every isometry is nothing but the composition of
a finite number of basic isometries. This underscores the importance of the basic
isometries: they are the basic building blocks of all the isometries of the plane. Once
we know this, then we know that if a transformation preserves distance, it must
be a congruence and therefore it is automatically surjective and it automatically
preserves lines, segments, rays, angles, and degrees of angles. However, until we can
prove this fact about isometries, we cannot assume that an isometry has all these
desirable properties. So be careful.
Before we leave the mathematical discussion of this chapter, we state the last
assumption we need for the development of plane geometry.
(L8) (Crossbar axiom) Given a convex angle AOB, for any point C not
equal to O in ∠AOB, the ray ROC intersects the segment AB (indicated as point
D in the following figure).
[Figure: a convex angle ∠AOB, a point C in the angle, and the ray ROC meeting the segment AB at the point D.]
You may regard the crossbar axiom as frivolous, because "what else can the
ray ROC do if it does not intersect AB"? First of all, so long as you consider
this statement to be obvious, then our objective of agreeing on a common starting
point has been met: we do want to assume only believable facts. As to whether
the crossbar axiom is frivolous, we should point out that up to this point, none of
the assumptions (L1)–(L7) explicitly guarantees that the ray ROC must intersect
AB.47 The purpose of the crossbar axiom is therefore to firm up the intuitive idea
that a ray is indeed "straight and infinite in one direction" and therefore cannot
stay inside the bounded triangular region OAB. For example, (L8) guarantees that
the angle bisector (see page 192) of an angle in a triangle must intersect the oppo-
site side, any two medians (see page 252 for the definition) must meet, and the two
diagonals of a parallelogram must intersect each other (page 260). It will also make
a rather dramatic appearance in unexpected places, e.g., the proof of Theorem G14
(hidden in the proof of (♣) on page 262), the proof of Theorem G16 on page 270,
the proof of Ceva’s theorem in Section 6.7 of [Wu2020b], etc.
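In coordinates, the point D promised by the crossbar axiom can be computed explicitly. The sketch below is only an illustration under the assumption that points are given by Cartesian coordinates (compare the warning about the use of coordinates on page 207); the names cross, in_convex_angle, and crossbar_point and the sample points are hypothetical, and "in the angle" is taken here to mean strictly inside it.

```python
from fractions import Fraction

def cross(u, v):
    """The 2-by-2 determinant of the vectors u and v."""
    return u[0] * v[1] - u[1] * v[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def in_convex_angle(O, A, B, C):
    """True if C is on the same side of L_OA as B and the same side of L_OB as A."""
    a, b, c = sub(A, O), sub(B, O), sub(C, O)
    return cross(a, c) * cross(a, b) > 0 and cross(b, c) * cross(b, a) > 0

def crossbar_point(O, A, B, C):
    """Solve O + s(C - O) = A + t(B - A) and return (D, s, t)."""
    if not in_convex_angle(O, A, B, C):
        raise ValueError("C is not inside the convex angle AOB")
    d, e, ao = sub(C, O), sub(B, A), sub(A, O)
    s = Fraction(cross(ao, e)) / cross(d, e)  # parameter along the ray R_OC
    t = Fraction(cross(ao, d)) / cross(d, e)  # parameter along the segment AB
    D = (O[0] + s * d[0], O[1] + s * d[1])
    return D, s, t

O, A, B = (0, 0), (4, 0), (1, 3)
C = (1, Fraction(1, 2))
D, s, t = crossbar_point(O, A, B, C)
print(D, s, t)
```

Here s > 0 says that D lies on the ray ROC, and 0 < t < 1 says that D lies on the segment AB, which is what (L8) asserts.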
Mathematical Aside: Since we are assuming that every line can be made into
a number line (see (L3) on page 167), we get easy access to the definition of the
subtle concept of "betweenness" and its basic properties (pp. 167ff.; also see the
more elaborate discussion in Chapter 8 of [Wu2020b]). Such being the case, it is
known that the crossbar axiom can be deduced from the plane separation property
(L4) (page 176) if our goal is to pursue a strictly axiomatic development of plane
(Euclidean) geometry. See page 116 of [Greenberg]. However the proof is too
technical to be of real educational value for the purpose of teaching in K–12.
Exercises 4.6.
(1) Prove that congruence is a transitive relation (see page 241).
(2) Prove that if ϕ is a congruence and S is a convex set in the plane, then
ϕ(S) is also convex.
(3) Prove Theorem G8 (SAS) on page 245.
(4) Prove that any two circles with equal radii are congruent. (Caution: This
is a slippery proof. Be very precise.)
(5) Explain why two triangles with two pairs of congruent sides and one pair
of congruent angles need not be congruent.
(6) (This exercise makes use of a coordinate system; see the warning about
the use of coordinates on page 207.) In the picture below, let C denote
the lower left corner of the black figure. Suppose |∠CAB| = 45◦ , |AB| =
|BC|, and line L makes an angle of 45 degrees with line LAB .
Let F be the counterclockwise rotation of 45◦ with center at the point
B, let G be the clockwise rotation of 90◦ with center at the point A, let
H be the reflection across the line L, and let J be the translation along
the vector AB. Furthermore, let S denote the black figure at the point C.
[Figure: the line LAB with the points A and B, the line L, and the black figure S with its lower left corner at C.]
Using a separate sketch for each of the following items, indicate the
positions of (a) G(S), (b) F (G(S)), (c) G(H(S)) and H(G(S)) (are they
equal?), (d) J(S), (e) J(F (S)) and F (J(S)), (f) H(J(S)) and J(H(S)),
(g) G(H(J(S))), and (h) J(H(F (S))).
(7) Recall that the opposite sides of a rectangle have the same length (see page
226), so that knowing the lengths of a pair of adjacent sides is equivalent
to knowing the lengths of all four sides of a rectangle. It is common to
refer to a rectangle with a pair of adjacent sides of lengths a and b as
a rectangle with side lengths a and b. Now let R1 and R2 be two
rectangles with side lengths a1 , b1 and a2 , b2 , respectively. Prove that
R1 ≅ R2 if and only if either a1 = a2 and b1 = b2, or a1 = b2 and a2 = b1.
We mentioned in the overview (pp. 157ff.) that the content of this chapter
should also be taught in middle school, but in a more intuitive and informal man-
ner. The Common Core Standards are in agreement ([CCSSM], page 55): the
eighth-grade geometry standards call for an understanding of "congruence and sim-
ilarity using physical models, transparencies, or geometry software" and the use of
"informal arguments to establish facts about the angle sum and exterior angle of
triangles, about the angles created when parallel lines are cut by a transversal, and
the angle-angle criterion for similarity of triangles". It will take a delicate touch to
achieve a balance between the nurturing of geometric intuition and the promotion
of reasoning. Because this middle school issue has been taken up in Chapters 4
and 5 of [Wu2016a] (also see [Wu2010a]), we will instead concentrate on the cor-
responding problem in high school: how to introduce geometry in the high school
curriculum.
Transformations of the plane and concepts of surjectivity and injectivity are
taxing topics even for college students, and it would not do to subject the average
high school student to a treatment with the same degree of precision as in this chap-
ter and the next.48 A teacher will have to judiciously simplify the content of these
chapters in order to convey to students their main message, namely, that congru-
ence and similarity are precise mathematical concepts. One suggestion is to confine
the discussion of transformations only to bijections (one-to-one correspondences) in
the plane and mention general transformations only in a few exercises or as activi-
ties for enrichment (cf. the examples on pp. 207ff.). Another suggestion is to make
liberal use of transparencies at every turn when discussing basic isometries and di-
lations. See, for example, the Activities of Sections 4.4 and 4.5 (pp. 221, 230, and
235) and the Activity of Section 5.1 on page 258. One can even assign homework
problems on such activities and ask students to report their findings on the effects
of various transformations. With enough such hands-on experiences, students will
build up their geometric intuition about the basic isometries and therefore about
isometries in general. Using transparencies in a similar manner to illustrate the
composition of transformations is also highly recommended.
48 One of the considerations that entered into the design of this geometry curriculum is
precisely the awareness that students may initially experience difficulty with these concepts.
Therefore we want these concepts to be taught, first intuitively in middle school and then more
formally in high school.
4.7. A BRIEF PEDAGOGICAL DISCUSSION OF PROOFS 253
Such hands-on activities are meant, of course, to supplement the definitions
and the accompanying mathematical discussions, not to replace them. At the same
time, a high school presentation of some of the definitions, lemmas, and theorems
can probably afford to specify that they be "skipped on first reading" so that they
can be revisited later (and even then, perhaps soft-pedaled). For example, Theorem
G1 (page 220) is so basic that ample time should be devoted to its proof, but the
proofs of Theorems G2–G3 (pages 223–224) do not quite occupy the same exalted
status. Recall that the purpose of these two theorems is to show that, from a
point outside a given line, one can drop one and only one perpendicular to the
given line. This fact is needed to make the definition of a reflection well-defined.
However, since this fact is so intuitively obvious, it may be pedagogically legitimate
to simply state these two theorems but postpone their proofs so as to get to the
definition of reflection as quickly as possible. Pedagogical decisions of this type are
always the prerogative of the teacher. Having said that, we would suggest explicitly
that Theorem G4 be carefully proved because the method of proof is powerful. In
fact, the proof of Theorem G18 (page 277) can be given right after Theorem G4 if
so desired.
As another illustration of what can be done to smooth geometric instruction
in high school, it is not usually realized that the definitions of the alternate inte-
rior angles and corresponding angles of a transversal are by no means simple (see
page 276). While the precise definitions should be given (they justify why the con-
cepts of a half-line in Lemma 4.5 and a half-plane in (L4) are indispensable), one
should not make a big deal of the precision; it is boring and cumbersome. Unless
absolutely necessary, one should simply use a picture to identify alternate interior
angles. There is a similar phenomenon with certain proofs. Take Theorem G14, for
example. In this case, there is in fact an explicit recommendation on pp. 262ff. to
suppress some of the subtleties inherent in the proof. Please note that this recom-
mendation is based on two considerations: (a) if the subtle point is not explicitly
brought out by the teacher, an overwhelming majority of the students will not be
aware of it, and (b) in the context of mathematics learning, learning about the
proof of such a subtle point at this stage of students’ mathematical development
is of secondary importance. What we are suggesting is that good pedagogy always
involves some subjective judgment: there is no ironclad rule that dictates in a given
situation what should be emphasized and what could be soft-pedaled. Some com-
promise is essential when the conflict between what is ideal and what is achievable
becomes extreme.
Overall, we wish to advocate a certain flexibility in teaching proofs in geometry.
There should be no doubt about the importance of proofs all through the school
curriculum. But the main message we are trying to convey—as we did in the dis-
cussion of axiomatic developments of plane geometry on page 160—is that a slavish
adherence to the mindset of "proving absolutely everything" is counterproductive
when it comes to the teaching of high school geometry. Compare the discussion on
pp. 160ff. As an example of the kind of flexibility we have in mind, it would be
at times worthwhile to skip a less interesting proof and use the time for a detailed
discussion of the evolution of another proof. A potential candidate for the latter
would be the proof of Theorem G15 on page 263.
The preceding discussion is a reminder of the realities about the mathemati-
cal education of teachers: what we teach teachers is not always what we can use,
unchanged, to teach school students. Pedagogical considerations will necessarily
modify pure mathematical knowledge, or, in the terminology of [Wu2006], the
"mathematical engineering" aspect of school mathematics education cannot be ig-
nored. In particular, these volumes contain more proofs than is optimal for students’
mathematics learning. Experience in the actual classroom will suggest the proper
give-and-take between what ideally should be taught and what could actually be
taught. What such pedagogical considerations cannot do, however, is lighten your
mathematical load as a prospective teacher. If you hope to teach certain concepts
or certain proofs effectively by making the correct mathematical decisions, then you
must know the whole mathematical story first before you can decide what message
must be conveyed and which details can be harmlessly left out. One cannot write a
faithful twenty-page plot outline of War and Peace without first carefully reading
through the thousand and more pages of the uncondensed version. Likewise, with-
out a complete knowledge of the relevant mathematics, you will not know what to
keep and what to leave out in your lessons because you won’t be able to distinguish
between what is truly essential and what is expendable. Besides, if by chance you
get a precocious youngster who wants the whole truth and nothing but the truth,
then you will have to supply the whole truth and nothing but the truth. This too
is part of your basic duty as a teacher, and these volumes are designed to get you
ready for such contingencies.
CHAPTER 5
This chapter introduces the other basic concept in school geometry: similarity.
Like congruence, similarity has not fared well in TSM.1 Middle school students
are taught that two sets are similar if they are the same shape but not the same
size. Intuitively, this is a useful description of similarity, but as in the case of con-
gruence, TSM has the habit of confusing nice-sounding intuitive statements with
precise mathematical definitions. When students get to high school under the illu-
sion that "same shape but not necessarily the same size" is all they need to know
about similarity, they are shocked to be confronted with the fact that similarity
henceforth will only mean equal angles and proportional sides for triangles but
no further mention is made about the similarity of curved figures. Consequently,
students’ understanding of similarity upon graduation from high school consists of
two disconnected sound bites: a definition of similar triangles in terms of propor-
tional sides and equal angles and a vague conception of "same shape but not the
same size" for anything other than triangles. Thus TSM even fails to give students
a correct understanding of a concept as basic as similarity. Fortunately, a correct
definition of similarity, one that is discussed below, can be easily introduced as early
as middle school through ample hands-on experiments plus a judicious amount of
reasoning. A more detailed description of how this can be accomplished in middle
school has been given in Chapter 4 of [Wu2016a]. We are happy to point out that
this curricular advocacy has been adopted by the CCSSM ([CCSSM]), and if the
CCSSM is rigorously implemented, at least one of the egregious errors of TSM will
be rectified in the near future.
We will not pretend that a successful implementation in school classrooms of
this new point of view will be easy or straightforward. It will require some hard
work by knowledgeable teachers to bring about a true understanding of similarity.
The main goal of this chapter is to provide teachers with the content knowledge
they will need for such a successful implementation. In this context, what was said
in Section 4.7 (pp. 252ff.) is just as relevant, if not more so, to the material of this
chapter.
256 5. DILATION AND SIMILARITY
case—on account of its simplicity—is of independent interest, not the least because
its proof requires two new characterizations of a parallelogram.
Statement of the theorem (p. 256)
Two characterizations of a parallelogram (p. 259)
Proof of FTS when r = 1/2 (p. 263)
The number r in the theorem is generally referred to as the scale factor. The
statement above on DE ∥ BC is a standard abuse of notation for LDE ∥ LBC; i.e.,
the line containing the segment DE is parallel to the line containing the segment
BC. We will continue to use this abuse of notation for the rest of these volumes.
5.1. THE FUNDAMENTAL THEOREM OF SIMILARITY 257
In applications, it is sometimes more convenient to assume, instead of
(5.1) |AD|/|AB| = |AE|/|AC|,
the equality
(5.2) |AD|/|DB| = |AE|/|EC|.
Because the proof of the equivalence of (5.1) with (5.2) is entirely elementary and
straightforward, we will leave it as an exercise (Exercise 1 on page 267).
We will give a proof of the special case of FTS where r = 1/2 at the end of this
section (page 263), i.e., for the case that D and E are midpoints of AB and AC,
respectively. To this end, we first give an equivalent formulation of FTS that is
sometimes more convenient to use:
Theorem G11 (FTS*). Let △ABC be given, and let D be a point on the ray
RAB not equal to A or B. Let ℓ be the line parallel to BC and passing through D.
Then ℓ intersects the ray RAC at a point E, and
|AD|/|AB| = |AE|/|AC| = |DE|/|BC|.
[Figure: triangle ABC with D on AB, E on AC, and DE parallel to BC.]
Let us explain in greater detail what we mean when we say Theorems G10
and G11 are equivalent. It means that, if we assume the validity of (L1)–(L8)
in Chapter 4 and Theorems G1–G9, then Theorem G10 implies Theorem G11 and,
conversely, Theorem G11 implies Theorem G10. In other words, if you know The-
orem G10 is true, then you also know Theorem G11 is true and, conversely, if you
know Theorem G11 is true, then you also know Theorem G10 is true. These two
theorems are therefore interchangeable in the precise sense above. Leaving the de-
tails of the converse to an exercise (Exercise 9 on page 268), we will prove that,
assuming Theorem G10, we can prove Theorem G11.
Proof of Theorem G11. Let the line ℓ passing through D and parallel to LBC be
as in the theorem. We will first prove that ℓ intersects not just the line LAC, but
the ray RAC; it will be an indirect proof. Let E be chosen on the ray RAC so that
|AE| = (|AD|/|AB|) · |AC|.
[Figure: two configurations of triangle ABC with the points D and E: one with D between A and B, the other with D on the ray RAB beyond B.]
Then by the cross-multiplication algorithm on page 22 (applied to real numbers by
appealing to FASM; see page 133), we have
|AD|/|AB| = |AE|/|AC|.
By FTS (Theorem G10), we have LDE ∥ LBC. Since ℓ ∥ LBC by hypothesis, ℓ and
LDE are two lines both passing through D and parallel to LBC. By the parallel
postulate (page 165), the lines ℓ and LDE coincide, so that ℓ does intersect the ray
RAC at E. Moreover, by FTS once again,
|AD|/|AB| = |AE|/|AC| = |DE|/|BC|.
The proof of Theorem G11 is complete (when Theorem G10 is assumed).
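The conclusion of Theorem G11 is easy to test on concrete triangles. The following sketch assumes Cartesian coordinates; the function names meet_parallel and fts_ratios, the sample triangle, and the two values of the scale factor r are hypothetical choices for illustration only.

```python
import math

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def meet_parallel(A, B, C, D):
    """Intersect the line through D parallel to BC with the line L_AC.

    Solves D + v(C - B) = A + u(C - A) for u and returns (E, u).
    """
    bc = (C[0] - B[0], C[1] - B[1])
    ac = (C[0] - A[0], C[1] - A[1])
    ad = (D[0] - A[0], D[1] - A[1])
    u = cross(ad, bc) / cross(ac, bc)
    return (A[0] + u * ac[0], A[1] + u * ac[1]), u

def fts_ratios(A, B, C, r):
    """Take D on the ray R_AB with |AD| = r|AB| and return the ratios in Theorem G11."""
    D = (A[0] + r * (B[0] - A[0]), A[1] + r * (B[1] - A[1]))
    E, u = meet_parallel(A, B, C, D)
    return {
        "u": u,  # u > 0 means E lies on the ray R_AC
        "AD/AB": math.dist(A, D) / math.dist(A, B),
        "AE/AC": math.dist(A, E) / math.dist(A, C),
        "DE/BC": math.dist(D, E) / math.dist(B, C),
        "AD/DB": math.dist(A, D) / math.dist(D, B),
        "AE/EC": math.dist(A, E) / math.dist(E, C),
    }

A, B, C = (0.0, 0.0), (6.0, 0.0), (2.0, 5.0)
inside = fts_ratios(A, B, C, 0.4)   # D between A and B
beyond = fts_ratios(A, B, C, 1.5)   # D on the ray R_AB beyond B
print(inside)
print(beyond)
```

In both configurations the first three ratios agree with r, and the last two agree with each other, which illustrates the equivalence of (5.1) and (5.2) without proving it.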
Activity. We will assume that the lines of lined papers are equidistant
parallel lines, in the sense that the distances between adjacent parallel lines (see
page 228 for the definition) are equal. Therefore on a given transversal LAB , the
segments intercepted on LAB by adjacent parallel lines will be all of the same length
(see Exercise 11 on page 238). It follows that if a segment on LAB has its endpoints
on two of the lines on a piece of lined paper (such as AB below), then any one of
the parallel lines in between will divide the segment into two parts whose lengths
can be instantly read off by counting the number of parallel lines. To be explicit,
consider the following situation:
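[Figure: triangle ABC drawn on lined paper with BC along one of the lines; two of the lines in between meet AB and AC at P, Q and at D, E, respectively.]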
If the length of the segment on AB trapped between two adjacent parallel lines is
s, then |AD| = 3s and |AB| = 5s. Because the lines LBC and LDE are parallel,
|AP|/|AB| = |AQ|/|AC| = 2/5.
Theorem G11 now predicts that
(5.5) |PQ|/|BC| = 2/5.
Again, it should be a satisfying experience for students to verify the prediction (5.5)
by directly measuring |BC| and |P Q|. In a school classroom, such an experiment,
when repeated for many different variations of this configuration, should provide an
excellent opportunity for students to build up their intuition about Theorem G11
which, as we noted, is equivalent to FTS.
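Before measuring on paper, the prediction can be rehearsed with exact arithmetic. In the sketch below, the ruled lines are assumed to be the horizontal lines y = 0, −1, −2, ..., and the coordinates of A, B, C are hypothetical; the helper name point_on_line is likewise an assumption made for this illustration.

```python
from fractions import Fraction

def point_on_line(k, A, X, n):
    """Point where the k-th ruled line below A meets the segment AX.

    A is on line 0 and X is on line n; the ruled lines are equidistant.
    """
    t = Fraction(k, n)
    return (A[0] + t * (X[0] - A[0]), A[1] + t * (X[1] - A[1]))

# Hypothetical triangle: A on line 0, B and C on line 5.
A, B, C = (0, 0), (-2, -5), (4, -5)
P, Q = point_on_line(2, A, B, 5), point_on_line(2, A, C, 5)
D, E = point_on_line(3, A, B, 5), point_on_line(3, A, C, 5)

# PQ, DE, and BC lie on ruled (horizontal) lines, so each length is a
# difference of x-coordinates.
ratio_PQ = abs(Q[0] - P[0]) / abs(C[0] - B[0])
ratio_DE = abs(E[0] - D[0]) / abs(C[0] - B[0])
print(ratio_PQ, ratio_DE)  # 2/5 3/5
```

Students can change the coordinates of B and C freely; as long as both stay on the same ruled line, the two ratios do not change.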
The proof of FTS for the special case of r = 1/2 requires the ability to recognize
a parallelogram when we see one. This subsection takes a step in that direction.
First, a general observation.
Theorem G12. Let O be a point on a line L, and let ϱ be the rotation of 180◦
around O. Then ϱ maps each half-plane of L to its opposite half-plane.
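The statement of Theorem G12 can be illustrated in coordinates. In this sketch, which is an illustration and not a proof, the rotation of 180◦ around O is assumed to be given by the formula x ↦ 2O − x; the names side and half_turn and the sample points are hypothetical.

```python
def side(p, q, x):
    """+1 or -1 according to the half-plane of the line through p and q that
    contains x; 0 if x is on the line."""
    c = (q[0] - p[0]) * (x[1] - p[1]) - (q[1] - p[1]) * (x[0] - p[0])
    return (c > 0) - (c < 0)

def half_turn(center, x):
    """Rotation of 180 degrees around `center`."""
    return (2 * center[0] - x[0], 2 * center[1] - x[1])

O, Q = (1, 2), (4, 3)  # two points determining the line L
points = [(0, 5), (3, -1), (7, 7), (-2, -6), (2, 9)]

# Each sample point and its image lie in opposite half-planes of L.
flips = [side(O, Q, half_turn(O, x)) == -side(O, Q, x) for x in points]
# The line L itself is mapped to L: the image of Q is still on L.
stays_on_line = side(O, Q, half_turn(O, Q)) == 0
print(flips, stays_on_line)
```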
Proof. We will give a proof similar to the one outlined in Exercise 4 on page 228.
It is well to point out that, at the outset, the two diagonal segments AC and BD
are not known to intersect each other (see the Pedagogical Comments after the
proof), much less bisect each other.
Let M be the midpoint of the diagonal AC, and let ϱ be the rotation of 180◦
around M. In the proof of Theorem G4 (see (4.7) on page 227), we proved that
ϱ(B) = D. Since ϱ is a 180◦ rotation, the points B, M, and D are collinear, and
since ϱ is an isometry and ϱ(MB) = MD, M is also the midpoint of the diagonal
BD. Thus AC and BD bisect each other.
Next, we look at the converse. Suppose a quadrilateral ABCD (the definition
of a quadrilateral is given on p. 171) has the property that its diagonals AC and
BD meet at M and M is the midpoint of both AC and BD. We will prove that
ABCD is a parallelogram. As usual, let ϱ be the rotation of 180◦ around M. Then
ϱ(B) = D and ϱ(C) = A. Thus ϱ(LBC) = LAD because, by (L7)(i), ϱ maps lines
to lines (page 237) and there is only one line passing through A and D (by (L1)
on page 165). By Theorem G1 (page 220), LBC ∥ LAD. In the same way, we can
prove LAB ∥ LCD. This proves that ABCD is a parallelogram, and the proof of
Theorem G13 is complete.
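The converse direction just proved can be mirrored in coordinates: starting from A, B, C, the point D is obtained by a half turn of B around the midpoint M of AC, and the resulting quadrilateral has parallel opposite sides. This is a sketch under the assumption of Cartesian coordinates; the names cross, sub, and is_parallelogram and the sample points are hypothetical, and the quadrilaterals are assumed to be nondegenerate.

```python
from fractions import Fraction

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def is_parallelogram(A, B, C, D):
    """Opposite sides of the (nondegenerate) quadrilateral ABCD are parallel."""
    return cross(sub(B, A), sub(C, D)) == 0 and cross(sub(C, B), sub(D, A)) == 0

A, B, C = (0, 0), (5, -1), (6, 4)
M = (Fraction(A[0] + C[0], 2), Fraction(A[1] + C[1], 2))  # midpoint of AC
D = (2 * M[0] - B[0], 2 * M[1] - B[1])                    # half turn of B around M

bisect_each_other = (B[0] + D[0], B[1] + D[1]) == (2 * M[0], 2 * M[1])
print(D, bisect_each_other, is_parallelogram(A, B, C, D))

# A fourth vertex for which M is not the midpoint of the other diagonal.
D2 = (1, 6)
print(is_parallelogram(A, B, C, D2))
```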
Pedagogical Comments. The preceding proof shows indirectly that the di-
agonals AC and BD of a parallelogram meet at a point M . The first comment is
that a direct proof of this fact can be given using the crossbar axiom (page 250),
but it is tedious. A second comment is that few high school students will ever
conceive of the need for such a proof. In a beginning geometry class, one should
probably take something this obvious for granted. Once the formal proof is over,
however, a teacher may want to point out that this visually obvious fact actually
requires a proof because a quadrilateral ABCD could look like this:
[Figure: a quadrilateral ABCD, not convex, in which the vertices B and D lie on the same side of the line LAC.]
and the two diagonal segments AC and BD do not intersect. The question then be-
comes why this cannot happen to the diagonals of a parallelogram: which property
exactly about a parallelogram makes its diagonals intersect? If we can get students
to be curious about the answer to this question, then (and perhaps only then) would
they find such a proof to be meaningful. End of Pedagogical Comments.
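For a classroom discussion of this point, it may help to test whether two diagonal segments intersect on concrete examples. The sketch below uses a standard orientation test in Cartesian coordinates; the function names and the coordinates of the two quadrilaterals (a "dart" like the one pictured, and a parallelogram) are hypothetical choices for illustration.

```python
def orient(p, q, r):
    """Positive, negative, or zero according to the side of line pq on which r lies."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def on_segment(p, q, r):
    """Assuming p, q, r are collinear: does r lie on the segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(p1, p2, p3, p4):
    """True if the segments p1p2 and p3p4 have a point in common."""
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # each segment crosses the line containing the other
    if d1 == 0 and on_segment(p3, p4, p1):
        return True
    if d2 == 0 and on_segment(p3, p4, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, p3):
        return True
    if d4 == 0 and on_segment(p1, p2, p4):
        return True
    return False

# A "dart": B lies inside triangle ACD, so B and D are on the same side of L_AC.
A, B, C, D = (0, 0), (2, 1), (4, 0), (3, 4)
dart_diagonals_meet = segments_intersect(A, C, B, D)

# A parallelogram with vertices in order P, Q, R, S.
P, Q, R, S = (0, 0), (4, 0), (5, 3), (1, 3)
parallelogram_diagonals_meet = segments_intersect(P, R, Q, S)

print(dart_diagonals_meet, parallelogram_diagonals_meet)  # False True
```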
Proof. The fact that a parallelogram has a pair of sides which are equal and
parallel is implied by Theorem G4, page 226. We prove the converse. Let ABCD
be a quadrilateral so that |AD| = |BC| and AD ∥ BC. We have to prove that
ABCD is a parallelogram. It suffices to prove that AB ∥ CD. Let ϱ be the
rotation of 180 degrees around the midpoint M of the diagonal AC.
[Figure: the quadrilateral ABCD with the midpoint M of the diagonal AC.]
parallel postulate (page 165) implies that ϱ(LAD) = LBC. In particular, ϱ(D) lies
in LBC. Denote ϱ(D) by D′, so D′ ∈ LBC. We are going to show that D′ = B.
To this end, observe that on the line LBC, D′ either lies in the ray RCB or in the
opposite ray so that B ∗ C ∗ D′, as shown:
[Figure: the line LBC, with the point C = ϱ(A) between B and D′.]
Remark. In the preceding proof, the fact that B and D lie in opposite half-planes
of the diagonal line LAC was taken for granted and this fact allowed us to
conclude that B = ϱ(D). But as we have seen from the picture on page 261, the
vertices B and D of a quadrilateral ABCD may very well lie on the same side of
the diagonal line LAC. In that case, ϱ(D) and B would be in opposite half-planes
and the preceding proof of Theorem G14 would fall apart. Therefore this proof of
Theorem G14 tacitly assumes that the following assertion holds:
To prove Step I, we claim that B lies in the convex angle ∠DAC (see page 182 for
the definition of a convex angle). This means we have to show both of the following:
(i) B and D lie on the same side of LAC and (ii) B and C lie on the same side of
LAD. Now (i) is true because this is our working hypothesis. To prove (ii), observe
that since LAD ∥ LBC, the segment BC cannot intersect LAD. By assumption
(L4)(ii) on page 176, B and C lie on the same side of LAD, as desired. Thus B lies
in ∠DAC. The crossbar axiom (page 250) now implies that RAB intersects CD,
and Step I is proved.
Next, to prove Step II, assume that A and B lie on the same side of LCD and we
will show that this is impossible. We claim that B lies in the convex angle ∠DCA.
As usual, this means we have to prove both of the following: (a) B and D lie on
the same side of LAC and (b) B and A lie on the same side of LDC . There is no
need to prove (a) because it is our overall working hypothesis (see (i) above), and
(b) is true because we are assuming A and B lie on the same side of LCD . Thus
the claim is true that B lies in ∠DCA. By the crossbar axiom, RCB intersects AD,
and this contradicts the fact that LBC ∥ LAD. The proof of Step II is complete.
We can now deduce the contradiction we are after, namely, that if B and D lie
on the same side of LAC , then the segments AB and CD must intersect. Indeed,
Step I shows that the line LAB intersects the segment CD at a point X, and Step
II shows that the line LCD intersects the segment AB at a point Y . Now X and Y
both lie on LAB and LCD , and since two distinct lines intersect at exactly one point,
X = Y . But X lies on CD and Y lies on AB, therefore the segments CD and AB
intersect at X (= Y ). This contradicts the fact that, ABCD being a quadrilateral,
only its adjacent sides can intersect (see the definition of a polygon on page 171).
So B and D lie on opposite sides of LAC after all, and we have proved (♣).
We can now prove the special case of FTS (page 256) when r = 1/2:
[Figure: triangle ABC with D on AB and E on AC.]
Analysis. Let us see how we can prove such a theorem. The situation is
this: we know (L1)–(L8) and Theorems G1–G11, and we are confronted with the
statement of Theorem G15. The question is how we can prove (among other things)
|BC| = 2|DE|. This is awkward, because we know how to prove two segments have
the same length—find a congruence that carries one segment to the other—but not
one segment being equal to twice another. One way out of this predicament is to
look for a segment that has twice the length of DE and then we can try to prove
that this segment has the same length as BC. In this light, extending the segment
DE to F so that DF has twice the length of DE (i.e., |DF | = 2|DE|) would be a
very natural move, as shown:
[Figure: triangle ABC with D on AB and E on AC, and the segment DE extended to F so that |DF| = 2|DE|.]
It would be equally natural at this point to connect C to F by a line segment
and, once this is done, we see that if we can prove the quadrilateral DBCF is a
parallelogram, then Theorem G4 on page 226 would immediately yield the desired
conclusion that |BC| = |DF |. So once we get to the "augmented figure" with the
additional line segments EF and CF added to the original figure, we see a clear
path to our goal, the proof of the theorem.
The line segments EF and CF that are added to the original picture of △ABC
together with the segment DE are called auxiliary lines in the school education
literature. TSM makes a big deal out of "adding auxiliary lines" as a kind of magical
tool for learning how to prove theorems, but there is in fact nothing "magical" about
these "add-ons". Think of a theorem as an edifice; then the analog of proving a
theorem is finding ways to build a given edifice. Of course when one shows off
an edifice, one first takes down all the scaffolding and removes all traces of the
construction process. If we are serious about building the edifice, however, we must
first mentally remove the pristine picture of the edifice and put back the scaffolding
and begin thinking about the messy construction process itself. Likewise, when a
textbook presents a theorem, the textbook will only give the finished product—the
geometric figure that goes with the theorem—without including the messy details
of the thinking process that may have gone into the proof of the theorem. If we
want to learn to prove the theorem ourselves, we cannot be limited by the pristine
figure attached to the theorem but must put back some of the lines or circles (the
"scaffolding") that are integral to the proof itself. So the "add-ons", such as the
segments EF and CF , are neither random nor magical, but are things that come
up naturally when we try to look for ways to better understand the "construction
process".
In the above picture, we have chosen to extend DE along the ray RDE , but
the proof does not change if we extend instead along the opposite ray RED . (See
Exercise 5 on page 267.)
Let us continue with our attempt at arriving at a proof of Theorem G15.
Referring specifically to the preceding picture now, we wish to prove |DF | = |BC|.
But how do we prove DBCF is a parallelogram? At this point, we realize that
our repertoire in this regard is extremely limited: we only have Theorems G13 and
G14 for this purpose. The latter prompts us to try to show that BD ∥ CF and
|BD| = |CF|. By hypothesis, |AD| = |DB|, so our focus shifts to proving AD ∥ CF
and |AD| = |CF|. Such would be the case if DCFA is a parallelogram. Since AC
and DF bisect each other, Theorem G13 gives us exactly what we need. Now, the
whole proof comes together.
One observation of the above analysis is worth mentioning. We see that the
reasoning process is built on a solid knowledge base: students who do not have
Theorems G1, G4, G13, and G14 at their fingertips will be handicapped in trying
to prove this theorem. (We are not saying that memorizing Theorems G1, G4, G13,
and G14 will be all it takes to prove Theorem G15, but, rather, that having easy
access to these theorems is a sine qua non for the task.) What we have is therefore
a simple illustration of the fact that doing mathematics requires a solid mem-
ory bank of basic facts. Do not listen to anyone telling you that "conceptual
understanding"—but no memorization—is all it takes to do mathematics.
Theorem G14 now implies that the quadrilateral DBCF, having a pair of sides
which are equal and parallel, is a parallelogram. Thus DF ∥ BC, which is the same
as DE ∥ BC. Furthermore, |BC| = |DF| (Theorem G4 on page 226), and since
|DE| = |EF|, we have |BC| = 2|DE|. The proof is complete.
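As a numerical sanity check of the conclusion just proved (not a substitute for the proof), the following minimal sketch places a triangle in coordinates and tests that the segment joining the midpoints D of AB and E of AC is parallel to BC and has half its length. The coordinates, the sample triangle, and the helper names `midpoint`, `cross`, and `check_midsegment` are illustrative assumptions; the text itself does not use coordinates at this point.

```python
import math

def midpoint(p, q):
    """Midpoint of the segment joining p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def cross(u, v):
    """2D cross product; it is zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def check_midsegment(A, B, C):
    """Return (cross(DE, BC), |BC|, |DE|) for D, E the midpoints of AB, AC."""
    D, E = midpoint(A, B), midpoint(A, C)
    DE = (E[0] - D[0], E[1] - D[1])
    BC = (C[0] - B[0], C[1] - B[1])
    return cross(DE, BC), math.hypot(*BC), math.hypot(*DE)

# Hypothetical sample triangle; any three noncollinear points would do.
c, bc, de = check_midsegment((1.0, 4.0), (0.0, 0.0), (6.0, 1.0))
print(c, bc, 2 * de)  # cross product, |BC|, and 2|DE|
```

A vanishing cross product corresponds to DE ∥ BC, and the last two printed numbers correspond to the two sides of |BC| = 2|DE|.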
Because of the importance of Theorem G15 in our work, we will give it a second
proof using translations. The strategy is to first prove the following Theorem G15*
directly and then use it to prove Theorem G15.
Theorem G15*. Let △ABC be given and let D be the midpoint of AB. Suppose a
line parallel to BC passing through D intersects AC at E. Then E is the midpoint
of AC and 2|DE| = |BC|.
[Figure: triangle ABC with D the midpoint of AB, E on AC so that DE is parallel to BC, and F on BC.]
Proof. Observe first of all that the pictorially obvious fact is correct, namely, that
the point E lies on the segment AC; this is because of Lemma 4.8 on page 178.
Now let T denote the translation along the vector AD. Because |AD| = |DB| by
hypothesis, the definition of T implies that T(D) = B. Since T maps any line not
equal to or parallel to LAD to a line parallel to itself (Theorem G5 on page 236),
T(LDE) is a line passing through B and parallel to LDE. Since LBC ∥ LDE by
hypothesis, the parallel postulate implies that T(LDE) = LBC so that T(E) is a
point on line LBC. Let T(E) = F. Note that F is a point on the segment BC
on account of, once again, Lemma 4.8 on page 178 (because T(E) = F implies
LEF ∥ LAB, by Lemma 4.21 on page 234). In any case, we have T(DE) = BF,
and since T is an isometry (page 236), we get
(5.6)    |DE| = |BF|.
Next consider T(LAC). Because T(A) = D and T(E) = F, it follows that T(AE) =
DF and therefore T(LAC) = LDF. Using Theorem G5 once more and the fact
that a translation is an isometry, we have LDF ∥ LAC and |AE| = |DF|. Since also
DE ∥ BC by hypothesis, DECF is a parallelogram, so that
(5.7)    |DE| = |FC| and |DF| = |EC|
by Theorem G4. Coupled with equation (5.6), the first equality of (5.7) implies
2|DE| = |BC|. Finally, the equality |AE| = |DF |, together with the second equal-
ity of equation (5.7), implies that E is the midpoint of AC. The proof of Theorem
G15* is complete.
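The construction in this proof can also be traced numerically. The sketch below, under the assumption of a coordinate plane and a hypothetical sample triangle, finds the point E where the line through D parallel to BC meets LAC, applies the translation T along the vector AD to E, and reports where E sits on AC and where F = T(E) sits on BC. The helper names are illustrative only.

```python
import math

def sub(p, q):
    """Vector from q to p."""
    return (p[0] - q[0], p[1] - q[1])

def cross(u, v):
    """2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def param_on_line(P, u, Q, v):
    """Return t such that P + t*u lies on the line {Q + s*v}; u, v not parallel."""
    return cross(sub(Q, P), v) / cross(u, v)

def g15_star(A, B, C):
    """Return (t, s, |DE|, |BF|): E = A + t(C - A), F = T(E) = B + s(C - B)."""
    D = ((A[0] + B[0]) / 2, (A[1] + B[1]) / 2)      # midpoint of AB
    t = param_on_line(A, sub(C, A), D, sub(C, B))   # line through D parallel to BC
    E = (A[0] + t * (C[0] - A[0]), A[1] + t * (C[1] - A[1]))
    F = (E[0] + D[0] - A[0], E[1] + D[1] - A[1])    # translation along the vector AD
    s = param_on_line(B, sub(C, B), F, sub(B, A))   # position of F along BC
    return t, s, math.dist(D, E), math.dist(B, F)

# Hypothetical sample triangle.
print(g15_star((1.0, 4.0), (0.0, 0.0), (6.0, 1.0)))
```

A value t = 1/2 says that E is the midpoint of AC, and s = 1/2 together with |DE| = |BF| matches 2|DE| = |BC|.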
Proof of Theorem G15 using Theorem G15*. Using the notation and picture
of Theorem G15, we draw a line L through D parallel to BC. By Theorem G15*,
L passes through the midpoint E of AC and therefore DE ∥ BC. Since Theorem
G15* also says 2|DE| = |BC|, we are done.
Exercises 5.1.
(1) Let D and E be points on sides AB and AC, respectively, of △ABC, so
that D ≠ A, B. Make use of |AB| = |AD| + |DB| and |AC| = |AE| + |EC|
to prove
[Figure: triangle ABC with points D, F on AB and E, G on AC.]
(3) Assume △ABC satisfies |AB| = |AC|. Let the angle bisector of ∠A meet
BC at F . Prove that AF is the perpendicular bisector of BC.
(4) Let D, E, F be the midpoints of sides AB, AC, and BC, respectively,
of △ABC. Prove that the four triangles ADE, DBF, DEF, EFC are
congruent.
[Figure: triangle ABC with the midpoints D of AB, E of AC, and F of BC.]
(5) Give a proof of Theorem G15 by following the proof in the text, but this
time, extend DE along the ray RED (rather than along the opposite ray
RDE ) to a point F , so that |F D| = |DE|.
(6) Let ABCD be any quadrilateral. Prove that the quadrilateral obtained by
joining midpoints of the adjacent sides of ABCD is always a parallelo-
gram.
(7) In △ABC, let D be a point on AB. Let the line passing through D and
parallel to BC intersect AC at F , and let the line passing through D and
parallel to AC intersect BC at E. Prove that if |DF |/|BC| = |DE|/|AC|,
then D is the midpoint of AB.
(8) Use the idea in the proof of Theorem G15, but do not assume FTS, to
prove that if in triangle ABC, D and E are points on AB and AC,
respectively, so that |AB| = 3|AD| and |AC| = 3|AE|, then DE ∥ BC
and |BC| = 3|DE|.
(9) Prove that FTS* (Theorem G11) implies FTS (Theorem G10). More pre-
cisely, this means: assume everything we have proved up to and including
Theorem G9 plus Theorem G11, and prove Theorem G10.
(10) Let a segment AC lie in a half-plane of a given line ℓ, and let B be the
midpoint of AC. Let LAD, LBE, and LCF be three parallel lines which
meet ℓ at D, E, F, respectively. Prove that 2|BE| = |AD| + |CF|.
5.2. Dilation
We will give the definition and prove the basic properties of dilation, the first
transformation of the plane worthy of our serious attention that is not an isometry.
It is also the new ingredient we need to define similarity. It will be clear from the
discussion in this section why the FTS is the fundamental theorem of similarity.
Definition of a dilation (p. 268)
Basic properties of dilations (p. 269)
Effects of dilations on lengths and degrees (p. 275)
Definition of a dilation
We have been considering isometries almost exclusively thus far. Now we have
to look seriously into an important class of transformations that are not isometries.
[Figure: a point P and its image P′ on the ray ROP, with |OP′| = r|OP|.]
Remark. We call attention to the fact that this definition of a dilation requires
the scale factor to be positive. Some authors allow the scale factor to be any real
number so that a negative scale factor means a dilation in the opposite direction of
O. There are pros and cons to either convention.
Observe that if r = 1 in this definition, then D is the identity map and there
is nothing to discuss. In the following, we will tacitly assume that r ≠ 1.
The definition of a dilation is starkly simple: a dilation with center at O maps
each point by "pushing out" or "pulling in" the point along the ray from O to that
point, depending on whether the scale factor r is bigger than 1 or smaller than 1,
respectively. Roughly, it is a kind of "central projection from the point O". In
particular, each ray issuing from O is mapped to the same ray. (Caution: All this
says is that the ray is mapped onto itself, but each point on the ray will in general
be mapped to another point on the same ray.) Here is an example of how a dilation
with r = 2 maps four different points (for any point P, we let the corresponding
letter with a prime, P′, denote the image D(P) of P):
[Figure: the dilation with center O and r = 2 maps the points P, Q, U, V to P′, Q′, U′, V′.]
A fundamental property of dilations, one that makes possible the simple draw-
ings of the dilation of rectilinear figures (i.e., figures composed of line segments),
is given in the following theorem. It will be clear from its proof and other proofs
related to dilation that the FTS and the parallel postulate lie at the heart of the
matter.
Proof. Let D be a dilation with center O and scale factor r. If LPQ passes through
O, then either P and Q lie on the same side of O or they lie on opposite sides of
O. In either case, the fact that D maps PQ to the segment in LPQ from D(P)
to D(Q) follows immediately from the definition of a dilation. We may therefore
assume that LPQ does not pass through O. Let P′ = D(P) and Q′ = D(Q). We
will show that D(PQ) is the segment P′Q′ joining P′ and Q′.
[Figure: the center O, the segment PQ with a point U on it, and the segment P′Q′ with the point U′ on it.]
As usual, there are two steps.
Step I. D(PQ) ⊂ P′Q′.
Step II. P′Q′ ⊂ D(PQ).
Proof. Let the dilation be D. Given a line ℓ, we must prove that D(ℓ) is also a
line. Let P, Q be points on ℓ. If ℓ passes through the center O of D, then it is
easy to see that D(ℓ) = ℓ. So we assume ℓ does not pass through O. The theorem
says D(PQ) is the segment P′Q′, where P′ = D(P) and Q′ = D(Q). Let ℓ′ be
the line containing P′Q′. We will prove D(ℓ) = ℓ′. First, we prove that D(ℓ) ⊂ ℓ′.
Thus, if U ∈ ℓ, we have to prove that D(U) ∈ ℓ′. For the proof itself, it does not
matter where U is located in ℓ, and the picture below shows the case of P ∗ Q ∗ U:
[Figure: the points P, Q, U on a line with P ∗ Q ∗ U, and their images P′, Q′, U′ under the dilation with center O.]
Proof. Recall that △ABC is the union of the segments AB, BC, and AC. Thus
we must prove D(AB) = A′B′, D(BC) = B′C′, and D(AC) = A′C′. By Theorem
G16, D(LAB) is a line joining A′ and B′ because D(A) = A′, D(B) = B′. Since
assumption (L1) implies that there is a unique line joining two points, we see that
D(LAB) = LA′B′, from which D(AB) = A′B′ immediately follows. The same is
true for D(BC) = B′C′ and D(AC) = A′C′. Corollary 2 is proved.
Armed with Theorem G16, we see that it is very simple to draw the image of
a segment by a dilation. Indeed, to draw the image of a given segment AB by a
dilation D, we simply find the image points A′ and B′ of the endpoints A and B
under D and then draw the segment joining A′ to B′. Here are two examples. In
the first, the original figure is a triangle ABC, and the scale factor is 2.5.
[Figure: triangle ABC and its image A′B′C′ under the dilation with center O and scale factor 2.5.]
[Figure: a rectilinear figure S and its image D(S) under a dilation with center O.]
You are encouraged to make many such drawings of dilated rectilinear figures.
You may also have noticed that the dilation of a rectilinear figure "has the same
shape" as the original figure. But what about the dilation of curved figures? There
is no simple replacement of Theorem G16 in that case, but in practical terms (in
a sense to be made precise below), the procedure of getting a dilated figure is only
slightly more complicated. Consider for example the following curve:
[Figure: a curve with 10 data points marked on it, and a point O.]
Now we draw the rays from O to each of the data points on the curve and dilate
the latter by a scale factor of 1/2 (i.e., "shrink it by half" in everyday language) along
these rays and ignore the curve itself. We thereby obtain a collection of points, and
these will be points on the dilated curve. It should not be difficult to discern, just
from these 10 dilated points, the general shape of the dilated curve.
[Figure: the rays from O through the 10 data points on the curve, together with the 10 dilated points.]
It is obvious that the more data points we choose on the original curve, the better
we will be able to approximate the dilated curve. To drive home this idea, let us
use 40 data points instead of 10 on the original curve and dilate them from O to
get the following picture. (We omit the rays issuing from O in the interest of visual
clarity.)
[Figure: 40 data points on the original curve and their dilated images from O.]
Next we double the number of data points and use 80 instead of 40. The approx-
imation of this finite collection of points to the curve itself is already remarkably
good.
[Figure: 80 data points on the original curve and their dilated images from O.]
If we use 400 points, then the images can almost pass for the real thing, except
that if we look very carefully we can see that they are not entirely "smooth".
[Figure: 400 data points on the original curve and their dilated images from O.]
What we have described is a basic principle of constructing the dilated image
of any object: to dilate a given object by a scale factor of r, replace the object
by a finite collection of judiciously chosen points on the object, still to be called
data points, and then simply dilate these data points one by one by a scale factor
of r. By increasing the number of data points, their dilated images yield a closer
and closer approximation to the true dilated object.2 A few experiments with this
kind of drawing (see Exercises 5 and 6 on page 281) should suffice to convey the
idea that the dilated object so obtained "has the same shape" as the original but is
magnified or shrunk by a scale factor of r (depending on whether r > 1 or r < 1).
This is how we can enlarge or shrink arbitrary figures regardless of how curved they
may be.
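The principle just described can be sketched in a few lines of code. The following is only an illustration under stated assumptions: points are given by coordinates, the function name `dilate` is hypothetical, and the sample curve (an arc of the parabola y = x² + 1) is chosen arbitrarily and is not the curve pictured in the text.

```python
import math

def dilate(point, center, r):
    """Image of `point` under the dilation with the given center and scale factor r > 0."""
    if r <= 0:
        raise ValueError("the scale factor of a dilation must be positive")
    ox, oy = center
    x, y = point
    # The image lies on the ray from the center through the point,
    # at r times the original distance from the center.
    return (ox + r * (x - ox), oy + r * (y - oy))

def sample_curve(n):
    """n data points on a hypothetical curve: y = x^2 + 1 for 0 <= x <= 2."""
    xs = [2 * k / (n - 1) for k in range(n)]
    return [(x, x * x + 1) for x in xs]

O = (0.0, 0.0)
for n in (10, 40, 400):
    images = [dilate(p, O, 0.5) for p in sample_curve(n)]
    # More data points give a finer approximation of the dilated curve.
    print(n, len(images), images[0], images[-1])
```

Plotting `images` for increasing n reproduces the experiment described above: the dilated data points trace out a curve of the same shape at half the size.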
It would be very instructive in the school classroom for students to magnify or
shrink many simple curved figures by such hands-on activities. We will elaborate
on these ideas in the next section.
Incidentally, what we have described is also the basic operating principle of
digital photography: approximate any real object by a large number of data points
on the object, and then magnify or shrink these data points by dilation.
2 Mathematical Aside: In computer graphics, one decides on a finite number of data points to
use for the purpose of dilation and then uses spline interpolation to complete the dilated image.
The following theorem summarizes what a dilation does to lengths and degrees.
Theorem G17. Let D be a dilation with center O and scale factor r. Then:
(a) D is a bijection. In fact, its inverse is the dilation with the
same center O but with a scale factor 1/r.
(b) For any segment AB, |D(AB)| = r|AB|.
(c) D maps angles to angles and preserves degrees of angles.
Remark. Observe the delicate point that the statements of part (b) and part
(c) depend on the validity of Theorem G16 and its Corollary 1. Indeed, without
knowing that D(AB) is a segment, the notation |D(AB)| would not even make
sense (because the notation |σ| only makes sense when σ is a segment or an angle),
and without knowing that D maps rays to rays, it would not make sense to say
that D maps angles to angles.
Proofs of parts (a) and (b). (a) Let D′ be the dilation with center O and scale
factor 1/r. From the definition of a dilation, it is easy to check that D ◦ D′ = I =
D′ ◦ D. Thus D is a bijection (Theorem 4.15 on page 211).
Part (b) has been implicitly proved in the proof of Theorem G16. Indeed,
in the notation above, if P′ = D(P) and Q′ = D(Q), then we have shown that
D(PQ) = P′Q′. If LPQ contains O, then the fact that |D(PQ)| = r|PQ| follows
trivially from the definition of a dilation. So suppose LPQ does not pass through
O; then LP′Q′ also does not pass through O as the definition of D clearly shows.
We may therefore look at △OP′Q′, and FTS implies that |P′Q′|/|PQ| = r; i.e.,
|D(PQ)| = r|PQ|. Since P and Q are arbitrary points, (b) is proved.
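Parts (a) and (b) are easy to test numerically. The sketch below is a sanity check under the assumption of a coordinate plane, with hypothetical names and sample values: it dilates two points, compares the distance between the images with r times the original distance, and then undoes the dilation with scale factor 1/r.

```python
import math

def dilate(p, center, r):
    """Dilation with the given center and scale factor r > 0, in coordinates."""
    return (center[0] + r * (p[0] - center[0]),
            center[1] + r * (p[1] - center[1]))

O, r = (1.0, -2.0), 2.5            # hypothetical center and scale factor
A, B = (3.0, 1.0), (-1.0, 4.0)     # hypothetical endpoints of a segment AB

A_img, B_img = dilate(A, O, r), dilate(B, O, r)

# Part (b): the image segment has r times the length of AB.
print(math.dist(A_img, B_img), r * math.dist(A, B))

# Part (a): the dilation with the same center and scale factor 1/r is the inverse.
print(dilate(A_img, O, 1 / r), A)
```

Each printed pair should agree up to floating-point rounding.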
For the proof of part (c) of Theorem G17, we first observe that a dilation maps
convex angles to convex angles. Indeed, a convex angle is the intersection of closed
half-planes, so it suffices to prove that (i) a dilation maps closed half-planes to
closed half-planes, (ii) if H+ and H− are closed half-planes of a line L, then D(H+)
and D(H−) are the closed half-planes of the line D(L), and (iii) if H1 and H2 are
closed half-planes (of possibly different lines), then
    D(H1 ∩ H2) = D(H1) ∩ D(H2).
Because the proofs of (i)–(iii) are as tedious as they are straightforward, we will
leave their verifications to the reader. Now once we know a dilation maps convex
angles to convex angles, then it also maps nonconvex angles to nonconvex angles
because every nonconvex angle is the union of the complement (in the plane) of a
convex angle together with the two sides of the convex angle.
To complete the proof of part (c), we have to make a long digression to discuss
the angles associated with parallel lines. The proof of part (c) itself will be given
on page 280.
We will need some definitions. Here is the first one. Let L and L′ be two lines
meeting at a point O. On L (respectively, L′), let P, Q (respectively, P′, Q′) be
points so that P ∗ O ∗ Q (respectively, P′ ∗ O ∗ Q′).
Then the angles ∠POP′ and ∠QOQ′ are called opposite angles at the point
O.3 There is a simple observation about opposite angles.
Proof of Lemma 5.1. We make use of the preceding figure. Each of the two
numbers, |∠POP′| and |∠QOQ′|, when added to |∠P′OQ| is 180 because ∠POQ
and ∠P′OQ′ are both straight angles. So |∠POP′| = |∠QOQ′|.
We want to give a different proof for the purpose of demonstrating how to make
use of basic isometries to prove theorems. In this case, we argue as follows. Consider
the rotation ϱ of 180° around O. Clearly ϱ(ROP) = ROQ and ϱ(ROP′) = ROQ′.
Therefore ϱ(∠POP′) = ∠QOQ′. Since ϱ preserves angles (by assumption (2) of
rotations on p. 217), we have |∠POP′| = |∠QOQ′|. The proof of the lemma is
complete.
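The second proof can be mirrored in a small computation. In the sketch below, which assumes a coordinate plane and uses hypothetical sample points, the rotation of 180° around O is written as the map p ↦ 2O − p; the points Q and Q′ are obtained by rotating P and P′, so they lie on the rays opposite to ROP and ROP′, and the two opposite angles are then measured.

```python
import math

def rotate180(p, center):
    """Image of p under the rotation of 180 degrees around center."""
    return (2 * center[0] - p[0], 2 * center[1] - p[1])

def angle_deg(P, Q, R):
    """Degree of the convex angle PQR with vertex Q, between 0 and 180."""
    ux, uy = P[0] - Q[0], P[1] - Q[1]
    vx, vy = R[0] - Q[0], R[1] - Q[1]
    return math.degrees(math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy))

O = (2.0, 1.0)
P = (5.0, 2.0)           # a point on L (hypothetical)
P_prime = (1.0, 4.0)     # a point on L' (hypothetical)
Q = rotate180(P, O)              # lies on the ray opposite to ROP
Q_prime = rotate180(P_prime, O)  # lies on the ray opposite to ROP'

print(angle_deg(P, O, P_prime), angle_deg(Q, O, Q_prime))
```

The two printed degrees agree, as Lemma 5.1 asserts for opposite angles.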
[Figure: lines L1 and L2 cut by a transversal at P1 and P2, with Q1 on L1, Q2 and R2 on L2, and S on the transversal.]
An angle which is the opposite angle of one of a pair of alternate interior angles
is said to be the corresponding angle of the other angle. For example, let S be
a point on the transversal so that P1 ∗ P2 ∗ S, and let R2 be a point on L2 so that
Q2 ∗ P2 ∗ R2. Then, because ∠Q1P1P2 and ∠Q2P2P1 are alternate interior angles
and because ∠R2P2S is the opposite angle of ∠Q2P2P1, ∠R2P2S is the correspond-
ing angle of ∠Q1P1P2. Observe that, by the definition of opposite angles, ∠Q2P2P1
and ∠R2P2S lie on opposite closed half-planes of ℓ. Therefore the corresponding
angles ∠Q1P1P2 and ∠R2P2S lie on the same closed half-plane of the transversal ℓ.
3 Thus also ∠POQ′ and ∠P′OQ are opposite angles at O. Most school textbooks in the U.S.
The basic theorem about parallel lines and angles is the following:
theorem about corresponding angles follows from the equality of opposite angles at
a point (Lemma 5.1).
Now we give a second proof. Let L1 ∥ L2 with transversal ℓ intersecting L1 and
L2 at P1 and P2, respectively, as before. Also let Q1, Q2 be as before. Choose a
point R2 on L2 so that Q2 ∗ P2 ∗ R2, and choose a point S on ℓ so that P1 ∗ P2 ∗ S,
as shown:
[Figure: parallel lines L1 and L2 cut by the transversal at P1 and P2, with Q1 on L1, Q2 and R2 on L2, and S on the transversal beyond P2.]
Since Q2 ∗ P2 ∗ R2 is equivalent to Q2 and R2 being two points on L2 lying on
opposite sides of ℓ and since Q2 and Q1 were chosen to be on opposite sides of ℓ,
we see that R2 may be characterized as a point on L2 that lies on the same side of
ℓ as Q1.
By the definition of opposite angles at a point (see page 276), ∠R2P2S and
∠Q2P2P1 are opposite angles at P2. It follows that ∠Q1P1P2 and ∠R2P2S are
corresponding angles of the transversal ℓ, and we will prove that they are equal.
Let T be the translation along the vector P1P2. Then T(P1) = P2. By Theorem G5
(page 236), the translation T maps L1 to a line parallel to itself. So T(L1) is a line
passing through P2 and parallel to L1. But by hypothesis, L2 is also a line with the
same properties. The parallel postulate therefore implies that T(L1) = L2. Now
let W denote T(Q1); then W ∈ L2. We claim that W is on the same side of ℓ as
Q1. To see this, observe that Lemma 4.21 on page 234 implies that LQ1W ∥ ℓ. In
particular, the segment Q1W is disjoint from ℓ and therefore W and Q1 lie on the
same side of ℓ (see (L4)(ii) on page 176), thereby proving the claim. It follows that
W is a point on L2 that lies on the same side of ℓ as Q1. By the characterization
of R2 in the preceding paragraph, there is no loss of generality if we let R2 = W.
Thus we have T(Q1) = R2.
Next, let us turn to S, and we want to show that we may likewise let S = T(P2).
To this end, let U = T(P2) and we claim that P1 ∗ P2 ∗ U. Indeed, U has to lie
in either half-line of ℓ determined by P2. Suppose U lies in the ray RP2P1. Then
clearly RP2U = RP2P1. Since U = T(P2), the definition of a translation (page 234)
implies that the vector P2U points in the same direction as the vector P1P2, and
this means that there is a line ℓ′ not parallel to ℓ so that one of its closed
half-planes contains both rays, RP1P2 and RP2P1 (= RP2U). Since the union of
RP1P2 and RP2P1 is the line ℓ, there is no such ℓ′. Therefore U must lie in the
opposite ray of RP2P1 on ℓ. Consequently,
P1 ∗ P2 ∗ U by Lemma 4.6(i) on page 174, and the claim is proved. It follows that
both U and S are points on ℓ lying on the opposite side of P1 on ℓ with respect
to P2. Thus we may let S = U without any loss of generality; i.e., we may let
S = T(P2).
Pedagogical Comments. Because the second proof includes all the technical
details, it masks the very intuitive underlying argument. In a high school classroom,
we can present the same proof in the following way, with an explicit caution to the
class that we will rely on the picture itself to justify whether or not two points lie
on the same side or opposite sides of ℓ or L2:
Let L1 ∥ L2, and let ℓ intersect L1 and L2 at P1 and P2, re-
spectively, as before. Also let Q1, Q2 be as before. Let T be
the translation along the vector P1P2. Then T(P1) = P2. By
Theorem G5 (page 236), the translation T maps L1 to a line
parallel to itself. So T(L1) is a line passing through P2 and
parallel to L1. But by hypothesis, L2 is also a line with the
same properties. The parallel postulate therefore implies that
T(L1) = L2. So T maps Q1 to a point R2 on L2, and it maps
P2 to a point S on ℓ, as shown in the preceding picture. Now,
we have T (∠Q1 P1 P2 ) = ∠R2 P2 S. Since T preserves degrees of
angles, |∠Q1 P1 P2 | = |∠R2 P2 S|, and Theorem G18 is proved for
the case of corresponding angles. The case of alternate interior
angles follows from Lemma 5.1 on page 276.
Such a proof would be far more instructive to a beginning student in geometry than
the (completely correct) formal proof above. End of Pedagogical Comments.
Remark. Readers who are familiar with some high school geometry may be
tempted at this point to immediately use Theorem G18 to prove the well-known
fact that the sum of (the degrees of) angles in a triangle is 180°. The argument
goes as follows. Given △ABC, extend the ray RBA to a point D; through A draw
a ray RAE that is parallel to BC so that E lies in ∠CAD, as shown:
[Figure: triangle ABC with the ray RBA extended to D and the ray RAE parallel to BC, with E in ∠CAD.]
By Theorem G18, |∠DAE| = |∠B| and |∠CAE| = |∠C|, so that
|∠BAC| + |∠B| + |∠C| = |∠BAC| + |∠DAE| + |∠CAE| = 180°.
This would seem to finish the proof. Let us affirm that this intuitive argument is
indeed how a high school student should remember why the angle sum of a triangle
is 180°. For a teacher to really come to grips with the delicate points about Eu-
clidean geometry, however, it is necessary to point out that for Theorem G18 to
be applicable, we must first prove that ∠C and ∠EAC are alternate interior angles
and ∠B and ∠DAE are corresponding angles. See the Pedagogical Comments both
before and after Theorem G18. In Section 6.5 of [Wu2020b], we will present a proof
of the angle sum theorem with such details filled in.
We can finally give the Proof of part (c) of Theorem G17. Let O be the
center of the dilation D. Since D maps rays to rays, it maps angles to angles. Given
∠PQR, let D map the ray RQP to the ray RQ′P′ and let it map the ray RQR to
the ray RQ′R′, so that D(∠PQR) = ∠P′Q′R′. We have to prove that
|∠PQR| = |∠P′Q′R′|.
If one of P, Q, and R is equal to O, the argument is simpler (this will be evident
below). So we may assume that none of P, Q, and R is equal to O. Furthermore,
suppose (let us say) O, P, Q are collinear. Then by the definition of a dilation,
P′ and Q′ will also lie on the line containing O, P, Q, and Q′R′ ∥ QR (Theorem
G16 on page 269). Then the fact that |∠PQR| = |∠P′Q′R′| follows directly from
Theorem G18 (the case of corresponding angles).
[Figure: the case of O, P, Q collinear, showing ∠PQR and its image ∠P′Q′R′.]
In general, we have a situation depicted by the following picture:
[Figure: the general case, showing ∠PQR, its image ∠P′Q′R′, and the point B where LQ′P′ meets LQR.]
Without loss of generality, we may assume that neither angle is the zero an-
gle (see (L6) part (ii), page 188). We claim that LQ′P′ must intersect LQR. If
not, then LQ′P′ ∥ LQR. But we already know from part (b) that LQ′R′ ∥ LQR.
Thus we have two distinct lines LQ′P′ and LQ′R′ passing through Q′ and paral-
lel to LQR, and this contradicts the parallel postulate (page 165). Thus LQ′P′
intersects LQR at a point B, as shown. By Theorem G16 (page 269) and its
Corollary 1 (page 270), LQR ∥ LQ′R′ and LQP ∥ LQ′P′. Therefore, according to
Theorem G18 about corresponding angles, (notation as in the preceding figure)
|∠PQR| = |∠P′BR| = |∠P′Q′R′|, as desired. The proof of Theorem G17 is com-
plete.
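Part (c) can likewise be checked on examples. The following sketch, assuming a coordinate plane and hypothetical sample points, measures an angle ∠PQR and its image ∠P′Q′R′ under a dilation. It illustrates the statement only and does not replace the argument with corresponding angles given above.

```python
import math

def dilate(p, center, r):
    """Dilation with the given center and scale factor r > 0, in coordinates."""
    return (center[0] + r * (p[0] - center[0]),
            center[1] + r * (p[1] - center[1]))

def angle_deg(P, Q, R):
    """Degree of the convex angle PQR with vertex Q, between 0 and 180."""
    ux, uy = P[0] - Q[0], P[1] - Q[1]
    vx, vy = R[0] - Q[0], R[1] - Q[1]
    return math.degrees(math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy))

O, r = (1.0, 1.0), 3.0                            # hypothetical center and scale factor
P, Q, R = (4.0, 1.5), (0.0, 0.0), (1.0, 3.0)      # hypothetical angle PQR
P_img, Q_img, R_img = (dilate(X, O, r) for X in (P, Q, R))

print(angle_deg(P, Q, R), angle_deg(P_img, Q_img, R_img))
```

The two degrees agree up to rounding, while every length in the picture is multiplied by r.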
In the preceding proof, it is simply asserted that certain angles are correspond-
ing angles without any proof; this is because the details are similar to those in the
second proof of Theorem G18 (see especially the Pedagogical Comments on page
279). Going through such uninspiring arguments once is quite enough and we will
continue to skip such arguments in the future. The following converse of Theorem
G18 will also be useful; the proof is sufficiently straightforward to be left to Exercise
1 below.
Exercises 5.2.
(1) Prove Theorem G19 on page 281. (Hint: Use Theorem G18 and Lemma
4.10 on page 190.)
(2) Prove: (a) The dilation of a convex set is a convex set. (b) The dilation of
a polygon is a polygon. (c) The dilation of a regular polygon is a regular
polygon.
(3) Let ABCD and A′B′C′D′ be two quadrilaterals. Suppose there is a point
K so that the rays RKA, RKB, RKC, RKD contain A′, B′, C′, D′, re-
spectively. Assume also
|KA|/|KA′| = |KB|/|KB′| = |KC|/|KC′| = |KD|/|KD′|.
Prove that if ABCD is a square, then so is A′B′C′D′. (Caution: Be
careful about what you say and how you say it.)
(4) Let O be a point not on a given circle C with center K. Let D be the
dilation with center O and scale factor r. Prove that the image D(C) is a
circle and that the center of D(C) is the image under D of the center of
C. (Caution: This is a slippery proof. Follow the precise definitions of a
circle and a dilation.)
(5) Let D and E be the midpoints of AB and AC, respectively, of △ABC,
and let K be the midpoint of DE (see picture below). Let D be the
dilation with center A and scale factor 1/2. (a) If ϱ is the rotation of
180° around K, describe precisely the figure D(ϱ(△ABC)). (b) If T is
the translation along AD, describe precisely the figure T(D(△ABC)).
(c) How are the figures in (a) and (b) related?
[Figure: triangle ABC with the midpoints D of AB and E of AC, and the midpoint K of DE.]
[Figure: a curve and a point O.]
Trace both onto a piece of paper, and choose enough points on the curve
so that, by dilating these points with center O and scale factor 2, the
dilated points give a reasonable picture of the dilated curve with scale
factor 2.
(7) (i) Let P and Q be two distinct points in the plane and let DP, DQ be
two dilations with center at P, Q, respectively, and with scale factor 1/2
and 2, respectively. Prove that DP ◦ DQ is a translation along the vector
QM, where M is the midpoint of PQ. (ii) Generalize part (i).
(8) Let DP , DQ be two dilations with centers at two distinct points P and Q
and with scale factors r and s, respectively. If rs ≠ 1, prove that there is
a point X so that DP ◦ DQ is a dilation with center X.
(9) (This exercise refers to a coordinate system. See the warning on page
207.) (a) Let D be the dilation with center O and scale factor 2, and
let ϕ be the congruence which is the reflection across the horizontal line
corresponding to {y = 1}. Are ϕ ◦ D and D ◦ ϕ equal? (b) Repeat part
(a) if ϕ is now the congruence which is the rotation of 90 degrees around
the point (1, 0). (c) Repeat part (a) if ϕ is now the congruence which is
the translation that sends a point (x, y) to (x + 2, y).
(10) Let ϕ be a congruence and let D be a dilation with center O and scale
factor r. Prove that ϕ⁻¹ ◦ D ◦ ϕ is a dilation; be sure to state what its
center of dilation is and what its scale factor is.
(11) Let ROA, ROB, and ROC be three rays issuing from O. Let A′ ∈ ROA,
B′ ∈ ROB, and C′ ∈ ROC. Suppose AB ∥ A′B′ and BC ∥ B′C′. Prove
that AC ∥ A′C′.
[Figure: three rays issuing from O, containing A and A′, B and B′, and C and C′, respectively.]
5.3. Similarity
The goals of this section are to introduce a correct definition of similarity, prove
two basic criteria for triangle similarity, and, as an application, prove the most
famous theorem of elementary mathematics: the Pythagorean theorem. We will
also prove the converse of the Pythagorean theorem.
Let S and S′ be two sets in the plane. How do we say correctly that they are
"similar"? First and foremost, the phrase "having the same shape" lacks precision
and cannot be used as a definition of similarity, contrary to what TSM tells you.
Moreover, the only precise definition of similar figures that TSM offers is that of
"similar polygons", and the problem with such a narrow definition is that it leaves
out the consideration of the similarity of geometric figures like parabolas. We
cannot afford any ambiguity about the similarity of parabolas because it will limit
our understanding of the graphs of quadratic functions and conic sections (see, e.g.,
Sections 2.2 and 2.3 of [Wu2020b], respectively). We need a definition of similar
figures that not only applies to all plane figures but also coincides with the TSM
definition in the case of polygons. Now in Section 5.2, we saw that if one figure is
a dilation of another, then they do appear to have the same shape. Why not just
say a figure is similar to another if one is the dilation of the other? To answer this
question, consider the following figures:
[Figure: the figures S and S′.]
One can convince oneself that S′ is obtained from S by a dilation of scale factor
1/2. Now rotate S′ clockwise by 90 degrees around the center of the circle in S′ to
obtain S*, as shown:
[Figure: the figures S and S*.]
Now S* is of course congruent to S′ and therefore must have "the same shape" as S,
but can S* be a dilation of S? Not according to Theorem G16 (page 269) because if
it were, then the horizontal segment of S* would have to be parallel to the vertical
segment of S.
What this simple example shows is that it is too restrictive to define "similar-
ity" in terms of dilations alone. One must allow for compositions with congruences
as well since, intuitively, each congruence preserves both shape and size. In the
preceding example, for instance, a dilation of S by a scale factor of 1/2, followed by
a clockwise rotation of 90 degrees yields the figure S* which still "has the same
shape" as S. With this in mind, we now give the formal definition of "similarity".
(iii) The similarity of two figures S ∼ S* is a symmetric relation. (iv) The simi-
larity of two figures S ∼ S* is also a transitive relation.
Proof. (i) follows immediately from the definition of similarity. For (ii), recall
that congruences and dilations are bijections (Theorem G6 on page 240 and The-
orem G17 on page 275) and, since a composition of bijections is a bijection, each
similarity—being a composition of bijections—has an inverse transformation (see
page 211). It remains to prove that this inverse transformation is also a similarity.
To this end, we will look at a special case to avoid notational excesses, and it will
be seen that the reasoning behind the proof of this special case is perfectly general.
Suppose a similarity F is equal to a composition
\[ F = \varphi_1 \circ D_1 \circ D_2 \circ \varphi_2, \]
where ϕ1 and ϕ2 are congruences and D1 and D2 are dilations (possibly with
different centers). Let G be the composition of the following congruences and
dilations:
\[ G = \varphi_2^{-1} \circ D_2^{-1} \circ D_1^{-1} \circ \varphi_1^{-1}, \]
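The claim that G undoes F can be checked numerically. The following is only a sketch under stated assumptions: points of the plane are modeled as complex numbers, the congruences ϕ1 and ϕ2 are taken to be one particular rotation and one particular translation, and the helper names (dilation, rotation, translation, compose) are hypothetical, chosen for this illustration only. A numerical check on a few points is evidence, not a proof.

```python
import cmath

def dilation(center, r):
    """Dilation with the given center and scale factor r > 0."""
    return lambda z: center + r * (z - center)

def rotation(center, degrees):
    """Rotation around `center` by the given number of degrees."""
    w = cmath.exp(1j * cmath.pi * degrees / 180)
    return lambda z: center + w * (z - center)

def translation(v):
    """Translation along the vector v."""
    return lambda z: z + v

def compose(*maps):
    """compose(f, g, h)(z) = f(g(h(z))): the rightmost map is applied first."""
    def composed(z):
        for m in reversed(maps):
            z = m(z)
        return z
    return composed

# Illustrative choices (assumptions) for the congruences and dilations.
phi1, phi1_inv = rotation(1 + 2j, 40), rotation(1 + 2j, -40)
phi2, phi2_inv = translation(3 - 1j), translation(-3 + 1j)
D1, D1_inv = dilation(0j, 2.5), dilation(0j, 1 / 2.5)
D2, D2_inv = dilation(4 + 4j, 0.3), dilation(4 + 4j, 1 / 0.3)

F = compose(phi1, D1, D2, phi2)
G = compose(phi2_inv, D2_inv, D1_inv, phi1_inv)

# G(F(z)) and F(G(z)) should both return z, up to rounding error.
points = [0j, 1 + 0j, -2 + 5j, 3.5 - 7j]
max_error = max(abs(G(F(z)) - z) + abs(F(G(z)) - z) for z in points)
```

Note that the inverse of a dilation with scale factor r is the dilation with the same center and scale factor 1/r, so G is again a composition of congruences and dilations.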
Obviously, every figure is congruent to itself and hence similar to itself. Thus
similarity is also a reflexive relation (see page 241). Since similarity is a reflexive,
symmetric, and transitive relation, this says that similarity is an equivalence relation
(see page 241).
The fact that the similarity relation ∼ is both symmetric and transitive is not
an abstraction for its own sake. It has substantive intuitive content. As we pointed
out above, the symmetry of the relation allows us to say that "two figures S and
S* are similar" without having to worry about whether it is S ∼ S* or S* ∼ S,
because they are equivalent. We will freely avail ourselves of this terminology from
now on. In addition, the transitivity of similarity leads to the following intuitive
conclusion:
Lemma 5.3. If two geometric figures are each similar to a third, then they are
similar to each other.
Mathematical Aside: (a) This concept of similarity is meaningful not only for
any geometric figure in the plane but also for figures in Euclidean spaces of higher
dimensions, as soon as we extend the definitions of rotations, reflections, transla-
tions, and dilations to higher dimensions. (b) Parts (i) and (ii) of Lemma 5.2 imply
that the set of all similarities of the plane forms a group, the group of similarities.
On pp. 211, 235, and 240, we have introduced certain groups and we can now bring
them together:
Theorem. For a transformation F of the plane, the following three conditions are
equivalent:
(i) F is a similarity.
(ii) F is the composition of a dilation followed by a congruence.
(iii) F is the composition of a congruence followed by a dilation.
This theorem implies that we could have defined a similarity to be, for example,
the composition of a dilation followed by a congruence. However, the disadvantage
of such a definition is that it is actually clumsy to work with; e.g., this definition
makes it difficult to prove that ∼ is a symmetric and transitive relation (see remark
(b) in the preceding Mathematical Aside). Since the proof of this theorem is quite
abstract and technical, and decidedly not simple, we will relegate it to a file, A
Theorem about Similarity, to be posted on the author’s homepage,
https://math.berkeley.edu/~wu/.
In other words, △ABC ∼ △A′B′C′ means not only that there is a similarity F so
that the sets F(△ABC) and △A′B′C′ are equal, but that F specifically maps A
to A′, B to B′, and C to C′.
It is common to express the second set of equalities, i.e., the equality of the ra-
tios of corresponding sides of the two triangles, by saying that the corresponding
sides are proportional.
Remark. It is in the proof of this theorem that we get to see why a similarity
is defined as the composition of dilations and congruences (rather than just isome-
tries). The reason is that we need a similarity to preserve the degrees of angles
whereas an isometry is, at this point, not yet known to do that (compare Theorems
G6 on page 240 and Theorem G17 on page 275). It is the property of a congruence
to also preserve degrees of angles that accounts for the validity of Theorem G20.
Proof. If we have △ABC ∼ △A′B′C′, then the assertions about angles and sides
follow from Theorems G6 (page 240) and G17 (page 275). For the converse, we
prove something stronger:
Theorem G21 (SAS for similarity). Given two triangles ABC and A′B′C′,
if |∠A| = |∠A′| and
\[ \frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|}, \]
then △ABC ∼ △A′B′C′.
Proof. The idea of the proof is to use a congruence to move △A′B′C′ into a
position so that a dilation with center at A will map it to △ABC.
If |AB| = |A′B′|, then the hypothesis would imply |AC| = |A′C′| and we are
reduced to the SAS criterion for congruence. Thus we may assume that |AB| and
|A′B′| are not equal. Without loss of generality, suppose |A′B′| < |AB|. Then the
hypothesis that |AB|/|A′B′| = |AC|/|A′C′| implies |A′C′| < |AC|. On the segment
AB, let B0 be the point so that |AB0| = |A′B′|. Similarly, on AC, let C0 be the
point satisfying |AC0| = |A′C′|.
Because |∠A| = |∠A′| by hypothesis, the SAS criterion for congruence (Theorem
G8, page 245) implies that △A′B′C′ ≅ △AB0C0. Let ϕ be the congruence
that maps △A′B′C′ to △AB0C0. Moreover, if r denotes the common value of
|AB|/|A′B′| and |AC|/|A′C′|, then the dilation D with center A and scale factor
r maps A to A (of course), but also B0 to B because by the definition of dilation,
D(B0) is the point on the ray RAB so that the distance of D(B0) from the center
A is
\[ r\,|AB_0| = \frac{|AB|}{|A'B'|}\,|AB_0| = \frac{|AB|}{|A'B'|}\,|A'B'| = |AB|. \]
In the same way, D maps C0 to C, so that D ◦ ϕ maps △A′B′C′ to △ABC.
Since D ◦ ϕ is a similarity, △ABC and △A′B′C′ are similar and the proof of the
theorem is complete.
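The dilation step of the preceding proof can be illustrated with coordinates. This is a minimal sketch, assuming a specific triangle ABC and a specific ratio r (both chosen only for illustration); the function names dilate, dist, and point_on_segment are hypothetical helpers, not notation from the text.

```python
import math

def dilate(center, r, p):
    """Image of the point p under the dilation with the given center and scale factor r."""
    return (center[0] + r * (p[0] - center[0]),
            center[1] + r * (p[1] - center[1]))

def dist(p, q):
    """Distance between the points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_on_segment(a, b, length):
    """The point on segment ab at the given distance from a (length <= |ab| assumed)."""
    t = length / dist(a, b)
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

# An illustrative triangle ABC and an assumed common ratio r = |AB|/|A'B'| = |AC|/|A'C'|.
A, B, C = (0.0, 0.0), (6.0, 0.0), (2.0, 4.0)
r = 2.5
len_ApBp = dist(A, B) / r   # plays the role of |A'B'|
len_ApCp = dist(A, C) / r   # plays the role of |A'C'|

# B0 on AB with |AB0| = |A'B'| and C0 on AC with |AC0| = |A'C'|, as in the proof.
B0 = point_on_segment(A, B, len_ApBp)
C0 = point_on_segment(A, C, len_ApCp)

# The dilation with center A and scale factor r should send B0 to B and C0 to C.
image_B0 = dilate(A, r, B0)
image_C0 = dilate(A, r, C0)
```

Under these assumptions, image_B0 and image_C0 agree with B and C up to rounding error, which is the content of the displayed computation r|AB0| = |AB|.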
We next give the proof of the most easily applied criterion of similarity: the
AA criterion (angle-angle criterion) for similarity.
Theorem G22 (AA for similarity). Two triangles with two pairs of equal
angles are similar.
Proof. Let two triangles ABC and A′B′C′ be given. We may assume |∠A| = |∠A′|
and |∠B| = |∠B′|.
We have to prove that △ABC ∼ △A′B′C′. If |AB| = |A′B′|, then the hypothesis
would imply △ABC ≅ △A′B′C′ because of the ASA criterion for congruence
(Theorem G9, page 245). Thus we may assume that |AB| and |A′B′| are not equal.
Suppose |A′B′| < |AB|. On AB, choose a point B0 so that |AB0| = |A′B′|, and let
the line parallel to BC and passing through B0 intersect the line LAC at C0. By
Theorem G11 (FTS*) (page 257), C0 lies on the ray RAC and
\[ \frac{|AB|}{|AB_0|} = \frac{|AC|}{|AC_0|}. \tag{5.9} \]
On the other hand, we have |AB0| = |A′B′| by construction and |∠A| = |∠A′| by
hypothesis. In addition, |∠AB0C0| = |∠B′| because |∠AB0C0| = |∠B| by Theorem
G18 on page 277 concerning corresponding angles with respect to parallel lines, and
because |∠B| = |∠B′| by hypothesis. Thus ASA implies that △AB0C0 ≅ △A′B′C′.
Hence |AB0| = |A′B′| and |AC0| = |A′C′|. Therefore equation (5.9) becomes
\[ \frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|}. \]
Now recall that ∠A and ∠A′ are assumed to be equal. Therefore, triangles ABC
and A′B′C′ are similar because they satisfy the conditions of SAS for similarity
(Theorem G21). Theorem G22 is proved.
We emphasize once again that inasmuch as the validity of Theorem G21 de-
pends on Theorem G16 which depends on the parallel postulate, and the proof of
Theorem G22 depends on FTS* which also depends on the parallel postulate, both
theorems ultimately depend on the parallel postulate. The fact that all conclusions
about similar figures depend crucially on the parallel postulate will be underscored
once more in the last section of Chapter 8 in [Wu2020b].
To round off the picture, let us also mention the fact that, in analogy with the
case of congruence, there is also an SSS criterion for similarity. This will be proved
in Section 6.4 of [Wu2020b].
For the purpose of learning about linear equations, students have to learn
how to apply Theorems G21 and G22 in specific situations. There is probably no
better illustration of such applications than the following proof of the Pythagorean
theorem.5 Note that there will be a second proof of this theorem in Section 4.4 of
[Wu2020c] using the concept of area.
Let us fix the terminology. Let ABC be a right triangle with C being the vertex
of the right angle. Then the sides AC and BC are called the legs of △ABC, and
AB is called the hypotenuse of △ABC.
[Figure: right triangle ABC with right angle at C, legs a = |BC| and b = |AC|, and hypotenuse c = |AB|.]
The basic idea of the proof is very simple. Referring to the same picture, we
draw a perpendicular from C to line LAB . The perpendicular meets the segment
AB at a point D (see the definition of segment on page 169), as the middle figure
in the following picture shows:
[Figure: three pictures of right triangle ABC with the perpendicular from C meeting the line LAB at D; in the middle picture, D lies on the segment AB.]
known about him or his work with certainty. He was a Greek philosopher-mathematician who
lived around 500 BC. He founded a school devoted to mathematics and philosophy, but it was
also a school shrouded in secrecy and infused with a large dose of mysticism. The recognition of
the mathematical relationship between musical notes and the existence of numbers which are not
rational are attributed to this school. The so-called Pythagorean theorem was actually known
independently, and earlier, to the Babylonians, Hindus, and Chinese ([Katz, Chapters 1 and 2]).
The Babylonians made extensive computations with this theorem around 1800 BC; see Exercise
9 on page 384.
Then a simple use of Theorem G22 reveals that both right triangles CBD and
ACD are similar to ABC and therefore their corresponding sides are proportional.
This immediately leads, via the cross-multiplication algorithm, to several equalities
between the products of (lengths of) the sides of these triangles. If you already know
what to prove, then by trial and error, you cannot help but arrive at a combination
of these equalities that will give you what you want. On the other hand, if you do
not know what to prove, then these identities are not likely to do you much good.
There are many proofs of the Pythagorean theorem, but regardless of the proof,
one is always aware that guessing a correct statement of the Pythagorean theorem
is a higher order of achievement than finding a proof of the theorem. The first
person to discover this theorem must have been an extraordinary mathematician.
Proof. We will prove that △ABC ∼ △CBD and also △ABC ∼ △ACD.
[Figure: right triangle ABC with D the foot of the perpendicular from C on the hypotenuse AB, where a = |BC|, b = |AC|, c = |AB|, α = |BD|, and β = |AD|.]
It suffices to prove △ABC ∼ △CBD as the other similarity can be proved in
the same way. The two triangles ABC and CBD have two pairs of equal angles:
|∠CDB| = |∠ACB| = 90° and |∠CBD| = |∠ABC|. By the AA criterion for
similarity (Theorem G22 on p. 288), the triangles are similar. Hence
\[ \frac{|BA|}{|BC|} = \frac{|BC|}{|BD|}. \]
Letting
|AC| = b, |AB| = c, |BC| = a, |AD| = β, |BD| = α
(see the preceding picture), we get c/a = a/α, so that by the cross-multiplication
algorithm,
\[ a^2 = \alpha c. \tag{5.10} \]
By considering the similar right triangles ABC and ACD, we conclude in the same
way that |AC|/|AB| = |AD|/|AC|, so that b/c = β/b. Therefore,
\[ b^2 = \beta c. \tag{5.11} \]
Adding (5.10) and (5.11) and making use of α + β = c, we finally obtain
\[ a^2 + b^2 = \alpha c + \beta c = (\alpha + \beta)c = c^2. \]
The proof of the Pythagorean theorem is complete.
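For a concrete instance of the two equalities a² = αc and b² = βc, one can place a right triangle in coordinates and compute the foot D of the perpendicular directly. The sketch below assumes the triangle with legs 3 and 4 and the particular placement C = (0, 0), B = (a, 0), A = (0, b); the function name foot_of_altitude is a hypothetical helper introduced only for this illustration.

```python
import math

def foot_of_altitude(a, b):
    """Right triangle with C = (0, 0), B = (a, 0), A = (0, b).
    Return the foot D of the perpendicular from C to the line through A and B."""
    # D = A + t(B - A), where t is chosen so that CD is perpendicular to AB.
    t = b * b / (a * a + b * b)
    return (a * t, b * (1 - t))

a, b = 3.0, 4.0
A, B, C = (0.0, b), (a, 0.0), (0.0, 0.0)
D = foot_of_altitude(a, b)

c = math.dist(A, B)       # hypotenuse |AB|
alpha = math.dist(B, D)   # |BD|
beta = math.dist(A, D)    # |AD|

# The three facts used in the proof, checked numerically.
eq_5_10 = math.isclose(a * a, alpha * c)   # a^2 = alpha * c
eq_5_11 = math.isclose(b * b, beta * c)    # b^2 = beta * c
d_between = math.isclose(alpha + beta, c)  # D lies between A and B
```

With these values, c = 5, α = 1.8, and β = 3.2, so that a² = 9 = αc and b² = 16 = βc, up to rounding error.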
There is an animation of the preceding proof by Larry Francis that also makes
the striking observation that the algebraic manipulations above actually have a
geometric interpretation in terms of area:
https://youtu.be/QCyvxYLFSfU.
[Figure: right triangle ABC with right angle at C, a point D on the line LAB with A between D and B, and a point E with LCE parallel to LAB.]
Let line LCE be parallel to LAB (use the corollary to Theorem G1 on page 222).
We may assume the point E to be so chosen that E and B lie in the same half-plane
of LCD. The strategy is to show that
\[ |\angle ACB| < |\angle DCE|. \tag{5.12} \]
Assuming this for the moment, we will show how to conclude the proof. Recall that,
by hypothesis, ∠ACB is the right angle of the right triangle ABC. Therefore
|∠ACB| = 90°. But LCE ∥ LAB and BD ⊥ CD; therefore by Theorem G3 on
p. 224, EC ⊥ CD. It follows that also |∠DCE| = 90◦ , and (5.12) now leads to the
absurd statement that 90◦ < 90◦ . Thus D has to be between A and B to begin
with.
It remains to prove (5.12). To this end, we will prove a more detailed statement:
(5.13) |∠ACB| < |∠DCB| < |∠DCE|.
We begin by proving the second inequality in (5.13); i.e., |∠DCB| < |∠DCE|. We
claim that B lies in the convex angle ∠DCE. Thus we must prove (i) E and B
lie in the same half-plane of LCD and (ii) D and B lie in the same half-plane of
LCE (see the definition of angle on p. 182). Now (i) is true because this was how
we chose E. To see that (ii) is true, observe that since LCE ∥ LAB, the line LAB
does not contain any point of LCE and therefore neither does the segment DB; by
assumption (L4)(ii) on p. 176, D and B lie in the same half-plane of LCE . Thus
(ii) is also true and B lies in ∠DCE, thereby proving the claim. It follows that
∠DCB and ∠BCE are adjacent angles with respect to ∠DCE (see page 186 for
the definition of adjacent angles). Therefore by assumption (L6)(iv) on p. 188,
|∠DCB| + |∠BCE| = |∠DCE|.
Since LCE ∥ LAB, B does not lie on LCE so that, in particular, B, C, and E are
not collinear. Thus |∠BCE| > 0. Therefore |∠DCB| < |∠DCE|, and the second
inequality in (5.13) holds.
The proof of the first inequality in (5.13), i.e., |∠ACB| < |∠DCB|, is entirely
similar, but simpler. We want to show that A lies in ∠DCB, and this requires the
proof that A and B lie in the same half-plane of LCD and that A and D lie in the
same half-plane of LCB . Both follow immediately from the hypothesis that A is
between D and B, so that AB does not contain any point of LCD and DA does
not contain any point of LCB (see (L4)(ii) again). Therefore ∠DCA and ∠ACB
are adjacent angles with respect to ∠DCB, and assumption (L6)(iv) implies that
|∠DCA| + |∠ACB| = |∠DCB|.
Since |∠DCA| > 0, we have |∠ACB| < |∠DCB|. We have therefore completely
proved (5.13) and, therewith, also (5.12). As explained right after (5.12), this
means that the point D on LAB has to be between A and B. End of Pedagogical
Comments.
of the theorem on page 291 depends on the concept of similar triangles, which in
turn depends on the concept of dilation. It is manifest that almost every property
of dilation rests on the parallel postulate, e.g., the fact that a dilation maps a line
to a line (see the proof of Theorem G16 on page 269). It is therefore clear that the
parallel postulate plays a critical role in validating the truth of the Pythagorean
theorem. Of course, one might imagine that there is another proof of the Pythagorean
theorem that does not make use of the parallel postulate, but what we want to
emphasize is that such a proof does not exist. Without the parallel postulate, the
Pythagorean theorem will cease to hold. In fact, in hyperbolic geometry (see Section
8.4 of [Wu2020b]), where it is assumed that through a point not lying on a line ℓ
pass two distinct lines parallel to ℓ (the opposite of the parallel postulate), the
Pythagorean theorem fails. There, a² + b² < c². The Pythagorean theorem is
therefore a characteristic theorem of Euclidean geometry.
Exercises 5.3.
(1) Let D, E, F be the midpoints of the sides BC, AC, AB, respectively, of
a triangle ABC. Prove that △DEF ∼ △ABC with a scale factor of 2.
(2) Let ABC be a right triangle with AC ⊥ CB. Let the perpendicular line
from C to AB meet AB at D. Prove that
\[ |CD| = \frac{|AC| \cdot |BC|}{|AB|}. \]
(3) Let ABC be a right triangle so that |AC| = 3, |BC| = 4, and |AB| = 5.
Let the perpendicular line from C to AB meet AB at D, and let the
perpendicular line from D to AC meet AC at E. Find |CE|.
(4) (This exercise generalizes Exercise 11 on page 238.) Assume FTS. Let
L1, L2, and L3 be three mutually parallel lines, and let ℓ and ℓ′ be two
distinct transversals which intersect the three parallel lines at A1, A2, A3
and B1, B2, B3, respectively. Prove that
\[ \frac{|A_1A_2|}{|A_2A_3|} = \frac{|B_1B_2|}{|B_2B_3|}. \]
(5) Prove that all circles are similar to each other. (Caution: This is a slippery
proof. Given two circles C1 and C2 , suppose you have found a dilation D
and congruence ϕ so that (ϕ ◦ D)(C1 ) = C2 . Then you will have to prove
that the two sets (ϕ ◦ D)(C1 ) and C2 are equal. This means you will have
to prove that each is contained in the other (see page 141). Do not skip
steps.)
(6) Prove that two rectangles are similar to each other if and only if either
the ratios of (the lengths of) their sides are equal or the product of these
ratios is 1. Precisely, let the lengths of the sides of one rectangle be a and
b and those of the other be a′ and b′; then the rectangles are similar if and
only if either a/b = a′/b′ or (a/b) · (a′/b′) = 1.
(7) (a) Write a detailed proof of Theorem G25. (b) Prove Theorem G24
(Converse of the Pythagorean theorem). (Hint for (b): Suppose in ABC
that |AB| = c, |AC| = b, |BC| = a, a² + b² = c², and yet |∠C| ≠ 90°.
Deduce a contradiction as follows: let D be the point on LBC so that
AD ⊥ BC. There are two cases: D lies in BC and D lies outside BC. The
two cases are similar, so consider the former case where B∗D∗C. Compare
the hypothesis that a² + b² = c² with the results obtained by applying the
(a) Prove that … = (a² + b² − c²)/(2a). (b) Prove that
\[ h = \frac{1}{2a}\sqrt{(a + b + c)(a + b - c)(-a + b + c)(a - b + c)}. \]
(This exercise essentially proves Heron’s formula for the area of a triangle
in terms of its sides; see Section 4.5 in [Wu2020c].)
(13) Suppose you are a teacher in middle school and you are handed a textbook
series that takes up similarity in grade 7 and congruence in grade 8. (Such
a series did exist in 2013.) (a) Do you believe such a curricular decision is
defensible? Explain. (b) If you are a seventh-grade teacher, what would
you do? (Obviously there will be no unique answer to part (b), but the
idea is that you had better start thinking about such real-world situations
because your ability to adjust is, alas, part of your responsibility.)
CHAPTER 6
Symbolic Notation and Linear Equations
In this chapter, we begin the study of algebra. The main topics of this chapter
are the use of symbols, linear equations in one or two variables, and systems of two
linear equations in two variables.
In the context of school mathematics, the most urgent task in helping students
to achieve success in algebra may very well be getting them to be fluent in the
correct use of symbols. There is at present an unhealthy preoccupation in TSM1
with the concept of a "variable" in the teaching of algebra, to the point of elevating
it to a formal mathematical concept. The truth is that "variable" is not a math-
ematical concept. A main goal of this chapter is to explain why, if students know
the basic etiquette in the use of symbols,2 there would never be any need for them
to understand what a "variable" is. Clearly the word "variable" is suggestive, and
it is often used in mathematical discussions as a shorthand; for example, we have
just used it to talk about "linear equations in one or two variables". However, we
did so only because the meaning of this phrase is universally understood, and there
is no need to find out what "variables" means in this context.3 So when all is said
and done, students should just concentrate on learning the basic etiquette in the
use of symbols and learn it well.
A major stumbling block in students’ learning about linear equations in two
variables is the concept of slope (cf. [PG]). The fairly voluminous literature in
education research on slope indicates an awareness of students’ difficulty on this
topic. One of the many symptoms of this difficulty is articulated in [Beckmann-
Izsák]:
They might not see slope as a number, but instead think of it
as a pair of numbers separated by a slash, basically "rise slash
run."
It is surprising that [Beckmann-Izsák]—in discussing why slope is hard to teach—
did not mention the glaring absence of a correct definition for slope in TSM. Just
as in the case of fractions, students, teachers, and educators have been forced to
learn about slope without the benefit of knowing precisely what it is. Under the
circumstances, the nonlearning of slope—like the nonlearning of fractions—becomes
all but inevitable.
It is a mathematical and pedagogical imperative that students understand why
one single number can be attached to a line to supposedly describe its "slant"
(whether it is this way \ or that way /) and its "steepness". To this end, we devote
all of Section 6.4 to a detailed discussion of a correct definition of slope: what
1 See page xiv of the preface for the definition of TSM.
2 See page 299.
3 In the same way we understand "Faustian bargain" or "Catch 22" without having to find
For all real numbers x, we can find a real number y so that
y = √(3x − 7).
For some real numbers x, we can find a real number y so that
y = √(3x − 7).
There are an infinite number of fractions x and y so that y =
√(3x − 7).
There are an infinite number of positive integers x and y so that
y = √(3x − 7).
The importance of quantification can be seen by noting that, despite the similarity
between the first two statements, the first is false (e.g., x = 0) and the second is
true (e.g., x = 3 and y = √2). Similarly, despite the similarity between the last
two statements, the first is true whereas the second is false (see Exercise 1 on page
320).
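The truth values of the four quantified statements about y = √(3x − 7) can be explored with a short computation. This is only a sketch: the search bounds are arbitrary assumptions, a finite search is evidence rather than a proof, and the names real_y_exists, fraction_pairs, and integer_solutions are hypothetical helpers introduced here.

```python
import math
from fractions import Fraction

def real_y_exists(x):
    """A real number y with y = sqrt(3x - 7) exists exactly when 3x - 7 >= 0."""
    return 3 * x - 7 >= 0

# "For all real x": fails, since x = 0 admits no real y.
first_statement_fails_at_zero = not real_y_exists(0)

# "For some real x": holds, e.g. x = 3 and y = sqrt(2).
y = math.sqrt(2)
second_statement_witness = math.isclose(y * y, 3 * 3 - 7)

# Fractions: for each fraction q, the pair x = (q^2 + 7)/3, y = q satisfies y^2 = 3x - 7.
fraction_pairs = [((Fraction(q) ** 2 + 7) / 3, Fraction(q)) for q in range(1, 6)]
fractions_ok = all(yy >= 0 and yy * yy == 3 * xx - 7 for xx, yy in fraction_pairs)

# Positive integers: search a finite range for solutions of y^2 = 3x - 7.
integer_solutions = [(xx, yy) for xx in range(1, 2001) for yy in range(1, 80)
                     if yy * yy == 3 * xx - 7]

# Remainders of squares on division by 3; 3x - 7 always leaves remainder 2.
square_remainders = {(yy * yy) % 3 for yy in range(1, 80)}
```

In this range, integer_solutions comes out empty, and square_remainders contains only 0 and 1, which hints at why no positive integers can work.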
A pertinent remark in this connection is that many school students7 commit
the elementary error of writing down symbolic expressions without quantifying the
symbols, such as "y = √(3x − 7)" above. Very likely, the only way to combat this
widespread abuse is to not allow TSM to take root in students’ thinking right from
the beginning. Let us teach them to always quantify their symbols.
6 Usually using letters of the English alphabet, but often using letters from the Greek alphabet
as well because it is easy to run out of appropriate symbols for a particular task.
7 And a good number of college students too.
300 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS
To make sure you see why it is important to always quantify your symbols,
we take up another example that has more mathematical substance. Consider the
following three statements:
(C1) n ≥ 3 and aⁿ + bⁿ = cⁿ.
(C2) For any positive integer n ≥ 3, there are no positive numbers
a, b, and c so that aⁿ + bⁿ = cⁿ.
(C3) For any positive integer n ≥ 3, there are no positive integers
a, b, and c so that aⁿ + bⁿ = cⁿ.
The statement (C1) has no meaning, because we do not know what the symbols
a, b, c, and n stand for. If a and b in (C1) are 2 × 2 matrices and c is a
3 × 3 matrix, then (C1) is false, but of course (C1) is true if n = 3 and a = 1,
b = 2, c = ∛9 (the cube root of 9). (C2) is totally false because no matter what
n may be and no matter what the positive numbers a and b may be, letting c be
the positive n-th root of aⁿ + bⁿ (see Theorem 4.2 in Section 4.2 of [Wu2020b])
will always yield the desired equality of numbers, aⁿ + bⁿ = cⁿ. Finally, one may
recognize statement (C3) as the famous Fermat’s Last Theorem, first conjectured
by Pierre Fermat in 1637 but not proved until Andrew Wiles did so in 1995 (see
[WikiFermat]; we will have more to say about Fermat on page 308). Not to harp
on the obvious, but the statements (C2) and (C3) differ by just one word in the
quantifications of a, b, and c. Moral: Precise quantification of symbols is important.
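The one-word difference between (C2) and (C3) can be seen in a small experiment. The sketch below is an illustration under stated assumptions: the search bounds are arbitrary, floating-point roots are only approximate, and the function names c_for and integer_solutions are hypothetical. A finite search cannot prove (C3); it only fails to find a counterexample.

```python
import math

def c_for(a, b, n):
    """Positive n-th root of a^n + b^n (as a float), so that a^n + b^n = c^n."""
    return (a ** n + b ** n) ** (1.0 / n)

# (C2) fails: with a = 1, b = 2, n = 3, the positive number c = cube root of 9 works.
c = c_for(1, 2, 3)
c2_counterexample_ok = math.isclose(c ** 3, 1 ** 3 + 2 ** 3)

def integer_solutions(n, bound):
    """All (a, b, c) with 1 <= a <= b <= bound, c a positive integer, and a^n + b^n = c^n."""
    # c < 2 * bound always holds here, so this table of n-th powers is large enough.
    nth_powers = {k ** n: k for k in range(1, 2 * bound + 1)}
    return [(a, b, nth_powers[a ** n + b ** n])
            for a in range(1, bound + 1)
            for b in range(a, bound + 1)
            if a ** n + b ** n in nth_powers]

found_n2 = integer_solutions(2, 20)   # for contrast: n = 2 has many solutions
found_n3 = integer_solutions(3, 60)   # n = 3: none expected, in line with (C3)
```

For n = 2 the search finds triples such as (3, 4, 5), while for n = 3 it finds nothing within the chosen bound.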
Once the need for quantification of symbols is understood, we now clarify the
use of the word "variable". First we give an example. Consider the problem of
finding all the numbers x which satisfy 3x + 7 = 5. In the usual jargon, this is
known as solving the linear equation 3x + 7 = 5. We will take a serious look
at "what an equation means and how to solve an equation" in Section 6.2 on pp.
322ff., but we will proceed informally at this juncture to get our point across. With
this understood, the usual procedure for solving such equations yields 3x = 5 − 7,
and therefore
\[ x = \frac{5 - 7}{3}. \]
There is a reason why we do not carry out the computation in the numerator to
write the solution as −2/3, and it is because if we consider 3x + 1/2 = 13 instead, then
we get
\[ x = \frac{13 - \frac{1}{2}}{3}. \]
Or, consider 3x − 25 = 4.6 and by rewriting it as 3x + (−25) = 4.6, we get
\[ x = \frac{4.6 - (-25)}{3}. \]
Or, consider 5x − 25 = 4.6 and get
\[ x = \frac{4.6 - (-25)}{5}, \]
and so on. There is an unmistakable abstract pattern here: one can easily verify
that, with a, b, and c (a ≠ 0) understood to be three fixed numbers throughout the
following discussion, the solution of the linear equation ax + b = c is
\[ x = \frac{c - b}{a}. \]
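The abstract pattern x = (c − b)/a can be written once and applied to each of the preceding equations. The following is a minimal sketch using exact rational arithmetic; the function name solve_linear is a hypothetical helper, and the decimal 4.6 is entered as the exact fraction 23/5.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Return the solution x = (c - b)/a of ax + b = c for fixed numbers a, b, c with a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return (Fraction(c) - Fraction(b)) / Fraction(a)

# The four equations considered in the text, as triples (a, b, c).
examples = [
    (3, 7, 5),                    # 3x + 7 = 5
    (3, Fraction(1, 2), 13),      # 3x + 1/2 = 13
    (3, -25, Fraction("4.6")),    # 3x + (-25) = 4.6
    (5, -25, Fraction("4.6")),    # 5x + (-25) = 4.6
]
solutions = [solve_linear(a, b, c) for a, b, c in examples]

# Substitute each solution back into its equation ax + b = c.
checks = [a * x + b == c for (a, b, c), x in zip(examples, solutions)]
```

Here a, b, and c stay fixed within each call, which is the sense in which they are constants in the discussion.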
6.1. SYMBOLIC EXPRESSIONS 301
We have now witnessed the fact that in some symbolic expressions, the symbols
stand for elements in an infinite set of numbers,8 e.g., the statement that mn = nm
for all real numbers m and n, while in others, the symbols stand for the element
in a set consisting of exactly one element (in other words, they stand for a fixed
value throughout the discussion), e.g., the numbers a, b, and c in the preceding
linear equation ax + b = c. In the former case, the symbols m and n are called
variables, and in the latter case, a, b, and c are called constants. Notice that
such terminology is no more than an afterthought when we have carefully quantified
the symbols in each situation. There is in fact no need for the words variables
and constants when such information is already contained in the quantification.
However, we will continue to use them not only because they have been in use
for over three centuries and are everywhere in the mathematics literature, but also
because they are at times an indispensable shorthand.
There are compelling reasons for singling out the terminology of "variable" and
"constant" for such an extended discussion. See the pedagogical comments on pp.
318ff. and 327ff., respectively.
In a situation where we try to locate any number x that satisfies a given equation
(such as 2x² + x − 6 = 0 or 2x = x), the value of the number x is unknown
to us, of course. For this reason, we will conveniently refer to the symbol x as an
unknown, just to save verbiage. To the extent that we will never make logical
deductions based on the properties of an "unknown", it is not necessary to make
this terminology more precise.9
At the risk of pointing out the obvious, note that we have been making use of
symbols from the very beginning of this volume out of necessity. One example is the
addition formula for fractions (equation (1.12) on page 33): for any two fractions
k/ℓ and m/n, where k, ℓ, m, n are whole numbers (the product ℓn ≠ 0),
\[ \frac{k}{\ell} + \frac{m}{n} = \frac{kn + \ell m}{\ell n}. \]
If we do not use symbols, we would be forced to express the formula as follows:
The sum of two fractions is the fraction whose numerator is
the sum of the product of the numerator of the first fraction
with the denominator of the second, and the product of the
numerator of the second with the denominator of the first, and
whose denominator is the product of the denominators of the
given fractions.10
Even if you are inordinately fond of the English language, you will have to admit
that the symbolic statement is far more clear, and this is not even taking into
account the difficulty of trying to provide a mathematical derivation of this addition
formula without the benefit of symbols.
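The economy of the symbolic statement shows up again when the addition formula is turned into a computation. This sketch assumes whole-number inputs and uses a hypothetical function name, add_fractions; the comparison with Python's Fraction type is only a sanity check on one example.

```python
from fractions import Fraction

def add_fractions(k, l, m, n):
    """Numerator and denominator of k/l + m/n, by the formula (kn + lm)/(ln)."""
    if l * n == 0:
        raise ValueError("denominators must be nonzero")
    return (k * n + l * m, l * n)

# Example: 2/3 + 5/4.
num, den = add_fractions(2, 3, 5, 4)
agrees_with_fraction_type = Fraction(num, den) == Fraction(2, 3) + Fraction(5, 4)
```

The single return line carries the whole content of the verbal description quoted above.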
8 Strictly speaking, all that matters is that the symbols stand for elements in a set consisting
of more than one element. But for school algebra, "infinite" suffices for the purpose at hand.
9 This saves us from the need to discuss the relationship between an unknown and a variable.
10 This was the way mankind had to express formulas from al-Khwarizmi (c. 780 to c. 850)—
the person whose name gave birth to the word "algorithm"—all through the Middle Ages to
the time of François Viète (1540–1603). The codification of the symbolic notation is generally
attributed to R. Descartes (1596–1650). See [Bashmakova-Smirnova].
This example may serve the purpose of explaining to students why the use of
symbols is a necessity. Of course, there are innumerable other examples as well.
11 We continue to make use of some mathematics that we have not yet discussed—but which
you most likely know—to illustrate a point. For the issue at hand, see the appendix in Section
1.4 of [Wu2020c].
1 + tan² x = sec² x
as an identity without any qualifications and leaves it to the readers to figure out
that the equality is claimed only for x not equal to an odd integer multiple of π/2.
Therefore, you have to be careful about the range of the values of x for which the
equality is supposed to hold in each case. This is but one example among many
where we sometimes use mathematical terms out of respect for tradition but not
as precise mathematical concepts. Usually, these terms do manage to suggest a
pleasant mental image, and that should count for something. "Variable" is another
example of such usage, as we already discussed.
Fortunately, a majority of the well-known identities are valid for all numbers,
with no exceptions. We will focus on these in this section.
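Now with the terminology of an identity understood, let us list the three most
common identities:
(6.1) $(x + y)^2 = x^2 + 2xy + y^2$,
(6.2) $(x - y)^2 = x^2 - 2xy + y^2$,
(6.3) $(x + y)(x - y) = x^2 - y^2$.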
At least three comments should be made. The first is that when x and y are
rational numbers, these three identities follow from routine number computations
using the associative, commutative, and distributive laws. For example, here is a
proof of (6.3): for any numbers $x$ and $y$,
$(x + y)(x - y) = (x + y)x - (x + y)y = x^2 + yx - xy - y^2 = x^2 - y^2.$
Because FASM (page 133) assures us that the associative, commutative, and dis-
tributive laws (for both + and ×) continue to hold for real numbers, these identities
are valid for all real numbers x and y. A second comment is that once we have
the first identity (6.1), the second one (6.2) becomes a trivial consequence of the
first because (6.1), being valid for all numbers x and y, is also valid for x and −y.
Therefore, for any numbers $x$ and $y$,
$(x + (-y))^2 = x^2 + 2x(-y) + (-y)^2,$
which is the same equation as (6.2). Thus we have proved that (6.1) implies (6.2).
In a similar manner, we can prove that, conversely, (6.2) implies (6.1). In the
terminology of page 22, the first identity (6.1) is equivalent to the second one (6.2).
You may be wondering why we bother with the preceding proof of the second
identity since a direct expansion of $(x - y)^2 = (x - y)(x - y)$ already proves it.
Our point is that, at the beginning stage of algebra, your students are put in touch
with the idea of generality for the first time; namely, the first identity is not just
the equality of the expressions $(x + y)^2$ and $x^2 + 2xy + y^2$ for certain numbers
$x$ and $y$, but that it is valid for all numbers $x$ and $y$. If you can teach them to
take the latter statement seriously and make them realize that the validity of the
equality when y is replaced by −y immediately implies the validity of the second
identity, then you have taught them something valuable. One may paraphrase by
saying that, because of the generality of the first identity, the first identity already
contains the second identity as a special case. An important part of learning algebra
is to become alert to the potential implications of a general statement. From this
vantage point, the derivation of the second identity from the first now becomes a
noteworthy demonstration of the power of generality.
A third comment is that, in practice, the usefulness of these identities derives, more
often than not, from one’s ability to read them also from
right to left, i.e., the ability to recognize in a given situation that, for any two
numbers $x$ and $y$,
$x^2 + 2xy + y^2$ is equal to $(x + y)^2$,
$x^2 - 2xy + y^2$ is equal to $(x - y)^2$,
$x^2 - y^2$ is equal to $(x + y)(x - y)$.
For example, $25x^2 + 49y^2 - 70xy$ is equal to $(5x - 7y)^2$, because
$25x^2 + 49y^2 - 70xy = (5x)^2 - 2(5x)(7y) + (7y)^2 = (5x - 7y)^2.$
We pause to make another remark about identities. We may rewrite, for example,
the identity $(x - y)(x + y) = x^2 - y^2$ as
$\frac{x^2 - y^2}{x - y} = x + y.$
Remembering that we cannot divide by 0, we see that this equality holds for all
numbers x and y except when x = y. This equality is of course also considered to
be an identity—keeping in mind the exceptions.
We have thus far discussed the concept of an equation informally and the con-
cepts of an expression and an identity in some depth. On pp. 322ff. below, we will
elaborate on the concept of an equation. It is to be noted that our view on these
fundamental concepts in beginning algebra deviates from those typically found in
education research; see, e.g., [McCrory et al.].
An important identity
Let us rewrite this identity in the following way: for all numbers x and y,
(6.5) $x^{n+1} - y^{n+1} = (x - y)(x^n + x^{n-1}y + x^{n-2}y^2 + \cdots + xy^{n-1} + y^n).$
Thus, the difference of two numbers $x$ and $y$ raised to the same power can
always be expressed as a product of $x - y$ and $x^n + x^{n-1}y + \cdots + y^n$. By any
measure, this is a nice-looking identity. We will examine its consequences in some
detail in two different settings, and the two groups of results (which would seem
to be unrelated to each other) will nicely illustrate the concept of generality.
Since this identity is valid for all numbers $x$ and $y$, it is certainly valid when
$x$ and $y$ are positive integers. In that case, observe in particular that the right
side of (6.5) is a product of two integers. If it happens that $x > y > 0$, then also
$x^{n+1} > y^{n+1} > 0$ (see Exercise 11 on page 132) and the identity says that the
positive integer $x^{n+1} - y^{n+1}$ has a factorization (in the sense of page 138) as the
product of two positive integers: $x - y$ and $x^n + x^{n-1}y + \cdots + xy^{n-1} + y^n$. For the
sake of clarity, we restate it as follows: for all positive integers $a$, $b$, and $n$, so that
$a > b > 0$, we have the following factorization of the positive integer $a^{n+1} - b^{n+1}$,
which is a special case of identity (6.5):
(6.6) $a^{n+1} - b^{n+1} = (a - b)(a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n).$
If $a - b > 1$, then in particular $a > 1$ so that $a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n > 1$.
Therefore the positive integer $a^{n+1} - b^{n+1}$, being the product of two integers each
bigger than 1, is not a prime (page 148). We have thus proved the following.
Lemma 6.1. If $a$, $b$ are positive integers and $a - b > 1$, then for all positive
integers $n$, $a^{n+1} - b^{n+1}$ is not a prime.
6.1. SYMBOLIC EXPRESSIONS 307
For example, $127^3 - 66^3$ = 1,760,887, and this lemma guarantees that 1,760,887
is not a prime. By no means is this fact obvious because its smallest divisor greater than 1 is 61.
In fact, the prime decomposition (see Theorem 3.6 on page 149) of 1,760,887 is
61 × 28867.
In a similar vein, you can show off to your friends by challenging them to check
whether 13,997,513 is a prime. You of course know that it is not a prime because
13,997,513 = $241^3 - 2^3$.
What makes the testing of the primality of this number difficult is that the prime
decomposition of 13,997,513 is 239 × 58567. In other words, its smallest divisor greater than 1
is 239, so that guess-and-check will not work efficiently in this case. (Again, the
identity $241^3 - 2^3 = (241 - 2)(241^2 + 241 \cdot 2 + 2^2)$ also happens to exhibit the prime
decomposition of 13,997,513.)
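The two numerical examples can be checked by machine. What follows is only a minimal Python sketch and not part of the original exposition; the names `cofactor` and `smallest_factor` are hypothetical, chosen for illustration. It confirms that the two factors in (6.6) multiply back to $a^{n+1} - b^{n+1}$ in both examples.

```python
def cofactor(a: int, b: int, n: int) -> int:
    """Return a^n + a^(n-1) b + ... + a b^(n-1) + b^n, the second factor in (6.6)."""
    return sum(a ** (n - k) * b ** k for k in range(n + 1))


def smallest_factor(m: int) -> int:
    """Smallest divisor of m that is greater than 1, found by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m), so m itself is a prime


for a, b, n in [(127, 66, 2), (241, 2, 2)]:
    value = a ** (n + 1) - b ** (n + 1)
    # Identity (6.6): the product of the two factors reproduces a^(n+1) - b^(n+1).
    assert value == (a - b) * cofactor(a, b, n)
    print(value, "=", a - b, "x", cofactor(a, b, n))
```

Trial division, as in `smallest_factor`, is exactly the guess-and-check that the text calls inefficient; the identity hands us a factor without any search.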
Activity. Let $a$ and $b$ be integers. Does $(a^5 - b^5)$ divide $a^{15} - b^{15}$ in the sense
of page 138? Does $(a^2 + ab + b^2)$ divide $a^{15} - b^{15}$?
Suppose $b = 1$ in identity (6.6). Then we get, for all positive integers $a$ and $n$,
$a^{n+1} - 1 = (a - 1)(a^n + a^{n-1} + \cdots + a + 1).$
Mersenne primes
Lemma 6.2 explains why $2^4 - 1$ (= 15) and $2^6 - 1$ (= 63) in the above list are
not primes.
In view of Lemma 6.2, our original question about which of $2^{n+1} - 1$ are primes
can now be simplified to the following:
Which of $2^p - 1$ are primes, as $p$ runs through all the primes?
In 1644, Father Mersenne$^{12}$ claimed that, among all the primes $p < 258$, $2^p - 1$
is a prime exactly when
$p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257.$
Keep in mind that the computer is a creation of the latter part of the twentieth
century, so it was a nontrivial matter in the days of Mersenne to check the primality
of a (whole) number with, say, ten digits such as $2^{31} - 1$ = 2,147,483,647. Mersenne
probably did not test the primality of all the numbers he wrote down, and his
statement was likely a mixture of guessing and wishful thinking. It is not even
clear whether he actually proved that his list was correct for $p \leq 31$ because, if he
did, he would have to verify the following two assertions:
(A) The number $2^p - 1$ is composite for $p$ equal to 11, 23, and
29.
(B) The number $2^p - 1$ is prime for $p$ equal to 13, 17, 19, 31 (the
cases of $p = 2, 3, 5, 7$ are obvious).
For (A), the fact that $2^{11} - 1 = 23 \cdot 89$ was known back in 1536, and the fact that
$2^{23} - 1$ is composite was found by Fermat$^{13}$ in 1640 as a refutation of the opposite
claim made by Pietro Cataldi (1548–1626) in 1603. The case of $2^{29} - 1$ was not
known at the time Mersenne made his conjecture, and it would stay that way until
Euler,$^{14}$ almost a century after Mersenne made his conjecture, managed to find the
12 Marin Mersenne (1588–1648) was a French theologian and amateur mathematician. He was
a friend of all the French mathematical luminaries of his time, including Descartes, Fermat, Pascal,
and Desargues, and performed the critical service of disseminating mathematical information
among them at a time when mathematical publications were basically nonexistent.
13 Pierre Fermat (1607–1665) was a French lawyer; he was also an amateur mathematician
but one of the greatest mathematicians of all time nonetheless. The terminology of "Cartesian
coordinates" masks the fact that Fermat was a codiscoverer of analytic geometry with Descartes;
in fact Fermat made the discovery a few years earlier and seemed to have a better understanding
of the potential of his discovery. Fermat was a cofounder of the theory of probability (with Blaise
Pascal), and the modern theory of numbers also owes its existence to Fermat. We already had
occasion to mention Fermat’s Last Theorem on page 300. His definition of the tangent to a curve
at a point inspired Newton’s definition of the derivative (see Theorem 6.19 in [Wu2020c]).
14 Leonhard Euler (1707–1783), the most productive mathematician of all time, was the
dominant mathematician of the eighteenth century and rightfully ranks among the greatest. He
made important contributions in every part of mathematics, physics, and astronomy as they were
known in his time.
prime decomposition $2^{29} - 1 = 233 \cdot 1103 \cdot 2089$ in 1738. As for (B), the primality
of $2^{13} - 1 = 8191$ only takes a little patience and was known in any case as far back
as the fifteenth century. The primality of $2^{17} - 1$ and $2^{19} - 1$ was verified by the
same Cataldi in 1588. But the primality of $2^{31} - 1$ was finally proved only in 1772
by Euler, some 130 years after Mersenne made his conjecture.
With the help of an electronic computer, we can easily see that Mersenne’s list
contains five errors: 61, 89, 107 should have been on his list but they were not,
and the numbers 67 and 257 which were on his list should not have been there (i.e.,
$2^{67} - 1$ and $2^{257} - 1$ are composites).
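To see how little effort this takes today, here is a hedged Python sketch. It relies on the Lucas-Lehmer test, a standard primality test for numbers of the form $2^p - 1$ that is not discussed in this book; we use it here as a black box, and the function names are our own. For an odd prime $p$, the test says that $2^p - 1$ is a prime exactly when the $(p-2)$-th term of the sequence $s_0 = 4$, $s_{k+1} = s_k^2 - 2$ is divisible by $2^p - 1$.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small exponents p used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p: int) -> bool:
    """Return True when 2^p - 1 is a prime, for a prime exponent p."""
    if p == 2:
        return True  # 2^2 - 1 = 3 is a prime; the recurrence below needs p odd
    m = (1 << p) - 1  # the number 2^p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


mersenne_exponents = [p for p in range(2, 258) if is_prime(p) and lucas_lehmer(p)]
print(mersenne_exponents)
# [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

Comparing the printed list with Mersenne's shows the five discrepancies: 61, 89, and 107 appear here but not on his list, while 67 and 257 appear on his list but not here.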
As a result of Mersenne’s list, a number of the form $2^p - 1$ which is a prime is
called a Mersenne prime. You can get more information about Mersenne primes
from http://mersenne.org/.
So what are the Mersenne primes? Unfortunately, we know very little about
this question. We do not even know if there are an infinite number of Mersenne
primes. A search for bigger and bigger Mersenne primes is an ongoing enterprise
(see the website above), and one of the side benefits of this search is that each
time a larger Mersenne prime was found, it also turned out to be the largest prime
number known to mankind. As of January 2020, 51 Mersenne primes are known,
and the largest is a number with more than 24 million digits corresponding to the
prime p = 82,589,933 (discovered on December 7, 2018). From a mathematical per-
spective, this search would acquire greater significance if we knew that the number
of Mersenne primes is finite.
We may assume that in both (6.9) and (6.10), $x \neq 0$ because the case of $x = 0$
is not interesting. Recalling$^{16}$ that $1 = x^0$, we may consider this identity as the
expression of the sum of all the whole number powers of $x$ up to and including $x^n$
as a quotient $(x^{n+1} - 1)/(x - 1)$. For example, if $x = -3$ and $n = 12$, then
$1 - 3 + 3^2 - 3^3 + 3^4 - \cdots - 3^{11} + 3^{12} = \frac{(-3)^{13} - 1}{-3 - 1} = \frac{1594324}{4} = 398581.$
But if $x = \frac{3}{4}$ and $n = 15$, then we have
$1 + \frac{3}{4} + (\frac{3}{4})^2 + (\frac{3}{4})^3 + \cdots + (\frac{3}{4})^{15} = \{(\frac{3}{4})^{16} - 1\}/(\frac{3}{4} - 1).$
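Both numerical examples can be confirmed with exact rational arithmetic. The following Python sketch is ours, not the book's; `power_sum` and `closed_form` are hypothetical names. It adds the powers one by one and compares the result with the quotient $(x^{n+1} - 1)/(x - 1)$, assuming $x \neq 1$.

```python
from fractions import Fraction


def power_sum(x, n: int):
    """Add 1 + x + x^2 + ... + x^n term by term."""
    return sum(x ** k for k in range(n + 1))


def closed_form(x, n: int):
    """The quotient (x^(n+1) - 1)/(x - 1); requires x != 1."""
    return (x ** (n + 1) - 1) / (x - 1)


x1 = Fraction(-3)
print(power_sum(x1, 12), closed_form(x1, 12))  # 398581 398581

x2 = Fraction(3, 4)
print(power_sum(x2, 15) == closed_form(x2, 15))  # True
```

Using `Fraction` rather than floating-point numbers keeps every step exact, so the comparison is a genuine equality of rational numbers and not an approximation.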
Mathematical Aside: It is worthwhile to point out that the identity (6.5) also
gives a short proof of the calculus fact that the derivative of $x^{n+1}$ (where $n$ is
a positive integer) is equal to $(n + 1)x^n$. Briefly, the proof goes as follows (see
the proof of Theorem 6.16 in [Wu2020c]). The derivative of $x^{n+1}$ at $a$ is the
limit of the difference quotient $(x^{n+1} - a^{n+1})/(x - a)$ as $x$ goes to $a$. Because
of (6.5), the numerator of the difference quotient is equal to the product of $(x - a)$
and $x^n + x^{n-1}a + \cdots + xa^{n-1} + a^n$. Thus the difference quotient itself becomes
$x^n + x^{n-1}a + \cdots + xa^{n-1} + a^n$ after $(x - a)$ has been canceled from the numerator
and denominator. So as $x$ converges to $a$, we get $a^n + a^n + \cdots + a^n$ ($n + 1$ times),
which is $(n + 1)a^n$.
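The cancellation step in this aside can be watched numerically. The sketch below is an illustration under our own choice of numbers ($a = 2$, $n = 4$) and hypothetical function names; it checks that the difference quotient agrees with the canceled form for several $x \neq a$, and that the canceled form takes the value $(n + 1)a^n$ at $x = a$.

```python
from fractions import Fraction


def difference_quotient(x, a, n: int):
    """(x^(n+1) - a^(n+1)) / (x - a), defined only for x != a."""
    return (x ** (n + 1) - a ** (n + 1)) / (x - a)


def cancelled_form(x, a, n: int):
    """x^n + x^(n-1) a + ... + x a^(n-1) + a^n, which also makes sense at x = a."""
    return sum(x ** (n - k) * a ** k for k in range(n + 1))


a, n = Fraction(2), 4
for x in [Fraction(3), Fraction(21, 10), Fraction(2001, 1000)]:
    # By (6.5) the two expressions agree whenever x != a.
    assert difference_quotient(x, a, n) == cancelled_form(x, a, n)
    print(float(x), float(cancelled_form(x, a, n)))

print(cancelled_form(a, a, n), (n + 1) * a ** n)  # 80 80
```

A finite list of sample values is of course not a proof of the limit statement; it only makes the algebra in the aside concrete.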
We next introduce polynomials, but first a remark before the formal definition.
Underlying the whole discussion of polynomials in school algebra is a basic technique
16 For a fuller discussion of the 0-th power of x, see Section 4.2 in [Wu2020b].
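$24 \times 59^{14} - \left(\frac{3}{5}\right)^8 \times 89 + 59^{14} \times 73 + 59^{14} \times 66 + 25 \times \left(\frac{3}{5}\right)^8 + \left(\frac{3}{5}\right)^8 \times 11$
as
$163 \times 59^{14} - 53 \times \left(\frac{3}{5}\right)^8,$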
where 163 = 24 + 73 + 66 and −53 = −89 + 25 + 11. Recall once more that we refer
to both of the above expressions as a "sum" because subtraction is just addition in
disguise (see page 96).
In an entirely similar manner, suppose we are given a sum of multiples of nonnegative
integer powers of a fixed number $x$, where multiple here means simply
multiplication by any number and not necessarily by a whole number and "nonnegative
integers" refers to the whole numbers 0, 1, 2, . . . . Then we would automatically
collect together the terms involving the same power of $x$ as before. For example,
we will write
$\frac{1}{2}x^3 + 16 - 8x^2 + \frac{1}{3}x^3 - x^5 - 6x^2 + 75x + 2x^3$
as
(6.11) $-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16.$
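Collecting like terms is a purely mechanical application of the distributive law, and it can be sketched in a few lines of Python. The representation of a term as a (coefficient, exponent) pair and the name `collect_like_terms` are our own assumptions, made only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction


def collect_like_terms(terms):
    """Combine (coefficient, exponent) pairs that share the same power of x."""
    combined = defaultdict(Fraction)  # a missing power starts with coefficient 0
    for coefficient, exponent in terms:
        combined[exponent] += coefficient
    # List the result in decreasing powers and drop zero coefficients.
    return {e: c for e, c in sorted(combined.items(), reverse=True) if c != 0}


# The eight terms of the sum displayed just before (6.11), in the order written.
terms = [
    (Fraction(1, 2), 3), (Fraction(16), 0), (Fraction(-8), 2), (Fraction(1, 3), 3),
    (Fraction(-1), 5), (Fraction(-6), 2), (Fraction(75), 1), (Fraction(2), 3),
]
polynomial = collect_like_terms(terms)
print(polynomial)
```

The resulting dictionary pairs the exponents 5, 3, 2, 1, 0 with the coefficients $-1$, $\frac{17}{6}$, $-14$, 75, 16, in agreement with (6.11).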
A sum of multiples of nonnegative integer powers of $x$ is called a polynomial in
$x$.$^{17}$ We also agree to call any expression in a number $x$ a polynomial in $x$ if it
is equal to a sum of multiples of nonnegative integer powers of $x$ after applications
of the associative, commutative, and distributive laws to the expression. Thus the
expression in (6.11) is a polynomial in $x$, as is $(x - 1)(x + 2)$ because it is equal to
$x^2 + x - 2$ after expansion. We have observed three notational conventions that
are generally followed in discussions about polynomials:
(A) Parentheses are usually suppressed, with the understanding
that exponents are computed first, multiplications second, and
additions third.
(B) The earlier convention is that the power of $x$ is placed last
in each term, so that we write $-14x^2$ instead of $x^2(-14)$.
(C) The terms are written in decreasing powers of the number $x$
in question. (The term 16 is the term $16x^0$, where, by definition,
$x^0 = 1$; incidentally, this is where we need the concept of the
zeroth power of a number$^{18}$.)
In this connection, a caveat about (C) should be mentioned: a polynomial is sometimes
written in increasing powers of the variable for a reason.$^{19}$
As on page 303, the number in front of a power of $x$ is called the coefficient of
that particular power of $x$. The polynomial in (6.11) is in reality equal to
$(-1)x^5 + 0 \cdot x^4 + \frac{17}{6}x^3 + (-14)x^2 + 75x + 16x^0$
when it is written strictly as a sum of multiples of decreasing powers of $x$. Therefore,
the coefficient of $x^5$ in (6.11) is $-1$, the coefficient of $x^4$ is 0 (remember, $x$ is just
a number, so that $0 \cdot x^4 = 0$), the coefficient of $x^3$ is $\frac{17}{6}$, the coefficient of $x^2$ is
$-14$, the coefficient of $x$ is 75, and the so-called constant term 16 is actually the
coefficient of $x^0$. A multiple of a single nonnegative power of $x$, such as $58x^{12}$, is
called a monomial. Thus, a monomial is a polynomial with only one term. The
highest power of $x$ with a nonzero coefficient in a polynomial is called the degree
of the polynomial. The terminology about "nonzero coefficient" refers to the fact
that the preceding polynomial $-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$ could be written as
$0 \cdot x^{37} - x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$, but the 37-th power of $x$ clearly does not count.
17 Mathematical Aside: This definition of a polynomial with real coefficients is the appropriate
one for school mathematics and is based on the fact that the polynomial ring over R is ring-
isomorphic to the ring of R-valued polynomial functions.
18 Note that in this case, we have to make an ad hoc definition by agreeing to write $x^0 = 1$
in increasing powers.
This polynomial has degree 5, and not 37 (and not any whole number different from
5, for that matter).
This is the place to make a comment on the notational convention (A) above.
As we said earlier, visual clarity in the notation we use is important. We find the
polynomial
$-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$
easy to work with because it is visually simple. As written, though, this symbolic
expression is a priori ambiguous because it could mean, among other things, the
following:
$(-x)^5 + \left(\frac{17}{6}x\right)^3 - \{(14x)^2 + 75\}x + 16.$
But of course what we have in mind is
$\{-(x^5)\} + \frac{17}{6}(x^3) + \{-14(x^2)\} + \{75x\} + 16.$
(We need not specify the order of doing the additions at this point because of the
general associative law; cf. Theorem 1 in the appendix of Chapter 1, page 87.) So
the net effect of notational convention (A) is to eliminate the need to write the
last cumbersome-looking expression by declaring that the expression
$-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$ already means the same thing. This is all there is to the notational
convention (A). It is a convention, and one should not invest more mathematical
significance in a convention than what it truly is.
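That the two groupings really are different expressions can be seen by evaluating them at a single number. The Python sketch below is ours; the three function names are hypothetical, and the alternative grouping is the one displayed above as we read it. It is worth noting that Python itself happens to follow convention (A), so the unparenthesized line agrees with the intended reading.

```python
from fractions import Fraction


def intended(x):
    """Convention (A) made explicit: exponents, then multiplications, then additions."""
    return -(x ** 5) + Fraction(17, 6) * (x ** 3) + (-14 * (x ** 2)) + 75 * x + 16


def other_reading(x):
    """The alternative grouping (-x)^5 + ((17/6)x)^3 - {(14x)^2 + 75}x + 16."""
    return (-x) ** 5 + (Fraction(17, 6) * x) ** 3 - ((14 * x) ** 2 + 75) * x + 16


def unparenthesized(x):
    """The same polynomial typed without any grouping symbols."""
    return -x ** 5 + Fraction(17, 6) * x ** 3 - 14 * x ** 2 + 75 * x + 16


x = 2
print(intended(x), other_reading(x), unparenthesized(x))
# 302/3 -41905/27 302/3
```

The convention settles which number the string of symbols denotes; it carries no further mathematical content.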
As is well known, school mathematics is sometimes led astray by misplaced
emphases. Convention (A) has somehow become enshrined in the middle school
curriculum under the name of order of operations. In TSM, mnemonic devices
were created to help students memorize it (PEMDAS, "Please Excuse My Dear
Aunt Sally"), and standard assessments likewise help promote the importance
of this convention. A mathematics classroom has to deal with conventions,
of course, but a convention should be put in its proper place and not be magnified
out of proportion. A more moderate approach would be to explain the genesis of
the convention (A) above, quiz students at the beginning to make sure they get it,
and go on to more important things. For a fuller discussion, see [Wu2004b] on the
so-called "order of operations".
A polynomial of degree 1 is called a linear polynomial, and a polynomial of
degree 2 is called a quadratic polynomial. Because a general quadratic polynomial
has only three terms, $ax^2 + bx + c$, it is also called a trinomial in school
mathematics. However, the term "trinomial" is used only in school mathematics,
not in higher mathematics; it should therefore be avoided in general discussions.
We will discuss quadratic polynomials in some detail in Chapter 2 of [Wu2020b].
A polynomial of degree 3 is called a cubic polynomial.
The most familiar polynomials are the so-called expanded forms of whole numbers;
these are polynomials in the number 10. For example, the expanded form of
75,018 is
$(7 \times 10^4) + (5 \times 10^3) + (0 \times 10^2) + (1 \times 10^1) + (8 \times 10^0)$
which is a fourth-degree polynomial in 10. Of course the expanded form of any
$k$-digit whole number is a polynomial of degree $(k - 1)$ in 10. On the other hand,
Rational expressions
on page 119):
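$\frac{0.5x^3 + 1}{x^8 + x - 2} + \frac{2x^7}{x^3 + \frac{3}{7}} = \frac{(0.5x^3 + 1)(x^3 + \frac{3}{7}) + (2x^7)(x^8 + x - 2)}{(x^8 + x - 2)(x^3 + \frac{3}{7})}$
and
$\frac{\frac{3}{2}x^2 + 1}{x^2 + 4x - 7} \cdot \frac{6}{3x^4 - 5} = \frac{(\frac{3}{2}x^2 + 1)(6)}{(x^2 + 4x - 7)(3x^4 - 5)}$
and
$\frac{\;\frac{2x + 1}{x^2 - 0.3}\;}{\frac{4x^3 - x + 11}{2x}} = \frac{(2x + 1)(2x)}{(x^2 - 0.3)(4x^3 - x + 11)}.$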
These are exactly the same as any computation with rational numbers. This re-
alization is important in the teaching of algebra because, when you teach rational
expressions, you should remind students that if they know how to handle rational
numbers, then they already know all there is to know about this topic. There is so
much in introductory algebra that is just a revisit of arithmetic.
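The point that rational expressions obey the same formulas as rational quotients can be made concrete. In the hedged Python sketch below, a quotient is stored as a (numerator, denominator) pair; this representation and the function names `add`, `multiply`, `divide`, and `value` are our own assumptions. Once a specific number is substituted for $x$, the three displayed computations become ordinary arithmetic with rational numbers.

```python
from fractions import Fraction


def add(p, q):
    """(a/b) + (c/d) = (ad + cb)/(bd)."""
    (a, b), (c, d) = p, q
    return (a * d + c * b, b * d)


def multiply(p, q):
    """(a/b)(c/d) = (ac)/(bd)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)


def divide(p, q):
    """(a/b) / (c/d) = (ad)/(bc), assuming c is nonzero."""
    (a, b), (c, d) = p, q
    return (a * d, b * c)


def value(p):
    """The number that the pair (numerator, denominator) stands for."""
    numerator, denominator = p
    return numerator / denominator


x = Fraction(3)  # any number that keeps every denominator below nonzero

# The sum: (0.5x^3 + 1)/(x^8 + x - 2) + 2x^7/(x^3 + 3/7).
s1 = (Fraction(1, 2) * x**3 + 1, x**8 + x - 2)
s2 = (2 * x**7, x**3 + Fraction(3, 7))
assert value(add(s1, s2)) == value(s1) + value(s2)

# The product: ((3/2)x^2 + 1)/(x^2 + 4x - 7) times 6/(3x^4 - 5).
m1 = (Fraction(3, 2) * x**2 + 1, x**2 + 4 * x - 7)
m2 = (Fraction(6), 3 * x**4 - 5)
assert value(multiply(m1, m2)) == value(m1) * value(m2)

# The quotient: (2x + 1)/(x^2 - 0.3) divided by (4x^3 - x + 11)/(2x).
d1 = (2 * x + 1, x**2 - Fraction(3, 10))
d2 = (4 * x**3 - x + 11, 2 * x)
assert value(divide(d1, d2)) == value(d1) / value(d2)
print("all three formulas agree at x =", x)
```

Nothing in the three functions refers to $x$ at all, which is the content of the remark that students who can handle rational numbers already know how to handle rational expressions.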
Because the cancellation law is valid for rational quotients (item (a) on page 118,
which says $\frac{AB}{AC} = \frac{B}{C}$ for all rational numbers $A$, $B$, $C$, with $A \neq 0$, $C \neq 0$), some
rational expressions can be simplified. For example,
$\frac{\frac{1}{8}x^3 - 1}{x^2 + 2x + 4}$
can be simplified to $\frac{1}{8}(x - 2)$ because, by identity (6.5) on page 306,
$\frac{1}{8}x^3 - 1 = \frac{1}{8}(x^3 - 8) = \frac{1}{8}(x^3 - 2^3) = \frac{1}{8}(x - 2)(x^2 + 2x + 4)$
and we can cancel the number $(x^2 + 2x + 4)$ from the numerator and denominator.
(As we know from the theory of quadratic equations, e.g., Section 2.1 in [Wu2020b],
this particular rational expression in $x$ actually makes sense for all (real) numbers
$x$ because $x^2 + 2x + 4$ has a negative discriminant and is therefore never equal to
0. Consequently, the identity $\frac{x^3 - 8}{x^2 + 2x + 4} = x - 2$ is in fact an identity for all (real)
numbers $x$.)
In beginning algebra, often there is too much emphasis on simplifying rational
expressions. This is a leftover from the ill-informed practice of teaching fractions
by insisting on the reduction of all fractions to lowest terms.
It remains to round off this discussion by mentioning that one can easily define
polynomials in several numbers x, y, z, etc., and therefore one can likewise
define rational expressions in x, y, z, etc.
to "open sentences" in the last item of the preceding list is part of the erroneous
thinking that led to the problem of asking students to interpret a string of symbols
such as $y = \sqrt{3x - 7}$ on page 299.
You may have gotten the idea by now that, in order to make mathematics
learnable, we must express mathematical messages clearly and precisely. But if
clarity and precision are what you are after, then TSM would be the wrong place to
look because, for instance, none of the preceding so-called definitions of a "variable"
has the necessary clarity and precision that we normally demand of a mathematical
definition. A "variable" is certainly not "a quantity that changes or varies", because
nothing in mathematics ever changes or varies. Let us get down to the details of this
assertion. Anticipating a later discussion about functions, suppose we want to say
one real-valued function $f$ is less than another function $g$; i.e., in symbols, $f < g$.
In the spirit of "a quantity that changes or varies", this is a formidable statement
about one quantity $f(x)$ that varies being smaller than another quantity $g(x)$ that
also varies. This suggests that, even when two quantities wiggle all over the place,
one can discern in some mysterious fashion that one is "smaller" than the other.
This is heady stuff, but for the purpose of understanding and doing mathematics,
the correct definition of $f < g$ is much simpler and much more mundane, and it
is this: for every point $x_0$ in their common domain of definition, the two numbers
$f(x_0)$ and $g(x_0)$ satisfy $f(x_0) < g(x_0)$. In other words, just check one point at a
time, so that at each $x_0$, $f(x_0) < g(x_0)$. Nothing varies.
The truth is that a variable is not a mathematical concept. It is rooted
in tradition and was used as suggestive language back in the days when the very
concept of a function had not been clearly formulated and mathematics did not
possess the transparency and precision that it does today. In the year 2020, the
"concept of variable" is kept alive as an integral part of mathematics only in TSM.
It is time to teach students how to use symbols correctly and not allow this bogus
"concept" to wreak havoc on student learning.
It is possible that you consider this insistence on the proper use of symbols—
the basic etiquette in the use of symbols (page 299)—to be nothing more than a
piece of pedantry flaunted only by professional mathematicians. To dispel this
misconception, consider a typical passage from TSM:21
y = −3 implies y sin x = −3 sin x, for all real values of x but
y sin x = −3 sin x does not imply y = −3 because the principle
that ac = bc a = b [sic] does not apply when c = 0 (i.e., a(0) =
b(0) does not imply that a = b). ([MUST], page 395)22
We will take for granted that x and y are numbers. Consider the second phrase,
but y sin x = −3 sin x does not imply y = −3.
Without knowing what x is, this phrase can have no meaning; it may be true or
it may be false, as the ensuing discussion shows. Let us try to properly quantify
the symbol x. In context, the beginning of the passage ("y = −3 implies y sin x =
−3 sin x, for all real values of x") suggests that, perhaps, what is intended is the
21 As noted in the footnote on page 303, we continue to make use of some mathematics that
we have not yet discussed, but which you most likely know, to illustrate a point.
22 There is an obvious typo in the original, which states "x = −3 implies x sin x = −3 sin x
but x sin x = −3 sin x does not imply x = −3". We have done our best to make sense of it.
statement that "but $y \sin x = -3 \sin x$ for all real values of $x$ does not imply $y = -3$".
If so, then this statement is incorrect since by letting $x = \frac{\pi}{2}$, we get $y \cdot 1 = -3 \cdot 1$
so that we get $y = -3$, contrary to the claim.
Now it is possible that what is intended is, instead, the following:
but if x = π, then y sin x = −3 sin x does not imply y = −3.
Indeed, we then get $y \cdot 0 = -3 \cdot 0$, so that $0 = 0$ and one cannot reach the conclusion
that $y = -3$. This would be consistent with the last part of the passage, although
if this is the case, it would be far simpler to replace the whole passage by a direct
statement: "if $a$, $b$, and $c$ are real numbers, then unless we know $c \neq 0$, the equality
$ac = bc$ does not imply $a = b$."
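The direct statement is about three specific numbers at a time, and it can be tested as such. The small Python sketch below is ours, with the hypothetical name `cancelling_is_safe`; it checks the implication "$ac = bc$ implies $a = b$" for one triple of numbers, using $c = 0$ in the role played by $\sin \pi$ in the passage.

```python
def cancelling_is_safe(a, b, c) -> bool:
    """Check the claim "ac = bc implies a = b" for one specific triple of numbers."""
    if a * c != b * c:
        return True  # the hypothesis fails, so the implication holds vacuously
    return a == b


print(cancelling_is_safe(5, -3, 0))   # False: 5 * 0 = -3 * 0, yet 5 != -3
print(cancelling_is_safe(5, -3, 1))   # True: the hypothesis 5 * 1 = -3 * 1 is false
print(cancelling_is_safe(-3, -3, 1))  # True: both hypothesis and conclusion hold
```

Every symbol in the function stands for a definite number, which is all that proper quantification asks for.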
In summary, we see that the above passage is meaningless because symbols
are used without proper quantification. Moreover, when the symbols are properly
quantified, the passage becomes either wrong or trivial. But we should not take the
easy way out by ascribing such errors to the carelessness of the authors of the pre-
ceding passage. This must be recognized for what it is, a systemic error in TSM.
Those of us who allow TSM to be taught to generations of students and ensure
that TSM (rather than mathematics) remains the enduring content component of
mathematics education research are the main culprits. Let us do better and teach
the basic etiquette in the use of symbols in mathematics education.
Exercises 6.1.
(1) (a) Show that every positive integer can be expressed as $3k$, $3k+1$, or $3k+2$
for a whole number $k$. (Hint: Use division-with-remainder.) (b) Use (a)
to show that there are no positive integers $x$ and $y$ so that $y = \sqrt{3x - 7}$.
(c) Show that there are an infinite number of fractions $x$ and $y$ so that
$y = \sqrt{3x - 7}$.
(2) Suppose you know $(x - y)^2 = x^2 - 2xy + y^2$ for all numbers $x$ and $y$. Using
this fact alone, prove that $(x + y)^2 = x^2 + 2xy + y^2$ for all numbers $x$ and
$y$.
(3) If $x$ is a nonzero number and $n$ is a positive integer, what is
$-1 + \frac{1}{x^3} - \frac{1}{x^6} + \frac{1}{x^9} - \frac{1}{x^{12}} + \cdots + \frac{1}{x^{6n-3}} - \frac{1}{x^{6n}}$?
(4) Find the sum of
$\frac{5^7}{6^8} - \frac{5^8}{6^9} + \frac{5^9}{6^{10}} - \frac{5^{10}}{6^{11}} + \cdots - \frac{5^{32}}{6^{33}}.$
(5) If $x$ is a number that makes all the denominators nonzero in the following,
simplify
$\frac{(2x^3 - 9x^2 - 5x)/(x - 2)^2}{(x^2 - 3x - 10)/(x^4 - 16)}.$
(6) If $x$ and $y$ are numbers and $x \neq \pm y$, what does $\frac{1}{x^2 - y^2} + \frac{y}{x^3 + y^3}$ equal?
Simplify your answer as much as possible.
(7) It was known to Archimedes that for any positive integer $k > 1$ and for
any positive integer $n$,
$1 + \frac{1}{k} + \frac{1}{k^2} + \cdots + \frac{1}{k^n} + \frac{1}{(k - 1)k^n} = \frac{k}{k - 1}.$
(a) Prove that Archimedes’ identity is correct even when $k$ is any number
different from 0 and 1. (b) Conversely, prove that the generalized
Archimedes identity of part (a) implies the summation formula for the
finite geometric series.
(8) We have seen that identity (6.8) on page 309 is a special case of identity
(6.5) on page 306. Now prove the converse: identity (6.8) implies identity
(6.5).
(9) (a) Prove the following identity: $ab = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2$ for all numbers
$a$ and $b$. (b) Use this identity to give a different proof of part (b) of
Exercise 18 in Exercises 2.6, page 132; i.e., among all rectangles with a
fixed perimeter, the square has the biggest area.
(10) Let $a$, $b$ be positive integers, not both equal to 1, and let $n$ be an odd
positive integer > 1. Prove that $a^n + b^n$ is not a prime. Would this hold
if $n$ is even?
(11) (a) Factor $4x^2 - 12x + 9$ and $30x^2 + 16x + 2$ for any number $x$. (b) Factor
$(30x^2 - 16x + 2)^2 - 5(30x^2 - 16x + 2) - 14$ for any number $x$. (c) Factor
$s^4 + s^2t^2 + t^4$ into polynomials in $s$ and $t$ with integer coefficients for any
numbers $s$ and $t$. (d) Factor $s^{4k} + s^{2k}t^{2k} + t^{4k}$ for any positive integer $k$.
(12) Simplify (assuming the denominators below are never zero for the numbers
$x$ and $y$):
(i) $\frac{4x^4 - 9y^4}{4x^4 + 12x^2y^2 + 9y^4}$; (ii) $\frac{15x^3y^4 - x^4y^3}{60x^5y^2 - 4x^4y^3}$; (iii) $\frac{(2x^2 + 7x + 3)/(x^3 - 7x + 6)}{(x + 2)/(x - 1)}$.
(13) How much money is in an account at the beginning of the sixteenth year
if the initial deposit is $500, the annual interest rate is 5%, and at the
beginning of each year starting with the second, $10 is added to the ac-
count? Write down the formula, and then use a scientific calculator to get
a numerical answer.
(17) We are given two whole numbers so that, when the larger number is
divided by the smaller number, the quotient is 9 and the remainder is 15,
and so that the larger number is 97.5% of ten times the smaller number.
If x is the larger number and y is the smaller number, express the given
information in equations in terms of x and y.
(18) A video game manufacturer sells out every game he brings to a game
show. He has two games, an A Game and a B Game. He can bring 50 of
A Games and B Games in total to the show. Each A Game costs $75 to
manufacture and brings in a profit of $125. Each B Game costs $165 to
manufacture and brings in a profit of $185. However, he only has $6,000
to spend on manufacturing. If he brings x A Games and y B Games,
describe in terms of x and y how he can maximize his profit.
What is an equation?
Activity. Prove the assertions about the number of solutions of the given
equations in the preceding paragraph.
When the two expressions are linear polynomials, let us say ax + b and cx + d
in x, where a, b, c, d are given numbers, then the question of whether there is a
number k so that ak + b = ck + d is called a linear equation in one variable,
the variable being x. The given numbers a, b, c, d are the constants (see page 301)
of the equation. The usual representation of this linear equation is
ax + b = cx + d.
Be that as it may, one should not lose sight of the fact that "ax + b = cx + d"
is nothing more than a compact representation of the preceding question. More
generally, if two number expressions in a number x have the property that each
expression becomes an expression of the form ax + b after applications of the com-
mutative, associative, and distributive laws, then the question about whether they
are equal or not will also be called a linear equation in one variable. Thus
the equation in x, 5x − 75 + (2 − 4x) = 6x − (68 + 3x) − 7, is a linear equation in
one variable because it is easily seen to be the linear equation x − 73 = 3x − 75
(5x − 75 + (2 − 4x) = x − 73 and 6x − (68 + 3x) − 7 = 3x − 75).
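The two simplifications in parentheses can be spot-checked by substituting numbers for $x$. The Python sketch below is only an illustration with hypothetical function names; agreement at a list of sample values is evidence rather than a proof, since the proof is the application of the commutative, associative, and distributive laws mentioned above.

```python
from fractions import Fraction


def left_side(x):
    """The expression 5x - 75 + (2 - 4x), exactly as written."""
    return 5 * x - 75 + (2 - 4 * x)


def right_side(x):
    """The expression 6x - (68 + 3x) - 7, exactly as written."""
    return 6 * x - (68 + 3 * x) - 7


samples = [Fraction(n, 7) for n in range(-50, 51)]
assert all(left_side(x) == x - 73 for x in samples)
assert all(right_side(x) == 3 * x - 75 for x in samples)

# The equation asks whether some number k makes the two sides equal.
k = Fraction(1)
print(left_side(k), right_side(k))  # -72 -72
```

The last two lines treat the equation as the question it is: for the particular number 1, both sides are equal to $-72$.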
What has been said about equations in one variable extends to equations in any
number of variables. For our need, we single out the case of two variables. Thus, an
equation in two variables $x$ and $y$, normally written as $f(x, y) = g(x, y)$, is a
question that asks, when two expressions $f(x, y)$ and $g(x, y)$ in the numbers $x$ and $y$
are given, whether there is a pair of numbers $k$ and $\ell$ so that $f(k, \ell) = g(k, \ell)$. Such
a pair $k$ and $\ell$ is called a solution of $f(x, y) = g(x, y)$. To solve the equation
$f(x, y) = g(x, y)$ is to find all its solutions.
equation is! A second reason is more subtle. We have emphasized all along the need
for precision and transparency to make school mathematics more learnable. Now,
an equation such as $x^2 = 7x - 10$ has a "variable" $x$ on each side, and a "variable"
is something that "varies", according to TSM. So if both of the expressions $x^2$ and
$7x - 10$ are "varying" all the time, what does it mean to say that they are "equal"?
It certainly does not mean that no matter how $x$ varies, the equality $x^2 = 7x - 10$
holds for all values of $x$ (try $x = 0$, for instance). It is a fact that TSM does not
even explain what the equality $x^2 = 7x - 10$ involving a variable $x$ is all about. Yet,
by routinely asking students to solve such an "equation", TSM is as good as telling
them not to ask what an "equation" is$^{23}$ but just go ahead and "do something with
it to get an answer". We can all agree that it is imperative to remove this mindset
from school classrooms. To this end, a first step will be to offer students a precise
definition of an equation and, strictly on the basis of this definition, show them how
to solve the equation. We will do just that in the next subsection.
We note that this definition of an equation in a symbol is not the traditional
one in education research (cf. [McCrory et al.] and further references therein). End
of Pedagogical Comments.
In this subsection, we will limit ourselves to solving linear equations of the form
ax + b = cx + d. The main theorem about solutions to a linear equation is Theorem
6.3 on page 326. Note, however, that the reason we spend the effort to explain how
to solve linear equations is that the same principle will be broadly applicable to the
solution of any equation. Therefore it will be effort well spent.
In TSM, a linear equation in the number x such as
(6.13) $3x - 4 = \frac{27}{5}x + 1$
is supposed to be solved via the following easily recognizable steps.
(a) $-\frac{27}{5}x + (3x - 4) = -\frac{27}{5}x + (\frac{27}{5}x + 1)$.
(b) $-\frac{12}{5}x - 4 = 1$.
(c) $(-\frac{12}{5}x - 4) + 4 = 1 + 4$.
(d) $-\frac{12}{5}x = 5$.
(e) $(-\frac{5}{12})(-\frac{12}{5}x) = (-\frac{5}{12}) \cdot 5$.
(f) $x = -\frac{25}{12}$.
The conclusion is that $-\frac{25}{12}$ is the solution of the equation $3x - 4 = \frac{27}{5}x + 1$.
Perhaps you are so accustomed to this routine method of solution that you can
no longer see its many flaws or imagine how it can stultify a beginner’s mathematics
learning. Let us consider a few questions that a beginner might raise:
(A) How do we know that the "solution" in step (f), i.e., $-\frac{25}{12}$, is a solution
of equation (6.13)? Nothing in the computations given in (a)–(f) shows that $-\frac{25}{12}$
satisfies equation (6.13) or that it is the only solution.
(B) Recall that, in TSM, the x is a "variable". The six steps (a)–(f) make the
assumption that we can do arithmetic with the "variable" x as if it were a number.
If one interprets x as a "quantity that varies", then this makes no sense because,
23 Once again, "Ours is not to reason why."
6.2. SOLVING LINEAR EQUATIONS IN ONE VARIABLE 325
thus far, there is no theorem that says "quantities that vary" satisfy the associative,
commutative, and distributive laws. Yet, (a)–(f) freely make use of these laws. For
example, in going from (a) to (b), the distributive law is used to conclude that
$-\frac{27}{5}x + 3x = -\frac{12}{5}x$. But why?
(C) Now suppose we interpret x as just a number (cf. the second item in the
TSM definitions of a "variable" on page 318); then the equations in (a)–(f) become
equalities between numbers. The question is whether the number x has to be some
special number or just any number for (a)–(f) to hold. For example, if x = 0,
then (b) would read "−4 = 1", which is absurd, and (d) would read 0 = 5, equally
absurd, and so on. If x has to be a special number, then what is it, and why wasn’t
this announced from the beginning?
It is possible that beginners do not know enough to articulate these doubts, but
very likely they have them in the back of their minds. Yet, after years of not getting
satisfactory answers—because TSM does not provide answers to basic questions—
most students have learned to suppress their natural curiosity by the time they get
to middle school for the sake of "getting the right answer". So they stop asking why
and just follow instructions. The end result is that students coming out of TSM
know how to get answers by memorizing facts and following template solutions, but
apparently not much else. Unfortunately, in the year 2020, robots are beginning
to excel exactly at carrying out instructions to perform preassigned tasks (see,
e.g., [Paquette] and [WikiAlphaGo]). TSM therefore threatens to produce students
who, upon graduating from high school, possess no technical skills that can help
them outperform a robot. In other words, until we improve school mathematics
education, our public education system will run the risk of producing students who
become instantly obsolete upon graduation. We have to avert this catastrophe by
making a real effort to teach students how to reason, which seems—for the time
being, at least—to be beyond the capabilities of robots. Let us forsake TSM and
make an effort to shed light on every murky corner of school mathematics to make
it transparent and learnable so that we can answer students’ questions, and let us
encourage them to never cease asking why. For the case at hand, let us explain how
to solve a linear equation correctly.
We begin by confronting the fact that, as it stands, the solution method de-
scribed in (a)–(f) makes no sense.24 However, because of its simplicity and the
fact that it works, this solution method is not going away anytime soon. For this
reason, we will reinterpret (a)–(f) in two different ways to help students make sense
of them (see pp. 327 and 329). But first, we will explain a correct method of solving
equation (6.13), i.e., $3x - 4 = \frac{27}{5}x + 1$.
To begin with, we do not claim that there will be any solution to this equation.
Rather, we take the position that if there is such a solution, let us say x0 , then we
can find out what x0 must be. Let us therefore assume that there is such a solution
x0 to equation (6.13). Then, by the definition of a solution, we have
$3x_0 - 4 = \frac{27}{5}x_0 + 1.$
24 Mathematical Aside: From an advanced standpoint, one may be tempted to argue that
We may therefore apply the ordinary arithmetic operations to the numbers on both
sides of this equality. By adding $-\frac{27}{5}x_0$ to each side of this equality, we get
$-\frac{27}{5}x_0 + 3x_0 - 4 = 1$, which is
(6.14) $-\frac{12}{5}x_0 - 4 = 1.$
In standard terminology, what we have just done is transpose the term $\frac{27}{5}x_0$ to
the other side. Using the same terminology, let us transpose −4 to the right side
by adding 4 to both sides, thereby getting $-\frac{12}{5}x_0 = 1 + 4$, which is
(6.15) $-\frac{12}{5}x_0 = 5.$
Now multiply both sides of this equality by $-\frac{5}{12}$ and we get $x_0 = (-\frac{5}{12}) \cdot 5$, which
is
(6.16) $x_0 = -\frac{25}{12}.$
We pause to observe that—if we are willing to conflate x0 with x—then equa-
tions (6.14)–(6.16) are identical with the equations in (b), (d), and (f) on page
324.25
To resume our discussion, let us summarize what we have accomplished: we
have proved that if there is a solution x0 to equation (6.13), i.e., $3x - 4 = \frac{27}{5}x + 1$,
then this x0 must be equal to $-\frac{25}{12}$. In other words, we have just proved that the
solution to equation (6.13) is unique: if it exists at all, it will have to be $-\frac{25}{12}$. Note
that this says nothing about $-\frac{25}{12}$ being a solution to the original equation (6.13).
However, the verification that, indeed, $-\frac{25}{12}$ is a solution to (6.13) is straightforward:
simply check that
$3\left(-\frac{25}{12}\right) - 4 = \frac{27}{5}\left(-\frac{25}{12}\right) + 1.$
This is routine, provided one is fluent in computations with rational numbers: both
sides are equal to $-\frac{123}{12}$. So $-\frac{25}{12}$ is a solution of $3x - 4 = \frac{27}{5}x + 1$.
Altogether, we have just proved that $-\frac{25}{12}$ is the unique solution of the equation
25 The reader cannot help but notice that we have just referred to (6.14)–(6.16) and (b), (d),
(f) as "equations", whereas these are not equations in the sense of page 322. Here then is another
instance where we have to face up to a common linguistic abuse and learn to tiptoe through a
linguistic minefield: it is a common practice in mathematics to refer to any displayed collection
of symbols involving the equal sign as an "equation".
Step 1. If x0 is a solution of ax + b = cx + d, then $x_0 = \frac{d-b}{a-c}$.
Step 2. $\frac{d-b}{a-c}$ is a solution of ax + b = cx + d. (Compare Exercise
9 on page 120.)
Because the proofs of these two steps are so similar to the solution of the equation
$3x - 4 = \frac{27}{5}x + 1$, we can safely leave them as Exercise 2 on page 330. The theorem
can therefore be considered to be proved.
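The two steps can be tried out on equation (6.13) with exact rational arithmetic. The following Python sketch is only an illustration under the assumption a ≠ c; the function names solve_linear and is_solution are our own. The first function computes the candidate named in Step 1, and the second performs the substitution check of Step 2.

```python
from fractions import Fraction

def solve_linear(a, b, c, d):
    """Return the candidate (d - b)/(a - c) for ax + b = cx + d (Step 1).

    Assumes a != c; otherwise the quotient makes no sense.
    """
    if a == c:
        raise ValueError("a and c must be different")
    return Fraction(d - b) / Fraction(a - c)

def is_solution(a, b, c, d, x):
    """Step 2: check that the number x satisfies a*x + b == c*x + d."""
    return a * x + b == c * x + d

# Equation (6.13): 3x - 4 = (27/5)x + 1.
a, b, c, d = Fraction(3), Fraction(-4), Fraction(27, 5), Fraction(1)
x0 = solve_linear(a, b, c, d)
```

Neither function replaces the proof: the first only names the number that a solution must be, and the second only confirms that this particular number is a solution.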
If you as a teacher can make sense to students about what they are learning,
they will repay your efforts by making sense to you. End of Pedagogical Comments.
We will now revisit the proof of Theorem 6.3 and recast it in a more abstract
setting. First we define two equations to be equivalent if they have the same
solutions; i.e., every solution of one is also a solution of the other. We also introduce
the following two operations on a linear equation in a number x.
(E1) Add the same number or same monomial to the expressions
on both sides of an equation. (For example, adding 21 to both
sides of 15x−21 = 28x+3 results in the expressions 15x−21+21
and 28x + 3 + 21.26 )
(E2) Multiply the expressions on both sides of an equation by
the same nonzero number. (For example, multiplying both sides
of 15x − 21 = 28x + 3 by $\frac{1}{15}$ leads to the expressions
$\frac{1}{15}(15x - 21)$ and $\frac{1}{15}(28x + 3)$.)
We can see the relevance of these operations by noting that we used (E1) to obtain
equation (b) on page 324 from equation (6.13) by adding $-\frac{27}{5}x$ to both sides of the
original equation $3x - 4 = \frac{27}{5}x + 1$, and we obtained (f) from (d) on the same page
by multiplying both sides of the equation in (d) by $-\frac{5}{12}$.
We should also point out a key feature of both (E1) and (E2): each operation
is reversible in the sense that if, in (E1), we add a number or monomial A to both
sides, then if we follow it by adding −A to both sides, we get back the original
equation. Similarly, in (E2), if we multiply both sides of an equation by a nonzero
number B, then if we follow it by multiplying both sides by $\frac{1}{B}$, again we get back
to the original equation. Incidentally, the second comment shows why in (E2) we
stipulate that the number B be nonzero, as otherwise $\frac{1}{B}$ would make no sense.
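The reversibility of (E1) and (E2) can be seen concretely in a small Python sketch. Storing the equation ax + b = cx + d as the pair of pairs ((a, b), (c, d)) is our own assumption, made purely for illustration, as are the function names e1 and e2.

```python
from fractions import Fraction

# An equation ax + b = cx + d is stored as ((a, b), (c, d)).
# This representation is an assumption made for illustration only.

def e1(eq, m=0, n=0):
    """(E1): add the monomial m*x and the number n to both sides."""
    (a, b), (c, d) = eq
    return ((a + m, b + n), (c + m, d + n))

def e2(eq, k):
    """(E2): multiply both sides by the nonzero number k."""
    if k == 0:
        raise ValueError("(E2) requires a nonzero multiplier")
    (a, b), (c, d) = eq
    return ((k * a, k * b), (k * c, k * d))

# 15x - 21 = 28x + 3
eq = ((Fraction(15), Fraction(-21)), (Fraction(28), Fraction(3)))

added = e1(eq, n=21)                      # 15x = 28x + 24
undone_add = e1(added, n=-21)             # adding -21 recovers the original
scaled = e2(eq, Fraction(1, 15))          # x - 7/5 = (28/15)x + 1/5
undone_scale = e2(scaled, Fraction(15))   # multiplying by 15 recovers the original
```

The guard in e2 mirrors the stipulation that the multiplier be nonzero: multiplying by 0 would produce 0 = 0, from which the original equation cannot be recovered.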
With these preparations out of the way, we now come to the main point:
Lemma 6.4. Applying either of the operations (E1) and (E2) to a given linear
equation of one variable results in an equivalent equation.
Proof. Instead of a general proof of Lemma 6.4, we will offer a proof of the lemma
for equation (6.13), which is
$3x - 4 = \frac{27}{5}x + 1.$
It will be seen that the reasoning in this special case is in fact perfectly general.
First, let us prove the part of the lemma about (E1). So let us add, let us say, $-\frac{27}{5}x$
to both sides of (6.13) to get the expressions $-\frac{12}{5}x - 4$ and 1, and the resulting
equation is
(6.17) $-\frac{12}{5}x - 4 = 1.$
We will show that equations (6.13) and (6.17) are equivalent. Suppose x0 is a solution
of equation (6.13); then we have an equality of numbers: $3x_0 - 4 = \frac{27}{5}x_0 + 1$. If
26 Notice that we are making use of Theorem 1 of the appendix in Chapter 1 (page 87) so
to both sides of the last equality to get $3x_1 - 4 = \frac{27}{5}x_1 + 1$. But this says x1 is a
solution of equation (6.13), and the proof of the part of Lemma 6.4 about (E1) is
complete. We note that this reasoning depends on the fact that (E1) is reversible.
The reasoning with the part of Lemma 6.4 about (E2) is similar. We may therefore
consider the proof of Lemma 6.4 to be complete.
Therefore the solutions of $3x - 4 = \frac{27}{5}x + 1$ are exactly the same as those of $x = -\frac{25}{12}$,
which consist of one number, $-\frac{25}{12}$. Now, the foregoing steps (A), (B), . . . , (F) are
the correct mathematical counterparts of the steps (a), (b), . . . , (f) on page 324,
respectively. In greater detail, what (a)–(f) really try to say, in view of (A)–(F), is
not that the solutions of $3x - 4 = \frac{27}{5}x + 1$ can be obtained by the computations
in (a)–(f), but that (a)–(f) present a succession of equations, each being equivalent
to $3x - 4 = \frac{27}{5}x + 1$, so that at the end the equation so obtained is so simple (i.e.,
$x = -\frac{25}{12}$) that its solutions can be read off.
A second comment is that the proof of Theorem 6.3 shows that solving a linear
equation is conceptually very simple: use (E1) to show that the given equation is
equivalent to an equation of the form Ax = B, where A and B are constants (this is
called isolating the variable; see step (D) above), and then use (E2) to conclude
330 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS
that the equation Ax = B has the unique solution $\frac{B}{A}$. Therefore, the latter is the
unique solution of the original equation.
Exercises 6.2.
(1) Prove that $\frac{b}{a}$ is a solution of the equation ax = b, where a ≠ 0, and that
any solution of the equation is equal to $\frac{b}{a}$.
(2) Give the details of the proofs of Step 1 and Step 2 in the proof of Theorem
6.3 on page 326.
(3) Write out a proof of Lemma 6.4 on page 328 for a general linear equation
ax + b = cx + d.
(4) Imagine that you are teaching eighth-grade algebra and you have to ex-
plain for the first time how to solve 2x + 11 = 2 − x. How would you
explain (a) what an equation is, (b) what it means to solve this equation,
and (c) how to actually solve it, step by step?
(5) Formulate a precise statement about a linear equation in x, ax+b = cx+d,
where a, b, c, d are the constants, so that the equation has a unique
solution, has no solution, and has an infinite number of solutions. (Of
course you will have to prove it.)
(6) If $\frac{5}{13}$ of a number N exceeds a third of N by 8, what is N? (This is
Exercise 15 on page 68.)
(7) (In this exercise, you are assumed to know the formula for the circum-
ference of a circle.) Aaron and Warren start walking around a circle
from the same spot at the same time. They walk in the same direction,
6.3. SETTING UP COORDINATE SYSTEMS 331
27 Mathematical Aside: As is well known, it is irrelevant whether either line is horizontal, but
[Figure: the x-axis X with the origin O and the point A = 1 marked on it.]
Of course the 1 on the y-axis could also have been specified as the point B on the
y-axis above O so that dist(B, O) = 1. This is because ϕ, being distance-preserving,
maps A to B (see (L7)(ii) on page 237). By our choices, the positive numbers on
the x-axis are to the right of O and the positive numbers on the y-axis are above O
on the y-axis. Naturally, the half-line (see Lemma 4.5 on page 173) on the x-axis
to the right of O is called the positive x-axis; the opposite half-line on the x-
axis is then the negative x-axis. The half-line on the y-axis above O is similarly
called the positive y-axis, and the opposite half-line on the y-axis is the negative
y-axis.
Making a particular choice of the origin and the x- and y-axes is referred to
as setting up a coordinate system. Clearly, there are an infinite number of
ways to set up a coordinate system; i.e., there are an infinite number of choices of a
pair of perpendicular lines (so that one of them is horizontal) as the x-axis and the
y-axis. In the following discussion, it will be understood that we have made a fixed
choice of a coordinate system. Let us pause to observe what we have achieved.
Let R² denote the set of all ordered pairs of real numbers,²⁸ which by convention
are written as (x, y) (where x, y ∈ R), so that the pairs (a, b) and (b, a) are
considered to be distinct if a ≠ b. Therefore in R², (3, 4) ≠ (4, 3), and in general,
(a, b) = (c, d) ⇐⇒ a = c, b = d.
Again, note that there is no ambiguity as to what the equality between two ordered
pairs of numbers means.
Now, relative to a fixed coordinate system, we will assign an ordered pair of
numbers to each point Q in the plane in the following way. We call any line that
is either the given x-axis or parallel to the x-axis a horizontal line, and also any
line that is either the given y-axis or parallel to the y-axis a vertical line. Then
through Q draw two lines, one vertical and one horizontal, so that they intersect
the x-axis at a point A and the y-axis at a point B, respectively. Let the point
A correspond to the number a on the x-axis and the point B correspond to the
number b on the y-axis. Then the ordered pair of numbers (a, b) are said to be the
coordinates of Q (relative to the chosen coordinate axes); we write Q = (a, b),
and a is called the x-coordinate and b the y-coordinate of Q (relative to the
chosen coordinate axes). For clarity, we denote the x-axis and the y-axis by the
letters X and Y (sometimes x and y), respectively.
28 The superscript "2" in R² reminds us that two real numbers are involved.
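[Figure: a point Q to the left of the y-axis and above the x-axis X, with A = (a, 0) on the x-axis to the left of O and B = (0, b) on the y-axis above O.]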
Observe that, for this particular Q, a = −|OA| and b = |OB|. Also observe that
A = (a, 0) and B = (0, b). It is customary to abuse the notation and replace the
points (a, 0) on the x-axis by a and the points (0, b) on the y-axis by b. Therefore
the preceding picture is normally labeled as follows:
[Figure: the same picture of Q, with a marked on the x-axis X and b marked on the y-axis Y in place of A and B.]
is not a "transformation". However, (1) we will define a function in Section 1.1 of [Wu2020b], and
this Φ will be seen to be a function, and (2) we can broaden the definition of a transformation at
this point to include such a Φ if we wish.
30 Notice that strictly speaking, we should write Ψ((x, y)) rather than Ψ(x, y). But tradition
dictates that we use the latter and not the former, and it must be said that, as a matter of
convenience, tradition got it right this time!
a coordinate system has been made. Still with the bijection Φ : the plane → R2
understood, we usually identify a point with its corresponding ordered pair
of coordinates.
The x-axis, and more generally a horizontal line H that intersects the y-axis at
a number c, separates the plane into two parts. Denote by Up (respectively, Lo)
all the points P so that the horizontal line passing through P intersects the y-axis
at a number > c (respectively, < c). Then it is clear that the plane is the disjoint
union (see page 175) of the nonempty sets Up, Lo, and H. Up and Lo are called
the upper half-plane and lower half-plane of H, respectively. A point in Up is
said to be above H; likewise, a point in Lo is said to be below H.
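[Figure: the horizontal line H through the number c on the y-axis, with Up above H and Lo below H.]
[Figure: a vertical line V through the number d on the x-axis, with the regions L and R on its two sides.]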
We will now use the new terminology of upper half-plane, etc., to give another
interpretation of the coordinates of a point P . First we recall the picture:
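[Figure: the point P with A = (a, 0) on the x-axis X and B = (0, b) on the y-axis Y.]
[Figure: the four regions labeled I, II, III, IV around the origin O.]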
great insight of Pierre Fermat and René Descartes.31 The study of geometry using
coordinates is usually called analytic geometry. An example of the immediate
impact of the introduction of coordinates is the following distance formula between
two points in the plane. Suppose two points (a, b) and (c, d) in the coordinate plane
are given. We want to compute the distance between (a, b) and (c, d) in terms of the
four coordinates. By (L5) (page 184), this distance is the length of the hypotenuse
of the right triangle whose legs are the horizontal and vertical segments, as shown:
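[Figure: the right triangle with vertices (a, b), (c, b), and (c, d); one leg lies on the horizontal line H through b on the y-axis and the other on the vertical line V through c on the x-axis.]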
Now, (a, b) and (c, b) are points on the horizontal line H passing through the point
b on the y-axis. Hence,
(6.18) distance between (a, b) and (c, b) = |a − c|
(see (2.37) on page 126). Likewise, (c, b) and (c, d) are points on the vertical line V
passing through the point c on the x-axis. Hence,
(6.19) distance between (c, b) and (c, d) = |b − d|.
It therefore follows from the Pythagorean theorem that the distance between (a, b)
and (c, d) is $\sqrt{|a - c|^2 + |b - d|^2}$. Since $|x|^2 = x^2$ for any number x, we have
$|a - c|^2 = (a - c)^2$ and $|b - d|^2 = (b - d)^2$. Hence,
(6.20) distance between (a, b) and (c, d) = $\sqrt{(a - c)^2 + (b - d)^2}$.
This is the distance formula we are after.
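As a small illustration, the derivation can be followed step by step in Python. The function name distance and the sample points are our own choices; the sketch simply computes the two leg lengths of (6.18) and (6.19) and then applies the Pythagorean theorem as in (6.20).

```python
import math

def distance(p, q):
    """Distance between p = (a, b) and q = (c, d), following (6.18)-(6.20)."""
    (a, b), (c, d) = p, q
    horizontal_leg = abs(a - c)   # (6.18): distance between (a, b) and (c, b)
    vertical_leg = abs(b - d)     # (6.19): distance between (c, b) and (c, d)
    return math.sqrt(horizontal_leg ** 2 + vertical_leg ** 2)

d1 = distance((1, 2), (4, 6))     # legs of length 3 and 4
d2 = distance((4, 6), (1, 2))     # the same two points in the other order
```

The order of the two points does not matter, since each difference appears only through its square.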
philosopher ("I think, therefore I am"). In addition to discovering analytic geometry with Fermat,
he was instrumental in codifying the modern symbolic notation. We will come across him again
in Section 3.2 of [Wu2020b].
6.4. LINES IN THE PLANE AND THEIR SLOPES 337
Exercises 6.3.
(1) Let ρ be the rotation of 180° with respect to the origin O. Prove that
ρ(x, y) = (−x, −y) for all (x, y).
(2) (a) Let L be the vertical line x = c. Prove that the reflection with respect
to L is given by the transformation Λ so that Λ(x, y) = (2c − x, y) for
all (x, y). (b) Formulate the corresponding statement for reflection with
respect to a horizontal line y = d.
(3) (a) Prove that the set Up defined on page 334 is convex. (Caution: This
is not as straightforward as you may think. Use Lemma 4.8 on page 178.)
(b) Prove that the set Lo defined on page 334 is also convex.
(4) Given two distinct points P = (a, b) and Q = (c, d), if P1 , P2 , . . . , Pn
are points on the segment P Q which divide P Q into n + 1 segments of
equal lengths, what are the coordinates of each Pi , for i = 1, . . . , n? (Hint:
Make use of Exercise 11 on page 238.)
(5) (a) Write down the set of all the points which are equidistant from (1, 0)
and (4, 0) and describe the relationship of this set with (1, 0) and (4, 0).
(b) Write down the set of all the points which are equidistant from P =
(p1 , p2 ) and Q = (q1 , q2 ) and describe the relationship of this set with P
and Q.
(6) (a) Given two points P = (p1, p2) and Q = (q1, q2), show that the midpoint
of the segment PQ has coordinates $(\frac{1}{2}(p_1 + q_1), \frac{1}{2}(p_2 + q_2))$. (Hint:
Compare Exercise 4 on page 294.) (b) Given P = (1, 2) and Q = (3, −1),
find the point A on the segment PQ so that $\frac{|PA|}{|AQ|} = \frac{3}{2}$. (c) Given two
points P = (p1, p2) and Q = (q1, q2), find the point B on the segment PQ
so that $\frac{|PB|}{|BQ|} = \frac{m}{n}$, where m and n are given positive integers.
(7) If S is a geometric figure in the plane, define S1 to be the collection of all
the points P in the plane so that there is some point Q ∈ S of distance
≤ 1 from P , i.e., so that |P Q| ≤ 1 for some Q ∈ S. (i) If S is a point O,
then show that S1 is the closed disk of radius 1 centered at O. (ii) If S is
the unit segment from O to (1, 0), what is S1 ? Give reasons.
We will begin by explaining what purpose the concept of the slope of a line
is supposed to serve and then show how to define it correctly. To put the last
statement in context, think of the corresponding situation regarding the concept
of a "fraction". As far as TSM is concerned, it is sufficient that students know a
fraction is a piece of pizza or, more generally, a part-of-a-whole. By contrast, we
have seen the need to define a fraction as a certain point on the number line before
we can develop the subject of fractions in a logical and coherent manner. We are
going to do the same with the slope of a line.32
Our starting point is the consideration of all the nonvertical lines passing
through a fixed point P . To further simplify matters, let P be the origin O. In
what way can we distinguish the following lines from one another?
[Figure: four nonvertical lines L1, L2, L3, L4 passing through the origin O.]
Intuitively, these lines differ in their "steepness": L1 is more "steep" than L2 , and
L4 is more steep than L3 , though L1 and L2 differ from L3 and L4 because the two
groups slant differently. Part of the challenge will be to figure out how to separate
the "steepness" of these two groups of lines. From this point of view, we eliminate
the vertical line—the y-axis—from our consideration at the outset because, having
the ultimate "steepness", the vertical line has no need for any discussion.
The next step is to convert these appealing but vague ideas into precise math-
ematics: can we use a number to measure the "steepness" of the nonvertical lines
passing through O? Let us denote the vertical line passing through the point (1, 0)
on the x-axis by {x = 1}; this notation will be explained in the first subsection
of Section 6.5 on pp. 351ff. Observe that, by the definition of the coordinates of
a point, all the points on {x = 1} have coordinates (1, y), where y ∈ R. Now a
nonvertical line L passing through O is not parallel to {x = 1} and must intersect
the latter at a point (1, s) for some s ∈ R. We assign the number s to the line L
as a measure of its "steepness". See the following picture:
32 The following exposition on slope has been used in [EngageNY] and [Eureka] with the
author’s consent.
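[Figure: the lines L, L1, L2, L3, L4 through O intersecting the vertical line {x = 1}, which passes through (1, 0), at points such as (1, s), (1, s2), and (1, s3).]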
We now explain intuitively how this number s serves the purpose of revealing "the
steepness of L", on two different levels. First, on the qualitative level, if s > 0,
then L intersects the vertical line {x = 1} at a point above the x-axis (in the sense of
page 334) and therefore L will slant this way /. However, if s < 0, then L intersects
the vertical line x = 1 at a point below the x-axis (in the sense of page 334) and
therefore L will slant this way \. See L3 in the preceding picture, for example.
Thus the sign of s (i.e., whether s is positive or negative) already tells us the way
the line L slants, whether it is / or \. One can go further, however. It is visibly
obvious that the closer L gets to being vertical, the larger the absolute value |s| of
s is going to be. For example, L1 will intersect the line {x = 1} at a point (1, s1 )
very high above the x-axis so that s1 is going to be very large, whereas L4 will
intersect the line {x = 1} at a point (1, s4 ) very far below the x-axis and therefore
s4 will be "large negative" or, more correctly, s4 < 0 and |s4 | is very large. This
number s is therefore a good measure of the "steepness" of L. We will call this s
the local slope of L at O. We emphasize the need to refer to the whole phrase,
"the local slope of L at O", because, up to this point, this concept refers only to
the behavior of L at one point: at the origin O.
If two nonvertical lines passing through O have the same local slope at O, then
these lines join O to the same point on the vertical line {x = 1} and therefore are
the same line (assumption (L1) on page 165). Conversely, if the two lines passing
through O coincide, then of course they have the same local slope at O. We may therefore
summarize this short discussion in the following lemma.
Lemma 6.5. Two lines passing through O have the same local slope at O if and
only if they are the same line.
There is another way to look at the use of the vertical line {x = 1} to measure
the local slope of a nonvertical line at O. When we say we assign s to be the local
slope of L at O if L intersects the vertical line {x = 1} at the point (1, s), it is
equivalent to saying that we are looking at the line {x = 1} as a number line so that
its 0 is the point (1, 0) and its 1 is the point (1, 1). Equivalently, we are identifying
the line {x = 1} with the y-axis by using the translation along the horizontal vector
from O to (1, 0). This point of view will be useful below.
why use the vertical line {x = 1} instead of the vertical line {x = 2} that passes
through (2, 0), for example? The answer is that the choice of the line {x = 1} to
measure the local slope of L at O is just a convention, as we now explain. Suppose
L intersects the line {x = 1} at the point (1, s), so that the local slope of L at
O is s. If we had chosen to use the line {x = 2} instead, then the same L would
intersect the line {x = 2} at (2, 2s) (use Theorem G15* on page 266; see picture).
It follows that if the line x = 2 is used instead of x = 1, then the local slope of
L at O would change from s to 2s for every s. In terms of our understanding of
what the "local slope of L at O" means, this hardly matters. More generally, the
following Activity gives the complete picture.
This Activity shows that if we use the vertical line x = k for a number k > 0,
instead of the line x = 1, to measure the local slope of L at O, where L is any line
passing through the origin O, then the only effect that this change will have is that
the local slope of L at O will be multiplied by a factor of k. Therefore we may as
well keep things simple by using the line x = 1.
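The effect of replacing x = 1 by x = k can be tabulated in a short Python sketch. It assumes the conclusion just stated, namely that the line through O with local slope s meets the vertical line x = k at (k, ks); the function name height_on_vertical_line and the sample value of s are our own.

```python
from fractions import Fraction

def height_on_vertical_line(s, k):
    """Second coordinate of the point where L meets the vertical line x = k.

    ASSUMES the fact discussed above: if L passes through O and (1, s),
    then L meets x = k at the point (k, k*s).
    """
    return k * s

s = Fraction(3, 4)                       # local slope of L at O, measured on x = 1
on_x_equals_2 = height_on_vertical_line(s, 2)
on_x_equals_5 = height_on_vertical_line(s, 5)
```

Dividing each of these heights by the corresponding k gives back the same number s, which is one way to see that the choice of k is only a convention.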
A second question that is likely to be asked is why not use vertical lines to the
left of O to measure the local slope of L at O? Notice that the lines x = k for
k > 0 are to the right of O, so this question is tantamount to asking why not use
the vertical lines x = k for k < 0? In fact, why not use the vertical line x = −1?
The reason for not using x = −1 is again a matter of convention. So suppose we
consider a line L passing through O which is slanted this way /. In our present
definition, the local slope of L at O is a positive number, let us say, s. Now suppose
we switch over to the line x = −1 for measuring the local slope of L at O. Then
the same L will intersect the vertical line x = −1 at the point (−1, −s), which is
easy to see (e.g., use congruent triangles or the 180-degree rotation around O).
[Figure: the line L through O intersecting the vertical line x = 1 at (1, s) and the vertical line x = −1 at (−1, −s).]
Now let P be an arbitrary point, and consider all the nonvertical lines passing
through P . We ask the same question of how to distinguish these lines one from
the other.
[Figure: nonvertical lines L1, L2, L3, L4 passing through a point P.]
There is an obvious answer. We can pretend that P is just the origin O and do to
it what we did to O to get the local slope of each line at P and then use the local
slope (steepness) to distinguish these lines. More precisely, let X and Y be the
horizontal and vertical lines passing through P , respectively. These correspond to
the x-axis and y-axis, respectively.
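[Figure: the point P with the horizontal line and the vertical line through it, the point Q at distance 1 to the right of P, the vertical line LY through Q, and two lines L1 and L2 through P that intersect LY at heights 1 and −0.75 relative to Q.]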
Let Q be the point on X of distance 1 to the right of P and let LY be the vertical
line through Q. Then LY corresponds to the vertical number line {x = 1}. The
distance between the parallel lines Y and LY is thus 1 (see page 228 for the concept
of distance between parallel lines). If the coordinates of P are (x0 , y0 ), then the
coordinates of Q are (x0 + 1, y0 ). Thus P = (x0 , y0 ) and Q = (x0 + 1, y0 ). All
the points on LY have coordinates of the form (x0 + 1, y0 + y), where y is a real
number. Clearly, y > 0 if and only if (x0 + 1, y0 + y) is above Q, and y < 0 if
and only if (x0 + 1, y0 + y) is below Q. We can now show how to define the local
slope of a line L at P by making use of the horizontal line X and the vertical line
LY . So let L be a line passing through P and let L intersect the line LY at the
point (x0 + 1, y0 + s). Then the local slope of L at P is by definition s. For
example, still referring to the preceding picture, the line L1 intersects the line LY
at (x0 + 1, y0 + 1), and therefore its local slope at P is 1. The line L2 intersects the
number line LY at the point (x0 + 1, y0 − 0.75), so its local slope at P is −0.75.
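The definition can be illustrated numerically. The following Python sketch is not part of the logical development: it assumes the familiar quotient of differences, which has not been justified at this point, in order to locate the point (x0 + 1, y0 + s) on LY. The function name local_slope_at and the sample points are our own.

```python
from fractions import Fraction

def local_slope_at(p, r):
    """Local slope at p of the nonvertical line through the points p and r.

    Returns the number s for which the line meets the vertical line through
    (x0 + 1, y0) at (x0 + 1, y0 + s). The quotient used here is an
    ASSUMPTION, not something proved so far.
    """
    (x0, y0), (x1, y1) = p, r
    if x0 == x1:
        raise ValueError("the line through p and r is vertical")
    return Fraction(y1 - y0) / Fraction(x1 - x0)

P = (2, 3)
slope_1 = local_slope_at(P, (4, 5))   # a line that meets LY at (3, 4)
slope_2 = local_slope_at(P, (6, 0))   # a line that meets LY at (3, 2.25)
```

With these sample points the two lines have local slopes 1 and −0.75 at P, matching the values read off for L1 and L2.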
The same reasoning that proves Lemma 6.5 on page 339 also proves the follow-
ing lemma.
Lemma 6.6. Two lines passing through a point P have the same local slope at
P if and only if they are the same line.
So far we have only looked at all the lines passing through a fixed point P and
we learned how to assign a local slope at P to each of these lines. Now we change
our vantage point by focussing on a single line instead. If we fix a line L in the
coordinate plane and if P1 and P2 are two points on L, then a critical question
naturally arises: is the local slope of L at P1 equal to the local slope of L at P2 ?
The following lemma answers this question affirmatively.
Lemma 6.7. The local slope of a nonvertical line L at P for a point P ∈ L does
not depend on P .
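[Figure: a nonvertical line L with two points P1 and P2 on it, together with the points Q1, M1 and Q2, M2 constructed from them.]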
At P1 , we do the usual construction to get the local slope of L at P1 ; namely,
on the horizontal line passing through P1 , pick the point Q1 that is 1 unit to the
right of P1 , and let the vertical line passing through Q1 intersect L at a point M1 .
Similarly, repeat this construction at P2 to obtain the points Q2 and M2 .33
First consider the case that M1 is above the horizontal line LP1 Q1 (see the
definition of "above" on page 334). Then M2 is also above the horizontal line
LP2 Q2 , as shown in the preceding picture. Therefore the local slope of L at P1 is
positive and is equal to the length |M1 Q1 | and the local slope of L at P2 is also
positive and is equal to the length |M2 Q2 |. We have to show |M1 Q1 | = |M2 Q2 |.
If we can prove the congruence of △P1Q1M1 and △P2Q2M2, then we would be
done. Now the congruence is a consequence of ASA, because |P1Q1| = |P2Q2| = 1,
|∠P1Q1M1| = |∠P2Q2M2| = 90°, and |∠M1P1Q1| = |∠M2P2Q2| because they are
corresponding angles of the transversal L with respect to the parallel lines LP1Q1
and LP2Q2 (Theorem G18 on page 277). Therefore |M1Q1| = |M2Q2|, and the local
slopes of L at P1 and P2 are equal.
The other possibility is that M1 is below the horizontal line LP1 Q1 , as shown
in the picture below. Then M2 is also below the horizontal line LP2 Q2 .
[Figure: the case where M1 is below the horizontal line LP1 Q1 and M2 is below the
horizontal line LP2 Q2 .]
33 Note that in this picture, M1 is put between P1 and P2 on the line L for the sake of visual
clarity. In general, P2 could very well be between P1 and M1 , but the validity of the reasoning to
follow does not depend on the relative positions of P1 , P2 , and M1 .
The local slopes of L at P1 and P2 are therefore equal to −|Q1 M1 | and −|Q2 M2 |.
The same reasoning as above then shows that |Q1 M1 | = |Q2 M2 | so that, once again,
the local slopes of L at P1 and P2 are equal. The proof of the lemma is complete.
Remark. There is a subtle point in the preceding proof that we have intention-
ally glossed over. How did we know that if M1 is above (respectively, below) LP1 Q1 ,
then M2 would also be above (respectively, below) the horizontal line LP2 Q2 ? This
is a critical part of the proof because this allows us to conclude that, regardless of
whether the local slopes of L at P1 and P2 are equal or not, they are at least both
positive or both negative. It is only because of this fact that the equality of the two
local slopes reduces to the equality of the lengths of Q1 M1 and Q2 M2 . In general,
imagine that L is not a line but a curve; then we can have the following situation
where M1 is above LP1 Q1 but M2 is below LP2 Q2 :
[Figure: a curve L for which M1 is above LP1 Q1 but M2 is below LP2 Q2 .]
One’s instinctive reaction is that this anomalous situation cannot occur when L
is actually a straight line. (Recall that line is synonymous with straight line in
this volume and in [Wu2020b] and [Wu2020c].) In other words, our intuition tells
us that the "straightness" of a line will eliminate this anomalous situation. The
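following assertion then gives mathematical substance to our intuition:
(‡) If M1 is above (respectively, below) the horizontal line LP1 Q1 , then M2 is
also above (respectively, below) the horizontal line LP2 Q2 .
The key to the proof of (‡) is Lemma 4.8 on page 178. Since the points P1 and P2
are interchangeable in (‡), we may assume that P1 is to the left of P2 . Thus we are
assuming that
(6.21) P1 = (p1 , p1′), P2 = (p2 , p2′), and p1 < p2 .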
Let the coordinates of M1 be (m1 , m1′). Because M1 and Q1 are on the same
vertical line, they have the same x-coordinate. But the x-coordinate of Q1 is
(p1 + 1), by definition; therefore, m1 = p1 + 1. Thus M1 = (p1 + 1, m1′). Similarly,
M2 = (p2 + 1, m2′) for some number m2′, as shown:
Let vertical lines be drawn from P1 , M1 , P2 , and M2 . Then the points of inter-
section of these lines with the x-axis are the x-coordinates of the aforementioned
points, and these are the numbers p1 , p1 + 1, p2 , and p2 + 1, according to (6.21)
and (6.22). Since p1 < p2 (by (6.21)), we have
(6.23) p1 < p1 + 1 < p2 + 1, p1 < p2 < p2 + 1.
Since vertical lines are parallel to each other, by applying Lemma 4.8 on page 178
to the line L and the x-axis, we obtain by virtue of (6.21) and (6.23)
(6.24) P1 ∗ M1 ∗ M2 , P1 ∗ P2 ∗ M2 .
Now draw horizontal lines through P1 , M1 , P2 , and M2 ; then their points of
intersection with the y-axis are the y-coordinates of these same points, which are,
by virtue of (6.21) and (6.22), p1′, m1′, p2′, and m2′. Because of (6.24) and because
horizontal lines are parallel, by applying Lemma 4.8 on page 178 to the line L and
the y-axis, we obtain
(6.25) p1′ ∗ m1′ ∗ m2′, p1′ ∗ p2′ ∗ m2′.
Now the y-axis is a number line, and we know from the discussion on pp. 167ff.
that (6.25) means either
(6.26) p1′ < m1′ < m2′, p1′ < p2′ < m2′
or
(6.27) p1′ > m1′ > m2′, p1′ > p2′ > m2′.
Suppose M1 is above the horizontal line LP1 Q1 ; then p1′ < m1′, so that only (6.26)
is possible. Such being the case, the second part of (6.26) implies p2′ < m2′, which
means that M2 is above the horizontal line LP2 Q2 , as desired.
On the other hand, if M1 is below the horizontal line LP1 Q1 , then m1′ < p1′ and
only (6.27) is possible. See this figure:
[Figure: the line L with the points P1 , Q1 , M1 and P2 , Q2 , M2 in the case where M1
is below LP1 Q1 ; the x-axis is marked p1 , p1 + 1, p2 , p2 + 1 and the y-axis is
marked p1′, m1′, p2′, m2′.]
Such being the case, the second part of (6.27) implies p2′ > m2′, which means that
M2 is below the horizontal line LP2 Q2 . The proof of (‡) is complete.
Now to return to the discussion of slope, Lemma 6.7 tells us that it is no longer
necessary to refer to the local slope of a line L at a particular point P of L, be-
cause it does not matter which point P on L is used. This then allows us to finally
introduce the following definition.
Definition. The slope of a nonvertical line L is the local slope of L at any point
P on L.
The concept of slope also yields a simple criterion for deciding whether two
lines are the same. This criterion will be a key ingredient for the proof of Theorem
6.11 on page 354.
Theorem 6.9. If two lines pass through the same point and have the same
slope, then they are the same line.
Proof. This theorem is nothing more than a rewrite of Lemma 6.6 on page 342.
Suppose two lines L and L′ have the same slope and both pass through the same
point P . By the definition of slope, the local slopes of L and L′ at P are equal.
Therefore by Lemma 6.6, L = L′. The proof is complete.
For all the virtues of the preceding definition of the slope of a line, one should
not be blind to the fact that it is too clumsy for computations. Thus, to compute
the slope of a line L, the best one can do so far is to pick a point P on L; draw a
horizontal line through P and let Q be the point 1 unit to the right of P on this
horizontal line. Let Q = (x0 , y0 ). Through Q draw a vertical line LY and let L
intersect LY at the point with coordinates (x0 , y). Then the slope of L is equal
to y − y0 . (An equivalent way of looking at this is to regard LY as a number line
whose 0 is at Q and whose 1 is 1 unit above Q; the slope of L is the number on LY
at which L intersects LY (see page 342)).
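To see concretely how laborious this recipe is, one can mimic it numerically. The following Python sketch is only an illustration under stated assumptions and is not part of the development in the text: the line L is assumed to be specified by two of its points, the helper names `point_on_line_at_x` and `local_slope` are hypothetical, and the point of L on a given vertical line is located by proportional interpolation. It finds Q, then M, and reports the signed height of M over Q.

```python
from fractions import Fraction

def point_on_line_at_x(a, b, x):
    """Return the point with x-coordinate x on the line through a and b.

    Assumption: the line is nonvertical, i.e. a and b have different
    x-coordinates. The point is found by proportional interpolation.
    """
    (ax, ay), (bx, by) = a, b
    if ax == bx:
        raise ValueError("vertical line: the local slope is not defined")
    t = Fraction(x - ax) / Fraction(bx - ax)
    return (Fraction(x), ay + t * (by - ay))

def local_slope(a, b, p_x):
    """Local slope of the line through a and b at its point P with x-coordinate p_x."""
    p = point_on_line_at_x(a, b, p_x)    # the point P on L
    q = (p[0] + 1, p[1])                 # Q: 1 unit to the right of P
    m = point_on_line_at_x(a, b, q[0])   # M: where the vertical line through Q meets L
    return m[1] - q[1]                   # signed height of M over Q

# The same line, examined at two different points P.
slope_at_0 = local_slope((0, 50), (-2, 0), 0)
slope_at_7 = local_slope((0, 50), (-2, 0), 7)
```

Both calls return the same number, in agreement with Lemma 6.7, but each requires locating two points of L first.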
If this were the only (excruciating) way to get the slope of a line, the concept
of slope might have been abandoned long ago. Fortunately, the following theorem
shows a way to compute slope that removes the clumsiness. In fact, the ratio in
(6.28) below is usually taken to be the definition of slope.
Theorem 6.10. On a given nonvertical line L, let any two distinct points
P1 = (x1 , y1 ) and P2 = (x2 , y2 ) be chosen. Then the slope of L is equal to the ratio
(6.28) (y2 − y1 )/(x2 − x1 ).
We will preface the proof of the theorem with a little discussion.
(1) The ratio in (6.28) is called the difference quotient of P1 and P2 .
(2) Because for any numbers a and b, a/b = (−a)/(−b) (Theorem 2.10 on page 116
when a and b are rational; then use FASM), we see that the difference quotient (6.28)
of P1 = (x1 , y1 ) and P2 = (x2 , y2 ) enjoys an important symmetry property with
respect to P1 and P2 :
(6.29) (y2 − y1 )/(x2 − x1 ) = (y1 − y2 )/(x1 − x2 ).
Therefore, in writing the difference quotient, the order of P1 and P2 does not matter.
We will henceforth assume that P1 is to the left of P2 ; i.e., x1 < x2 .
(3) Observe that the denominator of the difference quotient (6.28) is never 0
because, if it were, then x2 − x1 = 0 and x1 = x2 . The distinct points P1 and P2 on
L now would have the same x-coordinate and L, being a line that joins two such
points, would have to be a vertical line, contradicting the hypothesis that L is non-
vertical. Thus the denominator of this ratio is never zero, and the ratio makes sense.
Before giving the proof of Theorem 6.10, we have to discuss a geometric in-
terpretation of the difference quotient (6.28) that is of independent interest. This
interpretation will play a crucial role in all the considerations of slope. If the slope
of L is 0, then L is horizontal (by Lemma 6.8 on page 346) so that the difference
quotient (6.28) is identically zero for all P1 and P2 . From now on we may as-
sume that the slope of L is not 0. With this understood and with P1 = (x1 , y1 )
and P2 = (x2 , y2 ), x1 < x2 , as always, let the horizontal line passing through P1
and the vertical line passing through P2 intersect at R, so that △P1 P2 R is a right
triangle with the right angle at R. Then we claim
(6.30) (y2 − y1 )/(x2 − x1 ) = |P2 R|/|P1 R| if the slope of L > 0,
       (y2 − y1 )/(x2 − x1 ) = −|P2 R|/|P1 R| if the slope of L < 0.
The two cases of positive and negative slopes are illustrated by the left figure and
right figure below, respectively.
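[Figures: left, the case of positive slope; right, the case of negative slope. Each
shows the line L, the points P1 , P2 , R, Q, M , and the vertical line LY through Q.]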
For the proof of (6.30), we begin with two general comments that are valid for
both cases in (6.30). We first claim that the coordinates of R are (x2 , y1 ). This
is because P1 and R, being on the same horizontal line, must have the same y-
coordinate, which is y1 (the y-coordinate of P1 ). Similarly, P2 and R, being on the
same vertical line, must have the same x-coordinate, which is x2 (the x-coordinate
of P2 ). Thus, indeed,
R = (x2 , y1 ).
Next, since P1 and R lie on the same horizontal line, |P1 R| = |x1 − x2 | (see (6.18)
on page 336). By assumption, x2 > x1 , so we get
(6.31) x2 − x1 = |P1 R|.
To prove (6.30), first suppose the slope of L is positive. Since P2 and R lie on
the same vertical line x = x2 , |P2 R| = |y2 − y1 | (see (6.19) on page 336). We claim
that P2 lies above R so that y2 > y1 and, therefore (still with the assumption that
the slope of L is positive),
(6.32) y2 − y1 = |P2 R|.
Together with (6.31), this proves the first half of (6.30).
It remains to prove (6.32). For this, we will need the assumption of the posi-
tivity of the slope of L. On the horizontal line LP1 R , let Q be the point to the right
of P1 so that |P1 Q| = 1, and let the vertical line LY passing through Q meet L at
M . From the definition of local slope (page 339), the positivity of the local slope of
L at P1 means that M is above the horizontal line LP1 R (see the left picture above;
note that the picture shows the case where Q is to the left of R, but the reasoning
is the same even if Q is to the right of R). It is obvious from the picture that since
R is also to the right of P1 , the intersection P2 of the vertical line passing through
R with L is also above LP1 R ;35 i.e., P2 is above R as claimed. We have therefore
proved (6.30) in case the slope of L is positive.
If the slope of L is negative, the local slope of L at P1 is negative. Now the
definition of local slope shows that if Q is the point on the horizontal line LP1 R which
is 1 unit to the right of P1 and if the vertical line LY passing through Q meets L at
M , then M is below the horizontal line LP1 Q (see the right picture above). Clearly
P2 , being the point of intersection of L with the vertical line passing through a point
R to the right of P1 , is also below LP1 R .36 Thus P2 lies below R. Consequently, a
similar reasoning shows that y2 < y1 and
−(y2 − y1 ) = |P2 R|.
Together with (6.31) and Theorem 2.10 on page 116, this proves the second half of
(6.30). The proof of (6.30) is complete.
Proof of Theorem 6.10. The claim (6.30) transforms the difference quotient
(6.28) into a geometric quantity, namely, ± the ratio of the lengths of two sides of
a right triangle P1 P2 R. This then invites the consideration of similar triangles
(Section 5.3 on page 283). More precisely, let Q be the point on the horizontal line
P1 R so that Q is to the right of P1 and |P1 Q| = 1. Let the vertical line LY passing
through Q intersect L at M as usual. See the pictures below for both cases of a
positive and a negative local slope of L.
35 In fact, the reasoning that proves (‡) on page 344 also suffices to prove that M being above
LP1 Q implies P2 is above LP1 Q . But see the Pedagogical Comments on page 346.
36 But see the preceding footnote.
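[Figures: left, the case of positive local slope; right, the case of negative local
slope. Each shows the line L, the points P1 , Q, M , R, P2 , and the vertical line LY
through Q.]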
We have drawn the pictures so that P2 is to the right of M purely for the sake of
clarity; the subsequent reasoning will be independent of this fact.
We will prove
(6.33) △P1 P2 R ∼ △P1 M Q.
Assuming (6.33) for the moment, we will finish the proof of Theorem 6.10. By the
proportionality of corresponding sides of similar triangles, we have the equality
|P2 R|/|M Q| = |P1 R|/|P1 Q|
which, by virtue of the cross-multiplication algorithm ((b) on page 70) and FASM,
is equivalent to
|P2 R|/|P1 R| = |M Q|/|P1 Q|.
Suppose the local slope of L at P1 is positive. Then by (6.30), this implies
(y2 − y1 )/(x2 − x1 ) = |M Q|/1 = |M Q|.
Since |M Q| is by definition the local slope of L at P1 , the preceding equality proves
Theorem 6.10 in the case of positive local slope. If the local slope of L at P1 is
negative, then an entirely analogous argument shows that
(y2 − y1 )/(x2 − x1 ) = −|M Q|/1 = −|M Q|.
Since the local slope of L at P1 is, by definition, equal to −|M Q| in this case,
Theorem 6.10 is now proved also in the case of a negative local slope.
It remains to prove (6.33). Observe that △P1 P2 R and △P1 M Q obviously satisfy
the AA criterion for similarity (Theorem 22 on page 288) because |∠P2 P1 R| =
|∠M P1 Q| and
|∠P2 RP1 | = |∠M QP1 | = 90◦ .
The desired similarity in (6.33) follows. The proof of Theorem 6.10 is complete.
The great significance of Theorem 6.10 is that, given a line, we can find its
slope by computing the difference quotient of any two points on the line that suit
our purpose. This is not only a useful idea to keep in mind in general, but also one
that is critical for the proof that the graph of a linear equation of two variables is
a line (Theorem 6.11 on page 354).
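As a quick numerical illustration of Theorem 6.10 (a sketch only, not a substitute for the proof), the Python fragment below computes the difference quotient (6.28) for every pair taken from four points. The function name `difference_quotient` is hypothetical, and the four sample points are assumed to lie on one line: they were chosen by us to satisfy y = 25x + 50. Exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import combinations

def difference_quotient(p1, p2):
    """The ratio (y2 - y1)/(x2 - x1) of (6.28) for two points with different x-coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points lie on a vertical line")
    return Fraction(y2 - y1) / Fraction(x2 - x1)

# Sample points assumed to lie on one line (each satisfies y = 25x + 50).
points = [(0, 50), (-2, 0), (1, 75), (-4, -50)]

# One difference quotient for each of the six pairs of points.
quotients = {difference_quotient(p, q) for p, q in combinations(points, 2)}
```

The set `quotients` contains a single number, so every pair of points yields the same slope. Swapping the two arguments of `difference_quotient` leaves the value unchanged, which is the symmetry (6.29).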
Exercises 6.4.
(1) Let ℓ be the line joining (1, 2) and (−3, 4). If (x, y) is a point on ℓ, what
is the value of (y − 4)/(x + 3)?
(2) (a) Let ℓ be the line with slope m passing through (1/2, 3/4). For which value
of m would ℓ pass through (5/3, 1/3)? (b) Let ℓ be the line joining (−3/2, 4)
and (4/5, q), where q is some number. For what value of q would ℓ pass
through (2, −3)?
(3) Does the line joining (3, −2) and (6, 2) contain the point (9, 6)? Explain
your answer two different ways.
(4) Let P be a point in quadrant II and let Q be a point in quadrant IV. If
L is a line joining P to Q, what is the sign (page 339) of the slope of L?
Explain.
(5) Let L be the graph of a linear equation 3x + 4y = c for some constant
c. (a) What is the slope of L? (b) Suppose L passes through the point
(−1, 5). What is c?
(6) Let a, b be positive numbers. Can the three points (a, b), (2a, b + 2),
(−a3 , b − 1) be collinear? Explain.
6.5. The graphs of linear equations in two variables
the given equation. The collection of all the solutions of a given equation is called
the graph of the equation. The graph of an equation is therefore a subset of R2 .
For example, the equation x⁴ + y² + 1 = 0 has no solutions (because x⁴ + y² ≥ 0
for all numbers x and y), so that the graph of x⁴ + y² + 1 = 0 is the empty set,
i.e., the set with no elements. On the other hand, the graph of x² + y² = 25 is the
circle of radius 5 around the origin (0, 0).
Activity. Use the distance formula (see equation (6.20) on page 336) to prove
the claim about the graph of x² + y² = 25. Do not skip steps.
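The definition of a graph as the set of all solutions can be tried out on a small scale. The Python sketch below is an illustration only: the names `satisfies`, `circle`, and `no_solutions` are hypothetical, and the search is restricted, by our own choice, to points with integer coordinates between −6 and 6. It therefore finds only a few points of each graph and proves nothing about the graph as a whole.

```python
def satisfies(equation, point):
    """True if the point (x, y) is a solution of the given equation."""
    x, y = point
    return equation(x, y)

def circle(x, y):
    # The equation x^2 + y^2 = 25.
    return x**2 + y**2 == 25

def no_solutions(x, y):
    # The equation x^4 + y^2 + 1 = 0.
    return x**4 + y**2 + 1 == 0

# Assumed search region: integer points with coordinates between -6 and 6.
lattice = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]

circle_points = [p for p in lattice if satisfies(circle, p)]
empty_points = [p for p in lattice if satisfies(no_solutions, p)]
```

Here `circle_points` contains points such as (3, 4) and (−5, 0), each at distance 5 from the origin, while `empty_points` is empty, as the inequality x⁴ + y² ≥ 0 predicts.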
(−5, −1.5), (−4, −1), (−2, 0), (0, 1), (2, 2),
(2.5, 2.25), (4, 3), (6, 4), (7, 4.5).
Students need the experience of plotting points on the graph of an equation by hand,
and they should form this good habit right from the outset for the case of a linear
equation in two variables. In the age of the graphing calculator, there is all the
more reason to emphasize this need.
Consider next the graph of the linear equation in two variables y = 3 which,
as an equation in two variables, is in reality the abbreviated form of the equation
0 · x + 1 · y = 3.
[Figure: the graph of y = 3, the horizontal line passing through the point (0, 3) on
the y-axis.]
The same reasoning shows that the graph of the equation y = b in the plane
for any number b is exactly the horizontal line L0 passing through the point (0, b)
on the y-axis. We can do the same to vertical lines. In summary, we have the
following:
the graph of x = c for any number c is the vertical line passing
through the point (c, 0) on the x-axis, and the graph of y = b
for any number b is the horizontal line passing through the point
(0, b) on the y-axis.
Since there is only one horizontal (respectively, vertical) line passing through a
given point of the plane (why?), it follows that
every vertical line is the graph of the equation x = c, where (c, 0)
is the point of intersection of the line and the x-axis, and every
horizontal line is the graph of an equation y = b, where (0, b) is
the point of intersection of the line with the y-axis.
37 This obvious fact is usually taught by rote, but we want to establish the fact that if the
definition of the graph of an equation means anything at all, the fact that the graph of y = 3 is a
line should follow logically from the definition. For this reason, we want to prove this fact clearly
and correctly. If there are any doubts about the need to take the definition of the graph of an
equation or function seriously, it suffices to consult the blog of [Meyer].
It is well known that the graph of a linear equation in two variables is a line, and
vice versa. What is not well known is the reason why this is true. The main purpose
of this section is to prove this fact. Precisely, we have the following theorem.
Theorem 6.11. The graph of a linear equation in two variables is a line. Con-
versely, every line is the graph of a linear equation in two variables.
The proof of the first part of Lemma 6.12 is quite straightforward once you
consider the three cases separately: the line is vertical, horizontal, or neither. Of
course the proof of the second part is trivial. We will leave the details to an exercise
(see Exercise 4 on page 362).
Lemma 6.12 shows that if L is the graph of ax + by = c, then every equation
defining L is necessarily of the form kax + kby = kc for some k ≠ 0. We naturally
regard all such equations (for any value of k) as the same equation. With this
understood, we are in the habit of saying in this case that the equation of L is
ax + by = c.
Proof of Theorem 6.11. We first prove that the graph of a linear equation of
two variables, ax + by = c (where a, b, c are constants), is a line.
If b = 0 in ax + by = c, then the equation becomes ax = c, whose graph is
clearly the same as the graph of x = c/a. The fact that the graph of x = c/a is
the vertical line passing through (c/a, 0) has been proved on page 353. We may
therefore assume from now on that in the equation ax + by = c, b ≠ 0. The graph
of ax + by = c is clearly the same as the graph of y = −(a/b)x + (c/b). Therefore,
to complete the proof of Theorem 6.11, it suffices to prove the following:
If G is the graph of an equation y = mx + k, where m and k are
constants, then G is a line.38
38 In many textbooks, the standard notation for this equation is y = mx + b, but the "b" will
not work for us because our notation for a general linear equation is ax + by = c.
Let P1 = (x1 , y1 ) and P2 = (x2 , y2 ) be two points on G. Let the line joining P1 and
P2 be denoted simply by L. We are going to prove G = L, thereby proving that
the graph of y = mx + k is a line.
Recall that to prove G = L, we have to first prove G ⊂ L and then prove
L ⊂ G. What is needed for both proofs is the fact that the slope of L is equal to
m. To see this, observe that since P1 and P2 are points on G,
y1 = mx1 + k and y2 = mx2 + k.
Therefore the difference quotient of P1 and P2 is
(y2 − y1 )/(x2 − x1 ) = ((mx2 + k) − (mx1 + k))/(x2 − x1 ) = m(x2 − x1 )/(x2 − x1 ) = m.
By Theorem 6.10 on page 347, the slope of L is m, as desired.
First: G ⊂ L. Given P′ = (x′, y′) ∈ G, we must prove P′ ∈ L. We do so
by proving that the line L′ that joins P1 and P′ coincides with L. To this end,
Theorem 6.9 on page 347 implies that all we need to do is prove that L′ and L have
the same slope and pass through the same point. They clearly pass through the
same point P1 . As to their slopes, we have just seen that the slope of L is m. As
to the slope of L′, note that since P′ is on the graph G of y = mx + k, we have
y′ = mx′ + k. Therefore the difference quotient of P1 and P′ is
(y′ − y1 )/(x′ − x1 ) = ((mx′ + k) − (mx1 + k))/(x′ − x1 ) = m(x′ − x1 )/(x′ − x1 ) = m.
By Theorem 6.10, the slope of L′ is also m. Thus L′ = L. Therefore P′ ∈ L′ now
means P′ ∈ L. This proves G ⊂ L.
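Next: L ⊂ G. This time, let P′ = (x′, y′) ∈ L and we have to prove that
P′ ∈ G, i.e., y′ = mx′ + k. Since P1 is already a point of G, we may assume that
P′ ≠ P1 . By Theorem 6.10, the slope m of L is equal to the difference quotient of P1
and P′, so that y′ − y1 = m(x′ − x1 ). Since y1 = mx1 + k, this gives y′ = mx′ + k,
i.e., P′ ∈ G. This proves L ⊂ G.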
Now that we know the graph of every linear equation in two variables is a line,
we can prove the second part of Theorem 6.11 that every line is the graph of a
linear equation in two variables. If the line is vertical, then we have already proved
on page 353 that it is the graph of an equation x = c. Henceforth, we may assume
the line is nonvertical. Thus let L be a nonvertical line and let P1 = (x1 , y1 ) and
P2 = (x2 , y2 ) be two distinct points on L. Let m be the slope of L. Then by
Theorem 6.10,
(6.34) m = (y2 − y1 )/(x2 − x1 ).
Rewriting (6.34) as y2 − y1 = m(x2 − x1 ), we are led to consider the equation
(6.35) y − y1 = m(x − x1 ).
Students who master this proof will never have to memorize the different forms
of the equation of a line. They will be able to derive the needed equation easily at
a moment’s notice or, at worst, they can memorize the different forms with much
greater ease because they now possess a mental framework in which each of those
forms can take its proper place. We will illustrate with some simple examples. To
this end, let us extract two useful facts from the preceding proof.
For the first lemma, define the y-coordinate of the point at which a line L
intersects the y-axis to be the y-intercept of the line; sometimes the point of
intersection is itself called the y-intercept of L.
Proof. Let m denote the slope of L. By Theorem 6.10 on page 347, when m is
computed using P1 and P2 , it is exactly as in (6.38). Let (x, y) be an arbitrary
point on L different from P1 . Then the slope of L computed using (x, y) and P1 is
(6.39) m = (y − y1 )/(x − x1 ).
This is equivalent to y − y1 = m(x − x1 ), which is precisely equation (6.37). The
advantage of (6.37) is that every point (x, y) on L now satisfies this equation. Let
G be the graph of (6.37). Note that G is a line, by Theorem 6.11. The point P2 lies
in G because of (6.38), and P1 lies in G for trivial reasons. Thus G is also a line
passing through P1 and P2 . By assumption (L1) (page 165), G = L, which implies
L is the graph of equation (6.37). This proves Lemma 6.14.
We begin with some examples about finding the equation of a line satisfying
certain geometric conditions; the preceding lemmas will come in handy. For the
first example, we introduce another definition: the x-coordinate of the point at
which a line L intersects the x-axis is called the x-intercept of L; sometimes the
point of intersection is itself called the x-intercept of L.
Example 2. What is the equation of the line which passes through (−1, 3)
and (1/2, 4)?
Call this line ℓ. The slope of ℓ computed using the points (−1, 3) and (1/2, 4) is
(4 − 3)/(1/2 − (−1)) = 2/3.
By equation (6.37), the equation of ℓ is (y − 3) = (2/3)(x − (−1)), which is
y = (2/3)x + 11/3.
Alternately, we can use Lemma 6.13. Then the equation of ℓ has the form
y = (2/3)x + k. Since the point (−1, 3) lies on ℓ, we have 3 = (2/3)(−1) + k. Thus
k = 11/3 and the equation of ℓ is y = (2/3)x + 11/3. (We can equally well use the
other point (1/2, 4) to evaluate k in the equation y = (2/3)x + k. Then we get
4 = (2/3) · (1/2) + k, so that k = 4 − 1/3 = 11/3 as before.)
Suppose you do not remember the exact statement of Lemma 6.14; then you
start from scratch. You would begin by computing the slope of the line as before,
obtaining 2/3. Now you have to remember Theorem 6.10, which says the slope can
be computed using any two points. So you use (−1, 3) and an arbitrary point (x, y)
on ℓ not equal to (−1, 3) to get
(y − 3)/(x − (−1)) = 2/3.
Therefore y − 3 = (2/3)(x + 1) for all (x, y) on ℓ not equal to (−1, 3). Notice that in
the latter form, the equality continues to hold even when (x, y) = (−1, 3). Therefore
rewriting y − 3 = (2/3)(x + 1) as
(6.41) y = (2/3)x + 11/3,
what you have then proved is that every point (x, y) on ℓ satisfies (6.41). Thus ℓ
lies in the graph of (6.41). Since the latter is itself a line (Theorem 6.11), ℓ is the
graph of (6.41), by (L1) on page 165.
So, once again, the answer is y = (2/3)x + 11/3.
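The arithmetic of Example 2 can be checked mechanically. The Python sketch below is a hedged illustration: the function name `line_through` is hypothetical, and the function simply follows the two steps used above, namely computing the slope as a difference quotient and then solving y1 = m · x1 + k for k. Exact fractions are used so that 2/3 and 11/3 appear as such.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, k) so that the nonvertical line through p1 and p2 is y = m*x + k."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the line through these points is vertical")
    m = Fraction(y2 - y1) / Fraction(x2 - x1)   # slope, by Theorem 6.10
    k = y1 - m * x1                             # from y1 = m*x1 + k
    return m, k

# The two points of Example 2.
m, k = line_through((-1, 3), (Fraction(1, 2), 4))
```

With these inputs, `m` is 2/3 and `k` is 11/3. Using the points in the opposite order gives the same pair, as (6.29) guarantees.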
Example 3. What is the equation of the line with slope 1/2 and passing through
the point (3, 4)?
Let L be the line in question, and consider the graph of the equation
y = (1/2)x + 2 1/2. Since this graph is also a line (Theorem 6.11 on page 354) which
passes through (3, 4) and whose slope is 1/2, L is equal to this graph, by Theorem
6.9 on page 347. Thus the equation of L is y = (1/2)x + 2 1/2.
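Example 3 lends itself to the same kind of check. In the Python sketch below, the function name `line_with_slope_through` is hypothetical, and the only step performed is solving y0 = m · x0 + k for the constant k once the slope m and one point (x0 , y0 ) are known. This is an assumption-laden illustration, not the author's procedure.

```python
from fractions import Fraction

def line_with_slope_through(m, point):
    """Return (m, k) so that the line with slope m through the point is y = m*x + k."""
    x0, y0 = point
    m = Fraction(m)
    k = y0 - m * x0      # the point must satisfy y0 = m*x0 + k
    return m, k

# Example 3: slope 1/2, passing through (3, 4).
m, k = line_with_slope_through(Fraction(1, 2), (3, 4))
```

The value of `k` is 5/2, i.e., 2 1/2, in agreement with the equation found above.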
We conclude this section with two applications. First, we will give an explicit
description of the segment joining two points.
Lemma 6.15. Let two distinct points P = (p, p′) and Q = (q, q′) be given.
(i) If p = q but p′ < q′, then the segment P Q consists of all the points {(p, t)},
where p′ ≤ t ≤ q′.
(ii) If p < q, let P and Q lie on the line whose equation is y = mx + k for
some constants m and k. Then P Q consists of all the points {(s, ms + k)}, where
p ≤ s ≤ q.
Proof. Let L be the line joining P to Q.
(i) If p = q, then L is the vertical line x = p. If T is the translation along the
vector from (0, 0) to (p, 0), then T maps the y-axis to L (Theorem G5 on page 236),
so that T (0, y) = (p, y) for all real numbers y. Thus if P0 and Q0 are the points
(0, p′) and (0, q′) on the y-axis, respectively, then
T (P0 ) = P and T (Q0 ) = Q
so that T maps the segment P0 Q0 to the segment P Q; i.e., T (P0 Q0 ) = P Q. Since
the segment P0 Q0 consists of all the points {(0, t)} for all t satisfying p′ ≤ t ≤ q′,
it follows that P Q = T (P0 Q0 ) = {(p, t)}, where p′ ≤ t ≤ q′. This proves (i).
[Figure: the translation T mapping P0 and Q0 on the y-axis to P = (p, p′) and
Q = (p, q′) on the vertical line L.]
(ii) If p < q, then L is not vertical and is therefore the graph of an equation
y = mx + k, where m is the slope of L; i.e.,
m = (q′ − p′)/(q − p).
(See Lemma 6.13.) Every point of L is therefore of the form (x, mx + k) for a real
number x. Let S = (s, ms+k) be a point on the segment P Q, where P = (p, mp+k)
and Q = (q, mq+k). Thus S is between P and Q. The vertical lines passing through
P , S, and Q then intersect the x-axis at the numbers p, s, and q, respectively.
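[Figure: the points P , S, and Q on the line L, with the vertical lines through them
meeting the x-axis at p, s, and q.]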
These vertical lines being parallel, Lemma 4.8 on page 178 implies that s is between
p and q; i.e., p < s < q. Since the segment P Q consists of P , Q, and all the points
between P and Q, this shows that P Q consists of all the points {(s, ms + k)} where
p ≤ s ≤ q. The proof of the lemma is complete.
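Lemma 6.15 turns the geometric question "does a point lie on the segment P Q?" into a pair of inequalities. The Python sketch below illustrates this under stated assumptions: the function name `on_segment` is hypothetical, the two endpoints are reordered by us when necessary so that the hypotheses p′ < q′ or p < q of the lemma hold, and exact fractions are used to test equality.

```python
from fractions import Fraction

def on_segment(p, q, s):
    """Decide whether the point s lies on the segment joining p and q (Lemma 6.15)."""
    (px, py), (qx, qy), (sx, sy) = p, q, s
    if (px, py) == (qx, qy):
        raise ValueError("the endpoints must be distinct")
    if px == qx:
        # Case (i): vertical segment, all points (p, t) with p' <= t <= q'.
        low, high = min(py, qy), max(py, qy)
        return sx == px and low <= sy <= high
    if px > qx:
        # Reorder so that the endpoint with smaller x-coordinate comes first.
        (px, py), (qx, qy) = (qx, qy), (px, py)
    # Case (ii): the line is y = m*x + k; the segment is all (s, m*s + k), p <= s <= q.
    m = Fraction(qy - py) / Fraction(qx - px)
    k = py - m * px
    return px <= sx <= qx and sy == m * sx + k

# Endpoints taken on the line y = 25x + 50.
inside = on_segment((0, 50), (-2, 0), (-1, 25))
outside = on_segment((0, 50), (-2, 0), (1, 75))
```

The point (−1, 25) is on the segment, whereas (1, 75) is on the same line but outside the segment because its x-coordinate is not between −2 and 0.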
Lemma 6.16. (i) Let a nonvertical line L be the graph of y = ax + b, and let P
be a point on L with coordinates (p, ap+b). Then the two half-lines of L determined
by P are the following two subsets of L:
P − = all the points (x, ax + b) so that x < p.
P + = all the points (x, ax + b) so that p < x.
[Figure: the point P on the line L, with x-coordinate p, and the two half-lines P −
and P + .]
Proof. We first prove (i). Fix a point Q ∈ P + . Since L is the disjoint union of
{P } and the two half-lines L+ , L− with respect to P (see Lemma 4.5 on page 173),
Q lies in either L+ or L− . By changing the notation if necessary, we may assume
Q ∈ L+ . Thus, Q is a point in both L+ and P + . We claim that P + = L+ .
To show P + ⊂ L+ , let U ∈ P + and we will prove U ∈ L+ . Let the coordinates
of Q and U be (q, aq + b) and (u, au + b), respectively. Because Q and U are both
in P + , we have p < q and p < u. For definiteness, let q < u. Then by the preceding
lemma, the segment QU consists of all the points (x, ax + b) so that q ≤ x ≤ u.
Thus P cannot lie in QU because P = (p, ap + b) and p < q. So QU does not
contain P and, by Lemma 4.5(ii), Q and U lie in the same half-line of P . Since Q
lies in L+ , so does U . Thus, P + ⊂ L+ . Next, we prove L+ ⊂ P + . Let V ∈ L+ ,
and we must show V lies in P + . Suppose V lies in P − , and we will deduce a
contradiction. By the definition of P − , V = (v, av + b), where v < p. Consider the
segment V Q. By the preceding lemma, V Q consists of all the points (x, ax + b) so
that v ≤ x ≤ q. But v < p < q and P = (p, ap + b), so P lies in V Q. Now both V
and Q are points of L+ , so the convexity of L+ implies that V Q does not contain
P . This contradiction shows that V cannot lie in P − . Since L is obviously also a
disjoint union of P + , P − , and {P }, V has to lie in P + after all. We have proved
that P + = L+ .
It remains to show that P − = L− . We first prove P − ⊂ L− . Let U ∈ P − .
U cannot lie in L+ because if it did, U would lie in P + (since L+ = P + ), contra-
dicting the disjointness of P + and P − . So U lies in L− . Since U is an arbitrary
point of P − , we have P − ⊂ L− . Conversely, we show that L− ⊂ P − . Suppose V
is a point of L− . Then V cannot lie in P + because if it did, V would lie in L+
(since P + = L+ ), contradicting the disjointness of L− and L+ . So V lies in P −
and therefore L− ⊂ P − . Together, we have proved P − = L− . (i) is proved.
Having gone through the proof of (i) in such detail, we see that the proof of
(ii) is similar but simpler: replace (x, ax + b) by (c, y) and reason with y instead of
x, and there will be no need to invoke Lemma 6.15 because we will be looking at
a number line consisting of all the (c, y), where y ∈ R. The proof of the lemma is
complete.
Finally, the issue of graphing a linear equation of two variables brings out the
confrontation of theory with practice in graphic presentations. Consider the graph
L of an equation such as y = 25x + 50. Clearly, the two points (0, 50) and (−2, 0)
are on L, and by Theorem 6.11 on page 354, L is the line passing through these two
points. Now, how do we present this graph on the blackboard or on the pages of
a book? By choice, the two points (1, 0) and (0, 1) are equidistant from the origin
O of the coordinate system; see the discussion on pp. 336ff. Such being the case,
once the two points (0, 0) and (1, 0) have been chosen on the horizontal x-axis,
the point (0, 50) on the y-axis would have to be, vertically, 50 times the distance
between (0, 0) and (0, 1) above (0, 0). This is simply not practical. A reasonable
compromise is to introduce a scaled coordinate system, i.e., rescale the y-axis,
so that the old distance between (0, 0) and (0, 1) is now interpreted to be 10 instead
of 1. Now the graph L of y = 25x + 50 can be presented as follows:
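[Figure: the graph L of y = 25x + 50 in a scaled coordinate system; the y-axis is
marked 10, 20, 30, 40, 50 and the x-axis is marked −5, −4, −3, −2, −1, O, 1.]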
What must be borne in mind is the fact that, in this graphic representation, the
90-degree rotation around O is no longer length-preserving, among other geometric
anomalies. The reason is clear: the 90-degree counterclockwise rotation now maps
(1, 0) to (0, 10), so that it maps the unit segment [0, 1] on the x-axis to a segment
of length 10 (namely, the segment from (0, 0) to (0, 10)).
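The effect of rescaling can be made explicit with a small computation. The Python sketch below is an illustration under our own assumptions: the names `to_page`, `from_page`, and `rotate_90_on_page` are hypothetical, and "page position" means the position of a point in the drawing, where one drawn unit on the y-axis is labeled 10. The rotation is carried out on the page and the result is then read off in the scaled coordinates.

```python
from fractions import Fraction

Y_SCALE = 10  # assumption: one drawn unit on the y-axis is labeled 10, as in the figure

def to_page(point):
    """Position on the page of the point with the given scaled coordinates."""
    x, y = point
    return (Fraction(x), Fraction(y) / Y_SCALE)

def from_page(page_point):
    """Scaled coordinates of the point drawn at the given page position."""
    u, v = page_point
    return (u, v * Y_SCALE)

def rotate_90_on_page(point):
    """Rotate 90 degrees counterclockwise around O on the page; return scaled coordinates."""
    u, v = to_page(point)
    return from_page((-v, u))

def coordinate_length_squared(a, b):
    """Square of the distance between a and b computed from their scaled coordinates."""
    return (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2

image = rotate_90_on_page((1, 0))
before = coordinate_length_squared((0, 0), (1, 0))
after = coordinate_length_squared((0, 0), image)
```

The image of (1, 0) is (0, 10), so a segment whose coordinate length is 1 is carried to one whose coordinate length is 10, even though the two segments look equally long on the page.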
Pedagogical Comments. The teaching of slope (or, more to the point, the
widespread nonteaching of slope) in school mathematics furnishes an excellent ex-
ample of how TSM40 causes massive nonlearning in school mathematics. In TSM
and standard professional development materials, there appear to be two ways to
approach the definition of the slope of a line. One way is to choose two points on
the line, compute the "rise-over-run" using these two fixed points, and declare that
this "rise-over-run" is the slope of the line. There is no hint of the fact that the
"rise-over-run" computed with any two points on the line would also be equal to the
same number. The standard excuse is that since similar triangles are not taught
until high school geometry—after slope has been taught and used—this indepen-
dence cannot be explained. A second way is to inform students that what we call
nonvertical lines are the graphs of equations of the form y = mx + b and then define
the slope of such a line to be the number m (in other words, Theorem 6.11 on page
354 is implicitly assumed). Such a definition of a line is totally inappropriate for
K–12 as it likely shatters students’ confidence about whether they even know what
a line is anymore.
Either of these approaches inevitably leads to rote-teaching and rote-learning
of linear equations and their graphs. A recent study by Postelnicu and Greenes
([PG]) of students’ understanding of straight lines in algebra reveals that the most
difficult problems for students are those requiring the identification of slope of a
40 See page xiv of the preface for the definition of TSM.
362 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS
line from its graph. Let us pause for a moment to absorb the absurdity of the
situation: how could a routine computation (see (6.28) on page 347) become "the
most difficult problem"? One can well imagine that—with respect to the first
approach—if students do not know they can use any two points on a line to compute
its slope, they would naturally be confused about "how to measure rise-and-run"
(see page 15 of [PG], left column). It therefore stands to reason that, since teaching
slope strictly by analogies and metaphors for lack of a correct definition of slope
has led to widespread, abject failures in student learning, we should try returning
to first principles by beginning with a correct definition of it for a change. Let us
teach similar triangles and explain the reasoning surrounding the proof of Theorem
6.11 (pp. 354ff.).
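The independence of "rise-over-run" from the choice of the two points can at least be observed numerically, though a computation with finitely many points is of course not a proof. The following Python sketch uses the line y = 25x + 50 from earlier in this section; the function name slope and the sample points are illustrative choices only.

```python
from fractions import Fraction
from itertools import combinations

def slope(p, q):
    """Rise-over-run computed from two points with different x-coordinates."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("the two points lie on a vertical line")
    return Fraction(y2 - y1, x2 - x1)

# Five points on the graph of y = 25x + 50.
points = [(x, 25 * x + 50) for x in (-5, -2, 0, 1, 4)]

# Compute rise-over-run for every pair of these points.
slopes = {slope(p, q) for p, q in combinations(points, 2)}
print(slopes)
```

Every one of the ten pairs yields the same number, 25, which is what a correct definition of slope must guarantee for all pairs of points on the line.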
One can appreciate the stranglehold TSM has exerted on school mathematics
education from the following three facts. In the mathematics education reform of
the 1990s, the fact that the concept of the slope of a line has almost never been
presented to students correctly did not seem to merit any discussion (compare
[NCTM1989] and [PSSM]). Moreover, for at least two decades since 1990, many
states pushed for "algebra in grade 8" without paying any attention to this glar-
ing defect in the teaching of slope and the attendant massive nonlearning of the
geometry of linear equations in two variables. Finally, even after the publication
of [CCSSM], which took over the recommendation of (a draft of) [Wu2016b] and
made a major change in the curriculum so that the concept of similar triangles is
taught in grade 8 to make possible a correct definition of slope, there still seems
to be little awareness in the mathematics education literature that slope has to be
correctly defined. For example, the lack of a correct definition of slope in TSM is
not mentioned in a 2014 article on the teaching of slope ([Nagle-Moore-Russo]) or
a 2015 volume on Mathematical Understanding for Secondary Teaching ([MUST]).
By now, it should be obvious that, in large part, these three volumes (this vol-
ume, [Wu2020b], and [Wu2020c]) have been written in response to just such abuses
in TSM. End of Pedagogical Comments.
Exercises 6.5.
(1) Solve for x: (a) 4bx + 13 = 2x + 26b, where b is a number not equal to
2 . Simplify your answer. (b) 5 ax − 17 = 3 ax − 2 , where a is a nonzero
1 2 1 15
number.
(2) Find the equation of the line in each of the following; understand that,
in this situation, getting the right answer is only half the work because
your solution must be supported by reasoning at each step. (a) The line
passing through (−1/2, 3) with slope −5. (b) The line passing through the
two points (−7, 2) and (3, −4) (write your answer in the form of ax + by =
c). (c) The line L with slope −2 15 and x-intercept −4.
(3) (a) What is the equation of the line joining the two points (X, Y ) and
(Z, W )? (Write your answer in the form of ax + by = c.) (b) What is the
equation of the line whose slope is A and whose y-intercept is B?
(4) Prove Lemma 6.12 on page 354.
(5) For large x, e.g., x ≥ 10⁶, which of the graphs of the following two
equations is above the other: y = 10x − 5,000,000 and y = x + 1,500,000?
6.6. PARALLELISM AND PERPENDICULARITY 363
(6) Explain directly as if to an eighth grader why the slope of the line defined
by 2x − 5y = 7 is 2/5 by making use of only the definition of slope and
without invoking Theorem 6.10.
(7) Find the equation of the line passing through (c, c³) and (d, d³), where c
and d are numbers so that c ≠ 1, d ≠ 1, and c ≠ d. Simplify your answer.
Characterization of parallelism
Theorem 6.17. Two distinct nonvertical lines have the same slope ⇐⇒ they
are parallel.
Remark. We have been talking informally about lines that slant this way /
or that way \. It is time to point out that, with the availability of Theorem 6.17,
we can give precision to these expressions: we say a line slants this way / if the
line passing through the origin and parallel to it lies in quadrants I and III,⁴¹ and
similarly, we say a line slants this way \ if the line passing through the origin
and parallel to it lies in quadrants II and IV. It follows from Theorem 6.17 that a
nonvertical and nonhorizontal line slanting this way / has positive slope, and one
slanting this way \ has negative slope.
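Theorem 6.17 can be illustrated on lines given as graphs of y = mx + k. The Python sketch below looks for a common point of two such graphs; the function name common_point and the sample numbers are assumptions made for this illustration, and the sketch is a check on examples rather than a proof.

```python
from fractions import Fraction

def common_point(m1, k1, m2, k2):
    """Common point of the graphs of y = m1*x + k1 and y = m2*x + k2.

    Returns None when the two distinct lines have no common point."""
    m1, k1, m2, k2 = (Fraction(v) for v in (m1, k1, m2, k2))
    if m1 == m2:
        if k1 == k2:
            raise ValueError("the two equations define the same line")
        # Same slope, different k: m*x + k1 = m*x + k2 has no solution.
        return None
    # Solve m1*x + k1 = m2*x + k2 for x, then compute y.
    x = (k2 - k1) / (m1 - m2)
    return (x, m1 * x + k1)

print(common_point(2, 1, 2, 5))   # same slope: no common point
print(common_point(2, 1, -1, 4))  # different slopes: exactly one common point
```

Two distinct lines with the same slope have no common point, while two lines with different slopes meet at exactly one point.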
Next, we look at the converse. Suppose L and L′ are distinct lines with the
same slope and we have to prove that they are parallel. As before, let L and L′
be the graphs of y = mx + k and y = mx + k′, respectively. Note that the same
m appears as the coefficient of x in both equations because the lines have the
same slope (Lemma 6.13 again) and k ≠ k′ because L and L′ are distinct lines, by
hypothesis. Again we use a contradiction argument. If they are not parallel, let
them intersect at a point P = (p1, p2). By the definition of the graphs of linear
equations, P is a solution of both y = mx + k and y = mx + k′. Thus,
Hence triangles SPQ and TQR are congruent, from which we conclude that |SQ| =
|TR|. This then completes the proof that ℓ1 and ℓ2 have the same slope.
Before proving the converse, we should ask what strategy we
might use. As always, what we can do depends on what tools
are available. Up to this point, what tools (theorems) are at our
disposal that would guarantee that two lines are parallel? Basi-
cally there is only one: Theorem G19 on page 281 in Chapter
5, which says that if the corresponding angles (or alternate in-
terior angles) of a transversal with respect to a pair of lines are
equal, then the lines are parallel. With the same construction
as above, we see that proving the equality of ∠SP Q and ∠T QR
would be our best bet. It then follows that we would try to prove
the congruence of triangles P SQ and QT R to achieve our goal.
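[Figure: the nonvertical lines ℓ1 and ℓ2 in the coordinate plane, with the points P, Q, R, S, and T of the construction.]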
Conversely, suppose two distinct, nonvertical lines ℓ1 and ℓ2 have the same
slope and we have to show that they are parallel. We are assuming they are not
horizontal, so we may perform the same construction as before to get |PQ| = |QR|
and right triangles PSQ and QTR. We are going to prove that the triangles are
congruent by using the SAS criterion for congruence (Theorem G8 on page 245).
We already have right angles ∠PQS and ∠QRT. We also have the equality of one
pair of sides, |PQ| = |QR|, by construction. To get the equality of the other pair
of sides, we use the hypothesis on the equality of the slopes of ℓ1 and ℓ2:
    |PQ|/|SQ| = |QR|/|TR|.
Since |PQ| = |QR|, we have |SQ| = |TR|. Hence the triangles PSQ and QTR
are congruent, and consequently, the corresponding angles⁴³ ∠SPQ and ∠TQR are
equal (Theorem G6 on page 240). This implies ℓ1 ∥ ℓ2 because their corresponding
angles relative to the transversal PR are equal (Theorem G19 on page 281). The
proof is complete.
prove that a pair of angles are corresponding angles with respect to a transversal of parallel lines,
we will henceforth omit such arguments (see especially the Pedagogical Comments on page 279).
Characterization of perpendicularity
Theorem 6.18. Two distinct, nonvertical lines are perpendicular ⇐⇒ the prod-
uct of their slopes is −1.
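[Figure: two pictures of lines passing through the origin O. Left: lines lying in quadrants I and III. Right: lines lying in quadrants II and IV.]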
We are now going to explain why the lines in the left picture, those that lie
completely inside quadrants I and III, are the ones with positive slope, and those in
the right picture, those that lie completely inside quadrants II and IV, are the ones
with negative slope. Take a point (a, b) on such a line ((a, b) ≠ (0, 0)); then using
(a, b) and (0, 0) to compute its slope, we see that the slope is b/a. Since b and a have
the same sign in quadrants I and III (see page 122 for the definition) and opposite
signs in quadrants II and IV, it follows that the slope is positive for lines lying
in quadrants I and III, and negative for lines lying in quadrants II and IV (recall
Theorem 2.10 on page 116). As a consequence, if we have two rays issuing from
O with positive slopes (i.e., the lines containing them have positive slopes), then
they lie in either quadrant I or III and therefore the degree of the angle between
these rays is either greater than 90° or less than 90°. Similarly for two lines with
negative slopes. It follows that two lines whose slopes have the same sign can never
be perpendicular to each other. Hence we have proved the following lemma.
Lemma 6.19. If two nonvertical lines passing through O are perpendicular, their
slopes have opposite signs.
We can now give the Proof of Theorem 6.18: first suppose ℓ1 and ℓ2 are
perpendicular lines. Let lines L1 and L2 be lines passing through the origin so that
ℓ1 ∥ L1 and ℓ2 ∥ L2. (If ℓ1 or ℓ2 passes through the origin, we will let L1 or L2 be
ℓ1 or ℓ2, as the case may be.) By Theorem 6.17, ℓ1 and L1 have the same slope.
The same is true for ℓ2 and L2. Therefore it suffices to prove that the product of
the slopes of L1 and L2 is equal to −1. By hypothesis, neither line is vertical and
they are perpendicular to each other; thus neither line is horizontal either. So both
L1 and L2 are nonvertical and nonhorizontal. By Lemma 6.19, we already know
that the product is a negative number. Hence, it suffices to prove that the product
of the absolute values of the slopes of L1 and L2 is equal to 1.
Observe that, because ℓ1 ⊥ ℓ2, L1 ⊥ L2 (Exercise 3 on page 372). It follows
from the preceding discussion that we may assume that L2 lies in quadrants I and
III and L1 lies in quadrants II and IV. Let P2 be some point on the line L2 and
in quadrant I, and let ϱ be the rotation of 90 degrees around the origin O. Then
ϱ(L2) = L1 and therefore if P1 = ϱ(P2), we have P1 ∈ L1. Furthermore, let the
vertical line from P2 meet the x-axis at Q2; then ϱ(Q2), to be denoted by Q1, lies
on the y-axis. As ϱ is a congruence, we have
(6.42) |P1Q1| = |P2Q2| and |OQ1| = |OQ2|.
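[Figure: the perpendicular lines L1 and L2 through the origin O, with P2 on L2 in quadrant I, Q2 on the x-axis, P1 on L1, and Q1 on the y-axis.]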
By (6.30) on page 348, we see that the absolute value of the slope of L1 is
|OQ1|/|P1Q1| and the absolute value of the slope of L2 is |P2Q2|/|OQ2|. Thus,
taking into account the equalities in (6.42), the product of the absolute values of
the slopes of L1 and L2 is
    (|OQ1|/|P1Q1|) · (|P2Q2|/|OQ2|) = (|OQ2|/|P2Q2|) · (|P2Q2|/|OQ2|) = 1.
This completes the first part of the proof of Theorem 6.18.
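The first half of Theorem 6.18 can be observed on a concrete pair of lines through the origin. The Python sketch below assumes the coordinate description (x, y) ↦ (−y, x) of the 90-degree counterclockwise rotation around O, which is stated elsewhere in this section; the function names and the point (3, 2) are illustrative choices.

```python
from fractions import Fraction

def rotate_90(point):
    """90-degree counterclockwise rotation around the origin O."""
    x, y = point
    return (-y, x)

def slope_through_origin(point):
    """Slope of the line joining O to a point not on the y-axis."""
    x, y = point
    return Fraction(y, x)

P2 = (3, 2)          # a point of L2 in quadrant I
P1 = rotate_90(P2)   # the corresponding point of L1

product = slope_through_origin(P2) * slope_through_origin(P1)
print(P1, product)
```

Here L2 has slope 2/3, the rotated line L1 has slope −3/2, and the product of the two slopes is −1.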
[Figure: the lines L1 and L2 through the origin O with the points P1, Q1, P2, and Q2.]
Since we are already given that |∠Q1OQ2| = 90° and clearly
|∠Q1OQ2| = |∠P2OQ2| + |∠Q1OP2|, it means that if we can
show
    |∠P1OQ1| = |∠P2OQ2|,
Lemma 6.20. Let T be the translation along the vector BC, where B = (b1, b2)
and C = (c1, c2). Then for all (x, y) in R², T(x, y) = (x + a1, y + a2), where
(a1, a2) = (c1 − b1, c2 − b2).
Remark. When B is the origin O = (0, 0), the lemma simplifies to the following:
let T be the translation along the vector OC, where C = (c1, c2). Then for any
(x, y) in R², T(x, y) = (x + c1, y + c2).
Proof. Let the line passing through B and C be denoted by L. Let P be a point
with coordinates (p1 , p2 ). We will prove that T (P ) has coordinates (p1 +a1 , p2 +a2 ).
According to the definition of translation on page 234, the proof is broken into two
cases.
Case 1. P ∈ L. [The theorem in this case is pictorially obvious but its proof is
tedious. The suggestion is to skip this proof in a school classroom and concentrate
on proving Case 2 only.]
First assume that L is not vertical; i.e., b1 ≠ c1. Let Q denote the point
(p1 + a1 , p2 + a2 ), where, as in the lemma, (a1 , a2 ) = (c1 − b1 , c2 − b2 ). We want
to prove that Q = T (P ). The following picture is for the case a1 > 0, a2 > 0, and
p1 < b1 :
[Figure: the line L through B and C, with the points P and Q on L, for the case a1 > 0, a2 > 0, and p1 < b1.]
According to the definition of translation on page 234, we have to prove that
Q lies on L and |PQ| = |BC| and the vectors PQ and BC point in the same direction.
Let us first observe that |PQ| = |BC| because the distance formula (equation (6.20) on
page 336) says
    |PQ| = √(((p1 + a1) − p1)² + ((p2 + a2) − p2)²)
         = √(a1² + a2²) = √((c1 − b1)² + (c2 − b2)²) = |BC|.
Next, we prove that Q lies on L. Let L_PQ denote the line containing P and Q as
usual. Now L and L_PQ are two lines that contain the point P, and they also have
the same slope because the slope of L_PQ is
    ((p2 + a2) − p2)/((p1 + a1) − p1) = a2/a1 = (c2 − b2)/(c1 − b1)
and the latter is the slope of L. Therefore, by Theorem 6.9 on page 347, the lines
L and L_PQ coincide. Therefore Q lies on L = L_PQ.
Finally, we prove that the vectors PQ and BC point in the same direction. There are two
cases: a1 > 0 and a1 < 0. It suffices to take up the case of a1 > 0 as the second case
is similar. Recall that P = (p1, p2) and B = (b1, b2). Suppose p1 < b1, as in the
preceding picture. Then we claim that the ray R_BC is contained in the ray R_PQ.
Since a1 > 0, we have p1 < q1 (= p1 + a1). Therefore with Q = (q1, q2), the ray
R_PQ consists of all the points (x, y) on L so that p1 ≤ x, according to Lemma 6.16
on page 359. By the same token, since c1 = b1 + a1 > b1, the ray R_BC consists of
all the points (x, y) on L so that b1 ≤ x. Thus for any point (x′, y′) lying in R_BC,
we have p1 < b1 ≤ x′. It follows that R_BC ⊂ R_PQ. Now if we let L0 be the vertical
line passing through P, then the closed right half-plane of L0 (consisting of all the
points (x, y) in the plane so that p1 ≤ x) contains both R_PQ and R_BC. This shows
that the vectors PQ and BC point in the same direction.
Suppose b1 < p1 instead.
[Figure: the line L with the points B, C, P, and Q on it, for the case b1 < p1.]
Our strategy to prove Q = (p1 + a1, p2 + a2) is to let Q′ be the point (p1 + a1, p2 + a2)
and then show that Q′ coincides with Q. To this end, we will prove below that
L ∥ L_PQ′ and L_CQ′ ∥ L_BP. Assuming this for the moment, we see that, by the
parallel postulate, L_PQ′ has to be the line passing through P and parallel to L,
and therefore L′ = L_PQ′. Similarly, L_CQ′ has to be the line passing through C and
parallel to L_BP, and therefore ℓ = L_CQ′. Since two distinct lines can intersect at
only one point (Lemma 4.2 on page 165), Q′ = Q and we are done in this case.
Let us now prove L ∥ L_PQ′. First assume that L is not vertical. Then b1 ≠ c1,
so that a1 = c1 − b1 ≠ 0. We will compute the slope of L_PQ′ using (of course) the
two points P = (p1, p2) and Q′ = (p1 + a1, p2 + a2):
    slope of L_PQ′ = ((p2 + a2) − p2)/((p1 + a1) − p1) = a2/a1.
Next, we compute the slope of L using the two points B = (b1, b2) and C = (c1, c2).
Because a1 = c1 − b1 and a2 = c2 − b2, we have C = (a1 + b1, a2 + b2). Hence,
    slope of L = (c2 − b2)/(c1 − b1) = a2/a1.
These two slopes being equal, we see that L ∥ L_PQ′ (Theorem 6.17 on page 363)
when L is not vertical. Now if L is vertical, then b1 = c1 and a1 = c1 − b1 = 0.
Since Q′ = (p1 + a1, p2 + a2) = (p1, p2 + a2), Q′ and P = (p1, p2) have the same
first coordinates and L_PQ′ is also vertical. Thus L ∥ L_PQ′ as before.
Next, we will prove that L_CQ′ ∥ L_BP. First assume that L_BP is not vertical,
so that b1 ≠ p1. Then using Theorem 6.17 again, we will prove that the slopes of
L_CQ′ and L_BP are equal. Using the fact that C = (a1 + b1, a2 + b2), we compute
the slope of L_CQ′ using C and Q′:
    slope of L_CQ′ = ((a2 + b2) − (p2 + a2))/((a1 + b1) − (p1 + a1)) = (b2 − p2)/(b1 − p1) = slope of L_BP.
Thus L_CQ′ ∥ L_BP in case L_BP is not vertical. It remains to consider the case that
L_BP is vertical. Then b1 = p1 so that a1 + b1 = a1 + p1. Since a1 = c1 − b1,
this implies c1 = p1 + a1, and the first coordinates of C and Q′ are equal. Therefore
L_CQ′ is also vertical and again L_CQ′ ∥ L_BP. The proof of the lemma is complete.
Lemma 6.20 can be used to give the shortest proof that the composition of two
translations is a translation (see Exercise 10 on page 238). The same lemma also
shows that the composition of translations is commutative in the sense that if T
is the translation along a vector AB and T′ is the translation along a vector A′B′,
then for all (x, y) in the plane, (T ∘ T′)(x, y) = (T′ ∘ T)(x, y).
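The coordinate description in Lemma 6.20 translates directly into code. The Python sketch below builds a translation from the two points that define its vector and then checks, on one sample point, that two translations commute; the function name translation and all the sample points are hypothetical choices for this illustration.

```python
def translation(B, C):
    """Translation along the vector from B to C, in coordinates:
    T(x, y) = (x + a1, y + a2) with (a1, a2) = (c1 - b1, c2 - b2)."""
    a1, a2 = C[0] - B[0], C[1] - B[1]

    def T(point):
        x, y = point
        return (x + a1, y + a2)

    return T

T = translation((1, 2), (4, 6))         # (a1, a2) = (3, 4)
T_prime = translation((0, 0), (-2, 5))  # (a1, a2) = (-2, 5)

P = (7, -1)
print(T(P))                               # (10, 3)
print(T(T_prime(P)), T_prime(T(P)))       # both are (8, 8)
```

Because each translation simply adds a fixed pair of numbers to the coordinates, the order in which the two additions are performed does not matter, which is the commutativity just described.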
It turns out that, with the help of trigonometric functions and complex num-
bers, we will be able to express all the basic isometries in terms of coordinates, at
least in principle (see Section 1.6 in [Wu2020c]). In the meantime, here are some
simple rotations and reflections in terms of coordinates. (We leave their proofs to
Exercises 8, 9, and 11 on page 372).
(i) Let ϱ denote the counterclockwise rotation of 90 degrees
around the origin O of R². Then for every (x, y) ∈ R², ϱ(x, y) =
(−y, x).⁴⁴
(ii) Let ϱ0 be the 180-degree rotation around the origin O. Then
for every (x, y) ∈ R², we have ϱ0(x, y) = (−x, −y).
(iii) If Λ1 denotes the reflection with respect to the x-axis, then
for every (x, y) in R², Λ1(x, y) = (x, −y).
(iv) If Λ2 denotes the reflection with respect to the y-axis, then
for every (x, y) ∈ R², Λ2(x, y) = (−x, y).
44 If (x, y) lies in the first quadrant, this is implicit in the proof of Theorem 6.18.
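The four coordinate formulas (i)–(iv) can be tried out on a sample point. The Python sketch below is a consistency check only, not a proof of the formulas; the function names are made up for this illustration.

```python
def rot90(point):
    """(i) 90-degree counterclockwise rotation around O."""
    x, y = point
    return (-y, x)

def rot180(point):
    """(ii) 180-degree rotation around O."""
    x, y = point
    return (-x, -y)

def reflect_x(point):
    """(iii) Reflection with respect to the x-axis."""
    x, y = point
    return (x, -y)

def reflect_y(point):
    """(iv) Reflection with respect to the y-axis."""
    x, y = point
    return (-x, y)

def dist_sq_from_origin(point):
    """Square of the distance from O, by the distance formula."""
    x, y = point
    return x * x + y * y

P = (3, 4)
print(rot90(P), rot180(P), reflect_x(P), reflect_y(P))
```

On this sample point, two 90-degree rotations give the 180-degree rotation, the two reflections performed in succession also give the 180-degree rotation, and all four maps preserve the distance from O, as basic isometries fixing O must.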
Exercises 6.6.
(1) What is the equation of the line which is perpendicular to the graph of
ax + by = c, where a, b, c are constants, a ≠ 0, and b ≠ 0, and which
passes through the point (−1/2, 3)?
(2) Assume a triangle with vertices at (1, 1), (5, 1), and (7½, 4). Does the point
(2, −1/6) lie on the line passing through the vertex (1, 1) and perpendicular
to the opposite side of the triangle? Explain.
(3) Prove the following assertion which was used to prove Theorem 6.18: let
L and V be two perpendicular lines. If L′ and V′ are lines so that L′ ∥ L
and V′ ∥ V, then also L′ ⊥ V′.
(4) Let O be the origin of a coordinate system in R², and let r be a positive
number. Prove that the transformation T of R² which assigns to each
point (a, b) the point (ra, rb) is exactly the dilation with center O and
scale factor r. (Caution: This is a slippery proof.)
(5) (a) Let L be the line joining O (the origin (0, 0)) to the point (a, b), and
let L′ be the line joining O to the point (a′, b′). Prove: L ⊥ L′ if and only
if aa′ + bb′ = 0. (You may recognize the latter as the dot product of the
vectors (a, b) and (a′, b′) that you came across in calculus.) (b) Let L and
L′ be the lines defined by the equations ax + by = c and a′x + b′y = c′,
respectively. Prove that L ⊥ L′ if and only if aa′ + bb′ = 0.
(6) Write out a self-contained proof, using only congruent triangles and with-
out using similar triangles, of the second half of Theorem 6.18; i.e., if the
product of the slopes of two lines is −1, then the lines are perpendicular.
(7) Let T be the translation along the vector from O to a fixed point (a1 , a2 )
in R². (a) If L is the vertical line defined by x = 27 , what is the equation
of T (L)? (b) If L is the horizontal line defined by y = 51, what is the
equation of T (L)? (c) If L is the line defined by 2x − 3y = 1, what is the
equation of T (L)?
(8) (i) Let ϱ0 be the 180-degree rotation around the origin O. Prove that for
every (x, y) ∈ R², we have ϱ0(x, y) = (−x, −y). (ii) Let φ be the 180-degree
rotation around the point (a, b). Prove that for every (x, y) ∈ R²,
we have φ(x, y) = (2a − x, 2b − y).
(9) (i) If Λ1 denotes the reflection with respect to the x-axis, prove that for
every (x, y) ∈ R², Λ1(x, y) = (x, −y). (ii) If Λ2 denotes the reflection with
respect to the y-axis, prove that for every (x, y) ∈ R², Λ2(x, y) = (−x, y).
(10) Let L be the line defined by 3x − 4y = 3, and let Λ denote the reflection
across L. Compute Λ(7, 13 ).
(11) (i) Let ρ be the 90° counterclockwise rotation around the origin O and
let φ be the 90° clockwise rotation around O. Prove that for any point
(x, y), ρ(x, y) = (−y, x) and φ(x, y) = (y, −x).
(ii) Let (a, b) be a fixed point in the coordinate plane and let ϱ be the
90° counterclockwise rotation around (a, b) and let ϕ be the 90° clockwise
rotation around (a, b). Prove that for any point (x, y)
(iii) Let P and Q be two distinct points and let ϱ_P be the 90° counterclockwise
rotation around P and ϕ_Q be the 90° clockwise rotations around
6.7. SIMULTANEOUS LINEAR EQUATIONS 373
In between the two preceding extreme cases, "most" linear systems have exactly
one pair (x0 , y0 ) as a solution. We will explain what "most" means and why this
is true by way of geometry.
Let ℓ1, ℓ2 be the lines which are the graphs of the equations ax + by = e and
cx + dy = f, respectively. Suppose (x0, y0) is a solution of the linear system
    ax + by = e,
    cx + dy = f.
In particular, this means we are assuming that there is a solution of the system.
We wish to interpret this solution geometrically. Since (x0, y0) is a solution of
ax + by = e, the point (x0, y0) lies on ℓ1, by the definition of the graph of an
equation. For the same reason, (x0, y0) lies on ℓ2 as well. Therefore (x0, y0) lies on
both ℓ1 and ℓ2, and therefore it lies in the intersection of ℓ1 and ℓ2. (We have to
be careful not to assume that the intersection of ℓ1 and ℓ2 is a point, because we
cannot a priori exclude the possibility that ℓ1 = ℓ2 as in (6.43) above, in which case
the intersection of ℓ1 and ℓ2 is the line itself.) Conversely, suppose (x0, y0) lies in
the intersection of ℓ1 and ℓ2; then it must be a solution of the system
    ax + by = e,
    cx + dy = f
because (x0, y0) being on ℓ1 means ax0 + by0 = e and (x0, y0) being on ℓ2 means
cx0 + dy0 = f. We have therefore proved the following basic fact relating the
solutions of a linear system to the graphs of the equations in the system.
Theorem 6.21 gives the precise reasoning for why the solution of a linear system
of two linear equations in two unknowns corresponds to the intersection of the
two lines defined by the linear equations of the system. This is the reason why
one can get the solution of a system of simultaneous linear equations by graphing
the equations. Such a correspondence is usually decreed by fiat in TSM without
explanation, probably because the precise definition of the graph of an equation is
rarely given or, if given, is not put to use.
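The correspondence between solutions and points of intersection rests entirely on the definition of the graph of an equation, and that definition is easy to put to use. In the Python sketch below, a point is on the graph of ax + by = e exactly when its coordinates satisfy the equation; the function names and the sample system x + y = 3, x − y = 1 are hypothetical choices for this illustration.

```python
def on_graph(a, b, e, point):
    """True when the point lies on the graph of ax + by = e."""
    x0, y0 = point
    return a * x0 + b * y0 == e

def is_solution(system, point):
    """True when the point lies on both graphs, i.e., in their intersection."""
    (a, b, e), (c, d, f) = system
    return on_graph(a, b, e, point) and on_graph(c, d, f, point)

# The system  x + y = 3,  x - y = 1.
system = ((1, 1, 3), (1, -1, 1))

print(is_solution(system, (2, 1)))  # on both lines
print(is_solution(system, (3, 0)))  # on the first line only
```

The point (2, 1) lies on both graphs and is therefore a solution of the system, while (3, 0) lies on the first graph only and is not.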
It is worth noting that Theorem 6.21 shares a common feature with a coor-
dinate system: they both provide a dictionary that mediates two disparate sets
of information: the algebraic information about solutions of a linear system and
the geometric information about intersections of lines. In this particular case, we
know all about the intersections of lines (see (L1) and (L2) on page 165) and will
therefore use this knowledge to shed light on the solutions of linear systems. We
know that there are exactly three mutually exclusive possibilities for two lines in
the plane: the lines are either
identical or
parallel or
distinct but not parallel.
Corollary. Given a linear system of two equations in two unknowns, let the graphs
of the linear equations be ℓ1 and ℓ2. Then the linear system either
has an infinite number of solutions (corresponding to ℓ1 = ℓ2) or
has no solution (corresponding to ℓ1 ∥ ℓ2) or
has a unique solution (corresponding to ℓ1 ≠ ℓ2 but ℓ1 is not
parallel to ℓ2).
We can now explain what is meant by "most" linear systems have a unique
solution. Given two lines, what are the chances that they are either identical or
parallel? This is in fact a precise mathematical question that can be answered
completely: zero. To obtain this answer, one must do some advanced mathematics.
Nevertheless, one can provide an intuitive understanding of the situation by fixing
one of the lines, say ℓ1, and ask what the chances are that the other line ℓ2 either
coincides with ℓ1 or is parallel to ℓ1. Clearly we can ignore the possibility of ℓ2
actually equaling ℓ1 because this almost never happens. What about ℓ2 ∥ ℓ1? Look
at it this way: restrict ℓ2 to be a line passing through a fixed point P not lying
in ℓ1; then according to the parallel postulate (page 165), there is at most "one
chance" that ℓ2 ∥ ℓ1, whereas there are infinitely many possibilities for ℓ2 not to
be parallel to ℓ1. Since this is true for any point P not lying on ℓ1, it is intuitively
clear that, "almost always", ℓ2 will be a line distinct from ℓ1 and not parallel to ℓ1.
So by the Corollary, a linear system will "almost always" have a unique solution.
We now take this corollary of Theorem 6.21 to the next level: what are the
algebraic properties of the linear system that would lead to an infinite number of so-
lutions, no solution, and a unique solution? We can literally follow the prescription
of the preceding corollary and just doggedly investigate the algebraic properties of
the linear equations that correspond to ℓ1 = ℓ2, ℓ1 ∥ ℓ2, and ℓ1 ≠ ℓ2 but ℓ1 ∦ ℓ2.
This would lead to a depressing case-by-case argument with thickets of details that
would ultimately not be particularly enlightening.
Here is one way this argument could be carried out.
Case 1. The graphs ℓ1 and ℓ2 of ax + by = e and cx + dy = f,
respectively, coincide. If they are vertical, then b = d = 0 and the
equations become x = e/a and x = f/c. Their graphs are identical
⇐⇒ e/a = f/c. Therefore this case is equivalent to
    b = d = 0 and e/a = f/c.
If they are not vertical, then both b ≠ 0 and d ≠ 0 and we may rewrite
the system as
    y = mx + k,
    y = m′x + k′,
where m = −a/b, k = e/b, m′ = −c/d, and k′ = f/d. Then ℓ1 and ℓ2 being
identical means they have the same slope and therefore m = m′.
The equations of ℓ1 and ℓ2 become y = mx + k and y = mx + k′,
again
ad = bc.
What we are going to do is to give a sophisticated algebraic analysis of the pre-
ceding corollary from the perspective of slope. This analysis is not something most
school students can "discover" on their own. Instead, we are going to offer students
an opportunity to learn from the wisdom of the past. Absorbing what others have
to offer is one very good way for us to grow intellectually. It is pointless to try to
"discover" everything yourself; it is impossible anyway.
Before giving the proof, we observe that since the definition of the determinant
does not require that any of a, b, c, and d be nonzero, the lines ℓ1 and ℓ2 that
correspond to the two equations in the system could therefore be vertical. We also
wish to point out that the following proof could be presented slightly differently;
see Exercise 3 on page 383.
45 Mathematical Aside: You may consider this discussion to be a review of the simplest case
Case (ii). The determinant ad − bc is zero. We claim that in this case, either
both b and d are 0 or both b and d are not 0.
To prove the claim, it suffices to show that if b = 0, then d must be 0, and if
d = 0 then b = 0. Suppose b = 0. Then ad − bc = 0 implies that ad = 0. But by
the definition⁴⁶ of a linear equation of two variables, not both a and b can be 0 in
ax + by = e. So a ≠ 0. It follows that ad = 0 implies d = 0. Similarly, if d = 0,
then also b = 0. The claim is proved.
We now examine the first possibility: b = d = 0. Then both a and c are
nonzero, by the definition of a linear equation of two variables. The linear system
may therefore be rewritten as
    x = e/a,
    x = f/c.
Clearly, these vertical lines are identical (and the system has an infinite number of
solutions) if e/a = f/c, and they are parallel (and the system has no solution) if
e/a ≠ f/c.
Next we examine the second possibility: b ≠ 0 and d ≠ 0. The linear system
may therefore be rewritten as
    y = (−a/b)x + e/b,
    y = (−c/d)x + f/d.
Thus ℓ1 and ℓ2 have slopes equal to −a/b and −c/d, respectively. Now ad − bc = 0
by hypothesis, so ad = bc and the cross-multiplication algorithm implies that
a/b = c/d. This implies ℓ1 and ℓ2 have the same slope. We have to decide if
they are identical or parallel. If e/b = f/d, then the equations are identical and
therefore so are their graphs (the linear system therefore has an infinite number
of solutions), and if e/b ≠ f/d, then the lines are distinct (because, for example,
46 This is another reminder that we must take definitions seriously. Since we have defined a
linear equation αx + βy = γ to be such that not both α and β are zero, it stands to reason that
this part of the definition will play a critical role sooner or later.
(0, e/b) is a point on ℓ1 but not on ℓ2) and are therefore parallel (the linear system
therefore has no solution). This completes the proof of Case (ii) and, therewith,
the proof of Theorem 6.22.
Remark. From the proof itself, we see that the conclusion of Case (ii) can be
made very precise; namely:
(i) If the determinant is 0, then either b = d = 0 or both b ≠ 0
and d ≠ 0.
(ii) In case b = d = 0, then the linear system has an infinite
number of solutions if e/a = f/c, and it has no solution if e/a ≠
f/c.
(iii) In case b ≠ 0 and d ≠ 0, then the linear system has an
infinite number of solutions if e/b = f/d and has no solution if
e/b ≠ f/d.
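The case analysis (i)–(iii) is mechanical enough to be written out as a short procedure. The Python sketch below follows the reasoning of the proof rather than any memorized rule; it assumes that a system with nonzero determinant has a unique solution, and it compares fractions by cross-multiplication. The function name classify and the sample systems are hypothetical.

```python
from fractions import Fraction

def classify(a, b, e, c, d, f):
    """Number of solutions of  ax + by = e,  cx + dy = f:
    returns 'unique', 'none', or 'infinite'."""
    if (a == 0 and b == 0) or (c == 0 and d == 0):
        raise ValueError("not a linear equation: both coefficients are zero")
    if a * d - b * c != 0:
        return "unique"            # nonzero determinant
    if b == 0:
        # Then d = 0 as well: vertical lines x = e/a and x = f/c.
        same_line = e * c == f * a   # e/a = f/c by cross-multiplication
    else:
        # Then d != 0 as well: equal slopes, compare e/b with f/d.
        same_line = e * d == f * b   # e/b = f/d by cross-multiplication
    return "infinite" if same_line else "none"

print(classify(2, -3, 1, 3, 2, -1))                             # unique
print(classify(3, 0, Fraction(4, 5), -1, 0, Fraction(-1, 5)))   # none
print(classify(1, 1, 2, 2, 2, 4))                               # infinite
```

The point of the sketch is the same as that of the Remark: each branch is a line of reasoning about two lines that are vertical or have equal slopes, not a separate fact to be memorized.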
However, it is imperative that you not try to memorize these conclusions. If
you understand the reasoning, then you can use it in each situation to draw the
right conclusion. We illustrate with some simple examples.
If we are given a linear system with b = d = 0, e.g.,
    3x = 4/5,
    −x = −1/5,
then common sense dictates that you multiply the second equation by −3 to change
the system to
    3x = 4/5,
    3x = 3/5.
Direct inspection now shows that the linear system has no solution.
Suppose we are given
    10.2x − 13.6y = 11.5,
    −(9/4)x + 3y = −21/4.
We note that 10.2 × 3 − (−13.6)(−9/4) = 0 as both products are equal to 30.6, and
so we are in the situation of ad − bc = 0 but bd ≠ 0. We know from the preceding
analysis that the linear system has either no solution or an infinite number of
solutions, depending on whether the lines defined by the equations are distinct or
identical, respectively. The simplest way to find out whether the lines defined by
these equations are identical or distinct is to rewrite both equations in the form
of y = mx + b and compare. Thus multiplying the first equation by −1/13.6 and the
second equation by 1/3, we get
    y = (10.2/13.6)x − 11.5/13.6,
    y = (3/4)x − 7/4.
Notice that in so doing, we do not need to bother with checking whether 10.2/13.6 and
3/4 are equal or not, as we already know that they must be equal (because the lines
have the same slope!). The only thing to compare is whether 11.5/13.6 and 7/4 are equal.
Since the former is less than 1 and the latter is greater than 1, they are obviously
not equal. Hence the two lines are distinct and this system has no solutions. There-
fore the same is true of the original system as well.
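The arithmetic in this example can be double-checked with exact fractions, which avoids any rounding of the decimals. The Python sketch below is only a verification of the numbers quoted above; the variable names follow the general system ax + by = e, cx + dy = f.

```python
from fractions import Fraction as F

# 10.2x - 13.6y = 11.5  and  -(9/4)x + 3y = -21/4, as exact fractions.
a, b, e = F("10.2"), F("-13.6"), F("11.5")
c, d, f = F(-9, 4), F(3), F(-21, 4)

print(a * d, b * c)      # both products equal 153/5, i.e., 30.6
print(-a / b, -c / d)    # both slopes equal 3/4
print(e / b, f / d)      # the constant terms: -115/136 and -7/4
```

The determinant is 0 and the two slopes agree, but the constant terms −11.5/13.6 and −7/4 differ, so the two lines are distinct and parallel.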
Solution by substitution
Up to this point, we have not talked about the explicit algebraic method of solv-
ing a linear system taught in school classrooms. This is the well-known method of
substitution (sometimes taught in the equivalent form of the method of elimi-
nation). This method is usually taught by rote in schools. What we will do next
is to subject this method to a critical examination.
In the remainder of this section, we will explain why this method produces the
correct solution to any linear system with nonzero determinant and what it means
geometrically.
Let us first look at a simple case and then use the standard symbolic manipu-
lations taught in schools to get a solution. Consider
    2x − 3y = 1,
    3x + 2y = −1.                (6.44)
Let us eliminate y. So from the second equation, we get
    y = −(3/2)x − 1/2.
Substituting this expression of y into the first equation gives
    2x − 3(−(3/2)x − 1/2) = 1.
Simplifying, we get (13/2)x = −1/2 so that
    x = −1/13.
Substituting this value of x into y = −(3/2)x − 1/2 then leads to
    y = −5/13.
TSM now says (−1/13, −5/13) is a solution of (6.44).
to the discussion on the solution of a linear equation in one variable on pp. 324ff.;
the reader may wish to review the latter before proceeding further.
Suppose $(x_0, y_0)$ is a solution of the system (6.44); i.e., we assume that for an
ordered pair of numbers $(x_0, y_0)$, we have
\[
\begin{aligned}
2x_0 - 3y_0 &= 1,\\
3x_0 + 2y_0 &= -1.
\end{aligned}
\]
Then these are two equalities of numbers, and we can proceed to compute with
them in the usual way that we do arithmetic. From the second equation, we get
\[ y_0 = -\frac{3}{2}x_0 - \frac{1}{2}. \]
Substituting this value of $y_0$ into the first equation gives
\[ 2x_0 - 3\left(-\frac{3}{2}x_0 - \frac{1}{2}\right) = 1. \]
Solving this linear equation in $x_0$ (as in Section 6.2 on page 322), we get $\frac{13}{2}x_0 = -\frac{1}{2}$
so that
\[ x_0 = -\frac{1}{13}. \]
Substituting this value of $x_0$ into $y_0 = -\frac{3}{2}x_0 - \frac{1}{2}$ then leads to
\[ y_0 = -\frac{5}{13}. \]
Note that if we replace $x_0$ by $x$ and $y_0$ by $y$, then this computation is formally
identical to the previous method of solution taught in the school classroom. The
only difference is that the second computation is the one that is mathematically
valid, because it is nothing but an ordinary computation carried out with numbers.
To summarize, what we have proved is this:
(A) If $(x_0, y_0)$ is a solution of the given linear system (6.44), then
\[ x_0 = -\frac{1}{13} \quad\text{and}\quad y_0 = -\frac{5}{13}. \]
Have we shown that $\left(-\frac{1}{13}, -\frac{5}{13}\right)$ is actually a solution of the given linear system?
No. For that purpose, we need to prove the following assertion, which is in fact the
converse statement of (A):
(B) If $x_0 = -\frac{1}{13}$ and $y_0 = -\frac{5}{13}$, then $(x_0, y_0)$ is a solution of the
linear system (6.44).
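A routine computation verifies that, indeed,
\[
\begin{cases}
2\left(-\frac{1}{13}\right) - 3\left(-\frac{5}{13}\right) = 1,\\[2pt]
3\left(-\frac{1}{13}\right) + 2\left(-\frac{5}{13}\right) = -1.
\end{cases}
\]
So the ordered pair $\left(-\frac{1}{13}, -\frac{5}{13}\right)$ produced by the method of substitution is a solution
of the system (6.44), and (B) is correct.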
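The two halves of the argument, (A) and (B), can be mirrored in a short Python sketch. It is offered only as an illustration under stated assumptions: the variable names (`m`, `k`, `coeff`, `rhs`, `is_solution`) are hypothetical, and exact fractions are used so that the final equality tests are genuine checks rather than floating-point approximations.

```python
from fractions import Fraction

# Step (A): assume (x0, y0) solves 2x - 3y = 1 and 3x + 2y = -1.
# The second equation gives y0 = m*x0 + k with m = -3/2 and k = -1/2.
m, k = Fraction(-3, 2), Fraction(-1, 2)

# Substituting into the first equation: 2*x0 - 3*(m*x0 + k) = 1,
# i.e. (2 - 3m)*x0 = 1 + 3k.
coeff = 2 - 3 * m      # equals 13/2
rhs = 1 + 3 * k        # equals -1/2
x0 = rhs / coeff
y0 = m * x0 + k
print(x0, y0)          # prints: -1/13 -5/13

# Step (B): check directly that this pair satisfies both equations.
is_solution = (2 * x0 - 3 * y0 == 1) and (3 * x0 + 2 * y0 == -1)
print(is_solution)     # prints: True
```

Step (A) only narrows the candidates down to one pair; it is the last check, step (B), that shows the pair is in fact a solution.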
Obviously, this pragmatic answer would be of little value if it were a singular
occurrence that happens to furnish a solution for this linear system but for no others.
Such is not the case. We now give a self-contained and coherent account that shows
that the usual method of substitution is the procedural aspect of a mathematically
valid method of solution. In other words, the rote procedure may seem to make
no sense, but in fact it can be shown to make sense after all. Here is the general
explanation.
Assume a linear system
\[
\begin{aligned}
ax + by &= e,\\
cx + dy &= f.
\end{aligned}
\tag{6.45}
\]
We assume that its determinant $\Delta = ad - bc \neq 0$. We first assume that the system
has a solution $(x_0, y_0)$. Then
\begin{align}
ax_0 + by_0 &= e, \tag{6.46}\\
cx_0 + dy_0 &= f. \tag{6.47}
\end{align}
We now follow the usual computations used in the method of substitution to determine
the explicit value of $(x_0, y_0)$, as follows. If $b = d = 0$, then $\Delta = 0$. Thus
$\Delta \neq 0$ implies that not both $b$ and $d$ are 0. Without loss of generality, we may
assume $d \neq 0$. Then equation (6.47) implies $y_0 = -\frac{c}{d}x_0 + \frac{f}{d}$. Substitute this value
of $y_0$ into (6.46) to get
\[ ax_0 + b\left(-\frac{c}{d}x_0 + \frac{f}{d}\right) = e. \]
Multiplying both sides by $d$ and simplifying, we obtain $(ad - bc)x_0 = de - bf$, so
that
\[ x_0 = \frac{de - bf}{\Delta}. \]
Substituting this value of $x_0$ into the equation $y_0 = -\frac{c}{d}x_0 + \frac{f}{d}$, we obtain
\[ y_0 = -\frac{c}{d}\cdot\frac{de - bf}{\Delta} + \frac{f}{d} = \frac{1}{d}\cdot\frac{-cde + bcf + adf - bcf}{\Delta} \]
and therefore
\[ y_0 = \frac{af - ce}{\Delta}. \]
What we have just proved is that, assuming that the system (6.45) has a solution
$(x_0, y_0)$ and that $\Delta \neq 0$, then necessarily
\[ (x_0, y_0) = \left(\frac{de - bf}{\Delta},\ \frac{af - ce}{\Delta}\right). \tag{6.48} \]
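It is now routine to check that the $(x_0, y_0)$ in (6.48) is in fact a solution of the system
(6.45). For example, let us check that with $(x_0, y_0)$ as in (6.48), (6.46) holds:
\[
ax_0 + by_0 = a\,\frac{de - bf}{\Delta} + b\,\frac{af - ce}{\Delta}
= \frac{ade - abf + abf - bce}{\Delta} = \frac{(ad - bc)e}{\Delta} = e.
\]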
It remains to observe that the computations that led to (6.48) are exactly what
we do when solving simultaneous equations by substitution.
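As a rough check of the general discussion, the substitution steps and the closed formula (6.48) can be placed side by side in Python. This is a sketch under stated assumptions, not a definitive implementation: the function names `solve_by_substitution` and `formula_6_48` are hypothetical, and the case $d = 0$ is handled by interchanging the two equations, which is one way of reading "without loss of generality".

```python
from fractions import Fraction

def solve_by_substitution(a, b, c, d, e, f):
    """Solve ax + by = e, cx + dy = f by substitution, assuming ad - bc != 0."""
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    if a * d - b * c == 0:
        raise ValueError("determinant is zero; the method does not apply")
    if d == 0:
        # Then b != 0 (otherwise the determinant would be 0),
        # so interchange the two equations.
        a, b, e, c, d, f = c, d, f, a, b, e
    # From the second equation: y = -(c/d) x + f/d.
    slope, intercept = -c / d, f / d
    # Substituting into the first equation: (a + b*slope) x = e - b*intercept.
    x0 = (e - b * intercept) / (a + b * slope)
    y0 = slope * x0 + intercept
    return x0, y0

def formula_6_48(a, b, c, d, e, f):
    """Evaluate the closed formula (6.48) with exact fractions."""
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    delta = a * d - b * c
    return (d * e - b * f) / delta, (a * f - c * e) / delta

# The system (6.44): 2x - 3y = 1, 3x + 2y = -1.
print(solve_by_substitution(2, -3, 3, 2, 1, -1) == formula_6_48(2, -3, 3, 2, 1, -1))
```

Agreement on a few examples is of course no substitute for the proof above; it only illustrates that the procedure and the formula describe the same computation.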
Exercises 6.7.
(1) Verify by direct computation that equation (6.47) holds for the $(x_0, y_0)$ in
(6.48).
(2) Discuss the solutions of each of the following systems without actually
solving them.
Give reasons.
4x − 3y = 1,
(a)
9x − 7y = 53 .
(b) $2.4x - 3.5y = 43$,
    $-0.264x + 0.385y = -4.63$.
(c) $15x + 12y = -4$,
    $-35x - 28y = \frac{28}{3}$.
(3) (a) Give a direct proof of $(\ast)$ below by using Lemma 6.13 on page 356 and
Theorem 6.21 on page 374 without invoking Theorem 6.22 (on page 377).
$(\ast)$ Assume the linear system (where $m$, $m'$, $k$, and $k'$ are constants)
\[
\begin{aligned}
mx + y &= k,\\
m'x + y &= k'.
\end{aligned}
\]
(7) Let $x$ be a number not equal to $-4$ or $3$. Express $\frac{-5(x+1)}{3(x^2+x-12)}$ as a sum of
(constant) multiples of $\frac{1}{x+4}$ and $\frac{1}{x-3}$.
(8) Suppose you are teaching a ninth grade algebra class and you want to show
your students how to solve the following linear system by the method of
substitution for the first time:
2x − y = −1,
x + 2y = 7.
Carefully explain how you would teach it.
(9) Given positive integers $s$ and $t$ with $s < t$. (a) Solve for $u$ and $v$ in the
linear system
\[
\begin{cases}
u + v = \dfrac{t}{s},\\[4pt]
u - v = \dfrac{s}{t}.
\end{cases}
\]
(b) Show that $u > 0$ and $v > 0$. (c) If the solutions $u$ and $v$ in terms
of $s$ and $t$ are written in the form of $u = \frac{c}{b}$ and $v = \frac{a}{b}$, where $a$, $b$, $c$
are positive integers expressed in terms of $s$ and $t$, show that $\{a, b, c\}$
forms a Pythagorean triple; i.e., $a$, $b$, and $c$ are positive integers and
$a^2 + b^2 = c^2$. (Note: See page 290. This way of generating Pythagorean
triples dates back to the Babylonians circa 1800 BC. See [Robson] and
also Chapter 1 of [Katz].)
(10) In each of the following, you are asked to solve the linear system in the
preceding exercise with the given values of s and t to obtain Pythagorean
triples. You may use a scientific calculator.
(a) s = 1, t = 2. (b) s = 2, t = 3. (c) s = 2, t = 69.
(d) s = 54, t = 125. (e) s = 8, t = 9907.
(11) A Pythagorean triple is said to be primitive if there is no (positive)
common divisor among the triple of positive integers other than 1. Prove
that the following are equivalent for a Pythagorean triple {a, b, c}:
(i) {a, b, c} is primitive.
(ii) {a, b} are relatively prime.
(iii) {a, c} are relatively prime.
(iv) {b, c} are relatively prime.
(12) Let $\{a, b, c\}$ be a Pythagorean triple so that $a^2 + b^2 = c^2$. If $\{a, b, c\}$ is
primitive, prove that one of $a$ and $b$ is even and the other is odd.
(13) If s and t are relatively prime positive integers so that one is even and
the other is odd, then prove that the Pythagorean triples produced in
Exercise 9 are primitive.
(14) Let $\{a, b, c\}$ be a Pythagorean triple. Prove that the following are equivalent:
(i) $\{a, b, c\}$ is a primitive Pythagorean triple with $a$ odd and $b$ even.
(ii) There is a pair of relatively prime positive integers $s$ and $t$, with $s < t$
and one of them is even and the other odd, so that $a = t^2 - s^2$, $b = 2st$,
and $c = t^2 + s^2$.
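For readers who would rather check their answers to Exercises 9, 10, and 14 by machine than with a scientific calculator, the following Python sketch may help. It is only an illustration: the function name `triple_from` is hypothetical, and the choice of the common denominator $b = 2st$ is taken from the formulas stated in Exercise 14(ii), not derived here.

```python
from fractions import Fraction
from math import gcd

def triple_from(s, t):
    """Solve u + v = t/s, u - v = s/t, then read off (a, b, c) from u = c/b, v = a/b."""
    if not (0 < s < t):
        raise ValueError("need positive integers with s < t")
    # Adding and subtracting the two equations gives u and v.
    u = (Fraction(t, s) + Fraction(s, t)) / 2
    v = (Fraction(t, s) - Fraction(s, t)) / 2
    # Assumption: use the denominator b = 2st from Exercise 14(ii),
    # which need not be the lowest-terms denominator of u and v.
    b = 2 * s * t
    a, c = v * b, u * b          # integer-valued fractions
    return int(a), b, int(c)

# The pairs (s, t) listed in Exercise 10.
for s, t in [(1, 2), (2, 3), (2, 69), (54, 125), (8, 9907)]:
    a, b, c = triple_from(s, t)
    # Report the triple, whether a^2 + b^2 = c^2, and whether gcd(a, b) = 1.
    print((s, t), (a, b, c), a * a + b * b == c * c, gcd(a, b) == 1)
```

Python integers have arbitrary precision, so the check $a^2 + b^2 = c^2$ is exact even for the largest pair; the last column gives a quick test of primitivity in the sense of Exercise 11.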
[Arbaugh et al.] F. Arbaugh, M. Smith, J. Boyle, and M. Steele, We Reason & We Prove for All
Mathematics, Corwin, Thousand Oaks, CA, 2018.
[Ball] D. L. Ball, The mathematical understandings that prospective teachers bring to teacher
education, Elementary School Journal 90 (1990), 449–466.
[Ball-McDiarmid] D. L. Ball and G. W. McDiarmid, The subject matter preparation of teachers.
In W. R. Houston (ed.), Handbook of Research on Teacher Education, Macmillan, New
York, NY, 1990, 437–449.
[Barnett-Goldstein-Jackson] C. Barnett, D. Goldstein, and B. Jackson, Fractions, Decimals, Ra-
tios, & Percents, Heinemann, Portsmouth, NH, 1994.
[Bashmakova-Smirnova] I. Bashmakova and G. Smirnova, Beginning & Evolution of Algebra,
Mathematical Association of America, Washington, DC, 2000.
[Beckmann-Izsák] S. Beckmann and A. Izsák, Why is slope hard to teach?, September 1, 2014.
Retrieved from http://tinyurl.com/j3k5q7r
[Begle1972] E. G. Begle, Teacher knowledge and student achievement in algebra, SMSG Reports,
No. 9, 1972, https://eric.ed.gov/?id=ED064175
[Behr et al.] M. Behr, G. Harel, T. Post, and R. Lesh, Rational numbers, ratio, and proportion.
In D. A. Grouws, editor. Handbook of Research on Mathematics Teaching, Macmillan,
New York, 1992, 296–333.
[Carpenter et al.] T. P. Carpenter, M. L. Franke, and L. Levi. Thinking mathematically: Integrat-
ing algebra and arithmetic in elementary school, Heinemann, Portsmouth, NH, 2003.
[CCPublishers1] K-8 Publishers’ Criteria for the Common Core State Standards for Mathematics
(2012). Retrieved from http://tinyurl.com/bpgx8ed
[CCPublishers2] High School Publishers’ Criteria for the Common Core State Standards for
Mathematics (2013). Retrieved from https://tinyurl.com/y8upgtcr
[CCSSM] Common Core State Standards for Mathematics (2010). Retrieved from http://www.
corestandards.org/Math/
[Courant-Robbins] R. Courant and H. Robbins, What Is Mathematics?, Oxford University Press,
New York, 1941. MR0005358
[Davis-Pearn] G. E. Davis and C. A. Pearn, Division of Fractions, Republic of Mathematics
Publications, 2009, http://tinyurl.com/h85qwb8
[DeTurck] D. DeTurck, Down with Fractions!, September 22, 2002. Retrieved from https://www.
youtube.com/watch?v=AKYZhdbnOWM
[Education Week] Education Week, Researcher Isolates Common-Core Math Implementation
Problems, November 14, 2014. Retrieved from: https://tinyurl.com/y7hme29c
[Ellis-Bieda-Knuth] A. B. Ellis, K. Bieda, and E. Knuth, Essential Understanding of Proof and
Proving, National Council of Teachers of Mathematics, Reston, VA, 2012.
[EngageNY] Grade 8 Mathematics Module 4: Teacher Materials. http://tinyurl.com/h2er4qh
[Euclid1] Euclid, The Thirteen Books of the Elements, Volume 1. T. L. Heath, transl., Dover
Publications, New York, NY, 1956.
[Euclid2] Euclid, The Thirteen Books of the Elements, Volume 2. T. L. Heath, transl., Dover
Publications, New York, NY, 1956.
[Eureka] Common Core’s Eureka Math — Grade 8, https://greatminds.org/resources/
[Gibson] G. A. Gibson, Common-Sense Methods in Arithmetic and Algebra, The School World,
No. 97, January 1907. Retrieved from http://tinyurl.com/zemxm5q
[Ginsburg] J. Ginsburg, On the Early History of the Decimal Point, Amer. Math. Monthly 35
(1928), no. 7, 347–349, DOI 10.2307/2298362. MR1521514
[Wu2004a] H. Wu, Geometry: Our Cultural Heritage, Notices Amer. Math. Soc. 51 (2004), 529–
537, https://www.ams.org/notices/200405/rev-wu.pdf
[Wu2004b] H. Wu, "Order of operations" and other oddities in school mathematics, June 1, 2004.
Retrieved from http://math.berkeley.edu/~wu/order5.pdf
[Wu2006] H. Wu, How mathematicians can contribute to K–12 mathematics education, Proceed-
ings of International Congress of Mathematicians, 2006, III, European Mathematical So-
ciety, Madrid, 2006, Zürich, 2006, 1676–1688, http://math.berkeley.edu/~wu/ICMtalk.
pdf
[Wu2008] H. Wu, Fractions, decimals, and rational numbers, February 29, 2008. Retrieved from
https://math.berkeley.edu/~wu/NMPfractions.pdf
[Wu2009] H. Wu, What’s sophisticated about elementary mathematics?, American Educator, Vol.
33, No. 3, Fall 2009, 4–14, https://math.berkeley.edu/~wu/wu2009.pdf
[Wu2010a] H. Wu, Pre-Algebra, April 21, 2010.47 Retrieved from http://math.berkeley.edu/
~wu/Pre-Algebra.pdf
[Wu2010b] H. Wu, Introduction to School Algebra, August 14, 2010. Retrieved from https://
math.berkeley.edu/~wu/Algebrasummary.pdf
[Wu2011a] H. Wu, Understanding Numbers in Elementary School Mathematics,
Amer. Math. Soc., Providence, RI, 2011, https://bookstore.ams.org/mbk-79/
[Wu2011b] H. Wu, The Mis-Education of Mathematics Teachers, Notices Amer. Math. Soc. 58
(2011), 372–384, https://math.berkeley.edu/~wu/NoticesAMS2011.pdf
[Wu2012] H. Wu, Teaching Geometry According to the Common Core Standards, January 1,
2012 (third revision: October 10, 2013). Retrieved from https://math.berkeley.edu/
~wu/Progressions_Geometry.pdf
[Wu2013] H. Wu, Teaching Geometry in Grade 8 and High School According to the Common
Core Standards, October 13, 2013, https://math.berkeley.edu/~wu/CCSS-Geometry_1.
pdf
[Wu2014] H. Wu, Potential impact of the Common Core Mathematics Standards on the Amer-
ican Curriculum. In Mathematics Curriculum in School Education. Y. Li and G. Lap-
pan (eds.), Springer, Dordrecht, 2014, pp. 119–142, https://math.berkeley.edu/~wu/
Common_Core_on_Curriculum_1.pdf
[Wu2015] H. Wu, Mathematical education of teachers, Part II: What are we doing about Textbook
School Mathematics?, AMS Blogs, March 1, 2015, https://tinyurl.com/y46wnahl
[Wu2016a] H. Wu, Teaching School Mathematics: Pre-Algebra, Amer. Math. Soc., Providence, RI,
2016, https://bookstore.ams.org/mbk-98/. Its Index is available at: http://tinyurl.
com/zjugvl4
[Wu2016b] H. Wu, Teaching School Mathematics: Algebra, Amer. Math. Soc., Providence, RI,
2016, https://bookstore.ams.org/mbk-99/. Its Index is available at: http://tinyurl.
com/haho2v6
[Wu2018a] H. Wu, The content knowledge mathematics teachers need. In Mathematics Matters
in Education—Essays in Honor of Roger E. Howe, Y. Li, J. Lewis, and J. Madden
(eds.), Springer, Dordrecht, 2018, pp. 43–91. Also https://math.berkeley.edu/~wu/
Contentknowledge1A.pdf
[Wu2018b] H. Wu, From arithmetic to algebra, Part 1 and Part 2, December 20, 2018. Retrieved
from https://math.berkeley.edu/~wu/Arithmetic-to-Algebra2019.pdf
[Wu2020b] H. Wu, Algebra and Geometry, Amer. Math. Soc., Providence, RI, 2020.
[Wu2020c] H. Wu, Pre-Calculus, Calculus, and Beyond, Amer. Math. Soc., Providence, RI, 2020.
47 This is referenced in [CSSM], page 92, as "Wu, H., Lecture Notes for the 2009 Pre-Algebra
Institute".