Rational Numbers

to Linear Equations
Hung-Hsi Wu
2010 Mathematics Subject Classification. Primary 97-01, 97-00, 97D99, 97-02,
00-01, 00-02.

For additional information and updates on this book, visit
www.ams.org/bookpages/mbk-131

Library of Congress Cataloging-in-Publication Data


Cataloging-in-Publication Data has been applied for by the AMS.
See http://www.loc.gov/publish/cip/.

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.

© 2020 by the author. All rights reserved.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
10 9 8 7 6 5 4 3 2 1 25 24 23 22 21 20
Dedicated to the memory of

David M. Collins
Contents

Contents of the Companion Volumes and Structure of the Chapters ix

Preface xi

To the Instructor xxvii

To the Pre-Service Teacher xli

Prerequisites xlv

Some Conventions xlvii

Chapter 1. Fractions 1
Overview of Chapters 1 and 2 1
1.1. Definition of a fraction 4
1.2. Equivalent fractions 19
1.3. Adding and subtracting fractions 32
1.4. Multiplying fractions 43
1.5. Dividing fractions 54
1.6. Complex fractions 68
1.7. Percent, ratio, and rate problems 72
1.8. Appendix: The basic laws 86

Chapter 2. Rational Numbers 89


2.1. The rational numbers 90
2.2. Adding rational numbers 91
2.3. The vectorial representation of addition 99
2.4. Multiplying rational numbers 105
2.5. Dividing rational numbers 112
2.6. Comparing rational numbers 121
2.7. FASM 133

Chapter 3. The Euclidean Algorithm 137


3.1. The reduced form of a fraction 137
3.2. The fundamental theorem of arithmetic 147

Chapter 4. Basic Isometries and Congruence 157


Overview of Chapters 4 and 5 157
4.1. The basic vocabulary, Part 1 164
4.2. The basic vocabulary, Part 2 180
4.3. Transformations of the plane 199
4.4. The basic isometries: Rotations 216

4.5. The basic isometries: Reflections and translations 229


4.6. Congruence, SAS, and ASA 239
4.7. A brief pedagogical discussion of proofs 252
Chapter 5. Dilation and Similarity 255
5.1. The fundamental theorem of similarity 255
5.2. Dilation 268
5.3. Similarity 283
Chapter 6. Symbolic Notation and Linear Equations 297
6.1. Symbolic expressions 298
6.2. Solving linear equations in one variable 322
6.3. Setting up coordinate systems 331
6.4. Lines in the plane and their slopes 337
6.5. The graphs of linear equations in two variables 351
6.6. Parallelism and perpendicularity 363
6.7. Simultaneous linear equations 373

Glossary of Symbols 385


Bibliography 387
Index 391
Contents of the Companion Volumes
and Structure of the Chapters

Algebra and Geometry [Wu2020b]


Chapter 1: Linear Functions
Chapter 2: Quadratic Functions and Equations
Chapter 3: Polynomial and Rational Functions
Chapter 4: Exponential and Logarithmic Functions
Chapter 5: Polynomial Forms and Complex Numbers
Chapter 6: Basic Theorems of Plane Geometry
Chapter 7: Ruler and Compass Constructions
Chapter 8: Axiomatic Systems

Pre-Calculus, Calculus, and Beyond [Wu2020c]


Chapter 1: Trigonometry
Chapter 2: The Concept of Limit
Chapter 3: The Decimal Expansion of a Number
Chapter 4: Length and Area
Chapter 5: 3-Dimensional Geometry and Volume
Chapter 6: Derivatives and Integrals
Chapter 7: Exponents and Logarithms, Revisited


Structure of the chapters in this volume and its two companion volumes
(RLE = Rational Numbers to Linear Equations,
A&G = Algebra and Geometry,
PCC = Pre-Calculus, Calculus, and Beyond)
Preface

The really vital importance of definition is not, I venture to
think, sufficiently emphasized even in good textbooks. . . .
they form the premises from which the rest of the algebraic
theorems are to be derived by a process of logical deduction.
George A. Gibson ([Gibson, page 3])

A nation’s mathematics education is only as good as its mathematics teachers.
The ongoing crisis in school mathematics education (cf. [RAGS]) therefore
raises the question: what have we done wrong in the preparation of mathematics
teachers? The answer is plenty: our longstanding neglect of the mathematical edu-
cation of teachers has come home to roost. This neglect manifests itself in K–12,
where we fail to ensure that correct mathematics is taught to students—especially
future teachers—and we compound this neglect by failing to provide the needed
corrective measures in universities for pre-service teachers to repair their mathe-
matical mis-education in K–12 (cf. [Wu2011b]). Thus, through no fault of their
own, the mathematics teachers of our nation are put in the untenable position of
teaching from a position of weakness: they do not possess the needed knowledge of
mathematics to carry out their basic duties.
The present volume is the fourth of six volumes whose collective goal is to
provide the needed mathematical backing for a full-scale attack on the crisis in
the mathematical education of mathematics teachers in K–12. This volume is the
first of three—the other two volumes being [Wu2020b] and [Wu2020c]—that give a
systematic and grade-level-appropriate exposition of the mathematics of grades 9–
12 (excluding probability1 and statistics), together with some essential background
information about rational numbers. This is the mathematical content knowledge
that we believe, as of 2020, all high school mathematics teachers need for their
teaching and all mathematics educators2 interested in high school mathematics
need for their research. The previous three volumes—the volume on the math-
ematics of grades K–6 ([Wu2011a]) and the two volumes on the mathematics of
grades 6–8 ([Wu2016a] and [Wu2016b])—have already been published. We hope
that these six volumes will serve the dual purpose of revamping the mathematical
education in the universities of pre-service mathematics teachers and future math-
ematics educators on the one hand, and on the other, offering textbook publishers
a detailed blueprint on how to introduce mathematics into school textbooks that is
both correct and learnable. These six volumes will also shore up the critical
mathematical backgrounds of supervisors of mathematics and mathematics professional
developers.

1 There is, however, an exposition of finite probability in Section 1.10 of [Wu2016a].
2 We use the term "mathematics educators" to refer to university faculty in schools of
education.
There has been no lack of books on all or parts of school mathematics—the
mathematics of K–12—in the education literature. We have chosen to add another
2,500 pages (the approximate total length of these six volumes) to the already
voluminous literature because we believe these volumes provide a first attempt
at solving two of the central problems in mathematics education: whether school
mathematics can be made to respect the integrity of mathematics and how much
mathematics a mathematics teacher or a mathematics educator needs to know.

School mathematics that respects mathematical integrity

We will address the former problem first. These six volumes give a detailed
confirmation of the fact that school mathematics—while maintaining its fidelity to
the progression of the standard school mathematics curriculum from kindergarten to
grade 12—can be made to respect the integrity of mathematics. Such a confirmation
has been a long time coming.
In the following pages, we will explain what mathematical integrity is and
why it is important to have an exposition of school mathematics that respects
mathematical integrity.3
At first glance, it seems absurd that there would be any need to discuss whether
school mathematics respects mathematical integrity. Is not school mathematics, by
its very name, part of mathematics and, as such, does it not follow that school math-
ematics carries the integrity inherent in the subject? This is a misconception about
school mathematics that we must confront without delay. School mathematics is
in fact not part of mathematics if mathematics is understood to be what working
mathematicians do or what is taught to math majors in college mathematics depart-
ments. Rather, school mathematics is an engineered version of mathematics—in the
sense of mathematical engineering introduced in [Wu2006]—in the same way
that civil engineering is an engineered version of Newtonian mechanics. Mathe-
matical engineering customizes the abstractions of mathematics for consumption
by K–12 students. For example, a fraction in mathematics is a straightforward
concept: it is an element of the quotient field of the integral domain of integers.
Fortunately, no one suggests that we tell this to ten-year-olds. Mathematical engi-
neering intervenes at this point to recast the concept of fractions so that fractions
can be understood by elementary students (see [Wu1998]). There are many such
examples all through the K–12 curriculum, e.g., negative numbers, slope of a line,
geometric measurements (length, area, and volume), congruence, similarity, expo-
nential functions, logarithms, axioms of plane geometry, etc. The engineering that
is needed to make these abstract concepts learnable by school students is therefore
substantial at times. Now there is good engineering, but there is also bad engineer-
ing, and the question is whether good mathematical engineering has been put in the
service of school mathematics. Unhappily, the answer is not always. In fact, school
mathematics and mathematical integrity parted ways at least five decades ago, and
our schools have been plagued by products of very bad mathematical engineering
ever since.

3 It would be legitimate to also inquire why it has taken so long for someone to try to meet
this obvious need.
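For readers who want to see what the remark about "the quotient field of the integral domain of integers" amounts to in practice, here is a minimal sketch in Python. The modeling choices are our own assumptions: a fraction is represented by a pair of integers (a, b) with b ≠ 0, two pairs (a, b) and (c, d) are treated as the same element exactly when ad = bc, and sums and products are given by the usual formulas. The class name QuotientFraction is hypothetical and chosen only for illustration. Python's standard library already provides fractions.Fraction, which stores a reduced representative instead of comparing by cross-multiplication.

```python
from math import gcd


class QuotientFraction:
    """Hypothetical illustration: a fraction as an equivalence class of
    integer pairs (a, b) with b != 0, where (a, b) ~ (c, d) iff a*d == b*c."""

    def __init__(self, num: int, den: int) -> None:
        if den == 0:
            raise ValueError("denominator must be nonzero")
        self.num = num
        self.den = den

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, QuotientFraction):
            return NotImplemented
        # Cross-multiplication test for membership in the same class.
        return self.num * other.den == self.den * other.num

    def __hash__(self) -> int:
        # Hash a canonical representative (lowest terms, positive
        # denominator) so that equal classes always hash alike.
        g = gcd(self.num, self.den)
        n, d = self.num // g, self.den // g
        if d < 0:
            n, d = -n, -d
        return hash((n, d))

    def __add__(self, other: "QuotientFraction") -> "QuotientFraction":
        # (a, b) + (c, d) = (a*d + b*c, b*d)
        return QuotientFraction(self.num * other.den + self.den * other.num,
                                self.den * other.den)

    def __mul__(self, other: "QuotientFraction") -> "QuotientFraction":
        # (a, b) * (c, d) = (a*c, b*d)
        return QuotientFraction(self.num * other.num, self.den * other.den)

    def __repr__(self) -> str:
        return f"QuotientFraction({self.num}, {self.den})"
```

For instance, QuotientFraction(2, 3) == QuotientFraction(8, 12) holds because 2 × 12 = 3 × 8. Comparing by cross-multiplication, rather than reducing on construction, keeps the sketch close to the equivalence-class definition; the price is that __hash__ has to reduce to a canonical representative so that equal classes hash alike.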
Before proceeding further, we first explain what mathematical integrity is
because this concept is coming into focus. We say a mathematical exposition has
mathematical integrity if it embodies the following five qualities:
(a) Definitions: Every concept is clearly and precisely defined
so that there is no ambiguity about what is being discussed. (See
the quote from Gibson at the beginning of this preface.)
(b) Precision: All statements are precise, especially the hy-
potheses that guarantee the validity of a mathematical assertion,
the reasoning in a proof, and the conclusions that follow from a
set of hypotheses.
(c) Reasoning: All statements4 other than the unavoidable ba-
sic assumptions are supported by reasoning.5
(d) Coherence: The basic concepts and skills are logically in-
terwoven to form a single fabric, and the interconnections among
them are consistently revealed.
(e) Purposefulness: The mathematical purpose behind every
concept and skill is clearly brought out so as to leave no doubt
about why it is where it is.
These we call the Fundamental Principles of Mathematics. A fuller discus-
sion of these principles will be found on pp. xxviii–xxxiii in the To the Instructor
section on pp. xxvii ff. below, but two things need to be said right away. First, the
role of definitions in school mathematics has been misunderstood, and misrepre-
sented, in the education literature thus far, so that—to educators—the emphasis on
definitions may seem to be misplaced. One will find a more balanced presentation
about definitions on pp. xxix–xxx. Next, there is no difference between reasoning
and proof in a mathematical context, and what is generally called problem solving
in the education literature is part of what is known as theorem proving in math-
ematics.6 Overall, it should not be difficult to see—and these three volumes will
bear witness to this fact—that these five fundamental principles are what make
mathematics transparent, in the sense that everything is on the table and no guess-
work or privileged knowledge is needed for its decoding. They are also the qualities
that make mathematics accessible to all students and learnable by all students. If
we want mathematics learning to take place in schools, it is incumbent on us to
teach school mathematics that is consistent with these fundamental principles.

4 With the exception of a few standard ones such as the fundamental theorem of algebra.
5 Intuitively, reasoning supports even those assumptions because there are reasons why we
want to assume them.
6 Compare the discussion on pp. xxxvi–xxxvii.

But to return to the discussion of school mathematics of the past five decades,
we have to begin by asking: what is school mathematics? This is in fact the question
that these six volumes ([Wu2011a], [Wu2016a], . . . , [Wu2020c]) attempt to answer,
but short of that, we will have to say that school mathematics is the common content
of most of the mathematics textbooks in K–12 and most of the college textbooks
aimed at the professional development of mathematics teachers and mathematics
educators (compare the review of school textbooks in Appendix B of Chapter 3
in [NMAP2]). If this strikes readers as too vague, they will be relieved to know
that there is in fact an amazing consistency among these textbooks.7 For example,
a fraction is thought of as a piece of pizza or a part-of-a-whole, although neither
conveys the message to students that a fraction is a number that they have to use
for extensive computations. Consequently, with such a "definition" of a fraction,
the arithmetic operations on fractions cannot be defined and their computational
algorithms cannot be proved.8 For another example, the concept of the slope of a
line in the coordinate plane is defined in most of these textbooks by taking two pre-
assigned points on the line to form the rise-over-run. But why is this rise-over-run
equal to the rise-over-run with respect to another pair of points on the same line?
Almost all textbooks insinuate that this equality is obviously true and not worth
fussing about. And so on. In general, school mathematics, as defined collectively
by these textbooks, is antithetical to mathematical integrity in that it lacks clarity
(due to a general absence of definitions and a pervasive lack of precision in its artic-
ulation), mostly asks for rote memorization as its default mode of learning (due to
the pervasive absence of reasoning), is incoherent (due to its neglect of the inherent
logical structure of mathematics), and traverses the curriculum in a listless and
pro forma manner (due to its failure to recognize the mathematical purpose behind
each topic). We call the content of these standard school mathematics textbooks
TSM (Textbook School Mathematics).9 TSM is recognized, consciously or
subconsciously, by teachers and educators to be unlearnable, and it is this unlearn-
ability that emboldens countless sensible adults to proclaim, often with pride, "I
am not good in math!"
There is a far more pernicious fallout from TSM, however, and it is the effect
TSM has on mathematics teachers and educators. These teachers and educators
have learned only TSM in K–12, but as of 2020, institutions of higher learning do
not provide courses to help future teachers and educators to replace their knowledge
of TSM with school mathematics with mathematical integrity. Consequently, all
that most teachers can do when they go back to teach in K–12 is trot out the TSM
they are familiar with, and all that most educators can do when they begin their
research is to fall back on the TSM they were taught. So the next generation also
learns TSM, and this is the vicious cycle that has rendered school mathematics
synonymous with TSM for at least the past five decades. Most educators may have
suspected that there must be more to school mathematics than TSM, but without
access to an exposition of school mathematics with mathematical integrity, their
suspicion remains just that, a suspicion.
Back in 1985, Lee Shulman lamented in his well-known address to the AERA
about "the absence of focus on subject matter among the various research paradigms
for the study of teaching" ([Shulman, page 6]). Shulman was talking about all dis-
ciplines, but from the standpoint of these six volumes ([Wu2011a], [Wu2016a], . . . ,
[Wu2020c]), we gain a clear perspective on how this neglect of the subject matter
may have come about in mathematics. We speculate that mathematics educators
may have chosen not to pay any attention to mathematical content because, since
school mathematics was apparently nothing more than TSM, they saw nothing in
the subject matter of school mathematics worthy of their serious attention. To
change mathematics educators’ perception of the subject matter in mathematics,
we have to give them access to a fully detailed exposition of school mathematics
with mathematical integrity.

7 When I first had the opportunity to sample a wide range of the available K–12 textbooks
around the year 2000, I was convinced that the publishers were in collusion and
simply agreed to copy each other.
8 Remember: "Ours is not to reason why. Just invert and multiply."
9 For a more extended discussion of TSM, see To the Instructor on pp. xxvii ff. as well as
[Wu2014] and [Wu2018a]. Note that TSM provides a new window into the phenomenon known as
math phobia.
The omnipresence of TSM in the last half-century created the unmistakable im-
pression that perhaps at least some of the travesties in TSM are necessarily endemic
to school mathematics. Under the circumstances, it was not easy to imagine that
school mathematics might have anything to do with mathematical integrity. But
two things happened around 1990. In 1989, NCTM (National Council of Teachers
of Mathematics) launched its school mathematics education reform by proclaiming
that school mathematics could be made to respect mathematical integrity. Without
a detailed exposition of school mathematics that respects mathematical integrity
to back up its claim, NCTM was of course going out on a limb. Then in 1994,
Alan Schoenfeld made a scholarly statement with the clear implication that, while
a school mathematics curriculum with mathematical integrity was certainly possi-
ble, we did not have it yet. What he wrote was, "Proof is not a thing separable
from mathematics, as it appears to be in our curricula . . . . And I believe it can be
imbedded in our curricula, at all levels" ([Schoenfeld1994, page 76]). Schoenfeld’s
statement was prompted in part by the debates surrounding the NCTM reform.
Note that beyond affirming his belief in the fundamental article of faith underly-
ing the NCTM reform, he stated openly that, indeed, this article of faith had not
yet been confirmed. We will return to Schoenfeld’s statement below, but before
proceeding further, we will make a few comments about the NCTM reform.
The foundational documents of the NCTM reform are the two sets of standards:
the 1989 [NCTM1989] and the 2000 [PSSM]. Although NCTM did not have an ex-
plicit recognition of the concept of TSM, the 1989 reform was undoubtedly a revolt
against the stranglehold of TSM on school mathematics education. NCTM declared
in essence that mathematical integrity must be part of school mathematics. For ex-
ample, [NCTM1989] states that one of the reform’s goals is that students "become
mathematically literate" (page 6). [PSSM] states that "a mathematics curriculum
should be coherent" (page 15), "should focus on important mathematics" (page
15), and "reasoning and proof should be a consistent part of students’ mathemati-
cal experience in prekindergarten through grade 12" (page 56). With the hindsight
of thirty years, we can see all too clearly the obstacles that confronted the NCTM
reform. With students, teachers, and educators completely immersed in TSM, the
clarion call for coherence, reasoning, and proof might as well have been stated in a
foreign language. Most of them had no conception of what those words meant.
We have to remember that, for example, what little "proof" TSM has to offer
resides only in the course in high school geometry, and even there, proofs are mainly
taught by rote (see [Schoenfeld1988]). Back in 1989, there was no detailed point-by-
point exposition of school mathematics that could provide a roadmap to show how
mathematical integrity can be introduced into school mathematics. There were no
school mathematics textbooks to replace the TSM-infested ones.10 Most fatally,
NCTM made no commitment to a massive and long-term professional development
program to explain to teachers what mathematical integrity in their daily teaching
could look like.11 With these three strikes against the NCTM reform before it
stepped up to the plate, the reform faced an insurmountable credibility crisis. Its
visionary declaration about what school mathematics education could be and ought
to be almost instantly became nothing more than appealing rhetoric. The need for
a detailed exposition of school mathematics with mathematical integrity could not
have been more urgent.

10 In the year 2020, most of us can calmly look back and see that the reform curricula that
were published post-1989 were essentially different incarnations of TSM.
There is another way that having a detailed exposition of school mathematics
with mathematical integrity would have helped with the reform. Both [NCTM1989]
and [PSSM] did try to provide some mathematical details about the curriculum
they envisioned and, in so doing, made some missteps. For example, page 96 of
[NCTM1989] suggests that the addition of fractions—inscrutable as it is in TSM—
has to be approached gingerly, and neither [NCTM1989] nor [PSSM] points out the
profound error of using the least common denominator for the addition of fractions
(see page 41 below for an explanation of this error). Students’ difficulty with the
multiplication and division of fractions is duly noted in [NCTM1989] and [PSSM],
but again there is no substantive suggestion on how to get them out of the predica-
ment. Either document could have pointed to the need for a proof of the area
formula for a rectangle with fractional sides; such a proof would add immeasurably
to students’ knowledge and confidence in reasoning and proof about fraction multi-
plication and about the concept of area (see pp. 49ff. below for such a proof). But
the fact remains that neither did. One suggestion on the division of fractions is made
on page 219 of [PSSM], but it confuses the division of fractions with the concept of
division-with-remainder (see Section 7.2 of [Wu2011a] for a careful discussion of the
latter). The difficulties of the concept of slope for teachers (and students) are noto-
rious (see, e.g., page 126 of [Stump] or [Newton-Poon2]), but neither [NCTM1989]
nor [PSSM] seems to recognize that the concept of slope as it is known in TSM is
not properly defined and therefore a new approach is called for (see pp. 338–354 in
this volume), and so on. These and many other missteps could have been avoided
had a detailed exposition of school mathematics with mathematical integrity been
available.
Some twenty years later, 2010 saw the release of CCSSM, the Common Core
State Standards for Mathematics ([CCSSM]). CCSSM calls for a focused and co-
herent curriculum that stresses both conceptual understanding and procedural flu-
ency (pp. 3–4 and 6 of [CCSSM]). In addition, it also asks for precision and clear
definitions (page 7, loc. cit.). The most pronounced difference between CCSSM
and the NCTM reform lies in the specificity of the standards in CCSSM: they are
much more explicit in specifying the progression of mathematical topics through the
grades and, even more importantly, in steering the curriculum away (most of the
time) from the defective practices abounding in TSM. Because of the latter, most
of the standards in CCSSM look different from the traditional standards (includ-
ing those briefly sketched out in the NCTM documents [NCTM1989] and [PSSM]).
This is especially true for the standards on fractions, finite decimals, rational num-
bers (along the lines of Chapters 1 and 2 in [Wu2016a], similar to Chapters 1 and
2 in this volume), part of beginning algebra (along the lines of Chapters 1 and 4
in [Wu2016b], similar to Chapter 6 of this volume), and middle and high school
geometry (along the lines of Chapters 4 and 5 of [Wu2016a], similar to Chapters 4
and 5 in this volume).

11 The absence of this commitment was no accident. To carry out this kind of professional
development on a large scale, the need for something like these six volumes ([Wu2011a], . . . ,
[Wu2020c]) to serve as a guide would be absolute.
However, in the apparent absence of a detailed account of what a CCSSM cur-
riculum would look like,12 the specificity of the curricular deviations in CCSSM
turns out to be more of a political liability than an asset. Many people immedi-
ately put CCSSM and the NCTM reform on the same footing. Their perception was
that these two movements represented what happens when a bunch of wannabes
pontificate about school mathematics education without knowing what they are
talking about. A conspicuous example they cited is CCSSM’s approach to the ge-
ometry curriculum in middle school and high school using reflections, rotations,
and translations as the basic building blocks. Such a change is necessitated by the
inherently flawed TSM geometry curriculum based on an uninformed interpreta-
tion of the work of Euclid some twenty-three centuries ago (see pp. 157–164 below
for a more detailed explanation, and see Chapter 8 of [Wu2020b] for a compre-
hensive one). Instead, CCSSM calls for a nuanced two-step process to introduce
reflections, rotations, and translations as the foundational building blocks of the
school geometry curriculum. Standards 8.G on page 55 of [CCSSM] describe how,
in grade 8, these transformations can be used informally (but correctly) in heuris-
tic arguments to develop students’ intuition about transformations (as detailed in
Chapters 4 and 5 of [Wu2016a]). Then in high school, these transformations are
precisely defined to be used for formal proofs (as in Chapters 4 and 5 of this vol-
ume and Chapters 6 and 7 of [Wu2020b]). But not having such details available
back in 2010, many critics, educators, and teachers immediately predicted the im-
pending doom of this effort by CCSSM by citing the failures of putative similar
experiments in other nations. Moreover, they also predicted (not entirely incor-
rectly) the almost certain confusion among teachers who would try to implement
this new curriculum, and they regarded as inevitable the disappearance of proofs
from CCSSM high school geometry. In the absence of a detailed exposition that
shows how to navigate and implement the Common Core geometry standards with
mathematical integrity,13 such misunderstanding led inevitably to harsh criticism
(see, e.g., [Milgram-Wurman, pp. 4–5] and [Phelps-Milgram, page 10 and footnote
15 on page 41]). So CCSSM ends up facing the same wide credibility gap that
plagued the NCTM reform twenty years ago.
It did not help that the CCSSM agenda also left out the critical component of
professional development for teachers, thereby creating the same sense of bewilder-
ment in classrooms across the land (see [Education Week], [Loewus1], [Loewus2],
and [Sawchuk]). It would seem that CCSSM is repeating the same mistake as
the NCTM reform by not taking seriously the need to offer sustained, large-scale
professional development for teachers to help with its implementation. With the
publication of these three volumes, at least one complete exposition of school
mathematics with mathematical integrity—an exposition that is also consistent in the
main with CCSSM—will be available to provide the needed guidance for this kind
of professional development, but will these volumes be too little, too late? Only
time will tell.

12 The situation surrounding the release of [CCSSM] is complicated. A detailed exposition of
the part of the CCSSM curriculum mentioned above—fractions, finite decimals, rational numbers,
parts of algebra, and middle school geometry—in the form of [Wu2010a] and [Wu2010b] in fact
predated CCSSM (they were drafts for [Wu2016a] and [Wu2016b], respectively). However, the
existence of these documents was not made widely known.
13 Note, however, that many curricular details on geometry were soon provided in [Wu2012],
[Wu2013], [Wu2016a], and [Wu2016b], but they were not made widely known. A detailed
CCSSM-aligned high school geometry curriculum, in existence since 2007, will appear in the
second volume of this three-volume set, [Wu2020b].

The recent drive towards mathematical integrity

It should be abundantly clear from the foregoing discussion that any real im-
provement in school mathematics education requires us to rethink the mathematical
education of teachers and educators. In particular, the destructive presence of TSM
can no longer be ignored. These six volumes have been written with the express
intent of encouraging and supporting such rethinking.
In the last few years, several books have made a concerted effort to promote
the introduction of mathematical integrity into school mathematics, e.g., [MET2],
[NCTM2009], [MUST], and the sixteen volumes in the NCTM series Developing
Essential Understanding (e.g., [Ellis-Bieda-Knuth]). In a book entitled We Reason
& We Prove for All Mathematics, Arbaugh et al. respond directly to Schoenfeld’s
belief in the possibility of imbedding proofs in all levels of K–12 (quoted on page
xv) by flatly stating that, in their volume, they "will provide guidance about how
to make reasoning-and-proving a reality in your classroom" ([Arbaugh et al., page
x]). These developments are welcome because their willingness to directly address
the content of K–12 mathematics represents a giant step forward in school mathe-
matics education at a time when many are still clinging to the idea that integrating
fun, engaging activities into the classroom—while leaving TSM intact—is the way
to improve school mathematics education. Nevertheless, we must also add a word
of caution at this juncture of school mathematics education concerning the effec-
tiveness of "providing guidance" in small doses, quite apart from the quality of the
guidance itself.
As of 2020, we have to face the unpleasant truth that, because of the longstand-
ing malfeasance of the education establishment, most people in school mathematics
education have been immersed in TSM, and only TSM, for their entire lives. Conse-
quently, most end up being deficient in a detailed knowledge of the inner workings
of mathematics on the one hand and in a coherent view of mathematics as a whole
on the other. An example of the former is the chronic failure to recognize that,
without precise definitions, correct reasoning (= proof) is unattainable. Another
example of the same is the fact that a proof must not be confused with a heuristic
argument, no matter how attractive that heuristic argument may be.
Examples of the lack of a global, coherent view of mathematics abound in
TSM, but we will limit our discussion to only three of them. The first is the lack
of awareness of the overall hierarchical structure of mathematics; e.g., at each step
of a mathematical development, one may only use results already proved earlier.
There is no better illustration of this lack than the "proof" in TSM of equivalent
fractions using fraction multiplication14—one that is universally taught in TSM.
Such a "proof" should be recognized for what it is: totally anti-mathematical. Here,
the details are, step by step, impeccable, but the flagrant mathematical error lies
in using a fact—about fraction multiplication that can only be proved later in the
development of fractions—to justify a foundational result about fractions that is
needed almost as soon as a fraction is defined. (See pp. 270–271 in [Wu2011a] for
further discussion of this error.)

14 This is the reasoning that $\frac{2}{3} = \frac{2\times 4}{3\times 4}$ because
$\frac{2}{3} = \frac{2}{3}\times 1 = \frac{2}{3}\times\frac{4}{4} = \frac{2\times 4}{3\times 4}$.
A second example is the role of congruence in school mathematics. In TSM,
the concept of the congruence of two arbitrary figures is not well-defined, and only
the congruence of triangles is used for proofs in high school geometry (ASA, SAS,
SSS, etc.). Moreover, congruence seems to have little relevance to daily life. TSM
does not mention that, without the fundamental assumption that lengths, areas,
and volumes remain the same for congruent geometric figures, it is impossible to
derive any area or volume formulas (in particular, not even the area formulas for
parallelograms and triangles). This realization makes it imperative that, in teaching
the area formula for a triangle in grade 6 or 7 (for example), teachers make an
effort to bring out the important role that the concept of congruence plays in
geometric measurements (see Section 5.3 in [Wu2016b]). The same realization also
impacts the geometry curriculum in high school: the TSM treatment of congruence
as the "congruence of triangles" à la Euclid will have to be upgraded so that it can
make sense of the "congruence of any two geometric figures" (see the Overview of
Chapters 4 and 5 on pp. 157ff.). Such an upgrade is needed, for example, for the
study of quadratic functions (see Section 2.1 of [Wu2020b]). This glaring defect in
the TSM treatment of a foundational concept like congruence is in fact one of the
main reasons necessitating the overhaul of the TSM geometry curriculum in middle
and high school (see Chapters 4 and 5 of [Wu2016a], Chapters 4 and 5 in this
volume, and Chapters 6 and 7 of [Wu2020b]). This kind of longitudinal coherence
of the school mathematics curriculum, so vital for students’ mathematics learning
and on such a detailed level, is unlikely to be brought up in the context of providing
general guidance piecemeal.
A final example of the lack of a global, coherent view of mathematics in TSM
is the transition in the middle school curriculum from rational numbers to real
numbers due to the emergence of numbers such as $\sqrt{2}$, $\pi$, etc. TSM pretends
that the introduction of irrational numbers and their arithmetic operations into the
school curriculum can be done surreptitiously and informally, without any explicit
mathematical discussion. The resulting mathematical errors and their consequences
for teachers and educators are profound. See the example on page xxxi below,
among many such examples. Also see the discussion of the incorrectness of the
equality $\sqrt{2} + \sqrt{3} = \sqrt{5}$ on pp. 207ff. of [MUST], which makes no
mention of the fact that the arithmetic operations on irrational numbers are never
given a serious and explicit discussion with ninth graders. What is needed for the
purpose of helping students make this transition is something like the FASM
(Fundamental Assumption of School Mathematics) stated on page 133, but unhappily,
nothing like FASM has ever appeared in TSM.
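One way to see why the equality fails is to square both sides. The computation below is a short sketch that assumes, as something like FASM is meant to allow, that the distributive law and the rule $\sqrt{a}\,\sqrt{b} = \sqrt{ab}$ extend to these irrational numbers; these are precisely the kind of arithmetic steps on irrational numbers that, as noted above, TSM never discusses explicitly.

$$
(\sqrt{2}+\sqrt{3})^2 = 2 + 2\sqrt{2}\,\sqrt{3} + 3 = 5 + 2\sqrt{6} \neq 5 = (\sqrt{5})^2
$$

Since $\sqrt{2}+\sqrt{3}$ and $\sqrt{5}$ are both positive and their squares differ, the two numbers cannot be equal.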
To address such deficiencies at both ends of the school mathematics spectrum,
it would be reasonable to argue that a systematic exposure of teachers and math-
ematics educators to a complete exposition of the mathematics—one that honors
mathematical integrity—over several grades at the very least15 will be the only
effective cure (see [NMAP1, Recommendation 19 on page xxi]).

15 Teachers need to know where their students come from and where they are headed,
curriculumwise.

We have just given a partial explanation of why these six volumes (this vol-
ume, together with [Wu2011a], [Wu2016a], [Wu2016b], [Wu2020b], and [Wu2020c])
require 2,500 pages of detailed mathematical discussions to confirm the fact that
school mathematics can be made to respect mathematical integrity. Because of the
corrosive effects of TSM that have pervaded and degraded school mathematics for
so long, we are obliged to rebuild school mathematics from the ground up. In these
six volumes, we take nothing for granted. For example, we pay special attention
to the need for correct definitions as the basis for reasoning and proofs; we want
to drive home the point that once a definition of a concept (such as a fraction) is
given, then every subsequent assertion about this concept has to be based on the
definition, and on the definition alone. Every statement in these volumes, from
whole numbers to calculus, is carefully proved.16 The intended goal of this effort
is to clarify, cumulatively, the mathematical meaning of the declarative statement,
"A implies B", as a purely deductive process that begins with the hypothesis A and
arrives at the conclusion B. This is in contrast with the common practice in TSM
of "explaining" something by telling a story, by drawing an analogy, or by offering
an attractive pattern or heuristic argument. These six volumes take an entirely
different tack: they show, consistently, how to verify "A implies B" in mathematics
by moving from A to B on the basis of definitions, explicit assumptions, or theo-
rems with the help of logic. These volumes do so—we emphasize—from the first
page to the last because we believe that the way to teach is not to pontificate but
to lead by example. This process of acculturating teachers (and ultimately their
students) to reasoning and proof does not have to be rigid or formal, especially in
the early grades (see, e.g., Section 4.2 or Section 6.2 of [Wu2011a]), but the essential
elements of logical deduction must be put in place and maintained ab initio to
preserve the integrity of mathematics. We also go into extensive detail about such
seemingly pedestrian topics as the proper use of symbols (Sections 6.1 and 6.2 on
pp. 298ff.), the meaning of an equation, and what it means to solve an equation
(see pp. 322–324), with the hope that the long years of obfuscation in TSM with
such jargon as "variables" and "symbolic manipulations for solving an equation"
will be brought to a merciful end.
We hope that the foregoing discussion has made the case for the critical need
for a thoroughgoing exposition of school mathematics with mathematical integrity.
Incidentally, the only reason we have made repeated references in this whole discus-
sion to the same six volumes by the present author is that there is no comparable
exposition at the moment. It is in fact our hope that the publication of these six
volumes will encourage others to come up with their own ways of replacing TSM
across K–12 with a development of school mathematics that respects mathematical
integrity.

How much mathematics teachers need to know

Knowing what school mathematics with mathematical integrity looks like en-
ables us to face up to the second problem in mathematics education that was
mentioned on page xii: how much mathematics a mathematics teacher or a mathe-
matics educator needs to know. For teachers, this problem has a long history; see,
e.g., [Ball], [Ball-McDiarmid], [Begle1972], [Goldhaber-Brewer], and [Monk]. We
can speculate that, because the school curriculum has been dominated by TSM
for so long and the flaws in TSM are so pronounced and extensive,17 mathematics
educators were reluctant to prescribe the content knowledge teachers need in
terms of TSM on the one hand, and they were uncertain about what to prescribe
on the other. After all, there was simply no available exposition of school
mathematics with mathematical integrity. Now that these six volumes are available,
it is possible to make a first attempt at describing the minimum knowledge that
teachers and educators in elementary, middle, and high school, respectively, need
to be effective in their work (again, see Recommendation 19 on p. xxi of [NMAP1]).

16 With the usual disclaimer that there are a very few theorems that we must
intentionally assume without proof.

Those in elementary school mathematics education18 should know
the equivalent of [Wu2011a] minus Chapters 23, 31, 37, 41, and
42; they should also have some acquaintance with the equivalent of
Chapters 4 and 5 of [Wu2016a] and Chapters 1 and 2 of [Wu2016b].

Those in middle school mathematics education should know the
equivalent of [Wu2016a] and [Wu2016b] and have some acquain-
tance with the equivalent of Part 1 of [Wu2011a] and Chapters 4
and 5 of this volume.

Those in high school mathematics education should know the
equivalent of this volume, [Wu2020b], and [Wu2020c]. In addition,
because pre-service teachers and educators interested in high school
mathematics are typically math majors in college, they are ex-
pected to know something about linear algebra, i.e., vector spaces
and matrices. Those who intend to teach calculus or do research on
the teaching of calculus should also know something about Taylor's
theorem and the Taylor series expansions of standard elementary
functions such as sine, cosine, the exponential function, and the
logarithm; they should also know some multi-variable calculus.

Now consider the teaching and learning of fractions and (finite) decimals. While
education researchers of the past five decades were no doubt aware of the simple
treatment of fractions in abstract algebra, their uncritical acceptance of TSM misled
them into believing that, for elementary school students, one can do no better than
teaching fractions as pieces of pizzas or some variation thereof. Consequently, their
research on the teaching and learning of fractions focussed, for the most part, on
tweaking the TSM model of fractions-as-pizzas—with no thought given to helping
students learn about fractions as numbers or learn to reason their way through
the arithmetic of fractions.19 As a result, education research on fractions has
focussed on increasing children's experiential and informal familiarity with fractions
based on the pizza model rather than on increasing children's mathematical
knowledge of fractions based on a correct definition of a fraction. If it had tried to
do the latter, it would have rejected the absurd pizza model from the outset (see,
e.g., pp. 33–35 of [Wu2008] for a brief discussion of the relevant literature). The
same body of education research has also tried to make sense—unsuccessfully of
course—of other anti-mathematical practices, such as treating decimals as a
different kind of number, adding and subtracting fractions using the least common
denominator, or teaching the multiplication and division of fractions without
precise definitions. Only recently have researchers become aware of a more
reasonable foundation for fractions (initiated in [Wu1998] and expanded in
[Wu2011a]; abbreviated versions are given in Chapter 1 of [Wu2016a] and Chapter 1
of this volume)20 that puts the study of fractions on the number line, emphasizes
the concept of a fraction as a number for arithmetic computations, and makes sense
of (finite) decimals as a special collection of fractions. There is still some distance
to go in this direction, such as honoring the definition of a fraction by using it in
every situation, e.g., for multiplication, for division, for understanding ratios, etc.
We eagerly look forward to a change along these lines in the education research on
fractions and decimals in the years to come (cf. [Siegler et al.]).

17 Regardless of the fact that the term TSM was coined only in 2011.
18 We strongly believe that the mathematics of elementary school should be taught
by mathematics teachers. See [Wu2009].
19 Unhappily, TSM also claims some professional mathematicians among its victims:
these mathematicians have come to believe that teaching fractions in schools can lead
to nothing more than "confusion and memorization". See, for example, [DeTurck].

School textbooks

Better school mathematics education requires not only more effective teach-
ers but also textbooks that contain only school mathematics with mathematical
integrity. Our discussion thus far has been all about getting more effective teachers
but nothing about getting better textbooks. This is not because we believe
textbooks are less important; rather, since most school textbooks are published by
the major publishers, there is little that people in academia can do to convince
publishers to abandon their bottom-line mentality and write better textbooks (cf.
[Keeghan]). However, there are now several online curricula written more or less in
accordance with CCSSM and, according to some reports, a few seem to be showing
promise.21
As we said at the beginning, we hope these six volumes under discussion can
serve as a blueprint for better school textbooks. But let us add a few caveats in
this regard. First of all, these six volumes are certainly not student textbooks: they
are written specifically for adults (teachers and educators, maybe some curious par-
ents). Nevertheless, their mathematical content has been carefully customized (i.e.,
engineered) for use in the appropriate grades, at least as far as the mathematical
level of sophistication is concerned, so that after some straightforward pedagogical
modifications and embellishments, they can be expanded into student textbooks.
An example of how such an expansion may be realized will appear before long,
we hope, in the form of a student textbook for grade 8 that will be posted on the
author's homepage, https://math.berkeley.edu/~wu/. At the very least, we believe
these six volumes taken together can serve as a detailed guide for textbook
publishers on how to write school mathematics textbooks across K–12 that respect
both the standard curricular sequence and mathematical integrity.

20 This approach to fractions and decimals—as presented in [Wu2016a]—served as a
blueprint for the fractions and decimals standards of [CCSSM]. Because this volume
is for consumption by high school teachers and mathematics educators, what is in
Chapter 1 is briefer and slightly more sophisticated than its counterparts in
[Wu2011a] and [Wu2016a].
21 It is uncertain whether any of the textbook evaluation agencies is aware of the
importance of having mathematical integrity in mathematics textbooks.

For this purpose, textbook writers should take note that there are several major
departures from the standard school curriculum in this volume, [Wu2020b], and
[Wu2020c]. Briefly, they are the following:

(1) The conversion of fractions to infinite decimals and geometric
measurements (length, area, and volume) are two topics typically
taught in middle school, but in these volumes they appear in
the third volume, [Wu2020c], after the introduction of limits
(see Chapters 3–5 of [Wu2020c]). Fortunately, the procedural
aspect of the conversion of fractions to decimals is addressed
(and partially explained) in Section 1.5 (pp. 54ff.) of this volume,
and there is an intuitive discussion of geometric measurements
in Chapter 5 of [Wu2016a] which is actually adequate for (a
somewhat superficial) use in a high school classroom.
The main reason for these two departures is that it is impos-
sible to make sense of infinite decimals and geometric measure-
ments without the use of limits. Our teachers’ and educators’
critical need for some real understanding of the subtleties of both
topics accounts for this departure from the norm. In any case,
any adaptation of Chapters 3–5 of [Wu2020c] for student textbooks
will require selective omissions.

(2) The presentation of high school geometry in these volumes
deviates from the traditional one. The concept of congruence
is defined in terms of the tangible, accessible concepts of reflec-
tions, rotations, and translations in the plane, and similarity is
defined in terms of congruence and the equally tangible and ac-
cessible concept of dilation. A detailed explanation is given in
the Overview of Chapters 4 and 5 on pp. 157ff. as well as in
Section 4.7 on pp. 252ff. and Chapter 8 of the second volume,
[Wu2020b]. Because CCSSM has since adopted this approach to
middle and high school geometry, no defense of this deviation
will be necessary here.

(3) These three volumes propose a different progression of geometry
from middle school to high school, as follows. In grade 8,
teach enough informal geometry to get to the concept of similar
triangles, the angle-angle similarity criterion, and the proof of
the Pythagorean theorem before embarking on introductory al-
gebra in high school. Then in the high school geometry course,
revisit the topic of similar triangles, but this time from a more
formal standpoint. Again, see the Overview of Chapters 4 and
5 on pp. 157ff. for an explanation. (This departure from the
standard sequencing has also been adopted by CCSSM.)

The presentation of the curricular shift described in (3) will be given in Chap-
ters 4 and 5 of this volume, but with a mild twist. Because the informal geometry
(proposed for grade 8) has already been treated in detail in [Wu2016a], the ge-
ometry in Chapters 4 and 5 of the present volume will be the formal high school
counterpart of the informal geometry in [Wu2016a]. The exposition of the main
body of plane geometry (geometry of the triangle and the circle along with con-
structions with ruler and compass) then resumes in Chapters 6 and 7 of the second
volume, [Wu2020b], after we have finished discussing the standard topics of second-
year algebra.

Final thoughts

We call special attention to the fact that the third and last of these three
volumes, [Wu2020c], is essentially an introduction to mathematical analysis, cus-
tomized specifically for consumption by prospective mathematics teachers and ed-
ucators.22 It is likely that this material will also benefit beginning math majors in
college.
We should also address an obvious question that probably has been on readers’
minds all along; namely, why does this volume on high school mathematics begin
with the middle school topics of fractions and rational numbers? Nothing need be
said about the obvious relevance of these topics to mathematics educators, but we
owe high school teachers an explanation of why we consider these topics to be an
integral part of their content knowledge. It is a fact—though hidden in TSM—that
rational numbers, rather than real numbers, are the backbone of the mathematics
in grades 5–12. Unfortunately, because of TSM, students in all grades seem to
have trouble with fractions and, consequently, with rational numbers. Given the
hierarchical structure of mathematics, it is not surprising that students’ inability
to learn algebra can often be traced back to their weakness in the foundational
subjects of fractions and rational numbers. This was pointed out in the National
Mathematics Advisory Panel Report (see page 18 of [NMAP1]). Indeed, the story
has been told many times that even students in honors sections of Algebra 2 plead
with their teachers to give them instruction on fractions. So, to be effective in
teaching the standard topics of high school mathematics, high school teachers must
have a TSM-free working knowledge of fractions and rational numbers as well.
A final reflection: Earlier, we quoted Lee Shulman’s lament about "the absence
of focus on subject matter among the various research paradigms for the study of
teaching" (see page xiv). These six volumes have now redefined the meaning of
this subject matter for school mathematics. We hope mathematics educators will
discover through these volumes that the mathematics underlying school mathemat-
ics, when presented correctly, is no longer meaningless like TSM and is worthy of
their best efforts to learn it. Moreover, the subject matter, thus redefined, will have
repercussions on "the study of teaching". As school mathematics becomes more
learnable by all students, and therefore more teachable by all teachers, pedagogy
will have to focus—not on how to render the incomprehensible23 palatable—but on
how to facilitate the normal process of learning so that all students can learn how
to reason critically and correctly.

22 Here as well as elsewhere in these three volumes, we are engaging in serious
mathematical engineering in the sense of [Wu2006].
23 That is, TSM.

But for all that, it will be necessary to first make school mathematics that re-
spects mathematical integrity an integral part of mathematics education research.
This then harks back to Lee Shulman’s lament. It is our belief and our hope that
school mathematics education will improve when mathematics education research
begins to address, not TSM, but school mathematics with mathematical integrity.

Acknowledgements

The drafts of this volume and its companion volumes, [Wu2020b] and
[Wu2020c], have been used since 2006 in the mathematics department at the Univer-
sity of California at Berkeley as textbooks for a three-semester sequence of courses,
Math 151–153, that was created for pre-service high school teachers.24 The two
people most responsible for making these courses a reality were the two chairs of
the mathematics department in those early years: Calvin Moore and Ted Slaman.
I am immensely indebted to them for their support. I should not fail to mention
that, at one point, Ted volunteered to teach an extra course for me in order to
free me up for the writing of an early draft of these volumes. Would that all of
us had chairs like him! Mark Richards, then Dean of Physical Sciences, was also
behind these courses from the beginning. His support not only meant a lot to me
personally, but I suspect that it also had something to do with the survival of these
courses in a research-oriented department.
It is manifestly impossible to write three volumes of teaching materials without
generous help from students and friends in the form of corrections and suggestions
for improvement. I have been fortunate in this regard, and I want to thank them
all for their critical contributions: Richard Askey,25 David Ebin, Emiliano Gómez,
Larry Francis, Ole Hald, Sunil Koswatta, Bob LeBoeuf, Gowri Meda, Clinton Rem-
pel, Ken Ribet, Shari Lind Scott, Angelo Segalla, and Kelli Talaska. Dick Askey’s
name will be mentioned in several places in these volumes, but I have benefitted
from his judgment much more than what those explicit citations would seem to
indicate. I especially appreciate the fact that he shared my belief early on in the
corrosive effect of TSM on school mathematics education. David Ebin and Angelo
Segalla taught from these volumes at SUNY Stony Brook and CSU Long Beach,
respectively, and I am grateful to them for their invaluable input from the trenches.
I must also thank Emiliano Gómez, who has taught these courses more times than
anybody else with the exception of Ole Hald. Some of his deceptively simple com-
ments have led to much soul-searching and extensive corrections. Bob LeBoeuf put
up with my last-minute requests for help, and he showed what real dedication to a
cause is all about.
Section 1.9 of the third volume ([Wu2020c]) on the importance of sine and
cosine could not have been written without special help from Professors Thomas
Kailath and Julius O. Smith III of Stanford University, as well as from my longtime
collaborator Robert Greene of UCLA. I am grateful to them for their uncommon
courtesy.

24 Since the fall of 2018, this three-semester sequence has been pared down to a
two-semester sequence. A partial study of the effects of these courses on pre-service
teachers can be found in [Newton-Poon1].
25 Sadly, Dick passed away on October 9, 2019.

Last but not least, I have to single out two names for my special expression
of gratitude. Larry Francis has been my editor for many years, and he has pored
over every single draft of these manuscripts with the same meticulous care from
the first word to the last. I want to take this opportunity to thank him for the
invaluable help he has consistently provided me. Ole Hald took it upon himself to
teach the whole Math 151–153 sequence—without a break—several times to help
me improve these volumes. That he did, in more ways than I can count. His
numerous corrections and suggestions, big and small, all throughout the last nine
years have led to many dramatic improvements. My indebtedness to him is too
great to be expressed in words.
Hung-Hsi Wu
Berkeley, California
February 2, 2020
To the Instructor

These three volumes (the other two being [Wu2020b] and [Wu2020c]) have
been written expressly for high school mathematics teachers and mathematics ed-
ucators.1 Their goal is to revisit the high school mathematics curriculum, together
with relevant topics from middle school, to help teachers better understand the
mathematics they are or will be teaching and to help educators establish a sound
mathematical platform on which to base their research. In terms of mathematical
sophistication, these three volumes are designed for use in upper division courses
for math majors in college. Since their content consists of topics in the upper
end of school mathematics (including one-variable calculus), these volumes are in
the unenviable position of straddling two disciplines: mathematics and education.
Such being the case, these volumes will inevitably inspire misconceptions on both
sides. We must therefore address their possible misuse in the hands of both math-
ematicians and educators. To this end, let us briefly review the state of school
mathematics education as of 2020.

The phenomenon of TSM

For roughly the last five decades, the nation has had a de facto national school
mathematics curriculum, one that has been defined by the standard school math-
ematics textbooks. The mathematics encoded in these textbooks is extremely
flawed.2 We call the body of knowledge encoded in these textbooks TSM
(Textbook School Mathematics; see page xiv). We will presently give a su-
perficial survey of some of these flaws,3 but what matters to us here is the fact that
institutions of higher learning appear to be oblivious to the rampant mathematical
mis-education of students in K–12 and have done very little to address the insid-
ious presence of TSM in the mathematics taught to K–12 students over the last
50 years. As a result, mathematics teachers are forced to carry out their teaching
duties with all the misconceptions they acquired from TSM intact, and educators
likewise continue to base their research on what they learned from TSM. So TSM
lives on unchallenged.
These three volumes are the conclusion of a six-volume series4 whose goal is
to correct the universities’ curricular oversight in the mathematical education of
teachers and educators by providing the needed mathematical knowledge to break
the vicious cycle of TSM. For this reason, these volumes pay special attention to
mathematical integrity (as defined on page xiii) and transparency, so that every
concept is precisely defined and every assertion is completely explained5 and so
that the exposition here is as close as possible to what is taught in a high school
classroom.

1 We use the term "mathematics educators" to refer to university faculty in schools
of education.
2 These statements about curriculum and textbooks do not take into account how
much the quality of school textbooks and teachers' content knowledge may have
evolved recently with the advent of CCSSM (Common Core State Standards for
Mathematics) ([CCSSM]) in 2010.
3 Detailed criticisms and explicit corrections of these flaws are scattered throughout
these volumes.
4 The earlier volumes in the series are [Wu2011a], [Wu2016a], and [Wu2016b].

TSM has appeared in different guises; after all, the NCTM reform (see page
xv ff.) was largely ushered in around 1989. But beneath the surface its essential
substance has stayed remarkably constant (compare [Wu2014]). TSM is charac-
terized by a lack of clear definitions, faulty or nonexistent reasoning, pervasive
imprecision, general incoherence, and a consistent failure to make the case for
why each standard topic in the school curriculum is worthy of study. Let us go
through each of these issues in some detail.
(1) Definitions. In TSM, correct definitions of even the most basic concepts
are usually not available. Here is a partial list:
fraction, multiplication of fractions, division of fractions, one
fraction being bigger or smaller than another, finite decimal,
infinite decimal, mixed number, ratio, percent, rate, constant
rate, negative number, the four arithmetic operations on rational
numbers, congruence, similarity, length of a curve, area of a
planar region, volume of a solid, expression, equation, graph of
a function, graph of an inequality, half-plane, polygon, interior
angle of a polygon, regular polygon, slope of a line, parabola,
inverse function, etc.
Consequently, students are forced to work with concepts whose mathematical mean-
ing is at best only partially revealed to them. Consider, for example, the concept of
division. TSM offers no precise definition of division for whole numbers, fractions,
rational numbers, real numbers, or complex numbers. If it did, the division concept
would become much more learnable because it is in fact the same for all these num-
ber systems (thus we also witness the incoherence of TSM). The lack of a definition
for division leads inevitably to the impossibility of reasoning about the division of
fractions, which then leads to "ours is not to reason why, just invert-and-multiply".
We have here a prime example of the convergence of the lack of definitions, the lack
of reasoning, and the lack of coherence.
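The claim that division is one and the same concept across number systems, and that invert-and-multiply should follow from a definition rather than be memorized, can be made concrete with a small sketch. It rests on an assumption of ours: that the definition is "for b ≠ 0, a ÷ b is the number c satisfying c × b = a", a standard precise definition stated here for illustration rather than quoted from these volumes. The function names are hypothetical; only Python's standard `fractions.Fraction` type is used.

```python
from fractions import Fraction


def is_quotient(c, a, b):
    """Assumed definition of division: c = a / b means c * b == a, with b != 0."""
    return b != 0 and c * b == a


def divide_by_definition(a: Fraction, b: Fraction) -> Fraction:
    """Return a / b for fractions, checked against the assumed definition.

    The candidate comes from invert-and-multiply, (m/n) / (k/l) = (m*l)/(n*k).
    The assertion confirms that this candidate satisfies c * b == a; since such
    a c is unique when b != 0, it must be the quotient.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero is not defined")
    c = Fraction(a.numerator * b.denominator, a.denominator * b.numerator)
    assert is_quotient(c, a, b)
    return c


# The same definition covers whole numbers: 12 / 4 = 3 because 3 * 4 == 12.
print(is_quotient(3, 12, 4))                                 # True
print(divide_by_definition(Fraction(3, 4), Fraction(2, 5)))  # 15/8
```

Here invert-and-multiply plays the role of a conjectured answer that the definition then certifies, which is the reverse of the "ours is not to reason why" approach described above.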
The reason we need precise definitions is that they create a level playing field for
all learners, in the sense that each person—including the teacher—has all the needed
information about a given concept from the very beginning and this information is
the same for everyone. This eliminates any need to spend time looking for "tricks",
"insider knowledge", or hidden agendas. The level playing field makes every concept
accessible to all learners, and this fact is what the discussion of equity in school
mathematics education seems to have overlooked thus far. To put this statement in
context, think of TSM’s definition of a fraction as a piece of pizza: even elementary
students can immediately see that there is more to a fraction than just being a piece
of pizza. For example, "$\frac{5}{8}$ miles of dirt road" has nothing to do with pieces of a
pizza. The credibility gap between what students are made to learn and what they
subconsciously recognize to be false disrupts the learning process, often fatally.
5 In other words, every theorem is completely proved. Of course there are a few
theorems that cannot be proved in context, such as the fundamental theorem of algebra.

In mathematics, there can be no valid reasoning without precise definitions.
Consider, for example, TSM’s proof of (−2)(−3) = 2 × 3. Such a proof requires
that we know what −2 is, what −3 is, what properties these negative integers are
assumed to possess, and what it means to multiply (−2) by (−3) so that we can
use them to justify this claim. Since TSM does not offer any information of this
kind, it argues instead as follows: 3 · (−3), being 3 copies of −3, is equal to −9, and
likewise, 2 · (−3) = −6, 1 · (−3) = −3, and of course 0 · (−3) = 0. Now look at the
pattern formed by these consecutive products:
3 · (−3) = −9, 2 · (−3) = −6, 1 · (−3) = −3, 0 · (−3) = 0.
Clearly when the first factor decreases by 1, the product increases by 3. Now, when
the 0 in the product 0 · (−3) decreases by 1 (so that 0 becomes −1), the product
(−1)(−3) ceases to make sense. Nevertheless, TSM urges students to believe that
the pattern must persist no matter what so that this product will once more increase
by 3 and therefore (−1)(−3) = 3. By the same token, when the −1 in (−1)(−3)
decreases by 1 again (so that −1 becomes −2), the product must again increase by
3 for the same reason and (−2)(−3) = 6 = 2 × 3, as desired. This is what TSM
considers to be "reasoning".
TSM goes further. Using an argument similar to the one for (−2)(−3) = 2 × 3, one can
show that (−a)(−b) = ab for all whole numbers a and b. Now, TSM asks students
to take another big leap of faith: if (−a)(−b) = ab is true for whole numbers a and
b, then it must also be true when a and b are arbitrary numbers. This is how TSM
"proves" that negative times negative is positive.
Slighting definitions in TSM can also take a different form: the graph of a linear
inequality ax + by ≤ c is claimed to be a half-plane of the line ax + by = c, and
the "proof" usually consists of checking a few examples. Thus the points (0, 0),
(−2, 0), and (1, −1) are found to lie below the line defined by x + 3y = 2 and, since
they all satisfy x + 3y ≤ 2, it is believable that the "lower half-plane" of the line
x + 3y = 2 is the graph of x + 3y ≤ 2. Further experimentation with other points
below the line defined by x + 3y = 2 adds to this conviction. Again, no reasoning is
involved and, more importantly, neither "graph of an inequality" nor "half-plane"
is defined in such a discussion because these terms sound so familiar that TSM
apparently believes no definition is necessary. At other times, reasoning is simply
suppressed, such as when the coordinates of the vertex of the graph of ax² + bx + c
are peremptorily declared to be
$$\left(\frac{-b}{2a},\ \frac{4ac - b^2}{4a}\right).$$
End of discussion.
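For comparison, the suppressed reasoning is short. The following is a standard completing-the-square computation, sketched here for the reader's convenience under the assumption a ≠ 0, from which the vertex can be read off:

```latex
\begin{aligned}
ax^2 + bx + c &= a\left(x^2 + \frac{b}{a}x\right) + c \\
&= a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a} + c \\
&= a\left(x + \frac{b}{2a}\right)^2 + \frac{4ac - b^2}{4a}.
\end{aligned}
```

Since the squared term is nonnegative and vanishes only at x = −b/(2a), the value (4ac − b²)/(4a) attained there is the minimum of the function when a > 0 and the maximum when a < 0, which is what singles out that point as the vertex.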
Our emphasis on the importance of definitions in school mathematics compels
us to address a misconception about the role of definitions in school mathematics
education. To many teachers and educators, the word "definition" connotes some-
thing tedious and nonessential that students must memorize for standardized tests.
It may also conjure an image of cut-and-dried, top-down instruction that begins
with a rigid and unmotivated definition and continues with the definition’s formal
and equally unmotivated appearance in a chain of logical arguments. Understand-
ably, most educators find this scenario unappetizing. Their response is that, at least
in school mathematics, the definition of a concept should emerge at the end—but
not at the beginning—of an extended intuitive discussion of the hows and whys of
the concept.6 In addition, the so-called conceptual understanding of the concept is
believed to lie in the intuitive discussion but not in the formal definition itself, the
latter being nothing more than an afterthought.
These two opposite conceptions of definition ignore the possibility of a middle
ground: one can state the precise definition of a concept at the beginning of a
lesson to set the tone of the subsequent mathematical discussion and exploration,
which is to show students that this is all they will ever need to know about the con-
cept as far as doing mathematics is concerned. Such transparency—demanded by
the mathematical culture of the past century (cf. [Quinn])—is what is most sorely
missing in TSM, which consistently leaves students in doubt about what a fraction
is or might be, what a negative number is, what congruence means, etc. In this
middle ground, a definition can be explored and explained in intuitive terms in the
ensuing discussion on the one hand and, on the other, put to use in proofs—in its
precise formulation—to show how and why the definition is absolutely indispensable
to any kind of reasoning concerning the concept. With the consistent use of precise
definitions, the line between what is correct and what is intuitive but possibly incorrect
(such as the TSM proof that negative times negative is positive) becomes clearly
drawn. It is the frequent blurring of this line in TSM that contributes massively to
the general misapprehension in mathematics education about what a proof is (part
of this misapprehension is described in, e.g., [NCTM2009], [Ellis-Bieda-Knuth], and
[Arbaugh et al.]).
These three volumes (this volume, [Wu2020b], and [Wu2020c]) will always take
a position in the aforementioned middle ground. Consider the definition of a frac-
tion, for example: it is one of a special collection of points on the number line
(page 10). This is the only meaning of a fraction that is needed to drive the fairly
intricate mathematical development of fractions, and, for this reason, the definition
of a fraction as a certain point on the number line is the one that will be unapolo-
getically used all through these three volumes. To help teachers and students feel
comfortable with this definition, we give an extensive intuitive discussion of why
such a definition for a fraction is necessary on pp. 4–10. This intuitive discussion,
naturally, opens the door to whatever pedagogical strategy a teacher wants to in-
vest in it. Unlike in TSM, however, this definition is not given to be forgotten.
On the contrary, all subsequent discussions about fractions will refer to this pre-
cise definition (but not to the intuitive discussion that preceded it) and, of course,
all the proofs about fractions will also depend on this formal definition because
mathematics demands no less. Students need to learn what a proof is and how it
works; the exposition here tries to meet this need by (gently) laying bare the fact
that reasoning in proofs requires precise definitions. As a second example, we give
the definition of the slope of a line only after an extensive intuitive discussion on
pp. 338–346 about what slope is supposed to measure and how we may hope to
measure it. Again, the emphasis is on the fact that this definition of slope is not
the conclusion, but the beginning of a long logical development that goes from page
346 to the end of Chapter 6 on page 383, and into trigonometry (relation with the
tangent function), calculus (definition of the derivative), and beyond.

6 Proponents of this approach to definitions often seem to forget that, after the emergence
of a precise definition, students are still owed a systematic exposition of mathematics using the
definition so that they can learn about how the definition fits into the overall logical structure of
mathematics.

(2) Reasoning. Reasoning is the lifeblood of mathematics, and the main rea-
son for learning mathematics is to learn how to reason. In the context of school
mathematics, reasoning is important to students because it is the tool that empow-
ers them to explore on their own and verify for themselves what is true and what
is false without having to take other people’s word on faith. Reasoning gives them
confidence and independence. But when students have to accustom themselves to
performing one unexplained rote skill after another, year after year, their ability
to reason will naturally atrophy. Many students find it more expedient to stop
asking why and simply take any order that comes their way sight unseen just to
get by.7 One can only speculate on the cumulative effect this kind of mathematics
"learning" has on those students who go on to become teachers and mathematics
educators.
(3) Precision. The purpose of precision is to eliminate errors and minimize
misconceptions, but in TSM students learn at every turn that they should not
believe exactly what they are told but must learn to be creative in interpreting it.
For example, TSM preaches the virtue of using the theorem on equivalent fractions
to simplify fractions and does not hesitate to simplify a rational expression in x as
follows:
$$\frac{(x-1)(x^2+3)}{x(x-1)} = \frac{x^2+3}{x}.$$
This looks familiar because "canceling the same number from top and bottom" is
exactly what the theorem on equivalent fractions is supposed to do. Unfortunately,
this theorem only guarantees
$$\frac{ca}{bc} = \frac{a}{b}$$
when a, b, and c are whole numbers (b and c understood to be nonzero). In the
previous rational expression, however, none of (x − 1), (x² + 3), and x is necessarily a
whole number because x could be, for example, √5. Therefore, according to TSM,
students in algebra should look back at equivalent fractions and realize that the
theorem on equivalent fractions—in spite of what it says—can actually be applied
to "fractions" whose "numerators" and "denominators" are not whole numbers.
Thus TSM encourages students to believe that "nothing needs to be taken precisely
and one must be flexible in interpreting what one learns". This extrapolation-happy
mindset is the opposite of what it takes to learn a precise subject like mathematics
or any of the exact sciences. For example, we cannot allow students to believe that
the domain of definition of log x is [0, ∞) since [0, ∞) is more or less the same as
(0, ∞). Indeed, the presence or absence of the single point "0" is the difference
between true and false.
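The same distinction is enforced mechanically in software. In the minimal Python sketch below (the helper name `log_is_defined_at` is hypothetical, introduced only for this illustration), the standard-library function `math.log` accepts every positive input, however small, but raises `ValueError` at 0, so under this convention [0, ∞) and (0, ∞) are plainly not interchangeable.

```python
import math


def log_is_defined_at(x: float) -> bool:
    """Return True if math.log accepts x, and False if it rejects x as out of domain."""
    try:
        math.log(x)
    except ValueError:
        # math.log signals a domain error for 0 and for negative inputs
        return False
    return True


for x in [1.0, 0.5, 1e-300, 0.0, -1.0]:
    print(x, log_is_defined_at(x))
```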
Another example of how a lack of precision leads to misconceptions is the
statement that "β⁰ = 1", where β is a nonzero number. Because TSM does not
use precise language, it does not—or cannot—draw a sharp distinction between a
heuristic argument, a definition, and a proof. Consequently, it has misled numerous
students and teachers into believing that the heuristic argument for defining β⁰ to
be 1 is in fact a "proof" that β⁰ = 1. The same misconception persists for negative
exponents (e.g., β⁻ⁿ = 1/βⁿ). The lack of precision is so pervasive in TSM that
there is no end to such examples.
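To make the distinction concrete, the heuristic usually runs roughly as follows. For whole numbers m > n one proves βᵐ/βⁿ = βᵐ⁻ⁿ; the steps marked "?" below are exactly where this law is assumed, without proof, to keep holding outside the range where it was established. The computation therefore motivates the definitions β⁰ = 1 and β⁻ⁿ = 1/βⁿ; it does not prove them.

```latex
1 = \frac{\beta^n}{\beta^n} \overset{?}{=} \beta^{n-n} = \beta^0,
\qquad
\frac{1}{\beta^n} = \frac{\beta^0}{\beta^n} \overset{?}{=} \beta^{0-n} = \beta^{-n}.
```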

7 There is consistent anecdotal evidence from teachers in the trenches that such is the case.

(4) Coherence. Another reason why TSM is less than learnable is its inco-
herence. Skills in TSM are framed as part of a long laundry list, and the lack of
definitions for concepts ensures that skills and their underlying concepts remain
forever disconnected. Mathematics, on the other hand, unfolds from a few cen-
tral ideas, and concepts and skills are developed along the way to meet the needs
that emerge in the process of unfolding. An acceptable exposition of mathematics
therefore tells a coherent story that makes mathematics memorable. For example,
consider the fact that TSM makes the four standard algorithms for whole numbers
four separate rote-learning skills. Thus TSM hides from students the overriding
theme that the Hindu-Arabic numeral system is universally adopted because it
makes possible a simple, algorithmic procedure for computations; namely, if we
can carry out an operation (+, −, ×, or ÷) for single-digit numbers, then we can
carry out this operation for all whole numbers no matter how many digits they
have (see Chapter 3 of [Wu2011a]). The standard algorithms are the vehicles that
bridge operations with single-digit numbers and operations on all whole numbers.
Moreover, the standard algorithms can be simply explained by a straightforward
application of the associative, commutative, and distributive laws. From this per-
spective, a teacher can explain to students, convincingly, why the multiplication
table is very much worth learning; this would ease one of the main pedagogical
bottlenecks in elementary school. In addition, a teacher can make the
associative, commutative, and distributive laws meaningful to elementary students and help
them see that these are vital tools for doing mathematics rather than dinosaurs in
an outdated school curriculum. If these facts had been widely known during the
1990s, the senseless debate on whether the standard algorithms should be taught
might not have arisen and the Math Wars might not have taken place at all.
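The reduction described above can be illustrated with a minimal Python sketch. It is an illustration only, not the exposition of [Wu2011a], and the names `SINGLE_DIGIT_TABLE` and `multiply` are hypothetical. Every product the procedure uses is looked up in the single-digit multiplication table; the distributive law places each digit product at place value 10^(i + j), and carrying is just regrouping by place value.

```python
# Single-digit multiplication table: the only products the procedure ever uses.
SINGLE_DIGIT_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}


def multiply(m: int, n: int) -> int:
    """Multiply whole numbers m and n, taking products only from the single-digit table.

    Writing m = sum(m_i * 10**i) and n = sum(n_j * 10**j), the distributive,
    associative, and commutative laws give m * n = sum(m_i * n_j * 10**(i + j)).
    """
    m_digits = [int(ch) for ch in reversed(str(m))]  # least significant digit first
    n_digits = [int(ch) for ch in reversed(str(n))]

    # columns[k] accumulates all digit products that belong to place value 10**k.
    columns = [0] * (len(m_digits) + len(n_digits))
    for i, a in enumerate(m_digits):
        for j, b in enumerate(n_digits):
            columns[i + j] += SINGLE_DIGIT_TABLE[(a, b)]

    # Carrying: rewrite each column total as one digit plus a carry into the next column.
    result_digits = []
    carry = 0
    for total in columns:
        carry, digit = divmod(total + carry, 10)
        result_digits.append(digit)

    # A p-digit number times a q-digit number has at most p + q digits, so no carry remains.
    return int("".join(str(d) for d in reversed(result_digits)))


print(multiply(23, 4), multiply(987, 654))
```

The call to `divmod` only splits each column total into a digit and a carry, i.e., it does the place-value bookkeeping; no multi-digit product is ever formed directly.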
TSM also treats whole numbers, fractions, (finite) decimals, and rational num-
bers as four different kinds of numbers. The reality is that, first of all, decimals are
a special class of fractions (see pp. 14ff.), whole numbers are part of fractions, and
fractions are part of rational numbers. Moreover, the four arithmetic operations
(+, −, ×, and ÷) in each of these number systems do not essentially change from
system to system. There is a smooth conceptual transition at each step of the
passage from whole numbers to fractions and from fractions to rational numbers;
see Parts 2 and 3 of [Wu2011a] or Sections 2.2, 2.4, and 2.5 in this volume. This
coherence facilitates learning: instead of having to learn about four different kinds
of numbers, students basically only need to learn about one number system (the
rational numbers). Yet another example is the conceptual unity between linear
functions and quadratic functions: in each case, the leading term—ax for linear
functions and ax² for quadratic functions—determines the shape of the graph of
the function completely, and the studies of the two kinds of functions become sim-
ilar as each revolves around the shape of the graph (see Section 2.1 of [Wu2020b]).
Mathematical coherence gives us many such storylines, and a few more will be
detailed below.
(5) Purposefulness. In addition to the preceding four shortcomings—a lack
of clear definitions, faulty or nonexistent reasoning, pervasive imprecision, and gen-
eral incoherence—TSM has a fifth fatal flaw: it lacks purposefulness. Purposefulness
is what gives mathematics its vitality and focus: the fact is that a mathematical
investigation, at any level, is always carried out with a specific goal in mind. When
a mathematics textbook reflects this goal-oriented character of mathematics, it
propels the mathematical narrative forward and facilitates its learning by making
students aware of where the discussion is headed, and why. Too often, TSM lurches
from topic to topic with no apparent purpose, leading students to wonder why they
should bother to tag along. One example is the introduction of the absolute value
of a number. Many teachers and students are mystified by being saddled with such
a "frivolous" skill: "just kill the negative sign", as one teacher put it. Yet TSM
never tries to demystify this concept. (For an explanation of the need to introduce
absolute value, see, e.g., the discussion on pp. 130ff.) Another is the seemingly
inexplicable replacement of the square root and cube root symbols of a positive
number b, i.e., √b and ∛b, by rational exponents, $b^{1/2}$ and $b^{1/3}$, respectively (see,
e.g., Section 4.2 of [Wu2020b]). Because TSM teaches the laws of exponents as
merely "number facts", it is inevitable that it would fail to point out the purpose of
this change of notation, which is to shift focus from the operation of taking roots to
the properties of the exponential function bˣ for a fixed positive b. A final example
is the way TSM teaches estimation completely by rote, without ever telling students
why and when estimation is important and therefore worth learning. Indeed, we
often have to make estimates, either because precision is unattainable or unneces-
sary, or because we purposely use estimation as a tool to help achieve precision (see
[Wu2011a, Section 10.3]).
To summarize, if we want students to be taught mathematics that is learn-
able, then we must discard TSM and replace it with the kind of mathematics that
possesses these five qualities:

Every concept has a clear definition.
Every statement is precise.
Every assertion is supported by reasoning.
Its development is coherent.
Its development is purposeful.

We have come across them before on page xiii: these are the Fundamental Principles
of Mathematics (also see Section 2.1 in [Wu2018a]).
TSM consistently violates all five fundamental principles. Because of the dom-
inance of TSM for at least the past half-century, most students come out of K–12
knowing only TSM but not mathematics that respects these fundamental principles.
To them, learning mathematics is not about learning how to reason or distinguish
true from false but about memorizing facts and tricks to get correct answers. Faced
with this crisis, what should be the responsibility of institutions of higher learn-
ing? Should it be to create courses for future teachers and educators to help them
systematically replace their knowledge of TSM with mathematics that is consistent
with the five fundamental principles? Or should it be, rather, to leave TSM alone
but make it more palatable by helping teachers infuse their classrooms with activ-
ities that suggest visions of reasoning, problem solving, and sense making? As of
this writing, an overwhelming majority of the institutions of higher learning are
choosing the latter alternative.
At this point, we return to the earlier question about some of the ways both
university mathematicians and educators might misunderstand and misuse these
three volumes.

Potential misuse by mathematicians

First, consider the case of mathematicians. They are likely to scoff at what
they perceive to be the triviality of the content in these volumes: no groups, no
homomorphisms, no compact sets, no holomorphic functions, and no Gaussian curvature.
They may therefore be tempted to elevate the level of the presentation, for
example, by introducing the concept of a field and showing that, when two fraction
symbols m/n and k/ℓ (with whole numbers m, n, k, ℓ, and n ≠ 0, ℓ ≠ 0) satisfying
mℓ = nk are identified, and when + and × are defined by the usual formulas, the
fraction symbols form a field. In this elegant manner, they can efficiently cover all
the standard facts in the arithmetic of fractions in the school curriculum.8 This
is certainly a better way than defining fractions as points on the number line to
teach teachers and educators about fractions, is it not? Likewise, mathematicians
may find finite geometry to be a more exciting introduction to axiomatic systems
than any proposed improvements on the high school geometry course in TSM. The
list goes on. Consequently, pre-service teachers and educators may end up learn-
ing from mathematicians some interesting mathematics, but not mathematics that
would help them overcome the handicap of knowing only TSM.
Mathematicians may also engage in another popular approach to the profes-
sional development of teachers and educators: teaching the solution of hard prob-
lems. Because mathematicians tend to take their own mastery of fundamental skills
and concepts for granted, many do not realize that it is nearly impossible for teach-
ers who have been immersed in thirteen years or more of TSM to acquire, on their
own, a mastery of a mathematically correct version of the basic skills and concepts.
Mathematicians are therefore likely to consider their major goal in the professional
development of teachers and educators to be teaching them how to solve hard prob-
lems. Surely, so the belief goes, if teachers can handle the "hard stuff", they will
be able to handle the "easy stuff" in K–12. Since this belief is entirely in line
with one of the current slogans in school mathematics education about the critical
importance of problem solving, many teachers may be all too eager to teach their
students the extracurricular skills of solving challenging problems in addition to
teaching them TSM day in and day out. In any case, the relatively unglamorous
content of these three volumes (this volume, [Wu2020b], and [Wu2020c])—designed
to replace TSM—will get shunted aside into supplementary reading assignments.
At the risk of belaboring the point, the focus of these three volumes is on
showing how to replace teachers’ and educators’ knowledge of TSM in grades 9–12
with mathematics that respects the fundamental principles of mathematics. There-
fore, reformulating the mathematics of grades 9–12 from an advanced mathemati-
cal standpoint to obtain a more elegant presentation is not the point. Introducing
novel elementary topics (such as Pick’s theorem or the 4-point affine plane) into
the mathematics education of teachers and educators is also not the point. Rather,
the point, in the year 2020, is to do the essential spadework of revisiting the standard
9–12 curriculum—topic by topic, along the lines laid out in these three volumes—
showing teachers and educators how the TSM in each case can be supplanted by
mathematics that makes sense to them and to their students. For example, since
most pre-service teachers and educators have not been exposed to the use of precise

8 This is my paraphrase of a mathematician’s account of his professional development institute
around year 2000.


definitions in mathematics, they are unlikely to know that definitions are supposed
to be used, exactly as written, no more and no less, in logical arguments. One of
the most formidable tasks confronting mathematicians is, in fact, how to change
educators’ and teachers’ perception of the role of definitions in reasoning.
As an illustration, consider how TSM handles slope. There are two ways, but we
will mention only one of them.9 TSM defines the slope of a line L as the difference
quotient with respect to two pre-chosen points P and Q on L,10 and then pretends that
this difference quotient is a property of the line itself (rather than
a property of the two points P and Q). In addition, TSM pretends that it can
use "reasoning" based on this defective definition to derive the equation of a line
when (for example) its slope and a given point on it are prescribed. Here is the
inherent danger of thirteen years of continuous exposure to this kind of pseudo-
reasoning: teachers cease to recognize that (a) such a definition of slope is defective
and (b) such a defective definition of slope cannot possibly support the purported
derivation (= proof) of the equation of a line. It therefore comes to pass that—
as a result of the flaws in our education system—many teachers and educators
end up being confused about even the meaning of the simplest kind of reasoning:
"A implies B". They need—and deserve—all the help we can give so that they
can finally experience genuine mathematics, i.e., mathematics that is based on the
fundamental principles of mathematics.
Of course, the ultimate goal is for teachers to use this new knowledge to teach
their own students so that those students can achieve a true understanding of what
"A implies B" means and what real reasoning is all about. With this in mind, we
introduce in Section 6.4 (pp. 337ff.) the concept of slope by discussing what slope is
supposed to measure (an example of purposefulness) and how to measure it, which
then leads to the formulation of a precise definition. With the availability of the
AA-criterion for triangle similarity (Theorem G22 on page 288), we then show how
this definition leads to the formula for the slope of a line as the difference quotient
of the coordinates of any two points on the line (the "rise-over-run"). Having
this critical flexibility to compute the slope—plus an earlier elucidation of what an
equation is (pp. 322–324)—we easily obtain the equation of a line passing through
a given point with a given slope, with correct reasoning this time around (see pp.
357ff.). Of course the same kind of reasoning can be applied to similar problems
when other reasonable geometric data are prescribed for the line.
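A computation cannot replace the similar-triangles argument, but it can show concretely what that argument establishes. The minimal Python sketch below assumes a hypothetical line consisting of the points (1 + 3t, 2 + 2t), i.e., the line through (1, 2) and (4, 4); using exact rational arithmetic, every pair of chosen points on it yields the same difference quotient, so the rise-over-run does not depend on which two points are used. The helper `difference_quotient` is named only for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def difference_quotient(p, q):
    """Rise over run, (y2 - y1) / (x2 - x1), for two points with different x-coordinates."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


# Hypothetical example: points (1 + 3t, 2 + 2t) on the line through (1, 2) and (4, 4).
parameters = [Fraction(-2), Fraction(-1, 2), Fraction(0), Fraction(1), Fraction(5, 3)]
points = [(1 + 3 * t, 2 + 2 * t) for t in parameters]

# Collect the difference quotient for every pair of distinct points.
quotients = {difference_quotient(p, q) for p, q in combinations(points, 2)}
print(quotients)
```

Exact `Fraction` arithmetic is used instead of floating point so that equality of the quotients is exact rather than approximate.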
By guiding teachers and educators systematically through the correction of
TSM errors on a case-by-case basis, we believe they will gain a new and deeper
understanding of school mathematics. Ultimately, we hope that if institutions of
higher learning and the education establishment can persevere in committing them-
selves to this painstaking work, the students of these teachers and educators will
be spared the ravages of TSM. If there is an easier way to undo thirteen years and
more of mis-education in mathematics, we are not aware of it.
A main emphasis in using these three volumes should therefore be on providing
patient guidance to teachers and educators to help them overcome the many hand-
icaps inflicted on them by TSM. In this light, we can say with confidence that, for

9 A second way is to define a line to be the graph of a linear equation y = mx + b and then
define the slope of this line to be m. This is the definition of a line in advanced mathematics, but
it is so profoundly inappropriate for use in K–12 that we will just ignore it.
10 This is the "rise-over-run".
now, the best way for mathematicians to help educate teachers and educators is to
firm up their mathematical foundations. Let us repair the damage TSM has done
to their mathematics content knowledge by helping them to acquire a knowledge
of school mathematics that is consistent with the fundamental principles of math-
ematics.

Potential misuse by educators

Next, we address the issue of how educators may misuse these three volumes.
Educators may very well frown on the volumes’ insistence on precise definitions
and precise reasoning and their unremitting emphasis on proofs while, apparently,
neglecting problem solving, conceptual understanding, and sense making. To them,
good professional development concentrates on all of these issues plus contextual
learning, student thinking, and communication with students. Because these three
volumes never explicitly mention problem solving, conceptual understanding, or
sense making per se (or, for that matter, contextual learning or student thinking),
their content may be dismissed by educators as merely skills-oriented or technical
knowledge for its own sake and, as such, get relegated to reading assignments outside
of class. They may believe that precious class time can be put to better use by
calling on students to share their solutions to difficult problems or by holding small
group discussions about problem-solving strategies.
We believe this attitude is also misguided because the critical missing piece in
the contemporary mathematical education of teachers and educators is an exposure
to a systematic exposition of the standard topics of the school curriculum that
respects the fundamental principles of mathematics. Teachers’ lack of access to
such a mathematical exposition is what lies at the heart of much of the current
education crisis. Let us explain.
Consider problem solving. At the moment, the goal of getting all students
to be proficient in solving problems is being pursued with missionary zeal, but
what seems to be missing in this single-minded pursuit is the recognition that the
body of knowledge we call mathematics consists of nothing more than a sequence
of problems posed, and then solved, by making logical deductions on the basis of
precise definitions, clearly stated hypotheses, and known results.11 This is after
all the whole point of the classic two-volume work [Pólya-Szegö], which introduces
students to mathematical research through the solutions to a long list of problems.
For example, the Pythagorean theorem and its many proofs are nothing more than
solutions to the problem posed by people from diverse cultures long ago: "Is there
any relationship among the three sides of a right triangle?" There is no essential
difference between problem solving and theorem proving in mathematics. Each time
we solve a problem, we in effect prove a theorem (trivial as that theorem may
sometimes be).
The main point of this observation is that if we want students to be profi-
cient in problem solving, then we must give them plenty of examples of grade-
appropriate proofs all through (at least) grades 4–12 and engage them regularly

11 It is in this light that the previous remark about the purposefulness of mathematics can
be better understood: before solving a problem, one should know why the problem was posed in
the first place. Note that, for beginners (i.e., school students), the overwhelming emphasis has to
be on solving problems rather than the more elusive issue of posing problems.
in grade-appropriate theorem-proving activities. If we can get students to see, day
in and day out, that problem solving is a way of life in mathematics and if we
also routinely get them involved in problem solving (i.e., theorem proving), students
will learn problem solving naturally through such a long-term immersion. In the
process, they will get to experience that, to solve problems, they need to have
precise definitions and precise hypotheses as a starting point, know the direction
they are headed towards before they make a move (sense making), and be able
to make deductions from precise definitions and known facts. Definitions, sense
making, and reasoning will therefore come together naturally for students if they
learn mathematics that is consistent with the five fundamental principles.
We make the effort to put problem solving in the context of the fundamental
principles of mathematics because there is a danger in pursuing problem solving
per se in the midst of the TSM-induced corruption of school mathematics. In a
generic situation, teachers teach TSM and only pay lip service to "problem solving",
while in the best case scenario, teachers keep TSM intact while teaching students
how to solve problems on a separate, parallel track outside of TSM. Lest we forget,
TSM considers "out of a hundred" to be a correct definition of percent, expands the
product of two linear polynomials by "FOILing", and takes for granted that in any problem
about rate, the rate is constant ("Lynnette can
wash 95 cars in 5 days. How many cars can Lynnette wash in 11 days?"), etc. In this
environment, it is futile to talk about (correct) problem solving. Until we can rid
school classrooms of TSM, the most we can hope for is having teachers teach, on the
one hand, definition-free concepts with a bag of tricks-sans-reasoning to get correct
answers and, on the other hand, reasoning skills for solving a separate collection of
problems for special occasions. In other words, two parallel universes will co-exist
in school mathematics classrooms. So long as TSM continues to reign in school
classrooms, most students will only be comfortable doing one-step problems and
any problem-solving ability they possess will only be something that is artificially
grafted onto the TSM they know.
If we want to avert this kind of bipolar mathematics education in schools,
we must begin by providing teachers with a better mathematical education. Then
we can hope that teachers will teach mathematics consistent with the fundamental
principles of mathematics12 so that students’ problem-solving abilities can evolve
naturally from the mathematics they learn. It is partly for this reason that the
six volumes under discussion13 choose to present the mathematics of K–12 with
explanations (= proofs) for all the skills. In particular, these three volumes on the
mathematics of grades 9–12 provide proofs for every theorem. (At the same time,
they also caution against certain proofs that are simply too long or too tedious
to be presented in a high school classroom.) The hope is that when teachers and
educators get to experience firsthand that every part of school mathematics is suf-
fused with reasoning, they will not fail to teach reasoning to their own students as
a matter of routine. Only then will it make sense to consider problem solving to be
an integral part of school mathematics.

12 And, of course, to also get school textbooks that are unsullied by TSM. However, it seems
likely as of 2020 that major publishers will hold onto TSM until there are sufficiently large numbers
of knowledgeable teachers who demand better textbooks. See the end of [Wu2015].
13 These three volumes, together with [Wu2011a], [Wu2016a], and [Wu2016b].
xxxviii TO THE INSTRUCTOR

The importance of correct content knowledge

In general, the idea is that if we give teachers and educators an exposition
of mathematics that makes sense and has built-in conceptual understanding and
reasoning, then we can hope to create classrooms with an intellectual climate that
enables students to absorb these qualities as if by osmosis. Perhaps an analogy can
further clarify this issue: if we want to teach writing, it would be more effective to
let students read good writing and learn from it directly rather than to let them
read bad writing and simultaneously attend special sessions on the fine points of
effective written communication.
If we want school mathematics to be suffused with reasoning, conceptual un-
derstanding, and sense making, then we must recognize that these are not qualities
that can stand apart from mathematical details. Rather, they are firmly anchored
to hard-and-fast mathematical facts. Take proofs (= reasoning), for example. If we
only talk about proofs in the context of TSM, then our conception of what a proof is
will be extremely flawed because there are essentially no correct proofs in TSM. For
starters, since TSM has no precise definitions, there can be no hope of finding a com-
pletely correct proof in TSM. Therefore, when teaching from these three volumes,14
it is imperative to first concentrate on getting across to teachers and educators the
details of the mathematical reformulation of the school curriculum. Specifically,
we stress the importance of offering educators a valid alternative to TSM for their
future research. Only then can we hope to witness a reconceptualization—in math-
ematics education—of reasoning, conceptual understanding, problem solving, etc.,
on the basis of a solid mathematical foundation.
Reasoning, conceptual understanding, and sense making are qualities intrinsic
to school mathematics that respects the fundamental principles of mathematics.
We see in these three volumes a continuous narrative from topic to topic and from
chapter to chapter to guide the reader through this long journey. The sense making
will be self-evident to the reader. Moreover, when every assertion is backed up by
an explanation (= proof), reasoning will rise to the surface for all to see. In their
presentation of the natural unfolding of mathematical ideas, these volumes also
routinely point out connections between definitions, concepts, theorems, and proofs.
Some connections may not be immediately apparent. For example, in Section 6.1
of this volume (page 310), we explicitly point out the connection between Mersenne
primes and the summation of finite geometric series. Other connections span several
grades: there is a striking similarity between the proofs of the area formula for
rectangles whose sides are fractions (Theorem 1.7 on pp. 48ff.), the ASA congruence
criterion (Theorem G9 on pp. 245ff.), the SSS congruence criterion (Theorem G28 in
Section 6.2 of [Wu2020b]), the fundamental theorem of similarity (Theorem G10 in
Section 6.4 of [Wu2020b]), and the theorem about the equality of angles on a circle
subtending the same arc (Theorem G52 in Section 6.8 of [Wu2020b]). All these
proofs are achieved by breaking up a complicated argument into two or more clear-
cut steps, each involving simpler arguments. In other words, they demonstrate
how to reduce the complex to the simple, so prospective teachers and educators
can learn from such instructive examples about the fine art of problem solving
(= reasoning).
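Here is a small sketch of one such connection. We wrote it ourselves, and Section 6.1 may frame the connection differently. The finite geometric series identity (x − 1)(1 + x + ⋯ + x^(k−1)) = x^k − 1, taken with x = 2^a and k = b (a, b ≥ 2), shows that 2^(ab) − 1 is divisible by 2^a − 1. So a Mersenne number 2^n − 1 can be prime only if n is prime. The Python below checks the identity for small values and lists the exponents n < 32 for which 2^n − 1 is prime. The helper names are hypothetical.

```python
def geometric_sum(x: int, k: int) -> int:
    """Return 1 + x + x**2 + ... + x**(k-1)."""
    return sum(x**i for i in range(k))

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# Finite geometric series identity: (x - 1)(1 + x + ... + x^(k-1)) = x^k - 1.
for x in range(2, 6):
    for k in range(1, 8):
        assert (x - 1) * geometric_sum(x, k) == x**k - 1

# With x = 2**a and k = b, the identity factors 2**(a*b) - 1 by 2**a - 1.
for a in range(2, 6):
    for b in range(2, 6):
        assert 2**(a * b) - 1 == (2**a - 1) * geometric_sum(2**a, b)

# Hence 2**n - 1 prime forces n prime; the converse fails (2**11 - 1 = 2047 = 23 * 89).
mersenne_exponents = [n for n in range(2, 32) if is_prime(2**n - 1)]
print(mersenne_exponents)  # -> [2, 3, 5, 7, 13, 17, 19, 31]
```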

14 As well as from the other three volumes, [Wu2011a], [Wu2016a], and [Wu2016b].

The foregoing unrelenting emphasis on mathematical content should not lead
readers to believe that these three volumes deal with mathematics at the expense of
pedagogy. To the extent that these volumes are designed to promote better teach-
ing in the schools, they do not sidestep pedagogical issues. Extensive pedagogical
comments are offered whenever they are called for, and they are clearly displayed
as such; see, for example, pp. 16, 23, 37, 40, 53, 119, 261, 263, etc. Nevertheless,
our most urgent task—the fundamental task—in the mathematical education of
teachers and educators as of 2020 has to be the reconstruction of their mathemat-
ical knowledge base. This is not about judiciously tinkering with what teachers
and educators already know or tweaking their existing knowledge here and there.
Rather, it is about the hard work of replacing their knowledge of TSM with math-
ematics that is consistent with the fundamental principles of mathematics from the
ground up. The primary goal of these three volumes is to give a detailed exposition
of school mathematics in grades 9–12 to help educators and teachers achieve this
reconstruction.
To the Pre-Service Teacher

In one sense, these three volumes are just textbooks, and you may feel you have
gone through too many textbooks in your life to need any fresh advice. Nevertheless,
we are going to suggest that you approach these volumes with a different mindset
than what you may have used with other textbooks, because you will soon be using
the knowledge you gain from these volumes to teach your students. Reading other
textbooks, you would likely congratulate yourself if you could achieve mastery over
90% of the material. That would normally guarantee an A. More is at stake with
these volumes, however, because they directly address what you will need to know
in order to write your lessons. Ask yourself whether a mathematics teacher whose
lessons are correct only 90% of the time should be considered a good teacher. To
be blunt, such a teacher would be a near disaster. So your mission in reading these
volumes should be to achieve nothing short of total mastery. You are expected to
know this material 100%. To the extent that the content of these three volumes
is just K–12 mathematics, this is an achievable goal. This is the standard you have
to set for yourself. Having said that, we also note explicitly that many Mathematical
Asides are sprinkled all through the text, sometimes in the form of footnotes. These
are comments—usually from an advanced mathematical perspective—that try to
shed light on the mathematics under discussion. The above reference to "total
mastery" does not include these comments.
You should approach these volumes differently in yet another respect. Students’
typical attitude towards a math course is that if they can do all the homework
problems, then most of their work is done. Think back on your calculus courses
or any of the math courses when you were in school, and you will understand how
true this is. But since these volumes are designed specifically for teachers, your
emphasis cannot be limited to merely doing the homework assignments because
your job will be more than just helping students to do homework problems. When
you stand in front of a class, what you will be talking about, most of the time, will
not be the exercises at the end of each section but the concepts and skills in the
exposition proper.1 For example, very likely you will soon have to convince a class
on geometry why the Pythagorean theorem is correct. There are two proofs of this
theorem in these volumes, one in Chapter 5 of this volume and the other in Chapter
4 of [Wu2020c]. Yet on neither occasion is it possible to assign a problem that asks
for a proof of this theorem. There are problems that can assess whether you know

1 I will be realistic and acknowledge that there are teachers who use class time only to drill
students on how to get the right answers to exercises, often without reasoning. But one of the
missions of these three volumes is to steer you away from that kind of teaching. See To the
Instructor on pp. xxvii ff.


enough about the Pythagorean theorem to apply it, but how do you assess whether
you know how to prove the theorem when the proofs have already been given in
the text? It is therefore entirely up to you to achieve mastery of everything in the
text itself. One way to check is to pick a theorem at random and ask yourself:
Can I prove it without looking at the book? Can I explain its significance? Can I
convince someone else why it is worth knowing? Can I give an intuitive summary
of the proof? These are questions that you will have to answer as a teacher. To
the extent possible, these volumes try to provide information that will help you
answer questions of this kind. I may add that the most taxing part of writing these
volumes was in fact to do it in a way that would allow you, as much as possible,
to adapt them for use in a school classroom with minimal changes. (Compare, for
example, To the Instructor on pp. xxvii ff.)
There is another special feature of these volumes that I would like to bring to
your attention: these volumes are essentially school textbooks written for teachers,
and as such, you should read them with the eyes of a school student. When you read
Chapter 1 of this volume on fractions, for instance, picture yourself in a sixth-grade
classroom and therefore, no matter how much abstract algebra you may know or
how well you can explain the construction of the quotient field of an integral domain,
you have to be able to give explanations in the language of sixth-grade mathematics
(i.e., to sixth graders). Similarly, when you come to Chapter 6, you are developing
algebra from the beginning, so even the use of symbols will be an issue (it is in
fact the key issue; see Section 6.1 on pp. 298ff.). Therefore, be very deliberate and
explicit when you introduce a symbol, at least for a while.
The major conclusions in these volumes, as in all mathematics books, are sum-
marized into theorems. Depending on the author’s (and other mathematicians’)
whims, theorems are sometimes called propositions, lemmas, or corollaries as a
way of indicating which theorems are deemed more important than others. Roughly
speaking, a proposition is not regarded to be as important as a theorem, a lemma is
conceptually less important than a proposition, and a corollary is supposed to follow
immediately from the theorem or proposition to which it is attached. (Incidentally,
a formula or an algorithm is just a theorem.) This idiosyncratic classification of the-
orems started with Euclid around 300 BC, and it is too late to do anything about it
now. The main concepts of mathematics are codified into definitions. Definitions
are set in boldface in these volumes when they appear for the first time; a few
truly basic ones are even individually displayed in a separate paragraph, but most
of the definitions are embedded in the text itself, so you should watch out for them.
The statements of the theorems, and especially their proofs, depend on the
definitions, and proofs are the guts of mathematics.
Please note that when I said above that I expect you to know everything in
these volumes, I was using the word "know" in the way mathematicians normally
use the word. They do not use it to mean simply "know the statement by heart".
Rather, to know a theorem, for instance, means to know the statement by heart, know
its proof, know why it is worth knowing, know what its potential implications are,
and finally, know how to apply it in new situations. If you know anything short
of this, how can you expect to be able to answer your students’ questions? At the
very least, you should know by heart all the theorems and definitions as well as the
main ideas of each proof because, if you do not, it will be futile to talk about the

other aspects of knowing. Therefore, a preliminary suggestion to help you master
the content of these volumes is for you to
copy out the statements of every definition, theorem, proposi-
tion, lemma, and corollary, along with page references so that
they can be examined in detail when necessary,
and also to
form the habit of summarizing the main idea(s) of each proof.
These are good study habits. When it is your turn to teach your students, be sure
to pass along these suggestions to them.
You should also be aware that reading a mathematics book is not the same as
reading a gossip magazine. You can probably flip through one of the latter in an
hour or less. But in these volumes, there will be many passages that require slow
reading and re-reading, perhaps many times. I cannot single out those passages for
you because they will be different for different people. We do not all learn the same
way. What you can take for granted, however, is that mathematics books make for
exceedingly slow reading. (Nothing good comes easy.) Therefore if you get stuck,
time and time again, on a sentence or two in these volumes, take heart, because
this is the norm in mathematics learning.
Prerequisites

In terms of the mathematical development of this volume, only a knowledge
of whole numbers, 0, 1, 2, 3, . . . , is assumed. Thus along with place value, you
are assumed to know the four arithmetic operations, their standard algorithms, and
the concept of division-with-remainder and how it is related to the long division
algorithm.1 Division-with-remainder assigns to each pair of whole numbers b (the
dividend) and d (the divisor), where d ≠ 0, another pair of whole numbers q (the
quotient) and r (the remainder), so that
b = qd + r where 0 ≤ r < d.
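Here is a minimal sketch of division-with-remainder for whole numbers, written for illustration only. The function name `divide_with_remainder` is hypothetical. It finds q and r by repeatedly subtracting d from b, which mirrors the definition rather than the long division algorithm.

```python
def divide_with_remainder(b: int, d: int) -> tuple[int, int]:
    """Return (q, r) with b = q*d + r and 0 <= r < d, for whole numbers b and d, d != 0."""
    if b < 0 or d <= 0:
        raise ValueError("b must be a whole number and d a nonzero whole number")
    q, r = 0, b
    # Subtract the divisor as many times as possible; what is left is the remainder.
    while r >= d:
        r -= d
        q += 1
    assert b == q * d + r and 0 <= r < d
    return q, r

print(divide_with_remainder(37, 5))  # -> (7, 2), since 37 = 7*5 + 2
print(divide_with_remainder(4, 9))   # -> (0, 4), since 4 = 0*9 + 4
```

For whole numbers, the result agrees with Python's built-in `divmod(b, d)`. The loop is slow for large quotients, but it keeps the defining property b = qd + r, 0 ≤ r < d, in plain view.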
Some subtle points about the concept of division among whole numbers will be
briefly recalled at the beginning of Section 1.5 on page 54. A detailed exposition of
the concept of "division" among whole numbers is given in Chapter 7 of [Wu2011a].
Note that 0 is included among the whole numbers.
A knowledge of negative numbers, particularly integers, is not assumed. Neg-
ative numbers will be developed ab initio in Chapter 2.

Because every assertion in these three volumes (this volume, together with
[Wu2020b] and [Wu2020c]) will be proved, students should be comfortable with
mathematical reasoning. It is hoped that as they progress through the volumes, all
students will become increasingly at ease with proofs. In terms of the undergraduate
curriculum, readers of this volume—as a rule of thumb—should have already taken
the usual two years of college calculus or their equivalents.

1 Unfortunately, a correct exposition of this topic is difficult to come by. Try Chapter 7 of
[Wu2011a].

Some Conventions

• Each chapter is divided into sections. Titles of the sections are given at
the beginning of each section as well as in the table of contents. Each
section (with few exceptions) is divided into subsections; a list of the
subsections in each section—together with a summary of the section in
italics—is given at the beginning of each section.
• When a new concept is first defined, it appears in boldface but is not
often accorded a separate paragraph of its own. For example:
A subset R in a plane is called convex if given any two points
A, B in R, . . . (p. 172).
You will have to look for many definitions in the text proper. (However,
not all boldfaced words or phrases signify new concepts to be defined,
because boldface fonts are sometimes used for emphasis.)
• When a new notation is first introduced, it also appears in boldface. For
example:
The congruence notation $\triangle ABC \cong \triangle A'B'C'$ will be
understood to mean . . . (p. 245).
• Equations are labeled with numbers inside parentheses, and the first digit
of the label indicates the chapter in which the equation can be found.
For example, the "(1.17)" in the sentence "Thus (1.17) implies that . . . "
means the 17th labeled equation in Chapter 1.
• Exercises are located at the end of each section.
• Bibliographic citations are labeled with the name of the author(s) inside
square brackets, e.g., [Ginsburg]. The bibliography begins on page 387.
• In the index, if a term is defined on a certain page, that page will be in
italics. For example, the item
division-with-remainder, 15, 137, 139
means that the term "division-with-remainder" appears in a significant
way on all three pages, but the definition of the term is on page 139.

CHAPTER 1

Fractions

Overview of Chapters 1 and 2


These two chapters give a systematic, though somewhat terse, exposition of
rational numbers1 that is suitable for classroom use in grades 5–7. The reason for
the brevity is that a more leisurely exposition written for middle school teachers
has been given in [Wu2016a]; you can consult the latter for more details if neces-
sary. Such an exposition will be important to high school teachers because most
students come to high school with a very defective understanding of fractions. They
need help with fractions if they are to have any hope of understanding their high
school math classes. High school teachers who do not possess a knowledge of frac-
tions that makes sense mathematically and is accessible to school students will be
tremendously handicapped when they try to communicate with their students.
There is a good reason why students have trouble with fractions: TSM2, in
the guise of "school mathematics", has not taught them much about fractions
that makes sense to them. TSM does not even tell them what a fraction is beyond
putting forth the elusive metaphor of "a part of a whole" or "a piece of pizza", and it
certainly shows no interest in providing understandable mathematical explanations.
For example, why use the least common denominator when adding fractions, and
why invert and multiply when dividing fractions? If we do not teach learnable
mathematics, then nonlearning will be the inevitable consequence. Unfortunately,
the nonlearning of fractions is a major stumbling block in the learning of algebra3
and, therefore, in the learning of high school mathematics as a whole. This is one
reason that this volume—addressing the teaching of high school mathematics as
it does—is obliged to begin with an exposition of fractions and rational numbers
that is both mathematically correct and compatible with the school curriculum.
We want to maximize the chances that high school teachers will be able to do their
share to help students make a successful first step in their lifelong journey of a
thousand miles.
From the perspective of advanced mathematics, the rational numbers are among
the simplest mathematical structures. As sometimes happens, however, what is
mathematically simple may not be simple enough for consumption by school stu-
dents. Let us briefly recall how the set of rational numbers, Q, is handled in abstract
algebra. We start with the integers, Z, and Q is the set of all equivalence classes of
ordered pairs of integers, the second entry being nonzero, with respect to a natural
equivalence relation (essentially the cross-multiplication algorithm; see page 23).
Then addition + and multiplication × are introduced in Q in a way that guarantees
that Q becomes a field, i.e., + and × are associative and commutative, × is
distributive over +, there are identity elements 0 for + and 1 for ×, every element
has an additive inverse, and every nonzero element has a multiplicative inverse. An
order relation ≥ is introduced into Q and those elements ≥ 0 are what we call the
fractions. For the purposes of doing mathematics, the fact that Q is an ordered
field is essentially all that matters.

1 The term "rational numbers" is being used correctly here to denote the number system
consisting of fractions and negative fractions. Unfortunately, this term has been incorrectly used
in the education literature to mean fractions only, but not negative fractions.
2 See page xiv for a definition of TSM.
3 The importance of fractions to the learning of algebra is beginning to be recognized. See,
for example, Recommendation 4 on page xvii of the report of the National Mathematics Advisory
Panel ([NMAP1]). See also the article [Wu2018b].
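The abstract construction just recalled can be sketched in a few lines of Python. This illustration rests on simplifying assumptions, and the helper names are hypothetical. A pair (a, b) with b ≠ 0 stands for a/b, and two pairs are equivalent when ad = bc (cross-multiplication). The check shows that the operations on pairs respect the equivalence, so they define operations on the equivalence classes.

```python
from itertools import product

def equivalent(p, q):
    """(a, b) ~ (c, d) exactly when a*d == b*c (cross-multiplication)."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def add(p, q):
    """(a, b) + (c, d) = (a*d + b*c, b*d)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):
    """(a, b) * (c, d) = (a*c, b*d)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

# A small sample of pairs of integers with nonzero second entry.
pairs = [(a, b) for a, b in product(range(-3, 4), repeat=2) if b != 0]

# Replacing an operand by an equivalent pair gives an equivalent result,
# so + and * are well defined on equivalence classes.
for p, p2, q in product(pairs, pairs, pairs):
    if equivalent(p, p2):
        assert equivalent(add(p, q), add(p2, q))
        assert equivalent(mul(p, q), mul(p2, q))

print(equivalent((1, 2), (-3, -6)))  # -> True: both stand for 1/2
print(add((1, 2), (1, 3)))           # -> (5, 6)
```

This finite check illustrates well-definedness on a sample. It does not prove it, and it leaves out the order relation entirely.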
Now there are many reasons why this way of dealing with the rational numbers
is not suitable for use in grades 5–7, which is where the most substantial parts of
fractions and rational numbers are systematically taught in schools. The major
reason is that students around the age of twelve do not have the mathematical
maturity to deal with ordered pairs of integers or equivalence classes. They also
have little or no conception of why a field might be interesting or important, and
they couldn’t care less whether or not fractions can be added or multiplied. They
come to the concept of a fraction through everyday usage, such as two-thirds of a
glass of orange juice, and they come to the addition and multiplication of fractions
through their experience with whole numbers, so that + connotes "putting things
together" and × signifies "repeated addition". Negative fractions, essential as they
are, are even further from students’ intuitive understanding. The needs of the school
classroom therefore dictate that fractions and rational numbers be introduced in
a way that respects and extends students’ prior learning experiences. It would be
futile to try to impose on them a mathematical worldview that is entirely foreign
to them.
What the foregoing discussion hints at is that school mathematics education is
not the teaching of straightforward mathematics as we know it in universities but
must be, rather, a variant of it. If engineering is the customization of abstract scien-
tific principles to meet human needs, then mathematics education is mathematical
engineering:4 it customizes abstract university mathematics to meet the needs of
K–12 students. In the case of fractions, mathematical engineering customizes the
abstract concept of Q to meet the needs of students around the age of twelve. What
is essential to engineering is that, while it can customize scientific principles, it must
not violate them. Likewise, the customization of abstract mathematics must also be
consonant with mathematics in terms of reasoning and precision.
The tension between the need to be faithful to the fundamental principles of math-
ematics5 and the need to make the mathematics grade-level appropriate will be
the overriding theme in this and the two companion volumes, [Wu2020b] and
[Wu2020c]. For the case at hand, this tension highlights the fact that these two
chapters are nothing but the presentation of an "engineered version of Q". This
chapter will start with fractions, while negative numbers will be the subject of the
following one.
We want to point out a glaring failing of TSM that these two chapters will
redress. On a technical level, rational numbers are important because they con-
stitute the number system of choice in school mathematics. School mathematics
even views real numbers (rational and irrational numbers) through the lens of ra-
tional numbers. To illustrate this point, consider the following simple operation
with irrational numbers:
$$\frac{\sqrt{2}}{\sqrt{3}} + \frac{5}{4} = \frac{(4 \times \sqrt{2}) + (\sqrt{3} \times 5)}{4\sqrt{3}}. \tag{1.1}$$
4 For a fuller discussion, see [Wu2006].
5 See page xiii.

In TSM, one does not explain what $\frac{\sqrt{2}}{\sqrt{3}}$ and $\frac{5}{4}$ are, much less the meaning of
adding the numbers on the left, and the same can be said about the product $\sqrt{3} \times 5$
on the right. Let us not forget that in teaching fractions, much time is spent on the
meaning and the skills of the addition and multiplication of fractions. With this
in mind, how can we account for the fact that the addition and multiplication of
more mysterious numbers such as $\sqrt{3}$ and $\pi$ are passed over in silence? The answer,
which TSM fails to provide, is that the conceptual complexity of irrational numbers
such as $\sqrt{3}$ and $\pi$ is beyond the level of K–12 and therefore the computation in
(1.1) has to be carried out by making a formal analogy with a similar computation
involving rational numbers. More precisely, since the identity
$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$$
is valid for all rational numbers a, b, c, d (b, d ≠ 0),6 a fundamental assumption in
school mathematics is that this identity will remain valid for all real numbers a, b,
c, d, with b, d ≠ 0 (see page 6 for the definition of a real number). Such being the
case, we may let $a = \sqrt{2}$, $b = \sqrt{3}$, $c = 5$, and $d = 4$ to get the previous equality
(1.1).
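The identity and its use in (1.1) can be checked numerically. This Python sketch is an illustration only. It first verifies the identity exactly for a few rational values using the standard library's `fractions.Fraction`. It then compares the two sides of (1.1) in floating-point arithmetic for a = √2, b = √3, c = 5, d = 4. Agreement to within rounding error is not a proof of FASM; it only illustrates what FASM asserts.

```python
import math
from fractions import Fraction

def lhs(a, b, c, d):
    """Left side of the identity: a/b + c/d."""
    return a / b + c / d

def rhs(a, b, c, d):
    """Right side of the identity: (ad + bc)/(bd)."""
    return (a * d + b * c) / (b * d)

# Exact check for rational a, b, c, d with b, d nonzero.
samples = [(Fraction(1, 2), Fraction(3), Fraction(-5, 7), Fraction(2, 9)),
           (Fraction(4), Fraction(5), Fraction(6), Fraction(7))]
for a, b, c, d in samples:
    assert lhs(a, b, c, d) == rhs(a, b, c, d)

# Floating-point check of (1.1): a = sqrt(2), b = sqrt(3), c = 5, d = 4.
a, b, c, d = math.sqrt(2), math.sqrt(3), 5, 4
print(math.isclose(lhs(a, b, c, d), rhs(a, b, c, d)))  # -> True
```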
TSM seems never to discuss how teachers and their students should deal with
real numbers. Its main instructional strategy seems to be one that exploits students’
passive willingness to accept orders from authority figures about what to do, rea-
sonable or not, and since this strategy has been implemented since kindergarten,
TSM has a high expectation of compliance. Therefore, having drilled students
on formal computations with fractions—whose numerators and denominators are
whole numbers—TSM is comfortable asking students to simply believe that the
same computations can also be carried out when the numerators and denominators
are real numbers. No explanation given, of course.
In order to make school mathematics learnable, it is essential that we stop such
underhanded maneuvers and be explicit about what we ask students to assume.
This is what we have come to call the Fundamental Assumption of School
Mathematics (FASM). It will be discussed at length on pp. 133ff. Although
nothing like FASM exists in TSM, FASM plays a pivotal role throughout these
three volumes. (The proof of FASM is given in Section 2.1 of the third volume in
this series, [Wu2020c].)
Finally, in reading these two chapters, please keep in mind that the emphasis
here will not be on individual facts or skills. For example, it is taken for granted that
you are entirely comfortable with identifying the fractions and rational numbers as
certain points on the x-axis (called "the number line" in school mathematics) and
are fluent in the four arithmetic operations with rational numbers. Rather, the
emphasis will be on the reconstruction of familiar facts about fractions and rational
numbers into a new body of knowledge that is logical and coherent, so that it can
6 The fact that this identity is true when a, b, c, d are whole numbers is perhaps well known,
but TSM does not explain its validity when a, b, c, d are fractions, much less when they are
rational numbers. However, one can find such an explanation on page 118 of Section 2.5.

provide a framework for you to communicate with high school students who need
help. You can be certain that, as of 2020, you will often be called upon to explain
fractions and rational numbers to your students. We hope that you will take the
time and trouble to become thoroughly familiar with these two chapters so that
you will be able to make sense of fractions and rational numbers when explaining
them to your students. This will be an important first step toward improving school
mathematics education.

1.1. Definition of a fraction


After a brief review of the unsatisfactory state of the teaching of fractions in
TSM, we introduce the number line, explain informally why fractions should be a
specific collection of points on the number line, and then give the formal definition.
The definition has some inherent subtleties, including the need for precision in
the specification of the unit, and these are duly pointed out. We then single out
an important subclass of fractions: the decimals. The fact that decimals are by
definition fractions is a main feature of this volume.
Leaving the past behind (p. 4)
The number line (p. 5)
An informal discussion (p. 7)
The formal definition of a fraction (p. 9)
Miscellaneous comments on the definition (p. 10)
Decimal fractions and decimals (p. 14)
Locating fractions on the number line (p. 15)

Leaving the past behind

Reasoning in mathematics requires precise definitions for each and every con-
cept used.7 We need a definition of a fraction, not only because this is what
mathematics demands, but also because students need a precise mental image for
fractions to update the mental image of their fingers for whole numbers. Because
there is no natural image for fractions such as $\frac{7}{11}$ or $\frac{3}{13}$, it is incumbent on us to
create one for students. They cannot go through life knowing a fraction only as a
piece of pizza, as TSM basically forces them to do. They will have to use fractions
to compute percents for sales tax and volumes of solids for gardening chores even
when no pizza is in sight.
Beyond pizzas, the most common definition TSM has to offer for fractions is
"parts of a whole".8 For students, the difficulty with the conception of a fraction
as "parts of a whole" is multifaceted:
(1) The concept of a "whole" is elusive. TSM never defines
what a "whole" is. It is many things, and thus a moving target.
A concept this nebulous cannot serve as a solid foundation for
learning fractions.

7 Mathematical Aside: Except for undefined concepts that are part of the foundational axioms.
8 We should point out the linguistic coincidence that the word "whole" appears in both
phrases, "whole numbers" and "parts of a whole". Be careful not to confuse the two. Fortunately,
we will basically have no occasion to refer to "parts of a whole" in this volume beyond this informal
discussion.

(2) A fraction is a number that you compute with, but TSM
does not explain in what sense "parts of a whole" is a "number"
and how to compute with "parts of a whole". For example, how
does "parts of a whole" logically lead to invert and multiply in
the division of fractions?
(3) "Parts of a whole" is at least two things: the "parts"
and then there is the "whole". It is difficult to conceptualize a
single number as two separate pieces.
(4) There is a psychological issue. Since "the whole" means
"the whole thing", how can you have more than the whole thing
as in the case of $\frac{3}{2}$?
There are additional definitions of a fraction either as a "division" or a "ratio"
in TSM, or even as all three things; i.e., a fraction is "parts of a whole" and a
"division" and a "ratio". Theorem 1.4 on page 29 shows that indeed a fraction
can be interpreted as a division, but only after "division" between arbitrary whole
numbers has been carefully defined. Not before. The division interpretation of a
fraction is therefore something to prove rather than something to decree by fiat as
part and parcel of what a fraction is. The concept of "ratio" will be seen to be one
that can be defined precisely after the division of fractions has been put in place
(page 75). Thus, to define a fraction as a "ratio" is to define one abstruse concept
in terms of another that is even more abstruse. Bottom line: TSM’s approach to
fractions is unlearnable from the beginning.

The number line

We will use the number line to formulate a definition of fractions (the exposition
of this chapter goes back to [Wu1998] and [Wu2002]; compare 3.NF on page 24 of
[CCSSM]). The fractions will be a particular collection of points on the number line.
The definition will be unambiguous, and the geometric nature of this collection of
points will make it a very accessible mental image of fractions for students.
The number line is the name that school mathematics gives to what is called
in mathematics the real line, i.e., the x-axis. Take a line which is (usually chosen
to be) horizontal and pick a point to designate as 0. The line being horizontal, one
can distinguish between the left direction and the right direction on the line. We
choose another point on the line to the right of 0 and designate it as 1.⁹ Once a
choice of the two points 0 and 1 has been fixed on this line, the line is called the
number line.
[Figure: the number line, with the points 0 and 1 marked]

If a and b are two points on the number line so that a is to the left of b, then the
segment from a to b consists of all the points between a and b, together with the
points a and b themselves; the notation for this segment is [a, b]. (Mathematical
Aside: In calculus, a segment is called a closed bounded interval.)
[Figure: the segment [a, b] on the number line, with a to the left of b.]

⁹ By convention, 1 is to the right of 0. It could have been to the left of 0, of course, if the
convention had so dictated.


6 1. FRACTIONS

The points a and b are called the endpoints of [a, b]; a is the left endpoint and
b the right endpoint. The segment [0, 1] will be called the unit segment, and
its right endpoint 1 will be called the unit on the number line.
We will have to make precise the common notion of "equal parts" on the number
line. To this end, we have to be able to decide if two segments have the same length.
Two segments [a, b] and [c, d] are said to be of the same length if, by sliding one
segment along the number line until their left endpoints a and c coincide, then
their right endpoints b and d also coincide. For convenience, we will also express
the length of [a, b] as the distance between a and b.
Mathematical Aside: We intentionally use the suggestive term "sliding" for
the comparison of two segments because we are setting the stage for teaching fractions
in the upper elementary grades. The mathematical terminology for "slide" is
"translate". Therefore we have implicitly introduced the concept of a translation
into the study of fractions (see page 234 for the definition of translation). Formally,
we are translating [a, b] along the number line from a to c (or, along the vector $\overrightarrow{ac}$),
and we say [a, b] and [c, d] have the same length if this translation also maps b to d.
This approach to fractions therefore assumes a knowledge of Euclidean geometry.
There is no logical difficulty here, as Euclidean geometry can be developed without
reference to numbers if we so wish (see [Hilbert]). In terms of pedagogy, this exposition
is set at the right level too, because the amount of geometric knowledge that
is implicitly assumed of school students is nothing more than what any ten-year-old
would naturally take for granted.
Back to the number line on which the points 0 and 1 have been chosen. We
choose another point to the right of 1 so that the distance between that point and 1
is the same as the distance between 0 and 1. We designate that point as 2. Then we
choose another point to the right of 2 so that the distance between that point and
2 is the same as the distance between 0 and 1. We designate that point as 3, and so
on. In this way, we get an infinite sequence of equidistant points to the right of
0 (i.e., points on the line so that the distance between any two consecutive points
is the same) to which we have attached the whole numbers, N, {0, 1, 2, 3, . . .}.
Think of the number line as an infinite ruler, as shown:
[Figure: the number line as an infinite ruler, with 0, 1, 2, 3, 4 marked at equal spacing.]
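The "sliding" test for equal length and the marking of the whole numbers can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: points are modeled by exact rational coordinates (`fractions.Fraction` from the standard library), which presupposes arithmetic on numbers that the text has not yet developed, and the helper names `same_length` and `mark_whole_numbers` are hypothetical, not taken from the book.

```python
from fractions import Fraction


def same_length(a, b, c, d):
    """[a, b] and [c, d] have the same length if sliding [a, b] so that a
    lands on c also makes b land on d."""
    shift = c - a              # the translation carrying a to c
    return b + shift == d


def mark_whole_numbers(zero, one, count):
    """Starting at `zero`, lay copies of the segment [zero, one] end to end
    `count` times and return the marked points (the images of 0, 1, 2, ...)."""
    unit = one - zero
    points = [zero]
    for _ in range(count):
        points.append(points[-1] + unit)   # next point is one unit farther right
    return points


print(same_length(Fraction(1), Fraction(3), Fraction(5), Fraction(7)))  # True
print(mark_whole_numbers(Fraction(0), Fraction(1), 4))
```

Passing a `zero` and `one` other than 0 and 1 (say, 2 and 5 on some underlying ruler) illustrates that any choice of the two points produces an equidistant sequence in the same way.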

We have seen that the choice of a 0 and a 1 on a horizontal line naturally leads
to a sequence of equidistant points to the right of 0. For this reason, we will also
refer to a horizontal line with an infinite sequence of equidistant points identified
with N on its right side as the number line. By definition, a number, or more
precisely, a real number is just a point on the number line.¹⁰
Mathematical Aside: It is worth pointing out that implicit in
the above definition of the sequence of whole numbers, N, on
the number line is the assumption that there is a way to tell the
"distance" between any two points on the number line. Thus 2
is the point to the right of 1 so that the distance between 0 and
1 is the same as the distance between 1 and 2, 3 is the point to
the right of 2 so that the distance between 0 and 1 is the same
as the distance between 2 and 3, and so on. We will revisit this
scenario when we have to introduce the distance function in the
plane on page 185.

¹⁰ Mathematical Aside: We are in effect introducing coordinates on the line, in the same
way that we will later introduce coordinates in the plane in Section 6.3. By defining a point on
the number line as a number, we are adopting the usual practice of identifying a point with its
coordinate(s) once the coordinate system has been fixed.
1.1. DEFINITION OF A FRACTION 7

We pause to note that we have just defined explicitly what a number is, namely,
a point on the number line. This may not seem like much until you recall that,
in TSM, the word number is bandied about repeatedly and yet nobody can say
precisely what a number is.
A segment [a, b] is said to have length c, where c is a point on the number
line, if the segments [a, b] and [0, c] have the same length. Thus by definition, the
segment [0, c] has length c. In particular, the unit segment [0, 1] has length 1 and,
for this reason, we sometimes refer to 1 as the unit length or unit distance.
Let it be observed that, insofar as c is just a symbol representing a point on the
number line, to say [a, b] has length c means very little at the moment. However
if c happens to be a whole number, say 7, then to say [a, b] has length 7 gives us a
good idea of how long [a, b] is (because we know what it means to be 7 times as long as [0, 1]).
The next order of business is to expand our pool of "standard numbers" by naming
more numbers on the number line, namely, the fractions.
At this juncture, we can make contact with the common notion of a "whole"
again. The advantage of using the number line to define fractions is that we are in
effect fixing the "whole" once and for all: it will always be the length of the unit
segment [0, 1]. No ambiguity, no guessing. At the same time, the number line has
the flexibility to accommodate any kind of a "whole": if we want to talk about
dividing a pizza, let the unit 1 on the number line be the area of a circular pizza
(always assuming the pizza has uniform thickness so that only the area of the pizza
matters in the division), but if we want to consider the distance of a car from a
starting point, let the unit 1 be one mile. In the former case, the number 3 then
stands for the total area of 3 pizzas, and in the latter case the number 3 will stand
for 3 miles.
We emphasize: contrary to what TSM says, the "whole" is not the unit segment
[0, 1], but the length of the unit segment [0, 1]. The precision here is everything.
TSM has misled students into believing that the "whole" in "parts of a whole" is an
object, e.g., a segment, a square, a pizza, a glass of water, etc., whereas the correct
statement is that the whole is, respectively, the length of a segment, the area of a
square, the area of a pizza, the volume of a glass of water, etc. The sloppiness of the
language in TSM is a main reason for much of students’ nonlearning, as we shall see.

An informal discussion

With the length of [0, 1] as the "whole", let us see informally where the fractions
with denominator equal to 3 (i.e., $\frac{1}{3}, \frac{2}{3}, \frac{3}{3}, \frac{4}{3}$, etc.) should be placed on the
number line. We will assume that we can divide a given segment into any number
of segments of equal length (this is something intuitive and believable and, in any
case, we will show how to do it in the Pedagogical Comments on page 16). The
fraction $\frac{1}{3}$ is one-third of the whole (= the length of [0, 1]) and, therefore, $\frac{1}{3}$ is the
length of one part when we divide the length of [0, 1] into 3 equal parts, i.e., 3 segments
of equal length. If we also divide each of [0, 1], [1, 2], [2, 3], . . . into 3 segments
of equal length, then these division points, together with the whole numbers, form
an infinite sequence of equidistant points, to be called the sequence of thirds.


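Then $\frac{1}{3}$ is the length of any one of the short segments below, where short segment
refers to a segment between consecutive points in the sequence of thirds:
[Figure: the number line from 0 to 3, with each of [0, 1], [1, 2], [2, 3] divided into 3 short segments.]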

Any of the following thickened short segments is "one part when the whole is divided
into 3 equal parts" and is therefore a legitimate representation of $\frac{1}{3}$:
[Figure: the same number line, with several different short segments thickened.]

The existence of these multiple representations of $\frac{1}{3}$ cries out for clarification.


To this end, we introduce the following standard representation of $\frac{1}{3}$, namely,
the short segment whose length is equal to $\frac{1}{3}$ and whose left endpoint is 0. See
the thickened segment below.
[Figure: the number line from 0 to 3, with the short segment $[0, \frac{1}{3}]$ thickened and its right endpoint labeled $\frac{1}{3}$.]
With respect to the standard representation, we observe that the length of this
segment determines its right endpoint, and the right endpoint determines the length
of this segment. Therefore, we may as well identify the standard representation of
$\frac{1}{3}$ with its right endpoint. Then the fraction $\frac{1}{3}$ becomes identified with a point on
the number line.
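In like manner, the fraction $\frac{5}{3}$, being the length of 5 of these short segments,
has the following standard representation (see the thickened segment below).
For a similar reason, we identify the standard representation of $\frac{5}{3}$ with its right
endpoint and proceed to denote the latter by $\frac{5}{3}$, as shown:
[Figure: the number line from 0 to 3, with the segment made of the first 5 short segments thickened and its right endpoint labeled $\frac{5}{3}$.]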
In general then, a fraction $\frac{m}{3}$ for a nonzero whole number m has the standard
representation consisting of m adjoining short segments abutting 0, and
we identify this standard representation of $\frac{m}{3}$ with its right endpoint. When m = 0,
we agree to identify $\frac{0}{3}$ with the point 0. Now each fraction with denominator 3 is
identified with one and only one of the points in the sequence of thirds, as shown:
[Figure: the sequence of thirds, with the points labeled $\frac{0}{3}, \frac{1}{3}, \frac{2}{3}, \ldots, \frac{11}{3}$ from left to right and the whole numbers 0, 1, 2, 3 also marked.]

Note that by our convention, $\frac{0}{3}$ is just 0.


In terms of the sequence of thirds, each fraction $\frac{m}{3}$ is easily located: the point
$\frac{m}{3}$ is the m-th point to the right of 0. Thus if we ignore the denominator, which is
3, then the naming of the points in the sequence of thirds is no different from the
naming of the whole numbers.
Of course the consideration of fractions with denominator equal to 3 extends
to fractions with other denominators. For example, if we replace 3 by 5, then we get
the sequence of fifths, which is a sequence of equidistant points obtained by
dividing each of [0, 1], [1, 2], [2, 3], . . . into 5 equal parts. The fractions with
denominator equal to 5, from $\frac{0}{5}$ to $\frac{11}{5}$, are now shown to be identified with points
in the sequence of fifths, as shown:
[Figure: the sequence of fifths, with the points labeled $\frac{0}{5}, \frac{1}{5}, \ldots, \frac{11}{5}$ from left to right and the whole numbers 0, 1, 2 also marked.]

Finally, if we consider all the fractions with denominator equal to n, then we
would be led to the sequence of n-ths, which is the sequence of equidistant
points resulting from dividing each of [0, 1], [1, 2], [2, 3], . . . into n equal parts. The
fraction $\frac{m}{n}$ is then the m-th point to the right of 0 in this sequence.

The formal definition of a fraction

We will now begin the formal presentation of fractions. We say a segment [a, b]
is divided into n equal parts if [a, b] is expressed as the union of n adjoining,
nonoverlapping segments of the same length. (The union of a collection of sets
is the totality of all the points each of which belongs to at least one of the given
sets. For example, the union of the segments [0, 1] and [1, 3] is the segment [0, 3].)
We implicitly assume that it is possible to divide a segment into a given number of
equal parts. (Those who feel uneasy about the possibility of dividing a given segment
into n equal parts for any whole number n may wish to skip to the Pedagogical
Comments on page 16 to get some assurance on how to get it done.)
Divide each of the line segments [0, 1], [1, 2], [2, 3], [3, 4], . . . into 3 equal parts.
The totality of division points, including the whole numbers, forms a sequence of
equidistant points (see p. 6 for the definition), to be called the sequence of thirds.
We now attach symbols to points in the sequence of thirds, as follows. Going
from left to right, the fraction $\frac{0}{3}$ is, by definition, the first point in the sequence,
which is 0. For a nonzero whole number m, $\frac{m}{3}$ is the m-th point in the sequence to
the right of 0. Thus $\frac{1}{3}$ is the first point to the right of 0, $\frac{2}{3}$ is the second point, $\frac{3}{3}$ is
the third point, etc. Note that $\frac{3}{3}$ coincides with 1, $\frac{6}{3}$ coincides with 2, $\frac{9}{3}$ coincides
with 3, and in general, $\frac{3m}{3}$ coincides with m for any whole number m. Here is the
picture:
[Figure: the sequence of thirds, with the points labeled $\frac{0}{3}, \frac{1}{3}, \ldots, \frac{10}{3}$, etc., and the whole numbers 0, 1, 2, 3 also marked.]

The fraction $\frac{m}{3}$ is called the m-th multiple of $\frac{1}{3}$. Note that the way we have
just introduced the multiples of $\frac{1}{3}$ on the number line is exactly the same way that
the multiples of 1 (i.e., the whole numbers) were introduced on the number line.
By doing to $\frac{1}{3}$ exactly what we did to the number 1 in putting the whole numbers
on the number line, we obtain all the whole-number multiples of $\frac{1}{3}$, i.e., all the
$\frac{m}{3}$, where m ∈ N. (The symbol "∈" means belonging to or belongs to.)

In general, if a nonzero n ∈ N is given, we introduce a new collection of points
on the number line in the following way. Divide each of the line segments [0, 1],
[1, 2], [2, 3], [3, 4], . . . into n equal parts; then these division points (including the
whole numbers) form an infinite sequence of equidistant points on the number line,
to be called the sequence of n-ths. Going from left to right, 0 is denoted by
$\frac{0}{n}$. The first point in the sequence to the right of 0 is denoted by $\frac{1}{n}$, the second
point by $\frac{2}{n}$, the third by $\frac{3}{n}$, etc., and the m-th point in the sequence to the right
of 0 is denoted by $\frac{m}{n}$, to be called the m-th multiple of $\frac{1}{n}$. The sequence of
n-ths therefore consists of all the whole-number multiples of $\frac{1}{n}$, i.e., those
points denoted by $\frac{m}{n}$ for some whole number m.
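A short sketch of the sequence of n-ths, assuming Python's `fractions.Fraction` as a stand-in for points on the number line; `sequence_of_nths` is a hypothetical helper. Since `Fraction` stores values in lowest terms, the check below confirms instances of $\frac{3m}{3} = m$ numerically rather than proving anything.

```python
from fractions import Fraction


def sequence_of_nths(n, count):
    """The first `count` points of the sequence of n-ths: 0/n, 1/n, 2/n, ..."""
    if n <= 0:
        raise ValueError("n must be a nonzero whole number")
    return [Fraction(m, n) for m in range(count)]


thirds = sequence_of_nths(3, 11)                   # 0/3, 1/3, ..., 10/3
print(thirds[4] == Fraction(4, 3))                 # True: 4/3 is the 4th point right of 0
# The (3m)-th point coincides with the whole number m, i.e. 3m/3 = m.
print(all(thirds[3 * m] == m for m in range(4)))   # True
```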
Definition. The collection of all the sequences of n-ths, as n runs through the
nonzero whole numbers 1, 2, 3, . . . , is called the fractions. For a nonzero whole
number m, the m-th point to the right of 0 in the sequence of n-ths is denoted by $\frac{m}{n}$.
The number m is called the numerator and n is called the denominator of the
symbol $\frac{m}{n}$. By the traditional abuse of language, it is common to say that m and n
are the numerator and denominator, respectively, of the fraction $\frac{m}{n}$.¹¹ By
convention, 0 is denoted by $\frac{0}{n}$ for any n.

This definition of fractions accords a special status to those fractions denoted
by $\frac{1}{n}$ for a nonzero whole number n: the fractions are the union of all the
whole-number multiples of $\frac{1}{n}$, as n runs through the nonzero whole numbers.
We call these $\frac{1}{n}$'s the unit fractions.
The thing to keep in mind is that we first identify a subcollection of points as
fractions before affixing any symbols to them.
The meaning of the unit determines how fractions are interpreted. If the unit 1
stands for 1 pound, then 3 will be interpreted as 3 pounds, but if 1 stands for 1 mile,
then $\frac{13}{4}$ will stand for $\frac{13}{4}$ miles, etc.¹²
Thus any reasoning with fractions on the
number line can be interpreted as reasoning with a specific real-world situation once
the unit 1 has been specified: 1 pound, 1 cc, 1 meter, etc. This kind of flexibility
only comes with the abstraction of putting numbers on the number line and is one
reason for defining fractions as points on the number line.
In the future, we will relieve the tedium of always having to say that the
denominator n of a fraction $\frac{m}{n}$ is nonzero by simply not mentioning it.

Miscellaneous comments on the definition

(A) It is self-evident that $\frac{n}{n} = 1$, $\frac{2n}{n} = 2$, $\frac{3n}{n} = 3$, $\frac{4n}{n} = 4$, and in general,
$$\frac{kn}{n} = k \quad \text{for all whole numbers } k, n, \text{ where } n \neq 0. \tag{1.2}$$
Note that we have followed the convention of denoting the product of the whole
numbers k and n by kn rather than k×n. Now, letting n = 1 and k = 1, respectively,
we get
$$\frac{k}{1} = k \quad \text{and} \quad \frac{n}{n} = 1 \quad \text{for all whole numbers } k, n \ (n \neq 0).$$

¹¹ The 100% correct statement is of course that "m is the numerator of the symbol which
denotes the fraction that is the m-th point of the sequence of n-ths, and n is the denominator of
this symbol." Needless to say, almost no one in real life, inside or outside math class, ever talks
like this.
¹² Of course, $\frac{13}{4}$ miles would be called "3 and a quarter miles" in everyday conversation, for
a reason to be explained on page 36.

(B) For the study of fractions, the need for precision about what the unit is
cannot be overstated. On one level, it is impossible to say which point is what
fraction until the unit segment is fixed, i.e., the two points 0 and 1. Thus the
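following fixed point P on the given horizontal line is either $\frac{2}{4}$ or 2, depending on
which point is chosen to be the unit 1:
[Figure: the line with 0, P, 1 marked in that order from left to right.]
[Figure: the same line with 0, 1, P marked in that order from left to right.]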

On a second level, without a precise statement about what the unit stands for,
it would be impossible to say what "equal parts" means, and without that, the
ambiguity would likely lead to nonlearning (see Exercise 4 on page 18). For example,
if the unit 1 "stands for a cup of water" (as is commonly done in TSM), does $\frac{1}{3}$
mean a third of the volume of the liquid in the cup or a third of the liquid by height
(imagine that the cup is the usual curved shape and not a right circular cylinder)?
Or, if the unit 1 "stands for a ham", does $\frac{1}{3}$ mean a third of the meat or a third of
the ham by weight including the bone? We can also give a slightly different example
to expose the error resulting from such ambiguity: suppose the unit ("the whole")
is a pizza and we ask what fraction is represented by putting one of the four pieces
on the left below together with one of the eight pieces on the right below:
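[Figure: two pizzas of the same size, the one on the left cut into 4 equal pieces and the one on the right cut into 8 equal pieces.]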
When the answer of $\frac{3}{8}$ is not forthcoming, a common conclusion is that students
"do not have the necessary conceptual understanding of a fraction". However, if
students are taught that the "whole" is a pizza, they may very well think of 1 as
the shape of the pizza, so each fraction becomes a shape and they naturally would
not know how to put two shapes together to get a fraction. Students would be more
likely to "get it" if, instead of saying that "the pizza represents 1", we tell them
that the area of the pizza (ignoring its depth) represents 1.¹³ Then (assuming they
know what area is) they would have a better chance of seeing that the area of a
piece on the left above is equal to twice the area of a piece on the right, so that the
answer to the question should be $\frac{3}{8}$.
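The area reasoning can be checked with exact rationals. This sketch uses fraction division and addition through Python's `fractions.Fraction`, operations that the book develops only later, so it illustrates the answer rather than the reasoning students are expected to use at this stage.

```python
from fractions import Fraction

whole = Fraction(1)                     # the area of one pizza is the unit
left_piece = whole / 4                  # one of the 4 equal pieces on the left
right_piece = whole / 8                 # one of the 8 equal pieces on the right
print(left_piece == 2 * right_piece)    # True: twice the area
print(left_piece + right_piece)         # 3/8
```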
(C) We have been talking about the number line, but in a literal sense this way
of speaking is incorrect. A different choice of the line or even a different choice of
the positions of the numbers 0 and 1 would lead to a different number line. What is
true, however, is that anything done on one number line can be done on any other
in exactly the same way,¹⁴ and therefore we may (and do) identify all of them.
Now it makes sense to speak of the number line.

¹³ The need for precision about the unit exposes the common fallacy that introducing students
to fractions via pizzas is good pedagogy. The area of a curvilinear figure like the disk is too
sophisticated for children. The length of a segment or the area of a rectangle is a better alternative.
¹⁴ Mathematical Aside: All number lines are similar to each other in the sense of the definition
on page 284 so that they can be identified via similarity. Algebraically, the real numbers form a
complete ordered field, and since all complete ordered fields are isomorphic, we can also identify
them via isomorphism.

(D) Although a fraction is formally a point on the number line, the informal
discussion above makes it clear that on an intuitive level, a fraction $\frac{m}{n}$ is just the
segment $[0, \frac{m}{n}]$. So in the back of our minds, the segment image should never go
away completely, and this fact is reflected in the language we now introduce. First,
we give a definition.

Definition. The concatenation of two segments $L_1$ and $L_2$ on the number
line is the line segment obtained by putting $L_1$ and $L_2$ along the number line so that
the right endpoint of $L_1$ coincides with the left endpoint of $L_2$.
[Figure: segments $L_1$ and $L_2$ placed end to end on the number line.]

Thus the segment $[0, \frac{m}{n}]$ is the concatenation of exactly m segments each of
length $\frac{1}{n}$, namely, $[0, \frac{1}{n}], [\frac{1}{n}, \frac{2}{n}], \ldots, [\frac{m-1}{n}, \frac{m}{n}]$ (recall the concept of the length of a
segment on p. 7). Because we identify $[0, \frac{m}{n}]$ with the point $\frac{m}{n}$, and $[0, \frac{1}{n}]$ with $\frac{1}{n}$, it
is natural to adopt the following suggestive terminology. We say $\frac{m}{n}$ is m copies of $\frac{1}{n}$
to mean that the segment $[0, \frac{m}{n}]$ is the concatenation of exactly m segments each
of length $\frac{1}{n}$. More generally, by m copies of $\frac{k}{\ell}$ we mean the segment obtained by
concatenating m segments each of length $\frac{k}{\ell}$.
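Concatenation and the phrase "m copies of $\frac{1}{n}$" can be modeled by representing a segment as a pair of endpoints. The helpers `concatenate` and `copies` below are hypothetical illustrations, and exact rationals (`fractions.Fraction`) again stand in for points on the number line; this is a sketch, not the book's formalism.

```python
from fractions import Fraction


def concatenate(left, right):
    """Concatenate two segments given as (left endpoint, right endpoint):
    slide `right` so its left endpoint lands on the right endpoint of `left`."""
    a, b = left
    c, d = right
    return (a, b + (d - c))


def copies(m, length):
    """m copies of a segment of the given length, laid end to end starting at 0."""
    seg = (Fraction(0), Fraction(0))      # zero copies: the degenerate segment [0, 0]
    for _ in range(m):
        seg = concatenate(seg, (Fraction(0), length))
    return seg


print(copies(5, Fraction(1, 3)))   # (Fraction(0, 1), Fraction(5, 3)), i.e. [0, 5/3]
```

The same `copies` call with a length such as $\frac{3}{7}$ illustrates the more general "m copies of $\frac{k}{\ell}$".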

(E) In school mathematics, the meaning of the equal sign is a subject that
is much discussed (see, e.g., Chapter 2 of [Carpenter et al.]), mainly because the
meaning of equality is never made clear. In addition, you will find later on the
traditional use of the word equivalent for fractions when equal is meant. Such a
cavalier attitude towards "equality" only adds to the confusion. For this reason,
we make explicit the fact that, by definition, two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ are equal
(or, equivalent), in symbols, $\frac{k}{\ell} = \frac{m}{n}$, if they are the same point on the number
line. We have already seen in equation (1.1), for example, that $\frac{kn}{n} = \frac{k}{1} = k$ for any
n, k ∈ N. Incidentally, because there is no definition for fractions in TSM, TSM
has no precise meaning for the equality of two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$.
(F) The definition of a fraction as a point on the number line allows us to make
precise the concept of order among fractions, i.e., the concept that one fraction is
smaller than (or bigger than) another. First, consider the case of whole numbers.
The way we put the whole numbers on the number line, a whole number m is
smaller than (or less than) another whole number n (in symbols, m < n) if m
is to the left of n (thus [0, m] is shorter than [0, n]). We expand on this fact by
defining, for two points A and B on the number line, A is smaller than B if A
is to the left of B. In symbols, A < B. We call this an inequality. In particular,
if A and B are fractions, A < B means A is to the left of B.
[Figure: points A and B on the number line, with A to the left of B.]

Thus, $\frac{4}{3} < \frac{5}{3}$ because, in the sequence of thirds, the 4th member in the sequence is
to the left of the 5th member of the sequence (see page 9).
There are two things about the definition of A < B that are worth noting here.
One is that, according to this definition, one cannot compare two given fractions
until both fractions are placed on the same number line. More to the point, what
this says is that we can compare fractions only when they refer to the same unit.
The other is that in TSM, the concept of A < B between fractions is never defined
beyond inscrutable statements such as "A < B when B names a greater amount
than A". The reason for this omission is obvious: since there is no definition for the
concept of a fraction, a fraction is an unknown object and, therefore, it is impossible
to say how one unknown object A can be smaller than another unknown object B.
Sometimes the inequality B > A is used in place of A < B. Then we say B
is bigger (or greater) than A. This is the place to mention two related symbols.
A ≤ B means either A < B or A = B. Sometimes we refer to an inequality such
as A ≤ B as a weak inequality. For example, the weak inequality $\frac{1}{2} \le \frac{1}{2}$ may
seem odd at first glance, but when it is realized that all it says is either $\frac{1}{2} < \frac{1}{2}$ or
$\frac{1}{2} = \frac{1}{2}$, we have to agree that $\frac{1}{2} \le \frac{1}{2}$ is a correct statement since it is correct that
$\frac{1}{2} = \frac{1}{2}$. The other new symbol is A ≥ B, which means either A > B or A = B.
Analogously, it is also correct to say that $\frac{1}{2} \ge \frac{1}{2}$.
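The definitions of <, ≤, and ≥ can be mirrored directly in code. In this sketch (hypothetical helper names, with `fractions.Fraction` standing in for points), `weakly_less` is written literally as "A < B or A = B", which makes plain why $\frac{1}{2} \le \frac{1}{2}$ holds even though $\frac{1}{2} < \frac{1}{2}$ does not.

```python
from fractions import Fraction


def less_same_denominator(m, k, n):
    """m/n < k/n: in the sequence of n-ths, the m-th point is left of the k-th."""
    if n <= 0:
        raise ValueError("n must be a nonzero whole number")
    return m < k


def weakly_less(A, B):
    """A <= B is, by definition, the statement 'A < B or A = B'."""
    return A < B or A == B


def weakly_greater(A, B):
    """A >= B is, by definition, the statement 'A > B or A = B'."""
    return A > B or A == B


half = Fraction(1, 2)
print(less_same_denominator(4, 5, 3))        # True: 4/3 < 5/3
print(half < half, weakly_less(half, half))  # False True
print(weakly_greater(half, half))            # True
```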


For the purpose of defining A < B when two points A and B are given on the
number line, all that is required is that the right-pointing direction be singled out
on the line. In greater detail, since the right-pointing direction is singled out on the
number line by the placement of 1 to the right of 0, the meaning of A < B is that,
when going towards the right, we encounter the point A before encountering B.
Thus, for the concept of order, the distance between 0 and 1 is of no consequence;
it is the "direction" set by going from 0 to 1 that matters. In Chapter 4, when
we consider lines in the plane that are not horizontal (see page 231), the idea of a
"direction" will be seen to be crucial.
(G) With the availability of the concept of order among fractions, we can revisit
the concept of a segment AB, where A and B are two points on the number line
and A is to the left of B (see page 5). By definition, AB consists of the two points
A and B, together with all the points C between A and B, which means all the
points C so that C is to the right of A and to the left of B.
[Figure: points A, C, B on the number line, with C between A and B.]

In terms of the preceding definition of order, we may rephrase AB as the collection
of all the points (numbers) C so that A ≤ C ≤ B. We note that, in view of the
observation at the end of remark (F), the concept of a point between two other
points on the number line will make sense as soon as a right-pointing direction is
specified.
Since specifying a right-pointing direction is equivalent to specifying a left-
pointing direction, we could have rephrased the preceding discussion in terms of a
left-pointing direction. For a more elaborate discussion of this and related ideas,
see the discussion of betweenness on pp. 167ff. and the discussion on pp. 231ff.
(H) A final remark has to do with the fact that a fraction such as $\frac{2}{3}$, $\frac{14}{5}$, or $\frac{k}{\ell}$ is
one number. Students are known to raise the issue of why three symbols (k, ℓ, and
the "fraction bar") are needed to denote one number. Remember that a fraction
is a point on the number line, so that the symbols employed are merely means to
an end; namely, they serve to indicate where each point is located on the number
line. Thus the symbol $\frac{14}{5}$ says precisely that $\frac{14}{5}$ is the 14-th point in the sequence
of 5-ths to the right of 0. Clearly, every part of the symbol $\frac{14}{5}$ is needed for this
purpose, namely, the number 5, the number 14, and the "fraction bar" in between.
The need for 5 and 14 is obvious; the role of the "fraction bar" is to separate the
5 from the 14 so that, for example, one does not confuse $\frac{14}{5}$ with 145. It is with
the same need for separation in mind that when a fraction such as $\frac{14}{5}$ is sometimes
shown horizontally as 14/5, there is a slant bar between the 14 and the 5.
This piece of information about what makes up the fraction symbol should be
clearly conveyed to students in grades 5–7.

Decimal fractions and decimals

We now single out a special class of fractions: fractions whose denominators
are of the form $10^n$ for some positive integer n, e.g.,
$$\frac{1489}{10^2}, \quad \frac{24}{10^5}, \quad \frac{58900}{10^4}.$$
These are called the decimal fractions, but they are better known in a more
common notation under a slightly different name, to be described presently. Decimal
fractions were understood and used in China by about 400 AD, but most
likely they were transmitted to Europe as part of the so-called Hindu-Arabic numeral
system only around the twelfth century. In 1593, the German Jesuit priest
C. Clavius, the Vatican astronomer who was the main architect of the Gregorian
calendar, introduced the idea of writing a decimal fraction without the fraction
symbol: just use the numerator and then keep track of the number of zeros in the
denominator (2 in the first decimal fraction, 5 in the second, and 4 in the third of
the above examples) by the use of a dot, the so-called decimal point; thus,
$$14.89 = \frac{1489}{10^2}, \quad 0.00024 = \frac{24}{10^5}, \quad 5.8900 = \frac{58900}{10^4}, \tag{1.3}$$
respectively (see [Ginsburg]). The rationale of the notation is clear: the number
of digits to the right of the decimal point, the so-called decimal digits, keeps
track of the power of 10 in the respective denominators, 2 in 14.89, 5 in 0.00024,
and 4 in 5.8900. In this notation, these numbers are called finite or terminating
decimals.¹⁵ Until we introduce the concept of infinite decimals (in Chapter 3
of the third volume, [Wu2020c]), we will usually omit any mention of "finite" or
"terminating" and just say decimals. Notice the convention that, in order to keep
track of the power 5 in $\frac{24}{10^5}$, three zeros are added to the left of 24 to make sure
that there are 5 digits to the right of the decimal point in 0.00024. Note also that
the 0 in front of the decimal point is only for the purpose of clarity and is optional.
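Clavius's notation amounts to a simple string transformation: write the numerator, then place the decimal point so that exactly n digits follow it, padding with zeros on the left when needed. The sketch below assumes the numerator is a whole number and n is a positive integer; the function names are hypothetical.

```python
def decimal_from_fraction(numerator, n):
    """Write numerator / 10**n in decimal notation, with exactly n decimal digits."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    digits = str(numerator).rjust(n + 1, "0")   # pad with zeros on the left if needed
    return digits[:-n] + "." + digits[-n:]


def fraction_from_decimal(s):
    """Read a finite decimal back as (numerator, n), meaning numerator / 10**n."""
    whole, _, decimals = s.partition(".")
    return int(whole + decimals), len(decimals)


print(decimal_from_fraction(1489, 2))     # 14.89
print(decimal_from_fraction(24, 5))       # 0.00024
print(decimal_from_fraction(58900, 4))    # 5.8900
print(fraction_from_decimal("0.00024"))   # (24, 5)
```

Because `fraction_from_decimal` counts the digits after the point, it also accepts ".00024" without the optional leading 0.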
You may be struck by the odd-looking number 5.8900, because you have probably
been told by TSM that it is ok to omit the zeros at the right end of a decimal
and simply write 5.89. Before going any further with this thought, just be
aware that, according to the definition of a decimal, to say 5.8900 is equal to 5.89
is to say that the following two fractions are equal:
$$\frac{58900}{10^4} \quad \text{and} \quad \frac{589}{10^2}.$$
This fact is correct, but we must be able to prove that it is correct before we
can use it. We will give a proof in the next section.
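Python's `fractions.Fraction` can confirm this particular equality, but only because it reduces every fraction to lowest terms, which relies on precisely the kind of fact the next section proves. Treat the snippet as a numerical check under that assumption, not as a proof.

```python
from fractions import Fraction

a = Fraction(58900, 10**4)    # 5.8900
b = Fraction(589, 10**2)      # 5.89
print(a == b)                 # True
# The common factor behind the equality: the numerator and denominator of
# 58900/10**4 are each 100 times those of 589/10**2.
print(58900 == 589 * 100 and 10**4 == 10**2 * 100)   # True
```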

¹⁵ Regardless of what TSM has to say, this is the correct definition of a finite decimal, one
that has been adopted by CCSSM.

Locating fractions on the number line

We conclude this section by giving some examples of locating fractions on the
number line. Let us start with $\frac{4}{3}$, for example. As usual, this is the fourth point to
the right of 0 in the sequence of thirds:
[Figure: the number line with 0 and 1 marked and the point $\frac{4}{3}$ labeled.]

Activity. Can you locate the fraction $\frac{20}{15}$? How is it related to $\frac{4}{3}$?

Next we consider the problem of locating a fraction such as $\frac{84}{17}$, approximately,
on the number line; i.e., on the following line, where should $\frac{84}{17}$ be placed,
approximately?
[Figure: the number line with the whole numbers 0 through 8 marked.]

The key idea will turn out to be the use of division-with-remainder for whole
numbers.¹⁶ First, look at the multiples of 17: 0, 17, 34, 51, 68, 85, . . . . Thus the
68th multiple of $\frac{1}{17}$ is 4 (because 68 = 4 × 17), and the 85th multiple of $\frac{1}{17}$ is 5
(because 85 = 5 × 17). Therefore, $\frac{84}{17}$ lies between $\frac{68}{17}$ (= 4) and $\frac{85}{17}$ (= 5) and is just
$\frac{1}{17}$ shy of 5; i.e., $\frac{84}{17}$ is the point on the number line which is $\frac{1}{17}$ to the left of 5. In
terms of division-with-remainder, since 84 = (4 × 17) + 16, we have
$$\frac{84}{17} = \frac{(4 \times 17) + 16}{17}.$$
So if each step we take is of length $\frac{1}{17}$, going another 16 steps to the right of 4 will
get us to $\frac{84}{17}$. If we go 17 steps instead, we will get to 5. Therefore $\frac{84}{17}$ should be
quite near 5, as shown:
[Figure: the number line from 0 to 8, with an arrow pointing to $\frac{84}{17}$ just to the left of 5.]
In general, if $\frac{m}{n}$ is a fraction and division-with-remainder gives m = qn + k,
where q and k are whole numbers and 0 ≤ k < n, then
$$\frac{m}{n} = \frac{qn + k}{n},$$
and the position of $\frac{m}{n}$ on the number line will be between q (= $\frac{qn}{n}$) and q + 1
(= $\frac{(q+1)n}{n}$, which is $\frac{qn+n}{n}$), or at q itself when k = 0.
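The procedure above is what Python's built-in `divmod` supplies: the quotient q and the remainder k. This sketch, with a hypothetical helper `locate` and `fractions.Fraction` used only to check the position, returns (q, k) and confirms q ≤ m/n < q + 1 for the example in the text.

```python
from fractions import Fraction


def locate(m, n):
    """Division-with-remainder: return (q, k) with m = q*n + k and 0 <= k < n.

    Then m/n is k steps of length 1/n to the right of the whole number q,
    so q <= m/n < q + 1 (with m/n = q exactly when k = 0).
    """
    if n <= 0:
        raise ValueError("n must be a nonzero whole number")
    return divmod(m, n)


q, k = locate(84, 17)
print(q, k)                            # 4 16
print(q <= Fraction(84, 17) < q + 1)   # True
print(17 - k)                          # 1: one more step of 1/17 reaches 5
```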

Caution. Because the above reasoning gives $\frac{84}{17}$ as $\frac{16}{17}$ beyond 4, most school
textbooks would tell you that $\frac{84}{17} = 4\frac{16}{17}$. The latter is of course an example of
a mixed number. Similarly, the above $\frac{m}{n}$ is supposed to be written as $q\frac{k}{n}$. As a
teacher, however, you should exercise self-control and not introduce the concept of
a mixed number or the symbol $4\frac{16}{17}$ at this point. If you do, what would you tell
your students about the meaning of the symbol $4\frac{16}{17}$? Do not say, as TSM does,
that it is 4 and $\frac{16}{17}$, because how would your students interpret the word "and" in
this context?¹⁷ It actually means the addition of the fraction 4 and the fraction $\frac{16}{17}$,
as in $4 + \frac{16}{17}$. But fraction addition has not yet been defined, so you would confuse
students by abusing the word "and" here. TSM has a habit of using this word
"and" inappropriately. See page 35 for another flagrant example.

¹⁶ "Division-with-remainder" in school mathematics is what is usually called the "division
algorithm" in abstract algebra texts. There is a good reason why the latter terminology is not
used in school mathematics: it would be too easily confused with the "long division algorithm".

Pedagogical Comments. It is easy to divide a given segment into 2 equal
parts, 4 equal parts, 8 equal parts, etc. However, many teachers have raised the
issue of how to divide a given segment into 3, 7, or in general n equal parts, where
n is not a power of 2. There are two answers to this question. The first one is
to cheat, but to cheat honestly. Suppose you want to show a division of a unit
segment into 3 equal parts. What you do is not to start with the unit segment
but to start with a short segment (say, $\frac{1}{2}$ inch long) that will be your $\frac{1}{3}$ (see the
thickened segment below) and then duplicate it two more times. Now you declare
the resulting segment with the built-in equal division into thirds to be your unit
segment:
[Figure: the segment from 0 to 1 made up of three copies of the thickened short segment.]

Of course the same trick works for equal division into n equal parts for any nonzero
whole number n.
A second answer is to confront the issue directly: given a segment, we will show how to use plastic triangles and a compass to divide it into any number of equal parts. Suppose we have to divide a given segment AB into 3 equal parts. Referring to the picture below, we draw an arbitrary ray L (see page 174 for the precise definition) issuing from A and, using a compass, mark off three points C, D, and E in succession on L so that AC, CD, and DE have equal length, the precise length of AC being irrelevant. For example, if you make each of AC, CD, and DE half an inch long, then you do not even need a compass. Join BE, and through C and D draw lines parallel to BE 18 that intersect AB at C′ and D′, respectively. The points C′ and D′ are then the desired division points on AB to achieve the equidivision, i.e., AC′, C′D′, and D′B are of equal length, as shown:

17 To understand why this confuses students, if "4 and $\frac{16}{17}$" is all they know about $4\frac{16}{17}$, we can look ahead and ask how they can compute with something like $4\frac{16}{17} \times 12\frac{3}{17}$?
18 The use of plastic triangles for this purpose is probably well known, but if not, see pp. 20–22 of [Wu2002] or pp. 243–244 of [Wu2016a].


1.1. DEFINITION OF A FRACTION 17

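[Figure: segment AB with the ray L issuing from A; the points C, D, E are marked in succession on L, BE is joined, and the lines through C and D parallel to BE meet AB at C′ and D′]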

In Section 7.1 of [Wu2020b], there is a proof of why AC′, C′D′, and D′B are of equal length. It is clear how to modify this construction if AB is to be divided into n equal parts for any nonzero whole number n. End of Pedagogical Comments.
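For readers who want a numerical sanity check of the construction, here is a Python sketch under assumptions that are not in the text: the figure is placed in coordinates with A = (0, 0) and B = (b, 0), the equal steps along L are given by a vector v = (vx, vy) with vy ≠ 0, and `equidivide` is a hypothetical helper name. It computes where the lines parallel to the last joining line meet AB; it is an illustration only and not a substitute for the geometric proof cited above.

```python
from fractions import Fraction


def equidivide(b, v, n):
    """Parallel-line construction in coordinates (an assumption, not the text's setting).

    A = (0, 0), B = (b, 0); the marked points on the ray L are i*v for
    i = 1, ..., n, where v = (vx, vy) with vy != 0.  Returns the x-coordinates
    where the lines through the 1st, ..., (n-1)-th marked points, parallel to
    the line joining B to the n-th marked point, meet AB.
    """
    vx, vy = Fraction(v[0]), Fraction(v[1])
    if vy == 0:
        raise ValueError("the ray L must not lie along AB")
    b = Fraction(b)
    dx, dy = n * vx - b, n * vy          # direction of the line from B to the n-th point
    points = []
    for i in range(1, n):
        px, py = i * vx, i * vy          # the i-th marked point on L
        s = -py / dy                     # parameter where the parallel line reaches y = 0
        points.append(px + s * dx)
    return points


print(equidivide(1, (Fraction(1, 2), Fraction(1, 3)), 3))
# [Fraction(1, 3), Fraction(2, 3)]
```

With b = 1 and n = 3 the division points come out as 1/3 and 2/3; changing the (non-horizontal) direction of L does not change them.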

Exercises 1.1. In doing these and subsequent exercises, please observe the following basic rules:
(a) Show your work; your explanation is as important as your answer.
(b) Be clear. Get used to the idea that, as a teacher, everything you say has to be understood.
(1) Indicate the approximate position of each of the following on the number line, and briefly explain why: (a) 1.24, (b) $\frac{186}{11}$, (c) $\frac{1257}{132}$, (d) $\frac{77}{355}$, (e) $\frac{132}{1257}$.
(2) Suppose the unit 1 on the number line is the area of the following region enclosed by the thickened segments, where the given square is divided into eight congruent rectangles (and therefore eight parts of equal area):19

Observe that a region whose area is $\frac{1}{8}$ of the area of the given square is the fraction $\frac{1}{5}$ relative to this unit. In terms of this unit, what is the fraction represented by the area of each of the following shaded regions of the same square? Give a brief explanation of your answer. (In the picture in the middle, the square is divided into eight congruent rectangles, and in the picture on the right, two copies of the same square share a common side and the square on the right is divided into four parts of equal area.)

19 We will give a precise definition of congruence in Chapter 4 and will formally discuss area in Chapter 4 of [Wu2020c]. In this chapter, we only make use of both concepts in the context of triangles and rectangles, and then only in the most superficial way. For the purpose of understanding this chapter, you may therefore take both concepts in the intuitive sense. If anything more than intuitive knowledge is needed, it will be supplied on the spot, e.g., on page 47 later on in this chapter.
(3) With the unit as in Exercise 2 above, write down the fraction representing the area of the following shaded region (assume that the top and bottom sides of the square are each divided into three segments of equal length):

(Hint: If you divide the given square horizontally into 8 congruent rectangles, then you can figure out the area of the shaded region in terms of the $\frac{1}{24}$'s of the area of the given square.)
(4) It was emphasized in the text (page 11) that the concept of "equal parts" should be made precise in the teaching of fractions. This exercise gives one example to illustrate this need. There are numerous others (see, e.g., the case of Two Green Triangles on page 86 of the case book [Barnett-Goldstein-Jackson]).
A text on professional development claims that students’ conception of "equal parts" is fragile and is prone to errors. As an illustration, it says that when a circle is presented as in the left picture below to students,

[Figure: two circles, each divided into three pieces; left, as presented to students; right, as drawn by students]

they have no trouble shading $\frac{2}{3}$. However, it goes on to say that when these same students are asked to construct their own picture of $\frac{2}{3}$, we often see them create pictures with unequal pieces as in the right picture above.
(a) Referring to the latter presentation of "thirds" by students, in what sense are the three pieces "equal"? In what sense are they "unequal"? (b) What would you do as a teacher to prevent students from acquiring such a misconception about "equal parts"?
(5) Ellen ate $\frac{1}{5}$ of a large pizza with a 10-inch diameter and Kate ate $\frac{1}{4}$ of a smaller pizza with an 8-inch diameter. (Assume that all pizzas have the same thickness and that the fractions of a pizza are measured in terms of area.) Ellen told Kate that since she had eaten more pizza than Kate, $\frac{1}{5} > \frac{1}{4}$. (i) Did Ellen eat more pizza than Kate? (ii) Is Ellen’s assertion that $\frac{1}{5} > \frac{1}{4}$ correct? Explain why or why not. (You are allowed to use the usual area formula for a circle.)
(6) (Review remark (B) on page 11 on the importance of the unit before doing this exercise. Also make sure that you do it by a careful use of the definition of a fraction rather than by some intuition you possess that you cannot explain to your students.)
1.2. EQUIVALENT FRACTIONS 19

(a) After driving 218 miles, we have gone only two-thirds of the distance we planned to drive for the day. How many miles did we plan to drive for the day? Explain.
(b) After reading 236 pages of a book, I am exactly four-fifths of the
way through. How many pages are in the book? Explain.
(c) Alexandra was three-quarters of the way to school after having
walked 0.72 miles from home. How far is her home from school? Explain.
(7) Take a pair of opposite sides of a unit square and divide each side into
7 equal parts. Join the corresponding points of division to obtain 7 thin
rectangles (we will assume that these are rectangles). For the remaining
pair of opposite sides, divide each into 5 equal parts and also join the
corresponding points of division; these lines are perpendicular to the other
7 lines. The intersections of these 7 and 5 lines create 7×5 small rectangles
which are congruent to each other (we will assume that too). What is the
area of each such small rectangle, and why? (Compare page 48 below.)
(8) Three segments (thickened) are on the number line, as shown:

[Figure: number line with the whole numbers 3, 4, 5, 6, 7 marked and the points A, B, $\frac{137}{25}$, and C labeled, from left to right]

It is known that the length of the left segment is $\frac{11}{16}$, that of the middle segment is $\frac{8}{17}$, and that of the right segment is $\frac{23}{25}$. What are the fractions A, B, and C? (Caution: Remember that you have to explain your answers, and that you know nothing about "mixed numbers" until we come to this concept on page 36 below.)
(9) The following is found in a certain third-grade workbook:
Each of the following figures represents a fraction:

Point to two figures that have the same fractions shaded.

Does this problem make sense as it stands? If so, explain your answer
clearly. If not, how would you rephrase it so that it makes sense?

1.2. Equivalent fractions


We prove in this section the fundamental theorem in the subject of fractions: the theorem on equivalent fractions. A first consequence of this theorem is the all-important cross-multiplication algorithm, which is mistaken to be a "rote skill" in the mathematics education literature. Unlike TSM, we also explicitly define the concept of $\frac{k}{\ell}$ of $\frac{m}{n}$ and use it to prove the division interpretation of a fraction, which is erroneously considered in TSM to be part of the definition of a fraction.
The fundamental theorem (p. 20)
The cross-multiplication algorithm (p. 21)
The concept of $\frac{k}{\ell}$ of $\frac{m}{n}$ (p. 24)
The division interpretation of a fraction (p. 28)

The fundamental theorem

Recall that two fractions are said to be equal (or equivalent) if they are the same point on the number line. For example, we observed in equation (1.2) on page 10 that $\frac{nk}{n} = \frac{k}{1}$, as both are equal to k. The following is a generalization of this simple fact.

Theorem 1.1 (Theorem on equivalent fractions). Given two fractions $\frac{m}{n}$ and $\frac{k}{\ell}$, suppose there is a nonzero whole number c so that $k = cm$ and $\ell = cn$. Then $\frac{m}{n} = \frac{k}{\ell}$.

Theorem 1.1 is usually stated more briefly as follows:
$$\frac{m}{n} = \frac{cm}{cn} \quad \text{for all fractions } \frac{m}{n} \text{ and all whole numbers } c \neq 0. \tag{1.4}$$
In this form, Theorem 1.1 is sometimes called the cancellation law for fractions, because one simply "cancels" the whole number c from the numerator and the denominator. This is the justification for the usual method of reducing fractions, e.g., $\frac{51}{34} = \frac{3}{2}$ because we can cancel the common factor 17 from the "top and bottom" of the fraction:
$$\frac{51}{34} = \frac{17 \times 3}{17 \times 2} = \frac{3}{2}. \tag{1.5}$$
We will explain below (see page 22) why we choose to state Theorem 1.1 in
this clumsy fashion.

Proof. First look at a special case: why is $\frac{4}{3}$ equal to $\frac{5\times 4}{5\times 3}$? We have as usual the following picture:

[Figure: the sequence of thirds on the number line, with 0, 1, and $\frac{4}{3}$ marked]

Now suppose we further divide each of the segments between consecutive points in the sequence of thirds into 5 equal parts. Then each of the segments [0, 1], [1, 2], [2, 3], . . . is now divided into 5 × 3 = 15 equal parts and, in an obvious way, we have obtained the sequence of fifteenths on the number line:

[Figure: the same number line subdivided into fifteenths, with 0, 1, and $\frac{4}{3}$ marked]

The point $\frac{4}{3}$, being the 4-th point in the sequence of thirds, is now the 20-th point in the sequence of fifteenths because 20 = 5 × 4. The latter is by definition the fraction $\frac{20}{15}$, i.e., $\frac{5\times 4}{5\times 3}$. Thus $\frac{4}{3} = \frac{5\times 4}{5\times 3}$.
The preceding reasoning is enough to prove the general case. Thus let $k = cm$ and $\ell = cn$ for whole numbers c, k, $\ell$, m, and n, with c and n nonzero. We will prove that $\frac{m}{n} = \frac{k}{\ell}$. In other words, we will prove equation (1.4) above.
The fraction $\frac{m}{n}$ is the m-th point in the sequence of n-ths. Now divide each of the segments between consecutive points in the sequence of n-ths into c equal parts, so that each of [0, 1], [1, 2], [2, 3], . . . is now divided into cn equal parts. Thus the sequence of n-ths, together with the new division points, becomes the sequence of cn-ths. Simple reasoning shows that the m-th point in the sequence of n-ths must be the cm-th point in the sequence of cn-ths: each of the m segments between consecutive n-ths from 0 to $\frac{m}{n}$ now contains exactly c of the new segments. This is another way of saying $\frac{m}{n} = \frac{cm}{cn}$. The proof is complete.
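The usual method of reducing fractions, which the cancellation law (1.4) justifies, can be sketched in Python as follows. The helper name `reduce_fraction` is hypothetical; `math.gcd` is the standard-library greatest common divisor, and cancelling it from top and bottom is one application of (1.4) with c equal to that common factor.

```python
from math import gcd


def reduce_fraction(numerator: int, denominator: int):
    """Cancel the greatest common factor c from top and bottom: cm/cn -> m/n."""
    if denominator == 0:
        raise ValueError("denominator must be nonzero")
    c = gcd(numerator, denominator)
    return numerator // c, denominator // c


print(reduce_fraction(51, 34))  # (3, 2): the common factor 17 is cancelled
```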
As mentioned earlier (page 12), it is a tradition in school mathematics to say that two fraction symbols $\frac{k}{\ell}$ and $\frac{m}{n}$ are equivalent if they are equal, i.e., if $\frac{k}{\ell}$ and $\frac{m}{n}$ are the same point. In this terminology, Theorem 1.1 gives a sufficient condition for two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ to be equivalent, and this accounts for the name of the theorem.
There seems to be little awareness of the power of Theorem 1.1 on equivalent fractions in TSM. Consequently, the role played by Theorem 1.1 in the TSM curriculum is a minimal one: it is mostly used to reduce fractions. This is wrong, because Theorem 1.1 is the fundamental fact about fractions. As a first demonstration of this claim, we now use Theorem 1.1 to bring closure to the discussion on page 14 of the last section about the decimal 5.8900. Recall that we had, by definition,
$$\frac{58900}{10^4} = 5.8900.$$
We will show that 5.8900 = 5.89 and, more generally, one can append zeros to or delete zeros from the right end of a finite decimal (to the right of the decimal digits) without changing the number. Indeed,
$$5.8900 = \frac{58900}{10^4} = \frac{589 \times 10^2}{10^2 \times 10^2} = \frac{589}{10^2} = 5.89,$$
where the third equality makes use of the cancellation law (1.4). The reasoning is of course valid in general; e.g.,
$$12.7 = \frac{127}{10} = \frac{127 \times 10^4}{10 \times 10^4} = \frac{1270000}{10^5} = 12.70000.$$
The rest of this chapter may be said to be nothing more than an extended demonstration of the importance of Theorem 1.1.
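The argument about appending or deleting zeros can be mirrored with pairs of whole numbers. In this Python sketch (the helper names `as_fraction` and `append_zeros` are hypothetical), a finite decimal is read as a pair (N, 10^d) standing for $\frac{N}{10^d}$, and appending j zeros multiplies both entries by 10^j, which by (1.4) does not change the fraction.

```python
def as_fraction(decimal: str):
    """Read a finite decimal such as "5.8900" as (N, 10**d), i.e., N / 10**d,
    where d is the number of digits after the decimal point."""
    whole, _, digits = decimal.partition(".")
    return int(whole + digits), 10 ** len(digits)


def append_zeros(fraction, j: int):
    """Appending j zeros multiplies top and bottom by 10**j, as in (1.4)."""
    numerator, denominator = fraction
    return numerator * 10 ** j, denominator * 10 ** j


print(as_fraction("5.8900"))                 # (58900, 10000)
print(append_zeros(as_fraction("5.89"), 2))  # (58900, 10000)
```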

The cross-multiplication algorithm

We have just seen that Theorem 1.1 gives a sufficient condition for two fractions $\frac{m}{n}$ and $\frac{k}{\ell}$ to be equal: there is a nonzero whole number c so that $k = cm$ and $\ell = cn$. There is an obvious interest in such a sufficient condition because each symbol represents a point on the number line and one would like to be able to decide whether the two symbols represent the same point or not. On the other hand, the condition in Theorem 1.1 is not a necessary condition, in the sense that the equality $\frac{m}{n} = \frac{k}{\ell}$ does not imply that $k = cm$ and $\ell = cn$ for some whole number c. For example, Theorem 1.1 shows that $\frac{3}{2} = \frac{21}{14}$ (as 21 = 7 × 3 and 14 = 7 × 2), so that coupled with (1.5) on page 20, we have
$$\frac{21}{14} = \frac{51}{34}.$$
However, there is clearly no whole number c so that c times 21 yields 51 or that the same c times 14 yields 34. It turns out that, with a mild twist, Theorem 1.1 can be used to give a necessary and sufficient condition for two fractions to be equal. Precisely:

Theorem 1.2 (Cross-multiplication algorithm). A necessary and sufficient condition for two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ to be equal is that $kn = \ell m$.

In the context of Theorem 1.2, it becomes clear why we chose to state Theorem 1.1 in that clumsy way: it exhibits the close relationship between the theorem on equivalent fractions and the cross-multiplication algorithm.
For later needs, we pause to note that there are several different but equally valid ways to state Theorem 1.2. One way is to say that
$$\frac{k}{\ell} = \frac{m}{n} \quad \text{if and only if} \quad kn = \ell m.$$
Another says
$$\frac{k}{\ell} = \frac{m}{n} \quad \text{is equivalent to} \quad kn = \ell m.$$
A more symbolic way is
$$\frac{k}{\ell} = \frac{m}{n} \iff kn = \ell m.$$
No matter how the theorem is stated, all it says is that both of the following statements are valid:
First, $\frac{k}{\ell} = \frac{m}{n}$ implies $kn = \ell m$.
Second, $kn = \ell m$ implies $\frac{k}{\ell} = \frac{m}{n}$.
As is well known, each is said to be the converse of the other.

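Proof of Theorem 1.2.

Part 1. We prove that $\frac{k}{\ell} = \frac{m}{n}$ implies $kn = \ell m$. By Theorem 1.1, $\frac{k}{\ell} = \frac{kn}{\ell n}$ and $\frac{m}{n} = \frac{\ell m}{\ell n}$. Because we are assuming $\frac{k}{\ell} = \frac{m}{n}$, we therefore have
$$\frac{kn}{\ell n} = \frac{\ell m}{\ell n}.$$
What this says is that the kn-th multiple of $\frac{1}{\ell n}$ is equal to the $\ell m$-th multiple of $\frac{1}{\ell n}$. This is possible only if $kn = \ell m$.

Part 2. We next prove that $kn = \ell m$ implies $\frac{k}{\ell} = \frac{m}{n}$. The hypothesis implies that
$$\frac{kn}{\ell n} = \frac{\ell m}{\ell n}.$$
By Theorem 1.1, the left side is $\frac{k}{\ell}$ while the right side is $\frac{m}{n}$. Thus we have $\frac{k}{\ell} = \frac{m}{n}$.
The proof of Theorem 1.2 is complete.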
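Theorem 1.2 turns a question about points on the number line into a single comparison of whole numbers. The Python sketch below states it as a function (the name `equal_by_cross_multiplication` is hypothetical, and `ell` stands for $\ell$) and, only as a check, compares it with exact equality of `fractions.Fraction` objects over all small cases; such a finite check is of course no substitute for the proof.

```python
from fractions import Fraction
from itertools import product


def equal_by_cross_multiplication(k: int, ell: int, m: int, n: int) -> bool:
    """Theorem 1.2: k/ell = m/n exactly when k*n == ell*m (ell, n nonzero)."""
    if ell == 0 or n == 0:
        raise ValueError("denominators must be nonzero")
    return k * n == ell * m


# Compare against exact rational arithmetic for every small case.
for k, ell, m, n in product(range(0, 8), range(1, 8), range(0, 8), range(1, 8)):
    assert equal_by_cross_multiplication(k, ell, m, n) == (Fraction(k, ell) == Fraction(m, n))

print(equal_by_cross_multiplication(551, 247, 203, 91))  # True
```

The last line checks the example $\frac{551}{247} = \frac{203}{91}$ discussed in the Pedagogical Comments that follow.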
Pedagogical Comments. One can see the pernicious impact of TSM on school mathematics education when the fundamental Theorem 1.2 is mistaken for a rote skill in the mathematics education literature. To the contrary, once the concept of a fraction has been clearly defined, so that the concept of the equality of two fractions can also be clearly defined, Theorem 1.2 becomes a provable theorem. Let us point out the obvious: a theorem in mathematics is never a rote skill. In this case, more can be said: this is actually an important theorem because it often provides the only reasonable method to check whether two fractions are equal; see, for example, the proof of Theorem 1.3 on page 27. Therefore, the cross-multiplication algorithm has to be an integral part of every student’s mathematical survival kit. We explicitly ask you to teach your students to be proficient in making use of this fundamental result at every opportunity. As a rather trivial application of Theorem 1.2, we see that $\frac{551}{247}$ and $\frac{203}{91}$ are equal because 551 × 91 = 203 × 247. End of Pedagogical Comments.

Mathematical Aside: From the vantage point of abstract algebra, the importance of Theorem 1.2 (and hence of Theorem 1.1) is manifest because the cross-multiplication algorithm is exactly the equivalence relation between ordered pairs of integers (with nonzero second entries) when fractions are defined as equivalence classes of such ordered pairs:
$$(a, b) \sim (a', b') \quad \text{if and only if} \quad ab' = a'b.$$
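The three properties that make this an equivalence relation (reflexive, symmetric, transitive) can be spot-checked by brute force. In the Python sketch below, `related` is a hypothetical name, and the check covers only pairs with entries between −3 and 3 and a nonzero second entry; it illustrates, but does not prove, the claim. Transitivity is where the nonzero second entries are needed.

```python
from itertools import product


def related(p, q) -> bool:
    """(a, b) ~ (a2, b2) if and only if a*b2 == a2*b (second entries nonzero)."""
    (a, b), (a2, b2) = p, q
    return a * b2 == a2 * b


# Integer pairs (a, b) with b != 0, in a small window.
pairs = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if b != 0]

reflexive = all(related(p, p) for p in pairs)
symmetric = all(related(q, p) for p, q in product(pairs, repeat=2) if related(p, q))
transitive = all(
    related(p, r)
    for p, q, r in product(pairs, repeat=3)
    if related(p, q) and related(q, r)
)
print(reflexive, symmetric, transitive)  # True True True
```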
From the proof of Theorem 1.2, we can extract a very useful statement about pairs of fractions, which we call the Fundamental Fact of Fraction-Pairs (FFFP):
Any two fractions may be symbolically represented as two fractions with the same denominator.
Such a denominator is called a common denominator of the two fractions. In other words, there will always be some whole number q, so that these two fractions belong to the same sequence of q-ths. The reason is simple: we can take q = the product of the denominators because, if the fractions are $\frac{m}{n}$ and $\frac{k}{\ell}$, then by Theorem 1.1, we have
$$\frac{m}{n} = \frac{\ell m}{\ell n} \quad \text{and} \quad \frac{k}{\ell} = \frac{nk}{\ell n}. \tag{1.6}$$
The two fractions are now seen to be the $\ell m$-th and $nk$-th members in the sequence of $\ell n$-ths. That said, we should also call attention to the fact that, in some special cases, some fractions can be put on equal footing without having to multiply their denominators. For example, we can use 8 as a common denominator for $\frac{3}{2}$ and $\frac{9}{8}$ because $\frac{3}{2} = \frac{12}{8}$. However, knowing that the product of the denominators in question always works creates a "comfort zone" in such considerations.
We can paraphrase FFFP this way: any two fractions can be put on an equal footing, in the sense that they can always be put in the same sequence of q-ths for some q so that they become directly comparable. In the notation of (1.6), if $\ell m < nk$, then in the sequence of $\ell n$-ths, $\frac{m}{n}$ (being the $\ell m$-th member) is to the left of $\frac{k}{\ell}$ (being the $nk$-th member) and is therefore the smaller of the two. An analogy is to compare 155 inches with 4 meters: one cannot get a sense of which is longer until both measurements are put on an equal footing in the sense that they are expressed in terms of the same unit, e.g., a meter. Then since 1 inch is 2.54 cm, 155 inches is 155 × 2.54 = 393.7 cm = 3.937 meters, which is shorter than 4 meters.20 This is how we can tell that 4 meters is longer. In the same way, given $\frac{2}{3}$ and $\frac{5}{7}$, we may replace them with $\frac{14}{21}$ and $\frac{15}{21}$, respectively, and conclude that $\frac{2}{3} < \frac{5}{7}$.
There will be numerous applications of FFFP in subsequent discussions.
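FFFP and the comparison it enables can be carried out with whole numbers alone. In this Python sketch (the names `common_footing` and `is_less` are hypothetical, and `ell` stands for $\ell$), the two fractions are rewritten over the product of the denominators as in (1.6), and their positions in the common sequence are compared.

```python
def common_footing(m: int, n: int, k: int, ell: int):
    """Rewrite m/n and k/ell over the common denominator ell*n, as in (1.6).

    Returns (ell*m, n*k, ell*n): the two fractions are the (ell*m)-th and
    (n*k)-th points in the sequence of (ell*n)-ths.
    """
    return ell * m, n * k, ell * n


def is_less(m: int, n: int, k: int, ell: int) -> bool:
    # m/n < k/ell exactly when its position in the common sequence is smaller.
    first, second, _ = common_footing(m, n, k, ell)
    return first < second


print(common_footing(2, 3, 5, 7))  # (14, 15, 21)
print(is_less(2, 3, 5, 7))         # True
```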
The concept of $\frac{k}{\ell}$ of $\frac{m}{n}$
We will give a precise meaning to a common expression, "two-thirds of something", or more generally, "$\frac{k}{\ell}$ of something", and show how to use Theorem 1.1 to compute the precise value in general. There are linguistic traps here that we would do well to avoid, as we now explain.
Consider what is meant by "I ate two-thirds of a pie." A little thought would
reveal that it means I looked at the pie as a circular disk by ignoring its depth, cut
it into 3 parts of equal area, and then ate 2 parts. So "two-thirds of the area of the
pie" is implicitly understood. Another example: what is meant by "he gave three-
fifths of a bag of rice to his roommate"? Most likely, he measured his bag of rice by
weight and, after dividing the bag of rice into 5 equal parts by weight, he gave away
3 parts. Here "three-fifths of the weight of the rice" is again implicitly understood.
These examples point to the ambiguity in the common language because a choice
of the unit (area in the first and weight in the second) is imbedded in the language
and the reader is implicitly expected to "get it". In mathematics, there is no room
for such ambiguity. To drive home this point, consider a similar statement: "I
put away three-quarters of the ham." This time, there is a lot of room for different
interpretations. This could mean "three-quarters of the ham that is 24 inches long"
(so I put away 18 inches of it), or "three-quarters of the whole ham that weighs 8
lbs" (so I put away 6 lbs. of the ham), or "three-quarters of the 3.2 lbs. of meat
from the ham" (so I put away 2.4 lbs. of ham meat).
These examples illustrate the fact that, for the purpose of doing mathematics, the "something" in "$\frac{k}{\ell}$ of something" has to be a number referring to a precise unit, i.e., has to be a point on the number line where the unit is clearly specified. Because the numbers known to us up to this point are fractions, the following definition will directly refer to this "something" as a fraction $\frac{m}{n}$, except in Section 1.5 in [Wu2020c] for the case where $\frac{m}{n}$ is replaced by the length of an arc, which can be an arbitrary positive number.

Definition. Let $\frac{k}{\ell}$ and $\frac{m}{n}$ be fractions. Then $\frac{k}{\ell}$ of $\frac{m}{n}$ means the total length of k parts when the segment $[0, \frac{m}{n}]$ is partitioned into $\ell$ equal parts.

For a reason that will become clear in the following subsection, we have in-
tentionally used the word "partition" in place of the usual word "divide" in the
preceding definition.

Pedagogical Comments. The preceding definition is among the most conceptually complex ones in elementary school mathematics, but we strongly recommend that extra time be spent on learning it because it is critical for understanding fraction multiplication (and therefore also for understanding fraction division). End of Pedagogical Comments.

20 We freely make use of the multiplication of finite decimals here because we are only giving an analogy, not a formal mathematical proof.

Consider, for example, the case of
$$\frac{1}{3} \text{ of } \frac{24}{7}.$$
This is then the length of 1 part when the segment $[0, \frac{24}{7}]$ is partitioned into 3 parts of equal length. Now, $\frac{24}{7} = \frac{3\times 8}{7}$, so that $[0, \frac{24}{7}]$ is 3 copies of $\frac{8}{7}$ (see page 12 for the meaning of "copy"). Thus $\frac{1}{3}$ of $\frac{24}{7}$ is $\frac{8}{7}$. The key point here is that the numerator of $\frac{24}{7}$ is divisible by 3. Next, suppose we want
$$\frac{2}{5} \text{ of } \frac{8}{7}.$$
Now we have to partition $[0, \frac{8}{7}]$ into 5 equal parts and then measure the length of 2 of those parts. But first things first: we have to partition $[0, \frac{8}{7}]$ into 5 equal parts. Noting that 8 is not divisible by 5, we make use of equivalent fractions to get a fraction equivalent to $\frac{8}{7}$ so that its numerator is divisible by 5; i.e., $\frac{8}{7} = \frac{5\times 8}{5\times 7}$. The numerator 5 × 8 is now divisible by 5, and $\frac{8}{7}$ is seen to be 5 copies of $\frac{8}{5\times 7}$, and we conclude that if $[0, \frac{8}{7}]$ is partitioned into 5 equal parts, each part would have length $\frac{8}{35}$. Two of these parts then have length $\frac{2\times 8}{35} = \frac{16}{35}$. Thus, $\frac{2}{5}$ of $\frac{8}{7}$ is $\frac{16}{35}$.
Pictorially, we first locate $\frac{8}{7}$ in the sequence of sevenths:

[Figure: the sequence of sevenths on the number line, with 0, 1, and $\frac{8}{7}$ marked]

$\frac{8}{7}$ is 8 copies of $\frac{1}{7}$. Now subdivide each of these 8 segments of length $\frac{1}{7}$ into 5 equal parts, as shown:

[Figure: the same number line with each seventh subdivided into 5 equal parts; 0, 1, and $\frac{8}{7}$ marked]

The unit segment is now partitioned into 5 × 7 = 35 equal parts, so that the new division points furnish the initial points in the sequence of 35-ths. The segment $[0, \frac{8}{7}]$ is now partitioned into 40 equal parts by this sequence of 35-ths. If we take every 8th division point in this sequence of 35-ths, starting with 0, then we get a partition of $[0, \frac{8}{7}]$ into 5 equal parts. So the length of a part in the latter partition is $\frac{8}{35}$. (Of course, what we have done is merely to reprove the theorem on equivalent fractions in the special case of $\frac{8}{7} = \frac{5\times 8}{5\times 7}$.)

This way of exploiting equivalent fractions will be seen to clarify many aspects
of fractions, such as the interpretation of a fraction as division (next subsection) or
the concept of multiplication (Section 1.4). It also allows us to solve word problems
of the following type.

Example. Prema walked $\frac{2}{5}$ of the distance from home to school, and there was still $\frac{4}{9}$ of a mile to go. How far is her home from school?

Solution. We can draw the distance from home to school on the number line, with 0 being home, the unit 1 being a mile, and S being the distance of the school from home.21 Then it is given that, when the segment from 0 to S is partitioned into 5 equal parts, Prema was at the second division point after 0:

[Figure: the segment from 0 to S partitioned into 5 equal parts, with Prema at the second division point; the remaining distance from Prema to S is marked $\frac{4}{9}$ mi]

For convenience, call any one of these five segments a short segment. Then the total distance from home to school is 5 times the length of a short segment, and we are given that the distance from where Prema stands to S comprises 3 short segments. We are also given that the distance from where Prema stands to S is $\frac{4}{9}$ of a mile. If we can find out how long a third of $\frac{4}{9}$ of a mile is, then we will know the length of a short segment and the problem will be solved. By the theorem on equivalent fractions, we can easily change $\frac{4}{9}$ to an equivalent fraction whose numerator is divisible by 3; e.g.,
$$\frac{4}{9} = \frac{3\times 4}{3\times 9} = \frac{3\times 4}{27} = \frac{4+4+4}{27},$$
and this exhibits $\frac{4}{9}$ as 3 copies of $\frac{4}{27}$. Therefore a third of $\frac{4}{9}$ is $\frac{4}{27}$. The total distance from 0 to S is thus 5 copies of $\frac{4}{27}$, which is
$$\frac{4+4+4+4+4}{27} = \frac{20}{27}.$$
The distance from Prema’s home to school is therefore $\frac{20}{27}$ miles.
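The solution can be replayed step by step in Python. This is only a sketch: the variable names are hypothetical, and `fractions.Fraction` is used to hold lengths; the expression `5 * short` merely records "5 copies of" a length, so no fraction multiplication in the sense of Section 1.4 is needed.

```python
from fractions import Fraction

remaining = Fraction(4, 9)       # miles still to go (given)
short_segments_left = 5 - 2      # Prema is at the 2nd of the 5 division points

# Rewrite 4/9 as (3*4)/(3*9), so that it is 3 copies of 4/27.
short = Fraction(4, 3 * 9)       # length of one short segment
assert short_segments_left * short == remaining

distance = 5 * short             # five short segments from 0 to S
print(distance)                  # 20/27
```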
We would like to point out that the preceding problem is one of the standard problems on fractions which is usually given after the multiplication of fractions has been introduced, and the solution method is given out as a rote algorithm ("flip over $(1 - \frac{2}{5})$ to multiply by $\frac{4}{9}$"). However, we now see that there is no need to use multiplication of fractions for the solution, and, in addition, the reasoning behind the present method of solution is so simple that there is no need to memorize any solution template.
As a final remark on the concept of "$\frac{k}{\ell}$ of $\frac{m}{n}$", we prove the following theorem that will be useful for our consideration of fraction multiplication in Section 1.4. First, we obtain a general formula for $\frac{k}{\ell}$ of $\frac{m}{n}$, which will be part (i) of the theorem. As motivation for part (ii) of the theorem, recall the fact proven above that $\frac{21}{14} = \frac{51}{34}$. If $\frac{m}{n}$ is a fraction, then we expect the following to hold:
$$\frac{21}{14} \text{ of } \frac{m}{n} = \frac{51}{34} \text{ of } \frac{m}{n}.$$
But according to the definition of "of" on page 24, this equality asserts that if $[0, \frac{m}{n}]$ is divided into 14 equal parts, then the length of 21 concatenated parts would be equal to the length of 51 concatenated parts when the same segment $[0, \frac{m}{n}]$ is divided into 34 equal parts. This is not obvious. Moreover, if $\frac{m}{n} = \frac{M}{N}$, then we also expect that
$$\frac{21}{14} \text{ of } \frac{m}{n} = \frac{51}{34} \text{ of } \frac{M}{N}.$$
This equality clearly needs a proof, and part (ii) of the following theorem will take care of that. Finally, the notation in the theorem is worthy of a comment. We will be dealing with four fractions which will be assumed to be equal in pairs. So we use lower case $\frac{k}{\ell}$ and upper case $\frac{K}{L}$ to denote the same fraction in part (ii) to ease the memory load somewhat. The same is true for $\frac{m}{n}$ and $\frac{M}{N}$.

21 You may notice that the unit 1 is not shown in the picture. This is because we do not know ahead of time whether S > 1 or S < 1, so we cannot place 1 on this number line until the problem has been solved.

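Theorem 1.3. Let $\frac{k}{\ell}$, $\frac{m}{n}$, $\frac{K}{L}$, and $\frac{M}{N}$ be fractions. Then:
(i) $\frac{k}{\ell}$ of $\frac{m}{n} = \frac{km}{\ell n}$.
(ii) If $\frac{k}{\ell} = \frac{K}{L}$ and $\frac{m}{n} = \frac{M}{N}$, then
$$\frac{k}{\ell} \text{ of } \frac{m}{n} = \frac{K}{L} \text{ of } \frac{M}{N}.$$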

Proof. We first prove part (i); i.e.,
$$\frac{k}{\ell} \text{ of } \frac{m}{n} = \frac{km}{\ell n}. \tag{1.7}$$
The left side is the length of k concatenated parts when $[0, \frac{m}{n}]$ is divided into $\ell$ equal parts. Because
$$\frac{m}{n} = \frac{\ell m}{\ell n} = \frac{\overbrace{m + m + \cdots + m}^{\ell}}{\ell n},$$
we see that $[0, \frac{m}{n}]$ is $\ell$ copies of $\frac{m}{\ell n}$ (see page 12 for the meaning of "copy"). Therefore if we divide $[0, \frac{m}{n}]$ into $\ell$ equal parts, each part will have length $\frac{m}{\ell n}$. It follows that if we concatenate k of these parts, the total length will be
$$\frac{\overbrace{m + m + \cdots + m}^{k}}{\ell n} = \frac{km}{\ell n}.$$
This proves (1.7). In like manner, we have
$$\frac{K}{L} \text{ of } \frac{M}{N} = \frac{KM}{LN}.$$
Hence, to prove part (ii) of the theorem, we must prove $\frac{km}{\ell n} = \frac{KM}{LN}$. According to Theorem 1.2 (cross-multiplication algorithm), this would be the case if we can prove $kmLN = \ell nKM$. In other words, we have to prove
$$(kL)(mN) = (\ell K)(nM).$$
By the assumption that $\frac{k}{\ell} = \frac{K}{L}$ and by Theorem 1.2, we have $kL = \ell K$. Similarly, the assumption that $\frac{m}{n} = \frac{M}{N}$ leads to $mN = nM$. Therefore $(kL)(mN) = (\ell K)(nM)$, as claimed. The proof of Theorem 1.3 is complete.

It remains to observe that equation (1.7) has a remarkable consequence. By the definition of "$\frac{k}{\ell}$ of $\frac{m}{n}$", the relationship between $\frac{k}{\ell}$ and $\frac{m}{n}$ would not seem to be symmetric, in the sense that there is no reason to believe (strictly according to the definition on page 24) that
$$\frac{k}{\ell} \text{ of } \frac{m}{n} = \frac{m}{n} \text{ of } \frac{k}{\ell}. \tag{1.8}$$
However, (1.7) tells us that, because multiplication between whole numbers is commutative, the equality (1.8) is actually correct: by (1.7), the left side is $\frac{km}{\ell n}$ and the right side is $\frac{mk}{n\ell}$, and these are the same fraction.
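Formula (1.7), part (ii) of Theorem 1.3, and the symmetry (1.8) can be spot-checked on small cases. In the Python sketch below, `of_pair` and `same` are hypothetical names; fractions are kept as pairs (numerator, denominator), and equality is tested by cross-multiplication (Theorem 1.2). A finite check of this kind illustrates the statements but proves nothing.

```python
from itertools import product


def of_pair(k, ell, m, n):
    """Formula (1.7): k/ell of m/n is km/(ell*n); returned as (top, bottom)."""
    return k * m, ell * n


def same(p, q):
    """Theorem 1.2: a/b = c/d exactly when a*d == b*c."""
    return p[0] * q[1] == p[1] * q[0]


# Part (ii) with the text's example 21/14 = 51/34, and M/N = 2m/2n equal to m/n.
for m, n in product(range(0, 6), range(1, 6)):
    assert same(of_pair(21, 14, m, n), of_pair(51, 34, 2 * m, 2 * n))

# (1.8): k/ell of m/n equals m/n of k/ell, for all small whole numbers.
for k, ell, m, n in product(range(0, 6), range(1, 6), repeat=2):
    assert same(of_pair(k, ell, m, n), of_pair(m, n, k, ell))

print("all checks passed")
```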

The division interpretation of a fraction

Using the idea of "$\frac{k}{\ell}$ of $\frac{m}{n}$", we now give a completely different interpretation of a fraction, the so-called division interpretation. We will prove ($n \neq 0$):
$$\frac{m}{n} = \text{the length of one part when } [0, m] \text{ is partitioned into } n \text{ equal parts.} \tag{1.9}$$
Recall that the original definition of $\frac{m}{n}$ is m copies of $\frac{1}{n}$ (see page 12 for the definition of copy), which means that to locate $\frac{m}{n}$, it suffices to consider the unit segment [0, 1], partition it into n equal parts, and concatenate m of these parts. The above statement, to the contrary, says that to locate $\frac{m}{n}$, one can partition, not [0, 1] but [0, m], into n equal parts and then take the first division point to the right of 0. So the two are quite different statements.

Proof of (1.9). We observe that, by the definition of "of" on page 24, the right side of (1.9) is equal to $\frac{1}{n}$ of m, i.e., $\frac{1}{n}$ of $\frac{m}{1}$, which, by Theorem 1.3(i) on page 27, is equal to $\frac{1\times m}{n\times 1} = \frac{m}{n}$, the left side of (1.9). The proof of (1.9) is complete.
Due to the importance of (1.9), we now give a direct proof without appealing to Theorem 1.3(i). To partition [0, m] into n equal parts, we use (1.2) on page 10 to express m as $\frac{nm}{n}$. By the definition of a fraction, m is nm copies of $\frac{1}{n}$. If we first group m copies of $\frac{1}{n}$ together and call it B, then we are saying that [0, m] is n copies of B. Therefore the right side of (1.9) is equal to the length of B. Since B is m copies of $\frac{1}{n}$, its length is equal to $\frac{m}{n}$ by definition. This again proves (1.9).
It may surprise the reader to learn that the right side of (1.9) is actually something familiar to us, at least in certain settings. Consider the special case that m is a whole-number multiple of n; let us say m = kn for some whole number k. In the right side of the equality (1.9), we want the length of one part when [0, m] is partitioned into n equal parts. Since m = kn, [0, m] is just the concatenation of kn copies of the unit segment [0, 1]. Therefore we may partition kn copies of [0, 1] (i.e., [0, m]) into n equal groups ("equal" in terms of length) where each group consists of exactly k copies of [0, 1]. Thus, the right side of (1.9) is just the total length of these k copies of [0, 1], which is k. Now recall the definition of the partitive interpretation of the division m ÷ n, which is the number of objects in a group if m (= kn) objects are divided into n equal groups. If we interpret "object" in this case to be "[0, 1]", then the right side of (1.9) is precisely m ÷ n in the partitive sense.
We repeat, if m = kn for some whole number k, then the right side of (1.9) is equal to the partitive division m ÷ n, and (1.9) becomes the statement that
$$\frac{m}{n} = m \div n. \tag{1.10}$$
Behind the symbols in (1.10) lies this statement: if m is a whole-number multiple of n, then the fraction $\frac{m}{n}$ is the same point on the number line as the partitive division m ÷ n. Moreover, at this moment, (1.10) has no meaning when m is not a whole-number multiple of n, because the division m ÷ n on the right side of (1.10) has no meaning when m is not a whole-number multiple of n.
The reason m ÷ n has no meaning when m is not a whole-number multiple of n is that m ÷ n is a concept originating in whole numbers and is based on counting how many objects there are in a group of objects. In other words, while 12 ÷ 3 is the total number of objects (i.e., 4) in each group when 12 objects are partitioned into 3 equal groups, 2 ÷ 5 has no meaning in whole numbers because there is no whole number k so that when a group of 2 objects is divided into 5 equal groups, there are k objects in each group. However, if we replace counting by measuring length and allow 2 ÷ 5 to be a fraction, then we can divide [0, 2] into 5 segments of equal length where each segment has length $\frac{2}{5}$.

[Figure: the segment [0, 2] divided into 5 equal parts, with the points 0, $\frac{2}{5}$, $\frac{4}{5}$, 1, $\frac{6}{5}$, $\frac{8}{5}$, 2 marked]

This then suggests the idea that if we are willing to "expand" the meaning of m ÷ n even when m is not a whole-number multiple of n, then the right side of (1.9) would be a good starting point; i.e., we can make sense of m ÷ n for any two whole numbers m and n ($n \neq 0$) by defining it to mean the right side of (1.9). This leads to the general concept of the division m ÷ n of any two whole numbers m and n ($n \neq 0$):

Definition. Let m and n be any two whole numbers, n ≠ 0. Then m ÷ n is the length of one part when [0, m] is partitioned into n equal parts.

With this definition at hand, we can now rephrase (1.9) as a theorem:

Theorem 1.4. For any two whole numbers m and n, n ≠ 0,
$$\frac{m}{n} = m \div n.$$
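A small numerical illustration of the Definition and Theorem 1.4 (a consistency check, not a proof) can be sketched in Python with the standard-library type `fractions.Fraction`. The helper `one_part_length` is a hypothetical name introduced only here; the sketch assumes that the length of one part of [0, m] cut into n equal parts is represented by `Fraction(m, n)`, and then checks that n such parts placed end to end fill out [0, m].

```python
from fractions import Fraction

def one_part_length(m: int, n: int) -> Fraction:
    """Length of one part when [0, m] is cut into n equal parts (n != 0).

    Hypothetical helper: by Theorem 1.4 this length is the fraction m/n.
    """
    if n == 0:
        raise ValueError("n must be nonzero")
    return Fraction(m, n)

# n parts of length m/n, placed end to end, fill out [0, m] exactly.
for m in range(0, 10):
    for n in range(1, 10):
        assert sum([one_part_length(m, n)] * n) == m

print(one_part_length(2, 5))   # 2/5, the picture of [0, 2] cut into 5 parts
print(one_part_length(12, 3))  # 4, the old whole-number answer 12 ÷ 3
```

When n divides m, the printed value is the familiar whole-number quotient, which is the content of (1.10).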

There are three observations to be made about this theorem. First, the fact that a fraction $\frac{m}{n}$ is equal to a division m ÷ n is called "the division interpretation of a fraction". However, there are two serious conceptual errors concerning this "interpretation" in TSM: (a) this "interpretation" is a fact that we can prove (see Theorem 1.4) rather than an ad hoc meaning one sees fit to confer on a fraction (as in TSM), and (b) the two meanings of m ÷ n, when m is or is not a whole-number multiple of n, require a careful discussion and differentiation (which is never done in TSM). The failure described in (b) leads to students' confusion about the very meaning of whole-number division. A second observation is that what we called the "expansion" of the meaning of m ÷ n is called, in technical language, an extension of the usual concept of whole-number division. This means that although the whole numbers m and n in the preceding concept of m ÷ n have been freed from the restriction that m be a whole-number multiple of n, when m is a whole-number multiple of n this definition of m ÷ n agrees with the old one on account of the discussion preceding (1.10). The general idea of extension (for the domain of definition of a function²²) is standard in mathematics and will reappear several more times throughout these volumes, e.g., the extension of the concept of division from whole numbers to fractions on page 58, the extension of the concept of subtraction from fractions to rational numbers on page 96, the extension of the arithmetic operations from real numbers to complex numbers in Section 5.2 of [Wu2020b], the extension of the domain of definition of sine and cosine from [0°, 90°] to the number line in Section 1.2 of [Wu2020c], etc.
A third and final observation about Theorem 1.4 is that the extension of the meaning of m ÷ n is inevitable because m and n are fractions after all, and we will soon learn how to divide one fraction by another (see Section 1.5, starting on page 54). Then we will see that Theorem 1.4 gives the correct answer for the division of fractions m ÷ n (see (iii) on page 58).
As a result of the division interpretation of a fraction, we will retire the division symbol "÷" from now on and use fractions $\frac{m}{n}$ to stand for whole-number divisions m ÷ n.
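The notion of extension described in footnote 22 can also be sketched in code. In this hedged Python sketch, `f` plays the role of partitive whole-number division, defined only on pairs (a, b) with b ≠ 0 and b dividing a, while `F` is defined on all pairs with b ≠ 0 and returns a `fractions.Fraction`. Both names are borrowed from the footnote purely for illustration. The loop checks the defining property of an extension: F agrees with f wherever f is defined.

```python
from fractions import Fraction

def f(a: int, b: int) -> int:
    """Partitive whole-number division, defined only on X: b != 0 and b divides a."""
    if b == 0 or a % b != 0:
        raise ValueError(f"{a} ÷ {b} is not defined in whole numbers")
    return a // b

def F(m: int, n: int) -> Fraction:
    """The extension of f: defined for every whole number m and nonzero whole number n."""
    if n == 0:
        raise ValueError("n must be nonzero")
    return Fraction(m, n)

# Extension property: F agrees with f at every pair where f is defined.
for a in range(0, 50):
    for b in range(1, 10):
        if a % b == 0:
            assert F(a, b) == f(a, b)

print(F(2, 5))  # 2/5, even though f(2, 5) raises ValueError
```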

Exercises 1.2. [Reminder] In doing these and subsequent exercises, please observe the following basic rules:
(a) Show your work; your explanation is as important as your answer.
(b) Be clear. Get used to the idea that, as a teacher, everything you say has to be understood.
(1) Explain each of the following as if to an eighth grader, directly and without using Theorem 1.1 or Theorem 1.2, by drawing pictures using the number line:
$$\frac{5}{12} = \frac{10}{24}, \quad \frac{28}{20} = \frac{7}{5}, \quad \text{and} \quad \frac{12}{27} = \frac{4}{9}.$$
(2) Reduce the following fractions to lowest terms,²³ i.e., until the numerator and denominator have no common divisor > 1. (You may use a four-function calculator to test the divisibility of the given numbers by various whole numbers.)
$$\frac{52}{65}, \quad \frac{121}{143}, \quad \frac{157}{85}, \quad \frac{414}{299}, \quad \frac{969}{855}.$$
(Moral: It is not easy to reduce an arbitrary fraction to lowest terms.)
(3) Consider the following proof of the cancellation law (1.4) on page 20: Because c ≠ 0, $\frac{c}{c} = 1$ (by (1.2)). Therefore,
$$\frac{m}{n} = 1 \times \frac{m}{n} = \frac{c}{c} \times \frac{m}{n} = \frac{cm}{cn}.$$
This proof is common in TSM and is among its most egregious errors. Explain what is wrong with this proof.
22 Mathematical Aside: In greater detail, let X be the subset of N × N+ (where N denotes the whole numbers and N+ the nonzero whole numbers) consisting of all pairs (a, b) so that a and b are whole numbers, b ≠ 0, and a is a whole-number multiple of b. Then the standard partitive whole-number division is the mapping f : X → N so that f(a, b) = a ÷ b. The preceding definition of m ÷ n is the following extension of f: the mapping F : N × N+ → Q (where Q denotes the rational numbers) so that F(m, n) is the fraction m/n.
23 Theorem 3.1 on page 139 of Chapter 3 will show that every fraction can be reduced to a unique fraction in lowest terms.



(4) School textbooks usually present the cancellation law for fractions as follows:
Given a fraction $\frac{m}{n}$, suppose a nonzero whole number k divides both m and n. Then $\frac{m}{n} = \frac{m \div k}{n \div k}$.
Explain as if to a seventh grader why this is true.


(5) The following points on the number line have the property that the thickened segments [A, 1], [B, 2.7], [3, C], [D, 4], $[\frac{13}{3}, E]$ all have the same length:
[Figure: a number line from 0 to 5 with the points 1, 2, 2.7, 3, 4, 13/3 labeled, together with the points A, B, C, D, E; the five segments listed above are thickened.]
If A = $\frac{4}{7}$, what are the values of B, C, D, E? Be careful with your explanations, because we do not know how to add or subtract fractions yet. (Rest assured that on the basis of what has been discussed in this section, you can do this exercise.)
(6) (a) 37 is 11
3
of which number? (b) 37 is 11 5
of which number? (c) 11 5 of a
7
fraction is equal to 3 . What is this fraction? (d) A wire 314 feet long is
only four-fifths of the length between two posts. How far apart are the
posts? (e) Helena was three-quarters of the way to school after having
walked 89 miles from home. How far is her home from school?
(7) Explain as if to a sixth-grade student how to do the following exercise:
Nine students chip in to buy a 50-pound sack of rice. They are to share
the rice equally by weight. How many pounds should each person get?
(If you just say, "Divide 50 by 9", that won't be good enough. You must explain what is meant by "50 divided by 9" and why the answer is $\frac{50}{9}$.)
(8) James gave a riddle to his friends: "I was on a hiking trail, and after walking $\frac{7}{9}$ of a mile, I was $\frac{4}{9}$ of the way to the end. How long is the trail?" Help his friends solve the riddle.
(9) (i) Prove that for two nonzero fractions $\frac{a}{b}$ and $\frac{c}{d}$, $\frac{a}{b} = \frac{c}{d}$ if and only if $\frac{b}{a} = \frac{d}{c}$.
(ii) Prove that the following three statements are equivalent for any four whole numbers a, b, c, and d, with b ≠ 0 and d ≠ 0:
(a) $\frac{a}{b} = \frac{c}{d}$. (b) $\frac{a}{a+b} = \frac{c}{c+d}$. (c) $\frac{a+b}{b} = \frac{c+d}{d}$.
(One way is to prove that (a) implies (b) and (b) implies (a). Then prove (a) implies (c) and (c) implies (a).)
(10) Place the three fractions $\frac{13}{6}$, $\frac{11}{5}$, and $\frac{9}{4}$ on the number line and explain how they get to be where they are.
(11) For which fractions $\frac{m}{n}$ is it true that $\frac{m}{n} = \frac{m+b}{n+b}$, where b is a nonzero whole number?
(12) $\frac{m}{n}$ of a fraction is equal to $\frac{k}{\ell}$. What is this fraction?

1.3. Adding and subtracting fractions


This section gives the precise definitions of fraction addition and subtraction, with special emphasis given to the fact that these definitions are conceptually identical to those among whole numbers and that the addition and subtraction formulas do not involve the least common denominator. For the purpose of defining subtraction, we will need the cross-multiplication inequality.
The addition of fractions (p. 32)
Applications of the addition of fractions (p. 34)
The cross-multiplication inequality (p. 36)
The subtraction of fractions (p. 38)

The addition of fractions

What is the meaning of $\frac{5}{7} + \frac{3}{8}$?


This simple question, incredibly, seems to have no answer in TSM.²⁴ What
is usually done in TSM, after some vague statements about giving two fractions
a common denominator, is to give a formula for the sum in terms of the lowest
common denominator of the two fractions in question. Now students in the upper
elementary grades have some intuitive ideas about "adding numbers" because they
have seen that adding whole numbers is "combining things" and they expect the
same for adding fractions. The use of the lowest common denominator in the
addition of fractions, however, obliterates this intuitive idea and disrupts students’
normal learning process. In this subsection, we will restore this intuition by defining
the addition of fractions to be a direct extension of the usual definition of adding
whole numbers. (For further comments on the addition of fractions as practiced in
TSM, see the Pedagogical Comments on page 40.)
Consider, for example, the addition of 4 to 7. In terms of the number line, this
is just the total length of the concatenation of two segments, one of length 4 and
the other of length 7, which is of course 11, as shown. (Review the meaning of
length on page 6 if necessary.)

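[Figure: on the number line, a segment from 0 to 4 (length 4) followed by a segment from 4 to 11 (length 7).]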

Similarly, if we have two whole numbers m and n, then m + n is simply the length
of the concatenation of the two segments of length m and n:

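[Figure: a segment of length m followed by a segment of length n; together they form a segment of length m + n.]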
With the expectation that addition should mean the same thing for whole num-
bers and fractions, we are led to the following definition of the sum of two fractions:

24 See page xiv of the preface for the definition of TSM.



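Definition. Given fractions $\frac{k}{\ell}$ and $\frac{m}{n}$, we define their sum $\frac{k}{\ell} + \frac{m}{n}$ by
$$\frac{k}{\ell} + \frac{m}{n} = \text{the length of two concatenated segments, one of length } \frac{k}{\ell}, \text{ followed by one of length } \frac{m}{n}:$$
[Figure: a segment of length k/ℓ followed by a segment of length m/n; together they form a segment of length k/ℓ + m/n.]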

It follows directly from this definition that the addition of fractions satisfies the
associative and commutative laws; see Exercise 1 on page 41 (cf. the appendix on
page 86 for a summary of these laws).
It is also an immediate consequence of the definition that
$$\frac{k}{\ell} + \frac{m}{\ell} = \frac{k+m}{\ell} \tag{1.11}$$
because both sides are equal to the length of k + m copies of $\frac{1}{\ell}$ (see page 12 for the meaning of "copy"). More explicitly, the left side is the length of k copies of $\frac{1}{\ell}$ combined with m copies of $\frac{1}{\ell}$ and is therefore the length of k + m copies of $\frac{1}{\ell}$, which is exactly the right side. This tells us that, to compute the sum of two fractions with the same denominator ℓ, one adds them as one would with whole numbers, with the only difference being that, instead of adding so many copies of the unit 1, we now add so many copies of the unit fraction $\frac{1}{\ell}$, as above.
Because of FFFP (see page 23), the general case of adding two fractions with unequal denominators is immediately reduced to the case of equal denominators; i.e., in order to add
$$\frac{k}{\ell} + \frac{m}{n}$$
where ℓ ≠ n, we use FFFP to rewrite $\frac{k}{\ell}$ as $\frac{kn}{\ell n}$ and $\frac{m}{n}$ as $\frac{\ell m}{\ell n}$. Then we obtain the general formula for adding fractions:
$$\frac{k}{\ell} + \frac{m}{n} = \frac{kn}{\ell n} + \frac{\ell m}{\ell n} = \frac{kn + \ell m}{\ell n}. \tag{1.12}$$
We emphasize that we obtain this formula from the definition of the sum of two fractions by the use of reasoning.
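Formula (1.12) is easy to express as a short routine. The Python sketch below (the name `add_fractions` and the pair-of-integers representation are assumptions made for illustration) returns the unreduced numerator kn + ℓm and denominator ℓn, and cross-checks the result against the standard-library `fractions.Fraction`, which performs its own exact arithmetic.

```python
from fractions import Fraction

def add_fractions(k: int, ell: int, m: int, n: int) -> tuple[int, int]:
    """Return k/ell + m/n via (1.12) as an unreduced (numerator, denominator) pair.

    The product ell * n serves as the common denominator; no LCM is computed.
    """
    if ell == 0 or n == 0:
        raise ValueError("denominators must be nonzero")
    return k * n + ell * m, ell * n

print(add_fractions(5, 7, 3, 8))  # (61, 56), i.e. 5/7 + 3/8 = 61/56

# Cross-check a few sums against Fraction arithmetic.
for k, ell, m, n in [(5, 7, 3, 8), (7, 12, 5, 8), (3, 4, 7, 8), (0, 3, 2, 9)]:
    num, den = add_fractions(k, ell, m, n)
    assert Fraction(num, den) == Fraction(k, ell) + Fraction(m, n)
```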

Activities. If a student tells you $\frac{3}{5} + \frac{1}{3} = \frac{4}{8} = \frac{1}{2}$, how would you explain to him that he is wrong? (Hint: Does he know what "$\frac{3}{5} + \frac{1}{3}$" means? Can he use common sense to see why the sum cannot be $\frac{1}{2}$?)

Up to this point, when two fractions are given, we have used the product of their denominators as a common denominator for both, i.e., a whole number such that both fractions are equivalent to fractions with that number as denominator. Sometimes a different, smaller common denominator is "handed to you for free". For example, when the denominator of one of the fractions is
already a multiple of the other denominator, then the bigger denominator already
serves as a common denominator; e.g.,
$$\frac{3}{4} + \frac{7}{8} = \frac{6}{8} + \frac{7}{8} = \frac{13}{8}$$
or
$$\frac{27}{100} + \frac{13}{10} = \frac{27}{100} + \frac{130}{100} = \frac{157}{100}. \tag{1.13}$$
(Incidentally, the second example can be equally well expressed as 0.27 + 1.3 = 1.57. See page 14 for the decimal notation.) A slightly more sophisticated example is $\frac{7}{12} + \frac{5}{8}$. It is relatively easy to notice that 24 is the least common multiple²⁵ of 12 and 8 and, as such, 24 is the least common denominator of the two fractions. Since 24 = 2 × 12 = 3 × 8, the addition can be done more simply as follows:
$$\frac{7}{12} + \frac{5}{8} = \frac{2 \times 7}{2 \times 12} + \frac{3 \times 5}{3 \times 8} = \frac{14 + 15}{24} = \frac{29}{24}.$$
By comparison, if we use (1.12), then we would get
$$\frac{7}{12} + \frac{5}{8} = \frac{(7 \times 8) + (12 \times 5)}{12 \times 8} = \frac{116}{96} \left( = \frac{29}{24} \right)$$
where the last equality in the parentheses is due to the cancellation law (page 20).
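In general, suppose $\frac{k}{\ell}$ and $\frac{m}{n}$ are given and there is a whole number D that is a common multiple of both ℓ and n, say D = n′ℓ = ℓ′n for some whole numbers ℓ′ and n′; then the computation of the sum $\frac{k}{\ell} + \frac{m}{n}$ can make use of D as the common denominator instead of ℓn, as follows:
$$\frac{k}{\ell} + \frac{m}{n} = \frac{kn'}{n'\ell} + \frac{\ell' m}{\ell' n} = \frac{kn' + \ell' m}{D}.$$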
A more interesting example to illustrate the advantage of using a simpler common
denominator can be found in Exercise 13 on page 42.
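The general recipe with a common multiple D can be sketched the same way. In the hedged Python sketch below, the helper name `add_with_common_denominator` is hypothetical, and `math.lcm` (available in Python 3.9 and later) is used only to produce the least common multiple for comparison. The sketch shows that D = 24 and D = 12 × 8 = 96 lead to equivalent answers for 7/12 + 5/8.

```python
from fractions import Fraction
from math import lcm

def add_with_common_denominator(k: int, ell: int, m: int, n: int, D: int) -> tuple[int, int]:
    """Return k/ell + m/n, unreduced, using a common multiple D of ell and n.

    Writing D = n_prime * ell = ell_prime * n, the sum is (k*n_prime + ell_prime*m) / D.
    """
    if ell == 0 or n == 0 or D == 0 or D % ell != 0 or D % n != 0:
        raise ValueError("D must be a nonzero common multiple of both denominators")
    n_prime, ell_prime = D // ell, D // n
    return k * n_prime + ell_prime * m, D

print(add_with_common_denominator(7, 12, 5, 8, lcm(12, 8)))  # (29, 24)
print(add_with_common_denominator(7, 12, 5, 8, 12 * 8))      # (116, 96)

# Both choices of D name the same point on the number line.
assert Fraction(29, 24) == Fraction(116, 96)
```

Either choice of D is legitimate; the least common multiple merely keeps the numbers smaller.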

Applications of the addition of fractions

The first application of fraction addition is the explanation of the addition algorithm for (finite) decimals. (The definition of a decimal is given on page 14.) For example, consider
4.0451 + 7.28.
This algorithm calls for
(α) lining up 4.0451 and 7.28 by their decimal points,
(β) adding the two numbers as if they were whole numbers and
getting a whole number, to be called N , and
(γ) putting the decimal point back in N to get the answer to
4.0451 + 7.28.
We now supply the reasoning for the algorithm (in essence, it has already been given in equation (1.13) above). First of all, we make use of the fact that zeros can be added to the right of a decimal (see page 21 for the precise statement) to rewrite the two given decimals as two decimals with the same number of decimal digits:²⁶
$$4.0451 + 7.28 = 4.0451 + 7.2800 = \frac{40451}{10^4} + \frac{72800}{10^4}.$$
25 For a definition, see Exercise 4 on page 156.
26 A little reflection would tell you that we are essentially using FFFP (page 23) here.
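Then
$$\begin{aligned}
4.0451 + 7.28 &= \frac{40451 + 72800}{10^4} && \text{(corresponds to } (\alpha)\text{)} \\
&= \frac{113251}{10^4} && \text{(corresponds to } (\beta)\text{)} \\
&= 11.3251 && \text{(corresponds to } (\gamma)\text{)}.
\end{aligned}$$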
The reasoning is of course completely general and serves to explain the algorithm
(α)–(γ) for any pair of decimals.
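Steps (α)–(γ) can be mirrored in a short Python sketch. It assumes nonnegative decimals given as strings with at least one digit before the decimal point, such as "4.0451"; the name `add_decimals` is illustrative only. Padding with zeros corresponds to (α) (rewriting both decimals with the same number of decimal digits), adding the digit strings as whole numbers is (β), and reinserting the point is (γ).

```python
def add_decimals(x: str, y: str) -> str:
    """Add two nonnegative finite decimals given as strings, following (α)-(γ).

    Hypothetical helper; assumes inputs like "4.0451" or "7.28" (a digit before any point).
    """
    # (α) Line up the decimal points: pad the shorter decimal part with zeros.
    x_whole, _, x_dec = x.partition(".")
    y_whole, _, y_dec = y.partition(".")
    digits = max(len(x_dec), len(y_dec))
    x_dec, y_dec = x_dec.ljust(digits, "0"), y_dec.ljust(digits, "0")
    # (β) Add as whole numbers: these are the numerators over 10**digits.
    N = int(x_whole + x_dec) + int(y_whole + y_dec)
    if digits == 0:
        return str(N)
    # (γ) Put the decimal point back, i.e. write N / 10**digits as a decimal.
    s = str(N).rjust(digits + 1, "0")
    return s[:-digits] + "." + s[-digits:]

print(add_decimals("4.0451", "7.28"))  # 11.3251
```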
A second application is to get the so-called complete expanded form of a (finite) decimal. For example, given 4.1297, we know it is the fraction
$$\frac{41297}{10^4}.$$
But we have the expanded form of the whole number 41297:
$$41297 = (4 \times 10^4) + (1 \times 10^3) + (2 \times 10^2) + (9 \times 10^1) + (7 \times 10^0).$$
(Recall that $10^0$ is by definition equal to 1.) We also know that, by equivalent fractions, $\frac{4 \times 10^4}{10^4} = 4$, $\frac{1 \times 10^3}{10^4} = \frac{1}{10}$, etc. Thus by (1.12) on page 33,
$$4.1297 = 4 + \frac{1}{10} + \frac{2}{10^2} + \frac{9}{10^3} + \frac{7}{10^4}. \tag{1.14}$$
This expression of 4.1297 in (1.14) as a sum of descending powers²⁷ of 10, where the coefficients of these powers are the digits of the number itself (i.e., 4, 1, 2, 9, and 7), is called the complete expanded form of 4.1297. Equation (1.14) is the reason that we can say 4.1297 is "the sum of 4 and 1 tenth and 2 hundredths and 9 thousandths and 7 ten-thousandths". Please observe that this conclusion is a precise, logical consequence of the definition of a decimal (in equation (1.3) on page 14) and the addition formula (1.11) on page 33. Please do not make the standard TSM mistake of telling students, without first proving (1.14), that 4.1297 "means 4 and 1 tenth and 2 hundredths and 9 thousandths and 7 ten-thousandths". You would confuse them. In the same way, a decimal $0.d_1 d_2 \cdots d_n$,²⁸ where each $d_j$ is a single-digit number, has the following complete expanded form:
$$0.d_1 d_2 \cdots d_n = \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}. \tag{1.15}$$
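To make the complete expanded form concrete, the following Python sketch (the function names are hypothetical, and it assumes a nonnegative decimal string) splits a decimal into its whole part and the pairs (d_j, j) that stand for d_j/10^j in (1.15). It then confirms with `fractions.Fraction` that the terms add back up to the original decimal, as in (1.14).

```python
from fractions import Fraction

def expanded_form(decimal: str) -> tuple[int, list[tuple[int, int]]]:
    """Split a nonnegative finite decimal into its whole part and (digit, place) pairs.

    "4.1297" -> (4, [(1, 1), (2, 2), (9, 3), (7, 4)]), where a pair (d, j)
    stands for d / 10**j as in (1.15).
    """
    whole, _, decimal_part = decimal.partition(".")
    pairs = [(int(d), j) for j, d in enumerate(decimal_part, start=1)]
    return int(whole), pairs

def value(whole: int, pairs: list[tuple[int, int]]) -> Fraction:
    """Add the terms of the expanded form exactly, as fractions."""
    return whole + sum((Fraction(d, 10**j) for d, j in pairs), Fraction(0))

whole, pairs = expanded_form("4.1297")
print(whole, pairs)  # 4 [(1, 1), (2, 2), (9, 3), (7, 4)]
assert value(whole, pairs) == Fraction(41297, 10**4)
```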

A third application of fraction addition is to introduce the concept of mixed numbers. We have seen (on page 15) that, in order to locate fractions on the number line, it is an effective method to use division-with-remainder on the numerator. With the availability of equation (1.12) on page 33, we are now in a position to clarify the whole procedure; e.g.,
$$\frac{187}{14} = \frac{(13 \times 14) + 5}{14} = \frac{13 \times 14}{14} + \frac{5}{14} = 13 + \frac{5}{14}.$$
Thus the sum $13 + \frac{5}{14}$, as a concatenation (see page 12 for the meaning of "concatenation") of two segments of lengths 13 and $\frac{5}{14}$ (recall the definition of adding
27 "Descending" if you think of $\frac{1}{10}$ as $10^{-1}$, etc. Negative exponents will be discussed in Section 4.1 of [Wu2020b].
28 The notation here is unfortunate: "$d_1 d_2 \cdots d_n$" is not the product of $d_1, d_2, \ldots, d_n$.
fractions on page 33), clearly exhibits the fraction $\frac{187}{14}$ as a point on the number line about one-third of the way from 13 to 14. The sum $13 + \frac{5}{14}$ is usually abbreviated to $13\frac{5}{14}$ by omitting the + sign and, as such, it is called a mixed number. More generally, a mixed number is a sum $n + \frac{k}{\ell}$, where n is a whole number and $\frac{k}{\ell}$ is a proper fraction (i.e., its numerator is smaller than its denominator), and it is usually abbreviated as just $n\frac{k}{\ell}$.²⁹ This concept often causes terror among students, probably because it is usually introduced in TSM³⁰ before the concept of the addition of fractions is taught. Something like $13\frac{5}{14}$ is then explained by the usual baby talk about this number being "13 and $\frac{5}{14}$". Unfortunately, the word "and" in this context masks the concept of fraction addition, which is not yet known to students at this point. Students are therefore at a loss when they are forced to do computations with $13\frac{5}{14}$. Inevitably, nonlearning follows.
It is to avoid this pitfall that we have postponed the introduction of the concept of a mixed number until now. So just remember: a mixed number is a sum of a whole number and a proper fraction. No more, and no less.
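The passage from 187/14 to 13 5/14 is just division with remainder on the numerator, which Python's built-in `divmod` performs. The sketch below (the name `to_mixed_number` is an assumption for illustration) returns the whole-number part and the proper fraction. Note that `Fraction` automatically reduces the proper part to lowest terms, which does not change the point on the number line.

```python
from fractions import Fraction

def to_mixed_number(numerator: int, denominator: int) -> tuple[int, Fraction]:
    """Write numerator/denominator as n + k/ell with k/ell a proper fraction.

    Uses division with remainder: numerator = n * denominator + k, 0 <= k < denominator.
    """
    if numerator < 0 or denominator <= 0:
        raise ValueError("expects a whole-number numerator and a nonzero denominator")
    n, k = divmod(numerator, denominator)
    return n, Fraction(k, denominator)

n, proper = to_mixed_number(187, 14)
print(n, proper)  # 13 5/14
assert n + proper == Fraction(187, 14)
```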

The cross-multiplication inequality

We next wish to discuss the subtraction of fractions. We are handicapped by not having negative fractions at our disposal, however, so that, as in the case of subtracting whole numbers, we must first make sure that $\frac{k}{\ell} < \frac{m}{n}$ before we can compute $\frac{m}{n} - \frac{k}{\ell}$. For this reason, how to determine that one fraction is less than another now becomes our next concern. To this end, we will need the basic inequality in Theorem 1.5 below. Recall from page 12 that $\frac{k}{\ell} < \frac{m}{n}$ means the point $\frac{k}{\ell}$ is to the left of the point $\frac{m}{n}$ on the number line:
[Figure: the point k/ℓ to the left of the point m/n on the number line.]
In practice, it may not be easy to tell, for example, which of $\frac{4}{9}$ or $\frac{3}{7}$ is bigger. We need a general method for comparing fractions. Now FFFP comes to the rescue: we simply put both fractions in the sequence of 63rds (63 = 9 × 7) and see which of the two fractions comes first. We rewrite both to have denominator 9 × 7, so that
$$\frac{4}{9} = \frac{28}{63} \quad \text{and} \quad \frac{3}{7} = \frac{27}{63}.$$
Then in the sequence of 63rds, $\frac{4}{9}$ is the 28th point and $\frac{3}{7}$ is the 27th point. Therefore the latter is to the left of the former. Consequently, $\frac{4}{9}$ is the bigger of the two. This reasoning is perfectly valid in general, so we have the following theorem.

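Theorem 1.5 (Cross-multiplication inequality). Given two fractions $\frac{k}{\ell}$ and $\frac{m}{n}$, the inequality $\frac{k}{\ell} < \frac{m}{n}$ is equivalent to kn < ℓm.
Proof. We simply follow the reasoning in the preceding special example. By FFFP, we can rewrite $\frac{k}{\ell}$ and $\frac{m}{n}$ as $\frac{kn}{\ell n}$ and $\frac{\ell m}{\ell n}$, respectively. If $\frac{k}{\ell} < \frac{m}{n}$, then we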
29 Discussions of fractions and decimals seem to be rife with notational problems. In this case, please note that $n\frac{k}{\ell}$ is not the product of n and $\frac{k}{\ell}$.
30 See page xiv of the preface for the definition of TSM.
have $\frac{kn}{\ell n} < \frac{\ell m}{\ell n}$. This means the kn-th multiple of $\frac{1}{\ell n}$ is to the left of the ℓm-th multiple of the same, so that kn must be smaller than ℓm; i.e., kn < ℓm. Conversely, suppose kn < ℓm. Then the kn-th multiple of $\frac{1}{\ell n}$ is to the left of the ℓm-th multiple of $\frac{1}{\ell n}$, so that
$$\frac{kn}{\ell n} < \frac{\ell m}{\ell n}.$$
By the theorem on equivalent fractions, this becomes $\frac{k}{\ell} < \frac{m}{n}$. The proof of the theorem is complete.
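Theorem 1.5 turns the comparison of two fractions into a comparison of two whole numbers, which makes it straightforward to code. The Python sketch below (the helper `is_less` is a hypothetical name) decides k/ℓ < m/n by testing kn < ℓm, using only whole-number multiplication. It reproduces the 4/9 versus 3/7 example above and, as a sanity check, compares its answers with the ordering built into `fractions.Fraction`.

```python
from fractions import Fraction

def is_less(k: int, ell: int, m: int, n: int) -> bool:
    """Decide whether k/ell < m/n by the cross-multiplication inequality: kn < ell*m."""
    if ell == 0 or n == 0:
        raise ValueError("denominators must be nonzero")
    return k * n < ell * m

print(is_less(3, 7, 4, 9))  # True: 3*9 = 27 < 7*4 = 28, so 3/7 < 4/9
print(is_less(1, 8590008, 1, 8590007))  # True: 1/8590008 < 1/8590007

# Sanity check against Fraction's own ordering on small inputs.
for k in range(0, 6):
    for ell in range(1, 6):
        for m in range(0, 6):
            for n in range(1, 6):
                assert is_less(k, ell, m, n) == (Fraction(k, ell) < Fraction(m, n))
```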

Corollary. (i) $\frac{a}{c} < \frac{b}{c}$ is equivalent to a < b. (ii) ℓ > n is equivalent to $\frac{1}{\ell} < \frac{1}{n}$.

Proof. (i) This is because the preceding theorem implies that the inequality $\frac{a}{c} < \frac{b}{c}$ is equivalent to the inequality ac < bc. Now, since a, b, and c are whole numbers, ac = the sum of a copies of c and, similarly, bc = the sum of b copies of c. Therefore ac < bc is equivalent to a < b, and part (i) is proved.
(ii) The inequality ℓ > n is equivalent to n < ℓ, which is of course identical to 1 · n < 1 · ℓ which, by the preceding theorem, is equivalent to $\frac{1}{\ell} < \frac{1}{n}$. The corollary is proved. Note that we have just made use of the dot in 1· to denote 1×.

Pedagogical Comments. TSM generally considers both parts of this corollary to be too obvious for any proof. For part (i), it is easy to take it for granted because it "looks" obvious. However, if a teacher does not prove that, for example, $\frac{37}{19} < \frac{41}{19}$ but simply declares that it is obvious because 37 < 41, what will this teacher say to a student who claims that $\frac{19}{37} < \frac{19}{41}$ because 37 < 41? Now, a teacher might object to proving something as intuitive as part (i) by appealing to a complicated theorem such as Theorem 1.5. Because of this objection, we now offer a more direct proof of part (i), as follows. The reason that $\frac{a}{c} < \frac{b}{c}$ implies a < b is that, by definition, $\frac{a}{c} < \frac{b}{c}$ means $\frac{b}{c}$ is to the right of $\frac{a}{c}$. But $\frac{b}{c}$ is the b-th point in the sequence of multiples of 1/c and $\frac{a}{c}$ is the a-th point in the same sequence. Therefore the b-th point of the sequence is to the right of the a-th point of the sequence. Since one counts the points in the sequence from left to right, this can only mean that a < b. Conversely, suppose a < b. Then the a-th point in the sequence of multiples of 1/c is to the left of the b-th point of the same sequence. This means precisely that $\frac{a}{c} < \frac{b}{c}$, as desired.
Recall that, when we formalized the concept of one fraction being smaller than another on page 13, we bemoaned the absence in TSM of a definition of this concept. The preceding argument is an explicit demonstration of the fact that having precise definitions facilitates learning.
The need for reasoning in support of part (ii) is slightly different. Many teachers dismiss the need for any proof of this corollary, and the common thinking behind the dismissal is that, for small values of ℓ and n, e.g., ℓ = 3 and n = 2, a third of a pizza is visibly smaller than a half of a pizza, and so $\frac{1}{3} < \frac{1}{2}$. One can even get more sophisticated by appealing to the number line and infer from the following picture that $\frac{1}{3} < \frac{1}{2}$:
[Figure: the number line from 0 to 1 with the points 1/3, 1/2, and 2/3 marked.]

While this kind of intuitive argument is invaluable for pointing students toward the correct conclusion, it must not be confused with valid reasoning. After all, how would such an intuitive argument bring conviction to the claim that
$$\frac{1}{8590007} > \frac{1}{8590008}?$$
This is why the correct reasoning using the cross-multiplication inequality must be taught in addition to the intuitive argument using small values of ℓ and n. End of Pedagogical Comments.

The following observations about the comparison of fractions are useful and also easy to prove. Let A, B, C, D be fractions. Then we have:
(1) A < B ⟺ there is a nonzero fraction C so that A + C = B.
(2) A < B implies A + C < B + C for every fraction C.
(3) A < B and C < D implies A + C < B + D.
The proofs require nothing more than making mathematical interpretations of correct drawings on the number line. We will leave them as an exercise (see Exercise 14 on page 42).

Activity. Without using any calculator or computer software, decide which of the following fractions is greater:
$$\frac{33333}{1234567} \quad \text{and} \quad \frac{33335}{1234569}.$$

The subtraction of fractions

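We can now define the subtraction of fractions: suppose $\frac{m}{n} > \frac{k}{\ell}$; then a segment of length $\frac{m}{n}$ is longer than a segment of length $\frac{k}{\ell}$.
Definition. If $\frac{m}{n} > \frac{k}{\ell} > 0$, then the subtraction $\frac{m}{n} - \frac{k}{\ell}$ is by definition the length of the remaining segment when a segment of length $\frac{k}{\ell}$ is taken from one end of a segment of length $\frac{m}{n}$.
[Figure: a segment of length m/n, with a piece of length k/ℓ marked off at its left end; the remaining piece on the right is thickened.]
In the picture, $\frac{m}{n} - \frac{k}{\ell}$ is the length of the thickened segment on the right. We also define
$$\frac{m}{n} - \frac{m}{n} = 0 \quad \text{and} \quad \frac{m}{n} - 0 = \frac{m}{n}.$$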
This definition of the subtraction of fractions is clearly modeled on the subtraction
of whole numbers. For example, 9 − 4 is the length of the remaining segment when
a segment of length 4 is taken from a segment of length 9.

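The definition of subtraction for fractions has the following pleasant (but expected) consequence: let $\frac{k}{\ell}$ and $\frac{m}{n}$ be distinct fractions with $\frac{k}{\ell} < \frac{m}{n}$; then
$$\text{the length of the segment } \left[\frac{k}{\ell}, \frac{m}{n}\right] \text{ is } \frac{m}{n} - \frac{k}{\ell}. \tag{1.16}$$
Indeed, by the definition of length on page 6, the length of $[0, \frac{m}{n}]$ (respectively, $[0, \frac{k}{\ell}]$) is $\frac{m}{n}$ (respectively, $\frac{k}{\ell}$). Therefore (1.16) is clear from the definition of subtraction:
[Figure: the segment [0, m/n] on the number line, with the point k/ℓ between 0 and m/n; the piece [k/ℓ, m/n] is what remains after [0, k/ℓ] is removed.]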

The reasoning used in the proof of (1.11) on page 33, together with FFFP, gives
$$\frac{m}{n} - \frac{k}{\ell} = \frac{\ell m - nk}{\ell n}. \tag{1.17}$$
Note that this formula makes implicit use of the preceding cross-multiplication inequality (Theorem 1.5), because the subtraction of whole numbers in the numerator of the right side of (1.17), ℓm − nk, does not make sense unless we know ℓm > nk; but since $\frac{k}{\ell} < \frac{m}{n}$, Theorem 1.5 guarantees that ℓm > nk.
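Formula (1.17), together with the requirement that k/ℓ not exceed m/n, can be sketched as follows. This is a hedged illustration: the function name is hypothetical, the result is left unreduced, and the guard rejects differences that would be negative, since negative fractions are not yet available. The guard also admits k/ℓ = m/n, which gives 0, in line with the convention m/n − m/n = 0 stated above.

```python
from fractions import Fraction

def subtract_fractions(m: int, n: int, k: int, ell: int) -> tuple[int, int]:
    """Return m/n - k/ell by (1.17) as an unreduced pair (ell*m - n*k, ell*n).

    Negative fractions are not yet available, so k/ell <= m/n is required,
    i.e. the whole-number condition n*k <= ell*m (strict case: Theorem 1.5).
    """
    if n == 0 or ell == 0:
        raise ValueError("denominators must be nonzero")
    if n * k > ell * m:
        raise ValueError("k/ell exceeds m/n, so m/n - k/ell is not defined for fractions")
    return ell * m - n * k, ell * n

print(subtract_fractions(87, 5, 31, 4))  # (193, 20): 87/5 - 31/4 = 193/20
assert Fraction(*subtract_fractions(87, 5, 31, 4)) == Fraction(87, 5) - Fraction(31, 4)
```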
We wish to bring out the fact that subtraction is an alternate way of expressing addition. Indeed, the definition of $\frac{m}{n} - \frac{k}{\ell}$, together with the definition of adding fractions as the concatenation of segments (page 33), implies that
$$\frac{k}{\ell} + \left(\frac{m}{n} - \frac{k}{\ell}\right) = \frac{m}{n}. \tag{1.18}$$
This is quite obvious by looking at the preceding picture. Thus we may regard $\frac{m}{n} - \frac{k}{\ell}$ as the fraction that must be added to $\frac{k}{\ell}$ to get $\frac{m}{n}$. Although this more abstract perspective seems to add nothing to the concept of subtraction, it will serve as a bridge to the definition of the division of fractions (see page 57).
The subtraction of mixed numbers reveals a sidelight about subtraction that may not be entirely devoid of interest. Consider the subtraction $17\frac{2}{5} - 7\frac{3}{4}$. One can do this routinely by converting the mixed numbers into fractions:
$$17\tfrac{2}{5} - 7\tfrac{3}{4} = \frac{85 + 2}{5} - \frac{28 + 3}{4} = \frac{87}{5} - \frac{31}{4} = \frac{87 \times 4 - 5 \times 31}{5 \times 4} = \frac{193}{20}.$$
However, there is another way to do the computation:
$$17\tfrac{2}{5} - 7\tfrac{3}{4} = \left(17 + \tfrac{2}{5}\right) - \left(7 + \tfrac{3}{4}\right).$$
Anticipating a reasoning that will be made routine when we come to the next chapter on rational numbers (see equation (2.14) on page 97),³¹ we can rewrite the right side as $(17 - 7) + (\frac{2}{5} - \frac{3}{4})$. But now we are stuck because $\frac{2}{5} < \frac{3}{4}$, so the subtraction on the right cannot be performed according to our present definition of subtraction. Using an idea that is reminiscent of the "trading" technique in
31 Rest assured that there is no circular reasoning in appealing to a result in Chapter 2 at this juncture, because the mathematical development of Chapter 2 does not depend on this section.
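the subtraction algorithm for whole numbers, we can get around this difficulty by computing as follows:
$$\begin{aligned}
17\tfrac{2}{5} - 7\tfrac{3}{4} &= \left(16 + 1\tfrac{2}{5}\right) - \left(7 + \tfrac{3}{4}\right) \\
&= (16 - 7) + \left(1\tfrac{2}{5} - \tfrac{3}{4}\right) \qquad (1.19) \\
&= 9 + \left(\tfrac{7}{5} - \tfrac{3}{4}\right) = 9 + \tfrac{13}{20} = 9\tfrac{13}{20}.
\end{aligned}$$
Again, the second line makes use of (2.14) on page 97.
The whole computation looks longer than it actually is because we interrupted it with explanations. Normally, we would have done it the following way:
$$17\tfrac{2}{5} - 7\tfrac{3}{4} = (16 - 7) + \left(1\tfrac{2}{5} - \tfrac{3}{4}\right) = 9 + \left(\tfrac{7}{5} - \tfrac{3}{4}\right) = 9 + \tfrac{13}{20} = 9\tfrac{13}{20},$$
which is exactly the same as before.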
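The "trading" step in (1.19) can be sketched in Python as well. The function `subtract_mixed` below is a hypothetical helper: it takes a mixed number as a whole number plus a proper `fractions.Fraction`, borrows 1 from the whole-number part when the fractional part being subtracted is larger, and assumes the first mixed number is not smaller than the second.

```python
from fractions import Fraction

def subtract_mixed(a: int, p: Fraction, b: int, q: Fraction) -> tuple[int, Fraction]:
    """Compute (a + p) - (b + q), where p and q are proper fractions, as a mixed number.

    Assumes a + p >= b + q. When p < q, "trade" 1 from the whole part,
    rewriting a + p as (a - 1) + (1 + p), as in (1.19).
    """
    if p < q:
        a, p = a - 1, 1 + p
    if a < b:
        raise ValueError("requires a + p >= b + q")
    return a - b, p - q

whole, frac = subtract_mixed(17, Fraction(2, 5), 7, Fraction(3, 4))
print(whole, frac)  # 9 13/20
assert whole + frac == Fraction(193, 20)
```

The final assertion confirms that the trading route agrees with the routine conversion to 193/20.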

Finally, there is a similar subtraction algorithm for finite decimals: finite decimals can be subtracted as if they were whole numbers, provided they are first aligned by their decimal points, and the decimal point is then restored in the final result. The reasoning is exactly the same as in the case of addition (of decimals) and will therefore be left as an exercise (Exercise 10 on page 42).

Pedagogical Comments. We wish to address the issue of why TSM's standard formula for the addition of fractions in terms of their least common denominator is pedagogically unsound. Given $\frac{k}{\ell}$ and $\frac{m}{n}$, students are told to add $\frac{k}{\ell} + \frac{m}{n}$ by first finding the least common denominator of the fractions, which is by definition the LCM³² of the denominators ℓ and n, say B. Then if B = n′ℓ = ℓ′n for some whole numbers ℓ′ and n′, the sum of these two fractions (according to TSM) is given by
$$\frac{k}{\ell} + \frac{m}{n} = \frac{kn'}{n'\ell} + \frac{\ell' m}{\ell' n} = \frac{kn' + \ell' m}{B}.$$
First of all, TSM usually offers this formula as a definition of the sum of fractions, and this is certainly unacceptable because it bears no resemblance to the intuitive notion of addition as "combining things". But even as a formula for addition, it is no less objectionable because, as we have seen in (1.12) on page 33, it is not necessary to use the LCM of ℓ and n when the product of the denominators, ℓn, is both adequate and more natural as a common denominator. Finally, this formula is destructive in terms of mathematics learning because many students confuse the LCM of two whole numbers with their greatest common divisor (see page 138 for this concept), and there is no reason to complicate the learning of a simple skill by artificially inflating its complexity.
The same comment applies to the use of least common denominator for the subtraction of fractions.

32 Least common multiple. For a precise definition, see Exercise 4 on page 156.

Be sure to take every opportunity to eradicate this approach to the addition and subtraction of fractions from your teaching, because it has caused great harm to mathematics learning. End of Pedagogical Comments.

Mathematical Aside: The use of the least common denominator for the definition of the addition of fractions is more than just a pedagogical disaster. From a mathematical perspective, it is conceptually incorrect. If it were necessary to find the LCM of the two denominators before the addition of two fractions could be defined, it would imply that addition cannot be performed in the field of quotients of an integral domain unless the latter has the special property that any two elements in it have an LCM. This is almost the statement that addition cannot be defined in the field of quotients of an integral domain unless the domain has the unique factorization property. However, we know that this is false because addition can be defined in the field of quotients of any integral domain.

Exercises 1.3.
(1) Use the definition of the sum of two fractions on page 33 to show that the
addition of fractions satisfies the associative and commutative laws. (See
(1.56) and (1.57) on page 87 for the precise statement of these two laws
and the subsequent discussion for their significance.)
(2) Compute (you may use a four-function calculator for the arithmetic com-
putations; no need to simplify your answers but you have to show all your
steps):
5
(i) 18 8
+ 27 5
, (ii) 34 − 51
7 5
, (iii) 24 12 + 32 11 3 13
15 , (iv) 315 8 − 312 20 .

(3) Compute (for the addition of three numbers, see Section 1.8 on page 86):
(i) $\frac{5}{6} + \frac{1}{4} + \frac{7}{8}$, (ii) $\frac{5}{8} + \frac{4}{15} + \frac{7}{16}$, (iii) $\frac{2}{3} + \frac{3}{4} - \frac{13}{16}$.

(4) Compute 118 52 3


− 267
13 in two different ways, and check that both give
the same result. (Large numbers are used on purpose. You may use a
four-function calculator to do the calculations with whole numbers, but
only for that purpose.)
(5) (a) We have an algorithm for adding two fractions: $\frac{k}{\ell} + \frac{m}{n} = \frac{kn + \ell m}{\ell n}$. Now explain as if to an eighth grader how to obtain an algorithm for adding three fractions $\frac{k}{\ell} + \frac{m}{n} + \frac{p}{q}$. Make sure you can justify the algorithm.
(b) If a, b, c are nonzero whole numbers, what is $\frac{1}{ab} + \frac{1}{bc} + \frac{1}{ac}$? Simplify your answer as much as possible.
(6) Show a sixth grader how to do the following problem by using the number
line: I have two fractions whose sum is 18 11 and whose difference (i.e., the
1
larger one minus the smaller one) is 2 . What are the fractions? (Caution:
No need to consider simultaneous linear equations.)
(7) Explain as if to a fifth grader why 1.2 is bigger than 1.1987.
(8) Explain as if to a sixth grader how to get 5.09 + 7.9287 = 13.0187.
(9) (a) Which is closer to $\frac{3}{7}$: $\frac{4}{9}$ or $\frac{11}{21}$? (b) Which whole number is closest to the sum
$$\frac{1300}{1305} + \frac{312}{51}?$$
(Don't forget to prove it!)
(10) State the subtraction algorithm for finite decimals, and explain why it is
true. (See the discussion of the addition algorithm for finite decimals on
page 34.)
(11) (a) $\frac{2}{5} + \frac{7}{12}$ = ? (b) Laura ran for 35 minutes, stopped to take a rest, and then ran for another 24 minutes. How long did she run altogether, and what does this have to do with part (a)?
(12) An alcohol solution mixes 5 fl. oz. of water with 24 fl. oz. of alcohol. Then
4 fl. oz. of water and 19 fl. oz. of alcohol are added to the solution. Which
has a higher concentration of alcohol (which is defined to be the number

(volume of the alcohol in the solution) ÷ (total volume of the solution)

in the sense of (1.10) on page 28): the old solution or the new?
(13) If n is a whole number, we define n! (read: n factorial) to be the product of all the whole numbers from 1 through n. Thus 5! = 1 × 2 × 3 × 4 × 5. We also define 0! to be 1. Define the so-called binomial coefficients $\binom{n}{k}$ for any whole number k satisfying 0 ≤ k ≤ n as
$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$
Then prove that, for 1 ≤ k ≤ n − 1,
$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.$$
(The proper context for this exercise is Pascal’s triangle, which is discussed
in Section 5.4 of [Wu2020b].)
(14) Prove each of the statements (1)–(3) on page 38 for fractions A, B, C,
and D.
(15) Suppose a, b are whole numbers so that 1 < a < b. Which is bigger: $\frac{a-1}{a}$ or $\frac{b-1}{b}$? Can you tell by inspection? What about $\frac{a+1}{a}$ and $\frac{b+1}{b}$?
(16) (a) Suppose $\frac{a}{b}$ and $\frac{c}{d}$ are fractions and $\frac{a}{b} < \frac{c}{d}$. Prove that $\frac{a}{b} < \frac{a+c}{b+d} < \frac{c}{d}$. (b) Prove that between any two distinct fractions, there is another fraction.
(17) Let $\frac{a}{b}$ be a nonzero fraction, with a ≠ b. Order the following (infinite number of) fractions: $\frac{a}{b}, \frac{a+1}{b+1}, \frac{a+2}{b+2}, \frac{a+3}{b+3}, \ldots$. (Caution: It makes a difference whether a < b or a > b.)
(18) In the notation of Exercise 13, observe that each fraction $\frac{n!}{j}$, where n, j are whole numbers and 1 ≤ j ≤ n, is actually a whole number. Find the following sum and simplify your answer as much as possible:
$$\frac{1}{\frac{49!}{1}} + \frac{1}{\frac{49!}{2}} + \frac{1}{\frac{49!}{3}} + \cdots + \frac{1}{\frac{49!}{48}} + \frac{1}{\frac{49!}{49}}.$$

(19) (a) Prove $\frac{1}{4} - \frac{2}{5} + \frac{1}{6} = \frac{1}{60}$. (b) Generalize the following: let n be a nonzero whole number; then prove that $\frac{1}{n} - \frac{2}{n+1} + \frac{1}{n+2}$ is always a unit fraction, i.e., a fraction whose numerator is 1.
1.4. MULTIPLYING FRACTIONS 43

1.4. Multiplying fractions


Multiplication—not division—is the most complex of the four arithmetic opera-
tions among fractions. We first give a motivation for the definition of multiplication
and then dive in to prove the two most important facts about fraction multiplication:
the product formula $\frac{k}{\ell} \times \frac{m}{n} = \frac{km}{\ell n}$ and the area formula for a rectangle with fractional
sides. It is sobering to realize that something as basic as the latter area formula is
not proved in TSM. Then we use the product formula to explain the rule—also not
explained in TSM—for the multiplication of finite decimals.
Motivation for the definition of multiplication (p. 43)
The formal definition of multiplication (p. 45)
The product formula (p. 46)
Areas of rectangles (p. 47)
Three remarks (p. 51)

Motivation for the definition of multiplication

To facilitate student learning in school mathematics, it is of vital importance that we give students a precise definition for each concept. They have to know what
they are doing. This is especially true for fraction multiplication. Such a definition
is rarely, if ever, found in TSM33 or even in the education research literature. Recall
that for whole numbers, multiplication is, by definition, just repeated addition: 3 × 5 means 5 + 5 + 5 and 4 × 17 means 17 + 17 + 17 + 17. This definition cannot be literally extended to fractions; e.g., it makes little sense to define $\frac{4}{7} \times \frac{1}{2}$ as "adding $\frac{1}{2}$ to itself $\frac{4}{7}$ times". This difficulty has led some educators to advocate extreme measures to achieve any kind of understanding of this concept.34
We will do mathematics the standard way by giving a precise definition of frac-
tion multiplication and deducing precise logical consequences. Since we are not
yet in possession of a definition, we must first look for one. The process of how
to formulate such a precise definition is usually hidden in textbooks and, strictly
speaking, not part of the logical development of mathematics. For this reason,
textbooks are not obliged to give it.35 However, because the definition of fraction
multiplication, while consistent with the definition of whole-number multiplication,
is more sophisticated than the latter, we will give some indication of this kind of
necessary "groping in the dark" to shed some light on the ultimate definition itself.
We have to be perfectly clear: what is in the next paragraph is not formal mathemat-
ics, but a rather fuzzy patchwork of educated guesses and wishful thinking—more
or less what takes place when one is struggling to solve a problem. The point is
not whether any particular step is correct or reliable, but whether what comes out
of this process turns out to be mathematically correct and consistent with common
sense. You can make up your mind on that after you have read through this section.
What follows in this paragraph (as announced) consists of heuristic arguments
about why we should define multiplication of fractions in a particular way (essen-
tially as given in Section 7.2 of [Wu2002]); it is not part of the logical development
33 See page xiv of the preface for a definition of TSM.
34 One suggestion was that "multiplication of fractions is about finding multiplicative rela-
tionships between multiplicative structures". This is poetic, but not mathematically helpful.
35 In any case, when such an effort is made in TSM, it is usually a mess; e.g., the motivation for defining $5^{-3}$ as $1/5^3$ becomes indistinguishable from a proof that $5^{-3} = 1/5^3$.

of fractions, and there is no pretense that any of what follows is logically correct.
It so happens that some of the following claims or guesses will eventually be proved
on the basis of the definition of multiplication on page 45, but until that happens,
nothing in this paragraph should be construed as proven and therefore usable for
reasoning. That said, let k and m be whole numbers; then the concept of k × m
is no mystery: it is the length of k copies of m (see page 12 for the meaning of
"copy"):
$$k \times m = \underbrace{m + m + \cdots + m}_{k}.$$
Now by analogy (and some wishful thinking but not by logic), we would like to believe that for a whole number k and for a fraction $\frac{m}{n}$, the multiplication $k \times \frac{m}{n}$ "should also be" the length of k copies of $\frac{m}{n}$; i.e.,
$$k \times \frac{m}{n} = \underbrace{\frac{m}{n} + \cdots + \frac{m}{n}}_{k}. \tag{1.20}$$

Adding the right side, we get
$$k \times \frac{m}{n} = \frac{km}{n}. \tag{1.21}$$
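Next, what "should be" a reasonable definition of $\frac{1}{\ell} \times \frac{m}{n}$ for a nonzero whole number ℓ? It seems "reasonable" to expect that the multiplication of fractions (like the multiplication of whole numbers) obeys the associative law. Granting this, we get
$$\ell \times \left(\frac{1}{\ell} \times \frac{m}{n}\right) = \left(\ell \times \frac{1}{\ell}\right) \times \frac{m}{n}.$$
By (1.21), we have $\ell \times \frac{1}{\ell} = \frac{\ell}{\ell} = 1$ and also $1 \times \frac{m}{n} = \frac{m}{n}$. Therefore,
$$\ell \times \left(\frac{1}{\ell} \times \frac{m}{n}\right) = \frac{m}{n}. \tag{1.22}$$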
We interpret (1.22) as follows: if A denotes $\frac{1}{\ell} \times \frac{m}{n}$, then the left side of (1.22) is equal to ℓ copies of A, by equation (1.20).36 Thus, (1.22) says that the length of ℓ copies of A is equal to $\frac{m}{n}$. Therefore, A is the length of one part when $\frac{m}{n}$ is divided into ℓ equal parts, or,
$$\frac{1}{\ell} \times \frac{m}{n} = \text{length of one part when } \frac{m}{n} \text{ is divided into } \ell \text{ equal parts.} \tag{1.23}$$
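Furthermore, we also observe that if m = 1 in (1.21), we get $k \times \frac{1}{n} = \frac{k}{n}$ for any whole numbers k and n (n ≠ 0). For a reason that will be clear presently, we change the notation and rewrite it as
$$\frac{k}{\ell} = k \times \frac{1}{\ell}.$$
And remembering that multiplication is assumed to obey the associative law, we finally get
$$\frac{k}{\ell} \times \frac{m}{n} = \left(k \times \frac{1}{\ell}\right) \times \frac{m}{n} = k \times \left(\frac{1}{\ell} \times \frac{m}{n}\right).$$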

36 A reminder: equation (1.20) is not a proven fact, only part of our wishful thinking of what the world "should" look like if we have our way.



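Therefore, by applying (1.20) and (1.23) in succession to the last expression, we have
$$\begin{aligned} \frac{k}{\ell} \times \frac{m}{n} &= \text{length of } k \text{ copies of } \left(\frac{1}{\ell} \times \frac{m}{n}\right) \\ &= \text{length of } k \text{ of the parts when } \frac{m}{n} \text{ is divided into } \ell \text{ equal parts.} \end{aligned} \tag{1.24}$$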
Now let us take stock of the situation. We have arrived at (1.24) by making a series of what we believe to be reasonable assumptions. If indeed everything turns out to be what we believe it ought to be, then (1.24) should be a reasonable definition of $\frac{k}{\ell} \times \frac{m}{n}$. At this point, this belief is nothing more than wishful thinking, but it is at least wishful thinking grounded in our collective mathematical experience. On this basis, we proceed to adopt the provisional definition (1.24).

The formal definition of multiplication

We now push the reset button. We will leave our heuristic discussion behind
and formalize our provisional definition in (1.24) in the following definition. From
this formal definition we will draw logical conclusions about fraction multiplication.

Definition. The product of two fractions $\frac{k}{\ell} \times \frac{m}{n}$ is by definition
the length of the concatenation of k of the parts
when $[0, \frac{m}{n}]$ is partitioned into ℓ equal parts.
(See page 12 for the meaning of "concatenation".)

Note that, according to the definition on page 24, we may rephrase the definition as
$$\frac{k}{\ell} \times \frac{m}{n} = \frac{k}{\ell} \text{ of } \frac{m}{n}. \tag{1.25}$$
The preceding definition of multiplication poses a potential problem, and we should deal with it right away. We first illustrate the problem with a simple example. Consider $\frac{1}{2} \times \frac{3}{4}$. We know that $\frac{1}{2} = \frac{2}{4}$ and $\frac{3}{4} = \frac{15}{20}$, and therefore we expect that, no matter how fraction multiplication is defined, we should have the equality
$$\frac{1}{2} \times \frac{3}{4} = \frac{2}{4} \times \frac{15}{20}.$$
By (1.25), this is equivalent to asking whether
$$\frac{1}{2} \text{ of } \frac{3}{4} = \frac{2}{4} \text{ of } \frac{15}{20}.$$
Now if this equality turns out to be incorrect, then it would mean the definition makes no sense, or in the usual language of mathematics, is not well-defined.
More generally, suppose
$$\frac{k}{\ell} = \frac{K}{L} \quad \text{and} \quad \frac{m}{n} = \frac{M}{N}.$$

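Then the question is whether the following equality is valid:
$$\frac{k}{\ell} \times \frac{m}{n} = \frac{K}{L} \times \frac{M}{N}. \tag{1.26}$$
By (1.25), this is equivalent to asking whether
$$\frac{k}{\ell} \text{ of } \frac{m}{n} = \frac{K}{L} \text{ of } \frac{M}{N}.$$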
By Theorem 1.3(ii) on page 27, the answer is affirmative. So we know that our
definition of the product of two fractions is well-defined.
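The definition and the well-definedness question can be explored numerically. The following Python sketch is only an illustration under stated assumptions, not part of the logical development: it models "k of the ℓ equal parts of [0, m/n]" by the refinement familiar from equivalent fractions, writing m/n as ℓm/(ℓn) so that each of the ℓ equal parts consists of m steps of length 1/(ℓn). The function name `k_parts_of` is hypothetical, `l` stands for ℓ, and Python's standard `fractions.Fraction` is used only to store the resulting lengths.

```python
from fractions import Fraction

def k_parts_of(k, l, m, n):
    """Length of the concatenation of k of the parts when [0, m/n] is
    partitioned into l equal parts (l and n nonzero whole numbers).

    Illustrative model: refine the unit into steps of length 1/(l*n).
    Then [0, m/n] = [0, (l*m)/(l*n)] consists of l*m such steps, so each
    of the l equal parts consists of (l*m) // l == m steps.
    """
    step = Fraction(1, l * n)          # length of one refined step
    steps_per_part = (l * m) // l      # each of the l parts has m steps
    return k * steps_per_part * step   # concatenate k of the parts


# The example in the text: 1/2 x 3/4 versus 2/4 x 15/20.
print(k_parts_of(1, 2, 3, 4))     # 3/8
print(k_parts_of(2, 4, 15, 20))   # 3/8
```

Both calls give 3/8, in line with the affirmative answer from Theorem 1.3(ii).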

The product formula

For the conceptual understanding of fraction multiplication, the following theorem is all-important.

Theorem 1.6 (Product formula). For all fractions $\frac{k}{\ell}$ and $\frac{m}{n}$, $\frac{k}{\ell} \times \frac{m}{n} = \frac{km}{\ell n}$.

Because of (1.25) above, this theorem is nothing more than a restatement of Theorem 1.3(i) on page 27. In TSM and in standard professional development materials, this formula is often presented as the definition of the product $\frac{k}{\ell} \times \frac{m}{n}$.
However, without the definition on page 45, it is difficult to teach the solving of
word problems involving fraction multiplication except by rote (see, e.g., Exercises
8 and 9 on page 54).
As an immediate corollary of the product formula, we have

Corollary. The multiplication of fractions obeys the associative, commutative, and distributive laws.

(See the appendix on page 86 for a summary of these laws.) We will leave the
detailed proof of the corollary to an exercise (Exercise 4 on page 53).
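As a quick sanity check (not a proof, which is the point of the exercise), one can test the three laws on many sample fractions. This Python sketch implements multiplication by the product formula directly on numerators and denominators; the helper name `times` and the sample range are arbitrary choices, and addition is delegated to Python's `fractions.Fraction`, which adds fractions exactly.

```python
from fractions import Fraction
from itertools import product

def times(x, y):
    """Multiply two fractions by the product formula: (k/l) x (m/n) = km/(ln)."""
    return Fraction(x.numerator * y.numerator, x.denominator * y.denominator)

# All fractions p/q with 0 <= p <= 4 and 1 <= q <= 4 (duplicates are harmless).
samples = [Fraction(p, q) for p in range(5) for q in range(1, 5)]

for x, y, z in product(samples, repeat=3):
    assert times(x, y) == times(y, x)                      # commutative law
    assert times(times(x, y), z) == times(x, times(y, z))  # associative law
    assert times(x, y + z) == times(x, y) + times(x, z)    # distributive law

print("all checks passed")
```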
There are two immediate consequences of our definition of fraction multiplication. The first is the interpretation of division by a (nonzero) whole number as multiplication. Recall that we defined on page 29 the concept of k ÷ ℓ for two arbitrary whole numbers k and ℓ (ℓ ≠ 0) to mean the length of one part when [0, k] is partitioned into ℓ equal parts. But from the definition of fraction multiplication (page 45), $\frac{1}{\ell} \times k$ has exactly the same meaning: the length of one part when [0, k] is partitioned into ℓ equal parts. Therefore,
$$k \div \ell = \frac{1}{\ell} \times k \quad \text{for whole numbers } k \text{ and } \ell,\ \ell \neq 0. \tag{1.27}$$
We can put equation (1.27) into a more general context. (For a reason that will be obvious, we are going to change the notation in the definition on page 29 from m and n to k and ℓ, respectively.) While the definition of k ÷ ℓ on page 29 for any two whole numbers k and ℓ (ℓ ≠ 0) is a generalization of the partitive division of k ÷ ℓ when k is a whole-number multiple of ℓ, this definition can itself be generalized by replacing k with a fraction A. Indeed, we now define the division of a fraction A by a nonzero whole number ℓ to be the length of one part when [0, A] is partitioned into ℓ equal parts. Then it follows from the definition of fraction multiplication that $\frac{1}{\ell} \times A$ is equal to A divided by ℓ in the sense just defined. For

this reason:
Division (of a whole number or a fraction) by a nonzero whole number ℓ will henceforth be replaced by multiplication by $\frac{1}{\ell}$.
Incidentally, in view of the product formula, equation (1.27) implies that $k \div \ell = \frac{k}{\ell}$, which is exactly Theorem 1.4 on page 29. This then provides a different perspective on Theorem 1.4.
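The convention just adopted can be illustrated with a short Python sketch (an illustration only; the function name `divide_by_whole` is hypothetical and `l` stands for ℓ). It computes A ÷ ℓ as (1/ℓ) × A and checks the defining property of the ℓ equal parts: concatenating ℓ copies of one part recovers [0, A]. For a whole number k it also reproduces k ÷ ℓ = k/ℓ, as in Theorem 1.4.

```python
from fractions import Fraction

def divide_by_whole(a, l):
    """Length of one part when [0, a] is partitioned into l equal parts,
    computed as (1/l) x a, following (1.27) and its generalization."""
    if l == 0:
        raise ValueError("l must be a nonzero whole number")
    return Fraction(1, l) * a

part = divide_by_whole(Fraction(5, 6), 4)
print(part)                               # 5/24
print(sum([part] * 4) == Fraction(5, 6))  # True: 4 copies of one part give back 5/6
print(divide_by_whole(7, 3))              # 7/3
```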
A second immediate consequence of the definition of fraction multiplication is that if k is a whole number, then according to the definition of multiplication on page 45, $k \times \frac{m}{n} = \frac{k}{1} \times \frac{m}{n}$ = the length of the concatenation of k segments of length $\frac{m}{n}$. By the definition of addition (page 33), the latter is equal to adding k copies of the fraction $\frac{m}{n}$:
$$k \times \frac{m}{n} = \underbrace{\frac{m}{n} + \cdots + \frac{m}{n}}_{k}. \tag{1.28}$$
In other words, the multiplication $k \times \frac{m}{n}$ retains the meaning of repeated addition.
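Equation (1.28) can be spot-checked in Python. This minimal sketch assumes only the standard `fractions` module; the helper `add_copies` is a hypothetical name. It adds k copies of m/n one at a time and compares the total with km/n, the value given by the product formula.

```python
from fractions import Fraction

def add_copies(k, x):
    """Add k copies of the fraction x, one segment at a time."""
    total = Fraction(0)
    for _ in range(k):
        total += x
    return total

for k in range(6):
    for m in range(6):
        for n in range(1, 6):
            # (1.28) together with the product formula: k x m/n = km/n
            assert add_copies(k, Fraction(m, n)) == Fraction(k * m, n)

print(add_copies(3, Fraction(2, 7)))   # 6/7
```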
As is well known, the product formula has numerous applications. One of the
simplest may be the explanation of the usual cancellation rule for fractions. For
example, we have
$$\frac{135}{28} \times \frac{49}{9} = \frac{105}{4}$$
because we can "cancel" the 9’s and 7’s in the numerators and denominators. The precise reasoning is the following. By the product formula,
$$\frac{135}{28} \times \frac{49}{9} = \frac{135 \times 49}{28 \times 9}.$$
Therefore,
$$\frac{135}{28} \times \frac{49}{9} = \frac{15 \times 9 \times 7 \times 7}{4 \times 7 \times 9} = \frac{15 \times 7}{4} = \frac{15}{4} \times \frac{7}{1},$$
where we have made use of equivalent fractions. The same reasoning of course proves that for any fractions $\frac{ma}{n}$ and $\frac{k}{a}$, we have the general cancellation rule for fractions:
$$\frac{ma}{n} \times \frac{k}{a} = \frac{mk}{n}. \tag{1.29}$$

Areas of rectangles

A more substantial application of the product formula is undoubtedly the following interpretation of fraction multiplication in terms of area;37 this interpretation is as basic as the definition (of fraction multiplication) itself. We will prove
that the area of a rectangle is equal to the product of its sides. Let us first review
some basic properties of area. We fix a unit square, i.e., a square whose sides all
have length 1. The area of the unit square is by definition equal to 1. A geometric
figure is by definition just a subset of the plane. We will assume that congruent
geometric figures have the same area. A geometric figure S is said to be paved by
37 See the footnote on page 17 concerning the concepts of congruence and area. You may take both in the naive sense until Chapter 4 of this volume and Chapter 4 in [Wu2020c], respectively.

a collection of geometric figures $S_1, \ldots, S_n$ if the union of $S_1, \ldots, S_n$ is S and if $S_1, \ldots, S_n$ overlap only at their boundaries. A fundamental fact is that if $S_1, \ldots, S_n$ pave S, then the area of S is the sum of the areas of $S_1, \ldots, S_n$. If the unit square is paved by n congruent pieces, then all these pieces have equal areas and, consequently, these pieces give rise to a division of the unit (area of the unit square) into n equal parts; by the definition of the fraction $\frac{1}{n}$, the area of each piece is $\frac{1}{n}$. For example, each of the following shaded regions of the unit square has area equal to $\frac{1}{4}$:

[Figure: shaded regions of the unit square, each of area 1/4]

The interpretation of fraction multiplication in question is the following theorem.

Theorem 1.7. The area of a rectangle with sides of lengths $\frac{k}{\ell}$ and $\frac{m}{n}$ is equal to
$$\frac{k}{\ell} \times \frac{m}{n}.$$

In school mathematics, this theorem is the basis of the statement that "area is length times width". It is sobering to realize that in TSM, there is no explanation of Theorem 1.7 except (perhaps) the case of $\frac{k}{\ell}$ and $\frac{m}{n}$ being whole numbers.38 When the lengths of the sides of a rectangle are numbers that may not be fractions,39 the theorem continues to hold, but the reasoning becomes more sophisticated. This issue will be considered in Section 4.4 of [Wu2020c].
Before giving the proof of Theorem 1.7, let us work out a special case: why is the area of a rectangle with side lengths $\frac{2}{3}$ and $\frac{5}{2}$ equal to $\frac{2}{3} \times \frac{5}{2}$? First consider a simpler case: why does a rectangle with side lengths $\frac{1}{3}$ and $\frac{1}{2}$ have area equal to $\frac{1}{3} \times \frac{1}{2}$? Such a rectangle is related to the unit square in the following way. Let one pair of opposite sides of the unit square be divided into 3 parts of equal length and the other pair into 2 halves. By joining corresponding points of the division on opposite sides, we obtain a paving of the unit square by 6 small rectangles. Each of these 6 small rectangles has side lengths $\frac{1}{3}$ and $\frac{1}{2}$, and they are congruent to each other (geometrically obvious, but see Exercises 4 and 7 on page 237).

38 Note that CCSSM ([CCSSM]) shows awareness of the significance of Theorem 1.7.
39 In other words, irrational numbers.

[Figure: the unit square paved by 3 × 2 congruent small rectangles with side lengths 1/3 and 1/2]
These 3 × 2 small rectangles are congruent and therefore have equal areas. The areas of these rectangles thus form an equal partition of the unit 1 (= area of unit square) into 3 × 2 equal parts. Consequently, the area of each small rectangle is the fraction $\frac{1}{3 \times 2}$. By the product formula, we have
$$\frac{1}{3 \times 2} = \frac{1}{3} \times \frac{1}{2}.$$
We have just shown that a rectangle with side lengths $\frac{1}{3}$ and $\frac{1}{2}$ has area $\frac{1}{3} \times \frac{1}{2}$.
Now we can tackle the case of sides with lengths $\frac{2}{3}$ and $\frac{5}{2}$. Let a rectangle R with side lengths $\frac{2}{3}$ and $\frac{5}{2}$ be given. We want to show that its area is $\frac{2}{3} \times \frac{5}{2}$. The key observation here is that R is paved by rectangles with side lengths $\frac{1}{3}$ and $\frac{1}{2}$. Precisely, it is paved by 2 rows of 5 such rectangles (the 2 being from the numerator of $\frac{2}{3}$ and the 5 being from the numerator of $\frac{5}{2}$), as shown below.

[Figure: the rectangle R paved by 2 rows of 5 small rectangles with side lengths 1/3 and 1/2]
Since each of these 2 × 5 small rectangles has area equal to $\frac{1}{3 \times 2}$, the area of R is the sum of 2 × 5 of these areas; i.e.,
$$\text{area of } R = \underbrace{\frac{1}{3 \times 2} + \cdots + \frac{1}{3 \times 2}}_{2 \times 5}.$$
By (1.28) on page 47, the right side is equal to
$$(2 \times 5) \times \frac{1}{3 \times 2} = \frac{2 \times 5}{3 \times 2} = \frac{2}{3} \times \frac{5}{2},$$
where we have used the product formula twice. Therefore Theorem 1.7 is proved for this special case.
A little reflection on the preceding reasoning will reveal that the specific numbers, $\frac{2}{3}$ and $\frac{5}{2}$, do not play a crucial role, and the reasoning is valid in general.
Therefore we can simply write down the general proof along exactly the same lines.

Proof of Theorem 1.7. The proof is broken up into two parts: Part I: the case
where the lengths of the sides are unit fractions, i.e., fractions with numerators equal
to 1 (see page 10) and Part II: the general case. The proof will take for granted
certain geometrically obvious facts about rectangles that can be easily proved once
the needed tools are available (see Exercises 4 and 7 on page 237).

Part I. We are going to prove that if a rectangle has two sides of lengths $\frac{1}{\ell}$ and $\frac{1}{n}$, then its area is $\frac{1}{\ell} \times \frac{1}{n}$.
On the number line, let the unit 1 be the area of the unit square. We will show how to divide this particular unit segment into ℓn equal parts, as follows. Partition one pair of opposite sides of the unit square into ℓ parts of equal length and the other pair into n parts of equal length. Join the corresponding points of the division to obtain a paving of the unit square by ℓn congruent rectangles, each having side lengths $\frac{1}{\ell}$ and $\frac{1}{n}$, as shown:

[Figure: the unit square paved by ℓ rows of n congruent rectangles, each with side lengths 1/ℓ and 1/n; one of them is shaded]

These ℓn rectangles have equal areas because they are congruent. The unit square therefore has been partitioned into ℓn equal parts (in terms of area) by these rectangles. Consider the shaded rectangle in the picture. By the definition of a fraction (in the case that the unit is the area of the unit square), its area is the fraction $\frac{1}{\ell \times n}$ on this number line because it is one part when the unit (area of the unit square) is partitioned into ℓn equal parts. By the product formula, its area is equal to
$$\frac{1}{\ell \times n} = \frac{1}{\ell} \times \frac{1}{n}.$$
The proof of Part I is complete.

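Part II. We are going to prove that the area of a rectangle with sides of length $\frac{k}{\ell}$ and $\frac{m}{n}$ is $\frac{k}{\ell} \times \frac{m}{n}$.
Let R be a rectangle with sides of length $\frac{k}{\ell}$ and $\frac{m}{n}$. Then R is paved by k rows of m rectangles each of which has side lengths $\frac{1}{\ell}$ and $\frac{1}{n}$, as shown:

[Figure: the rectangle R paved by k rows of m small rectangles, each with side lengths 1/ℓ and 1/n]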

We have just seen that each of these small rectangles has area equal to $\frac{1}{\ell n}$. Since R is paved by km of these congruent small rectangles, the area of R is equal to
$$\underbrace{\frac{1}{\ell n} + \cdots + \frac{1}{\ell n}}_{km} = km \times \frac{1}{\ell n} = \frac{km}{\ell n} = \frac{k}{\ell} \times \frac{m}{n},$$
where we have used (1.28) on page 47 and the product formula. The proof of Theorem 1.7 is now complete.
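The counting in this proof can be mirrored in a few lines of Python. This is a sketch of the bookkeeping only, since no geometry is computed: the hypothetical function below takes the area 1/(ℓn) of each small rectangle from Part I, multiplies it by the number km of small rectangles from Part II using (1.28), and the result can be compared with k/ℓ × m/n. Here `l` stands for ℓ.

```python
from fractions import Fraction

def area_by_paving(k, l, m, n):
    """Area of a rectangle with sides k/l and m/n, counted as in the proof
    of Theorem 1.7 (l and n nonzero whole numbers)."""
    small_area = Fraction(1, l * n)   # Part I: one of l*n congruent pieces of the unit square
    tiles = k * m                     # Part II: k rows of m small rectangles
    return tiles * small_area         # (1.28): km copies of 1/(ln)

print(area_by_paving(2, 3, 5, 2))        # 5/3, the special case 2/3 by 5/2
print(Fraction(2, 3) * Fraction(5, 2))   # 5/3
```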

Pedagogical Comments. The preceding proof of Theorem 1.7 is very instructive in that it is the first example of a proof that is achieved by neatly breaking it
up into two smaller steps. First, we prove it for the case where the side lengths are
unit fractions and then, on the basis of this special case, we prove with ease the
general case. The general strategy of breaking a complex task into several simple
tasks is basic in mathematics. While it will not work for every proof, it works often
enough to make it worth learning. For example, see the discussion of the proofs of
ASA on page 249, the proofs of SSS and FTS in Sections 6.2 and 6.4, respectively,
of [Wu2020b], and the proof of Theorem G52 in Section 6.8 of [Wu2020b]. End of
Pedagogical Comments.

Three remarks

We round off our discussion of the multiplication of fractions with three remarks. The first is the explanation of the usual multiplication algorithm for finite decimals. Consider for example
1.25 × 0.0067.
The algorithm says to
(α) multiply the two numbers as if they are whole numbers by
ignoring the decimal points,
(β) count the total number of decimal digits of the two decimal
numbers, say p, and
(γ) put the decimal point back in the whole number obtained in
(α) so that it has p decimal digits.
We now justify the algorithm using this example, noting at the same time that the
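reasoning in the general case is the same. By the product formula, we have
$$1.25 \times 0.0067 = \frac{125}{10^2} \times \frac{67}{10^4} = \frac{125 \times 67}{10^2 \times 10^4}.$$
Noting that 125 × 67 = 8375 and recalling that $10^m \times 10^n = 10^{m+n}$ for all nonzero whole numbers m and n, we get
$$\begin{aligned} 1.25 \times 0.0067 &= \frac{8375}{10^2 \times 10^4} && \text{(corresponding to } (\alpha)\text{)} \\ &= \frac{8375}{10^{2+4}} && \text{(corresponding to } (\beta)\text{)} \\ &= 0.008375 && \text{(corresponding to } (\gamma)\text{).} \end{aligned}$$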
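Steps (α), (β), and (γ) can be written as a small Python function that works on decimal strings. This is a sketch under simplifying assumptions: both inputs are nonnegative finite decimals written with digits and at most one decimal point (no signs or exponents), and the name `multiply_decimals` is hypothetical. The standard `fractions.Fraction` constructor, which accepts decimal strings, is used only to double-check the answer.

```python
from fractions import Fraction

def multiply_decimals(x: str, y: str) -> str:
    """Multiply two nonnegative finite decimals given as strings,
    following steps (alpha), (beta), (gamma) of the algorithm."""
    def digits_and_places(s):
        whole, _, frac = s.partition(".")
        # e.g. "0.0067" -> (67, 4): the digits read as a whole number,
        # and the number of decimal digits
        return int(whole + frac), len(frac)

    a, p1 = digits_and_places(x)
    b, p2 = digits_and_places(y)
    product = a * b                        # (alpha) multiply as whole numbers
    p = p1 + p2                            # (beta) total number of decimal digits
    if p == 0:
        return str(product)
    s = str(product).rjust(p + 1, "0")     # pad so at least one digit precedes the point
    return s[:-p] + "." + s[-p:]           # (gamma) put the decimal point back

result = multiply_decimals("1.25", "0.0067")
print(result)                                                      # 0.008375
print(Fraction(result) == Fraction("1.25") * Fraction("0.0067"))   # True
```

The padding step handles products with fewer digits than p, as happens here: 8375 has only four digits but needs six decimal places.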



A second remark is about a different perspective on the number line in terms of multiplication. Let a number line be given; we will call this the original number line. Suppose on the original number line we make a different choice of 0 and 1 (see page 5) by designating two points A and B as the new 0 and 1, respectively. With respect to this new choice of A and B as 0 and 1, let P denote the fraction $\frac{m}{n}$ on the new number line, by which we mean the number line which has A and B as 0 and 1, respectively. The question is how to locate the point P. We claim that P is the point so that P and B are either both to the right of A or both to the left of A and so that
$$|AP| = \frac{m}{n} \times |AB|, \tag{1.30}$$
where |AP| and |AB| are the lengths of the segments AP and AB with respect to the original number line and the fraction $\frac{m}{n}$ refers to the original number line. (To make sense of the right side of (1.30), we will assume that |AB| is a fraction for now. Once we have the real numbers, we can drop this restriction and the subsequent argument will make sense in general.)
We will prove (1.30) for the special case that $\frac{m}{n}$ is $\frac{25}{7}$, and it will be seen that the proof of the general case is not different. For simplicity, let B be to the right of A; the argument for the case where B is to the left of A is entirely similar. Since $\frac{25}{7} = 3\frac{4}{7} = 3 + \frac{4}{7}$ and since P represents $\frac{25}{7}$ on the new number line, if we replicate the segment [A, B] two more times to the right of 0, we get to the point representing the number 3 of the new number line. Now in the next segment of the same length as [A, B] but to the right of this 3, if we divide it into 7 equal parts, then 4 "small steps" from 3 to the right will get to P, as shown:

[Figure: the new number line, with A at 0, B at 1, the point 3, and P]

By the definitions of the addition of fractions and by (1.25) on page 45, we have
$$|AP| = (3 \times |AB|) + \left(\frac{4}{7} \times |AB|\right) = \left(3 + \frac{4}{7}\right) \times |AB| = \frac{25}{7} \times |AB|.$$
This proves (1.30) when $\frac{m}{n}$ is equal to $\frac{25}{7}$.
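Claim (1.30) can be phrased as a coordinate computation on the original number line. The following Python sketch assumes, as the text does for now, that A and B sit at fractional positions; the name `new_line_point` is hypothetical. Python's signed arithmetic covers both the case where B is to the right of A and the case where it is to the left, whereas the text handles the second case separately because negative numbers are not yet available. The check retraces the argument for 25/7: three copies of [A, B], then 4 of the 7 small steps.

```python
from fractions import Fraction

def new_line_point(a, b, r):
    """Original coordinate of the point P that represents the fraction r on the
    new number line whose 0 and 1 are at the original coordinates a and b."""
    return a + r * (b - a)   # |AP| = r x |AB|, on the same side of A as B

a, b = Fraction(2), Fraction(5, 2)        # |AB| = 1/2, B to the right of A
p = new_line_point(a, b, Fraction(25, 7))
three = a + 3 * (b - a)                   # the point 3 on the new line
print(p == three + Fraction(4, 7) * (b - a))        # True: 4 small steps past 3
print(abs(p - a) == Fraction(25, 7) * abs(b - a))   # True: (1.30)
```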
In TSM, the argument would go something like this: if we compare the two number lines, then $\frac{25}{7}$ is to 1 (length of [0, 1]) as |AP| is to |AB| by proportional reasoning. Thus
$$\frac{25/7}{1} = \frac{|AP|}{|AB|}.$$
By the cross-multiplication algorithm, we get $|AP| = \frac{25}{7} \times |AB|$ again. However,
since we do not know what proportional reasoning is, this argument has no validity.
A third and final remark is that there are two standard inequalities concerning
multiplication that are worth knowing: if A, B, C, and D are fractions, then:
(i) If A > 0, then AB < AC is equivalent to B < C.
(ii) A < B and C < D imply AC < BD.
Both are obvious when we interpret fraction multiplication as the area of a rectan-
gle. See Exercise 3 on page 53.
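Before proving these inequalities, it may be reassuring to test them on random data. The Python sketch below draws random fractions with the standard `random` module (the seed, ranges, and sample size are arbitrary choices) and checks (i) and (ii); as everywhere in this chapter, the fractions are nonnegative. Passing such a test is evidence, not a proof.

```python
import random
from fractions import Fraction

random.seed(0)

def random_fraction():
    """A random fraction p/q with 0 <= p <= 20 and 1 <= q <= 20."""
    return Fraction(random.randint(0, 20), random.randint(1, 20))

for _ in range(10_000):
    A, B, C, D = (random_fraction() for _ in range(4))
    if A > 0:
        assert (A * B < A * C) == (B < C)   # inequality (i)
    if A < B and C < D:
        assert A * C < B * D                # inequality (ii)

print("no counterexample found")
```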

Pedagogical Comments. Multiplication is probably the most misunderstood arithmetic operation among the usual four in the study of fractions. The simplicity
of the product formula (Theorem 1.6 on page 46), compared with the somewhat
mystifying addition formula ((1.12) on page 33) and invert-and-multiply ((1.33) on
page 57 below), has misled many educators and mathematicians to believe that
multiplication is the easiest of the four operations to teach. In a typical passage
in TSM, it is said that, "Multiplying fractions is usually simpler than adding frac-
tions. To find the product, you just multiply the numerators and multiply the
denominators."
We know that multiplication is in fact complicated. There are different ways to
define multiplication, but no matter how it is done, there are complications. In this
volume, it is defined in terms of the concept of "of " on page 24 and Theorem 1.3 on
page 27. One also has to prove the product formula using the precise definition and
then prove the area formula of a rectangle, Theorem 1.7 on page 48, to round off
its conceptual foundation. There is subtlety in the reasoning of all these theorems
and definitions. What is even more important is the fact that, without a solid
understanding of multiplication, it is impossible to approach division, as the next
section amply shows. The usual difficulty with the division of fractions is less a
statement about the difficulty of the concept of division per se but more about
students’ lack of a good grounding in the multiplication of fractions in TSM.
Multiplication has to be taught carefully if one hopes to have any success teach-
ing the division of fractions. End of Pedagogical Comments.

Exercises 1.4.
(1) Do each of the following without calculators.
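(a) $\left(12\frac{2}{3} \times 12\frac{2}{3} \times 12\frac{2}{3}\right) \times \left(2\frac{1}{19} \times 2\frac{1}{19} \times 2\frac{1}{19}\right) \times \frac{1}{26}$.
(b) $\left(\frac{7}{18} \times 4\frac{2}{3}\right) + \left(2\frac{1}{6} \times \frac{7}{18}\right) + \left(\frac{7}{18} \times 3\frac{1}{6}\right)$.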
(c) $8\frac{2}{50} \times 1250\frac{1}{2}$.
(2) (i) Prove that $123.45 \times \frac{1}{10^4} = 0.012345$.
(ii) Prove that $67.8901 \times 10^3 = 67890.1$.
(iii) Formulate and prove generalizations of parts (i) and (ii).
(3) Prove the following for fractions A, B, C, and D (see page 52):
(i) If A > 0, then AB < AC is equivalent to B < C.
(ii) A < B and C < D imply AC < BD.
(4) Give a detailed proof of the corollary to Theorem 1.6 on page 46.
(5) Consider the following two numbers A and B, where
A is the length of the concatenation (see page 12) of 4 parts when $[0, \frac{18}{29}]$ is divided into 17 equal parts. B is the length of the concatenation of 18 parts when $[0, \frac{4}{17}]$ is divided into 29 equal parts.
Is A equal to B? Why or why not?
(6) (a) Find a fraction q so that $28\frac{1}{2} = q \times 5\frac{1}{4}$. Do the same for $218\frac{1}{7} = q \times 19\frac{1}{2}$.
(b) Make up a (realistic) word problem for each situation, and make sure
that the problems are not the same for both.
(7) The perimeter of a rectangle is by definition the sum of the lengths of
its four sides. Show that given a fraction A and a fraction L, (a) there is

a rectangle with area equal to A but with a perimeter that is bigger than
L and (b) there is a rectangle with perimeter equal to L but with an area
that is less than A.
(8) (a) $16\frac{1}{2}$ cups of liquid would fill a punch bowl. If the capacity of the cup is $9\frac{1}{3}$ fluid ounces, what is the capacity of the punch bowl? Explain carefully. (b) The length of a rod is $18\frac{5}{8}$ times the length of a short piece that is $3\frac{1}{4}$ inches long. How long is the rod? Explain.
(9) How many buckets of water would fill a container if the capacity of the bucket is $3\frac{1}{3}$ gallons and that of the container is $7\frac{1}{2}$ gallons? (Caution:
Getting an answer for this exercise is easy, but explaining it logically is
not.)
(10) Give a proof of the distributive law for the division of whole numbers;
namely, let k, m, n be whole numbers, and let n > 0. Then
(m ÷ n) + (k ÷ n) = (m + k) ÷ n.

(11) (This is Exercise 8 on page 31. Now do it again using the concept of
fraction multiplication.) James gave a riddle to his friends: "I was on a
hiking trail, and after walking $\frac{7}{9}$ of a mile, I was $\frac{4}{9}$ of the way to the end.
How long is the trail?"
(12) The difference of two given fractions is equal to $\frac{4}{5}$ of the smaller one, while their sum is equal to $\frac{28}{15}$. What are the fractions? (Hint: Use the number line.)

1.5. Dividing fractions


Understanding the division of fractions requires the understanding of the divi-
sion of whole numbers and the multiplication of fractions. After a review of the
former, we give the definition of fraction division that is patterned closely after the
division of whole numbers. The precise definition leads immediately to the invert-
and-multiply rule. As another application of the precise definition, we solve a word
problem (on page 58) using only reasoning based on the definition of division, but
nothing involving analogies or metaphors as is the custom in TSM. The discus-
sion in the last subsection on the division of finite decimals is only a beginning and
cannot be concluded until Chapter 3 of [Wu2020c].
Division of whole numbers, revisited (p. 54)
Definition of the division of fractions (p. 57)
A typical word problem (p. 58)
Division of finite decimals (p. 60)

Division of whole numbers, revisited

The overriding fact concerning the concept of division is that (in a sense to be made precise in (**) on page 57) division is an alternate, but equivalent, way of writing multiplication.40 Unfortunately, this simple fact has been corrupted in
40 Mathematical Aside: In abstract algebra, this fact is expressed incisively as follows: the division by a nonzero element x is by definition the multiplication by the multiplicative inverse of x.
1.5. DIVIDING FRACTIONS 55

TSM to read: "Division is the inverse operation of multiplication." We will give some indication below (see (*) on page 55) as to why this is nonsensical.
We first review the division of whole numbers.
We teach children that $\frac{36}{9} = 4$ because 4 × 9 = 36. This then is the statement that 36 divided by 9 is the whole number which, when multiplied by 9, gives 36. In symbols, we may express the foregoing as follows: $\frac{36}{9}$ is by definition the number k which satisfies k × 9 = 36. Likewise, $\frac{84}{7}$ is the whole number m which satisfies m × 7 = 84, etc. In general, given any two whole numbers m and n with n ≠ 0, the division $\frac{m}{n}$ is the whole number q so that qn = m, and since we only have whole numbers at our disposal, this cannot make sense unless m is a multiple of n to begin with. Notice that we say "the" whole number q, because this q is unique; i.e., there is only one such number. In elementary school, the "uniqueness" is (understandably) not emphasized, but we have to emphasize it now in anticipation of the division of fractions. Let us make sure that this uniqueness is actually true by showing that if there is another whole number q′ so that q′n = m, then q′ = q. Indeed, q′n = qn since both are equal to m. Therefore (q′ − q)n = q′n − qn = 0, and since n ≠ 0, q′ − q = 0, and we have q′ = q. So this q is indeed unique.
The precise formulation of the concept of division among whole numbers is then the following:
If m and n are whole numbers, with n ≠ 0 and m being a multiple of n, then the division of m by n, denoted by $\frac{m}{n}$, is the unique whole number q so that m = qn.41
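The definition can be made concrete with a brute-force Python sketch (for illustration only; the function name is hypothetical). It searches the whole numbers 0, 1, …, m for every q with qn = m, so in each case it is run on it also confirms that there is exactly one such q, which is the uniqueness just proved. It refuses n = 0, and it refuses any m that is not a multiple of n, just as the definition does.

```python
def whole_number_quotient(m, n):
    """Return the unique whole number q with q * n == m, where n != 0 and
    m is a multiple of n (m, n whole numbers)."""
    if n == 0:
        raise ValueError("division by 0 has no meaning")
    # Since n >= 1, any q with q * n == m satisfies q <= m.
    solutions = [q for q in range(m + 1) if q * n == m]
    if not solutions:
        raise ValueError(f"{m} is not a multiple of {n}")
    assert len(solutions) == 1   # uniqueness of the quotient
    return solutions[0]

print(whole_number_quotient(36, 9))   # 4
print(whole_number_quotient(84, 7))   # 12
```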
Once we have this precise definition of division, we can assert the following:
(*) Given whole numbers m, n, and q, with n = 0 and m being
a multiple of n, m
n = q is equivalent to m = qn
This is the correct formulation of the confusing statement in TSM that "division is
the inverse operation of multiplication" when the concept of division is introduced.
The difference between (*) and the latter statement could not be more stark. (*)
is a straightforward statement, after both multiplication and division have been
defined, that the two simple facts, m n = q and m = qn, are equivalent. The latter
statement, however, is usually made as an attempt to inform students about what
"division" means. But the statement utterly fails to be informative because stu-
dents have no idea of what an "inverse operation" is.
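For readers who like to experiment, the definition and (*) can be mirrored in a few lines of Python. This is only an illustrative sketch: the function name `whole_division` is our own (hypothetical), and it simply refuses the inputs that the definition excludes, namely $n = 0$ or $m$ not a multiple of $n$.

```python
def whole_division(m: int, n: int) -> int:
    """Return the unique whole number q with q * n == m.

    Only defined when n != 0 and m is a multiple of n, as in the
    definition of whole-number division; otherwise raise ValueError.
    """
    if n == 0:
        raise ValueError("division by 0 has no meaning")
    if m % n != 0:
        raise ValueError(f"{m} is not a multiple of {n}")
    q = m // n
    assert q * n == m  # (*): m / n = q is equivalent to m = q * n
    return q


print(whole_division(36, 9))  # 4
print(whole_division(84, 7))  # 12
```

For instance, `whole_division(7, 5)` raises an error, in keeping with the fact that 7 is not a multiple of 5.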

Mathematical Aside: The statement "division is the inverse operation of multiplication" does not make any sense whatsoever, because an operation on the whole numbers $\mathbb{N}$ is a mapping $F : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$, so that an "inverse operation" would have to be a map $G : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ so that $F \circ G = I$ and $G \circ F = I$, where we have used $I$ as a generic symbol for the identity mappings on $\mathbb{N}$ and $\mathbb{N} \times \mathbb{N}$. Thus with $F : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ defined by $F(m, n) = mn$ (multiplication of $m$ and $n$), clearly there can be no such $G$. For example, although $F(3, 8) = 24$, it is also true that $F(4, 6) = 24$ and even $F(12, 2) = 24$. So how can we define $G$? Should $G(24)$ be $(3, 8)$ or $(4, 6)$ or $(12, 2)$? What TSM should have said is that, with a whole number $m$ fixed and $m$ being a multiple of a nonzero whole number $n$, it is always the case that $(m \times n) \div n = m$ and $(m \div n) \times n = m$. But as we have indicated all along, TSM is simply incapable of achieving such precision.

41 This precise definition of division provides a simple explanation of why division of a nonzero number by 0 has no meaning: if it had meaning, then for a nonzero whole number $m$, $\frac{m}{0}$ would be the whole number $q$ so that $q \times 0 = m$. But the last equation cannot hold because the product on the left side is equal to 0 whereas the right side $m$ is nonzero.

56 1. FRACTIONS
The preceding definition of division among whole numbers is important for
the understanding of division among fractions because the latter is patterned after
the former, with one important caveat. The definition of whole number division
m ÷ n makes sense only when m is a multiple of n, but, with n fixed and n > 1,
there are relatively few whole numbers m that are multiples of n. Our first task in
approaching the division of fractions is to remove this restrictive condition in our
prospective definition of fraction division by showing that, given a nonzero fraction
B, every fraction is a (fractional) multiple of B, as the following theorem shows.

Theorem 1.8. Given fractions $A$ and $B$ ($B \neq 0$), there is a unique fraction $C$ so that $A = CB$.

Proof. If $A = 0$, we may let $C = 0$. So we will assume $A \neq 0$ from now on and let $A = \frac{k}{\ell}$ and $B = \frac{m}{n}$, where $k, \ell, m, n$ are nonzero whole numbers.
Let us first prove the uniqueness of a fraction $C$ so that $A = CB$; i.e., regardless of whether there is such a fraction $C$ or not, we want to be assured that there can be at most one such fraction. So suppose $A = CB$. Then
$$\frac{k}{\ell} = C \times \frac{m}{n}.$$
Multiplying both sides by $\frac{n}{m}$ yields $\frac{kn}{\ell m} = C$. This shows that if there is a fraction $C$ so that $A = CB$, then there is only one possibility for $C$: $C = \frac{kn}{\ell m}$. This conclusion proves the uniqueness of $C$, but it says more. It also tells us how to prove existence: simply let $C = \frac{kn}{\ell m}$, and it is clear that $A = CB$ by virtue of the cancellation rule (1.29) on page 47. The theorem is proved.

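The proof of Theorem 1.8 shows explicitly how to get the fraction $C$ so that $CB = A$: if $A = \frac{k}{\ell}$ and $B = \frac{m}{n}$, then the proof gives $C$ as
$$C = \frac{kn}{\ell m} = \frac{k}{\ell} \times \frac{n}{m}.$$
In other words,
$$\text{if } C \times \frac{m}{n} = \frac{k}{\ell}, \text{ then } C = \frac{k}{\ell} \times \frac{n}{m}. \tag{1.31}$$
This fact will be useful below.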
Despite the simplicity of the statement, Theorem 1.8 is conceptually subtle and may take some getting used to. As mentioned above, it says that if a fraction $B$ is nonzero, then every fraction $A$ is a fractional multiple of $B$, in the sense that $A = CB$ for some fraction $C$. (Note that, since we are no longer dealing exclusively with whole numbers, the meaning of multiple has to be suitably modified. In the future, if we want to indicate that there is a whole number $C$ so that $A = CB$, we will say explicitly that $A$ is a whole number multiple of $B$.) Taking $A = 1$, Theorem 1.8 implies that there is exactly one fraction, which we will denote by $B^{-1}$, so that $B^{-1}B = 1$. We call this $B^{-1}$ the inverse (or multiplicative inverse, to be precise) of $B$. In fact, (1.31) shows that
$$\text{if } B = \frac{m}{n}, \text{ then } B^{-1} = \frac{n}{m}. \tag{1.32}$$

For this reason, $B^{-1}$ is also called the reciprocal of $B$ in the context of fractions. Using this notation, the expression of $C$ in equation (1.31) above can be rewritten as
$$C = AB^{-1}.$$
Thus, if $A = CB$, then $C = AB^{-1}$. For example, if $A = \frac{11}{5}$ and $B = \frac{23}{8}$, then the $C$ that satisfies $C\left(\frac{23}{8}\right) = \frac{11}{5}$ is, according to (1.31),
$$C = \frac{11}{5} \times \frac{8}{23} = \frac{88}{115}.$$
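The recipe in the proof of Theorem 1.8, $C = \frac{kn}{\ell m}$, can be checked with Python's standard `fractions.Fraction` type. The sketch below is an illustration under our own assumptions: the helper `solve_for_c` is hypothetical, and `Fraction` always stores a fraction in lowest terms, so $k, \ell, m, n$ are read off the reduced forms.

```python
from fractions import Fraction


def solve_for_c(a: Fraction, b: Fraction) -> Fraction:
    """Return the unique fraction C with A = C * B (Theorem 1.8), B != 0.

    Writing A = k/l and B = m/n, the proof gives C = (k * n) / (l * m).
    """
    if b == 0:
        raise ValueError("B must be nonzero")
    k, l = a.numerator, a.denominator
    m, n = b.numerator, b.denominator
    c = Fraction(k * n, l * m)
    assert c * b == a  # existence: this C really satisfies A = CB
    return c


A = Fraction(11, 5)
B = Fraction(23, 8)
C = solve_for_c(A, B)
print(C)                  # 88/115
print(C == A * (1 / B))   # True: C = A B^{-1}
print(1 / B)              # 8/23, the reciprocal of (1.32)
```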

Definition of the division of fractions

We can now give the definition of fraction division. It is, word for word, the same as the definition of whole number division given on page 55, with the exception that, thanks to Theorem 1.8, there is no need to require that $A$ be a fractional multiple of $B$.

Definition. If $A, B$ are fractions ($B \neq 0$), then the division of $A$ by $B$, denoted by $\frac{A}{B}$, is the unique fraction $C$ so that $A = CB$.

The division of $A$ by $B$ is also called the quotient of $A$ divided by $B$. In analogy with (*) on page 55, we can also state:

(**) Given fractions $A$, $B$, and $C$, with $B \neq 0$, $\frac{A}{B} = C$ is equivalent to $A = CB$.

This is the precise meaning of the statement that division is an alternate, but equivalent, way of writing multiplication.
At this point, we call attention to the resemblance of this definition of division
to the abstract interpretation of subtraction in equation (1.18) on page 39.

Remarks. We wish to make three remarks on the division of fractions.

(i) Invert and multiply. Given fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ (the latter being nonzero), we claim that the following invert-and-multiply rule holds for the division of fractions:
$$\frac{\;\frac{k}{\ell}\;}{\frac{m}{n}} = \frac{k}{\ell} \times \frac{n}{m}. \tag{1.33}$$
Indeed, if we denote the left side of (1.33) by $C$, then, by definition, $C$ is the unique fraction so that $\frac{k}{\ell} = C \times \frac{m}{n}$. By (1.31),
$$C = \frac{k}{\ell} \times \frac{n}{m}.$$
So (1.33) is proved. (We see that there is nothing mysterious about this rule: it is a simple consequence of the correct definition of division.)
(ii) Division is well-defined. Given fractions $\frac{k}{\ell}$ and $\frac{m}{n}$ (the latter being nonzero), we have defined the division of $\frac{k}{\ell}$ by $\frac{m}{n}$. Now suppose $\frac{k}{\ell} = \frac{K}{L}$ and $\frac{m}{n} = \frac{M}{N}$ for whole numbers $k, \ell, \ldots, M, N$. Then the division of $\frac{K}{L}$ by $\frac{M}{N}$ is also defined. So the question is this: does it hold that
$$\frac{\;\frac{k}{\ell}\;}{\frac{m}{n}} = \frac{\;\frac{K}{L}\;}{\frac{M}{N}}\;? \tag{1.34}$$
If not, then it would mean that we do not have a concept of the division of fractions (which are points on the number line), but only the division of the particular fraction symbols chosen to represent these fractions (see the discussion of (1.26) on page 46). Fortunately, (1.34) is correct because $\frac{m}{n} = \frac{M}{N}$ implies $\frac{n}{m} = \frac{N}{M}$ (for example, use the cross-multiplication algorithm, Theorem 1.2 on page 22). Therefore by (1.26) on page 46, we have
$$\frac{k}{\ell} \times \frac{n}{m} = \frac{K}{L} \times \frac{N}{M}.$$
By the invert-and-multiply rule, equation (1.33), this is equivalent to equation (1.34), as desired.
(iii) Two concepts of division for whole numbers. The following discussion
is best understood from the viewpoint of the concept of extension on page 29.
Consider 7 and 5. On the one hand, we can consider the whole-number division
7 ÷ 5 in the sense defined on page 29. On the other hand, we can also consider the
division of the fraction 7 by the fraction 5 in the sense of the preceding definition
(remember that each whole number is also a fraction). Are they equal? The easy answer is yes, because both are equal to the fraction $\frac{7}{5}$, but here is a more detailed explanation. For the division as whole numbers, $7 \div 5$ yields the fraction $\frac{7}{5}$ by virtue of Theorem 1.4 on page 29. Now the division as fractions shows that, by virtue of the invert-and-multiply rule (1.33),
$$\frac{\;\frac{7}{1}\;}{\frac{5}{1}} = \frac{7}{1} \times \frac{1}{5} = \frac{7}{5}.$$
So the two are equal.
Obviously, the reasoning holds in general for any two nonzero whole numbers.
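The three remarks can also be tested numerically. In this sketch the helper names `divide_pairs` and `same_fraction` are hypothetical: fractions are kept as raw integer pairs so that (1.33) is applied literally, and equality of fractions is tested by cross-multiplication as in Theorem 1.2.

```python
from fractions import Fraction


def divide_pairs(k: int, l: int, m: int, n: int) -> tuple[int, int]:
    """Divide k/l by m/n with invert-and-multiply (1.33).

    Fractions are raw (numerator, denominator) pairs; the result is the
    unreduced pair (k*n, l*m), so no library division is involved.
    """
    if m == 0:
        raise ValueError("the divisor m/n must be nonzero")
    return k * n, l * m


def same_fraction(p: tuple[int, int], q: tuple[int, int]) -> bool:
    """Cross-multiplication: a/b equals c/d exactly when a*d == b*c."""
    return p[0] * q[1] == p[1] * q[0]


# (i): the result satisfies the definition, quotient * divisor == dividend.
num, den = divide_pairs(11, 5, 23, 8)
assert Fraction(num, den) * Fraction(23, 8) == Fraction(11, 5)

# (ii): 2/3 = 6/9 and 4/5 = 12/15, so the two divisions must agree.
first = divide_pairs(2, 3, 4, 5)      # (10, 12)
second = divide_pairs(6, 9, 12, 15)   # (90, 108)
print(first, second, same_fraction(first, second))  # (10, 12) (90, 108) True

# (iii): the whole numbers 7 and 5 divided as the fractions 7/1 and 5/1.
print(divide_pairs(7, 1, 5, 1))       # (7, 5)
```

Keeping the pairs unreduced is deliberate: it shows that (1.34) holds even though the two quotients are written with different symbols.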

A typical word problem

The following is a typical application of the concept of fraction division in school mathematics. We ask you to take note of the difference between the usual presentation in TSM and the one given here: we give the explicit reason why division has to be used and why, according to mathematics, the answer must be interpreted in a particular way.

Example. A rod $43\frac{3}{8}$ meters long is cut into pieces which are $\frac{5}{3}$ meters long. How many such pieces can we get out of the rod?
If we change the numbers in this example to "if a rod 48 meters long is cut into pieces which are 2 meters long, how many such pieces can we get out of the rod?", then there would be no question that we should do the problem by dividing 48 by 2. So we will begin the discussion by following this analogy and simply divide $43\frac{3}{8}$ by $\frac{5}{3}$ and see what we get:
$$\frac{43\frac{3}{8}}{\frac{5}{3}} = \frac{1041}{40} = 26\frac{1}{40}.$$

We have used invert-and-multiply for the preceding computation, of course. Now what does the answer $26\frac{1}{40}$ mean? Remembering the definition of division, we see that the division of $43\frac{3}{8}$ by $\frac{5}{3}$ being equal to $26\frac{1}{40}$ is equivalent to
$$43\frac{3}{8} = 26\frac{1}{40} \times \frac{5}{3} = \left(26 + \frac{1}{40}\right) \times \frac{5}{3} = \left(26 \times \frac{5}{3}\right) + \left(\frac{1}{40} \times \frac{5}{3}\right) \quad \text{(distributive law)},$$
where the distributive law for fractions is stated in the corollary on page 46. The first term on the right, $26 \times \frac{5}{3}$, is the length of the concatenation of 26 segments each of length $\frac{5}{3}$, and the second term on the right, $\frac{1}{40} \times \frac{5}{3}$, is the length of a segment which is $\frac{1}{40}$ of $\frac{5}{3}$, by the definition of fraction multiplication. Thus the rod can be cut into 26 pieces each of $\frac{5}{3}$ meters in length plus a piece that is only $\frac{1}{40}$ of $\frac{5}{3}$ meters. This then provides the complete answer to the problem and retroactively justifies the use of division to do the problem.
Notice that the key to getting the correct answer is knowing the precise definition of division (which allowed us to convert the division into a multiplication) and knowing the distributive law (which allowed us to arrive at a correct interpretation of the answer $26\frac{1}{40}$ from the division).
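The computation and its interpretation can be reproduced exactly with Python's `fractions.Fraction`; the variable names below are our own. The assertion is the distributive-law identity used above.

```python
from fractions import Fraction

rod = 43 + Fraction(3, 8)   # 43 3/8 meters
piece = Fraction(5, 3)      # 5/3 meters

quotient = rod / piece                               # 1041/40
whole = quotient.numerator // quotient.denominator   # whole-number part: 26
part = quotient - whole                              # fractional part: 1/40

# Definition of division plus the distributive law:
# rod = 26 * (5/3) + (1/40) * (5/3)
assert rod == whole * piece + part * piece
print(quotient, whole, part)   # 1041/40 26 1/40
print(part * piece)            # 1/24, the length (in meters) of the leftover piece
```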

You may find such an after-the-fact justification of the use of division to do the problem to be unsatisfactory. There is in fact a line of logical reasoning that leads inexorably to the conclusion that division should be used. We now present this reasoning. (Compare Exercise 8(b) on page 54.)
Let there be a maximum of $K$ copies of $\frac{5}{3}$ in $43\frac{3}{8}$, where $K$ is a whole number. Then $43\frac{3}{8} - K \times \frac{5}{3}$ is less than $\frac{5}{3}$ (as otherwise $K$ would not be the maximum number of such copies). Denote $43\frac{3}{8} - K \times \frac{5}{3}$ by $r$ (note that $r$ is a fraction); then we may rewrite the definition of $r$ as
$$43\frac{3}{8} = \left(K \times \frac{5}{3}\right) + r, \quad \text{where } 0 \le r < \frac{5}{3}. \tag{1.35}$$
Now, by Theorem 1.8 (page 56), we may express $r$ as a multiple of $\frac{5}{3}$; i.e., there is a fraction $\frac{m}{n}$ so that
$$r = \frac{m}{n} \times \frac{5}{3}.$$
We notice that $\frac{m}{n}$ must be a proper fraction in the sense that $m < n$, because $r < \frac{5}{3}$, so that, in view of the preceding equation,
$$m = \frac{3}{5} \times r \times n < \frac{3}{5} \times \frac{5}{3} \times n = n$$
(see item (B) on page 52). Therefore, substituting this value of $r$ into equation (1.35) gives
$$43\frac{3}{8} = \left(K \times \frac{5}{3}\right) + \left(\frac{m}{n} \times \frac{5}{3}\right) = \left(K + \frac{m}{n}\right) \times \frac{5}{3},$$
where $K + \frac{m}{n}$ is a mixed number because $\frac{m}{n}$ is a proper fraction. So we may rewrite the preceding equation in the notation of mixed numbers:
$$43\frac{3}{8} = K\frac{m}{n} \times \frac{5}{3}.$$
By the definition of division, we see that
$$K\frac{m}{n} = \frac{43\frac{3}{8}}{\frac{5}{3}}.$$
If we know the mixed number $K\frac{m}{n}$, then we would know the answer to the problem, which is $K$. Therefore, the significance of the preceding equation is that, in order to find the maximum number of $\frac{5}{3}$’s in $43\frac{3}{8}$, we should do the division
$$\frac{43\frac{3}{8}}{\frac{5}{3}}.$$
Recall that, by the above calculation, $K = 26$ and $\frac{m}{n} = \frac{1}{40}$.
We have thus explained how one can give an a priori justification for the use of division to solve this problem.
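The a priori argument can also be traced in code. This sketch is our own illustration (the variable names are hypothetical): it finds the maximum number $K$ of copies of $\frac{5}{3}$ by counting, forms the remainder $r$ of (1.35), and confirms that the mixed number $K\frac{m}{n}$ agrees with the division.

```python
from fractions import Fraction

rod = Fraction(347, 8)      # 43 3/8
piece = Fraction(5, 3)

# K = maximum number of copies of 5/3 in 43 3/8, found by counting
K = 0
while (K + 1) * piece <= rod:
    K += 1
r = rod - K * piece         # 0 <= r < 5/3, as in (1.35)
assert 0 <= r < piece

proper = r / piece          # the fraction m/n with r = (m/n) * (5/3)
assert proper < 1           # m/n is a proper fraction
# rod = (K + m/n) * 5/3, so the mixed number K m/n is rod / piece
assert K + proper == rod / piece
print(K, proper)            # 26 1/40
```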

Division of finite decimals

We now come to the last part of the arithmetic of finite decimals: division. In principle, this is very simple because we are going to show that the division of decimals is reduced to the division of whole numbers. The following example is sufficient to illustrate the general case: the division
$$\frac{0.311}{0.64}$$
becomes, upon using the definition of a decimal and invert-and-multiply,
$$\frac{0.311}{0.64} = \frac{\;\frac{311}{10^3}\;}{\frac{64}{10^2}} = \frac{311}{640}.$$
This reasoning is naturally valid for the division of any two finite decimals. Therefore, the division of any two finite decimals is equal to a fraction.
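A small Python sketch of this reduction, assuming decimals are given as nonnegative strings such as `"0.311"`: the hypothetical helper `decimal_to_fraction` reads a finite decimal as $N/10^d$, following the definition of a decimal, and the quotient then comes out as an ordinary fraction.

```python
from fractions import Fraction


def decimal_to_fraction(s: str) -> Fraction:
    """Read a nonnegative finite decimal string as N / 10**d.

    For example '0.311' becomes 311 / 10**3, following the definition
    of a finite decimal.
    """
    whole, _, digits = s.partition(".")
    return Fraction(int(whole + digits), 10 ** len(digits))


q = decimal_to_fraction("0.311") / decimal_to_fraction("0.64")
print(q)   # 311/640
```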
Now a fraction is not quite the right answer to a division of two finite decimals because we want the answer to be a finite decimal. The next step is therefore to convert a given fraction to a finite decimal; i.e., given a fraction $\frac{m}{n}$ for some whole numbers $m$ and $n$, to convert $\frac{m}{n}$ to a finite decimal is to find whole numbers $N$ and $k$ so that
$$\frac{m}{n} = \frac{N}{10^k},$$
where the right side is, by definition, a finite decimal (see page 14). It turns out that this is not always possible, as we now explain.
The overriding fact is that a fraction in lowest terms is equal to a finite decimal if and only if its denominator is a product of 2’s and 5’s⁴² (Theorem 3.8 on page 152). Keeping this in mind, what we will do here is show how to convert a fraction, in three different ways, to a finite decimal if its denominator is a product of 2’s and 5’s. At the end, we will also make a passing comment on the general case of converting an arbitrary fraction to an "infinite decimal".

42 It is understood that it could be just 2’s or just 5’s.



In one sense, the conversion is easy when the denominator is a product of 2’s and 5’s. What makes it easy is the simple observation that a power of 10 is a product of 2’s and 5’s; i.e., $10^k = 2^k \times 5^k$ for any whole number $k$ (because $10 = 2 \times 5$). To see how this leads to the conversion, take the fraction $\frac{7}{125}$, for example. Since $125 = 5^3$ and $10^3 = 2^3 \times 5^3$, we see that we can directly change the denominator 125 to $10^3$ by multiplying it by $2^3$. Therefore, by invoking equivalent fractions, we get
$$\frac{7}{125} = \frac{7}{5^3} = \frac{2^3 \times 7}{2^3 \times 5^3} = \frac{56}{10^3} = 0.056.$$
Here is another example: consider the fraction $\frac{28}{625}$. Since $625 = 5^4$ and $10^4 = 2^4 \times 5^4$, we have
$$\frac{28}{625} = \frac{28}{5^4} = \frac{2^4 \times 28}{2^4 \times 5^4} = \frac{448}{10^4} = 0.0448.$$
Finally, let us take up the fraction above, $\frac{311}{640}$. We have
$$\frac{311}{640} = \frac{311}{2^7 \times 5}.$$
Since $10^7 = 2^7 \times 5^7 = (2^7 \times 5) \times 5^6 = 640 \times 5^6$, we see that multiplying 640 by $5^6$ changes 640 to $10^7$. Therefore, using equivalent fractions again, we get
$$\frac{311}{640} = \frac{311 \times 5^6}{(2^7 \times 5) \times 5^6} = \frac{311 \times 5^6}{10^7} = \frac{4859375}{10^7} = 0.4859375.$$
There should be no difficulty at this point in converting any fraction with a denominator equal to a product of 2’s and 5’s to a finite decimal.
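This first method can be written as a short procedure. The sketch below works under the stated hypothesis that $n = 2^a \times 5^b$, and the function name `scale_to_power_of_ten` is hypothetical. It multiplies numerator and denominator by the missing 2’s or 5’s so that the denominator becomes $10^k$ with $k = \max(a, b)$.

```python
def scale_to_power_of_ten(m: int, n: int) -> tuple[int, int]:
    """Return (N, k) with m/n == N / 10**k, assuming n = 2**a * 5**b.

    Since 10**k = 2**k * 5**k, taking k = max(a, b) and multiplying by the
    missing 2's and 5's turns the denominator n into 10**k.
    """
    if n <= 0:
        raise ValueError("n must be a positive whole number")
    a, b, rest = 0, 0, n
    while rest % 2 == 0:    # count the factors of 2 in n
        rest //= 2
        a += 1
    while rest % 5 == 0:    # count the factors of 5 in n
        rest //= 5
        b += 1
    if rest != 1:
        raise ValueError(f"{n} is not a product of 2's and 5's")
    k = max(a, b)
    return m * 2 ** (k - a) * 5 ** (k - b), k


print(scale_to_power_of_ten(7, 125))    # (56, 3), i.e. 0.056
print(scale_to_power_of_ten(28, 625))   # (448, 4), i.e. 0.0448
print(scale_to_power_of_ten(311, 640))  # (4859375, 7), i.e. 0.4859375
```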
This is not the end of the story, however. We need two more observations to round out the picture. First, there is another way to make use of the given hypothesis that the denominator of a fraction is a product of 2’s and 5’s. Thus far, we have focused our attention on the denominator of the fraction, but we will now shift the focus to the numerator instead. We can use equivalent fractions to introduce a large power of 10 into the numerator so that we can cancel the 2’s and 5’s in the denominator to get a whole number. This statement needs to be amplified, so let us look at an example: consider $\frac{7}{125}$ again. Knowing in advance that $125 = 5^3$, we know that if we have $10^3$ $(= 2^3 \times 5^3 = 2^3 \times 125)$ in the numerator, we would be able to cancel the 125 in the denominator. Precisely, using the cancellation rule (1.29) on page 47, we have
$$\frac{7}{125} = \underbrace{\left(\frac{7 \times 10^3}{125}\right)}_{N_3} \times \frac{1}{10^3}. \tag{1.36}$$
The subscript 3 in the fraction we call $N_3$ on the right refers to the fact that $10^3$ is in its numerator. $N_3$ is in fact a whole number because its numerator is a multiple of 125 (since $10^3 = 2^3 \times 125$), so that it cancels the 125 in the denominator. Thus $N_3 = 7 \times 2^3 = 56$, so that
$$\frac{7}{125} = \frac{N_3}{10^3} = \frac{56}{10^3} = 0.056 \tag{1.37}$$
as before. For another example, let us revisit the fraction $\frac{311}{640}$. Knowing $640 = 2^7 \times 5$ in advance, we will multiply the numerator 311 by $10^7$, as follows:
$$\frac{311}{640} = \underbrace{\left(\frac{311 \times 10^7}{640}\right)}_{N_7} \times \frac{1}{10^7}. \tag{1.38}$$
Again, the subscript 7 in the fraction we call $N_7$ on the right refers to the $10^7$ in its numerator. $N_7$ is also a whole number because the number $10^7$ in its numerator is a multiple of 640 (since $10^7 = (2^7 \times 5) \times 5^6 = 640 \times 5^6$), and this factor of 640 in the numerator of $N_7$ cancels its denominator. Thus $N_7 = 311 \times 5^6 = 4859375$, so that
$$\frac{311}{640} = \frac{N_7}{10^7} = \frac{4859375}{10^7} = 0.4859375. \tag{1.39}$$
Let us do one more example. Consider the fraction $\frac{15}{32}$. Because $32 = 2^5$, we rewrite the fraction as
$$\frac{15}{32} = \underbrace{\left(\frac{15 \times 10^5}{32}\right)}_{N_5} \times \frac{1}{10^5}. \tag{1.40}$$
Once again, the fraction $N_5$ is a whole number because the $10^5$ in its numerator is equal to $10^5 = 2^5 \times 5^5 = 32 \times 5^5$, and the denominator 32 of $N_5$ therefore gets canceled. Thus $N_5 = 15 \times 5^5 = 46875$. Consequently,
$$\frac{15}{32} = \frac{N_5}{10^5} = \frac{46875}{10^5} = 0.46875. \tag{1.41}$$
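The numerator approach amounts to computing $N_k = \frac{m \times 10^k}{n}$ and checking that it is a whole number. A minimal Python sketch (the helper `n_k` is hypothetical):

```python
def n_k(m: int, n: int, k: int) -> int:
    """Return N_k = (m * 10**k) / n, as in (1.36), (1.38), and (1.40).

    N_k is a whole number only when 10**k supplies enough 2's and 5's to
    cancel n; otherwise a ValueError is raised.
    """
    numerator = m * 10 ** k
    if numerator % n != 0:
        raise ValueError(f"10**{k} does not cancel the denominator {n}")
    return numerator // n


print(n_k(7, 125, 3))    # 56       so 7/125   = 56/10**3
print(n_k(311, 640, 7))  # 4859375  so 311/640 = 4859375/10**7
print(n_k(15, 32, 5))    # 46875    so 15/32   = 46875/10**5
```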
The second observation builds on the first and establishes a connection between the way of converting a fraction to a finite decimal in the first observation and the traditional way that uses long division. To explain this statement in greater detail, let us look at $\frac{7}{125}$ again. As we explained after (1.36), the number $N_3$ in (1.36) is a whole number (in fact $N_3 = 56$). Therefore $N_3$ is the quotient of the long division of $7 \times 10^3$ by 125, and the remainder is 0. By tradition, one refers to this long division as the long division of the numerator 7 by the denominator 125, it being understood that it is actually the long division of $7 \times 10^3$ (and not 7) by the denominator 125. In TSM, the conversion of a fraction to a decimal by long division is a rote skill, and part of this rote skill is to "put the decimal point back in the quotient $N_3$ (= 56)" in a particular way. In our setting, such a placement of the decimal point (three places from the right, after padding with a zero on the left, giving 0.056) is precisely explained (and dictated) by the denominator $10^3$ in (1.37).
As a second example, consider the fraction $\frac{311}{640}$ in (1.38). Again, since we already know that the $N_7$ in (1.38) is a whole number (in fact, $N_7 = 4859375$), $N_7$ is now seen to be the quotient of the "long division of the numerator 311 by the denominator 640" (again it is understood that the dividend is actually $311 \times 10^7$). Equation (1.39) then shows why $\frac{311}{640}$ is equal to the decimal obtained by placing the decimal point 7 digits from the right in the quotient $N_7$. Similarly, the fraction $\frac{15}{32}$ in (1.41) is equal to the decimal obtained by placing the decimal point 5 digits from the right in the quotient $N_5$ (= 46875), where $N_5$ is the quotient of the long division of 15 (actually $15 \times 10^5$) by 32 as in (1.40).
We must tie up one last loose end concerning (1.36). To express the fraction $\frac{7}{125}$ as a finite decimal, do we have to use $10^3$ as in (1.36), or can we use $10^k$ for any $k \ge 3$? In other words, suppose we take any whole number $k \ge 3$ and define $N_k$ in terms of $10^k$ as follows:
$$\frac{7}{125} = \underbrace{\left(\frac{7 \times 10^k}{125}\right)}_{N_k} \times \frac{1}{10^k}. \tag{1.42}$$
Comparing with (1.36) on page 61, we have expressed $\frac{7}{125}$ in two different ways: as $N_3/10^3$ and as $N_k/10^k$ for a whole number $k \ge 3$. What is the relationship between $N_k$ and $N_3$? The simple answer is that
$$N_k = N_3 \times 10^{k-3}$$
because
$$N_k = \frac{7 \times 10^k}{125} = \frac{7 \times 10^3}{125} \times 10^{k-3} = N_3 \times 10^{k-3}.$$
In other words, $N_k$ is the quotient $N_3$ of the long division of $7 \times 10^3$ by 125 with $k - 3$ zeros added to the right end. This means we can obtain the finite decimal equal to $\frac{7}{125}$ by taking the quotient of the long division of $7 \times 10^k$ by 125 for any $k \ge 3$ and then putting the decimal point $k$ digits from the right in this quotient (padding with zeros on the left if the quotient has fewer than $k$ digits).
Similarly, the fraction $\frac{311}{640}$ is equal to the finite decimal obtained by taking the quotient of the long division of $311 \times 10^k$ (for any $k \ge 7$) by 640 and then putting the decimal point $k$ digits from the right in this quotient. A similar statement can be made about the fraction $\frac{15}{32}$. In fact, the uniformity of the reasoning in these three examples shows that the reasoning is valid in general. Therefore we have proved the following theorem, which is the traditional algorithm for converting a fraction to a decimal by the long division of the numerator by the denominator, at least for the special case where the denominator is a product of 2’s and 5’s.

Theorem 1.9. Let $\frac{m}{n}$ be a fraction so that $n$ is a product of 2’s and 5’s. Then for any sufficiently large whole number $k$, the long division of $m \times 10^k$ by $n$ has quotient $q$ (a whole number) and remainder 0, and $\frac{m}{n}$ is equal to the finite decimal $\frac{q}{10^k}$.
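Theorem 1.9 translates directly into a procedure: long division of $m \times 10^k$ by $n$, then placement of the decimal point $k$ digits from the right. The sketch below is illustrative only; `long_division_decimal` is a hypothetical name, and Python's built-in `divmod` stands in for the long division. Zeros are padded on the left when the quotient has too few digits (as with 56 for $\frac{7}{125}$).

```python
def long_division_decimal(m: int, n: int, k: int) -> str:
    """Sketch of Theorem 1.9: divide m * 10**k by n and place the decimal
    point k digits from the right of the quotient.

    Raises ValueError if the remainder is not 0 (k too small, or n not a
    product of 2's and 5's).
    """
    q, r = divmod(m * 10 ** k, n)
    if r != 0:
        raise ValueError(f"remainder {r} is not 0; choose a larger k")
    if k == 0:
        return str(q)
    digits = str(q).rjust(k + 1, "0")   # pad on the left if q has <= k digits
    return digits[:-k] + "." + digits[-k:]


# Any k >= 3 works for 7/125; a larger k only adds zeros at the end.
for k in (3, 4, 6):
    print(long_division_decimal(7, 125, k))   # 0.056, then 0.0560, then 0.056000
```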

It is clear from the reasoning used to arrive at Theorem 1.9 that the $k$ in Theorem 1.9 can be taken to be any whole number larger than or equal to both of the exponents of 2 and 5 when $n$ is expressed as the product of a power of 2 times a power of 5. This latitude in the choice of $k$ is what makes the algorithm of the conversion of a fraction to decimal so easy to use, because it says that if in doubt, one can add any number of zeros to the right of the numerator to do the long division. Such a practical consideration also brings up another observation about Theorem 1.9, namely, that there is in fact no need to consider any fraction whose denominator contains factors of both 2 and 5. This is because, if factors of both 2 and 5 are present, the denominator will be a multiple of 10 (= 2 × 5) and the factors of 10 can be "split off" from the beginning to simplify the computations. Let us illustrate with the above fraction $\frac{311}{640}$: because $640 = 2^6 \times (2 \times 5) = 64 \times 10$, we have
$$\frac{311}{640} = \frac{311}{64} \times \frac{1}{10}.$$
This equality means that to obtain the decimal conversion of $\frac{311}{640}$, it suffices to first obtain the decimal conversion of $\frac{311}{64}$ and then move the decimal point of the latter decimal one digit to the left (compare Exercise 2 on page 53). The point is, of course, that the denominator 64 of $\frac{311}{64}$ is a power of 2 alone (i.e., $2^6$).

Now the general case. Most fractions will not be equal to a finite decimal because Theorem 3.8 on page 152 says that if a fraction in lowest terms is equal to a finite decimal, then its denominator is a product of 2’s or 5’s or both. Thus in general, a fraction will be equal to an infinite decimal, whose definition can only be given by using the concept of limit; see Chapter 3 of [Wu2020c]. The fact that every fraction is equal to an infinite repeating decimal is the content of Theorem 3.8 in Section 3.4 of [Wu2020c].⁴³
TSM usually teaches the procedure of "converting a fraction to an infinite decimal by the long division of the numerator by the denominator" entirely by rote, without the slightest explanation of what an "infinite decimal" is or why the procedure involving long division is valid. Worse, the simple explanation (implicit in (1.42) on page 63) that one can attach as many zeros as one likes to the right of the numerator to continue the long division and then put back the decimal point in the quotient could have been given but never was. While it is true that the definition of an infinite decimal is beyond the scope of school mathematics, one can nevertheless approach the conversion of fractions to infinite decimals in school mathematics in a more civilized manner, as we now explain. Consider, for definiteness, the decimal conversion of $\frac{2}{7}$. Instead of saying that $\frac{2}{7}$ is equal to the infinite repeating decimal $0.\overline{285714}$, we can say that we can approximate $\frac{2}{7}$ as closely as we like by a finite decimal. For example, suppose we want a finite decimal that is within $1/10^6$ of $\frac{2}{7}$. Following up our success with equation (1.36) on page 61, we should at least try something like the following:
$$\frac{2}{7} = \left(\frac{2 \times 10^8}{7}\right) \times \frac{1}{10^8}.$$
The hope is that the fraction in parentheses on the right will yield something that serves our purpose. We use $10^8$ instead of $10^6$ because we are unsure of our footing here and want to play it safe (we could even use $10^{10}$ if $10^8$ turns out to be not good enough). By the long division of $2 \times 10^8$ by 7, we get the division-with-remainder
$$2 \times 10^8 = (28571428 \times 7) + 4. \tag{1.43}$$
Thus,
$$\frac{2}{7} = \left(\frac{(28571428 \times 7) + 4}{7}\right) \times \frac{1}{10^8} = \left(28571428 + \frac{4}{7}\right) \times \frac{1}{10^8} = \frac{28571428}{10^8} + \left(\frac{4}{7} \times \frac{1}{10^8}\right).$$
Equivalently,
$$\frac{2}{7} - 0.28571428 = \frac{4}{7} \times \frac{1}{10^8}.$$

43 A finite decimal is considered to be an infinite repeating decimal in this context, with a repeating block consisting of the single digit 0.



But the right side is a nonzero fraction, so we actually have
$$0 < \frac{2}{7} - 0.28571428 = \frac{4}{7} \times \frac{1}{10^8}.$$
We can further simplify the right side. Since $\frac{4}{7} < 1$, inequality (i) on page 52 implies
$$\frac{4}{7} \times \frac{1}{10^8} < 1 \times \frac{1}{10^8} = \frac{1}{10^8}.$$
Therefore we obtain
$$0 < \frac{2}{7} - 0.28571428 < \frac{1}{10^8}. \tag{1.44}$$
These inequalities can be captured in a picture: the thickened horizontal segment below represents the difference $\frac{2}{7} - 0.28571428$.
[Figure: the points $0.28571428$ and $\frac{2}{7}$ on the number line, with a segment of length $\frac{1}{10^8}$ marked.]
In any case, the decimal 0.28571428 is a finite decimal approximation of $\frac{2}{7}$ with an error no bigger than $1/10^8$. Since $1/10^8 < 1/10^6$, this decimal approximation is certainly within $1/10^6$ of $\frac{2}{7}$.
There is a more intuitive way to write (1.44). First, it follows from (1.44) that
$$0.28571428 < \frac{2}{7} < 0.28571428 + \frac{1}{10^8}.$$
Since $\frac{1}{10^8} = 0.00000001$, the addition algorithm for decimals therefore implies that
$$0.28571428 < \frac{2}{7} < 0.28571429. \tag{1.45}$$
This is a more reasonable way of saying that 0.28571428 is a finite decimal approximation of $\frac{2}{7}$ up to an error of $1/10^8$.
In a similar manner, if we want a finite decimal that approximates $\frac{2}{7}$ to within $1/10^{15}$, then we would begin instead with
$$\frac{2}{7} = \left(\frac{2 \times 10^{15}}{7}\right) \times \frac{1}{10^{15}}.$$
(We now use $10^{15}$ because our experience above with $10^8$ has emboldened us.) An analogous calculation now gives
$$0 < \frac{2}{7} - 0.285714285714285 < \frac{1}{10^{15}}.$$
It follows that the decimal 0.285714285714285 is within $1/10^{15}$ of the fraction $\frac{2}{7}$. As in (1.45), we also have
$$0.285714285714285 < \frac{2}{7} < 0.285714285714286. \tag{1.46}$$

It is clear that the reasoning is perfectly general. For example,
$$0 < \frac{2}{7} - 0.285714285714285714285714 < \frac{1}{10^{24}},$$
and, similar to (1.45) and (1.46), we also have
$$0.285714285714285714285714 < \frac{2}{7} < 0.285714285714285714285715.$$
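The approximation procedure for $\frac{2}{7}$ can be checked with exact arithmetic. In this sketch (our own illustration; `truncated_decimal` is a hypothetical name), `divmod` performs the long division of $2 \times 10^k$ by 7, and `Fraction` confirms the inequalities $0 < \frac{2}{7} - \frac{q}{10^k} < \frac{1}{10^k}$ for $k = 8, 15, 24$.

```python
from fractions import Fraction


def truncated_decimal(m: int, n: int, k: int) -> tuple[int, int]:
    """Long division of m * 10**k by n: return (quotient, remainder)."""
    return divmod(m * 10 ** k, n)


for k in (8, 15, 24):
    q, r = truncated_decimal(2, 7, k)
    approx = Fraction(q, 10 ** k)              # the finite decimal q / 10**k
    error = Fraction(2, 7) - approx
    assert error == Fraction(r, 7) / 10 ** k   # as in the derivation of (1.44)
    assert 0 < error < Fraction(1, 10 ** k)    # the inequalities (1.44)
    print(k, f"0.{q:0{k}d}", r)
# 8 0.28571428 4
# 15 0.285714285714285 5
# 24 0.285714285714285714285714 2
```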
Mathematical Aside: Once we have the concept of absolute value at our disposal (see page 125), the standard way to express 0.28571428 as a finite decimal approximation of $\frac{2}{7}$ up to $1/10^8$ is
$$\left|\frac{2}{7} - 0.28571428\right| < 10^{-8}.$$
Similarly, we have
$$\left|\frac{2}{7} - 0.285714285714285\right| < 10^{-15},$$
$$\left|\frac{2}{7} - 0.285714285714285714285714\right| < 10^{-24}.$$
Incidentally, one cannot possibly miss the repeating nature of the decimals so obtained (the block of digits 285714 repeats itself). The complete proof that the decimal expansion of any fraction always exhibits such a repeating phenomenon can be found in Section 3.4 of [Wu2020c].
Exercises 1.5.
(1) (i) Prove that $\frac{12.345}{10^2} = 0.12345$. (ii) Formulate and prove a generalization of part (i).
(2) Compute (i) $\dfrac{1}{\frac{1}{2}\left(\frac{1}{3} + \frac{1}{4}\right)}$ and (ii) $\dfrac{1}{\frac{1}{2}\left(\frac{1}{2/3} + \frac{1}{5/4}\right)}$. (Compare Exercise 8 on page 72.)
218 3 27
(3) Compute and simplify your answers: (i) 5 and (ii) .
13 8 17 14
(4) Let $A$ and $B$ be fractions and let $AB = 0$. If $A \neq 0$, prove that $B = 0$.
(5) (a) How many $1\frac{1}{3}$’s are there in $95\frac{2}{7}$? (b) How many blocks of 18 minutes are there in $8\frac{1}{2}$ hours? Calculate this in terms of minutes, and then do it in terms of hours. Compare.
(6) You want to cut pieces that are 1 31 inches long from a rod whose length is
85 12 inches. Explain as if to a sixth grader what is the maximum number
of such pieces you can get and how many inches of the rod are left over.
(7) It takes 2 tablespoons of a chemical to dechlorinate 120 gallons of water.
Given that 3 teaspoons make up a tablespoon, how many teaspoons of
this chemical are needed to dechlorinate x gallons of water? (Assume
that the amount of water, divided by the amount of chemical needed to
dechlorinate this amount of water, is a fixed number c.) Caution: Don’t
even think about doing this problem by "setting up a proportion", because
this procedure cannot be justified.
(8) Let a and d be whole numbers, and let q and r be the quotient and
remainder of a divided by d. Let also Q be the fraction so that a = Qd.
Determine the relationship among Q, q, d, and r. (Those who are unsure
of the meaning of division-with-remainder can look it up on page 139.)

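(9) The following is one of several approaches to the division of fractions that one finds in TSM:⁴⁴

We try to find out what $\frac{k/\ell}{m/n}$ could mean. Using equivalent fractions, we get
$$\frac{\;\frac{k}{\ell}\;}{\frac{m}{n}} = \frac{\;\frac{k}{\ell} \times n\;}{\frac{m}{n} \times n} = \frac{\;\frac{kn}{\ell}\;}{\frac{mn}{n}} = \frac{\;\frac{kn}{\ell}\;}{m},$$
and therefore
$$\frac{\;\frac{k}{\ell}\;}{\frac{m}{n}} = \frac{kn}{\ell m}.$$

Point out all the flaws in this explanation.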
(10) Here is another approach to the division of fractions according to [Davis-Pearn, p. 10]. It begins by explaining that $1\frac{3}{4} \div \frac{1}{2} = 3\frac{1}{2}$ can be modeled by "How many $\frac{1}{2}$-minutes are there in $1\frac{3}{4}$ minutes?" Now 3 of the $\frac{1}{2}$-minutes is $1\frac{1}{2}$ minutes, while 4 of the $\frac{1}{2}$-minutes is 2 minutes, and already $2 > 1\frac{3}{4}$. So $1\frac{3}{4} \div \frac{1}{2}$ is between 3 and 4. Since $1\frac{3}{4} - 1\frac{1}{2} = \frac{1}{4}$ and $\frac{1}{4}$ minutes is exactly $\frac{1}{2}$ of a $\frac{1}{2}$-minute unit, there are exactly $3\frac{1}{2}$ of the $\frac{1}{2}$-minute units in $1\frac{3}{4}$. That said, the explanation of $1\frac{3}{4} \div \frac{1}{2} = 3\frac{1}{2}$ continues:

The question, "How many $\frac{1}{2}$-minutes are there in $1\frac{3}{4}$ minutes?" can be looked at from a different perspective.
For two fractions, such as $\frac{1}{2}$ and $1\frac{3}{4}$, we ask the question, "What whole number multiple of both fractions will give a whole number answer in both cases?"
For example, $4 \times \frac{1}{2} = 2$ and $4 \times 1\frac{3}{4} = 7$. The common multiple of 4 stretches the unit $\frac{1}{2}$-minute to 2 minutes and stretches the quantity $1\frac{3}{4}$ minutes to 7 minutes.
Now, the number of $\frac{1}{2}$-minute units in $1\frac{3}{4}$ minutes is the same as the number of 2-minute units in 7 minutes; namely $\frac{7}{2} = 3\frac{1}{2}$. In other words, $1\frac{3}{4} \div \frac{1}{2} = \frac{7}{2} = 3\frac{1}{2}$.
By stretching both the unit of a $\frac{1}{2}$-minute and the quantity $1\frac{3}{4}$ minutes by a common amount, we changed the fraction division problem $1\frac{3}{4} \div \frac{1}{2}$ into the whole number division problem $7 \div 2$ which has the fractional answer of $\frac{7}{2} = 3\frac{1}{2}$.

Point out the flaws in this explanation.
(11) (a) Explain as if to a sixth grader how to use long division to convert $\frac{12}{3125}$ to a decimal. (b) Do the same with $\frac{3}{64}$.
(12) (a) Find a finite decimal that is within $1/10^5$ of $\frac{5}{6}$. (b) Find a finite decimal that is within $1/10^6$ of $\frac{3}{11}$.
(13) Convert $\frac{12345}{9876}$ to a decimal. Provide details!⁴⁵
(14) (a) $\frac{5}{12}$ of a sack of rice is $8\frac{2}{3}$ the weight of 5 books. Each book weighs $2\frac{1}{2}$ lbs. How much (in lbs.) does the sack of rice weigh? (b) A pizza parlor has a Learning Fractions Special. Normally, it charges $\frac{m}{n} \times 8$ dollars for $\frac{m}{n}$ of a small pizza. During this special sale, it sells $\frac{1}{2}$ of a pizza for the usual price of $\frac{1}{3}$ of a pizza.⁴⁶ At the sales price, how much would $8\frac{2}{3}$ small pizzas cost?

44 See page xiv of the preface for the definition of TSM.
45 I got this devilish problem from Ole Hald.
(15) Use the number line to solve the following: if $\frac{5}{13}$ of a number $N$ is 8 more than a third of $N$, what is $N$?
(16) Show that there is a rectangle with area < 1 sq. cm and perimeter equal
to 1,000 cm.

1.6. Complex fractions


This section gives a careful exposition of an important concept, that of complex fractions. It is important because, without it, very few solutions given in the education literature to problems involving ratio, percent, and rates would be mathematically correct, and yet complex fractions are hardly mentioned in TSM or, in fact, anywhere in the education literature. Further explanation of this anomaly is given below in the first subsection. We then summarize the facts we need to know about complex fractions.
Why complex fractions? (p. 68)
The basic formulas (p. 70)

Why complex fractions?

Further applications of the concept of division cannot be given without first introducing the concept of complex fractions, which are by definition the fractions obtained by the division $\frac{A}{B}$ of two fractions $A, B$ ($B \neq 0$).⁴⁷ We would like to emphasize that a complex fraction $\frac{A}{B}$ is by definition a division of the fraction $A$ by the fraction $B$, and therefore this concept cannot be introduced until the concept of division among fractions is already available.
We will continue to call A and B the numerator and denominator of the complex fraction $\frac{A}{B}$, respectively. Since any complex fraction $\frac{A}{B}$ is just a fraction, namely, the fraction $AB^{-1}$, why do we single out complex fractions for a separate discussion? For an answer, consider a common example of adding fractions:
$$\frac{1.2}{31.5} + \frac{3.7}{0.008}.$$
Note that both summands are complex fractions because $1.2 = \frac{12}{10}$, $31.5 = \frac{315}{10}$, etc. Now, the addition can be handled by the usual procedure for fractions because we may invert and multiply to obtain
$$\frac{1.2}{31.5} + \frac{3.7}{0.008} = \frac{\frac{12}{10}}{\frac{315}{10}} + \frac{\frac{37}{10}}{\frac{8}{10^3}} = \frac{12}{315} + \frac{3700}{8} = \frac{1165596}{2520}.$$

46 I got this idea from my late friend David Collins. We believe that if all pizza parlors would buy into this idea, the national fractions achievement would improve.
47 This is a confusing piece of terminology because it suggests that complex numbers are involved, but they are not. Since this is the terminology already in use in school mathematics and the confusion is tolerable, we will go along with it. Such compromises are unavoidable.

Nevertheless, TSM teaches students to do the addition by treating the decimals as if they were whole numbers and directly applying the addition algorithm for fractions:
$$\frac{1.2}{31.5} + \frac{3.7}{0.008} = \frac{(1.2 \times 0.008) + (31.5 \times 3.7)}{31.5 \times 0.008}. \tag{1.47}$$
Computing the right side of (1.47), we get
$$\frac{1.2}{31.5} + \frac{3.7}{0.008} = \frac{116.5596}{0.252} = \frac{\frac{1165596}{10000}}{\frac{2520}{10000}} = \frac{1165596}{2520}.$$
It must be said that equation (1.47) is more appealing than the cumbersome first method. It dispenses with invert-and-multiply and makes direct use of the familiar formula for fraction addition; namely,
$$\frac{k}{\ell} + \frac{m}{n} = \frac{kn + \ell m}{\ell n} \tag{1.48}$$
(equation (1.12) on page 33). By letting $k = 1.2$, $\ell = 31.5$, $m = 3.7$, and $n = 0.008$ in (1.48), we get (1.47). However, a little reflection will cast doubt on this application of (1.48) because the latter has been established only for fractions $\frac{k}{\ell}$ and $\frac{m}{n}$, which means equation (1.48) is only valid for whole numbers $k$, $\ell$, $m$, and $n$ up to this point. Unfortunately, 1.2, 31.5, etc., are not whole numbers! What saves the day (and this is the reason we are discussing complex fractions right now) is that equation (1.48) turns out to be correct even when $k$, $\ell$, $m$, $n$ are fractions, thereby establishing the validity of (1.47) as an algorithm. See (c) on page 71.
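Both routes to the value of $\frac{1.2}{31.5} + \frac{3.7}{0.008}$ can be checked with exact rational arithmetic. The following is a minimal sketch in Python, assuming the standard-library `fractions.Fraction` type as a stand-in for the fractions of the text; the function names are hypothetical, chosen only for this illustration. It checks one instance and is no substitute for the proof of (c).

```python
from fractions import Fraction as F

def via_invert_and_multiply():
    # First method: write each decimal as a fraction, then invert and multiply.
    # 1.2 = 12/10, 31.5 = 315/10, 3.7 = 37/10, 0.008 = 8/10**3
    first = F(12, 10) / F(315, 10)     # = 12/315
    second = F(37, 10) / F(8, 10**3)   # = 3700/8
    return first + second

def via_formula_1_47():
    # Second method: (1.48) with k, ell, m, n themselves fractions, giving (1.47)
    k, ell, m, n = F("1.2"), F("31.5"), F("3.7"), F("0.008")
    return (k * n + ell * m) / (ell * n)

a = via_invert_and_multiply()
b = via_formula_1_47()
print(a == b == F(1165596, 2520))   # True
print(a)                            # 97133/210, the same number in lowest terms
```

Note the use of `F("0.008")` rather than `F(0.008)`: a float literal such as `0.008` is already a binary approximation, so only the string (or integer-pair) form gives exactly $\frac{8}{1000}$.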
In the same vein, TSM allows students to multiply the following complex fractions as if they were ordinary fractions by writing
$$\frac{0.21}{0.037} \times \frac{84.3}{2.6} = \frac{0.21 \times 84.3}{0.037 \times 2.6}$$
regardless of the fact that the product formula $\frac{k}{\ell} \times \frac{m}{n} = \frac{km}{\ell n}$ has only been proved for whole numbers $k$, $\ell$, $m$, $n$. Again, the simplicity of the product formula motivates us to prove the validity of $\frac{k}{\ell} \times \frac{m}{n} = \frac{km}{\ell n}$ for fractions $k$, $\ell$, $m$, $n$. See (d) on page 71.
So far we seem to be talking about nothing more than a subjective preference
for formal simplicity in calculations with complex fractions, but, in fact, a much
deeper issue is involved. Should we allow TSM to impress on students the message
that if something is declared to be true for whole numbers, then it can also be used
for other numbers, such as fractions, rational numbers, etc.? Still more is at stake.
There is a real mathematical need for extending the usual formulas for ordinary
fractions to complex fractions. For example, without complex fractions, we cannot
even give correct definitions of the basic concepts of percent, ratio, and constant
speed (see pp. 73, 75, and 78, respectively), much less solve problems about these
concepts (see the italicized comment on page 74). If we look further ahead in the
school curriculum, then we will encounter rational expressions in a number x (see page 316). TSM blandly asserts that an equality like the following is valid for all $x \in \mathbb{R}$ other than 1 and $-\sqrt[3]{2}$:
$$\frac{x}{x-1} \cdot \frac{7}{x^3+2} = \frac{x \cdot 7}{(x-1)(x^3+2)}.$$

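However, if $x = \frac{4}{3}$, for example, then this equality becomes an equality that requires the product formula for complex fractions for its justification:
$$\frac{\frac{4}{3}}{\frac{4}{3} - 1} \times \frac{7}{\left(\frac{4}{3}\right)^3 + 2} = \frac{\frac{4}{3} \times 7}{\left(\frac{4}{3} - 1\right)\left(\left(\frac{4}{3}\right)^3 + 2\right)}.$$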

Have students been properly prepared to accept that this computation is correct?
The answer is an emphatic no. Many other computations with rational expressions
related to addition or subtraction also assume a knowledge of such computations
with complex fractions, implicit as that may be. See, for example, page 316. Yet
TSM utters nary a word about this requisite background information. This does
real damage to student learning.
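As a sanity check of the displayed equality at $x = \frac{4}{3}$, the two sides can be evaluated with exact arithmetic. This is a sketch assuming Python's standard `fractions.Fraction`; it confirms only this one numerical instance, and it is the product formula for complex fractions that justifies the equality in general.

```python
from fractions import Fraction as F

x = F(4, 3)

# Left side: a product of two complex fractions
left = (x / (x - 1)) * (F(7) / (x**3 + 2))

# Right side: product of numerators over product of denominators
right = (x * 7) / ((x - 1) * (x**3 + 2))

print(left == right, left)   # True 378/59
```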
A main goal of school mathematics education is to help students acquire the
ability to move from assumptions to conclusions by the use of reasoning. This
ability is what we hope will mold students into independent, critical thinkers. The
inculcation of the ability to reason is a delicate process under the best of circum-
stances, and its chance of success is irrevocably diminished if there are gaping holes
imbedded in the very reasoning we try to teach. What is the point, for example, of
paying close attention to the reasoning (if it is given) that the formula for fraction
addition, (1.48), is correct for whole numbers $k$, $\ell$, $m$, and $n$ when (1.48) is soon
applied to situations where $k$, $\ell$, $m$, and $n$ are fractions? Students cannot help but
ask: where is the reasoning now? Taking such examples as a cue, their survival
instincts will lead them to the conviction that, in mathematics, it is not reason-
ing but improvisation and expediency that matter. It then comes to pass that,
once students are taught how multiplication can be "distributed" over addition (the distributive law), many will feel no inhibition in "distributing" squaring over addition (i.e., $(x+y)^2 = x^2 + y^2$) or even "distributing" the taking of square roots over addition (i.e., $\sqrt{x+y} = \sqrt{x} + \sqrt{y}$). While it will take serious education research to establish a causal relationship between these two kinds of phenomena,
few would argue that abusing (1.48) and similar formulas about fractions will serve
to reinforce the importance of reasoning in students’ minds.
This volume asks you, as a teacher, to avoid the kind of TSM-inspired rote
teaching that we have just described and replace it with the correct reasoning that
underlies the operations with complex fractions.

The basic formulas

Here is a brief summary of the basic facts of complex fractions that figure
prominently in school mathematics: let A, . . . , F be fractions, and we assume fur-
ther that they are nonzero where appropriate in the following. Then:
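(a) Cancellation law: If $C \neq 0$, then $\frac{AC}{BC} = \frac{A}{B}$.

Example: $\dfrac{2.9 \times \frac{17}{7}}{\frac{2}{3} \times \frac{17}{7}} = \dfrac{2.9}{\frac{2}{3}}$.

(b) $\frac{A}{B} = \frac{C}{D}$ if and only if $AD = BC$.

$\frac{A}{B} < \frac{C}{D}$ if and only if $AD < BC$.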

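Example: If $\frac{4}{5} \times \frac{16}{3} < \frac{m}{n} \times \frac{13}{2}$, then $\dfrac{\frac{4}{5}}{\frac{m}{n}} < \dfrac{\frac{13}{2}}{\frac{16}{3}}$.

(c) $\frac{A}{B} \pm \frac{C}{D} = \frac{(AD) \pm (BC)}{BD}$.

Example: $\dfrac{1.2}{31.5} + \dfrac{3.7}{0.08} = \dfrac{(1.2 \times 0.08) + (31.5 \times 3.7)}{31.5 \times 0.08}$.

(d) $\frac{A}{B} \times \frac{C}{D} = \frac{AC}{BD}$.

Example: $\dfrac{0.21}{0.037} \times \dfrac{\frac{1}{3}}{2.6} = \dfrac{0.21 \times \frac{1}{3}}{0.037 \times 2.6}$.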

Remark. If you are curious about whatever happened to the division of com-
plex fractions, it is a consequence of (a) and (d); see Exercise 4 on p. 72.
Assertions (a), (b), (c), and (d) are the generalized versions of the cancellation
law (page 20), the cross-multiplication algorithm (page 22) and cross-multiplication
inequality (page 36), the addition and subtraction formulas (pages 33 and 39), and
the product formula (page 46), respectively, for ordinary fractions. We call explicit attention to the fact that (c) and (d) justify the computations with $\frac{1.2}{31.5} + \frac{3.7}{0.008}$ and $\frac{0.21}{0.037} \times \frac{84.3}{2.6}$ on pages 68 and 69. Note also that it follows immediately from (a) that the cancellation rule for fractions (equation (1.29) on page 47) continues to hold for complex fractions: $\frac{CE}{D} \times \frac{A}{BE} = \frac{AC}{BD}$ if $E \neq 0$. For example,
$$8.7 \times \frac{125}{26.1} = 8.7 \times \frac{125}{3 \times 8.7} = \frac{125}{3}.$$
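Identities like (a)–(d) are easy to spot-check numerically. The sketch below assumes Python's standard `fractions` and `random` modules and tests each formula on randomly chosen positive fractions; the helper names are hypothetical. A passing run is evidence, not a proof: the proofs are the ones given in the text.

```python
import random
from fractions import Fraction as F

def random_fraction(rng):
    """Hypothetical helper: a positive fraction p/q with 1 <= p, q <= 50."""
    return F(rng.randint(1, 50), rng.randint(1, 50))

def check_basic_formulas(trials=1000, seed=0):
    """Spot-check (a)-(d) on random positive fractions A, B, C, D."""
    rng = random.Random(seed)
    for _ in range(trials):
        A, B, C, D = (random_fraction(rng) for _ in range(4))
        assert (A * C) / (B * C) == A / B                    # (a) cancellation law
        assert (A / B == C / D) == (A * D == B * C)          # (b) equality
        assert (A / B < C / D) == (A * D < B * C)            # (b) inequality (B, D > 0)
        assert A / B + C / D == (A * D + B * C) / (B * D)    # (c) with +
        assert A / B - C / D == (A * D - B * C) / (B * D)    # (c) with - (Fraction allows negatives)
        assert (A / B) * (C / D) == (A * C) / (B * D)        # (d) product
    return True

# The two computations from the text, checked via (c) and (d)
k, ell, m, n = F("1.2"), F("31.5"), F("3.7"), F("0.008")
assert k / ell + m / n == (k * n + ell * m) / (ell * n)
a, b, c, d = F("0.21"), F("0.037"), F("84.3"), F("2.6")
assert (a / b) * (c / d) == (a * c) / (b * d)

print(check_basic_formulas())  # True
```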
One can give algebraic proofs of (a)–(d) that are entirely mechanical; e.g., for (a), let $A = \frac{k}{\ell}$, $B = \frac{m}{n}$, $C = \frac{p}{q}$, substitute these values into both sides of (a), invert and multiply each side separately, and verify that the two sides are equal. Do the
same for every other assertion. This way of proving (a)–(d) is correct, but it is not
particularly instructive. We now present a more sophisticated method of proving
(a)–(d); it is one that you would use in a school classroom only sparingly, but it is
a piece of mathematics that is worth learning.
Let us prove (a), i.e., $\frac{AC}{BC} = \frac{A}{B}$. Let $x = \frac{AC}{BC}$ and $y = \frac{A}{B}$. We have to prove $x = y$. Since $x = \frac{AC}{BC}$, by the definition of division, $AC = xBC$. Similarly, $A = yB$, so that multiplying both sides of $A = yB$ by C gives $AC = yBC$. Comparing $AC = yBC$ with the previous equality $AC = xBC$, we see that we have expressed AC as a multiple of BC in two ways, xBC and yBC. Since $BC \neq 0$ (it is the denominator of $\frac{AC}{BC}$), the uniqueness part of Theorem 1.8 on page 56 says that these two multiples of BC are in fact the same; i.e., $x = y$.
The proofs of the others can be safely left as an exercise (Exercise 5 below).
Exercises 1.6.
(1) Let A and B be fractions and let $B \neq 0$. Prove that for any nonzero whole number j,
$$\underbrace{\frac{A}{B} + \cdots + \frac{A}{B}}_{j} = \frac{jA}{B}.$$

(2) (a) Divide 98 into two parts A and B (i.e., A and B are fractions and $A + B = 98$) so that $\frac{A}{B} = \frac{6}{7}$. (b) Divide 27 into two parts A and B so that $\frac{A}{B} = \frac{4}{5}$.
(3) Two fractions x and y satisfy $xy = \frac{3}{10}$ and $\frac{x}{y} = \frac{15}{8}$. What are x and y?

(4) (i) Let A and B be nonzero fractions. Prove that $\left(\frac{A}{B}\right)^{-1} = \frac{B}{A}$. (ii) Prove the following invert-and-multiply rule for complex fractions: let A, B, C, D be nonzero fractions; then
$$\frac{\frac{A}{B}}{\frac{C}{D}} = \frac{AD}{BC}.$$

(5) (i) Prove (b) on page 70 by the mechanical procedure of converting both
sides to ordinary fractions. (ii) Now prove (b) again, but this time by
employing the reasoning used in the text to prove (a). (iii) Repeat (i) and
(ii) on (c) and (d) on page 71.
(6) Explain, in as simple a manner as possible, approximately where the frac-
tion 821 is on the number line. (This is a mathematical problem, which
26 2
means that you have to be precise even when you make approximations.)
(7) (a) Find the fraction K so that $K - 2.5 = 4\frac{1}{3} - K$. (b) Find the fraction A so that $\frac{A - 2.5}{4.1 - A} = \frac{2}{3}$. (c) Can you interpret the statements in (a) and (b) geometrically?
(8) If x, y are nonzero fractions, what is $\dfrac{1}{\frac{1}{2}\left(\frac{1}{x} + \frac{1}{y}\right)}$? (This expression for x and y turns up often enough to merit a name: the harmonic mean of x and y.)
(9) If x, y, u, v are nonzero fractions so that x < u and y < v, prove that
$$\frac{xy}{x+y} < \frac{uv}{u+v}.$$

1.7. Percent, ratio, and rate problems


Percent, ratio, and rate are among the most troublesome topics for students.
One thing we know with certainty about this difficulty is that these concepts are never
clearly defined in the mathematics education literature. Consequently, TSM sets up
students for failure by asking them to solve word problems involving these concepts
without telling them what the concepts mean. If we want to find out students’
difficulties in learning these topics, let us begin by presenting mathematically correct
definitions of these concepts that they can apply with ease to solve problems. Then
it would make sense to conduct research into the root cause of students’ possible
learning difficulties. In this section, we supply precise definitions for the first two
by making essential use of complex fractions. Our discussion of the third topic, rate,
will be uncharacteristically thorough because it has been completely mangled in TSM.
We will explicitly bring out the fact that, in school mathematics, only two kinds
of "rate" can be considered: average rate and constant rate. We also emphasize
the underlying similarity among the typical school "rate" problems: speed, house-
painting, lawn-mowing, and water flow from a faucet.
Percent (p. 73)
Ratio (p. 75)
Constant speed (p. 76)
Other kinds of constant rate (p. 81)

Percent

We begin with a precise definition of percent in terms of complex fractions.


Definition. For a fraction N, N percent means the complex fraction $\frac{N}{100}$.

By tradition, the complex fraction $\frac{N}{100}$ is usually written as N%. By regarding $\frac{N}{100}$ as an ordinary fraction, we see that the common terminology about N% of a quantity $\frac{m}{n}$ means exactly $\frac{N}{100} \times \frac{m}{n}$ (see the definition on page 24 and equation (1.25) on page 45). Now, in everyday language, "3 percent of 18.7" means "divide 18.7 into 100 equal parts and take 3 of those parts". If the preceding definition of percent is any good, then this phrase should have the same meaning strictly according to this definition. Indeed it does, and we will prove it in general. By definition,
$$N\% \text{ of } \frac{m}{n} = \frac{N}{100} \times \frac{m}{n} = N \times \left(\frac{1}{100} \times \frac{m}{n}\right).$$


Letting A denote the fraction $\frac{1}{100} \times \frac{m}{n}$, we get
$$N\% \text{ of } \frac{m}{n} = N \times A. \tag{1.49}$$
By the definition of fraction multiplication (page 45), A is 1 part when $\frac{m}{n}$ is divided into 100 equal parts. Furthermore, for a fraction N, $N \times A$ literally means "N copies of A" in the following sense. Take $N = 31\frac{2}{3}$, for example; then the distributive law implies
$$31\tfrac{2}{3} \times A = \left(31 + \tfrac{2}{3}\right) \times A = (31 \times A) + \left(\tfrac{2}{3} \times A\right).$$
Of course $(31 \times A)$ is 31 copies of A (see equation (1.28) on p. 47), and $\left(\frac{2}{3} \times A\right)$ is two-thirds of A in the usual sense (see equation (1.25) on p. 45). Therefore, $31\frac{2}{3} \times A$ is precisely 31 and two-thirds copies of A.⁴⁸ Consequently, by (1.49), we see that N% of $\frac{m}{n}$ is N of the parts when $\frac{m}{n}$ is divided into 100 equal parts. In particular, "3 percent of 18.7" does mean "divide 18.7 into 100 equal parts and take 3 of those parts".
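The argument that N% of a quantity is "N copies of one hundredth of it" can be mirrored numerically for $N = 31\frac{2}{3}$ and the quantity 18.7. This is a sketch assuming Python's standard `fractions.Fraction`; `percent_of` is a hypothetical helper that simply transcribes the definition $\frac{N}{100} \times q$.

```python
from fractions import Fraction as F

def percent_of(N, q):
    """N% of q, read directly off the definition: (N/100) * q."""
    return (F(N) / 100) * q

q = F("18.7")
A = F(1, 100) * q          # one part when 18.7 is divided into 100 equal parts
N = 31 + F(2, 3)           # N = 31 2/3

# Distributive law: 31 copies of A plus two-thirds of A
copies = sum([A] * 31, F(0)) + F(2, 3) * A

print(percent_of(N, q) == N * A == copies)  # True
print(percent_of(3, q) == 3 * A)            # True: 3 of the 100 equal parts
```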
Let us put the preceding discussion in the proper perspective. TSM defines
"percent" as "out of a hundred". But what is "out of a hundred"? Is it a number?
Or is it an "action"? On the basis of such a vague "definition", students have no
choice but to accept by rote that, for example, $31\frac{2}{3}$ percent of 18.7 is equal to
$31\frac{2}{3}\% \times 18.7$ for the purpose of passing standardized tests. By contrast, everything

48 This statement is a precise transcription of the definitions of fraction multiplication and fraction addition, nothing more and nothing less.



we do here can be checked by looking up a definition or appealing to a theorem that has been proved. For example, the fact that $31\frac{2}{3}$ percent of 18.7 has to be equal to
$$\frac{31\frac{2}{3}}{100} \times 18.7$$
is dictated by the definition we have just given of percent and by equation (1.25)
on page 45. In terms of mathematics learning, the goal of this approach is to
show students a clear path to a desired conclusion by the use of reasoning and
by appealing to precise definitions rather than by recalling rote procedures. This
approach is the only way to learn—or teach—mathematics.
To demonstrate the efficacy of having a precise definition of "percent", let us
solve the three kinds of standard problems on percents that students traditionally
consider to be difficult:
(i) What is 5% of 24?
(ii) 5% of what number is 16?
(iii) 9 is what percent of 24?
Please pay attention to a characteristic feature of the following solutions: every step
is based on something you have already learned, be it a definition⁴⁹ or an assertion
we have proved.

Solutions.
(i) If x is 5% of 24, then $x = 5\% \times 24 = \frac{5}{100} \times 24 = \frac{120}{100} = \frac{6}{5}$.
(ii) If 5% of a certain number y is 16, then again strictly from the definition of "percent", this translates into $(5\%)y = 16$; i.e., $\frac{5}{100} \times y = 16$. Multiplying both sides by $\frac{100}{5}$, we get
$$y = 16 \times \frac{100}{5} = 320.$$
(iii) Suppose 9 is N% of 24 for some fraction N. This translates into $9 = N\% \times 24$, or $9 = \frac{N}{100} \times 24$. Multiplying both sides by $\frac{100}{24}$, we have $9 \times \frac{100}{24} = N$, so that
$$N = \frac{900}{24} = \frac{75}{2} = 37\frac{1}{2}.$$
So the answer to (iii) is $37\frac{1}{2}\%$.
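Each solution above is a short chain of exact fraction operations, so it can be replayed line by line. The sketch below assumes Python's standard `fractions.Fraction`; the constant `PCT` is a hypothetical name for $\frac{1}{100}$, so that N% is written `N * PCT`, exactly as the definition says.

```python
from fractions import Fraction as F

PCT = F(1, 100)   # N% is the complex fraction N/100, i.e. N * (1/100)

# (i) x = 5% of 24
x = 5 * PCT * 24

# (ii) (5%) * y = 16; multiply both sides by 100/5
y = 16 * (F(100) / 5)

# (iii) 9 = (N/100) * 24; multiply both sides by 100/24
N = 9 * (F(100) / 24)

# Substitute back into the original statements as a check
assert 5 * PCT * y == 16
assert N * PCT * 24 == 9

print(x, y, N)   # 6/5 320 75/2
```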
We should point out the critical role played by complex fractions in the solution
of (iii). Since N is a priori a fraction (and indeed turns out to be $\frac{75}{2}$, not a whole number), N% is a complex fraction. Therefore the calculations in (iii) had to make use of (a) and (d) on pp. 70 and 71. This is the first and only time we will take the trouble to point
out the numerous instances in which complex fractions are used in an essential way
in the rest of this section.
What we can conclude from this short discussion is that if students have an
adequate background in fractions and have been carefully instructed in the use of
symbols, the concept of percent is straightforward and involves no subtlety. Once

49 Always remind your students that if they do not know definitions, they are not in a position to do mathematics, in the same way that anyone who has no vocabulary is not in a position to write novels.

this kind of instruction has been implemented in the school classroom, then ed-
ucation research would be in a position to shed light on what the real learning
difficulties might be. Until then, we should concentrate on meeting the minimum
requirements of the fundamental principles of mathematics (see page xiii in the
preface), which include providing clear and precise definitions of all the concepts
and the requisite reasoning for all the claims. Note however that such a definition
of percent could not be given if the concept of a complex fraction were not available.

Ratio

Next we take up the concept of ratio. In TSM, this concept is encrusted with
excessive verbiage. It would be expedient, therefore, to begin with a short definition.

Definition. Given two fractions A and B, the ratio of A to B, sometimes denoted by A : B, is the complex fraction $\frac{A}{B}$.

In connection with ratio, there are some common expressions that need to be
made explicit. To say that the ratio of boys to girls in a classroom is 3 to 2 is
to say that if B (respectively, G) is the number of boys (respectively, girls) in the
classroom, then the ratio of B to G is $\frac{3}{2}$. Similarly, in making a fruit punch, the
statement that the ratio of fruit juice to rum is 7 to 2 means that we are
comparing the volumes of the two fluids (because the use of volume for measurement
is understood in this situation), and if the amount of fruit juice is A fluid ounces
and the amount of rum is B fluid ounces, then the ratio of A to B is $\frac{7}{2}$, and so on.
We will now work through some standard problems on ratios by using rea-
soning strictly based on this definition, without any guesswork. This is in fact
how mathematics should be done, but TSM makes it necessary for students to ig-
nore whatever "definitions" they are given and use rote solution templates to solve
problems—especially problems about ratios and rate (our next topic). The clarity
of the ensuing discussion, as well as the ease with which we dispatch the problems,
may serve as a persuasive argument for the need for precise definitions.

Example 1. In a school auditorium with 696 students, the ratio of boys to girls is 11 to 13. How many students are boys and how many are girls?

Solution. Let the number of boys be B and the number of girls be G; then we are given that $\frac{B}{G} = \frac{11}{13}$. Thus by the cross-multiplication algorithm, $13B = 11G$. Let k be this common number, i.e., $13B = 11G = k$, so $B = \frac{k}{13}$ and $G = \frac{k}{11}$. Now we are also given $B + G = 696$, so $\frac{k}{13} + \frac{k}{11} = 696$. This gives $\frac{24k}{143} = 696$, and therefore $24k = 143 \times 696$; i.e., $k = 29 \times 143$. Since $B = \frac{k}{13}$, we get $B = 319$. The value of G can be obtained from either $B + G = 696$ or from $G = \frac{k}{11}$. In any case, $G = 377$.
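The solution of Example 1 follows a fixed pattern (set $qB = pG = k$, express B and G in terms of k, then use the total), which can be sketched in code. The function `split_by_ratio` is a hypothetical name for this illustration, and Python's standard `fractions.Fraction` is assumed so that all arithmetic is exact.

```python
from fractions import Fraction as F

def split_by_ratio(total, p, q):
    """Split `total` into B + G with B/G = p/q, following the text's method:
    qB = pG = k, so B = k/q and G = k/p."""
    p, q = F(p), F(q)
    # B + G = k/q + k/p = k * (p + q)/(p*q) must equal total
    k = total * (p * q) / (p + q)
    return k / q, k / p

B, G = split_by_ratio(696, 11, 13)
print(B, G)   # 319 377
```

Because `Fraction` accepts fractional p and q, the same hypothetical helper also handles a ratio such as $\frac{2}{3}$ to $\frac{4}{5}$: `split_by_ratio(88, F(2, 3), F(4, 5))` returns the parts 40 and 48.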

A more sophisticated problem is the following. This was a problem in an 1875 California Exam for Teachers, and it was mentioned in the well-known address of Lee Shulman ([Shulman]).

Example 2. Divide 88 into two parts so that their ratio is $\frac{2}{3}$ to $\frac{4}{5}$.

Solution. Here, a "part" has to be understood to be a fraction. Let the two fractions be A and B. Then we are given that
$$\frac{A}{B} = \frac{\frac{2}{3}}{\frac{4}{5}}.$$
Using invert-and-multiply on the right and simplifying, we get $\frac{A}{B} = \frac{5}{6}$. By the cross-multiplication algorithm (item (b) on p. 70), $6A = 5B$. Let k be the common value: $k = 6A = 5B$. Thus $A = \frac{k}{6}$ and $B = \frac{k}{5}$. Because $A + B = 88$, we have
$$\frac{k}{6} + \frac{k}{5} = 88 \implies \frac{11k}{30} = 88,$$
where the symbol "$\implies$" means implies. Therefore $k = 240$. It follows from $6A = k$ that $A = 40$, and from $5B = k$ that $B = 48$. Thus the two parts are 40 and 48.
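Example 2 can be traced step by step in the same way. In this sketch (Python's standard `fractions.Fraction` assumed; variable names are illustrative), `Fraction` reduces $\frac{2/3}{4/5}$ to lowest terms automatically, which plays the role of "invert-and-multiply and simplify" in the solution.

```python
from fractions import Fraction as F

ratio = F(2, 3) / F(4, 5)                  # invert and multiply: (2/3)*(5/4) = 5/6
a, b = ratio.numerator, ratio.denominator  # A/B = a/b, so bA = aB = k
# A + B = k/b + k/a = 88
k = 88 / (F(1, b) + F(1, a))
A, B = k / b, k / a
print(ratio, k, A, B)   # 5/6 240 40 48
```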

In this short discussion, we have intentionally used only one method to do both
examples to emphasize the simplicity of such problems. We leave to an exercise the
exploration of other methods of solution (Exercise 5 on p. 85).

Constant speed

In school mathematics, the most substantial application of the concept of division is to problems related to rate, or more precisely, constant rate. The precise definition of the general concept of "rate" requires calculus (see the appendix of Section 6.4 in [Wu2020c]), but unfortunately, TSM⁵⁰ has done great damage to mathematics learning by making believe that the general concept of rate can be taught in middle school.⁵¹ We will explain below why the only "rate concepts" that can be taught in school mathematics are average rate over a certain time interval and constant rate (see page 81).
The most common situation in which the vague idea of "rate" arises is that of
motion, and the "rate" involving motion is what we call speed. Because speed may
be the most intuitive among the "rates" of school mathematics, we discuss it first.
In the following discussion of speed, we will only make use of fractions because
that is all we have right now, but in fact the computations will implicitly involve
real numbers, i.e., all points on the number line. This is a deep issue and we will
postpone its discussion until we come to FASM on page 133.
The definition of constant speed is somewhat sophisticated and is not to be
found in TSM. Without it, unfortunately, any discussion of "rate" will necessarily
degenerate into rote memorization and fake reasoning, which is what has been hap-
pening in school classrooms across the land for the past few decades. Instead of
giving such a definition outright, it therefore makes sense to first provide some back-
ground information about the evolution of this definition. Below, we will present
a heuristic argument that, for a motion to be worthy of being called "constant
speed", there must be a fixed number v so that the total distance (feet, miles, etc.)
50 See page xiv for the definition of TSM.
51 Students of calculus beware! If you want to be able to teach school mathematics, you still
need to learn how to explain constant rate in an elementary manner without saying anything like
"the derivative of the work function is a constant". This is in fact an excellent example of why, no
matter how much advanced mathematics one knows, one must learn school mathematics in order
to be an effective teacher.

traveled after a time interval of length t (hours, minutes, etc.) is vt feet, miles, etc.
We will denote this distance by d(t) (and not just d) to indicate that it depends on
the length of the time interval t. Thus, we will make the case, heuristically, that if
a motion is of constant speed, then
$$d(t) = vt \quad \text{for any } t \geq 0. \tag{1.50}$$
We write vt and not tv because, v being a constant (i.e., a fixed number), we respect
the notational convention of putting a constant before the symbol t that can take
on infinitely many values. See page 302 below. (You may notice at this point that
this discussion resembles in spirit the earlier one about the definition of fraction
multiplication on pp. 44ff.)
As we have just said, this number d(t) depends on what t is; d(t) changes with
t. You will recognize from your exposure to calculus that d(t) is a function of t, and
our notation duly reflects this fact. However, in a middle school classroom, it is not
necessary to emphasize this aspect of the concept of constant speed at this point.
For school students, the sophistication of this definition lies in the fact that, for the
purpose of checking whether a motion has constant speed, it requires the checking
of an infinite number of multiplications, namely, the product of t by v for every
t ≥ 0. This is a departure from the mathematics of elementary school where one
is called upon to do only a small number of computations for each problem. The
constant v is what is usually called the speed of the object and its unit will depend
on the units we use for distance and time. If these are ft and sec, respectively, then
the unit of v will be feet per second; if they are yd and min, respectively, then
the unit of v will be yards per minute, etc. For definiteness, let us say the unit
for distance is miles and the unit for time is hours; then the speed is miles per
hour, and we abbreviate it to mph as usual. For example, driving at a constant
speed of 60 mph means the total distance d(1) traveled after 1 hour is 60 × 1 = 60
mi (according to (1.50)), and the total distance d(3.2) traveled after 3.2 hours is
60 × 3.2 = 192 mi, etc.
Here is the heuristic argument that will lead us to (1.50). Let v be a fixed
number. We can easily agree that the following two statements capture the naive
idea of "motion of constant speed":
(a) The object travels v miles in every one-hour interval.
(b) In any two time intervals of the same length, the object
travels the same distance.⁵²
On the strength of these two plausible statements (a) and (b), we now show that
(1.50) must hold.

Case 1: t is a whole number. This is the litmus test: if (1.50) cannot be shown
to be true on the basis of (a) and (b) when t is a whole number, then clearly there
would be no point in going further. But if t is a whole number, then the time
interval of length t is t copies of a 1-hour interval. In each of these 1-hour intervals,
the object travels v miles, by (a). Therefore in a time interval of t hours, the object
travels v + v + · · · + v (t times) miles, which is tv = vt miles. Thus d(t) = vt as in
(1.50) when t is a whole number.

52 Note that statement (a) by itself does not guarantee "constant speed" even in the naive sense; see Exercise 7 on page 85.



Case 2: t is a unit fraction; i.e., $t = \frac{1}{n}$ for some nonzero whole number n. Let the distance that the object travels in $\frac{1}{n}$ hours be D miles; i.e., $d(\frac{1}{n}) = D$ (observe that, by (b), it does not matter which $(\frac{1}{n})$-hour interval we use). Therefore, since a 1-hour interval is n copies of a $(\frac{1}{n})$-hour interval, the distance traveled in 1 hour is
$$\underbrace{D + D + \cdots + D}_{n} = nD = Dn \text{ miles}.$$
But in 1 hour, the object is known to travel v miles, by (a). Therefore, $v = Dn$, and multiplying both sides by $\frac{1}{n}$ yields $D = v\frac{1}{n}$; i.e., $d(\frac{1}{n}) = v\frac{1}{n}$. So again, $d(t) = vt$ is correct when $t = \frac{1}{n}$.

Case 3: t is any fraction. It remains to prove that the equality $d(t) = vt$ is valid when t is an arbitrary fraction $\frac{m}{n}$. Consider then a time interval of length $\frac{m}{n}$ hours, where $\frac{m}{n}$ is a fraction. We have just seen that in any $\frac{1}{n}$ hours, the object travels $v\frac{1}{n}$ miles. But $\frac{m}{n}$ is m copies of $\frac{1}{n}$, so the total distance that the object travels in $\frac{m}{n}$ hours must be
$$\underbrace{v\frac{1}{n} + \cdots + v\frac{1}{n}}_{m} = v\Bigl(\underbrace{\frac{1}{n} + \cdots + \frac{1}{n}}_{m}\Bigr) = v\frac{m}{n} \text{ miles}.$$
Thus $d(\frac{m}{n}) = v\frac{m}{n}$, and we see that (1.50) is also true when $t = \frac{m}{n}$ for any fraction $\frac{m}{n}$.
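The bookkeeping of Cases 2 and 3 (adding up equal pieces of distance) can be mirrored in a few lines of code. This is a sketch assuming Python's standard `fractions.Fraction`; the function names are hypothetical. It presupposes statements (a) and (b), just as the heuristic argument does, and it only re-expresses that argument rather than proving anything new.

```python
from fractions import Fraction as F

def distance_unit_fraction(v, n):
    """Case 2: n copies of the (1/n)-hour distance D make v miles, so D = v * (1/n)."""
    return v * F(1, n)

def distance_m_over_n(v, m, n):
    """Case 3: m/n hours is m copies of 1/n hour; add the Case 2 distance m times."""
    return sum([distance_unit_fraction(v, n)] * m, F(0))

v = F(60)   # 60 mph, the speed used in the text's example
print(distance_m_over_n(v, 16, 5))   # 192  (3.2 hours = 16/5 hours)

# The sum of m copies always agrees with v * (m/n), which is (1.50) at t = m/n
assert all(distance_m_over_n(v, m, n) == v * F(m, n)
           for m in range(0, 30) for n in range(1, 12))
```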

What the preceding heuristic argument shows is that for a motion that is of con-
stant speed v in the naive sense of statements (a) and (b) above, we have d(t) = vt
for all fractions t. In other words, equation (1.50) holds when t is a fraction. Con-
versely, it is easy to see that if d(t) = vt for all t ≥ 0, then statements (a) and (b)
are true. Therefore the naive notion of what constitutes constant speed in the form
of (a) and (b) is equivalent to the single equation (1.50) to the effect that d(t) = vt
for any t ≥ 0. Since the latter is clearly simpler and easier to use, we adopt it as
the formal definition of constant speed:

Definition. An object is said to move at constant speed v (where v is a fixed number) if, given any number t ≥ 0, the total distance d(t) traveled by the object in a time interval of length t satisfies d(t) = vt.

When the motion is known to have constant speed v in the sense of this defi-
nition, we can abbreviate "constant speed" to "speed" and call v the speed of the
motion. We repeat, the concept of "speed" in school mathematics makes sense only
in the case of "constant speed". This is therefore a paradoxical situation: because
our mathematical repertoire is limited, we have to first define "constant speed"
before we can talk about "speed". If calculus is available, then the natural order
of things will be restored, as "speed" can then be defined before "constant speed".
As in the case of the definition of fraction multiplication on pp. 44ff, every piece
of reasoning about constant speed from now on must be based on this definition
and on this definition alone. In particular, the heuristic argument leading up to the
definition will play no role in the logical development from this point forward.
In the definition of constant speed, clearly we may restrict our attention only to
those t so that t > 0. Then using the definition of division, especially the statement
(**) on page 57, we may equivalently reformulate the concept of constant speed
using division instead of multiplication: a motion has constant speed v (v is a fixed
number) if for any number t > 0, the distance d(t) (feet, miles, etc.) traveled by
the object in any time interval of length t (seconds, minutes, etc.) satisfies
$$\frac{d(t)}{t} = v \quad \text{for all } t > 0. \tag{1.51}$$
This is the precise formulation of the common notion that "speed is distance di-
vided by time", but, we repeat, this phrase has no meaning whatsoever in school
mathematics except in the special case of constant speed.
We now describe yet another way of defining constant speed that is important
for solving certain types of word problems. For an object in motion, we introduce
the concept of its average speed over the time interval from $t_1$ to $t_2$, $t_1 < t_2$, as
$$\frac{\text{total distance traveled from time } t_1 \text{ to } t_2}{t_2 - t_1}. \tag{1.52}$$
Pedagogical Comments. In the school classroom, two aspects of the defi-
nition of average speed in (1.52) merit special emphasis. First, the term "average
speed" by itself carries no information until we know the time interval (from a specific point in time $t_1$ to another specific point in time $t_2$) in which to compute the average speed in question. Thus an average speed has to be measured in a
specific time interval. In addition, because the terminology "average" stimulates
the conditioned reflex of "add two numbers and divide by 2", students need to put
this conditioned reflex in check. "Average speed" is not the average of two speeds
any more than a "Venus flytrap" is a flytrap from Venus or a "sea lion" is a lion
from the sea. These added cognitive complexities associated with "average speed"
are thus things you must impress on your students because their sensitivity to the
difference between a technical term and a phrase in everyday language is something
they need to acquire in order to go further in mathematics or science. End of
Pedagogical Comments.

Now suppose we have an object moving at constant speed v. Then the total distance traveled from time $t_1$ to time $t_2$ is equal to
$$(\text{total distance traveled from time 0 to } t_2) - (\text{total distance traveled from time 0 to } t_1).$$
This is equal to $vt_2 - vt_1$. Therefore the average speed of a motion of constant speed v over the time interval from $t_1$ to $t_2$ is
$$\frac{vt_2 - vt_1}{t_2 - t_1} = \frac{v(t_2 - t_1)}{t_2 - t_1} = v.$$
Therefore, a motion of constant speed v has the same average speed over any time
interval. This then leads naturally to the next theorem.

Theorem 1.10. An object moves at constant speed v $\iff$ the average speed of its motion over any time interval is always equal to a constant v.

Proof. The preceding discussion proves that if the object moves at constant speed
v, then its average speed over any time interval is also v. Conversely, suppose the
average speed of the object over any time interval is always equal to a constant v.
We have to prove that the motion is one of constant speed; i.e., we must prove that given any number $t > 0$, the total distance $d(t)$ traveled by the object during a time interval of length $t$ satisfies $d(t) = vt$. Let the motion of the object start at time 0. With $t$ given, let us measure the average speed of this motion during the time interval from $t_0$ to $t_0 + t$, where $t_0$ is an arbitrary nonnegative number. Then the total distance traveled in this time interval of length $t$ is equal to the difference

$$(\text{total dist. traveled from 0 to } t_0 + t) - (\text{total dist. traveled from 0 to } t_0). \tag{1.53}$$

Now by the hypothesis that the average speed over any time interval is $v$, we have

$$\frac{\text{total distance traveled from time 0 to } t_0 + t}{(t_0 + t) - 0} = v$$

and therefore

$$\text{total distance traveled from time 0 to } t_0 + t = v(t_0 + t).$$

Similarly,

$$\text{total distance traveled from time 0 to } t_0 = vt_0.$$

By (1.53), the distance traveled in this time interval of length $t$, i.e., $d(t)$, is therefore equal to

$$v(t_0 + t) - vt_0 = vt_0 + vt - vt_0 = vt.$$

The proof of Theorem 1.10 is complete.

Theorem 1.10 tells us that, in the case of constant speed, we can talk about the
"average speed" of a motion without referencing the time interval over which the
average speed is measured.

Mathematical Aside: One can gain a better understanding of one half of Theorem 1.10 from the point of view of calculus: we will show why constant speed implies all average speeds are equal to a constant. Let $d(t)$ be the total distance traveled by an object from the starting point at time 0 to its position at time $t$. Suppose the motion is one of constant speed $v$ for a fixed number $v$; i.e., the derivative $d'(t)$ is equal to $v$. Now a function whose derivative is equal to a constant $v$ is a linear polynomial of the form $vt + c$ for some constant $c$. Thus $d(t) = vt + c$. Consider now the total distance traveled from time $t_1$ to time $t_2$. Clearly,

$$\text{total distance traveled from time } t_1 \text{ to } t_2 = d(t_2) - d(t_1).$$

Since $d(t) = vt + c$, we have

$$d(t_2) - d(t_1) = (vt_2 + c) - (vt_1 + c) = v(t_2 - t_1).$$

Therefore the average speed from time $t_1$ to $t_2$ is (according to (1.52))

$$\frac{v(t_2 - t_1)}{t_2 - t_1} = v.$$

This then explains why constant speed means that the average speed over any time interval is equal to a fixed constant.
Pedagogical Comments. Almost all the problems related to speed in middle school are about constant speed,53 even if TSM contrives never to mention this fact.
It therefore makes sense to point out that, in the real world, "constant speed" is
a fictitious concept: city traffic and traffic lights make a mockery of the idea of
constant speed within the confines of a city, and even on an apparently straight
and flat freeway, putting your car on cruise control does not guarantee constant
speed because the needle of your car’s speedometer still quivers ever so slightly.
The fictitious character of constant speed should be clearly brought out in teaching
so that students have the proper perspective when they do such problems: they are
definitely not dealing with so-called "real world" situations. Nevertheless, there are
two reasons why constant speed is—and should be—taught in school mathematics.
The first is that this is essentially the only kind of motion that can be discussed
in K–12 for lack of the needed tools (from calculus) to treat general motion. The
second one is that constant speed is the first step towards the understanding of
arbitrary speed. We should amplify the latter: the definition on page 78 says that
constant speed means the distance function describing the motion is linear, and
linear functions are the basic local approximations to general distance functions.
End of Pedagogical Comments.

Other kinds of constant rate

In TSM,54 the speed of an object is the "rate" of motion in moving from one
place to another. The trouble with such a statement is that it makes some sense
intuitively but, in the setting of school mathematics, it is ultimately confusing
because TSM has promoted "rate" to be a fundamental concept that purports to
test students’ so-called "conceptual understanding" and, yet, TSM fails to produce
a definition of "rate".55 So the first thing we should do in approaching "rate
problems" is to be forthright with students and tell them that, in K–12, "rate"
should be understood as a figure of speech that refers vaguely to problems related to
"work" in a naive sense. All we can do in K–12 is to define precisely the concepts of
"average rate" and "constant rate" in specific instances such as speed, lawn-mowing,
house-painting, water-flow from a faucet, etc. All so-called "rate problems" in K–12 are about either constant rate or average rate, no more and no less. For example,
in problems related to the painting of (the exterior walls of) a house, the rate there
would be in terms of the number of square feet painted per day or per hour. Or,
in problems related to lawn-mowing, the rate in question would be in terms of the
number of square feet mowed per hour or per minute. A third example is water
flowing out of a faucet and filling a container, and the rate in that case would be
in terms of the number of gallons (or liters) of water coming out per minute or per
second. Imitating the discussion of constant speed and average speed (over a time
interval) in the preceding subsection, the concepts of constant rate and average rate
can be analogously defined. For example, a constant rate of lawn-mowing can
be defined in one of two equivalent ways. One is to say that there is a constant r
(whose unit is square feet per hour), so that if $A_T$ is the total area that is mowed in $T$ hours, then $A_T = rT$ no matter what $T$ may be. The other is to define the average rate of lawn-mowing from time $T_1$ to time $T_2$ as the division $\frac{A_{12}}{T_2 - T_1}$, where $A_{12}$ is the total area mowed from time $T_1$ to time $T_2$. Then the lawn is said to be mowed at a constant rate if the average rate of the lawn being mowed over any time interval is equal to a fixed constant. The equivalence of these two definitions is by virtue of Theorem 1.10 on page 79 or, more precisely, by the exact counterpart of this theorem for lawn-mowing.

53 Those about freely falling objects only show up in high school in the context of quadratic functions.
54 See page xiv for the definition of TSM.
55 Without calculus, it is impossible to explain what "rate" means.
For further comments on "rate" problems as they appear in TSM, go to the
Pedagogical Comments on page 84.

We now show how to give four different formulations of the same rate problem:
(P1) Regina drives from Town A to Town B in 10 hours, and Eric in 12.
Assuming that each drives at a fixed constant speed, Regina from Town A to Town
B, and Eric from Town B to Town A, and that they get started at the same time
and drive on the same highway, after how many hours will they meet in between?
(You may assume that each car has the size of a point.)
(P2) Regina mows a lawn in 10 hours, and Eric in 12. Assuming that each
mows at a fixed constant rate, how long would it take them to mow the same lawn
if they start mowing at the same time and mow together without interfering with
each other?
(P3) Regina paints a (small) house in 10 hours, and Eric in 12. Assuming
that they start painting at the same time and each paints at a fixed constant rate,
how long would it take them to paint the same house if they paint together without
interfering with each other?
(P4) A faucet can fill a tub in 10 minutes, and a second faucet in 12. Assum-
ing that the rate of the water flow remains constant in each faucet, how long would
it take to fill the same tub if both faucets are turned on at the same time?

It is important to recognize that all four problems are mathematically identical: if you can solve one, you can solve them all. Let us give a solution of the first, (P1).

[Figure: Regina starts at Town A and Eric at Town B; the towns are d miles apart.]
We have to determine the speeds of Regina and Eric. We do not know the distance between Towns A and B, so to facilitate thinking, let us say this distance is $d$ miles. Therefore Regina’s speed $v_R$ satisfies $d = 10v_R$, and we have $v_R = \frac{d}{10}$ mph. Similarly, Eric’s speed $v_E$ is $\frac{d}{12}$ mph. We have to find out how long it takes Regina and Eric to meet, but again, to facilitate thinking, let us say Regina and Eric meet after $T$ hours. At the moment we do not know what $T$ is, but the assumption of constant speed guarantees that the distance Regina drives in $T$ hours is $v_R T = \frac{dT}{10}$ miles. Similarly, the distance Eric drives after $T$ hours is $\frac{dT}{12}$ miles. Since they meet in between the towns, the total distance they have driven together after $T$ hours is exactly the distance between the towns, i.e., $d$ miles. Therefore we have

$$\frac{dT}{10} + \frac{dT}{12} = d. \tag{1.54}$$
By the distributive law, we have $dT\left(\frac{1}{10} + \frac{1}{12}\right) = d$. Since $d$ is just a number, multiplying both sides by the complex fraction $\frac{1}{d}$ (and using rules (a) and (d) on page 71) gives $\left(\frac{1}{10} + \frac{1}{12}\right)T = 1$. By the definition of division, we get

$$T = \frac{1}{\frac{1}{10} + \frac{1}{12}} = 5\tfrac{5}{11} \quad \text{(hours)}.$$
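The arithmetic in the last step can be double-checked with exact fractions. The Python sketch below takes a hypothetical distance $d = 660$ miles (any positive value would do), computes $T$ as above, and confirms that the two distances $\frac{dT}{10}$ and $\frac{dT}{12}$ add up to $d$, as (1.54) requires.

```python
from fractions import Fraction

# T = 1 / (1/10 + 1/12), computed exactly
T = 1 / (Fraction(1, 10) + Fraction(1, 12))

d = Fraction(660)      # hypothetical distance between the towns, in miles
regina = d * T / 10    # v_R * T, with v_R = d/10 mph
eric = d * T / 12      # v_E * T, with v_E = d/12 mph
assert regina + eric == d    # equation (1.54) holds

whole, rest = divmod(T, 1)   # split T into a mixed number
print(T, "=", whole, rest)   # 60/11 = 5 5/11
```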
It may be instructive if we also solve problem (P2) for comparison.
Let the area of the lawn be $A$ sq. ft. Because in 10 hours Regina can mow the whole lawn, i.e., $A$ sq. ft., her (constant) rate of lawn-mowing $r_R$ is, by definition, $\frac{A}{10}$ sq. ft. per hour. Similarly, Eric’s rate of lawn-mowing $r_E$ is $\frac{A}{12}$ sq. ft. per hour. Now suppose the two together can finish mowing the lawn in $T$ hours. So in $T$ hours, Regina mows $r_R T$ sq. ft., because she mows at a constant rate; thus she mows $\frac{AT}{10}$ sq. ft. in $T$ hours. Similarly, in $T$ hours, Eric mows $\frac{AT}{12}$ sq. ft. Because they mow with no interference from each other, the sum total of the areas they mow in $T$ hours adds up exactly to $A$; i.e.,

$$\frac{AT}{10} + \frac{AT}{12} = A. \tag{1.55}$$

By the distributive law, $AT\left(\frac{1}{10} + \frac{1}{12}\right) = A$. Multiplying both sides by $\frac{1}{A}$, we get $T\left(\frac{1}{10} + \frac{1}{12}\right) = 1$, so that

$$T = \frac{1}{\frac{1}{10} + \frac{1}{12}} = 5\tfrac{5}{11} \quad \text{(hours)},$$

exactly as before.
Observe that equations (1.54) and (1.55) are mathematically identical equa-
tions.
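To make the point that (1.54) and (1.55) are one and the same equation, here is a minimal Python sketch of a single function that handles all of (P1)–(P4). The name `joint_time` and the job sizes in the loop are hypothetical; the only assumptions are the ones stated in the problems: each agent works at a constant rate, and the two do not interfere.

```python
from fractions import Fraction

def joint_time(total, time_1, time_2):
    """Time for two constant-rate agents working together without interference.

    `total` is the size of the whole job (miles, sq. ft., gallons, ...).
    Agent i alone finishes it in time_i, so its constant rate is total / time_i.
    Solves (rate_1 + rate_2) * T = total, the common form of (1.54) and (1.55).
    """
    rate_1 = Fraction(total) / time_1
    rate_2 = Fraction(total) / time_2
    return Fraction(total) / (rate_1 + rate_2)

# Hypothetical job sizes: the answer does not depend on them.
jobs = [("P1 miles", 180), ("P2 sq. ft. of lawn", 5000),
        ("P3 sq. ft. of wall", 3600), ("P4 gallons", Fraction(45, 2))]
for label, total in jobs:
    print(label, joint_time(total, 10, 12))   # each line ends in 60/11
```

The size of the job cancels, just as $d$ and $A$ cancel in the two solutions above.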

We give one more example of constant rate.

Example 3. Tom and May drive on the same highway at constant speed.
Tom starts 15 minutes before May, and his speed is 48 mph. May starts from the
same spot as Tom and her speed is 60 mph. How many hours after Tom leaves will
May catch up with him? (It is understood that Tom’s car and May’s car could be
idealized to be two points on the number line in doing this problem.)

Solution. We give two slightly different solutions. Suppose $T$ hours after Tom leaves, May catches up with him. In that time, Tom has driven $48T$ miles. Since May starts driving $\frac{1}{4}$ hour (= 15 minutes) after Tom, the total distance she travels in the same time duration is $60(T - \frac{1}{4})$ miles. The two distances being equal (because May catches up with Tom after $T$ hours), we get $48T = 60(T - \frac{1}{4})$. By the distributive law, $48T = 60T - 15$. Adding 15 to both sides, we get $48T + 15 = 60T$, and adding $-48T$ to both sides, we get $15 = 12T$. Thus $T = 1\frac{1}{4}$ hours; in other words, $1\frac{1}{4}$ hours after Tom leaves, May catches up with him.
Here is another solution: after 15 minutes (= $\frac{1}{4}$ hour) of driving, Tom will be $48 \times \frac{1}{4} = 12$ miles away from the starting point. Let $t$ be the number of hours after May starts driving that she catches up with Tom. In $t$ hours, May’s car is $60t$ miles from the starting point. Now Tom’s car travels $48t$ miles in the same
time duration, but at $t = 0$, Tom is already 12 miles away from the starting point. Therefore after $t$ hours, he is $(12 + 48t)$ miles from the starting point. Since they are the same distance away from the starting point $t$ hours after May starts driving, we have $60t = 12 + 48t$. Therefore $12t = 12$, and $t = 1$ hour. Since May starts $\frac{1}{4}$ hour after Tom, it takes $1 + \frac{1}{4} = 1\frac{1}{4}$ hours after Tom leaves for May to catch up with him.
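Both solutions can be mirrored in a few lines of exact arithmetic. In this Python sketch (the variable names are ours, chosen for illustration), the first computation solves $48T = 60(T - \frac{1}{4})$ for $T$, the second solves $60t = 12 + 48t$ for $t$, and the assertion checks that the two agree once the 15-minute head start is added back.

```python
from fractions import Fraction

head_start = Fraction(1, 4)   # 15 minutes, in hours
tom_speed, may_speed = 48, 60

# First solution: 48T = 60(T - 1/4)  =>  (60 - 48)T = 60 * 1/4
T = may_speed * head_start / (may_speed - tom_speed)

# Second solution: 60t = 12 + 48t, where 12 = 48 * 1/4 is Tom's lead in miles
lead = tom_speed * head_start
t = lead / (may_speed - tom_speed)

assert T == t + head_start   # both count 5/4 hours from Tom's departure
print(T)   # 5/4
```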

Pedagogical Comments. Over the years, TSM has developed what may be
charitably called a "generic work problem", which typically reads as follows:
It takes Regina 10 hours to do a job, and Eric 12 hours to do
the same job. If they work together, how long will it take them
to get the job done?
The mathematical defects of such a problem are overwhelming. First, this problem
cannot be solved if Regina and Eric do not each work at a constant rate, yet the
assumption of constant rate is typically not mentioned. A second assumption is
that, somehow, Regina and Eric manage to do different parts of the job, so that at
the end the two parts fit together perfectly without any interference, getting the
job done faster. If the nature of the work is not made explicit, however, such an
assumption will surely strain students’ credulity. A third serious defect is that the
concept of constant rate becomes difficult to formulate precisely when the job in
question is not clearly specified. Indeed, the average rate of work from time $t_1$ to time $t_2$ is by definition

$$\frac{\text{the amount of work done from } t_1 \text{ to } t_2}{t_2 - t_1}.$$
But the numerator has to be a number (referring to some unit, to be sure), and
a student would have a hard time associating the vague description of "amount of
work" with a number. Such vagueness interferes with the learning of mathematics.
Too often this kind of "work problem" ends up being learned entirely by rote.
Make sure that you will not damage your students’ learning with those kinds of
"work problems". Convince the textbook publishers that if such problems are ever
given, there should be an explicit understanding that the work refers to something
specific, such as mowing a lawn. End of Pedagogical Comments.

Exercises 1.7.
(1) Helena drives from Town A to Town B at x mph and drives back at y
mph. What is her average speed for the round trip? If the round trip
takes t hours, how far apart are the towns?
(2) (a) Define precisely what it means to say that water flows out of the faucet
at a constant rate. (b) A faucet with a constant rate of water flow fills
a tub in 9 minutes. If the rate of water flow increases by 10%, how long
would it take to fill the tub?
(3) Kate and Laura start walking at the same time and walk straight toward
each other at constant speed. Kate walks $1\frac{2}{3}$ times as fast as Laura. If
they are 2,000 feet apart initially and if they meet after $8\frac{1}{2}$ minutes, how
fast does each walk?
(4) Let $A$ and $B$ be two fractions so that $0 < A < B$. (a) Find the midpoint
$C$ of the segment $[A, B]$; i.e., find $C$ so that $B - C = C - A$. (b) Find
the point $D$ so that the ratio of the length of $[A, D]$ to that of $[D, B]$ is
$2 : 5$. (c) Find the point $E$ so that the ratio of the length of $[A, E]$ to that
of $[E, B]$ is $m : n$, where $m$ and $n$ are nonzero whole numbers.
(5) In Examples 1 and 2 of this section (see page 75), an algebraic method is
used to arrive at the desired conclusion. Now solve these problems again,
pictorially, by making use of the number line.
(6) In a bag of blue, red, and green marbles, the ratio of blue to green marbles
is $\frac{1}{2}$, and the ratio of red to green marbles is $\frac{1}{3}$. There are 143 marbles
altogether. How many blue, red, and green marbles are there in the bag?
(7) Most people hold the belief that the concept of constant speed is not at
all complicated. For example, a constant speed of 30 mph means—to
them—that in each one-hour interval, the distance traveled is 30 miles.
The following example shows that they are wrong. Consider an object
that moves according to the following rule:
In the first half-hour, it moves at a constant speed of 60 mph,
and it stops completely during the second half-hour. Then in the
next half-hour, again it moves at a constant speed of 60 mph,
and it stops completely during the next half-hour. Then repeat.
(a) Prove that this motion has the property that in each one-hour interval
(regardless of what the starting point is), the total distance traveled is 30
miles. (b) Is this motion one of constant speed? Explain.
(8) (a) Suppose an object is observed to move at an average speed of $s_1$ mph
from 0 hour to $t$ hours and to move at an average speed of $s_2$ mph from
$t$ hours to $2t$ hours. Show that its average speed from 0 hour to $2t$ hours
is $\frac{1}{2}(s_1 + s_2)$ mph. (b) Can you generalize part (a) to the average speed
of an object observed over $k$ time intervals so that each lasts exactly $t$
hours?
(9) A music store sells a CD player for $225. The owner decides to increase
sales by not charging customers the 8% sales tax. Then he changes his
mind and charges customers $x so that, after they pay the sales tax, the
total amount they pay is still $225. What is x?
(10) A high-tech stock dropped 45% of its value in June to its present value of
N dollars. A stockbroker tells his clients that if the stock goes up by 60%
of its present value, then it would be back to where it was in June. Is he
correct? If so, why? If not, by what percent must the stock at its present
value of N dollars rise in order to regain its former value?
(11) (Sixth-grade Japanese exam question) A train 132 meters long travels at
87 kilometers per hour and another train 118 meters long travels at 93
kilometers per hour. Both trains are traveling in the same direction on
parallel tracks. How many seconds does it take from the time the front of
the locomotive of the faster train reaches the end of the slower train to the
time that the end of the faster train reaches the front of the locomotive
of the slower one?
(12) Highway 505 north of Vacaville is 33 miles long. The speed limit is 55
mph for trucks and 70 mph for cars. Suppose you are driving a truck and
$\frac{1}{2}$ minute after you enter Highway 505 from Vacaville, the first car passes
your truck. How many more cars will pass your truck before you exit the
highway if (a) everybody drives at the speed limit and (b) the distance
between cars is always $\frac{1}{4}$ mile?56
(13) Driving at her usual constant speed of $v$ mph, Stefanie can get from A to
B in 5 hours. Today, after driving 1 hour, she decides to speed up to a
constant speed of $w$ mph so that she can finish the whole trip in $4\frac{1}{2}$ hours
instead of 5. By what percent is $w$ bigger than $v$ (compared to $v$)?
(14) (a) Define precisely what it means for someone to paint a house at a
constant rate. (b) Max and Nancy working together can finish painting
a house in 56 hours. If Max paints the same house alone, it would take
him 90 hours to get it done. How long would it take Nancy to paint the
house if she works alone? (Assume each paints at a constant rate and that
when they paint together there is no interference.)
(15) Each of four people A, B, C, and D can paint a house in 9, 10, 15, and
18 hours, respectively. To paint two such houses they split up in teams
A&D and B&C. If each person paints at a constant rate, which team will
finish first?57
(16) Alfred, Bruce, and Chuck mow lawns at fixed constant rates. It takes
them 2 hours, 1.5 hours, and 2.5 hours, respectively, to mow a certain
lawn if each is mowing alone. If they mow the same lawn together and if
there is no interference in their work, how long will it take them to get it
done?
(17) (a) How much money would be in an account at the end of three years if
the initial deposit is $93 and the bank makes a 6% interest payment at
the end of each year? Assume that no money is ever withdrawn from the
account so that, for instance, at the beginning of the second year there is
$93 plus the interest of the preceding year in the account. (You may use
a calculator, but write down the steps clearly.) (b) How much at the end
of n years?

1.8. Appendix: The basic laws


This appendix is intended to be assigned reading for students. It
offers few details, only a summary of the basic laws of operations
for reference all through these three volumes (this volume and
[Wu2020b] and [Wu2020c]). It is implicitly assumed that these
laws will be discussed in an upper division course in algebra.
In this appendix, we briefly recall the commutative and associative laws of
addition and multiplication and also the distributive law that connects the two.
Our interest in these laws lies mainly in the two standard consequences, Theorems
1 and 2 below, that follow from these laws.
In the following, lowercase italic letters will be used to stand for arbitrary
numbers without further comment. Notice that we are intentionally vague about
what "numbers" we are talking about. Because of FASM (page 133), we have
explicit assurance that these three laws are valid for real numbers (i.e., points on
the number line; see page 6). More is true, however. The fact is that Theorems 1
and 2 are valid even for complex numbers (see Section 5.2 of [Wu2020b]), and these
theorems will be used in such generality without comment in all three volumes.

56 Due to Ole Hald.
57 Due to Ole Hald.
With this understood, the associative law for addition states that for any $x$, $y$, $z$, we always have

$$(x + y) + z = x + (y + z), \tag{1.56}$$

and the commutative law for addition states that for any $x$ and $y$,

$$x + y = y + x. \tag{1.57}$$
A fairly tedious argument, one that is independent of the specific numbers x, y, z
involved but is dependent formally only on these two laws, then leads to the follow-
ing general theorem. For everyday applications, this theorem is all that matters as
far as addition is concerned:

Theorem 1. For any finite collection of numbers, the sums obtained by adding
them up in any order are all equal.58

The reason Theorem 1 is of interest to us stems from the fact that addition is by definition an operation on two numbers only, such as $x + y$. If we are given three numbers, $x$, $y$, and $z$, what are we to make of $x + y + z$? Do we first do the addition $(x + y)$ and then add the sum to $z$, i.e., $(x + y) + z$, or do we add $x$ to the sum $(y + z)$? The associative law (1.56) says it does not matter because the two are equal. In fact, if one decides instead to add $y$ to the sum $(x + z)$, it still yields the same number because

$$\begin{aligned} y + (x + z) &= (x + z) + y && \text{(by (1.57))} \\ &= x + (z + y) && \text{(by (1.56))} \\ &= x + (y + z) && \text{(by (1.57))}. \end{aligned}$$

This simple argument serves to illustrate why Theorem 1 is correct. Henceforth,
we will write
x+y+z
without fear of ambiguity. Similarly, we will likewise write
x+y+w+z
without any fear of ambiguity because Theorem 1 guarantees that
(x + z) + (w + y) = ((z + y) + w) + x = x + (y + (w + z)) = · · · , etc.
The same comment applies of course to the sum
x1 + x2 + · · · + xn
for any numbers x1 , x2 , . . . , xn .
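Theorem 1 can be spot-checked by brute force: take a few numbers, add them in every order and with every placement of parentheses, and collect the results. The Python sketch below does this with exact `Fraction` arithmetic; the helper `all_groupings` and the sample numbers are ours, for illustration only. Exact arithmetic matters here: computer floating-point addition is not associative (for example, `(0.1 + 0.2) + 0.3` and `0.1 + (0.2 + 0.3)` differ), so a float version of this check could fail even though the theorem is about numbers, not their machine approximations.

```python
from fractions import Fraction
from itertools import permutations

def all_groupings(terms):
    """Return the sum of `terms` for every way of parenthesizing it, order fixed."""
    if len(terms) == 1:
        return [terms[0]]
    sums = []
    for split in range(1, len(terms)):
        for left in all_groupings(terms[:split]):
            for right in all_groupings(terms[split:]):
                sums.append(left + right)
    return sums

nums = [Fraction(2, 3), Fraction(5, 7), 4, Fraction(1, 9)]
# every order (permutations) combined with every grouping (all_groupings)
results = {s for order in permutations(nums) for s in all_groupings(order)}
print(results)   # {Fraction(346, 63)}: a single value
```

Of course, a check on particular numbers illustrates Theorem 1 but does not prove it.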

58 There is a proof of this theorem on pp. 39–41 of [Jacobson].

A similar discussion holds for multiplication. Thus the associative and commutative laws for multiplication state that for any $x$, $y$, $z$, we always have

$$x(yz) = (xy)z \quad \text{and} \quad xy = yx,$$

respectively. Then, in like manner, we have

Theorem 2. For any finite collection of numbers, the products obtained by multiplying them in any order are all equal.

As with Theorem 1, Theorem 2 guarantees that for a collection of numbers such as $a$, $b$, $c$, $d$, and $e$, the product $abcde$ always leads to the same number regardless of how the multiplication is carried out.

Finally, the distributive law is the link between addition and multiplication. It states that, for any $x$, $y$, $z$,

$$x(y + z) = xy + xz.$$

Here it is understood that the multiplications $xy$ and $xz$ on the right side are performed before the products are added (you may look ahead to page 313 for a general discussion of the so-called order of operations). A simple argument then extends this law to sums of any number of terms, not just two. For example, the distributive law for a sum of five terms states that for any $x$, $a$, $b$, $c$, $d$, $e$, we have

$$x(a + b + c + d + e) = xa + xb + xc + xd + xe, \tag{1.58}$$

where, once again, it is understood that the multiplications $xa, \ldots, xe$ on the right are performed before the products are added. (Observe that equation (1.58) makes implicit use of Theorem 1.) Moreover, because of the commutativity of multiplication, we also have

$$(a + b + c + d + e)x = ax + bx + cx + dx + ex,$$

among many possibilities.
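Here is a similar spot check of (1.58) and of the version with $x$ on the right, again a minimal Python sketch with exact fractions and arbitrarily chosen sample values. As before, a check on particular numbers illustrates the law but proves nothing.

```python
from fractions import Fraction

# Arbitrary sample values (hypothetical, for illustration only)
x = Fraction(3, 4)
a, b, c, d, e = Fraction(1, 2), 5, Fraction(2, 9), 7, Fraction(11, 6)

left = x * (a + b + c + d + e)                   # left side of (1.58)
right = x * a + x * b + x * c + x * d + x * e    # products first, then the sum
flipped = a * x + b * x + c * x + d * x + e * x  # the version with x on the right

assert left == right == (a + b + c + d + e) * x == flipped
print(left)   # 131/12
```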
CHAPTER 2

Rational Numbers

In this chapter, we introduce negative fractions which, together with fractions, form the so-called rational numbers.1 Before proceeding further, we should ask why we bother with rational numbers at all. To answer this question, we first take a step backwards and look at the transition from whole numbers to fractions: with the whole numbers at our disposal, why did we bother with fractions? One reason is to consider the problem of solving equations. If we ask which whole number $x$ has the property that when multiplied by 7 it is equal to 5, the answer is obviously "none". The fraction $\frac{5}{7}$, on the other hand, has exactly this property. One may therefore say that if we insist on getting a solution to the equation $7x = 5$, then we would inevitably be led to $\frac{5}{7}$. More generally, the solution to the equation $nx = m$, where $m$, $n$ are whole numbers with $n \neq 0$, is the fraction $\frac{m}{n}$. In this sense, we may regard fractions as the collection of all the solutions of the equation $nx = m$ ($n \neq 0$) as $m$ and $n$ run through the whole numbers.
We now come back to rational numbers. With fractions at our disposal, suppose we want a fraction $x$ so that $\frac{2}{3} + x = 0$. Obviously, there is no such fraction. We then introduce the negative fraction $\left(\frac{2}{3}\right)^*$ as the solution to this equation. In the same way, the number $\left(\frac{m}{n}\right)^*$ is the solution of the equation $\frac{m}{n} + x = 0$, for any whole numbers $m$ and $n$ ($n \neq 0$). So we incorporate negative fractions into our number system to form the rational numbers. Once this is done, we are faced with the same problem concerning the rational numbers that we faced earlier concerning fractions, namely, how to define the arithmetic operations among the rational numbers in a way that is consistent with the original arithmetic operations among fractions. We will deal with addition and subtraction first before tackling the more complicated operations of multiplication and division.
Up to now, we have only made use of half of the number line, namely, the number 0 and the numbers to the right of 0. It is time that we make full use of the entire number line, both to the left and right of 0, by defining $\left(\frac{m}{n}\right)^*$ ($m, n \neq 0$) to be the reflection of $\frac{m}{n}$ across 0. Therefore, the numbers $\left(\frac{m}{n}\right)^*$ ($m, n \neq 0$) will be points on the number line to the left of 0. In so doing, we give negative numbers a geometric realization as points on the number line. For school mathematics, this is far better than having no definition for negative numbers (as in TSM2) or defining them abstractly (as in abstract algebra).
With the introduction of negative numbers, the concept of the absolute value of a number becomes meaningful. At the end of the chapter will be found an extended discussion of inequalities among rational numbers, including those involving absolute values. Henceforth, inequalities will play a more prominent role, not just in a purely numerical setting, but in geometry as well.

1 We repeat: in mathematics, "rational numbers" means the collection of fractions and negative fractions, not just fractions.
2 See p. xiv for the definition of TSM.

2.1. The rational numbers


Our first task is to formalize the definition of negative numbers. Then we introduce the concepts of integers and rational numbers. Note that rational numbers are the union of fractions (as defined on page 10) and negative fractions, and they should not be conflated with fractions, as they sometimes are in the education literature.

Recall that a number, or a real number, is a point on the number line (page
5). We now look at all the numbers rather than just 0 and those to the right of
0. The notation for segments [a, b] will now be extended to all points a, b on the
number line, but it will always be understood that a < b. Take any point p on the
number line which is not equal to 0; such a p could be on either side of 0 and, in
particular, it does not have to be a fraction. Denote the mirror reflection of p
on the opposite side of 0 by p∗ ; i.e., p and p∗ are on opposite sides of 0 and are
equidistant from 0 in the sense that:
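if p is to the right of 0, then the segments [p∗, 0] and [0, p] have the same length,

[Figure: p∗ to the left of 0 and p to the right, equidistant from 0.]

and

if p is to the left of 0, then the segments [p, 0] and [0, p∗] have the same length.

[Figure: p to the left of 0 and p∗ to the right, equidistant from 0.]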
If p = 0, we define
0∗ = 0.
For any point p, we denote (p∗ )∗ by p∗∗ ; thus p∗∗ is the mirror reflection of p∗ .
The following is a succinct way of expressing the fact that reflecting a nonzero point
across 0 twice in succession brings it back to itself (if p = 0, then of course 0∗∗ = 0):
(2.1) p∗∗ = p.
Because the fractions are to the right of 0, the numbers such as 1∗, 2∗, or $\left(\frac{5}{9}\right)^*$ are to the left of 0. Here are some examples of mirror reflections of fractions (remember that fractions include the whole numbers):

[Figure: number line showing 3∗, (2 3/4)∗, 2∗, 1∗, (2/3)∗ to the left of 0, and 2/3, 1, 2, 2 3/4, 3 to the right of 0.]

The set of all the fractions and their mirror reflections, i.e., the numbers $\frac{m}{n}$ and $\left(\frac{k}{\ell}\right)^*$ for all whole numbers $k$, $\ell$, $m$, $n$ ($\ell \neq 0$, $n \neq 0$), is called the rational numbers and is denoted by Q (the "Q" stands for quotient; see Theorem 2.11
on page 116). Recall that the whole numbers, denoted by N, are a subset of the
fractions. The set of whole numbers and their mirror reflections,
. . . , 3∗ , 2∗ , 1∗ , 0, 1, 2, 3, . . . ,
is called the integers and is denoted by Z. Note that the integers are the sequence
of equidistant points, extending infinitely both to the left and right of 0, that
includes the sequence of whole numbers. Note also that (recall, the symbol "⊂"
denotes "is a subset of", or "is contained in")
N ⊂ Z ⊂ Q.
We now recall the concept of order among numbers (page 12): for any x, y on the
number line, x < y means that x is to the left of y. An equivalent notation is y > x.
We say x is smaller than y or y is greater than x.
[Figure: x to the left of y on the number line.]

Numbers which are to the right of 0 (thus those x satisfying x > 0) are called
positive, and those which are to the left of 0 (thus those that satisfy x < 0) are
negative. So 2∗ and $\left(\frac{1}{3}\right)^*$ are negative, while all nonzero fractions are positive. The
mirror reflection of a positive number is therefore negative, by definition, and the
mirror reflection of a negative number is positive. The number 0 is, by definition,
neither positive nor negative.
You are undoubtedly accustomed to writing, for example, 2∗ as −2 and $\left(\frac{1}{3}\right)^*$ as
$-\frac{1}{3}$. You also know that the "−" sign in front of −2 is called the negative sign.
So you may wonder why we employ this ∗ notation and have avoided mentioning
the negative sign up to this point. The reason is that the negative sign, having to
do with the operation of subtraction, simply will not figure in our considerations
until we begin to subtract rational numbers. Moreover, the terminology of "negative
sign" carries certain psychological baggage that may interfere with learning rational
numbers the proper way. For example, if a = −3, then there is nothing "negative"
about −a, which is 3. We therefore think it best to hold off introducing the negative
sign until its natural arrival in the context of subtraction in the next section.

Exercises 2.1.
(1) Show that between any two rational numbers, there is another rational
number.
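(2) Which is greater? (i) (1.23)∗ or (1.24)∗? (ii) (1.7)∗ or $\left(\frac{12}{7}\right)^*$?
(iii) $\left(587\frac{1}{5}\right)^*$ or $\left(587\frac{2}{11}\right)^*$? (iv) $\left(\frac{9}{16}\right)^*$ or $\left(\frac{4}{7}\right)^*$?
(3) Which of the following numbers is closest to 0 (on the number line):
$\left(\frac{15}{7}\right)^*$, $\left(\frac{11}{5}\right)^*$, $\frac{13}{6}$, $\frac{9}{4}$?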

2.2. Adding rational numbers


This section introduces the concept of addition among rational numbers in a
way that is analogous to how we will introduce multiplication later on. Thus we do
it abstractly and assume at the outset that the addition of rational numbers must be
commutative and associative and satisfy x + x∗ = 0 for any x. This leads quickly to
the determination of the value of s + t∗ for any fractions s and t. The high point of
this section may well be the definition of subtraction, x − y, as the addition x + y ∗ ,
92 2. RATIONAL NUMBERS

which then allows us for the first time to do the subtraction s − t for any fractions
s and t. In turn, this definition of subtraction justifies the identification of x∗ with
"negative x", −x.
Fundamental assumptions on addition (p. 92)
Addition and mirror reflection (p. 93)
The concept of subtraction (p. 96)

Fundamental assumptions on addition

We can approach the addition of rational numbers by imitating what we did
in Chapter 1, which is to explicitly define the sum of two rational numbers and
then show that, so defined, it satisfies the associative and commutative laws. This
will be done in the next section after we have introduced the new concept of a
"vector" and, for the middle school classroom, this may well be the most prudent
approach to the addition of rational numbers. We will, however, begin with a
more abstract approach in this section, and an explanation is in order. If we look
beyond addition to multiplication (see Section 2.4 on page 105), then we will realize
that, while abstraction can be downplayed for addition, it becomes inevitable for
multiplication. There seems to be no satisfactory way to do the multiplication of
rational numbers short of letting the abstract consideration of the distributive law
dictate the course of action. Recognizing this fact, we set our sights on winning
the war, even if it means struggling through a battle or two in the process, by
presenting the addition of rational numbers in essentially the same way as we will
present multiplication. This has the advantage of imposing a certain unity on the
subject of rational numbers. Moreover, from the point of view of teaching in high
school, it is a good thing to know that, while we downplay unnecessary abstraction
whenever possible, we can nevertheless cope with abstraction when necessary.
Since we already expect the addition of rational numbers, however it is defined,
to satisfy the associative and commutative laws, we may as well assume all these
properties at the outset. Historically, that was pretty much how people before
the eighteenth century dealt with the "new" numbers like negative numbers and
complex numbers. The modus operandi was essentially, "No matter what they are,
let us treat them like any other number." Thus restricting ourselves to addition for
the moment, we make three fundamental assumptions on addition. The first
two are entirely noncontroversial:
(A1) Given any two rational numbers x and y, one can add
these to get another rational number x + y so that if x and y are
fractions, x + y is the same as the usual addition of fractions,
and so that the associative and commutative laws for addition
are satisfied.
(A2) x + 0 = x for any rational number x.
The last assumption explicitly prescribes the role of the mirror reflection of a
number in addition:
(A3) If x is any rational number, x + x∗ = 0.
We would like to bring out the significance of the statement in (A1), to the
effect that if x and y are fractions, then x + y is just ordinary fraction addition.
It tells us that the addition of rational numbers is an extension of the addition of
2.2. ADDING RATIONAL NUMBERS 93

fractions (see the definition of extension on page 29) and not a radical departure
from it, just as the addition of fractions is an extension of the addition of whole
numbers and not a radical departure.
Incidentally, the fact that we assume the addition of rational numbers to be
associative and commutative means that Theorem 1 in the appendix of Chapter
1 (page 86) applies to rational numbers. In particular, we will be free to add a
collection of rational numbers in any order we like.
The last assumption (A3) makes it official that, for example, 2 + 2∗ = 0. As
to (A2), it is not as vacuous as it appears: if x is a negative rational number, then
x+0 is an unknown quantity at the moment because our experience with 0 has been
limited to our encounters with positive quantities. When x is negative, it takes an
explicit assumption to get x + 0 = x.
Because we are assuming that addition among rational numbers is commutative,
(A2) and (A3) then also imply that:
(A2′) 0 + x = x for any rational number x.
(A3′) If x is any rational number, x∗ + x = 0.

Addition and mirror reflection

Now that we have two new operations on the rational numbers—the mirror
reflection ∗ and addition—the first thing we should ask is how they interact with
each other. For example, is the order of applying them interchangeable; i.e., given
two rational numbers x and y, if we add them and then take the mirror reflection,
do we end up with the same number as when we take their mirror reflections first
before adding them? In symbols, this becomes whether (x + y)∗ = x∗ + y ∗ . We
will prove that such is the case, but we need some preparation for this proof in the
form of a lemma.
The lemma in question is the converse of (A3): if x + y = 0, then y = x∗ . The
motivation for the lemma comes from the fact that there are times, even critical
times, when we want to claim that a number y is the mirror reflection of a given
number x (see, for example, the discussion leading up to (2.21) on page 107). What
the lemma tells us is that we can get it done by a straightforward computation;
namely, just compute that x + y = 0. This is a very attractive scenario because it
is always satisfying to be able to give a proof by computation.

Lemma 2.1. For all x, y ∈ Q, if x + y = 0, then y = x∗ and x = y ∗ .

Proof. We exploit (A3′):
x + y = 0 =⇒ x∗ + (x + y) = x∗ + 0
=⇒ (x∗ + x) + y = x∗ + 0 (associative law)
=⇒ 0 + y = x∗ (by (A3′) and (A2))
=⇒ y = x∗ (by (A2′)).
To finish the proof, we also need to show that x = y ∗ . But knowing y = x∗ , we can
take the mirror reflection of both sides to obtain y ∗ = x∗∗ . Since x∗∗ = x, we get
y ∗ = x, which is the same as what we want. The lemma is proved.

We are now in a position to prove what we are after concerning ∗ and addition.
Theorem 2.2. For all x, y ∈ Q, (x + y)∗ = x∗ + y ∗ .

Proof. It is possible to give an elementary proof of the theorem using a case-by-case
argument by letting x or y be alternately a fraction and a negative fraction,
but the proof we are going to give is more sophisticated and makes use of Lemma
2.1. Because the same reasoning will be used a few more times later, it is worth
learning. What Theorem 2.2 asserts is that x∗ +y ∗ is the mirror reflection of (x+y).
Now according to Lemma 2.1, this would be the case if
(x + y) + (x∗ + y ∗ ) = 0.
We now prove this by the repeated use of the associative and commutative laws:
(x + y) + (x∗ + y ∗ ) = (x + x∗ ) + (y + y ∗ ) = 0 + 0 = 0,
where we have made use of (A2) and (A3) in the last two equalities. This proves
the theorem.

Theorem 2.2 immediately tells us how to add two negative fractions, for example,
(3/4)∗ and 5∗, as follows:
$$\left(\tfrac{3}{4}\right)^* + 5^* = \left(\tfrac{3}{4} + 5\right)^* = \left(5\tfrac{3}{4}\right)^*.$$
Perhaps it is not obvious, but Theorem 2.2 is also a statement about how to "remove
parentheses", as we shall see on page 97.
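As a quick numerical sanity check of Theorem 2.2 and the example above, here is a short Python sketch. It assumes that Python's `fractions.Fraction` type, with its built-in negative values, can stand in for the rational numbers of this chapter, and it models the mirror reflection x∗ by a hypothetical helper `mirror` that simply negates. It only spot-checks the identity on a finite sample of numbers; it is an illustration, not a proof.

```python
from fractions import Fraction

def mirror(x: Fraction) -> Fraction:
    """Hypothetical helper: the mirror reflection x* of x across 0, modeled as negation."""
    return -x

# The example from the text: (3/4)* + 5* = (3/4 + 5)* = (5 3/4)*.
lhs = mirror(Fraction(3, 4)) + mirror(Fraction(5))
rhs = mirror(Fraction(3, 4) + Fraction(5))
print(lhs, rhs)  # -23/4 -23/4

# Spot-check Theorem 2.2, (x + y)* = x* + y*, on a small grid of rational numbers.
samples = [Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3)]
assert all(mirror(x + y) == mirror(x) + mirror(y) for x in samples for y in samples)
```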

With these basic facts out of the way, we are in a position to explicitly compute
the sum of any two rational numbers. Since we already know how to add 0 to any
number, by (A2) and (A2′), it suffices to consider the sum of two nonzero rational
numbers. Now a nonzero rational number is either a fraction or a negative fraction,
so we proceed to look at all the possibilities. Therefore let s and t be any two
nonzero fractions; i.e., s and t are both positive. Then the following four cases
exhaust all the possibilities in adding two rational numbers:
s + t, s∗ + t∗ , s + t∗ , s∗ + t.
By (A1), we already know how to compute the sum s + t as these are fractions
(see equation (1.12) on page 33). Therefore, we only need to examine the three
remaining cases. We emphasize that s and t here stand for any two fractions. Since
s + t∗ = t∗ + s, we see that the fourth case above follows from the third case.3
Therefore we need examine only the following two cases:
s∗ + t∗ and s + t∗ .
The first case is easily disposed of because by Theorem 2.2, we have
s∗ + t∗ = (s + t)∗ .

3 Such an assertion is sometimes confusing to a beginner who becomes fixated on the symbols
s and t themselves. Here is a more detailed explanation of this assertion. Suppose we can prove a
formula for the third case s + t∗ for all fractions s and t. Then the commutative law of addition
implies that we have a formula for t∗ + s, again for all positive fractions s and t. Since s and t are
arbitrary, we may switch the symbols and write s for t as well as t for s, all the while remembering
that they stand for any fractions. Assuming that the switch has been done, then we have a formula
for s∗ + t for all positive fractions s and t. But then this is exactly the fourth case.
As to the second case, i.e., s + t∗, we claim
(2.2) s + t∗ = (s − t) if s ≥ t,
(2.3) s + t∗ = (t − s)∗ if s < t.
We would like to preface the computations with a general comment. As proofs
go, these are about as sophisticated as mathematics in grades K–8 is ever going
to get. Not long or complicated, just sophisticated. Let us explain what the latter
means. To prove that two numbers are equal, the normal procedure is to do a
straightforward computation. For example, to prove that if x and y are numbers,
(x − y)(x + y) = x² − y², one expands the left side by the distributive law and collects
terms to arrive at the right side. However, the above equalities cannot be proved
this way. Rather, the proof is achieved by an indirect method which appeals to
Lemma 2.1; it requires a delicate touch and cannot be boiled down to a mechanical
procedure. Is such a proof strictly necessary? For the addition of rational numbers
the answer is no, because in the next section, we will use the hands-on method
of vector addition to get the same result. However, if our goal is understanding
the multiplication of rational numbers (Section 2.4 on page 105), then there would
seem to be no way of getting around such sophisticated abstract arguments. Such
a proof for addition, therefore, serves the purpose of acclimating ourselves to the
inevitable abstractions, and ultimately to the learning of algebra. As a teacher,
you have to thoroughly internalize these kinds of arguments before you can hope
to teach them with conviction to your students.
Let us begin the computation of s + t∗ proper. We first consider (2.2); i.e.,
s + t∗ = (s − t) if s ≥ t.
For example, this formula allows us to compute 14 + 2.5∗ because
14 + 2.5∗ = (14 − 2.5) = 11.5.
To prove s + t∗ = (s − t), observe that since s ≥ t, the subtraction s − t makes
sense as ordinary subtraction between two fractions, and we have (s − t) + t = s
(see (1.18) on page 39). Adding t∗ to both sides gives (s − t) + t + t∗ = s + t∗. Since
t + t∗ = 0 by (A3), the associative law and (A2) reduce the left side to s − t, and we
get s − t = s + t∗, as desired.
Next, we prove (2.3); i.e.,
s + t∗ = (t − s)∗ if s < t.
This assertion now allows us to compute a sum such as 9∗ + 3.6, because
9∗ + 3.6 = (9 − 3.6)∗ = 5.4∗ .
The reason for s + t∗ = (t − s)∗ is that we can apply Theorem 2.2 (and the fact
that s∗∗ = s) to get
s + t∗ = (s∗ + t)∗ .
We can simplify the right side: we have t > s, so (2.2) shows that s∗ + t = t + s∗ =
(t − s). Altogether, we have s + t∗ = (t − s)∗ for s < t. This proves our claim.
We have thus shown how to compute the sum of any two rational numbers strictly
on the basis of assumptions (A1)–(A3). The results are collected in the following
theorem.
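Theorem 2.3. For all fractions s and t,
$$
\begin{aligned}
(2.4)\qquad & s + t = \text{the ordinary sum of the fractions } s \text{ and } t,\\
(2.5)\qquad & s^* + t^* = (s + t)^*,\\
(2.6)\qquad & s + t^* = t^* + s =
\begin{cases}
(s - t) & \text{if } s \ge t,\\
(t - s)^* & \text{if } s < t.
\end{cases}
\end{aligned}
$$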
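Theorem 2.3 is in effect an algorithm: it computes any sum of rational numbers using only fraction addition and the ordinary subtraction of a smaller fraction from a larger one. The Python sketch below encodes that algorithm under a representation of our own choosing: a rational number is stored as a nonnegative `Fraction` magnitude together with a flag recording whether it has been mirror-reflected. The names `Rat`, `star`, `add`, `to_fraction`, and `from_fraction` are hypothetical, and Python's signed `Fraction` arithmetic is used only to cross-check the results.

```python
from fractions import Fraction
from typing import NamedTuple

class Rat(NamedTuple):
    """Hypothetical representation: a fraction s (mirrored=False) or its mirror
    reflection s* (mirrored=True). The magnitude `mag` is always a fraction >= 0."""
    mag: Fraction
    mirrored: bool = False

def star(x: Rat) -> Rat:
    """Mirror reflection x*; 0* is taken to be 0 itself."""
    return Rat(x.mag, not x.mirrored) if x.mag != 0 else Rat(Fraction(0))

def add(x: Rat, y: Rat) -> Rat:
    """Sum of two rational numbers using only fraction arithmetic and (2.4)-(2.6)."""
    s, t = x.mag, y.mag
    if not x.mirrored and not y.mirrored:   # (2.4): the ordinary fraction sum s + t
        return Rat(s + t)
    if x.mirrored and y.mirrored:           # (2.5): s* + t* = (s + t)*
        return star(Rat(s + t))
    # (2.6): with p the unmirrored fraction and q* the mirrored one,
    # p + q* = q* + p = (p - q) if p >= q, and (q - p)* if p < q.
    p, q = (t, s) if x.mirrored else (s, t)
    return Rat(p - q) if p >= q else star(Rat(q - p))

def to_fraction(x: Rat) -> Fraction:
    """Convert to Python's signed Fraction, used here only for cross-checking."""
    return -x.mag if x.mirrored else x.mag

def from_fraction(f: Fraction) -> Rat:
    return Rat(abs(f), f < 0)

# The two worked examples from the text: 14 + 2.5* = 11.5 and 9* + 3.6 = 5.4*.
print(add(Rat(Fraction(14)), star(Rat(Fraction("2.5")))))  # Rat(mag=Fraction(23, 2), mirrored=False)
print(add(star(Rat(Fraction(9))), Rat(Fraction("3.6"))))   # Rat(mag=Fraction(27, 5), mirrored=True)
```

Keeping the magnitude and the mirror flag separate means `add` never subtracts a larger fraction from a smaller one, which matches the restriction the text works under before subtraction is extended to all rational numbers.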

The concept of subtraction

One consequence of the preceding explicit computations is the following unexpected
insight into the subtraction of fractions: if s, t are fractions and s ≥ t, then
s − t makes sense and
(2.7) s − t = s + t∗
(see (2.6)). In other words, the subtraction of fractions (in the sense of page 38)
can be expressed as an addition in the larger context of rational numbers. What is
striking about (2.7) is that, although the left side of (2.7) makes sense only when
s ≥ t, the right side of (2.7) makes sense for any two fractions s and t. If we recall
the discussion of extension on page 29, then we are naturally prompted to use (2.7)
to extend the definition of subtraction to all fractions s and t by defining s − t to
be the rational number s + t∗ . This is an extension in the sense of page 29 because,
according to (2.7), when s ≥ t, s + t∗ agrees with the usual meaning of s − t given
on page 38. For the first time, we can now freely subtract one fraction from another
without worrying about whether one is bigger than the other.
However, we can go a step further: why restrict s and t to fractions? The right
side of (2.7) makes sense even when s and t are rational numbers. We are therefore
led to the following definition.

Definition. Given two rational numbers x and y, the subtraction x − y is
defined to be
x − y = x + y∗.

This definition reveals that subtraction is just a different way of writing addition
among rational numbers. This enlarges on equation (1.27) on page 46 and equation
(1.33) on page 57, which show that division is merely a different way of writing
multiplication.
In the rest of this section, we will explore some ramifications of this concept of
subtraction. The overriding fact is that, without this general definition, we do not
have a good grasp of what subtraction is about. Beyond the oddity of not being
able to subtract a larger fraction from a smaller one, there is also the unpleasant
observation that "subtraction is not associative"; i.e., in general, (x − y) − z ≠
x − (y − z) for fractions x, y, z. For example, letting x = 4, y = 2, z = 1, the left
side is 1 while the right side is 3. We need clarity in the situation.
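The definition x − y = x + y∗ can be sketched in the same spirit. The Python below again assumes that `fractions.Fraction`, with negation playing the role of the mirror reflection, stands in for the rational numbers; the hypothetical helper `sub` performs subtraction only through addition and mirror reflection. It shows that a larger fraction can now be subtracted from a smaller one, and it reproduces the example x = 4, y = 2, z = 1.

```python
from fractions import Fraction

def star(x: Fraction) -> Fraction:
    """Mirror reflection x*, modeled as negation of a Fraction."""
    return -x

def sub(x: Fraction, y: Fraction) -> Fraction:
    """Subtraction as defined in the text: x - y is the sum x + y*."""
    return x + star(y)

# Subtracting a larger fraction from a smaller one now makes sense.
print(sub(Fraction(1, 3), Fraction(1, 2)))  # -1/6

# Subtraction is not associative: (4 - 2) - 1 versus 4 - (2 - 1).
x, y, z = Fraction(4), Fraction(2), Fraction(1)
print(sub(sub(x, y), z), sub(x, sub(y, z)))  # 1 3
```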
We start from the beginning. Letting x = 0 in the definition of x − y, we have
0 − y = y∗
because 0 + y∗ = y∗. There is universal agreement to abbreviate 0 − y to −y.
We do that and get that for every y ∈ Q,
(2.8) −y = y∗.
Henceforth, we will generally abandon the notation of y ∗ and replace it by the more
common −y. We call −y minus y.
Let us restate some of our previous conclusions in the new notation. From the
definition of x − y as x + y ∗ , (2.8) implies that for all x, y ∈ Q,
(2.9) x − y = x + (−y).
From x∗∗ = x for any x ∈ Q, we get
−(−x) = x.
(A3) now states that:
(A3*) If x is any rational number, x + (−x) = (−x) + x = 0.
Lemma 2.1 and Theorems 2.2 and 2.3 now read:

Lemma 2.1*. For all x, y ∈ Q, if x + y = 0, then y = −x and x = −y.

Theorem 2.2*. For all x, y ∈ Q, −(x + y) = −x − y.

(In Theorem 2.2*, observe that x∗ + y∗ = x∗ − y, by the definition of subtraction,
so that, by (2.8), x∗ + y∗ = −x − y.)

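Theorem 2.3*. For all fractions s and t,
$$
\begin{aligned}
& s + t = \text{the ordinary sum of the fractions } s \text{ and } t,\\
(2.10)\qquad & -s - t = -(s + t),\\
(2.11)\qquad & s - t = -t + s =
\begin{cases}
(s - t) & \text{if } s \ge t,\\
-(t - s) & \text{if } s < t.
\end{cases}
\end{aligned}
$$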

Remark. We now see that Theorem 2.2*, in the form of
−(x + y) = −x − y
for all rational numbers x and y, is a statement about "removing parentheses". We
can go a step further: for all rational numbers x and y,
(2.12) −(x − y) = −x + y,
(2.13) −(−x + y) = x − y.
We leave these as exercises (see Exercise 4 on page 99).

We pursue the theme that subtraction is another way of writing addition among
rational numbers and bring closure to a remark we made about equation (1.19) on
page 40 on the subtraction of fractions. We now show that for any rational numbers
a, b, x, y,
(2.14) (a + b) − (x + y) = (a − x) + (b − y).
This is because
(a + b) − (x + y) = a + b + (x + y)∗ = a + b + x∗ + y ∗ ,
where the first equality is by the definition of subtraction and the second equality
is on account of Theorem 2.2. Thus (a + b) − (x + y) = (a + x∗ ) + (b + y ∗ ), by
Theorem 1 of the appendix in Chapter 1 (page 86). By the definition of subtraction
again, we get (a + b) − (x + y) = (a − x) + (b − y).
It is clear from this reasoning that there is a similar assertion if a + b is replaced
by a sum of k rational numbers for any positive integer k and the same is done to
x + y. The details are left as an exercise (Exercise 7 on page 99).
There is a certain "commutativity of subtraction" that follows naturally from
the definition of subtraction in terms of addition. For any x, y ∈ Q, we claim that
(2.15) −x + y = y − x.
For example, −2/3 + 4 = 4 − 2/3, both being equal to 10/3, as a simple application of
Theorem 2.3* shows. In general, we have
−x + y = x∗ + y (2.8)
= y + x∗ (commutativity)
= y−x (definition of subtraction).
By contrast, we should caution that there is no "associative law for subtraction";
i.e., in general (x − y) − z ≠ x − (y − z). This is because the left side is equal to
(x − y) − z = (x + y ∗ ) + z ∗ (definition of subtraction)
= x + (y ∗ + z ∗ ) (associativity)
= x + (y + z)∗ (Theorem 2.2)
= x − (y + z) (definition of subtraction).
In other words, (x − y) − z = x − (y + z). Therefore (x − y) − z = x − (y − z) if and
only if x − (y + z) = x − (y − z), and a straightforward argument shows that the
latter happens for all x and y if and only if z = 0. We leave the details to Exercise
6 on page 99.
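The two facts just derived, (2.15) and (x − y) − z = x − (y + z), can be spot-checked with exact arithmetic. The Python sketch below assumes `fractions.Fraction` as a model of the rational numbers; it reproduces the example −2/3 + 4 = 4 − 2/3 = 10/3 and tests both identities over a small grid of values, which illustrates them but proves nothing.

```python
from fractions import Fraction
from itertools import product

# The example following (2.15): -2/3 + 4 and 4 - 2/3 are both 10/3.
print(-Fraction(2, 3) + 4, 4 - Fraction(2, 3))  # 10/3 10/3

grid = [Fraction(n, d) for n in range(-4, 5) for d in (1, 2, 3)]

# (2.15): -x + y = y - x for every sampled pair.
assert all(-x + y == y - x for x, y in product(grid, repeat=2))

# (x - y) - z = x - (y + z): subtracting twice is subtracting the sum.
assert all((x - y) - z == x - (y + z) for x, y, z in product(grid, repeat=3))

# By contrast, (x - y) - z = x - (y - z) holds in this sample exactly when z = 0.
assert all(((x - y) - z == x - (y - z)) == (z == 0) for x, y, z in product(grid, repeat=3))
```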
Before we bring this section to a close, we will prove a generalization of (1.16)
on page 39. We will prove that for any rational numbers x and y,
(2.16) the length of the segment [x, y] is y − x.
To prove this, first let x, y ≥ 0.
[Figure: the number line with the points 0, x, y from left to right.]
Then by definition, the lengths of [0, y] and [0, x] are y and x, respectively, so that
if the length of [x, y] is ℓ, then x + ℓ = y (because [0, y] is the concatenation of [0, x]
and [x, y]).4
It follows that the length of [x, y] is ℓ = y − x. Next, suppose x, y ≤ 0. Then
we know the length of [−y, −x] is (−x) − (−y) by what we have just proved.
[Figure: the number line with the points x, y, 0, −y, −x from left to right.]
4 We are appealing to geometric intuition right now, but this reasoning about length will be
put on a formal basis in Section 4.1 of [Wu2020c] as the Additivity Property of length.
2.3. THE VECTORIAL REPRESENTATION OF ADDITION 99

Since [x, y] and [−y, −x] have the same length, we see that the length of [x, y] in
this case is (−x) − (−y) = (−x) + y = y − x on account of (2.15). Finally, there
remains the case that x < 0 and y > 0.

[Figure: the number line with the points x, 0, y from left to right.]
Now the segment [x, y] is the concatenation of [x, 0] and [0, y]. By the preceding
results, we have
length of [x, y] = (0 − x) + (y − 0) = y − x.
The proof of (2.16) is complete.
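The case analysis in this proof can be mirrored in code. The sketch below assumes `fractions.Fraction` for rational numbers and uses a hypothetical function `segment_length` that computes the length of [x, y] (with x ≤ y) separately in each of the three cases of the proof, relying only on the facts used there; checking it against y − x confirms that every case gives the same formula.

```python
from fractions import Fraction

def segment_length(x: Fraction, y: Fraction) -> Fraction:
    """Length of the segment [x, y], x <= y, following the three cases in the proof of (2.16)."""
    if x > y:
        raise ValueError("expected x <= y")
    if x >= 0:
        # Case x, y >= 0: the length l satisfies x + l = y, so l = y - x.
        return y - x
    if y <= 0:
        # Case x, y <= 0: [x, y] has the same length as its mirror image [-y, -x].
        return segment_length(-y, -x)
    # Case x < 0 < y: [x, y] is the concatenation of [x, 0] and [0, y].
    return segment_length(x, Fraction(0)) + segment_length(Fraction(0), y)

for x, y in [(Fraction(1, 2), Fraction(3)), (Fraction(-5), Fraction(-1, 3)), (Fraction(-7, 4), Fraction(2))]:
    print(f"length of [{x}, {y}] = {segment_length(x, y)}")
```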

Exercises 2.2.
(1) Prove that for all x, y ∈ Q, if x + y = x, then y = 0.
(2) Compute (a) −202 + 189, (b) −93 − 728, (c) −3 2/5 + 9,
(d) −4 6/7 + 2 2/3, (e) −7.1 − 22 1/3, (f) 7 − (2.5 − 3 2/3),
7 ∗
(g) (−703.2 + 689.4) − ( 15 − 3 23 ), (h) ( 56 − (1 18 5
) ) + 24 .
(3) Without using Theorem 2.3 or Theorem 2.3* and using only (A1)–(A3),
Lemma 2.1, and Theorem 2.2, explain as if to an eighth grader why
4/3 − 2 1/5 = −13/15.
(4) Prove (2.12) and (2.13) on page 97: i.e., for all x, y ∈ Q, we have
−(x − y) = −x + y and −(−x + y) = x − y. Give a reason at each
step.
(5) Explain carefully why each of the following is true for all x, y, z ∈ Q:
(a) (x − y) − z = (x − z) − y. (b) x − (y − z) = (x − y) + z. (c) (x + y) − z =
x − (z − y).
(6) Prove that for rational numbers x, y, and z with x, y = 0, x − (y + z) =
x − (y − z) if and only if z = 0.
(7) (a) Let a, b, . . . , z, w be rational numbers. Give a detailed proof of the
following and justify every step:
(a + b + c + d) − (x + y + z + w) = (a − x) + (b − y) + (c − z) + (d − w).
(b) Can you extend (a) from a pair of 4 rational numbers to a pair of n
rational numbers for any positive integer n? For notation, try
(a1 + a2 + · · · + an ) − (x1 + x2 + · · · + xn )
= (a1 − x1 ) + (a2 − x2 ) + · · · + (an − xn ).
(Although officially we do not take up mathematical induction until Sec-
tion 1.7 in [Wu2020b], you may use that technique here if you want.)

2.3. The vectorial representation of addition


We introduce the concept of a vector on the number line and bring the abstract
concept of addition down to earth by describing the addition of two rational numbers
in terms of the addition of the two vectors associated with the numbers. An impor-
tant property of this representation of addition is that when the rational numbers
are fractions, the vector addition so described agrees with the earlier definition of
fraction addition in terms of the concatenation of segments.
Vectors and their addition (p. 100)
Adding rational numbers via vectors (p. 102)

Vectors and their addition

We learned in the preceding section how to add rational numbers on the as-
sumption that addition satisfies the three reasonable properties (A1)–(A3) (see page
92), and we also observed that this way of adding rational numbers coincides with
the concatenation of segments if the rational numbers are positive (see page 12 for
the meaning of "concatenation"). For a sixth- or seventh-grade classroom, however,
it is better to have a more concrete approach to addition, either as an alternative
or, at least, as a supplement. We now outline such an approach by returning to
the number line and introducing new objects called "vectors". Then we define the
addition of vectors, and on the basis of that we give a new definition of the addi-
tion of rational numbers. At the end of the section, we will indicate why the two
definitions, the one introduced in the last section and the present one using vectors,
coincide.
Thus we start from the beginning all over again, pretending never to
have heard of the addition of rational numbers. Let us introduce a definition: a
vector is a segment on the number line, together with a designation of one of its
two endpoints as a starting point and the other as an endpoint. In pictures,
we put an arrowhead at the endpoint of a vector to indicate its direction from the
starting point to the endpoint, as shown:
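[Figure: a right-pointing vector, with its starting point labeled at the left end and its endpoint, marked by the arrowhead, at the right end.]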
We will continue to refer to the length of the segment as the length of the vector.5
We call the vector left-pointing if the endpoint is to the left of the starting point,
and right-pointing if the endpoint is to the right of the starting point. The above
vector is right-pointing, for example. The direction of a vector refers to whether
it is left-pointing or right-pointing.
We denote vectors by placing an arrow above the letter, e.g., $\vec{A}$, $\vec{x}$, etc. For
example, the vector $\vec{K}$ below is left-pointing and has length 1, with a starting point
at 1∗ and an endpoint at 2∗, while the vector $\vec{L}$ is right-pointing and has length 2,
with a starting point at 0 and an endpoint at 2.
[Figure: the number line marked at 3∗, 2∗, 1∗, 0, 1, 2, 3, with the left-pointing vector $\vec{K}$ from 1∗ to 2∗ and the right-pointing vector $\vec{L}$ from 0 to 2.]
We will regard a vector with zero length as the zero vector, to be denoted by
$\vec{0}$. Unless stated to the contrary, $\vec{0}$ will always be understood to be the vector
with starting point and endpoint at 0.
For the purpose of discussing the addition of rational numbers, we can further
simplify matters by restricting our attention to a special class of vectors. Let x
5 Remember that length is always ≥ 0.
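be a rational number; then we define the vector $\vec{x}$ to be the vector with starting
point at 0 and endpoint at x. It follows from the definition that if x is a nonzero
fraction, then the segment of the vector $\vec{x}$ is exactly [0, x]. Here are two examples
of vectors arising from rational numbers:
[Figure: the number line marked at 4∗, 3∗, 2∗, 1∗, 0, 1, 2, with the left-pointing vector $\overrightarrow{3^*}$ from 0 to 3∗ and the right-pointing vector $\overrightarrow{1.5}$ from 0 to 1.5.]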
In the following, we will only consider vectors $\vec{x}$ where x ∈ Q, so that all
vectors under discussion will be understood to have their starting point at 0.

We now describe how to add such vectors. Given $\vec{x}$ and $\vec{y}$, where x and y are
two rational numbers, the sum vector $\vec{x} + \vec{y}$ is, by definition, the vector whose
starting point is 0 and whose endpoint is obtained as follows:
Slide the vector $\vec{y}$ along the number line until its starting point
(which was 0) is at the endpoint of $\vec{x}$; then the endpoint of $\vec{y}$
in this new position is by definition the endpoint of $\vec{x} + \vec{y}$.
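As an example, to obtain the endpoint of the sum $\vec{2} + \overrightarrow{1^*}$, we slide the vector
$\overrightarrow{1^*}$ until its starting point (i.e., 0) is moved to 2, as shown:
[Figure: the number line marked at 3∗, 2∗, 1∗, 0, 1, 2, with $\overrightarrow{1^*}$ slid so that it starts at 2 and ends at 1.]
The vector $\vec{2} + \overrightarrow{1^*}$ is therefore the vector that starts at 0 and ends at 1, as shown:
[Figure: the number line marked at 3∗, 2∗, 1∗, 0, 1, 2, with the right-pointing vector $\vec{2} + \overrightarrow{1^*}$ from 0 to 1.]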
The vectorial definition of the addition of rational numbers may be best illus-
trated by (interactive) animations. Here are links to four such animations (due to
Sunil Koswatta) for the four cases of x > 0 and y > 0, x > 0 and y < 0, x < 0 and
y > 0, and x < 0 and y < 0, respectively:
https://www.geogebra.org/m/jsvempak,
https://www.geogebra.org/m/ynqmvgsz,
https://www.geogebra.org/m/cqsz7n7w,
https://www.geogebra.org/m/svvzmgyd.
In general, a vector is completely determined by its length and its direction.
Therefore the following is a complete description of the sum of any two vectors:
If both vectors $\vec{x}$ and $\vec{y}$ have the same direction, then the sum
vector $\vec{x} + \vec{y}$ has the same direction as $\vec{x}$ and $\vec{y}$, and its length
is the length of the concatenation of the segments of $\vec{x}$ and $\vec{y}$
(see page 12 for the meaning of "concatenation") and is therefore
the sum of the lengths of $\vec{x}$ and $\vec{y}$.
If the vectors $\vec{x}$ and $\vec{y}$ have different directions, then the
direction of the sum vector $\vec{x} + \vec{y}$ is the same as the direction
of the vector with the greater length, and the length of the sum
vector $\vec{x} + \vec{y}$ is the difference of the lengths of the vectors $\vec{x}$
and $\vec{y}$.
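The description above uses only lengths and directions, so it translates directly into a small program. The Python sketch below is one way to do this under assumptions of our own: a vector starting at 0 is represented by its direction (+1 for right-pointing, −1 for left-pointing, 0 for the zero vector) and its length, a nonnegative `Fraction`; `Vec`, `vec`, `vec_add`, and `endpoint` are hypothetical names. The explicit handling of the zero vector and of two opposite vectors of equal length is our addition, consistent with the rule that the length of the sum is then 0. Inside `vec_add` only fraction addition and the subtraction of a shorter length from a longer one are used.

```python
from fractions import Fraction
from typing import NamedTuple

class Vec(NamedTuple):
    """A vector starting at 0: direction +1 (right), -1 (left), or 0 (zero vector)."""
    direction: int
    length: Fraction

def vec(x: Fraction) -> Vec:
    """The vector from 0 to the rational number x."""
    return Vec((x > 0) - (x < 0), abs(x))

def vec_add(u: Vec, v: Vec) -> Vec:
    """Sum of two vectors by the direction/length description in the text."""
    if u.direction == 0:
        return v
    if v.direction == 0:
        return u
    if u.direction == v.direction:        # same direction: the lengths add
        return Vec(u.direction, u.length + v.length)
    if u.length == v.length:              # opposite and equally long: the zero vector
        return Vec(0, Fraction(0))
    longer, shorter = (u, v) if u.length > v.length else (v, u)
    return Vec(longer.direction, longer.length - shorter.length)

def endpoint(w: Vec) -> Fraction:
    """The point of the number line at which the vector w ends."""
    return w.direction * w.length

# Example from the text: the vector of 2 plus the vector of 1* ends at 1.
print(endpoint(vec_add(vec(Fraction(2)), vec(Fraction(-1)))))  # 1
```

Since `vec_add` treats its two arguments symmetrically, the commutativity recorded in Lemma 2.4 can be read off from the code itself.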
As an application, observe that the above description of the sum $\vec{x} + \vec{y}$, where
x and y are two rational numbers, does not make any distinction between whether
x comes first or y comes first. In other words, according to this description of vector
addition, $\vec{x} + \vec{y}$ is the same vector as $\vec{y} + \vec{x}$. We have the following lemma.

Lemma 2.4. If x and y are rational numbers, then $\vec{x} + \vec{y} = \vec{y} + \vec{x}$.

Adding rational numbers via vectors

We are now in a position to introduce the addition of rational numbers. The
sum x + y of any two rational numbers x and y is by definition the endpoint
of the vector $\vec{x} + \vec{y}$. In other words,
x + y = the endpoint of $\vec{x} + \vec{y}$.
Put another way, x + y is defined to be the point on the number line so that its
corresponding vector $\overrightarrow{x+y}$ satisfies
(2.17) $\overrightarrow{x+y} = \vec{x} + \vec{y}$.
From Lemma 2.4, we conclude:

Lemma 2.5. The addition of rational numbers using the vector method is com-
mutative.

To get some intuitive feelings for the sum of two rational numbers x and y, we
will now systematically go through the various possibilities for $\vec{x} + \vec{y}$. These are:
(i) Both x and y are positive.
(ii) Both x and y are negative.
(iii) x is positive but y is negative, and the length of $\vec{x}$ is less
than the length of $\vec{y}$.
(iv) x is positive but y is negative, and the length of $\vec{y}$ is less
than the length of $\vec{x}$.
Because we know from Lemma 2.4 that $\vec{x} + \vec{y} = \vec{y} + \vec{x}$, the preceding four cases
exhaust all the possibilities. For example, there is no need to consider the case in
which x is negative but y is positive and the length of $\vec{y}$ is less than the length of
$\vec{x}$, because looking at $\vec{y} + \vec{x}$ (which is equal to $\vec{x} + \vec{y}$), this would be case (iii)
above provided we interchange the symbols x and y.
Now case (i) is straightforward: we are given the following picture:
[Figure: the number line showing the right-pointing vectors $\vec{x}$ from 0 to x and $\vec{y}$ from 0 to y.]
The vector $\vec{x} + \vec{y}$ is, according to the definition of the vector sum, right-pointing
with length equal to the sum of the lengths of $\vec{x}$ and $\vec{y}$. The number x + y is
therefore the point indicated by the down-arrow:
[Figure: $\vec{y}$ in its new position, starting at x; a down-arrow marks its endpoint, which is the point x + y.]
This confirms the fact that for two fractions x and y, their sum as rational numbers
by the use of vectors is exactly the length of the concatenation of the segments
[0, x] and [0, y], i.e., the same as their sum as fractions.

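Next we tackle case (iii):
[Figure: the number line with y, 0, x from left to right, showing the left-pointing vector $\vec{y}$ and the shorter right-pointing vector $\vec{x}$.]
Then, by definition, x + y is the point as indicated:
[Figure: $\vec{y}$ in its new position, starting at x; a down-arrow marks its endpoint, the point x + y, which lies to the left of 0.]
In this case x + y is negative, and the picture shows that the length of the segment
[x + y, 0] is the length of $\vec{y}$ minus the length of $\vec{x}$. This is consistent with the
description of $\vec{x} + \vec{y}$ above.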
We leave cases (ii) and (iv) to an exercise (Exercise 1 on page 104).

The preceding discussion shows that if x and y are fractions, then x + y, being
the length of the concatenation of [0, x] and [0, y], has exactly the same meaning
as that on page 33. Therefore the addition of fractions x + y as defined by vector
addition in (2.17) coincides with the addition of fractions in the sense of page 33. In
fact, we are going to show that the addition of rational numbers according to (2.17)
using vectors is equal to the addition of rational numbers according to (A1)–(A3)
on page 92.
Let s, t be two fractions. Then the sum of two rational numbers has to be one
of the following four types of sums:
s + t, s + t∗ , s∗ + t, s∗ + t∗ .
In order to show that the sum of two rational numbers is the same whether they
are added according to the method of Section 2.2 or according to the method of
vectors in equation (2.17) (page 102), it suffices to show that each of the above
four sums is the same regardless of which of these two methods is used. We shall
address each of these four sums in the order listed above.
For this particular discussion, we are going to employ an ad hoc notation: for
two rational numbers x and y,
x ⊕ y := the sum of x and y as defined by (2.17),
while x + y will continue to denote the addition in the sense of Section 2.2. We
have just seen that for fractions s and t,
s ⊕ t = s + t = the ordinary sum of s and t.
Next, we prove s ⊕ t∗ = s + t∗. According to equation (2.6) of Section 2.2
on page 96,
$$s + t^* = \begin{cases} s - t & \text{if } s \ge t,\\ (t - s)^* & \text{if } s < t. \end{cases}$$
Now we use the description of the sum of two vectors on page 101: if s ≥ t, then
the direction of $\vec{s} + \overrightarrow{t^*}$ is the direction of $\vec{s}$ (i.e., right-pointing) as it is longer, and
its length is s − t. This shows s ⊕ t∗ = s + t∗, in case s ≥ t. If, however, s < t,
then the same observation about the sum of two vectors says that the direction
of $\vec{s} + \overrightarrow{t^*}$ is the direction of $\overrightarrow{t^*}$ (i.e., left-pointing) as it is longer, and its length
is t − s. In other words, we also have s ⊕ t∗ = s + t∗ in case s < t. Thus
s ⊕ t∗ = s + t∗ for any fractions s and t.
Next, we prove
(2.18) s∗ ⊕ t = s∗ + t.
By Lemma 2.5 on page 102, we have
s∗ ⊕ t = t ⊕ s∗.
By what we have just proved, we have
t ⊕ s∗ = t + s∗.
Since adding rational numbers according to Section 2.2 is commutative, we also get
t + s∗ = s∗ + t.
The combination of the last three equalities clearly proves (2.18). We have thus
disposed of three of the four sums.
It remains to deal with the sum s∗ ⊕ t∗. By Theorem 2.2 on page 94, s∗ + t∗ =
(s + t)∗. But since both $\overrightarrow{s^*}$ and $\overrightarrow{t^*}$ are left-pointing, so is $\overrightarrow{s^*} + \overrightarrow{t^*}$, and the length
of $\overrightarrow{s^*} + \overrightarrow{t^*}$ is just the length of the concatenation of the segments of $\overrightarrow{s^*}$ and $\overrightarrow{t^*}$, i.e.,
equal to s + t. Hence s∗ ⊕ t∗ = (s + t)∗ = s∗ + t∗ as well.
We have therefore proved that x ⊕ y = x + y for all rational numbers x and
y. Thus the addition of rational numbers can be defined purely algebraically as in
Section 2.2 or by the use of vectors in (2.17).
Since the addition of rational numbers according to the method of Section 2.2
is associative (see (A1)), it follows that the addition of rational numbers according
to (2.17) is also associative. The proof of this conclusion using vectors is extremely
tedious and therefore will not be given, but see Exercise 4 in the following exercises
to get an idea.

Exercises 2.3.
(1) Referring to the discussion after Lemma 2.5 on page 102, find the sums
in case (ii) and case (iv) for $\vec{x} + \vec{y}$, where x and y are rational numbers.
(2) For each of the following sums of rational numbers, explain as if to a sixth
grader using the definition of addition in (2.17) whether it is positive or
negative:
$31 + 29^*$, $(68\frac{1}{2})^* + 68\frac{2}{5}$, $(1\frac{7}{8})^* + 2\frac{1}{10}$, $\frac{7}{16} + (2\frac{1}{4})^*$, $(1\frac{3}{10})^* + \frac{9}{7}$.
(3) Compute each of the following in two different ways: by the method of
Section 2.2 but without making use of Theorem 2.3 on page 96 and then
by using the vector definition in (2.17): (a) $\frac{5}{4} + 3^*$ and (b) $(\frac{5}{2})^* + \frac{2}{3}$.
2.4. MULTIPLYING RATIONAL NUMBERS 105

(4) Give a direct proof of the associative law of addition for rational numbers
using the definition (2.17) in the following two special cases: (3 + 6∗) + 7 =
3 + (6∗ + 7) and (6 + 3.5∗) + 2 = 6 + (3.5∗ + 2). (You will have a better
understanding of why it is so tedious to prove associativity using vectors
after doing this exercise.)

2.4. Multiplying rational numbers


This section gives an abstract definition of multiplication between rational num-
bers that is similar to the definition of addition given in Section 2.2. This leads to
a simple determination of the various products among fractions and negative frac-
tions, thereby proving, in particular, the famous rule that "negative times negative
is positive". In the last subsection, we explain how this rule is usually proved in
abstract algebra.
Fundamental assumptions on multiplication (p. 105)
Basic formulas of products (p. 106)
A strictly mathematical approach (p. 109)

Fundamental assumptions on multiplication

Before we discuss the multiplication of rational numbers, let it be mentioned at
the outset that there is no known down-to-earth method to make this topic easier
to learn. For example, the number line does not seem to play a significant role,
and there seems to be no pictorial presentation that can convince students that "a
negative times a negative is a positive". Multiplication among rational numbers is
inherently abstract, and if there is anything that can be said to be a key idea in its
mathematical discussion, it would be the distributive law.
As we mentioned in Section 2.2 on page 91, we are going to take the same
approach to multiplication as we did to addition in that section. Therefore, we
make the following two fundamental assumptions on multiplication:
(M1) Given any two rational numbers x and y, there is a way
to multiply them to get another rational number xy so that if x
and y are fractions, xy is the usual product of fractions. Fur-
thermore, this multiplication of rational numbers satisfies the as-
sociative, commutative, and distributive laws.
(M2) If x is any rational number, then 1 · x = x.
We note that (M2) must be an assumption because, although we know that 1 · t = t
for every fraction t, we do not know as yet what 1 × 5∗ is until (M2) assures us that
it is in fact 5∗ .
It turns out that the other seemingly "obvious" fact that
(2.19) 0 · x = 0 for any x ∈ Q
need not be assumed because it can be proved on the basis of (M1) and (M2); see
Exercise 1 on page 110.
Our first task is to find out how multiplication is related to the existing opera-
tions, in particular, addition and the mirror reflection ∗. As always, the relationship
between addition and multiplication is codified by the distributive law, which we
must point out is part of the assumption in (M1). As to the operation ∗, we can ask
as before whether the order of applying multiplication and ∗ is interchangeable. In
other words, given two rational numbers x and y, if we get their mirror reflections
first and then multiply (thus x∗y∗), how is it related to the number obtained by
multiplying them first and then getting its mirror reflection (thus (xy)∗)? When
multiplication is replaced by addition, we saw in Theorem 2.2 on page 94 that the
numbers x∗ + y∗ and (x + y)∗ are equal. In the case of multiplication, however,
these two numbers are in general not equal, i.e., x∗y∗ ≠ (xy)∗, or in the notation
of the minus sign, what we are saying is that (−x)(−y) ≠ −(xy). As is well known,
the correct answer is
(2.20) (−x)(−y) = xy for all rational numbers x and y.

This surprising fact, the bane of many school students, can be given a very
short proof. We will present this proof at the end of the section (see page 109),
but not here, because for beginning algebra students, such a sophisticated proof
would be far from appropriate or enlightening. Instead, we will take a leisurely
tour through the basic multiplication facts of fractions and negative fractions and
wind up with equation (2.20) as our final destination.

Basic formulas of products

Our first order of business is to find out explicitly how to multiply rational
numbers. (Before proceeding further, you may wish to review the discussion about
the nature of this kind of computation in the proof of Theorem 2.3 on page 96.)
Thus let x, y ∈ Q. What is xy? If x = 0 or y = 0, then xy = 0 according to
equation (2.19). We may therefore assume that both x and y are nonzero, so that
each is either a fraction s or the negative of a fraction, −s. Therefore, letting s and
t be nonzero fractions, we consider the following four cases separately:
st, (−s)t, s(−t), and (−s)(−t).
Since Section 1.4 already dealt with the case st, it suffices to deal with the remaining
three cases. We will prove that, for all positive fractions s and t,
(−s)t = −(st) (e.g., (−5)7 = −35),

s(−t) = −(st) (e.g., 5(−7) = −35),

(−s)(−t) = st (e.g., (−5)(−7) = 35).


Let us now prove these assertions. Since multiplication of rational numbers is
assumed to be commutative and s and t are arbitrary, knowing the first implies
knowing the second (see the footnote on page 94). Let us prove the first; i.e., (−s)t = −(st).
First consider a special case: s = 5 and t = 7. Then we want to show that
(−5) × 7 = −35. According to Lemma 2.1* on page 97, all we have to do is to prove that
(−5) × 7 + 35 = 0. This is a straightforward computation using the distributive
law:

(−5) × 7 + 35 = (−5) × 7 + (5 × 7) = ((−5) + 5) × 7 = 0 × 7 = 0,
where the last equality makes use of equation (2.19) on page 105. The general case
is no different: to prove (−s)t = −(st), it suffices to prove
(2.21) (−s)t + st = 0.
We simply carry out the same computation:
(−s)t + st = ((−s) + s)t = 0 · t = 0,
where we make use of equation (2.19) again in the last step. So the proof for
(−s)t = −(st) is complete.

It remains to deal with the third equality above; i.e., for all nonzero fractions
s and t,
(−s)(−t) = st.
We can again invoke Lemma 2.1* on page 97 if we think of st as the negative of −(st);
in other words, st = −(−(st)). Then this lemma says (−s)(−t) would be equal
to −(−(st)) if we could prove (−s)(−t) + (−(st)) = 0. This we can do because
we have just finished proving that −(st) = (−s)t, and another application of the
distributive law now gives
(−s)(−t) + (−(st)) = (−s)(−t) + (−s)t
= (−s)((−t) + t) (distributive law)
= (−s) · 0 (by (A3*) on page 97)
= 0 (by (2.19) on page 105)
and the proof of (−s)(−t) = st is also complete.

We summarize our findings in the following theorem.

Theorem 2.6. For all fractions s and t,


st = the ordinary product of the fractions s and t,
(−s)t = −(st),
(−s)(−t) = st.

Theorem 2.6 gives us a basic idea about the multiplication of rational numbers,
but it needs to be complemented by a broader view of the situation. We will attend
to that presently, but let us first take note of some of its immediate consequences.
For example, the following simple rules are implied by Theorem 2.6:
positive × positive is positive,
positive × negative is negative,
negative × negative is positive.
In particular, we know that
$x^2 \geq 0$ for any x ∈ Q
regardless of whether x is 0 or positive or negative. Anticipating FASM (page 133;
more precisely, using (D) and (E) on p. 123 and p. 124), we have
(2.22) $x^2 \geq 0$ for any number x.
The equality (−s)(−t) = st for all fractions s, t is among the Most Frequently
Asked Questions in school mathematics. Not all students will understand the pre-
ceding proof, but very likely they will all want a reasonable explanation of this fact.
We now address this classroom issue by presenting a much simpler proof of some-
thing less, namely (−1)(−1) = 1. There are two reasons why we take the trouble
to give an independent proof of a result already implied by Theorem 2.6. One is
that this proposed proof may be more widely accessible to students; in mathematics
education, every bit of reasoning helps. The other reason is the surprising fact that
this simple result actually leads to the proof of a generalization of Theorem 2.6.
We begin with a basic, but believable, observation and will prove it by using
the hands-on vectorial approach to the addition of rational numbers.

Theorem 2.7. For any rational number x, the number (−1)x is the mirror
reflection of x. In symbols, (−1)x = −x.
Remark. This theorem may be better understood in the language of ∗ on
page 90: it says (−1)x = x∗ . In this form, one may get a better idea of what the
theorem tries to say: the left side (−1)x is about multiplication (the product of
−1 and x), the right side x∗ is about the operation of mirror reflection, and yet
the two sides are equal. If one realizes that multiplication and mirror reflection are
seemingly two independent operations (notice that (M1) and (M2) on page 105 do
not mention ∗ explicitly), then one is less likely to take Theorem 2.7 for granted.

Proof. The number −x is the point on the opposite side of 0 from x so that x and
−x are equidistant from 0. Therefore this is the picture we want to be true when
x is positive:
[Figure: a number line showing (−1)x, 0, x in order from left to right]
and this is the picture we want to be true when x is negative:
[Figure: a number line showing x, 0, (−1)x in order from left to right]
Now think of the sum x + (−1)x in terms of vectors (see p. 101). If
we can show that
x + (−1)x = 0,
then the vectors $\vec{x}$ and $\overrightarrow{(-1)x}$ must have opposite directions and equal length (see
the description of the sum of two vectors on page 101). Consequently, (−1)x will
have to be equal to −x, as desired. Let us therefore prove that x + (−1)x is equal
to 0. We use (M2) and then the distributive law:
x + (−1)x = 1 · x + (−1)x = (1 + (−1))x.
But 1 + (−1) = 0 because $\vec{1}$ and $\overrightarrow{(-1)}$ have the same length and opposite
directions (recall: we are only using facts from the vectorial approach to the addition
of rational numbers). Therefore
x + (−1)x = (1 + (−1))x = 0 · x = 0,
where the last step is by (2.19) on page 105. The proof of Theorem 2.7 is complete.
If we let x = (−1), then Theorem 2.7 yields the conclusion we sought.

Corollary. (−1)(−1) = 1.

We now draw on our experience of having proved Theorem 2.6 and Theorem
2.7 to prove a general theorem: instead of multiplying fractions with other fractions
or negative fractions, we show directly how to multiply arbitrary rational numbers.

Theorem 2.8. For all rational numbers x and y,


(2.23) (−x)y = x(−y) = −(xy),
(2.24) (−x)(−y) = xy.

Proof. We first prove (2.23); i.e., (−x)y = x(−y) = −(xy). If we read the equality
of Theorem 2.7 backwards, we get −x = (−1)x. Therefore (−x)y = ((−1)x)y =
(−1)(xy), by the associative law. Now we apply Theorem 2.7 again, but this time
to the rational number xy. Then we get (−1)(xy) = −(xy). Hence
(−x)y = (−1)(xy) = −(xy).
We can prove x(−y) = −(xy) in a similar manner or we can apply the commutative
law twice to what we have just proved to get x(−y) = (−y)x = −(yx) = −(xy).
Next, we prove (2.24); i.e., (−x)(−y) = xy. Theorem 2.7 gives (−x)(−y) =
((−1)x)((−1)y). Now Theorem 2 in the appendix of Chapter 1 (page 88) implies
that we can multiply four given numbers in any order and the result will be the
same. Thus,
((−1)x)((−1)y) = ((−1)(−1))(xy).
So the corollary to Theorem 2.7 says ((−1)(−1))(xy) = 1 · (xy) = xy. Theorem 2.8
is proved.
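Theorem 2.8 is derived from (M1) and (M2) alone, so no computation can replace the proof. Still, a quick numerical sanity check can make the statements concrete. The sketch below is an illustration under an assumption: it uses Python's `fractions.Fraction`, whose built-in arithmetic already obeys these rules, as a stand-in for Q. It therefore illustrates Theorems 2.7 and 2.8 on a grid of values rather than proving anything. The helper name `satisfies_theorem_2_8` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# A small grid of rational numbers: negative, zero, and positive.
grid = [Fraction(n, d) for n in range(-5, 6) for d in (1, 2, 3, 5)]


def satisfies_theorem_2_8(x, y):
    """Check Theorem 2.7 for x, and (2.23) and (2.24) for the pair x, y."""
    return ((-1) * x == -x                        # Theorem 2.7
            and (-x) * y == x * (-y) == -(x * y)  # (2.23)
            and (-x) * (-y) == x * y)             # (2.24)


assert all(satisfies_theorem_2_8(x, y) for x, y in product(grid, repeat=2))
```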
If we reflect on Theorems 2.6–2.8 a little, we will come to the realization that
the key ingredient in all these proofs is the distributive law. This law was explic-
itly mentioned in the proofs of Theorems 2.6 and 2.7 and is critical to the proof
of Theorem 2.8 as well because Theorem 2.7 lies behind Theorem 2.8. There is a
fundamental reason why this has to be the case, namely, what connects addition
to multiplication is the distributive law, and since we need to import information
about addition (such as (A3*) on page 97 and Lemma 2.1* on page 97) to multi-
plication, the distributive law is our only recourse.

A strictly mathematical approach

It remains to bring closure to this discussion of multiplication by delivering
on a promise made at the beginning of the section, to the effect that there is a
short and self-contained proof of Theorem 2.8 that depends only on Lemma 2.1*
on page 97.

Second Proof of Theorem 2.8. We first prove (−x)y = −(xy), where x, y ∈ Q.
By Lemma 2.1*, it suffices to prove that (−x)y + xy = 0. This is so because by the
distributive law,
(−x)y + xy = ((−x) + x)y = 0 · y = 0.
Next we prove (−x)(−y) = xy. We will show (−x)(−y) + (−(xy)) = 0 which, by
Lemma 2.1*, implies that (−x)(−y) is equal to −(−(xy)), which is xy. To this end,
we have
(−x)(−y) + (−(xy)) = (−x)(−y) + ((−x)y) (because (−x)y = −(xy))
= (−x)((−y) + y) (distributive law)
= (−x) · 0 = 0.
The proof of Theorem 2.8 is complete.

We conclude this section with three remarks. First, Theorem 2.8 yields an
explicit algorithm for the multiplication of rational numbers: if $\frac{m}{n}$ and $\frac{k}{\ell}$ are fractions,
then
$\frac{m}{n} \times \left(-\frac{k}{\ell}\right) = -\frac{mk}{n\ell}$,
$\left(-\frac{m}{n}\right) \times \left(-\frac{k}{\ell}\right) = \frac{mk}{n\ell}$.
In the next section, we will see that these formulas remain valid even when m, n,
k, ℓ are rational numbers (rather than just whole numbers).
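Read as an algorithm, the two formulas say: multiply the underlying fractions in the usual way, and attach a minus sign exactly when one factor, but not both, is negative. Here is a minimal sketch of that recipe. It assumes a hypothetical representation of a rational number as a triple (sign, m, n) meaning sign · m/n, with whole numbers m, n (n > 0). The names `signed_product` and `value` are invented for illustration, and `fractions.Fraction` is used only to confirm the result.

```python
from fractions import Fraction


def signed_product(a, b):
    """Multiply a = (sign, m, n) and b = (sign, k, ell).

    Each triple stands for sign * m/n with whole numbers m, n (n > 0) and
    sign in {+1, -1}.  The fraction parts multiply as usual (mk over n*ell);
    following Theorem 2.8, the sign factor is -1 exactly when the signs differ.
    """
    (s1, m, n), (s2, k, ell) = a, b
    return (s1 * s2, m * k, n * ell)


def value(t):
    """The rational number represented by a (sign, numerator, denominator) triple."""
    sign, num, den = t
    return sign * Fraction(num, den)


cases = [((1, 2, 3), (-1, 5, 7)),    # (2/3) x (-5/7)
         ((-1, 2, 3), (-1, 5, 7)),   # (-2/3) x (-5/7)
         ((-1, 4, 9), (1, 3, 2))]    # (-4/9) x (3/2)
for a, b in cases:
    result = signed_product(a, b)
    assert value(result) == value(a) * value(b)
    print(value(a), "x", value(b), "=", value(result))
```

Keeping the sign separate from the whole numbers m, n, k, ℓ mirrors the text: the fraction arithmetic of Chapter 1 is reused unchanged, and Theorem 2.8 only contributes the sign.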
Second, Theorem 2.7 gives us another way to think about how to remove paren-
theses, to the effect that −(x + y) = −x − y for all x, y ∈ Q (see Theorem 2.2* on
page 97). This is because
−(x + y) = (−1)(x + y) (Theorem 2.7)
= (−1)x + (−1)y (distributive law)
= −x + (−y) (Theorem 2.7 again)
= −x − y (by (2.9) on p. 97).
Third, we use Theorem 2.8 to prove the following form of the distributive law,
which is commonly taken for granted:
(2.25) x(y − z) = xy − xz for all x, y, z ∈ Q.
Indeed, by using the ordinary distributive law, we have x(y − z) = x(y + z ∗ ) =
xy + xz ∗ = xy + x(−z). But xy + x(−z) = xy + (−xz) by equation (2.23), so
x(y − z) = xy + (−xz). By (2.9) on p. 97, xy + (−xz) = xy − xz, and we have the
desired conclusion.
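The second and third remarks can also be spot-checked numerically. The sketch below again uses Python's `fractions.Fraction` as an assumed stand-in for Q. It checks each intermediate expression in the two chains of equalities above on a grid of sample values. This is a sanity check only, since `Fraction` arithmetic already satisfies these laws.

```python
from fractions import Fraction
from itertools import product

values = [Fraction(n, d) for n in range(-4, 5) for d in (1, 3)]
for x, y, z in product(values, repeat=3):
    # Second remark: -(x + y) = (-1)x + (-1)y = -x - y
    assert -(x + y) == (-1) * x + (-1) * y == -x - y
    # Third remark, (2.25): x(y - z) = xy + x(-z) = xy - xz
    assert x * (y - z) == x * y + x * (-z) == x * y - x * z
```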

Exercises 2.4.
(1) Use (M1) and (M2) to give a direct, simple explanation of 0 · x = 0 for
any x ∈ Q.
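(2) Compute and justify each step:
(a) $(-4)\left(-1\frac{1}{2} + \frac{1}{4}\right)$,
(b) $165 - 560\left(\frac{3}{4} - \frac{8}{7}\right)$,
(c) $\left(-\frac{3}{2}\right)\left(0.64 - \frac{4}{3}\right)$ (write your answer as a decimal),
(d) $\left(20\frac{2}{9} \times \left(-\frac{5}{17}\right)\right) + \left(3\frac{2}{9} \times \frac{5}{17}\right)$.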
(3) (a) Find a simple proof of (−1)n = −n for a whole number n without making
use of Theorem 2.7. (Hint: $(-1)n = \underbrace{(-1) + (-1) + \cdots + (-1)}_{n}$.)
(b) Give a direct proof of (−1)(−1) = 1 without making use of Theorem 2.7.
(c) Explain directly as if to a sixth grader why (−3)(−2) = 6 by using
only (−1)(−1) = 1, but not Theorem 2.7.
(d) Do you see how to prove (−m)(−n) = mn for all whole numbers m and
n using only the fact that (−1)(−1) = 1? (This is something you should
keep in mind when you teach: the special case of Theorem 2.6 when s and
t are whole numbers is easier to explain than the general case.)
(4) The following is a standard argument in textbooks to show, for example,
that (−2)(−3) = 6:
Consider the sequence of products
... 4 × (−3) = −12, 3 × (−3) = −9, 2 × (−3) = −6,

1 × (−3) = −3, 0 × (−3) = 0, (−1)(−3) = a, (−2)(−3) = b,

(−3)(−3) = c, (−4)(−3) = d, ... .

Observe the pattern that, for m × (−3) as m decreases to 0, each
product increases by 3. To continue this pattern beyond 0, one
should assign 3 to a, 6 to b, 9 to c, 12 to d, and so on, because
(−1)(−3) = 0 + 3 = 3, (−2)(−3) = 3 + 3 = 6, (−3)(−3) =
6 + 3 = 9, (−4)(−3) = 9 + 3 = 12.
Is this a valid argument? What are the implicit assumptions used? Write
a critique. (Hint: If you write down precisely what this so-called pattern
says, it would be the statement that (n − 1)(−3) = n(−3) + 3 for any
positive integer n.)
(5) Prove equation (2.24) on page 109 by using only equation (2.23) but not
Theorem 2.7 (page 108) or its corollary.
(6) (a) Use mathematical induction but none of Theorems 2.6–2.8 to prove
that, for all whole numbers m and n, (−m)n = −(mn). (b) Use math-
ematical induction and the fact that (−1)(−1) = 1 to prove that for all
whole numbers m and n, (−m)(−n) = mn.
(As mentioned in Exercise 7 on page 99, we will officially take
up mathematical induction in Section 1.7 of [Wu2020b]. This
exercise is not to suggest that you use mathematical induction
to convince seventh graders that (−m)(−n) = mn for all whole
numbers m and n. Rather, this exercise points to a way (more
clearly than Exercise 3 above) that you can phrase your ex-
planation of (−m)(−n) = mn (m, n ∈ N) for seventh graders
to make it more persuasive, but without using the technicality
of mathematical induction. If you could convince all seventh
graders of even this much, it would already be a minor triumph
of mathematics teaching.)
(7) Use Theorem 2.8 to prove the other two rules of "removing parentheses":
−(x − y) = −x + y and −(−x + y) = x − y
for all rational numbers x and y. (Many students learn to remove parentheses
by rote, using this method. So the purpose of this exercise is to
bring awareness to the fact that there are two ways to remove parentheses,
the other one making use of Theorem 2.2* on page 97, and both are based on
genuine mathematics.)
(8) Consider each of the following two statements about any rational number
x:
(a) 3x < x.
(b) $\frac{1}{10}x > x$.
If it is always true or always false, prove the statement. If it is sometimes
true and sometimes false, give examples to explain why.
(9) (a) I have a rational number x so that $5 - (2x - 1) = \left(1 - \frac{8}{3}x\right)$. What is
this x? (b) Same question for $(2 - 3x) - (x + 1) = \frac{5}{3}x + \frac{1}{2}$.
(10) (For this exercise, let us extend the definition on page 24 of Chapter 1 by
defining, for any rational number x and any fraction $\frac{m}{n}$, the meaning of "$\frac{m}{n}$
of x" to be $\frac{m}{n} \cdot x$.) (a) A number x has the property that $\frac{3}{4}^*$ of x exceeds
x itself by 49. What is this x? (b) A number t ≠ 0 has the property that
twice t exceeds $t^2$ by $\frac{4}{7}$ of t. Find t.

2.5. Dividing rational numbers


The concept of the division of rational numbers is the same as that of the di-
vision of whole numbers or the division of fractions. You may wish to review the
introductory passage in Section 1.5 on page 54 at this point. Using the precise
definition, we prove that every rational number is equal to the quotient of two inte-
gers, which immediately leads to the usual invert-and-multiply rule. We then briefly
discuss the rational-number analog of complex fractions, which we call rational quo-
tients.
The definition of division (p. 112)
Two basic facts (p. 115)
Rational quotients (p. 117)

The definition of division

As before, we begin such a discussion with the proof of a theorem that is the
counterpart of Theorem 1.8 on page 56.

Theorem 2.9. Given x, y ∈ Q, with y ≠ 0, there is a unique (i.e., one and
only one) z ∈ Q such that x = zy.

For example, making use of Theorem 2.8 on page 109 in addition to Theorem
1.8, we have that if $x = -\frac{1}{3}$ and $y = \frac{2}{5}$, then $z = -\left(\frac{1}{3} \times \frac{5}{2}\right)$. Similarly, if $x = \frac{7}{5}$ and
$y = -\frac{2}{3}$, then $z = -\left(\frac{7}{5} \times \frac{3}{2}\right)$, or if $x = -\frac{7}{5}$ and $y = -\frac{2}{3}$, then $z = \frac{7}{5} \times \frac{3}{2}$. Note that,
except for the negative sign, the z in all cases is obtained by invert-and-multiply.
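The three examples can be reproduced mechanically: invert-and-multiply on the fraction parts, then fix the sign by Theorem 2.8. The sketch below is a hypothetical helper, `quotient`, written for illustration. It uses Python's `fractions.Fraction` for the fraction arithmetic and checks, for each example, that the resulting z really satisfies x = zy.

```python
from fractions import Fraction as F


def quotient(x, y):
    """The unique z with x = z*y (Theorem 2.9), computed as in the examples:
    invert-and-multiply on the fraction parts, then attach the sign."""
    if y == 0:
        raise ZeroDivisionError("Theorem 2.9 requires y != 0")
    magnitude = abs(x) * (1 / abs(y))      # invert-and-multiply
    same_sign = (x >= 0) == (y > 0)        # both nonnegative/positive, or both negative
    return magnitude if same_sign else -magnitude


examples = [(F(-1, 3), F(2, 5)), (F(7, 5), F(-2, 3)), (F(-7, 5), F(-2, 3))]
for x, y in examples:
    z = quotient(x, y)
    assert z * y == x      # z really is the number with x = zy
    print(f"x = {x}, y = {y}, z = {z}")
```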

Proof. We will first prove the existence of such a z and worry about its uniqueness
later. If x = 0, we can just take z to be 0. Thus we may assume that x ≠ 0.
So with a nonzero x given, suppose y > 0. If also x > 0, then both x and
y are fractions and the existence of z is already known (see Theorem 1.8 on page
56). If however x < 0, then (−x) is a fraction and again there exists a fraction z′
such that (−x) = z′y. Thus x = −(−x) = −(z′y) = (−z′)y, by equation (2.23) in
Theorem 2.8 on page 109. So letting z = −z′, we have proved that x = zy when
x < 0. Together, we have proved that for any x ∈ Q, there is a z ∈ Q so that
x = zy in case y > 0.
Still with x ≠ 0 given, suppose y < 0. We have to prove that there is a z ∈ Q
so that x = zy. Since (−y) > 0, we know from the last paragraph that there
is a z′ ∈ Q so that x = z′(−y). Again, by equation (2.23), this is the same as
x = (−z′)y. Setting z = −z′, we have once more proved that there is a z ∈ Q so
that x = zy in case y < 0.
We have now proved that such a z exists in all cases.
Now we prove uniqueness, i.e., "one and only one z" if y ≠ 0. The proof is
a standard piece of mathematical reasoning which may take some getting used to,
but you cannot avoid encountering it in higher mathematics. We first prove that
if there are two numbers Y1 and Y2 so that yY1 = 1 and yY2 = 1, then necessarily
Y1 = Y2 . We begin by multiplying both sides of yY1 = 1 by Y2 to get

(2.26) Y2 (yY1 ) = Y2 .

By Theorem 2 in the appendix of Chapter 1 (page 88), the left side of (2.26) is
equal to (yY2 )Y1 . But yY2 = 1, so the left side of (2.26) is equal to Y1 , and we
have Y1 = Y2 , by (2.26). Therefore Y1 and Y2 coincide. Hence there is only one
number Y which satisfies yY = 1. This Y is called the multiplicative inverse of
the nonzero y.
Now we look at the general case. With arbitrary x, y given, suppose x = zy
for some number z. We will show that z = xY and is therefore unique (Y is the
multiplicative inverse of y). To this end, multiply both sides of x = zy by Y to get
xY = (zy)Y . But the right side is equal to

z(yY ) = z · 1 = z.

Thus z is equal to xY , and the proof is complete.


The standard notation for the multiplicative inverse Y of y is $y^{-1}$. Thus,
$yy^{-1} = y^{-1}y = 1$ by definition. It follows from the proof of Theorem 2.9 that, for
y ≠ 0,

(2.27) $x = zy \iff z = xy^{-1}$.

Theorem 2.9 has two useful corollaries.

Corollary 1. If x, y ∈ Q and xy = 0, then x = 0 or y = 0.

Proof. Indeed, suppose y ≠ 0; we must prove x = 0. Now we always have
0 = 0 · y. Compare with 0 = xy. The uniqueness part of Theorem 2.9 implies that
0 must be equal to x, as desired.
A more down-to-earth argument would proceed differently, as follows: since
xy = 0, we have $(xy)y^{-1} = 0 \cdot y^{-1}$. The left side is $x(yy^{-1}) = x$ while the right side
is 0. Therefore x = 0. Corollary 1 is proved.

Pedagogical Comments. Corollary 1 is important for the solution of equations
in algebra (for example, the solution of quadratic equations in Section 2.1 of
[Wu2020b]). In TSM, this corollary (for real numbers x and y) is known as the
zero product property or zero product rule, but when it is stated without
any proof in a course on algebra, it often leads to misapplications. For example,
students begin to believe that if (x − 5)(x + 1) = 7, then x − 5 = 7 or x + 1 = 7 (see
p. 223 of [MUST]). This is to be expected because, when students are not shown
any reasoning for such an assertion, they do not see the importance of having a 0
on the right side of the equality xy = 0 and are therefore inclined to extrapolate the
corollary to xy = 7 or xy = b for any number b. Another lesson to learn from this
student misconception is that we cannot afford to wait until algebra to teach this
corollary, because waiting fosters the misunderstanding that this is an abstract algebraic
fact. Indeed, the corollary is sometimes proved only by appealing to the concept
of an integral domain in abstract algebra (see p. 250 of [MUST]), whereas it is
nothing more than a simple consequence of the existence of a multiplicative inverse
for every nonzero number. No additional abstractions are needed. Of course, this
kind of misunderstanding is also caused by the misinformation about "variables"
in TSM, and we will deal with this issue in Section 6.1 (pp. 298ff.) and Section
6.2 (pp. 322ff.). We therefore strongly suggest that you single out Corollary 1 and
its proof for emphasis when you teach rational numbers—and not wait for a course
in algebra—so that students are already mentally prepared before they take up
algebra. The more general form of the corollary where x and y are real numbers
will be found in Exercise 6 after Section 2.1 in [Wu2020c]. End of Pedagogical
Comments.

Corollary 2. For any nonzero y ∈ Q, $(-y)^{-1} = -(y^{-1})$.

In words, what the corollary says is that the multiplicative inverse of −y is
equal to the negative of the multiplicative inverse of y.

Proof. This can be verified separately for positive and negative y's (see Exercise
3 on page 120), but it is also valuable to learn an abstract proof. Indeed, from
$1 = y^{-1}y$, we get $1 = (-(y^{-1}))(-y)$ (by equation (2.24) on page 109). Comparing
the latter with $1 = ((-y)^{-1})(-y)$ and using the uniqueness of the multiplicative
inverse of −y, we get $(-y)^{-1} = -(y^{-1})$, as claimed.

We normally omit the parentheses around $y^{-1}$ in $-(y^{-1})$ and simply write
$-y^{-1}$, and we can do this because Corollary 2 guarantees that there is no possibility
of confusion.
Thus $\left(-\frac{2}{7}\right)^{-1} = -\frac{7}{2}$, and $\left(-2\frac{1}{7}\right)^{-1} = -\frac{7}{15}$.
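Corollary 2 and the two numerical examples can be spot-checked as follows. This is a minimal sketch: `inverse` is a hypothetical helper returning the Y with yY = 1, and Python's `fractions.Fraction` stands in for Q. It verifies the defining property of $y^{-1}$ and the identity $(-y)^{-1} = -y^{-1}$ on a sample of nonzero rationals.

```python
from fractions import Fraction as F


def inverse(y):
    """Multiplicative inverse of a nonzero rational y, i.e., the Y with y*Y = 1."""
    if y == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return 1 / y


nonzero = [F(n, d) for n in range(-6, 7) if n != 0 for d in (1, 2, 7)]
for y in nonzero:
    assert y * inverse(y) == 1              # defining property of y^(-1)
    assert inverse(-y) == -inverse(y)       # Corollary 2

# The two examples in the text.
assert inverse(F(-2, 7)) == F(-7, 2)
assert inverse(-(2 + F(1, 7))) == F(-7, 15)
```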
What does Theorem 2.9 really say? It says that if we have a nonzero rational
number y, then any rational number x can be uniquely expressed as a rational
multiple of y, in the sense that x = zy for a unique rational number z; in fact,
$z = xy^{-1}$. This is the exact analog of Theorem 1.8 on page 56. Seeing how Theorem
1.8 led to the definition of fraction division on page 57, we are naturally led to
the following definition of the division of rational numbers: with y fixed and y ≠ 0,
every x ∈ Q determines a unique rational number z so that x = zy. (Again, the
uniqueness of z follows from Theorem 2.9.) This number z is, by definition, the
division of x by y. More formally:
Definition. Given x, y ∈ Q, with y ≠ 0, the division of x by y, denoted by
$\frac{x}{y}$, is the unique number z so that x = zy.

We emphasize that this $\frac{x}{y}$ here is a new notation, though it is one that extends
the old notation. In greater detail, the symbol $\frac{x}{y}$ makes sense thus far only when
x and y are fractions (page 57), but the x and y in the preceding definition are
possibly negative numbers and, for these, $\frac{x}{y}$ does not yet have a meaning. On the
other hand, if x and y are fractions, this meaning of $\frac{x}{y}$ of course coincides with the
old one on page 57.
The division of x by y is also called the quotient of x divided by y. It
follows from the definition that
(2.28) $x = \left(\frac{x}{y}\right) y$.
We note that equation (2.28) has the virtue of suggesting the "cancellation" of the
y's to get x.
By equation (2.27) on page 113, the quotient z of x divided by y satisfies
$z = xy^{-1}$. Therefore we have the following equality since both sides are equal to
the quotient z of x divided by y:
(2.29) $\frac{x}{y} = xy^{-1}$.
Thus equation (2.29) says explicitly that dividing by y is the same as multiplying by
the (multiplicative) inverse of y. One sees that this is a continuation of the theme,
started on page 47, to the effect that division is nothing more than a different way
of writing multiplication.
We note that the definition of the quotient $\frac{x}{y}$ of x divided by y as the number
z so that x = zy can be symbolically represented as
(2.30) $\frac{x}{y} = z \iff x = zy$.
Assertion (2.30) is unfortunately the source that gives rise to the slippery phrase in
TSM, to the effect that "division and multiplication are inverse operations". Make
sure you understand that this phrase sounds good but actually makes no sense.
Since we have already explained this point at length in the Mathematical Aside on
page 55, we will not belabor the point any further.

Two basic facts

We can now clear up a standard confusion in the study of rational numbers.
One finds, for instance, the equalities
(2.31) $\frac{3}{-7} = \frac{-3}{7} = -\frac{3}{7}$,
but explanations are usually not given in TSM. We now supply the explanation.
Consider the equality
$\frac{3}{-7} = -\frac{3}{7}$
in (2.31). By definition, $\frac{3}{-7}$ is the quotient of 3 divided by −7, and if we can show
that $-\frac{3}{7}$ is also the quotient of 3 divided by −7, then by the uniqueness part of
Theorem 2.9, $\frac{3}{-7}$ and $-\frac{3}{7}$ must be equal. To show that $-\frac{3}{7}$ is also the quotient of
3 divided by −7, it suffices to show (by definition of the quotient) that
$3 = \left(-\frac{3}{7}\right) \times (-7)$.
But this is so because, by equation (2.24) on page 109, the right side is equal to
$\frac{3}{7} \times 7$, which is indeed equal to 3, by virtue of the cancellation rule (page 47). In a
similar manner, one proves
$\frac{-3}{7} = -\frac{3}{7}$.
Therefore (2.31) is correct.
There is another way to look at (2.31) that may be illuminating. By (2.27) on
page 113, (2.31) can be restated as
$3 \times (-7)^{-1} = (-3) \times 7^{-1} = -(3 \times 7^{-1})$.
The second equality, that $(-3) \times 7^{-1} = -(3 \times 7^{-1})$, follows immediately from
equation (2.23) on page 109. For the first equality, we appeal to Corollary 2 on
page 114:
$3 \times (-7)^{-1} = 3 \times (-7^{-1})$ (Corollary 2)
$= (-3) \times 7^{-1}$ (equation (2.23)).
So once again, we see that (2.31) is correct.
In like manner, we can prove
$\frac{-3}{-7} = \frac{3}{7}$.
Thus we must show that $\frac{3}{7}$ is the quotient of −3 divided by −7; i.e., we must show
(by definition of the quotient) that
$-3 = \frac{3}{7} \times (-7)$.
As above, this follows from equation (2.23) on page 109 and the cancellation rule
(page 47).
More generally, the same reasoning leads to the following theorem.

Theorem 2.10. For any two rational numbers x and y, with y ≠ 0,
$-\frac{x}{y} = \frac{-x}{y} = \frac{x}{-y}$ and $\frac{-x}{-y} = \frac{x}{y}$.

This theorem will be seen to be a special case of a basic fact about so-called
rational quotients, to be introduced presently. But in terms of everyday computa-
tions in school mathematics, Theorem 2.10 is well-nigh indispensable and deserves
to be singled out. We can put Theorem 2.10 in a broader perspective. When x
and y are whole numbers, this theorem implies that every negative rational number
can be written as a division of two integers where the denominator is positive. For
example, $-\frac{3}{7} = \frac{-3}{7}$, as we have seen in (2.31). Since every positive rational number
(i.e., nonzero fraction) is already known to be a division of two positive integers
(see (iii) on page 58), we have proved the following.
Theorem 2.11. Every rational number can be written as a division of two
integers. In addition, the integers can be chosen so that the denominator is positive.
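As it happens, Python's standard `fractions.Fraction` type adopts the normalization described in Theorem 2.11. It stores every rational number as a quotient of two integers with a positive denominator. It also reduces to lowest terms, a further design choice that the theorem does not require. The short sketch below uses it to illustrate (2.31), the second part of Theorem 2.10, and this sign convention.

```python
from fractions import Fraction

# The three expressions in (2.31) are the same rational number.
assert Fraction(3, -7) == Fraction(-3, 7) == -Fraction(3, 7)
# The second part of Theorem 2.10.
assert Fraction(-3, -7) == Fraction(3, 7)

# Whatever signs are supplied, the stored denominator is positive (Theorem 2.11).
for num, den in [(5, -2), (-9, 7), (9, -7), (-4, -6)]:
    q = Fraction(num, den)
    assert q.denominator > 0
    print(f"{num}/{den} is stored as {q.numerator}/{q.denominator}")
```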
Theorems 2.11 and 2.10 are conceptually important because they give an alter-
nate conception of a rational number. In advanced mathematics, rational numbers
are sometimes defined as quotients of integers.
Theorems 2.11 and 2.10 imply that, for instance, the rational number $-\frac{9}{7}$ is
equal to $\frac{-9}{7}$ or $\frac{9}{-7}$, and the former is the preferred choice. Before explaining why
this is so, we first rewrite the algorithm for multiplying rational numbers in its final
form: if a, b, c, d are integers with b, d ≠ 0, then
$\frac{a}{b} \times \frac{c}{d} = \frac{ac}{bd}$.
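The proof is nothing more than a routine case-by-case verification. For example,
$\frac{-3}{7} \times \frac{-14}{-5} = \left(-\frac{3}{7}\right) \times \frac{14}{5}$ (Theorem 2.10)
$= -\left(\frac{3}{7} \times \frac{14}{5}\right)$ (equation (2.23))
$= -\frac{3 \times 14}{7 \times 5}$.
By Theorem 2.10 again,
$-\frac{3 \times 14}{7 \times 5} = \frac{3 \times 14}{-(7 \times 5)}$.
Hence,
$\frac{-3}{7} \times \frac{-14}{-5} = \frac{3 \times 14}{-(7 \times 5)}$
$= \frac{(-3) \times (-14)}{7 \times (-5)}$ (equations (2.23), (2.24)).
The reasoning for the general case is the same.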
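Instead of a case-by-case verification, one can spot-check the rule $\frac{a}{b} \times \frac{c}{d} = \frac{ac}{bd}$ for every sign combination over a small range of integers. The sketch below does this with Python's `fractions.Fraction` as an assumed stand-in for the rational numbers, including the worked example just given. The helper name `product_rule_holds` is hypothetical. As before, this only illustrates the rule; it does not prove it.

```python
from fractions import Fraction
from itertools import product

ints = range(-6, 7)
nonzero = [n for n in ints if n != 0]


def product_rule_holds(a, b, c, d):
    """Check a/b x c/d == (ac)/(bd) for integers a, c and nonzero integers b, d."""
    return Fraction(a, b) * Fraction(c, d) == Fraction(a * c, b * d)


# Every sign combination in the range, then the worked example (-3/7) x (-14/-5).
assert all(product_rule_holds(a, b, c, d)
           for a, b, c, d in product(ints, nonzero, ints, nonzero))
assert product_rule_holds(-3, 7, -14, -5)
```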
As a special case, we have that for integers a, b,
$\frac{a}{b} = \frac{1}{b} \times a$.
We can now give an indication of why $\frac{-9}{7}$ is preferred, most of the time, over $\frac{9}{-7}$
as a representation of $-\frac{9}{7}$. This is because
$\frac{-9}{7} = \frac{1}{7} \times (-9)$,
whereas
$\frac{9}{-7} = \frac{1}{-7} \times 9$,
and it is much easier to think of one-seventh of −9 than "$\frac{1}{-7}$ of 9".

Rational quotients

Just as the division of fractions led to the concept of complex fractions, the
division of rational numbers leads to a similar concept which, for lack of a name,
will be referred to as rational quotients. Let x, y, z, w be rational numbers so
that they are nonzero where appropriate in the following. Then $\frac{x}{y}$ is an example
of a rational quotient; x will be called its numerator, and y its denominator.
We now list the rational-quotient analogs of the basic properties (a)–(d) of complex
fractions on p. 70.

(a) Generalized cancellation law: $\frac{x}{y} = \frac{zx}{zy}$ for any nonzero z ∈ Q.

(b) $\frac{x}{y} = \frac{z}{w}$ if and only if xw = yz.

(c) $\frac{x}{y} \pm \frac{z}{w} = \frac{xw \pm yz}{yw}$.

(d) $\frac{x}{y} \times \frac{z}{w} = \frac{xz}{yw}$.
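Properties (a)–(d) can be spot-checked numerically. The sketch below is only a sanity check, not a proof: it assumes Python's `fractions.Fraction` as a stand-in for Q, draws random nonzero rational numbers with a helper of our own (the name `random_nonzero_rational` is hypothetical), and tests each identity exactly, with no rounding.

```python
import random
from fractions import Fraction

def random_nonzero_rational(rng):
    """Return a random nonzero rational p/q with small integers p and q."""
    while True:
        p = rng.randint(-20, 20)
        if p != 0:
            return Fraction(p, rng.randint(1, 15))

rng = random.Random(0)  # fixed seed so that runs are reproducible
for _ in range(1000):
    x, y, z, w = (random_nonzero_rational(rng) for _ in range(4))
    assert x / y == (z * x) / (z * y)                    # (a)
    assert (x / y == z / w) == (x * w == y * z)          # (b)
    assert x / y + z / w == (x * w + y * z) / (y * w)    # (c), with +
    assert x / y - z / w == (x * w - y * z) / (y * w)    # (c), with -
    assert (x / y) * (z / w) == (x * z) / (y * w)        # (d)
print("(a)-(d) held in all 1000 trials")
```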

Remark. Comparing (a)–(d) with the corresponding assertions for complex fractions on page 70, we notice that the analog of the cross-multiplication inequality is missing from (b). This is because the presence of negative numbers complicates the comparison of rational numbers (see (E) on p. 124). This issue is dealt with more fully in Exercise 7 on p. 131. Note also that the invert-and-multiply rule for rational quotients is not listed among (a)–(d) because it is a consequence of (a)–(d); see equation (2.33) on page 119.
An immediate consequence of (a) and (d) is the following cancellation rule for rational quotients:

$$\frac{ux}{y}\times\frac{z}{uw} = \frac{xz}{yw}, \tag{2.32}$$

where we simply cancel the u that appears twice "in top and bottom". For example, (2.32) justifies the cancellation $\frac{-3}{17}\times\frac{5}{-3} = \frac{5}{17}$.
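A quick exact check of the example, and of (2.32) with a nonzero rational u in place of an integer, might look like this (a sketch using Python's `fractions.Fraction`; the particular values of u, x, y, z, w are arbitrary choices of ours).

```python
from fractions import Fraction as F

# The example: (-3)/17 times 5/(-3) should cancel to 5/17.
print(F(-3, 17) * F(5, -3))   # 5/17

# Rule (2.32) with rational entries: (ux)/y * z/(uw) == xz/(yw).
u, x, y, z, w = F(-2, 3), F(7, 5), F(-4, 9), F(1, 6), F(11, 2)
lhs = (u * x) / y * (z / (u * w))
rhs = (x * z) / (y * w)
print(lhs == rhs)             # True
```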

As in Section 1.6 (page 68), we will not prove (a)–(d) by the mechanical procedure of writing out each rational number as a quotient of two integers and then relying on (2.23) on page 109 and the cancellation rule (equation (1.29) on p. 47) to carry out the routine computations. Instead, we will make repeated use of the uniqueness part of Theorem 2.9. To prove (a), for example, let $A = \frac{x}{y}$ and $B = \frac{zx}{zy}$; we will prove that A = B. By the definition of division in Q (page 115), we have x = Ay and zx = B(zy). But the first equality implies zx = z(Ay), which is of course the same as zx = A(zy). Now compare the latter with zx = B(zy). Theorem 2.9 says there is only one way to express zx as a rational multiple of zy, so we must have A = B.

There is a common misunderstanding about the passages

from $A = \frac{x}{y}$ to x = Ay and

from $B = \frac{zx}{zy}$ to zx = B(zy),

in the preceding proof of (a). It is tempting to think that each is the result of an appropriate cancellation; e.g., multiplying both sides of $A = \frac{x}{y}$ by y leads to Ay = x. While this procedure leads to the correct conclusion, it is based on circular reasoning. This is because, unless we already know that (a) and (d) are correct and can therefore prove (2.32), we cannot use (2.32) to do any cancellations. Since we are still trying to prove (a) and (d) at this point, the correct reasoning is that the equality x = Ay follows from the definition of the division of x by y as given in equation (2.28), and zx = B(zy) follows from the definition of dividing zx by zy. Of course, once (a)–(d) have been proved, we will be free to do as many cancellations as we wish.
To prove (d), let $A = \frac{x}{y}$, $B = \frac{z}{w}$, and $C = \frac{xz}{yw}$. We want to show AB = C.
Again, by the definition of division, we get, respectively,
Ay = x, Bw = z, C(yw) = xz.
Multiplying the corresponding sides of the first and second equalities together, we
get AB(yw) = xz. Comparing the latter with C(yw) = xz, we get AB = C by
appealing to the uniqueness part of Theorem 2.9 on how to express xz as a rational
multiple of yw.
The proofs of (b) and (c) are similar and will be left as an exercise (Exercise 7
on page 120).
These formulas may seem unnecessarily abstract, but they have interesting,
practical consequences. For example, let x, y, . . . be rational numbers as before.
Then

$$\left(\frac{x}{y}\right)^{-1} = \frac{y}{x}.$$
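This is because, by (d), $\frac{x}{y}\times\frac{y}{x} = \frac{xy}{yx} = 1$. Also, we have the general form of invert-and-multiply:

$$\frac{\;\frac{x}{y}\;}{\frac{z}{w}} = \frac{x}{y}\times\frac{w}{z}. \tag{2.33}$$

This is because, by the definition of division, the left side is $\frac{x}{y}\left(\frac{z}{w}\right)^{-1}$, and because $\left(\frac{z}{w}\right)^{-1} = \frac{w}{z}$.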
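The general invert-and-multiply rule (2.33) applies equally when the numerators and denominators are negative numbers or decimals. The minimal sketch below assumes Python's `fractions.Fraction`, which accepts a decimal string such as "2.4" exactly; the helper name `invert_and_multiply` is hypothetical. It checks the rule on the computation $\frac{-3/5}{2.4/(-7)}$ that reappears in the Pedagogical Comments.

```python
from fractions import Fraction as F

def invert_and_multiply(x, y, z, w):
    """Compute (x/y) / (z/w) as (x/y) * (w/z), per (2.33).

    x, y, z, w are rational numbers with y, z, w nonzero.
    """
    return (x / y) * (w / z)

x, y = F(-3), F(5)
z, w = F("2.4"), F(-7)

direct = (x / y) / (z / w)
via_rule = invert_and_multiply(x, y, z, w)
print(direct, via_rule)                  # 7/4 7/4
# The form (-3)(-7) / (5 x 2.4) gives the same value.
print(direct == (x * w) / (y * z))       # True
```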

Pedagogical Comments. In TSM,7 the subject of division traditionally gets short shrift. It is rarely defined correctly, and things like Theorem 2.10 (page 116) are hardly mentioned, much less proved. We mentioned in Section 1.6 that the concept of complex fractions is almost nonexistent in TSM in spite of its obvious importance. It should therefore come as no surprise that what we call rational quotients (page 117) and their associated arithmetic, (a)–(d) on page 118, are not to be found in TSM either. For example, invert-and-multiply is clearly used in computations of the following type:

$$\frac{\;\frac{-3}{5}\;}{\frac{2.4}{-7}} = \frac{(-3)(-7)}{5\times 2.4}.$$

However, students are only taught (most likely by rote) to invert and multiply ordinary fractions (see page 57), but $\frac{-3}{5}$ and $\frac{2.4}{-7}$ are not ordinary fractions. Students are therefore forced to extrapolate, without benefit of reasoning, from something known only for ordinary fractions to "fractions" with negative fractional numerators and denominators. The cumulative effect of many such compulsory blind leaps of faith in TSM is the corrosion of mathematics learning. It forces students to ignore the fundamental fact that a mathematical conclusion is valid only when certain hypotheses are satisfied (compare the discussion on pp. 69ff.). For survival, students learn instead to apply indiscriminately whatever they know, regardless of the circumstances, in order to get an answer. In the end, they come to believe that every skill is unconditionally valid regardless of hypothesis.

Be sure to point out to your students the mathematical reasoning behind what seem to be rote skills in Theorem 2.10, Theorem 2.11, and the rules (a) and (d) for rational quotients. These skills lie behind the general form of invert-and-multiply that justifies the preceding computation. End of Pedagogical Comments.

7 See page xiv for the definition.

Exercises 2.5.

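(1) Compute and simplify: (i) $\left(\frac{-39}{8}\times\frac{9}{11}\right) + \left(\frac{39}{-8}\times\frac{-5}{33}\right)$, (ii) $\frac{7}{1.2} + \frac{5}{-1.8}$, (iii) $-6\frac{1}{4} - \frac{27}{8}\left(\frac{2}{3} - \frac{8}{9}\right)$, (iv) (−4.79) × 0.25 − (−0.5)(1.87).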

(2) Prove equation (2.32) on p. 118.


(3) (a) For any nonzero x ∈ Q, prove that $(x^{-1})^{-1} = x$. (b) Give a direct proof of $(-x)^{-1} = -(x^{-1})$ by considering the following two cases separately: (i) x is a fraction and (ii) x is a negative fraction.
(4) Let x, y, and z be rational numbers so that x ≠ 0 and xy = xz. Prove that y = z.
(5) Write down an explanation you would give to a seventh grader that $-\frac{4}{5} = \frac{4}{-5}$. Be forewarned that this seventh grader is probably hazy about all these symbols to begin with.
(6) Explain as if to an eighth grader why $3 \big/ \left(\frac{4}{-5}\right) = -\frac{15}{4}$. Assume only a knowledge of the multiplication of rational numbers, and explain what division means.
(7) (a) For rational numbers x, y, z, w so that y ≠ 0 and w ≠ 0, prove that $\frac{x}{y} = \frac{z}{w}$ if and only if xw = yz. (b) Give a proof of $\frac{x}{y} + \frac{w}{z} = \frac{xz + yw}{yz}$ for rational numbers x, y, z, w so that y ≠ 0 and z ≠ 0 by making use of the uniqueness part of Theorem 2.9. (See the proof of (a) on page 118.)
(8) Let x, y, z be rational numbers so that $z = \frac{x}{y}$. Explain as if to a seventh grader why (a) if x and y are both positive or both negative, z is positive, and (b) if one of x and y is positive and the other negative, then z is negative.
(9) Show that if A, B, C, and D are rational numbers and A − C ≠ 0, then there is a rational number x so that Ax + B = Cx + D.
(10) (a) Let x be a nonzero rational number. Explain why the division $\frac{x}{0}$ cannot be defined. (Hint: Look carefully at the definition of a division $\frac{x}{y}$ and see where the reasoning begins to break down if y = 0.) (b) Explain why $\frac{0}{0}$ cannot be defined.
2.6. Comparing rational numbers


Although the mathematics of elementary school is preoccupied with equalities, equalities and inequalities play equal roles in mathematics beyond the most elementary level. Moreover, the concept of absolute value gains prominence once negative numbers are introduced, and the interplay between absolute value and inequalities is highly nontrivial. This section introduces some basic facts about inequalities and absolute value. The two inequalities discussed in the last subsection (Theorems 2.12 and 2.13) are omnipresent in mathematics.
Basic facts about inequalities (p. 121)
Absolute value (p. 125)
Two standard absolute value inequalities (p. 127)

Basic facts about inequalities

Recall the definition of x < y between any two numbers x and y (see page 91):
it means x is to the left of y on the number line.
[Figure: the points x and y on the number line, with x to the left of y]

We also write y > x for x < y. A related symbol is x ≤ y (or y ≥ x), which
means x < y or x = y.
In this section, we begin to take a serious look at the comparison of rational
numbers;8 i.e., if x and y are rational numbers, which of the following is true: x ≤ y
or x > y? We will prove several basic facts about inequalities that are useful in
school mathematics. In general, we use the symbol "<" exclusively, but you should
be aware that every one of these inequalities has an analogous statement about
"≤".
We take note of three simple properties of the inequality between numbers;
they are obvious consequences of the fact that numbers are points on the number
line, and it does not matter if they are rational numbers or just real numbers. The
first two are the following:
Reflexive property of "≤". If x ≤ y and y ≤ x, then x = y.

Transitive property of "≤". If x ≤ y and y ≤ z, then x ≤ z.

The third property deserves to be singled out because it plays a critical role in
many proofs. Given any two numbers x and y, then either they are the same point
or if they are distinct, one is to the left of the other; i.e., x is to the left of y or
y is to the left of x. These three possibilities are obviously mutually exclusive. In
symbols, this becomes:
Trichotomy law. Given two numbers x and y, then one and
only one of the three possibilities holds: x = y or x < y or x > y.
The way this law comes up in proofs is typically the following. Suppose we try to prove that two numbers x and y are equal. Sometimes it is impossible or difficult to prove x = y directly. But by the trichotomy law, if we can eliminate both possibilities x < y and x > y, then the desired conclusion that x = y will follow.

8 Anticipating FASM (page 133), such a comparison extends immediately to a comparison of real numbers. For this reason, we will use on occasion "number" instead of "rational number" in the discussion of this section.
We should point out that the trichotomy law in fact implies the reflexive prop-
erty of "≤". See Exercise 5 on page 131.
The basic facts about inequalities we are after are (A)–(E) below. (Recall that
"⇐⇒" stands for "is equivalent to".)

(A) For any x, y ∈ Q, x < y ⇐⇒ −x > −y.

For example, 2 < 3 ⇐⇒ −3 < −2.


If x < 0 < y, then −x > 0 while −y < 0 and there is nothing to prove.
Therefore we need only to attend to the cases where x and y have the same sign;
i.e., they are both ≥ 0 or both ≤ 0. If x = 0 or y = 0, there is nothing to prove, so
we may assume both x and y are nonzero. Suppose 0 < x < y; then we have
[Figure: the number line with −y < −x < 0 < x < y]

On the other hand, if x < y < 0, then we have


[Figure: the number line with x < y < 0 < −y < −x]

In both cases, the truth of −x > −y is obvious.

(B) For any x, y, z ∈ Q, x < y ⇐⇒ x + z < y + z.

For example, given 2 < 3, we can verify directly that $2 - \frac{15}{7} < 3 - \frac{15}{7}$ and 2 + 3 < 3 + 3.
We first prove that x < y implies x + z < y + z for any z. So suppose x < y. Because addition is commutative, it suffices to prove z + x < z + y, or equivalently, that the endpoint of the vector $\vec{z} + \vec{x}$ is to the left of the endpoint of the vector $\vec{z} + \vec{y}$. By the definition of vector addition, the vectors $\vec{z} + \vec{x}$ and $\vec{z} + \vec{y}$ are obtained by placing the starting points of $\vec{x}$ and $\vec{y}$, respectively, at the endpoint of $\vec{z}$, and the endpoints of the displaced $\vec{x}$ and $\vec{y}$ will then be z + x and z + y, respectively. Since by hypothesis the endpoint of $\vec{x}$ is to the left of the endpoint of $\vec{y}$, the conclusion is immediate.
The following picture shows the case where x > 0 and y > 0:
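[Figure: the number line showing the points z, z + x, z + y, 0, x, and y, with the vectors $\vec{x}$ and $\vec{y}$ displaced so that they start at z]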

Next we prove that if x + z < y + z for some z, then x < y. To do this, we make use of what we have just proved: adding −z to both sides of x + z < y + z immediately yields x < y. The proof of (B) is complete.

Corollary. For any x, y, w, z ∈ Q, if x < y and w < z, then x + w < y + z.

The proof of the corollary will be left to Exercise 6 on page 131.


(C) For any x, y ∈ Q, x < y ⇐⇒ y − x > 0.⁹

For example, (−5) < (−3) =⇒ (−3) − (−5) > 0 (because (−3) − (−5) = 2),
and conversely, (−3) − (−5) > 0 =⇒ (−5) < (−3).
First, we prove that x < y =⇒ y − x > 0. By (B), x < y implies x + (−x) <
y+(−x), which is equivalent to 0 < y−x. Conversely, we prove y−x > 0 =⇒ x < y.
Again using (B), we see that y − x > 0 implies (y − x) + x > 0 + x, which is equivalent
to y > x, as desired.

(D) For any x, y, z ∈ Q, if z > 0, then x < y ⇐⇒ xz < yz.

Thus, 4 < 5 =⇒ $\left(\frac{23}{6}\right)4 < \left(\frac{23}{6}\right)5$ (because the left side is $\frac{92}{6}$ and the right side is $\frac{115}{6}$), and (−11) < (−9) =⇒ 7(−11) < 7(−9) (because the left side is −77 while the right side is −63).
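The arithmetic in these two examples can be confirmed exactly. The sketch below (Python, with `fractions.Fraction` for exact rational arithmetic) also spot-checks (D) itself on randomly chosen rational numbers; a finite check of this kind is an illustration only, not a substitute for the proofs that follow.

```python
import random
from fractions import Fraction as F

# First example: 4 < 5, multiplied by z = 23/6 > 0.
z = F(23, 6)
print(z * 4, z * 5)          # 46/3 115/6  (46/3 is 92/6 in lowest terms)
assert z * 4 < z * 5

# Second example: -11 < -9, multiplied by z = 7 > 0.
print(7 * -11, 7 * -9)       # -77 -63
assert 7 * -11 < 7 * -9

# Spot-check of (D): for z > 0, x < y exactly when xz < yz.
rng = random.Random(1)
for _ in range(1000):
    x = F(rng.randint(-50, 50), rng.randint(1, 12))
    y = F(rng.randint(-50, 50), rng.randint(1, 12))
    z = F(rng.randint(1, 50), rng.randint(1, 12))   # always positive
    assert (x < y) == (x * z < y * z)
```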
We first prove that, with x, y, z as given, x < y =⇒ xz < yz. We give two
proofs.
First proof: By (C), x < y =⇒ y − x > 0. Since z > 0 by hypothesis and the
product of two positive numbers is positive (see page 107), we have (y − x)z > 0,
so that yz − xz > 0. By (C) again, xz < yz, as desired.
A second proof uses the theorem on fraction multiplication, which equates a
product with the area of a rectangle (Theorem 1.7 on page 48). Given z > 0 and
x < y, if x < 0 < y, then xz < 0 and yz > 0 and there would be nothing to
prove. Therefore we need only consider the cases where x and y have the same
sign (i.e., they are both ≥ 0 or both ≤ 0; see p. 122). So let 0 ≤ x < y. If
x = 0, obviously xz < yz. Thus we may assume 0 < x < y. Then the inequality
xz < yz is exactly inequality (i) on page 52 in Section 1.4. (Briefly, here is the
reason: for fractions x, y, and z, Theorem 1.7 on page 48 says xz and yz are areas
of rectangles with sides of length x, z and y, z, respectively. Since x < y, clearly
the rectangle corresponding to yz contains the rectangle corresponding to xz and
therefore has a greater area. Hence yz > xz.) Now suppose x < y ≤ 0. Again, if
y = 0, there is nothing to prove, so we may assume x < y < 0; then (−x), (−y) > 0.
Moreover x < y implies (−y) < (−x), by (A). Thus we know from the preceding
argument that (−y)z < (−x)z, which is equivalent to −yz < −xz (equation (2.23)
on page 109), and therefore yz > xz, by (A) again.
Finally, we prove the converse: if for some z > 0, xz < yz, then x < y. We claim that $\frac{1}{z} > 0$. Indeed, since $z\left(\frac{1}{z}\right) = 1$ and 1 > 0, we see (from page 107) that, in order for the product of z with $\frac{1}{z}$ to be positive, z and $\frac{1}{z}$ have to be either both positive or both negative. Therefore $\frac{1}{z}$ has to be positive. Such being the case, by what we have just proved, $\frac{1}{z} > 0$ and xz < yz imply that $\frac{1}{z}(xz) < \frac{1}{z}(yz)$, which is the same as x < y. (D) is proved.

9 It should be remarked that, in advanced mathematics, (C) is taken as the definition of x < y.
(E) For any x, y, z ∈ Q, if z < 0, then x < y ⇐⇒ xz > yz.¹⁰

To students, the fact that, when z < 0, the inequality x < y turns into xz > yz is the most fascinating aspect of inequalities. This goes against everything they have learned up to this point, which suggests that whatever arithmetic operation they apply to an inequality would preserve that inequality. Here, however, is a situation where an inequality gets reversed. We first illustrate with some examples whose validity can be easily verified (in each case, the initial inequality is multiplied by −4 to get the second one):
1 < 2 but −4 > −8,
$\frac{3}{2} < \frac{15}{4}$ but −6 > −15,
$-2 < \frac{1}{2}$ but 8 > −2,
$-1 < -\frac{2}{3}$ but $4 > 2\frac{2}{3}$.
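Each row above can be confirmed by exact arithmetic. The following sketch (Python with `fractions.Fraction`; the list `pairs` simply transcribes the four examples) multiplies each inequality by z = −4 and checks that the direction flips.

```python
from fractions import Fraction as F

z = F(-4)
# (x, y) with x < y, transcribed from the four examples above.
pairs = [
    (F(1), F(2)),
    (F(3, 2), F(15, 4)),
    (F(-2), F(1, 2)),
    (F(-1), F(-2, 3)),
]
for x, y in pairs:
    assert x < y
    # Multiplying by a negative number reverses the inequality: xz > yz.
    assert x * z > y * z
    print(x * z, ">", y * z)
```

The last line printed is 4 > 8/3, that is, 4 > 2 2/3.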


To prove x < y =⇒ xz > yz when z < 0, we give two slightly different proofs.
First, we make use of (C). Since x < y, we see that (y − x) is positive,
by (C). Since −z is also positive, we have (−z)(y − x) > 0 (see page 107). But
(−z)(y − x) = xz − yz by (2.25) on page 110 and Theorem 2.8 on page 109, so
xz − yz > 0. By (C), we get yz < xz, as desired.
For the second proof, let z = −w, where w is now positive. Since x < y, (D)
implies that wx < wy. By (A), −wx > −wy. But Theorem 2.8 on page 109 says
−wx = (−w)x = zx and −wy = (−w)y = zy, so zx > zy.
The second proof suggests a more intuitive way to understand why, if z < 0,
then multiplying an inequality by z would reverse that inequality. Consider the
special case where 0 < x < y and z = −2. We want to understand why (−2)y <
(−2)x. By Theorem 2.8 on page 109, (−2)y = −(2y) and (−2)x = −(2x). Thus
we want to see, intuitively, why −2y < −2x. From 0 < x < y, we get the following
picture:

[Figure: the number line with 0 < x < y]

Then the relative positions of 2x and 2y are the same as those of x and y although
each is pushed further to the right of 0.

[Figure: the number line with 0 < 2x < 2y]

If we reflect this picture across 0, we get the following:

[Figure: the number line with −2y < −2x < 0 < 2x < 2y]

We see that −2y is now to the left of −2x, so that −2y < −2x, as claimed. (Of course, if z were $-\frac{1}{2}$, then x and y would both be pushed closer to 0 instead, but the relative positions of $-\frac{x}{2}$ and $-\frac{y}{2}$ would still be the same as those of −2x and −2y.)

10 At the end of the preceding section, we warned against the tendency to assume that every skill is universally applicable. There is no better illustration of the danger of this tendency than the contrast between (D) and (E). One must begin to be sensitive to the fact that some facts are true only under restrictive hypotheses.

It remains to prove that if z < 0, then xz > yz implies x < y. We claim that $\frac{1}{z} < 0$. This is because $z\left(\frac{1}{z}\right) = 1$ and 1 is positive. Since z is negative, $\frac{1}{z}$ has to be negative too, because negative × positive = negative (see p. 107). Thus by the first part of the proof, multiplying both sides of xz > yz by $\frac{1}{z}$ reverses the inequality; i.e., $\frac{1}{z}(xz) < \frac{1}{z}(yz)$. This is the same as x < y. The proof of (E) is complete.

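In the course of proving (E), we proved the following useful fact: let x be a nonzero rational number; then

$$x > 0 \iff \frac{1}{x} > 0. \tag{2.34}$$

This is because $x\left(\frac{1}{x}\right) = 1 > 0$, so x and $\frac{1}{x}$ cannot have opposite signs (i.e., one negative and the other positive). Therefore x and $\frac{1}{x}$ are either both positive or both negative, proving (2.34). In turn, (2.34) leads to another useful fact (apply (D) or (E) to the number $\frac{1}{z}$, whose sign is that of z): for x, y, z ∈ Q with z ≠ 0,

$$x < y \Longrightarrow \frac{x}{z} < \frac{y}{z} \text{ if } z > 0 \quad\text{and}\quad \frac{x}{z} > \frac{y}{z} \text{ if } z < 0. \tag{2.35}$$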
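A small randomized spot-check of (2.34) and (2.35), again assuming `fractions.Fraction` as a model of Q, shows the two cases side by side. This is an illustration only; the proof is the argument above.

```python
import random
from fractions import Fraction as F

rng = random.Random(2)
for _ in range(1000):
    x = F(rng.randint(-40, 40), rng.randint(1, 9))
    y = x + F(rng.randint(1, 40), rng.randint(1, 9))   # guarantees x < y
    z = F(rng.choice([-1, 1]) * rng.randint(1, 40), rng.randint(1, 9))
    assert (1 / z > 0) == (z > 0)        # (2.34)
    if z > 0:
        assert x / z < y / z             # (2.35), positive divisor
    else:
        assert x / z > y / z             # (2.35), negative divisor
print("(2.34) and (2.35) held in every trial")
```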

Activity. True or false: x < y and 0 < z < w imply xz < yw. (Be careful.)

Absolute value

Intrinsically tied to any discussion of inequalities in Q is the notion of the
absolute value |x| of a number x, which is by definition the distance from x to
0 (i.e., the length of the segment [x, 0] or [0, x], depending on whether x is negative
or positive, respectively). In particular, |x| ≥ 0 no matter what x may be. The
most pleasant property of the absolute value is that, for all numbers x, y,

(2.36) |x| · |y| = |xy|.

This can be proved by a case-by-case examination of the four cases where x and y
take turns being positive and negative. Since the reasoning is routine, the details
can be left to Exercise 3 on page 131. On the other hand, inequalities involving
absolute value tend to present difficulties to students, so let us discuss this topic
with some care.
If b is a positive number, then the set of all the numbers x so that |x| < b
consists of all the points x of distance less than b from 0, indicated by the thickened
segment below (excluding the endpoints):

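[Figure: the number line with the segment from −b to b thickened (endpoints excluded), containing 0 and a point x]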

It follows that the inequality |x| < b for a point x is equivalent to the fact
that x satisfies both −b < x and x < b. It is standard practice in mathematics to
combine these two inequalities into a composite statement in the form of a double
inequality:
|x| < b is equivalent to −b < x < b.
In the usual notation for intervals on the number line, this becomes:
|x| < b is equivalent to x ∈ (−b, b).
(The set of all the points x on the number line satisfying c < x < d, for two fixed points c and d with c < d, is denoted by (c, d). This is called an open interval with endpoints c and d. We apologize for the likely confusion of this notation with the point in the coordinate plane whose coordinates are c and d (to be introduced in Section 6.3 on p. 331), but that is the way it is. In this context, the "segments" in our usual discussion are denoted by [c, d], which consists of the open interval (c, d) together with the two endpoints c and d; more explicitly, [c, d] is the collection of all the points x on the number line so that c ≤ x ≤ d. We call [c, d] the closed interval with endpoints c and d.) Henceforth, we will refer to [c, d] interchangeably as either a segment or a closed interval.
The fact that a single inequality |x| < b involving absolute value is equivalent to
a double inequality −b < x < b is a very useful fact to keep in mind in considerations
involving absolute values. In the following, we sometimes refer to −b < x < b as
the associated double inequality of |x| < b. The following example illustrates
the way the conversion of an absolute value inequality into its associated double
inequality can be put to use.

Example. Determine all the numbers x so that $|6x + 1| + 2\frac{1}{4} < 5$, and show them on the number line.
The inequality $|6x + 1| + 2\frac{1}{4} < 5$ is equivalent to $|6x + 1| < 5 - 2\frac{1}{4}$ (by (B) above), which is just $|6x + 1| < 2\frac{3}{4}$, which, in turn, is equivalent to the double inequality $-2\frac{3}{4} < 6x + 1 < 2\frac{3}{4}$. The left inequality is equivalent to $-2\frac{3}{4} - 1 < 6x$ (by (B) again); i.e., $-\frac{15}{4} < 6x$. Now we multiply both sides of this inequality by $\frac{1}{6}$ and use (D) to conclude that it is equivalent to $-\frac{15}{24} < x$. By exactly the same reasoning, the right inequality $6x + 1 < 2\frac{3}{4}$ is equivalent to $x < \frac{7}{24}$. Putting all this together, we have that the inequality $|6x + 1| + 2\frac{1}{4} < 5$ is equivalent to the double inequality $-\frac{15}{24} < x < \frac{7}{24}$. The collection of all the points x satisfying this double inequality is the open interval $\left(-\frac{15}{24}, \frac{7}{24}\right)$ and is indicated by the thickened segment in the picture (not including the endpoints).

[Figure: the number line with the open interval from $-\frac{15}{24}$ to $\frac{7}{24}$ thickened (0 lies inside it), labeled $|6x + 1| + 2\frac{1}{4} < 5$]
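The procedure in this example (isolate the absolute value, convert |ax + b| < c into its associated double inequality −c < ax + b < c, then solve each side) can be sketched in Python. The function name `solve_abs_less_than` is our own, hypothetical choice; it assumes a ≠ 0, uses exact `fractions.Fraction` arithmetic, and returns the endpoints of the open interval of solutions, or None when there are none.

```python
from fractions import Fraction as F

def solve_abs_less_than(a, b, c):
    """Solve |a*x + b| < c for x, assuming a != 0.

    Returns (lo, hi) with the solution set equal to the open interval (lo, hi),
    or None if c <= 0 (then no x satisfies the inequality).
    """
    a, b, c = F(a), F(b), F(c)
    if c <= 0:
        return None
    # Associated double inequality: -c < a*x + b < c, so -c - b < a*x < c - b.
    left, right = (-c - b) / a, (c - b) / a
    # Dividing by a negative a reverses both inequalities, by (2.35).
    return (left, right) if a > 0 else (right, left)

# The example: |6x + 1| + 2 1/4 < 5, i.e., |6x + 1| < 5 - 2 1/4 = 2 3/4.
lo, hi = solve_abs_less_than(6, 1, F(5) - F(9, 4))
print(lo, hi)   # -5/8 7/24
```

Fraction reports the left endpoint −15/24 in lowest terms, as −5/8.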

A basic property of absolute value is the following: for two numbers x and x0 ,
(2.37) |x0 − x| = the distance between x and x0 .
In particular, the length of an interval, (a, b) or [a, b], is just |a − b|.
There are three cases to consider: both x0 and x are positive, one is positive
and the other is negative, and finally both are negative. First we look at the case
where both are positive. Since |x0 − x| = |x − x0 |, we may assume x < x0 , so that
|x0 − x| = x0 − x.
[Figure: the number line with 0 < x < x0]

Proving that x0 − x is the distance between x and x0 is—by definition (p. 125)—
equivalent to proving that x0 − x is the length of the segment [x, x0 ], but this
follows from the definition of subtraction on page 38: x0 − x is the length of the
remaining segment when [0, x] is taken away from [0, x0 ]. (Recall: x0 and x are
understood to be rational numbers but by FASM (Section 2.7 on pp. 133) may be
more liberally interpreted to be any numbers. See the footnote on page 121.) Now
consider the second case, where one is positive and the other is negative. Again,
because |x0 − x| = |x − x0 |, we may assume that x < 0 and x0 > 0, as shown:

[Figure: the number line with x < 0 < x0]

Since x < 0, we have x = −|x|. Then |x0 − x| = |x0 − (−|x|)| = |x0 + |x|| = x0 + |x|,
and the claim is again obvious. Finally, the case of x0 and x being both negative
is reduced to the first case because x0 = −|x0 | and x = −|x|, so that
|x0 − x| = |− (x0 − x)| = |− x0 − (−x)| = ||x0 | − |x||
= distance between |x0 | and |x|, by the first case
= distance between x0 and x, by symmetry.
Here is the picture for the case in which both x0 and x are negative:

[Figure: the number line with x0 < x < 0 < |x| < |x0|]

This then completes the proof of (2.37).
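The case analysis in the proof of (2.37) translates almost line for line into code. In the sketch below (Python with `fractions.Fraction`), the function `distance_by_cases` is a hypothetical helper that follows the three cases of the proof, treating 0 together with the positive numbers, and the loop compares it with |x0 − x| on random inputs.

```python
import random
from fractions import Fraction as F

def distance_by_cases(x, x0):
    """Distance between x and x0, computed by the three cases of (2.37).

    Case 1: both >= 0 -> (larger) - (smaller), the length of a segment.
    Case 2: opposite signs -> (nonnegative one) + |negative one|.
    Case 3: both < 0 -> reflect across 0 and use Case 1 on |x| and |x0|.
    """
    if x >= 0 and x0 >= 0:
        return max(x, x0) - min(x, x0)
    if x < 0 and x0 < 0:
        return distance_by_cases(-x, -x0)
    negative, positive = (x, x0) if x < 0 else (x0, x)
    return positive + (-negative)

rng = random.Random(5)
for _ in range(1000):
    x = F(rng.randint(-30, 30), rng.randint(1, 10))
    x0 = F(rng.randint(-30, 30), rng.randint(1, 10))
    assert distance_by_cases(x, x0) == abs(x0 - x)
```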

Two standard absolute value inequalities

The importance of the concept of absolute value is hidden in TSM.11 See the
Pedagogical Comments on page 130 for an informal explanation of the importance.
Here we are content to illustrate how absolute value is used in a nontrivial way by
proving two basic inequalities involving absolute values. Here is the first one.

Theorem 2.12. (i) For any numbers x and y,

$$2|xy| \le x^2 + y^2.$$

(ii) Equality holds in the preceding weak inequality if and only if |x| = |y|.

11 See page xiv of the preface for the definition of TSM.

This theorem is a variant of the inequality of arithmetic and geometric means for two numbers; see Exercise 18 on page 132. For an accessible discussion of this famous inequality for more than two numbers, see Chapter II of [Karzarinoff].
At this point, it suffices to prove this theorem for rational values of x and y. Thus we will tacitly assume x, y ∈ Q in the following discussion. Before giving the proof, let us understand the role played by the absolute value |xy| of xy. Clearly, this theorem is of no interest if one of x and y is 0, as it merely says that $0 \le x^2 + y^2$ (see (2.22) on page 107). So we may as well assume both x and y to be nonzero.
Such being the case, we make use of (2.35) on p. 125 to rewrite the theorem as

$$\frac{2|xy|}{x^2 + y^2} \le 1$$

for all x and y. Since $x^2 + y^2 > 0$, we have $\frac{1}{x^2 + y^2} > 0$ on account of (2.34) on p. 125. It follows that $\left|\frac{1}{x^2 + y^2}\right| = \frac{1}{x^2 + y^2}$. Therefore, using |AB| = |A| · |B| for all numbers A and B (see (2.36) on page 125), we get

$$\left|\frac{2xy}{x^2 + y^2}\right| = |2xy|\cdot\frac{1}{x^2 + y^2} = 2|xy|\cdot\frac{1}{x^2 + y^2} = \frac{2|xy|}{x^2 + y^2}.$$

Thus the theorem is equivalent to claiming that

$$\left|\frac{2xy}{x^2 + y^2}\right| \le 1,$$

and that equality holds if and only if |x| = |y|. We know from a previous remark on page 126 that this inequality is equivalent to the double inequality

$$-1 \le \frac{2xy}{x^2 + y^2} \le 1.$$

In this form, part (i) of the theorem is equivalent to asserting that the number $\frac{2xy}{x^2 + y^2}$ is trapped inside the segment [−1, 1] between −1 and 1 for all x and y.

Without the absolute value sign, part (i) of the theorem merely says that

$$\frac{2xy}{x^2 + y^2} \le 1.$$

This inequality does not preclude the possibility that $\frac{2xy}{x^2 + y^2} = -100$. With the absolute value sign in place, however, we know that $\frac{2xy}{x^2 + y^2}$ cannot be to the left of −1 on the number line and, in particular, cannot be equal to −100. We see that the presence of absolute value in the inequality of the theorem makes a big difference.

It remains to give the simple proof of Theorem 2.12. We first prove part (i) in its original formulation:

$$2|xy| \le x^2 + y^2.$$

Let u = |x| and v = |y|; then 2|xy| = 2|x| · |y| by (2.36) on page 125. Thus 2|xy| = 2uv. Now we make the simple observation that for all numbers t, $t^2 = |t|^2$; this is clear when we first consider the case t ≥ 0 and then the case t < 0. Therefore, we have $x^2 = |x|^2 = |x|\cdot|x| = uu = u^2$. Similarly, $y^2 = v^2$. Thus part (i) becomes the statement that

$$2uv \le u^2 + v^2,$$

which is equivalent to $0 \le u^2 - 2uv + v^2$, by (B) on page 122. In other words, part (i) is equivalent to

$$u^2 - 2uv + v^2 \ge 0$$

for any numbers u and v. This is, however, obvious because¹²

$$u^2 - 2uv + v^2 = (u - v)^2 \tag{2.38}$$

and $(u - v)^2 \ge 0$ (see (2.22) on page 107). This proves (i). To prove part (ii), the fact that if |x| = |y|, then $2|xy| = x^2 + y^2$ is trivial because $x^2 = |x|^2$ and $y^2 = |y|^2$. Conversely, suppose $2|xy| = x^2 + y^2$; we will prove |x| = |y|. We have

$$\begin{aligned}
2|xy| = x^2 + y^2 &\iff 2|x|\cdot|y| = |x|^2 + |y|^2\\
&\iff 0 = |x|^2 - 2|x|\cdot|y| + |y|^2\\
&\iff (|x| - |y|)^2 = 0 \quad \text{(by (2.38))}.
\end{aligned}$$

But $(|x| - |y|)^2 = 0$ implies that |x| − |y| = 0 (see page 107); i.e., |x| = |y|. The proof of the theorem is complete.
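Theorem 2.12 and its equality condition can be spot-checked with exact rational arithmetic. The sketch below (Python's `fractions.Fraction`; the seed and ranges are arbitrary) deliberately mixes in pairs with |x| = |y| so that the equality case actually occurs. As with any finite test, it illustrates the theorem rather than proving it.

```python
import random
from fractions import Fraction as F

rng = random.Random(3)
equality_cases = 0
for _ in range(2000):
    x = F(rng.randint(-30, 30), rng.randint(1, 10))
    if rng.random() < 0.25:
        y = rng.choice([x, -x])          # force |x| = |y| some of the time
    else:
        y = F(rng.randint(-30, 30), rng.randint(1, 10))
    lhs, rhs = 2 * abs(x * y), x**2 + y**2
    assert lhs <= rhs                            # part (i)
    assert (lhs == rhs) == (abs(x) == abs(y))    # part (ii)
    equality_cases += lhs == rhs
print("equality occurred", equality_cases, "times")
```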

It may be thought that part (ii) of Theorem 2.12, which gives a necessary and
sufficient condition for the inequality to become an equality, is not interesting. The
opposite is true; see Exercise 18 on page 132 below.
We conclude with what is probably the most basic inequality involving absolute
value in elementary mathematics.

Theorem 2.13 (Triangle inequality). For any numbers x and y, (i) |x+y| ≤
|x| + |y| and (ii) this (weak) inequality is an equality if and only if x and y are of
the same sign, i.e., both ≥ 0 or both ≤ 0.

Mathematical Aside: The terminology "triangle inequality" comes from the fact that if x and y are vectors in the plane that form two sides of a triangle, placed head to tail so that these sides have lengths |x| and |y|, then |x + y| is the length of the third side. In that case, the inequality |x + y| < |x| + |y| becomes the statement that the sum of the lengths of two sides of a triangle exceeds the length of the third side (see Theorem G36 in Section 6.6 of [Wu2020b]).
The triangle inequality is truly basic because, in advanced mathematics, many
mathematical objects become "spaces" in a generalized sense (e.g., the space of
continuous functions, the space of solutions of a linear differential equation, etc.).
When a suitable "distance" is introduced into these "spaces" so that the "length of
a segment" again makes sense, then it becomes important to establish a "triangle
inequality" in the form of Theorem 2.13.

Proof. We first prove part (i). If one of x and y is 0, then there is nothing to
prove. We assume therefore that both x and y are nonzero. The most elementary
proof is one using case-by-case examination of the inequality. There are four cases
to consider: (i) both x, y > 0, (ii) both x, y < 0, (iii) x < 0 but y > 0, and (iv)
x > 0 but y < 0. Such a proof would be boring, but it is quite instructive if you
want to get a down-to-earth feel for absolute values.
We give a different proof, one that makes use of the fact that the inequality
|x| ≤ b is equivalent to the double inequality −b ≤ x ≤ b. This is a standard
proof but is also one from which one can learn something about absolute values.
Therefore, instead of proving |x + y| ≤ |x| + |y|, we prove the double inequality
−|x| − |y| ≤ x + y ≤ |x| + |y|. There is no question that −|x| ≤ x ≤ |x| and −|y| ≤ y ≤ |y|. From −|x| ≤ x and −|y| ≤ y, we use the corollary of (B) on page 122 to conclude that −|x| − |y| ≤ x + y. Similarly, we use x ≤ |x| and y ≤ |y| and the corollary of (B) to conclude that x + y ≤ |x| + |y|. Thus, we have proved both inequalities in the double inequality.

12 The fact that for any two numbers u and v, (2.38) is true can be verified by a straightforward application of the distributive law; this fact is discussed in a broader context on pp. 304ff.
Next we prove part (ii). There is no question that if x and y are of the same
sign, then equality holds in the inequality; i.e., |x + y| = |x| + |y|. We now prove
the converse; namely, |x + y| = |x| + |y| implies that x and y are of the same sign.
If one of x and y is 0, then x and y are already of the same sign and there is
nothing to prove. Assume therefore that both x and y are nonzero, and we will
use a contradiction argument. Suppose |x + y| = |x| + |y| and x, y are not of the
same sign. This means one of x and y is negative and the other is positive. For
definiteness, say x < 0 and y > 0. Then |x| + |y| = −x + y, but |x + y| = x + y
or |x + y| = −(x + y). In the case of the former, we have −x + y = x + y, so
that 2x = 0 and therefore x = 0. This contradicts x being negative. If the latter,
then −x + y = −(x + y), so that −x + y = −x − y, and therefore 2y = 0. Hence
y = 0 and this contradicts the positivity of y. Thus it is impossible that x and y are
not of the same sign when |x+y| = |x|+|y|. The proof of Theorem 2.13 is complete.
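Both parts of Theorem 2.13 lend themselves to the same kind of exact spot-check. In the sketch below (Python, `fractions.Fraction`), the helper `same_sign` is a hypothetical name encoding the theorem's convention that x and y are of the same sign when both are ≥ 0 or both are ≤ 0, so that 0 counts as having the same sign as every number.

```python
import random
from fractions import Fraction as F

def same_sign(x, y):
    """True if x, y are both >= 0 or both <= 0 (the convention of Theorem 2.13)."""
    return (x >= 0 and y >= 0) or (x <= 0 and y <= 0)

rng = random.Random(4)
for _ in range(2000):
    x = F(rng.randint(-30, 30), rng.randint(1, 10))
    y = F(rng.randint(-30, 30), rng.randint(1, 10))
    assert abs(x + y) <= abs(x) + abs(y)                       # part (i)
    assert (abs(x + y) == abs(x) + abs(y)) == same_sign(x, y)  # part (ii)
print("triangle inequality held in every trial")
```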

Pedagogical Comments. The concept of absolute value is a staple of the
middle school curriculum, yet, at least in TSM, this concept seems to have no
relevance to anything else in the school mathematics curriculum. Many teachers
(not to mention innumerable students) have asked why they should bother with
this concept. As one educator put it, absolute value is routinely taught in schools
as an isolated rote procedure: "take off the minus sign if there is one". Teachers
feel handicapped by being forced to teach something for which they do not see any
relevance.
It is not possible in an elementary text, especially one of this nature, to give a
wholly satisfactory answer to the question of why absolute value should be taught.
The importance of absolute value emerges mostly in the more advanced portions of
mathematics or the sciences, such as Chapters 2–7 of [Wu2020c], where one comes
face to face with the concept of limit and its related inequalities. (See, for example,
the very definition of a convergent sequence in Section 2.2 of [Wu2020c] and the
subsequent discussion in that section.) Here we have to be content with giving the
barest hint of an idea of why it is needed.
There are situations where we want only the absolute value ("magnitude") of
a number but do not particularly care whether the number is positive or negative.
For example, suppose you try to estimate the sum of two 3-digit whole numbers,
369 + 177, by rounding to the nearest hundred. The sum is of course 546, but
the estimated sum would be 400 + 200 = 600. The measurement of the accuracy
of such an estimate is the so-called absolute error of the estimation, which is
by definition the absolute value of the difference between the exact value and the
estimated value; i.e.,
absolute error = |(estimated value) − (exact value)|.
In this case, it is |600 − 546| = 54. Now, if we make the same estimate of the sum
234 + 420, again by rounding to the nearest hundred, then the absolute error of the
estimated value of 600 (= 200 + 400) is still 54, because |600 − 654| = 54. These two
estimates differ in that the former overestimates by an amount of 54, whereas the
latter underestimates by the same amount. However, as a preliminary indication
2.6. COMPARING RATIONAL NUMBERS 131

of the accuracy of these estimates, it can be said that they both miss the mark by
54 and it does not matter whether they are over or under by this amount. Thus
it is the absolute value of this difference, rather than the difference itself, that is
of primary interest. The absolute value in this instance provides exactly the right
tool to express the absolute error of such estimations.
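The two estimates can be checked mechanically. The following is a minimal Python sketch. The function names are ours and purely illustrative, and "rounding to the nearest hundred" is assumed to round a half up, a convention that never comes into play here because neither example involves a tie.

```python
# Illustrative sketch; the helper names are hypothetical, not from the text.

def round_to_hundred(n: int) -> int:
    """Round a whole number to the nearest hundred, rounding a half up (assumed convention)."""
    return (n + 50) // 100 * 100


def absolute_error(estimated: int, exact: int) -> int:
    """Absolute error = |(estimated value) - (exact value)|."""
    return abs(estimated - exact)


def estimate_sum(m: int, n: int) -> tuple[int, int, int]:
    """Return (exact sum, estimated sum, absolute error) when m and n are rounded first."""
    exact = m + n
    estimated = round_to_hundred(m) + round_to_hundred(n)
    return exact, estimated, absolute_error(estimated, exact)


print(estimate_sum(369, 177))  # (546, 600, 54): an overestimate by 54
print(estimate_sum(234, 420))  # (654, 600, 54): an underestimate by 54
```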
A similar phenomenon surfaces when one tries to say that two numbers x and
y are "close" to each other, for example, in the consideration of the limit b of a
sequence $(x_n)$ in Chapter 2 of [Wu2020c]. When n is large, one wants to say that
the distance between $x_n$ and b is small. In this situation, it is irrelevant whether
$x_n$ is to the left or to the right of b, because all one cares about is that the distance
between $x_n$ and b gets smaller and smaller:
[Figure: two number lines, one with $x_n$ to the left of b and one with b to the left of $x_n$.]
It is in this context that equation (2.37) on page 126 comes to the fore because
it allows us to express this fact simply as "$|b - x_n|$ is small". There is a more
important consideration, however. In order to verify that indeed $|b - x_n|$ is small
in a particular situation, often long computations involving absolute values will be
required. Therefore the concept of absolute value is needed, not only as a conve-
nient tool to express a finding at the end of a long process, but as an integral part
of the reasoning within the process itself. (Such computations are faintly suggested
in the proof of Theorem 2.12 on page 127 and are slightly better represented in the
proofs of parts (c) and (d) of Theorem 2.10 in [Wu2020c].) Because the concept
of limit shows up almost everywhere in advanced mathematics, one can begin to
get a glimpse of the importance of absolute values from this discussion. End of
Pedagogical Comments.

Exercises 2.6.

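(1) Which is greater: (a) $(-1.7)\cdot 9$ or $-22 + 6\frac{2}{3}$, (b) $\frac{-2}{5}$ or $(-5)\,\frac{1.1}{12.5}$,
(c) $\left(\frac{-2}{3}\right)\big/\left(\frac{4}{7}\right)$ or $\left(\frac{14}{3}\right)\left(\frac{-2}{8.5}\right)$?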
(2) Determine all the numbers x which satisfy the inequality in each of the
following: (a) $|x - 1| - 5 < \frac{2}{3}$, (b) $11 - |3 + 2x| > 2.5$, (c) $\left|2x - \frac{3}{5}\right| \ge \frac{1}{5}$,
(d) $3 - |2x - 5| \ge 4.2$.
(3) Prove (2.36) on p. 125, i.e., for any x, y ∈ Q, |xy| = |x| · |y|.
(4) If x and y are rational numbers and y ≠ 0, then prove that $\left|\frac{x}{y}\right| = \frac{|x|}{|y|}$.
(5) Prove that the trichotomy law (page 121) implies the reflexive property of
"≤" (see page 121).
(6) (a) If x, y, z, w are rational numbers and x ≤ y and w ≤ z, then show that
x + w ≤ y + z. (b) If, in addition, all four numbers are ≥ 0, then show that
xw ≤ yz.
(7) (a) Let x, y, z, w ∈ Q, and let y, w > 0. Then prove that $\frac{x}{y} < \frac{z}{w} \iff xw < yz$.
(b) Give examples to show that both implications "$\frac{x}{y} < \frac{z}{w} \Longrightarrow xw < yz$"
and "$xw < yz \Longrightarrow \frac{x}{y} < \frac{z}{w}$" are false without the assumption
that y, w > 0. (c) Are the numbers
$$\frac{32.5}{-3} \quad\text{and}\quad \frac{-30\frac{2}{3}}{2\frac{4}{5}}$$
equal? If so, prove it. If not, which is bigger?
(8) Let x and y be rational numbers. Prove (a) |x| − |y| ≤ |x − y|, (b)
||x| − |y|| ≤ |x − y|.
(9) If x is a rational number, is it true that x < 1 implies $\frac{1}{x} > 1$? If so, prove
it. If not, formulate a true statement, and prove it.
(10) If x > 1, then prove that $x^n > 1$ for any positive integer n. Also if
−1 < x < 1, then prove that $-1 < x^n < 1$ for any positive integer n.
(11) Let x be a rational number. (a) If x > 1, then show that $x^m > x^n$ for
whole numbers m > n. (b) If 0 < x < 1, then prove that $x^m < x^n$ for
whole numbers m > n.
(12) If x, y are numbers so that 0 < x < y and n is a positive integer, how
does $x^n$ compare with $y^n$? Why?
(13) Let a, b, c, d be any four numbers. Prove that
$$|ac + bd| \le \sqrt{a^2 + b^2}\,\sqrt{c^2 + d^2}$$
and that equality holds if and only if ad = bc. (This is a special case of
the Cauchy inequality. See page 67 of [Karzarinoff].)
(14) Is it true that if x and y are two rational numbers (in particular, they
could be negative), then $\frac{1}{9}x^2 - \frac{1}{12}xy + \frac{1}{64}y^2 \ge 0$?
(15) Show that for all numbers x, y, and c ≠ 0,
$$|x + y|^2 \le \left(1 + \frac{1}{c^2}\right)|x|^2 + (1 + c^2)|y|^2.$$
(16) Let x and y be two numbers so that 0 < |x| ≤ |y|. Consider the following
picture of two nested squares of side lengths |x| and |y|:
[Figure: a square of side |x| nested inside a square of side |y|.]
Assuming the usual area formula for a rectangle, use this picture to give a
proof of Theorem 2.12 on page 127. (Hint: The inequality of the theorem
is equivalent to $|xy| \le \frac{1}{2}(x^2 + y^2)$.)
(17) Let a and b be numbers so that a < b, and let θ be a number. Prove that
$a < (1 - \theta)a + \theta b < b \iff 0 < \theta < 1$. 13
(18) (a) Prove the following inequality of arithmetic and geometric means
for two numbers: if s and t are nonnegative numbers, then $\sqrt{st} \le \frac{1}{2}(s + t)$,
and equality holds if and only if s = t. (The left side is called the geo-
metric mean of s and t while the right side is called their arithmetic
mean.) (b) Prove that among all rectangles with a fixed perimeter, the
square has the biggest area.
(19) If x and y are positive, then prove that (a) $x^2 = y^2$ if and only if x = y
and (b) $x^2 < y^2$ if and only if x < y. (Hint: Use the trichotomy law.)
(20) Expand the following outline of an argument into a valid proof of the
triangle inequality (Theorem 2.13 on page 129): to prove |x + y| ≤ |x| + |y|,
it suffices to prove $|x + y|^2 \le (|x| + |y|)^2$. This is so because
$$|x + y|^2 = (x + y)^2 = x^2 + 2xy + y^2 \le |x|^2 + 2|x|\cdot|y| + |y|^2 = (|x| + |y|)^2.$$
13 Due to Ole Hald.

2.7. FASM
We will give a precise statement of FASM in this section and put the last two
chapters on fractions and rational numbers in perspective.

Using the concept of rational quotients, we can state the Fundamental As-
sumption of School Mathematics (FASM):
The laws of operations for both addition and multiplication (asso-
ciative, commutative, and distributive), the formulas (a)–(d) on
page 118 for rational quotients, and the basic facts about inequal-
ities (A)–(E) on pp. 122–124 for rational numbers continue to
be valid when the rational numbers are replaced by real numbers.
FASM will be proved in Section 2.1 of [Wu2020c].
Next we turn to the treatment of the rational numbers in this volume. Our
starting point is the whole numbers; then we expand it to fractions, and finally to the
rational numbers by introducing negative fractions. In an upper division course on
abstract algebra, however, the field of rational numbers, Q, is usually introduced
in the following way. One starts with the whole numbers, N, and enlarges that to the
integers, Z, by adjoining to N the negative integers (i.e., the additive inverses of the
whole numbers). For example, there is no whole number x so that x + 3 = 0 but the
number −3 in Z has exactly this property: (−3) + 3 = 0. Z is now a commutative
ring in the sense that addition and multiplication are defined between any two
elements of Z, so that both satisfy the associative and commutative laws and so
that the distributive law holds. Moreover, every element in Z has an additive
inverse. Now Z has an additional property; namely, it has no divisors of zero, in
the sense that if m ≠ 0 and n ≠ 0, then the product mn ≠ 0. Such a ring is called
an integral domain. An integral domain can be further expanded to a field,
called the quotient field of the integral domain; in the case of Z, this quotient
field is exactly Q. The difference between an integral domain and its quotient field
is that every nonzero element of the former now has a multiplicative inverse in the
latter; e.g., while there is no integer z so that z · 2 = 1, the element $\frac{1}{2}$ in Q does
satisfy $\frac{1}{2} \cdot 2 = 1$. In fact, it turns out that all nonzero elements in Q (not only
those in Z) have multiplicative inverses in Q itself.
In summary, the main point of the two-step process of expansion, from N to
Z to Q—in the context of algebra—is that by passing from N to Q, we acquire an
additive inverse for every element of Q (in particular, for every element of N) and
also a multiplicative inverse for every nonzero element of Q (in particular, for every
nonzero element of N). This is the reason from the perspective of mathematical
structure that we need the field of rational numbers. It may be added that the
passage from an integral domain to its quotient field is completely standard and
merits at most a week and a half in a course on abstract algebra. But in school
mathematics, teaching fractions is of course spread over four or five years.
From the point of view of teaching school mathematics, something else should
also be brought out, namely, the fact that the passage from an integral domain to
its quotient field, while standard, is nevertheless abstract. To simplify matters a
little bit, let us illustrate with Z and Q. Let S be the subset of the set Z × Z of ordered
pairs of integers consisting of all the elements (x, y) so that y ≠ 0. Introduce
an equivalence relation ∼ in S by defining (x, y) ∼ (z, w) if and only if xw = yz
("ordered" means, by definition, (x, y) = (z, w) if and only if x = z and y = w).
Denote the equivalence class of (x, y) in S by $\frac{x}{y}$. The set of all such $\frac{x}{y}$ is what we
call Q. Identify Z with the set of all elements of the form $\frac{x}{1}$, and we have Z ⊂ Q.
It is in this sense that Q is an extension of Z. Finally, we convert Q into a ring by
defining addition and multiplication in Q as
(2.39)   $\frac{x}{y} + \frac{z}{w} = \frac{xw + yz}{yw}$ and $\frac{x}{y} \cdot \frac{z}{w} = \frac{xz}{yw}$.
Of course we routinely check the compatibility of these definitions with the equiv-
alence relation and the fact that every nonzero element of Q has a multiplicative
inverse. This is what we normally teach our math majors in three to four lectures.
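The construction just described can be imitated in a few lines of Python. This is only a sketch under our own assumptions: the class name Q and its methods are hypothetical, a pair (x, y) stands for its equivalence class, equality is tested by the relation xw = yz rather than by comparing the pairs entry by entry, and addition and multiplication follow (2.39) literally.

```python
# Sketch only: the class name Q and its methods are hypothetical illustrations.

class Q:
    """A rational number represented by an ordered pair (x, y) of integers with y != 0."""

    def __init__(self, x: int, y: int) -> None:
        if y == 0:
            raise ValueError("the second entry of the pair must be nonzero")
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        # Equivalence of pairs: (x, y) ~ (z, w) if and only if xw = yz.
        if not isinstance(other, Q):
            return NotImplemented
        return self.x * other.y == self.y * other.x

    def __add__(self, other: "Q") -> "Q":
        # (2.39): x/y + z/w = (xw + yz)/(yw); no least common denominator anywhere.
        return Q(self.x * other.y + self.y * other.x, self.y * other.y)

    def __mul__(self, other: "Q") -> "Q":
        # (2.39): (x/y)(z/w) = xz/(yw).
        return Q(self.x * other.x, self.y * other.y)

    def __repr__(self) -> str:
        return f"Q({self.x}, {self.y})"


# An integer x is identified with the class of the pair (x, 1).
print(Q(1, 2) * Q(2, 1) == Q(1, 1))   # True: 1/2 is a multiplicative inverse of 2
print(Q(1, 6) + Q(1, 4))              # Q(10, 24)
print(Q(1, 6) + Q(1, 4) == Q(5, 12))  # True, because 10 * 12 = 24 * 5
```

The sum of 1/6 and 1/4 is stored as the pair (10, 24), yet it equals (5, 12) in the sense of the equivalence relation. A check such as Q(1, 2) + Q(1, 3) == Q(2, 4) + Q(1, 3) is a single instance of the compatibility of (2.39) with the equivalence relation; of course, one example proves nothing in general.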
An obvious but relevant comment is that the above equivalence relation, to the
effect that (x, y) ∼ (z, w) if and only if xw = yz, when written in the ordinary
fractional notation, becomes
$\frac{x}{y} = \frac{z}{w}$ if and only if xw = yz.
One recognizes that this is precisely the statement that the cross-multiplication
algorithm (Theorem 1.2 on page 22) holds in Q. Furthermore, one sees in the
definition of addition in (2.39) that there is no mention of the least common de-
nominator at all. Therefore a knowledge of abstract algebra is beneficial to school
teachers at least in two respects: they gain an appreciation of the significance of the
cross-multiplication algorithm (and are therefore less likely to ban their students
from using it), and they also realize how misguided it is to insist on the use of the
least common denominator in fraction addition. Having said that, let us return to
the point about the abstraction inherent in this way of defining Q: this is definitely
not something you want to bring back to elementary or middle school. It is this
disconnect between abstract algebra and what can be taught in the school class-
room that we must keep in mind when we approach Chapter 1. We are forced to
introduce Q by using something less abstract than equivalence classes of ordered
pairs of integers, and the number line seems to be an acceptable compromise. For
example, the product formula (Theorem 1.6 on page 46) is among the most sub-
stantial theorems to be proved in these two chapters. We choose to give a proof on
the basis of a definition of fraction multiplication in terms of the number line (page
45) because the alternative is to define multiplication as in (2.39); i.e., $\frac{x}{y} \cdot \frac{z}{w} = \frac{xz}{yw}$.
But fifth and sixth graders do not take kindly to having multiplication of fractions
defined for them by a formula. Indeed, TSM has done this very thing to them for
decades, and we know only too well the result: massive nonlearning.
One can also gain some perspective on the introduction of negative numbers on
the number line in Section 2.1 (page 90). In an abstract algebra course, the negative
integers are elements of an abstract ring, Z, and are introduced before Q is defined.
Of course, negative fractions then appear as equivalence classes of ordered pairs of
integers. It is a case of abstractions piled on top of abstractions. In Chapters 1 and
2, we go from N to fractions—the nonnegative numbers in Q—before introducing
negative numbers all at once as points to the left of 0 on the number line. As in the
case of fractions, the goal is to make negative numbers as "concrete" as possible.
The fear of negative numbers in middle school is well known, and one can only
speculate that this fear is the result of presenting negative numbers as abstract
quantities in TSM and teaching their arithmetic properties such as (negative) ×
(negative) = (positive) by rote. This explains the emphasis on reasoning and proofs
for all the arithmetic computations in Z in Chapter 2. It may be mentioned that
the proof of (negative) × (negative) = (positive) given in Theorem 2.8 on page 109
is in essence what would be given in an abstract algebra course. However, because
this proof is given only after an elaborate discussion of special cases (see Section
2.4), it is hoped that the proof will finally make sense and that some version of it
can be used in the school classroom.
CHAPTER 3

The Euclidean Algorithm

This chapter is essentially a set of variations on the theme of division-with-remainder,
with the main variation being the Euclidean algorithm (see Theorem
3.2 on page 140).1 This is an algorithm that expresses the greatest common divisor
of two given whole numbers as an integral linear combination of the two numbers
themselves. As an application, we prove that there is an algorithm that reduces
any fraction to a unique fraction in lowest terms (Theorem 3.1 on page 139). This
algorithm was hinted at in connection with the cancellation law for fractions (see
Exercise 2 on page 30). Another noteworthy application of the Euclidean algorithm
is the fundamental theorem of arithmetic (page 149), which guarantees that every
whole number > 1 is equal to the product of a unique collection of primes. This
theorem not only enhances our understanding of the multiplicative structure of the
whole numbers and the relationship between fractions and decimals (see Theorem
3.8 on page 152) but also has repercussions in the study of polynomials (see Section
5.3 in [Wu2020b]).

3.1. The reduced form of a fraction


This section proves that every fraction has a unique reduced form, i.e., a frac-
tion equal to the original fraction so that its numerator and its denominator have
no common factor other than 1, and that there is a sequence of explicit steps to
get the reduced form (Theorem 3.1 on page 139). The proof of this theorem is as
important as the theorem itself because it proves the Euclidean algorithm along the
way. The proof of the latter is, in turn, most interesting because it uncovers the
mathematical potential of division-with-remainder, a mundane tool usually taught
as a rote skill in TSM.
Divisors, GCD, and the reduced form of a fraction (p. 138)
The Euclidean algorithm (p. 139)
The key lemma and the proof of Theorem 3.1 (p. 144)

1 An exception to the many misnomers in mathematics, this algorithm is actually contained in
Book VII of Euclid’s Elements ([Euclid2]). See Propositions 1 and 2 therein. Euclid, whose name
has become part of the English language (and probably every language), was a Greek mathemati-
cian who lived in Alexandria (in Egypt, then a Greek colony) around 300 BC. Essentially nothing
is known about him other than the fact that he authored the Elements, a comprehensive account
of the mathematical knowledge of his time, which probably contains his reorganization of that
knowledge together with his own original contributions. To the general public, "Euclid" is synony-
mous with plane geometry, but in fact only the first six of the thirteen books of this work are devoted
to plane geometry. Beyond the Euclidean algorithm, the fundamental theorem of arithmetic and
his (Euclid’s) famous proof of the infinitude of primes both appear in Book IX. The Elements is
the work that laid the foundation not only of mathematics but of modern science as well.

Divisors, GCD, and the reduced form of a fraction

We will begin with some basic concepts about whole numbers that will allow
us to define precisely the meaning of the reduced form of a given fraction. We start
at the beginning. A nonzero integer d is a divisor or a factor of an integer a if
a = cd for some integer c. (Thus a divisor is nonzero, by definition.) Sometimes we
also say d divides a. Another way to say d divides a is to say that the rational
number $\frac{a}{d}$ is an integer. We write d|a when this happens, and we also say a is an
(integral) multiple of d. If d does not divide a, we write d ∤ a. We also call an
expression of a as a product, a = cd, a factorization of a.
Observe that 1 divides every integer, as does −1. Also observe that (i) if $k|\ell$
and $\ell|m$, then $k|m$ and (ii) every nonzero integer divides 0. The simple proofs are
left to Exercise 1 on page 146.
In the following discussion, most of the time all the integers involved will be
whole numbers, i.e., integers which are positive or 0. However, there are one or two
places where it would be very awkward to restrict ourselves only to whole
numbers (cf. part of the proof of the Euclidean algorithm on pp. 144ff.). For this
reason, we bring in integers from the beginning. When we need to focus on whole
numbers exclusively, we will be explicit about it, e.g., the concept of a prime in the
next section.
Consider now two whole numbers a and d, where at least one of a and d is not
equal to 0. An integer is said to be a common divisor of a and d if it divides
both a and d. Note that any two such whole numbers a and d have at least two
common divisors, namely, ±1. A whole number c is said to be the GCD (greatest
common divisor) of whole numbers a and d if, among all the common divisors of
a and d, c is the greatest. Notation: GCD(a, d). Observe that GCD(a, d) is well-defined.
Suppose, let us say, a > 0. Every common divisor of a and d is a divisor of a and
therefore lies in the finite set {±1, ±2, . . . , ±a}; since 1 is a common divisor, GCD(a, d)
is simply the largest of finitely many integers, and 1 ≤ GCD(a, d) ≤ a. Two whole
numbers a and d are said to be relatively prime if GCD(a, d) = 1. For example,
GCD(125, 64) = 1.
For a later need, we will give another approach to GCD. Given a whole number
n, let D(n) denote the set of all the divisors of n. For example, D(0) is the set of
all nonzero integers and D(1) contains only the two integers ±1. Clearly,
the set of all common divisors of a and d = D(a) ∩ D(d).
Then still assuming that at least one of the whole numbers a and d is nonzero so
that D(a) ∩ D(d) is a finite set, we have
(3.1) GCD(a, d) = max{D(a) ∩ D(d)},
where max indicates the largest number in the finite set. If a > 0, then the fact
that D(0) is the set of all nonzero integers implies that
(3.2) GCD(a, 0) = a.
In this notation, we also see that a and d being relatively prime is equivalent to
D(a) ∩ D(d) = {1, −1}.
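The definitions of D(n) and of GCD via (3.1) translate directly into a brute-force computation. The following Python sketch is only an illustration (the function names are hypothetical). It computes D(n) only for n > 0, since D(0) is infinite, and falls back on (3.2) when d = 0. It becomes very slow for large numbers.

```python
# Brute-force sketch of D(n) and of (3.1)-(3.2); names are hypothetical.

def D(n: int) -> set[int]:
    """All divisors (positive and negative) of a positive integer n."""
    if n <= 0:
        raise ValueError("only n > 0 is handled; D(0) is the infinite set of nonzero integers")
    positive = {k for k in range(1, n + 1) if n % k == 0}
    return positive | {-k for k in positive}


def gcd_by_definition(a: int, d: int) -> int:
    """GCD(a, d) = max(D(a) ∩ D(d)), for a whole number a > 0 and a whole number d >= 0."""
    if d == 0:
        # D(a) ∩ D(0) = D(a), whose largest element is a; this is (3.2).
        return a
    return max(D(a) & D(d))


print(D(125) & D(64) == {1, -1})    # True: 125 and 64 are relatively prime
print(gcd_by_definition(897, 221))  # 13
print(gcd_by_definition(12, 0))     # 12, as in (3.2)
```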
A fraction $\frac{m}{n}$ is said to be a reduced form of a given fraction $\frac{k}{\ell}$ if $\frac{m}{n} = \frac{k}{\ell}$ and m and
n are relatively prime. In general, a fraction with the property that its numerator
and denominator are relatively prime is said to be in lowest terms, or reduced.
A fact taken for granted in elementary school is that any fraction has a reduced
form and that there is only one. When classroom instruction focuses entirely on
fractions with a single-digit numerator and denominator—a common practice in
TSM—the reduced form of a fraction can be obtained by visual inspection. For
fractions with larger numerators and denominators, however, deciding whether a
fraction is in reduced form is often not obvious. For example, is the fraction $\frac{1147}{899}$
reduced? (It is not. See Exercise 7 on page 146.) The purpose of this section is to
clarify this situation once and for all by proving the following theorem. The state-
ment requires that we define the term algorithm: it is an explicit finite procedure
that leads to a desired outcome.2

Theorem 3.1. Every nonzero fraction has a unique reduced form. Further-
more, this reduced form can be obtained by an algorithm.

Proof (beginning).3 We will first prove the fact that every nonzero fraction has
a reduced form, as follows. Let a fraction $\frac{k}{\ell}$ be given, k > 0. Let c, c > 0, be the
GCD of k and $\ell$, and let $k = ck'$ and $\ell = c\ell'$ for some whole numbers $k'$ and $\ell'$. We
claim
$\frac{k'}{\ell'}$ is a reduced form of $\frac{k}{\ell}$.
The equality $\frac{k}{\ell} = \frac{k'}{\ell'}$ is because of equivalent fractions (Theorem 1.1 on page 20).
We will prove the fact that $\frac{k'}{\ell'}$ is reduced by contradiction. Suppose it is not. Then
$k'$ and $\ell'$ have a common divisor $c' > 1$; let us say $k' = c'k_0$ and $\ell' = c'\ell_0$ for some
whole numbers $k_0$ and $\ell_0$. Then
$k = ck' = cc'k_0$ and $\ell = c\ell' = cc'\ell_0$.
It follows that $cc'$ is a common divisor of k and $\ell$. Since $c' > 1$, $cc' > c$, and
this contradicts the fact that c is the greatest of the common divisors of k and $\ell$.
Therefore $\frac{k'}{\ell'}$ is reduced. This then proves that every nonzero fraction has a reduced form.
Showing that a reduced form of a fraction always exists is the easy part of
proving Theorem 3.1, however. To complete the proof of the theorem, we also need
to show:
(a) The reduced form of a fraction is unique.
(b) There is an explicit finite procedure (i.e., an algorithm) that unfailingly
yields the reduced form of a fraction.
Neither is trivial, and the proofs of both need the Euclidean algorithm, which is
the subject of the next subsection.
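The existence part of the proof suggests a direct computation: divide the numerator and the denominator by their GCD. Here is a minimal Python sketch of that step, with the GCD found by searching through the common divisors straight from the definition (all helper names are hypothetical). It illustrates existence only; it says nothing about uniqueness, which is the content of (a).

```python
# Hypothetical helper names; a sketch of the existence argument only.

def gcd_by_search(a: int, b: int) -> int:
    """Largest positive common divisor of positive integers a and b, by direct search."""
    return max(c for c in range(1, min(a, b) + 1) if a % c == 0 and b % c == 0)


def reduced_form(k: int, ell: int) -> tuple[int, int]:
    """For k/ell with k, ell > 0, return (k', ell') where k = c k', ell = c ell', c = GCD(k, ell)."""
    c = gcd_by_search(k, ell)
    return k // c, ell // c


def is_reduced(m: int, n: int) -> bool:
    """A fraction m/n (m, n > 0) is reduced when GCD(m, n) = 1."""
    return gcd_by_search(m, n) == 1


print(reduced_form(84, 126))               # (2, 3), since GCD(84, 126) = 42
print(is_reduced(*reduced_form(84, 126)))  # True
```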

The Euclidean algorithm

For the statement of the Euclidean algorithm, we have to first recall the well-
known procedure of division-with-remainder: given positive integers a and d, the
division-with-remainder of a by d is given by the equation
(3.3) a = qd + r where q, r ∈ N and 0 ≤ r < d.
2 Note that an algorithm can also refer to an infinite process so that the numbers produced
by the successive steps form a sequence that converges to the desired outcome.
3 The conclusion of the proof is on page 145.
The number q is the quotient, d is the divisor, a is the dividend, and r is
the remainder.4 We also need one more piece of terminology. We say a whole
number k is an integral linear combination of whole numbers m and n if there
are integers a and b so that k = am + bn. For example, 7 is an integral linear
combination of 161 and 119 because
7 = 3 × 161 + (−4) × 119.
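Both notions are easy to experiment with. In Python, the built-in divmod returns the quotient and the remainder together, and for a positive divisor the remainder satisfies 0 ≤ r < d, as (3.3) requires. The helper name below is hypothetical; this is a sketch, not part of the text.

```python
# Sketch; divide_with_remainder is a hypothetical helper built on Python's divmod.

def divide_with_remainder(a: int, d: int) -> tuple[int, int]:
    """Return (q, r) with a = q*d + r and 0 <= r < d, as in (3.3)."""
    if a <= 0 or d <= 0:
        raise ValueError("(3.3) is stated for positive integers a and d")
    q, r = divmod(a, d)  # for d > 0, Python guarantees 0 <= r < d
    return q, r


print(divide_with_remainder(10049, 1190))  # (8, 529): 10049 = 8 * 1190 + 529
print(3 * 161 + (-4) * 119)                # 7: an integral linear combination of 161 and 119
```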

Theorem 3.2 (Euclidean algorithm). Let a and d be positive integers.
Then GCD(a, d) can be obtained by a finite number of applications of division-with-remainder.
Furthermore, GCD(a, d) is an integral linear combination of a and d.

The remainder of this subsection is devoted to the proof of the Euclidean algo-
rithm (the proof itself is given on page 143).

The overriding idea that runs through the proof of the Euclidean algorithm is
captured by the following lemma.

Lemma 3.3. Given whole numbers a, q, d, and r so that a, d ≠ 0 and a = qd + r,
the following equality holds:
(3.4) GCD(a, d) = GCD(d, r).

We should point out right away that while the equality a = qd + r in the lemma
is obviously suggested by the division-with-remainder of a by d in (3.3), there is
one small difference: the requirement of "0 ≤ r < d" in (3.3) is irrelevant for the
validity of Lemma 3.3 itself.
Because we have been emphasizing the purposefulness of mathematics (cf. page
xiii), we should ask—before we even bother with the proof of the lemma—what
purpose this lemma serves in our search for an algorithm that yields the GCD of
two whole numbers. A simple example will provide a satisfactory answer to this
question. Let us try to get the GCD of 897 and 221. The numbers 897 and 221 are
relatively big and—without resorting to any computer software—it is not so easy
to see what their common divisors are. However, Lemma 3.3 suggests that we look
at the division-with-remainder of 897 by 221:
(3.5) 897 = (4 × 221) + 13.
The (as yet unproven) lemma now guarantees that
GCD(897, 221) = GCD(221, 13).
Compared with (897, 221), the pair (221, 13) is a much smaller pair of whole num-
bers and the level of difficulty of our search has been reduced accordingly. If we
are lucky or clever, we may notice that 13 divides 221 and therefore GCD(221, 13)
= 13. The GCD of 897 and 221 is thus 13 and we are done. However, since we
are searching for an algorithm, we must produce a procedure that yields the GCD
without the intervention of luck or cleverness for its implementation.5 Once more,
4 Mathematical Aside: In abstract algebra, this is of course the division algorithm for integers,
but in school mathematics, one cannot afford to use this terminology because it causes confusion
with the long division algorithm.
5 An algorithm is an explicit finite procedure that leads to a desired outcome, period. There
is no mention of luck or cleverness in the definition of an algorithm.
Lemma 3.3 comes to the rescue: if it was helpful before, it would likely be helpful
again. Let us therefore use it on 221 and 13. The division-with-remainder of 221
by 13 now states
221 = (17 × 13) + 0.
Therefore Lemma 3.3 implies that GCD(221, 13) = GCD(13, 0), and the latter is
just 13, by (3.2) on page 138. We may therefore summarize our findings as follows:
by iterating the division-with-remainder (3.3) in the manner of Lemma 3.3, we
obtained
GCD(897, 221) = GCD(221, 13) = GCD(13, 0) = 13.
This is Lemma 3.3 at work.
Incidentally, the equality in (3.5) already expresses the GCD 13 in terms of 897
and 221, because it says 13 = 897 − (4 × 221). So by (2.23) on page 109, we get
13 = (1 × 897) + ((−4) × 221).
We have therefore verified the Euclidean algorithm completely in this special case.
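The iteration just carried out by hand is easy to automate. The following Python sketch (the name gcd_chain is ours) starts from a pair (a, d) of positive integers and lists the successive pairs (a, d), (d, r), and so on, stopping at the first pair whose second entry is 0. If all these pairs have the same GCD, which is what Lemma 3.3 asserts, then by (3.2) the first entry of the last pair is the GCD.

```python
# Sketch; gcd_chain is a hypothetical helper name.

def gcd_chain(a: int, d: int) -> list[tuple[int, int]]:
    """Pairs (a, d), (d, r), ... obtained by repeated division-with-remainder.

    Stops at the first pair whose second entry is 0.
    """
    pairs = [(a, d)]
    while d != 0:
        a, d = d, a % d  # from a = q*d + r, pass to the pair (d, r)
        pairs.append((a, d))
    return pairs


print(gcd_chain(897, 221))  # [(897, 221), (221, 13), (13, 0)]
```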
We can now approach the proof of Lemma 3.3 with the prior assurance that it
will indeed be useful for our purpose.

Proof of Lemma 3.3. We will in fact prove something slightly more general. To
this end, we need to define what it means for two sets A and B to be equal:
A = B means A ⊂ B and B ⊂ A (i.e., two sets are equal, by definition, if they have
the same collection of elements in the sense that any element that belongs to one
set also belongs to the other set). With this understood, we will prove that the
equality among whole numbers a = qd + r implies the following equality of sets:
(3.6) D(a) ∩ D(d) = D(d) ∩ D(r).
By (3.1) on page 138, (3.6) implies (3.4) and hence also Lemma 3.3.
The proof of (3.6) hinges on the following simple
Observation: Suppose A, B, C are integers and A = B + C,
and suppose an integer n divides any two of A, B, C. Then n
divides all three.
The proof is straightforward (see Exercise 2 on page 146).
To prove (3.6), let us first prove one of the requisite inclusion relationships:
D(a) ∩ D(d) ⊂ D(d) ∩ D(r).
Suppose an integer n belongs to the left side; i.e., n divides both a and d. Then
we will show that it also divides both d and r. In other words, we have to show
that if n divides both a and d, then it also divides r in a = qd + r. Indeed, n
dividing both a and d implies n divides a and qd in a = qd + r, and therefore n has
to divide the third number r, by the preceding observation. We have proved the
desired inclusion.
The proof of the reverse inclusion is entirely similar. This proves equation (3.6)
and the proof of Lemma 3.3 is complete.
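The set equality (3.6) can also be spot-checked numerically. In this hypothetical Python sketch, D(n) is computed by brute force for n > 0, so the check is restricted to r > 0 (D(0) is infinite). The second example uses a = qd + r with r > d, which is not a division-with-remainder, illustrating the earlier remark that the condition 0 ≤ r < d plays no role in Lemma 3.3. A numerical check is, of course, not a proof.

```python
# Brute-force sketch; D and same_common_divisors are hypothetical helper names.

def D(n: int) -> set[int]:
    """All divisors (positive and negative) of a positive integer n."""
    positive = {k for k in range(1, n + 1) if n % k == 0}
    return positive | {-k for k in positive}


def same_common_divisors(a: int, q: int, d: int, r: int) -> bool:
    """For positive a, d, r and whole q with a = q*d + r, test D(a) ∩ D(d) == D(d) ∩ D(r)."""
    if a != q * d + r:
        raise ValueError("the numbers must satisfy a = q*d + r")
    return D(a) & D(d) == D(d) & D(r)


print(same_common_divisors(897, 4, 221, 13))   # True: a genuine division-with-remainder
print(same_common_divisors(897, 2, 221, 455))  # True, even though r = 455 > d = 221
```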

Now that we know Lemma 3.3 is correct, let us concentrate on the business
at hand, which is to use the lemma to prove the Euclidean algorithm in general.
To this end, we will work through a more elaborate example to get a better feel
for the process: let us determine the GCD of 10049 and 1190. Thus we carry out
the divisions-with-remainder in succession, in accordance with Lemma 3.3, without
further comment:
(3.7) 10049 = (8 × 1190) + 529,
(3.8) 1190 = (2 × 529) + 132,
(3.9) 529 = (4 × 132) + 1,
(3.10) 132 = (132 × 1) + 0.
According to Lemma 3.3, we have
GCD(10049, 1190) = GCD(1190, 529) = GCD(529, 132)
(3.11) = GCD(132, 1) = GCD(1, 0) = 1.
Before proceeding any further, notice that we have implicitly used a convention in
writing GCD(a, d) of two whole numbers a and d; namely, we always let the first
number of the pair, (a, d), be the larger of the two numbers; i.e., we always assume
a > d in GCD(a, d). With this understood, we now call attention to the fact that
in (3.11), the second number in each of the five pairs of numbers gets smaller with
each step:
1190 > 529 > 132 > 1 > 0.
The fact that 1190 > 529 is because these are the divisor and remainder, respec-
tively, of the division-with-remainder in (3.7), as required by the division-with-
remainder in (3.3) on page 139. The validity of the remaining inequalities (i.e.,
529 > 132 > 1 > 0) is similar if we keep the divisions-with-remainder (3.8)–(3.10)
in mind.
Now the iteration of division-with-remainder in (3.7)–(3.10) will go on so long
as the remainder of a division-with-remainder is nonzero (because this remainder,
in accordance with Lemma 3.3, will serve as the divisor in the next division-with-
remainder). But the remainder gets successively smaller with each iteration, while
the whole numbers, N, have a smallest number, namely 0. It follows that after a
finite number of iterations of division-with-remainder, the remainder must reach 0.
In other words, the second member of the number pairs (such as those in (3.11))
must eventually decrease down to 0. Thus the last number pair will be of the form
(r, 0). Now (3.2) on page 138 implies that GCD(r, 0) = r. For example, the last
pair in (3.11) is (1, 0), and this is why we have GCD(10049, 1190) = GCD(1, 0)
= 1. This is how we obtain the GCD of 10049 and 1190 without appealing to
cleverness or luck. Incidentally, we have exhibited a nontrivial pair of relatively
prime integers: 10049 and 1190.
In general, if we start with a pair (n, 1190) for some integer n > 1190, then
in principle after at most 1190 divisions-with-remainder, the remainder will reach
0 (this is because the r in the division-with-remainder n = q × 1190 + r could be
1189). If the last pair is denoted by (r, 0), then GCD(r, 0) = r (see (3.2) on page
138) and therefore this r is the sought-after GCD of the pair n and 1190. The
fact that for the case n = 10049, the remainder reaches 0 only after 4 iterations of
division-with-remainder is a happy accident but is not entirely unexpected. Even in
the general case of a pair (a, d), it takes far fewer than d/2 divisions-with-remainder
for the remainder to reach 0. See the comment in Exercise 12 on page 146.
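To get a feeling for how small the number of steps usually is, one can simply count them. The following Python sketch (the helper name is hypothetical) counts the divisions-with-remainder performed before the remainder reaches 0, and then takes the largest count over all n with 1190 < n ≤ 2380. For n > 1190 only the first remainder r = n mod 1190 affects the later steps, and this range of n already produces every possible remainder 0, 1, . . . , 1189. We do not state the resulting maximum here; the reader is invited to run it.

```python
# Sketch; division_steps is a hypothetical helper name.

def division_steps(a: int, d: int) -> int:
    """Count the divisions-with-remainder performed until the remainder reaches 0."""
    steps = 0
    while d != 0:
        a, d = d, a % d
        steps += 1
    return steps


print(division_steps(10049, 1190))  # 4, matching (3.7)-(3.10)

# For n > 1190, only r = n mod 1190 matters after the first step, and
# n = 1191, ..., 2380 already produces every remainder 0, 1, ..., 1189.
worst = max(division_steps(n, 1190) for n in range(1191, 2381))
print(worst)
```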
We also take this opportunity to show how to express the GCD 1 of 10049 and
1190 as an integral linear combination of 10049 and 1190 (see the statement of the
Euclidean algorithm on page 140). Starting with the division-with-remainder in
which 1 appears as a remainder, i.e., starting with (3.9), we get
(3.12) 1 = 529 − (4 × 132).
By equation (3.8), 132 = 1190 − (2 × 529). Substituting this expression of 132 into
(3.12), we get, after repeated applications of the distributive law,


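$$
\begin{aligned}
1 &= 529 - 4 \times \bigl(1190 - (2 \times 529)\bigr) \\
  &= 529 - \bigl((4 \times 1190) - (8 \times 529)\bigr) \\
  &= 529 - (4 \times 1190) + (8 \times 529) \\
  &= (9 \times 529) - (4 \times 1190). \qquad (3.13)
\end{aligned}
$$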
Finally, we use (3.7) to get 529 = 10049−(8×1190). By substituting this expression
of 529 into (3.13), we obtain


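$$
\begin{aligned}
1 &= 9 \times \bigl(10049 - (8 \times 1190)\bigr) - (4 \times 1190) \\
  &= (9 \times 10049) - (72 \times 1190) - (4 \times 1190) \\
  &= (9 \times 10049) - (76 \times 1190).
\end{aligned}
$$
In other words,
$$1 = (9 \times 10049) + (-76) \times 1190.$$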
This expression of the GCD of 10049 and 1190 (which is 1) as an integral linear
combination of 10049 and 1190 is certainly not obvious.
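The back-substitution above can also be organized to run forward, alongside the divisions-with-remainder themselves. The Python sketch below uses the bookkeeping commonly called the extended Euclidean algorithm. This is a standard reorganization, not the procedure written out in the text, and the name `extended_gcd` is ours. At every stage the sketch keeps each remainder expressed as an integral linear combination x·a + y·d. When the remainder reaches 0, the previous remainder (the GCD) comes with its coefficients.

```python
def extended_gcd(a: int, d: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = GCD(a, d) and g = x*a + y*d."""
    # Invariants: old_r == old_x*a + old_y*d and r == x*a + y*d.
    old_r, r = a, d
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r   # the next division-with-remainder
        old_x, x = x, old_x - q * x   # the same step applied to the coefficients
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


g, x, y = extended_gcd(10049, 1190)
print(g, x, y)                      # 1 9 -76
print(x * 10049 + y * 1190 == g)    # True
```

The sketch recovers the same coefficients 9 and −76 found by hand above. Such a combination is not unique. For instance, adding 1190 to 9 and subtracting 10049 from −76 gives another integral linear combination equal to 1.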
We are finally ready for the proof of Theorem 3.2 on page 140.

Proof of the Euclidean algorithm. Given the whole number pair a and d,
with a > d ≥ 1, division-with-remainder yields
a = qd + r, where 0 ≤ r < d.
By Lemma 3.3, we have the equality GCD(a, d) = GCD(d, r). Therefore the deter-
mination of the GCD of a and d is replaced by the determination of the GCD of d
and r. Observe that a > d and d > r. Lemma 3.3 now suggests that we again apply
division-with-remainder to the pair d and r; then the determination of GCD(d, r)
will be replaced by the determination of the GCD of a yet-smaller pair, and so on.
After a finite number of steps, this process must terminate with a pair of whole
numbers whose second member is 0. Let us say we get the following in succession:
$$
\begin{aligned}
a &= q d + r, \\
d &= q_1 r + r_1, \\
r &= q_2 r_1 + r_2, \\
r_1 &= q_3 r_2 + r_3, \\
r_2 &= q_4 r_3 + 0.
\end{aligned}
$$
The relationship between the remainder and the divisor in (3.3) on p. 139 implies
$$d > r > r_1 > r_2 > r_3 > 0.$$
Note that the division-with-remainder can, in principle, continue for d − 1 steps
before it terminates with remainder 0, but for simplicity of writing, we have allowed
the remainder 0 to appear after 4 steps.6 Clearly there is no loss of generality in
the subsequent reasoning. That said, we conclude by Lemma 3.3 that
$$
\begin{aligned}
\operatorname{GCD}(a, d) &= \operatorname{GCD}(d, r) = \operatorname{GCD}(r, r_1) = \operatorname{GCD}(r_1, r_2) \\
&= \operatorname{GCD}(r_2, r_3) = \operatorname{GCD}(r_3, 0) = r_3.
\end{aligned}
$$
This therefore exhibits the algorithm to obtain the GCD of a and d as an iteration
of divisions-with-remainder according to Lemma 3.3.
Next, we will express the GCD, which is r3 , as an integral linear combination
of a and d as follows. First, we rewrite the preceding sequence of divisions-with-
remainder in reverse order, each time expressing the remainder as an integral linear
combination of the dividend and the divisor:
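$$
\begin{aligned}
r_3 &= r_1 + (-q_3) r_2, \\
r_2 &= r + (-q_2) r_1, \qquad (3.14) \\
r_1 &= d + (-q_1) r, \qquad (3.15) \\
r &= a + (-q) d. \qquad (3.16)
\end{aligned}
$$
Therefore, repeated substitution of each $r_i$ into the equation above it yields
$$
\begin{aligned}
r_3 &= r_1 + (-q_3) r_2 \\
&= r_1 + (-q_3)\bigl(r + (-q_2) r_1\bigr) \qquad \text{(by (3.14))} \\
&= (-q_3) r + (1 + q_2 q_3) r_1 \\
&= (-q_3) r + (1 + q_2 q_3)\bigl(d + (-q_1) r\bigr) \qquad \text{(by (3.15))} \\
&= (1 + q_2 q_3) d + (-q_1 - q_3 - q_1 q_2 q_3) r \\
&= (1 + q_2 q_3) d + (-q_1 - q_3 - q_1 q_2 q_3)\bigl(a + (-q) d\bigr) \qquad \text{(by (3.16))} \\
&= \bigl(1 + q_2 q_3 + q(q_1 + q_3 + q_1 q_2 q_3)\bigr) d + \bigl(-q_1 - q_3 - q_1 q_2 q_3\bigr) a.
\end{aligned}
$$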
The proof of the Euclidean algorithm is complete.
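The final line of the proof can be checked numerically. In the Python sketch below, the function names are ours. `quotients_and_remainders` runs the iterated divisions-with-remainder, and `coefficients_from_proof` copies the two coefficients from the last line of the proof. The pair (182, 54) is a hypothetical test case, chosen only because it needs exactly the five divisions written out in the proof. This lets q, q1, q2, q3 and r3 be matched directly.

```python
def quotients_and_remainders(a: int, d: int) -> tuple[list[int], list[int]]:
    """Iterate division-with-remainder on (a, d); collect every quotient and remainder."""
    qs, rs = [], []
    while d != 0:
        q, r = divmod(a, d)
        qs.append(q)
        rs.append(r)
        a, d = d, r
    return qs, rs


def coefficients_from_proof(q: int, q1: int, q2: int, q3: int) -> tuple[int, int]:
    """Coefficients (x, y) in r3 = x*a + y*d, copied from the last line of the proof."""
    x = -q1 - q3 - q1 * q2 * q3
    y = 1 + q2 * q3 + q * (q1 + q3 + q1 * q2 * q3)
    return x, y


a, d = 182, 54                      # chosen so that there are exactly five divisions
qs, rs = quotients_and_remainders(a, d)
print(qs, rs)                       # [3, 2, 1, 2, 3] [20, 14, 6, 2, 0]
x, y = coefficients_from_proof(*qs[:4])
r3 = rs[3]                          # the last nonzero remainder, i.e., the GCD
print(x, y, r3)                     # -8 27 2
print(x * a + y * d == r3)          # True
```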

The key lemma and the proof of Theorem 3.1

The Euclidean algorithm has the following important consequence.

Lemma 3.4 (Key lemma). Suppose ℓ, m, n are nonzero whole numbers and
ℓ | (mn). If ℓ and m are relatively prime, then ℓ | n.

The reason we call this the "key lemma" is that it is the linchpin in the proof
of the uniqueness of the reduced form of a fraction as well as in the proof of the
all-important uniqueness of the prime decomposition of a whole number in the
fundamental theorem of arithmetic (see page 149). Its proof, given below, shows
why it is important to be able to express the GCD of two numbers as an integral
linear combination of them.
Before proving the key lemma, let us make sure that we know what it says. It
is easy to see that a whole number ℓ can divide a product without dividing either
factor. Thus, 6 | (3 × 4), but 6 ∤ 3 and 6 ∤ 4; or, more elaborately, 63 | (72 × 245),
but 63 ∤ 72 and 63 ∤ 245. However, what the key lemma says is that ℓ must divide one
of the factors in the product if ℓ is relatively prime to the other factor. It goes
without saying that in the preceding example, 63 is relatively prime to neither 72
nor 245 (GCD(72, 63) = 9 and GCD(245, 63) = 7), so there is no contradiction.

6 This process actually terminates with a 0 remainder in far fewer than half of (d − 1) steps.
See the comment in Exercise 12 on page 146.

Proof of key lemma. The following brilliant proof is (so far as we can determine)
due to Euclid. We are given whole numbers ℓ, m, and n, so that ℓ | mn and ℓ and
m are relatively prime. We must prove ℓ | n. Since ℓ and m are relatively prime,
GCD(ℓ, m) = 1. By the Euclidean algorithm, 1 = αℓ + βm for some integers α and
β. Multiply this equation through by n, and we get n = αℓn + βmn. Since ℓ divides
mn by hypothesis, ℓ | (βmn); obviously, ℓ | (αℓn). Therefore ℓ divides αℓn + βmn,
which is n. In other words, ℓ divides n. The proof is complete.
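A brute-force search cannot prove the key lemma, but it is a useful sanity check on what the lemma asserts. The Python sketch below is our own illustration, and the bound 40 is arbitrary. It looks for ℓ, m, n with ℓ and m relatively prime, ℓ | mn, and ℓ ∤ n, and finds none. The last line shows the example 6 | (3 × 4), where relative primality fails.

```python
from math import gcd


def key_lemma_holds(limit: int) -> bool:
    """Check Lemma 3.4 for all 1 <= ell, m, n <= limit (a sanity check, not a proof)."""
    for ell in range(1, limit + 1):
        for m in range(1, limit + 1):
            if gcd(ell, m) != 1:
                continue            # the lemma makes no claim unless ell and m are relatively prime
            for n in range(1, limit + 1):
                if (m * n) % ell == 0 and n % ell != 0:
                    return False    # this would be a counterexample
    return True


print(key_lemma_holds(40))                        # True
# Without relative primality the conclusion can fail: 6 | (3 x 4), yet 6 divides neither factor.
print((3 * 4) % 6 == 0, 3 % 6 == 0, 4 % 6 == 0)   # True False False
```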

Activity. Given four primes p1 < p2 < p3 < p4, show that
$$\frac{p_1}{p_2} \neq \frac{p_3}{p_4}.$$
We are now in a position to complete the proof of the main theorem (Theorem
3.1) of this section announced earlier on page 139: Any fraction k/ℓ has a unique
reduced form. Moreover, there is an algorithm to get this reduced form. Recall that
we already proved the existence of a reduced form for every fraction (see page 139).

Proof of Theorem 3.1 (conclusion). Given a nonzero fraction k/ℓ, where k > 0
and ℓ > 0, let GCD(k, ℓ) = c. Thus k = ck̃ and ℓ = cℓ̃ for some nonzero whole
numbers k̃ and ℓ̃. We have seen that this fraction k̃/ℓ̃ is a reduced form of k/ℓ. What
we now know in addition is that we get this reduced form k̃/ℓ̃ via an algorithm. Let
us make explicit the algorithm that leads to k̃/ℓ̃:
Step 1. Use the Euclidean algorithm to find the GCD c of k and ℓ.
Step 2. Compute k/c = k̃ and ℓ/c = ℓ̃.
Then k̃/ℓ̃ is a reduced form of k/ℓ.

It remains to prove the uniqueness of the reduced form. Suppose k/ℓ is already
equal to a reduced fraction k̃/ℓ̃, and suppose furthermore that k/ℓ is also equal to
another reduced fraction k′/ℓ′. We must prove that k̃ = k′ and ℓ̃ = ℓ′. We have
k̃/ℓ̃ = k′/ℓ′, because both are equal to k/ℓ. By the cross-multiplication algorithm
(Theorem 1.2 on page 22), we have k̃ℓ′ = k′ℓ̃. Obviously k̃ divides the left side of
the equation, which is k̃ℓ′; therefore k̃ also divides the right side of the equation,
which is k′ℓ̃. Since k̃/ℓ̃ is reduced, k̃ and ℓ̃ are relatively prime. Therefore the
key lemma implies that k̃ | k′, so that k̃ ≤ k′. We now look at k′ℓ̃ = k̃ℓ′ from
a different angle. Since k′ | (k′ℓ̃), we have k′ | (k̃ℓ′). Since k′/ℓ′ is also reduced, k′
and ℓ′ are relatively prime. By the key lemma again, we must have k′ | k̃ and thus
k′ ≤ k̃. Together with k̃ ≤ k′, we get k̃ = k′. Using k̃ℓ′ = k′ℓ̃, we conclude
that also ℓ̃ = ℓ′, as desired.
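Steps 1 and 2 of the proof translate directly into code. In this minimal Python sketch, `gcd_euclid` and `reduced_form` are our own names. The result is compared with Python's `fractions.Fraction`, which always stores a fraction in lowest terms with a positive denominator. The example 4410/1190 is ours; its GCD is 70.

```python
from fractions import Fraction


def gcd_euclid(a: int, b: int) -> int:
    """GCD by iterated division-with-remainder (the Euclidean algorithm)."""
    while b != 0:
        a, b = b, a % b
    return a


def reduced_form(k: int, ell: int) -> tuple[int, int]:
    """Reduced form of k/ell for positive whole numbers k and ell, following Steps 1 and 2."""
    c = gcd_euclid(k, ell)        # Step 1: c = GCD(k, ell)
    return k // c, ell // c       # Step 2: divide numerator and denominator by c


print(reduced_form(4410, 1190))   # (63, 17)
f = Fraction(4410, 1190)          # the standard library also stores lowest terms
print(f.numerator, f.denominator) # 63 17
```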

Exercises 3.1.
(1) (i) If k, ℓ, m are integers and if k | ℓ and ℓ | m, then prove that k | m. (ii) Show
that every nonzero integer divides 0. (Caution: Use the precise definition
of divisibility.)
(2) Suppose A, B, C are whole numbers and A = B + C. Show that if a
whole number n divides any two of A, B, C, then n divides all three.
(3) Prove that the number 3 is a divisor of a whole number n if and only if 3
is a divisor of the number obtained by adding up all the digits of n. (Hint:
dividing a power of 10 by 3 always leaves remainder 1.)
(4) Repeat Exercise 3 with the number 3 replaced by the number 9.
(5) Prove that a whole number with 3 or more digits is divisible by 4 precisely
when the number formed by its last two digits (i.e., its tens digit and ones
digit) is divisible by 4. (Thus 93748 is divisible by 4 because 48 is divisible
by 4.)
(6) Prove that a whole number is divisible by 5 if and only if its last digit is
0 or 5.
(7) In each of the following, find the reduced form of the fraction: (a) 160/256,
(b) 273/156, (c) 144/336, (d) 1147/899.
(8) (i) Find the GCD of each of the following pairs of numbers by listing all
the divisors of each number and comparing them: 35 and 84, 54 and 117, 104 and
195. (ii) Find the GCD of each of the same pairs of numbers by using the
Euclidean algorithm.
(9) Find the GCD of each of the following pairs of numbers, and express it as
an integral linear combination of the numbers in question: 322 and 159,
357 and 272, 671 and 2196.
(10) Let the GCD of two positive integers a and d be k, and let k = ma−nd for
some whole numbers m and n. Show that m and n are relatively prime.
(11) Let m/n be the reduced form of a fraction a/b; then prove that a = γm and
b = γn, where γ is the GCD of a and b.
(12) The effectiveness of the Euclidean algorithm depends on how fast the
remainders in the sequence of iterated divisions-with-remainder get to 0.
This exercise gives an indication. Given a division-with-remainder of a by
d,
a = qd + r.
We may assume that the division-with-remainder is nontrivial in the
sense that a > d and r > 0. Then first prove (i) r < a/2.
Next, we iterate the divisions-with-remainder as in the Euclidean algorithm:
$$
\begin{aligned}
a &= qd + r, \\
d &= q_1 r + r_1, \\
r &= q_2 r_1 + r_2.
\end{aligned}
$$
Then, assuming that the divisions-with-remainder are also nontrivial,
prove (ii) r1 < d/2, (iii) r2 < r/2.
Of course, the preceding equations being divisions-with-remainder,
we already know r2 < r1 < r (see (3.3) on p. 139). In theory,
however, we could have r1 = r − 1 and r2 = r1 − 1 so that
r2 = r − 2. What this exercise says is that, in fact, r2 is less
than half of r. If we continue with the Euclidean algorithm to
get
$$
\begin{aligned}
r_1 &= q_3 r_2 + r_3, \\
r_2 &= q_4 r_3 + r_4,
\end{aligned}
$$
then the same reasoning yields r4 < r2/2, so that by (iii) above
$$r_4 < \tfrac{1}{2}\, r_2 < \tfrac{1}{2^2}\, r.$$
In general,
$$r_{2n} < \tfrac{1}{2^n}\, r.$$
Therefore it does not take long for the Euclidean algorithm to
terminate.
(13) (a) For any whole number n, prove that GCD(n, n + 1) = 1. (b) What
is GCD(n, n + 2) for a whole number n? (c) Let n be a whole number.
What could GCD(n, n + k) be for a whole number k > 2?
(14) Observe that there are many pairs of consecutive odd integers which are
both primes, e.g., 3 and 5, 5 and 7, 11 and 13, 137 and 139, 5741 and
5743 (the last pair is not obvious). Such pairs of odd integers n and n + 2
so that both are primes are called twin primes. So far as computers can
see, there are arbitrarily large twin primes. The twin prime conjecture
states that there are in fact infinitely many twin primes; this conjecture
remains unsolved. (i) The numbers 3, 5, 7 are
an example of what may be called triplet primes, i.e., three consecutive
odd integers which are primes. Prove, however, that there are no triplet
primes other than 3, 5, 7. (ii) Show that n = 3 is the only whole number n such
that n, n + 10, and n + 20 are all prime.7 (Hint: Let n = 1, 2, 3, . . ., and
see if you notice that, except when n = 3, there is always one among the three
numbers that is obviously not a prime.)

3.2. The fundamental theorem of arithmetic


The dual purpose of this section is to prove the fundamental theorem of arith-
metic (Theorem 3.6 on page 149) and to use it to draw two conclusions: the first
about fractions which are equal to finite decimals and the second about the possi-
bility that some numbers (i.e., points on the number line) are not rational, i.e., do
not lie in Q. In all these arguments, it will be seen that the crucial ingredient in
the proofs is the uniqueness of the prime decomposition of a whole number. This
puts into sharp relief TSM’s error of avoiding the very concept of uniqueness. We
conclude this chapter by reproducing Euclid’s famous proof of the infinity of primes.
Primes (p. 148)
Fundamental theorem of arithmetic (p. 149)
Fractions equal to finite decimals (p. 152)
Square roots of whole numbers (p. 153)
The infinity of primes (p. 155)

7 Thanks to Ole Hald.


Primes

The statement of the fundamental theorem of arithmetic requires the concept
of a prime. Let us define that right away. A whole number a which is greater than
1 has at least two whole number divisors, 1 and a itself. A proper divisor d of
a whole number a is a whole number divisor of a so that 1 < d < a. Note that if
a = cd for whole numbers c and d so that both c and d are > 1, then each of c and
d is a proper divisor of a.

Activity. Prove the last claim; i.e., if c and d are whole numbers, a = cd, and
both c and d are > 1, then each of c and d is a proper divisor of a.

A whole number > 1 without proper divisors is called a prime, or prime
number. A whole number which is > 1 and is not a prime is called a composite.
Note that by definition, 1 is neither prime nor composite.8 The primes less than
100 are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, 73, 79, 83, 89, 97.
Checking whether a large whole number is a prime can be very difficult in general.
However, it is easier than it appears at first sight, especially for relatively small
numbers. This is because we have the following lemma.


Lemma 3.5. Given a whole number n > 1, if no prime number p ≤ √n is a
divisor of n, then n is a prime.

For a positive number x, its positive square root √x is the
positive number so that its square is equal to x; i.e., (√x)² = x.
In Section 2.5 of [Wu2020c], we will prove that any positive real
number has a unique positive square root. Here we anticipate
this fact, but rest assured that there is no danger of circular
reasoning. Observe that if a, b are positive numbers, then √a < √b
is equivalent to a < b (Exercise 19 on page 133).

Activity. Check whether 4493 is a prime by using Lemma 3.5 and the list of
primes < 100. (You are allowed to use a four-function calculator.)

Lemma 3.5 is immediately seen to be equivalent to the following assertion about
composite numbers:

Lemma 3.5a. If n ∈ N is composite, then it has a prime divisor p ≤ √n.

We will presently prove Lemma 3.5a, but before doing that, we want to give an
intuitive explanation of why there is at least one proper divisor m of n so that
m ≤ √n. First consider the additive analog: we claim that if a positive integer n
is the sum of two positive integers ℓ and m, n = ℓ + m, then one of ℓ and m has to
be ≤ n/2. If this is false, then both ℓ and m > n/2. Therefore,
$$n = \ell + m > \tfrac{1}{2}n + \tfrac{1}{2}n = n.$$
This means n > n, which is absurd. So one of ℓ and m is ≤ n/2. When we express the
same idea using multiplication instead of addition, the conclusion will be that
a positive integer n which is a composite must have a proper divisor (we are not
saying it is a prime yet) ≤ √n. Here is the reason. Since n is a composite, n = ℓm
for two proper divisors ℓ and m. If both are > √n, then
$$n = \ell m > \sqrt{n} \cdot \sqrt{n} = n.$$
Thus n > n, and again we have a contradiction. Therefore at least one of ℓ and m
has to be ≤ √n. You can get a better feel for this argument by experimenting with
some small composite numbers, such as 72. Now √72 ≈ 8.5, so we expect to get a
proper divisor of 72 that is < √72. If by bad luck you get hold of a large divisor like
36, then its "associated divisor", i.e., the whole number k so that 72 = 36 × k, will be
smaller than √72. Indeed, in this case, k = 2. If you get hold of the divisor 24 of 72
instead, then its "associated divisor" is 3, and 3 < √72, and so on. Of course if the
number is a perfect square such as 81, and 81 = 9 × 9, then you have a case where
both divisors are exactly √81, a fact well anticipated by Lemma 3.5a. The proof of
Lemma 3.5a is nothing more than a small refinement of these experimental findings.

8 If 1 were defined to be a prime, it would mess up the statement of the fundamental theorem
of arithmetic (see page 149).

Proof of Lemma 3.5a. Suppose n is composite. We first show that n has a
proper divisor ≤ √n. If not, then every proper divisor of n exceeds √n. Since n
is composite, it has a proper divisor ℓ. If we write n = ℓm for some whole number
m, then m is itself a proper divisor of n and we also have m > √n. Therefore
(cf. Exercise 6(b) on page 131),
$$n = \ell m > \sqrt{n} \cdot \sqrt{n} = n,$$
so that n > n, a contradiction. Thus n must have a proper divisor k so that k ≤ √n.
If k happens to be a prime, we are done. If not, then among all the distinct
proper divisors of k, let p be the smallest. We will show p is a prime. If not, p has
a proper divisor q, and since q | p and p | k, we have q | k. So q is a proper divisor of
k. Since q < p, p is not the smallest proper divisor of k, which is a contradiction.
So p is a prime, and p < k ≤ √n, as claimed. The lemma is proved.
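Lemma 3.5a is what justifies stopping trial division at √n. The following Python sketch, with function names of our own, tests d = 2, 3, 4, . . . while d·d ≤ n. For simplicity it tries every d rather than only the primes from a list. This gives the same answer, because by the argument in the proof above the first divisor found is automatically a prime. Comparing d·d with n also avoids computing square roots.

```python
def smallest_divisor(n: int) -> int:
    """Smallest divisor d > 1 of a whole number n > 1; such a d is always a prime.

    Lemma 3.5a: if n is composite, it has a prime divisor p with p*p <= n.
    So once d*d > n without finding a divisor, n itself is prime.
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def is_prime(n: int) -> bool:
    return n > 1 and smallest_divisor(n) == n


print(smallest_divisor(9167))                    # 89
print([p for p in range(2, 50) if is_prime(p)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```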

Fundamental theorem of arithmetic

The following theorem shows that the primes are the fundamental multiplicative
building blocks of the whole numbers.

Theorem 3.6 (Fundamental theorem of arithmetic). Every whole number
n ≥ 2 is the product of a finite number of primes: n = p1 p2 · · · pk (the pi’s are not
necessarily distinct). Moreover, this collection of primes p1, . . . , pk, counting the
possible repetitions, is unique.

The uniqueness statement in this theorem, which is important for many reasons,
can be made more explicit, as follows: suppose n = p1 p2 · · · pk = q1 q2 · · · qℓ, where
each of the pi’s and qj’s is a prime. Then to say that the collection of primes
is unique means that k = ℓ and, after renumbering the subscripts of the q’s if
necessary, we have pi = qi for all i = 1, 2, . . . , k.
The expression of n as a product of primes, n = p1 p2 · · · pk , is called its prime
decomposition.9 Let it be noted explicitly that in the above expression, some or
all of the pi ’s could be the same; e.g., 24 = 2 × 2 × 2 × 3. The fundamental theorem
of arithmetic says that, except for the order of the primes, the prime decomposition
of each n is unique.
It should not be assumed that getting the explicit prime decomposition of a
whole number is easy. Try 9167, for instance. Even with the help of Lemma 3.5
on page 148, we still have to check all the primes ≤ 96 to see if any of them di-
vides 9167. (It turns out that 9167 has the prime decomposition 9167 = 89 × 103.)
Much of modern public-key cryptography, which makes possible the secure transmission of
confidential information on the internet, depends on the fact that it is practically
impossible to get the prime decomposition of a well-chosen whole number of, say,
2,000 digits (as of 2020). However, it is not difficult to establish, in theory, that
every whole number ≥ 2 has a prime decomposition, as we now show.10

Proof of the existence of a prime decomposition. Given n ∈ N and
n ≥ 2, if it is a prime, we are done. If not, n has a proper divisor. We claim that n
has a prime divisor. Indeed, the smallest among all its proper divisors, to be called
p1 , must be a prime. Suppose not; then p1 has a proper divisor q, and since q|p1
and p1 |n, we have q|n. So q is a proper divisor of n. But q < p1 , so p1 is not the
smallest proper divisor of n, a contradiction. Thus p1 is a prime after all. Let us
write n = p1 n1 for some whole number n1 . Apply the same argument to the whole
number n1 , and we get n1 = p2 n2 , where p2 , n2 are whole numbers and p2 is a
prime. Then we have n = p1 p2 n2 . Repeat the same argument on n2 , and after a
finite number of steps, we get a prime decomposition of n. The proof of existence
is complete.
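The existence proof is constructive: it repeatedly splits off the smallest divisor greater than 1, which must be a prime. A minimal Python sketch of that procedure follows; the name `prime_decomposition` is ours. It also uses Lemma 3.5 to stop as soon as no divisor ≤ √n remains. Like the proof, the sketch only exhibits a prime decomposition. Uniqueness is the subject of what follows.

```python
def prime_decomposition(n: int) -> list[int]:
    """Primes p1 <= p2 <= ... whose product is n, for a whole number n >= 2.

    Mirrors the existence proof: split off the smallest divisor p1 > 1 (a prime),
    write n = p1 * n1, and repeat on n1 until nothing is left.
    """
    primes = []
    d = 2
    while n > 1:
        if d * d > n:          # no divisor <= sqrt(n) remains, so n is prime (Lemma 3.5)
            primes.append(n)
            break
        if n % d == 0:
            primes.append(d)   # d is the smallest divisor > 1 of the current n
            n //= d
        else:
            d += 1
    return primes


print(prime_decomposition(24))     # [2, 2, 2, 3]
print(prime_decomposition(4410))   # [2, 3, 3, 5, 7, 7]
print(prime_decomposition(9167))   # [89, 103]
```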
It is the uniqueness that is more interesting and more difficult. To begin with,
because the concept of "uniqueness" in general is too slippery for most school stu-
dents, it is necessary to first present at least some example to show that uniqueness
cannot be taken for granted. To this end, consider the following two expressions of
4410 as a product:
4410 = 2 × 9 × 245 = 42 × 105.
These two products, 2 × 9 × 245 and 42 × 105, have different numbers of factors
and the factors are all distinct. The nonuniqueness of the expression of 4410 as
a product is striking. Of course with the exception of 2, none of the factors is a
prime. Once we require that each factor in the product be a prime, then we get
only one possibility (other than those obtained by permuting the factors):
4410 = 2 × 3 × 3 × 5 × 7 × 7.
The question is: why must uniqueness emerge as soon as we require each factor to
be a prime? Or, equivalently, what is the special property about primes that forces
the prime decomposition to be unique? The answer is certainly not obvious, and
one can infer from this question that the proof of the uniqueness in Theorem 3.6
will not be trivial.

9 In school textbooks, this is usually called the prime factorization.
10 Mathematical Aside: The difference between the explicit determination of a number and
the assertion that this number exists can be seen from an example. It is easy to write down a
definite integral whose exact value is impossible to determine, e.g., $\int_0^7 \sin(x^{3.7})\,dx$, but the fact
that this integral is equal to some number is relatively easy to prove.

Mathematical Aside: Examples of the breaking down of uniqueness in a prime
decomposition unfortunately cannot be given except in a course in abstract algebra.
For the simplest example of nonuniqueness, consider the ring (actually an integral
domain) of all complex numbers of the form a + ib√5, where a and b are integers
and i = √−1 as usual. In this ring, one can define a prime to be an element α
whose only divisors are ±1 and ±α. It is not difficult to check that 2, 3, 1 + i√5,
and 1 − i√5 are distinct primes. Now, a simple computation would reveal that
$$6 = 2 \cdot 3 = (1 + i\sqrt{5})(1 - i\sqrt{5}).$$
Thus 6 has at least two distinct factorizations into primes in this ring.
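The arithmetic in this aside can be checked mechanically. In the Python sketch below, which is our own illustration, an element a + ib√5 is stored as the integer pair (a, b). The function `norm` computes a² + 5b², the square of the element's absolute value, which is multiplicative. The elements 2, 3, and 1 ± i√5 have norms 4, 9, and 6. So any divisor of one of them other than ±1 and ± the element itself would need norm 2 or 3, and the last lines show that no element has such a norm.

```python
# An element a + b*i*sqrt(5) of the ring is stored as the integer pair (a, b).

def mul(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Product of two elements; uses (i*sqrt(5))**2 = -5."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


def norm(x: tuple[int, int]) -> int:
    """N(a + b*i*sqrt(5)) = a**2 + 5*b**2; it satisfies N(x*y) = N(x)*N(y)."""
    a, b = x
    return a * a + 5 * b * b


print(mul((2, 0), (3, 0)))      # (6, 0): 2 * 3 = 6
print(mul((1, 1), (1, -1)))     # (6, 0): (1 + i*sqrt(5))(1 - i*sqrt(5)) = 6
print([norm(z) for z in [(2, 0), (3, 0), (1, 1), (1, -1)]])   # [4, 9, 6, 6]

# a**2 + 5*b**2 <= 3 forces b = 0 and |a| <= 1, so this small search is exhaustive
# for norms 2 and 3: neither value occurs.
small_norms = {norm((a, b)) for a in range(-3, 4) for b in range(-1, 2)}
print(2 in small_norms, 3 in small_norms)   # False False
```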
The special property about primes that is responsible for the uniqueness of the
prime decomposition of whole numbers turns out to be the key lemma (Lemma 3.4)
on page 144. Let us first reformulate the key lemma into a form more suitable for
the application at hand.

Lemma 3.7. Suppose a prime p divides a product of primes, q1 q2 · · · qℓ, for
some positive integer ℓ. Then p is equal to qj for some j, 1 ≤ j ≤ ℓ.

Proof of Lemma 3.7. First let ℓ = 3, and let Q = q2 q3. By hypothesis, p | q1Q.
If p = q1, we are done. If not, then p and q1 are distinct primes and are therefore
relatively prime. By the key lemma, p | Q; i.e., p | q2 q3. Looking at p | q2 q3, if p = q2,
then we are done again. If not, then p and q2 are distinct primes and are therefore
relatively prime. By the key lemma, p | q3. Since q3 is a prime, its divisors are 1 and
q3; p cannot be 1 since it is a prime, so p = q3.
If ℓ = 1 or ℓ = 2, the same argument simply stops sooner; if ℓ > 3, then it is clear
from the preceding paragraph that the argument will be just more of the same. The
proof of Lemma 3.7 is complete.

Proof of uniqueness of the prime decomposition. Let n be a whole
number so that n = p1 p2 · · · pk = q1 q2 · · · qℓ, where the p’s and the q’s are primes.
We want to prove that k = ℓ and that, after a renumbering of the subscripts of the
q’s if necessary, we have pi = qi for all i = 1, 2, . . . , k.
Without loss of generality, we may assume k ≤ ℓ. We will prove that if k < ℓ,
there will be a contradiction. Since p1 | (p1 · · · pk), we have p1 | (q1 · · · qℓ). By Lemma
3.7, p1 = qj for some j, 1 ≤ j ≤ ℓ. By renumbering q1, q2, . . . , qℓ if necessary,
we may assume that p1 = q1. Thus we have p1 p2 · · · pk = p1 q2 · · · qℓ. Multiplying
both sides by 1/p1, we get p2 · · · pk = q2 · · · qℓ. The same reasoning now shows
that, after renumbering q2, q3, . . . , qℓ if necessary, we have p2 = q2. Therefore,
p3 p4 · · · pk = q3 q4 · · · qℓ, etc. Continuing in this manner, we see that we may assume
pi = qi for each i = 2, . . . , k. Therefore we have
p1 p2 · · · pk = (q1 q2 · · · qk) qk+1 · · · qℓ
with p1 = q1, . . . , pk = qk and k < ℓ. Multiplying both sides by 1/(p1 p2 · · · pk), we
get 1 = qk+1 · · · qℓ. Since each qj is a prime and therefore is greater than 1, this is
impossible. So k = ℓ after all. Therefore, what we have is that
n = p1 p2 · · · pk = q1 q2 · · · qk,
where p1 = q1, . . . , pk = qk. The proof of the fundamental theorem of arithmetic
is complete.

Remarks. (i) Knowing that each positive integer has a unique prime decomposition,
we can now get a more intuitive understanding of the case when two positive
integers ℓ and m are relatively prime: ℓ and m are relatively prime if and only if the
prime decompositions of ℓ and m have no primes in common (Exercise 2 on page
155). (ii) We can also take a second look at the key lemma (page 144) from the
vantage point of the fundamental theorem of arithmetic. Thus suppose a positive
integer ℓ divides a product mn of positive integers m and n, and suppose ℓ and m
are relatively prime. We can now achieve a more intuitive understanding of why
ℓ must divide n, as follows. By the preceding remark (i), the fact that ℓ and m
are relatively prime means that the prime factors of ℓ are distinct from the prime
factors of m. But if ℓ divides mn, then all the primes in the prime decomposition
of ℓ (counted with repetition) have to be among the primes in the prime decomposition
of n. This is why ℓ | n. We must caution, however, that, unlike remark (i), the intuitive
argument in this case does not constitute anything remotely like a proof of the key lemma.
This intuitive argument must remain an intuitive argument only, because it is actually
circular, in the sense that the proof of the fundamental theorem of arithmetic
(which gives every positive integer its unique prime decomposition) depends on the
key lemma, and it will not do to use the unique prime decomposition of a positive
integer to prove that the key lemma is valid.

Activity. If k/ℓ is a reduced fraction, prove that for any positive integer n, the
fraction kⁿ/ℓⁿ is also reduced.

Fractions equal to finite decimals

The fundamental theorem of arithmetic has an interesting application in fractions.
The following theorem characterizes all the fractions that are equal to finite
decimals. In the statement of the theorem, remember that 2⁰ = 5⁰ = 1, by definition.

Theorem 3.8. If the denominator of a fraction is of the form $2^a 5^b$, where a
and b are whole numbers, then the fraction is equal to a finite decimal. Conversely,
if a reduced fraction m/n is equal to a finite decimal, then the prime decomposition
of the denominator contains no primes other than 2 and 5.

Note that the second part of the theorem is clearly false if m/n is not reduced.
For example, 3/6 = 0.5, but the prime decomposition of 6 contains a 3. It is also good
to recall that a finite decimal is just a fraction whose denominator is a power of
10. It will be apparent from the proof how important it is to have such a clear-cut
definition of a decimal.

Proof. We first prove that if the prime decomposition of the denominator n contains
no primes other than 2 and 5, then m/n is equal to a finite decimal. The idea
of the proof is already contained in the reasoning on pp. 61ff. in the discussion of
decimal division, but we can quickly summarize it by way of an example. Consider
the fraction 27/160. Since $160 = 2^5 \cdot 5$, 27/160 is equal to the decimal 0.16875 because,
by equivalent fractions (Theorem 1.1 on page 20),
$$\frac{27}{160} = \frac{27}{2^5 \cdot 5} = \frac{27 \cdot 5^4}{2^5 \cdot 5 \cdot 5^4} = \frac{16875}{10^5},$$
which by definition is 0.16875. In general, if $n = 2^a 5^b$, where a, b are whole numbers,
we may assume without loss of generality that a ≤ b. Then
$$\frac{m}{n} = \frac{m}{2^a 5^b} = \frac{2^{b-a} m}{2^{b-a}\, 2^a 5^b} = \frac{2^{b-a} m}{10^b},$$
and the last is a finite decimal, by definition.
Conversely, suppose m/n is a reduced fraction which is equal to a finite decimal:
$$\frac{m}{n} = \frac{k}{10^b},$$
where k, b are whole numbers. We have to show that no prime other than 2 and 5
divides n. By the cross-multiplication algorithm, $nk = m \cdot 10^b$. Since m/n is reduced, n
is relatively prime to m. Since n divides nk, it divides $m \cdot 10^b$ as well. The key lemma
(page 144) shows that n must divide $10^b$, which is $2^b 5^b$. Therefore, $2^b 5^b = n\ell$ for
some whole number ℓ. By the uniqueness of the prime decomposition, the collection
of primes on the right is the same collection as those on the left, which consists of
only 2’s and 5’s. Therefore the primes in the prime decomposition of n, which are among
those on the right, consist of only 2’s and 5’s. Thus $n = 2^a 5^c$, where a and c are whole
numbers ≤ b. The theorem is proved.
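The proof of Theorem 3.8 is constructive in both directions, so it can be turned into a procedure. The Python sketch below uses the name `as_finite_decimal`, which is ours. It first reduces m/n, then strips the factors 2 and 5 from the denominator, and returns None if anything else remains. Otherwise it multiplies numerator and denominator up to a power of 10, exactly as in the computation for 27/160.

```python
from math import gcd
from typing import Optional


def as_finite_decimal(m: int, n: int) -> Optional[tuple[int, int]]:
    """For whole numbers m >= 0 and n >= 1, return (k, b) with m/n = k/10**b if m/n
    equals a finite decimal, and None otherwise (the criterion of Theorem 3.8)."""
    c = gcd(m, n)
    m, n = m // c, n // c            # pass to the reduced form
    a = b = 0
    while n % 2 == 0:                # count the 2's in the denominator
        n //= 2
        a += 1
    while n % 5 == 0:                # count the 5's in the denominator
        n //= 5
        b += 1
    if n != 1:
        return None                  # some prime other than 2 and 5 divides the denominator
    e = max(a, b)
    # multiply top and bottom by 2**(e - a) * 5**(e - b) to make the denominator 10**e
    return m * 2 ** (e - a) * 5 ** (e - b), e


print(as_finite_decimal(27, 160))   # (16875, 5), i.e., 0.16875
print(as_finite_decimal(3, 6))      # (5, 1), i.e., 0.5
print(as_finite_decimal(1, 3))      # None
```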

Square roots of whole numbers

At this point, we can take a look at the question of whether the rational numbers
are sufficient for doing mathematics. The following theorem implies that they are
not. For its statement, recall that a number is a point on the number line (page
5); for emphasis, a point on the number line is also called a real number. A real
number is said to be irrational if it does not lie in Q.11 In addition, a perfect
square is a whole number which is equal to the square of another whole number.
Thus the first few perfect squares are 0, 1, 4, 9, 16, 25, 36, 49, 64, . . . .

Theorem 3.9. Let n be a whole number which is not a perfect square. If there
is a positive number r so that r² = n, then r is irrational.

In Section 2.5 of [Wu2020c], we shall prove that every positive number r has a
unique positive square root √r (in the sense of page 148). Because there is no fear
of circular reasoning, we will assume this fact in the subsequent discussion. That
said, the preceding theorem implies that √2, √3, √5, √6, . . . are not rational
numbers.
For √2, there is a simple argument which we reproduce here.12 Let
√2 be rational, so that √2 = m/n for some positive integers m and n, and we
will deduce a contradiction. We may assume that at least one of m and n is odd
because, otherwise, if both m, n are even, we can use the cancellation law (1.4) on
page 20 to cancel all common factors of 2 in m and n. That said, we square both
sides of √2 = m/n to get 2 = m²/n², so that m² = 2n². Thus m² is even and therefore
m is even (the square of an odd number is odd). This implies n is odd. Now since m
is even, m = 2k for some positive integer k, and m² = 4k². Hence the equation
m² = 2n² may be written as 4k² = 2n², which is the same as 2k² = n². Now n² is
even (it is equal to 2k²), and therefore n must be even, contradicting the fact that
n is odd, so √2 cannot be rational after all.

11 Mathematical Aside: We emphasize that, by definition, an irrational number is a real
number that does not lie in Q. Thus the number i = √−1 is not irrational, in spite of the fact
that it does not lie in Q.
12 To keep the argument as simple as possible, we purposely refrain from using even Theorem
3.1 on page 139 that would guarantee that m/n is in lowest terms.
The following proof is in essence a generalization of this argument, with the
fundamental theorem of arithmetic replacing the argument using "even and odd".

Proof. Since n > 1, let the prime decomposition of n be expressed as a product


of powers of distinct primes. (For example, 72 = 23 32 , 3375 = 33 53 , etc.) Consider
the case where n is the product of powers of three distinct primes: n = pa1 pb2 pc3 ,
where a, b, c are nonzero whole numbers and p1 , p2 , p3 are distinct primes. It
will be transparent that the reasoning for this special case is perfectly general and
that, by limiting ourselves to three primes, we save ourselves from some horrendous
notation. If a, b, and c are all even, let a = 2α, b = 2β, and c = 2γ for some whole
β γ 2
numbers α, β, and γ. Then n = (pα 1 p2 p3 ) , contradicting the hypothesis that n
is not a perfect square. Therefore at least one of a, b, and c is odd; let us say
a = 2k + 1 for some whole number k. Thus n = p2k+1 1 pb2 pc3 .
Suppose there is some rational number $r$ so that $r^2 = n$. Let $r = \frac{A}{B}$, where $A$
and $B$ are relatively prime whole numbers (see Theorem 3.1 on page 139). Then
$$\frac{A^2}{B^2} = n = p_1^{2k+1} p_2^b p_3^c,$$
which implies
$$A^2 = p_1^{2k+1} (B^2 p_2^b p_3^c).$$
Since $p_1$ divides the right side, it divides the left side; i.e., $p_1$ divides $A^2$ and $A$ contains
$p_1$ in its prime decomposition. The number of $p_1$’s in the prime decomposition
of $A^2$ (the square of $A$) is therefore even. Denote the whole number $B^2 p_2^b p_3^c$ by $C$;
then the prime decomposition of $C$ cannot contain $p_1$ because $p_1$ is not in the prime
decomposition of $B$ (if it were, $A$ and $B$ would not be relatively prime). Consequently,
there is an odd number of $p_1$’s on the right, namely, exactly $2k + 1$ of them.
This contradicts the uniqueness of the prime decomposition of $A^2$ (Theorem 3.6 on
page 149). Thus there can be no such $r \in \mathbb{Q}$, and the proof is complete.
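
The parity argument in the proof can be checked numerically on small cases. The following is a minimal Python sketch; the function names are our own, and trial division is used only because it is the simplest way to factor small numbers. It computes prime exponents, confirms that a whole number n > 1 has some odd exponent exactly when it is not a perfect square, and confirms for n = 72 that n·B^2 always contains an odd number of 2's, so it can never equal some A^2. A finite check of this kind illustrates the theorem; it does not prove it.

```python
from math import isqrt

def prime_exponents(n: int) -> dict[int, int]:
    """Return {prime: exponent} for the prime decomposition of n > 1 (trial division)."""
    exps: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            exps[d] = exps.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                      # whatever is left over is itself a prime
        exps[n] = exps.get(n, 0) + 1
    return exps

def has_odd_exponent(n: int) -> bool:
    """True if some prime appears an odd number of times in n."""
    return any(e % 2 == 1 for e in prime_exponents(n).values())

def is_perfect_square(n: int) -> bool:
    return isqrt(n) ** 2 == n

print(prime_exponents(72))    # {2: 3, 3: 2}
print(prime_exponents(3375))  # {3: 3, 5: 3}

# n > 1 fails to be a perfect square exactly when some exponent is odd.
for n in range(2, 1000):
    assert has_odd_exponent(n) == (not is_perfect_square(n))

# The heart of the proof for n = 72 = 2^3 * 3^2: the number of 2's in
# n * B^2 is 3 + 2 * (number of 2's in B), always odd, so n * B^2 is
# never of the form A^2 (whose exponents are all even).
for B in range(1, 100):
    assert prime_exponents(72 * B * B)[2] % 2 == 1
    assert not is_perfect_square(72 * B * B)
```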

Pedagogical Comments. It does not seem likely, as of 2020, that a proof
of any of the following theorems will ever make its way into the K–12 classroom:
Theorem 3.1 (page 139) on the existence of a unique reduced form of a fraction,
Theorem 3.2 (page 140) on the Euclidean algorithm, Theorem 3.6 (page 149) on the
fundamental theorem of arithmetic, Theorem 3.8 (page 152) on the characterization
of fractions which are finite decimals, and finally, Theorem 3.9 (page 153) on square
roots that are not rational. This chapter therefore only serves the purpose of
enriching your mathematical culture but does not belong to your minimal survival
kit for teaching, or so it would seem.
The reality is a little different, however. Without having gone through these
proofs, can a teacher convey to students with conviction that division-with-remain-
der is not the mindless rote skill that TSM makes it out to be but is, rather, a
powerful mathematical tool that connects the reduced form of a fraction to the
3.2. THE FUNDAMENTAL THEOREM OF ARITHMETIC 155

unique prime decomposition of a positive integer? Without having gone through
the proofs of Theorem 3.8 and Theorem 3.9 with care, can a teacher learn not to
spread TSM’s misinformation that it is the existence of the prime decomposition of
a positive integer that matters but that there is no need to discuss its uniqueness?
Without having gone through the proof of Theorem 3.1, won’t teachers continue
to insist that only fractions in lowest terms will be accepted as correct answers
because—as TSM would have it—getting to the reduced form of a fraction is so
easy? How can teachers avoid this pitfall if they have never faced a fraction such
as $\frac{899}{1147}$, or even just something as simple as $\frac{143}{91}$?
A math teacher’s mathematical content knowledge therefore cannot be circum-
scribed, literally, by the topics in the school mathematics curriculum. At a time
when school mathematics education must rid itself of TSM, teachers also need to
know some of the mathematical ideas underlying the curriculum itself to better
understand why change is necessary. This chapter was designed to contribute to-
ward fulfilling this need. It is for this reason that we consider the proofs of these
theorems to be a vital part of the basic content knowledge of mathematics teachers.
End of Pedagogical Comments.
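
To see why fractions such as 899/1147 and 143/91 are not "easy" to reduce by inspection, here is a minimal Python sketch of the Euclidean algorithm (the function names are our own) that prints each division-with-remainder step and then divides out the GCD. The standard library's `fractions.Fraction`, which always stores a fraction in reduced form, serves as an independent cross-check.

```python
from fractions import Fraction

def gcd_by_euclid(a: int, b: int) -> int:
    """Euclidean algorithm, printing each division-with-remainder a = q*b + r."""
    while b != 0:
        q, r = divmod(a, b)
        print(f"{a} = {q} * {b} + {r}")
        a, b = b, r
    return a

def reduce_fraction(m: int, n: int) -> tuple[int, int]:
    """Reduce m/n (m, n positive) by dividing out the GCD of m and n."""
    g = gcd_by_euclid(max(m, n), min(m, n))
    return m // g, n // g

print(reduce_fraction(899, 1147))  # (29, 37), after 6 division steps
print(reduce_fraction(143, 91))    # (11, 7), after 4 division steps

# Cross-check against the standard library.
assert Fraction(899, 1147) == Fraction(29, 37)
assert Fraction(143, 91) == Fraction(11, 7)
```

The GCD of 899 and 1147 turns out to be 31, and that of 143 and 91 is 13; neither is visible at a glance.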

The infinity of primes

Finally, we reproduce Euclid’s proof of twenty-three centuries ago that there
are infinitely many primes. This is one of the most famous proofs in the history of
mathematics, due to its simplicity and its far-reaching implications.
The proof is by contradiction. Assuming that there are only a finite number
of primes, say $p_1, p_2, \ldots, p_k$, we will deduce a contradiction. Consider then the
whole number $N = (p_1 p_2 \cdots p_k) + 1$. By the fundamental theorem of arithmetic,
this $N$ has a prime decomposition. There are two possibilities: either $N$ is itself a
prime or it is a product of two or more primes. The first possibility is impossible
because $N$ is bigger than each of $p_1, \ldots, p_k$ and these $p_i$’s are assumed to be the
only primes among whole numbers. Next, consider the second possibility that $N$
is a product of two or more primes. Then it has to be a product of two or more of
these $p_1, \ldots, p_k$ because they are assumed to be all the primes in existence. Let
us say $N = p_1 p_2 p_3$, so, in particular, $p_1 \mid N$. But this too is impossible because,
since $N = (p_1 p_2 \cdots p_k) + 1$ and also $p_1 \mid (p_1 p_2 \cdots p_k)$, the observation on page 141
implies that $p_1 \mid 1$, a contradiction. We are therefore left with the conclusion that
the number of primes is infinite.
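
The mechanism of the proof can be seen in a small computation. In this minimal Python sketch (the function names are our own), a finite list of primes is used to form N = (p_1 p_2 ... p_k) + 1, and the smallest prime factor of N is taken; by the argument above, that prime cannot be on the list. The example also shows the "second possibility" in the proof actually occurring: N need not be prime itself.

```python
from math import prod

def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n (n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n): n itself is prime

def new_prime(primes: list[int]) -> int:
    """Given finitely many primes, return a prime not among them."""
    N = prod(primes) + 1
    p = smallest_prime_factor(N)
    # p divides N; if p were one of the given primes it would also divide
    # prod(primes), hence divide N - prod(primes) = 1, which is impossible.
    assert p not in primes
    return p

# N = 2*3*5*7*11*13 + 1 = 30031 = 59 * 509 is not itself prime, but its
# prime factor 59 is a prime missing from the list.
print(new_prime([2, 3, 5, 7, 11, 13]))  # 59
```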
Of course everybody prefers a direct proof, so why not just exhibit an infinite
number of primes and be done with it? The sad fact is that people have tried, but
no one has succeeded in producing such a sequence thus far.

Exercises 3.2.

(1) Without using the fundamental theorem of arithmetic, give a direct, self-
contained proof of why the prime decomposition of 455 (= 5 × 7 × 13) is
unique.
(2) (i) Prove that two positive integers k and m are relatively prime if and
only if the prime decompositions of k and m have no primes in common.
(ii) Given two positive integers a and b, prove that if their GCD is k, then
the two positive integers $\frac{a}{k}$ and $\frac{b}{k}$ are relatively prime.
156 3. THE EUCLIDEAN ALGORITHM

(3) Let a, b, c be positive integers. Prove that if a is relatively prime to b and
both a and b divide c, then ab also divides c.
(4) Define the least common multiple (LCM) of two nonzero whole num-
bers a and b to be the smallest whole number m so that m is a multiple
of both a and b. (a) If $a = p^2 q^7 r^3$ and $b = p^6 q s^4$, where p, q, r, s denote
distinct primes, what are the GCD and LCM of a and b in terms of p, q,
r, and s? (b) If k is the GCD of a and b and m is their LCM, prove that
mk = ab.
(5) Assuming that the positive cube root α of 2 exists (i.e., $\alpha > 0$ and $\alpha^3 = 2$)
and is unique, imitate the proof of the irrationality of the square root of
2 given on page 153 to prove that α is irrational.
(6) A whole number which is the n-th power of another whole number is called
a perfect n-th power. Prove that if a whole number k is not a perfect
n-th power, there is no positive rational number whose n-th power is equal
to k. (This exercise of course generalizes the preceding exercise.)
(7) Prove that for any nonzero whole number a and for any positive integer n,
if there is a positive rational number r so that $r^n = a$, then r is a positive
integer.
(8) (a) Is 2017 a prime? (b) Prove that 1234567 is not a prime. (c) Is 434343
a prime? If not, find its prime factorization.13

13 Thanks to Ole Hald for (b) and (c).


CHAPTER 4

Basic Isometries and Congruence

Overview of Chapters 4 and 5


As we mentioned in the preface, the goal of these three volumes (this volume,
[Wu2020b], and [Wu2020c]) is to give a systematic and grade-level appropriate
exposition of the mathematics in grades 6–12 that all high school teachers need
for teaching and that all mathematics educators need for doing research. If this
plan were executed literally as just described, the next two chapters devoted to
geometry—Chapters 4 and 5—would present geometry at a level close to what is
usually taught in middle school. But we have made an expository decision that will
deviate from this plan by going straight to high school geometry, for a reason that
we now explain.
The TSM geometry curriculum has been in disarray since approximately 1970,
if not before. Beyond the usual flaws of TSM that account for this disarray, one can
point to a new flaw that is specifically geometric: the profound disconnect between
the high school geometry course and the rest of the TSM school curriculum. This
disconnect manifests itself on multiple levels, but most strikingly in the sudden
onslaught of axioms, definitions, theorems, and proofs in the high school course in
geometry. It goes without saying that axioms, theorems, and proofs are concepts
that basically do not exist in TSM outside of this one course in K–12, and, in gen-
eral, definitions in TSM are things to memorize for tests but nothing to be taken
seriously for the purpose of learning mathematics. Yet, axioms, definitions, theo-
rems, and proofs show up on every page of a TSM geometry text and—TSM being
TSM—they are unceremoniously shoved down students’ throats without so much
as an explanation. Why only in the high school geometry text, but nowhere else?
This legitimate question seems never to get answered in TSM. On top of all that,
the typical geometry course demands not only "proofs", but two-column proofs, and
such proofs are supposed to be written down for the most trivial, boring theorems
imaginable at the beginning of the course. When a basic function of mathematics—
giving proofs—is rigidified into a formal ritual, it opens itself up for abuse. Not
surprisingly, the course gave rise to a perfect storm of massive nonlearning, and
the resulting disaster in a "well-taught" high school geometry course1 around 1983
has been well documented in [Schoenfeld1988]. There one finds that proving a geo-
metric theorem was strictly about nurturing students’ ability to regurgitate their
teacher’s rigid (and often irrational) demands on the format of a proof rather than
1 According to [Schoenfeld1988] (page 152), that course was considered to be "well-taught"
using "any of the measures typically employed in classroom research." Since the Schoenfeld article
has shown that course to be a total mathematical disaster, the fact that the classroom research
in mathematics education around 1983 considered it to be "well-taught" serves to reinforce the
point we are trying to make: TSM has done serious damage to mathematics education research
as well.

158 4. BASIC ISOMETRIES AND CONGRUENCE

about nurturing their ability to formulate and present mathematically correct ar-
guments. When students did geometric constructions with ruler and compass, the
main concern was whether their description of the steps of the construction was
correct and whether their pictures were visually accurate, but not whether they
could provide the needed reasoning to support the mathematical correctness of the
constructions. Such excesses naturally called for corrections. Unhappily, as is often
the case, the resulting corrections turned out to be not better but simply defective
in a different way. Indeed, a second kind of high school geometry course sprang
up around 1995; it gave up completely on proofs and relied solely on computer ge-
ometry software and hands-on activities to establish conviction. See, for example,
[Serra] and the discussion in [Wu2004a, pp. 533–534].
The disconnect between the TSM high school geometry course and the rest of
the TSM curriculum also shows up in the teaching of the two concepts that are
the twin pillars of the course: congruence and similarity. In TSM, these concepts
are first introduced in middle school, where students are taught that congruence
means "same size and same shape" and similarity means "same shape but not
necessarily the same size". Mathematics has to be precise in order to preclude
misinterpretations, so inherently imprecise phrases such as "same shape" and "same
size" can have no place in a mathematical definition. Often what is the "same
shape" to one person may not be the "same shape" to another. For example,
consider the following two curves: are they of the "same shape"? To those who
claim that these curves are not of the "same shape", what can you say to convince
them that they are? (The left curve is the graph of $\frac{1}{490}x^2$, while the right curve is
the graph of $x^2$, and both are drawn to the same scale.)

[Figure: the two curves, each drawn with axes labeled X and Y]

The fact is that they are similar in a precise mathematical sense and therefore must
be of the "same shape".2 However, until we can formulate a precise definition of
similarity and prove that these two curves are similar, it would be difficult, if not
impossible, to argue from an artistic or an emotional point of view that they are
the "same shape". For the past fifty years, unfortunately, middle school students
have had to put up with these kinds of ambiguous statements about congruence
and similarity as mathematical definitions.
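
The text defers the proof of this similarity to Theorem 2.11 of [Wu2020b]. As an informal illustration only, and assuming coordinates in which a dilation centered at the origin with scale factor k is the map (x, y) ↦ (kx, ky), the following Python sketch uses exact fractions to check that the dilation with scale factor 490 carries the graph of $x^2$ onto the graph of $\frac{1}{490}x^2$. The algebra behind it is that $(x, x^2)$ goes to $(490x, 490x^2)$ and $490x^2 = \frac{(490x)^2}{490}$; checking finitely many points is not a proof, but the identity is.

```python
from fractions import Fraction

K = 490  # scale factor of the dilation centered at the origin

def dilate(point, k):
    """Dilation centered at the origin with scale factor k: (x, y) -> (kx, ky)."""
    x, y = point
    return (k * x, k * y)

def on_right_curve(point) -> bool:   # graph of y = x^2
    x, y = point
    return y == x * x

def on_left_curve(point) -> bool:    # graph of y = x^2 / 490
    x, y = point
    return y == x * x / K

# Exact rational sample points, so the equality tests involve no rounding.
samples = [Fraction(n, 7) for n in range(-30, 31)]
for x in samples:
    # The dilation by 490 carries the right curve into the left curve ...
    assert on_left_curve(dilate((x, x * x), K))
    # ... and the dilation by 1/490 carries the left curve into the right one.
    assert on_right_curve(dilate((x, x * x / K), Fraction(1, K)))
print("checked", len(samples), "points in each direction")  # checked 61 points in each direction
```

Checking both directions matters: together they say that the dilation carries the right curve onto (not merely into) the left curve.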
The problem with TSM’s treatment of congruence and similarity does not stop
here, however. There is an abrupt—and inexplicable—change in the definitions of
congruence and similarity in the transition from middle school to high school. In
the TSM high school geometry course, the definition of congruence (respectively,
2 They are both graphs of quadratic functions and are therefore similar, by Theorem 2.11 in
Section 2.1 of [Wu2020b].


OVERVIEW OF CHAPTERS 4 AND 5 159

similarity) is applicable only to polygons: two polygons are congruent (respectively,
similar) if they have equal angles and equal sides (respectively, equal angles and
proportional sides). Consequently, students never learn that any two circles are
similar or that any two parabolas are similar, because TSM does not even allow
them to understand what these statements mean. A disturbing example of TSM’s
indifference to the minimal requirements of mathematics can be seen in the high
school textbook [Pearson]: on page 440, both "definitions" of similarity—"same
shape but not necessarily the same size" and the one for polygons—are listed with-
out comment. It is unthinkable that any educational establishment would tolerate
this kind of mathematical double-talk.
A third kind of disconnect in TSM is the apparent irrelevance of the content of
the high school geometry course to the mathematics in the rest of the school math-
ematics curriculum. TSM does not mention any applications of what is taught in
high school geometry (congruence and similarity of triangles and the geometry of
the circle) to other parts of the school curriculum. But at least two such applica-
tions are of fundamental importance to school mathematics. First, the concept of
congruence lies at the foundation of the very definitions of length, area, and volume
(see assumption (M2) in Section 5.1 of [Wu2016a] or Section 4.1 of [Wu2020c]).
TSM fails to recognize this obvious fact because not only does it not possess a gen-
eral concept of congruence, but it does not even attempt to give a precise definition
of length, area, and volume. A second application of high school geometry is to the
concept of the slope of a line in algebra. The concept of similar triangles underlies
the definition of slope but, again, this fact has been totally ignored in TSM. So
the truth is that the content of the high school geometry course is in fact highly
relevant to the rest of the school mathematics curriculum, but the relevance has
been obliterated by TSM.
It should be clear from this discussion that the TSM geometry curriculum is in
need of a thorough overhaul. The main issues may be summarized as follows:

• The high school geometry course cannot be the only place in the K–12
curriculum where definitions, theorems, and proofs are taken seriously. To
the extent that we are trying to teach students mathematics with mathe-
matical integrity (see page xiii for the definition) rather than some latter-
day concoction that purports to be "mathematics", we have to make the
rest of the school mathematics curriculum take definitions, theorems, and
proofs seriously too. The use of an axiomatic system in the high school
geometry course is a more complex issue and will have to be handled with
some care; see the discussion on pp. 160ff.
• Congruence and similarity are not only the bedrock of the high school
geometry course and a foundation of high school algebra, they are also a
mainstay of the whole school geometry curriculum. We must find defini-
tions for these two concepts so that they are usable in both middle school
and high school.
• The curricular decision by TSM to make students work with slope in intro-
ductory algebra before teaching them about similar triangles has caused
great harm in students’ learning of slope (see [PG], for example). We have
to provide students with the necessary mathematical knowledge about sim-
ilar triangles before the introduction of slope. This will improve students’
ability to learn about slope and strengthen the relevance of the high school
geometry course to the school mathematics curriculum.
Any attempt at improving the TSM geometry curriculum must therefore begin
by directly confronting these issues. A mathematics curriculum that puts def-
initions and proofs in the high school geometry course but nowhere else is ev-
idently not a viable curriculum as the present crisis in school mathematics ed-
ucation so eloquently testifies (see, e.g., [RAGS]). The main purpose of these
six volumes—[Wu2011a], [Wu2016a], [Wu2016b], [Wu2020b], [Wu2020c], and the
present volume—is to demonstrate that teaching school mathematics with mathe-
matical integrity (see page xiii) throughout K–12 is not only possible and desirable
but also makes school mathematics more learnable (compare the discussion of the
fundamental principles of mathematics on pp. xiii ff.). Such a large-scale effort at
revamping the K–12 mathematics curriculum would seem to be the only way to
combat the kind of disconnect described in the first bullet above.
It remains to say a few words about the use of an axiomatic system to begin
the study of plane geometry.3 Such an approach to plane geometry began with
Euclid around 300 BC (see [Euclid1]). That was an epoch-making achievement
in mathematics and science, but, unexpectedly, the axiomatic approach also be-
came the gold standard for introducing every beginner to mathematics through the
centuries. The fact that, for over two millennia, people did not realize the inappro-
priateness of the axiomatic method as a teaching tool for beginners simply boggles
the pedagogical mind. Euclid’s axioms, or variants thereof, are nothing but a pre-
cise summary of what we assume to be true when we embark on a serious study
of geometric figures in the plane (triangles, quadrilaterals, circles, etc.). While it
is essential that we have a clear-cut starting point for geometry, the mathematical
requirements that these axioms be as "simple" and "plausible" as possible and that
there be as few axioms as possible make the axiomatic method unsuitable for be-
ginners, in the following sense. These dual requirements put the starting point of a
beginner’s geometric journey at the "lowest level" possible. Consequently, it takes
a great deal of effort to prove many boring technical theorems just to bring the
discussion up to a level that is sufficiently interesting for an average student. We
already mentioned this affliction that besets the typical TSM high school geometry
course earlier on page 157. This kind of foundation-building work, while important
for mathematics per se, is perhaps best left to professional mathematicians because
not many beginners (or even mathematicians) can overcome the tedium of this kind
of highly nonintuitive, technical work.
Implicit in the TSM curriculum is the decision that the high school geometry
course must make up for TSM’s failure to prove anything elsewhere in the curricu-
lum by proving everything in this one course. Thus the main curricular justification
for an axiomatic development of geometry is to let students experience genuine
rigor by proving everything ab initio. We have already commented on the cognitive
disruption in students’ learning trajectory brought about by this sort of geometry
course, but there is much more to be said.
With the hindsight of twenty-three centuries (between us and Euclid), we now
know that such rigor is pedagogically untenable in a high school course (see [Hilbert]
and, e.g., Chapters 2 and 3 of [Greenberg] or Chapter 3 of [Hartshorne]). A main
3 If possible, you may wish to skim through Chapter 8 of [Wu2020b] right now, particularly
its preamble, to get an overall picture concerning the use of axioms in school geometry education.
reason is that the geometry of the plane is a more complex system than is usually
realized. Even the heroic efforts of some school textbooks such as [Moise-Downs]
to update the axiomatic system of Euclid fall far short of the goal of presenting a
"rigorous" treatment of plane geometry for high school. (Some obvious critical gaps
in the reasoning of [Moise-Downs] will be pointed out in the preamble to Chapter
8 of [Wu2020b].) Ultimately, what is important for beginners is not so much to
learn how to prove every theorem in a very narrow area of school mathematics but
to learn how to navigate in general from point A to point B via logical reasoning,
i.e., from hypothesis to conclusion. Again, see Chapter 8 of [Wu2020b].
We will modify the axiomatic approach by making the starting point of our
geometric studies a collection of eight assumptions, (L1)–(L8) (see pp. 165–176,
184–188, 237, and 250). We hope these assumptions are sufficiently "plausible",
but it is not part of our design to make them as "simple" as possible. In fact,
we intentionally assume quite a bit more than what the usual geometry axioms
do (compare, e.g., [Greenberg] and [Hartshorne]) in order to minimize the need to
prove intuitively obvious but technically tricky elementary theorems. In addition,
these eight assumptions overlap in subtle ways so that some redundancy among
them is built in. This redundancy helps to eliminate some (but, unfortunately, not
all) arguments that are otherwise too technical or too sophisticated for high school
students. (At certain points of the exposition, we will make explicit recommen-
dations about skipping the proofs of certain pictorially obvious facts in a school
classroom.) These expository decisions will then allow us to get to interesting theo-
rems reasonably early; e.g., Theorem G4 on page 226, which is the fourth principal
theorem in this exposition of geometry, says that opposite sides of a parallelogram
are equal (congruent). But perhaps the most unusual feature of these assumptions
is the fact that they provide a platform for defining the concept of congruence from
the beginning. This then brings us to a solution of the issue mentioned in the
second bullet on page 159 about the definitions of congruence and similarity in the
school mathematics curriculum.
We will approach congruence and similarity in a way that is different from
Euclid’s approach twenty-three centuries ago (see [Euclid1]) but more in tune with
the current understanding of these concepts. In mathematics, a congruence in the
plane is by definition a composition of the basic isometries: rotations, translations,
and reflections. This definition turns out to be suitable for use in middle school
because the essence of the basic isometries can be captured by hands-on activities.
In addition, because the concept of dilation can also be approached by way of hands-
on activities (see page 268), similarity can now be defined as the composition of a
finite number of congruences and dilations.
We envision a middle school geometry curriculum devoted to an exploration
of the basic isometries and dilations through hands-on activities together with a
judicious use of informal proofs. This new curriculum will define congruence as a
finite composition of basic isometries and will define similarity as the composition
of a finite number of dilations and congruences. The fact that the congruence or
similarity of two geometric figures (curved or not) can now be checked by hands-on
activities serves to demystify both concepts. Moreover, the hands-on activities will
foster the development of geometric intuition, an important goal of this curricular
design. The emphasis on intuition means that, in middle school, the basic isometries
will be defined informally via hands-on activities but not precisely in the sense of
mathematical definitions. Likewise, other more complex concepts such as the half-
planes of a line will not be defined precisely in middle school either. Proofs will
be presented without attending to all the subtle technical details so that they
will be as intuitive as possible. For example, students can learn to prove ASA
and SAS by hands-on activities without being held responsible for transcribing the
activities into a precise written proof (see, e.g., pp. 290–299 in [Wu2016a]). In the
same way, they will also learn an informal proof of the AA criterion for triangle
similarity (see pp. 324–326 of [Wu2016a]). In addition, the availability of a correct
definition of congruence allows the concepts of length, area, and volume to be
correctly introduced in the middle school curriculum (see Chapter 5 of [Wu2016a]).
The high school geometry course will begin by revisiting the middle school
curriculum. The basic isometries and dilation will remain the cornerstones but,
this time around, each of these concepts—rotation, reflection, translation, and
dilation—will be precisely defined, as will all the standard concepts. In other
words, the definitions of congruence and similarity will remain unchanged from
middle school, but their intuitive content will be upgraded to precise descriptions.
There will no longer be any disconnect between the middle school concepts of con-
gruence and similarity and their high school counterparts. Moreover, the high
school course will clearly enunciate the geometric assumptions (L1)–(L8) that are
needed for the logical development of plane geometry. Given these changes, it will
be seen that this second tour of the middle school geometry curriculum will be far
from routine. For example, the concept of the half-planes of a line in the plane will
have to be defined abstractly by way of an assumption (L4) on plane separation (see
page 176). For another example, the definition of translation will not be given from
the beginning, but only after some theorems have been proved (see Section 4.4 on
page 216). The high school course will also revisit all the theorems that have been
informally proved in middle school but will do so this time from the standpoint of
the clearly stated geometric assumptions and precise definitions.
It remains to point out that implicit in the preceding description of our ap-
proach to congruence and similarity is a solution to the issue involving the teaching
of slope in the third bullet on page 159. The main geometric ingredient needed for a
correct definition of slope is the AA criterion for triangle similarity (Theorem G22
on page 288), and we have just described how an informal proof of this criterion
is built into this curriculum. Students will therefore be in possession of all the
needed tools to work with a correct definition of slope and acquire an intuitive un-
derstanding of what slope is all about (see Section 6.4 on pp. 337ff. in this volume).
This then makes it possible for students to learn—correctly, perhaps for the first
time—about the slopes of the graphs of linear equations in two variables in middle
school.
It is time to return to the expository decision concerning Chapters 4 and 5 men-
tioned on page 157. If these three volumes were to give a systematic and grade-level
appropriate exposition of the mathematics in grades 6–12, then these two chapters
would be devoted to an exposition of a middle school geometry curriculum based on
the basic isometries and dilation. Later chapters in these three volumes would then
give a complete exposition of high school geometry. However, given space and time
limitations, such an exposition is not practical here. Fortunately, an account of
how middle school geometry might be taught has been given in two chapters in
one of the author’s middle school volumes: Chapters 4 and 5 of [Wu2016a]. Tak-
ing advantage of that fact, this volume will skip the presentation of middle school
geometry and go directly to high school geometry. Thus, instead of presenting
an informal introduction to geometry via hands-on activities and informal proofs,
Chapters 4 and 5 in this volume will present the initial segment of the
projected high school geometry course (which will encompass Chapters 6–8 of the
companion volume [Wu2020b]). For example, Chapter 4 will enumerate the precise
geometric assumptions from the beginning and give precise definitions for all the
standard terms. We should add that, to soften the formal character of such an
introduction to geometry, we have added as many side remarks and examples as
we can manage to make the exposition more user-friendly.
We are duty-bound to point out that, unlike the preceding three chapters which
give a presentation of school mathematics that is essentially consistent with the
usual sequencing of topics in the TSM school curriculum, the above outline of how
geometry should be taught in grades 8–11 represents a substantial departure from
the geometry and algebra curricula typical in TSM. Fortunately, the Common Core
State Standards for Mathematics ([CCSSM]), first released in 2010, have adopted
this very same departure. In particular, the presentation of geometry in Chapters
4 and 5 is now entirely compatible with the algebra and geometry standards of
[CCSSM] in high school.
We will round out the preceding discussion with two additional comments.
The first is the role of precise definitions in geometry. To accurately capture the
visual (geometric) information, we need precise language. For this reason, we ask
you to pay attention to the precision in the definitions of many familiar concepts
such as "half-planes", "angles", "convex sets", "rectangles", "polygons", etc. In an
overwhelming majority of these cases, the new definitions will take you by surprise.
To give but one example, TSM defines a rectangle as "a quadrilateral with four right
angles and two pairs of opposite sides with the same length", but in Chapter 4, a
rectangle is merely "a quadrilateral with four right angles" without any mention
of the length of the opposite sides (see page 193). An additional surprise may
be that it will take some effort to prove that rectangles do exist (Corollary 2 of
Theorem G3 on page 225) and that, indeed, they have opposite sides of the same
length (page 226). You may find it refreshing to see that the equality of the lengths
of opposite sides is now proved rather than assumed.
While precise definitions are important, we must emphasize that the goal of
geometry is not to study precise definitions per se, but to understand—in the
mathematical sense of its intuitive content and how to use it to prove theorems—
the visual information encoded in the definitions. The precision of the language is
merely a means to an end, but not the end itself. In geometry, the all-encompassing
concern is with the development of geometric intuition and the ability to reason in
geometric terms.
A second comment is that although basing high school geometry on the basic
isometries represents a serious departure from common practice, it actually comes
closer than any other approach to exposing a key feature of the Euclidean plane,
namely, its maximal homogeneity, in the sense that the plane has enough rotations,
reflections, and translations to carry any line segment to any other line segment
having the same length.4 This homogeneity—together with the parallel postulate—
is what makes the Euclidean plane the Euclidean plane. This characteristic property
is largely invisible in the other presentations of plane geometry, especially in the
usual axiomatic treatments. We hope that, by bringing the homogeneity to the
fore, this new approach will bring a renewed appreciation of school geometry.
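
To make the homogeneity statement concrete, here is a minimal sketch that models the plane by complex numbers. This is an assumption of ours for illustration only; the text does not introduce coordinates at this point, and rotations and translations are defined precisely only later. The hypothetical function `isometry_carrying` composes a translation carrying A to C with a rotation about C, and the composite carries segment AB to any segment CD of the same positive length. For a single segment, a translation and a rotation suffice.

```python
import math

def isometry_carrying(A: complex, B: complex, C: complex, D: complex):
    """Return f = (rotation about C) after (translation carrying A to C),
    with f(A) = C and f(B) = D. Requires |AB| = |CD| > 0."""
    if abs(B - A) == 0 or not math.isclose(abs(B - A), abs(D - C)):
        raise ValueError("need two segments of the same positive length")
    u = (D - C) / (B - A)  # |u| = 1, so multiplying by u is a rotation

    def f(z: complex) -> complex:
        t = z + (C - A)         # translation: carries A to C
        return C + u * (t - C)  # rotation about C through the angle of u
    return f

A, B = 0 + 0j, 3 + 4j        # a segment of length 5
C, D = 1 + 1j, 1 + 6j        # another segment of length 5
f = isometry_carrying(A, B, C, D)
print(abs(f(A) - C) < 1e-12, abs(f(B) - D) < 1e-12)  # True True

# f preserves the distance between any two points, as an isometry must.
P, Q = -2 + 7j, 5 - 1j
assert math.isclose(abs(f(P) - f(Q)), abs(P - Q))
```

Since f(z) = C + u(z - A) with |u| = 1, the point A + t(B - A) of segment AB goes to C + t(D - C), so the whole segment AB lands exactly on segment CD.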

4.1. The basic vocabulary, Part 1


We begin at the beginning by defining, one by one, all the geometric concepts5
we need and listing all the basic facts we assume to be true so that we have a clearly
stated, unambiguous starting point. We have a choice about whether or not to begin
the geometric exploration by assuming that every line can be made into a number
line.6 Our decision is to make the teaching of geometry a natural extension of what
we have been doing on the number line, so we will make this fact explicit as (L3).
Altogether, there will be eight assumptions; the first four, (L1)–(L4), will be found
in this section, the next two—(L5) and (L6)—in Section 4.2 (page 180), (L7) in
Section 4.5 (page 237), and, finally, assumption (L8) in Section 4.6 (page 250).
Assumption (L2) on page 165 of this section is the all-important parallel postulate,
the backbone of Euclidean geometry.
The parallel postulate (p. 164)
Betweenness and definition of a segment (p. 166)
Line separation (p. 172)
Plane separation (p. 175)

The parallel postulate

A true axiomatic development of geometry begins by pretending not to know
what the plane is, what a line is, or what a point is, and sets out to pin them
down by a collection of axioms. In a school classroom this can very quickly get
bizarre, not to say excruciatingly boring. We will compromise by starting on
a slightly higher level and assume you are well aware that the plane has many
points and that certain special subcollections of these points are called lines. (Note
that in this volume as well as in [Wu2020b] and [Wu2020c], we will use “line” and
“straight line” interchangeably.) For example, we will take things like the following
for granted by not mentioning them:
A line has at least two distinct points, and each line is a subset
of the plane. In the plane, there is at least one line, and given
any finite collection of lines, there is a point in the plane that
does not lie on any of these lines.
We will concentrate instead on making explicit the basic geometric facts we assume
to be known about the plane and the lines in the plane. You will see that every

4 Mathematical Aside: In technical language, this says that the Euclidean plane is a two-point homogeneous space.
5 Other than point, line, and plane.
6 Euclid did not assume any knowledge of the number line in [Euclid1].
4.1. THE BASIC VOCABULARY, PART 1 165

single one of the assumptions below, (L1)–(L8), is intuitively obvious.7 The only
reason we write them down here is to make absolutely clear which facts we will use
for the proofs of theorems.

The first assumption is:

(L1) Through two distinct points passes a unique line.

The emphasis of (L1) is on the uniqueness of the line. Let us illustrate this fact
with a simple application. We say two lines are distinct if they are not the same,
i.e., not equal as subsets of the plane in the sense of page 141. Thus, by definition,
if two lines are distinct, there is at least one point that belongs to one but not the
other. A priori, this leaves open the possibility that, given two distinct lines, one
is contained in the other but is not equal to it. We will now prove this does not
happen.

Lemma 4.1. If two lines ℓ1 and ℓ2 are distinct, then there is a point P1 on ℓ1
that does not belong to ℓ2 and there is a point P2 on ℓ2 that does not belong to ℓ1.
Proof. The proof is by contradiction. Suppose there is no such point P1 on ℓ1.
Take two distinct points Q and Q′ on ℓ1; then both points must belong to ℓ2. It
follows that the two lines ℓ1 and ℓ2 both pass through Q and Q′ and therefore, by
(L1), are necessarily the same line. This contradicts the fact that ℓ1 and ℓ2 are
distinct. So there is a point P1 on ℓ1 that does not belong to ℓ2 after all. In a
similar way, we can prove that there is a point P2 on ℓ2 that does not belong to ℓ1.
The proof of Lemma 4.1 is complete.

Two lines L1 and L2 that have no point in common are said to be parallel. In
symbols, L1 ∥ L2. The following lemma is a simple consequence of (L1):

Lemma 4.2. Two distinct lines are either parallel or have exactly one point in
common.
Recall the standard terminology: the intersection of two sets S1 and S2 is by
definition the collection of all the points in common to both S1 and S2 . In symbols,
S1 ∩ S2 . If there is no point in the intersection of S1 and S2 , we say they do not
intersect. Thus, an equivalent way of stating the lemma is that two distinct lines
either do not intersect or they intersect at exactly one point. Naturally one needs
to know when two lines intersect and when they do not. It turns out that this issue
cannot be settled except by an explicit assumption.
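To make the question concrete, here is a sketch in the familiar coordinate model of the plane, which this section does not use and which is assumed here purely for illustration: each line is written as a·x + b·y = c with (a, b) ≠ (0, 0). That model satisfies the parallel postulate, and in it two lines share exactly one point precisely when a1·b2 − a2·b1 ≠ 0 (Cramer's rule). The function name `classify_lines` is hypothetical.

```python
from fractions import Fraction

def classify_lines(l1, l2):
    """Classify two lines a*x + b*y = c, each given as a tuple (a, b, c).

    Returns ("same", None), ("parallel", None), or ("point", (x, y)).
    Exact rational arithmetic avoids floating-point round-off.
    """
    a1, b1, c1 = map(Fraction, l1)
    a2, b2, c2 = map(Fraction, l2)
    det = a1 * b2 - a2 * b1
    if det != 0:
        # Cramer's rule gives the unique common point.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        return ("point", (x, y))
    # det == 0: the pairs (a1, b1) and (a2, b2) are proportional.
    # The lines coincide exactly when the whole rows (a, b, c) are proportional.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return ("same", None)
    return ("parallel", None)
```

For two distinct lines the result is always either "parallel" or a single point, which is the dichotomy of Lemma 4.2. The determinant test is a fact about this particular model, not a substitute for (L2).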

(L2) (Parallel postulate) Given a line L and a point P not on L, then
through P passes at most one line parallel to L.

It will be seen that the parallel postulate dominates plane geometry. Without
it, many things we take for granted would be false, e.g., the Pythagorean theorem,
7 Mathematical Aside: We want to be explicit about the fact that (L1)–(L8) are, by design,
not a minimal set of axioms. For example, given that we assume every line can be made into
a number line (see (L3)) and therefore the concept of "betweenness" and the property of "line
separation" are built in, (L8) is known to follow from (L4); see [Greenberg, pp. 113 and 116].
the equality of opposite sides of a rectangle, the angle sum of a triangle being 180
degrees, the concept of similarity, etc. It is unfortunately the case that TSM8 does
not bring out the overriding importance of the parallel postulate, so we strongly
suggest that teachers emphasize it in their teaching and educators never lose sight
of this fact in their research on proofs in geometry. A deeper discussion of this
assumption can be found in Chapter 8 of [Wu2020b].
According to the parallel postulate, for a point P not on a line L, every line that
contains P intersects L except possibly for one line. This postulate does not assume
explicitly that there exists a line passing through P and parallel to L. However, we
shall see in the corollary to Theorem G1 on page 222 that the existence of such a
parallel line can in fact be proved once we know there are enough rotations in the
plane. So contrary to what is normally done in school textbooks, our formulation
of the parallel postulate merely asserts that there is no more than one parallel line.
We know intuitively that if three distinct lines L1, L2, and L3 are given so that
L1 ∥ L2 and L2 ∥ L3, then L1 ∥ L3. It is less well known that this intuitive fact has to
be proved by invoking the parallel postulate. More formally, we state:

Lemma 4.3. If three lines L1, L2, and L3 satisfy L1 ∥ L2 and L2 ∥ L3, then
either L1 = L3 or L1 ∥ L3.
Proof. Suppose L1 ≠ L3. Then we will prove L1 ∥ L3 by a contradiction argument.
(However, once we have proved the existence of a parallel line on page 222, we will
return to this lemma and give a direct proof.)
Thus suppose L1 is not parallel to L3 . Then they intersect at a point P , as
shown:

[Figure: the lines L1 and L3 meeting at the point P, with the line L2 drawn separately.]

The point P does not lie on L2 because P lies on L1 and L1 has no point in common
with L2 (L1 ∥ L2 by hypothesis). Thus through a point P not lying on L2 now
pass two distinct lines L3 and L1, both parallel to L2, contradicting the parallel
postulate. The lemma is proved.
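In the coordinate model of the plane (again an assumption made only for illustration; it is not part of this section), a line a·x + b·y = c is parallel or equal to another line exactly when their coefficient pairs (a, b) are proportional, so Lemma 4.3 turns into a statement about proportionality of nonzero pairs. The sketch below spot-checks the conclusion of the lemma on a few sample lines. The helper names and the sample list are hypothetical, and a finite check is of course not a proof.

```python
from itertools import product

def parallel_or_equal(l1, l2):
    """Lines a*x + b*y = c, given as (a, b, c), that do NOT meet in exactly one point.

    In the coordinate model this happens exactly when the coefficient
    pairs (a1, b1) and (a2, b2) are proportional.
    """
    (a1, b1, _), (a2, b2, _) = l1, l2
    return a1 * b2 - a2 * b1 == 0

def lemma_4_3_holds(lines):
    """Check over all triples: L1 || L2 and L2 || L3 imply L1 = L3 or L1 || L3."""
    for l1, l2, l3 in product(lines, repeat=3):
        if parallel_or_equal(l1, l2) and parallel_or_equal(l2, l3):
            if not parallel_or_equal(l1, l3):
                return False
    return True

# Hypothetical sample lines; each (a, b) pair is nonzero.
SAMPLE_LINES = [(1, 1, 0), (2, 2, 3), (3, 3, -1), (1, -1, 0), (0, 1, 4), (0, 2, 1)]
```

Because `parallel_or_equal` also accepts equal lines, the check is slightly stronger than the lemma's hypothesis requires.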

Betweenness and definition of a segment

If A and B are two distinct points, then by (L1), there is a unique line containing
A and B. We denote this line by LAB and call it the line joining A and B.
Naturally, we will need the concept of the line segment AB, or more simply the
segment AB, joining the two points A and B. If LAB were a number line (see page
5), we would simply define the segment AB to be the collection of all the points
"between A and B", together with the two points A and B. This then motivates
the next assumption.

8 See page xiv for the definition of TSM.



(L3) Every line can be made into a number line so that any two given points
on the line are the 0 and 1, respectively, of the number line.

Assuming (L3), we can now define the concept of a point between two distinct
points A and B (recall remark (G) on pp. 13ff. in this connection). By (L3), we
can make LAB into a number line by choosing two fixed points P and Q on the
line LAB and designating them to be 0 and 1, respectively. Then a point C is, by
definition, between A and B if C lies on LAB and, with respect to this number
line, either A < C < B or B < C < A (again, see remark (G) on pp. 13ff.). In
symbols, we write A ∗ C ∗ B.
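Once LAB carries a number-line structure, the definition of A ∗ C ∗ B is a one-line test on coordinates. In this minimal sketch, the hypothetical function `is_between` receives the coordinates a, c, b of A, C, B, which are assumed to lie on LAB.

```python
def is_between(a, c, b):
    """A * C * B for points with coordinates a, c, b on one number line.

    Assumes all three points lie on the same line L_AB and that a, b, c are
    their coordinates for some fixed choice of 0 and 1.
    """
    return a < c < b or b < c < a
```

Because the test is symmetric in a and b, it also reflects the remark made later in this section that A ∗ C ∗ B if and only if B ∗ C ∗ A.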

This definition of betweenness calls for some clarification. The definition depends
on the choice of two arbitrary points P and Q on LAB as 0 and 1, respectively.
Could it happen that, with respect to one choice of 0 and 1 on LAB, we have
A ∗ C ∗ B, but with respect to another choice of 0 and 1, it would no longer be the case
that A ∗ C ∗ B? We now explain why the answer is no. So consider one choice of
P, Q ∈ LAB as 0 and 1, respectively. Then LAB can be represented as a horizontal
line as usual. Here are the two possibilities for A ∗ C ∗ B:

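Case 1: P (at 0), Q (at 1), A, C, B, in this order from left to right.
Case 2: P (at 0), Q (at 1), B, C, A, in this order from left to right.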

The advantage of having a number line is that, at each point X, there is a definite
positive direction—which consists of all the numbers greater than X—and also a
negative direction—which consists of all the numbers less than X. In terms of
the picture of the number line as a horizontal line, the positive direction is right-
pointing and the negative direction is left-pointing. Hence we can equivalently define
A ∗ C ∗ B as follows: if one of the two points A and B is to the left of C, then the
other point is to the right of C. For example, the following situation shows that C
is not between A and B because A and B are now both to the left of C and neither
of the two points A and B is to the right of C:

P (at 0), Q (at 1), B, A, C, in this order from left to right.

Now consider what happens when two other points P′ and Q′ are chosen to be
the new 0 and 1, respectively. One possibility is that Q′ is to the right of P′, so
that the new 1 is again to the right of the new 0. Then the positive (respectively,
negative) direction of the number line does not change when P′ and Q′ replace P and Q
as 0 and 1 and, consequently, neither does the left or right direction on the horizontal
number line. It follows that the concept of order on the two number lines (see page
12) stays the same, and we have A < C < B in Case 1 and B < C < A in Case 2
as before, and therefore A ∗ C ∗ B as before:

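Case 1: P′ (at 0), Q′ (at 1), A, C, B, in this order from left to right.
Case 2: P′ (at 0), Q′ (at 1), B, C, A, in this order from left to right.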

The other possibility is that Q′ (the new 1) is to the left of P′ (the new 0); then the
positive (respectively, negative) directions of this number line become the negative
(respectively, positive) directions of the previous number line with P and Q as
0 and 1. The switching of the positive and negative directions on LAB has the
effect that the left and right directions are switched, and so are the "<" and ">"
relationships (see page 12). Hence the previous inequalities A < C < B in Case 1
and B < C < A in Case 2 now become B < C < A in Case 1 and A < C < B in
Case 2. According to the definition of ∗ on page 167, however, it is still the case
that A ∗ C ∗ B.

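Case 1: Q′ (at 1), P′ (at 0), A, C, B, in this order from left to right.
Case 2: Q′ (at 1), P′ (at 0), B, C, A, in this order from left to right.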

If we redraw these number lines in the usual way, so that 0 is to the left of 1, then
the pictures become more transparent:

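Case 1: B, C, A, P′ (at 0), Q′ (at 1), in this order from left to right.

Case 2: A, C, B, P′ (at 0), Q′ (at 1), in this order from left to right.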

In summary, we have shown that the definition of ∗ on a line L in terms of a
number-line structure imposed on L is well-defined; i.e., no matter which two points
on L are chosen as 0 and 1 to confer a number-line structure on L, the concept of
betweenness does not change.
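The two-case argument above can also be checked numerically under one assumption that the text does not spell out: if P′ and Q′ have old coordinates p and q (p ≠ q) and become the new 0 and 1, then a point with old coordinate x receives the new coordinate (x − p)/(q − p). When q > p this map preserves "<", and when q < p it reverses "<"; in either case betweenness is unchanged. All names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

def recoordinate(x, p, q):
    """New coordinate of the point with old (integer) coordinate x, when the
    points with old coordinates p and q (p != q) become the new 0 and 1."""
    return Fraction(x - p, q - p)

def is_between(a, c, b):
    return a < c < b or b < c < a

def betweenness_preserved(points, p, q):
    """Check that A * C * B holds before the change of 0 and 1 exactly when it holds after."""
    new = {x: recoordinate(x, p, q) for x in points}
    for a, c, b in product(points, repeat=3):
        if is_between(a, c, b) != is_between(new[a], new[c], new[b]):
            return False
    return True
```

With p = 2 and q = 5 the order is kept; with p = 5 and q = 2 it is reversed (for instance, the points with old coordinates 0 and 7 swap order), yet betweenness is preserved in both cases.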

Pedagogical Comments. The preceding discussion, which shows that "betweenness"
is well-defined, is necessary for mathematical purposes, because we want to
convince you, the reader, that everything in this volume rests on a firm foundation.
However, there is little pedagogical value in engaging high school students in a
discussion of something that is at once tedious and subtle. The suggestion is therefore
simply not to make a big fuss about these mathematical issues in the school
classroom and to allow the concept of "C being between A and B" to be understood
in terms of the concept of "<" on the number line. You should of course mention
explicitly that the definition of ∗ is well-defined and make yourself available for a
discussion of these subtleties with those students who are curious about such things.
Let it be known that it is perfectly legitimate, in a school classroom, to let slide
some subtleties that are not essential to the school curriculum. For example, we
have been using the real numbers without once discussing the complexities of R
that took humans more than two thousand years to unravel. Another example is
the fact that we will be using the geometric assumptions (L1)–(L8) as the foundation
for our geometric discussions without mentioning Gödel’s incompleteness
theorem ([Henriksen]), however basic Gödel’s theorem may be. There are
already very substantive issues in school mathematics to worry about without getting
bogged down in these subtleties. End of Pedagogical Comments.

Implicit in the preceding discussion that ∗ is well-defined is the fact that there
is a "symmetry" in the role of A and B in A ∗ C ∗ B; i.e., A ∗ C ∗ B if and only
if B ∗ C ∗ A. This is because A ∗ C ∗ B means either A < C < B or B < C < A
relative to a number-line structure on LAB . By this definition, B ∗ C ∗ A means
either B < C < A or A < C < B. Obviously,
A < C < B or B < C < A ⇐⇒ B < C < A or A < C < B.
Therefore, A ∗ C ∗ B ⇐⇒ B ∗ C ∗ A. In words, C is between A and B if and only
if C is between B and A. This is as it should be. The following lemma gives the
most basic property of the betweenness symbol ∗ beyond this "symmetry". For its
statement, we say a collection of points are collinear if they lie on a line.

Lemma 4.4. Given three distinct collinear points A, B, C, then exactly one of
the following three possibilities holds: A ∗ B ∗ C, B ∗ C ∗ A, or C ∗ A ∗ B. (In words,
one and only one of any three collinear points is between the other two.)

Proof. [In terms of the number line, nothing could be more obvious than this
lemma. Therefore consider skipping this proof in a school classroom.]
We will convert the line LAB into a number line by declaring A to be 0 and B
to be 1. Since C is distinct from A and B, C ≠ 0 and C ≠ 1 on this number line.
Thus there are exactly three possibilities: C < 0, 0 < C < 1, and 1 < C. If C < 0,
then C < 0 < 1, which means C ∗ A ∗ B, by the definition of the symbol ∗ on page
167. Similarly, if 0 < C < 1, then A ∗ C ∗ B, which is equivalent to B ∗ C ∗ A by
the remarks immediately preceding Lemma 4.4. Finally, if 1 < C, then 0 < 1 < C
so that A < B < C, which means A ∗ B ∗ C. In summary, we see that there are
exactly three possibilities: C ∗ A ∗ B, B ∗ C ∗ A, A ∗ B ∗ C. The fact that they
are also mutually exclusive can be seen from the fact that, on this number line,
A < B < C, B < C < A, and C < A < B are mutually exclusive on account of the
trichotomy law (page 121). The proof is complete.
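Lemma 4.4 can be spot-checked with the same coordinate test for betweenness: for any three distinct numbers, exactly one of them lies between the other two. The helper names below are hypothetical, and the check runs over all assignments of a sample triple to A, B, C rather than proving anything.

```python
from itertools import permutations

def is_between(a, c, b):
    return a < c < b or b < c < a

def middle_points(a, b, c):
    """Names of the points among A, B, C (coordinates a, b, c) lying between the other two."""
    coords = {"A": a, "B": b, "C": c}
    result = []
    for name, value in coords.items():
        x, y = [v for other, v in coords.items() if other != name]
        if is_between(x, value, y):
            result.append(name)
    return result

def exactly_one_between(values):
    """Check Lemma 4.4 for every assignment of the given distinct values to A, B, C."""
    return all(len(middle_points(*t)) == 1 for t in permutations(values))
```

For example, with A at 0 and B at 1 as in the proof, placing C at −5 makes A the middle point, placing C at 1/2 makes C the middle point, and placing C at 3 makes B the middle point.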

Finally, we can now give the definition we are after. The line segment, or
more simply the segment joining A and B, is by definition all the points between A
and B, together with A and B themselves.9 The notation for this segment is AB.
It is now clear that AB = BA. The points A and B are called the endpoints of AB.

9 Mathematical Aside: A segment on the number line is therefore what is normally called a closed bounded interval.



Note that there is no universal agreement on the notation used to denote lines,
segments, and later on, "rays". For example, some books use $\overleftrightarrow{AB}$ to denote the
line passing through A and B, $\overline{AB}$ to denote the segment between A and B, and
$\overrightarrow{AB}$ to denote the ray from A to B. One must proceed with caution in each new
situation.
Once we have segments, a natural concept to introduce is that of a polygon.
Intuitively, we do not want the following figure on the left to be called a "polygon"
because it "crosses itself", and we do not want the following figure on the right to
be a "polygon" either because it "doesn’t close up".
[Figure: on the left, a closed chain of segments that crosses itself; on the right, a chain of segments that does not close up.]
It is clear that the definition of a polygon requires some care. We first define a
special case of a polygon, the hexagon, which is by definition a geometric figure
(i.e., a subset of the plane) consisting of six points A, B, C, D, E, F in the plane,
together with the six segments
AB, BC, CD, DE, EF , and F A,
so that no two of them intersect except at the endpoints as indicated;
i.e., AB intersects BC at B, BC intersects CD at C, etc., but there are no other
intersections, and so that no two consecutive segments lie on a line; i.e., we
want to see a "corner" at each of A, B, . . . , F, e.g.,
[Figure: a hexagon with vertices A, B, C, D, E, F.]

The six points A, B, . . . , F are called the vertices of the hexagon and the six seg-
ments AB, BC, . . . , F A are its sides or edges. Notice that by its very definition,
a hexagon labels its vertices cyclically in the sense that its sides connect all of
them in alphabetical order until the very end, when the last vertex F is connected
to the first vertex A.
Now that we have defined a hexagon, we want to define a polygon of any
number of sides (or vertices, for that matter), and we are up against a problem
with notation: for six vertices, we can employ A, . . . , F , but if we have a polygon
with 234 vertices, what symbols should we employ to denote these vertices? We can
use numbers instead of letters to denote the vertices, in which case, we can go from
1, 2, . . . all the way to 234. But because integers come up in so many contexts,
sooner or later this would lead to hopeless notational confusion. We are therefore
forced into using subscripts: we can efficiently denote the 234 vertices by the 234
symbols A1 , A2 , A3 , . . . , A233 , A234 . Of course we could have used any letter, say
V , instead of A for this purpose, e.g., V1 , V2 , V3 , . . . , V233 , V234 .

With this digression into notation out of the way, we can now give the general
definitions of a polygon and related concepts.
Let n be any positive integer ≥ 3. An n-sided polygon (or more simply an
n-gon) is by definition a geometric figure (i.e., subset of the plane) consisting of n
distinct points A1 , A2 , . . . , An in the plane, together with the n segments A1 A2 ,
A2 A3 , . . . , An−1 An , An A1 , so that
(i) none of these segments intersects any other except at the
endpoints as indicated; i.e., A1 A2 intersects A2 A3 at A2 , A2 A3
intersects A3 A4 at A3 , etc., but otherwise no other intersection
is allowed, and
(ii) any three consecutive vertices are never collinear:

A1, A2, A3; A2, A3, A4; . . . ; Ai−1, Ai, Ai+1; . . . ; An−1, An, A1; and An, A1, A2.

The second condition excludes, for example, the possibility of calling the following
geometric figure containing four distinct points a 4-gon:
[Figure: four points A1, A2, A3, A4, with A2, A3, A4 on a horizontal line (A3 between A2 and A4) and A1 above that line.]
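Conditions (i) and (ii) can be tested mechanically when the vertices are given by integer coordinates, an assumption that goes beyond this section and is made only for illustration. The sketch uses the standard orientation (cross-product) test for whether two closed segments meet; all function names are hypothetical. Adjacent sides are not tested separately: once (ii) holds, two adjacent sides lie on distinct lines, so by Lemma 4.2 they can meet only at their shared vertex.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p); 0 means p, q, r are collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def on_segment(p, q, r):
    """Assuming p, q, r are collinear, is r on the closed segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_meet(p1, p2, q1, q2):
    """Do the closed segments p1p2 and q1q2 have a point in common?"""
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True                      # proper crossing
    return ((d1 == 0 and on_segment(q1, q2, p1)) or
            (d2 == 0 and on_segment(q1, q2, p2)) or
            (d3 == 0 and on_segment(p1, p2, q1)) or
            (d4 == 0 and on_segment(p1, p2, q2)))

def is_polygon(vertices):
    """Check (i) and (ii) for the vertex list A1, ..., An (integer coordinate pairs)."""
    n = len(vertices)
    if n < 3 or len(set(vertices)) != n:
        return False
    # (ii): no three consecutive vertices collinear; index i - 1 wraps to An when i = 0.
    for i in range(n):
        if orient(vertices[i - 1], vertices[i], vertices[(i + 1) % n]) == 0:
            return False
    # (i): nonadjacent sides must not meet at all.
    sides = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adjacent = j == i + 1 or (i == 0 and j == n - 1)
            if not adjacent and segments_meet(*sides[i], *sides[j]):
                return False
    return True
```

For example, a square with vertices (0, 0), (2, 0), (2, 2), (0, 2) passes both tests; listing the same four points in the order (0, 0), (2, 2), (2, 0), (0, 2) gives a figure that crosses itself and fails (i); and four points placed like those in the figure above fail (ii).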
When the number of sides is not relevant, we will simply say polygon. In symbols,
a polygon will be denoted by A1 A2 · · · An . If n = 3, the polygon is called a triangle;
if n = 4, a quadrilateral; if n = 5, a pentagon; and if n = 6, a hexagon, as
we have seen. These technical terms have somehow made their way into everyday
conversation. For example, if you want to talk about politics, you had better know
what The Pentagon roughly looks like and that it houses the Department of Defense.
In principle there is a name for every n-gon at least for n ≤ 10. Thus if n = 7, the
polygon is called a heptagon; if n = 8, it is an octagon; if n = 9, it is a nonagon,
and finally if n = 10, it is a decagon. But such extra erudition is hardly necessary
since 7-gon, 8-gon, etc., would do just fine. In these volumes, we normally use
the special names only for n = 3, 4, 5, 6.
Given a polygon A1 A2 · · · An, as in the earlier case of the hexagon, the Ai's are
called the vertices and the segments A1 A2 , A2 A3 , etc., are called the edges or
sometimes the sides. For each Ai , both Ai−1 and Ai+1 are called its adjacent
vertices (except that in the case of A1 , its adjacent vertices are An and A2 , and in
the case of An , its adjacent vertices are A1 and An−1 ). Thus the sides of a polygon
are exactly the segments joining adjacent vertices. Two sides of a polygon with a
common vertex are called adjacent sides. A line segment joining two nonadjacent
vertices is called a diagonal.
The use of subscripts makes clear (assuming n ≥ 5) that the adjacent vertices
of A2 are A1 and A3 , the adjacent vertices of A3 are A2 and A4 , etc., and that the
adjacent sides of A2 A3 are A1 A2 and A3 A4 , the adjacent sides of A3 A4 are A2 A3
and A4 A5 , etc. However, a beginner may have more trouble visualizing that the
adjacent vertices of An are A1 and An−1 or that the adjacent sides of An−1 An are
A1 An and An−2 An−1 . One way to overcome this notational quirk is to think of the
points A1, A2, . . . , An as being placed consecutively on a circle,10 for example, in
a clockwise (or counterclockwise) direction, as shown:
[Figure: the points A1, A2, A3, A4, . . . , An−1, An placed in order around a circle.]
Then it is quite clear from this arrangement whether or not two vertices or two
sides are adjacent around the vertex An of an n-gon.
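Placing the vertices around a circle amounts to computing indices "modulo n". In the sketch below (function names hypothetical), vertices are labeled 1, 2, . . . , n as in the text, n ≥ 3 is assumed, and the wrap-around from An back to A1 is handled by Python's % operator.

```python
def adjacent_vertices(i, n):
    """Labels of the two vertices adjacent to A_i in the n-gon A_1 A_2 ... A_n."""
    return ((i - 2) % n + 1, i % n + 1)

def adjacent_sides(i, n):
    """The two sides adjacent to the side A_i A_(i+1), each given as a pair of labels."""
    before, after = adjacent_vertices(i, n)      # A_(i-1) and A_(i+1)
    return ((before, i), (after, after % n + 1))

def diagonals(n):
    """All segments A_i A_j (i < j) joining nonadjacent vertices."""
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if j not in adjacent_vertices(i, n)]
```

For example, adjacent_vertices(1, 234) returns (234, 2), and adjacent_sides(233, 234) returns the sides A232A233 and A234A1, matching the description of the sides adjacent to An−1An given earlier.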
The following are examples of polygons (with the labeling of the vertices omitted):
[Figure: several examples of polygons.]

Line separation

Assumption (L3) enables us to say quite a bit more about lines in the plane.
To this end, we first introduce a definition. A subset R in a plane is called convex
if given any two points A, B in R, the segment AB lies completely in R. This
definition has the obvious advantage of being simple to use, but does it capture
the intuitive feeling of "convexity"? By doing lots of drawings, you will see that it
does. For example, the shaded figures below are not convex, as the segment AB in
each case fails to lie within the figure.

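[Figure: two shaded nonconvex regions; in each, the segment AB joining two points A and B of the region does not lie within the region.]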
Every line and the plane itself are of course convex. The convexity of the plane
is immediate from the definition of convexity, but because we are beginning to
prove geometric theorems, we should prove the convexity of a line. So given a line
L, suppose A and B are distinct points of L. We have to prove that the segment
AB lies in L. By assumption (L1) on page 165, there is a line LAB joining A and
B, and by the definition of a segment on page 169, the segment AB consists of all

10 Note that, here, we are using the concept of a "circle" in an informal way. The formal definition will be given later.


the points between A and B on LAB , together with A and B. Thus AB ⊂ LAB .
However, another part of (L1) asserts that there is only one line joining A and B;
therefore necessarily L = LAB . Thus AB ⊂ L after all.
Many common figures, such as the inside of a triangle or a rectangle or a circle,
once "inside" has been properly defined (see p. 196 and p. 186), will also be seen to
be convex. It is also a simple exercise to show that the intersection of two convex
sets is convex (Exercise 7 on p. 180). Taking intersections of convex sets will turn
out to be a very productive way of generating new convex sets (see, e.g., pp. 181ff.).
If we have a number line L₀, then "the positive half-line" L₀⁺ consisting of all
the positive numbers is convex: indeed, if a and b are in L₀⁺ and a < b, then the
segment joining a to b is the interval [a, b] consisting of all the numbers x satisfying
a ≤ x ≤ b. Since a is positive, x has to be positive, and therefore every point in
this segment also lies in L₀⁺. For analogous reasons, "the negative half-line" L₀⁻
consisting of all the negative numbers is convex. Observe also that the number
line L₀ is now broken up into three parts: L₀⁺, L₀⁻, and the set {0} consisting of
the point 0 alone. These three parts have the properties that (1) no two of the
parts have any point in common and (2) the union of L₀⁺, L₀⁻, and {0} is the whole
number line L₀. Furthermore, the line segment joining a negative number A to a
positive number B must contain 0; i.e., 0 ∈ [A, B] if A < 0 and B > 0, as shown:
[Figure: the number line L₀ with A to the left of 0 and B to the right of 0.]
By virtue of assumption (L3), we can transfer what we have just said about the
number line to any line in the plane. Let us first set up some common terminology.
A set is said to be nonempty if it contains at least one point. A collection of subsets
in the plane is said to be disjoint if no two of them have a point in common. For
example, the three sets L₀⁺, L₀⁻, and {0} on the number line L₀ above are disjoint.
Since these three sets are disjoint and their union is the whole line,
it is common to say in this situation that the line is the disjoint union of L₀⁺, L₀⁻,
and {0}.
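The decomposition of L₀ into L₀⁺, L₀⁻, and {0}, and the convexity of the two half-lines, can be illustrated with exact rational arithmetic. This is only a sampling illustration (finitely many points on each segment), not a proof; the function names and the number of sample points are arbitrary choices.

```python
from fractions import Fraction

def part_of(x):
    """Which of the three disjoint parts L0+, L0-, {0} of the number line contains x."""
    if x > 0:
        return "L0+"
    if x < 0:
        return "L0-"
    return "{0}"

def sample_segment(a, b, steps=12):
    """Exact sample points (1 - t)*a + t*b, t = k/steps, on the segment joining a and b."""
    return [(1 - Fraction(k, steps)) * a + Fraction(k, steps) * b
            for k in range(steps + 1)]
```

Every sample point of a segment joining two positive numbers lands in L₀⁺, and a segment from a negative to a positive number passes through 0; for the endpoints −3 and 9, the sample at t = 1/4 happens to be exactly 0, although it is the argument in the text, not the sampling, that guarantees 0 ∈ [A, B].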

Lemma 4.5 (Line separation). A point O on a line L separates L into two
nonempty subsets, L+ and L−, called the half-lines of O, and L+ and L− satisfy
the following two properties:
(i) The line L is the disjoint union of L+ , L− , and {O} (the set
containing O alone), and the half-lines L+ and L− are convex.
(ii) If two points A and B on L belong to different half-lines,
then the line segment AB contains O.

[Figure: points A and B on L, on opposite sides of O.]

Proof. By assumption (L3) (page 167), we can make L into a number line
with O being the 0 of the number line. Then letting L+, L−, and {O} be the sets
L₀⁺, L₀⁻, and {0} with respect to this number-line structure on L, we see by the
reasoning preceding the lemma that everything claimed in this lemma is true. The
proof is complete.

When A and B belong to the same half-line, we sometimes say that A and B
are on the same side of O. If A and B belong to different half-lines, then we say
they are on opposite sides of O or on different sides of O.
Lemma 4.5 enables us to determine, given a point O on a line L, whether two
points A and B are on the same side or on opposite sides of O. Precisely:
(a) Two points A and B belong to the same side of O ⇐⇒ the
segment AB does not contain O.
(b) Two points A and B belong to opposite sides of O ⇐⇒ the
segment AB contains O.
The proof is simplicity itself. To prove (a), for example, first assume A and B
belong to the same side of O; let us say A, B ∈ L+ .
[Figure: the line L with the point O, and the points A and B in L+; L− lies on the other side of O.]

By (i) of the lemma, L+ is convex, so AB lies in L+. By (i) of the lemma again, L+
is disjoint from {O}, and therefore AB does not contain O. Conversely, suppose AB
does not contain O; we will prove that A and B lie on the same side of O. Suppose
not; then A and B lie on opposite sides of O. Then (ii) of the lemma implies that
AB contains O, a contradiction. Hence A and B lie on the same side of O. The
proof of (b) is similar (see Exercise 5 on page 180).
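With O placed at coordinate o on a number line, (a) says that the sign test (a − o)(b − o) > 0 and the segment test "O ∉ [A, B]" always agree for points A and B different from O; (b) is the same statement negated. The sketch below checks this agreement over all pairs from a sample list; the names are hypothetical.

```python
from itertools import product

def same_side(a, b, o=0):
    """A and B (coordinates a, b, both different from o) lie on the same side of O."""
    return (a - o) * (b - o) > 0

def segment_contains(a, b, x):
    """The segment AB is the closed interval between a and b."""
    return min(a, b) <= x <= max(a, b)

def criterion_holds(values, o=0):
    """Check (a), and hence (b): same side of O <=> segment AB does not contain O."""
    pts = [v for v in values if v != o]
    return all(same_side(a, b, o) == (not segment_contains(a, b, o))
               for a, b in product(pts, repeat=2))
```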

You may wonder why we bother with Lemma 4.5 since all it does is to restate the
obvious fact that the number line is the disjoint union of the positive and negative
half-lines and {0}. There are two reasons. One is that Lemma 4.5 provides a direct
geometric description of the separation of any line in the plane by a point lying in it,
independent of any identification with a number line. Since we will be concentrating
on doing geometry, there will be many occasions when we want to be free from the
distractions of numbers. A second reason is that Lemma 4.5 sets up a model for
the next assumption, (L4) on page 176.
With notation as in Lemma 4.5, the set consisting of the point O and one of the
half-lines determined by O is called a ray. Thus O determines two rays. We also
say these are rays issuing from O. If we want to specifically refer to the ray con-
taining A, we use the symbol ROA . We will also refer to ROA as the ray from O
to A. The point O is the vertex of the ray ROA . If O is between A and B, then
the two rays ROA and ROB have only the vertex O in common and are sometimes
referred to as opposite rays. Each ray is, intuitively, infinite in only one direction.

The following lemma will be needed for the definition of a translation (pp.
231ff.). It also nicely illustrates why the concept of betweenness (page 167) is
useful.

Lemma 4.6. Let O, A, and B be three distinct points on a line L. Then:
(i) B lies on the opposite ray of ROA ⇐⇒ A ∗ O ∗ B.
(ii) B lies on the ray ROA ⇐⇒ either O ∗ A ∗ B or O ∗ B ∗ A.
(iii) The ray RAB is contained in the ray ROA ⇐⇒ O ∗ A ∗ B.

[Figure: the points O, A, B in this order on a line.]
Proof. [The lemma is pictorially obvious. Consider giving pictorial illustrations of
the lemma and skipping this proof in a school classroom.]
Instead of arguing abstractly using the properties of the betweenness ∗ concept,
we simply invoke assumption (L3) to make L into a number line at the outset, so
that O is 0 and A is 1, and phrase the proof of the lemma using <. Then the ray
ROA is the collection of all the nonnegative numbers on L.
To prove (i), first assume B lies on the opposite ray of ROA . Then B ≤ 0.
Since B ≠ O by hypothesis, B < O. Thus we have B < O < A. By the definition
of "between" on page 167, this means B ∗ O ∗ A. Conversely, suppose B ∗ O ∗ A.
Then with respect to the number line L, this means B < O < A by the definition of
∗ on page 167. Since O is 0, B is negative. But ROA consists of all the nonnegative
numbers, so B does not lie on ROA . Thus B lies on the opposite ray of ROA , by
Lemma 4.5(i). This proves part (i). The proof of (ii) is quite similar and we will
leave that as an exercise (Exercise 8 on page 180).
To prove (iii), suppose RAB ⊂ ROA . Then B lies on ROA and part (ii) implies
that either O < A < B or O < B < A. If O < B < A, then RAB consists of all the
numbers P so that P ≤ A; i.e., P ≤ 1. Thus RAB contains negative numbers, contradicting
RAB ⊂ ROA because ROA is the ray of nonnegative numbers. Therefore
O < A < B; i.e., O ∗ A ∗ B. Conversely, suppose O ∗ A ∗ B. Then RAB consists of
numbers ≥ 1, which are therefore positive. So RAB ⊂ ROA as the latter consists of
nonnegative numbers. The proof of Lemma 4.6 is complete.
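The proof's normalization, with O at 0 and A at 1, makes all three parts of Lemma 4.6 easy to test on sample values of B. In this sketch (names hypothetical), ROA is the set of nonnegative numbers, and `ray_AB_inside_ray_OA` encodes the observation from the proof that RAB consists of the numbers ≥ 1 when B > 1 and of the numbers ≤ 1 when B < 1.

```python
from fractions import Fraction

O, A = 0, 1   # the proof's normalization: O is 0 and A is 1

def is_between(a, c, b):
    return a < c < b or b < c < a

def on_ray_OA(x):
    return x >= O              # R_OA: the nonnegative numbers

def on_opposite_ray(x):
    return x <= O              # the opposite ray: the nonpositive numbers

def ray_AB_inside_ray_OA(b):
    # R_AB is {x >= 1} when b > 1 and {x <= 1} when b < 1;
    # only the first stays within the nonnegative numbers.
    return b > A

def check_lemma_4_6(samples):
    for b in samples:
        if b in (O, A):
            continue           # the lemma assumes O, A, B are distinct
        assert on_opposite_ray(b) == is_between(A, O, b)                      # (i)
        assert on_ray_OA(b) == (is_between(O, A, b) or is_between(O, b, A))  # (ii)
        assert ray_AB_inside_ray_OA(b) == is_between(O, A, b)                # (iii)
    return True

SAMPLES = [Fraction(k, 4) for k in range(-12, 13)]
```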

Plane separation

From lines we next turn to the whole plane. Two rays are defined to be distinct
if there is a point in one that does not lie in the other. Given two distinct rays ROA
and ROB with a common vertex O, the angle ∠AOB is intuitively the shaded region
Γ of the plane "between" these two rays, together with the two rays themselves, as
shown:
[Figure: the rays ROA and ROB issuing from O, with the region Γ between them shaded.]
How to describe this shaded region Γ precisely is the next order of business and
we will get to that in the next section. What we do in this subsection is lay the
groundwork for such a description. To this end, we will need the planar analog
of Lemma 4.5 for a line (page 173). However, since we no longer have the planar
analog of (L3) (page 167)—which was used to justify Lemma 4.5—we will have to
take the drastic step of asserting the truth of the planar analog of Lemma 4.5 in a
new assumption. First, we say in general that the plane is the disjoint
union of three sets U, V, and W if these three sets are disjoint and if the union of
these three sets is the whole plane (compare the definition of disjoint union for a
line on page 173).

(L4) (Plane separation) A line L separates the plane into two nonempty
subsets, H+ and H− , called the half-planes of L. The half-planes H+ and H−
satisfy the following two properties:
(i) The plane is the disjoint union of H+ , H− , and L, and the
half-planes H+ and H− are convex.


[Figure: the line L, with the half-plane H− on one side and the half-plane H+ on the other.]
(ii) If two points A and B in the plane belong to different half-
planes, then the line segment AB must intersect the line L.



[Figure: a point A in H− and a point B in H+; the segment AB meets L.]


As with line separation, H+ and H− are said to be opposite half-planes.


Two points that lie in the same half-plane of L are said to be on the same side
of L, and two points that lie in opposite half-planes are said to be on opposite
sides of L, or sometimes on different sides of L. As in the case of Lemma 4.5,
(L4) already tells us how to decide if two points A and B in the plane lie in the
same half-plane or in opposite half-planes of a line L. Precisely:
(a) Two points A and B in the plane belong to the same side of
L ⇐⇒ the segment AB does not intersect L.
(b) Two points A and B belong to opposite sides of L ⇐⇒ the
segment AB intersects L.

Activity. Prove (a) and (b).
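In the coordinate model of the plane (assumed here for illustration; it is not part of this section), a line a·x + b·y = c has the half-planes a·x + b·y > c and a·x + b·y < c, and which one is called H+ is an arbitrary labeling. Along the segment from p to q, the value of f(x, y) = a·x + b·y − c at the point (1 − t)p + tq is (1 − t)f(p) + tf(q), so the segment meets L exactly when f(p) and f(q) do not share the same strict sign. The sketch below (function names hypothetical) checks (a) and (b) on sample points; it illustrates the statements in one model and does not replace the proofs asked for in the Activity.

```python
from fractions import Fraction

def f(line, p):
    """Value of a*x + b*y - c at the point p = (x, y), for line = (a, b, c)."""
    a, b, c = line
    return a * p[0] + b * p[1] - c

def side(line, p):
    """+1 for the half-plane f > 0 (labeled H+ here), -1 for f < 0, 0 on the line."""
    v = f(line, p)
    return (v > 0) - (v < 0)

def segment_meets_line(line, p, q):
    """True when the closed segment pq contains a point of the line."""
    return f(line, p) * f(line, q) <= 0

def crossing_point(line, p, q):
    """Point where pq meets the line; assumes f(p) != f(q)."""
    fp, fq = Fraction(f(line, p)), Fraction(f(line, q))
    t = fp / (fp - fq)          # solves (1 - t) * fp + t * fq = 0
    return ((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])

def same_side_criterion_holds(line, points):
    """Check (a) and (b): same side <=> segment does not meet the line."""
    off = [p for p in points if side(line, p) != 0]
    return all((side(line, p) == side(line, q)) == (not segment_meets_line(line, p, q))
               for p in off for q in off)
```

For the line x + y = 2, the points (0, 0) and (3, 3) lie on opposite sides, and the segment joining them meets the line at (1, 1).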

The union of either H+ or H− with L is called a closed half-plane. A
closed half-plane bears the same relationship to the plane as that of a ray to the
line containing the ray. We leave it as an exercise to show that a closed half-plane
is also convex (Exercise 9 on p. 180).
The following lemma relates Lemma 4.5 to (L4).

Lemma 4.7. Let L be a line in the plane, and let B be a point in the half-plane
H+ of L. Suppose a line ℓ containing B intersects L at a point A. Then the half-line
of A on ℓ containing B is the intersection H+ ∩ ℓ, and the ray RAB is the
intersection of ℓ with the closed half-plane of L containing H+.
4.1. THE BASIC VOCABULARY, PART 1 177




 

 A B

L H+
Proof. [Since the lemma is pictorially obvious, consider skipping this proof in a
school classroom.]
Let the half-line of A on ℓ containing B be denoted by ℓ+. To show that
ℓ+ = H+ ∩ ℓ, we first prove H+ ∩ ℓ ⊂ ℓ+. Let P be a point on ℓ that lies in H+;
we will prove P ∈ ℓ+. If P = B, then of course P lies in ℓ+. So we may assume
P ≠ B. Since both P and B are in H+, the segment PB lies in H+ because the
latter is convex. Thus PB contains no point of L (because L and H+ are disjoint)
and, in particular, does not contain A. Therefore P, as a point of ℓ, lies on the
same side of A as B (see (a) on page 174), i.e., lies in ℓ+.
Conversely, we prove the reverse inclusion ℓ+ ⊂ H+ ∩ ℓ. Suppose P lies on ℓ+;
we will show P ∈ H+ ∩ ℓ, i.e., that P lies in H+, the half-plane of L containing B.
Again we may assume P ≠ B. Now P being in ℓ+ means the segment PB does not
contain A. This implies that PB contains no point of the line L because, if PB
contains a point C of L, then LPB ∩ L is the point C. But LPB = ℓ, so ℓ ∩ L is C.
Since we are given that ℓ ∩ L is A, we have A = C (Lemma 4.2 on page 165). Thus PB
contains A, a contradiction. So PB contains no point of the line L after all, and P
and B must lie on the same side of L (see (L4)(ii)); i.e., P ∈ H+. The proof that
ℓ+ = H+ ∩ ℓ is complete. Since the second assertion in the lemma about the ray
RAB is now trivial, we have proved the lemma.
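In the same coordinate model (again an assumption, since coordinates come later), Lemma 4.7 becomes a statement about signs. If we write the points of ℓ as A + t(B − A), then, because the cross product used below is affine and vanishes at A ∈ L, the point A + t(B − A) lies on the same side of L as B exactly when t > 0. So the half-line of A containing B corresponds to t > 0 and the ray RAB to t ≥ 0. The helper names and the sample data are hypothetical.

```python
from fractions import Fraction

def side(p, a, b):
    """+1 or -1 for the two half-planes of the line through a and b; 0 on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

def point_on_ell(A, B, t):
    """The point A + t(B - A) on the line ell through A and B (exact with Fractions)."""
    return (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

def sign(t):
    return (t > 0) - (t < 0)

# Hypothetical data: L is the x-axis, A = (2, 0) lies on L, B = (3, 4) lies in H+.
L_a, L_b = (0, 0), (1, 0)
A, B = (2, 0), (3, 4)
samples = [Fraction(-3), Fraction(-1, 2), Fraction(0), Fraction(1, 3), Fraction(5)]

for t in samples:
    P = point_on_ell(A, B, t)
    # side(P) = sign(t) * side(B): t > 0 gives the half-line of A containing B
    # (that is, H+ ∩ ell), t = 0 gives A itself, t < 0 gives points in H-.
    print(t, side(P, L_a, L_b))
```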

Activity. Let L be a line in the plane and let H+ and H− be its half-planes.
If P ∈ L, let ℓ be a line distinct from L and passing through P. Prove that the two
sets ℓ ∩ H+ and ℓ ∩ H− are the half-lines of ℓ with respect to P.

We wish to make a further comment about assumption (L4). Clearly, one would
prefer a more explicit description of the half-planes of a line. After all, if a line is
drawn on a piece of paper or on a blackboard, one can point to the two "halves"
of the plane separated by the line. In a middle school classroom, this is what one
should do without a doubt: just point to the half-planes and not burden students
with abstract statements like (L4). But in high school, it is time for students to
learn to appreciate the difficulty of transcribing obvious visual information into
precise and (in this case) abstract language. Instead of waving their hands about
what is "on the left" or what is "on the right", they learn to use properties
(i) and (ii) in (L4) to pin down precisely what these half-planes are.11 Although
(i) and (ii) are nonintuitive, they nevertheless leave no doubt that each half-plane
is exactly what our intuition says it is: any two points in the same half-plane
can be joined by a segment disjoint from L, so that we can walk along this segment
from one point to the other without crossing L (see (a) on page 176). Moreover,
if two points A and B in the plane are "separated by L", then, intuitively, one
cannot get from A to B without crossing L (see (b) on page 176). Without more
information about the line, this is all we can say about its half-planes. However,
once we have coordinates and can describe a line by an equation, we will be able
to describe the half-planes of a line explicitly in terms of the equation (see
Section 1.4 in [Wu2020b]).
11 The same abstract idea will be used once more at the end of this section for the statement
of an analogous theorem about polygons (Theorem 4.13 on page 195).
Finally, we give an illustration of how assumption (L4) can be put to use by
proving a useful fact about betweenness (see page 167).

Lemma 4.8. Let L and L′ be two distinct lines and let P1, P2, and P3 be three
distinct points on L so that P1 ∗ P2 ∗ P3. Let three mutually parallel lines (cf. Lemma
4.3 on page 166) passing through P1, P2, and P3 intersect L′ at Q1, Q2, and Q3,
respectively. Then Q1 ∗ Q2 ∗ Q3.
[Figure: two configurations of L, L′, and the three parallel lines; in the right one, some Pi coincides with Qi.]

Remark. The lemma does not preclude the possibility that Pi = Qi for i = 1,
2, or 3, as the right picture above suggests. This lemma may seem frivolous at first
sight, but it has nontrivial applications. It will be used in the proof of Theorem
G15* on page 266, and it underlies the fact that a linear function defined on a line
in the plane is monotone; i.e., it is either constant or increasing or decreasing, as
the proof of (‡) on page 344 shows.

Pedagogical Comments. Lemma 4.8 is geometrically obvious but, unfortunately,
its proof is quite subtle. For the understanding of the mathematics in
these volumes, one may skip the proof on first reading and return to it only when
absolutely necessary. For a proof of something this obvious, you should seriously
consider skipping it in the school classroom because such a proof is, bluntly put,
(1) quite boring, so you may not be able to hold students’ interest, and
(2) not among the most pressing things they need to learn at this juncture of their
mathematics learning trajectory (compare, for example, the Pedagogical Comments
on page 204). As you go through these volumes, you will encounter the recurrent
theme that certain proofs are presented not for your future classroom use, but (a)
for your own information on the rare occasion that an inquisitive student wants to
know the whole truth and (b) to show that the mathematics of K–12 can indeed
be developed in a way that is consistent with the fundamental principles of
mathematics (see page xiii). End of Pedagogical Comments.

Proof. We give a proof by contradiction. Suppose Q2 is not between Q1 and Q3;
then Q2 lies outside the segment Q1Q3. Thus either the segment Q2Q3 contains Q1
or the segment Q1Q2 contains Q3 (this is a restatement of Lemma 4.4 on page 169).
Without loss of generality, we may assume the latter, as in the pictures below. We
will prove that this is impossible.
[Figure: the two possible configurations when Q3 lies between Q1 and Q2, with L3 the line through P3 and Q3; in the left one P1 ≠ Q1, in the right one P1 = Q1.]
For simplicity, let us denote the line LP3Q3 by L3. If P1 ≠ Q1, then the line
containing P1 and Q1, being parallel to L3 (by hypothesis), does not intersect L3.
In particular, the segment P1Q1 does not intersect L3 and, by assumption (L4)(ii)
(on page 176), P1 and Q1 lie on the same side of L3. Needless to say, the same
is true if P1 = Q1. Next, we note that the segment P1P2 does not intersect L3
either because, if it did, it would have to intersect L3 at P3 (two distinct nonparallel
lines intersect at only one point, by Lemma 4.2 on page 165), and then P1 ∗ P3 ∗ P2
would contradict our assumption that P1 ∗ P2 ∗ P3 (see Lemma 4.4 on page 169). Therefore
P1 and P2 also lie on the same side of L3. Altogether, we see that the points P1,
P2, and Q1 are on the same side of L3.
Since we are assuming that Q1 Q2 intersects L3 at Q3 , by (L4)(i) on page 176,
Q1 and Q2 lie on opposite sides of L3 . Since P2 and Q1 have been shown in the
preceding paragraph to lie on the same side of L3 , it follows that P2 and Q2 also
lie on opposite sides of L3 . By (L4)(ii) (on page 176), the segment P2 Q2 intersects
L3 . A fortiori, the line LP2 Q2 intersects L3 , and this contradicts the hypothesis of
the lemma that these are parallel lines. The proof of Lemma 4.8 is complete.
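Lemma 4.8 can also be checked numerically in coordinates (again assuming the coordinate model). Projecting the points of L onto L′ along a fixed direction d changes the position parameter by an affine map with nonzero slope, so it either preserves or reverses the order of points, and betweenness survives in both cases. The sketch writes L′ as {c + v·e} and computes the parameter v of each Qi by Cramer's rule; project_along, between, and the sample lines are hypothetical illustrations.

```python
from fractions import Fraction

def project_along(p, d, c, e):
    """Return v such that the line through p with direction d meets the line
    {c + v*e} at the point c + v*e. Assumes d and e are not parallel."""
    # Solve u*d - v*e = c - p for (u, v) by Cramer's rule.
    r0, r1 = c[0] - p[0], c[1] - p[1]
    det = d[1] * e[0] - d[0] * e[1]
    if det == 0:
        raise ValueError("direction d is parallel to the target line")
    return Fraction(d[0] * r1 - r0 * d[1], det)

def between(x, y, z):
    """y lies strictly between x and z on a number line."""
    return x < y < z or z < y < x

# Hypothetical data: L is the line y = x with P1 * P2 * P3; L' is the line y = -2.
P = [(0, 0), (1, 1), (3, 3)]
c, e = (0, -2), (1, 0)
for d in [(1, -1), (1, 2)]:          # two choices of the common parallel direction
    v = [project_along(p, d, c, e) for p in P]
    print(d, v, between(*v))
```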
Exercises 4.1.
(1) Let L1 ∥ L2 and let a third line ℓ be distinct from L1. Prove that if ℓ
intersects L1, then it must intersect L2.
(2) Give all the reasons why the following figure with five vertices cannot be
made into a polygon no matter how the vertices are labeled:
[Figure: five points, some of them joined by segments.]
(3) Let A and P be two distinct points on a line. Suppose we fix one of the
two rays issuing from A and call it R1. Prove that one and only one of the
two rays issuing from P has the property that it either contains R1 or is
contained in R1.
(4) Let P, Q, and S be three distinct points on a line L so that P ∗ Q ∗ S.
(i) Show that the two rays RPQ and RPS are equal. (ii) Let L′ be a line
passing through P and distinct from L. Prove that Q and S belong to
the same half-plane of L′.
(5) Prove (b) on page 174.


(6) Let L be a line in the plane. Let A, B, and C be three points in the plane
so that A and B are in the same half-plane of L and B and C are also in
the same half-plane of L. Prove that A and C are in the same half-plane
of L. (Be careful.)
(7) (a) Prove that the intersection of a finite number of convex sets is convex.
(b) Prove that the intersection of any number of convex sets is convex
(i.e., the number of sets could be infinite).
(8) Prove part (ii) of Lemma 4.6 on page 174.
(9) (i) Prove that a ray is convex. (ii) Prove that a closed half-plane is convex.
(10) Explain why, given any three noncollinear points A, B, C, the three seg-
ments AB, BC, CA can never intersect each other except at the endpoints
as follows: AB and BC intersect at B, BC and AC intersect at C, and
AC and AB intersect at A. (In other words, take any three noncollinear
points; then the union of the segments joining them is always a polygon—
no matter how the vertices are labeled.)
(11) (a) Suppose we have a finite or an infinite number of convex sets Ci , where
i is a whole number, and suppose that each Ci is contained in the next
one, Ci+1 . Prove that the union of these Ci ’s is also convex. (Caution:
Be very clear in your proof.) (b) Is the union of convex sets convex in
general?
(12) Prove that the half-lines in Lemma 4.5, page 173, are unique in the follow-
ing sense: let P be a point on a line L that separates L into two half-lines
L+ and L− as in Lemma 4.5. Now let H+ and H− be two subsets of L so
that:
(i) L is the disjoint union of H+ , H− , and {P } and each is
nonempty.
(ii) Both H+ and H− are convex.
Then either H+ = L+ and H− = L− , or H+ = L− and H− = L+ .12

4.2. The basic vocabulary, Part 2


This section gives the formal definition of an angle and makes explicit the as-
sumptions concerning measurements of the lengths of segments and the degrees of
angles. At the end of the section, it briefly discusses—without proof—some basic
properties of polygonal regions in the plane.
Definition of an angle (p. 181)
Distance in the plane and lengths of segments (p. 183)
Degrees of angles (p. 186)
Polygonal regions (p. 193)

12 A corresponding uniqueness result holds for half-planes of a line and will be proved in
Section 1.4 of [Wu2020b].


4.2. THE BASIC VOCABULARY, PART 2 181

Definition of an angle

The definition of an angle is unfortunately not simple. Recall that a collection
of points is said to be collinear if they lie on a line. Given three points O, A, and
B in the plane, first assume they are noncollinear. Let ROA and ROB be the two rays
issuing from O. These rays determine two subsets of the plane. One of them is the
intersection of the following two closed half-planes:13
the closed half-plane of the line LOA containing B and
the closed half-plane of the line LOB containing A.
Since the intersection of convex sets is convex, this is a convex set, and it is suggested
by the shaded region Γ below (reminder: the shading only covers a finite portion
of the region which extends infinitely to the right, above ROB and below ROA ).
[Figure: the rays ROA and ROB issuing from O, with the convex region Γ between them shaded.]
Note that Γ contains both ROA and ROB, by definition. The other subset deter-
mined by ROA and ROB is the union of the complement14 of Γ together with the
two rays ROA and ROB. This is the shaded region below (reminder: the shading
is supposed to extend infinitely above, below, and to the left). Call this region Γ′.
We note explicitly that Γ and Γ′ are not disjoint because they have the two rays
ROA and ROB in common.
[Figure: the nonconvex region Γ′ shaded, with the points O, A, and P marked.]

There will be the rare occasion when we have to scrutinize Γ′, so it will be useful to
have a characterization of Γ′, and it is this: Γ′ is the union of the closed half-plane
of LOA not containing B and the closed half-plane of LOB not containing A. We
leave the straightforward proof of this fact to an exercise (Exercise 2 on page 198).
We claim that Γ′ is not convex. To see this, join A to B and let P be a point
on the segment AB so that A ∗ P ∗ B. We will prove that P does not lie in Γ′, which
will show that Γ′ is not convex (because A and B do lie in Γ′). By the convexity of Γ,
P is in Γ and is therefore not in the complement of Γ. Since Γ′ consists of the
complement of Γ and the two rays ROA and ROB, to show P is not in Γ′, we must show
that P lies in neither ROA nor ROB. Suppose it did; by symmetry, we may assume
P ∈ ROA. Now P ≠ A, so LOA and LAB have two distinct points A and P in common.
By Lemma 4.2 on page 165, we have LAB = LOA, and therefore B ∈ LOA (since
B ∈ LAB). This contradicts the hypothesis that A, B, and O are noncollinear points.
Thus Γ′ is not convex, as claimed.
13 Recall an earlier remark on page 173 about generating new convex sets by intersecting old
ones.
14 The complement of a set S in the plane is by definition the set of all the points in the
plane not lying in S.
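Here is a coordinate sketch of the two regions Γ and Γ′ determined by the rays ROA and ROB, built from the characterizations above: Γ as the intersection of two closed half-planes and Γ′ as the union of the two opposite closed half-planes. It also reproduces the witness to nonconvexity used in the argument: A and B lie in Γ′, but the midpoint P of AB does not. The coordinate model and all function names are assumptions made for illustration.

```python
def side(p, a, b):
    """+1 or -1 for the two half-planes of the line through a and b; 0 on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

def in_closed_half_plane(p, o, q, r):
    """p lies in the closed half-plane of the line L_oq containing r (r not on L_oq)."""
    s = side(p, o, q)
    return s == 0 or s == side(r, o, q)

def in_opposite_closed_half_plane(p, o, q, r):
    """p lies in the closed half-plane of L_oq that does NOT contain r."""
    s = side(p, o, q)
    return s == 0 or s == -side(r, o, q)

def in_gamma(p, O, A, B):
    """Convex angle Γ: intersection of two closed half-planes."""
    return in_closed_half_plane(p, O, A, B) and in_closed_half_plane(p, O, B, A)

def in_gamma_prime(p, O, A, B):
    """Nonconvex angle Γ′: union of the two opposite closed half-planes."""
    return (in_opposite_closed_half_plane(p, O, A, B)
            or in_opposite_closed_half_plane(p, O, B, A))

# Hypothetical data: O, A, B noncollinear; P is the midpoint of AB, so A * P * B.
O, A, B = (0, 0), (2, 0), (0, 2)
P = (1, 1)
print(in_gamma_prime(A, O, A, B), in_gamma_prime(B, O, A, B))  # both endpoints in Γ′
print(in_gamma(P, O, A, B), in_gamma_prime(P, O, A, B))        # P in Γ, not in Γ′
```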
Either the convex set Γ or the nonconvex set Γ′ determined by these two rays
ROA and ROB is called the angle determined by these rays. Unless stated to
the contrary, we follow the standard practice of taking the convex subset to be the
angle and denote it by ∠AOB. These rays are called the sides of the angle, and
the point O the vertex of the angle. We emphasize that, in these three volumes, an
angle is always one of the two regions (one convex, the other nonconvex) of the plane
determined by two rays with a common vertex, and each angle always includes both rays.
(Most books follow a different convention by defining an angle to be the union of
the two rays themselves. The present definition suits our purpose better.) If we
want to consider the angle determined by the nonconvex subset, we will have to
say so explicitly or use an arc to so indicate, e.g.,
[Figure: rays from O through A and B, with an arc marking the nonconvex angle.]

A better notation, one that will be used often, is to use an arc and a letter in the
region to indicate the angle. Thus ∠b denotes the convex region in the left figure
below, while ∠c denotes the nonconvex region in the right figure below.

[Figure: left, the convex angle ∠b determined by ROA and ROB; right, the nonconvex angle ∠c.]
The angles of a triangle (see p. 171 for the definition) are usually denoted by yet
another notation. For a triangle, such as ABC in the left figure below, we will
usually let ∠A stand for ∠BAC, ∠B for ∠ABC, and ∠C for ∠ACB.

[Figure: left, triangle ABC; right, quadrilateral ABCD.]
However, it is well to note that if we have a quadrilateral or, more generally, an
n-gon with n ≥ 4, this notation could be troublesome. For example, for the
quadrilateral ABCD in the right figure above, the symbol ∠D will be confusing: does
it mean the convex angle ∠ADC, as we said it should by default, or do we actually
mean the nonconvex angle, which is intuitively the correct one to look at? We will
lightly touch on this delicate issue on page 197.
So far, we have dealt with the situation where A, O, B are not collinear. The
remaining case, in which they are collinear, is of special interest. If O, A, B are
distinct collinear points, then A and B are either in the same half-line with respect
to O or in opposite half-lines. First, we look at the former case. Following the
definition of an angle, we get two sets: the "region" between the rays ROA and ROB
together with the rays themselves (which is ROA = ROB) and the complement of the rays
ROA and ROB together with the rays themselves (which is the whole plane), as
shown below.
[Figure: left, the zero angle (A and B on the same ray from O); right, the full angle.]

We define these special angles to be, respectively, the zero angle and the full
angle determined by the coincidental rays ROA and ROB .
Now suppose A and B are collinear with O but lie in opposite half-lines with
respect to O, i.e., A ∗ O ∗ B (Lemma 4.6(i) on page 174). In this case, either
closed half-plane determined by the line LAB is by definition the straight angle
determined by the opposite rays ROA and ROB . Thus ∠AOB will refer to either
closed half-plane.

Distance in the plane and lengths of segments

We next address the issue of measurement: how to measure the lengths of
segments and the degrees of angles.
Let us begin with the length of a segment, because it is more elementary than
the concept of the degree of an angle. Given a segment AB, what is its length? In
one sense, we already know the answer because the line LAB can be made into a
number line (by (L3) on page 167), and on a number line, the length of a segment
is well-defined (see page 6). But there is a problem with this concept of "length",
because its definition will depend on the choice of a unit segment on LAB . Until
we can fix a choice of this unit segment on LAB , we do not know what the "length
of AB" could be. So we are confronted with the problem of how to choose a unit
segment on every line once and for all. There is actually an additional problem that
we can at least discuss intuitively at this point (and will be able to do so precisely
after (L7) on page 237). Suppose we have three parallel lines as in the left picture
below.
[Figure: left, three parallel lines: D and E on the top line, C, E′, B′ on the middle line, A and B on the bottom line; right, three rays from O through M, N, and P.]

Suppose we have chosen the segment AB in the bottom line and the segment DE
in the top line as a unit segment in its respective line. Intuitively, an appropriate
translation15 in the upward direction will bring the bottom line to the middle line
so that the point B goes to the point B′ and A to C. Since we expect translations
to be length-preserving, the segment CB′ will have length 1. Similarly, a suitable
translation in the downward direction will bring the top line to the middle line
and bring D to C and E to E′. But DE is a unit segment and the translation is
length-preserving, so the segment CE′ also has to be of length 1. Obviously, CE′
and CB′ cannot both have length 1, so we have a contradiction. This shows that
the choice of a unit segment on a given line cannot be arbitrary.
We can also look at the same phenomenon from a different perspective. Con-
sider the three lines LOM , LON , and LOP in the right picture above. Again, suppose
we have chosen the segments OM , ON , and OP as unit segments in their respective
lines. Then M , N , and P all lie on the unit circle around O. Still speaking intu-
itively, it is clearly not a comforting thought that this unit circle does not appear
to be "round" as a result of such random choices of unit segments on these lines!
This naive discussion points to the fact that the concept of the length of a seg-
ment in the plane is far from simple; it can only be fully understood in the context
of rotations, translations, and reflections of the plane (which will be addressed in
assumption (L7) on page 237). Right now, we will make an assumption so that we
can assign "lengths" to segments in the plane in a "consistent" way. Precisely:

(L5) To each pair of points A and B of the plane, we can assign a number
dist(A, B), called the distance between A and B, so that
(i) dist(A, B) = dist(B, A) and dist(A, B) ≥ 0. Furthermore,
dist(A, B) > 0 ⇐⇒ A ≠ B.
(ii) Given a ray with vertex O and a positive number r, there is
a unique point B on the ray so that dist(O, B) = r.
(iii) Let O and A be two points on a line L so that dist(O, A) = 1,
and let O and A be the 0 and 1 of a number line on L (as in (L3)).
Then for any two points P and Q on L, dist(P, Q) coincides with
the length of the segment P Q on this number line.
(iv) If A, B, C are collinear points and C is between A and B,
then
dist(A, B) = dist(A, C) + dist(C, B).
Let us amplify on (ii) and (iii). (ii) guarantees that, given a positive number r
and a point O in the plane, there are many points A of distance r from O (one for
each ray issuing from O). Note also that, without (ii), (iii) would not make sense
because we would not know whether there is a point A on L so that dist(O, A) = 1.
15 The precise meanings of rotations, reflections, and translations will be given in Sections
4.4 and 4.5, and the assumption about them is (L7) on page 237.

Now the main thrust of (iii) is that, with O and A as 0 and 1, respectively, on L,
the concept of the length of a segment [P, Q] on the number line L is already well-
defined (see page 6), and it would potentially be confusing if this length were to
differ from the distance between P and Q. Fortunately, (iii) averts such confusion.16
It may be worthwhile to further point out that, according to equation (2.37)
on page 126, the length of a segment [P, Q] on L is |P − Q|. In this light, what (iii)
shows is that the distance between P and Q can be computed by |P − Q| where P
and Q are now regarded as two numbers on this number line. This fact is of critical
importance when we get to setting up coordinates in the plane (see Section 6.3 on
pp. 331ff.).
We will refer to the assignment of a number dist(A, B) to each pair of points
A and B in the plane as the distance function. In view of (iii), the length of a
segment AB for any two points A and B, to be denoted by |AB|, will henceforth
be defined to be dist(A, B) without any reference to the line containing A and B.
Thus "length of a segment" retains the intuitive meaning of "the distance between
the endpoints".
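In the coordinate model, the usual Euclidean distance is one distance function satisfying (L5); the text does not commit to any formula at this stage, so the sketch below is only an illustration under that assumption. It checks symmetry, the point on a ray at a prescribed distance in (ii), and the additivity in (iv). The names dist, point_on_ray, and point_between are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between p and q in the coordinate model."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def point_on_ray(O, A, r):
    """(L5)(ii): the point B on the ray R_OA with dist(O, B) = r (assumes r > 0, A != O)."""
    k = r / dist(O, A)
    return (O[0] + k * (A[0] - O[0]), O[1] + k * (A[1] - O[1]))

def point_between(A, B, t):
    """The point A + t(B - A); it lies strictly between A and B when 0 < t < 1."""
    return (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

O, A = (1.0, 2.0), (4.0, 6.0)          # dist(O, A) = 5
B = point_on_ray(O, A, 2.5)             # here B is the midpoint of OA
print(B, dist(O, B))

# (L5)(iv): dist(P, Q) = dist(P, C) + dist(C, Q) for C between P and Q
P, Q = (0.0, 0.0), (6.0, 8.0)
C = point_between(P, Q, 0.3)
print(math.isclose(dist(P, C) + dist(C, Q), dist(P, Q)))
```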
Anticipating assumption (L7) on page 237, we hasten to show how (L5), to-
gether with (L7), will rule out the absurd situations depicted in the pictures on
page 184. First, we show that it is impossible that both CE′ and AB have length
1 in the picture below. Suppose they do; we will deduce a contradiction.
[Figure: two parallel lines, with C, E′, B′ on the top line and A, B on the bottom line.]
This is because the "upward" vertical translation that moves the bottom line to
the top line will move A to C and B to B′, and since translations preserve the lengths
of segments, the lengths of CB′ and AB will be the same. By (iii) of assumption
(L5), we get
|CE′| + |E′B′| = |CB′| = |AB| = 1.
Since also |CE′| = 1 by hypothesis, we have
1 = |CE′| = |CB′| − |E′B′| = |AB| − |E′B′| < |AB| = 1,
where the inequality holds because |E′B′| > 0, by (i) of (L5). Therefore 1 < 1, a
contradiction. So if we assume (L5) and (L7), the fact that AB has length 1 in the
bottom line will preclude CE′ from having length 1 in the top line.
Similar considerations show why it is impossible that all three segments
OM, ON, and OP in the following picture could have length 1. Again, suppose
they do; we will deduce a contradiction.
[Figure: three rays from O through M, N, and P, with M′ on the ray RON beyond N.]
16 With hindsight, the requirement in (iii) that dist(O, A) = 1 is necessary because the
segment [0, 1], i.e., [O, A], on L has length 1.

Indeed, if we rotate counterclockwise around O until the ray ROM is on top of the
ray RON, then the point M will be rotated to the point M′ on RON. Since rotation
preserves length, we have |OM′| = |OM| = 1. By (L5)(iii),
|ON| + |NM′| = |OM′| = 1.
Since |ON| = 1 by hypothesis, we get
1 = |ON| = |OM′| − |NM′| < |OM′| = 1,
and the inequality holds because |NM′| > 0 on account of (L5)(i). We have the desired
contradiction.
With the availability of the concept of distance in the plane, we can now for-
mally introduce the concept of a circle. Fix a point O. Then the set of all points A
in the plane so that dist(O, A) is a fixed positive constant r is called the circle of
radius r (in the plane) around O or centered at O. The point O is the center.
Note that by (L5)(ii), if r is a positive number, then there is a circle of radius r
around O.
This is the precise definition of the concept of a circle, but in school mathe-
matics, the word "circle" is usually used in an undisciplined way. Given a circle of
radius r and center O, then school mathematics also refers to either of the following
sets as the circle of radius r and center O:
all the points A satisfying dist(A, O) ≤ r,
all the points A satisfying dist(A, O) < r.
In advanced mathematics, the former is called the closed disk of radius r around
O, and the latter is called the open disk of radius r around O.17 In school
mathematics, we are usually only interested in the closed disk, so when there is no
danger of confusion, we will use disk to mean either of the above. When absolute
clarity is mandatory, we will make the distinction among "circle", "open disk", and
"closed disk" in these volumes.

A circle of radius 1 is called a unit circle, and a disk of radius 1 is called a
unit disk. In general, given a circle C of radius r around a point O, we say a point
P is inside C if P is in the closed disk of radius r around O, i.e.,
dist(P, O) ≤ r. We say P is in the exterior of C if dist(P, O) > r.
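The vocabulary of circles and disks can be summarized by a small classifier, again in the coordinate model with Euclidean distance (an assumption). Comparing squared distances keeps integer inputs exact. The function names are hypothetical; note that, in the sense just defined, a point on C is also inside C.

```python
def classify(P, O, r):
    """Position of P relative to the circle C of radius r around O.
    Squared distances are compared, so integer inputs are handled exactly."""
    d2 = (P[0] - O[0]) ** 2 + (P[1] - O[1]) ** 2
    if d2 < r * r:
        return "open disk"     # inside C and not on C
    if d2 == r * r:
        return "circle"        # on C; still inside C, since the closed disk contains C
    return "exterior"

def inside(P, O, r):
    """'P is inside C' in the sense of the text: P is in the closed disk."""
    return classify(P, O, r) != "exterior"

O, r = (0, 0), 5
for P in [(3, 4), (1, 1), (6, 0)]:
    print(P, classify(P, O, r), inside(P, O, r))
```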

Degrees of angles

We need one more definition before we can introduce the concept of the degree
of an angle. We say two angles ∠AOC and ∠COB, with a common side ROC , are
adjacent angles with respect to ∠AOB if C belongs to ∠AOB (as a region in
the plane). Let it be stated explicitly that, in this case, although ∠AOB can denote
either the convex subset or the nonconvex subset, once the choice is made, then
∠AOC and ∠COB are understood to be subsets of ∠AOB. For example, suppose
∠AOC and ∠COB are adjacent angles with respect to a convex angle ∠AOB as
in the left picture below.

17 This is the standard way to use the words "open" and "closed" in advanced mathematics.
[Figure: left, ∠AOC shaded as a convex subset of the convex angle ∠AOB; right, an arc marking the nonconvex angle determined by ROA and ROC.]
Then ∠AOC in this context will have to be the convex (shaded) subset in the left
picture above rather than the nonconvex subset indicated by the arc in the right
picture above. Similarly, ∠COB in this context will have to be the convex angle.
Next, consider the situation where ∠AOC and ∠COB are adjacent angles with
respect to a nonconvex angle ∠AOB as in the left picture below; then in this case
∠AOC (for example) has to be the shaded subset on the left and not the nonconvex
subset indicated by the arc on the right.
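[Figure: left, ∠AOC shaded as a subset of the nonconvex angle ∠AOB; right, the other angle determined by ROA and ROC, marked by an arc.]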
An interesting example of adjacent angles is the case of a straight angle ∠AOB
(see p. 183 for the definition of straight angle): let the ad hoc notation Π+ denote
the upper closed half-plane of LAB and let Π− denote the lower closed half-plane
of LAB:
[Figure: the line LAB through A, O, B, with Π+ above and Π− below.]
Then Π+ and Π− are adjacent angles with respect to the full angle at O (see page
183 for the definition of full angle).
Adjacent angles ∠AOC and ∠COB (with respect to ∠AOB) are the analogs,
among angles, of adjacent segments AC, CB so that A, B, C are collinear and C
is between A and B.

[Figure: adjacent angles ∠AOC and ∠COB at O, and adjacent segments AC and CB.]

The concept of adjacent angles will allow us to formulate the analog of part (iv) in
assumption (L5) above.
Now we can introduce the concept of the degree of an angle by way of an
assumption. Intuitively, every angle has a degree, a straight angle should be 180
degrees, and the "full" angle should be 360 degrees. Our assumption now takes the
following form:

(L6) To each angle ∠AOB, we can assign a number |∠AOB|, called its
degree, so that:
(i) 0 ≤ |∠AOB| ≤ 360°, where the small circle ° is the abbreviation of "degree".
(ii) Given a ray ROB and a number x so that 0 < x < 360 and
x ≠ 180, let one of the two closed half-planes of the line LOB be
specified. Then there is a unique ray ROA lying in the specified
closed half-plane of LOB so that |∠AOB| = x°, where ∠AOB
denotes the convex angle if x < 180, and the nonconvex angle if
x > 180.
(iii) |∠AOB| = 0° ⇐⇒ ∠AOB is the zero angle; |∠AOB| = 180°
⇐⇒ ∠AOB is a straight angle; |∠AOB| = 360° ⇐⇒ ∠AOB is
the full angle at O.
(iv) If ∠AOC and ∠COB are adjacent angles with respect to
∠AOB, then

|∠AOC| + |∠COB| = |∠AOB|.

Observe that parts (i), (ii), and (iv) of (L6) are the exact analogs of (i), (ii),
and (iv) of (L5). Also observe that, with respect to the previous situation of two
straight angles Π+ and Π− being adjacent angles with respect to the full angle
at O, part (iv) of (L6) now provides a consistency check on part (iii) of the same
assumption (L6).
[Figure: the line LAB through A, O, B, with Π+ above and Π− below.]
Indeed, (iv) says that the following is valid:

|Π+| + |Π−| = (degree of full angle at O) = 360°.

But (iii) implies that the left side is equal to 180° + 180°, which is also 360°. So
indeed the equality is valid.
We can also use property (iv) of the degree of an angle to prove something that
confirms our intuition. Two rays ROA and ROB with a common vertex O such that
O, A, and B are not collinear determine two angles, one convex and the other
nonconvex (page 181), and both angles are denoted by ∠AOB. The next lemma is
intuitively clear.

Lemma 4.9. Let O, A, and B be noncollinear points; then ∠AOB is convex
⇐⇒ |∠AOB| < 180°.

Proof. We first prove that if ∠AOB is convex, then |∠AOB| < 180°. By the
definition on p. 182, the convexity of ∠AOB implies that it is the intersection of
the closed half-plane ΠA of LOB containing A and the closed half-plane ΠB of LOA
containing B.
[Figure: the closed half-plane ΠA of LOB containing A, with E, O, B on LOB.]
In particular, A lies in ΠA and therefore, if E is a point on LOB so that E and
B belong to opposite half-lines with respect to O, then ∠EOA and ∠AOB are
adjacent angles with respect to the straight angle ΠA. By (iv) of (L6), we get
|∠EOA| + |∠AOB| = |ΠA|.
Since E, O, and A are not collinear by hypothesis, ∠EOA is not the zero angle, so
|∠EOA| > 0 by (i) and (iii) of (L6); and since |ΠA| = 180° by (iii) of (L6), we see that
|∠AOB| < |∠EOA| + |∠AOB| = |ΠA| = 180°.
Conversely, suppose |∠AOB| < 180°; we have to prove that ∠AOB is convex.
Since there are only two possibilities for ∠AOB, namely the convex angle ∠AOB and
the nonconvex angle ∠AOB, it suffices to show that the nonconvex ∠AOB has degree
> 180°. Let us do that.
[Figure: the nonconvex angle ∠AOB, with E on LOB on the opposite side of O from B.]
Let E be a point on the line LOB so that E and B belong to opposite half-lines with
respect to O. Since the convex angle ∠AOB is contained in the closed half-plane
ΠB of LOA containing B and since E is not in ΠB (see Lemma 4.7 on page 176),
E is not in the convex angle ∠AOB. Thus E is in the complement of the convex
angle ∠AOB and is therefore in the nonconvex angle ∠AOB. Thus ∠AOE and
∠EOB are adjacent angles with respect to the nonconvex angle ∠AOB. By (iv) of
(L6),
|∠AOE| + |∠EOB| = |nonconvex ∠AOB|.
By (i) and (iii) of (L6), |∠AOE| > 0°, and by (iii) of (L6), |∠EOB| = 180°. Thus we get
|nonconvex ∠AOB| > 0° + 180° = 180°.
By a previous remark, this means that if |∠AOB| < 180°, then ∠AOB cannot be the
nonconvex angle. Thus, |∠AOB| < 180° implies ∠AOB is convex, and the proof
of Lemma 4.9 is complete.

In view of Lemma 4.9, our convention of taking every angle that is not a
straight angle to be convex therefore amounts to assuming that, unless stated to
the contrary, such an angle is < 180°.
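In the coordinate model with the usual angle measure (an assumption; the text defines degree only axiomatically through (L6)), the degree of the convex angle ∠AOB can be computed with atan2 from the cross and dot products of the vectors from O to A and from O to B, and the nonconvex angle then has degree 360° minus that value. The sketch checks the additivity (iv) of (L6) for adjacent angles, with respect to both a convex and a nonconvex angle, and the dichotomy of Lemma 4.9 on one example. The function names and sample points are hypothetical.

```python
import math

def convex_degree(O, A, B):
    """Degree of the convex angle AOB (between 0 and 180), from cross and dot products."""
    ax, ay = A[0] - O[0], A[1] - O[1]
    bx, by = B[0] - O[0], B[1] - O[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(abs(cross), dot))

def nonconvex_degree(O, A, B):
    """Degree of the nonconvex angle AOB (O, A, B noncollinear)."""
    return 360.0 - convex_degree(O, A, B)

O, A, B = (0, 0), (1, 0), (0, 1)
C = (1, 1)        # C lies in the convex angle AOB
E = (-1, -1)      # E lies in the nonconvex angle AOB

# (L6)(iv) with respect to the convex angle: |AOC| + |COB| = |AOB|
print(convex_degree(O, A, C) + convex_degree(O, C, B), convex_degree(O, A, B))
# (L6)(iv) with respect to the nonconvex angle: |AOE| + |EOB| = |nonconvex AOB|
print(convex_degree(O, A, E) + convex_degree(O, E, B), nonconvex_degree(O, A, B))
# Lemma 4.9: the convex angle is < 180 and the nonconvex one is > 180
print(convex_degree(O, A, B) < 180 < nonconvex_degree(O, A, B))
```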
For a later need, we point out the following direct consequence of the uniqueness
assertion in part (ii) of assumption (L6).

Lemma 4.10. Let two angles ∠M AB and ∠N AB be both convex or both non-
convex (see the picture below). Suppose they have one side RAB in common and
M and N are on the same side of the line LAB . Then the other sides RAM and
RAN coincide if and only if the angles have the same degree.
[Figure: angles ∠MAB and ∠NAB sharing the side RAB, with M and N on the same side of LAB.]
As in the case of assumption (L5) on distance, (L6) by itself does not have
much substance; its full significance is revealed only in the context of the next
assumption, that the basic isometries (to be defined in the next section) are distance-
preserving and degree-preserving (see assumption (L7), page 237).

We now give a more intuitive discussion of the degree of an angle. Let ∠AOB
be given. Here, ∠AOB will denote either the convex angle or the nonconvex angle,
depending on the situation. Let C be the unit circle centered at O; we may
assume that both A and B lie on C. Let arc AB denote the intersection of C with
∠AOB. We will call arc AB an arc on C or, more precisely, the arc intercepted by
∠AOB on C. Arc AB is called a minor arc on C if it is the intersection of C with a
convex angle. It is called a major arc if it is the intersection of C with a nonconvex
angle. It is possible, using the distance function in the plane, to define the length
of an arc.18 An arc whose length is 1/360 of the length of C is called an arc of one
degree. Then we can subdivide a degree into n equal parts in the sense of equal
arc-length (where n is any whole number), thereby obtaining 1/n of a degree, etc. It
is exactly the same as the division of the chosen unit on a number line into unit
fractions in Section 1.1, except that in this case we have a "circular number line"
so that, once a point has been chosen to be 0, the number 360 coincides with 0
again. In any case, every arc on the unit circle will also have a length measured in
terms of degrees, so that we can speak of an arc of 36.7 degrees. (This discussion
will be made completely precise in Section 1.5 of [Wu2020c]; see especially Lemma
1.12 therein.)

Still on an intuitive level, with ∠AOB and ⌢AB as above, the degree |∠AOB|
of ∠AOB in the sense of assumption (L6) on p. 188 is just the degree of the arc
⌢AB that ∠AOB intercepts on the unit circle. Thus in the following picture, if the
length of this arc ⌢AB is x degrees, then |∠AOB| = x°. (On the left, ∠AOB denotes
the convex angle and, on the right, the nonconvex angle.)

18 There is a subtle point in this definition which will be addressed in Chapter 4 of [Wu2020c]

when we discuss length and area. It has to do with the fact that we have yet to precisely define
what the "length of an arc" is. There is no fear of circular reasoning, however, because the
length of an arc can be defined independently and can be given right now if we do not mind the
interruption. Therefore, the concept of "length of an arc" in this discussion may be taken in a
naive sense without any fear of logical difficulties.
4.2. THE BASIC VOCABULARY, PART 2 191

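[Figure: two unit circles centered at O with A and B on each; on the left, the convex angle ∠AOB intercepts an arc of x°; on the right, the nonconvex angle ∠AOB intercepts an arc of x°.]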

Notice that the method of angle measurement we have just described is exactly
the principle used in the construction of the protractor.

Mathematical Aside: (1) With notation as in the preceding discussion, the de-
gree of an arbitrary angle, ∠AOB, is defined in advanced mathematics as follows.
The length (i.e., circumference) of the unit circle C being 2π, let δ = 2π/360. If
we think of the full angle at the center of the circle as 360◦ , then δ is the length
of the arc intercepted on the unit circle by a 1° angle. Now let the length of the
arc ⌢AB intercepted by ∠AOB on the unit circle be denoted by ||⌢AB||. Then the
degree of ∠AOB is, by definition, the number ||⌢AB||/δ. (2) As is well known,
there is another unit for measuring angles, called radian, that is more commonly
used in advanced mathematics. The reason for preferring radians to degrees will be
explained in Section 1.5 of the companion volume [Wu2020c], but for now, degree
will serve perfectly well as a unit for measuring angles.
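Part (1) of the aside reduces the degree of an angle to a single division by δ. The following is a minimal Python sketch of that arithmetic, assuming floating-point values are accurate enough for illustration; the names `DELTA`, `degree_of_arc`, and `arc_length_of` are hypothetical and are introduced only here.

```python
import math

# delta: length of the arc that a 1-degree angle intercepts on the unit circle.
DELTA = 2 * math.pi / 360


def degree_of_arc(arc_length):
    """Degree of an angle whose intercepted arc on the unit circle has this length."""
    return arc_length / DELTA


def arc_length_of(degrees):
    """Length of the unit-circle arc intercepted by an angle of the given degree."""
    return degrees * DELTA


# A quarter of the circumference corresponds to a right angle,
# and the whole circumference corresponds to the full 360 degrees.
print(degree_of_arc(math.pi / 2))   # approximately 90
print(degree_of_arc(2 * math.pi))   # approximately 360
print(arc_length_of(36.7))          # length of an arc of 36.7 degrees
```

Because degrees and arc lengths differ only by the constant factor δ, each function is the inverse of the other.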

An angle of 90° is called a right angle. An angle is acute if it is less than
90°, and it is obtuse if it is greater than 90° but less than 180°. There are analogs
of these names for triangles, namely, a triangle is called a right triangle if one of
its angles is a right angle, an acute triangle if all of its angles are acute, and an
obtuse triangle if (at least) one of its angles is obtuse. (We will see in Section
6.5 of [Wu2020b] that a triangle cannot have more than one obtuse angle or more
than one right angle.)
Let two lines meet at O, and suppose one of the four angles, say ∠AOB as
shown, is a right angle.
[Figure: two lines meeting at O, with A and A′ on one line and B and B′ on the other; ∠AOB is marked as a right angle.]
Then we claim that all the remaining angles are also right angles; i.e., |∠BOA′| =
|∠A′OB′| = |∠B′OA| = 90°. This is because, by (iii) and (iv) of (L6), |∠BOA′| =
|∠AOA′| − |∠AOB| = 180° − 90° = 90°. Similarly, the remaining two angles
are also 90°. It follows that when two lines meet, one of the four angles so
produced is a right angle if and only if all four angles so produced are right angles.
By definition, two lines are perpendicular if one of the four angles at the point
of intersection is a right angle. In symbols, LAO ⊥ LOB in the preceding figure,
although it is equally common to write instead AO ⊥ OB. A ray ROC that lies
in an angle ∠AOB¹⁹ is called an angle bisector of ∠AOB if the degrees of the
adjacent angles ∠AOC and ∠COB (with respect to ∠AOB) are equal. By a com-
mon abuse of language, sometimes we also say that the line LOC or the segment
OC (rather than the ray ROC ) is an angle bisector of ∠AOB. We also say ROC
or LOC bisects ∠AOB.
[Figure: an angle ∠AOB with a ray ROC lying in it and bisecting it.]
Using assumption (L6)(ii), we see that an angle can have one and only one
angle bisector (Exercise 3 on page 198). Therefore if CO ⊥ AB as shown below,
then ROC is the unique angle bisector of the straight angle ∠AOB.
[Figure: a line through A, O, and B, with C off the line and CO ⊥ AB.]
We thus have
Lemma 4.11. Let L be a line and O a point on L. Then there is a unique line
passing through O and perpendicular to L.

With the availability of measurements for both angles and line segments, we
can complete the list of standard definitions. First, if AB is a segment, then by
(L5)(iii), we may assume that LAB can be made into a number line so that the
segment AB is just [A, B], and dist(A, B) = B − A, where the A and B in "B − A"
are understood to be numbers. If C = ½(A + B), then C ∈ LAB and it is easy to
check that

A < C < B and C − A = B − C = ½(B − A).

(Indeed, C − A = ½(A + B) − A = ½(B − A) and B − C = B − ½(A + B) = ½(B − A),
and ½(B − A) > 0 because A < B.)
[Figure: the number line LAB with A, C, B marked in that order, C halfway between A and B.]

From A < C < B, we get C ∈ [A, B], and from C − A = B − C, we get dist(A, C) =
dist(C, B), or in the language of length as defined on page 185, |AC| = |CB|.
This point C is called the midpoint of the segment AB; i.e., C ∈ AB and C
is equidistant from the endpoints A and B of AB. Then, analogous to the angle
bisector of an angle, the perpendicular bisector of a segment AB is the line
perpendicular to LAB and passing through the midpoint of AB. It follows from the
uniqueness of the line perpendicular to a line passing through a given point that
there is one and only one perpendicular bisector of a segment.
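The two claims A < C < B and C − A = B − C = ½(B − A) can be spot-checked for particular endpoints using exact rational arithmetic. The following is a minimal Python sketch, not a proof. It assumes coordinates on LAB chosen as in (L5)(iii) with A < B, and the function names are hypothetical. Python's `fractions.Fraction` keeps every computation exact, so the equalities are tested without rounding error.

```python
from fractions import Fraction


def midpoint(a, b):
    """Coordinate of the midpoint C = (A + B)/2 of the segment [A, B] on a number line."""
    return (a + b) / 2


def check_midpoint(a, b):
    """Check A < C < B and C - A = B - C = (B - A)/2 exactly."""
    c = midpoint(a, b)
    return a < c < b and c - a == b - c == (b - a) / 2


a, b = Fraction(-3, 4), Fraction(5, 2)
print(midpoint(a, b))        # 7/8
print(check_midpoint(a, b))  # True
```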
19 Recall that an angle is a region, so it makes sense to say a ray lies in ∠AOB.
We now introduce some common names for certain triangles and quadrilaterals
(see page 171 for the definitions).
An equilateral triangle is a triangle with three sides of the same length, and
an isosceles triangle is one with at least two sides of the same length. (Thus
by our definition, an equilateral triangle is isosceles.) A quadrilateral all of whose
angles are right angles is called a rectangle.20 A rectangle all of whose sides are of
the same length is called a square. Be aware that, at this point, we do not know
whether there is a square or not, or worse, whether there is a rectangle or not. (If the
sum of (the degrees of) the four angles of every quadrilateral were 361°,
then clearly no rectangle could exist, much less a square.) Now for a quadrilateral
ABCD, the sides AB and CD are formally defined to be opposite sides of the
quadrilateral, as are the sides BC and AD.
[Figure: a quadrilateral ABCD.]
A quadrilateral with at least one pair of parallel opposite sides21 is called a trape-
zoid. A quadrilateral with two pairs of parallel opposite sides is called a parallel-
ogram. A quadrilateral with four sides of equal length is called a rhombus. We
shall prove in Section 6.2 of [Wu2020b] that rhombi are parallelograms.
A debate of long standing in TSM is about whether one should define an isosce-
les triangle to have exactly two sides of equal length, a rectangle to be a quadrilateral
with four right angles but with at least two unequal adjacent sides, or a trapezoid
to be a quadrilateral with exactly one pair of parallel sides. This is not a productive
debate because there are valid mathematical reasons for the convention adopted in
the preceding paragraph.22 This is another reason why we should get rid of TSM
so that equilateral triangles are allowed to be special cases of isosceles triangles,
squares are allowed to be special cases of rectangles, etc.

Polygonal regions

In the above catalog of names for polygons, we all know that equilateral trian-
gles and squares are special; they are examples of regular polygons. A polygon is
by definition a regular polygon if all its sides have the same length, all its angles
(at the vertices) have the same degree, and it is inscribed in a circle; i.e., all its
vertices lie on a circle. There is an equivalent way of expressing the last condition
about being inscribed in a circle that turns out to be important for other reasons,
and it involves convexity. This is our next issue.
We already mentioned the fact that school mathematics conflates a closed disk
of radius r and center O (see p. 186 for the definition) with the circle of the same
20 Mathematical Aside: This is the correct definition of a rectangle because in the non-
Euclidean geometries, there can be no rectangles in this sense. See [Greenberg, page 250].
21 Strictly speaking, the correct statement is that "the lines containing the opposite sides are

parallel". But this kind of abuse of language is common in mathematics.


22 For example, if we know that the area of a rectangle is the product of (the lengths of) two
adjacent sides, the present convention immediately implies that the area of a square is the square
of (the length of) one side, but if we do not allow a square to be a rectangle, then we must prove
anew the area formula for a square.
radius and center. Thus when school mathematics talks about "the area of the
circle of radius r around O", what it means is actually "the area of the closed disk
of radius r around O". There is little hope of forcing a change of terminology in
school mathematics at this late date, so just grin and bear it. Nevertheless, we
want to create a mathematical framework in which the difference between a circle
and a disk of the same radius and center can be carefully analyzed when the need
arises (as indeed it will). In everyday language, we normally refer to a circle as the
"boundary" of the closed disk with the same radius and center. This way of talking
about a circle and its associated disk is both unambiguous and convenient, so there
is no reason not to adopt it in mathematics provided we can make precise sense of
"boundary". This we now do.
Let S be a subset of the plane Π. A point B is a boundary point of S in
Π if every disk23 centered at B of positive radius—no matter how small—contains
a point in S and a point not in S. Intuitively, this means that a boundary point
of S is one that can be approached arbitrarily closely by points in S and also by
points outside S. In particular, a boundary point of S is never a point "completely
inside" S or "completely outside" S. So this conforms to our naive conception
of a "boundary" point of S. By definition, the boundary of S is the set of all
the boundary points of S. We leave as an exercise (Exercise 4 on page 198) the
verification that the circle of radius r (r > 0) about O is the boundary of the closed
disk of radius r about O as well as the boundary of the open disk of radius r about
O (see p. 186 for the definitions of open disk and closed disk).
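The definition of a boundary point can be explored numerically. This may help explain why the circle is the boundary of both the open disk and the closed disk (Exercise 4). The following is a minimal Python sketch that assumes coordinates in the plane, which this chapter has not yet introduced. All function names are hypothetical. The definition refers to every positive radius and every point, but a program can only sample a few radii and a few points, so the output is suggestive rather than a proof.

```python
import math
from functools import partial


def in_closed_disk(p, center, r):
    """Membership in the closed disk {P : |P - center| <= r}."""
    return math.dist(p, center) <= r


def in_open_disk(p, center, r):
    """Membership in the open disk {P : |P - center| < r}."""
    return math.dist(p, center) < r


def looks_like_boundary_point(b, in_set, radii=(1.0, 0.1, 0.01, 0.001), directions=36):
    """Heuristic test of the definition of a boundary point.

    For each radius eps, sample b itself and points at distances eps/2
    and eps/4 from b in equally spaced directions (all inside the disk
    of radius eps about b). Require at least one sample in the set and
    one sample outside it. Only finitely many radii and points are
    examined, so a True answer suggests, but does not prove, that b is
    a boundary point.
    """
    for eps in radii:
        samples = [b]
        for k in range(directions):
            angle = 2 * math.pi * k / directions
            for t in (eps / 2, eps / 4):
                samples.append((b[0] + t * math.cos(angle), b[1] + t * math.sin(angle)))
        has_inside = any(in_set(p) for p in samples)
        has_outside = any(not in_set(p) for p in samples)
        if not (has_inside and has_outside):
            return False
    return True


O, r = (0.0, 0.0), 1.0
closed_disk = partial(in_closed_disk, center=O, r=r)
open_disk = partial(in_open_disk, center=O, r=r)
print(looks_like_boundary_point((1.0, 0.0), closed_disk))  # True: a point of the circle
print(looks_like_boundary_point((1.0, 0.0), open_disk))    # True: the same point, open disk
print(looks_like_boundary_point((0.5, 0.0), closed_disk))  # False: well inside the disk
print(looks_like_boundary_point((2.0, 0.0), closed_disk))  # False: well outside the disk
```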
Note that the concept of the boundary of a set S in Π is dependent on the
fact that S is being considered as a subset of the plane Π. Thus if S is a segment
AB in the plane, then the boundary of AB, as a subset of the plane, is the whole
segment AB itself. This is in contrast with the fact that, when we talk about "the
boundary of the segment AB in the line LAB ", we mean the two endpoints A and B.

Activity. Verify the last claim that the boundary of a segment AB in the
plane is AB itself.

Denote the open disk of radius r with center O by D and the circle with the
same center and the same radius by C. As noted, the boundary of D is C. There is
another set with C as its boundary: if E denotes the set of all the points P so that
|P O| > r, then E too has C as its boundary (see Exercise 4 on page 198 again).
We will call E the exterior of C. To distinguish between the two sets D and E,
we introduce the following concepts: a set S in the plane is said to be bounded
if it is contained in some closed disk of radius R; otherwise it is unbounded. For
example, a circle of radius r is bounded, and so is an open disk or a closed disk of
radius r, but any line, any ray, or any half-plane of a line is unbounded. We can
also understand boundedness from a slightly different point of view, as follows.

Lemma 4.12. A set S in the plane is bounded if and only if there is a point O′
in the plane and a positive number R so that the distance of every point in S from
O′ is ≤ R.
Proof. First let S be bounded. So it is contained in a closed disk D of radius R. If
the center of D is O′, then the distance of each point of S from O′ is ≤ R because
23 It does not matter whether the disk is closed or open.
each point of S is a point of D. Conversely, if a set S has the property that for
some fixed positive number R, every point of S is of distance ≤ R from a fixed
point O′ in the plane, then S is bounded because S is contained in the closed disk
of radius R centered at O′. The proof is complete.

With the same notation, we see that the open disk D is bounded but the exterior
E of the circle C is not. It is clear that the plane is the disjoint union of D, C, and
E (see page 175 for the definition of disjoint union). There is also a property about
D and E that is intuitively obvious:
Any segment joining a point in the open disk D to a point in the
exterior E of the circle must intersect the circle C.
Like many "obvious" statements in geometry, a proof of this assertion involves
subtle concepts about real numbers. (Compare the intermediate value theorem in
Section 6.2 of [Wu2020c].) For this reason, we will assume this fact here without
proof.
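Although the proof is deferred, the crossing point can be located numerically by bisection. Bisection is the same interval-halving idea used in one standard proof of the intermediate value theorem. The following is a minimal Python sketch that assumes coordinates in the plane and floating-point arithmetic; `crossing_point` is a hypothetical name. The code finds an approximate crossing but does not replace the proof, because the real difficulty lies in the completeness of the real numbers, which the computation takes for granted.

```python
import math


def crossing_point(p, q, center, r, tol=1e-12):
    """Approximate a point where segment pq meets the circle of radius r about center.

    Assumes p is in the open disk and q is in the exterior. With
    X(t) = p + t(q - p) and g(t) = |X(t) - center| - r, we have
    g(0) < 0 < g(1); bisection keeps an interval [lo, hi] with
    g(lo) < 0 <= g(hi) and halves it until it is shorter than tol.
    """
    def point(t):
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    def g(t):
        return math.dist(point(t), center) - r

    if not (g(0.0) < 0 < g(1.0)):
        raise ValueError("p must lie in the open disk and q in the exterior")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return point(hi)


print(crossing_point((0.0, 0.0), (3.0, 4.0), (0.0, 0.0), 1.0))  # approximately (0.6, 0.8)
```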
A set in the plane that contains all of its boundary is called a closed set. In
terms of the preceding notation, the closed disk with center O and radius r, to be
denoted by D̄, is a closed set. Observe that D̄ = D ∪ C, where we recall that the
symbol "∪" stands for union. In this volume, we will refer to the closed disk D̄
with radius r and center O as the closed set inside the circle C. Sometimes we
say more simply that D̄ is the inside of C.
We have discussed the situation of the circle in such detail because it sets up a
model for the discussion of polygons (see page 171 for the definition). The linguistic
abuse of the word "circle" described on page 194 in fact spills over to "triangle",
"quadrilateral", and in general, "polygon". For example, a triangle is by definition
the union of the three segments consisting of the three sides, but when we speak
of, e.g., "the area of the triangle", we certainly do not mean "the area of the three
segments" but, rather, "the area inside those three segments", where the meaning
of inside is usually understood in an intuitive and imprecise sense.
We now try to shed some light on the word "inside". First, let us draw a
parallel with the situation of a circle by invoking a theorem that we will not prove
in this volume. Two new definitions will be needed for the statement of the theorem.
A polygonal segment is a finite collection of segments A₁A₂, A₂A₃, A₃A₄, . . . ,
Aₙ₋₂Aₙ₋₁, Aₙ₋₁Aₙ, with the understanding that these segments could be collinear
and that there may be intersections among them. Then a region R in the plane
is said to be connected if any two points in R can be joined by a polygonal
segment that lies completely in R. It is easy to see that the open disk D with
center O and radius r and the exterior E of the circle with center O and radius r
are disjoint connected sets (see Exercise 5 on page 198). Also, recall the definition
of the complement of a subset S in the plane as all the points in the plane not lying
in S (see the footnote on page 181).

Theorem 4.13. The complement of a polygon P consists of two nonempty
planar regions B and E with the following properties:
(i) B and E are both connected, B is bounded and E is un-
bounded, and P is their common boundary. Moreover, the plane
is the disjoint union of the three sets B, E, and P.
(ii) A segment joining a point of B to a point of E must intersect
the polygon P.
In addition, suppose we have two nonempty planar regions B′ and E′ so that P is
their common boundary and so that the plane is the disjoint union of the three sets
B′, E′, and P. Then after a change of notation, if necessary, we have B′ = B and
E′ = E.

This theorem should remind you of the plane separation assumption (L4) on
page 176.
Not to belabor the point, but it should be obvious that if we replace P, B, and
E in the theorem by the sets C, D, and E (that arose in the preceding discussion
of the circle C), respectively, then the theorem (except for the last part) is just a
summary of what we found out about the circle C. From now on, we will denote the
bounded set in the theorem exclusively by B. Then we call the union of B and P
the inside of P or the region enclosed by P; the region E in Theorem 4.13 will
be called the exterior of P.24 It follows from Theorem 4.13 that both the inside
of a polygon and the union of a polygon with its exterior are closed sets, and both
have P as boundary.
We will not prove Theorem 4.13, to avoid getting sidetracked: the proof is long,
and it involves technical arguments that cannot be said to be basic to the
K–12 curriculum. It may be mentioned that the standard statement of Theorem
4.13 does not include the last part about when "B′ = B and E′ = E". It is
included here because of our later needs and because it is an easy consequence of
the first part of the theorem. In any case, an essentially complete and readable
proof of Theorem 4.13 is given on pp. 267–269 of the classic What Is Mathematics?
([Courant-Robbins]).²⁵
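Theorem 4.13 is not proved here. Still, computational geometry commonly decides whether a point lies in the bounded region B or in the exterior E with the even-odd (ray-casting) rule: cast a ray from the point and count how many sides of the polygon it crosses. The following is a minimal Python sketch. It assumes the polygon is given by its vertices in order and that the point does not lie on P itself; `strictly_inside` is a hypothetical name. It illustrates the two-region picture of the theorem but is not a proof of it.

```python
def strictly_inside(point, polygon):
    """Even-odd rule: True if point lies in the bounded region B, False if in E.

    polygon lists the vertices of P in order; point is assumed not to lie on P.
    """
    x, y = point
    n = len(polygon)
    crossings = 0
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Half-open rule, so a vertex lying on the ray is not counted twice.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this side meets the horizontal line through point.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:  # the crossing lies on the ray going to the right
                crossings += 1
    return crossings % 2 == 1


triangle = [(0, 0), (4, 0), (0, 3)]
plus = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
        (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1)]
print(strictly_inside((1, 1), triangle))  # True
print(strictly_inside((3, 3), triangle))  # False
print(strictly_inside((1.5, 1.5), plus))  # True: the center of the cross
print(strictly_inside((0.5, 0.5), plus))  # False: a corner notch outside the cross
```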
One can easily believe this theorem by looking at a few pictures; the shaded
set in each of the following is the inside of the polygon in question.

From Theorem 4.13, a polygon is the boundary of the inside of the polygon,
which is a closed bounded set. This motivates the following definition.

Definition. A polygonal region is the inside of some polygon.

24 Caution: The "inside" of P, as defined, is a closed set, whereas the "exterior" of P is an

open set. This is the reason why we use the term "inside" rather than "interior".
25 This theorem is the special case of the famous Jordan Curve Theorem when the curve in

question is a polygon. One can find an elementary proof of this theorem in [Henle]. Inciden-
tally, the Courant-Robbins volume is highly recommended as a general introduction to advanced
mathematics.
Thus a polygonal region is always a closed set, by definition, because it includes
the boundary polygon. It follows that every polygon is the boundary of its polygonal
region. When school mathematics talks about "the area of a polygon", what is
actually meant is "the area of the polygonal region inside the polygon".
Finally, we are in a position to formulate an equivalent definition of a regular
polygon. We say a polygon is convex if the inside of the polygon is convex, i.e.,
the region enclosed by the polygon is a convex set. Then the decisive result in this
connection is the following.
Theorem 4.14. A polygon whose sides have the same length and whose angles
have the same degree is a regular polygon if and only if it is convex.
We need Theorem 4.14 because it clarifies the concept of a regular polygon. It
tells us that we could have defined a regular polygon as a convex polygon whose
sides have the same length and whose angles have the same degree. This would
be a more direct definition in that it does not involve a circle. We should point
out, however, that there is some subtlety involved in either definition. Because of
our standing convention that an angle is automatically taken to be the convex set
determined by the two rays, the cross on the right in the preceding picture of three
polygons would be a regular 12-gon if we do not require convexity. Similarly, this
12-gon would also qualify as a regular polygon if we do not require it to be inscribed
in a circle. We will not prove Theorem 4.14 in this volume but will leave the proof
to the article, A characterization of regular polygons, on the author’s homepage,
https://math.berkeley.edu/~wu/. This is because Theorem 4.14 is not needed
in the remaining geometric discussion in these volumes and the proof is also long
and sophisticated.
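The subtlety just described can be seen concretely. Suppose each angle at a vertex is taken to be the convex angle formed by the rays to the two neighboring vertices, as in our standing convention. Then a cross-shaped 12-gon with unit sides has all sides of length 1 and all angles of 90°, yet it is not convex. The following is a minimal Python sketch that checks this in coordinates. The vertex list of the cross is a hypothetical example of such a polygon, and all function names are hypothetical. The convexity test (every turn has the same orientation) is a standard criterion for polygons whose sides do not cross; we assume it here rather than prove it.

```python
import math


def side_lengths(poly):
    """Lengths of the sides of a polygon given by its vertices in order."""
    n = len(poly)
    return [math.dist(poly[i], poly[(i + 1) % n]) for i in range(n)]


def vertex_angles(poly):
    """Degree of the convex angle at each vertex, formed by the rays to its two neighbors."""
    n = len(poly)
    angles = []
    for i in range(n):
        (px, py), (vx, vy), (qx, qy) = poly[i - 1], poly[i], poly[(i + 1) % n]
        ux, uy, wx, wy = px - vx, py - vy, qx - vx, qy - vy
        cos_a = (ux * wx + uy * wy) / (math.hypot(ux, uy) * math.hypot(wx, wy))
        # Clamp to [-1, 1] to guard against rounding before taking acos.
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_a)))))
    return angles


def turns_one_way(poly):
    """Convexity test for a polygon whose sides do not cross: all turns have one orientation."""
    n = len(poly)
    orientations = set()
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = poly[i - 1], poly[i], poly[(i + 1) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if abs(cross) > 1e-12:
            orientations.add(cross > 0)
    return len(orientations) == 1


def all_equal(values, tol=1e-9):
    """Whether all numbers in the list agree up to floating-point error."""
    return all(math.isclose(v, values[0], abs_tol=tol) for v in values)


plus = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
        (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1)]
hexagon = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
           for k in range(6)]

for name, poly in [("plus", plus), ("hexagon", hexagon)]:
    print(name, all_equal(side_lengths(poly)), all_equal(vertex_angles(poly)),
          turns_one_way(poly))
# plus True True False
# hexagon True True True
```

Both polygons pass the side and angle tests, so only the convexity test separates the cross from the genuinely regular hexagon.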
If one is willing to spend the effort, one can define the concept of an interior
angle of a polygon. When that is done, then a regular polygon could equally well be
defined as a polygon whose sides all have the same length and whose interior angles
have the same degree. But, again, we will not enter into those details because of
the sophistication involved.
A triangle is always convex (Exercise 6 on p. 198), but for a triangle (a 3-gon)
to be regular, it is not necessary to assume both the equality of the lengths of the
sides and the equality of the degrees of the angles. It will be seen in Chapter 6
of [Wu2020b] that a triangle is regular if either the sides have the same length or
the angles have the same degree (see Theorem G26 and Exercise 2 in Section 6.2 of
[Wu2020b]). For this reason, a regular 3-gon is just an equilateral triangle (which
literally means a triangle with all sides of the same length). Moreover, as soon as
we can show that the sum of all the angles of a convex quadrilateral is 360◦ , it
will follow that a regular 4-gon must be a square. Exercise 9 in Exercises 6.8 of
[Wu2020b] will show that regular n-gons do exist for any whole number n ≥ 3, but
a priori this is not obvious.

Pedagogical Comments. We have tried to give precise definitions for as
many concepts as feasible, but we have also left a few undefined (including region
and length of arc²⁶), and the hope is that your intuition will fill in the gaps in
the meantime. The reason for this omission is that the definitions of region, length
of arc, and a host of other geometric concepts are not simple. We did make an
26 Length of arc will, however, be precisely defined in Sections 4.2 and 4.3 of [Wu2020c].
exception by defining boundary of a set and closed set because they are absolutely
essential for the considerations of area in Chapter 4 of [Wu2020c]. However, we
suggest that you do not make heavy weather of these two definitions in the school
classroom because school students have far more pressing concerns than learning
these subtle definitions. The everyday meaning of "boundary" is good enough most
of the time in the school setting. Therefore, these definitions are for your own
conceptual clarification as a teacher: to the extent possible, these three volumes will try
to convince you at every step of the way that there is no room for ambiguity in
mathematics. End of Pedagogical Comments.

Exercises 4.2.
(1) Imagine the hands of a clock to be idealized rays emanating from the
center of the clock. (a) What is the angle between the hour and minute
hands precisely at 8:20 am?27 (b) At what time between 8 am and 9 am
will the hour and minute hands coincide? (c) Is there any time—other
than 12 am and 12 pm—that the hour, minute, and second hands all
coincide?28
(2) Prove the characterization, stated on page 181, of Γ , the nonconvex angle
determined by two distinct rays ROA and ROB with a common vertex O.
(3) Prove on the basis of (L6) that every angle has one and only one angle
bisector.
(4) To do this exercise, use any theorem you know from high school geometry,
but be sure to state precisely what you are using. (i) Prove that the
boundary of the open (or closed) disk of radius r around a point O is the
circle of radius r around O. (ii) Prove that the exterior of a circle C (see
page 194) has C as its boundary. (iii) Prove that the exterior of a circle is
never convex.29
(5) Prove that if C is a given circle with center O and radius r, then its open
disk, closed disk, and exterior are all connected. (For the connectedness
of the exterior, use any theorem you know from high school geometry, but
be sure to state precisely what you are using.)
(6) A triangular region in the plane is by definition the intersection of the
three angles of a triangle. (a) Show that a triangular region is always
convex. (b) Let T be a triangular region in the plane. Show—without
invoking Theorem 4.13 on page 195—that if P ∈ T and Q is in the exterior
of T , then the segment P Q intersects the boundary of T .
(7) Given a circle C and a point P on C, a line LP is said to be a tangent to
C at P if LP intersects C exactly at P ; i.e., LP ∩ C = {P }. Assume that
every point of a circle has a tangent (a fact we will prove in Section 6.8 of
[Wu2020b]) and that the circle always lies entirely in a closed half-plane
of each tangent. Then prove that any disk, open or closed, is convex.
(Caution: This exercise is not as easy as it seems; try to do everything
according to the definitions.)

27 Exercise due to Tony Gardiner.
28 Both (b) and (c) are due to Ole Hald.
29 A disk, open or closed, is always convex, but the proof is not entirely trivial; see Section

6.8 of [Wu2020b].
(8) Assume that for any given subset S of the plane, if a segment joins a point
in S and a point not in S, then the segment contains a boundary point of
S.30 Then prove that any closed bounded set in the plane with a circle C
as its boundary is the closed disk with the same radius and center.

4.3. Transformations of the plane


The development of geometry in these volumes is built on the foundation of what
we call the basic isometries (consisting of rotations, reflections, and translations)
and dilations. These are examples of the general concept of transformations of
the plane. This section gives a brief introduction to the generalities of transfor-
mations, with special emphasis given to those that possess an inverse. To give the
generalities some substance, we zero in on the rotations by defining them precisely.
Rotations fall into two categories: clockwise and counterclockwise; the meanings of
these terms are intuitively clear but their definitions are cumbersome. We will use the
terminology but leave the definitions to the appendix of this section (pp. 212ff.).
Why transformations (p. 199)
Rotations (p. 201)
Generalities about transformations (p. 205)
Inverse transformations (p. 208)
Appendix (p. 212)

Why transformations

Given two segments AB and CD, how can we compare which one is longer
without first getting their individual lengths? For example, suppose we have a
rectangle ABCD. Do the opposite sides AB and CD have the same length?

[Figure: a rectangle ABCD, with A and D along the top side and B and C along the bottom side.]
Similarly, given two angles, how can we compare which one is bigger without
first getting their individual degrees? For example, if two lines L and L′ are parallel
and they are intersected by another line, how can we tell if the angles ∠a and ∠b
as shown have the same degree?

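[Figure: parallel lines L and L′ intersected by a third line, with the angle ∠a marked at L and the angle ∠b marked at L′.]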

30 Mathematical Aside: This simple fact requires the least upper bound axiom for its proof;

see Section 2.1 in [Wu2020c].


These questions, while seemingly silly when the figures are drawn on a piece of
paper, take on a new meaning if the sides of the rectangle ABCD are a few miles
apart or if the lines L and L′ in the case of angles are also very far apart. We are
therefore confronted with a real-world situation of having to find out whether two
geometric figures (two segments, two angles, or two triangles) in different parts of
the plane are "the same" in some sense (e.g., same length, same degree, etc.).
The traditional way of dealing with this problem in Euclidean geometry is to
write down a set of axioms which abstractly guarantee that two triangles are "the
same" (i.e., congruent). This is how it is usually done in TSM, and the drawback
of such an approach is that, in a mathematical environment where proofs and
reasoning are scarce or nonexistent, to introduce students to proofs by the opaque
formalism of axioms is to invite discontent and also to ensure nonlearning. As recently as the
past decade, the teaching of geometry in many high schools still vacillated between
teaching proofs by rote via axioms from the beginning and teaching no proofs at
all.31
We propose a third approach, one that is more direct and more tangible and
that makes use of three standard "moves"32 to bring one figure on top of another
in order to check whether two geometric figures are congruent. Even more impor-
tantly, we base proofs of theorems directly on these "moves". In this way, congru-
ence ceases to be mysterious and abstract; it becomes a tactile concept which can
be realized concretely via these standard "moves". The key issue then is what it
means to "move" things around in a plane, with the understanding that the lengths
of segments and degrees of angles remain unchanged in the process. Since "moving
things around in a plane" is exactly where the concept of a "transformation" comes
in, we first define transformations.

For convenience, we denote the plane by Π. A transformation F of Π is a
rule that assigns to each point P of Π a unique point F(P) (read: "F of P") in Π.
We also say F maps P to F (P ) or, sometimes, F moves P to F (P ). Indeed,
it is intuitively appealing to think of a transformation as a way of "moving" the
points of the plane around.
There are two extreme examples of a transformation. The first is a constant
transformation: if X is a point in Π, then the transformation FX which assigns
to every P of Π the same point X is called a constant transformation. Thus
FX (P ) = X for every point P in Π. In this case, we can think of a constant
transformation as a rule that moves every point of the plane to a single point. The
other extreme is the identity transformation I which maps every P of Π to the
same point P . Thus I(P ) = P for every P in Π and I does not move any point at
all.
To acquire some intuitive feelings for transformations in general, we need some
nontrivial examples beyond the constant and identity transformations. In this
and the next two sections, we will introduce three kinds of transformations that
will be among the mainstays of this and subsequent volumes. They are examples
of isometries: a transformation F of the plane is said to be an isometry if F
preserves distance in the sense that dist(F (P ), F (Q)) = dist(P, Q) for all the
points P and Q in the plane Π. Since the length of a segment is defined in terms

31 One can get a glimpse of the general situation from the book review [Wu2004a].
32 To be called basic isometries (page 217).
of the distance function (see page 185), an equivalent definition is therefore that
an isometry F is a transformation that preserves length; i.e., the length of any
segment PQ, where P and Q are points in the plane, is equal to the length of the
segment P′Q′, where P′ = F(P) and Q′ = F(Q). Thus, if a picture of P, Q,
P′, and Q′ looks like the following, then the transformation F is not an isometry
because the length of PQ is visibly longer than the length of P′Q′.
[Figure: points P and Q far apart, and their images P′ and Q′ much closer together.]

On the other hand, the identity transformation is an isometry. The next subsection
will introduce a class of isometries called rotations.
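In computational terms, a transformation of Π is simply a function from points to points. The following is a minimal Python sketch that models points as coordinate pairs (an assumption, since coordinates have not been introduced in this chapter). It tests the isometry condition dist(F(P), F(Q)) = dist(P, Q) on randomly chosen pairs of points. The function names are hypothetical. A failed test proves that F is not an isometry. Passing the test is only evidence, because an isometry must preserve distance for all pairs of points.

```python
import math
import random


def constant_transformation(x):
    """F_X: the transformation sending every point P to the fixed point x."""
    return lambda p: x


def identity(p):
    """I: the identity transformation, I(P) = P."""
    return p


def halve(p):
    """A transformation moving every point halfway toward the origin (0, 0)."""
    return (p[0] / 2, p[1] / 2)


def preserves_distance_on_samples(f, trials=1000, seed=0):
    """Test dist(F(P), F(Q)) == dist(P, Q) on `trials` random pairs of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        q = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        if not math.isclose(math.dist(f(p), f(q)), math.dist(p, q), abs_tol=1e-9):
            return False  # a counterexample: f is definitely not an isometry
    return True  # no counterexample found among the samples


print(preserves_distance_on_samples(identity))                             # True
print(preserves_distance_on_samples(constant_transformation((1.0, 2.0))))  # False
print(preserves_distance_on_samples(halve))                                # False
```

The transformation `halve` behaves like the picture above: every image segment P′Q′ is half as long as PQ.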

Rotations

To define rotations, we will have to make use of the concept of a clockwise or
counterclockwise direction on a given circle. For reasons given in the Pedagogical
Comments on page 204, we will use these two (related) concepts in an intuitive
sense without a precise definition of either. This statement must be supplemented
with two additional comments, however. First, in the appendix of this section
on pp. 212ff., we do provide precise definitions of clockwise and counterclockwise
rotations. Therefore we have in no way deviated from our stated goal of bringing
precision to any discussion of school mathematics. Second, it will be seen that for
the purpose of understanding clockwise or counterclockwise rotations, an intuitive
understanding of these terms is sufficient. That said, let us single out the two (very
plausible) properties we need about clockwise and counterclockwise rotations. For
counterclockwise rotations, these are:

(a) If P1 on a circle is turned φ degrees (0 < φ ≤ 360) counterclockwise to P2, P2 is turned θ degrees counterclockwise (0 < θ ≤ 360) to P3, and φ + θ ≤ 360, then P3 can be obtained from P1 by turning it (φ + θ) degrees counterclockwise. See the left figure below. (Compare Exercise 9 on page 216.)

[Figure: left, points P1, P2, P3 on a circle centered at O, with the angles φ and θ marked; right, points Q1, Q2, Q3 on a circle centered at O, with the angles φ and θ marked.]

202 4. BASIC ISOMETRIES AND CONGRUENCE

(b) Suppose Q2 on a circle is obtained from Q1 on the circle by turning φ degrees counterclockwise and Q3 on the circle is obtained from the same Q1 by turning θ degrees counterclockwise, where 0 < φ, θ < 360. If φ ≠ θ, then Q2 ≠ Q3. See the right figure above. (Compare the assumption of Lemma 4.10 on page 190.)

The reason for the requirement in (a) that φ + θ ≤ 360 is that we do not yet know what it means to turn (counterclockwise or clockwise) through an angle greater than 360°.
Of course, the analogs of (a) and (b) for clockwise rotations are assumed to be true as well.

Definition. Let O be a point in the plane Π and let a number θ be given so that −360 ≤ θ ≤ 360. Then the rotation of θ degrees around O (sometimes we say with center O) is the transformation ρθ defined as follows: ρθ(O) = O, and if P ∈ Π and P ≠ O, let C be the circle of radius |OP| centered at O. Then:

If θ ≥ 0, ρθ(P) is the point Q on C such that Q is obtained from P by turning θ degrees in the counterclockwise direction along C (in other words, |∠QOP| = θ°).

If θ < 0, ρθ(P) is the point Q on C obtained from P by turning |θ| degrees in the clockwise direction along C (in other words, |∠POQ| = |θ|°).

[Figure: two panels, each showing the center O, a point P, and its image Q with the turning angle θ marked; counterclockwise for θ ≥ 0, clockwise for θ < 0.]

Note that the assignment of Q to the given P is unambiguous on account of Lemma 4.10 on page 190. Hence ρθ is well-defined (i.e., it makes sense).
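Purely as a learning aid, the definition can be mimicked in coordinates. The sketch below assumes points are complex numbers x + yi and that "counterclockwise" means the usual positive orientation of the xy-plane; multiplying P − O by the unit complex number of argument θ° turns it θ degrees about O while keeping |OP| fixed. The function name `rotate` is ours, not the book's.

```python
import cmath
import math

def rotate(p: complex, center: complex, theta: float) -> complex:
    """Rotation of theta degrees around center, for -360 <= theta <= 360.

    Points are complex numbers x + yi. A positive theta turns counterclockwise
    (usual orientation of the xy-plane), a negative theta clockwise.
    The center is fixed, because (center - center) times anything is 0.
    """
    if not -360 <= theta <= 360:
        raise ValueError("rotations are only defined for -360 <= theta <= 360 here")
    return center + (p - center) * cmath.exp(1j * math.radians(theta))

O = 0 + 0j
print(rotate(1 + 0j, O, 90))   # approximately 1j, i.e., the point (0, 1)
print(rotate(1 + 0j, O, -90))  # approximately -1j, i.e., the point (0, -1)
```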
For an intuitive understanding of rotations, the following activity, preferably
done in class, will be helpful. It gives a tactile realization of a rotation of 32 degrees.

Activity. This Activity will give an idea of what ρ, a counterclockwise rotation of 32°, does to the plane. On a piece of paper, which is our model for the plane Π, fix a point O, and then draw a geometric figure S and a point Q, as shown below in black. Place a clear transparency over this sheet of paper with figure S and points O and Q in black. With a different color (say, red), copy on the transparency the S, O, and Q right on top of the originals. In particular, the red point O on the transparency is on top of the point O on the paper.

[Figure: the black figure S and point Q, their red copies S′ and Q′ after a 32° counterclockwise turn around O, and the angles ∠AQB and ∠A′Q′B′.]

Now use a pointed object (e.g., the needle of a compass) to pin the transparency to the paper at the point O. Holding the paper fixed, rotate the transparency around O, counterclockwise by 32 degrees³³, and stop. For the moment, ignore the angles ∠AQB and ∠A′Q′B′ in the above picture and concentrate on S and Q. We will denote the red geometric figure in this new position by S′. S′ is exactly where ρ has moved S. Similarly, we denote the red point Q in this new position by Q′; this point Q′ is where the rotation ρ has moved Q. Notice that ρ does not move O, the center of rotation. Needless to say, there is nothing special about the number 32; one should do this Activity with angles of any degree, clockwise or counterclockwise.
This Activity suggests that the rotation ρ is an isometry. Indeed, if A and B are two points in the plane Π and ρ moves them to A′ and B′, respectively (see the preceding picture), then the Activity tells us how to locate A′ and B′; namely, copy O, A, and B in red on a piece of transparency, and then rotate the transparency 32 degrees around O. The positions of the red A and red B on the transparency are the locations of A′ and B′, respectively. Since the distance from the red A to the red B on the transparency is exactly the distance from A to B in Π, ρ is distance-preserving; i.e., ρ is an isometry. The Activity also suggests that ρ preserves the degrees of angles in the sense that, in the notation above, the angle ∠AQB, for example, is moved by ρ to ∠A′Q′B′, and since the red angle ∠A′Q′B′ is a copy of ∠AQB on the transparency, we have |∠A′Q′B′| = |∠AQB|.
Everything we have said thus far has nothing in particular to do with "32 de-
grees", so what the Activity suggests is that any rotation is an isometry that also
preserves degrees of angles. However, from the point of view of our mathematical
development, the fact that a rotation is an isometry is something that cannot be
proved but must be assumed. See page 217 and page 237.
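A numerical analog of the Activity, assuming complex-number coordinates with counterclockwise as the positive orientation: rotate some sample points 32 degrees about O and compare distances and angle degrees before and after. The sample points and the helper `angle_degree` are hypothetical; agreement up to rounding is evidence for, not a proof of, the assumption that rotations are isometries that preserve degrees of angles.

```python
import cmath
import math

def rotate(p: complex, center: complex, theta: float) -> complex:
    """Counterclockwise rotation of theta degrees around center (points as complex numbers)."""
    return center + (p - center) * cmath.exp(1j * math.radians(theta))

def angle_degree(a: complex, vertex: complex, b: complex) -> float:
    """Degree of the convex angle with the given vertex and sides through a and b."""
    u, v = a - vertex, b - vertex
    cos_value = (u.real * v.real + u.imag * v.imag) / (abs(u) * abs(v))
    # Clamp to [-1, 1] in case rounding pushes the value slightly outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

# Hypothetical sample points standing in for O, A, B, Q of the Activity.
O, A, B, Q = 0j, 3 + 1j, 1 + 4j, 2 + 2j
A1, B1, Q1 = (rotate(z, O, 32) for z in (A, B, Q))  # A', B', Q'

print(abs(A1 - B1), abs(A - B))                         # equal up to rounding
print(angle_degree(A1, Q1, B1), angle_degree(A, Q, B))  # equal up to rounding
```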

Remarks. (1) Because we will be talking about rotations of negative degrees, e.g., "a rotation of (−36) degrees", we need to further clarify the concept of degree at this point. The degree of an angle is always ≥ 0 (assumption (L6), page 188). Therefore, the concept of negative degree arises only in the context of rotations; e.g., "a rotation of (−36) degrees" or "a (−36)-degree rotation" signifies that we have to rotate 36 degrees clockwise.

33 In practical terms, the way to achieve this is to pencil in an angle ∠AOB of 32 degrees somewhere on the paper (not the transparency), with the help of a protractor, so that the ray ROA is in the counterclockwise direction of the ray ROB. Copy ∠AOB on the transparency. Then rotate the transparency counterclockwise around O until the ray ROB on the transparency is on top of the ray ROA on the paper.
(2) It is a curious fact—but sometimes useful nevertheless—that since we allow
ourselves to use both clockwise and counterclockwise rotations, it suffices to use
angles of θ degrees so that |θ| ≤ 180 in any discussion of rotations. Indeed, a rota-
tion of 235 degrees is equal to a rotation of −125 degrees because 360 − 125 = 235,
and a rotation of −235 degrees is equal to a rotation of 125 degrees. In general, if
180 < θ ≤ 360, then a rotation of θ degrees is equal to a rotation of −(360 − θ)
degrees, and if −360 ≤ −θ < −180, then a rotation of −θ degrees is equal to a rota-
tion of (360 − θ) degrees. Of course, if 180 < θ ≤ 360, we have | ± (360 − θ)| < 180.
We will put this to use later.
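The reduction in Remark (2) is simple arithmetic; a short sketch (the function name is ours) makes the two cases explicit.

```python
def normalize_rotation_degree(theta: float) -> float:
    """Return t with |t| <= 180 such that a rotation of t degrees equals a
    rotation of theta degrees, for -360 <= theta <= 360 (Remark (2))."""
    if not -360 <= theta <= 360:
        raise ValueError("expected -360 <= theta <= 360")
    if theta > 180:      # 180 < theta <= 360: use -(360 - theta)
        return -(360 - theta)
    if theta < -180:     # -360 <= theta < -180: use 360 - |theta|
        return 360 + theta
    return theta

print(normalize_rotation_degree(235))   # -125
print(normalize_rotation_degree(-235))  # 125
```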

Pedagogical Comments. Although we have left the concepts of clockwise rotation and counterclockwise rotation undefined, we wish to point out that they can be defined (see the appendix on pp. 212ff.), but we decided not to do so in the main body of the exposition for a pedagogical reason.
A brief look at the appendix will reveal that the definitions are far from sim-
ple. Unless absolutely necessary, the average school student should be spared the
tedium of learning something that is intuitively obvious but whose precise expla-
nation is intricate and—it is safe to say—uninteresting. In the case of clockwise
or counterclockwise rotations, we consider the precise definitions to be not abso-
lutely necessary. Moreover, the definitions require the use of translations (see page
234). Therefore a completely correct exposition using these definitions of clockwise
and counterclockwise rotations will mandate that we begin with the definition of
a 180-degree rotation (for which clockwise or counterclockwise is irrelevant), prove
Theorems G1 to G3 (pp. 220–224), go on to define translations, and then come
back to define rotations in general, clockwise and counterclockwise. In terms of
mathematics learning, it is a bad idea to break up the concept of rotation this
way (regardless of how technically correct the mathematics may be). Given that
introductory geometry is already overladen with a great many definitions as it is
(see Sections 4.1 and 4.2), we do not believe that it is absolutely necessary to throw
two more long-winded definitions into the mix. Unlike the concepts of half-lines
and half-planes (see Lemma 4.5 and (L4) on pp. 173 and 176, respectively), which
are truly fundamental to the whole discussion of geometry, students can easily get
by with only an intuitive knowledge of clockwise or counterclockwise rotations. We
therefore made the decision not to offer these precise definitions in the main body
of our exposition.
We have mentioned more than once (cf. the Pedagogical Comments on page
178 and page 197) that introductory geometry is full of unpleasant details, and
we will have occasion to do so many more times in the future (e.g., pp. 261, 263,
277, and 292). If our goal is to optimize student learning, then it would behoove
us to smooth students’ learning path by bringing out the truly essential ideas and
soft-pedaling unpleasant details that are of secondary importance. At this stage of
students’ mathematics learning, we believe an intuitive understanding of clockwise
and counterclockwise rotations is all they need. End of Pedagogical Comments.
Generalities about transformations

To facilitate subsequent discussions, we introduce a standard concept, that of the image of a set by a transformation F. Given a point Q of the plane, we will
call F (Q) the image of Q by F or the image of Q under F . If S is a subset
of the plane, then the image of S by F , denoted by F (S), is the collection of
all the images of the points in S by F . Equivalently, F (S) is the collection of all
the points in the plane Π which can be written as F (Q) for some point Q in S.
Intuitively, F moves S to F (S). We also say F maps S to F (S). Thus the ρ(Q)
and ρ(S) in the picture in the Activity of the preceding subsection are the images
of the point Q and the black figure S by ρ, respectively. Likewise, for the constant
transformation FX and identity transformation I, we have FX (Π) = {X}, the set
consisting of the single point X, and I(Π) = Π, respectively.
We will be looking at transformations that are very "well-behaved", in the
following sense. First, define a transformation T to be injective, or an injection,³⁴ if for any two distinct points P1 and P2 of Π, T assigns to them distinct points T(P1) and T(P2) of Π. Thus the constant transformation FX (for a given point X) is not injective because, if we take any two distinct points P1 and P2 of Π, then FX maps both to the same point X. On the other hand, the identity transformation I is injective because if P1 ≠ P2, then also I(P1) ≠ I(P2).
An isometry is injective. Indeed, let F be an isometry. Suppose P ≠ Q; then dist(P, Q) > 0, on account of (L5)(i) on page 184. Thus dist(F(P), F(Q)) = dist(P, Q) > 0, and therefore F(P) ≠ F(Q). This shows F is indeed injective.
We define a transformation T to be surjective, or a surjection,³⁵ if for every point Q of Π, Q = T(P) for some point P of Π; i.e., for every Q in Π, there is a point P in the plane Π which is assigned to Q by T. For example, the constant transformation FX is not surjective because, if Q is a point in Π and Q ≠ X, then there is no point P in Π so that FX(P) = Q. But the identity transformation I is clearly surjective.
Finally, a transformation T is said to be bijective, or a bijection,³⁶ if it is both injective and surjective.
It will turn out that every isometry is surjective so that, in fact, an isometry is a bijection. However, the surjectivity of an isometry is far from obvious and cannot be proved until Section 6.6 of [Wu2020b].
We remark that the common terminology for these concepts, i.e., one-to-one, onto, and one-to-one correspondence, is linguistically awkward. The suggested replacements, injective, surjective, and bijective, respectively (first proposed by the French group Bourbaki), are clearer and more civilized.
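The definitions make sense for a transformation of any set, so one can get a feel for them on a small finite set, where injectivity and surjectivity can be checked by brute force. The sketch below uses a hypothetical five-point set S as a stand-in for Π, with finite analogs of FX and I. Note that for a finite set a transformation is injective if and only if it is surjective; Examples 2 and 3 below show that this fails for the plane.

```python
def is_injective(f, points) -> bool:
    """Distinct points have distinct images (checked by counting images)."""
    points = set(points)
    return len({f(p) for p in points}) == len(points)

def is_surjective(f, points) -> bool:
    """Every point of the set is f(p) for some p in the set."""
    points = set(points)
    return {f(p) for p in points} == points

def is_bijective(f, points) -> bool:
    return is_injective(f, points) and is_surjective(f, points)

S = {0, 1, 2, 3, 4}  # hypothetical finite stand-in for the plane

def constant_X(p):   # analog of the constant transformation F_X with X = 2
    return 2

def identity(p):     # analog of the identity transformation I
    return p

def shift(p):        # another bijection of S
    return (p + 1) % 5

print(is_injective(constant_X, S), is_surjective(constant_X, S))  # False False
print(is_bijective(identity, S), is_bijective(shift, S))          # True True
```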

To make sense of these new concepts, we have to look at more than the constant transformation and the identity transformation. Let us consider the rotations defined above. We claim that a rotation is always a bijection. While this is intuitively obvious, we will go through the argument carefully. Let us fix a rotation ρ of θ degrees around some point O. Since a rotation of 0 degrees is the identity transformation, we may assume 0 < |θ| ≤ 360. By a remark on page 204, we may in fact assume 0 < |θ| ≤ 180. Because the case of |θ| = 180 is obvious, we will henceforth assume 0 < |θ| < 180.

34 The usual terminology is one-to-one.
35 The usual terminology is onto.
36 The usual terminology is one-to-one correspondence.
ρ is injective. Let P1 and P2 be two distinct points; we must show ρ(P1) ≠ ρ(P2). If one of them is equal to O, say P1 = O, then ρ(P1) = O by the definition of a rotation, and, because P2 ≠ O, ρ(P2) ≠ O, also by the definition of a rotation. Therefore ρ(P1) ≠ ρ(P2) in this case. We may therefore assume that both P1 and P2 are different from O. If |OP1| ≠ |OP2|, then there is nothing to prove because, if P1′ and P2′ denote ρ(P1) and ρ(P2), respectively, then by the definition of a rotation, |OP1′| = |OP1| ≠ |OP2| = |OP2′|. Therefore |OP1′| ≠ |OP2′|, so that P1′ ≠ P2′; i.e., ρ(P1) ≠ ρ(P2). So let us suppose |OP1| = |OP2|. Then both P1 and P2 lie on some circle C around O, and ρ maps them to P1′ and P2′, respectively, as shown:

[Figure: circle C centered at O with the points P1, P2 and their images P1′, P2′; the angles θ and φ are marked.]

For definiteness, we may assume θ > 0 because the argument for the case of a negative θ is similar. Thus, we are looking at a counterclockwise rotation of θ degrees, 0 < θ < 180. Let ∠P1OP2 denote the convex angle as usual, and let |∠P1OP2| = φ°, where 0 < φ ≤ 180 (see Lemma 4.9 on page 188). By switching the points P1 and P2 if necessary, we may assume that the counterclockwise rotation of φ° maps P1 to P2, as shown in the above picture. Therefore we may characterize P2′ (which is ρ(P2)) as the point obtained from P1, first by a counterclockwise rotation of φ degrees (which moves P1 to P2), followed by a counterclockwise rotation of θ degrees (which moves P2 to P2′). Since φ + θ < 180 + 180 = 360, (a) on page 201 applies: P2′ is the point obtained from P1 by a counterclockwise rotation of (φ + θ) degrees. But P1′ is by definition the point obtained from P1 by a counterclockwise rotation of θ degrees. Since φ > 0, we have θ + φ ≠ θ, and therefore P1′ ≠ P2′ by (b) on page 201. The proof of the injectivity of ρ is complete.
ρ is surjective. Let a point Q′ be given. We must find a point Q so that ρ(Q) = Q′. If Q′ = O, just let Q = O. So let Q′ ≠ O, and let C be the circle with center O and radius |OQ′|. Now rotate Q′ by θ degrees along C in the clockwise direction to get to a point Q, as shown below. By the definition of ρ, we have ρ(Q) = Q′. Hence ρ is surjective. This proves that ρ is bijective.

[Figure: circle C centered at O with the points Q and Q′, the turning angle θ marked between them.]

Pictorially, what a bijection T of Π does is to move the points of the plane Π in such a way that distinct points are not "lumped together" by T into the same point (injectivity) and such that the image of the plane by T, T(Π), "covers" all of Π and not just a part of it (surjectivity).

The following three examples of transformations make use of coordinates, the inverse tangent function, and roots of cubic polynomials, respectively. These are concepts that will not be discussed until Chapter 6 of this volume, Chapter 1 of [Wu2020c], and Chapter 3 of [Wu2020b], respectively. Therefore, these examples are not part of the logical development in this volume. However, since these examples are likely to help you build up your geometric intuition about transformations, we put them here solely as a learning aid.

Example 1. Let coordinates be introduced in the plane, so that points in the plane are now just pairs of numbers (x, y). Define the folding transformation ϕ by ϕ(x, y) = (|x|, y). Pictorially, ϕ "folds" the plane along the y-axis onto the right half-plane of the y-axis: for every point (x0, y) to the right of the y-axis, i.e., x0 > 0, we have ϕ(x0, y) = (x0, y), so ϕ leaves it unchanged, while for the point (−x0, y) to the left of the y-axis (still assuming x0 > 0), ϕ(−x0, y) = (x0, y), i.e., ϕ "folds" (−x0, y) onto (x0, y). Therefore ϕ is not injective, because it maps the two distinct points (−x0, y) and (x0, y) to the same point. Furthermore, ϕ is not surjective either, because, for example, the point (−1, 0) cannot be written as ϕ(x′, y′): since ϕ(x′, y′) = (|x′|, y′) no matter what x′ may be, the x-coordinate of ϕ(x′, y′) is always ≥ 0 and can never be equal to −1. You can easily see that the image ϕ(Π) is in fact the right half-plane together with the y-axis.
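A quick check of Example 1 in Python, as a sketch: `fold` is our name for ϕ, and the sample points are arbitrary.

```python
def fold(point):
    """The folding transformation phi(x, y) = (|x|, y) of Example 1."""
    x, y = point
    return (abs(x), y)

x0, y = 2.5, -1.0
print(fold((x0, y)), fold((-x0, y)))  # (2.5, -1.0) (2.5, -1.0): not injective

samples = [(-3.0, 1.0), (0.0, 2.0), (4.0, -5.0)]
print(all(fold(fold(p)) == fold(p) for p in samples))  # True: phi o phi = phi on these
print(all(fold(p)[0] >= 0 for p in samples))           # True: images have x >= 0
```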

Example 2. Recall the inverse tangent function from trigonometry, arctan, which is defined for every number and is increasing, i.e., arctan(x) makes sense for every number x, and if x < x′, then arctan(x) < arctan(x′). Furthermore, −π/2 < arctan(x) < π/2 for every number x. Here is the graph:

[Figure: the graph of y = arctan(x).]

Now again assume that coordinates have been introduced in Π. We claim that the following transformation G of Π, defined by G(x, y) = (arctan(x), y), is injective but not surjective.
To show G is injective, we must show that if (x1, y1) ≠ (x2, y2), then G(x1, y1) ≠ G(x2, y2). To this end, observe that (x1, y1) ≠ (x2, y2) means x1 ≠ x2 or y1 ≠ y2. First suppose x1 ≠ x2; then either x1 < x2 or x1 > x2. If x1 < x2, then arctan x1 < arctan x2 (because arctan is increasing), so that

G(x1, y1) = (arctan x1, y1) ≠ (arctan x2, y2) = G(x2, y2);

i.e., G(x1, y1) ≠ G(x2, y2). Similarly, if x1 > x2, then also G(x1, y1) ≠ G(x2, y2). Therefore x1 ≠ x2 implies G(x1, y1) ≠ G(x2, y2). On the other hand, if y1 ≠ y2, then obviously (arctan x1, y1) ≠ (arctan x2, y2) (the two points have different second coordinates), and hence also G(x1, y1) ≠ G(x2, y2). So G is injective.
On the other hand, G is not surjective. Since −π/2 < arctan(x) < π/2, the formula G(x, y) = (arctan(x), y) means that the x-coordinates of all the image points G(x, y), no matter what x and y may be, lie in the open interval (−π/2, π/2). This implies that the image G(Π) of G lies in the infinite "vertical strip" in the plane bounded by the vertical lines x = −π/2 and x = π/2. In particular, the image G(Π) of the plane Π under G is not all of Π, and therefore G is not surjective. One can also see the nonsurjectivity of G directly by noting that the point (5, 0) cannot be written as G(x′, y′) for any (x′, y′): if it were, then (5, 0) = G(x′, y′) = (arctan(x′), y′), so that arctan(x′) = 5 and y′ = 0. In particular, arctan(x′) > π/2, which is impossible.
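The injectivity of G can also be seen by undoing it: on the strip −π/2 < x < π/2, the tangent function reverses arctan. The sketch below assumes Python floats; `G_inverse_on_image` is a hypothetical helper, and because of floating-point rounding the strict strip test only mirrors the exact mathematics for moderate inputs.

```python
import math

def G(point):
    """G(x, y) = (arctan(x), y) from Example 2."""
    x, y = point
    return (math.atan(x), y)

def G_inverse_on_image(point):
    """Undo G on its image, the strip -pi/2 < x < pi/2 (hypothetical helper).

    Floating-point caveat: for huge x, math.atan(x) may round to math.pi / 2
    itself, so this strict test matches the exact mathematics only for
    moderate inputs.
    """
    u, y = point
    if not -math.pi / 2 < u < math.pi / 2:
        raise ValueError("point is not in the image G(Pi)")
    return (math.tan(u), y)

p = (3.0, -2.0)
print(G_inverse_on_image(G(p)))  # approximately (3.0, -2.0)

try:
    G_inverse_on_image((5.0, 0.0))  # (5, 0) lies outside the strip
except ValueError as err:
    print(err)  # point is not in the image G(Pi)
```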

Example 3. Assume as before that we have coordinates in the plane. Define a transformation H so that H(x, y) = (x³ − 9x + 4, y). We claim that H is surjective but not injective. The failure of injectivity is easy: H(0, y) = H(3, y) = H(−3, y) = (4, y) no matter what y may be. To show surjectivity, given any point, for instance (2, 3), we will show how to find an (x0, y0) so that H(x0, y0) = (2, 3); i.e., (x0³ − 9x0 + 4, y0) = (2, 3). Consider the cubic equation x³ − 9x + 4 = 2, which is the same as x³ − 9x + 2 = 0. But we know that any polynomial of odd degree with real coefficients must have a real root (Section 3.1 of [Wu2020b]). So let x0 be a real root of x³ − 9x + 2 = 0. Then with this x0 and with y0 = 3, we get H(x0, y0) = (2, 3). The reasoning with (2, 3) replaced by any point (a, b) is the same. This shows H is surjective. (It can be seen from the preceding argument that, for the purpose of surjectivity, the cubic polynomial x³ − 9x + 4 could be replaced by any cubic polynomial.)
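The surjectivity argument only needs some real root of x³ − 9x + 2 = 0. One standard way to locate one numerically is bisection, which relies on the same sign change (negative far to the left, positive far to the right) that guarantees a root of an odd-degree polynomial. The sketch and its helper names are ours, not the book's, and it returns only an approximate root.

```python
def cubic(x: float) -> float:
    """First coordinate of H(x, y) = (x**3 - 9x + 4, y)."""
    return x**3 - 9 * x + 4

def H(point):
    x, y = point
    return (cubic(x), y)

def preimage_first_coordinate(a: float, tol: float = 1e-12) -> float:
    """Find x with cubic(x) = a (approximately) by bisection.

    cubic(x) - a is an odd-degree polynomial, so it is negative for very
    negative x and positive for very large x. Widen [lo, hi] until the signs
    are right, then halve the bracket, keeping cubic(lo) <= a <= cubic(hi).
    """
    lo, hi = -1.0, 1.0
    while cubic(lo) - a > 0:
        lo *= 2
    while cubic(hi) - a < 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cubic(mid) - a < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(H((0, 7)), H((3, 7)), H((-3, 7)))  # (4, 7) (4, 7) (4, 7)
x0 = preimage_first_coordinate(2)
print(H((x0, 3)))  # approximately (2.0, 3)
```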
We now return from the examples to the main line of our discus-
sion. We note explicitly that at this point of the present logical
development of geometry, there is no place for coordinates.

Inverse transformations

Bijections can be understood from a completely different angle. To this end, we will have to introduce a few more concepts. If F and G are transformations, we say the transformations F and G are equal, in symbols F = G, if F(Q) = G(Q) for every point Q ∈ Π. The composite transformation F ◦ G (sometimes also called the composition of F and G) is by definition the transformation which assigns a point P in the plane to the point F(G(P)); i.e., if P′ denotes the point G(P), then F ◦ G sends P to F(P′).
Observe that we have now introduced a new meaning for the equal sign, the equality of two transformations. This is a break from the past because, up to this point, we have only used the equal sign between two numbers or two sets. Observe also the fact that this definition is completely unambiguous, so that understanding the equality of two transformations is just a routine part of learning mathematics that does not require a psychological discussion of our a priori perception of the concept of equality.
For example, no matter what F is, F ◦ I = I ◦ F = F. Moreover, if FX is the constant transformation into the point X, then no matter what the transformation G is, FX ◦ G = FX, while G ◦ FX = FW, where W = G(X), so that FW is the constant transformation that assigns every point to the point G(X). As another example, the folding transformation ϕ of Example 1 on page 207 satisfies ϕ ◦ ϕ = ϕ. (Can you explain this?) Note also that the composite of two bijections is again a bijection, and the composite of two isometries is an isometry. Both are simple to verify directly (see Exercise 1 on page 215).
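Composites are easy to model as functions that return functions. The sketch assumes coordinates, and the sample transformation `G` (a shift by (1, 2)) is hypothetical; it illustrates that FX ◦ G = FX, whereas G ◦ FX is the constant transformation into G(X).

```python
def compose(F, G):
    """Return the composite transformation F o G, which sends P to F(G(P))."""
    def composite(P):
        return F(G(P))
    return composite

def constant(X):
    """Return the constant transformation F_X, which sends every point to X."""
    def F_X(P):
        return X
    return F_X

def G(P):
    """Hypothetical sample transformation: shift every point by (1, 2)."""
    x, y = P
    return (x + 1, y + 2)

X = (0, 0)
F_X = constant(X)
P = (5, -3)
print(compose(F_X, G)(P))  # (0, 0): F_X o G = F_X
print(compose(G, F_X)(P))  # (1, 2): G o F_X = F_W, where W = G(X)
```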
A more revealing example is the composition of two rotations with the same center. Thus let ρ and ρ′ be two rotations, both with the same center O, of θ and φ degrees, respectively. One can easily see that if θ = 30 and φ = 45, then ρ ◦ ρ′ = ρ′ ◦ ρ = a rotation around O of 75°. Or if θ = 30 and φ = −45, then ρ ◦ ρ′ = ρ′ ◦ ρ = a rotation around O of −15°, i.e., a clockwise rotation of 15 degrees. In general, if ρθ and ρφ are two rotations, both with the same center O, of θ and φ degrees so that −360 ≤ θ, φ ≤ 360 and −360 ≤ θ + φ ≤ 360, then

(4.1) ρθ ◦ ρφ = ρφ ◦ ρθ = a rotation around O of (θ + φ)°.

Recall that the restriction −360 ≤ θ + φ ≤ 360 in equation (4.1) is necessary because we have not yet defined rotations of degrees that are < −360 or > 360. The simple proof of (4.1) is best left as an exercise (see Exercise 6 on page 216).³⁷
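Equation (4.1), including the special case φ = −θ, can be spot-checked numerically in a complex-number model of the plane (an assumption, with counterclockwise as the positive orientation). The center, test point, and tolerance are arbitrary choices, so this is a sanity check, not a proof.

```python
import cmath
import math

def rotate(p: complex, center: complex, theta: float) -> complex:
    """Rotation of theta degrees about center; positive theta = counterclockwise."""
    return center + (p - center) * cmath.exp(1j * math.radians(theta))

def composition_matches(theta, phi, P=4 - 2j, O=1 + 1j, tol=1e-9) -> bool:
    """Check rho_theta o rho_phi = rho_phi o rho_theta = rho_(theta+phi) at one point P."""
    lhs = rotate(rotate(P, O, phi), O, theta)  # (rho_theta o rho_phi)(P)
    rhs = rotate(rotate(P, O, theta), O, phi)  # (rho_phi o rho_theta)(P)
    direct = rotate(P, O, theta + phi)
    return abs(lhs - direct) < tol and abs(rhs - direct) < tol

for theta, phi in [(30, 45), (30, -45), (200, -300), (120, -120)]:
    print(theta, phi, composition_matches(theta, phi))  # each line ends in True
```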
The examples in the last paragraph would seem to suggest that the composition of transformations is commutative, in the sense that if F and G are transformations, then it is always the case that F ◦ G = G ◦ F. It is instructive, as well as essential, to look at a simple example to see that this is false in most cases.³⁸ Let A and B be two distinct points on a line and let ρA, ρB be the (counterclockwise) rotations of 90 degrees around A and B, respectively. Now consider the two pairs of points

(ρA ◦ ρB)(A) and (ρB ◦ ρA)(A),
(ρA ◦ ρB)(B) and (ρB ◦ ρA)(B).

We want to show that the two points are different in each pair. To this end, we have to look at the following picture, where lines perpendicular to LAB through A and B have been drawn. Let points C, N, P, and D be chosen on these lines, as shown, so that LCN and LPD are perpendicular to LAB at A and B, respectively, and |AB| = |AC| = |AN| = |BD| = |BP|. In addition, let E be a point on LAB so that A ∗ B ∗ E and |AB| = |BE|.

[Figure: the line LAB with points A, B, E; C above and N below A, and P above and D below B, on the perpendiculars at A and B; the segment AD is drawn.]

37 Equation (4.1) should be counterbalanced by the fact that the composition of two rotations with distinct centers need not be a rotation. See Exercise 11(iii) and (iv) on page 372.
38 The following discussion of this example will assume some geometric facts that we have not proved. There is no harm in doing this because this example is a side remark rather than an integral part of our logical development.

We want to find out what (ρA ◦ ρB)(A) is. By definition, this is the point ρA(ρB(A)). So we first have to find out what the point ρB(A) is. This is the point obtained by rotating A 90 degrees counterclockwise around B. First of all, ρB rotates the ray RBA to the ray RBD. So ρB(A) must lie on the ray RBD. But we are assuming that |AB| = |DB|, so by (L5)(ii) on page 184, ρB moves A to D, and therefore ρB(A) = D. Hence in order to find out what (ρA ◦ ρB)(A) is, we now must find out what ρA(D) is.
Notice that we are looking strictly at the effect of the transformation ρA on the point D, and we ignore what ρB is or the fact that D = ρB(A). In other words, in finding out about the effect of a composition of two transformations, say ϕ ◦ ψ, we first observe what the first transformation ψ does to a point P, say ψ(P) = Q; once that is done, we forget about ψ and concentrate entirely on Q to find out what ϕ does to the point Q. Please keep this in mind.
To return to our task at hand, we have to find out what ρA(D) is. So what does ρA do? It turns every point 90 degrees counterclockwise around A. For example, ρA(B) lies on the ray RAC, but since |AB| = |AC| by construction and ρA(AB) has the same length as AB, ρA(B) = C by (L6)(iv) on page 188. Similarly, ρA(N) = B. By the same reasoning, ρA(D) lies on the ray from A that is perpendicular to LAD and lies in the same closed half-plane of LAD as P. Now, by elementary geometry (see, e.g., Section 6.2 of [Wu2020b]), we know that |∠PAB| = |∠BAD| = 45°, so that |∠PAD| = |∠PAB| + |∠BAD| = 90° (see (L6)(iv) on page 188), and that |AD| = |AP| because AD and AP are the diagonals of the squares ANDB and ABPC, respectively. (For our need here, it suffices to verify both facts experimentally.) Therefore, ρA(D) = P for reasons similar to the above. Consequently,

(4.2) (ρA ◦ ρB)(A) = ρA(D) = P.

Now similar considerations lead to the conclusion that

(4.3) (ρB ◦ ρA)(A) = ρB(A) = D.

So we see from equations (4.2) and (4.3) that

(ρA ◦ ρB)(A) ≠ (ρB ◦ ρA)(A),

and the composition of transformations is in general not commutative. The proofs that

(ρA ◦ ρB)(B) = C,
(ρB ◦ ρA)(B) = N

are similar and will be left as an exercise (Exercise 8 on page 216). In any case, we also have

(4.4) (ρA ◦ ρB)(B) ≠ (ρB ◦ ρA)(B).

Conclusion: Given two transformations F and G of the plane, it is in general false that F ◦ G = G ◦ F.
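The computations (4.2) to (4.4) can be reproduced in coordinates by placing A at 0 and B at 1 in the complex plane, so that C = i, N = −i, P = 1 + i, and D = 1 − i. This placement is our own assumption, chosen to match the picture, with counterclockwise taken as the positive orientation.

```python
import cmath
import math

def rotate(p: complex, center: complex, theta: float) -> complex:
    """Rotation of theta degrees about center; positive theta = counterclockwise."""
    return center + (p - center) * cmath.exp(1j * math.radians(theta))

# Hypothetical placement matching the picture: |AB| = 1 along the x-axis.
A, B = 0j, 1 + 0j
C, N = 1j, -1j          # on the perpendicular through A
P, D = 1 + 1j, 1 - 1j   # on the perpendicular through B

def rho_A(z: complex) -> complex:
    return rotate(z, A, 90)

def rho_B(z: complex) -> complex:
    return rotate(z, B, 90)

def same(z: complex, w: complex, tol: float = 1e-9) -> bool:
    """Compare points up to floating-point rounding."""
    return abs(z - w) < tol

print(same(rho_A(rho_B(A)), P), same(rho_B(rho_A(A)), D))  # True True: (4.2), (4.3)
print(same(rho_A(rho_B(B)), C), same(rho_B(rho_A(B)), N))  # True True
```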

With these preliminaries out of the way, we now come to the main point. Given a transformation F, suppose there is a transformation G so that both F ◦ G and G ◦ F are equal to the identity transformation I on the plane. Then we say that G is an inverse transformation of F (and of course also that F is an inverse transformation of G). Often, we simply say F is an inverse of G.
Again, referring to rotations, let ρ be the rotation of θ degrees around O, where −360 ≤ θ ≤ 360, and let ρ′ be the rotation of −θ degrees around the same point O (note that −360 ≤ −θ ≤ 360 as well); then it can be immediately verified by using the definition of a rotation that

(4.5) ρ ◦ ρ′ = ρ′ ◦ ρ = I,

so that ρ′ is an inverse transformation of ρ.
The following theorem characterizes transformations which have an inverse
transformation.

Theorem 4.15. (i) If a transformation of the plane has an inverse transformation, then it is a bijection. (ii) If a transformation is a bijection, then it has an inverse transformation.

Proof. In an exercise (Exercise 5 on page 216), you will prove (ii). We can prove (i) very simply by use of a standard argument, one that deserves to be learned. Let G be an inverse of a given transformation F. Then F is injective because if F(P1) = F(P2) for two points P1 and P2, then also G(F(P1)) = G(F(P2)) and therefore P1 = P2 because G ◦ F = I. Thus if P1 ≠ P2, then F(P1) ≠ F(P2). Also, F is surjective because given Q ∈ Π, if we let P = G(Q), then F(P) = F(G(Q)) = Q, because F ◦ G = I. The proof of the theorem is complete.

In short, a transformation F being a bijection is equivalent to its having an inverse transformation. Observe that equation (4.5) and Theorem 4.15 together give an abstract proof that every rotation is a bijection, something for which we have already given a direct proof on pp. 205–206.
You will also show in an exercise (Exercise 5 on page 216) that the inverse of a transformation (if it has one) is unique, i.e., if there are transformations G and G′ relative to a given F, so that F ◦ G = I, G ◦ F = I and F ◦ G′ = I, G′ ◦ F = I, then G = G′. From now on we can speak of the inverse of a transformation.
The inverse of a bijection F is traditionally denoted by F⁻¹ (read: "F inverse"). If F and G are bijections, then the inverse of F ◦ G is G⁻¹ ◦ F⁻¹ (note the order!) (see Exercise 1 on page 215).
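On a finite set, a bijection can be stored as a lookup table, and the idea behind Theorem 4.15(ii) becomes "swap each pair (point, image)". The sketch below (hypothetical names, a three-point set) also confirms the order reversal (F ◦ G)⁻¹ = G⁻¹ ◦ F⁻¹ on one example, and shows that the other order fails there.

```python
def inverse(f: dict) -> dict:
    """Inverse of a bijection of a finite set, given as a dict {point: image}.

    Each image is sent back to the unique point that maps to it; the checks
    reject maps that are not injective or whose image is not the whole domain.
    """
    inv = {image: point for point, image in f.items()}
    if len(inv) != len(f) or set(inv) != set(f):
        raise ValueError("not a bijection of its domain")
    return inv

def compose(f: dict, g: dict) -> dict:
    """f o g: first apply g, then f."""
    return {p: f[g[p]] for p in g}

F = {"a": "b", "b": "c", "c": "a"}  # hypothetical bijections of {a, b, c}
G = {"a": "a", "b": "c", "c": "b"}
identity = {p: p for p in F}

print(compose(F, inverse(F)) == identity == compose(inverse(F), F))  # True
print(inverse(compose(F, G)) == compose(inverse(G), inverse(F)))     # True
print(inverse(compose(F, G)) == compose(inverse(F), inverse(G)))     # False
```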

Mathematical Aside: Because the composition of bijections is a bijection (Exercise 1 on page 215) and every bijection has an inverse bijection (Theorem 4.15), the set of all bijections forms a group whose binary operation is ordinary composition of transformations. This group of bijections of the plane has many interesting subgroups, as we will point out in due course, e.g., pp. 235, 240, and 286. (Note that while it is clear that we are here talking about bijections of the plane, we have purposely omitted any reference to the plane because this discussion is valid for the bijections of any set.)

Appendix

We will outline a definition of counterclockwise rotation. It will be clear that once that is done, clockwise rotation can be similarly defined. The overall strategy is to fix a point O, define counterclockwise and clockwise rotations around O, and then use translations (page 234) to propagate these concepts to other points in the plane.
First, we have an informal discussion. Let a point O be fixed, and let x be a number satisfying 0 < x < 180. Consider the problem of defining the counterclockwise rotation ρ of x° around O. By definition, ρ(O) = O. If P is a point not equal to O, then according to (L6)(ii) (page 188), there are two unique rays ROQ and ROQ′, residing in the two closed half-planes of LOP, so that |∠QOP| = |∠Q′OP| = x°. Without loss of generality, we may also assume that |OP| = |OQ| = |OQ′|, as shown (see (L5)(ii) on page 184).

[Figure: the line LOP through O and P, with Q in the half-plane H+ and Q′ in the half-plane H−, each making an angle of x° with the ray ROP.]

Let us denote the half-plane of LOP in which Q lies by H+ and denote the half-plane of LOP in which Q′ lies by H−. According to Lemma 4.10 on page 190, the point Q in H+ is unique, and the point Q′ in H− is also unique. If it is counterclockwise rotation that we want, then intuitively, we would choose Q in H+ and define ρ(P) = Q, but if we want instead the x° clockwise rotation of P, then we would take Q′ in H−. Thus for the purpose of defining ρ(P), we simply choose the half-plane H+ of LOP and define ρ(P) to be the unique point Q in H+ so that |OP| = |OQ| and |∠QOP| = x°. The definition of counterclockwise rotation therefore boils down to the "consistent" choice of a half-plane of a given line passing through O.
Perhaps we should point out that, just as we have used "left" and "right"
regarding the number line without any formal definition, we will also use "up" and
"down" in the rest of this appendix without a formal definition. The fact is that
we could define all these concepts if we must, but given that this volume is already
overloaded with (uninteresting) technicalities, we have chosen not to give these formal definitions.
We can now give the formal definition. Fix a point O in the plane. Let a point
P be given, P distinct from O. First, we are going to single out a specific half-plane
HP of the line LOP , in the following way. Let ∠A1 OA2 be a right angle with vertex
4.3. TRANSFORMATIONS OF THE PLANE 213

at O so that the side ROA1 is horizontal and right-pointing and the side ROA2 is
vertical and upward-pointing. By (L5)(ii), we may assume without loss of generality
that |A1 O| = |A2 O|. Similarly, on the line LOA1 , let a point A3 be chosen so that
A1 ∗ O ∗ A3 and |OA1 | = |OA3 |, and on the line LOA2 let a point A4 be chosen so
that A2 ∗ O ∗ A4 and |OA2 | = |OA4 |, as shown:39
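[Figure: the horizontal line L_A1A3 and the vertical line L_A2A4 meeting at O, with A1 to the right of O, A2 above, A3 to the left, A4 below, and sample positions of P.]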
For the definition of H_P, we first dispose of four special cases:
(1) If P lies on the ray R_OA1, then H_P is the half-plane of L_OP
that contains A2.
(2) If P lies on the ray R_OA2, then H_P is the half-plane of L_OP
that contains A3.
(3) If P lies on the ray R_OA3, then H_P is the half-plane of L_OP
that contains A4.
(4) If P lies on the ray R_OA4, then H_P is the half-plane of L_OP
that contains A1.
The half-plane H_P in each of these four cases is represented as the shaded region in the
following:

Next, assume that P lies on neither the horizontal line L_A1A3 nor the vertical
line L_A2A4. If P lies in ∠A3OA4 (let us say), then the two points A3 and A4 will
lie on different sides of the line L_OP by the crossbar axiom on page 250.40 The
following definition of H_P makes use of this fact:
(i) If P lies in ∠A1OA2, then H_P is the half-plane of L_OP that
contains A2.
(ii) If P lies in ∠A2OA3, then H_P is the half-plane of L_OP that
contains A3.
(iii) If P lies in ∠A3OA4, then H_P is the half-plane of L_OP that
contains A4.

39 Anticipating the use of coordinates in Chapter 6, we may think of the ray R_OA1 as the
nonnegative x-axis and R_OA2 as the nonnegative y-axis.
40 There is no logical difficulty here because the crossbar axiom (L8) could be stated right
after assumption (L6) on page 188.


214 4. BASIC ISOMETRIES AND CONGRUENCE

(iv) If P lies in ∠A4OA1, then H_P is the half-plane of L_OP that
contains A1.

The half-plane H_P in each of these four cases is represented as the shaded region in the
following:
[Figure: four panels showing the shaded half-plane H_P when P lies in ∠A1OA2, ∠A2OA3, ∠A3OA4, and ∠A4OA1, respectively.]
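In the coordinates suggested by footnote 39 (O at the origin, A1 = (1, 0), A2 = (0, 1), A3 = (−1, 0), A4 = (0, −1)), the eight cases can be checked mechanically. The Python sketch below is only an illustration under that assumption, and its function names are hypothetical. It encodes the case list literally. It then confirms that, for each sample P, the chosen reference point lies on the side of L_OP to the left of the ray R_OP (positive cross product). This is one way to see why the choice of H_P is "consistent".

```python
# Sketch in the coordinates of footnote 39 (an assumption): O = (0, 0),
# A1 = (1, 0), A2 = (0, 1), A3 = (-1, 0), A4 = (0, -1). Names are hypothetical.
A1, A2, A3, A4 = (1, 0), (0, 1), (-1, 0), (0, -1)


def reference_point(p):
    """Return the point A_i such that H_P is the half-plane of L_OP containing A_i."""
    x, y = p
    if x == 0 and y == 0:
        raise ValueError("P must be distinct from O")
    # Special cases (1)-(4): P on one of the rays R_OA1, R_OA2, R_OA3, R_OA4.
    if y == 0:
        return A2 if x > 0 else A4
    if x == 0:
        return A3 if y > 0 else A1
    # Cases (i)-(iv): P inside one of the four right angles.
    if x > 0 and y > 0:
        return A2  # (i)   P in angle A1 O A2
    if x < 0 and y > 0:
        return A3  # (ii)  P in angle A2 O A3
    if x < 0 and y < 0:
        return A4  # (iii) P in angle A3 O A4
    return A1      # (iv)  P in angle A4 O A1


def cross(u, v):
    """u1*v2 - u2*v1: positive when v lies to the left of the ray from O through u."""
    return u[0] * v[1] - u[1] * v[0]


def in_H_P(p, q):
    """True if q lies in H_P, i.e., on the same side of L_OP as the reference point."""
    return cross(p, q) * cross(p, reference_point(p)) > 0


# Every case picks a reference point to the left of the ray R_OP.
for p in [(2, 0), (0, 3), (-1, 0), (0, -5), (1, 1), (-2, 1), (-1, -3), (4, -1)]:
    assert cross(p, reference_point(p)) > 0
```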

We are now in a position to define the x-degree counterclockwise rotation
ρ of P around O where 0 ≤ x ≤ 180. First of all, ρ(O) = O. Moreover,
the case of a 0-degree or a 180-degree counterclockwise rotation is easy to dispose
of. The 0-degree rotation (clockwise or counterclockwise) of a point P ≠ O is just
P itself. The 180-degree rotation of P (clockwise or counterclockwise) is the
point P′ so that O is the midpoint of the segment PP′; i.e., P′ lies on L_OP with
P ∗ O ∗ P′ and |OP| = |OP′| (see the picture on p. 213; the possibility of choosing
such a P′ is guaranteed by (L5)(ii)).
Thus it suffices for us to define the x-degree counterclockwise rotation of P around
O where 0 < x < 180 and P ≠ O. In this case, ρ(P) is the unique point Q lying
in the specified half-plane H_P so that |∠POQ| = x° and |OP| = |OQ| (see
(L6)(ii) on page 188 and (L5)(ii) on page 184). The following picture is for the case
of a P lying in ∠A2OA3:

[Figure: P in ∠A2OA3 and its image Q in H_P, with |∠POQ| = x°.]
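Here is a minimal Python sketch of this definition, again assuming the coordinates of footnote 39 with O at the origin. The two points Q with |OQ| = |OP| and |∠POQ| = x° are located with cosines and sines. The text has not developed that tool yet, so it is used here only for illustration. The one in H_P is then selected by the cross-product test for the side of L_OP. The names rho and cross are hypothetical.

```python
import math


def cross(u, v):
    """Positive exactly when v lies to the left of the ray from O = (0, 0) through u."""
    return u[0] * v[1] - u[1] * v[0]


def rho(p, x_deg):
    """x-degree counterclockwise rotation of p around O = (0, 0), for 0 < x < 180.

    Mirrors the definition: of the two points Q with |OQ| = |OP| and
    |angle POQ| = x degrees, one on each side of L_OP, return the one in H_P.
    """
    r = math.hypot(p[0], p[1])
    if r == 0:
        return p                      # rho(O) = O
    u = (p[0] / r, p[1] / r)          # unit vector along the ray R_OP
    n = (-u[1], u[0])                 # unit vector perpendicular to L_OP
    c = math.cos(math.radians(x_deg))
    s = math.sin(math.radians(x_deg))
    q1 = (r * (c * u[0] + s * n[0]), r * (c * u[1] + s * n[1]))
    q2 = (r * (c * u[0] - s * n[0]), r * (c * u[1] - s * n[1]))
    return q1 if cross(p, q1) > 0 else q2   # the candidate lying in H_P


p = (3.0, 1.0)
q = rho(p, 40)
assert math.isclose(math.hypot(*q), math.hypot(*p))          # |OQ| = |OP|
angle = math.degrees(math.atan2(cross(p, q), p[0] * q[0] + p[1] * q[1]))
assert math.isclose(angle, 40)                                # |angle POQ| = 40
```

For 0 < x < 180 the selected point is always the first candidate. Writing P = (p₁, p₂), it agrees with the familiar formula (p₁ cos x° − p₂ sin x°, p₁ sin x° + p₂ cos x°). The selection step is kept only to mirror the definition.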
It is now easy to define the x-degree counterclockwise rotation ρ around O
when 0 ≤ x ≤ 360. First, we do it intuitively. If we have to rotate a point P ≠ O
counterclockwise through (let us say) 235 degrees, we can stop the rotation after
180 degrees and then resume the counterclockwise rotation for another 55 degrees
(235 = 180 + 55). The advantage of doing this is that after 180 degrees of rotation,
we know exactly where P is, namely, at the point P′ which lies on the line L_OP so
that O is the midpoint of the segment PP′ (see the picture on p. 213). Therefore
to define the 235-degree counterclockwise rotation of P, all we need to do is carry
out the 55-degree counterclockwise rotation of P′. But the latter is something we
already know how to do.
Formally, suppose a number x is given so that 180 < x ≤ 360. Write x as
x = 180 + x′, where 0 < x′ ≤ 180. Then the x-degree counterclockwise rotation
of a given point P around O, where 180 < x ≤ 360, is by definition the
x′-degree counterclockwise rotation of the point P′, which is the 180-degree rotation
of P.
In summary, for a number x so that 0 ≤ x ≤ 360, the x-degree counterclockwise
rotation around O is a well-defined transformation of the plane. Observe that
the 0-degree and the 360-degree counterclockwise rotations are just the identity
transformation of the plane.
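The Python sketch below shows how a rotation through 180 < x ≤ 360 degrees reduces to a half-turn followed by a rotation through x′ = x − 180 degrees. It assumes coordinates with O at the origin. It uses the usual cosine and sine formula as a stand-in for the 0 ≤ x ≤ 180 case. All names are hypothetical.

```python
import math


def rho_basic(p, x_deg):
    """Counterclockwise rotation about O = (0, 0) through x degrees, 0 <= x <= 180.
    The cosine/sine formula is a stand-in for the synthetic definition."""
    c, s = math.cos(math.radians(x_deg)), math.sin(math.radians(x_deg))
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def half_turn(p):
    """180-degree rotation about O: O is the midpoint of the segment from p to its image."""
    return (-p[0], -p[1])


def rho(p, x_deg):
    """x-degree counterclockwise rotation about O for 0 <= x <= 360, as defined in the text."""
    if not 0 <= x_deg <= 360:
        raise ValueError("rotations are so far defined only for 0 <= x <= 360")
    if x_deg <= 180:
        return rho_basic(p, x_deg)
    # x = 180 + x' with 0 < x' <= 180: rotate P' = half_turn(P) through x' degrees.
    return rho_basic(half_turn(p), x_deg - 180)


def close(a, b, tol=1e-9):
    """Coordinatewise comparison up to rounding error."""
    return all(math.isclose(s, t, abs_tol=tol) for s, t in zip(a, b))
```

For example, rho((2.0, 1.0), 235) first sends P to P′ = (−2.0, −1.0) and then turns P′ through 55 degrees. Up to rounding, the result matches the direct formula rho_basic((2.0, 1.0), 235). Likewise, rho(p, 360) returns p, as the summary above states.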

It is now clear how one should go about defining the x-degree clockwise
rotation around O for x satisfying 0 ≤ x ≤ 180: for each point P ≠ O, we
specify the preferred half-plane of L_OP to be the opposite half-plane of H_P. Let
us denote this half-plane by H_P^−. Then for each x satisfying 0 < x < 180 and for
each P ≠ O, the x-degree clockwise rotation of P is by definition the point P″
so that |OP| = |OP″|, |∠POP″| = x°, and P″ lies in the specified half-plane H_P^−
(the cases x = 0 and x = 180 are the same as for counterclockwise rotations).
The x-degree clockwise rotation around O, where 180 < x ≤ 360, is then
defined as the x′-degree clockwise rotation of P′ (the 180-degree rotated image of P
as above), where x = 180 + x′ and 0 < x′ ≤ 180. Altogether, we have defined the
x-degree clockwise rotation of any point P in the plane and for any x satisfying
0 ≤ x ≤ 360.
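The clockwise definition differs from the counterclockwise one only in the choice of half-plane. The Python sketch below makes the same assumptions as before: coordinates with O at the origin, hypothetical names, and trigonometry used only to locate the two candidate points. Its ccw picks the candidate in H_P, and its cw picks the one in the opposite half-plane H_P^−.

```python
import math


def cross(u, v):
    """Positive when v is to the left of the ray from O = (0, 0) through u."""
    return u[0] * v[1] - u[1] * v[0]


def candidates(p, x_deg):
    """The two points Q with |OQ| = |OP| and |angle POQ| = x degrees (0 < x < 180),
    one on each side of L_OP."""
    c, s = math.cos(math.radians(x_deg)), math.sin(math.radians(x_deg))
    left = (c * p[0] - s * p[1], s * p[0] + c * p[1])
    right = (c * p[0] + s * p[1], -s * p[0] + c * p[1])
    return left, right


def ccw(p, x_deg):
    """Counterclockwise: the candidate lying in H_P (assumes P != O, 0 < x < 180)."""
    return next(q for q in candidates(p, x_deg) if cross(p, q) > 0)


def cw(p, x_deg):
    """Clockwise: the candidate lying in H_P^- (assumes P != O, 0 < x < 180)."""
    return next(q for q in candidates(p, x_deg) if cross(p, q) < 0)


p = (2.0, 3.0)
back = cw(ccw(p, 70), 70)   # clockwise 70 degrees undoes counterclockwise 70 degrees
assert math.isclose(back[0], p[0]) and math.isclose(back[1], p[1])
```

The final check is in the spirit of equation (4.5) on page 211: an x-degree clockwise rotation undoes an x-degree counterclockwise rotation.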
It remains to define the x-degree counterclockwise rotations around an
arbitrary point O′ of the plane, where 0 ≤ x ≤ 360. Let T be the translation
along the vector $\overrightarrow{OO'}$ (page 234). For each point P′, let P be the point in the
plane so that T(P) = P′, and let Q be the x-degree counterclockwise rotation of P
around O. Then, by definition, the x-degree counterclockwise rotation of P′ around
O′ is the point T(Q). The x-degree clockwise rotation is defined similarly.
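In coordinates, this definition through the translation T along $\overrightarrow{OO'}$ amounts to conjugating a rotation about O by T. The Python sketch below takes O = (0, 0) and uses the cosine and sine formula as a stand-in for the rotation about O. It represents T as a function. All names are hypothetical.

```python
import math


def rotate_about_origin(p, x_deg):
    """Counterclockwise rotation about O = (0, 0); the cosine/sine formula is a stand-in."""
    c, s = math.cos(math.radians(x_deg)), math.sin(math.radians(x_deg))
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def translation(start, end):
    """Hypothetical helper: the translation along the vector from start to end."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return lambda p: (p[0] + dx, p[1] + dy)


def rotate_about(center, p_prime, x_deg):
    """x-degree counterclockwise rotation of P' around O' = center, following the text:
    find P with T(P) = P', rotate P around O, then apply T to the result."""
    origin = (0.0, 0.0)
    T = translation(origin, center)       # T moves O to O'
    T_back = translation(center, origin)  # undoes T
    p = T_back(p_prime)                   # the point P with T(P) = P'
    q = rotate_about_origin(p, x_deg)     # Q: rotation of P around O
    return T(q)


q = rotate_about((1.0, 1.0), (2.0, 1.0), 90)
assert math.isclose(q[0], 1.0) and math.isclose(q[1], 2.0)
```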
Exercises 4.3.
(1) (a) Prove that the composition of two isometries is an isometry. (b) Prove
that the composition of two surjections is a surjection and the composition
of two injections is an injection. (Hence the composition of bijections is a
bijection.) (c) If F, G are bijections, then prove that the inverse of F ∘ G
is G^{−1} ∘ F^{−1}.
(2) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) (a) Let F and G be transformations of the plane defined by F(x, y) =
(x, y + 1) and G(x, y) = (xy, y). Are the transformations F ∘ G and G ∘ F
equal? (b) Is F ∘ G injective? Surjective? (c) Is G ∘ G injective? Surjective?
(3) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) (a) Let F be the transformation of the plane defined by F(x, y) =
(x + 1, y + 1), and let C denote the unit circle, i.e., the circle of radius
1 around the origin (0, 0). Give a rough description of F(C), but be as
precise as you can. (b) Let G be the transformation of the plane defined
by G(x, y) = (2x, y), and let C be the unit circle as before. Give a rough
description of G(C), but be as precise as you can. (c) Let H be the
transformation of the plane defined by H = F ∘ G, so that H(x, y) =
(2x + 1, y + 1), and let C be the unit circle as before. Give a rough
description of H(C), but be as precise as you can.
(4) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) (a) Consider the transformation G of the plane defined by G(x, y) =
(x^2, y). Is it injective? Is it surjective? (b) Consider the transformation F
of the plane defined by F(x, y) = (x, y^3). Is it injective? Is it surjective?
(c) With F and G as in (a) and (b), what is (F ∘ G)(x, y) for any point
(x, y)? Is the composite injective? Surjective?
(5) (a) Prove that a bijection F of the plane must have an inverse G.
(b) Prove that if the inverse of a transformation F exists, then it must be
unique (in other words, if G and G′ are both inverse transformations of
F, then G = G′).
(6) Make use of (a) on page 201 to prove equation (4.1) on page 209.
(7) For each of the following assertions about transformations F and G of the
plane, if it is always true, prove it. If it is always false, prove it. If it is sometimes
true and sometimes false, give examples of each kind. (a) If F ∘ G is injective,
then G is injective. (b) If F ∘ G is injective, then F is injective. (c) If
F ∘ G is surjective, then G is surjective. (d) If F ∘ G is surjective, then F
is surjective.
(8) (a) Prove equation (4.4) on page 211 that (ρA ∘ ρB)(B) ≠ (ρB ∘ ρA)(B).
(b) Exhibit two rotations F and G in the plane so that F ∘ G ≠ G ∘ F
and so that these are not the same as the ρA and ρB on page 209.
(9) (i) Make use of the appendix (pp. 212ff.) to prove that if P1 on a circle
is turned φ degrees (0 < φ ≤ 180) counterclockwise to P2 and P2 is then
turned θ degrees (0 < θ ≤ 180) counterclockwise to P3 , then P2 lies in
∠P1 OP3 (here ∠P1 OP3 is taken to be the union of the convex angles
∠P1 OP2 and ∠P2 OP3 ). (ii) Now prove (a) on page 201.
(10) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) Show that the transformations of the plane defined by H(x, y) =
(ax^3 + bx + c, y) for constants a, b, and c are bijective if ab > 0. (You may
assume that any cubic polynomial has a (real) root.)
(11) (This exercise makes use of coordinates; see the warning on pp. 207 and
208.) Consider the transformation F of the plane defined as follows. Let
P = (x, y). If x ≤ 1, then we define F (P ) = P . If 1 ≤ x ≤ 2, then we
define F (P ) = (2 − x, y). If 2 ≤ x, then we define F (P ) = (x − 2, y). Is F
injective? Is it surjective? Can you roughly describe what F does to the
plane?
(12) Let F be a transformation of the plane. (a) Show that if F is either the
identity transformation I or a constant transformation, then F ◦ F = F .
(b) Exhibit a transformation F which is neither the identity transforma-
tion nor a constant transformation and yet F ◦ F = F . (c) Show that
if F ◦ F = F and F is surjective, then F is the identity transformation.
(d) Show that if F ◦ F = F and F is injective, then F is the identity
transformation.41

4.4. The basic isometries: Rotations


This and the next section will be devoted to a discussion of the three transfor-
mations of the plane that we call the basic isometries: rotations, reflections, and
translations. The assumptions we make about them will be summarized in assump-
tion (L7) on page 237. We will draw a few consequences from the definition of
a rotation, which was defined in the last section, and make a few tentative steps
41 (c) and (d) are due to N. Ackerman.
4.4. THE BASIC ISOMETRIES: ROTATIONS 217

toward proving theorems in geometry in the process. These theorems are needed for
the definitions of reflections and translations in the next section, and among them,
the most important is Theorem G1 on page 220. Theorem G1 will have many
applications.
Assumptions about rotations and first consequences (p. 217)
Theorem G1 and its proof (p. 220)
Theorems G2–G4 (p. 223)

Assumptions about rotations and first consequences

At this point, we will assume that the first six assumptions, (L1)–(L6), have
been committed to memory and we will freely make use of them to prove some
simple geometric theorems. Recall that (L2) is the parallel postulate.
Our attitude toward geometric proofs at this point is strictly utilitarian: we
prove the minimum number of theorems that are needed for the discussion of linear
equations in beginning algebra. A more systematic presentation of the proofs of
the basic theorems in plane geometry will be given in Chapter 6 of [Wu2020b].
Moreover, Chapter 8 in [Wu2020b] will discuss the nature of proofs in geometry
from a broader perspective.
We have already defined rotations on page 202. It remains to make explicit our
assumptions about rotations:
(ρ1) Any rotation maps a line to a line, a segment to a segment,
a ray to a ray, and an angle to an angle.
(ρ2) Any rotation preserves lengths of segments and degrees of
angles.
Thus every rotation is by assumption an isometry (see (ρ2)). Note that, by
(ρ2), a rotation preserves not only lengths but also degrees of angles (see page 203
for the definition of degree-preserving). Rotation is the first of three isometries to
be studied in detail that will be referred to as the basic isometries of the plane,42
the other two being reflection (page 229) and translation (page 234). These three
are the basic building blocks of the concept of congruence (page 240).
Note that the rotation of zero degrees around a point is just the identity trans-
formation I of the plane. Rotations of 180 degrees play a major role in the logical
development of plane geometry; see, for example, Theorem G1 on page 220 and
Theorem G12 on page 259.
Let θ and σ be numbers so that −360 ≤ θ, σ ≤ 360 and so that −360 ≤ θ + σ ≤
360. Let ρ_θ and ρ_σ be rotations of θ and σ degrees, respectively, around the same
center. Then according to equation (4.1) on page 209, the composition of ρ_θ and
ρ_σ satisfies
$$\rho_\theta \circ \rho_\sigma = \rho_{\theta+\sigma}. \qquad (4.6)$$
Recall that the restriction −360 ≤ θ + σ ≤ 360 has to be imposed on equation (4.6)
because otherwise the right side of (4.6) will not make sense (rotations are thus far
defined only for angles in the range [−360, 360]).
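Equation (4.6) can be spot-checked numerically. The Python sketch below assumes coordinates with the common center at the origin. As a stand-in for ρ_θ it uses the cosine and sine formula, with negative angles meaning clockwise. It compares the two sides of (4.6) on a few sample points. A numerical check of this kind illustrates the equation, but of course does not prove it.

```python
import math


def rho(theta_deg):
    """Rotation through theta degrees about the common center O = (0, 0):
    counterclockwise if theta > 0, clockwise if theta < 0 (cosine/sine stand-in)."""
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])


def compose(f, g):
    """The composition f o g, i.e., P -> f(g(P))."""
    return lambda p: f(g(p))


def same_map(f, g, samples, tol=1e-9):
    """Compare two transformations on a few sample points, up to rounding."""
    return all(
        math.isclose(a, b, abs_tol=tol)
        for p in samples
        for a, b in zip(f(p), g(p))
    )


samples = [(1.0, 0.0), (0.3, -2.0), (-4.0, 1.5)]
# Pairs with -360 <= theta, sigma, theta + sigma <= 360, as (4.6) requires.
for theta, sigma in [(100, 150), (-200, 90), (45, -45)]:
    assert same_map(compose(rho(theta), rho(sigma)), rho(theta + sigma), samples)
```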
Equation (4.6) has to be supplemented by three remarks. First, the composition
of two rotations with distinct centers may not be a rotation in general; see Exercise
42 But see page 237 for further comments on the terminology of "basic isometries".
10(b) on page 238 and Exercise 11 on page 372. Second, if σ = −θ, then (4.6)
reduces to equation (4.5) on page 211 (because a rotation of 0 degrees is the identity
transformation I):
$$\rho_\theta \circ \rho_{-\theta} = I \quad\text{and}\quad \rho_{-\theta} \circ \rho_\theta = I.$$
As noted on page 211, these equations imply, by virtue of Theorem 4.15 on page 211,
that each rotation ρ_θ is a bijection. A third remark is that when rotations of any
degree have been defined, equation (4.6) will be seen to hold for any two numbers θ
and σ with no restrictions (see equation (1.94) in Section 1.6 of [Wu2020c]).
We now point out that there are "plenty of" rotations as a result of our as-
sumption about the existence of angles with a prescribed degree in (L6)(ii) on page
188.

Lemma 4.16. Given a point O and a number t so that −360 ≤ t ≤ 360, there
exists a t-degree rotation around O.

Proof. We have to show that, given a number t so that 0 ≤ t ≤ 360, we can define
a t-degree rotation around a given point O using only the given assumptions. We
do so as follows. Since such a rotation ρ maps O to O, we only have to define the
rotated image of P for a point P distinct from O. According to (L6)(ii), there
are two angles ∠POQ1 and ∠POQ2 (where Q1 and Q2 lie in different half-planes
of L_OP) sharing the side R_OP so that |∠POQ1| = |∠POQ2| = t°. The following
picture shows the case 0 < t < 180:
[Figure: the case 0 < t < 180, with Q1 and Q2 on opposite sides of L_OP and |∠POQ1| = |∠POQ2| = t°.]
Without loss of generality, we may assume that Q1 and Q2 are the points so that
|OQ1| = |OQ2| = |OP|. Then by the definition of rotation (page 202), the t-degree
counterclockwise rotation ρ of P has to be whichever of Q1 and Q2 lies in the
specified half-plane H_P of L_OP; according to the picture above, it is Q1. Thus
ρ(P) = Q1. We have now defined how ρ moves an arbitrary point P ≠ O, so ρ is
well-defined. If t satisfies, instead, −360 ≤ t ≤ 0, then the t-degree rotation of P
will be defined in a similar way, except that we now look for the clockwise rotation
of P of degree |t|. The proof of Lemma 4.16 is complete.

It may not be entirely obvious that the assumptions (ρ1) and (ρ2) about rotations
on page 217, when coupled with (L1)–(L6), already allow us to prove very
interesting geometric theorems. See Theorems G1–G4 in the remainder of this section.
In addition to their intrinsic interest, these four theorems are needed for a
meaningful discussion of the definitions of reflection and translation in the next
section. To get a preview of some of the mathematical issues involved in these
definitions, we give an intuitive discussion of reflections and translations.
Intuitively, the reflection Λ (capital Lambda) across a given line L assigns to
each point on L the point itself, and it assigns to any point P not on L the point
Λ(P) which is symmetric to P with respect to L, in the sense that L is the
perpendicular bisector (page 192) of the segment joining P to Λ(P).
[Figure: a line L, a point P with its mirror image Λ(P), and a point A with its image Λ(A).]

For simplicity let us denote by P′ the point Λ(P). Implicit in this definition is the
fact that (a) there is such a point P′ so that L is the perpendicular bisector of the
segment PP′ and (b) there is only one such point P′. Neither is obvious at the
moment. The need for (a) is clear, but the need for (b) may be less so. The fact is
that if there is another point Q distinct from P′ so that L is also the perpendicular
bisector of PQ, then the definition of a reflection implies that we can also define
Λ(P) = Q. This raises the question: which point does Λ assign to P, P′ or Q?
[Figure: a line L through a point O, a point P, and two points P′ and Q that are both candidates for Λ(P).]

If we cannot verify that both (a) and (b) are valid, then the concept of a reflection
would not be well-defined (see page 45) on two levels. Given a line L and
a point P in the plane, either the putative reflection Λ across L cannot assign a
point to P (this would be the case if (a) fails) or there is more than one candidate
for such a P′ so that the assignment of Λ to P becomes ambiguous (this would be
the case if (b) fails). To go forward, what we need is a confirmation of the following.

Wish List #1. Given a line L and a point P , there is one and only one line
passing through P and perpendicular to L.

We must also keep in mind that any confirmation of this claim must be based
only on the properties of rotations such as (ρ1) and (ρ2) on page 217, together with
assumptions (L1)–(L6).
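For orientation only, here is what Λ(P) looks like once coordinates are available. They are not available yet, and the whole point of the discussion above is that (a) and (b) must be derived from the assumptions. The Python sketch below assumes a line L_AB given by two distinct points. It obtains P′ by doubling the vector from P to the foot of the perpendicular. The names are hypothetical.

```python
import math


def foot_of_perpendicular(p, a, b):
    """Orthogonal projection of p onto the line L_AB (assumes a != b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    return (a[0] + t * dx, a[1] + t * dy)


def reflect(p, a, b):
    """Reflection of p across L_AB: the foot F of the perpendicular is the midpoint of PP'."""
    f = foot_of_perpendicular(p, a, b)
    return (2 * f[0] - p[0], 2 * f[1] - p[1])


a, b, p = (0.0, 0.0), (1.0, 1.0), (2.0, 0.0)
p2 = reflect(p, a, b)
mid = ((p[0] + p2[0]) / 2, (p[1] + p2[1]) / 2)
# L_AB passes through the midpoint of PP' ...
assert math.isclose((mid[0] - a[0]) * (b[1] - a[1]) - (mid[1] - a[1]) * (b[0] - a[0]), 0.0, abs_tol=1e-12)
# ... and is perpendicular to PP', so it is the perpendicular bisector of PP'.
assert math.isclose((p2[0] - p[0]) * (b[0] - a[0]) + (p2[1] - p[1]) * (b[1] - a[1]), 0.0, abs_tol=1e-12)
```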

Next, consider the concept of a translation along a vector. First, a vector is by
definition a segment AB with one of its two endpoints designated as the starting
point and the other as the endpoint. (We put an arrowhead at the endpoint.)
Given a segment AB, there are two ways to make it into a vector: if the starting
point is A, then we denote the resulting vector by $\overrightarrow{AB}$, and if the starting point is
B, then the resulting vector is denoted by $\overrightarrow{BA}$. Therefore, while AB and BA are
the same segment, $\overrightarrow{AB}$ and $\overrightarrow{BA}$ are different vectors because they have different
starting points and different endpoints. The length of a vector $\overrightarrow{AB}$ is by definition
the length of the segment AB.
Intuitively, the translation T along a given vector $\overrightarrow{AB}$ is the transformation
that "moves a point P in the plane the same distance and in the same direction as
$\overrightarrow{AB}$". For example, if P does not lie on line L_AB, we can describe T(P) as follows.
Draw the line ℓ passing through P and parallel to line L_AB; then Q = T(P) is the
intersection of ℓ and the line passing through B and parallel to the line L_AP.

[Figure: the parallelogram ABQP, with the vector $\overrightarrow{AB}$ and the line through P parallel to L_AB.]
Pictorially, this description of Q = T(P) is believable as far as "$\overrightarrow{PQ}$ pointing in
the same direction as $\overrightarrow{AB}$" is concerned. But now how do we know that, with $\overrightarrow{PQ}$
so defined, $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ indeed have the same length? In other words, noting that
ABQP is by definition a parallelogram (page 193), we need the following to be true.

Wish List #2. Opposite sides of a parallelogram have the same length.
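The parallelogram description of T(P) can be traced in coordinates. The Python sketch below uses hypothetical names and assumes that P does not lie on L_AB. It intersects the line through P parallel to L_AB with the line through B parallel to L_AP, using Cramer's rule. It then compares |PQ| with |AB|. In coordinates the answer comes out as Q = P + (B − A), and that equality of lengths is exactly what Wish List #2 asks us to justify synthetically.

```python
import math


def line_intersection(p1, d1, p2, d2):
    """Intersection of the lines {p1 + s*d1} and {p2 + t*d2} by Cramer's rule.
    Raises ValueError if the directions are parallel."""
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])   # determinant of the columns d1, -d2
    if det == 0:
        raise ValueError("the lines are parallel")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    s = (rx * (-d2[1]) - ry * (-d2[0])) / det
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])


def translate_by_construction(p, a, b):
    """T(P) for P not on L_AB: meet the line through P parallel to L_AB with the
    line through B parallel to L_AP, so that ABQP is a parallelogram."""
    ab = (b[0] - a[0], b[1] - a[1])
    ap = (p[0] - a[0], p[1] - a[1])
    return line_intersection(p, ab, b, ap)


a, b, p = (0, 0), (3, 1), (1, 2)
q = translate_by_construction(p, a, b)
assert q == (4.0, 3.0)                                  # Q = P + (B - A)
assert math.isclose(math.dist(p, q), math.dist(a, b))   # |PQ| = |AB|
```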

Theorem G1 and its proof

We now set out to prove both wish list items. The key ingredient in both proofs
is the following basic theorem about rotations of 180 degrees:
For the sequence of geometric theorems to follow, we will adopt
a special convention for their enumeration. Henceforth, all the
theorems in plane geometry will be numbered consecutively by G1,
G2, G3, etc. This is because, in Chapter 6 of [Wu2020b], we will
bring all these theorems together to give a coherent account of plane
geometry.

Theorem G1. Let O be a point not contained in a line L, and let ρ be the
rotation of 180° around O. Then the image of L by ρ is a line parallel to L; i.e.,
ρ(L) ∥ L.

We note first of all that Lemma 4.16 on page 218 guarantees that there is
such a rotation ρ of 180° around O. That said, let us begin by describing explicitly
this rotation ρ. Given a point P distinct from O, let P′ denote the image point
ρ(P) of P by ρ. Then, by (L6)(iii) on page 188, P′ is the point on the ray R_PO,
on the other side of O from P, so that |P′O| = |PO|. This is because ∠POP′ is a
straight angle (ρ is a 180-degree rotation) and a rotation is distance-preserving
(assumption (ρ2) on page 217). Similarly, for any other point Q, the image Q′ of Q
by ρ is the point on R_QO, on the other side of O from Q, so that |Q′O| = |QO|.
[Figure: points P and Q and their images P′ and Q′ under the 180° rotation ρ around O.]
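Theorem G1 can be observed numerically before it is proved. The Python sketch below assumes coordinates and represents a line as a point together with a direction vector. It uses the description just given, namely that O is the midpoint of the segment from P to ρ(P), and computes ρ(L) from the images of two points of L. The names are hypothetical. Exact comparisons with 0 are reliable only for integer inputs like those in the example.

```python
def half_turn(p, o):
    """180-degree rotation about o: o is the midpoint of p and its image."""
    return (2 * o[0] - p[0], 2 * o[1] - p[1])


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def image_of_line(a, d, o):
    """Image of the line L = {a + t*d} under the half-turn about o, computed from
    the images of the two points a and a + d of L; returns (point, direction)."""
    a2 = half_turn(a, o)
    b2 = half_turn((a[0] + d[0], a[1] + d[1]), o)
    return a2, (b2[0] - a2[0], b2[1] - a2[1])


def parallel_and_distinct(a, d, a2, d2):
    """True if {a2 + t*d2} has the same direction as {a + t*d} and is a different
    line, i.e., the two lines are parallel in the sense of the text."""
    same_direction = cross(d, d2) == 0
    off_the_line = cross(d, (a2[0] - a[0], a2[1] - a[1])) != 0
    return same_direction and off_the_line


L_point, L_dir = (0, 0), (1, 2)
a2, d2 = image_of_line(L_point, L_dir, (3, 1))       # O = (3, 1) is not on L
assert parallel_and_distinct(L_point, L_dir, a2, d2)
a3, d3 = image_of_line(L_point, L_dir, (1, 2))       # O = (1, 2) lies on L
assert not parallel_and_distinct(L_point, L_dir, a3, d3)   # here rho(L) = L
```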
One should also get an intuitive feeling for why Theorem G1 is true by some
hands-on activities.

Activity 1. Draw a line L on a piece of paper. Also draw a point P on L
and a point O not belonging to L. Copy this picture exactly on a transparency
using a different color. Now pin the transparency to the paper at the point O and
rotate the transparency by 180 degrees. Does the rotated image of L look like a
line parallel to L itself? If we denote the rotated image of P by P′, what do you
observe to be the relationship between the three points P, O, and P′?
Try this experiment with different choices of O.

Proof of Theorem G1. We give two proofs of this simple but important theorem.
The first proof argues by contradiction. The second proof does not use a
contradiction argument and is one that can be used directly in a school classroom.

[Figure: the line L, the point O not on L, and the line ρ(L), drawn as if it met L at a point Q = ρ(P) with P on L.]
By the assumptions about rotations (page 217), we know ρ(L) is a line. Suppose
ρ(L) is not parallel to L. Then they intersect at a point Q. The fact that
Q ∈ ρ(L) means that there is a point P ∈ L so that ρ(P) = Q. Since ρ is a rotation
of 180° around O, the three points P, O, and ρ(P) are collinear; i.e., P, O, and
Q are collinear (see (L6)(iii) on page 188). Moreover, P ≠ Q: since P lies on L and
O does not, P ≠ O, so P and Q = ρ(P) lie on opposite sides of O. As usual, call
this line L_PQ. Now, not only is P on L, but Q is also on L because Q is a point of
L ∩ ρ(L). Thus L and L_PQ have two distinct points P and Q in common and
therefore they coincide: L = L_PQ (see (L1) on page 165). But O also lies on L_PQ,
so O lies on L, and this directly contradicts the hypothesis that O is not contained
in L. Therefore ρ(L) has to be parallel to L. Theorem G1 is proved.

Now a second proof. Let Q be a point so that Q ∈ ρ(L). We have to prove
that Q does not belong to L. By the definition of ρ(L), Q being in ρ(L) means that
there is a point P on L so that ρ(P) = Q.

[Figure: P on L, the point O not on L, and Q = ρ(P) on the line L_OP.]

Since ρ is a 180-degree rotation around O, the three points P, O, Q are collinear and
therefore Q lies on L_OP. Observe that since P is on L and O does not lie on L by
hypothesis, O and P are distinct so that P and Q are also distinct points (because P
and Q lie on opposite half-lines with respect to O by the definition of a 180° angle).
Now O lies on L_OP and O is not on L, so that L_OP and L are distinct lines. By
Lemma 4.2 on page 165, L_OP and L can have at most one point in common. Since
P is already known to be on both L and L_OP, no other point on L_OP can be on L.
In particular, since Q is distinct from P, Q does not lie on L. Since Q was an
arbitrary point of ρ(L), the lines ρ(L) and L have no point in common; i.e.,
ρ(L) ∥ L. The proof is complete.

As mentioned on page 166, we can now prove an existence result that comple-
ments the parallel postulate (page 165).

Corollary to Theorem G1. If a point P and a line L are given so that P does
not lie on L, then there exists a line passing through P and parallel to L.

Proof of corollary. Let Q be any point on L and let O be the midpoint of segment
PQ. Note that O does not lie on L: otherwise L, which contains O and Q, would be
the line L_PQ and would contain P. If ρ is the 180-degree rotation around O, then
of course ρ(Q) = P so that ρ(L) passes through P. Moreover, Theorem G1 says
ρ(L) ∥ L, and the corollary is proved.
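The construction in this proof can be sketched in coordinates as follows. The names are hypothetical. The line L is given by a point a on it and a direction vector d, and Q is taken to be a. The half-turn about the midpoint O of PQ sends Q to P, so the image of L passes through P, and its direction is parallel to d.

```python
def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def parallel_through(p, a, d):
    """Line through p parallel to L = {a + t*d}, for p not on L, built as in the proof:
    take Q = a on L, let O be the midpoint of PQ, and take the image of L under
    the half-turn about O. Returns (point, direction)."""
    q = a
    o = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    image_q = (2 * o[0] - q[0], 2 * o[1] - q[1])        # equals P
    b = (a[0] + d[0], a[1] + d[1])                      # a second point of L
    image_b = (2 * o[0] - b[0], 2 * o[1] - b[1])
    return image_q, (image_b[0] - image_q[0], image_b[1] - image_q[1])


start, direction = parallel_through((1, 5), (0, 0), (2, 1))
assert start == (1, 5)                 # the image line passes through P
assert cross((2, 1), direction) == 0   # and has the direction of L
```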

We can now fulfill the promise made on page 166 by giving a direct proof of
Lemma 4.3. We are given three lines L1, L2, and L3, and they satisfy L1 ∥ L2 and
L2 ∥ L3. We have to prove that if L1 and L3 are distinct, then L1 ∥ L3.
Suppose L1 and L3 are distinct. By Lemma 4.1 on page 165, there is a point
P on L3 so that P is not on L1. If we can prove that any line ℓ passing through
P distinct from L3 must intersect L1, then this would imply that any line passing
through P that is not L3 is not parallel to L1. But since the preceding corollary
implies that there is a line passing through P that is parallel to L1, we conclude
that L3 must be that line. This then shows L3 ∥ L1.
Thus let ℓ be a line passing through P and distinct from L3. We are going to
prove that ℓ intersects L1.


[Figure: the lines L3, L2, L1 and a line ℓ through P that meets L2 at P′ and L1 at P″.]

We cannot prove in one step that ℓ intersects L1. Instead, we first prove that ℓ
intersects L2. Indeed, since P ∈ L3 and L3 ∥ L2 by hypothesis, P does not lie
on L2. By the parallel postulate, through P passes at most one line parallel to
L2. Since, by hypothesis, L3 is such a line, L3 is the only line passing through
P that is parallel to L2. Therefore ℓ is not parallel to L2, and ℓ intersects L2 at
some point P′. As usual, P′ ∈ L2 and L2 ∥ L1 (by hypothesis) imply that P′ does
not lie on L1. By the parallel postulate again, through P′ passes at most one line
parallel to L1. Since L2 is that line by hypothesis, while ℓ is distinct from L2
(ℓ contains P, which does not lie on L2), ℓ is not parallel to L1. Thus
ℓ must intersect L1 at some point P″. By the remark above, this proves Lemma 4.3.

Theorems G2–G4

The next two theorems are both intuitively obvious, and both are simple con-
sequences of Theorem G1. Our immediate concern is whether given a line L and
a point P not on L, there can be two distinct lines passing through P and both
perpendicular to L (see Wish List #1 on page 219). Your instinct tells you "of
course not", because if this happens, the lines would create a triangle whose angles
(have degrees that) add up to > 180 degrees.
[Figure: two distinct lines through P, both perpendicular to L, forming a triangle with L.]
However, that theorem about the sum of angles of a triangle is not known at this
point and therefore cannot be invoked. We must find another way to show that
this is impossible. This is the content of the following theorem.
Theorem G2. Two lines perpendicular to the same line are either identical or
parallel to each other.

Proof. Let L1 and L2 be two lines perpendicular to a line ℓ at A1 and A2,
respectively.
[Figure: L1 and L2 perpendicular to ℓ at A1 and A2, the midpoint M of A1A2, and the line ρ(L1).]

We have noted in Lemma 4.11 on page 192 that the line passing through a given
point of a line and perpendicular to that line is unique. Thus if A1 = A2, L1 and
L2 are identical. So suppose A1 ≠ A2. We need to prove that L1 ∥ L2. Let ρ be
the rotation of 180 degrees around the midpoint M of A1A2. Note that M does not
lie on L1, because L1 meets ℓ only at A1 and M ≠ A1. If we can show that
the image of L1 by ρ is L2, then we know L2 ∥ L1 by virtue of Theorem G1.
To this end, note that ρ(L1) is a line, by assumption (ρ1) on page 217. Furthermore,
ρ(L1) contains A2 because ρ(A1) = A2. Since ρ is the rotation of 180
degrees around M, it is clear that also ρ(A2) = A1. Thus ρ(ℓ) is a line that
passes through A1 and A2, and since ℓ is the unique line passing through these
two points (by (L1)), we have ρ(ℓ) = ℓ. Now, L1 ⊥ ℓ. By assumption (ρ2) on
page 217, rotations map perpendicular lines to perpendicular lines. Thus we have
ρ(L1) ⊥ ρ(ℓ); i.e., ρ(L1) ⊥ ℓ. Therefore each of ρ(L1) and L2 is a line that passes
through A2 and is perpendicular to ℓ. By the preceding observation about the
uniqueness of the line perpendicular to a line ℓ at a given point of ℓ, we see that,
indeed, ρ(L1) = L2 and therefore, by Theorem G1, L1 ∥ L2. Theorem G2 is proved.

We will make a digression. Recall that we have introduced the concept of a
rectangle as a quadrilateral whose adjacent sides are all perpendicular to each other
(see p. 193). As a result of Theorem G2, we now have

Corollary. A rectangle is a parallelogram.

Returning now to the main line of our discussion, we see from Theorem G2
that given a point outside a line ℓ, there can be at most one line passing through
the point and ⊥ ℓ. Now comes the question of existence: is there such a line? We
show that such must be the case by a clever argument: so far we only have one
nontrivial existence theorem, namely, the Corollary to Theorem G1 on page 222.
We will use that to produce a line perpendicular to a given line. First, we give a
definition. Given two lines L1 and L2, a transversal of L1 and L2 is a line ℓ that
is distinct from L1 and L2 and intersects both. We will prove

Theorem G3. A transversal of two parallel lines that is perpendicular to one
of them is also perpendicular to the other.

Proof. Let L1 ∥ L2 and let the transversal ℓ meet L1 and L2 at A1 and A2,
respectively. Assuming L1 ⊥ ℓ, we will prove that L2 ⊥ ℓ. Again we consider the
rotation ρ of 180° around the midpoint M of the segment A1A2.

[Figure: parallel lines L1 and L2 cut by the transversal ℓ at A1 and A2, with M the midpoint of A1A2.]

As before, ρ(A1) = A2 so that ρ(L1) is a line passing through A2. Since A1 ≠ A2
(L1 and L2 have no point in common) and ℓ meets L1 only at A1, the point M of ℓ
does not lie on L1. So Theorem G1 applies, and we also know that ρ(L1) ∥ L1.
Since L2 is likewise a line passing through A2
and parallel to L1, the parallel postulate (page 165) implies that ρ(L1) = L2. Now
L1 ⊥ ℓ, and ρ preserves degrees of angles. Therefore ρ(L1) ⊥ ρ(ℓ); i.e., L2 ⊥ ρ(ℓ).
Since ℓ passes through M and ρ is a 180° rotation around M, we see that ρ(ℓ) = ℓ.
Thus from L2 ⊥ ρ(ℓ), we conclude L2 ⊥ ℓ, as desired.
Theorem G3 has two interesting corollaries. The first proves Wish List #1 on
page 219, and the second one justifies our definition of a rectangle: rectangles do
exist in the plane.

Corollary 1 of Theorem G3. Given a point P not lying on a line ℓ, there exists
one and only one line L passing through P and perpendicular to ℓ.
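[Figure: the line ℓ, a point A on ℓ, the perpendicular Λ to ℓ at A, and the line L through P parallel to Λ.]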

Proof. Let A be a point on ℓ and let Λ be the line passing through A and perpendicular
to ℓ. If Λ passes through P, we have proved the existence part. If not, then
by the Corollary to Theorem G1 on page 222, there exists a line L passing through
P and parallel to Λ. We shall prove presently that L intersects ℓ. Thus ℓ intersects
both Λ and L and is therefore a transversal of Λ and L. By Theorem G3, we have
L ⊥ ℓ. This then completes the proof of the existence of such an L provided we
can show that L intersects the line ℓ.
Suppose L does not intersect ℓ; then L ∥ ℓ. But we already know that L ∥ Λ.
Therefore, by Lemma 4.3 on page 166, Λ ∥ ℓ, and this contradicts the fact that Λ
meets ℓ at A. Thus L intersects ℓ after all.
To prove the uniqueness of L, suppose another line L′ passes through P and is
also perpendicular to ℓ. By Theorem G2, since L and L′ are not parallel (they have
P in common), they have to be identical. Thus L = L′. Corollary 1 to Theorem
G3 is proved.
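The existence argument can likewise be followed in coordinates. In the Python sketch below the names are hypothetical, and ℓ is given by a point A on it and a nonzero direction vector d. The line Λ has direction n = (−d₂, d₁), perpendicular to d, and L is the line through P with the same direction n. The sketch then computes the point where L meets ℓ, a point that the proof shows must exist.

```python
def perpendicular_through(p, a, d):
    """Follow the existence proof for a line l = {a + t*d} (d != (0, 0)) and a point p.

    Lambda is the perpendicular to l at A = a, with direction n = (-d[1], d[0]);
    L is the line through P parallel to Lambda, so it also has direction n.
    Returns L as (point, direction) and the point where L meets l.
    """
    n = (-d[1], d[0])
    # Solve p + s*n = a + t*d for s by Cramer's rule; the determinant of the
    # columns n, -d equals d[0]**2 + d[1]**2, which is never 0.
    det = n[0] * (-d[1]) - n[1] * (-d[0])
    rx, ry = a[0] - p[0], a[1] - p[1]
    s = (rx * (-d[1]) - ry * (-d[0])) / det
    meet = (p[0] + s * n[0], p[1] + s * n[1])
    return (p, n), meet


(line_point, n), meet = perpendicular_through((3, 4), (0, 0), (1, 0))
assert meet == (3.0, 0.0)               # L meets the x-axis at (3, 0)
assert n[0] * 1 + n[1] * 0 == 0         # and is perpendicular to it
```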

Corollary 2 of Theorem G3. There exist rectangles in the plane.

Proof. Indeed, let two lines L1 and L2 be perpendicular to a third line L at two
distinct points A and D, respectively. Let a line L′ be perpendicular to L1 at a point
B on L1 with B ≠ A, and let L′ meet L2 at a point C.
[Figure: L1 and L2 perpendicular to L at A and D, and L′ perpendicular to L1 at B, meeting L2 at C.]

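Then L′ ∥ L by Theorem G2 (L′ and L are both perpendicular to L1, and they are
distinct because L meets L1 only at A while B ≠ A). Since L2 is a transversal of
the parallel lines L and L′ and L2 ⊥ L, Theorem G3 gives L2 ⊥ L′. Therefore all
four angles of ABCD are right angles, and ABCD is a rectangle. Corollary 2 to
Theorem G3 is proved.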
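Here is a coordinate version of this construction, as a minimal sketch with hypothetical names. L passes through A with direction vector d, and D lies on L. L1 and L2 have the perpendicular direction n, B lies on L1, and C is where L′ meets L2. The check confirms that each angle of ABCD is a right angle, meaning the dot product of the adjacent sides is 0. Exact comparison is reliable here because the examples use integers.

```python
def add(p, v, k=1):
    """The point p + k*v."""
    return (p[0] + k * v[0], p[1] + k * v[1])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def rectangle_by_construction(a, d, k_ad, k_ab):
    """Hypothetical coordinate version of the construction in the proof.

    L passes through A = a with direction d, and D = A + k_ad*d (k_ad != 0).
    L1 and L2 are the perpendiculars to L at A and D (direction n), and
    B = A + k_ab*n (k_ab != 0) lies on L1. L' is the perpendicular to L1 at B,
    i.e., the line through B with direction d; solving B + s*d = D + t*n
    gives s = k_ad and t = k_ab, so C = B + k_ad*d.
    """
    n = (-d[1], d[0])
    A, D = a, add(a, d, k_ad)
    B = add(A, n, k_ab)
    C = add(B, d, k_ad)
    return A, B, C, D


def all_right_angles(quad):
    """True if every angle of the quadrilateral (4 vertices in order) is a right angle."""
    for i in range(4):
        prev, cur, nxt = quad[i - 1], quad[i], quad[(i + 1) % 4]
        u = (prev[0] - cur[0], prev[1] - cur[1])
        v = (nxt[0] - cur[0], nxt[1] - cur[1])
        if dot(u, v) != 0:
            return False
    return True


assert all_right_angles(rectangle_by_construction((0, 0), (3, 4), 2, 1))
```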

Corollary 2 to Theorem G3 justifies our definition of a rectangle, because we
now know that there is such a geometric figure as a rectangle. Let us put this
statement in perspective. Suppose we define a slantangle to be a quadrilateral
each of whose angles is 80 degrees. If you think this is a silly definition, you should
ask yourself why you think it is silly. In fact, a rectangle is as likely to exist as a
"slantangle" until we can prove that the sum of the angles of a quadrilateral is 360
degrees. Since the latter theorem will not be available for a while (see Section 6.5
of [Wu2020b]), the virtue of Corollary 2 to Theorem G3 is to give assurance in the
meantime that rectangles do exist and have to be taken seriously.43
The next theorem proves Wish List #2 on page 220.

Theorem G4. Opposite sides of a parallelogram have the same length.

Theorem G4, together with the fact that a rectangle is a parallelogram (Corollary
to Theorem G2 on page 224), implies that
the opposite sides of a rectangle have the same length.
This reconciles the usual definition in school mathematics of a rectangle (a quadri-
lateral with four right angles and opposite sides of the same length) with our defi-
nition of a rectangle (a quadrilateral with four right angles).
The proof of Theorem G4 requires the following lemma.

Lemma 4.17. Let F be a bijection of the plane that maps lines to lines, and let
L1 and L2 be two distinct lines. Then the image lines F (L1 ) and F (L2 ) are also
distinct. Furthermore, if L1 and L2 intersect at a point P , then F (L1 ) and F (L2 )
also intersect, and their point of intersection is F (P ).

[Figure: lines L1 and L2 meeting at P, and their images F(L1) and F(L2) meeting at F(P).]

Proof of Lemma 4.17. Because F is a bijection, the proof of the distinctness of
F(L1) and F(L2) is routine and may be left as an exercise (Exercise 2 on p. 228).
Now let the distinct lines L1 and L2 intersect at P . Since P lies on the line L1 ,
F (P ) lies on the line F (L1 ). But P also lies on L2 , so F (P ) lies on the line F (L2 ) as
well. Therefore F (P ) lies in the intersection of the distinct lines F (L1 ) and F (L2 ).
By Lemma 4.2 on page 165, two distinct lines intersect at exactly one point. Hence
F (P ) is the point of intersection of F (L1 ) and F (L2 ). The proof is complete.

Proof of Theorem G4. Given parallelogram ABCD, we must show |DA| = |BC|
and |AB| = |CD|. It suffices to prove the former.

[Figure: parallelogram ABCD with the diagonal AC and its midpoint M.]

43 To further firm up this discussion, note that "slantangles" exist in the hyperbolic plane
while rectangles do not. The latter fact is proved in Chapter 6 of [Greenberg].


4.4. THE BASIC ISOMETRIES: ROTATIONS 227

We have so few tools at our disposal that our first thoughts have to be: how
can we make use of Theorem G1? If we look at the picture of a parallelogram,
sooner or later the idea will surface that we should do a 180-degree rotation around
the midpoint of a diagonal, e.g., around the midpoint M of the diagonal AC.
(The diagonal AC is not in the original picture of ABCD, but putting it there
helps us see the situation better.) Let ρ be the rotation of 180 degrees around
M. Then ρ(C) = A, so that ρ(LBC) is a line passing through A and (by Theorem
G1) parallel to LBC. Since the line LAD has exactly the same two properties
by assumption, the parallel postulate (page 165) implies that ρ(LBC) = LAD.
Similarly, ρ(LAB) = LCD. Thus,

    the intersection of ρ(LBC) and ρ(LAB) = the intersection of LAD and LCD = D.

On the other hand, the intersection of LBC and LAB is B. By Lemma 4.17, we
have

    (4.7)    ρ(B) = D.

Recall we also have ρ(C) = A. Since ρ maps segments to segments (by assumption
(ρ1) on page 217), we have ρ(BC) = DA. Since ρ is an isometry (by assumption
(ρ2) on page 217), we have |BC| = |DA|, and Theorem G4 is proved.
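For readers who like to experiment, here is a minimal numerical sketch of the proof idea. It is not part of the axiomatic development (see the warning about coordinates on page 207): it assumes a Cartesian model in which the 180-degree rotation around M is the map X ↦ 2M − X. The helper names `half_turn` and `dist` are hypothetical.

```python
import math

Point = tuple[float, float]

def half_turn(m: Point, x: Point) -> Point:
    """180-degree rotation around m, modelled (an assumption) as x -> 2m - x."""
    return (2 * m[0] - x[0], 2 * m[1] - x[1])

def dist(p: Point, q: Point) -> float:
    """Euclidean length |PQ|."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# A sample parallelogram ABCD; D is chosen so that vector AD equals vector BC.
A, B, C = (0.0, 3.0), (1.0, 0.0), (5.0, 1.0)
D = (A[0] + C[0] - B[0], A[1] + C[1] - B[1])

M = ((A[0] + C[0]) / 2, (A[1] + C[1]) / 2)  # midpoint of the diagonal AC

print(half_turn(M, C) == A)                  # True: rho(C) = A
print(half_turn(M, B) == D)                  # True: rho(B) = D, as in (4.7)
print(math.isclose(dist(B, C), dist(D, A)))  # True: |BC| = |DA|
```

The numbers only illustrate the statement for one parallelogram; the proof above is what establishes it in general.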

Pedagogical Comments. In a school classroom, the preceding proof is plenty
good enough. However, if we want to insist on 100% mathematical clarity, then
there are two steps in the preceding proof that may appear obvious but should be
proved in detail if students ask for them. The first is why ρ(C) = A. Here is the
reason. The rotation ρ interchanges the two rays RMC and RMA; i.e., ρ(RMC) =
RMA, so that ρ(C) ∈ RMA. But ρ is also an isometry, so |ρ(MC)| = |MC|. Now
if we let C′ = ρ(C), then C′ ∈ RMA. We claim ρ(MC) = MC′. To see this,
observe that ρ(M) = M, and since ρ maps segments to segments (by assumption
(L7)(i) on page 237), ρ(MC) is a segment joining M to C′. However, by assumption
(L1) on page 165, the only segment joining M to C′ is the segment MC′ on the
line LMC′, which is LMA. Thus ρ(MC) = MC′, as desired. It follows that the
equality |ρ(MC)| = |MC| becomes |MC′| = |MC|. Since M is the midpoint of
AC, |MA| = |MC| and therefore |MC′| = |MA|. Now both C′ and A are in the
ray RMA, so we can conclude that C′ = A, by (L5)(ii) on page 184; i.e., ρ(C) = A.
A second step that may need more details is similar: why |BC| = |DA|. We
start with the fact that ρ(B) = D (as in (4.7)) and ρ(C) = A. Since ρ maps
segments to segments, by assumption (L7)(i), ρ maps the segment BC to a segment
joining D and A. But there is only one segment joining D and A, namely, the
segment DA in the line LDA, by assumption (L1) on page 165. Hence we have
ρ(BC) = DA. Since ρ is an isometry, we have |BC| = |DA|.
Our recommendation is that such details should be presented only if students
press for them. Generally speaking, they are too much of a good thing in a begin-
ning class on geometry. End of Pedagogical Comments.

Corollary to Theorem G4. The angles of a parallelogram at opposite vertices
(i.e., vertices in a quadrilateral that are not adjacent vertices) have the same degree.

The proof is already implicit in the proof of Theorem G4 and will therefore be
left as an exercise (Exercise 3 on p. 228).
Theorem G4 allows us to introduce a useful concept. Given two parallel lines,
we can now define the distance between them. First, let P be a point not lying
on a line ℓ. The distance of P to the line ℓ is by definition the length |PQ|,
where Q is the point of intersection of the line ℓ and the line passing through P
and perpendicular to ℓ. See the following picture on the left:

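[Figure, left: a point P, a line ℓ, and the foot Q of the perpendicular from P to ℓ. Right: parallel lines ℓ and ℓ′, points P and P′ on ℓ′, and the feet Q and Q′ of the perpendiculars from P and P′ to ℓ.]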
Now suppose we have parallel lines ℓ and ℓ′ and P ∈ ℓ′ (see picture on the
right). If P′ is another point on ℓ′, then we claim that the distance of P to ℓ is
the same as the distance of P′ to ℓ.44 Indeed, let the line passing through P′ and
perpendicular to ℓ intersect ℓ at Q′. By Theorem G2 (page 223), LPQ ∥ LP′Q′.
Therefore PQQ′P′ is a parallelogram. Consequently, |PQ| = |P′Q′|, by Theorem
G4. This proves the claim.
The common distance from points on one of two parallel lines to the other is
called the distance between the parallel lines.
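A small numerical check of this claim, under the assumption of a Cartesian model that the text does not use: the foot Q of the perpendicular from P to the line through two points a and b is found by orthogonal projection, and sliding P along a parallel line leaves |PQ| unchanged. The function names are illustrative only.

```python
import math

Point = tuple[float, float]

def foot_of_perpendicular(p: Point, a: Point, b: Point) -> Point:
    """Point Q on the line through a and b (a != b) with PQ perpendicular to it."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    return (a[0] + t * dx, a[1] + t * dy)

def distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Distance of p to the line through a and b, i.e. |PQ|."""
    q = foot_of_perpendicular(p, a, b)
    return math.hypot(p[0] - q[0], p[1] - q[1])

# The line l passes through a and b; the line l' passes through p1 with the
# same direction b - a, so l' is parallel to l (p1 is not on l).
a, b = (0.0, 0.0), (4.0, 2.0)
p1 = (1.0, 3.0)
p2 = (p1[0] + 2.5 * (b[0] - a[0]), p1[1] + 2.5 * (b[1] - a[1]))  # another point of l'

print(distance_to_line(p1, a, b))  # about 2.236, i.e. sqrt(5)
print(distance_to_line(p2, a, b))  # the same value
```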

Exercises 4.4.
(1) Prove that a parallelogram with one right angle is a rectangle.
(2) Let F be a bijection of the plane that maps lines to lines. Prove that F
maps distinct lines to distinct lines.
(3) Prove the Corollary to Theorem G4 on page 227.
(4) (This exercise is a further refinement of the proof of Theorem G4.) Recall
from the proof of Theorem G4 that if M is the midpoint of the diagonal
AC and ρ is the 180◦ rotation around M, then ρ(B) = D. Prove that (i)
B, M, and D are collinear and that (ii) the diagonal BD and the diagonal
AC bisect each other, in the sense that the point of intersection of AC
and BD is the midpoint of both AC and BD.
(5) Fix two points P and Q in the plane. Let ρ1 be the counterclockwise
rotation of 45◦ around P, and let ρ2 be the clockwise rotation of 90◦
around Q. Also write L for LPQ in the interest of notational simplicity.
Now describe as precisely as you can the two lines ρ1ρ2(L) and ρ2ρ1(L).
In particular, does ρ1ρ2(L) equal ρ2ρ1(L)?
(6) Prove the following slight generalization of Lemma 4.17 on page 226: let
F be a bijection of the plane and let U and V be two subsets of the plane.
Then
F (U ∩ V) = F (U) ∩ F (V).
(7) Given a line L, prove that all the points of a fixed distance k from L form
two lines each parallel to L.
44 This explains why the sleepers (cross ties) across rail tracks can afford to be all of the same
length.

(8) Given positive numbers a and b, prove that there exists a rectangle whose
sides have lengths a and b. (Do not skip any steps!)
(9) Let L1 and L2 be parallel lines and let O be a point equidistant from L1
and L2 . Let two lines passing through O intersect L1 and L2 at A, B and
C, D, respectively, as shown. Prove that |AB| = |CD|.
[Figure: L2 above L1, with D and C on L2 and A and B on L1; the lines AC and BD cross at O.]
(10) (This exercise makes use of a coordinate system; see the warning about
the use of coordinates on page 207.) Let A = (1, 0) and B = (0, 1), let
ρA be the 90◦ counterclockwise rotation around A, and let ρB be the 90◦
clockwise rotation around B. Let ρ = ρB ◦ ρA. What is ρ(A) and what is
ρ(B)?

4.5. The basic isometries: Reflections and translations


This section gives the definitions of the remaining two basic isometries, reflec-
tion and translation, by making use of the tools derived from the assumption we
made about rotations in the last section, specifically, Corollary 1 of Theorem G3 on
page 224 and Theorem G4 on page 226. It concludes by summarizing the assumptions
we make about the basic isometries.
Reflections (p. 229)
Translations (p. 231)
Assumption (L7) about basic isometries (p. 236)

Reflections

Without further ado, we proceed to define reflections.

Definition. Given a line L, the reflection across L (or with respect to
L) is by definition the transformation ΛL of Π, so that:
(1) If P ∈ L, then ΛL (P ) = P .
(2) If P is not in L, then ΛL (P ) is the point Q so that L is the
perpendicular bisector of the segment P Q.
[Figure: a point P, its reflection Q = ΛL(P), and the line L, which is the perpendicular bisector of PQ and meets it at S.]
We hasten to show that a reflection is well-defined, in the sense that for a given
point P not on L, there do not exist two distinct points Q and Q′ so that L is the
perpendicular bisector of both segments PQ and PQ′. If this were to happen, then
each time we reflected a point, we wouldn't know what we were getting, as there
would be more than one reflection of the point. So suppose there were such Q and
Q′, and we will deduce a contradiction. Observe that both lines LPQ and LPQ′
would be perpendicular to L. By Corollary 1 to Theorem G3 (page 224), there is
only one line passing through P and perpendicular to L. Therefore LPQ = LPQ′;
let us say this line intersects L at S. Then on the line LPS, the points Q and Q′ lie
in the half-plane of L opposite to that of P, so Q and Q′ are in the same half-line
of LPS relative to S. Since |SQ| = |SQ′| (as both are equal to |PS|), Q = Q′
by (L5)(ii) on page 184. This is the desired contradiction. Thus a reflection is
well-defined.
As in the case of rotations, it would be helpful to do an activity to gain some
intuitive understanding of reflections.

Activity 2. On a piece of paper, draw a line, to be called ℓ for the sake
of discussion. Draw some figure on the paper. Then use a piece of overhead-
projector transparency to carefully copy (i.e., trace over) the figure on the paper,
using a different color, say red. In particular, make sure the line ℓ is also on the
transparency. Flip over the transparency along ℓ and superimpose it on the paper,
making sure that the red line ℓ on the transparency matches point for point the
line ℓ on the paper. Now a comparison between the figure on the paper and the
corresponding red figure on the transparency gives a clear idea of how the reflection
across ℓ moves the figure around. In the following picture, the original figure is a
collection of black dots as indicated, and the red figure is represented by the white
dots. Notice that the original figure is spread over both sides of ℓ, and so is the
reflected figure.

[Figure: black dots (the original figure) and white dots (the reflected figure), spread over both sides of ℓ.]

A subset S of the plane is said to be symmetric with respect to a line L
if the reflection Λ across L maps S onto itself, i.e., if Λ(S) = S. It is also common
to say that the set S has a line symmetry or has bilateral symmetry if it is
symmetric with respect to some line. Each of the following capital letters of the
alphabet, for example, has bilateral symmetry with respect to a vertical line:
A, H, I, M, O, T, U, V, W, X, Y.
The same is true of the following capital Greek letters:
Δ, Θ, Λ, Ξ, Π, Υ, Φ, Ψ, Ω.
By contrast, it is relatively easy to convince oneself that letters such as J, F, L, and
P have no bilateral symmetry with respect to any line.
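The definition of symmetry with respect to a line can be tested on finite "dot pictures". The sketch below is only an illustration under stated assumptions: the letters are modelled as hypothetical sets of grid points, and only reflections across vertical lines x = c are tried, so a `False` answer says nothing about other lines.

```python
Point = tuple[int, int]

def reflect_vertical(p: Point, c: int) -> Point:
    """Reflection across the vertical line x = c."""
    return (2 * c - p[0], p[1])

def is_symmetric(points: set[Point], c: int) -> bool:
    """True if the reflection across x = c maps the set onto itself."""
    return {reflect_vertical(p, c) for p in points} == points

# Hypothetical dot pictures of two capital letters on a 3-by-3 grid.
letter_T = {(0, 2), (1, 2), (2, 2), (1, 1), (1, 0)}
letter_L = {(0, 2), (0, 1), (0, 0), (1, 0), (2, 0)}

print(is_symmetric(letter_T, 1))  # True
print(is_symmetric(letter_L, 1))  # False
```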
Reflections enjoy a remarkable property. Fix a line L, and let Λ be the reflec-
tion with respect to L. Then it is straightforward to check that Λ ◦ Λ = I, where I
denotes the identity transformation of the plane as usual. But this means Λ is its
own inverse. It follows from Theorem 4.15 on page 211 that every reflection is a
bijection.

As in the case of rotations, we make the following entirely plausible assump-
tions about reflections:
(Λ1) Any reflection maps a line to a line, a segment to a segment,
a ray to a ray, and an angle to an angle.
(Λ2) Any reflection preserves lengths of segments and degrees of
angles.

By assumption, a reflection is an isometry (see (Λ2)). Here is a simple appli-
cation of these assumptions.

Lemma 4.18. Every point on the perpendicular bisector of a segment is equidis-
tant from the endpoints of the segment.
[Figure: segment BC, its perpendicular bisector ℓ meeting BC at S, and a point A on ℓ.]
Proof. Indeed, let ℓ be the perpendicular bisector of BC and let A ∈ ℓ. We have
to prove |AB| = |AC|. Let Λ be the reflection with respect to ℓ. By the definition
of reflection, we see that Λ(B) = C and Λ(A) = A, and therefore, by assumption
(Λ1), Λ(AB) = AC. By assumption (Λ2), we have |AB| = |AC|. The proof is complete.
As in the case of rotations, we point out that there are "plenty of" reflections:
every line has a unique reflection that leaves each of its points fixed.

Lemma 4.19. Given a line in the plane, there is a reflection across that line.

The lemma follows immediately from the definition of reflection and Corollary
1 of Theorem G3 on page 224.

Translations

The last basic isometry to be introduced is translation. We have to introduce
a new concept before we can give the definition of a translation. Let AB and
PQ be two vectors. We will need to know what it means for the vectors AB and PQ
to be pointing in the same direction. By definition, this means (i) either LAB = LPQ
or LAB ∥ LPQ and (ii) there is a line L0, distinct from LAB and LPQ and parallel
to neither, so that one of the closed half-planes of L0 contains both rays, RAB and
RPQ. (Recall: "closed half-plane" of L0 means the union of a half-plane of L0 with
L0 itself; see page 176.)
We hasten to add that if the line L0 in (ii) is not parallel to one of LAB and LPQ,
then, by virtue of Lemma 4.3 on page 166, it is not parallel to the other either.

If the vectors AB and PQ are pointing in the same direction, then typically they look
something like the following, where the "right" closed half-plane of L0 contains
RAB and RPQ:

[Figure: a line L0 with the vectors PQ and AB both pointing into the right closed half-plane of L0.]
If LAB = LPQ, then the vectors AB and PQ pointing in the same direction would look like
this:

[Figure: the vectors PQ and AB on one line, both pointing to the right, with L0 crossing that line.]

Still assuming that LAB = LPQ, if we make LAB into a number line (as (L3) on
page 167 says we could), then the preceding picture suggests that the vectors AB
and PQ point in the same direction if and only if either P < Q and A < B, or
P > Q and A > B. The simple proof may be left to Exercise 5 on page 237. An
alternate formulation is this: suppose LAB = LPQ. Then the vectors AB and PQ
point in the same direction if and only if RAB ⊂ RPQ or RPQ ⊂ RAB. One can also
prove this easily by making LAB into a number line or making use of Lemma 4.6
on page 174 (see Exercise 12 on page 239).
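The number-line criterion and the ray formulation can be compared mechanically. In this sketch (hypothetical function names; points are real numbers on one number line, with A ≠ B and P ≠ Q), the ray RAB is the half-line starting at A that contains B, and the check confirms that the two formulations agree on a grid of small integer examples. This is a spot check, not a proof.

```python
from itertools import product

def same_direction(a: float, b: float, p: float, q: float) -> bool:
    """Number-line criterion: A < B and P < Q, or A > B and P > Q."""
    return (a < b and p < q) or (a > b and p > q)

def ray_contained(a: float, b: float, p: float, q: float) -> bool:
    """True if the ray R_AB is a subset of the ray R_PQ (one number line)."""
    if (b > a) != (q > p):
        return False  # one ray is unbounded to the right, the other to the left
    return a >= p if q > p else a <= p

def rays_nested(a: float, b: float, p: float, q: float) -> bool:
    """R_AB is contained in R_PQ, or R_PQ is contained in R_AB."""
    return ray_contained(a, b, p, q) or ray_contained(p, q, a, b)

# Spot check: the two formulations agree whenever A != B and P != Q.
agree = all(
    same_direction(a, b, p, q) == rays_nested(a, b, p, q)
    for a, b, p, q in product(range(-2, 3), repeat=4)
    if a != b and p != q
)
print(agree)  # True
```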
We usually abuse the language and say that "the vector AB and the line L are parallel" to
mean that LAB and L are parallel. With this understood, it is essential that, in the
preceding definition, the line L0 not be parallel to either of the vectors AB and PQ.
Otherwise, we could have the following situation where L0 ∥ LAB ∥ LPQ and the
"lower" closed half-plane of L0 does contain both rays, RAB and RPQ, and yet the
vectors AB and PQ are in no way "pointing in the same direction".

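[Figure: L0 parallel to LPQ and LAB; the vector PQ points right and the vector AB points left, and both lie in the lower closed half-plane of L0.]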

The following lemma gives some substance to the preceding definition.


Lemma 4.20. Given a vector AB and a point P, there is a unique vector PQ
pointing in the same direction as AB so that |AB| = |PQ|.

Proof. [Since the lemma is intuitively obvious (try drawing lots of pictures) and the
proof is tedious, it is suggested that this proof not be given in a school classroom.]
First assume P ∈ LAB. If P = A, then we simply let Q = B. From now on,
we may assume P ≠ A. Either P lies on the ray RAB or P does not.

First, assume P ∈ RAB , as shown:


[Figure: the number line LAB with A, B, P, Q in that order, and L0 passing through A.]

We make LAB into a number line with A = 0 and B > 0. Then the ray RAB
consists of nonnegative numbers. Since P ≠ A and P ∈ RAB, P > 0. Define Q so
that Q > P and so that |PQ| = |AB|. Then RPQ consists of numbers ≥ P and
therefore RPQ consists of positive numbers. Hence, if L0 is the line perpendicular
to LAB at A, then the closed half-plane of L0 that contains B contains all the
nonnegative numbers in LAB (by Lemma 4.7 on page 176) and therefore contains
both RAB and RPQ. This shows that the vectors AB and PQ point in the same direction.
Next, suppose P does not lie in the ray RAB . Then we make LAB into a number
line so that P = 0 and A > 0. If B < A, then the ray RAB would consist of all
the numbers ≤ A and, since P = 0 < A, we would have P ∈ RAB . Contradiction.
Thus A < B, and RAB consists of all the positive numbers ≥ A. We now choose
Q to be a positive number on LAB so that |P Q| = |AB|. Then RP Q consists of all
nonnegative numbers on LAB . Let L0 be the line perpendicular to LP Q at P .

[Figure: the number line LAB with P = 0, the points Q, A, and B to the right of P, and L0 passing through P.]

Then the same reasoning shows that the closed half-plane of L0 that contains A
contains all the nonnegative numbers on LAB and, therefore, contains both RAB
and RPQ. This again shows that the vectors AB and PQ point in the same direction,
and the existence part of the lemma is proved if P lies in LAB.
This vector PQ is unique because, by the definition of "pointing in the same
direction", we know that the point Q must lie in LAB, and therefore Q must lie
in one of the two rays on LAB issuing from P. The requirement
that |PQ| = |AB| implies that once a ray issuing from P is chosen, there can be
only one Q in that ray so that |PQ| = |AB| (by (L5)(ii) on page 184). Since the
preceding existence proof specifies the ray issuing from P in which Q resides, it is
clear that this Q is unique and therefore the vector PQ is unique.
Next, suppose P does not lie in LAB. Let L0 be the line LAP. Let L1 be the
line passing through P and parallel to LAB. Let L2 be the line passing through
the endpoint B of the vector AB and parallel to the line L0. The point Q is the
intersection of L1 and L2, as shown:
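[Figure: the vector AB, the line L0 = LAP, the line L1 through P parallel to LAB, and the line L2 through B parallel to L0, meeting L1 at Q.]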

Now observe that LPQ ∥ LAB. Furthermore, because ABQP is a parallelogram by
construction, |AB| = |PQ| by Theorem G4 on page 226. We claim that both rays
RPQ and RAB lie in the closed half-plane of L0 containing B. This is because B
and Q lie on a line L2 parallel to L0, so the segment BQ contains no point of L0
and therefore B and Q lie in the same half-plane of L0. By Lemma 4.7 on page
176, both RPQ and RAB lie in this closed half-plane, thereby proving the claim.
Since L0 meets LAB at A and LPQ at P, it is parallel to neither, so the vectors
AB and PQ point in the same direction.
It remains to prove the uniqueness of the vector PQ. Since the vectors PQ and AB
point in the same direction, Q has to lie on the unique line L1 passing through P
and parallel to LAB (the uniqueness of L1 comes from the parallel postulate). The
requirement that |PQ| = |AB| implies there are only two possibilities for Q, namely,
the two points on L1 of distance equal to |AB| from P (see (L5)(ii) on page 184).
The fact that Q must lie in the half-plane of L0 containing B then uniquely
determines Q. The proof of Lemma 4.20 is complete.

Remark. Why must the lines L1 and L2 in the preceding picture intersect?
This is probably not a question one wants to address in a proof during a lesson
in a school classroom, but it is something a teacher should be ready to explain if
the question is raised. So suppose not; then L1 ∥ L2. Since we also have L0 ∥ L2
by construction, we have L0 ∥ L1 or L0 = L1 by Lemma 4.3 on page 166. But
L0 ≠ L1 because A ∈ L0 and A does not lie in L1; therefore, L0 ∥ L1. This is
absurd because L0 intersects L1 at P. Hence, L1 must intersect L2 after all.

With the availability of Lemma 4.20, we can now define a translation.


Definition. Given a vector AB, the translation along AB is the transfor-
mation TAB of the plane so that, for a point P in the plane, TAB(P) = Q, where
Q is the endpoint of the vector PQ which points in the same direction as AB and
so that |PQ| = |AB|.

It follows immediately from the definition of the translation TAB that if a point
P lies in LAB , then TAB (P ) lies in LAB (see the definition of pointing in the same
direction on page 231). In particular, TAB (A) = B. We would also like to make
explicit the following property of TAB that is basically contained in the proof of
Lemma 4.20.

Lemma 4.21. Suppose a point P does not lie on LAB , and suppose TAB (P ) =
Q. Then Q is the unique point so that ABQP is a parallelogram.
[Figure: the parallelogram ABQP, with Q = TAB(P).]

Proof. The only thing that is not already in the proof of Lemma 4.20 is the
uniqueness of such a Q. With A, B, P given, ABQP being a parallelogram means
Q has to be the intersection of the line passing through P and parallel to LAB , and
the line passing through B and parallel to LAP . Since these two lines are uniquely
determined once A, B, and P are given (parallel postulate), Q is also uniquely
determined. This completes the proof.
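The construction in the proofs of Lemmas 4.20 and 4.21 can be imitated numerically. Assuming a Cartesian model (not the book's approach), the sketch intersects the line L1 through P parallel to LAB with the line L2 through B parallel to LAP, and compares the result with the coordinate shortcut P + (B − A). The helper names are hypothetical.

```python
Point = tuple[float, float]

def cross(u: Point, v: Point) -> float:
    """2D cross product; it is zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def intersect(p1: Point, d1: Point, p2: Point, d2: Point) -> Point:
    """Intersection of the lines p1 + s*d1 and p2 + t*d2 (assumed not parallel)."""
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = cross(r, d2) / cross(d1, d2)
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])

def translate_by_parallelogram(a: Point, b: Point, p: Point) -> Point:
    """T_AB(P) for P not on L_AB: complete ABQP to a parallelogram."""
    ab = (b[0] - a[0], b[1] - a[1])  # direction of L1 (parallel to L_AB)
    ap = (p[0] - a[0], p[1] - a[1])  # direction of L2 (parallel to L0 = L_AP)
    return intersect(p, ab, b, ap)

A, B, P = (0.0, 0.0), (3.0, 1.0), (1.0, 2.0)
Q = translate_by_parallelogram(A, B, P)
print(Q)                                               # (4.0, 3.0)
print(Q == (P[0] + B[0] - A[0], P[1] + B[1] - A[1]))   # True
```

Intersecting two lines mirrors the proof more closely than the shortcut, which is why both are shown.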

We observe that every translation has an inverse transformation (see page 211)
that is also a translation. To see this, let us keep the same notation as Lemma 4.21,
so that TAB is the translation along the vector AB. We will prove that the
translation TBA is inverse to TAB. If Q lies on LBA, then one can easily prove that,
since the vectors PQ and AB point in the same direction, the vectors QP and BA
also point in the same direction. Therefore, since |QP| = |BA|, we have TBA(Q) = P
in this case (Exercise 2 on page 237). If Q does not lie on LBA, then since ABQP is
a parallelogram, BAPQ is also a parallelogram. The uniqueness part of Lemma 4.21
now shows that TBA(Q) = P. Therefore for any point P, we have TBA(TAB(P)) = P, or,

    TBA ◦ TAB = I.

By switching the points A and B, we obtain

    TAB ◦ TBA = I.

This means that for any vector AB, the translation TAB has an inverse transforma-
tion TBA. By Theorem 4.15 on page 211, every translation is a bijection of the plane.

Mathematical Aside: Because the composition of translations is a translation
(Exercise 10 on page 238), the fact that the inverse of a translation is a translation
means that the set of all translations in the plane is a group whose binary operation
is the composition of transformations. This group of translations is a subgroup of
the group of bijections of the plane defined on page 211.
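Both facts, that TBA undoes TAB and that composing translations gives a translation, are easy to watch in a Cartesian model, where (as an assumption outside the book's framework) TAB moves every point by the coordinate difference B − A. The sketch checks TBA ◦ TAB = I at a sample point and the special case TBC ◦ TAB = TAC; the function names are hypothetical.

```python
from typing import Callable

Point = tuple[int, int]
Map = Callable[[Point], Point]

def translation(a: Point, b: Point) -> Map:
    """T_AB modelled as p -> p + (b - a)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return lambda p: (p[0] + dx, p[1] + dy)

A, B, C = (0, 0), (3, 1), (-1, 4)
T_AB, T_BA = translation(A, B), translation(B, A)
T_BC, T_AC = translation(B, C), translation(A, C)

P = (2, 5)
print(T_BA(T_AB(P)))           # (2, 5): T_BA o T_AB fixes P
print(T_BC(T_AB(P)), T_AC(P))  # (1, 9) (1, 9): T_BC o T_AB agrees with T_AC
```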

As in the case of reflections, the following hands-on activity is highly recom-
mended as a way to enhance one's intuitive understanding of translations.

Activity 3. We use a piece of paper as a model for the plane. On the paper,
draw a vector AB, and also extend the segment AB to a line, denoted as usual
by LAB. Draw some figures on the paper in black. Then use a sheet of overhead-
projector transparency to copy (i.e., trace over) everything on the paper, using (let
us say) a red pen. In particular, make sure that both the vector AB and the line
LAB are on the transparency. Holding the paper in place, slide the transparency
along the black line LAB on the paper until the red point A on the transparency is
on top of the (black) point B on the paper. The new positions of all the red figures
on the transparency will then display how the translation from A to B moves the
figures on the paper. Here is an example (the starting point of the red vector, a
red A, is not shown in the picture).

[Figure: an example of figures moved by the translation from A to B.]

We proceed to make the same assumptions about translations as those on
rotations and reflections:
(T1) Any translation maps a line to a line, a segment to a seg-
ment, a ray to a ray, and an angle to an angle.
(T2) Any translation preserves lengths of segments and degrees
of angles.
Again, we point out that, by assumption, a translation is an isometry and that
there are "plenty of" translations. The following lemma follows immediately from
the definition of translation and Lemma 4.20 on page 232.
Lemma 4.22. Given any vector AB, there exists a unique translation along
AB.

Translations have a noteworthy property.

Theorem G5. Given a line LAB that joins two distinct points A and B, let ℓ
be a line in the plane, and let T be the translation TAB along the vector AB. (i) If ℓ
is equal to LAB or parallel to LAB, then T(ℓ) = ℓ. (ii) If ℓ is neither equal to LAB
nor parallel to LAB, then T(ℓ) is a line parallel to ℓ itself.

Proof. Part (i) follows immediately from the definition of a translation and the
parallel postulate, so we may leave its proof as an exercise (Exercise 1 on page 237).
We will give the proof of part (ii). So let ℓ be a line neither parallel to LAB nor
equal to LAB. Suppose T(ℓ) is not parallel to ℓ. We will deduce a contradiction.
Since T(ℓ) is not parallel to ℓ, either T(ℓ) = ℓ or T(ℓ) intersects ℓ at a point
Q. In either case, we have a point Q on T(ℓ) ∩ ℓ. Since Q ∈ T(ℓ), there is a point
P ∈ ℓ so that Q = T(P).

[Figure: the lines ℓ and T(ℓ) meeting at Q = T(P), where P lies on ℓ.]

Either P does lie in LAB or it does not. Suppose P ∈ LAB; then T(P) ∈ LAB (as
was pointed out right below the definition of a translation on page 234), so that
Q ∈ LAB. Therefore both P and Q lie on LAB. But P ≠ Q, because |PQ| = |AB|
by the definition of a translation and |AB| > 0, A and B being distinct.
Thus ℓ and LAB are two lines passing through the distinct points P and Q; by
(L1), ℓ = LAB. This contradicts the hypothesis of (ii) that ℓ is not equal to LAB.
Next, suppose P does not lie on LAB. By Lemma 4.21 on page 234, ABQP is a
parallelogram, so LPQ ∥ LAB. Since P and Q are distinct points of ℓ, LPQ = ℓ;
i.e., ℓ ∥ LAB. This again contradicts the working hypothesis that ℓ is not parallel
to LAB. Hence T(ℓ) ∥ ℓ and the theorem is proved.
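A coordinate illustration of Theorem G5, again a sketch outside the axiomatic development: represent a line by a point on it and a nonzero direction vector. Modelling TAB as p ↦ p + (B − A) (an assumption), a translation moves the point but not the direction, so T(ℓ) is either ℓ itself or a line parallel to ℓ; the integer example below exhibits both cases of the theorem. The representation and helper names are hypothetical.

```python
Point = tuple[int, int]
Line = tuple[Point, Point]  # (a point on the line, a nonzero direction vector)

def cross(u: Point, v: Point) -> int:
    return u[0] * v[1] - u[1] * v[0]

def same_line(l1: Line, l2: Line) -> bool:
    (p1, d1), (p2, d2) = l1, l2
    diff = (p2[0] - p1[0], p2[1] - p1[1])
    return cross(d1, d2) == 0 and cross(d1, diff) == 0

def are_parallel(l1: Line, l2: Line) -> bool:
    """Parallel in the book's sense: distinct lines that do not meet."""
    return cross(l1[1], l2[1]) == 0 and not same_line(l1, l2)

def translate_line(l: Line, a: Point, b: Point) -> Line:
    """Image of l under T_AB; the direction vector is left unchanged."""
    p, d = l
    return ((p[0] + b[0] - a[0], p[1] + b[1] - a[1]), d)

A, B = (0, 0), (2, 1)
ell_i = ((0, 3), (4, 2))    # parallel to L_AB: case (i)
ell_ii = ((0, 3), (1, -1))  # neither equal nor parallel to L_AB: case (ii)

print(same_line(translate_line(ell_i, A, B), ell_i))       # True
print(are_parallel(translate_line(ell_ii, A, B), ell_ii))  # True
```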

Assumption (L7) about the basic isometries

We have just finished the definitions of the basic isometries (i.e., rotations, re-
flections, and translations), and it remains to make some concluding remarks. We
have made assumptions about rotations (see (ρ1) and (ρ2) on page 217), reflections
(see (Λ1) and (Λ2) on page 231), and translations (see (T1) and (T2) on page 236).
We can now summarize these assumptions in one all-embracing statement, as fol-
lows.

(L7) The basic isometries (rotations, reflections, and translations) have the
following properties:
(i) A basic isometry maps a line to a line, a segment to a seg-
ment, a ray to a ray, and an angle to an angle.
(ii) A basic isometry preserves lengths of segments and degrees
of angles.
Once again, we note that rotations, reflections, and translations are, by as-
sumption, isometries and that there are "plenty of" basic isometries in the sense of
Lemmas 4.16, 4.19, and 4.22 (page 218, page 231, and page 236, respectively).

In one sense, the terminology of "basic isometries" is unfortunate because, at
least for now, the basic isometries possess properties that seem not to be shared
by isometries in general. Indeed, whereas by definition, isometries only preserve
distance, the basic isometries are assumed to preserve not only distance but also
degrees of angles; they also map lines to lines and angles to angles (see (L7) above).
We will have to wait until Section 6.6 of [Wu2020b] before we can resolve this
apparent discrepancy; every isometry will turn out to satisfy properties (i) and (ii)
of the basic isometries in assumption (L7). Needless to say, this fact is far from
obvious.
The next section gives the most basic applications of the basic isometries, but
a lot more will be said about them in Chapter 6 of [Wu2020b]. The reason we do
not pursue this discussion further in this volume is that we are trying to present
just enough of the geometry needed to begin the study of linear equations. This
constraint comes from the practical reality of the middle school mathematics cur-
riculum: to the extent that the study of linear equations is taken up in middle
school, we must try to do enough with similar triangles to support this study.

Exercises 4.5.
(1) Prove part (i) of Theorem G5 on page 236.
(2) Let a translation TAB be given, and let TAB (P ) = Q. If Q lies on LBA ,
prove that TBA (Q) = P .
(3) Prove that if the diagonals of a parallelogram are perpendicular to each
other, then the parallelogram is a rhombus; i.e., all four sides of the par-
allelogram are equal. (See Exercise 4 on page 228.)
(4) (i) Let ABCD be a parallelogram. Suppose F is a point on AD and
E is a point on BC such that |AF | = |BE|. Prove that ABEF is a
parallelogram. (ii) Suppose in part (i) the parallelogram ABCD is a
rectangle. Prove that ABEF is also a rectangle.
(5) Suppose the four points A, B, P, and Q lie on a number line. Prove that
the vectors AB and PQ point in the same direction if and only if A < B and
P < Q, or A > B and P > Q.

(6) Suppose the two vectors AB and PQ point in the same direction and the
two vectors PQ and UV also point in the same direction. Does it follow
that AB and UV point in the same direction?
(7) (This exercise makes use of a coordinate system; see the warning about the
use of coordinates on page 207.) In the picture below, L is the horizontal
line {y = 1}, and CD is the vector that starts at C = (3, 0) and ends at
D = (2, 1). The points A and B are as shown. Let R = the reflection
across L, T = the translation along the vector CD, and ρ = the 90◦ rotation
around B (according to the definition of rotations on page 202, this is a
counterclockwise rotation). What is (ρ ◦ T ◦ R)(A) and what is (ρ ◦ R ◦ T)(A)?
Are the two points the same?

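[Figure: coordinate axes with origin O, the points A = (1, 2), B = (1, 0), C = (3, 0), and D = (2, 1), the horizontal line L, and the vector CD, which makes a 45◦ angle with the x-axis.]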

(8) In terms of the concept of a translation, we saw in the proof of Lemma
4.20 on page 232 that if TAB(P) = Q and P does not lie on LAB, we
can draw a picture to show where the point Q is. Using exactly the same
notation, draw a picture to show where Q′ is if TBA(P) = Q′.
(9) Prove that every point on the angle bisector of a convex angle is equidis-
tant from both sides of the angle. (Hint: Recall the concept of distance
of a point from a line on page 228.)
(10) (a) Prove that the composition of two translations is a translation. (b) Is
the composition of two rotations (with possibly distinct centers of rota-
tion) a rotation?45 (Hint: Consider the example in Exercise 10 on page
229, and also prove that if ρ is a rotation, then for a point P distinct from
the center O of the rotation, O must lie on the perpendicular bisector of
the segment joining P to ρ(P).)
(11) Let L1, L2, and L3 be parallel lines and let ℓ and L be transversals so
that they intersect L1, L2, and L3 at A, B, and C, and D, E, and F,
respectively. Suppose |DE| = |EF|. Then prove that |AB| = |BC|.
(This implies that equidistant parallel lines intercept equal segments on
any transversal.)

45 Part (b) has a continuation in Exercise 11 on page 372.


[Figure: parallel lines L1, L2, L3; the transversal ℓ meets them at A, B, C, and the transversal L meets them at D, E, F; the point P lies on L1 and the point Q on L2.]
(Hint. There are many ways to do this, and here is one that uses trans-
lations. Let the line parallel to ℓ and passing through E intersect L1 at
P, and let the line parallel to ℓ and passing through F intersect L2 at Q.
Let T be the translation along the vector DE. Then of course T(D) = E.
Prove T(E) = F, and then prove T(P) = Q.)
(12) Prove that if LAB = LPQ, then the vectors AB and PQ point in the same direction
if and only if RAB ⊂ RPQ or RPQ ⊂ RAB. (Hint: Make use of Lemma
4.6 on page 174 and imitate its proof by making LAB into a number line
and argue with < rather than with ∗. Suppose AB and PQ point in the
same direction. Let L0 be a line whose closed half-plane contains both
RAB and RPQ, and let LAB intersect L0 at O. If O ∗ P ∗ A, then prove
RAB ⊂ RPQ, but if O ∗ A ∗ P, then prove RPQ ⊂ RAB. Conversely, if
RAB ⊂ RPQ, let L0 be the line ⊥ LPQ at P. Then the closed half-plane
of L0 that contains Q contains both RPQ and RAB.)
(13) (a) Prove that every translation is equal to the composition of two reflec-
tions. (b) Prove that every rotation is also equal to the composition of
two reflections.
Comments: The net effect of this exercise seems to be that we
can forget rotations and translations because we only need re-
flections. This is an algebraic afterthought on the geometry of
basic isometries, and it must be said that, in advanced mathe-
matics, this algebraic point of view has paid immense dividends.
On the other hand, this algebraic fact is only something to keep
in mind from the point of view of algebra, but no more than
that. Geometers continue to think in terms of translations and
rotations directly.

4.6. Congruence, SAS, and ASA


This section gives the definition of a congruence and proves two of the three
most basic criteria for triangle congruence: SAS and ASA. It concludes with a
statement of the last assumption we need for doing geometry in these volumes: the
crossbar axiom. This axiom has not made its appearance until now, but it will be
seen to enter a few proofs at some of the most critical junctures.
The definition and basic properties of congruence (p. 240)
Two congruence criteria for triangles (p. 244)
The crossbar axiom (p. 250)
240 4. BASIC ISOMETRIES AND CONGRUENCE

The definition and basic properties of congruence

We begin with a key definition.

Definition. A congruence is a transformation of the plane that is the com-
position of a finite number of basic isometries.

The concept of congruence is one of the cornerstones in the school geometry
curriculum. Here are its most basic properties:

Theorem G6. (a) Every congruence is an isometry; it preserves lines, rays,
segments, and the degrees of angles, and it is also a bijection. (b) The inverse of a
congruence is a congruence. (c) Congruences are closed under composition in
the following sense: if F and G are congruences, so is F ◦ G.

Proof. It has been pointed out that every one of the basic isometries has the
following three properties: it is a bijection (see pp. 205, 231, and 235), it is an
isometry, and it maps lines to lines as well as preserves the degrees of angles (see
(L7) on page 237 for each of these claims). Because these properties persist under
composition, the proof of part (a) of the theorem is straightforward. To prove part
(b), i.e., the inverse of a congruence is a congruence, let a congruence ϕ be the
composition of three basic isometries F ◦ G ◦ H; then it is simple to directly verify
that if ψ = H −1 ◦ G−1 ◦ F −1 , then ψ ◦ ϕ = I = ϕ ◦ ψ. So ψ is the inverse of ϕ.
But the inverse of a basic isometry is a basic isometry, because the inverse of a
rotation is a rotation (see page 211), the inverse of a reflection is itself (see page
231), and the inverse of a translation is a translation (see page 235), so ψ is also
a congruence. The proof is similar if ϕ is the composition of any number of basic
isometries. Part (c) follows immediately from the definition of a congruence as a
composition of basic isometries. The proof of Theorem G6 is complete.
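The inverse construction in this proof can be mirrored numerically. The sketch below is only an illustration and not part of the axiomatic development: it assumes coordinates (points modeled as Python complex numbers x + yi), and the helper names `translation`, `rotation`, `reflection`, and `compose` are hypothetical. It builds ϕ = F ◦ G ◦ H from three basic isometries, forms ψ = H⁻¹ ◦ G⁻¹ ◦ F⁻¹, and checks on sample points that ψ ◦ ϕ and ϕ ◦ ψ act as the identity and that ϕ preserves distance.

```python
import cmath
import math

# Points are complex numbers x + yi (an illustrative assumption; the text
# itself does not use coordinates here).

def translation(v):
    """Translation along the vector v."""
    return lambda z: z + v

def rotation(center, degrees):
    """Rotation about center; positive degrees = counterclockwise."""
    w = cmath.exp(1j * math.radians(degrees))
    return lambda z: center + w * (z - center)

def reflection(p, q):
    """Reflection across the line through the distinct points p and q."""
    u = (q - p) / abs(q - p)
    return lambda z: p + u * u * (z - p).conjugate()

def compose(*maps):
    """compose(F, G, H) is F o G o H, i.e., H is applied first."""
    def phi(z):
        for f in reversed(maps):
            z = f(z)
        return z
    return phi

# A congruence phi = F o G o H and the candidate inverse psi = H^-1 o G^-1 o F^-1.
F, F_inv = rotation(0, 30), rotation(0, -30)
G = G_inv = reflection(1, 1 + 2j)        # a reflection is its own inverse
H, H_inv = translation(2 - 1j), translation(-(2 - 1j))
phi = compose(F, G, H)
psi = compose(H_inv, G_inv, F_inv)

for z in [0, 1 + 1j, -3 + 2.5j]:
    assert abs(psi(phi(z)) - z) < 1e-9 and abs(phi(psi(z)) - z) < 1e-9

p, q = 1 + 2j, -4 + 0.5j
assert math.isclose(abs(phi(p) - phi(q)), abs(p - q))   # phi is an isometry
```

Representing each basic isometry as a function makes composition literal function composition, which is exactly the sense in which the text composes transformations.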

Mathematical Aside: The fact that the composition of congruences is a con-
gruence and the inverse of a congruence is also a congruence means that the set
of all congruences in the plane forms a group, with the binary operation being the
composition of transformations. We call this the group of congruences of the plane.
Clearly, recalling the group of bijections (page 211) and the group of translations
(page 235), we have
group of translations ⊂ group of congruences ⊂ group of bijections.
In this chain of inclusions, each group is a subgroup of the next. This is one way
to observe group theory in action in a "real world" setting.

A subset of the plane S is said to be congruent to another subset S′ of the
plane if there is a congruence ϕ so that ϕ(S) = S′. In symbols, S ≅ S′. Since
the inverse of a congruence is a congruence (Theorem G6(b)) and since ϕ(S) = S′
implies ϕ⁻¹(S′) = S, we see that S ≅ S′ implies S′ ≅ S. This means "S being
congruent to S′" is equivalent to "S′ being congruent to S", so that it does not
matter whether S or S′ comes first. Thus we can speak unambiguously about two
sets S and S′ being congruent. Incidentally, the usual terminology for the fact
that S ≅ S′ implies S′ ≅ S for every S and S′ is that the relation of congruence
between two sets is a symmetric relation or that congruence is symmetric.

We leave it as an exercise to show that if S1 is congruent to S2 and S2 is congruent
to S3, then S1 is congruent to S3. This fact is usually expressed by saying that
congruence is a transitive relation or more simply congruence is transitive.
Finally, congruence is also reflexive, in the sense that S ≅ S for any S (one also
says congruence is a reflexive relation). In general, if there is a relation among
a class of geometric figures that is reflexive, symmetric, and transitive, then the
figures that are so related are intuitively "almost equal". Such a relation is called
an equivalence relation. Therefore congruence is an equivalence relation.

With the availability of the concept of congruence, we can now give new mean-
ing to segments with the same length and angles with the same degree by proving
the following lemma. The proof of the lemma is very instructive.

Lemma 4.23. (i) Two segments have the same length if and only if they are
congruent to each other. (ii) Two angles have the same degree if and only if they
are congruent to each other.

Proof. We first prove (i). Let AB and A′B′ be congruent segments. Since ev-
ery congruence is an isometry (Theorem G6), the segments have the same length.
Conversely, suppose |AB| = |A′B′|; then we will prove that there is a congruence
φ so that φ(AB) = A′B′.
[Figure: the segments AB and A′B′.]
Let T be the translation along the vector $\overrightarrow{AA'}$. Then T maps A to A′ and (by
(L7)(i) on page 237) maps AB to a segment one of whose endpoints is A′; we call
the other endpoint B₀, i.e., B₀ = T(B). (In the following picture, since
we are trying to show the position of A′B₀ (= T(AB)), we use a dashed line to
represent the segment AB in its original position. Of course T also moves A′B′
away from its original position, but since we are not concerned with T(A′B′), we
omit T(A′B′) from the picture altogether. However, we choose to retain A′B′ in
its original position in the picture for the benefit of this discussion.)
[Figure: the translated segment A′B₀ = T(AB), with A′ = T(A) and B₀ = T(B); AB (dashed) and A′B′ are shown in their original positions.]
Let ∠B′A′B₀ denote the convex angle with vertex at A′, and let |∠B′A′B₀| = t°.
Let ρ denote the t-degree rotation around A′. (In this picture, ρ is clearly the
counterclockwise rotation of t degrees, but if the endpoint B′ of A′B′ were in the
lower half-plane of the line LA′B₀, then it would be the clockwise rotation.)
[Figure: the angle ∠B′A′B₀ of t degrees at A′, with AB and A′B′ in their original positions.]
We claim that the composition ρ ◦ T is the desired congruence. Let φ = ρ ◦ T. We
already know that φ(A) = A′ (because T(A) = A′ and the rotation ρ fixes its
center A′), so we have to prove that φ(B) = B′.
The main weight of proving this claim lies in the proof that
(4.8) ρ(B₀) = B′.
Denote ρ(B₀) by B₁; then by the definition of ρ, |∠B₁A′B₀| = t°. By Lemma 4.10
on page 190, RA′B₁ coincides with RA′B′. Therefore B₁ lies on the ray RA′B′. To
prove (4.8), observe that
|A′B₀| = |AB|,
because A′B₀ = T(AB) and T preserves lengths of segments (see (L7)(ii) on page
237). In addition,
|A′B₀| = |A′B₁|,
again because ρ preserves lengths of segments on account of (L7)(ii). Putting these
two equalities together, we obtain |A′B₁| = |AB|. But by hypothesis, |AB| =
|A′B′|, so we obtain
|A′B₁| = |A′B′|.
Recall that B₁ (= ρ(B₀)) and B′ are two points on the ray RA′B′, so (L5)(ii) on
page 184 implies that B₁ = B′; i.e., ρ(B₀) = B′, and we have proved (4.8).
We can now easily prove φ(B) = B′. Since B₀ = T(B), we appeal to (4.8) to
get
B′ = ρ(B₀) = ρ(T(B)) = (ρ ◦ T)(B) = φ(B),
as claimed. By (L7)(i) and the fact that φ(A) = A′, we see that φ(AB) is the
segment joining A′ (= φ(A)) to B′ (= φ(B)) and, therefore, φ(AB) = A′B′ by (L1)
on page 165. The proof of part (i) is complete.
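The construction φ = ρ ◦ T in part (i) can be traced numerically. This is a minimal sketch, assuming points are complex numbers and using a hypothetical helper `segment_congruence` (A2 and B2 stand for A′ and B′); coordinates appear here only as an illustration, not as part of the logical development. The rotation angle is computed in radians as a signed quantity, so one formula covers both the counterclockwise and the clockwise case mentioned in the parenthetical remark above.

```python
import cmath
import math

def segment_congruence(A, B, A2, B2):
    """Hypothetical helper: return phi = rho o T with phi(A) = A2, phi(B) = B2.

    Points are complex numbers; A2, B2 play the roles of A', B' in the text.
    Assumes |AB| == |A2 B2| and A != B.
    """
    if not math.isclose(abs(B - A), abs(B2 - A2)):
        raise ValueError("the segments must have the same length")

    def T(z):                      # translation along the vector from A to A2
        return z + (A2 - A)

    B0 = T(B)
    # Signed angle (radians) from ray A2B0 to ray A2B2: positive means
    # counterclockwise, negative means clockwise.
    t = cmath.phase((B2 - A2) / (B0 - A2))

    def rho(z):                    # rotation by t about the center A2
        return A2 + cmath.exp(1j * t) * (z - A2)

    return lambda z: rho(T(z))

# Usage: B2_cw lies clockwise from the translated segment, so t < 0 here.
B2_cw = (1 + 1j) + 3 * cmath.exp(-1j * math.pi / 3)
phi_cw = segment_congruence(0, 3, 1 + 1j, B2_cw)
```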

Next, we prove part (ii). If two angles are congruent, then because congruence
preserves the degrees of angles (Theorem G6), the angles have the same degree.
Conversely, given two angles ∠AOB and ∠A′O′B′ with the same degree, we have
to produce a congruence that maps one angle to the other.

[Figure: the angles ∠AOB and ∠A′O′B′.]

As before, we use the translation T along the vector $\overrightarrow{OO'}$ to map O to O′.
T then moves ∠AOB to a new angle ∠A₀O′B₀ (see (L7)(i) on page 237) with
vertex at O′, where A₀ = T(A) and B₀ = T(B). See the left picture below. Let
the degree of the convex angle ∠A₀O′A′ be t. Let ρ be the t-degree (clockwise or
counterclockwise) rotation around O′ (in the picture, it is counterclockwise) so that
ρ maps the ray RO′A₀ to the ray RO′A′. Let ρ map B₀ to a point B₁. Thus ρ maps
∠A₀O′B₀ to ∠A′O′B₁, as shown in the right picture below.

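[Figure, left: ∠A′O′B′ and the translated angle ∠A₀O′B₀, with A₀ = T(A), B₀ = T(B), and the t-degree angle ∠A₀O′A′ marked. Right: ∠A′O′B′ and ∠A′O′B₁ = ρ(∠A₀O′B₀).]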

Notice that the congruence ρ ◦ T has moved ∠AOB to ∠A′O′B₁ so that the
latter now has one side in common with ∠A′O′B′. In the right picture above, B₁
and B′ are in opposite half-planes of the line LO′A′. Now we use the reflection Λ
across LO′A′ to map ∠A′O′B₁ to an angle in the closed half-plane of LO′A′ that
contains ∠A′O′B′. We claim that Λ maps the ray RO′B₁ to the ray RO′B′. To
prove the claim, let B₁′ = Λ(B₁). Then the claim becomes: the two rays RO′B′ and
RO′B₁′ coincide.

[Figure: the rays RO′B₁′ and RO′B′ on the same side of the line LO′A′.]
To this end, we will prove that |∠A′O′B′| = |∠A′O′B₁′| and appeal to Lemma
4.10 on page 190 to draw the desired conclusion. To prove |∠A′O′B′| = |∠A′O′B₁′|,
observe that ∠A′O′B₁′ = Λ(∠A′O′B₁), and we also have
∠A′O′B₁ = ρ(∠A₀O′B₀) = ρ(T(∠AOB)).
Therefore,
∠A′O′B₁′ = Λ(∠A′O′B₁) = Λ(ρ(T(∠AOB))) = (Λ ◦ ρ ◦ T)(∠AOB).
Let φ denote the congruence Λ ◦ ρ ◦ T. Then we have
(4.9) φ(∠AOB) = ∠A′O′B₁′.
By assumption (L7) on page 237, φ preserves degrees of angles. Thus ∠AOB and
∠A′O′B₁′ have the same degree. By hypothesis, ∠AOB and ∠A′O′B′ also have the
same degree. We conclude therefore that

|∠A′O′B′| = |∠A′O′B₁′|,

as desired. Since these two angles lie in the same closed half-plane of their common
side, which is the ray RO′A′, Lemma 4.10 implies that their other sides must coincide;
i.e., the two rays RO′B′ and RO′B₁′ coincide, and the claim is proved.
Equation (4.9) now reads:

φ(∠AOB) = ∠A′O′B′.

Therefore, φ is the congruence that maps ∠AOB to ∠A′O′B′.


Finally, we observe that if the ray RO′B₁ is already in the same closed half-
plane of LO′A′ as the ray RO′B′, then no reflection across LO′A′ would be necessary
because the preceding argument would have proved directly that RO′B₁ is equal to
RO′B′. Then the congruence ρ ◦ T maps ∠AOB to ∠A′O′B′ as before. The proof
of Lemma 4.23 is complete.
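The three-step recipe of part (ii) (translate, rotate, then reflect only if needed) can be sketched as follows. This is an illustration under stated assumptions, not the text's method in code: points are complex numbers, angles are given by a vertex and one point on each side, and `angle_congruence`, `unit`, and `convex_degree` are hypothetical names. The test "is B₁ already on the ray toward B′?" decides whether the reflection Λ is applied, matching the case split in the last paragraph of the proof.

```python
import cmath
import math

def unit(z):
    return z / abs(z)

def convex_degree(V, P, Q):
    """Degree of the convex angle PVQ (between 0 and 180)."""
    return abs(math.degrees(cmath.phase((Q - V) / (P - V))))

def angle_congruence(O, A, B, O2, A2, B2, tol=1e-9):
    """Hypothetical helper tracing the proof of Lemma 4.23(ii).

    Returns a map sending O to O2, ray OA onto ray O2A2, and ray OB onto
    ray O2B2, assuming the two angles have the same degree.
    """
    if not math.isclose(convex_degree(O, A, B), convex_degree(O2, A2, B2),
                        abs_tol=tol):
        raise ValueError("the angles must have the same degree")

    def T(z):                                    # translation O -> O2
        return z + (O2 - O)

    t = cmath.phase((A2 - O2) / (T(A) - O2))     # ray O2A0 onto ray O2A2

    def rho(z):
        return O2 + cmath.exp(1j * t) * (z - O2)

    B1 = rho(T(B))
    if abs(unit(B1 - O2) - unit(B2 - O2)) < tol:
        return lambda z: rho(T(z))               # B1 already on ray O2B2

    u = unit(A2 - O2)                            # direction of the line O2A2

    def reflect(z):                              # reflection Lambda across O2A2
        return O2 + u * u * (z - O2).conjugate()

    return lambda z: reflect(rho(T(z)))

# Usage: one target angle needs the reflection, the other does not.
O, A, B = 0, 1, 2 * cmath.exp(1j * math.radians(50))
O2 = 5 + 5j
A2 = O2 + 3 * cmath.exp(1j * math.radians(20))
B2_other_side = O2 + 4 * cmath.exp(1j * math.radians(-30))
B2_same_side = O2 + cmath.exp(1j * math.radians(70))
phi_reflect = angle_congruence(O, A, B, O2, A2, B2_other_side)
phi_direct = angle_congruence(O, A, B, O2, A2, B2_same_side)
```

As in the proof, equality of degrees is what guarantees that, when B₁ is not on the ray toward B′, it lies in the opposite half-plane, so the reflection lands it on that ray.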

Pedagogical Comments. The simplicity of the reasoning in the preceding
proof is marred by the unavoidably messy notation. In a school classroom, the
proof can be made much clearer by using two plastic angles to represent ∠AOB
and ∠A′O′B′ so that one can realize the translation T, the rotation ρ, and the re-
flection Λ by actual movements of one of the plastic angles. End of Pedagogical
Comments.

The lemma shows the equivalence of "two segments have the same length" with
"two segments are congruent". Following the tradition started by Euclid ([Euclid1]),
it is customary to say that two segments are equal when what is meant is that
they are congruent. The same goes for equal angles when what is meant is that
the angles are congruent. When there is no fear of confusion, we will also abuse the
language in this manner for the rest of these volumes, but it is important to note
that this terminology is, strictly speaking, incorrect because, for example, "equal
segments" literally means that the segments are equal geometric figures in the sense
of equal subsets of the plane (see page 141).

Two congruence criteria for triangles

The rest of the geometric discussion in this volume will focus on triangles and,
at times, polygons. It is however worth pointing out that the concept of congruence
applies not only to polygons, but to any geometric figures, including "curved" ones
such as parabolas and ellipses. See Chapter 2 of [Wu2020b] and Chapters 1 and 4
of [Wu2020c]. At this point, a little reflection will reveal that the crude definition
of congruence in TSM as "same size and same shape"—regardless of its intuitive
appeal—can in no way be used as a definition of congruence. In mathematics,
one cannot replace a precise concept with a vague intuitive one, no matter how
appealing. For example, we can say that the following two strange-looking figures
are congruent, not because they "look the same", but because the left figure can
be mapped to the right figure by a translation (such as from P to Q) followed by
a (counterclockwise) rotation of 90°.

Congruent triangles occupy a special position in elementary geometry and have
their own special conventions. Denote a triangle ABC by △ABC. Then
the congruence notation △ABC ≅ △A′B′C′ will be understood to mean not
only that there is a congruence ϕ so that ϕ(△ABC) = △A′B′C′, but also that
ϕ(A) = A′, ϕ(B) = B′, and ϕ(C) = C′.
Therefore, if △ABC ≅ △A′B′C′, then it is understood that ϕ(AB) = A′B′,
ϕ(AC) = A′C′, and ϕ(BC) = B′C′ and also that ϕ(∠A) = ∠A′, ϕ(∠B) = ∠B′,
and ϕ(∠C) = ∠C′. Therefore, by Theorem G6(a) on page 240, △ABC ≅ △A′B′C′
implies that
|∠A| = |∠A′|, |∠B| = |∠B′|, |∠C| = |∠C′|
and
|AB| = |A′B′|, |AC| = |A′C′|, |BC| = |B′C′|.
We now prove the converse:

Theorem G7. If for two triangles △ABC and △A′B′C′,
|∠A| = |∠A′|, |∠B| = |∠B′|, |∠C| = |∠C′|
and
|AB| = |A′B′|, |AC| = |A′C′|, |BC| = |B′C′|,
then △ABC ≅ △A′B′C′.

Conceptually, Theorem G7 is the correct theorem. However, in a practical
sense, this theorem is overkill, in that it is not necessary to require the equalities
of all the angles and all the sides of two triangles before we can prove the congruence
of the triangles. Typically, it suffices to impose three suitably chosen conditions46
to guarantee congruence, and the best known among these criteria are SAS, ASA,
and SSS. The proof of SSS is a bit of an interruption at this juncture and will
therefore be postponed to Section 6.2 in [Wu2020b], but we can prove the other
two here:

Theorem G8 (SAS). Assume two triangles △ABC and △A′B′C′ so that |∠A| =
|∠A′|, |AB| = |A′B′|, and |AC| = |A′C′|. Then the triangles are congruent.

Theorem G9 (ASA). Assume two triangles △ABC and △A′B′C′ so that |AB| =
|A′B′|, |∠A| = |∠A′|, and |∠B| = |∠B′|. Then the triangles are congruent.


46 The emphasis is on "suitably chosen". See, for example, Exercise 5 on page 251.

We will prove Theorem G9 below. The proof of Theorem G8 (SAS) is very
similar and will be left as an exercise (Exercise 3 on page 251). However, we want
to call attention to an animation (due to Larry Francis) that shows, under the
hypothesis of SAS, how one triangle can be moved to the other:
https://youtu.be/30dOn3QARVU.
This video sheds light on the following proof of Theorem G9 as well.

The proofs of SAS and ASA depend on the following simple lemmas. We note
that for the proof of Theorem G9 (ASA), we only need Lemma 4.24 (whose proof
is actually implicit in the proof of Lemma 4.23 on page 241). For the proof of
Theorem G8 (SAS), however, Lemma 4.25 will be needed.

Lemma 4.24. Assume two convex angles ∠M AB and ∠N AB so that |∠M AB|
= |∠N AB|. Suppose they have one side AB in common and M and N are on
opposite sides of the line LAB . Then the reflection across the line LAB maps ∠N AB
to ∠M AB (and also maps ∠M AB to ∠N AB).

[Figure: ∠MAB and ∠NAB sharing the side RAB, with M and N on opposite sides of LAB.]

Lemma 4.25. Suppose two convex angles ∠M AB and ∠N AB have the same
degree and they have one side AB in common. Assume further that the segments
AM and AN have the same length. Then either M = N (if M and N are on the
same side of LAB ) or the reflection across LAB maps N to M (if M and N are on
opposite sides of LAB ).

[Figure: ∠MAB and ∠NAB sharing the side RAB, with |AM| = |AN| and M, N on opposite sides of LAB.]
For the proof of Lemma 4.24, observe that the reflection R across LAB maps
∠N AB to ∠N0 AB, where N0 = R(N ), so that ∠N0 AB and ∠M AB are now con-
vex angles with the same degree, in the same half-plane of LAB , with one side RAB
in common. So ∠N0 AB = ∠M AB, by Lemma 4.10 on page 190. This proves
Lemma 4.24. As to Lemma 4.25, suppose M and N are on the same side of LAB.
By Lemma 4.10 on page 190 again, we know that the rays RAM and RAN coincide.
But since |AM| = |AN|, necessarily M = N (by (L5)(ii) on page 184). Now if M and N
are on opposite sides of LAB, then Lemma 4.24 shows that the reflection across LAB
maps the ray RAN to the ray RAM. Since a reflection preserves length (see (L7)(ii)
on page 237), the reflection maps the segment AN to a segment on the ray RAM with
one endpoint at A and length equal to |AN| = |AM|; therefore the reflection maps N
to M by virtue of (L5)(ii) on page 184. This proves Lemma 4.25.
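For readers who like to see Lemma 4.25 in action, here is a numerical spot check in one configuration; it is not a proof. It assumes coordinates (complex numbers), a hypothetical helper `reflect_across`, and sample values chosen for illustration: M and N make the same 35-degree angle with the ray AB on opposite sides of LAB, with |AM| = |AN| = 2.

```python
import cmath
import math

def reflect_across(p, q):
    """Reflection across the line through the distinct points p and q."""
    u = (q - p) / abs(q - p)
    return lambda z: p + u * u * (z - p).conjugate()

# Hypothetical configuration: equal angles with ray AB, opposite sides, equal lengths.
A, B = 1 + 1j, 4 + 2j
d = (B - A) / abs(B - A)                       # unit vector along ray AB
M = A + 2 * d * cmath.exp(1j * math.radians(35))
N = A + 2 * d * cmath.exp(-1j * math.radians(35))
refl = reflect_across(A, B)
print(abs(refl(N) - M) < 1e-9, abs(refl(B) - B) < 1e-9)   # prints: True True
```

The second check confirms that points of LAB stay fixed, which is why the reflection fixes the common side RAB in the lemma.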

Proof of Theorem G9. We will prove that if the triangles △ABC and △A′B′C′
satisfy |AB| = |A′B′|, |∠A| = |∠A′|, and |∠B| = |∠B′|, then there is a congruence
ϕ so that

ϕ(△ABC) = △A′B′C′.

To this end, we break up the proof into three steps, going from a special case to
the most general case:

Case I. The triangles satisfy, in addition, A = A′ and B = B′.
Case II. The triangles satisfy, in addition, A = A′.
Case III. The general case.

Case I. In this case, either C and C′ are already in the same half-plane of LAB or
they are in opposite half-planes of LAB. If the former, then we claim that C = C′,
so that in this situation we need only let ϕ be I, the identity transformation. To
prove the claim, observe that because |∠CAB| = |∠C′AB| by hypothesis, the fact
that C and C′ are in the same half-plane of LAB implies that we have the equality
of rays RAC = RAC′ (Lemma 4.10 on page 190). Thus the two rays RAC and RAC′
in the following picture in fact coincide:

[Figure: the rays RAC and RAC′ on the same side of LAB.]

In like manner, because |∠CBA| = |∠C′BA|, we have RBC = RBC′. Therefore
the following intersections are equal:

RAC ∩ RBC = RAC′ ∩ RBC′,

which means of course that C = C′. So in this situation, △ABC = △A′B′C′, or,
more formally,

(4.10) I(△ABC) = △A′B′C′,

where I is the identity transformation.

Next, suppose A = A′ and B = B′, but C and C′ are in opposite half-planes of LAB:


[Figure: A = A′ and B = B′ on the line LAB, with C and C′ on opposite sides of it.]
By assumption, ∠CAB = ∠C′AB. Therefore if Λ is the reflection with respect to
LAB, then Λ maps the ray RAC to the ray RAC′, by Lemma 4.24. For the same
reason, Λ also maps RBC to RBC′. Thus, by Lemma 4.17 on page 226,
Λ(RAC ∩ RBC) = RAC′ ∩ RBC′.
Since RAC ∩ RBC is just C and RAC′ ∩ RBC′ is just C′, we get Λ(C) = C′. Since
also Λ(A) = A = A′ and Λ(B) = B = B′, we have
(4.11) Λ(△ABC) = △A′B′C′.
Now let ϕ be either I or the reflection Λ across LAB, depending on whether C
and C′ lie in the same half-plane or different half-planes of LAB, respectively. Then
(4.10) and (4.11) together imply that ϕ(△ABC) = △A′B′C′. Case I is thus proved
with this choice of the congruence ϕ.

Case II. We now let the triangles satisfy the less restrictive condition that
A = A′, but B and B′ may now be distinct. Let the degree of the convex angle
between the rays RAB and RAB′ be θ, as shown in the left picture below.
[Figure, left: △ABC and △A′B′C′ with A = A′, and the angle θ between RAB and RAB′. Right: △A′B′C′ and the rotated triangle △A′B*C*, with B′ = B*.]
Then there is a θ-degree (clockwise or counterclockwise) rotation ρθ around the
point A so that ρθ(RAB) = RAB′ (compare the proof of Lemma 4.23 on page 241).
In the left picture above, it is a clockwise rotation, but a different configuration
may require a counterclockwise rotation from RAB to RAB′.
Let B* = ρθ(B). Now B′ and B* are two points on the ray RA′B′ so that
|A′B′| = |AB| (by hypothesis) and |A′B*| = |ρθ(AB)| = |AB| (because of (L7)(ii)
on page 237 and the fact that ρθ is a basic isometry). Therefore, B′ and B* are
two points on the same ray RA′B′ equidistant from A′; we conclude that B′ = B*
(by (L5)(ii) on page 184); i.e., B′ = ρθ(B).
Letting C* = ρθ(C), we get
(4.12) ρθ(△ABC) = △A′B′C*.
Therefore, the two triangles △A′B′C′ and △A′B′C* satisfy the condition of Case I;
i.e., they share two vertices A′ and B′, and the angles at these vertices are pairwise
congruent (because ρθ preserves degrees of angles, |∠C*A′B′| = |∠CAB| = |∠C′A′B′|
and |∠C*B′A′| = |∠CBA| = |∠C′B′A′|). Consequently, there is a congruence ω so that
(4.13) ω(△A′B′C*) = △A′B′C′.
If we apply the transformation ω to both sides of (4.12) and make use of (4.13),
we get
ω(ρθ(△ABC)) = ω(△A′B′C*) = △A′B′C′.
By Theorem G6(c) on p. 240, ω ◦ ρθ is a congruence. Letting ϕ = ω ◦ ρθ, we get
ϕ(△ABC) = △A′B′C′.
Thus the theorem is also proved for Case II.

Case III. Finally, we deal with the general case where no restriction is placed
on the triangles △ABC and △A′B′C′. Assuming that the vertices A and A′ are distinct
(otherwise Case II applies), let T be the translation along the vector $\overrightarrow{AA'}$; note
that T(A) = A′. If we define B* = T(B) and C* = T(C), then
(4.14) T(△ABC) = △A′B*C*.
[Figure: △ABC, △A′B′C′, and the translated triangle △A′B*C* = T(△ABC).]
Now the triangles △A′B′C′ and △A′B*C* have the vertex A′ in common. Furthermore,
because T is a basic isometry and therefore preserves lengths of segments and
degrees of angles, these two triangles satisfy |A′B*| = |AB| = |A′B′|,
|∠B*A′C*| = |∠A| = |∠A′|, and |∠A′B*C*| = |∠B| = |∠B′|. Thus Case II applies
and, for a suitable congruence ω, we have
(4.15) ω(△A′B*C*) = △A′B′C′.
Applying ω to (4.14) and making use of (4.15), we get
ω(T(△ABC)) = ω(△A′B*C*) = △A′B′C′.
Letting ϕ = ω ◦ T, we see that ϕ is a congruence (Theorem G6(c) on p. 240) and
ϕ(△ABC) = △A′B′C′.
This completes the proof of Theorem G9.
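The three cases of this proof combine into one recipe: translate A to A′, rotate so that B* = B′, then apply either the identity or the reflection across LA′B′. The sketch below follows that recipe under stated assumptions: points are complex numbers, and `side` and `asa_congruence` are hypothetical names. The helper uses only A, B and the half-plane containing C. It does not check the ASA hypothesis itself; as in the proof, that hypothesis is what guarantees that C then lands exactly on C′. The two sample isometries `g` and `h` are chosen only to manufacture congruent target triangles.

```python
import cmath

def side(P, Q, X):
    """Positive if X is to the left of the directed line PQ, negative if right."""
    return ((Q - P).conjugate() * (X - P)).imag

def asa_congruence(A, B, C, A2, B2, C2):
    """Hypothetical helper following Cases III, II, I of the proof of Theorem G9.

    Points are complex numbers; the triangles ABC and A2B2C2 are assumed to
    satisfy the ASA hypothesis (|AB| = |A2B2|, equal angles at A and B).
    """
    def T(z):                                   # Case III: translate A to A2
        return z + (A2 - A)

    theta = cmath.phase((B2 - A2) / (T(B) - A2))

    def rho(z):                                 # Case II: rotate so B* = B2
        return A2 + cmath.exp(1j * theta) * (z - A2)

    C_star = rho(T(C))
    if side(A2, B2, C_star) * side(A2, B2, C2) > 0:
        def omega(z):                           # Case I, same half-plane: identity
            return z
    else:
        u = (B2 - A2) / abs(B2 - A2)

        def omega(z):                           # Case I, opposite half-planes: reflect
            return A2 + u * u * (z - A2).conjugate()

    return lambda z: omega(rho(T(z)))

# Usage: congruent targets from one orientation-reversing isometry (g)
# and one orientation-preserving isometry (h).
A, B, C = 0, 4, 1 + 2j
g = lambda z: (3 - 1j) + cmath.exp(2j) * z.conjugate()
h = lambda z: (-2 + 1j) + cmath.exp(0.7j) * z
phi_g = asa_congruence(A, B, C, g(A), g(B), g(C))
phi_h = asa_congruence(A, B, C, h(A), h(B), h(C))
```

For `g` the final reflection is needed, while for `h` the identity suffices, which mirrors the two subcases of Case I.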

The preceding proof furnishes a classic example of a proof that "progresses from
the simple to the complex", in the sense that it starts by proving a relatively simple
case (Case I), then proceeds to a slightly more complex case (Case II), and then
finally arrives at the most general case (Case III). It may remind you of the proof
of Theorem 1.7 on page 49, in which we first proved that the area of a rectangle
with sides of length 1/ℓ and 1/n is 1/ℓ × 1/n and, on the basis of this fact, we proceeded
to prove the general case, namely, that the area of a rectangle with sides of length
k/ℓ and m/n is k/ℓ × m/n. One should not hold onto the simplistic belief that all proofs
yield to such a direct attack, but this direct approach is something we must keep
in mind anytime we want to prove a theorem.
We will prove in Section 6.6 of [Wu2020b] that every isometry of the plane is
a congruence. In other words, every isometry is nothing but the composition of
a finite number of basic isometries. This underscores the importance of the basic
isometries: they are the basic building blocks of all the isometries of the plane. Once
we know this, then we know that if a transformation preserves distance, it must
be a congruence and therefore it is automatically surjective and it automatically
preserves lines, segments, rays, angles, and degrees of angles. However, until we can
prove this fact about isometries, we cannot assume that an isometry has all these
desirable properties. So be careful.

The crossbar axiom

Before we leave the mathematical discussion of this chapter, we state the last
assumption we need for the development of plane geometry.

(L8) (Crossbar axiom) Given a convex angle AOB, for any point C not
equal to O in ∠AOB, the ray ROC intersects the segment AB (indicated as point
D in the following figure).


[Figure: the convex angle AOB with a point C inside it; the ray ROC meets the segment AB at D.]
You may regard the crossbar axiom as frivolous, because "what else can the
ray ROC do if it does not intersect AB"? First of all, if you consider
this statement to be obvious, then our objective of agreeing on a common starting
point has been met: we do want to assume only believable facts. As to whether
the crossbar axiom is frivolous, we should point out that up to this point, none of
the assumptions (L1)–(L7) explicitly guarantees that the ray ROC must intersect
AB.47 The purpose of the crossbar axiom is therefore to firm up the intuitive idea
that a ray is indeed "straight and infinite in one direction" and therefore cannot
stay inside the bounded triangular region OAB. For example, (L8) guarantees that
the angle bisector (see page 192) of an angle in a triangle must intersect the oppo-
site side, any two medians (see page 252 for the definition) must meet, and the two
diagonals of a parallelogram must intersect each other (page 260). It will also make
a rather dramatic appearance in unexpected places, e.g., the proof of Theorem G14
(hidden in the proof of (♣) on page 262), the proof of Theorem G16 on page 270,
the proof of Ceva’s theorem in Section 6.7 of [Wu2020b], etc.

47 See the Mathematical Aside immediately following.



Mathematical Aside: Since we are assuming that every line can be made into
a number line (see (L3) on page 167), we get easy access to the definition of the
subtle concept of "betweenness" and its basic properties (pp. 167ff.; also see the
more elaborate discussion in Chapter 8 of [Wu2020b]). Such being the case, it is
known that the crossbar axiom can be deduced from the plane separation property
(L4) (page 176) if our goal is to pursue a strictly axiomatic development of plane
(Euclidean) geometry. See page 116 of [Greenberg]. However, the proof is too
technical to be of real educational value for the purpose of teaching in K–12.

Exercises 4.6.
(1) Prove that congruence is a transitive relation (see page 241).
(2) Prove that if ϕ is a congruence and S is a convex set in the plane, then
ϕ(S) is also convex.
(3) Prove Theorem G8 (SAS) on page 245.
(4) Prove that any two circles with equal radii are congruent. (Caution: This
is a slippery proof. Be very precise.)
(5) Explain why two triangles with two pairs of congruent sides and one pair
of congruent angles need not be congruent.
(6) (This exercise makes use of a coordinate system; see the warning about
the use of coordinates on page 207.) In the picture below, let C denote
the lower left corner of the black figure. Suppose |∠CAB| = 45°, |AB| =
|BC|, and line L makes an angle of 45 degrees with line LAB.
Let F be the counterclockwise rotation of 45° with center at the point
B, let G be the clockwise rotation of 90° with center at the point A, let
H be the reflection across the line L, and let J be the translation along
$\overrightarrow{AB}$. Furthermore, let S denote the black figure at the point C.

[Figure: the points A and B, the line L, and the black figure S with lower left corner at C.]

Using a separate sketch for each of the following items, indicate the
positions of (a) G(S), (b) F (G(S)), (c) G(H(S)) and H(G(S)) (are they
equal?), (d) J(S), (e) J(F (S)) and F (J(S)), (f) H(J(S)) and J(H(S)),
(g) G(H(J(S))), and (h) J(H(F (S))).
(7) Recall that the opposite sides of a rectangle have the same length (see page
226), so that knowing the lengths of a pair of adjacent sides is equivalent
to knowing the lengths of all four sides of a rectangle. It is common to
refer to a rectangle with a pair of adjacent sides of lengths a and b as
a rectangle with side lengths a and b. Now let R1 and R2 be two
rectangles with side lengths a1 , b1 and a2 , b2 , respectively. Prove that
R1 ≅ R2 if and only if either a1 = a2 and b1 = b2, or a1 = b2 and a2 = b1.
(8) Let ABCD be a parallelogram. If a diagonal is an angle bisector (e.g.,
BD bisects ∠ABC), then prove ABCD is a rhombus; i.e., all four sides
of ABCD are equal (see the definition on page 193).
(9) Prove that the angle bisector from a vertex of a triangle is perpendicular
to the opposite side if and only if the two sides of the triangle issuing
from this vertex are equal. (Note that by the crossbar axiom, there is no
question that the angle bisector must intersect the opposite side.)
(10) A median of a triangle is a segment joining a vertex to the midpoint of
the opposite side. Prove that the median from vertex A of ABC is the
angle bisector of ∠A if and only if the median ⊥ BC. (Hint: Let D be the
midpoint of BC. Suppose AD bisects ∠A. To show AD ⊥ BC, let E be
the point on RAD so that |AD| = |DE|. Show ABEC is a parallelogram
and consider the reflection across LAE .)

4.7. A brief pedagogical discussion of proofs


We discuss a few pedagogical issues regarding the realities of teaching proofs of
theorems in a high school classroom.

We mentioned in the overview (pp. 157ff.) that the content of this chapter
should also be taught in middle school, but in a more intuitive and informal man-
ner. The Common Core Standards are in agreement ([CCSSM], page 55): the
eighth-grade geometry standards call for an understanding of "congruence and sim-
ilarity using physical models, transparencies, or geometry software" and the use of
"informal arguments to establish facts about the angle sum and exterior angle of
triangles, about the angles created when parallel lines are cut by a transversal, and
the angle-angle criterion for similarity of triangles". It will take a delicate touch to
achieve a balance between the nurturing of geometric intuition and the promotion
of reasoning. Because this middle school issue has been taken up in Chapters 4
and 5 of [Wu2016a] (also see [Wu2010a]), we will instead concentrate on the cor-
responding problem in high school: how to introduce geometry in the high school
curriculum.
Transformations of the plane and concepts of surjectivity and injectivity are
taxing topics even for college students, and it would not do to subject the average
high school student to a treatment with the same degree of precision as in this chap-
ter and the next.48 A teacher will have to judiciously simplify the content of these
chapters in order to convey to students their main message, namely, that congru-
ence and similarity are precise mathematical concepts. One suggestion is to confine
the discussion of transformations only to bijections (one-to-one correspondences) in
the plane and mention general transformations only in a few exercises or as activi-
ties for enrichment (cf. the examples on pp. 207ff.). Another suggestion is to make
liberal use of transparencies at every turn when discussing basic isometries and di-
lations. See, for example, the Activities of Sections 4.4 and 4.5 (pp. 221, 230, and
235) and the Activity of Section 5.1 on page 258. One can even assign homework

48 One of the considerations that entered into the design of this geometry curriculum is pre-
cisely the awareness that students may initially experience difficulty with these concepts. Therefore
we want these concepts to be taught, first intuitively in middle school and then more formally in
high school.
4.7. A BRIEF PEDAGOGICAL DISCUSSION OF PROOFS 253

problems on such activities and ask students to report their findings on the effects
of various transformations. With enough such hands-on experiences, students will
build up their geometric intuition about the basic isometries and therefore about
isometries in general. Using transparencies in a similar manner to illustrate the
composition of transformations is also highly recommended.
Such hands-on activities are meant, of course, to supplement the definitions
and the accompanying mathematical discussions, not to replace them. At the same
time, a high school presentation of some of the definitions, lemmas, and theorems
can probably afford to specify that they be "skipped on first reading" so that they
can be revisited later (and even then, perhaps soft-pedaled). For example, Theorem
G1 (page 220) is so basic that ample time should be devoted to its proof, but the
proofs of Theorems G2–G3 (pages 223–224) do not quite occupy the same exalted
status. Recall that the purpose of these two theorems is to show that, from a
point outside a given line, one can drop one and only one perpendicular to the
given line. This fact is needed to make the definition of a reflection well-defined.
However, since this fact is so intuitively obvious, it may be pedagogically legitimate
to simply state these two theorems but postpone their proofs so as to get to the
definition of reflection as quickly as possible. Pedagogical decisions of this type are
always the prerogative of the teacher. Having said that, we would suggest explicitly
that Theorem G4 be carefully proved because the method of proof is powerful. In
fact, the proof of Theorem G18 (page 277) can be given right after Theorem G4 if
so desired.
As another illustration of what can be done to smooth geometric instruction
in high school, it is not usually realized that the definitions of the alternate inte-
rior angles and corresponding angles of a transversal are by no means simple (see
page 276). While the precise definitions should be given (they justify why the con-
cepts of a half-line in Lemma 4.5 and a half-plane in (L4) are indispensable), one
should not make a big deal of the precision; it is boring and cumbersome. Unless
absolutely necessary, one should simply use a picture to identify alternate interior
angles. There is a similar phenomenon with certain proofs. Take Theorem G14, for
example. In this case, there is in fact an explicit recommendation on pp. 262ff. to
suppress some of the subtleties inherent in the proof. Please note that this recom-
mendation is based on two considerations: (a) if the subtle point is not explicitly
brought out by the teacher, an overwhelming majority of the students will not be
aware of it, and (b) in the context of mathematics learning, learning about the
proof of such a subtle point at this stage of students’ mathematical development
is of secondary importance. What we are suggesting is that good pedagogy always
involves some subjective judgment: there is no ironclad rule that dictates in a given
situation what should be emphasized and what could be soft-pedaled. Some com-
promise is essential when the conflict between what is ideal and what is achievable
becomes extreme.
Overall, we wish to advocate a certain flexibility in teaching proofs in geometry.
There should be no doubt about the importance of proofs all through the school
curriculum. But the main message we are trying to convey—as we did in the dis-
cussion of axiomatic developments of plane geometry on page 160—is that a slavish
adherence to the mindset of "proving absolutely everything" is counterproductive
when it comes to the teaching of high school geometry. Compare the discussion on
pp. 160ff. As an example of the kind of flexibility we have in mind, it would be
254 4. BASIC ISOMETRIES AND CONGRUENCE

at times worthwhile to skip a less interesting proof and use the time for a detailed
discussion of the evolution of another proof. A potential candidate for the latter
would be the proof of Theorem G15 on page 263.
The preceding discussion is a reminder of the realities about the mathemati-
cal education of teachers: what we teach teachers is not always what we can use,
unchanged, to teach school students. Pedagogical considerations will necessarily
modify pure mathematical knowledge, or, in the terminology of [Wu2006], the
"mathematical engineering" aspect of school mathematics education cannot be ig-
nored. In particular, these volumes contain more proofs than is optimal for students’
mathematics learning. Experience in the actual classroom will suggest the proper
give-and-take between what ideally should be taught and what could actually be
taught. What such pedagogical considerations cannot do, however, is lighten your
mathematical load as a prospective teacher. If you hope to teach certain concepts
or certain proofs effectively by making the correct mathematical decisions, then you
must know the whole mathematical story first before you can decide what message
must be conveyed and which details can be harmlessly left out. One cannot write a
faithful twenty-page plot outline of War and Peace without first carefully reading
through the thousand and more pages of the uncondensed version. Likewise, with-
out a complete knowledge of the relevant mathematics, you will not know what to
keep and what to leave out in your lessons because you won’t be able to distinguish
between what is truly essential and what is expendable. Besides, if by chance you
get a precocious youngster who wants the whole truth and nothing but the truth,
then you will have to supply the whole truth and nothing but the truth. This too
is part of your basic duty as a teacher, and these volumes are designed to get you
ready for such contingencies.
CHAPTER 5

Dilation and Similarity

This chapter introduces the other basic concept in school geometry: similarity.
Like congruence, similarity has not fared well in TSM.1 Middle school students
are taught that two sets are similar if they are the same shape but not the same
size. Intuitively, this is a useful description of similarity, but as in the case of con-
gruence, TSM has the habit of confusing nice-sounding intuitive statements with
precise mathematical definitions. When students get to high school under the illusion
that "same shape but not necessarily the same size" is all they need to know
about similarity, they are shocked to learn that similarity will henceforth mean
only equal angles and proportional sides for triangles, with no further mention
of the similarity of curved figures. Consequently,
students’ understanding of similarity upon graduation from high school consists of
two disconnected sound bites: a definition of similar triangles in terms of propor-
tional sides and equal angles and a vague conception of "same shape but not the
same size" for anything other than triangles. Thus TSM even fails to give students
a correct understanding of a concept as basic as similarity. Fortunately, a correct
definition of similarity, one that is discussed below, can be easily introduced as early
as middle school through ample hands-on experiments plus a judicious amount of
reasoning. A more detailed description of how this can be accomplished in middle
school has been given in Chapter 4 of [Wu2016a]. We are happy to point out that
this curricular advocacy has been adopted by the CCSSM ([CCSSM]), and if the
CCSSM is rigorously implemented, at least one of the egregious errors of TSM will
be rectified in the near future.
We will not pretend that a successful implementation in school classrooms of
this new point of view will be easy or straightforward. It will require some hard
work by knowledgeable teachers to bring about a true understanding of similarity.
The main goal of this chapter is to provide teachers with the content knowledge
they will need for such a successful implementation. In this context, what was said
in Section 4.7 (pp. 252ff.) is just as relevant, if not more so, to the material of this
chapter.

5.1. The fundamental theorem of similarity


The discussion of the concept of similarity will rest on a single theorem that
we call the fundamental theorem of similarity. At this point, we have not yet said
what "similarity" means (the definition will be given in Section 5.3), much less
why this theorem is fundamental. However, it will become all too clear that this
theorem dominates the whole discussion of similarity. In this section, we will only
prove a very special case of the theorem, Theorem G15 on page 263, but this special
1 See page xiv of the preface.

255
256 5. DILATION AND SIMILARITY

case—on account of its simplicity—is of independent interest, not least because
its proof requires two new characterizations of a parallelogram.
Statement of the theorem (p. 256)
Two characterizations of a parallelogram (p. 259)
Proof of FTS when $r = \frac{1}{2}$ (p. 263)

Statement of the theorem

Theorem G10 (Fundamental theorem of similarity). Let $\triangle ABC$ be given, and let $D$, $E$ be points on the rays $R_{AB}$ and $R_{AC}$, respectively, so that $D$ is not equal to $A$ or $B$. If
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|}$$
and their common value is denoted by $r$, then
$$DE \parallel BC \quad \text{and} \quad \frac{|DE|}{|BC|} = r.$$

[Figure: $\triangle ABC$ with $D$ on side $AB$ and $E$ on side $AC$.]

The fundamental theorem of similarity is usually referred to by its acronym, FTS.
We only prove a special case of this theorem in this volume and will leave the
complete proof to Section 6.4 of [Wu2020b] (for the case where r is rational) and
Section 2.6 of [Wu2020c] (for the general case where r is any positive number).
It should be mentioned that the long and intricate proof of FTS can in fact be
given at this point; it uses either ideas we have already developed or those that are
independent of what we have been doing (such as the least upper bound axiom).
There is thus no fear of circular reasoning in the rest of this chapter when we make
extensive use of the FTS to prove other results about similarity. The decision not
to give the proof at this juncture is due to the pressing need to get to the AA cri-
terion for similarity (Theorem G22 on page 288) as quickly as possible; the latter
is needed for the study of linear equations in two variables.

The number $r$ in the theorem is generally referred to as the scale factor. The statement $DE \parallel BC$ above is a standard abuse of notation for $L_{DE} \parallel L_{BC}$; i.e., the line containing the segment $DE$ is parallel to the line containing the segment $BC$. We will continue to use this abuse of notation for the rest of these volumes.
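The statement of FTS can be spot-checked numerically. The sketch below is only an illustration, not a proof. It deliberately steps outside the axiomatic development of this chapter: it assumes coordinates for the plane and places $A$, $B$, $C$ at hypothetical points. It also uses exact rational arithmetic (Python's `fractions.Fraction`), so that "parallel" and "equal ratio" can be tested with `==`. Squared lengths are compared to avoid square roots. The helper names are ours, not the text's.

```python
from fractions import Fraction as Fr

def vec(P, Q):
    """The vector from P to Q."""
    return (Q[0] - P[0], Q[1] - P[1])

def cross(u, v):
    """2D cross product; it is 0 exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def sq_len(u):
    """Squared length of u (exact for rational coordinates)."""
    return u[0] ** 2 + u[1] ** 2

def fts_check(A, B, C, r):
    """Put D on ray R_AB and E on ray R_AC with |AD|/|AB| = |AE|/|AC| = r.
    Return (is DE parallel to BC?, |DE|^2 / |BC|^2)."""
    D = (A[0] + r * (B[0] - A[0]), A[1] + r * (B[1] - A[1]))
    E = (A[0] + r * (C[0] - A[0]), A[1] + r * (C[1] - A[1]))
    DE, BC = vec(D, E), vec(B, C)
    return cross(DE, BC) == 0, sq_len(DE) / sq_len(BC)

# Hypothetical triangle; any three non-collinear rational points would do.
A, B, C = (Fr(0), Fr(0)), (Fr(7), Fr(1)), (Fr(2), Fr(5))
for r in (Fr(1, 3), Fr(3, 5), Fr(5, 2)):
    parallel, ratio_sq = fts_check(A, B, C, r)
    print(r, parallel, ratio_sq == r ** 2)   # e.g. "1/3 True True"
```

The last value of $r$ puts $D$ beyond $B$ on the ray $R_{AB}$, a case that FTS also covers.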
In applications, it is sometimes more convenient to assume, instead of
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|}, \tag{5.1}$$
5.1. THE FUNDAMENTAL THEOREM OF SIMILARITY 257

the equivalent condition, valid when $D \neq B$ and $E \neq C$, that
$$\frac{|AD|}{|DB|} = \frac{|AE|}{|EC|}. \tag{5.2}$$

Because the proof of the equivalence of (5.1) with (5.2) is entirely elementary and
straightforward, we will leave it as an exercise (Exercise 1 on page 267).
We will give a proof of the special case of FTS where $r = \frac{1}{2}$ at the end of this
section (page 263), i.e., for the case that D and E are midpoints of AB and AC,
respectively. To this end, we first give an equivalent formulation of FTS that is
sometimes more convenient to use:

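Theorem G11 (FTS*). Let $\triangle ABC$ be given, and let $D$ be a point on the ray $R_{AB}$ not equal to $A$ or $B$. Let $\ell$ be the line parallel to $BC$ and passing through $D$. Then $\ell$ intersects the ray $R_{AC}$ at a point $E$, and
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} = \frac{|DE|}{|BC|}.$$

[Figure: $\triangle ABC$ with $D$ on $AB$ and the line through $D$ parallel to $BC$ meeting $AC$ at $E$.]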

Let us explain in greater detail what we mean when we say Theorems G10
and G11 are equivalent. It means that, if we assume the validity of (L1)–(L8)
in Chapter 4 and Theorems G1–G9, then Theorem G10 implies Theorem G11 and,
conversely, Theorem G11 implies Theorem G10. In other words, if you know The-
orem G10 is true, then you also know Theorem G11 is true and, conversely, if you
know Theorem G11 is true, then you also know Theorem G10 is true. These two
theorems are therefore interchangeable in the precise sense above. Leaving the details of the converse to an exercise (Exercise 9 on page 268), we will prove that Theorem G10 implies Theorem G11.

Proof of Theorem G11. Let the line passing through $D$ and parallel to $L_{BC}$ be $\ell$, as in the theorem. We will first prove that $\ell$ intersects not just the line $L_{AC}$, but the ray $R_{AC}$; it will be an indirect proof. Let $E$ be chosen on the ray $R_{AC}$ so that
$$|AE| = \frac{|AD|}{|AB|} \cdot |AC|.$$
$E$ could be between $A$ and $C$, or $C$ could be between $A$ and $E$, depending on whether $|AD|/|AB|$ is $< 1$ or $> 1$. This in turn depends on whether $D$ is between $A$ and $B$ or $B$ is between $A$ and $D$. Both cases are shown in the picture below.
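[Figure: two cases. Left: $D$ between $A$ and $B$, and $E$ between $A$ and $C$. Right: $B$ between $A$ and $D$, and $C$ between $A$ and $E$.]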
Then by the cross-multiplication algorithm on page 22 (applied to real numbers by appealing to FASM; see page 133), we have
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|}.$$
By FTS (Theorem G10), we have $L_{DE} \parallel L_{BC}$. Since $\ell \parallel L_{BC}$ by hypothesis, $\ell$ and $L_{DE}$ are two lines both passing through $D$ and parallel to $L_{BC}$. By the parallel postulate (page 165), the lines $\ell$ and $L_{DE}$ coincide, so that $\ell$ does intersect the ray $R_{AC}$ at $E$. Moreover, by FTS once again,
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} = \frac{|DE|}{|BC|}.$$
The proof of Theorem G11 is complete (when Theorem G10 is assumed).
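Besides the proof just given, the prediction of FTS* can be checked by computation. Here is a minimal sketch, again assuming coordinates and exact rational arithmetic; the points and helper names are hypothetical. It intersects the line $\ell$ through $D$ parallel to $BC$ with $L_{AC}$ by Cramer's rule. It then reports the parameters $t$ and $s$ with $E = A + t(C - A)$ and $E - D = s(C - B)$, so that $|AE|/|AC| = t$ and $|DE|/|BC| = |s|$. FTS* predicts $t = s = |AD|/|AB|$, and $t > 0$ means that $E$ lies on the ray $R_{AC}$.

```python
from fractions import Fraction as Fr

def solve_intersection(P, u, Q, v):
    """Return (s, t) with P + s*u == Q + t*v (Cramer's rule); u, v must not be parallel."""
    det = u[0] * v[1] - u[1] * v[0]
    dx, dy = Q[0] - P[0], Q[1] - P[1]
    s = (dx * v[1] - dy * v[0]) / det
    t = (dx * u[1] - dy * u[0]) / det
    return s, t

def fts_star(A, B, C, k):
    """D is the point of ray R_AB with |AD| = k * |AB|.  Intersect the line through D
    parallel to BC with L_AC and return (t, s), where E = A + t*(C - A), E - D = s*(C - B)."""
    D = (A[0] + k * (B[0] - A[0]), A[1] + k * (B[1] - A[1]))
    BC = (C[0] - B[0], C[1] - B[1])
    AC = (C[0] - A[0], C[1] - A[1])
    s, t = solve_intersection(D, BC, A, AC)
    return t, s

A, B, C = (Fr(1), Fr(2)), (Fr(9), Fr(-1)), (Fr(4), Fr(7))   # hypothetical triangle
for k in (Fr(1, 4), Fr(3, 5), Fr(7, 3)):
    t, s = fts_star(A, B, C, k)
    print(k, t, s)   # FTS* predicts t == s == k
```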

To a student, the proof of Theorem G11 (FTS*) is probably not as convincing as some direct experimental verifications of the theorem. For this reason, it is worthwhile to point out that the lined paper in ordinary notebooks provides fertile ground for experimentation related to FTS*, as the following Activity explains.

Activity. We will assume that the lines of lined paper are equidistant parallel lines, in the sense that the distances between adjacent parallel lines (see page 228 for the definition) are equal. Therefore on a given transversal $L_{AB}$, the segments intercepted on $L_{AB}$ by adjacent parallel lines will all be of the same length (see Exercise 11 on page 238). It follows that if a segment on $L_{AB}$ has its endpoints on two of the lines on a piece of lined paper (such as $AB$ below), then any one of the parallel lines in between will divide the segment into two parts. The lengths of these parts can be read off instantly by counting the number of parallel lines. To be explicit, consider the following situation:

[Figure: lined paper with $A$ on one line and $B$, $C$ on the line five spaces below it; $P$ and $Q$ lie on the line two spaces below $A$, and $D$ and $E$ on the line three spaces below $A$, with $P$, $D$ on $AB$ and $Q$, $E$ on $AC$.]

If the length of the segment on $AB$ trapped between two adjacent parallel lines is $s$, then $|AD| = 3s$ and $|AB| = 5s$. Because the lines $L_{BC}$ and $L_{DE}$ are parallel,
Theorem G11 predicts that
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} = \frac{|DE|}{|BC|}. \tag{5.3}$$
The first equality is not in doubt. If the length of the segment on $AC$ trapped between two adjacent parallel lines is $t$, then we have $|AE| = 3t$ and $|AC| = 5t$ for the same reason. Therefore,
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} = \frac{3}{5}.$$
What remains uncertain is the prediction in the second equality in (5.3):
$$\frac{|DE|}{|BC|} = \frac{3}{5}. \tag{5.4}$$
Of course Theorem G11 has already been proved (assuming FTS), so that, in theory, (5.4) has to be true. But here we are talking about the visceral feeling of conviction. Some can probably get it from the proof itself, but others may need something beyond the proof. This is where this Activity comes in: never mind the theory; direct measurements of $|DE|$ and $|BC|$ will actually confirm the validity of (5.4). This reinforces one’s faith in Theorem G11. Similar considerations apply to the segments $AP$ and $AQ$ in the picture above, so that
$$\frac{|AP|}{|AB|} = \frac{|AQ|}{|AC|} = \frac{2}{5}.$$
Theorem G11 now predicts that
$$\frac{|PQ|}{|BC|} = \frac{2}{5}. \tag{5.5}$$
Again, it should be a satisfying experience for students to verify the prediction (5.5) by directly measuring $|BC|$ and $|PQ|$. In a school classroom, such an experiment should be repeated for many different variations of this configuration. It should then provide an excellent opportunity for students to build up their intuition about Theorem G11, which, as we noted, is equivalent to FTS.
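For teachers who want to preview the Activity before class, the measurement can also be simulated. The sketch below is a hypothetical model, not part of the text. The ruled lines are taken to be the horizontal lines $y = 0, 1, \dots, 5$, with $A$ on the top line and $B$, $C$ on the bottom line, and exact fractions stand in for a ruler. Because $PQ$, $DE$, and $BC$ all lie on ruled lines, their lengths are just differences of $x$-coordinates.

```python
from fractions import Fraction as Fr

# Hypothetical lined paper: the ruled lines are y = 0, 1, ..., 5.
A = (Fr(2), Fr(5))   # A on the top line
B = (Fr(0), Fr(0))   # B and C on the bottom line, five spaces below A
C = (Fr(7), Fr(0))

def crossing(X, y):
    """Point where the segment from A to X crosses the ruled line at height y."""
    t = (A[1] - y) / (A[1] - X[1])   # fraction of the way from A to X
    return (A[0] + t * (X[0] - A[0]), Fr(y))

def ratio_to_BC(y):
    """|UV| / |BC|, where U and V are the crossings of AB and AC at height y."""
    U, V = crossing(B, y), crossing(C, y)
    return (V[0] - U[0]) / (C[0] - B[0])

print(ratio_to_BC(2))   # D and E, three spaces below A: prints 3/5, as in (5.4)
print(ratio_to_BC(3))   # P and Q, two spaces below A: prints 2/5, as in (5.5)
```

Moving $A$ or $C$ sideways along its line leaves both ratios unchanged. This is the point of repeating the experiment with many variations of the configuration.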

Two characterizations of a parallelogram

The proof of FTS for the special case of $r = \frac{1}{2}$ requires the ability to recognize
a parallelogram when we see one. This subsection takes a step in that direction.
First, a general observation.

Theorem G12. Let $O$ be a point on a line $L$, and let $\varrho$ be the rotation of $180^\circ$ around $O$. Then $\varrho$ maps each half-plane of $L$ to its opposite half-plane.

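Proof. Let the half-planes of $L$ be $H_+$ and $H_-$. The theorem says
$$\varrho(H_-) = H_+ \quad \text{and} \quad \varrho(H_+) = H_-.$$
It suffices to prove the first assertion, $\varrho(H_-) = H_+$. The second then follows by applying $\varrho$ to both sides, because $\varrho \circ \varrho$ is the identity. As usual, proving the first assertion means proving both $\varrho(H_-) \subset H_+$ and $H_+ \subset \varrho(H_-)$. Let us first prove $\varrho(H_-) \subset H_+$. So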
let $P$ be a point in $H_-$, and let $Q = \varrho(P)$. By the definition of a $180^\circ$ rotation, $P$, $O$, and $Q$ are collinear, and $O$, being the midpoint of the segment $PQ$, lies in $PQ$.

[Figure: the line $L$ through $O$, with $P$ in $H_-$ below $L$ and $Q = \varrho(P)$ in $H_+$ above it.]

Now the segment $PQ$ contains a point of $L$, namely, $O$, so $P$ and $Q$ are in opposite half-planes of $L$ (see (L4) on page 176). Since $P \in H_-$, we have $Q \in H_+$; i.e., $\varrho(P) \in H_+$, as claimed. Next we need to prove that $H_+ \subset \varrho(H_-)$. Thus given $Q \in H_+$, we must show that there is a point $P \in H_-$ so that $\varrho(P) = Q$. We simply let $P = \varrho(Q)$. Since obviously $\varrho \circ \varrho$ is the identity transformation, we apply $\varrho$ to both sides of the equation $P = \varrho(Q)$ and see that $\varrho(P) = (\varrho \circ \varrho)(Q) = Q$. Since $O$ lies in $PQ$ (for the same reason as before) and $O \in L$, we see that $P$ and $Q$ have to be in opposite half-planes of $L$. Thus $P \in H_-$, and the proof is complete.
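In coordinates (an assumption outside the axiomatic setting of the text), the rotation of $180^\circ$ around $O$ is $P \mapsto 2O - P$. The half-plane of a line through $O$ that contains a given point can be read off from the sign of a cross product. The sketch below, with hypothetical points and helper names, checks the two facts used in the proof of Theorem G12. First, $\varrho$ moves each point not on $L$ to the opposite side of $L$. Second, $\varrho \circ \varrho$ is the identity.

```python
from fractions import Fraction as Fr

def half_turn(O, P):
    """Rotation of 180 degrees around O, written in coordinates: P -> 2O - P."""
    return (2 * O[0] - P[0], 2 * O[1] - P[1])

def side(O, w, P):
    """+1 or -1 according to which half-plane of the line O + t*w contains P; 0 if P is on it."""
    c = w[0] * (P[1] - O[1]) - w[1] * (P[0] - O[0])
    return (c > 0) - (c < 0)

O, w = (Fr(1), Fr(2)), (Fr(3), Fr(-1))   # the line L through O with direction w
pts = [(Fr(0), Fr(5)), (Fr(4), Fr(-3)), (Fr(-2), Fr(1, 2))]
for P in pts:
    Q = half_turn(O, P)
    # The two signs come out opposite, and applying the half-turn twice returns P.
    print(side(O, w, P), side(O, w, Q), half_turn(O, Q) == P)
```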
The next two theorems give characterizations of a parallelogram that will prove
to be useful. Two segments AB and CD are said to bisect each other if they
intersect at a point which is the midpoint of both segments. With this understood,
the first theorem says that parallelograms are the quadrilaterals whose diagonals
bisect each other (recall that a diagonal of a polygon is a segment; see page 171).

Theorem G13. A quadrilateral is a parallelogram $\iff$ its diagonals bisect each other.

[Figure: parallelogram $ABCD$ whose diagonals $AC$ and $BD$ meet at $M$.]

Proof. We will give a proof similar to the one outlined in Exercise 4 on page 228.
It is well to point out that, at the outset, the two diagonal segments AC and BD
are not known to intersect each other (see the Pedagogical Comments after the
proof), much less bisect each other.
Let $M$ be the midpoint of the diagonal $AC$, and let $\varrho$ be the rotation of $180^\circ$ around $M$. In the proof of Theorem G4 (see (4.7) on page 227), we proved that $\varrho(B) = D$. Since $\varrho$ is a $180^\circ$ rotation, the points $B$, $M$, and $D$ are collinear. Since $\varrho$ is also an isometry and $\varrho(MB) = MD$, $M$ is also the midpoint of the diagonal $BD$. Thus $AC$ and $BD$ bisect each other.
Next, we look at the converse. Suppose a quadrilateral ABCD (the definition
of a quadrilateral is given on p. 171) has the property that its diagonals AC and
BD meet at M and M is the midpoint of both AC and BD. We will prove that
$ABCD$ is a parallelogram. As usual, let $\varrho$ be the rotation of $180^\circ$ around $M$. Then
$\varrho(B) = D$ and $\varrho(C) = A$. Thus $\varrho(L_{BC}) = L_{AD}$, for two reasons: by (L7)(i), $\varrho$ maps lines to lines (page 237), and there is only one line passing through $A$ and $D$ (by (L1) on page 165). By Theorem G1 (page 220), $L_{BC} \parallel L_{AD}$. In the same way, we can prove $L_{AB} \parallel L_{CD}$. This proves that $ABCD$ is a parallelogram, and the proof of Theorem G13 is complete.
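Both directions of Theorem G13 can be illustrated with coordinates. This is a sketch outside the text's axiomatic framework, and the points and helper names are hypothetical. The first check builds a parallelogram and compares the midpoints of its diagonals. The second starts from a point $M$ and sets $C = \varrho(A)$ and $D = \varrho(B)$ for the half-turn $\varrho$ around $M$, as in the proof of the converse; it then confirms that opposite sides come out parallel.

```python
from fractions import Fraction as Fr

def midpoint(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def half_turn(M, P):
    """Rotation of 180 degrees around M: P -> 2M - P."""
    return (2 * M[0] - P[0], 2 * M[1] - P[1])

def parallel(P, Q, R, S):
    """True if the direction vectors Q - P and S - R are parallel."""
    return (Q[0] - P[0]) * (S[1] - R[1]) - (Q[1] - P[1]) * (S[0] - R[0]) == 0

def is_parallelogram(A, B, C, D):
    return parallel(A, B, D, C) and parallel(A, D, B, C)

# Parallelogram => diagonals bisect each other.
A, B, C, D = (Fr(0), Fr(0)), (Fr(4), Fr(1)), (Fr(6), Fr(5)), (Fr(2), Fr(4))
print(is_parallelogram(A, B, C, D), midpoint(A, C) == midpoint(B, D))   # True True

# Diagonals bisect each other at M => parallelogram.
M, A2, B2 = (Fr(3), Fr(1)), (Fr(-1), Fr(2)), (Fr(1), Fr(-2))
C2, D2 = half_turn(M, A2), half_turn(M, B2)
print(is_parallelogram(A2, B2, C2, D2))   # True
```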

Pedagogical Comments. The preceding proof shows indirectly that the di-
agonals AC and BD of a parallelogram meet at a point M . The first comment is
that a direct proof of this fact can be given using the crossbar axiom (page 250),
but it is tedious. A second comment is that few high school students will ever
conceive of the need for such a proof. In a beginning geometry class, one should
probably take something this obvious for granted. Once the formal proof is over,
however, a teacher may want to point out that this visually obvious fact actually
requires a proof because a quadrilateral ABCD could look like this:
[Figure: a quadrilateral $ABCD$ in which $B$ and $D$ lie on the same side of the line $L_{AC}$.]

and the two diagonal segments AC and BD do not intersect. The question then be-
comes why this cannot happen to the diagonals of a parallelogram: which property
exactly about a parallelogram makes its diagonals intersect? If we can get students
to be curious about the answer to this question, then (and perhaps only then) would
they find such a proof to be meaningful. End of Pedagogical Comments.
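One way to make the question in these Pedagogical Comments concrete is to test, with coordinates, whether two segments cross. The sketch below uses the standard orientation (cross-product sign) test. This is an analytic-geometry device, not part of the text's development, and the sample points are hypothetical. For a parallelogram the diagonals cross. For a quadrilateral like the one pictured above, with $B$ and $D$ on the same side of $L_{AC}$, they do not.

```python
def orient(P, Q, R):
    """Sign of the cross product (Q - P) x (R - P): +1, -1, or 0 if collinear."""
    c = (Q[0] - P[0]) * (R[1] - P[1]) - (Q[1] - P[1]) * (R[0] - P[0])
    return (c > 0) - (c < 0)

def segments_cross(P, Q, R, S):
    """True if segments PQ and RS cross at a point interior to both.
    (Touching at endpoints and collinear overlaps are deliberately not counted.)"""
    return (orient(P, Q, R) * orient(P, Q, S) < 0
            and orient(R, S, P) * orient(R, S, Q) < 0)

# A parallelogram ABCD with A=(0,0), B=(4,1), C=(6,5), D=(2,4): AC and BD cross.
print(segments_cross((0, 0), (6, 5), (4, 1), (2, 4)))   # True
# A quadrilateral ABCD with A=(0,0), B=(2,1), C=(6,0), D=(3,5): they do not.
print(segments_cross((0, 0), (6, 0), (2, 1), (3, 5)))   # False
```

Each half of the test is the half-plane criterion of (L4): the segment $RS$ meets the line $L_{PQ}$ when $R$ and $S$ lie on opposite sides of it.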

Theorem G14. A quadrilateral is a parallelogram $\iff$ it has one pair of sides which are equal and parallel.

Proof. The fact that a parallelogram has a pair of sides which are equal and parallel is implied by Theorem G4, page 226. We prove the converse. Let $ABCD$ be a quadrilateral so that $|AD| = |BC|$ and $AD \parallel BC$. We have to prove that $ABCD$ is a parallelogram. It suffices to prove that $AB \parallel CD$. Let $\varrho$ be the rotation of 180 degrees around the midpoint $M$ of the diagonal $AC$.

[Figure: quadrilateral $ABCD$ with $AD \parallel BC$ and the midpoint $M$ of the diagonal $AC$.]

As usual, $\varrho(A) = C$ by the definition of $\varrho$, and we also have $\varrho(L_{AD}) \parallel L_{AD}$, by Theorem G1, page 220. Therefore $\varrho(L_{AD})$ is a line passing through $C$ and parallel to $L_{AD}$ itself. Since $L_{BC}$ is also a line passing through $C$ and parallel to $L_{AD}$, the
parallel postulate (page 165) implies that $\varrho(L_{AD}) = L_{BC}$. In particular, $\varrho(D)$ lies in $L_{BC}$. Denote $\varrho(D)$ by $D'$, so $D' \in L_{BC}$. We are going to show that $D' = B$. To this end, observe that on the line $L_{BC}$, $D'$ lies either in the ray $R_{CB}$ or in the opposite ray, so that $B * C * D'$, as shown:

[Figure: the line $L_{BC}$ with $C = \varrho(A)$ between $B$ and $D'$.]

We want to show that the second alternative is impossible. Indeed, suppose $D'$ lies on the opposite ray of $R_{CB}$, as in the drawing above. Then the segment $BD'$ contains the point $C$ of $L_{AC}$, and therefore $B$ and $D'$ lie in opposite half-planes of $L_{AC}$. Since $B$ and $D$ also lie in opposite half-planes of $L_{AC}$, $D$ and $D'$ must lie in the same half-plane of $L_{AC}$. In other words, $D$ and $\varrho(D)$ lie in the same half-plane of $L_{AC}$. This contradicts Theorem G12 on page 259. Therefore $D'$ lies in the ray $R_{CB}$.
Since $\varrho$ is an isometry, $|\varrho(AD)| = |AD|$. Since $\varrho(AD) = CD'$, we have $|CD'| = |AD|$. But $|AD| = |CB|$ by hypothesis, so $|CD'| = |CB|$. Hence $B$ and $D'$, being two points at the same distance from $C$ and lying on the same ray $R_{CB}$, must coincide (see (L5)(ii) on page 184). Thus $D' = B$; i.e., $\varrho(D) = B$. Coupled with the fact that $\varrho(C) = A$, this shows $\varrho(CD) = AB$, and therefore, $\varrho(L_{CD}) = L_{AB}$. But according to Theorem G1, $\varrho(L_{CD}) \parallel L_{CD}$. Hence $L_{AB} \parallel L_{CD}$, as desired.
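The key step of the proof, $\varrho(D) = B$, can be watched in coordinates. In this sketch the points are hypothetical, and coordinates are not used in the text. $D$ is placed so that $AD$ is equal and parallel to $BC$ and points in the same direction, by setting $D = A + (C - B)$. The half-turn $\varrho$ around the midpoint $M$ of $AC$ then sends $D$ to $B$, and $AB \parallel CD$ follows.

```python
from fractions import Fraction as Fr

def half_turn(M, P):
    """Rotation of 180 degrees around M: P -> 2M - P."""
    return (2 * M[0] - P[0], 2 * M[1] - P[1])

def vec(P, Q):
    return (Q[0] - P[0], Q[1] - P[1])

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

A, B, C = (Fr(0), Fr(0)), (Fr(1), Fr(3)), (Fr(6), Fr(4))
D = (A[0] + C[0] - B[0], A[1] + C[1] - B[1])   # vector AD equals vector BC
M = ((A[0] + C[0]) / 2, (A[1] + C[1]) / 2)     # midpoint of the diagonal AC

print(vec(A, D) == vec(B, C))               # AD and BC are equal and parallel
print(half_turn(M, D) == B)                 # rho(D) = B
print(cross(vec(A, B), vec(C, D)) == 0)     # hence AB is parallel to CD
```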

Remark. In the preceding proof, the fact that $B$ and $D$ lie in opposite half-planes of the diagonal line $L_{AC}$ was taken for granted, and this fact allowed us to conclude that $B = \varrho(D)$. But as we have seen from the picture on page 261, the vertices $B$ and $D$ of a quadrilateral $ABCD$ may very well lie on the same side of the diagonal line $L_{AC}$. In that case, $\varrho(D)$ and $B$ would be in opposite half-planes, and the preceding proof of Theorem G14 would fall apart. Therefore this proof of Theorem G14 tacitly assumes that the following assertion holds:

(♣) If two sides $AD$ and $BC$ of a quadrilateral $ABCD$ are parallel, then the vertices $B$ and $D$ lie on opposite sides of the diagonal line $L_{AC}$.

We now supply a proof of (♣). Suppose $B$ and $D$ lie on the same side of $L_{AC}$, as shown. We will deduce a contradiction by showing that this implies the segments $AB$ and $CD$ must intersect.

[Figure: quadrilateral $ABCD$ with $B$ and $D$ on the same side of $L_{AC}$.]

To this end, we will prove the following two steps.

Step I. The ray $R_{AB}$ intersects the segment $CD$.

Step II. $A$ and $B$ lie on opposite sides of the line $L_{CD}$.
To prove Step I, we claim that $B$ lies in the convex angle $\angle DAC$ (see page 182 for the definition of a convex angle). This means we have to show both of the following: (i) $B$ and $D$ lie on the same side of $L_{AC}$, and (ii) $B$ and $C$ lie on the same side of $L_{AD}$. Now (i) is true because this is our working hypothesis. To prove (ii), observe that since $L_{AD} \parallel L_{BC}$, the segment $BC$ cannot intersect $L_{AD}$. By assumption (L4)(ii) on page 176, $B$ and $C$ lie on the same side of $L_{AD}$, as desired. Thus $B$ lies in $\angle DAC$. The crossbar axiom (page 250) now implies that $R_{AB}$ intersects $CD$, and Step I is proved.
Next, to prove Step II, assume that $A$ and $B$ lie on the same side of $L_{CD}$; we will show that this is impossible. We claim that $B$ lies in the convex angle $\angle DCA$. As usual, this means we have to prove both of the following: (a) $B$ and $D$ lie on the same side of $L_{AC}$, and (b) $B$ and $A$ lie on the same side of $L_{DC}$. There is no need to prove (a) because it is our overall working hypothesis (see (i) above), and (b) is true because we are assuming $A$ and $B$ lie on the same side of $L_{CD}$. Thus the claim is true that $B$ lies in $\angle DCA$. By the crossbar axiom, $R_{CB}$ intersects $AD$, and this contradicts the fact that $L_{BC} \parallel L_{AD}$. The proof of Step II is complete.
We can now deduce the contradiction we are after, namely, that if $B$ and $D$ lie on the same side of $L_{AC}$, then the segments $AB$ and $CD$ must intersect. Indeed, Step I shows that the line $L_{AB}$ intersects the segment $CD$ at a point $X$, and Step II shows that the line $L_{CD}$ intersects the segment $AB$ at a point $Y$. Now $X$ and $Y$ both lie on $L_{AB}$ and $L_{CD}$, and since two distinct lines intersect at exactly one point, $X = Y$. But $X$ lies on $CD$ and $Y$ lies on $AB$; therefore the segments $CD$ and $AB$ intersect at $X$ ($= Y$). This contradicts the fact that, $ABCD$ being a quadrilateral, only its adjacent sides can intersect (see the definition of a polygon on page 171). So $B$ and $D$ lie on opposite sides of $L_{AC}$ after all, and we have proved (♣).
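Assertion (♣) can also be spot-checked over a family of trapezoids. The setup below is a coordinate sketch with hypothetical parameters: $A = (0, 0)$, $D = (a, 0)$, $B = (b, h)$, $C = (b + c, h)$ with $a, c, h > 0$, so that $AD \parallel BC$. The condition $c > 0$ says that $AD$ and $BC$ point the same way. With $a > 0$, a negative $c$ would instead make the sides $AB$ and $CD$ cross, so $ABCD$ would not be a quadrilateral. The check confirms that $B$ and $D$ always fall on opposite sides of $L_{AC}$.

```python
from itertools import product

def orient(P, Q, R):
    """Sign of (Q - P) x (R - P): which side of the line PQ the point R is on."""
    c = (Q[0] - P[0]) * (R[1] - P[1]) - (Q[1] - P[1]) * (R[0] - P[0])
    return (c > 0) - (c < 0)

def opposite_sides_of_AC(a, b, c, h):
    """Trapezoid A=(0,0), B=(b,h), C=(b+c,h), D=(a,0); True if B and D are
    on opposite sides of the line L_AC."""
    A, B, C, D = (0, 0), (b, h), (b + c, h), (a, 0)
    return orient(A, C, B) * orient(A, C, D) < 0

print(all(opposite_sides_of_AC(a, b, c, h)
          for a, b, c, h in product(range(1, 6), range(-5, 6), range(1, 6), range(1, 4))))
# prints True
```

Algebraically, the two orientations are the signs of $ch$ and $-ah$. This is the coordinate counterpart of the synthetic argument above.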

Pedagogical Comments. The preceding proof of (♣)—which one must remember is part of the proof of the simple and intuitive Theorem G14—together with the Pedagogical Comments on page 261, reveals one of the less attractive features of plane geometry. There are many intuitively obvious geometric details that are quite subtle and whose proofs are tedious. For the case at hand, it is most unlikely that an average high school student would be interested in seeing a proof of something as obvious as $B$ and $D$ being in opposite half-planes of $L_{AC}$ when $AD \parallel BC$. A reasonable way to proceed would be for a teacher to announce at the outset that the pictorially plausible fact that $B$ and $D$ lie in opposite half-planes of the diagonal $AC$ will be assumed without proof. Then one can proceed as in the above proof of Theorem G14, concentrating on the idea of using the rotation $\varrho$ to complete the proof by contradiction. However, in order to win your students’ trust, you will have to be ready to produce at least an outline of the proof of (♣) when some of them ask for it. This is part of the reason that, for your education as a teacher, we will continue to supply complete proofs. End of Pedagogical Comments.
Proof of FTS when $r = \frac{1}{2}$

We can now prove the special case of FTS (page 256) when $r = \frac{1}{2}$:

Theorem G15. Let $\triangle ABC$ be given, and let $D$ and $E$ be midpoints of $AB$ and $AC$, respectively. Then $DE \parallel BC$ and $|BC| = 2|DE|$.
[Figure: $\triangle ABC$ with $D$ and $E$ the midpoints of $AB$ and $AC$.]

Analysis. Let us see how we can prove such a theorem. The situation is
this: we know (L1)–(L8) and Theorems G1–G14, and we are confronted with the
statement of Theorem G15. The question is how we can prove (among other things)
|BC| = 2|DE|. This is awkward, because we know how to prove two segments have
the same length—find a congruence that carries one segment to the other—but not
one segment being equal to twice another. One way out of this predicament is to
look for a segment that has twice the length of DE and then we can try to prove
that this segment has the same length as BC. In this light, extending the segment
DE to F so that DF has twice the length of DE (i.e., |DF | = 2|DE|) would be a
very natural move, as shown:
[Figure: $\triangle ABC$ with the segment $DE$ extended to $F$ so that $|DF| = 2|DE|$.]
It would be equally natural at this point to connect C to F by a line segment
and, once this is done, we see that if we can prove the quadrilateral DBCF is a
parallelogram, then Theorem G4 on page 226 would immediately yield the desired
conclusion that |BC| = |DF |. So once we get to the "augmented figure" with the
additional line segments EF and CF added to the original figure, we see a clear
path to our goal, the proof of the theorem.
The line segments EF and CF that are added to the original picture of ABC
together with the segment DE are called auxiliary lines in the school education
literature. TSM makes a big deal out of "adding auxiliary lines" as a kind of magical
tool for learning how to prove theorems, but there is in fact nothing "magical" about
these "add-ons". Think of a theorem as an edifice; then the analog of proving a
theorem is finding ways to build a given edifice. Of course when one shows off
an edifice, one first takes down all the scaffolding and removes all traces of the
construction process. If we are serious about building the edifice, however, we must
first mentally remove the pristine picture of the edifice and put back the scaffolding
and begin thinking about the messy construction process itself. Likewise, when a
textbook presents a theorem, the textbook will only give the finished product—the
geometric figure that goes with the theorem—without including the messy details
of the thinking process that may have gone into the proof of the theorem. If we
want to learn to prove the theorem ourselves, we cannot be limited by the pristine
figure attached to the theorem but must put back some of the lines or circles (the
"scaffolding") that are integral to the proof itself. So the "add-ons", such as the
segments EF and CF , are neither random nor magical, but are things that come
up naturally when we try to look for ways to better understand the "construction
process".
In the above picture, we have chosen to extend $DE$ along the ray $R_{DE}$, but the proof does not change if we extend it instead along the opposite ray $R_{ED}$. (See Exercise 5 on page 267.)
Let us continue with our attempt at arriving at a proof of Theorem G15.
Referring specifically to the preceding picture now, we wish to prove |DF | = |BC|.
But how do we prove DBCF is a parallelogram? At this point, we realize that
our repertoire in this regard is extremely limited: we only have Theorems G13 and
G14 for this purpose. The latter prompts us to try to show that $BD \parallel CF$ and $|BD| = |CF|$. By hypothesis, $|AD| = |DB|$, so our focus shifts to proving $AD \parallel CF$ and $|AD| = |CF|$. Such would be the case if $DCFA$ is a parallelogram. Since $AC$
and DF bisect each other, Theorem G13 gives us exactly what we need. Now, the
whole proof comes together.
One observation about the above analysis is worth mentioning. We see that the
reasoning process is built on a solid knowledge base: students who do not have
Theorems G1, G4, G13, and G14 at their fingertips will be handicapped in trying
to prove this theorem. (We are not saying that memorizing Theorems G1, G4, G13,
and G14 will be all it takes to prove Theorem G15, but, rather, that having easy
access to these theorems is a sine qua non for the task.) What we have is therefore
a simple illustration of the fact that doing mathematics requires a solid mem-
ory bank of basic facts. Do not listen to anyone telling you that "conceptual
understanding"—but no memorization—is all it takes to do mathematics.

Proof of Theorem G15. On the ray $R_{DE}$, we take a point $F$ so that $|DE| = |EF|$. Join $CF$. Now recall that $|AE| = |EC|$ by hypothesis. The diagonals $AC$ and $DF$ of $ADCF$ therefore bisect each other at $E$, so $ADCF$ is a parallelogram by Theorem G13. We see that $CF \parallel AD$, or in other words, $CF \parallel BD$. Moreover, $|CF| = |DA|$ by Theorem G4, and hence $|CF| = |BD|$ because $|BD| = |DA|$ by hypothesis.

[Figure: $\triangle ABC$ with $F$ on the ray $R_{DE}$ such that $|DE| = |EF|$, and the segment $CF$.]

Theorem G14 now implies that the quadrilateral $DBCF$, having a pair of sides which are equal and parallel, is a parallelogram. Thus $DF \parallel BC$, which is the same as $DE \parallel BC$. Furthermore, $|BC| = |DF|$ (Theorem G4 on page 226), and since $|DE| = |EF|$, we have $|BC| = 2|DE|$. The proof is complete.
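Each step of the proof can be traced with coordinates. The sketch below uses a hypothetical triangle and exact fractions, and it uses coordinates only for illustration. It constructs $F$ on the ray $R_{DE}$ with $|EF| = |DE|$ and checks the three facts the proof relies on.

```python
from fractions import Fraction as Fr

def mid(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def vec(P, Q):
    return (Q[0] - P[0], Q[1] - P[1])

A, B, C = (Fr(1), Fr(6)), (Fr(-2), Fr(0)), (Fr(7), Fr(1))
D, E = mid(A, B), mid(A, C)
F = (2 * E[0] - D[0], 2 * E[1] - D[1])    # F on ray R_DE with |EF| = |DE|

print(mid(A, C) == mid(D, F))             # diagonals of ADCF bisect each other
print(vec(B, D) == vec(C, F))             # BD and CF are equal and parallel
twice_DE = tuple(2 * x for x in vec(D, E))
print(twice_DE == vec(B, C))              # DE is parallel to BC and |BC| = 2|DE|
```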

Theorem G15 has a surprising consequence: if $ABCD$ is any quadrilateral, then the quadrilateral obtained by joining the midpoints of all four pairs of adjacent sides in $ABCD$ is always a parallelogram. (See Exercise 6 on page 267.)
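A quick coordinate experiment shows the midpoint parallelogram appearing every time. It is a sketch with hypothetical quadrilaterals, including a nonconvex one, and it illustrates Exercise 6 without replacing its proof. For non-collinear points, $PQRS$ is a parallelogram exactly when the vectors $\overrightarrow{PQ}$ and $\overrightarrow{SR}$ are equal.

```python
from fractions import Fraction as Fr

def mid(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def vec(P, Q):
    return (Q[0] - P[0], Q[1] - P[1])

def midpoint_quadrilateral(A, B, C, D):
    """Midpoints of AB, BC, CD, DA, in order."""
    return mid(A, B), mid(B, C), mid(C, D), mid(D, A)

def is_parallelogram(P, Q, R, S):
    """Vector PQ equals vector SR (assumes P, Q, R are not collinear)."""
    return vec(P, Q) == vec(S, R)

convex = [(Fr(0), Fr(0)), (Fr(5), Fr(-1)), (Fr(7), Fr(4)), (Fr(1), Fr(6))]
dart = [(Fr(0), Fr(0)), (Fr(2), Fr(1)), (Fr(6), Fr(0)), (Fr(3), Fr(5))]   # nonconvex
for quad in (convex, dart):
    print(is_parallelogram(*midpoint_quadrilateral(*quad)))   # True each time
```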
Because of the importance of Theorem G15 in our work, we will give it a second
proof using translations. The strategy is to first prove the following Theorem G15*
directly and then use it to prove Theorem G15.

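Theorem G15*. Let $\triangle ABC$ be given and let $D$ be the midpoint of $AB$. Suppose a line parallel to $BC$ passing through $D$ intersects $AC$ at $E$. Then $E$ is the midpoint of $AC$ and $2|DE| = |BC|$.

[Figure: $\triangle ABC$ with $D$ the midpoint of $AB$, $DE \parallel BC$, and a point $F$ on $BC$ with $DF \parallel AC$.]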
Proof. Observe first of all that the pictorially obvious fact is correct, namely, that
the point E lies on the segment AC; this is because of Lemma 4.8 on page 178.
−−→
Now let T denote the translation along the vector AD. Because |AD| = |DB| by
hypothesis, the definition of T implies that T (D) = B. Since T maps any line  not
equal to or parallel to LAD to a line parallel to  itself (Theorem G5 on page 236),
T (LDE ) is a line passing through B and parallel to LDE . Since LBC  LDE by
hypothesis, the parallel postulate implies that T (LDE ) = LBC so that T (E) is a
point on line LBC . Let T (E) = F . Note that F is a point on the segment BC
on account of, once again, Lemma 4.8 on page 178 (because T (E) = F implies
LEF  LAB , by Lemma 4.21 on page 234). In any case, we have T (DE) = BF ,
and since T is an isometry (page 236), we get

(5.6) |DE| = |BF |.

Next consider T (LAC ). Because T (A) = D and T (E) = F , it follows that T (AE) =
DF and therefore T (LAC ) = LDF . Using Theorem G5 once more and the fact
that a translation is an isometry, we have

DF  AC and |AE| = |DF |.

Since DE  BC by hypothesis, DF CE is a parallelogram and therefore,

(5.7) |DE| = |F C| and |DF | = |EC|,

by Theorem G4. Coupled with equation (5.6), the first equality of (5.7) implies
|BC| = |BF| + |FC| = |DE| + |DE| = 2|DE|. Finally, the equality |AE| = |DF|, together
with the second equality of (5.7), implies |AE| = |EC|, so that E is the midpoint of AC.
The proof of Theorem G15* is complete.

Proof of Theorem G15 using Theorem G15*. Using the notation and picture
of Theorem G15, we draw a line L through D parallel to BC. By Theorem G15*,
L passes through the midpoint E of AC and therefore DE ∥ BC. Since Theorem
G15* also says 2|DE| = |BC|, we are done.
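The synthetic argument above uses no coordinates (see the warning on page 207 about coordinate systems), but a quick numerical experiment can make the statement of Theorem G15 concrete. The Python sketch below is our own illustration, not part of the text: it places a triangle in a coordinate plane, takes the midpoints D of AB and E of AC, and checks that DE is parallel to BC and that 2|DE| = |BC|. The function names and the tolerance are hypothetical choices.

```python
import math

def midpoint(p, q):
    """Midpoint of the segment pq; points are (x, y) tuples."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def cross(u, v):
    """2D cross product of vectors u and v; it is 0 exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def check_midpoint_theorem(a, b, c, tol=1e-9):
    """For D = midpoint of AB and E = midpoint of AC, test DE ∥ BC and 2|DE| = |BC|."""
    d, e = midpoint(a, b), midpoint(a, c)
    de = (e[0] - d[0], e[1] - d[1])
    bc = (c[0] - b[0], c[1] - b[1])
    parallel = abs(cross(de, bc)) < tol
    half_length = math.isclose(2 * math.hypot(*de), math.hypot(*bc))
    return parallel, half_length

print(check_midpoint_theorem((0.0, 0.0), (4.0, 1.0), (1.5, 3.0)))  # (True, True)
```

Changing the coordinates of the triangle should never make either test fail, up to rounding; of course, such a check is evidence, not a proof.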
5.1. THE FUNDAMENTAL THEOREM OF SIMILARITY 267

Exercises 5.1.
(1) Let D and E be points on sides AB and AC, respectively, of △ABC, so
that D ≠ A, B. Make use of |AB| = |AD| + |DB| and |AC| = |AE| + |EC|
to prove
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} \iff \frac{|AD|}{|DB|} = \frac{|AE|}{|EC|} \iff \frac{|DB|}{|AB|} = \frac{|EC|}{|AC|}.$$

(2) In △ABC, let D, F ∈ AB and E, G ∈ AC, as shown:
[Figure: △ABC with D, F on AB and E, G on AC, in the orders A, D, F, B and A, E, G, C.]
Suppose DE ∥ BC and FG ∥ BC. Prove: (a) If |AD| = |FB|, then
|AE| = |GC|. (b) More generally, $\frac{|AD|}{|FB|} = \frac{|AE|}{|GC|}$.

(3) Assume △ABC is such that |AB| = |AC|. Let the angle bisector of ∠A meet
BC at F. Prove that AF is the perpendicular bisector of BC.
(4) Let D, E, F be the midpoints of sides AB, AC, and BC, respectively,
of △ABC. Prove that the four triangles △ADE, △DBF, △DEF, △EFC are
congruent.
[Figure: △ABC with the midpoints D, E, F of AB, AC, BC joined to form four smaller triangles.]
(5) Give a proof of Theorem G15 by following the proof in the text, but this
time, extend DE along the ray RED (rather than along the opposite ray
RDE ) to a point F , so that |F D| = |DE|.
(6) Let ABCD be any quadrilateral. Prove that the quadrilateral obtained by
joining the midpoints of adjacent sides of ABCD is always a parallelogram.
(7) In △ABC, let D be a point on AB. Let the line passing through D and
parallel to BC intersect AC at F , and let the line passing through D and
parallel to AC intersect BC at E. Prove that if |DF |/|BC| = |DE|/|AC|,
then D is the midpoint of AB.

(8) Use the idea in the proof of Theorem G15, but do not assume FTS, to
prove that if in triangle ABC, D and E are points on AB and AC,
respectively, so that |AB| = 3|AD| and |AC| = 3|AE|, then DE ∥ BC
and |BC| = 3|DE|.
(9) Prove that FTS* (Theorem G11) implies FTS (Theorem G10). More pre-
cisely, this means: assume everything we have proved up to and including
Theorem G9 plus Theorem G11, and prove Theorem G10.
(10) Let a segment AC lie in a half-plane of a given line ℓ, and let B be the
midpoint of AC. Let LAD, LBE, and LCF be three parallel lines which
meet ℓ at D, E, F, respectively. Prove that 2|BE| = |AD| + |CF|.

5.2. Dilation
We will give the definition and prove the basic properties of dilation, the first
transformation of the plane worthy of our serious attention that is not an isometry.
It is also the new ingredient we need to define similarity. It will be clear from the
discussion in this section why the FTS is the fundamental theorem of similarity.
Definition of a dilation (p. 268)
Basic properties of dilations (p. 269)
Effects of dilations on lengths and degrees (p. 275)

Definition of a dilation

We have been considering isometries almost exclusively thus far. Now we have
to look seriously into an important class of transformations that are not isometries.

Definition. A transformation D of the plane is a dilation with center O
and scale factor r (r > 0) if
(1) D(O) = O.
(2) If P ≠ O, the point D(P), to be denoted by P′, is the point
on the ray ROP so that |OP′| = r|OP|.
[Figure: the points O, P, and P′ on the ray ROP, with |OP′| = r|OP|.]
Remark. We call attention to the fact that this definition of a dilation requires
the scale factor to be positive. Some authors allow the scale factor to be any real
number, so that a negative scale factor sends each point P ≠ O to the ray opposite
to ROP. There are pros and cons to either convention.

Observe that if r = 1 in this definition, then D is the identity map and there
is nothing to discuss. In the following, we will tacitly assume that r ≠ 1.
The definition of a dilation is starkly simple: a dilation with center at O maps
each point by "pushing out" or "pulling in" the point along the ray from O to that
point, depending on whether the scale factor r is bigger than 1 or smaller than 1,
respectively. Roughly, it is a kind of "central projection from the point O". In
particular, each ray issuing from O is mapped to the same ray. (Caution: All this
says is that the ray is mapped onto itself, but each point on the ray will in general
be mapped to another point on the same ray.) Here is an example of how a dilation
with r = 2 maps four different points (for any point P, we let the corresponding
letter with a prime, P′, denote the image D(P) of P):
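[Figure: a dilation with center O and scale factor 2 applied to four points P, Q, U, V; each image P′, Q′, U′, V′ lies on the ray from O through the original point, twice as far from O.]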
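If we assume, only for this illustration, that the plane carries coordinates (the definition above does not use them), the defining condition can be written as P′ = O + r(P − O): since r > 0, this point lies on the ray ROP, and its distance from O is r|OP|. The hypothetical Python helper below implements that formula and applies it with r = 2 to four sample points of our own choosing, in the spirit of the picture above.

```python
import math

def dilate(p, center, r):
    """Image of p under the dilation with the given center and scale factor r > 0.

    Points are (x, y) tuples; the formula is P' = O + r(P - O).
    """
    if r <= 0:
        raise ValueError("the scale factor of a dilation must be positive")
    ox, oy = center
    return (ox + r * (p[0] - ox), oy + r * (p[1] - oy))

O = (0.0, 0.0)
points = {"P": (1.0, 0.0), "Q": (1.0, 1.5), "U": (-0.5, 2.0), "V": (0.5, -1.0)}
images = {name + "'": dilate(p, O, 2) for name, p in points.items()}

for name, p in points.items():
    image = images[name + "'"]
    # Each image is twice as far from O as the original point (ratio printed last).
    print(name, p, "->", image, math.dist(O, image) / math.dist(O, p))
```

The center is fixed by the same formula (take P = O), so condition (1) of the definition needs no separate case in coordinates.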

Basic properties of dilations

A fundamental property of dilations, one that makes possible the simple draw-
ings of the dilation of rectilinear figures (i.e., figures composed of line segments),
is given in the following theorem. It will be clear from its proof and other proofs
related to dilation that the FTS and the parallel postulate lie at the heart of the
matter.

Theorem G16. Dilations map segments to segments. More precisely, a dilation
D maps a segment PQ to the segment joining D(P) to D(Q). Moreover, if the line
LPQ does not pass through the center of the dilation D, then the line LPQ is parallel
to the line containing D(P) and D(Q).

Proof. Let D be a dilation with center O and scale factor r. If LPQ passes through
O, then either P and Q lie on the same side of O or they lie on opposite sides of
O. In either case, the fact that D maps PQ to the segment in LPQ from D(P)
to D(Q) follows immediately from the definition of a dilation. We may therefore
assume that LPQ does not pass through O. Let P′ = D(P) and Q′ = D(Q). We
will show that D(PQ) is the segment P′Q′ joining P′ and Q′.
[Figure: segment PQ and its image P′Q′ under a dilation with center O; the rays from O pass through P and P′ and through Q and Q′.]
As usual, there are two steps.
Step I. D(PQ) ⊂ P′Q′.
Step II. P′Q′ ⊂ D(PQ).

To prove Step I, let LPQ be ℓ and let LP′Q′ be ℓ′. We will first show D(ℓ) ⊂ ℓ′.
Let U be any point of ℓ; we have to show that D(U) is on ℓ′. Let U′ = D(U);
then by the definition of a dilation, U′ lies on the ray ROU. Consider △OP′U′.
Because D maps P and U to P′ and U′, respectively,
$$\frac{|OP'|}{|OP|} = \frac{|OU'|}{|OU|} = r.$$
By FTS, LP′U′ ∥ LPU; i.e., LP′U′ ∥ ℓ. If we apply the same reasoning to △OP′Q′,
then the fact that
$$\frac{|OP'|}{|OP|} = \frac{|OQ'|}{|OQ|} = r$$
implies (by FTS) that LP′Q′ ∥ LPQ; i.e., ℓ′ ∥ ℓ. Thus both ℓ′ and LP′U′ are
lines passing through P′ and parallel to ℓ. By the parallel postulate (page 165),
LP′U′ = ℓ′, and, in particular, D(U) (= U′) lies on ℓ′, thereby proving D(ℓ) ⊂ ℓ′.
It remains to prove that D(PQ) ⊂ P′Q′; i.e., if U ∈ PQ, then we must prove
U′ ∈ P′Q′. Since U ∈ PQ, clearly U ∈ ∠POQ. By the crossbar axiom (page 250),
the ray ROU must intersect the segment P′Q′ at some point; let us say V. But U′
is the point of intersection of LOU and LP′Q′ (= ℓ′), so V coincides with U′ because
the two distinct lines LOU and ℓ′ can intersect in only one point. Since V ∈ P′Q′,
we have U′ ∈ P′Q′, and the proof of Step I is complete.
We pause to observe that we have also proved in the process that, assuming ℓ
does not contain O, ℓ ∥ ℓ′; equivalently, if the line LPQ does not contain O,
it is parallel to the line containing D(P) and D(Q) (as claimed in the theorem).
Next, we prove Step II; i.e., we show that P′Q′ ⊂ D(PQ). Let U′ ∈ P′Q′;
we have to show that there exists a point U ∈ PQ so that U′ = D(U). Because of
the crossbar axiom, the ray ROU′ intersects PQ at a point, and we claim that if we
denote this point by U, then in fact D(U) = U′. To this end, recall that ℓ (= LPQ)
is parallel to ℓ′ (= LP′Q′), by FTS. We now apply FTS* (Theorem G11 on page
257) to △OP′U′ to obtain
$$\frac{|OU'|}{|OU|} = \frac{|OP'|}{|OP|} = r.$$
Therefore |OU′| = r|OU|. Since U′ ∈ ROU, by the definition of the dilation D, we
get U′ = D(U), and Step II is proved.
As mentioned above, Steps I and II prove that D(PQ) = P′Q′, and the proof
of Theorem G16 is complete.
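Theorem G16 can also be spot-checked numerically. The Python sketch below is only an illustration under our own assumptions (coordinates, the helper names, and a finite sample of points, so it is evidence rather than proof). It writes a point of PQ as (1 − t)P + tQ with 0 ≤ t ≤ 1, dilates it, and confirms that the image is the point of P′Q′ with the same parameter t; it then checks that PQ and P′Q′ have parallel directions.

```python
import math

def dilate(p, center, r):
    """P' = O + r(P - O) for the dilation with center O and scale factor r > 0."""
    ox, oy = center
    return (ox + r * (p[0] - ox), oy + r * (p[1] - oy))

def point_on_segment(p, q, t):
    """The point (1 - t)P + tQ, which lies on segment PQ when 0 <= t <= 1."""
    return ((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])

def check_segment_image(p, q, center, r, samples=11, tol=1e-9):
    """Check on sample points that D(PQ) matches P'Q' and that PQ is parallel to P'Q'."""
    p2, q2 = dilate(p, center, r), dilate(q, center, r)
    for k in range(samples):
        t = k / (samples - 1)
        image = dilate(point_on_segment(p, q, t), center, r)
        if math.dist(image, point_on_segment(p2, q2, t)) > tol:
            return False
    # A zero cross product means the direction vectors of PQ and P'Q' are parallel.
    pq = (q[0] - p[0], q[1] - p[1])
    pq2 = (q2[0] - p2[0], q2[1] - p2[1])
    return abs(pq[0] * pq2[1] - pq[1] * pq2[0]) < tol

print(check_segment_image((1.0, 2.0), (4.0, -1.0), (0.0, 0.0), 2.5))
print(check_segment_image((1.0, 2.0), (4.0, -1.0), (-2.0, 3.0), 0.4))
```

In both calls the line LPQ misses the center, as in the main case of the proof. A zero cross product would also occur if both segments lay on one line through O, which is the case the proof disposes of first.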

There are two useful corollaries of this theorem.

Corollary 1. A dilation maps lines to lines and rays to rays.

Proof. Let the dilation be D. Given a line ℓ, we must prove that D(ℓ) is also a
line. Let P, Q be distinct points on ℓ. If ℓ passes through the center O of D, then it is
easy to see that D(ℓ) = ℓ. So we assume ℓ does not pass through O. The theorem
says D(PQ) is the segment P′Q′, where P′ = D(P) and Q′ = D(Q). Let ℓ′ be
the line containing P′Q′. We will prove D(ℓ) = ℓ′. First, we prove that D(ℓ) ⊂ ℓ′.
Thus, if U ∈ ℓ, we have to prove that D(U) ∈ ℓ′. For the proof itself, it does not
matter where U is located in ℓ; the picture below shows the case P ∗ Q ∗ U:
[Figure: points P, Q, U on ℓ with P ∗ Q ∗ U, their images P′, Q′, U′, and the center O.]

Let U′ = D(U), and let ℓ0 = LP′U′. Then P′ ∈ ℓ0, and Theorem G16 implies
that ℓ0 is parallel to ℓ. Likewise, P′ ∈ ℓ′ and ℓ′ ∥ ℓ. Since both ℓ0 and ℓ′ pass through
P′ and are parallel to ℓ, the parallel postulate implies that ℓ′ = ℓ0. Consequently,
the point U′ on ℓ0 is now a point on ℓ′; i.e., D(U) ∈ ℓ′, as claimed. We have proved
that D(ℓ) ⊂ ℓ′. It remains to prove that ℓ′ ⊂ D(ℓ). Let U′ ∈ ℓ′, and let U be the
point of intersection of the ray ROU′ with ℓ. Then we prove exactly as in the proof
of Theorem G16 that U′ = D(U) and therefore U′ ∈ D(ℓ). Thus ℓ′ ⊂ D(ℓ), and
D(ℓ) = ℓ′.
The fact that D maps rays to rays is proved in the same manner.

Corollary 2. Let triangles ABC and A′B′C′ be given. If a dilation D maps vertices
to vertices, then it also maps one triangle to the other. Precisely, if D(A) = A′,
D(B) = B′, and D(C) = C′, then D(△ABC) = △A′B′C′.

Proof. Recall that △ABC is the union of the segments AB, BC, and AC. Thus
we must prove D(AB) = A′B′, D(BC) = B′C′, and D(AC) = A′C′. By Theorem
G16, D(LAB) is a line joining A′ and B′ because D(A) = A′, D(B) = B′. Since
assumption (L1) implies that there is a unique line joining two points, we see that
D(LAB) = LA′B′, from which D(AB) = A′B′ immediately follows. The same is
true for D(BC) = B′C′ and D(AC) = A′C′. Corollary 2 is proved.

Armed with Theorem G16, we see that it is very simple to draw the image of
a segment by a dilation. Indeed, to draw the image of a given segment AB by a
dilation D, we simply find the image points A′ and B′ of the endpoints A and B
under D and then draw the segment joining A′ to B′. Here are two examples. In
the first, the original figure is a triangle ABC, and the scale factor is 2.5.

[Figure: △ABC and its image △A′B′C′ under a dilation with center O and scale factor 2.5.]

Here A′ = D(A), B′ = D(B), and C′ = D(C). In the next example, we specify a
scale factor of 2.1; the original figure S is a quadrilateral ABCE, and D(S) is the
enlarged quadrilateral to the right:
[Figure: the quadrilateral S = ABCE and its enlarged image D(S) under a dilation with center O and scale factor 2.1.]

You are encouraged to make many such drawings of dilated rectilinear figures.
You may also have noticed that the dilation of a rectilinear figure "has the same
shape" as the original figure. But what about the dilation of curved figures? There
is no simple replacement of Theorem G16 in that case, but in practical terms (in
a sense to be made precise below), the procedure of getting a dilated figure is only
slightly more complicated. Consider for example the following curve:

Let us shrink it by a scale factor of 1/2. Here is what we do: we pick an arbitrary
point O as the center of dilation, and then pick 10 judiciously chosen points on the
curve, as shown below. For convenience, we shall refer to the chosen points on the
original geometric figure as data points.
[Figure: the curve with 10 data points marked, and the center O.]

Now we draw the rays from O to each of the data points on the curve and dilate
the latter by a scale factor of 1/2 (i.e., "shrink it by half" in everyday language) along
these rays and ignore the curve itself. We thereby obtain a collection of points, and
these will be points on the dilated curve. It should not be difficult to discern, just
from these 10 dilated points, the general shape of the dilated curve.
[Figure: rays from O through the 10 data points, with each dilated point halfway between O and the corresponding data point.]

In the preceding picture, we used 10 data points to demonstrate how to carry
out the approximate dilation of the given curved figure; namely, we simply dilate
these points one by one. The dilated points then give a suggestion of what the
dilated curve would look like. If we delete the rays, we can better see the data
points and their dilated images.
[Figure: the 10 data points and their 10 dilated images, with the rays deleted.]

It is obvious that the more data points we choose on the original curve, the better
we will be able to approximate the dilated curve. To drive home this idea, let us
use 40 data points instead of 10 on the original curve and dilate them from O to
get the following picture. (We omit the rays issuing from O in the interest of visual
clarity.)
[Figure: 40 data points on the curve and their dilated images, with center O.]

Next we double the number of data points and use 80 instead of 40. The approx-
imation of this finite collection of points to the curve itself is already remarkably
good.
[Figure: 80 data points on the curve and their dilated images, with center O.]
If we use 400 points, then the images can almost pass for the real thing, except
that if we look very carefully we can see that they are not entirely "smooth".
[Figure: 400 data points on the curve and their dilated images, with center O.]
What we have described is a basic principle of constructing the dilated image
of any object: to dilate a given object by a scale factor of r, replace the object
by a finite collection of judiciously chosen points on the object, still to be called
data points, and then simply dilate these data points one by one by a scale factor
of r. By increasing the number of data points, their dilated images yield a closer
and closer approximation to the true dilated object.2 A few experiments with this
kind of drawing (see Exercises 5 and 6 on page 281) should suffice to convey the
idea that the dilated object so obtained "has the same shape" as the original but is
magnified or shrunk by a scale factor of r (depending on whether r > 1 or r < 1).
This is how we can enlarge or shrink arbitrary figures regardless of how curved they
may be.
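The data-point procedure is easy to imitate on a computer. In the Python sketch below, the curve is a hypothetical stand-in for the one in the figures (the text gives no formula for it), the center O and the scale factor 1/2 follow the example above, and the measure of fit is our own simple gauge: for each pair of consecutive data points, it compares the midpoint of the chord joining their dilated images with the dilated image of the curve point halfway between them in the parameter. The gap shrinks as the number of data points grows, which is consistent with the principle just described; the spline interpolation mentioned in the footnote is not modeled.

```python
import math

def curve(t):
    """A hypothetical smooth curve, parametrized by t in [0, 1]."""
    return (4 + 3 * t, 2 + math.sin(3 * t) + 0.5 * t * t)

def dilate(p, center, r):
    """P' = O + r(P - O) for the dilation with center O and scale factor r > 0."""
    ox, oy = center
    return (ox + r * (p[0] - ox), oy + r * (p[1] - oy))

def max_gap(n, center=(0.0, 0.0), r=0.5):
    """Largest gap between the dilated curve and the polygon through n dilated data points."""
    params = [k / (n - 1) for k in range(n)]
    images = [dilate(curve(t), center, r) for t in params]
    gap = 0.0
    for k in range(n - 1):
        a, b = images[k], images[k + 1]
        chord_mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        true_mid = dilate(curve((params[k] + params[k + 1]) / 2), center, r)
        gap = max(gap, math.dist(chord_mid, true_mid))
    return gap

for n in (10, 40, 80, 400):
    print(n, max_gap(n))
```

The counts 10, 40, 80, and 400 mirror the pictures in the text.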
It would be very instructive in the school classroom for students to magnify or
shrink many simple curved figures by such hands-on activities. We will elaborate
on these ideas in the next section.
Incidentally, what we have described is also the basic operating principle of
digital photography: approximate any real object by a large number of data points
on the object, and then magnify or shrink these data points by dilation.

2 Mathematical Aside: In computer graphics, one decides on a finite number of data points to
use for the purpose of dilation and then uses spline interpolation to complete the dilated image.

Effect of dilations on lengths and degrees

The following theorem summarizes what a dilation does to lengths and degrees.

Theorem G17. Let D be a dilation with center O and scale factor r. Then:
(a) D is a bijection. In fact, its inverse is the dilation with the
same center O but with a scale factor 1/r.
(b) For any segment AB, |D(AB)| = r|AB|.
(c) D maps angles to angles and preserves degrees of angles.

Remark. Observe the delicate point that the statements of part (b) and part
(c) depend on the validity of Theorem G16 and its Corollary 1. Indeed, without
knowing that D(AB) is a segment, the notation |D(AB)| would not even make
sense (because the notation |σ| only makes sense when σ is a segment or an angle),
and without knowing that D maps rays to rays, it would not make sense to say
that D maps angles to angles.

Proofs of parts (a) and (b). (a) Let D′ be the dilation with center O and scale
factor 1/r. From the definition of a dilation, it is easy to check that D′ ◦ D = I =
D ◦ D′. Thus D is a bijection (Theorem 4.15 on page 211).
Part (b) has been implicitly proved in the proof of Theorem G16. Indeed,
in the notation above, if P′ = D(P) and Q′ = D(Q), then we have shown that
D(PQ) = P′Q′. If LPQ contains O, then the fact that |D(PQ)| = r|PQ| follows
trivially from the definition of a dilation. So suppose LPQ does not pass through
O; then LP′Q′ also does not pass through O, as the definition of D clearly shows.
We may therefore look at △OP′Q′, and FTS implies that |P′Q′|/|PQ| = r; i.e.,
|D(PQ)| = r|PQ|. Since P and Q are arbitrary points, (b) is proved.
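Parts (a) and (b) can be tried out numerically in coordinates, which we assume here purely for illustration. In the Python sketch below (all names are hypothetical), the identity map I amounts to getting the original point back: applying the dilation with scale factor r and then the one with scale factor 1/r, in either order, returns the starting point, and the ratio |D(P)D(Q)|/|PQ| comes out equal to r. Unlike the proof, the code does not need to distinguish whether LPQ passes through O.

```python
import math

def dilate(p, center, r):
    """P' = O + r(P - O) for the dilation with center O and scale factor r > 0."""
    ox, oy = center
    return (ox + r * (p[0] - ox), oy + r * (p[1] - oy))

def undoes(p, center, r, tol=1e-9):
    """Part (a): the dilation with scale factor 1/r undoes D, in either order."""
    back = dilate(dilate(p, center, r), center, 1 / r)
    forth = dilate(dilate(p, center, 1 / r), center, r)
    return math.dist(back, p) < tol and math.dist(forth, p) < tol

def length_ratio(p, q, center, r):
    """Part (b): |D(P)D(Q)| / |PQ|, which should equal r."""
    return math.dist(dilate(p, center, r), dilate(q, center, r)) / math.dist(p, q)

O = (1.0, -2.0)
P, Q = (4.0, 2.0), (-1.0, 5.0)
print(undoes(P, O, 3.0), length_ratio(P, Q, O, 3.0))
```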
For the proof of part (c) of Theorem G17, we first observe that a dilation maps
convex angles to convex angles. Indeed, a convex angle is the intersection of closed
half-planes, so it suffices to prove that (i) a dilation maps closed half-planes to
closed half-planes, (ii) if H+ and H− are closed half-planes of a line L, then D(H+ )
and D(H− ) are the closed half-planes of the line D(L), and (iii) if H1 and H2 are
closed half-planes (of possibly different lines), then

D(H1 ∩ H2 ) = D(H1 ) ∩ D(H2 ).

Because the proofs of (i)–(iii) are as tedious as they are straightforward, we will
leave their verifications to the reader. Now once we know a dilation maps convex
angles to convex angles, then it also maps nonconvex angles to nonconvex angles
because every nonconvex angle is the union of the complement (in the plane) of a
convex angle together with the two sides of the convex angle.
To complete the proof of part (c), we have to make a long digression to discuss
the angles associated with parallel lines. The proof of part (c) itself will be given
on page 280.
We will need some definitions. Here is the first one. Let L and L′ be two lines
meeting at a point O. On L (respectively, L′), let P, Q (respectively, P′, Q′) be
points so that P ∗ O ∗ Q (respectively, P′ ∗ O ∗ Q′), as shown in the following figure:
[Figure: lines L and L′ crossing at O, with P, Q on L and P′, Q′ on L′.]
Then the angles ∠POP′ and ∠QOQ′ are called opposite angles at the point
O.3 There is a simple observation about opposite angles.

Lemma 5.1. Opposite angles at a point are equal.

Proof of Lemma 5.1. We make use of the preceding figure. Each of the two
numbers, |∠POP′| and |∠QOQ′|, when added to |∠P′OQ|, gives 180 because ∠POQ
and ∠P′OQ′ are both straight angles. So |∠POP′| = |∠QOQ′|.
We want to give a different proof for the purpose of demonstrating how to make
use of basic isometries to prove theorems. In this case, we argue as follows. Consider
the rotation ρ of 180° around O. Clearly ρ(ROP) = ROQ and ρ(ROP′) = ROQ′.
Therefore ρ(∠POP′) = ∠QOQ′. Since ρ preserves angles (by assumption ( 2) of
rotations on p. 217), we have |∠POP′| = |∠QOQ′|. The proof of the lemma is
complete.

Let two distinct lines L1, L2 be given. Recall that a transversal of L1 and L2
is any line ℓ distinct from L1 and L2 that intersects both. Suppose ℓ meets L1 and
L2 at P1 and P2, respectively. Let Q1, Q2 be points on L1 and L2, respectively, so
that they lie in opposite half-planes of ℓ. Then ∠Q1P1P2 and ∠Q2P2P1 are said to
be alternate interior angles4 of the transversal ℓ with respect to L1 and L2.
[Figure: lines L1 and L2 cut by a transversal ℓ at P1 and P2, with Q1 on L1, Q2 and R2 on L2, and S on ℓ.]

An angle which is the opposite angle of one of a pair of alternate interior angles
is said to be the corresponding angle of the other angle. For example, let S be
a point on the transversal ℓ so that P1 ∗ P2 ∗ S, and let R2 be a point on L2 so that
Q2 ∗ P2 ∗ R2. Then, because ∠Q1P1P2 and ∠Q2P2P1 are alternate interior angles
and because ∠R2P2S is the opposite angle of ∠Q2P2P1, ∠R2P2S is the correspond-
ing angle of ∠Q1P1P2. Observe that, by the definition of opposite angles, ∠Q2P2P1
and ∠R2P2S lie on opposite closed half-planes of ℓ. Therefore the corresponding
angles ∠Q1P1P2 and ∠R2P2S lie on the same closed half-plane of the transversal ℓ.

3 Thus also ∠POQ′ and ∠P′OQ are opposite angles at O. Most school textbooks in the U.S.
call these vertical angles.

4 My colleague Ole Hald correctly suggests that opposite interior angles might be a better
terminology because they lie in opposite half-planes.

Pedagogical Comments. In the school classroom, we suggest that alternate
interior angles be defined simply by drawing a picture as above and pointing to
∠Q1P1P2 and ∠P1P2Q2. The correct definition (the one just given), using the
precise concept of the half-planes of a line, deserves to be pointed out to open stu-
dents’ minds to the potential of complete logical precision but perhaps should not
be emphasized at this point of the curriculum. Most school students do not take
kindly to the need for such precision in the proofs of theorems at the beginning of
their excursions in geometry (as in the proof of Theorem G18); they would likely
consider the investment of so much effort in something so visibly obvious to be
ridiculous. So some compromise in the school classroom may be necessary. See the
Pedagogical Comments on page 263. Nevertheless, we are obliged to give such a
precise presentation of geometry because mathematics demands no less. There will
be a real need for such precision later on, such as in the proofs of several theorems,
including the one on the angle sum of a triangle (see Theorem G32, Section 6.5 of
[Wu2020b]). End of Pedagogical Comments.

The basic theorem about parallel lines and angles is the following:

Theorem G18. Alternate interior angles of a transversal with respect to a pair
of parallel lines are equal. The same is true of corresponding angles.
[Figure: parallel lines L1 and L2 cut by a transversal at P1 and P2, with M the midpoint of P1P2, Q1 on L1, and Q2 on L2.]
Proof of Theorem G18. We will give two proofs of this theorem. First, let
L1 ∥ L2, and let ℓ be the transversal that meets L1 at P1 and L2 at P2. Let Q1
be a point on L1 distinct from P1, and let Q2 be a point of L2 so that Q1 and
Q2 lie in opposite half-planes of ℓ. We will prove that the alternate interior angles
∠Q1P1P2 and ∠Q2P2P1 are equal. Let M be the midpoint of P1P2, and let ρ be
the rotation of 180 degrees around M. Because ρ(P1) = P2, ρ(L1) is a line passing
through P2. By Theorem G1 (page 220), ρ(L1) ∥ L1, and the hypothesis says L2
is also a line passing through P2 and parallel to L1. Hence the parallel postulate
(page 165) says ρ(L1) = L2. In particular, ρ(Q1) lies on L2, and by Theorem G12
on page 259, ρ(Q1) and Q1 lie on opposite half-planes of the transversal ℓ. Without
loss of generality, we may let Q2 = ρ(Q1), so that ρ(RP1Q1) = RP2Q2. Now, we
also have ρ(RP1P2) = RP2P1. Hence, ρ(∠Q1P1P2) = ∠Q2P2P1. Since ρ preserves
degrees of angles, |∠Q1P1P2| = |∠Q2P2P1|, as desired. The last assertion of the
theorem about corresponding angles follows from the equality of opposite angles at
a point (Lemma 5.1).
Now we give a second proof. Let L1 ∥ L2 with transversal ℓ intersecting L1 and
L2 at P1 and P2, respectively, as before. Also let Q1, Q2 be as before. Choose a
point R2 on L2 so that Q2 ∗ P2 ∗ R2, and choose a point S on ℓ so that P1 ∗ P2 ∗ S,
as shown:
[Figure: parallel lines L1 and L2 cut by the transversal ℓ at P1 and P2, with Q1 on L1, Q2 and R2 on L2, and S on ℓ beyond P2.]
Since Q2 ∗ P2 ∗ R2 is equivalent to Q2 and R2 being two points on L2 lying on
opposite sides of ℓ, and since Q2 and Q1 were chosen to be on opposite sides of ℓ,
we see that R2 may be characterized as a point on L2 that lies on the same side of
ℓ as Q1.
By the definition of opposite angles at a point (see page 276), ∠R2P2S and
∠Q2P2P1 are opposite angles at P2. It follows that ∠Q1P1P2 and ∠R2P2S are
corresponding angles of the transversal ℓ, and we will prove that they are equal.
Let T be the translation along the vector $\overrightarrow{P_1P_2}$. Then T(P1) = P2. By Theorem G5
(page 236), the translation T maps L1 to a line parallel to itself. So T(L1) is a line
passing through P2 and parallel to L1. But by hypothesis, L2 is also a line with the
same properties. The parallel postulate therefore implies that T(L1) = L2. Now
let W denote T(Q1); then W ∈ L2. We claim that W is on the same side of ℓ as
Q1. To see this, observe that Lemma 4.21 on page 234 implies that $\overrightarrow{Q_1W} \parallel \ell$. In
particular, the segment Q1W is disjoint from ℓ, and therefore W and Q1 lie on the
same side of ℓ (see (L4)(ii) on page 176), thereby proving the claim. It follows that
W is a point on L2 that lies on the same side of ℓ as Q1. By the characterization
of R2 in the preceding paragraph, there is no loss of generality if we let R2 = W.
Thus we have T(Q1) = R2.
Next, let us turn to S; we want to show that we may likewise let S = T(P2).
To this end, let U = T(P2); we claim that P1 ∗ P2 ∗ U. Indeed, U has to lie
in one of the two half-lines of ℓ determined by P2. Suppose U lies in the ray RP2P1. Then
clearly RP2U = RP2P1. Since U = T(P2), the definition of a translation (page 234)
implies that $\overrightarrow{P_2U}$ points in the same direction as $\overrightarrow{P_1P_2}$, and this means that there
is a line ℓ′ not parallel to ℓ so that one of its closed half-planes contains both rays,
RP1P2 and RP2P1 (= RP2U). Since the union of RP1P2 and RP2P1 is the line ℓ, there
is no such ℓ′. Therefore U must lie in the opposite ray of RP2P1 on ℓ. Consequently,
P1 ∗ P2 ∗ U by Lemma 4.6(i) on page 174, and the claim is proved. It follows that
both U and S are points on ℓ lying on the opposite side of P1 with respect
to P2. Thus we may let S = U without any loss of generality; i.e., we may let
S = T(P2). Now, we have T(∠Q1P1P2) = ∠R2P2S. Since T preserves degrees of
angles, |∠Q1P1P2| = |∠R2P2S|, and Theorem G18 is proved for the case of corre-
sponding angles. The case of alternate interior angles follows from Lemma 5.1 on
page 276.
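The translation argument in the second proof can be followed step by step in coordinates. The configuration below is hypothetical (our own numbers, with L1 the line y = 0 and L2 the line y = 3), and the helper names are assumptions for illustration. The sketch takes T to be the translation along $\overrightarrow{P_1P_2}$, computes R2 = T(Q1) and S = T(P2), checks on which side of the transversal each point lies, and compares the degrees of the alternate interior angles and of the corresponding angles.

```python
import math

def angle_degrees(a, vertex, b):
    """Degree of the convex angle at vertex whose sides pass through a and b."""
    u = (a[0] - vertex[0], a[1] - vertex[1])
    v = (b[0] - vertex[0], b[1] - vertex[1])
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(abs(cross), dot))

def translate(x, vector):
    """Translation along the given vector."""
    return (x[0] + vector[0], x[1] + vector[1])

def side(x, a, b):
    """Positive on one side of line AB, negative on the other, zero on the line."""
    return (b[0] - a[0]) * (x[1] - a[1]) - (b[1] - a[1]) * (x[0] - a[0])

P1, P2 = (1.0, 0.0), (3.0, 3.0)   # the transversal meets L1 (y = 0) at P1 and L2 (y = 3) at P2
Q1, Q2 = (5.0, 0.0), (0.0, 3.0)   # Q1 on L1, Q2 on L2, on opposite sides of the transversal
v = (P2[0] - P1[0], P2[1] - P1[1])
R2, S = translate(Q1, v), translate(P2, v)

print("alternate interior:", angle_degrees(Q1, P1, P2), angle_degrees(Q2, P2, P1))
print("corresponding:", angle_degrees(Q1, P1, P2), angle_degrees(R2, P2, S))
```

As in the proof, R2 = T(Q1) lands on L2 on the same side of the transversal as Q1.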

Pedagogical Comments. Because the second proof includes all the technical
details, it masks the very intuitive underlying argument. In a high school classroom,
we can present the same proof in the following way, with an explicit caution to the
class that we will rely on the picture itself to justify whether or not two points lie
on the same side or opposite sides of ℓ or L2:
Let L1 ∥ L2, and let ℓ intersect L1 and L2 at P1 and P2, re-
spectively, as before. Also let Q1, Q2 be as before. Let T be
the translation along the vector $\overrightarrow{P_1P_2}$. Then T(P1) = P2. By
Theorem G5 (page 236), the translation T maps L1 to a line
parallel to itself. So T(L1) is a line passing through P2 and
parallel to L1. But by hypothesis, L2 is also a line with the
same properties. The parallel postulate therefore implies that
T(L1) = L2. So T maps Q1 to a point R2 on L2, and it maps
P2 to a point S on ℓ, as shown in the preceding picture. Now,
we have T(∠Q1P1P2) = ∠R2P2S. Since T preserves degrees of
angles, |∠Q1P1P2| = |∠R2P2S|, and Theorem G18 is proved for
the case of corresponding angles. The case of alternate interior
angles follows from Lemma 5.1 on page 276.
Such a proof would be far more instructive to a beginning student in geometry than
the (completely correct) formal proof above. End of Pedagogical Comments.

Remark. Readers who are familiar with some high school geometry may be
tempted at this point to immediately use Theorem G18 to prove the well-known
fact that the sum of (the degrees of) angles in a triangle is 180°. The argument
goes as follows. Given △ABC, extend the ray RBA to a point D; through A draw
a ray RAE that is parallel to BC so that E lies in ∠CAD, as shown:
[Figure: △ABC with BA extended beyond A to D, and the ray RAE through A parallel to BC.]
By Theorem G18, |∠DAE| = |∠B| and |∠CAE| = |∠C|, so that
$$|\angle BAC| + |\angle B| + |\angle C| = |\angle BAC| + |\angle DAE| + |\angle CAE| = 180^\circ.$$
This would seem to finish the proof. Let us affirm that this intuitive argument is
indeed how a high school student should remember why the angle sum of a triangle
is 180°. For a teacher to really come to grips with the delicate points about Eu-
clidean geometry, however, it is necessary to point out that for Theorem G18 to
be applicable, we must first prove that ∠C and ∠EAC are alternate interior angles
and ∠B and ∠DAE are corresponding angles. See the Pedagogical Comments both
before and after Theorem G18. In Section 6.5 of [Wu2020b], we will present a proof
of the angle sum theorem with such details filled in.

We can finally give the Proof of part (c) of Theorem G17. Let O be the
center of the dilation D. Since D maps rays to rays, it maps angles to angles. Given
∠PQR, let D map the ray RQP to the ray RQ′P′ and let it map the ray RQR to
the ray RQ′R′, so that D(∠PQR) = ∠P′Q′R′. We have to prove that
|∠PQR| = |∠P′Q′R′|.
If one of P, Q, and R is equal to O, the argument is simpler (this will be evident
below). So we may assume that none of P, Q, and R is equal to O. Furthermore,
suppose (let us say) O, P, Q are collinear. Then by the definition of a dilation,
P′ and Q′ will also lie on the line containing O, P, Q, and Q′R′ ∥ QR (Theorem
G16 on page 269). Then the fact that |∠PQR| = |∠P′Q′R′| follows directly from
Theorem G18 (the case of corresponding angles).
[Figure: the case in which O, P, Q, P′, Q′ lie on one line and Q′R′ ∥ QR.]
In general, we have a situation depicted by the following picture:
[Figure: ∠PQR and its image ∠P′Q′R′ under D, with the line LQ′P′ meeting LQR at B.]
Without loss of generality, we may assume that neither angle is the zero an-
gle (see (L6) part (ii), page 188). We claim that LQ′P′ must intersect LQR. If
not, then LQ′P′ ∥ LQR. But we already know from Theorem G16 that LQ′R′ ∥ LQR.
Thus we have two distinct lines LQ′P′ and LQ′R′ passing through Q′ and paral-
lel to LQR, and this contradicts the parallel postulate (page 165). Thus LQ′P′
intersects LQR at a point B, as shown. By Theorem G16 (page 269) and its
Corollary 1 (page 270), LQR ∥ LQ′R′ and LQP ∥ LQ′P′. Therefore, according to
Theorem G18 about corresponding angles, (notation as in the preceding figure)
|∠PQR| = |∠P′BR| = |∠P′Q′R′|, as desired. The proof of Theorem G17 is com-
plete.
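Part (c) of Theorem G17 can likewise be sampled numerically, again treating the plane as a coordinate plane only for this illustration. In the Python sketch below (hypothetical names and points), the degree of a convex angle is computed with atan2 from the cross and dot products of the two side vectors. A dilation multiplies both side vectors by the scale factor, so both products are multiplied by its square, and the computed degree stays the same up to rounding.

```python
import math

def dilate(p, center, r):
    """P' = O + r(P - O) for the dilation with center O and scale factor r > 0."""
    ox, oy = center
    return (ox + r * (p[0] - ox), oy + r * (p[1] - oy))

def angle_degrees(a, vertex, b):
    """Degree (between 0 and 180) of the convex angle at vertex with sides through a and b."""
    u = (a[0] - vertex[0], a[1] - vertex[1])
    v = (b[0] - vertex[0], b[1] - vertex[1])
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(abs(cross), dot))

O = (0.5, -1.0)
P, Q, R = (3.0, 4.0), (1.0, 1.0), (5.0, 0.0)
for scale in (0.4, 2.5):
    before = angle_degrees(P, Q, R)
    after = angle_degrees(dilate(P, O, scale), dilate(Q, O, scale), dilate(R, O, scale))
    print(scale, before, after)
```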

In the preceding proof, it is simply asserted that certain angles are correspond-
ing angles without any proof; this is because the details are similar to those in the
second proof of Theorem G18 (see especially the Pedagogical Comments on page
279). Going through such uninspiring arguments once is quite enough and we will
continue to skip such arguments in the future. The following converse of Theorem
G18 will also be useful; the proof is sufficiently straightforward to be left to Exercise
1 below.

Theorem G19. If the alternate interior angles of a transversal with respect to
a pair of distinct lines are equal, then the lines are parallel. The same is true of
corresponding angles.

To conclude this discussion of dilation, it would be pleasant to be able to report
that a composite of two dilations (whose centers may be different) is also a dilation
(with respect to some other center), but unfortunately such is not the case. An
example will be given in Exercise 7 on page 282.
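A small numerical experiment shows what goes wrong in the situation of Exercise 7. In the Python sketch below (coordinates, points, and names are our own assumptions), DQ has center Q and scale factor 2, DP has center P and scale factor 1/2, and DQ is applied first. Every sample point is moved by the same vector, here the vector from Q to the midpoint M of PQ. That is the behavior of a translation, not of a dilation: a dilation with scale factor different from 1 fixes its center, while one with scale factor 1 is the identity. This illustrates the claim of Exercise 7 but does not prove it.

```python
def dilate(p, center, r):
    """P' = O + r(P - O) for the dilation with center O and scale factor r > 0."""
    ox, oy = center
    return (ox + r * (p[0] - ox), oy + r * (p[1] - oy))

def composite(p_center, r, q_center, s):
    """The map X -> D_P(D_Q(X)): first dilate about Q by s, then about P by r."""
    def f(x):
        return dilate(dilate(x, q_center, s), p_center, r)
    return f

P, Q = (0.0, 0.0), (4.0, 2.0)
M = ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)
QM = (M[0] - Q[0], M[1] - Q[1])           # the vector from Q to M
f = composite(P, 0.5, Q, 2.0)             # scale factors 1/2 and 2, as in Exercise 7

for x in [(0.0, 0.0), (1.0, 3.0), (-2.0, 5.0)]:
    fx = f(x)
    print(x, "->", fx, "moved by", (fx[0] - x[0], fx[1] - x[1]), "vector QM:", QM)
```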

Exercises 5.2.
(1) Prove Theorem G19 on page 281. (Hint: Use Theorem G18 and Lemma
4.10 on page 190.)
(2) Prove: (a) The dilation of a convex set is a convex set. (b) The dilation of
a polygon is a polygon. (c) The dilation of a regular polygon is a regular
polygon.
(3) Let ABCD and A′B′C′D′ be two quadrilaterals. Suppose there is a point
K so that the rays RKA, RKB, RKC, RKD contain A′, B′, C′, D′, re-
spectively. Assume also
$$\frac{|KA|}{|KA'|} = \frac{|KB|}{|KB'|} = \frac{|KC|}{|KC'|} = \frac{|KD|}{|KD'|}.$$
Prove that if ABCD is a square, then so is A′B′C′D′. (Caution: Be
careful about what you say and how you say it.)
(4) Let O be a point not on a given circle C with center K. Let D be the
dilation with center O and scale factor r. Prove that the image D(C) is a
circle and that the center of D(C) is the image under D of the center of
C. (Caution: This is a slippery proof. Follow the precise definitions of a
circle and a dilation.)
(5) Let D and E be the midpoints of AB and AC, respectively, of △ABC,
and let K be the midpoint of DE (see picture below). Let D be the
dilation with center A and scale factor 1/2. (a) If ρ is the rotation of
180° around K, describe precisely the figure D(ρ(△ABC)). (b) If T is
the translation along $\overrightarrow{AD}$, describe precisely the figure T(D(△ABC)).
(c) How are the figures in (a) and (b) related?
[Figure: △ABC with D, E the midpoints of AB, AC and K the midpoint of DE.]

(6) Assume a point O and the following curve in the plane:

[Figure: a point O and a curve in the plane.]

Trace both onto a piece of paper, and choose enough points on the curve
so that, by dilating these points with center O and scale factor 2, the
dilated points give a reasonable picture of the dilated curve with scale
factor 2.
(7) (i) Let P and Q be two distinct points in the plane and let $D_P$, $D_Q$ be two dilations with center at P, Q, respectively, and with scale factor 1/2 and 2, respectively. Prove that $D_P \circ D_Q$ is a translation along the vector $\overrightarrow{QM}$, where M is the midpoint of PQ. (ii) Generalize part (i).
(8) Let $D_P$, $D_Q$ be two dilations with centers at two distinct points P and Q and with scale factors r and s, respectively. If $rs \neq 1$, prove that there is a point X so that $D_P \circ D_Q$ is a dilation with center X.
(9) (This exercise refers to a coordinate system. See the warning on page
207.) (a) Let D be the dilation with center O and scale factor 2, and
let ϕ be the congruence which is the reflection across the horizontal line
corresponding to {y = 1}. Are ϕ ◦ D and D ◦ ϕ equal? (b) Repeat part
(a) if ϕ is now the congruence which is the rotation of 90 degrees around
the point (1, 0). (c) Repeat part (a) if ϕ is now the congruence which is
the translation that sends a point (x, y) to (x + 2, y).
(10) Let ϕ be a congruence and let D be a dilation with center O and scale factor r. Prove that $\varphi^{-1} \circ D \circ \varphi$ is a dilation; be sure to state what its center of dilation is and what its scale factor is.
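(11) Let $R_{OA}$, $R_{OB}$, and $R_{OC}$ be three rays issuing from O. Let $A' \in R_{OA}$, $B' \in R_{OB}$, and $C' \in R_{OC}$. Suppose $AB \parallel A'B'$ and $BC \parallel B'C'$. Prove that $AC \parallel A'C'$.
[Figure: three rays issuing from O, containing A and A′, B and B′, C and C′, respectively.]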
5.3. SIMILARITY 283

5.3. Similarity
The goals of this section are to introduce a correct definition of similarity, prove
two basic criteria for triangle similarity, and, as an application, prove the most
famous theorem of elementary mathematics: the Pythagorean theorem. We will
also prove the converse of the Pythagorean theorem.

Definition of similarity, its symmetry, and its transitivity (p. 283)


Two criteria for triangle similarity (p. 287)
The Pythagorean theorem and its proof (p. 290)

Definition of similarity, its symmetry, and its transitivity

Let S and S′ be two sets in the plane. How do we say correctly that they are
"similar"? First and foremost, the phrase "having the same shape" lacks precision
and cannot be used as a definition of similarity, contrary to what TSM tells you.
Moreover, the only precise definition of similar figures that TSM offers is that of
"similar polygons", and the problem with such a narrow definition is that it leaves
out the consideration of the similarity of geometric figures like parabolas. We
cannot afford any ambiguity about the similarity of parabolas because it will limit
our understanding of the graphs of quadratic functions and conic sections (see, e.g.,
Sections 2.2 and 2.3 of [Wu2020b], respectively). We need a definition of similar
figures that not only applies to all plane figures but also coincides with the TSM
definition in the case of polygons. Now in Section 5.2, we saw that if one figure is
a dilation of another, then they do appear to have the same shape. Why not just
say a figure is similar to another if one is the dilation of the other? To answer this
question, consider the following figures:

[Figure: a figure S and a smaller figure S′.]

One can convince oneself that S′ is obtained from S by a dilation of scale factor 1/2. Now rotate S′ clockwise by 90 degrees around the center of the circle in S′ to obtain S*, as shown:

[Figure: the figure S′ beside its rotated image S*.]

Now S* is of course congruent to S′ and therefore must have "the same shape" as S,
but can S* be a dilation of S? Not according to Theorem G16 (page 269) because if
it were, then the horizontal segment of S* would have to be parallel to the vertical
segment of S.

What this simple example shows is that it is too restrictive to define "similar-
ity" in terms of dilations alone. One must allow for compositions with congruences
as well since, intuitively, each congruence preserves both shape and size. In the
preceding example, for instance, a dilation of S by a scale factor of 1/2, followed by
a clockwise rotation of 90 degrees yields the figure S* which still "has the same
shape" as S. With this in mind, we now give the formal definition of "similarity".

Definition. A figure S is said to be similar to another figure S*, in symbols, S ∼ S*, if there is a finite sequence of dilations and congruences whose composition maps S to S*.

A composition of a finite sequence of congruences and dilations is called a similarity. Thus a congruence is a similarity. In this terminology, S is similar to S* if there is a similarity F so that F(S) = S*.
Let F be a similarity. We claim that there is a positive number r so that if A and B are any two points in the plane and $A^* = F(A)$ and $B^* = F(B)$, then
$$|A^*B^*| = r|AB|. \tag{5.8}$$
This is because if ϕ is a congruence, then
$$|\varphi(A)\varphi(B)| = |AB|$$
for any two points A and B, and if $D_1$ is a dilation with scale factor $r_1$, then
$$|D_1(A)D_1(B)| = r_1|AB|$$
for any two points A and B. Therefore, suppose F is the composition of p congruences and q dilations $D_1, \ldots, D_q$ (possibly with different centers), and the scale factor of each $D_i$ is $r_i$. Then it is straightforward to see that (recall $A^* = F(A)$ and $B^* = F(B)$)
$$|A^*B^*| = (r_1 r_2 \cdots r_q)|AB|$$
for any two points A and B. Therefore (5.8) holds with $r = r_1 r_2 \cdots r_q$. The positive number r in (5.8) is, by definition, the scale factor of the similarity F. It follows that a similarity with scale factor 1 is an isometry. As soon as we can show that every isometry is a congruence (Theorem G39 in Section 6.6 of [Wu2020b]), we will know that a similarity with scale factor 1 is a congruence.
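The scale-factor bookkeeping behind (5.8) can be checked numerically. The sketch below again assumes coordinates.

- The helper `dilation` stands for a dilation with a given center and scale factor.
- The helpers `rotation`, `translation`, and `reflect_in_x_axis` stand for three basic congruences.
- All of these names are hypothetical.

The sketch composes two dilations, with scale factors 3 and 0.5, and three congruences. It confirms that distances are multiplied by 3 × 0.5 = 1.5, wherever the centers happen to be.

```python
import math

def dilation(center, r):
    """Dilation with the given center and scale factor r > 0."""
    cx, cy = center
    return lambda p: (cx + r * (p[0] - cx), cy + r * (p[1] - cy))

def rotation(center, degrees):
    """Counterclockwise rotation of the given number of degrees around center."""
    cx, cy = center
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return lambda p: (cx + c * (p[0] - cx) - s * (p[1] - cy),
                      cy + s * (p[0] - cx) + c * (p[1] - cy))

def translation(vx, vy):
    """Translation along the vector (vx, vy)."""
    return lambda p: (p[0] + vx, p[1] + vy)

def reflect_in_x_axis():
    """Reflection across the x-axis."""
    return lambda p: (p[0], -p[1])

# The transformations making up F, listed in the order in which they are applied.
steps = [dilation((1, 2), 3.0), rotation((0, 0), 40), dilation((-2, 5), 0.5),
         translation(3, -1), reflect_in_x_axis()]

def F(p):
    for step in steps:
        p = step(p)
    return p

for A, B in [((0.0, 0.0), (4.0, -3.0)), ((1.5, 2.0), (-7.0, 0.25))]:
    ratio = math.dist(F(A), F(B)) / math.dist(A, B)
    print(round(ratio, 9))   # 1.5 each time, the product 3.0 * 0.5
```

The congruence steps leave the ratio unchanged, so only the product of the dilation scale factors survives.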
At the moment, we must be careful about one aspect of the definition of S
being similar to S*; namely, if we know that S is similar to S*, do we also know
that S* is similar to S? This question is probably confusing at first, so let us try
to make sense of it. We are used to saying "the two figures S and S* are similar"
because we tend to think of S and S* as interchangeable parts so that if S is similar
to S*, then, "of course", S* would also be similar to S. But the fact that S ∼ S*
means there is a similarity F so that F (S) = S*. If we want to say S* ∼ S, then by
definition, we must produce a similarity G so that G(S*) = S. The question then
becomes whether the existence of such an F will always imply the existence of such
a G. The next lemma answers this question affirmatively. (Recall that symmetric
relation and transitive relation are defined on page 241.)

Lemma 5.2. (i) The composition of a finite number of similarities is a similarity. (ii) Each similarity has an inverse transformation that is also a similarity. (iii) The similarity of two figures S ∼ S* is a symmetric relation. (iv) The similarity of two figures S ∼ S* is also a transitive relation.

Proof. (i) follows immediately from the definition of similarity. For (ii), recall
that congruences and dilations are bijections (Theorem G6 on page 240 and The-
orem G17 on page 275) and, since a composition of bijections is a bijection, each
similarity—being a composition of bijections—has an inverse transformation (see
page 211). It remains to prove that this inverse transformation is also a similarity.
To this end, we will look at a special case to avoid notational excesses, and it will
be seen that the reasoning behind the proof of this special case is perfectly general.
Suppose a similarity F is equal to a composition
$$F = \varphi_1 \circ D_1 \circ D_2 \circ \varphi_2,$$
where $\varphi_1$ and $\varphi_2$ are congruences and $D_1$ and $D_2$ are dilations (possibly with different centers). Let G be the composition of the following congruences and dilations:
$$G = \varphi_2^{-1} \circ D_2^{-1} \circ D_1^{-1} \circ \varphi_1^{-1},$$
where $\varphi^{-1}$ denotes the inverse transformation of ϕ, etc. It is straightforward to check that both $F \circ G = I$ and $G \circ F = I$, where I is the identity transformation of the plane. For $F \circ G$, cancel the innermost pair $\varphi_2 \circ \varphi_2^{-1}$, then $D_2 \circ D_2^{-1}$, and so on. So G is the inverse transformation of F. But G is a similarity because the inverse of a congruence is a congruence and the inverse of a dilation is also a dilation (Theorem G6 and Theorem G17 again). Thus G, being a composition of congruences and dilations, is a similarity, as claimed. The case of a similarity F being a composition of an arbitrary number of congruences and dilations can be proved by an essentially identical argument.
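The cancellation behind $F \circ G = I$ and $G \circ F = I$ can also be watched numerically. This sketch assumes coordinates, and its helper names are hypothetical.

- The congruences $\varphi_1$, $\varphi_2$ are taken to be rotations. The inverse of a rotation is the rotation by the opposite angle around the same center.
- The inverse of a dilation with scale factor r is the dilation with the same center and scale factor 1/r.

Note that G lists the inverses in the reverse order.

```python
import math

def dilation(center, r):
    cx, cy = center
    return lambda p: (cx + r * (p[0] - cx), cy + r * (p[1] - cy))

def rotation(center, degrees):
    cx, cy = center
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return lambda p: (cx + c * (p[0] - cx) - s * (p[1] - cy),
                      cy + s * (p[0] - cx) + c * (p[1] - cy))

def compose(*maps):
    """compose(f, g, h) is f o g o h: h is applied first, f last."""
    def composite(p):
        for m in reversed(maps):
            p = m(p)
        return p
    return composite

phi1, phi2 = rotation((1, 1), 30), rotation((-2, 0), -75)   # two congruences
D1, D2 = dilation((0, 3), 2.0), dilation((4, -1), 0.25)     # two dilations
F = compose(phi1, D1, D2, phi2)

# G = phi2^{-1} o D2^{-1} o D1^{-1} o phi1^{-1}
G = compose(rotation((-2, 0), 75), dilation((4, -1), 4.0),
            dilation((0, 3), 0.5), rotation((1, 1), -30))

def close(p, q):
    return math.isclose(p[0], q[0], abs_tol=1e-9) and math.isclose(p[1], q[1], abs_tol=1e-9)

samples = [(0.0, 0.0), (3.0, -2.0), (-5.5, 7.25)]
print(all(close(F(G(p)), p) and close(G(F(p)), p) for p in samples))  # True
```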
We can now prove (iii); i.e., the similarity of two figures S ∼ S* is a symmetric relation with respect to S and S*. Here is the reason: by definition, S ∼ S* means there is a similarity F so that F(S) = S*. This implies $F^{-1}(S^*) = S$, where as usual $F^{-1}$ denotes the inverse transformation of F. But we have just seen that $F^{-1}$ is a similarity, so S* ∼ S, by definition. Finally, suppose $S_1 \sim S_2$ and $S_2 \sim S_3$; then we will prove $S_1 \sim S_3$. This is because $S_1 \sim S_2$ implies that there is a similarity $F_1$ so that $F_1(S_1) = S_2$, and $S_2 \sim S_3$ implies that there is a similarity $F_2$ so that $F_2(S_2) = S_3$. Thus $(F_2 \circ F_1)(S_1) = S_3$. Since $F_2 \circ F_1$ is a similarity by part (i), we conclude that $S_1 \sim S_3$. This proves part (iv) and hence the lemma.

Obviously, every figure is congruent to itself and hence similar to itself. Thus
similarity is also a reflexive relation (see page 241). Since similarity is a reflexive,
symmetric, and transitive relation, this says that similarity is an equivalence relation
(see page 241).
The fact that the similarity relation ∼ is both symmetric and transitive is not
an abstraction for its own sake. It has substantive intuitive content. As we pointed
out above, the symmetry of the relation allows us to say that "two figures S and
S* are similar" without having to worry about whether it is S ∼ S* or S* ∼ S,
because they are equivalent. We will freely avail ourselves of this terminology from
now on. In addition, the transitivity of similarity leads to the following intuitive
conclusion:

Lemma 5.3. If two geometric figures are each similar to a third, then they are
similar to each other.
Proof. Suppose $S_1 \sim S_3$ and $S_2 \sim S_3$. Then we have to prove that $S_1 \sim S_2$. By the symmetry of the ∼ relation, $S_2 \sim S_3$ implies $S_3 \sim S_2$. Therefore, we now have $S_1 \sim S_3$ and $S_3 \sim S_2$, so that the transitivity of the ∼ relation implies immediately that $S_1 \sim S_2$, thereby proving the lemma.

We note explicitly that, although most of our attention will be lavished on triangles, this definition of similarity makes it possible to prove the similarity of geometric figures which do not consist of segments, or, as we say, which are not
rectilinear (see page 269). For example, it follows directly from the definition that
all circles are similar to each other (see Exercise 5 on page 294). More importantly,
a correct definition of similarity is fundamental to the study of graphs of quadratic
functions which will be taken up in Chapter 2 of [Wu2020b]. In Section 2.2 of
[Wu2020b], we will prove that the graphs of all quadratic functions are similar to
each other and, in fact, all parabolas are similar to each other.

Mathematical Aside: (a) This concept of similarity is meaningful not only for any geometric figure in the plane but also for figures in Euclidean spaces of higher dimensions, as soon as we extend the definitions of rotations, reflections, translations, and dilations to higher dimensions. (b) Parts (i) and (ii) of Lemma 5.2 imply that the set of all similarities of the plane forms a group, the group of similarities. On pp. 211, 235, and 240, we introduced certain groups, and we can now bring them together:

group of translations ⊂ group of congruences ⊂ group of similarities ⊂ group of bijections.

In this chain of inclusions, each group is a subgroup of the next. As soon as we prove that every isometry of the plane is a congruence (Section 6.6 of [Wu2020b]), we will be able to replace the group of congruences above by the isometry group of the plane.

It remains to point out that the definition of a similarity as a finite composition of congruences and dilations raises the specter that some similarities may require the composition of "many" congruences and dilations for their definitions. Such turns out not to be the case, as the following theorem shows.

Theorem. For a transformation F of the plane, the following three conditions are
equivalent:

(i) F is a similarity.
(ii) F is the composition of a dilation followed by a congruence.
(iii) F is the composition of a congruence followed by a dilation.

This theorem implies that we could have defined a similarity to be, for example, the composition of a dilation followed by a congruence. However, such a definition is actually clumsy to work with; e.g., it makes it difficult to prove that ∼ is a symmetric and transitive relation (see remark (b) in the preceding Mathematical Aside). The proof of this theorem is quite abstract and technical, and decidedly not simple. We will therefore relegate it to a file, A Theorem about Similarity, to be posted on the author's homepage, https://math.berkeley.edu/~wu/.

Two criteria for triangle similarity

As in the case of congruence, the notation in the similarity of triangles, by tradition, is made to carry more information. We say $\triangle ABC \sim \triangle A'B'C'$ if there is a similarity F so that
$$F(A) = A', \quad F(B) = B', \quad F(C) = C'.$$
In other words, $\triangle ABC \sim \triangle A'B'C'$ means more than the existence of a similarity F such that the sets $F(\triangle ABC)$ and $\triangle A'B'C'$ are equal. It also means that F specifically maps A to A′, B to B′, and C to C′.

Theorem G20. Given two triangles ABC and A′B′C′, then $\triangle ABC \sim \triangle A'B'C'$ if and only if
$$|\angle A| = |\angle A'|, \quad |\angle B| = |\angle B'|, \quad |\angle C| = |\angle C'|$$
and
$$\frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|} = \frac{|BC|}{|B'C'|}.$$
It is common to express the second set of equalities, i.e., the equality of the ratios of corresponding sides of the two triangles, by saying that the corresponding sides are proportional.

Remark. It is in the proof of this theorem that we get to see why a similarity is defined as the composition of dilations and congruences (rather than just isometries). The reason is that we need a similarity to preserve the degrees of angles, whereas an isometry is, at this point, not yet known to do that (compare Theorem G6 on page 240 and Theorem G17 on page 275). It is the property of a congruence to also preserve degrees of angles that accounts for the validity of Theorem G20.

Proof. If we have $\triangle ABC \sim \triangle A'B'C'$, then the assertions about angles and sides follow from Theorems G6 (page 240) and G17 (page 275). For the converse, we prove something stronger:

Theorem G21 (SAS for similarity). Given two triangles ABC and A′B′C′, if |∠A| = |∠A′| and
$$\frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|},$$
then $\triangle ABC \sim \triangle A'B'C'$.

Proof. The idea of the proof is to use a congruence to move △A′B′C′ into a position so that a dilation with center at A will map it to △ABC.

If |AB| = |A′B′|, then the hypothesis would imply |AC| = |A′C′| and we are reduced to the SAS criterion for congruence. Thus we may assume that |AB| and |A′B′| are not equal. Without loss of generality, suppose |A′B′| < |AB|. Then the hypothesis that |AB|/|A′B′| = |AC|/|A′C′| implies |A′C′| < |AC|. On the segment AB, let $B_0$ be the point so that $|AB_0| = |A'B'|$. Similarly, on AC, let $C_0$ be the point satisfying $|AC_0| = |A'C'|$.


Because |∠A| = |∠A′| by hypothesis, the SAS criterion for congruence (Theorem G8, page 245) implies that $\triangle A'B'C' \cong \triangle AB_0C_0$. Let ϕ be the congruence that maps $\triangle A'B'C'$ to $\triangle AB_0C_0$. Moreover, let r denote the common value of |AB|/|A′B′| and |AC|/|A′C′|. The dilation D with center A and scale factor r maps A to A (of course). It also maps $B_0$ to B, because by the definition of dilation, $D(B_0)$ is the point on the ray $R_{AB}$ whose distance from the center A is
$$r|AB_0| = \frac{|AB|}{|A'B'|}\,|AB_0| = \frac{|AB|}{|A'B'|}\,|A'B'| = |AB|.$$
Thus $D(B_0) = B$. Similarly, $D(C_0) = C$. Therefore, D maps $\triangle AB_0C_0$ to $\triangle ABC$, thanks to Corollary 2 of Theorem G16 (see page 271). Consequently, we have
$$(D \circ \varphi)(\triangle A'B'C') = D(\varphi(\triangle A'B'C')) = D(\triangle AB_0C_0) = \triangle ABC.$$
Since $D \circ \varphi$ is a similarity, triangles ABC and A′B′C′ are similar and the proof of the theorem is complete.
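The following sketch is a numerical companion to the easy direction of Theorem G20 (a similarity preserves angles and makes corresponding sides proportional). It assumes coordinates, and the helper names are hypothetical. It applies one particular similarity to a triangle and then compares angles and side ratios:

- first, a dilation with center (2, −1) and scale factor 0.4;
- then a 120-degree rotation around the origin;
- finally, a reflection across the y-axis.

It checks only one triangle and is, of course, no substitute for the proof.

```python
import math

def angle_deg(P, Q, R):
    """Degree of the angle at vertex P between the rays from P through Q and through R."""
    v = (Q[0] - P[0], Q[1] - P[1])
    w = (R[0] - P[0], R[1] - P[1])
    cos_t = (v[0] * w[0] + v[1] * w[1]) / (math.hypot(*v) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def similarity(p):
    """Dilation (center (2, -1), scale factor 0.4), then a 120-degree rotation
    around the origin, then reflection across the y-axis."""
    x, y = 2 + 0.4 * (p[0] - 2), -1 + 0.4 * (p[1] + 1)
    c, s = math.cos(math.radians(120)), math.sin(math.radians(120))
    x, y = c * x - s * y, s * x + c * y
    return (-x, y)

A, B, C = (0.0, 0.0), (6.0, 1.0), (2.0, 5.0)
A2, B2, C2 = similarity(A), similarity(B), similarity(C)   # play the roles of A', B', C'

angles = [(angle_deg(A, B, C), angle_deg(A2, B2, C2)),
          (angle_deg(B, A, C), angle_deg(B2, A2, C2)),
          (angle_deg(C, A, B), angle_deg(C2, A2, B2))]
ratios = [math.dist(A, B) / math.dist(A2, B2),
          math.dist(A, C) / math.dist(A2, C2),
          math.dist(B, C) / math.dist(B2, C2)]

print(all(math.isclose(u, v) for u, v in angles))   # True: equal angles
print([round(q, 9) for q in ratios])                 # [2.5, 2.5, 2.5]
```

The common ratio is 1/0.4 = 2.5, the reciprocal of the scale factor, because each ratio divides an original side by its image.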

We next give the proof of the most easily applied criterion of similarity: the
AA criterion (angle-angle criterion) for similarity.

Theorem G22 (AA for similarity). Two triangles with two pairs of equal
angles are similar.

Remark. Of course, as soon as we prove that the sum of angles in a triangle is 180° (Theorem G32 in Section 6.5 of [Wu2020b]), knowing the equality of two pairs of angles is seen to be equivalent to knowing that all three pairs of angles are equal. This is why this criterion is sometimes cited as the AAA criterion.

Proof. Let two triangles ABC and A′B′C′ be given. We may assume |∠A| = |∠A′| and |∠B| = |∠B′|.

We have to prove that $\triangle ABC \sim \triangle A'B'C'$. If |AB| = |A′B′|, then the hypothesis would imply $\triangle ABC \cong \triangle A'B'C'$ because of the ASA criterion for congruence (Theorem G9, page 245). Thus we may assume that |AB| and |A′B′| are not equal. Suppose |A′B′| < |AB|. On AB, choose a point $B_0$ so that $|AB_0| = |A'B'|$. Let the line parallel to BC and passing through $B_0$ intersect the line $L_{AC}$ at $C_0$. By Theorem G11 (FTS*) (page 257), $C_0$ lies on the ray $R_{AC}$ and
$$\frac{|AB|}{|AB_0|} = \frac{|AC|}{|AC_0|}. \tag{5.9}$$
On the other hand, we have $|AB_0| = |A'B'|$ by construction and |∠A| = |∠A′| by hypothesis. In addition, $|\angle AB_0C_0| = |\angle B'|$, for two reasons:
- $|\angle AB_0C_0| = |\angle B|$ by Theorem G18 on page 277, concerning corresponding angles with respect to parallel lines;
- |∠B| = |∠B′| by hypothesis.

Thus ASA implies that $\triangle AB_0C_0 \cong \triangle A'B'C'$. Hence $|AB_0| = |A'B'|$ and $|AC_0| = |A'C'|$. Therefore equation (5.9) becomes
$$\frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|}.$$
Now recall that ∠A and ∠A′ are assumed to be equal. Therefore, triangles ABC and A′B′C′ are similar because they satisfy the conditions of SAS for similarity (Theorem G21). Theorem G22 is proved.

We emphasize once again that inasmuch as the validity of Theorem G21 de-
pends on Theorem G16 which depends on the parallel postulate, and the proof of
Theorem G22 depends on FTS* which also depends on the parallel postulate, both
theorems ultimately depend on the parallel postulate. The fact that all conclusions
about similar figures depend crucially on the parallel postulate will be underscored
once more in the last section of Chapter 8 in [Wu2020b].

To round off the picture, let us also mention the fact that, in analogy with the
case of congruence, there is also an SSS criterion for similarity. This will be proved
in Section 6.4 of [Wu2020b].

The Pythagorean theorem and its proof

For the purpose of learning about linear equations, students have to learn how to apply Theorems G21 and G22 in specific situations. There is probably no better illustration of such applications than the following proof of the Pythagorean theorem.⁵ Note that there will be a second proof of this theorem in Section 4.4 of [Wu2020c] using the concept of area.
Let us fix the terminology. Let △ABC be a right triangle with the right angle at the vertex C. Then the sides AC and BC are called the legs of △ABC, and AB is called the hypotenuse of △ABC.
[Figure: right triangle ABC with the right angle at C, legs b = |AC| and a = |BC|, and hypotenuse c = |AB|.]

Theorem G23 (Pythagorean theorem). If the lengths of the legs of a right triangle are a and b and the length of the hypotenuse is c, then $a^2 + b^2 = c^2$.

The basic idea of the proof is very simple. Referring to the same picture, we
draw a perpendicular from C to line LAB . The perpendicular meets the segment
AB at a point D (see the definition of segment on page 169), as the middle figure
in the following picture shows:

[Figure: three panels based on the right triangle ABC; the middle panel shows the perpendicular from C meeting the segment AB at D.]

5 Pythagoras of Samos is a pivotal figure in the development of mathematics, yet little is known about him or his work with certainty. He was a Greek philosopher-mathematician who
lived around 500 BC. He founded a school devoted to mathematics and philosophy, but it was
also a school shrouded in secrecy and infused with a large dose of mysticism. The recognition of
the mathematical relationship between musical notes and the existence of numbers which are not
rational are attributed to this school. The so-called Pythagorean theorem was actually known
independently, and earlier, to the Babylonians, Hindus, and Chinese ([Katz, Chapters 1 and 2]).
The Babylonians made extensive computations with this theorem around 1800 BC; see Exercise
9 on page 384.

Then a simple use of Theorem G22 reveals that both right triangles CBD and
ACD are similar to ABC and therefore their corresponding sides are proportional.
This immediately leads, via the cross-multiplication algorithm, to several equalities
between the products of (lengths of) the sides of these triangles. If you already know
what to prove, then by trial and error, you cannot help but arrive at a combination
of these equalities that will give you what you want. On the other hand, if you do
not know what to prove, then these identities are not likely to do you much good.
There are many proofs of the Pythagorean theorem, but regardless of the proof,
one is always aware that guessing a correct statement of the Pythagorean theorem
is a higher order of achievement than finding a proof of the theorem. The first
person to discover this theorem must have been an extraordinary mathematician.

Proof. We will prove that $\triangle ABC \sim \triangle CBD$ and also $\triangle ABC \sim \triangle ACD$.
[Figure: right triangle ABC with the right angle at C and the foot D of the perpendicular from C on AB, labeled a = |BC|, b = |AC|, c = |AB|, α = |BD|, β = |AD|.]
It suffices to prove $\triangle ABC \sim \triangle CBD$, as the other similarity can be proved in the same way. The two triangles ABC and CBD have two pairs of equal angles: |∠CDB| = |∠ACB| = 90° and |∠CBD| = |∠ABC|. By the AA criterion for similarity (Theorem G22 on p. 288), the triangles are similar. Hence
$$\frac{|BA|}{|BC|} = \frac{|BC|}{|BD|}.$$
Letting
$$|AC| = b, \quad |AB| = c, \quad |BC| = a, \quad |AD| = \beta, \quad |BD| = \alpha$$
(see the preceding picture), we get $\frac{c}{a} = \frac{a}{\alpha}$, so that by the cross-multiplication algorithm,
$$a^2 = \alpha c. \tag{5.10}$$
By considering the similar right triangles ABC and ACD, we conclude in the same way that $\frac{|AC|}{|AB|} = \frac{|AD|}{|AC|}$, so that $\frac{b}{c} = \frac{\beta}{b}$. Therefore,
$$b^2 = \beta c. \tag{5.11}$$
Adding (5.10) and (5.11) and making use of α + β = c, we finally obtain
$$a^2 + b^2 = \alpha c + \beta c = (\alpha + \beta)c = c^2.$$
The proof of the Pythagorean theorem is complete.
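The identities (5.10) and (5.11) can be checked on a concrete right triangle. The sketch below assumes coordinates and places the right angle at C = (0, 0), with the legs along the axes. It computes the foot D of the perpendicular from C by orthogonal projection onto the line AB. That is a standard coordinate formula, not a method used in the text. The variables `alpha` and `beta` are |BD| and |AD|, as in the proof.

```python
import math

a, b = 2.0, 7.0                              # legs |BC| and |AC| (any positive values)
C, B, A = (0.0, 0.0), (a, 0.0), (0.0, b)     # right angle at C
c = math.dist(A, B)                          # hypotenuse |AB|

# Foot D of the perpendicular from C to line AB, written as D = A + t(B - A).
ux, uy = B[0] - A[0], B[1] - A[1]
t = ((C[0] - A[0]) * ux + (C[1] - A[1]) * uy) / (ux * ux + uy * uy)
D = (A[0] + t * ux, A[1] + t * uy)

alpha, beta = math.dist(B, D), math.dist(A, D)   # |BD| and |AD|

print(math.isclose(a**2, alpha * c))     # (5.10): True
print(math.isclose(b**2, beta * c))      # (5.11): True
print(math.isclose(alpha + beta, c))     # D lies between A and B: True
print(math.isclose(a**2 + b**2, c**2))   # True
```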

There is an animation of the preceding proof by Larry Francis that also makes
the striking observation that the algebraic manipulations above actually have a
geometric interpretation in terms of area:
https://youtu.be/QCyvxYLFSfU.

Pedagogical Comments. The fact that the perpendicular from C to the line $L_{AB}$ meets the line at a point D between A and B is of critical importance in the preceding proof. Without knowing this fact, we would not have been able to conclude in the last step of the proof that |BD| + |DA| = |AB|. From a mathematical standpoint, a proof of this fact is essential. Unfortunately, this proof is too intricate and too abstract for the school classroom, as we shall see presently. Thus, once again, we are confronted with a common dilemma in the teaching of school geometry: what should be done in terms of mathematics is incompatible with what can be done in terms of pedagogy. Our recommendation is that, because this fact is so pictorially obvious, the preceding proof should be allowed to stand in a school classroom.⁶ That said, we will proceed to give this proof because our goal is to help you as a teacher to know the whole truth about things you are supposed to teach. Note that the proof would be considerably simpler if we had at our disposal the theorem that the sum of (the degrees of) the angles of a triangle is 180°. However, in the way we are developing plane geometry, this theorem about the angle sum of a triangle will not be taken up until Section 6.5 of [Wu2020b]. Its full proof involves technical details that are equally intricate.
We will now prove that the perpendicular from C to line $L_{AB}$ must meet $L_{AB}$ at a point D between A and B; i.e., A ∗ D ∗ B. We will argue by contradiction. Suppose not. Then either D is equal to A or B, or D lies outside the segment AB. If D = A (let us say), then CA ⊥ AB. So $L_{CB}$ and $L_{AB}$ are both perpendicular to $L_{AC}$, and therefore $L_{CB} \parallel L_{AB}$ (Theorem G2 on page 223). This is impossible because these two lines intersect at B. Therefore we may assume D lies outside AB; i.e., either D ∗ A ∗ B or A ∗ B ∗ D (Lemma 4.4 on page 169). Without loss of generality, we may assume the former; i.e., A is between D and B, as shown:

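[Figure: the points D, A, B in that order on a line; the point C off the line, with b = |AC|, a = |BC|, c = |AB|; and a point E with $L_{CE}$ parallel to $L_{AB}$.]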
Let the line $L_{CE}$ be parallel to $L_{AB}$ (use the corollary to Theorem G1 on page 222). We may assume the point E to be so chosen that E and B lie in the same half-plane of $L_{CD}$. The strategy is to show that
$$|\angle ACB| < |\angle DCE|. \tag{5.12}$$
Assuming this for the moment, we will show how to conclude the proof. Recall that, by hypothesis, ∠ACB is the right angle of the right triangle ABC. Therefore |∠ACB| = 90°. But $L_{CE} \parallel L_{AB}$ and BD ⊥ CD; therefore, by Theorem G3 on p. 224, EC ⊥ CD. It follows that also |∠DCE| = 90°, and (5.12) now leads to the absurd statement that 90° < 90°. Thus D has to be between A and B to begin with.

6 Although students should be informed that it is possible to prove A ∗ D ∗ B.



It remains to prove (5.12). To this end, we will prove a more detailed statement:
$$|\angle ACB| < |\angle DCB| < |\angle DCE|. \tag{5.13}$$
We begin by proving the second inequality in (5.13); i.e., |∠DCB| < |∠DCE|. We claim that B lies in the convex angle ∠DCE. Thus we must prove (i) E and B lie in the same half-plane of $L_{CD}$ and (ii) D and B lie in the same half-plane of $L_{CE}$ (see the definition of angle on p. 182). Now (i) is true because this was how we chose E. To see that (ii) is true, observe that since $L_{CE} \parallel L_{AB}$, the line $L_{AB}$ does not contain any point of $L_{CE}$, and therefore neither does the segment DB. By assumption (L4)(ii) on p. 176, D and B lie in the same half-plane of $L_{CE}$. Thus (ii) is also true and B lies in ∠DCE, thereby proving the claim. It follows that ∠DCB and ∠BCE are adjacent angles with respect to ∠DCE (see page 186 for the definition of adjacent angles). Therefore, by assumption (L6)(iv) on p. 188,
$$|\angle DCB| + |\angle BCE| = |\angle DCE|.$$
Since $L_{CE} \parallel L_{AB}$, B does not lie on $L_{CE}$, so that, in particular, B, C, and E are not collinear. Thus |∠BCE| > 0. Therefore |∠DCB| < |∠DCE|, and the second inequality in (5.13) holds.
The proof of the first inequality in (5.13), i.e., |∠ACB| < |∠DCB|, is entirely similar, but simpler. We want to show that A lies in ∠DCB. This requires the proof that A and B lie in the same half-plane of $L_{CD}$ and that A and D lie in the same half-plane of $L_{CB}$. Both follow immediately from the hypothesis that A is between D and B: AB does not contain any point of $L_{CD}$, and DA does not contain any point of $L_{CB}$ (see (L4)(ii) again). Therefore ∠DCA and ∠ACB are adjacent angles with respect to ∠DCB, and assumption (L6)(iv) implies that
$$|\angle DCA| + |\angle ACB| = |\angle DCB|.$$
Since |∠DCA| > 0, we have |∠ACB| < |∠DCB|. We have therefore completely proved (5.13) and, therewith, also (5.12). As explained right after (5.12), this means that the point D on $L_{AB}$ has to be between A and B. End of Pedagogical Comments.
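A numerical experiment agrees with what was just proved, although of course it proves nothing. The sketch assumes coordinates. The hypothetical helper `foot_parameter` returns the number t for which the foot of the perpendicular from C is D = A + t(B − A), so A ∗ D ∗ B means 0 < t < 1.

For a right triangle with legs a = |BC| and b = |AC|, this t equals $b^2/(a^2 + b^2) = b^2/c^2$. That matches |AD|/|AB| = β/c, where $\beta = b^2/c$ by (5.11). The sketch tests many randomly placed right triangles.

```python
import math
import random

def foot_parameter(A, B, C):
    """Return t such that the foot of the perpendicular from C to line AB is A + t(B - A)."""
    dx, dy = B[0] - A[0], B[1] - A[1]
    return ((C[0] - A[0]) * dx + (C[1] - A[1]) * dy) / (dx * dx + dy * dy)

random.seed(1)
trials = 1000
for _ in range(trials):
    C = (random.uniform(-10, 10), random.uniform(-10, 10))
    theta = random.uniform(0, 2 * math.pi)
    a, b = random.uniform(0.1, 10), random.uniform(0.1, 10)   # legs |BC| and |AC|
    u = (math.cos(theta), math.sin(theta))                    # unit vector along CA
    v = (-u[1], u[0])                                         # unit vector along CB, perpendicular to u
    A = (C[0] + b * u[0], C[1] + b * u[1])
    B = (C[0] + a * v[0], C[1] + a * v[1])
    t = foot_parameter(A, B, C)
    assert 0 < t < 1                                  # A * D * B
    assert math.isclose(t, b * b / (a * a + b * b))   # |AD|/|AB| = b^2/c^2
print("A * D * B held in all", trials, "trials")
```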

The converse of the Pythagorean theorem is also true. It is intriguing that the proof of the converse makes use of the Pythagorean theorem itself. Since it is sufficiently simple, it will be left as an exercise with an ample supply of hints (Exercise 7 on page 294).

Theorem G24 (Converse of Pythagorean theorem). If triangle ABC satisfies $|CA|^2 + |CB|^2 = |AB|^2$, then |∠C| = 90°.
An immediate consequence of the Pythagorean theorem is the following exten-
sion of the SAS criterion for triangle congruence in the case of right triangles:
Theorem G25 (HL). Two right triangles with equal hypotenuses and a pair
of equal legs are congruent.

Here HL stands for "hypotenuse-leg". By the Pythagorean theorem, the other pair of legs of these two right triangles must be equal. The SAS criterion then yields the desired congruence. The details are left to Exercise 7 on page 294.
Before leaving the Pythagorean theorem, we wish to bring out the fact that this theorem is a consequence of the parallel postulate, in the following sense. The proof of the theorem on page 291 depends on the concept of similar triangles, which in turn depends on the concept of dilation. It is manifest that almost every property of dilation rests on the parallel postulate, e.g., the fact that a dilation maps a line to a line (see the proof of Theorem G16 on page 269). It is therefore clear that the parallel postulate plays a critical role in validating the truth of the Pythagorean theorem. One might imagine another proof of the Pythagorean theorem that does not make use of the parallel postulate. What we want to emphasize is that no such proof exists: without the parallel postulate, the Pythagorean theorem ceases to hold. Consider hyperbolic geometry (see Section 8.4 of [Wu2020b]), the opposite of the parallel postulate. There it is assumed that through a point not lying on a line ℓ pass two distinct lines parallel to ℓ. In hyperbolic geometry the Pythagorean theorem fails; there, $a^2 + b^2 < c^2$. The Pythagorean theorem is therefore a characteristic theorem of Euclidean geometry.
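The inequality $a^2 + b^2 < c^2$ can be seen in numbers by using a formula that is not proved in this book. In the hyperbolic plane of curvature −1, a right triangle with legs a, b and hypotenuse c satisfies $\cosh c = \cosh a \cdot \cosh b$.

The sketch below takes that relation as an assumption, and its function name is hypothetical. It compares $c^2$ with $a^2 + b^2$ for a few triangles. For small triangles the two quantities are nearly equal, consistent with hyperbolic geometry looking almost Euclidean at small scales.

```python
import math

def hyperbolic_hypotenuse(a, b):
    """Hypotenuse of a hyperbolic right triangle with legs a and b,
    ASSUMING curvature -1, where cosh c = cosh a * cosh b."""
    return math.acosh(math.cosh(a) * math.cosh(b))

for a, b in [(0.1, 0.1), (1.0, 1.0), (2.0, 0.5)]:
    c = hyperbolic_hypotenuse(a, b)
    print(f"a={a}, b={b}: a^2+b^2={a*a + b*b:.6f}, c^2={c*c:.6f}, "
          f"a^2+b^2 < c^2: {a*a + b*b < c*c}")
```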

Exercises 5.3.
(1) Let D, E, F be the midpoints of the sides BC, AC, AB, respectively, of a triangle ABC. Prove that △DEF ∼ △ABC with a scale factor of 2.
(2) Let △ABC be a right triangle with AC ⊥ CB. Let the perpendicular line from C to AB meet AB at D. Prove that $|CD| = \dfrac{|AC|\cdot|BC|}{|AB|}$.
(3) Let ABC be a right triangle so that |AC| = 3, |BC| = 4, and |AB| = 5.
Let the perpendicular line from C to AB meet AB at D, and let the
perpendicular line from D to AC meet AC at E. Find |CE|.
(4) (This exercise generalizes Exercise 11 on page 238.) Assume FTS. Let $L_1$, $L_2$, and $L_3$ be three mutually parallel lines, and let ℓ and ℓ′ be two distinct transversals which intersect the three parallel lines at $A_1, A_2, A_3$ and $B_1, B_2, B_3$, respectively. Prove that
$$\frac{|A_1A_2|}{|A_2A_3|} = \frac{|B_1B_2|}{|B_2B_3|}.$$
(5) Prove that all circles are similar to each other. (Caution: This is a slippery proof. Given two circles $C_1$ and $C_2$, suppose you have found a dilation D and a congruence ϕ that you expect to satisfy $(\varphi \circ D)(C_1) = C_2$. Then you will have to prove that the two sets $(\varphi \circ D)(C_1)$ and $C_2$ are equal. This means you will have to prove that each is contained in the other (see page 141). Do not skip steps.)
(6) Prove that two rectangles are similar to each other if and only if either the ratios of (the lengths of) their sides are equal or the product of these ratios is 1. Precisely, let the lengths of the sides of one rectangle be a and b and those of the other be a′ and b′; then the rectangles are similar if and only if either $\frac{a}{b} = \frac{a'}{b'}$ or $\frac{a}{b}\cdot\frac{a'}{b'} = 1$.
(7) (a) Write a detailed proof of Theorem G25. (b) Prove Theorem G24 (Converse of the Pythagorean theorem). (Hint for (b): Suppose in △ABC that |AB| = c, |AC| = b, |BC| = a, $a^2 + b^2 = c^2$, and yet |∠C| ≠ 90°. Deduce a contradiction as follows. Let D be the point on $L_{BC}$ so that AD ⊥ BC. There are two cases: D lies in BC and D lies outside BC. The two cases are similar, so consider the former case, where B ∗ D ∗ C. Apply the Pythagorean theorem to the right triangles ABD and ACD. Then compare the results with the hypothesis that $a^2 + b^2 = c^2$ to arrive at a contradiction.)
(8) Let L and L′ be two lines intersecting at a point O. Take any point P on L, and let the line passing through P and perpendicular to L′ meet L′ at a point P′. Prove that the ratio $\frac{|PP'|}{|OP'|}$ is independent of the position of P on L. That is, let Q be another point on L, and let the line passing through Q and perpendicular to L′ meet L′ at a point Q′. Then
$$\frac{|PP'|}{|OP'|} = \frac{|QQ'|}{|OQ'|}.$$
(9) (a) Let |∠B| = |∠C| in △ABC. Prove that |AB| = |AC|. (b) Prove that
every point on the angle bisector of an angle is equidistant from (the lines
containing) the two sides of the angle, in the sense of page 228.
(Note: In some sense, these two assertions should be proved in
the setting of congruence, not similarity; any theorem related to
similar triangles requires the FTS, which is a more sophisticated
theorem than anything about congruent triangles. Indeed, we
will revisit these assertions in Chapter 6 of [Wu2020b]
(to be precise, Theorem G29 in Section 6.2 and Exercise 1 in
Exercises 6.7), and you will prove them using only theorems
about congruence. That said, the virtue of this exercise is that
you get to see another approach to these standard facts.)
(10) Suppose we have two parallel lines L and L′ and a point O not lying on
either line. Let three lines passing through O intersect L and L′ at points
A, B, C and A′, B′, C′, respectively, as shown:
[Figure: three lines through O meeting L at A, B, C and L′ at A′, B′, C′]
(This picture puts O between L and L′, but O could be anywhere.) Prove
that
$$\frac{|AB|}{|A'B'|} = \frac{|BC|}{|B'C'|} = \frac{|AC|}{|A'C'|}.$$
(11) Suppose in △ABC, AB is longer than AC. Let a point D on the segment
BC be such that AD ⊥ BC.
[Figure: △ABC with the foot D of the perpendicular from A lying on BC between B and C]
(a) Prove that |BD| > |DC|. (b) Prove that |AB| + |AC| > |BC|.
(c) Prove that |BD| − |DC| > |AB| − |AC|.
296 5. DILATION AND SIMILARITY

(12) (This exercise assumes a familiarity with symbolic computations; see,
e.g., Section 6.1 on pp. 298ff.) Given △ABC, let a point D on the segment
BC be such that AD ⊥ BC. Let also
|AB| = c, |AC| = b, |AD| = h, |DC| = ℓ, |BC| = a.
[Figure: △ABC with altitude AD of length h; c = |AB|, b = |AC|, a = |BC|, and ℓ = |DC|]
(a) Prove that $\ell = \dfrac{a^2 + b^2 - c^2}{2a}$. (b) Prove that
$$h = \frac{1}{2a}\sqrt{(a + b + c)(a + b - c)(-a + b + c)(a - b + c)}.$$
(This exercise essentially proves Heron's formula for the area of a triangle
in terms of its sides; see Section 4.5 in [Wu2020c].)
(13) Suppose you are a teacher in middle school and you are handed a textbook
series that takes up similarity in grade 7 and congruence in grade 8. (Such
a series did exist in 2013.) (a) Do you believe such a curricular decision is
defensible? Explain. (b) If you are a seventh-grade teacher, what would
you do? (Obviously there will be no unique answer to part (b), but the
idea is that you had better start thinking about such real-world situations
because your ability to adjust is, alas, part of your responsibility.)
CHAPTER 6

Symbolic Notation and Linear Equations

In this chapter, we begin the study of algebra. The main topics of this chapter
are the use of symbols, linear equations in one or two variables, and systems of two
linear equations in two variables.
In the context of school mathematics, the most urgent task in helping students
to achieve success in algebra may very well be getting them to be fluent in the
correct use of symbols. There is at present an unhealthy preoccupation in TSM1
with the concept of a "variable" in the teaching of algebra, to the point of elevating
it to a formal mathematical concept. The truth is that "variable" is not a math-
ematical concept. A main goal of this chapter is to explain why, if students know
the basic etiquette in the use of symbols,2 there would never be any need for them
to understand what a "variable" is. Clearly the word "variable" is suggestive, and
it is often used in mathematical discussions as a shorthand; for example, we have
just used it to talk about "linear equations in one or two variables". However, we
did so only because the meaning of this phrase is universally understood, and there
is no need to find out what "variables" means in this context.3 So when all is said
and done, students should just concentrate on learning the basic etiquette in the
use of symbols and learn it well.
A major stumbling block in students’ learning about linear equations in two
variables is the concept of slope (cf. [PG]). The fairly voluminous literature in
education research on slope indicates an awareness of students’ difficulty on this
topic. One of the many symptoms of this difficulty is articulated in [Beckmann-
Izsák]:
They might not see slope as a number, but instead think of it
as a pair of numbers separated by a slash, basically "rise slash
run."
It is surprising that [Beckmann-Izsák]—in discussing why slope is hard to teach—
did not mention the glaring absence of a correct definition for slope in TSM. Just
as in the case of fractions, students, teachers, and educators have been forced to
learn about slope without the benefit of knowing precisely what it is. Under the
circumstances, the nonlearning of slope—like the nonlearning of fractions—becomes
all but inevitable.
It is a mathematical and pedagogical imperative that students understand why
one single number can be attached to a line to supposedly describe its "slant"
(whether it is this way \ or that way /) and its "steepness". To this end, we devote
all of Section 6.4 to a detailed discussion of a correct definition of slope: what
1 See page xiv of the preface for the definition of TSM.
2 See page 299.
3 In the same way we understand "Faustian bargain" or "Catch 22" without having to find

out who Faust is or what "22" is all about.


298 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

it is supposed to measure, why the definition—relying on the concept of triangle
similarity4—has to be so elaborate, and how the usual formula of "rise-over-run"
follows logically from the definition. It is a complex concept, and it deserves a
mathematical treatment that recognizes its complexity.5 Then we prove that a line
is horizontal if and only if it has 0 slope and that two lines passing through the same
point with the same slope must coincide. These theorems should begin to convince
students that the hard work of learning the definition is worth the effort. The precise
definition of slope enables us to dispel the mystery behind the interplay between
the geometry and the algebra of a linear equation in two variables. Indeed, it is
this mystery that has bedeviled students, teachers, and educators. The availability
of a precise definition of slope also enables us to prove that the graph of a linear
equation is a line and that each line is the graph of a linear equation (see Theorem
6.11 on page 354). We recommend, strenuously, that all students learn this proof,
because the reasoning embedded in the proof turns all assessment items related
to equations of lines into nothing more than routine exercises. It also goes without
saying that the discussion of systems of two linear equations in two variables gains
immeasurably in transparency as a consequence.
We hope that, building on such a correct mathematical foundation, education
research on student learning—or nonlearning—of slope will acquire greater validity.
This chapter makes a great effort to combat the negative impact on student
learning by both the abuse of the "concept of a variable" and the nondefinition of
slope in TSM. The extended pedagogical comments on pp. 318ff., 327ff., and 361ff.
will likely give you an even better idea about what goes into this chapter, and why.

6.1. Symbolic expressions


This section has the modest goal of introducing readers to the correct use of
symbols. Such a discussion would seem to have little mathematical substance, but
we will strenuously argue that it may very well be the most important section of
this chapter because it asks you to shed any bad habits you may have acquired in
your encounters with TSM concerning the use of symbols. You have been told that
mastering the concept of a "variable" is the gateway to algebra (cf. the pedagogical
comments in the subsection on pp. 318ff.). You were also told how to "manipulate
symbolic expressions" in a symbol x without giving any thought to what x may be
(cf. the Pedagogical Comments on page 327). These are not valid mathematical
practices, and the dual purpose of this section is to explain why not and, more
importantly, make suggestions on how to do better.
The basic etiquette in the use of symbols (p. 299)
Expressions and identities (p. 302)
An important identity (p. 306)
Mersenne primes (p. 307)
The finite geometric series (p. 309)
Polynomials and "order of operations" (p. 310)
Rational expressions (p. 316)
Pedagogical comments on the teaching of "variables" (p. 318)

4 This is the reason we take up slope after Chapters 4 and 5.


5 Rather than just heuristic argument after heuristic argument or lots of manipulatives and
storytelling without mathematical substance.
6.1. SYMBOLIC EXPRESSIONS 299

The basic etiquette in the use of symbols

In mathematics, we use symbols to expedite the expression of ideas. The
beginning of algebra, as we understand this term, is the introduction of generality and
abstraction by using symbols6 to represent numbers. In order to convince students
with only a background in arithmetic that the use of symbols is something well
worth learning, we have to demonstrate the benefits of so doing.
Consider the problem of asking students to interpret a string of symbols, such
as $y = \sqrt{3x - 7}$. There are education researchers who believe that a problem of
this type can be used to assess mature ways of understanding mathematics and
mature ways of thinking about mathematics. This view is, however, erroneous. In
mathematics, such a string of symbols has no meaning, because it is the
exact analog of the question, "Is he someone with 225 pounds on a six-foot-five
frame?" Without knowing who "he" is, this statement may be true or it may be
false. By the same token, without knowing what y and x are in $y = \sqrt{3x - 7}$, there
is no interpretation to give and no conclusion to draw.
Let us do better. In mathematics, the correct use of symbols dictates that each
symbol must be quantified, i.e., clearly described as to what it stands for each
time it is used. This may be called the basic etiquette in the use of symbols.
For example, we can make sense of "$y = \sqrt{3x - 7}$" by specifying what x and y are
and by providing a context. Here are four variations on this theme:

For all real numbers x, we can find a real number y so that $y = \sqrt{3x - 7}$.
For some real numbers x, we can find a real number y so that $y = \sqrt{3x - 7}$.
There are an infinite number of fractions x and y so that $y = \sqrt{3x - 7}$.
There are an infinite number of positive integers x and y so that $y = \sqrt{3x - 7}$.

The importance of quantification can be seen by noting that, despite the similarity
between the first two statements, the first is false (e.g., x = 0) and the second is
true (e.g., x = 3 and $y = \sqrt{2}$). Similarly, despite the similarity between the last
two statements, the first is true whereas the second is false (see Exercise 1 on page
320).
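The four variations can be probed numerically. The Python sketch below is only an illustration under our own assumptions: the symbols are read as real numbers approximated by floating point, and the helper name `y_for` is hypothetical. A single counterexample refutes a "for all" statement and a single example confirms a "for some" statement; the final finite search over positive integers is merely consistent with the falsity of the fourth statement and proves nothing (the proof is Exercise 1 on page 320).

```python
import math

def y_for(x):
    """Return the real number y with y = sqrt(3x - 7), or None if no real y exists."""
    radicand = 3 * x - 7
    if radicand < 0:
        return None  # a negative number has no real square root
    return math.sqrt(radicand)

# "For all real x ..." is refuted by one counterexample, x = 0.
print(y_for(0))  # None
# "For some real x ..." is confirmed by one example, x = 3 with y = sqrt(2).
print(y_for(3))  # 1.4142135623730951

# Finite search (evidence only) for positive integers x, y with y = sqrt(3x - 7).
hits = [(x, y) for x in range(1, 1000) for y in range(1, 100) if y * y == 3 * x - 7]
print(hits)  # []
```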
A pertinent remark in this connection is that many school students7 commit
the elementary error of writing down symbolic expressions without quantifying the
symbols, such as "$y = \sqrt{3x - 7}$" above. Very likely, the only way to combat this
widespread abuse is to not allow TSM to take root in students’ thinking right from
the beginning. Let us teach them to always quantify their symbols.

6 Usually using letters of the English alphabet, but often using letters from the Greek alphabet

as well because it is easy to run out of appropriate symbols for a particular task.
7 And a good number of college students too.
To make sure you see why it is important to always quantify your symbols,
we take up another example that has more mathematical substance. Consider the
following three statements:
(C1) $n \ge 3$ and $a^n + b^n = c^n$.
(C2) For any positive integer $n \ge 3$, there are no positive numbers
a, b, and c so that $a^n + b^n = c^n$.
(C3) For any positive integer $n \ge 3$, there are no positive integers
a, b, and c so that $a^n + b^n = c^n$.
The statement (C1) has no meaning, because we do not know what the symbols
a, b, c, and n stand for. If a and b in (C1) are 2 × 2 matrices and c is a
3 × 3 matrix, then (C1) is false, but of course (C1) is true if n = 3 and a = 1,
b = 2, $c = \sqrt[3]{9}$ (the cube root of 9). (C2) is totally false because no matter what
n may be and no matter what the positive numbers a and b may be, letting c be
the positive n-th root of $a^n + b^n$ (see Theorem 4.2 in Section 4.2 of [Wu2020b])
will always yield the desired equality of numbers, $a^n + b^n = c^n$. Finally, one may
recognize statement (C3) as the famous Fermat's Last Theorem, first conjectured
by Pierre Fermat in 1637 but not proved until Andrew Wiles did so in 1995 (see
[WikiFermat]; we will have more to say about Fermat on page 308). Not to harp
on the obvious, but the statements (C2) and (C3) differ by just one word in the
quantifications of a, b, and c. Moral: Precise quantification of symbols is important.
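Here is a short Python sketch of the argument against (C2), assuming floating-point arithmetic, so the equality $a^n + b^n = c^n$ holds only up to rounding; `nth_root_witness` is a hypothetical name. The second part searches small positive integers for a counterexample to (C3) with n = 3; finding none is, of course, no proof of anything.

```python
import math

def nth_root_witness(a, b, n):
    """For positive numbers a, b and a positive integer n, return the positive
    number c = (a**n + b**n) ** (1/n), so that a**n + b**n == c**n up to rounding."""
    return (a ** n + b ** n) ** (1 / n)

# (C1) with n = 3, a = 1, b = 2: c is the cube root of 9.
c = nth_root_witness(1, 2, 3)
print(math.isclose(c ** 3, 9))  # True

# (C3) demands positive integers; a finite search for n = 3 (evidence only).
cubes = {k ** 3: k for k in range(1, 200)}  # covers every a**3 + b**3 below
found = [(a, b) for a in range(1, 100) for b in range(a, 100) if a ** 3 + b ** 3 in cubes]
print(found)  # []
```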

Once the need for quantification of symbols is understood, we now clarify the
use of the word "variable". First we give an example. Consider the problem of
finding all the numbers x which satisfy 3x + 7 = 5. In the usual jargon, this is
known as solving the linear equation 3x + 7 = 5. We will take a serious look
at "what an equation means and how to solve an equation" in Section 6.2 on pp.
322ff., but we will proceed informally at this juncture to get our point across. With
this understood, the usual procedure for solving such equations yields 3x = 5 − 7,
and therefore
$$x = \frac{5 - 7}{3}.$$
There is a reason why we do not carry out the computation in the numerator to
write the solution as $-\frac{2}{3}$, and it is because if we consider 3x + 12 = 13 instead, then
we get
$$x = \frac{13 - 12}{3}.$$
Or, consider 3x − 25 = 4.6 and by rewriting it as 3x + (−25) = 4.6, we get
$$x = \frac{4.6 - (-25)}{3}.$$
Or, consider 5x − 25 = 4.6 and get
$$x = \frac{4.6 - (-25)}{5},$$
and so on. There is an unmistakable abstract pattern here: one can easily verify
that, with a, b, and c ($a \ne 0$) understood to be three fixed numbers throughout the
following discussion, the solution of the linear equation ax + b = c is
$$x = \frac{c - b}{a}.$$
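The pattern $x = \frac{c-b}{a}$ can be written down once and reused for every choice of the constants, which is exactly the point about generality. Here is a minimal Python sketch, assuming exact rational arithmetic via the standard `fractions.Fraction` type (decimals such as 4.6 are passed as strings so that they are read exactly); `solve_linear` is a hypothetical helper name.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Solve a*x + b = c for x, where a, b, c are fixed numbers and a != 0.
    Exact rational arithmetic is used, so the answer is returned as a Fraction."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    return (c - b) / a

print(solve_linear(3, 7, 5))        # -2/3
print(solve_linear(3, 12, 13))      # 1/3
print(solve_linear(3, -25, "4.6"))  # 148/15
print(solve_linear(5, -25, "4.6"))  # 148/25
```

The same one-line formula handles all four equations above; only the constants change.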
6.1. SYMBOLIC EXPRESSIONS 301

We have now witnessed the fact that in some symbolic expressions, the symbols
stand for elements in an infinite set of numbers,8 e.g., the statement that mn = nm
for all real numbers m and n, while in others, the symbols stand for the element
in a set consisting of exactly one element (in other words, they stand for a fixed
value throughout the discussion), e.g., the numbers a, b, and c in the preceding
linear equation ax + b = c. In the former case, the symbols m and n are called
variables, and in the latter case, a, b, and c are called constants. Notice that
such terminology is no more than an afterthought when we have carefully quantified
the symbols in each situation. There is in fact no need for the words variables
and constants when such information is already contained in the quantification.
However, we will continue to use them not only because they have been in use
for over three centuries and are everywhere in the mathematics literature, but also
because they are at times an indispensable shorthand.
There are compelling reasons for singling out the terminology of "variable" and
"constant" for such an extended discussion. See the pedagogical comments on pp.
318ff. and 327ff., respectively.
In a situation where we try to locate any number x that satisfies a given equation
(such as $2x^2 + x - 6 = 0$ or 2x = x), the value of the number x is unknown
to us, of course. For this reason, we will conveniently refer to the symbol x as an
unknown, just to save verbiage. To the extent that we will never make logical
deductions based on the properties of an "unknown", it is not necessary to make
this terminology more precise.9

At the risk of pointing out the obvious, note that we have been making use of
symbols from the very beginning of this volume out of necessity. One example is the
addition formula for fractions (equation (1.12) on page 33): for any two fractions
$\frac{k}{\ell}$ and $\frac{m}{n}$, where k, ℓ, m, n are whole numbers (the product $\ell n \ne 0$),
$$\frac{k}{\ell} + \frac{m}{n} = \frac{kn + \ell m}{\ell n}.$$
If we do not use symbols, we would be forced to express the formula as follows:
The sum of two fractions is the fraction whose numerator is
the sum of the product of the numerator of the first fraction
with the denominator of the second, and the product of the
numerator of the second with the denominator of the first, and
whose denominator is the product of the denominators of the
given fractions.10
Even if you are inordinately fond of the English language, you will have to admit
that the symbolic statement is far more clear, and this is not even taking into
account the difficulty of trying to provide a mathematical derivation of this addition
formula without the benefit of symbols.
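The symbolic formula also translates directly into a program. The Python sketch below (with a hypothetical helper `add_fractions`) implements $(kn + \ell m)/(\ell n)$ literally and compares the result with Python's built-in `fractions.Fraction`. Like the formula itself, it does not reduce the answer to lowest terms.

```python
from fractions import Fraction

def add_fractions(k, l, m, n):
    """Return (numerator, denominator) of k/l + m/n computed by the formula
    (k*n + l*m) / (l*n); assumes whole numbers k, l, m, n with l*n != 0."""
    if l * n == 0:
        raise ValueError("denominators must be nonzero")
    return k * n + l * m, l * n

num, den = add_fractions(2, 3, 5, 7)
print(num, den)                                               # 29 21
print(Fraction(num, den) == Fraction(2, 3) + Fraction(5, 7))  # True
```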

8 Strictly speaking, all that matters is that the symbols stand for elements in a set consisting

of more than one element. But for school algebra, "infinite" suffices for the purpose at hand.
9 This saves us from the need to discuss the relationship between an unknown and a variable.
10 This was the way mankind had to express formulas from al-Khwarizmi (c. 780 to c. 850)—

the person whose name gave birth to the word "algorithm"—all through the Middle Ages to
the time of François Viète (1540–1603). The codification of the symbolic notation is generally
attributed to R. Descartes (1596–1650). See [Bashmakova-Smirnova].
This example may serve the purpose of explaining to students why the use of
symbols is a necessity. Of course, there are innumerable other examples as well.

Expressions and identities

We now begin the mathematical discussion.


By a number expression, or more simply an expression, in a given collection
of numbers x, y, . . . , w, we mean a number obtained from these x, y, . . . , w and
from a collection of specific real numbers by the use of a combination of the four
arithmetic operations (i.e., +, −, ×, ÷). For example, if x, y, z are numbers, then
$$\frac{xy}{xyz - 7} + x^3(16z - y^2) - \frac{4}{13}z^{21}$$
is an example of a number expression in the numbers x, y, z (we have to assume
$xyz \ne 7$). More precisely, it is the number obtained by applying +, −, ×, and ÷
to the numbers x, y, z and to the specific numbers 7, 16, and $\frac{4}{13}$.
We note explicitly that since the definition of an expression requires that we
compute with numbers x, y, etc., that may not be rational, FASM (page 133) has
been implicitly invoked for this definition to make sense.
The meaning of number expression will be enlarged, in due course, to include
the use of specific functions of the numbers x, y, . . . , w (see the end of Section 1.1
in [Wu2020b]) and the use of the operation of "taking the n-th root" (see Section
4.2 of [Wu2020b]). Because all the symbols we use are numbers, we can apply all
we know about numbers (including FASM) to number expressions without having
to learn anything new, including the fact that the associative, commutative, and
distributive laws are automatically valid for computations with number expressions.
The importance of the latter fact for teaching and learning cannot be overstated,
because in TSM, a "variable" is considered to be a different animal from a number
and therefore the arithmetic operations on expressions (involving "variables") can
only be justified by an arbitrary decree—a prime example of teaching by rote.
In a number expression such as
$$x^4 - 5x^3y^2 + \frac{1}{2}x^2y^2 - xy^3 + 2y^4 + \frac{x}{1 + y^2},$$
which involves the numbers x and y, we may regard it as nothing more than a sum
of products, namely,
$$(x^4) + (-5x^3y^2) + \left(\frac{1}{2}x^2y^2\right) + (-xy^3) + (2y^4) + \left(x \cdot (1 + y^2)^{-1}\right).$$
(You may wish to review at this point the definition of subtraction in terms of
addition on page 96 and the interpretation of division as multiplication by a
multiplicative inverse in equation (2.29) on page 115.) Any of the expressions $x^4$, $-5x^3y^2$,
$\frac{1}{2}x^2y^2$, $-xy^3$, $2y^4$, and $\frac{x}{1+y^2}$, which are separated by two consecutive +'s (except
for the first one $x^4$ and the last one $\frac{x}{1+y^2}$), is called a term of the expression. As
is the custom, the writing of the expression $x^4 - 5x^3y^2 + \frac{1}{2}x^2y^2 - xy^3 + 2y^4 + \frac{x}{1+y^2}$
has made implicit use of three notational conventions:


Retiring the multiplication symbol ×: The multiplication
sign × is omitted in expressions except that if emphasis on
a particular multiplication is needed, a dot "·" is used, as in
$(x \cdot (1 + y^2)^{-1})$. As is well known, the reason for retiring × is
that it is too easily confused with the letter x when written by
hand.

The way of writing specific numbers in an expression:
The numbers −5 and $\frac{1}{2}$ in the expression $x^4 - 5x^3y^2 + \frac{1}{2}x^2y^2$ are
called coefficients or, more precisely, the coefficients of $x^3y^2$
and $x^2y^2$, respectively. By convention, the coefficients are
always placed in front of the symbols, e.g., never $x^3(-5)y^2$ or even
$x^3y^2(-5)$ unless there is a compelling reason for doing so. The
term $x^4$ also has a coefficient because $x^4$ is the abbreviated form
of $1x^4$, but the number 1 as a coefficient is always suppressed.

The order of arithmetic operations among symbols in
an expression: It is understood that (i) we do the multiplications
of each letter symbol (in this case, x and y) indicated by
the exponent first (e.g., $x^3$ and $y^2$ in $-5x^3y^2$, or $y^2$ in $\frac{x}{1+y^2}$),
then (ii) the multiplications within each term (e.g., $x \cdot (1 + y^2)^{-1}$
in $\frac{x}{1+y^2}$), and finally (iii) the addition of the various terms.

We will have more to say about the third convention presently.
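Programming languages adopt essentially the same conventions. In Python, for instance, `**` binds more tightly than `*` and `/`, which in turn bind more tightly than `+` and `-`, so the expression can be typed almost verbatim. The sketch below (a hypothetical helper `terms`, with exact `fractions.Fraction` arithmetic assumed) checks that adding up the six terms, each written as a product, gives the same number as evaluating the expression directly.

```python
from fractions import Fraction

def terms(x, y):
    """The six terms of x^4 - 5x^3y^2 + (1/2)x^2y^2 - xy^3 + 2y^4 + x/(1+y^2),
    each written as a product; subtraction is rewritten as adding a negative."""
    return [
        x**4,
        -5 * x**3 * y**2,
        Fraction(1, 2) * x**2 * y**2,
        -x * y**3,
        2 * y**4,
        x * (1 + y**2) ** -1,  # division as multiplication by the inverse
    ]

x, y = Fraction(2), Fraction(3)
direct = x**4 - 5*x**3*y**2 + Fraction(1, 2)*x**2*y**2 - x*y**3 + 2*y**4 + x/(1 + y**2)
print(sum(terms(x, y)) == direct)  # True
print(direct)                      # -1089/5
```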

Two (number) expressions in numbers x, y, . . . , w are said to be equal
expressions if the two numbers are equal for all values of x, y, . . . , w. A well-known
example is the two expressions in x and y, (x + y)(x − y) and $x^2 - y^2$. We will
prove presently (see (6.3) below) that, indeed, these two expressions are equal no
matter what x and y may be. Thus they are equal in the sense just defined and we
can write
$$(x + y)(x - y) = x^2 - y^2.$$
By the definition of equal expressions, the equal sign between the expressions
automatically means the two sides are equal as numbers for all numbers x and y. Such
an equality between two expressions is then called an identity.
In TSM, equal expressions are said to be "equivalent expressions". Suffice
it to say that this terminology is unnecessary and is, in any case, not used in
mathematics.
In general, it can happen that the equality of two expressions in a collection of
numbers x, y, z, . . . is valid, not for all values of x, y, z, . . . , but for "many values"
of x, y, z, . . . . Here, the meaning of "many" will have to be understood in context.
It could mean all numbers with only isolated exceptions, as in11
$$1 + \tan^2 x = \sec^2 x \quad \text{for all numbers } x \ne \text{an odd integer multiple of } \frac{\pi}{2}.$$
This equality makes no sense when x is equal to an odd integer multiple of $\frac{\pi}{2}$
because tan x and sec x are not defined at those values of x. Or, "many" could
11 We continue to make use of some mathematics that we have not yet discussed—but which

you most likely know—to illustrate a point. For the issue at hand, see the appendix in Section
1.4 of [Wu2020c].
mean all nonzero whole numbers only, such as
$$1 + 2 + 3 + \cdots + (n - 1) + n = \frac{n(n + 1)}{2} \quad \text{for all whole numbers } n \ge 1.$$
By tradition, both of these equalities, thus carefully quantified, are also called
identities. You recognize that we have not offered a precise definition of what an
identity is, other than that an identity is a figure of speech that alerts you to the
fact (already made explicit above) that it is an equality between two expressions
which is valid for "many" values of the numbers in question. By a common abuse
of language, one simply writes, for example,

1 + tan2 x = sec2 x

as an identity without any qualifications and leaves it to the readers to figure out
that the equality is claimed only for x not equal to an odd integer multiple of $\frac{\pi}{2}$.
Therefore, you have to be careful about the range of the values of x for which the
equality is supposed to hold in each case. This is but one example among many
where we sometimes use mathematical terms out of respect for tradition but not
as precise mathematical concepts. Usually, these terms do manage to suggest a
pleasant mental image, and that should count for something. "Variable" is another
example of such usage, as we already discussed.
Fortunately, a majority of the well-known identities are valid for all numbers,
with no exceptions. We will focus on these in this section.

Now with the terminology of an identity understood, let us list the three most
common identities:

(6.1) $(x + y)^2 = x^2 + 2xy + y^2$,
(6.2) $(x - y)^2 = x^2 - 2xy + y^2$,
(6.3) $(x + y)(x - y) = x^2 - y^2$.

At least three comments should be made. The first is that when x and y are
rational numbers, these three identities follow from routine number computations
using the associative, commutative, and distributive laws. For example, here is a
proof of (6.3): for any numbers x and y,

$$
\begin{aligned}
(x + y)(x - y) &= (x + y)x - (x + y)y && \text{(dist. law)}\\
&= x^2 + yx - xy - y^2 && \text{(dist. law, Theorem 1 on p. 87)}\\
&= x^2 + xy - xy - y^2 && \text{(comm. law of mult.)}\\
&= x^2 - y^2 && \text{(Theorem 1 on p. 87).}
\end{aligned}
$$

Because FASM (page 133) assures us that the associative, commutative, and
distributive laws (for both + and ×) continue to hold for real numbers, these identities
are valid for all real numbers x and y. A second comment is that once we have
the first identity (6.1), the second one (6.2) becomes a trivial consequence of the
first because (6.1), being valid for all numbers x and y, is also valid for x and −y.
Therefore, for any numbers x and y,
$$(x + (-y))^2 = x^2 + 2x(-y) + (-y)^2.$$


But by equation (2.23) on page 109, this is the same as

$$(x - y)^2 = x^2 - 2xy + y^2 \quad \text{for any } x \text{ and } y,$$

which is the same equation as (6.2). Thus we have proved that (6.1) implies (6.2).
In a similar manner, we can prove that, conversely, (6.2) implies (6.1). In the
terminology of page 22, the first identity (6.1) is equivalent to the second one (6.2).
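Spot checks cannot replace the proof above: an identity is a claim about all numbers x and y, while a computer can test only finitely many of them. Still, a quick check is a cheap way to catch a mistyped formula. Here is a Python sketch, assuming exact `fractions.Fraction` arithmetic on randomly chosen rationals (the helper `check_identities` is hypothetical). The last line checks the substitution step just made: the right side of (6.1) evaluated at x and −y equals the right side of (6.2).

```python
from fractions import Fraction
import random

def check_identities(x, y):
    """Compare both sides of (6.1), (6.2), (6.3) at the numbers x and y."""
    return (
        (x + y) ** 2 == x**2 + 2*x*y + y**2,   # (6.1)
        (x - y) ** 2 == x**2 - 2*x*y + y**2,   # (6.2)
        (x + y) * (x - y) == x**2 - y**2,      # (6.3)
    )

random.seed(0)
samples = [
    (Fraction(random.randint(-50, 50), random.randint(1, 20)),
     Fraction(random.randint(-50, 50), random.randint(1, 20)))
    for _ in range(100)
]
print(all(all(check_identities(x, y)) for x, y in samples))  # True

# Replacing y by -y in the right side of (6.1) gives the right side of (6.2).
print(all(x**2 + 2*x*(-y) + (-y)**2 == x**2 - 2*x*y + y**2 for x, y in samples))  # True
```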

You may be wondering why we bother with the preceding proof of the second
identity, since a simple direct computation of $(x - y)^2 = (x - y)(x - y)$ already proves it.
Our point is that, at the beginning stage of algebra, your students are put in touch
with the idea of generality for the first time; namely, the first identity is not just
the equality of the expressions (x + y)2 and x2 + 2xy + y 2 for certain numbers
x and y, but that it is valid for all numbers x and y. If you can teach them to
take the latter statement seriously and make them realize that the validity of the
equality when y is replaced by −y immediately implies the validity of the second
identity, then you have taught them something valuable. One may paraphrase by
saying that, because of the generality of the first identity, the first identity already
contains the second identity as a special case. An important part of learning algebra
is to become alert to the potential implications of a general statement. From this
vantage point, the derivation of the second identity from the first now becomes a
noteworthy demonstration of the power of generality.
A third comment is that, in practice, the usefulness of these identities is, more
often than not, derived from one’s ability to read these identities also from
right to left, i.e., the ability to recognize in a given situation that, for any two
numbers x and y,
$x^2 + 2xy + y^2$ is equal to $(x + y)^2$,
$x^2 - 2xy + y^2$ is equal to $(x - y)^2$,
$x^2 - y^2$ is equal to $(x + y)(x - y)$.
For example, $25x^2 + 49y^2 - 70xy$ is equal to $(5x - 7y)^2$, because
$$25x^2 + 49y^2 - 70xy = (5x)^2 - 2(5x)(7y) + (7y)^2.$$
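Reading (6.1) and (6.2) from right to left amounts to matching coefficients: $Ax^2 + Bxy + Cy^2$ equals $(px + qy)^2$ exactly when $A = p^2$, $C = q^2$, and $B = 2pq$. The Python sketch below searches only for integers p and q, an assumption that keeps the example small; the helper `as_square_of_binomial` is hypothetical.

```python
import math

def as_square_of_binomial(A, B, C):
    """Try to write A x^2 + B xy + C y^2 (integer coefficients) as (p x + q y)^2
    with integers p >= 0 and q. Returns (p, q), or None if no such integers exist.
    We need A = p^2, C = q^2, and B = 2pq."""
    if A < 0 or C < 0:
        return None
    p, r = math.isqrt(A), math.isqrt(C)
    if p * p != A or r * r != C:
        return None  # A or C is not a perfect square
    for q in (r, -r):  # the sign of q is decided by the middle coefficient
        if B == 2 * p * q:
            return p, q
    return None

print(as_square_of_binomial(25, -70, 49))  # (5, -7), i.e. (5x - 7y)^2
print(as_square_of_binomial(1, 2, 1))      # (1, 1), i.e. (x + y)^2
print(as_square_of_binomial(1, 3, 1))      # None
```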

We pause to make another remark about identities. We may rewrite, for
example, the identity $(x - y)(x + y) = x^2 - y^2$ as
$$\frac{x^2 - y^2}{x - y} = x + y.$$
Remembering that we cannot divide by 0, we see that this equality holds for all
numbers x and y except when x = y. This equality is of course also considered to
be an identity—keeping in mind the exceptions.
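The exception x = y is not a technicality. In the Python sketch below (exact `fractions.Fraction` arithmetic assumed; `quotient` is a hypothetical name), the left side agrees with x + y whenever x ≠ y, and evaluating it at x = y raises a division-by-zero error, just as the mathematics says it must.

```python
from fractions import Fraction

def quotient(x, y):
    """Left side of (x^2 - y^2)/(x - y) = x + y, computed exactly."""
    x, y = Fraction(x), Fraction(y)
    return (x ** 2 - y ** 2) / (x - y)

print(quotient(7, 3), 7 + 3)  # 10 10
try:
    quotient(4, 4)            # x = y: the left side is undefined
except ZeroDivisionError:
    print("undefined when x = y")
```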
We have thus far discussed the concept of an equation informally and the
concepts of an expression and an identity in some depth. On pp. 322ff. below, we will
elaborate on the concept of an equation. It is to be noted that our views on these
fundamental concepts in beginning algebra deviate from those typically found in
education research; see, e.g., [McCrory et al.].
An important identity

There is more to be said about the identity $(x - y)(x + y) = x^2 - y^2$! We can
ask if there is an analogous identity that has $x^3 - y^3$ on the right side. There is,
because a straightforward computation making repeated use of the distributive law
(and of course also Theorems 1 and 2 of the appendix in Chapter 1, page 87) shows
that
$$
\begin{aligned}
(x - y)(x^2 + xy + y^2) &= x(x^2 + xy + y^2) - y(x^2 + xy + y^2)\\
&= x^3 + x^2y + xy^2 - yx^2 - yxy - y^3\\
&= x^3 + x^2y + xy^2 - x^2y - xy^2 - y^3\\
&= x^3 - y^3
\end{aligned}
$$
for all numbers x and y. In other words,
$$(x - y)(x^2 + xy + y^2) = x^3 - y^3.$$
Similarly, we have for the fourth and fifth powers:
$$(x - y)(x^3 + x^2y + xy^2 + y^3) = x^4 - y^4,$$
$$(x - y)(x^4 + x^3y + x^2y^2 + xy^3 + y^4) = x^5 - y^5.$$
The pattern is now clear: for any positive integer n and for all numbers x and y,
(6.4) $(x - y)(x^n + x^{n-1}y + x^{n-2}y^2 + \cdots + xy^{n-1} + y^n) = x^{n+1} - y^{n+1}$.

Let us rewrite this identity in the following way: for all numbers x and y,
(6.5) $x^{n+1} - y^{n+1} = (x - y)(x^n + x^{n-1}y + x^{n-2}y^2 + \cdots + xy^{n-1} + y^n)$.
Thus, the difference of two numbers x and y raised to the same power can
always be expressed as a product of $x - y$ and $x^n + x^{n-1}y + \cdots + y^n$. By any
measure, this is a nice-looking identity. We will examine its consequences in some
detail in two different settings, and the two groups of results—which would seem
to be unrelated to each other—will nicely illustrate the concept of generality.
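Identity (6.5) also tells us how to compute the second factor with a simple loop. Here is a Python sketch with hypothetical helpers `second_factor` and `check_65`. Python integers are exact, so each comparison is exact, but checking finitely many cases is again only a sanity check, not a proof.

```python
def second_factor(x, y, n):
    """x^n + x^(n-1) y + ... + x y^(n-1) + y^n, the second factor in (6.5)."""
    return sum(x ** (n - k) * y ** k for k in range(n + 1))

def check_65(x, y, n):
    """Compare x^(n+1) - y^(n+1) with (x - y) * second_factor(x, y, n)."""
    return x ** (n + 1) - y ** (n + 1) == (x - y) * second_factor(x, y, n)

print(second_factor(2, 1, 3))  # 15, and indeed 2^4 - 1^4 = (2 - 1) * 15
print(all(check_65(x, y, n)
          for x in range(-6, 7) for y in range(-6, 7) for n in range(1, 8)))  # True
```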
Since this identity is valid for all numbers x and y, it is certainly valid when
x and y are positive integers. In that case, observe in particular that the right
side of (6.5) is a product of two integers. If it happens that x > y > 0, then also
$x^{n+1} > y^{n+1} > 0$ (see Exercise 11 on page 132) and the identity says that the
positive integer $x^{n+1} - y^{n+1}$ has a factorization (in the sense of page 138) as the
product of two positive integers: $x - y$ and $x^n + x^{n-1}y + \cdots + xy^{n-1} + y^n$. For the
sake of clarity, we restate it as follows: for all positive integers a, b, and n such that
a > b > 0, we have the following factorization of the positive integer $a^{n+1} - b^{n+1}$,
which is a special case of identity (6.5):
(6.6) $a^{n+1} - b^{n+1} = (a - b)(a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n)$.
If $a - b > 1$, then in particular a > 1, so that $a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n > 1$.
Therefore the positive integer $a^{n+1} - b^{n+1}$, being the product of two integers each
bigger than 1, is not a prime (page 148). We have thus proved the following.

Lemma 6.1. If a, b are positive integers and a − b > 1, then for all positive
integers n, $a^{n+1} - b^{n+1}$ is not a prime.
For example, $127^3 - 66^3 = 1{,}760{,}887$, and this lemma guarantees that 1,760,887
is not a prime. By no means is this fact obvious, because its smallest divisor is 61.
In fact, the prime decomposition (see Theorem 3.6 on page 149) of 1,760,887 is
$61 \times 28867$.
In a similar vein, you can show off to your friends by challenging them to check
whether 13,997,513 is a prime. You of course know that it is not a prime because

$$13{,}997{,}513 = 241^3 - 2^3.$$

What makes testing the primality of this number difficult is that the prime
decomposition of 13,997,513 is $239 \times 58567$. In other words, its smallest divisor
is 239, so guess-and-check will not work efficiently in this case. (Again, the
identity $241^3 - 2^3 = (241 - 2)(241^2 + 241 \cdot 2 + 2^2)$ also happens to exhibit the prime
decomposition of 13,997,513.)
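A short Python sketch that checks identity (6.6) on the two examples above and confirms, by plain trial division, the smallest divisors 61 and 239 quoted in the text. The helper names `smallest_divisor` and `second_factor` are our own, chosen only for illustration:

```python
def smallest_divisor(n: int) -> int:
    """Return the smallest divisor d > 1 of n (n itself if n is prime), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def second_factor(a: int, b: int, n: int) -> int:
    """The factor a^n + a^(n-1) b + ... + a b^(n-1) + b^n from identity (6.6)."""
    return sum(a ** (n - k) * b ** k for k in range(n + 1))

for a, b in [(127, 66), (241, 2)]:
    value = a ** 3 - b ** 3                      # exponent n + 1 = 3, so n = 2
    assert value == (a - b) * second_factor(a, b, 2)
    print(value, "=", a - b, "x", second_factor(a, b, 2),
          "; smallest divisor:", smallest_divisor(value))
```

Trial division only needs to run up to $\sqrt{n}$, because a composite n always has a divisor no larger than $\sqrt{n}$. Even so, for 13,997,513 it must try every candidate up to 239 before succeeding, which is the point the text makes about guess-and-check.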

Activity. Let a and b be integers. Does $(a^5 - b^5)$ divide $a^{15} - b^{15}$ in the sense
of page 138? Does $(a^2 + ab + b^2)$ divide $a^{15} - b^{15}$?

Suppose b = 1 in identity (6.6). Then we get, for all positive integers a and n,

$$a^{n+1} - 1 = (a - 1)(a^n + a^{n-1} + a^{n-2} + \cdots + a^2 + a + 1). \tag{6.7}$$

As before, if $a > 2$, then $a - 1 > 1$ and $a^n + a^{n-1} + \cdots + a^2 + a + 1 > 1$, so that
$a^{n+1} - 1$ is never a prime. This is a special case of Lemma 6.1.
It turns out that the case of a = 2 provides a different kind of intrigue.

Mersenne primes

If a = 2 in (6.7), then a − 1 = 1, and (6.7) no longer provides a nontrivial
factorization (see page 138 for the definition) of $2^{n+1} - 1$ and, therefore, no longer
exhibits the whole number $2^{n+1} - 1$ as a composite. More to the point, $2^{n+1} - 1$
is actually a prime for certain values of n, as we can see from the first five cases,
n = 1, 2, 3, 4, 5:
If n = 1, then $2^{n+1} - 1 = 3$.
If n = 2, then $2^{n+1} - 1 = 7$.
If n = 3, then $2^{n+1} - 1 = 15$.
If n = 4, then $2^{n+1} - 1 = 31$.
If n = 5, then $2^{n+1} - 1 = 63$.
So among the first five values of $2^{n+1} - 1$, three are primes but two are
not. The question naturally arises: among all possible values of $2^{n+1} - 1$ as n runs
through all the positive integers, which of them are primes?
There is an easy reduction of this question, as the following lemma tells us
that there is no need to look for primes among the numbers $2^{n+1} - 1$ where n + 1
is a composite:

Lemma 6.2. If m is a composite positive integer, then $2^m - 1$ is also a composite.


308 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

Proof. Indeed, if m = pq for positive integers p and q such that p, q > 1, then
$2^m = (2^p)^q$. Thus,

$$2^m - 1 = (2^p)^q - 1^q = (2^p - 1)\left((2^p)^{q-1} + (2^p)^{q-2} + \cdots + 2^p + 1\right).$$

(Notice that, once more, we rely on identity (6.6) for the second equality.) But
p > 1 implies $2^p - 1 > 1$ because $2^p - 1 \ge 3$. Moreover, q > 1 implies
$2^p - 1 < (2^p)^q - 1 = 2^m - 1$. Therefore $1 < 2^p - 1 < 2^m - 1$, and $2^p - 1$ is
a proper divisor of $2^m - 1$. The lemma is proved.

Lemma 6.2 explains why $2^4 - 1$ (= 15) and $2^6 - 1$ (= 63) in the above list are
not primes.
In view of Lemma 6.2, our original question about which of the numbers $2^{n+1} - 1$ are primes
can now be simplified to the following:
Which of the numbers $2^p - 1$ are primes, as p runs through all the primes?
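The proof of Lemma 6.2 is constructive: any factor p of m with 1 < p < m yields the proper divisor $2^p - 1$ of $2^m - 1$. Here is a small Python sketch that exhibits this divisor for each composite m up to 12. The helper name `proper_mersenne_divisor` is hypothetical.

```python
def proper_mersenne_divisor(m: int):
    """Return (p, 2**p - 1) for the smallest factor p of m with 1 < p < m,
    or None if m is prime (then Lemma 6.2 gives no information)."""
    for p in range(2, m):
        if m % p == 0:
            return p, 2 ** p - 1
    return None

for m in range(2, 13):
    result = proper_mersenne_divisor(m)
    if result is None:
        continue                                  # m is prime
    p, d = result
    # d divides 2^m - 1 and lies strictly between 1 and 2^m - 1, as in the proof
    assert (2 ** m - 1) % d == 0 and 1 < d < 2 ** m - 1
    print(f"m = {m}: 2^{m} - 1 = {2 ** m - 1} is divisible by 2^{p} - 1 = {d}")
```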
In 1644, Father Mersenne¹² claimed that, among all the primes p < 258, $2^p - 1$
is a prime exactly when
p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257.
Keep in mind that the computer is a creation of the latter part of the twentieth
century, so it was a nontrivial matter in the days of Mersenne to check the primality
of a (whole) number with, say, ten digits, such as $2^{31} - 1 = 2{,}147{,}483{,}647$. Mersenne
probably did not test the primality of all the numbers he wrote down, and his
statement was likely a mixture of guessing and wishful thinking. It is not even
clear whether he actually proved that his list was correct for p ≤ 31 because, if he
did, he would have had to verify the following two assertions:
(A) The number $2^p - 1$ is composite for p equal to 11, 23, and
29.
(B) The number $2^p - 1$ is prime for p equal to 13, 17, 19, 31 (the
cases of p = 2, 3, 5, 7 are obvious).
For (A), the fact that $2^{11} - 1 = 23 \cdot 89$ was known back in 1536, and the fact that
$2^{23} - 1$ is composite was found by Fermat¹³ in 1640 as a refutation of the opposite
claim made by Pietro Cataldi (1548–1626) in 1603. The case of $2^{29} - 1$ was not
known at the time Mersenne made his conjecture, and it would stay that way until
Euler,¹⁴ almost a century after Mersenne made his conjecture, managed to find the
prime decomposition $2^{29} - 1 = 233 \cdot 1103 \cdot 2089$ in 1738. As for (B), the primality
of $2^{13} - 1 = 8191$ only takes a little patience and was known in any case as far back
as the fifteenth century. The primality of $2^{17} - 1$ and $2^{19} - 1$ was verified by the
same Cataldi in 1588. But the primality of $2^{31} - 1$ was finally proved only in 1772
by Euler, some 130 years after Mersenne made his conjecture.

12 Marin Mersenne (1588–1648) was a French theologian and amateur mathematician. He was
a friend of all the French mathematical luminaries of his time, including Descartes, Fermat, Pascal,
and Desargues, and performed the critical service of disseminating mathematical information
among them at a time when mathematical publications were basically nonexistent.
13 Pierre Fermat (1607–1665) was a French lawyer; he was also an amateur mathematician
but one of the greatest mathematicians of all time nonetheless. The terminology of "Cartesian
coordinates" masks the fact that Fermat was a codiscoverer of analytic geometry with Descartes;
in fact Fermat made the discovery a few years earlier and seemed to have a better understanding
of the potential of his discovery. Fermat was a cofounder of the theory of probability (with Blaise
Pascal), and the modern theory of numbers also owes its existence to Fermat. We already had
occasion to mention Fermat's Last Theorem on page 300. His definition of the tangent to a curve
at a point inspired Newton's definition of the derivative (see Theorem 6.19 in [Wu2020c]).
14 Leonhard Euler (1707–1783), the most productive mathematician of all time, was the
dominant mathematician of the eighteenth century and rightfully ranks among the greatest. He
made important contributions in every part of mathematics, physics, and astronomy as they were
known in his time.
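Trial division, which simply tries every possible divisor up to the square root, is exactly the kind of "patience" mentioned above. Here is a minimal Python sketch that uses only the standard language to confirm the factorizations in (A) and the primality claims in (B). The helper name `factorize` is ours, not the text's.

```python
def factorize(n: int):
    """Prime factors of n in increasing order (with repetition), by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)          # whatever is left over is prime
    return factors

for p in [11, 13, 17, 19, 23, 29, 31]:
    print(p, 2 ** p - 1, factorize(2 ** p - 1))
```

For p = 31 the loop runs up to about 46,341, which is tedious by hand but instantaneous for a computer. This gap is the one the text is pointing to.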
With the help of an electronic computer, we can easily see that Mersenne's list
contains five errors: 61, 89, 107 should have been on his list but they were not,
and the numbers 67 and 257 which were on his list shouldn't have been there (i.e.,
$2^{67} - 1$ and $2^{257} - 1$ are composites).
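One standard way to carry out such a computer check is the Lucas–Lehmer test, which the text does not discuss. For an odd prime p, the number $2^p - 1$ is prime exactly when the sequence $s_0 = 4$, $s_{k+1} = s_k^2 - 2$ satisfies $s_{p-2} \equiv 0 \pmod{2^p - 1}$. Below is a minimal Python sketch of that test, applied to every prime p < 258 and compared with Mersenne's list.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small exponents p < 258 used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def lucas_lehmer(p: int) -> bool:
    """For a prime p, return True exactly when 2**p - 1 is prime."""
    if p == 2:
        return True                 # 2^2 - 1 = 3; the recurrence below needs p odd
    m = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m         # work modulo m so the numbers stay small
    return s == 0

mersenne_claim = [2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257]
actual = [p for p in range(2, 258) if is_prime(p) and lucas_lehmer(p)]

print("actual: ", actual)
print("missing:", sorted(set(actual) - set(mersenne_claim)))
print("wrong:  ", sorted(set(mersenne_claim) - set(actual)))
```

The "missing" and "wrong" lines should reproduce exactly the five errors described above: 61, 89, 107 and 67, 257.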
As a result of Mersenne’s list, a number of the form 2p − 1 which is a prime is
called a Mersenne prime. You can get more information about Mersenne primes
from http://mersenne.org/.
So what are the Mersenne primes? Unfortunately, we know very little about
this question. We do not even know if there are infinitely many Mersenne
primes. A search for bigger and bigger Mersenne primes is an ongoing enterprise
(see the website above), and one of the side benefits of this search is that each
time a larger Mersenne prime was found, it also turned out to be the largest prime
number known to mankind. As of January 2020, 51 Mersenne primes are known,
and the largest is a number with more than 24 million digits corresponding to the
prime p = 82,589,933 (discovered on December 7, 2018). From a mathematical
perspective, this search would acquire greater significance if we knew that the number
of Mersenne primes is finite.

The finite geometric series

Now, let us take a second look at identity (6.5),

$$x^{n+1} - y^{n+1} = (x - y)(x^n + x^{n-1}y + x^{n-2}y^2 + \cdots + xy^{n-1} + y^n)$$

for all numbers x and y and all positive integers n. (We emphasize that, at this
juncture, x and y are no longer restricted to positive integers but are arbitrary
numbers.¹⁵) Letting y = 1, we get

$$x^{n+1} - 1 = (x - 1)(x^n + x^{n-1} + x^{n-2} + \cdots + x^2 + x + 1) \tag{6.8}$$

for all numbers x and for all positive integers n. We now explore the implications
of identity (6.8) from another angle. If $x \neq 1$, multiplying both sides by $\frac{1}{x-1}$ and
switching the left and the right sides gives

$$1 + x + x^2 + \cdots + x^n = \frac{x^{n+1} - 1}{x - 1} \quad \text{for all } x \neq 1 \tag{6.9}$$

and for any positive integer n. The sum $1 + x + x^2 + \cdots + x^n$
is called a finite geometric series in x, and identity (6.9) is usually called the
summation formula for the finite geometric series. A slight variant of (6.9)
is the following identity, obtained from (6.9) by applying the distributive law: for
any $a \neq 0$,

$$a + ax + ax^2 + \cdots + ax^n = a \cdot \frac{x^{n+1} - 1}{x - 1} \quad \text{for all } x \neq 1. \tag{6.10}$$

15 Once we have introduced complex numbers (see Section 5.2 in [Wu2020b]), this identity
will be seen to be valid for complex numbers x and y as well.



We may assume that in both (6.9) and (6.10), $x \neq 0$, because the case x = 0
is not interesting. Recalling¹⁶ that $1 = x^0$, we may consider identity (6.9) as the
expression of the sum of all the whole-number powers of x up to and including $x^n$
as a quotient $(x^{n+1} - 1)/(x - 1)$. For example, if x = −3 and n = 12, then

$$1 - 3 + 3^2 - 3^3 + 3^4 - \cdots - 3^{11} + 3^{12} = \frac{(-3)^{13} - 1}{-3 - 1} = \frac{-1594324}{-4} = 398581.$$

But if $x = \frac{3}{4}$ and n = 15, then we have

$$1 + \frac{3}{4} + \left(\frac{3}{4}\right)^2 + \left(\frac{3}{4}\right)^3 + \cdots + \left(\frac{3}{4}\right)^{15} = \frac{\left(\frac{3}{4}\right)^{16} - 1}{\frac{3}{4} - 1},$$

which, after multiplying the numerator and denominator by −1, is approximately equal to

$$\frac{0.9899774}{0.25} = 3.9599096,$$

or more crudely, 3.96.
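Formula (6.9) can be checked exactly on the two examples above. The following Python sketch uses the standard `fractions` module, so that x = 3/4 is handled without rounding. The function names are ours.

```python
from fractions import Fraction

def geometric_sum_direct(x: Fraction, n: int) -> Fraction:
    """1 + x + x^2 + ... + x^n, adding the n + 1 terms one at a time."""
    return sum((x ** k for k in range(n + 1)), Fraction(0))

def geometric_sum_formula(x: Fraction, n: int) -> Fraction:
    """Right side of (6.9): (x^(n+1) - 1) / (x - 1), valid only for x != 1."""
    if x == 1:
        raise ValueError("formula (6.9) requires x != 1")
    return (x ** (n + 1) - 1) / (x - 1)

for x, n in [(Fraction(-3), 12), (Fraction(3, 4), 15)]:
    direct, formula = geometric_sum_direct(x, n), geometric_sum_formula(x, n)
    assert direct == formula                 # exact equality, no rounding involved
    print(x, n, formula, float(formula))
```

The direct sum needs n additions and many multiplications. The formula needs a single power and one division, which is why (6.9) is worth knowing even when a computer is at hand.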
The summation of the finite geometric series is usually tucked away at the end
of the second year of school algebra, with the result that it is often not taught,
or taught hastily, for lack of time. This is unfortunate because this summation
formula is one of the most basic pieces of mathematics students should know for
advanced mathematical or scientific work. As we have just seen, it is also one of the
most elementary and therefore should be taught at the beginning of school algebra,
not at the end of it. Please be aware of this fact when you teach algebra.
Let us cast a backward glance at the last three subsections, which are devoted
to a discussion of identity (6.5) for all positive integers n:

$$x^{n+1} - y^{n+1} = (x - y)(x^n + x^{n-1}y + \cdots + xy^{n-1} + y^n).$$

On account of its generality—the fact that it is valid for all numbers x and y—
this identity lends itself to two divergent trains of thought. One is to ask whether
certain positive integers are primes, and the other is to obtain a formula for the
sum of a finite geometric series. This is a trivial illustration of why mathematics
pursues generality, because general theorems always raise the possibility that they
will have many interesting potential applications.

Mathematical Aside: It is worthwhile to point out that identity (6.5) also
gives a short proof of the calculus fact that the derivative of $x^{n+1}$ (where n is
a positive integer) is equal to $(n + 1)x^n$. Briefly, the proof goes as follows (see
the proof of Theorem 6.16 in [Wu2020c]). The derivative of $x^{n+1}$ at a is the
limit of the difference quotient $(x^{n+1} - a^{n+1})/(x - a)$ as x goes to a. Because
of (6.5), the numerator of the difference quotient is equal to the product of $(x - a)$
and $x^n + x^{n-1}a + \cdots + xa^{n-1} + a^n$. Thus the difference quotient itself becomes
$x^n + x^{n-1}a + \cdots + xa^{n-1} + a^n$ after $(x - a)$ has been canceled from the numerator
and denominator. So as x converges to a, we get $a^n + a^n + \cdots + a^n$ (n + 1 times),
which is $(n + 1)a^n$.
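The argument in this aside can be spot-checked. This Python sketch uses the arbitrary choices a = 2 and n = 4, so the claimed derivative of $x^5$ at 2 is $5 \cdot 2^4 = 80$. For x ≠ a, the difference quotient equals the cancelled form exactly, and the cancelled form at x = a is $(n + 1)a^n$. The printed quotients only suggest the limit; they do not prove it.

```python
from fractions import Fraction

def difference_quotient(x: Fraction, a: Fraction, n: int) -> Fraction:
    """(x^(n+1) - a^(n+1)) / (x - a), defined only for x != a."""
    return (x ** (n + 1) - a ** (n + 1)) / (x - a)

def cancelled_form(x: Fraction, a: Fraction, n: int) -> Fraction:
    """x^n + x^(n-1) a + ... + x a^(n-1) + a^n: the quotient after cancelling (x - a)."""
    return sum((x ** (n - k) * a ** k for k in range(n + 1)), Fraction(0))

a, n = Fraction(2), 4
for h in [Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)]:
    x = a + h
    assert difference_quotient(x, a, n) == cancelled_form(x, a, n)   # identity (6.5), exactly
    print(float(difference_quotient(x, a, n)))

assert cancelled_form(a, a, n) == (n + 1) * a ** n   # setting x = a gives (n + 1) a^n
```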

Polynomials and "order of operations"

We next introduce polynomials, but first a remark before the formal definition.
Underlying the whole discussion of polynomials in school algebra is a basic technique
known as collecting like terms. It is nothing more than a simple observation
based on the distributive law, and we deal with this first.

16 For a fuller discussion of the 0-th power of x, see Section 4.2 in [Wu2020b].

Suppose we have a sum
$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3).$$

One can compute this sum by first multiplying out each term $18 \times 5^3$, $5^3 \times 23$, and
$69 \times 5^3$ and then adding the resulting numbers to get

$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3) = 2250 + 2875 + 8625 = 13750.$$

Now if we reflect for a moment, we would realize that we wasted precious time
doing three multiplications before adding. If we apply the distributive law, then
the computation simplifies:

$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3) = (18 + 23 + 69) \times 5^3 = 110 \times 125 = 13750.$$

Notice that we have made use of the commutative law of multiplication to change
$5^3 \times 23$ to $23 \times 5^3$ in the process.
You may think that, with the advent of high-speed computers, it does not
matter whether we get the answer by multiplying three times and then adding, or (as
in the second case) by adding first and then multiplying once. This is true, but the
difference in conceptual and visual clarity between

$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3)$$

and

$$(18 + 23 + 69) \times 5^3$$

is substantial. This is because multiplication is a far more complicated concept than
addition: 234 + 677 is simply the addition of two three-digit numbers, but 234 × 677
means adding 234 copies of 677. It is therefore far simpler conceptually to
add first and multiply once than to multiply three times and then add.
Because both conceptual and visual clarity are important in the learning and doing
of mathematics, we will collect together terms involving the same number raised
to a fixed power (such as $5^3$ in $(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3)$) by using the
distributive law. For example, we will always rewrite

$$(181 \times 2^5) + (67 \times 2^5) + (2^5 \times 96) - (257 \times 2^5)$$

as

$$87 \times 2^5 \quad (= (181 + 67 + 96 - 257) \times 2^5).$$
Similarly, we will write the sum

$$24 \times 59^{14} - \left(\frac{3}{5}\right)^8 \times 89 + 59^{14} \times 73 + 59^{14} \times 66 + 25 \times \left(\frac{3}{5}\right)^8 + \left(\frac{3}{5}\right)^8 \times 11$$

as

$$163 \times 59^{14} - 53 \times \left(\frac{3}{5}\right)^8,$$

where 163 = 24 + 73 + 66 and −53 = −89 + 25 + 11. Recall once more that we refer
to both of the above expressions as a "sum" because subtraction is just addition in
disguise (see page 96).
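Collecting like terms never changes the value of a sum, and exact rational arithmetic lets us confirm this for the examples above. The sketch below uses Python's `fractions.Fraction`, so that $(3/5)^8$ is represented exactly and not as a rounded decimal.

```python
from fractions import Fraction

t = Fraction(3, 5) ** 8          # the repeated factor (3/5)^8, exactly
s = 59 ** 14                     # the repeated factor 59^14, an exact integer

uncollected = 24 * s - t * 89 + s * 73 + s * 66 + 25 * t + t * 11
collected = (24 + 73 + 66) * s + (-89 + 25 + 11) * t    # distributive law

assert collected == 163 * s - 53 * t
assert uncollected == collected

# The same check for the two smaller examples in the text:
assert 18 * 5**3 + 5**3 * 23 + 69 * 5**3 == (18 + 23 + 69) * 5**3 == 13750
assert 181 * 2**5 + 67 * 2**5 + 2**5 * 96 - 257 * 2**5 == 87 * 2**5
```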
In an entirely similar manner, suppose we are given a sum of multiples of
nonnegative integer powers of a fixed number x, where "multiple" here means simply
multiplication by any number and not necessarily by a whole number, and "nonnegative
integers" refers to the whole numbers 0, 1, 2, . . . . Then we would automatically
collect together the terms involving the same power of x as before. For example,
we will write

$$\frac{1}{2}x^3 + 16 - 8x^2 + \frac{1}{3}x^3 - x^5 - 6x^2 + 75x + 2x^3$$

as

$$-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16. \tag{6.11}$$
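The bookkeeping in collecting like terms amounts to adding up coefficients exponent by exponent. The Python sketch below assumes the sum is given as a list of (coefficient, exponent) pairs, a representation chosen only for illustration, and it reproduces (6.11).

```python
from collections import defaultdict
from fractions import Fraction

def collect_like_terms(terms):
    """terms: list of (coefficient, exponent) pairs, one per term of the sum.
    Returns {exponent: total coefficient} in decreasing exponent order,
    dropping exponents whose coefficients cancel to 0."""
    totals = defaultdict(Fraction)          # Fraction() is 0
    for coeff, exp in terms:
        totals[exp] += Fraction(coeff)
    return {e: c for e, c in sorted(totals.items(), reverse=True) if c != 0}

# (1/2)x^3 + 16 - 8x^2 + (1/3)x^3 - x^5 - 6x^2 + 75x + 2x^3, with 16 = 16 x^0
terms = [(Fraction(1, 2), 3), (16, 0), (-8, 2), (Fraction(1, 3), 3),
         (-1, 5), (-6, 2), (75, 1), (2, 3)]
print(collect_like_terms(terms))
```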
A sum of multiples of nonnegative integer powers of x is called a polynomial in
x.¹⁷ We also agree to call any expression in a number x a polynomial in x if it
is equal to a sum of multiples of nonnegative integer powers of x after applications
of the associative, commutative, and distributive laws to the expression. Thus the
expression in (6.11) is a polynomial in x, as is $(x - 1)(x + 2)$, because it is equal to
$x^2 + x - 2$ after expansion. We have adhered to three notational conventions that
are generally followed in discussions about polynomials:
are generally followed in discussions about polynomials:
(A) Parentheses are usually suppressed, with the understanding
that exponents are computed first, multiplications second, and
additions third.
(B) The convention is that the power of x is placed last
in each term, so that we write $-14x^2$ instead of $x^2(-14)$.
(C) The terms are written in decreasing powers of the number x
in question. (The term 16 is the term $16x^0$, where, by definition,
$x^0 = 1$; incidentally, this is where we need the concept of the
zeroth power of a number.¹⁸)
In this connection, a caveat about (C) should be mentioned: a polynomial is
sometimes written in increasing powers of the variable, for good reason.¹⁹
As on page 303, the number in front of a power of x is called the coefficient of
that particular power of x. The polynomial in (6.11) is in reality equal to

$$(-1)x^5 + 0 \cdot x^4 + \frac{17}{6}x^3 + (-14)x^2 + 75x + 16x^0$$

when it is written strictly as a sum of multiples of decreasing powers of x. Therefore,
the coefficient of $x^5$ in (6.11) is −1, the coefficient of $x^4$ is 0 (remember, x is just
a number, so that $0 \cdot x^4 = 0$), the coefficient of $x^3$ is $\frac{17}{6}$, the coefficient of $x^2$ is
−14, the coefficient of x is 75, and the so-called constant term 16 is actually the
coefficient of $x^0$. A multiple of a single nonnegative power of x, such as $58x^{12}$, is
called a monomial. Thus, a monomial is a polynomial with only one term. The
highest power of x with a nonzero coefficient in a polynomial is called the degree
of the polynomial. The phrase "nonzero coefficient" refers to the fact
that the preceding polynomial $-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$ could be written as
$0 \cdot x^{37} - x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$, but the 37th power of x clearly does not count.
This polynomial has degree 5, and not 37 (and not any whole number different from
5, for that matter).

17 Mathematical Aside: This definition of a polynomial with real coefficients is the appropriate
one for school mathematics and is based on the fact that the polynomial ring over ℝ is
ring-isomorphic to the ring of ℝ-valued polynomial functions.
18 Note that in this case, we have to make an ad hoc definition by agreeing to write $x^0 = 1$
regardless of whether x = 0 or not.
19 Mathematical Aside: The Taylor polynomials of a differentiable function are usually written
in increasing powers.
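The coefficients and the degree can be read off mechanically once a polynomial is stored as a list of coefficients. This sketch assumes, for convenience, that `coeffs[k]` holds the coefficient of $x^k$, so the list runs in increasing powers, the opposite of convention (C). It treats the degree of the zero polynomial as undefined (returned as `None`), a choice the text does not address.

```python
from fractions import Fraction

def degree(coeffs):
    """coeffs[k] is the coefficient of x^k. Return the highest k with a
    nonzero coefficient, or None for the zero polynomial."""
    for k in range(len(coeffs) - 1, -1, -1):
        if coeffs[k] != 0:
            return k
    return None

# (6.11): -x^5 + (17/6)x^3 - 14x^2 + 75x + 16, listed from x^0 up to x^5
p = [16, 75, -14, Fraction(17, 6), 0, -1]
padded = p + [0] * 32        # the same polynomial with 0*x^6, ..., 0*x^37 added

print(degree(p), degree(padded))     # zero coefficients do not affect the degree
print(p[4], p[3])                    # coefficients of x^4 and x^3
```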
This is the place to make a comment on the notational convention (A) above.
As we said earlier, visual clarity in the notation we use is important. We find the
polynomial

$$-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$$

easy to work with because it is visually simple. As written, though, this symbolic
expression is a priori ambiguous because it could mean, among other things, the
following:

$$(-x)^5 + \left(\frac{17}{6}x\right)^3 - \{(14x)^2 + 75\}x + 16.$$

But of course what we have in mind is

$$\{-(x^5)\} + \left\{\frac{17}{6}(x^3)\right\} + \{-14(x^2)\} + \{75x\} + 16.$$

(We need not specify the order of doing the additions at this point because of the
general associative law; cf. Theorem 1 in the appendix of Chapter 1, page 87.) So
the net effect of notational convention (A) is to eliminate the need to write the
last cumbersome-looking expression by declaring that the expression
$-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$ already means the same thing. This is all there is to the
notational convention (A). It is a convention, and one should not invest more mathematical
significance in a convention than it deserves.
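Programming languages face the same ambiguity and settle it with a precedence convention of their own. As an illustration (this is an observation about Python, not a claim from the text), Python's rules happen to agree with convention (A): `**` is applied before `*`, `*` before `+`, and `-x**5` means $-(x^5)$. The sketch below compares the intended reading with the alternative reading at the arbitrary sample value x = 2.

```python
from fractions import Fraction

x = Fraction(2)          # any sample value of x; 2 is an arbitrary choice

as_written = -x**5 + Fraction(17, 6) * x**3 - 14 * x**2 + 75 * x + 16
intended = (-(x**5)) + (Fraction(17, 6) * (x**3)) + (-14 * (x**2)) + (75 * x) + 16
other_reading = (-x)**5 + (Fraction(17, 6) * x)**3 - ((14 * x)**2 + 75) * x + 16

assert as_written == intended          # Python parses the expression as convention (A) does
assert as_written != other_reading     # the other reading is a genuinely different number
print(as_written, other_reading)
```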
As is well known, school mathematics is sometimes led astray by misplaced
emphases. Convention (A) has somehow become enshrined in the middle school
curriculum under the name of order of operations. In TSM, mnemonic devices
were created to help students memorize it (PEMDAS, "Please Excuse My Dear
Aunt Sally"), and standard assessments likewise contribute to promoting the importance
of this convention. A mathematics classroom has to deal with conventions,
of course, but a convention should be put in its proper place and not be magnified
out of proportion. A more moderate approach would be to explain the genesis of
the convention (A) above, quiz students at the beginning to make sure they get it,
and go on to more important things. For a fuller discussion, see [Wu2004b] on the
so-called "order of operations".
A polynomial of degree 1 is called a linear polynomial, and a polynomial of
degree 2 is called a quadratic polynomial. Because a general quadratic polynomial
has only three terms, $ax^2 + bx + c$, it is also called a trinomial in school
mathematics. However, the term "trinomial" is used only in school mathematics,
not in higher mathematics; it should therefore be avoided in general discussions.
We will discuss quadratic polynomials in some detail in Chapter 2 of [Wu2020b].
A polynomial of degree 3 is called a cubic polynomial.
The most familiar polynomials are the so-called expanded forms of whole numbers;
these are polynomials in the number 10. For example, the expanded form of
75,018 is

$$(7 \times 10^4) + (5 \times 10^3) + (0 \times 10^2) + (1 \times 10^1) + (8 \times 10^0),$$

which is a fourth-degree polynomial in 10. Of course the expanded form of any
k-digit whole number is a polynomial of degree (k − 1) in 10. On the other hand,
the so-called complete expanded form of a decimal such as 32.58,

$$(3 \times 10^1) + (2 \times 10^0) + (5 \times 10^{-1}) + (8 \times 10^{-2}),$$

is not a polynomial in 10, because it contains negative powers of 10.
It should also be pointed out that as a polynomial in 10, the expanded form of
a whole number is not a typical polynomial because each coefficient is a single-digit
whole number. By contrast, the coefficients of a general polynomial in 10 can be
any number.
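The remark that an expanded form is a polynomial in 10 can be made concrete: read off the digits as coefficients and evaluate the polynomial at 10. This sketch uses hypothetical helper names. For the evaluation it uses Horner's rule, a standard evaluation scheme that the text does not discuss.

```python
def expanded_form_coefficients(n: int):
    """Digits of the whole number n, listed from the coefficient of 10^0 upward."""
    return [int(d) for d in reversed(str(n))]

def evaluate(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1] x + coeffs[2] x^2 + ... by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

coeffs = expanded_form_coefficients(75018)
assert evaluate(coeffs, 10) == 75018      # the polynomial in 10 recovers the number
assert len(coeffs) - 1 == 4               # a 5-digit number gives degree 4 in 10
print(coeffs, evaluate(coeffs, 3))        # same coefficients, evaluated at x = 3 instead
```

Evaluating at 3 in place of 10 shows that the digits are just coefficients. By contrast, a general polynomial in 10 may have any numbers as coefficients, not only single digits.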
Because polynomials are just numbers, we can add, subtract, multiply, and
divide them as numbers. Therefore, with the exception of division, the other three
arithmetic operations produce another polynomial in a routine manner. (Division
of polynomials does not generally produce a polynomial and will be looked at
separately.) Consider, for example, the product of two linear polynomials ax + b
and cx + d:
$$\begin{aligned}
(ax + b)(cx + d) &= (ax + b)(cx) + (ax + b)d && \text{(dist. law)}\\
&= acx^2 + bcx + adx + bd && \text{(dist. and com. laws)}\\
&= acx^2 + (ad + bc)x + bd && \text{(dist. law).}
\end{aligned}$$
Because of the definition of a polynomial, we had to collect terms of the same degree
in x using the distributive law and rearrange the terms so that they are in descending
powers of x. Other than that, this shows that the multiplication of polynomials is
no different from the usual operations with numbers. If the arithmetic of numbers
(whole numbers and rational numbers) were taught correctly, such operations with
polynomials would be just routine rather than a significant problem to reckon with.
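The computation above is mechanical enough to automate. Multiply every term of one polynomial by every term of the other (the distributive law), then add the products that land on the same power of x (collecting like terms). Here is a minimal sketch that assumes polynomials are stored as coefficient lists in increasing powers of x. The sample values of a, b, c, d are arbitrary.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists in increasing powers of x.
    Each product p[i]*q[j] contributes to x^(i+j): the distributive law,
    followed by collecting like terms."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

a, b, c, d = 2, -3, Fraction(1, 4), 1
product = poly_mul([b, a], [d, c])                 # (b + ax)(d + cx)
assert product == [b * d, a * d + b * c, a * c]    # bd + (ad + bc)x + acx^2
print(product)
```

Nothing here is special to linear factors. The same double loop multiplies polynomials of any degree, which reflects the text's point that the general distributive law, not a mnemonic, is what students need.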
As is well known, TSM has fostered the uncivilized practice of "FOILING" to
teach the multiplication of two linear polynomials in beginning algebra classrooms.
This does serious damage to mathematics learning for at least two reasons. First,
the mnemonic device of FOIL is only applicable to the product of two linear poly-
nomials, but what students of algebra must learn is how to apply the distributive
law in general (cf. (6.4) on page 306). Second, teaching FOIL does nothing so
much as promote the TSM paradigm of replacing mathematical reasoning by the
memorization of rote skills such as PEMDAS, "the butterfly method", etc. Such
practices are antithetical to the goal of a good mathematics education: teaching
students how to reason.
We have mentioned the need to sometimes look at an equality backwards, and
we will repeat this message once again. What we obtained above,

$$(ax + b)(cx + d) = acx^2 + (ad + bc)x + bd,$$

is nothing but routine applications of the distributive law. However, when this
equality is read from right to left, it becomes

$$acx^2 + (ad + bc)x + bd = (ax + b)(cx + d). \tag{6.12}$$

This equality is no longer routine by any stretch of the imagination! In general, if
the polynomials p(x), q(x), r(x) in x satisfy p(x) = q(x)r(x), then we say q(x)r(x)
is a factorization of p(x) if the degrees of both q(x) and r(x) are positive. (Thus
$\frac{5}{3}x^3 - 2x^2 + \frac{2}{3} = \left(\frac{1}{3}\right)(5x^3 - 6x^2 + 2)$ is not a factorization of $\frac{5}{3}x^3 - 2x^2 + \frac{2}{3}$ because the
degree of $\frac{1}{3}$ is zero.) In this terminology, the identity (6.12) furnishes a factorization
of $acx^2 + (ad + bc)x + bd$ as a product $(ax + b)(cx + d)$, provided $a \neq 0$ and $c \neq 0$.
For example, we get

$$\frac{1}{2}x^2 + \frac{5}{4}x - 3 = (2x - 3)\left(\frac{1}{4}x + 1\right)$$

by letting a = 2, b = −3, $c = \frac{1}{4}$, and d = 1. However, this is by no means an
invitation for students to memorize the identity (6.12)! There are more reasonable
ways to learn how to obtain the factorization of $\frac{1}{2}x^2 + \frac{5}{4}x - 3$. One way is the
following. Since it is much easier to deal with integers than with rational numbers,
we rewrite the quadratic polynomial by using the distributive law to take out the
denominators of all the coefficients, as follows:

$$\frac{1}{2}x^2 + \frac{5}{4}x - 3 = \frac{1}{4}(2x^2 + 5x - 12).$$

Then we recognize that

$$2x^2 + 5x - 12 = (2x - 3)(x + 4)$$

because the 0-degree term (i.e., −12) of $2x^2 + 5x - 12$ has to be the product of the
0-degree terms −3 and 4 of 2x − 3 and x + 4, and the coefficient 2 of $x^2$ in $2x^2 + 5x - 12$
has to be the product of the coefficients 2 and 1 of x in 2x − 3 and x + 4, respectively.
So a little trial and error would get it done. Hence, we obtain as before,

$$\frac{1}{2}x^2 + \frac{5}{4}x - 3 = \frac{1}{4}(2x^2 + 5x - 12) = \frac{1}{4}(2x - 3)(x + 4).$$
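The "little trial and error" can be organized as an exhaustive search. This sketch assumes that A and C are nonzero integers and that only integer factors are sought. The function names are hypothetical. It tries every way of writing A = ac and C = bd and keeps the ones with ad + bc = B.

```python
def divisors(n: int):
    """All positive and negative integer divisors of the nonzero integer n."""
    pos = [k for k in range(1, abs(n) + 1) if n % k == 0]
    return pos + [-k for k in pos]

def integer_factorizations(A: int, B: int, C: int):
    """All (a, b, c, d) with a > 0 and (ax + b)(cx + d) = Ax^2 + Bx + C."""
    found = []
    for a in divisors(A):
        if a <= 0:
            continue                  # fix the sign of a to avoid trivial sign repeats
        c = A // a
        for b in divisors(C):
            d = C // b
            if a * d + b * c == B:
                found.append((a, b, c, d))
    return found

print(integer_factorizations(2, 5, -12))   # includes (2, -3, 1, 4), i.e. (2x - 3)(x + 4)
print(integer_factorizations(1, 2, -35))   # includes (1, 7, 1, -5), i.e. (x + 7)(x - 5)
```

An empty result means there is no factorization with integer coefficients. It does not rule out a factorization with other coefficients, which is where the quadratic formula mentioned below takes over.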
At present, the teaching of factoring quadratic polynomials with integer coeffi-
cients figures prominently, not to say obsessively, in school courses in algebra. For
this reason, some perspective on this subject is called for. One should keep in mind
that all it does is factor two integers A and C into products of integers so that a
given quadratic polynomial $Ax^2 + Bx + C$ can be written as $acx^2 + (ad + bc)x + bd$
(which then equals $(ax + b)(cx + d)$). There is no denying that beginning students
ought to acquire some facility with decomposing integers into products. It is also
important that they are able to effortlessly factor a simple quadratic polynomial
such as $x^2 + 2x - 35$ into the product $(x + 7)(x - 5)$. But as sometimes happens,
although a little bit of something is good for you, a lot of it may actually be harm-
ful. This would seem to be the case here where a minor skill gets blown up to
a major topic, with the consequence that other topics that are more central and
more substantial (such as learning about why the graphs of linear equations in two
variables are lines or how to solve rate problems correctly) get slighted in terms of
time and emphasis. There is a memorable passage on this issue in an introductory
textbook on abstract algebra:
Very early in our mathematical education—in fact in junior high
school or early in high school itself—we are introduced to poly-
nomials. For a seemingly endless amount of time, we are drilled,
to the point of utter boredom, in factoring them, multiplying
them, dividing them, simplifying them. Facility in factoring a
quadratic becomes confused with genuine mathematical talent.
([Herstein, p. 153])
Teachers of algebra should avoid this pitfall. Please also keep in mind the fact that
once the quadratic formula becomes available (see Theorem 2.8 in Section 2.1 of
[Wu2020b]), there will be a two-step algorithm to accomplish this factorization no
matter what the coefficients of the quadratic polynomials may be.

We give one more illustration of the factorization of polynomials. We begin
with a multiplication of polynomials where each step except the last makes use of
the distributive law:

$$\begin{aligned}
\left(5x^3 - \frac{1}{2}\right)(x^2 + 2x + 8) &= \left(5x^3 - \frac{1}{2}\right)x^2 + \left(5x^3 - \frac{1}{2}\right)2x + \left(5x^3 - \frac{1}{2}\right)8\\
&= \left(5x^5 - \frac{1}{2}x^2\right) + (10x^4 - x) + (40x^3 - 4)\\
&= 5x^5 + 10x^4 + 40x^3 - \frac{1}{2}x^2 - x - 4.
\end{aligned}$$

Reading this equality from right to left results in a factorization that is not so
trivial:

$$5x^5 + 10x^4 + 40x^3 - \frac{1}{2}x^2 - x - 4 = \left(5x^3 - \frac{1}{2}\right)(x^2 + 2x + 8).$$

Note the fact that if p(x) and q(x) are polynomials of degree m and n, respectively,
then the degree of the product p(x)q(x) is m + n. In other words, the degree of a
product is the sum of the degrees of the individual polynomial factors. For example,
the preceding calculation, which multiplies a degree 3 polynomial by a degree 2
polynomial, yields a polynomial of degree 5 (= 3 + 2).
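This self-contained Python sketch confirms the product above and the rule that degrees add. Each polynomial is stored as a list of coefficients in increasing powers of x, a convention chosen for the sketch and not taken from the text.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Coefficient lists in increasing powers of x; distributive law, then collect like terms."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def degree(coeffs):
    """Highest power with a nonzero coefficient (None for the zero polynomial)."""
    nonzero = [k for k, c in enumerate(coeffs) if c != 0]
    return max(nonzero) if nonzero else None

cubic = [Fraction(-1, 2), 0, 0, 5]      # 5x^3 - 1/2
quadratic = [8, 2, 1]                    # x^2 + 2x + 8
product = poly_mul(cubic, quadratic)

# 5x^5 + 10x^4 + 40x^3 - (1/2)x^2 - x - 4, listed from x^0 upward
assert product == [-4, -1, Fraction(-1, 2), 40, 10, 5]
assert degree(product) == degree(cubic) + degree(quadratic) == 5
print(product)
```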

Rational expressions

Finally, a quotient (i.e., division) of two polynomials in a number x is called a
rational expression in x. Here is an example:

$$\frac{3x^5 + 16x^4 - 25x^2 - 7}{x^2 - 1}.$$
We note that in the case of rational expressions, we need to exercise some care in
not allowing division by 0 to take place. For example, in the preceding rational
expression, x can be any number except ±1 because if x = ±1, then x2 − 1 = 0 and
the denominator would be 0.
In writing rational expressions, it is understood (unless stated to the contrary)
that only those values for which the denominator is nonzero are considered.

In middle school, we are mainly interested in rational numbers and, as a
consequence, all computations with numbers tacitly assume that the numbers involved are
rational numbers. (But keep in mind that, by FASM (page 133), we can carry out
the same computations with arbitrary real numbers as well.) With this understood,
since x is a (rational) number, a rational expression is just a rational quotient (in
the sense defined on page 117) and can therefore be added, subtracted, multiplied,
and divided like any fraction (see items (a)–(d) on page 118). For example, in case
$x = \frac{1}{2}$ in the foregoing rational expression, we would be looking at the rational
quotient

$$\frac{3\left(\frac{1}{32}\right) + 16\left(\frac{1}{16}\right) - 25\left(\frac{1}{4}\right) - 7}{\left(\frac{1}{4}\right) - 1},$$

which is equal to $16\frac{5}{24}$. In general, no matter what x is, we can compute with
rational expressions in x in the usual way (see (c) and (d) on page 118 and (2.33)
on page 119):
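$$\frac{0.5x^3 + 1}{x^8 + x - 2} + \frac{2x^7}{x^3 + 3^7} = \frac{(0.5x^3 + 1)(x^3 + 3^7) + (2x^7)(x^8 + x - 2)}{(x^8 + x - 2)(x^3 + 3^7)}$$

and

$$\frac{\frac{3}{2}x^2 + 1}{x^2 + 4x - 7} \cdot \frac{6}{3x^4 - 5} = \frac{\left(\frac{3}{2}x^2 + 1\right)(6)}{(x^2 + 4x - 7)(3x^4 - 5)}$$

and

$$\frac{\ \frac{2x + 1}{x^2 - 0.3}\ }{\ \frac{4x^3 - x + 11}{2x}\ } = \frac{(2x + 1)(2x)}{(x^2 - 0.3)(4x^3 - x + 11)}.$$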
These are exactly the same as any computation with rational numbers. This
realization is important in the teaching of algebra because, when you teach rational
expressions, you should remind students that if they know how to handle rational
numbers, then they already know all there is to know about this topic. There is so
much in introductory algebra that is just a revisit of arithmetic.
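A rational expression evaluated at a rational x is just a fraction, so Python's `fractions.Fraction` can check the computations above exactly, without rounding. In this sketch the sample values of x are arbitrary, and 0.5 and 0.3 are entered as the exact fractions 1/2 and 3/10. Checking finitely many values illustrates the identities but of course does not prove them.

```python
from fractions import Fraction

half, three_tenths = Fraction(1, 2), Fraction(3, 10)   # exact stand-ins for 0.5 and 0.3

def f(x):
    """(3x^5 + 16x^4 - 25x^2 - 7) / (x^2 - 1); requires x != 1 and x != -1."""
    return (3 * x**5 + 16 * x**4 - 25 * x**2 - 7) / (x**2 - 1)

assert f(half) == 16 + Fraction(5, 24)      # the value 16 5/24 found in the text at x = 1/2

# Sample points avoid every zero of a denominator (note x^8 + x - 2 = 0 at x = 1).
for x in [half, Fraction(-3, 7), Fraction(5), Fraction(2, 9)]:
    p, q = half * x**3 + 1, x**8 + x - 2
    r, s = 2 * x**7, x**3 + 3**7
    assert p / q + r / s == (p * s + r * q) / (q * s)        # sum

    u, v, w = Fraction(3, 2) * x**2 + 1, x**2 + 4 * x - 7, 3 * x**4 - 5
    assert (u / v) * (6 / w) == (u * 6) / (v * w)            # product

    m, n = 2 * x + 1, x**2 - three_tenths
    k, t = 4 * x**3 - x + 11, 2 * x
    assert (m / n) / (k / t) == (m * t) / (n * k)            # quotient
print("all identities hold at the sample points")
```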

Because the cancellation law is valid for rational quotients (item (a) on page 118,
which says $\frac{AC}{AB} = \frac{C}{B}$ for all rational numbers A, B, C, with $A \neq 0$, $B \neq 0$), some
rational expressions can be simplified. Sometimes the cancellation presents itself,
as in

$$\frac{(5x^4 - x^3 + 2)(2x - 15)}{(14x^2 + 3x - 28)(5x^4 - x^3 + 2)}.$$

Here, the number $(5x^4 - x^3 + 2)$ in both the numerator and denominator can be
canceled, resulting in

$$\frac{(5x^4 - x^3 + 2)(2x - 15)}{(14x^2 + 3x - 28)(5x^4 - x^3 + 2)} = \frac{2x - 15}{14x^2 + 3x - 28}.$$
Sometimes, the cancellation can be less obvious. For example, the rational
expression

$$\frac{\frac{1}{8}x^3 - 1}{x^2 + 2x + 4}$$

can be simplified to $\frac{1}{8}(x - 2)$ because, by identity (6.5) on page 306,

$$\frac{1}{8}x^3 - 1 = \frac{1}{8}(x^3 - 8) = \frac{1}{8}(x^3 - 2^3) = \frac{1}{8}(x - 2)(x^2 + 2x + 4),$$

and we can cancel the number $(x^2 + 2x + 4)$ from the numerator and denominator.
(As we know from the theory of quadratic equations, e.g., Section 2.1 in [Wu2020b],
this particular rational expression in x actually makes sense for all (real) numbers
x because $x^2 + 2x + 4$ has a negative discriminant and is therefore never equal to
0. Consequently, $\frac{x^3 - 8}{x^2 + 2x + 4} = x - 2$ is in fact an identity for all (real)
numbers x.)
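As a spot check of this less obvious cancellation, the sketch below compares the original and simplified forms exactly at many rational sample points. It also confirms the negative discriminant $2^2 - 4 \cdot 1 \cdot 4 = -12$ that guarantees the denominator never vanishes. The function names are ours.

```python
from fractions import Fraction

def original(x):
    """((1/8)x^3 - 1) / (x^2 + 2x + 4)."""
    return (Fraction(1, 8) * x**3 - 1) / (x**2 + 2 * x + 4)

def simplified(x):
    """The cancelled form (1/8)(x - 2)."""
    return Fraction(1, 8) * (x - 2)

assert 2**2 - 4 * 1 * 4 == -12          # discriminant of x^2 + 2x + 4 is negative
samples = [Fraction(k, 3) for k in range(-30, 31)]
assert all(original(x) == simplified(x) for x in samples)
```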
In beginning algebra, often there is too much emphasis on simplifying rational
expressions. This is a leftover from the ill-informed practice of teaching fractions
by insisting on the reduction of all fractions to lowest terms.
It remains to round off this discussion by mentioning that one can easily define
polynomials in several numbers x, y, z, etc., and therefore one can likewise
define rational expressions in x, y, z, etc.

SUMMARY: All computations involving expressions in a number x are the
same as ordinary computations in rational numbers. A thorough knowledge of rational
numbers is therefore the foundation for learning algebra (see the recommendations
on pp. 17–18 of [NMAP1]; see also pp. 3-40 to 3-41 of [NMAP2]). Any
effort to teach algebra to students who do not know rational numbers is probably
doomed from the start.

Mathematical Aside: Rational functions are to polynomials as the rational
numbers, ℚ, are to the integers, ℤ. Precisely, rational functions are elements of the
quotient field of the integral domain of polynomials, in the same way that ℚ is the
quotient field of the integral domain ℤ. This is the advantage of the abstract
viewpoint in advanced mathematics: it allows us to better perceive the coherence
of mathematics.

Pedagogical comments on the teaching of "variable"

Although getting every school student to learn the fundamentals of algebra is
a national goal (e.g., page xviii of [NMAP1]), the education establishment does
not seem to have tried very hard to make algebra learnable. One reason is that
TSM²⁰ erects an artificial roadblock in students' learning path by making the
understanding of the "concept of a variable" a sine qua non of algebraic proficiency.
A typical pronouncement on the importance of the "concept of a variable" goes as
follows:
Understanding the concept of variable is crucial to the study of
algebra; a major problem in students’ efforts to understand and
do algebra results from their narrow interpretation of the term.
. . . Two particularly important ways in grades 5–8 are using a
variable as a placeholder for a specific unknown, as in n + 5 =
12, and as a representative of a range of values, as in 3t + 6.
([NCTM1989, p. 102])
One way for students to acquire more than a "narrow interpretation" of the "concept
of variable" would be to teach them the precise meaning of the "concept of variable"
in a forthright and understandable manner. What then does TSM have to offer?
Nothing that would serve this purpose, as can be seen from the following random
sample from school textbooks:
A symbol is a variable.
A variable is a letter used to represent one or more numbers.
A variable is a quantity that changes or varies.
Variables represent quantities whose values vary.
A variable is a letter or other symbol that can be replaced by any
number (or other object) from some set. . . . A sentence with a
variable is called an open sentence, and it is called open because
its truth cannot be determined until the variable is replaced by
values.
There is also a video on the internet ([KhanAcad]) that illustrates very well why stu-
dents are confused over the TSM "concept of variable". Incidentally, the reference

20 See page xiv of the preface for the definition of TSM.


6.1. SYMBOLIC EXPRESSIONS 319

to "open sentences" in the last item of the preceding list is part of the erroneous
thinking that led to the problem of asking students to interpret a string of symbols
such as $y = \sqrt{3x - 7}$ on page 299.
You may have gotten the idea by now that, in order to make mathematics
learnable, we must express mathematical messages clearly and precisely. But if
clarity and precision are what you are after, then TSM would be the wrong place to
look because, for instance, none of the preceding so-called definitions of a "variable"
has the necessary clarity and precision that we normally demand of a mathematical
definition. A "variable" is certainly not "a quantity that changes or varies", because
nothing in mathematics ever changes or varies. Let us get down to the details of this
assertion. Anticipating a later discussion about functions, suppose we want to say
one real-valued function f is less than another function g; i.e., in symbols, f < g.
In the spirit of "a quantity that changes or varies", this is a formidable statement
about one quantity f(x) that varies being smaller than another quantity g(x) that
also varies. This suggests that, even when two quantities wiggle all over the place,
one can discern in some mysterious fashion that one is "smaller" than the other.
This is heady stuff, but for the purpose of understanding and doing mathematics,
the correct definition of f < g is much simpler and much more mundane, and it
is this: for every point x0 in their common domain of definition, the two numbers
f(x0) and g(x0) satisfy f(x0) < g(x0). In other words, just check one point at a
time, so that at each x0, f(x0) < g(x0). Nothing varies.
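To make the point concrete, here is a minimal Python sketch (not from the original text) of the pointwise definition of f < g. It assumes a finite list of sample points standing in for the common domain; the helper `pointwise_less` and the two example functions are hypothetical. On an infinite domain a computer can only spot-check finitely many points, so this illustrates the definition rather than proving f < g.

```python
from fractions import Fraction

def pointwise_less(f, g, domain):
    """True if f(x0) < g(x0) holds at every point x0 of the finite sample `domain`.

    Each comparison is between two fixed numbers f(x0) and g(x0); nothing varies.
    """
    return all(f(x0) < g(x0) for x0 in domain)

def f(x):
    return x * x          # hypothetical example: f(x) = x^2

def g(x):
    return x * x + 1      # hypothetical example: g(x) = x^2 + 1

# Sample points -5, -19/4, ..., 19/4, 5 (a stand-in for the real domain).
sample = [Fraction(n, 4) for n in range(-20, 21)]

print(pointwise_less(f, g, sample))  # True
print(pointwise_less(g, f, sample))  # False
```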
The truth is that a variable is not a mathematical concept. It is rooted
in tradition and was used as suggestive language back in the days when the very
concept of a function had not been clearly formulated and mathematics did not
possess the transparency and precision that it does today. In the year 2020, the
"concept of variable" is kept alive as an integral part of mathematics only in TSM.
It is time to teach students how to use symbols correctly and not allow this bogus
"concept" to wreak havoc on student learning.
It is possible that you consider this insistence on the proper use of symbols—
the basic etiquette in the use of symbols (page 299)—to be nothing more than a
piece of pedantry flaunted only by professional mathematicians. To dispel this
misconception, consider a typical passage from TSM:21
y = −3 implies y sin x = −3 sin x, for all real values of x but
y sin x = −3 sin x does not imply y = −3 because the principle
that ac = bc a = b [sic] does not apply when c = 0 (i.e., a(0) =
b(0) does not imply that a = b). ([MUST], page 395)22
We will take for granted that x and y are numbers. Consider the second phrase,
but y sin x = −3 sin x does not imply y = −3.
Without knowing what x is, this phrase can have no meaning; it may be true or
it may be false, as the ensuing discussion shows. Let us try to properly quantify
the symbol x. In context, the beginning of the passage ("y = −3 implies y sin x =
−3 sin x, for all real values of x") suggests that, perhaps, what is intended is the

21 As noted in the footnote on page 303, we continue to make use of some mathematics that
we have not yet discussed, but which you most likely know, to illustrate a point.
22 There is an obvious typo in the original, which states "x = −3 implies x sin x = −3 sin x
but x sin x = −3 sin x does not imply x = −3". We have done our best to make sense of it.
statement that "but y sin x = −3 sin x for all real values of x does not imply y =
−3". If so, then this statement is incorrect: by letting $x = \frac{\pi}{2}$, we get
y · 1 = −3 · 1, so that y = −3, contrary to the claim.
Now it is possible that what is intended is, instead, the following:
but if x = π, then y sin x = −3 sin x does not imply y = −3.
Indeed, we then get y · 0 = −3 · 0, so that 0 = 0 and one cannot reach the conclusion
that y = −3. This would be consistent with the last part of the passage, although
if this is the case, it would be far simpler to replace the whole passage by a direct
statement: "if a, b, and c are real numbers, then unless we know c ≠ 0, the equality
ac = bc does not imply a = b."
In summary, we see that the above passage is meaningless because symbols
are used without proper quantification. Moreover, when the symbols are properly
quantified, the passage becomes either wrong or trivial. But we should not take the
easy way out by ascribing such errors to the carelessness of the authors of the pre-
ceding passage. This must be recognized for what it is, a systemic error in TSM.
Those of us who allow TSM to be taught to generations of students and ensure
that TSM (rather than mathematics) remains the enduring content component of
mathematics education research are the main culprits. Let us do better and teach
the basic etiquette in the use of symbols in mathematics education.

Exercises 6.1.
(1) (a) Show that every positive integer can be expressed as 3k, 3k + 1, or 3k + 2
for a whole number k. (Hint: Use division-with-remainder.) (b) Use (a)
to show that there are no positive integers x and y so that $y = \sqrt{3x - 7}$.
(c) Show that there are infinitely many pairs of fractions x and y so that
$y = \sqrt{3x - 7}$.
(2) Suppose you know $(x - y)^2 = x^2 - 2xy + y^2$ for all numbers x and y. Using
this fact alone, prove that $(x + y)^2 = x^2 + 2xy + y^2$ for all numbers x and
y.
(3) If x is a nonzero number and n is a positive integer, what is
$$-1 + \frac{1}{x^3} - \frac{1}{x^6} + \frac{1}{x^9} - \frac{1}{x^{12}} + \cdots + \frac{1}{x^{6n-3}} - \frac{1}{x^{6n}}\,?$$
(4) Find the sum of
$$\frac{5^7}{6^8} - \frac{5^8}{6^9} + \frac{5^9}{6^{10}} - \frac{5^{10}}{6^{11}} + \cdots - \frac{5^{32}}{6^{33}}.$$
(5) If x is a number that makes all the denominators nonzero in the following,
simplify
$$\frac{(2x^3 - 9x^2 - 5x)/(x - 2)^2}{(x^2 - 3x - 10)/(x^4 - 16)}.$$
(6) If x and y are numbers and $x \neq \pm y$, what is
$$\frac{1}{x^2 - y^2} + \frac{y}{x^3 + y^3}\,?$$
Simplify your answer as much as possible.
(7) It was known to Archimedes that for any positive integer k > 1 and for
any positive integer n,
$$1 + \frac{1}{k} + \frac{1}{k^2} + \cdots + \frac{1}{k^n} + \frac{1}{(k-1)k^n} = \frac{k}{k-1}.$$
(a) Prove that Archimedes’ identity is correct even when k is any num-
ber different from 0 and 1. (b) Conversely, prove that the generalized
Archimedes identity of part (a) implies the summation formula for the
finite geometric series.
(8) We have seen that identity (6.8) on page 309 is a special case of identity
(6.5) on page 306. Now prove the converse: identity (6.8) implies identity
(6.5).
(9) (a) Prove the following identity: $ab = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2$
for all numbers a and b. (b) Use this identity to give a different proof of part (b) of
Exercise 18 in Exercises 2.6, page 132; i.e., among all rectangles with a
fixed perimeter, the square has the biggest area.
(10) Let a, b be positive integers, not both equal to 1, and let n be an odd
positive integer > 1. Prove that $a^n + b^n$ is not a prime. Would this hold
if n is even?
(11) (a) Factor $4x^2 - 12x + 9$ and $30x^2 + 16x + 2$ for any number x. (b) Factor
$(30x^2 - 16x + 2)^2 - 5(30x^2 - 16x + 2) - 14$ for any number x. (c) Factor
$s^4 + s^2t^2 + t^4$ into polynomials in s and t with integer coefficients for any
numbers s and t. (d) Factor $s^{4k} + s^{2k}t^{2k} + t^{4k}$ for any positive integer k.
(12) Simplify (assuming the denominators below are never zero for the numbers
x and y):
$$\text{(i)}\ \frac{4x^4 - 9y^4}{4x^4 + 12x^2y^2 + 9y^4}; \qquad \text{(ii)}\ \frac{15x^3y^4 - x^4y^3}{60x^5y^2 - 4x^4y^3}; \qquad \text{(iii)}\ \frac{\dfrac{2x^2 + 7x + 3}{x^3 - 7x + 6}}{\dfrac{x + 2}{x - 1}}.$$
(13) How much money is in an account at the beginning of the sixteenth year
if the initial deposit is $500, the annual interest rate is 5%, and at the
beginning of each year starting with the second, $10 is added to the ac-
count? Write down the formula, and then use a scientific calculator to get
a numerical answer.

In each of the following exercises, you are asked only to write
the equations that fully capture the verbal information. No solution
is required.
(14) Two women started at sunrise and each walked at constant speed. One
went straight from City A to City B while the other went straight from B
to A. They met at noon and, continuing with no stop, arrived respectively
at B at 4 pm and at A at 9 pm. If the sunrise was x hours before noon
and if L is the speed of the woman going from A to B and R is the speed
of the woman going from B to A, transcribe the information above into
equations using the symbols L, R, and x.
(15) The sum of the squares of three consecutive integers exceeds three times
the square of the middle integer by 2. If the middle integer is x, express
this fact in terms of x. If the smallest of the three integers is y, express
the same fact in terms of y.
(16) I have $4.60 worth of nickels, dimes, and quarters. There are 40 coins
in all, and the number of nickels and dimes is three times the number
of quarters. If N , D, and Q denote the number of nickels, dimes, and
quarters, respectively, write equations in terms of these symbols to capture
the given information.
(17) We are given two whole numbers so that, when the larger number is
divided by the smaller number, the quotient is 9 and the remainder is 15,
and so that the larger number is 97.5% of ten times the smaller number.
If x is the larger number and y is the smaller number, express the given
information in equations in terms of x and y.
(18) A video game manufacturer sells out every game he brings to a game
show. He has two games, an A Game and a B Game. He can bring a total of 50
A Games and B Games to the show. Each A Game costs $75 to
manufacture and brings in a profit of $125. Each B Game costs $165 to
manufacture and brings in a profit of $185. However, he only has $6,000
to spend on manufacturing. If he brings x A Games and y B Games,
describe in terms of x and y how he can maximize his profit.

6.2. Solving linear equations in one variable


This section explains what an equation in one variable is and what it means to
solve it. Because "solving equations" is the most basic part of school algebra, it is
difficult to imagine that TSM could get it wrong—completely wrong—for so long,
but this is the reality of school mathematics education in the grips of TSM. We will
present two ways to make sense of the usual TSM procedure of solving an equation
and, in the process, lay bare the fact that, without the ability to compute fluently
with rational numbers, it is impossible to learn algebra.
What is an equation? (p. 322)
How to solve an equation (p. 324)
A second way to solve an equation (p. 328)

What is an equation?

We have already discussed "equations" informally on pp. 300ff., but it is time
to define this concept precisely. An equation in one variable x is a question that
asks, when two expressions f(x) and g(x) in a number x are given, whether there
is a number k so that f(x) is equal to g(x) when x = k, i.e., so that f(k) = g(k).
Such a number k, if it exists, is called a solution of the equation. Sometimes we
also say k satisfies the equation. By tradition, this question (i.e., the equation)
is symbolically written in the form
f(x) = g(x).
We can understand this better by looking at an example. Let the two expressions
be $2x^2 - x - 1$ and 0 (= 0 · x). Then the question of whether there is a number k
so that $2k^2 - k - 1 = 0$ is known as a quadratic equation in one variable and,
of course, this question usually appears as
$$2x^2 - x - 1 = 0.$$
In this case, a solution k is a number so that $2k^2 - k - 1 = 0$. For example, it
is easy to check that 3 is not a solution of the equation $2x^2 - x - 1 = 0$, because
$2 \cdot 3^2 - 3 - 1 = 14 \neq 0$, whereas 1 and $-\frac{1}{2}$ are solutions of $2x^2 - x - 1 = 0$.
Because we do not know ahead of time what the solutions of an equation are going
to be or, in fact, whether there is any solution at all, the letter x is also called an
unknown (see page 301).
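Checking whether a given number is a solution is nothing more than arithmetic with rational numbers. The short Python sketch below (an illustration only; the helper name `p` is hypothetical) uses exact fractions to evaluate $2k^2 - k - 1$ at the three numbers just mentioned. Such checks confirm individual solutions but cannot, by themselves, show that there are no other solutions.

```python
from fractions import Fraction

def p(k):
    """The number 2k^2 - k - 1 obtained by putting x = k in 2x^2 - x - 1."""
    return 2 * k * k - k - 1

# k is a solution of 2x^2 - x - 1 = 0 exactly when p(k) == 0.
for k in [Fraction(3), Fraction(1), Fraction(-1, 2)]:
    print(k, p(k), p(k) == 0)
# 3 14 False
# 1 0 True
# -1/2 0 True
```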
6.2. SOLVING LINEAR EQUATIONS IN ONE VARIABLE 323

To solve an equation is to obtain all the solutions of the equation. In the
preceding example of $2x^2 - x - 1 = 0$, we already know that 1 and $-\frac{1}{2}$ are solutions,
but we have not yet solved this equation because there may be other solutions. It
turns out that 1 and $-\frac{1}{2}$ are all the solutions of $2x^2 - x - 1 = 0$ (see Section 2.1 of
[Wu2020b]). Assuming this fact, we have solved the equation $2x^2 - x - 1 = 0$, and
its solutions are 1 and $-\frac{1}{2}$.
In general, we have no prior guarantee of how many solutions an equation has.
There may be exactly one (e.g., 3x = 2), there may be an infinite number (e.g.,
$x^2 + 2x + 1 = (x + 1)^2$), or there may be none (e.g., $x^2 + 1 = x^2 - 3$). To the
extent that there may be many solutions to a given equation in x, the symbol x is
sometimes called the variable of the equation to indicate that there may be more
than one number x to make the equation valid (see page 301 again).

Activity. Prove the assertions about the number of solutions of the given
equations in the preceding paragraph.

When the two expressions are linear polynomials, let us say ax + b and cx + d
in x, where a, b, c, d are given numbers, then the question of whether there is a
number k so that ak + b = ck + d is called a linear equation in one variable,
the variable being x. The given numbers a, b, c, d are the constants (see page 301)
of the equation. The usual representation of this linear equation is
ax + b = cx + d.
Be that as it may, one should not lose sight of the fact that "ax + b = cx + d"
is nothing more than a compact representation of the preceding question. More
generally, if two number expressions in a number x have the property that each
expression becomes an expression of the form ax + b after applications of the com-
mutative, associative, and distributive laws, then the question about whether they
are equal or not will also be called a linear equation in one variable. Thus
the equation in x, 5x − 75 + (2 − 4x) = 6x − (68 + 3x) − 7, is a linear equation in
one variable because it is easily seen to be the linear equation x − 73 = 3x − 75
(5x − 75 + (2 − 4x) = x − 73 and 6x − (68 + 3x) − 7 = 3x − 75).
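The bookkeeping behind "easily seen" can be sketched in Python. In this illustration (not the author's notation), a linear expression ax + b is recorded by its pair of numbers (a, b), and the hypothetical helpers `lin`, `add`, and `sub` combine pairs exactly as the commutative, associative, and distributive laws allow. The sketch assumes every sub-expression is already linear; it simply reproduces the two reductions stated above.

```python
from fractions import Fraction

# A linear expression a*x + b is represented by the pair (a, b).
def lin(a, b=0):
    return (Fraction(a), Fraction(b))

def add(p, q):
    # (a x + b) + (c x + d) = (a + c) x + (b + d)
    return (p[0] + q[0], p[1] + q[1])

def sub(p, q):
    # (a x + b) - (c x + d) = (a - c) x + (b - d)
    return (p[0] - q[0], p[1] - q[1])

# Left side: 5x - 75 + (2 - 4x)
left = add(sub(lin(5), lin(0, 75)), sub(lin(0, 2), lin(4)))
# Right side: 6x - (68 + 3x) - 7
right = sub(sub(lin(6), add(lin(0, 68), lin(3))), lin(0, 7))

print(left)   # (Fraction(1, 1), Fraction(-73, 1)), i.e. x - 73
print(right)  # (Fraction(3, 1), Fraction(-75, 1)), i.e. 3x - 75
```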
What has been said about equations in one variable extends to equations in any
number of variables. For our needs, we single out the case of two variables. Thus, an
equation in two variables x and y, normally written as f(x, y) = g(x, y), is a
question that asks, when two expressions f(x, y) and g(x, y) in the numbers x and y
are given, whether there is a pair of numbers k and ℓ so that f(k, ℓ) = g(k, ℓ). Such
a pair k and ℓ is called a solution of f(x, y) = g(x, y). To solve the equation
f(x, y) = g(x, y) is to find all its solutions.

Pedagogical Comments. Some readers may be aghast that we actually take
the trouble to give a definition of an equation. Isn’t an equation something that
students solve, and don’t they all recognize this fact with ease? Perhaps, but
there are at least two reasons for offering such a definition. The first is education
researchers’ finding that many elementary and middle school students interpret
the equal sign as an operational command to perform a calculation rather than
as a statement about two numbers, two sets, etc., being "equal" and that this
misconception hampers their learning of algebra (see, e.g., [Carpenter et al.] and
[McNeil et al.]). In other words, deep down, many students do not know what an
equation is! A second reason is more subtle. We have emphasized all along the need
for precision and transparency to make school mathematics more learnable. Now,
an equation such as $x^2 = 7x - 10$ has a "variable" x on each side, and a "variable"
is something that "varies", according to TSM. So if both of the expressions $x^2$ and
7x − 10 are "varying" all the time, what does it mean to say that they are "equal"?
It certainly does not mean that no matter how x varies, the equality $x^2 = 7x - 10$
holds for all values of x (try x = 0, for instance). It is a fact that TSM does not
even explain what the equality $x^2 = 7x - 10$ involving a variable x is all about. Yet,
by routinely asking students to solve such an "equation", TSM is as good as telling
them not to ask what an "equation" is23 but just go ahead and "do something with
it to get an answer". We can all agree that it is imperative to remove this mindset
from school classrooms. To this end, a first step will be to offer students a precise
definition of an equation and, strictly on the basis of this definition, show them how
to solve the equation. We will do just that in the next subsection.
We note that this definition of an equation in a symbol is not the traditional
one in education research (cf. [McCrory et al.] and further references therein). End
of Pedagogical Comments.

How to solve an equation

In this subsection, we will limit ourselves to solving linear equations of the form
ax + b = cx + d. The main theorem about solutions to a linear equation is Theorem
6.3 on page 326. Note, however, that the reason we spend the effort to explain how
to solve linear equations is that the same principle will be broadly applicable to the
solution of any equation. Therefore it will be effort well spent.
In TSM, a linear equation in the number x such as
$$3x - 4 = \frac{27}{5}x + 1 \tag{6.13}$$
is supposed to be solved via the following easily recognizable steps.
(a) $-\frac{27}{5}x + (3x - 4) = -\frac{27}{5}x + \left(\frac{27}{5}x + 1\right)$.
(b) $-\frac{12}{5}x - 4 = 1$.
(c) $\left(-\frac{12}{5}x - 4\right) + 4 = 1 + 4$.
(d) $-\frac{12}{5}x = 5$.
(e) $\left(-\frac{5}{12}\right)\left(-\frac{12}{5}x\right) = \left(-\frac{5}{12}\right) \cdot 5$.
(f) $x = -\frac{25}{12}$.
The conclusion is that $-\frac{25}{12}$ is the solution of the equation $3x - 4 = \frac{27}{5}x + 1$.

Perhaps you are so accustomed to this routine method of solution that you can
no longer see its many flaws or imagine how it can stultify a beginner’s mathematics
learning. Let us consider a few questions that a beginner might raise:
(A) How do we know that the "solution" in step (f), i.e., $-\frac{25}{12}$, is a solution
of equation (6.13)? Nothing in the computations given in (a)–(f) shows that $-\frac{25}{12}$
satisfies equation (6.13) or that it is the only solution.
(B) Recall that, in TSM, the x is a "variable". The six steps (a)–(f) make the
assumption that we can do arithmetic with the "variable" x as if it were a number.
If one interprets x as a "quantity that varies", then this makes no sense because,
23 Once again, "Ours is not to reason why."
thus far, there is no theorem that says "quantities that vary" satisfy the associative,
commutative, and distributive laws. Yet, (a)–(f) freely make use of these laws. For
example, in going from (a) to (b), the distributive law is used to conclude that
$-\frac{27}{5}x + 3x = -\frac{12}{5}x$. But why?

(C) Now suppose we interpret x as just a number (cf. the second item in the
TSM definitions of a "variable" on page 318); then the equations in (a)–(f) become
equalities between numbers. The question is whether the number x has to be some
special number or just any number for (a)–(f) to hold. For example, if x = 0,
then (b) would read "−4 = 1", which is absurd, and (d) would read 0 = 5, equally
absurd, and so on. If x has to be a special number, then what is it, and why wasn’t
this announced from the beginning?
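Question (C) can be made concrete with exact rational arithmetic. In the Python sketch below (an illustration only; the dictionary `steps` and the helper `holds` are hypothetical names), each of the lines (6.13), (b), (d), and (f) is read as a claim about one specific number x, namely that its two sides are equal. For x = 0 every one of these claims is false; only for the special number $-\frac{25}{12}$ are all of them true.

```python
from fractions import Fraction

# Each line, read as a claim about one number x: (left side, right side).
steps = {
    "(6.13)": lambda x: (3 * x - 4, Fraction(27, 5) * x + 1),
    "(b)": lambda x: (Fraction(-12, 5) * x - 4, 1),
    "(d)": lambda x: (Fraction(-12, 5) * x, 5),
    "(f)": lambda x: (x, Fraction(-25, 12)),
}

def holds(name, x):
    """True when the two sides of the named line are equal numbers at this x."""
    left, right = steps[name](x)
    return left == right

for x in [Fraction(0), Fraction(-25, 12)]:
    print(x, [holds(name, x) for name in steps])
# 0 [False, False, False, False]
# -25/12 [True, True, True, True]
```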
It is possible that beginners do not know enough to articulate these doubts, but
very likely they have them in the back of their minds. Yet, after years of not getting
satisfactory answers—because TSM does not provide answers to basic questions—
most students have learned to suppress their natural curiosity by the time they get
to middle school for the sake of "getting the right answer". So they stop asking why
and just follow instructions. The end result is that students coming out of TSM
know how to get answers by memorizing facts and following template solutions, but
apparently not much else. Unfortunately, in the year 2020, robots are beginning
to excel exactly at carrying out instructions to perform preassigned tasks (see,
e.g., [Paquette] and [WikiAlphaGo]). TSM therefore threatens to produce students
who, upon graduating from high school, possess no technical skills that can help
them outperform a robot. In other words, until we improve school mathematics
education, our public education system will run the risk of producing students who
become instantly obsolete upon graduation. We have to avert this catastrophe by
making a real effort to teach students how to reason, which seems—for the time
being, at least—to be beyond the capabilities of robots. Let us forsake TSM and
make an effort to shed light on every murky corner of school mathematics to make
it transparent and learnable so that we can answer students’ questions, and let us
encourage them to never cease asking why. For the case at hand, let us explain how
to solve a linear equation correctly.
We begin by confronting the fact that, as it stands, the solution method de-
scribed in (a)–(f) makes no sense.24 However, because of its simplicity and the
fact that it works, this solution method is not going away anytime soon. For this
reason, we will reinterpret (a)–(f) in two different ways to help students make sense
of them (see pp. 327 and 329). But first, we will explain a correct method of solving
equation (6.13), i.e., $3x - 4 = \frac{27}{5}x + 1$.
To begin with, we do not claim that there will be any solution to this equation.
Rather, we take the position that if there is such a solution, let us say $x_0$, then we
can find out what $x_0$ must be. Let us therefore assume that there is such a solution
$x_0$ to equation (6.13). Then, by the definition of a solution, we have
$$3x_0 - 4 = \frac{27}{5}x_0 + 1.$$
24 Mathematical Aside: From an advanced standpoint, one may be tempted to argue that
the equation $3x - 4 = \frac{27}{5}x + 1$ is taking place in the polynomial ring $\mathbb{R}[x]$. This still makes no
sense, however, because two polynomials in $\mathbb{R}[x]$ are equal if and only if the coefficients of each
monomial in the two polynomials are pairwise equal. In this case, the coefficient of x on the left
is 3, whereas the coefficient of x on the right is $\frac{27}{5}$. Therefore the two sides can never be equal in
$\mathbb{R}[x]$ either.
We may therefore apply the ordinary arithmetic operations to the numbers on both
sides of this equality. By adding $-\frac{27}{5}x_0$ to each side of this equality, we get
$-\frac{27}{5}x_0 + 3x_0 - 4 = 1$, which is
$$-\frac{12}{5}x_0 - 4 = 1. \tag{6.14}$$
In standard terminology, what we have just done is transpose the term $\frac{27}{5}x_0$ to
the other side. Using the same terminology, let us transpose −4 to the right side
by adding 4 to both sides, thereby getting $-\frac{12}{5}x_0 = 1 + 4$, which is
$$-\frac{12}{5}x_0 = 5. \tag{6.15}$$
Now multiply both sides of this equality by $-\frac{5}{12}$ and we get $x_0 = \left(-\frac{5}{12}\right) \cdot 5$,
which is
$$x_0 = -\frac{25}{12}. \tag{6.16}$$
We pause to observe that—if we are willing to conflate x0 with x—then equa-
tions (6.14)–(6.16) are identical with the equations in (b), (d), and (f) on page
324.25
To resume our discussion, let us summarize what we have accomplished: we
have proved that if there is a solution $x_0$ to equation (6.13), i.e., $3x - 4 = \frac{27}{5}x + 1$,
then this $x_0$ must be equal to $-\frac{25}{12}$. In other words, we have just proved that the
solution to equation (6.13) is unique: if it exists at all, it will have to be $-\frac{25}{12}$. Note
that this says nothing about $-\frac{25}{12}$ being a solution to the original equation (6.13).
However, the verification that, indeed, $-\frac{25}{12}$ is a solution to (6.13) is straightforward:
simply check that
$$3\left(-\frac{25}{12}\right) - 4 = \frac{27}{5}\left(-\frac{25}{12}\right) + 1.$$
This is routine, provided one is fluent in computations with rational numbers: both
sides are equal to $-\frac{123}{12}$. So $-\frac{25}{12}$ is a solution of $3x - 4 = \frac{27}{5}x + 1$.

Altogether, we have just proved that $-\frac{25}{12}$ is the unique solution of the equation
$3x - 4 = \frac{27}{5}x + 1$. Thus we have solved $3x - 4 = \frac{27}{5}x + 1$, and its solution is $-\frac{25}{12}$
(remember that solving an equation means obtaining all its solutions).
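The two-part structure of this argument (first find what an assumed solution must be, then check that the candidate works) can be mirrored in a few lines of Python using exact fractions. This is a sketch, not the author's presentation; the variable names are chosen for illustration, and every transposition is carried out on numbers, never on a "variable".

```python
from fractions import Fraction

# Step 1: assume x0 satisfies 3*x0 - 4 = (27/5)*x0 + 1 and find what x0 must be.
# Adding -(27/5)*x0 to both sides gives (3 - 27/5)*x0 - 4 = 1, i.e. (6.14).
coef = 3 - Fraction(27, 5)           # -12/5
# Adding 4 to both sides gives coef*x0 = 1 + 4, i.e. (6.15).
rhs = 1 + 4                          # 5
# Multiplying both sides by 1/coef = -5/12 gives (6.16).
x0 = (1 / coef) * rhs
print(coef, x0)                      # -12/5 -25/12

# Step 2: substitute the candidate back into the original equation.
left = 3 * x0 - 4
right = Fraction(27, 5) * x0 + 1
print(left, right, left == right)    # -41/4 -41/4 True  (and -41/4 = -123/12)
```

Note that `Fraction` always reduces to lowest terms, which is why the common value of both sides is printed as −41/4 rather than −123/12.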


The preceding reasoning can be seen to be independent of the specific equation
used ($3x - 4 = \frac{27}{5}x + 1$) and is valid in a general context. It leads to the following
theorem:

Theorem 6.3. The equation ax + b = cx + d, where a, b, c, d are constants and
$a - c \neq 0$, has the unique solution $\frac{d-b}{a-c}$.

25 The reader cannot help but notice that we have just referred to (6.14)–(6.16) and (b), (d),
(f) as "equations", whereas these are not equations in the sense of page 322. Here then is another
instance where we have to face up to a common linguistic abuse and learn to tiptoe through a
linguistic minefield: it is a common practice in mathematics to refer to any displayed collection
of symbols involving the equal sign as an "equation".
Proof. The proof is broken into two steps:

Step 1. If $x_0$ is a solution of ax + b = cx + d, then $x_0 = \frac{d-b}{a-c}$.
Step 2. $\frac{d-b}{a-c}$ is a solution of ax + b = cx + d. (Compare Exercise
9 on page 120.)

Because the proofs of these two steps are so similar to the solution of the equation
$3x - 4 = \frac{27}{5}x + 1$, we can safely leave them as Exercise 2 on page 330. The theorem
can therefore be considered to be proved.
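Theorem 6.3 translates directly into a small function. The sketch below assumes exact rational inputs via Python's `fractions.Fraction`; the name `solve_linear` is hypothetical. It computes the candidate from Step 1 and then, following Step 2, substitutes it back before returning it.

```python
from fractions import Fraction

def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d, assuming a - c != 0 as in Theorem 6.3.

    Step 1 of the proof says any solution must equal (d - b)/(a - c);
    Step 2 says this candidate really is a solution, which we re-check here.
    """
    a, b, c, d = (Fraction(t) for t in (a, b, c, d))
    if a - c == 0:
        raise ValueError("Theorem 6.3 needs a - c to be nonzero")
    x0 = (d - b) / (a - c)                 # Step 1: the only possible candidate
    if a * x0 + b != c * x0 + d:           # Step 2: substitute back (never fails with exact fractions)
        raise ArithmeticError("candidate does not satisfy the equation")
    return x0

print(solve_linear(3, -4, Fraction(27, 5), 1))   # -25/12, equation (6.13)
print(solve_linear(1, -73, 3, -75))              # 1, the equation x - 73 = 3x - 75
```

Raising an error when a = c reflects the hypothesis of the theorem; in that case the equation has either no solution or infinitely many, and the formula does not apply.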

Pedagogical Comments. We have explained why the TSM method of
solution given in (a)–(f) on page 324 makes no sense. By contrast, since the solution
of the equation $3x - 4 = \frac{27}{5}x + 1$ given above involves nothing more than simple
computations with rational numbers (and nothing about "variables"), it should be
understandable by one and all. The key point is not to compute with a "variable"
but to work with an assumed solution x0 from the beginning and, through direct
computations, deduce what x0 has to be (i.e., Step 1 in the preceding proof) and
then turn around to check (i.e., prove) that this candidate for a solution is in fact
a solution (i.e., Step 2 in the preceding proof).
In this light, we come to an understanding of what (a)–(f) on page 324 are all
about: although (a)–(f), as is, are fatally flawed because they purport to compute
with a "variable" and the equalities cannot be justified, they are seen to be proce-
durally identical to our computations with the assumed solution x0 (compare the
remarks about equations (6.14)–(6.16) and (b), (d), and (f) on page 326). This is
where Theorem 6.3 comes in: it guarantees that the computations in (a)–(f), when
x is replaced by x0 , do lead to the correct solution of the equation (as confirmed by
Step 2 in the proof of the theorem). Therefore, as far as getting the correct solution
is concerned, (a)–(f) are good for that purpose. This explains why, procedurally,
school textbooks can get away with (a)–(f) as the "method of solution" even though
it is fundamentally nonsensical.
Needless to say, for the purpose of solving equations, we do not want school
students to repeat this long-winded ritual, every time, of first getting a candidate
for the solution by assuming that there is one, finding out what it is, and then
verifying that it is a solution by substituting it back into the original equation.
Our new understanding of (a)–(f) suggests a more reasonable way to teach how
to solve linear equations (or any equation) in the school classroom: explain to
students at least the essence of the proof of Theorem 6.3, especially the idea that
one begins by assuming there is a solution in order to find out what it is and that
the computations in (a)–(f) are not computations with a "variable" but those with
a number, namely, an assumed solution of the equation. Make sure they know that
solving an equation is an exercise in number computations rather than some magical
incantation in the land of the great beyond about something called a "variable".
Once they can demonstrate an understanding of the overall structure of solving an
equation encoded in Step 1 and Step 2 of the preceding proof, students should be
allowed to use the mechanical procedure of (a)–(f) above as a shorthand method for
finding a solution. A pleasant byproduct of such an explanation is that students
will better understand why they are required to check their work by verifying the
solution so obtained (because Step 2 above is an integral part of the solution).
If you as a teacher can make sense to students about what they are learning,
they will repay your efforts by making sense to you. End of Pedagogical Com-
ments.

A second way to solve an equation

We will now revisit the proof of Theorem 6.3 and recast it in a more abstract
setting. First we define two equations to be equivalent if they have the same
solutions; i.e., every solution of one is also a solution of the other. We also introduce
the following two operations on a linear equation in a number x.
(E1) Add the same number or same monomial to the expressions
on both sides of an equation. (For example, adding 21 to both
sides of 15x−21 = 28x+3 results in the expressions 15x−21+21
and 28x + 3 + 21.26 )
(E2) Multiply the expressions on both sides of an equation by
the same nonzero number. (For example, multiplying both sides
of 15x − 21 = 28x + 3 by $\frac{1}{15}$ leads to the expressions $\frac{1}{15}(15x - 21)$
and $\frac{1}{15}(28x + 3)$.)
We can see the relevance of these operations by noting that we used (E1) to obtain
equation (b) on page 324 from equation (6.13) by adding $-\frac{27}{5}x$ to both sides of the
original equation $3x - 4 = \frac{27}{5}x + 1$, and we obtained (f) from (d) on the same page
by multiplying both sides of the equation in (d) by $\left(-\frac{5}{12}\right)$.
We should also point out a key feature of both (E1) and (E2): each operation
is reversible in the sense that if, in (E1), we add a number or monomial A to both
sides, then if we follow it by adding −A to both sides, we get back the original
equation. Similarly, in (E2), if we multiply both sides of an equation by a nonzero
number B, then if we follow it by multiplying both sides by $\frac{1}{B}$, again we get back
to the original equation. Incidentally, the second comment shows why in (E2) we
stipulate that the number B be nonzero, as otherwise $\frac{1}{B}$ would make no sense.
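To see the reversibility of (E1) and (E2) in action, one can encode a linear equation ax + b = cx + d by its four constants. This encoding and the helper names `e1` and `e2` are assumptions made for illustration; storing only coefficients means each side is kept in the simplified form ax + b rather than as the literal expressions 15x − 21 + 21, and so on.

```python
from fractions import Fraction

# Hypothetical encoding: the equation a*x + b = c*x + d is stored as (a, b, c, d).

def e1(eq, p, q):
    """(E1): add the same expression p*x + q (a number if p == 0, a monomial if q == 0) to both sides."""
    a, b, c, d = eq
    return (a + p, b + q, c + p, d + q)

def e2(eq, B):
    """(E2): multiply both sides by the same nonzero number B."""
    if B == 0:
        raise ValueError("(E2) requires a nonzero number B")
    return tuple(B * t for t in eq)

eq = (Fraction(15), Fraction(-21), Fraction(28), Fraction(3))   # 15x - 21 = 28x + 3

# Reversibility: adding A and then -A gives back the original equation ...
assert e1(e1(eq, 0, 21), 0, -21) == eq
# ... and multiplying by B and then by 1/B does too.
B = Fraction(1, 15)
assert e2(e2(eq, B), 1 / B) == eq
```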
With these preparations out of the way, we now come to the main point:
Lemma 6.4. Applying either of the operations (E1) and (E2) to a given linear
equation in one variable results in an equivalent equation.
Proof. Instead of a general proof of Lemma 6.4, we will offer a proof of the lemma
for equation (6.13), which is
$$3x - 4 = \frac{27}{5}x + 1.$$
It will be seen that the reasoning in this special case is in fact perfectly general.
First, let us prove the part of the lemma about (E1). So let us add, say, $-\frac{27}{5}x$
to both sides of (6.13) to get the expressions $-\frac{12}{5}x - 4$ and 1, and the resulting
equation is
$$-\frac{12}{5}x - 4 = 1. \tag{6.17}$$
We will show that equations (6.13) and (6.17) are equivalent. Suppose $x_0$ is a solu-
tion of equation (6.13); then we have an equality of numbers: $3x_0 - 4 = \frac{27}{5}x_0 + 1$. If

26 Notice that we are making use of Theorem 1 of the appendix in Chapter 1 (page 87) so
that there is no need to use any parentheses on either side.


6.2. SOLVING LINEAR EQUATIONS IN ONE VARIABLE 329

we add $-\frac{27}{5}x_0$ to both sides of this equality, we get $-\frac{12}{5}x_0 - 4 = 1$, which is exactly the statement that $x_0$ is also a solution of equation (6.17). Conversely, suppose $x_1$ is a solution of equation (6.17); we must show that it is also a solution of equation (6.13). By hypothesis, $-\frac{12}{5}x_1 - 4 = 1$. Now we add the number $\frac{27}{5}x_1$ to both sides of the last equality to get $3x_1 - 4 = \frac{27}{5}x_1 + 1$. But this says $x_1$ is a solution of equation (6.13), and the proof of the part of Lemma 6.4 about (E1) is complete. Note that this reasoning depends on the fact that (E1) is reversible.
The reasoning for the part of Lemma 6.4 about (E2) is similar. We may therefore consider the proof of Lemma 6.4 to be complete.
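Lemma 6.4 can also be spot-checked numerically. The following Python sketch (the function names are hypothetical) tests a batch of rational candidates against equations (6.13) and (6.17) using exact `Fraction` arithmetic. Agreement on finitely many numbers is only a sanity check, of course, and not a substitute for the proof just given.

```python
from fractions import Fraction

def satisfies_613(x):
    # Equation (6.13): 3x - 4 = (27/5)x + 1
    return 3 * x - 4 == Fraction(27, 5) * x + 1

def satisfies_617(x):
    # Equation (6.17): -(12/5)x - 4 = 1
    return Fraction(-12, 5) * x - 4 == 1

# Candidates n/12 for n = -40, ..., 40.
candidates = [Fraction(n, 12) for n in range(-40, 41)]

# Lemma 6.4 predicts that the two tests agree on every number.
assert all(satisfies_613(x) == satisfies_617(x) for x in candidates)

# The common solution is among the candidates.
print([x for x in candidates if satisfies_613(x)])   # [Fraction(-25, 12)]
```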

We are now in a position to give a Second Proof of Theorem 6.3 (see p. 326). Thus let the equation $ax + b = cx + d$ be given so that $a - c \neq 0$. By repeated applications of Lemma 6.4, we have the following:
(A) The equation $ax + b = cx + d$ is equivalent to $(-cx) + ax + b = (-cx) + cx + d$,
(B) which is equivalent to $(a - c)x + b = d$,
(C) which is equivalent to $(a - c)x + b + (-b) = d + (-b)$,
(D) which is equivalent to $(a - c)x = d - b$,
(E) which is equivalent to $\frac{1}{a-c}(a - c)x = \frac{1}{a-c}(d - b)$,
(F) which is equivalent to $x = \frac{d-b}{a-c}$.
Since the last equation clearly has the unique solution $\frac{d-b}{a-c}$, we see that the equation $ax + b = cx + d$ has the unique solution $\frac{d-b}{a-c}$. The proof of Theorem 6.3 is complete.

This new proof of Theorem 6.3 should be complemented by two comments. The first is that it gives us another way to salvage the TSM method of solving an equation given in (a)–(f) on page 324. Indeed, if we specialize the proof of Theorem 6.3 to equation (6.13), then what we get is the following:
(A) The equation $3x - 4 = \frac{27}{5}x + 1$ is equivalent to $-\frac{27}{5}x + (3x - 4) = -\frac{27}{5}x + (\frac{27}{5}x + 1)$,
(B) which is equivalent to $-\frac{12}{5}x - 4 = 1$,
(C) which is equivalent to $(-\frac{12}{5}x - 4) + 4 = 1 + 4$,
(D) which is equivalent to $-\frac{12}{5}x = 5$,
(E) which is equivalent to $(-\frac{5}{12})(-\frac{12}{5}x) = (-\frac{5}{12}) \cdot 5$,
(F) which is equivalent to $x = -\frac{25}{12}$.
Therefore the solutions of $3x - 4 = \frac{27}{5}x + 1$ are exactly the same as those of $x = -\frac{25}{12}$, which consist of one number, $-\frac{25}{12}$. Now, the foregoing steps (A), (B), . . . , (F) are the correct mathematical counterparts of the steps (a), (b), . . . , (f) on page 324, respectively. In greater detail, what (a)–(f) really try to say, in view of (A)–(F), is not that the solutions of $3x - 4 = \frac{27}{5}x + 1$ can be obtained by the computations in (a)–(f), but that (a)–(f) present a succession of equations, each equivalent to $3x - 4 = \frac{27}{5}x + 1$, so that at the end the equation so obtained is so simple (i.e., $x = -\frac{25}{12}$) that its solutions can be read off.
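The succession (A)–(F) can be replayed mechanically. The Python sketch below assumes a hypothetical encoding in which the equation px + q = p′x + q′ is stored as ((p, q), (p′, q′)) with `Fraction` entries; the helper names `add`, `mul`, and `show` are illustrative only. It prints each equivalent equation produced by (E1) and (E2), ending with one whose solution can be read off.

```python
from fractions import Fraction as F

# Hypothetical encoding: ((p1, q1), (p2, q2)) stands for p1*x + q1 = p2*x + q2.

def add(eq, a, b):
    """(E1): add a*x + b to both sides (a monomial when b = 0, a number when a = 0)."""
    (p1, q1), (p2, q2) = eq
    return ((p1 + a, q1 + b), (p2 + a, q2 + b))

def mul(eq, B):
    """(E2): multiply both sides by the nonzero number B."""
    (p1, q1), (p2, q2) = eq
    return ((B * p1, B * q1), (B * p2, B * q2))

def show(eq):
    """Render the equation as text, e.g. '3*x + (-4) = 27/5*x + (1)'."""
    (p1, q1), (p2, q2) = eq
    return f"{p1!s}*x + ({q1!s}) = {p2!s}*x + ({q2!s})"

steps = [((F(3), F(-4)), (F(27, 5), F(1)))]   # (6.13): 3x - 4 = (27/5)x + 1
steps.append(add(steps[-1], F(-27, 5), 0))     # (A)-(B): add -(27/5)x
steps.append(add(steps[-1], 0, 4))             # (C)-(D): add 4
steps.append(mul(steps[-1], F(-5, 12)))        # (E)-(F): multiply by -5/12

for eq in steps:
    print(show(eq))
# 3*x + (-4) = 27/5*x + (1)
# -12/5*x + (-4) = 0*x + (1)
# -12/5*x + (0) = 0*x + (5)
# 1*x + (0) = 0*x + (-25/12)
```

The last line is just $x = -\frac{25}{12}$ written in the encoding: every intermediate equation is a different equation, yet all of them have exactly the same solution.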
A second comment is that the proof of Theorem 6.3 shows that solving a linear equation is conceptually very simple: use (E1) to show that the given equation is equivalent to an equation of the form $Ax = B$, where $A$ and $B$ are constants and $A \neq 0$ (this is called isolating the variable; see step (D) above), and then use (E2) to conclude
330 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

that the equation $Ax = B$ has the unique solution $\frac{B}{A}$. Therefore, the latter is the unique solution of the original equation.

Pedagogical Comments. School textbooks traditionally teach the solving of linear equations by breaking it up into the solving of one-step equations, two-step equations, and multistep equations. When something as simple as the solution of a linear equation is broken up in this incomprehensible manner, and especially if at the end of the discussion no overview of what has taken place is given, distortion of the meaning of solving an equation is bound to take place in students' minds (see the comment about students' misconception of the equal sign on page 323). Please do not do this to your students when you teach. Rather, teach students to use (E1) to isolate the variable in a linear equation, thereby getting an equivalent equation $Ax = B$ for some constants $A \neq 0$ and $B$, from which one obtains the desired solution $\frac{B}{A}$.
It is worth noting that either proof of Theorem 6.3 makes it obvious why fluency with rational number computations is a precondition for learning algebra. Anyone who correctly understands what it means to solve an equation will realize that solving a linear equation is nothing more than a careful application of computations with rational numbers. Your job now is to bring this message to all your students. End of Pedagogical Comments.

Finally, we caution about a possible misunderstanding of Theorem 6.3: it does not say that every linear equation of one variable, $ax + b = cx + d$, has a unique solution. What it says is that this is true if $a - c \neq 0$. We now give some examples to round off the picture. The equation $2x - 1 = 2x + 1$ clearly has no solution, and the equation $2x - 3 = 7x - (5x + 3)$ has an infinite number of solutions, as every number is a solution (its right side simplifies to $2x - 3$). We leave the general case to an exercise below (Exercise 5).

Exercises 6.2.
(1) Prove that $\frac{b}{a}$ is a solution of the equation $ax = b$, where $a \neq 0$, and that any solution of the equation is equal to $\frac{b}{a}$.
(2) Give the details of the proofs of Step 1 and Step 2 in the proof of Theorem
6.3 on page 326.
(3) Write out a proof of Lemma 6.4 on page 328 for a general linear equation
ax + b = cx + d.
(4) Imagine that you are teaching eighth-grade algebra and you have to ex-
plain for the first time how to solve 2x + 11 = 2 − x. How would you
explain (a) what an equation is, (b) what it means to solve this equation,
and (c) how to actually solve it, step by step?
(5) Formulate a precise statement about a linear equation in x, ax+b = cx+d,
where a, b, c, d are the constants, so that the equation has a unique
solution, has no solution, and has an infinite number of solutions. (Of
course you will have to prove it.)
(6) If $\frac{5}{13}$ of a number N exceeds a third of N by 8, what is N? (This is Exercise 15 on page 68.)
(7) (In this exercise, you are assumed to know the formula for the circum-
ference of a circle.) Aaron and Warren start walking around a circle
from the same spot at the same time. They walk in the same direction,
6.3. SETTING UP COORDINATE SYSTEMS 331

Aaron at a constant speed of F feet per minute and Warren at 2F feet per minute. After T minutes, they are at the greatest distance apart for
the first time. (i) What is the radius of the circle in terms of F and T ?
(ii) Without any computations, can you explain how many laps around
the circle Warren has walked after T minutes?
(8) In a pen containing chickens and rabbits, the ratio of chickens to rabbits is 1 : 3. If we count the total number of feet of the animals in the pen, there are 294. How many chickens and how many rabbits are there?
(9) Given three consecutive whole numbers, if the sum of the smallest plus
twice the next number plus three times the largest number is 110, what
are these numbers?

6.3. Setting up coordinate systems


To prepare for the discussion of linear equations in two variables, this section
introduces the basic idea of associating each point of the plane with a unique ordered
pair of numbers (x, y), called the coordinates of the point. This is the familiar pro-
cess of fixing a pair of perpendicular lines as coordinate axes and using them as the
reference for the introduction of coordinates. Having coordinates for points makes
it possible to translate algebraic objects into geometric objects, and vice versa; the
rest of this volume and [Wu2020b] and [Wu2020c] may be regarded as nothing more
than a sequence of exercises to demonstrate how this is done. One striking example
is the rendering of the distance between two points (a geometric concept) into an
algebraic expression in terms of their coordinates (see the distance formula on page
336).

We are going to introduce coordinates in the plane, in the sense that we will associate to each point of the plane an ordered pair of numbers, and vice versa.
Because you already have a procedural familiarity with these concepts, we will be
brief. Incidentally, we will get to see in this process another reason why we took
the trouble to prove Theorem G4 on page 226, which asserts that opposite sides of
a parallelogram have equal length.
Choose two perpendicular lines in the plane so that one of them is horizontal;27
let them intersect at a point to be called O. The horizontal line is traditionally
designated as the x-axis, and the vertical one the y-axis. Together, the x-axis and
the y-axis are called the coordinate axes and the point O is called the origin.
The totality of the coordinate axes and the origin is said to form a coordinate
system.
We proceed to specify the choices of 0 and 1 on each coordinate axis to make it into a number line (see assumption (L3) on page 167). The point O will be the 0 on the x-axis, and the point A on the x-axis to the right of O with dist(A, O) = 1 will be the 1 on the x-axis. Now let ϕ be the 90-degree counterclockwise rotation around O; then ϕ(A) lies on the y-axis above O. On the

27 Mathematical Aside: As is well known, it is irrelevant whether either line is horizontal, but it is customary to keep things simple at the beginning by making this requirement.



y-axis, let O be the 0 and ϕ(A) be the 1, as shown:


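[Figure: the x-axis X and the y-axis Y meeting at O; the point A = 1 on the x-axis is carried by the rotation ϕ to the point ϕ(A) = 1 on the y-axis.]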

Of course the 1 on the y-axis could also have been specified as the point B on the
y-axis above O so that dist(B, O) = 1. This is because ϕ, being distance-preserving,
maps A to B (see (L7)(ii) on page 237). By our choices, the positive numbers on
the x-axis are to the right of O and the positive numbers on the y-axis are above O
on the y-axis. Naturally, the half-line (see Lemma 4.5 on page 173) on the x-axis
to the right of O is called the positive x-axis; the opposite half-line on the x-
axis is then the negative x-axis. The half-line on the y-axis above O is similarly
called the positive y-axis, and the opposite half-line on the y-axis is the negative
y-axis.
Making a particular choice of the origin and the x- and y-axes is referred to
as setting up a coordinate system. Clearly, there are an infinite number of
ways to set up a coordinate system; i.e., there are an infinite number of choices of a
pair of perpendicular lines (so that one of them is horizontal) as the x-axis and the
y-axis. In the following discussion, it will be understood that we have made a fixed
choice of a coordinate system. Let us pause to observe what we have achieved.
Let $\mathbb{R}^2$ denote the set of all ordered pairs of real numbers,28 which by convention are written as (x, y) (where $x, y \in \mathbb{R}$), so that the pairs (a, b) and (b, a) are considered to be distinct if $a \neq b$. Therefore in $\mathbb{R}^2$, $(3, 4) \neq (4, 3)$, and in general,
$$(a, b) = (c, d) \iff a = c \text{ and } b = d.$$

Again, note that there is no ambiguity as to what the equality between two ordered
pairs of numbers means.
Now, relative to a fixed coordinate system, we will assign an ordered pair of
numbers to each point Q in the plane in the following way. We call any line that
is either the given x-axis or parallel to the x-axis a horizontal line, and also any
line that is either the given y-axis or parallel to the y-axis a vertical line. Then
through Q draw two lines, one vertical and one horizontal, so that they intersect
the x-axis at a point A and the y-axis at a point B, respectively. Let the point
A correspond to the number a on the x-axis and the point B correspond to the
number b on the y-axis. Then the ordered pair of numbers (a, b) are said to be the
coordinates of Q (relative to the chosen coordinate axes); we write Q = (a, b),
and a is called the x-coordinate and b the y-coordinate of Q (relative to the
chosen coordinate axes). For clarity, we denote the x-axis and the y-axis by the
letters X and Y (sometimes x and y), respectively.

28 The superscript "2" in $\mathbb{R}^2$ reminds us that two real numbers are involved.

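[Figure: a point Q to the left of the y-axis and above the x-axis; the vertical line through Q meets the x-axis at A = (a, 0), and the horizontal line through Q meets the y-axis at B = (0, b).]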
Observe that, for this particular Q, a = −|OA| and b = |OB|. Also observe that
A = (a, 0) and B = (0, b). It is customary to abuse the notation and replace the
points (a, 0) on the x-axis by a and the points (0, b) on the y-axis by b. Therefore
the preceding picture is normally labeled as follows:
[Figure: the same picture, with A labeled a and B labeled b.]

Activity. Prove that a horizontal line is always perpendicular to a vertical line.

The assignment Q → (a, b) then defines a transformation Φ (upper case phi) from the plane to $\mathbb{R}^2$ once a coordinate system is fixed.29 Thus, we have
$$\Phi : \text{the plane} \to \mathbb{R}^2$$
so that Φ(Q) = (a, b) as above. Conversely, with respect to the same pair of coordinate axes, we now show how to define a transformation $\Psi : \mathbb{R}^2 \to$ the plane (Ψ = upper case psi). Thus, given an ordered pair of numbers (x, y) in $\mathbb{R}^2$, we define Ψ(x, y) to be the point of intersection of the vertical line passing through (x, 0) and the horizontal line passing through (0, y).30 These two lines being unique, by virtue of the parallel postulate, the point of intersection is also unique. Thus the point Ψ(x, y) is well-defined. It is easy to see that the composition $\Psi \circ \Phi$ is the identity map of the plane and the composition $\Phi \circ \Psi$ is the identity map of $\mathbb{R}^2$. Thus, once a coordinate system in the plane is fixed, Φ is a bijection between all the points in the plane and $\mathbb{R}^2$, the set of all ordered pairs of numbers (see page 205 for the definition of bijection).
In view of the bijection Φ : the plane → $\mathbb{R}^2$, we will often denote the plane by $\mathbb{R}^2$. But be aware that, in so doing, it is always understood that the choice of
29 Strictly speaking, we have only defined transformations from the plane to the plane, so Φ is not a "transformation". However, (1) we will define a function in Section 1.1 of [Wu2020b], and this Φ will be seen to be a function, and (2) we can broaden the definition of a transformation at this point to include such a Φ if we wish.
30 Notice that, strictly speaking, we should write Ψ((x, y)) rather than Ψ(x, y). But tradition dictates that we use the latter and not the former, and it must be said that, as a matter of convenience, tradition got it right this time!

a coordinate system has been made. Still with the bijection Φ : the plane → $\mathbb{R}^2$ understood, we usually identify a point with its corresponding ordered pair of coordinates.

The x-axis, and more generally a horizontal line H that intersects the y-axis at a number c, separates the plane into two parts. Denote by Up (respectively, Lo) all the points P so that the horizontal line passing through P intersects the y-axis at a number > c (respectively, < c). Then it is clear that the plane is the disjoint union (see page 175) of the nonempty sets Up, Lo, and H. Up and Lo are called the upper half-plane and lower half-plane of H, respectively. A point in Up is said to be above H; likewise, a point in Lo is said to be below H.

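[Figure: a horizontal line H meeting the y-axis at c, with the upper half-plane Up above H and the lower half-plane Lo below it.]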

Some clarification of the terminology is in order. In assumption (L4) on page 176, we introduced the concept of the half-planes of a line, and it may be thought that the terminology of upper and lower half-planes for Up and Lo could lead to confusion. Actually this is not the case, because we will show in Section 1.4 of [Wu2020b] that Up and Lo are indeed "half-planes" in the sense of (L4) (compare also Exercise 3 on page 337). However, since all that matters to us in the present discussion is being able to refer to the sets Up and Lo, and not so much whether Up and Lo satisfy the properties listed in (L4), we will defer the discussion of their relationship with (L4) to Section 1.4 of [Wu2020b].
There is clearly an analogous discussion for vertical lines. Thus, suppose a
vertical line V intersects the x-axis at a number d; then we can define the left
half-plane L of V (respectively, the right half-plane R of V) to be all the
points so that the vertical line passing through each of them intersects the x-axis
at a number < d (respectively, > d). Then the plane is the disjoint union of the
nonempty sets L, R, and V . A point in L is said to be to the left of V , and a
point in R is said to be to the right of V .

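[Figure: a vertical line V meeting the x-axis at d, with the left half-plane L to its left and the right half-plane R to its right.]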

We will now use the new terminology of upper half-plane, etc., to give another
interpretation of the coordinates of a point P . First we recall the picture:
[Figure: the point P, with A = (a, 0) on the x-axis directly below P and B = (0, b) on the y-axis at the level of P.]

By construction, PAOB is a parallelogram. By Theorem G4 on page 226, the length of the segment PB is just |a|. Likewise, the length of the segment PA is just |b|. Since the line $L_{PA}$ is parallel to the y-axis and the y-axis is perpendicular to the x-axis, we see that $L_{PA}$ is perpendicular to the x-axis (Theorem G3 on page 224). For the same reason, $L_{PB}$ is perpendicular to the y-axis. Thus, |a| is in fact the distance of P from the y-axis in the sense of the definition on page 228, and |b| is the distance of P from the x-axis. We have therefore obtained a new interpretation of the coordinates of P:
The x-coordinate of P is the distance of P from the y-axis if
P is in the right half-plane of the y-axis, and it is minus this
distance of P from the y-axis if P is in the left half-plane of the
y-axis. The y-coordinate of P is likewise the distance of P from
the x-axis if P is in the upper half-plane of the x-axis, and it
is minus this distance of P from the x-axis if P is in the lower
half-plane of the x-axis.
The intersections of the four half-planes of the x-axis and the y-axis of a given coordinate system form four regions in the plane $\mathbb{R}^2$. These are called the four quadrants of the coordinate system and are labeled I, II, III, and IV, as shown. Notice that the four quadrants do not include any points on the coordinate axes.
[Figure: the coordinate axes through O divide the plane into quadrant I (upper right), II (upper left), III (lower left), and IV (lower right).]

More formally, the four quadrants are defined as follows:
Quadrant I: all the points (x, y) so that x > 0 and y > 0.
Quadrant II: all the points (x, y) so that x < 0 and y > 0.
Quadrant III: all the points (x, y) so that x < 0 and y < 0.
Quadrant IV: all the points (x, y) so that x > 0 and y < 0.
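The formal definitions of the quadrants, and of the upper and lower half-planes of a horizontal line, translate directly into comparisons of coordinates. Here is a minimal Python sketch; the function names `quadrant` and `side_of_horizontal` are hypothetical, and the returned labels are the ones used in the text.

```python
def quadrant(x, y):
    """Return 'I', 'II', 'III' or 'IV' for the point (x, y), following the
    formal definition of the quadrants, or None if the point lies on a
    coordinate axis (quadrants contain no points of the axes)."""
    if x > 0 and y > 0:
        return "I"
    if x < 0 and y > 0:
        return "II"
    if x < 0 and y < 0:
        return "III"
    if x > 0 and y < 0:
        return "IV"
    return None  # x == 0 or y == 0: the point is on the y-axis or the x-axis

def side_of_horizontal(y, c):
    """Classify a point with y-coordinate y relative to the horizontal line H
    that meets the y-axis at c: 'Up' (above H), 'Lo' (below H), or 'on H'."""
    if y > c:
        return "Up"
    if y < c:
        return "Lo"
    return "on H"

print(quadrant(-2, 3), quadrant(0, 5), side_of_horizontal(-1, 1))   # II None Lo
```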
By introducing coordinates in the plane, we establish a bridge between geometry
(points in the plane) and algebra (an ordered pair of numbers). This was the

great insight of Pierre Fermat and René Descartes.31 The study of geometry using
coordinates is usually called analytic geometry. An example of the immediate
impact of the introduction of coordinates is the following distance formula between
two points in the plane. Suppose two points (a, b) and (c, d) in the coordinate plane
are given. We want to compute the distance between (a, b) and (c, d) in terms of the
four coordinates. By (L5) (page 184), this distance is the length of the hypotenuse
of the right triangle whose legs are the horizontal and vertical segments, as shown:
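[Figure: the points (a, b), (c, b), and (c, d), with (a, b) and (c, b) on the horizontal line H through b on the y-axis, and (c, b) and (c, d) on the vertical line V through c on the x-axis; the segment joining (a, b) and (c, d) is the hypotenuse of the right triangle with right angle at (c, b).]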
Now, (a, b) and (c, b) are points on the horizontal line H passing through the point b on the y-axis. Hence,
$$\text{distance between } (a, b) \text{ and } (c, b) = |a - c| \tag{6.18}$$
(see (2.37) on page 126). Likewise, (c, b) and (c, d) are points on the vertical line V passing through the point c on the x-axis. Hence,
$$\text{distance between } (c, b) \text{ and } (c, d) = |b - d|. \tag{6.19}$$
It therefore follows from the Pythagorean theorem that the distance between (a, b) and (c, d) is $\sqrt{|a - c|^2 + |b - d|^2}$. Since $|x|^2 = x^2$ for any number x, we have $|a - c|^2 = (a - c)^2$ and $|b - d|^2 = (b - d)^2$. Hence,
$$\text{distance between } (a, b) \text{ and } (c, d) = \sqrt{(a - c)^2 + (b - d)^2}. \tag{6.20}$$
This is the distance formula we are after.
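Formula (6.20) is easy to put to work. The Python sketch below (the function name `distance` is hypothetical) follows the derivation step by step: the two leg lengths from (6.18) and (6.19), then the Pythagorean theorem. It uses a floating-point square root, so results are approximate in general.

```python
import math

def distance(P, Q):
    """Distance between P = (a, b) and Q = (c, d) by formula (6.20)."""
    (a, b), (c, d) = P, Q
    horizontal = abs(a - c)   # (6.18): from (a, b) to (c, b)
    vertical = abs(b - d)     # (6.19): from (c, b) to (c, d)
    return math.sqrt(horizontal ** 2 + vertical ** 2)   # Pythagorean theorem

print(distance((1, 2), (4, 6)))   # 5.0
```

Python's standard `math.hypot` computes the same quantity; the explicit version is kept here only because it mirrors (6.18)–(6.20) line by line.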

Before we leave the topic of setting up a coordinate system, we point out a practical issue for future reference about the pictorial representation of coordinate axes in the plane. If P is a point on the positive x-axis representing the number t, then dist(P, O) = t (see (L5)(iii) on page 184), and the number on the positive y-axis represented by ϕ(P) (where ϕ is the 90° counterclockwise rotation around O) is also t, because ϕ is distance-preserving. Nevertheless, in the usual pictorial presentations of the coordinate plane in textbooks, it is not practical to insist at all costs that the 1's on the x- and y-axes be visibly equidistant from O, because if t is large, it would be difficult to represent a point such as (t, 50t) on the ordinary page of a book. We will have further comments on this issue on page 360.

31 The French mathematician René Descartes (1596–1650) is probably more famous as a philosopher ("I think, therefore I am"). In addition to discovering analytic geometry with Fermat, he was instrumental in codifying the modern symbolic notation. We will come across him again in Section 3.2 of [Wu2020b].
6.4. LINES IN THE PLANE AND THEIR SLOPES 337

Exercises 6.3.
(1) Let ρ be the rotation of 180° with respect to the origin O. Prove that ρ(x, y) = (−x, −y) for all (x, y).
(2) (a) Let L be the vertical line x = c. Prove that the reflection with respect
to L is given by the transformation Λ so that Λ(x, y) = (2c − x, y) for
all (x, y). (b) Formulate the corresponding statement for reflection with
respect to a horizontal line y = d.
(3) (a) Prove that the set Up defined on page 334 is convex. (Caution: This
is not as straightforward as you may think. Use Lemma 4.8 on page 178.)
(b) Prove that the set Lo defined on page 334 is also convex.
(4) Given two distinct points P = (a, b) and Q = (c, d), if $P_1, P_2, \ldots, P_n$ are points on the segment PQ which divide PQ into n + 1 segments of equal lengths, what are the coordinates of each $P_i$, for i = 1, . . . , n? (Hint: Make use of Exercise 11 on page 238.)
(5) (a) Write down the set of all the points which are equidistant from (1, 0) and (4, 0) and describe the relationship of this set with (1, 0) and (4, 0). (b) Write down the set of all the points which are equidistant from $P = (p_1, p_2)$ and $Q = (q_1, q_2)$ and describe the relationship of this set with P and Q.
(6) (a) Given two points $P = (p_1, p_2)$ and $Q = (q_1, q_2)$, show that the midpoint of the segment PQ has coordinates $(\frac{1}{2}(p_1 + q_1), \frac{1}{2}(p_2 + q_2))$. (Hint: Compare Exercise 4 on page 294.) (b) Given P = (1, 2) and Q = (3, −1), find the point A on the segment PQ so that $\frac{|PA|}{|AQ|} = \frac{3}{2}$. (c) Given two points $P = (p_1, p_2)$ and $Q = (q_1, q_2)$, find the point B on the segment PQ so that $\frac{|PB|}{|BQ|} = \frac{m}{n}$, where m and n are given positive integers.
(7) If S is a geometric figure in the plane, define $S_1$ to be the collection of all the points P in the plane so that there is some point Q ∈ S of distance ≤ 1 from P, i.e., so that |PQ| ≤ 1 for some Q ∈ S. (i) If S is a point O, then show that $S_1$ is the closed disk of radius 1 centered at O. (ii) If S is the unit segment from O to (1, 0), what is $S_1$? Give reasons.

6.4. Lines in the plane and their slopes


The main goal of this section is to give a detailed exposition of the key concept that underlies the proof of the theorem in the next section that the graph of a linear equation in two variables is a line, namely, the slope of a line. Students' confusion about the concept of slope is well known. The root cause of this confusion is TSM's failure to give slope a correct definition. We therefore aim to set the record straight regarding slope: first, by defining it correctly and, second, by showing that slope is, above all, a number attached to the line itself that describes the "slant" and the "steepness" of the line (assumed to be nonvertical). There are some subtleties in the definition of slope that should be left out of the typical middle or high school classroom but of which every teacher should nonetheless be aware, and these are duly pointed out (see (‡) on page 344).
The local slopes of lines at a point (p. 338)
The slope of a line (p. 342)
A formula for slope (p. 347)

The local slopes of lines at a point

We will begin by explaining what purpose the concept of the slope of a line
is supposed to serve and then show how to define it correctly. To put the last
statement in context, think of the corresponding situation regarding the concept
of a "fraction". As far as TSM is concerned, it is sufficient that students know a
fraction is a piece of pizza or, more generally, a part-of-a-whole. By contrast, we
have seen the need to define a fraction as a certain point on the number line before
we can develop the subject of fractions in a logical and coherent manner. We are
going to do the same with the slope of a line.32
Our starting point is the consideration of all the nonvertical lines passing
through a fixed point P . To further simplify matters, let P be the origin O. In
what way can we distinguish the following lines from one another?
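[Figure: four nonvertical lines $L_1$, $L_2$, $L_3$, $L_4$ through the origin O; $L_1$ and $L_2$ slant upward to the right, while $L_3$ and $L_4$ slant downward to the right.]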
Intuitively, these lines differ in their "steepness": $L_1$ is steeper than $L_2$, and $L_4$ is steeper than $L_3$, though $L_1$ and $L_2$ differ from $L_3$ and $L_4$ because the two groups slant differently. Part of the challenge will be to figure out how to separate the "steepness" of these two groups of lines. From this point of view, we eliminate the vertical line (the y-axis) from our consideration at the outset because, having the ultimate "steepness", the vertical line needs no further discussion.
The next step is to convert these appealing but vague ideas into precise math-
ematics: can we use a number to measure the "steepness" of the nonvertical lines
passing through O? Let us denote the vertical line passing through the point (1, 0)
on the x-axis by {x = 1}; this notation will be explained in the first subsection
of Section 6.5 on pp. 351ff. Observe that, by the definition of the coordinates of
a point, all the points on {x = 1} have coordinates (1, y), where y ∈ R. Now a
nonvertical line L passing through O is not parallel to {x = 1} and must intersect
the latter at a point (1, s) for some s ∈ R. We assign the number s to the line L
as a measure of its "steepness". See the following picture:

32 The following exposition on slope has been used in [EngageNY] and [Eureka] with the author's consent.

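[Figure: the vertical line {x = 1} through (1, 0), together with lines through O: L meets it at (1, s), $L_2$ at $(1, s_2)$, and $L_3$ at $(1, s_3)$ below the x-axis; the steeper lines $L_1$ and $L_4$ are also shown.]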

We now explain intuitively how this number s serves the purpose of revealing "the steepness of L" on two different levels. First, on the qualitative level, if s > 0, then L intersects the vertical line {x = 1} at a point above the x-axis (in the sense of page 334) and therefore L will slant this way /. However, if s < 0, then L intersects the vertical line {x = 1} at a point below the x-axis (in the sense of page 334) and therefore L will slant this way \. See $L_3$ in the preceding picture, for example. Thus the sign of s (i.e., whether s is positive or negative) already tells us the way the line L slants, whether it is / or \. One can go further, however. It is visibly obvious that the closer L gets to being vertical, the larger the absolute value |s| of s is going to be. For example, $L_1$ will intersect the line {x = 1} at a point $(1, s_1)$ very high above the x-axis so that $s_1$ is going to be very large, whereas $L_4$ will intersect the line {x = 1} at a point $(1, s_4)$ very far below the x-axis and therefore $s_4$ will be "large negative" or, more correctly, $s_4 < 0$ and $|s_4|$ is very large. This number s is therefore a good measure of the "steepness" of L. We will call this s the local slope of L at O. We emphasize the need to refer to the whole phrase, "the local slope of L at O", because, up to this point, this concept refers only to the behavior of L at one point: the origin O.
If two nonvertical lines passing through O have the same local slope at O, then these lines join O to the same point on the vertical line {x = 1} and therefore are the same line (assumption (L1) on page 165). Conversely, if the two lines passing through O coincide, then of course they have the same local slope at O. We may therefore summarize this short discussion in the following lemma.

Lemma 6.5. Two lines passing through O have the same local slope at O if and
only if they are the same line.

There is another way to look at the use of the vertical line {x = 1} to measure
the local slope of a nonvertical line at O. When we say we assign s to be the local
slope of L at O if L intersects the vertical line {x = 1} at the point (1, s), it is
equivalent to saying that we are looking at the line {x = 1} as a number line so that
its 0 is the point (1, 0) and its 1 is the point (1, 1). Equivalently, we are identifying
the line {x = 1} with the y-axis by using the translation along the horizontal vector
from O to (1, 0). This point of view will be useful below.

Pedagogical Comments. In the school classroom, the preceding definition of the local slope of L at O is likely to raise at least two questions. The first is:

why use the vertical line {x = 1} instead of the vertical line {x = 2} that passes
through (2, 0), for example? The answer is that the choice of the line {x = 1} to
measure the local slope of L at O is just a convention, as we now explain. Suppose
L intersects the line {x = 1} at the point (1, s), so that the local slope of L at
O is s. If we had chosen to use the line {x = 2} instead, then the same L would
intersect the line {x = 2} at (2, 2s) (use Theorem G15* on page 266; see picture).

It follows that if the line x = 2 is used instead of x = 1, then the local slope of
L at O would change from s to 2s for every s. In terms of our understanding of
what the "local slope of L at O" means, this hardly matters. More generally, the
following Activity gives the complete picture.

Activity. Let the local slope of a line L at O be s. Show that L intersects the vertical line x = k at the point (k, ks) for every k, positive or negative. (Use Theorem G11 (FTS*) on page 257 instead of Theorem G15*.)

This Activity shows that if we use the vertical line x = k for a number k > 0,
instead of the line x = 1, to measure the local slope of L at O, where L is any line
passing through the origin O, then the only effect that this change will have is that
the local slope of L at O will be multiplied by a factor of k. Therefore we may as
well keep things simple by using the line x = 1.
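Taking the conclusion of the Activity as given (L meets the line x = k at (k, ks)), the local slope at O can be recovered from any point of L that is off the y-axis. The Python sketch below is a hypothetical illustration with exact rationals; the function names `meets_vertical` and `local_slope_at_origin` are not from the text.

```python
from fractions import Fraction

def meets_vertical(s, k):
    """Point where the line through O with local slope s at O meets the
    vertical line x = k.  This simply encodes the Activity: (k, k*s)."""
    k = Fraction(k)
    return (k, k * s)

def local_slope_at_origin(point):
    """Local slope at O of the line joining O to point = (k, t), where k != 0.
    That line meets x = k at (k, t), so by the Activity t = k*s, i.e. s = t/k."""
    k, t = (Fraction(v) for v in point)
    if k == 0:
        raise ValueError("the point must not lie on the y-axis")
    return t / k

s = Fraction(3, 4)
assert meets_vertical(s, 1) == (1, s)          # the defining point (1, s)
assert meets_vertical(s, 2) == (2, 2 * s)      # the point (2, 2s) mentioned above
assert local_slope_at_origin((Fraction(-2, 3), 5)) == Fraction(-15, 2)
```

With k = −1 the same function returns (−1, −s), which is exactly the point that comes up in the second question below.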
A second question that is likely to be asked is why not use vertical lines to the
left of O to measure the local slope of L at O? Notice that the lines x = k for
k > 0 are to the right of O, so this question is tantamount to asking why not use
the vertical lines x = k for k < 0? In fact, why not use the vertical line x = −1?
The reason for not using x = −1 is again a matter of convention. So suppose we
consider a line L passing through O which is slanted this way /. In our present
definition, the local slope of L at O is a positive number, let us say, s. Now suppose
we switch over to the line x = −1 for measuring the local slope of L at O. Then
the same L will intersect the vertical line x = −1 at the point (−1, −s), which is
easy to see (e.g., use congruent triangles or the 180-degree rotation around O).

[Figure: a line L through O slanting this way /, meeting the line x = 1 at height s and the line x = −1 at the point (−1, −s).]

In particular, the local slope of L at O is now −s, and −s is negative. However, a longstanding tradition in mathematics is that lines that are slanted this way / should have positive local slope at O. This tradition then rules out using a vertical line to the left of O to measure the local slope of a line at O. End of Pedagogical Comments.

Now let P be an arbitrary point, and consider all the nonvertical lines passing
through P . We ask the same question of how to distinguish these lines one from
the other.

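[Figure: four nonvertical lines $L_1$, $L_2$, $L_3$, $L_4$ through a common point P in the coordinate plane with origin O.]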

There is an obvious answer. We can pretend that P is just the origin O and do to it what we did to O to get the local slope of each line at P, and then use the local slope (steepness) to distinguish these lines. More precisely, let X′ and Y′ be the horizontal and vertical lines passing through P, respectively. These correspond to the x-axis and y-axis, respectively.
342 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

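[Figure: the horizontal line X′ and the vertical line Y′ through P; Q is the point of X′ 1 unit to the right of P, and LY is the vertical line through Q. The line L1 meets LY 1 unit above Q, and the line L2 meets LY 0.75 unit below Q.]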
Let Q be the point on X′ at distance 1 to the right of P and let LY be the vertical line through Q. Then LY corresponds to the vertical number line {x = 1}. The distance between the parallel lines Y′ and LY is thus 1 (see page 228 for the concept of distance between parallel lines). If the coordinates of P are (x0, y0), then the coordinates of Q are (x0 + 1, y0). All the points on LY have coordinates of the form (x0 + 1, y0 + y), where y is a real number. Clearly, y > 0 if and only if (x0 + 1, y0 + y) is above Q, and y < 0 if and only if (x0 + 1, y0 + y) is below Q. We can now define the local slope of a line L at P by making use of the horizontal line X′ and the vertical line LY. So let L be a line passing through P and let L intersect the line LY at the point (x0 + 1, y0 + s). Then the local slope of L at P is, by definition, s. For example, still referring to the preceding picture, the line L1 intersects LY at (x0 + 1, y0 + 1), and therefore its local slope at P is 1. The line L2 intersects LY at the point (x0 + 1, y0 − 0.75), so its local slope at P is −0.75.
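The definition amounts to a one-line computation: the local slope is the signed offset of M from Q along LY. In this sketch, the helper name `local_slope_at` and the sample point P = (2, 5) are hypothetical. The point M where L meets LY is assumed to be already known.

```python
from fractions import Fraction

def local_slope_at(P, M):
    """Local slope at P of the line L through P and M, where M is the point
    at which L meets the vertical line L_Y: x = x0 + 1 through Q = (x0 + 1, y0).
    """
    x0, y0 = P
    xm, ym = M
    if xm != x0 + 1:
        raise ValueError("M must lie on the vertical line x = x0 + 1")
    return ym - y0      # signed height of M above (positive) or below (negative) Q

P = (Fraction(2), Fraction(5))
print(local_slope_at(P, (Fraction(3), Fraction(6))))       # like L1 in the figure: 1
print(local_slope_at(P, (Fraction(3), Fraction(17, 4))))   # like L2: -3/4, i.e. -0.75
```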
The same reasoning that proves Lemma 6.5 on page 339 also proves the follow-
ing lemma.

Lemma 6.6. Two lines passing through a point P have the same local slope at
P if and only if they are the same line.

The slope of a line

So far we have only looked at all the lines passing through a fixed point P, and we have learned how to assign to each of these lines a local slope at P. Now we change our vantage point by focusing on a single line instead. If we fix a line L in the coordinate plane and if P1 and P2 are two points on L, then a critical question naturally arises: is the local slope of L at P1 equal to the local slope of L at P2? The following lemma answers this question affirmatively.

Lemma 6.7. The local slope of a nonvertical line L at P for a point P ∈ L does
not depend on P .

Proof. Let P1 and P2 be two distinct points of L. By relabeling them if necessary, we may assume that P1 is to the left of P2, in the sense that if P1 = (p1, p1′) and P2 = (p2, p2′), then p1 < p2 (compare page 334).
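[Figure: a line L rising to the right, with P1 to the left of P2; Q1 and Q2 are 1 unit to the right of P1 and P2, and the vertical lines through Q1 and Q2 meet L at M1 and M2, both above the horizontal lines through P1 and P2.]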
At P1 , we do the usual construction to get the local slope of L at P1 ; namely,
on the horizontal line passing through P1 , pick the point Q1 that is 1 unit to the
right of P1 , and let the vertical line passing through Q1 intersect L at a point M1 .
Similarly, repeat this construction at P2 to obtain the points Q2 and M2 .33
First consider the case that M1 is above the horizontal line LP1 Q1 (see the
definition of "above" on page 334). Then M2 is also above the horizontal line
LP2 Q2 , as shown in the preceding picture. Therefore the local slope of L at P1 is
positive and is equal to the length |M1 Q1 | and the local slope of L at P2 is also
positive and is equal to the length |M2 Q2 |. We have to show |M1 Q1 | = |M2 Q2 |.
If we can prove the congruence of △P1Q1M1 and △P2Q2M2, then we will be done. The congruence is a consequence of ASA, because |P1Q1| = |P2Q2| = 1, |∠P1Q1M1| = |∠P2Q2M2| = 90°, and |∠M1P1Q1| = |∠M2P2Q2| because they are corresponding angles of the transversal L with respect to the parallel lines LP1Q1 and LP2Q2 (Theorem G18 on page 277). Therefore the local slopes of L at P1 and P2 are equal.
The other possibility is that M1 is below the horizontal line LP1 Q1 , as shown
in the picture below. Then M2 is also below the horizontal line LP2 Q2 .
[Figure: a line L falling to the right; M1 and M2 lie below the horizontal lines through P1 and P2, respectively.]

33 Note that in this picture, M1 is put between P1 and P2 on the line L for the sake of visual clarity. In general, P2 could very well be between P1 and M1, but the validity of the reasoning to follow does not depend on the relative positions of P1, P2, and M1.

The local slopes of L at P1 and P2 are therefore equal to −|Q1 M1 | and −|Q2 M2 |.
The same reasoning as above then shows that |Q1 M1 | = |Q2 M2 | so that, once again,
the local slopes of L at P1 and P2 are equal. The proof of the lemma is complete.
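Lemma 6.7 can be checked numerically; this is an illustration, not a proof. Purely for illustration, the sketch assumes that L is given parametrically as the set of points A + tD, where D = (d1, d2) with d1 ≠ 0. The text does not use this description. At each of several points P of L, the sketch carries out the construction from the proof: Q lies 1 unit to the right of P, and M is the point where the vertical line through Q meets L. It then records the local slope.

```python
from fractions import Fraction

def point_on_line(A, D, t):
    """Point A + t*D of the line through A with direction D (a hypothetical
    parametric description; the book does not use one at this stage)."""
    return (A[0] + t * D[0], A[1] + t * D[1])

def local_slope(A, D, t):
    """Local slope of L at P = A + t*D: go 1 unit right of P to Q, then
    find where the vertical line through Q meets L (the point M)."""
    if D[0] == 0:
        raise ValueError("vertical lines have no local slope")
    px, py = point_on_line(A, D, t)
    u = (px + 1 - A[0]) / D[0]        # parameter of M, chosen so that M_x = px + 1
    mx, my = point_on_line(A, D, u)
    assert mx == px + 1               # M really is on the vertical line through Q
    return my - py                    # signed height of M above Q

A, D = (Fraction(-1), Fraction(4)), (Fraction(3), Fraction(-2))
slopes = {local_slope(A, D, Fraction(t, 3)) for t in range(-6, 7)}
print(slopes)   # {Fraction(-2, 3)}: the same local slope at all 13 points
```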

Remark. There is a subtle point in the preceding proof that we have intentionally glossed over. How did we know that if M1 is above (respectively, below) LP1Q1, then M2 would also be above (respectively, below) the horizontal line LP2Q2? This is a critical part of the proof because it allows us to conclude that, regardless of whether the local slopes of L at P1 and P2 are equal or not, they are at least both positive or both negative. It is only because of this fact that the equality of the two local slopes reduces to the equality of the lengths of Q1M1 and Q2M2. By contrast, if L were a curve rather than a line, we could have the following situation, where M1 is above LP1Q1 but M2 is below LP2Q2:

[Figure: a curve L on which M1 is above the horizontal line through P1 while M2 is below the horizontal line through P2.]
One’s instinctive reaction is that this anomalous situation cannot occur when L
is actually a straight line. (Recall that line is synonymous with straight line in
this volume and in [Wu2020b] and [Wu2020c].) In other words, our intuition tells
us that the "straightness" of a line will eliminate this anomalous situation. The
following assertion then gives mathematical substance to our intuition:

(‡) Let P1 and P2 be two distinct points on a line L and let Q1 (respectively, Q2) be the point on the horizontal line passing through P1 (respectively, P2) that is 1 unit to the right of P1 (respectively, P2). Let the vertical line passing through Q1 (respectively, Q2) intersect L at M1 (respectively, M2). Then M1 being above LP1Q1 implies M2 is above LP2Q2, and M1 being below LP1Q1 implies M2 is below LP2Q2.

The key to the proof of (‡) is Lemma 4.8 on page 178. Since the points P1 and P2 are interchangeable in (‡), we may assume that P1 is to the left of P2. Thus we are assuming that

(6.21) for P1 = (p1, p1′) and P2 = (p2, p2′), p1 < p2.

Let the coordinates of M1 be (m1′, m1). Because M1 and Q1 are on the same vertical line, they have the same x-coordinate. But the x-coordinate of Q1 is p1 + 1, by definition; therefore, m1′ = p1 + 1. Thus M1 = (p1 + 1, m1). Similarly, M2 = (p2 + 1, m2) for some number m2.
We summarize these facts for later reference:

(6.22) M1 = (p1 + 1, m1 ), M2 = (p2 + 1, m2 ).

Let vertical lines be drawn from P1 , M1 , P2 , and M2 . Then the points of inter-
section of these lines with the x-axis are the x-coordinates of the aforementioned
points, and these are the numbers p1 , p1 + 1, p2 , and p2 + 1, according to (6.21)
and (6.22). Since p1 < p2 (by (6.21)), we have

p1 < p1 + 1 < p2 + 1, p1 < p2 < p2 + 1.

In terms of the concept of betweenness (page 167), this implies

(6.23) p1 ∗ (p1 + 1) ∗ (p2 + 1), p1 ∗ p2 ∗ (p2 + 1).

Since vertical lines are parallel to each other, by applying Lemma 4.8 on page 178
to the line L and the x-axis, we obtain by virtue of (6.21) and (6.23)

(6.24) P1 ∗ M1 ∗ M2 , P1 ∗ P2 ∗ M2 .

Now draw horizontal lines through P1, M1, P2, and M2; their points of intersection with the y-axis are the y-coordinates of these same points, which are, by virtue of (6.21) and (6.22), p1′, m1, p2′, and m2. Because of (6.24) and because horizontal lines are parallel, by applying Lemma 4.8 on page 178 to the line L and the y-axis, we obtain

(6.25) p1′ ∗ m1 ∗ m2, p1′ ∗ p2′ ∗ m2.

Now the y-axis is a number line, and we know from the discussion on pp. 167ff. that (6.25) means either

(6.26) p1′ < m1 < m2, p1′ < p2′ < m2

or

(6.27) p1′ > m1 > m2, p1′ > p2′ > m2.

Suppose M1 is above the horizontal line LP1Q1; then p1′ < m1, so only (6.26) is possible. Such being the case, the second part of (6.26) implies p2′ < m2, which means that M2 is above the horizontal line LP2Q2, as desired.

On the other hand, if M1 is below the horizontal line LP1Q1, then m1 < p1′ and only (6.27) is possible. See this figure:

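[Figure: the case in which M1 is below the horizontal line through P1. On the y-axis, from top to bottom: p1′, m1, p2′, m2; on the x-axis, from left to right: p1, p1 + 1, p2, p2 + 1.]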
Such being the case, the second part of (6.27) implies p2′ > m2, which means that M2 is below the horizontal line LP2Q2. The proof of (‡) is complete.

Pedagogical Comments. We do not recommend that the proof of (‡) be presented in a school classroom except in special situations that call for it. First of all, the reasoning is too sophisticated, and when it is applied to the proof of something as pictorially obvious as (‡), it may very well turn students off. In addition, what is asserted in (‡) has always been taken for granted in the school classroom. At this point in their learning trajectory, students have to learn why a correct definition of "slope" is needed and must be made aware of the need to use congruent triangles to justify the correctness of the definition. Their plate is already full. Good sense in pedagogy would suggest that something like the proof of (‡) be put away for the moment, though it might be mentioned in passing after the proof of Lemma 6.7 has been given.
By now we have come across too many similar cases of proofs that, while ger-
mane to the school curriculum, really do not belong in a school classroom (e.g., the
Pedagogical Comments on page 204 and (♣) on page 262). This is a fact of life that
we should acknowledge. Nevertheless, we repeat that such proofs are important for
teachers: these proofs give them the reassurance that, indeed, everything in school
mathematics can be supported by reasoning. End of Pedagogical Comments.

Returning to the discussion of slope: Lemma 6.7 tells us that it is no longer necessary to refer to the local slope of a line L at a particular point P of L, because it does not matter which point P on L is used. This allows us to finally introduce the following definition.

Definition. Let L be a nonvertical line in the plane. The slope of L is the local slope of L at P for any point P on L.

The following simple lemma is now an immediate consequence of the definition.

Lemma 6.8. A nonvertical line has 0 slope if and only if it is horizontal.



Remark. We emphasize that this definition of the slope of a (nonvertical) line brings out the fact that slope is, above all else, a single number. Moreover, we have explained clearly the purpose this number (the slope) is supposed to serve (see the discussion around page 339). Contrast this definition with the definition of slope in TSM³⁴ as two quantities, "rise" and "run", as in "rise over run".

The concept of slope also yields a simple criterion for deciding whether two
lines are the same. This criterion will be a key ingredient for the proof of Theorem
6.11 on page 354.

Theorem 6.9. If two lines pass through the same point and have the same
slope, then they are the same line.

Proof. This theorem is nothing more than a rewrite of Lemma 6.6 on page 342.
Suppose two lines L and L′ have the same slope and both pass through the same point P. By the definition of slope, the local slopes of L and L′ at P are equal. Therefore, by Lemma 6.6, L = L′. The proof is complete.

A formula for slope

For all the virtues of the preceding definition of the slope of a line, one should
not be blind to the fact that it is too clumsy for computations. Thus, to compute
the slope of a line L, the best one can do so far is to pick a point P on L; draw a
horizontal line through P and let Q be the point 1 unit to the right of P on this
horizontal line. Let Q = (x0 , y0 ). Through Q draw a vertical line LY and let L
intersect LY at the point with coordinates (x0 , y). Then the slope of L is equal
to y − y0 . (An equivalent way of looking at this is to regard LY as a number line
whose 0 is at Q and whose 1 is 1 unit above Q; the slope of L is the number on LY
at which L intersects LY (see page 342)).
If this were the only (excruciating) way to get the slope of a line, the concept
of slope might have been abandoned long ago. Fortunately, the following theorem
shows a way to compute slope that removes the clumsiness. In fact, the ratio in
(6.28) below is usually taken to be the definition of slope.

Theorem 6.10. On a given nonvertical line L, let any two distinct points
P1 = (x1, y1) and P2 = (x2, y2) be chosen. Then the slope of L is equal to the ratio

\[ \frac{y_2 - y_1}{x_2 - x_1}. \tag{6.28} \]
We will preface the proof of the theorem with a little discussion. The ratio in
(6.28) is called the difference quotient of P1 and P2 .
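Theorem 6.10 says that any two distinct points of L may be used. The following Python sketch computes the ratio (6.28) for every pair among five points of one line. It uses exact rational arithmetic from the standard `fractions` module and pairs from `itertools.combinations`. The function name `difference_quotient` and the sample line, whose points are generated as A + tD, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def difference_quotient(P1, P2):
    """The ratio (y2 - y1) / (x2 - x1) of (6.28)."""
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2:
        raise ValueError("points with equal x-coordinates: the line is vertical")
    return Fraction(y2 - y1) / Fraction(x2 - x1)

# Hypothetical sample: five points A + t*D on one nonvertical line.
A, D = (Fraction(1), Fraction(2)), (Fraction(2), Fraction(5))
points = [(A[0] + t * D[0], A[1] + t * D[1]) for t in (-2, -1, Fraction(1, 2), 3, 7)]

quotients = {difference_quotient(P, R) for P, R in combinations(points, 2)}
print(quotients)   # {Fraction(5, 2)}: every one of the 10 pairs gives the same ratio
```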

Remarks. (1) The ratio (6.28) is a rational quotient if the coordinates x1, x2, etc., are rational numbers. (See page 118.) This will be the last reminder of why a detailed discussion of the arithmetic of complex fractions and rational quotients is essential. Notice that insofar as we allow x1, x2, etc., to be arbitrary real numbers, we have to rely on FASM (page 133) to affirm that this definition makes sense even when x1, x2, etc., are not rational numbers.
34 See page xiv for the definition of TSM.

(2) Because a/b = (−a)/(−b) for any numbers a and b with b ≠ 0 (Theorem 2.10 on page 116 when a and b are rational; then use FASM), the difference quotient (6.28) of P1 = (x1, y1) and P2 = (x2, y2) enjoys an important symmetry property with respect to P1 and P2:

\[ \frac{y_2 - y_1}{x_2 - x_1} = \frac{y_1 - y_2}{x_1 - x_2}. \tag{6.29} \]

Therefore, in writing the difference quotient, the order of P1 and P2 does not matter. We will henceforth assume that P1 is to the left of P2; i.e., x1 < x2.
(3) The denominator of the difference quotient (6.28) is never 0. For if it were, then x2 − x1 = 0 and x1 = x2; the distinct points P1 and P2 on L would then have the same x-coordinate, and L, being the line joining two such points, would have to be vertical, contradicting the hypothesis that L is nonvertical. Thus the ratio (6.28) always makes sense.
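Remarks (2) and (3) can be seen concretely in a short sketch that uses a hypothetical `difference_quotient` helper. Swapping P1 and P2 leaves the ratio unchanged. Two points with equal x-coordinates (that is, points on a vertical line) make the ratio undefined.

```python
from fractions import Fraction

def difference_quotient(P1, P2):
    """The ratio (y2 - y1) / (x2 - x1) of (6.28)."""
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2:
        # Remark (3): distinct points with equal x-coordinates lie on a vertical line
        raise ZeroDivisionError("x2 - x1 = 0: the line through P1 and P2 is vertical")
    return Fraction(y2 - y1) / Fraction(x2 - x1)

P1, P2 = (Fraction(-2), Fraction(7)), (Fraction(5), Fraction(1, 2))
print(difference_quotient(P1, P2), difference_quotient(P2, P1))   # -13/14 -13/14, as in (6.29)

try:
    difference_quotient((Fraction(3), Fraction(1)), (Fraction(3), Fraction(8)))
except ZeroDivisionError as err:
    print("undefined:", err)
```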

Before giving the proof of Theorem 6.10, we discuss a geometric interpretation of the difference quotient (6.28) that is of independent interest. This interpretation will play a crucial role in all our considerations of slope. If the slope of L is 0, then L is horizontal (by Lemma 6.8 on page 346), so the difference quotient (6.28) is 0 for all P1 and P2. From now on we may assume that the slope of L is not 0. With this understood, and with P1 = (x1, y1) and P2 = (x2, y2), x1 < x2, as always, let the horizontal line passing through P1 and the vertical line passing through P2 intersect at R, so that △P1P2R is a right triangle with the right angle at R. Then we claim

\[
\frac{y_2 - y_1}{x_2 - x_1} =
\begin{cases}
  \dfrac{|P_2R|}{|P_1R|} & \text{if the slope of } L > 0,\\[2ex]
  -\dfrac{|P_2R|}{|P_1R|} & \text{if the slope of } L < 0.
\end{cases}
\tag{6.30}
\]
The two cases of positive and negative slopes are illustrated by the left figure and
right figure below, respectively.

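[Figure: left, a line L of positive slope; right, a line L of negative slope. In each, R is the intersection of the horizontal line through P1 with the vertical line through P2, Q is 1 unit to the right of P1, and the vertical line LY through Q meets L at M.]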

For the proof of (6.30), we begin with two general comments that are valid for
both cases in (6.30). We first claim that the coordinates of R are (x2 , y1 ). This
is because P1 and R, being on the same horizontal line, must have the same y-
coordinate, which is y1 (the y-coordinate of P1 ). Similarly, P2 and R, being on the
same vertical line, must have the same x-coordinate, which is x2 (the x-coordinate
of P2 ). Thus, indeed,
R = (x2 , y1 ).
Next, since P1 and R lie on the same horizontal line, |P1 R| = |x1 − x2 | (see (6.18)
on page 336). By assumption, x2 > x1 , so we get
(6.31) x2 − x1 = |P1 R|.
To prove (6.30), first suppose the slope of L is positive. Since P2 and R lie on
the same vertical line x = x2 , |P2 R| = |y2 − y1 | (see (6.19) on page 336). We claim
that P2 lies above R so that y2 > y1 and, therefore (still with the assumption that
the slope of L is positive),
(6.32) y2 − y1 = |P2 R|.
Together with (6.31), this proves the first half of (6.30).
It remains to prove that P2 lies above R. For this, we need the assumption that the slope of L is positive. On the horizontal line LP1R, let Q be the point to the right of P1 so that |P1Q| = 1, and let the vertical line LY passing through Q meet L at M. By the definition of local slope (page 339), the positivity of the local slope of L at P1 means that M is above the horizontal line LP1R (see the left picture above; note that the picture shows the case where Q is to the left of R, but the reasoning is the same even if Q is to the right of R). It is obvious from the picture that since R is also to the right of P1, the intersection P2 of the vertical line passing through R with L is also above LP1R;³⁵ i.e., P2 is above R, as claimed. We have therefore proved (6.30) in case the slope of L is positive.
If the slope of L is negative, the local slope of L at P1 is negative. Now the
definition of local slope shows that if Q is the point on the horizontal line LP1 R which
is 1 unit to the right of P1 and if the vertical line LY passing through Q meets L at
M , then M is below the horizontal line LP1 Q (see the right picture above). Clearly
P2 , being the point of intersection of L with the vertical line passing through a point
R to the right of P1, is also below LP1R.³⁶ Thus P2 lies below R. Consequently, y2 < y1 and
−(y2 − y1) = |P2R|.
Together with (6.31) and Theorem 2.10 on page 116, this proves the second half of
(6.30). The proof of (6.30) is complete.
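The claim (6.30) can be illustrated numerically. This sketch (the helper name is hypothetical) builds R = (x2, y1) and measures the two legs of the right triangle P1P2R with absolute values, as in (6.18) and (6.19). It attaches a sign according to whether P2 is above R and compares the result with the difference quotient.

```python
from fractions import Fraction

def signed_length_ratio(P1, P2):
    """Right-hand side of (6.30), computed from the right triangle P1 P2 R,
    where R = (x2, y1). Assumes x1 < x2 and y1 != y2 (nonzero slope)."""
    (x1, y1), (x2, y2) = P1, P2
    assert x1 < x2 and y1 != y2
    R = (x2, y1)
    P1R = abs(R[0] - x1)                  # horizontal leg, as in (6.18)
    P2R = abs(y2 - R[1])                  # vertical leg, as in (6.19)
    sign = 1 if y2 > R[1] else -1         # P2 above R <=> positive slope
    return sign * Fraction(P2R) / Fraction(P1R)

for P1, P2 in [((Fraction(0), Fraction(1)), (Fraction(4), Fraction(7))),
               ((Fraction(-2), Fraction(3)), (Fraction(1), Fraction(-3)))]:
    dq = (P2[1] - P1[1]) / (P2[0] - P1[0])
    print(dq, signed_length_ratio(P1, P2))   # prints 3/2 3/2, then -2 -2
```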

We are now ready for the proof proper of Theorem 6.10.

Proof of Theorem 6.10. The claim (6.30) transforms the difference quotient (6.28) into a geometric quantity, namely, ± the ratio of the lengths of two sides of the right triangle △P1P2R. This invites the consideration of similar triangles (Section 5.3 on page 283). More precisely, let Q be the point on the horizontal line LP1R so that Q is to the right of P1 and |P1Q| = 1. Let the vertical line LY passing through Q intersect L at M as usual. See the pictures below for both cases of a positive and a negative local slope of L.

35 In fact, the reasoning that proves (‡) on page 344 also suffices to prove that M being above LP1Q implies P2 is above LP1Q. But see the Pedagogical Comments on page 346.
36 But see the preceding footnote.

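[Figure: left, positive local slope; right, negative local slope. In each, Q is 1 unit to the right of P1, the vertical line LY through Q meets L at M, and R is the point of the horizontal line through P1 directly below (left) or above (right) P2.]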

We have drawn the pictures so that P2 is to the right of M purely for the sake of
clarity; the subsequent reasoning will be independent of this fact.
We will prove

\[ \triangle P_1P_2R \sim \triangle P_1MQ. \tag{6.33} \]

Assuming (6.33) for the moment, we will finish the proof of Theorem 6.10. By the proportionality of corresponding sides of similar triangles, we have the equality

\[ \frac{|P_2R|}{|MQ|} = \frac{|P_1R|}{|P_1Q|}, \]

which, by virtue of the cross-multiplication algorithm ((b) on page 70) and FASM, is equivalent to

\[ \frac{|P_2R|}{|P_1R|} = \frac{|MQ|}{|P_1Q|}. \]
Suppose the local slope of L at P1 is positive. Then (6.30), together with |P1Q| = 1, gives

\[ \frac{y_2 - y_1}{x_2 - x_1} = \frac{|MQ|}{1} = |MQ|. \]

Since |MQ| is by definition the local slope of L at P1, the preceding equality proves Theorem 6.10 in the case of positive local slope. If the local slope of L at P1 is negative, then an entirely analogous argument shows that

\[ \frac{y_2 - y_1}{x_2 - x_1} = -\frac{|MQ|}{1} = -|MQ|. \]

Since the local slope of L at P1 is, by definition, equal to −|MQ| in this case, Theorem 6.10 is also proved in the case of a negative local slope.
It remains to prove (6.33). Observe that △P1P2R and △P1MQ satisfy the AA criterion for similarity (Theorem 22 on page 288): |∠P2P1R| = |∠MP1Q| because these are the same angle (P2 and M lie on the same ray of L starting at P1, and R and Q lie on the same horizontal ray starting at P1), and
|∠P2RP1| = |∠MQP1| = 90°.
The desired similarity (6.33) follows. The proof of Theorem 6.10 is complete.

Activity. Let P = (x0, y0) be a point in Quadrant III of a coordinate system with origin O. What is the slope of the line joining P to O? If Q is a point in Quadrant II, what is the slope of the line joining Q to O?

The great significance of Theorem 6.10 is that, given a line, we can find its
slope by computing the difference quotient of any two points on the line that suit
our purpose. This is not only a useful idea to keep in mind in general, but also one
that is critical for the proof that the graph of a linear equation of two variables is
a line (Theorem 6.11 on page 354).

Exercises 6.4.
(1) Let ℓ be the line joining (1, 2) and (−3, 4). If (x, y) is a point on ℓ, what is the value of (y − 4)/(x + 3)?
(2) (a) Let ℓ be the line with slope m passing through (1/2, 3/4). For which value of m would ℓ pass through (5/3, 1/3)? (b) Let ℓ be the line joining (−3/2, 4) and (4/5, q), where q is some number. For what value of q would ℓ pass through (2, −3)?
(3) Does the line joining (3, −2) and (6, 2) contain the point (9, 6)? Explain
your answer two different ways.
(4) Let P be a point in quadrant II and let Q be a point in quadrant IV. If
L is a line joining P to Q, what is the sign (page 339) of the slope of L?
Explain.
(5) Let L be the graph of a linear equation 3x + 4y = c for some constant
c. (a) What is the slope of L? (b) Suppose L passes through the point
(−1, 5). What is c?
(6) Let a, b be positive numbers. Can the three points (a, b), (2a, b + 2),
(−a3 , b − 1) be collinear? Explain.

6.5. The graphs of linear equations in two variables


The main theorem of this section affirms that the graph of a linear equation
of two variables is a line (recall that in these volumes, a line is synonymous with
a straight line, as on page 165) and that, conversely, every line is the graph of a
linear equation in two variables. This theorem is absolutely fundamental for the
understanding of all aspects of the graphs of linear equations in two variables, but it
is never stated in TSM, much less proved. One reason for such a spectacular failing
in TSM is that it never defines the concept of the slope of a line correctly. We
will demonstrate how a clear understanding of what slope is renders the standard
problems about finding the equation of a line completely routine.
Generalities about graphs of equations (p. 351)
The main theorem (p. 354)
Some applications of the main theorem (p. 357)

Generalities about graphs of equations

The concept of the graph of an equation is basic in mathematics, but it gets short shrift in TSM. We will begin by defining it precisely. Because we will be
mainly interested in equations in two variables x and y, the following discussion
will focus on equations of two variables. The basic idea is the same, however, for
equations in any number of variables. Recall that an equation in two variables, x
and y, is a question asking whether two given expressions in the numbers x and
y are equal (page 323). We identify such a pair of numbers x and y with the
point (x, y) in the coordinate plane. Then such an equation is equivalent to asking
whether there are points (x0 , y0 ) in the coordinate plane so that their coordinates
make the two given expressions equal. We will call such points (x0 , y0 ) solutions of
the given equation. The collection of all the solutions of a given equation is called the graph of the equation. The graph of an equation is therefore a subset of ℝ². For example, the equation x⁴ + y² + 1 = 0 has no solutions (because x⁴ + y² ≥ 0 for all numbers x and y), so the graph of x⁴ + y² + 1 = 0 is the empty set, i.e., the set with no elements. On the other hand, the graph of x² + y² = 25 is the circle of radius 5 centered at the origin (0, 0).
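The definition of a graph as the set of solutions translates directly into a membership test. In this sketch, an equation is represented as a pair of Python functions giving the left side and the right side; this representation is an assumption made for illustration. A point belongs to the graph exactly when the two sides agree there. Checking finitely many points can only illustrate, never prove, a statement such as "the graph of x⁴ + y² + 1 = 0 is empty".

```python
from fractions import Fraction

def on_graph(lhs, rhs, point):
    """True if point = (x0, y0) is a solution of lhs(x, y) = rhs(x, y)."""
    x0, y0 = point
    return lhs(x0, y0) == rhs(x0, y0)

circle = (lambda x, y: x**2 + y**2, lambda x, y: 25)
print([on_graph(*circle, p) for p in [(3, 4), (-5, 0), (0, -5), (1, 1)]])
# prints [True, True, True, False]

no_solutions = (lambda x, y: x**4 + y**2 + 1, lambda x, y: 0)
grid = [(Fraction(i, 2), Fraction(j, 2)) for i in range(-10, 11) for j in range(-10, 11)]
print(any(on_graph(*no_solutions, p) for p in grid))   # False: no grid point is a solution
```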

Activity. Use the distance formula (see equation (6.20) on page 336) to prove the claim about the graph of x² + y² = 25. Do not skip steps.

A linear equation in two variables x and y is a question about whether there are numbers x and y so that ax + by = c, where a, b, and c are fixed constants and at least one of a and b is nonzero. Sometimes one gains more flexibility by defining such an equation to be an equation of two expressions in x and y which, after applications of the two operations (E1) and (E2) on page 328, can be put in the form ax + by = c. For example, −2x + y = (2/5)y + 7 and 3x² + 6 + 2y = 19 − 5x − 3y + 3x² are linear equations in two variables because, after the use of (E1) and (E2), they can be brought to −2x + (3/5)y = 7 and 5x + 5y = 13, respectively. The point (−7/2, 0) is a solution of the first equation, and (3/5, 2) is a solution of the second. In general, the graph of ax + by = c is the collection of all the points (x0, y0) in ℝ² so that ax0 + by0 = c. Your experience with TSM concerning linear equations in two variables immediately tells you that this graph is a line; in fact, TSM leaves you with no doubt of this fact, even though it never tells you why. The main theorem of this chapter (Theorem 6.11 on page 354) affirms this fact, and everything we have done in this chapter up to this point is nothing but a preparation for its proof.
Consider the example of a linear equation in two variables, x − 2y = −2. We observe that in this situation, it is easy to find all the solutions of this equation with a prescribed first number x0 or a prescribed second number y0. For example, with the first number prescribed as 3, we solve the linear equation in y, 3 − 2y = −2, and get y = 5/2. Therefore (3, 5/2) is the sought-for solution. Or, if the second number is prescribed to be y0, then we solve the linear equation in x, x − 2y0 = −2, to get x = 2y0 − 2. The solution is now (2y0 − 2, y0). Thus we see, and will continue to see, that the study of linear equations in two variables is grounded in the study of linear equations in one variable.
Using the above method of getting all the solutions of the equation x−2y = −2,
we can plot as many points of the graph of x − 2y = −2 as we please to get a good
idea of the graph. For example, the following points are on the graph of x−2y = −2:

(−5, −1.5), (−4, −1), (−2, 0), (0, 1), (2, 2),
(2.5, 2.25), (4, 3), (6, 4), (7, 4.5).
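The observation that each prescribed coordinate leads to a linear equation in one variable can be written as two small functions. This is a sketch with the hypothetical names `y_for_x` and `x_for_y`. It is intended as a way for a teacher to check a hand-made table of points, not as a replacement for plotting by hand.

```python
from fractions import Fraction

def y_for_x(a, b, c, x0):
    """Solve the one-variable equation a*x0 + b*y = c for y (requires b != 0)."""
    return (Fraction(c) - a * Fraction(x0)) / b

def x_for_y(a, b, c, y0):
    """Solve the one-variable equation a*x + b*y0 = c for x (requires a != 0)."""
    return (Fraction(c) - b * Fraction(y0)) / a

a, b, c = 1, -2, -2                        # the equation x - 2y = -2
print(y_for_x(a, b, c, 3))                 # 5/2, so (3, 5/2) is a solution
print(x_for_y(a, b, c, 10))                # 18, since x = 2*10 - 2

xs = [-5, -4, -2, 0, 2, Fraction(5, 2), 4, 6, 7]
table = [(x, y_for_x(a, b, c, x)) for x in xs]
assert all(x - 2 * y == -2 for x, y in table)   # every row solves x - 2y = -2
print([(float(x), float(y)) for x, y in table])  # the points listed in the text
```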

Students need the experience of plotting points on the graph of an equation by hand,
and they should form this good habit right from the outset for the case of a linear
equation in two variables. In the age of the graphing calculator, there is all the
more reason to emphasize this need.
Consider next the graph of the linear equation in two variables y = 3 which,
as an equation in two variables, is in reality the abbreviated form of the equation
(0 · x) + (1 · y) = 3. We want to prove that the graph of y = 3 is the horizontal line passing through the point (0, 3).³⁷
Here is the simple proof. As always, in order to show the equality of sets, we must show that the two sets have the same collection of elements. For the case at hand, let the horizontal line be denoted by ℓ and the graph of y = 3 be denoted by G. We have to show that G and ℓ have the same collection of points in the plane. Recall that there is a standard way to do this (page 141), which is to show that each set is contained in the other set, or in standard set-theoretic notation:
ℓ ⊂ G and G ⊂ ℓ.
Observe that G consists of all the points (s, t) which are solutions of y = 3 and therefore consists of all the points (s, t) so that t = 3, i.e., all the points whose y-coordinate is 3. Let us now prove ℓ ⊂ G. Let P be a point of ℓ; we must prove that P ∈ G, i.e., that the y-coordinate of P is 3. But the horizontal line passing through P is ℓ, and ℓ intersects the y-axis at the number 3 (by assumption on ℓ), so the y-coordinate of P is 3, by definition. Hence P ∈ G. Conversely, we prove G ⊂ ℓ. Let P ∈ G; we must prove that P ∈ ℓ. Since P ∈ G, the y-coordinate of P is 3. Thus the horizontal line passing through P intersects the y-axis at the number 3 (definition of the y-coordinate of a point). This horizontal line is precisely ℓ, so P ∈ ℓ.
[Figure: the horizontal line ℓ, the graph of y = 3, crossing the y-axis at 3.]

The same reasoning shows that the graph of the equation y = b in the plane
for any number b is exactly the horizontal line L0 passing through the point (0, b)
on the y-axis. The same reasoning applies to vertical lines. In summary, we have the following:
the graph of x = c for any number c is the vertical line passing
through the point (c, 0) on the x-axis, and the graph of y = b
for any number b is the horizontal line passing through the point
(0, b) on the y-axis.
Since there is only one horizontal (respectively, vertical) line passing through a
given point of the plane (why?), it follows that
every vertical line is the graph of the equation x = c, where (c, 0)
is the point of intersection of the line and the x-axis, and every
horizontal line is the graph of an equation y = b, where (0, b) is
the point of intersection of the line with the y-axis.

37 This obvious fact is usually taught by rote, but we want to establish that if the definition of the graph of an equation means anything at all, the fact that the graph of y = 3 is a line should follow logically from the definition. For this reason, we want to prove this fact clearly and correctly. If there are any doubts about the need to take the definition of the graph of an equation or function seriously, it suffices to consult the blog of [Meyer].

In the next subsection, we generalize these assertions to graphs of arbitrary linear equations.

The main theorem

It is well known that the graph of a linear equation in two variables is a line, and
vice versa. What is not well known is the reason why this is true. The main purpose
of this section is to prove this fact. Precisely, we have the following theorem.

Theorem 6.11. The graph of a linear equation in two variables is a line. Con-
versely, every line is the graph of a linear equation in two variables.

We would like to immediately address a question that is implicitly raised by the second part of the theorem. If the graph of ax + by = c is L, it is common to say that L is the line defined by ax + by = c. By the second part of the theorem, every line L is defined by a linear equation in two variables ax + by = c for some constants a, b, c. Now suppose L is also defined by a′x + b′y = c′ for some constants a′, b′, c′. Are the two equations related? The answer is given by the following lemma.

Lemma 6.12. Suppose a line is the graph of both ax + by = c and a′x + b′y = c′. Then there is a nonzero number k so that a′ = ka, b′ = kb, and c′ = kc. Conversely, the graphs of the two equations ax + by = c and kax + kby = kc, where k ≠ 0, are the same line.

The proof of the first part of Lemma 6.12 is quite straightforward once you
consider the three cases separately: the line is vertical, horizontal, or neither. Of
course the proof of the second part is trivial. We will leave the details to an exercise
(see Exercise 4 on page 362).
Lemma 6.12 shows that if L is the graph of ax + by = c, then every equation defining L is necessarily of the form kax + kby = kc for some k ≠ 0. We naturally regard all such equations (for any nonzero value of k) as the same equation. With this understood, we are in the habit of saying in this case that the equation of L is ax + by = c.
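Lemma 6.12 gives a practical test for whether two linear equations define the same line: their coefficient triples must be proportional. The sketch below uses the hypothetical name `same_equation`. It checks two triples (a, b, c) and (a′, b′, c′), each with (a, b) ≠ (0, 0), by requiring all three 2 × 2 cross products to vanish. For such nonzero triples, this is equivalent to the existence of a k ≠ 0 with (a′, b′, c′) = k(a, b, c).

```python
from fractions import Fraction

def same_equation(e1, e2):
    """True if (a', b', c') = k*(a, b, c) for some k != 0.

    Each equation is a triple (a, b, c) for ax + by = c with (a, b) != (0, 0).
    Two nonzero triples are proportional exactly when every 2x2 cross
    product a*b' - a'*b, a*c' - a'*c, b*c' - b'*c vanishes.
    """
    (a, b, c), (a2, b2, c2) = e1, e2
    if (a, b) == (0, 0) or (a2, b2) == (0, 0):
        raise ValueError("not a linear equation: a and b are both 0")
    return a * b2 - a2 * b == 0 and a * c2 - a2 * c == 0 and b * c2 - b2 * c == 0

print(same_equation((1, -2, -2), (-3, 6, 6)))                 # True  (k = -3)
print(same_equation((1, -2, -2), (Fraction(1, 2), -1, -1)))   # True  (k = 1/2)
print(same_equation((1, -2, -2), (2, -4, 3)))                 # False (parallel, different lines)
```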

Proof of Theorem 6.11. We first prove that the graph of a linear equation of
two variables, ax + by = c (where a, b, c are constants), is a line.
If b = 0 in ax + by = c, then a ≠ 0 (at least one of a and b is nonzero), and the equation becomes ax = c, whose graph is clearly the same as the graph of x = c/a. The fact that the graph of x = c/a is the vertical line passing through (c/a, 0) has been proved on page 353. We may therefore assume from now on that in the equation ax + by = c, b ≠ 0. The graph of ax + by = c is clearly the same as the graph of y = −(a/b)x + (c/b). Therefore, to complete the proof of the first part of Theorem 6.11, it suffices to prove the following:
If G is the graph of an equation y = mx + k, where m and k are constants, then G is a line.³⁸

38 In many textbooks, the standard notation for this equation is y = mx + b, but the "b" will not work for us because our notation for a general linear equation is ax + by = c.
6.5. THE GRAPHS OF LINEAR EQUATIONS IN TWO VARIABLES 355

Let P1 = (x1, y1) and P2 = (x2, y2) be two distinct points on G. (Such points exist, e.g., (0, k) and (1, m + k).) Let the line joining P1 and P2 be denoted simply by L. We are going to prove G = L, thereby proving that the graph of y = mx + k is a line.
Recall that to prove G = L, we have to first prove G ⊂ L and then prove L ⊂ G. What is needed for both proofs is the fact that the slope of L is equal to m. To see this, observe that since P1 and P2 are points on G,
y1 = mx1 + k and y2 = mx2 + k.
In particular x1 ≠ x2, for otherwise we would also have y1 = y2 and hence P1 = P2. Therefore the difference quotient of P1 and P2 is
$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{(mx_2 + k) - (mx_1 + k)}{x_2 - x_1} = \frac{m(x_2 - x_1)}{x_2 - x_1} = m.$$
By Theorem 6.10 on page 347, the slope of L is m, as desired.
First: G ⊂ L. Given P′ = (x′, y′) ∈ G, we must prove P′ ∈ L. Since P1 ∈ L, we may assume P′ ≠ P1. We will prove that the line L′ that joins P1 and P′ coincides with L. To this end, Theorem 6.9 on page 347 implies that all we need to do is prove that L′ and L have the same slope and pass through the same point. They clearly pass through the same point P1. As to their slopes, we have just seen that the slope of L is m. As to the slope of L′, note that since P′ is on the graph G of y = mx + k, we have y′ = mx′ + k. Therefore the difference quotient of P1 and P′ is
$$\frac{y' - y_1}{x' - x_1} = \frac{(mx' + k) - (mx_1 + k)}{x' - x_1} = \frac{m(x' - x_1)}{x' - x_1} = m.$$
By Theorem 6.10, the slope of L′ is also m. Thus L′ = L. Therefore P′ ∈ L′ now means P′ ∈ L. This proves G ⊂ L.
Next: L ⊂ G. This time, let P′ = (x′, y′) ∈ L; we have to prove that P′ ∈ G, i.e., y′ = mx′ + k. Since P1 is already a point of G, we may assume that P′ ≠ P1. So by Theorem 6.10, we may compute the slope of L (which is m) using the points P1 and P′ to obtain
$$m = \frac{y' - y_1}{x' - x_1}.$$
Simplifying, we get y′ = mx′ + (y1 − mx1). However, recalling that P1 ∈ G, we also have y1 = mx1 + k, which implies (y1 − mx1) = k. Altogether, we get y′ = mx′ + k. Thus P′ ∈ G after all, and L ⊂ G. The proof that the graph of y = mx + k is a line is complete.
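The key computation in this proof is that any two distinct points on the graph of y = mx + k have difference quotient m. Here is a quick check in Python with exact fractions. The values of m and k and the sample x-coordinates are arbitrary illustrative choices, not taken from the text.

```python
from fractions import Fraction

def difference_quotient(p1, p2):
    """(y2 - y1) / (x2 - x1) for two points with x1 != x2."""
    (x1, y1), (x2, y2) = p1, p2
    return Fraction(y2 - y1) / Fraction(x2 - x1)

# Sample points on the graph of y = m*x + k (m, k chosen arbitrarily).
m, k = Fraction(3, 4), Fraction(-2)
xs = [Fraction(-5), Fraction(0), Fraction(1, 3), Fraction(7)]
points = [(x, m * x + k) for x in xs]

# Every pair of distinct points gives the same difference quotient, m.
quotients = {difference_quotient(p, q)
             for p in points for q in points if p != q}
print(quotients)   # {Fraction(3, 4)}
```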

Now that we know the graph of every linear equation in two variables is a line,
we can prove the second part of Theorem 6.11 that every line is the graph of a
linear equation in two variables. If the line is vertical, then we have already proved
on page 353 that it is the graph of an equation x = c. Henceforth, we may assume
the line is nonvertical. Thus let L be a nonvertical line and let P1 = (x1 , y1 ) and
P2 = (x2 , y2 ) be two distinct points on L. Let m be the slope of L. Then by
Theorem 6.10,
$$m = \frac{y_2 - y_1}{x_2 - x_1}. \tag{6.34}$$
Rewriting (6.34) as y2 − y1 = m(x2 − x1), we are led to consider the equation
$$y - y_1 = m(x - x_1). \tag{6.35}$$
356 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

Observe that (6.35) is a linear equation in two variables because it is equivalent to
$$y = mx + (y_1 - mx_1). \tag{6.36}$$
Furthermore, what (6.34) affirms is that (x2 , y2 ) is a solution of (6.35). Obviously,
(x1 , y1 ) is also a solution of (6.35). Thus, if G denotes the graph of the linear
equation (6.35), then both P1 = (x1 , y1 ) and P2 = (x2 , y2 ) lie in G. By the first
part of this theorem, G is a line. Therefore G is a line passing through P1 and
P2 . Recall that L is also a line passing through P1 and P2 . By assumption (L1)
(page 165), G = L, which implies L is the graph of equation (6.35). The proof of
Theorem 6.11 is complete.
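This proof suggests a way to write down an equation ax + by = c for the line through two given distinct points. The sketch below uses a variant that is our own rearrangement, not the book's formula. Multiplying (6.35) by x2 − x1 and collecting terms gives (y2 − y1)x + (x1 − x2)y = (y2 − y1)x1 + (x1 − x2)y1. This form has the advantage that it also covers the vertical case x1 = x2, where it reduces to a nonzero multiple of x = x1. The function names `line_through` and `satisfies` are hypothetical.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (a, b, c) so that ax + by = c is the line through p1 and p2.

    Uses (y2 - y1)x + (x1 - x2)y = (y2 - y1)x1 + (x1 - x2)y1, obtained by
    multiplying (6.35) by (x2 - x1); it also works when x1 == x2.
    """
    (x1, y1), (x2, y2) = p1, p2
    if (x1, y1) == (x2, y2):
        raise ValueError("the two points must be distinct")
    a = Fraction(y2 - y1)
    b = Fraction(x1 - x2)
    c = a * x1 + b * y1
    return a, b, c

def satisfies(eq, point):
    """Does the point (x, y) satisfy ax + by = c?"""
    a, b, c = eq
    x, y = point
    return a * x + b * y == c

# A slanted line (through the points used in Example 2) and a vertical line.
for p, q in [((-1, 3), (Fraction(1, 2), 4)), ((2, -1), (2, 5))]:
    eq = line_through(p, q)
    print(eq, satisfies(eq, p), satisfies(eq, q))
```

For the first pair, the result is x − (3/2)y = −11/2, which is y = (2/3)x + 11/3 after solving for y.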

Students who master this proof will never have to memorize the different forms
of the equation of a line. They will be able to derive the needed equation easily at
a moment’s notice or, at worst, they can memorize the different forms with much
greater ease because they now possess a mental framework in which each of those
forms can take its proper place. We will illustrate with some simple examples. To
this end, let us extract two useful facts from the preceding proof.
For the first lemma, define the y-coordinate of the point at which a line L
intersects the y-axis to be the y-intercept of the line; sometimes the point of
intersection is itself called the y-intercept of L.

Lemma 6.13. A nonvertical line L is the graph of the linear equation y = mx + k, where m is the slope of L and k is the y-intercept of L.
Remark. This lemma explains why the equation y = mx + k is called the slope-intercept form of the equation of L.

Proof. By Theorem 6.11, L is the graph of a linear equation ax + by = c for some constants a, b, and c. Since L is nonvertical, b ≠ 0. Thus this equation is equivalent to (i.e., has the same set of solutions as) y = mx + k, where m = −a/b and k = c/b. Since (0, k) is a solution of y = mx + k, k is the y-intercept of L. To see that m is the slope of L, we use Theorem 6.10: let P1 = (x1, y1) and P2 = (x2, y2) be any two distinct points on L. Then the slope of L is
$$\frac{y_2 - y_1}{x_2 - x_1}.$$
Since P1 and P2 are points on the graph of y = mx + k, we also have y1 = mx1 + k and y2 = mx2 + k. Therefore the slope of L is
$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{(mx_2 + k) - (mx_1 + k)}{x_2 - x_1} = \frac{m(x_2 - x_1)}{x_2 - x_1} = m.$$
The proof of the lemma is complete.
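The proof of Lemma 6.13 amounts to the conversion m = −a/b, k = c/b. Here is a minimal Python sketch of that conversion. The function name `slope_intercept` is hypothetical, and the sample line 2x + 3y = 6 is our own choice.

```python
from fractions import Fraction

def slope_intercept(a, b, c):
    """Rewrite ax + by = c, with b != 0, as y = mx + k and return (m, k)."""
    if b == 0:
        raise ValueError("b = 0: the line is vertical and has no slope")
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return -a / b, c / b   # m = -a/b and k = c/b, as in the proof

m, k = slope_intercept(2, 3, 6)   # the line 2x + 3y = 6
print(m, k)                       # -2/3 2
```

As a sanity check, (0, 2) does satisfy 2x + 3y = 6, so 2 is indeed the y-intercept.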
Lemma 6.14. Let L be the nonvertical line passing through two distinct points P1 and P2, where P1 = (x1, y1) and P2 = (x2, y2). Then the equation of L is
$$y - y_1 = m(x - x_1), \tag{6.37}$$
where
$$m = \frac{y_2 - y_1}{x_2 - x_1}. \tag{6.38}$$

Proof. Let m denote the slope of L. By Theorem 6.10 on page 347, when m is computed using P1 and P2, it is exactly as in (6.38). Let (x, y) be an arbitrary point on L different from P1. Then the slope of L computed using (x, y) and P1 is
$$m = \frac{y - y_1}{x - x_1}. \tag{6.39}$$
This is equivalent to y − y1 = m(x − x1 ), which is precisely equation (6.37). The
advantage of (6.37) is that every point (x, y) on L now satisfies this equation. Let
G be the graph of (6.37). Note that G is a line, by Theorem 6.11. The point P2 lies
in G because of (6.38), and P1 lies in G for trivial reasons. Thus G is also a line
passing through P1 and P2 . By assumption (L1) (page 165), G = L, which implies
L is the graph of equation (6.37). This proves Lemma 6.14.

Using (6.38), we can write equation (6.37) as
$$y - y_1 = \left(\frac{y_2 - y_1}{x_2 - x_1}\right)(x - x_1). \tag{6.40}$$
The equation (6.40) is called the two-point form of the equation of a line. The
virtue of equation (6.40) is that it is essentially equation (6.39), and the latter is
entirely memorable because it says the slope of L computed by using P1 and P2
(= left side) is equal to the slope of L computed by using P1 and another point
(x, y) on L (= right side).
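To use the two-point form (6.40) as a membership test without dividing by x − x1, one can multiply both sides by x2 − x1. This is a routine rearrangement of our own, not something the text does. The sketch below assumes x1 ≠ x2, as in Lemma 6.14. The function name is hypothetical.

```python
from fractions import Fraction

def satisfies_two_point_form(p1, p2, point):
    """Check whether `point` satisfies the two-point form (6.40).

    Both sides of (6.40) are multiplied by (x2 - x1), giving
    (y - y1)(x2 - x1) = (y2 - y1)(x - x1), so no division is needed
    and point == p1 is handled like any other point.
    """
    (x1, y1), (x2, y2), (x, y) = p1, p2, point
    if x1 == x2:
        raise ValueError("the two-point form needs x1 != x2")
    return (y - y1) * (x2 - x1) == (y2 - y1) * (x - x1)

P1, P2 = (0, 1), (2, 5)   # two points on y = 2x + 1
for pt in [(Fraction(1, 2), 2), (0, 1), (3, 7), (1, 4)]:
    print(pt, satisfies_two_point_form(P1, P2, pt))
```

The first three points lie on y = 2x + 1 and pass the test. The last one does not.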

Some applications of the main theorem

We begin with some examples about finding the equation of a line satisfying
certain geometric conditions; the preceding lemmas will come in handy. For the
first example, we introduce another definition: the x-coordinate of the point at
which a line L intersects the x-axis is called the x-intercept of L; sometimes the
point of intersection is itself called the x-intercept of L.

Example 1. Given a line which is the graph of y = mx + k, we have seen from Lemma 6.13 that k is the y-intercept of the line. If m ≠ 0, we can also determine its x-intercept: since the point (−k/m, 0) satisfies the equation y = mx + k, the x-intercept of the line is −k/m.
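Example 1 in code: a minimal sketch, where the function name `x_intercept` is hypothetical. The guard reflects the condition m ≠ 0. A horizontal line y = k with k ≠ 0 never meets the x-axis, and the x-axis itself (m = k = 0) meets it everywhere.

```python
from fractions import Fraction

def x_intercept(m, k):
    """x-intercept -k/m of the graph of y = mx + k, assuming m != 0."""
    if m == 0:
        raise ValueError("m = 0: a horizontal line has no single x-intercept")
    return -Fraction(k) / Fraction(m)

print(x_intercept(25, 50))                            # -2
print(x_intercept(Fraction(2, 3), Fraction(11, 3)))   # -11/2
```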

Example 2. What is the equation of the line which passes through (−1, 3) and (1/2, 4)?

Call this line ℓ. The slope of ℓ computed using the points (−1, 3) and (1/2, 4) is
$$\frac{4 - 3}{\frac{1}{2} - (-1)} = \frac{2}{3}.$$
By equation (6.37), the equation of ℓ is y − 3 = (2/3)(x − (−1)), which is y = (2/3)x + 11/3. Alternatively, we can use Lemma 6.13. Then the equation of ℓ has the form y = (2/3)x + k. Since the point (−1, 3) lies on ℓ, we have 3 = (2/3)(−1) + k. Thus k = 11/3 and the equation of ℓ is y = (2/3)x + 11/3. (We can equally well use the other point (1/2, 4) to evaluate k in the equation y = (2/3)x + k. Then we get 4 = (2/3) · (1/2) + k, so that k = 4 − 1/3 = 11/3 as before.)

Suppose you do not remember the exact statement of Lemma 6.14; then you start from scratch. You would begin by computing the slope of the line as before, obtaining 2/3. Now you have to remember Theorem 6.10,39 which says the slope can be computed using any two points. So you use (−1, 3) and an arbitrary point (x, y) on ℓ not equal to (−1, 3) to get
$$\frac{y - 3}{x - (-1)} = \frac{2}{3}.$$
Therefore y − 3 = (2/3)(x + 1) for all (x, y) on ℓ not equal to (−1, 3). Notice that in the latter form, the equality continues to hold even when (x, y) = (−1, 3). Therefore, rewriting y − 3 = (2/3)(x + 1) as
$$y = \frac{2}{3}x + \frac{11}{3}, \tag{6.41}$$
what you have then proved is that every point (x, y) on ℓ satisfies (6.41). Thus ℓ lies in the graph of (6.41). Since the latter is itself a line (Theorem 6.11), ℓ is the graph of (6.41), by (L1) on page 165.
So, once again, the answer is y = (2/3)x + 11/3.
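The arithmetic of Example 2 can be checked with exact fractions. This sketch only re-does the book's computation. It finds the slope from the two points and then solves y = mx + k for k once with each point, confirming that both give 11/3.

```python
from fractions import Fraction

p = (Fraction(-1), Fraction(3))
q = (Fraction(1, 2), Fraction(4))

# Slope of the line through p and q (Theorem 6.10).
m = (q[1] - p[1]) / (q[0] - p[0])

# Solve y = m*x + k for k, once with each of the two points.
k_from_p = p[1] - m * p[0]
k_from_q = q[1] - m * q[0]

print(m, k_from_p, k_from_q)   # 2/3 11/3 11/3
```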

Example 3. What is the equation of the line with slope 1/2 and passing through the point (3, 4)?

By Lemma 6.13, the equation is of the form y = (1/2)x + k for some constant k. Since (3, 4) has to be a solution of this equation, we have 4 = (1/2) · 3 + k, which implies that k = 2½. Thus the equation is y = (1/2)x + 2½. An alternative solution is the following. Let L be the line in question, and let (x′, y′) ∈ L with (x′, y′) ≠ (3, 4). Then by Theorem 6.10 on page 347, we have
$$\frac{y' - 4}{x' - 3} = \frac{1}{2},$$
or, equivalently, y′ = (1/2)x′ + 2½. This shows that (x′, y′) lies on the graph of y = (1/2)x + 2½. Since this graph is also a line (Theorem 6.11 on page 354) which passes through (3, 4) and whose slope is 1/2, L is equal to this graph, by Theorem 6.9 on page 347. Thus the equation of L is y = (1/2)x + 2½.
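A short numerical illustration of both solutions of Example 3, using exact fractions. The sample x-coordinates are arbitrary choices for illustration. First, k is found from the point (3, 4). Then, as in the alternative solution, each point whose difference quotient with (3, 4) is 1/2 is checked to lie on y = (1/2)x + 5/2, that is, y = (1/2)x + 2½.

```python
from fractions import Fraction

m = Fraction(1, 2)
x0, y0 = Fraction(3), Fraction(4)
k = y0 - m * x0                 # 4 - 3/2 = 5/2, i.e., 2 1/2

# Any point (x', y') != (3, 4) whose difference quotient with (3, 4)
# equals 1/2 satisfies y' = (1/2)x' + 5/2.
for x_prime in [Fraction(-4), Fraction(0), Fraction(5, 3), Fraction(10)]:
    y_prime = y0 + m * (x_prime - x0)
    assert (y_prime - y0) / (x_prime - x0) == m
    assert y_prime == m * x_prime + k

print(k)                        # 5/2
```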

We conclude this section with two applications. First, we will give an explicit
description of the segment joining two points.
Lemma 6.15. Let two distinct points P = (p, p′) and Q = (q, q′) be given.
(i) If p = q but p′ < q′, then the segment PQ consists of all the points {(p, t)}, where p′ ≤ t ≤ q′.
(ii) If p < q, let P and Q lie on the line whose equation is y = mx + k for some constants m and k. Then PQ consists of all the points {(s, ms + k)}, where p ≤ s ≤ q.
Proof. Let L be the line joining P to Q.
(i) If p = q, then L is the vertical line x = p. If T is the translation along the
vector from (0, 0) to (p, 0), then T maps the y-axis to L (Theorem G5 on page 236),

39 You cannot afford to forget something this basic.



so that T(0, y) = (p, y) for all real numbers y. Thus if P0 and Q0 are the points (0, p′) and (0, q′) on the y-axis, respectively, then
T(P0) = P and T(Q0) = Q,
so that T maps the segment P0Q0 to the segment PQ; i.e., T(P0Q0) = PQ. Since the segment P0Q0 consists of all the points {(0, t)} for all t satisfying p′ ≤ t ≤ q′, it follows that PQ = T(P0Q0) = {(p, t)}, where p′ ≤ t ≤ q′. This proves (i).

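[Figure: the translation T carries P0 = (0, p′) and Q0 = (0, q′) on the y-axis to P = (p, p′) and Q = (p, q′) on the vertical line L through (p, 0).]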
(ii) If p < q, then L is not vertical and is therefore the graph of an equation y = mx + k, where m is the slope of L; i.e.,
$$m = \frac{q' - p'}{q - p}.$$
(See Lemma 6.13.) Every point of L is therefore of the form (x, mx + k) for a real number x. Let P = (p, mp + k) and Q = (q, mq + k), and let S = (s, ms + k) be a point of the segment PQ other than P and Q, so that S is between P and Q. The vertical lines passing through P, S, and Q then intersect the x-axis at the numbers p, s, and q, respectively.

[Figure: points P, S, Q on L, with the vertical lines through them meeting the x-axis at p, s, q.]

These vertical lines being parallel, Lemma 4.8 on page 178 implies that s is between
p and q; i.e., p < s < q. Since the segment P Q consists of P , Q, and all the points
between P and Q, this shows that P Q consists of all the points {(s, ms + k)} where
p ≤ s ≤ q. The proof of the lemma is complete.
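Lemma 6.15 turns the question "is S on the segment PQ?" into a check on coordinates. Here is a minimal Python sketch of that check. The name `on_segment` is hypothetical. The sketch relabels the endpoints when they are given in the other order, which the lemma handles by its choice of notation.

```python
from fractions import Fraction

def on_segment(P, Q, S):
    """Decide whether S lies on the segment PQ, following Lemma 6.15.

    P, Q, S are coordinate pairs and P != Q is assumed (not checked).
    """
    (p, p_y), (q, q_y), (s, s_y) = P, Q, S
    if p == q:                                  # case (i): vertical segment
        low, high = sorted((p_y, q_y))
        return s == p and low <= s_y <= high
    if p > q:                                   # relabel so that p < q
        (p, p_y), (q, q_y) = (q, q_y), (p, p_y)
    m = Fraction(q_y - p_y) / Fraction(q - p)   # slope of the line PQ
    k = p_y - m * p                             # its y-intercept
    return s_y == m * s + k and p <= s <= q     # case (ii)

P, Q = (0, 1), (2, 5)                  # both on y = 2x + 1
print(on_segment(P, Q, (1, 3)))        # True
print(on_segment(P, Q, (3, 7)))        # False: on the line, but 3 > 2
print(on_segment((2, 5), (2, -1), (2, 0)))   # True: vertical case
```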

A second application is to give an explicit description of the half-lines of a given line L (in the sense of Lemma 4.5 on page 173) in the coordinate plane.

Lemma 6.16. (i) Let a nonvertical line L be the graph of y = ax + b, and let P be a point on L with coordinates (p, ap + b). Then the two half-lines of L determined by P are the following two subsets of L:

P− = all the points (x, ax + b) so that x < p.
P+ = all the points (x, ax + b) so that p < x.
[Figure: the line L with the point P above p on the x-axis; P− lies to the left of P and P+ to the right.]

(ii) For a vertical line L defined by x = c, if P = (c, p) is a point of L, then the half-lines of L determined by P are the subsets
P− = all the points (c, y) so that y < p.
P+ = all the points (c, y) so that p < y.

Proof. We first prove (i). Fix a point Q ∈ P + . Since L is the disjoint union of
{P } and the two half-lines L+ , L− with respect to P (see Lemma 4.5 on page 173),
Q lies in either L+ or L− . By changing the notation if necessary, we may assume
Q ∈ L+ . Thus, Q is a point in both L+ and P + . We claim that P + = L+ .
To show P + ⊂ L+ , let U ∈ P + and we will prove U ∈ L+ . Let the coordinates
of Q and U be (q, aq + b) and (u, au + b), respectively. Because Q and U are both
in P + , we have p < q and p < u. For definiteness, let q < u. Then by the preceding
lemma, the segment QU consists of all the points (x, ax + b) so that q ≤ x ≤ u.
Thus P cannot lie in QU because P = (p, ap + b) and p < q. So QU does not
contain P and, by Lemma 4.5(ii), Q and U lie in the same half-line of P . Since Q
lies in L+ , so does U . Thus, P + ⊂ L+ . Next, we prove L+ ⊂ P + . Let V ∈ L+ ,
and we must show V lies in P + . Suppose V lies in P − , and we will deduce a
contradiction. By the definition of P − , V = (v, av + b), where v < p. Consider the
segment V Q. By the preceding lemma, V Q consists of all the points (x, ax + b) so
that v ≤ x ≤ q. But v < p < q and P = (p, ap + b), so P lies in V Q. Now both V
and Q are points of L+ , so the convexity of L+ implies that V Q does not contain
P . This contradiction shows that V cannot lie in P − . Since L is obviously also a
disjoint union of P + , P − , and {P }, V has to lie in P + after all. We have proved
that P + = L+ .
It remains to show that P − = L− . We first prove P − ⊂ L− . Let U ∈ P − .
U cannot lie in L+ because if it did, U would lie in P + (since L+ = P + ), contra-
dicting the disjointness of P + and P − . So U lies in L− . Since U is an arbitrary
point of P − , we have P − ⊂ L− . Conversely, we show that L− ⊂ P − . Suppose V
is a point of L− . Then V cannot lie in P + because if it did, V would lie in L+
(since P + = L+ ), contradicting the disjointness of L− and L+ . So V lies in P −
and therefore L− ⊂ P − . Together, we have proved P − = L− . (i) is proved.
Having gone through the proof of (i) in such detail, we see that the proof of
(ii) is similar but simpler: replace (x, ax + b) by (c, y) and reason with y instead of
x, and there will be no need to invoke Lemma 6.15 because we will be looking at
a number line consisting of all the (c, y), where y ∈ R. The proof of the lemma is
complete.
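In practice, Lemma 6.16 reduces deciding which half-line a point of L lies in to a single comparison of coordinates. The following sketch is a hypothetical helper, not from the text, and it assumes the point is already known to lie on L. It compares x-coordinates for a nonvertical line (part (i)) and y-coordinates for a vertical one (part (ii)).

```python
def half_line(P, X, vertical=False):
    """Locate a point X of the line L relative to the point P of L.

    Returns "-" if X is in P-, "+" if X is in P+, and "P" if X == P,
    as described in Lemma 6.16.  X is assumed (not checked) to lie on L.
    For a nonvertical L, compare x-coordinates (part (i)); for a
    vertical L, compare y-coordinates (part (ii)).
    """
    i = 1 if vertical else 0
    if X[i] < P[i]:
        return "-"
    if X[i] > P[i]:
        return "+"
    return "P"

# L: y = 2x + 1 with P = (1, 3); then the vertical line x = 2 with P = (2, 0).
print([half_line((1, 3), X) for X in [(0, 1), (1, 3), (4, 9)]])
print([half_line((2, 0), X, vertical=True) for X in [(2, -3), (2, 5)]])
```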

Finally, the issue of graphing a linear equation of two variables brings out the
confrontation of theory with practice in graphic presentations. Consider the graph
L of an equation such as y = 25x + 50. Clearly, the two points (0, 50) and (−2, 0)
are on L, and by Theorem 6.11 on page 354, L is the line passing through these two
points. Now, how do we present this graph on the blackboard or on the pages of a book? By choice, the two points (1, 0) and (0, 1) are equidistant from the origin
O of the coordinate system; see the discussion on pp. 336ff. Such being the case,
once the two points (0, 0) and (1, 0) have been chosen on the horizontal x-axis,
the point (0, 50) on the y-axis would have to be, vertically, 50 times the distance
between (0, 0) and (0, 1) above (0, 0). This is simply not practical. A reasonable
compromise is to introduce a scaled coordinate system, i.e., rescale the y-axis,
so that the old distance between (0, 0) and (0, 1) is now interpreted to be 10 instead
of 1. Now the graph L of y = 25x + 50 can be presented as follows:
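[Figure: the graph L of y = 25x + 50 in the scaled coordinate system; the y-axis is marked 10, 20, 30, 40, 50 and the x-axis −5 through 1.]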
What must be borne in mind is that, in this graphic representation, the 90-degree rotation around O is no longer length-preserving, among other geometric anomalies. The reason is clear: the 90-degree counterclockwise rotation now maps (1, 0) to (0, 10), so that it maps the unit segment [0, 1] on the x-axis to a segment of length 10 (namely, the segment from (0, 0) to (0, 10)).
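To make the distortion concrete, here is a small sketch that rests on two assumptions not spelled out in the text. First, a point with scaled coordinates (x, y) is drawn at the page position (x, y/10). Second, the counterclockwise 90-degree rotation about O acts on page positions by (u, v) ↦ (−v, u), a standard fact we take for granted here. All helper names are hypothetical. Measured in the scaled coordinates, the image of the unit segment from (0, 0) to (1, 0) has length 10.

```python
from fractions import Fraction

SCALE_Y = 10   # assumption: one drawn unit on the y-axis is read as 10

def to_page(point):
    """Scaled coordinates (x, y) -> position (u, v) on the page."""
    x, y = point
    return Fraction(x), Fraction(y) / SCALE_Y

def from_page(position):
    """Position (u, v) on the page -> scaled coordinates."""
    u, v = position
    return u, v * SCALE_Y

def rotate90(position):
    """Counterclockwise rotation by 90 degrees about O on the page."""
    u, v = position
    return -v, u

def squared_length(A, B):
    """Squared distance computed with the scaled coordinates."""
    return (B[0] - A[0]) ** 2 + (B[1] - A[1]) ** 2

image = from_page(rotate90(to_page((1, 0))))
print(image == (0, 10))                                                # True
print(squared_length((0, 0), (1, 0)), squared_length((0, 0), image))   # 1 100
```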

Pedagogical Comments. The teaching of slope (or, more to the point, the
widespread nonteaching of slope) in school mathematics furnishes an excellent ex-
ample of how TSM40 causes massive nonlearning in school mathematics. In TSM
and standard professional development materials, there appear to be two ways to
approach the definition of the slope of a line. One way is to choose two points on
the line, compute the "rise-over-run" using these two fixed points, and declare that
this "rise-over-run" is the slope of the line. There is no hint of the fact that the
"rise-over-run" computed with any two points on the line would also be equal to the
same number. The standard excuse is that since similar triangles are not taught
until high school geometry—after slope has been taught and used—this indepen-
dence cannot be explained. A second way is to inform students that what we call
nonvertical lines are the graphs of equations of the form y = mx + b and then define
the slope of such a line to be the number m (in other words, Theorem 6.11 on page
354 is implicitly assumed). Such a definition of a line is totally inappropriate for
K–12 as it likely shatters students’ confidence about whether they even know what
a line is anymore.
Either of these approaches inevitably leads to rote-teaching and rote-learning
of linear equations and their graphs. A recent study by Postelnicu and Greenes
([PG]) of students’ understanding of straight lines in algebra reveals that the most
difficult problems for students are those requiring the identification of slope of a
40 See page xiv of the preface for the definition of TSM.

line from its graph. Let us pause for a moment to absorb the absurdity of the
situation: how could a routine computation (see (6.28) on page 347) become "the
most difficult problem"? One can well imagine that—with respect to the first
approach—if students do not know they can use any two points on a line to compute
its slope, they would naturally be confused about "how to measure rise-and-run"
(see page 15 of [PG], left column). It therefore stands to reason that, since teaching
slope strictly by analogies and metaphors for lack of a correct definition of slope
has led to widespread, abject failures in student learning, we should try returning
to first principles by beginning with a correct definition of it for a change. Let us
teach similar triangles and explain the reasoning surrounding the proof of Theorem
6.11 (pp. 354ff.).
One can appreciate the stranglehold TSM has exerted on school mathematics
education from the following three facts. In the mathematics education reform of
the 1990s, the fact that the concept of the slope of a line has almost never been
presented to students correctly did not seem to merit any discussion (compare
[NCTM1989] and [PSSM]). Moreover, for at least two decades since 1990, many
states pushed for "algebra in grade 8" without paying any attention to this glar-
ing defect in the teaching of slope and the attendant massive nonlearning of the
geometry of linear equations in two variables. Finally, even after the publication
of [CCSSM], which took over the recommendation of (a draft of) [Wu2016b] and
made a major change in the curriculum so that the concept of similar triangles is
taught in grade 8 to make possible a correct definition of slope, there still seems
to be little awareness in the mathematics education literature that slope has to be
correctly defined. For example, the lack of a correct definition of slope in TSM is
not mentioned in a 2014 article on the teaching of slope ([Nagle-Moore-Russo]) or
a 2015 volume on Mathematical Understanding for Secondary Teaching ([MUST]).
By now, it should be obvious that, in large part, these three volumes (this vol-
ume, [Wu2020b], and [Wu2020c]) have been written in response to just such abuses
in TSM. End of Pedagogical Comments.

Exercises 6.5.

(1) Solve for x: (a) 4bx + 13 = 2x + 26b, where b is a number not equal to
2 . Simplify your answer. (b) 5 ax − 17 = 3 ax − 2 , where a is a nonzero
1 2 1 15

number.
(2) Find the equation of the line in each of the following; understand that, in this situation, getting the right answer is only half the work because your solution must be supported by reasoning at each step. (a) The line ℓ passing through (−1/2, 3) with slope −5. (b) The line passing through the two points (−7, 2) and (3, −4) (write your answer in the form ax + by = c). (c) The line L with slope −2⅕ and x-intercept −4.
(3) (a) What is the equation of the line joining the two points (X, Y ) and
(Z, W )? (Write your answer in the form of ax + by = c.) (b) What is the
equation of the line whose slope is A and whose y-intercept is B?
(4) Prove Lemma 6.12 on page 354.
(5) For large x, e.g., x ≥ 10⁶, which of the graphs of the following two equations is above the other: y = 10x − 5,000,000 and y = x + 1,500,000?
6.6. PARALLELISM AND PERPENDICULARITY 363

(6) Explain directly as if to an eighth grader why the slope of the line defined by 2x − 5y = 7 is 2/5 by making use of only the definition of slope and without invoking Theorem 6.10.
(7) Find the equation of the line passing through (c, c³) and (d, d³), where c and d are numbers so that c ≠ 1, d ≠ 1, and c ≠ d. Simplify your answer.

6.6. Parallelism and perpendicularity


In this section, we show how the parallelism and perpendicularity of lines can be
characterized in terms of slope. In TSM, these characterizations are either stated
without proof or, worse, stated as "new" definitions of parallel and perpendicular
lines. As an application of the characterization of parallelism in terms of slope, we
give a coordinate description of a translation along a vector.
Characterization of parallelism (p. 363)
Characterization of perpendicularity (p. 366)
A coordinate description of translation (p. 368)

Characterization of parallelism

Theorem 6.17. Two distinct nonvertical lines have the same slope ⇐⇒ they
are parallel.
Remark. We have been talking informally about lines that slant this way /
or that way \. It is time to point out that, with the availability of Theorem 6.17,
we can give precision to these expressions: we say a line slants this way / if the
line passing through the origin and parallel to it lies in quadrants I and III,41 and
similarly, we say a line slants this way \ if the line passing through the origin
and parallel to it lies in quadrants II and IV. It follows from Theorem 6.17 that a
nonvertical and nonhorizontal line slanting this way / has positive slope, and one
slanting this way \ has negative slope.

Proof. We give two proofs.

First proof. This uses algebra by way of Theorem 6.11 on page 354.
Suppose lines L and L′ are nonvertical and parallel, and we want to prove that they have the same slope. Suppose not, and we will deduce a contradiction. By Theorem 6.11, we may assume that L and L′ are the graphs of y = mx + k and y = m′x + k′, respectively. By Lemma 6.13 on page 356, m and m′ are the slopes of L and L′ and, therefore, m ≠ m′ under the present assumption. It is easy to verify that the point
$$P = \left(\frac{k' - k}{m - m'},\; m\left(\frac{k' - k}{m - m'}\right) + k\right)$$
is a solution of both equations y = mx + k and y = m′x + k′.42 By the definition of the graphs of linear equations (page 352), P is the point of intersection of L and L′. This contradicts L ∥ L′.
41 By a common abuse of language, we ignore the point of the line at the origin O.
42 Obviously, P is obtained by solving the given simultaneous linear equations, a subject we
will take up in earnest in Section 6.7 below. However, the verification that this P is a solution of
both equations is independent of any theory about simultaneous equations.

Next, we look at the converse. Suppose L and L′ are distinct lines with the same slope and we have to prove that they are parallel. As before, let L and L′ be the graphs of y = mx + k and y = mx + k′, respectively. Note that the same m appears as the coefficient of x in both equations because the lines have the same slope (Lemma 6.13 again), and k ≠ k′ because L and L′ are distinct lines, by hypothesis. Again we use a contradiction argument. If they are not parallel, let them intersect at a point P = (p1, p2). By the definition of the graphs of linear equations, P is a solution of both y = mx + k and y = mx + k′. Thus,

p2 = mp1 + k and p2 = mp1 + k′.

It follows that k = p2 − mp1 = k′, contradicting k ≠ k′. The proof is complete.
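The point P in the first proof can be checked numerically. This sketch evaluates the formula for P with exact fractions and confirms that it satisfies both equations when m ≠ m′. The function name and the sample lines y = 2x + 1 and y = −(1/3)x + 4 are our own choices. The verification itself is one line of algebra: with x = (k′ − k)/(m − m′), we get (mx + k) − (m′x + k′) = (m − m′)x − (k′ − k) = 0.

```python
from fractions import Fraction

def intersection(m, k, m2, k2):
    """The point P of the first proof of Theorem 6.17, assuming m != m2.

    P = ((k2 - k)/(m - m2), m*(k2 - k)/(m - m2) + k) lies on both
    y = m*x + k and y = m2*x + k2.
    """
    m, k, m2, k2 = map(Fraction, (m, k, m2, k2))
    if m == m2:
        raise ValueError("equal slopes: the lines are parallel or equal")
    x = (k2 - k) / (m - m2)
    return x, m * x + k

x, y = intersection(2, 1, Fraction(-1, 3), 4)
print(x, y)                                        # 9/7 25/7
print(y == 2 * x + 1, y == Fraction(-1, 3) * x + 4)   # True True
```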

Second proof. This uses geometry.

If one of the lines ℓ1 and ℓ2 is horizontal, this theorem is easily disposed of on account of the characterization of horizontal lines as those with zero slope (Lemma 6.8 on page 346). Assume then that neither is horizontal.
First, suppose ℓ1 ∥ ℓ2, and we will prove they have the same slope. For clarity, we will assume that ℓ1 and ℓ2 have positive slope, but the case of negative slope can be handled in exactly the same way. Take a point P on ℓ1 and let a vertical line through P intersect ℓ2 at Q. (This vertical line must intersect ℓ2 because the latter is not vertical.) Since the lines are distinct, P ≠ Q. Go along the ray from P through Q, past Q, and stop at the point R so that |PQ| = |QR|. From Q, draw a horizontal line which meets ℓ1 at S, and from R also draw a horizontal line which meets ℓ2 at T. The following gives one representation of this situation.
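[Figure: the parallel lines ℓ1 (through P and S) and ℓ2 (through Q and T); P, Q, R lie on one vertical line, and QS and RT are horizontal.]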
Because △PSQ and △QTR are right triangles with legs parallel to the coordinate axes (Theorem G3 on page 224), the slopes of ℓ1 and ℓ2 are
$$\frac{|PQ|}{|SQ|} \quad\text{and}\quad \frac{|QR|}{|TR|},$$
respectively. We have to show that these two numbers are equal. Since we already know |PQ| = |QR|, it suffices to prove |SQ| = |TR|. We do this by showing that △PSQ and △QTR are congruent, which then immediately yields the desired equality |SQ| = |TR|, because the corresponding sides of congruent triangles have the same length (Theorem G7 on page 245).
To prove the congruence, we will make use of the ASA criterion of congruence (Theorem G9 on page 245). We have |PQ| = |QR| by construction. Also |∠PQS| = |∠QRT| = 90°, and ∠SPQ and ∠TQR have the same degrees because they are corresponding angles of the parallel lines ℓ1 and ℓ2 (Theorem G18 on page 277).

Hence triangles SPQ and TQR are congruent, from which we conclude that |SQ| = |TR|. This then completes the proof that ℓ1 and ℓ2 have the same slope.
Before proving the converse, we should ask what strategy we
might use. As always, what we can do depends on what tools
are available. Up to this point, what tools (theorems) are at our
disposal that would guarantee that two lines are parallel? Basi-
cally there is only one: Theorem G19 on page 281 in Chapter
5, which says that if the corresponding angles (or alternate in-
terior angles) of a transversal with respect to a pair of lines are
equal, then the lines are parallel. With the same construction
as above, we see that proving the equality of ∠SP Q and ∠T QR
would be our best bet. It then follows that we would try to prove
the congruence of triangles P SQ and QT R to achieve our goal.
[Figure: the same construction for the converse: P, Q, R on a vertical line, with S on ℓ1 and T on ℓ2.]
Conversely, suppose two distinct, nonvertical lines ℓ1 and ℓ2 have the same slope and we have to show that they are parallel. We are assuming they are not horizontal, so we may perform the same construction as before to get |PQ| = |QR| and right triangles PSQ and QTR. We are going to prove that the triangles are congruent by using the SAS criterion for congruence (Theorem G8 on page 245). We already have right angles ∠PQS and ∠QRT. We also have the equality of one pair of sides, |PQ| = |QR|, by construction. To get the equality of the other pair of sides, we use the hypothesis on the equality of the slopes of ℓ1 and ℓ2:
$$\frac{|PQ|}{|SQ|} = \frac{|QR|}{|TR|}.$$
Since |PQ| = |QR|, we have |SQ| = |TR|. Hence the triangles PSQ and QTR are congruent, and consequently, the corresponding angles43 ∠SPQ and ∠TQR are equal (Theorem G6 on page 240). This implies ℓ1 ∥ ℓ2 because their corresponding angles relative to the transversal PR are equal (Theorem G19 on page 281). The proof is complete.

Pedagogical Comment. We could have rephrased the second proof so that the concept of congruence is replaced by the concept of similarity. Then, we could have let U be any point on the ray from P through Q, rather than finessing the choice so that |PQ| = |QR|, and the whole argument would still be valid, provided we replace the SAS criterion for congruence by the SAS criterion for similarity (Theorem G21 on page 287). Such a proof would be a trifle more natural. But in terms of teaching in the school classroom, especially at the beginning of teaching geometry, congruence is the more intuitive and the more elementary concept, and students find it more accessible than similarity. To the extent possible, we would therefore exploit congruence. This accounts for the above proof. End of Pedagogical Comment.

43 Having given explicit details in the second proof of Theorem G18 (see page 278) on how to prove that a pair of angles are corresponding angles with respect to a transversal of parallel lines, we will henceforth omit such arguments (see especially the Pedagogical Comments on page 279).

Characterization of perpendicularity

Theorem 6.18. Two distinct, nonvertical lines are perpendicular ⇐⇒ the product of their slopes is −1.

We begin with a general observation about why the slopes of perpendicular lines that pass through the origin of a coordinate system must have opposite signs (i.e., one is negative and the other is positive; see page 125).
Restricting the discussion to nonvertical and nonhorizontal lines passing through the origin, we see that (excepting the origin, as always) they must lie completely inside either quadrants I and III, or quadrants II and IV (see page 335), as shown:

[Figure: left, a line through O lying in quadrants I and III; right, a line through O lying in quadrants II and IV.]

We are now going to explain why the lines in the left picture, those that lie completely inside quadrants I and III, are the ones with positive slope, and those in the right picture, those that lie completely inside quadrants II and IV, are the ones with negative slope. Take a point (a, b) on such a line ((a, b) ≠ (0, 0)); then using (a, b) and (0, 0) to compute its slope, we see that the slope is b/a. Since b and a have the same sign in quadrants I and III (see page 122 for the definition) and opposite signs in quadrants II and IV, it follows that the slope is positive for lines lying in quadrants I and III, and negative for lines lying in quadrants II and IV (recall Theorem 2.10 on page 116). As a consequence, if we have two rays issuing from O with positive slopes (i.e., the lines containing them have positive slopes), then each lies in quadrant I or quadrant III. The degree of the angle between these rays is therefore less than 90° if they lie in the same quadrant and greater than 90° if they do not, so in either case it is not 90°. Similarly for two lines with negative slopes. It follows that two lines whose slopes have the same sign can never be perpendicular to each other. Hence we have proved the following lemma.

Lemma 6.19. If two nonvertical lines passing through O are perpendicular, their
slopes have opposite signs.

We can now give the proof of Theorem 6.18. First suppose ℓ1 and ℓ2 are perpendicular lines. Let L1 and L2 be lines passing through the origin so that ℓ1 ∥ L1 and ℓ2 ∥ L2. (If ℓ1 or ℓ2 passes through the origin, we will let L1 or L2 be ℓ1 or ℓ2, as the case may be.) By Theorem 6.17, ℓ1 and L1 have the same slope. The same is true for ℓ2 and L2. Therefore it suffices to prove that the product of the slopes of L1 and L2 is equal to −1. By hypothesis, neither line is vertical and
6.6. PARALLELISM AND PERPENDICULARITY 367

they are perpendicular to each other; thus neither line is horizontal either. So both
L1 and L2 are nonvertical and nonhorizontal. By Lemma 6.19, we already know
that the product is a negative number. Hence, it suffices to prove that the product
of the absolute values of the slopes of L1 and L2 is equal to 1.
Observe that, because ℓ1 ⊥ ℓ2, we also have L1 ⊥ L2 (Exercise 3 on page 372). It follows from the preceding discussion that we may assume that L2 lies in quadrants I and III and L1 lies in quadrants II and IV. Let P2 be some point on the line L2 in quadrant I, and let ρ be the counterclockwise rotation of 90 degrees around the origin O. Then ρ(L2) is a line through O perpendicular to L2, so ρ(L2) = L1. Therefore, if P1 = ρ(P2), we have P1 ∈ L1. Furthermore, let the vertical line from P2 meet the x-axis at Q2. Then ρ(Q2), to be denoted by Q1, lies on the y-axis. As ρ is a congruence, we have
$$|P_1Q_1| = |P_2Q_2| \quad\text{and}\quad |OQ_1| = |OQ_2|. \tag{6.42}$$

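[Figure: the lines L1 and L2 through the origin O; P2 on L2 in quadrant I with Q2 on the x-axis below it, and P1 on L1 in quadrant II with Q1 on the y-axis; axes X and Y.]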
By (6.30) on page 348, the absolute value of the slope of L1 is |OQ1|/|P1Q1|, and the absolute value of the slope of L2 is |P2Q2|/|OQ2|. Thus, taking into account the equalities in (6.42), the product of the absolute values of the slopes of L1 and L2 is
$$\frac{|OQ_1|}{|P_1Q_1|}\cdot\frac{|P_2Q_2|}{|OQ_2|} = \frac{|OQ_2|}{|P_2Q_2|}\cdot\frac{|P_2Q_2|}{|OQ_2|} = 1.$$
This completes the first part of the proof of Theorem 6.18.
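As a numerical sanity check of this half of the proof (not a substitute for it), the Python sketch below takes a few sample points P2 in quadrant I. It applies ρ using the formula ρ(x, y) = (−y, x) stated later in this section, and confirms that the slopes of the lines OP2 and OP1 multiply to −1. The function names `rho` and `slope_from_origin` are hypothetical. Exact rational arithmetic from Python's standard `fractions` module avoids rounding.

```python
from fractions import Fraction

def rho(point):
    # 90-degree counterclockwise rotation about O: (x, y) -> (-y, x)
    x, y = point
    return (-y, x)

def slope_from_origin(point):
    # slope of the line through O = (0, 0) and `point`; requires point[0] != 0
    x, y = point
    return Fraction(y) / Fraction(x)

for P2 in [(3, 2), (1, 5), (Fraction(7, 2), Fraction(1, 3))]:   # sample points in quadrant I
    P1 = rho(P2)                      # lands in quadrant II
    m2 = slope_from_origin(P2)        # slope of L2
    m1 = slope_from_origin(P1)        # slope of L1 = rho(L2)
    print(P1, m1 * m2)                # the product is -1 in every case
```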

How shall we approach the proof of the converse? We want to prove that if the product of the slopes of L1 and L2 is −1, then L1 ⊥ L2. In other words, if P2 and P1 are arbitrarily chosen points on L2 and L1, respectively (other than O), we want to prove |∠P1OP2| = 90°.

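[Figure: the lines L1 and L2 through the origin O; P2 on L2 with Q2 on the x-axis below it, and P1 on L1 with Q1 on the y-axis; axes X and Y.]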
We already know that |∠Q1OQ2| = 90°, and clearly |∠Q1OQ2| = |∠P2OQ2| + |∠Q1OP2|. It follows that if we can show
|∠P1OQ1| = |∠P2OQ2|,
368 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

then we would have
|∠P1OP2| = |∠P1OQ1| + |∠Q1OP2| = |∠P2OQ2| + |∠Q1OP2| = 90°.
So how can we show |∠P1 OQ1 | = |∠P2 OQ2 |? We resort to the
standard reasoning of identifying these angles as corresponding
parts of similar or congruent triangles.
Now we proceed to the formal proof of the converse statement in Theorem
6.18. We have a choice of using congruence or similarity, but since we already used
congruence in the second proof of Theorem 6.17, we will phrase the present proof
in terms of similarity (see the Pedagogical Comment on page 365).
Suppose we have two lines ℓ1 and ℓ2 so that the product of their slopes is −1. We must prove that ℓ1 ⊥ ℓ2. Again we let L1 and L2 be lines passing through the origin and parallel to ℓ1 and ℓ2, respectively (or equal to ℓ1 or ℓ2, as the case may be, if ℓ1 or ℓ2 already passes through the origin). It suffices to prove that L1 ⊥ L2 (Exercise 3 on page 372). By Theorem 6.17, the product of the slopes of L1 and L2 is also −1. In particular, the slopes of L1 and L2 have opposite signs, so we may assume that L2 lies in quadrants I and III and L1 lies in quadrants II and IV. Let P2 be some point on L2 lying in quadrant I. Drop a vertical line from P2 so that it meets the x-axis at Q2 (see preceding picture). Let P1 be some point on L1 lying in quadrant II, and let a horizontal line from P1 meet the y-axis at Q1. If we can prove that △P1OQ1 ∼ △P2OQ2, then we would have |∠P1OQ1| = |∠P2OQ2| (Theorem G20 on page 287), so that
|∠P1OP2| = |∠P1OQ1| + |∠Q1OP2| = |∠P2OQ2| + |∠Q1OP2| = 90°.
In other words, L1 ⊥ L2, and therefore ℓ1 ⊥ ℓ2.
It remains to prove that △P1OQ1 ∼ △P2OQ2. Since the product of the slopes of L1 and L2 is −1, the product of the absolute values of the slopes of L1 and L2 is equal to 1. By reasoning that is familiar to us by now, this means
$$\frac{|P_2Q_2|}{|OQ_2|}\cdot\frac{|OQ_1|}{|P_1Q_1|} = 1.$$
Multiplying both sides of the equality by |OQ2|/|OQ1|, we get
$$\frac{|P_2Q_2|}{|P_1Q_1|} = \frac{|OQ_2|}{|OQ_1|}.$$
Since ∠P2Q2O and ∠P1Q1O are right angles, the SAS criterion for similarity (Theorem G21 on page 287) implies that △P1OQ1 ∼ △P2OQ2, as desired. The proof of Theorem 6.18 is complete.
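The converse can also be checked numerically, again only as an illustration. In the Python sketch below, the hypothetical helper `converse_check` takes a slope m2 > 0 for L2 and sets m1 = −1/m2 for L1. It then places P2, Q2, P1, Q1 as in the picture and tests the proportion used for SAS similarity. As an independent check it also computes the dot product of the vectors OP1 and OP2. That dot-product test is a swapped-in technique, not the argument of the text (compare Exercise 5 below); it is zero exactly when ∠P1OP2 is a right angle.

```python
from fractions import Fraction

def converse_check(m2):
    # Hypothetical helper: L2 has slope m2 > 0, L1 has slope m1 = -1/m2 (so m1 * m2 = -1).
    m2 = Fraction(m2)
    m1 = -1 / m2
    P2, Q2 = (Fraction(1), m2), (Fraction(1), Fraction(0))   # P2 on L2 in quadrant I, Q2 on the x-axis
    P1, Q1 = (Fraction(-1), -m1), (Fraction(0), -m1)          # P1 on L1 in quadrant II, Q1 on the y-axis
    # all four segments are horizontal or vertical, so lengths are coordinate differences
    P2Q2, OQ2 = abs(P2[1] - Q2[1]), abs(Q2[0])
    P1Q1, OQ1 = abs(P1[0] - Q1[0]), abs(Q1[1])
    ratios_equal = P2Q2 / P1Q1 == OQ2 / OQ1   # the proportion used with SAS similarity
    dot = P1[0] * P2[0] + P1[1] * P2[1]        # zero exactly when angle P1 O P2 is a right angle
    return ratios_equal, dot

for m2 in [Fraction(2, 3), 5, Fraction(7, 2)]:
    print(m2, converse_check(m2))
```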

A coordinate description of translation

We are now in a position to give the precise coordinate descriptions of a few basic isometries, the most important of these being the following.

Lemma 6.20. Let T be the translation along the vector $\overrightarrow{BC}$, where B = (b1, b2) and C = (c1, c2). Then for all (x, y) in R², T(x, y) = (x + a1, y + a2), where (a1, a2) = (c1 − b1, c2 − b2).

Remark. When B is the origin O = (0, 0), the lemma simplifies to the following: let T be the translation along the vector $\overrightarrow{OC}$, where C = (c1, c2). Then for any (x, y) in R², T(x, y) = (x + c1, y + c2).
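The formula of Lemma 6.20 carries over directly into code. The minimal Python sketch below builds T from the two points B and C; the helper name `translation` and the sample points are hypothetical. As it must, T sends B to C.

```python
def translation(B, C):
    # Lemma 6.20: T(x, y) = (x + a1, y + a2) with (a1, a2) = (c1 - b1, c2 - b2)
    a1, a2 = C[0] - B[0], C[1] - B[1]
    def T(point):
        x, y = point
        return (x + a1, y + a2)
    return T

T = translation((1, 2), (4, 7))   # sample B and C; here (a1, a2) = (3, 5)
print(T((0, 0)))    # (3, 5)
print(T((1, 2)))    # (4, 7): T sends B to C
```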

Proof. Let the line passing through B and C be denoted by L. Let P be a point
with coordinates (p1 , p2 ). We will prove that T (P ) has coordinates (p1 +a1 , p2 +a2 ).
According to the definition of translation on page 234, the proof is broken into two
cases.
Case 1. P ∈ L. [The lemma in this case is pictorially obvious but its proof is tedious. The suggestion is to skip this proof in a school classroom and concentrate on proving Case 2 only.]
First assume that L is not vertical; i.e., b1 ≠ c1. Let Q denote the point (p1 + a1, p2 + a2), where, as in the lemma, (a1, a2) = (c1 − b1, c2 − b2). We want to prove that Q = T(P). The following picture is for the case a1 > 0, a2 > 0, and p1 < b1:
[Figure: the line L containing P, Q, B, C, with arrows from P to Q and from B to C; axes X and Y with origin O.]
According to the definition of translation on page 234, we have to prove that Q lies on L, that |PQ| = |BC|, and that $\overrightarrow{PQ}$ and $\overrightarrow{BC}$ point in the same direction. Let us first observe that |PQ| = |BC| because the distance formula (equation (6.20) on page 336) says
$$|PQ| = \sqrt{((p_1+a_1)-p_1)^2 + ((p_2+a_2)-p_2)^2} = \sqrt{a_1^2 + a_2^2} = \sqrt{(c_1-b_1)^2 + (c_2-b_2)^2} = |BC|.$$
Next, we prove that Q lies on L. Let LPQ denote the line containing P and Q, as usual. Now L and LPQ are two lines that contain the point P. They also have the same slope, because the slope of LPQ is (note that a1 = c1 − b1 ≠ 0)
$$\frac{(p_2+a_2)-p_2}{(p_1+a_1)-p_1} = \frac{a_2}{a_1} = \frac{c_2-b_2}{c_1-b_1},$$
and the latter is the slope of L. Therefore, by Theorem 6.9 on page 347, the lines L and LPQ coincide. Therefore Q lies on L = LPQ.
Finally, we prove that $\overrightarrow{PQ}$ and $\overrightarrow{BC}$ point in the same direction. There are two cases: a1 > 0 and a1 < 0. It suffices to take up the case a1 > 0, as the second case is similar. Recall that P = (p1, p2) and B = (b1, b2). Suppose p1 < b1, as in the preceding picture. Then we claim that the ray RBC is contained in the ray RPQ. Write Q = (q1, q2). Since a1 > 0, we have p1 < q1 (= p1 + a1). Therefore the ray RPQ consists of all the points (x, y) on L so that p1 ≤ x, according to Lemma 6.16 on page 359. By the same token, since c1 = b1 + a1 > b1, the ray RBC consists of all the points (x, y) on L so that b1 ≤ x. Thus for any point (x′, y′) lying in RBC, we have p1 < b1 ≤ x′. It follows that RBC ⊂ RPQ. Now if we let L0 be the vertical line passing through P, then the closed right half-plane of L0 (consisting of all the

points (x, y) in the plane so that p1 ≤ x) contains both RPQ and RBC. This shows that $\overrightarrow{PQ}$ and $\overrightarrow{BC}$ point in the same direction.
Suppose b1 < p1 instead.
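[Figure: the line L with B and C lying to the left of P and Q (the case b1 < p1), arrows from B to C and from P to Q; axes X and Y with origin O.]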

Then if we go through the preceding argument, interchanging the pair B, C with the pair P, Q, we will be able to conclude that RPQ ⊂ RBC. Now if we let L0 be the vertical line passing through B, then the closed right half-plane of L0 will contain both RPQ and RBC. Once again, this shows that $\overrightarrow{PQ}$ and $\overrightarrow{BC}$ point in the same direction. Thus T(P) = Q if P ∈ L and L is not vertical.
If L is vertical, then b1 = c1 and a1 = 0. It is straightforward to see that,
in this case, the preceding argument simplifies; e.g., we prove that if B is above
(respectively, below) C, then P is also above (respectively, below) Q. The proof of
Case 1 is complete.
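For one sample point P on L, two of the three conditions verified in Case 1 can be checked in exact arithmetic: that Q lies on L, and that |PQ| = |BC|. The Python sketch below does this; the same-direction condition is not checked. Two assumptions are made here rather than taken from the text. Collinearity is tested with a 2D cross product in place of slopes, which also covers vertical lines. Squared distances are compared to avoid square roots. The points B, C and the parameter t are hypothetical sample values.

```python
from fractions import Fraction

def on_line(B, C, X):
    # X lies on the line through B and C exactly when the 2D cross product
    # (C - B) x (X - B) is zero; unlike a slope test, this also handles vertical lines
    return (C[0] - B[0]) * (X[1] - B[1]) - (C[1] - B[1]) * (X[0] - B[0]) == 0

def squared_distance(U, V):
    # |UV|^2 from the distance formula; comparing squares avoids irrational square roots
    return (U[0] - V[0]) ** 2 + (U[1] - V[1]) ** 2

B, C = (Fraction(1), Fraction(2)), (Fraction(4), Fraction(7))   # sample B and C (assumed)
a1, a2 = C[0] - B[0], C[1] - B[1]

t = Fraction(-1, 2)                     # P = B + t(C - B) lies on L; t < 0 gives p1 < b1
P = (B[0] + t * a1, B[1] + t * a2)
Q = (P[0] + a1, P[1] + a2)              # the candidate for T(P) from Lemma 6.20

print(on_line(B, C, P), on_line(B, C, Q))                 # True True
print(squared_distance(P, Q) == squared_distance(B, C))   # True
```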

Case 2. P is not on L. Again let P = (p1, p2) and let T(P) = Q. According to Lemma 4.21 on page 234, Q is the intersection of two lines: the line L′ that passes through P and is parallel to L, and the line ℓ that passes through C and is parallel to LBP. We have to prove that Q = (p1 + a1, p2 + a2).
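[Figure: the line L through B and C, the point P not on L, the line L′ through P parallel to L, and the line ℓ through C parallel to LBP, meeting at Q; axes X and Y with origin O.]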

Our strategy to prove Q = (p1 + a1, p2 + a2) is to let Q′ be the point (p1 + a1, p2 + a2) and then show that Q′ coincides with Q. To this end, we will prove below that L ∥ LPQ′ and LCQ′ ∥ LBP. Assuming this for the moment, the parallel postulate tells us that LPQ′ has to be the line passing through P and parallel to L, and therefore L′ = LPQ′. Similarly, LCQ′ has to be the line passing through C and parallel to LBP, and therefore ℓ = LCQ′. Since two distinct lines can intersect at only one point (Lemma 4.2 on page 165), Q′ = Q and we are done in this case.
Let us now prove L ∥ LPQ′. First assume that L is not vertical. Then b1 ≠ c1, so that a1 = c1 − b1 ≠ 0. We will compute the slope of LPQ′ using (of course) the two points P = (p1, p2) and Q′ = (p1 + a1, p2 + a2):
$$\text{slope of } L_{PQ'} = \frac{(p_2+a_2)-p_2}{(p_1+a_1)-p_1} = \frac{a_2}{a_1}.$$

Next, we compute the slope of L using the two points B = (b1, b2) and C = (c1, c2). Because a1 = c1 − b1 and a2 = c2 − b2, we have C = (a1 + b1, a2 + b2). Hence,
$$\text{slope of } L = \frac{c_2-b_2}{c_1-b_1} = \frac{a_2}{a_1}.$$
These two slopes being equal, we see that L ∥ LPQ′ (Theorem 6.17 on page 363) when L is not vertical. Now if L is vertical, then b1 = c1 and a1 = c1 − b1 = 0. Since Q′ = (p1 + a1, p2 + a2) = (p1, p2 + a2), the points Q′ and P = (p1, p2) have the same first coordinate, so LPQ′ is also vertical. Thus L ∥ LPQ′ as before.
Next, we will prove that LCQ′ ∥ LBP. First assume that LBP is not vertical, so that b1 ≠ p1. Then, using Theorem 6.17 again, we will prove that the slopes of LCQ′ and LBP are equal. Using the fact that C = (a1 + b1, a2 + b2), we compute the slope of LCQ′ using C and Q′:
$$\text{slope of } L_{CQ'} = \frac{(a_2+b_2)-(p_2+a_2)}{(a_1+b_1)-(p_1+a_1)} = \frac{b_2-p_2}{b_1-p_1} = \text{slope of } L_{BP}.$$
Thus LCQ′ ∥ LBP in case LBP is not vertical. It remains to consider the case that LBP is vertical. Then b1 = p1, so that a1 + b1 = a1 + p1. Since a1 = c1 − b1, this implies c1 = p1 + a1, so the first coordinates of C and Q′ are equal. Therefore LCQ′ is also vertical, and again LCQ′ ∥ LBP. The proof of the lemma is complete.
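The two parallelism claims of Case 2 can be spot-checked for one sample configuration. The Python sketch below does this, with B, C, and P as hypothetical values and P chosen off L. It swaps in a 2D cross product of direction vectors for the slope comparison of the text, so the vertical and nonvertical cases need no separate treatment. A zero cross product says only that the directions agree. The lines are then parallel or equal, and they are distinct here because P is not on L.

```python
from fractions import Fraction

def vec(U, V):
    # the displacement from U to V
    return (V[0] - U[0], V[1] - U[1])

def cross(u, v):
    # 2D cross product: zero exactly when u and v are parallel (or one is the zero vector)
    return u[0] * v[1] - u[1] * v[0]

B, C = (Fraction(1), Fraction(2)), (Fraction(4), Fraction(7))   # sample points (assumed)
P = (Fraction(5), Fraction(-1))                                  # a sample point off L
a1, a2 = C[0] - B[0], C[1] - B[1]
Q_prime = (P[0] + a1, P[1] + a2)                                 # Q' = (p1 + a1, p2 + a2)

print(cross(vec(B, C), vec(B, P)) != 0)        # True: P is not on L (Case 2)
print(cross(vec(P, Q_prime), vec(B, C)) == 0)  # True: L_PQ' has the direction of L
print(cross(vec(C, Q_prime), vec(B, P)) == 0)  # True: L_CQ' has the direction of L_BP
```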

Lemma 6.20 can be used to give the shortest proof that the composition of two translations is a translation (see Exercise 10 on page 238). The same lemma also shows that the composition of translations is commutative, in the following sense. Suppose T is the translation along a vector $\overrightarrow{AB}$ and T′ is the translation along a vector $\overrightarrow{A'B'}$. Then for all (x, y) in the plane, (T ◦ T′)(x, y) = (T′ ◦ T)(x, y).
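In coordinates, the commutativity follows from the commutativity of addition: both T ◦ T′ and T′ ◦ T add (a1 + a1′, a2 + a2′). The Python sketch below illustrates this on a few sample points; the helper names and the two sample vectors are hypothetical.

```python
def translation_by(a1, a2):
    # T(x, y) = (x + a1, y + a2): the coordinate form of a translation given by Lemma 6.20
    return lambda p: (p[0] + a1, p[1] + a2)

def compose(F, G):
    # (F o G)(p) = F(G(p))
    return lambda p: F(G(p))

T = translation_by(3, 5)          # e.g. along the vector from A = (1, 2) to B = (4, 7)
T_prime = translation_by(-2, 4)   # e.g. along the vector from A' = (0, 0) to B' = (-2, 4)

for p in [(0, 0), (1, -1), (10, 7)]:
    # both orders give the translation by (3 + (-2), 5 + 4) = (1, 9)
    assert compose(T, T_prime)(p) == compose(T_prime, T)(p) == (p[0] + 1, p[1] + 9)
print("T o T' == T' o T on all sample points")
```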

Mathematical Aside: Recall that we introduced the group of translations in the plane in the Mathematical Aside on page 235. The preceding remark about the composition of translations being commutative means that the group of translations in the plane is in fact an abelian group.

It turns out that, with the help of trigonometric functions and complex numbers, we will be able to express all the basic isometries in terms of coordinates, at least in principle (see Section 1.6 in [Wu2020c]). In the meantime, here are some simple rotations and reflections in terms of coordinates. (We leave their proofs to Exercises 8, 9, and 11 on page 372.)
(i) Let ρ denote the counterclockwise rotation of 90 degrees around the origin O of R². Then for every (x, y) ∈ R², ρ(x, y) = (−y, x).44
(ii) Let ρ0 be the 180-degree rotation around the origin O. Then for every (x, y) ∈ R², we have ρ0(x, y) = (−x, −y).
(iii) If Λ1 denotes the reflection with respect to the x-axis, then for every (x, y) ∈ R², Λ1(x, y) = (x, −y).
(iv) If Λ2 denotes the reflection with respect to the y-axis, then for every (x, y) ∈ R², Λ2(x, y) = (−x, y).

44 If (x, y) lies in the first quadrant, this is implicit in the proof of Theorem 6.18.
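The four coordinate formulas (i)–(iv) can be written as plain Python functions; the names below are hypothetical stand-ins for ρ, ρ0, Λ1, and Λ2. The sketch also runs two consistency checks that follow from the formulas: two quarter turns make a half turn, and reflecting across both axes is the same as ρ0. These are illustrations only; the proofs are left to the exercises.

```python
def rho(p):        # (i) 90-degree counterclockwise rotation about O
    x, y = p
    return (-y, x)

def rho0(p):       # (ii) 180-degree rotation about O
    x, y = p
    return (-x, -y)

def reflect_x(p):  # (iii) Lambda_1: reflection across the x-axis
    x, y = p
    return (x, -y)

def reflect_y(p):  # (iv) Lambda_2: reflection across the y-axis
    x, y = p
    return (-x, y)

p = (3, 2)
print(rho(p), rho0(p), reflect_x(p), reflect_y(p))   # (-2, 3) (-3, -2) (3, -2) (-3, 2)
print(rho(rho(p)) == rho0(p))                          # True: two quarter turns make a half turn
print(reflect_x(reflect_y(p)) == rho0(p))              # True
```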

Exercises 6.6.
(1) What is the equation of the line which is perpendicular to the graph of ax + by = c, where a, b, c are constants, a ≠ 0, and b ≠ 0, and which passes through the point (−1/2, 3)?
(2) Consider a triangle with vertices at (1, 1), (5, 1), and (7½, 4). Does the point (2, −1/6) lie on the line passing through the vertex (1, 1) and perpendicular to the opposite side of the triangle? Explain.
(3) Prove the following assertion, which was used to prove Theorem 6.18: let L and V be two perpendicular lines. If L′ and V′ are lines so that L ∥ L′ and V ∥ V′, then also L′ ⊥ V′.
(4) Let O be the origin of a coordinate system in R2 , and let r be a positive
number. Prove that the transformation T of R2 which assigns to each
point (a, b) the point (ra, rb) is exactly the dilation with center O and
scale factor r. (Caution: This is a slippery proof.)
(5) (a) Let L be the line joining O (the origin (0, 0)) to the point (a, b), and let L′ be the line joining O to the point (a′, b′). Prove: L ⊥ L′ if and only if aa′ + bb′ = 0. (You may recognize the latter as the dot product of the vectors (a, b) and (a′, b′) that you came across in calculus.) (b) Let L and L′ be the lines defined by the equations ax + by = c and a′x + b′y = c′, respectively. Prove that L ⊥ L′ if and only if aa′ + bb′ = 0.
(6) Write out a self-contained proof, using only congruent triangles and with-
out using similar triangles, of the second half of Theorem 6.18; i.e., if the
product of the slopes of two lines is −1, then the lines are perpendicular.
(7) Let T be the translation along the vector from O to a fixed point (a1 , a2 )
in R2 . (a) If L is the vertical line defined by x = 27 , what is the equation
of T (L)? (b) If L is the horizontal line defined by y = 51, what is the
equation of T (L)? (c) If L is the line defined by 2x − 3y = 1, what is the
equation of T (L)?
(8) (i) Let ρ0 be the 180-degree rotation around the origin O. Prove that for every (x, y) ∈ R², we have ρ0(x, y) = (−x, −y). (ii) Let φ be the 180-degree rotation around the point (a, b). Prove that for every (x, y) ∈ R², we have φ(x, y) = (2a − x, 2b − y).
(9) (i) If Λ1 denotes the reflection with respect to the x-axis, prove that for
every (x, y) ∈ R2 , Λ1 (x, y) = (x, −y). (ii) If Λ2 denotes the reflection with
respect to the y-axis, prove that for every (x, y) ∈ R2 , Λ2 (x, y) = (−x, y).
(10) Let L be the line defined by 3x − 4y = 3, and let Λ denote the reflection
across L. Compute Λ(7, 13 ).
(11) (i) Let ρ be the 90◦ counterclockwise rotation around the origin O and
let φ be the 90◦ clockwise rotation around O. Prove that for any point
(x, y), ρ(x, y) = (−y, x) and φ(x, y) = (y, −x).
(ii) Let (a, b) be a fixed point in the coordinate plane, let ϱ be the 90° counterclockwise rotation around (a, b), and let ϕ be the 90° clockwise rotation around (a, b). Prove that for any point (x, y),

ϱ(x, y) = (−y + a + b, x − a + b) and ϕ(x, y) = (y + a − b, −x + a + b).

(iii) Let P and Q be two distinct points, let ϱP be the 90° counterclockwise rotation around P, and let ϕQ be the 90° clockwise rotation around
6.7. SIMULTANEOUS LINEAR EQUATIONS 373

Q. Prove that both ϕQ ◦ ϱP and ϱP ◦ ϕQ are nontrivial translations, i.e., translations that are not the identity transformation of the plane.
(iv) Prove that the compositions of rotations in part (iii) are not rota-
tions by proving that a nontrivial translation is never equal to a rotation.
(This puts in perspective Exercise 10 on page 238.)

6.7. Simultaneous linear equations


In this section, we give a complete analysis of the solution set of a linear system
of two equations in two unknowns, first geometrically in terms of the two lines
defined by the pair of equations and then algebraically in terms of the so-called
determinant of the linear system. In the last subsection, we also give an explanation
of why the standard method of solution of such linear systems by substitution is
correct.
The solution set of a linear system: Geometry (p. 373)
The solution set of a linear system: Algebra (p. 377)
Solution by substitution (p. 380)

The solution set of a linear system: Geometry

Suppose we are given constants a, b, . . . , f and we want to know if there are numbers x and y so that they are solutions of both of the following linear equations:

ax + by = e,
cx + dy = f.
Such a pair of linear equations is variously called a linear system, or a system
of linear equations, or sometimes, simultaneous (linear) equations in the
numbers x and y. To be precise, one would have to refer to such a pair of equations
as a linear system of two equations in two unknowns or in two variables,
where the "unknowns" or "variables" refer to the symbols x and y. An ordered
pair (x0 , y0 ) is called a solution of the system if it is a solution of both equations.
To solve the system is to find all the solutions of the system. Sometimes we also
call the collection of all these (x0 , y0 )’s the solution set of the system.
Consider, for example,

$$\begin{cases} x + y = 2,\\ 3x + 3y = 6. \end{cases} \tag{6.43}$$
This linear system has as solution all ordered pairs of the form (t, 2 − t), where t is
an arbitrary number. In this case, the solution set is an infinite collection of points,
being all the points on the graph of x + y = 2 (the second equation has the same
graph as the first). On the other hand, the system

x + y = 2,
3x + 3y = 5
clearly has no solution, because if (x0 , y0 ) is a solution, the fact that it is a solution
of the first equation means x0 + y0 = 2. By multiplying both sides of this equality
by 3, we get 3x0 + 3y0 = 6. But since (x0 , y0 ) is (supposedly) also a solution of the
second equation, we also have 3x0 + 3y0 = 5, and therefore 5 = 6, a contradiction.
Therefore, the solution set in this case is the empty set, i.e., no solutions.

In between the two preceding extreme cases, "most" linear systems have exactly
one pair (x0 , y0 ) as a solution. We will explain what "most" means and why this
is true by way of geometry.
Let ℓ1, ℓ2 be the lines which are the graphs of the equations ax + by = e and cx + dy = f, respectively. Suppose (x0, y0) is a solution of the linear system

ax + by = e,
cx + dy = f.

In particular, this means we are assuming that there is a solution of the system. We wish to interpret this solution geometrically. Since (x0, y0) is a solution of ax + by = e, the point (x0, y0) lies on ℓ1, by the definition of the graph of an equation. For the same reason, (x0, y0) lies on ℓ2 as well. Therefore (x0, y0) lies on both ℓ1 and ℓ2, and therefore it lies in the intersection of ℓ1 and ℓ2. (We have to be careful not to assume that the intersection of ℓ1 and ℓ2 is a point. We cannot a priori exclude the possibility that ℓ1 = ℓ2, as in (6.43) above, in which case the intersection of ℓ1 and ℓ2 is the line itself.) Conversely, suppose (x0, y0) lies in the intersection of ℓ1 and ℓ2; then it must be a solution of the system

ax + by = e,
cx + dy = f

because (x0, y0) being on ℓ1 means ax0 + by0 = e and (x0, y0) being on ℓ2 means cx0 + dy0 = f. We have therefore proved the following basic fact relating the solutions of a linear system to the graphs of the equations in the system.

Theorem 6.21. A simultaneous system of linear equations has a solution (x0, y0) ⇐⇒ the point (x0, y0) lies in the intersection of the (linear) graphs of the equations in the linear system.

Theorem 6.21 gives the precise reasoning for why the solution of a linear system
of two linear equations in two unknowns corresponds to the intersection of the
two lines defined by the linear equations of the system. This is the reason why
one can get the solution of a system of simultaneous linear equations by graphing
the equations. Such a correspondence is usually decreed by fiat in TSM without
explanation, probably because the precise definition of the graph of an equation is
rarely given or, if given, is not put to use.
It is worth noting that Theorem 6.21 shares a common feature with a coor-
dinate system: they both provide a dictionary that mediates two disparate sets
of information: the algebraic information about solutions of a linear system and
the geometric information about intersections of lines. In this particular case, we
know all about the intersections of lines (see (L1) and (L2) on page 165) and will
therefore use this knowledge to shed light on the solutions of linear systems. We
know that there are exactly three mutually exclusive possibilities for two lines in
the plane: the lines are either
identical or
parallel or
distinct but not parallel.

Correspondingly, the lines
intersect at an infinite number of points or
do not intersect or
intersect at exactly one point.
From Theorem 6.21, we therefore conclude the following.

Corollary. Given a linear system of two equations in two unknowns, let the graphs of the linear equations be ℓ1 and ℓ2. Then the linear system either
has an infinite number of solutions (corresponding to ℓ1 = ℓ2) or
has no solution (corresponding to ℓ1 ∥ ℓ2) or
has a unique solution (corresponding to ℓ1 ≠ ℓ2 but ℓ1 is not parallel to ℓ2).

We can now explain what is meant by "most" linear systems have a unique
solution. Given two lines, what are the chances that they are either identical or
parallel? This is in fact a precise mathematical question that can be answered
completely: zero. To obtain this answer, one must do some advanced mathematics.
Nevertheless, one can provide an intuitive understanding of the situation by fixing
one of the lines, say 1 , and ask what the chances are that the other line 2 either
coincides with 1 or is parallel to 1 . Clearly we can ignore the possibility of 2
actually equaling 1 because this almost never happens. What about 2  1 ? Look
at it this way: restrict 2 to be a line passing through a fixed point P not lying
in 1 ; then according to the parallel postulate (page 165), there is at most "one
chance" that 2  1 , whereas there are infinitely many possibilities for 2 not to
be parallel to 1 . Since this is true for any point P not lying on 1 , it is intuitively
clear that, "almost always", 2 will be a line distinct from 1 and not parallel to 1 .
So by the Corollary, a linear system will "almost always" have a unique solution.
We now take this corollary of Theorem 6.21 to the next level: what are the algebraic properties of the linear system that would lead to an infinite number of solutions, no solution, and a unique solution? We can literally follow the prescription of the preceding corollary and just doggedly investigate the algebraic properties of the linear equations that correspond to ℓ1 = ℓ2, ℓ1 ∥ ℓ2, and ℓ1 ≠ ℓ2 but ℓ1 ∦ ℓ2. This would lead to a depressing case-by-case argument with thickets of details that would ultimately not be particularly enlightening.
Here is one way this argument could be carried out.
Case 1. The graphs ℓ1 and ℓ2 of ax + by = e and cx + dy = f, respectively, coincide. If they are vertical, then b = d = 0 and the equations become x = e/a and x = f/c. Their graphs are identical ⇐⇒ e/a = f/c. Therefore this case is equivalent to
$$b = d = 0 \quad\text{and}\quad \frac{e}{a} = \frac{f}{c}.$$
If they are not vertical, then both b ≠ 0 and d ≠ 0, and we may rewrite the system as
$$\begin{cases} y = mx + k,\\ y = m'x + k', \end{cases}$$
where m = −a/b, k = e/b, m′ = −c/d, and k′ = f/d. Then ℓ1 and ℓ2 being identical means they have the same slope, and therefore m = m′. The equations of ℓ1 and ℓ2 become y = mx + k and y = mx + k′,

respectively. If k ≠ k′, then (0, k) would be a point lying on ℓ1 but not on ℓ2. Thus k = k′. Therefore in this case,
$$\frac{a}{b} = \frac{c}{d} \quad\text{and}\quad \frac{e}{b} = \frac{f}{d}.$$

Case 2. The graphs ℓ1 and ℓ2 of ax + by = e and cx + dy = f, respectively, are parallel. Then they are distinct, and either both are vertical, or both are nonvertical and have the same slope (Theorem 6.17 on page 363). In the former case, being vertical means b = d = 0, and since they are distinct, the equations ax = e and cx = f have distinct graphs. Hence, e/a ≠ f/c. To summarize, ℓ1 and ℓ2 being vertical and parallel is equivalent to
$$b = d = 0 \quad\text{and}\quad \frac{e}{a} \neq \frac{f}{c}.$$
In the latter case, since ℓ1 and ℓ2 are nonvertical, b ≠ 0 and d ≠ 0, and we can write the system as
$$\begin{cases} y = mx + k,\\ y = m'x + k', \end{cases}$$
where m = −a/b, k = e/b, m′ = −c/d, and finally, k′ = f/d. Since ℓ1 and ℓ2 have the same slope, m = m′, so that a/b = c/d and therefore, by the cross-multiplication algorithm, ad = bc. Moreover, the graphs of y = mx + k and y = mx + k′ (recall m = m′) are distinct ⇐⇒ k ≠ k′, i.e., e/b ≠ f/d. To summarize, ℓ1 and ℓ2 being nonvertical, parallel, and distinct is equivalent to
$$ad - bc = 0 \quad\text{and}\quad \frac{e}{b} \neq \frac{f}{d}.$$

Case 3. The graphs ℓ1 and ℓ2 of ax + by = e and cx + dy = f, respectively, are not parallel and not the same line. Assume first that both lines are not vertical. Then, as in Case 1, b ≠ 0 and d ≠ 0, so that we may rewrite the system as
$$\begin{cases} y = mx + k,\\ y = m'x + k', \end{cases}$$
where m = −a/b, k = e/b, m′ = −c/d, and k′ = f/d. By Theorem 6.17 on page 363, ℓ1 and ℓ2 not being parallel is equivalent to m ≠ m′. Therefore a/b ≠ c/d, and by the cross-multiplication algorithm, ad ≠ bc. Hence this case is equivalent to
$$ad \neq bc.$$
Next, we consider the case where one of the lines is vertical; let us say ℓ1 is vertical but ℓ2 is not. Then ℓ1 is defined by x = e/a and ℓ2 is defined by
$$y = m'x + k',$$
where m′ = −c/d and k′ = f/d. Then a ≠ 0, b = 0, and d ≠ 0, so that ad − bc = ad ≠ 0, and again
$$ad \neq bc.$$
What we are going to do is to give a sophisticated algebraic analysis of the pre-
ceding corollary from the perspective of slope. This analysis is not something most
school students can "discover" on their own. Instead, we are going to offer students
an opportunity to learn from the wisdom of the past. Absorbing what others have

to offer is one very good way for us to grow intellectually. It is pointless to try to
"discover" everything yourself; it is impossible anyway.

The solution set of a linear system: Algebra

We now begin the algebraic analysis of the preceding corollary. Let ℓ1, ℓ2 be the lines which are the graphs of the equations ax + by = e and cx + dy = f, respectively, in the linear system

ax + by = e,
cx + dy = f.
To motivate what is to come, suppose ℓ1 and ℓ2 are not vertical, so that their slopes are defined. We notice that there is one thing that distinguishes the first two cases (ℓ1 = ℓ2 and ℓ1 ∥ ℓ2) from the third (ℓ1 ≠ ℓ2 but ℓ1 ∦ ℓ2). In the first two cases, ℓ1 and ℓ2 have the same slope (Theorem 6.17 on page 363), whereas in the third case, ℓ1 and ℓ2 have different slopes. We now express this information about slope algebraically, as follows. Since ℓ1 and ℓ2 are both nonvertical, we have b ≠ 0 and d ≠ 0, so that we may rewrite the linear system as
$$\begin{cases} y = \left(-\frac{a}{b}\right)x + \frac{e}{b},\\ y = \left(-\frac{c}{d}\right)x + \frac{f}{d}. \end{cases}$$
Thus the slope of ℓ1 is −a/b, and that of ℓ2 is −c/d. The slopes are equal ⇐⇒ a/b = c/d. This is equivalent to ad = bc (the generalized cross-multiplication algorithm for rational quotients, item (b) on page 118 of Chapter 2), which in turn is equivalent to ad − bc = 0. For the same reason, the slopes of ℓ1 and ℓ2 being different is equivalent to ad − bc ≠ 0. Thus the number ad − bc captures whether the two lines of the linear system have the same slope or different slopes: it vanishes (i.e., is equal to zero) in the first case and is nonzero in the second. Such considerations suggest that the number ad − bc is an important characteristic of the linear system. Indeed it is, and it is called the determinant of the linear system, usually denoted by Δ.45 The theorem we want to prove is then the following.

Theorem 6.22. Assume a linear system
$$\begin{cases} ax + by = e,\\ cx + dy = f. \end{cases}$$
Let Δ denote the determinant of the system, Δ = ad − bc. Then:
(i) If Δ ≠ 0, the linear system has a unique solution.
(ii) If Δ = 0, the linear system has either an infinite number of solutions or no solution.
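Theorem 6.22 says when a unique solution exists but not how to compute it. The method actually taught in school, substitution, is explained in the last subsection of this section. Purely as an illustration, the Python sketch below computes Δ and, when Δ ≠ 0, the unique solution by Cramer's rule: x0 = (ed − bf)/Δ and y0 = (af − ce)/Δ. Cramer's rule is a standard formula that is not presented in this section. The function names are hypothetical.

```python
from fractions import Fraction

def determinant(a, b, c, d):
    # Delta = ad - bc for the system ax + by = e, cx + dy = f
    return a * d - b * c

def unique_solution(a, b, c, d, e, f):
    # Cramer's rule, valid only when Delta != 0 (not the method used in the text)
    delta = Fraction(determinant(a, b, c, d))
    if delta == 0:
        raise ValueError("Delta = 0: the system has no solution or infinitely many")
    x0 = (e * d - b * f) / delta
    y0 = (a * f - c * e) / delta
    return x0, y0

x0, y0 = unique_solution(2, 3, 1, -1, 7, 1)   # 2x + 3y = 7, x - y = 1
print(x0, y0)                                  # 2 1
```

Substituting back confirms the answer: 2·2 + 3·1 = 7 and 2 − 1 = 1.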

Before giving the proof, we observe that the definition of the determinant does not require any of a, b, c, and d to be nonzero. The lines ℓ1 and ℓ2 that correspond to the two equations in the system could therefore be vertical. We also wish to point out that the following proof could be presented slightly differently; see Exercise 3 on page 383.

45 Mathematical Aside: You may consider this discussion to be a review of the simplest case of linear algebra, namely, the case of dimension 2.



Proof. We handle the two cases (i) and (ii) separately.


Case (i). The determinant ad − bc is nonzero. In this case, clearly b and d are not both zero. Suppose b = 0. Then d ≠ 0, and we see that ℓ1 is vertical but ℓ2 is not vertical. Therefore ℓ1 intersects ℓ2 at exactly one point, and the linear system (according to the corollary of Theorem 6.21) has a unique solution. Similarly, if d = 0, then b ≠ 0 and again the linear system has a unique solution. If both b and d are nonzero, then the linear system can be rewritten as
$$\begin{cases} y = \left(-\frac{a}{b}\right)x + \frac{e}{b},\\ y = \left(-\frac{c}{d}\right)x + \frac{f}{d}. \end{cases}$$
The slope of ℓ1 is therefore −a/b and the slope of ℓ2 is −c/d. Since ad − bc ≠ 0, we have ad ≠ bc, and therefore a/b ≠ c/d by the cross-multiplication algorithm. Thus ℓ1 and ℓ2 have different slopes and therefore (by Theorem 6.17 on page 363) are not parallel to each other. It follows that ℓ1 intersects ℓ2 at exactly one point, and the linear system again has a unique solution. We have thus proved Case (i).

Case (ii). The determinant ad − bc is zero. We claim that in this case, either both b and d are 0 or both b and d are not 0.
To prove the claim, it suffices to show that if b = 0, then d must be 0, and if d = 0, then b = 0. Suppose b = 0. Then ad − bc = 0 implies that ad = 0. But by the definition46 of a linear equation of two variables, a and b cannot both be 0 in ax + by = e. So a ≠ 0. It follows that ad = 0 implies d = 0. Similarly, if d = 0, then also b = 0. The claim is proved.
We now examine the first possibility: b = d = 0. Then both a and c are nonzero, by the definition of a linear equation of two variables. The linear system may therefore be rewritten as
$$\begin{cases} x = \frac{e}{a},\\ x = \frac{f}{c}. \end{cases}$$
Clearly, these vertical lines are identical (and the system has an infinite number of solutions) if e/a = f/c, and they are parallel (and the system has no solution) if e/a ≠ f/c.
Next we examine the second possibility: b ≠ 0 and d ≠ 0. The linear system may therefore be rewritten as
$$\begin{cases} y = \left(-\frac{a}{b}\right)x + \frac{e}{b},\\ y = \left(-\frac{c}{d}\right)x + \frac{f}{d}. \end{cases}$$
Thus ℓ1 and ℓ2 have slopes equal to −a/b and −c/d, respectively. Now ad − bc = 0 by hypothesis, so ad = bc, and the cross-multiplication algorithm implies that a/b = c/d. This implies ℓ1 and ℓ2 have the same slope. We have to decide if they are identical or parallel. If e/b = f/d, then the equations are identical and therefore so are their graphs (the linear system therefore has an infinite number of solutions). If e/b ≠ f/d, then the lines are distinct (because, for example,
46 This is another reminder that we must take definitions seriously. Since we have defined a linear equation αx + βy = γ to be such that not both α and β are zero, it stands to reason that this part of the definition will play a critical role sooner or later.

(0, e/b) is a point on ℓ1 but not on ℓ2) and are therefore parallel (the linear system therefore has no solution). This completes the proof of Case (ii) and, therewith, the proof of Theorem 6.22.

Remark. From the proof itself, we see that the conclusion of Case (ii) can be made very precise; namely:
(i) If the determinant is 0, then either b = d = 0 or both b ≠ 0 and d ≠ 0.
(ii) In case b = d = 0, the linear system has an infinite number of solutions if e/a = f/c, and it has no solution if e/a ≠ f/c.
(iii) In case b ≠ 0 and d ≠ 0, the linear system has an infinite number of solutions if e/b = f/d, and it has no solution if e/b ≠ f/d.
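Theorem 6.22 together with this Remark amounts to a complete decision procedure. The Python sketch below (the function name `classify` is hypothetical) mirrors the proof step by step. It assumes, as the text does, that in each equation the two coefficients are not both zero. It is offered as a check on the case analysis, not as a replacement for the reasoning.

```python
from fractions import Fraction

def classify(a, b, c, d, e, f):
    """Classify the solution set of ax + by = e, cx + dy = f.

    Assumes each equation is a genuine linear equation: (a, b) != (0, 0) and (c, d) != (0, 0).
    """
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    if a * d - b * c != 0:                     # Theorem 6.22 (i)
        return "unique"
    if b == 0 and d == 0:                      # Remark (ii): vertical lines x = e/a and x = f/c
        return "infinite" if e / a == f / c else "none"
    # Remark (i): Delta = 0 now forces b != 0 and d != 0, so compare e/b with f/d
    return "infinite" if e / b == f / d else "none"

print(classify(1, 1, 3, 3, 2, 6))    # infinite: system (6.43)
print(classify(1, 1, 3, 3, 2, 5))    # none
print(classify(1, 1, 1, -1, 2, 0))   # unique
```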
However, it is imperative that you not try to memorize these conclusions. If
you understand the reasoning, then you can use it in each situation to draw the
right conclusion. We illustrate with some simple examples.
If we are given a linear system with b = d = 0, e.g.,

$$\begin{cases} 3x = \dfrac{4}{5}, \\[4pt] -x = -\dfrac{1}{5}, \end{cases}$$

then common sense dictates that you multiply the second equation by −3 to change
the system to

$$\begin{cases} 3x = \dfrac{4}{5}, \\[4pt] 3x = \dfrac{3}{5}. \end{cases}$$

Direct inspection now shows that the linear system has no solution.
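The same inspection can be done in exact arithmetic. In this small Python sketch, each equation is stored as a pair `(coefficient of x, right side)`; that layout is our own illustrative choice. The second equation is then scaled by −3.

```python
from fractions import Fraction

# 3x = 4/5 and -x = -1/5, stored as (coefficient of x, right side).
eq1 = (Fraction(3), Fraction(4, 5))
eq2 = (Fraction(-1), Fraction(-1, 5))

# Multiply the second equation by -3 so that both left sides read 3x.
eq2_scaled = (eq2[0] * -3, eq2[1] * -3)
assert eq2_scaled[0] == eq1[0]            # both left sides are now 3x

# Same left side, different right sides: no number x satisfies both.
print(eq1[1], eq2_scaled[1])              # 4/5 3/5
# Equivalently, the vertical lines x = e/a and x = f/c are different.
print(eq1[1] / eq1[0], eq2[1] / eq2[0])   # 4/15 1/5
```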
Suppose we are given

$$\begin{cases} 10.2x - 13.6y = 11.5, \\[4pt] -\dfrac{9}{4}x + 3y = -\dfrac{21}{4}. \end{cases}$$

We note that $10.2 \times 3 - (-13.6)\left(-\tfrac{9}{4}\right) = 0$, as both products are equal to 30.6, and
so we are in the situation of ad − bc = 0 but bd ≠ 0. We know from the preceding
analysis that the linear system has either no solution or an infinite number of
solutions, depending on whether the lines defined by the equations are distinct or
identical, respectively. The simplest way to find out whether the lines defined by
these equations are identical or distinct is to rewrite both equations in the form
y = mx + b and compare. Thus, multiplying the first equation by $-\tfrac{1}{13.6}$ and the
second equation by $\tfrac{1}{3}$ and solving each for y, we get

$$\begin{cases} y = \dfrac{10.2}{13.6}x - \dfrac{11.5}{13.6}, \\[4pt] y = \dfrac{3}{4}x - \dfrac{7}{4}. \end{cases}$$

Notice that in so doing, we do not need to bother with checking whether $\tfrac{10.2}{13.6}$ and
$\tfrac{3}{4}$ are equal or not, as we already know that they must be equal (because the lines
have the same slope!). The only thing to compare is whether $\tfrac{11.5}{13.6}$ and $\tfrac{7}{4}$ are equal.
Since the former is less than 1 and the latter is greater than 1, they are obviously
380 6. SYMBOLIC NOTATION AND LINEAR EQUATIONS

not equal. Hence the two lines are distinct and this system has no solutions. There-
fore the same is true of the original system as well.
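For readers who want to confirm the arithmetic of this example, here is a short Python sketch using `fractions.Fraction`. This class converts a decimal string like "10.2" into the exact fraction 51/5. The variable names follow the letters a, b, c, d, e, f of a general system ax + by = e, cx + dy = f. This is only a check of the numbers above, not a replacement for the reasoning.

```python
from fractions import Fraction

# 10.2x - 13.6y = 11.5 and -(9/4)x + 3y = -21/4, with decimals converted exactly.
a, b, e = Fraction("10.2"), Fraction("-13.6"), Fraction("11.5")
c, d, f = Fraction(-9, 4), Fraction(3), Fraction(-21, 4)

print(a * d, b * c)     # 153/5 153/5   (both equal 30.6)
print(a * d - b * c)    # 0

# Rewrite each equation as y = mx + k: m = -a/b (or -c/d), k = e/b (or f/d).
m1, k1 = -a / b, e / b
m2, k2 = -c / d, f / d
print(m1, m2)           # 3/4 3/4         (same slope, as the determinant predicts)
print(k1, k2)           # -115/136 -7/4   (different intercepts: parallel lines)
```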

Solution by substitution

Up to this point, we have not talked about the explicit algebraic method of solv-
ing a linear system taught in school classrooms. This is the well-known method of
substitution (sometimes taught in the equivalent form of the method of elimi-
nation). This method is usually taught by rote in schools. What we will do next
is to subject this method to a critical examination.
In the remainder of this section, we will explain why this method produces the
correct solution to any linear system with nonzero determinant and what it means
geometrically.

Let us first look at a simple case and then use the standard symbolic manipulations
taught in schools to get a solution. Consider

$$\begin{cases} 2x - 3y = 1, \\ 3x + 2y = -1. \end{cases} \tag{6.44}$$

Let us eliminate y. So from the second equation, we get

$$y = -\frac{3}{2}x - \frac{1}{2}.$$

Substituting this expression of y into the first equation gives

$$2x - 3\left(-\frac{3}{2}x - \frac{1}{2}\right) = 1.$$

Simplifying, we get $\tfrac{13}{2}x = -\tfrac{1}{2}$, so that

$$x = -\frac{1}{13}.$$

Substituting this value of x into $y = -\tfrac{3}{2}x - \tfrac{1}{2}$ then leads to

$$y = -\frac{5}{13}.$$

TSM now says $\left(-\tfrac{1}{13}, -\tfrac{5}{13}\right)$ is a solution of (6.44).
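For comparison, the same symbolic steps can be replayed in exact rational arithmetic. The Python sketch below is our own illustration, and the names `p`, `q`, `coef`, `rhs` are hypothetical. It writes the second equation as y = px + q, substitutes, and solves. Like the computation above, it only produces a candidate pair. By itself it does not show that the pair solves (6.44).

```python
from fractions import Fraction

# System (6.44): 2x - 3y = 1 and 3x + 2y = -1.
# From the second equation, y = -(3/2)x - 1/2; write this as y = p*x + q.
p, q = Fraction(-3, 2), Fraction(-1, 2)

# Substitute into 2x - 3y = 1:  2x - 3(px + q) = 1,  i.e.  (2 - 3p)x = 1 + 3q.
coef = 2 - 3 * p        # 13/2
rhs = 1 + 3 * q         # -1/2
x = rhs / coef
y = p * x + q
print(coef, rhs)        # 13/2 -1/2
print(x, y)             # -1/13 -5/13
```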

We pause to reflect on this method of solution. We have performed the whole
computation without knowing what x and y are beyond the fact that they are
some symbols. But in all the mathematics we have learned up to this point, we
have never talked about computing with "symbols". All we know are numbers and
geometric figures. So suppose x and y are numbers. But what numbers are they?
Could x be 7 and y be −2? Not really, because it is not true that

$$\begin{cases} 2(7) - 3(-2) = 1, \\ 3(7) + 2(-2) = -1. \end{cases}$$
Therefore the above computation could not possibly be valid for any two numbers x
and y and would make sense only if this pair (x, y) is a solution of the linear system
(6.44). At this point, we must recall the basic etiquette in the use of symbols (see
page 299): before we use any symbol, we must know what it stands for. Let us start
over and do things in a way that makes sense. The following discussion is parallel
to the discussion on the solution of a linear equation in one variable on pp. 324ff.;
the reader may wish to review the latter before proceeding further.
Suppose $(x_0, y_0)$ is a solution of the system (6.44); i.e., we assume that for an
ordered pair of numbers $(x_0, y_0)$, we have

$$\begin{cases} 2x_0 - 3y_0 = 1, \\ 3x_0 + 2y_0 = -1. \end{cases}$$

Then these are two equalities of numbers, and we can proceed to compute with
them in the usual way that we do arithmetic. From the second equation, we get

$$y_0 = -\frac{3}{2}x_0 - \frac{1}{2}.$$

Substituting this value of $y_0$ into the first equation gives

$$2x_0 - 3\left(-\frac{3}{2}x_0 - \frac{1}{2}\right) = 1.$$

Solving this linear equation in $x_0$ (as in Section 6.2 on page 322), we get $\tfrac{13}{2}x_0 = -\tfrac{1}{2}$,
so that

$$x_0 = -\frac{1}{13}.$$

Substituting this value of $x_0$ into $y_0 = -\tfrac{3}{2}x_0 - \tfrac{1}{2}$ then leads to

$$y_0 = -\frac{5}{13}.$$
Note that if we replace $x_0$ by x and $y_0$ by y, then this computation is formally
identical to the previous method of solution taught in the school classroom. The
only difference is that the second computation is the one that is mathematically
valid, because it is nothing but an ordinary computation carried out with numbers.
To summarize, what we have proved is this:

(A) If $(x_0, y_0)$ is a solution of the given linear system (6.44), then
$x_0 = -\tfrac{1}{13}$ and $y_0 = -\tfrac{5}{13}$.

Have we shown that $\left(-\tfrac{1}{13}, -\tfrac{5}{13}\right)$ is actually a solution of the given linear system?
No. For that purpose, we need to prove the following assertion, which is in fact the
converse statement of (A):

(B) If $x_0 = -\tfrac{1}{13}$ and $y_0 = -\tfrac{5}{13}$, then $(x_0, y_0)$ is a solution of the
linear system (6.44).
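A routine computation verifies that, indeed,

$$\begin{cases} 2\left(-\dfrac{1}{13}\right) - 3\left(-\dfrac{5}{13}\right) = 1, \\[6pt] 3\left(-\dfrac{1}{13}\right) + 2\left(-\dfrac{5}{13}\right) = -1. \end{cases}$$

So the ordered pair $\left(-\tfrac{1}{13}, -\tfrac{5}{13}\right)$ produced by the method of substitution is a solution
of the system (6.44), and (B) is correct.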
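Assertion (B) is a finite check, so it can also be carried out mechanically. In this sketch, `satisfies` is a hypothetical helper name. It tests both equations of (6.44) exactly. The pair (7, −2) from the earlier discussion serves as a contrast.

```python
from fractions import Fraction


def satisfies(x0, y0):
    """Return True if (x0, y0) satisfies both equations of (6.44)."""
    return 2 * x0 - 3 * y0 == 1 and 3 * x0 + 2 * y0 == -1


print(satisfies(Fraction(-1, 13), Fraction(-5, 13)))   # True:  assertion (B)
print(satisfies(7, -2))                                # False: 2(7) - 3(-2) = 20
```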
Obviously, this pragmatic answer would be of little value if it were a singular
occurrence that happens to furnish a solution for this linear system but for no others.
Such is not the case. We now give a self-contained and coherent account that shows
that the usual method of substitution is the procedural aspect of a mathematically
valid method of solution. In other words, the rote procedure may seem to make
no sense, but in fact it can be shown to make sense after all. Here is the general
explanation.
Assume a linear system

$$\begin{cases} ax + by = e, \\ cx + dy = f. \end{cases} \tag{6.45}$$

We assume that its determinant Δ = ad − bc ≠ 0. We first assume that the system
has a solution $(x_0, y_0)$. Then

$$ax_0 + by_0 = e, \tag{6.46}$$

$$cx_0 + dy_0 = f. \tag{6.47}$$
We now follow the usual computations used in the method of substitution to determine
the explicit value of $(x_0, y_0)$, as follows. If b = d = 0, then Δ = 0. Thus
Δ ≠ 0 implies that not both b and d are 0. Without loss of generality, we may
assume d ≠ 0. Then equation (6.47) implies $y_0 = -\tfrac{c}{d}x_0 + \tfrac{f}{d}$. Substitute this value
of $y_0$ into (6.46) to get

$$ax_0 + b\left(-\frac{c}{d}x_0 + \frac{f}{d}\right) = e.$$

Multiplying both sides by d and simplifying, we obtain $(ad - bc)x_0 = de - bf$, so
that

$$x_0 = \frac{de - bf}{\Delta}.$$

Substituting this value of $x_0$ into the equation $y_0 = -\tfrac{c}{d}x_0 + \tfrac{f}{d}$, we obtain

$$y_0 = -\frac{c}{d}\left(\frac{de - bf}{\Delta}\right) + \frac{f}{d} = \frac{1}{d}\left(\frac{-cde + bcf + adf - bcf}{\Delta}\right) = \frac{1}{d}\left(\frac{d(af - ce)}{\Delta}\right),$$

and therefore

$$y_0 = \frac{af - ce}{\Delta}.$$
What we have just proved is that, assuming that the system (6.45) has a solution
$(x_0, y_0)$ and that Δ ≠ 0, then necessarily

$$(x_0, y_0) = \left(\frac{de - bf}{\Delta}, \frac{af - ce}{\Delta}\right). \tag{6.48}$$
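The derivation above is, in effect, an algorithm. The Python below is a hedged sketch: the function names `solve_by_substitution` and `formula_648` are our own and do not come from the text. It carries out the substitution steps as described. When d = 0, it takes the other branch of "without loss of generality" and solves the first equation for y instead. It then compares the result with the closed form (6.48).

```python
from fractions import Fraction


def solve_by_substitution(a, b, c, d, e, f):
    """Solve ax + by = e, cx + dy = f by substitution, assuming ad - bc != 0."""
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    if a * d - b * c == 0:
        raise ValueError("determinant is 0")
    if d != 0:
        # Solve (6.47) for y: y = -(c/d)x + f/d, then substitute into (6.46).
        slope, intercept, (p, q, r) = -c / d, f / d, (a, b, e)
    else:
        # d = 0 forces b != 0 (otherwise the determinant would be 0),
        # so solve (6.46) for y instead and substitute into (6.47).
        slope, intercept, (p, q, r) = -a / b, e / b, (c, d, f)
    # p*x + q*(slope*x + intercept) = r  gives  (p + q*slope) x = r - q*intercept.
    x0 = (r - q * intercept) / (p + q * slope)
    y0 = slope * x0 + intercept
    return x0, y0


def formula_648(a, b, c, d, e, f):
    """The closed form (6.48): ((de - bf)/delta, (af - ce)/delta)."""
    delta = Fraction(a * d - b * c)
    return (d * e - b * f) / delta, (a * f - c * e) / delta


print(solve_by_substitution(2, -3, 3, 2, 1, -1))   # (Fraction(-1, 13), Fraction(-5, 13))
print(formula_648(2, -3, 3, 2, 1, -1))             # (Fraction(-1, 13), Fraction(-5, 13))
```

Both routes agree on system (6.44), as the derivation says they must whenever Δ ≠ 0.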
It is now routine to check that the $(x_0, y_0)$ in (6.48) is in fact a solution of the system
(6.45). For example, let us check that with $(x_0, y_0)$ as in (6.48), (6.46) holds:

$$\begin{aligned} ax_0 + by_0 &= a\left(\frac{de - bf}{\Delta}\right) + b\left(\frac{af - ce}{\Delta}\right) \\ &= \frac{ade - abf + abf - bce}{\Delta} \\ &= \frac{ade - bce}{\Delta} = e\left(\frac{ad - bc}{\Delta}\right). \end{aligned}$$

But by definition Δ = ad − bc, so the last term is equal to e, and we have $ax_0 + by_0 = e$,
showing that (6.46) is valid. Similarly, (6.47) also holds for the values of $(x_0, y_0)$
in (6.48) (see Exercise 1 on page 383).
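The algebra above is the actual proof, but a numerical spot check can catch slips in hand computation. In the following Python sketch, `formula_648` is a hypothetical name, and the seed and coefficient range are arbitrary choices. The sketch draws random integer systems with nonzero determinant and confirms, in exact arithmetic, that the pair (6.48) satisfies both (6.46) and (6.47). Passing such a test is evidence, not a proof, and it does not replace the computation asked for in Exercise 1.

```python
import random
from fractions import Fraction


def formula_648(a, b, c, d, e, f):
    """The pair (x0, y0) of (6.48); assumes delta = ad - bc != 0."""
    delta = Fraction(a * d - b * c)
    return (d * e - b * f) / delta, (a * f - c * e) / delta


rng = random.Random(0)          # fixed seed so the run is reproducible
checked = 0
while checked < 1000:
    a, b, c, d, e, f = (rng.randint(-9, 9) for _ in range(6))
    if a * d - b * c == 0:
        continue                # (6.48) applies only when the determinant is nonzero
    x0, y0 = formula_648(a, b, c, d, e, f)
    assert a * x0 + b * y0 == e     # equation (6.46)
    assert c * x0 + d * y0 == f     # equation (6.47)
    checked += 1
print("all", checked, "systems satisfied (6.46) and (6.47)")
```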

It remains to observe that the computations that led to (6.48) are exactly what
we do when solving simultaneous equations by substitution.

Pedagogical Comments. One may ask whether it is realistic to explain the
solution of a general linear system (6.45) in an eighth-grade classroom. There is
usually not one simple answer to a pedagogical question, but for an average eighth-grade
class, the extensive symbolic computations in this case may be a bit much.
However, it is possible to convey the same basic idea while minimizing the needed
abstraction by using a system such as (6.44) on page 380. You are strongly encouraged
to try. End of Pedagogical Comments.

Exercises 6.7.
(1) Verify by direct computation that equation (6.47) holds for the $(x_0, y_0)$ in
(6.48).
(2) Discuss the solutions of each of the following systems without actually
solving them. Give reasons.

(a) $\begin{cases} 4x - 3y = 1, \\ 9x - 7y = \tfrac{5}{3}. \end{cases}$

(b) $\begin{cases} 2.4x - 3.5y = 43, \\ -0.264x + 0.385y = -4.63. \end{cases}$

(c) $\begin{cases} 15x + 12y = -4, \\ -35x - 28y = \tfrac{28}{3}. \end{cases}$
(3) (a) Give a direct proof of (∗) below by using Lemma 6.13 on page 356 and
Theorem 6.21 on page 374 without invoking Theorem 6.22 (on page 377).
(∗) Assume the linear system (where m, m′, k, and k′ are constants)

$$\begin{cases} mx + y = k, \\ m'x + y = k'. \end{cases}$$

(i) If m ≠ m′, it has a unique solution, and (ii) if m = m′, it has
either an infinite number of solutions when k = k′ or no solution
when k ≠ k′.
(b) Prove that (∗) implies Theorem 6.22.
(4) If 3 is added to the numerator of a fraction and 7 subtracted from the
denominator, its value is $\tfrac{6}{7}$. But if 1 is subtracted from the numerator
and 7 added to the denominator, its value is $\tfrac{2}{5}$. Find the fraction.
(5) If the digits of a two-digit number are reversed, the new number is 9 less
than the original. The sum of the digits is also 9. (a) If x is the tens digit
and y the ones digit, write down equations satisfied by x and y. (b) What
is this number?
(6) If the digits of a two-digit number are reversed, the new number is 1 less
than twice the original number. Furthermore, if 10 times the tens digit is
divided by the ones digit, the quotient is 4 and the remainder is 2. (a) If
x is the tens digit and y the ones digit, write down the equations satisfied
by x and y. (b) What is this number?
(7) Let x be a number not equal to −4 or 3. Express $\dfrac{-5(x+1)}{3(x^2 + x - 12)}$ as a sum of
(constant) multiples of $\dfrac{1}{x+4}$ and $\dfrac{1}{x-3}$.
(8) Suppose you are teaching a ninth-grade algebra class and you want to show
your students how to solve the following linear system by the method of
substitution for the first time:

$$\begin{cases} 2x - y = -1, \\ x + 2y = 7. \end{cases}$$

Carefully explain how you would teach it.
(9) Let s and t be positive integers with s < t. (a) Solve for u and v in the
linear system

$$\begin{cases} u + v = \dfrac{t}{s}, \\[4pt] u - v = \dfrac{s}{t}. \end{cases}$$

(b) Show that u > 0 and v > 0. (c) If the solutions u and v in terms
of s and t are written in the form $u = \tfrac{c}{b}$ and $v = \tfrac{a}{b}$, where a, b, c
are positive integers expressed in terms of s and t, show that {a, b, c}
forms a Pythagorean triple; i.e., a, b, and c are positive integers and
$a^2 + b^2 = c^2$. (Note: See page 290. This way of generating Pythagorean
triples dates back to the Babylonians circa 1800 BC. See [Robson] and
also Chapter 1 of [Katz].)
(10) In each of the following, you are asked to solve the linear system in the
preceding exercise with the given values of s and t to obtain Pythagorean
triples. You may use a scientific calculator.
(a) s = 1, t = 2. (b) s = 2, t = 3. (c) s = 2, t = 69.
(d) s = 54, t = 125. (e) s = 8, t = 9907.
(11) A Pythagorean triple is said to be primitive if there is no (positive)
common divisor among the triple of positive integers other than 1. Prove
that the following are equivalent for a Pythagorean triple {a, b, c}:
(i) {a, b, c} is primitive.
(ii) {a, b} are relatively prime.
(iii) {a, c} are relatively prime.
(iv) {b, c} are relatively prime.
(12) Let {a, b, c} be a Pythagorean triple so that $a^2 + b^2 = c^2$. If {a, b, c} is
primitive, prove that one of a and b is even and the other is odd.
(13) If s and t are relatively prime positive integers so that one is even and
the other is odd, then prove that the Pythagorean triples produced in
Exercise 9 are primitive.
(14) Let {a, b, c} be a Pythagorean triple. Prove that the following are equiv-
alent:
(i) {a, b, c} is a primitive Pythagorean triple with a odd and b even.
(ii) There is a pair of relatively prime positive integers s and t, with s < t
and one of them even and the other odd, so that $a = t^2 - s^2$, $b = 2st$,
and $c = t^2 + s^2$.
Glossary of Symbols

ℕ : the whole numbers, 6
ℚ : the rational numbers, 90
ℝ : the real numbers, 6
ℤ : the integers, 91
⟺ : is equivalent to, 22
⟹ : implies, 76
$a^n$ : the product $a a \cdots a$ ($n$ factors) for a number a and a positive integer n, 14
n! : n factorial for a whole number n, 42
$\binom{n}{k}$ : binomial coefficient defined by $\frac{n!}{k!(n-k)!}$, 42
n | m : n divides m for integers m and n (n ≠ 0), 138
n ∤ m : n does not divide m for integers m and n (n ≠ 0), 138
GCD(a, b) : the greatest common divisor of integers a and b, 138
$B^{-1}$ : multiplicative inverse of a fraction B, 56
p∗ : mirror reflection of a number p, 90
$\vec{x}$ : the vector from the origin 0 of a number line to the fraction x on this
number line, 101
x · y : product of the numbers x and y, 37
|x| : absolute value of a number x, 125
√α : the positive square root of a positive number α, 148
[a, b] : the segment or the closed interval from a to b for numbers a < b, 5
(a, b) : the open interval from a to b for numbers a < b (it could also mean
the point (a, b) in the coordinate plane), 126
< : less than, 12
≤ : less than or equal to, 13
> : greater than, 13
≥ : greater than or equal to, 13
AB : the segment joining A to B, 169
dist(A, B) : the distance between two points A and B in the plane, 184
A ∗ C ∗ B : the point C is between points A and B, 167
|PQ| : the length of segment PQ, 185
$L_{PQ}$ : the line joining P to Q, 166
H+, H− : half-planes of a line L, 176
$R_{OP}$ : the ray from O to P, 174
$\overrightarrow{AB}$ : the vector from A to B in the plane, 219
∥ : is parallel to, 165
⊥ : is perpendicular to, 192
∠AOB : an angle with vertex O and sides $R_{OA}$ and $R_{OB}$, 182
△ABC : triangle ABC, 245
∠A : the angle of a triangle or a polygon at a vertex A, 182
x° : x degrees (of an angle), where x is a positive real number, 188
≅ : is congruent to, 240
∼ : is similar to, 284
∈ : belongs to (as in a ∈ A), 9
A ⊂ B : A is contained in B, 91
∪ : union (of sets), 195
∩ : intersection (of sets), 165
ℝ² : the coordinate plane, 332
(x, y) : the coordinates of a point in the plane (it could also mean the open
interval from the number x to the number y), 333
H+, H− : half-planes of a line L, 176
$\overset{\frown}{AB}$ : one of two arcs joining the point A to the point B on a circle, 190
Bibliography

[Arbaugh et al.] F. Arbaugh, M. Smith, J. Boyle, and M. Steele, We Reason & We Prove for All
Mathematics, Corwin, Thousand Oaks, CA, 2018.
[Ball] D. L. Ball, The mathematical understandings that prospective teachers bring to teacher
education, Elementary School Journal 90 (1990), 449–466.
[Ball-McDiarmid] D. L. Ball and G. W. McDiarmid, The subject matter preparation of teachers.
In W. R. Houston (ed.), Handbook of Research on Teacher Education, Macmillan, New
York, NY, 1990, 437–449.
[Barnett-Goldstein-Jackson] C. Barnett, D. Goldstein, and B. Jackson, Fractions, Decimals, Ra-
tios, & Percents, Heinemann, Portsmouth, NH, 1994.
[Bashmakova-Smirnova] I. Bashmakova and G. Smirnova, Beginning & Evolution of Algebra,
Mathematical Association of America, Washington, DC, 2000.
[Beckmann-Izsák] S. Beckmann and A. Izsák, Why is slope hard to teach?, September 1, 2014.
Retrieved from http://tinyurl.com/j3k5q7r
[Begle1972] E. G. Begle, Teacher knowledge and student achievement in algebra, SMSG Reports,
No. 9, 1972, https://eric.ed.gov/?id=ED064175
[Behr et al.] M. Behr, G. Harel, T. Post, and R. Lesh, Rational numbers, ratio, and proportion.
In D. A. Grouws, editor. Handbook of Research on Mathematics Teaching, Macmillan,
New York, 1992, 296–333.
[Carpenter et al.] T. P. Carpenter, M. L. Franke, and L. Levi. Thinking mathematically: Integrat-
ing algebra and arithmetic in elementary school, Heinemann, Portsmouth, NH, 2003.
[CCPublishers1] K-8 Publishers’ Criteria for the Common Core State Standards for Mathematics
(2012). Retrieved from http://tinyurl.com/bpgx8ed
[CCPublishers2] High School Publishers’ Criteria for the Common Core State Standards for
Mathematics (2013). Retrieved from https://tinyurl.com/y8upgtcr
[CCSSM] Common Core State Standards for Mathematics (2010). Retrieved from http://www.corestandards.org/Math/
[Courant-Robbins] R. Courant and H. Robbins, What Is Mathematics?, Oxford University Press,
New York, 1941. MR0005358
[Davis-Pearn] G. E. Davis and C. A. Pearn, Division of Fractions, Republic of Mathematics
Publications, 2009, http://tinyurl.com/h85qwb8
[DeTurck] D. DeTurck, Down with Fractions!, September 22, 2002. Retrieved from https://www.youtube.com/watch?v=AKYZhdbnOWM
[Education Week] Education Week, Researcher Isolates Common-Core Math Implementation
Problems, November 14, 2014. Retrieved from: https://tinyurl.com/y7hme29c
[Ellis-Bieda-Knuth] A. B. Ellis, K. Bieda, and E. Knuth, Essential Understanding of Proof and
Proving, National Council of Teachers of Mathematics, Reston, VA, 2012.
[EngageNY] Grade 8 Mathematics Module 4: Teacher Materials. http://tinyurl.com/h2er4qh
[Euclid1] Euclid, The Thirteen Books of the Elements, Volume 1. T. L. Heath, transl., Dover
Publications, New York, NY, 1956.
[Euclid2] Euclid, The Thirteen Books of the Elements, Volume 2. T. L. Heath, transl., Dover
Publications, New York, NY, 1956.
[Eureka] Common Core’s Eureka Math — Grade 8, https://greatminds.org/resources/
[Gibson] G. A. Gibson, Common-Sense Methods in Arithmetic and Algebra, The School World,
No. 97, January 1907. Retrieved from http://tinyurl.com/zemxm5q
[Ginsburg] J. Ginsburg, On the Early History of the Decimal Point, Amer. Math. Monthly 35
(1928), no. 7, 347–349, DOI 10.2307/2298362. MR1521514

[Goldhaber-Brewer] D. D. Goldhaber and D. J. Brewer, Does teacher certification matter? High
school certification status and student achievement, Educational Evaluation and Policy
Analysis 22 (2000), 129–146.
[Greenberg] M. J. Greenberg, Euclidean and Non-Euclidean Geometries, 4th ed., Freeman, New
York, NY, 2008.
[Hartshorne] R. Hartshorne, Geometry: Euclid and Beyond, Springer, New York-Berlin-
Heidelberg, 1997.
[Henle] M. Henle, A combinatorial introduction to topology, A Series of Books in Mathematical
Sciences, W. H. Freeman and Co., San Francisco, Calif., 1979. MR550879
[Henriksen] M. Henriksen, What is Gödel’s Theorem?, Scientific American. Retrieved from
http://tinyurl.com/n6pgmr8
[Herstein] I. N. Herstein, Topics in algebra, 2nd ed., Xerox College Publishing, Lexington, Mass.-
Toronto, Ont., 1975. MR0356988
[Hilbert] D. Hilbert, Foundations of Geometry, E. J. Townsend, transl., Open Court Pub. Co.,
La Salle, IL, 1950, http://www.gutenberg.org/files/17384/17384-pdf.pdf
[Jacobson] N. Jacobson, Basic algebra. I, W. H. Freeman and Co., San Francisco, Calif., 1974.
MR0356989
[Karzarinoff] N. D. Kazarinoff, Analytic inequalities, Holt, Rinehart, and Winston, New York,
1961. MR0260957
[Katz] V. J. Katz, A History of Mathematics, 3rd ed., Addison-Wesley, Reading, MA, 2008.
[Keeghan] A. Keeghan, Afraid of Your Child’s Math Textbook? You Should Be, blog posted on
February 17, 2012. Retrieved from https://tinyurl.com/yxo2a97h
[KhanAcad] Khan Academy, What is a variable?, http://tinyurl.com/cxs3zmc
[Loewus1] L. H. Loewus, Gates Chief Acknowledges Common-Core Missteps, Education Week,
May 23, 2016. Retrieved from: https://tinyurl.com/yd5juay5
[Loewus2] L. H. Loewus, Study: Improving Teachers’ Math Knowledge Doesn’t Boost Student
Scores, Education Week, September 28, 2016. Retrieved from https://tinyurl.com/zh6hhez
[McCrory et al.] R. McCrory, R. Floden, J. Ferrini-Mundy, M. D. Reckase, and S. L. Senk, Knowl-
edge of Algebra for Teaching: A Framework of Knowledge and Practices, J. Research
Math. Educ. 43 (2012), 584–615.
[McNeil et al.] N. McNeil, L. Grandau, E. Knuth, M. Alibali, A. Stephens, S. Hattikudur, and
D. Krill, Middle-school students’ understanding of the equal sign: The books they read
can’t help, Cognition and Instruction 24(3) (2006), 367–385, https://cladlab.nd.edu/assets/250420/mcneiletal06.pdf
[MET2] The Mathematical Education of Teachers II, Conference Board of the Mathematical
Sciences, Amer. Math. Soc., Providence, RI, 2012, https://www.cbmsweb.org/archive/MET2/met2.pdf
[Meyer] Dan Meyer, The Math I Learned After I Thought I Had Already Learned Math, August
11, 2015. Retrieved from http://tinyurl.com/n98xgeb
[Milgram-Wurman] R. J. Milgram and Z. Wurman, Missing, delayed, or muddled topics in Common
Core’s Math Standards. Retrieved from https://static.ark.org/eeuploads/lt-gov/Errors_and_Omissions_In_CC_Math_Standards_Milgram-Wurman.pdf
[Moise-Downs] E. E. Moise and F. L. Downs, Geometry, Addison-Wesley, Reading, MA, 1964.
[Monk] D. H. Monk, Subject area preparation of secondary mathematics and science teachers,
Economics of Education Review 13 (1994), 125–145.
[MUST] Mathematical Understanding for Secondary Teaching, M. K. Heid, P. S. Wilson, and
G. W. Blume (eds.), National Council of Teachers of Mathematics, IAP, Charlotte, NC,
2015.
[Nagle-Moore-Russo] C. Nagle and D. Moore-Russo, Slope Across the Curriculum: Principles and
Standards for School Mathematics and Common Core State Standards, The Mathematics
Educator 23 (2014), 40–59. Retrieved from https://files.eric.ed.gov/fulltext/EJ1027058.pdf
[NCTM1989] National Council of Teachers of Mathematics, Curriculum and Evaluation Stan-
dards for School Mathematics, NCTM, Reston, VA, 1989.
[NCTM2009] National Council of Teachers of Mathematics, Focus in High School Mathematics:
Reasoning and Sense Making, NCTM, Reston, VA, 2009.
[Newton-Poon1] X. A. Newton and R. C. Poon, Mathematical Content Understanding for Teaching:
A Study of Undergraduate STEM Majors, Creative Education 6 (2015), 998–1031,
http://tinyurl.com/glwzmhd
[Newton-Poon2] X. A. Newton and R. C. Poon, Pre-service STEM majors’ understanding of slope
according to common core mathematics standards: An exploratory study, Global Journal
of Human Social Science Research 15 (2015), 27–42.
[NMAP1] Foundations for Success: Final Report, The National Mathematics Advisory Panel,
U.S. Department of Education, Washington, DC, 2008, http://tinyurl.com/yopzor
[NMAP2] Foundations for Success: Reports of the Task Groups and Sub-Committees, The Na-
tional Mathematics Advisory Panel, U.S. Department of Education, Washington, DC,
2008, http://tinyurl.com/gs83pk7
[Osler] T. J. Osler, An easy look at the cubic formula, https://tinyurl.com/y5k4dtz8
[Paquette] D. Paquette, Farmworker vs robot, The Washington Post, February 17, 2019. Retrieved
from http://tinyurl.com/yyqvlcpd
[Pearson] Geometry Common Core, Pearson, Boston, MA, 2012.
[PG] V. Postelnicu and C. Greenes, Do teachers know what their students know?, National
Council of Supervisors of Mathematics Newsletter 42 (3) (2012), 14–15.
[Phelps-Milgram] R. P. Phelps and R. J. Milgram, The revenge of K–12: How Common Core and
the new SAT lower college standards in the U.S., Pioneer Institute White Paper No. 122,
September 2014. Retrieved from https://tinyurl.com/uh9n3az.
[Pólya-Szegö] G. Pólya and G. Szegö, Problems and Theorems in Analysis, Volume I and Volume
II, Springer-Verlag, Berlin and New York, 1972 and 1976.
[PSSM] National Council of Teachers of Mathematics, Principles and Standards for School Math-
ematics, NCTM, Reston, VA, 2000.
[Quinn] F. Quinn, A revolution in mathematics? What really happened a century ago and why it
matters today, Notices Amer. Math. Soc. 59 (2012), no. 1, 31–37, DOI 10.1090/noti787.
MR2908157
[RAGS] Rising Above the Gathering Storm, Revisited: Rapidly approaching Category 5, The
National Academies Press, Washington, DC, 2010. Available at http://tinyurl.com/
zstra8r
[Robson] E. Robson, Neither Sherlock Holmes nor Babylon: a reassessment of Plimpton 322
(English, with English and German summaries), Historia Math. 28 (2001), no. 3, 167–
206, DOI 10.1006/hmat.2001.2317. MR1849797
[Sawchuk] S. Sawchuk, Are Teachers Getting the Right Kind of Common-Core PD?, Education
Week, Feb. 1, 2016. Retrieved from: https://tinyurl.com/y8238pzj
[Schoenfeld1988] A. H. Schoenfeld, When good teaching leads to bad results: The disasters of
"well-taught" mathematics courses, Educational Psychologist 23 (1988), 145–166, https://tinyurl.com/y4bd68ym
[Schoenfeld1994] A. H. Schoenfeld, What do we know about Mathematics Curricula?, Journal of
Mathematical Behavior 13 (1994), 55–80.
[Serra] M. Serra, Discovering Geometry, 2nd ed., Key Curriculum Press, Oakland, CA, 1997.
[Shulman] L. Shulman, Those who understand: Knowledge growth in teaching, Educational Re-
searcher 15 (1986), 4–14, https://tinyurl.com/yy7k2x7s
[Siegler et al.] R. Siegler, T. Carpenter, F. Fennell, D. Geary, J. Lewis, Y. Okamoto, L. Thompson,
and J. Wray (2010), Developing effective fractions instruction for kindergarten
through 8th grade: A practice guide (NCEE #2010-4039), Washington, DC: Institute
of Education Sciences, U.S. Department of Education, https://ies.ed.gov/ncee/wwc/PracticeGuide/15
[Stump] S. Stump, Secondary mathematics teachers’ knowledge of slope, Mathematics Education
Research Journal 11 (1999), 124–144.
[WikiAlphaGo] Wikipedia, AlphaGo. Retrieved from https://en.wikipedia.org/wiki/AlphaGo
[WikiFermat] Wikipedia, Fermat’s Last Theorem. Retrieved from https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem
[Wu1998] H. Wu, Teaching fractions in elementary school: A manual for teachers. Retrieved from
https://math.berkeley.edu/~wu/fractions1998.pdf
[Wu2002] H. Wu, Chapter 2: Fractions (Draft), June 20, 2001; revised September 3, 2002. Re-
trieved from http://math.berkeley.edu/~wu/EMI2a.pdf
[Wu2004a] H. Wu, Geometry: Our Cultural Heritage, Notices Amer. Math. Soc. 51 (2004), 529–
537, https://www.ams.org/notices/200405/rev-wu.pdf
[Wu2004b] H. Wu, "Order of operations" and other oddities in school mathematics, June 1, 2004.
Retrieved from http://math.berkeley.edu/~wu/order5.pdf
[Wu2006] H. Wu, How mathematicians can contribute to K–12 mathematics education, Proceedings
of International Congress of Mathematicians, 2006, III, European Mathematical Society,
Madrid, 2006, Zürich, 2006, 1676–1688, http://math.berkeley.edu/~wu/ICMtalk.pdf
[Wu2008] H. Wu, Fractions, decimals, and rational numbers, February 29, 2008. Retrieved from
https://math.berkeley.edu/~wu/NMPfractions.pdf
[Wu2009] H. Wu, What’s sophisticated about elementary mathematics?, American Educator, Vol.
33, No. 3, Fall 2009, 4–14, https://math.berkeley.edu/~wu/wu2009.pdf
[Wu2010a] H. Wu, Pre-Algebra, April 21, 2010.⁴⁷ Retrieved from http://math.berkeley.edu/~wu/Pre-Algebra.pdf
[Wu2010b] H. Wu, Introduction to School Algebra, August 14, 2010. Retrieved from https://math.berkeley.edu/~wu/Algebrasummary.pdf
[Wu2011a] H. Wu, Understanding Numbers in Elementary School Mathematics,
Amer. Math. Soc., Providence, RI, 2011, https://bookstore.ams.org/mbk-79/
[Wu2011b] H. Wu, The Mis-Education of Mathematics Teachers, Notices Amer. Math. Soc. 58
(2011), 372–384, https://math.berkeley.edu/~wu/NoticesAMS2011.pdf
[Wu2012] H. Wu, Teaching Geometry According to the Common Core Standards, January 1,
2012 (third revision: October 10, 2013). Retrieved from https://math.berkeley.edu/~wu/Progressions_Geometry.pdf
[Wu2013] H. Wu, Teaching Geometry in Grade 8 and High School According to the Common
Core Standards, October 13, 2013, https://math.berkeley.edu/~wu/CCSS-Geometry_1.pdf
[Wu2014] H. Wu, Potential impact of the Common Core Mathematics Standards on the American
Curriculum. In Mathematics Curriculum in School Education. Y. Li and G. Lappan
(eds.), Springer, Dordrecht, 2014, pp. 119–142, https://math.berkeley.edu/~wu/Common_Core_on_Curriculum_1.pdf
[Wu2015] H. Wu, Mathematical education of teachers, Part II: What are we doing about Textbook
School Mathematics?, AMS Blogs, March 1, 2015, https://tinyurl.com/y46wnahl
[Wu2016a] H. Wu, Teaching School Mathematics: Pre-Algebra, Amer. Math. Soc., Providence, RI,
2016, https://bookstore.ams.org/mbk-98/. Its Index is available at: http://tinyurl.com/zjugvl4
[Wu2016b] H. Wu, Teaching School Mathematics: Algebra, Amer. Math. Soc., Providence, RI,
2016, https://bookstore.ams.org/mbk-99/. Its Index is available at: http://tinyurl.com/haho2v6
[Wu2018a] H. Wu, The content knowledge mathematics teachers need. In Mathematics Matters
in Education—Essays in Honor of Roger E. Howe, Y. Li, J. Lewis, and J. Madden
(eds.), Springer, Dordrecht, 2018, pp. 43–91. Also https://math.berkeley.edu/~wu/Contentknowledge1A.pdf
[Wu2018b] H. Wu, From arithmetic to algebra, Part 1 and Part 2, December 20, 2018. Retrieved
from https://math.berkeley.edu/~wu/Arithmetic-to-Algebra2019.pdf
[Wu2020b] H. Wu, Algebra and Geometry, Amer. Math. Soc., Providence, RI, 2020.
[Wu2020c] H. Wu, Pre-Calculus, Calculus, and Beyond, Amer. Math. Soc., Providence, RI, 2020.

47 This is referenced in [CCSSM], page 92, as "Wu, H., Lecture Notes for the 2009 Pre-Algebra
Institute".
Index

AA criterion, 288
above (a horizontal line), 334
absolute error, 130
absolute value, 121, 125, 125–126
  rationale for, 130–131
Ackerman, Nate, 216
acute angle, 191
acute triangle, 191
addition algorithm for (finite) decimals, 34
addition of fractions, 32–36
addition of rational numbers, 92–96
  using vectors, 102–104
addition of vectors (on number line), 101
additivity property of length, 98
adjacent
  angles (with respect to an angle), 186, 186–188
  sides, 171
  vertices, 171
al-Khwarizmi, 301
algorithm, 139
alternate interior angles, 276, 277, 281
analytic geometry, 336
angle, 182
  acute, 191
  convention about, 182
  full, 183
  interior (of a polygon), 197
  obtuse, 191
  right, 191
  straight, 183
  zero, 183
angle bisector, 192
angle-angle criterion (= AA criterion), 288
angles
  alternate interior, 276
  corresponding, 276
  opposite, 276
Arbaugh, Fran, xviii
arc, 190
  intercepted by an angle on a circle, 190
  length of, 190, 197
  major, 190
  minor, 190
arctan function, 207
area, 17, 47
  of rectangle, 48
arithmetic mean, 132
arithmetic mean-geometric mean inequality, 127, 132
ASA, 51, 245
associative law, 2, 44
  for addition, 87
  failure for subtraction, 96, 98
  for addition of fractions, 33
  for addition of rational numbers, 91–94
  for multiplication, 87
  for multiplication of fractions, 46
  for multiplication of rational numbers, 105
  for multiplication of real numbers, 133
  for number expressions, 302, 304, 312, 323
  makes no sense for variables, 325
assumptions
  about basic isometries, 237
  about reflections, 231
  about rotations, 217
  about translations, 236
  for geometry, 165–176, 184–188, 237, 250
  on addition of rational numbers, 92
  on multiplication of rational numbers, 105
auxiliary lines, 264
average rate, 81–82
  lawn-mowing, 82
average speed (over an interval), 79
  pitfall of the terminology, 79
  relation with constant speed, 79
axiomatic system, xxxiv, 159–161
Babylonians, 290, 384
basic etiquette in the use of symbols, 299
basic isometries, 161–163, 199, 200, 216, 217, 229, 231, 237
  assumption about, 237
below (a horizontal line), 334
between (a point between two points), 167
between (two points on a number line), 13
betweenness, 166, 167
bigger than
  among numbers, 13
bijection, 205, 333
bijective, 205
bilateral symmetry, 230
binomial coefficient, 42
bipolar mathematics education, xxxvii
bisect, 192
bisect each other (two segments), 228, 260
bisector
  angle, 192
  perpendicular, 192
boundary, 194
  of disk, 198
boundary point, 194, 194
bounded, 194
cancellation law, 20
  generalized, 118
cancellation rule
  for fractions, 47
  for rational quotients, 118
Cataldi, Pietro, 308
CCSSM, xvi–xxvii, 5–362
center
  of a rotation, 202
  of a circle, 186
  of a dilation, 268
circle, 186
  center of, 186
  exterior of, 186
  inside, 186
  of radius r, 186
  unit, 186
clockwise rotation, 202
  pedagogical issues with its definition, 204
clockwise rotation around O, 215
clockwise rotation around any point, 215
closed bounded interval, 5, 169
closed disk, 186
  confused with circle, 193
closed half-plane, 176
closed interval, 126
closed set, 195
  inside a circle, 195
closed under composition, 240
coefficient, 303, 312
coherence, xiii, 318
  importance of, xxxii
collecting like terms, 311
collinear (points), 169
Collins, David, 68
Common Core State Standards for Mathematics (= CCSSM), xvi
common denominator, 23, 33
common divisor (of two integers), 138
commutative (in composition of transformations), 209, 371
commutative law, 2
  for addition, 87
  for addition of fractions, 33
  for addition of rational numbers, 91–94
  for multiplication, 87
  for multiplication of fractions, 46
for multiplication of rational convex (geometric figure), 172,
numbers, 105 181–182
for multiplication of real numbers, convex polygon, 197
133 coordinate axis, 331
for number expressions, 302, 304, coordinate system, 331
312, 323 coordinate axes of, 331
makes no sense for variables, 325 four quadrants of, 335
commutative ring, 133 origin of, 331
comparing fractions, 38, 52 scaled, 361
comparing rational numbers, setting up, 332
121–125 coordinates (of a point), 332
complement (of a geometric figure), coordinates in the plane, 331
181 introduction of, 331
complete expanded form (of x-axis, 331
decimal), 35 y-axis, 331
complex fraction, 68 copies (of a fraction), 12
corresponding angles, 276, 277, 281
denominator of, 68
corresponding sides
numerator of, 68
proportional, 287
complex fractions, 72–74
counterclockwise rotation, 201
basic formulas, 70–71 pedagogical issues with its
composite, 148 definition, 204
composite transformation, 208 properties of, 201
composition of transformations, 208 counterclockwise rotation around O,
concatenation, 12 212–215
congruence, 17, 240 counterclockwise rotation around
closed under composition, 240 any point, 215
equivalence relation, 241 cross-multiplication algorithm, 22,
reflexive relation, 241 134
students’ confusion in TSM, mistaken for rote skill, 23
158–159 cross-multiplication inequality, 32,
symmetric relation, 240 36, 118
transitive relation, 241 crossbar axiom, 250, 251
congruent to, 240 cubic polynomial, 313
symbol for, 240, 245
data points, 272, 274
connected (region), 195
decagon, 171
constant, 301
decimal digits, 14
term, 312 decimal fraction, 14
constant rate, 76, 81–82 decimal point, 14
house-painting, 82, 86 decimals, 4, 14
lawn-mowing, 81 addition algorithm for, 34
water flow, 82, 84 finite, 14
constant speed, 77, 78 infinite, 14
importance of definition, 81 multiplication algorithm for, 51
constant transformation, 200 subtraction algorithm for, 40
converse, 22 terminating, 14
conversion of fractions to decimals definitions, xiii
by long division, 62–65 absence of, in TSM, xxviii
importance of, xxviii–xxix, 75 relation with length of a segment,
the role of, xxix–xxx 184
degree (of a polynomial), 312, 316 unit, 7
degree (of a rotation), 202 distance formula, 336
negative, 203 distinct lines, 165
degree (of an angle), 190, 191 distinct rays, 175
preserve, 203, 231, 236, 237 distributive law, 2, 88
denominator for fractions, 46, 59
of a complex fraction, 68 for number expressions, 302, 304,
of a fraction, 10 312, 323
of a rational quotient, 118 for rational numbers, 105
for real numbers, 133
derivative, 310
makes no sense for variables, 325
Desargues, Girard, 308
divide (one integer by another), 138
Descartes, René, 301, 308, 336
division as multiplication, 47, 57, 115
determinant, 377
division interpretation of a fraction,
and solutions of a linear system, 28, 30
377 division of decimals, 60
nonvanishing, 377 division of fractions, 57
vanishing, 377 by a whole number, 46
diagonal, 171 relation to division of whole
difference (of two fractions), 41 numbers, 55, 58
difference quotient (of two points), division of rational numbers, 112–115
347 division of whole numbers, 29, 58
different sides division-with-remainder, 15, 137, 139
of a line in the plane, 176 dividend of, 140
of a point on a line, 174 divisor of, 140
dilation, 268 quotient of, 140
basic properties of, 269–271 remainder of, 140
center of, 268 divisor (of an integer), 138
effect on lengths and degrees, 275 divisor of zero, 133
scale factor of, 268 double inequality, 126
direction (of a vector on number associated, 126
line), 100
edge (of polygon), 170, 171
disjoint (sets), 173
elimination, method of, 380
disjoint union
empty set, 352
of sets in a line, 173
endpoint, 6, 126, 169
of sets in the plane, 175 left, 6
disk, 186 of vector, 100, 219
closed, 186 right, 6
open, 186 equal angles, 244
unit, 186 equal expressions, 303
distance, 6–7, 125 equal fractions, 12
between parallel lines, 228 equal ordered pairs of numbers, 332
between two points, 6, 184 equal parts (of a segment), 9
function, 185 equal segments, 244
of a point to a line, 228 equal sets, 141
preserve, 200 equal transformations, 208, 208
equality of two sets, 141 factorization (of an integer), 138
equation, 322 FASM, xix, 3, 76, 86, 107, 121, 127,
in one variable, 322 133, 258, 304, 316, 347–348, 350
in two variables, 323 Fermat, Pierre, 308, 336
linear, 323 FFFP, 23, 23–24, 33–34, 36, 39
quadratic, 322 field, 133
solution of, 322 finite decimals, 14
solve, 323 division of, 60
equation in two variables, 323 five fundamental principles of
solve, 323 mathematics, xiii, xxxiii
equation of a line, 354 FOIL, 314
slope-intercept form, 356 folding transformation, 207
two-point form, 357 fraction, 10
equidistant complex, 68
from 0, 90 convert to a decimal by long
points, 6 division, 62–65
equidistant parallel lines, 258 convert to a finite decimal, 60
equilateral triangle, 193 copies of, 12
equivalence of two theorems, 257 denominator of, 10
equivalence relation, 241 division interpretation of, 28, 30
equivalent equations, 328 in lowest terms, 30, 138
equivalent expressions, 303 numerator of, 10
equivalent fractions, 12, 20–21, 25, proper, 36
35, 47, 61 reduced, 138
equivalent to, 22 reduced form of, 138
Euclid, xlii, 137, 155, 160, 161, 244 TSM concept of, 4–5
proof of infinity of primes, 155 unit, 10
Euclidean algorithm, 140, 143–144 fraction addition, 33
Euler, Leonhard, 308
formula for, 33
existence, 56, 112, 114, 139, 150,
fraction division, 57
155, 166, 197, 216, 218, 222,
formula for, 57
224, 225, 236, 284, 322
fraction multiplication, 45
expression, 302
formula for, 46
coefficients in, 303
fraction subtraction, 38
notational conventions for, 302
formula for, 39
number, 302
fractional multiple, 56
order of operations in, 303
fractions, 10
expressions
addition of, 33
equal, 303
equivalent, 303 cancellation rule for, 47
extension (of a concept), 29, 58, 92, common denominator of, 23, 33
96 division of, 57
exterior of a circle, 186, 194 equal, 12
equal to finite decimals, 152
factor (of an integer), 138 equivalent, 12, 20–21
factorial, 42 high school teachers’ need for, xxiv
factorization, 314 multiplication of, 26, 45
potential harmful effects of order among, 12
overemphasis, 315 reducing, 20
research on the teaching of, Hald, Ole, 67, 86, 156, 198, 276
xxi–xxii half-line, 173, 253, 359
subtraction of, 38 half-plane, 176, 253
sum of, 33 closed, 176
Francis, Larry, 246, 291 left, 334
FTS, 51, 256, 263 lower, 334
FTS*, 257 right, 334
full angle, 183 upper, 334
fundamental assumption of school half-planes
mathematics (= FASM), 3 opposite, 176
fundamental assumption of school harmonic mean, 72
mathematics (= FASM), 133 heptagon, 171
fundamental fact of fraction-pairs Heron’s formula, 296
(= FFFP), 23 hexagon, 170, 171
fundamental principles of HL (criterion for triangle
mathematics, xiii, xxxix, 2, 75, congruence), 293
160, 178 horizontal line, 332
fundamental theorem of arithmetic, above, 334
149 below, 334
fundamental theorem of similarity slope of, 347
(= FTS), 256 house-painting, 82, 86
hypotenuse (of a right triangle), 290
Gardiner, Tony, 198 hypotenuse-leg (= HL), 293
GCD, 138
generality and abstraction, 299, identity, 303–304
305–310 identity transformation, 200
geometric figure, 47 if and only if, 22
bounded, 194 image (of a transformation), 205
data points on, 272, 274 inequalities, 121
paved by other geometric figures, about absolute value, 127–130
47 about rational numbers, 121–125
rectilinear, 269 inequality, 12
unbounded, 194 arithmetic and geometric means,
geometric mean, 132 127, 132
geometric series, 309 double, 126
finite, 309 triangle, 129
geometry curriculum infinite decimals, 14
issues with, 157–164 infinity of primes, 155
Gödel’s incompleteness theorem, 169 injection, 205
Goldbach conjecture, 147 injective, 205
graph of a linear equation in two inscribed (in a circle), 193
variables, 353–354 inside
graph of an equation in two of a circle, 195
variables, 352 of a polygon, 196
greater than inside a circle, 186
among numbers, 13, 91 integers, 91
greatest common divisor (= GCD), integral domain, 133
40, 138 integral linear combination, 140
group, 211, 235, 240, 286 interior angle of a polygon, 197
intermediate value theorem, 195 left half-plane (of a vertical line), 334
intersection (of sets), 165 left-pointing (vector on number
interval line), 100
closed, 126 leg (of a right triangle), 290
closed bounded, 5, 169 length
length of, 126 additivity property of, 98
open, 126 of a segment, 7, 185
inverse, 56 of a vector, 100, 220
multiplicative, 56, 113 of an interval, 126
inverse (of a transformation), 211, preserve, 201
211 less than
inverse transformation, 211 among numbers, 12
of a congruence, 240 line
of a reflection, 231 defined by a linear equation in two
of a rotation, 211 variables, 354
of a similarity, 284 joining two points, 166
of a translation, 234 segment (joining two points), 169,
invert and multiply rule, 1, 57, 59, 358
68, 71, 119 slants / or \, 363
generalized form, 119 line separation, 173
irrational number, 153 line symmetry, 230
isolating the variable, 329 linear equation, 300
isometry, 200 in one variable, 323
relation with congruence, 237, 250 in two variables, 352
isosceles triangle, 193 linear equations
controversy in TSM about its simultaneous, 373
definition, 193 system of, 373
linear polynomial, 313
Jordan curve theorem, 196
linear system, 373
key lemma, 144 determinant of, 377
Koswatta, Sunil, 101 in two variables, 373
of two equations in two unknowns,
(L1) (geometric assumption), 165 373
(L2) (geometric assumption), 165 relation with geometry, 374–375
(L3) (geometric assumption), 167 solution by elimination, 380
(L4) (geometric assumption), 176 solution by substitution, 380
(L5) (geometric assumption), 184 solution set of, 373
(L6) (geometric assumption), 188 relation with determinant, 377
(L7) (geometric assumption), 237 lines
(L8) (geometric assumption), 250 distinct, 165
lawn-mowing, 81 intersecting, 165
LCM, 40, 41, 156 parallel, 165
least common denominator, xxii, 1, perpendicular, 191
32, 34, 40, 41, 134 locating a fraction on a number line,
least common multiple (= LCM), 34, 15, 52
156 lower half-plane, 334
left (of a vertical line), 334 lowest terms (of a fraction), 138
left endpoint, 6 lowest terms (of fraction), 30
m-th multiple (of a unit fraction), 10 nonagon, 171
major arc, 190 nonconvex, 190
map (transformation), 200, 205 nonconvex angle, 182, 183, 186
mathematical engineering, xii nonempty (set), 173
mathematical integrity, xiii, xv, xix, nonvanishing, 377
xx, xxiii, xxviii, 159 notational conventions
mathematics educator, xi, xxvii for expressions, 302
Mersenne prime, 309 for polynomials, 312
Mersenne, Marin, 308 number, 6
midpoint, 84, 192 irrational, 153
minor arc, 190 negative, 91
minus sign, 97 positive, 91
mirror reflection, 90 real, 6
relation with addition, 93–96 whole, 6
relation with multiplication, number expression, 302
105–106 number line, 5, 6
mixed number, 36 mirror reflection on, 90
mixed numbers, 15 numerator
subtraction of, 39–40 of a complex fraction, 68
monomial, 312 of a fraction, 10
move, 205 of a rational quotient, 118
move (transformation), 200
mph, 77 obtuse angle, 191
multiple, 311 obtuse triangle, 191
integral, 138 octagon, 171
rational, 114 of (as in fraction of a fraction), 20,
whole number, 10, 56 24, 24–28
multiplication algorithm for one-to-one (= injective), 205
decimals, 51 one-to-one correspondence, 205
multiplication of fractions, 26, 45 (= bijective), 205
motivation for the definition, onto, 205
43–45 onto (= surjective), 205
multiplication of rational numbers, open disk, 186
105–110 open interval, 126
multiplicative inverse opposite angles at a point, 276
of a fraction, 56 opposite half-planes, 176
of a rational number, 113, 114, opposite rays, 174
133, 134 opposite sides
of a line in the plane, 176
n factorial, 42 of a point on a line, 174
n-gon, 171 of a quadrilateral, 193
n-sided polygon, 171 opposite signs, 125
National Mathematics Advisory opposite vertices (of a quadrilateral),
Panel, xxiv, 1 227
negative degree (of a rotation), 203 order
negative numbers, 91 among fractions, 12
negative sign, 91 among numbers, 12, 91
negative times negative is positive, order of operations, 313
107–110, 135 ordered pair, 134, 332, 333
origin of a coordinate system, 331 linear, 313
notational conventions for, 312
parallel lines, 165 order of operations, 313
distance between, 228 quadratic, 313
equidistant, 258 positive numbers, 91
parallel postulate, 165 precision, xiii
parallel segments importance of, xxxi
as abuse of notation, 256 preserve degree (basic isometries),
parallelogram, 193 203, 231, 236, 237
characterizations of, 260, 261 preserve distance (transformations),
equality of angles at opposite 200
vertices, 227 preserve length (transformations),
equality of opposite sides, 226 201
opposite vertices of, 227 prime, 148, 151
partitive interpretation of whole
prime decomposition (of a whole
number division, 28
number), 150, 151, 154
parts of a whole, 4
prime number, 148
Pascal, Blaise, 308
primes
pave, 47
infinity of, 155
PEMDAS, 313, 314
problem solving, xxxiv,
pentagon, 171
xxxvi–xxxvii, xxxviii, xxxviii
percent, 73
product formula, 43, 46, 47–51, 53,
of a quantity, 73
69, 71, 134
perfect square, 153
product of fractions, 45
perimeter of rectangle, 53
progression from the simple to the
perpendicular bisector, 192
complex, 51, 249
perpendicular lines, 191
proper fraction, 36
plane separation assumption, 176
purposefulness, xiii
pointing in the same direction, 231
polygon, 171 importance of, xiii, xxxii–xxxiii,
140
confused with polygonal region,
197 Pythagoras, 290
convex, 197 Pythagorean theorem, 290
diagonal of, 171 converse of, 293
edge of, 171 dependence on the parallel
inside of, 196 postulate, 293
region enclosed by, 196 Pythagorean triple, 384
regular, 193, 197
side of, 171 quadrants (of a coordinate system),
polygonal region, 196 335
confused with polygon, 195 quadratic equation
polygonal segment, 195 in one variable, 322
polynomials, 312 quadratic polynomial, 313
coefficients in, 312 quadrilateral, 171
cubic, 313 quotient (of a division), 57, 115
expanded forms of whole numbers quotient (of a
as, 313 division-with-remainder), 140
factorization of, 314 quotient field (of an integral
in several numbers, 317 domain), 133
radian, 191 relation
radius, 186 equivalence, 241
of closed disk, 186 reflexive, 241
of open disk, 186 symmetric, 240
rate, 76, 81 transitive, 241
ratio, 75, 75 relatively prime (integers), 138, 152
use in everyday context, 75 removing parentheses, 97, 111
rational expressions, 316 research on the teaching of fractions,
in several numbers, 317 xxi–xxii
rational multiple, 114 rhombus, 193
rational number addition, 92, 102 right (of a vertical line), 334
formulas for, 96, 97 right angle, 191
rational number division, 115 right half-plane (of a vertical line),
formula for, 119 334
rational number multiplication, 105 right triangle, 191
formulas for, 107 hypotenuse, 290
rational number subtraction, 96 leg, 290
rational numbers, 90 right-pointing (vector on number
as quotients of integers, 116 line), 100
field of, 133 rise-over-run, 298, 347
rationale for, 89 rotation, 202
rational quotient, 117 center of, 202
ray, 174 counterclockwise, 201
from one point to another, 174 of θ degrees around a point, 202
issuing from a point, 174 same direction, 231
rays same length, 6
distinct, 175 same side
opposite, 174 of a line in the plane, 176
real number, 6, 76, 153 of a point on a line, 174
reasoning, xiii same sign (for two numbers), 122
reciprocal (of a fraction), 57 SAS, 245
rectangle, 193, 251 SAS for similarity, 287
area of, 48 satisfies an equation, 322
existence of, 225 scale factor
perimeter of, 53 in FTS, 256
rectilinear figure, 269 of a dilation, 268
reduced form (of a fraction), 138 of a similarity, 284
reduced fraction, 138 scaled coordinate system, 361
reducing fractions, 20 Schoenfeld, Alan, xv, xviii
reflection, 229 school geometry curriculum
reflexive property, 121 issues with, 157–164
reflexive relation, 241 school mathematics, xii
region, 197 segment, 5, 169
connected, 195 divided into n equal parts, 9
enclosed by a polygon, 196 endpoints of, 169
polygonal, 196 length of, 7, 185
triangular, 197, 198 midpoint of, 84
regular polygon, 193, 197 short, 8
unit, 6 starting point (of vector), 100, 219
sequence of n-ths, 9, 10 straight angle, 183
sequence of fifths, 9 straight line, 164
sequence of thirds, 8, 9 substitution, method of, 380
setting up a coordinate system, 332 subtraction
short segment, 8, 26 of rational numbers, 91
Shulman, Lee, xiv, xxiv subtraction algorithm for decimals,
side 40
of a polygon, 170, 171 subtraction as addition, 39, 96
of an angle, 182 subtraction of fractions, 36, 38–41
sign (of a number), 339 subtraction of rational numbers,
opposite, 125 96–98, 99
same, 122 sum of fractions, 33
similar to, 284 formula for, 33
similarity, 284 sum vector (of two vectors on
equivalence relation, 285 number line), 101
reflexive relation, 285 surjection, 205
scale factor of, 284 surjective, 205
students’ confusion in TSM, symbols
158–159 basic etiquette in the use of, 299
symmetric relation, 285 need for, 301
transitive relation, 285 need to quantify, 299
slant, 363 symmetric relation, 240
slantangle, 225 symmetric with respect to a line,
slope, 297, 346 219, 230
formula for, 347, 348 Taylor polynomial, 312
local slope at O, 339 terminating decimals, 14
local slope at a point, 342 Textbook School Mathematics
what it is for, 338–342 (= TSM), xiv
slope-intercept form (of the equation theorem on equivalent fractions, 20,
of a line), 356 21–23, 24, 26, 37, 139, 153
slopes transformation, 200
of parallel lines, 363 bijective, 205
of perpendicular lines, 366 composite, 208
smaller than constant, 200
among numbers, 12, 37, 91 folding, 207
solution (of an equation), 322 identity, 200
solutions of an equation in two image of, 205
variables, 351 image under, 205
solving an equation, 300, 323 injective, 205
meaning of, 324–329 inverse, 211
pedagogical comments on, 327 issues with using coordinates, 207
speed, 76–77, 78 maps a point to a point, 200
square, 193 moves a point to a point, 200
square root, 148 rationale for, 199–200
SSS, 51 surjective, 205
standard representation (of a transformations
fraction), 8 composition of, 208
equal, 208 unknown, 301, 322
transitive property, 121 upper half-plane, 334
transitive relation, 241
translation, 6 vanishing, 377
coordinate description, 368 variable, 297–298, 301, 323
translation (along a vector), 234 defect of teaching in TSM, 302,
transversal (of lines), 224 318–320
trapezoid, 193 isolating the, 329
triangle, 171 not a mathematical concept, 319
vector
acute, 191
in the plane, 219
equilateral, 193
length of, 220
isosceles, 193
on a number line, 100
obtuse, 191
vertex, 170, 171
right, 191
of a polygon, 171
sum of angles, 279
of a ray, 174
triangle congruence, 245
of an angle, 182
special conventions of, 245
vertical line, 332
triangle inequality, 129
left of, 334
triangle similarity, 287
right of, 334
special conventions of, 287
Viète, François, 301
triangular region, 197, 198
trichotomy law, 121 weak inequality, 13
trinomial, 313 well-defined, 45, 57, 138, 168–169,
TSM, xiv, xxvii–xxxv, xxxvii, 1–5, 7, 185, 202, 215, 218, 219, 229,
13, 14, 29, 30, 32, 35–37, 40, 43, 253, 333
48, 53–56, 62, 64, 67–70, 72–76, whole (as in parts of a whole), 4, 7
81–82, 114–115, 119–120, 130, whole number multiple, 56
139, 147, 154–155, 157–163, 166, whole number multiple (of a unit
200, 244, 255–283, 297–298, 302, fraction), 10
313–314, 318–320, 322, 324–325, whole numbers, 6
327, 329, 337–338, 347, 351–352, division of, 29
361–363, 374, 380
x-axis, 331
unbounded, 194 negative, 332
union (of sets), 9 positive, 332
uniqueness, 55–57, 71, 112, 113, 115, x-coordinate of a point, 332
139, 144, 147–151, 165, 184, x-intercept, 357
188, 192, 216, 232, 234, 236, y-axis, 331
326, 331, 377 negative, 332
unit, 6 positive, 332
circle, 186 y-coordinate of a point, 332
disk, 186 y-intercept, 356
unit distance, 7
unit fraction, 10, 42 zero angle, 183
unit length, 7 zero product property, 114
unit segment, 6 zero product rule, 114
unit square, 47 zero vector, 100
This is the first of three volumes that, together, give an exposition of the mathematics of grades
9–12 that is simultaneously mathematically correct and grade-level appropriate. The volumes
are consistent with CCSSM (Common Core State Standards for Mathematics) and aim at
presenting the mathematics of K–12 as a totally transparent subject.
The present volume begins with fractions, then rational numbers, then introductory geometry
that can make sense of the slope of a line, then an explanation of the correct use of symbols that
makes sense of “variables”, and finally a systematic treatment of linear equations that explains
why the graph of a linear equation in two variables is a straight line and why the usual solution
method for simultaneous linear equations “by substitutions” is correct.
This book should be useful for current and future teachers of K–12 mathematics, as well as for
some high school students and for education professionals.

For additional information and updates on this book, visit
www.ams.org/bookpages/mbk-131

MBK/131
