DISCRETE MATHEMATICS AND ITS APPLICATIONS

COMPUTATIONAL NUMBER THEORY

ABHIJIT DAS
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Dedicated to
C. E. Veni Madhavan
Contents
Preface

1 Arithmetic of Integers
  1.1 Basic Arithmetic Operations
    1.1.1 Representation of Big Integers
      1.1.1.1 Input and Output
    1.1.2 Schoolbook Arithmetic
      1.1.2.1 Addition
      1.1.2.2 Subtraction
      1.1.2.3 Multiplication
      1.1.2.4 Euclidean Division
    1.1.3 Fast Arithmetic
      1.1.3.1 Karatsuba–Ofman Multiplication
      1.1.3.2 Toom–Cook Multiplication
      1.1.3.3 FFT-Based Multiplication
    1.1.4 An Introduction to GP/PARI
  1.2 GCD
    1.2.1 Euclidean GCD Algorithm
    1.2.2 Extended GCD Algorithm
    1.2.3 Binary GCD Algorithm
  1.3 Congruences and Modular Arithmetic
    1.3.1 Modular Exponentiation
    1.3.2 Fast Modular Exponentiation
  1.4 Linear Congruences
    1.4.1 Chinese Remainder Theorem
  1.5 Polynomial Congruences
    1.5.1 Hensel Lifting
  1.6 Quadratic Congruences
    1.6.1 Quadratic Residues and Non-Residues
    1.6.2 Legendre Symbol
    1.6.3 Jacobi Symbol
  1.7 Multiplicative Orders
    1.7.1 Primitive Roots
    1.7.2 Computing Orders
  1.8 Continued Fractions
    1.8.1 Finite Continued Fractions

Appendices

Index
Preface
This book is a result of my teaching a masters-level course with the same name
for five years at the Indian Institute of Technology Kharagpur. The course was
attended mostly by MTech and final-year BTech students from the Depart-
ment of Computer Science and Engineering. Students from the Department
of Mathematics and other engineering departments (mostly Electronics and
Electrical Engineering, and Information Technology) also attended the course.
Some research students enrolled in the MS and PhD programs constituted the
third section of the student population. Historically, therefore, the material
presented in this book is designed to cater to the need and taste of engineering
students in advanced undergraduate and beginning graduate levels. However,
several topics that could not be covered in a one-semester course have also
been included in order to make this book a comprehensive and complete treat-
ment of number-theoretic algorithms.
A justification is perhaps needed to explain why another textbook on com-
putational number theory is necessary. Some (perhaps not many) textbooks on
this subject are already available to international students. These books vary
widely with respect to their coverage and technical sophistication. I believe
that a textbook specifically targeting the engineering population is missing.
This book should be accessible (but is not restricted) to students who have
not attended any course on number theory. My teaching experience shows
that heavy use of algebra (particularly, advanced topics like commutative al-
gebra or algebraic number theory) often demotivates students. While I have
no intention to underestimate the role played by algebra in number theory, I
believe that a large part of number theory can still reach students not conver-
sant with sophisticated algebraic tools. It is, of course, meaningless to avoid
algebra altogether. For example, when one talks about finite fields or elliptic
curves, one expects the audience to be familiar with the notion of basic alge-
braic structures like groups, rings and fields (and some linear algebra, too).
But that is all that I assume on the part of the reader. Although I made an
attempt to cover this basic algebra in an appendix, the concise appendix is
perhaps more suited as a reference than as a tool to learn the subject. Like-
wise, students who have not attended a course on algorithms may pick up
the basic terminology (asymptotic notations, types of algorithms, complexity
classes) from another appendix. Any sophisticated topic has been treated in a
self-contained manner in the main body of the text. For example, some basic
algebraic geometry (projective curves, rational functions, divisors) needed for
[…] involved and/or too sophisticated. Although this omission may alienate readers
from mathematical intricacies, I believe that the risk is not beyond control.
After all, every author of a book has to make a compromise among the bulk,
the coverage, and the details. I achieved this in a way I found most suitable.
I have not made an attempt to formally cite every contribution discussed
in the text. Some key references are presented as on-line comments and/or
footnotes. I personally find citations like [561] or [ABD2c] rather distracting,
suited to technical papers and research monographs, not to a textbook.
I am not going to describe the technical organization of this book. The ta-
ble of contents already accomplishes this task. I instead underline the impos-
sibility of covering the entire material of this book in a standard three-to-four
hour per week course in a semester (or quarter). Chapters 1 and 2 form the
backbone of computational number theory, and may be covered in the first
half of a course. In the second half, the instructor may choose from a variety
of topics. The most reasonable coverage is that of Chapters 5 and 6, followed,
if time permits, by excerpts from Chapters 3 and/or 7. A second course might
deal with the rest of the book. A beginners’ course on elliptic curves may
concentrate on Chapters 1, 2 and 4. Suitable portions from Chapters 1, 2, 5, 6
and 9 make a course on introductory public-key cryptology. The entire book
is expected to be suitable for self study, for students starting a research career
in this area, and for practitioners of cryptography in industry.
While efforts have been made to keep this book as error-free as possible,
complete elimination of errors is a dream for which any author can hope.
The onus lies on the readers, too, to detect errors and omissions at any level,
typographical to conceptual to philosophical. Any suggestion will improve
future editions of this book. I can be reached at abhij@cse.iitkgp.ernet.in
and also at SadTijihba@gmail.com.
No project like authoring this book can be complete without the active help
and participation of others. No amount of words suffices to describe the contri-
bution of my PhD supervisor C. E. Veni Madhavan. It is he who introduced me
to the wonderful world of computational number theory and thereby changed
the course of my life forever. I will never forget the days of working with him
as his student on finite-field arithmetic and the discrete-logarithm problem.
Those were, without any shred of doubt, the sweetest days in my academic life.
Among my other teachers, A. K. Nandakumaran, Basudeb Datta, Dilipkumar
Premchand Patil, Sathya S. Keerthi and Vijay Chandru, all from the Indian
Institute of Science, Bangalore, deserve specific mention for teaching me var-
ious aspects of pure and applied mathematics. I also gratefully acknowledge
Tarun Kumar Mukherjee from Jadavpur University, Calcutta, who inculcated
in me a strong affinity for mathematics in my undergraduate days. My one-
year stay with Uwe Storch and Hartmut Wiebe at the Ruhr-Universität in
Bochum, Germany, was a mathematically invigorating experience.
In the early 2000s, some of my colleagues in IIT Kharagpur developed
a taste for cryptography, and I joined this research group with an eye on
public-key algorithms. I have been gladly supplementing their areas of interest. […]
Abhijit Das
Kharagpur
Chapter 1
Arithmetic of Integers
Loosely speaking, number theory deals with the study of integers, also called
whole numbers. It is an ancient branch of mathematics that has been studied
for millennia and has attracted a variety of researchers, ranging from professionals
to amateurs. Recently, in particular after the invention of public-key
cryptology, number theory has found concrete engineering applications and
has turned out to be an interesting and important area of study for computer
scientists, engineers, and security experts. Several outstanding computational
challenges pertaining to the theory of numbers remain unsolved and are
expected to drive fascinating research in the near future.
Let me reserve some special symbols in the blackboard-bold font to denote
the following important sets.
N = {1, 2, 3, . . .} = the set of natural numbers,
N0 = {0, 1, 2, 3, . . .} = the set of non-negative integers,
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} = the set of all integers,
Q = {a/b | a ∈ Z, b ∈ N} = the set of rational numbers,
R = the set of real numbers,
C = the set of complex numbers,
P = {2, 3, 5, 7, 11, 13, . . .} = the set of (positive) prime numbers.
n = 12345678987654321
  = 43 × B^6 + 220 × B^5 + 84 × B^4 + 98 × B^3 + 145 × B^2 + 244 × B + 177
  = (43, 220, 84, 98, 145, 244, 177)B
Here, 43 is the most significant (or leftmost) digit, whereas 177 is the least
significant (or rightmost) digit. (In general, an integer n having the base-B
representation (ns−1, ns−2, . . . , n1, n0)B with ns−1 ≠ 0 needs s B-ary digits,
with ns−1 and n0 being respectively the most and least significant digits.) An
array of size seven suffices to store n of this example. While expressing an
integer in some base, it is conventional to write the most significant digit first.
On the other hand, if one stores n in an array with zero-based indexing, it is
customary to store n0 in the zeroth index, n1 in the first index, and so on. For
example, the above seven-digit number has the following storage in an array.
Array index 0 1 2 3 4 5 6
Digit 177 244 145 98 84 220 43
The sign of an integer can be stored using an additional bit with 0 meaning
positive and 1 meaning negative. Negative integers may be represented in the
standard 1’s-complement or 2’s-complement format. In what follows, I will
stick to the signed magnitude representation only. ¤
Suppose that the user reads the input string d0 d1 d2 . . . dt−1 , where each
di is a decimal digit (0–9). This is converted to the base-B representation
(ns−1 , ns−2 , . . . , n1 , n0 )B . Call this integer n. Now, let another digit dt come
in the input string, that is, we now need to represent the integer n′ whose
decimal representation is d0 d1 d2 . . . dt−1 dt . We have n′ = 10n+dt . This means
that we multiply the representation (ns−1 , ns−2 , . . . , n1 , n0 )B by ten and add
dt to the least significant digit. Multiplication and addition algorithms are
described in the next section. For the time being, it suffices to understand
that the string to base-B conversion boils down to a sequence of elementary
arithmetic operations on multiple-precision integers.
The base-256 representation of 123454321 is, therefore, (7, 91, 195, 113)256 . ¤
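For concreteness, the procedure just described can be sketched directly in GP. The
function name dectobase and the vector-of-digits interface are illustrative choices,
not a prescribed implementation; the result is kept least significant digit first with
explicit carries.

dectobase(d, B) =         \\ d: decimal digits d0 d1 ... d(t-1), most significant first
{
  my(n = [0], carry, t);
  for (i = 1, #d,
    carry = d[i];                       \\ n := 10*n + next decimal digit
    for (j = 1, #n,
      t = 10*n[j] + carry;
      n[j] = t % B;
      carry = t \ B);
    while (carry > 0, n = concat(n, carry % B); carry = carry \ B));
  return(n);                            \\ base-B digits, least significant first
}
\\ dectobase([1,2,3,4,5,4,3,2,1], 256) should return [113, 195, 91, 7], that is,
\\ 123454321 = (7, 91, 195, 113) in base 256, as stated above.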
For the sake of efficiency, one may process, in each iteration, multiple dec-
imal digits from the input. For example, if n′′ has the decimal representation
d0 d1 d2 . . . dt−1 dt dt+1 , we have n′′ = 100n + (dt dt+1 )10 = 100n + (10dt + dt+1 ).
In fact, the above procedure can easily handle chunks of k digits simultane-
ously from the input so long as 10^k < B.
The converse transformation is similar, albeit a little more involved. Now,
we have to carry out arithmetic in a base B′ which is a suitable integral power
of ten. For example, we may choose B′ = 10^l with 10^l < B < 10^(l+1). First, we
express the representation base B in the base B′ used for output: B = H·B′ + L
with H, L ∈ {0, 1, . . . , B′ − 1}. Let n = (ns−1, ns−2, . . . , n1, n0)B be available
Example 1.3 Like the previous two examples, we choose the base B = 256.
The base for decimal output can be chosen as B ′ = 100. We, therefore, need
to convert integers from the base-256 representation to the base-100 represen-
tation. We write B = 2 × B ′ + 56, that is, H = 2 and L = 56. For the input
n = (19, 34, 230, 113)B , the conversion procedure works as follows.
i B-ary digit ni hi li Operation B ′ -ary representation
4 Initialization ()B ′
3 19 0 19 Multiply by 256 ()B ′
Add 19 (19)B ′
2 34 0 34 Multiply by 256 (48, 64)B ′
Add 34 (48, 98)B ′
1 230 2 30 Multiply by 256 (1, 25, 38, 88)B ′
Add 230 (1, 25, 41, 18)B ′
0 113 1 13 Multiply by 256 (3, 21, 5, 42, 8)B ′
Add 113 (3, 21, 5, 43, 21)B ′
If we concatenate the B ′ -ary digits from the most significant end to the least
significant end, we obtain the decimal representation of n. There is only a small
catch here. The digit 5 should be printed as 05, that is, each digit (except the
most significant one) in the base B′ = 10^l must be printed as an l-digit integer
after padding with the requisite number of leading zeros (whenever necessary).
In the example above, l is 2, so the 100-ary digits 21, 43 and 21 do not require
this padding, whereas 5 requires this. The most significant digit 3 too does
not require a leading zero (although there is no harm if one prints it). To sum
up, we have n = (19, 34, 230, 113)B = 321054321. ¤
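The output conversion of Example 1.3 can be sketched along the same lines. The
sketch below (tobase is an illustrative name) keeps the number as a vector of base-B′
digits and repeatedly multiplies by B and adds the next B-ary digit; for readability
it multiplies by B directly instead of going through the split B = H·B′ + L, which
is harmless in GP where there is no word-size limit.

tobase(n, B, Bp) =        \\ n: base-B digits, most significant first; Bp: output base
{
  my(d = [0], carry, t);
  for (i = 1, #n,
    carry = n[i];                       \\ d := B*d + next base-B digit
    for (j = 1, #d,
      t = d[j]*B + carry;
      d[j] = t % Bp;
      carry = t \ Bp);
    while (carry > 0, d = concat(d, carry % Bp); carry = carry \ Bp));
  return(d);                            \\ base-Bp digits, least significant first
}
\\ tobase([19, 34, 230, 113], 256, 100) should return [21, 43, 5, 21, 3], which,
\\ read from the most significant end and zero-padded, is 3|21|05|43|21 = 321054321.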
1.1.2.1 Addition
Let two multiple-precision integers a = (as−1 , as−2 , . . . , a1 , a0 )B and b =
(bt−1 , bt−2 , . . . , b1 , b0 )B be available in the base-B representation. Without loss
of generality, we may assume that s = t (if not, we pad the smaller operand
with leading zero digits). We keep on adding ai + bi with carry adjustments
in the sequence i = 0, 1, 2, . . . .
A small implementation-related issue needs to be addressed in this context.
Suppose that we choose a base B = 2^32 in a 32-bit machine. When we add
ai and bi (possibly also an input carry), the sum may be larger than 2^32 − 1
and can no longer fit in a 32-bit integer. If such a situation happens, we know
that the output carry is 1 (otherwise, it is 0). But then, how can we detect
whether an overflow has occurred, particularly if we work only with high-level
languages (without access to assembly-level instructions)? A typical behavior
of modern computers is that whenever the result of some unsigned arithmetic
operation is larger than 2^32 − 1 (or 2^64 − 1 for a 64-bit machine), the least
significant 32 (or 64) bits are returned. In other words, addition, subtraction
and multiplication of 32-bit (or 64-bit) unsigned integers are actually computed
modulo 2^32 (or 2^64). Based upon this assumption about our computer,
we can detect overflows without any assembly-level support.

First, suppose that the input carry is zero, that is, we add two B-ary digits
ai and bi only. If there is no overflow, the modulo-B sum ai + bi (mod 2^32) is
not smaller than ai and bi. On the other hand, if an overflow occurs, the sum
ai + bi (mod 2^32) is smaller than both ai and bi. Therefore, by inspecting the
return value of the high-level sum, one can deduce the output carry. If the
input carry is one, ai + bi + 1 (mod 2^32) is at most as large as ai if and only
if an overflow occurs. Thus, if there is no overflow, this sum has to be larger
than ai. This observation lets us detect the presence of the output carry.
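The comparison trick can be mimicked in GP by reducing every sum modulo the word
size, as a real CPU would; addwords is an illustrative name, and in practice this
test would of course be coded in C or assembly rather than in GP.

addwords(ai, bi, cin, W) =    \\ one digit position; W plays the role of 2^32
{
  my(s = (ai + bi + cin) % W, cout);
  if (cin == 0,
    cout = if (s < ai, 1, 0),           \\ no input carry: overflow iff the wrapped sum shrank
    cout = if (s <= ai, 1, 0));         \\ input carry 1: overflow iff s <= ai
  return([s, cout]);                    \\ [sum word, output carry]
}
\\ addwords(2^32 - 1, 1, 0, 2^32) should return [0, 1], and
\\ addwords(5, 7, 1, 2^32) should return [13, 0].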
1.1.2.2 Subtraction
The procedure for the subtraction of multiple-precision integers is equally
straightforward. In this case, we need to handle the borrows. The input and
output borrows are denoted, as before, by cin and cout . The check whether
computing ai −bi −cin results in an output borrow can be performed before the
operation is carried out. If cin = 0, the output borrow cout is one if and only if
ai < bi. On the other hand, for cin = 1, the output borrow is one if and only
if ai ≤ bi. Even in the case that there is an output borrow, one may blindly
compute the mod B operation ai − bi − cin and keep the returned value as the
output word, provided that the CPU supports 2’s-complement arithmetic. If
not, ai − bi − cin may be computed as ai + (B − bi − cin ) with B − bi − cin
computed using appropriate bit operations. More precisely, B − bi − 1 is the
bit-wise complement of bi , and B − bi is one more than that.
Example 1.5 Let us compute a − b for the same operands as in Example 1.4.
i ai bi cin Borrow? cout ai − bi − cin (mod 256)
0 234 132 0 (234 < 132)? 0 102
1 22 242 0 (22 < 242)? 1 36
2 176 239 1 (176 ≤ 239)? 1 192
3 76 80 1 (76 ≤ 80)? 1 251
4 2 0 1 (2 ≤ 0)? 0 1
We obtain a−b = (1, 251, 192, 36, 102)B = 8518640742. In this example, a > b.
If a < b, then a−b can be computed as −(b−a). While performing addition and
subtraction of signed multiple-precision integers, a subroutine for comparing
the absolute values of two operands turns out to be handy. ¤
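A companion sketch for one digit position of the subtraction, with the borrow
decided before the modular operation exactly as described above (subwords is again
an illustrative name):

subwords(ai, bi, cin, W) =
{
  my(cout = if (cin == 0, if (ai < bi, 1, 0), if (ai <= bi, 1, 0)));
  return([(ai - bi - cin) % W, cout]);  \\ [difference word, output borrow]
}
\\ With W = 256, subwords(22, 242, 0, 256) should return [36, 1], matching the
\\ row i = 1 of Example 1.5.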
1.1.2.3 Multiplication
Multiplication of two multiple-precision integers is somewhat problematic.
Like decimal (or polynomial) multiplication, one may multiply every word of
the first operand with every word of the second. But a word-by-word multi-
plication may lead to a product as large as (B − 1)^2. For B = 2^32, the product
may be a 64-bit value, whereas for B = 2^64, the product may be a 128-bit
value. Correctly obtaining all these bits is trickier than a simple check for an
output carry or borrow as we did for addition and subtraction.
Many compilers support integer data types (perhaps non-standard) of size
twice the natural word size of the machine. If so, one may use this facility to
compute the double-sized intermediate products. Assembly-level instructions
may also allow one to retrieve all the bits of word-by-word products. If neither
of these works, one possibility is to break each operand word in two half-sized
integers. That is, one writes ai = hi·√B + li and bj = h′j·√B + l′j, and computes
ai·bj as hi·h′j·B + (hi·l′j + li·h′j)·√B + li·l′j. Here, hi·h′j contributes only to the more
significant word of ai bj , and li lj′ to only the less significant word. But hi lj′ and
li h′j contribute to both the words. With appropriate bit-shift and extraction
operations, these contributions can be separated and added to appropriate
words of the product. When ai bj is computed as hB + l = (h, l)B , the less
significant word l is added to the (i + j)-th position of the output, whereas h
is added to the (i + j + 1)-st position. Each such addition may lead to a carry
Example 1.6 We compute the product of the following two operands avail-
able in the representation to base B = 28 = 256.
a = 1234567 = (18, 214, 135)B ,
b = 76543210 = (4, 143, 244, 234)B .
The product may be as large as having 3 + 4 = 7 B-ary words. We initialize
the product c as an array of seven eight-bit values, each initialized to zero. In
the following table, c is presented in the B-ary representation with the most
significant digit written first.
i ai j bj ai bj = (h, l)B Operation c
Initialization (0, 0, 0, 0, 0, 0, 0)B
0 135 0 234 (123, 102)B Add 102 at pos 0 (0, 0, 0, 0, 0, 0, 102)B
Add 123 at pos 1 (0, 0, 0, 0, 0, 123, 102)B
1 244 (128, 172)B Add 172 at pos 1 (0, 0, 0, 0, 1, 39, 102)B
Add 128 at pos 2 (0, 0, 0, 0, 129, 39, 102)B
2 143 ( 75, 105)B Add 105 at pos 2 (0, 0, 0, 0, 234, 39, 102)B
Add 75 at pos 3 (0, 0, 0, 75, 234, 39, 102)B
3 4 ( 2, 28)B Add 28 at pos 3 (0, 0, 0, 103, 234, 39, 102)B
Add 2 at pos 4 (0, 0, 2, 103, 234, 39, 102)B
1 214 0 234 (195, 156)B Add 156 at pos 1 (0, 0, 2, 103, 234, 195, 102)B
Add 195 at pos 2 (0, 0, 2, 104, 173, 195, 102)B
1 244 (203, 248)B Add 248 at pos 2 (0, 0, 2, 105, 165, 195, 102)B
Add 203 at pos 3 (0, 0, 3, 52, 165, 195, 102)B
2 143 (119, 138)B Add 138 at pos 3 (0, 0, 3, 190, 165, 195, 102)B
Add 119 at pos 4 (0, 0, 122, 190, 165, 195, 102)B
3 4 ( 3, 88)B Add 88 at pos 4 (0, 0, 210, 190, 165, 195, 102)B
Add 3 at pos 5 (0, 3, 210, 190, 165, 195, 102)B
2 18 0 234 ( 16, 116)B Add 116 at pos 2 (0, 3, 210, 191, 25, 195, 102)B
Add 16 at pos 3 (0, 3, 210, 207, 25, 195, 102)B
1 244 ( 17, 40)B Add 40 at pos 3 (0, 3, 210, 247, 25, 195, 102)B
Add 17 at pos 4 (0, 3, 227, 247, 25, 195, 102)B
2 143 ( 10, 14)B Add 14 at pos 4 (0, 3, 241, 247, 25, 195, 102)B
Add 10 at pos 5 (0, 13, 241, 247, 25, 195, 102)B
3 4 ( 0, 72)B Add 72 at pos 5 (0, 85, 241, 247, 25, 195, 102)B
Add 0 at pos 6 (0, 85, 241, 247, 25, 195, 102)B
The product of a and b is, therefore, c = ab = (0, 85, 241, 247, 25, 195, 102)B =
(85, 241, 247, 25, 195, 102)B = 94497721140070. ¤
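The whole schoolbook product of Example 1.6 can be sketched as follows; schoolmul
and addat are illustrative names, digit vectors are kept least significant first, and
every word-by-word product is split as (h, l) and accumulated with explicit carry
propagation.

schoolmul(a, b, B) =
{
  my(c = vector(#a + #b), t, h, l);
  for (i = 1, #a,
    for (j = 1, #b,
      t = a[i]*b[j]; h = t \ B; l = t % B;
      c = addat(c, l, i + j - 2, B);    \\ add l at position (i-1)+(j-1)
      c = addat(c, h, i + j - 1, B)));  \\ add h one position higher
  return(c);
}
addat(c, v, pos, B) =     \\ add v into 0-based digit position pos, propagating carries
{
  my(k = pos + 1);
  while (v > 0,
    v += c[k]; c[k] = v % B; v = v \ B; k++);
  return(c);
}
\\ With B = 256, a = [135, 214, 18] and b = [234, 244, 143, 4] (the operands of
\\ Example 1.6, least significant digit first), schoolmul(a, b, 256) should
\\ return [102, 195, 25, 247, 241, 85, 0].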
Example 1.7 Let me explain the working of the above division algorithm on
the following two operands available in the base-256 representation.
a = 369246812345567890 = (5, 31, 212, 4, 252, 77, 138, 146)B ,
b = 19283746550 = (4, 125, 102, 158, 246)B .
The most significant word of b is too small. Multiplying both a and b by
2^5 = 32 completes the normalization procedure, and the operands change to
a = 11815897995058172480 = (163, 250, 128, 159, 137, 177, 82, 64)B ,
b = 617079889600 = (143, 172, 211, 222, 192)B .
We initially have s = 8 and t = 5, that is, the quotient can have at most
s − t + 1 = 4 digits to the base B = 256. The steps of the division procedure
are now illustrated in the following table.
s   Condition / Operation                                   Intermediate values
    Initialization                                          q = (0, 0, 0, 0)B
                                                            a = (163, 250, 128, 159, 137, 177, 82, 64)B
8   (a7 ≥ b4)? Yes. Increment q3.                           q = (1, 0, 0, 0)B
    Subtract B^3·b from a.                                  a = (20, 77, 172, 192, 201, 177, 82, 64)B
    (a7 = b4)? No. Set q2 = ⌊(a7·B + a6)/b4⌋.               q = (1, 36, 0, 0)B
    (a7·B^2 + a6·B + a5 < q2·(b4·B + b3))? No. Do nothing.
    Compute c = q2·B^2·b.                                   c = (20, 52, 77, 203, 83, 0, 0, 0)B
    (c > a)? No. Do nothing.
    Set a := a − c.                                         a = (25, 94, 245, 118, 177, 82, 64)B
7   (a6 ≥ b4)? No. Do nothing.
    (a6 = b4)? No. Set q1 = ⌊(a6·B + a5)/b4⌋.               q = (1, 36, 45, 0)B
    (a6·B^2 + a5·B + a4 < q1·(b4·B + b3))? No. Do nothing.
    Compute c = q1·B·b.                                     c = (25, 65, 97, 62, 39, 192, 0)B
    (c > a)? No. Do nothing.
    Set a := a − c.                                         a = (29, 148, 56, 137, 146, 64)B
6   (a5 ≥ b4)? No. Do nothing.
    (a5 = b4)? No. Set q0 = ⌊(a5·B + a4)/b4⌋.               q = (1, 36, 45, 52)B
    (a5·B^2 + a4·B + a3 < q0·(b4·B + b3))? No. Do nothing.
    Compute c = q0·b.                                       c = (29, 47, 27, 9, 63, 0)B
    (c > a)? No. Do nothing.
    Set a := a − c.                                         a = (101, 29, 128, 83, 64)B
5   (a4 ≥ b4)? No. Do nothing.
Let us again use the letters a, b to stand for the original operands (before
normalization). We have computed (32a) quot (32b) = (1, 36, 45, 52)B and
(32a) rem (32b) = (101, 29, 128, 83, 64)B . For the original operands, we then
have a quot b = (32a) quot (32b) = (1, 36, 45, 52)B = 19148084, and a rem b =
[(32a) rem (32b)]/32 = (3, 40, 236, 2, 154)B = 13571457690. ¤
computers, Doklady Akad. Nauk. SSSR, Vol. 145, 293–294, 1962. The paper gives the full
credit of the multiplication algorithm to Karatsuba only.
Example 1.8 Take B = 256, a = 123456789 = (7, 91, 205, 21)B , and b =
987654321 = (58, 222, 104, 177)B . We have A1 = (7, 91)B , A0 = (205, 21)B ,
B1 = (58, 222)B and B0 = (104, 177)B . The subproducts are computed as
A1 B1 = (1, 176, 254, 234)B ,
A0 B0 = (83, 222, 83, 133)B ,
A1 − A0 = −(197, 186)B ,
B1 − B0 = −(45, 211)B ,
(A1 − A0 )(B1 − B0 ) = (35, 100, 170, 78)B .
It follows that
A1 B0 + A0 B1 = A1 B1 + A0 B0 − (A1 − A0 )(B1 − B0 ) = (50, 42, 168, 33)B .
The three subproducts are added with appropriate shifts to obtain ab.
1 176 254 234
50 42 168 33
83 222 83 133
1 177 49 20 251 255 83 133
Therefore, ab = (1, 177, 49, 20, 251, 255, 83, 133)B = 121932631112635269. ¤
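The computation of Example 1.8 generalizes to the following recursive sketch (kara
is an illustrative name; digits() needs a reasonably recent GP):

kara(a, b) =
{
  my(n = max(#digits(abs(a), 256), #digits(abs(b), 256)), m, R, A1, A0, B1, B0, P2, P0, Pm);
  if (n <= 1, return(a * b));           \\ single-word operands: multiply directly
  m = n \ 2; R = 256^m;
  A1 = a \ R; A0 = a % R;               \\ a = A1*R + A0
  B1 = b \ R; B0 = b % R;               \\ b = B1*R + B0
  P2 = kara(A1, B1);
  P0 = kara(A0, B0);
  Pm = kara(A1 - A0, B1 - B0);          \\ may be negative; the algebra still works
  return(P2*R^2 + (P2 + P0 - Pm)*R + P0);
}
\\ kara(123456789, 987654321) should return 121932631112635269, as in Example 1.8.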
c(∞) = C2 = a(∞)b(∞) = A1 B1 ,
c(0) = C0 = a(0)b(0) = A0 B0 ,
c(1) = C2 + C1 + C0 = a(1)b(1) = (A1 + A0 )(B1 + B0 ).
Solving the system for C2 , C1 , C0 gives the first version of the Karatsuba–
Ofman algorithm. If we choose the evaluation point k = −1 instead of k = 1,
we obtain the equation
c(−1) = C2 − C1 + C0 = a(−1)b(−1)
= (A0 − A1 )(B0 − B1 ) = (A1 − A0 )(B1 − B0 ).
This equation along with the equations for c(∞) and c(0) yield the second
version of the Karatsuba–Ofman algorithm (as illustrated in Example 1.8).
This gives us a way to generalize the Karatsuba–Ofman algorithm. Toom4
and Cook5 propose representing a and b as polynomials of degrees higher
than one. Writing them as quadratic polynomials gives an algorithm popularly
known as Toom-3 multiplication.
Let a and b be n-digit integers. Take m = ⌈n/3⌉, and write
a = A2·R^2 + A1·R + A0,
b = B2·R^2 + B1·R + B0,
c = C4·R^4 + C3·R^3 + C2·R^2 + C1·R + C0
4 Andrei L. Toom, The complexity of a scheme of functional elements realizing the mul-
tiplication of integers, Doklady Akad. Nauk. SSSR, Vol. 4, No. 3, 714–716, 1963.
5 Stephen A. Cook, On the minimum computation time of functions, PhD thesis, De-
O(n^(log(2k−1)/log k)) time, in which the exponent can be made arbitrarily close to
one by choosing large values of k. Toom and Cook suggest taking k adaptively
based on the size of the input. The optimal choice is shown as k = 2^⌈√(log r)⌉,
where each input is broken in k parts, each of size r digits. This gives an
asymptotic running time of O(n·2^(5√(log n))) for the optimal Toom–Cook method.
Unfortunately, practical implementations do not behave well for k > 4.
Now, let ωN be a primitive N-th root of unity. Depending upon the field
in which we are working, this root ωN can be appropriately defined. For the
time being, let us plan to work in the field of complex numbers, so that
we can take ωN = e^(2πi/N) (where i = √−1). The discrete Fourier transform
(DFT) of the sequence (aN−1, aN−2, . . . , a1, a0) is defined as the sequence
(AN−1, AN−2, . . . , A1, A0), where for all k ∈ {0, 1, 2, . . . , N − 1}, we have

    Ak = Σ_(0 ≤ i < N) ωN^(ki) ai.
Ak is the value of the polynomial a evaluated at ωN^k (replace B by ωN^k). Like-
wise, let (BN−1, BN−2, . . . , B1, B0) be the DFT of (bN−1, bN−2, . . . , b1, b0),
and (CN−1, CN−2, . . . , C1, C0) the DFT of (cN−1, cN−2, . . . , c1, c0). Since Bk
is b evaluated at ωN^k, and Ck is c evaluated at ωN^k, we have

    Ck = Ak Bk

for all k = 0, 1, 2, . . . , N − 1. Therefore, if we can efficiently compute the
DFTs of the polynomials a and b, we can compute, using only N additional
multiplications, the DFT of the product c = ab. Computing the sequence
(cN−1, cN−2, . . . , c1, c0) from its DFT (CN−1, CN−2, . . . , C1, C0) is called the
inverse discrete Fourier transform (IDFT). Let (ĈN−1, ĈN−2, . . . , Ĉ1, Ĉ0) be
the DFT of (CN−1, CN−2, . . . , C1, C0). One can check (Exercise 1.9) that

    (cN−1, cN−2, . . . , c1, c0) = (1/N) (Ĉ1, Ĉ2, . . . , ĈN−1, Ĉ0),        (1.1)
that is, the IDFT of a sequence can be easily computed from its DFT. So it suf-
fices to compute the DFT as efficiently as possible. A naïve application of the
DFT formula leads to O(N^2) running time. A divide-and-conquer procedure
for computing the DFT (AN−1, AN−2, . . . , A1, A0) of (aN−1, aN−2, . . . , a1, a0)
is now presented. This procedure uses only O(n log n) operations in the un-
derlying field (C for the time being), and is called the fast Fourier transform
(FFT) of the input sequence. Let us write the polynomial a as

    a = a(e)(B^2) + B × a(o)(B^2),

where

    a(e)(B) = aN−2 B^(N/2−1) + aN−4 B^(N/2−2) + · · · + a2 B + a0, and
    a(o)(B) = aN−1 B^(N/2−1) + aN−3 B^(N/2−2) + · · · + a3 B + a1

are polynomials obtained from a by taking the terms at even and odd posi-
tions, respectively. But ω_(N/2) = ωN^2 is a primitive (N/2)-th root of unity. More-
over, a(e) and a(o) are polynomials with N/2 terms. We recursively com-
pute the DFT (actually, FFT) (A(e)_(N/2−1), A(e)_(N/2−2), . . . , A(e)_1, A(e)_0) of a(e), and the
DFT (A(o)_(N/2−1), A(o)_(N/2−2), . . . , A(o)_1, A(o)_0) of a(o). Finally, ωN^(N/2) = −1, so for all
k = 0, 1, 2, . . . , N/2 − 1 we have

    Ak = A(e)_k + ωN^k · A(o)_k   and   A_(N/2+k) = A(e)_k − ωN^k · A(o)_k.
Here, COMBINE stands for the combination of the DFTs of the two recur-
sive calls. For computing the DFT (X7 , X6 , X5 , X4 , X3 , X2 , X1 , X0 ) of the
sequence (x7 , x6 , x5 , x4 , x3 , x2 , x1 , x0 ) of length eight, recursive calls are made
on (x6 , x4 , x2 , x0 ) and (x7 , x5 , x3 , x1 ) to get the two sub-DFTs
    (Y3, Y2, Y1, Y0) = ( (x0 − x4) − i(x2 − x6),  (x0 + x4) − (x2 + x6),
                         (x0 − x4) + i(x2 − x6),  (x0 + x4) + (x2 + x6) ),

    (Z3, Z2, Z1, Z0) = ( (x1 − x5) − i(x3 − x7),  (x1 + x5) − (x3 + x7),
                         (x1 − x5) + i(x3 − x7),  (x1 + x5) + (x3 + x7) ),
We then obtain c as

    c = IDFT(C) = (1/8) (0, 46720, 236160, 331912, 267888, 490768, 123792, 120960)
                = (0, 5840, 29520, 41489, 33486, 61346, 15474, 15120).
One can check that this is indeed the value of 1234567890 × 1357924680.
Throughout this example, I used hybrid arithmetic, that is, integer arithmetic
in conjunction with arithmetic associated with the algebraic numbers
i and √2. Moreover, I have not shown the integer arithmetic in base 256. So
long as this example is meant for illustrating FFT-based multiplication, this
abstraction is fine. In practice, one may resort to floating-point arithmetic. ¤
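A compact GP sketch of the whole method follows; fftC and fftmul are illustrative
names, the arithmetic is done in floating-point complex numbers with a crude
round() at the end, and the inverse transform uses Equation (1.1) directly.

fftC(v) =                 \\ recursive FFT of a digit vector of length N = 2^t
{
  my(N = #v, w, E, O, A);
  if (N == 1, return(v));
  E = fftC(vector(N/2, j, v[2*j-1]));   \\ even-position coefficients a0, a2, ...
  O = fftC(vector(N/2, j, v[2*j]));     \\ odd-position coefficients a1, a3, ...
  w = exp(2*Pi*I/N);                    \\ primitive N-th root of unity in C
  A = vector(N);
  for (k = 0, N/2 - 1,
    A[k+1]     = E[k+1] + w^k * O[k+1];
    A[N/2+k+1] = E[k+1] - w^k * O[k+1]);
  return(A);
}
fftmul(a, b) =            \\ a, b: digit vectors of the same length N = 2^t (zero-padded)
{
  my(N = #a, A = fftC(a), Bv = fftC(b), C, Chat);
  C = vector(N, k, A[k] * Bv[k]);       \\ pointwise products C_k = A_k B_k
  Chat = fftC(C);                       \\ DFT of the C sequence
  return(vector(N, k, round(real(Chat[if (k == 1, 1, N - k + 2)] / N))));
}
\\ With B = 256, the operands 1234567890 and 1357924680 of this example have
\\ digit vectors [210, 2, 150, 73, 0, 0, 0, 0] and [72, 73, 240, 80, 0, 0, 0, 0]
\\ (least significant first).  fftmul applied to them should return
\\ [15120, 15474, 61346, 33486, 41489, 29520, 5840, 0]: the coefficients of c
\\ obtained above, with carries still to be propagated.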
bash$ gp
GP/PARI CALCULATOR Version 2.1.7 (released)
i686 running linux (ix86 kernel) 32-bit version
compiled: Feb 24 2011, gcc-4.4.3 (GCC)
(readline v6.1 enabled, extended help available)
PARI/GP is free software, covered by the GNU General Public License, and
comes WITHOUT ANY WARRANTY WHATSOEVER.
One can enter an arithmetic expression against the prompt. GP/PARI eval-
uates the expression and displays the result. This result is actually stored
in a variable for future references. These variables are to be accessed as
%1, %2, %3, . . . . The last returned result is stored in the variable %.
Here follows a simple conversation between me and GP/PARI. I ask GP/PARI
to calculate the expressions 2^(2^3) + 3^(2^2) and binomial(100, 25) = 100!/(25! 75!).
GP/PARI uses conventional precedence and associativity rules for arithmetic
operators. For example, the exponentiation operator ^ is right-associative and
has a higher precedence than the addition operator +. Thus, 2^2^3+3^2^2 is
interpreted as (2^(2^3))+(3^(2^2)). One can use explicit disambiguating parentheses.
gp > 2^2^3+3^2^2
%1 = 337
gp > 100!/(25!*75!)
%2 = 242519269720337121015504
gp >
gp > binomial(100,25)
%3 = 242519269720337121015504
gp >
One can also define functions at the GP/PARI prompt. For example, one
may choose to redefine the binomial() function as follows.
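A straightforward definition in terms of factorials fits here (the name choose is an
illustrative guess; GP computes factorials with the postfix ! operator):

choose(n,r) = n!/(r!*(n-r)!)

Called as choose(100,25), it reproduces the value of binomial(100,25) shown above.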
Here is an alternative implementation of the binomial() function, based
on the formula binomial(n, r) = n(n−1) · · · (n−r+1)/r!. It employs sophisticated programming
styles (like for loops). The interpreter of GP/PARI reads instructions from the
user line by line. If one instruction is too big to fit in a single line, one may
let the instruction span over multiple lines. In that case, one has to end each
line (except the last) by the special character \.
gp > choose2(n,r) = \
num=1; den=1; \
for(k=1, r, \
num*=n; den*=r; \
n=n-1; r=r-1 \
); \
num/den
gp > choose2(100,25)
%5 = 242519269720337121015504
gp >
All variables in GP/PARI are global by default. In the function choose2(), the
variables num and den accumulate the numerator and the denominator. When
the for loop terminates, num stores n(n − 1) · · · (n − r + 1), and den stores r!.
These values can be printed subsequently. If a second call of choose2() is made
with different arguments, the values stored in num and den are overwritten.
gp > num
%6 = 3761767332187389431968739190317715670695936000000
gp > den
%7 = 15511210043330985984000000
gp > choose2(55,34)
%8 = 841728816603675
gp > num
%9 = 248505954558196590544596278440992435848871936000000000
gp > den
%10 = 295232799039604140847618609643520000000
gp > 34!
%11 = 295232799039604140847618609643520000000
gp >
%12 = 242219
gp > x
%13 = x
gp > y
%14 = y
gp > z
%15 = z
gp > u
%16 = u
gp > v
%17 = v
gp > w
%18 = 3
gp >
gp > #
timer = 1 (on)
gp > \
searchPair(L) = \
for (a=1, L, \
for (b=a+1, L, \
x=(a^2+b^2)/(a*b-1); \
if (x == floor(x), \
print(" a = ", a, ", b = ", b, ", x = ", x, ".") \
) \
) \
)
gp > searchPair(10)
a = 1, b = 2, x = 5.
a = 1, b = 3, x = 5.
a = 2, b = 9, x = 5.
time = 1 ms.
gp > searchPair(100)
a = 1, b = 2, x = 5.
a = 1, b = 3, x = 5.
a = 2, b = 9, x = 5.
a = 3, b = 14, x = 5.
a = 9, b = 43, x = 5.
a = 14, b = 67, x = 5.
time = 34 ms.
gp > searchPair(1000)
a = 1, b = 2, x = 5.
a = 1, b = 3, x = 5.
a = 2, b = 9, x = 5.
a = 3, b = 14, x = 5.
a = 9, b = 43, x = 5.
a = 14, b = 67, x = 5.
a = 43, b = 206, x = 5.
a = 67, b = 321, x = 5.
a = 206, b = 987, x = 5.
time = 3,423 ms.
gp >
In the above illustration, I turned the timer on by using the special di-
rective #. Subsequently, GP/PARI displays the time taken for executing each
instruction. The timer can be turned off by typing the directive # once again.
GP/PARI provides text-based plotting facilities also.
gp > plot(X=0,2*Pi,sin(X))
0.9996892 |’’’’’’’’’’’_x""""x_’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’|
| _x "x |
| x "_ |
| " x |
| _" x |
| _ x |
| _ " |
| _ " |
| _ " |
|_ x |
| x |
"‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘x‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘"
| x "|
| _ " |
| _ " |
| _ " |
| x " |
| x _" |
| x _ |
| "_ x |
| x_ x" |
-0.999689 |..........................................."x____x"...........|
0 6.283185
gp>
This book is not meant to be a tutorial on GP/PARI. One can read the
manual supplied in the GP/PARI distribution. One can also use the extensive
on-line help facility bundled with the calculator. Entering ? at the GP/PARI
prompt yields an overview of help topics. One may follow these instructions
in order to obtain more detailed help. Some examples are given below.
gp > ?
Help topics:
0: list of user-defined identifiers (variable, alias, function)
1: Standard monadic or dyadic OPERATORS
2: CONVERSIONS and similar elementary functions
3: TRANSCENDENTAL functions
4: NUMBER THEORETICAL functions
5: Functions related to ELLIPTIC CURVES
6: Functions related to general NUMBER FIELDS
7: POLYNOMIALS and power series
8: Vectors, matrices, LINEAR ALGEBRA and sets
9: SUMS, products, integrals and similar functions
10: GRAPHIC functions
11: PROGRAMMING under GP
12: The PARI community
?? keyword (long help text about "keyword" from the user’s manual)
??? keyword (a propos: list of related functions).
gp > ?4
gp > ?znorder
znorder(x): order of the integermod x in (Z/nZ)*.
gp > znorder(Mod(19,101))
%19 = 25
gp > ?11
gp > ?while
while(a,seq): while a is nonzero evaluate the expression sequence seq.
Otherwise 0.
gp > MAX(a,b,c) = \
if (a>b, \
if(a>c, return(a), return(c)), \
if(b>c, return(b), return(c)) \
)
gp > MAX(3,7,5)
%21 = 7
gp > Fib(n) = \
if(n==0, return(0)); \
if(n==1, return(1)); \
return(Fib(n-1)+Fib(n-2))
? Fib(10)
%22 = 55
gp > ?printp
printp(a): outputs a (in beautified format) ending with newline.
gp > ?printtex
printtex(a): outputs a in TeX format.
gp > print(x^13+2*x^5-5*x+4)
x^13 + 2*x^5 - 5*x + 4
gp > printp(x^13+2*x^5-5*x+4)
(x^13 + 2 x^5 - 5 x + 4)
gp > printtex(x^13+2*x^5-5*x+4)
x^{13}
+ 2 x^5
- 5 x
+ 4
gp >
gp > \q
Good bye!
bash$
I will not explain the syntax of GP/PARI further in this book, but will use
the calculator for demonstrating arithmetic (and algebraic) calculations.
1.2 GCD
Divisibility is an important property of integers. Let a, b ∈ Z. We say
that a divides b and denote this as a|b if there exists an integer c for which
b = ca. For example, 31|1023 (since 1023 = 33 × 31), −31|1023 (since 1023 =
(−33) × (−31)), every integer a (including 0) divides 0 (since 0 = 0 × a). Let
a|b with a 6= 0. The (unique) integer c with b = ca is called the cofactor of a
in b. If a|b with both a, b non-zero, then |a| ≤ |b|. By the notation a ∤ b, we
mean that a does not divide b.
Definition 1.12 Let a, b ∈ Z be not both zero. The largest positive integer
d that divides both a and b is called the greatest common divisor or the gcd
of a and b. It is denoted as gcd(a, b). Clearly, gcd(a, b) = gcd(b, a). For a 6= 0,
we have gcd(a, 0) = |a|. The value gcd(0, 0) is left undefined. Two integers a, b
are called coprime or relatively prime if gcd(a, b) = 1. ⊳
mous for his contributions to geometry. His book Elements influences various branches of
mathematics even today.
Theorem 1.17 [Bézout relation] For a, b ∈ Z, not both zero, there exist
integers u, v satisfying gcd(a, b) = ua + vb. ⊳
Example 1.18 The following table illustrates the extended gcd computation
for 252, 91 (Also see Example 1.16).
i qi ri ui vi u i a + vi b
Initialization
0 − 252 1 0 252
1 − 91 0 1 91
Iterations
2 2 70 1 −2 70
3 1 21 −1 3 21
4 3 7 4 −11 7
5 3 0 −13 36 0
vi = (ri −ui a)/b from the values ri , ui , a, b only. Furthermore, the computation
of ui = ui−2 − qi ui−1 does not require the values from the v sequence.
Algorithm 1.2 incorporates all these intricate details. Each iteration of the
gcd loop performs only a constant number of integer operations. Moreover, like
the basic gcd calculation (Algorithm 1.1), the loop is executed O(lg(min(a, b)))
times. To sum up, the extended gcd algorithm is slower than the basic gcd
algorithm by only a small constant factor.
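A sketch of the extended Euclidean computation in GP follows (xgcd is an
illustrative name, and the inputs are assumed positive). Only the u sequence is
carried along; v is recovered at the end as (d − ua)/b, exactly as remarked above.

xgcd(a, b) =
{
  my(r0 = a, r1 = b, u0 = 1, u1 = 0, q, t);
  while (r1 != 0,
    q = r0 \ r1;                        \\ quotient of the Euclidean division
    t = r0 - q*r1; r0 = r1; r1 = t;     \\ remainder sequence
    t = u0 - q*u1; u0 = u1; u1 = t);    \\ u sequence: u_i = u_(i-2) - q_i u_(i-1)
  return([u0, (r0 - u0*a)/b, r0]);      \\ [u, v, gcd] with u*a + v*b = gcd
}
\\ xgcd(252, 91) should return [4, -11, 7], in agreement with Example 1.18.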
GP/PARI provides the built-in function gcd for computing the gcd of two
integers. An extended gcd can be computed by the built-in function bezout
which returns the multipliers and the gcd in the form of a 3-dimensional vector.
gp > gcd(2^77+1,2^91+1)
%1 = 129
gp > bezout(2^77+1,2^91+1)
%2 = [-151124951386849816870911, 9223935021170032768, 129]
gp > (-151124951386849816870911) * (2^77+1) + 9223935021170032768 * (2^91+1)
%3 = 129
tegers. Here, I briefly describe the binary gcd algorithm9 (Algorithm 1.3)
which performs better than the Euclidean gcd algorithm. This improvement
in performance is achieved by judiciously replacing Euclidean division by con-
siderably faster subtraction and bit-shift operations. Although the number of
iterations is typically larger in the binary gcd loop than in the Euclidean gcd
loop, the reduction of running time per iteration achieved by the binary gcd
algorithm usually leads to a more efficient algorithm for gcd computation.
Let gcd(a, b) be computed. We first write a = 2^s·a′ and b = 2^t·b′ with
a′, b′ odd. Since gcd(a, b) = 2^min(s,t)·gcd(a′, b′), we may assume that we are
going to compute the gcd of two odd integers, that is, a, b themselves are
odd. First assume that a > b. We have gcd(a, b) = gcd(a − b, b). But a − b is
even and so can be written as a − b = 2^r·α. Since b is odd, it turns out that
gcd(a, b) = gcd(α, b). Since α = (a − b)/2^r ≤ (a − b)/2 ≤ a/2, replacing the
computation of gcd(a, b) by gcd(α, b) implies reduction of the bit-size of the
first operand by at least one. The case a < b can be symmetrically treated.
Finally, if a = b, then gcd(a, b) = a.
9 Josef Stein, Computational problems associated with Racah algebra, Journal of Com-
putational Physics, 1(3), 397–405, 1967. This algorithm seems to have been known in ancient
China.
size, replacing the first iteration by a Euclidean division often improves the
performance of the algorithm considerably.
Like the extended Euclidean gcd algorithm, one can formulate the extended
binary gcd algorithm. The details are left to the reader as Exercise 1.13.
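A sketch of the binary gcd just described (bingcd is an illustrative name; the
book's Algorithm 1.3 may organize the steps differently, and the inputs are assumed
positive):

bingcd(a, b) =
{
  my(g = 1);
  while (a % 2 == 0 && b % 2 == 0, a = a \ 2; b = b \ 2; g = 2*g);  \\ pull out 2^min(s,t)
  while (a % 2 == 0, a = a \ 2);        \\ make a odd
  while (b % 2 == 0, b = b \ 2);        \\ make b odd
  while (a != b,
    if (a > b,
      a = a - b; while (a % 2 == 0, a = a \ 2),   \\ a - b is even: strip its powers of 2
      b = b - a; while (b % 2 == 0, b = b \ 2)));
  return(g * a);
}
\\ bingcd(2^77+1, 2^91+1) should return 129, agreeing with the gcd() call shown earlier.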
Sorenson10 extends the concept of binary gcd to k-ary gcd for k > 2.
Another gcd algorithm tailored to multiple-precision integers is the Lehmer
gcd algorithm.11
10 Jonathan Sorenson, Two fast GCD algorithms, Journal of Algorithms, 16(1), 110–144,
1994.
11 Jonathan Sorenson, An analysis of Lehmer’s Euclidean gcd algorithm, ISSAC, 254–258,
1995.
12 The concept of congruences was formalized by the Swiss mathematician Leonhard Euler
(1707–1783) renowned for his contributions to several branches of mathematics. Many basic
notations we use nowadays (including congruences and functions) were introduced by Euler.
All the parts of the proposition can be easily verified using the definition of
congruence. Part (g) in the proposition indicates that one should be careful
while canceling a common factor from the two sides of a congruence relation.
Such a cancellation should be accompanied by dividing the modulus by the
gcd of the modulus with the factor being canceled.
Definition 1.22 Let m ∈ N. A set of m integers a0 , a1 , a2 , . . . , am−1 is said
to constitute a complete residue system modulo m if every integer a ∈ Z is
congruent modulo m to one and only one of the integers ai for 0 ≤ i ≤ m − 1.
Evidently, no two distinct integers ai , aj in a complete residue system can be
congruent to one another. ⊳
Unless otherwise mentioned, I will let Zm stand for the standard residue
system {0, 1, 2, . . . , m − 1} rather than for any arbitrary residue system. ¤
gp > m = 17
%1 = 17
gp > a = Mod(5,m)
%2 = Mod(5, 17)
gp > b = Mod(25,m)
%3 = Mod(8, 17)
gp > a + b
%4 = Mod(13, 17)
gp > a - b
%5 = Mod(14, 17)
gp > a * b
%6 = Mod(6, 17)
gp > a / b
%7 = Mod(7, 17)
gp > 7 * a
%8 = Mod(1, 17)
gp > a^7
%9 = Mod(10, 17)
Example 1.26 Let m = 15. The element 7 is invertible modulo 15, since
13 × 7 ≡ 1 (mod 15). On the other hand, the element 6 of Z15 is not invertible.
I prove this fact by contradiction, that is, I assume that u is an inverse of 6
modulo 15. This means that 6u ≡ 1 (mod 15), that is, 15|(6u − 1), that is,
6u − 1 = 15k for some k ∈ Z, that is, 3(2u − 5k) = 1. This is impossible, since
the left side is a multiple of 3, whereas the right side is not. ¤
The proof of Theorem 1.27 indicates that in order to compute the inverse
of a ∈ Zm , one can compute the extended gcd d = ua + vm. If d > 1, then a
is not invertible modulo m. If d = 1, (the integer in Zm congruent modulo m
to) u is the (unique) inverse of a modulo m.
Example 1.28 Let us compute the inverse of 11 modulo 15. Extended gcd
calculations give gcd(11, 15) = 1 = (−4) × 11 + 3 × 15, that is, 11−1 ≡ −4 ≡
11 (mod 15), that is, 11 is its own inverse modulo 15.
On the other hand, if we try to invert 12 modulo 15, we obtain the Bézout
relation gcd(12, 15) = 3 = (−1) × 12 + 1 × 15. Since 12 and 15 are not coprime,
12 does not have a multiplicative inverse modulo 15. ¤
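The procedure can be sketched in GP with the built-in bezout() (modinv is an
illustrative name); of course, GP can also invert directly via 1/Mod(a,m).

modinv(a, m) =
{
  my(e = bezout(a, m));                 \\ e = [u, v, d] with u*a + v*m = d
  if (e[3] != 1, error("element not invertible"));
  return(e[1] % m);                     \\ the inverse of a modulo m
}
\\ modinv(11, 15) should return 11, as in Example 1.28, whereas modinv(12, 15)
\\ should raise an error.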
where the last product is over the set of all (distinct) prime divisors of m.
Proof Consider the standard residue system modulo m. An integer (between
0 and m − 1) is coprime to m if and only if it is divisible by neither of the
primes p1 , . . . , pr . We can use a combinatorial argument based on the principle
of inclusion and exclusion in order to derive the given formula for φ(m). ⊳
gp > eulerphi(98)
%1 = 42
gp > eulerphi(99)
%2 = 60
gp > eulerphi(100)
%3 = 40
gp > eulerphi(101)
%4 = 100
gp > factor(2^101-1)
%5 =
[7432339208719 1]
[341117531003194129 1]
gp > (7432339208719-1)*(341117531003194129-1)
%6 = 2535301200456117678030064007904
gp > eulerphi(2^101-1)
%7 = 2535301200456117678030064007904

… significant contributions in number theory. Fermat is famous for his last theorem, which
states that the equation x^n + y^n = z^n does not have integer solutions with xyz ≠ 0 for all
integers n ≥ 3. See Footnote 1 in Chapter 4 for a historical sketch on Fermat's last theorem.
Example 1.35 Let us compute 7^13 (mod 31). Here m = 31, a = 7, and
e = 13 = (1101)2. The following table summarizes the steps of the square-
and-multiply algorithm on these parameters.

i  ei  xi = (e3 . . . ei)2   t (after sqr)       t (after mul)              bi
4  −   0                    −                    −                          1
3  1   (1)2 = 1             1^2 ≡ 1 (mod 31)     1 × 7 ≡ 7 (mod 31)         7
2  1   (11)2 = 3            7^2 ≡ 18 (mod 31)    18 × 7 ≡ 2 (mod 31)        2
1  0   (110)2 = 6           2^2 ≡ 4 (mod 31)     (multiplication skipped)   4
0  1   (1101)2 = 13         4^2 ≡ 16 (mod 31)    16 × 7 ≡ 19 (mod 31)       19

Thus, 7^13 ≡ 19 (mod 31). ¤
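The left-to-right square-and-multiply loop of the table can be sketched as follows
(powmod is an illustrative name); GP's Mod(a,m)^e does the same job internally.

powmod(a, e, m) =         \\ assumes e >= 1
{
  my(t = 1, bits = binary(e));          \\ binary(e) lists the bits of e, MSB first
  for (i = 1, #bits,
    t = (t * t) % m;                    \\ square in every iteration
    if (bits[i] == 1, t = (t * a) % m)); \\ multiply only when the current bit is 1
  return(t);
}
\\ powmod(7, 13, 31) should return 19, as in Example 1.35.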
14 Paul D. Barrett, Implementing the Rivest Shamir and Adleman public key encryption
algorithm on a standard digital signal processor, CRYPTO’86, 311–332, 1987.
This R happens to be larger than m (but smaller than 2m), so the correct
value of r = x rem m is r = R − m = 3917934 = (59, 200, 110)B . ¤
ax ≡ b (mod m),
Example 1.38 Take the congruence 21x ≡ 9 (mod 15). Here, a = 21, b = 9,
and m = 15. Since d = gcd(a, m) = 3 divides b, the congruence is solvable.
Canceling 3 gives 7x ≡ 3 (mod 5), that is, x ≡ 7−1 × 3 ≡ 3 × 3 ≡ 4 (mod 5).
The solutions modulo 15 are 4, 9, 14.
The congruence 21x ≡ 8 (mod 15) is not solvable, since 3 = gcd(21, 15)
does not divide 8. ¤
oldest reference to the theorem appears in a third-century book by the Chinese mathemati-
cian Sun Tzu. In the sixth and seventh centuries, Indian mathematicians Aryabhatta and
Brahmagupta studied the theorem more rigorously.
GP/PARI supports the call chinese() for CRT-based combination. The func-
tion takes only two modular elements as arguments. The function combines the
two elements, and returns an element modulo the product of the input mod-
uli. In order to run CRT on more than two moduli, we need to make nested
calls of chinese(). The function chinese() can handle non-coprime moduli
also. An integer modulo the lcm of the input moduli is returned in this case.
However, the input congruences may now fail to have a simultaneous solution.
For example, there cannot exist an integer x satisfying both x ≡ 5 (mod 12)
and x ≡ 6 (mod 18), since such an integer is of the form 18k + 6 (a multiple
of 6) and at the same time of the form 12k + 5 (a non-multiple of 6).
gp > chinese(Mod(5,7),Mod(3,11))
%1 = Mod(47, 77)
gp > chinese(Mod(5,7),Mod(-3,11))
%2 = Mod(19, 77)
gp > chinese(chinese(Mod(5,7),Mod(3,11)),Mod(2,13))
%3 = Mod(509, 1001)
gp > chinese(Mod(47,77),Mod(2,13))
%4 = Mod(509, 1001)
gp > chinese(Mod(5,12),Mod(11,18))
%5 = Mod(29, 36)
gp > chinese(Mod(5,12),Mod(6,18))
*** incompatible arguments in chinois.
The incremental way of combining congruences for more than two moduli,
as illustrated above for GP/PARI, may be a bit faster (practically, but not in
terms of the order notation) than Algorithm 1.6 (see Exercise 1.44).
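The incremental combination can be sketched as follows for coprime moduli (crt2
and crt are illustrative names); the two-modulus step is just the Bézout relation
e1·m1 + e2·m2 = 1.

crt2(r1, m1, r2, m2) =    \\ combine x = r1 (mod m1) and x = r2 (mod m2), gcd(m1,m2) = 1
{
  my(e = bezout(m1, m2));               \\ e[1]*m1 + e[2]*m2 = e[3]
  if (e[3] != 1, error("moduli not coprime"));
  return((r1*e[2]*m2 + r2*e[1]*m1) % (m1*m2));
}
crt(r, m) =               \\ combine several congruences incrementally
{
  my(x = r[1], M = m[1]);
  for (i = 2, #r, x = crt2(x, M, r[i], m[i]); M *= m[i]);
  return(x);
}
\\ crt([5, 3, 2], [7, 11, 13]) should return 509, as in the transcript above.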
contribution is the introduction of p-adic numbers that find many uses in analysis, algebra
and number theory. If the Hensel lifting procedure is carried out for all e ∈ N, in the limit
we get the p-adic solutions of f (x) = 0.
    f′(ξ) k ≡ −f(ξ)/p^ǫ (mod p),

which has 0, 1, or p solutions for k, depending on the values of f′(ξ) and f(ξ)/p^ǫ.
Each lifting step involves solving a linear congruence only. The problem of
solving a polynomial congruence then reduces to solving the congruence mod-
ulo each prime divisor of the modulus. We will study root-finding algorithms
for polynomials later in a more general setting.
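One lifting step can be sketched as follows (hensellift is an illustrative name):
given a root xi of f modulo p^eps, it returns all roots of f modulo p^(eps+1) lying
above xi, by solving the linear congruence displayed above.

hensellift(f, xi, p, eps) =
{
  my(X = variable(f),
     c = subst(f, X, xi) / p^eps,       \\ f(xi)/p^eps, an integer since xi is a root mod p^eps
     d = subst(deriv(f), X, xi) % p,    \\ f'(xi) mod p
     lifts = []);
  for (k = 0, p - 1,
    if ((d*k + c) % p == 0, lifts = concat(lifts, [xi + k*p^eps])));
  return(lifts);
}
\\ With f = 2*x^3 - 7*x^2 + 189 as in the example that follows,
\\ hensellift(f, 0, 3, 1) should return [0, 3, 6] and hensellift(f, 2, 3, 1)
\\ should return [8].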
We have f(x) = 2x^3 − 7x^2 + 189, and so f′(x) = 6x^2 − 14x. The modulus
admits the prime factorization m = 3^3 × 5^2. We proceed step by step in order
to obtain all the roots.

Solutions of f(x) ≡ 0 (mod 3)
We have f(0) ≡ 189 ≡ 0 (mod 3), f(1) ≡ 2 − 7 + 189 ≡ 1 (mod 3) and
f(2) ≡ 16 − 28 + 189 ≡ 0 (mod 3). Thus, the roots modulo 3 are 0, 2.

Solutions of f(x) ≡ 0 (mod 3^2)
Let us first lift the root x ≡ 0 (mod 3). Since f(0)/3 = 189/3 = 63 and f′(0) =
0, the congruence f′(0)k ≡ −f(0)/3 (mod 3) is satisfied by k = 0, 1, 2 (mod 3),
and the lifted roots are 0, 3, 6 modulo 9.
For lifting the root x ≡ 2 (mod 3), we calculate f(2)/3 = 177/3 = 59
and f′(2) = 24 − 28 = −4. So the congruence f′(2)k ≡ −f(2)/3 (mod 3), that
is, −4k ≡ −59 (mod 3), has a unique solution k ≡ 2 (mod 3). So there is a
unique lifted root 2 + 2 × 3 = 8.
Therefore, all the roots of f(x) ≡ 0 (mod 3^2) are 0, 3, 6, 8.

Solutions of f(x) ≡ 0 (mod 3^3)
Let us first lift the root x ≡ 0 (mod 3^2). We have f(0)/3^2 = 189/9 = 21 and
f′(0) = 0. The congruence f′(0)k ≡ −f(0)/3^2 (mod 3) is satisfied by k = 0, 1, 2,
that is, there are three lifted roots 0, 9, 18.
Next, we lift the root x ≡ 3 (mod 3^2). We have f(3)/3^2 = (2 × 27 − 7 ×
9 + 189)/9 = 6 − 7 + 21 = 20, whereas f′(3) = 6 × 9 − 14 × 3 = 12. Thus,
the congruence f′(3)k ≡ −f(3)/3^2 (mod 3) has no solutions, that is, the root
3 (mod 3^2) does not lift to a root modulo 3^3.
For the root x ≡ 6 (mod 3^2), we have f(6)/3^2 = (2 × 216 − 7 × 36 + 189)/9 =
48 − 28 + 21 = 41 and f′(6) = 216 − 84 = 132, so there is no solution for k in
the congruence f′(6)k ≡ −f(6)/3^2 (mod 3), that is, the root 6 does not lift to a
root modulo 3^3.
α ≡ (b^2 − 4ac)(4a^2)^(−1) (mod p). This implies that it suffices to concentrate
only on quadratic congruences of the special form

    x^2 ≡ a (mod p).

Gauss was the first mathematician to study quadratic congruences formally.17
one of the most gifted mathematicians of all ages. Gauss is often referred to as the prince
of mathematics and also as the last complete mathematician (in the sense that he was the
last mathematician who was conversant with all branches of contemporary mathematics).
In his famous book Disquisitiones Arithmeticae (written in 1798 and published in 1801),
Gauss introduced the terms quadratic residues and non-residues.
18 Adrien-Marie Legendre (1752–1833) was a French mathematician famous for pioneering
Theorem 1.48 [The law of quadratic reciprocity] Let p, q be odd primes.
Then, (p/q) = (−1)^((p−1)(q−1)/4) (q/p). ⊳
Example 1.49 Using the quadratic reciprocity law, we compute (51/541) as

    (51/541) = (3/541)(17/541)
             = (−1)^((3−1)(541−1)/4) (541/3) · (−1)^((17−1)(541−1)/4) (541/17)
             = (541/3)(541/17) = (1/3)(14/17) = (14/17) = (2/17)(7/17)
             = (−1)^((17^2−1)/8) (7/17) = (7/17) = (−1)^((7−1)(17−1)/4) (17/7) = (17/7)
             = (3/7) = (−1)^((3−1)(7−1)/4) (7/3) = −(7/3) = −(1/3) = −1.

Thus, 51 is a quadratic non-residue modulo 541. ¤

19 Conjectured by Legendre, the quadratic reciprocity law was first proved by Gauss.
Indeed, Gauss himself published eight proofs of this law. At present, hundreds of proofs of
this law are available in the mathematics literature.
Proposition 1.51 For odd positive integers b, b′, and for any integers a, a′,
we have:
(a) (aa′/b) = (a/b)(a′/b).
(b) (a/bb′) = (a/b)(a/b′).
(c) If a ≡ a′ (mod b), then (a/b) = (a′/b).
(d) (a/b) = ((a rem b)/b).
(e) (−1/b) = (−1)^((b−1)/2).
(f) (2/b) = (−1)^((b^2−1)/8).
(g) [Law of quadratic reciprocity] (b/b′) = (−1)^((b−1)(b′−1)/4) (b′/b). ⊳
Example 1.52 Let us compute (51/541) without making any factoring at-
tempts. At some steps, we may have to extract powers of 2, but that is doable
efficiently by bit operations only.

    (51/541) = (−1)^((51−1)(541−1)/4) (541/51) = (31/51)
             = (−1)^((31−1)(51−1)/4) (51/31) = −(51/31) = −(20/31)
             = −(2/31)^2 (5/31) = −(5/31) = −(−1)^((5−1)(31−1)/4) (31/5)
             = −(31/5) = −(1/5) = −1. ¤
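An evaluation routine based only on the rules of Proposition 1.51 can be sketched
as follows (jacobi is an illustrative name; b is assumed odd and positive). GP's
kronecker(), used below, does the same job.

jacobi(a, b) =
{
  my(s = 1, t);
  a = a % b;
  while (a != 0,
    while (a % 2 == 0,                  \\ extract factors of 2 via rule (f)
      a = a \ 2;
      if (b % 8 == 3 || b % 8 == 5, s = -s));
    t = a; a = b; b = t;                \\ flip the symbol by reciprocity, rule (g)
    if (a % 4 == 3 && b % 4 == 3, s = -s);
    a = a % b);                         \\ reduce, rule (d)
  return(if (b == 1, s, 0));
}
\\ jacobi(51, 541) should return -1 and jacobi(41, 541) should return 1, in
\\ agreement with the kronecker() calls below.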
The GP/PARI interpreter computes the Jacobi symbol (a/b) when the call
kronecker(a,b) is made.21 Here are some examples.
gp > kronecker(41,541)
%1 = 1
gp > kronecker(51,541)
%2 = -1
gp > kronecker(2,15)
%3 = 1
gp > kronecker(2,45)
%4 = -1
gp > kronecker(21,45)
%5 = 0
For an odd prime p, the congruence x^2 ≡ a (mod p) has exactly 1 + (a/p)
solutions. We first compute the Legendre symbol (a/p), and if the congruence is
found to be solvable, the next task is to compute the roots of the congruence.
We postpone the study of root finding until Chapter 3. Also see Exercises 1.58
and 1.59.
21 Kronecker extended the Jacobi symbol to all non-zero integers b, including even and
negative integers (see Exercise 1.65). Leopold Kronecker (1823–1891) was a German mathe-
matician who made significant contributions to number theory and algebra. A famous quote
from him is: Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.
(The dear God has created the whole numbers, everything else is man’s work.)
Definition 1.53 Let a ∈ Z∗m . The smallest positive integer e for which ae ≡
1 (mod m) is called the multiplicative order (or simply the order) of a modulo
m and is denoted by ordm a. If e = ordm a, we also often say that a belongs
to the exponent e modulo m. ⊳
gp > znorder(Mod(1,35))
%1 = 1
gp > znorder(Mod(2,35))
%2 = 12
gp > znorder(Mod(4,35))
%3 = 6
gp > znorder(Mod(6,35))
%4 = 2
gp > znorder(Mod(7,35))
*** not an element of (Z/nZ)* in order.
Proof Let e|h, that is, h = ke for some k ∈ Z. But then a^h ≡ (a^e)^k ≡ 1^k ≡
1 (mod m). Conversely, let a^h ≡ 1 (mod m). Euclidean division of h by e yields
h = ke + r with 0 ≤ r < e. Since a^e ≡ 1 (mod m), we have a^r ≡ 1 (mod m).
By definition, e is the smallest positive integer with a^e ≡ 1 (mod m), that is,
we must have r = 0, that is, e|h. The fact e|φ(m) follows directly from Euler's
theorem: a^φ(m) ≡ 1 (mod m). ⊳
Primitive roots do not exist for all moduli m. We prove an important fact
about primes in this context.
Primes are not the only moduli to have primitive roots. The following
theorem characterizes all moduli that have primitive roots.
Theorem 1.58 The only positive integers > 1 that have primitive roots are
2, 4, p^e, and 2p^e, where p is any odd prime, and e is any positive integer. ⊳
Disquisitiones Arithmeticae. In particular, Gauss was the first to prove Theorem 1.58.
(3) The modulus m = 16 does not have a primitive root, that is, an element
of order φ(m) = 8. One can check that ord16 1 = 1, ord16 7 = ord16 9 =
ord16 15 = 2, and ord16 3 = ord16 5 = ord16 11 = ord16 13 = 4. ¤
gp > znprimroot(47)
%1 = Mod(5, 47)
gp > znprimroot(49)
%2 = Mod(3, 49)
gp > znprimroot(50)
%3 = Mod(27, 50)
gp > znprimroot(51)
*** primitive root does not exist in gener
The converse question is: Does any irrational ξ expand to an infinite simple
continued fraction? The answer is: yes. We inductively generate a0 , a1 , a2 , . . .
with ξ = ha0 , a1 , a2 , . . .i as follows. We start by setting ξ0 = ξ and a0 = ⌊ξ0 ⌋.
When ξ0 , . . . , ξn and a0 , . . . , an are known for some n > 0, we calculate ξn+1 =
1/(ξn − an ) and an+1 = ⌊ξn+1 ⌋. Since ξ is irrational, it follows that each ξn is
also irrational. In addition, the integers a1 , a2 , a3 , . . . are all positive. Only a0
may be positive, negative or zero.
Example 1.65 (1) Let us first obtain the infinite simple continued fraction
expansion of √2.

    ξ0 = √2 = 1.4142135623 . . . ,                                a0 = ⌊ξ0⌋ = 1,
    ξ1 = 1/(ξ0 − a0) = 1/(√2 − 1) = 1 + √2 = 2.4142135623 . . . , a1 = ⌊ξ1⌋ = 2,
    ξ2 = 1/(ξ1 − a1) = 1/(√2 − 1) = 1 + √2 = 2.4142135623 . . . , a2 = ⌊ξ2⌋ = 2,

and so on. Therefore, √2 = ⟨1, 2, 2, 2, . . .⟩. The first few convergents to √2 are
r0 = ⟨1⟩ = 1, r1 = ⟨1, 2⟩ = 3/2 = 1.5, r2 = ⟨1, 2, 2⟩ = 7/5 = 1.4, r3 = ⟨1, 2, 2, 2⟩ =
17/12 = 1.4166666666 . . . , r4 = ⟨1, 2, 2, 2, 2⟩ = 41/29 = 1.4137931034 . . . . It is ap-
parent that the convergents r0, r1, r2, r3, r4, . . . go successively closer to √2.
(2) Let us now develop the infinite simple continued fraction expansion of π = 3.1415926535 . . . .
ξ_0 = π = 3.1415926535 . . . , a_0 = ⌊ξ_0⌋ = 3,
ξ_1 = 1/(ξ_0 − a_0) = 7.0625133059 . . . , a_1 = ⌊ξ_1⌋ = 7,
ξ_2 = 1/(ξ_1 − a_1) = 15.996594406 . . . , a_2 = ⌊ξ_2⌋ = 15,
ξ_3 = 1/(ξ_2 − a_2) = 1.0034172310 . . . , a_3 = ⌊ξ_3⌋ = 1,
and so on. Thus, the first few convergents to π are r_0 = ⟨3⟩ = 3, r_1 = ⟨3, 7⟩ = 22/7 = 3.1428571428 . . . , r_2 = ⟨3, 7, 15⟩ = 333/106 = 3.1415094339 . . . , r_3 = ⟨3, 7, 15, 1⟩ = 355/113 = 3.1415929203 . . . . Here too, the convergents r_0, r_1, r_2, r_3, . . . go successively closer to π. This is indeed true, in general. ¤
The convergents h_n/k_n to the irrational number ξ are called best possible approximations of ξ in the sense that if a rational a/b is closer to ξ than h_n/k_n, the denominator b has to be larger than k_n. More precisely, we have:
Theorem 1.68  Let a ∈ Z and b ∈ N with |ξ − a/b| < |ξ − h_n/k_n| for some n ≥ 1. Then, b > k_n. ⊳
gp > contfrac(1001/101)
%1 = [9, 1, 10, 4, 2]
gp > contfrac(sqrt(11))
%2 = [3, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3]
gp > Pi
%3 = 3.141592653589793238462643383
gp > contfrac(Pi)
%4 = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2,
1, 1, 15, 3]
gp > contfrac(Pi,10)
%5 = [3, 7, 15, 1, 292, 1, 1, 1, 3]
gp > contfrac(Pi,100)
%6 = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2,
1, 1, 15, 3]
gp > contfrac(Pi)
%1 = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2,
1, 1, 15, 3]
gp > contfracpnqn(contfrac(Pi))
%2 =
[428224593349304 139755218526789]
[136308121570117 44485467702853]
gp > contfrac(Pi,3)
%3 = [3, 7, 15]
gp > contfracpnqn(contfrac(Pi,3))
%4 =
[333 22]
[106 7]
gp > contfrac(Pi,5)
%5 = [3, 7, 15, 1, 292]
gp > contfracpnqn(contfrac(Pi,5))
%6 =
[103993 355]
[33102 113]
gp > contfracpnqn(contfrac(1001/101))
%7 =
[1001 446]
[101 45]
Theorem 1.70  [Prime number theorem (PNT)]  π(x) approaches the quantity x/ln x as x → ∞. Here, the term “approaches” means that the limit lim_{x→∞} π(x)/(x/ln x) is equal to 1. ⊳
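A quick numerical illustration (not a proof) can be obtained from GP/PARI's primepi function:
gp > primepi(10^6)
%1 = 78498
gp > floor(10^6 / log(10^6))
%2 = 72382
Thus π(x) exceeds x/ln x by about 8 per cent at x = 10^6, and the ratio approaches 1 only very slowly as x grows.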
It has been proved (for example, by Vallée Poussin) that π(x) − Li(x) = O(x e^{−α√(ln x)}) for some constant α. However, the tighter bound indicated by the Riemann hypothesis stands unproven even in the twenty-first century.
A generalization of Theorem 1.69 was proved by Dirichlet.24
24 Johann Peter Gustav Lejeune Dirichlet (1805–1859) was a German mathematician who made fundamental contributions to number theory and analysis.
Exercises
1. Describe an algorithm to compare the absolute values of two multiple-precision
integers.
2. Describe an algorithm to compute the product of a multiple-precision integer
with a single-precision integer.
3. Squaring is a form of multiplication where both the operands are the same.
Describe how this fact can be exploited to speed up the schoolbook mul-
tiplication algorithm for multiple-precision integers. What about Karatsuba
multiplication?
4. Describe an efficient algorithm to compute the Euclidean division of a multi-
ple-precision integer by a non-zero single-precision integer.
5. Describe how multiple-precision division by an integer of the form B^l ± m (B is the base, and m is a small integer) can be efficiently implemented.
6. Explain how multiplication and division of multiple-precision integers by pow-
ers of 2 can be implemented efficiently using bit operations.
7. Describe the details of the Toom-4 multiplication method. Choose the evaluation points as k = ∞, 0, ±1, −2, ±1/2.
8. Toom’s multiplication can be adapted to work for unbalanced operands, that
is, when the sizes of the operands vary considerably. Suppose that the number
of digits of a is about two-thirds the number of digits of b. Write a as a
polynomial of degree two, and b as a polynomial of degree three. Describe how
you can compute the product ab in this case using a Toom-like algorithm.
9. Derive Equation (1.1).
10. Verify the following assertions. Here, a, b, c, x, y are arbitrary integers.
(a) a|a.
(b) If a|b and b|c, then a|c.
(c) If a|b and b|a, then a = ±b.
(d) If a|b and a|c, then a|(bx + cy).
(e) If a|(bc) and gcd(a, b) = 1, then a|c.
11. Let p be a prime. If p|(ab), show that p|a or p|b. More generally, show that if
p|(a1 a2 · · · an ), then p|ai for some i ∈ {1, 2, . . . , n}.
12. Suppose that gcd(r_0, r_1) is computed by the repeated Euclidean division algorithm. Suppose also that r_0 > r_1 > 0. Let r_{i+1} denote the remainder obtained by the i-th division (that is, in the i-th iteration of the Euclidean loop). So the computation proceeds as gcd(r_0, r_1) = gcd(r_1, r_2) = gcd(r_2, r_3) = · · · with r_0 > r_1 > r_2 > · · · > r_k > r_{k+1} = 0 for some k ≥ 1.
(a) If the computation of gcd(r_0, r_1) requires exactly k Euclidean divisions, show that r_0 ≥ F_{k+2} and r_1 ≥ F_{k+1}. Here, F_n is the n-th Fibonacci number: F_0 = 0, F_1 = 1, and F_n = F_{n−1} + F_{n−2} for n ≥ 2.
K_0( ) = 1,
K_1(x_1) = x_1,
K_n(x_1, x_2, . . . , x_n) = x_n K_{n−1}(x_1, x_2, . . . , x_{n−1}) + K_{n−2}(x_1, x_2, . . . , x_{n−2}),  n ≥ 2.
K_n(x_1, . . . , x_n) K_n(x_2, . . . , x_{n+1}) − K_{n+1}(x_1, . . . , x_{n+1}) K_{n−1}(x_2, . . . , x_n) = (−1)^n.
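For numerical experiments with continuants, the recurrence can be evaluated directly in GP/PARI; continuant below is our own helper name. The sample call reproduces the convergent 41/29 of √2 from Example 1.65, in accordance with Exercise 1.73.
gp > continuant(v) = \
     my(a = 1, b = if (#v, v[1], 1), c); \
     for (i = 2, #v, c = v[i]*b + a; a = b; b = c); \
     b
gp > continuant([1, 2, 2, 2, 2]) / continuant([2, 2, 2, 2])
%2 = 41/29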
16. Consider the extended Euclidean gcd algorithm described in Section 1.2.2.
Suppose that the algorithm terminates after computing rj = 0 (so that rj−1 is
the gcd of r0 = a and r1 = b). Assume that a > b and let d = gcd(a, b). Finally,
let q2 , q3 , . . . , qj be the quotients obtained during the Euclidean divisions.
(a) Show that
|u_1| < |u_2| ≤ |u_3| < |u_4| < |u_5| < · · · < |u_j|,  and
|v_0| < |v_1| ≤ |v_2| < |v_3| < |v_4| < · · · < |v_j|.
(a) Prove that Algorithm 1.8 terminates and correctly computes gcd(a, b).
(b) How can you efficiently implement Algorithm 1.8 using bit operations?
(c) Prove that the number of iterations of the while loop is O(lg a + lg b).
(d) Argue that Algorithm 1.8 can be so implemented to run in O(lg^2 a) time
(where a is the larger input operand).
23. Let a, b ∈ N with gcd(a, b) = 1. Assume that a ≠ 1 and b ≠ 1.
(a) Prove that any integer n > ab can be expressed as n = sa + tb with
integers s, t > 0.
(b) Devise a polynomial-time (in log n) algorithm to compute s, t of Part (a).
25 Jeffrey Shallit and Jonathan Sorenson, Analysis of a left-shift binary gcd algorithm,
39. Compute all the simultaneous solutions of the congruences: 5x ≡ 3 (mod 47), and 3x^2 ≡ 5 (mod 49).
40. Let p be a prime.
(a) Show that x^{p−1} − 1 ≡ (x − 1)(x − 2) · · · (x − (p − 1)) (mod p), where f(x) ≡ g(x) (mod p) means that the coefficient of x^i in the polynomial f(x) is congruent modulo p to the coefficient of x^i in g(x) for all i ∈ N_0.
(b) [Wilson’s theorem] Prove that (p − 1)! ≡ −1 (mod p).
(c) If m ∈ N is composite and > 4, prove that (m − 1)! ≡ 0 (mod m).
41. [Generalized Euler’s theorem] Let m ∈ N, and a any integer (not necessarily coprime to m). Prove that a^m ≡ a^{m−φ(m)} (mod m).
42. Let σ(n) denote the sum of positive integral divisors of n ∈ N. Let n = pq
with two distinct primes p, q. Devise a polynomial-time algorithm to compute
p, q from the knowledge of n and σ(n).
43. (a) Let n = p^2 q with p, q distinct odd primes, p ∤ (q − 1) and q ∤ (p − 1). Prove
that factoring n is polynomial-time equivalent to computing φ(n).
(b) Let n = p2 q with p, q odd primes satisfying q = 2p + 1. Argue that one
can factor n in polynomial time.
44. (a) Let m1 , m2 be coprime moduli, and let a1 , a2 ∈ Z. By the extended gcd
algorithm, one can compute integers u, v with um1 + vm2 = 1. Prove that x ≡
um1 a2 + vm2 a1 (mod m1 m2 ) is the simultaneous solution of the congruences
x ≡ ai (mod mi ) for i = 1, 2.
(b) Let m1 , m2 , . . . , mt be pairwise coprime moduli, and a1 , a2 , . . . , at ∈ Z.
Write an incremental procedure for the Chinese remainder theorem that starts
with the solution x ≡ a1 (mod m1 ) and then runs a loop, the i-th iteration of
which (for i = 2, 3, . . . , t in that order) computes the simultaneous solution of
x ≡ aj (mod mj ) for j = 1, 2, . . . , i.
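Any such incremental procedure can be checked against GP/PARI's built-in chinese function, which combines two congruences (and, applied repeatedly, more); the moduli below are arbitrary:
gp > chinese(Mod(2, 7), Mod(3, 11))
%1 = Mod(58, 77)
gp > chinese(%, Mod(1, 5))
%2 = Mod(366, 385)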
45. [Generalized Chinese remainder theorem] Let m1 , m2 , . . . , mt be t moduli
(not necessarily coprime to one another). Prove that the congruences x ≡
ai (mod mi ) for i = 1, 2, . . . , t are simultaneously solvable if and only if
gcd(m_i, m_j) | (a_i − a_j) for every pair (i, j) with i ≠ j. Show also that in this
case the solution is unique modulo lcm(m1 , m2 , . . . , mt ).
46. (a) Design an algorithm that, given moduli m1 , m2 and integers a1 , a2 with
gcd(m1 , m2 )|(a1 − a2 ), computes a simultaneous solution of the congruences
x ≡ ai (mod mi ) for i = 1, 2.
(b) Design an algorithm to implement the generalized CRT on t > 2 moduli.
47. [Theoretical foundation of the RSA cryptosystem] Let m = p1 p2 · · · pk be a
product of k > 2 distinct primes. Prove that the map Zm → Zm that takes
a to ae (mod m) is a bijection if and only if gcd(e, φ(m)) = 1. Describe the
inverse of this exponentiation map.
26 A proof for the correctness of the Cornacchia algorithm is not very easy and can be
found, for example, in the paper J. M. Basilla, On the solution of x^2 + dy^2 = m, Proc. Japan
Acad., 80, Series A, 40–41, 2004.
71. Expand the following irrational numbers as infinite simple continued fractions: √2 − 1, 1/√3 and √15.
72. Let h_n/k_n be the convergents to an irrational ξ, and let F_n, n ∈ N_0, denote the Fibonacci numbers.
(a) Show that k_n ≥ F_{n+1} for all n ∈ N_0.
(b) Deduce that k_n ≥ (1/√5) ((1 + √5)/2)^{n+1} for all n ∈ N_0. (Remark: This shows that the denominators in the convergents to an irrational number grow quite rapidly (at least exponentially) in n.)
(c) Does there exist an irrational ξ for which kn = Fn+1 for all n ∈ N0 ?
73. (a) Prove that the continued fraction ⟨a_0, a_1, . . . , a_n⟩ equals K_{n+1}(a_0, a_1, . . . , a_n) / K_n(a_1, a_2, . . . , a_n), where K_n is the n-th continuant polynomial (Exercise 1.15).
(b) Let hn /kn be the n-th convergent to an irrational number ξ with hn , kn
defined as in Section 1.8. Prove that hn = Kn+1 (a0 , a1 , . . . , an ) and kn =
Kn (a1 , a2 , . . . , an ) for all n > 0.
(c) Argue that gcd(hn , kn ) = 1, that is, the fraction hn /kn is in lowest terms.
74. A real number of the form (a + √b)/c with a, b, c ∈ Z, c ≠ 0, and b ≥ 2 not a perfect square, is called a quadratic irrational. An infinite simple continued fraction ⟨a_0, a_1, a_2, . . .⟩ is called periodic if there exist s ∈ N_0 and t ∈ N such that a_{n+t} = a_n for all n ≥ s. One can rewrite a periodic continued fraction as ⟨a_0, . . . , a_{s−1}, b_0, . . . , b_{t−1}⟩ with a bar placed over the block of terms b_0, . . . , b_{t−1} to indicate that this block is repeated ad infinitum. If s = 0, this continued fraction can be written as ⟨ b_0, . . . , b_{t−1} ⟩ (with the bar over the entire block) and is called purely periodic. Show that a periodic simple continued fraction represents a quadratic irrational.
(Hint: First consider the case of purely periodic continued fractions, and
then adapt to the general case.)
75. Evaluate the periodic continued fractions h1, 2, 3, 4i and h1, 2, 3, 4i.
76. Prove that there are infinitely many solutions in positive integers of both the equations x^2 − 2y^2 = 1 and x^2 − 2y^2 = −1. (Hint: Compute h_n^2 − 2k_n^2, where h_n/k_n is the n-th convergent to √2.)
77. (a) Compute the infinite simple continued fraction expansion of √3.
(b) For all k ≥ 1, write a_k + b_k√3 = (2 + √3)^k with a_k, b_k integers. Prove that for all n ≥ 0, the (2n + 1)-th convergent of √3 is r_{2n+1} = a_{n+1}/b_{n+1}.
(Remark: a_k, b_k for k ≥ 1 constitute all the non-zero solutions of the Pell equation a^2 − 3b^2 = 1. Proving this needs tools of algebraic number theory.)
78. (a) Compute the continued fraction expansion of √5.
(b) It is known that all the solutions of the Pell equation x^2 − 5y^2 = 1 with x, y > 0 are of the form x = h_n and y = k_n, where h_n/k_n is a convergent to √5. Find the solution of the Pell equation x^2 − 5y^2 = 1 with the smallest possible y > 0.
(c) Let (a, b) denote the smallest solution obtained in Part (b). Define the
sequence of pairs (xn , yn ) of positive integers recursively as follows.
Programming Exercises
Use the GP/PARI calculator to solve the following problems.
81. For n ∈ N denote by S7 (n) the sum of the digits of n expanded in base 7.
We investigate those primes p for which S7 (p) is composite. It turns out that
for small values of p, most of the values S7 (p) are also prime. Write a GP/PARI
program that determines all primes 6 106 , for which S7 (p) is composite. Pro-
vide a theoretical argument justifying the scarcity of small primes p for which
S7 (p) is composite.
82. Let B be a positive integral bound. Write a GP/PARI program that locates all pairs a, b of positive integers with 1 ≤ a ≤ b ≤ B, for which (a^2 + b^2)/(ab + 1) is an integer. Can you detect a pattern in these integer values of the expression (a^2 + b^2)/(ab + 1)? Try to prove your guess.
83. Let B be a positive integral bound. Write a GP/PARI program that locates all pairs a, b of positive integers with 1 ≤ a ≤ b ≤ B and ab > 1, for which (a^2 + b^2)/(ab − 1) is an integer. Can you detect a pattern in these integer values of the expression (a^2 + b^2)/(ab − 1)? Try to prove your guess.
84. It can be proved that given any a ∈ N, there exists an exponent e ∈ N for which the decimal expansion of 2^e starts with a (at the most significant end). For example, if a = 7, the smallest exponent e with this property is e = 46. Indeed, 2^46 = 70368744177664. Write a GP/PARI program that, given a, finds
the smallest exponent e with the above property. Using the program, compute
the value of this exponent e for a = 2013.
Chapter 2
Arithmetic of Finite Fields
Definition 2.1 A field containing only finitely many elements is called a finite
field or a Galois field.1 ⊳
1 Évariste Galois (1811–1832). Galois’s seminal work at the age of twenty solved a contemporary open mathematical problem by showing that univariate polynomial equations cannot, in general, be solved by radicals unless the degree of the polynomial is less than five. Galois’s work has
multiple ramifications in modern mathematics. In addition to the theory of finite fields,
Galois introduced Galois theory and a formal treatment of group theory. Galois died at
the age of twenty of a bullet injury sustained in a duel. Although Galois’s paper did not
receive immediate acceptance by mathematicians, Joseph Liouville (1809–1882) eventually
understood its importance and was instrumental in publishing the article in the Journal de
Mathématiques Pures et Appliquées in 1846.
2 Rudolf Lidl and Harald Niederreiter, Introduction to finite fields and their applications,
uF and vF are non-zero (u, v are smaller than m). It is easy to argue that a field
is an integral domain, that is, the product of two non-zero elements cannot
be 0. Thus, m cannot admit a factorization as assumed above. Moreover, if
m = 1, then F is the zero ring (not a field by definition). ⊳
The simplest examples of finite fields are the rings Zp with p prime. (Every
element of Zp \ {0} is invertible. Other field properties are trivially valid.)
Let F be a finite field of size q and characteristic p. It is easy to verify
that F contains Zp as a subfield. (Imagine the way Z and Q are embedded in
fields like R and C.) Thus, F is an extension (see below) of Zp . From algebra
it follows that F is a finite-dimensional vector space over Zp , that is, q = pn ,
where n is the dimension of F over Zp .
Proposition 2.4 Every finite field is of size pn for some p ∈ P and n ∈ N. ⊳
The converse of this is also true (although I will not prove it here).
Proposition 2.5 For every p ∈ P and n ∈ N, there exists a finite field with
exactly pn elements. ⊳
Let F, F ′ be two finite fields of the same size q = pn . Both F and F ′
are extensions of Zp . It can be proved (not very easily) that there exists an
isomorphism ϕ : F → F ′ of fields, that fixes the subfield Zp element-wise.
This result implies that any two finite fields of the same size follow the same
arithmetic. In view of this, it is customary to talk about the finite field of size
q (instead of a finite field of size q).
Definition 2.6 The finite field of size q = pn is denoted by Fq = Fpn . If q
itself is prime (corresponding to n = 1), the field Fq = Fp is called a prime
field. If n > 1, we call Fq an extension field. An alternative notation for Fq is
GF (q) (Galois field of size q). ⊳
For a prime p, the two notations F_p and Z_p stand for the same algebraic object. However, when q = p^n with n > 1, the notations F_q and Z_q refer to two different rings. They exhibit different arithmetic. F_q is a field, and so every non-zero element of it is invertible. On the other hand, Z_q is not a field (nor even an integral domain). Indeed φ(p^n) = p^{n−1}(p − 1), that is, Z_q contains p^{n−1} − 1 > 0 non-zero non-invertible elements, namely p, 2p, . . . , (p^{n−1} − 1)p.
Throughout the rest of this chapter, we take p to be a prime and q = pn
for some n ∈ N.
absurdity. It follows that r(θ) = 0 if and only if r(x) = 0, that is, s(x) = t(x).
This implies that different polynomials s(x), t(x) of degrees < n correspond
to different elements s(θ), t(θ) of K. Thus, K can be represented by the set
K = {t(θ) | t(x) ∈ F [x], deg t(x) < n}.
A polynomial t(x) of this form has n coefficients (those of 1, x, x2 , . . . , xn−1 ),
and each of these coefficients can assume any of the p values from F = Fp .
Consequently, the size of K is pn , that is, K is a concrete realization of the field
Fq = Fpn . This representation is called the polynomial-basis representation of
Fq over Fp , because each element of K is an Fp -linear combination of the
polynomial basis 1, θ, θ2 , . . . , θn−1 . We denote this as K = F (θ).
Fq is an n-dimensional vector space over Fp . Any set of n elements θ0 , θ1 ,
. . . , θn−1 constitute an Fp -basis of Fq if and only if these elements are linearly
independent over Fp . The elements 1, θ, θ2 , . . . , θn−1 form such a basis.
To sum up, an irreducible polynomial f (x) of degree n in Fp [x] is needed to
represent the extension Fq = Fpn . Let s(θ), t(θ) be two elements of Fq , where
s(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 ,
t(x) = b0 + b1 x + b2 x2 + · · · + bn−1 xn−1 .
Arithmetic operations on these elements are defined as follows.
s(θ) + t(θ) = (a0 + b0 ) + (a1 + b1 )θ + (a2 + b2 )θ2 + · · · + (an−1 + bn−1 )θn−1 ,
s(θ) − t(θ) = (a0 − b0 ) + (a1 − b1 )θ + (a2 − b2 )θ2 + · · · + (an−1 − bn−1 )θn−1 ,
s(θ)t(θ) = r(θ), where r(x) = (s(x)t(x)) rem f (x),
s(θ)−1 = u(θ), where u(x)s(x) + v(x)f (x) = 1 (provided that s(θ) 6= 0).
Addition and subtraction in this representation of Fq do not require the irre-
ducible polynomial f (x), but multiplication and division do. A more detailed
implementation-level description of these operations follows in Section 2.3.
Example 2.7  (1) Let us look at the polynomial-basis representation of F_4 = F_{2^2}. The polynomials of degree two in F_2[x] are x^2, x^2 + x, x^2 + 1, x^2 + x + 1. The first two in this list are clearly reducible. Also, x^2 + 1 ≡ (x + 1)^2 (mod 2) is reducible. The polynomial x^2 + x + 1 is irreducible. So we take f(x) = x^2 + x + 1 as the defining polynomial, and represent
F_4 = F_2(θ) = {a_1θ + a_0 | a_1, a_0 ∈ {0, 1}},  where θ^2 + θ + 1 = 0.
The elements of F4 are, therefore, 0, 1, θ, θ+1. The addition and multiplication
tables for F4 are given below.
Addition in F_4:
  +    | 0     1     θ     θ+1
  0    | 0     1     θ     θ+1
  1    | 1     0     θ+1   θ
  θ    | θ     θ+1   0     1
  θ+1  | θ+1   θ     1     0

Multiplication in F_4:
  ×    | 0   1     θ     θ+1
  0    | 0   0     0     0
  1    | 0   1     θ     θ+1
  θ    | 0   θ     θ+1   1
  θ+1  | 0   θ+1   1     θ
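These tables are easy to verify in GP/PARI, in the same style as the longer sessions later in this chapter (the variable x plays the role of θ):
gp > f = Mod(1,2)*x^2 + Mod(1,2)*x + Mod(1,2)
%1 = Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2)
gp > t = Mod(Mod(1,2)*x, f)
%2 = Mod(Mod(1, 2)*x, Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2))
gp > lift(lift(t^2))
%3 = x + 1
gp > lift(lift(t*(t + 1)))
%4 = 1
That is, θ^2 = θ + 1 and θ(θ + 1) = 1, as the tables show.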
0 1 θ θ+1 θ2 θ2 + 1 θ2 + θ θ2 + θ + 1
0 0 0 0 0 0 0 0 0
1 0 1 θ θ+1 θ2 θ2 + 1 θ2 + θ θ2 + θ + 1
θ 0 θ θ2 θ2 + θ θ2 + 1 θ2 + θ + 1 1 θ+1
θ+1 0 θ+1 θ2 + θ θ2 + 1 1 θ θ2 + θ + 1 θ2
θ2 0 θ2 θ2 + 1 1 θ2 + θ + 1 θ + 1 θ θ2 + θ
θ2 + 1 0 θ2 + 1 θ2 + θ + 1 θ θ+1 θ2 + θ θ2 1
θ2 + θ 0 θ2 + θ 1 θ2 + θ + 1 θ θ2 θ+1 θ2 + 1
θ2 + θ + 1 0 θ2 + θ + 1 θ + 1 θ2 θ2 + θ 1 θ2 + 1 θ
0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
0 0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
1 1 2 0 θ+1 θ+2 θ 2θ + 1 2θ + 2 2θ
2 2 0 1 θ+2 θ θ+1 2θ + 2 2θ 2θ + 1
θ θ θ+1 θ+2 2θ 2θ + 1 2θ + 2 0 1 2
θ+1 θ+1 θ+2 θ 2θ + 1 2θ + 2 2θ 1 2 0
θ+2 θ+2 θ θ+1 2θ + 2 2θ 2θ + 1 2 0 1
2θ 2θ 2θ + 1 2θ + 2 0 1 2 θ θ+1 θ+2
2θ + 1 2θ + 1 2θ + 2 2θ 1 2 0 θ+1 θ+2 θ
2θ + 2 2θ + 2 2θ 2θ + 1 2 0 1 θ+2 θ θ+1
0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
0 0 0 0 0 0 0 0 0 0
1 0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
2 0 2 1 2θ 2θ + 2 2θ + 1 θ θ+2 θ+1
θ 0 θ 2θ 2 θ+2 2θ + 2 1 θ+1 2θ + 1
θ+1 0 θ+1 2θ + 2 θ+2 2θ 1 2θ + 1 2 θ
θ+2 0 θ+2 2θ + 1 2θ + 2 1 θ θ+1 2θ 2
2θ 0 2θ θ 1 2θ + 1 θ+1 2 2θ + 2 θ+2
2θ + 1 0 2θ + 1 θ+2 θ+1 2 2θ 2θ + 2 θ 1
2θ + 2 0 2θ + 2 θ+1 2θ + 1 θ 2 θ+2 1 2θ
is called the leading coefficient of the polynomial, denoted lc f (x). If lc f (x) = 1, we call
f (x) monic. If f (x) is any non-zero polynomial over a field F with a = lc f (x), multiplying
f (x) by a−1 ∈ F gives a monic polynomial.
gp > f = Mod(1,2)*x^3+Mod(1,2)*x^2+Mod(1,2)
%1 = Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2)
gp > a = Mod(Mod(1,2)*x^2+Mod(1,2)*x, f)
%2 = Mod(Mod(1, 2)*x^2 + Mod(1, 2)*x, Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > b = Mod(Mod(1,2)*x^2+Mod(1,2)*x+Mod(1,2), f)
%3 = Mod(Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2
+ Mod(1, 2))
gp > a + b
%4 = Mod(Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a * b
%5 = Mod(Mod(1, 2)*x^2 + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a^(-1)
%6 = Mod(Mod(1, 2)*x, Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a / b
%7 = Mod(Mod(1, 2)*x^2, Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a^4
%8 = Mod(Mod(1, 2)*x^2 + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
The fact that the inverse a−1 is correctly computed can be verified by
invoking the extended gcd function on polynomials.
gp > bezout(Mod(1,2)*x^2+Mod(1,2)*x,f)
%9 = [Mod(1, 2)*x, Mod(1, 2), Mod(1, 2)]
The expressions handled by GP/PARI may appear a bit clumsy. But if one
looks closely at these expressions, the exact structure of the elements becomes
absolutely clear. Our simpler (and more compact) mathematical notations are
meaningful only under the assumption that certain symbols are implicitly un-
derstood from the context (like θ is a root of the defining polynomial f (x)).
Given that GP/PARI provides only a text-based interface and supports a va-
gp > c = lift(a * b)
%10 = Mod(1, 2)*x^2 + Mod(1, 2)
gp > d = lift(c)
%11 = x^2 + 1
gp > c^4
%12 = Mod(1, 2)*x^8 + Mod(1, 2)
gp > d^4
%13 = x^8 + 4*x^6 + 6*x^4 + 4*x^2 + 1
gp > e = a * b
%14 = Mod(Mod(1, 2)*x^2 + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > e^4
%15 = Mod(Mod(1, 2)*x + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > lift(e^4)
%16 = Mod(1, 2)*x + Mod(1, 2)
gp > lift(lift(e^4))
%17 = x + 1
(1) Every prime factor of n must divide ord_p(−a), but not (p − 1)/ord_p(−a).
(2) If n ≡ 0 (mod 4), then p ≡ 1 (mod 4). ⊳
This result indicates that after trying O(n) random values of b, we expect to obtain an irreducible trinomial of the form x^n + x + b. The choice k = 1 in
Theorem 2.9 is particularly conducive to efficient implementations. However,
if p is small, there are not many choices for b for the random search to succeed
with high probability. In that case, we need to try with other values of k.
For every finite field Fp and every degree n, an irreducible binomial or
trinomial or quadrinomial may fail to exist. An example (p = 2 and n = 8) is
covered in Exercise 2.5. However, the following conjecture4 is interesting.
Conjecture 2.10 For any finite field Fq with q > 3 and for any n ∈ N, there
exists an irreducible polynomial in Fq [x] with degree n and with at most four
non-zero terms. ⊳
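A random search of the kind discussed above takes only a couple of lines in GP/PARI using the built-in irreducibility test polisirreducible; the parameters n = 19 and p = 101 below are arbitrary choices of ours.
gp > n = 19; p = 101;
gp > b = 0; while (!polisirreducible(Mod(1,p)*(x^n + x + b)), b = random(p));
When the loop terminates, x^19 + x + b is irreducible over F_101 for the final value of b. If p is small, the loop may run for a long time or forever, in line with the caveat above that suitable values of b (or even such sparse polynomials) need not exist.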
Therefore, an element
a_{n−1}θ^{n−1} + a_{n−2}θ^{n−2} + · · · + a_1θ + a_0
with each a_i ∈ {0, 1, 2} being encoded by Kawahara et al.’s scheme can be represented by the bit string h_{n−1}l_{n−1} h_{n−2}l_{n−2} . . . h_1l_1 h_0l_0 of length 2n, where h_i l_i is the two-bit encoding of a_i. It is advisable to separately store the high-
order bits and the low-order bits. That means that the above element is to
be stored as two bit arrays hn−1 hn−2 . . . h1 h0 and ln−1 ln−2 . . . l1 l0 each of size
n. Each of these bit arrays can be packed individually in an array of w-bit
words, as done for binary fields.
Example 2.12  Let us represent F_{3^19} as F_3(θ), where θ is a root of f(x) = x^19 + x^2 + 2. Consider the element
2θ^18 + θ^16 + θ^13 + 2θ^10 + θ^6 + θ^5 + 2θ^2
of F_{3^19}. As a sequence of ternary digits, this polynomial can be represented as 2010010020001100200. Under Kawahara et al.’s encoding, the bit representation of this element is as follows. We take words of size w = 8 bits.
High-order bit array 110 11011111 10011111
Low-order bit array 011 11111011 11111011
Different words are separated by spaces. ¤
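The example is consistent with the two-bit encoding 0 ↦ (h, l) = (1, 1), 1 ↦ (0, 1), 2 ↦ (1, 0). Assuming that reading, the following GP/PARI fragment (our own throwaway code, processing the digits from the most significant end as displayed above) rebuilds the two bit arrays as strings; printing H and L reproduces, up to word grouping, the two rows shown in the example.
gp > d = [2,0,1,0,0,1,0,0,2,0,0,0,1,1,0,0,2,0,0];
gp > H = ""; L = ""; \
     for (i = 1, #d, \
       H = concat(H, Str(if (d[i] == 1, 0, 1))); \
       L = concat(L, Str(if (d[i] == 2, 0, 1))));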
2.3.2.2 Multiplication
Multiplication in F_{p^n} involves two basic operations. First, the two operands are multiplied as polynomials in F_p[x]. The result is a polynomial of degree ≤ 2(n−1). Subsequently, this product is divided by the defining polynomial f(x). The remainder (a polynomial of degree < n) is the canonical representative
of the product in the field. In what follows, I separately discuss these two
primitive operations. As examples, I concentrate on binary fields only.
The first approach towards multiplying two polynomials of degrees < n is
to initialize the product as the zero polynomial with (formal) degree 2(n − 1).
For each non-zero term bi xi in the second operand, the first operand is multi-
plied by bi , shifted by i positions, and added to the product. For binary fields,
the only non-zero value of bi is 1, so only shifting and adding (XOR) suffice.
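As a minimal sketch of this shift-and-XOR strategy, the following GP/PARI function (clmul is our own name) multiplies two binary polynomials packed into integers, bit i holding the coefficient of x^i:
gp > clmul(a, b) = \
     my(c = 0); \
     while (b, \
       if (b % 2, c = bitxor(c, a)); \
       a = shift(a, 1); b = shift(b, -1)); \
     c
gp > clmul(5, 7)
%2 = 27
Here 5 and 7 encode x^2 + 1 and x^2 + x + 1, and the result 27 encodes x^4 + x^3 + x + 1, their product over F_2. Word-by-word implementations, such as the comb methods described next, organize exactly this computation more efficiently.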
Example 2.15 Let us multiply the two elements α and β of Example 2.13.
The exponents i in the non-zero terms θi of β and the corresponding shifted
versions of α are shown below. When we add (XOR) all these shifted values,
we obtain the desired product.
i        x^i · α(x)
0 101 01010111 00000101
3 101010 10111000 00101
4 1010101 01110000 0101
5 10101010 11100000 101
6 1 01010101 11000001 01
7 10 10101011 10000010 1
8 101 01010111 00000101
9 1010 10101110 0000101
11 101010 10111000 00101
17 1010 10101110 0000101
01010 10001000 01100101 00011011 00011101
Storing the product, a polynomial of degree ≤ 36, needs five eight-bit words. ¤
The left-to-right comb method shifts the product polynomial (instead of the
first operand). This method processes bit positions j = w − 1, w − 2, . . . , 1, 0
in a word, in that sequence. For all words in β with the j-th bit set, α is
added to the product with appropriate word-level shifts. When a particular j
is processed, the product is multiplied by x (left-shifted by one bit), so that it
is aligned at the next (the (j − 1)-st) bit position in the word. The left-to-right
comb method which shifts the product polynomial is expected to be slower
than the right-to-left comb method which shifts the first operand.
Example 2.17 The left-to-right comb method is now illustrated for the mul-
tiplication of Example 2.15. The product polynomial γ is always maintained
as a polynomial of (formal) degree 36. The k-th word of γ is denoted by γk .
Example 2.18 The working of the right-to-left windowed comb method ap-
plied to the multiplication of Example 2.16 is demonstrated now. We take the
window size k = 2. The four products are precomputed as
(00)α(x) = (0x + 0)α(x) = 0000 00000000 00000000
(01)α(x) = (0x + 1)α(x) = 0101 01010111 00000101
(10)α(x) = (1x + 0)α(x) = 1010 10101110 00001010
(11)α(x) = (1x + 1)α(x) = 1111 11111001 00001111
The multiplication loop runs as follows.
For the bit pattern 00, no addition needs to be made. In Example 2.15,
2.16 or 2.17, ten XOR operations are necessary, whereas the windowed comb
method needs only seven. Of course, we now have the overhead of precompu-
tation. In general, two is not a good window size (even for larger extensions
than used in these examples). A good tradeoff among the overhead of pre-
computation (and storage), the number of XOR operations, and programming
convenience (the window size k should divide the word size w) is k = 4. ¤
That indicates that we need to convert the left-to-right comb method to the
windowed form. Since k bits are simultaneously processed from β, the product
γ should now be left-shifted by k bits.
Example 2.19 The left-to-right windowed comb method works on the mul-
tiplication of Example 2.15 as follows. We take the window size k = 2 for
our illustration. The four precomputed polynomials are the same as in Exam-
ple 2.18. The main multiplication loop is now unfolded.
In the above table, the operation “Add” stands for adding a word-level shift
of a precomputed polynomial to γ. These word-level shifts are not computed
explicitly, but are shown here for the reader’s convenience. The “Shift” oper-
ation stands for two-bit left shift of γ. As in Example 2.18, only seven XOR
operations suffice. The number of bit-level shifts of the product (each by k bits)
is always (w/k) − 1 (three in this example), independent of the operands. ¤
Example 2.20  Let us reduce the product γ(x) computed in Examples 2.15–2.19. Elimination of its non-zero terms of degrees ≥ 19 is illustrated below. Here, “Shift” is the shifted polynomial x^{i−n} f(x), and “Add” is the addition of this shifted value to γ(x). After six iterations of term cancellation, γ(x) reduces to a polynomial of degree 16. It follows that the product of α and β (of Example 2.15) in F_{2^19} is θ^16 + θ^15 + θ^11 + θ^10 + θ^8 + θ. ¤
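The term-by-term cancellation of this example is captured by the following GP/PARI sketch (redbin is our own name; polynomials are again packed into integers, bit i holding the coefficient of x^i). It cancels one leading term per iteration, whereas the word-level variant discussed below cancels a whole word at a time.
gp > redbin(c, f) = \
     my(n = #binary(f) - 1); \
     while (shift(c, -n), \
       c = bitxor(c, shift(f, #binary(c) - 1 - n))); \
     c
gp > redbin(27, 11)
%2 = 6
Here 27 encodes x^4 + x^3 + x + 1 and 11 encodes x^3 + x + 1; the result 6 encodes x^2 + x, which is indeed (x^2 + 1)(x^2 + x + 1) rem (x^3 + x + 1).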
by subtracting a x^{i−n} f(x) from γ(x) does not affect other coefficients residing in the same word of γ(x) storing the coefficient of x^i. This means that we can now cancel an entire word together.
To be more precise, let us concentrate on binary fields, and write f(x) = x^n + f_1(x) with n_1 = deg f_1(x) ≤ n − w. We want to cancel the leftmost non-zero word µ from γ(x). Clearly, µ is a polynomial of degree ≤ w − 1. If µ is the r-th word in γ, we need to add (XOR) x^{rw−n} µ f(x) to γ(x). But x^{rw−n} µ f(x) = x^{rw} µ + x^{rw−n} µ f_1(x). The first part x^{rw} µ is precisely the r-th word of γ, so we can set this word to zero without actually performing the addition. The condition n_1 ≤ n − w indicates that the second part x^{rw−n} µ f_1(x) does not have non-zero terms in the r-th word of γ. Since multiplication by x^{rw−n} is a left shift, the only non-trivial computation is that of µ f_1(x). But µ has a small degree (≤ w − 1). If f_1(x) too has only a few non-zero terms, this multiplication can be quite efficient. We can use a comb method for this multiplication in order to achieve higher efficiency. Since f_1(x) is a polynomial dependent upon the representation of the field (but not on the operands), the precomputation for a windowed comb method needs to be done only once, for all reduction operations in the field. Even eight-bit windows can be feasible in terms of storage if f_1(x) has only a few non-zero coefficients.
r    µ           Intermediate values
                 γ(x) = 01010 10001000 01100101 00011011 00011101
4    00001010    x^13 µf_1(x) = 00000 00101110 110
                 γ(x) = 00000 10001000 01001011 11011011 00011101
3    10001000    x^5 µf_1(x) = 00010 01010111 000
                 γ(x) = 00000 00000000 01001001 10001100 00011101
2    00001001    µf_1(x) = 00000001 00011111
                 γ(x) = 00000 00000000 00000001 10001101 00000010
The last iteration (for r = 2) is a bit tricky. This word of γ indicates the non-zero terms x^22, x^19 and x^16. We need to remove the first two of these, but we cannot remove x^16. Since n = 19, we consider only the coefficients of x^19 to x^23. The word is appropriately right shifted to compute µ in this case. ¤
Example 2.22  NIST recommends^7 the representation of F_{2^233} using the irreducible polynomial f(x) = x^233 + x^74 + 1. As in a real-life implementation, we now choose w = 64. An element of F_{2^233} fits in four words. Moreover, an unreduced product γ of two field elements is a polynomial of degree at most 464, and fits in eight words. Let us denote the words of γ as γ_0, γ_1, . . . , γ_7. We need to eliminate γ_7, γ_6, γ_5, γ_4 completely and γ_3 partially. For r = 7, 6, 5, 4 (in that sequence), we need to compute x^{rw−n} µ f_1 = x^{64r−233} γ_r (x^74 + 1) = (x^{64r−159} + x^{64r−233}) γ_r = (x^{64(r−3)+33} + x^{64(r−4)+23}) γ_r = x^{64(r−3)} (x^33 γ_r) + x^{64(r−4)} (x^23 γ_r). Subtracting (XORing) this quantity from γ is equivalent to the following four word-level XOR operations (with ≪ and ≫ denoting left and right shifts within a 64-bit word, after which the word γ_r itself is cleared):
γ_{r−4} = γ_{r−4} ⊕ (γ_r ≪ 23),
γ_{r−3} = γ_{r−3} ⊕ (γ_r ≫ 41),
γ_{r−3} = γ_{r−3} ⊕ (γ_r ≪ 33),
γ_{r−2} = γ_{r−2} ⊕ (γ_r ≫ 31).
r1          r2              u1                              u2
Set r1 = r1 + r2 and u1 = u1 + u2.
x^5         x^2 + x + 1     x^6 + x^5 + x^4 + 1             x^5 + x^4 + x^2 + 1
Repeatedly remove x from r1. Adjust u1.
x^4         x^2 + x + 1     x^6 + x^5 + x^4 + x^3 + x^2     x^5 + x^4 + x^2 + 1
x^3         x^2 + x + 1     x^5 + x^4 + x^3 + x^2 + x       x^5 + x^4 + x^2 + 1
x^2         x^2 + x + 1     x^4 + x^3 + x^2 + x + 1         x^5 + x^4 + x^2 + 1
x           x^2 + x + 1     x^6 + x^3 + x + 1               x^5 + x^4 + x^2 + 1
1           x^2 + x + 1     x^6 + x^5 + 1                   x^5 + x^4 + x^2 + 1
In this example, r1 eventually becomes 1, so the inverse of α is the value
of u1 at that time, that is, x6 + x5 + 1. ¤
For integers, binary gcd is usually faster than Euclidean gcd, since Euclid-
ean division is significantly more expensive than addition and shifting. For
polynomials, binary inverse and Euclidean inverse have comparable perfor-
mances. Here, Euclidean inverse can be viewed as a sequence of removing the
most significant terms from one of the remainders. In binary inverse, a term is
removed from the least significant end, and subsequently divisions by x restore
the least significant term back to 1. Both these removal processes use roughly
the same number and types (shift and XOR) of operations.
We do not update u_1 and v_1 here. However, since the value of k has changed to k + t, the other equation must be updated to agree with this, that is, the second equation is transformed as
(x^t u_2)α + (x^t v_2)f = x^{k+t} r_2.
Renaming x^t u_2 as u_2 and x^t v_2 as v_2 restores both the invariances. Algorithm 2.2 implements this idea. We do not need to maintain v_1, v_2 explicitly.
r1                      r2                          u1              u2          t
x^6 + x^3 + x^2 + x     x^7 + x^3 + 1               1               0           0
Remove x^1 from r1. Adjust u2.
x^5 + x^2 + x + 1       x^7 + x^3 + 1               1               0           1
Set r2 = r2 + r1 and u2 = u2 + u1.
x^5 + x^2 + x + 1       x^7 + x^5 + x^3 + x^2 + x   1               1           1
Remove x^1 from r2. Adjust u1.
x^5 + x^2 + x + 1       x^6 + x^4 + x^2 + x + 1     x               1           2
Set r2 = r2 + r1 and u2 = u2 + u1.
x^5 + x^2 + x + 1       x^6 + x^5 + x^4             x               x + 1       2
Remove x^4 from r2. Adjust u1.
x^5 + x^2 + x + 1       x^2 + x + 1                 x^5             x + 1       6
Set r1 = r1 + r2 and u1 = u1 + u2.
x^5                     x^2 + x + 1                 x^5 + x + 1     x + 1       6
Remove x^5 from r1. Adjust u2.
1                       x^2 + x + 1                 x^5 + x + 1     x^6 + x^5   11
l    h              u + hf                              (u + hf)/x^l (renamed as u)
                                                        x^5 + x + 1
3    x + 1          x^8 + x^7 + x^5 + x^4 + x^3         x^5 + x^4 + x^2 + x + 1
3    x^2 + x + 1    x^9 + x^8 + x^7 + x^3               x^6 + x^5 + x^4 + 1
3    1              x^7 + x^6 + x^5 + x^4 + x^3         x^4 + x^3 + x^2 + x + 1
2    x + 1          x^8 + x^7 + x^2                     x^6 + x^5 + 1               ¤
Proof  First, take α ≠ 0, and let α_1, . . . , α_{q−1} be all the elements of F*_q = F_q \ {0}. Then, αα_1, . . . , αα_{q−1} is a permutation of α_1, . . . , α_{q−1}, so ∏_{i=1}^{q−1} α_i = ∏_{i=1}^{q−1} (αα_i) = α^{q−1} ∏_{i=1}^{q−1} α_i. Canceling the product yields α^{q−1} = 1 and so α^q = α. For α = 0, we have 0^q = 0. ⊳
A very important consequence of this theorem follows.
Theorem 2.27  The polynomial x^q − x ∈ F_p[x] splits into linear factors over F_q as x^q − x = ∏_{α∈F_q} (x − α). ⊳
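Both statements are easy to probe in GP/PARI (a small illustration, with q = 8 for the first check and the prime field F_7 for the second; ffgen, factormod, factorback and matsize are built-in functions):
gp > g = ffgen(2^3, 't);
gp > (g^2 + g + 1)^8 == g^2 + g + 1
%2 = 1
gp > factorback(factormod(x^7 - x, 7)) == Mod(1,7)*(x^7 - x)
%3 = 1
gp > matsize(factormod(x^7 - x, 7))
%4 = [7, 2]
That is, x^7 − x factors over F_7 into seven (necessarily linear) monic factors, one for each element of the field.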
The proper divisors of the extension degree 6 are 1, 2, 3. The unique intermediate field of F_64 of size 2^1 is {0, 1}. The intermediate field of size 2^2 is
{0, 1, θ^5 + θ^4 + θ^3 + θ, θ^5 + θ^4 + θ^3 + θ + 1}.
Finally, the intermediate field of size 2^3 is
{0, 1, θ^3 + θ^2 + θ, θ^3 + θ^2 + θ + 1, θ^4 + θ^2 + θ, θ^4 + θ^2 + θ + 1, θ^4 + θ^3, θ^4 + θ^3 + θ}. ¤
f(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n
with each a_i ∈ F_p. Exercise 1.34 implies f(x)^p = a_0^p + a_1^p x^p + a_2^p x^{2p} + · · · + a_n^p x^{np}. By Fermat’s little theorem, a_i^p = a_i in F_p, and so f(x)^p = a_0 + a_1 x^p + a_2 x^{2p} + · · · + a_n x^{np} = f(x^p). Putting x = θ yields f(θ^p) = f(θ)^p = 0^p = 0, that is, θ^p is again a root of f(x). Moreover, θ^p ∈ F_q. We can likewise argue that θ^{p^2} = (θ^p)^p, θ^{p^3} = (θ^{p^2})^p, . . . are roots of f(x) and lie in F_q. One can show that the roots θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} of f(x) are pairwise distinct and so must be all the roots of f(x). In other words, f(x) splits into linear factors over F_q:
f(x) = a_n (x − θ)(x − θ^p)(x − θ^{p^2}) · · · (x − θ^{p^{n−1}}).
Definition 2.33  The elements θ^{p^i} for i = 0, 1, 2, . . . , n − 1 are called conjugates of θ. (More generally, the roots of an irreducible polynomial over any field are called conjugates of one another.)
If θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} are linearly independent over F_p, θ is called a normal element of F_q, and θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} a normal basis of F_q over F_p.
If a normal element θ is also a primitive element of F_q, we call θ a primitive normal element, and θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} a primitive normal basis. ⊳
Example 2.34  We represent F_64 as in Examples 2.29 and 2.32. The elements θ^{2^i} for 0 ≤ i ≤ 5 are now expressed in the polynomial basis 1, θ, . . . , θ^5.
θ     = θ
θ^2   = θ^2
θ^4   = θ^4
θ^8   = θ^2 + θ^3
θ^16  = 1 + θ + θ^4
θ^32  = 1 + θ^3
γ     = θ^5 + 1,
γ^2   = θ^5 + θ^4 + 1,
γ^4   = θ^5 + θ^4 + θ^3 + θ^2 + 1,
γ^8   = θ^5 + θ^3 + θ^2 + θ,
γ^16  = θ^5 + θ^2 + θ + 1, and
γ^32  = θ^5 + θ^2 + 1.
Example 2.37  (1) Consider the element γ ∈ F_64 of Example 2.35. We have
(γ, γ^2, γ^4, γ^8, γ^16, γ^32)^T = M (1, θ, θ^2, θ^3, θ^4, θ^5)^T,  where
M =
[1 0 0 0 0 1]
[1 0 0 0 1 1]
[1 0 1 1 1 1]
[0 1 1 1 0 1]
[1 1 1 0 0 1]
[1 0 1 0 0 1]
The 6 × 6 transformation matrix has determinant 1 modulo 2. Therefore, γ
is a normal element of F64 and γ, γ 2 , γ 4 , γ 8 , γ 16 , γ 32 constitute a normal basis
of F64 over F2 . By Example 2.32, ord γ = 63, that is, γ is also a primitive
element of F64 . Therefore, γ is a primitive normal element of F64 , and the
basis γ, γ 2 , γ 4 , γ 8 , γ 16 , γ 32 is a primitive normal basis of F64 over F2 .
(2) The conjugates of δ = θ5 + θ4 + θ3 + 1 are
9 Eisenstein (1850) conjectured that normal bases exist for all finite fields. Kurt Hensel
(1888) first proved this conjecture. Hensel and Ore counted the number of normal elements
in a finite field.
10 The proof that primitive normal bases exist for all finite fields can be found in the
paper: Hendrik W. Lenstra, Jr. and René J. Schoof, Primitive normal bases for finite fields,
Mathematics of Computation, 48, 217–231, 1986.
δ     = θ^5 + θ^4 + θ^3 + 1,
δ^2   = θ^5 + θ^4 + θ^3 + θ^2 + θ,
δ^4   = θ^5 + θ^3 + θ + 1,
δ^8   = θ^5 + θ^4 + θ^2 + θ,
δ^16  = θ^5 + θ^3, and
δ^32  = θ^5 + θ^4 + θ + 1,
so that
(δ, δ^2, δ^4, δ^8, δ^16, δ^32)^T = M (1, θ, θ^2, θ^3, θ^4, θ^5)^T,  where
M =
[1 0 0 1 1 1]
[0 1 1 1 1 1]
[1 1 0 1 0 1]
[0 1 1 0 1 1]
[0 0 0 1 0 1]
[1 1 0 0 1 1]
The transformation matrix has determinant 1 modulo 2, that is, δ is a nor-
mal element of F64 , and δ, δ 2 , δ 4 , δ 8 , δ 16 , δ 32 constitute a normal basis of F64
over F2 . However, by Example 2.32, ord δ = 21, that is, δ is not a primi-
tive element of F64 , that is, δ is not a primitive normal element of F64 , and
δ, δ 2 , δ 4 , δ 8 , δ 16 , δ 32 is not a primitive normal basis of F64 over F2 . Combin-
ing this observation with Example 2.34, we conclude that being a primitive
element is neither necessary nor sufficient for being a normal element. ¤
The relevant computational question here is how we can efficiently locate
normal elements in a field Fpn . The first and obvious strategy is keeping on
picking random elements from Fpn until a normal element is found. Each nor-
mality check involves computing the determinant (or rank) of an n × n matrix
with entries from Fp , as demonstrated in Example 2.37. Another possibility
is to compute the gcd of two polynomials over Fpn (see Exercise 3.42). Such
a random search is efficient, since the density of normal elements in a finite field is significant. More precisely, a random element of F_{p^n} is normal over F_p with probability ≥ 1/34 if n ≤ p^4, and with probability ≥ 1/(16 log_p n) if n > p^4. These density estimates, and also a deterministic polynomial-time
algorithm based on polynomial root finding, can be found in the paper of
Von zur Gathen and Giesbrecht.11 This paper also proposes a randomized
polynomial-time algorithm for finding primitive normal elements.
A more efficient randomized algorithm is based on the following result,
proved by Emil Artin. This result is, however, inappropriate if p is small.
Proposition 2.38  Represent F_{p^n} = F_p(θ), where θ is a root of the monic irreducible polynomial f(x) ∈ F_p[x] (of degree n). Consider the polynomial
g(x) = f(x) / ((x − θ) f′(θ)) ∈ F_{p^n}[x],
where f′(x) is the formal derivative of f(x). Then, there are at least p − n(n−1) elements a in F_p, for which g(a) is a normal element of F_{p^n} over F_p. ⊳
11 Joachim Von Zur Gathen and Mark Giesbrecht, Constructing normal bases in finite
It follows that if p > 2n(n − 1), then for a random a ∈ Fp , the element
g(a) ∈ Fpn is normal over Fp with probability at least 1/2. Moreover, in this
case, a random element in Fpn is normal with probability at least 1/2 (as
proved by Gudmund Skovbjerg Frandsen from Aarhus University, Denmark).
Deterministic polynomial-time algorithms are known for locating normal
elements. For example, see Lenstra’s paper cited in Footnote 15 on page 114.
gp > minimalpoly(a) = \
p = y - a; \
b = a * a; \
while (b-a, p *= (y-b); b = b*b); \
lift(lift(p))
gp > minimalpoly(Mod(Mod(0,2),f))
%2 = y
gp > minimalpoly(Mod(Mod(1,2),f))
%3 = y + 1
gp > minimalpoly(Mod(Mod(1,2)*x,f))
%4 = y^6 + y + 1
gp > minimalpoly(Mod(Mod(1,2)*x^5+Mod(1,2)*x^4+Mod(1,2)*x^3+Mod(1,2)*x,f))
%5 = y^2 + y + 1
gp > minimalpoly(Mod(Mod(1,2)*x^4+Mod(1,2)*x^3,f))
%6 = y^3 + y^2 + 1
gp > minimalpoly(Mod(Mod(1,2)*x^5+Mod(1,2),f))
%7 = y^6 + y^5 + 1
gp > minimalpoly(Mod(Mod(1,2)*x^5+Mod(1,2)*x^2+Mod(1,2)*x+Mod(1,2),f))
%8 = y^6 + y^5 + 1
We pass a polynomial in F2 [x] of degree less than six as the only argument
of isnormal() to check whether this corresponds to a normal element of F64 .
gp > isnormal(Mod(1,2)*x^5+Mod(1,2)*x^4+Mod(1,2)*x^3+Mod(1,2))
M =
[1 0 0 1 1 1]
[0 1 1 1 1 1]
[1 1 0 1 0 1]
[0 1 1 0 1 1]
[0 0 0 1 0 1]
[1 1 0 0 1 1]
det(M) = Mod(1, 2)
normal
%9 = 1
gp > isnormal(Mod(1,2)*x^5+Mod(1,2)*x)
M =
[0 1 0 0 0 1]
[0 0 1 0 1 1]
[0 0 1 1 0 1]
[1 1 0 0 0 1]
[1 0 1 0 1 1]
[1 0 1 1 0 1]
det(M) = Mod(0, 2)
not normal
%10 = 0
ψ · ψ = θ^2 + 1 = ψ_1,   ψ · ψ^2 = θ^2 = ψ_0 + ψ_2,   ψ · ψ^4 = θ = ψ_1 + ψ_2.
Example 2.41  (1) 3 is a primitive element in the prime field F_17. The powers of 3 are given in the following table.
i              0  1  2   3   4  5   6   7   8   9  10  11  12  13  14  15  16
3^i (mod 17)   1  3  9  10  13  5  15  11  16  14   8   7   4  12   2   6   1
From this table, the table of Zech logarithms can be computed as follows. For j ∈ {0, 1, 2, . . . , 15}, compute 1 + 3^j (mod 17) and then locate the value z_j for which 3^{z_j} ≡ 1 + 3^j (mod 17).
j     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
z_j  14  12   3   7   9  15   8  13   −   6   2  10   5   4   1  11
The Zech logarithm table can be used as follows. Take i = 8 and j = 13. Then, 3^i + 3^j ≡ 3^k (mod 17), where k ≡ i + z_{j−i} ≡ 8 + z_5 ≡ 8 + 15 ≡ 23 ≡ 7 (mod 16).
(2) Let F_8 = F_2(θ), where θ^3 + θ + 1 = 0. Consider the powers of γ = θ.
i      0   1    2      3         4             5           6
γ^i    1   θ   θ^2   θ + 1   θ^2 + θ   θ^2 + θ + 1   θ^2 + 1
13 The concept of Zech’s logarithms was introduced near the mid-nineteenth century. Ref-
j     0   1   2   3   4   5   6
z_j   −   3   6   1   5   4   2
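Such tables can also be produced mechanically. In GP/PARI, the discrete-logarithm function znlog regenerates the F_17 table above (with −1 standing for the undefined entry z_8):
gp > g = Mod(3, 17);
gp > vector(16, j, if (1 + g^(j-1) == 0, -1, znlog(1 + g^(j-1), g)))
%2 = [14, 12, 3, 7, 9, 15, 8, 13, -1, 6, 2, 10, 5, 4, 1, 11]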
Example 2.42 Let us represent F64 using the intermediate field F8 . First,
represent F8 as the extension of F2 obtained by adjoining a root θ of the
irreducible polynomial f (x) = x3 + x + 1 ∈ F2 [x]. Thus, every element of F8
is of the form a2 θ2 + a1 θ + a0 , and the arithmetic in F8 is the polynomial
arithmetic of F2 [x] modulo f (x).
Next, consider the polynomial g(y) = y^2 + (θ^2 + 1)y + θ ∈ F_8[y]. One can easily verify that g(α) ≠ 0 for all α ∈ F_8, that is, g(y) has no root in F_8. Since the degree of g(y) is 2, it follows that g(y) is irreducible in F_8[y]. Let
ψ be a root of g(y), which we adjoin to F8 in order to obtain the extension
F64 of F8 . Thus, every element of F64 is now a polynomial in ψ of degree
< 2, that is, of the form u1 ψ + u0 , where u0 , u1 are elements of F8 , that is,
polynomials in θ of degrees < 3. The arithmetic of F64 in this representation
is the polynomial arithmetic of F8 [y] modulo the irreducible polynomial g(y).
The coefficients of these polynomials follow the arithmetic of F8 , that is, the
polynomial arithmetic of F2 [x] modulo f (x).
As a specific example, consider the elements α = (θ + 1)ψ + (θ^2) and β = (θ^2 + θ + 1)ψ + (1) in F_64. Their sum is
α + β = [(θ + 1) + (θ^2 + θ + 1)]ψ + [(θ^2) + (1)] = (θ^2)ψ + (θ^2 + 1),
and their product is
αβ = [(θ + 1)(θ^2 + θ + 1)]ψ^2 + [(θ + 1)(1) + (θ^2)(θ^2 + θ + 1)]ψ + [(θ^2)(1)]
   = (θ^3 + 1)ψ^2 + (θ^4 + θ^3 + θ^2 + θ + 1)ψ + (θ^2)
   = (θ + 1 + 1)ψ^2 + [θ(θ + 1) + (θ + 1) + θ^2 + θ + 1]ψ + (θ^2)
     [since θ^3 + θ + 1 = 0]
µ(1) = 1,
µ(θ) = ψ^2 + 1,
µ(θ^2) = µ(θ)^2 = ψ^4 + 1 = ψ(ψ^2 + 1) + 1 = ψ^3 + ψ + 1 = (ψ^2 + 1) + ψ + 1 = ψ^2 + ψ.
Thus, the transformation matrix is
T =
[1 0 0]
[1 0 1]
[0 1 1]
Take the elements α = θ + 1 and β = θ^2 + θ + 1 in K. Since (1 1 0) T = (0 0 1) and (1 1 1) T = (0 1 0), we have α′ = µ(α) = ψ^2, and β′ = µ(β) = ψ. We have α + β = θ^2, so µ(α + β) = µ(θ^2) = ψ^2 + ψ = α′ + β′ = µ(α) + µ(β), as expected. Moreover, αβ = θ^3 + 1 = θ, so µ(αβ) = µ(θ) = ψ^2 + 1, whereas α′β′ = ψ^3 = ψ^2 + 1, that is, µ(αβ) = µ(α)µ(β), as expected. ¤
14 S. A. Evdokimov, Factorization of solvable polynomials over finite fields and the gener-
alized Riemann hypothesis, Journal of Mathematical Sciences, 59(3), 842–849, 1992. This
is a translation of a Russian article published in 1989.
15 Hendrik W. Lenstra, Jr., Finding isomorphisms between finite fields, Mathematics of
Exercises
1. Let F be a field (not necessarily finite), and F [x] the ring of polynomials in
one indeterminate x. Let f(x), g(x) ∈ F[x] with g(x) ≠ 0.
(a) Prove that there exist unique polynomials q(x), r(x) ∈ F [x] satisfying
f (x) = q(x)g(x) + r(x), and either r(x) = 0 or deg r(x) < deg g(x).
(b) Prove that gcd(f (x), g(x)) = gcd(g(x), r(x)). (Remark: If d(x) is a gcd
of f (x) and g(x), then so also is ad(x) for any non-zero a ∈ F . We can adjust
a so that the leading coefficient of ad(x) equals one. This monic gcd is called
the gcd of f (x) and g(x), and is denoted by gcd(f (x), g(x)).)
(c) Prove that there exist polynomials u(x), v(x) ∈ F [x] with the property
gcd(f (x), g(x)) = u(x)f (x) + v(x)g(x).
(d) Prove that if f (x) and g(x) are non-constant, we may choose u(x), v(x)
in such a way that deg u(x) < deg g(x) and deg v(x) < deg f (x).
2. We have seen that the polynomials x^2 + x + 1, x^3 + x + 1 and x^6 + x + 1 are irreducible in F_2[x]. Prove or disprove: the polynomial x^n + x + 1 is irreducible in F_2[x] for every n ≥ 2.
3. Consider the extension of Q obtained by adjoining a root of the irreducible polynomial x^4 + 1. Derive how x^4 + 1 factors in the extension.
4. (a) List all monic irreducible polynomials of degrees 1, 2, 3, 4 in F_2[x].
(b) List all monic irreducible polynomials of degrees 1, 2, 3 in F_3[x].
(c) List all monic irreducible polynomials of degrees 1, 2, 3 in F_5[x].
5. (a) Verify whether x^8 + x + 1 and x^8 + x^3 + 1 are irreducible in F_2[x].
(b) Prove or disprove: There does not exist an irreducible binomial/trinomial/quadrinomial of degree eight in F_2[x].
6. (a) Prove that the polynomial f(x) = x^4 + x + 4 is irreducible in F_5[x].
(b) Represent F_625 = F_{5^4} by adjoining a root θ of f(x) to F_5, and let α = 2θ^3 + 3θ + 4 and β = θ^2 + 2θ + 3. Compute α + β, α − β, αβ and α/β.
7. (a) Which of the polynomials x^2 ± 7 is irreducible modulo 19? Justify.
(b) Using the irreducible polynomial f(x) of Part (a), represent the field F_361 = F_{19^2} as F_19(θ), where f(θ) = 0. Compute (2θ + 3)^11 in this representation of F_361 using left-to-right square-and-multiply exponentiation.
8. Let F2n have a polynomial-basis representation. Store each element of F2n as
an array of w-bit words. Denote the words of α ∈ F2n by α0 , α1 , . . . , αN −1 ,
where N = ⌈n/w⌉. Write pseudocodes for addition, schoolbook multiplication,
left-to-right comb multiplication, modular reduction, and inverse in F2n .
9. Let α = a_{n−1}θ^{n−1} + a_{n−2}θ^{n−2} + · · · + a_1θ + a_0 ∈ F_{2^n} with each a_i ∈ F_2.
(a) Prove that α^2 = a_{n−1}θ^{2(n−1)} + a_{n−2}θ^{2(n−2)} + · · · + a_1θ^2 + a_0.
(b) How can you efficiently square a polynomial in F_2[x] under the bit-vector representation? Argue that squaring is faster (in general) than multiplication.
(c) How can precomputation speed up this squaring algorithm?
10. Explain how the coefficients of x^255 through x^233 in γ_3 of Example 2.22 can be eliminated using bit-wise shift and XOR operations.
11. Design efficient reduction algorithms (using bit-wise operations) for the following fields recommended by NIST. Assume a packing of 64 bits in a word.
(a) F_{2^1223} defined by x^1223 + x^255 + 1.
(b) F_{2^571} defined by x^571 + x^10 + x^5 + x^2 + 1.
12. Repeat Exercise 2.11 for a packing of 32 bits per word.
13. An obvious way to compute β/α for α, β ∈ F2n , α 6= 0, is to compute β × α−1
which involves one inverse computation and one multiplication. Explain how
the multiplication can be avoided altogether by modifying the initialization
step of the binary inverse algorithm (Algorithm 2.1).
14. Let α ∈ F*_{2^n}.
(a) Prove that α^{−1} = α^{2^n − 2}.
(b) Use Part (a) and the fact that 2^n − 2 = 2 + 2^2 + 2^3 + · · · + 2^{n−1} to design an algorithm to compute inverses in F_{2^n}.
15. We now investigate another way of computing α^{−1} = α^{2^n − 2} for an α ∈ F*_{2^n}.
(a) Suppose that α^{2^k − 1} has been computed for some k ≥ 1. Explain how α^{2^{2k} − 1} and α^{2^{k+1} − 1} can be computed.
(b) Based upon the result of Part (a), devise an algorithm to compute α^{2^n − 2} = (α^{2^{n−1} − 1})^2 from the binary representation of n − 1.
(c) Compare the algorithm of Part (b) with that of Exercise 2.14(b).
16. Let α ∈ F2n . Prove that the equation x2 = α has a unique solution in F2n .
17. Represent F_{2^n} = F_2(θ), and let α ∈ F_{2^n}.
(a) Prove that √θ = θ^{2^{n−1}}.
(b) How can you express α as A_0(θ^2) + θ · A_1(θ^2) (for polynomials A_0, A_1)?
(c) Design an efficient algorithm for computing √α.
18. Let F_{2^1223} be defined by the irreducible trinomial f(x) = x^1223 + x^255 + 1. Show that √x = x^612 + x^128 modulo f(x).
43. Let α ∈ Fpn , and fα (x) the minimal polynomial of α over Fp . Prove that the
degree of fα (x) divides n.
44. Let f (x), g(x) be irreducible polynomials in Fp [x] of degrees m and n. Let Fpm
be represented by adjoining a root of f (x) to Fp . Prove that:
(a) If m = n, then g(x) splits over Fpm .
(b) If gcd(m, n) = 1, then g(x) is irreducible in Fpm [x].
45. Let q = pn , f (x) a polynomial in Fq [x], and f ′ (x) the (formal) derivative of
f (x). Prove that f ′ (x) = 0 if and only if f (x) = g(x)p for some g(x) ∈ Fq [x].
46. Let p be an odd prime, n ∈ N, and q = pn . Prove that for every α ∈ Fq ,
48. Modify Algorithm 1.7 in order to compute the order of an element α ∈ F∗q .
You may assume that the complete prime factorization of q − 1 is available.
49. Let α ∈ F*_{p^n}. Prove that the orders of α, α^p, α^{p^2}, . . . , α^{p^{n−1}} are the same. In particular, all conjugates of a primitive element of F_{p^n} are again primitive.
50. Prove that n|φ(pn − 1) for every p ∈ P and n ∈ N.
51. Let q − 1 = p_1^{e_1} · · · p_r^{e_r} be the prime factorization of the size q − 1 of F*_q with each e_i ≥ 1. Prove that
Σ_{α∈F*_q} ord α = ∏_{i=1}^{r} (p_i^{2e_i+1} + 1)/(p_i + 1).
52. [Euler’s criterion for finite fields] Let α ∈ F*_q with q odd. Prove that the equation x^2 = α has a solution in F*_q if and only if α^{(q−1)/2} = 1.
53. [Generalized Euler’s criterion] Let α ∈ F*_q, t ∈ N, and d = gcd(t, q−1). Prove that the equation x^t = α has a solution in F*_q if and only if α^{(q−1)/d} = 1.
54. Prove that for any finite field F_q and for any α ∈ F_q, the equation x^2 + y^2 = α has at least one solution for (x, y) in F_q × F_q.
55. Let γ be a primitive element of F_q, and r ∈ N. Prove that the polynomial x^r − γ has a root in F_q if and only if gcd(r, q − 1) = 1.
56. Prove that the field Q(√2) is not isomorphic to the field Q(√3).
57. Let θ, ψ be two distinct roots of some non-constant irreducible polynomial
f (x) ∈ F [x], and let K = F (θ) and L = F (ψ). Give an example where K = L
as sets. Give another example where K 6= L as sets.
58. Let α ∈ F_{p^n}. The trace and norm of α over F_p are defined respectively as
Tr(α) = α + α^p + α^{p^2} + · · · + α^{p^{n−1}},
N(α) = α × α^p × α^{p^2} × · · · × α^{p^{n−1}}.
(a) Prove that Tr(α), N(α) ∈ F_p.
(b) Prove that if α ∈ F_p, then Tr(α) = nα and N(α) = α^n.
(c) Prove that Tr(α + β) = Tr(α) + Tr(β) and N(αβ) = N(α) N(β) for all α, β ∈ F_{p^n}. (Trace is additive, and norm is multiplicative.)
(d) Prove that Tr(α) = 0 if and only if α = γ^p − γ for some γ ∈ F_{p^n}.
59. Let α ∈ F_{2^n}.
(a) Prove that x^2 + x = α is solvable for x in F_{2^n} if and only if Tr(α) = 0.
(b) Let Tr(α) = 0. If n is odd, prove that α^{2^1} + α^{2^3} + α^{2^5} + · · · + α^{2^{n−2}} is a solution of x^2 + x = α. What is the other solution?
(c) Describe a method to solve the general quadratic equation ax^2 + bx + c = 0 with a, b, c ∈ F*_{2^n} (assume that n is odd).
60. Let F_q be a finite field, and let γ ∈ F*_q be a primitive element. For every α ∈ F*_q, there exists a unique x in the range 0 ≤ x ≤ q − 2 such that α = γ^x. Denote this x by ind_γ α (index of α with respect to γ).
(a) First assume that q is odd. Prove that the equation x^2 = α is solvable in F_q for α ∈ F*_q if and only if ind_γ α is even.
(b) Now, let q = 2^n. In this case, for every α ∈ F_q, there exists a unique β ∈ F_q such that β^2 = α. In fact, β = α^{2^{n−1}}. Suppose that α, β ∈ F*_q, k = ind_γ α, and l = ind_γ β. Express l as an efficiently computable formula in k and q.
61. Let θ_0, θ_1, . . . , θ_{n−1} be elements of F_{p^n}. The discriminant ∆(θ_0, θ_1, . . . , θ_{n−1}) of θ_0, θ_1, . . . , θ_{n−1} is defined as the determinant of the n × n matrix
A =
[Tr(θ_0θ_0)        Tr(θ_0θ_1)        · · ·   Tr(θ_0θ_{n−1})    ]
[Tr(θ_1θ_0)        Tr(θ_1θ_1)        · · ·   Tr(θ_1θ_{n−1})    ]
[    ...               ...           · · ·       ...           ]
[Tr(θ_{n−1}θ_0)    Tr(θ_{n−1}θ_1)    · · ·   Tr(θ_{n−1}θ_{n−1})]
(a) Prove that θ_0, θ_1, . . . , θ_{n−1} constitute a basis of F_{p^n} over F_p if and only if ∆(θ_0, θ_1, . . . , θ_{n−1}) ≠ 0.
(b) Define the matrix B as
B =
[θ_0              θ_1              · · ·   θ_{n−1}          ]
[θ_0^p            θ_1^p            · · ·   θ_{n−1}^p        ]
[    ...              ...          · · ·       ...          ]
[θ_0^{p^{n−1}}    θ_1^{p^{n−1}}    · · ·   θ_{n−1}^{p^{n−1}}]
Prove that B^t B = A, where B^t denotes the transpose of B. Conclude that θ_0, θ_1, . . . , θ_{n−1} constitute a basis of F_{p^n} over F_p if and only if det B ≠ 0.
(c) Let θ ∈ F_{p^n}. Prove that ∆(1, θ, θ^2, . . . , θ^{n−1}) = ∏_{0≤i<j≤n−1} (θ^{p^j} − θ^{p^i})^2.
Programming Exercises
62. Write a GP/PARI function for the Euclidean inverse algorithm in F2n .
63. Write a GP/PARI function for the binary inverse algorithm in F2n .
64. Write a GP/PARI function for the almost inverse algorithm in F2n .
65. Write a GP/PARI function for the Euclidean inverse algorithm in Fpn .
66. Write a GP/PARI function for the binary inverse algorithm in Fpn .
67. Write a GP/PARI function for the almost inverse algorithm in Fpn .
68. Generalize the GP/PARI code of Section 2.4 for checking normal elements so as
to work for any extension F2n .
69. Write GP/PARI functions to compute traces and norms of elements in Fpn . Use
these functions to compute the traces and norms of all elements in F64 .
Chapter 3
Arithmetic of Polynomials
of K[x] is derived from the fact that K[x] is a Euclidean domain, that is, the
concept of Euclidean division and Euclidean gcd holds in K[x].
We start our study with polynomials over finite fields, that is, with the ring
Fq [x] for some q. Next, we look at the polynomial ring Z[x] over integers. In
some sense, the study of Z[x] is the same as the study of Q[x]. Since Q is a field,
the ring Q[x] is an easier object to study than Z[x]. Of course, Q ⊆ R ⊆ C,
and so a study of Q[x] may benefit from a study of R[x] and C[x]. However,
a study of R[x] and C[x] may lead us too far away from our focus of interest.
Indeed, we cannot represent every real (or complex) number in computers.
Every finite representation of real numbers has to be approximate. Algorithmic
issues pertaining to such approximate representations (like convergence and
numerical stability) are not dealt with in this book.
These are still not explicit formulas for Np,n and Nq,m . In order to derive the
explicit formulas, we use an auxiliary result.
Definition 3.3  The Möbius function µ : N → {0, 1, −1} is defined as^1
µ(n) = 1 if n = 1,
µ(n) = 0 if p^2 | n for some p ∈ P,
µ(n) = (−1)^t if n is the product of t ∈ N pairwise distinct primes. ⊳
Example 3.4 The following table lists µ(n) for some small values of n.
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
µ(n) 1 −1 −1 0 −1 1 −1 0 0 1 −1 0 −1 1 1 0
¤
Lemma 3.5  For all n ∈ N, the Möbius function satisfies the identity
Σ_{d|n} µ(d) = 1 if n = 1,  and  Σ_{d|n} µ(d) = 0 if n > 1,
where the sum Σ_{d|n} µ(d) extends over all positive integral divisors d of n.
Proof  Let n = p_1^{e_1} · · · p_t^{e_t} be the prime factorization of n > 1 with pairwise distinct primes p_1, . . . , p_t and with each e_i ∈ N. A divisor d of n is of the form d = p_1^{r_1} · · · p_t^{r_t} with each r_i in the range 0 ≤ r_i ≤ e_i. If some r_i > 1, then µ(d) = 0 by definition. Therefore, µ(d) is non-zero if and only if each r_i ∈ {0, 1}. But then Σ_{d|n} µ(d) = Σ_{(r_1,...,r_t)∈{0,1}^t} (−1)^{r_1+···+r_t} = (1 − 1)^t = 0. ⊳
Proposition 3.6 [Möbius inversion formula]
Let f, g : N → R satisfy f(n) = Σ_{d|n} g(d) for all n ∈ N. Then, g(n) = Σ_{d|n} µ(d) f(n/d) = Σ_{d|n} µ(n/d) f(d).
Proof We have

    Σ_{d|n} µ(d) f(n/d) = Σ_{d|n} µ(d) Σ_{d′|(n/d)} g(d′) = Σ_{(dd′)|n} µ(d) g(d′)
                        = Σ_{d′|n} g(d′) Σ_{d|(n/d′)} µ(d)
                        = g(n) Σ_{d|1} µ(d) + Σ_{d′|n, d′<n} g(d′) Σ_{d|(n/d′)} µ(d) = g(n),

since Σ_{d|1} µ(d) = 1, and since, by Lemma 3.5, the inner sum Σ_{d|(n/d′)} µ(d) vanishes for every divisor d′ < n. ⊳
contributions to number theory and geometry. In addition to Möbius function and Möbius
inversion formula, Möbius is also well-known for Möbius transform and Möbius strip.
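The inversion formula is easy to check numerically in GP/PARI. The lines below are a minimal sketch (the helper names are mine, not from the text), using the built-in functions sumdiv, moebius and eulerphi together with the classical identity n = Σ_{d|n} φ(d):

\\ A quick numerical check of Moebius inversion (sketch, not from the text).
\\ Take g = eulerphi, so that f(n) = sumdiv(n, d, g(d)) = n, and recover g from f.
g(n) = eulerphi(n);
f(n) = sumdiv(n, d, g(d));
check(N) = vector(N, n, sumdiv(n, d, moebius(n/d) * f(d)) == g(n));
\\ check(20) should be a vector of 1s if the inversion formula holds.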
Algorithm 3.1: Checking whether f(x) ∈ Fq[x] with d = deg f(x) ≥ 1 is irreducible
Initialize a temporary polynomial t(x) = x.
For (r = 1; r ≤ ⌊d/2⌋; ++r) {
    Set t(x) = t(x)^q (mod f(x)).
    If (gcd(f(x), t(x) − x) ≠ 1), return False (that is, reducible).
}
Return True (that is, irreducible).
(3) Now, take f (x) = x15 + x4 + 1 ∈ F2 [x]. The iterations of Algorithm 3.1
proceed as follows, and indicate that x15 + x4 + 1 is irreducible in F2 [x].
     r     x^(2^r) (mod f(x))            gcd(x^(2^r) + x, f(x))
     1     x^2                            1
     2     x^4                            1
     3     x^8                            1
     4     x^5 + x                        1
     5     x^10 + x^2                     1
     6     x^9 + x^5 + x^4                1
     7     x^10 + x^8 + x^7 + x^3         1
x^2 + θx = x(x + θ). Thus, gcd(x^4 + x, f(x)) = x + θ, that is, x^6 + θx + θ is reducible. ¤
bits. Thus, the size of the polynomial is O(d log q). An algorithm involving
this polynomial as input is said to run in polynomial time if its running time
is a polynomial in both d and log q.
Let us deduce the running time of Algorithm 3.1. The loop continues for
a maximum of ⌊d/2⌋ = O(d) times. Each iteration of the loop involves a
modular exponentiation followed by a gcd calculation. The exponentiation is
done modulo f (x), that is, the degrees of all intermediate products are kept
at values < d. The exponent is q, that is, square-and-multiply exponentiation
makes O(log q) iterations only. In short, each exponentiation in Algorithm 3.1
requires O(d2 log q) field operations. The gcd calculation involves at most d
Euclidean divisions with each division requiring O(d2 ) operations in Fq . This
is actually an overestimate—Euclidean gcd requires O(d2 ) field operations
only. The arithmetic of Fq can be implemented to run in O(log2 q) time per
operation (schoolbook arithmetic). To sum up, Algorithm 3.1 runs in time
O(d^3 log^3 q), which is polynomial in both d and log q.
gp > \
isirr(f,p) = \
local (t,g); \
t = Mod(1,p) * x; \
for (r=1, floor(poldegree(f)/2), \
t = (t^p)%f; \
g = gcd(t-Mod(1,p)*x,f); \
print(lift(g)); \
if (g-Mod(1,p), print("Not irreducible"); return(0)) \
); \
print("Irreducible"); return(1)
gp > isirr(Mod(1,2)*x^8 + Mod(1,2)*x^3 + Mod(1,2), 2)
1
1
x^3 + x + 1
Not irreducible
%1 = 0
gp > isirr(Mod(1,2)*x^12 + Mod(1,2)*x^11 + Mod(1,2)*x^9 + Mod(1,2)*x^8 + \
Mod(1,2)*x^6 + Mod(1,2)*x^3 + Mod(1,2), 2)
1
1
x^6 + x^5 + x^4 + x^3 + x^2 + x + 1
Not irreducible
%2 = 0
gp > isirr(Mod(1,2)*x^15 + Mod(1,2)*x^4 + Mod(1,2), 2)
1
1
1
1
1
128 Computational Number Theory
1
1
Irreducible
%3 = 1
gp > isirr(Mod(1,7)*x^15 + Mod(1,7)*x^4 + Mod(1,7), 7)
1
1
1
1
1
4*x^6 + 6*x^3 + x^2 + x + 6
Not irreducible
%4 = 0
of Fq , and output all those α ∈ Fq for which f (α) = 0. This algorithm takes
time proportional to q, and is impractical for large q.
We follow an alternative strategy. We try to split f (x) as f (x) = f1 (x)f2 (x)
in Fq [x]. If deg f1 = 0 or deg f2 = 0, this split is called trivial. A non-trivial
split gives two factors of f (x) of strictly smaller degrees. We subsequently try
to split f1 (x) and f2 (x) non-trivially. This process is repeated until f (x) is
split into linear factors. I now describe a strategy2 to split f (x).
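The random-splitting idea (compare Algorithm 3.2 and Exercise 3.8) can be sketched in GP/PARI as follows. This is only an illustrative sketch under my own naming (rootsq is not a function of the text): it assumes an odd prime p and a monic f with Mod(.,p) coefficients that is square-free and a product of linear factors.

\\ Roots of f in F_p[x] by random splitting (illustrative sketch).
rootsq(f, p) =
{
  my(d = poldegree(f), a, g, f1);
  if (d <= 0, return ([]));
  if (d == 1, return ([ lift(-polcoeff(f,0) / polcoeff(f,1)) ]));
  until (0 < poldegree(f1) && poldegree(f1) < d,
    a  = random(p);
    g  = lift(Mod(Mod(1,p)*(x + a), f)^((p-1)/2) - 1);   \\ (x+a)^((p-1)/2) - 1 mod f
    f1 = gcd(f, g));
  return (concat(rootsq(f1, p), rootsq(f/f1, p)));
}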
Example 3.10 (1) Let us find the roots of f (x) = x15 +x12 +2x6 +3x4 +6x3 +
3x+4 ∈ F73 [x]. We first compute t(x) ≡ x73 −x ≡ 48x14 +4x13 +23x12 +14x11 +
43x10 + 6x9 + 9x8 + 72x7 + 65x6 + 67x5 + 33x4 + 39x3 + 24x2 + 30 (mod f (x)),
and replace f (x) by gcd(f (x), t(x)) = x6 + 17x5 + 4x4 + 67x3 + 17x2 + 4x + 66.
Therefore, f (x) has six linear factors and so six roots in F73 .
2 This algorithm is first described in: Michael O. Rabin, Probabilistic algorithms in finite fields, SIAM Journal on Computing, 9(2), 273–280, 1980.
To sum up, all the roots of f (x) are 9, 30, 32, 65, 67, 72 modulo 73. Indeed,
of f (x). If 0 < deg f1 < deg f , then f1 (x) and the cofactor f2 (x) = f (x)/f1 (x)
are split recursively. Algorithm 3.3 elaborates this idea, and assumes that the
input polynomial f (x) is square-free and a product of linear factors.
(x + α)8 ≡ θ2 x + (θ3 + θ + 1) (mod f (x)). This produces the split of f (x) into
f1 (x) = gcd(f (x), g(x)) = x+(θ2 +θ) and f2 (x) = f (x)/f1 (x) = x+(θ3 +θ+1),
yielding the roots θ2 + θ and θ3 + θ + 1. ¤
Algorithm 3.3 looks attractive, but has problems and cannot be used with-
out modifications. We discuss this again in Section 3.3.3. See Exercise 3.10 too.
as the polynomial
The factor within square brackets is divisible by fi(x) if and only if p|si.
Therefore, gcd(f(x), f′(x)) = α f1(x)^t1 f2(x)^t2 · · · fl(x)^tl, where each ti = si or
si − 1 according as p|si or not. That is, f(x)/gcd(f(x), f′(x)) is a
divisor of f1(x)f2(x) · · · fl(x) and is, therefore, square-free. ⊳
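In GP/PARI the square-free part described here is a one-liner. The sketch below (my naming, not the text's) uses the built-in deriv and assumes that f has Mod(.,p) coefficients and is not a p-th power, so that gcd(f, f′) is a proper divisor of f:

\\ Square-free part f / gcd(f, f') of a monic f in F_p[x] (sketch).
sqfreepart(f) = f / gcd(f, deriv(f));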
f2 (x) = gcd(f (x), x4 + x) = 1, that is, f (x) does not contain any quadratic
factors also. Since f3 (x) = gcd(f (x), x8 + x) = x6 + x5 + x4 + x3 + x2 + x + 1,
we discover that f (x) has two irreducible cubic factors. We replace f (x) by
f (x)/f3 (x) = x14 +x13 +x11 +x10 +x9 +x8 +x7 +x6 +x5 +x4 +x3 +x+1. Now,
we compute x16 + x ≡ x11 + x10 + x9 + x8 + x7 + x6 + x5 + x + 1 (mod f (x)),
where f (x) is the reduced polynomial of degree 14 mentioned above. Since
f4 (x) = gcd(f (x), (x16 + x) (mod f (x))) = x4 + x3 + x2 + x + 1, f (x) contains
a single irreducible factor of degree four, and we replace f (x) by f (x)/f4 (x) =
x10 +x8 +x7 +x5 +x3 +x2 +1. We subsequently compute x32 +x ≡ 0 (mod f (x)),
that is, f5 (x) = gcd(f (x), (x32 +x) (mod f (x))) = x10 +x8 +x7 +x5 +x3 +x2 +1,
that is, f (x) has two irreducible factors of degree five, and we replace f (x)
by f (x)/f5 (x) = 1, and the distinct-degree factorization loop terminates. We
have, therefore, obtained the following factorization of f (x).
f1 (x) = 1,
f2 (x) = 1,
f3 (x) = x6 + x5 + x4 + x3 + x2 + x + 1,
f4 (x) = x4 + x3 + x2 + x + 1,
f5 (x) = x10 + x8 + x7 + x5 + x3 + x2 + 1.
We need to factor f3 (x) and f5 (x) in order to obtain the complete factorization
of f (x). This is accomplished in the next stage.
(2) Now, let f(x) = x^10 + (θ + 1)x^9 + (θ + 1)x^8 + x^7 + θx^6 + (θ + 1)x^4 +
θx^3 + θx^2 + (θ + 1)x ∈ F4[x], where F4 = F2(θ) with θ^2 + θ + 1 = 0.
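A distinct-degree factorization of this kind can be sketched in GP/PARI as follows. The helper name ddf and the details are mine (this is not the text's Algorithm 3.4); the sketch assumes a prime p and a monic square-free f, and returns the polynomials f1(x), f2(x), . . . described above.

\\ Distinct-degree factorization over F_p (illustrative sketch).
ddf(f, p) =
{
  my(res = List(), t = Mod(1,p)*x, g);
  f = Mod(1,p)*f;
  for (r = 1, poldegree(f),
    if (poldegree(f) <= 0, break());
    t = lift(Mod(t, f)^p);            \\ t = x^(p^r) mod (the current) f
    g = gcd(f, t - Mod(1,p)*x);       \\ product of the irreducible factors of degree r
    listput(res, g);
    f = f / g);
  return (Vec(res));
}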
Example 3.15 Let us factor f (x) = x9 +3x8 +3x7 +2x6 +2x+2 ∈ F5 [x]. It is
given that f (x) is the product of three cubic irreducible polynomials of F5 [x].
For α = 2 ∈ Fq, we have g(x) ≡ (x+2)^((5^3−1)/2) − 1 ≡ (x+2)^62 − 1 ≡ 3x^8 + 3x^7 +
x^6 + 2x^5 + 3x^4 + 3x^3 + 3x^2 + x + 4 (mod f(x)), and f1(x) = gcd(f(x), g(x)) =
x6 + 2x5 + x4 + 4x3 + 2x2 + x + 1. The cofactor f (x)/f1 (x) = x3 + x2 + 2 is
an irreducible factor of f (x). The other two factors are obtained from f1 (x).
For α = 0, g1 (x) ≡ x62 − 1 ≡ 0 (mod f1 (x)), that is, gcd(f1 (x), g1 (x)) =
f1 (x) is a trivial factor of f1 (x).
For α = 3, we obtain g1 (x) ≡ (x + 3)62 − 1 ≡ 4x4 + 4x2 + 4x (mod f1 (x)),
and gcd(f1 (x), g1 (x)) = x3 + x + 1 is an irreducible factor of f1 (x). The other
factor of f1 (x) is f1 (x)/(x3 + x + 1) = x3 + 2x2 + 1. Thus, f factors as
Example 3.17 (1) Let us try to split f (x) = x16 + 2x15 + x14 + x13 + 2x12 +
x11 + x10 + 2x6 + x5 + 2x4 + 2x2 + 2x + 1 ∈ F3 [x] which is known to be a
product of four irreducible factors each of degree four. We try all three values
of α ∈ F3 in Algorithm 3.5. The splits obtained are listed below.
Here gα(x) ≡ (x + α)^40 − 1 (mod f(x)), fα1(x) = gcd(f(x), gα(x)), and fα2(x) = f(x)/fα1(x).

    α = 0:  gα(x) = 0,   fα1(x) = f(x),   fα2(x) = 1.
    α = 1:  gα(x) = 2x^13 + x^12 + x^8 + 2x^7 + 2x^6 + 2x^5 + 2x^4 + 2x^3 + 2x + 1,
            fα1(x) = x^4 + x^2 + 2x + 1,
            fα2(x) = x^12 + 2x^11 + 2x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + 1.
    α = 2:  gα(x) = x^14 + 2x^13 + 2x^12 + 2x^11 + 2x^8 + 2x^7 + 2x^5 + x^3 + x^2 + x,
            fα1(x) = x^8 + x^7 + 2x^6 + 2x^5 + x^4 + 2x^3 + 2x^2 + x + 1,
            fα2(x) = x^8 + x^7 + x^6 + 2x^5 + x^4 + 2x^2 + x + 1.
Algorithm 3.5, however, cannot complete the factorization of f(x), since it fails to split the factor of degree eight for all values of α ∈ F3.
(2) Let us try to factor f (x) = x15 + x7 + x3 + x + 1 ∈ F2 [x] using Algo-
rithm 3.6. It is given that f (x) is the product of three irreducible polynomials
each of degree five. We compute g(x) ≡ (x + α) + (x + α)2 + (x + α)4 +
(x + α)8 + (x + α)16 (mod f (x)) for α = 0, 1. For α = 0 we get g(x) = 0
so that gcd(f (x), g(x)) = f (x), whereas for α = 1 we get g(x) = 1 so that
gcd(f (x), g(x)) = 1. So Algorithm 3.6 fails to split f (x) at all.
(3) Splitting may fail even in a case where the field can supply more
elements than the number of irreducible factors of f (x). By Example 3.14(2),
f (x) = x6 + (θ + 1)x4 + θx2 + (θ + 1)x + 1 ∈ F4 [x] is a product of two
cubic irreducible factors, where F4 = F2 (θ) with θ2 + θ + 1 = 0. The values of
g(x) ≡ (x+α)+(x+α)2 +(x+α)4 +(x+α)8 +(x+α)16 +(x+α)32 (mod f (x))
are listed below for all α ∈ F4 . No value of α ∈ F4 can split f (x) non-trivially.
Example 3.18 Let us now handle the failed attempts of Example 3.17.
(1) We factor f (x) = x8 + x7 + 2x6 + 2x5 + x4 + 2x3 + 2x2 + x + 1 ∈ F3 [x]
which is known to be a product of two irreducible factors of degree four.
Choose u(x) = x2 + x + 2 for which g(x) ≡ u(x)40 − 1 ≡ 2x7 + 2x5 + x4 + 2x3 +
2x2 + 2x + 2 (mod f (x)). This yields the non-trivial factor gcd(f (x), g(x)) =
x4 +x2 +x+1. The other factor of f (x) is f (x)/(x4 +x2 +x+1) = x4 +x3 +x2 +1.
Therefore, we have the equal-degree factorization
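The general splitting step, with a random polynomial u(x) in place of x + α, can be sketched in GP/PARI as below. This is only an illustration under my own naming (it is not the text's Algorithm 3.6): it assumes an odd prime p and a monic square-free f with Mod(.,p) coefficients, all of whose irreducible factors have the same degree r.

\\ Equal-degree splitting with a random u(x) of degree < 2r (illustrative sketch).
edfsplit(f, p, r) =
{
  my(d = poldegree(f), u, g, f1);
  if (d == r, return ([f]));                        \\ f is already irreducible
  until (0 < poldegree(f1) && poldegree(f1) < d,
    u  = Mod(1,p) * (x^(2*r-1) + sum(i = 0, 2*r-2, random(p) * x^i));
    g  = lift(Mod(u, f)^((p^r - 1)/2) - 1);         \\ u^((p^r - 1)/2) - 1 mod f
    f1 = gcd(f, g));
  return (concat(edfsplit(f1, p, r), edfsplit(f/f1, p, r)));
}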
The root-finding and factoring algorithms for polynomials over finite fields,
as discussed above, are randomized. The best deterministic algorithms known
for these problems have running times fully exponential in log q (the size of
the underlying field). The best known deterministic algorithm for factoring a
polynomial of degree d in Fq[x] is due to Shoup7, and is shown by Shparlinski8
to run in O(q^(1/2) (log q) d^(2+ε)) time, where d^ε stands for a polynomial in log d.
Computations over finite fields exploit randomization very effectively.
gp > lift(lift(factorff( \
x^10+(t+1)*x^9+(t+1)*x^8+x^7+t*x^6+(t+1)*x^4+t*x^3+t*x^2+(t+1)*x, \
2, t^2+t+1)))
%10 =
[x 1]
[x + t 1]
[x^2 + x + t 1]
[x^3 + x + 1 1]
[x^3 + t*x + 1 1]
[x^3 + t*x + (t + 2) 3]
Lemma 3.20 Let f (x), g(x) ∈ Z[x] be non-zero. Then, cont(f (x)g(x)) =
(cont f (x))(cont g(x)). In particular, the product of two primitive polynomials
is again primitive.
9 This primitive polynomial has nothing to do with the primitive polynomial of Defini-
tion 2.30. The same term used for describing two different objects may create confusion.
But we have to conform to conventions.
Proof Let f(x) = Σ_{i=0}^{m} ai x^i and g(x) = Σ_{j=0}^{n} bj x^j with a = cont f(x) and
b = cont g(x). Write f(x) = a f̄(x) and g(x) = b ḡ(x), where f̄(x), ḡ(x) are
primitive polynomials. Since f(x)g(x) = ab f̄(x)ḡ(x), it suffices to show that
the product of two primitive polynomials is primitive, and we assume without
loss of generality that f(x), g(x) are themselves primitive (that is, a = b = 1).
We proceed by contradiction. Assume that f(x)g(x) is not primitive, that is,
there exists a prime p | cont(f(x)g(x)), that is, p divides every coefficient of
f(x)g(x). Since f(x) is primitive, not all coefficients of f(x) are divisible by p.
Let s be the smallest non-negative integer for which p ∤ as. Analogously, let t
be the smallest non-negative integer for which p ∤ bt. The coefficient of x^(s+t) in
f(x)g(x) is as bt + (as−1 bt+1 + as−2 bt+2 + · · ·) + (as+1 bt−1 + as+2 bt−2 + · · ·). By
the choice of s and t, the prime p divides as−1, as−2, . . . , a0 and bt−1, bt−2, . . . , b0,
that is, p divides as bt, a contradiction, since p ∤ as and p ∤ bt. ⊳
Theorem 3.21 Let f (x) ∈ Z[x] be a primitive polynomial. Then, f (x) is
irreducible in Z[x] if and only if f (x) is irreducible in Q[x].
Proof The “if” part is obvious. For proving the “only if” part, assume
that f (x) = g(x)h(x) is a non-trivial factorization of f (x) in Q[x]. We can
write g(x) = aḡ(x) and h(x) = bh̄(x) with a, b ∈ Q∗ and with primitive
polynomials ḡ(x), h̄(x) ∈ Z[x]. We have f (x) = abḡ(x)h̄(x). Since f (x) and
ḡ(x)h̄(x) are primitive polynomials in Z[x], we must have ab = ±1, that is,
f (x) = (abḡ(x))(h̄(x)) is a non-trivial factorization of f (x) in Z[x]. ⊳
A standard way to determine the irreducibility or otherwise of a primitive
polynomial f (x) in Z[x] is to factor f (x) in Z[x] or Q[x]. In Section 3.5,
we will study some algorithms for factoring polynomials in Z[x]. There are
certain special situations, however, when we can confirm the irreducibility of
a polynomial in Z[x] more easily than factoring the polynomial.
Theorem 3.22 [Eisenstein's criterion] Let f(x) = a0 + a1 x + · · · + ad x^d ∈
Z[x] be a primitive polynomial, and p a prime that divides a0, a1, . . . , ad−1,
but not ad. Suppose also that p^2 ∤ a0. Then, f(x) is irreducible.
Proof Suppose that f(x) = g(x)h(x) is a non-trivial factorization of f(x) in
Z[x], where g(x) = Σ_{i=0}^{m} bi x^i and h(x) = Σ_{j=0}^{n} cj x^j. Since f(x) is primitive,
g(x) and h(x) are primitive too. We have a0 = b0 c0. By hypothesis, p|a0 but
p^2 ∤ a0, that is, p divides exactly one of b0 and c0. Let p|c0. Since h(x) is
primitive, not all cj are divisible by p. Let t be the smallest positive integer
for which p ∤ ct. We have t ≤ deg h(x) = d − deg g(x) < d, that is, at is
divisible by p. But at = b0 ct + b1 ct−1 + b2 ct−2 + · · ·. By the choice of t, all
the coefficients ct−1, ct−2, . . . , c0 are divisible by p. It follows that p|b0 ct, but
p ∤ b0 and p ∤ ct, a contradiction. ⊳
Example 3.23 I now prove that f(x) = 1 + x + x^2 + · · · + x^(p−1) ∈ Z[x] is
irreducible for p ∈ P. Evidently, f(x) is irreducible if and only if f(x+1) is. But
f(x) = (x^p − 1)/(x − 1), so f(x + 1) = ((x + 1)^p − 1)/((x + 1) − 1) =
x^(p−1) + C(p,1) x^(p−2) + C(p,2) x^(p−3) + · · · + C(p,p−1), where C(p,i) denotes a
binomial coefficient; this polynomial satisfies Eisenstein's criterion. ¤
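For p ∈ P, the polynomial 1 + x + · · · + x^(p−1) is the p-th cyclotomic polynomial, so the claim can be cross-checked with built-in GP/PARI functions. The following is only a sketch for one prime (the choice p = 13 is just an example):

\\ Cross-checking Example 3.23 for one prime (sketch).
p = 13;
f = polcyclo(p);              \\ 1 + x + ... + x^(p-1)
polisirreducible(f)           \\ expected to return 1
subst(f, x, x + 1)            \\ Eisenstein applies: all non-leading coefficients divisible by p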
called the Sylvester matrix10 of f (x) and g(x), that is, Res(f (x), g(x)) =
det Syl(f (x), g(x)). If f (x) = 0 or g(x) = 0, we define Res(f (x), g(x)) = 0. ⊳
Some elementary properties of resultants are listed now.
Proposition 3.27 Let f (x), g(x) ∈ K[x] be as in Definition 3.26.
(1) Res(g(x), f(x)) = (−1)^(mn) Res(f(x), g(x)).
(2) Let m ≥ n, and r(x) = f(x) rem g(x) ≠ 0. Then, Res(f(x), g(x)) =
(−1)^(mn) bn^(m − deg r) Res(g(x), r(x)). In particular, resultants can be computed using
the Euclidean gcd algorithm for polynomials.
(3) Let α1, α2, . . . , αm be the roots of f(x), and β1, β2, . . . , βn the roots of
g(x) (in some extension of K). Then, we have

    Res(f(x), g(x)) = am^n ∏_{i=1}^{m} g(αi) = (−1)^(mn) bn^m ∏_{j=1}^{n} f(βj) = am^n bn^m ∏_{i=1}^{m} ∏_{j=1}^{n} (αi − βj).
In particular, Res(f (x), g(x)) = 0 if and only if f (x) and g(x) have a non-
trivial common factor (in K[x]). ⊳
Example 3.28 (1) For K = Q, f (x) = 2x3 + 1, and g(x) = x2 − 2x + 3,
                        | 2    0    0    1    0 |
                        | 0    2    0    0    1 |
    Res(f(x), g(x)) =   | 1   −2    3    0    0 |  = 89.
                        | 0    1   −2    3    0 |
                        | 0    0    1   −2    3 |
We can also compute this resultant by Euclidean gcd:

    r(x) = f(x) rem g(x) = 2x − 11,
    s(x) = g(x) rem r(x) = 89/4.

Therefore,

    Res(f(x), g(x)) = (−1)^6 · 1^2 · Res(g(x), r(x)) = Res(g(x), r(x))
                    = (−1)^2 · 2^2 · Res(r(x), s(x)) = 4 Res(r(x), s(x)) = 4 × (89/4) = 89.
10 This is named after the English mathematician James Joseph Sylvester (1814–1897).
gp > polresultant(2*x^3+1,x^2-2*x+3)
%1 = 89
gp > polresultant(x^4+x^2+1,x^3+1)
%2 = 0
gp > polresultant(Mod(2,89)*x^3+Mod(1,89),Mod(1,89)*x^2-Mod(2,89)*x+Mod(3,89))
%3 = 0
gp > poldisc(a*x^2+b*x+c)
%4 = -4*c*a + b^2
gp > poldisc(a*x^3+b*x^2+c*x+d)
%5 = -27*d^2*a^2 + (18*d*c*b - 4*c^3)*a + (-4*d*b^3 + c^2*b^2)
gp > poldisc(4*x^4+5*x-8)
%6 = -8658608
gp > poldisc(Mod(4,7)*x^4+Mod(5,7)*x-Mod(8,7))
%7 = Mod(0, 7)
It remains to verify that the lifted polynomials gn+1 (x), hn+1 (x) continue
to satisfy the properties (a)–(d). By construction, gn+1 (x) ≡ gn (x) (mod pn ),
whereas by the induction hypothesis gn (x) ≡ g1 (x) (mod p), so gn+1 (x) ≡
g1 (x) (mod p). Analogously, hn+1 (x) ≡ h1 (x) (mod p). So Property (a) holds.
By construction, Res(gn+1(x), hn+1(x)) ≡ Res(gn(x), hn(x)) (mod p), and
by the induction hypothesis, p ∤ Res(gn(x), hn(x)), that is, Property (b) holds.
Now, deg un (x) < deg g1 (x), deg gn (x) = deg g1 (x), and gn (x) is monic.
It follows that gn+1 (x) = gn (x) + pn un (x) is monic too, with degree equal to
deg g1 (x). Thus, Property (c) holds.
For proving Property (d), assume that deg hn+1 (x) < deg h1 (x). But Prop-
erty (a) implies that f (x) ≡ gn+1 (x)hn+1 (x) (mod p) has a degree less than
the degree of g1 (x)h1 (x) (mod p), contradicting (3) and (4). ⊳
Example 3.34 We start with the following values.
f (x) = 35x5 − 22x3 + 10x2 + 3x − 2 ∈ Z[x],
p = 13,
g1 (x) = x2 + 2x − 2 ∈ Z[x], and
h1 (x) = −4x3 − 5x2 + 6x + 1 ∈ Z[x].
Let us first verify that the initial conditions (1)–(4) in Theorem 3.33 are
satisfied for these choices. We have Res(g1 (x), h1 (x)) = 33 = 2 × 13 + 7, that
is, Condition (1) holds. Clearly, Conditions (2) and (3) hold. Finally, f (x) −
g1 (x)h1 (x) = 39x5 + 13x4 − 26x3 − 13x2 + 13x = 13 × (3x5 + x4 − 2x3 − x2 + x),
that is, Condition (4) is satisfied.
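These initial conditions can be verified directly in GP/PARI with built-in functions (a small sketch; the variable names are mine, and the expected values quoted in the comments are those stated above):

\\ Verifying the initial data of Example 3.34 (sketch).
f  = 35*x^5 - 22*x^3 + 10*x^2 + 3*x - 2;
g1 = x^2 + 2*x - 2;
h1 = -4*x^3 - 5*x^2 + 6*x + 1;
polresultant(g1, h1)        \\ 33, which is not divisible by p = 13
(f - g1*h1) / 13            \\ 3*x^5 + x^4 - 2*x^3 - x^2 + x, so p | (f - g1*h1)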
We now lift the factorization g1 (x)h1 (x) of f (x) modulo p to a factorization
g2 (x)h2 (x) of f (x) modulo p2 . First, compute w1 (x) = (f (x)−g1 (x)h1 (x))/p =
3x5 + x4 − 2x3 − x2 + x. Then, we attempt to find the polynomials u1 (x) =
u11 x + u10 and v1 (x) = v13 x3 + v12 x2 + v11 x + v10 satisfying
v1 (x)g1 (x) + u1 (x)h1 (x) ≡ w1 (x) (mod p).
Expanding the left side of this congruence and equating the coefficients of xi ,
i = 5, 4, 3, 2, 1, 0, from both sides give the linear system
    v13 ≡ 3,
    2v13 + v12 − 4u11 ≡ 1,
    −2v13 + 2v12 + v11 − 5u11 − 4u10 ≡ −2,
    −2v12 + 2v11 + v10 + 6u11 − 5u10 ≡ −1,
    −2v11 + 2v10 + u11 + 6u10 ≡ 1,
    −2v10 + u10 ≡ 0   (mod p),
that is, the system
    [  1    0    0    0    0    0 ] [ v13 ]     [  3 ]
    [  2    1    0    0   −4    0 ] [ v12 ]     [  1 ]
    [ −2    2    1    0   −5   −4 ] [ v11 ]  ≡  [ −2 ]    (mod 13).
    [  0   −2    2    1    6   −5 ] [ v10 ]     [ −1 ]
    [  0    0   −2    2    1    6 ] [ u11 ]     [  1 ]
    [  0    0    0   −2    0    1 ] [ u10 ]     [  0 ]
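Such a system can be solved directly in GP/PARI with the built-in matsolvemod. The following sketch just copies the matrix and right-hand side from the display above:

\\ Solving the linear system modulo 13 (sketch).
M = [ 1, 0, 0, 0, 0, 0; \
      2, 1, 0, 0,-4, 0; \
     -2, 2, 1, 0,-5,-4; \
      0,-2, 2, 1, 6,-5; \
      0, 0,-2, 2, 1, 6; \
      0, 0, 0,-2, 0, 1];
b = [3, 1, -2, -1, 1, 0]~;
matsolvemod(M, 13, b)       \\ a solution (v13, v12, v11, v10, u11, u10)~ modulo 13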
polynomial g1 (x) = fi1 (x)fi2 (x) · · · fik (x) ∈ Fp [x] which is a potential re-
duction of g(x) modulo p. We then compute h1 (x) = f (x)/g1 (x) in Fp [x].
Since f (x) is square-free modulo p, we have gcd(g1 (x), h1 (x)) = 1, that is,
Res(g1 (x), h1 (x)) is not divisible by p.
We then lift the factorization f (x) ≡ g1 (x)h1 (x) (mod p) to the (unique)
factorization f (x) ≡ gn (x)hn (x) (mod pn ) using Theorem 3.33. The poly-
nomial gn (x) is monic, and is represented so as to have coefficients between
−p^n/2 and p^n/2. We choose n large enough to satisfy p^n/2 > H(g). Then,
gn(x) in this representation can be identified with a polynomial in Z[x].
(Notice that g(x) ∈ Z[x] may have negative coefficients. This is why we
kept the coefficients of gn(x) between −p^n/2 and p^n/2.) Proposition 3.25
gives H(g) ≤ 2^⌊d/2⌋ √(d+1) H(f), that is, we choose the smallest n satisfying
p^n/2 > 2^⌊d/2⌋ √(d+1) H(f) so as to ascertain p^n/2 > H(g).
Once gn (x) is computed, we divide f (x) by gn (x) in Z[x] (actually, Q[x]).
If r(x) = f (x) rem gn (x) = 0, we have detected a divisor g(x) = gn (x) ∈ Z[x]
of f (x). We then recursively factor g(x) and the cofactor h(x) = f (x)/g(x).
Let us finally remove the restriction that f (x) is monic. Let a ∈ Z
be the leading coefficient of f (x). We require the reduction of f (x) mod-
ulo p to be of degree d = deg f , that is, we require p 6 | a. Factoring
f (x) in Fp [x] then gives f (x) ≡ af1 (x)f2 (x) · · · ft (x) (mod p) with distinct
monic irreducible polynomials f1 , f2 , . . . , ft in Fp [x]. We start with a divisor
g1 (x) = fi1 (x)fi2 (x) · · · fik (x) of f (x) in Fp [x] with deg g1 6 ⌊d/2⌋, and set
h1 (x) = f (x)/g1 (x) ∈ Fp [x]. Here, g1 (x) is monic, and h1 (x) has leading coef-
ficient a (mod p). Using Hensel’s lifting, we compute gn (x), hn (x) with gn (x)
monic and f (x) ≡ gn (x)hn (x) (mod pn ).
A divisor g(x) of f (x) in Z[x] need not be monic. However, the leading co-
efficient b of g(x) must divide a. Multiplying g(x) by a/b gives the polynomial
(a/b)g(x) with leading coefficient equal to a. Moreover, (a/b)g(x) must di-
vide f (x) in Q[x]. Therefore, instead of checking whether gn (x) divides f (x),
we now check whether a·gn(x) divides f(x) in Q[x]. That is, a·gn(x) is now
identified with the polynomial (a/b)g(x). Since H((a/b)g(x)) ≤ H(a g(x)) =
|a| H(g) ≤ |a| 2^⌊d/2⌋ √(d+1) H(f) ≤ 2^⌊d/2⌋ √(d+1) H(f)^2, we now choose n so
that p^n/2 > 2^⌊d/2⌋ √(d+1) H(f)^2.
Algorithm 3.9 summarizes all these observations in order to arrive at an
algorithm for factoring polynomials in Z[x].
The search for potential divisors g(x) of f (x) starts by selecting g1 (x) from
Subsequently, we compute ḡ4(x) = a g4(x) = 35x − 5038 ∈ Z_(13^4)[x], g(x) =
ḡ4(x)/cont(ḡ4(x)) = 35x − 5038 ∈ Z[x], and r(x) = f(x) rem g(x) =
3245470620554884228/1500625 ∈ Q[x]. Since r(x) ≠ 0, g(x) is not a factor of f(x).
[Figure 3.1: Two bases of the same two-dimensional lattice, drawn in the x–y plane.]
The quantity µi,j b∗j in Algorithm 3.10 is the component of bi in the direc-
tion of the vector b∗j . When all these components are removed from bi , the
vector b∗i becomes orthogonal to the vectors b∗1 , b∗2 , . . . , b∗i−1 computed so far.
The multipliers µi,j are not necessarily integers, so b∗i need not belong to
the lattice generated by b1 , b2 , . . . , bn . Moreover, if the vectors b1 , b2 , . . . , bn
are already orthogonal to one another, we have b∗i = bi for all i = 1, 2, . . . , n.
The notion of near-orthogonality is captured by the following definition.
Here, |µ2,1| > 1/2, so Condition (3.1) is not satisfied. Moreover, |c*2 + µ2,1 c*1|^2 =
|c2|^2 = 4^2 + 3^2 = 25, whereas |c*1|^2 = 7^2 + 3^2 = 58, that is, Condition (3.2)
too is not satisfied. The basis c1, c2 is, therefore, not reduced. ¤
Handling the violation of the second condition is a bit more involved. Let us
denote |b∗i |2 by Bi . Since the vectors b∗i are pairwise orthogonal, the violation
of Condition (3.2) for some k in the range 2 ≤ k ≤ n can be rephrased as:

    Bk + µ_{k,k−1}^2 Bk−1 < (3/4) Bk−1.
Now, we swap bk−1 and bk . This replaces the vector b∗k−1 by (the old vector)
b∗k +µk,k−1 b∗k−1 . Moreover, b∗k (and µk,k−1 ) are so updated that the new value
of b∗k + µk,k−1 b∗k−1 equals the old vector b∗k−1 . Consequently, Condition (3.2)
is restored at k. The updating operations are given in Algorithm 3.12.
It turns out that di is the square of the volume of the fundamental region
associated with the i-dimensional lattice generated by b1 , b2 , . . . , bi , that is,
    di = ∏_{j=1}^{i} |b*j|^2 ≤ ∏_{j=1}^{i} |bj|^2 ≤ B^i,
This gives the squared norm values as (we have B1 B2 = d(L)^2, as expected):

    B1 = 7^2 + 3^2 = 58,    B2 = (27/58)^2 + (63/58)^2 = 81/58.
In the first iteration of the while loop of Algorithm 3.13, we have t = 2,
and the condition on µ2,1 is violated (we have |µ2,1| > 1/2). The integer closest
to µ2,1 = 37/58 is 1. So we replace b2 by b2 − b1 = (−3, 0), and µ2,1 by µ2,1 − 1 =
−21/58. The values of B1 and B2 do not change by this adjustment.
We now have B2 = 81/58, and (3/4 − µ2,1^2)B1 = 1041/29, that is, Condition (3.2)
is violated for t = 2. So we invoke Algorithm 3.12. We first compute B =
B2 + µ2,1^2 B1 = 9, change µ2,1 to µ2,1 B1/B = −7/3, set B2 = B1 B2/B = 9 and
B1 = B = 9. Finally, we swap b1 and b2, that is, we now have b1 = (−3, 0)
and b2 = (7, 3). Since t = 2, we do not decrement t.
In the second iteration of the while loop, we first discover that |µ2,1| = 7/3
is again too large. The integer closest to µ2,1 is −2. So we replace b2 by
b2 + 2b1 = (1, 3), and µ2,1 by µ2,1 + 2 = −1/3.
Since B2 = 9 > 23/4 = (3/4 − µ2,1^2)B1, we do not swap b1 and b2. Moreover,
there are no µt,j values to take care of. So t is increased to three, and the
algorithm terminates. The computed reduced basis consists of the vectors
b1 = (−3, 0) and b2 = (1, 3). Compare this with the basis in Figure 3.1(a). ¤
Proposition 3.42 Suppose that Condition (3.3) is satisfied. Let g(x) ∈ Z[x]
be the desired irreducible factor of f(x). Then,

    deg g(x) ≤ m if and only if |b1| < (p^(kl)/|f|^m)^(1/d).

Let t ≥ 1 be the largest integer for which |bt| < (p^(kl)/|f|^m)^(1/d). Then, we have
deg g(x) = m + 1 − t and, more importantly, g(x) = gcd(b1(x), b2(x), . . . , bt(x)).
Moreover, in this case, |bi| < (p^(kl)/|f|^m)^(1/d) for all i = 1, 2, . . . , t. ⊳
Some comments on the L3 factoring algorithm are now in order. First, let
me prescribe a way to fix the parameters. Since the irreducible factor g(x)
may be of degree as large as d − 1, it is preferable to start with m = d − 1.
For this choice, we compute the right side of Condition (3.3). A prime p and a
positive integer k are then chosen to satisfy this condition (perhaps for the most
pessimistic case l = 1). The choice k = 1 is perfectly allowed. Even if some
k > 1 is chosen, lifting the factorization of f (x) modulo p to the factorization
modulo pk is an easy effort. Factoring f (x) modulo p can also be efficiently
done using the randomized algorithm described in Section 3.3.
If f (x) remains irreducible modulo p, we are done. Otherwise, we choose
any irreducible factor of f (x) modulo pk as γ(x). Under the assumption that p
does not divide Discr(f ), no factor of f modulo p has multiplicity larger than
one. The choice of γ fixes l, and we may again investigate for which values of
m, Condition (3.3) holds. We may start with any such value of m. However,
the choice m = d − 1 is always safe, since a value of m smaller than the degree
of g(x) forces us to repeat the basis-reduction process for a larger value of m.
The basic difference between Berlekamp’s factoring algorithm and the L3
algorithm is that in Berlekamp’s algorithm, we may have to explore an expo-
nential number of combinations of the irreducible factors of f modulo p. On
the contrary, the L3 algorithm starts with any (and only one) suitable fac-
tor of f modulo p in order to discover one irreducible factor of f . Therefore,
the L3 algorithm achieves a polynomial running time even in the worst case.
However, both these algorithms are based on factoring f modulo p. Although
this can be solved efficiently using randomized algorithms, there is no known
polynomial-time deterministic algorithm for this task.
Lenstra et al. estimate that the L3 algorithm can factor f completely using
only O(d6 + d5 log |f | + d4 log p) arithmetic operations on integers of bit sizes
bounded above by O(d3 + d2 log |f | + d log p).
[7*x^3 - 3*x + 2 1]
gp > factor(Mod(35,13)*x^5-Mod(22,13)*x^3+Mod(10,13)*x^2+Mod(3,13)*x-Mod(2,13))
%2 =
[Mod(1, 13)*x + Mod(5, 13) 1]
GP/PARI supplies the built-in function qflll for lattice-basis reduction. The
initial basis vectors of a lattice should be packed in a matrix. Each column
should store one basis vector. The return value is again a matrix which is,
however, not the reduced basis vectors packed in a similar format. It is indeed
a transformation matrix which, when post-multiplied by the input matrix,
gives the reduced basis vectors. This is demonstrated for the two-dimensional
lattice of Example 3.40 and the three-dimensional lattice of Example 3.43.
gp > M = [ 7, 4; \
3, 3];
gp > T = qflll(M)
%2 =
[-1 -1]
[1 2]
gp > M * T
%3 =
[-3 1]
[0 3]
? M * T
%7 =
[1 114648 180082]
[4 -90039 49214]
Exercises
1. [Multiplicative form of Möbius inversion formula] Let f, g be two functions
of natural numbers satisfying f(n) = ∏_{d|n} g(d) for all n ∈ N. Prove that
g(n) = ∏_{d|n} f(d)^µ(n/d) = ∏_{d|n} f(n/d)^µ(d) for all n ∈ N.
2. (a) Find an explicit formula for the product of all monic irreducible polyno-
mials of degree n in Fq [x].
(b) Find the product of all monic sextic irreducible polynomials of F2 [x].
(c) Find the product of all monic cubic irreducible polynomials of F4 [x].
3. Which of the following polynomials is/are irreducible in F2 [x]?
(a) x5 + x4 + 1.
(b) x5 + x4 + x + 1.
(c) x5 + x4 + x2 + x + 1.
4. Which of the following polynomials is/are irreducible in F3 [x]?
(a) x4 + 2x + 1.
(b) x4 + 2x + 2.
(c) x4 + x2 + 2x + 2.
5. Prove that a polynomial f (x) ∈ Fq [x] of degree two or three is irreducible if
and only if f (x) has no roots in Fq .
6. Argue that the termination criterion for the loop in Algorithm 3.4 may be
changed to deg f (x) 6 2r + 1. Modify the algorithm accordingly. Explain how
this modified algorithm may speed up distinct-degree factorization.
7. Establish that the square-free and the distinct-degree factorization algorithms
described in the text run in time polynomial in deg f and log q.
8. Consider the root-finding Algorithm 3.2. Let vα(x) = (x + α)^((q−1)/2) − 1,
wα(x) = (x + α)^((q−1)/2) + 1, v(x) = v0(x), and w(x) = w0(x).
(a) Prove that the roots of v(x) are all the quadratic residues of F∗q , and those
of w(x) are all the quadratic non-residues of F∗q .
(b) Let f (x) ∈ Fq [x] with d = deg f > 2 be a product of distinct linear factors.
Assume that the roots of f (x) are random elements of Fq . Moreover, assume
that the quadratic residues in F∗q are randomly distributed in F∗q . Compute
the probability that the polynomial gcd(f (x), vα (x)) is a non-trivial factor of
f (x) for a randomly chosen α ∈ Fq .
(c) Deduce that the expected running time of Algorithm 3.2 is polynomial in
d and log q.
9. (a) Generalize Exercise 3.8 in order to compute the probability that a random
α ∈ Fq splits f (x) in two non-trivial factors in Algorithm 3.5. Make reasonable
assumptions as in Exercise 3.8.
(b) Deduce that the expected running time of Algorithm 3.5 is polynomial in
deg f and log q.
10. Consider the root-finding Algorithm 3.3 over Fq, where q = 2^n. Let v(x) =
x + x^2 + x^(2^2) + x^(2^3) + · · · + x^(2^(n−1)), and w(x) = 1 + v(x). Moreover, let f(x) ∈ Fq[x]
be a product of distinct linear factors, and d = deg f(x).
(a) Prove that v(x) = ∏_{γ∈Fq, Tr(γ)=0} (x + γ), and that w(x) = ∏_{γ∈Fq, Tr(γ)=1} (x + γ),
where Tr denotes the trace function of Fq over F2.
29. Let f(x) ∈ Fq[x] be a monic non-constant polynomial, and let h(x) ∈ Fq[x]
satisfy h(x)^q ≡ h(x) (mod f(x)). Prove that h(x)^q − h(x) = ∏_{γ∈Fq} (h(x) − γ).
Conclude that f(x) = ∏_{γ∈Fq} gcd(f(x), h(x) − γ).
30. [Berlekamp’s Q-matrix factorization] You are given a monic non-constant
square-free polynomial f (x) ∈ Fq [x] with t irreducible factors (not necessarily
of the same degree). Let d = deg f (x).
(a) Prove that there are exactly q t polynomials of degrees less than d satis-
fying h(x)q ≡ h(x) (mod f (x)).
(b) In order to determine all these polynomials h(x) of Part (a), write h(x) =
α0 + α1 x + α2 x2 + · · · + αd−1 xd−1 . Derive a d × d matrix Q such that the
unknown coefficients α0, α1, α2, . . . , αd−1 ∈ Fq can be obtained by solving the
homogeneous linear system Q (α0 α1 α2 · · · αd−1)^t = 0 in Fq.
(c) Deduce that the matrix Q has rank d − t and nullity t.
(d) Suppose that t ≥ 2. Let V ≅ Fq^t denote the nullspace of Q. Prove that for
every two irreducible factors f1 (x), f2 (x) of f (x) and for every two distinct
elements γ1 , γ2 ∈ Fq , there exists an (α0 , α1 , . . . , αd−1 ) ∈ V such that h(x) ≡
γ1 (mod f1 (x)) and h(x) ≡ γ2 (mod f2 (x)), where h(x) = α0 + α1 x + · · · +
αd−1 xd−1 . Moreover, for any basis of V , we can choose distinct γ1 , γ2 ∈ Fq in
such a way that (α0 , α1 , . . . , αd−1 ) is a vector of the basis.
(e) Assume that q is small. Propose a deterministic polynomial-time algo-
rithm for factoring f (x) based on the ideas developed in this exercise.
31. Factor x8 + x5 + x4 + x + 1 ∈ F2 [x] using Berlekamp’s Q-matrix algorithm.
32. Let f (x) ∈ Fq [x] be as in Exercise 3.30. Evidently, f (x) is irreducible if and
only if t = 1. Describe a polynomial-time algorithm for checking the irre-
ducibility of f (x), based upon the determination of t. You do not need to
assume that q is small. Compare this algorithm with Algorithm 3.1.
Programming Exercises
52. Write a GP/PARI function that computes the number of monic irreducible poly-
nomials of degree m in Fq [x]. (Hint: Use the built-in function moebius().)
53. Write a GP/PARI function that computes the product of all monic irreducible
polynomials of degree m in Fq [x]. (Hint: Exercise 3.2.)
54. Write a GP/PARI function that checks whether a non-constant polynomial in
the prime field Fp [x] is square-free. (Hint: Use the built-in function deriv().)
55. Write a GP/PARI function that computes the square-free factorization of a
monic non-constant polynomial in the prime field Fp [x].
Chapter 4
Arithmetic of Elliptic Curves
The study of elliptic curves is often called arithmetic algebraic geometry. Recent
mathematical developments in this area have been motivated to a large
extent by attempts to prove Fermat's last theorem, which states that the equation
x^n + y^n = z^n has no solution in positive integers x, y, z for any exponent n ≥ 3.
Elliptic curves are plane algebraic curves of genus one.2 Cubic and quartic
equations of special forms in two variables3 X, Y are elliptic curves. The Greek
1 In 1637, the amateur French mathematician Pierre de Fermat wrote a note in his per-
sonal copy of Bachet’s Latin translation of the Greek book Arithmetica by Diophantus.
The note translated in English reads like this: “It is impossible to separate a cube into two
cubes, or a fourth power into two fourth powers, or in general, any power higher than the
second into two like powers. I have discovered a truly marvelous proof of this, which this
margin is too narrow to contain.” It is uncertain whether Fermat really discovered a proof.
However, Fermat himself published a proof for the special case n = 4 using a method which
is now known as Fermat’s method of infinite descent.
Given Fermat's proof for n = 4, one proves Fermat's last theorem for any n ≥ 3 if one
supplies a proof for all primes n ≥ 3. Some special cases were proved by Euler (n = 3), by
Dirichlet and Legendre (n = 5) and by Lamé (n = 7). In 1847, Kummer proved Fermat’s last
theorem for all regular primes. However, there exist infinitely many non-regular primes. A
general proof for Fermat’s last theorem has eluded mathematicians for over three centuries.
In the late 1960s, Hellegouarch discovered a connection between elliptic curves and Fer-
mat’s last theorem, which led Gerhard Frey to conclude that if a conjecture known as the
Taniyama–Shimura conjecture for elliptic curves is true, then Fermat’s last theorem holds
too. The British mathematician Andrew Wiles, with the help of his student Richard Taylor,
finally proved Fermat’s last theorem in 1994. Wiles’ proof is based on very sophisticated
mathematics developed in the 20th century, and only a handful of living mathematicians
can claim to have truly understood the entire proof.
It is debatable whether Fermat’s last theorem is really a deep theorem that deserved such
prolonged attention. Nonetheless, myriads of failed attempts to prove this theorem have,
without any shred of doubt, intensely enriched several branches of modern mathematics.
2 Loosely speaking, the genus of a curve is the number of handles in it. Straight lines and
Y 2 = (X − a)(X 3 + bX 2 + cX + d) (4.1)
Y 2 = αX 3 + βX 2 + γX + 1 (4.2)
Y 2 = X 3 + µX 2 + νX + η. (4.3)
E : Y 2 + a1 XY + a3 Y = X 3 + a2 X 2 + a4 X + a6 (4.4)
variate polynomial equations with integer coefficients, of which integer or rational solutions
are typically investigated.
5 Diophantus studied special cases, whereas for us it is easy to generalize the results
for his contributions to mathematical analysis. The Weierstrass elliptic (or P) function ℘ is
named after him.
Example 4.3 (1) Take K = R. Three singular cubic curves are shown in
Figure 4.1. In each of these three examples, the point of singularity is the origin
(0, 0). If the underlying field is R, we can identify the type of singularity from
the Hessian7 of the curve C : f (X, Y ) = 0, defined as
à 2 2
!
∂ f ∂ f
Hessian(f ) = ∂X 2 ∂X∂Y .
∂2f ∂2f
∂Y ∂X ∂Y 2
[Figure 4.1: Three singular cubic curves over R, each with a singularity at the origin.]
[Figure 4.2: Three elliptic curves over R: (a) Y^2 = X^3 − X, (b) Y^2 = X^3 − X + 1, (c) Y^2 = X^3 + X.]
if for every non-zero column vector v of size n, we have v t Av > 0 (resp. v t Av < 0).
(2) Three elliptic curves over R are shown in Figure 4.2. The partial
∂f ∂f
derivatives ∂X and ∂Y do not vanish simultaneously at every point on these
curves, and so the tangent is defined at every point on these curves. The curve
of Part (a) has two disjoint components in the X-Y plane, whereas the curves
of Parts (b) and (c) have only single components. The bounded component of
the curve in Part (a) is the broken handle. For the other two curves, the handles
are not discernible. For real curves, handles may be broken or invisible, since
R is not algebraically closed. Curves over the field C of complex numbers
have discernible handles. But then a plane curve over C requires four real
dimensions, and is impossible to visualize in our three-dimensional world.
(3) Let us now look at elliptic curves over finite fields. Take K = F17 .
The curve defined by Y 2 = X 3 + 5X − 1 is not an elliptic curve, because it
contains a singularity at the point (2, 0). Indeed, we have X 3 + 5X − 1 ≡
(X + 15)^2 (X + 13) (mod 17). However, the equation Y^2 = X^3 + 5X − 1 defines
an elliptic curve over R, since X^3 + 5X − 1 has no multiple roots in R (or C).
The curve E1 : Y 2 = X 3 − 5X + 1 is non-singular over F17 with 15 points:
(0, 1), (0, 16), (2, 4), (2, 13), (3, 8), (3, 9), (5, 4), (5, 13), (6, 0), (10, 4),
(10, 13), (11, 6), (11, 11), (13, 5), (13, 12).
There is only one point on E1 with Y -coordinate equal to 0. We have the
factorization X 3 − 5X + 1 ≡ (X + 11)(X 2 + 6X + 14) (mod 17).
The curve E2 : Y 2 = X 3 − 4X + 1 is non-singular over F17 with 24 points:
(0, 1), (0, 16), (1, 7), (1, 10), (2, 1), (2, 16), (3, 4), (3, 13), (4, 7), (4, 10), (5, 2),
(5, 15), (10, 3), (10, 14), (11, 8), (11, 9), (12, 7), (12, 10), (13, 2), (13, 15),
(15, 1), (15, 16), (16, 2), (16, 15).
E2 contains no points with Y -coordinate equal to 0, because X 3 − 4X + 1 is
irreducible in F17 [X].
The non-singular curve E3 : Y 2 = X 3 −3X +1 over F17 contains 19 points:
(0, 1), (0, 16), (1, 4), (1, 13), (3, 6), (3, 11), (4, 6), (4, 11), (5, 3), (5, 14), (7, 0),
(8, 8), (8, 9), (10, 6), (10, 11), (13, 0), (14, 0), (15, 4), (15, 13).
Since X 3 − 3X + 1 ≡ (X + 3)(X + 4)(X + 10) (mod 17), there are three points
on E3 with Y -coordinate equal to 0.
(4) Take K = F2n and a curve C : Y 2 = X 3 +aX 2 +bX +c with a, b, c ∈ K.
Write f(X, Y) = Y^2 − (X^3 + aX^2 + bX + c). Then, ∂f/∂X = X^2 + b, and ∂f/∂Y = 0.
Every element in F2n has a unique square root in F2n. In particular, X^2 + b
has the root h = b^(2^(n−1)). Plugging in this value of X in Y^2 = X^3 + aX^2 + bX + c
gives a unique solution for Y, namely k = (h^3 + ah^2 + bh + c)^(2^(n−1)). But then,
(h, k) is a point of singularity on C. This means that a curve of the form
Y 2 = X 3 + aX 2 + bX + c is never an elliptic curve over F2n . Therefore, we
must have non-zero term(s) involving XY and/or Y on the left side of the
Weierstrass equation in order to obtain an elliptic curve over F2n .
(0, θ + 2), (0, 2θ + 1), (1, θ + 2), (1, 2θ + 1), (θ + 1, θ), (θ + 1, 2θ),
(2θ + 2, 1), (2θ + 2, 2).
Y 2 = X 3 + b2 X 2 + b4 X + b6 . (4.5)
Y 2 = X 3 + aX + b. (4.6)
Y 2 + aY = X 3 + bX + c. (4.7)
Y 2 + XY = X 3 + aX 2 + b. (4.8)
    K                Weierstrass equation
    Any field        Y^2 + (a1 X + a3)Y = X^3 + a2 X^2 + a4 X + a6
    char K ≠ 2, 3    Y^2 = X^3 + aX + b
    char K = 2       Y^2 + aY = X^3 + bX + c        (supersingular curve)
                     Y^2 + XY = X^3 + aX^2 + b      (ordinary curve)
    char K = 3       Y^2 = X^3 + aX^2 + bX + c
group structure.
[Figure 4.3: The chord-and-tangent rule: (a) P + Q + R = 0, (b) P + P + R = 0, (c) P + Q = 0, (d) P + P = 0.]
there exists a unique third point R lying on the intersection of E with L. The
group operation will satisfy the condition P + Q + R = 0 in this case.
Part (b) of Figure 4.3 shows the special case P = Q = (h, k). In this
case, the line L : Y = λX + µ is taken to be the tangent to the curve at P .
Substituting λX + µ for Y in the equation for E gives a cubic equation in X,
of which X = h is a double root. The third root identifies a unique intersection
point R of the curve E with L. We take P + P + R = 2P + R = 0.
Now, take two points P = (h, k) and Q = (h, −k) on the curve (Part (c)
of Figure 4.3). The line passing through these two points has the equation
L : X = h. Substituting X by h in the equation for E gives a quadratic
equation in Y , which has the two roots ±k. The line L does not meet the
curve E at any other point in K 2 . We set P + Q = 0 in this case.
[Figure: chord-and-tangent constructions of the double 2P and the opposite −P of a point P.]
the identity of the elliptic curve group. Thus, we would have −O = O, that
is, O′ and O are treated as the same point on E.
Does it look too imprecise or ad hoc? Perhaps, or perhaps not! The reader
needs to understand projective geometry in order to visualize the point at
infinity. Perfectly rigorous mathematical tools will then establish that there
is exactly one point at infinity on any elliptic curve. More importantly, if K
is any arbitrary field, even a finite field, the point O provably exists on the
curve. We defer this discussion until Section 4.4.
Let us now logically deduce that P + O = O + P = P for any finite point
P on EK . The line L passing through P and O is vertical by the choice of O.
Thus, the third point R where L meets E is the opposite of P , that is, R = −P .
By the chord-and-tangent rule, we then take P + O = O + P = −(−P ) = P .
h3 = λ^2 + a1 λ − a2 − h1 − h2.

That is, the line L meets E at the third point R = (h3, λh3 + µ). Assume
that P = (h1, k1) and Q = (h2, k2) are finite points that are not opposites of
one another. Our plan is to compute the coordinates of P + Q = (h3, k3).
If char K ≠ 2, 3, we use the equation Y^2 = X^3 + aX + b. In this case,

    h3 = λ^2 − h1 − h2,
    k3 = λ(h1 − h3) − k1,   where
    λ  = (k2 − k1)/(h2 − h1)   if P ≠ Q,
    λ  = (3h1^2 + a)/(2k1)     if P = Q.

If char K = 3, we use the equation Y^2 = X^3 + aX^2 + bX + c, and obtain

    h3 = λ^2 − a − h1 − h2,
    k3 = λ(h1 − h3) − k1,   where
    λ  = (k2 − k1)/(h2 − h1)   if P ≠ Q,
    λ  = (2ah1 + b)/(2k1)      if P = Q.

Finally, let char K = 2. For the supersingular curve Y^2 + aY = X^3 + bX + c,
we have

    h3 = ((k1 + k2)/(h1 + h2))^2 + h1 + h2           if P ≠ Q,
    h3 = (h1^4 + b^2)/a^2                            if P = Q,
    k3 = ((k1 + k2)/(h1 + h2))(h1 + h3) + k1 + a     if P ≠ Q,
    k3 = ((h1^2 + b)/a)(h1 + h3) + k1 + a            if P = Q.
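For char K ≠ 2, 3, these formulas translate directly into a few lines of GP/PARI. The sketch below is mine (the helper ecadd is not a library function): points are pairs [h, k] of Mod(.,p) residues, [0] denotes O, and a is the coefficient of X in Y^2 = X^3 + aX + b.

\\ Chord-and-tangent addition on Y^2 = X^3 + a*X + b, char != 2,3 (sketch).
ecadd(P, Q, a) =
{
  my(lam, h3);
  if (#P == 1, return (Q));                           \\ P = O
  if (#Q == 1, return (P));                           \\ Q = O
  if (P[1] == Q[1] && P[2] == -Q[2], return ([0]));   \\ Q = -P, so P + Q = O
  lam = if (P == Q, (3*P[1]^2 + a) / (2*P[2]),
                    (Q[2] - P[2]) / (Q[1] - P[1]));
  h3  = lam^2 - P[1] - Q[1];
  return ([h3, lam*(P[1] - h3) - P[2]]);
}
\\ For the curve of the earlier session with a = Mod(-5,17), ecadd(P2, Q2, Mod(-5,17))
\\ agrees with elladd(E2, P2, Q2).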
One can write EQ = Etors ⊕ Efree , where Etors is the subgroup consisting
of points of finite orders (the torsion subgroup of EQ ), and where Efree ∼ = Zr
is the free part of EQ . Mordell’s theorem states that the rank r of EQ is finite.
It is a popular belief that r can be arbitrarily large. At present (March 2012),
however, the largest known rank of an elliptic curve is 28. Elkies in 2006
discovered that the following elliptic curve has rank 28.
Y 2 + XY + Y =
X 3 − X 2 − 20067762415575526585033208209338542750930230312178956502X +
3448161179503055646703298569039072037485594435931918036126 \
6008296291939448732243429
There are two very important quantities associated with elliptic curves.
12 This result was conjectured by Poincaré in 1901, and proved in 1922 by the British mathematician Louis J. Mordell.
gp > E1 = ellinit([0,1,0,0,0])
*** singular curve in ellinit.
gp > E1 = ellinit([0,0,0,-1,1])
%1 = [0, 0, 0, -1, 1, 0, -2, 4, -1, 48, -864, -368, -6912/23, [-1.32471795724474
6025960908854, 0.6623589786223730129804544272 - 0.5622795120623012438991821449*I
, 0.6623589786223730129804544272 + 0.5622795120623012438991821448*I]~, 4.7070877
61230185561883752116, -2.353543880615092780941876058 + 1.09829152506100512202582
2079*I, -1.209950063079174653559416804 + 0.E-28*I, 0.604975031539587326779708402
0 - 0.9497317195650359122756449983*I, 5.169754595877492840054389119]
192 Computational Number Theory
gp > P1 = [1,-1]
%2 = [1, -1]
gp > Q1 = [3,5]
%3 = [3, 5]
gp > P1 + Q1
%4 = [4, 4]
gp > elladd(E1,P1,Q1)
%5 = [5, -11]
gp > 2*P1
%6 = [2, -2]
gp > ellpow(E1,P1,2)
%7 = [-1, -1]
gp > R1 = ellpow(E1,Q1,-1)
%8 = [3, -5]
gp > elladd(E1,Q1,R1)
%9 = [0]
We can work with elliptic curves over finite fields. Here is an example that
illustrates the arithmetic of the curve Y 2 = X 3 − 5X + 1 defined over F17 .
gp > E2 = ellinit([Mod(0,17),Mod(0,17),Mod(0,17),Mod(5,17),Mod(-1,17)])
*** singular curve in ellinit.
gp > E2 = ellinit([Mod(0,17),Mod(0,17),Mod(0,17),Mod(-5,17),Mod(1,17)])
%10 = [Mod(0, 17), Mod(0, 17), Mod(0, 17), Mod(12, 17), Mod(1, 17), Mod(0, 17),
Mod(7, 17), Mod(4, 17), Mod(9, 17), Mod(2, 17), Mod(3, 17), Mod(3, 17), Mod(14,
17), 0, 0, 0, 0, 0, 0]
gp > P2 = [Mod(2,17),Mod(4,17)];
gp > Q2 = [Mod(13,17),Mod(5,17)];
gp > elladd(E2,P2,Q2)
%13 = [Mod(11, 17), Mod(6, 17)]
gp > ellpow(E2,P2,2)
%14 = [Mod(5, 17), Mod(4, 17)]
One can work with curves defined over extension fields. For example, we
represent F8 = F2 (θ) with θ3 + θ + 1 = 0, and define the non-supersingular
curve Y 2 + XY = X 3 + X 2 + θ over F8 .
gp > f = Mod(1,2)*t^3+Mod(1,2)*t+Mod(1,2)
%15 = Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)
Arithmetic of Elliptic Curves 193
gp > a1 = Mod(Mod(1,2),f);
gp > a2 = Mod(Mod(1,2),f);
gp > a3 = a4 = 0;
gp > a6 = Mod(Mod(1,2)*t,f);
gp > E3 = ellinit([a1,a2,a3,a4,a6])
%20 = [Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2), M
od(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), 0, 0, Mod(Mod(1, 2)*t, Mod(1, 2)*t^3 +
Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)
), Mod(Mod(0, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), 0, Mod(Mod(1, 2)*t,
Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1,
2)*t + Mod(1, 2)), Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(
Mod(1, 2)*t, Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2)*t^2 + Mod(1
, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), 0, 0, 0, 0, 0, 0]
gp > P3 = [Mod(Mod(1,2),f), Mod(Mod(1,2)*t^2,f)];
gp > Q3 = [Mod(Mod(1,2)*t+Mod(1,2),f), Mod(Mod(1,2)*t^2+Mod(1,2),f)];
gp > elladd(E3,P3,Q3)
%22 = [Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2)*t^
2 + Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2))]
gp > lift(lift(elladd(E3,P3,Q3)))
%23 = [1, t^2 + 1]
gp > lift(lift(ellpow(E3,P3,2)))
%25 = [t + 1, t^2 + t]
gp > ellordinate(E1,3)
%26 = [5, -5]
gp > ellordinate(E1,4)
%27 = []
gp > ellordinate(E2,5)
%28 = [Mod(4, 17), Mod(13, 17)]
gp > ellordinate(E2,6)
%29 = [Mod(0, 17)]
gp > E4 = ellinit([1,0,0,-15745932530829089880,24028219957095969426339278400])
%30 = [1, 0, 0, -15745932530829089880, 24028219957095969426339278400, 1, -314918
65061658179760, 96112879828383877705357113600, -24793439124139356756716301340277
9136000, 755804761479796314241, -20760382044064624726576831008961, 4358115163151
13821324429157217041184204234956825600000000, 4317465449157134026457008940686688
72327010468357665602062299521/43581151631511382132442915721704118420423495682560
0000000, [2346026160.000000000000000000, 2235513503.749999999999999999, -4581539
664.000000000000000000]~, 0.00008326692542370325455895756925, 0.0000378969084242
9953714081078080*I, 12469.66709441963916544774723, -32053.9134602331409344378049
3*I, 0.000000003155559047555061173737096096]
gp > P4 = [9535415580, -860750821322580];
gp > Q4 = [-4581539664, 2290769832];
gp > R4 = [0];
gp > S4 = [2188064030, -7124272297330];
gp > ellorder(E4,P4)
%35 = 8
gp > ellorder(E4,Q4)
%36 = 2
gp > ellorder(E4,R4)
%37 = 1
gp > ellorder(E4,S4)
%38 = 0
Theorem 4.12 The group Eq is either cyclic or the direct sum Zn1 ⊕ Zn2 of
two cyclic subgroups with n1, n2 ≥ 2, n2 | n1, and n2 | (q − 1). ⊳
The multiples of the points in (E1 )F17 and their orders are listed below.
The table demonstrates that the group (E1 )F17 is cyclic. The φ(16) = 8 gen-
erators of this group are P1 , P2 , P5 , P6 , P12 , P13 , P14 , P15 .
(2) The elliptic curve E2 : Y 2 = X 3 − 5X + 2 defined over F17 consists of
the following 20 points.
The multiples of these points and their orders are tabulated below.
P 2P 3P 4P 5P 6P 7P 8P 9P 10P ord P
P0 1
P1 P4 P18 P8 P13 P7 P19 P3 P2 P0 10
P2 P3 P19 P7 P13 P8 P18 P4 P1 P0 10
P3 P7 P8 P4 P0 5
P4 P8 P7 P3 P0 5
P5 P0 2
P6 P0 2
P7 P4 P3 P8 P0 5
P8 P3 P4 P7 P0 5
P9 P3 P16 P7 P6 P8 P17 P4 P10 P0 10
P10 P4 P17 P8 P6 P7 P16 P3 P9 P0 10
P11 P4 P14 P8 P5 P7 P15 P3 P12 P0 10
P12 P3 P15 P7 P5 P8 P14 P4 P11 P0 10
P13 P0 2
P14 P7 P12 P4 P5 P3 P11 P8 P15 P0 10
P15 P8 P11 P3 P5 P4 P12 P7 P14 P0 10
P16 P8 P10 P3 P6 P4 P9 P7 P17 P0 10
P17 P7 P9 P4 P6 P3 P10 P8 P16 P0 10
P18 P7 P2 P4 P13 P3 P1 P8 P19 P0 10
P19 P8 P1 P3 P13 P4 P2 P7 P18 P0 10
Since (E2)F17 does not contain a point of order 20, the group (E2)F17 is not
cyclic. The above table shows that (E2)F17 ≅ Z10 ⊕ Z2. We can take the point
P1 as a generator of Z10 and P5 as a generator of Z2 . Every element of (E2 )F17
can be uniquely expressed as sP1 + tP5 with s ∈ {0, 1, 2, . . . , 9} and t ∈ {0, 1}.
The following table lists this representation of all the points of E2 (F17 ).
s=0 s=1 s=2 s=3 s=4 s=5 s=6 s=7 s=8 s=9
t=0 P0 P1 P4 P18 P8 P13 P7 P19 P3 P2
t=1 P5 P10 P15 P17 P12 P6 P11 P16 P14 P9
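Recent versions of GP/PARI can report this abstract group structure directly. The following sketch assumes a GP version in which ellinit accepts Mod(.,17) coefficients and the built-in ellgroup is available:

\\ Group structure of E2 : Y^2 = X^3 - 5X + 2 over F_17 (sketch; needs ellgroup).
E = ellinit([Mod(0,17), Mod(0,17), Mod(0,17), Mod(-5,17), Mod(2,17)]);
ellgroup(E)      \\ expected [10, 2], i.e. Z_10 (+) Z_2, matching the table above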
The multiples of these points and their orders are listed in the table below.
The table illustrates that the group (E3 )F8 is cyclic. The φ(12) = 4 generators
of this group are the points P2 , P3 , P6 , P7 .
Since the size of (E4 )F8 is prime, (E4 )F8 is a cyclic group, and any point on it
except O is a generator of it. ¤
The size of the elliptic curve group Eq = EFq is trivially upper-bounded
by q 2 + 1. In practice, this size is much smaller than q 2 + 1. The following
theorem implies that the size of Eq is Θ(q).
Theorem 4.14 [Hasse's theorem]15 The size of Eq is q + 1 − t, where

    −2√q ≤ t ≤ 2√q.    ⊳
The integer t in Hasse’s theorem is called the trace of Frobenius16 for the
elliptic curve E defined over Fq . It is an important quantity associated with
the curve. We define several classes of elliptic curves based on the value of t.
Definition 4.15 Let E be an elliptic curve defined over the finite field Fq of
characteristic p, and t the trace of Frobenius for E.
(a) If t = 1, that is, if the size of EFq is q, we call E an anomalous curve.
(b) If p|t, we call E a supersingular curve, whereas if p ∤ t, we call E a non-
supersingular or an ordinary curve. ⊳
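In GP/PARI, the trace of Frobenius of a curve given by integer coefficients is available through the built-in ellap. A small sketch (the curve is E1 : Y^2 = X^3 − 5X + 1 of Example 4.3(3), reduced at p = 17):

\\ Trace of Frobenius via ellap (sketch).
E = ellinit([0, 0, 0, -5, 1]);     \\ Y^2 = X^3 - 5X + 1 over Q
t = ellap(E, 17);                  \\ t = 17 + 1 - #E(F_17)
17 + 1 - t                         \\ the number of F_17-rational points, including O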
Recall that for finite fields of characteristic two, we have earlier defined
supersingular and non-supersingular curves in a different manner. The earlier
definitions turn out to be equivalent to Definition 4.15(b) for the fields F2n .
We have the following important characterization of supersingular curves.
15 This result was conjectured by the Austrian-American mathematician Emil Artin, and proved by the German mathematician Helmut Hasse in 1933.
(3) The curve E3 of Example 4.13(3), defined over F8 , has trace −3 (not a
multiple of two) and is non-supersingular. The curve E4 of Example 4.13(4),
defined again over F8 , has trace four (a multiple of two), and is supersingular.
(4) The non-supersingular curve Y 2 +XY = X 3 +θX 2 +θ over F8 = F2 (θ),
θ3 + θ + 1 = 0, contains the following eight points, and is anomalous.
    P0 = O,                    P1 = (0, θ^2 + θ),          P2 = (θ^2, 0),
    P3 = (θ^2, θ^2),           P4 = (θ^2 + 1, θ + 1),      P5 = (θ^2 + 1, θ^2 + θ),
    P6 = (θ^2 + θ + 1, 1),     P7 = (θ^2 + θ + 1, θ^2 + θ).                          ¤
Proposition 4.18 [Weil’s theorem]17 Let the elliptic curve E defined over
Fq have trace t. Let α, β ∈ C satisfy W 2 − tW + q = (W − α)(W − β). Then,
for every r ∈ N, the size of the group Eqr is q r + 1 − (αr + β r ), that is, the
trace of Frobenius for E over Fqr is αr + β r . ⊳
17 André Abraham Weil (1906–1998) was a French mathematician who made profound
contributions in the areas of number theory and algebraic geometry. He was one of the
founding members of the mathematicians’ (mostly French) group Nicolas Bourbaki.
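Weil's theorem gives a quick way to count points over extension fields from a single trace. The sketch below (my helper name, not from the text) uses the integer recurrence s_j = t·s_(j−1) − q·s_(j−2) with s_0 = 2 and s_1 = t, so that s_r = α^r + β^r:

\\ #E(F_{q^r}) from q and the trace t over F_q (sketch).
cardext(q, t, r) =
{
  my(s0 = 2, s1 = t, s);
  for (j = 2, r, s = t*s1 - q*s0; s0 = s1; s1 = s);
  return (q^r + 1 - s1);
}
\\ Example: the curve E3 of Example 4.13(3) has q = 8 and trace t = -3,
\\ so cardext(8, -3, 2) is the order of the group (E3) over F_64.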
Figure 4.5 explains the relationship between the affine plane K 2 and the
projective plane P2 (K). The equivalence class P = [h, k, l] is identified with
the line in K 3 passing through the origin (0, 0, 0) and the point (h, k, l).
[Figure 4.5: The projective plane: the class [h, k, l] corresponds to a line through the origin in K^3; classes of the form [h, k, 0] are the points at infinity.]
f^(h)(h, k, l) = 0. That is, the zeros of f^(h) are not dependent on the choice of
the projective coordinates, and a projective curve is a well-defined concept.
A K-rational point [h, k, l] on C^(h) is a solution of f^(h)(h, k, l) = 0. The set
of all K-rational points on C^(h) is denoted by C^(h)_K. By an abuse of notation,
we often describe a curve by its affine equation. But when we talk about the
rational points on that curve, we imply all rational points on the corresponding
projective curve. In particular, C_K would stand for C^(h)_K. ⊳
Putting Z = 1 gives f (h) (X, Y, 1) = f (X, Y ). This gives all the finite points
on C (h) , that is, all the points on the affine curve C. If, on the other hand, we
put Z = 0, we get f (h) (X, Y, 0) which is a homogeneous polynomial in X, Y
of degree d. The solutions of f (h) (X, Y, 0) = 0 give all the points at infinity
on C (h) . These points are not present on the affine curve C.
Example 4.25 (1) A straight line in the projective plane has the equation
aX + bY + cZ = 0. Putting Z = 1 gives aX + bY + c = 0, that is, all points on
the corresponding affine line. Putting Z = 0 gives aX + bY = 0. If b 6= 0, then
Y = −(a/b)X, that is, the line contains only one point at infinity [1, −(a/b), 0].
If b = 0, we have X = 0, that is, [0, 1, 0] is the only point at infinity.
(2) A circle with center at (a, b) and radius r has the projective equation
(X − aZ)2 + (Y − bZ)2 = r2 Z 2 . All finite points on the circle are solutions
obtained by putting Z = 1, that is, all solutions of (X − a)2 + (Y − b)2 = r2 .
For obtaining the points at infinity on the circle, we put Z = 0, and obtain
X 2 + Y 2 = 0. For K = R, the only solution of this is X = Y = 0. But all
of the three projective coordinates are not allowed to be zero simultaneously,
that is, the circle does not contain any point at infinity. Indeed, a circle does
not have a part extending towards infinity in any direction.
However, for K = C, the equation X 2 + Y 2 = 0 implies that Y = ±iX,
that is, there are two points at infinity: [1, i, 0] and [1, −i, 0].
[Figure: points at infinity on various curves: a straight line aX + bY + c = 0, a circle of radius r centred at (a, b), the parabola Y^2 = X, the hyperbola X^2 − Y^2 = 1 with asymptotes Y = ±X, and the elliptic curves Y^2 = X^3 − X + 1 and Y^2 = X^3 − X with the vertical direction X = 0.]
(5) The homogenization of an elliptic curve given by the Weierstrass equa-
tion is Y 2 Z + a1 XY Z + a3 Y Z 2 = X 3 + a2 X 2 Z + a4 XZ 2 + a6 Z 3 . If we put
Z = 0, we get X 3 = 0, that is, X = 0, that is, [0, 1, 0] is the only point at
infinity on the elliptic curve. In the limit X → ∞, the curve becomes vertical.
In what follows, I often use affine equations of curves, but talk about the
corresponding projective curves. The points at infinity on these curves cannot
be described by the affine equations and are to be handled separately.
As such, the theorem does not appear to be true. A line and a circle must
intersect at two points. While this is the case with some lines and circles, there
are exceptions. For example, a tangent to a circle meets the circle at exactly
one point. But, in this case, the intersection multiplicity is two, that is, we
need to count the points of intersection with proper multiplicities. However,
there are examples where a line does not meet a circle at all. Eliminating one
of the variables X, Y from the equations of a circle and a line gives a quadratic
equation in the other variable. If we try to solve this quadratic equation over
R, we may fail to get a root. If we solve the same equation over C, we always
obtain two roots. These two roots may be the same, implying that this is a
case of tangency, that is, a root of multiplicity two. To sum up, it is necessary
to work in an algebraically closed field for Bézout’s theorem to hold.
But multiplicity and algebraic closure alone do not validate Bézout’s theo-
rem. Consider the case of two concentric circles X 2 +Y 2 = 1 and X 2 +Y 2 = 2.
According to Bézout’s theorem, they must intersect at four points. Even if we
allow X, Y to assume complex values, we end up with an absurd conclusion
1 = 2, that is, the circles do not intersect at all. The final thing that is neces-
sary for Bézout’s theorem to hold is that we must consider projective curves,
and take into account the possibilities of intersections of the curves at their
common points at infinity. By Example 4.25(2), every circle has two points at
infinity over C. These are [1, i, 0] and [1, −i, 0] irrespective of the radius and
center of the circle. In other words, any two circles meet at these points at
infinity. Two concentric circles touch one another at these points at infinity,
so the total number of intersection points is 2 + 2 = 4.
The equation of the circle X² + Y² = a can be written as X² − i²Y² = a,
so as complex curves, a circle is a hyperbola too. If we replace i by a real
number, we get a hyperbola in the real plane. It now enables us to visualize
intersections at the points at infinity. Figure 4.6 illustrates some possibilities.
Parts (a) and (b) demonstrate situations where the two hyperbolas have
only two finite points of intersection. Asymptotically, these two hyperbolas
become parallel, that is, the two hyperbolas have the same points at infinity,
and so the total number of points of intersection is four. Part (c) illustrates the
situation when the two hyperbolas not only have the same points at infinity
but also become tangential to one another at these points at infinity. There
are no finite points of intersection, but each of the two points at infinity is
an intersection point of multiplicity two. This is similar to the case of two
concentric circles. Finally, Part (d) shows a situation where the hyperbolas
have different points at infinity. All the four points of intersection of these
hyperbolas are finite. This is a situation that is not possible for circles. So we
can never see two circles intersecting at four points in the affine plane.
equivalence class of a polynomial G(X, Y ) ∈ K[X, Y ] is G(x, y). The set of all
the equivalence classes of K[X, Y ] under congruence modulo f is denoted by K[C].
It is easy to generalize these results to any arbitrary straight line in the plane.
(2) For the circle C : X² + Y² = 1, we have f (X, Y ) = X² + Y² − 1. Since
X² + Y² − 1 ≡ (X + Y + 1)² (mod 2), and we require f to be irreducible, we
must have char K ≠ 2. Congruence modulo this f gives the coordinate ring
K[C] = {G(x, y) | G(X, Y ) ∈ K[X, Y ]}, where x² + y² − 1 = 0.
Since char K ≠ 2, the elements x, 1 − y, 1 + y are irreducible in K[C],
distinct from one another. But then, x2 = 1 − y 2 = (1 − y)(1 + y) gives two
different factorizations of the same element in K[C]. Therefore, K[C] is not
a unique factorization domain. By contrast, any polynomial ring over a
field is a unique factorization domain. It follows that the coordinate ring of a
circle is not isomorphic to a polynomial ring.
However, the rational map K(C) → K(Z) taking x ↦ (1 − Z²)/(1 + Z²) and
y ↦ 2Z/(1 + Z²) can be easily verified to be an isomorphism of fields. Therefore,
the function field of a circle is isomorphic to the field of univariate rational
functions. ¤
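As a one-line GP/PARI sanity check (my own illustration, not from the text), one can verify that the images of x and y under this map satisfy the equation of the circle identically:

    print( ((1 - z^2)/(1 + z^2))^2 + (2*z/(1 + z^2))^2 == 1 )   \\ prints 1

GP reduces the rational function on the left to the constant 1, confirming x² + y² = 1.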
Let us now specialize to elliptic (or hyperelliptic) curves. Write the equa-
tion of the curve as C : Y 2 + u(X)Y = v(X) (for elliptic curves, u(X) =
a1 X + a3 , and v(X) = X 3 + a2 X 2 + a4 X + a6 ). We have y 2 = −u(x)y + v(x).
If we take G(x, y) ∈ K[C], then by repeatedly substituting y 2 by the linear
(in y) polynomial −u(x)y + v(x), we can simplify G(x, y) as

G(x, y) = a(x) + y b(x)

for some a(X), b(X) ∈ K[X]. It turns out that such a representation of a
polynomial function on C is unique, that is, every G(x, y) ∈ K[C] corresponds
to unique polynomials a(X) and b(X).
x-degree), and the degree of yb(x) is 3 + 2 degx (b). The larger of these two
degrees is taken to be the degree of G(x, y), that is,
deg G = max( 2 deg_x(a), 3 + 2 deg_x(b) ).
The leading coefficient of G, denoted lc(G), is that of a or b depending upon
whether 2 degx (a) > 3 + 2 degx (b) or not. The two degrees cannot be equal,
since 2 degx (a) is even, whereas 3 + 2 degx (b) is odd. Now, define the value of
R(x, y) = G(x, y)/H(x, y) ∈ K(C) (with G, H ∈ K[C]) at O as

R(O) = 0 if deg G < deg H,
R(O) = ∞ if deg G > deg H,
R(O) = lc(G)/lc(H) if deg G = deg H.
The point O is a zero (or pole) of R if R(O) = 0 (or R(O) = ∞).
Let C again be any algebraic curve. Although we are now able to uniquely
define values of rational functions at points on C, the statement of Defini-
tion 4.30 is existential. In particular, nothing in the definition indicates how
we can obtain a good representation G/H of R. We use a bit of algebra to
settle this issue. The set of rational functions on C defined at P is a local ring
with the unique maximal ideal comprising functions that evaluate to zero at
P . This leads to the following valuation of rational functions at P . The notion
of zeros and poles can be made concrete from this.
Case 1: P = (3/5, 4/5).
In this case, we can take x − 3/5 as the uniformizer. Let us instead take
5(x − 3/5) = 5x − 3 as the uniformizer. Clearly, multiplication of a uniformizer
by non-zero field elements does not matter. We have G(P ) = H(P ) = 0, so
we need to find an alternative representation of R. We write
But 2(5x − 3)(y − x)(1 + y) has neither a zero nor a pole at P = (1, 0), so
(1, 0) is again established as a double pole of R.
It is not necessary to take only linear functions as uniformizers. The circle
(X − 1)2 + (Y − 1)2 = 1 meets the circle X 2 + Y 2 = 1 at P = (1, 0) with
multiplicity one. So (x−1)2 +(y −1)2 −1 may also be taken as a uniformizer at
(1, 0). But (x−1)2 +(y−1)2 −1 = x2 +y 2 −1−2(x+y−1) = 2(1−x−y), and so
[(x−1)2 +(y−1)2 −1]2 = 4(1+x2 +y 2 −2x−2y+2xy) = 4(1+1−2x−2y+2xy) =
8(1 − x)(1 − y), and we have the representation
R(x, y) = ( (x − 1)² + (y − 1)² − 1 )^(−2) · ( 8(5x − 3)(y − x)(1 − y) ),
which yet again reveals that the multiplicity of the pole of R at (1, 0) is two.
Let us now pretend to take x2 − y 2 − 1 as a uniformizer at P = (1, 0). We
have x2 − y 2 − 1 = 2x2 − 2 − (x2 + y 2 − 1) = 2x2 − 2 = 2(x − 1)(x + 1), so that
R(x, y) = (x² − y² − 1)^(−1) · ( −2(5x − 3)(y − x)(1 + x) ).
This seems to reveal that (1, 0) is a simple pole of R. This conclusion is wrong,
because the hyperbola X 2 − Y 2 = 1 touches the circle X 2 + Y 2 = 1 at (1, 0)
(with intersection multiplicity two), that is, x2 − y 2 − 1 is not allowed to be
taken as a uniformizer at (1, 0).
If a curve C ′ meets C at P = (1, 0) with intersection multiplicity larger
than two, R(x, y) cannot at all be expressed in terms of the equation of C ′ . For
instance, take C ′ to be the parabola Y 2 = 2(1 − X) which meets X 2 + Y 2 = 1
at (1, 0) with multiplicity four (argue why). We have y 2 − 2(1 − x) = (x2 +
y 2 − 1) − (x2 − 2x + 1) = −(1 − x)2 , that is,
[R(x, y)]² = ( y² − 2(1 − x) )^(−1) · ( −(5x − 3)²(y − x)² ),
indicating that R(x, y) itself has a pole at (1, 0) of multiplicity half, an absurd
conclusion indeed.
At any rate, (non-tangent) linear functions turn out to be the handiest as
uniformizers, particularly at all finite points on a curve. ¤
Some important results pertaining to poles and zeros are now stated.
Theorem 4.36 Any non-zero rational function has only finitely many zeros
and poles. ⊳
Example 4.41 Let us compute all the zeros and poles of the rational function
R(x, y) = G(x, y)/H(x, y) = (x + y)/(x² + y)
on the elliptic curve E : Y 2 = X 3 + X defined over the field F5 . We handle
the numerator and the denominator of R separately.
Zeros and poles of G(x, y) = x + y : A zero of G satisfies x + y = 0, that is,
y = −x. Since x, y also satisfy the equation of E, we have y 2 = (−x)2 = x3 +x,
that is, x(x2 − x + 1) = 0. The polynomial x2 − x + 1 is irreducible over F5 . Let
θ ∈ F̄5 be the element satisfying θ2 +2 = 0. This element defines the extension
F52 = F5 (θ) in which we have x(x2 − x + 1) = x(x + (θ + 2))(x + (4θ + 2)).
Therefore, the zeros of G(x, y) correspond to x = 0, −(θ + 2), −(4θ + 2), that
is, x = 0, 4θ + 3, θ + 3. Plugging in these values of x in y = −x gives us the
three zeros of G as Q0 = (0, 0), Q1 = (4θ + 3, θ + 2), and Q2 = (θ + 3, 4θ + 2).
In order to compute the multiplicities of these zeros, we write G(x, y) =
a(x) + yb(x) with a(x) = x and b(x) = 1. We have gcd(a(x), b(x)) = 1, so we
compute N(G) = a(x)2 − y 2 b(x)2 = x2 − (x3 + x) = −x(x2 − x + 1). This
indicates that each of the three zeros of G has e = 0 and l = 1, and so has
multiplicity one (Q0 is a special point, whereas Q1 and Q2 are ordinary).
The degree of x + y is three, so G has a pole of multiplicity three at O.
Zeros and poles of H(x, y) = x2 +y : The zeros of H correspond to y = −x2 ,
that is, y 2 = (−x2 )2 = x3 + x, that is, x(x3 − x2 − 1) = 0. The cubic factor
being irreducible, all the zeros of H exist on the curve E defined over F53 .
Let ψ ∈ F̄5 satisfy ψ 3 + ψ + 1 = 0. Then, the zeros of H correspond to
x(x3 − x2 − 1) = x(x + (2ψ 2 + 2ψ + 1))(x + (4ψ 2 + 4))(x + (4ψ 2 + 3ψ + 4)) = 0,
Theorem 4.44 For all rational maps α, β on EK̄(E) and for all points P on
E, we have (α + β)(P ) = α(P ) + β(P ). ⊳
Theorem 4.51 Two elliptic curves E, E′ defined over K are isomorphic over
K̄ if and only if there exist u, r, s, t ∈ K̄ with u ≠ 0 such that substituting X
by u²X + r and Y by u³Y + su²X + t transforms the equation of E to the
equation of E′. ⊳
The substitutions made to derive Eqns (4.5), (4.6), (4.7) and (4.8) from the
original Weierstrass Eqn (4.4) are examples of admissible changes of variables.
For the rest of this section, we concentrate on the multiplication-by-m
endomorphisms. We identify [m] with a pair (gm , hm ) of rational functions.
These rational functions are inductively defined by the chord-and-tangent rule.
Consider an elliptic curve defined by Eqn (4.4).
g₁ = x,  h₁ = y.
g₂ = −2x + λ² + a₁λ − a₂,  h₂ = −λ(g₂ − x) − a₁g₂ − a₃ − y,
where λ = (3x² + 2a₂x + a₄ − a₁y)/(2y + a₁x + a₃). Finally, for m ≥ 3, we recursively define
g_m = −g_{m−1} − x + λ² + a₁λ − a₂,  h_m = −λ(g_m − x) − a₁g_m − a₃ − y,   (4.17)
where λ = (h_{m−1} − y)/(g_{m−1} − x). The kernel of the map [m] is denoted by E[m], that is,
E[m] = {P ∈ E = E_K̄ | mP = O}.
Elements of E[m] are called m-torsion points of E. For every m ∈ Z, E[m] is
a subgroup of E.
Theorem 4.52 Let p = char K. If p = 0 or gcd(p, m) = 1, then
E[m] ≅ Z_m × Z_m,
and so |E[m]| = m². If gcd(m, n) = 1, then E[mn] ≅ E[m] × E[n]. ⊳
The rational functions gm , hm have poles precisely at the points in E[m].
But they have some zeros also. We plan to investigate polynomials having zeros
precisely at the points of E[m]. Assume that either p = 0 or gcd(p, m) = 1.
Then, E[m] contains exactly m2 points. A rational function ψm whose only
zeros are the m2 points of E[m] is a polynomial by Theorem 4.37. All these
zeros are taken as simple. So ψm must have a pole of multiplicity m2 at O.
The polynomial ψm is unique up to multiplication by non-zero elements of K̄.
If we arrange the leading coefficient of ψm to be m, then ψm becomes unique,
and is called the m-th division polynomial.
The division polynomials are defined recursively as follows.
ψ0 = 0
ψ1 = 1
ψ2 = 2y + a1 x + a3
Putting n = 1 gives

g_m = x − ψ_{m+1}ψ_{m−1} / ψ_m².   (4.18)

Moreover,

h_m = (ψ_{m+2}ψ_{m−1}² − ψ_{m−2}ψ_{m+1}²) / (2ψ₂ψ_m³) − (1/2)(a₁g_m + a₃)   (4.19)
    = y + ψ_{m+2}ψ_{m−1}² / (ψ₂ψ_m³) + (3x² + 2a₂x + a₄ − a₁y) ψ_{m−1}ψ_{m+1} / (ψ₂ψ_m²).   (4.20)
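As a quick numerical sanity check of Eqn (4.18), the following GP/PARI fragment compares the x-coordinates of 2P and 3P computed by the built-in chord-and-tangent arithmetic with the division-polynomial expressions, on the short Weierstrass curve Y² = X³ + 2X + 3 over F_7 with P = (2, 1) (an illustrative choice of curve and point; the ψ₃ and ψ₄ formulas are those quoted later for the reduced Weierstrass equation in the discussion of Schoof's algorithm).

    p = 7; a = Mod(2, p); b = Mod(3, p);
    E = ellinit([a, b]);                       \\ Y^2 = X^3 + a*X + b over F_7
    F(x)  = x^3 + a*x + b;                     \\ so that y^2 = F(x) on the curve
    f3(x) = 3*x^4 + 6*a*x^2 + 12*b*x - a^2;                                     \\ psi_3
    f4(x) = 4*(x^6 + 5*a*x^4 + 20*b*x^3 - 5*a^2*x^2 - 4*a*b*x - 8*b^2 - a^3);   \\ psi_4 / y
    P = [Mod(2, p), Mod(1, p)];
    \\ Eqn (4.18), m = 2:  g_2 = x - psi_1 psi_3 / psi_2^2 = x - f3(x)/(4 F(x))
    print(ellmul(E, P, 2)[1] == P[1] - f3(P[1]) / (4*F(P[1])));
    \\ Eqn (4.18), m = 3:  g_3 = x - psi_2 psi_4 / psi_3^2 = x - 2 F(x) f4(x) / f3(x)^2
    print(ellmul(E, P, 3)[1] == P[1] - 2*F(P[1])*f4(P[1]) / f3(P[1])^2);

Both comparisons print 1.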
4.4.4 Divisors
Let a_i, i ∈ I, be symbols indexed by I. A finite formal sum of a_i, i ∈ I, is
an expression of the form Σ_{i∈I} m_i a_i with m_i ∈ Z such that m_i = 0 except for
only finitely many i ∈ I. The sum Σ_{i∈I} m_i a_i is formal in the sense that the
symbols a_i are not meant to be evaluated. They act as placeholders. Define

Σ_{i∈I} m_i a_i + Σ_{i∈I} n_i a_i = Σ_{i∈I} (m_i + n_i) a_i,  and  −Σ_{i∈I} m_i a_i = Σ_{i∈I} (−m_i) a_i.
Under these definitions, the set of these finite formal sums becomes an Abelian
group called the free Abelian group generated by the symbols ai , i ∈ I.
Now, let E be an elliptic curve defined over K. For a moment, let us treat
E as a curve defined over the algebraic closure K̄ of K.
Since any non-zero rational function can have only finitely many zeros and
poles, Div(R) is defined (that is, a finite formal sum) for any R ≠ 0.
A principal divisor is the divisor of some rational function. Theorem 4.38
implies that every principal divisor belongs to Div0K̄ (E). The set of all principal
divisors is a subgroup of Div0K̄ (E), denoted by PrinK̄ (E) or Prin(E). ⊳
Definition 4.56 Two divisors D, D′ in DivK̄ (E) are called equivalent if they
differ by a principal divisor. That is, D ∼ D′ if and only if D = D′ + Div(R)
for some R(x, y) ∈ K̄(E). ⊳
Definition 4.57 The quotient group DivK̄ (E)/ PrinK̄ (E) is called the divisor
class group or the Picard group23 of E, denoted PicK̄ (E) or Pic(E). The
quotient group Div0K̄ (E)/ PrinK̄ (E) is called the Jacobian24 of E, denoted
Pic0K̄ (E) or Pic0 (E) or JK̄ (E) or J(E). ⊳
Example 4.58 (1) Consider the lines given in Figure 4.7. We have
Div(L) = [P ] + [Q] + [R] − 3[O] = ([P ]−[O]) + ([Q]−[O]) + ([R]−[O]),
Div(T ) = 2[P ] + [Q] − 3[O] = 2([P ]−[O]) + ([Q]−[O]), and
Div(V ) = [P ] + [Q] − 2[O] = ([P ]−[O]) + ([Q]−[O]).
23 This is named after the French mathematician Charles Émile Picard (1856–1941).
24 This is named after Carl Gustav Jacob Jacobi (1804–1851).
For every D ∈ Div0K̄ (E), there exist a unique rational point P and a
rational function R such that D = [P ] − [O] + Div(R). But then D ∼ [P ] − [O]
in Div0K̄ (E). We identify P with the equivalence class of [P ] − [O] in JK̄ (E).
This identification establishes a bijection between the set EK̄ of rational points
on E and the Jacobian JK̄ (E) of E. As Example 4.58(1) suggests, this bijection
also respects the chord-and-tangent rule for addition in E. The motivation for
addition of points in an elliptic-curve group, as described in Figures 4.3 and
4.4, is nothing but a manifestation of this bijection. Moreover, it follows that
the group EK̄ is isomorphic to the Jacobian JK̄ (E).
If K is not algebraically closed, a particular subgroup of JK̄ (E) can be de-
fined to be the Jacobian JK (E) of E over K. Thanks to the chord-and-tangent
rule, we do not need to worry about the exact definition of JK (E). More pre-
cisely, if P, Q are K-rational points of E, the explicit formulas for P + Q, 2P ,
and −P guarantee that these points are defined over K as well. Furthermore,
the chord-and-tangent rule provides explicit computational handles on the
group JK (E). In other words, EK is the equivalent (and computationally ori-
ented) definition of JK (E) (just as E = EK̄ was for JK̄ (E)). This equivalence
proves the following important result.
Theorem 4.59 A divisor D = Σ_P m_P [P ] ∈ Div_K(E) is principal if and only if
(1) Σ_P m_P = 0 (integer sum), and
(2) Σ_P m_P P = O (sum under the chord-and-tangent rule). ⊳
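As a small GP/PARI illustration of condition (2) (my own toy example, not from the text): on the curve Y² = X³ + 2X + 3 over F_7, the divisor of the line through two points P and Q is [P] + [Q] + [R] − 3[O], and the three intersection points indeed sum to O under the chord-and-tangent rule.

    p = 7; E = ellinit([2, 3] * Mod(1, p));
    P = [Mod(2, p), Mod(1, p)]; Q = [Mod(3, p), Mod(6, p)];
    PQ = elladd(E, P, Q);
    R  = [PQ[1], -PQ[2]];                   \\ third intersection point of the line with the curve
    print(elladd(E, elladd(E, P, Q), R));   \\ prints [0], the point at infinity O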
Divisors are instrumental not only for defining elliptic-curve groups but
also for proving many results pertaining to elliptic curves. For instance, the
concept of pairing depends heavily on divisors. I now highlight some important
results associated with divisors, that are needed in the next section.
Let P, Q be points on EK . By LP,Q we denote the unique (straight) line
passing through P and Q. If P = Q, then LP,Q is taken to be the tangent to
E at the point P . Now, consider the points P, Q, ±R as shown in Figure 4.8.
Here, P + Q = −R, that is, P + Q + R = O.
The leading term of y + 17x + 11 is y (recall that y has degree three, and x has
degree two), whereas the leading term of x + 18 is x. So both the numerator
and the denominator of the rational function (y + 17x + 11)/(x + 18) are monic. ¤
Two rational functions f and g have the same divisor if and only if f = cg
for a non-zero constant c ∈ K̄*. In that case, if D = Σ_P n_P [P ] has degree zero, then
f (D) = g(D) Π_P c^{n_P} = g(D) c^{Σ_P n_P} = g(D) c⁰ = g(D), that is, the value of
f at a divisor D of degree zero depends only on Div(f ) (rather than on f ).
where P1 = (1, 9), P2 = (10, 4), and P3 = (19, 36). On the other hand, g has
a double zero at P4 = (−16, 0) = (21, 0) and simple poles at P5 = (−4, 14) =
(33, 14) and P6 = (−4, −14) = (33, 23). Therefore,
f (Div(g)) ≡ f (P4 )2 f (P5 )−1 f (P6 )−1 ≡ 352 × 31−1 × 3−1 ≡ 8 (mod 37), and
g(Div(f )) ≡ g(P1 )g(P2 )g(P3 )g(O)−3 ≡ 33 × 23 × 16 × 1−3 ≡ 8 (mod 37).
We have g(O) = 1, since both the numerator and the denominator of g are
monic, and have the same degree (two). ¤
e_m : E[m] × E[m] → µ_m

e_m(P₁, P₂) = f₁(D₂) / f₂(D₁).
We first argue that this definition makes sense. First, note that f1 and f2 are
defined only up to multiplication by non-zero elements of K̄. But we have
already established that the values f1 (D2 ) and f2 (D1 ) are independent of the
choices of these constants, since D1 and D2 are of degree zero.
The importance of the rational functions fn,P in connection with the Weil
pairing lies in the fact that if P ∈ E[m], then Div(fm, P ) = m[P ] − [mP ] −
(m − 1)[O] = m[P ] − m[O]. Therefore, it suffices to compute fm,P1 and fm,P2
in order to compute em (P1 , P2 ). We can define fn,P inductively as follows.
f_{0,P} = f_{1,P} = 1,   (4.22)

f_{n+1,P} = f_{n,P} × ( L_{P,nP} / L_{(n+1)P,−(n+1)P} )   for n ≥ 1.   (4.23)
Here, LS,T is the straight line through S and T (or the tangent to E at S if
S = T ), and LS,−S is the vertical line through S (and −S). Typically, m in
Weil pairing is chosen to be of nearly the same bit size as q. Therefore, it is
rather impractical to compute fm,P using Eqn (4.23). A divide-and-conquer
approach follows from the following property of fn,P .
/* Conditional adding */
If (n_i = 1), update f = f × ( L_{U,P} / L_{U+P,−(U+P)} ) and U = U + P.
}
Return f.
Eqn (4.25) in conjunction with Eqn (4.23) gives Algorithm 4.1. The function
f_{n,P} is usually kept in the factored form. Indeed, it is often not necessary to
compute f_{n,P} explicitly; only its value at some point Q is needed. In that
case, the functions L_{U,U}/L_{2U,−2U} and L_{U,P}/L_{U+P,−(U+P)} are evaluated at Q
before multiplication with f.
We now make the relationship between em (P1 , P2 ) and fn,P more explicit.
We choose a point T ∈ E not equal to ±P1 , −P2 , P2 − P1 , O. We have
e_m(P₁, P₂) = ( f_{m,P₂}(T) f_{m,P₁}(P₂ − T) ) / ( f_{m,P₁}(−T) f_{m,P₂}(P₁ + T) ).   (4.26)

Moreover, if P₁ ≠ P₂, we also have

e_m(P₁, P₂) = (−1)^m f_{m,P₁}(P₂) / f_{m,P₂}(P₁).   (4.27)
Eqn (4.27) is typically used when P1 and P2 are linearly independent.
It is unnecessary to make four (or two) separate calls of Algorithm 4.1 for
computing em (P1 , P2 ). All these invocations have n = m, so a single double-
and-add loop suffices. For efficiency, one may avoid the division operations in
Miller’s loop by separately maintaining the numerator and the denominator.
After the loop terminates, a single division is made. Algorithm 4.2 incorporates
these ideas, and is based upon Eqn (4.27). The polynomial functions L−,− are
first evaluated at appropriate points and then multiplied.
rank two, that is, isomorphic to Z_22 ⊕ Z_2. We choose m = 11. The embedding
degree for this choice is k = 2. This means that we have to work in the field
F_{43^2} = F_1849. Since p = 43 is congruent to 3 modulo 4, −1 is a quadratic non-
residue modulo p, and we can represent F_{43^2} as F_43(θ) = {a + bθ | a, b ∈ F_43},
where θ² + 1 = 0. The arithmetic of F_{43^2} resembles that of C. F*_{43^2} contains
all the 11-th roots of unity. These are 1, 2 + 13θ, 2 + 30θ, 7 + 9θ, 7 + 34θ,
11 + 3θ, 11 + 40θ, 18 + 8θ, 18 + 35θ, 26 + 20θ, and 26 + 23θ.
The group E_{F_{43^2}} contains 44² elements, and is isomorphic to Z_44 ⊕ Z_44.
Moreover, this group fully contains E[11], which consists of 11² elements and
is isomorphic to Z_11 ⊕ Z_11. The points P = (1, 2) and Q = (−1, 2θ) constitute
a set of linearly independent elements of E[11]. Every element of E[11] can
be written as a unique F_11-linear combination of P and Q. For example, the
element 4P + 5Q = (15 + 22θ, 5 + 14θ) is again of order 11.
Let us compute em (P1 , P2 ) by Algorithm 4.2, where P1 = P = (1, 2), and
P2 = 4P +5Q = (15+22θ, 5+14θ). The binary representation of 11 is (1011)2 .
We initialize f = fnum /fden = 1/1, U1 = P1 , and U2 = P2 . Miller’s loop works
as shown in the following table. Here, Λ1 stands for the rational function
LU1 ,U1 /L2U1 ,−2U1 (during doubling) or the function LU1 ,P1 /LU1 +P1 ,−(U1 +P1 )
(during addition), and Λ2 stands for L2U2 ,−2U2 /LU2 ,U2 (during doubling) or
LU2 +P2 ,−(U2 +P2 ) /LU2 ,P2 (during addition).
i   m_i  Step  Λ₁                   Λ₂                                     f (num/den)         U₁               U₂
2   0    Dbl   (y+20x+21)/(x+32)    (x+(36+21θ))/(y+(12+35θ)x+(26+14θ))    (34+37θ)/(28+θ)     2P₁ = (11, 26)   2P₂ = (7+22θ, 28+7θ)
         Add   Skipped
1   1    Dbl   (y+31x+20)/(x+7)     (x+(2+26θ))/(y+(18+22θ)x+(29+2θ))      (12+15θ)/(25+18θ)   4P₁ = (36, 18)   4P₂ = (41+17θ, 6+6θ)
         Add   (y+2x+39)/(x+33)     (x+(41+8θ))/(y+(28+9θ)x+(31+9θ))       (25+15θ)/(28+20θ)   5P₁ = (10, 16)   5P₂ = (2+35θ, 30+18θ)
0   1    Dbl   (y+8x+33)/(x+42)     (x+(28+21θ))/(y+(19+16θ)x+(19+16θ))    (10+22θ)/(12+28θ)   10P₁ = (1, 41)   10P₂ = (15+22θ, 38+29θ)
         Add   (x+42)/1             1/(x+(28+21θ))                         (12θ)/(18+32θ)      11P₁ = O         11P₂ = O
i   m_i  Step  Λ₁                  Λ₂                  f (num/den)   U₁               U₂
2   0    Dbl   (y+20x+21)/(x+32)   (x+33)/(y+20x+42)   17/37         2P₁ = (11, 26)   2P₂ = (10, 27)
         Add   Skipped
1   1    Dbl   (y+31x+20)/(x+7)    (x+42)/(y+35x+10)   0/20          4P₁ = (36, 18)   4P₂ = (1, 2)
         Add   (y+2x+39)/(x+33)    (x+7)/(y+19x+22)    0/0           5P₁ = (10, 16)   5P₂ = (36, 18)
0   1    Dbl   (y+8x+33)/(x+42)    (x+20)/(y+3x+3)     0/0           10P₁ = (1, 41)   10P₂ = (23, 29)
         Add   (x+42)/1            1/(x+20)            0/0           11P₁ = O         11P₂ = O
During the doubling step in the second iteration, we have U2 = 2P2 . The
vertical line (x + 42 = 0) passing through 2U2 and −2U2 passes through P1 ,
since 2U2 = 4P2 = 12P1 = P1 . So the numerator fnum becomes 0. During
the addition step of the same iteration, we have U2 = 4P2 = P1 . The line
(y + 19x + 22 = 0) passing through U2 and P2 evaluates to 0 at P1 , and so
the denominator fden too becomes 0.
In practice, one works with much larger values of m. If P2 is a random
multiple of P1 , the probability of accidentally hitting upon this linear relation
in one of the Θ(log m) Miller iterations is rather small, and Algorithm 4.2
successfully terminates with high probability. Nonetheless, if the algorithm
fails, we may choose random points T on the curve and use Eqn (4.26) (instead
of Eqn (4.27) on which Algorithm 4.2 is based) until em (P1 , P2 ) is correctly
computed. In any case, Proposition 4.65(6) indicates that in this case we are
going to get em (P1 , P2 ) = 1 (when m is prime). However, checking whether
P1 and P2 are linearly dependent is, in general, not an easy computational
exercise. Although the situation is somewhat better for supersingular curves,
a check for the dependence of P1 and P2 should be avoided. ¤
The current versions of GP/PARI do not provide ready support for Weil (or
other) pairings. However, it is not difficult to implement Miller's algorithm
using the built-in functions of GP/PARI.
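For concreteness, here is a minimal GP/PARI sketch of this loop for a curve Y² = X³ + aX + b over a prime field. The function names lineval, vertval and millerloop are mine (not built-ins), and degenerate situations — an intermediate multiple of P becoming O inside the loop, or one of the lines vanishing at Q — are not handled; the sketch merely evaluates f_{n,P}(Q) in the factored form of Algorithm 4.1.

    lineval(E, U, V, Q) =
    { \\ value at Q of the line L_{U,V}: chord through U and V, tangent at U if U = V,
      \\ or the vertical line through U if V = -U
      my(lam);
      if (U[1] == V[1] && U[2] != V[2], return (Q[1] - U[1]));   \\ V = -U
      if (U == V && U[2] == 0,          return (Q[1] - U[1]));   \\ tangent at a 2-torsion point
      lam = if (U == V, (3*U[1]^2 + E.a4) / (2*U[2]), (V[2] - U[2]) / (V[1] - U[1]));
      return (Q[2] - U[2] - lam*(Q[1] - U[1]));
    }
    vertval(U, Q) = if (U == [0], 1, Q[1] - U[1]);   \\ L_{U,-U}; the constant 1 when U = O
    millerloop(E, P, Q, n) =
    { \\ double-and-add evaluation of f_{n,P}(Q), following Eqns (4.23) and (4.25)
      my(f = 1, U = P, bits = binary(n));
      for (i = 2, #bits,
        my(U2 = elladd(E, U, U));
        f = f^2 * lineval(E, U, U, Q) / vertval(U2, Q);
        U = U2;
        if (bits[i],
          my(UP = elladd(E, U, P));
          f = f * lineval(E, U, P, Q) / vertval(UP, Q);
          U = UP));
      return (f);
    }

For a pairing computation one calls millerloop with n = m; the coordinates of Q may lie in an extension field (for example, represented modulo T² + 1 over F_p as in Example 4.68), while E itself is initialized with ellinit over the prime field.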
that is, f (D′ ) and f (D) differ by a multiplicative factor which is an m-th
power in L∗ . Treating f (D) as an element of L∗ /(L∗ )m makes it unique.
Another way of making the Tate pairing unique is based upon the fact
that f (D)^{(q^k−1)/m} = f (D′)^{(q^k−1)/m} ( g([P ] − [O]) )^{q^k−1} = f (D′)^{(q^k−1)/m}, since a^{q^k−1} = 1
for all a ∈ L* = F*_{q^k}. The reduced Tate pairing of P and Q is defined as

ê_m(P, Q) = (⟨P, Q⟩_m)^{(q^k−1)/m} = f (D)^{(q^k−1)/m}.
Raising hP, Qim to the exponent (q k − 1)/m is called final exponentiation.
Tate pairing is related to Weil pairing as

e_m(P, Q) = ⟨P, Q⟩_m / ⟨Q, P⟩_m,
where the equality is up to multiplication by elements of (L∗ )m . Tate pairing
shares some (not all) properties of Weil pairing listed in Proposition 4.65.
Proposition 4.69 For appropriate points P, Q, R on E, we have:
(1) Bilinearity:
hP + Q, Rim = hP, Rim × hQ, Rim ,
hP, Q + Rim = hP, Qim × hP, Rim .
(2) Non-degeneracy: For every P ∈ E_L[m], P ≠ O, there exists Q for which
⟨P, Q⟩_m ≠ 1. For every Q ∉ mE_L, there exists P ∈ E_L[m] with ⟨P, Q⟩_m ≠ 1.
Algorithm 4.3 is somewhat more efficient than Algorithm 4.2. First, Tate
pairing requires only one point U to be maintained and updated in the loop,
whereas Weil pairing requires two (U1 and U2 ). Second, in the loop of Al-
gorithm 4.3, only one set of rational functions (LU,U and L2U,−2U during
doubling, and LU,P and LU +P,−(U +P ) during addition) needs to be computed
(but evaluated twice). The loop of Algorithm 4.2 requires the computation
of two sets of these functions. To avoid degenerate output, it is a common
practice to take the first point P from Eq and the second point Q from Eqk .
In this setting, the functions fn,P are defined over Fq , whereas the functions
f_{n,Q} are defined over F_{q^k}. This indicates that Miller's loop for computing
hP, Qim is more efficient than that for computing hQ, P im . Moreover, if P
and Q are known to be linearly independent, we use Eqn (4.29) instead of
Eqn (4.28). This reduces the number of evaluations of the line functions by a
factor of two. As a result, Tate pairing is usually preferred to Weil pairing in
practical applications. The reduced Tate pairing, however, calls for an extra
final exponentiation. If k is not too small, this added overhead may make Tate
pairing less efficient than Weil pairing.
Example 4.70 Let us continue to work with the curve of Example 4.68, and
compute the Tate pairing of P = (1, 2) and Q = (15 + 22θ, 5 + 14θ). (These
points were called P1 and P2 in Example 4.68). Miller’s loop of Algorithm 4.3
proceeds as in the following table. These computations correspond to the
point T = (36 + 12θ, 40 + 31θ) for which Q + T = (19 + 32θ, 24 + 27θ). Here,
we have only one set of points maintained as U (Weil pairing required two:
U1 , U2 ). The updating rational function Λ is LU,U /L2U,−2U for doubling and
LU,P /LU +P,−(U +P ) for addition.
i   m_i  Step  Λ                    f (num/den)          U
2   0    Dbl   (y+20x+21)/(x+32)    (41+17θ)/(27+27θ)    2P = (11, 26)
         Add   Skipped
1   1    Dbl   (y+31x+20)/(x+32)    (14+31θ)/(15+15θ)    4P = (36, 18)
         Add   (y+2x+39)/(x+33)     (41+36θ)/(37+16θ)    5P = (10, 16)
0   1    Dbl   (y+8x+33)/(x+42)     (36+24θ)/(11+36θ)    10P = (1, 41)
         Add   (x+42)/1             (9+36θ)/(39+16θ)     11P = O
The value of ⟨P, Q⟩_m depends heavily on the choice of the point T. For
example, the choice T = (34 + 23θ, 9 + 23θ) (another point of order 44) gives
⟨P, Q⟩_m = 4 + 33θ. The two values of ⟨P, Q⟩_m differ by a factor which is an
m-th power in F*_{43^2}:

(4 + 33θ)/(14 + 4θ) = 9 + 9θ = (4 + 23θ)^m.
However, the final exponentiation gives the same value, that is,

ê_m(P, Q) = (4 + 33θ)^{(43²−1)/11} = 2 + 13θ.
i   m_i  Step  Λ                    f (num/den)          U
2   0    Dbl   (y+20x+21)/(x+32)    (25+24θ)/(4+22θ)     2P = (11, 26)
         Add   Skipped
1   1    Dbl   (y+31x+20)/(x+32)    (5+23θ)/(22+26θ)     4P = (36, 18)
         Add   (y+2x+39)/(x+33)     (25+14θ)/(11+12θ)    5P = (10, 16)
0   1    Dbl   (y+8x+33)/(x+42)     (13+29θ)/(19+8θ)     10P = (1, 41)
         Add   (x+42)/1             (17+4θ)/(19+8θ)      11P = O
We now get ⟨P, Q⟩_m = (17 + 4θ)/(19 + 8θ) = 15 + 12θ. We have seen that Eqn (4.28)
with T = (36 + 12θ, 40 + 31θ) gives ⟨P, Q⟩_m = 14 + 4θ. The ratio of these two
values is (15 + 12θ)/(14 + 4θ) = 7θ = (6θ)^m, which is an m-th power in F*_{43^2}. Now, the
reduced pairing is ê_m(P, Q) = (15 + 12θ)^{(43²−1)/11} = 2 + 13θ, which is again
the same as that computed using Eqn (4.28). ¤
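The final exponentiation is easy to reproduce in GP/PARI; the following two lines rebuild F_{43^2} with θ² = −1 and check that the value 4 + 33θ obtained above indeed reduces to 2 + 13θ (a check of this example's numbers; θ is represented as a formal root of T² + 1).

    q = 43; m = 11; th = Mod(Mod(1, q)*T, T^2 + 1);        \\ theta in F_{43^2}
    print(lift(lift( (4 + 33*th) ^ ((q^2 - 1) / m) )));    \\ prints 13*T + 2, that is, 2 + 13*theta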
Tate pairing of P₁ = (1, 2) and φ(P₂) = (20, 14θ) is a non-trivial value. Algo-
rithm 4.3 with T = (37 + 6θ, 14 + 13θ) gives ⟨P₁, φ(P₂)⟩_m = 21 + 2θ, and so
ê_m(P₁, φ(P₂)) = (21 + 2θ)^{(43²−1)/11} = 18 + 8θ. Moreover, Algorithm 4.2 for
Weil pairing now gives e_m(P₁, φ(P₂)) = 11 + 3θ, again a non-trivial value.
Let us now compute the pairing of P₂ = (23, 14) and φ(P₁) = (−1, 2θ) =
(42, 2θ). Tate pairing with T = (38 + 21θ, 19 + 11θ) gives ⟨P₂, φ(P₁)⟩_m =
30 + 29θ = (21 + 2θ) × (23θ) = (21 + 2θ) × (30θ)^m. The reduced Tate pairing
is ê_m(P₂, φ(P₁)) = (30 + 29θ)^{(43²−1)/11} = 18 + 8θ = ê_m(P₁, φ(P₂)). Finally,
the Weil pairing of P₂ and φ(P₁) is e_m(P₂, φ(P₁)) = 11 + 3θ = e_m(P₁, φ(P₂)).
Symmetry in the two arguments is thereby demonstrated. ¤
4.5.4.2 Twists
Another way of achieving linear independence of P and Q′ is by means of
twists, which work even for ordinary curves. Suppose that p ≠ 2, 3, and E is
defined by the short Weierstrass equation E : Y² = X³ + aX + b. Further, let
d be an integer > 2, and v ∈ F*_q a d-th power non-residue. The curve

E′ : Y² = X³ + v^{4/d} aX + v^{6/d} b
Freeman, Michael Scott and Edlyn Teske, A taxonomy of pairing-friendly elliptic curves,
Journal of Cryptology, 23(2), 224–280, 2010.
(4) m(x)|Φk (t(x) − 1), where Φk is the k-th cyclotomic polynomial (see
Exercise 3.36).
(5) There are infinitely many integers (x, y) satisfying ∆y 2 = 4q(x) − t(x)2 .
If we are interested in ordinary curves, we additionally require:
(6) gcd(q(x), m(x)) = 1.
For a choice of t(x), m(x), q(x), families of elliptic curves over Fq of size
m, embedding degree k, and discriminant ∆ can be constructed using the
complex multiplication method. If y in Condition (5) can be parametrized by
a polynomial y(x) ∈ Q[x], the family is called complete, otherwise it is called
sparse. Some sparse families of ordinary pairing-friendly curves are:
• MNT (Miyaji–Nakabayashi–Takano) curves29 : These are ordinary curves
of prime orders with embedding degrees three, four, or six. Let m > 3
be the order (prime) of an ordinary curve E, t = q + 1 − m the trace of
Frobenius, and k the embedding degree of E. The curve E is completely
characterized by the following result.
(1) k = 3 if and only if t = −1 ± 6x and q = 12x2 − 1 for some x ∈ Z.
(2) k = 4 if and only if t = −x or t = x + 1, and q = x2 + x + 1 for
some x ∈ Z.
(3) k = 6 if and only if t = 1 ± 2x and q = 4x2 + 1 for some x ∈ Z.
• Freeman curves30 : These curves have embedding degree ten, and corre-
spond to the choices:
t(x) = 10x2 + 5x + 3,
m(x) = 25x4 + 25x3 + 15x2 + 5x + 1,
q(x) = 25x4 + 25x3 + 25x2 + 10x + 3.
For this family, we have m(x) = q(x) + 1 − t(x). The discriminant ∆ of
Freeman curves satisfies ∆ ≡ 43 or 67 (mod 120).
Some complete families of ordinary pairing-friendly curves are:
• BN (Barreto–Naehrig) curves31 : These curves have embedding degree
12 and discriminant three, and correspond to the following choices.
t(x) = 6x2 + 1,
m(x) = 36x4 + 36x3 + 18x2 + 6x + 1,
q(x) = 36x4 + 36x3 + 24x2 + 6x + 1.
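As a toy illustration of how such a family is used (my own example, not from the text), the following GP/PARI fragment checks the relation m(x) = q(x) + 1 − t(x) and searches for small x making both q(x) and m(x) prime; already x = 1 yields the (cryptographically useless) pair q = 103, m = 97, while real BN parameters use much larger values of x.

    t(x) = 6*x^2 + 1;
    m(x) = 36*x^4 + 36*x^3 + 18*x^2 + 6*x + 1;
    q(x) = 36*x^4 + 36*x^3 + 24*x^2 + 6*x + 1;
    print(q('x) + 1 - t('x) == m('x));   \\ 1: the group order equals q + 1 - t
    for (x = 1, 100,
      if (isprime(q(x)) && isprime(m(x)), print([x, q(x), m(x)])));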
29 Atsuko Miyaji, Masaki Nakabayashi and Shunzo Takano, New explicit conditions of
t(x) = −4x2 + 4x + 2,
17
q(x) = 4x5 − 8x4 + 3x3 − 3x2 + x + 1,
4
m(x) = 16x4 − 32x3 + 12x2 + 4x + 1.
t(x) = −x² + 1,
m(x) = Φ_{4k}(x),
q(x) = (1/4)( x^{2k+4} + 2x^{2k+2} + x^{2k} + x⁴ − 2x² + 1 )
parametrize a family of BW curves with odd embedding degree k < 1000
and with discriminant ∆ = 1. Many other families of BLS and BW
curves are known.
Although the double-and-add loop for Tate pairing is more efficient than
that for Weil pairing, the added overhead of final exponentiation is unpleasant
for the reduced Tate pairing. Fortunately, we can choose the curve parameters
and tune this stage so as to arrive at an efficient implementation.38
Suppose that the basic field of definition of the elliptic curve E is Fq with
p = char Fq . We take m to be a prime dividing |Eq |. If k is the embedding
degree for this q and m, the final-exponentiation stage involves an exponent
of (q k − 1)/m. In this stage, we do arithmetic in the extension field Fqk .
36 Ian F. Blake, V. Kumar Murty and Guangwu Xu, Refinements of Miller’s algorithm for
tation based on Miller’s algorithm, Applied Mathematics and Computation, 189(1), 395–409,
2007. This is available also at http://eprint.iacr.org/2006/106.
38 Michael Scott, Naomi Benger, Manuel Charlemagne, Luis J. Dominguez Perez and
Ezekiel J. Kachisa, On the final exponentiation for calculating pairings on ordinary elliptic
curves, Pairing, 78–88, 2009.
The inner exponentiation (to the power q^d − 1) involves d q-th power exponen-
tiations, followed by multiplication by f^{−1}. Since |(q^d + 1)/m| ≈ (1/2)|(q^k − 1)/m|,
this strategy reduces final-exponentiation time by a factor of about two.
39 Paulo S. L. M. Barreto, Ben Lynn, and Michael Scott, On the selection of pairing-
Eta Pairing
Barreto et al.41 propose an improvement of the Duursma–Lee construction.
Let E be a supersingular elliptic curve defined over K = Fq , m a prime divisor
of |Eq |, and k the embedding degree. E being supersingular, there exists a
distortion map φ : G → G′ for suitable groups G ⊆ Eq [m] and G′ ⊆ Eqk [m]
of order m. The distorted Tate pairing is defined as hP, φ(Q)im = fm,P (D)
for P, Q ∈ G, where D is a divisor equivalent to [φ(Q)] − [O]. For a suitable
choice of M , Barreto et al. define the eta pairing of P, Q ∈ G as
For the original Tate pairing, we have mP = O. Now, we remove this require-
ment; M P = O is no longer demanded. We take M = q − cm for some c ∈ Z such that for every
point P ∈ G we have M P = γ(P ) for some automorphism γ of Eq . The au-
tomorphism γ and the distortion map φ should satisfy the golden condition:
γ(φq (P )) = φ(P ) for all P ∈ Eq . If M a + 1 = λm for some a ∈ N and λ ∈ Z,
then ηM (P, Q) is related to the Tate pairing hP, φ(Q)im as
( η_M(P, Q)^{aM^{a−1}} )^{(q^k−1)/m} = ( ⟨P, φ(Q)⟩_m^λ )^{(q^k−1)/m} = ê_m(P, φ(Q))^λ.
Example 4.77 Eta pairing provides a sizeable speedup for elliptic curves
over finite fields of characteristics two and three. For fields of larger charac-
teristics, eta pairing is not very useful. Many families of supersingular curves
and distortion maps on them can be found in Example 4.76 and Exercise 4.67.
(1) Let E : Y 2 + Y = X 3 + X + a, a ∈ {0, 1}, be a supersingular curve
defined over F2r with odd r. The choice γ = φr satisfies the golden condition.
In this case, |E_q| = 2^r ± 2^{(r+1)/2} + 1. Suppose that |E_q| is prime, so we take
m = |E_q|. We choose M = ∓2^{(r+1)/2} − 1, so for a = 2 we have M² + 1 = 2m,
40 Iwan M. Duursma and Hyang-Sook Lee, Tate pairing implementation for hyperelliptic
Efficient pairing computation on supersingular Abelian varieties, Designs, Codes and Cryp-
tography, 239–271, 2004.
Ate Pairing
Hess et al.42 extend the idea of eta pairing to ordinary curves. Distortion
maps do not exist for ordinary curves. Nonetheless, subgroups G ⊆ Eq and
G′ ⊆ Eqk of order m can be chosen. Instead of defining a pairing on G × G′ ,
Hess et al. define a pairing on G′ × G. If t is the trace of Frobenius for E at
q, they take M = t − 1, and define the ate pairing of Q ∈ G′ and P ∈ G as
aM (Q, P ) = fM,Q (P ).
The ate pairing is related to the Tate pairing as follows. Let N = gcd(M^k − 1,
q^k − 1), where k is the embedding degree. Write M^k − 1 = λN, and c =
Σ_{i=0}^{k−1} M^{k−1−i} q^i ≡ kq^{k−1} (mod m). Then, we have

( a_M(Q, P)^c )^{(q^k−1)/N} = ( ⟨Q, P⟩_m^λ )^{(q^k−1)/m} = ê_m(Q, P)^λ.
Example 4.78 I now illustrate how twists can help in speeding up each Miller
iteration. The following example is of a modified ate pairing that uses twists,
since twisted ate pairing is defined in a slightly different way.
Take a Barreto–Naehrig (BN) curve E : Y 2 = X 3 + b defined over Fp with
a prime p ≡ 1 (mod 6). The embedding degree is k = 12. Define a sextic twist
of E with respect to a primitive sixth root ζ of unity. The twisted curve can
be written as E ′ : µY 2 = νX 3 + b, where µ ∈ F∗p2 is a cubic non-residue,
and ν ∈ F*_{p^2} is a quadratic non-residue. E′ is defined over F_{p^2}, and we take a
subgroup G′ of order m in E′_{p^2}. A homomorphism φ₆ that maps G′ into E_{p^12}
is given by (r, s) ↦ (ν^{1/3} r, µ^{1/2} s). We use standard ate pairing to define the
pairing a_M(φ₆(Q), P) of Q ∈ G′ and P ∈ G. For Q ∈ G′, the point φ₆(Q) is
defined over F_{p^12}, but not over smaller subfields (in general). Nonetheless, the
association of G′ with φ6 (G′ ) allows us to work in Fp2 in some parts of the
Miller loop. For example, if Q1 , Q2 ∈ G′ , then φ6 (Q1 )+φ6 (Q2 ) = φ6 (Q1 +Q2 ),
that is, the point arithmetic in Miller’s loop can be carried out in Fp2 . ¤
Ate_i Pairing
Zhao et al.43 propose another optimization of ate pairing. For an integer i
in the range 1 ≤ i ≤ k − 1, they take M_i ≡ (t − 1)^i ≡ q^i (mod m), and define
the ate_i pairing of Q ∈ G′ and P ∈ G as f_{M_i,Q}(P).
R-ate Pairing
At present, the best loop-reducing pairing is proposed by Lee et al.44 If e
and e′ are two pairings, then so also is e(Q, P )u e′ (Q, P )v for any integers u, v.
For A, B, a, b ∈ Z with A = aB + b, Lee et al. define the R-ate pairing as
R_{A,B}(Q, P) = f_{a,BQ}(P) f_{b,Q}(P) G_{aBQ,bQ}(P),

where G_{U,V} = L_{U,V} / L_{U+V,−(U+V)}. Not every choice of A, B, a, b defines a pair-
ing. If f_{A,Q}(P) and f_{B,Q}(P) define non-degenerate bilinear pairings with
ê_m(Q, P)^{λ₁} = f_{A,Q}(P)^{µ₁} and ê_m(Q, P)^{λ₂} = f_{B,Q}(P)^{µ₂} for λ₁, λ₂, µ₁, µ₂ ∈ Z,
then R_{A,B}(Q, P) is again a non-degenerate bilinear pairing satisfying

ê_m(Q, P)^λ = R_{A,B}(Q, P)^µ,

where µ = lcm(µ₁, µ₂) and λ = (µ/µ₁)λ₁ − a(µ/µ₂)λ₂, provided that m ∤ λ.
There are several choices for A, B, including q, m, and the integers Mi of atei
pairing. If A = Mi and B = m, then RA,B (Q, P ) is the atei pairing of Q, P .
R-ate pairing makes two invocations of the Miller loop, but a suitable
choice of A, B, a, b reduces the total number of Miller iterations compared to
the best atei pairing. There are examples where loop reduction can be by a
factor of six over Tate pairing (for ate and atei pairings, the reduction factor
can be at most two). Moreover, R-ate pairing is known to be optimal on certain
curves for which no atei pairing is optimal. Another useful feature of R-ate
pairing is that it can handle both supersingular and ordinary curves.
posia in Pure Mathematics, 20, 415–440, 1971. The BSGS paradigm is generic enough to
be applicable to a variety of computational problems. Its adaptation to point counting and
Mestre’s improvement are discussed in Schoof’s 1995 paper (see Footnote 46). Also see
Section 7.1.1 for another adaptation of the BSGS method.
If we complete all the giant steps, we find no other multiple of ord P in the
Hasse interval [935, 1061]. So the size of EF997 is 1043. Since 1043 = 7 × 149
is square-free, this group is cyclic. ¤
The BSGS method may fail to supply a unique answer if ord P has more
than one multiple in the Hasse interval. For instance, suppose that we start
with 149P as the base point P in Example 4.79. This point has order 7, so
every multiple of 7 in the Hasse interval will be supplied as a possible can-
didate for |Eq |. For this example, the problem can be overcome by repeating
the algorithm for different random choices of the base point P . After a few
iterations, we expect to find a P with a unique multiple of ord P in the Hasse
interval. This is indeed the expected behavior if Eq is a cyclic group.
However, the group Eq need not be cyclic. By Theorem 4.12, we may have
E_q ≅ Z_{n₁} ⊕ Z_{n₂} with n₂ | gcd(n₁, q − 1). Every point P ∈ E_q satisfies n₁P = O
(indeed, n1 is the smallest positive integer with this property; we call n1 the
exponent of the group Eq ). If n1 is so small that the Hasse interval contains
two or more multiples of n1 , the BSGS method fails to supply a unique answer,
no matter how many times we run it (with different base points P ).
Example 4.80 The curve E : Y 2 = X 3 +X +161 defined over F1009 contains
1024 points and has the group structure Z64 ⊕ Z16 . The exponent of the group
is 64, which has two multiples 960 and 1024 in the Hasse interval [947, 1073].
Therefore, both 960P = O and 1024P = O for any point P on E. A point P of
order smaller than 64 has other multiples of ord P in the Hasse interval. Trying
several random points on E, we can eliminate these extra candidates, but the
ambiguity between 960 and 1024 cannot be removed. This is demonstrated
below for P = (6, 49). Now, we have s = 6, Q = (947, 339), and R = (947, 670).
Baby steps:
j    jP            −jP
0    O
1    (6, 49)       (6, 960)
2    (3, 47)       (3, 962)
3    (552, 596)    (552, 413)
4    (798, 854)    (798, 155)
5    (455, 510)    (455, 499)
6    (413, 641)    (413, 368)

Giant steps (the last column records a match with a baby-step index j):
i     Q + iR         j
0     (947, 339)
1     O              0
2     (947, 670)
3     (550, 195)
4     (588, 583)
5     (604, 602)
6     (6, 49)        1
7     (717, 583)
8     (855, 172)
9     (756, 1000)
10    (713, 426)
11    (3, 47)        2
12    (842, 264)
13    (374, 133)
For (i, j) = (1, 0), we get m = 947+13−0 = 960, whereas for (i, j) = (6, 1),
we get m = 947+6×13−1 = 1024. The algorithm also outputs (i, j) = (11, 2),
It is easy to verify that the BSGS method makes Θ(q^{1/4}) group operations
in E_q, that is, the BSGS method is an exponential-time algorithm, and cannot
be used for elliptic-curve point counting, except when q is small.
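For comparison, GP/PARI's built-in point counting resolves Example 4.80 immediately (a quick cross-check of the numbers quoted above):

    p = 1009; E = ellinit([1, 161] * Mod(1, p));   \\ Y^2 = X^3 + X + 161 over F_1009
    print(ellcard(E));                             \\ 1024
    print(ellgroup(E));                            \\ [64, 16], i.e., Z_64 ⊕ Z_16
    print(ellorder(E, [Mod(6, p), Mod(49, p)]));   \\ order of the base point P = (6, 49)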
ϕ2 − tϕ + p = 0, (4.30)
where the addition, the subtraction and the scalar multiplications correspond
to the arithmetic of EF̄p . Let tr ≡ t (mod r) and pr ≡ p (mod r) for a small
odd prime r not equal to p. We then have
(x^{p²}, y^{p²}) − t_r (x^p, y^p) + p_r (x, y) = O   (4.32)
for all points (x, y) in the group E[r] of r-torsion points on EF̄p (that is, points
P with rP = O). By varying t_r in the range 0 ≤ t_r ≤ r − 1, we find the correct
value of tr for which Eqn (4.32) holds identically on E[r]. For each trial value
of tr , we compute the left side of Eqn (4.32) symbolically using the addition
formula for the curve. There are, however, two problems with this approach.
The first problem is that the left side of Eqn (4.32) evaluates to a pair
of rational functions of degrees as high as Θ(p2 ). Our aim is to arrive at a
polynomial-time algorithm (in log p). This problem is solved by using division
polynomials (Section 4.4.3). For the reduced Weierstrass equation, we have
ψ0 (x, y) = 0,
ψ1 (x, y) = 1,
ψ2 (x, y) = 2y,
ψ3 (x, y) = 3x4 + 6ax2 + 12bx − a2 ,
ψ4 (x, y) = 4y(x6 + 5ax4 + 20bx3 − 5a2 x2 − 4abx − 8b2 − a3 ),
48 Schoof’s algorithm performs symbolic manipulation on the coordinates x, y of points on
Since we are interested in evaluating Eqn (4.32) for points in E[r], it suffices
to do so modulo ψr (x, y). But the polynomials ψr (x, y) are in two variables
x, y. Since y 2 = x3 + ax + b, we can simplify ψm (x, y) to either a polynomial
in Fp [x] or y times a polynomial in Fp [x]. In particular, we define
f_m(x) = ψ_m(x, y)     if m is odd,
f_m(x) = ψ_m(x, y)/y   if m is even.
The polynomials f_m(x) are in one variable x only, with

deg f_m(x) = (m² − 1)/2 if m is odd, and (m² − 4)/2 if m is even.
β = 4f_π ( (x − x^{p²}) f_π² − (x³ + ax + b) f_{π−1} f_{π+1} )   if π is odd,
β = 4(x³ + ax + b) f_π ( (x − x^{p²})(x³ + ax + b) f_π² − f_{π−1} f_{π+1} )   if π is even.
We check whether the following condition holds for the selected value of τ:

0 = δ₁ f_τ^{2p} + δ₂ (f_{τ−1} f_{τ+1})^p (x³ + ax + b)^p   if τ is odd,
0 = δ₁ f_τ^{2p} (x³ + ax + b)^p + δ₂ (f_{τ−1} f_{τ+1})^p   if τ is even,   (4.35)
In order to identify the correct sub-case, we first compute the Legendre
symbol (π/r). If (π/r) = −1, then τ = 0. Otherwise, we compute a square root
w of π (or p) modulo r.
w of π (or p) modulo r. We may use a probabilistic algorithm like the Tonelli
and Shanks algorithm (Algorithm 1.9) for computing w. Since r = O(log p),
we can find w by successively squaring 1, 2, 3, . . . until w2 ≡ π (mod r) holds.
Now, we check whether ϕ(P ) = wP or ϕ(P ) = −wP for some P ∈ E[r].
In the first case, τ ≡ 2w (mod r), whereas in the second, τ ≡ −2w (mod r).
If no such P exists in E[r], we have τ = 0. If we concentrate only on the
x-coordinates, we can detect the existence of such a P . As we have done in
detecting whether Q = ±R (that is, ϕ2 (P ) = ±πP ), checking the validity of
the condition ϕ(P ) = ±wP boils down to computing the following gcd:
δ(x) = gcd( f_r(x), (x^p − x) f_w²(x) + f_{w−1}(x) f_{w+1}(x)(x³ + ax + b) )   if w is odd,
δ(x) = gcd( f_r(x), (x^p − x) f_w²(x)(x³ + ax + b) + f_{w−1}(x) f_{w+1}(x) )   if w is even,
where the second argument is computed modulo fr (x). If δ(x) = 1, then τ = 0.
Otherwise, we need to identify which one of the equalities ϕ(P ) = ±wP holds.
For deciding this, we consult the y-coordinates, and compute another gcd:

η(x) = gcd( f_r(x), 4(x³ + ax + b)^{(p−1)/2} f_w³(x) − f_{w+2}(x) f_{w−1}²(x) + f_{w−2}(x) f_{w+1}²(x) )   if w is odd,
η(x) = gcd( f_r(x), 4(x³ + ax + b)^{(p+3)/2} f_w³(x) − f_{w+2}(x) f_{w−1}²(x) + f_{w−2}(x) f_{w+1}²(x) )   if w is even,
where again arithmetic modulo fr (x) is used to evaluate the second argument.
If η(x) = 1, then τ ≡ −2w (mod r), otherwise τ ≡ 2w (mod r).
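GP/PARI exposes the result of such point counting through ellap, which returns the trace of Frobenius t = q + 1 − |E_q| directly; for instance, on the toy curve used earlier (an illustrative choice):

    p = 7; E = ellinit([2, 3] * Mod(1, p));   \\ Y^2 = X^3 + 2X + 3 over F_7
    print(ellap(E));                          \\ the trace t; here t = 2, so |E_{F_7}| = 6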
Exercises
1. Prove that the Weierstrass equation of an elliptic curve is irreducible, that is,
the polynomial Y 2 + (a1 X + a3 )Y − (X 3 + a2 X 2 + a4 X + a6 ) with ai ∈ K is
irreducible in K[X, Y ].
2. Prove that an elliptic or hyperelliptic curve is smooth at its point at infinity.
3. Let C : Y 2 = f (X) be the equation of a cubic curve C over a field K with
char K 6= 2, where f (X) = X 3 + aX 2 + bX + c with a, b, c ∈ K. Prove that C
is an elliptic curve (that is, smooth or non-singular) if and only if Discr(f ) ≠ 0
(or, equivalently, if and only if f (X) has no multiple roots).
4. Let K be a finite field of characteristic two, and a, b, c ∈ K. Prove that:
(a) The curve Y² + aY = X³ + bX + c is smooth if and only if a ≠ 0.
(b) The curve Y² + XY = X³ + aX² + b is smooth if and only if b ≠ 0.
5. Determine which of the following curves is/are smooth (that is, elliptic curves).
(a) Y 2 = X 3 − X 2 − X + 1 over Q.
(b) Y 2 + 2Y = X 3 + X 2 over Q.
(c) Y 2 + 2XY = X 3 + 1 over Q.
(d) Y 2 + 4XY = X 3 + 4X over Q.
(e) Y 2 + Y = X 3 + 5 over F7 .
(f ) Y 2 + Y = X 3 + 5 over F11 .
6. Let the elliptic curve E : Y 2 = X 3 + 2X + 3 be defined over F7 . Take the
points P = (2, 1) and Q = (3, 6) on E.
(a) Compute the points P + Q, 2P and 3Q on the curve.
(b) Determine the order of P in the elliptic curve group E(F7 ).
(c) Find the number of points on E treated as an elliptic curve over F49 = F72 .
7. Let P = (h, k) be a point with 2P = (h′, k′) ≠ O on the elliptic curve Y² =
X³ + aX² + bX + c. Verify that

h′ = (h⁴ − 2bh² − 8ch − 4ac + b²)/(4k²) = (h⁴ − 2bh² − 8ch − 4ac + b²)/(4h³ + 4ah² + 4bh + 4c), and

k′ = ( h⁶ + 2ah⁵ + 5bh⁴ + 20ch³ + (20ac − 5b²)h² + (8a²c − 2ab² − 4bc)h + (4abc − b³ − 8c²) ) / (8k³).
8. Let K be a finite field of characteristic two, and a, b, c ∈ K. Prove that:
(a) The supersingular curve E1 : Y 2 + aY = X 3 + bX + c contains no points
of order two. In particular, the size of (E1 )K is odd.
(b) The ordinary curve E2 : Y 2 + XY = X 3 + aX 2 + b contains exactly one
point of order two. In particular, the size of (E2 )K is even.
9. Let E be an elliptic curve defined over a field K with char K 6= 2, 3. Prove
that E has at most eight points of order three. If K is algebraically closed,
prove that E has exactly eight points of order three.
25. Let E be a supersingular elliptic curve defined over a prime field Fp with
p > 5. Determine the size of E_{F_{p^n}}, and conclude that E remains supersingular
over all extensions F_{p^n}, n ≥ 1.
26. Rewrite the square-and-multiply exponentiation algorithm (Algorithm 1.4)
for computing the multiple of a point on an elliptic curve. (In the context of
elliptic-curve point multiplication, we call this a double-and-add algorithm.)
27. [Eisenträger, Lauter and Montgomery] In the double-and-add elliptic-curve
point-multiplication algorithm, we need to compute points 2P +Q for every 1-
bit in the multiplier. Conventionally, this is done as (P +P )+Q. Assuming that
the curve is given by the short Weierstrass equation, count the field operations
used in computing (P + P ) + Q. Suppose instead that 2P + Q is computed
as (P + Q) + P . Argue that we may avoid computing the Y -coordinate of
P + Q. What saving does the computation of (P + Q) + P produce (over that
of (P + P ) + Q)?
28. Find the points at infinity (over R and C) on the following real curves.
(a) Ellipses of the form X²/a² + Y²/b² = 1.
(b) Hyperbolas of the form X²/a² − Y²/b² = 1.
(c) Hyperbolas of the form XY = a.
29. Let C : f (X, Y ) = 0 be a curve defined by a non-constant irreducible polyno-
mial f (X, Y ) ∈ K[X, Y ]. Let d be deg f (X, Y ), and fd (X, Y ) the sum of all
non-zero terms of degree d in f (X, Y ). Prove that all points at infinity on C
are obtained by solving fd (X, Y ) = 0. Conclude that all the points at infinity
on C can be obtained by solving a univariate polynomial equation over K.
30. [Projective coordinates] Projective coordinates are often used to speed up
elliptic-curve arithmetic. In the projective plane, a finite point (h, k) corre-
sponds to the point [h′, k′, l′] with l′ ≠ 0, h = h′/l′, and k = k′/l′. Let E be
an elliptic curve defined by the special Weierstrass equation Y 2 = X 3 +aX +b,
and the finite points P1 , P2 on E have projective coordinates [h1 , k1 , l1 ] and
[h2 , k2 , l2 ]. Further, let P1 + P2 have projective coordinates [h, k, l], and 2P1
have projective coordinates [h′ , k ′ , l′ ].
(a) Express h, k, l as polynomials in h1 , k1 , l1 , h2 , k2 , l2 .
(b) Express h′ , k ′ , l′ as polynomials in h1 , k1 , l1 .
(c) Show how the double-and-add point-multiplication algorithm (Exercise
4.26) can benefit from the representation of points in projective coordinates.
31. [Mixed coordinates]49 Take the elliptic curve Y 2 = X 3 +aX +b. Suppose that
the point P1 = [h′1 , k1′ , l1′ ] on the curve is available in projective coordinates,
whereas the point P2 = (h2 , k2 ) is available in affine coordinates. Express the
projective coordinates of P1 +P2 as polynomial expressions in h′1 , k1′ , l1′ , h2 , k2 .
What impact does this have on the point-multiplication algorithm?
49 Henri Cohen, Atsuko Miyaji and Takatoshi Ono, Efficient elliptic curve exponentiation
addition in formal groups and new primality and factorization tests, Advances in Applied
Mathematics, 7(4), 385–434, 1986.
51 Julio López and Ricardo Dahab, Improved algorithms for elliptic curve arithmetic in
39. Prove that the norm function defined by Eqn (4.11) is multiplicative, that is,
N(G1 G2 ) = N(G1 ) N(G2 ) for all polynomial functions G1 , G2 ∈ K[C].
40. Consider the unit circle C : X 2 +Y 2 = 1 as a complex curve. Find all the zeros
and poles of the rational function R(x, y) of Example 4.35. Also determine the
multiplicities of these zeros and poles. (Hint: Use the factored form given in
Eqn (4.12). Argue that 1/x can be taken as a uniformizer at each of the two
points at infinity on C.)
41. Consider the real hyperbola H : X 2 − Y 2 = 1. Find all the zeros and poles
(and their respective multiplicities) of the following rational function on H:
R(x, y) = (2y⁴ − 2y³x − y² + 2yx − 1) / (y² + yx + y + x + 1).
(Hint: Split the numerator and the denominator of R into linear factors.)
42. Repeat Exercise 4.41 treating the hyperbola H as being defined over F5 .
43. Find all the zeros and poles (and their multiplicities) of the rational function
x/y on the curve Y 2 = X 3 − X defined over C.
44. Find all the zeros and poles (and their multiplicities) of the function x2 + yx
on the curve Y 2 = X 3 + X defined over F3 .
45. Find all the zeros and poles (and their multiplicities) of the function 1 + yx
on the curve Y 2 = X 3 + X − 1 defined over the algebraic closure F̄7 of F7 .
46. Prove that the q-th power Frobenius map ϕq (Definition 4.47) is an endomor-
phism of E = EF̄q .
47. Prove that an admissible change of variables (Theorem 4.51) does not change
the j-invariant (Definition 4.9).
48. Let K be a field of characteristic ≠ 2, 3. Prove that the elliptic curves E :
Y 2 = X 3 + aX + b and E ′ : Y 2 = X 3 + a′ X + b′ defined over K are isomorphic
over K̄ if and only if there exists a non-zero u ∈ K̄ such that replacing X by
u2 X and Y by u3 Y converts the equation for E to the equation for E ′ .
49. (a) Find all isomorphism classes of elliptic curves defined over F5 , where iso-
morphism is over the algebraic closure F̄5 of F5 .
(b) Argue that the curves Y 2 = X 3 + 1 and Y 2 = X 3 + 2 are isomorphic over
the algebraic closure F̄5 , but not over F5 .
(c) According to Definition 4.50, isomorphism of elliptic curves E and E ′ is
defined by the existence of bijective bilinear maps, not by the isomorphism
of the groups E_K and E′_K (where K is a field over which both E and E′ are
defined). As an example, show that the curves Y 2 = X 3 + 1 and Y 2 = X 3 + 2
have isomorphic groups over F5 .
50. Consider the elliptic curves E : Y 2 = X 3 + 4X and E ′ : Y 2 = X 3 + 4X + 1
both defined over F5 .
(a) Determine the group structures of EF5 and EF′ 5 .
(b) Demonstrate that the rational map

φ(x, y) = ( (x² − x + 2)/(x − 1), (x²y − 2xy − y)/(x² − 2x + 1) )
59. [Blake, Murty and Xu] Let U ∈ E[m] be non-zero, and Q ≠ O, U, 2U, . . . ,
(m − 1)U. Prove the following assertions.
(a) L_{U,U}(Q) / ( L_{U,−U}²(Q) L_{2U,−2U}(Q) ) = −1 / L_{U,U}(−Q).
(b) L_{(k+1)U,kU}(Q) / ( L_{(k+1)U,−(k+1)U}(Q) L_{(2k+1)U,−(2k+1)U}(Q) ) = −L_{kU,−kU}(Q) / L_{(k+1)U,kU}(−Q) for k ∈ Z.
(c) L_{2U,U}(Q) / ( L_{2U,−2U}(Q) L_{3U,−3U}(Q) ) = −L_{U,−U}(Q) / L_{2U,U}(−Q).
60. Establish the correctness of Algorithm 4.4.
61. [Blake, Murty and Xu] Prove that the loop body of Algorithm 4.1 for the
computation of f_{n,P}(Q) can be replaced as follows:

If (n_i = 0), then update f = −f² × ( L_{2U,−2U}(Q) / L_{U,U}(−Q) ) and U = 2U,
else update f = −f² × ( L_{2U,P}(Q) / L_{U,U}(−Q) ) and U = 2U + P.
63. Define the functions fn,P,S as rational functions having the divisor
67. Prove that the following distortion maps are group homomorphisms. (In this
exercise, k is not used to denote the embedding degree.)
(a) For the curve Y² = X³ + a defined over F_p for an odd prime p ≡ 2 (mod 3)
and with a ≢ 0 (mod p), the map (h, k) ↦ (θh, k), where θ³ = 1.
(b) For the curve Y² = X³ + aX defined over F_p for a prime p ≡ 3 (mod 4)
and with a ≢ 0 (mod p), the map (h, k) ↦ (−h, θk), where θ² = −1.
(c) For the curve Y² + Y = X³ + X + a with a = 0 or 1, defined over F_{2^n}
with odd n, the map (h, k) ↦ (θh + ζ², k + θζh + ζ), where θ ∈ F_{2^2} satisfies
θ² + θ + 1 = 0, and ζ ∈ F_{2^4} satisfies ζ² + θζ + 1 = 0.
(d) For the curve Y² = X³ − X + a with a = ±1, defined over F_{3^n} with
n divisible by neither 2 nor 3, the map (h, k) ↦ (ζ − h, θk), where θ ∈ F_{3^2}
satisfies θ² = −1, and ζ ∈ F_{3^3} satisfies ζ³ − ζ − a = 0.
(e) Let p ≡ 5 (mod 6) be a prime, a ∈ F_{p^2} a square but not a cube, and
let γ ∈ F_{p^6} satisfy γ³ = a. For the curve Y² = X³ + a defined over F_{p^2}, the
distortion map is (h, k) ↦ (h^p/(γ a^{(p−2)/3}), k^p/a^{(p−1)/2}).
70. Edwards curves were proposed by Harold M. Edwards,52 and later modified to
suit elliptic-curve cryptography by Bernstein and Lange.53 For finite fields K
with char K 6= 2, an elliptic curve defined over K is equivalent to an Edwards
curve over a suitable extension of K (the extension may be K itself). A unified
addition formula (no distinction between addition and doubling, and a uniform
treatment of all group elements including the identity) makes Edwards curves
attractive and efficient alternatives to elliptic curves. An Edwards curve over
a non-binary finite field K is defined by the equation
Programming Exercises
71. Let p be a small prime. Write a GP/PARI program that, given an elliptic curve
E over Fp , finds all the points on Ep = EFp , calculates the size of Ep , computes
the order of each point in Ep , and determines the group structure of Ep .
72. Repeat Exercise 4.71 for elliptic curves over binary fields F2n for small n.
73. Write a GP/PARI program that, given a small prime p, an elliptic curve E
defined over Fp , and an n ∈ N, outputs the size of the group Epn = EFpn .
74. Write a GP/PARI function that, given an elliptic curve over a finite field (not
necessarily small), returns a random point on the curve.
75. Write a GP/PARI function that, given points U, V, Q on a curve, computes the
equation of the line passing through U and V , and returns the value of the
function at Q. Assume Q to be a finite point, but handle all cases for U, V .
76. Implement the reduced Tate pairing using the function of Exercise 4.75.
Consider a supersingular curve Y 2 = X 3 + aX over a prime field Fp with
p ≡ 3 (mod 4) and with m = (p + 1)/4 a prime.
77. Implement the distorted Tate pairing on the curve of Exercise 4.76.
Chapter 5
Primality Testing
An integer p > 1 is called prime if its only positive integral divisors are 1 and
p. Equivalently, p is prime if and only if p|ab implies p|a or p|b. An integer
n > 1 is called composite if it is not prime, that is, if n has an integral divisor
u with 1 < u < n. The integer 1 is treated as neither prime nor composite.
One can extend the notion of primality to the set of all integers. The addi-
tive identity 0 and the multiplicative units ±1 are neither prime nor composite.
A non-zero non-unit p ∈ Z is called prime if a factorization p = uv necessarily
implies that either u or v is a unit. Thus, we now have the negative primes
−2, −3, −5, . . . . In this book, an unqualified use of the term prime indicates
positive primes. The set of all (positive) primes is denoted by P.
P is an infinite set (Theorem 1.69). Given n ∈ N, there exists a prime (in
fact, infinitely many primes) larger than n. The asymptotic density of primes
(the prime number theorem) and related results are discussed in Section 1.9.
Considering the possibility of repeated prime factors, one can rewrite the
factorization of n as n = u q₁^{e₁} q₂^{e₂} · · · q_r^{e_r}, where q₁, q₂, . . . , q_r are pairwise dis-
tinct primes, and e_i ∈ N is the multiplicity of q_i in n, denoted e_i = v_{q_i}(n).²
Problem 5.2 [Fundamental problem of computational number theory ] Given
a non-zero (usually positive) integer n, compute the decomposition of n into
prime factors, that is, compute all the prime divisors p of n together with their
respective multiplicities vp (n). ⊳
Problem 5.2 is also referred to as the integer factorization problem or as
IFP in short. Solving it demands the ability to recognize primes as primes.
Problem 5.3 [Primality testing ] Given a positive integer n > 2, determine
whether n is prime or composite. ⊳
The primality testing problem has efficient probabilistic algorithms. The de-
terministic complexity of primality testing too is polynomial-time. On the
contrary, factoring integers appears to be a difficult and challenging computa-
tional problem. In this chapter, we discuss algorithms for testing the primality
of integers. Integer factorization is studied in Chapter 6.
case n = 1 is not too problematic, since 1 factors uniquely into the empty product of primes.
3 Vaughan Pratt, Every prime has a succinct certificate, SIAM Journal on Computing,
4, 214–220, 1975.
Primality Testing 267
Pass 1
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Pass 2
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Pass 3
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Pass 4
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
√
The passes stop here, since the first unmarked entry 11 is larger than 50.
All primes 6 50 are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47. ¤
in [n0 , nl−1 ]. After this position, every p-th integer in the interval is a multiple
of p. We set to zero the array entries at all these positions.
After all the small primes are considered, we look at those array indices i
that continue to hold the value 1. These correspond to all those integers ni
in [n0 , nl−1 ] that are divisible by neither of the small primes. We subject only
these integers ni to one or more primality test(s). This method is an example
of sieving. Many composite integers are sieved out (eliminated) much more
easily than running primality tests individually on all of them. For each small
prime p, the predominant cost is that of a division (computation of n0 rem p).
Each other multiple of p is located easily (by adding p to the previous multiple
of p). In practice, one may work with 10 to 1000 small primes.
If the length l of the sieving interval is carefully chosen, we expect to locate
a prime among the non-multiples of small primes. However, if we are unlucky
enough to encounter only composite numbers in the interval, we repeat the
process for another random value of n0 . It is necessary to repeat the process
also in the case that n0 is of bit length s, whereas a discovered prime ni = n0 +i
is of bit length larger than s (to be precise, s + 1 for all sufficiently large s). If
n0 is chosen as a random s-bit integer, and if s is not too small, the probability
of such an overflow is negligibly small.
gp > prime(1000)
%1 = 7919
gp > prime(100000)
*** not enough precalculated primes
gp > primes(16)
%2 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]
gp > isprime(2^100+277)
%3 = 1
gp > nextprime(2^100)
Primality Testing 271
%4 = 1267650600228229401496703205653
gp > nextprime(2^100)-2^100
%5 = 277
gp > precprime(2^200)
%6 = 1606938044258990275541962092341162602522202993782792835301301
gp > 2^200-precprime(2^200)
%7 = 75
gp > precprime(random(2^10))
%8 = 811
1105 = 5 × 13 × 17,
1729 = 7 × 13 × 19,
2465 = 5 × 17 × 29,
4 These are named after the American mathematician Robert Daniel Carmichael (1879–
1967). The German mathematician Alwin Reinhold Korselt (1864–1947) first introduced
the concept of Carmichael numbers. Carmichael was the first to discover (in 1910) concrete
examples (like 561).
Primality Testing 273
2821 = 7 × 13 × 31,
6601 = 7 × 23 × 41,
8911 = 7 × 19 × 67,
10585 = 5 × 29 × 73,
15841 = 7 × 31 × 73,
29341 = 13 × 37 × 61,
41041 = 7 × 11 × 13 × 41,
46657 = 13 × 37 × 97,
52633 = 7 × 73 × 103,
62745 = 3 × 5 × 47 × 89,
63973 = 7 × 13 × 19 × 37, and
75361 = 11 × 13 × 17 × 31.
The smallest Carmichael numbers with five and six prime factors are 825265 =
5 × 7 × 17 × 19 × 73 and 321197185 = 5 × 19 × 23 × 29 × 37 × 137. ¤
One can show that a Carmichael number must be odd with at least three
distinct prime factors. Alford et al.5 prove that there exist infinitely many
Carmichael numbers. That is bad news for the Fermat test. We need to look
at its modifications so as to avoid the danger posed by Carmichael numbers.
Here is how the Fermat test can be implemented in GP/PARI. In the function
Fpsp(n,t), the first parameter n is the integer whose primality is to be checked,
and t is the count of random bases to try.
gp > Fpsp(n,t) = \
for (i=1, t, \
a = Mod(random(n),n); b = a^(n-1); \
if (b != Mod(1,n), return(0)) \
); \
return(1);
gp > Fpsp(1001,20)
%1 = 0
gp > Fpsp(1009,20)
%2 = 1
gp > p1 = 601; p2 = 1201; p3 = 1801; n = p1 * p2 * p3
%3 = 1299963601
gp > Fpsp(n,20)
%4 = 1
∗ (n−1)/2
¡ a ¢For an odd composite integer n, a base a ∈ Zn satisfying a 6≡
n (mod n) is a witness to the compositeness of n. If no such witness is found
in t iterations, the number n is declared as composite. Of course, primes do
not possess such witnesses. However, a good property of the Solovay–Strassen
test is that an odd composite n (even a Carmichael number) has at least
φ(n)/2 witnesses to the compositeness of n. Thus, the probability of erro-
neously declaring a composite n as prime is no more that 1/2t .
7 Miller proposed a deterministic primality test which is polynomial-time under the as-
sumption that the ERH is true: Gary L. Miller, Riemann’s hypothesis and tests for primality,
Journal of Computer and System Sciences, 13(3), 300–317, 1976.
8 Rabin proposed the randomized version: Michael O. Rabin, Probabilistic algorithm for
Example 5.15 (1) 891 is a strong pseudoprime only to the ten bases 1, 82,
161, 163, 404, 487, 728, 730, 809, 890. These happen to be precisely the bases
to which n is an Euler pseudoprime (Example 5.13).
(2) 2891 is a strong pseudoprime only to the bases 1 and 2890.
(3) 1891 is a strong pseudoprime to 450 bases in Z∗n . These are again all
the bases to which n is an Euler pseudoprime.
(4) Let nS (n) denote the count of bases in Z∗n to which n is a strong
pseudoprime (also see Example 5.13). We always have nS (n) 6 nE (n). So far
in this example, we had nS (n) = nE (n) only. The strict inequality occurs,
for example, for the Carmichael number n = 561. In that case, nF (n) = 320,
nE (n) = 80, and nS (n) = 10. The following table summarizes these counts
(along with φ(n)) for some small Carmichael numbers.
n φ(n) nF (n) nE (n) nS (n)
561 = 3 × 11 × 17 320 320 80 10
1105 = 5 × 13 × 17 768 768 192 30
1729 = 7 × 13 × 19 1296 1296 648 162
2465 = 5 × 17 × 29 1792 1792 896 70
2821 = 7 × 13 × 31 2160 2160 540 270
6601 = 7 × 23 × 41 5280 5280 1320 330
8911 = 7 × 19 × 67 7128 7128 1782 1782
41041 = 7 × 11 × 13 × 41 28800 28800 14400 450
825265 = 5 × 7 × 17 × 19 × 73 497664 497664 124416 486
The table indicates that not only is the Miller–Rabin test immune against
Carmichael numbers, but also strong witnesses are often more numerous than
Primality Testing 277
F0 = 0,
F1 = 1,
Fm = Fm−1 + Fm−2 for m > 2.
9 The Fibonacci and the Lucas tests are introduced by: Robert Baillie and Samuel S.
Example 5.18 (1) Lehmer proved that there are infinitely many Fibonacci
pseudoprimes. Indeed, the Fibonacci number F2p for every prime p > 5 is a
Fibonacci pseudoprime.
(2) There are only nine composite Fibonacci pseudoprimes 6 10, 000.
These are 323 = 17 × 19, 377 = 13 × 29, 1891 = 31 × 61, 3827 = 43 × 89,
4181 = 37 × 113, 5777 = 53 × 109, 6601 = 7 × 23 × 41, 6721 = 11 × 13 × 47,
and 8149 = 29 × 281. There are only fifty composite Fibonacci pseudoprimes
6 100, 000. The smallest composite Fibonacci pseudoprimes with four and five
prime factors are 199801 = 7×17×23×73 and 3348961 = 7×11×23×31×61.
(3) There is no known composite integer n ≡ ±2 (mod 5) which is simulta-
neously a Fibonacci pseudoprime and a Fermat pseudoprime to base 2. It is an
open question to find such a composite integer or to prove that no such com-
posite integer exists. There, however, exist composite integers n ≡ ±1 (mod 5)
which are simultaneously Fibonacci pseudoprimes and Fermat pseudoprimes
to base 2. Two examples are 6601 = 7 × 23 × 41 and 30889 = 17 × 23 × 79. ¤
F2k = Fk (2Fk+1 − Fk ),
2
F2k+1 = Fk+1 + Fk2 , (5.2)
F2k+2 = Fk+1 (Fk+1 + 2Fk ).
Since F323−( 5
323 ) ≡ 0 (mod 323), 323 is declared as prime. ¤
gp > FibMod(m,n) = \
local(i,s,t,F,Fnext); \
s = ceil(log(m)/log(2)); \
F = Mod(0,n); Fnext = Mod(1,n); \
i = s - 1; \
while (i>=0, \
if (bittest(m,i) == 0, \
t = F * (2 * Fnext - F); \
Fnext = Fnext^2 + F^2; \
F = t \
, \
t = Fnext^2 + F^2; \
Fnext = Fnext * (Fnext + 2 * F); \
F = t \
); \
i--; \
); \
return(F);
gp > FibMod(324,323)
%1 = Mod(0, 323)
gp > F324 = fibonacci(324)
%2 = 23041483585524168262220906489642018075101617466780496790573690289968
gp > F324 % 323
%3 = 0
We can invoke this test with several parameters (a, b). If any of these
invocations indicates that n is composite, then n is certainly composite. On
the other hand, if all of these invocations certify n as prime, we accept n as
prime. By increasing the number of trials (different parameters a, b), we can
reduce the probability that a composite integer is certified as a prime.
We should now supply an algorithm for an efficient computation of the
value Un−( ∆ ) modulo n. To that effect, we introduce a related sequence Vm =
n
Vm (a, b) as follows.
V0 = 2,
V1 = a,
Vm = aVm−1 − bVm−2 for m > 2. (5.5)
2
As above, let α, β be the roots of the characteristic polynomial x − ax + b.
An explicit formula for the sequence Vm is as follows.
Vm = Vm (a, b) = αm + β m for all m > 0. (5.6)
The sequence Um can be computed from Vm , Vm+1 by the simple formula:
Um = ∆−1 (2Vm+1 − aVm ) for all m > 0.
Therefore, it suffices to compute Vn−( ∆ ) and Vn−( ∆ )+1 for the Lucas test.
n n
This computation can be efficiently done using the doubling formulas:
V2k = Vk2 − 2bk ,
V2k+1 = Vk Vk+1 − abk , (5.7)
2
V2k+2 = Vk+1 − 2bk+1 .
Designing the analog of Algorithm 5.5 for Lucas tests is posed as Exercise 5.17.
282 Computational Number Theory
"à √ !m à √ !m #
1 3+ 5 3− 5
Um = Um (3, 1) = √ − for all m > 0.
5 2 2
A stronger Lucas test can be developed like the Miller–Rabin test. Let p be
an odd prime. Consider the Lucas sequence Um with parameters a, b. Assume
that´ gcd(p, 2b∆) = 1, that³is,´α, β are distinct in Fp³ or´Fp2 . This implies that
³
∆ ∆ ∆
p = ±1, that is, p − p is even. Write p − p = 2s t with s, t ∈ N
and with t odd. The condition Uk ≡ 0 (mod p) implies (α/β)k ≡ 1 (mod p).
Since the only square roots of 1 modulo p are ±1, we have either (α/β)t ≡
j
1 (mod p) or (α/β)2 t ≡ −1 (mod p) for some j ∈ {0, 1, . . . , s − 1}. The
condition (α/β)t ≡ 1 (mod p) implies Ut ≡ 0 (mod p), whereas the condition
j
(α/β)2 t ≡ −1 (mod p) implies V2j t ≡ 0 (mod p).
Example 5.24 (1) Example 5.22 shows that¡ 21¢ is a composite Lucas pseudo-
prime with parameters 3, 1. In this case, n − ∆ 2
n = 20 = 2 × 5, that is, s = 2
−1 −1
and t = 5. We have U5 ≡ ∆ (2V6 − aV5 ) ≡ 5 (14 − 54) ≡ 13 6≡ 0 (mod n).
Moreover, V5 ≡ 18 6≡ 0 (mod 21), and V10 ≡ 7 6≡ 0 (mod 21). That is, 21 is
not a strong Lucas pseudoprime with parameters 3, 1.
There are exactly 21 composite Lucas pseudoprimes 6 10, 000 with pa-
rameter 3, 1. These are 21, 323, 329, 377, 451, 861, 1081, 1819, 1891, 2033,
2211, 3653, 3827, 4089, 4181, 5671, 5777, 6601, 6721, 8149 and 8557. Only five
(323, 377, 1891, 4181 and 5777) of these are strong Lucas pseudoprimes.
(2) There is no composite integer 6 107 , which is a strong Lucas pseudo-
prime with respect to both the parameters (3, 1) and (4, 1). ¤
Algorithm 5.7 presents the strong Lucas primality test. We assume a gen-
eral parameter (a, b). Evidently, the algorithm becomes somewhat neater and
more efficient if we restrict only to parameters of the form (a, 1).
Arnault11 proves an upper bound on the number of pairs (a, b) to which
a composite number n is a strong Lucas pseudoprime with parameters (a, b).
More precisely, Arnault takes a discriminant ∆ and an odd composite integer
n coprime to ∆ (but not equal to 9). By SL(∆, n), he denotes the number
11 François Arnault, The Rabin-Monier theorem for Lucas pseudoprimes, Mathematics of
891, 2001.
16 Richard Kenneth Guy, Unsolved problems in number theory (3rd ed), Springer, 2004.
Primality Testing 285
316–329, 1986.
19 A. O. L. Atkin and François Morain, Elliptic curves and primality proving, Mathematics
ematics, 160(2), 781–793, 2004. This article can also be downloaded from the Internet
site: http://www.cse.iitk.ac.in/users/manindra/algebra/primality.pdf. Their first article on this
topic is available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_original.pdf.
21 Hendrik W. Lenstra, Jr. and Carl Pomerance, Primality testing with Gaussian periods,
Raphson (1648–1715).
286 Computational Number Theory
0−f (ai )
(ai , f (ai )). This line meets the x-axis at x = ai+1 . Therefore, f ′ (ai ) =
ai+1 −ai ,
(k−1)aki +n
that is, ai+1 = ai − ff′(ai)
(ai ) . Simple calculations show that ai+1 = kak−1
.
j i k
k
(k−1)ai +n
In order to avoid floating-point calculations, we update ai+1 = kak−1
.
i
(k−1)ak
i +n
αi+1 = kak−1 , and ai+1 = ⌊αi+1 ⌋. If ai > ξ, the convexity of f implies
i
that ξ < αi+1 < ai . Taking floor gives ⌊ξ⌋ 6 ai+1 < ai . This means that
the integer approximation stored in a decreases strictly so long as a > ⌊ξ⌋,
and eventually obtains the value aj = ⌊ξ⌋ for some j. I show that in the next
iteration, the loop is broken after the computation of b = aj+1 from a = aj .
If n is actually a k-th power, ⌊ξ⌋ = ξ. In that case, the next integer
approximation aj+1 also equals ⌊ξ⌋ = ξ. Thus, aj+1 = aj , and the loop is
broken. On the other hand, if n is not a perfect k-th power, then ⌊ξ⌋ < ξ, that
is, the current approximation aj < ξ. Since, in this case, we have f (aj ) < 0
and f ′ (aj ) > 0, the next real approximation αj+1 is larger than aj , and so its
floor aj+1 is > aj . Thus, the condition b > a is again satisfied.
It is well known from the results of numerical analysis that the Newton–
Raphson method converges quadratically. That is, the Newton–Raphson loop
is executed at most O(log n) times. The exponentiation t = ak−1 in each
iteration can be computed in O(log2 n log k) time. The rest of an iteration
runs in O(log2 n) time. Therefore, Algorithm 5.8 runs in O(log3 n log k) time.
We now vary k. The maximum possible exponent k for which n can be
a perfect k-th power corresponds to the case n = 2k , that is, to k = lg n =
log n
log 2 . Therefore,
j
it suffices to check whether n is a perfect k-th power for
k
log n
k = 2, 3, . . . , log 2 . In particular, we always have k = O(log n), and checking
whether n is a perfect power finishes in O(log4 n log log n) or O˜(log4 n) time.24
23 A real-valued function f is called convex in the real interval [a, b] if f ((1 − t)a + tb) 6
A straightforward use of Theorem 5.26 or Corollary 5.27 calls for the com-
putation of n − 1 binomial coefficients modulo n, leading to an exponential
algorithm for primality testing. This problem can be avoided by taking a poly-
nomial h(x) of small degree and by computing (x + a)n and xn + a modulo n
and h(x), that is, we now use the arithmetic of the ring Zn [x]/hh(x)i. Let r
denote the degree of h(x) modulo n. All intermediate products are maintained
as polynomials of degrees < r. Consequently, an exponentiation of the form
(x + a)n or xn can be computed in O(r2 log3 n) time. If r = O(logk n) for some
constant k, this leads to a polynomial-time test for the primality of n.
Composite integers n too may satisfy (x + a)n ≡ xn + a (mod n, h(x)),
and the AKS test appears to smell like another rotten probabilistic primality
test. However, there is a neat way to derandomize this algorithm. In view of
the results in Section 5.3.1, we assume that n is not a perfect power.
The AKS algorithm proceeds in two stages. In the first stage, we take
h(x) = xr − 1 for some small integer r. We call r suitable in this context if
ordr (n) > lg2 n.§ A simple
¨ argument establishes that for all n > 5,690,034,
a suitable r 6 lg5 n exists. An efficient computation of ordr (n) requires
the factorization of r and φ(r). However, since r is O(lg5 n), it is fine to use
an exponential (in lg r) algorithm for obtaining these factorizations. Another
alternative is to compute n, n2 , n3 , . . . modulo r until ordr (n) is revealed.
288 Computational Number Theory
In the second stage, one works with the smallest suitable r available from
n n r
the first stage. jpCheckingkwhether (x + a) ≡ x + a (mod n, x − 1) for all
a = 1, 2, . . . , φ(r) lg n allows one to deterministically conclude about the
primality of n. A proof of the fact that only these values of a suffice is omitted
here. The AKS test given as Algorithm 5.9 assumes that n > 5,690,034.
Example 5.28 (1) Take n = 8,079,493. The search for §a suitable ¨ r is shown
below. Since lg2 n = 526.511 . . . , this search starts from lg2 n + 1 = 528.
ord528 (n) = 20, ord529 (n) = 506, ord530 (n) = 52, ord531 (n) = 174,
ord532 (n) = 9, ord533 (n) = 60, ord534 (n) = 22, ord535 (n) = 212,
ord536 (n) = 6, ord537 (n) = 89, ord538 (n) = 67, ord539 (n) = 15,
ord540 (n) = 36, ord541 (n) = 540.
Therefore, r = 541 (which is a prime), and φ(r) = 540. One then verifies
that gcd(2, n) = gcd(3, n) = · · · = gcd(541, n) = 1,j that is, n khas no small
p
prime factors. One then computes the bound B = φ(r) lg n = 533, and
checks that the congruence (x + a)n ≡ xn + a (mod n, xr − 1) holds for all
a = 1, 2, . . . , B. For example, (x + 1)n ≡ x199 + 1 (mod n, x541 − 1), and
xn + 1 ≡ xn rem 541 + 1 ≡ x199 + 1 (mod n, x541 − 1). So 8,079,493 is prime.
(2) For n = 19,942,739, we have lg2 n = 588.031 . . . . We calculate
ord590 (n) = 58, ord591 (n) = 196, ord592 (n) = 36, and ord593 (n) = 592. So,
r = 593 is suitable, and n has no factors 6 593. The bound for the second stage
is now B = 590. However, for a = 1, one obtains (x + 1)n ≡ 9029368x592 +
919485x591 + 10987436x590 + · · · + 9357097x + 17978236 (mod n, x593 − 1),
whereas xn + 1 ≡ xn rem 593 + 1 ≡ x149 + 1 (mod n, x593 − 1). We conclude that
19,942,739 is not prime. Indeed, 19,942,739 = 2,683 × 7,433. ¤
Primality Testing 289
One can easily work out that under schoolbook arithmetic, the AKS al-
gorithm runs in O(lg16.5 n) time. If one uses fast arithmetic (based on FFT),
this running time drops to O˜(lg10.5 n). This exponent is quite high compared
to the Miller–Rabin exponent (three). That is, one does not plan to use the
AKS test frequently in practical applications. Lenstra and Pomerance’s im-
provement of the AKS test runs in O˜(log6 n) time.
Theorem 5.29 [Pépin’s test]25 The Fermat number fm for m > 1 is prime
if and only if 3(fm −1)/2 ≡ −1 (mod fm ).
Proof [if] The condition 3(fm −1)/2 ≡ −1 (mod fm ) implies 3fm −1 ≡
m
1 (mod fm ), that is, ordfm (3)|fm − 1 = 22 , that is, ordfm (3) = 2h for
some h in the range 1 6 h 6 2m . However, if h < 2m , we cannot have
m
3(fm −1)/2 ≡ −1 (mod fm ). So ordfm (3) = 22 = f³m −´1, that is, fm is prime.
[only if] If fm is prime, we have 3(fm −1)/2 ≡ f3m (mod fm ) by Euler’s
³ ´ ³ ´
criterion. By the quadratic reciprocity law, f3m = (−1)(fm −1)(3−1)/4 f3m =
2m −1
³ ´ ³ ´ ³ 2m ´ ³ m ´ ¡ ¢
fm fm 2 +1 (−1)2 +1
(−1)2 3 = 3 = 3 = 3 = 32 = −1. ⊳
award of US$100,000 from the Electronic Frontier Foundation for the first
discoverer of a prime with ten million (or more) digits. This prime happens to
be the 47-th28 Mersenne prime 243,112,609 −1, a prime with 12,978,189 decimal
digits, discovered on August 23, 2008 in the Department of Mathematics,
UCLA. Running a deterministic primality test on such huge numbers is out
of question. Probabilistic tests, on the other hand, do not furnish iron-clad
proofs for primality and are infeasible too for these numbers. A special test
known as the Lucas–Lehmer test 29 is used for deterministically checking the
primality of Mersenne numbers.
A positive integer n is called a perfect number if it equals the sum of
its proper positive integral divisors. For example, 6 = 1 + 2 + 3 and 28 =
1 + 2 + 4 + 7 + 14 are perfect numbers. It is known that n is an even perfect
number if and only if it is of the form 2p−1 (2p − 1) with Mp = 2p − 1 being
a (Mersenne) prime. Thus, Mersenne primes have one-to-one correspondence
with even perfect numbers. We do not know any odd perfect number. We do
not even know whether an odd perfect number exists.
Theorem 5.31 [Lucas–Lehmer test] The sequence si , i > 0, is defined as:
s0 = 4,
si = s2i−1 − 2 for i > 1.
For p ∈ P, Mp is prime if and only if sp−2 ≡ 0 (mod Mp ). ⊳
I am not going to prove this theorem here. The theorem implies that we
need to compute the Lucas–Lehmer residue sp−2 (mod Mp ). The obvious
iterative algorithm of computing si from si−1 involves a square operation
followed by reduction modulo Mp . Since 2p ≡ 1 (mod Mp ), we write s2i−1 −2 =
2p n1 + n0 , and obtain s2i−1 − 2 ≡ n1 + n0 (mod Mp ). One can extract n1 , n0
by bit operations. Thus, reduction modulo Mp can be implemented efficiently.
Example 5.32 (1) We prove that M7 = 27 − 1 = 127 is prime. The calcula-
tions are shown below.
i si (mod M7 )
0 4
1 42 − 2 ≡ 14
2 142 − 2 ≡ 194 ≡ 1 × 27 + 66 ≡ 1 + 66 ≡ 67
3 672 − 2 ≡ 4487 ≡ 35 × 27 + 7 ≡ 35 + 7 ≡ 42
4 422 − 2 ≡ 1762 ≡ 13 × 27 + 98 ≡ 13 + 98 ≡ 111
5 1112 − 2 ≡ 12319 ≡ 96 × 27 + 31 ≡ 96 + 31 ≡ 127 ≡ 0
Since s7−2 ≡ 0 (mod M7 ), M7 is prime.
28 This is the 45th Mersenne prime to be discovered. Two smaller Mersenne primes were
discovered later. It is not yet settled whether there are more undiscovered Mersenne primes
smaller than M43,112,609 .
29 The French mathematician François Édouard Anatole Lucas (1842–1891) introduced
this test in 1856. It was later improved in 1930s by the American mathematician Derrick
Henry Lehmer (1905–1991).
292 Computational Number Theory
i si (mod M11 )
0 4
1 42 − 2 ≡ 14
2 142 − 2 ≡ 194
3 1942 − 2 ≡ 37634 ≡ 18 × 211 + 770 ≡ 18 + 770 ≡ 788
4 7882 − 2 ≡ 620942 ≡ 303 × 211 + 398 ≡ 303 + 398 ≡ 701
5 7012 − 2 ≡ 491399 ≡ 239 × 211 + 1927 ≡ 239 + 1927 ≡ 2166
≡ 1 × 2048 + 118 ≡ 1 + 118 ≡ 119
6 1192 − 2 ≡ 14159 ≡ 6 × 211 + 1871 ≡ 6 + 1871 ≡ 1877
7 18772 − 2 ≡ 3523127 ≡ 1720 × 211 + 567 ≡ 1720 + 567 ≡ 2287
≡ 1 × 211 + 239 ≡ 1 + 239 ≡ 240
8 240 − 2 ≡ 57598 ≡ 28 × 211 + 254 ≡ 28 + 254 ≡ 282
2
Exercises
1. For a positive integer n, the sum of the reciprocals of all primes 6 n asymptot-
ically approaches ln ln n. Using this fact, derive that the sieve of Eratosthenes
can be implemented to run in O(n ln ln n) time.
2. Modify the sieve of Eratosthenes so that it runs in O(n) time. (Hint: Mark
each composite integer only once.)
3. If both p and 2p + 1 are prime, we call p a Sophie Germain prime30 , and 2p + 1
a safe prime. It is conjectured that there are infinitely many Sophie Germain
primes. In this exercise, you are asked to extend the sieve of Section 5.1.4 for
locating the smallest Sophie Germain prime p > n for a given positive integer
n ≫ 1. Sieve over the interval [n, n + M ].
(a) Determine a value of M such that there is (at least) one Sophie Germain
prime of the form n + i, 0 6 i 6 M , with high probability. The value of M
should not be unreasonably large.
(b) Describe a sieve to throw away the values of n + i for which either n + i
or 2(n + i) + 1 has a prime divisor less than or equal to the t-th prime. Take
t as a constant (like 100).
(c) Describe the gain in the running time that you achieve using the sieve.
4. Let s and t be bit lengths with s > t.
(a) Describe an efficient algorithm to locate a random s-bit prime p such that
a random prime of bit length t divides p − 1.
(b) Express the expected running time of your algorithm in terms of s, t.
(c) How can you adapt the sieve of Section 5.1.4 in this computation?
5. Let p, q be primes, n = pq, a ∈ Z∗n , and d = gcd(p − 1, q − 1).
(a) Prove that n is a pseudoprime to base a if and only if ad ≡ 1 (mod n).
(b) Prove that n is pseudoprime to exactly d2 bases in Z∗n .
(c) Let q = 2p − 1. To how many bases in Z∗n is n a pseudoprime?
(d) Repeat Part (c) for the case q = 2p + 1.
6. Let n ∈ N be odd and composite. If n is not a pseudoprime to some base in
Z∗n , prove that n is not a pseudoprime to at least half of the bases in Z∗n .
7. Prove the following properties of any Carmichael number n.
(a) (p − 1)|(n − 1) for every prime divisor p of n.
(b) n is odd.
(c) n is square-free.
(d) n has at least three distinct prime factors.
8. Suppose that 6k + 1, 12k + 1 and 18k + 1 are all prime for some k ∈ N. Prove
that (6k + 1)(12k + 1)(18k + 1) is a Carmichael number. Find two Carmichael
numbers of this form.
30 This is named after the French mathematician Marie-Sophie Germain (1776–1831). The
name safe prime is attributed to the use of these primes in many cryptographic protocols.
294 Computational Number Theory
9. Prove that for every odd prime r, there exist only finitely many Carmichael
numbers of the form rpq (with p, q primes).
10. Prove that:
(a) Every Euler pseudoprime to base a is also a pseudoprime to base a.
(b) Every strong pseudoprime to base a is also a pseudoprime to base a.
11. Let n be an odd composite integer. Prove that:
(a) There is at least one base a ∈ Z∗n , to which n is not an Euler pseudoprime.
(b) n is not an Euler pseudoprime to at least half of the bases in Z∗n .
12. Prove that if n > 3 is a pseudoprime to base 2, then 2n − 1 is an Euler
pseudoprime to base 2 and also a strong pseudoprime to base 2.
13. Let p and q = 2p − 1 be primes, and n = pq. Prove that:
(a) n is an Euler pseudoprime to exactly one-fourth of the bases in Z∗n .
(b) If p ≡ 3 (mod 4), then n is a strong pseudoprime to exactly one-fourth of
the bases in Z∗n .
14. Deduce the formulas (5.1), (5.4) and (5.6).
15. Prove that for all integers m > 1 and n > 0, the Fibonacci numbers satisfy
Fm+n = Fm Fn+1 + Fm−1 Fn .
Deduce the identities (5.2).
16. Prove the doubling formulas (5.7) for Vm defined in Section 5.2.5.
17. Write an analog of Algorithm 5.5 for the computation of Vm (mod n).
18. [Extra strong Lucas pseudoprime] Let Um = Um (a, 1) be the Lucas sequence
with parameters a, 1, and Vm = Vm (a, 1) the corresponding V sequence. Take
an odd
¡ ∆ ¢ positive integer n with gcd(n, 2∆a) = 1, where ∆ = a2 − 4. We write
s
n− n = 2 t with t odd. We call n an extra strong Lucas pseudoprime to base
a if either (i) Ut ≡ 0 (mod n) and Vt ≡ ±2 (mod n), or (ii) V2j t ≡ 0 (mod n)
for some j ∈ {0, 1, 2, . . . , s − 1}. Prove that:
(a) If n ∈ P does not divide 2∆, then n is an extra strong Lucas pseudoprime.
(b) An extra strong Lucas pseudoprime is also a strong Lucas pseudoprime.
19. The Lehmer sequence Ūm with parameters a, b is defined as:
Ū0 = 0,
Ū1 = 1,
Ūm = Ūm−1 − b Ūm−2 if m > 2 is even,
Ūm = a Ūm−1 − b Ūm−2 if m > 3 is odd.
2
√
½ x m− amx + b.2
Let α, β be the roots of
(α − β )/(α − β 2 ) if m is even,
(a) Prove that Ūm =
(αm − β m )/(α − β) if m is odd.
(b) Let ∆ = a − 4b, and n a positive integer with gcd(n, 2a∆) = 1. We call n
is Lehmer pseudoprime with parameters a, b if Ūn−( a∆ ) ≡ 0 (mod n). Prove
n
that n is a Lehmer pseudoprime with parameters a, b if and only if n is a
Lucas pseudoprime with parameters a, ab.
Primality Testing 295
P (0) = 3,
P (1) = 0,
P (2) = 2,
P (n) = P (n − 2) + P (n − 3) for n > 3.
self-taught mathematician.
33 This test was proposed by the English mathematician Henry Cabourn Pocklington
Programming Exercises
Write GP/PARI functions to implement the following.
27. Obtaining a random prime of a given bit length l.
28. The Solovay–Strassen test.
29. The Miller–Rabin test.
30. The Fibonacci test (you may use the function FibMod() of Section 5.2.4).
31. The Lucas test.
32. The strong Lucas test.
33. The AKS test.
34. The Pépin test.
35. The Lucas–Lehmer test.
Chapter 6
Integer Factorization
Now that we are able to quickly recognize primes as primes, it remains to com-
pute the prime factorization of (positive) integers. This is the tougher part
of the story. Research efforts for decades have miserably failed to produce
efficient algorithms for factoring integers. Even randomization does not seem
to help here. Today’s best integer-factoring algorithms run in subexponen-
tial time which, although better than exponential time, makes the factoring
problem practically intractable for input integers of size only thousand bits.
This chapter is an introduction to some integer-factoring algorithms. We
start with a few fully exponential algorithms. These old algorithms run effi-
ciently in certain specific situations, so we need to study them.
Some subexponential algorithms are discussed next. Assume that n is the
(positive) integer to be factored. A subexponential expression in log n is, in
this context, an expression of the form
£ ¤
L(n, ω, c) = exp (c + o(1))(ln n)ω (ln ln n)1−ω ,
where ω is a real number in the open interval (0, 1), and c is a positive real
number. Plugging in ω = 0 in L(n, ω, c) gives a polynomial expression in ln n.
On the other hand, for ω = 1, the expression L(n, ω, c) is fully exponential in
ln n. For 0 < ω < 1, the expression L(n, ω, c) is something between polynomial
and exponential, and is called a subexponential expression in ln n.
297
298 Computational Number Theory
gp > factor(2^2^5+1)
%1 =
[641 1]
[6700417 1]
gp > factorint(2^2^5+1)
%2 =
[641 1]
[6700417 1]
gp > #
timer = 1 (on)
Integer Factorization 299
gp > factorint(2^101-1)
time = 68 ms.
%3 =
[7432339208719 1]
[341117531003194129 1]
gp > factorint(2^201-1)
time = 1,300 ms.
%4 =
[7 1]
[1609 1]
[22111 1]
[193707721 1]
[761838257287 1]
[87449423397425857942678833145441 1]
gp > factorint(2^201-1,1)
time = 1,540 ms.
%5 =
[7 1]
[1609 1]
[22111 1]
[193707721 1]
[761838257287 1]
[87449423397425857942678833145441 1]
gp > factorint(2^301-1)
*** Warning: MPQS: the factorization of this number will take several hours.
*** user interrupt after 1mn, 34,446 ms.
gp >
of d divide n and are smaller than d. Therefore, before d is tried, all prime
divisors of d are already factored out from n. √
However, one then requires a list of primes 6 n . It is often not feasible
to have such a list. On the other hand, checking every potential divisor d for
primality before making a trial division of n by d is a massive investment of
time. A practical trade-off can be obtained using the following idea.1
After 2 is tried as a potential divisor, there is no point dividing n by
even integers. This curtails the space for potential divisors by a factor of 2.
Analogously, we should not carry out trial division by multiples of 3 (other
than 3 itself) and by multiples of 5 (other than 5 itself). What saving does it
produce? Consider d > 2 × 3 × 5 = 30 with r = d rem 30. If r is not coprime to
30, then d is clearly composite. Moreover, φ(30) = (2−1)×(3−1) × (5−1) = 8,
that is, only 8 (out of 30) values of r may be prime. Thus, trial division may be
skipped by d > 30 unless r = d rem 30 is among 1, 7, 11, 13, 17, 19, 23, 29. This
reduces the search space for potential divisors to about one-fourth. One may,
if one chooses, use other small primes like 7, 11, 13, . . . in this context. But
considering four or more small primes leads to additional bookkeeping, and
produces improvements that are not too dramatic. It appears that considering
only the first three primes 2, 3, 5 is a practically optimal choice.
Example 6.1 Let us factor
n = 361 + 1 = 127173474825648610542883299604
by trial division. We first divide n by 2. Since n is even, 2 is indeed a factor
of n. Replacing n by n/2 gives
63586737412824305271441649802
which is again even. So we make another trial division by 2, and reduce n to
31793368706412152635720824901.
A primality test reveals that this reduced n is composite. So we divide this n
by the remaining primes < 30, that is, by 3, 5, 7, 11, 13, 17, 19, 23, 29. It turns
out that n is divisible by neither of these primes.
As potential divisors d > 30 of n, we consider only the values 30k+r for r =
1, 7, 11, 13, 17, 19, 23, 29. That is, we divide n by 31, 37, 41, 43, 47, 49, 53, 59, 61,
67, 71, 73, 77, 79, 83, 89, 91, 97, . . . . Some of these divisors are not prime (like
49, 77, 91), but that does not matter. Eventually, we detect a divisor 367 =
12 × 30 + 7 of n. Clearly, 367 has to be prime. Reducing n by n/361 gives
86630432442539925437931403.
A primality test shows that this reduced n is prime. So there is no need to
carry out trial division further, that is, we have the complete factorization
n = 361 + 1 = 22 × 367 × 86630432442539925437931403.
1 The trial-division algorithm in this form is presented in: Henri Cohen, A course in
computational algebraic number theory, Graduate Text in Mathematics, 138, Springer, 1993.
Integer Factorization 301
This example illustrates that the method of trial division factors n efficiently
if all (except at most one) prime divisors of n are small. ¤
A complete
√ factorization of n by trial division calls for a worst-case running
time of O˜( n). This bound is achievable, for example, for RSA moduli of the
form n = pq with bit sizes of the primes p, q being nearly half of that of n. So
trial division is impractical except only for small values of n (like n 6 1020 ).
For factoring larger integers, more sophisticated ideas are needed.
Before employing these sophisticated algorithms, it is worthwhile to divide
n by a set of small primes. That reveals the small factors of n, and may reduce
its size considerably so as to make the sophisticated algorithms run somewhat
faster. In view of this, it will often be assumed that the number to be factored
does not contain small prime divisors.
Example 6.2 Let us use trial division to extract the small prime factors of
n = 360 + 1 = 42391158275216203514294433202.
Considering all potential divisors d 6 104 decomposes n as
n = 360 + 1 = 2 × 41 × 241 × 6481 × 330980468807135443441.
Primality tests indicate that the last factor is composite. We use sophisticated
algorithms for factoring this part of n. ¤
5
Example 6.4 Let us try to factor the Fermat number n = f5 = 22 + 1 =
4294967297 by Pollard’s rho method. We take the sequence-generating func-
tion f (x) = x2 + 1 (mod n). The computations done by Algorithm 6.1 are
illustrated in the following table. We start with the initial term x0 = 123.
The non-trivial factor 641 of n is discovered by Pollard’s rho method. The
corresponding cofactor is 6700417. That is, f5 = 641 × 6700417. Both these
factors are prime, that is, we have completely factored f5 . ¤
2 John M. Pollard, A Monte Carlo method for factorization, BIT Numerical Mathematics,
Floyd’s paper (1967) presents an algorithn for finding cycles in graphs. Pollard uses this
algorithm in his factoring paper (1975). This explains the apparent anachronism.
Integer Factorization 303
1980.
Integer Factorization 305
This gives the factorization n = 17907121 × 89475143 with both the factors
prime. Although the bound B is 32, we do not need to raise a to the powers
pei i for pi = 23, 29, 31. This is how computing the gcd inside the loop helps.
In order to see why Pollard’s p − 1 method works in this example, we note
the factorization of p − 1 and q − 1, where p = 17907121 and q = 89475143.
p − 1 = 24 × 32 × 5 × 7 × 11 × 17 × 19,
q − 1 = 2 × 19 × 2354609.
We fail to separate p from q in this case. So we try other bases. The base 23
works as given in the next table. The bound remains B = 24 as before.
ei
p
i pi ei aold anew ≡ aold
i
(mod n) gcd(anew − 1, n)
1 2 4 23 487183864388533 1
2 3 2 487183864388533 422240241462789 1
3 5 1 422240241462789 64491241974109 1
4 7 1 64491241974109 88891658296507 1
5 11 1 88891658296507 143147690932110 1
6 13 1 143147690932110 244789218562995 1
7 17 1 244789218562995 334411207888980 1
8 19 1 334411207888980 381444508879276 17907121
p − 1 = 24 × 32 × 5 × 7 × 11 × 17 × 19,
q − 1 = 25 × 32 × 5 × 7 × 11 × 13 × 19.
Stage 2
ei
p
i pi ei aold anew ≡ aold
i
(mod n) gcd(anew − 1, n)
9 23 1 358991192319517 589515560613570 1
10 29 1 589515560613570 111846253267074 1
11 31 1 111846253267074 593264734044925 1
12 37 1 593264734044925 168270169378399 1
13 41 1 168270169378399 285271807182347 1
14 43 1 285271807182347 538018099945609 15495481
The obvious question now is whether this method always works. Any odd
¡ n+1 ¢2 ¡ n−1 ¢2
integer n can be expressed as n = ¡ n+1 2 n−1−¢ ¡ 2 . However,
¢ this gives us
n+1 n−1
only the trivial factorization n = 2 − 2 2 + 2 = 1 × n.
Since it is easy to verify whether n is a perfect power, we assume that n has
m > 2 distinct prime factors. We may also assume that n contains no small
prime factors. In particular, n is odd. Then, for any y ∈ Z∗n , the congruence
x2 ≡ y 2 (mod n) has exactly 2m solutions for x. The only two trivial solutions
are x ≡ ±y (mod n). Each of the remaining 2m − 2 solutions yields a non-
trivial split of n. If x and y are random elements satisfying x2 ≡ y 2 (mod n),
m
then gcd(x − y, n) is a non-trivial factor of n with probability 2 2m−2 > 12 .
This factoring idea works if we can make available a non-trivial congruence
of the form x2 ≡ y 2 (mod n). The modern subexponential algorithms propose
different ways of obtaining this congruence. We start with a very simple idea.6
We choose a non-zero x ∈ Zn randomly, and compute a = x2 rem n (an
2 2
integer in {0, 1, 2, . . . , n − 1}). If a is¥√ ¦ square, say a = y , then x ≡
a perfect
2
y (mod n). However, there are only n − 1 non-zero√ perfect squares in Zn .
So √the probability that a is of the form y 2 is about 1/ n, that is, after trying
O( n ) random values of x, we expect to arrive at the desired congruence
x2 ≡ y 2 (mod n). This gives an algorithm with exponential running time.
In order to avoid this difficulty, we choose a factor base B consisting of
the first t primes p1 , p2 , . . . , pt . We choose a random non-zero x ∈ Z∗n , and
6 John D. Dixon, Asymptotically fast factorization of integers, Mathematics of Compu-
and 8.12). Some of the non-zero solutions are expected to split n. The choices
of t and s are explained later.
Example 6.11 We factor n = 64349 by Dixon’s method. We take the factor
base B = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47} (the first 15 primes).
Relations are obtained for the following random values of x. The values of x,
for which x2 rem n does not split completely over B, are not listed here.
x21 ≡ 265072 ≡ 58667 ≡ 7 × 172 × 29 (mod n)
x22 ≡ 535232 ≡ 22747 ≡ 232 × 43 (mod n)
x23 ≡ 347952 ≡ 29939 ≡ 72 × 13 × 47 (mod n)
x24 ≡ 176882 ≡ 506 ≡ 2 × 11 × 23 (mod n)
x25 ≡ 580942 ≡ 833 ≡ 72 × 17 (mod n)
x26 ≡ 370092 ≡ 61965 ≡ 36 × 5 × 17 (mod n)
x27 ≡ 153762 ≡ 3150 ≡ 2 × 32 × 52 × 7 (mod n)
x28 ≡ 314142 ≡ 47481 ≡ 3 × 72 × 19 (mod n)
x29 ≡ 624912 ≡ 41667 ≡ 3 × 17 × 19 × 43 (mod n)
x210 ≡ 467702 ≡ 17343 ≡ 32 × 41 × 47 (mod n)
x211 ≡ 192742 ≡ 299 ≡ 13 × 23 (mod n)
x212 ≡ 42182 ≡ 31200 ≡ 25 × 3 × 52 × 13 (mod n)
x213 ≡ 232032 ≡ 35475 ≡ 3 × 52 × 11 × 43 (mod n)
x214 ≡ 269112 ≡ 18275 ≡ 52 × 17 × 43 (mod n)
x215 ≡ 586972 ≡ 28000 ≡ 25 × 53 × 7 (mod n)
x216 ≡ 500892 ≡ 4760 ≡ 23 × 5 × 7 × 17 (mod n)
x217 ≡ 255052 ≡ 984 ≡ 23 × 3 × 41 (mod n)
x218 ≡ 268202 ≡ 19278 ≡ 2 × 34 × 7 × 17 (mod n)
x219 ≡ 185772 ≡ 1242 ≡ 2 × 33 × 23 (mod n)
x220 ≡ 94072 ≡ 11774 ≡ 2 × 7 × 292 (mod n)
We have collected 20 relations with the hope that at least one non-trivial
solution of β1 , β2 , . . . , β20 will lead to a non-trivial decomposition of n. In
matrix notation, the above system of linear congruences can be written as
β
1
β2
0 0 0 1 0 0 1 0 0 0 0 5 0 0 5 3 3 1 1
β 0
1 3
β
0 4
0 0 0 0 0 6 2 1 1 2 0 1 1 0 0 0 1 4 3
β5 0
0 0 0 0 0 1 2 0 0 0 0 2 2 2 3 1 0 0 0 0 0
1 0 2 0 2 0 1 2 0 0 0 0 0 0 1 1 0 1 0 1 β6
β7 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 β8
β9 0
2 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0
β 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 10 ≡ 0 (mod 2).
0 2 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 β11
β 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 12 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 β13
β 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
β
0 15 0
β
0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 16 0
β
0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 17 0
β
18
β19
β20
Integer Factorization 313
β2
0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 1 1 1 1
β 0
1 3
β
0 4
0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1
β 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 5 0
1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 1 β6
β7 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
β
0 8 0
β
0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 9 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
β
0 10 ≡ 0 (mod 2).
0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 β11
β12 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 β13
β14 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
β
0 15 0
β
0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 16 0
β
0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 17 0
β18
β19
β20
This reduced coefficient matrix (call it Ā) has rank 11 (modulo 2), that
is, the kernel of the matrix is a 9-dimensional subspace of Z20
2 . A basis of this
kernel is provided by the following vectors.
v1 = ( 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 )t ,
v2 = ( 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 )t ,
v3 = ( 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 )t ,
v4 = ( 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 )t ,
v5 = ( 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 )t ,
v6 = ( 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 )t ,
v7 = ( 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 )t ,
v8 = ( 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 )t ,
v9 = ( 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 )t .
This gives x ≡ xβ1 1 xβ2 2 · · · xβ2020 ≡ x3 x4 x5 x6 x10 x11 x13 x14 x16 x17 x20 ≡ 34795 ×
17688×58094×37009×46770×19274×23203×26911×50089×25505×9407 ≡
53886 (mod 64349). On the other hand, the vector e gives y ≡ 24 × 35 × 53 ×
73 × 11 × 13 × 172 × 23 × 29 × 41 × 43 × 47 ≡ 53886 (mod 64349). Therefore,
gcd(x − y, n) = 64349 = n, that is, the factorization attempt is unsuccessful.
Let us then try
t
( β1 β2 ··· β20 )
= v3 + v5 + v6 + v7 + v8
t
= (0 1 1 0 0 1 0 0 0 1 1 0 0 1 0 1 1 1 1 0)
so that
t
e = Aβ = ( 8 16 4 4 0 2 4 0 4 0 0 0 2 2 2) .
Therefore, x ≡ x2 x3 x6 x10 x11 x14 x16 x17 x18 x19 ≡ 53523 × 34795 × 37009 ×
46770 × 19274 × 26911 × 50089 × 25505 × 26820 × 18577 ≡ 58205 (mod 64349),
and y ≡ 24 × 38 × 52 × 72 × 13 × 172 × 232 × 41 × 43 × 47 ≡ 6144 (mod 64349).
In this case, gcd(x − y, n) = 1, and we again fail to split n non-trivially.
As a third attempt, let us try
t
( β1 β2 · · · β20 )
= v3 + v6 + v7 + v9
t
= (0 1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 1 1 0 1) ,
for which
t
e = Aβ = ( 10 8 4 4 0 2 2 0 2 2 0 0 2 2 2) .
In this case, x ≡ x2 x3 x10 x12 x14 x17 x18 x20 ≡ 53523 × 34795 × 46770 × 4218 ×
26911 × 25505 × 26820 × 9407 ≡ 10746 (mod 64349). On the other hand,
y ≡ 25 ×34 ×52 ×72 ×13×17×23×29×41×43×47 ≡ 57954 (mod 64349). This
gives gcd(x−y, n) = 281, a non-trivial factor of n. The corresponding cofactor
is n/281 = 229. Since both these factors are prime, we get the complete
factorization 64349 = 229 × 281. ¤
We now derive the (optimal) running time of Dixon’s method. If the num-
ber t of primes in the factor base is too large, we have to collect too many
relations to obtain a system with non-zero solutions (we should have s > t).
Moreover, solving the system in that case would be costly. On the contrary, if
t is too small, most of the random values of x ∈ Z∗n will fail to generate rela-
tions, and we have to iterate too many times before we find a desired number
of values of x, for which x2 rem n factors completely over B. For estimating
the best trade-off, we need some results from analytic number theory.
Theorem 6.13 supplies the formula for the density of smooth integers.
Integer Factorization 315
ln n 2
Theorem 6.13 Let m, n ∈ N, and u = ln m . For u → ∞ and u > ln n,
the number of m-smooth integers x in the range 1 6 x 6 n asymptotically
approaches nu−u+o(u) . That is, the density of m-smooth integers between 1
and n is asymptotically u−u+o(u) . ⊳
We now deduce the best running time of Dixon’s method. Let us choose
the factor base to consist of all primes 6 L[β]. By the prime number theorem,
t ≈ L[β]/ ln L[β], that is, t = L[β] again. The random elements x2 rem n
are O(n), that is, we put α = 1 in Corollary 6.14. The probability that such
1
a random element factors completely over B is then L[− 2β ], that is, after
1
an expected number L[ 2β ] of iterations, we obtain one relation (that is, one
L[β]-smooth value). Each iteration calls for trial divisions by L[β] primes in
the factor base. Finally, we need to generate s > t relations. Therefore, the
1
total expected time taken by the relation collection stage is L[β]L[β]L[ 2β ]=
1 1 1
L[2β + 2β ]. The quantity 2β + 2β is minimized for β = 2 . For this choice, the
running time of the relation-collection stage is L[2].
The next stage of the algorithm solves a t × s system modulo 2. Since both
s and t are expressions of the form L[1/2], using standard Gaussian elimina-
tion gives a running time of L[3/2]. However, the relations collected lead to
equations that are necessarily sparse, since each smooth value of x2 rem n can
have only O(log n) prime factors. Such a sparse t × s system can be solved in
O˜(st) time using some special algorithms (see Chapter 8). Thus, the system-
solving stage can be completed in L[2/2] = L[1] time. To sum up, Dixon’s
method runs in subexponential time L[2] = L(n, 1/2, 2).
The constant 2 in L[2] makes Dixon’s method rather slow. The problem
with Dixon’s method is that it generates smoothness candidates as large as
O(n). By using other algorithms
√ (like CFRAC or QSM), we can generate
candidates having values O( n ).
316 Computational Number Theory
r ξ ar hr(mod n) yr B-smooth
√r
0 √n 772 772 −349 = (−1) × 349 No
1 (772 + √n)/349 4 3089 593 = 593 No
2 (624 + √n)/593 2 6950 −473 = (−1) × 11 × 43 Yes
3 (562 + √n)/473 2 16989 949 = 13 × 73 No
4 (384 + √n)/949 1 23939 −292 = (−1) × 22 × 73 No
5 (565 + √n)/292 4 112745 797 = 797 No
6 (603 + √n)/797 1 136684 −701 = (−1) × 701 No
7 (194 + √n)/701 1 249429 484 = 22 × 112 Yes
8 (507 + √n)/484 2 39209 −793 = (−1) × 13 × 61 No
9 (461 + √n)/793 1 288638 613 = 613 No
10 (332 + √n)/613 1 327847 −844 = (−1) × 22 × 211 No
11 (281 + √n)/844 1 20152 331 = 331 No
12 (563 + √n)/331 4 408455 −52 = (−1) × 22 × 13 Yes
13 (761 +√ n)/52 29 535020 737 = 11 × 67 No
14 (747 + √n)/737 2 285829 −92 = (−1) × 22 × 23 Yes
15 (727 + n)/92 16 337620 449 = 449 No
¤
Let us now deduce the optimal running time of the CFRAC method. The
√
smoothness candidates yr are O( n ), and have the probability L[− 1/2 2β ] =
1
L[− 4β ] for being L[β]-smooth. We expect to get one L[β]-smooth value of yr
1
after L[ 4β ] iterations. Each iteration involves trial divisions by L[β] primes in
the factor base. Finally, we need to collect L[β] relations, so the running time
1 1 1
of the CFRAC method is L[2β + 4β ]. Since 2β + 4β is minimized for β = 2√ 2
,
√
the optimal running time of the relation-collection stage is L[ 2 ]. The sparse
318 Computational Number Theory
system involving L[β] variables can be solved in L[2β] = L[ √12 ] time. To sum
√
up, the running time of the CFRAC method is L[ 2 ].
The CFRAC method can run √ in parallel with each instance handling the
continued-fraction expansion of sn for some √ s ∈ N. For a given s, the quan-
tity yr satisfies the inequality 0 < |yr | < 2 sn. If s grows, the probability of
yr being smooth decreases, so only small values of s should be used.
c T (c) c T (c)
−44 −71456 = (−1) × 25 × 7 × 11 × 29 −2 −2408 = (−1) × 23 × 7 × 43
−22 −35728 = (−1) × 24 × 7 × 11 × 29 0 968 = 23 × 112
−15 −24157 = (−1) × 72 × 17 × 29 2 4352 = 28 × 17
−14 −22496 = (−1) × 25 × 19 × 37 4 7744 = 26 × 112
−11 −17501 = (−1) × 11 × 37 × 43 26 45584 = 24 × 7 × 11 × 37
−9 −14161 = (−1) × 72 × 172 34 59584 = 26 × 72 × 19
−4 −5776 = (−1) × 24 × 192 36 63104 = 27 × 17 × 29
8 Carl Pomerance, The quadratic sieve factoring algorithm, Eurocrypt’84, 169–182, 1985.
Integer Factorization 319
There are 14 smooth values of T (c), and the size of the factor base is 9. Solving
the resulting system splits n as 761 × 937 (with both the factors prime). ¤
Let us now look at the optimal running time of the QSM. Let the factor
base B consist of (−1 and) primes √ 6 L[β]. Since the integers T (c) checked for
smoothness over B have values O( n ), the probability of each being smooth
1
is L[− 4β ]. In order that we get L[β] relations, the size of the sieving interval
1
(or equivalently M ) should be L[β + 4β ]. If we use trial division of each T (c)
by all of the L[β] primes in the factor base, we obtain a running time of
1
L[2β + 4β ]. As we will see shortly, a sieving procedure reduces this running
time by a factor of L[β], that is, the running time of the relation-collection
1
stage of QSM with sieving is only L[β + 4β ]. This quantity is minimized for
1
β = 2 , and we obtain a running time of L[1] for the relation-collection stage
of QSM. The resulting sparse system having L[ 12 ] variables and L[ 12 ] equations
can also be solved in the same time.
6.6.1 Sieving
Both the√CFRAC method and the QSM generate smoothness candidates √
of value O( n ). In the CFRAC method, these values √ are bounded by 2 n,
whereas for the QSM, we have a bound of nearly 2M n. The CFRAC method
is, therefore, expected to obtain smooth candidates more frequently than the
QSM. On the other hand, the QSM offers the possibility of sieving, a process
that replaces trial divisions by single-precision subtractions. As a result, the
QSM achieves a better running time than the CFRAC method.
In the QSM, the smoothness of the integers T (c) = J + 2cH + c2 is checked
for −M 6 c 6 M . To that effect, we use an array A indexed by c in the range
−M 6 c 6 M . We initialize the array location Ac to an approximate value
of log |T (c)|. We can use only one or two most significant words of T (c) for
this initial value. Indeed, it suffices to know log |T (c)| rounded or truncated
after three places of decimal. If so, we can instead store the integer value
⌊1000 log |T (c)|⌋, and perform only integer operations on the elements of A.
Now, we choose small primes p one by one from the factor base. For each
small positive exponent h, we try to find out all the values of c for which
ph |T (c). Since J = H 2 − n, this translates to solving (H + c)2 ≡ n (mod ph ).
For h = 1, we use a root-finding algorithm, whereas for h > 1, the solutions
can be obtained by lifting the solutions modulo ph−1 . In short, all the solutions
of (H + c)2 ≡ n (mod ph ) can be obtained in (expected) polynomial time (in
log p and log n).
Let χ be a solution of (H + c)2 ≡ n (mod ph ). For all c in the range
−M 6 c 6 M , we subtract log p from the array element Ac if and only if
c ≡ χ (mod ph ). In other words, we first obtain one solution χ, and then
update Ac for all c = χ ± kph with k = 0, 1, 2, . . . . If, for a given c, we have
the multiplicity v = vp (T (c)), then log p is subtracted from Ac exactly v times
(once for each of h = 1, 2, . . . , v).
320 Computational Number Theory
After all primes p ∈ B and all suitable small exponents h are considered,
we look at the array locations Ac for −M 6 c 6 M . If some T (c) is smooth
(over B), then all its prime divisors are eliminated during the subtractions
of log p from the initial value of log |T (c)|. Thus, we should have Ac = 0.
However, since we use only approximate log values, we would get Ac ≈ 0. On
the other hand, a non-smooth T (c) contains a prime factor > pt+1 . Therefore,
a quantity at least as large as log pt+1 remains in the array location Ac , that
is, Ac ≫ 0 in this case. In short, the post-sieving values of Ac readily identify
the smooth values of T (c). Once we know that some T (c) is smooth, we use
trial division of that T (c) by the primes in the factor base.
Example 6.17 We factor n = 713057 by the QSM. As in Example 6.16, take
B = {−1, 2, 7, 11, 17, 19, 29, 37, 43}, and M = 50. Initialize the array entry Ac
to ⌊1000 log |T (c)|⌋ (e is the base of logarithms). Since T (0) = J = 968, set
A0 = ⌊1000 log 968⌋ = 6875. Similarly, T (20) = J +40H +400 = 35168, so A20
is set to ⌊1000 log 35168⌋ = 10467, and T (−20) = J − 40H + 400 = −32432,
so A−20 is set to ⌊1000 log 32432⌋ = 10386. The approximate logarithms of
the primes in B are ⌊1000 log 2⌋ = 693, ⌊1000 log 7⌋ = 1945, ⌊1000 log 11⌋ =
2397, ⌊1000 log 17⌋ = 2833, ⌊1000 log 19⌋ = 2944, ⌊1000 log 29⌋ = 3367,
⌊1000 log 37⌋ = 3610, and ⌊1000 log 43⌋ = 3761.
The following table considers all small primes p and all small exponents
h for solving T (c) ≡ 0 (mod ph ). For each solution χ, we consider all values
of c ≡ χ (mod ph ) with −M 6 c 6 M , and subtract log p from Ac . For each
prime p, we consider h = 1, 2, 3, . . . in that sequence until a value of h is found,
for which there is no solution of T (c) ≡ 0 (mod ph ) with −M 6 c 6 M .
p ph χ c ≡ χ (mod ph ), −M 6 c 6 M
2 2 0 −50, −48, −46, −44, . . . , −2, 0, 2, . . . , 24 , 26, . . . , 44, 46, 48, 50
4 0 −48, −44, −40, −36, . . . , −4, 0, 4, . . . , 24 , . . . , 36, 40, 44, 48
2 −50, −46, −42, −38, . . . , −2, 2, . . . , 26, . . . , 38, 42, 46, 50
8 0 −48, −40, −32, −24, −16, −8, 0, 8, 16, 24 , 32, 40, 48
2 −46, −38, −30, −22, −14, −6, 2, 10, 18, 26, 34, 42, 50
4 −44, −36, −28, −20, −12, −4, 4, 12, 20, 28, 36, 44
6 −50, −42, −34, −26, −18, −10, −2, 6, 14, 22, 30, 38, 46
16 2 −46, −30, −14, 2, 18, 34, 50
4 −44, −28, −12, 4, 20, 36
10 −38, −22, −6, 10, 26, 42
12 −36, −20, −4, 12, 28, 44
32 2 −30, 2, 34
4 −28, 4, 36
18 −46, −14, 18, 50
20 −44, −12, 20
64 2 2
4 4
34 −30, 34
36 −28, 36
Integer Factorization 321
p ph χ c ≡ χ (mod ph ), −M 6 c 6 M
2 128 2 2
36 36
66
100 −28
256 2 2
100
130
228 −28
512 130
228
386
484 −28
1024 130
228
642
740
7 7 5 −44, −37, −30, −23, −16, −9, −2, 5, 12, 19, 26, 33, 40, 47
6 −50, −43, −36, −29, −22, −15, −8, −1, 6, 13, 20, 27, 34, 41, 48
49 34 −15, 34
40 −9, 40
343 132
236
11 11 0 −44, −33, −22, −11, 0, 11, 22, 33, 44
4 −40, −29, −18, −7, 4, 15, 26, 37, 48
121 0 0
4 4
1331 242
730
17 17 2 −49, −32, −15, 2, 19, 36
8 −43, −26, −9, 8, 25, 42
289 53
280 −9
4913 53
3170
19 19 5 −33, −14, 5, 24 , 43
15 −42, −23, −4, 15, 34
361 119
357 −4
6859 480
4689
29 29 7 −22, 7, 36
14 −44, −15, 14, 43
841 72
761
322 Computational Number Theory
p ph χ c ≡ χ (mod ph ), −M 6 c 6 M
37 37 23 −14, 23
26 −48, −11, 26
1369 245
803
43 43 32 −11, 32
41 −45, −2, 41
1849 471
1537
Let us track the array locations A24 and A26 in the above table (the bold
and the boxed entries). We have T (24) = 42104, and T (26) = 45584. So A24 is
initialized to ⌊1000 log 42104⌋ = 10647, and A26 to ⌊1000 log 45584⌋ = 10727.
We subtract ⌊1000 log 2⌋ = 693 thrice from A24 , and ⌊1000 log 19⌋ = 2944
once from A24 . After the end of the sieving process, A24 stores the value
10647 − 3 × 693 − 2944 = 5624, that is, T (24) is not smooth. In fact, T (24) =
42104 = 23 × 19 × 277. The smallest prime larger than those in B is 53 for
which ⌊1000 log 53⌋ = 3970. Thus, if T (c) is not smooth over B, then after the
completion of sieving, Ac would store a value not (much) smaller than 3970.
From A26 , ⌊1000 log 2⌋ = 693 is subtracted four times, ⌊1000 log 7⌋ = 1945
once, ⌊1000 log 11⌋ = 2397 once, and ⌊1000 log 37⌋ = 3610 once, leaving the
final value 10727 − 4 × 693 − 1945 − 2397 − 3610 = 3 at that array location.
So T (26) is smooth. Indeed, T (26) = 45584 = 24 × 7 × 11 × 37.
This example demonstrates that the final values of Ac for smooth T (c) are
clearly separated from those of Ac for non-smooth T (c). As a result, it is easy
to locate the smooth values after the sieving process even if somewhat crude
approximations are used for representing the logarithms. ¤
Let us now argue that the sieving process runs in L[1] time for the choice
β = 1/2. The primes in the factor base are 6 L[β] = L[1/2]. On the other
1
hand, M = L[β + 4β ] = L[1], and so 2M + 1 is also of the form L[1]. First,
we have to initialize all of the array locations Ac . Each location demands
the computation of T (c) (and its approximate logarithm), a task that can be
performed in O(lnk n) time for some constant k. Since there are L[1] array
locations, the total time for the initialization of A is of the order of
h √ i
(lgk n)L[1] = exp (1 + o(1)) ln n ln ln n + k ln ln n
"Ã r ! #
ln ln n √
= exp 1 + o(1) + k ln n ln ln n ,
ln n
computing all the solutions χ is of the order of (logl+1 n)L[1/2], which is again
an expression of the form L[1/2].
Now, we derive the total cost of subtracting log values from all array
locations. First, take p = 2. We have to subtract log 2 from each array location
Ac at most O(log n) times. Since M and so 2M + 1 are expressions of the form
L[1], the total effort of all subtractions for p = 2 is of the order of (log n)L[1]
which is again of the form L[1]. Then, take an odd prime p. Assume that
p ∤ n, and n is a quadratic residue modulo p. In this case, the congruence
T(c) ≡ 0 (mod p^h) has exactly two solutions for each value of h. Moreover,
the values p^h are O(√n). Therefore, the total cost of subtractions for all odd
small primes p and for all small exponents h is of the order of
Σ_{p,h} 2(2M + 1)/p^h < 2(2M + 1) Σ_{r=1}^{n} 1/r ≈ 2(2M + 1) ln n,
which is an expression of the form L[1].
Finally, one has to factor the L[1/2] smooth values of T (c) by trial divisions
by the L[1/2] primes in the factor base. This process too can be completed in
L[1] time. To sum up, the entire relation-collection stage runs in L[1] time.
Example 6.18 Let us continue with Examples 6.16 and 6.17. Now, we carry
out incomplete sieving. To start with, let us look at what happens to the array
locations A24 and A26 . After the incomplete sieving process terminates, A24
stores the value 10647 − 693 − 2944 = 7010. Since ⌊1000 log p_{t+1}⌋ = 3970, T(24)
is selected as smooth for ξ > 7010/3970 ≈ 1.766. On the other hand, A26 ends
up with the value 10727 − 693 − 1945 − 2397 − 3610 = 2082 which passes the
smoothness test for ξ > 2082/3970 ≈ 0.524.
The following table lists, for several values of ξ, all values of c for which
T (c) passes the liberal smoothness test.
Evidently, for small values of ξ, only some (not all) smooth values of T (c) pass
the selection criterion. As we increase ξ, more smooth values of T (c) pass the
criterion, and more non-smooth values too pass the criterion. ¤
   c    A_c    T(c)
  −48   5573   −77848 = (−1) × 2³ × 37 × 263
  −33   5550   −53713 = (−1) × 11 × 19 × 257
  −32   5948   −52088 = (−1) × 2³ × 17 × 383
  −30   4693   −48832 = (−1) × 2⁶ × 7 × 109
  −28   4489   −45568 = (−1) × 2⁹ × 89
  −26   5740   −42296 = (−1) × 2³ × 17 × 311
  −23   5639   −37373 = (−1) × 7 × 19 × 281
  −18   5803   −29128 = (−1) × 2³ × 11 × 331
   −8   5408   −12488 = (−1) × 2³ × 7 × 223
   −1   4635     −721 = (−1) × 7 × 103
    5   4264     9443 = 7 × 19 × 71
    6   5294    11144 = 2³ × 7 × 199
    8   4673    14552 = 2³ × 17 × 107
   12   5253    21392 = 2⁴ × 7 × 191
   14   4673    24824 = 2³ × 29 × 107
   15   4845    26543 = 11 × 19 × 127
   19   5639    33439 = 7 × 17 × 281
   20   5057    35168 = 2⁵ × 7 × 157
   24   5624    42104 = 2³ × 19 × 277
   32   5094    56072 = 2³ × 43 × 163
   40   5189    70168 = 2³ × 7² × 179
   41   5477    71939 = 7 × 43 × 239
   42   5602    73712 = 2⁴ × 17 × 271
   43   4920    75487 = 19 × 29 × 137
   48   4922    84392 = 2³ × 7 × 11 × 137
Three large primes have repeated occurrences (see the boxed entries). If
we add these three primes to the factor base of Example 6.16, we obtain
B = {−1, 2, 7, 11, 17, 19, 29, 37, 43, 107, 137, 281} (12 elements) and 14 + 6 = 20
relations, that is, we are now guaranteed to have at least 2^{20−12} = 2⁸ solutions.
Compare this with the original situation of 14 relations involving 9 elements
of B, where the guaranteed number of solutions was ≥ 2^{14−9} = 2⁵. ¤
Robert D. Silverman, The multiple polynomial quadratic sieve, Mathematics of Com-
putation, 48, 329–339, 1987. The author, however, acknowledges personal communication
with Peter L. Montgomery for the idea.
The following table lists the smooth values of T(c) as c ranges over the interval
−50 ≤ c ≤ 50. The MPQSM yields 19 relations, whereas the original QSM
yields only 14 relations (see Example 6.16).

    c    T(c)                                   c    T(c)
  −50   46816 = 2⁵ × 7 × 11 × 19                6   −23408 = (−1) × 2⁴ × 7 × 11 × 19
  −38   16456 = 2³ × 11² × 17                  18   −14792 = (−1) × 2³ × 43²
  −34    8192 = 2¹³                            20   −12544 = (−1) × 2⁸ × 7²
  −32    4408 = 2³ × 19 × 29                   22   −10064 = (−1) × 2⁴ × 17 × 37
  −29    −833 = (−1) × 7² × 17                 26    −4408 = (−1) × 2³ × 19 × 29
  −28   −2464 = (−1) × 2⁵ × 7 × 11             27    −2849 = (−1) × 7 × 11 × 37
  −14  −19208 = (−1) × 2³ × 7⁴                 28    −1232 = (−1) × 2⁴ × 7 × 11
  −12  −20672 = (−1) × 2⁶ × 17 × 19            30     2176 = 2⁷ × 17
  −10  −21904 = (−1) × 2⁴ × 37²                45    35131 = 19 × 43²
   −3  −24389 = (−1) × 29³
−3 −24389 = (−1)×293
In order to exploit the reduction of the values of T(c) in the MPQSM,
we could have started with a smaller sieving interval, like M = 35. In this
case, we have the parameters U = −19264, V = 17, and W = 37, that is,
T(c) = −19264 + 34c + 37c². Smoothness tests of the values of T(c) for −35 ≤
c ≤ 35 yield 15 smooth values (for c = −35, −26, −24, −23, −18, −16, −8,
−7, 0, 16, 18, 20, 21, 22, 32).
On the other hand, if we kept M = 50 but eliminated the primes 37 and
43 from B, we would have a factor base of size 7. The relations in the above
table, that we can now no longer use, correspond to c = −10, 18, 22, 27, 45.
But that still leaves us with 14 other relations. ¤
Example 6.20 illustrates that in the MPQSM, we can start with values of t
and/or M smaller than optimal. We may still hope to obtain sufficiently many
relations to split n. Moreover, we can use different polynomials (for different
choices of W ), and run different instances of the MPQSM in parallel.
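The count of 15 smooth values can be re-obtained by brute force. The sketch below is not the book's code; it assumes the factor base primes of Example 6.16, namely 2, 7, 11, 17, 19, 29, 37, 43 (the sign −1 is free), and recovers n from V² − UW as stated above.

    U = -19264; V = 17; W = 37; n = V^2 - U*W;   \\ n = 713057
    P = [2, 7, 11, 17, 19, 29, 37, 43];
    issmooth(t) =
    {
      t = abs(t);
      if (t == 0, return(0));
      for (i = 1, #P, while (t % P[i] == 0, t = t / P[i]));
      return(t == 1);
    }
    S = [];
    {
      for (c = -35, 35,
        if (issmooth(U + 2*V*c + W*c^2), S = concat(S, c)));
    }
    print(#S, ":  ", S);     \\ 15 values of c, as listed in the text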
Example 6.21 Let us factor n = 6998891 using the CSM. We need a solution
of x³ ≡ y²z (mod n). For x = 241, y = 3 and z = −29, we have x³ − y²z = 2n.
We take M = 50. The factor base B consists of −1, all primes < 100 (there are
25 of them), and all integers of the form x + ay = 241 + 3a for −50 ≤ a ≤ 50.
The size of the factor base is, therefore, 1 + 25 + 101 = 127. In this case,
we have T(a, b, c) = −29 + 241(ab + ac + bc) + 3abc. If we vary a, b, c with
−50 ≤ a ≤ b ≤ c ≤ 50 and with a + b + c = 0, we obtain 162 smooth values
of T(a, b, c). Some of these smooth values are listed in the following table.
    a     b    c    T(a, b, c)
  −50   −49   99   −1043970 = (−1) × 2 × 3 × 5 × 17 × 23 × 89
  −50   −39   89    −918390 = (−1) × 2 × 3 × 5 × 11³ × 23
  −50   −16   66    −698625 = (−1) × 3⁵ × 5³ × 23
  −50    14   36    −556665 = (−1) × 3 × 5 × 17 × 37 × 59
  −49   −48   97   −1016334 = (−1) × 2 × 3³ × 11 × 29 × 59
  −49   −21   70    −716850 = (−1) × 2 × 3⁵ × 5² × 59
  −49    −4   53    −598598 = (−1) × 2 × 7 × 11 × 13² × 23
  −49     7   42    −551034 = (−1) × 2 × 3² × 11³ × 23
  −49    10   39    −542010 = (−1) × 2 × 3 × 5 × 7 × 29 × 89
  −49    18   31    −526218 = (−1) × 2 × 3 × 7 × 11 × 17 × 67
  −48   −38   86    −872289 = (−1) × 3⁴ × 11² × 89
  −48   −29   77    −771894 = (−1) × 2 × 3² × 19 × 37 × 61
  −48   −19   67    −678774 = (−1) × 2 × 3 × 29 × 47 × 83
  −48     9   39    −521246 = (−1) × 2 × 11 × 19 × 29 × 43
  −48    20   28    −500973 = (−1) × 3 × 11 × 17 × 19 × 47
  ···
   −5    −4    9     −14190 = (−1) × 2 × 3 × 5 × 11 × 43
   −5    −1    6      −7410 = (−1) × 2 × 3 × 5 × 13 × 19
   −5     2    3      −4698 = (−1) × 2 × 3⁴ × 29
   −4    −3    7      −8694 = (−1) × 2 × 3³ × 7 × 23
   −4    −2    6      −6633 = (−1) × 3² × 11 × 67
   −4     0    4      −3885 = (−1) × 3 × 5 × 7 × 37
   −4     1    3      −3198 = (−1) × 2 × 3 × 13 × 41
   −3     1    2      −1734 = (−1) × 2 × 3 × 17²
   −2    −2    4      −2873 = (−1) × 13² × 17
   −1     0    1       −270 = (−1) × 2 × 3³ × 5
    0     0    0        −29 = (−1) × 29
We solve for β1 , β2 , . . . , β162 from 127 linear congruences modulo 2. These
linear-algebra calculations are not shown here. Since the number of variables is
significantly larger than the number of equations, we expect to find a non-zero
solution for β1 , β2 , . . . , β162 to split n. Indeed, we get n = 293 × 23887. ¤
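The count of 162 smooth values can be reproduced by exhaustive search. The sketch below is not the book's code; it assumes that smoothness is tested over the 25 primes below 100 (with the sign −1 free), as stated in the example, and renames x, y, z to avoid clobbering GP's polynomial variable x.

    X = 241; Y = 3; Z = -29;
    P = primes(25);              \\ the 25 primes below 100
    issmooth(t) =
    {
      t = abs(t);
      if (t == 0, return(0));
      for (i = 1, #P, while (t % P[i] == 0, t = t / P[i]));
      return(t == 1);
    }
    cnt = 0;
    {
      for (a = -50, 50,
        for (b = a, 50,
          c = -(a + b);
          if (b <= c && c <= 50 && issmooth(Z + X*(a*b + a*c + b*c) + Y*a*b*c),
            cnt = cnt + 1)));
    }
    print(cnt);                  \\ 162, as stated in the example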
The CSM has several problems which restrict its use in a general
situation. The biggest problem is that we need a solution of the congruence
x³ ≡ y²z (mod n) with x³ ≠ y²z and with x, y, z as small as possible. No
polynomial-time (nor even subexponential-time) method is known to obtain
such a solution. Only when n is of certain special forms (for example, when n or a
multiple of n is close to a perfect cube) is a solution for x, y, z available naturally.
A second problem of the CSM is that, because of the quadratic and cubic
coefficients (in a, b, c), the values of T(a, b, c) are, in practice, rather large. To
be precise, T(a, b, c) = O(L[3√(ξ/2)] n^ξ). Although this quantity is asymptot-
ically O(n^{ξ+o(1)}), the expected benefits of the CSM do not show up unless n
is quite large. My practical experience with the CSM shows that for integers
of bit sizes > 200, the CSM offers some speedup over the QSM. However,
for these bit sizes, one would possibly prefer to apply the number-field sieve
method. In the special situations where the CSM is readily applicable (as
mentioned in the last paragraph), the special number-field sieve method too
is applicable, and appears to be a strong contender of the CSM.
E : y² = x³ + ax + b
In order to see what happened behind the curtain, let me reveal that n
factors as pq with p = 541 and q = 1987. The curve Ep is cyclic with prime
order 571, and the curve Eq is cyclic of square-free order 1934 = 2×967. Thus,
neither |Ep| nor |Eq| is smooth over the chosen factor base B, and so mP ≠ O
in both Ep and Eq , that is, an accidental discovery of a non-invertible element
modulo n did not happen during the five scalar multiplications. Indeed, the
points Pi on Ep and Eq are listed in the following table.
i Pi (mod n) Pi (mod p) Pi (mod q)
0 (26, 83) (26, 83) (26, 83)
1 (330772, 1003428) (221, 414) (930, 1980)
2 (804084, 683260) (158, 518) (1336, 1719)
3 (742854, 1008597) (61, 173) (1703, 1188)
4 (926695, 354471) (503, 116) (753, 785)
5 (730198, 880012) (389, 346) (969, 1758)
I now illustrate a successful iteration of the ECM. For the choices P =
(h, k) = (81, 82) and a = 3, we have b = 550007. We continue to take M =
1036 and B = {2, 3, 5, 7, 11}. The computation of mP now proceeds as follows.
  i    p_i    e_i = ⌊log M / log p_i⌋    p_i^{e_i}    P_i
0 P0 = (81, 82)
1 2 10 1024 P1 = 1024P0 = (843635, 293492)
2 3 6 729 P2 = 729P1 = (630520, 992223)
3 5 4 625 P3 = 625P2 = (519291, 923811)
4 7 3 343 P4 = 343P3 = (988490, 846127)
5 11 2 121 P5 = 121P4 = ?
The following table lists all intermediate points lP4 in the left-to-right double-
and-add point-multiplication algorithm for computing P5 = 121P4 .
Step l lP4
Init 1 (988490, 846127)
Dbl 2 (519843, 375378)
Add 3 (579901, 1068102)
Dbl 6 (113035, 131528)
Add 7 (816990, 616888)
Dbl 14 (137904, 295554)
Add 15 (517276, 110757)
Dbl 30 (683232, 158345)
Dbl 60 (890993, 947226)
Dbl 120 (815911, 801218)
Add 121 Failure
In the last step, an attempt is made to add 120P4 = (815911, 801218) and
P4 = (988490, 846127). These two points are different modulo n. So we try to
invert r = 988490−815911 = 172579. To that effect, we compute the extended
gcd of r and n, and discover that gcd(r, n) = 541 is a non-trivial factor of n.
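The failing step can be replayed directly in GP/PARI. This tiny check is not from the book; it uses n = 541 × 1987 = 1074967, which is revealed just below.

    n  = 1074967;                 \\ 541 * 1987
    Q  = [815911, 801218];        \\ 120*P4
    P4 = [988490, 846127];
    r  = (P4[1] - Q[1]) % n;      \\ r = 172579, the quantity that must be inverted
    print(gcd(r, n));             \\ 541, a non-trivial factor of n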
Let us see what happened behind the curtain to make this attempt success-
ful. The choices a, b in this attempt give a curve Ep (where p = 541) of order
539 = 7² × 11, whereas Eq (where q = 1987) now has order 1959 = 3 × 653. It
follows that mP = O in Ep , but mP 6= O in Eq . This is the reason why the
computation of mP is bound to reveal p at some stage. The evolution of the
points Pi is listed below modulo n, p and q.
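The claim that mP = O on E_p but mP ≠ O on E_q can also be checked directly in GP/PARI. The sketch below is not the book's code; it uses the data quoted above (a = 3, b = 550007, P = (81, 82), and m the product of the p_i^{e_i} from the table).

    p = 541; q = 1987; a = 3; b = 550007;
    P = [81, 82];
    m = 2^10 * 3^6 * 5^4 * 7^3 * 11^2;
    Ep = ellinit([a, b] * Mod(1, p));
    Eq = ellinit([a, b] * Mod(1, q));
    print(ellmul(Ep, P * Mod(1, p), m));   \\ [0]: m*P is the identity on E_p
    print(ellmul(Eq, P * Mod(1, q), m));   \\ a finite point: m*P is not the identity on E_q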
For deriving the running time of the ECM, let us first assume that a fairly
accurate bound M on the smallest prime factor p of n is available to us. The
choices of the factor base and of the exponents e_i depend upon that. More
precisely, we let B consist of all primes ≤ L_M[1/√2]. An integer (the size of
the group E_p) of value O(M) is smooth over B with probability L_M[−1/√2],
that is, about L_M[1/√2] random curves E need to be tried to obtain one
B-smooth value of |E_p|. For each choice of a curve, we make t = |B| =
L_M[1/√2] scalar multiplications. Since each such scalar multiplication can
be completed in O(log³ n) time (a polynomial in log n), the running time
of the ECM is L_M[√2]. In the worst case, M is as large as √n, and this
running time becomes L_n[1], which is the same as that of the QSM. However,
if M is significantly smaller than √n, the ECM is capable of demonstrating
superior performance compared to the QSM. For example, if n is known to
be the product of three distinct primes of roughly the same bit size, we can
take M = ∛n. In that case, the running time of the ECM is L_n[√(2/3)] ≈
L_n[0.816], the same as the best possible running time of the CSM.
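To spell out the last step (a direct substitution, added here for clarity): with M = ∛n we have ln M = (ln n)/3 and ln ln M ≈ ln ln n, so

        L_M[√2] = exp((√2 + o(1)) √(ln M ln ln M)) = exp((√(2/3) + o(1)) √(ln n ln ln n)) = L_n[√(2/3)].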
Given our current knowledge of factoring, the hardest nuts to crack are
the products of two primes of nearly the same bit sizes. For such integers, the
MPQSM has been experimentally found by the researchers to be slightly more
efficient than the ECM. The most plausible reason for this is that the ECM
has no natural way of quickly sieving out the bad choices. The ECM is still an
important algorithm, because it can effectively exploit the presence of small
prime divisors—a capability not present at all in the QSM or the MPQSM.
If the right side of this congruence is smooth over a set of small primes, we
get a relation of the form
        Φ(β₁)Φ(β₂) · · · Φ(β_k) ≡ ∏_{i=1}^{t} p_i^{e_i} (mod n).

In the QSM, we choose H ≈ √n. In order to arrive at a smaller running time,
the NFSM chooses a subexponential expression in log n as H. Moreover, the
product α should be a polynomial of small degree in θ so that substituting θ
by H in α gives a value much smaller compared to n. A good choice for α is
α = aθ + b
for small coprime integers a, b. But then, how can we express such an element
aθ + b as a product of β1 , β2 , . . . , βk ? This is precisely where the algebraic
properties of OK come to the forefront. We can identify a set of small elements
in OK . More technically, we choose a set P of elements of OK of small prime
norms, and a generating set U of units of OK . For a choice of a, b, we need an
algorithm to factor a + bθ completely (if possible) into a product of elements
from P ∪ U. Indeed, checking the smoothness of a + bθ reduces to checking the
smoothness of the integer (−b)^d f(−a/b). Factorization in number rings OK
is too difficult a topic to be explained here. Example 6.23 supplies a flavor.
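The norm computation behind this smoothness test is easy to experiment with in GP/PARI. The sketch below is not from the book, and the polynomial f used here is only a stand-in (Example 6.23's f(x) is not shown in this excerpt): for a monic f of degree d, the norm of a + bθ equals the resultant of f(x) and a + bx, which in turn equals (−b)^d f(−a/b).

    f = x^3 - 2;                \\ hypothetical monic defining polynomial
    d = poldegree(f);
    a = 5; b = 3;               \\ an arbitrary small pair
    print(polresultant(f, a + b*x));       \\ 179
    print((-b)^d * subst(f, x, -a/b));     \\ 179 again: N(a + b*theta) both ways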
To worsen matters, the ring OK may fail to support unique factorization
of elements into products of primes and units. However, all number rings are
so-called Dedekind domains where unique factorization holds at the level of
ideals, and each ideal can be generated by at most two elements. The general
number-field sieve method takes all these issues into account, and yields an
integer-factoring algorithm that runs in L(n, 1/3, (64/9)^{1/3}) time.
irrational value −0.79654 . . . , but we do not need to know this value explicitly
or approximately, because we planned for an algebraic representation of K.
The remaining six roots of f (x) are (properly) complex. Since complex roots
of a real polynomial occur in complex-conjugate pairs, the number of pairwise
non-conjugate complex roots of f (x) is r2 = 3. The pair (r1 , r2 ) = (1, 3) is
called the signature of K.
Having defined the number field K, we now need to concentrate on its ring
of integers OK , that is, elements of K whose minimal polynomials are in Z[x]
and are monic. It turns out that all elements of OK can be expressed uniquely
as Z-linear combinations of 1, θ, θ², . . . , θ⁶, that is,
        OK = Z[θ] = {a₀ + a₁θ + · · · + a₆θ⁶ | a_i ∈ Z}.
Furthermore, this OK supports unique factorization at the level of elements,
so we do not have to worry about factorization at the level of ideals.
The second task is to choose a factor base. The factor base now consists
of two parts: small integer primes p1 , p2 , . . . , pt for checking the smoothness
of the (rational) integers aH + b, and some small elements of OK for checking
the smoothness of the algebraic integers aθ + b. The so-called small elements
of OK are some small primes in OK and some units in OK .
In order to understand how small prime elements are chosen from OK , we
need the concept of norms. Let α = α(θ) be an element of K. Let θ₁, θ₂, . . . , θ_d
be all the roots of f(x) in C. The norm of α is defined as

        N(α) = ∏_{i=1}^{d} α(θ_i).
But 33 = 3 × 11 is smooth (over a factor base containing at least the first five
rational primes), that is, a relation is generated. ¤
The NFSM deploys two sieves, one for filtering the smooth integers aH +b,
and the other for filtering the smooth algebraic integers aθ + b. Both these
candidates are subexponential expressions in log n. For QSM, the smoothness
candidates are exponential in log n. This results in the superior asymptotic
performance of the NFSM over the QSM. In practice, this asymptotic superi-
ority shows up for input integers of size at least several hundred bits.
Exercises
1. Let ω and c be constants with 0 < ω < 1 and c > 0, and
        L_n(ω, c) = exp[(c + o(1)) (ln n)^ω (ln ln n)^{1−ω}].
(a) Show that if two odd integers m, n can be written as sums of two squares,
then their product mn can also be so written.
(b) Prove that no n ≡ 3 (mod 4) can be written as a sum of two squares.
(c) Let a square-free composite integer n be a product of (distinct) primes
each congruent to 1 modulo 4. Show that n can be written as a sum of two
squares in (at least) two different ways.
(d) Let n be as in Part (c). Suppose that we know two ways of expressing n
as a sum of two squares. Describe how n can be factored easily.
9. Prove that an integer of the form 4^e (8k + 7) (with e, k ≥ 0) cannot be written
as a sum of three squares. (Remark: The converse of this statement is also
true, but is somewhat difficult to prove.)
10. Prove that Carmichael numbers are (probabilistically) easy to factor.
11. (a) Suppose that in Dixon’s method for factoring n, we first choose a non-zero
z ∈ Z_n, and generate relations of the form
        x_i² z ≡ p_1^{α_{i1}} p_2^{α_{i2}} · · · p_t^{α_{it}} (mod n).
Describe how these relations can be combined to arrive at a congruence of the
form x² ≡ y² (mod n).
(b) Now, choose several small values of z (like 1 ≤ z ≤ M for a small bound
M). Describe how you can still generate a congruence x² ≡ y² (mod n). What,
if anything, do you gain by using this strategy (over Dixon’s original method)?
12. Dixon’s method for factoring an integer n can be combined with a sieve in
order to reduce its running time to L[3/2]. Instead of choosing random values
of x_1, x_2, . . . , x_s in the relations, we first choose a random value of x, and for
−M ≤ c ≤ M, we check the smoothness of the integers (x + c)² (mod n) over t
small primes p_1, p_2, . . . , p_t. As in Dixon’s original method, we take t = L[1/2].
(a) Determine M for which one expects to get a system of the desired size.
(b) Describe a sieve over the interval [−M, M] for detecting the smooth values
of (x + c)² (mod n).
(c) Deduce how you achieve a running time of L[3/2] using this sieve.
13. Dixon’s method for factoring an integer n can be combined with another
sieving idea. We predetermine a factor base B = {p_1, p_2, . . . , p_t}, and a sieving
interval [−M, M]. Suppose that we compute u ≡ z² (mod n) for a randomly
chosen non-zero z ∈ Z_n.
(a) Describe a sieve to identify all B-smooth values of u + cn for −M ≤ c ≤ M.
Each such B-smooth value yields a relation of the form
        z² ≡ p_1^{α_{i1}} p_2^{α_{i2}} · · · p_t^{α_{it}} (mod n).
(b) How can these relations be combined to get a congruence x² ≡ y² (mod n)?
(c) Supply optimal choices for t and M . What is the running time of this
variant of Dixon’s method for these choices of t and M ?
(d) Compare the sieve of this exercise with that of Exercise 6.12.
14. Let α_{ij} be the exponent of the j-th small prime p_j in the i-th relation collected
in Dixon’s method. We find vectors in the null space of the t × s matrix A =
(α_{ji}). In the linear-algebra phase, it suffices to know the value of α_{ij} modulo
2. Assume that for a small prime p and a small exponent h, a random square
x² (mod n) is divisible by p^h with probability 1/p^h (irrespective of whether
x² (mod n) is B-smooth or not). Calculate the probability that α_{ij} ≡ 1 (mod 2).
(This probability would be a function of the prime p_j. This probability
calculation applies to other subexponential
factoring methods like CFRAC, QSM and CSM.)
15. [Fermat’s factorization method] Let n be an odd positive composite integer
which is not a perfect square, and H = ⌈√n⌉.
(a) Prove that there exists c ≥ 0 such that (H + c)² − n is a perfect square
b² with H + c ≢ ±b (mod n).
(b) If we keep on trying c = 0, 1, 2, . . . until (H + c)² − n is a perfect square,
we obtain an algorithm to factor n. What is its worst-case complexity?
(c) Prove that if n has a factor u satisfying √n − u < n^{1/4}, then H² − n is
itself a perfect square.
16. Suppose that we want to factor n = 3337 using the quadratic sieve method.
(a) Determine H and J, and write the expression for T (c).
(b) Let the factor base B be a suitable subset of {−1, 2, 3, 5, 7, 11}. Find all
B-smooth values of T(c) for −5 ≤ c ≤ 5. You do not have to use a sieve. Find
the smooth values by trial division only.
17. (a) Explain how sieving is carried out in the multiple-polynomial quadratic
sieve method, that is, for T(c) = U + 2Vc + Wc² with V² − UW = n.
(b) If the factor base consists of L[1/2] primes and the sieving interval is of
size L[1], deduce that the sieving process can be completed in L[1] time.
18. In the original QSM, we sieve around √n. Let us instead take H = ⌈√(2n)⌉,
and J = H² − 2n.
(a) Describe how we can modify the original QSM to work for these values of
H and J. (It suffices to describe how we get a relation in the modified QSM.)
(b) Explain why the modified QSM is poorer than the original QSM. (Hint:
Look at the approximate average value of |T (c)|.)
(c) Despite the objection in Part (b) about the modified QSM, we can exploit
it to our advantage. Suppose that we run two sieves: one around √n (the
original QSM), and the other around √(2n) (the modified QSM), each on a
sieving interval of length half of that for the original QSM. Justify why this
reduction in the length of the sieving interval is acceptable. Discuss what we
gain by using the dual sieve.
19. In the original QSM, we took T(c) = (H + c)² − n = J + 2cH + c². Instead,
one may choose c₁, c₂ satisfying −M ≤ c₁ ≤ c₂ ≤ M, and consider T(c₁, c₂) =
(H + c₁)(H + c₂) − n = J + (c₁ + c₂)H + c₁c₂.
(a) Describe how we get a relation in this variant of the QSM.
(b) Prove that if we choose t = L[1/2] primes in the factor base and M =
L[1/2], we expect to obtain the required number of relations.
(c) Describe a sieving procedure for this variant of the QSM.
(d) Argue that this variant can be implemented to run in L[1] time.
(e) What are the advantages and disadvantages of this variant of the QSM
over the original QSM?
20. [Special-q variant of QSM ] In the original QSM, we sieve the quantities
T(c) = (H + c)² − n for −M ≤ c ≤ M. For small values of |c|, the values
|T (c)| are small and are likely to be smooth. On the contrary, larger values of
c in the sieving interval yield larger values of |T (c)| resulting in poorer yields
of smooth candidates. In Exercise 6.18, this problem is tackled by using a dual
sieve. The MPQSM is another solution. We now study yet another variant.15
In this exercise, we study this variant for large primes only. See Exercise 6.29
for a potential speedup.
(a) Let q be a large prime (B < q < B²) and c₀ a small integer such that
q | T(c₀). Describe how we can locate such q and c₀ relatively easily.
(b) Let T_q(c) = T(c₀ + cq)/q. How can you sieve T_q(c) for −M ≤ c ≤ M?
(c) What do you gain by using this special-q variant of the QSM?
21. Describe a sieve for locating all the smooth values of T (a, b, c) in the CSM.
22. Show that the total number of solutions of the congruence x³ ≡ y²z (mod n)
with x³ ≠ y²z is Θ(n²). You may use the formula that Σ_{1≤m≤n} d(m) =
Θ(n ln n), where d(m) denotes the number of positive integral divisors of m.
23. Describe a special-q method for the CSM. What do you gain, if anything, by
using this special-q variant of the CSM?
24. Show that in the ECM, we can maintain the multiples of P as pairs of ratio-
nal numbers. Describe what modifications are necessary in the ECM for this
representation. What do you gain from this?
25. [Montgomery ladder ]16 You want to compute nP for a point P on the curve
Y² = X³ + aX + b. Let n = (n_{s−1} n_{s−2} . . . n_1 n_0)_2 and N_i = (n_{s−1} n_{s−2} . . . n_i)_2.
(a) Rewrite the left-to-right double-and-add algorithm so that both Ni P and
(Ni + 1)P are computed in the loop.
(b) Prove that given only the X-coordinates of P1 , P2 and P1 − P2 , we can
compute the X-coordinate of P1 + P2 . Handle the case P1 = P2 too.
(c) What implication does this have in the ECM?
26. How can a second stage (as in Pollard’s p − 1 method) be added to the ECM?
27. Investigate how the integer primes 13, 17, 19, 23 behave in the number ring
OK of Example 6.23.
28. [Lattice sieve] Pollard17 introduces the concept of lattice sieves in connection
with the NFSM. Let B be a bound of small primes in the factor base. One
finds out small coprime pairs a, b such that both a + bm (a rational integer)
and a + bθ (an algebraic integer) are B-smooth. The usual way of sieving fixes
15 James A. Davis and Diane B. Holdridge, Factorization using the quadratic sieve algo-
49, 1993.
a, and lets b vary over an interval. This is denoted by line sieving. In the rest
of this exercise, we restrict our attention to the rational sieve only.
We use a bound B′ < B. The value k = B′/B lies in the range [0.1, 0.5]. All
primes ≤ B′ are called small primes. All primes p in the range B′ < p ≤ B are
called medium primes. Assume that no medium prime divides m. First, fix a
medium prime q, and consider only those pairs (a, b) with a+bm ≡ 0 (mod q).
Sieve using all primes p < q. This sieve is repeated for all medium primes q.
Let us see the effects of this sieving technique.
(a) Let N be the number of (a, b) pairs for which a + bm is checked for
smoothness in the line sieve, and N ′ the same number for the lattice sieve.
Show that N′/N ≈ log(1/k)/ log B. What is N′/N for k = 0.25 and B = 10⁶?
(b) What smooth candidates are missed in the lattice sieve? Find their relative
percentage in the set of smooth integers located in the line sieve, for the values
k = 0.25, B = 10⁶, m = 10³⁰, and for b varying in the range 0 ≤ b ≤ 10⁶.
These real-life figures demonstrate that with significantly reduced efforts, one
can obtain most of the relations.
(c) Show that all integer solutions (a, b) of a + bm ≡ 0 (mod q) form a two-
dimensional lattice. Let V1 = (a1 , b1 ) and V2 = (a2 , b2 ) constitute a reduced
basis of this lattice.
(d) A solution (a, b) of a + bm ≡ 0 (mod q) can be written as (a, b) = cV1 +
dV2 = (ca1 + da2 , cb1 + db2 ). Instead of letting a vary from −M to M and b
from 1 to M , Pollard suggests letting c vary from −C to C and d from 1 to D.
This is somewhat ad hoc, since rectangular regions in the (a, b) plane do not,
in general, correspond to rectangular regions in the (c, d) plane. Nonetheless,
this is not a practically bad idea. Describe how sieving can be done in the
chosen rectangle for (c, d).
29. Describe how the idea of using small and medium primes, introduced in Exer-
cise 6.28, can be adapted to the case of the QSM. Also highlight the expected
benefits. Note that this is the special-q variant of the QSM with medium
special primes q instead of large special primes as discussed in Exercise 6.20.
Programming Exercises
Implement the following in GP/PARI.
30. Floyd’s variant of Pollard’s rho method.
31. Brent’s variant of Pollard’s rho method.
32. Pollard’s p − 1 method.
33. The second stage of Pollard’s p − 1 method.
34. Fermat’s factorization method (Exercise 6.15).
35. Dixon’s method.
36. The relation-collection stage of the QSM (use trial division instead of a sieve).
37. The sieve of the QSM.
38. Collecting relations involving large primes from the sieve of Exercise 6.37.
Chapter 7
Discrete Logarithms
Let G be a finite group1 of size n. To start with, assume that G is cyclic, and
g is a generator of G. Any element a ∈ G can be uniquely expressed as a = g^x
for some integer x in the range 0 ≤ x ≤ n − 1. The integer x is called the
discrete logarithm or index of a with respect to g, and is denoted by indg a.
Computing x from G, g and a is called the discrete logarithm problem (DLP).
We now remove the assumption that G is cyclic. Let g ∈ G have ord(g) =
m, and let H be the subgroup of G generated by g. H is cyclic of order m. We
are given an element a ∈ G. If a ∈ H, then a = g^x for some unique integer x
in the range 0 ≤ x ≤ m − 1. On the other hand, if a ∉ H, then a cannot be
ement, then computing indices with respect to that or any other primitive
element is rather trivial. However, we have argued in Section 2.5 that this
representation is not practical except only for small fields.
A related computational problem is called the Diffie–Hellman problem
(DHP) that came to light after the seminal discovery of public-key cryptog-
raphy by Diffie and Hellman in 1976. Consider a multiplicative group G with
g ∈ G. Suppose that the group elements g x and g y are given to us for some
unknown indices x and y. Computation of g xy from the knowledge of G, g, g x
and g y is called the Diffie–Hellman problem in G. Evidently, if the DLP in G
is easy to solve, the DHP in G is easy too (g^{xy} = (g^x)^y with y = ind_g(g^y)).
The converse implication is not clear. It is again only a popular belief that
solving the DHP in G is computationally as difficult as solving the DLP in G.
In most of this chapter, I concentrate on algorithms for solving the discrete-
logarithm problem in finite fields. I start with some square-root methods that
are applicable to any group (including elliptic-curve groups). Later, I focus
on two practically important cases: the prime fields Fp , and the binary fields
F2n . Subexponential algorithms, collectively called index calculus methods, are
discussed for these two types of fields. The DLP in extension fields Fpn of odd
characteristics p is less studied, and not many significant results are known for
these fields, particularly when both p and n are allowed to grow indefinitely.
At the end, the elliptic-curve discrete-logarithm problem is briefly addressed.
GP/PARI supports computation of discrete logarithms in prime fields. One
should call znlog(a,g), where g is a primitive element of Z∗p for some prime p.
gp > p = nextprime(10000)
%1 = 10007
gp > g = Mod(5,p)
%2 = Mod(5, 10007)
gp > znorder(g)
%3 = 10006
gp > a = Mod(5678,p)
%4 = Mod(5678, 10007)
gp > znlog(a,g)
%5 = 8620
gp > g^8620
%6 = Mod(5678, 10007)
gp > znlog(Mod(0,p),g)
*** impossible inverse modulo: Mod(0, 10007).
gorithms for computing discrete logarithms in all groups. For example, the
fastest known algorithms for solving the ECDLP for general elliptic curves
are the square-root methods. It is, therefore, quite important to understand
the tales from the dark age. In this section, we assume that G is a finite cyclic
multiplicative group of size n, and g ∈ G is a generator of G. We are interested
in computing indg a for some a ∈ G.
Example 7.1 Let me illustrate the BSGS method for the group G = F*_97
with generator g = 23. Since n = |G| = 96, we have m = ⌈√n⌉ = 10. The
table of baby steps contains (i, g^i) for i = 0, 1, 2, . . . , 9, and is given below.
The table is kept sorted with respect to the second element (g^i).

     i    0   5   8   6   1   7   3   2   9   4
   g^i    1   5  16  18  23  26  42  44  77  93
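A full BSGS run in this group takes only a few lines of GP/PARI. The sketch below is not the book's code, and the target a = 11 is just an illustrative choice (the example's own target is not shown in this excerpt).

    p = 97; g = Mod(23, p); n = p - 1;
    m = ceil(sqrt(n));                             \\ m = 10
    baby = Map();
    for (i = 0, m-1, mapput(baby, lift(g^i), i));  \\ baby steps (i, g^i)
    a = Mod(11, p); gm = g^(-m); x = -1;           \\ giant-step factor g^(-m)
    {
      for (j = 0, m-1,
        t = lift(a * gm^j);
        if (mapisdefined(baby, t), x = j*m + mapget(baby, t); break));
    }
    print(x);            \\ an exponent with g^x = a
    print(g^x == a);     \\ 1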
These index calculations are done in the subgroup of size p, generated by g^{n/p}.
S. C. Pohlig and M. E. Hellman, An improved algorithm for computing logarithms
over GF(p) and its cryptographic significance, IEEE Transactions on Information Theory,
24, 106–110, 1978. This algorithm seems to have been first discovered (but not published)
by Roland Silver, and is often referred to also as the Silver–Pohlig–Hellman method.
we compute x rem 4 = x₀ + 2x₁ and x rem 49 = x′₀ + 7x′₁. The following table
illustrates these calculations.

                        p = 2                                            p = 7
   i   g^{n/2}   λ   (a g^{−λ})^{n/2^{i+1}}   x_i      i   g^{n/7}   λ   (a g^{−λ})^{n/7^{i+1}}   x′_i
   0     196     0            196              1       0     164     0            178              4
   1     196     1            196              1       1     164     4             36              5
These calculations yield x ≡ 1 + 2 × 1 ≡ 3 (mod 4) and x ≡ 4 + 7 × 5 ≡
39 (mod 49). Combining using the CRT gives x ≡ 39 (mod 196). ¤
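The final CRT combination can be reproduced directly in GP/PARI (a one-line check, not part of the book's exposition):

    print(chinese(Mod(3, 4), Mod(39, 49)));   \\ Mod(39, 196)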
This yields
        α + β ind_g a ≡ Σ_{i=1}^{t} γ_i ind_g(b_i) (mod p − 1).

        Σ_{i=1}^{t} γ_i ind_g(b_i) ≡ α (mod p − 1),
Different index calculus methods vary in the way the factor base is chosen
and the relations are generated. In the rest of this section, we discuss some
variants. All these variants have running times of the form
        L(p, ω, c) = exp[(c + o(1)) (ln p)^ω (ln ln p)^{1−ω}]
for real constant values c and ω with c > 0 and 0 < ω < 1. If ω = 1/2, we
abbreviate L(p, ω, c) as Lp [c], and even as L[c] if p is clear from the context.
        γ_{11} ind_g(p_1) + γ_{12} ind_g(p_2) + · · · + γ_{1t} ind_g(p_t) ≡ α_1 (mod p − 1),
        γ_{21} ind_g(p_1) + γ_{22} ind_g(p_2) + · · · + γ_{2t} ind_g(p_t) ≡ α_2 (mod p − 1),
        · · ·
        γ_{s1} ind_g(p_1) + γ_{s2} ind_g(p_2) + · · · + γ_{st} ind_g(p_t) ≡ α_s (mod p − 1).
9 Historically, the basic index calculus method came earlier than Dixon’s method.
We obtain indg a using the values of indg (pi ) computed in the first stage.
Example 7.5 Take p = 821 and g = 21. We intend to compute the discrete
logarithm of a = 237 to the base g by the basic index calculus method. We
take the factor base B = {2, 3, 5, 7, 11} consisting of the first t = 5 primes.
In the first stage, we compute g^j (mod p) for randomly chosen values of j.
After many choices, we come up with the following ten relations.
        g^815 ≡  90 ≡ 2 × 3² × 5    (mod 821)
        g^784 ≡ 726 ≡ 2 × 3 × 11²   (mod 821)
        g^339 ≡ 126 ≡ 2 × 3² × 7    (mod 821)
        g^639 ≡ 189 ≡ 3³ × 7        (mod 821)
        g^280 ≡  88 ≡ 2³ × 11       (mod 821)
        g^295 ≡ 135 ≡ 3³ × 5        (mod 821)
        g^793 ≡ 375 ≡ 3 × 5³        (mod 821)
        g^478 ≡ 315 ≡ 3² × 5 × 7    (mod 821)
        g^159 ≡ 105 ≡ 3 × 5 × 7     (mod 821)
        g^635 ≡  75 ≡ 3 × 5²        (mod 821)
The corresponding system of linear congruences is as follows.

    [ 1 2 1 0 0 ]                    [ 815 ]
    [ 1 1 0 0 2 ]                    [ 784 ]
    [ 1 2 0 1 0 ]   [ ind_g(2)  ]    [ 339 ]
    [ 0 3 0 1 0 ]   [ ind_g(3)  ]    [ 639 ]
    [ 3 0 0 0 1 ]   [ ind_g(5)  ] ≡  [ 280 ]   (mod 820).
    [ 0 3 1 0 0 ]   [ ind_g(7)  ]    [ 295 ]
    [ 0 1 3 0 0 ]   [ ind_g(11) ]    [ 793 ]
    [ 0 2 1 1 0 ]                    [ 478 ]
    [ 0 1 1 1 0 ]                    [ 159 ]
    [ 0 1 2 0 0 ]                    [ 635 ]
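As a quick sanity check (not part of the book's text), the first relation above can be verified in GP/PARI:

    p = 821; g = Mod(21, p);
    print(lift(g^815));      \\ 90
    print(factor(90));       \\ [2 1; 3 2; 5 1], i.e., 90 = 2 * 3^2 * 5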
To optimize the running time of the basic index calculus method, we resort
to the density estimate of smooth integers, given in Section 6.4. In particular,
we use Corollary 6.14 with n = p.
Let the factor base B consist of all primes ≤ L[η], so t too is of the form
L[η]. For randomly chosen values of α, the elements g^α ∈ F*_p are random
integers between 1 and p − 1, that is, integers of value O(p). The probability
that such a value is smooth with respect to B is L[−1/(2η)]. Therefore, L[1/(2η)]
random choices of α are expected to yield a single relation. We need s > 2t
relations, that is, s is again of the form L[η]. Thus, the total number of random
values of α, that need to be tried, is L[η + 1/(2η)]. The most significant effort
associated with each choice of α is the attempt to factor g^α. This is carried
out by trial divisions by L[η] primes in the factor base. To sum up, the relation-
collection stage runs in L[2η + 1/(2η)] time. The quantity 2η + 1/(2η) is minimized
for η = 1/2, leading to a running time of L[2] for the relation-collection stage.
In the linear-algebra stage, a system of L[1/2] linear congruences in L[1/2]
variables is solved. Standard Gaussian elimination requires a time of L[1/2]³ =
L[3/2] for solving the system. However, since each relation obtained by this
method is necessarily sparse, special sparse system solvers can be employed
to run in only L[1/2]² = L[1] time. In any case, the first stage of the basic index
calculus method can be arranged to run in a total time of L[2].
The second stage involves finding a single smooth value of a g^α. We need
to try L[1/(2η)] = L[1] random values of α, with each value requiring L[1/2] trial
divisions. Thus, the total time required for the second stage is L[1 + 1/2] = L[3/2].
The running time of the basic index calculus method is dominated by the
relation-collection phase and is L[2]. The space requirement is L[η], that is,
L[1/2] (assuming that we use a sparse representation of the coefficient matrix).
        (H + c₁)(H + c₂) ≡ p_1^{α_1} p_2^{α_2} · · · p_t^{α_t} (mod p), that is,
  c₁   c₂   T(c₁, c₂)                      c₁   c₂   T(c₁, c₂)
  −7    4    −99 = (−1) × 3² × 11          −3    3      1 = 1
  −6    2   −110 = (−1) × 2 × 5 × 11       −3    4     25 = 5²
  −6    7     −5 = (−1) × 5                −3    5     49 = 7²
  −5   −1   −147 = (−1) × 3 × 7²           −2    0    −44 = (−1) × 2² × 11
  −5    0   −125 = (−1) × 5³               −2    2      6 = 2 × 3
  −5    2    −81 = (−1) × 3⁴               −2    4     56 = 2³ × 7
  −5    5    −15 = (−1) × 3 × 5            −2    5     81 = 3⁴
  −5    6      7 = 7                       −1    1      9 = 3²
  −4   −2   −144 = (−1) × 2⁴ × 3²          −1    2     35 = 5 × 7
  −4   −1   −121 = (−1) × 11²              −1    7    165 = 3 × 5 × 11
  −4    0    −98 = (−1) × 2 × 7²            0    0     10 = 2 × 5
  −4    1    −75 = (−1) × 3 × 5²            0    2     64 = 2⁶
  −4    4     −6 = (−1) × 2 × 3             1    3    121 = 11²
  −4    6     40 = 2³ × 5                   2    4    180 = 2² × 3² × 5
  −4    7     63 = 3² × 7                   4    4    242 = 2 × 11²
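The table can be regenerated by brute force. The GP/PARI sketch below is not the book's code; it uses p = 719 (the prime of Examples 7.6 and 7.7), giving H = ⌈√p⌉ = 27 and J = H² − p = 10, and takes M = 7 and the five factor base primes visible in the factorizations above.

    p = 719; H = ceil(sqrt(p)); J = H^2 - p;   \\ H = 27, J = 10
    B = [2, 3, 5, 7, 11]; M = 7;
    issmooth(t) =
    {
      t = abs(t);
      if (t == 0, return(0));
      for (i = 1, #B, while (t % B[i] == 0, t = t / B[i]));
      return(t == 1);
    }
    {
      for (c1 = -M, M,
        for (c2 = c1, M,
          t = J + (c1 + c2)*H + c1*c2;
          if (issmooth(t), print([c1, c2, t]))));
    }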
7.2.2.2 Sieving
Let me now explain how sieving can be carried out to generate the relations
of the linear sieve method. First, fix a value of c₁ in the range −M ≤ c₁ ≤ M,
and allow c₂ to vary in the interval [c₁, M]. Let q be a small prime in the
factor base (q = p_i for 1 ≤ i ≤ t), and h a small positive exponent. We need
to determine all the values of c₂ for which q^h | T(c₁, c₂) (for the fixed choice of
c₁). The condition T(c₁, c₂) ≡ 0 (mod q^h) gives the linear congruence in c₂:
where p_i are the primes in the factor base B, and q_j are primes ≤ L[2]
not in the factor base. We then have

        ind_g a ≡ −α + Σ_{i=1}^{t} α_i ind_g(p_i) + Σ_{j=1}^{k} β_j ind_g(q_j) (mod p − 1).
Example 7.7 Let us continue with the database of indices available from the
first stage described in Example 7.6. Suppose that we want to compute the
index of a = 123 modulo p = 719 with respect to g = 11. We first obtain the
relation a g^161 ≡ 182 ≡ 2 × 7 × 13 (mod 719), that is, ind_g a ≡ −161 + ind_g(2) +
ind_g(7) + ind_g(13) ≡ −161 + 606 + 650 + ind_g(13) ≡ 377 + ind_g(13) (mod 718).
What remains is to compute the index of q = 13.
We look for an 11-smooth integer y close to √p/13 ≈ 2.0626. We take
y = 3. Example 7.6 gives ind_g(y) = 42. (Since we are working here with an
artificially small p, the value of y turns out to be abnormally small.)
Finally, we find an 11-smooth value of (H + c)qy − p for −7 6 c 6 7.
For c = 4, we have (H + 4)qy − p = 490 = 2 × 5 × 72 , that is, indg q ≡
− indg (H + 4) − indg (y) + indg (2) + indg (5) + 2 indg (7) ≡ −304 − 42 + 606 +
364 + 2 × 650 ≡ 488 (mod 718). This gives the desired discrete logarithm
indg (123) ≡ 377 + 488 ≡ 147 (mod 718). ¤
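The final answer can be checked in GP/PARI (a small verification, not part of the book's text):

    p = 719; g = Mod(11, p);
    print(lift(g^147));                \\ 123
    print(znlog(Mod(123, p), g));      \\ 147, matching the index computed above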
    c    H + c             H + c₁   H + c₂   T(c₁, c₂)
   −6   24 = 2³ × 3           24       36    1
   −5   25 = 5²               25       32    −63 = (−1) × 3² × 7
   −3   27 = 3³               25       35    12 = 2² × 3
   −2   28 = 2² × 7           27       32    1
    0   30 = 2 × 3 × 5        27       33    28 = 2² × 7
    2   32 = 2⁵               28       32    33 = 3 × 11
    3   33 = 3 × 11
    5   35 = 5 × 7
    6   36 = 2² × 3²
The following relations are thus obtained. We use the notation x_a to stand
for ind_g a (where a ∈ B).
        (3x_2 + x_3) + (2x_2 + 2x_3) ≡ 0 (mod 862),
        (2x_5) + (5x_2) ≡ x_{−1} + 2x_3 + x_7 (mod 862),
        (2x_5) + (x_5 + x_7) ≡ 2x_2 + x_3 (mod 862),
        (3x_3) + (5x_2) ≡ 0 (mod 862),
        (3x_3) + (x_3 + x_{11}) ≡ 2x_2 + x_7 (mod 862),
        (2x_2 + x_7) + (5x_2) ≡ x_3 + x_{11} (mod 862).
Moreover, we have the free relation:
        x_5 ≡ 1 (mod 862).
We also use the fact that x_{−1} ≡ (p − 1)/2 ≡ 431 (mod 862), since g = 5 is a
primitive root of p. This gives us two solutions of the above congruences:
But 5^161 ≡ −2 (mod 863), whereas 5^592 ≡ 2 (mod 863). That is, the second
solution gives the correct values of the indices of the factor-base elements. ¤
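A quick GP/PARI check (not part of the book's text) of the index of 2 in the second solution:

    p = 863; g = Mod(5, p);
    print(lift(g^592));               \\ 2
    print(znlog(Mod(2, p), g));       \\ 592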
The first stage of the residue-list sieve method uses two sieves. The first
one is used to locate all the smooth values of H + c. Since c ranges over L[1]
values between −M and M , this sieve takes a running time of the form L[1].
In the second sieve, one combines pairs of smooth values of H + c obtained
from the first sieve, and identifies the smooth values of T(c₁, c₂). But H + c
itself ranges over L[1] values (although there are only L[1/2] smooth values
among them). In order that the second sieve too can be completed in L[1] time,
we, therefore, need to adopt some special tricks. For each small prime power
q^h, we maintain a list of smooth H + c values obtained from the first sieve.
This list should be kept sorted with respect to the residues (H + c) rem q^h.
The name residue-list sieve method is attributed to these lists. Since there are
L[1/2] prime powers q^h, and there are L[1/2] smooth values of H + c, the total
storage requirement for all the residue lists is L[1].
For determining the smoothness of T(c₁, c₂) = (H + c₁)(H + c₂) − p, one
fixes c₁, and lets c₂ vary in the interval c₁ ≤ c₂ ≤ M. For each small prime
power q^h, one calculates (H + c) rem q^h and p rem q^h. One then solves for the
value(s) of (H + c₂) (mod q^h) from the congruence T(c₁, c₂) ≡ 0 (mod q^h).
For each solution χ, one consults the residue list for q^h to locate all the values
of c₂ for which (H + c₂) rem q^h = χ. Since the residue list is kept sorted with
respect to the residue values modulo q^h, binary search can quickly identify
the desired values of c₂, leading to a running time of L[1] for the second sieve.
The resulting sparse system with L[1/2] congruences in L[1/2] variables can
be solved in L[1] time. The second stage of the residue-list sieve method is
identical to the second stage of the linear sieve method, and can be performed
in L[1/2] time for each individual logarithm. The second stage involves a sieve
in the third step, which calls for the residue lists available from the first stage.
Let us now make a comparative study between the performances of the
linear sieve method and the residue-list sieve method. The residue-list sieve
method does not include any H + c value in the factor base. As a result, the
size of the factor base is smaller than that in the linear sieve method. How-
ever, maintaining the residue lists calls for a storage of size L[1]. This storage
is permanent in the sense that the second stage (individual logarithm calcula-
tion) requires these lists. For the linear sieve method, on the other hand, the
permanent storage requirement is only L[1/2]. Moreover, the (hidden) o(1) term
in the exponent of the running time is higher in the residue-list sieve method
than in the linear sieve method. In view of these difficulties, the residue-list
sieve method turns out to be less practical than the linear sieve method.
(c1 , c2 ) pairs, for which T (c1 , c2 ) is smooth, leads to relations for the Gaussian
integer method, that is, we get L[1/2] relations in L[1/2] variables, as desired.
Example 7.9 Let us compute discrete logarithms modulo the prime p = 997
to the primitive base g = 7. We have p ≡ 1 (mod 4), so −1 is a quadratic
residue modulo p, and we may take r = 1. But then, we use the ring Z[i]
of Gaussian integers (this justifies the name of this algorithm). We express
p = u² + v² with u = 31 and v = 6, and take s ≡ −v⁻¹u ≡ 161 (mod p) as
the modular square root of −1, satisfying u + vs ≡ 0 (mod p).
We take t = 6 small rational primes, that is, B₁ = {2, 3, 5, 7, 11, 13}. A
set of pairwise non-associate complex primes a + bi with a² + b² ≤ 13 is
B2 = {1 + i, 2 + i, 2 − i, 2 + 3i, 2 − 3i}. The factor base is, therefore, given by
B = {−1, 2, 3, 5, 7, 11, 13, v, Φ(1 + i), Φ(2 + i), Φ(2 − i), Φ(2 + 3i), Φ(2 − 3i)}
= {−1, 2, 3, 5, 7, 11, 13, 6, 1 + s, 2 + s, 2 − s, 2 + 3s, 2 − 3s}
= {−1, 2, 3, 5, 7, 11, 13, 6, 162, 163, −159, 485, −481}
= {−1, 2, 3, 5, 7, 11, 13, 6, 162, 163, 838, 485, 516}.
The prime integers 3, 7 and 11 remain prime in Z[i]. We take (c1 , c2 ) pairs with
gcd(c1 , c2 ) = 1, so these primes do not occur in the factorization of c2 − c1 i.
The units in Z[i] are ±1, ±i. We have
indg (Φ(1)) = indg (1) = 0, and
indg (Φ(−1)) = indg (−1) = (p − 1)/2 = 498.
Moreover, Φ(i) = s and Φ(−i) = −s. One of ±s has index (p − 1)/4, and the
other has index 3(p − 1)/4. In this case, we have
indg (Φ(i)) = indg (s) = (p − 1)/4 = 249, and
indg (Φ(−i)) = indg (−s) = 3(p − 1)/4 = 747.
Let us take M = 5, that is, we check all c1 , c2 values between −5 and 5
with gcd(c1 , c2 ) = 1. There are 78 such pairs. For 37 of these pairs, the integer
T (c1 , c2 ) = c1 u + c2 v is smooth with respect to B1 . Among these, 23 yield
smooth values of c2 − c1 i with respect to B2 (see the table on the next page).
Let us now see how such a factorization leads to a relation. Consider c₁ =
−3 and c₂ = −4. In this case, T(c₁, c₂) = (−1) × 3² × 13, and c₂ − c₁i =
i(2 + i)². This gives ind_g(−1) + 2 ind_g(3) + ind_g(13) ≡ ind_g(v) + ind_g(Φ(i)) +
2 ind_g(Φ(2 + i)) (mod p − 1), that is, 498 + 2 ind_g(3) + ind_g(13) ≡ ind_g(6) +
249 + 2 ind_g(163) (mod 996). The reader is urged to convert the other 22
relations and solve the resulting system of congruences. ¤
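The congruence underlying this relation, namely c₁u + c₂v ≡ v·(c₂ − c₁s) (mod p) (a consequence of u + vs ≡ 0 (mod p), with s playing the role of i), can be checked numerically in GP/PARI (a small verification, not part of the book's text):

    p = 997; u = 31; v = 6; s = 161;
    c1 = -3; c2 = -4;
    print((c1*u + c2*v) % p);        \\ 880
    print((v*(c2 - c1*s)) % p);      \\ 880, the same residue modulo p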
  c₁   c₂   T(c₁, c₂) = c₁u + c₂v        c₂ − c₁i
  −3   −4   −117 = (−1) × 3² × 13        −4 + 3i = (i) × (2 + i)²
  −3   −2   −105 = (−1) × 3 × 5 × 7      −2 + 3i = (−1) × (2 − 3i)
  −3   −1    −99 = (−1) × 3² × 11        −1 + 3i = (i) × (1 + i) × (2 − i)
  −3    2    −81 = (−1) × 3⁴              2 + 3i = 2 + 3i
  −2   −3    −80 = (−1) × 2⁴ × 5         −3 + 2i = (i) × (2 + 3i)
  −2    1    −56 = (−1) × 2³ × 7          1 + 2i = (i) × (2 − i)
  −2    3    −44 = (−1) × 2² × 11         3 + 2i = (i) × (2 − 3i)
  −1   −3    −49 = (−1) × 7²             −3 + i = (i) × (1 + i) × (2 + i)
  −1    1    −25 = (−1) × 5²              1 + i = 1 + i
  −1    3    −13 = (−1) × 13              3 + i = (1 + i) × (2 − i)
  −1    5     −1 = (−1)                   5 + i = (−i) × (1 + i) × (2 + 3i)
   0    1      6 = 2 × 3                  1 = 1
   1   −5      1 = 1                     −5 − i = (i) × (1 + i) × (2 + 3i)
   1   −3     13 = 13                    −3 − i = (−1) × (1 + i) × (2 − i)
   1   −1     25 = 5²                    −1 − i = (−1) × (1 + i)
   1    3     49 = 7²                     3 − i = (−i) × (1 + i) × (2 + i)
   2   −3     44 = 2² × 11               −3 − 2i = (−i) × (2 − 3i)
   2   −1     56 = 2³ × 7                −1 − 2i = (−i) × (2 − i)
   2    3     80 = 2⁴ × 5                 3 − 2i = (−i) × (2 + 3i)
   3   −2     81 = 3⁴                    −2 − 3i = (−1) × (2 + 3i)
   3    1     99 = 3² × 11                1 − 3i = (−i) × (1 + i) × (2 − i)
   3    2    105 = 3 × 5 × 7              2 − 3i = 2 − 3i
   3    4    117 = 3² × 13                4 − 3i = (−i) × (2 + i)²
Sieving to locate smooth T(c₁, c₂) values is easy. After the sieve terminates,
we throw away smooth values with gcd(c₁, c₂) > 1. If gcd(c₁, c₂) = 1 and
T(c₁, c₂) is smooth, we make trial division of c₂ − c₁√−r by the complex
primes in B₂. The entire sieving process can be completed in L[1] time.
The resulting sparse system of L[1/2] equations in L[1/2] variables can be
solved in L[1] time. The second stage of the Gaussian integer method is iden-
tical to the second stage of the linear sieve method, and can be performed in
L[1/2] time for each individual logarithm.
The size of the factor base in the Gaussian integer method is t + t′ + 2,
whereas the size of the factor base in the linear sieve method is t + 2M + 2.
Since t′ ≪ M (indeed, t′ is roughly proportional to M/ ln M ), the Gaussian
integer method gives significantly smaller systems of linear congruences than
the linear sieve method. In addition, the values of T (c1 , c2 ) are somewhat
smaller in the Gaussian integer method than in the linear sieve method (we
have |c₁u + c₂v| ≤ √2·M√p, whereas J + (c₁ + c₂)H + c₁c₂ ≤ 2M√p, approx-
imately). Finally, unlike the residue-list sieve method, the Gaussian integer
method is not crippled by the necessity of L[1] permanent storage. In view
of these advantages, the Gaussian integer method is practically the most pre-
ferred L[1]-time algorithm for computing discrete logarithms in prime fields.
Therefore, we take a factor base of the following form (the choice of t and M
to be explained shortly):
B = {−1} ∪ {p₁, p₂, . . . , p_t} ∪ {y²} ∪ {x + cy | −M ≤ c ≤ M}.
  c₁   c₂   c₃   T(c₁, c₂, c₃)
 −50    1   49   −345576 = (−1) × 2³ × 3 × 7 × 11² × 17
 −50   21   29   −323736 = (−1) × 2³ × 3 × 7 × 41 × 47
 −50   23   27   −323268 = (−1) × 2² × 3 × 11 × 31 × 79
 −49   15   34   −312816 = (−1) × 2⁴ × 3 × 7³ × 19
 −48    1   47   −318222 = (−1) × 2 × 3³ × 71 × 83
 −48   22   26   −295647 = (−1) × 3 × 11 × 17² × 31
 −47    9   38   −291648 = (−1) × 2⁶ × 3 × 7² × 31
 −47   11   36   −289218 = (−1) × 2 × 3 × 19 × 43 × 59
 −47   17   30   −284088 = (−1) × 2³ × 3 × 7 × 19 × 89
 ···
  −5   −5   10     −9912 = (−1) × 2³ × 3 × 7 × 59
  −5    2    3     −2688 = (−1) × 2⁷ × 3 × 7
  −4   −2    6     −3783 = (−1) × 3 × 13 × 97
  −4    0    4     −2211 = (−1) × 3 × 11 × 67
  −3   −1    4     −1770 = (−1) × 2 × 3 × 5 × 59
  −3    1    2      −972 = (−1) × 2² × 3⁵
  −2   −1    3      −948 = (−1) × 2² × 3 × 79
  −2    1    1      −408 = (−1) × 2³ × 3 × 17
  −1   −1    2      −400 = (−1) × 2⁴ × 5²
  −1    0    1      −126 = (−1) × 2 × 3² × 7
   0    0    0        13 = 13
Let me now specify the parameters t and M. Suppose that x, y, z are each
O(p^ξ). We take t as the number of primes ≤ L[√(ξ/2)]. By the prime number
theorem, t is again of the form L[√(ξ/2)]. We also take M = L[√(ξ/2)]. There
are Θ(M²) = L[√(2ξ)] triples (c₁, c₂, c₃) with −M ≤ c₁ ≤ c₂ ≤ c₃ ≤ M and
c₁ + c₂ + c₃ = 0. Each T(c₁, c₂, c₃) is of value O(p^ξ) and has a probability
L[−ξ/(2√(ξ/2))] = L[−√(ξ/2)] of being smooth with respect to the t primes in the
factor base. Thus, the expected number of relations for all choices of (c₁, c₂, c₃)
is L[√(ξ/2)]. The size of the factor base is t + 2M + 3, which is also L[√(ξ/2)].
In order to locate the smooth values of T(c₁, c₂, c₃), we express T(c₁, c₂, c₃)
as a function of c₁ and c₂ alone. The conditions −M ≤ c₁ ≤ c₂ ≤ c₃ ≤ M and
c₁ + c₂ + c₃ = 0 imply that c₁ varies between −M and 0, and c₂ varies between
max(c₁, −(M + c₁)) and −c₁/2 for a fixed c₁. We fix c₁, and let c₂ vary in this
allowed range. For a small prime power q^h, we solve T(c₁, c₂, c₃) ≡ 0 (mod q^h)
for c₂, taking c₁ as constant and c₃ = −(c₁ + c₂). This calls for solving a
quadratic congruence in c₂. The details are left to the reader (Exercise 7.17).
The sieving process can be completed in L[2√(ξ/2)] = L[√(2ξ)] time. Solv-
ing the resulting sparse system of L[√(ξ/2)] linear congruences in L[√(ξ/2)]
variables also takes the same time. To sum up, the first stage of the cubic
sieve method can be so implemented as to run in L[√(2ξ)] time. The space
requirement is L[√(ξ/2)]. If ξ = 1/3, the running time is L[√(2/3)] ≈ L[0.816],
and the space requirement is L[√(1/6)] ≈ L[0.408].
The second stage of the cubic sieve method is costlier than those for the
L[1] methods discussed earlier. The trouble now is that the first stage supplies
a smaller database of discrete logarithms (L[√(ξ/2)] compared to L[1/2]). This
indicates that we need to perform more than L[1/2] work in order to compute
individual logarithms. Here is a strategy that runs in L[√(2ξ)] time.
Asymptotically, the cubic sieve method is faster than the L[1]-time al-
gorithms discussed earlier. However, the quadratic and cubic terms in the
subexponential values c1 , c2 , c3 make the values of T (c1 , c2 , c3 ) rather large
compared to pξ . As a result, the theoretically better performance of the cubic
sieve method does not show up unless the bit size of p is rather large. Practical
experiences suggest that for bit sizes > 200, the cubic sieve method achieves
some speedup over the L[1] algorithms. On the other hand, the number-field
sieve method takes over for bit sizes > 300. To sum up, the cubic sieve method
Discrete Logarithms 369
seems to be a good choice only for a narrow band (200–300) of bit sizes. An-
other problem with the cubic sieve method is that its second stage is as slow as
its first stage, and much slower than the second stages of the L[1] algorithms.
The biggest trouble attached to the cubic sieve method is that we do not
know how to compute x, y, z as small as possible satisfying x³ ≡ y²z (mod p)
and x³ ≠ y²z. A solution is naturally available only for some special primes
p. General applicability of the cubic sieve method thus remains unclear.
Theorem 7.11 A non-zero polynomial in F₂[x] of degree k has all its irre-
ducible factors with degrees ≤ m with probability

        p(k, m) = exp((−1 + o(1)) (k/m) ln(k/m)).
or equivalently,

        g(θ)^α = ∏_{w(x)∈B} w(θ)^{γ_w} ∈ F_{2^n} = F₂(θ).
This is a linear congruence in the variables ind_{g(θ)} w(θ). Let |B| = t. We vary
α, and generate s relations with t ≤ s ≤ 2t. The resulting s × t system is
solved modulo 2^n − 1 to obtain the indices of the elements of the factor base.
The second stage computes individual logarithms using the database ob-
tained from the first stage. Suppose that we want to compute the index of
a(θ) ∈ F*_{2^n}. We pick random α, and attempt to decompose a(θ)g(θ)^α into
irreducible factors of degrees ≤ m. A successful factoring attempt gives

        a(θ)g(θ)^α = ∏_{w∈B} w(θ)^{δ_w},  that is,

        ind_{g(θ)} a(θ) ≡ −α + Σ_{w∈B} δ_w ind_{g(θ)} w(θ) (mod 2^n − 1).
Example 7.13 Let us take n = 17, and represent F_{2^17} = F₂(θ), where θ¹⁷ +
θ³ + 1 = 0. The size of F*_{2^n} is 2^n − 1 = 131071 which is a prime. Therefore,
every element of F_{2^n} other than 0, 1 is a generator of F*_{2^n}. Let us compute
discrete logarithms to the base g(θ) = θ⁷ + θ⁵ + θ³ + θ. We choose m = 4, that
is, the factor base is B = {w₁, w₂, w₃, w₄, w₅, w₆, w₇, w₈}, where
        w₁ = θ,
        w₂ = θ + 1,
        w₃ = θ² + θ + 1,
        w₄ = θ³ + θ + 1,
        w₅ = θ³ + θ² + 1,
        w₆ = θ⁴ + θ + 1,
        w₇ = θ⁴ + θ³ + θ² + θ + 1,
        w₈ = θ⁴ + θ³ + 1.
     α      g(θ)^α
  73162     θ¹⁶ + θ¹⁵ + θ¹² + θ¹¹ + θ¹⁰ + θ⁸ + θ⁶ + θ⁵ + θ⁴ + θ³ + 1
Taking logarithms of the above relations leads to the following linear system.
We use the notation d_i = ind_g(w_i).

    [ 0 0 3 2 0 1 0 0 ]   [ d₁ ]   [  73162 ]
    [ 0 2 2 1 1 0 0 1 ]   [ d₂ ]   [  87648 ]
    [ 5 4 0 1 1 0 0 0 ]   [ d₃ ]   [  18107 ]
    [ 0 7 3 1 0 0 0 0 ]   [ d₄ ]   [  31589 ]
    [ 1 0 0 3 0 0 1 0 ]   [ d₅ ] ≡ [  26426 ]   (mod 131071).
    [ 0 1 1 3 1 0 0 0 ]   [ d₆ ]   [  74443 ]
    [ 1 0 1 0 3 1 0 0 ]   [ d₇ ]   [  29190 ]
    [ 0 3 1 0 1 0 1 1 ]   [ d₈ ]   [ 109185 ]
Solving the system gives the indices of the elements of the factor base:
        d₁ ≡ ind_g(w₁) ≡ ind_g(θ) ≡ 71571
        d₂ ≡ ind_g(w₂) ≡ ind_g(θ + 1) ≡ 31762
        d₃ ≡ ind_g(w₃) ≡ ind_g(θ² + θ + 1) ≡ 5306
        d₄ ≡ ind_g(w₄) ≡ ind_g(θ³ + θ + 1) ≡ 55479
        d₅ ≡ ind_g(w₅) ≡ ind_g(θ³ + θ² + 1) ≡ 2009          (mod 131071).
        d₆ ≡ ind_g(w₆) ≡ ind_g(θ⁴ + θ + 1) ≡ 77357
        d₇ ≡ ind_g(w₇) ≡ ind_g(θ⁴ + θ³ + θ² + θ + 1) ≡ 50560
        d₈ ≡ ind_g(w₈) ≡ ind_g(θ⁴ + θ³ + 1) ≡ 87095
In the second stage, we compute the index of a(θ) = θ¹⁵ + θ⁷ + 1. For the
choice α = 3316, the element a(θ)g(θ)^α factors completely over B.
        a(θ)g(θ)^3316 = θ¹⁶ + θ¹⁴ + θ⁹ + θ⁶ + θ⁴ + θ³ + θ² + 1
                      = (θ + 1)³ (θ² + θ + 1)³ (θ³ + θ² + 1)(θ⁴ + θ³ + 1).
This gives
        ind_g a ≡ −3316 + 3d₂ + 3d₃ + d₅ + d₈
                ≡ −3316 + 3 × 31762 + 3 × 5306 + 2009 + 87095
                ≡ 65921 (mod 131071).
One can verify that (θ⁷ + θ⁵ + θ³ + θ)^65921 = θ¹⁵ + θ⁷ + 1. ¤
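The verification can be done in GP/PARI using its finite-field type (a small check, not part of the book's text):

    f = x^17 + x^3 + 1;
    t = ffgen(f * Mod(1, 2), 't);      \\ theta, a root of f over F_2
    g = t^7 + t^5 + t^3 + t;
    a = t^15 + t^7 + 1;
    print(g^65921 == a);               \\ 1
    print(fflog(a, g));                \\ 65921, recomputing the index directly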
I now deduce the optimal running time of this basic index calculus method.
In the first stage, α is chosen randomly from {1, 2, . . . , 2^n − 2}, and accordingly,
g^α is a random element of F*_{2^n}, that is, a polynomial of degree O(n). We
take m = c√(n ln n) for some positive real constant c. By Corollary 7.12, the
probability that all irreducible factors of g^α have degrees ≤ m is L[−1/(2c)], that
is, L[1/(2c)] random values of α need to be tried for obtaining a single relation.
By Corollary 3.7, the total number of irreducible polynomials in F₂[x] of
degree k is nearly 2^k/k. Therefore, the size of the factor base is t = |B| ≈
Σ_{k=1}^{m} 2^k/k. Evidently, 2^m/m ≤ t ≤ m·2^m, that is, t = exp((ln 2 + o(1))m).
Putting m = c√(n ln n) gives t = exp((c ln 2 + o(1))√(n ln n)) = L[c ln 2]. Since
s relations (with t ≤ s ≤ 2t) need to be generated, an expected number of
L[1/(2c) + c ln 2] random values of α need to be tried. Each such trial involves
factoring g(θ)^α. We have polynomial-time (randomized) algorithms for fac-
toring polynomials over finite fields (trial division by L[c ln 2] elements of B is
rather costly), so the relation-collection stage runs in L[1/(2c) + c ln 2] time. The
quantity 1/(2c) + c ln 2 is minimized for c = 1/√(2 ln 2), which leads to a running
time of L[√(2 ln 2)] = L[1.1774 . . .] for the relation-collection phase.
The size of the factor base is t = L[√(ln 2 / 2)] = L[0.5887 . . .]. Each relation
contains at most O(m) irreducible polynomials, so the resulting system of
congruences is sparse and can be solved in (essentially) quadratic time, that
is, in time L[√(2 ln 2)], the same as taken by the relation-collection phase.
The second stage involves obtaining a single relation (one smooth value
of a g^α), and can be accomplished in expected time L[1/(2c)] = L[√(ln 2 / 2)] =
L[0.5887 . . .], that is, much faster than the first stage.
I. F. Blake, R. Fuji-Hara, R. C. Mullin and S. A. Vanstone, Computing
logarithms in finite fields of characteristic two, SIAM Journal of Algebraic and Discrete
Methods, 5, 276–285, 1984.
        p(n/2, m)² / p(n, m) ≈ exp((1 + o(1)) (n/m) ln 2) ≈ 2^{n/m}.

This means that we expect to obtain smooth values of r_i(θ)/v_i(θ) about 2^{n/m}
times more often than we expect to find smooth values of h(θ). Although this
factor is absorbed in the o(1) term in the running time L[√(2 ln 2)] of the first
stage, the practical benefit of this trick is clearly noticeable.
  i       q_i           r_i                                                 v_i
  0        −            f(x) = x¹⁷ + x³ + 1                                 0
  1        −            h(x) = x¹⁶ + x¹² + x⁸ + x⁷ + x⁶ + x + 1             1
  2        x            x¹³ + x⁹ + x⁸ + x⁷ + x³ + x² + x + 1                x
  3       x³            x¹¹ + x¹⁰ + x⁸ + x⁷ + x⁵ + x⁴ + x³ + x + 1          x⁴ + 1
  4   x² + x + 1        x⁹ + x⁸ + x⁷ + x⁵ + x³ + x² + x                     x⁶ + x⁵ + x⁴ + x² + 1
  5     x² + 1          x⁷ + x⁵ + x³ + x² + 1                               x⁸ + x⁷ + x⁵ + x⁴
We stop the extended gcd loop as soon as deg r_i(x) ≤ n/2. In this example,
this happens for i = 5. We have the factorizations:
The above extended-gcd table shows that deg vi (x) + deg ri (x) ≈ n for
all values of i. Therefore, when deg ri (x) ≈ n/2, we have deg vi (x) ≈ n/2
too. This is indeed the expected behavior. However, there is no theoretical
guarantee (or proof) that this behavior is exhibited in all (or most) cases. As
a result, the modification of Blake et al. is only heuristic. ¤
or, equivalently,
Y
(θν +c1 (θ))(θν +c2 (θ)) = θǫ f1 (θ) + (c1 (θ)+c2 (θ))θν + c1 (θ)c2 (θ) = w(θ)γw .
w∈B1
This is a linear congruence in the indices of the elements of the factor base
B = B1 ∪ B2 . We assume that g(x) itself is an irreducible polynomial of small
degree, that is, g(x) = wk (x) ∈ B1 for some k. This gives us a free relation:
indg(θ) (wk (θ)) ≡ 1 (mod 2n − 1).
As c1 and c2 range over all polynomials in B2 , many relations are generated.
Let t = |B|. The parameter m should be so chosen that all (c_1, c_2) pairs lead to an expected number s of relations satisfying t ≤ s ≤ 2t. The resulting
system of linear congruences is then solved modulo 2n − 1. This completes the
first stage of the linear sieve method.
Example 7.15  Let us take n = 17, and represent F_{2^17} = F_2(θ), where θ^17 + θ^3 + 1 = 0. Here, f_1(x) = x^3 + 1 is of degree much smaller than n/2. Since |F*_{2^17}| = 131071 is a prime, every element of F*_{2^17}, other than 1, is a generator of the multiplicative group F*_{2^17}.
We have ν = ⌈n/2⌉ = 9, and ǫ = 2ν − n = 1. We take m = 4, that is, B_1 consists of the eight irreducible polynomials w_1, w_2, . . . , w_8 of Example 7.13, whereas B_2 consists of the 2^4 = 16 polynomials x^9 + a_3x^3 + a_2x^2 + a_1x + a_0 with each a_i ∈ {0, 1}. Let us name these polynomials as w_{9+(a_3a_2a_1a_0)_2}, that is, B_2 = {w_9, w_10, . . . , w_24}. It follows that |B| = |B_1| + |B_2| = 8 + 16 = 24.
We vary c_1 and c_2 over all polynomials of B_2. In order to avoid repetitions, we take c_1 = w_i for i = 9, 10, . . . , 24, and, for each i, we take c_2 = w_j for j = i, i + 1, . . . , 24. Exactly 24 smooth polynomials T(c_1, c_2) are obtained.
Let us now see how these smooth values of T(c_1, c_2) lead to linear congruences. As an example, consider the relation (x^9 + x^2 + 1)(x^9 + x^2 + x) ≡ x^2(x + 1)^2(x^3 + x + 1)(x^3 + x^2 + 1) (mod f(x)). Substituting x = θ gives (θ^9 + θ^2 + 1)(θ^9 + θ^2 + θ) = θ^2(θ + 1)^2(θ^3 + θ + 1)(θ^3 + θ^2 + 1), that is, w_14 w_15 = w_1^2 w_2^2 w_4 w_5, that is, 2d_1 + 2d_2 + d_4 + d_5 − d_14 − d_15 ≡ 0 (mod 2^17 − 1), where d_i = ind_{g(θ)}(w_i(θ)).
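This relation can be checked directly in GP/PARI (a quick illustrative snippet):

    f = Mod(1,2)*(x^17 + x^3 + 1);
    lhs = Mod(Mod(1,2)*(x^9 + x^2 + 1), f) * Mod(Mod(1,2)*(x^9 + x^2 + x), f);
    rhs = Mod(Mod(1,2)*x^2*(x + 1)^2*(x^3 + x + 1)*(x^3 + x^2 + 1), f);
    lhs == rhs      \\ returns 1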
    c_1             c_2                 T(c_1, c_2)
    0               0                   x^4 + x = x(x + 1)(x^2 + x + 1)
    0               x^3 + 1             x^12 + x^9 + x^4 + x = x(x + 1)^9(x^2 + x + 1)
    1               1                   x^4 + x + 1
    x               x                   x^4 + x^2 + x = x(x^3 + x + 1)
    x               x^2                 x^11 + x^10 + x^4 + x^3 + x = x(x^2 + x + 1)^3(x^4 + x + 1)
    x               x^3 + 1             x^12 + x^10 + x^9 = x^9(x^3 + x + 1)
    x               x^3 + x^2 + 1       x^12 + x^11 + x^10 + x^9 + x^3 = x^3(x^2 + x + 1)(x^3 + x^2 + 1)(x^4 + x^3 + x^2 + x + 1)
    x + 1           x + 1               x^4 + x^2 + x + 1 = (x + 1)(x^3 + x^2 + 1)
    x + 1           x^3 + x^2           x^12 + x^11 + x^10 + x^9 + x^2 + x = x(x + 1)(x^2 + x + 1)^2(x^3 + x + 1)^2
    x + 1           x^3 + x^2 + x       x^12 + x^11 + x^9 = x^9(x^3 + x^2 + 1)
    x^2             x^2                 x
    x^2 + 1         x^2 + 1             x + 1
    x^2 + 1         x^2 + x             x^10 + x^9 + x^3 + x^2 = x^2(x + 1)^2(x^3 + x + 1)(x^3 + x^2 + 1)
    x^2 + x         x^2 + x             x^2 + x = x(x + 1)
    x^2 + x         x^2 + x + 1         x^9
    x^2 + x         x^3 + x^2 + x + 1   x^12 + x^9 + x^5 + x^4 = x^4(x + 1)^4(x^4 + x + 1)
    x^2 + x + 1     x^2 + x + 1         x^2 + x + 1
    x^2 + x + 1     x^3                 x^12 + x^11 + x^10 + x^9 + x^5 + x^3 + x = x(x^3 + x + 1)(x^4 + x + 1)(x^4 + x^3 + 1)
    x^3 + 1         x^3 + 1             x^6 + x^4 + x + 1 = (x + 1)(x^2 + x + 1)(x^3 + x + 1)
    x^3 + x         x^3 + x             x^6 + x^4 + x^2 + x = x(x + 1)(x^4 + x^3 + 1)
    x^3 + x         x^3 + x + 1         x^9 + x^6 + x^4 + x^3 + x^2 = x^2(x^2 + x + 1)^2(x^3 + x + 1)
    x^3 + x         x^3 + x^2           x^11 + x^10 + x^6 + x^5 + x^3 + x = x(x + 1)^5(x^2 + x + 1)(x^3 + x^2 + 1)
    x^3 + x^2       x^3 + x^2           x^6 + x = x(x + 1)(x^4 + x^3 + x^2 + x + 1)
    x^3 + x^2 + x   x^3 + x^2 + x       x^6 + x^2 + x = x(x^2 + x + 1)(x^3 + x^2 + 1)
in L[c ln 2] variables requires the same time. To sum up, the first stage of the linear sieve method for F_{2^n} runs in L[2c ln 2] = L[√(ln 2)] = L[0.83255 . . .] time.
For the fields Fp , we analogously need to factor all T (c1 , c2 ) values (see
Section 7.2.2). In that case, T (c1 , c2 ) are integers, and we do not know any
easy way to factor them. Using trial division by the primes in the factor base
leads to subexponential time for each T (c1 , c2 ). That is why we used sieving
in order to reduce the amortized effort of smoothness checking.
For F_{2^n}, on the other hand, we know good (polynomial-time, albeit probabilistic—but that does not matter) algorithms for factoring each polynomial T(c_1, c_2). Trial division is a significantly slower strategy, since the factor base contains a subexponential number of irreducible polynomials of small degrees. Moreover, sieving is not required to achieve the running time L[√(ln 2)], and the name linear sieve method for F_{2^n} sounds like a bad choice.
However, a kind of polynomial sieving (Exercise 7.26) can be applied to
all the sieve algorithms for F2n . These sieves usually do not improve upon the
running time in the L[ ] notation, because they affect only the o(1) terms in
the running times. But, in practice, these sieves do possess the potential of
significantly speeding up the relation-collection stages.
In the second stage, we need to compute individual logarithms. If we use a strategy similar to the second stage of the basic method, we spend L[1/(2c)] = L[√(ln 2)] = L[0.83255 . . .] time for each individual logarithm. This is exactly the same as the running time of the first stage. The problem with this strategy is that we now have a smaller database of indices of small irreducible polynomials, compared to the basic method (L[1/(2√(ln 2))] instead of L[1/√(2 ln 2)]).
Moreover, we fail to exploit the database of indices of the elements of B2 .
A strategy similar to the second stage of the linear sieve method for prime
fields (Section 7.2.2) can be adapted to F2n (solve Exercise 7.19).
Example 7.16  It is difficult to illustrate the true essence of the cubic sieve method for small values of n, as we have done in Examples 7.13 and 7.15. This is because if n is too small, the expressions c_1^2 + c_1c_2 + c_2^2 and c_1c_2(c_1 + c_2) lead to T(c_1, c_2) values having degrees comparable to n.
We take n = 31, so ν = ⌈n/3⌉ = 11, and ǫ = 3ν − n = 2. The size of F*_{2^31} is 2^31 − 1 = 2147483647, which is a prime. Thus, every element of F_{2^31}, other than 0 and 1, is a generator of the group F*_{2^31}. We represent F_{2^31} as F_2(θ), where θ^31 + θ^3 + 1 = 0, that is, f_1(x) = x^3 + 1 is of suitably small degree.
Let us choose m = 8. B_1 contains all non-constant irreducible polynomials of F_2[x] of degrees ≤ 8. There are exactly 71 such polynomials. B_2 consists of the 2^8 = 256 polynomials of the form x^11 + c(x) with deg c(x) < 8. The size of the factor base is, therefore, |B| = |B_1| + |B_2| = 71 + 256 = 327.
We generate the polynomials T (c1 , c2 , c3 ) (with c3 = c1 +c2 ), and check the
smoothness of these polynomials over B1 . The polynomial T (c1 , c2 , c3 ) remains
the same for any of the six permutations of the arguments c1 , c2 , c3 . To avoid
    (x^11 + x^5 + x^3 + x^2 + x + 1) × (x^11 + x^7 + x^6 + x^4 + x^3 + x^2) × (x^11 + x^7 + x^6 + x^5 + x^4 + x + 1)
      ≡ x^2(x^3 + 1) + (c_1^2 + c_1c_2 + c_2^2)x^11 + c_1c_2(c_1 + c_2)
      ≡ x^25 + x^22 + x^20 + x^19 + x^16 + x^15 + x^14 + x^12 + x^10 + x^9 + x^8 + x^6 + x^5 + x^4 + x^3
      ≡ x^3(x^3 + x^2 + 1)(x^5 + x^2 + 1)(x^6 + x + 1)(x^8 + x^7 + x^6 + x^4 + x^3 + x^2 + 1) (mod f(x)).

Putting x = θ gives

    (θ^11 + θ^5 + θ^3 + θ^2 + θ + 1) × (θ^11 + θ^7 + θ^6 + θ^4 + θ^3 + θ^2) × (θ^11 + θ^7 + θ^6 + θ^5 + θ^4 + θ + 1)
      = θ^3(θ^3 + θ^2 + 1)(θ^5 + θ^2 + 1)(θ^6 + θ + 1)(θ^8 + θ^7 + θ^6 + θ^4 + θ^3 + θ^2 + 1).

With the notation d_i = ind_{g(θ)}(w_i(θ)), this gives a linear congruence among the corresponding indices d_i modulo 2^31 − 1.
All these relations are homogeneous, so we need to include a free relation. For instance, taking g(θ) = θ^7 + θ + 1 = w_24 gives d_24 ≡ 1 (mod 2^31 − 1). ¤
takes the same time. Thus, the first stage of the cubic sieve method runs in L[√(2(ln 2)/3)] = L[0.67977 . . .] time.
The second stage of the cubic sieve method for prime fields can be adapted
to work for F2n . The details are left to the reader (solve Exercise 7.20).
The degree of T_1(x) is less than b + h, whereas the degree of T_2(x) is less than (b + 1)2^k. Since 2^k = Õ(n^{1/3}), choosing b = Õ(n^{1/3}) implies that both T_1 and T_2 are of degrees about n^{2/3} (recall that h = Õ(n^{2/3})).
If both T_1 and T_2 factor completely over the factor base B, that is, if

    T_1(x) = ∏_{w ∈ B} w(x)^{γ_w},  and  T_2(x) = ∏_{w ∈ B} w(x)^{δ_w},
Let us now look at the running time of Coppersmith’s method. The probability that both T_1 and T_2 are smooth is about p(b + h, m) p((b + 1)2^k, m). There are exactly 2^{2b−1} pairs (c_1, c_2) for which the polynomials T_1 and T_2 are computed. So the expected number of relations is 2^{2b−1} p(b + h, m) p((b + 1)2^k, m). The factor base contains Õ(2^m) irreducible polynomials. In order to obtain a system of congruences with slightly more equations than variables, we require
We take
each trial involves factoring a polynomial of degree < n (a task that can be
finished in probabilistic polynomial time), we find a desired factorization in
the same subexponential time. Suppose that some α gives

    a(θ)g(θ)^α = ∏_{i=1}^{r} u_i(θ)

with the degree of each u_i(θ) no more than n^{2/3}(ln n)^{1/3}. In order to determine ind_{g(θ)}(a(θ)), it suffices to compute ind_{g(θ)}(u_i(θ)) for all i.
We reduce the computation of each indg(θ) (ui (θ)) to the computation of
indg(θ) (ui,j (θ)) for some polynomials ui,j of degrees smaller than that of ui .
This recursive process of replacing the index of a polynomial by the indices of
multiple polynomials of smaller degrees is repeated until we eventually arrive
at polynomials with degrees so reduced that they belong to the factor base.
In order to explain the reduction process, suppose that we want to compute the index of a polynomial u(θ) of degree d ≤ n^{2/3}(ln n)^{1/3}. The procedure is similar to the first stage of Coppersmith’s method. We first choose a positive integer k satisfying 2^k ≈ √(n/d), and take h = ⌊n/2^k⌋ + 1. For relatively prime polynomials c_1(x), c_2(x) of small degrees, we consider the two polynomials

    T_1(x) = c_1(x)x^h + c_2(x),  and
    T_2(x) ≡ T_1(x)^{2^k} ≡ c_1(x^{2^k}) x^{h2^k − n} f_1(x) + c_2(x^{2^k}) (mod f(x)).

We choose c_1(x) and c_2(x) in such a manner that T_1(x) is a multiple of u(x). We choose a degree d′, and want both T_1(x)/u(x) and T_2(x) to split into irreducible factors of degrees ≤ d′. For a successful factorization, we have

    ∏_i v_i(x) ≡ ( u(x) ∏_j w_j(x) )^{2^k} (mod f(x)),

where v_i(x) and w_j(x) are polynomials of degrees ≤ d′. But then, ind_{g(θ)}(u(θ)) can be computed if the indices of all v_i(θ) and w_j(θ) are computed.
It can be shown that we can take d′ ≤ d/1.1. In that case, the depth of recursion becomes O(log n) (recursion continues until the polynomials of B are arrived at). Moreover, each reduction gives a branching of no more than n new index calculations. That is, the total number of intermediate polynomials created in the process is only n^{O(log n)} = exp(c(ln n)^2) (for a positive constant c), and so the running time of the second stage of Coppersmith’s method is dominated by the initial search for a suitable a(θ)g(θ)^α. We argued earlier that this step runs in exp((1.0986 . . . + o(1)) n^{1/3} (ln n)^{2/3}) time.
where κ ∈ F*_q, and the w_i are the (monic) irreducible polynomials in the factor base B. Taking logarithms to the base g gives

    α ≡ ind_g κ + Σ_i β_i ind_g w_i (mod q^n − 1).

After collecting many relations, we solve the system to compute the unknown indices ind_g w_i of the factor-base elements. See below to know how to handle the unknown quantities ind_g κ for κ ∈ F*_q.
In the second stage, the individual logarithm of h ∈ F*_{q^n} is computed by locating one B-smooth value of h g^α.
The problem with this method is that for large values of q, it may even be
infeasible to work with factor bases B containing only all the linear polyno-
mials of Fq [x] (there are q of them)—a situation corresponding to D = 1. A
suitable subset of linear polynomials may then act as the factor base.
Another problem associated with the basic method is that g α , even if
smooth over B, need not be monic. If its leading coefficient is κ, we need to
include ind_g κ too in the relation. However, this is not a serious problem. If q is not very large, it is feasible to compute indices in F*_q. More precisely, g′ = g^{(q^n−1)/(q−1)} is an element of F*_q, and ind_g κ = ((q^n − 1)/(q − 1)) ind_{g′} κ.
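A small GP/PARI illustration of this reduction follows; the parameters q = 5 and n = 3 are illustrative choices made only for this snippet.

    q = 5; n = 3;
    g  = ffprimroot(ffgen(q^n, 't));      \\ a generator of F_{q^n}*
    gp = g^((q^n - 1)/(q - 1));           \\ g' generates the subgroup F_q*
    kappa = gp^3;                         \\ some element of F_q*
    e = fflog(kappa, gp, q - 1);          \\ ind_{g'}(kappa), a tiny DLP in F_q*
    g^((q^n - 1)/(q - 1)*e) == kappa      \\ so ((q^n-1)/(q-1))*e serves as ind_g(kappa)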
is then chosen. This polynomial should satisfy the following eight conditions.
1. H(x, y) is irreducible in F̄q [x, y], where F̄q is the algebraic closure of Fq .
2. H(x, m(x)) is divisible by the defining polynomial f (x) for some (known)
univariate polynomial m(x) ∈ Fq [x].
3. h_{d,d′−1} = 1.
4. h_{d,0} ≠ 0.
5. h_{0,d′−1} ≠ 0.
6. Σ_{i=0}^{d} h_{i,d′−1} y^i ∈ F_q[y] is square-free.
7. Σ_{j=0}^{d′−1} h_{d,j} x^j ∈ F_q[x] is square-free.
8. The size of the Jacobian JFqn (H) is coprime to (q n − 1)/(q − 1).
The curve H(x, y) plays an important role here. The set Fq (H) of all rational
functions on H is called the function field of H (Section 4.4.2). The set of all
integers in Fq (H) is the set Fq [H] of all polynomial functions on H. Because
of the Condition 2 above, the function taking y 7→ m(x) (mod f (x)) naturally
extends to a ring homomorphism Φ : Fq [H] → Fqn [x].
14 Leonard M. Adleman, The function field sieve, ANTS, 108–121, 1994.
15 Leonard M. Adleman, Ming-Deh A. Huang, Function field sieve method for discrete
logarithms over finite fields, Information and Computation, 151(1-2), 5–16, 1999.
16 Antoine Joux and Reynald Lercier, The function field sieve in the medium prime case, EUROCRYPT, 254–270, 2006.
Adleman and Huang propose the following method to determine the polynomial H(x, y). The defining polynomial f(x) ∈ F_q[x] is monic of degree n. The y-degree d of H is chosen as a value about n^{1/3}. Let d′ = ⌈n/d⌉, and δ = dd′ − n < d. For any monic polynomial m(x) ∈ F_q[x] of degree d′ ≈ n^{2/3}, we then have x^δ f(x) = m(x)^d + H_{d−1}(x)m(x)^{d−1} + H_{d−2}(x)m(x)^{d−2} + · · · + H_1(x)m(x) + H_0(x), where each H_i(x) ∈ F_q[x] is of degree ≤ d′ − 1. Let

    H(x, y) = y^d + H_{d−1}(x)y^{d−1} + H_{d−2}(x)y^{d−2} + · · · + H_1(x)y + H_0(x).

By construction, H(x, m(x)) ≡ 0 (mod f(x)). More concretely, we can take f(x) = x^n + f_1(x) with deg f_1(x) < n^{2/3}, and m(x) = x^{d′}. But then, H(x, y) = y^d + x^δ f_1(x). We vary f_1(x) until H(x, y) satisfies the above eight conditions.
Once a suitable polynomial H(x, y) is chosen, we need to choose a factor
base. Let S consist of all monic irreducible polynomials of F_q[x] with degrees no more than about n^{1/3}. For r(x), s(x) ∈ F_q[x] with degrees no more than about n^{1/3} and with gcd(r(x), s(x)) = 1, we consider two quantities: the polynomial
r(x)m(x) + s(x), and the polynomial function r(x)y + s(x) ∈ Fq (H). The
polynomial rm + s is attempted to factor completely over S, whereas the
function ry + s is attempted to factor completely in Fq (H) over a set of
primes of Fq (H) of small norms. If both the factoring attempts are successful,
we get a relation (also called a doubly smooth pair). Factorization in Fq (H)
is too difficult a topic to be elaborated in this book. It suffices for the time
being to note that factoring r(x)y + s(x) essentially boils down to factoring its norm, which is the univariate polynomial r(x)^d H(x, −s(x)/r(x)) ∈ F_q[x]. Both r(x)m(x) + s(x) and this norm are polynomials of degrees no more than about n^{2/3}. Sieving is carried out to identify the doubly smooth pairs.
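The two polynomials examined in one such trial can be sketched in GP/PARI. The parameters below (q = 2, n = 31, d = 3, d′ = 11, δ = 2, f_1(x) = x^3 + 1, hence H(x, y) = y^3 + x^2(x^3 + 1) and m(x) = x^11) and the pair (r, s) are illustrative assumptions; whether this H satisfies all eight conditions is not checked here, and normpoly is a small helper introduced only to evaluate the norm formula just quoted.

    normpoly(H, r, s) =
    { \\ r(x)^d * H(x, -s(x)/r(x)) for H monic of degree d in y
      my(d = poldegree(H, 'y));
      sum(i = 0, d, polcoef(H, i, 'y) * (-s)^i * r^(d - i));
    }
    H = 'y^3 + 'x^2*('x^3 + 1);  m = 'x^11;
    r = 'x + 1;  s = 'x^2 + 'x + 1;
    side1 = r*m + s;                   \\ the polynomial r(x)m(x) + s(x)
    side2 = normpoly(H, r, s);         \\ the norm of the function r(x)y + s(x)
    [factormod(side1, 2), factormod(side2, 2)]   \\ a relation needs both to be smooth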
Each trial for a relation in the FFSM, therefore, checks the smoothness of two polynomials of degrees about n^{2/3}. This is asymptotically better than trying to find one smooth polynomial of degree proportional to n (as in the LSM or the CSM). The FFSM achieves a running time of L_{q^n}[1/3, (32/9)^{1/3}].
Exercises
1. Let h ∈ F*_q have order m (a divisor of q − 1). Prove that for a ∈ F*_q, the discrete logarithm ind_h a exists if and only if a^m = 1.
2. Let E be an elliptic curve defined over Fq , and P ∈ Eq a point of order m.
Prove or disprove: For a point Q ∈ Eq , the discrete logarithm indP Q exists if
and only if mQ = O.
3. Suppose that g and g ′ are two primitive elements of F∗q . Show that if one can
compute discrete logarithms to the base g in O(f (log q)) time, then one can
also compute discrete logarithms to the base g ′ in O(f (log q)) time. (Assume
that f (log q) is a super-polynomial expression in log q.)
4. Suppose that g is a primitive element of a finite field Fq , where q is a power of 2.
Prove that computing indg a is polynomial-time equivalent to the computation
of the parity of indg a.
5. Explain how the baby-step-giant-step method can be used to compute the
order of an element in a finite group of size n. (Assume that the prime factor-
ization of n is unknown.)
6. Let G be a finite group, and g ∈ G. Suppose that an element a = g^x is given together with the knowledge that i ≤ x ≤ j for some known i, j. Let k = j − i + 1. Describe how the baby-step-giant-step method to determine x can be modified so as to use storage for only O(√k) group elements and a time of only Õ(√k) group operations.
7. Let n = pq be the product of two distinct odd primes p, q of the same bit size.
(a) Let g ∈ Z*_n. Prove that ord_n g divides φ(n)/2.
(b) Conclude that g^{(n+1)/2} ≡ g^x (mod n), where x = (p + q)/2.
(c) Use Exercise 7.6 to determine x. Demonstrate how you can factor n from the knowledge of x.
(d) Prove that this factoring algorithm runs in Õ(n^{1/4}) time.
8. Let g_1, g_2, . . . , g_t, a belong to a finite group G. The multi-dimensional discrete-logarithm problem is to find integers x_1, x_2, . . . , x_t such that a = g_1^{x_1} g_2^{x_2} · · · g_t^{x_t} (if such integers exist). Some r is given such that a has the above representation with 0 ≤ x_i < r for all i. Devise a baby-step-giant-step method to compute x_1, x_2, . . . , x_t using only Õ(r^{t/2}) group operations.
9. Discuss how the Pollard rho and lambda methods (illustrated in Examples 7.2
and 7.3 for prime fields) can be modified to work for extension fields Fpn .
10. Let γ_{ij} be the exponent of the small prime p_j in the i-th relation in the basic index calculus method for a prime field F_p. Assume that a random integer g^α (mod p) has probability 1/p_j^h of being divisible by p_j^h, where p_j is a small prime, and h is a small exponent. Determine the probability that γ_{ij} ≠ 0.
11. Let C be the coefficient matrix obtained in the first stage of the LSM for the
prime field Fp . Count the expected number of non-zero entries in the j-th
19. Modify the second stage of the linear sieve method for prime fields to work
for the fields F2n . What is the running time of this modified second stage?
20. Modify the second stage of the cubic sieve method for prime fields to work for
the fields F2n . What is the running time of this modified second stage?
21. Prove that the number of triples (c_1, c_2, c_3) in the cubic sieve method for the field F_{2^n} is (2/3) × 4^{m−1} + 2^{m−1} + 1/3.
22. Propose an adaptation of the residue-list sieve method for the fields F2n .
23. Extend the concept of large prime variation to the case of the fields F2n .
24. Prove that there are exactly 2^{2b−1} pairs (c_1(x), c_2(x)) in Coppersmith’s method with deg c_1 < b, deg c_2 < b, and gcd(c_1, c_2) = 1. Exclude the pair (0, 1) in your count.
25. A Gray code23 G_0^{(d)}, G_1^{(d)}, . . . , G_{2^d−1}^{(d)} of dimension d is an enumeration of all d-bit strings, defined recursively as follows. For d = 1, we have G_0^{(1)} = 0 and G_1^{(1)} = 1, whereas for d ≥ 2, we have

    G_k^{(d)} = 0 G_k^{(d−1)}           if 0 ≤ k < 2^{d−1},
    G_k^{(d)} = 1 G_{2^d−k−1}^{(d−1)}   if 2^{d−1} ≤ k < 2^d.

Prove that for 1 ≤ k < 2^d, the bit strings G_{k−1}^{(d)} and G_k^{(d)} differ in exactly one bit position, given by v_2(k) (the multiplicity of 2 in k).
26. In this exercise, we explore Gordon and McCurley’s polynomial sieving procedure24 in connection with the linear sieve method for F_{2^n}. Let w(x) be an irreducible polynomial in B of low degree, h a small positive integer, and δ = m − 1 − h deg w(x). We find polynomials c_1, c_2 of degrees < m satisfying T(c_1, c_2) ≡ x^ǫ f_1(x) + (c_1(x) + c_2(x))x^ν + c_1(x)c_2(x) ≡ 0 (mod w(x)^h). Suppose that, for a fixed c_1, a solution of this congruence is c̄_2(x). Then, all the solutions for c_2(x) are c_2(x) = c̄_2(x) + u(x)w(x)^h for all polynomials u(x) of degrees < δ. Describe how the δ-dimensional Gray code can be used to efficiently step through all these values of c_2(x). The idea is to replace the product u(x)w(x)^h by more efficient operations. Complete the description of the sieve, based upon this strategy. Deduce the running time for this sieve.
27. Argue that the basic index calculus method for F_{q^n} (with small q) can be designed to run in L(q^n, 1/2, c) time.
28. Extend the linear sieve method for characteristic-two fields to compute indices
in Fqn . Assume that q is small.
29. How can sieving be done in the extended linear sieve method of Exercise 7.28?
30. How can you modify Coppersmith’s method in order to compute indices in
non-binary fields of small characteristics (like three or five)?
Programming Exercises
Using the GP/PARI calculator, implement the following.
31. The baby-step-giant-step method for prime finite fields.
32. The Pollard rho method for binary finite fields.
33. The Pollard lambda method for elliptic curves over finite fields.
23 The Gray code is named after the American physicist Frank Gray (1887–1969).
24 DanielM. Gordon and Kevin S. McCurley, Massively parallel computation of discrete
logarithms, CRYPTO, 312–323, 1992.
34. The sieving step for the linear sieve method for prime fields.
35. The sieving step for the linear sieve method for binary finite fields.
36. Relation generation in the basic index calculus method in the field F3n .
Chapter 8
Large Sparse Linear Systems
Algorithms for solving large sparse linear systems over the finite rings ZM con-
stitute the basic theme for this chapter. Arguably, this is not part of number
theory. However, given the importance of our ability to quickly solve such sys-
tems in connection with factoring and discrete-logarithm algorithms, a book
on computational number theory can hardly afford to ignore this topic.
The sieving phases of factoring and discrete-log algorithms are massively par-
allelizable. On the contrary, the linear-algebra phases resist parallelization
efforts, and may turn out to be the bottleneck in practical implementations.
Throughout this chapter, we plan to solve the linear system of congruences:
Ax ≡ b (mod M ), (8.1)
Bx ≡ c (mod M ) (8.2)
solvers. I do not make any effort here to explain the standard cubic algorithms
like Gaussian elimination, but straightaway jump to the algorithms suitable
for sparse systems. Structured Gaussian elimination applies to any modulus
M . The standard Lanczos and Wiedemann methods are typically used for odd
moduli M , although they are equally applicable to M = 2. The block versions
of Lanczos and Wiedemann methods are significantly more efficient compared
to their standard versions, but are meant for M = 2 only.
The structure of a typical sparse matrix A from the sieve algorithms merits
a discussion in this context. Each row of A is sparse, but the columns in A have
significant variations in their weights (that is, counts of non-zero elements). A
randomly chosen integer is divisible by a small prime p with probability about
1/p. The column in A corresponding to p is expected to contain about m/p
non-zero entries. If p is small (like p = 2), the column corresponding to p is
quite dense. On the contrary, if p is relatively large (like the millionth prime
15,485,863), the column corresponding to it is expected to be rather sparse.
The columns in A corresponding to factor-base elements other than small
primes (like H + c in the linear sieve method) are expected to contain only a
small constant number of non-zero elements. In view of these observations, we
call some of the columns heavy, and the rest light. More concretely, we may
choose a small positive real constant α (like 1/32), and call a column of A
heavy if its weight is more than αm, light otherwise.
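As a small illustration (the code and the 0/1 weight test are only a sketch of the classification just described), the heavy/light labelling can be computed as follows in GP/PARI:

    colclass(A, alpha) =
    { \\ label each column of A as "H" (heavy) or "L" (light)
      my(dims = matsize(A), m = dims[1], nc = dims[2], w);
      w = vector(nc, j, sum(i = 1, m, A[i,j] != 0));    \\ column weights
      vector(nc, j, if(w[j] > alpha*m, "H", "L"));
    }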
Usually, each non-zero entry in A is a small positive or negative integer,
even when the modulus M is large. Here, small means absolute values no
larger than a few hundreds. It is preferable to represent a negative entry −a
by −a itself, and not by the canonical representative M − a. This practice
ensures that each entry in A can be represented by a single-precision signed
integer. The matrix A is typically not stored in a dense format (except perhaps
for M = 2, in which case multiple coefficients can be packed per word). We
instead store only the non-zero entries in a row-major format.
In the situations where we solve Eqn (8.2) instead of Eqn (8.1), it is often
not advisable to compute B = A t A explicitly, for B may be significantly
denser than A. It is instead preferable to carry out a multiplication by B as
two multiplications by the sparse matrices A and A t . While multiplying by
A t , we often find it handy to have a row-major listing of the non-zero elements
of A t , or equivalently a column-major listing of A.
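As an illustration of such a storage scheme (the representation and the helper names below are choices made only for this sketch, not prescribed above), row i can be kept as a list of [column, value] pairs, so that multiplication by A and by A^t touches only the non-zero entries:

    spmul(rows, x, M) =                    \\ A*x (mod M) from the row-major lists
      vector(#rows, i,
        sum(k = 1, #rows[i], rows[i][k][2]*x[rows[i][k][1]]) % M);
    spmulT(rows, ncols, y, M) =            \\ A^t*y (mod M) using the same lists
    {
      my(z = vector(ncols));
      for(i = 1, #rows,
        for(k = 1, #rows[i],
          z[rows[i][k][1]] = (z[rows[i][k][1]] + rows[i][k][2]*y[i]) % M));
      z;
    }
    rows = [[[1,1],[3,-2]], [[2,3]]];      \\ the 2 x 3 matrix [1 0 -2; 0 3 0]
    spmul(rows, [1,1,1], 7)                \\ returns [6, 3]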
on the reduced system becomes more practical than applying the quadratic
sparse solvers we describe later in this chapter. This is particularly important
for M = 2, since in this case multiple coefficients can be packed per word
in a natural way, and standard Gaussian elimination can operate on words,
thereby processing multiple coefficients in each operation.
There are some delicate differences in the SGE procedure between the cases
of factorization and discrete-log matrices. Here, I supply a unified treatment
for both m > n and m 6 n. For dealing with matrices specifically from
factoring algorithms, one may look at the paper by Bender and Canfield.1
Matrices from discrete-log algorithms are studied by LaMacchia and Odlyzko.2
During the execution of SGE, we call certain columns of A heavy, and
the rest light. An initial settlement of this discrimination may be based upon
the weights (the weight of a row or column of A is the number of non-zero
entries in that row or column) of the columns in comparison with αm for
a predetermined small positive fraction α (like 1/32). Later, when rows and
columns are removed, this quantitative notion of heaviness or lightness may be
violated. The steps of SGE attempt to keep the light columns light, perhaps
at the cost of increasing the weights of the heavy columns.
Step 1: Delete columns of weights zero and one.
(a) Remove all columns of weight zero. These columns correspond to vari-
ables which do not appear in the system at all, and can be discarded altogether.
(b) Remove all columns of weight one and the rows containing these non-
zero entries. Each such column refers to a variable that appears in exactly one
equation. When the values of other variables are available, the value of this
variable can be obtained from the equation (that is, row) being eliminated.
After the completion of Step 1, all columns have weights ≥ 2. Since the
matrix has potentially lost many light columns, it may be desirable to declare
some light columns as heavy. The obvious choices are those having the highest
weights among the light columns. This may be done so as to maintain the
heavy-vs-light discrimination based upon the fraction α. Notice that the value
of m (the number of rows) reduces after every column removal in Step 1(b).
Step 2: Delete rows of weights zero and one.
(a) A row of weight zero stands for the equation 0 = 0, and can be elim-
inated. Although such a row may be absent in the first round of the steps of
SGE, they may appear in later rounds.
(b) Let the row Ri contain only one non-zero entry. Suppose that this
entry corresponds to the variable xj . This row supplies the value of xj (we
assume that the non-zero entry is invertible modulo M ). Substitute this value
of xj in all the equations (rows) where xj occurs. Delete Ri and the column
(light or heavy) corresponding to xj . Repeat for all rows of weight one.
1 Edward A. Bender and E. Rodney Canfield, An approximate probabilistic model for structured Gaussian elimination, Journal of Algorithms, 31(2), 271–290, 1999.
x18 = 1
x1 + x8 + x17 = 0
x1 + x3 + x6 + x11 + x23 = 1
x2 + x5 + x23 = 1
x3 + x6 + x21 + x22 + x23 = 1
x2 + x3 + x13 + x21 = 0
x1 + x2 + x7 = 1
x22 = 1
x1 + x2 + x5 + x6 + x9 + x10 + x21 = 0
x1 + x3 + x14 + x18 = 0
x1 + x2 = 1
x1 + x5 = 1
x3 + x4 + x5 + x16 + x24 = 0
x1 + x3 + x13 + x20 = 1
x1 + x4 + x6 + x13 + x14 + x24 = 0
x2 = 1
x3 + x23 = 0
x4 + x10 + x16 + x20 = 0
x1 + x6 + x11 + x24 = 0
x10 + x11 + x16 + x21 = 1
x1 + x2 + x3 + x4 + x11 + x21 = 0
x1 + x2 + x4 + x17 = 0
x1 = 0
x1 + x4 + x18 = 0
x1 + x2 + x25 = 0
Below, the system is written in the matrix form. Only the reduced matrix A
(initially the 25 × 25 coefficient matrix) and the reduced vector b are shown.
A b
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
H H H H LH LLL L L L L L L L L L L L H L L L L
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
2 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
4 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
5 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1
6 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
7 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
9 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
10 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
12 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
13 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
14 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1
15 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0
16 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
18 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
19 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
20 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
21 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
22 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
25 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
The indices of the undeleted rows and columns are shown as row and
column headers. Let us take the heaviness indicator fraction as α = 1/5, that
is, a column is called heavy if and only if it contains at least αm = m/5 non-zero
entries, where m is the current count of rows in A. Heavy and light columns
are marked by H and L respectively.
Round 1
Step 1(a): Columns indexed 12, 15 and 19 have no non-zero entries, that is,
the variables x12 , x15 and x19 appear in no equations. We cannot solve for
these variables from the given system. Eliminating these columns reduces the
system to the following:
A b
1 2 3 4 5 6 7 8 9 10 11 13 14 16 17 18 20 21 22 23 24 25
H H H H LH LLL L L L L L L L L H L L L L
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
2 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1
4 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
5 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1
6 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
7 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
9 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0
10 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
12 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
13 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
14 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
15 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0
16 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
18 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0
19 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
20 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1
21 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
22 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
25 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Step 1(b): The columns with single non-zero entries are indexed 7, 8, 9, 25.
We delete these columns and the rows containing these non-zero entries (rows
7, 2, 9, 25). We will later make the following substitutions:
x7 = 1 + x1 + x2 , (8.3)
x8 = 0 + x1 + x17 , (8.4)
x9 = 0 + x1 + x2 + x5 + x6 + x10 + x21 , (8.5)
x25 = 0 + x1 + x2 . (8.6)
Step 2(a): At this instant, there are no zero rows, so this step is not executed.
Step 2(b): We look at rows with single non-zero entries. Row 1 is the first
case with the sole non-zero entry at Column 18. We get the immediate solution
x18 = 1 (8.7)
The variable x18 occurs in Rows 10 and 24. We substitute the value of x18 in
these equations, and delete Row 1 and Column 18. This step is subsequently
repeated for the following rows of weight one.
Row index Column index Rows adjusted
8 22 5
16 2 4, 6, 11, 21, 22
23 1 3, 10, 11, 12, 14, 15, 19, 21, 22, 24
24 4 13, 15, 18, 21, 22
In the process, we obtain the solutions for the following variables:
x22 = 1 (8.8)
x2 = 1 (8.9)
x1 = 0 (8.10)
x4 = 1 (8.11)
After all these five iterations of Step 2(b), the system reduces to:
A b
3 5 6 10 11 13 14 16 17 20 21 23 24
H LH L H L L L L L H H L
3 1 0 1 0 1 0 0 0 0 0 0 1 0 1
4 0 1 0 0 0 0 0 0 0 0 0 1 0 0
5 1 0 1 0 0 0 0 0 0 0 1 1 0 0
6 1 0 0 0 0 1 0 0 0 0 1 0 0 1
10 1 0 0 0 0 0 1 0 0 0 0 0 0 1
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 1 0 0 0 0 0 0 0 0 0 0 0 1
13 1 1 0 0 0 0 0 1 0 0 0 0 1 1
14 1 0 0 0 0 1 0 0 0 1 0 0 0 1
15 0 0 1 0 0 1 1 0 0 0 0 0 1 1
17 1 0 0 0 0 0 0 0 0 0 0 1 0 0
18 0 0 0 1 0 0 0 1 0 1 0 0 0 1
19 0 0 1 0 1 0 0 0 0 0 0 0 1 0
20 0 0 0 1 1 0 0 1 0 0 1 0 0 1
21 1 0 0 0 1 0 0 0 0 0 1 0 0 0
22 0 0 0 0 0 0 0 0 1 0 0 0 0 0
The columns are relabeled as heavy/light after all the iterations of Step 2(b).
Now, there are 16 rows, so columns with at most three non-zero entries are
light, and those with more than three entries are heavy.
Step 3: The first row to contain only one non-zero entry in the light columns
is Row 4 with that entry being in Column 5. This gives us the following future
possibility of back substitution:
x5 = 0 + x23 . (8.12)
We substitute this expression for x5 in all other equations where x5 appears.
Here, these other equations are 12 and 13. We effect this substitution by a
subtraction (same as addition modulo 2) of Row 4 from Rows 12 and 13. The
modified system is shown below. Notice that the subtraction of (multiples of)
the row being deleted from other rows may change the weights of some columns
other than the one being deleted. Consequently, it is necessary to relabel each
column (or at least the columns suffering changes) as heavy/light, and restart
the search for rows with single non-zero entries in the new light columns.
A b
3 6 10 11 13 14 16 17 20 21 23 24
H H L H H L H L L H H H
3 1 1 0 1 0 0 0 0 0 0 1 0 1
5 1 1 0 0 0 0 0 0 0 1 1 0 0
6 1 0 0 0 1 0 0 0 0 1 0 0 1
10 1 0 0 0 0 1 0 0 0 0 0 0 1
11 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 1 0 1
13 1 0 0 0 0 0 1 0 0 0 1 1 1
14 1 0 0 0 1 0 0 0 1 0 0 0 1
15 0 1 0 0 1 1 0 0 0 0 0 1 1
17 1 0 0 0 0 0 0 0 0 0 1 0 0
18 0 0 1 0 0 0 1 0 1 0 0 0 1
19 0 1 0 1 0 0 0 0 0 0 0 1 0
20 0 0 1 1 0 0 1 0 0 1 0 0 1
21 1 0 0 1 0 0 0 0 0 1 0 0 0
22 0 0 0 0 0 0 0 1 0 0 0 0 0
Subsequently, Step 3 is executed several times. I am not showing these reduc-
tions individually, but record the substitutions carried out in these steps.
x14 = 1 + x3 (8.13)
x20 = 1 + x3 + x13 (8.14)
x10 = 0 + x3 + x13 + x16 (8.15)
x16 = 1 + x3 + x23 + x24 (8.16)
x24 = 0 + x3 + x6 + x13 (8.17)
x17 = 0 (8.18)
After all these steps, the system reduces as follows:
A b
3 6 11 13 21 23
H H H H H H
3 1 1 1 0 0 1 1
5 1 1 0 0 1 1 0
6 1 0 0 1 1 0 1
11 0 0 0 0 0 0 0
12 0 0 0 0 0 1 1
17 1 0 0 0 0 1 0
19 1 0 1 1 0 0 0
20 1 0 1 1 1 0 1
21 1 0 1 0 1 0 0
This completes the first round of Steps 1–3. We prefer to avoid Step 4 in
this example, since we started with a square system, and the number of rows
always remains close to the number of columns.
Round 2: Although all the columns are marked heavy now, there still remain
reduction possibilities. So we go through another round of the above steps.
Step 1(a) and 1(b): There are no columns of weight zero or one. So these
steps are not executed in this round.
Step 2(a): Row 11 contains only zero entries, and is deleted from the system.
Step 2(b): Row 12 contains a single non-zero entry giving
x23 = 1. (8.19)
We also adjust Rows 3, 5 and 17 by substituting this value of x23 . This lets
Row 17 have a single non-zero entry at Column 3, so we have another reduction
x3 = 1, (8.20)
which changes Rows 3, 5, 6, 19, 20 and 21. The reduced system now becomes
A b
6 11 13 21
H H H H
3 1 1 0 0 1
5 1 0 0 1 0
6 0 0 1 1 0
19 0 1 1 0 1
20 0 1 1 1 0
21 0 1 0 1 1
Step 3: Since all columns are heavy at this point, this step is not executed.
Another round of SGE does not reduce the system further. So we stop at
this point. The final reduced system consists of the following equations:
x6 + x11 = 1
x6 + x21 = 0
x13 + x21 = 0
x11 + x13 = 1
x11 + x13 + x21 = 0
x11 + x21 = 1
Standard Gaussian elimination on this system gives the unique solution:
x6 = 1 (8.21)
x11 = 0 (8.22)
x13 = 1 (8.23)
x21 = 1 (8.24)
Now, we work backwards to obtain the values of other variables using the
substitution equations generated so far. The way SGE works ensures that
One can prove by induction on i that w_i^t A w_j = 0 for i > j. Moreover, since A is symmetric, for i < j, we have w_i^t A w_j = (w_j^t A w_i)^t = 0. Therefore, the direction vectors w_0, w_1, w_2, . . . are A-orthogonal to one another. Now, w_s is the first vector linearly dependent upon the previous direction vectors, that is, w_s = Σ_{j=0}^{s−1} a_j w_j. But then, w_s^t A w_s = Σ_{j=0}^{s−1} a_j w_s^t A w_j = 0 by A-orthogonality. The positive-definiteness of A implies that w_s = 0, that is, the Lanczos iterations continue until we obtain a direction vector w_s = 0. Take

    x = Σ_{j=0}^{s−1} b_j w_j,  where b_j = (w_j^t b) / (w_j^t A w_j).    (8.27)

    Ax − b = Σ_{j=0}^{s−1} d_j w_j
diagonal matrix D with entries from F_{q^d}, and solve the equivalent system DAx = Db, that is, feed the system (DA)^t(DA)x = (DA)^t Db (that is, the system (A^t D^2 A)x = A^t D^2 b) to Algorithm 8.1. The introduction of D does not affect sparsity (A and DA have non-zero entries at exactly the same locations), but the storage of DA may take more space than storing D and A individually (since the elements of DA are from a larger field compared to those of A). So, we keep the modified coefficient matrix A^t D^2 A in this factored form, and compute v_i = A^t(D^2(Aw_{i−1})). Although the final solution belongs to (F_q)^n, the computation proceeds in F_{q^d}, and is less likely to encounter the problem associated with the original matrix. Still, if the computation fails, we choose another random matrix D, and restart from the beginning.
Example 8.2  (1) Consider the 6 × 4 system modulo the prime p = 97:

    [  0   0  49  56 ]            [ 41 ]
    [  0  35  12   0 ]   [ x1 ]   [ 11 ]
    [ 77  21   0   3 ]   [ x2 ]   [  9 ]
    [  0  79   0   0 ]   [ x3 ] = [  8 ]        (8.29)
    [ 26   0  68  24 ]   [ x4 ]   [ 51 ]
    [ 17   0   0  53 ]            [ 77 ]

The coefficient matrix is not square. Left multiplication by the transpose of this matrix gives the square system:

    [  7  65  22  10 ]   [ x1 ]   [ 30 ]
    [ 65  50  32  63 ]   [ x2 ]   [ 42 ]
    [ 22  32  88  11 ]   [ x3 ] = [ 80 ]        (8.30)
    [ 10  63  11  31 ]   [ x4 ]   [ 62 ]

By an abuse of notation, we let A denote the coefficient matrix of this square system. In practice, we do not compute A explicitly, but keep it as the product B^t B, where B is the 6 × 4 coefficient matrix of the original system. We feed System (8.30) to Algorithm 8.1. The computations made by the algorithm are summarized in the following table.
i vi αi βi i vi αi βi wi x
wi x
30 68 58 56
42 67 85 80
0 − − − 80 − 3 54 9 49 77
19
62 38 69 78
82 16 21 36 0 64
40 64 10 64 0 75
1 22 10 44 56 4 51 60 0 57
26 46
25 5 24 77 0 34
52 79 86
46 80 76
2 39 90 46 65
22
78 19 14
We get w4 = 0, and the Lanczos loop terminates. The value of x after this
iteration gives the solution of System (8.30): x1 = 64, x2 = 75, x3 = 57, x4 =
34. This happens to be a solution of System (8.29) too.
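For readers who wish to experiment, the following GP/PARI function is a minimal sketch of the standard Lanczos iteration just traced (it is not the full Algorithm 8.1; it assumes that A is symmetric and that no zero value of w_j^t A w_j is met before a zero direction vector appears):

    lanczos(A, b, p) =
    {
      my(Ap = A*Mod(1,p), bp = b*Mod(1,p), w, wprev, dprev = Mod(1,p), x, d, v, wnew);
      w = bp; wprev = 0*bp; x = 0*bp;
      while(w != 0*bp,
        d = (w~)*Ap*w;                     \\ w_j^t A w_j
        x += ((w~)*bp/d)*w;                \\ accumulate b_j w_j as in Eqn (8.27)
        v = Ap*w;
        wnew = v - ((w~)*Ap*v/d)*w - ((wprev~)*Ap*v/dprev)*wprev;
        wprev = w; dprev = d; w = wnew);
      x;
    }
    A = [7,65,22,10; 65,50,32,63; 22,32,88,11; 10,63,11,31];
    lanczos(A, [30,42,80,62]~, 97)         \\ should return [64, 75, 57, 34]~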
i vi αi βi wi x i vi αi βi wi x
2 5
6 6
0 − − − − 2 3 0 Failure
6 6
6 5
3 3 0
3 3 0
1 0 1
6 6 0
3 3 0
(3) In order to make Algorithm 8.1 succeed on System (8.31), we use the quadratic extension F_49 = F_7(θ) with θ^2 + 1 = 0. Since 49 is somewhat large compared to n = 4, the computations are now expected to terminate without error. We multiply (8.31) by the 6 × 6 diagonal matrix

    D = diag(6θ, 4θ + 3, 6θ + 1, 3θ + 5, θ, 2θ + 6)
8 Originally, Elwyn R. Berlekamp proposed this algorithm for decoding BCH codes (see
Berlekamp’s 1968 book Algebraic Coding Theory). James Massey (Shift-register synthe-
sis and BCH decoding, IEEE Transactions on Information Theory, 15(1), 122–127, 1969)
simplified and modified Berlekamp’s algorithm to the form presented in this book.
by A is an ideal of K[x]. The monic generator of this ideal, that is, the monic non-zero polynomial of the smallest degree which A satisfies, is called the minimal polynomial of A, and is denoted by µ_A(x). Clearly, µ_A(x) | χ_A(x) in K[x].
Wiedemann’s algorithm starts by probabilistically determining µ_A(x). Let
that is, if c_0 ≠ 0,

    x = c_0^{−1} (A^{d−1}b − c_{d−1}A^{d−2}b − c_{d−2}A^{d−3}b − · · · − c_2 Ab − c_1 b).    (8.43)
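A minimal GP/PARI sketch of this last step follows. It writes µ_A(x) monic as x^d + c_{d−1}x^{d−1} + · · · + c_1x + c_0 (so the signs differ from Eqn (8.43)), assumes c_0 ≠ 0, and, for brevity, obtains µ_A with the built-in minpoly instead of deriving it from a scalar sequence via Berlekamp–Massey as Wiedemann's algorithm actually does:

    wiedemann_solve(A, b, p) =
    {
      my(Ap = A*Mod(1,p), bp = b*Mod(1,p), mu = minpoly(Ap),
         d = poldegree(mu), c = Vec(mu), v = bp, x = 0*bp);
      \\ c = [1, c_{d-1}, ..., c_1, c_0]; accumulate S = sum_{k=1}^{d} c_k A^{k-1} b
      for(k = 1, d,
        x += c[d+1-k]*v;
        v = Ap*v);
      -x/c[d+1];        \\ x = -c_0^{-1} S solves Ax = b when c_0 is non-zero
    }
    wiedemann_solve([7,65,22,10; 65,50,32,63; 22,32,88,11; 10,63,11,31],
                    [30,42,80,62]~, 97)    \\ again [64, 75, 57, 34]~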
Although the Lanczos and the Wiedemann methods look different at the
first sight, there is a commonness between them. For a non-zero vector b, the
i-th Krylov vector is defined as ui−1 = Ai−1 b. Clearly, there exists s ∈ N such
that u0 , u1 , . . . , us−1 are linearly independent, whereas u0 , u1 , . . . , us−1 , us
are linearly dependent. The span of u0 , u1 , . . . , us−1 is called the Krylov space
for A and b. Both Lanczos and Wiedemann methods express the solution of
Ax = b as a linear combination of the Krylov vectors. In view of this, these
methods are often called Krylov space methods.
    V + W = {v + w | v ∈ V and w ∈ W},
    V^t W = {v^t w | v ∈ V and w ∈ W},
    AV = {Av | v ∈ V}.

V and W are called A-orthogonal if V^t A W = {0}, that is, if v^t A w = 0 for all v ∈ V and w ∈ W. Let the subspace V ⊆ K^n have dimension ν. Fix any basis v_0, v_1, . . . , v_{ν−1} of V, and consider the n × ν matrix V = (v_0 v_1 · · · v_{ν−1}). V is called A-invertible if the ν × ν matrix V^t A V is invertible. Since different
10 Peter L. Montgomery, A block Lanczos algorithm for finding dependencies over GF(2), EUROCRYPT, 106–120, 1995.
    W^{inv}_{i−1} = S_{i−1} (S^t_{i−1} V^t_{i−1} A V_{i−1} S_{i−1})^{−1} S^t_{i−1},               (8.47)
    D_i = W^{inv}_{i−1} (V^t_{i−1} A^2 V_{i−1} S_{i−1} S^t_{i−1} + V^t_{i−1} A V_{i−1}) − I_ν,    (8.48)
    E_i = W^{inv}_{i−2} V^t_{i−1} A V_{i−1} S_{i−1} S^t_{i−1},                                    (8.49)
    F_i = W^{inv}_{i−3} (I_ν − V^t_{i−2} A V_{i−2} W^{inv}_{i−2})
              (V^t_{i−2} A^2 V_{i−2} S_{i−2} S^t_{i−2} + V^t_{i−2} A V_{i−2}) S_{i−1} S^t_{i−1}.  (8.50)

The simplified formula is as follows:

    V_i = A V_{i−1} S_{i−1} S^t_{i−1} − V_{i−1} D_i − V_{i−2} E_i − V_{i−3} F_i.                  (8.51)

This formula is valid for i ≥ 3. For j < 0, we take V_j and W^{inv}_j as the n × ν and ν × ν zero matrices, and S_j as the identity matrix I_ν. If so, Eqn (8.51) holds for all i ≥ 1. Algorithm 8.3 summarizes the block Lanczos method.
Example 8.5 Let me demonstrate the working of the block Lanczos method
on artificially small parameters. We solve the following symmetric 10 × 10
system modulo two. Let us take the word size as ν = 4.
0 1 0 1 0 0 0 1 1 0 x1 1
1 0 0 1 1 0 1 1 1 0 x2 0
0 0 0 0 0 0 1 0 0 1 x3 0
1 1 0 0 1 1 1 1 0 1 x4 1
0 1 0 1 1 0 1 0 1 1 x5 = 1 .
0 0 0 1 0 1 0 0 0 1 x6 0
0 1 1 1 1 0 1 0 1 0 x7 0
1 1 0 1 0 0 0 1 0 0 x8 0
1 1 0 0 1 0 1 0 1 0 x9 0
0 0 1 1 1 1 0 0 0 0 x10 0
Initialization for i = −2, −1:  W^{inv}_{−2} = W^{inv}_{−1} = 0 (the 4 × 4 zero matrix), V_{−2} = V_{−1} = 0 (the 10 × 4 zero matrix), and S_{−2} = S_{−1} = I_4.
Initialization for i = 0:  We start with a randomly generated 10 × 4 matrix V_0. A set of three A-invertible columns of V_0 is chosen by the selection matrix S_0, giving the 10 × 3 matrix W_0 = V_0 S_0.
The loop terminates after this iteration, so this is the final solution. ¤
Let us now investigate when Algorithm 8.3 may fail because of the lack
of positive-definiteness of the matrix A. Example 8.5 illustrates the situation
that V3 has become 0, and so V3tAV3 is 0 too. In the earlier iteration (i = 2), V2
is non-zero, but V2tAV2 is of rank 3 (less than its dimension 4). The selection
matrix S2 ensures that only three vectors are added to the Krylov space.
In general, any V_i offers 2^ν − 1 non-empty choices for the selection matrix S_i. If V_i ≠ 0, any one of these choices that yields an A-invertible W_i suffices. Typically, ν ≥ 32, whereas we solve systems of size n no larger than a few hundreds of millions (larger systems are infeasible anyway). Moreover, during each iteration, multiple vectors are added to the Krylov space in general, that is, the number of iterations is expected to be substantially smaller than n.
Thus, it is highly probable that we can always find a suitable subset of the
columns of a non-zero Vi to form an A-invertible Wi . On the other hand, if
Vi = 0, no selection matrix Si can produce an A-invertible Wi . This means
that although the modulus is small (only 2), there is not much of a need to
work in extension fields for the algorithm to succeed.
for Toeplitz and almost Toeplitz matrices, Technical Report 538, Research Laboratory of
Electronics, Massachusetts Institute of Technology, December 1988.
15 We intend to solve System (8.36) or (8.52), where the variables are denoted by c or c_i. Here, we use x_i (or X_i in Example 8.7) for the variables, and likewise b (or B) for the right-hand side. This notational inconsistency is motivated by the fact that solving Toeplitz systems is itself of independent interest.
for a suitable choice of the scalar ǫ^{(i)} to be specified later. In this section, I use the parenthesized superscript (i) to indicate quantities in the i-th iteration. In an actual implementation, it suffices to remember the quantities from only the previous iteration. Superscripts are used for logical clarity. Subscripts are used for matrix and vector elements (like t_i) and dimensions (like 0_{i−1}).
For i = 1, we have t_0 x_1 = b_1, which immediately gives x^{(1)} = (t_0^{−1} b_1), provided that t_0 ≠ 0. We also take y^{(1)} = z^{(1)} = (1), that is, ǫ^{(1)} = t_0.
Suppose that T^{(i)} x^{(i)} = b^{(i)} is solved, and we plan to compute a solution of T^{(i+1)} x^{(i+1)} = b^{(i+1)}. At this stage, the vectors y^{(i)} and z^{(i)}, and the scalar ǫ^{(i)} are known. We write

    T^{(i+1)} = [ T^{(i)}               (t_{−i}, . . . , t_{−1})^t ]   =   [ t_0                     (t_{−1}, . . . , t_{−i}) ]
                [ (t_i, . . . , t_1)    t_0                        ]       [ (t_1, . . . , t_i)^t    T^{(i)}                  ].
This implies that the following equalities hold:
    T^{(i+1)} ( y^{(i)} ; 0 ) = ( ǫ^{(i)} ; 0_{i−1} ; −ǫ^{(i)} ξ^{(i+1)} )   and   T^{(i+1)} ( 0 ; z^{(i)} ) = ( −ǫ^{(i)} ζ^{(i+1)} ; 0_{i−1} ; ǫ^{(i)} ),

where

    ξ^{(i+1)} = −(1/ǫ^{(i)}) (t_i  t_{i−1}  · · ·  t_1) y^{(i)},  and          (8.55)
    ζ^{(i+1)} = −(1/ǫ^{(i)}) (t_{−1}  t_{−2}  · · ·  t_{−i}) z^{(i)}.          (8.56)

We compute y^{(i+1)} and z^{(i+1)} as linear combinations of ( y^{(i)} ; 0 ) and ( 0 ; z^{(i)} ):

    y^{(i+1)} = ( y^{(i)} ; 0 ) + ξ^{(i+1)} ( 0 ; z^{(i)} ),  and               (8.57)
    z^{(i+1)} = ( 0 ; z^{(i)} ) + ζ^{(i+1)} ( y^{(i)} ; 0 ).                    (8.58)

This requires us to take

    ǫ^{(i+1)} = ǫ^{(i)} (1 − ξ^{(i+1)} ζ^{(i+1)}).                              (8.59)

Finally, we update the solution x^{(i)} to x^{(i+1)} by noting that

    T^{(i+1)} ( x^{(i)} ; 0 ) = ( b^{(i)} ; η^{(i+1)} ) = b^{(i+1)} + ((η^{(i+1)} − b_{i+1})/ǫ^{(i+1)}) T^{(i+1)} z^{(i+1)},

where

    η^{(i+1)} = (t_i  t_{i−1}  · · ·  t_1) x^{(i)},                             (8.60)

that is,

    x^{(i+1)} = ( x^{(i)} ; 0 ) + ((b_{i+1} − η^{(i+1)})/ǫ^{(i+1)}) z^{(i+1)}.  (8.61)
Example 8.7  Let us again solve System (8.62) of Example 8.6. We take 2 × 2 blocks, and rewrite the system as

    [ T_0   T_{−1}  T_{−2} ] [ X_1 ]   [ B_1 ]
    [ T_1   T_0     T_{−1} ] [ X_2 ] ≡ [ B_2 ]   (mod 97),
    [ T_2   T_1     T_0    ] [ X_3 ]   [ B_3 ]

where T_0 = [85 92; 39 85], T_{−1} = [79 6; 92 79], T_{−2} = [15 72; 6 15], T_1 = [87 39; 42 87], T_2 = [82 42; 30 82], X_1 = (x_1; x_2), X_2 = (x_3; x_4), X_3 = (x_5; x_6), B_1 = (82; 53), B_2 = (31; 12), and B_3 = (14; 48).
Since matrix multiplication is not commutative in general, we need to use two potentially different ǫ^{(i)} and ǫ′^{(i)} satisfying the block version of Eqn (8.54):

    T^{(i)} Y^{(i)} = ( ǫ^{(i)} ; 0_{(i−1)×ν} )   and   T^{(i)} Z^{(i)} = ( 0_{(i−1)×ν} ; ǫ′^{(i)} ).

The corresponding updating equations for ξ^{(i+1)}, ζ^{(i+1)}, ǫ^{(i+1)}, ǫ′^{(i+1)} and x^{(i+1)} should be adjusted accordingly (solve Exercise 8.10).
Initialization (i = 1): The initial solution is

    X^{(1)} = T_0^{−1} B_1 = (80; 5).

Let us plan to take Y^{(1)} = Z^{(1)} = I_2, so that ǫ^{(1)} = ǫ′^{(1)} = T_0 = [85 92; 39 85].
    ξ^{(2)} = −(ǫ′^{(1)})^{−1} T_1 Y^{(1)} = [78 31; 63 11],
    ζ^{(2)} = −(ǫ^{(1)})^{−1} T_{−1} Z^{(1)} = [91 64; 69 61],
    ǫ^{(2)} = ǫ^{(1)} (I_2 − ζ^{(2)} ξ^{(2)}) = [29 85; 67 23],
    ǫ′^{(2)} = ǫ′^{(1)} (I_2 − ξ^{(2)} ζ^{(2)}) = [23 85; 67 29],
    Y^{(2)} = ( Y^{(1)} ; 0_{2×2} ) + ( 0_{2×2} ; Z^{(1)} ) ξ^{(2)} = [1 0; 0 1; 78 31; 63 11],
    Z^{(2)} = ( 0_{2×2} ; Z^{(1)} ) + ( Y^{(1)} ; 0_{2×2} ) ζ^{(2)} = [91 64; 69 61; 1 0; 0 1],
    η^{(2)} = T_1 X^{(1)} = (74; 12),
    X^{(2)} = ( X^{(1)} ; 0_2 ) + Z^{(2)} (ǫ′^{(2)})^{−1} (B_2 − η^{(2)}) = (80; 70; 13; 77).
    ξ^{(3)} = −(ǫ′^{(2)})^{−1} (T_2  T_1) Y^{(2)} = [61 44; 21 25],
    ζ^{(3)} = −(ǫ^{(2)})^{−1} (T_{−1}  T_{−2}) Z^{(2)} = [78 78; 15 32],
    ǫ^{(3)} = ǫ^{(2)} (I_2 − ζ^{(3)} ξ^{(3)}) = [41 43; 52 57],
    ǫ′^{(3)} = ǫ′^{(2)} (I_2 − ξ^{(3)} ζ^{(3)}) = [57 43; 52 41],
    Y^{(3)} = ( Y^{(2)} ; 0_{2×2} ) + ( 0_{2×2} ; Z^{(2)} ) ξ^{(3)} = [1 0; 0 1; 86 9; 24 13; 61 44; 21 25],
    Z^{(3)} = ( 0_{2×2} ; Z^{(2)} ) + ( Y^{(2)} ; 0_{2×2} ) ζ^{(3)} = [78 78; 15 32; 44 59; 7 89; 1 0; 0 1],
    η^{(3)} = (T_2  T_1) X^{(2)} = (54; 59),
    X^{(3)} = ( X^{(2)} ; 0_2 ) + Z^{(3)} (ǫ′^{(3)})^{−1} (B_3 − η^{(3)}) = (31; 67; 96; 3; 72; 48).
Note that the solutions X (1) , X (2) and X (3) in the block version are the same
as the solutions x(2) , x(4) and x(6) in the original version (Example 8.6). ¤
Exercises
1. [Lifting] Let p ∈ P and e ∈ N. Describe how you can lift solutions of Ax ≡ b (mod p^e) to solutions of Ax ≡ b (mod p^{e+1}).
2. For linear systems arising out of factorization algorithms, we typically need
multiple solutions of homogeneous systems. Describe how the standard Lanc-
zos method can be tuned to meet this requirement.
3. Repeat Exercise 8.2 for the block Lanczos algorithm. More precisely, show
how a block of solutions to the homogeneous system can be obtained.
4. In order to address the problem of self-orthogonality in the Lanczos algorithm, we modified the system Ax = b to A^t D^2 A x = A^t D^2 b for a randomly chosen invertible diagonal matrix D with entries from a suitable extension field. What is the problem if we plan to solve the system D A^t A x = D A^t b instead?
5. For the block Lanczos method, describe a method to identify the selection matrix S_i of Eqn (8.46).
6. The Wiedemann algorithm chooses elements of Ak v at particular positions.
This amounts to multiplying Eqn (8.42) from left by suitable projection vectors
u. Generalize this concept to work for any non-zero vectors u.
7. Prove that Levinson’s algorithm for solving Eqn (8.53), as presented in the
text, performs a total of about 3n2 multiplications and about 3n2 additions.
8. Prove that Eqn (8.53) is solvable by Levinson’s iterative algorithm if and only
if the matrices T (i) are invertible for all i = 1, 2, . . . , n.
9. Let T be an n × n Toeplitz matrix with rank r ≤ n. We call T to be of generic rank profile if T^{(i)} is invertible for all i = 1, 2, . . . , r. Describe a strategy to let Levinson’s algorithm generate random solutions of a solvable system T x = b, where T is a Toeplitz matrix of generic rank profile with rank r < n.
10. Write the steps of the block version of Levinson’s algorithm. For simplicity,
assume that each block of T is a square matrix of size ν × ν, and the entire
coefficient matrix T is also square of size n × n with ν | n.
11. Block algorithms are intended to speed up solving linear systems over F2 .
They are also suitable for parallelization. Explain how.
12. Let A = (a_{ij}) be an m × n matrix with entries from F_q. Suppose that m ≥ n. Let r denote the rank of A, and d = n − r the rank deficit (also called defect) of A. Denote the j-th column of A by A_j. A non-zero n-tuple (c_1, c_2, . . . , c_n) ∈ (F_q)^n, for which Σ_{j=1}^{n} c_j A_j = 0, is called a linear dependency of the columns of A. Let l denote the number of linear dependencies of the columns of A.
(a) Prove that l + 1 = q^d.
(b) Let the entries of A be randomly chosen. Prove that E(r) ≥ n − log_q(E(l) + 1), where E(X) is the expected value of the random variable X.
(c) How can you compute E(l), given a probability distribution for each a_{ij}?
Chapter 9
Public-Key Cryptography
2 Ronald Linn Rivest, Adi Shamir and Leonard Max Adleman, A method for obtaining
digital signatures and public-key cryptosystems, Communications of the ACM, 21(2), 120–
126, 1978.
Example 9.1 Suppose that Bob publishes the public RSA key:
n = 35394171409,
e = 7.
Somehow, it is leaked to Eve that Bob’s private key is
d = 15168759223.
Let us see how this knowledge helps Eve to factor n. Since ed − 1 = 2^11 × 51846345, we have s = 11 and t = 51846345. Eve chooses a = 5283679203, and computes b ≡ a^t ≡ 90953423 (mod n). Subsequently, for s′ = 0, 1, 2, . . . , 10, Eve computes gcd(b^{2^{s′}} − 1, n). It turns out that for s′ = 0, 1, 2, 3, this gcd is 1, and for s′ = 4, 5, . . . , 10, this gcd is n itself. This indicates that b has the same order (namely, 2^4) modulo both the prime factors p and q of n.
Eve then tries a = 985439671, for which b ≡ a^t ≡ 12661598494 (mod n). The gcd of b^{2^{s′}} − 1 with n is now 1 for s′ = 0, 1, 2, 3, it is 132241 for s′ = 4, 5, and n itself for s′ = 6, 7, 8, 9, 10. As soon as the non-trivial gcd p = 132241 is obtained (for s′ = 4), Eve stops and computes the cofactor q = n/p = 267649. The complete gcd trail is shown here to illustrate that Eve is successful in this attempt, because ord_p b = 2^4, whereas ord_q b = 2^6, in this case. ¤
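Eve's computation in this example can be reproduced with a few lines of GP/PARI (a quick illustrative script using the values of the example):

    n = 35394171409; e = 7; d = 15168759223;
    k = e*d - 1; s = valuation(k, 2); t = k >> s;     \\ ed - 1 = 2^s * t with t odd
    a = 985439671; b = Mod(a, n)^t;
    for(i = 0, s - 1,
      my(g = gcd(lift(b) - 1, n));
      if(g > 1 && g < n, print("factor found: ", g); break);
      b = b^2);                                       \\ should report 132241 at s' = 4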
Jean-Sébastien Coron and Alexander May, Deterministic polynomial-time equivalence of computing the RSA secret key and factoring, Journal of Cryptology, 20(1), 39–50, 2007.
Let us now investigate the connection between factoring n and RSA de-
cryption. Since the ciphertext c is sent through a public channel, an eaves-
dropper can intercept c. If she can factor n, she computes d ≡ e−1 (mod φ(n)),
and subsequently decrypts c as Bob does. However, the converse capability
of the eavesdropper is not mathematically established. This means that it is
not known whether factoring n is necessary to decrypt c, or, in other words,
if Eve can decrypt c from the knowledge of n and e only, she can also factor
n. At present, no algorithm other than factoring n is known (except for some pathological parameter values) for decrypting RSA-encrypted messages (without
the knowledge of the decryption exponent d). In view of this, we often say
that the security of RSA is based upon the intractability of the integer factor-
ing problem. It, however, remains an open question whether RSA decryption
(knowing n, e only) is equivalent to or easier than factoring n.
Example 9.2 Let us work with the modulus of Example 9.1.4 Bob chooses
the two primes p = 132241 and q = 267649, and computes n = pq =
35394171409 and φ(n) = (p−1)(q−1) = 35393771520. The smallest prime that
does not divide φ(n) is 7, so Bob takes e = 7. Extended gcd of e and φ(n) gives
d ≡ e−1 ≡ 15168759223 (mod φ(n)). Bob publishes (n, e) = (35394171409, 7)
as his public key, and keeps the private key d = 15168759223 secret.
For encrypting the message M = "Love", Alice converts M to an element m of Z_n. Standard 7-bit ASCII encoding gives m = 76 × 128^3 + 111 × 128^2 + 118 × 128 + 101 = 161217381. Alice obtains Bob’s public key (n, e), and encrypts m as c ≡ m^e ≡ 26448592996 (mod n). This value of c (in some encoding suitable for transmission) is sent to Bob via a public channel.
Bob decrypts as m ≡ c^d ≡ 161217381 (mod n), so M is decoded from m correctly as "Love." If any key other than Bob’s decryption exponent d is used to decrypt c, the output will be different from the message sent. For example, if Eve uses d′ = 4238432571, she gets m′ ≡ c^{d′} ≡ 21292182600 (mod n), whose 7-bit ASCII decoding gives the string "O(sXH." ¤
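The computations of this example are easily reproduced in GP/PARI (a short illustrative script with the values of the example):

    p = 132241; q = 267649; n = p*q; phi = (p-1)*(q-1);
    e = 7; d = lift(Mod(e, phi)^(-1));        \\ d = 15168759223
    m = 161217381;                            \\ 7-bit ASCII encoding of "Love"
    c = lift(Mod(m, n)^e);                    \\ c = 26448592996
    lift(Mod(c, n)^d) == m                    \\ decryption recovers m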
only for illustrating the working of the algorithms. In order to achieve a decent level of
security, practical implementations have to use much larger parameter values.
5 Taher ElGamal, A public-key cryptosystem and a signature scheme based on discrete logarithms, IEEE Transactions on Information Theory, 31(4), 469–472, 1985.
temporary or a session key pair is used for each encryption. The session key
must be different in different runs of the encryption algorithm. Both these
key pairs are of the form (d, g^d). Here, d is a random integer between 2 and
|G| − 1, and is used as the private key, whereas g^d is to be used as the public
key. Obtaining an ElGamal private key from the corresponding public key
is the same as computing a discrete logarithm in G, that is, the ElGamal
key-inversion problem is computationally equivalent to the DLP in G.
ElGamal encryption: In order to send a message m ∈ G (M is encoded
as an element in G) to Bob, Alice obtains Bob's public key g^d. Alice generates
a random integer d′ ∈ {2, 3, . . . , |G| − 1} (session private key), and computes
s = g^(d′) (session public key). Alice then masks the message m by the quantity
g^(dd′) as t = m (g^d)^(d′). The encrypted message (s, t) is sent to Bob.
ElGamal decryption: Upon receiving (s, t) from Alice, Bob recovers
m as m = t s^(−d) using his permanent private key d. The correctness of this
decryption is based on the fact that t s^(−d) = m g^(dd′) (g^(d′))^(−d) = m.
An eavesdropper possesses knowledge of G, g, g^d and s = g^(d′). If she can
compute g^(dd′), she decrypts the message as m = t (g^(dd′))^(−1). Conversely, if Eve
can decrypt (s, t), she computes g^(dd′) = t m^(−1). ElGamal decryption (using
public information only) is, therefore, as difficult as solving the DHP in G.
Example 9.3 The prime p = 35394171431 is chosen by Bob, along with
the generator g = 31 of G = F∗p . These parameters G and g may be used by
multiple (in fact, all) entities in a network. This is in contrast with RSA where
each entity should use a different n. Bob takes d = 4958743298 as his private
key, and computes and publishes the public key y ≡ 31^d ≡ 628863325 (mod p).
In order to encrypt m = 161217381 (the 7-bit ASCII encoding of "Love"),
Alice first generates a session key d′ = 19254627018, and computes s ≡ g^(d′) ≡
33303060050 (mod p) and t ≡ m y^(d′) ≡ 3056015643 (mod p). Alice sends the
pair (33303060050, 3056015643) to Bob over a public channel.
Bob decrypts the ciphertext (s, t) as m ≡ t s^(−d) ≡ 161217381 (mod p). If
any private key other than d is used, one expects to get a different recovered
message. If Eve uses d′′ = 21375157906 in place of d, she decrypts (s, t) as m′ ≡ t s^(−d′′) ≡
24041362599 (mod p), whose 7-bit ASCII decoding is "YGh)’." ¤
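A minimal GP/PARI sketch of these ElGamal computations (dd stands for the session secret d′; the variable names are mine):

p = 35394171431; g = 31;
d = 4958743298;  y = lift(Mod(g, p)^d);     \\ Bob's public key: 628863325
m = 161217381;
dd = 19254627018;                           \\ Alice's session secret d′
s = lift(Mod(g, p)^dd);                     \\ 33303060050
t = lift(Mod(m, p) * Mod(y, p)^dd);         \\ 3056015643
lift(Mod(t, p) * Mod(s, p)^(-d))            \\ Bob recovers 161217381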
In order to share a secret, Alice and Bob publicly agree upon a suitable
finite cyclic group G with a generator g. Alice chooses a random integer d ∈
{2, 3, . . . , |G|−1}, and sends g^d to Bob. Bob, in turn, chooses a random integer
d′ ∈ {2, 3, . . . , |G| − 1}, and sends g^(d′) to Alice. Alice computes g^(dd′) = (g^(d′))^d,
whereas Bob computes g^(dd′) = (g^d)^(d′). The element g^(dd′) ∈ G is the common
secret exchanged publicly by Alice and Bob. An eavesdropper knows g^d and
g^(d′) only, and cannot compute g^(dd′) if the DHP is infeasible in the group G.
Example 9.4 Alice and Bob publicly decide to use the group G = F∗p , where
p = 35394171431. They also decide the generator g = 31 publicly. Alice
chooses d = 5294364, and sends y ≡ g^d ≡ 21709635652 (mod p) to Bob. Like-
wise, Bob chooses d′ = 92215703, and sends y′ ≡ g^(d′) ≡ 31439131289 (mod p)
to Alice. Alice computes α ≡ y′^d ≡ 10078655355 (mod p), whereas Bob com-
putes β ≡ y^(d′) ≡ 10078655355 (mod p). The shared secret α = β may now be
put to use. For example, Alice and Bob may use the 7-bit ASCII decoding
"%Ep&{" of α = β as the key of a block cipher. The task of an eavesdropper is
to compute this shared secret knowing G, g, y and y′ only. This is equivalent
to solving an instance of a Diffie–Hellman problem in G. ¤
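A GP/PARI sketch of this exchange (da and db denote the two secrets; variable names are mine):

p = 35394171431; g = 31;
da = 5294364;   ya = lift(Mod(g, p)^da);   \\ Alice sends 21709635652
db = 92215703;  yb = lift(Mod(g, p)^db);   \\ Bob sends 31439131289
lift(Mod(yb, p)^da)                        \\ Alice's side: 10078655355
lift(Mod(ya, p)^db)                        \\ Bob's side:   10078655355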
Example 9.5 Bob generates the primes p = 241537 and q = 382069, and
computes n = pq = 92283800053 and φ(n) = (p − 1)(q − 1) = 92283176448.
The smallest prime not dividing φ(n) is chosen as the public exponent e = 5.
The corresponding private exponent is d ≡ e^(−1) ≡ 55369905869 (mod φ(n)).
Suppose that Bob plans to sign the message m = 1234567890 (perhaps
because he wants to donate that many dollars to a charity fund). He generates
the appendix as s ≡ m^d ≡ 85505674365 (mod n).
To verify Bob's signature, Alice obtains his public key e = 5, and computes
m′ ≡ s^e ≡ 1234567890 (mod n). Since m′ = m, the signature is verified.
A forger uses d′ = 2137532490 to generate the signature s′ ≡ m^(d′) ≡
84756771448 (mod n) on m. Verification by Bob's public key gives m′ ≡ s′^e ≡
23986755072 (mod n). Since m′ ≠ m, the forged signature is not verified. ¤
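A GP/PARI sketch of Example 9.5 (variable names are mine):

p = 241537; q = 382069;
n = p*q; phi = (p-1)*(q-1);
e = 5; d = lift(Mod(e, phi)^(-1));   \\ 55369905869
m = 1234567890;
s = lift(Mod(m, n)^d);               \\ appendix: 85505674365
lift(Mod(s, n)^e) == m               \\ verification: returns 1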
9.3.3 DSA
The digital signature algorithm (DSA)7 is an efficient variant of ElGamal's
signature scheme. The reduction in the running time arises from the fact that DSA
works in a subgroup of F∗p. Let q be a prime divisor of p − 1, having bit length
160 or more. To compute a generator g of the (unique) subgroup G of F∗p of
size q, we choose random h ∈ F∗p and compute g ≡ h^((p−1)/q) (mod p) until we
have g ≢ 1 (mod p). The parameters for DSA are p, q and g.
Bob sets up a permanent key pair by choosing a random d ∈ {2, 3, . . . , q−1}
(the private key) and then computing y ≡ g^d (mod p) (the public key).
For signing the message (representative) m, Bob chooses a random session
secret d′ ∈ {2, 3, . . . , q − 1}, and computes s = (g^(d′) (mod p)) (mod q) and
t ≡ (m + ds) d′^(−1) (mod q). Bob's signature on m is the pair (s, t).
For verifying this signature, one computes w ≡ t^(−1) (mod q), u1 ≡ mw
(mod q), u2 ≡ sw (mod q), and v ≡ (g^(u1) y^(u2) (mod p)) (mod q), and accepts if
and only if v = s. The correctness of this procedure is easy to establish.
The basic difference between ElGamal's scheme and DSA is that all expo-
nents in ElGamal's scheme are reduced modulo p − 1, whereas all exponents in
DSA are reduced modulo q. In order that the DLP in F∗p is difficult, one needs
to take the bit size of p to be at least 1024. By contrast, q may be as small as
160 bits long. Thus, the exponentiation time in DSA decreases by a factor of
(at least) six. The signature size also decreases by the same factor. The DSA
standard recommends a particular way of generating the primes p and q.
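A compact GP/PARI sketch of DSA with toy parameters follows; the parameter sizes, the variable names, and the omission of the rare s = 0 or t = 0 checks are simplifications of mine.

\\ parameter generation: a prime q and a prime p with q | p - 1
q = nextprime(2^16 + random(2^16));
k = 2; while (!isprime(k*q + 1), k += 2);
p = k*q + 1;
g = 1; while (g == 1, g = lift(Mod(random(p-2) + 2, p)^((p-1)/q)));
\\ permanent key pair
d = random(q-3) + 2;  y = Mod(g, p)^d;
\\ signing a message representative m with session secret dd
m = 1234567890 % q;
dd = random(q-3) + 2;
s = lift(Mod(g, p)^dd) % q;
t = lift((Mod(m, q) + d*s) / Mod(dd, q));
\\ verification
w = Mod(t, q)^(-1);
u1 = lift(m*w); u2 = lift(s*w);
lift(Mod(g, p)^u1 * y^u2) % q == s      \\ returns 1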
Bob chooses the permanent key pair as: d = 14723 (the private key) and
y ≡ g^d ≡ 62425452257 (mod p) (the public key).
Let m = 1234567890 be the message (representative) to be signed by
Bob. Bob chooses the session secret d′ = 9372, for which g^(d′) ≡ 58447941827
(mod p). Reduction modulo q of this value gives s = 764. Bob computes t ≡
(m + ds) d′^(−1) ≡ 17681 (mod q). Bob's signature on m is the pair (764, 17681).
To verify this signature, Alice computes w ≡ t^(−1) ≡ 19789 (mod q), u1 ≡
mw ≡ 12049 (mod q), u2 ≡ sw ≡ 5438 (mod q), and v = (g^(u1) y^(u2) (mod p))
(mod q) = 58447941827 (mod q) = 764. Since v = s, the signature is verified.
Let us see how verification fails on the forged signature (764, 8179). We
now have w ≡ t^(−1) ≡ 8739 (mod q), u1 ≡ mw ≡ 7524 (mod q), u2 ≡ sw ≡
2606 (mod q), and v = (g^(u1) y^(u2) (mod p)) (mod q) = 9836368153 (mod q) =
4872. Since v ≠ s, the forged signature is not verified. ¤
9.3.4 ECDSA
The elliptic-curve digital signature algorithm is essentially DSA adapted
to elliptic-curve groups. The same FIPS document (Footnote 7) that accepts
DSA as a standard includes ECDSA too.
Setting up the domain parameters for ECDSA involves some work. First,
a finite field Fq is chosen, where q is either a prime or a power of 2. We choose
two random elements a, b ∈ Fq, and consider the curve E : y^2 = x^3 + ax + b
if q is prime, or the curve y^2 + xy = x^3 + ax^2 + b if q is a power of 2. In
order to avoid the MOV/Frey–Rück attack, it is necessary that the curve be
non-supersingular. Let n be a prime divisor of |Eq|, having bit length at least 160,
and let h = |Eq|/n denote the corresponding cofactor. A random point G in
Eq of order n is chosen by first selecting a random point P on the curve and then
computing G = hP until some G ≠ O is found. In order to determine n and
to check that the curve E is not cryptographically weak (like supersingular
or anomalous), it is necessary to compute the order |Eq| by a point-counting
algorithm. This is doable in reasonable time, since q is typically restricted
to be no more than about 512 bits in length. The field size q (along with a
representation of Fq), the elements a, b of Fq defining the curve E, the integers
n, h, and the point G constitute the domain parameters for ECDSA.
The signer (Bob) chooses a random integer d ∈ {2, 3, . . . , n−1} (the private
key), and publishes the elliptic-curve point Y = dG (the public key).
To sign a message M, Bob maps M to a representative m ∈ {0, 1, 2, . . . ,
n − 1}. A random session key d′ ∈ {2, 3, . . . , n − 1} is generated, and the
point S = d′G is computed. The x-coordinate x(S) of S is reduced modulo
n to generate the first part of the signature: s ≡ x(S) (mod n). In order to
generate the second part, Bob computes t ≡ (m + ds) d′^(−1) (mod n). Bob's
signature on m (or M) is the pair (s, t).
In order to verify this signature, Alice obtains Bob's permanent public
key Y, and computes the following: w ≡ t^(−1) (mod n), u1 ≡ mw (mod n),
Bob then encrypts r, and sends the challenge c = Ee (r) to Alice. If Alice
knows the corresponding private key, she can decrypt c to recover r. If she
sends r back to Bob, Bob becomes sure of Alice’s capability to decrypt his
challenge. A third party having no knowledge of d can only guess a value of r,
and succeeds only with a very small probability.
Before sending the decrypted r′ back to Bob, Alice must check that Bob
is participating honestly in the protocol. If w 6= f (r′ ), Alice concludes that
the witness w does not establish Bob’s knowledge of r. The implication could
be that Bob is trying to make Alice decrypt some ciphertext message in order
to obtain the corresponding unknown plaintext message. In such a situation,
Alice should not proceed further with the protocol.
Example 9.9 Let me illustrate the working of Algorithm 9.1 in tandem with
RSA encryption. Suppose that Alice sets up her keys as follows: p = 132241,
q = 267649, n = pq = 35394171409, φ(n) = (p − 1)(q − 1) = 35393771520,
e = 7, d ≡ e^(−1) ≡ 15168759223 (mod φ(n)). Alice's knowledge of her private
key d is to be established in an authentication interaction with Bob.
Bob chooses the random element r = 2319486374, and sends its sum of
digits w = 2 + 3 + 1 + 9 + 4 + 8 + 6 + 3 + 7 + 4 = 47 to Alice as a witness of his
knowledge of r. Cryptographically, this is a very bad realization of f, since it is
neither one-way nor collision-resistant. But then, this is only a demonstration
example, not a real-life implementation.
Bob encrypts r as c ≡ r^e ≡ 22927769204 (mod n). Upon receiving the
challenge c, Alice first decrypts it: r′ ≡ c^d ≡ 2319486374 (mod n). Since the
sum of digits in r′ is w = 47, she gains the confidence that Bob really knows
r′ . She sends r′ back to Bob. Finally, since r′ = r, Bob accepts Alice. ¤
Example 9.10 Let us use the same RSA parameters and key pair of Alice as
in Example 9.9. Bob sends the random string c = 21321368768 to Alice. Alice,
in turn, generates the random string c′ = 30687013256, and combines c and
c′ as m ≡ c + c′ ≡ 16614210615 (mod n). Alice then generates her signature
s ≡ m^d ≡ 26379460389 (mod n) on m, and sends c′ and s together to Bob. Bob
uses the RSA verification primitive to get m′ ≡ s^e ≡ 16614210615 (mod n).
Since m′ ≡ c + c′ (mod n), Bob accepts Alice as authentic. ¤
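The arithmetic of Example 9.10 can be checked in GP/PARI (cp stands for c′; variable names are mine):

n = 35394171409; e = 7; d = 15168759223;
c = 21321368768; cp = 30687013256;
m = (c + cp) % n;                 \\ 16614210615
s = lift(Mod(m, n)^d);            \\ 26379460389
lift(Mod(s, n)^e) == m            \\ Bob's check: returns 1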
In order to see how the Fiat–Shamir protocol works, first take the simple
case k = 1, and drop all suffixes (so vi becomes v, for example). Alice proves
her identity by demonstrating her knowledge of s, the square v of which is
known publicly. Under the assumption that computing square roots modulo
a large integer n with unknown factorization is infeasible, no entity in the
network (except the TTP) can compute s from the published value v.
In an authentication interaction, Alice first chooses a random commitment
c which she later uses as a random mask for her response. Her commitment to c
is reflected by the witness w ≡ c^2 (mod n). Again because of the intractability
of the modular square-root problem, neither Bob nor any eavesdropper can
recover c from w. Now, Bob chooses a random challenge bit e. If e = 0, Alice
sends the commitment c to Bob. If e = 1, Alice sends cs to Bob. In both cases,
Bob squares this response, and checks whether this square equals w (for e = 0)
or wv (for e = 1). In order that the secret s is not revealed to Bob, Alice must
choose different commitments in different authentication interactions.
Let us now see how an adversary, Eve, can impersonate Alice. When Eve
selects a commitment, she is unaware of the challenge that Bob is going to
throw in future. In that case, her ability to supply valid responses to both
the challenge bits is equivalent to her knowledge of s. If Eve does not know
s, she can succeed with a probability of 1/2 as follows. Suppose that Eve
sends w ≡ c^2 (mod n) to Bob for some c chosen by her. If Bob challenges
with e = 0, she can send the correct response c to Bob. On the other hand,
if Bob’s challenge is e = 1, she cannot send the correct response cs, because
s is unknown to her. Eve may also prepare for sending the correct response
for e = 1 as follows. She chooses a commitment c but sends the improper
commitment c^2/v (mod n) to Bob. If Bob challenges with e = 1, she sends
the verifiable response c. On the other hand, if Bob sends e = 0, the verifiable
response would be c/s which is unknown to Eve.
If a value of k > 1 is used, Eve succeeds in an interaction with Bob with
probability 2^(−k). This is because Eve's commitment can successfully handle ex-
actly one of the 2^k different challenges (e1, e2, . . . , ek) from Bob. Example 9.11
illustrates the case k = 2. If 2^(−k) is not small enough, the protocol is repeated
t times, and the chance that Eve succeeds in all these interactions is 2^(−kt). By
choosing k and t suitably, this probability can be made as low as one desires.
A modification of the Fiat–Shamir protocol and a proof of the zero-
knowledge property of the modified protocol are from Feige, Fiat and Shamir.9
I will not deal with the Feige–Fiat–Shamir (FFS) protocol in this book.
Example 9.11 Suppose that the TTP chooses the composite integer n =
148198401661 to be used in the Fiat–Shamir protocol. It turns out that n is
the product of two primes, but I am not disclosing these primes, since the
readers, not being the TTP, are not supposed to know them.
Take k = 2. The TTP chooses the secrets s1 = 18368213879 and s2 =
94357932436 for Alice, and publishes their squares v1 ≡ s1^2 ≡ 119051447029
(mod n) and v2 ≡ s2^2 ≡ 100695453825 (mod n).
In an authentication protocol with Bob, Alice chooses the commitment c =
32764862846 (mod n), and sends the witness w ≡ c^2 ≡ 87868748231 (mod n)
to Bob. The following table shows the responses of Alice for different challenges
from Bob. The square r^2 and the product w v1^(e1) v2^(e2) match in all the cases.

  e1  e2   r (mod n)               r^2 (mod n)    w v1^(e1) v2^(e2) (mod n)
  0   0    c = 32764862846         87868748231    87868748231
  1   0    c s1 = 50965101270      49026257157    49026257157
  0   1    c s2 = 102354808490     32027759547    32027759547
  1   1    c s1 s2 = 65287381609   57409938890    57409938890
Let us now look at an impersonation attempt by Eve having no knowledge
of Alice’s secrets s1 , s2 and Bob’s challenges e1 , e2 . She can prepare for exactly
one challenge of Bob, say, (0, 1). She randomly chooses c = 32764862846, and
sends the (improper) witness w ≡ c^2/v2 ≡ 72251816136 (mod n) to Bob. If
Bob sends the challenge (0, 1), Eve responds by sending r = c = 32764862846,
for which r^2 ≡ 87868748231 (mod n), and w v2 ≡ 87868748231 too, that is, Eve
succeeds. However, if Eve has to succeed for the other challenges (0, 0), (1, 0)
and (1, 1), she needs to send the responses c/s2 , cs1 /s2 and cs1 , respectively,
which she cannot do since she knows neither s1 nor s2 . This illustrates that if
all of the four possible challenges of Bob are equally likely, Eve succeeds with
a probability of 1/4 only. ¤
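The verification equation r^2 ≡ w v1^(e1) v2^(e2) (mod n) is easy to check in GP/PARI with the numbers of Example 9.11 (the helper verify is my own):

n = 148198401661;
v1 = 119051447029; v2 = 100695453825;
w = 87868748231;
verify(r, e1, e2) = Mod(r, n)^2 == Mod(w, n) * Mod(v1, n)^e1 * Mod(v2, n)^e2;
verify(50965101270, 1, 0)    \\ the row (1, 0) of the table: returns 1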
that the different versions of the Diffie–Hellman problem discussed above are
difficult to solve for suitable choices of the groups. If the assumption G1 = G2
(= G) simplifies the exposition (but without a loss of security), I will not
hesitate to make this assumption. The reader should, however, keep in mind
that these protocols may often be generalized to the case G1 6= G2 .
Example 9.12 In all the examples of pairing-based protocols, I use the super-
singular curve E : y^2 = x^3 + x defined over the prime field Fp, p = 744283. We
have |Ep| = p + 1 = 4r, where r = 186071 is a prime. The point P = (6, 106202)
of order r generates the subgroup G = G1 of Ep. In order to fix the second
M = 1000000000100000000010000000001000000000.
Alice computes Bob's hashed public identity as PBob = (267934, 76182) for
which g = e(PBob, Ppub) = 239214θ + 737818. Now, Alice chooses the random
value a = 60294, and obtains
U = aP = (58577, 21875)
and g^a = 609914θ + 551077. Applying H2 on g^a gives
H2(g^a) = 609914 × 2^20 + 551077 = 639541733541
= 1001010011100111101010000110100010100101,
where the last expression for H2(g^a) is its 40-bit binary representation. Using
bit-wise XOR with M gives the second part of the ciphertext as
V = M ⊕ H2(g^a) = 0001010011000111101000000110101010100101.
Let us now see how the ciphertext (U, V) is decrypted by Bob. Bob first
computes e(DBob, U) = 609914θ + 551077. This is the same as g^a computed by
Alice. Therefore, this quantity after hashing by H2 and bit-wise xor-ing with
V recovers the message M that Alice encrypted.
An eavesdropper Eve intercepts (U, V), and uses D′Bob = (215899, 48408) ≠
DBob for decryption. This gives e(D′Bob, U) = 291901θ + 498758. Application
of H2 on this gives the bit string 0100011101000011110101111001110001000110.
When this is xor-ed with V, Eve recovers the message
M′ = 0101001110000100011101111111011011100011 ≠ M. ¤
Example 9.13 The TA uses the elliptic curve and the distorted Weil pairing
e : G × G → G3 as in Example 9.12. Let the master secret key be s = 592103,
for which the TA’s public key is Ppub = sP = (199703, 717555).
Suppose that Alice’s hashed public identity is PAlice = (523280, 234529), so
Alice’s secret key is DAlice = (360234, 27008). Likewise, if Bob’s hashed iden-
tity is PBob = (267934, 76182), Bob’s secret key is DBob = (621010, 360227).
In the key-agreement phase, Alice and Bob compute each other’s hashed
identity. Subsequently, Alice computes e(DAlice , PBob ) = 238010θ + 137679,
and Bob computes e(PAlice , DBob ) = 238010θ + 137679. Thus, the secret
shared by Alice and Bob is the group element 238010θ + 137679. Since
the distorted Weil pairing is symmetric about its two arguments, we have
e(DAlice , PBob ) = e(PBob , DAlice ) and e(PAlice , DBob ) = e(DBob , PAlice ), so it
is not necessary to decide which party’s keys go in the first argument. ¤
Example 9.15 The TA chooses the primes p = 142871 and q = 289031, and
computes n = pq = 41294148001 and φ(n) = (p−1)(q−1) = 41293716100. The
prime e = 103319 (not a factor of φ(n)) is chosen, and its inverse d ≡ e^(−1) ≡
35665134679 (mod φ(n)) is computed. The values of e and n are published.
In 7-bit ASCII encoding, the string Bob evaluates to IBob = 2^14 × 66 + 2^7 ×
111 + 98 = 1095650. Let us take this as Bob's public identity. His private key
is generated by the TA as DBob ≡ IBob^d ≡ 32533552181 (mod n).
Bob wants to sign the message m = 1627384950 ∈ Zn. He first chooses
x = 32465921980, and computes s ≡ x^e ≡ 30699940025 (mod n). The hash of
s and m is computed as H(s, m) ≡ sm ≡ 22566668067 (mod n) (this is not a
good hash function, but is used only for illustration). Finally, the second part
of the signature is computed as t ≡ DBob × x^(H(s,m)) ≡ 7434728537 (mod n).
For verifying the signature (s, t), one computes t^e ≡ 22911772376 (mod n)
and IBob × s^(H(s,m)) ≡ 22911772376 (mod n) (IBob is derived from the string
Bob as the TA did). These two quantities are equal, so the signature is verified.
Let (s, t′) be a forged signature, where s = 30699940025 as before (for
the choice x = 32465921980), but t′ = 21352176809 ≠ t. The quantity IBob ×
s^(H(s,m)) ≡ 22911772376 (mod n) remains the same as in the genuine signature,
but t′^e ≡ 9116775652 (mod n) changes, thereby invalidating the signature. ¤
Example 9.16 Let us continue to use the supersingular curve E, the gen-
erator P of the group G, and the distorted Weil pairing e : G × G → G3
as in Example 9.12. Suppose that the TA's master secret key is s = 219430,
for which the public key is Ppub = sP = (138113, 152726). Suppose also that
Bob's hashed identity is PBob = (267934, 76182) which corresponds to the
private key DBob = sPBob = (334919, 466375).
Bob wants to sign the message M which hashes to m = 12345 ∈ Zr. Bob
chooses the session secret d′ = 87413, and gets S = d′P = (513155, 447898).
For computing the second part T of the signature, Bob needs to use a hash
function H3 : G → Zr. Let us take H3(a, b) = ab (mod r). This is again not a
good hash function but is used here as a placeholder. For this H3, Bob gets
H3(S) = 553526, so T = d′^(−1)(mP − H3(S) DBob) = (487883, 187017).
For signature verification, Alice computes Bob's hashed public identity
PBob = (267934, 76182). Then, Alice computes W1 = e(P, P)^m = 45968θ +
325199, W2 = e(S, T) = 139295θ + 53887, and W3 = e(Ppub, PBob)^(H3(S)) =
61033θ + 645472. Since W1 = W2 W3, the signature is verified.
Now, let us see how a forged signature is not verified. A forger can generate
S = d′P = (513155, 447898) for the choice d′ = 87413, as Bob did. However,
since DBob is unknown to the forger, she uses a random T′ = (446698, 456705),
and claims that (S, T′) is the signature of Bob on the message M (or m). For
verifying this forged signature, one computes W1 = e(P, P)^m = 45968θ +
325199 and W3 = e(Ppub, PBob)^(H3(S)) = 61033θ + 645472 as in the case of the
genuine signature. However, we now have W2′ = e(S, T′) = 638462θ + 253684 ≠
W2. This gives W2′ W3 = 367570θ + 366935 ≠ W1. ¤
Exercises
1. Let H be a hash function.
(a) Prove that if H is collision resistant, then H is second preimage resistant.
(b) Give an example to corroborate that H may be second preimage resistant,
but not collision resistant.
(c) Corroborate by an example that H may be first preimage resistant, but
not second preimage resistant.
(d) Corroborate by an example that H may be second preimage resistant,
but not first preimage resistant.
2. Let M be the message to be signed by a digital signature scheme. Using a
hash function H, one obtains a representative m = H(M ), and the signature
is computed as a function of m and the signer’s private key. Describe how the
three desirable properties of H are required for securing the signature scheme.
3. Prove that for an n-bit hash function H, collisions can be found with high
probability after making about 2^(n/2) evaluations of H on random input strings.
4. For the RSA encryption scheme, different entities are required to use different
primes p, q. Argue why. Given that the RSA modulus is of length t bits (like
t = 1024), estimate the probability that two entities in a network of N entities
accidentally use a common prime p or q. You may assume that each entity
chooses p and q independently and randomly from the set of t/2-bit primes.
5. Let, in the RSA encryption scheme, the ciphertexts corresponding to messages
m1 , m2 ∈ Zn be c1 , c2 for the same recipient (Bob). Argue that the ciphertext
corresponding to m1 m2 (mod n) is c1 c2 (mod n). What problem does this
relation create? How can you remedy this problem?
6. Let Alice encrypt, using RSA, the same message for e identities sharing the
same public key e (under pairwise coprime moduli). How can Eve identify the
(common) plaintext message by intercepting the e ciphertext messages? Notice
that this situation is possible in practice, since the RSA encryption exponent
is often chosen as a small prime. How can this problem be remedied?
7. Let m ∈ Zn be a message to be encrypted by RSA. Count how many messages
m satisfy the identity me ≡ m (mod n). These are precisely the messages
which do not change after encryption.
8. Prove that the RSA decryption algorithm may fail to work for p = q, even if
one correctly takes φ(p2 ) = p2 − p. (If p = q, factoring n = p2 is trivial, and
the RSA scheme forfeits its security. Worse still, it does not work at all.)
9. To speed up RSA decryption, let Bob store the primes p, q (in addition to d),
compute cd modulo both p and q, and combine these two residues by the CRT.
Complete the details of this decryption procedure. What speedup is produced
by this modified decryption procedure (over directly computing c^d (mod n))?
10. Establish why a new session secret d′ is required for every invocation of the
ElGamal encryption algorithm.
11. Suppose that in the Diffie–Hellman key-agreement protocol, the group size |G|
has a small prime divisor. Establish how an active adversary (an adversary
who, in addition to intercepting a message, can modify the message and send
the modified message to the recipient) can learn a shared secret between Alice
and Bob. How can you remedy this attack?
12. Suppose that the group G in the Diffie–Hellman key-agreement protocol is
cyclic of size m, whereas g ∈ G has order n with n|m. Let f = m/n be the
cofactor. Suppose that f has a small prime divisor u, and Bob sends g^(d′)h or
h to Alice (but Alice sends g^d to Bob), where h ∈ G is an element of order u.
Suppose that Alice later uses a symmetric cipher to encrypt some message for
Bob using the (shared) secret key computed by Alice. Explain how Bob can
easily obtain d modulo u upon receiving the ciphertext. Explain that using
g^(f dd′) as the shared secret, this problem can be remedied.
13. Let G be a cyclic multiplicative group (like a subgroup of F∗q ) with a generator
g. Assume that the DLP is computationally infeasible in G. Suppose that
Alice, Bob and Carol plan to agree upon a common shared secret by the
Burmester–Desmedt protocol16 which works as follows.
1. Alice generates random a, and broadcasts Za = g^a.
2. Bob generates random b, and broadcasts Zb = g^b.
3. Carol generates random c, and broadcasts Zc = g^c.
4. Alice broadcasts Xa = (Zb/Zc)^a.
5. Bob broadcasts Xb = (Zc/Za)^b.
6. Carol broadcasts Xc = (Za/Zb)^c.
7. Alice computes Ka = Zc^(3a) Xa^2 Xb.
8. Bob computes Kb = Za^(3b) Xb^2 Xc.
9. Carol computes Kc = Zb^(3c) Xc^2 Xa.
Prove that Ka = Kb = Kc = g^(ab+bc+ca).
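A quick numeric sanity check of this identity in GP/PARI, with toy parameters of my choosing (not secure sizes, and not part of the exercise itself):

p = nextprime(10^6); g = Mod(2, p);
a = random(p-2) + 1; b = random(p-2) + 1; c = random(p-2) + 1;
Za = g^a; Zb = g^b; Zc = g^c;
Xa = (Zb/Zc)^a; Xb = (Zc/Za)^b; Xc = (Za/Zb)^c;
Ka = Zc^(3*a) * Xa^2 * Xb;
Kb = Za^(3*b) * Xb^2 * Xc;
Kc = Zb^(3*c) * Xc^2 * Xa;
Ka == Kb && Kb == Kc && Ka == g^(a*b + b*c + c*a)   \\ returns 1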
14. Assume that Bob uses the same RSA key pair for both encryption and sig-
nature. Suppose also that Alice sends a ciphertext c to Bob, and the corre-
sponding plaintext is m. Finally, assume that Bob is willing to sign a message
in Zn supplied by Eve. Bob only ensures that he does not sign a message (like
c) which has been sent by him as a ciphertext. Describe how Eve can still
arrange a message µ ∈ Zn such that Bob’s signature on µ reveals m to Eve.
15. Show that a situation as described in Exercise 9.5 can happen for RSA signa-
tures too. This is often termed existential forgery of signatures. Explain, in
this context, the role of the hash function H used for computing m = H(M).
16. Describe how existential forgery is possible for ElGamal signatures.
17. Explain why a new session key d′ is required during each invocation of the
ElGamal signature-generation procedure.
18. Suppose that for a particular choice of m and d′ in the ElGamal signature
generation procedure, one obtains t = 0. Argue why this situation must be
avoided. (If one gets t = 0, one should choose another random d′ , and repeat
the signing procedure until t 6= 0. For simplicity, this intricacy is not mentioned
in the ElGamal signature generation procedure given in the text.)
19. Show that in the DSA signature generation procedure, it is possible to have
s = 0 or t = 0. Argue why each of these cases must be avoided.
20. Show that for each message m ∈ Zn , there are at least two valid ECDSA
signatures (s, t1 ) and (s, t2 ) of Bob with the same s. Identify a situation where
there are more than two valid ECDSA signatures (s, t) with the same s.
21. In a blind signature scheme, Bob signs a message m without knowing the mes-
sage m itself. Bob is presented a masked version µ of m. For example, Bob
may be a bank, and the original message m may pertain to an electronic coin
belonging to Alice. Since money spending is usually desired to be anonymous,
Bob should not be able to identify Alice’s identity from the coin. However,
Bob’s active involvement (signature on µ) is necessary to generate his sig-
nature on the original message m. Assume that the RSA signature scheme is
used. Describe a method of masking m to generate µ such that Bob’s signature
on m can be easily recovered from his signature on µ.17
22. A batch-verification algorithm for signatures s1 , s2 , . . . , sk on k messages (or
message representatives) m1 , m2 , . . . , mk returns “signature verified” if each
si is a valid signature on mi for i = 1, 2, . . . , k. If one or more si is/are not
valid signature(s) on the corresponding message(s) mi , the algorithm should,
in general, return “signature not verified.” A batch-verification algorithm is
useful when its running time on a batch of k signatures is significantly smaller
than the total time needed for k individual verifications.
(a) Suppose that s1 , s2 , . . . , sk are RSA signatures of the same entity on mes-
sages m1 , m2 , . . . , mk . Describe a batch-verification procedure for these k sig-
natures. Establish the speedup produced by your batch-verification algorithm.
Also explain how the algorithm declares a batch of signatures as verified even
when one or more signatures are not individually verifiable.
(b) Repeat Part (a) on k DSA signatures (s1 , t1 ), (s2 , t2 ), . . . , (sk , tk ) from the
same entity on messages m1 , m2 , . . . , mk .
(c) Repeat Part (b) when the k signatures come from k different entities.
(d) What is the problem in adapting the algorithm of Part (b) or (c) to the
batch verification of ECDSA signatures?
23. Describe how a zero-knowledge authentication scheme can be converted to a
signature scheme.
24. Let n = pq be a product of two primes p, q each congruent to 3 modulo 4.
17 David Chaum, Blind signatures for untraceable payments, Crypto, 199–202, 1982.
(a) Prove that every quadratic residue modulo this n has four square roots,
of which exactly one is again a quadratic residue modulo n. Argue that from
the knowledge of p and q, this unique square root can be easily determined.
(b) Suppose that Alice wants to prove her knowledge of the factorization of n
to Bob using the following authentication protocol. Bob generates a random
x ∈ Zn, and sends the challenge c ≡ x^4 (mod n) to Alice. Alice computes the
unique square root r of c which is a square in Zn. Alice sends r to Bob. Bob
accepts Alice's identity if and only if r ≡ x^2 (mod n). Show that Bob can
send a malicious challenge c (not a fourth power) to Alice such that Alice’s
response r reveals the factorization of n to Bob.
25. Let e : G × G → G3 be a bilinear map (easily computable). Prove that the
DLP in G is no more difficult than the DLP in G3 .
26. In this exercise, we deal with Boneh and Boyen’s identity-based encryption
scheme.18 Let G, G3 be groups of prime order r, P a generator of G, and
e : G × G → G3 a bilinear map. The master secret key of the TA consists of
two elements s1 , s2 ∈ Z∗r , and the public keys are Y1 = s1 P and Y2 = s2 P .
In the registration phase for Bob, the TA generates a random t ∈ Z∗r , and
computes K = (PBob + s1 + s2 t)^(−1) P, where PBob ∈ Z∗r is the hashed public
identity of Bob, and where the inverse is computed modulo r. Bob's private
key is the pair DBob = (t, K).
In order to encrypt a message m ∈ G3 for Bob, Alice generates a random
k ∈ Z∗r, and computes the ciphertext (U, V, W) ∈ G × G × G3, where U =
k PBob P + k Y1, V = k Y2, and W = m × e(P, P)^k.
Describe how the ciphertext (U, V, W ) is decrypted by Bob.
27. Generalize the Sakai–Ohgishi–Kasahara key-agreement protocol to the general
setting of a bilinear map e : G1 × G2 → G3 (with G1 6= G2 , in general).19
28. Okamoto and Okamoto propose a three-party non-interactive key-agreement
scheme.20 The TA sets up a bilinear map e : G × G → G3 with r = |G| = |G3 |.
The TA also chooses a secret polynomial of a suitable degree k :
f(x) = d0 + d1 x + · · · + dk x^k ∈ Zr[x].
29. Sakai, Ohgishi and Kasahara propose an identity-based signature scheme (in
the same paper where they introduced their key-agreement scheme). The pub-
lic parameters r, G, G3 , e, P, Ppub , and the master secret key s are as in the
SOK key-agreement scheme. Also, let H : {0, 1}∗ → G be a hash function.
Bob has the hashed public identity PBob and the private key DBob = sPBob .
In order to sign a message M , Bob chooses a random d ∈ Zr , and computes
U = dP ∈ G. For h = H(PBob , M, U ), Bob also computes V = DBob +dh ∈ G.
Bob’s signature on M is (U, V ). Describe how this signature can be verified.
30. Cha and Cheon’s identity-based signature scheme21 uses a bilinear map e :
G × G → G3 , and hash functions H1 : {0, 1}∗ → G and H2 : {0, 1}∗ × G → Zr ,
where r = |G| = |G3 |. For the master secret key s, the TA’s public identity is
Ppub = sP , where P is a generator of G. Bob’s hashed (by H1 ) public identity
is PBob , and Bob’s private key is DBob = sPBob . Bob’s signature on a message
M is (U, V ), where, for a randomly chosen t ∈ Zr , Bob computes U = tPBob ,
and V = (t + H2 (M, U ))DBob . Explain how verification of (U, V ) is done in
the Cha–Cheon scheme. Discuss how the security of the Cha–Cheon scheme
is related to the bilinear Diffie–Hellman problem. Compare the efficiency of
the Cha–Cheon scheme with the Paterson scheme.
31. Boneh and Boyen propose short signature schemes (not identity-based).22
These schemes do not use hash functions. In this exercise, we deal with one
such scheme. Let e : G1 × G2 → G3 be a bilinear map with G1 , G2 , G3 hav-
ing prime order r. Let P be a generator of G1 , Q a generator of G2 , and
g = e(P, Q). The public parameters are r, G1 , G2 , G3 , e, P, Q, g. Bob selects a
random d ∈ Z∗r (private key), and makes Y = dQ public. Bob's signature on
the message m ∈ Zr is σ = (d + m)^(−1) P ∈ G1. Here, (d + m)^(−1) is computed
modulo r, and is taken to be 0 if r | (d + m). Describe a verification procedure
for this scheme. Argue that this verification procedure can be implemented to
be somewhat faster than verification in the BLS scheme.
32. Boneh and Boyen’s scheme presented in Exercise 9.31 is weakly secure in
some sense. In order to make the scheme strongly secure, they propose a
modification which does not yield very short signatures. Indeed, the signature
size is now comparable to that in DSA or ECDSA. For this modified scheme,
the parameters r, G1 , G2 , G3 , e, P, Q, g are chosen as in Exercise 9.31. Two
random elements d1 , d2 ∈ Z∗r are chosen by Bob as his private key, and the
elements Y1 = d1 Q ∈ G2 and Y2 = d2 Q ∈ G2 are made public. In order
to sign a message m ∈ Zr, Bob selects a random t ∈ Zr with t ≢ −(d1 +
m) d2^(−1) (mod r), and computes σ = (d1 + d2 t + m)^(−1) P ∈ G1, where the
inverse is computed modulo r. Bob's signature on m is the pair (t, σ). Describe
a verification procedure for this scheme. Argue that this verification procedure
can be implemented to be somewhat faster than BLS verification.
Programming Exercises
Using GP/PARI, implement the following functions.
33. RSA encryption and decryption.
34. ElGamal encryption and decryption.
35. RSA signature generation and verification.
36. ElGamal signature generation and verification.
37. DSA signature generation and verification.
38. ECDSA signature generation and verification.
Assuming that GP/PARI functions are available for Weil and distorted Weil
pairing on supersingular elliptic curves, implement the following functions.
39. Boneh–Franklin encryption and decryption.
40. Paterson signature generation and verification.
41. BLS short signature generation and verification.
Appendices
Appendix A
Background
of cache), and so on. It is, therefore, customary to express the running time of
an algorithm (or an implementation of it) in more abstract (yet meaningful)
terms. An algorithm can be viewed as a transformation that converts its input
I to an output O. The size n = |I| of I is usually the parameter, in terms of
which the running time of the algorithm is specified. This specification is good
if it can be expressed as a simple function of n. This leads to the following
order notations. These notations are not invented by computer scientists, but
have been used by mathematicians for ages. Computer scientists have only
adopted them in the context of analyzing algorithms.
Definition A.1 [Big-O notation] We say that a function f(n) is of the order
of g(n), denoted f(n) = O(g(n)), if there exist a positive real constant c and
a non-negative integer n0 such that f(n) ≤ c g(n) for all n ≥ n0. ⊳
Intuitively, f (n) = O(g(n)) implies that f does not grow faster than g
up to multiplication by a positive constant value. Moreover, initial patterns
exhibited by f and g (that is, their values for n < n0 ) are not of concern to
us. The inequality f(n) ≤ c g(n) must hold for all sufficiently large n.
Example A.2 (1) Let f(n) = 4n^3 − 16n^2 + 4n + 25, and g(n) = n^3. We
see that f(n) = 4(n + 1)(n − 2)(n − 3) + 1 > 0 for all integers n ≥ 0. That
f(2.5) = −2.5 < 0 does not matter, since we are not interested in evaluating
f at fractional values of n. We have f(n) ≤ 4n^3 + 4n + 25. But n ≤ n^3 and
1 ≤ n^3 for all n ≥ 1, so f(n) ≤ 4n^3 + 4n^3 + 25n^3 = 33n^3 for all n ≥ 1, that is,
f(n) = O(g(n)). Conversely, g(n) = n^3 = (1/3)(4n^3 − n^3) ≤ (1/3)(4n^3 − n^3 + 4n + 25) ≤
(1/3)(4n^3 − 16n^2 + 4n + 25) for all n ≥ 16, so g(n) = O(f(n)) too.
(2) The example of Part (1) can be generalized. Let f(n) = a_d n^d +
a_(d−1) n^(d−1) + · · · + a_1 n + a_0 be a polynomial with a_d > 0. For all sufficiently
large n, the term a_d n^d dominates over the other non-zero terms of f(n), and
consequently, f(n) = O(n^d), and n^d = O(f(n)).
(3) Let f(n) = 4n^3 − 16n^2 + 4n + 25 as in Part (1), but g(n) = n^4. For
all n > 1, we have f(n) ≤ 10n^3 ≤ 10n^4, so that f(n) = O(n^4). We prove by
contradiction that g(n) = n^4 is not O(f(n)). Suppose that g(n) = O(f(n)).
This implies that there exist constants c > 0 (real) and n0 ≥ 0 (integer) such
that n^4 ≤ c(4n^3 − 16n^2 + 4n + 25), that is, n^4 − c(4n^3 − 16n^2 + 4n + 25) ≤ 0 for
n ≥ n0. But (1/4)n^4 ≥ 4cn^3 for n ≥ 16c = r1 (say), (1/4)n^4 ≥ 4cn for n ≥ (16c)^(1/3) = r2,
and (1/4)n^4 ≥ 25c for n ≥ (100c)^(1/4) = r3. For any integer n ≥ max(n0, r1, r2, r3), we
have n^4 − c(4n^3 − 16n^2 + 4n + 25) = ((1/4)n^4 − 4cn^3) + 16cn^2 + ((1/4)n^4 − 4cn) +
((1/4)n^4 − 25c) + (1/4)n^4 ≥ (1/4)n^4 + 16cn^2 > 0, a contradiction.
More generally, if f(n) is a polynomial of degree d, and g(n) a polynomial
of degree e with 0 ≤ d ≤ e, then f(n) is O(g(n)). If e = d, then g(n) is O(f(n))
too. But if e > d, then g(n) is not O(f(n)). Thus, the degree of a polynomial
function determines its rate of growth under the O( ) notation.
(4) Let f(n) = 2 + sin n, and g(n) = 2 + cos n. Because of the bias 2, the
functions f and g are positive-valued. Evidently, f(n) > g(n) infinitely often,
and also g(n) > f(n) infinitely often. But 1 ≤ f(n) ≤ 3 and 1 ≤ g(n) ≤ 3 for
all n ≥ 0, that is, f(n) ≤ 3g(n) and g(n) ≤ 3f(n) for all n ≥ 0. Thus, f(n)
is O(g(n)) and g(n) is O(f(n)). Indeed, both these functions are of the order
of the constant function 1, and conversely. This is intuitively clear, since f(n)
and g(n) remain confined in the band [1, 3], and do not grow at all.
(5) Let f(n) = n^2 for even n and f(n) = n^3 for odd n, whereas g(n) = n^3 for
even n and g(n) = n^2 for odd n. We see
that f grows strictly faster than g for odd values of n, whereas g grows strictly
faster than f for even values of n. This implies that neither f(n) is O(g(n))
nor g(n) is O(f(n)). ¤
The big-O notation of Definition A.1 leads to some other related notations.
Definition A.3 Let f (n) and g(n) be functions.
(1) [Big-Omega notation] If f (n) = O(g(n)), we write g(n) = Ω(f (n)). (Big-
O indicates upper bound, whereas big-Omega indicates lower bound.)
(2) [Big-Theta notation] If f (n) = O(g(n)) and f (n) = Ω(g(n)), we write
f (n) = Θ(g(n)). (In this case, f and g exhibit the same rate of growth
up to multiplication by positive constants.)
(3) [Small-o notation] We say that f(n) = o(g(n)) if for every positive
constant c (however small), there exists n0 ∈ N0 such that f(n) ≤ c g(n)
for all n ≥ n0. (Here, g is an upper bound on f, which is not tight.)
(4) [Small-omega notation] If f (n) = o(g(n)), we write g(n) = ω(f (n)).
(This means that f is a loose lower bound on g.)
All these order notations are called asymptotic, since they compare the growths
of functions for all sufficiently large n. ⊳
Example A.4 (1) For any non-negative integer d and real constant a > 1,
we have n^d = o(a^n). In words, any exponential function asymptotically grows
faster than any polynomial function. Likewise, log^k n = o(n^d) for any positive
k and d, that is, any logarithmic (or poly-logarithmic) function asymptotically
grows more slowly than any polynomial function.
(2) Consider the subexponential function L(n) = exp(√(n ln n)). We have
L(n) = o(a^n) for any real constant a > 1. Also, L(n) = ω(n^d) for any integer
constant d > 0. This is why L(n) is called a subexponential function of n.
(3) It may be tempting to conclude that f(n) = o(g(n)) if and only if
f(n) = O(g(n)) but g(n) ≠ O(f(n)). For most functions we encounter during
analysis of algorithms, these two notions of loose upper bounds turn out to
be the same. However, there exists a subtle difference between them. As an
illustration, take f(n) = n for all n ≥ 0, whereas g(n) = n if n is odd and
g(n) = n^2 if n is even. We have f(n) = O(g(n)) and g(n) ≠ O(f(n)). But
f(n) ≠ o(g(n)), since for the choice c = 1/2, we cannot find an n0 such that
f(n) ≤ c g(n) for all n ≥ n0 (look at the odd values of n). ¤
Some comments about the input size n are in order now. There are several
units in which n can be expressed. For example, n could be the number of bits
in (some reasonable encoding of) the input I. When we deal with an array
of n integers each fitting into a standard 32-bit (or 64-bit) machine word, the
bit size of the array is 32n (or 64n). Since we are interested in asymptotic
formulas with constant factors neglected, it is often convenient to take the
size of the array as n. Physically too, this makes sense, since now the input
size is measured in units of words (rather than bits).
It may be the case that the input size is specified by two or more indepen-
dent parameters. An m × n matrix of integers has the input size mn (in terms
of words, or 32mn in terms of bits). However, m and n can be independent of
one another, and we often express the running time of a matrix-manipulation
algorithm as a function of two arguments m and n (instead of a function of
one argument mn). If m = n, that is, for square (n × n) matrices, the input
size is n2 , but running times are expressed in terms of n, rather than of n2 .
Here, n is not the input size, but a parameter that dictates the input size.
In computational number theory, we deal with large integers which do
not fit in individual machine words. Treating each such integer as having a
constant size is not justifiable. An integer k fits in ⌈log_(2^32) k⌉ 32-bit machine
words. A change in the base of logarithms affects this size by a constant
factor, so we take the size of k as log k. For a polynomial input of degree d
with coefficients modulo a large integer m, the input size is ≤ (d + 1) log m,
since the polynomial contains at most d + 1 non-zero coefficients, and ⌈log m⌉
is an upper bound on the size of each coefficient. We may treat the size of the
polynomial as consisting of two independent parameters d and log m.
The order notations introduced in Definitions A.1 and A.3 neglect constant
factors. Sometimes, it is useful to neglect logarithmic factors too.
Definition A.5 [Soft-O notation] Let f(n) and g(n) be functions. We say
that f(n) = Õ(g(n)) if f(n) = O(g(n) log^t g(n)) for some constant t > 0. ⊳
When an algorithm solves a problem of size n by breaking it into k subproblems
of sizes n1, n2, . . . , nk, its running time T(n) satisfies a recurrence of the form
T(n) = T(n1) + T(n2) + · · · + T(nk) + c(n),
where n1, n2, . . . , nk (and perhaps also k) depend upon n, and c(n) is the cost
associated with the generation of the subproblems and with the combination
of the solutions of the k subproblems. Such an expression of a function in terms
of itself (but with different argument values) is called a recurrence relation.
Solving a recurrence relation to obtain a closed-form expression of the function
is an important topic in the theory of algorithms.
F0 = 0,
F1 = 1,
Fn = Fn−1 + Fn−2 for n ≥ 2.
T(1) = 1,
T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n for n ≥ 2.
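As a quick sanity check, this recurrence can be evaluated directly in GP/PARI and compared against n log2 n (a small experiment of mine, not part of the text):

T(n) = if (n <= 1, 1, T(ceil(n/2)) + T(floor(n/2)) + n);
T(16)                                    \\ 80; for powers of two, T(2^t) = 2^t (t + 1)
T(1000) / (1000 * log(1000)/log(2))      \\ ratio about 1.1, consistent with T(n) = Θ(n log n)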
We plan to generalize this result to the claim that T(n) = Θ(n log n) (for all
values of n). Proving this claim rigorously involves some careful considerations.
For any n ≥ 1, we can find a t such that 2^t ≤ n < 2^(t+1). Since we already
know T(2^t) and T(2^(t+1)), we are tempted to write T(2^t) ≤ T(n) ≤ T(2^(t+1)) in
order to obtain a lower bound and an upper bound on T(n). But the catch
is that we are permitted to write these inequalities provided that T(n) is an
increasing function of n, that is, T(n) ≤ T(n + 1) for all n ≥ 1. I now prove
this property of T(n) by induction on n.
Since T(1) = 1 and T(2) = 2T(1) + 2 = 4, we have T(1) ≤ T(2). For the
inductive step, take n ≥ 2, and assume that T(m) ≤ T(m + 1) for all m < n.
If n = 2m (even), then T(n) = 2T(m) + 2m, whereas T(n + 1) = T(m + 1) +
T(m) + (2m + 1), that is, T(n + 1) − T(n) = T(m + 1) − T(m) + 1 > 0 (by the
induction hypothesis), that is, T(n) ≤ T(n + 1). If n = 2m + 1 (odd), then
T(n) = T(m+1) + T(m) + (2m+1), whereas T(n+1) = 2T(m+1) + (2m+2),
This analysis for merge sort can be easily adapted to a generalized setting
of certain types of divide-and-conquer algorithms. Suppose that an algorithm,
upon an input of size n, creates a ≥ 1 sub-instances, each of size (about) n/b.
Suppose that the total effort associated with the divide and combine steps is
Θ(n^d) for some constant d ≥ 0. The following theorem establishes the (tight)
order of the running time of this algorithm.
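The statement of the theorem itself falls outside this excerpt; presumably what is being referred to is the standard master theorem, which in the notation above reads:

T(n) = Θ(n^d)              if a < b^d,
T(n) = Θ(n^d log n)        if a = b^d,
T(n) = Θ(n^(log_b a))      if a > b^d.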
[Diagram: the array A laid out as consecutive blocks L, E, U, G, with the index i at the first cell of E (holding the pivot a), the index j at the first cell of U (holding the next unprocessed element x), and the index k at the last cell of U (holding the element y).]
Before entering the partitioning loop, the first element of A belongs to the
block E, whereas all the remaining n − 1 elements belong to the block U . If we
imagine that L is the empty block sitting before E and that G is the empty
block sitting after U , we start with an LEU G decomposition of A.
Inside the partitioning loop, the first element of U (that is, the element x
pointed to by j) is considered for processing. Depending upon the result of
comparing x with the pivot a, one of the following three actions is taken.
• If x = a, the region E grows by including x. This is effected by incre-
menting the index j (by one).
• If x > a, then x should go to the block G, that is, x may go to the location
indexed by k. But this location already contains an unprocessed element
y. The elements x and y are, therefore, swapped, and k is decremented to
mark the growth of the block G. But j is not altered, since it now points
to the unprocessed element y to be considered in the next iteration.
• If x < a, the element x should join the region L. Since L grows by
one cell, the entire block E should shift by one cell. However, since E
contains only elements with values equal to a, this shift of E can be
more easily implemented by exchanging x with the first element of E,
that is, by exchanging the elements pointed to by i and j. Both i and j
should then be incremented by 1 in order to indicate the advance of the
region E. The other end of U (that is, the index k) is not altered.
Each iteration of this loop processes one element from U . After exactly
n − 1 iterations, U shrinks to a block of size 0, that is, the desired LEG
decomposition of A is achieved. The partitioning procedure takes Θ(n) time,
since each of the n − 1 iterations involves only a constant amount of work.
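For concreteness, here is a GP/PARI sketch of this partitioning step (my own rendering, not code from the book); it returns the rearranged vector together with the final positions of i and j, so that E occupies positions i through j − 1:

partition(v) =
{
  my(n = #v, a = v[1], i = 1, j = 2, k = n, t);
  while (j <= k,
    if (v[j] == a, j++,
      if (v[j] > a,
        t = v[j]; v[j] = v[k]; v[k] = t; k--,      \\ x > a: send x to G, shrink U from the right
        t = v[i]; v[i] = v[j]; v[j] = t; i++; j++  \\ x < a: grow L by exchanging with the first cell of E
      )
    )
  );
  [v, i, j]
}

For example, partition([3, 1, 3, 5, 2, 3]) yields [[1, 2, 3, 3, 3, 5], 3, 6].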
Let us now investigate the running time of the quick-sort algorithm. The
original array A contains n elements. Suppose that, after the LEG decom-
position, the block L contains n1 elements, whereas the block G contains n2
elements. Since the pivot a was chosen as an element of the array A, the block
T(0) = 1,
T(1) = 1,
T(n) = T(n1) + T(n2) + n for n ≥ 2.
T (n) = T (n − 1) + T (0) + n = T (n − 1) + (n + 1)
= T (n − 2) + n + (n + 1)
= T (n − 3) + (n − 1) + n + (n + 1)
= ···
= T (0) + 2 + 3 + · · · + (n − 1) + n + (n + 1)
= 1 + 2 + 3 + · · · + (n − 1) + n + (n + 1)
= (n + 1)(n + 2)/2,
which polynomial-time algorithms are still not known. For example, primality
testing has always been in the class P, but it was only in August 2002 that we
came to know that this problem is indeed in P. This indicates that our understanding
of the boundary of the class P can never be clear. For certain problems, we can
prove superpolynomial lower bounds, so these problems are naturally outside
P. But problems like integer factorization would continue to bother us.
Intuitively, the class P contains precisely those problems that are easily
solvable. Of course, an O(n100 )-time algorithm would be practically as worth-
less as an O(2n )-time algorithm. Nonetheless, treating easy synonymously as
polynomial-time is a common perception in computer science.
An introduction to the class NP requires some abstraction. The basic idea
is to imagine algorithms that can guess. Suppose that we want to sort an array
A of n integers. Let there be an algorithm which, upon the input of A and
n, guesses the index i at which the maximum of A resides. The algorithm
then swaps the last element of A with the element at index i. Now that the
maximum of A is in place, we reduce the original problem to that of sorting
an array of n − 1 elements. By repeatedly guessing the maximum, we sort A
in n iterations, provided that each guess made in this process is correct.
There are two intricacies involved here. First, what is the running time of
this algorithm? We assume that each guess can be done in unit time. If so,
the algorithm runs in Θ(n) time. But who will guarantee that the guesses the
algorithm makes are correct? Nobody! One possibility to view this guessing
procedure is to create parallel threads, each handling a guess. In this case,
we talk about the parallel running time of this algorithm. Since the parallel
algorithm must behave gracefully for all input sizes n, there must be an infinite
number of computing elements to allow perfect parallelism among all guesses.
A second way to realize a guess is to make all possible guesses one after another
in a sequential manner. For the sorting example, the first guess involves n
possible indices, the second n−1 indices, the third n−2, and so on. Thus, there
is a total of n! guesses, among which exactly one gives the correct result. Since
n! = ω(2^n), we end up with an exponential-time (sequential) simulation of the
guessing algorithm. A third way of making correct guesses is an availability
of the guesses before the algorithm runs. Such a sequence of correct guesses is
called a certificate. But then, who will supply a certificate? Nobody. We can
only say that if a certificate is provided, we can sort in linear time.
For the sorting problem, there is actually not a huge necessity to guess. We
can compute the index i of the maximum in A in only O(n) time. When guess-
ing is replaced by this computation, we come up with an O(n2 )-time sorting
algorithm, popularly known as selection sort. For some other computational
problems, it is possible to design efficient guessing algorithms, for which there
is no straightforward way to replace guessing by an easy computation.
As an example, let us compute the discrete logarithm of a ∈ F∗p to a
primitive element g of F∗p (where p ∈ P). We seek an integer x such that
a ≡ g^x (mod p). Suppose that a bound s on the bit size of x is given. Initially,
we take 0 ≤ x ≤ p − 2, so the bit size of p supplies a bound on s. Let us write
of the problem. We replace guesses by random choices, and run the polynomial-
time verifier on these choices. If this procedure is repeated a few times, we
hope to arrive at the solution in at least one of these runs.
First, consider decision problems (problems with Yes/No answers). For
instance, the complement of the primality-testing problem, that is, checking
the compositeness of n ∈ N, is in NP, since a non-trivial divisor d of n (a divisor
in the range 2 ≤ d ≤ n − 1) is a succinct certificate for the compositeness of
n. This certificate can be verified easily by carrying out a division of n by d.
But then, how easy is it to guess such a non-trivial divisor of a composite
n? If n is the square of a prime, there exists only one such non-trivial divisor.
A random guess reveals this divisor with a probability of about 1/n, which
is exponentially small in the input size log n. Even when n is not of this
particular form, it has only a few non-trivial divisors, and trying to find one
by chance is like searching for a needle in a haystack.
An idea based upon Fermat's little theorem leads us to more significant
developments. The theorem states that if p is prime, and a is not a multiple of
p, then a^(p−1) ≡ 1 (mod p). Any a (coprime to n) satisfying a^(n−1) ≢ 1 (mod n) is
a witness (certificate) to the compositeness of n. We know that if a composite
n has at least one witness, then at least half of the elements of Z∗n are witnesses too. Therefore,
it makes sense that we randomly choose an element of Zn , and verify whether
our choice is really a witness. If n is composite (and has a witness), then after
only a few random choices, we hope to locate one witness, and become certain
about the fact that n is composite. However, if no witness is located in several
iterations, there are two possibilities: n does not have a witness at all, or
despite n having witnesses, we have been so unlucky that we missed them in
all these iterations. In this case, we declare n as prime with the understanding
that this decision may be wrong, albeit with a small probability.
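A GP/PARI rendering of this test might look as follows (a sketch of mine; the function names are not from the book):

\\ is a a Fermat witness to the compositeness of n?
is_witness(a, n) = gcd(a, n) == 1 && Mod(a, n)^(n-1) != 1;

\\ declare n "probably prime" if no witness is found in t random trials
fermat_test(n, t) =
{
  for (i = 1, t,
    my(a = random(n - 3) + 2);
    if (is_witness(a, n), return(0))   \\ certainly composite
  );
  1                                    \\ probably prime (or a Carmichael number!)
}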
The Fermat test exemplifies how randomization helps us to arrive at prac-
tical solutions to computational problems. A Monte Carlo algorithm is a ran-
domized algorithm which always runs fast but may supply a wrong answer
(with low probability). The Fermat test is No-biased, since the answer No
comes with zero probability of error. Yes-biased Monte Carlo algorithms, and
Monte Carlo algorithms with two-sided errors may be conceived of.
The problem with the Fermat test is that it deterministically fails to
find witnesses for Carmichael numbers. The Solovay–Strassen and the Miller–
Rabin tests are designed to get around this problem. There are deterministic
polynomial-time primality tests (like the AKS test), but the randomized tests
are much more efficient and practical than the deterministic tests.
Another type of randomized algorithms needs mention in this context. A
Las Vegas algorithm is a randomized algorithm that always produces the cor-
rect answer, but has a fast expected running time. This means that almost
always we expect a Las Vegas algorithm to terminate fast, but on rare occa-
sions, we may be so unlucky about the random guesses made in the algorithm
that the algorithm fails to supply the (correct) answer for a very long time.
Root-finding algorithms for polynomials over large finite fields (Section 3.2)
are examples of Las Vegas algorithms. Let f(x) ∈ Fq[x] be a polynomial with
q odd. For a random α ∈ Fq, the polynomial gcd((x + α)^((q−1)/2) − 1, f(x)) is
a non-trivial factor of f(x) with probability ≥ 1/2. Therefore, computing this
gcd for a few random values of α is expected to produce a non-trivial split of
f(x). The algorithm is repeated recursively on the two factors of f(x) thus
revealed. If f is of degree d, we need d − 1 splits to obtain all the roots of f.
However, we may be so unlucky that a huge number of choices for α fails
to produce a non-trivial split of a polynomial. Certainly, such a situation is
rather unlikely, and that is the reason why Las Vegas algorithms are useful in
practice. It is important to note that no deterministic algorithm that runs in
time polynomial in log q is known to solve this root-finding problem.
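A single splitting attempt along these lines can be written in GP/PARI roughly as follows (a sketch under the assumption that q is an odd prime, and that f already has its coefficients reduced modulo q and is a product of distinct linear factors):

split_once(f, q) =
{
  my(a = Mod(random(q), q), h);
  \\ compute (x + a)^((q-1)/2) mod f by repeated squaring, then subtract 1
  h = lift(Mod((x + a) * Mod(1, q), f)^((q - 1)/2)) - 1;
  gcd(h, f)    \\ a non-trivial factor of f with probability roughly 1/2
}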
As another example, let us compute a random prime of a given bit length
l. By the prime number theorem, the number of primes < 2^l is about 2^l/(0.693 l).
Therefore, if we randomly try O(l) l-bit integers, we expect with high probabil-
ity that at least one of these candidates is a prime. We subject each candidate
to a polynomial-time (in l) primality test (deterministic or randomized) until
a prime is located. The expected running time of this algorithm is polynomial
in l. But chances remain, however small, that even after trying a large number
of candidates, we fail to encounter a prime.
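A GP/PARI sketch of this procedure (the function name is our own; isprime here may be replaced by any of the primality tests discussed above):

randomlbitprime(l) = \
  local(n); \
  while (1, \
    n = 2^(l-1) + random(2^(l-1)); \
    if (n % 2 == 0, n++); \
    if (isprime(n), return(n)) \
  );

randomlbitprime(64)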
Randomized algorithms (also called probabilistic algorithms) are quite use-
ful in number-theoretic computations. They often are the most practical
among all known algorithms, and sometimes the only known polynomial-time
algorithms. However, there are number-theoretic problems for which even ran-
domization does not help much. For example, the best known algorithms for
factoring integers and for computing discrete logarithms in finite fields have
randomized flavors, and are better than the best deterministic algorithms
known for solving these problems, but the improvement in the running time
is from exponential to subexponential only.
A.2.2 Groups
We now study sets with operations.
Example A.13 (1) The set Z of integers is an Abelian group under addition.
The identity in this group is 0, and the inverse of a is −a. Multiplication is
an associative and commutative operation on Z, and 1 is the multiplicative
identity, but Z is not a group under multiplication, since multiplicative inverses
exist in Z only for the elements ±1.
(2) The set Q of rational numbers is an Abelian group under addition. The
set Q∗ = Q\{0} of non-zero rational numbers is a group under multiplication.
Likewise, R (the set of real numbers) and C (complex numbers) are additive
groups, and their multiplicative groups are R∗ = R \ {0} and C∗ = C \ {0}.
(3) The set A[x] of polynomials over A in one variable x (where A is Z,
Q, R or C) is a group under polynomial addition. Non-zero polynomials do
not form a group under polynomial multiplication, since inverses do not exist
for all elements of A[x] (like x). These results can be generalized to the set
A[x1 , x2 , . . . , xn ] of multivariate polynomials over A.
(4) The set of all m×n matrices (with integer, rational, real or complex en-
tries) is a group under matrix addition. The set of all invertible n × n matrices
with rational, real or complex entries is a group under matrix multiplication. ¤
Existence of inverses in groups leads to the following cancellation laws.
Proposition A.14 Let a, b, c be elements of a group G (with operation ⋄). If
a ⋄ b = a ⋄ c, then b = c. Moreover, if a ⋄ c = b ⋄ c, then a = b. ⊳
Definition A.15 Let G be a group under ⋄ . A subset H of G is called a
subgroup of G if H is also a group under the operation ⋄ inherited from G.
For a subset H of G to be a subgroup, it suffices that H is closed under the operation ⋄ and under taking inverses or, equivalently, that a ⋄ b^(−1) ∈ H for all a, b ∈ H, where b^(−1) denotes the inverse of b in G. ⊳
Example A.16 (1) Z is a subgroup of R (under addition).
(2) Z is a subgroup of Z[x] (under addition). The set of all polynomials
in Z[x] with even constant terms is another subgroup of Z[x].
(3) The set of n × n matrices with determinant 1 is a subgroup of all n × n
invertible matrices (under matrix multiplication). ¤
For the time being, let us concentrate on multiplicative groups, that is,
groups under some multiplication operations. This is done only for notational
convenience. The theory is applicable to groups under any operations.
Definition A.17 Let G be a group, and H a subgroup. For a ∈ G, the set
aH = {ah | h ∈ H} is called a left coset of H in G. Likewise, for a ∈ G, the
set Ha = {ha | h ∈ H} is called a right coset of H in G. ⊳
Proposition A.18 Let G be a group (multiplicatively written), H a subgroup,
and a, b ∈ G. The left cosets aH and bH are in bijection with one another. For
every a, b ∈ G, we have either aH = bH or aH ∩ bH = ∅. We have aH = bH
if and only if a−1 b ∈ H. ⊳
A similar result holds for right cosets too. The last statement should be
modified as: Ha = Hb if and only if ab−1 ∈ H.
Definition A.19 Let H be a subgroup of G. The count of cosets (left or
right, not both) of H in G is called the index of H in G, denoted [G : H]. ⊳
Proposition A.18 tells us that the left cosets (and likewise the right cosets) form a partition of G. It therefore follows that:
Corollary A.20 [Lagrange’s theorem] Let G be a finite group, and H a sub-
group. Then, the size of G is an integral multiple of the size of H. Indeed, we
have |G| = [G : H]|H|. ⊳
One particular type of subgroup is of special concern to us.
Proposition A.21 For a subgroup H of G, the following conditions are equivalent:
(a) aH = Ha for all a ∈ G.
(b) aHa−1 = H for all a ∈ G (where aHa−1 = {aha−1 | h ∈ H}).
(c) aha−1 ∈ H for all a ∈ G and for all h ∈ H. ⊳
Definition A.22 If H satisfies these equivalent conditions, it is called a nor-
mal subgroup of G. Every subgroup of an Abelian group is normal. ⊳
Definition A.23 Let H be a normal subgroup of G, and G/H denote the
set of cosets (left or right) of H in G. Define an operation on G/H as
(aH)(bH) = abH.
It is easy to verify that this is a well-defined binary operation on G/H, and
that G/H is again a group under this operation. H = eH (where e is the
identity in G) is the identity in G/H, and the inverse of aH is a−1 H. We call
G/H the quotient of G with respect to H. ⊳
Example A.24 (1) Take the group Z under addition, n ∈ N, and H =
nZ = {na | a ∈ Z}. Then, H is a subgroup of Z. Since Z is Abelian, H is
normal. We have [Z : H] = n. Indeed, all the cosets of H in Z are a + nZ for
a = 0, 1, 2, . . . , n − 1. We denote the set of these cosets as Zn = Z/nZ.
(2) Let G = Z[x] (additive group), and H the set of polynomials in G
with even constant terms. H is normal in G. In this case, [G : H] = 2. The
quotient group G/H contains only two elements: H and 1 + H. ¤
In many situations, we deal with quotient groups. In general, the elements
of a quotient group are sets. It is convenient to identify some particular element
of each coset as the representative of that coset. When we define the group
operation on the quotient group, we compute the representative of the result
from the representatives standing for the operands. For example, Zn is often
identified as the set {0, 1, 2, . . . , n − 1}, and the addition of Zn is rephrased in
terms of modular addition. Algebraically, an element a ∈ Zn actually stands
for the coset a + nZ = {a + kn | k ∈ Z}. The addition of cosets (a + nZ) +
(b + nZ) = (a + b) + nZ is consistent with addition modulo n.
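GP/PARI follows exactly this convention: an element of Z_n is stored through a representative of its coset, and the arithmetic is carried out modulo n. For example:

Mod(5, 7) + Mod(4, 7)         \\ (5 + 7Z) + (4 + 7Z) = 2 + 7Z, printed as Mod(2, 7)
lift(Mod(5, 7) + Mod(4, 7))   \\ the chosen representative 2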
Example A.31 The size of Z∗n is φ(n). If Z∗n is cyclic, then the number
of generators of Z∗n is φ(φ(n)). In particular, for a prime p, the number of
generators of Z∗p is φ(p − 1). ¤
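For a small concrete case (p = 11), these counts can be checked in GP/PARI:

eulerphi(11)       \\ 10: the size of Z*_11
eulerphi(11 - 1)   \\ 4: the number of generators of Z*_11
znprimroot(11)     \\ one such generator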
Definition A.37 The integer r in Theorem A.36 is called the rank of the
Abelian group G. (For finite Abelian groups, r = 0.) All elements of G with
finite orders form a subgroup of G, isomorphic to Zn1 × Zn2 × · · · × Zns , called
the torsion subgroup of G. ⊳
Example A.42 (1) Z is an integral domain, but not a field. The only units
of Z are ±1.
(2) Z[i] = {a + ib | a, b ∈ Z} ⊆ C is an integral domain called the ring of
Gaussian integers. The only units of Z[i] are ±1, ±i.
(3) Q, R, C are fields.
(4) Zn under addition and multiplication modulo n is a ring. The units
of Zn constitute the group Z∗n . Zn is a field if and only if n is prime. For a
prime p, the field Zp is also denoted as Fp .
(5) If R is a ring, the set R[x] of all univariate polynomials with coefficients
from R is a ring. If R is an integral domain, so too is R[x]. R[x] is never a
field. We likewise have the ring R[x1 , x2 , . . . , xn ] of multivariate polynomials.
(6) Let R be a ring. The set R[[x]] of (infinite) power series over R is a
ring. If R is an integral domain, so also is R[[x]].
(7) Let R be a field. The set R(x) = {f(x)/g(x) | f(x), g(x) ∈ R[x], g(x) ≠ 0} of rational functions over R is again a field.
(8) The set of all n × n matrices (with elements from a field) is a non-
commutative ring. It contains non-zero zero divisors (for n > 1).
(9) The Cartesian product R1 × R2 × · · · × Rn of rings R1 , R2 , . . . , Rn is
again a ring under element-wise addition and multiplication operations. ¤
Theorem A.79 Let V, W be K-vector spaces. The set of all K-linear maps
V → W , denoted HomK (V, W ), is a K-vector space with addition defined as
(f +g)(x) = f (x)+g(x) for all x ∈ V , and with scalar multiplication defined as
(af )(x) = af (x) for all a ∈ K and x ∈ V . If m = dimK V and n = dimK W
are finite, then the dimension of HomK (V, W ) as a K-vector space is mn. ⊳
Definition A.80 For a K-vector space V , the K-vector space HomK (V, K)
is called the dual space of V . The K-vector spaces V and HomK (V, K) are
isomorphic (and have the same dimension over K). ⊳
A.2.5 Polynomials
Polynomials play a crucial role in the algebra of fields. Let K be a field.
Since the polynomial ring K[x] is a PID, irreducible polynomials are the same as
prime polynomials in K[x], and are widely used for defining field extensions.
Let f (x) ∈ K[x] be an irreducible polynomial of degree n. The ideal I =
f (x)K[x] = {f (x)a(x) | a(x) ∈ K[x]} generated by f (x) ∈ K[x] plays a role
similar to that played by ideals generated by integer primes. The quotient
ring L = K[x]/I is a field. We have K ⊆ L, so L is a K-vector space. The
dimension of L over K is n = deg f . We call n the degree of the field extension
K ⊆ L, denoted [L : K]. L contains a root α = x + I of f (x). We say
that L is obtained by adjoining the root α of f to K. Other roots of f (x)
may or may not belong to L. Elements of L can be written as polynomials
a0 + a1 α + a2 α2 + · · · + an−1 αn−1 with unique ai ∈ K. This representation of
L also indicates that L has the dimension n as a vector space over K. Indeed,
1, α, α2 , . . . , αn−1 constitute a K-basis of L. We write L = K(α).
the two properly complex roots. The polynomial (x − 2^(1/3)ω)(x − 2^(1/3)ω²) = x² + 2^(1/3)x + 2^(2/3) is irreducible in Q(2^(1/3))[x]. Adjoining a root of this polynomial to Q(2^(1/3)) gives a field of extension degree two over Q(2^(1/3)) and six over Q.
(3) If K is a finite field, adjoining a root of an irreducible polynomial to
K also adds the other roots of the polynomial to the extension. ¤
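This construction can be carried out directly in GP/PARI; the following small sketch (our own illustration) builds F_9 = F_3[x]/(x² + 1) and works with the adjoined root α = x + I:

alpha = Mod(Mod(1,3)*x, Mod(1,3)*(x^2 + 1));
alpha^2         \\ equals Mod(2,3), that is, -1: alpha is a root of x^2 + 1
(1 + alpha)^8   \\ equals 1, since the multiplicative group of F_9 has order 8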
For α = a_1 α_1 + a_2 α_2 + · · · + a_n α_n, we have

    f(α) = a_1 f(α_1) + a_2 f(α_2) + · · · + a_n f(α_n)

         = ( f(α_1)  f(α_2)  · · ·  f(α_n) ) ( a_1  a_2  · · ·  a_n )^t

         = ( β_1  β_2  · · ·  β_m ) | c_{1,1}  c_{1,2}  · · ·  c_{1,n} | ( a_1  a_2  · · ·  a_n )^t
                                    | c_{2,1}  c_{2,2}  · · ·  c_{2,n} |
                                    |   ...      ...    · · ·    ...   |
                                    | c_{m,1}  c_{m,2}  · · ·  c_{m,n} |

         = ( β_1  β_2  · · ·  β_m ) M ( a_1  a_2  · · ·  a_n )^t,
where M = (ci,j ) is the m × n transformation matrix for f . The argument
α of f is specified by the scalars a1 , a2 , . . . , an , whereas the image f (α) =
b1 β1 + b2 β2 + · · · + bm βm is specified by the scalars b1 , b2 , . . . , bm . We have
    ( b_1  b_2  · · ·  b_m )^t = M ( a_1  a_2  · · ·  a_n )^t,
that is, application of f is equivalent to premultiplication by the matrix M .
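Concretely, applying a linear map given by its transformation matrix amounts to such a premultiplication in GP/PARI as well; here is a small sketch over F_7 (the matrix M and the coordinate vector are our own arbitrary example):

M = [Mod(1,7), Mod(2,7), Mod(3,7); Mod(4,7), Mod(5,7), Mod(6,7)];  \\ m = 2, n = 3
a = [Mod(1,7), Mod(0,7), Mod(2,7)]~;  \\ coordinates of alpha in the basis alpha_1, alpha_2, alpha_3
M * a                                 \\ coordinates of f(alpha) in the basis beta_1, beta_2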
Now, take two linear maps f, g ∈ Hom_K(V, W) (see Theorem A.79) with transformation matrices M and N respectively. The linear map f + g then has the transformation matrix M + N.
l-th column to zero. This completes the processing of the first row. We then
recursively reduce the submatrix of A, obtained by removing the first row
and the first l columns. When processing a system of equations (instead of
its coefficient matrix), we apply the same elementary transformations on the
right sides of the equations.
Example A.86 Consider the following system over F7 :
5x2 + 3x3 + x4 = 6,
6x1 + 5x2 + 2x3 + 5x4 + 6x5 = 2,
2x1 + 5x2 + 5x3 + 2x4 + 6x5 = 6,
2x1 + 2x2 + 6x3 + 2x4 + 3x5 = 0.
The matrix of coefficients (including the right sides) is:
0 5 3 1 0 6
6 5 2 5 6 2
2 5 5 2 6 6
2 2 6 2 3 0
We convert this system to an REF using the following steps.
    [ 6 5 2 5 6 | 2 ]
    [ 0 5 3 1 0 | 6 ]    [Exchanging Row 1 with Row 2]
    [ 2 5 5 2 6 | 6 ]
    [ 2 2 6 2 3 | 0 ]

    [ 1 2 5 2 1 | 5 ]
    [ 0 5 3 1 0 | 6 ]    [Multiplying Row 1 by 6^(−1) ≡ 6 (mod 7)]
    [ 2 5 5 2 6 | 6 ]
    [ 2 2 6 2 3 | 0 ]

    [ 1 2 5 2 1 | 5 ]
    [ 0 5 3 1 0 | 6 ]    [Subtracting 2 times Row 1 from Row 3]
    [ 0 1 2 5 4 | 3 ]
    [ 2 2 6 2 3 | 0 ]

    [ 1 2 5 2 1 | 5 ]
    [ 0 5 3 1 0 | 6 ]    [Subtracting 2 times Row 1 from Row 4]
    [ 0 1 2 5 4 | 3 ]    (Processing of Row 1 over)
    [ 0 5 3 5 1 | 4 ]

    [ 1 2 5 2 1 | 5 ]
    [ 0 1 2 3 0 | 4 ]    [Multiplying Row 2 by 5^(−1) ≡ 3 (mod 7)]
    [ 0 1 2 5 4 | 3 ]
    [ 0 5 3 5 1 | 4 ]

    [ 1 2 5 2 1 | 5 ]
    [ 0 1 2 3 0 | 4 ]    [Subtracting 1 times Row 2 from Row 3]
    [ 0 0 0 2 4 | 6 ]
    [ 0 5 3 5 1 | 4 ]

    [ 1 2 5 2 1 | 5 ]
    [ 0 1 2 3 0 | 4 ]    [Subtracting 5 times Row 2 from Row 4]
    [ 0 0 0 2 4 | 6 ]    (Processing of Row 2 over)
    [ 0 0 0 4 1 | 5 ]

    [ 1 2 5 2 1 | 5 ]    (Column 3 contains no non-zero element below Row 2,
    [ 0 1 2 3 0 | 4 ]     so we proceed to Column 4)
    [ 0 0 0 1 2 | 3 ]    [Multiplying Row 3 by 2^(−1) ≡ 4 (mod 7)]
    [ 0 0 0 4 1 | 5 ]

    [ 1 2 5 2 1 | 5 ]    [Subtracting 4 times Row 3 from Row 4]
    [ 0 1 2 3 0 | 4 ]    (Processing of Row 3 over)
    [ 0 0 0 1 2 | 3 ]
    [ 0 0 0 0 0 | 0 ]    (Row 4 is zero and needs no processing)
This last matrix is an REF of the original matrix. ¤
We can convert a matrix in REF to a matrix in RREF. In the REF-
conversion procedure above, we have subtracted suitable multiples of the cur-
rent row from the rows below the current row. If the same procedure is applied
to rows above the current row, the RREF is obtained. Notice that this addi-
tional task may be done after the REF conversion (as demonstrated in Ex-
ample A.87 below), or during the REF-conversion procedure itself. When the
original matrix is sparse, the second alternative is preferable, since an REF of
the original sparse matrix may be quite dense, so zeroing all non-pivot column
elements while handling a pivot element preserves sparsity.
Example A.87 The following steps convert the last matrix of Example A.86
to the RREF. This matrix is already in REF.
    [ 1 0 1 3 1 | 4 ]
    [ 0 1 2 3 0 | 4 ]    [Subtracting 2 times Row 2 from Row 1]
    [ 0 0 0 1 2 | 3 ]    (Handling Column 2)
    [ 0 0 0 0 0 | 0 ]

    [ 1 0 1 0 2 | 2 ]
    [ 0 1 2 3 0 | 4 ]    [Subtracting 3 times Row 3 from Row 1]
    [ 0 0 0 1 2 | 3 ]    (Handling Column 4)
    [ 0 0 0 0 0 | 0 ]

    [ 1 0 1 0 2 | 2 ]
    [ 0 1 2 0 1 | 2 ]    [Subtracting 3 times Row 3 from Row 2]
    [ 0 0 0 1 2 | 3 ]    (Handling Column 4)
    [ 0 0 0 0 0 | 0 ]
In the last matrix, Columns 1, 2 and 4 contain pivot elements, so all non-pivot
entries in these columns are reduced to zero. Columns 3 and 5 do not contain
pivot elements, and are allowed to contain non-zero entries. ¤
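The computations of Examples A.86 and A.87 can be cross-checked with the built-in linear algebra of GP/PARI; the following lines are our own verification over F_7:

A = [0,5,3,1,0; 6,5,2,5,6; 2,5,5,2,6; 2,2,6,2,3] * Mod(1,7);
b = [6, 2, 6, 0]~ * Mod(1,7);
matrank(A)              \\ 3, matching the three pivots found above
matinverseimage(A, b)   \\ one particular solution of A x = b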
Let us now come back to our original problem of solving a linear system
Ax = b. Using the above procedure, we convert (A | b) to an REF (or RREF)
Example A.91 Let us compute det A for the matrix A of Example A.89. If
we are interested in computing only the determinant of A, it is not necessary
to convert I5 to A−1 , that is, the REF conversion may be restricted only to
the first five columns. We have s1 = 4, s2 = 5, s3 = 6, s4 = 1, s5 = 5, and
t = 2, that is, det A ≡ (−1)^2 × 4 × 5 × 6 × 1 × 5 ≡ 5 (mod 7). ¤
Theorem A.93 For every matrix A, its row rank is the same as its column
rank. We refer to this common value as the rank of A or rank(A). If A is the
matrix of a K-linear map f : V → W (with dimK V = n and dimK W = m),
the rank of A is the same as the rank of f . ⊳
Definition A.94 The nullspace of A is the set of all solutions of the homo-
geneous system Ax = 0. These solutions, treated as n-tuples over K, form a
subspace of K n . The dimension of the nullspace of A is called the nullity of
A, denoted as nullity(A). ⊳
in each such sequence is the same for a given π. We call π an even or an odd permutation according to whether this parity is even or odd, respectively. It turns out that (for n > 2)
exactly half of the n! permutations in Sn are even, and the rest odd. Sn is a group under
composition. The set An of even permutations in Sn is a subgroup of Sn (of index 2).
For example, for n = 5, consider the following sequence of transpositions: 1, 2, 3, 4, 5 →
1, 5, 3, 4, 2 → 1, 4, 3, 5, 2 → 3, 4, 1, 5, 2 → 2, 4, 1, 5, 3. It follows that 2, 4, 1, 5, 3 is an even
permutation, whereas 3, 4, 1, 5, 2 is an odd permutation of 1, 2, 3, 4, 5.
2 2 6 2 3
ample A.86 over F7. We have seen that x3, x5 are free variables, and x1, x2, x4 are dependent variables. Therefore, rank(A) = 3 and nullity(A) = 2.
The RREF of A (Example A.87 and Example A.88(2)) indicates that all solutions of Ax = 0 can be written as

    x = x3 (6 5 1 0 0)^t + x5 (5 6 0 5 1)^t.

So (6 5 1 0 0)^t and (5 6 0 5 1)^t form a basis of the nullspace of A. ¤
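A quick GP/PARI check of this nullspace computation (our own verification):

A = [0,5,3,1,0; 6,5,2,5,6; 2,5,5,2,6; 2,2,6,2,3] * Mod(1,7);
matker(A)   \\ a 5 x 2 matrix; its columns span the same nullspace as the basis above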
minterm            x0 x1 x2 | f0 f1 f2 f3 f4 f5 f6 f7 f8 f9
T0 = x̄0 x̄1 x̄2      0  0  0 |  1  1  1  1  1  0  1  0  0  0
T1 = x̄0 x̄1 x2      0  0  1 |  0  1  0  1  1  0  1  1  0  1
T2 = x̄0 x1 x̄2      0  1  0 |  0  1  0  0  0  1  0  1  0  0
T3 = x̄0 x1 x2      0  1  1 |  1  0  1  0  1  0  0  0  0  0
T4 = x0 x̄1 x̄2      1  0  0 |  1  0  1  1  1  1  0  0  0  0
T5 = x0 x̄1 x2      1  0  1 |  1  0  0  0  1  0  1  1  0  0
T6 = x0 x1 x̄2      1  1  0 |  1  1  0  0  1  1  0  1  0  0
T7 = x0 x1 x2      1  1  1 |  1  1  1  1  0  0  0  0  1  0
The RREF of A is (the steps for calculating this are not shown here):
    [ 1 0 0 0 0 0 0 1 0 1 ]
    [ 0 1 0 0 0 1 0 1 0 0 ]
    [ 0 0 1 0 0 0 0 0 0 0 ]
    [ 0 0 0 1 0 1 0 0 0 0 ]
    [ 0 0 0 0 1 0 0 1 0 1 ]
    [ 0 0 0 0 0 0 1 1 0 0 ]
    [ 0 0 0 0 0 0 0 0 1 1 ]
    [ 0 0 0 0 0 0 0 0 0 0 ]
A.4 Probability
In a random experiment, there are several possibilities for the output. For
example, if a coin is tossed, the outcome may be either a head (H) or a tail
(T ). If a die is thrown, the outcome is an integer in the range 1–6. The life
of an electric bulb is a positive real number. To every possible outcome of
a random experiment, we associate a quantity called the probability of the outcome, which quantifies the likelihood of the occurrence of that outcome.
In general, there is no good way to determine the probabilities of events
in a random experiment. In order to find the probability of obtaining H in
the toss of a coin, we may toss the coin n times, and count the number h of
occurrences of H. Define pn = h/n. The probability of H is taken as the limit
p = limn→∞ pn (provided that the limit exists). The probability of T is then
q = 1 − p. Likewise, we determine the probability of each integer 1, 2, . . . , 6 by
throwing a die an infinite number of times. Computing the probability of the
life of an electric bulb is more complicated. First, there are infinitely many
(even uncountable) possibilities. Second, we need to measure the life of many
bulbs in order to identify a pattern. As the number of bulbs tends to infinity,
the pattern tends to the probability distribution of the life of a bulb.
Evidently, these methods for determining the probabilities of events in a
random experiment are impractical (usually infeasible). To get around this
and the closed interval [a, b] is {x ∈ R | a ≤ x ≤ b}. We also talk about intervals [a, b) and (a, b], closed at one end and open at the other.
If U and V are two discrete random variables with sample spaces X and
Y (both subsets of R), the random variable U + V has sample space X + Y =
{x + y | x ∈ X, y ∈ Y}. For z ∈ X + Y, the probability that the random variable U + V assumes the value z is Σ_{x+y=z} Pr(U = x, V = y), where the probability Pr(U = x, V = y) is the joint probability that U = x and V = y.
The random variables U and V are called independent if Pr(U = x, V = y) =
Pr(U = x) × Pr(V = y) for all x ∈ X and y ∈ Y .
The product UV is again a random variable with sample space XY = {xy | x ∈ X, y ∈ Y}. For z ∈ XY, the probability that the random variable UV assumes the value z is Σ_{xy=z} Pr(U = x, V = y).
The sum and product of continuous random variables can also be defined
(although I do not do it here).
The expectation of a real-valued discrete random variable U with sample space X is defined as

    E(U) = Σ_{x∈X} x Pr(U = x).
In the worst case, n+1 elements should be chosen to ensure that there is at least one collision. However, we expect collisions to occur much earlier, that is, for much smaller values of k. More precisely, for k ≈ 1.18 √n, the probability of collision is more than 1/2, whereas for k ≈ 3.06 √n, the probability is more than 0.99. In short, Θ(√n) choices suffice to obtain collisions with very high probability. Assuming that each of the 365 days in the year is equally likely to be the date of birth of a human, only 23 randomly chosen people have a chance of at least half to contain a pair with the same birthday. For a group of 58 randomly chosen people, this probability is at least 0.99.
It is not difficult to derive the probability of collision. The probability that there is no collision in k draws is n(n − 1)(n − 2) · · · (n − k + 1)/n^k, that is, the probability of at least one collision is

    pcollision(n, k) = 1 − n(n − 1)(n − 2) · · · (n − k + 1)/n^k
                     = 1 − (1 − 1/n)(1 − 2/n) · · · (1 − (k−1)/n).

For a positive real value x much smaller than 1, we have 1 − x ≈ e^(−x). So long as k is much smaller than n, we then have

    pcollision(n, k) ≈ 1 − e^(−[1+2+···+(k−1)]/n) = 1 − e^(−k(k−1)/(2n)) ≈ 1 − e^(−k²/(2n)).

Stated differently, we have

    k ≈ sqrt(−2 ln(1 − pcollision(n, k))) × √n.

Plugging in pcollision(n, k) = 1/2 gives k ≈ 1.1774 √n, whereas pcollision(n, k) = 0.99 gives k ≈ 3.0349 √n.
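These numbers are easy to reproduce in GP/PARI; the helper function below is our own illustration:

pcollision(n, k) = 1 - prod(i=1, k-1, 1 - i/n);
pcollision(365, 23) * 1.    \\ about 0.507
pcollision(365, 58) * 1.    \\ about 0.992
1 - exp(-23^2 / (2*365.))   \\ the approximation 1 - e^(-k^2/(2n))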
sources of random bits. Hardware RNGs are called true RNGs, because they
possess good statistical properties, at least theoretically. But hardware RNGs
are costly and difficult to control, and it is usually impossible to use them to
generate the same random sequence at two different times and/or locations.
Software RNGs offer practical solutions to the random-number generation
problem. They are called pseudorandom number generators (PRNG), since
they use known algorithms to generate sequences of random bits (or integers
or floating-point numbers). A PRNG operates on a seed initialized to a value
s0 . For i = 1, 2, 3, . . . , two functions f and g are used. The next element in the
pseudorandom sequence is generated as xi = f (si ), and the seed is updated to
si+1 = g(si ). In many cases, the seed itself is used as the random number, that
is, xi = si . PRNGs are easy to implement (in both hardware and software),
and are practically usable if their outputs look random.
Most commonly, PRNGs are realized using linear congruential generators
in which the pseudorandom sequence is generated as xi+1 ≡ axi + b (mod m)
for some suitable modulus m, multiplier a, and increment b. The parameters
a, b, m should be carefully chosen so as to avoid pseudorandom sequences of
poor statistical properties. An example of a linear congruential generator is the ANSI-C generator defined by m = 2^31, a = 1103515245, and b = 12345. The
seed is initialized to x0 = 12345. This PRNG is known to have many flaws
but can be used in many practical situations.
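As an illustration, a minimal GP/PARI sketch of this generator with the stated ANSI-C parameters (the function name lcgnext is our own):

m = 2^31; a = 1103515245; b = 12345;
s = 12345;                                 \\ the seed x_0
lcgnext() = s = (a*s + b) % m; return(s);  \\ returns x_1, x_2, ... on successive calls
vector(5, i, lcgnext())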
PRNGs with provably good statistical properties are also known. In 1986, Lenore Blum, Manuel Blum and Michael Shub proposed a PRNG called the Blum–Blum–Shub or BBS generator. This PRNG uses a modulus m = pq, where both p and q are suitably large primes congruent to 3 modulo 4. The sequence generation involves a modular squaring: x_(i+1) ≡ x_i² (mod m). It is not very practical, since modular operations on multiple-precision integers are not very efficient. Moreover, a statistical drawback of the BBS generator is that each x_i (except perhaps for i = 0) is a quadratic residue modulo m, that is, not all elements of Z*_m are generated by the BBS generator. It, however,
is proved that if only O(log log m) least significant bits of xi are used in the
output stream, then the output bit sequence is indistinguishable from random,
so long as factoring the modulus m is infeasible. In view of this, the BBS
generator is often used in cryptographic applications.
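For concreteness, here is a toy-sized GP/PARI sketch of the BBS generator (the primes, the seed, and the function name bbsbit are our own illustrative choices; real uses take p and q large enough that factoring m is infeasible):

p = nextprime(10^4);   while (p % 4 != 3, p = nextprime(p+1));
q = nextprime(2*10^4); while (q % 4 != 3, q = nextprime(q+1));
m = p * q;
x = Mod(1234567, m)^2;                    \\ square the seed so that x is a quadratic residue
bbsbit() = x = x^2; return(lift(x) % 2);  \\ one modular squaring per output bit
vector(16, i, bbsbit())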
So far, we have concentrated on generating uniformly random samples (bits
or elements of Zm ). Samples following other probability distributions can also
be generated using a uniform PRNG. The concept of cumulative probability
helps us in this context.
Let U be a random variable with sample space X. For simplicity, suppose
that X = {x1, x2, . . . , xn} is finite. Let pi = Pr(U = xi). We break the real interval [0, 1) into n disjoint sub-intervals: I1 = [0, p1), I2 = [p1, p1 + p2),
I3 = [p1 + p2 , p1 + p2 + p3 ), . . . , In = [p1 + p2 + · · · + pn−1 , 1). We then generate
a uniformly random floating-point number x ∈ [0, 1). There is a unique k such
that Ik contains x. We find out this k (for example, using binary search), and
output xk . It is easy to argue that the output follows the distribution of U .
If the elements of X are ordered as x1 < x2 < x3 < · · · < xn, the key role is played here by the cumulative probabilities Pr(U ≤ xk) = p1 + p2 + · · · + pk.
In the case of a continuous random variable U, the summation is to be replaced by integration. As an example, consider the exponential distribution standing for the lifetime of an electric bulb: f(x) = e^(−x) for all x ≥ 0 (see Example A.103(4); we have taken λ = 1 for simplicity). In order to generate a bulb sample with a lifetime following this distribution, we obtain the cumulative probability distribution: F(x) = Pr(U ≤ x) = ∫_0^x e^(−t) dt = 1 − e^(−x). We generate a uniformly random floating-point value y ∈ [0, 1), and output the x satisfying y = F(x) = 1 − e^(−x), that is, x = ln(1/(1 − y)).
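A GP/PARI sketch of this inversion method for the exponential distribution (the function name expsample and the 10^9-step discretization of [0, 1) are our own choices):

expsample() = y = random(10^9) * 1.0 / 10^9; return(log(1/(1 - y)));
vector(5, i, expsample())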
Appendix B
Solutions to Selected Exercises
a = A2 R2 + A1 R + A0 ,
b = B1 R + B0 ,
c = C3 R3 + C2 R2 + C1 R + C0 ,
where
C3 = A2 B1 ,
C2 = A2 B0 + A1 B1 ,
C1 = A1 B0 + A0 B1 ,
C0 = A0 B0 .
C3 = c(∞) = A2 B1 ,
C0 = c(0) = A0 B0 ,
C3 + C2 + C1 + C0 = c(1) = (A2 + A1 + A0 )(B1 + B0 ),
−C3 + C2 − C1 + C0 = c(−1) = (A2 − A1 + A0 )(−B1 + B0 ).
The four products on the right side are computed by a good multiplication
algorithm (these products have balanced operands). Since
    [  1  0  0  0 ] [ C3 ]   [ c(∞)  ]
    [  0  0  0  1 ] [ C2 ] = [ c(0)  ]
    [  1  1  1  1 ] [ C1 ]   [ c(1)  ]
    [ −1  1 −1  1 ] [ C0 ]   [ c(−1) ]
ui a + vi b = ri .
ui = ui−2 − ui−1 ,
vi = vi−2 − vi−1 .
The next step is more complicated. Now, ri is even, so an appropriate number
of 2’s should be factored out of it so as to make it odd. Here, I describe the
removal of a single factor of 2. Since ui a + vi b = ri , dividing ri by 2 requires
dividing both ui and vi by 2 so that the invariance is maintained. But there
is no guarantee that ui , vi are even. However, since a, b are odd, exactly one
u′a′ + v′b′ = d′ = gcd(a′, b′).
2^r u′a′ + 2^r v′b′ = u′(2^r a′) + v′(2^r b′) = 2^r d′ = d.
u′(2^τ a) + v′b = d
|ui+2 | = |ui | + qi+2 |ui+1 | = qi+2 Ki−1 (q3 , . . . , qi+1 ) + Ki−2 (q3 , . . . , qi ) =
Ki (q3 , . . . , qi+2 ). Thus, we have:
|ui | = Ki−2 (q3 , . . . , qi ) for all i > 2.
Analogously, we have:
|vi | = Ki−1 (q2 , . . . , qi ) for all i > 1.
(c) Take n = i − 2, and substitute x1 = q2 , x2 = q3 , . . . , xn+1 = qi in
Exercise 1.15(e) to get |ui |Ki−2 (q3 , . . . , qi ) − |vi |Ki−3 (q3 , . . . , qi−1 ) = (−1)i−2
for all i > 3. Moreover, since u0 = v1 = u2 = 1, we have:
gcd(ui , vi ) = 1 for all i > 0.
(d) If rj = 0, that is, rj−1 = gcd(a, b) = d, we have uj a + vj b = 0, that is, uj (a/d) = −vj (b/d). By Part (c), gcd(uj, vj) = 1. Moreover, gcd(a/d, b/d) = 1. It therefore follows that uj = ±(b/d) and vj = ∓(a/d).
(e) The case b | a is easy to handle. So assume that a > b > 0 and b ∤ a. Then, the gcd loop ends with j ≥ 4. In this case, we have |u3| < |u4| < · · · < |uj−1| < |uj| = b/d and |v2| < |v3| < · · · < |vj−1| < |vj| = a/d, that is, |uj−1| < b/d and |vj−1| < a/d.
21. Write a = (as−1 as−2 . . . a1 a0 )B and b = (bt−1 bt−2 . . . b1 b0 )B , where each ai
and each bj are base-B words. We assume that B is a power of 2, and that a
and b are odd. If so, b0 is an odd integer, and b0^(−1) (mod B) exists. We compute the multiplier µ = b0^(−1) a0 (mod B). The least significant word of a − µb is zero. If B = 2^r, we can remove at least r factors of 2 from a − µb.
Computing the inverse b0^(−1) (mod B) can be finished by a single-precision extended gcd computation. Moreover, the multiplication b0^(−1) a0 (mod B) is
again of single-precision integers. In most CPUs, this is the value returned
by the single-precision multiplication function (the more significant word is
ignored). Therefore, µ can be computed efficiently. The multiplication µb and
the subtraction a − µb are also efficient (taking time proportional to s).
But then, we require (a − µb)/B to have word length (at least) one smaller
than that of a. This is ensured if t < s, that is, if the word length of b is at
least one smaller than that of a. If s = t (we cannot have s < t since a > b),
then µb may be an (s + 1)-word integer, so a − µb is again an (s + 1)-word
negative integer. Ignoring the sign and removing the least significant word of
a − µb, we continue to have an s-word integer.
25. Consider the product x = n(n − 1)(n − 2) · · · (n − r + 1) of r consecutive
integers. If any of the factors n − i of x is zero, we have x = 0 which is a
multiple of r!. If all factors n − i of x are negative, we can write x as (−1)r
times a product of r consecutive positive integers. Therefore, we can assume
without loss of generality that 1 ≤ r ≤ n. But then, x = n!/(n − r)!. For
any prime p and any k ∈ N, we have ⌊n/p^k⌋ ≥ ⌊(n − r)/p^k⌋ + ⌊r/p^k⌋. By Exercise 1.24, we conclude that vp(x) ≥ vp(r!).
28. (a) The modified square-and-multiply algorithm is elaborated below.
53. (a) Let f(x) = a_d x^d + a_(d−1) x^(d−1) + · · · + a_1 x + a_0, and write ξ′ = ξ + k p^e. The binomial theorem with the substitution x = ξ′ gives

    f(ξ′) = f(ξ) + k p^e f′(ξ) + t p^(2e)

for some integer t. The condition f(ξ′) ≡ 0 (mod p^(2e)) implies that f(ξ) + k p^e f′(ξ) ≡ 0 (mod p^(2e)), that is, f′(ξ) k ≡ −f(ξ)/p^e (mod p^e). Each solution of this linear congruence modulo p^e gives a lifted root ξ′ of f(x) modulo p^(2e).
(b) Here, f (x) = 2x3 + 4x2 + 3, so f ′ (x) = 6x2 + 8x. For p = 5, e = 2 and
ξ = 14, we have f (ξ) = 2 × 143 + 4 × 142 + 3 = 6275, that is, f (ξ)/25 ≡
251 ≡ 1 (mod 25). Also, f ′ (ξ) ≡ 6 × 142 + 8 × 14 ≡ 1288 ≡ 13 (mod 25).
Thus, we need to solve 13k ≡ −1 (mod 25). Since 13−1 ≡ 2 (mod 25), we have
k ≡ −2 ≡ 23 (mod 25). It follows that the only solution of 2x3 + 4x2 + 3 ≡
0 (mod 625) is 14 + 23 × 25 ≡ 589 (mod 625).
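The lifted root can be verified quickly in GP/PARI (our own check):

f(x) = 2*x^3 + 4*x^2 + 3;
Mod(f(14), 25)     \\ Mod(0, 25): 14 is a root modulo 25
Mod(f(589), 625)   \\ Mod(0, 625): 589 is the lifted root modulo 625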
56. Since we can extract powers of two easily from a, we assume that a is odd. For the Jacobi symbol, b is odd too. If a = b, then (a/b) = 0. If a < b, we use the quadratic reciprocity law to write (a/b) in terms of (b/a). So it remains only to analyze the case of (a/b) with a, b odd and a > b. Let α = a − b. We write α = 2^r a′ with r ∈ N and a′ odd. If r is even, then (a/b) = (a′/b), whereas if r is odd, then (a/b) = (2/b)(a′/b) = (−1)^((b²−1)/8) (a′/b). So, the problem reduces to computing (a′/b) with both a′, b odd.
t0, t1, . . . , tv−2 are determined, x stores the value a^((q+1)/2) g^(t/2) (mod p). At this point, we have x² a^(−1) ≡ 1 (mod p).
(b) There are (p−1)/2 quadratic residues and (p−1)/2 quadratic non-residues
in Z∗p . Therefore, a randomly chosen element in Z∗p is a quadratic non-residue
with probability 1/2. That is, trying a constant number of random candidates
gives us a non-residue, and locating b is expected to involve only a constant
number of Legendre-symbol calculations. The remaining part of the algorithm
involves two modular exponentiations to compute g and the initial value of x.
Moreover, a−1 (mod p) can be precomputed outside the loop. Each iteration
of the loop involves at most v − 1 modular squaring operations to detect
i. This is followed by one modular multiplication. It therefore follows that
Algorithm 1.9 runs in probabilistic polynomial time.
(c) By Conjecture 1.74, the smallest quadratic non-residue modulo p is less than 2 ln² p. Therefore, we may search for b in Algorithm 1.9 deterministically in the sequence 1, 2, 3, . . . until a non-residue is found. The search succeeds in less than 2 ln² p iterations.
61. (a) Let r = ⌊√p⌋. Since p is not a perfect square, we have r² < p < (r + 1)².
Consider the (r + 1)2 integers (u + vx) rem p with u, v ∈ {0, 1, 2, . . . , r}. By
the pigeon-hole principle, there exist unequal pairs (u1 , v1 ) and (u2 , v2 ) such
that u1 + xv1 ≡ u2 + xv2 (mod p). Let a = u1 − u2 and b = v2 − v1 . Then,
a ≡ bx (mod p). By the choice of u1, u2, v1, v2, we have −r ≤ a ≤ r and −r ≤ b ≤ r with either a ≠ 0 or b ≠ 0. Furthermore, since x ≠ 0, both a and b must be non-zero.
(b) For p = 2, take a = b = 1. So, we assume that p is an odd prime.
If p ≡ 1 (mod 4), the congruence x2 ≡ −1 (mod p) is solvable by Exer-
cise 1.60. Using this value of x in Part (a) gives us a non-zero pair (a, b)
satisfying a ≡ bx (mod p), that is, a2 ≡ b2 x2 ≡ −b2 (mod p), that is,
a² + b² ≡ 0 (mod p). By Part (a), 0 < a² + b² ≤ 2r² < 2p, that is, a² + b² = p.
Finally, take a prime p ≡ 3 (mod 4). If p = a2 + b2 for some a, b (both
must be non-zero), we have (ab−1 )2 ≡ −1 (mod p). But by Exercise 1.60, the
congruence x2 ≡ −1 (mod p) does not have a solution.
(c) [If] Let m = p1 p2 · · · ps q1^(2e1) q2^(2e2) · · · qt^(2et), where p1, p2, . . . , ps are all the prime divisors (not necessarily distinct from one another) of m that are not of the form 4k + 3, and where q1, q2, . . . , qt are all the prime divisors (distinct from one another) of m that are of the form 4k + 3. Since (a² + b²)(c² + d²) = (ac + bd)² + (ad − bc)², Part (b) establishes that p1 p2 · · · ps can be expressed as α² + β². Now, take a = α q1^(e1) q2^(e2) · · · qt^(et) and b = β q1^(e1) q2^(e2) · · · qt^(et).
[Only if] Let m = a2 + b2 for some integers a, b, and let q ≡ 3 (mod 4)
be a prime divisor of m. Since q|(a2 + b2 ), and the congruence x2 ≡
−1 (mod q) is not solvable by Exercise 1.60, we must have q|a and q|b. Let
e = min(vq (a), vq (b)). Since m = a2 +b2 , we have m/(q e )2 = (a/q e )2 +(b/q e )2 ,
that is, m/(q e )2 is again a sum of two squares. If q divides m/(q e )2 , then q
divides both a/q e and b/q e as before, a contradiction to the choice of e.
70. The result is obvious for e = 1, so take e ≥ 2.
Lemma: For every e ≥ 2, we have (1 + ap)^(p^(e−2)) ≡ 1 + a p^(e−1) (mod p^e).
Proof We proceed by induction on e. For e = 2, both sides of the congruence are equal to the integer 1 + ap. So assume that the given congruence holds for some e ≥ 2. We investigate the value of (1 + ap)^(p^(e−1)) modulo p^(e+1). By the induction hypothesis, (1 + ap)^(p^(e−2)) = 1 + a p^(e−1) + u p^e for some integer u. Raising both sides of this equality to the p-th power gives

    (1 + ap)^(p^(e−1)) = (1 + a p^(e−1) + u p^e)^p
                       = 1 + (p choose 1)(a p^(e−1) + u p^e) + (p choose 2)(a p^(e−1) + u p^e)² + · · ·
                           + (p choose p−1)(a p^(e−1) + u p^e)^(p−1) + (a p^(e−1) + u p^e)^p
                       = 1 + a p^e + p^(e+1) v

for some integer v (since p is prime, and so p | (p choose k) for 1 ≤ k ≤ p − 1, and since the last term in the binomial expansion is divisible by p^(p(e−1)), in which the exponent p(e − 1) ≥ e + 1 for all p ≥ 3 and e ≥ 2). •
Let us now derive the order of 1 + ap modulo p^e. Using the lemma for e + 1 indicates (1 + ap)^(p^(e−1)) ≡ 1 + a p^e (mod p^(e+1)) and, in particular, (1 + ap)^(p^(e−1)) ≡ 1 (mod p^e). Therefore, ord_(p^e)(1 + ap) | p^(e−1). The lemma also implies that (1 + ap)^(p^(e−2)) ≢ 1 (mod p^e) (for a is coprime to p), that is, ord_(p^e)(1 + ap) ∤ p^(e−2). We, therefore, have ord_(p^e)(1 + ap) = p^(e−1).
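A small numerical check in GP/PARI (our own example, with p = 5, a = 3, e = 4):

znorder(Mod(1 + 3*5, 5^4))   \\ 125 = 5^3, as the formula ord_(p^e)(1 + ap) = p^(e-1) predicts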
76. The infinite simple continued fraction expansion of √2 is ⟨a0, a1, a2, . . .⟩ = ⟨1, 2, 2, 2, . . .⟩ (the partial quotient 2 repeating). Let hn/kn = ⟨a0, a1, . . . , an⟩ be the n-th convergent to √2. But then, for every n ∈ N, we have |√2 − hn/kn| < 1/(kn k(n+1)) = 1/(kn(2kn + k(n−1))) ≤ 1/(2kn²), that is, √2 kn − 1/(2kn) < hn < √2 kn + 1/(2kn), that is, −√2 + 1/(4kn²) < hn² − 2kn² < √2 + 1/(4kn²). Since kn ≥ 1, it follows that hn² − 2kn² ∈ {0, 1, −1} for all n ∈ N. But √2 is irrational, so we cannot have hn² − 2kn² = 0. Furthermore, hn/kn < √2 for even n, whereas hn/kn > √2 for odd n. Consequently, hn² − 2kn² = −1 if n is even, and 1 if n is odd.
78. (a) We compute √5 = ⟨2, 4, 4, 4, . . .⟩ (the partial quotient 4 repeating) as follows:

    ξ0 = √5 = 2.236 . . . ,                              a0 = ⌊ξ0⌋ = 2
    ξ1 = 1/(ξ0 − a0) = 1/(√5 − 2) = √5 + 2 = 4.236 . . . , a1 = ⌊ξ1⌋ = 4
    ξ2 = 1/(ξ1 − a1) = 1/(√5 − 2) = √5 + 2 = 4.236 . . . , a2 = ⌊ξ2⌋ = 4
    · · ·

(b) The first convergent is r0 = h0/k0 = ⟨2⟩ = 2/1, that is, h0 = 2 and k0 = 1. But h0² − 5k0² = −1. Then, we have r1 = h1/k1 = ⟨2, 4⟩ = 2 + 1/4 = 9/4, that is, h1 = 9 and k1 = 4. We have h1² − 5k1² = 1. Since k0 ≤ k1 < k2 < k3 < · · · , the smallest solution is (9, 4).
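Both claims are easy to check in GP/PARI:

contfrac(sqrt(5))   \\ the expansion begins [2, 4, 4, 4, ...]
9^2 - 5*4^2         \\ 1: (9, 4) indeed satisfies x^2 - 5y^2 = 1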
nprime = 0; nsol = 0;
for (p=2, 10^6, \
if (isprime(p), \
nprime++; \
t = p; s = 0; \
while (t > 0, s += t % 7; t = floor(t / 7)); \
if ((s > 1) && (!isprime(s)), \
nsol++; \
print("p = ", p, ", S7(p) = ", s); \
) \
) \
)
print("Total number of primes less that 10^6 is ", nprime);
print("Total number of primes for which S7(p) is composite is ", nsol);
This code reveals that among the 78498 primes p < 10^6, only 13596 lead to composite values of S7(p). Each of these composite values is either 25 or 35.
getsol (B) = \
for (a=1, B, \
for (b=a, B, \
c = a^2 + b^2; \
d = a*b + 1; \
if (c % d == 0, \
print("a = ", a, ", b = ", b, ", (a^2+b^2)/(ab+1) = ", c / d); \
) \
) \
)
µ = RIGHT-SHIFT(γ3 , 41).
α = (a3 θ + a2 )ψ + (a1 θ + a0 ),
β = (b3 θ + b2 )ψ + (b1 θ + b0 )
αβ = [(a3 θ+a2 )(b3 θ+b2 )]ψ 2 +[(a3 θ+a2 )(b1 θ+b0 )+(a1 θ+a0 )(b3 θ+b2 )]ψ+
[(a1 θ+a0 )(b1 θ+b0 )]
= [(a3 b3 +a3 b2 +a2 b3 )θ+(a3 b3 +a2 b2 )]ψ 2 +
[(a3 b1 +a3 b0 +a2 b1 +a1 b3 +a1 b2 +a0 b3 )θ+(a3 b1 +a2 b0 +a1 b3 +a0 b2 )]ψ+
γ = (θ + 1)ψ + 1,
γ2 = (θ)ψ + (θ),
γ4 = (θ + 1)ψ + (θ),
γ8 = (θ)ψ.
µ(1) = 1,
µ(φ) = ψ,
µ(φ2 ) = ψ 2 = ψ + θ,
µ(φ3 ) = ψ(ψ + θ) = ψ 2 + ψθ = θ + ψ + ψθ.
66. The following GP/PARI function accepts as input the element a(x) that we want
to invert, the characteristic p, and the defining polynomial f (x).
BinInv(a,p,f) = \
local(r1,r2,u1,u2); \
r1 = Mod(1,p) * a; r2 = Mod(1,p) * f; \
u1 = Mod(1,p); u2 = Mod(0,p); \
while (1, \
while(polcoeff(r1,0)==Mod(0,p), \
r1 = r1 / (Mod(1,p) * x); \
if (polcoeff(u1,0) != Mod(0,p), \
u1 = u1 - (polcoeff(u1,0) / polcoeff(f,0)) * f \
); \
u1 = u1 / (Mod(1,p) * x); \
if (poldegree(r1) == 0, return(lift(u1/polcoeff(r1,0)))); \
); \
while(polcoeff(r2,0)==Mod(0,p), \
r2 = r2 / (Mod(1,p) * x); \
if (polcoeff(u2,0) != Mod(0,p), \
u2 = u2 - (polcoeff(u2,0) / polcoeff(f,0)) * f \
); \
u2 = u2 / (Mod(1,p) * x); \
if (poldegree(r2) == 0, return(lift(u2/polcoeff(r2,0)))); \
); \
if (poldegree(r1) >= poldegree(r2), \
c = polcoeff(r1,0)/polcoeff(r2,0); r1 = r1 - c*r2; u1 = u1 - c*u2, \
c = polcoeff(r2,0)/polcoeff(r1,0); r2 = r2 - c*r1; u2 = u2 - c*u1 \
) \
)
BinInv(x^6+x^3+x^2+x, 2, x^7+x^3+1)
BinInv(9*x^4+7*x^3+5*x^2+3*x+2, 17, x^5+3*x^2+5)
69. First, we write two functions for computing the trace and the norm of a ∈ Fpn .
The characteristic p and the defining polynomial f are also passed to these
functions. The extension degree n is determined from f .
abstrace(p,f,a) = \
local(n,s,u); \
f = Mod(1,p) * f; \
a = Mod(1,p) * a; \
n = poldegree(f); \
s = u = a; \
for (i=1,n-1, \
u = lift(Mod(u,f)^p); \
s = s + u; \
); \
return(lift(s));
absnorm(p,f,a) = \
local(n,t,u); \
f = Mod(1,p) * f; \
a = Mod(1,p) * a; \
n = poldegree(f); \
t = u = a; \
for (i=1,n-1, \
u = lift(Mod(u,f)^p); \
t = (t * u) % f; \
); \
return(lift(t));
The following statements print the traces and norms of all elements of F64 =
F2 (θ), where θ6 + θ + 1 = 0.
f = x^6 + x + 1;
p = 2;
for (i=0,63, \
a = 0; t = i; \
for (j=0, 5, c = t % 2; a = a + c * x^j; t = floor(t/2)); \
print("a = ", a, ", Tr(a) = ", abstrace(p,f,a), ", N(a) = ", absnorm(p,f,a)) \
)
= x60 + x57 + x54 + x51 + x48 + x45 + x42 + x39 + x36 + x33 +
x30 + x27 + x24 + x21 + x18 + x15 + x12 + x9 + x6 + x3 + 1.
of F_q. In other words, the only solutions of h(x)^q ≡ h(x) (mod g(x)) are h(x) ≡ γ (mod g(x)), where γ ∈ F_q.
(b) For r = 0, 1, 2, . . . , d − 1, write
Consider the d × d matrix Q whose (r, s)-th element is β_(s,r) − δ_(r,s) for 0 ≤ r ≤ d − 1 and 0 ≤ s ≤ d − 1, where δ_(r,s) (equal to 1 if r = s, and 0 otherwise) is the Kronecker delta. All the solutions for h(x) in h(x)^q ≡ h(x) (mod f(x)) can be obtained by solving the homogeneous linear system

    Q ( α_0  α_1  α_2  · · ·  α_(d−1) )^t = ( 0  0  0  · · ·  0 )^t.
(c) By Part (a), this system has exactly q^t solutions, that is, the nullity of Q is t, and so its rank is d − t.
(d) There exists a unique solution (modulo f (x)) to the set of congruences:
h(x) ≡ γ1 (mod f1 (x)), h(x) ≡ γ2 (mod f2 (x)), and h(x) ≡ 0 (mod g(x)) for
31. We first express x^(2r) modulo f(x) = x^8 + x^5 + x^4 + x + 1 for r = 0, 1, 2, . . . , d − 1:

    x^0  ≡ 1
    x^2  ≡ x^2
    x^4  ≡ x^4
    x^8  ≡ x^5 + x^4 + x + 1            (mod x^8 + x^5 + x^4 + x + 1).
    x^10 ≡ x^7 + x^6 + x^3 + x^2
    x^12 ≡ x^6 + x^5 + x^2 + 1
    x^14 ≡ x^7 + x^5 + x^2 + x + 1
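The same reductions can be reproduced in GP/PARI (our own check):

f = Mod(1,2) * (x^8 + x^5 + x^4 + x + 1);
for (r=0, 7, print("x^", 2*r, " = ", lift(lift(Mod(Mod(1,2)*x, f)^(2*r)))))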
36. (i) Let us write Φn(x) = f(x)g(x) with f non-constant and irreducible in Z[x]. Let ξ ∈ C be a root of f(x). Choose any prime p that does not divide n. Assume that ξ^p is not a root of f. Since ξ^p is a primitive n-th root of unity, Φn(ξ^p) = f(ξ^p)g(ξ^p) = 0. Therefore, f(ξ^p) ≠ 0 implies that g(ξ^p) = 0. But f(x) is the minimal polynomial of ξ, and so f(x) | g(x^p) in Z[x].
Let ā(x) denote the modulo-p reduction of a polynomial a(x) ∈ Z[x]. Since f(x) | g(x^p) in Z[x], we must have f̄(x) | ḡ(x^p) in Fp[x]. We have ḡ(x^p) = ḡ(x)^p in Fp[x]. Therefore, f̄(x) | ḡ(x)^p implies that there exists a common irreducible factor h̄(x) of f̄(x) and ḡ(x). This, in turn, implies that h̄(x)² | Φ̄n(x). Moreover, Φn(x) divides x^n − 1 in Z[x], so Φ̄n(x) divides x^n − 1 in Fp[x], that is, h̄(x)² divides x^n − 1 in Fp[x]. The formal derivative of x^n − 1 is nx^(n−1) ≠ 0 in Fp[x] since p ∤ n. Therefore, gcd(x^n − 1, nx^(n−1)) = 1, that is, x^n − 1 is square-free, a contradiction to the fact that h̄(x)² | (x^n − 1) in Fp[x].
This contradiction proves that ξ^p must be a root of f(x). Repeatedly applying this result proves that for all k with gcd(k, n) = 1, ξ^k is again a root of f(x), that is, all primitive n-th roots of unity are roots of f(x).
38. (1) We can convert Syl(f, g) to Syl(g, f ) by mn number of interchanges of
adjacent rows.
(2) Let us write f (x) = q(x)g(x) + r(x) with deg r < deg g. Write q(x) =
qm−n xm−n +qm−n−1 xm−n−1 +· · ·+q1 x+q0 . Subtract qm−n times the (n+1)-
st row, qm−n−1 times the (n + 2)-nd row, and so on from the first row in order
to convert the first row to the coefficients of r(x) treated as a polynomial
of formal degree m. Likewise, from the second row subtract qm−n times the
(n+2)-nd row, qm−n−1 times the (n+3)-rd row, and so on. This is done for each
of the first n rows. We then make mn interchanges of adjacent rows in order
to bring the last m rows in the first m row positions. This gives us a matrix of the form

    S = [ T              U         ]
        [ 0_(2n×(m−n))   Syl(g, r) ] .

Here, T is an (m − n) × (m − n)
upper triangular matrix with each entry in the main diagonal being equal to bn. Moreover, r is treated as a polynomial of formal degree n. Therefore, Res(f, g) = (−1)^(mn) bn^(m−n) Res(g, r).
(3) Clearly, the last three of the given expressions are equal to each other, so it suffices to show that Res(f, g) is equal to any of these expressions. We assume that m ≥ n (if not, use Part (1)). We proceed by induction on the actual degree of the second argument. If deg g = 0 (that is, g(x) = a0 is a constant), we have Res(f, g) = a0^m = am^0 ∏_(i=1)^m g(αi). Moreover, we also need to cover the case g = 0. In this case, we have Res(f, g) = 0 = am^0 ∏_(i=1)^m g(αi). Now, suppose that n > 0. By Part (2), we have Res(f, g) = (−1)^(mn) bn^(m−n) Res(g, r) with r treated as a polynomial of formal degree n. But r is a polynomial of actual degree ≤ n − 1, so the induction assumption is that Res(g, r) = bn^n ∏_(j=1)^n r(βj). Since f = qg + r and each βj is a root of g, we have r(βj) = f(βj). It follows that Res(f, g) = (−1)^(mn) bn^(m−n) bn^n ∏_(j=1)^n f(βj) = (−1)^(mn) bn^m ∏_(j=1)^n f(βj).
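The product formula can be sanity-checked with the built-in resultant of GP/PARI (our own toy example with monic f and g):

f = (x - 1)*(x - 2)*(x - 3); g = (x - 4)*(x - 5);
polresultant(f, g)
prod(i=1, 3, subst(g, x, i))   \\ the same value: the product of g over the roots of f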
42. Consider the Sylvester matrix S of f and g. The first n − 1 rows contain the coefficients of z^n − 1, and the last n rows the coefficients of g(z). We make the n × (n − 1) block at the bottom left corner of S zero by subtracting suitable multiples of the first n − 1 rows from the last n rows. For example, from the n-th row, we subtract α times the first row, α^p times the second row, . . . , α^(p^(n−2)) times the (n − 1)-st row. These row operations do not change the determinant of S. The converted matrix is now of the form

    T = [ I_(n−1)        C ]
        [ 0_(n×(n−1))    D ] ,

where I_(n−1) is the (n − 1) × (n − 1) identity matrix, and

    D = [ α^(p^(n−1))  α            α^p          α^(p^2)  · · ·  α^(p^(n−2)) ]
        [ α^(p^(n−2))  α^(p^(n−1))  α            α^p      · · ·  α^(p^(n−3)) ]
        [ α^(p^(n−3))  α^(p^(n−2))  α^(p^(n−1))  α        · · ·  α^(p^(n−4)) ]
        [     ...          ...          ...        ...    · · ·      ...     ]
        [ α            α^p          α^(p^2)      α^(p^3)  · · ·  α^(p^(n−1)) ] .
    = ⟨b_i, b*_i⟩
    = ⟨b_i, b_i − Σ_(j=1)^(i−1) μ_(i,j) b*_j⟩
    = ⟨b_i, b_i⟩ − Σ_(j=1)^(i−1) μ_(i,j) ⟨b_i, b*_j⟩
    = ⟨b_i, b_i⟩ − Σ_(j=1)^(i−1) ⟨b_i, b*_j⟩² / ⟨b*_j, b*_j⟩
    = |b_i|² − Σ_(j=1)^(i−1) ( ⟨b_i, b*_j⟩ / |b*_j| )²
    ≤ |b_i|²,

that is, |b*_i| ≤ |b_i| for all i, and so

    |det M| = |det M*| = ∏_(i=1)^(n) |b*_i| ≤ ∏_(i=1)^(n) |b_i|.
50. (a) By the definition of reduced bases (Eqns (3.1) and (3.2)), we have

    |b*_i|² ≥ (3/4 − μ²_(i,i−1)) |b*_(i−1)|² ≥ (1/2) |b*_(i−1)|².

Applying this result i − j times shows that for 1 ≤ j ≤ i ≤ n, we have |b*_j|² ≤ 2^(i−j) |b*_i|².
Gram–Schmidt orthogonalization gives b_j = b*_j + Σ_(k=1)^(j−1) μ_(j,k) b*_k. Since the vectors b*_1, b*_2, . . . , b*_n are orthogonal to one another, we then have:

    |b_j|² = |b*_j|² + Σ_(k=1)^(j−1) μ²_(j,k) |b*_k|²
           ≤ |b*_j|² + (1/4) Σ_(k=1)^(j−1) |b*_k|²           [by Eqn (3.1)]
           ≤ |b*_j|² + (1/4) Σ_(k=1)^(j−1) 2^(j−k) |b*_j|²    [proved above]
           = (2^(j−2) + 1/2) |b*_j|²
           ≤ 2^(j−1) |b*_j|²                                  [since j ≥ 1]
           ≤ 2^(j−1) 2^(i−j) |b*_i|²                          [proved above]
           = 2^(i−1) |b*_i|².
(b) By Hadamard's inequality, d(L) ≤ ∏_(i=1)^n |b_i|, and by Part (a), |b_i| ≤ 2^((i−1)/2) |b*_i|. But b*_1, b*_2, . . . , b*_n form an orthogonal basis of L, so d(L) = ∏_(i=1)^n |b*_i|. Thus, ∏_(i=1)^n |b_i| ≤ 2^([0+1+2+···+(n−1)]/2) d(L) = 2^(n(n−1)/4) d(L).
(c) By Part (a), |b_1| ≤ 2^((i−1)/2) |b*_i| for all i = 1, 2, . . . , n. Therefore, |b_1|^n ≤ 2^([0+1+2+···+(n−1)]/2) ∏_(i=1)^n |b*_i| = 2^(n(n−1)/4) d(L).
51. We can write x = Σ_(i=1)^m u_i b_i = Σ_(i=1)^m u*_i b*_i with integers u_i and real numbers u*_i, where m ∈ {1, 2, . . . , n} is the largest integer for which u_m ≠ 0. By the Gram–Schmidt orthogonalization formula, we must have u_m = u*_m, so

    |x|² ≥ (u*_m)² |b*_m|² = u_m² |b*_m|² ≥ |b*_m|².

Exercise 3.50(a) gives |b_i|² ≤ 2^(m−1) |b*_m|² for all i, 1 ≤ i ≤ m. In particular, for i = 1, we have

    |b_1|² ≤ 2^(m−1) |b*_m|² ≤ 2^(m−1) |x|² ≤ 2^(n−1) |x|².

Now, take any n linearly independent vectors x_1, x_2, . . . , x_n in L. Let m_j be the value of m for x_j. As in the last paragraph, we can prove that |b_i|² ≤ 2^(n−1) |x_j|² for all i, 1 ≤ i ≤ m_j. Since x_1, x_2, . . . , x_n are linearly independent, we must have m_j = n for at least one j. For this j, we have |b_i|² ≤ 2^(n−1) |x_j|² for all i in the range 1 ≤ i ≤ n.
57. The function EDF() takes four arguments: the polynomial f (x), the prime
modulus p, the degree r of each irreducible factor of f , and a bound B. This
bound dictates the maximum degree of u(x) used in Algorithm 3.7. The choice
B = 1 corresponds to Algorithm 3.5.
EDF(f,p,r,B) = \
local(u,d,i,g,h,e); \
f = Mod(1,p) * f; \
if (poldegree(f) == 0, return; ); \
if (poldegree(f) == r, print("Factor found: ", lift(f)); return; ); \
e = (p^r - 1) / 2; \
while (1, \
u = Mod(0,p); \
d = 1 + random() % B; \
for (i=0, d, u = u + Mod(random(),p) * x^i ); \
u = Mod(u,f); g = u^e; \
h = gcd(lift(g)-Mod(1,p),f); h = h / polcoeff(h,poldegree(h)); \
if ((poldegree(h) > 0) && (poldegree(h) < poldegree(f)), \
EDF(h,p,r,B); \
EDF(f/h,p,r,B); \
return; \
); \
);
58. Following the recommendation of Exercise 3.8(f), we choose u(x) for Algo-
rithm 3.8 in the sequence x, x3 , x5 , x7 , . . . . We pass the maximum degree 2d+1
such that x2d+1 has already been tried as u(x). The next recursive call starts
with u(x) = x2d+3 . The outermost call should pass −1 as d.
EDF2(f,r,d) = \
local(u,s,i,h); \
f = Mod(1,2) * f; \
if (poldegree(f) == 0, return); \
if (poldegree(f) == r, print("Factor found: ", lift(f)); return); \
while (1, \
d = d + 2; u = Mod(1,2) * x^d; s = u; \
for (i=1,r-1, u= u^2 % f; s = s + u; ); \
h = gcd(f,s); \
if ((poldegree(h) > 0) && (poldegree(h) < poldegree(f)), \
EDF2(h,r,d); \
EDF2(f/h,r,d); \
return; \
); \
);
EDF2(x^18+x^17+x^15+x^11+x^6+x^5+1, 6, -1)
EDF2(x^20+x^18+x^17+x^16+x^15+x^12+x^10+x^9+x^7+x^3+1, 5, -1)
BQfactor(f,p) = \
f = Mod(1,p) * f; d = poldegree(f); \
Q = matrix(d,d); for (i=0,d-1, Q[i+1,1] = Mod(0,p)); \
for (r=1,d-1, \
h = lift(lift(Mod(Mod(1,p)*x,f)^(r*p))); \
for (i=0,d-1, Q[i+1,r+1] = Mod(polcoeff(h,i),p)); \
Q[r+1,r+1] = Q[r+1,r+1] - Mod(1,p); \
); \
V = matker(Q); t = matsize(V)[2]; \
if (t==1, print(lift(f)); return); \
decompose(V,t,f,p,d);
decompose(V,t,f,p,d) = \
d = poldegree(f); \
for (i=1,t, \
h = 0; \
for (j=0,d-1, h = h + Mod(V[j+1,i],p) * x^j); \
for (a=0,p-1, \
f1 = gcd(h-Mod(a,p), f); \
d1 = poldegree(f1); \
f1 = f1 / polcoeff(f1,d1); \
if ((d1 > 0) && (d1 < d), \
if (polisirreducible(f1), \
print(lift(f1)), \
decompose(V,t,f1,p,d); \
); \
f2 = f / f1; \
if (polisirreducible(f2), \
print(lift(f2)), \
decompose(V,t,f2,p,d); \
); \
return; \
); \
); \
);
BQfactor(x^8+x^5+x^4+x+1, 2)
BQfactor(x^8+x^5+x^4+2*x+1, 3)
polyliftonce(f,g,h,p,n) = \
local(q,i,j,w,A,c,b,r,s,t); \
q = p^n; \
w = (f - g * h) / q; \
r = poldegree(g); s = poldegree(h); t = poldegree(f); \
b = matrix(t+1,1); A = matrix(t+1,t+1); \
for(i=0, t, b[t-i+1,1] = Mod(polcoeff(w,i),p)); \
for (j=0, s, \
for (i=0, r, \
A[i+j+1,j+1] = Mod(polcoeff(g,r-i),p); \
); \
); \
for (j=0, r-1, \
for (i=0, s, \
A[i+j+2,s+j+2] = Mod(polcoeff(h,s-i),p); \
); \
); \
c = A^(-1) * b; \
u = 0; v = 0; \
for (i=0, r-1, u = u + lift(c[s+i+2,1]) * x^(r-i-1)); \
for (i=0, s, v = v + lift(c[i+1,1]) * x^(s-i)); \
return([g+q*u,h+q*v]);
polylift(f,g,h,p,n) = \
local(i,j,q,L); \
q = p; \
for (i=1, n-1, \
L = polyliftonce(f,g,h,p,i); \
q = q * p; \
g = lift(L[1]); \
for (j=0, poldegree(g), if (polcoeff(g,j) > q/2, g = g - q * x^j) ); \
h = lift(L[2]); \
for (j=0, poldegree(h), if (polcoeff(h,j) > q/2, h = h - q * x^j) ); \
); \
return([g,h]);
E : Y² = X³ + aX + b.
where f_i(X, Y) is the sum of all non-zero terms of degree i in f(X, Y). The homogenization of C is then
In order to find the points at infinity on C^(h), we put Z = 0 and get f_d(X, Y) = 0. Let us write this equation as
where c_i ∈ K are not all zero. If c_0 is the only non-zero coefficient, we have Y^d = 0, and the only point at infinity on the curve is [1, 0, 0]. If c_i ≠ 0 for some i > 0, we rewrite this equation as
Therefore,

    h/l = λ² − h1/l1 − h2/l2 = [ l1 l2 (k2 l1 − k1 l2)² − (h2 l1 − h1 l2)² (h1 l2 + h2 l1) ] / [ l1 l2 (h2 l1 − h1 l2)² ],

and

    k/l = λ (h1/l1 − h/l) − k1/l1.
Substituting the values of λ and h/l gives an explicit expression for k/l. These
expressions are too clumsy. Fortunately, there are many common subexpres-
sions used in these formulas. Computing these intermediate subexpressions
allows us to obtain h, k, l as follows:
    T1 = k2 l1 − k1 l2,
    T2 = h2 l1 − h1 l2,
    T3 = T2²,
    T4 = T2 T3,
    T5 = l1 l2 T1² − T4 − 2 h1 l2 T3,
    h = T2 T5,
    k = T1 (h1 l2 T3 − T5) − k1 l2 T4,
    l = l1 l2 T4.
These expressions can be further optimized by using temporary variables to
store the values of h1 l2 , k1 l2 and l1 l2 .
(b) We proceed as in the case of addition. Here, I present only the final formulas.

    T1 = 3h1² + a l1²,
    T2 = k1 l1,
    T3 = h1 k1 T2,
    T4 = T1² − 8T3,
    T5 = T2²,
    h′ = 2T2 T4,
    k′ = T1 (4T3 − T4) − 8k1² T5,
    l′ = 8T2 T5.
(c) Computing the affine coordinates requires a division in the field. If this
division operation is much more expensive than multiplication and squaring
in the field, avoiding this operation inside the loop (but doing it only once
after the end of the loop) may speed up the point-multiplication algorithm.
However, projective coordinates increase the number of multiplication (and
squaring) operations substantially. Therefore, it is not clear whether avoiding
one division in the loop can really provide practical benefits. Implementers re-
port contradictory results in the literature. The practical performance heavily
depends on the library used for the field arithmetic.
31. The second point can be treated as having projective coordinates [h2 , k2 , 1],
that is, l2 = 1. The formulas in Exercise 4.30(a) can be used with l2 = 1. This
saves the three multiplications h1 l2 , k1 l2 and l1 l2 .
In the conditional addition part of the point-multiplication algorithm, the
second summand is always the base point P which is available in affine coor-
dinates. Inside the loop, we always keep the sum S in projective coordinates.
However, the computation of S + P benefits from using mixed coordinates.
32. (b) Let [h, k, l]c,d be a finite point on C. We have l 6= 0, and so [h, k, l]c,d ∼
[h/lc , k/ld , 1]c,d . Therefore, we identify [h, k, l]c,d with the point (h/lc , k/ld ).
Conversely, to a point (h, k) in affine coordinates, we associate the point
[h, k, 1]c,d with projective coordinates. It is easy to verify that these associa-
tions produce a bijection of all finite points on C with all points on C (c,d) with
non-zero Z-coordinates.
(c) We need to set Z = 0 in order to locate the points at infinity. This gives us
the polynomial g(X, Y ) = f (c,d) (X, Y, 0). In general, g is not a homogeneous
polynomial in the standard sense. However, if we give a weight of c to X and
a weight of d to Y , each non-zero term in g is of the same total weight. Let
′ ′
X i Y j and X i Y j have the same weight, that is, ci + dj = ci′ + dj ′ , that is,
c(i − i′ ) = d(j ′ − j). For the sake of simplicity, let us assume that gcd(c, d) = 1
(the case gcd(c, d) > 1 can be gracefully handled). But then, i ≡ i′ (mod d)
and j ≡ j ′ (mod c). In view of this, we proceed as follows.
If X divides g, we get the point [0, 1, 0]c,d at infinity. If Y divides g, we get
the point [1, 0, 0]c,d at infinity. We remove all factors of X and Y from g, and
assume that g is divisible by neither X nor Y . By the argument in the last
paragraph, we can write g(X, Y ) = h(X d , Y c ). Call X ′ = X d and Y ′ = Y c .
Then, h(X ′ , Y ′ ) is a homogeneous polynomial in X ′ , Y ′ . We find all the roots
for X ′ /Y ′ from this equation. Each root α corresponds to a point Oα at infinity
on the curve. Since gcd(c, d) = 1, we have uc + vd = 1 for some integers u, v.
The choices X = αv and Y = α−u are consistent with X ′ = X d , Y ′ = Y c
and X ′ /Y ′ = α, so we take Oα = [αv , α−u , 0]c,d . There is a small catch here,
namely, the values u, v in Bézout’s relation are not unique. Given any solution
u, v of uc + vd = 1, all solutions are given by (u + rd)c + (v − rc)d = 1 for any
r ∈ Z. But then, [α^(v−rc), α^(−(u+rd)), 0]_(c,d) = [α^v/(α^r)^c, α^(−u)/(α^r)^d, 0]_(c,d) = [α^v, α^(−u), 0]_(c,d), that is, the point Oα does not depend on the choice of the pair (u, v).
that is, the point Oα does not depend on the choice of the pair (u, v).
33. (a) Substituting X by X/Z 2 and Y by Y /Z 3 in the equation of the curve and
multiplying by Z 6 , we obtain
E (2,3) : Y 2 = X 3 + aXZ 4 + bZ 6 .
(b) Put Z = 0 to get X 3 − Y 2 = 0. Now, set X ′ = X 3 and Y ′ = Y 2 to get
X ′ − Y ′ = 0. The only root of this equation is X ′ /Y ′ = 1. Therefore, the
point at infinity on E (2,3) is [1, 1, 0]2,3 (since 1 raised to any integral power is
again 1, we do not need to compute Bézout’s relation involving 2, 3).
(c) The point [h, k, l]2,3 (with l ≠ 0) has affine coordinates (h/l², k/l³). The opposite of this point is (h/l², −k/l³), that is, the point [h/l², −k/l³, 1]2,3 = [h, −k, l]2,3 = [h, k, −l]2,3.
(d) Let P1 = [h1, k1, l1]2,3 and P2 = [h2, k2, l2]2,3 be two finite points on E with P1 ≠ ±P2. In order to compute the sum P1 + P2 = [h, k, l]2,3, we compute the sum of the points (h1/l1², k1/l1³) and (h2/l2², k2/l2³). We proceed as in Exercise 4.30(a), and obtain the following formulas.

    T1 = h1 l2²,
    T2 = h2 l1²,
    T3 = k1 l2³,
    T4 = k2 l1³,
    T5 = T2 − T1,
    T6 = T4 − T3,
    h = −T5³ − 2T1 T5² + T6²,
    k = −T3 T5³ + T6 (T1 T5² − h),
    l = l1 l2 T5.

(e) The double of [h, k, l]c,d is the point [h′, k′, l′]c,d computed as follows.

    T1 = 4h k²,
    T2 = 3h² + a l⁴,
    h′ = −2T1 + T2²,
    k′ = −8k⁴ + T2 (T1 − h′),
    l′ = 2kl.
58. Let Div(f ) = m[P ]−m[O], D ∼ [Q]−[O] and D′ ∼ [Q′ ]−[O], where Q = mQ′ .
Since Q − O − m(Q′ − O) = O, we have D ∼ mD′ , that is, D = mD′ + Div(h)
for some rational function h. But then,
    ⟨P, Q⟩_m = f(D) = f(mD′ + Div(h)) = f(mD′) f(Div(h))
             = f(D′)^m h(Div(f)) = f(D′)^m h(m[P] − m[O])
             = ( f(D′) h([P] − [O]) )^m.
for some u, v, w. Suppose that L is the line through the points P1 = (h1 , k1 )
and P2 = (h2 , k2 ) on the curve (we may have P1 = P2 ). The third point of
intersection of L with the curve is (h3 , −k3 ), where P1 +P2 = (h3 , k3 ). Clearly,
h1 , h2 , h3 are roots of N(L), and it follows that
N(L) = −(x − h1 )(x − h2 )(x − h3 ).
In particular, if P1 = P2, we have

    N(L) = −(x − h1)² (x − h3).
In the parts below, we take Q = (h, k).
(a) We have

    L_(U,U)(Q) / [ L_(U,−U)(Q)² L_(2U,−2U)(Q) ] = L_(U,U)(Q) / [ (h − x(U))² (h − x(2U)) ]
                                                = −1 / L̂_(U,U)(Q)
                                                = −1 / L_(U,U)(−Q).

(b) Let us denote tU = (h_t, y_t). But then,

    L_((k+1)U,kU)(Q) / [ L_((k+1)U,−(k+1)U)(Q) L_((2k+1)U,−(2k+1)U)(Q) ]
      = L_((k+1)U,kU)(Q) / [ (h − h_(k+1))(h − h_(2k+1)) ]
      = L_((k+1)U,kU)(Q) L̂_((k+1)U,kU)(Q) / [ (h − h_(k+1))(h − h_(2k+1)) L̂_((k+1)U,kU)(Q) ]
      = L_((k+1)U,kU)(Q) L̂_((k+1)U,kU)(Q) (h − h_k) / [ (h − h_k)(h − h_(k+1))(h − h_(2k+1)) L̂_((k+1)U,kU)(Q) ]
      = −(h − h_k) / L̂_((k+1)U,kU)(Q)
      = −L_(kU,−kU)(Q) / L_((k+1)U,kU)(−Q).

(c) Put k = 1 in Part (b).
61. For n ≥ 2, let us define the function

  g_{n,P} = f_{n,P} L_{nP,−nP}.

If we can compute g_{n,P}, we can compute f_{n,P}, and conversely. Moreover, if
mP = O (this is usually the case in Weil- and Tate-pairing computations), we
have L_{mP,−mP} = 1, and so g_{m,P} = f_{m,P}, that is, no final adjustment is necessary
to convert the value of g to the value of f. In view of this, we can modify
Miller's algorithm so as to compute g_{n,P}(Q). Write n = (1 n_{s−1} n_{s−2} . . . n1 n0)_2.
We first consume the two leftmost bits of n. If n_{s−1} = 0, we compute

  g_{2,P}(Q) = f_{2,P}(Q) L_{2P,−2P}(Q) = L_{P,P}(Q).
L(U,V,Q) = \
local(l,m); \
if ((U == [0]) && (V == [0]), return(1)); \
if (U == [0], return(Q[1] - V[1])); \
if (V == [0], return(Q[1] - U[1])); \
if ((U[1] == V[1]) && (U[2] == -V[2]), return(Q[1] - U[1])); \
if ((U[1] == V[1]) && (U[2] == V[2]), \
l = (3 * U[1]^2 + E[4]) / (2 * U[2]), \
l = (V[2] - U[2]) / (V[1] - U[1]) \
); \
m = l * U[1] - U[2]; \
return(Q[2] - l * Q[1] + m);
p = 43; L([Mod(1,p),Mod(2,p)],[Mod(1,p),Mod(2,p)],[x,y])
76. We first fix the prime p, a supersingular curve E of the given form, and the
parameters m and k. We also need a representation of the field F_{p^k}. Since
p ≡ 3 (mod 4) and k = 2, we can represent this extension as F_p(θ) with
θ^2 + 1 = 0. We pass two points for which the reduced Tate pairing needs to
be computed. We invoke the line-function primitive L(U,V,Q) implemented in
Exercise 4.75. We use the formula ⟨P, Q⟩_m = f_{m,P}(Q). If the computation
fails, we conclude that this is a case of degenerate output, and return 1.
p = 43; E = ellinit([0,0,0,Mod(3,p),0]);
m = (p + 1) / 4; k = 2; T = Mod(t, Mod(1,p)*t^2 + Mod(1,p))
Tate(P,Q) = \
local(s,U,V,fnum,fden); \
s = ceil(log(p)/log(2)); while (bittest(m,s)==0, s--); \
  p − 1 = (r − 1)(r + w) / (vw − r^2).

Since r − 1, r + w, p − 1 are positive, vw − r^2 is positive too. Finally, since
vw − r^2 is an integer, we have

  p − 1 ≤ (r − 1)(r + w) ≤ (r − 1)(2r − 1).
Given r, there are, therefore, only finitely many possibilities for p. For each
such p, we get only finitely many primes q satisfying pr − 1 = w(q − 1).
11. (a) In view of Exercise 5.10(a), it suffices to concentrate only on Carmichael
numbers n. We can write n = p1 p2 · · · pr with pairwise distinct odd primes
p1, p2, . . . , pr, r ≥ 3, and with (pi − 1) | (n − 1) for all i = 1, 2, . . . , r. We now
consider two cases.

Case 1: All (n − 1)/(pi − 1) are even.

We choose a base a ∈ Z_n^* such that (a/p1) = −1, whereas (a/pi) = +1 for
i = 2, 3, . . . , r. By the definition of the Jacobi symbol, we have (a/n) = −1.
By Euler's criterion, a^{(p1−1)/2} ≡ −1 (mod p1). Since ((n − 1)/2)/((p1 − 1)/2) = (n − 1)/(p1 − 1) is
even by hypothesis, we have a^{(n−1)/2} = (a^{(p1−1)/2})^{(n−1)/(p1−1)} ≡ 1 (mod p1). On the other hand, for
i = 2, 3, . . . , r, we have a^{(pi−1)/2} ≡ 1 (mod pi), that is, a^{(n−1)/2} ≡ 1 (mod pi).
By CRT, we then have a^{(n−1)/2} ≡ 1 (mod n), that is, a^{(n−1)/2} ≢ (a/n) (mod n),
that is, n is not an Euler pseudoprime to base a.

Case 2: Some (n − 1)/(pi − 1) is odd.
22. (a) Since a^{n−1} ≡ 1 (mod n), we have a^{n−1} ≡ 1 (mod p). Moreover, since p | n,
we have gcd(a^{(n−1)/q} − 1, p) = 1, that is, a^{(n−1)/q} ≢ 1 (mod p). It follows that
ord_p(a) = ut for some t | v. But then, b ≡ a^t (mod p) has order u modulo p.
Since ord_p b divides φ(p) = p − 1, we have u | (p − 1), that is, p ≡ 1 (mod u).
(b) Suppose that n is composite. Take any prime divisor p of n with p ≤ √n.
By Part (a), p ≥ u + 1 > √n + 1, a contradiction. Therefore, n must be prime.
(c) In order to convert the above observations to an efficient algorithm, we
need to clarify two issues; a rough GP/PARI sketch follows the two items below.
(1) The integer n − 1 can be written as uv with u, v as above and with
u > √n. We can keep on making trial divisions of n − 1 by small primes
q1 = 2, q2 = 3, q3 = 5, . . . until n − 1 reduces to a value v ≤ √n. If
n − 1 is not expressible in the above form, we terminate the procedure,
and report failure after suitably many small primes are tried.
(2) We need an element a satisfying the two conditions an−1 ≡ 1 (mod n)
and gcd(a(n−1)/q − 1, n) = 1 for all q|u. If n is prime, any of the φ(n − 1)
primitive elements modulo n satisfies these conditions. Thus, a suitable
random base a is expected to be available within a few iterations.
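The following is a rough GP/PARI rendering of this procedure; it is not part of the original solution. The name pocklington and the parameters kmax (the number of small primes tried in step (1)) and amax (the number of random bases tried in step (2)) are choices of this sketch. The function returns 1 when n (odd, n > 3) is proved prime, and 0 when the procedure fails to decide; it never declares a prime to be composite.

\\ Step (1): split n-1 = u*v with u built from small primes and u > sqrt(n).
\\ Step (2): look for a base a with a^(n-1) = 1 (mod n) and
\\           gcd(a^((n-1)/q) - 1, n) = 1 for every prime q dividing u.
pocklington(n,kmax,amax) = \
local(u,v,Q,i,q,a,j,ok); \
u = 1; v = n - 1; Q = []; i = 1; \
while ((i <= kmax) && (u^2 <= n), \
q = prime(i); \
if (v % q == 0, Q = concat(Q,q)); \
while (v % q == 0, u = u * q; v = v / q); \
i++; \
); \
if (u^2 <= n, return(0)); \
for (j=1, amax, \
a = 2 + random(n-3); \
ok = (Mod(a,n)^(n-1) == Mod(1,n)); \
for (i=1, length(Q), \
if (gcd(lift(Mod(a,n)^((n-1)/Q[i])) - 1, n) != 1, ok = 0); \
); \
if (ok, return(1)); \
); \
return(0);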
27. The adaptation of the sieve of Eratosthenes is implemented by the following
function. Here, k is the number of small primes to be used in the sieve. The
sieve length is taken to be M = 2t. If no primes are found in the interval
[a, a + M − 1] for a randomly chosen a, the function returns −1.
randprime(t,k) = \
local(M,a,A,i,p,r); \
a = 0; \
while (a <= 2^(t-1), a = random(2^t)); \
M = 2 * t; A = vector(M); \
for (i=0, M-1, A[i+1] = 1); \
for (i=1, k, \
p = prime(i); r = a % p; \
if (r != 0, r = p - r); \
while (r < M, A[r+1] = 0; r = r + p ); \
); \
for (i=0, M-1, if (A[i+1] == 1, \
if(isprime(a+i), return(a+i)); \
)); \
return(-1);
32. We write n − l = 2^s t with t odd, where l = ((a^2 − 4b)/n) is the Jacobi symbol. The following function
computes (and returns) V_t and V_{t+1} modulo n using the doubling formulas.
The power b^t (mod n) is also computed (and returned).
VMod(m,n,a,b) = \
local(V,Vnext,B,i); \
V = Mod(2,n); Vnext = Mod(a,n); B = Mod(1,n); \
i = ceil(log(m)/log(2)) - 1; \
while (i >= 0, \
if (bittest(m,i) == 0, \
Vnext = V * Vnext - a * B; \
V = V^2 - 2 * B; \
B = B^2; \
, \
V = V * Vnext - a * B; \
Vnext = Vnext^2 - 2 * B * b; \
B = B^2 * b; \
); \
i--; \
); \
return([V,Vnext,B]);
strongLucaspsp(n,a,b) = \
local(D,l,U,s,t,V,Vnext,W,B); \
D = a^2 - 4*b; l = kronecker(D,n); \
if (l == 0, return(0)); \
t = n - l; s = 0; \
while (bittest(t,s) == 0, s++); \
t = t / (2^s); \
W = VMod(t,n,a,b); \
V = W[1]; Vnext = W[2]; B = W[3]; \
U = (2*Vnext - a*V)/D; \
if (U == Mod(0,n), return(1)); \
while (s>0, \
if (V == Mod(0,n), return(1)); \
Vnext = V * Vnext - a * B; \
V = V^2 - 2 * B; \
B = B^2; \
s--; \
); \
return(0);
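For illustration, the test can be invoked as follows; the choice of n, a, b here is arbitrary, and a return value of 1 indicates that n is a strong Lucas probable prime with respect to the pair (a, b).

strongLucaspsp(10^6 + 3, 3, 7)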
That is, L[1] values of c need to be tried in order to obtain a single relation.
Since we require about 2t (which is again L[1/2]) relations, the value of M
should be L[1] × L[1/2] = L[3/2].
(b) Proceed as in the QSM. Let x^2 = kn + J with J ∈ {0, 1, 2, . . . , n − 1}. We
have (x + c)^2 ≡ x^2 + 2xc + c^2 ≡ kn + J + 2xc + c^2 ≡ T(c) (mod n), where
T(c) = J + 2xc + c^2. Use an array A indexed by c in the range −M ≤ c ≤ M.
Initialize A_c = log |T(c)|. For each small prime q and small exponent h, solve
the congruence (x + c)^2 ≡ kn (mod q^h). For all values of c in the range
−M ≤ c ≤ M that satisfy the above congruence, subtract log q from A_c.
When all q and h values are considered, check which array locations A_c store
values ≈ 0. Perform trial divisions on the corresponding T(c) values.
(c) Follow the analysis of sieving in QSM. Initializing A takes L[3/2] time.
Solving all the congruences (x + c)^2 ≡ kn (mod q^h) takes L[1/2] time. Sub-
traction of all log q values takes L[3/2] time. Trial division of L[1/2] smooth
values by L[1/2] primes takes L[1] time. Finally, the sparse system with L[1/2]
variables and L[1/2] equations can be solved in L[1] time.
18. (c) Let H = ⌈√n⌉ (original QSM) and H′ = ⌈√(2n)⌉ (modified QSM). Let
M be the optimal sieving limit when the original QSM runs alone. In the
case of the dual QSM, we run both the original QSM and the modified QSM
with a sieving limit of M/2. In the original QSM, 2M + 1 candidates (for
−M ≤ c ≤ M) are tried for smoothness. In the dual QSM, there are two sieves
each handling M + 1 candidates (for −M/2 ≤ c ≤ M/2). The total number
of candidates for the dual QSM is, therefore, 2M + 2. In the original QSM,
2M + 1 candidates are expected to supply the requisite number of relations. So
the dual QSM, too, is expected to supply nearly the same number of relations,
provided that the candidates are not much larger in the dual QSM than in
the original QSM. Indeed, we now show that the dual QSM actually reduces
the absolute values of the candidates by a factor larger than 1.
The values of |T(c)| for the original QSM are approximately proportional
to H, whereas those for the modified QSM are roughly proportional to H′ ≈
√2 H. In particular, the average value of |T(c)| for the first sieve is nearly
2H × (M/4) = MH/2 (the sieving interval is M/2 now), and the average value
of |T(c)| for the second sieve is about 2H′ × (M/4) ≈ √2 MH/2. The overall
average is, therefore, (1 + √2)MH/4. When the original QSM runs alone, this
average is MH. Consequently, the smoothness candidates in the dual QSM are
smaller than those for the original QSM by a factor of 4/(1 + √2) ≈ 1.657. As
a result, the dual QSM is expected to supply more relations than the original
QSM. Viewed from another angle, we can take slightly smaller values for M
and/or t in the dual QSM than necessary for the original QSM, that is, the
dual QSM is slightly more efficient than the original QSM.
The dual QSM does not consider the larger half of the T(c) values (cor-
responding to M/2 < |c| ≤ M) for smoothness tests. It instead runs another
sieve. Although the smoothness candidates in the second sieve are about √2
times larger than the candidates in the original sieve, there is an overall re-
duction in the absolute values of T(c) (averaged over the two sieves).
Initialize S = O and T = P .
For i = s − 1, s − 2, . . . , 1, 0, repeat: {
If (ni = 0) { /* Update (S, T ) to (2S, 2S + P ) = (2S, S + T ) */
Assign T = S + T and S = 2S.
} else { /* Update (S, T ) to (2S + P, 2S + 2P ) = (S + T, 2T ) */
Assign S = S + T and T = 2T .
}
}
Return S.
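A compact GP/PARI rendering of this ladder is given below; it is not part of the original solution. It uses the library point arithmetic elladd in place of dedicated formulas, represents O as [0], and assumes that E is an initialized curve, P a point on it, and n a positive integer. Throughout, the invariant T = S + P is maintained.

\\ Ladder: processes the bits of n from the most significant one downwards.
ladder(E,P,n) = \
local(S,T,s,i); \
S = [0]; T = P; \
s = ceil(log(n)/log(2)); while (bittest(n,s) == 0, s--); \
forstep (i=s, 0, -1, \
if (bittest(n,i) == 0, \
T = elladd(E,S,T); S = elladd(E,S,S); \
, \
S = elladd(E,S,T); T = elladd(E,T,T); \
); \
); \
return(S);

For small examples, the result can be checked against ellpow(E, P, n).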
Multiplying these two formulas and substituting k1^2 = h1^3 + ah1 + b and k2^2 =
h2^3 + ah2 + b, we get
p^h | u1 (but p ∤ u2), then we subtract log p from all the entries in the d-th column
if and only if p^h | d. Likewise, if p^h | u2 (but p ∤ u1), then log p is subtracted from
the entire c-th row if and only if p^h | c. Finally, if gcd(u1, p) = gcd(u2, p) = 1,
then for each row c, a solution for d is obtained from cu1 + du2 ≡ 0 (mod p^h),
and every p^h-th location is sieved in the c-th row.
Pollard suggests another sieving idea. The (c, d) pairs satisfying cu1 +du2 ≡
0 (mod ph ) form a sublattice. Let W1 = (c1 , d1 ) and W2 = (c2 , d2 ) constitute
a reduced basis of this sublattice. The array locations (c, d) = e(c1 , d1 ) +
f (c2 , d2 ) = (ec1 + f c2 , ed1 + f d2 ) are sieved for small coprime e and f .
35. Dixon’s method (both the relation-collection and the linear-algebra phases)
are implemented below. The decisions about t (the size of the factor base) and
s (the number of relations to be collected) are not taken by this function.
Dixon(n,s,t) = \
local(i,j,x,y,X,A,C,D,k,v,beta,d,e); \
A = matrix(t,s); C = matrix(t,s); X = vector(s); i = 1; \
while(i <= s, \
X[i] = random(n); y = (X[i]^2) % n; \
for (j=1,t, \
A[j,i] = 0; \
while (y % prime(j) == 0, A[j,i]++; y = y / prime(j)); \
); \
if (y == 1, i++) \
); \
print("Relations collected"); \
for (i=1,t, for(j=1,s, C[i,j] = Mod(A[i,j],2))); \
D = lift(matker(C)); k = matsize(D)[2]; \
print("Nullspace computed: dimension = ", k); \
v = vector(k); beta = vectorv(s); d = 1; \
while ((d==1) || (d==n), \
for (j=1,k, v[j] = random(2)); \
for (i=1,s, \
for(j=1,k, \
beta[i] = beta[i] + v[j] * D[i,j] \
); \
beta[i] = beta[i] % 2; \
); \
e = mattranspose(A * beta); \
x = Mod(1,n); for (i=1,s, if (beta[i], x = x * Mod(X[i],n))); \
y = Mod(1,n); for (j=1,t, y = y * Mod(prime(j),n)^(e[j]/2)); \
d = gcd(lift(x)-lift(y),n); \
print("x = ", lift(x), ", y = ", lift(y), ", d = ", d); \
); \
return(d);
Dixon(64349,20,15)
37. Given a bound b, we first compute the factor base B consisting of −1 and
all primes ≤ b modulo which n is a quadratic residue. We also require the
sieving range [−M, M ]. In what follows, we implement the incomplete sieving
strategy. The tolerance bound ξ should also be supplied.
getFactorBase(n,b) = \
local(P,B,p,t,i); \
P = vector(b); p = 2; t = 1; P[t] = -1; \
while (p <= b, \
if (kronecker(n,p) == 1, t++; P[t] = p); \
p = nextprime(p+1); \
); \
B = vector(t); for (i=1,t, B[i] = P[i]); \
return([t,B]);
QSMsieve(n,b,M,xi) = \
local(H,J,tB,t,B,logB,c,c1,c2,A,i,T,X,bnd); \
H = 1 + sqrtint(n); J = H^2 - n; \
tB = getFactorBase(n,b); t = tB[1]; B = tB[2]; tB = 0; \
logB = vector(t); for (i=2,t, logB[i] = floor(1000 * log(B[i]))); \
A = vector(2*M + 1); \
for (c=-M,M, \
T = J + 2*H*c + c^2; if (T < 0, T = -T); \
A[c+M+1] = floor(1000 * log(T)); \
); \
c = -M; if (J % 2 == 0, if (c % 2 == 1, c++), if (c % 2 == 0, c++)); \
while (c < M, A[c+M+1] -= logB[2]; c += 2); \
for (i=3,t, \
c1 = sqrt(Mod(n,B[i])); c2 = B[i] - c1; \
c1 = lift(c1 - Mod(H,B[i])); c2 = lift(c2 - Mod(H,B[i])); \
c = c1; while (c >= -M, A[c+M+1] -= logB[i]; c -= B[i]); \
c = c1 + B[i]; while (c <= M, A[c+M+1] -= logB[i]; c += B[i]); \
c = c2; while (c >= -M, A[c+M+1] -= logB[i]; c -= B[i]); \
c = c2 + B[i]; while (c <= M, A[c+M+1] -= logB[i]; c += B[i]); \
); \
bnd = floor(xi * 1000 * log(nextprime(B[t]))); \
for (c=-M, M, \
if (A[c+M+1] <= bnd, \
T = J + 2*H*c + c^2; \
X = vector(t); if (T < 0, T = -T; X[1] = 1, X[1] = 0); \
for (i=2,t, while(T % B[i] == 0, X[i]++; T = T / B[i])); \
if (T==1, \
print("Relation found for c = ", c, ", X = ", X), \
print("False alarm for c = ", c); \
); \
); \
)
QSMsieve(713057,50,50,1.5);
is, a^{−mj} = a^i. For i = 1, 2, . . . , m, we compute and store the pairs (i, a^i) in
a table sorted with respect to the second component. Then, we precompute
a^{−m}, and find the smallest j ∈ {0, 1, 2, . . . , m − 1} for which a^{−mj} = (a^{−m})^j is
the second element of some pair in the precomputed table. If multiple values
of i correspond to the same value as a^{−mj}, we take the smallest such i. But
then, h = ord(a) = jm + i for these j and i.
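The following GP/PARI sketch, which is not part of the original solution, carries this out in (Z/pZ)^* under the assumption that an upper bound bnd on ord(a) is known (the name ordBSGS is mine). For simplicity the table lookup is a linear scan; replacing it by a search in a table sorted on the values a^i gives the square-root running time described above.

\\ Returns ord(a) modulo p, assuming gcd(a,p) = 1 and ord(a) <= bnd.
ordBSGS(a,p,bnd) = \
local(m,tab,am,g,i,j); \
m = ceil(sqrt(bnd)); \
tab = vector(m, i, lift(Mod(a,p)^i)); \
am = Mod(a,p)^(-m); \
for (j=0, m-1, \
g = lift(am^j); \
for (i=1, m, \
if (tab[i] == g, return(j*m + i)); \
); \
); \
return(0);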
11. Number the columns of the coefficient matrix C by 0, 1, 2, . . . , t + 2M + 1.
Column 0 corresponds to the prime −1, Columns 1 through t to the small
primes p1 , p2 , . . . , pt , and Columns t + 1 through t + (2M + 1) to the H + c
values for −M ≤ c ≤ M. Suppose also that the last row corresponds to the
free relation indg (pj ) = 1 for some j. This row has only one non-zero entry.
We now count the number of non-zero entries in the first m − 1 rows.
(1) The expected number of non-zero entries in Column 0 is (m − 1)/2.
(2) For 1 ≤ j ≤ t, the expected number of non-zero entries in Column j is
(m − 1)/pj , since a randomly chosen integer is divisible by the prime pj with
probability 1/pj .
(3) In the submatrix consisting of the first m − 1 rows and the last 2M + 1
columns, each row has exactly two non-zero entries corresponding to the two
values c1 , c2 for a smooth T (c1 , c2 ). Of course, we allow the possibility c1 = c2
during sieving (in which case there is only one non-zero entry in a row), but
this situation occurs with a low probability, and we expect to get at most only
a small constant number of such rows. In view of this, we neglect the effects of
these rows in our count, and conclude that the expected number of non-zero
entries in each of the last 2M + 1 columns is 2(m − 1)/(2M + 1) ≈ m/M .
15. We first express some ag^α as a product of small (≤ L[1/2]) and medium-sized
(≤ L[2]) primes. In order to compute the index of a medium-sized prime q,
we fix some c1 and vary c2 in order to locate a value of c1u + c2v which is
q times an L[1/2]-smooth value. We first locate one particular value γ of c2
for which c1u + c2v is a multiple of q, and then sieve over all c2 = γ + c′2 q
with |c′2| ≤ L[1/2]. Since the smoothness candidates are O(√p), this sieve is
expected to produce a few L[1/2]-smooth values of (c1u + c2v)/q. This sieve
runs in L[1/2] time, as required. Among these smooth values just obtained, we
locate one for which c2 − c1√−r ∈ Z[√−r] is smooth over the small complex
primes of norms ≤ L[1/2]. But then, we get a relation of the form

  ind_g q + ind_g((c1u + c2v)/q) ≡ ind_g v + ind_g(Φ(c2 − c1√−r)) (mod p − 1).

Since (c1u + c2v)/q is smooth over rational primes ≤ L[1/2], and c2 − c1√−r
is smooth over complex primes of norms ≤ L[1/2], we obtain ind_g q using the
precomputed database of indices of factor-base elements.
18. Take a polynomial f(x) of degree k with all irreducible factors of degrees ≤ m.
Let i ≥ 1 be the largest degree of an irreducible factor of f(x). Write
19. The second stage of the LSM for F_{2^n} has the following sub-stages.
• For random α ∈ {0, 1, 2, . . . , 2^n − 2}, we try to express ag^α as a product
of irreducible polynomials of degrees ≤ √(n ln n)/√(ln 2). The polynomial ag^α
is of degree ≈ n, and is smooth with probability L[−√(ln 2)/2]. Therefore,
L[√(ln 2)/2] = L[0.416 . . .] random values of α are expected to produce one
smooth ag^α. The irreducible factors of this ag^α with degrees ≤ √(n ln n)/(2√(ln 2))
are in the factor base. So it suffices to compute the indices of all irre-
ducible factors of ag^α with degrees larger than √(n ln n)/(2√(ln 2)) but not larger than
√(n ln n)/√(ln 2). Let u(x) be such an irreducible polynomial. Since each ag^α can
be factored in polynomial time, this sub-stage runs in L[0.416 . . .] time.
• To compute the index of u(x), we find a polynomial v(x) = x^⌈n/2⌉ +
d(x) = u(x)w(x) with all irreducible factors of w(x) having degrees
≤ √(n ln n)/(2√(ln 2)). If one multiple of u(x) of the form x^⌈n/2⌉ + d0(x) is located,
we search among the candidates x^⌈n/2⌉ + d0(x) + e(x)u(x) with deg e
as small as possible. We expect to get one such v(x) after L[√(ln 2)/2] =
L[0.416 . . .] trials. Indeed, deg e is expected to be as small as O(log n).
Using polynomial-time factoring algorithms, we finish this sub-stage in
L[0.416 . . .] time. The index of w(x) is computed from the database of
indices available from the first stage. If x^⌈n/2⌉ + d(x) is in the factor base,
we get ind_g u. This is usually not the case, so we go to the next sub-stage.
• We look at the polynomial y(x) obtained by reducing (x^⌈n/2⌉ + c(x)) v(x)
modulo f(x), where deg c(x) ≤ √(n ln n)/(2√(ln 2)).
Since both deg c and deg d are much smaller compared to ⌈n/2⌉, we
expect to find one y(x) with all irreducible factors of degrees ≤ √(n ln n)/(2√(ln 2)),
after trying all of the L[√(ln 2)/2] polynomials x^⌈n/2⌉ + c(x) in the factor
base. The index of this x^⌈n/2⌉ + c(x) is available from the first stage, so
we obtain ind_g u. This sub-stage too runs in L[√(ln 2)/2] = L[0.416 . . .] time.
21. To avoid duplicate counting of triples, force the condition c1(2) ≥ c2(2) ≥
c3(2). We have c3(2) = c1(2) XOR c2(2). Let i = deg c1 and j = deg c2. If
i < j, then c1(2) < c2(2), whereas if i > j, then c2(2) < c3(2). Therefore, we
must have i = j. If i = j = −∞, then c1 = c2 = c3 = 0. If i ≥ 0, there are
1 + 2 + 3 + · · · + 2^i = 2^i(2^i + 1)/2 possibilities for choosing c1 and c2 satisfying
c1(2) ≥ c2(2). Each such choice gives c3 satisfying c2(2) ≥ c3(2). Therefore,
the total number of tuples (c1, c2, c3) in the CSM for binary fields is

  1 + Σ_{i=0}^{m−1} 2^i(2^i + 1)/2 = 1 + (1/2) Σ_{i=0}^{m−1} 4^i + (1/2) Σ_{i=0}^{m−1} 2^i
                                  = 1 + (1/2)·(4^m − 1)/3 + (1/2)·(2^m − 1)
                                  = (2/3)·4^{m−1} + 2^{m−1} + 1/3.
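As a quick arithmetic check, which is not part of the original solution, the closed form can be compared with the defining sum for small m (the helper name checkCSMcount is mine):

checkCSMcount(m) = [1 + sum(i=0, m-1, 2^i*(2^i+1)/2), 2/3 * 4^(m-1) + 2^(m-1) + 1/3]

For instance, checkCSMcount(3) returns [15, 15].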
update(w,r,S,T,M,f) = \
local(k,s,t,v); \
k = 1 + substpol(lift(w[1]),x,2) % r; \
s = (w[2] + S[k]) % (2^n-1); \
t = (w[3] + T[k]) % (2^n-1); \
v = (w[1] * M[k]) % f; \
return([v,s,t]);
genseq(f,g,a,r) = \
local(n,i,W,s,t,v,asz); \
n = poldegree(f); \
S = vector(r,x,random(2^n-1)); \
T = vector(r,x,random(2^n-1)); \
M = vector(r); \
for(i=1,r, M[i] = lift((g^S[i])*(a^T[i]))); \
asz = ceil(3*sqrt(2^n-1)); W = vector(asz); \
s = random(2^n-1); t = random(2^n-1); v = lift((g^s)*(a^t)); \
W[1] = [v,s,t]; \
for (i=2, asz, W[i] = update(W[i-1],r,S,T,M,f)); \
return(lift(W));
DLPrho(f,g,a,r) = \
local(n,W,asz,i,s,t); \
n = poldegree(f); \
asz = ceil(3*sqrt(2^n-1)); \
W = genseq(f,g,a,r); \
W = vecsort(W); \
for (i=1,asz-1, \
Solutions to Selected Exercises 573
if (W[i][1] == W[i+1][1], \
s = W[i+1][2] - W[i][2]; \
t = W[i][3] - W[i+1][3]; \
if (gcd(t,2^n-1)==1, return(lift(Mod(s,2^n-1)/Mod(t,2^n-1)))); \
); \
)
35. A GP/PARI code for the LSM for F_{2^n} is somewhat involved. We first imple-
ment a function getFB1(m) to return the list of all (non-constant) irreducible
polynomials of F2[x] with degrees ≤ m.
In the sieving stage, we index array elements by a polynomial c2(x). We
associate the array location c2(2) with c2(x). We need the two functions int2poly
and poly2int in order to convert c2(x) to c2(2), and conversely.
The function DLPLSM2n implements incomplete sieving by the irreducible
polynomials in B1 (degrees ≤ m). We restrict the degrees of c(x) in x^ν + c(x) to
less than m. In order to avoid generating duplicate relations, we let the (c1, c2)
pairs satisfy the constraints 0 ≤ c1(2) ≤ c2(2) ≤ 2^m − 1. For each fixed c1(x),
we sieve an array indexed by c2(x) in the range c1(2) ≤ c2(2) ≤ 2^m − 1. The ar-
ray is initialized by the degrees of T(c1, c2). Subsequently, for each irreducible
polynomial u(x) ∈ B1, we solve u | T(c1, c2). This is equivalent to solving the
linear congruence (x^ν + c1(x)) c2(x) ≡ x^ε f1(x) + x^ν c1(x) (mod u(x)). For a
solution c(x) of c2(x), all solutions are c2(x) = c(x) + u(x)v(x), where v(x)
is any polynomial of degree ≤ m − deg(u) − 1. After all u ∈ B1 are consid-
ered, we look at the array indices where the leftover degrees are ≤ ξm. These
are the shortlisted candidates in the LSM smoothness test (for the fixed c1). The
actual smooth values are identified by polynomial factorizations. Gordon and
McCurley's trick (the use of Gray codes) is not implemented.
getFB1(m) = \
local(t,B,B1,F,i,j); \
B = vector(2^(m+1)); t = 0; \
for (i=1,m, \
F = factor(Mod(1,2)*x^(2^i)+x); \
for (j=1,matsize(F)[1], \
if (poldegree(F[j,1]) == i, t++; B[t] = F[j,1]); \
); \
); \
B1 = vector(t); for (i=1,t, B1[i] = B[i]); return(B1);
int2poly(d) = \
local(c,i); \
c = Mod(0,2); i = 0; \
while(d>0, c += Mod(d,2)*x^i; d = floor(d/2); i++); \
return(c);
poly2int(c) = \
local(d,i); \
d = 0; \
for (i=0,poldegree(c), \
if (polcoeff(c,i)==Mod(1,2), d += 2^i); \
); \
return(d);
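As a quick check that the two helpers are mutual inverses (the value 13 below is arbitrary):

poly2int(int2poly(13))   \\ 13 = (1101)_2 corresponds to x^3 + x^2 + 1 (mod 2)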
DLPLSM2n(f,m,xi) = \
local(n,f1,nu,epsilon,B1,t,A,c,c1,c2,d1,d2,T,i,j,u,v,w); \
n = poldegree(f); f1 = f + Mod(1,2) * x^n; \
nu = ceil(n/2); epsilon = 2*nu - n; \
B1 = getFB1(m); t = matsize(B1)[2]; \
A = vector(2^m); \
for (d1=0,2^m-1, \
c1 = int2poly(d1); \
for(d2=0,d1-1, A[d2+1]=-1); \
for(d2=d1,2^m-1, \
c2 = int2poly(d2); \
T = Mod(1,2)*x^epsilon*f1 + (c1+c2)*x^nu + c1*c2; \
A[d2+1] = poldegree(T); \
); \
for (i=1,t, \
u = B1[i]; \
v = Mod(1,2)*x^nu+c1; w = Mod(1,2)*x^epsilon*f1+x^nu*c1; \
if (poldegree(gcd(v,u)) > 0, \
if (poldegree(gcd(w,u)) > 0, \
for(j=d1,2^m-1, A[j+1] -= poldegree(u)); \
); \
, \
c = lift(Mod(w,u)/Mod(v,u)); \
for (i=0, 2^(m-poldegree(u))-1, \
j = poly2int(int2poly(i)*u+c); \
if (j >= d1, A[j+1] -= poldegree(u)); \
); \
); \
); \
for (d2=d1,2^m-1, \
if(A[d2+1] <= xi*m, \
c2 = int2poly(d2); \
print1("c1 = ", lift(c1), ", c2 = ", lift(c2)); \
T = Mod(1,2)*x^epsilon*f1 + (c1+c2)*x^nu + c1*c2; \
if (poldegree(T) > 0, \
T = factor(T); \
if (poldegree(T[matsize(T)[1],1])>m, print1(" false alarm")); \
); \
print(""); \
); \
); \
);
AX = B.
8. Fix any i ∈ {1, 2, 3, . . . , n}. Consider the following i × i matrix formed by the
vectors z^(j) for j = 1, 2, . . . , i:

            [ 1   z1^(2)   z1^(3)   · · ·   z1^(i) ]
            [ 0     1      z2^(3)   · · ·   z2^(i) ]
  Z^(i)  =  [ 0     0        1      · · ·   z3^(i) ] .
            [ ...   ...     ...       ...    ...   ]
            [ 0     0        0      · · ·     1    ]

By Eqn (8.54), we have

                  [ ε1   0    0    · · ·   0  ]
                  [ ∗    ε2   0    · · ·   0  ]
  T^(i) Z^(i)  =  [ ∗    ∗    ε3   · · ·   0  ] ,
                  [ ...  ...  ...    ...   ...]
                  [ ∗    ∗    ∗    · · ·   εi ]

and, therefore, det T^(i) · det Z^(i) = ∏_{j=1}^{i} εj. But det Z^(i) = 1, so

  det T^(i) = ∏_{j=1}^{i} εj.

Levinson's algorithm succeeds if and only if all the ε^(i) are non-zero. This is equiv-
alent to the condition that all the T^(i) are invertible.
9. First, consider the homogeneous system T x = 0. Take a random n-dimensional
vector y, and compute u = T y. Denote the columns of T as ci for i =
1, 2, . . . , n. Also, let y = (y1 y2 · · · yn)^t. We can write u = y1c1 + y2c2 +
· · · + yncn. By hypothesis, the columns cr+1, cr+2, . . . , cn can be written as
linear combinations of the columns c1, c2, . . . , cr. Therefore, we can rewrite

  u = T y = z1c1 + z2c2 + · · · + zrcr = T z,

where z = (z1 z2 · · · zr 0 0 · · · 0)^t for some z1, z2, . . . , zr. But
then, T(z − y) = 0, that is, z − y is a solution of T x = 0.
Now, consider a non-homogeneous system T x = b. For a randomly chosen
y, set u = T y + b. If T x = b is solvable, u is T times a vector. As in the last
paragraph, we can then find z = (z1 z2 · · · zr 0 0 · · · 0)^t with the
bottom n − r entries zero, such that u = T z. But then, T(z − y) = b, that is,
z − y is a solution of T x = b.
To determine z1, z2, . . . , zr, note that T^(r) z′ = u′, where u′ and z′ are the
r-dimensional vectors consisting of the top r entries of u and z, respectively.
But T^(r) is a Toeplitz matrix. Moreover, each T^(i) is given to be invertible for
i = 1, 2, . . . , r. Therefore, Levinson's algorithm can be used to compute z′.
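A small GP/PARI sketch of this trick follows; it is not part of the original solution. For brevity it uses matsolve on the leading r × r block in place of Levinson's algorithm, and it draws the entries of the random vector y as small integers. T is assumed to be an n × n matrix of rank r whose leading minors T^(1), . . . , T^(r) are invertible, b a column vector for which T x = b is solvable, and the name toeplitzSolve is mine.

\\ Returns a solution of T x = b using the random-vector trick of this exercise.
toeplitzSolve(T,b,r) = \
local(n,y,u,up,zp,z,i); \
n = matsize(T)[1]; \
y = vectorv(n, i, random(100)); \
u = T*y + b; \
up = vectorv(r, i, u[i]); \
zp = matsolve(matrix(r,r,i,j,T[i,j]), up); \
z = vectorv(n, i, if (i <= r, zp[i], 0)); \
return(z - y);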
10. Let N = n/ν, that is, we solve an N × N system of blocks. Denote the (i, j)-th
block by Ti−j . Let T (i) denote the i×i block matrix (i is the number of blocks)
at the top left corner of the block Toeplitz matrix T. Also, let B^(i) denote the
subvector of b consisting of the top νi entries. We iteratively compute X^(i),
Y^(i) and Z^(i) satisfying

  T^(i) X^(i) = B^(i),    T^(i) Y^(i) = [ ε^(i)         ]    and    T^(i) Z^(i) = [ 0_{(i−1)ν×ν} ] .
                                        [ 0_{(i−1)ν×ν} ]                          [ ε′^(i)      ]

Here, X^(i) is a vector with νi entries, whereas Y^(i) and Z^(i) are νi × ν matrices,
and ε^(i) and ε′^(i) are ν × ν matrices to be chosen later. Initialization goes as:

  X^(1) = T0^{−1} B1,    Y^(1) = Z^(1) = I_ν,    ε^(1) = ε′^(1) = T0.

For i = 1, 2, . . . , N − 1, we iteratively proceed as follows. We have

  T^(i+1) [ Y^(i)   ]  =  [ ε^(i)           ]         T^(i+1) [ 0_{ν×ν} ]  =  [ −ε^(i) ζ^(i+1) ]
          [ 0_{ν×ν} ]     [ 0_{(i−1)ν×ν}    ]   and           [ Z^(i)   ]     [ 0_{(i−1)ν×ν}   ] ,
                          [ −ε′^(i) ξ^(i+1) ]                                 [ ε′^(i)         ]

where

  ξ^(i+1) = −(ε′^(i))^{−1} ( Ti  Ti−1  · · ·  T1 ) Y^(i)    and
  ζ^(i+1) = −(ε^(i))^{−1} ( T−1  T−2  · · ·  T−i ) Z^(i).
12. (a) Without loss of generality, we may assume that A1 , A2 , . . . , Ar are lin-
early independent, and the remaining columns belong to the subspace V gen-
erated by the first r columns. Let (c1, c2, . . . , cn) be a linear dependency of the
columns. We have Σ_{i=r+1}^{n} ci Ai ∈ V, that is, there is a unique way of express-
ing this vector as a linear combination of A1 , A2 , . . . , Ar (these vectors form
a basis of V ). This means that given any cr+1 , cr+2 , . . . , cn in a linear depen-
dency, the coefficients c1 , c2 , . . . , cr are fixed. Moreover, all of cr+1 , cr+2 , . . . , cn
cannot be zero in a linear dependency, since that implies that A1 , A2 , . . . , Ar
are not linearly independent.
(b) We have E(l) + 1 = E(q^d) ≥ q^{E(d)} (since the arithmetic mean of a non-
negative-valued random variable is no less than its geometric mean). There-
fore, E(d) ≤ log_q(E(l) + 1). Finally, note that d + r = n (a constant), so
E(r) ≥ n − log_q(E(l) + 1).
(c) Fix a non-zero tuple (c1, c2, . . . , cn) ∈ F_q^n with exactly k non-zero entries.
We first determine the probability that this is a linear dependency of the
columns of A. Let ci1 , ci2 , . . . , cik be the non-zero entries in the tuple. Now,
ci1 Ai1 + ci2 Ai2 + · · · + cik Aik = 0 gives n equations from the n rows, each of
the form ci1 a1 + ci2 a2 + · · · + cik ak = 0, where aj are matrix entries following
the given probability distribution. If Pk denotes the probability that such a
linear combination of the scalars is zero, then the above linear combination
of the columns is zero with probability P_k^n. Now, a tuple (c1, c2, . . . , cn) ∈ F_q^n
with exactly k non-zero entries can be chosen in C(n, k)(q − 1)^k ways. Finally, we
also vary k to obtain

  E(l) = Σ_{k=1}^{n} C(n, k) (q − 1)^k P_k^n.
high probability that m1 m2 would not satisfy the structural constraints, and
the decryption request is, therefore, rejected by Bob. A possible structural
constraint is to use some parity-check bits in a message. If the number of
such bits is > 100, then a random m1 m2 passes all these parity checks with a
probability of only 1/2^100.
6. We have ci ≡ m^e (mod ni) for i = 1, 2, . . . , e. By the CRT, one computes
c (mod n1n2 · · · ne) such that c ≡ ci (mod ni) for all i = 1, 2, . . . , e. Since
m^e < n1n2 · · · ne, we have c = m^e (as integers), that is, m can be obtained by
taking the integer e-th root of c.
In order to avoid this (and many other) attacks, it is often advisable to
append some random bits to every plaintext being encrypted. This process is
called salting. Only a few random bits dramatically reduce the practicality of
mounting the above attack, even when e is as small as three.
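As a small GP/PARI illustration of the CRT attack described above, which is not part of the original solution, take e = 3 with the toy moduli 33, 35, 37 and the message m = 10, so that m^3 is smaller than the product of the moduli. The helper name broadcast is mine, and sqrtnint (the integer e-th root) is assumed to be available in the GP version used.

\\ Recovers m from the e ciphertexts C[i] = m^e mod N[i] when m^e < N[1]*...*N[e].
broadcast(C,N,e) = \
local(i,c); \
c = Mod(C[1],N[1]); \
for (i=2, e, c = chinese(c, Mod(C[i],N[i]))); \
return(sqrtnint(lift(c), e));

broadcast([10^3 % 33, 10^3 % 35, 10^3 % 37], [33, 35, 37], 3)   \\ recovers m = 10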
11. Let n = ord g = uv with u small (may be prime). An active eavesdropper Eve
mounts the following attack.
22. (b) The verification of the i-th DSA signature (si, ti) involves computing wi ≡
ti^{−1} (mod q), ui ≡ mi wi (mod q) and vi ≡ si wi (mod q), and checking whether
si ≡ (g^{ui} y^{vi} (mod p)) (mod q). Multiplying the k verification equations gives

  ∏_{i=1}^{k} si ≡ ( g^{Σ_{i=1}^{k} ui} y^{Σ_{i=1}^{k} vi} (mod p) ) (mod q).
This is the equation for the batch verification of the k DSA signatures. The
sums of ui and of vi should be computed modulo q.
Modular exponentiations are the most expensive operations in DSA ver-
ification. The number of modular exponentiations drops from 2k (individual
verification) to 2 (batch verification). So the speedup achieved is about k.
It is evident that some individual signatures may be faulty even though their
product passes the batch-verification equation. For randomly chosen messages and
signatures, the probability of such an occurrence is, however, quite low.
(c) In Part (b), we assumed that y remains constant for all signatures. If not,
we should modify the batch-verification criterion to
  ∏_{i=1}^{k} si ≡ ( g^{Σ_{i=1}^{k} ui} y1^{v1} y2^{v2} · · · yk^{vk} (mod p) ) (mod q).
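A rough GP/PARI sketch of the Part (b) criterion follows; it is not part of the original solution. The group parameters p, q, g, the common public key y, and the vectors m, s, t of messages and signature components are assumed to be supplied by the caller with 0 < si, ti < q; the function name DSAbatchvrf is mine, and the return value is 1 exactly when the batch equation holds.

\\ Batch verification: compares prod(s_i) mod q with (g^(sum u_i) y^(sum v_i) mod p) mod q.
DSAbatchvrf(m,s,t,p,q,g,y) = \
local(k,i,w,U,V,prodS); \
k = length(s); U = Mod(0,q); V = Mod(0,q); prodS = Mod(1,q); \
for (i=1, k, \
w = Mod(t[i],q)^(-1); \
U = U + Mod(m[i],q) * w; \
V = V + Mod(s[i],q) * w; \
prodS = prodS * Mod(s[i],q); \
); \
if (prodS == Mod(lift(Mod(g,p)^lift(U) * Mod(y,p)^lift(V)), q), return(1), return(0));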
ECDSAsgn(m,d,E,p,n,G) = \
local(d1,S,s,t); \
d1 = 2 + random(n-2); \
S = ellpow(E,Mod(G,p),d1); \
s = lift(S[1]) % n; \
t = lift(Mod(m + d * s, n) / Mod(d1, n)); \
return([s,lift(t)]);
ECDSAvrf(m,S,Y,E,p,n,G) = \
local(w,u1,u2,V,v); \
w = Mod(S[2],n)^(-1); u1 = Mod(m,n) * Mod(w,n); u2 = Mod(S[1],n) * w; \
V = elladd(E,ellpow(E,Mod(G,p),lift(u1)),ellpow(E,Y,lift(u2))); \
v = lift(V[1]) % n; \
if (v == S[1], return(1), return(0));
Use of these functions is now demonstrated for the data in Example 9.8.
p = 997;
E = ellinit([Mod(0,p),Mod(0,p),Mod(0,p),Mod(3,p),Mod(6,p)]);
n = 149;
G = [246,540];
d = 73;
Y = ellpow(E,G,d); print("Y = ",lift(Y));
m = 123;
S = ECDSAsgn(m,d,E,p,n,G); print("S = ", S);
ECDSAvrf(m,S,Y,E,p,n,G)
ECDSAvrf(m,[96,75],Y,E,p,n,G)
ECDSAvrf(m,[96,112],Y,E,p,n,G)
39. In this exercise, we assume that the parameters E, q, k, r, P, Q and the relevant
hash functions are defined globally. If symmetric pairing can be used, we invoke
the distorted Weil pairing dWeil(), otherwise we invoke Weil(). The four stages
of the Boneh–Franklin IBE scheme can be implemented as follows.
BF_TAkeys() = \
local(s,Ppub); \
s = 2 + random(r-2); \
Ppub = ellpow(E,P,s); \
return([s,lift(Ppub)]);
BFreg(s,PBob) = \
local(DBob); \
DBob = ellpow(E,PBob,s); \
return(lift(DBob));
BFenc(M,Ppub,PBob) = \
local(g,a,h,U,V); \
g = dWeil(PBob,Ppub); \
a = 2 + random(r-2);\
U = ellpow(E,P,a);\
h = lift(lift(g^a)); \
V = H2(h); \
V = bitxor(V,M); \
return([U,V]);
BFdec(C,DBob) = \
local(h,M); \
h = H2(lift(lift(dWeil(DBob,C[1])))); \
M = bitxor(C[2],h); \
return(M);